One of the exciting things Intel has brought to RSA 2019 is Intel SGX Card . Yet there is not much information about this coming hardware. This post collects some related documentation from Intel and speculates what could happen within Intel SGX Card with a focus on software architecture, cloud deployment, and security analysis. NOTE: all the figures come from public Intel blog posts and documentation, and there is no warranty for my speculations on Intel SGX Card! Read with caution!
1. Intel SGX Card
According to , “Though Intel SGX technology will be available on future multi-socket Intel® Xeon® Scalable processors, there is pressing demand for its security benefits in this space today. Intel is accelerating deployment of Intel SGX technology for the vast majority of cloud servers deployed today with the Intel SGX Card. Additional benefits offer access to larger, non-enclave memory spaces, and some additional side-channel protections when compartmentalizing sensitive data to a separate processor and associated cache.”
Simply put, Intel SGX Card is introduced to address 3 problems on SGX usage within cloud:
- Older servers/CPUs that do not support SGX
- Small EPC memory pool
- Side-channel attacks
Accordingly, Intel SGX Card is designed as a PCIe card, which can be plugged into old servers. This solves the first problem. But what about the second and the third problems? How could Intel SGX Card have larger EPC memory pool and defend against side-channel attacks? To answer these questions, we need to look into the internals of Intel SGX Card.
2. Intel VCA
According to , Intel SGX Card is actually built upon Intel VCA, the Intel® Visual Compute Accelerator (Intel® VCA) card . Moreover, “Intel VCA is a purpose-built accelerator designed to boost performance of visual computing workloads like media transcoding, object recognition and tracking, and cloud gaming, originally developed as a way to improve video creation and delivery. In the Intel® SGX Card, the graphics accelerator has been disabled and the system re-optimized specifically for security purposes. In order to take advantage of Intel SGX technology, three Intel Xeon E processors are hosted in the card, which can fit inside existing, multi-socket server platforms being used in data centers today.”
Alright, so Intel SGX Card is Intel VCA with graphics accelerator disabled essentially. Now it is time to learn what Intel VCA is. After some digging online, I found 2 precious documentations describing hardware specification  and software guide  respectively. Readers are encouraged to give a careful read on these documentations. Below is the TL;DR version.
The Intel VCA (or VCA 2) is a PCIe card with 3 Xeon CPUs. As shown in the figure above, each CPU has its own DRAM, instead of sharing RAMs. The internal architecture below shows better the nature of this card: 3 computers within a PCIe card.
These 3 CPUs do not only have their own DRAMs but also PCH chipsets and Flashes. They are connected and multiplexed by a PCIe bridge connecting with the host machine. Note that VCA 2 also supports optional NVM storage M2, as shown in the figure above. Let’s take a look at the software stack.
Did I say “3 computers within a PCIe card”? I actually mean it. Each CPU within the VCA card runs its own software stack, including UEFI/BIOS, operating system, drivers, SDKs, and applications. These operating systems could be Linux or Windows. Hypervisors are also supported including KVM and Xen. Even “better”, each CPU is also equipped with Intel SPS and ME. If you count ME as a microcomputer as well, now we have 3 microcomputers running inside 3 computers within 1 PCIe card.
Each computer within VCA is also called a node. Therefore, there are 3 nodes within 1 VCA card. Unlike typical PCIe cards, VCA exposes itself as virtual network interfaces to the host machine. For example, 2 VCA cards (6 nodes) add 6 different Virt eth interfaces to the host machine, as shown in the figure above. These Virt eth interfaces are implemented as MMIO over PCIe. Given that each node is indeed an independent computer system with full software stacks, this virtual network interface concept might be a reasonable abstraction. I was worried about the overhead of going through TCP/IP stack. Then I realize that Intel could provide dedicated drivers on both the host and the node side to bypass the TCP/IP stack, which is very possible, as suggested by those VCA drivers. It would be interesting to see what “packet” is sent and received from these virtual NICs. To support high bandwidth and throughput, the MMIO region is 4GB minimum. This means each node takes a 4GB memory space from the main system memory, as well as its internal memory.
3. Speculations on Intel SGX Card
Once we have some basic understanding of Intel VCA, we can now speculate what Intel SGX Card could be. Depending on what Intel meant by “disabling graphics accelerators“, it could be removing those VCA drivers and SDK within each node. Once we did that, we would have a prototype Intel SGX Card, where 3 SGX nodes run a typical operating system connecting with the host machine via PCIe. Now, what could we do?
To reuse most of the software stack developed for VCA already, I probably would keep the virtual network interface instead of creating a different device within the host machine. As such, the host still talks with the SGX card in virt eth. Within each node of the SGX card, we could install the typical Intel SGX PSW and SDK without any trouble since each node is an SGX machine. Then each node has all the necessary runtime to support SGX applications. On the host side, we could still install Intel SGX SDK to support compilation “locally”, although we might not be able to install PSW assuming an old Xeon processor. But this is not a problem because we will relay the compiled SGX application to the SGX card. To achieve this, a new SGX kernel driver is needed on the host machine to send the SGX application to one node within the SGX card via the virt eth interface.
So far we have speculated how to use Intel SGX card within a host (or server). It is time to review the design goals of Intel SGX card again:
- Enable older servers to support SGX
- Enlarge EPC memory pool
- Protect from side-channel attacks
The first problem can be achieved easily with the PCIe design and the fact that each node within the Intel SGX card is a self-contained SGX-enabled computer. However, the scalability of this solution is still limited by the number of PCIe (x16) slots available within a server and the number of CPU nodes within an Intel SGX Card. The number of PCIe slots is also limited by the power supply within the system. Unless we are talking about some crazy GPU-in-favor motherboard , 4 PCIe x16 slots seem to be a reasonable estimation. Multiplied by 3 (number of nodes within an Intel SGX card), we would have 12 SGX-enabled CPU nodes available within a server.
The second goal is a byproduct of the independent DRAM of each node within the Intel SGX card. Recall that each node has a maximum 32GB memory available. If Intel SGX card is based upon Intel VCA 2, each node then has maximum 64GB memory available. Because this 32GB (or 64GB) memory is dedicated to the node for SGX computation instead of a portion from the main system memory within the server, we can anticipate the EPC to be large for each node. For instance, a typical EPC memory size within an SGX-enable machine is 128MB. Because of the Merkle Tree used to maintain the integrity of each page and other housekeeping metadata, only around 90MB is for real enclave allocations. This means the overhead of EPC is 1/4 in general. If we assume 32GB for each node within an Intel SGX card, we could easily have 16GB for EPC, among of which 4GB is used for EPC management and 12GB for enclave allocations. Why 16GB? You might ask. Well, remember that each node is a running system. We need some memory both for OS and applications, including the non-enclave part of SGX applications. Moreover, due to the MMIO requirement, a 4GB memory space is reserved on both the main system memory and node’s memory for each node. As a result, we have roughly 12GB left for OS and applications for each node. Of course, we could push more but you get the point. We will see the EPC size once Intel SGX card is available.
The third goal is described as “additional benefit” of using Intel SGX card. Because all the 3 nodes within an Intel SGX card have its independent RAM and cache (which are also separated from the main system if the host supports SGX as well), we definitely could have better security guarantees for SGX applications. First, SGX applications can run within a node, thus isolating themselves from other processes running on the main system. Second, different SGX applications can run on different nodes, thus reducing the impact of enclave-based malware or side-channel attacks. Everything sounds good! What could possibly go wrong?
4. Speculations on security
First of all, SGX applications running within Intel SGX card is still vulnerable to whatever attacks as before, because each node within the card is still a computer system with a full software stack. Unless this whole software stack is within the TCB, an SGX application is still vulnerable to attacks from all other processes and even the OS or hypervisors running within the same node. From SGX application point of view, nothing is changed, really.
The other question is how a cloud service provider (CSP) could distribute SGX workload? A straightforward solution would be based on load balancing, where a CSP distributes different SGX applications to different nodes for performance considerations regardless of security levels of different end users. Again, this is no different with an SGX-enabled host machine running different SGX applications from different users. Another solution would be mapping a node with one user, meaning that SGX applications from the same user will run within the same node. While this solution reduces attacks from other end users, we can easily run into scalability issues given the limited number of nodes available within a system and a possibly large number of end users. The other problem of this solution would be load unbalancing. User A might only have 1 SGX applications running on node N-A while user B might have 100 SGX applications running on node N-B. I am not surprised if user B yells at the cloud.
That is being said I do not think Intel would take either approach. Instead, a VM-based approach might be used, where SGX applications from the same user run within the same VM and different users have different VMs. We can then achieve load balancing easily by assigning a similar number of VMs to each node. This approach is technically doable since we have seen SGX support for KVM  and nodes within Intel SGX card support KVM too. It is also possible that Clear Linux  will be used to reduce the overhead of VM by using KVM-based containters. The only question is if VM or container is enough to isolate potential attacks from other cloud tenants, e.g., cache-based attacks, and defend against attacks from OS and hypervisors, e.g., control-channel attacks.
This post tries to speculate what Intel SGX card would look like and how it would be used within a cloud environment. I have no doubt that some of the speculations could be totally wrong once we are able to see the real product. Nevertheless, I hope this post could shed some light on this new security product and what could/should be done and what is still missing. All opinions are my own.