Borland Turbo C++ 3.0 on MacOS

The recent post from Hackaday on Borland Turbo C/C++ [1] managed to trigger my nostalgia: over 20 years ago, as an undergrad in China, I needed to pass a national programming test in C, and the environment was Borland Turbo C 1.0. Anyway, this post is about running Borland Turbo C++ 3.0 on my MacOS. Nothing complicated, albeit with some caveats.

1. Get DOSBox

This is a no-brainer. Go get it at [2].

2. Get Borland Turbo C++ 3.0

I decided to go with Turbo C++ 3.0 because of the syntax highlighting, especially the green color:) Fetch your beloved TC at [3]. I used the 3.5” version.

3. Install and Run TC

After unzipping TC3, there should be five disk image files (Disk01-05.img). These are floppy disk image files. One caveat here is that we need to extract the original installation files from these five images. Believe it or not, MacOS gave me a hard time on this, since I’ve never figured out how its mount works (compared to the Linux one). Nevertheless, “Disk Utility” came to the rescue, as it can easily mount these floppy disk images under /Volumes.

tc-du

Create a local directory, e.g., “/Users/daveti/install”, and extract all the files from these five images into the install directory.

tc-ext-all

Create a target directory, e.g., “/Users/daveti/dos/tc”, where TC will be installed. Start DOSBox:

mount c /Users/daveti/dos
mount a /Users/daveti/install

Then change to A: and run “INSTALL.EXE” to install TC under “C:\TC” by default.
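
Inside DOSBox, the install step looks roughly like this (a minimal sketch, assuming the mounts above; the installer will also ask you to confirm the source drive and target path):

a:
install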

tc-ins

4. Run TC

Since I have not configured my DOSBox yet, I need to mount C: before I can run TC. Once mounted, change to C:\TC\BIN and run “TC.EXE”.
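
A minimal DOSBox session, assuming the paths above:

mount c /Users/daveti/dos
c:
cd \TC\BIN
TC.EXE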

tc-run

References:

[1] https://hackaday.com/2023/04/08/revisiting-borland-turbo-c-and-c/
[2] https://www.dosbox.com/
[3] https://winworldpc.com/product/turbo-c/3x

Posted in OS, Programming

SafeThings 2023

IEEE/ACM Workshop on the Internet of Safe Things

Co-located with CPS-IoT Week 2023 »

San Antonio, Texas USA, May 9, 2023

The Internet of Things has become increasingly popular and innovative. With the rise of connected devices, we have an opportunity to significantly improve the safety of legacy systems. For instance, insights from data across systems can be exploited to reduce accidents, improve air quality and support disaster events. IoT-based cyber-physical systems (CPS) also bring new risks that arise due to the unexpected interaction between systems and the larger number of attack vectors on these systems. These safety risks can arise in the context of use of medical devices, smart home appliance control, autonomous vehicle and intelligent transportation designs, or conflicts in policy execution at a societal scale.

The Workshop on the Internet of Safe Things seeks to bring together researchers to create solutions for the development of safe cyber-physical systems. As safety is inherently linked with the security and privacy of a system, we also seek contributions in these areas that address safety concerns. We seek to develop a community that systematically dissects the vulnerabilities and risks exposed by these emerging CPSs, and creates tools, algorithms, frameworks, and systems that help in the development of safe systems.

We seek contributions across domains – autonomous vehicles, smart homes, medical devices, smart grid, intelligent transportation; and across disciplines – systems, control, human-computer interaction, privacy, security, reliability, machine learning, and verification.

Important Dates

Paper Submission Deadline: February 4th, 2023 (extended from January 28th) (AoE, UTC-12)
Acceptance Notification: February 28th, 2023 (extended from February 21st)
Camera-ready Submission Deadline: March 12th, 2023 (extended from March 5th) (AoE, UTC-12)
Workshop: May 9th, 2023

Call for Papers

As the traditionally segregated systems are brought online for next-generation connected applications, we have an opportunity to significantly improve the safety of legacy systems. For instance, insights from data across systems can be exploited to reduce accidents, improve air quality and support disaster events. Cyber-physical systems (CPS) also bring new risks that arise due to the unexpected interaction between systems and the environment. These safety risks arise because of information that distracts users while driving, software errors in medical devices, corner cases in data-driven control, compromised sensors in drones or conflicts in societal policies. Accordingly, the Workshop on the Internet of Safe Things (or SafeThings, for brevity) seeks to bring researchers and practitioners that are actively exploring system design, modeling, verification, authentication approaches to provide safety guarantees in the Internet of Things (IoT). The workshop welcomes contributions that integrate hardware and software systems provided by disparate vendors, particularly those that have humans in the loop. As safety is inherently linked with security and privacy, we also seek contributions in these areas that address safety concerns. With the SafeThings workshop, we seek to develop a community that systematically dissects the vulnerabilities and risks exposed by these emerging CPSes, and create tools, algorithms, frameworks, and systems that help in the development of safe systems.

The scope of SafeThings includes safety topics as they relate to an individual’s health (physical, mental), society (air pollution, toxicity, disaster events), or the environment (species preservation, global warming, oil spills). The workshop considers safety from a human perspective, and thus does not include topics such as thread safety or memory safety in its scope.

Topics of interest include, but are not limited to, the following categories:

  • Verification of safety in IoT/CPS platforms
  • Authentication in IoT/CPS settings
  • Adversarial machine learning and testing of IoT/CPS systems
  • Secure perception, localization, and planning in autonomous systems (e.g., autonomous vehicles and drones)
  • Sensors/analog and network protocol security in IoT/CPS systems
  • Compliance with legal, health, and environmental policies
  • Conflict resolution between IoT applications
  • Secure connectivity and updates in IoT/CPS
  • Secure integration of hardware and software systems
  • Privacy challenges in IoT/CPS settings
  • Privacy preserving data sharing and analysis
  • Resiliency against attacks and faults
  • Safety in human-in-the-loop systems
  • Support for IoT/CPS development – debugging tools, emulators, testbeds
  • Usable security and privacy for IoT/CPS platforms
  • Smart homes, smart buildings and smart city security and privacy issues

In addition, application domains of interest include, but are not limited to autonomous vehicles and transportation infrastructure; medical CPS and public health; smart buildings, smart grid and smart cities.

The PC will select a best paper award for work that distinguishes itself in moving the security and privacy of IoT/CPS forward through novel attacks or defenses.

Call for Demos

In addition to presentation of accepted papers, SafeThings will include a demo session that is designed to allow researchers to share demonstrations of their systems that include CPS/IoT security and safety as a major design goal. Demos of attacks are also welcome.


Submission Instructions

Submitted papers must be in English, unpublished, and must not be currently under review for any other publication. Manuscripts should be no more than 6 pages, including all figures, tables, and references in ACM two-column conference proceedings style (https://www.acm.org/publications/proceedings-template/), using US Letter (8.5-inch x 11-inch) paper size. Demos must be at most 1 single-spaced, double column 8.5” x 11” page, and have “Demo:” in their titles. All figures must fit within these limits. Papers that do not meet the size and formatting requirements will not be reviewed. All papers must be in Adobe Portable Document Format (PDF) and submitted through the web submission form via the submission link below. The review process is double-blind.

Full Papers: 6 pages including all figures, tables, and references.
Demos: 1 page (with “Demo:” in the title).

Submission Page »


ACM’s Publications Policies

By submitting your article to an ACM Publication, you are hereby acknowledging that you and your co-authors are subject to all ACM Publications Policies, including ACM’s new Publications Policy on Research Involving Human Participants and Subjects. Alleged violations of this policy or any ACM Publications Policy will be investigated by ACM and may result in a full retraction of your paper, in addition to other potential penalties, as per ACM Publications Policy.

Please ensure that you and your co-authors obtain an ORCID ID, so you can complete the publishing process for your accepted paper. ACM has been involved in ORCID from the start and we have recently made a commitment to collect ORCID IDs from all of our published authors. The collection process has started and will roll out as a requirement throughout 2022. We are committed to improve author discoverability, ensure proper attribution and contribute to ongoing community efforts around name normalization; your ORCID ID will help in these efforts.


Organization

General Chairs

Yuan Tian (University of California, Los Angeles, USA)

Dave (Jing) Tian (Purdue University, USA)

Program Committee Chairs

Saman Zonouz (Georgia Institute of Technology, USA)

Z. Berkay Celik (Purdue University, USA)

Web Chair

Faysal Hossain Shezan (University of Virginia, USA)

Publicity Chair

Habiba Farrukh (Purdue University, USA)

Posted in Conference

IEEE Workshop on the Internet of Safe Things

Dear Colleagues,

Apologies if this reached you through multiple channels.

If you or anyone you know does research in Security and Privacy related to IoT, Mobile Devices and Platforms, or Cyber-Physical Systems, please consider submitting your early work or demos of existing, ongoing, or published work at SafeThings ’22.

SafeThings ’22 (the IEEE Workshop on the Internet of Safe Things) will take place in conjunction with IEEE S&P ’22 (the IEEE Symposium on Security and Privacy 2022).

We solicit 6-page original papers on topics related to security and privacy at the edge of the network and 1-page demo proposals. Contributions can be across a variety of domains where interconnected systems affect the safety of the environment or their users – autonomous vehicles, smart homes, medical devices, smart grid, intelligent transportation; and across disciplines – systems, control, human-computer interaction, privacy, security, reliability, machine learning, and verification.

A more comprehensive list of topics can be found in the official call for papers: https://safe-things-2022.github.io/#cfp

Important Dates
Paper/Demo Submission Deadline: January 28th, 2022 (AoE, UTC-12)
Acceptance Notification: February 21st, 2022
Camera-ready Submission Deadline: March 5th, 2022 (AoE, UTC-12)
Workshop: May 26th, 2022

Best Paper Award
The PC will select a best paper award for work that distinguishes itself in moving the security and privacy of IoT/CPS forward through novel attacks or defenses.

Follow SafeThings 2022
Follow us on Twitter for updates: @safethings2022

We look forward to your paper/demo submissions. Please help spread the word by forwarding this post to anyone who might be interested.

Kind Regards,
Soteris, Danny, Sandeep and Dave

Posted in Conference

Some Thoughts about PCI Express Device Security Enhancement

The USB Type-C Authentication specification provides a method to authenticate USB products, although with some flaws. As you might wonder — what about other non-USB peripherals? How can we establish trust with them instead of “Trust-by-Default”? That’s how I ended up with Intel’s PCI Express Device Security Enhancement (a DRAFT; we will call it the PCIe authentication specification, or the specification for short), which aims to bring authentication to PCIe peripherals. This post contains my thoughts after reading the draft. Opinions are my own.

The Good

Even though the specification targets PCIe authentication, it considers several existing specifications, including USB Type-C Authentication, MCTP, and TCG DICE. As a result, the PCIe authentication specification builds upon the existing framework proposed by USB Type-C Authentication and supports MCTP too. Note that this is a sensible decision considering the arrival of USB4, which is built upon Thunderbolt, which is essentially PCIe.

Compared to USB Type-C Authentication, the PCIe authentication specification explicitly considers the measurement of device firmware and distinguishes between PCIe device measurement and PCIe device authentication. For instance, the device measurement could be tampered with during supply chain attacks, while the usage of certificates tells us nothing about the firmware, only the identity of the device. Consequently, PCIe device authentication combines both PCIe device measurement and certificates (Public Key Infrastructure, PKI) to offer a stronger security guarantee.

Moving forward, I like how the specification clearly defines root-of-trust (RoT), root-of-trust for measurement (RTM), and root-of-trust for reporting (RTR). The spec explicitly mentions that both RTM and RTR should be implemented in pure hardware or in immutable firmware that cannot be manipulated by mutable firmware — kicking standard device firmware out of the trusted computing base (TCB). This requirement actually addresses one of the issues we found in USB Type-C Authentication — what if the device firmware does the measurement by itself?

The Interesting

Instead of using ECDSA NIST P-256 and SHA-256 as recommended in USB Type-C Authentication, this spec recommends ECDSA NIST P-384 or RSA 3072 and SHA2/3-384 or SHA2/3-512, because “PCIe Device Authentication requires a minimum level of 192-bit security.” Honestly, I do not see a strong motivation for this, since 128-bit security is sufficient IMO.

The spec also states explicitly, “Device manufacturers shall employ adequate protections against malicious insider attacks…” I do not recall insider attacks being part of USB Type-C Authentication, and it might be easy to say but hard to achieve, as previous insider attacks have happened, e.g., Snowden.

Besides device2host authentication, the spec also envisions host2device authentication and eventually mutual authentication, as we argued in our study of USB Type-C Authentication. The spec further recommends AES-128/256/384 for symmetric crypto (e.g., after DHKE). Maybe AES-128 should not be listed, given that the spec wants a minimum of 192-bit security.

The Bad

The spec also gives three levels of device private key protection. In Level 1, “Device private key is stored in plaintext from inside the Device (e.g., fuze, internal NVRAM)… is accessible to mutable Device component, such as firmware.” Even if we can argue that security is essentially an economic activity and there is a budget for every defense, what is the point of having this L1 protection? This is totally broken. In Level 2, “Device private key is encrypted using a key encryption key (KEK) which is known only to the immutable Device hardware… is accessible only to Device hardware or immutable firmware.” IMO, L2 should be the default, although I’m a bit skeptical about the secrecy of a secret key known only to hardware. Level 3 mandates generating the secret inside the hardware, e.g., using PUF or DICE.

As with USB Type-C Authentication, the spec states that “Device private key should be protected against software side-channel attacks as well as against hardware differential power analysis attacks, including all relevant keys and cryptographic algorithms related to the usage of the Device private key.” Presumably, the “software side-channel attacks” refer to vulnerabilities within immutable firmware, and the only hardware side channel highlighted here is power analysis. Here are my concerns: how often can we guarantee no side channels in our code? Why can’t we implement those crypto operations in pure hardware? What about EM side channels? Again, I do not think these are bad requirements at all. But I am hoping to see more justification behind all the statements.

USB Type-C Authentication introduces a number of slots supporting different certificate chains, essentially allowing the firmware to create its own self-signed certificate chain that might be accepted by some vulnerable policy. The PCIe authentication specification inherits multiple slots and even introduces a new SET_CERTIFICATE command for setting new certificate chains into non-zero slots. However, the spec requires that each new certificate chain signs the leaf DeviceCert in slot 0, which is immutable. The sole purpose of this owner-provisioned certificate chain is to simplify certificate chain validation, e.g., we don’t need to verify the slot-0 chain every time. Instead, one could provision a self-signed certificate chain and reduce the length of the chain to validate. Unfortunately, there is still a catch. First of all, it is not clear who has write access to these non-zero slots. The spec uses the term “device owner,” but my gut feeling is that anything might be able to write and overwrite those slots. Second, to accelerate the chain validation, a host machine must remember which slot is used for a given device. This means the provisioned slot number has to be added to the PCIe database maintained by the OS. We will have two problems here. What if the target slot is overwritten? That would fail the chain validation for that slot, although slot 0 should still work. What if the slot number in the PCIe database is tampered with, e.g., attackers inject their own chains and redirect the slot number in the database to the new slot? The chain validation would succeed while the PCIe device firmware could be compromised, if the challenge-response protocol (containing the DEV_IDENTITY and FW_IDENTITY signed by the Device Private Key) was skipped. In the end, no matter which case we are talking about, a slot-0 chain validation might still be needed. The potential issues of non-zero slot usage might outweigh the benefit of the speedup.

Summary

As a draft from Intel tackling PCIe authentication, I think this specification balances different pros and cons, and it has indeed reached a balance between high-level concepts and implementation details. A step forward might be implementing it on a real-world PCIe device and seeing all the pitfalls and challenges. One of the biggest challenges during implementation might be the boundary between hardware and software, immutable and mutable, and security and performance.

Posted in Security

Some notes for my security class students…

There were some crazy shooting and beating incidents in the past few weeks, and we have seen how our communities are trying to fight together to stop the hatred and racism. Unsurprisingly, my students asked me for my comments regarding these tragedies. Although it is not my personal interest to dive into these incidents, let alone comment on them, and I do support the effort of our communities to stop these misbehaviors in the future, I would like to share some thoughts with my students from the perspectives of being a teacher and a (former) student. Opinions are my own, and I will not participate in any further discussion. But feel free to leave a comment if you think it is needed.

1. Be judgemental

Like it or not, any information we get anywhere could be biased, whether it’s online information or something written down on paper. Unfortunately, people seem to forget this and treat “media” as “facts.” But are they? If you still remember the discussion we had earlier on the “Trusted Computing Base” (TCB), ask yourself – is any media within your TCB? If it is trusted, why? I hope you will enjoy the process of reflective thinking and reasoning.

So what’s the point here? All news is fake, and we cannot trust any news? Apparently not. Since “news” is still the major source of our real-time information, we probably will have to “trust” it to a certain extent. But this does not necessarily mean that “news” is trustworthy, and it could be far away from “facts.” IMO, the most important thing everyone should learn from a major like Computer Science is to think critically like a scientist. Whenever you see a piece of “news,” ask yourself – what is the “fact” there and what are the subjective speculations that the makers try to deliver explicitly or implicitly.

Unfortunately, it is not an easy process to distinguish the “fact” from the other parts of “news,” and it could be a life-long learning experience. But I think it is worth all the effort and a nice habit to have if you hope to get close to the “truth” of the world. If “news” => “action”, a better causal chain might be “news” => “facts” => “action”. Be judgemental and think independently.

2. Protect yourself

No matter where we are, there will be some crazy people around. I have lived in Oregon and Florida, and I have been living in Indiana for almost 2 years now. A “fun” fact is that I was harassed by different people in all three of these states. My default policy against these harassments is always to get as far away as I can. Sounds like a “coward”? Maybe, but my “counter-argument” is that “I have more important things to do than arguing with these people,” such as “getting home safely,” “walking our dog every evening,” and “teaching my class on time.”

Remember – we cannot fix people. Haters are gonna hate no matter what. Life is too short to be wasted on “teaching a lesson” to people who cannot be taught, at the risk of getting ourselves injured. Even better, try to avoid any potential risks if possible. I remember attending a conference in Baltimore, and our taxi driver warned us about places that we did not want to visit. Unfortunately, we did end up staying in a bar within a dangerous area. But we did call the taxi again and waited for it to show up in front of the bar before we left for the hotel. Looking back, we probably should not have gone to the bar in the first place.

In short, protect yourself. Avoid any potential risks if possible and run away as soon as possible without any confrontation because your safety triumphs over everything else.

3. Be kind

If you have not seen the commencement speech from Victor Wooten (my favorite bass player in the world, although I do not play bass at all) in 2016, please check it out. His mother asked the young Victor what the world needs, e.g., another good musician? The answer is good people. What the world needs are good people!

Just be kind, to your family, friends, classmates, strangers, animals, and so on and so forth. It is sometimes surprising to realize how simple it is to make us “human” and how often we forget how to behave like “humans.” The other day, while I was walking our dog (his name is Fubao, in case you are interested), I found a dying raccoon in the forest (actually, Fubao found it first). I tried all the numbers I could find that might save the poor thing, but none of them worked due to off-hours. At last, I called 911. The lady on the other end told me that animals are out of their scope. When I told her that it is heartbreaking to see an animal suffering and that there was nothing I could do to help, she said, “This is life, and you have to deal with it.”

There is nothing wrong with her response, and I know she meant well to comfort me. Meanwhile, I started to realize how we, as human beings, can “easily” get used to “things” which might not be good or kind and which we probably should not have “gotten used to” in the first place. If you apply “Be judgemental” to the “routines” (some) people take for granted, you might be shocked to see how often we are not “kind” in the name of “safety,” “tradition,” “race,” “religion,” or whatever. Again, there is nothing wrong with being self-protective, and we need to protect ourselves for sure. At the same time, please do not forget your kindness and give a hand if you can.

If possible, please don’t get worn out “so easily” by the “society.” Stay kind.

4. Learn something

Regardless of skin color or religion, as human beings, we are curious about the meaning of our lives and even our next lives. When I was a teenager, I used to write down “The meaning of my life” whenever I started to drift away from classes (sounds weird?) While I do appreciate the wisdom of tautology, e.g., the meaning of life is to explore the meaning of life, I hit my first “life crisis” during my undergrad, when I was lost and had no idea why I even went to college or what I should do in those 4 years.

While I seemed to find my “direction” by playing heavy metal, there was always a “dark cloud” somewhere in my sky bothering me now and then. At the beginning of my sophomore year, a professor gave a talk, and I cannot even recall how I managed to stay there or what the talk was about. He said, “If you have no idea what to do, stay in the library and learn something.” I remember everyone laughed (including myself), since my roommates were busy with “Counter-Strike” and I was busy with “Metallica.” “What a douche!” I laughed.

What’s the rest of the story? I am now writing a blog post trying to convince my students to learn something when they are lost or have no idea what to do, and I know someone will laugh at me. What about “the meaning of my life”? I am still clueless. However, I now know that I will get it sooner or later as long as I keep moving forward. There is no dark cloud in my sky anymore.

Posted in Life Prose

Ubuntu Kernel Build Again

I wrote two blog posts about Linux kernel builds on Ubuntu [1,2]. There is also an official wiki page talking about the same thing [3]. Still, things were broken when I tried to create a homework assignment for my class. This post is about how to correctly, easily, and quickly build the Linux kernel on Ubuntu 16.04 and 18.04. The whole process has been used and verified by me. Happy hacking.

1. Install building dependencies

sudo apt-get build-dep linux linux-image-$(uname -r)
sudo apt-get install libncurses-dev flex bison openssl libssl-dev dkms libelf-dev libudev-dev libpci-dev libiberty-dev autoconf

2. Fetch the Linux kernel source

Make sure /etc/apt/sources.list contains at least these 2 source entries (e.g., by uncommenting them). Note that the CODENAME is “xenial” for 16.04 and “bionic” for 18.04. Once the source entries are there, update the list file and fetch the kernel source.

deb-src http://us.archive.ubuntu.com/ubuntu CODENAME main
deb-src http://us.archive.ubuntu.com/ubuntu CODENAME-updates main
sudo apt update
sudo apt-get source linux-image-unsigned-$(uname -r)

Note that unlike the typical “linux-image-$(uname -r)”, “unsigned” is added when fetching the kernel source to work around the bug caused by kernel signing [4]. You might also wonder why I am not using git directly. The reason is simple: the apt-get approach gives the exact kernel source files that your system is running now, while git usually provides the most recent version (e.g., master) for a certain distro version. Unless you need to play with cutting-edge features from the most recent kernels, it is wise to stick with the current kernel that is proven to work on your system. The other benefit of this approach is about generating the config – we can simply reuse the existing config without any trouble.
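
For example, one common way to reuse the running kernel’s config is to copy it into the extracted source tree before configuring (a minimal sketch; the Ubuntu source package also ships its own configs under debian/, so this is just one option):

cp /boot/config-$(uname -r) .config

The make oldconfig step in the next section will then pick this file up and only prompt for options that have changed.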

3. Build the kernel

I do not use the official Ubuntu kernel build procedure [3], which is tedious and, moreover, does not support incremental builds. Note that X should be the kernel version you have downloaded (via apt-get), and Y could be the number of cores available in your system for a parallel build.

cd linux-hwe-X
sudo make oldconfig
sudo make -jY bindeb-pkg

4. Install the new kernel

cd ..
sudo dpkg -i linux*X*.deb
sudo reboot

5. Sign kernel modules (optional)

If you are working with Ubuntu 16.04, it is likely that you do not need to deal with kernel module signing. But in case you need to, take a look at [6]. On Ubuntu 18.04, Lockdown is enabled by default to prevent the kernel from loading unsigned kernel modules. A quick and dirty fix is to disable Lockdown to load modules [7]:

sudo bash -c 'echo 1 > /proc/sys/kernel/sysrq'
sudo bash -c 'echo x > /proc/sysrq-trigger'

disable_lockdown
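
If you would rather sign an out-of-tree module than disable Lockdown, the kernel’s sign-file helper can be used after the build. A minimal sketch, assuming the build above was configured with module signing and generated a key pair under certs/ (the module path is a placeholder; see [6] for details):

cd linux-hwe-X
./scripts/sign-file sha256 certs/signing_key.pem certs/signing_key.x509 /path/to/your_module.ko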

6. Resize VM disk (optional)

To build a kernel, a VM might need at least 32GB of disk space (tested on Ubuntu 16.04). qemu-img is a convenient command to resize your VM image.

sudo qemu-img resize /PATH_TO_YOUR_VM_IMG_FILE +20G
sudo qemu-img resize --shrink /PATH_TO_YOUR_VM_IMG_FILE -20G
sudo qemu-img info /PATH_TO_YOUR_VM_IMG_FILE

Note that the actual image size change cannot be seen from the host machine using, e.g., fdisk. Instead, use qemu-img info to confirm the difference between “virtual size” and “disk size”. The former is changed by the resizing, and the latter has to be changed as well. We need to boot into the VM and grow the partition.

Once we are inside the VM, use lsblk to confirm the size that we have just grown. In the figure down below, the whole /dev/vda is 40G but all the partitions only take 20G. We can grow the root partition (/) using parted. Unfortunately, it did not work, as you can see in the figure.

grow_root

Why? Because we cannot grow a partition that is not the last one. Instead, we need to delete vda5 and vda2 if we wanna grow vda1. Again, parted is your friend here [5] — a rough session for the deletion step is sketched below.
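
A minimal parted sketch (the partition numbers follow the figure above and are assumptions for your own layout; removing vda5 also removes the swap it holds, so double-check with print first):

sudo parted /dev/vda
(parted) print
(parted) rm 5
(parted) rm 2
(parted) quit

After that, we can append the extra 20G to the root partition (vda1) and resize the filesystem accordingly: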

sudo apt-get install cloud-guest-utils
sudo growpart /dev/vda 1
sudo resize2fs /dev/vda1   

References:

[1] https://davejingtian.org/2013/08/20/official-ubuntu-linux-kernel-build-with-ima-enabled/
[2] https://davejingtian.org/2018/03/15/make-deb-pkg-broken/
[3] https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel
[4] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1790880
[5] https://unix.stackexchange.com/questions/196512/how-to-extend-filesystem-partition-on-ubuntu-vm
[6] https://www.kernel.org/doc/html/v4.15/admin-guide/module-signing.html
[7] https://bugzilla.redhat.com/show_bug.cgi?id=1599197

Posted in IDE_Make, Linux Distro, OS

USB Fuzzing: A USB Perspective

Syzkaller [1] recently started to support USB fuzzing and has already found over 80 bugs within the Linux kernel [2]. Almost every fuzzing expert I have talked to has started to apply their fuzzing techniques to USB because of the high security impact and the potential volume of vulnerabilities due to the complexity of USB itself. While this post is NOT about fuzzing or USB security in general, I hope to provide some insights into USB fuzzing as someone who has been doing research on USB security for a while. Happy fuzzing!

1. Understand USB Stacks

USB is split into two worlds due to the master-slave nature of the protocol: USB host and USB device/gadget. When we talk about USB, it usually refers to the USB host, e.g., a laptop with a standard USB port. The figure below shows the Linux USB host stack. From bottom to top, we have the hardware, kernel space, and user space.

usbfuzz-host-arch
From the Syzkaller USB fuzzing slides by Andrey Konovalov [3].

The USB host controller device (aka HCD) is a PCI device attached to the system PCI bus and provides USB connection support via USB ports. Depending on the generation of the USB technology, it is also called a UHCI/OHCI controller for USB 1.x, EHCI for USB 2.x, and XHCI for USB 3.x. For the kernel to use this controller, we need a USB host controller driver, which sets up the PCI configuration and DMAs. Above it is the USB core, implementing the underlying USB protocol stack (e.g., Chapter 9) and abstracting ways to send/recv USB packets with generic kernel APIs (submit/recv URB). Above it are different USB device drivers, such as USB HID drivers and USB mass storage drivers. These drivers implement different USB class protocols (e.g., HID, Mass Storage), provide glue layers to other subsystems within the kernel (e.g., input and block), and facilitate user space (e.g., by creating /dev nodes).
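
On a Linux host, this layering is easy to inspect from user space — a quick sanity check rather than anything fuzzing-specific:

lsusb -t
# tree view: each root hub (host controller), its speed, and the class driver
# (usbhid, usb-storage, ...) bound to every attached device
sudo cat /sys/kernel/debug/usb/devices
# descriptors and driver bindings as maintained by the USB core (needs debugfs)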

Since Linux is also widely used in embedded systems, e.g., some USB dongles, USB device/gadget refers to both the USB dongle hardware and the corresponding USB mode within Linux. “Surprisingly”, it is totally different from the USB host mode. The figure down below demonstrates the USB gadget stack within the Linux kernel.

usbfuzz-gadget-arch
From the Syzkaller USB fuzzing slides by Andrey Konovalov [3].

At the bottom, we have the USB device controller (aka UDC). Like HCDs, UDCs also implement a specific version of the USB standards within the PHY layer. However, unlike the most common HCDs made by Intel, UDC IPs come from different hardware vendors [8], such as DWC2/3, OMAP, TUSB, and FUSB. These controllers usually have their own design specifications, and might follow the HCD specification (e.g., the XHCI specification) as well when they support USB On-The-Go (aka OTG) mode. OTG allows a UDC to switch between the USB host and USB device/gadget modes. For example, when an Android device connects with a laptop as MTP, the Android USB device controller is in the USB device/gadget mode. If a USB flash drive is plugged into an Android device, the UDC works in the USB host mode. A UDC supporting OTG is also replaced by a Dual-Role Device (DRD) controller in the USB 3.x standards [11]. As a result, an OTG cable is not needed to switch the role of the UDC, since the role switching is done in software for a DRD controller.

To use a UDC, you need a UDC driver within the kernel, providing connection and configuration over industry-standard buses, including both the AMBA™ AHB and AXI interfaces, and setting up DMAs for the higher layers. Like the USB core within the USB host stack, the USB gadget core within the USB gadget stack provides APIs to register and implement a USB gadget function via callbacks and configfs. For instance, we can pass USB descriptors to the USB gadget core and build a typical USB mass storage device by requesting the existing mass storage function (f_mass_storage). For more complicated protocols such as MTP, a user-space daemon or library provides the protocol logic and communicates with the gadget function via, e.g., configfs or usbfs.
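
As a concrete illustration of the gadget-side plumbing, here is a rough configfs sketch that exposes a mass storage gadget on a board with a UDC (a minimal sketch; the backing file and the UDC name are assumptions specific to your setup):

modprobe libcomposite
cd /sys/kernel/config/usb_gadget
mkdir g1 && cd g1
echo 0x1d6b > idVendor     # Linux Foundation VID, used here as an example
echo 0x0104 > idProduct    # multifunction composite gadget PID, as an example
mkdir -p strings/0x409 configs/c.1 functions/mass_storage.0
echo "example-gadget" > strings/0x409/product
echo /root/backing.img > functions/mass_storage.0/lun.0/file
ln -s functions/mass_storage.0 configs/c.1/
ls /sys/class/udc          # find the UDC name on your board
echo <your-udc-name> > UDC # binding to the UDC makes the gadget enumerate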

2. Where We Are

USB fuzzing started to attract more attention thanks to the FaceDancer [4], a programmable USB hardware fuzzer. It supports both USB host and device/gadget mode emulation and allows sending out pre-formed or malformed USB requests and responses. Umap/Umap2 [5] provides a fuzzing framework written in Python with different USB device and response templates for the FaceDancer. The TTWE framework [9] enables MitM between a USB host and a USB device by using 2 FaceDancers emulating the USB host and device/gadget, respectively. This MitM allows USB packet mutations in both directions, thus enabling fuzzing on both sides.

All these solutions focus on the USB host stack due to the facts that people assume a malicious USB device rather than a malicious USB host, e.g., a laptop, and that most USB device firmware is closed source and thus hard to analyze. Accordingly, most of the bugs/vulnerabilities are found within the USB core (for parsing USB responses) and some common USB drivers (e.g., keyboard). The pros of these solutions are their ability to faithfully emulate a USB device. However, the problems, in my opinion, are:

a. Hardware dependency.
b. Limited feedback from the target.

FaceDancer is slow, which makes any solution built upon it hard to scale. The fact that we need both a FaceDancer and a target machine as the minimum setup to start fuzzing also imposes more challenges for scalability. Feedback is the other big issue here. Mutations of the fuzzing input are based on templates and randomization without real-time feedback from the target (e.g., code coverage) except system logging. Thus, fuzzing efficiency is questionable. As a result, these solutions are “best-effort” attempts to find some bugs with a minimum setup effort.

To get rid of the hardware dependency, virtualization (e.g., QEMU) comes to the rescue. vUSBf [6] uses QEMU/KVM to run a kernel image and leverages the USB redirection protocol within QEMU to redirect the access to USB devices to a USB emulator controlled by the fuzzer, as shown below:

vusbf-arch
From the vUSBf paper [6].

While vUSBf provides a nice orchestration architecture to run multiple QEMU instances in parallel to solve scalability issues, the fuzzer itself is essentially based on templates (or test cases, according to the paper). The feedback still relies on system logging. POTUS [7] moves a step forward to leverage symbolic execution, e.g., S2E, to inject faults from the USB HCD layer, as shown below:

potus-arch
From the POTUS paper [7].

SystemTap is used to instrument the kernel to inject faults and to add annotations that record the number of faults. A path prioritization algorithm based on the number of faults within different states is used to control the number of “forks”. The number of faults of a given path represents the code coverage; thus, a significant number of faults represents high code coverage. POTUS also implements a generic USB virtual device within QEMU to emulate different USB devices using configurable device descriptors and data transfers. The Driver Exerciser within the VM uses syscalls to play with the different device nodes exposed to the VM. Compared to vUSBf, POTUS includes a fuzzing feedback mechanism (by counting the number of faults within a path) and supports more USB device emulations. However, the manual effort to emulate operations on certain USB devices within the Driver Exerciser, the fundamental limitation of symbolic execution – path explosion – and the unknown effectiveness and limitations of relying on the number of faults of a path for path scheduling make POTUS hard to evaluate in the real world.

Syzkaller [1] USB fuzzing support [12] was added recently by Andrey Konovalov at Google, and it has demonstrated its ability to find more bugs. Andrey solved two main problems of using Syzkaller to fuzz USB:

a. Code coverage for kernel tasks.
b. Device emulation within the same kernel image.

Since USB events and operations happen within IRQ or kernel context rather than a process context (e.g., USB plugging detection within the khub kernel task in older kernels), syscall-based tracing and code coverage [10] simply won’t work. We need the ability to report code coverage anywhere within the kernel. To do that, we need to annotate the USB-related kernel source (e.g., hub.c) with the extended KCOV kernel APIs to report the code coverage [13]. Instead of relying on QEMU, Syzkaller uses gadgetfs to expose a fuzzer kernel driver to user space [14], which can then manipulate the input for fuzzing. By enabling both the USB host stack and the USB gadget stack in the kernel configuration and connecting them using the dummy HCD and UDC drivers, as shown below, Syzkaller is able to fuzz USB host device drivers, such as USB HID, Mass Storage, etc., from user space by fuzzing the USB fuzzer kernel driver.

syzkaller-arch.png
From the Syzkaller USB fuzzing slides by Andrey Konovalov [3].
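
A rough kernel config fragment for this setup (a sketch based on the patched tree in [13][14]; the option for the fuzzer gadget driver itself is not upstream and its name may differ):

# KCOV-based coverage collection (extended by [13] for non-syscall contexts)
CONFIG_KCOV=y
CONFIG_KCOV_INSTRUMENT_ALL=y
CONFIG_KCOV_ENABLE_COMPARISONS=y
# USB host and gadget stacks in the same image, bridged in software
CONFIG_USB=y
CONFIG_USB_DUMMY_HCD=y
CONFIG_USB_GADGET=y
CONFIG_USB_CONFIGFS=y
# plus the USB fuzzer gadget driver added by the patches in [14]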

The Syzkaller USB fuzzer might be the first real coverage-based USB host device driver fuzzer, thanks to the existing Syzkaller infrastructure and some nice hacking to bridge the USB host and gadget stacks at the same time. While it has found tons of bugs and vulnerabilities, the limitation of the fuzzer starts to reveal itself – most of the issues found were in the initialization phase of a driver (e.g., probing). In user space, the fuzzer is able to configure the fuzzer kernel driver to represent any USB device/gadget by exploring different VID/PID combinations within the USB device descriptor. On the one hand, Syzkaller is able to trigger almost every USB host device driver to be loaded, thus having prominent code coverage horizontally. On the other hand, since no real emulation code for a certain device is provided within the user-space fuzzer or the fuzzer kernel driver, most fuzzing stops after the initialization of a driver, thus covering only a small portion of the driver vertically.

3. What To Be Done

Cautious readers might have noticed it already – all these fuzzing solutions focus on the USB host stack, especially on the USB host device drivers. Again, this is due to the facts that people often use USB to refer to the USB host stack, and that these device drivers are famous for containing more vulnerabilities than other components within the kernel (e.g., device drivers on Windows). However, at this point, I believe you have realized that what has been covered by USB fuzzing so far is the tip of the iceberg, either horizontally or vertically. Let’s enumerate what needs to be done next.

a. HCD drivers fuzzing

If we limit ourselves to the USB host stack, it is interesting to find that HCD drivers are ignored. Unlike device drivers, they are not accessible from user space via syscalls (except for tuning some parameters using sysfs). Instead, they receive inputs from the USB core (e.g., usb_submit_urb) in the upper layer (internal) and from the DMAs of the HCD layer (external). From a security perspective, external inputs should impose more threats than the internal ones.

To directly fuzz the internal inputs of HCD drivers, we need the ability to mutate the parameters of the kernel APIs exposed to the USB core and to get the code coverage from the HCD driver. To directly fuzz the external inputs of HCD drivers, we need to mutate DMA buffers and event queues, again together with the code coverage from the HCD driver. Note that the code coverage is often different in these two cases because of the different code paths for TX and RX; thus we need fine-grained code coverage reporting to reflect this. Mutating DMA buffers and event queues is essentially building an HCD emulator with fuzzing capabilities. For common HCD drivers such as Intel XHCI, QEMU provides the corresponding HCD emulation already (e.g., qemu/hw/usb/hcd-xhci.c), and one can try to add fuzzing functionality there. For HCD drivers for which QEMU does not provide the emulation, one needs to build the HCD emulation from scratch.
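
For reference, this is roughly how QEMU’s existing XHCI model (hw/usb/hcd-xhci.c) can be attached to a test kernel today, with an emulated device behind it; any fuzzing hooks would live inside that device model (the kernel and image names below are placeholders):

qemu-system-x86_64 -enable-kvm -m 2G -nographic \
    -kernel bzImage -append "root=/dev/sda console=ttyS0" \
    -drive file=rootfs.img,format=raw \
    -device qemu-xhci,id=xhci \
    -device usb-kbd,bus=xhci.0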

b. USB device/gadget stack fuzzing

Yes, we do not have systematic fuzzing on the USB device/gadget stack. It used to be OK since we often assume a malicious USB device rather than a malicious host. However, the broad adoption of USB OTG and DRD controllers in embedded systems (e.g., Android devices) extends the threat model to include USB host as well. For example, no one wants their phones to be hacked during USB charging. Architecturally, Syzkaller USB fuzzer has imagined a way to fuzz the USB device/gadget stack, as shown below.

syzkaller-arch-todo
From the Syzkaller USB fuzzing slides by Andrey Konovalov [3].

Instead of having the user-space fuzzer communicating with the USB fuzzer kernel driver, another user-space fuzzer will manipulate USB host device drivers. The fuzzer activities will be propagated via the USB host stack to the USB device/gadget stack, hopefully. Accordingly, we need to configure the kernel to enable all different gadget functions within the same kernel image, as well as code coverage reporting. We are then able to fuzz the USB gadget core and USB gadget function drivers, except UDC drivers.

Note that Syzkaller imagines one way to fuzz the USB device/gadget stack, a natural result of the architecture and limitations of Syzkaller. Syzkaller is a syscall fuzzer, meaning that the input mutations happen at the syscall parameters. But this does not mean we have to fuzz at the syscall layer (e.g., user space). If we look at the figure above again, we can find a long path from the fuzzer to the fuzzing target (e.g., USB host device drivers or USB device/gadget drivers). How could we know that all the fuzzing inputs are successfully propagated to the target instead of being filtered by the middle layers in between? One core question is whether syscall-based fuzzing is suitable for USB fuzzing within the kernel. Again, the Syzkaller USB fuzzer accommodates itself to the constraints of Syzkaller itself instead of thinking about building a USB fuzzer (e.g., for the USB host or gadget stacks) from scratch.

Shortening the fuzzing path means pushing the fuzzer (inputs) closer to the target. For example, we could get rid of the whole USB host stack by building a USB UDC emulator/fuzzer within QEMU, directly enabling UDC driver fuzzing. However, this does not mean any DMA write can be translated into a valid USB request to the USB gadget drivers in the upper layer. As a result, the fuzzing path is indeed shorter, but we still hope that the mutation algorithm and the code coverage granularity will save us. In the end, we might need different fuzzers for different layers within the stack, making sure all fuzzing inputs are applied to the target without being filtered. E.g., we may need to build a USB host emulator/fuzzer sending USB requests to different USB gadget drivers directly.

c. Android USB fuzzing

Android might be the heaviest USB device/gadget stack user, as it maintains its own branch of the kernel and implements extra USB gadget function drivers (e.g., MTP). The OTG/DRD support within Android devices also doubles the attack surface compared to a typical USB host machine. The fundamental challenge is to run an Android kernel image with the corresponding UDC/DRD drivers used by real-world Android devices using QEMU. Running non-AOSP kernels in QEMU imposes extra difficulties due to SoC customizations and variations. That’s why a lot of Android fuzzing still requires a physical device.

d. Protocol-guided/Stateful fuzzing

In section b, we talked about why we might wanna shorten the fuzzing path: because we wanna avoid fuzzing inputs being filtered before hitting the target. Turns out it is much more complicated here. If we look again at the figure above on USB device/gadget fuzzing imagined in Syzkaller, the fuzzer inputs start with syscalls and pass through different USB host device drivers before finally being delivered to the USB gadget stack. Yes, the fuzzing path is long, and the fuzzing input could be filtered along the way. Meanwhile, these extra layers in between guarantee that whatever fuzzing input is sent out is a legitimate USB request carrying the corresponding protocol payload triggered by the right driver state. For example, the final USB request generated by the fuzzer via the USB mass storage driver might contain a legitimate SCSI command (e.g., read) triggered by the core logic of the USB host device driver rather than the initialization part.

This is what I call “protocol-guided/stateful” fuzzing. As you can tell, it is essential for going “deeper” vertically within a layer, e.g., exploring other parts of a kernel driver beyond the initialization/probing phase. Simply put, to fuzz either USB host or device/gadget drivers, we need to establish a virtual connection with the target (e.g., making sure the kernel driver is initialized and ready to process inputs) – stateful – and teach the fuzzer to learn the structure of the input (e.g., the SCSI protocol in USB Mass Storage) – protocol-guided. In the end, it is a trade-off between including other layers to reuse the existing protocol and state handling (thus increasing the fuzzing path and complexity) and implementing lightweight protocol-aware/stateful fuzzing within the fuzzer directly (thus reducing the fuzzing path). Both have their pros and cons.

e. Type-C/USBIP/WUSB fuzzing

There are more things in USB other than the USB host and USB device/gadget, including USB Type-C, USBIP, WUSB, etc. While we could reuse some of the lessons learned in USB fuzzing, these technologies introduce different software stacks and may require different attention to solve their quirks.

4. Summary

This post looks into USB fuzzing, a recent hot topic in both software security and operating system security. Instead of treating USB as just another piece of software, we start with understanding what the USB stacks are and why USB refers to a bigger picture than what people often imagine. We survey some previous work on USB fuzzing, from using specialized hardware to running QEMU. We conclude with what is missing and foresee the future.

P.S. This blog post was long overdue. I promised to have it out a month ago but never made it. I also underestimated how much time was needed to finish it. A lesson learned (again, for myself) is to start early and focus. Anyway, better late than never:)

References:

[1] https://github.com/google/syzkaller
[2] https://github.com/google/syzkaller/blob/e90d7ed8d240b182d65b1796563d887a4f9dc2f6/docs/linux/found_bugs_usb.md
[3] https://docs.google.com/presentation/d/1z-giB9kom17Lk21YEjmceiNUVYeI6yIaG5_gZ3vKC-M/edit?usp=sharing
[4] http://goodfet.sourceforge.net/hardware/facedancer21/
[5] https://github.com/nccgroup/umap2
[6] https://github.com/schumilo/vUSBf
[7] https://www.usenix.org/conference/woot17/workshop-program/presentation/patrick-evans
[8] https://elinux.org/Tims_USB_Notes
[9] https://www.usenix.org/conference/woot14/workshop-program/presentation/van-tonder
[10] https://davejingtian.org/2017/06/01/understanding-kcov-play-with-fsanitize-coveragetrace-pc-from-the-user-space/
[11] https://blogs.synopsys.com/tousbornottousb/2018/05/03/usb-dual-role-replaces-usb-on-the-go/
[12] https://github.com/google/syzkaller/commit/e90d7ed8d240b182d65b1796563d887a4f9dc2f6
[13] https://github.com/xairy/linux/commit/ff543afbf78902acea566fa4c635240ede651f77
[14] https://github.com/xairy/linux/commit/700fb65580efc049133628e7b9f65453bb686231

Posted in Security

Speculations on Intel SGX Card

One of the exciting things Intel brought to RSA 2019 is the Intel SGX Card [2]. Yet there is not much information about this upcoming hardware. This post collects some related documentation from Intel and speculates about what could happen within the Intel SGX Card, with a focus on software architecture, cloud deployment, and security analysis. NOTE: all the figures come from public Intel blog posts and documentation, and there is no warranty for my speculations on the Intel SGX Card! Read with caution!

1. Intel SGX Card

According to [2], “Though Intel SGX technology will be available on future multi-socket Intel® Xeon® Scalable processors, there is pressing demand for its security benefits in this space today. Intel is accelerating deployment of Intel SGX technology for the vast majority of cloud servers deployed today with the Intel SGX Card. Additional benefits offer access to larger, non-enclave memory spaces, and some additional side-channel protections when compartmentalizing sensitive data to a separate processor and associated cache.”

Simply put, the Intel SGX Card is introduced to address 3 problems of SGX usage within the cloud:

  1. Older servers/CPUs that do not support SGX
  2. Small EPC memory pool
  3. Side-channel attacks

Accordingly, the Intel SGX Card is designed as a PCIe card, which can be plugged into old servers. This solves the first problem. But what about the second and the third problems? How could the Intel SGX Card have a larger EPC memory pool and defend against side-channel attacks? To answer these questions, we need to look into the internals of the Intel SGX Card.

2. Intel VCA

According to [1], the Intel SGX Card is actually built upon Intel VCA, the Intel® Visual Compute Accelerator (Intel® VCA) card [3]. Moreover, “Intel VCA is a purpose-built accelerator designed to boost performance of visual computing workloads like media transcoding, object recognition and tracking, and cloud gaming, originally developed as a way to improve video creation and delivery. In the Intel® SGX Card, the graphics accelerator has been disabled and the system re-optimized specifically for security purposes. In order to take advantage of Intel SGX technology, three Intel Xeon E processors are hosted in the card, which can fit inside existing, multi-socket server platforms being used in data centers today.”

Alright, so the Intel SGX Card is essentially Intel VCA with the graphics accelerator disabled. Now it is time to learn what Intel VCA is. After some digging online, I found 2 precious documents describing the hardware specification [4] and the software guide [5], respectively. Readers are encouraged to give these documents a careful read. Below is the TL;DR version.

vca-hw-dimm

The Intel VCA (or VCA 2) is a PCIe card with 3 Xeon CPUs. As shown in the figure above, each CPU has its own DRAM instead of sharing RAM. The internal architecture below better shows the nature of this card: 3 computers within a PCIe card.

vca2-hw-internal

These 3 CPUs not only have their own DRAM but also their own PCH chipsets and flash. They are connected and multiplexed by a PCIe bridge connecting with the host machine. Note that VCA 2 also supports optional M.2 NVM storage, as shown in the figure above. Let’s take a look at the software stack.

vca-sw-arch

Did I say “3 computers within a PCIe card”? I actually mean it. Each CPU within the VCA card runs its own software stack, including UEFI/BIOS, operating system, drivers, SDKs, and applications. These operating systems could be Linux or Windows. Hypervisors are also supported, including KVM and Xen. Even “better”, each CPU is also equipped with Intel SPS and ME. If you count the ME as a microcomputer as well, we now have 3 microcomputers running inside 3 computers within 1 PCIe card.

vca-sw-net

Each computer within VCA is also called a node. Therefore, there are 3 nodes within 1 VCA card. Unlike typical PCIe cards, VCA exposes itself as virtual network interfaces to the host machine. For example, 2 VCA cards (6 nodes) add 6 different Virt eth interfaces to the host machine, as shown in the figure above. These Virt eth interfaces are implemented as MMIO over PCIe. Given that each node is indeed an independent computer system with a full software stack, this virtual network interface concept might be a reasonable abstraction. I was worried about the overhead of going through the TCP/IP stack. Then I realized that Intel could provide dedicated drivers on both the host and the node side to bypass the TCP/IP stack, which is very possible, as suggested by those VCA drivers. It would be interesting to see what “packets” are sent and received from these virtual NICs. To support high bandwidth and throughput, the MMIO region is 4GB minimum. This means each node takes a 4GB memory space from the main system memory, as well as from its internal memory.

3. Speculations on Intel SGX Card

Once we have some basic understanding of Intel VCA, we can now speculate about what the Intel SGX Card could be. Depending on what Intel meant by “disabling graphics accelerators“, it could be removing those VCA drivers and the SDK within each node. Once we do that, we would have a prototype Intel SGX Card, where 3 SGX nodes running a typical operating system connect with the host machine via PCIe. Now, what could we do?

To reuse most of the software stack already developed for VCA, I would probably keep the virtual network interface instead of creating a different device within the host machine. As such, the host still talks with the SGX card via virt eth. Within each node of the SGX card, we could install the typical Intel SGX PSW and SDK without any trouble, since each node is an SGX machine. Then each node has all the necessary runtime to support SGX applications. On the host side, we could still install the Intel SGX SDK to support compilation “locally”, although we might not be able to install the PSW assuming an old Xeon processor. But this is not a problem, because we will relay the compiled SGX application to the SGX card. To achieve this, a new SGX kernel driver is needed on the host machine to send the SGX application to one node within the SGX card via the virt eth interface.

So far we have speculated about how to use the Intel SGX card within a host (or server). It is time to review the design goals of the Intel SGX card again:

  1. Enable older servers to support SGX
  2. Enlarge EPC memory pool
  3. Protect from side-channel attacks

The first goal can be achieved easily with the PCIe design and the fact that each node within the Intel SGX card is a self-contained SGX-enabled computer. However, the scalability of this solution is still limited by the number of PCIe (x16) slots available within a server and the number of CPU nodes within an Intel SGX Card. The number of PCIe slots is also limited by the power supply within the system. Unless we are talking about some crazy GPU-in-favor motherboard [6], 4 PCIe x16 slots seem to be a reasonable estimation. Multiplied by 3 (the number of nodes within an Intel SGX card), we would have 12 SGX-enabled CPU nodes available within a server.

The second goal is a byproduct of the independent DRAM of each node within the Intel SGX card. Recall that each node has a maximum of 32GB memory available. If the Intel SGX card is based upon Intel VCA 2, each node has a maximum of 64GB memory available. Because this 32GB (or 64GB) is dedicated to the node for SGX computation, instead of being carved out of the main system memory within the server, we can anticipate a large EPC for each node. For instance, a typical EPC size within an SGX-enabled machine is 128MB. Because of the Merkle tree used to maintain the integrity of each page and other housekeeping metadata, only around 90MB is available for real enclave allocations. This means the EPC overhead is roughly 1/4 in general. If we assume 32GB for each node within an Intel SGX card, we could easily have 16GB for EPC, of which 4GB is used for EPC management and 12GB for enclave allocations. Why only 16GB, you might ask? Well, remember that each node is a running system. We need some memory for both the OS and applications, including the non-enclave part of SGX applications. Moreover, due to the MMIO requirement, a 4GB memory space is reserved on both the main system memory and the node’s memory for each node. As a result, we have roughly 12GB left for the OS and applications on each node. Of course, we could push further, but you get the point. We will see the actual EPC size once the Intel SGX card is available.
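To make the budget concrete, here is the same back-of-the-envelope arithmetic spelled out in a tiny C snippet; every number in it is an assumption from this section, not anything Intel has published.

/* Speculative per-node memory budget; all values are assumptions. */
#include <stdio.h>

int main(void)
{
	const int node_dram_gb = 32;          /* assumed per-node DRAM */
	const int epc_gb       = 16;          /* assumed EPC carve-out */
	const int epc_meta_gb  = epc_gb / 4;  /* ~1/4 for integrity tree and metadata */
	const int mmio_gb      = 4;           /* MMIO window reserved on node memory */

	printf("EPC usable for enclaves: %d GB\n", epc_gb - epc_meta_gb);            /* 12 */
	printf("Left for OS and apps   : %d GB\n", node_dram_gb - epc_gb - mmio_gb); /* 12 */
	return 0;
}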

The third goal is described as an “additional benefit” of using the Intel SGX card. Because each of the 3 nodes within an Intel SGX card has its own independent RAM and cache (which are also separate from the main system, if the host supports SGX as well), we definitely could have better security guarantees for SGX applications. First, SGX applications can run within a node, isolating themselves from other processes running on the main system. Second, different SGX applications can run on different nodes, reducing the impact of enclave-based malware or side-channel attacks. Everything sounds good! What could possibly go wrong?

4. Speculations on security

First of all, SGX applications running within the Intel SGX card are still vulnerable to the same attacks as before, because each node within the card is still a computer system with a full software stack. Unless this whole software stack is within the TCB, an SGX application is still vulnerable to attacks from all other processes, and even from the OS or hypervisor running within the same node. From an SGX application’s point of view, nothing really changes.

The other question is how a cloud service provider (CSP) would distribute SGX workloads. A straightforward solution would be based on load balancing, where a CSP distributes different SGX applications to different nodes purely for performance, regardless of the security levels of different end users. Again, this is no different from an SGX-enabled host machine running different SGX applications from different users. Another solution would be mapping one node to one user, meaning that SGX applications from the same user always run within the same node. While this solution reduces attacks from other end users, we can easily run into scalability issues given the limited number of nodes available within a system and a possibly large number of end users. The other problem with this solution would be load imbalance: user A might have only 1 SGX application running on node N-A while user B has 100 SGX applications running on node N-B. I would not be surprised if user B yells at the cloud provider.

That being said, I do not think Intel would take either approach. Instead, a VM-based approach might be used, where SGX applications from the same user run within the same VM and different users get different VMs. We can then achieve load balancing easily by assigning a similar number of VMs to each node. This approach is technically doable, since we have seen SGX support for KVM [7] and the nodes within the Intel SGX card support KVM too. It is also possible that Clear Linux [8] will be used to reduce the overhead of VMs by using KVM-based containers. The only question is whether VMs or containers are enough to isolate potential attacks from other cloud tenants, e.g., cache-based attacks, and to defend against attacks from the OS and hypervisor, e.g., controlled-channel attacks.

5. Conclusion

This post tries to speculate what the Intel SGX card would look like and how it would be used within a cloud environment. I have no doubt that some of the speculations could turn out to be totally wrong once we are able to see the real product. Nevertheless, I hope this post sheds some light on this new security product, what could/should be done, and what is still missing. All opinions are my own.

References:

[1] https://itpeernetwork.intel.com/sgx-data-protection-cloud-platforms/
[2] https://newsroom.intel.com/news/rsa-2019-intel-partner-ecosystem-offer-new-silicon-enabled-security-solutions/
[3] https://www.intel.com/content/www/us/en/products/servers/accelerators.html
[4] https://www.intel.com/content/dam/support/us/en/documents/server-products/server-accessories/VCA_Spec_HW_Users_Guide.pdf
[5] https://www.intel.com/content/dam/support/us/en/documents/server-products/server-accessories/VCA_SoftwareUserGuide.pdf
[6] https://www.pcgamer.com/asus-has-a-motherboard-that-supports-up-to-19-gpus/
[7] https://github.com/intel/kvm-sgx
[8] https://clearlinux.org/

Posted in Security | Tagged , , , , , | Leave a comment

Syscall hijacking in 2019

Whether you need to implement a kernel rootkit or inspect syscalls for intrusion detection, in a lot of cases you might need to hijack syscalls from a kernel module. This post summarizes the detailed procedure and provides a working example for both the x86_64 and aarch64 architectures on recent kernel versions. All the code can be found at [1]. Happy hacking~

1. Syscall hijacking

There are different ways to hijack a syscall, as summarized by [2]. The essence is to modify the sys_call_table within the kernel, overwriting the original address of a certain syscall with the address of your own implementation. Here we use kallsyms_lookup_name to find the location of sys_call_table. However, 2 more things (or maybe 3, depending on the architecture; we will talk about that later) need to be considered. First, is the page holding sys_call_table writable? Recent kernels enforce read-only (RO) text pages, so we need to make the page writable (RW) again in our kernel module. Second, an SMP environment requires us to synchronize the sys_call_table modification with all cores; this can be achieved by disabling preemption.

2. Hijacking read syscall

Once we hijack a certain syscall, we are able to see all the parameters passed from user space. For example, we can see the file descriptor (FD), the user buffer, and the number of bytes (count) within the read syscall. The real meat of syscall hijacking comes from what we can do with these parameters. As a proof-of-concept (PoC), we trace back the file name from the FD and prevent users from reading a specific file by returning something else. In our implementation, we stop users from reading the README.md file (yup) and return a bunch of 7s. The good news is that we limit the target to our testing process rather than every process. Since a syscall happens within process context, “current” is always available. Accordingly, intrusion detection, system profiling, etc. are made possible thanks to the different syscall parameters.
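Putting the pieces together, a minimal x86_64 sketch might look like the following. It assumes an older (~4.x) kernel where kallsyms_lookup_name() is still exported to modules, write_cr0() still lets you clear CR0.WP, and the syscall table entries take their arguments directly rather than through struct pt_regs; see the repo [1] for the complete, tested code.

/* Minimal sys_read() hook via sys_call_table on x86_64 (older kernels).
 * For brevity this sketch only clears CR0.WP; making the PTE writable
 * (see the architecture notes below) is the more portable route. */
#include <linux/module.h>
#include <linux/kallsyms.h>
#include <linux/unistd.h>
#include <asm/processor-flags.h>
#include <asm/special_insns.h>

static unsigned long *sys_call_table;
static asmlinkage long (*orig_read)(unsigned int fd, char __user *buf, size_t count);

static asmlinkage long hooked_read(unsigned int fd, char __user *buf, size_t count)
{
	/* Inspect fd/buf/count here (e.g., resolve the file behind fd),
	 * then either fake the result or fall through to the original. */
	return orig_read(fd, buf, count);
}

static int __init hook_init(void)
{
	sys_call_table = (unsigned long *)kallsyms_lookup_name("sys_call_table");
	if (!sys_call_table)
		return -ENOENT;

	orig_read = (void *)sys_call_table[__NR_read];

	preempt_disable();                    /* don't get rescheduled while WP is cleared */
	write_cr0(read_cr0() & ~X86_CR0_WP);  /* allow writes to the RO syscall table page */
	sys_call_table[__NR_read] = (unsigned long)hooked_read;
	write_cr0(read_cr0() | X86_CR0_WP);
	preempt_enable();
	return 0;
}

static void __exit hook_exit(void)
{
	preempt_disable();
	write_cr0(read_cr0() & ~X86_CR0_WP);
	sys_call_table[__NR_read] = (unsigned long)orig_read;   /* restore the original */
	write_cr0(read_cr0() | X86_CR0_WP);
	preempt_enable();
}

module_init(hook_init);
module_exit(hook_exit);
MODULE_LICENSE("GPL");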

3. Architecture difference

Architecture makes a difference. On x86_64, Intel has a write-protect (WP) bit within CR0 that enforces read-only memory even for kernel writes. As a result, besides adding the W permission to the sys_call_table page, we also need to clear the write protection within CR0. ARM, on the other hand, does not have this constraint. On the aarch64 board with kernel 4.4 that I used, the text page also allowed writes.

Nevertheless, in case the page is write-protected, we will have to implement set_memory_rw and set_memory_ro (for recovery) ourselves, because neither of these functions is exported to kernel modules [3]. Essentially, we need to call apply_to_page_range and implement flush_tlb_kernel_range within our kernel module. This also reminds me of a potential bug within the current x86_64 implementation, where a TLB flush (triggering IPIs to synchronize the other CPU cores) should happen after we update the PTE.
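For reference, a module-local set_memory_rw()/set_memory_ro() for arm64 could be sketched roughly as below, modeled on the kernel's own arch/arm64/mm/pageattr.c [3]. This assumes a ~4.4 kernel: the pte_fn_t callback signature and the set of exported symbols vary across versions, and init_mm is resolved through kallsyms here because it is not exported to modules either.

/* Rough, module-local re-implementation of set_memory_rw()/set_memory_ro()
 * for arm64, modeled on arch/arm64/mm/pageattr.c (kernel ~4.4). */
#include <linux/kallsyms.h>
#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>

struct page_change_data {
	pgprot_t set_mask;
	pgprot_t clear_mask;
};

static struct mm_struct *init_mm_ptr;

static int change_page_range(pte_t *ptep, pgtable_t token,
			     unsigned long addr, void *data)
{
	struct page_change_data *cdata = data;
	pte_t pte = *ptep;

	pte = clear_pte_bit(pte, cdata->clear_mask);
	pte = set_pte_bit(pte, cdata->set_mask);
	set_pte(ptep, pte);
	return 0;
}

static int my_change_memory(unsigned long addr, int numpages,
			    pgprot_t set_mask, pgprot_t clear_mask)
{
	unsigned long start = addr & PAGE_MASK;
	unsigned long size = PAGE_SIZE * numpages;
	struct page_change_data data = {
		.set_mask = set_mask,
		.clear_mask = clear_mask,
	};
	int ret;

	if (!init_mm_ptr)	/* init_mm is not exported, so look it up */
		init_mm_ptr = (struct mm_struct *)kallsyms_lookup_name("init_mm");

	ret = apply_to_page_range(init_mm_ptr, start, size,
				  change_page_range, &data);
	/* Shoot down stale TLB entries (inline on arm64, so usable in a module). */
	flush_tlb_kernel_range(start, start + size);
	return ret;
}

static int my_set_memory_rw(unsigned long addr, int numpages)
{
	return my_change_memory(addr, numpages,
				__pgprot(PTE_WRITE), __pgprot(PTE_RDONLY));
}

static int my_set_memory_ro(unsigned long addr, int numpages)
{
	return my_change_memory(addr, numpages,
				__pgprot(PTE_RDONLY), __pgprot(PTE_WRITE));
}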

References:

[1] https://github.com/daveti/syscallh
[2] https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootkit-without-really-trying/
[3] https://lxr.missinglinkelectronics.com/linux/arch/arm64/mm/pageattr.c

Posted in OS, Security | Tagged , , , , | 1 Comment

Kernel build on Nvidia Jetson TX1

This post introduces a native Linux kernel build on the Nvidia Jetson TX1 dev board. The scripts are based on the jetsonhacks/buildJetsonTX1Kernel tools. Our target is JetPack 3.3 (the latest SDK supporting the TX1 at the time of writing). All the scripts are available at [2]. Have fun~

1. Kernel build on TX1

Nvidia devtalk has some general information about kernel builds for the TX1 [3], covering both native builds and cross compilation (e.g., from a TFTP server). Here we focus on the native build. The procedure roughly follows a) installing dependencies, b) downloading the kernel source, c) generating the config, d) making the build, and e) installing the new kernel image.

Unlike a typical kernel build on the x86-64 architecture, the most confusing part is figuring out the right kernel version supported by the board. The TX1 uses Nvidia L4T [5], which is a customized kernel for the Tegra SoC. Depending on the JetPack version running on your TX1 board, a different L4T version is needed. As you can tell, a lot of preparation needs to be done before we can kick off the build.

2. buildJetsonTX1Kernel

JetsonHacks provides a bunch of scripts, called buildJetsonTX1Kernel [1], to ease and automate the different steps mentioned above. By detecting the Tegra chip id (via sysfs) and the Tegra release note (under /etc), these scripts can figure out the model of the board (e.g., TX1) and the version of JetPack installed (e.g., 3.2), and thus download the right version of the L4T kernel source. Please refer to [4] for detailed usage of these scripts.

3. One-click build

The buildJetsonTX1Kernel scripts are great and useful, but I realized that my TX1 setup was different and I needed some customizations to make my life (hopefully yours too) easier [2]. The first issue was the use of JetPack 3.3. I have submitted a patch to JetsonHacks for JetsonUtilities to correctly detect this latest JetPack version supported by the TX1. Unfortunately, the buildJetsonTX1Kernel scripts still only support up to JetPack 3.2. Things get more complicated because both JetPack 3.2 and 3.3 use the same L4T kernel version.

The original scripts assume the use of eMMC to hold all the kernel build artifacts, which does not hold in my TX1 environment, where a 64GB SD card is mounted. Accordingly, I have updated all the scripts to use my SD card instead of the default /usr/src/ directory.

I have also created a one-click build script (kbuild.sh) to automate the whole process. Simply running ./kbuild.sh generates a new kernel image ready for reboot. I have also replaced xconfig with menuconfig, since I use SSH to connect to the TX1. A simple hello world kernel module is also included as a starting point for module development (see the sketch below).
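For reference, such a starting-point module can be as small as the following (this is just an illustration; the module shipped in [2] may differ):

/* Minimal hello-world kernel module for the TX1 (or any Linux target). */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

static int __init hello_init(void)
{
	pr_info("hello from TX1\n");	/* shows up in dmesg on insmod */
	return 0;
}

static void __exit hello_exit(void)
{
	pr_info("bye from TX1\n");	/* shows up in dmesg on rmmod */
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");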

References:

[1] https://github.com/jetsonhacks/buildJetsonTX1Kernel
[2] https://github.com/daveti/buildJetsonTX1Kernel
[3] https://devtalk.nvidia.com/default/topic/762653/-howto-build-own-kernel-for-jetson-tk1/
[4] https://www.jetsonhacks.com/2018/04/21/build-kernel-and-modules-nvidia-jetson-tx1/
[5] https://developer.nvidia.com/embedded/linux-tegra

Posted in Embedded System, gpu, OS | Tagged , , , , , , , | Leave a comment