USB gadget functionalities in Android

I started working on Android this summer. While I mainly work on the USB layer within the Linux kernel, I sometimes need to look into the Android framework to see whether I can achieve my goal from Android user space. One question I ran into is how to find the USB configuration of the phone (e.g., MTP, adb, etc.) when it is connected to a host machine. Hence this post.

1. lsusb

The most straightforward way to get the USB information of the phone is to plug it into a Linux box and run ‘lsusb -t’ or ‘lsusb -v’. Things get annoying when I try to pull similar information from the phone directly.

2. adb

If you have adb with root permission, go to ‘/sys/class/android_usb/android0’. There you will find all the USB gadget functionalities built into the kernel, along with the configuration, e.g., which functionalities are currently enabled on the phone. This won’t work if adb runs as the ‘shell’ user.
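
As a rough illustration, reading one of these sysfs attributes from C could look like the sketch below. It assumes the legacy android_usb gadget driver exposes a `functions` attribute under this directory, which vendor kernels may or may not do; `read_sysfs_attr` and the `ANDROID_USB_FUNCTIONS` macro are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed path: only present with the legacy android_usb gadget driver. */
#define ANDROID_USB_FUNCTIONS "/sys/class/android_usb/android0/functions"

/*
 * Hypothetical helper: read the first line of a sysfs attribute into buf
 * and strip the trailing newline. Returns 0 on success, -1 on failure
 * (e.g., the file is missing or adb is not running as root).
 */
int read_sysfs_attr(const char *path, char *buf, size_t len)
{
	FILE *fp;

	assert(path != NULL && buf != NULL && len > 0);

	fp = fopen(path, "r");
	if (fp == NULL)
		return -1;

	if (fgets(buf, (int)len, fp) == NULL) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	/* sysfs attributes usually end with a newline; drop it */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling `read_sysfs_attr(ANDROID_USB_FUNCTIONS, buf, sizeof(buf))` from a root shell on the phone should then return the enabled functions as one string, provided the attribute exists on that kernel.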

3. USB Options

When the phone is connected to the host machine, the USB options menu pops up with more choices: charging, MTP, PTP, etc. This should be the easiest way in most cases. However, it fails if you are not playing with your own phone (e.g., it is PIN protected, and yes, I am talking about “remote” phones in a device farm). It is also possible that some USB gadget functionalities are NOT exposed in the menu at all, e.g., a hidden USB CDC/ACM functionality.

4. USB Settings

Some vendors provide secret dial codes that trigger a USB Settings menu offering combinations of different gadget functionalities for testing/debugging purposes. E.g., for Samsung Galaxy phones, use ‘*#0808#’. Let me know if you have similar dial codes from other vendors.

5. UsbManager

Now it is time to look into the Android framework. Can we write an app to show the USB configuration/information? [1] is the implementation in Android that controls the USB hardware. To differentiate USB host mode from USB gadget mode, the Android APIs are split into the USB Host API and the USB Accessory API[2], both of which are managed by UsbManager. For instance, getDeviceList() in UsbManager returns the list of UsbDevice objects connected to the phone. As expected, the UsbDevice class is the instantiation of a USB device as defined by the USB spec, so we can walk its configurations, interfaces, and endpoints, just like with libusb. Accordingly, getAccessoryList() returns the list of UsbAccessory objects for the case where the phone behaves as the device. Here is the problem: UsbAccessory is NOT abstracted as a UsbDevice! The only information you can get from a UsbAccessory is device-level information, such as manufacturer, product, serial number, etc. In short, there is no interface-level information in UsbAccessory!

Nevertheless, UsbManager may still be the answer if we look at the latest implementation[3]. ‘isFunctionEnabled()’ and ‘containsFunction()’ look promising for detecting hidden functionalities that may not be shown in the USB options menu. Unfortunately, this still depends on vendor-specific customization: whether or not the USB function is exposed to the Android framework (platform) and managed by UsbManager.



Understanding kcov – play with -fsanitize-coverage=trace-pc from the user space

kcov is a kernel feature used to support syzkaller[1]. To provide code coverage information from the kernel itself, the GCC compiler was patched to instrument the kernel image[2]. The kernel was also patched to enable this feature where appropriate[3]. This post tries to reproduce the essence of kcov in user space, in the hope of a better understanding of kcov in general.

1. -fsanitize-coverage=trace-pc

Starting with version 6.0, GCC supports this new flag, which instruments every basic block it generates with a call to a “trace-pc” function. This function is provided by the user and must be named “__sanitizer_cov_trace_pc”. For kcov, this function is implemented in kernel/kcov.c. At the time of writing, there is no user-space example available using this new flag, primarily because the feature is mainly designed for the kernel/syzkaller, and gcov is already available for user-space programs. Nevertheless, we will show how to play with it in user space :)

2. First try

Let’s create a “non-trivial” testing file and a makefile, as shown below:

test.c:

#include <stdio.h>
#include <string.h>

void __sanitizer_cov_trace_pc(void)
{
	printf("code instrumented...\n");
}

static int fun2(void)
{
	return 0;
}

static void fun1(int a)
{
	fun2();
	printf("fun1: a[%d]\n", a);
}

int main(void)
{
	int a = 10;

	fun1(a);
	return 0;
}

Makefile:

CFLAGS=-g -ggdb -fsanitize-coverage=trace-pc
OBJ=test.o

%.o: %.c $(DEPS)
	$(CC) -c -o $@ $< $(CFLAGS) $(LIBS)

test : $(OBJ)
	$(CC) -o $@ $^ $(CFLAGS) $(LIBS)

.PHONY: clean

clean:
	rm -f *.o test

The function “__sanitizer_cov_trace_pc” is defined in test.c, and the Makefile enables the “-fsanitize-coverage” flag. We compile it, run it, and (as you might have expected) get a core dump. What the heck? Disassembling the binary gives us some hints:

0000000000400546 <__sanitizer_cov_trace_pc>:
  400546:	55                   	push   %rbp
  400547:	48 89 e5             	mov    %rsp,%rbp
  40054a:	e8 f7 ff ff ff       	callq  400546 <__sanitizer_cov_trace_pc>
  40054f:	bf 80 06 40 00       	mov    $0x400680,%edi
  400554:	e8 d7 fe ff ff       	callq  400430 <puts@plt>
  400559:	90                   	nop
  40055a:	5d                   	pop    %rbp
  40055b:	c3                   	retq

Besides main, fun1, and fun2, __sanitizer_cov_trace_pc itself is also instrumented by the compiler: its body starts with a call to itself. This creates a recursion bomb and eventually a stack overflow. So, we need to tell GCC not to instrument the instrumenting function itself.

3. Second try

If we look at the kernel patch[3] again, __sanitizer_cov_trace_pc is defined with the “notrace” decorator, which is essentially an attribute asking GCC to skip instrumentation for the decorated function. All we need to do is update our __sanitizer_cov_trace_pc function in test.c with this decorator:

#define notrace __attribute__((no_instrument_function))

void notrace __sanitizer_cov_trace_pc(void)
{
	printf("code instrumented...\n");
}

Compile, run, and unfortunately coredump again. Disassembly shows the recursive bomb is still there. WTF!

4. Do it again

If we look at the old makefile, __sanitizer_cov_trace_pc is defined inside test.c and shares the same compilation flags as the other functions, including “-fsanitize-coverage”. The no_instrument_function attribute only covers -finstrument-functions style instrumentation, so it does not switch this flag off. It is time to split the function into a separate file and use a different set of compilation flags. Now we have trace.c, trace.h, test.c, and a new makefile, as shown below:

trace.c:

#include <stdio.h>

#define notrace __attribute__((no_instrument_function))

void notrace __sanitizer_cov_trace_pc(void)
{
	printf("code instrumented...\n");
}

trace.h:

#define notrace __attribute__((no_instrument_function))
void notrace __sanitizer_cov_trace_pc(void);

test.c:

#include <stdio.h>
#include <string.h>
#include "trace.h"

static int fun2(void)
{
	return 0;
}

static void fun1(int a)
{
	fun2();
	printf("fun1: a[%d]\n", a);
}

int main(void)
{
	int a = 10;

	fun1(a);
	return 0;
}

Makefile:

CFLAGS=-g -ggdb -fsanitize-coverage=trace-pc
OBJ=test.o trace.o

all: test

trace.o: trace.c
	$(CC) -c -o $@ $<

test.o: test.c
	$(CC) -c -o $@ $< $(CFLAGS) $(LIBS)

test : $(OBJ)
	$(CC) -o $@ $^ $(CFLAGS) $(LIBS)

.PHONY: clean

clean:
	rm -f *.o test

Compile, run, and it works:

[daveti@daveti fuzz]$ ./test
code instrumented...
code instrumented...
code instrumented...
code instrumented...
fun1: a[10]
code instrumented...

5. Trace PC

The final missing part of our user-space implementation compared to kcov is the PC tracing functionality. Fortunately, GCC already provides a builtin function to retrieve the return address pushed on the stack. As the Linux kernel does, we update trace.c to enable PC tracing:

#include <stdio.h>

#define notrace	__attribute__((no_instrument_function))
#define _RET_IP_	(unsigned long)__builtin_return_address(0)

void notrace __sanitizer_cov_trace_pc(void)
{
	//printf("code instrumented...\n");
	/* _RET_IP_ is an unsigned long, so print it with %lx */
	printf("return pc [0x%lx]\n", _RET_IP_);
}

Compile, run, and use addr2line to map these addresses back to source lines via the debugging information. Doesn’t it look almost the same as kcov?

[daveti@daveti fuzz2]$ ./test
return pc [0x4005ab]
return pc [0x400581]
return pc [0x400554]
return pc [0x400568]
fun1: a[10]
return pc [0x4005c6]
[daveti@daveti fuzz2]$ addr2line -e ./test 0x4005ab
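
To get one step closer to kcov, the PCs could be stored in a buffer rather than printed. The sketch below mirrors the coverage-area layout kcov uses as far as I understand it (slot 0 holds the number of recorded PCs, and the PCs follow), without the kernel's per-task state and memory-ordering details. `cov_area`, `COV_SIZE`, and `cov_record` are made-up names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define COV_SIZE 1024	/* hypothetical capacity, in unsigned longs */

/* cov_area[0] counts the recorded PCs; cov_area[1..] holds the PCs. */
static unsigned long cov_area[COV_SIZE];

/* Append one PC to an area of 'size' slots; returns 1 if stored, 0 if full. */
int cov_record(unsigned long *area, size_t size, unsigned long pc)
{
	unsigned long pos;

	assert(area != NULL && size > 1);

	pos = area[0] + 1;
	if (pos >= size)
		return 0;	/* area is full, drop this PC */

	area[pos] = pc;
	area[0] = pos;
	return 1;
}

/* Would live in trace.c, which is built without -fsanitize-coverage. */
void __sanitizer_cov_trace_pc(void)
{
	cov_record(cov_area, COV_SIZE,
		   (unsigned long)__builtin_return_address(0));
}
```

After the program finishes, cov_area[0] gives the number of valid entries, and each entry can be fed to addr2line as before.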

6. Notes

One thing that is easy to observe is the overhead of this instrumentation. Since the instrumentation happens per basic block rather than per function, the number of instrumentation calls can easily exceed the number of actual function calls. For example, in our example, we have 3 functions and 5 instrumentations. This is also the reason why not every kernel subsystem/component enables kcov (besides, this instrumentation could break kernel booting). Again, this post tries to mimic what kcov does from user space. For real user-space coverage instrumentation, there is gcov[4] already.




Intel SGX CPUs (starting from Skylake) have been around for a while. The good news is that there is still no known exploit against SGX itself, though there are some exploits against enclave code and the Intel SGX SDK. In general, SGX is still believed to provide a strong security guarantee for the data/code in the enclave. If there is really something messed up in SGX, it has to be the CPU logic/microcode. This post tries to peek into a specific bug reported by Intel for its SGX CPU implementation. Moving forward, we investigate the possible mitigations and add new features to the well-known Intel platform security tool chipsec. Cheers.

1. SKL012

In the spec update [1] released by Intel in Sep 2016, there are 6 CPU bugs related to SGX in general [2]. None of them is treated as serious by Intel, so no fix is planned for any of them. Among them, we look at SKL012 in particular:


SKL012: The SMSW Instruction May Execute Within an Enclave

Problem: The SMSW instruction is illegal within an SGX (Software Guard Extensions) enclave, and an attempt to execute it within an enclave should result in a #UD (invalid-opcode exception). Due to this erratum, the instruction executes normally within an enclave and does not cause a #UD.

Implication: The SMSW instruction provides access to CR0 bits 15:0 and will provide that information inside an enclave. These bits include NE, ET, TS, EM, MP and PE.

Workaround: None identified. If SMSW execution inside an enclave is unacceptable, system software should not enable SGX.

Status: For the steppings affected, see the Summary Table of Changes.

My interpretation of this SKL012 bug is that SGX was designed to disallow the SMSW instruction in an enclave, even though the instruction can be executed by both ring-0 and ring-3 code. Unfortunately, due to this bug, SMSW “may” be executed in the enclave.

2. So what is the fuss?

SMSW is one of the security-sensitive instructions that can be run by ring-3 code and reveal security-sensitive information about the platform and the kernel to user space. These instructions are usually leveraged by malware to detect a VM environment or to exploit the system/kernel configuration [3][4]. Similar to SMSW, other security-sensitive instructions include SGDT, SLDT, SIDT, and STR. All these instructions can be run by ring-3 code without any problem. So what about these instructions under SGX, besides the SMSW mentioned in SKL012?

3. Verify the bug(s)

Here we implement a tool called sgxbug [7], based on the enclave creation sample code provided by the Intel SGX SDK for Linux [8], by adding the SMSW instruction to the enclave and retrieving the result from the application. We also cover the other 4 security-sensitive instructions mentioned above. For implementation details, please check out the corresponding commit. For normal ring-3 testing code without SGX, one could refer to [6].

Here is the result of running sgxbug app:

root@sgx2-HP-ENVY-x360-m6-Convertible:~/git/sgxbug# ./app 0
Got senstive instruction idx: 0
Checksum(0x0x7ffe4ba35c30, 100) = 0xfffd4143
Info: executing thread synchronization, please wait…
Start sensitive instruction testing…
GDT: limit=0127, base=ffff880273c49000
Sensitive instruction testing done…
Info: SampleEnclave successfully returned.
Enter a character before exit …

root@sgx2-HP-ENVY-x360-m6-Convertible:~/git/sgxbug# ./app 1
Got senstive instruction idx: 1
Checksum(0x0x7ffd677a11f0, 100) = 0xfffd4143
Info: executing thread synchronization, please wait…
Start sensitive instruction testing…
IDT: limit=4095, base=ffffffffff578000
Sensitive instruction testing done…
Info: SampleEnclave successfully returned.
Enter a character before exit …

root@sgx2-HP-ENVY-x360-m6-Convertible:~/git/sgxbug# ./app 2
Got senstive instruction idx: 2
Checksum(0x0x7fff399040a0, 100) = 0xfffd4143
Info: executing thread synchronization, please wait…
Start sensitive instruction testing…
LDT: ffffffffffff0000
Sensitive instruction testing done…
Info: SampleEnclave successfully returned.
Enter a character before exit …

root@sgx2-HP-ENVY-x360-m6-Convertible:~/git/sgxbug# ./app 3
Got senstive instruction idx: 3
Checksum(0x0x7fffb1b31300, 100) = 0xfffd4143
Info: executing thread synchronization, please wait…
Start sensitive instruction testing…
MSW: ffffffffffff0033
Sensitive instruction testing done…
Info: SampleEnclave successfully returned.
Enter a character before exit …

root@sgx2-HP-ENVY-x360-m6-Convertible:~/git/sgxbug# ./app 4
Got senstive instruction idx: 4
Checksum(0x0x7ffcfe43bee0, 100) = 0xfffd4143
Info: executing thread synchronization, please wait…
Start sensitive instruction testing…
TR: ffffffffffff0040
Sensitive instruction testing done…
Info: SampleEnclave successfully returned.
Enter a character before exit …


(Maybe not) surprisingly, not only SMSW works smoothly, but also SGDT, SLDT, SIDT, and STR. In summary, there is no limitation at all on code in the enclave executing these security-sensitive instructions.
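
To make the MSW value above easier to read, here is a small sketch that decodes the low CR0 bits listed in the erratum. The bit positions follow the architectural CR0 layout; `msw_decode` is a hypothetical helper and not part of sgxbug.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Low CR0 bits visible through SMSW (architectural CR0 layout). */
#define CR0_PE (1u << 0)	/* protection enable */
#define CR0_MP (1u << 1)	/* monitor coprocessor */
#define CR0_EM (1u << 2)	/* x87 emulation */
#define CR0_TS (1u << 3)	/* task switched */
#define CR0_ET (1u << 4)	/* extension type */
#define CR0_NE (1u << 5)	/* numeric error */

/*
 * Hypothetical helper: write the names of the set flags into out,
 * separated by spaces. Returns the number of flag names written.
 */
int msw_decode(unsigned int msw, char *out, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} flags[] = {
		{ CR0_PE, "PE" }, { CR0_MP, "MP" }, { CR0_EM, "EM" },
		{ CR0_TS, "TS" }, { CR0_ET, "ET" }, { CR0_NE, "NE" },
	};
	size_t i, used = 0;
	int count = 0;

	assert(out != NULL && len > 0);
	out[0] = '\0';

	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		int n;

		if (!(msw & flags[i].bit))
			continue;
		n = snprintf(out + used, len - used, "%s%s",
			     count ? " " : "", flags[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			out[used] = '\0';	/* buffer too small: drop the partial name */
			break;
		}
		used += (size_t)n;
		count++;
	}
	return count;
}
```

Only the low 16 bits of the printed MSW carry CR0 information, so the value to decode from the run above is 0x0033, which has PE, MP, ET, and NE set.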


Now what? It turns out the latest Intel CPUs have a feature called UMIP (User Mode Instruction Prevention) [5], which, as its name implies, can block some security-sensitive instructions from running at ring-3. The blocked instructions currently include all 5 instructions mentioned above. That means, once UMIP is enabled, a user-space application (or malware) is not able to run these instructions anymore. This is really good in my opinion, and I would recommend enabling it if possible to reduce the attack surface from user space. For this reason, we added some new features to the CHIPSEC tool [9]: detecting the UMIP feature, checking whether it is enabled, and enabling UMIP on all cores if possible. For implementation details, please refer to the corresponding commits.

5. What if…

With the knowledge of UMIP, we can start thinking about what would happen to SKL012 when UMIP is enabled. We have verified that without UMIP, all these instructions work in the enclave. Will they still work when UMIP is enabled? Unfortunately, my SGX CPU is apparently not “latest” enough to have such a feature:

root@sgx2-HP-ENVY-x360-m6-Convertible:~/git/chipsec# chipsec_util cpu umip detect

### ##
### CHIPSEC: Platform Hardware Security Assessment Framework ##
### ##
[CHIPSEC] Version 1.2.5
****** Chipsec Linux Kernel module is licensed under GPL 2.0
[CHIPSEC] API mode: using CHIPSEC kernel module API
[CHIPSEC] Executing command ‘cpu’ with args [‘umip’, ‘detect’]

[cpu] CPUID out: EAX=0x00000000, EBX=0x029C67AF, ECX=0x00000000, EDX=0x00000000
[CHIPSEC] UMIP available for CPU: False
[CHIPSEC] (cpu) time elapsed 0.000
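
The check behind this output boils down to a couple of bit tests. The sketch below is not the CHIPSEC code; it only illustrates the architectural bits involved (CPUID leaf 7, sub-leaf 0, ECX bit 2 for availability, and CR4 bit 11 for enablement). The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_7_0_ECX_UMIP (UINT32_C(1) << 2)	/* CPUID.(EAX=7,ECX=0):ECX[2] */
#define CR4_UMIP           (UINT64_C(1) << 11)	/* CR4.UMIP enable bit */

/* Does the CPU advertise UMIP, given ECX from CPUID leaf 7, sub-leaf 0? */
int umip_supported(uint32_t cpuid7_ecx)
{
	return (cpuid7_ecx & CPUID_7_0_ECX_UMIP) != 0;
}

/* Is UMIP currently enabled, given a CR4 value read in ring-0? */
int umip_enabled(uint64_t cr4)
{
	return (cr4 & CR4_UMIP) != 0;
}

/*
 * CR4 value to write back in order to turn UMIP on. Setting the bit on
 * a CPU without UMIP would fault, hence the precondition.
 */
uint64_t cr4_with_umip(uint64_t cr4, uint32_t cpuid7_ecx)
{
	assert(umip_supported(cpuid7_ecx));
	return cr4 | CR4_UMIP;
}
```

With the ECX=0x00000000 reported above, `umip_supported()` returns 0, which matches the “UMIP available for CPU: False” line.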

Here is what I imagine: because of the SKL012 bug in SGX, it is possible that UMIP cannot prevent the execution of these security-sensitive instructions within an enclave. If this is true, then with UMIP enabled, malware would need to use SGX to guarantee the execution of these instructions (e.g., for VM detection).




getdelays – get delay accounting information from the kernel

Top may be the most common tool whenever a performance issue is hit. It is simple, quick, and dumb. Besides heavyweight tools like perf and gprof, another really useful and simple tool is getdelays, which provides per-process/task latency statistics for CPU, memory, and I/O.

1. Where to get it

getdelays.c ships with the Linux kernel source tree. As mentioned in the comment at the top of the file, compile it with:

gcc -I/usr/src/linux/include getdelays.c -o getdelays

Since it uses the netlink socket, it requires root permission to run as well.

2. What it does

Getdelays does a simple job: it creates a netlink socket, sends a request to the kernel for the task statistics, and prints out the reply. Essentially, this netlink socket exposes the kernel taskstats structure to user space. For more information about the taskstats struct, please refer to the taskstats documentation in the kernel source tree.

3. What it looks like


In the example above, it shows the delay information for httpd, which seems to be working fine without any memory or I/O issues, except for some minor CPU delays since it is a background process rather than an interactive shell. If the application has latency issues, getdelays should show some numbers under “delay total” and “delay average”, which helps narrow the performance issue down to CPU, memory, or I/O.

4. Note

Do not expect too much from getdelays: it simply prints some counters kept by the kernel, which should be enough to tell where the problem lies. To find the actual performance bottleneck, strace/ltrace/dtrace/lttng/perf/gprof should be considered as the next step.


Making USB Great Again with USBFILTER – a USB layer firewall in the Linux kernel

USENIX Security '16

Our paper “Making USB Great Again with USBFILTER” has been accepted by USENIX Security’16. This post provides a summary of usbfilter. For details, please read the damn paper or download the presentation video/slides from USENIX website. I will head to TX next week, and see you there~

0. Why is USB not great anymore?

We CANNOT trust a USB device based on its appearance anymore. One typical BadUSB attack is a USB drive with a hidden keyboard functionality that injects a malicious script into the host machine once plugged in. The root cause of the problem is that (almost) anyone can change the USB device firmware to add new functionalities as desired. And people will just plug in USB flash drives they find somewhere out of curiosity (“Users Really Do Plug in USB Drives They Find”, Oakland’16). Even worse, this also puts enterprise infrastructure in danger: however powerful the network firewall is, a suspicious USB device used by an employee can render it all in vain. As a result, enterprises usually forbid the use of external USB devices except the original keyboards/mice. Most normal users just ignore these threats, or try to plug unknown USB devices into someone else’s machine… (this is how friendships break). Note that cellphones are also USB devices, so what would you do when someone needs to charge his/her phone using your machine?

1. Our solution – usbfilter

The more we play with USB, the more we realize that it is just another transport protocol for USB devices, like TCP/IP for networking devices. Moreover, what gets transmitted between the USB host controller and the devices are USB packets. Inspired by netfilter in the Linux kernel, we came up with something like below, and all we needed was to make it work.


2. The design and implementation of usbfilter

One of the key features of usbfilter is its ability to trace a USB packet back to its originating process. This is nontrivial. For instance, because of the generic block layer and the I/O scheduler within the kernel, all USB packets operating on (reading/writing) USB storage devices are handled by the usb-storage kernel thread for performance reasons. Similarly, USB networking devices usually have their own Rx/Tx queues in their drivers to buffer skbs (IP packets) before they are encapsulated by the USB stack. Because usbfilter works at the lowest level of the USB abstraction in the kernel, the pid it sees is usually either that of a kernel thread (device drivers) or an IRQ context (null). As one can imagine, we hacked into different subsystems of the kernel and saved the originating pid into the urb (USB packet) before it was lost due to asynchronous I/O. Once we fixed that, we had a more concrete picture of usbfilter:


Now all we need to do is implement a user-space tool, called usbtables, to communicate with the usbfilter component in the kernel and enforce the rules/policies pushed from user space. To make sure there are no conflicts or contradictions among rules, usbtables also has an internal Prolog engine to reason about each new rule before it is pushed into the kernel.


3. So what can usbfilter do?

Here is the fun part. We list a bunch of cool use cases here. For a complete list of case studies, please refer to our paper. In general, just like iptables, with the help of usbtables, users can write rules to regulate the functionalities of USB devices.

A Listen-only USB headset

usbtables -a logitech-headset -v ifnum=2,product=
      "Logitech USB Headset",manufacturer=Logitech -k
      direction=1 -t drop

A Logitech webcam C310, which can only be used by Skype

usbtables -a skype -o uid=1001,comm=skype -v
      serial=B4482A20 -t allow
usbtables -a nowebcam -v serial=B4482A20 -t drop

A USB port dedicated for charging

usbtables -a charger -v busnum=1,portnum=4 -t drop

There are 2 possible settings for these rules, since users can use usbtables to set the default action taken when no rule matches. If the default action is DROP, users can use usbtables to add a whitelist, permitting certain devices with certain functionalities. This provides the strongest security guarantee, since each USB device needs at least one rule to work. If the default action is ALLOW, users have to use usbtables to add a blacklist, blocking undesired functionalities of certain devices. This is less secure but provides the best usability.
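
The interplay between rules and the default action can be sketched as follows. This is not the usbfilter implementation: the rule fields are a hypothetical subset of what usbtables accepts, and first-match-wins is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum action { ACT_ALLOW, ACT_DROP };

#define ANY (-1)	/* wildcard: the rule does not constrain this field */

/* Hypothetical, heavily reduced rule: match on bus, port, and uid only. */
struct rule {
	int busnum;
	int portnum;
	int uid;
	enum action act;
};

struct packet {
	int busnum;
	int portnum;
	int uid;
};

static int field_matches(int want, int have)
{
	return want == ANY || want == have;
}

/* Return the action of the first matching rule, or def if none matches. */
enum action filter_packet(const struct rule *rules, size_t n,
			  enum action def, const struct packet *p)
{
	size_t i;

	assert(p != NULL && (rules != NULL || n == 0));

	for (i = 0; i < n; i++) {
		if (field_matches(rules[i].busnum, p->busnum) &&
		    field_matches(rules[i].portnum, p->portnum) &&
		    field_matches(rules[i].uid, p->uid))
			return rules[i].act;
	}
	return def;
}
```

With a default of ACT_ALLOW, the charger rule above (busnum=1, portnum=4, drop) acts as a blacklist entry. With a default of ACT_DROP and an empty rule set, nothing gets through until allow rules are added.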

4. What is LUM?

If you look at the usbfilter architecture figure again, you will notice a thing called usbfilter modules, or Linux usbfilter modules (LUM). This is another powerful feature of usbfilter. Just like netfilter, usbfilter enables kernel developers to write kernel modules that look into and play with USB packets as they wish, plug them into usbfilter, and enable new rules using these kernel modules. Check out the example LUM in the code repo, which detects the SCSI write command within a USB packet. With the help of this LUM, one can write rules to stop data exfiltration from the host machine to a Kingston USB flash drive for user 1001:

usbtables -a nodataexfil2 -o uid=1001
      -v manufacturer=Kingston
      -l name=block_scsi_write -t drop

With the default set to block any SCSI write to any USB storage device, a whitelist can permit a limited number of trusted devices while preventing data exfiltration when an unknown USB storage device is plugged in.

5. Todo…

There is still a long way to go before usbfilter can be officially accepted into the mainline. Applications may hang forever waiting for a response USB packet whose request packet has been filtered by usbfilter, though this could be considered an implementation issue of those applications. Some USB devices can also go stale in the kernel even after they have been unplugged, if the USB packet used to release the resource is filtered as well. Even though usbfilter introduces minimal overhead, using BPF may be mandatory for it to be accepted upstream.

6. Like it?

To download the full paper, please go to my publication page. The complete usbfilter implementation, including the usbfilter kernel for Ubuntu 14.04 LTS, the user-space tool usbtables, and the example LUM that blocks writes to USB storage devices, is available on my GitHub. If you have any questions, please open an issue in the code repo, and I will try my best to answer it in time.


Fedora Upgrade from 21 to 24

After almost 5 hours of upgrading, my server has been successfully upgraded from Fedora 21 to Fedora 24, which uses the latest stable kernel 4.6. There is an online post demonstrating how to upgrade from Fedora 21 to 23 using fedup. This post talks about upgrading Fedora from 21 to 24 using dnf. NOTE: please back up your data before taking any action!

0. yum update

This is usually not a problem for Fedora 21, whose support has expired for a long time. Anyway, run it just in case.

1. dnf

According to the official Fedora wiki, dnf is recommended for system upgrades. Apparently, fedup has been ditched. All we need are 3 dnf commands:

sudo dnf upgrade --refresh
sudo dnf install dnf-plugin-system-upgrade
sudo dnf system-upgrade download --refresh --releasever=24

The last dnf command lists any errors that block the upgrade. The errors I encountered were obsolete packages that are not supported in the Fedora 24 repo. As you can tell, the only way to move the upgrade forward is to remove all these obsolete packages, using “yum remove” plus the unsupported package name reported by dnf.

Once all the errors are cleared, dnf is able to download all the required packages for Fedora 24. On my server, it was about 4GB, so you need at least that much free space to hold all these new packages. More importantly, dnf requires another 5GB under root during the package installation. Make sure you make dnf happy.

2. Keys

Before dnf was able to install all the newly downloaded packages, I got the following error:

Couldn’t open file /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-24-x86_64

There is a bug report discussing the possible causes of this issue and the corresponding fixes. However, if you find that manual key importing does not work, take a look at the /etc/pki/rpm-gpg directory. What happened on my server was simply that there was no key file for Fedora 24 at all. Oops. The fix is also easy: create the key files ourselves. Find the key files (primary/secondary) published by Fedora, create them locally, and symlink the x86_64 one (the arch of my server) to the primary. That’s it.

3. dnf again

Reboot the machine to start the upgrade:

sudo dnf system-upgrade reboot

Hint: yum is now deprecated. Run “dnf update” once you are into the new system.


Malware Reverse Engineering – Part II

While most tools for MRE are straightforward, some of them require time, patience, and skill to show their full power. For static analysis, this means IDA; for dynamic analysis, it is OllyDbg (and WinDbg for Windows kernel debugging). In this post, we will work heavily with disassembly in both tools. Remember: the key point of MRE is not to fully understand every line of disassembly, but rather to construct a big picture of the malware in a high-level programming language, e.g., C/C++. If you already have a Hex-Rays decompiler, use it to make your life easier. Otherwise, read this post.

0. Report header

Apr 11, 2016. GNV, FL.

1. Download the malware – play at your own risk!

Git clone my git repo and copy malware_g.7z into the Windows VM. NOTE: there is no password protection for this malware.

2. Summary

This malware G and the accompanying jellydll.dll form a proof-of-concept GPU-based rootkit called JellyCuda. It leverages the Nvidia GPU non-volatile memory to hide the malicious jellydll.dll and make it persistent without being detected by a scan of the host machine’s hard disk. When the host is infected by JellyCuda for the first time, it loads jellydll.dll into the GPU memory, creates a file called jellyboot.vbs in the startup folder, and writes itself into the pre-formatted VBScript, making sure that the malware runs every time the machine boots. Finally, jellydll.dll is removed. After the machine is rebooted, the malware looks for jellydll.dll. If the DLL file is still available, the malware repeats the previous procedure to hide the malicious DLL file in the GPU memory. Otherwise, the malware reads the GPU memory, finds the memory block containing the jellydll.dll contents, reconstructs the DLL file in memory, replaces the current process memory with the contents of the DLL, and finally calls the DllMain() entry function of jellydll.dll, which simply prints out warnings about the existence of the GPU RAT.

Since this is a proof-of-concept malware, specific signatures or remediations for this malware may not be interesting or useful. However, JellyCuda does give us some hints to think about GPU-based rootkit in general:

  1. Calls to CUDA/OpenCL – normal applications usually do not deal with GPU directly.
  2. cuMemAlloc, cuMemcpyHtoD, cuMemcpyDtoH (or the OpenCL equivalents) – this means there is memory block transmission between the main RAM and the GPU memory.
  3. New file created – the registry, the startup folder, or the prefetch folder may be changed to include the malware itself, making sure it persists across reboots.

To remove JellyCuda from the system, one first needs to clean the residue in the GPU memory, then locate the malware itself based on the modified registry/startup/prefetch entries, and remove it. The good news is that my Avast was able to recognize JellyCuda as malware when I tried to copy it into the VM for analysis on my Mac.

NOTE: this report focuses on IDA and OllyDbg analysis, rather than other straight-forward tools. IDA analysis shows the complete picture of the malware, and OllyDbg digs into the malicious payload (jellydll.dll), which could not be analyzed by IDA.

3. Static Analysis

  • Is it packed?

No. PEiD does show Pelles C for this malware, but that is the compiler that built the binary, not a packer.


And nothing found for the accompanied dll:


  • Compilation data?

Malware_g.exe: 2015/05/09.


Jellydll.dll: 2015/05/09


  • GUI or CLI?

Malware_g.exe: PEiD thinks it is a Win32 GUI and PEview thinks the same way.


jellydll.dll: PEiD reports it as Win32 GUI and PEview agrees.


  • Imports?




File manipulation:

CreateFile, WriteFile, CloseHandle, GetFileSize, ReadFile, DeleteFile, GetFileAttributes, GetFileType, GetStdHandle, DuplicateHandle, SetHandleCount,

Memory manipulation:

VirtualAlloc, GlobalAlloc, HeapAlloc, GlobalFree, HeapCreate, HeapDestroy, HeapReAlloc, HeapFree, HeapSize, HeapValidate, VirtualQuery

Process manipulation:

GetProcAddress, GetModuleHandle, GetProcessHeap, GetModuleFileName, GetCurrentProcess, ExitProcess,

Library manipulation:

LoadLibrary, FreeLibrary,


strlen, strcat, GetLastError, GetStartupInfo, RtlUnwind, GetSystemTimeAsFileTime, GetCommandLine, GetEnvironmentStrings, FreeEnvironmentStrings, UnhandledExceptionFilter, WideCharToMultiByte, SetConsoleCtrlHandler


MessageBox, wsprintf, ExitWindowsEx


OpenProcessToken, LookupPrivilegeValue, AdjustTokenPrivileges









File manipulation:

GetFileType, GetStdHandle, DuplicateHandle, SetHandleCount,

Memory manipulation:

VirtualAlloc, VirtualFree, HeapCreate, HeapDestroy, HeapReAlloc, HeapFree, HeapSize, HeapValidate, VirtualQuery

Process manipulation:

GetCurrentProcess, ExitProcess,


GetStartupInfo, GetSystemTimeAsFileTime, GetCommandLine, GetModuleFileName, GetEnvironmentStrings, FreeEnvironmentStrings

  • Strings?


Process: svchost
Jellyboot.vbs, malware_g.exe

Files generated by the compiler:





Error handling:














  • Sections and contents?

malware_g.exe: there are 3 sections in total

.text: it looks like there is code in it.

.rdata: warning strings, Windows commands, CUDA functions, and other interesting stuff


.data: IAT, and a bunch of debug sections, including COFF



Jellydll.dll: there are 4 sections.

.text: normal code

.rdata: malware writer’s kind reminder


.data: IAT


.reloc: relocation table

(g) Resource

ResourceHacker found nothing for either the malware_g.exe or jellydll.dll.

(h) IDA Pro


The first entry function of malware_g.exe is WinMainCRTStartup(), which is generated by the Pelles C compiler for Windows.


It sets up an exception handler, which calls RtlUnwind(), which is usually generated by the compiler for try/except. It then moves on to allocate space on the heap using HeapCreate(), called by __bheapinit(). If that fails, the program exits. Otherwise, system setup continues.



If everything is still good, we reach the second entry function WinMain(), which is the real function implemented by the malware.


The first thing WinMain() tries to do is to call LoadCuda().


If the loading fails, the malware exits. Otherwise, it continues with calls to dword_40595C, dword_405958, dword_405954, and jc. Since all these are indirect calls, we need to figure out what these memory addresses are by looking into LoadCuda().


As its name implies, LoadCuda() starts with loading nvcuda.dll using LoadLibrary(), and exits if the loading fails.


When nvcuda.dll is successfully loaded, the memory address jc is loaded into %eax and then into the local variable lpAddress. Looking at that memory address, we realize the connection among all those memory addresses. jc is the start of a struct at address 0x405950, and dword_405954, dword_405958, dword_40595C, …, dword_40596C are the following members of the struct. Since all members are dwords (4 bytes) and are used as targets of the call instruction, this jc struct contains a bunch of function pointers.



Once jc is loaded into lpAddress, a loop starts over the szFuncNames array. For each name in szFuncNames, GetProcAddress() is called with the library handle returned by LoadLibrary() and the name. The return value is stored at the location lpAddress currently points to.


Looking into the szFuncNames, we see the CUDA functions we have seen in the strings.


Once LoadCuda() is done, struct jc is initialized with all these CUDA functions in order. So back in WinMain(), after LoadCuda() returns successfully, cuInit(), cuDeviceGetCount(), cuDeviceGet(), and cuCtxCreate_v2() are called one by one. Any failed call frees the loaded CUDA library and exits the malware. When CUDA is successfully initialized, GetFileAttributes() is called with jellydll.dll and the return value is checked against 0xffffffff (-1), which is INVALID_FILE_ATTRIBUTES. GetLastError() is then called and its return value is checked against 2, which is ERROR_FILE_NOT_FOUND. When both conditions hold, i.e., jellydll.dll does not exist, SearchJellyDustOnGPU() is called; otherwise, SprayJellyDustToGPU() is called. Then FreeLibrary() is called and WinMain() returns.
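
The branch at the end of WinMain() can be summarized in a few lines. This is only a sketch of the logic as read from the disassembly, written in Python for brevity; the function name choose_action is made up for illustration, while the two constants are the documented Win32 values.

```python
# Documented Win32 constants.
INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF
ERROR_FILE_NOT_FOUND = 2


def choose_action(file_attributes: int, last_error: int) -> str:
    """Return the name of the routine WinMain() would call next.

    file_attributes: value returned by GetFileAttributes() for jellydll.dll
    last_error:      value returned by GetLastError() right after it
    (choose_action itself is a hypothetical helper, not a name from the binary.)
    """
    dll_missing = (file_attributes == INVALID_FILE_ATTRIBUTES
                   and last_error == ERROR_FILE_NOT_FOUND)
    # No DLL on disk: look for a payload already sitting in GPU memory.
    # Otherwise: push the DLL from disk into GPU memory.
    return "SearchJellyDustOnGPU" if dll_missing else "SprayJellyDustToGPU"


if __name__ == "__main__":
    print(choose_action(INVALID_FILE_ATTRIBUTES, ERROR_FILE_NOT_FOUND))
    print(choose_action(0x20, 0))
```

In other words, the presence of jellydll.dll on disk decides whether the malware is in its "install" phase or its "run from GPU" phase.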


SearchJellyDustOnGPU() first calls AllocateGPUMemory(), which calls dword_405960, which is essentially the 5th member of struct jc – cuMemAlloc_v2().
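
The "5th member" claim is just pointer arithmetic on the struct base. A small sketch, assuming every slot is a 4-byte function pointer as described above; the helper name member_number is hypothetical, and the table only lists the slots whose targets are identified in this post.

```python
JC_BASE = 0x405950   # address of struct jc in malware_g.exe
SLOT_SIZE = 4        # each member is a dword-sized function pointer

# Only the slots whose targets are named in this analysis.
KNOWN_SLOTS = {
    0x405960: "cuMemAlloc_v2",
    0x405964: "cuMemcpyDtoH_v2",
    0x405968: "cuMemcpyHtoD_v2",
    0x40596C: "cuMemFree_v2",
}


def member_number(address: int) -> int:
    """1-based position of a dword_XXXXXX slot inside struct jc."""
    offset = address - JC_BASE
    if offset < 0 or offset % SLOT_SIZE:
        raise ValueError(f"{address:#x} is not a slot of struct jc")
    return offset // SLOT_SIZE + 1


if __name__ == "__main__":
    for address, name in sorted(KNOWN_SLOTS.items()):
        print(f"dword_{address:X} -> member {member_number(address)}: {name}")
```

The same arithmetic shows that the struct spans eight slots, from jc at 0x405950 up to dword_40596C.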


If AllocateGPUMemory() fails, SearchJellyDustOnGPU() exits. Otherwise, it continues by calling GlobalAlloc(), dword_405964 (cuMemcpyDtoH_v2()), and dword_40596C (cuMemFree_v2()), which copies the GPU memory into host memory and then frees the GPU allocation. Note that the copied memory size is expected to be >= 0x1000C (65548) bytes.


The copied memory is then scanned in a loop for the magic number 0x5DAB355.



If a memory block starts with the magic number, some further checks pass, and GetDustCheckSum() passes as well, we hit the core of SearchJellyDustOnGPU(): GetProcessHeap(), HeapAlloc(), and ExecuteJellyDust(). Note that the ‘rep movsb’ copies the memory block we found, starting at offset 0xC, into a local variable lpvDust, which is then passed into ExecuteJellyDust().
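
To make the search concrete, here is a rough sketch of locating the magic number in a memory dump and finding where the payload begins. Several things are assumptions on my side: the magic is treated as a little-endian 32-bit value, the scan is byte-by-byte (the real stride is not clear from the disassembly), and the 8 bytes between the magic and offset 0xC are left opaque because their layout and the checksum algorithm are not decoded here. The function names are hypothetical.

```python
import struct

MAGIC = 0x5DAB355
PAYLOAD_OFFSET = 0xC  # payload starts 12 bytes after the start of the block


def find_dust(memory: bytes) -> int:
    """Return the offset of the first magic number in memory, or -1."""
    needle = struct.pack("<I", MAGIC)  # assumption: little-endian dword
    return memory.find(needle)


def payload_start(memory: bytes) -> int:
    """Return the offset where the payload would begin, or -1 if no magic."""
    pos = find_dust(memory)
    return -1 if pos < 0 else pos + PAYLOAD_OFFSET


# Toy dump: padding, magic, 8 opaque header bytes, then a fake payload.
dump = b"\x00" * 16 + struct.pack("<I", MAGIC) + b"\xAA" * 8 + b"MZ-payload"

if __name__ == "__main__":
    start = payload_start(dump)
    print(hex(find_dust(dump)), dump[start:start + 2])
```

The real routine additionally validates the block (the "further checks" and GetDustCheckSum()) before trusting it; this sketch skips those steps.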


The ExecuteJellyDust() function calls VirtualAlloc(), LoadLibrary(), and GetProcAddress() in a big loop. Based on the names of the local variables involved, pImport and pRelocBase, one can guess that this loop reconstructs a library from the memory block. ExecuteJellyDust() then loads ntdll.dll and calls the undocumented NtFlushInstructionCache() with parameters (-1, 0, 0), which clears stale code from the instruction cache. Finally, an indirect call to %eax is made with parameters (lpvTarget, 1, 0). Note that %eax is derived from pNt with offset 0x28, which is the offset of the AddressOfEntryPoint field relative to the PE signature, i.e., the entry point of the DLL (DllMain()). So we know that the final call invokes the entry function of the library created on the fly. Now the question is: what is that library?
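
The 0x28 offset follows from the PE layout: 4 bytes of signature, 20 bytes of COFF file header, and then AddressOfEntryPoint sits 16 bytes into the optional header (4 + 20 + 16 = 0x28). The sketch below reads that field from a synthetic image; make_toy_image and entry_point_rva are hypothetical helpers, and the toy image contains only the fields needed here, not a loadable PE.

```python
import struct

E_LFANEW_OFFSET = 0x3C      # DOS header field holding the PE signature offset
ENTRY_POINT_OFFSET = 0x28   # AddressOfEntryPoint, relative to the PE signature


def entry_point_rva(image: bytes) -> int:
    """Read AddressOfEntryPoint (an RVA) from a PE image held in memory."""
    (pe_offset,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    (rva,) = struct.unpack_from("<I", image, pe_offset + ENTRY_POINT_OFFSET)
    return rva


def make_toy_image(rva: int, pe_offset: int = 0x80) -> bytes:
    """Build a minimal buffer with just enough structure for the demo."""
    image = bytearray(pe_offset + 0x40)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, E_LFANEW_OFFSET, pe_offset)
    image[pe_offset:pe_offset + 4] = b"PE\x00\x00"
    struct.pack_into("<I", image, pe_offset + ENTRY_POINT_OFFSET, rva)
    return bytes(image)


if __name__ == "__main__":
    toy = make_toy_image(0x1140)  # toy value
    print(hex(entry_point_rva(toy)))
```

A loader adds this RVA to the base address of the mapped image to obtain the actual address to call, which is presumably what the code around pNt does before the indirect call.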


The last function we haven’t looked at is SprayJellyDustToGPU(), which is called when the malware is able to find jellydll.dll. The only parameter of this function is “jellydll.dll”. First, it calls CreateFile() to open jellydll.dll, and then GetFileSize(). Then GetProcessHeap() and HeapAlloc() are called to allocate enough memory for jellydll.dll, which is then read into memory via ReadFile(). AllocateGPUMemory() is called afterwards, followed by GetDustCheckSum() and GlobalAlloc(). Note that the magic number 0x5DAB355 is prepended to the memory block of jellydll.dll.


The JellyDust (magic number + tweak(jellydll.dll)) is then copied into the memory allocated by the GlobalAlloc(), and later copied into the GPU memory via dword_405968 (cuMemcpyHtoD_v2()).


At last, the file jellydll.dll is closed and deleted via CloseHandle() and DeleteFile(), before Reboot() is called, which is the last piece of the malware_g.exe puzzle. This function calls SHGetKnownFolderPath() to resolve FOLDERID_Startup, which is %APPDATA%\Microsoft\Windows\Start Menu\Programs\StartUp.


The startup path is then converted from wide chars into a multibyte string using wcstombs(), appended with byte 0x5C (‘\’), and null terminated.


Then the file jellyboot.vbs is created under that startup directory.


After the new file is created, GetModuleFileName() is called to get the file path of malware_g.exe itself. The jellyboot.vbs file is then written via WriteFile() with command lines formatted by wsprintf() using the file path of the malware itself, and finally closed via CloseHandle(). The command lines are used to create a COM object using VBScript to run the malware itself and then remove itself.


The last thing Reboot() does is to call GetCurrentProcess(), OpenProcessToken(), LookupPrivilegeValue(), and AdjustTokenPrivileges() to obtain the privilege needed to reboot the machine using ExitWindowsEx().



Now we know that jellydll.dll is the RAT, and that its DllMain() entry function would be executed by malware_g.exe. However, IDA fails to analyze this library properly. The DLL entry function tries to call sub_10001030, which is an address in the .rdata section.



4. Dynamic Analysis


We are not able to run malware_g.exe, not only because of the CUDA requirement, but also because a required procedure could not be located. Why? This function is only available on Windows Vista and later.



To see what the heck jellydll.dll is doing in its DllMain() entry function, we load jellydll.dll into OllyDbg, which asks if we want to use LoadDLL.exe to run the library. After answering yes, we finally see the RAT.


Then we break when the new module is loaded and find the exact place where the DllMain entry function is invoked, which is at 0x7C901187.


Then we break at that call to examine the stack. %esp is 0x0006F8AC, and %ebp is 0x0006F8C4. Since the arguments have just been pushed, the first parameter of the function is at the top of the stack, at address 0x0006F8AC. The second parameter is at address 0x0006F8B0, and the third is at address 0x0006F8B4. The call target is read from ss:[ebp+8], which is address 0x0006F8CC.


Moving on to look back at the stack, we have:

First parameter (hinstDLL) – 0x0006F8AC: 0x10000000 – the handle of the DLL module, i.e., the base address at which jellydll.dll is loaded.

Second parameter (fdwReason) – 0x0006F8B0: 0x00000001 – that is the REASON code DLL_PROCESS_ATTACH.

Third parameter (lpvReserved) – 0x0006F8B4: 0x00000000 – NULL for dynamic loads.

Function call – 0x0006F8CC: 0x10001140 – that is the correct address of DllEntryPoint() shown in IDA.
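
The addresses above are plain arithmetic on the two registers. A tiny sketch to double check them, assuming 4-byte stack slots and the register values observed in OllyDbg; argument_address is a hypothetical helper name.

```python
ESP = 0x0006F8AC   # stack pointer observed at the call site
EBP = 0x0006F8C4   # frame pointer observed at the call site
WORD = 4           # 32-bit stack slots


def argument_address(index: int) -> int:
    """Address of the index-th (0-based) pushed argument at the call site."""
    return ESP + WORD * index


if __name__ == "__main__":
    for i, name in enumerate(["hinstDLL", "fdwReason", "lpvReserved"]):
        print(f"{name:12s} at {argument_address(i):#010x}")
    print(f"call target stored at {EBP + 8:#010x}")
    # The entry point lies 0x1140 bytes above the module base.
    print(hex(0x10001140 - 0x10000000))
```

The last line also shows that the call target 0x10001140 falls inside the module whose base is passed as hinstDLL.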

There we go, let us step into the DllMain(). The real function call in the DLL entry is at address 0x1000117E, with an instruction “call 10001000”. So break at this line again and examine the stack.


Now an interesting thing happens. When we try to set a breakpoint at the address, OllyDbg tells us that we are looking at code in the data section rather than the code section, which may explain why IDA fails here. Anyway, set the breakpoint and step into.


We finally see the final function called in the DllMain() of jellydll.dll. It is a call to MessageBox with the caption string and the RAT string.


5. Indicators of compromise

Since this is a proof-of-concept of GPU-based malware, it is easy to know the machine is compromised when the warning window shows up. In reality, the indicator could be non-trivial to find, depending on the implementation of the GPU payload (jellydll.dll). If it is a rootkit, it may stay in the machine for a long time without detection, and even AV may not help. If it is a RAT, we may be able to find unfamiliar socket connections to the outside. If it is ransomware, we know when we know.

6. Disinfection and remedies

It is not clear so far what the best solution would be for GPU-based malware (and I am going to dig deeper to see if there is potential for a paper). Since current prototypes of GPU-based malware require a ‘helper’ in the host system to make them work, Intel does not think it would be a threat. On the other hand, my Avast on Mac was able to detect JellyCuda when I tried to move it into the VM for analysis. The best I can think of for now is a system tool/mechanism that looks into the GPU memory for malware detection, just like AV does on the host machine. We may also reconsider the access control for the GPU from a security point of view. Yeah, I am talking about the pitch of a potential paper on defending against GPU malware. Will see how it goes :)
