Malware Reverse Engineering – Part I

I took a “Malware Reverse Engineering (MRE)” class last semester and it was fun for me, partly because I was not a Windows person (and I am still not). What seems ridiculous to me is how trivially one can write into any process on Windows XP, which was apparently designed for malware! Windows rants aside, this post shares a general workflow for malware reverse engineering on the Windows (XP) platform, and the corresponding tools. Note that this report is by no means a good one. It was my first attempt, and I intended to put in as much information as possible. If you are interested in MRE or want a job in it, do buy this book (https://www.nostarch.com/malware) and give it a complete read. Have fun and stick with Linux~

0. Report header

Feb 12, 2016, GNV FL.

1. Download the malware – play at your own risk!

All the malware samples can be found on my github (https://github.com/daveti/mre). All malware binaries are compressed with 7-Zip and protected with the password “malware”. This post is about the first malware/ransomware sample uploaded. Before you start, make sure you have a Windows VM (KVM/VirtualBox/VMware) ready, with networking disconnected from the host machine.

2. Summary

This malware is a kind of ransomware. The IP/domain it contacts is encoded rather than stored in plain text. The encryption routine is also a DIY method that does not call existing crypto libraries. Most imports are file operations such as ReadFile/CreateFile/DeleteFile. 2 new entries are added into the registry, and one of them is the malware itself. All *.doc, *.txt, *.jpg, etc. files under C: are encrypted. A DNS query is also triggered for the domain “time.windows.com”. The file “CryptoLogFile.txt” may be used to detect this malware, since it is created first to log all the files encrypted. “time.windows.com” does not seem to be a helpful signature, since it is a valid domain.

3. Static analysis

  • Is it packed?

Seems not.

image34

PEiD: Nothing Found (but shows Win32 GUI as its subsystem).

image01

PEview: A lot of imports can be found.

image11

pestudio: same as PEview

If PEiD is able to detect a packer, it provides information about the packer, which can be used to find the unpacker. If PEiD fails, we have to fall back on PEview/pestudio and investigate the imports and section contents manually.

  • Compilation date?

image13

PEview: 2009/10/09.

  • GUI or CLI?

Seems CLI.

PEview: There are no GUI-related functions or DLLs found in the text section.

image30

Depends: It only depends on user32.dll, kernel32.dll, and shell32.dll.

  • Imports?

PEview: file-related operations (CreateFile, DeleteFile, FindFile, ReadFile, WriteFile), a bunch of ‘get’ functions (GetCommandLine, GetEnvironmentVariable, GetFileSize, GetLogicalDrives, GetWindowsDirectory), and some string operations (lstrcat, lstrcmp, lstrcpy). A wild guess would be that this malware goes into the Windows directory, removes the target files, and also creates some new files.

image05

  • Strings?

strings2:
IP: NA
URL: NA
Process: NA
File: user32.dll, kernel32.dll, shell32.dll, CryptLogFile.txt, wallpaper.bmp, .txt, .doc, .xls, .db, .mp3, .waw, .jpg, .rtf, .pdf, .zip,

image17

pestudio:

image29

  • Sections and contents?

PEview: text seems OK; rdata contains the import address table, directory table, and name table, as shown under “Imports?” above; data contains 2 interesting file names (CryptLogFile.txt, wallpaper.bmp); rsrc contains some icons, which seem fine.

image20

ResourceHacker: the rsrc section does not appear to have any code embedded.

image28

IDA Pro

  • The first file created

There are 3 subroutines, plus main, that call CreateFile:

image37

image14

The main function prepares the file name to be “c:\windows\CryptoLogFile.txt”,

image33

and then saves it at byte_403F28 after preprocessing (removing the char 22h, i.e. the double quote),

image25

and then calls sub_4015B5, which makes the first call to CreateFile and creates the file using the filename at byte_403F28, i.e. CryptoLogFile.txt.

image00

  • dword 0xCA6B93C9

The DIY encryption routine looks up a value in the table starting at this dword, then XORs it with the original value to do the encryption. If the encryption were public-key based, this secret data buffer could hold a key instead, e.g., a public key used to encrypt, while the matching private key would stay with the attacker asking for ransom.

  • sub_401000

This routine starts with FindFirstFileA to find a specific file, returns if the search fails (locret_401261), and otherwise keeps looping until all the target files have been gone through.

image27

  • sub_40140D

I would name it: read the file, encrypt it into a new file, and remove the original file.

image02

It also calls the shell del command to remove the file:

image24

  • sub_401263

I would rename it: DIY encryption routine, especially after I saw operations like:

lodsb
xor eax, the_secret_look_up_table[edx]
stosb

image19
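
The lodsb/xor/stosb pattern above can be sketched in C as follows. This is only an illustration of the idea, not the malware's actual routine: the table contents (apart from the first dword named above) and the way the index is chosen are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: only the first dword comes from the sample,
 * the other entries are made up for illustration. */
static const uint32_t demo_table[4] = {
	0xCA6B93C9u, 0x11223344u, 0x55667788u, 0x99AABBCCu
};

/*
 * XOR every byte of buf with the low byte of a table entry.
 * Assumption: the table index simply cycles; the real routine may
 * derive edx differently.
 */
static void xor_with_table(uint8_t *buf, size_t len,
			   const uint32_t *table, size_t entries)
{
	assert(entries > 0);
	for (size_t i = 0; i < len; i++) {
		uint32_t key = table[i % entries];	/* xor eax, table[edx] */
		buf[i] ^= (uint8_t)key;			/* stosb stores only AL */
	}
}
```

Because XOR is its own inverse, calling xor_with_table() a second time with the same table restores the original bytes, which is why recovering the table is enough to decrypt this kind of scheme.
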

4. Dynamic analysis

Preparation:

REMnux: start inetsim

image26

Windows:
start apateDNS

image03

start Process Explorer

image38

start Procmon (then pause and clear)

image22

start RegShot (the 1st shot)

image32

Unpause Procmon; execute the malware; pause Procmon (it seemed to hang every time…)

image06

Take the 2nd RegShot

  • Interesting behaviors that occur after the malware has executed.

image31

  • Machines and services the malware attempts to contact by IP or domain or host name.

image08

  • Registry keys created/modified by the malware

image04

  • Files created/modified by the malware

image07

There are also files encrypted outside the Windows directory, e.g., in the Dynamic Analysis directory on the desktop. Since I was scanning only under c:\windows, these files are not shown in RegShot. However, CryptoLogFile lists all the encrypted files (how nice is that).

  • Processes started by the malware

Notepad, and maybe others (Procmon got stuck when paused…)

image38

5. Indicators of compromise

A lot of files have been encrypted, as listed in CryptoLogFile.txt. For example, one of the README.txt files looks like the one below. And, of course, there is the “new” wallpaper with an introduction to the ransomware and ways to pay the ransom.

image35

image21

6. Disinfection and remedies

To make sure this ransomware will not start again, we need to clean up the registry. If there is a data backup (there should be) or a system snapshot, do a restore: yeah, problem solved. If there is no data backup but I am able to reverse the encryption routine (DIY crypto could be more vulnerable than common crypto methods and implementations), then it is time to learn maths and assembly. Otherwise, and this may be the most common way, pay the ransom.


gcc, llvm, and Linux kernel

This post talks about what happened recently in the Linux kernel mailing list discussion. While this post does not dig into compiler internals or the whole picture between the Linux kernel and compilers, we discuss 2 specific issues from gcc and llvm respectively. The gcc issue may be a quirk but the llvm issue is definitely a bug. Keep reading…

1. leal %P1(%%esp),%0

The title is the inline assembly used at arch/x86/boot/main.c line 121. What seems weird is the ‘P’ in ‘%P1’, which is uncommon compared to the ‘%1’ we are used to seeing in gcc inline assembly. So what the heck is it[1]? Let us put this kernel inline assembly into a main function where we can play with gcc easily:

#include <stdio.h>
#define STACK_SIZE	512
static int stack_end;

int main()
{
	asm("leal %P1(%%esp),%0"
		: "=r" (stack_end)
		: "i" (-STACK_SIZE));

	return 0;
}

Then we compile the code to assembly (gcc -S) and look at the output, where we can see the inline assembly is emitted as follows:

leal -512(%esp),%eax

This is exactly what we want for ‘leal’. In short, gcc does not complain about this ‘P’ at all. What if we remove the ‘P’ and look at the assembly again? After a quick try, here is the inline assembly generated by gcc:

leal $-512(%esp),%eax

Oops, gcc recognizes that ‘%1’ is an immediate value and prepends ‘$’ (AT&T style) automatically. This may be right in most cases but is definitely wrong for the displacement in ‘lea’. As a matter of fact, if I try to compile the code directly, gcc will not let me do that. Now it is clear that the tricky ‘P’ in ‘%P1’ is there to print the bare constant and make gcc happy. Note that I am using gcc 4.9.2. The latest gcc (5/6?) seems to have fixed this quirk already, generating the same correct assembly with or without the mysterious ‘P’. Go try it yourself.

2. pushf/popf

The original issue was reported from usbhid testing using an llvm-compiled kernel[2]. After further debugging by kernel developers, the root cause of the bug became clear, and it points to llvm rather than the kernel code itself[3]. Let us go through the example described on the llvm mailing list. Here is the source file:

#include <stdlib.h>
#include <stdbool.h>

/* Assume foo changes the IF in EFLAGS */
void foo(void);
int a;

int bar(void)
{
	foo();
	bool const zero = a -= 1;
	asm volatile ("" : : : "cc");
	foo();
	if (zero) {
		return EXIT_FAILURE;
	}
	foo();
	return EXIT_SUCCESS;
}

The point is that foo() may (or may not) change the IF bit in EFLAGS. Compile it to generate the object file (clang -O2  -c -o ) and disassemble it as shown below (objdump -S):

[daveti@daveti c]$ objdump -S llvm_if_issue.o

llvm_if_issue.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <bar>:
   0:	53                   	push   %rbx
   1:	e8 00 00 00 00       	callq  6 <bar+0x6>
   6:	ff 0d 00 00 00 00    	decl   0x0(%rip)        # c <bar+0xc>
   c:	9c                   	pushfq
   d:	5b                   	pop    %rbx
   e:	e8 00 00 00 00       	callq  13 <bar+0x13>
  13:	b8 01 00 00 00       	mov    $0x1,%eax
  18:	53                   	push   %rbx
  19:	9d                   	popfq
  1a:	75 07                	jne    23 <bar+0x23>
  1c:	e8 00 00 00 00       	callq  21 <bar+0x21>
  21:	31 c0                	xor    %eax,%eax
  23:	5b                   	pop    %rbx
  24:	c3                   	retq

Let us focus on the interesting part:

   c:	9c                   	pushfq
   d:	5b                   	pop    %rbx
   e:	e8 00 00 00 00       	callq  13 <bar+0x13>
  13:	b8 01 00 00 00       	mov    $0x1,%eax
  18:	53                   	push   %rbx
  19:	9d                   	popfq

As you can see here, before bar() calls foo(), it saves EFLAGS using ‘pushf’ and pops the value into %rbx. After foo() is done, it pushes %rbx back and restores EFLAGS using ‘popf’. The compiler does this to keep the zero flag set by ‘decl’ alive across the call for the later ‘jne’. Remember our assumption: foo() may change the IF bit in EFLAGS! Now we can explain the bug found in usbhid. The foo() is spin_lock_irq(), and the bar() is usbhid_close(). While spin_lock_irq() makes sure interrupts are disabled, usbhid_close() restores the old value of EFLAGS, undoing what happened in spin_lock_irq().
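
To make the effect concrete, here is a toy user-space model of the save/call/restore sequence. It does not touch the real EFLAGS register; the eflags variable and the function names are stand-ins invented for this sketch, and only the IF bit position (bit 9, 0x200) is taken from the x86 architecture.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_IF 0x200u	/* interrupt-enable flag, bit 9 */

/* Simulated flags register; interrupts start enabled. */
static uint32_t eflags = X86_EFLAGS_IF;

/* Stand-in for spin_lock_irq(): turns interrupts off. */
static void foo(void)
{
	eflags &= ~X86_EFLAGS_IF;
}

/* What the generated code effectively does around the call. */
static void bar_as_compiled(void)
{
	uint32_t saved = eflags;	/* pushfq; pop %rbx */

	foo();				/* callq foo */
	assert(!(eflags & X86_EFLAGS_IF));	/* foo() did disable interrupts */
	eflags = saved;			/* push %rbx; popfq: IF is back on */
}
```

After bar_as_compiled() returns, the model has IF set again even though foo() cleared it, which is the behavior that broke the locking in usbhid_close().
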

3. Summary

The gcc quirk may reflect a hackish fix in gcc's early days to satisfy a kernel compilation requirement. After all, gcc is the only compiler that can build the Linux kernel without any extra kernel patches. As such, the Linux kernel is the only project leveraging various gcc features that other projects would never bother with. On the other hand, llvm is catching up. There are already kernel patches to make llvm compile the kernel, and people are testing llvm kernel images. Nevertheless, the EFLAGS clobbering issue in llvm optimization may be a showstopper. Most user-space applications do not care about interrupts; however, getting them right is a core requirement for the kernel to work as expected. As Linus pointed out: “Using pushf/popf in generated code is completely insane (unless done very localized in a controlled area).”

4. Reference

[1]http://comments.gmane.org/gmane.linux.kernel.kernelnewbies/52259
[2]https://lkml.org/lkml/2016/3/1/160
[3]http://lists.llvm.org/pipermail/llvm-dev/2015-July/088780.html


Defending Against Malicious USB Firmware with GoodUSB

Finally, 4 months after our paper was accepted by ACSAC’15, I can now write a blog post about our work, GoodUSB, and release the code; the delay was due to some software patent bul*sh*t. (I sincerely think software patents should have been abolished from the very start!) Anyway, this post is all about malicious USB firmware, BadUSB attacks, and our defense solution in the Linux kernel, GoodUSB. Go ahead and download GoodUSB and play with it. Any questions, shoot me an email.

0. In memory of the old paper title given by Dr. Bates

GoodUSB: How I Learned to Stop Worrying and Love the Rubber Ducky

1. A quote from a chat in Skype[1]

“I read an article about how a dude in the subway fished out a USB flash drive from the outer pocket of some guy’s bag. The USB drive had “128” written on it. He came home, inserted it into his laptop and burnt half of it down. He wrote “129” on the USB drive and now has it in the outer pocket of his bag…”

2. BadUSB attacks

If you read the reference link, you will find that the “USB flash drive” (or USB Killer) has some embedded capacitors supporting a high negative voltage (-110V). Once charged, these capacitors are able to cause overcurrent on the USB signal line. While we are not going to talk about this in detail (probably in another post), it does leave us with a question: “Arriving at work, you find a USB drive on your table. What would you do?”[1]

In reality, data exfiltration or backdoor injection is preferred to burning down the machine. This is what the USB rubber ducky[2] is designed for. As a penetration testing tool, the USB rubber ducky looks like a USB thumb drive, but quacks like a keyboard, and types like a keyboard. Therefore, it is a keyboard. Besides the appearance, the only difference between a keyboard and a USB rubber ducky is that the ducky does not need a human being to type the keystrokes: an adversary can write a malicious script, compile and load it into the ducky, and the ducky will execute it once plugged in. How cool is that! A more powerful programmable USB device is Teensy[3]. Teensy 3.1 development has been integrated into the Arduino IDE. As with the USB Killer, the best possible defense would be to open the case and look at the PCB carefully (as long as you know what you are looking at…).

Unfortunately, the BadUSB[4] attacks presented at BlackHat 2014 made our best effort so far in vain. Rather than requiring a specific USB microcontroller, people could write malicious firmware themselves for common USB microcontrollers, thanks to the existing firmware building tools[5]. This means that a USB flash drive could behave like a storage device and a keyboard at the same time. While the storage part provides the normal usage, the keyboard part is essentially a USB rubber ducky. The problem now is that we will not know whether the firmware is malicious until the device is plugged in. And most of the time, we do not even know that there is a keyboard enabled, since it happens within the OS. Now let us repeat the question: “Arriving at work, you find a USB drive on your table. What would you do?”

3. Root Cause Analysis

The root of BadUSB attacks lies in the USB spec. A USB device is able to have multiple functionalities (interfaces). Think about a USB headset, which contains audio functionalities (speaker + microphone) and an input/keyboard functionality (volume control). Therefore, from the spec's point of view, there is no violation when a USB storage device has a keyboard functionality (and for some storage devices, this extra keyboard functionality may be needed, as we will see later). In reality, when a USB device is plugged into the host machine, it can report any functionalities (interfaces) that need the OS’s support. The OS tries its best to find the corresponding driver to serve each functionality. Think about a BadUSB thumb drive. When it is plugged into the host machine, it reports both a storage interface and an input (keyboard) interface during USB enumeration (the procedure by which the host machine recognizes the device). The OS then loads a storage driver and an input driver to make the device function. Once the input driver is loaded, the BadUSB device types a malicious script (like a human being), which is executed by the OS automatically. All of this happens in the OS within a second, while the user is going through the files saved in the storage.
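
As a rough illustration of what the host sees during enumeration, the sketch below models a device as a list of interface class codes and checks whether any of them is HID. The class codes are the standard ones from the USB specification; the struct and function names are hypothetical and much simpler than real USB descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Interface class codes defined by the USB specification. */
enum {
	USB_CLASS_AUDIO        = 0x01,
	USB_CLASS_HID          = 0x03,	/* keyboards, mice, ... */
	USB_CLASS_MASS_STORAGE = 0x08,
};

#define TOY_MAX_IFACES 4

/* Hypothetical, simplified view of an enumerated device. */
struct toy_usb_device {
	const char *label;			/* what the user sees */
	uint8_t iface_class[TOY_MAX_IFACES];	/* bInterfaceClass values */
	size_t num_ifaces;
};

/* True if any reported interface is a HID interface. */
static bool reports_hid(const struct toy_usb_device *dev)
{
	assert(dev->num_ifaces <= TOY_MAX_IFACES);
	for (size_t i = 0; i < dev->num_ifaces; i++)
		if (dev->iface_class[i] == USB_CLASS_HID)
			return true;
	return false;
}
```

A device labeled as a thumb drive that reports both 0x08 and 0x03 is perfectly legal under the spec, and a stock OS would bind a driver to each interface without asking anyone.
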

4. GoodUSB

The OS knows nothing about the USB device but is able to load different drivers to make the device happy (work); the user knows something about the device, e.g., from the appearance of the device, but is not able to interpose between the OS and the device. To bridge this semantic gap, ideally, we need a way to let the user and the OS talk:

User: I have just plugged in a USB flash drive.
OS: OK. I will not allow it to have a keyboard functionality then.

Essentially, this is GoodUSB.

As an end-to-end, systematic solution defending against malicious USB firmware, GoodUSB includes not only a customized Linux kernel but also a user-space daemon with a GUI and a honeypot KVM (HoneyUSB) to which suspicious devices are redirected at run time or start time. While I am not going to list technical details here, I put the GoodUSB architecture figure here to give a flavor, and refer interested readers to our paper.

goodusb_arch

When the USB device is plugged into the host machine for the first time, the device class identifier in the kernel tries to fingerprint the firmware to get a signature (SHA1). The kernel then suspends further actions and sends the information about the device to the user-space daemon before enabling the device. The GoodUSB user-space daemon (gud) pops up a GUI asking for the user’s expectation of this device, as shown in the figure below:

goodusb_stupid_user_mode

Note that the choices in the GUI are high-level descriptions of the device without any low-level USB spec terms. One beauty of GoodUSB is that once the user gives a general description of the device, the policy engine within gud is able to find the functionalities (interfaces) required to enable the device with the least “permission” (if we treat drivers as permissions). For instance, if the user chooses “USB Storage”, no keyboard (input) functionality will be enabled, for sure. After this, another GUI pops up letting the user bind this device to a security picture (just like a security picture when logging into an online banking system). Now gud has all the information it needs. Besides updating the local device database, it relays all the information to the kernel, which can then configure the device as needed and expected by the user. When the device is plugged in for the 2nd time, the kernel is able to recognize it and asks for confirmation from the user via gud:

sec_pic_user_mode

However, if the device is shown with a green dinosaur but the user knows it should be a red one, then the user knows that the firmware of the device has been changed (to mimic the device bound to the green dinosaur). In that case, after “This is NOT my device!” is clicked, the device is redirected into HoneyUSB, where we have implemented a USB profiler (usbpro) to inspect the behavior of the device. Even though GoodUSB was designed against BadUSB attacks, its ability to customize which functionalities are enabled for a USB device is also invaluable in daily use. E.g., GoodUSB is able to shut down the microphone in a USB headset while leaving the speaker working as usual.
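
The least-“permission” idea can be sketched as a small filter from the user's high-level choice to the interface classes that may be enabled. This is not the policy engine from the paper or the released code; the choice names and the mapping are assumptions made up for this example, and only the class codes come from the USB specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard USB interface class codes. */
enum { CLASS_AUDIO = 0x01, CLASS_HID = 0x03, CLASS_MASS_STORAGE = 0x08 };

/* Hypothetical user-facing choices, in the spirit of the gud GUI. */
enum user_choice { CHOICE_STORAGE, CHOICE_KEYBOARD, CHOICE_HEADSET };

/* Hypothetical policy: which interface classes a choice may enable. */
static bool policy_allows(enum user_choice choice, uint8_t iface_class)
{
	switch (choice) {
	case CHOICE_STORAGE:
		return iface_class == CLASS_MASS_STORAGE;
	case CHOICE_KEYBOARD:
		return iface_class == CLASS_HID;
	case CHOICE_HEADSET:
		return iface_class == CLASS_AUDIO || iface_class == CLASS_HID;
	}
	return false;
}

/* Bit i of the result is set if interface i would get a driver. */
static unsigned enabled_mask(enum user_choice choice,
			     const uint8_t *classes, unsigned n)
{
	unsigned mask = 0;

	assert(n <= 32);
	for (unsigned i = 0; i < n; i++)
		if (policy_allows(choice, classes[i]))
			mask |= 1u << i;
	return mask;
}
```

With this toy policy, a device described as storage that reports a storage and a HID interface only gets its storage interface enabled, while the HID interface is left without a driver.
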

5. Limitations

As with other 0-day stuff, GoodUSB is not able to defend against 0-day malicious firmware. If the keyboard itself is able to input scripts automatically, there is nothing GoodUSB can do. As readers may have realized, GoodUSB relies on trust in the drivers. If a driver is malicious, GoodUSB does not work. Another thing I have to mention here is USB quirks. Although we have tried to cover as many devices as possible, there are always quirky USB devices that will not function properly with GoodUSB. One example is the Yubikey, which looks like a thumb drive, has a USB hub functionality, and behaves like a keyboard. The last limitation comes from us, human beings. GoodUSB uses a GUI and security pictures in the hope of helping users make better judgements. Again, this is our hope. We have not done and will not do any user study to show the validity of using a GUI and security pictures. Usability is beyond the scope of the paper.

6. RtDC
paper: https://github.com/daveti/daveti/raw/master/paper/acsac15/acsac2015djt.pdf
code: https://github.com/daveti/GoodUSB

References:
[1]http://kukuruku.co/hub/diy/usb-killer
[2]http://hakshop.myshopify.com/products/usb-rubber-ducky-deluxe?variant=353378649
[3]https://www.pjrc.com/teensy/teensy31.html
[4]https://srlabs.de/badusb/
[5]https://github.com/daveti/badusb


Linux kernel hacking – one relay file for all CPUs

I wrote a post about kernel relay 2 years ago (https://davejingtian.org/2013/06/29/relay-linux-kernel-relay-filesystem/). However, I realized that I did not really understand relay until recently, when I was debugging a relay-related bug. Though I was working on the RHEL 2.6.32 kernel, this post also applies to the latest 4.3 kernel at the time of writing. After all, kernel relay has been stable for more than a decade. May this post help you understand kernel relay a little bit better.

0. When relay is init’d normally

As my old post described, when the relay is initialized normally, there should be a relay file under /sys/kernel/debug for each CPU. As you would expect, there is a per-cpu buffer in struct rchan to avoid potential locking among different CPUs.

	struct rchan_buf *buf[NR_CPUS]; /* per-cpu channel buffers */

The user-space code then has to go through all the relay files (using select()) to receive the data from the kernel.
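
A minimal sketch of that user-space side is shown below, assuming one already-open descriptor per relay file. The helper name read_any() is made up for this example, and it works on any readable descriptors, so nothing in it is specific to relay.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Block until at least one of the n descriptors is readable, then read
 * from the first ready one into out. Returns the number of bytes read
 * (or -1 on error) and stores the index of that descriptor in *which.
 */
static ssize_t read_any(const int *fds, int n, char *out, size_t cap,
			int *which)
{
	fd_set rset;
	int maxfd = -1;

	assert(n > 0);
	FD_ZERO(&rset);
	for (int i = 0; i < n; i++) {
		FD_SET(fds[i], &rset);
		if (fds[i] > maxfd)
			maxfd = fds[i];
	}
	if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
		return -1;
	for (int i = 0; i < n; i++) {
		if (FD_ISSET(fds[i], &rset)) {
			*which = i;
			return read(fds[i], out, cap);
		}
	}
	return -1;
}
```

In a real consumer the fds array would hold one descriptor per CPU relay file, and read_any() would be called in a loop; with a single global file a plain blocking read() is enough.
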

1. Could we just have 1 relay file in the user-space?

When relay_open() is called to start the relay, the per-cpu buffer would be created one by one given the total number of CPUs online:

	for_each_online_cpu(i) {
		chan->buf[i] = relay_open_buf(chan, i);
		if (!chan->buf[i])
			goto free_bufs;
	}

When the kernel starts to relay something in relay_write():

	buf = chan->buf[smp_processor_id()];

smp_processor_id() returns the id of the CPU the code is currently running on, and the corresponding per-cpu buffer is used by that CPU to hold the data. Now the question is: could we make all the CPUs use just one “per-cpu” buffer?

2. A dirty hack!

Short answer is yes. This is done by a dirty hack in the struct rchan_callbacks:

	struct dentry *(*create_buf_file)(const char *filename,
					  struct dentry *parent,
					  umode_t mode,
					  struct rchan_buf *buf,
					  int *is_global);

In short, besides all the relay code we had, we also need to tune the callback named create_buf_file() and set is_global to 1 (true). Besides letting us customize the location of the relay file, this callback also gives us a chance to let the kernel know that we want a “global” buffer for all CPUs.

Here is the reason why I call it dirty:

	if (chan->is_global)
		return chan->buf[0];

	buf = relay_create_buf(chan);
	if (!buf)
		return NULL;

	if (chan->has_base_filename) {
		dentry = relay_create_buf_file(chan, buf, cpu);
		if (!dentry)
			goto free_buf;
		relay_set_buf_dentry(buf, dentry);
	}

When relay_open_buf() is called by the relay_open() for CPU0, the is_global flag saved in the struct rchan is still 0 (false) after initialization. So the per-cpu buffer will be created for CPU0 and relay_create_buf_file() will be called to create the relay file in the filesystem. But, before relay_create_buf_file returns, it calls our create_buf_file callback (finally!):

	dentry = chan->cb->create_buf_file(tmpname, chan->parent,
					   S_IRUSR, buf,
					   &chan->is_global);

Remember that we have set is_global to 1 in our callback? Here our callback writes that value straight into the is_global flag saved in struct rchan in the kernel. How dirty is that! When relay_open_buf() tries to create a “per-cpu” buffer for CPU1, it recognizes the is_global flag and makes the CPU1 buffer point to CPU0’s.
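
The control flow is easier to see in a toy user-space model. Everything below (the toy_ types, TOY_NR_CPUS, want_global()) is invented for this sketch and only mirrors the shape of the kernel code quoted above; it is not the relay implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_CPUS 4

struct toy_buf { int created_for_cpu; };

struct toy_chan {
	struct toy_buf *buf[TOY_NR_CPUS];
	int is_global;
	/* Same idea as create_buf_file(): may set *is_global. */
	void (*create_buf_file)(int *is_global);
};

/* Our callback: ask for one buffer shared by all CPUs. */
static void want_global(int *is_global)
{
	*is_global = 1;
}

static struct toy_buf *toy_open_buf(struct toy_chan *chan, int cpu)
{
	struct toy_buf *buf;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (chan->is_global)		/* only true after the CPU0 call */
		return chan->buf[0];

	buf = malloc(sizeof(*buf));	/* never freed in this toy model */
	if (!buf)
		return NULL;
	buf->created_for_cpu = cpu;
	chan->create_buf_file(&chan->is_global);	/* the flag is set here */
	return buf;
}

static void toy_open(struct toy_chan *chan)
{
	for (int i = 0; i < TOY_NR_CPUS; i++)
		chan->buf[i] = toy_open_buf(chan, i);
}
```

After toy_open() every slot in buf[] points at the buffer created for CPU0, so a lookup by CPU id always lands on the same buffer no matter which CPU is writing.
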

3. Global buffer vs. per-cpu buffer

A global buffer is friendly to user space, since no select() is needed. However, because all CPUs try to write to the same buffer, some locking mechanism is needed to serialize access, as well as a buffer big enough to satisfy all CPUs over a given period. Moreover, if the system is NUMA, a global buffer is apparently a bad idea, and one should stick with per-cpu buffers to take advantage of NUMA.


Linux kernel hacking – support SO_PEERCRED for local TCP socket connections

In my old post (https://davejingtian.org/2015/02/17/retrieve-pid-from-the-packet-in-unix-domain-socket-a-complete-use-case-for-recvmsgsendmsg/), we talked about how to retrieve the peer PID from a Unix domain socket using struct ucred. A smarter way to do this is to use the getsockopt() syscall with option SO_PEERCRED directly. As you expected (or not), this mechanism only works for Unix domain sockets. After all, why would we be interested in the PID of a peer socket on another machine? But what about local TCP/UDP connections? Why couldn’t we have this mechanism as well? This post gives technical details of how to implement SO_PEERCRED support for local TCP socket connections within the Linux kernel. For more information, please R.t.D.C.

0. Finding the PID given the socket in the user space

To motivate this a little, consider the task in the title. I am sure most sysadmins have had a similar experience: finding the process that is using a specific socket. The most common way is to use netstat and grep. It works, though it is pretty slow. Using libc system() with a simple netstat script embedded yields an overhead of around 80 ms. Still, this is fine if the task is a one-time shot and is not the bottleneck of the whole program. Otherwise, we can ask whether we could do better.

In my opinion, this is part of the reason why ss was created. ss leverages a kernel module called tcp_diag, which uses the Linux kernel inet diagnostic interface to hook into TCP sockets, to speed up retrieving TCP connection information from the kernel through the inet diag netlink socket, rather than crudely digging around /proc (which is what netstat does). Thanks to tcp_diag, ss is able to know the backend file descriptor (FD) of the socket, based on which a /proc/X(pid)/fd/ search can reveal the right PID. A normal ss run to find the PID using TCP port 22 (SSH) takes around 8 ms. Note that you have to make sure the tcp_diag kernel module is loaded. Otherwise, ss will do the same as netstat. The problem with ss is that it still needs to go through all of /proc/X/ to get the mapping between PID and FD, which is not scalable. Besides, 8 ms is still a big overhead for some user-space applications. So, can we make it faster?
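
To see why the /proc walk does not scale, here is a minimal sketch of the check that has to be repeated for every process: scan its fd directory for a symlink of the form socket:[inode]. The helper names are invented for this example, and this is not how ss itself is written; it only illustrates the kind of search described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>		/* socket(), for trying the helpers out */
#include <sys/stat.h>
#include <unistd.h>

/* Inode number of an open socket, as it appears in /proc/<pid>/fd. */
static unsigned long socket_inode(int fd)
{
	struct stat st;

	return fstat(fd, &st) == 0 ? (unsigned long)st.st_ino : 0;
}

/*
 * True if some entry under <proc_pid_dir>/fd links to "socket:[inode]".
 * proc_pid_dir is e.g. "/proc/1234" or "/proc/self".
 */
static bool pid_holds_socket(const char *proc_pid_dir, unsigned long inode)
{
	char path[PATH_MAX], target[64], want[64];
	struct dirent *de;
	bool found = false;
	DIR *dir;

	assert(proc_pid_dir != NULL);
	snprintf(path, sizeof(path), "%s/fd", proc_pid_dir);
	dir = opendir(path);
	if (!dir)
		return false;
	snprintf(want, sizeof(want), "socket:[%lu]", inode);

	while (!found && (de = readdir(dir)) != NULL) {
		ssize_t n;

		snprintf(path, sizeof(path), "%s/fd/%s", proc_pid_dir,
			 de->d_name);
		n = readlink(path, target, sizeof(target) - 1);
		if (n < 0)
			continue;	/* "." and ".." are not symlinks */
		target[n] = '\0';
		found = strcmp(target, want) == 0;
	}
	closedir(dir);
	return found;
}
```

Finding the owner of one socket means calling pid_holds_socket() for every numeric directory under /proc, so the cost grows with the number of processes and open descriptors on the machine.
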

1. Supporting SO_PEERCRED for local TCP socket connections in the Linux kernel

Finally, we are getting to the core of this post! Yes, we can make it faster. I mean really fast, less than 30 us! You are now finally interested in what I have done, right? Let us recall what we have done for Unix domain sockets. To retrieve the PID of the peer socket, all we need is a getsockopt() syscall with option SO_PEERCRED. Therefore, the overhead seen from user space is just the overhead of the getsockopt() syscall. Doesn’t this sound exciting! What we are going to do is implement a similar mechanism for local TCP sockets. Warning: this may require some Linux kernel networking knowledge beforehand for a better understanding. E.g., it is good to know what an skb is. Nevertheless, I will try to make things easier to understand while not offending other kernel hackers:) Ready? Go!

a. Look into SO_PEERCRED

When getsockopt() syscall is called with SO_PEERCRED in the user space, the code path goes into sock_getsockopt() in net/core/sock.c. You will find the code snippet for Linux kernel 2.6.32:

	case SO_PEERCRED:
		if (len > sizeof(sk->sk_peercred))
			len = sizeof(sk->sk_peercred);
		if (copy_to_user(optval, &sk->sk_peercred, len))
			return -EFAULT;
		goto lenout;

As one can tell, all it does is copy sk->sk_peercred, which is a struct ucred containing pid/uid/gid, to user space. This code works for Unix domain sockets, and now we will make it work for TCP sockets. The take-away here is that we now know where we should put the PID. BTW, sk is a struct sock, the network layer representation of a socket in the kernel.
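
For reference, the user-space side of this call on a Unix domain socket looks roughly like the sketch below. The helper name peer_pid() is made up for this example; getsockopt(), SO_PEERCRED and struct ucred are the standard Linux interfaces.

```c
#define _GNU_SOURCE		/* for struct ucred */
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* PID of the peer of a connected Unix domain socket, or -1 on error. */
static pid_t peer_pid(int fd)
{
	struct ucred cred;
	socklen_t len = sizeof(cred);

	if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
		return -1;
	assert(len == sizeof(cred));
	return cred.pid;	/* cred.uid and cred.gid are filled in too */
}
```

A quick way to try it is a socketpair(AF_UNIX, SOCK_STREAM, 0, sv) within one process: peer_pid() on either end then reports the process's own PID.
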

b. Make a TCP connection

The next question we need to answer is where a new TCP connection happens, since we want to find the peer PID as soon as a new connection comes in. The kernel API tcp_v4_conn_request() in net/ipv4/tcp_ipv4.c is the answer. This function takes 2 parameters: a struct sock *sk, standing for the TCP server, and a struct sk_buff *skb, standing for a packet passing through the whole TCP/IP stack within the kernel (yep, you heard me: skb is the key to Linux kernel networking hacking, though I am not going to say more).

int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{

What this function does is to accept/reject a new TCP connection request from skb. Another interesting thing in this function is a security hook:

	if (security_inet_conn_request(sk, skb, req))
		goto drop_and_free;

This security hook gives the LSM (Linux Security Module) a chance to grant/deny the TCP connection based on security policies. To make our kernel hacking as unintrusive as possible, I decided to instrument the selinux_inet_conn_request() API in security/selinux/hooks.c, since CentOS uses SELinux as its LSM.

static int selinux_inet_conn_request(struct sock *sk, struct sk_buff *skb,
				     struct request_sock *req)
{

c. Assume the world is perfect

Look at selinux_inet_conn_request() again. We have a struct sock (*sk) and a connection request packet from the peer (*skb). Moving forward, we find that the skb also keeps a back reference to its parent struct sock. Since we are dealing with local connections, we (at least I) assume that we should be able to trace back to the struct sock from the skb. Then the question is how to retrieve the PID from a struct sock. The answer is skb->sk->sk_socket->file->f_owner.pid, which shows a possible path from the skb back to the backend file of the socket (VFS), where the PID is trivial to get. However, the world is not perfect. We could not even get the reference to the struct sock within the skb. On the other hand, we are sure that skb->sk should point back to its parent struct sock when the skb (packet) is generated from the sock (socket). What is wrong?

d. “I am a strange loop”

All packets are finally queued in the network device for sending and receiving. Because we only consider local connections, all IP packets whose target IP is a local address or 127.0.0.1 are essentially “transmitted” using the loopback device. Let us go to the device driver for this loopback device: loopback_xmit() in drivers/net/loopback.c.

/*
 * The higher levels take care of making this non-reentrant (it's
 * called with bh's disabled).
 */
static netdev_tx_t loopback_xmit(struct sk_buff *skb,
                                 struct net_device *dev)
{
        struct pcpu_lstats *pcpu_lstats, *lb_stats;
        int len;

        skb_orphan(skb);

        skb->protocol = eth_type_trans(skb, dev);

        /* it's OK to use per_cpu_ptr() because BHs are off */
        pcpu_lstats = dev->ml_priv;
        lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());

        len = skb->len;
        if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
                lb_stats->bytes += len;
                lb_stats->packets++;
        } else
                lb_stats->drops++;

        return NETDEV_TX_OK;
}

When a new packet is to be sent locally, the network core calls loopback_xmit() to transmit the packet to the target, which is ourselves! Therefore, to “send” the packet, it calls netif_rx(), which just pushes the packet directly onto the receive queue. A software IRQ is then raised to notify the CPU to handle this “new” packet. A more interesting thing in this function is skb_orphan(). I will let you guess what it does. Yes, it removes the back reference to the parent struct sock from the skb!
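To make the effect of skb_orphan() concrete, here is a minimal userspace sketch. The structures are simplified stand-ins rather than the real kernel definitions, and mock_set_owner(), mock_sock_wfree(), mock_skb_orphan() and the refs counter are hypothetical names invented for illustration. The only thing it tries to mirror is the shape of the kernel helper: run the destructor if there is one, then clear both the destructor and the sk pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel structures (not the real definitions). */
struct sock {
	int refs;	/* hypothetical counter standing in for the socket's buffer accounting */
};

struct sk_buff {
	struct sock *sk;				/* back reference to the owning socket */
	void (*destructor)(struct sk_buff *skb);	/* releases what the owner charged */
};

/* Hypothetical destructor: drop the owner's accounting for this buffer. */
static void mock_sock_wfree(struct sk_buff *skb)
{
	assert(skb->sk != NULL);
	skb->sk->refs--;
}

/* Attach the buffer to its sending socket, as the transmit path would. */
static void mock_set_owner(struct sk_buff *skb, struct sock *sk)
{
	skb->sk = sk;
	skb->destructor = mock_sock_wfree;
	sk->refs++;
}

/* Same shape as skb_orphan(): run the destructor, then cut the link. */
static void mock_skb_orphan(struct sk_buff *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}
```

After mock_skb_orphan() returns, nothing in the buffer points back to the sender any more, which is exactly why the receive side of a loopback connection cannot follow skb->sk.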

e. “Mercy Mercy Me”

OK, let’s try not to “orphan” the skb in the loopback device. Urr, it still does not work. Now we are getting smarter. Let’s do a code search for skb_orphan() in the whole kernel source. Oops, there are tons of call sites around the TCP networking implementation. E.g., when the packet is passed to the IP layer, ip_rcv() in net/ipv4/ip_input.c “orphans” the packet because of tproxy (Transparent Proxy). On one hand, this explains again why we cannot trace back to the struct sock from skb even for local connections; on the other hand, it implies that the kernel basically does not distinguish local packets from non-local packets at the level of skb processing once the packet is received.

f. K.I.S.S.

Though I am personally not in favor of this solution due to the potential cache impact, it is clear that we need a new field in skb to save the PID. Then, during loopback_xmit(), we need to find the PID and assign it to the new skb field, leaving all those “orphan”s to do whatever they wanna do. To find the PID from the struct sock, we have already learned to use sk->sk_socket->file->f_owner.pid. Unfortunately, there is still a problem: the pid within f_owner is NULL! (WTF!) Now we (or at least I) are so angry that we go straight into sock_alloc_file() in net/socket.c, where the file backing the socket is created, and add the damn PID to the damn f_owner.pid. Finally, the world is getting better :)
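The two steps can be sketched in userspace as follows. This is not the actual patch from the repo: the structures are toy stand-ins, peer_pid is a hypothetical name for the new skb field, the mock_* functions are invented for illustration, and a plain pid_t replaces the struct pid pointer the kernel keeps in f_owner. It also assumes the PID is copied before the back reference is cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Userspace stand-ins; field names follow the kernel's, the layout does not. */
struct fown_struct { pid_t pid; };	/* the kernel stores a struct pid * here */
struct file { struct fown_struct f_owner; };
struct socket { struct file *file; };
struct sock { struct socket *sk_socket; };
struct sk_buff {
	struct sock *sk;
	pid_t peer_pid;		/* hypothetical new field that survives orphaning */
};

/* Step 1: record the creator when the socket's backing file is set up. */
static void mock_sock_alloc_file(struct socket *sock, struct file *file)
{
	file->f_owner.pid = getpid();	/* stands in for the current task's PID */
	sock->file = file;
}

/* Walk sk -> sk_socket -> file -> f_owner.pid, tolerating missing links. */
static pid_t mock_sock_owner_pid(const struct sock *sk)
{
	if (!sk || !sk->sk_socket || !sk->sk_socket->file)
		return 0;
	return sk->sk_socket->file->f_owner.pid;
}

/* Step 2: copy the PID into the skb before the back reference is dropped. */
static void mock_loopback_xmit(struct sk_buff *skb)
{
	assert(skb != NULL);
	skb->peer_pid = mock_sock_owner_pid(skb->sk);
	skb->sk = NULL;		/* the effect of skb_orphan() */
}
```

Because the PID travels in the skb itself, the later “orphan” calls on the receive path no longer matter: the security hook can read the field directly instead of chasing skb->sk.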

2. Code

Within the code repo (https://github.com/daveti/tcpSockHack), there are 2 directories. The kernel directory contains a complete Linux kernel 2.6.32 source tree patched with this cool feature, which can be used directly on CentOS 6.7. The user directory contains a simple TCP server/client, where the TCP server uses getsockopt with SO_PEERCRED to retrieve the PID of the TCP client. The kernel log is also included for debugging purposes.
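For readers who have not used SO_PEERCRED before, the call looks roughly like this. This is a sketch and not the server code from the repo; peer_pid() and peer_pid_demo() are names made up here. The demo runs on a Unix domain socketpair because, as far as I know, stock kernels only fill in peer credentials for Unix domain sockets; on the patched kernel the same getsockopt call would be issued on the accepted TCP socket.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes struct ucred in <sys/socket.h> on glibc */
#endif
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the peer's PID as reported by SO_PEERCRED, or -1 on error. */
static pid_t peer_pid(int fd)
{
	struct ucred cred;
	socklen_t len = sizeof(cred);

	assert(fd >= 0);
	if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
		return -1;
	return cred.pid;
}

/* Demo on a Unix domain socketpair: both ends belong to this process. */
static int peer_pid_demo(void)
{
	int sv[2];
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	pid = peer_pid(sv[0]);
	close(sv[0]);
	close(sv[1]);
	return pid == getpid() ? 0 : -1;
}
```

struct ucred also carries the peer's uid and gid, so the same call could report those as well.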

3. What about UDP?

So far, I have neither talked about UDP nor investigated a possible hack for it. It is possible that the implementation for UDP could be similar to the one for Unix domain sockets, since both of them are datagram based; it is also possible, however, that the hack would be heavily intrusive, since UDP is connectionless. Until I find some time to dig around the UDP implementation, all I can say for now is TBD :)

4. K.R.K.C.

I hope you enjoyed this post. This should be my longest post so far, since I have covered a lot of kernel hacking knowledge and it took me the whole night to write it. Any comments are welcome. Finally, life is short; please hack the kernel!


How Linux kernel works – in 4 sentences

I found this in “Understanding the Linux Kernel” (ULK). I believe I have seen a lot of analogies in computer science. But this one is “Simply the Best”. I am not going to put anything here except the original analogy in 4 sentences. Take a few seconds and try to see if this analogy could be the best you have ever seen.

  1. If a boss calls while the waiter is idle, the waiter starts servicing the boss.
  2. If a boss calls while the waiter is servicing a customer, the waiter stops servicing the customer and starts servicing the boss.
  3. If a boss calls while the waiter is servicing another boss, the waiter stops servicing the first boss and starts servicing the second one. When he finishes servicing the new boss, he resumes servicing the former one.
  4. One of the bosses may induce the waiter to leave the customer being currently serviced. After servicing the last request from the bosses, the waiter may decide to drop temporarily his customer and to pick up a new one.

Cross compile user-space applications using Yocto for Gumstix Overo

The official way to add a software package using Yocto is to add a new layer if no recipe exists for the package. Baking the whole image should then include it. However, we are not going to talk about that in this post. Instead, we will look at how to get Wifi working on the board and/or cross-compile the code on the development server.

1. Compile locally

One would expect to have gcc installed in the rootfs (console-image) but it is not. As usual, the corresponding recipe could be added into the image. However, if the gumstix is equipped with a wifi adaptor, bringing it online may be a good idea as well if the corresponding package manager (e.g., opkg) is installed already.

This link (http://wiki.gumstix.org/index.php?title=Overo_Wifi) provides all the information needed to configure a wifi connection. Note that the wpa_supplicant example there is for WPA2. I tried WPA-EAP, which is the only mode supported by my wifi environment, but it failed with a wpa_supplicant parsing error. I was using Gumstix Overo and Yocto 1.6 Daisy. It is not clear whether a newer Yocto Gumstix layer would support WPA-EAP (I highly suspect not). The workaround is simple as long as you have a smartphone that supports hotspot mode. Yep, configure the hotspot to be WPA2 only and you have got it.

2. Cross-compile in the development server

Believe it or not, the approach that works best for me is to cross-compile on the development server. This link (https://github.com/gumstix/yocto-manifest/wiki/Cross-Compile-with-Yocto-SDK) has everything. Note that the SDK provided by the Gumstix team is based on Yocto 1.8. If you are not using this version, build your own SDK:

bitbake gumstix-console-image -c populate_sdk

One tricky thing before starting the cross-compile is the Makefile. Once the environment script is sourced, the environment is set up for the ARM gcc. This means, on one hand, that resetting any of these env variables in the Makefile would break the cross-compile and fall back to a native compile. These env variables include CC, CFLAGS, LDFLAGS, etc. On the other hand, the new Makefile is much simpler.

# Makefile for provd
# using Yocto cross compile
# Sep 1, 2015
# root@davejingtian.org
# http://davejingtian.org

# NOTE:
# Do NOT change CC,CFLAGS,LDFLAGS

OBJS = provd.o nlm.o provmem.o

provd : $(OBJS)
	$(CC) $(CFLAGS) $(LDFLAGS) $(OBJS) -o $@
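A quick way to check that the build really used the ARM toolchain rather than silently falling back to the host gcc is to compile a tiny program that reports its target. This is just a sketch and not part of provd; target_arch() and print_target_arch() are names made up here, and the check relies on the architecture macros that GCC predefines.

```c
#include <assert.h>
#include <stdio.h>

/* Report the architecture the compiler targeted, using GCC's predefined macros. */
static const char *target_arch(void)
{
#if defined(__aarch64__)
	return "aarch64";
#elif defined(__arm__)
	return "arm";
#elif defined(__x86_64__)
	return "x86_64";
#elif defined(__i386__)
	return "i386";
#else
	return "unknown";
#endif
}

/* Print one line so the result is easy to check on the board or the host. */
static int print_target_arch(void)
{
	const char *arch = target_arch();

	assert(arch[0] != '\0');
	return printf("built for: %s\n", arch);
}
```

Call print_target_arch() from main() and build it with the same $(CC) $(CFLAGS) $(LDFLAGS) line as above. If the output names the host architecture, something in the Makefile or the shell has overridden the SDK environment.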