Valgrind has a client request mechanism, which allows a client program to pass information back to valgrind. Examples include asking valgrind to log a message in its own environment, telling valgrind that a range of virtual addresses is being used as a new stack, and so on [1]. This mechanism is essentially a trapdoor built into VEX and recognized during binary translation. We start with a typical usage of the valgrind trapdoor: logging into valgrind from the client. We then remove the dependency on the valgrind header files and craft the trapdoor manually. Note that this post is NOT about how to use the client request mechanism (read the damn manual:), nor about how to add a new client request (which I will talk about in another post on valgrind hacking in general). Last, the code can be found at [2], and have fun:)
1. A typical usage
To use the valgrind trapdoor, we need to include the header <valgrind/valgrind.h>. Let’s take VALGRIND_PRINTF as an example, which asks valgrind to log a message on behalf of the client. The code below prints the magic number in the valgrind log:
    #include ...
    #define valgrind_printf_fmt_str "daveti: trapdoor, magic [%d]\n"
    ...
    int magic = 777;
    /* Normal valgrind trapdoor */
    ret = VALGRIND_PRINTF(valgrind_printf_fmt_str, magic);
    printf("daveti: ret [%d]\n", ret);
When running with valgrind, the output looks like below:
**7382** daveti: trapdoor, magic [777]
daveti: ret [30]
The return value is the total length of the string printed out. Nothing fancy here but do remember that this logging is done by valgrind rather than the client program.
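The value 30 can be cross-checked without valgrind at all. The sketch below assumes the return value counts only the formatted message (not the `**7382**` prefix that valgrind prepends); `formatted_length` is a hypothetical helper name introduced here for illustration.

```c
#include <stdio.h>

#define valgrind_printf_fmt_str "daveti: trapdoor, magic [%d]\n"

/* Length of the formatted message, computed by plain libc.
 * snprintf with a NULL buffer and size 0 writes nothing and only
 * reports how many characters the formatted string would need. */
static int formatted_length(int magic)
{
    return snprintf(NULL, 0, valgrind_printf_fmt_str, magic);
}
```

Calling formatted_length(777) gives 30: 29 visible characters plus the trailing newline.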
2. A better understanding
Alright, so what is that damn VALGRIND_PRINTF thing? Let’s have a deeper view using objdump (-S):
    0000000000400527 <VALGRIND_PRINTF>:
      400527: 55                      push   %rbp
      400528: 48 89 e5                mov    %rsp,%rbp
      40052b: 48 81 ec a8 00 00 00    sub    $0xa8,%rsp
      400532: 48 89 bd e8 fe ff ff    mov    %rdi,-0x118(%rbp)
      400539: 48 89 b5 58 ff ff ff    mov    %rsi,-0xa8(%rbp)
      400540: 48 89 95 60 ff ff ff    mov    %rdx,-0xa0(%rbp)
      400547: 48 89 8d 68 ff ff ff    mov    %rcx,-0x98(%rbp)
      40054e: 4c 89 85 70 ff ff ff    mov    %r8,-0x90(%rbp)
      400555: 4c 89 8d 78 ff ff ff    mov    %r9,-0x88(%rbp)
      40055c: 84 c0                   test   %al,%al
      40055e: 74 20                   je     400580 <VALGRIND_PRINTF+0x59>
      400560: 0f 29 45 80             movaps %xmm0,-0x80(%rbp)
      400564: 0f 29 4d 90             movaps %xmm1,-0x70(%rbp)
      400568: 0f 29 55 a0             movaps %xmm2,-0x60(%rbp)
      40056c: 0f 29 5d b0             movaps %xmm3,-0x50(%rbp)
      400570: 0f 29 65 c0             movaps %xmm4,-0x40(%rbp)
      400574: 0f 29 6d d0             movaps %xmm5,-0x30(%rbp)
      400578: 0f 29 75 e0             movaps %xmm6,-0x20(%rbp)
      40057c: 0f 29 7d f0             movaps %xmm7,-0x10(%rbp)
      400580: c7 85 30 ff ff ff 08    movl   $0x8,-0xd0(%rbp)
      400587: 00 00 00
      40058a: c7 85 34 ff ff ff 30    movl   $0x30,-0xcc(%rbp)
      400591: 00 00 00
      400594: 48 8d 45 10             lea    0x10(%rbp),%rax
      400598: 48 89 85 38 ff ff ff    mov    %rax,-0xc8(%rbp)
      40059f: 48 8d 85 50 ff ff ff    lea    -0xb0(%rbp),%rax
      4005a6: 48 89 85 40 ff ff ff    mov    %rax,-0xc0(%rbp)
      4005ad: 48 c7 85 f0 fe ff ff    movq   $0x1403,-0x110(%rbp)
      4005b4: 03 14 00 00
      4005b8: 48 8b 85 e8 fe ff ff    mov    -0x118(%rbp),%rax
      4005bf: 48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
      4005c6: 48 8d 85 30 ff ff ff    lea    -0xd0(%rbp),%rax
      4005cd: 48 89 85 00 ff ff ff    mov    %rax,-0x100(%rbp)
      4005d4: 48 c7 85 08 ff ff ff    movq   $0x0,-0xf8(%rbp)
      4005db: 00 00 00 00
      4005df: 48 c7 85 10 ff ff ff    movq   $0x0,-0xf0(%rbp)
      4005e6: 00 00 00 00
      4005ea: 48 c7 85 18 ff ff ff    movq   $0x0,-0xe8(%rbp)
      4005f1: 00 00 00 00
      4005f5: 48 8d 85 f0 fe ff ff    lea    -0x110(%rbp),%rax
      4005fc: b9 00 00 00 00          mov    $0x0,%ecx
      400601: 89 ca                   mov    %ecx,%edx
      400603: 48 c1 c7 03             rol    $0x3,%rdi
      400607: 48 c1 c7 0d             rol    $0xd,%rdi
      40060b: 48 c1 c7 3d             rol    $0x3d,%rdi
      40060f: 48 c1 c7 33             rol    $0x33,%rdi
      400613: 48 87 db                xchg   %rbx,%rbx
      400616: 48 89 d0                mov    %rdx,%rax
      400619: 48 89 85 28 ff ff ff    mov    %rax,-0xd8(%rbp)
      400620: 48 8b 85 28 ff ff ff    mov    -0xd8(%rbp),%rax
      400627: 48 89 85 48 ff ff ff    mov    %rax,-0xb8(%rbp)
      40062e: 48 8b 85 48 ff ff ff    mov    -0xb8(%rbp),%rax
      400635: c9                      leaveq
      400636: c3                      retq
A quick walk-through of the code shows that this function does “NOTHING”, except saving some registers on the stack before updating them. This is by design: the valgrind trapdoor should not change any registers or memory when the client program runs without valgrind. In other words, only valgrind is able to interpret this trapdoor and do something with a side effect. Let’s dive into this function.
The first 20 or so lines are the typical setup for a va_list, because VALGRIND_PRINTF accepts variable-length arguments like printf. Then we see a bunch of values stored on the stack, including this magic value 0x1403:
movq $0x1403, -0x110(%rbp)
And then more “useless” code near the end of the function:
    rol  $0x3, %rdi
    rol  $0xd, %rdi
    rol  $0x3d, %rdi
    rol  $0x33, %rdi
    xchg %rbx, %rbx
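After all those rotations, rdi is unchanged, and so is rbx: the rotation counts add up to 128 (3 + 13 + 61 + 51), twice the width of the register, and exchanging rbx with itself is a no-op. Now it is time to look at the valgrind.h file [3] to sort things out, and here it goes: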
    #define __SPECIAL_INSTRUCTION_PREAMBLE                         \
        "rolq $3, %%rdi ; rolq $13, %%rdi\n\t"                     \
        "rolq $61, %%rdi ; rolq $51, %%rdi\n\t"

    #define VALGRIND_DO_CLIENT_REQUEST_EXPR(                       \
            _zzq_default, _zzq_request,                            \
            _zzq_arg1, _zzq_arg2, _zzq_arg3, _zzq_arg4, _zzq_arg5) \
        __extension__                                              \
        ({ volatile unsigned long int _zzq_args[6];                \
           volatile unsigned long int _zzq_result;                 \
           _zzq_args[0] = (unsigned long int)(_zzq_request);       \
           _zzq_args[1] = (unsigned long int)(_zzq_arg1);          \
           _zzq_args[2] = (unsigned long int)(_zzq_arg2);          \
           _zzq_args[3] = (unsigned long int)(_zzq_arg3);          \
           _zzq_args[4] = (unsigned long int)(_zzq_arg4);          \
           _zzq_args[5] = (unsigned long int)(_zzq_arg5);          \
           __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE         \
                            /* %RDX = client_request ( %RAX ) */   \
                            "xchgq %%rbx,%%rbx"                    \
                            : "=d" (_zzq_result)                   \
                            : "a" (&_zzq_args[0]), "0" (_zzq_default) \
                            : "cc", "memory"                       \
                           );                                      \
           _zzq_result;                                            \
        })
It turns out those rotation instructions are the essential trapdoor for x86_64. The xchg that follows asks valgrind to perform a client request: rax points to the argument array, the request number is _zzq_args[0], and the return value is placed in rdx. As you might have guessed, the request number for VALGRIND_PRINTF is 0x1403.
In summary, when valgrind sees those rol instructions followed by an xchg, it recognizes the trapdoor and reads the arguments from the array on the stack that rax points to. The first argument, the request number, determines which valgrind function will be called internally. The return value is held in rdx and then further propagated to the client program via rax.
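The reason the preamble is harmless on a real CPU can be checked in plain C. The sketch below models the four rol instructions on a 64-bit value; `rol64` and `apply_preamble` are hypothetical names used only for this illustration, and nothing here talks to valgrind.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by n bits (0 < n < 64), like x86_64 rol. */
static uint64_t rol64(uint64_t v, unsigned n)
{
    return (v << n) | (v >> (64 - n));
}

/* Apply the four rotations of the x86_64 preamble in order.
 * The counts sum to 3 + 13 + 61 + 51 = 128, i.e. two full turns
 * of a 64-bit register, so the input comes back unchanged. */
static uint64_t apply_preamble(uint64_t rdi)
{
    rdi = rol64(rdi, 3);
    rdi = rol64(rdi, 13);
    rdi = rol64(rdi, 61);
    rdi = rol64(rdi, 51);
    return rdi;
}
```

Any input value should come back unchanged, which is why the sequence is safe to execute natively while still being distinctive enough for valgrind to spot.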
3. Fun
Once we know what the trapdoor looks like, we can get rid of the dependency on the valgrind.h header file and craft our own trapdoor. Say we wanna make our own VALGRIND_PRINTF. What we need is a va_list covering the variable-length arguments that follow the format string, plus the trapdoor instructions followed by xchg:
    #include <stdarg.h>   /* va_list, va_start, va_end */

    #define valgrind_printf_code    0x1403
    #define valgrind_printf_fmt_str "daveti: trapdoor, magic [%d]\n"
    #define valgrind_trapdoor_code  \
        "rol $0x3, %%rdi\n\t"       \
        "rol $0xd, %%rdi\n\t"       \
        "rol $0x3d, %%rdi\n\t"      \
        "rol $0x33, %%rdi\n\t"

    static unsigned long valgrind_printf_manual(char *fmt, ...)
    {
        unsigned long args[6] = {0};
        unsigned long ret = 0;
        va_list vargs;

        /* Follow valgrind ABI */
        va_start(vargs, fmt);
        args[0] = (unsigned long)valgrind_printf_code;
        args[1] = (unsigned long)fmt;
        args[2] = (unsigned long)&vargs;

        /* rdx = client_req(rax); rdx is zeroed first as the default result */
        asm volatile ("mov $0x0, %%rdx\n\t"
                      valgrind_trapdoor_code
                      "xchg %%rbx, %%rbx\n\t"
                      : "=d"(ret)
                      : "a"(&args[0])
                      : "cc", "memory");

        va_end(vargs);
        return ret;
    }
That’s it. Now we have a homemade VALGRIND_PRINTF, valgrind_printf_manual, which behaves exactly as the former does, and we do not need to include the valgrind.h header file at all.
NOTE 1: For other client requests (other than VALGRIND_PRINTF), building the arguments should be more straightforward. VALGRIND_PRINTF is tricky due to its variable-length arguments. We chose it because of its obvious side effect (log printing in valgrind).
NOTE 2: While the trapdoor mechanism is the same across architectures, the trapdoor instructions differ from one architecture to another. We limit our focus to x86_64. Nevertheless, all these trapdoor instructions should follow the same design goal: no changes to registers or memory when invoked without valgrind.
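As NOTE 1 suggests, a request without variable-length arguments only needs the six-word argument array. Here is a minimal sketch of a generic helper modeled on VALGRIND_DO_CLIENT_REQUEST_EXPR; `client_request_manual` is a hypothetical name, not something valgrind.h provides, and the trapdoor is compiled only on x86_64 (elsewhere the helper simply returns the default).

```c
/* Hypothetical helper: issue a client request with up to five
 * word-sized arguments. Returns dflt when not running under valgrind. */
static unsigned long client_request_manual(unsigned long dflt,
                                           unsigned long request,
                                           unsigned long a1, unsigned long a2,
                                           unsigned long a3, unsigned long a4,
                                           unsigned long a5)
{
    volatile unsigned long args[6];
    unsigned long result = dflt;

    args[0] = request;   /* request number selects the valgrind handler */
    args[1] = a1;
    args[2] = a2;
    args[3] = a3;
    args[4] = a4;
    args[5] = a5;

#if defined(__x86_64__)
    /* rdx = client_request(rax); rdx is preloaded with the default via
     * the "0" matching constraint, as valgrind.h does. */
    __asm__ volatile("rolq $3, %%rdi ; rolq $13, %%rdi\n\t"
                     "rolq $61, %%rdi ; rolq $51, %%rdi\n\t"
                     "xchgq %%rbx, %%rbx"
                     : "=d"(result)
                     : "a"(&args[0]), "0"(dflt)
                     : "cc", "memory");
#endif
    return result;
}
```

Preloading rdx with the default means a native run gets a well-defined answer instead of whatever happened to be in the register, which is the same choice the macro in valgrind.h makes.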
4. Security vs. Obfuscation
The design of the valgrind trapdoor is delicate and useful. It gives a client program an opportunity to pass useful information to valgrind, e.g., to suppress false positives in memcheck. Meanwhile, because we can craft the trapdoor manually, leaving no trace of valgrind in the binary, a client program is able to detect whether it is running under valgrind. Based on the detection result, the client program may do something totally different (for a PoC, please check out [2]).
From a security perspective, a client program can detect the valgrind running environment and skip malicious behaviors that might otherwise be caught by a valgrind plugin, similar to the VM detection techniques used by malware. From an obfuscation perspective, a client program can also hide critical functionality from being analyzed by valgrind at runtime. Although I have not seen a strong motivation to detect valgrind the way malware detects VMs, the trapdoor mechanism already provides a neat technique to achieve this.
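To make the detection idea concrete, here is a minimal sketch. It is not the PoC from [2]. It assumes the request code 0x1001, which is what valgrind.h defines for VG_USERREQ__RUNNING_ON_VALGRIND; the names `running_under_valgrind` and `pick_behavior` are hypothetical. The trapdoor is compiled only on x86_64.

```c
/* Request code of VG_USERREQ__RUNNING_ON_VALGRIND in valgrind.h. */
#define valgrind_running_code 0x1001

/* Returns non-zero under valgrind, 0 on a native run. */
static unsigned long running_under_valgrind(void)
{
    volatile unsigned long args[6] = { valgrind_running_code, 0, 0, 0, 0, 0 };
    unsigned long ret = 0;

#if defined(__x86_64__)
    /* rdx starts at 0 (the default) and is overwritten only by valgrind. */
    __asm__ volatile("rol $0x3, %%rdi\n\t"
                     "rol $0xd, %%rdi\n\t"
                     "rol $0x3d, %%rdi\n\t"
                     "rol $0x33, %%rdi\n\t"
                     "xchg %%rbx, %%rbx\n\t"
                     : "=d"(ret)
                     : "a"(&args[0]), "0"(0UL)
                     : "cc", "memory");
#endif
    return ret;
}

/* Branch on the detection result, as an evasive client might. */
static const char *pick_behavior(void)
{
    return running_under_valgrind() ? "benign path" : "real path";
}
```

On a native run the trapdoor is a no-op and pick_behavior() takes the "real path" branch; under valgrind the request should be serviced and the other branch taken.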
References:
[1] http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.clientreq
[2] https://github.com/daveti/valtrap
[3] https://github.com/daveti/valgrind/blob/zircon/include/valgrind.h