Valgrind trapdoor and fun

Valgrind has a client request mechanism, which allows a client to pass some information back to valgrind. This includes asks valgrind to do a logging in its own environment, tells valgrind a range of VA being used as a new stack, and etc [1]. This mechansim is essentially a trapdoor built into VEX during the binary translation. We starts with a typical usage of valgrind trapdoor to add a logging into valgrind from the client. We then remove the dependency of valgrind header files and manually craft the trapdoor by ourselves. Note that this post is NOT about how to use the client request mechanism (read the damn manual:), nor on how to add a new client request (which I will talk in another post on valgrind hacking in general). Last, the code can be found at [2], and have fun:)

1. A typical usage

To use the valgrind trapdoor, we need to include the header: <valgrind/valgrind.h>. Let’s take VALGRIND_PRINTF as an example, which asks valgrind to add a logging for the client. The code below prints out the magic number in the valgrind logging:

#include
...
#define valgrind_printf_fmt_str "daveti: trapdoor, magic [%d]\n"
...
int magic = 777;
/* Normal valgrind trapdoor */
ret = VALGRIND_PRINTF(valgrind_printf_fmt_str, magic);
printf("daveti: ret [%d]\n", ret);

When running with valgrind, the output looks like below:
**7382** daveti: trapdoor, magic [777]
daveti: ret [30]
The return value is the total length of the string printed out. Nothing fancy here but do remember that this logging is done by valgrind rather than the client program.

2. A better understanding

Alright, so what is that damn VALGRIND_PRINTF thing? Let’s have a deeper view using objdump (-S):

0000000000400527 :
  400527:	55                   	push   %rbp
  400528:	48 89 e5             	mov    %rsp,%rbp
  40052b:	48 81 ec a8 00 00 00 	sub    $0xa8,%rsp
  400532:	48 89 bd e8 fe ff ff 	mov    %rdi,-0x118(%rbp)
  400539:	48 89 b5 58 ff ff ff 	mov    %rsi,-0xa8(%rbp)
  400540:	48 89 95 60 ff ff ff 	mov    %rdx,-0xa0(%rbp)
  400547:	48 89 8d 68 ff ff ff 	mov    %rcx,-0x98(%rbp)
  40054e:	4c 89 85 70 ff ff ff 	mov    %r8,-0x90(%rbp)
  400555:	4c 89 8d 78 ff ff ff 	mov    %r9,-0x88(%rbp)
  40055c:	84 c0                	test   %al,%al
  40055e:	74 20                	je     400580
  400560:	0f 29 45 80          	movaps %xmm0,-0x80(%rbp)
  400564:	0f 29 4d 90          	movaps %xmm1,-0x70(%rbp)
  400568:	0f 29 55 a0          	movaps %xmm2,-0x60(%rbp)
  40056c:	0f 29 5d b0          	movaps %xmm3,-0x50(%rbp)
  400570:	0f 29 65 c0          	movaps %xmm4,-0x40(%rbp)
  400574:	0f 29 6d d0          	movaps %xmm5,-0x30(%rbp)
  400578:	0f 29 75 e0          	movaps %xmm6,-0x20(%rbp)
  40057c:	0f 29 7d f0          	movaps %xmm7,-0x10(%rbp)
  400580:	c7 85 30 ff ff ff 08 	movl   $0x8,-0xd0(%rbp)
  400587:	00 00 00
  40058a:	c7 85 34 ff ff ff 30 	movl   $0x30,-0xcc(%rbp)
  400591:	00 00 00
  400594:	48 8d 45 10          	lea    0x10(%rbp),%rax
  400598:	48 89 85 38 ff ff ff 	mov    %rax,-0xc8(%rbp)
  40059f:	48 8d 85 50 ff ff ff 	lea    -0xb0(%rbp),%rax
  4005a6:	48 89 85 40 ff ff ff 	mov    %rax,-0xc0(%rbp)
  4005ad:	48 c7 85 f0 fe ff ff 	movq   $0x1403,-0x110(%rbp)
  4005b4:	03 14 00 00
  4005b8:	48 8b 85 e8 fe ff ff 	mov    -0x118(%rbp),%rax
  4005bf:	48 89 85 f8 fe ff ff 	mov    %rax,-0x108(%rbp)
  4005c6:	48 8d 85 30 ff ff ff 	lea    -0xd0(%rbp),%rax
  4005cd:	48 89 85 00 ff ff ff 	mov    %rax,-0x100(%rbp)
  4005d4:	48 c7 85 08 ff ff ff 	movq   $0x0,-0xf8(%rbp)
  4005db:	00 00 00 00
  4005df:	48 c7 85 10 ff ff ff 	movq   $0x0,-0xf0(%rbp)
  4005e6:	00 00 00 00
  4005ea:	48 c7 85 18 ff ff ff 	movq   $0x0,-0xe8(%rbp)
  4005f1:	00 00 00 00
  4005f5:	48 8d 85 f0 fe ff ff 	lea    -0x110(%rbp),%rax
  4005fc:	b9 00 00 00 00       	mov    $0x0,%ecx
  400601:	89 ca                	mov    %ecx,%edx
  400603:	48 c1 c7 03          	rol    $0x3,%rdi
  400607:	48 c1 c7 0d          	rol    $0xd,%rdi
  40060b:	48 c1 c7 3d          	rol    $0x3d,%rdi
  40060f:	48 c1 c7 33          	rol    $0x33,%rdi
  400613:	48 87 db             	xchg   %rbx,%rbx
  400616:	48 89 d0             	mov    %rdx,%rax
  400619:	48 89 85 28 ff ff ff 	mov    %rax,-0xd8(%rbp)
  400620:	48 8b 85 28 ff ff ff 	mov    -0xd8(%rbp),%rax
  400627:	48 89 85 48 ff ff ff 	mov    %rax,-0xb8(%rbp)
  40062e:	48 8b 85 48 ff ff ff 	mov    -0xb8(%rbp),%rax
  400635:	c9                   	leaveq
  400636:	c3                   	retq

A quick code go-thru shows that this function does “NOTHING”, except saving some registers on the stack before updating them. This is actually the design of valgrind trapdoor – it should not change any registers or memory when the client program does not run with valgrind. In other words, only valgrind is able to interpret this trapdoor and do something with side effect. Let’s dive into this function.

The first around 20 lines are a typical usage of va_list, because VALGRIND_PRINTF accepts variable-length arguments like printf. Then we see bunch of values pushed into the stack, including this magic value 0x1403:

movq $0x1403, -0x110(%rbp)

And then more “useless” code near the end of the function:

rol $0x3, %rdi
rol $0xd, %rdi
rol $0x3d, %rdi
rol $0x33, %rdi
xchg %rbx, %rbx

After all those rotations, rdi is unchanged, as well as rbx. Now it is time to look at the valgrind.h file [3] to sort things out, and here it goes:

#define __SPECIAL_INSTRUCTION_PREAMBLE                            \
                     "rolq $3,  %%rdi ; rolq $13, %%rdi\n\t"      \
                     "rolq $61, %%rdi ; rolq $51, %%rdi\n\t"

#define VALGRIND_DO_CLIENT_REQUEST_EXPR(                          \
        _zzq_default, _zzq_request,                               \
        _zzq_arg1, _zzq_arg2, _zzq_arg3, _zzq_arg4, _zzq_arg5)    \
    __extension__                                                 \
    ({ volatile unsigned long int _zzq_args[6];                   \
    volatile unsigned long int _zzq_result;                       \
    _zzq_args[0] = (unsigned long int)(_zzq_request);             \
    _zzq_args[1] = (unsigned long int)(_zzq_arg1);                \
    _zzq_args[2] = (unsigned long int)(_zzq_arg2);                \
    _zzq_args[3] = (unsigned long int)(_zzq_arg3);                \
    _zzq_args[4] = (unsigned long int)(_zzq_arg4);                \
    _zzq_args[5] = (unsigned long int)(_zzq_arg5);                \
    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
                     /* %RDX = client_request ( %RAX ) */         \
                     "xchgq %%rbx,%%rbx"                          \
                     : "=d" (_zzq_result)                         \
                     : "a" (&_zzq_args[0]), "0" (_zzq_default)    \
                     : "cc", "memory"                             \
                    );                                            \
    _zzq_result;                                                  \
    })

Turns out those rotations instructions are the essential trapdoor for x86_64. The xchg is used to ask valgrind to do a client request where the request number is _zzq_args[0] and the return value is saved into rdx. As you might have guessed, the request number is 0x1403 for VALGRIND_PRINTF.

In summary, when valgrind sees those rol instructions followed by an xchg, it recognizes this trapdoor, and passes arguments from the stack. The first argument, request number, determines which valgrind function will be called internally. The return value will be hold in rdx and then futher propagated to the client program via rax.

3. Fun

Once we know what the trapdoor looks like, we can get rid of the dependency on valgrind.h header file, and craft our own trapdoor. Say we wanna make our own VALGRIND_PRINTF. Then what we need are a va_list, filled with format strings and variable-length arguments, and the trapdoor instructions followed by xchg:

#define valgrind_printf_code		0x1403
#define valgrind_printf_fmt_str		"daveti: trapdoor, magic [%d]\n"
#define valgrind_trapdoor_code		\
	"rol $0x3, %%rdi\n\t"		\
	"rol $0xd, %%rdi\n\t"		\
	"rol $0x3d, %%rdi\n\t"		\
	"rol $0x33, %%rdi\n\t"


static unsigned long valgrind_printf_manual(char *fmt, ...)
{
	unsigned long args[6] = {0};
	unsigned long ret = 0;
	va_list vargs;

	/* Follow valgrind ABI */
	va_start(vargs, fmt);
	args[0] = (unsigned long)valgrind_printf_code;
	args[1] = (unsigned long)fmt;
	args[2] = (unsigned long)&vargs;

	/* rdx = client_req(rax); */
	asm volatile ("mov $0x0, %%rdx\n\t"	\
			valgrind_trapdoor_code	\
			"xchg %%rbx, %%rbx\n\t"	\
			: "=d"(ret)		\
			: "a"(&args[0])		\
			: "cc", "memory");
	va_end(vargs);
	return ret;
}

That’s it. Now we have a homemade VALGRIND_PRINT – valgrind_printf_manual, which behaves exactly what the former does, and we do not need to include valgrind.h header file at all.

NOTE 1: For other client requests (other than VALGRIND_PRINTF), the arguments building should be more straight-forward. VALGRIND_PRINTF is tricky due to the usage of variable-length arguments. And we decided it to use it because of its obvisous side effect (log printing in valgrind).

NOTE 2: While the trapdoor mechanism is the same across architectures, the trapdoor instructions are different among different architectures. We limit out focus on x86_64. Nevertheless, all these trapdoor instructions should follow the same deign goal – no changes on registers or memory when invoked without valgrind.

4. Security vs. Obfuscation

The design of valgrind trapdoor is delicate and useful. It gives a client program and opportunity to pass some useful information to valgrind, e.g., to suppress false positives in memcheck. Meanwhile, because we could craft the trapdoor manually, leaving no trace of valgrind in the binary, a client program is able to detect if it is running under valgrind essentially. Based on the detection result, the client program may do something totally different (for PoC, please check out [2]).

From security perspectives, a client program can detect the valgrind running environment, thus skip malicious behaviors which might be found by certain valgrind plugin, similar as VM detection techniques used by malware. From obfuscation points, a client program can also hide critical functionality from being analyzed by valgrind during runtime. Although I have not seen a strong motivation to detect valgrind as VM, the trapdoor mechanism has already provided a neat technique to achieve this.

References:

[1] http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.clientreq
[2] https://github.com/daveti/valtrap
[3] https://github.com/daveti/valgrind/blob/zircon/include/valgrind.h

About daveti

Interested in kernel hacking, compilers, machine learning and guitars.
This entry was posted in Programming, Security and tagged , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.