Pitfalls in negative indexing in C

Negative indexing in C, such as a[-1], is legit, although rarely used. There are reasons (pitfalls) why negative indexing is not recommended. This post discusses these pitfalls when using negative indexing (for fun).

1. Negative indexing within the bound

#include <stdio.h>
#include <string.h>

int main()
{
	int a[10];
	int *p;
	int i;

	for (i = 0; i < 10; i++)
		a[i] = 0x12345678;
	p = &a[9];

	for (i = 0; i < 10; i++) {
		printf("a[%d]=0x%x, %p\n", i, a[i], &a[i]);
		printf("p[-%d]=0x%x, %p\n", i, p[-i], &p[-i]);
	}

	return 0;
}

In the code above, we let pointer p pointing to the end of array a, and print out each element in a using negative indexing with the help of p.

[daveti@daveti c]$ ./negidx0
a[0]=0x12345678, 0x7ffd50edad40
p[-0]=0x12345678, 0x7ffd50edad64
a[1]=0x12345678, 0x7ffd50edad44
p[-1]=0x12345678, 0x7ffd50edad60
a[2]=0x12345678, 0x7ffd50edad48
p[-2]=0x12345678, 0x7ffd50edad5c
a[3]=0x12345678, 0x7ffd50edad4c
p[-3]=0x12345678, 0x7ffd50edad58
a[4]=0x12345678, 0x7ffd50edad50
p[-4]=0x12345678, 0x7ffd50edad54
a[5]=0x12345678, 0x7ffd50edad54
p[-5]=0x12345678, 0x7ffd50edad50
a[6]=0x12345678, 0x7ffd50edad58
p[-6]=0x12345678, 0x7ffd50edad4c
a[7]=0x12345678, 0x7ffd50edad5c
p[-7]=0x12345678, 0x7ffd50edad48
a[8]=0x12345678, 0x7ffd50edad60
p[-8]=0x12345678, 0x7ffd50edad44
a[9]=0x12345678, 0x7ffd50edad64
p[-9]=0x12345678, 0x7ffd50edad40

The output proves what we would expect. Subcriptor [] simply means that E1[E2]=*((E1)+(E2)). So it does not matter whether E2 is positive or negative, and the compiler will take care of pointer type to generate the correct the result pointer (say, whether it is int or char). [1]

2. Negative indexing out of the bound

Things get complicated when we try to reference var a from var b, assuming that both a and b are next to each other…

#include <stdio.h>
#include <string.h>

int main()
{
	char a[10];
	char b[10];

	memset(a, 0x0, 10);
	memset(b, 0x1, 10);

	int i;
	printf("a=%p, b=%p\n", a, b);
	for (i = 0; i < 10; i++) {
		printf("b[%d]=%d, %p\n", i, b[i], &b[i]);
		printf("b[-%d]=%d, %p\n", i, b[-i], &b[-i]);
	}

	return 0;
}

If you, like myself, expect to see some 0/1 interleaved, then the output would let you down:

[daveti@daveti c]$ ./negidx1
a=0x7ffc4c88ecd2, b=0x7ffc4c88ecc8
b[0]=1, 0x7ffc4c88ecc8
b[-0]=1, 0x7ffc4c88ecc8
b[1]=1, 0x7ffc4c88ecc9
b[-1]=0, 0x7ffc4c88ecc7
b[2]=1, 0x7ffc4c88ecca
b[-2]=0, 0x7ffc4c88ecc6
b[3]=1, 0x7ffc4c88eccb
b[-3]=0, 0x7ffc4c88ecc5
b[4]=1, 0x7ffc4c88eccc
b[-4]=0, 0x7ffc4c88ecc4
b[5]=1, 0x7ffc4c88eccd
b[-5]=0, 0x7ffc4c88ecc3
b[6]=1, 0x7ffc4c88ecce
b[-6]=64, 0x7ffc4c88ecc2
b[7]=1, 0x7ffc4c88eccf
b[-7]=6, 0x7ffc4c88ecc1
b[8]=1, 0x7ffc4c88ecd0
b[-8]=16, 0x7ffc4c88ecc0
b[9]=1, 0x7ffc4c88ecd1
b[-9]=0, 0x7ffc4c88ecbf

With all the addresses printed out, I guess you have figured sth out. The pitfall here is the actual memory layout of a and b – since stack grows downwards (from high to low), a has higher starting address than b. That is being said, b[-x] would never reach a, which can be seen from the addresses printed out. Therefore, if we wanna reference a from b, we need positive indexing out of bound, e.g.,

#include <stdio.h>
#include <string.h>

int main()
{
	char a[10];
	char b[10];

	memset(a, 0x0, 10);
	memset(b, 0x1, 10);

	int i;
	printf("a=%p, b=%p\n", a, b);
	for (i = 0; i < 20; i++) {
		printf("b[%d]=%d, %p\n", i, b[i], &b[i]);
	}

	return 0;
}

And here is the output:

[daveti@daveti c]$ ./negidx2
a=0x7ffdbf9bc682, b=0x7ffdbf9bc678
b[0]=1, 0x7ffdbf9bc678
b[1]=1, 0x7ffdbf9bc679
b[2]=1, 0x7ffdbf9bc67a
b[3]=1, 0x7ffdbf9bc67b
b[4]=1, 0x7ffdbf9bc67c
b[5]=1, 0x7ffdbf9bc67d
b[6]=1, 0x7ffdbf9bc67e
b[7]=1, 0x7ffdbf9bc67f
b[8]=1, 0x7ffdbf9bc680
b[9]=1, 0x7ffdbf9bc681
b[10]=0, 0x7ffdbf9bc682
b[11]=0, 0x7ffdbf9bc683
b[12]=0, 0x7ffdbf9bc684
b[13]=0, 0x7ffdbf9bc685
b[14]=0, 0x7ffdbf9bc686
b[15]=0, 0x7ffdbf9bc687
b[16]=0, 0x7ffdbf9bc688
b[17]=0, 0x7ffdbf9bc689
b[18]=0, 0x7ffdbf9bc68a
b[19]=0, 0x7ffdbf9bc68b

Or, if we wanna reference b from a, we could use negative indexing out of bound, e.g.,

#include <stdio.h>
#include <string.h>

int main()
{
	char a[10];
	char b[10];

	memset(a, 0x0, 10);
	memset(b, 0x1, 10);

	int i;
	printf("a=%p, b=%p\n", a, b);
	for (i = 0; i < 10; i++) {
		printf("a[%d]=%d, %p\n", i, a[i], &a[i]);
		printf("a[-%d]=%d, %p\n", i, a[-i], &a[-i]);
	}

	return 0;
}

And here is the output:

[daveti@daveti c]$ ./negidx3
a=0x7ffece838602, b=0x7ffece8385f8
a[0]=0, 0x7ffece838602
a[-0]=0, 0x7ffece838602
a[1]=0, 0x7ffece838603
a[-1]=1, 0x7ffece838601
a[2]=0, 0x7ffece838604
a[-2]=1, 0x7ffece838600
a[3]=0, 0x7ffece838605
a[-3]=1, 0x7ffece8385ff
a[4]=0, 0x7ffece838606
a[-4]=1, 0x7ffece8385fe
a[5]=0, 0x7ffece838607
a[-5]=1, 0x7ffece8385fd
a[6]=0, 0x7ffece838608
a[-6]=1, 0x7ffece8385fc
a[7]=0, 0x7ffece838609
a[-7]=1, 0x7ffece8385fb
a[8]=0, 0x7ffece83860a
a[-8]=1, 0x7ffece8385fa
a[9]=0, 0x7ffece83860b
a[-9]=1, 0x7ffece8385f9

3. Alignment

Things get more complicated when there is memory alignment in the stack, which is pretty common.

#include <stdio.h>
#include <string.h>

typedef struct _total_made_up {
	char c0;
	int i;
	char c1[10];
} tmu;

int main()
{
	tmu a;
	tmu b;

	memset(&a, 0x0, sizeof(tmu));
	memset(&b, 0x1, sizeof(tmu));

	int i;
	printf("a=%p, a.c0=%p, a.i=%p, a.c1=%p, sizeof(a)=%d\n",
		&a, &(a.c0), &(a.i), &(a.c1), sizeof(a));
	printf("b=%p, b.c0=%p, b.i=%p, b.c1=%p, sizeof(b)=%d\n",
		&b, &(b.c0), &(b.i), &(b.c1), sizeof(b));
	for (i = 0; i < 10; i++) {
		printf("a.c1[%d]=%d, %p\n", i, (a.c1)[i], &(a.c1)[i]);
		printf("a.c1[-%d]=%d, %p\n", i, (a.c1)[-i], &(a.c1)[-i]);
	}

	return 0;
}

Here is the output:

[daveti@daveti c]$ ./negidx4
a=0x7ffc4bf3c8e0, a.c0=0x7ffc4bf3c8e0, a.i=0x7ffc4bf3c8e4, a.c1=0x7ffc4bf3c8e8, sizeof(a)=20
b=0x7ffc4bf3c8c0, b.c0=0x7ffc4bf3c8c0, b.i=0x7ffc4bf3c8c4, b.c1=0x7ffc4bf3c8c8, sizeof(b)=20
a.c1[0]=0, 0x7ffc4bf3c8e8
a.c1[-0]=0, 0x7ffc4bf3c8e8
a.c1[1]=0, 0x7ffc4bf3c8e9
a.c1[-1]=0, 0x7ffc4bf3c8e7
a.c1[2]=0, 0x7ffc4bf3c8ea
a.c1[-2]=0, 0x7ffc4bf3c8e6
a.c1[3]=0, 0x7ffc4bf3c8eb
a.c1[-3]=0, 0x7ffc4bf3c8e5
a.c1[4]=0, 0x7ffc4bf3c8ec
a.c1[-4]=0, 0x7ffc4bf3c8e4
a.c1[5]=0, 0x7ffc4bf3c8ed
a.c1[-5]=0, 0x7ffc4bf3c8e3
a.c1[6]=0, 0x7ffc4bf3c8ee
a.c1[-6]=0, 0x7ffc4bf3c8e2
a.c1[7]=0, 0x7ffc4bf3c8ef
a.c1[-7]=0, 0x7ffc4bf3c8e1
a.c1[8]=0, 0x7ffc4bf3c8f0
a.c1[-8]=0, 0x7ffc4bf3c8e0
a.c1[9]=0, 0x7ffc4bf3c8f1
a.c1[-9]=0, 0x7ffc4bf3c8df

If we compare the starting address of a and b, we can find a 32-byte difference. The size of the struct is 20 bytes, which means there is a 12-byte gap between the end of b and the start of a due to padding (alignment). Even worse, there is also padding happened within the structure, where the c0 is padded with 3 bytes to alignment with the 4-byte int (i), and c1 is padded with another 2 bytes in the end. Therefore, if we want to use a.c1 to reference b.c1 with negative indexing, the index should start from -22 (8 bytes in a, and 12 bytes in the gap, and 2 bytes padding in b.c1). You don’t see this number coming, do you:)

#include <stdio.h>
#include <string.h>

typedef struct _total_made_up {
	char c0;
	int i;
	char c1[10];
} tmu;

int main()
{
	tmu a;
	tmu b;

	memset(&a, 0x0, sizeof(tmu));
	memset(&b, 0x0, sizeof(tmu));
	b.c0 = 1;
	b.i = 1;
	memset(&b.c1, 0x1, 10);

	int i;
	printf("a=%p, a.c0=%p, a.i=%p, a.c1=%p, sizeof(a)=%d\n",
		&a, &(a.c0), &(a.i), &(a.c1), sizeof(a));
	printf("b=%p, b.c0=%p, b.i=%p, b.c1=%p, sizeof(b)=%d\n",
		&b, &(b.c0), &(b.i), &(b.c1), sizeof(b));
	for (i = 0; i < 10; i++) {
		printf("a.c1[%d]=%d, %p\n", i, (a.c1)[i], &(a.c1)[i]);
		printf("a.c1[-%d-22]=%d, %p\n", i, (a.c1)[-i-22], &(a.c1)[-i-22]);
	}

	return 0;
}

[daveti@daveti c]$ ./negidx5
a=0x7ffce988ccf0, a.c0=0x7ffce988ccf0, a.i=0x7ffce988ccf4, a.c1=0x7ffce988ccf8, sizeof(a)=20
b=0x7ffce988ccd0, b.c0=0x7ffce988ccd0, b.i=0x7ffce988ccd4, b.c1=0x7ffce988ccd8, sizeof(b)=20
a.c1[0]=0, 0x7ffce988ccf8
a.c1[-0-22]=0, 0x7ffce988cce2
a.c1[1]=0, 0x7ffce988ccf9
a.c1[-1-22]=1, 0x7ffce988cce1
a.c1[2]=0, 0x7ffce988ccfa
a.c1[-2-22]=1, 0x7ffce988cce0
a.c1[3]=0, 0x7ffce988ccfb
a.c1[-3-22]=1, 0x7ffce988ccdf
a.c1[4]=0, 0x7ffce988ccfc
a.c1[-4-22]=1, 0x7ffce988ccde
a.c1[5]=0, 0x7ffce988ccfd
a.c1[-5-22]=1, 0x7ffce988ccdd
a.c1[6]=0, 0x7ffce988ccfe
a.c1[-6-22]=1, 0x7ffce988ccdc
a.c1[7]=0, 0x7ffce988ccff
a.c1[-7-22]=1, 0x7ffce988ccdb
a.c1[8]=0, 0x7ffce988cd00
a.c1[-8-22]=1, 0x7ffce988ccda
a.c1[9]=0, 0x7ffce988cd01
a.c1[-9-22]=1, 0x7ffce988ccd9

3. Summary

We have just talked about how messed up negative indexing could be on the stack. It is also possible to do this on the heap, which is more error-prone given the memory management under the hood (malloc/kmalloc). To correctly use negative (or out of bound) indexing, one needs to know exactly what the memory layout looks like, this could be hard for stack variables, even harder for heap ones. For normal programming, in most cases, you do not need negative indexing. As a matter of fact, the only normal usage I have seen so far is from ARMv3-M software development manual, where negative indexing is used to retrieve the SVC instruction number, which is saved on the stack by the CPU core during the supervisor call.

Reference:

[1] https://stackoverflow.com/questions/3473675/are-negative-array-indexes-allowed-in-c

About daveti

Interested in kernel hacking, compilers, machine learning and guitars.
This entry was posted in Programming and tagged , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s