Sunday, October 25, 2015

Clang (LLVM) versus gcc

Below are some of my experiments that show differences between clang and gcc as seen from a user's point of view.
(clang version 3.6, gcc version 4.8.4)

Expt. 1: Memory Allocation:
Suppose I declare a few small arrays like the ones below:
    char a[4];
    char b[4];
    char c[4];

gcc places a, b and c at addresses ending in 0 (that is, aligned to 16 bytes), even though each array actually uses only 4 bytes.
clang, on the other hand, takes the size of each array into account and allocates only the space that is needed, which avoids holes in memory.
gcc output:
Addresses for char:
clang output:
Addresses for char:

Expt. 2: Register Allocation:
I compiled the test case below with -O3 optimization.

#include <stdio.h>

int main(){
    register int i;              // note: C forbids &i on a register variable; C++ accepts it
    int a[1000];
    for(i=0;i<1000;i++)          //loop-1
    printf ("%p\n",(void*)&i);      //line-1
    return 0;
}

When I remove line-1, variable i is allocated a register, as expected. But when I add line-1, i is no longer kept in a register; it is read from and written back to memory on every iteration of the for loop (loop-1). This was unexpected. On further probing, the explanation turned out to be that printf is an external function that receives the address of i: it could modify i through that pointer, or spawn threads that do so, and so the compiler is pessimistic and keeps the variable in memory.
But the for loop runs before the printf is called, so the external function cannot interfere with keeping i in a register during loop-1. clang analyzes this correctly, but gcc does not. See the disassembly for both of them below: clang allocates register rbx to variable i, whereas gcc always references it from memory, so clang does a better job here.

gcc code:
  400497:    8b 04 24                 mov    (%rsp),%eax
  40049a:    83 c0 01                 add    $0x1,%eax
  40049d:    3d e7 03 00 00           cmp    $0x3e7,%eax
  4004a2:    89 04 24                 mov    %eax,(%rsp)
  4004a5:    7e d9                    jle    400480 <main+0x10>

llvm code:

 40055f:    48 ff c3                 inc    %rbx
 400562:    48 81 fb e8 03 00 00     cmp    $0x3e8,%rbx
 400569:    75 e5                    jne    400550 <main+0x20>

Expt. 3: Memory Allocation:
gcc assigns addresses to local variables so that the smaller variables are allocated first, followed by the larger ones: for example, chars first, then shorts, then ints. In the process, some of the intermediate addresses may be left unused (for alignment).
clang assigns addresses in the reverse order to gcc. This avoids unused space between the allocated addresses and packs the variables compactly.

While gcc does relatively badly in terms of space, this kind of allocation has one benefit. gcc tries to maximize the number of variables that are close to the SP and can therefore be reached with an immediate offset relative to the SP. Variables that end up at a large offset from the SP (which can happen when a function has too many variables) cannot be reached with an immediate offset and require extra processing. Allocating the small variables first lets gcc make more variables quickly accessible than llvm does.

The test program and sample output from gcc and llvm are shown below:

#include <stdio.h>

int main(){
    char a;
    char c;
    short aa;
    short ab;
    char b;
    int bb;

    printf ("Addresses for char:\n%p\n%p\n%p\n",(void*)&a,(void*)&b,(void*)&c);
    printf ("\nAddresses for short:\n%p\n%p\n",(void*)&aa,(void*)&ab);
    printf ("\nAddresses for int:\n%p\n\n",(void*)&bb);
    return 0;
}

gcc output:
Addresses for char:

Addresses for short:

Addresses for int:

llvm output:
Addresses for char:

Addresses for short:

Addresses for int: