Thursday, September 17, 2020

The "Less technical, More economical" aspects of VLSI design

Advancements in technology, design automation, and novel applications have significantly fueled the growth of chip design and the VLSI industry. However, apart from the technical aspects, there are many practices followed in the industry which are not "really technical" in the true sense: they are neither functional nor technological needs, but rather "commercial" practices driven by economic and marketing metrics. They are usually not covered in regular classroom courses on VLSI design and form a niche topic that is nevertheless very relevant to the success of the semiconductor industry. I am listing some of these practices which intrigued me when I first learned about them during my tenure at a semiconductor company. I plan to develop this page gradually, adding one topic at a time.
 

1. Spare gates

Unlike typical software, where fixing a bug involves recompiling the code after the changes, fixing a bug in a hardware chip requires regenerating the masks used for manufacturing it, incurring both delay and the cost of mask regeneration. A chip is built in multiple layers, with the lowest layers composed of silicon while the higher layers consist of metal connections. Regenerating the masks for the silicon layers involves significant cost (on the order of a million dollars) as well as a considerable amount of time (3-4 weeks) compared to the metal layers. With circuit complexity growing and technology scaling, the chances of a bug being discovered after manufacturing are very high. Therefore, a practical solution is needed to alleviate the cost and delay of fixing such a bug. In a standard practice followed across the semiconductor industry, the designer "sprinkles" some extra "spare gates" into the original circuit. These spare gates are not connected to the main circuit and just lie scattered throughout the chip. In the ideal situation where no bug ever shows up, these gates serve no purpose. However, if a specific bug appears in the chip after manufacturing, and if the bug is fixable with a few extra gates, the spare gates come to the rescue. The designer can wire these spare gates into the actual circuit instead of adding any new gate to the original design. Such changes in connectivity can be realized through the metal layers alone, so only the metal-layer masks need to be regenerated. Avoiding the addition of new gates avoids regenerating the silicon-layer masks, thereby saving a huge amount of cost as well as manufacturing time.
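
As a purely illustrative sketch (in C, since the examples on this blog are in C; a real ECO flow of course edits a gate-level netlist, not software), the toy program below models the gates as fixed functions (the "silicon") and the connectivity as a flag that can be edited (the "metal"). The fix routes the output through a previously unconnected spare inverter by changing only the wiring; no new gate is created. All names here are hypothetical.

#include <stdio.h>

/* Toy model: the gates on the die are fixed at fabrication
   ("silicon layers"); only the wiring between them ("metal
   layers") can be changed cheaply in a respin.              */
static int inv(int x)         { return !x; }    /* spare gate */
static int and2(int a, int b) { return a & b; }

/* Wiring: does the output pass through the spare inverter?
   Flipping this models a metal-only engineering change.     */
static int route_via_spare_inv = 0;

static int chip_output(int a, int b) {
    int y = and2(a, b);        /* intended NAND, fabricated as AND */
    return route_via_spare_inv ? inv(y) : y;
}

int main(void) {
    printf("before ECO: out(1,1) = %d\n", chip_output(1, 1)); /* 1 (bug)   */
    route_via_spare_inv = 1;   /* the metal-only fix */
    printf("after  ECO: out(1,1) = %d\n", chip_output(1, 1)); /* 0 (fixed) */
    return 0;
}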

Functionally, no one needs spare gates - they are not useful to the end user, nor are they useful to the application developer. They are also an additional overhead for the chip designer, who must make sure that the spare gates are properly isolated from the main circuit and do not introduce side effects. The technology does not demand them either. Nevertheless, spare gates find widespread use in chip design owing to the economic benefits and the faster return to market that they enable.

Sunday, August 5, 2018

Concurrency bug in a real banking system

In this era of banking, where online banking and mobile banking have become a daily affair for many of us (especially in cities), any technical bug in the backend software used by banks can have an impact whose severity is difficult to assess until an incident actually occurs. While the primary onus of correctness of the software is on the verification engineer, in the era of multi-processing and multi-threading, the design structure and the coding style matter a lot. Many of the failure modes of current systems are non-deterministic and non-repeatable and cannot be verified with guarantees. Thus, the designer is equally responsible for a bug that leaks into the production system.

Recently, I hit one such bug in a banking transaction - a bug we have all studied as a fundamental problem in concurrent systems. Fittingly, the textbook examples use banking transactions as the scenario, and hitting the bug in reality reminded me of exactly that. To explain the situation: I had INR 7316.88 in my account. I transferred INR 3000.00, which attracted a service charge and tax of INR (50 + 9). This triggered two transactions - deducting 3000.00 and deducting 59.00 as service charge from the same account. They could have happened in sequence: 3000.00 deducted first and then 59.00. But somehow, the banking system's code treated them as two parallel threads and invoked them concurrently. Here is what happened: 3000.00 was deducted from 7316.88, leading to a balance of 4316.88. The 59.00 should then have been deducted from 4316.88, but because it ran in parallel with the 3000.00 deduction, it was deducted from the original amount of 7316.88, giving a balance of 7257.88. The 59.00 transaction completed later than the 3000.00 one, so my account ended up with a total balance of 7257.88. The snapshot below shows the incident (listed in reverse order of date - the last entry is an older transaction than the one above it).

This is a classic concurrency bug - a lost update - which could have been avoided by using proper locks in the software (for those who understand the terminology). What this incident brings to the forefront is that bugs in such simple situations still exist in production banking systems that many of us rely on and use regularly. This one ended up being beneficial for me, and hence I am not worried. But what if the reverse had happened and the final balance had turned out lower than expected? I would be running from one bank counter to another, and the staff would blame it on the software without being able to help. It might get resolved, but it would take time. And what if people do not notice such a discrepancy at all - then it is a matter of chance whether it ever gets fixed.
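
For those who code, below is a minimal sketch of this lost-update race in C with POSIX threads (hypothetical account logic, of course, not the bank's actual code; compile with -pthread). If the lock/unlock pair is removed, both threads can read the same starting balance before either writes back, and one deduction is silently lost - exactly the behavior above.

#include <stdio.h>
#include <pthread.h>

static double balance = 7316.88;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread performs a read-modify-write on the shared balance. */
static void *deduct(void *arg) {
    double amount = *(double *)arg;
    pthread_mutex_lock(&lock);     /* remove the lock/unlock pair   */
    double b = balance;            /* to reproduce the lost update  */
    b -= amount;
    balance = b;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    double transfer = 3000.00, charge = 59.00;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deduct, &transfer);
    pthread_create(&t2, NULL, deduct, &charge);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("final balance = %.2f\n", balance); /* 4257.88 with the lock */
    return 0;
}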

I was able to withdraw 7000.00 from the account later, so there is no way the software can quietly fix the problem after the fact. I am waiting to see what happens and how the bank handles this. I have no intent of keeping the amount and will eventually return it, once the bank identifies its issue.

I would expect application developers to understand the severity of such incidents and to do their due diligence in ensuring the quality of their software. I would also hope that bank staff and customers do not trust these systems blindly.

Happy online banking to all!!

Sunday, October 25, 2015

Clang (LLVM) versus gcc

Below are some of my experiments which illustrate the differences between clang and gcc as seen by a user.
(clang version 3.6, gcc version 4.8.4)

Expt. 1: Memory Allocation:
If I declare something like the below (small arrays):
    char a[4];
    char b[4];
    char c[4];

gcc allocates a, b, and c at addresses ending in 0 (i.e., 16-byte aligned) even though each of them actually uses only 4 bytes, leaving holes between them.
clang, on the other hand, analyzes the size of each array and allocates only the space that is needed, thus preventing memory holes.
gcc output:
Addresses for char:
a=0x7fffd9ab2540
b=0x7fffd9ab2550
c=0x7fffd9ab2560
clang output:
Addresses for char:
a=0x7fffe8c528c4
b=0x7fffe8c528c0
c=0x7fffe8c528bc
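
For reference, the outputs above came from a small test program along these lines (my reconstruction; the exact addresses differ from run to run due to stack randomization - it is the spacing between them that matters):

#include <stdio.h>

int main(void) {
    char a[4];
    char b[4];
    char c[4];
    /* The gaps between the printed addresses reveal each
       compiler's stack-alignment policy for the arrays.  */
    printf("Addresses for char:\n");
    printf("a=%p\nb=%p\nc=%p\n", (void *)a, (void *)b, (void *)c);
    return 0;
}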

Expt. 2: Register Allocation:
I have the below testcase, which I am compiling using -O3 optimization.

#include <stdio.h>

int main(){
    int i;                       /* 'register int i' would be rejected by
                                    the C frontends once &i is taken below */
    int a[1000];                 /* intentionally uninitialized; only the
                                    allocation of i matters for this test  */
    for(i=0;i<1000;i++)          //loop-1
        printf("%d\n",a[i]);
    printf ("%p\n",(void*)&i);      //line-1
    return 0;
}


When I remove line-1, variable i is allocated a register as expected. But when I add line-1, i is not allocated a register; it is referenced from memory on each iteration of the for loop (loop-1). This was unexpected, and on further probing it became clear that because &i escapes to printf - an external function that could, in principle, spawn threads and modify i through the pointer - gcc is pessimistic and keeps the variable in memory.
But the for loop runs before that printf is ever called, so there is no way the external function could interfere with keeping i in a register during loop-1. clang analyzes this properly, but gcc does not. See the disassembly of both below: clang allocates register rbx to variable i, while gcc always references it from memory, so clang does the better job here.


gcc code:
  400497:    8b 04 24                 mov    (%rsp),%eax
  40049a:    83 c0 01                 add    $0x1,%eax
  40049d:    3d e7 03 00 00           cmp    $0x3e7,%eax
  4004a2:    89 04 24                 mov    %eax,(%rsp)
  4004a5:    7e d9                    jle    400480 <main+0x10>

llvm code:

 40055f:    48 ff c3                 inc    %rbx
 400562:    48 81 fb e8 03 00 00     cmp    $0x3e8,%rbx
 400569:    75 e5                    jne    400550 <main+0x20>

Expt. 3: Memory Allocation:
gcc allocates memory addresses to local variables such that the smaller-sized variables are allocated first, followed by the larger ones, and so on. In this process, some of the intermediate addresses may be left unused (for alignment): e.g., chars are allocated first, then shorts, and then ints.
clang allocates memory addresses in the reverse order of gcc. This prevents unused address space between the allocated addresses and packs the variables compactly.

While gcc performs relatively badly in terms of space, there is one benefit to its allocation order: it maximizes the number of variables that sit close to the SP and can therefore be addressed with a small immediate offset relative to SP. On architectures where the immediate offset field of a load/store is limited, variables placed at a large offset from SP (which can happen if a function has too many locals) cannot be accessed with an immediate offset and require extra processing. Allocating the small variables first lets gcc access more variables the fast way compared to llvm.

Sample outputs from gcc and llvm are printed below:

code:
#include <stdio.h>

int main(void) {
    char a;
    char c;
    short aa;
    short ab;
    char b;
    int bb;

    printf ("Addresses for char:\n%p\n%p\n%p\n",
            (void *)&a, (void *)&b, (void *)&c);
    printf ("\nAddresses for short:\n%p\n%p\n",
            (void *)&aa, (void *)&ab);
    printf ("\nAddresses for int:\n%p\n\n", (void *)&bb);
    return 0;
}


gcc output:
Addresses for char:
0x7fff65d5b625
0x7fff65d5b627
0x7fff65d5b626

Addresses for short:
0x7fff65d5b628
0x7fff65d5b62a

Addresses for int:
0x7fff65d5b62c

llvm output:
Addresses for char:
0x7fff717b30bf
0x7fff717b30b9
0x7fff717b30be

Addresses for short:
0x7fff717b30bc
0x7fff717b30ba

Addresses for int:
0x7fff717b30b4