Author: Roman Fomichev
Programs often need to store private data, for example passwords, secret keys, and their derivatives, and we usually need to erase every trace of them from memory after use so that a potential intruder can't gain access to them. In this article we will discuss why you can't rely on the memset() function to clear private data.
You may have already read the article discussing vulnerabilities in programs where memset() is used to erase memory. However, that article doesn't fully cover all the possible scenarios of incorrect use of memset(). You may have problems not only with clearing stack-allocated buffers but with clearing dynamically allocated buffers as well.
For a start, let's discuss an example from the above-mentioned article that deals with using a stack-allocated variable.
Here is a code fragment that handles a password:
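A minimal sketch of what such a fragment might look like is given below. It is not the author's original listing: the function name `check_password`, the buffer size, and the toy FNV-1a hash standing in for a real key-derivation step are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for a real key-derivation step: 64-bit FNV-1a.
static std::uint64_t toy_hash(const char* s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (; *s != '\0'; ++s) {
        h ^= static_cast<unsigned char>(*s);
        h *= 1099511628211ULL;                   // FNV prime
    }
    return h;
}

// Returns true if `input` hashes to `expected`.
bool check_password(const char* input, std::uint64_t expected) {
    char password[64];
    std::snprintf(password, sizeof password, "%s", input);  // bounded copy
    std::uint64_t hash = toy_hash(password);
    const bool ok = (hash == expected);

    // Intended cleanup. Neither object is read again, so an optimizing
    // compiler may treat both calls as dead stores and drop them.
    std::memset(password, 0, sizeof password);
    std::memset(&hash, 0, sizeof hash);
    return ok;
}
```

The function still returns the correct result whether or not the two memset() calls survive optimization, which is exactly why their removal goes unnoticed.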
This example is rather conventional and completely synthetic.
If we build a debug version of that code and run it in the debugger (I was using Visual Studio 2015), we'll see that it works well: the password and its calculated hash value are erased after they have been used.
Let's take a look at the assembly code generated for our program in the Visual Studio debugger:
We can see the call to the memset() function, which clears the private data after use.
We could stop here, but we'll go on and try to build an optimized release version. Now, this is what we see in the debugger:
All the instructions associated with the call to the memset() function have been deleted. The compiler assumes that there is no need to call a function that erases data which are never read again. This is not a compiler bug; it is a legitimate optimization known as dead store elimination. From the language's viewpoint, the memset() call has no observable effect because the buffer is not used afterwards, so removing the call cannot change the program's behavior. As a result, our private data remain uncleared, and that is very bad.
Now let's dig deeper. Let's see what happens to data when we allocate them in dynamic memory using the malloc function or the new operator.
Let's modify our previous code to work with malloc:
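The modified code might look roughly like the sketch below. Again, this is an illustration and not the original listing: the `PrivateData` layout, the function name `derive_on_heap`, and the inline toy hash are assumptions.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical layout: the password and a value derived from it.
struct PrivateData {
    char password[64];
    std::uint64_t hash;
};

// Returns the derived value, or 0 if the allocation fails.
std::uint64_t derive_on_heap(const char* input) {
    PrivateData* data =
        static_cast<PrivateData*>(std::malloc(sizeof(PrivateData)));
    if (data == nullptr) return 0;

    std::snprintf(data->password, sizeof data->password, "%s", input);
    data->hash = 14695981039346656037ULL;            // toy 64-bit FNV-1a
    for (const char* p = data->password; *p != '\0'; ++p) {
        data->hash ^= static_cast<unsigned char>(*p);
        data->hash *= 1099511628211ULL;
    }
    const std::uint64_t result = data->hash;

    // Intended cleanup. The block is freed immediately afterwards, so the
    // compiler may consider this store dead and remove the call.
    std::memset(data, 0, sizeof(PrivateData));
    std::free(data);
    return result;
}
```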
We'll be testing a release version since the debug version has all the calls where we want them to be. After compiling it in Visual Studio 2015, we get the following assembler code:
Visual Studio has done well this time: it erases the data as planned. But what about other compilers? Let's try gcc, version 5.2.1, and clang, version 3.7.0.
I've modified our code a bit for gcc and clang and added some code to print the contents of the allocated memory block before and after the cleanup. Note that the second printout reads the block after the memory has been freed. Accessing freed memory is undefined behavior, so you should never do this in real programs; I'm taking this liberty here only for the sake of the experiment.
Now, here's a fragment of the assembler code generated by the gcc compiler:
The printing function (printf) is followed by a call to the free() function while the call to the memset() function is gone. If we run the code and enter an arbitrary password (for example "MyTopSecret"), we'll see the following message printed on the screen:
MyTopSecret| 7882334103340833743
MyTopSecret| 0
The hash has changed. I guess it's a side effect of the memory manager's work, since an allocator may reuse part of a freed block for its own bookkeeping. As for our password "MyTopSecret", it stays intact in memory.
Let's check how it works with clang:
Just like in the previous case, the compiler decides to remove the call to the memset() function. This is what the printed output looks like:
MyTopSecret| 7882334103340833743
MyTopSecret| 0
So, both gcc and clang decided to optimize our code. Since the memory is freed after calling the memset() function, the compilers treat this call as irrelevant and delete it.
As our experiments show, compilers may delete memset() calls for the sake of optimization whether the buffer lives on the stack or in dynamic memory.
Finally, let's see how the compilers respond when the memory is allocated with the new operator.
Modifying the code again:
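A rough sketch of the `new`-based variant follows. It is not the original listing; the function name `derive_with_new`, the buffer size, and the toy hash are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Same idea as before, but the buffer comes from new[] instead of malloc().
std::uint64_t derive_with_new(const char* input) {
    const std::size_t size = 64;
    char* password = new char[size];
    std::snprintf(password, size, "%s", input);   // bounded copy

    std::uint64_t hash = 14695981039346656037ULL; // toy 64-bit FNV-1a
    for (const char* p = password; *p != '\0'; ++p) {
        hash ^= static_cast<unsigned char>(*p);
        hash *= 1099511628211ULL;
    }

    // Intended cleanup, again followed directly by deallocation, so the
    // compiler is free to treat it as a dead store.
    std::memset(password, 0, size);
    delete[] password;
    return hash;
}
```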
Visual Studio clears the memory as expected:
The gcc compiler kept the clearing call, too:
The printed output has changed accordingly; the data we have entered are no longer there:
MyTopSecret| 7882334103340833743
| 0
But as for clang, it chose to optimize our code in this case as well and cut out the "unnecessary" function:
Let's print the memory's contents:
MyTopSecret| 7882334103340833743
MyTopSecret| 0
The password remains in memory, waiting to be stolen.
Let's sum it all up. We have found that an optimizing compiler may remove a call to the memset() function no matter what type of memory is used, stack or dynamic. Although Visual Studio didn't remove memset() calls when using dynamic memory in our test, you can't expect it to always behave that way in real-life code. The harmful effect may reveal itself with other compilation switches. What follows from our small research is that one cannot rely on the memset() function to clear private data.
So, what is a better way to clear them?
You should use special memory-clearing functions, which can't be deleted by the compiler when it optimizes the code.
In Visual Studio, for example, you can use RtlSecureZeroMemory. Starting with C11, the memset_s function is also available, but only where the implementation supports the optional Annex K of the standard. In addition, you can implement a safe function of your own, if necessary; a lot of examples and guides can be found around the web. Here are some of them.
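As an illustration of how the platform-provided functions might be wrapped, here is a hedged sketch. The wrapper name `wipe` is hypothetical, and the last branch is only a fallback for toolchains that offer neither function; whether `memset_s` is available depends on the C library in use.

```cpp
#define __STDC_WANT_LIB_EXT1__ 1  // ask <string.h> for Annex K, if supported
#include <cstddef>
#include <string.h>
#if defined(_WIN32)
#include <windows.h>              // declares RtlSecureZeroMemory
#endif

// Hypothetical wrapper: zeroes n bytes at p using whatever the platform offers.
void wipe(void* p, std::size_t n) {
#if defined(_WIN32)
    RtlSecureZeroMemory(p, n);
#elif defined(__STDC_LIB_EXT1__)
    memset_s(p, n, 0, n);         // C11 Annex K
#else
    // Fallback: byte-wise stores through a volatile-qualified pointer.
    volatile unsigned char* vp = static_cast<volatile unsigned char*>(p);
    while (n--) {
        *vp++ = 0;
    }
#endif
}
```

Keeping the choice behind one function means the rest of the code base never calls memset() directly for cleanup.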
Solution No. 1.
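One widely used approach, sketched here under assumptions and not taken from the original listing, is a fill routine modelled on the memset_s contract that checks its arguments and writes through a volatile-qualified pointer. The name `secure_fill` and its return codes are hypothetical. In practice compilers do not remove stores made through such a pointer, although how strictly the standard guarantees this is debated.

```cpp
#include <cstddef>

// Fills the first `count` bytes of a `size`-byte buffer with `value`.
// Returns 0 on success, -1 if the arguments are invalid.
int secure_fill(void* buffer, std::size_t size, int value, std::size_t count) {
    if (buffer == nullptr || count > size) {
        return -1;
    }
    // Each store goes through a volatile lvalue, so the optimizer is
    // expected to keep it even if the buffer is never read again.
    volatile unsigned char* p = static_cast<volatile unsigned char*>(buffer);
    while (count--) {
        *p++ = static_cast<unsigned char>(value);
    }
    return 0;
}
```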
Solution No. 2.
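Another technique found in the wild, shown here as a sketch and not as the original listing, is to call memset() through a volatile function pointer. Because the pointer must be reloaded at the call site, the compiler cannot assume it still refers to memset() and so is not expected to drop the call. The names `memset_ptr` and `secure_zero` are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// The pointer object itself is volatile: its value has to be read at run
// time, which prevents the optimizer from reasoning about the callee.
static void* (*const volatile memset_ptr)(void*, int, std::size_t) = std::memset;

// Zeroes n bytes at p via the indirect call.
void secure_zero(void* p, std::size_t n) {
    memset_ptr(p, 0, n);
}
```

Compared with a byte-wise loop, this keeps the library's optimized memset() implementation at the cost of one indirect call.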
Some programmers go even further and create functions that fill the array with pseudo-random values and vary their running time to hinder timing attacks. Implementations of these can be found on the web, too.
The PVS-Studio static analyzer can detect the data-clearing errors discussed here and reports them with diagnostic V597. This article was written as an extended explanation of why this diagnostic is important. Unfortunately, many programmers tend to think that the analyzer "picks on" their code and that there is actually nothing to worry about. That's because they see their memset() calls intact when stepping through the code in the debugger, forgetting that what they see is only a debug build.