JazzLab 블로그: Purify - 메모리 에러 설명

출처 : http://www.ibm.com/developerworks/rational/library/06/0822_satish-giridhar/

Introduction
Most programmers agree that defects related to incorrect memory usage and management are the hardest to isolate, analyze, and fix. Therefore, they are the costliest defects to have in your programs. These defects are typically caused by using uninitialized memory, using un-owned memory, buffer overruns, or faulty heap management.

IBM® Rational® Purify® is an advanced memory usage error detecting tool that enables software developers and testers to detect memory errors in C and C++ programs. While a program runs, Purify collects and analyzes data to accurately identify memory errors that are about to happen. It provides detailed information, such as the error location (function call stack) and size of the affected memory, to assist you in quickly locating the problem areas. It also greatly reduces debugging time and complexity, so you can focus on fixing the flaw in the application logic that is causing the error.

Purify is available for all prominent platforms, including IBM® AIX® on Power PC®, HP-UX® on PA-RISC, Linux™ on x86 and x86/64, Sun™ Solaris™ on SPARC®, and the Microsoft® Windows® on x86 (check documentation for updated list of supported platform). In this article, you will first learn about various types of memory access errors with the help of examples, and then learn how to use Purify for detecting and fixing those errors. In the Download section, you will find the C source file (memerrors.c) with the code samples in this article, and you can use them to experiment with Purify.

Memory errors

Memory errors can be broadly classified into four categories:

Using memory that you have not initialized
Using memory that you do not own
Using more memory than you have allocated (buffer overruns)
Using faulty heap memory management

Purify detects errors in all of these categories and identifies the type of the error within a category. Understanding the types of errors helps you identify and isolate subtle mistakes in your program that may cause the program to act strangely and unpredictably. In rest of this section, various error types are explained using code samples.

Using memory that you have not initialized
When you read from memory that you forgot to initialize, you get garbage value. This error looks deceptively innocent, but it has the potential to cause mysterious program behavior.

The garbage value that you get could fortuitously happen to be a meaningful value that your program can handle. For example, some operating systems initialize a memory block with zeros when it is allocated for the first time. If zero is a meaningful value for your program, it may run smoothly, initially. However, after the program runs for a while, the memory might be freed and reallocated. When a memory block is recycled, it has the values that were stored in it when it was last used. These values are unpredictable. Depending upon the value, your program may crash immediately, may run for a while and crash sometime later, or may run smoothly but produce strange results. Since the value could be different in each run, the behavior of the program can be baffling, making it hard to reproduce the problem consistently.

Purify detects such errors and reports an Uninitialized Memory Read (UMR) error for every use of uninitialized memory. It further differentiates between using uninitialized memory and copying value from an uninitialized memory location to another memory location. When an uninitialized memory is copied, Purify reports an Uninitialized Memory Copy (UMC) error. After the copying, the destination location also has uninitialized memory; therefore, whenever this memory is used, Purify reports a UMR.

Listing 1 shows a simple example. There are two integers: i and j. The integer i is initialized with 10. Then the value of j is copied into i. Since j has not been initialized, i also has garbage value after j is copied into it. Purify maintains status of each memory location. It is capable of the analysis that reveals that, although i has been initialized with 10, copying an uninitialized value has made i also uninitialized. Therefore, Purify reports any usage of i (for example, as an argument to printf in the next line) as a UMR error.

Listing 1. An example of UMR and UMC errors

void uninit_memory_errors() {
    int i=10, j;
    i = j;     /* UMC: j is uninitialized, copied into i */
    printf("i = %d\n", i); /* UMR: Using i, which has junk value */
}

This example is intentionally trivial to make it easy for you to identify the problem just by inspecting the code. But real-world applications have many thousands lines of code and have complex control flow. The location where a valid value is corrupted by copying a garbage value into it, could be in a different function, and potentially in a different sub-system or library. If you inspect the bar method in Listing 2, and you do not know much about foo method, you would not suspect that i would be corrupted after calling the foo method. Depending upon the size and complexity of the source code, you may have to spend considerable time and effort to analyze and then to rectify this type of defect. Purify eliminates this effort and reports UMRs, indicating the use of uninitialized memory value.

Listing 2. Another example of UMR and UMC errors

void foo(int *pi) {
    int j;
    *pi = j; /* UMC: j is uninitialized, copied into *pi */
}

void bar() {
    int i=10;
    foo(&i);
    printf("i = %d\n", i); /* UMR: Using i, which is now junk value */
}

As you notice, whenever a memory location with a UMC error is finally used, Purify reports a UMR error for that same memory location. UMC errors may not always be critical, and Purify hides them by default. Later in this article, you will learn how to see UMC and other errors that Purify hides.

Using memory that you don't own
Explicit memory management and pointer arithmetic present opportunities for designing compact and efficient programs. However, incorrect use of these features can lead to complex defects, such as a pointer referring to memory that you don't own. In this case, too, reading memory through such pointers may give garbage value or cause segmentation faults and core dumps, and using garbage values can cause unpredictable program behavior or crashes.

Purify detects these errors. In addition to reporting the type of error, Purify indicates the memory area that the pointer refers to and where that memory has been allocated. This is typically a good clue for identifying the cause of the error. This category includes following types of errors:

Null pointer read or write (NPR, NPW)
Zero page read or write (ZPR, ZPW)
Invalid pointer read or write (IPR, IPW)
Free memory read or write (FMR, FMW)
Beyond stack read or write (BSR, BSW)

Null Pointer Read/Write (NPR, NPW) and Zero Page Read/Write (ZPR, ZPW):
If a pointer's value can potentially be null (NULL), the pointer should not be de-referenced without checking it for being null. For example, a call to malloc can return a null result if no memory is available. Before using the pointer returned by malloc, you need to check it to make sure that isn't null. For example, a linked list or tree traversal algorithm needs to check whether the next node or child node is null.

It is common to forget these checks. Purify detects any memory access through de-referencing a null pointer, and reports an NPR or NPW error. When you see this error, examine whether you need to add a null pointer check or whether you wrongly assumed that your program logic guaranteed a non-null pointer. On AIX, HP, and under some linker options in Solaris, dereferencing a null pointer produces a zero value, not a segmentation fault signal.

The memory is divided into pages, and it is "illegal" to read from or write to a memory location on the zero'th page. This error is typically due to null pointer or incorrect pointer arithmetic computations. For example, if you have a null pointer to a structure and you attempt to access various fields of that structure, it will lead to a zero page read error, or ZPR.

Listing 3 shows a simple example of both NPR and ZPR problems. The findLastNodeValue method has a defect, in that it does not check whether the head parameter is null. NPR and ZPR errors occur when the next and val fields are accessed, respectively.

Listing 3. An example of NPR and ZPR errors

typedef struct node {
    struct node* next;
    int          val;
} Node;

int findLastNodeValue(Node* head) {
    while (head->next != NULL) { /* Expect NPR */
        head = head->next;
    }
    return head->val; /* Expect ZPR */
}

void genNPRandZPR() {
    int i = findLastNodeValue(NULL);
}

Invalid Pointer Read or Write (IPR, IPW):
Purify tracks all memory operations. When it detects a pointer to a memory location that has not been allocated to the program, it reports either an IPR or IPW error, depending on whether it was a read or write operation. The error can happen for multiple reasons. For example, you will get this type of error if you have an uninitialized pointer variable and the garbage value happens to be invalid. As another example, if you wanted to do *pi = i;, where pi is a pointer to an integer and i is an integer. But, by mistake, you didn't type the * and wrote just pi = i;. With the help of implicit casting, an integer value is copied as a pointer value. When you dereference pi again, you may get an IPR or IPW error. This can also happen when pointer arithmetic results in an invalid address, even when it is not on the zero'th page. (See Listing 4.)

Listing 4. An example of IPR and IPW errors

void genIPR() {
    int *ipr = (int *) malloc(4 * sizeof(int));
    int i, j;
    i = *(ipr - 1000); j = *(ipr + 1000); /* Expect IPR */
    free(ipr);
}

void genIPW() {
    int *ipw = (int *) malloc(5 * sizeof(int));
    *(ipw - 1000) = 0; *(ipw + 1000) = 0; /* Expect IPW */
    free(ipw);
}

IPR and IPW are encountered commonly while using functions that return a pointer (e.g. malloc) in 64-bit applications because pointer is 8 byte long and integer is 4 byte long. If the method declaration is not included, compiler assumes that the method returns an integer, and implicitly casts the return value and retains only lower 4 bytes of the pointer value. Purify reports IPR and IPW upon using this invalid pointer. (See Listing 5.)

Listing 5. Another example of IPR and IPW errors

/*Forgot to include following in a 64-bit application:
#include <malloc.h>
#include <stdlib.h>
 */

void illegalPointer() {
    int *pi = (int*) malloc(4 * sizeof(int));
    pi[0] = 10; /* Expect IPW */
    printf("Array value = %d\n", pi[0]); /* Expect IPR */
}

Free Memory Read or Write (FMR, FMW):
When you use malloc or new, the operating system allocates memory from heap and returns a pointer to the location of that memory. When you don't need this memory anymore, you de-allocate it by calling free or delete. Ideally, after de-allocation, the memory at that location should not be accessed thereafter.

However, you may have more than one pointer in your program pointing to the same memory location. For instance, while traversing a linked list, you may have a pointer to a node, but a pointer to that node is also stored as next in the previous node. Therefore, you have two pointers to the same memory block. Upon freeing that node, these pointers will become heap dangling pointers, because they point to memory that has already been freed. Another common cause for this error is usage of realloc method. (See Listing 6 code.)

The heap management system may respond to another malloc call in the same program and allocate this freed memory to other, unrelated objects. If you use a dangling pointer and access the memory through it, the behavior of the program is undefined. It may result in strange behavior or crash. The value read from that location would be completely unrelated and garbage. If you modify memory through a dangling pointer, and later that value is used for the intended purpose and unrelated context, the behavior will be unpredictable. Of course, either an uninitialized pointer or incorrect pointer arithmetic can also result in pointing to already freed heap memory.

Listing 6. An example of FMR and FMW errors

int* init_array(int *ptr, int new_size) {
    ptr = (int*) realloc(ptr, new_size*sizeof(int));
    memset(ptr, 0, new_size*sizeof(int));
    return ptr;
}

int* fill_fibonacci(int *fib, int size) {
    int i;
    /* oops, forgot: fib = */ init_array(fib, size);
    /* fib[0] = 0; */ fib[1] = 1;
    for (i=2; i<size; i++)
        fib[i] = fib[i-1] + fib[i-2];
    return fib;
}

void genFMRandFMW() {
    int *array = (int*)malloc(10);
    fill_fibonacci(array, 3);
}

Beyond Stack Read or Write (BSR, BSW) :
If the address of a local variable in a function is directly or indirectly stored in a global variable, in a heap memory location, or somewhere in the stack frame of an ancestor function in the call chain, upon returning from the function, it becomes a stack dangling pointer. When a stack dangling pointer is de-referenced to read from or write to the memory location, it accesses memory outside of the current stack boundaries, and Purify reports a BSR or BSW error. Uninitialized pointer variables or incorrect pointer arithmetic can also result in BSR or BSW errors.

In the example in Listing 7, the append method returns the address of a local variable. Upon returning from that method, the stack frame for the method is freed, and stack boundry shrinks. Now the returned pointer would be outside the stack bounds. If you use that pointer, Purify will report a BSR or BSW error. In the example, you would expect append("IBM ", append("Rational ", "Purify")) to return "IBM Rational Purify", but it returns garbage manifesting BSR and BSW errors.

Listing 7. An example of BSR and BSW errors

char *append(const char* s1, const char *s2) {
    const int MAXSIZE = 128; 
    char result[128];
    int i=0, j=0;

    for (j=0; i<MAXSIZE-1 && j<strlen(s1); i++,j++) {
        result[i] = s1[j];
    }

    for (j=0; i<MAXSIZE-1 && j<strlen(s2); i++,j++) {
        result[i] = s2[j];
    }

    result[++i] = '\0';
    return result;
}

void genBSRandBSW() {
    char *name = append("IBM ", append("Rational ", "Purify"));
    printf("%s\n", name); /* Expect BSR */
    *name = '\0'; /* Expect BSW */
}

Using memory that you haven't allocated, or buffer overruns
When you don't do a boundary check correctly on an array, and then you go beyond the array boundary while in a loop, that is called buffer overrun. Buffer overruns are a very common programming error resulting from using more memory than you have allocated. Purify can detect buffer overruns in arrays residing in heap memory, and it reports them as array bound read (ABR) or array bound write (ABW) errors. (See Listing 8.)

Listing 8. An example of ABR and ABW errors

void genABRandABW() {
    const char *name = "IBM Rational Purify";
    char *str = (char*) malloc(10);
    strncpy(str, name, 10);
    str[11] = '\0'; /* Expect ABW */
    printf("%s\n", str); /* Expect ABR */
}

Using faulty heap memory management
Explicit memory management in C and C++ programming puts the onus of managing memory on the programmers. Therefore, you must be vigilant while allocating and freeing heap memory. These are the common memory management mistakes:

Memory leaks and potential memory leaks (MLK, PLK, MPK)
Freeing invalid memory (FIM)
- Freeing mismatched memory (FMM)
- Freeing non-heap memory (FNH)
- Freeing unallocated memory (FUM)

Memory leaks and potential memory leaks:
When all pointers to a heap memory block are lost, that is commonly called a memory leak. With no valid pointer to that memory, there is no way you can use or release that memory. You lose a pointer to a memory when you overwrite it with another address, or when a pointer variable goes out of the scope, or when you free a structure or an array that has pointers stored in it. Purify scans all of the memory and reports all memory blocks without any pointers pointing to them as memory leaks (MLK). In addition, it reports all blocks as potential leaks, or PLK (called MPK on Windows platforms) when there are no pointers to the beginning of the block but there are pointers to the middle of the block.

Linsting 9 shows a simple example of a memory leak and a heap dangling pointer. In this example, interestingly, methods foo and main independently seem to be error-free, but together they manifest both errors. This example demonstrates that interactions between methods may expose multiple flaws that you may not find simply by inspecting individual functions. Real-world applications are very complex, thus tedious and time-consuming for you to inspect and to analyze the control flow and its consequences. Using Purify gives you vital help in detecting errors in such situations.

First, in the method foo, the pointer pi is overwritten with a new memory allocation, and all pointers to the old memory block are lost. This results in leaking the memory block that was allocated in method main. Purify reports a memory leak (MLK) and specifies the line where the leaked memory was allocated. It eliminates the slow process of hunting down the memory block that is leaking, therefore shortens the debugging time. You can start debugging at the memory allocation site where the leak is reported, and then track what you are doing with that pointer and where you are overwriting it.

Later, the method foo frees up the memory it has allocated, but the pointer pi still holds the address (it is not set to null). After returning from method foo to main, when you use the pointer pi, it refers to the memory that has already been freed, so pi becomes a dangling pointer. Purify promptly reports a FMW error at that location.

Listing 9. An example of a memory leak and a dangling pointer

int *pi;
void foo() { 
    pi = (int*) malloc(8*sizeof(int)); /* Allocate memory for pi */
    /* Oops, leaked the old memory pointed by pi holding 4 ints */
    /* use pi */
    free(pi); /* foo() is done with pi, so free it */
}
void main() {
    pi = (int*) malloc(4*sizeof(int)); /* Expect MLK: foo leaks it */
    foo();
    pi[0] = 10; /* Expect FMW: oops, pi is now a dangling pointer */
}

Listing 10 shows an example of a potential memory leak. After incrementing pointer plk, it points to the middle of the memory block, but there is no pointer pointing to the beginning of that memory block. Therefore, a potential memory leak is reported at the memory allocation site for that block.

Listing 10. An example of potential memory leak

int *plk = NULL;
void genPLK() {
    plk = (int *) malloc(2 * sizeof(int)); /* Expect PLK */
    plk++;
}

Freeing invalid memory:
This error occurs whenever you attempt to free memory that you are not allowed to free. This may happen for various reasons: allocating and freeing memory through inconsistent mechanisms, freeing a non-heap memory (say, freeing a pointer that points to stack memory), or freeing memory that you haven't allocated. When using Purify for the Windows platform, all such errors are reported as freeing invalid memory (FIM). On the UNIX® system, Purify further classifies these errors by reporting freeing mismatched memory (FMM), freeing non-heap memory (FNH), and freeing unallocated memory (FUM) to indicate the exact reason for the error.

Freeing mismatched memory (FMM) is reported when a memory location is de-allocated by using a function from a different family than the one used for allocation. For example, you use new operator to allocate memory, but use method free to de-allocate it. Purify checks for the following families, or matching pairs:

malloc() / free()
calloc() / free()
realloc() / free()
operator new / operator delete
operator new[] / operator delete[]

Purify reports any incompatible use of memory allocation and de-allocation routine as an FMM error. In the example in Listing 11, the memory was allocated using the malloc method but freed using the delete operator, which is not the correct counterpart, thus incompatible. Another common example of an FMM error is C++ programs that allocate an array using the new[] operator, but free the memory using a scalar delete operator instead of array delete[] operator. These errors are hard to detect through code inspection, because the memory allocation and de-allocation locations may not be located close to each other, and because there is no difference in syntax between an integer pointer and a pointer to an integer array.

Listing 11. An example of a freeing mismatched memory error

void genFMM() {
    int *pi = (int*) malloc(4 * sizeof(int));
    delete pi; /* Expect FMM/FIM: should have used free(pi); */
    pi = new int[5];
    delete pi; /* Expect FMM/FIM: should have used delete[] pi; */
}

Freeing non-heap memory (FNH) error is reported when you call free with a non-heap address (a stack address, for instance). Freeing unallocated memory (FUM) is reported when you try to free unallocated memory, such as memory that you have already freed, or the pointer you are trying to free points to the middle of a memory block. Listing 12 shows examples of these errors.

Listing 12. Examples of freeing non-heap memory and freeing unallocated memory errors

void genFNH() {
    int fnh = 0;
    free(&fnh); /* Expect FNH: freeing stack memory */
}

void genFUM() {
    int *fum = (int *) malloc(4 * sizeof(int));
    free(fum+1); /* Expect FUM: fum+1 points to middle of a block */
    free(fum);
    free(fum); /* Expect FUM: freeing already freed memory */
}

JazzLab 블로그

2012년 1월 8일 일요일

Purify - 메모리 에러 설명

댓글 없음:

댓글 쓰기