Basic C- pointers, freeing memory, and debugging

Hello all! I’m writing this for the students of CS201: Operating Systems, but hope it will be handy for anyone stumbling onto this article. This will assume programming experience, but try to explain pointers (as well as command line debugging with the GNU Debugger) in a beginner-friendly way.

Basic Pointers#

Computer memory can be thought of as a very, very long array. Each index corresponds to a particular memory address, and everything in your program lives somewhere in memory.

#include <stdio.h>

int main(int argc, char *argv[]){
    int x = 5;
    char greeting[] = "Hello, world";
    printf("%s\n", greeting); /* never pass a non-literal string as the format */
}

Output: Hello, world

Within this short program, we have two variables that are placed into memory: x and greeting. You can ask C for a variable’s memory address with the address-of operator (the ampersand, &), as follows:

#include <stdio.h>

int main(int argc, char *argv[]){
    int x = 5;
    printf("x lives at %p\n", (void *)&x); /* %p expects a void pointer */
}

Output: x lives at 0x7ffc65b04264

That hex number is a pointer, and you can think of it as an index into memory. Imagine a pointer as the street address of a piece of memory. A pointer is nothing more than the numerical address of some block of memory (modern memory is a little more complex than “a giant array” but it’s a fine abstraction for now, and the one C uses). What can you do with an address? Visit it and see what’s there. We call this dereferencing a pointer (because a pointer is a reference to a place in memory), as follows:

#include <stdio.h>

int main(int argc, char *argv[]){
    int *x;      /* x can hold the address of an int */
    int y = 0;
    x = &y;      /* x now points at y */
    printf("The value %d is at %p\n", *x, (void *)x);
}

Output: The value 0 is at 0x7ffdfa7021ac

C follows that pointer into memory. Notice that for the second value in the format string I passed plain x, with no ampersand or asterisk. That’s because x already is an address. It’s like an integer index into an array, except the array is computer memory and the index is conventionally printed in hex.

You get something’s memory address with &

You follow a memory address with *

* put in front of a variable means “go to this address in memory, give me whatever is there”. Where do monstrosities like 5 or 6 star variables come from? Well, a pointer can point to a piece of memory containing anything - including other pointers.

#include <stdio.h>

int main(){
    int var = 5;
    int *ptr = &var;     /* address of var */
    int **pptr = &ptr;   /* address of ptr */
    printf("%p points to %p points to %d\n", (void *)pptr, (void *)*pptr, **pptr);
}

Output: 0x7fff5cf1b2b8 points to 0x7fff5cf1b2b4 points to 5

I could nest those pointers ad infinitum to no ill effect, as long as I was careful. Linked lists work on a similar principle, as you’ll see. But for now, let’s take a deeper dive into how your C program uses memory.

The Stack and The Heap#

typedef struct{
    int x; 
    int y; 
} myStruct; 

int main(){
    myStruct m; 
    m.x = 5; 
    m.y = 6;
    return 0; 
}

That’s perfectly valid C code, but it looks wrong, doesn’t it? That’s because structs are traditionally allocated on the heap. What’s the heap?

When your process (or thread) starts, the operating system allocates it a fixed amount of memory. This memory is organized like a stack (the data structure). Whenever a function is called (main included), C puts a new frame onto the stack. All the local variables in that function are saved on that frame, and exist in the same scope. If you call another function, a new frame goes on, and the code in the new function can’t see the variables in the old frame. When that function returns, the nature of the stack means that all C has to do to clean up is pop the frame (move the stack pointer back, really). This is not unique to C: there are very few languages that don’t rely on call stacks, FORTRAN 77 being the only one I have heard of.

How big is the stack? By default on most Linux machines, 8 megabytes for the main thread (ulimit -s will tell you the limit on your machine). It’s actually decently easy to exceed the stack size: every function call gets a frame, which takes a little memory. (With optimizations enabled, gcc can often eliminate the extra frames of a tail-recursive function, which is called tail call optimization, but you can still force an overflow.) Thus, enough recursion can easily consume the entire stack space and you get a stack overflow. That’s where the name of your favorite website comes from. So what do we do if we don’t want to store everything on the stack? There’s only so much memory, and C passes function arguments by value: to pass the struct in my previous code, I’d be copying it onto the new stack frame (which consumes time and memory), and none of the changes I made to it in the called function would be visible in my calling function.

That’s where the heap comes in. Think of the heap as extra memory you can ask the operating system to give you in addition to your stack memory. You get this memory with an allocation function like malloc. Allocation functions take as an argument the size of the piece of memory you want, and return a pointer to the memory they’ve allocated (which may be NULL if something’s gone wrong, so check it). Of course, there’s no free lunch. If you allocate and allocate memory, it’ll eventually run out. Theoretically, heap allocation can grow up to the limit of the virtual memory your operating system provides (but this is compiler implementation dependent, operating system dependent, etc). You eventually have to signal, “I’m done with this piece of memory”. You do that via free(). free takes a pointer to the memory you’re done with and returns that block to the allocator, which can hand it out again in a later allocation.

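An important note is that if two or more pointers point to the same memory, calling free with any one of them frees that memory for all of them. Free it exactly once: the other pointers are left dangling, and calling free on the same block a second time is undefined behavior. Pointers are just addresses; the actual structs live out in the heap.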

So what happens if you access memory after you’ve freed it? Undefined behavior. C has a specification, and it defines use-after-free as “undefined behavior”, which means literally anything can happen. What actually happens is dependent on your operating system, your computer architecture, how many processes are running, the phase of the moon… and so on. When do you want to call free()? When you’re done using a piece of memory. When you’re sure you’re done using a piece of memory. If your function allocates memory for a calling function to use, you don’t want to free it, because then you’re giving the caller a reference to de-allocated memory, which they’ll try to use, which is undefined behavior. But you do want to be sure to free your memory eventually, or you get a memory leak: the program keeps holding memory it no longer uses, and a long-running program that leaks can eventually exhaust the heap memory available to it. Of course, if your program terminates at some point, all the memory it allocated is automatically reclaimed, heap or stack. Thankfully determining if a program halts is a super easy problem... So learning to program without memory leaks is essential to writing actually usable C code.

Of course, your error could be in the other direction- forgetting to allocate rather than deallocate. Humans make mistakes. What happens then?

#include <stdio.h>

int main(int argc, char *argv[]){
    int * x; /* never initialized, so it holds an arbitrary address */
    printf("%p points to %d",x,*x);
}

Output: Segmentation fault (core dumped)

This is the shortest reasonable C that segfaults (code golf record is 5 characters, main;, but I digress).

So remember the earlier street address metaphor? Visiting someone’s house, you have to wait for them to let you in. C is a lot like that: your computer memory is protected. When your C program goes looking for something in memory, your operating system has to allow the access. If your program tries to get into somewhere it shouldn’t be (protected memory, or memory owned by another process), the operating system sends it a SIGSEGV signal (signal 11 on most systems), which causes it (by default) to crash. It’s totally possible to catch and recover from a SIGSEGV like a normal Exception in an OOP language, but it’s a terrible idea unless you’re an arcane C wizard.

Segfaults can be caused by:

Dereferencing NULL (every time)
Dereferencing a random pointer
Might be caused by dereferencing into freed memory (again- undefined behavior)
Might be caused by indexing past the end of the array. You might get random garbage data, because the array might be laid next to unprotected memory, but it might segfault. There’s some randomness here, so be wary. C doesn’t ever warn you you’re indexing-out-of-bounds, which can pollute computation with garbage data.
Buffer overflow
Stack overflow
And sometimes other things

So how do you debug this? C isn’t exactly forthcoming with error messages here. In a complicated program, you can have any number of dereferences, and any could be the offender. Besides binary-searching your program with printf debugging (which does work), what can we do? You could use an IDE’s debugger, but sometimes C behaves differently on Linux than on Windows or Mac, and your assignments for CS201 will be graded on Silk. If it runs fine on your computer, but tests fail when you upload it, your last resort might be debugging over ssh on Silk, using GDB (and maybe vi if you need a good command line editor).

GDB- GNU DeBugger#

GDB is a useful command line debugger for (currently) C, C++, D, Go, Objective-C, Fortran, OpenCL C, Pascal, Rust, Assembly, Modula-2, and Ada. It’s installed on Silk, and on most Linux systems (it’s part of the GNU project). To use it, we compile your C program with debugging info attached, like so:

gcc -g badidea.c

-g as in debu(g)ging information. And then run:

gdb a.out

You’ll be greeted by the GDB prompt. GDB by default does, well, nothing. You have to give it a command, and pressing enter on a blank line repeats the last command. To get started, let’s just run our loaded program by typing “run” and hitting enter.

(gdb) run

Program received signal SIGSEGV, Segmentation fault.
0x000055555555465d in main (argc=1, argv=0x7fffffffdf48) at badidea.c:5
5           printf("%p points to %d",x,*x);

We’re off to a great start! It might not sound like much, but GDB has done the essential service of finding the offending line for us. This is 95% of what you’ll likely need to use GDB for this semester. But let’s say you want to use GDB like you’d use a normal debugger- it can do that too! Here’s a short list of commands that’ll help:

run run the program normally, pausing at breakpoints. If no crashes happen, it will just produce normal output
file filename loads an executable into GDB. You can start GDB without any filenames and load them after, or load multiple files.
break use a form of this, and then enter run. GDB will pause execution at the breakpoint set
- break filename:linenumber sets a breakpoint at the specified line of the specified file. break linenumber works equally well if you’ve only loaded 1 file.
- break functionName you can set a breakpoint on a function, so whenever it’s called GDB will stop
- break filename:linenumber (conditional) you can set a breakpoint to occur ONLY IF an expression is true. As an example break 42 if i >= ARRAYSIZE, check if your for-loop’s counter-variable has gotten bigger than the array. You can reference any program variables that would be in scope for that breakpoint in the conditional.
watch var sets a watchpoint, so that whenever var changes, GDB will pause there
- If you have more than one of the same named var in your program, GDB picks the one in your current scope. If you haven’t started executing yet, it takes the first one it finds.
continue resume execution until the next breakpoint is hit
step execute one line of code. Step follows function calls.
next execute one line of code, but treat function calls as an instruction without going into them
print var prints a variable.
- print *ptr you can dereference a pointer inside a print command! As many times as you want, or use an arrow operator
- print structName GDB will print out the whole structure for you, with all contained values.
backtrace run after a segfault to produce a backtrace of the call stack leading to that fault. Looks a lot like Java’s stack trace on Exceptions.
where an alias for backtrace. Both work any time the program is paused, not just after a segfault, and show the call stack up to the point you’re at.
delete breakpointNumber deletes a specific breakpoint by its number, as listed by info breakpoints. E.g. delete 2 deletes breakpoint number 2. To remove the breakpoint on line 10 instead, use clear 10.
info breakpoints lists all your existing breakpoints and info

GDB is massive and ancient (first released in 1986), so there is more obscure stuff you can do with it, but those are the most relevant and useful. Google the GDB manual if you want to know more.

Conclusion#

I hope that helped, and if you got nothing else from it, here’s some concrete advice:

The compiler is your friend, so don’t ignore compiler warnings. Go by the NASA standard, which is to run the compiler with warnings turned all the way up and treated as errors (gcc -Wall -Wextra -Werror file.c is a good start). gcc can catch a lot of basic errors for you.
Test & debug your code in the same environment it’s being graded in (Silk), before you hand it in.
It isn’t necessary (but does help) to google a few command-line-basics articles and the basics of a command-line editor such as Vim, Emacs, or Nano.