Quick and Dirty Dynamic Arrays in C
C is an obsolete language and there’s really no good reason to use it. For all but the most esoteric projects, C++ should be used instead of C. I may elaborate on this theme in a future post.
That being said, sometimes you’re stuck with C code. Maybe you’re working on legacy code, or you don’t have control over the engineering requirements. Maybe management mistakenly thought C was “faster” or “more portable”. Maybe there’s no C++ compiler for your platform.
Recently, I was asked to reduce the memory footprint for a C application which had somehow ballooned to gargantuan proportions. The application was targeted at an embedded device with 64MB of RAM and no virtual memory. A bit of sleuthing revealed that a geocoding API was allocating a 20MB data structure. That’s a pretty hefty chunk of change for our little device. The API allocated space for up to 50 matches (each of which was 400kb), instead of using a dynamic array to return the results. The code was written by a junior engineer.
When using C, you tend to spend inordinate amounts of time managing arrays, strings, and hashtables. Think of the thousands of hours wasted on such minutae. Isn’t our collective time more valuable? I feel pity for the poor engineers (like me) forced to labor under such cruel conditions.
Here is a recipe for quick and dirty dynamic arrays in C. I have used this pattern many times with excellent results.
Introduction
This example is targeted at a specific use case. It will serve you well if you have to accumulate an unknown amount of data. You can roll this into an existing struct or grow it into a generic void* array class. I tend to use it inline in other data structures because it’s so simple. You can also use this pattern in Java if you want to avoid the overhead of the java.util.* classes.
Struct
First, create a struct or add the members to an existing struct. In this case I’m creating a new struct and my array will store char*’s. Note that pos is the number of elements in the array, and capacity is the number of elements that the array is capable of storing. Feel free to use other variable names if they make more sense to you.
Gub is simply a word I use a lot in test code, similar to foo or bar.
typedef struct {
int pos;
int capacity;
char **array;
} Gub;
Init the struct
calloc is a handy allocation function that fills the memory with zeroes before returning it. This is an easy way to avoid uninitialized memory errors. In this case, I’m also counting on that behavior because I want pos to be 0 and I want array to be NULL before proceeding.
Gub *g = (Gub*)calloc(sizeof(Gub), 1);
Create an empty array
When creating the empty array, it’s necessary to select an initial capacity. In this case the array can hold four elements before it needs to be resized. Note that we init capacity inline here. Personally I think this makes the code more readable.
This could be written just as easily with malloc (or calloc), but I prefer to use realloc to matche the code used when appending elements (see below). Realloc is a magical allocation function that resizes an existing block of allocated memory to a new size. It’s tremendously useful for resizable data structures. If you pass NULL to realloc (as we’re doing here with the g->array), realloc will simply allocate new memory for you.
g->array = (char**)realloc(g->array, (g->capacity = 4) *
sizeof(char*));
Append some data
This is a silly loop that appends argv to the array. Before adding an element to the array, we check to see if there’s room for more elements. If pos == capacity, the array is full and must be resized. Realloc resizes the block of memory and then we’re good to go. Again, note that capacity is modified inline during the realloc. I think this makes the code more readable. In this case I’m simply doubling the size of the array whenever we run out of space.
Also, note that g->pos is incremented inline when we actually add the element.
for (int i = 0; i < argc; ++i) {
if (g->pos == g->capacity) {
g->array = (char**)realloc(g->array, (g->capacity *= 2) *
sizeof(char*));
}
g->array[g->pos++] = argv[i];
}
Free
When we’re done with the array, we free it and then the containing struct. Don’t forget to free the elements themselves if necessary.
free(g->array); free(g);
Tuesday, August 1st 2006 at 3:24 pm
Adam,
I couldn’t disagree more with this part of your blog post:
C is an obsolete language and there’s really no good reason to use
it. For all but the most esoteric projects, C++ should be used instead
of C. I may elaborate on this theme in a future post.
I think it’s just the opposite. If you actually care about performance
or are close to the hardware, C is an obvious choice. The set of people
who should use C is smaller than it used to be, but for some areas there
is no substitute. (C++’s function call overhead and baffling compiler
make it unsuitable for really performance-sensitive stuff.)
On the other hand, if you’re doing application programming, there are
lots of good substitutes to C++ like Java and C#. (C shouldn’t even be
an option.) Python might also be an option. These languages offer lots
of features absent in C++, while eliminating a lot of stuff most C++
programmers don’t need. True, they’re probably slower, but who cares?
C++ occupies a place in the performance/expressiveness spectrum that’s
obsolete. I can’t think of a project today where I’d choose C++.
–Mike!
Tuesday, August 1st 2006 at 3:28 pm
There is a very small penalty for using class functions in C++. If you actually care about that overhead, you’re probably working on one of the “esoteric projects” to which I refer. I imagine this sort of project is more common in academia.
If you’re shipping a product for consumers or businesses, this won’t matter to you. I’ve worked on “performance sensitive” code on a variety of embedded devices with small CPUs (details omitted) and I can definitively say that it’s not function call overhead that impacts performance.
Tuesday, August 1st 2006 at 4:06 pm
Function call overhead makes a difference in places close to the metal, like
operating systems and device drivers. These are the most practical of projects,
not academic at all. (Academics would shy from C wherever possible, just
because the development time is high compared to a more expressive language.
But for some areas of research, like OSes, you have to use C.)
The latest version of Office is written mainly in C#, and even a popular program
like BitTorrent is written in Python. So I don’t see any place for C++ on a PC any longer.
Are embedded devices a good place for C++? Maybe, as they simulate what
computing was like in 1992 - 1998 or so. But this seems like a temporary
situation; you’d expect more development to move to higher-order application
languages as CPUs grow in power, just like what happened on the PC.
Indeed, more and more cell phone dev is moving to Java/BREW. I heard that
TiVo had some kind of Java-based toolkit (though I don’t know what ever happened
to it). It seems reasonable to assume that other consumer devices will follow suit, if they
haven’t already.
C++ was OK while it lasted, but it was essentially transitional. I don’t think anyone
would pick it if there are the resources to run Java or C#.
Saturday, October 14th 2006 at 1:42 am
Hi,
I just want to say, C doesn’t supports inline variable declaration in a “For Loop” declaration. So, if you want it to be pure C; you should change line “for (int i = 0; i
Monday, April 16th 2007 at 7:17 am
try dyanamic array in C99.
okay!!!!
its nicely working and nice to use as well………