Tuesday, June 7, 2011

Words to Code By

"Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction." - Albert Einstein

Monday, June 6, 2011

Concatenation Buffer Implementation

As promised in my previous post, I will now present the implementation for a dynamic string buffer class which concatenates until reset() is called. We start with a typedef for the structure which contains the class members:


/* CB is for Concatenation Buffer, a dynamically sized null
 * terminated string buffer which accumulates until CB_reset() is
 * called.  It is particularly useful for things like creating
 * complex SQL queries.
 */

typedef struct _cb {
  size_t sz,
         len;
  char *buf;
} CB;

Explaining each of the members:

  • buf - points to the heap allocated memory in which we store our string.
  • len - current length of the string in our buffer.
  • sz - current size of our buffer, which must be able to hold the string and the null terminating byte on the end (len + 1).

Now, let's get started with the constructor:

CB* CB_constructor(CB *self, size_t sz_hint)
/***************************************************************
 * Prepare a CB for use with initial size of sz_hint.
 */
{
  CB *rtn= NULL;
  assert(sz_hint);
  self->sz= sz_hint;
  if(!(self->buf= malloc(self->sz))) goto abort;
  CB_reset(self);
  rtn= self;
abort:
  return rtn;
}

The first argument, self, is the address of the CB instance which we will initialize. We can't know whether it resides on the stack, on the heap, or in the static data segment. sz_hint is the caller's hint as to how large the buffer should be initially; if it needs to get bigger, it will be realloc()'d on demand. We initialize rtn to NULL to indicate failure. If everything goes OK, it is set to the value of self before the constructor returns. We use the assert() macro to verify that the caller did not pass us a sz_hint of 0. I am amazed at how many "C" programmers I have encountered who do not know about the assert() macro. Finally we malloc() the buffer memory and call CB_reset(), which sets len to 0 and null terminates the empty string.

The destructor is very simple:

void* CB_destructor(CB *self)
/****************************************************************
 * Free resources associated with CB.
 */
{
  if(self->buf) free(self->buf);
  return self;
}

And we'll need a function to grow the buffer when required. Since this function is "private", we declare it static so that it is invisible outside of the current source file:

static int
growbuf(CB *self)
/****************************************************************
 * Attempt to increase the size of the buffer, initially trying to
 * double it, then backing off 10% at a time.
 * Returns non-zero for error.
 */
{
  int rtn= 1;
  size_t i;
  /* Initially try to double the memory. If that fails, back off
   * 10% at a time.
   */
  for(i= 20; i > 10; i--) {
    char *p;
    size_t new_sz= self->sz * i / 10;
    /* Try to reallocate the memory */
    if(!(p= realloc(self->buf, new_sz))) continue;
    self->buf= p;
    self->sz = new_sz;
    break;
  }
  if(i != 10) rtn= 0; /* Note success if we grew the buffer */
  return rtn;
}
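
To make the back-off schedule concrete, here is a small sketch that lists the sizes growbuf() would request, in order, for a given current size. The helper growbuf_candidates() is hypothetical and exists only for illustration; it performs no allocation and is not part of the CB class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the CB class: fill out[] with the
 * sizes growbuf() would pass to realloc() for a buffer of size sz,
 * largest first.  Returns the number of candidates written (10).
 */
static size_t
growbuf_candidates(size_t sz, size_t out[10])
{
  size_t i, n= 0;
  assert(out != NULL);
  /* Same loop bounds and arithmetic as growbuf(): 200%, 190%, ... 110% */
  for(i= 20; i > 10; i--) {
    out[n++]= sz * i / 10;
  }
  return n;
}
```

For a 100 byte buffer the candidates run from 200 down to 110. Because of integer division, very small buffers can produce repeated candidates, and the last one may not be larger than the current size at all (for sz = 5 the final candidate is 5), so a generous sz_hint is the safer choice.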

And a sprintf() like variadic function used to build the string:

int
CB_sprintf(CB *self, const char *fmt, ...)
/****************************************************************
 * Same as sprintf, except you don't have to worry about buffer
 * overflows.
 * Returns the number of characters appended, or -1 for error.
 */
{
  int is_done, rtn= -1;
  va_list arglist;
  /* Catch empty strings */
  if(!strlen(fmt)) return 0;
  /* vsprintf the fmt string */
  for(is_done= 0; !is_done;) {
    int rc;
    va_start (arglist, fmt);
    rc= vsnprintf(
        self->buf+self->len,
        self->sz - self->len,
        fmt,
        arglist);
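    if(rc < 0) {                /* vsnprintf() reported an error */
      self->buf[self->len]= '\0'; /* Discard any partial output */
      is_done= 1;
    } else if((size_t)rc >= self->sz - self->len) {
      /* Buffer isn't large enough; grow it and try again */
      if(growbuf(self)) {
        self->buf[self->len]= '\0'; /* Discard the partial output */
        is_done= 1;
      }
    } else {                    /* Successful return */
      rtn= rc;
      self->len += rc;
      is_done= 1;
    }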
    va_end (arglist);
  }
  return rtn;
}
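
CB_vsprintf(), which the class declaration lists next to CB_sprintf(), is not shown in this post. What follows is only a sketch of one way it might be written, not the original implementation. It assumes a C99 compiler (for va_copy() and the C99 return value of vsnprintf()), and it uses a simplified stand-in named grow() in place of growbuf(). The key point is that a va_list cannot be restarted with va_start() inside a function that receives it as a parameter, so each retry works on a copy.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct _cb { size_t sz, len; char *buf; } CB;

/* Simplified stand-in for growbuf(): just try to double the buffer. */
static int grow(CB *self)
{
  size_t new_sz= self->sz * 2;
  char *p= realloc(self->buf, new_sz);
  if(!p) return 1;
  self->buf= p;
  self->sz= new_sz;
  return 0;
}

/* Sketch: append formatted output; returns characters appended or -1. */
int CB_vsprintf(CB *self, const char *fmt, va_list ap)
{
  assert(self && fmt);
  if(!strlen(fmt)) return 0;          /* Catch empty strings */
  for(;;) {
    va_list cp;
    int rc;
    size_t avail= self->sz - self->len;
    va_copy(cp, ap);                  /* ap must survive for a retry */
    rc= vsnprintf(self->buf + self->len, avail, fmt, cp);
    va_end(cp);
    if(rc < 0) break;                 /* formatting error */
    if((size_t)rc < avail) {          /* it fit, terminator included */
      self->len += (size_t)rc;
      return rc;
    }
    if(grow(self)) break;             /* out of memory */
  }
  self->buf[self->len]= '\0';         /* discard any partial output */
  return -1;
}

/* With CB_vsprintf() available, CB_sprintf() reduces to a thin wrapper. */
int CB_sprintf(CB *self, const char *fmt, ...)
{
  va_list ap;
  int rc;
  va_start(ap, fmt);
  rc= CB_vsprintf(self, fmt, ap);
  va_end(ap);
  return rc;
}
```

Writing the variadic function as a wrapper around the va_list version keeps the retry loop in one place, which is the usual arrangement for printf-style families.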

With that I'll wrap up this post. Next time I will tackle simple inheritance. In the meantime, happy coding!

Monday, May 16, 2011

Static Initialization

For cases where the number of instances (per thread) of a class is known at compile time, it is most efficient to place them in the static data segment. C and C++ provide static initialization facilities for structures and simple data types where the values are known at compile time. A more general solution to initializing static instances is to call the class constructor, and the C++ startup code does just that before the main() function is called. This strategy does not work, however, for shared objects which are loaded at runtime.

An even more general approach is to arrange for the class constructor to be called on each static instance of the class exactly once per process before the instance is used. This could be done like so:

void someFunc(void)
{
  static int is_initialized= 0;
  static CLASS s_class;

  if(!is_initialized) {
    is_initialized= 1;
    CLASS_constructor(&s_class);
  }

  /* do stuff with s_class */
}

This is not thread-safe. Most compilers have a keyword which causes a separate static instance of your data to exist in each thread. For GCC, the keyword is "__thread". So, a thread-safe version for GCC is:

void someFunc(void)
{
  static __thread int is_initialized= 0;
  static __thread CLASS s_class;

  if(!is_initialized) {
    is_initialized= 1;
    CLASS_constructor(&s_class);
  }

  /* do stuff with s_class */
}

For many classes this pattern of initialization occurs frequently enough that it deserves its own initializer. Here is an example of a static initializer which calls the class constructor the first time, and the class reset() function thereafter:

typedef struct _CLASS {
  int is_initialized;
  /* Other members go here */
} CLASS;

void CLASS_sinit(CLASS *self)
{
  if(!self->is_initialized) {
    CLASS_constructor(self);
    self->is_initialized= 1;
  } else {
    CLASS_reset(self);
  }
}

Note that CLASS_sinit() takes advantage of the fact that all uninitialized static data is guaranteed to be zero (or NULL for pointers). Here is how you would use CLASS_sinit():

void someFunc(void)
{
  static __thread CLASS s_class;
  CLASS_sinit(&s_class);

  /* do stuff with s_class */
}
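
To see the construct-once, reset-thereafter behaviour in action, here is a small sketch built around a hypothetical COUNTER class. None of these names come from the posts; the n_constructed and n_reset members exist only to make the behaviour observable, and the GCC "__thread" keyword is assumed to be available.

```c
#include <assert.h>

/* Hypothetical example class; the two counters are instrumentation. */
typedef struct {
  int is_initialized;
  int n_constructed;   /* how many times the constructor ran */
  int n_reset;         /* how many times reset ran */
  int value;
} COUNTER;

static COUNTER* COUNTER_constructor(COUNTER *self)
{
  self->n_constructed++;   /* relies on static storage starting at zero */
  self->value= 0;
  return self;
}

static void COUNTER_reset(COUNTER *self)
{
  self->n_reset++;
  self->value= 0;
}

static void COUNTER_sinit(COUNTER *self)
{
  assert(self);
  if(!self->is_initialized) {
    COUNTER_constructor(self);
    self->is_initialized= 1;
  } else {
    COUNTER_reset(self);
  }
}

/* Every call starts from a clean value, but construction happens only
 * once per thread.
 */
static const COUNTER* use_counter(int add)
{
  static __thread COUNTER s_counter;
  COUNTER_sinit(&s_counter);
  s_counter.value += add;
  return &s_counter;
}
```

Calling use_counter(5) and then use_counter(7) from the same thread leaves value at 7, with the constructor having run once and reset once.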

It is important to note that while using static instances this way is thread-safe, it is not reentrant, and probably shouldn't be used in source code intended for libraries.
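
The reentrancy problem is easy to reproduce with recursion. In this sketch the ACC class and both functions are hypothetical, and ACC_sinit() simply stands in for a constructor-or-reset initializer. Each function is meant to return its own argument, but the version with a static instance has its state wiped by the nested call.

```c
#include <assert.h>

/* Hypothetical toy class used only for this illustration. */
typedef struct { int total; } ACC;

/* Stands in for "construct the first time, reset thereafter". */
static void ACC_sinit(ACC *self)
{
  self->total= 0;
}

/* Intended result: n.  The recursive call re-runs ACC_sinit() on the
 * same static instance and overwrites the value stored by its caller.
 */
static int shared_static(int n)
{
  static __thread ACC s_acc;
  assert(n >= 0);
  ACC_sinit(&s_acc);
  s_acc.total= n;
  if(n > 0) shared_static(n - 1);
  return s_acc.total;
}

/* Same logic with a stack instance: every call owns its state. */
static int stack_local(int n)
{
  ACC acc;
  assert(n >= 0);
  ACC_sinit(&acc);
  acc.total= n;
  if(n > 0) stack_local(n - 1);
  return acc.total;
}
```

Here shared_static(3) returns 0 rather than 3, while stack_local(3) returns 3. The same clobbering can happen indirectly, for example when a callback or signal handler ends up calling the function again.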

Now it's time to start putting together this post with the previous posts to create a useful dynamic string buffer class. In order to improve the usability of the class, I have chosen to make the contents accumulate until the reset() function is called, and so it's really a concatenation buffer, or CB for short. Here is the class declaration:

#include <stdarg.h>
#include <stdlib.h>

/* CB is for Concatenation Buffer, a dynamically sized null
 * terminated string buffer which accumulates until CB_reset() is
 * called.  It is particularly useful for things like creating
 * complex SQL queries.
 */

typedef struct _cb {
  size_t sz,
         len;
  char *buf;
} CB;

#define CB_str(self) \
  ((const char*)(self)->buf)
/****************************************************************
 * Return the pointer to the buffer.
 */

#define CB_len(self) \
  ((self)->len)
/****************************************************************
 * Return the current length of the string in the buffer
 */

#define CB_reset(self) \
  ((self)->buf[((self)->len= 0)]= '\0')
/****************************************************************
 * Reset the buffer so that the length is zero.
 */

int CB_sinit(CB *self, size_t sz_hint);
/****************************************************************
 * Initialization to be called on static instances of CB before
 * each use; the actual construction only occurs once.
 */

CB* CB_constructor(CB *self, size_t sz_hint);
/****************************************************************
 * Prepare a CB for use with initial size of sz_hint.
 */

#define CB_create(p, sz_hint)\
   ((p)= ((((p)=malloc(sizeof(CB))) && CB_constructor((p), (sz_hint))) ? \
     (p) : ((p) ? realloc(CB_destructor(p),0): 0)))


void* CB_destructor(CB *self);
/****************************************************************
 * Free resources associated with CB.
 */
#define CB_destroy(self)\
  free(CB_destructor(self))

int CB_sprintf(CB *self, const char *fmt, ...);
/****************************************************************
 * Same as sprintf, except you don't have to worry about buffer
 * overflows.  Returns -1 for error.
 */

int CB_vsprintf(CB *self, const char *fmt, va_list ap);
/****************************************************************
 * Same as vsprintf, except you don't have to worry about buffer
 * overflows.  Returns -1 for error.
 */

In my next post I will go over the implementation of the CB class, which is quite simple.

Friday, May 13, 2011

OOPinC Destructor Anatomy

Previously I covered a simple constructor for OOPinC, and now I will cover a destructor. The purpose of a destructor is to free any resources associated with an instance of a class when it is no longer needed. If the class does not use instance-specific resources (e.g. heap memory, opened files, pipes, etc.), then the destructor is very simple:


void*
CLASS_destructor(CLASS *self)
{
  /* Free your resources here */
  return self;
}

Of course we'll need a replacement for the C++ delete operator:


#include <stdlib.h>
#define CLASS_destroy(p) \
  free(CLASS_destructor(p))
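
For a class that does own resources, the destructor also has to cope with instances whose constructor failed part way through. Here is a minimal sketch, using a hypothetical RECORD class that owns two heap buffers. A count of zero is rejected purely to give the example a deterministic failure path, and the destructor resets the pointers to NULL only so that the result is easy to inspect.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical class owning two heap buffers. */
typedef struct {
  char *name;
  int  *values;
} RECORD;

static RECORD*
RECORD_constructor(RECORD *self, const char *name, size_t n_values)
{
  RECORD *rtn= NULL;
  assert(self && name);
  /* NULL every pointer first so the destructor can always run safely */
  self->name= NULL;
  self->values= NULL;
  if(!(self->name= malloc(strlen(name) + 1))) goto abort;
  strcpy(self->name, name);
  if(!n_values) goto abort;   /* illustrative failure after one allocation */
  if(!(self->values= calloc(n_values, sizeof(int)))) goto abort;
  rtn= self;
abort:
  return rtn;
}

static void*
RECORD_destructor(RECORD *self)
{
  /* free(NULL) is a no-op, so a partially constructed instance is fine */
  free(self->values);
  self->values= NULL;
  free(self->name);
  self->name= NULL;
  return self;
}
```

When RECORD_constructor() fails after allocating name, calling RECORD_destructor() on the same instance releases name and leaves nothing behind.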

There is one more form of initialization that most classes should have, and that is a special static initializer for the case where the instance resides in the static data segment. In my next post I will cover the static initializer, and then I'll start work on a useful string buffer class to eliminate those pesky buffer overflows that wreak all kinds of havoc on careless programmers ;-)

Wednesday, May 11, 2011

Replacing and upgrading the "new" operator

As promised in my previous post, I will now endeavour to replace and upgrade the functionality of C++'s new operator.

The new operator in C++ allocates enough memory for an instance of CLASS from the heap, and then calls the constructor for CLASS with this set to the address of the allocated heap memory. The upgraded replacement for new presented here provides the following services:

  1. Allocates memory for the instance from the heap.
  2. Calls constructor with address of instance to be initialized.
  3. Handles possible failures of both malloc() and the constructor.
  4. In the case of constructor failure the destructor is called, and the memory is freed.

Recall the OOPinC prototype of the constructor for class "CLASS":

CLASS* CLASS_constructor(CLASS *self);

And now, a macro to provide new operator functionality in C:

#include <stdlib.h>
#define CLASS_create(p) \
  ((p)= ((((p)=malloc(sizeof(CLASS))) && CLASS_constructor(p)) ? \
  (p) : \
  ((p) ? realloc(CLASS_destructor(p),0): 0)))

Which would be used like so:

CLASS *pClass;
CLASS_create(pClass);
if(!pClass) {
  /* error reporting and/or recovery */
}

or, more succinctly:

CLASS *pClass;
if(!CLASS_create(pClass)) {
  /* error reporting and/or recovery */
}

That's a lot to bite off at once, so let's break the macro down into components. The outermost structure is the ternary conditional operator which will branch differently based on the return value of CLASS_constructor(). In our case a return value of NULL means the constructor failed, otherwise the constructor was successful. The allocation of memory from the heap for our instance is accomplished by calling malloc(), which takes sizeof(CLASS) as the argument telling how much memory needs to be allocated. Note that the return of malloc() is stored in p for later reference within the macro.

If the constructor was successful initializing the memory referenced by p, then the value assigned to p is returned. If not, then there is a nested ternary conditional operator to deal with the two possible modes of failure:

  • In the case that malloc() was successful but the constructor failed, CLASS_destructor() is called with the value of p as the argument. CLASS_destructor() should be able to handle partially constructed instances of CLASS because that is usually the product of a failed constructor. Finally, the return of CLASS_destructor() is passed as the first argument to realloc(), with the second (size) argument set to 0. realloc() is used instead of free() because it returns void*, which is reconcilable with CLASS* and provides the required return for the non-null branch of the outer ternary conditional operator.
  • In the case that malloc() returns NULL, the macro itself returns NULL.
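
One portability caveat: the C standard leaves the result of realloc(ptr, 0) implementation-defined, so on some libraries it may return a non-NULL pointer, and C23 makes the call undefined. Here is a sketch of a variant that avoids it by using free() together with the comma operator. The WIDGET class, its constructor argument, and the g_destructor_calls counter are all hypothetical and exist only to make the failure path observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical class whose constructor can be made to fail on demand. */
typedef struct { int value; } WIDGET;

static int g_destructor_calls= 0;   /* instrumentation for the example */

static WIDGET* WIDGET_constructor(WIDGET *self, int value)
{
  assert(self);
  if(value < 0) return NULL;        /* simulated constructor failure */
  self->value= value;
  return self;
}

static void* WIDGET_destructor(WIDGET *self)
{
  g_destructor_calls++;
  return self;
}

/* Variant of the create macro:
 *  - the constructor only runs if malloc() succeeded
 *  - on constructor failure the instance is destroyed and freed, and
 *    the comma operator makes the expression evaluate to NULL
 */
#define WIDGET_create(p, value) \
  ((p)= malloc(sizeof(WIDGET)), \
   (p)= ((p) && !WIDGET_constructor((p), (value))) ? \
        (free(WIDGET_destructor(p)), (WIDGET*)0) : (p))
```

It is used the same way as before: "if(!WIDGET_create(w, 7)) { /* error */ }". The cast on the null pointer constant gives both branches of the conditional the same pointer type.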

In my next post I will present the anatomy of an OOPinC destructor.

Tuesday, May 10, 2011

OOPinC_constructor(&reader);

Hello and welcome to the Object Oriented Programming in C blog. My name is John Robertson, and I am a professional software engineer and Linux guru located in the southeastern United States of America. The purpose of this blog is to share and extend a specific set of highly productive object oriented programming techniques in C which I have developed over the last 14 years.

It all started back in 1997 when I contracted to write a warehouse inventory application on a hand-held Psion 16-bit computing platform. While my roots are in C, I had been programming in C++ for several years, and was disappointed to discover that no C++ compiler was available for the Psion platform at that time. My choices were Psion's version of Basic, or C, so I chose C.

After programming in C++ for several years I had grown accustomed to the formal object oriented constructs of C++. Remembering that C++ started out as a preprocessor to C, I began to contemplate ways that object oriented programming could be coded in C such that the code is both versatile and legible. The Psion inventory application project went very well, and since then I find myself drawn back to these techniques time and time again with excellent results.

So, let's get started with a constructor for a class "CLASS" in C:

typedef struct _CLASS {
  /* members go here */
} CLASS;

CLASS*
CLASS_constructor(CLASS *self)
{
  CLASS *rtn= NULL;
  /* initialize members. If something goes wrong,
   * goto abort, which will cause
   * the constructor to return NULL.
   */
  rtn= self; /* Assign successful return value */

abort:
  return rtn;
}


The address of the instance to be initialized is passed in as self, which can be located on the stack, on the heap, or in the static data segment. Furthermore, success can be determined by examining the constructor's return value.
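
As an illustration of that flexibility, the sketch below constructs one instance in each kind of storage and checks the constructor's return value every time. The LABEL class, its 32 byte buffer, and the construct_everywhere() function are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical class with a single heap-allocated member. */
typedef struct { char *text; } LABEL;

static LABEL* LABEL_constructor(LABEL *self)
{
  LABEL *rtn= NULL;
  assert(self);
  if(!(self->text= malloc(32))) goto abort;
  self->text[0]= '\0';
  rtn= self;   /* Assign successful return value */
abort:
  return rtn;
}

static void* LABEL_destructor(LABEL *self)
{
  free(self->text);
  self->text= NULL;
  return self;
}

static LABEL s_label;   /* lives in the static data segment */

/* Construct an instance in each kind of storage; returns how many
 * constructors reported success.
 */
static int construct_everywhere(void)
{
  int n_ok= 0;
  LABEL on_stack;
  LABEL *on_heap= malloc(sizeof(LABEL));

  if(LABEL_constructor(&s_label))  { n_ok++; LABEL_destructor(&s_label); }
  if(LABEL_constructor(&on_stack)) { n_ok++; LABEL_destructor(&on_stack); }
  if(on_heap && LABEL_constructor(on_heap)) {
    n_ok++;
    LABEL_destructor(on_heap);
  }
  free(on_heap);
  return n_ok;
}
```

The constructor never needs to know where the memory came from; only the caller decides how the storage itself is obtained and released.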

In my next post I will present a CLASS_create() macro which provides the functionality of the C++ new operator, with some extra goodies.