Inheritance and Polymorphism in Plain C

People often say C is not an object-oriented language because it's missing inheritance and polymorphism, but this is not quite accurate! I would argue that OOP is more about how you structure a solution than what language features you use. You can create the abstraction of objects in C just as well as other languages like C++ and Java, with a bit of extra work.

In this post, I want to show how inheritance and polymorphism can be achieved in plain C. Since C has no syntactic sugar for this, we will need to build things from scratch using plain old structures and functions. As a side effect, this shows what languages with built-in object support are doing for you under the hood.

Inheritance in C++

In case you haven't been on LinkedIn recently, inheritance is a way to declare types as extensions of other types. Consider this C++ code:

// base class
struct Shape {

    Color color;

    void  setColor(Color color_) { color = color_; }
};

// derived class
struct Square : Shape {

    float side;

    float calculateArea() { return side * side; }
};

We declared two types: Shape and Square. The Shape object has a color and a method to change it. The Square has a side variable and a method to calculate its area. Since we want the Square to have a color just like any other shape, we define it to be an extension of the Shape type (this is what the Square : Shape notation means). The Square implicitly inherits all fields and methods of the base type as its own. For instance, you could do:

int main()
{
    Square square;
    square.setColor(RED);
}

The setColor method was declared on Shape and not Square, but it was inherited.

Any code implemented for Shape will also work on Squares now. Say you have a function for drawing a shape:

void draw(Shape*);

int main()
{
    Square *square = createSquare();

    draw(square);
}

The draw function was defined to work on a Shape, which means it will also work on types derived from it. This simplifies code reuse quite a bit!

Now let's peek under the hood to understand how this works. Whenever a type inherits from another, all of the base type's fields are implicitly inserted before the derived type's own fields:

// base class
struct Shape {

    Color color;

    void  setColor(Color color_) { color = color_; }
};

// derived class
struct Square : Shape {

    // Color color; <----- Implicit field inserted by the compiler

    float side;

    float calculateArea() { return side * side; }
};

This means that the first part of any derived type will match the base class:

+---------------------------------+ Square
|+-------+ Shape                  |
||       |                        |
||       |                        |
||       |                        |
|+-------+                        |
+---------------------------------+
 ^
 Address of the Square

This implies that any pointer to a Square is also a valid pointer to Shape at the same time, therefore casting Square* to Shape* (which is called an "upcast") is a safe operation. The C++ compiler performs this cast implicitly for us, which allows for seamless reuse of code defined on base types for its derived types.

The inverse operation of casting base type pointers to derived type pointers (a "downcast") is generally unsafe. Without extra information, it's not possible to determine whether the base type pointer originally came from an upcast of that same derived type.


struct Shape {

    Color color;

    void  setColor(Color color_) { color = color_; }
};

struct Square : Shape {

    float side;

    float calculateArea() { return side * side; }
};

struct Triangle : Shape {

    float base;
    float height;

    float calculateArea() { return base * height / 2; }
};

int main()
{
    Square   *square   = createSquare();
    Triangle *triangle = createTriangle();

    Shape *shapeA = square;   // upcast
    Shape *shapeB = triangle; // upcast

    Square *downcastA = (Square*) shapeA; // downcast (OK)
    Square *downcastB = (Square*) shapeB; // downcast (ERROR!)
}

Ignoring how the shapeA and shapeB pointers were obtained, the last two lines are identical from the compiler's perspective, which is why it's not possible to detect downcast errors without extra information.

Inheritance in C

Inheritance really is just a glorified version of composition, which means it's quite easy to replicate in C:

typedef struct {
    Color color;
} Shape;

typedef struct {
    Shape base;
    float side;
} Square;

void draw(Shape *shape);

We just need to explicitly add the base type's fields at the start of the derived type. If we do so, all the pointer cast considerations we made for C++ apply. The only difference is that the compiler has no notion of upcasts, so we'll need to make them explicit:

int main(void)
{
    Square *square = createSquare();
    draw((Shape*) square); // explicit upcast
}

Type methods can be written as regular functions that take the object's pointer as their first argument:

void Shape_setColor(Shape *shape, Color color)
{
    shape->color = color;
}

float Square_calculateArea(Square *square)
{
    return square->side * square->side;
}

Polymorphism in C++

The inheritance technique we just saw is completely resolved at compile-time. Any time a method is called on an object, the compiler knows exactly which method will be executed based on that object pointer's type. This is obvious from the C code but may be less clear in C++. Consider the following code:

struct Animal {
    void talk() { printf("yayaya\n"); }
};

struct Cat : Animal {
    void talk() { printf("meow\n"); }
};

int main()
{
    // Create a cat object
    Cat cat;

    // Convert it to an animal object
    Animal *animal = &cat;

    // Run talk() from Animal*
    animal->talk();
}

What does this program print? Since inheritance is resolved at compile-time, the talk() call is resolved based on the static type of the pointer, which is Animal*, so the program prints "yayaya".

Polymorphism allows us to change this behavior. If Animal were polymorphic, the talk() method would be selected not by the type of the pointer, but by the original type of the object it points to.

struct Animal {
    virtual void talk() { printf("yayaya\n"); }
};

struct Cat : Animal {
    virtual void talk() { printf("meow\n"); }
};

int main()
{
    // Create a cat object
    Cat cat;

    // Convert it to an animal object
    Animal *animal = &cat;

    // Run talk() from Animal*
    animal->talk();
}

If we mark the methods as virtual, each Animal object will internally hold some information about its original type, which allows the program to call the right method regardless of the type of the pointer used to reach it. This version prints "meow".

But how does this work under the hood? Let's replicate this behavior using C.

Polymorphism in C

The way we implement this is by adding to the base type a set of function pointers, one per method. These pointers refer to the implementation that should be used when calling a method on the base type. Derived types inherit the function pointers alongside the other base type fields and, upon initialization, set them to their own implementations. When a derived object is upcast, the base type object still holds the pointers to the derived type's implementations.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    void (*talk)(void);
} Animal;

// Calls whichever implementation the object was initialized with
void talk(Animal *animal)
{
    animal->talk();
}

////////////////////////////////////////

typedef struct {
    Animal base;
} Cat;

void catTalk(void)
{
    printf("Meow\n");
}

Animal *makeCat(void)
{
    Cat *cat = malloc(sizeof(Cat));
    if (cat == NULL) abort();
    cat->base.talk = catTalk;
    return (Animal*) cat; // upcast
}

////////////////////////////////////////

typedef struct {
    Animal base;
} Dog;

void dogTalk(void)
{
    printf("Bau\n");
}

Animal *makeDog(void)
{
    Dog *dog = malloc(sizeof(Dog));
    if (dog == NULL) abort();
    dog->base.talk = dogTalk;
    return (Animal*) dog; // upcast
}

////////////////////////////////////////

int main(void)
{
    Animal *a = makeCat();
    Animal *b = makeDog();

    talk(a); // Meow
    talk(b); // Bau

    free(a);
    free(b);
}

Even though both pointers have the Animal* type, the talk operation behaves according to the original type. This matches quite closely how C++ does things. One optimization C++ implementations typically add is the "vtable". Objects of the same type always have their function pointers set to the same implementations, so it makes sense to keep one global table of implementations per derived type and store a single pointer to that table in the object's header.

Final Note

If you are anything like me, you will take this newfound knowledge and make every struct in your program polymorphic. That is all fun and good, but it does introduce quite a bit of complexity. In my experience, the benefits are rarely worth the cost. I found this video on the topic quite fascinating!

If you enjoy this type of discussion, consider hopping into my Discord server. We talk about this sort of stuff all the time.

Thanks for reading :)