Making the transition from C to C++

The transition from C to C++ has been less rapid than expected a decade ago, but the language is now beginning to gain ground. As a result, many embedded software teams are considering the matter of migration from C to C++.

Mainly C

In the early days of embedded system programming, everything was coded in assembly language. Then, as processors became more powerful and software increasingly sophisticated, high-level languages became a popular option. For some years, it was unclear which language would become prevalent. Pascal, C, PL/M and Ada were all contenders.

Eventually, C won out (with Ada hanging on in defense-oriented applications), as it offered most of the benefits of a high-level language, with very low overheads and the ability to perform quite low-level operations [like direct access to memory locations] without resorting to assembly language.

In the 1990s, there was a surge of interest in object-oriented programming (OOP), and a number of languages were developed that were based on C but had OOP capabilities. Of these, the one that began to take hold in embedded development was C++. Around 1995, a user survey suggested that by the end of the century half of embedded programming would be carried out using this language, but that did not prove to be the case.

Many early adopters of C++ for embedded were disappointed with the results. In some cases, projects went disastrously wrong, with enormous shortfalls in performance and/or excessive amounts of memory being required.

There were two reasons for such problems. First, many of the early compilers and linkers were not very good, having been adapted from desktop toolchains. Second, programmers commonly “forgot” that they were programming embedded applications and took the desktop programming attitude that resources were limitless, which was clearly not the case.

Now, a decade later, things have progressed. The tools are much better and many engineers have a clearer focus on the priorities of embedded application programming. The challenge now is to make the transition from C to C++. There is a body of C code that may need to be ported and numerous software engineers who want to become proficient and confident with programming C++ for embedded.

The Transition

There are numerous strategies that may be employed to migrate code and transition expertise to C++. Some purists suggest that the worst person to learn C++ is a C programmer, as they will have bad habits and will not subscribe to the ideals of object-oriented programming. This is not very logical, as one of the greatest benefits of C++ is its C language ancestry.

The process described here comprises three phases, which may be deployed sequentially or could be used separately or in parallel as the basis for C to C++ migration in a particular development environment.

The three phases:

1) Applying reusability. Write new code in C++ and link with existing C code.

2) Develop Clean C. Modify existing C code to be acceptable to a C++ compiler.

3) Use C+. Start using C++ language features to improve programming style, initially stopping short of using OOP features.

Applying Reusability

An obvious approach to starting out with using C++ is to simply write all new modules in that language and link with existing C code modules. This might mean that C++ modules are called from C or vice versa. In either case, there is a problem: C++ typesafe linkage will cause linker errors unless handled correctly.

Typesafe linkage is a facility in C++ which enables function overloading and reduces programming errors. Every function name, whether it appears in a declaration, a definition or a call, is “mangled”: additional characters are added to encode the number and types of the parameters. So, when a function is overloaded:

int sum(int, int);

is given a different name from:

int sum(int, int, int);

If a function is not overloaded, the name mangling process enables the linker (which is not necessarily “C++ aware”) to detect mismatched function declarations and calls, as the mangled function names are different.

The problem is that C function names are not mangled. To address this, it is necessary to turn off typesafe linkage for the functions concerned: in the declaration, for C functions called from C++, or in the definition, for C++ functions called from C. This is achieved using the extern "C" construct, thus:

extern "C" int sum(int, int);

The name of the function sum() is not mangled.

A group of functions may be declared in a portable way like this:

#ifdef __cplusplus
extern "C"
{
#endif

extern void alpha(int a);
extern int beta(char *p, int b);

#ifdef __cplusplus
}
#endif
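
The definition side of this arrangement, a C++ function that is called from C, might be sketched as follows. This is a minimal illustration rather than a recommended interface; the wrapper names sum2 and sum3 are hypothetical and chosen only for this example.

```cpp
// Two overloads: legal in C++ because each receives a distinct mangled name.
int sum(int a, int b)
{
    return a + b;
}

int sum(int a, int b, int c)
{
    return a + b + c;
}

// C has no overloading, so each entry point offered to C code needs its own
// unmangled name. Placing extern "C" on the definition turns mangling off.
// (sum2 and sum3 are hypothetical names used only in this sketch.)
extern "C" int sum2(int a, int b)
{
    return sum(a, b);
}

extern "C" int sum3(int a, int b, int c)
{
    return sum(a, b, c);
}
```

A C module would declare these as int sum2(int, int); and int sum3(int, int, int); and call them like any other C function.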

Clean C

Another stage in moving to C++ is to take full advantage of C being almost a subset of C++ and use the C++ toolkit to build the entire project, both C and C++ code. The challenge with this approach is the word “almost”, as there are many exceptions, where acceptable C code would be rejected by a C++ compiler. However, these may be overcome and the process has two spin-off benefits:

1) The resulting code is better – clearer and/or more secure.

2) Using the C++ compiler enables the identification of many potential bugs, as it is “fussier” and supports type-safe linkage.

C code that has been “cleaned up” to make it acceptable to a C++ compiler is commonly termed “Clean C”. Here are some examples of the required measures:

In C, function prototypes are very advisable, but optional. In C++, and, hence, Clean C, they are mandatory.
Enumerated types [enum] are not taken “seriously” in C – they are used like an alternative to #define. In C++, they are a true data type and must be respected accordingly. Inappropriate assignment etc. results in compiler errors.
In both C and C++, it is a convention to terminate a string [array of char] with a null character ['\0']. However, in C, this is not compulsory and non-terminated strings may be initialized, which makes sense if the length is fixed or defined elsewhere and saving the odd byte of memory is desirable. This may be achieved like this:

char str[3] = "xyz";

However, this is illegal in C++ and one of the following constructs must be used:

char str[4] = "xyz";

char str[] = "xyz";

If a non-terminated string is really essential, the following syntax is legal:

char str[] = {'x', 'y', 'z'};
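
The difference in storage between the two forms, and one way of working with the non-terminated form, can be sketched like this. The names Terminated, Unterminated and TagMatches are hypothetical; the point is only that a non-terminated array must always be handled with an explicit length.

```cpp
#include <cstring>

char Terminated[] = "xyz";              // 4 bytes: 'x', 'y', 'z', '\0'
char Unterminated[] = {'x', 'y', 'z'};  // 3 bytes, no terminator

// Compare the fixed-length tag against the first 3 bytes of key.
// memcmp takes an explicit length, so no terminator is needed;
// key must point to at least sizeof(Unterminated) readable bytes.
bool TagMatches(const char *key)
{
    return std::memcmp(Unterminated, key, sizeof(Unterminated)) == 0;
}
```

Functions that rely on a terminator, such as strlen() or strcpy(), must not be applied to the non-terminated array.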

In both languages, it is possible to define one structure inside another. In C, the scope of the inner structure extends beyond the outer one, so code like this is legal:

struct out
{
    struct in { int i; } m;
    int j;
};

struct in inner;
struct out outer;

In C++, struct in is only in scope within struct out. So, to achieve the same result, the code would need to be modified thus:

struct in { int i; };

struct out
{
    struct in m;
    int j;
};

struct in inner;
struct out outer;

A variable that is external to functions may be defined more than once in a C module, for example by repeating int count; at file scope, without a problem (unless, presumably, a later definition is inconsistent with the first). This is not legal in C++, which allows only one definition. For hand-written code, this is unlikely to be an issue. However, tools that generate code automatically may generate multiple variable definitions, which means that the resulting code is not Clean C.
Both languages have a number of keywords, which are reserved for their specific usage only. As C++ has some which are additional to those in C, there is the potential for a clash with existing identifiers. This may be eliminated by scanning the code for C++ keywords, but a more flexible approach is to adopt an identifier naming convention, such as initial/embedded capital letters. So, instead of temp and flowrate, there would be Temp and FlowRate.
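
Several of the measures above that have no accompanying code can be sketched together. The fragment below is written to be acceptable to both a C and a C++ compiler, which is why it uses a C-style cast and the enum Mode spelling; all of the identifiers are hypothetical and follow the capitalised naming convention just described.

```cpp
/* Prototype before first use: optional in older C, required in C++. */
int ScaleReading(int raw);

/* Treated as a real type: an int is not silently accepted as a Mode. */
enum Mode { ModeIdle, ModeRun, ModeFault };

/* Declarations with extern may be repeated; there is exactly one definition. */
extern int FlowRate;
extern int FlowRate;
int FlowRate = 0;

/* Calls ScaleReading() ahead of its definition, relying on the prototype. */
int ReadFlow(int raw)
{
    FlowRate = ScaleReading(raw);
    return FlowRate;
}

/* "New" cannot clash with the C++ keyword "new". */
int ScaleReading(int raw)
{
    int New = raw * 2;
    return New;
}

enum Mode NextMode(enum Mode current)
{
    if (current == ModeFault)
        return ModeIdle;
    /* current + 1 has type int, so the conversion back is made explicit. */
    return (enum Mode)(current + 1);
}
```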

Going halfway with C+

Once all the code is in a state that it can be processed using C++ tools, there is the opportunity to start using C++ language features. Because C++ is based upon C, its functionality may be learned and applied incrementally. Initially, some of the non-OOP features that make C++ a “better C” can be used. This language is sometimes termed “C+”.

Some C++ features that may be applied incrementally:

In C, pointers are a leading cause of programming errors. They are a powerful facility and are, of course, supported by C++. However, an alternative way to give a called function access to the variables of a calling function is available: reference parameters. In C, you might write a function like this:

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

And a call would look like this: swap(&x, &y);

In C++ it may be coded thus:

void swap(int& a, int& b)
{
    int temp = a;
    a = b;
    b = temp;
}

This code is clearer because none of the pointer dereferences are required. A downside of this facility is that reference parameters are not obvious at the call site. On seeing the code:

swap(x, y);

it would not be clear that the values of x and y will be affected by the call.
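
One possible way to soften this drawback, offered here as a suggestion rather than an established rule, is to declare read-only reference parameters const and to give functions that modify their arguments a name that says so. The function names below are hypothetical.

```cpp
// Read-only access: a const reference cannot be used to change the argument.
int larger_of(const int& a, const int& b)
{
    return (a > b) ? a : b;
}

// Read-write access: the name warns the reader that the arguments change.
void swap_in_place(int& a, int& b)
{
    int temp = a;
    a = b;
    b = temp;
}
```

With this convention, a reader who sees larger_of(x, y) can check the prototype once and know that x and y are left untouched.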

C++ offers greater flexibility in the location of variable definitions. This enables them to be defined closer to the point of [first] use, thus making the code more readable. A good example is a for loop parameter:

for (int i=0; i<3; i++)

…
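
A slightly fuller sketch of the same idea follows; the function name sum_scaled and its parameters are hypothetical. Each variable is defined at the point where it is first needed and goes out of scope as soon as it is no longer useful.

```cpp
int sum_scaled(const int *values, int count)
{
    int total = 0;

    for (int i = 0; i < count; i++)
    {
        int scaled = values[i] * 2;  // defined at the point of first use
        total += scaled;
    }

    // Neither i nor scaled is in scope here, so they cannot be misused.
    return total;
}
```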

In C++, a class can have a constructor and/or a destructor – functions that are automatically executed when an instance of the class is created and destroyed respectively. It is not well known that a structure [struct] is identical in almost every respect to a class and, hence, may have a constructor/destructor pair to handle resource allocation/deallocation, which is otherwise a common cause of memory leaks.

The only difference between a structure and a class is the default accessibility of member variables and functions [and of base classes]. In a class, members are private by default and are only made public by means of a public: label. In a structure, the default is public and a private: label is needed to hide structure members.
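
A minimal sketch of a structure with a constructor/destructor pair is shown below. The counter g_lock_depth merely stands in for a real resource such as a lock or an interrupt mask, and all of the names are hypothetical. The private: label also illustrates the point about default accessibility.

```cpp
static int g_lock_depth = 0;  // stands in for a real resource in this sketch

struct ScopedLock
{
    // Members are public by default in a struct.
    ScopedLock()  { ++g_lock_depth; }  // acquire on creation
    ~ScopedLock() { --g_lock_depth; }  // release on destruction

private:
    // Declared but not defined, so that instances cannot be copied.
    ScopedLock(const ScopedLock&);
    ScopedLock& operator=(const ScopedLock&);
};

int depth_inside_scope()
{
    ScopedLock guard;     // constructor runs here
    return g_lock_depth;  // destructor runs as the function returns
}
```

Because the release happens in the destructor, it cannot be forgotten on an early return path.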

Although the exception handling system [EHS] in C++ uses an object to communicate the kind of exception being processed, its use is not strictly tied to object oriented programming. So the EHS may be useful for handling error conditions in deeply nested code structures in C+, as a much more maintainable alternative to using goto.
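
The following sketch shows an error being reported from a nested call without goto or a chain of return-code checks. The structure SensorError and the functions are hypothetical, and the channel range of 0 to 3 is an assumption made purely for illustration.

```cpp
struct SensorError
{
    int code;
};

int read_channel(int channel)
{
    if (channel < 0 || channel > 3)
    {
        SensorError e = { channel };
        throw e;  // unwinds through every caller up to the handler
    }
    return channel * 10;
}

// No error handling is needed at this intermediate level.
int average_pair(int first, int second)
{
    return (read_channel(first) + read_channel(second)) / 2;
}

int sample(int first, int second)
{
    try
    {
        return average_pair(first, second);
    }
    catch (const SensorError&)
    {
        return -1;  // a single place to deal with the failure
    }
}
```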

The EHS carries an overhead to accommodate its flexibility. This overhead may be reduced in one of two ways, if a relatively simple exception handling scheme is required [as is the case with most embedded applications]:

1) Use a generic catch block, using the notation catch (...) to trap all exceptions.

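2) Do not have a catch block at all, but replace the handler that is called by the library function terminate(), which is invoked automatically when an exception is not caught. The standard library provides set_terminate() for this purpose.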

Either of these approaches results in less code being generated.
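
Both approaches can be sketched briefly. The function names are hypothetical, and the call to abort() is only a placeholder for whatever a real system would do at that point, such as forcing a reset; how much code is actually saved depends on the toolchain.

```cpp
#include <cstdlib>
#include <exception>

void good_task()
{
}

void bad_task()
{
    throw 42;
}

// Approach 1: a single generic handler traps every exception type.
int run_guarded(void (*task)())
{
    try
    {
        task();
        return 0;
    }
    catch (...)
    {
        return -1;
    }
}

// Approach 2: no catch block; an uncaught exception ends up in this handler.
// A terminate handler must not return to its caller.
void on_unhandled_exception()
{
    std::abort();  // placeholder for a system-specific response
}

void install_terminate_handler()
{
    std::set_terminate(on_unhandled_exception);
}
```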

It should be noted that many compilers have EHS enabled by default. This means that extra code is always generated just in case the program makes use of EHS. If EHS is not in use, it is essential to ensure that the capability is deactivated in the compiler.

Conclusions

C to C++ migration is a common requirement, as the C++ language becomes more popular. The process may be approached in various ways, which facilitate the porting of code and the incremental development of C++ expertise.
