Wednesday, April 27, 2011

C++ Programming Style Guidelines



  • Use a source code style that makes the code readable and consistent. Unless you have a group code style or a style of your own, you could use a style similar to the Kernighan and Ritchie style used by a vast majority of C programmers. Taken to an extreme, however, it's possible to end up with something like this:
    int i;main(){for(;i["]<i;++i){--i;}"];read('-'-'-',i+++"hell\ 
     o, world!\n",'/'/'/'));}read(j,i,p){write(j/p+p,i---j,i/i);

    --Dishonorable mention, Obfuscated C Code Contest, 1984. Author requested anonymity.
  • It is common to see the main routine defined as main(). The ANSI way of writing this is int main(void) (if there is no interest in the command-line arguments) or int main(int argc, char **argv). Pre-ANSI compilers would omit the void declaration, or list the variable names and follow with their declarations.
  • Whitespace
    Use vertical and horizontal whitespace generously. Indentation and spacing should reflect the block structure of the code.
    A long string of conditional operators should be split onto separate lines. For example:
    if (foo->next==NULL && number < limit      && limit <=SIZE
             && node_active(this_input)) {...

    might be better as:
    if (foo->next == NULL
        && number < limit && limit <= SIZE
        && node_active(this_input))
    {
        ...

    Similarly, elaborate for loops should be split onto different lines:
    for (curr = *varp, trail = varp;
         curr != NULL;
         trail = &(curr->next), curr = curr->next)
    {
        ...

    Other complex expressions, such as those using the ternary ?: operator, are best split onto several lines, too.
    z = (x == y)
        ? n + f(x)
        : f(y) - n;
                        
  • Comments
    The comments should describe what is happening, how it is being done, what parameters mean, which globals are used and any restrictions or bugs. However, avoid unnecessary comments. If the code is clear, and uses good variable names, it should be able to explain itself well. Since comments are not checked by the compiler, there is no guarantee they are right. Comments that disagree with the code are of negative value. Too many comments clutter code.
    Here is a superfluous comment style:
    i=i+1;        /* Add one to i */
    

    It's pretty clear that the variable i is being incremented by one. And there are worse ways to do it:
    /************************************
     *                                  *
     *          Add one to i            *
     *                                  *
     ************************************/

    i=i+1;
  • Naming Conventions
    Names with leading and trailing underscores are reserved for system purposes and should not be used for any user-created names. Convention dictates that:
    1. #define constants should be in all CAPS.
    2. enum constants are Capitalized or in all CAPS.
    3. Function, typedef, and variable names, as well as struct, union, and enum tag names should be in lower case.
    For clarity, avoid names that differ only in case, like foo and Foo. Similarly, avoid foobar and foo_bar. Avoid names that look like each other. On many terminals and printers, 'l', '1' and 'I' look quite similar. A variable named 'l' is particularly bad because it looks so much like the constant '1'.
  • Variable names
    When choosing a variable name, length is not important but clarity of expression is. A long name can be used for a global variable which is rarely used but an array index used on every line of a loop need not be named any more elaborately than i. Using 'index' or 'elementnumber' instead is not only more to type but also can obscure the details of the computation. With long variable names sometimes it is harder to see what is going on. Consider:
    for (i = 0; i < 100; i++)
        array[i] = 0;

    versus
    for (elementnumber = 0; elementnumber < 100; elementnumber++)
        array[elementnumber] = 0;
    
  • Function names
    Function names should reflect what they do and what they return. Functions are used in expressions, often in an if clause, so they need to read appropriately. For example:
    if (checksize(x))

    is unhelpful because it does not tell us whether checksize returns true on error or non-error; instead:
    if (validsize(x))
    

    makes the point clear.
  • Declarations
    All external data declarations should be preceded by the extern keyword.
    The "pointer" qualifier, '*', should be with the variable name rather than with the type.
    char        *s, *t, *u;
    

    instead of
    char*   s, t, u;

    The latter statement is not wrong, but is probably not what is desired since 't' and 'u' do not get declared as pointers.
  • Header Files
    Header files should be functionally organized, that is, declarations for separate subsystems should be in separate header files. Also, declarations that are likely to change when code is ported from one platform to another should be in a separate header file.
    Avoid private header filenames that are the same as library header filenames. The statement #include "math.h" includes the standard library math header file if the intended one is not found in the current directory. If this is what you want to happen, comment this fact.
    Finally, using absolute pathnames for header files is not a good idea. The "include-path" option of the C compiler (-I (capital "eye") on many systems) is the preferred method for handling extensive private libraries of header files; it permits reorganizing the directory structure without having to alter source files.
  • scanf
    scanf should never be used in serious applications. Its error detection is inadequate. Look at the example below:
    #include <stdio.h>

    int main(void)
    {
        int i;
        float f;

        printf("Enter an integer and a float: ");
        scanf("%d %f", &i, &f);

        printf("I read %d and %f\n", i, f);
        return 0;
    }

    Test run
    Enter an integer and a float: 182 52.38
    I read 182 and 52.380001
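    Another test run
    Enter an integer and a float: 6713247896 4.4
    I read -1876686696 and 4.400000
    In the second run the first value does not fit in an int, yet scanf reports no error and the program prints a meaningless number.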
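    One common alternative is to read a whole line (for example with fgets) and convert it with strtol, which does report overflow. The sketch below shows only the conversion step; parse_int is a hypothetical helper name, not something from the original text, and the policy of rejecting trailing junk is an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal int from text.
   Returns 1 and stores the value in *out on success.
   Returns 0 if the text is not a number, is followed by junk,
   or does not fit in an int. */
int parse_int(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text)
        return 0;                       /* no digits at all */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                       /* out of range for int */
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;                          /* allow trailing whitespace */
    if (*end != '\0')
        return 0;                       /* trailing junk */

    *out = (int) value;
    return 1;
}
```

    A caller would typically fill a buffer with fgets(buf, sizeof buf, stdin) and pass buf to parse_int, treating a 0 return as bad input.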
  • ++ and --
    When the increment or decrement operator is used on a variable in a statement, that variable should not appear more than once in the statement, because the order of evaluation is not fixed by the language and varies between compilers. Do not write code that assumes an order, or that functions as desired on one machine but does not have a clearly defined behavior:
    int i = 0, a[5];
    
     a[i] = i++; /* assign to  a[0]?  or  a[1]? */
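    A minimal sketch of the unambiguous form: use the variable in one statement and increment it in the next. fill_with_index is a hypothetical name chosen for illustration.

```c
/* Hypothetical helper: stores 0..n-1 into a[0..n-1].
   Each statement has at most one side effect on i. */
void fill_with_index(int a[], int n)
{
    int i = 0;

    while (i < n) {
        a[i] = i;   /* use i ... */
        i++;        /* ... then increment it in its own statement */
    }
}
```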
  • Don't let yourself believe you see what isn't there.
    Look at the following example:
    while (c == '\t' || c = ' ' || c == '\n')
       c = getc(f);
    

    The statement in the while clause appears at first glance to be valid C. The use of the assignment operator, rather than the comparison operator, results in syntactically incorrect code. The precedence of = is lower than that of == and ||, so it would have to be interpreted this way (parentheses added for clarity):
    while ((c == '\t' || c) = (' ' || c == '\n'))
       c = getc(f);

    The clause on the left side of the assignment operator is:
    (c == '\t' || c)
    

    which does not result in an lvalue. If c contains the tab character, the result is "true" and no further evaluation is performed, and "true" cannot stand on the left-hand side of an assignment.
  • Be clear in your intentions. When you write one thing that could be interpreted as something else, use parentheses or other methods to make sure your intent is clear. This helps you understand what you meant if you ever have to deal with the program at a later date. And it makes things easier if someone else has to maintain the code.
    It is sometimes possible to code in a way that anticipates likely mistakes. For example, you can put constants on the left of equality comparisons. That is, instead of writing:
    while (c == '\t' || c == ' ' || c == '\n')
      c = getc(f);

    You can say:
    while ('\t' == c || ' ' == c || '\n' == c)
       c = getc(f);

    This way, if you accidentally type = instead of ==, you will get a compiler diagnostic:
    while ('\t' = c || ' ' == c || '\n' == c)
       c = getc(f);

    This style lets the compiler find problems; the above statement is invalid because it tries to assign a value to '\t'.
  • Trouble from unexpected corners.
    C implementations generally differ in some aspects from each other. It helps to stick to the parts of the language that are likely to be common to all implementations. By doing that, it will be easier to port your program to a new machine or compiler and less likely that you will run into compiler idiosyncrasies. For example, consider the string:
    /*/*/2*/**/1

    This takes advantage of the "maximal munch" rule. If comments nest, it is interpreted this way:
    /*  /*  /2  */  *  */  1

    The two /* symbols match the two */ symbols, so the value of this is 1. If comments do not nest, on some systems, a /* in a comment is ignored. On others a warning is flagged for /*. In either case, the expression is interpreted this way:
    /*  /  */  2  *  /*  */  1

    2 * 1 evaluates to 2.
  • Flushing Output Buffer
    When an application terminates abnormally, the tail end of its output is often lost. The application may not have the opportunity to completely flush its output buffers. Part of the output may still be sitting in memory somewhere and is never written out. On some systems, this output could be several pages long.
    Losing output this way can be misleading because it may give the impression that the program failed much earlier than it actually did. The way to address this problem is to force the output to be unbuffered, especially when debugging. The exact incantation for this varies from system to system but usually looks something like this:
    setbuf(stdout, (char *) 0);

    This must be executed before anything is written to stdout. Ideally this could be the first statement in the main program.
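    The same effect can be requested with setvbuf, the more general standard library call. This is only a sketch; make_unbuffered is a hypothetical wrapper name, and like the setbuf call above it has to run before anything is written to the stream.

```c
#include <stdio.h>

/* Hypothetical helper: switch a stream to unbuffered mode.
   Call it before any other operation on the stream.
   Returns 0 on success and nonzero on failure, as setvbuf does. */
int make_unbuffered(FILE *stream)
{
    return setvbuf(stream, NULL, _IONBF, 0);
}
```

    Calling make_unbuffered(stdout) as the first statement of main matches the advice above.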
  • getchar() - macro or function
    The following program copies its input to its output:
    #include <stdio.h>

    int main(void)
    {
        register int a;

        while ((a = getchar()) != EOF)
            putchar(a);
        return 0;
    }
    

    Removing the #include statement from the program would cause it to fail to compile because EOF would then be undefined.
    We can rewrite the program in the following way:
    #define EOF -1

    int main(void)
    {
        register int a;

        while ((a = getchar()) != EOF)
            putchar(a);
        return 0;
    }
    

    This will work on many systems but on some it will run much more slowly.
    Because a function call carries some overhead, getchar is often implemented as a macro. This macro is defined in stdio.h, so when #include <stdio.h> is removed, the compiler does not know what getchar is. On some systems it assumes that getchar is a function that returns an int.
    In reality, many C implementations have a getchar function in their libraries, partly to safeguard against such lapses. Thus in situations where #include <stdio.h> is missing, the compiler uses the function version of getchar. The overhead of the function call makes the program slower. The same argument applies to putchar.
  • Null pointer
    A null pointer does not point to any object. Thus it is illegal to use a null pointer for any purpose other than assignment and comparison.
    Never redefine the NULL symbol. The NULL symbol should always have a constant value of zero. A null pointer of any given type will always compare equal to the constant zero, whereas comparison with a variable with value zero or to some non-zero constant has implementation-defined behaviour.
    Dereferencing a null pointer may cause strange things to happen.
  • What does a+++++b mean?
    The only meaningful way to parse this is:
    a ++  +  ++  b
    

    However, the maximal munch rule requires it to be broken down as:
    a ++  ++  +  b
    

    This is syntactically invalid: it is equivalent to:
    ((a++)++) +  b
    

    But the result of a++ is not an lvalue and hence is not acceptable as an operand of ++. Thus the rules for resolving lexical ambiguity make it impossible to resolve this example in a way that is syntactically meaningful. In practice, of course, the prudent thing to do is to avoid constructions like this unless you are absolutely certain what they mean. Of course, adding whitespace helps the compiler to understand the intent of the statement, but it is preferable (from a code maintenance perspective) to split this construct into more than one line:
    ++b;
     (a++) + b;
    
  • Treat functions with care
    Functions are the most general structuring concept in C. They should be used to implement "top-down" problem solving - namely breaking up a problem into smaller and smaller subproblems until each piece is readily expressed in code. This aids modularity and documentation of programs. Moreover, programs composed of many small functions are easier to debug.
    Cast all function arguments to the expected type if they are not of that type already, even when you are convinced that this is unnecessary since they may hurt you when you least expect it. In other words, the compiler will often promote and convert data types to conform to the declaration of the function parameters. But doing so manually in the code clearly explains the intent of the programmer, and may ensure correct results if the code is ever ported to another platform.
    If the header files fail to declare the return types of the library functions, declare them yourself. Surround your declarations with #ifdef/#endif statements in case the code is ever ported to another platform.
    Function prototypes should be used to make code more robust and to make it run faster.
  • Dangling else
    Stay away from the "dangling else" problem unless you know what you're doing:
    if (a == 1)
       if (b == 2)
        printf("***\n");
       else
        printf("###\n");
    
    

    The rule is that an else attaches to the nearest if. When in doubt, or if there is a potential for ambiguity, add curly braces to illuminate the block structure of the code.
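    A minimal sketch of the braced form, written here so that the else clearly belongs to the outer if. The function name pick and the idea of returning the message instead of printing it are assumptions made to keep the example easy to check.

```c
/* Hypothetical helper: returns the message the braced code selects. */
const char *pick(int a, int b)
{
    if (a == 1) {
        if (b == 2) {
            return "***";
        }
    } else {
        return "###";   /* the braces tie this else to the outer if */
    }
    return "";          /* a == 1 but b != 2: nothing to print */
}
```

    Without the braces, the else would pair with the inner if, and pick(1, 3) would produce "###" instead of the empty string.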
  • Array bounds
    Check the array bounds of all arrays, including strings, since where you type "fubar" today someone someday may type "floccinaucinihilipilification". Robust production software should not use gets().
    The fact that C subscripts start from zero makes all kinds of counting problems easier. However, it requires some effort to learn to handle them.
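    As a sketch of what checking string bounds can look like, the hypothetical helper below copies a string into a fixed-size buffer and reports truncation instead of overrunning it. The name copy_bounded and its return convention are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dest without writing more than
   size bytes. The result is always null-terminated when size > 0.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dest, size_t size, const char *src)
{
    size_t len = strlen(src);

    if (size == 0)
        return 0;                       /* no room even for the terminator */
    if (len >= size) {
        memcpy(dest, src, size - 1);    /* keep only what fits */
        dest[size - 1] = '\0';
        return 0;
    }
    memcpy(dest, src, len + 1);         /* includes the terminator */
    return 1;
}
```

    For reading input, fgets(buf, sizeof buf, stdin) gives the same kind of bound that gets() lacks.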
  • Null statement
    The null body of a for or while loop should be alone on a line and commented so that it is clear that the null body is intentional and not missing code.
    while (*dest++ = *src++)
         ;   /* VOID */
    
  • Test for true or false
    Do not default the test for non-zero, that is:
    if (f() != FAIL)
    

    is better than
    if (f())
    

    even though FAIL may have the value 0 which C considers to be false. (Of course, balance this against constructs such as the one shown above in the "Function Names" section.) An explicit test will help you out later when somebody decides that a failure return should be -1 instead of 0.
    A frequent trouble spot is using the strcmp function to test for string equality, where the result should never be defaulted. The preferred approach is to define a macro STREQ:
    #define STREQ(str1, str2) (strcmp((str1), (str2)) == 0)
    

    Using this, a statement such as:
    if (STREQ(inputstring, somestring)) ...
    

    carries with it an implied behavior that is unlikely to change under the covers (folks tend not to rewrite and redefine standard library functions like strcmp()).
    Do not check a boolean value for equality with 1 (TRUE, YES, etc.); instead test for inequality with 0 (FALSE, NO, etc.). Most functions are guaranteed to return 0 if false, but only non-zero if true. Thus,
    if (func() == TRUE) {...

    is better written
    if (func() != FALSE)
  • Embedded statement
    There is a time and a place for embedded assignment statements. In some constructs there is no better way to accomplish the results without resulting in bulkier and less readable code:
    while ((c = getchar()) != EOF) {
        /* process the character */
    }

    Using embedded assignment statements to improve run-time performance is possible. However, you should consider the tradeoff between increased speed and decreased maintainability that results when embedded assignments are used in artificial places. For example:
    x = y + z;
     d = x + r;

    should not be replaced by:
    d = (x = y + z) + r;

    even though the latter may save one cycle. In the long run the time difference between the two will decrease as the optimizer is enhanced, while the difference in ease of maintenance will increase.
  • goto statements
    goto should be used sparingly. The one place where it can be usefully employed is to break out of several levels of switch, for, and while nesting, although the need to do such a thing may indicate that the inner constructs should be broken out into a separate function.
    for (...) {
        while (...) {
            ...
            if (wrong)
                goto error;
        }
    }
    ...
    error:
        /* print a message */
    

    When a goto is necessary the accompanying label should be alone on a line and either tabbed one stop to the left of the code that follows, or set at the beginning of the line. Both the goto statement and its target should be commented to explain their utility and purpose.
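    A small compilable sketch of the nested-loop case follows. The function find_negative, the fixed row width of 3, and the label name found are all hypothetical choices made for illustration.

```c
/* Hypothetical helper: look for the first negative entry in a grid
   with 3 columns. Returns 1 and stores its position on success,
   0 if there is no negative entry. */
int find_negative(int grid[][3], int rows, int *row_out, int *col_out)
{
    int r, c;

    for (r = 0; r < rows; r++) {
        for (c = 0; c < 3; c++) {
            if (grid[r][c] < 0)
                goto found;     /* leave both loops at once */
        }
    }
    return 0;

found:                          /* target of the goto in the inner loop */
    *row_out = r;
    *col_out = c;
    return 1;
}
```

    As the text notes, the same result could be had by moving the inner loop into its own function and returning from it.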
  • Fall-through in switch
    When a block of code has several labels, place the labels on separate lines. This style agrees with the use of vertical whitespace, and makes rearranging the case options a simple task, should that be required. The fall-through feature of the C switch statement must be commented for future maintenance. If you've ever been "bitten" by this feature, you'll appreciate its importance!
    switch (expr) {
     case ABC: 
     case DEF:
          statement;
          break;
     case UVW:
          statement; /*FALLTHROUGH*/ 
     case XYZ:
          statement;
          break; 
     }
    

    While the last break is technically unnecessary, the consistency of its use prevents a fall-through error if another case is later added after the last one. The default case, if used, should always be last and does not require a final break statement if it is last.
  • Constants
    Symbolic constants make code easier to read. Numerical constants should generally be avoided; use the #define directive of the C preprocessor to give constants meaningful names. Defining the value in one place (preferably a header file) also makes it easier to administer large programs since the constant value can be changed uniformly by changing only the define. Consider using the enumeration data type as an improved way to declare variables that take on only a discrete set of values. Using enumerations also lets the compiler warn you of any misuse of an enumerated type. At the very least, any directly-coded numerical constant must have a comment explaining the derivation of the value.
    Constants should be defined consistently with their use; e.g. use 540.0 for a float instead of 540 with an implicit float cast. That said, there are some cases where the constants 0 and 1 may appear as themselves instead of as defines. For example if a for loop indexes through an array, then:
    for (i = 0; i < arraysub; i++)
    

    is quite reasonable, while the code:
    gate_t *front_gate = opens(gate[i], 7);
     if (front_gate == 0)
         error("can't open %s\n", gate[i]);

    is not. In the second example front_gate is a pointer; when a value is a pointer it should be compared to NULL instead of 0. Even simple values like 1 or 0 are often better expressed using defines like TRUE and FALSE (and sometimes YES and NO read better).
    Don't use floating-point variables where discrete values are needed. This is due to the inexact representation of floating-point numbers (see the first test run in the scanf section above, where 52.38 is read back as 52.380001). Test floating-point numbers using <= or >=; an exact comparison (== or !=) may not detect an "acceptable" equality.
    Simple character constants should be defined as character literals rather than numbers. Non-text characters are discouraged as non-portable. If non-text characters are necessary, particularly if they are used in strings, they should be written using an escape sequence of three octal digits rather than one (for example, '\007'). Even so, such usage should be considered machine-dependent and treated as such.
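    Returning to the floating-point advice above: one way to apply "use <= or >=" is to compare the difference of two values against a tolerance. The sketch below is an assumption about how one might do it; nearly_equal is a hypothetical name, and the right tolerance depends on the application.

```c
/* Hypothetical helper: true if a and b differ by no more than
   tolerance. The caller picks a tolerance suited to the data. */
int nearly_equal(double a, double b, double tolerance)
{
    double diff = a - b;

    if (diff < 0)
        diff = -diff;           /* absolute difference */
    return diff <= tolerance;
}
```

    For instance, nearly_equal(0.1 + 0.2, 0.3, 1e-9) accepts two values that an exact == comparison may reject.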
  • Conditional Compilation
    Conditional compilation is useful for things like machine-dependencies, debugging, and for setting certain options at compile-time. Various controls can easily combine in unforeseen ways. If you use #ifdef for machine dependencies, make sure that when no machine is specified, the result is an error, not a default machine. The #error directive comes in handy for this purpose. And if you use #ifdef for optimizations, the default should be the unoptimized code rather than an uncompilable or incorrect program. Be sure to test the unoptimized code.
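    A sketch of the "error, not a default machine" rule follows. TARGET_WORD_BITS, WORD_MAX and word_max are hypothetical names; in a real build the macro would come from the compiler command line rather than from the first line of the file.

```c
/* Stands in for -DTARGET_WORD_BITS=32 on the compiler command line;
   present only so that this sketch compiles on its own. */
#define TARGET_WORD_BITS 32

#if TARGET_WORD_BITS == 16
#define WORD_MAX 32767L
#elif TARGET_WORD_BITS == 32
#define WORD_MAX 2147483647L
#else
/* No machine selected (or an unknown one): stop the build. */
#error "TARGET_WORD_BITS must be set to 16 or 32"
#endif

/* Hypothetical accessor so the selected value can be checked at run time. */
long word_max(void)
{
    return WORD_MAX;
}
```

    If the first #define is removed and nothing is passed on the command line, the undefined macro evaluates as 0 in the #if tests and compilation stops at the #error line.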
Miscellaneous
  • Utilities for compiling and linking such as Make simplify considerably the task of moving an application from one environment to another. During development, make recompiles only those modules that have been changed since the last time make was used. Use lint frequently. lint is a C program checker that examines C source files to detect and report type incompatibilities, inconsistencies between function definitions and calls, potential program bugs, etc.
    Also, investigate the compiler documentation for switches that encourage it to be "picky". The compiler's job is to be precise, so let it report potential problems by using appropriate command line options.
  • Minimize the number of global symbols in the application. One of the benefits is the lower probability of conflicts with system-defined functions.
  • Many programs fail when their input is missing. All programs should be tested for empty input. This is also likely to help you understand how the program is working.
  • Don't assume any more about your users or your implementation than you have to. Things that "cannot happen" sometimes do happen. A robust program will defend against them. If there's a boundary condition to be found, your users will somehow find it! Never make any assumptions about the size of a given type, especially pointers.
    When char types are used in expressions, some implementations treat plain char as signed and others treat it as unsigned. It is advisable to always cast them when used in arithmetic expressions.
    Do not rely on the initialization of auto variables and of memory returned by malloc.
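    As an illustration of the advice about casting char values, the sketch below casts each character to unsigned char before handing it to isalpha, so the result does not depend on whether plain char is signed. count_letters is a hypothetical name.

```c
#include <ctype.h>

/* Hypothetical helper: count the alphabetic characters in s.
   The cast keeps the argument to isalpha non-negative even where
   plain char is a signed type. */
int count_letters(const char *s)
{
    int n = 0;

    while (*s != '\0') {
        if (isalpha((unsigned char) *s))
            n++;
        s++;
    }
    return n;
}
```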
  • Make your program's purpose and structure clear.
  • Keep in mind that you or someone else will likely be asked to modify your code or make it run on a different machine sometime in the future. Craft your code so that it is portable to other machines.
