Salvatore Iovene

Common mistakes when approaching OO design - Class dependencies

Dec 21, 2006

Here we continue with explaining some of the mistakes commonly made in Object Oriented design, and the good practices that are often ignored. This article is focused on code maintainability and on improving cooperation with people working at the same project.

Encouraging class dependencies

Having a lot of (mutual) dependencies in the code is quite typical of Spaghetti Code, and it’s definitely something we want to avoid, in order to keep our design neat, improve maintainability and ensure ease of collaboration with colleagues. What do I mean by “class dependencies”? Let’s continue the example from the last article, and suppose we have a certain class GuiManager which, at some points, wants to generate some reports. Let’s introduce now a certain ReportManager, which is a class responsible for generating reports. We have two types of report: TableReport, and ChartReport. They look like this:

class TableReport {
    public:
        void report()  {
            // do something
        }
};

class ChartReport {
    public:
        void report()  {
            // do something
        }
};

This means that the ReportManager will have to look something like this:

class ReportManager {
    public:
        void reportAll() {
            m_tableReport.report();
            m_chartReport.report();
        }

    private:
        TableReport m_tableReport;
        ChartReport m_chartReport;
};

There are several problems in this implementation. First of all, If the guy responsible for the TableReport one day wakes up, and decides that the method report() should rather be named generate(), he will not only be allowed to just change that and commit to the repository, but this will break the ReportManager! So after a few hours, the guy responsible for the ReportManager checks out from the repository, builds, and finds out that all the times he has used the TableReport need to be changed. Of course this is something we don’t want to happen.

The usual approach to this, is using an Abstract Base Class (ABC), which is a very robust way to sort out problems like this. Let’s see come code:

class Report {
    public:
        virtual void report() = 0;
};

class TableReport : public Report {
    public:
        void report()  {
            // do something
        }
};

class ChartReport : public Report {
    public:
        void report()  {
            // do something
        }
};

Report is ourABC, and with it we are literally forcing the people who write TableReport and ChartReport to write a method named report(). So, this way we broke one dependency: the ReportManager doesn’t need to worry about the way every single report will call the method: it’s sure that a method named report() will exist.

There is, tho, another dependency. If somebody writes a new report, say XmlReport, this will need modifications to the ReportManager, because our logic so far implies that the ReportManager knows about all the reports. So, if we’re not the maintainers of the ReportManager (because maybe it’s in some different library, written by someone else, and we don’t have access to the code), we will have to go ahead and ask the rightful maintainer to modify the code. Hence, there’s an extra dependency, not structural, this time, but logical. What if the maintainer of the ReportManager gave us tools (read APIs) so that we can register our particular report to the ReportManager? Consider the following code:

class ReportManager {
    public:
        void registerReport(Report const & r) {
            m_reports.push_back(r);
        }

        void reportAll() {
            std::list::const_iterator iter;
            for(iter = m_reports.begin();
                 iter != m_reports.end();
                 ++iter)
            {
                iter->report();
            }
        }

        private:
            std::list m_reports;
};

This way, the ReportManager doesn’t have to know anything about any Report.

Common mistakes when approaching OO design

Dec 19, 2006

Today I want to talk about Object Oriented practices, and 3 commonly made mistakes. Very often, when reviewing code written by other people, I find violations of common OO practices, that make the code a lot less maintainable. Here follows a list of the most common ones, and, of course, some explanations about them.

Layer violation

While not the most common, this appears to me as the most dangerous. What is layer violation? Let’s show it with an example. Assume we have a GUI driven application that reads data from a database and shows the results on the display. We might consider having some upper level Controller class, and managers for each component, e.g. GuiManager, DbManager, ReportManager. Assume that the Controller class runs a loop, and in that loop we take care of refreshing the GUI. What I’m going to write now is really wrong:

this->guiManager().reportTable().update(this->reportManager().populate(
    this->dbManager().query(someSqlString)));

Well, there are many horrible thing here, but the layer violation happens in this->guiManager().reportTable().update(…). Imagine the various components of this scheme as layers on top of each other. We have the Controller, the GuiManager and a certain ReportTable.

What we’re doing, is accessing the ReportTable layer from the Controller one. Why is this bad? Having layer violations will fill your code up with disturbance. You will rapidly lose track of what does what (e.g., who is updating the ReportTable? The Controller or the GuiManager?), and this will end up into an intertwined mess commonly known as Spaghetti Code. Doing that, you are performing actions from parts of the code to which those actions do not belong. Classes shouldn’t care about what other classes are, but only about what other classes do. Think about it: do you really want to let the Controller know that the GuiManager has a ReportTable, inside? Shouldn’t the Controller tell the GuiManager just what to do, rather than how to do it? Having classes access inner functioning of other classes will lead you to messy code, especially when there’s more than one people working on a project, as discussed later in this article. Having all the communication happening between adjacent layers will help us keeping the project consistent even in case of changes to components. Imagine if one day I will decide that the GuiManager doesn’t need a ReportTable, but a ReportChart. My ideal scenario is the one where all the changes I need to make are only within the GuiManager. But if there was a layer violation, such as the mentioned one, I would have to modify the Controller as well. When people in a group work on different components of a system, they don’t want to make a change that will break everything else. In order to avoid broken code, it would be a good practice to keep layers commnicate with the adjacent ones, according to well known interfaces.

Information hiding

This brings us to our next point. What does the Controller need to know about the members of the GuiManager? Ideally, nothing. Ideally, there would be no getters or setters, since the Controller doesn’t need to know anything about the GuiManager‘s inner functioning. What needs to be done, in fact, is designing a well known interface for the GuiManager that the Controller can use. Once designed, such interface should never be changed, in order to ensure maximum compatibility within the components. Imagine you have just a certain GuiManager::update()method, the Controller would just need to call this->guiManager().update() and, whatever the GuiManager does, is none of the Controller‘s business. Inside, the GuiManager might do something like this->reportTable().update(), but in case this would change to a ReportChart, it wouldn’t break the Controller, and keep the people that work with it happy.

Abusing singleton pattern

Singletons are not a way to get yourself some global variables. Think thoroughly about the reasons why you really need a Singleton in your program. Is it just a way to access some variables from everywhere in the code? If the answer is yes, you should consider refactoring your code to get rid of the Singleton class. Keep also in mind that Singletons are enemies of unit testing. Have a Singleton class do something, rather than contain _something. A typical example of a class suitable to be a Singleton is a Logger class. You need to access it from everywhere in the code; the class doesn’t need to be aware of the application it’s in; the class _does (logs) and doesn’t just contain. If you write a Singleton class like the following, you’re doing something wrong:

class AccessData : public Singleton<AccessData> {
    friend class Singleton<AccessData>;
    public:
        std::string username;
        std::string password;
};

This class seems to have the sole purpose of easing the access to a certain username and password from everywhere in the code, without the need of passing them around. You should consider passing references and data around only when needed, or adopting some signaling framework.

5 SVN best practices

Dec 15, 2006

Versioning systems like CVS, SVN or Darcs are very important tools, that no serious programmers can omit to use. If you started a project without using any versioning tools, I really recommend that you start using one immediately; but I’m not discussing this right now.

I would like to point your attention to some best practices that I recommend when working in a team.

Don’t use versioning like it were a backup tool.

I’ve heard this question too often: “Have you put your code safely on SVN?”. That’s a bad question. Storing code to an SVN server is not meant for safety, i.e. for fear of losing it. You are talking about something else, and that’s called backup. Take Darcs, a not so popular versioning system. It can start without a server, and you can just run it locally on your machine without launching any daemon whatsoever. A faulty hard drive can still make you lose all your work, of course. That’s why you have to do backups, of course, but they don’t have anything to do with versioning. Hence, committing to the repository once a day, before taking off home, e.g., is not an acceptable practice, especially if you work in a team. Doing that would be like making a daily backup. An SVN commit, instead, has to have a meaning of some sort, not just “Ok, let’s store to the SVN server the work of today”. Moreover, sometimes, if the schedule is tough and the cooperation is tight, you need to commit very often so your peer will keep up with you all the time, and not just find out, at evening, that he’s got dozens conflicts after checking out your code.
Commit as soon as your changes makes a logical unit.

How often should you commit? Theres no such thing as committing too often, or too rarely. You should commit each time your changes represent a logical unit, i.e. something that makes sense. Usually that happens because you’re following a plan, when coding (because you are, aren’t you?). So, you find out a bug in the trunk, plan a strategy about how to fix it, fix it, and then commit. This makes sense because that’s a commit that fixes a bug. So that revision X is buggy, while revision X+1 is not. Don’t be shy about committing too often. Should you just find an insignificant typo in a debug string, or in a comment, don’t be afraid of committing just to fix that. Nobody will be mad at you for being precise. Consider the extreme situation in which, after months and months, you may want to remember “What was the revision where I fixed that typo in that debug string?”. If you dedicated one signle commit for the actual finite logical unit of correcting the typo, you can just scroll back your changelog and find it. But what often happens, is that people will be doing something else, and, while doing that something else, will notice the type, and correct it, and basically merge that correction with the rest of the commit, making that thing losing visibility. To make it simple: your SVN comments shouldn’t explain that you did more than one thing. If your SVN comment looks like “Fixing bugs #1234 and #1235” or “Fixing bug #4321 and correcting typo in debug sting” then you should’ve used two commits.
Be precise and exhaustive in your commit comments.

The second most annoying thing ever is committing with blank comments. If you’re working in a team, your peer developers will be frustrated about it and possibly mad at you, or will label you in a bad way; possibly publicly humiliate you. If you’re working alone, you will experience what you’re hypothetical development companions would have: frustration in not being able to easily track down what a certain commit did. Comments in commits are important. Please be precise and explain in detail everything you did. In the optimal case, I shouldn’t need to read your code.
Never ever break the trunk.

This is probably the most annoying thing when dealing with people who can’t use versioning. Breaking the trunk is an habit that will quickly earn you the hatred of your colleagues. Think about it: if you commit a patch that breaks the trunk, and then I check it out, what am I going to do? The project won’t build so I either have to fix it, or come to your desk and complain to you. In both cases I’m wasting some time. And consider the first case again: what should I do after fixing your broken code? Commit it? Sending you a diff? If I’ll commit, chances are that you’ll have conflicts when you checkout, and you’ll have to waste time in resolving them. Maybe sending you a patch would be the best way, but still it’s a waste of time for the both of us. So the thing is: before committing, ALWAYS double check! Make a clean build and make sure that it builds. And don’t forget to add files! It’s a very common mistake: committing good code, but forgetting to add a file. You won’t realize, because the thing builds, but when I’ll checkout, I’ll have troubles, because of missing file(s). If you’re using Darcs, just make a “darcs get” in a new directory, and then build.
Branch only if needed.

There are some ways to handle branches, but here’s my favorite. The most of the work should happen in the trunk, which is always sane, as stated by the previous practice, and the patches should always be small, so that they can be reviewed very easily. If you find yourself in the situation of needing to write a large patch, then you should branch it. In that way you can have small patches that will break your branch over the time, but they can be easily reviewed. After the process is completed, i.e. you’ve achieved your goal of fixing a bug or implementing a new feature, you can test the branch thoroughly, and then merge it to the trunk.

10 pieces of advice to write good code

Dec 14, 2006

Having been coding for 16 years now (I started quite young), I have seen a lot of bad code. Code is not good just because it works. So here’s a quick list of 10 advice that you’d better keep in mind while coding.

Don’t sacrifice code maintainability to performance, unless it’s strictly necessary.

This happens very often. You have to consider that your code is likely to be read by many persons, and some of them will read it after you might have parted from that company. Remember that you won’t remember what your own code does after few weeks. So always try to put things in the most readable and obvious form, even if this will require writing more lines of code, or having less performing code. Of course this is not so important if performance is your number one issue. Try, for instance, to avoid use of the ?: operator in C/C++. Everybody will understand it anyway, but a good old if statement will do it, so why not going for it?
Be precise as a Swiss clock, when it comes to naming conventions.

Nobody wants to read class names or variable names that look like gibberish. Don’t be mean on the keyboard: when you type, remember that somebody else will have to read it, so be extensive.
1. Name your variable NumberOfItems rather than items_n. Don’t use cryptic prefixes to class name. Name your class ClientMessageOperationsBasicFunctor rather than CMOpFunctor. It’s a lot more typing for you, but a lot less hassle reading for the ones that will come after you.
2. Don’t change your conventions. If you’re calling the iterators i, don’t call any of them n, ever. You will induce confusion to your reader. It doesn’t seem as important as it actually is. If you call a class ClientMessageBlockContact, then do not have ServerMessageContactBlock. Be perfect, be precise, be obsessed.
Use a good and consistent indentation style.

Never ever have more than one blank line. Don’t have trailing spaces at the end of the lines. Don’t have blank spaces or TAB characters in blank lines. A blank line must be blank, that is.Be consistent: don’t use TABs to indent in one file, and spaces in another one. Possibly, use 8-chars wide TABs to indent. If you find yourself going beyond 80 rows too often, then that could be an indication that there might be some design flaws in your program. Tweak your editor to show you the end-of-line character and the TABs.
One class, one file.

Don’t write files like ServerMessages.h where you write all the class that are ServerMessages. One class goes in one file. If you find yourself thinking that you can’t do it, review your design.
In C/C++, project includes use “”, dependency includes use <>.

If you’re including a file that’s local to your project, use #include "file.h"; if it’s an external dependency, do #include <file.h>. Why? I’ve seen people including <file.h> and then just put things like /usr/includes/my_project in the inclusions search path, so that nobody will be able to compile before installing. That’s a bad assumption. And you don’t want to end up in that error.
Always compile with -ansi -pedantic -Wall -Werror flags (or similar, according to your compiler).

Let’s adhere to standards. Let’s avoid warnings. A warning might become an error in the future.
Use TODOs and FIXMEs.

If you know that you, or somebody else, will have to return on a certain piece of code to add or modify some functionality, please mark it with a TODO. If you know that a piece of code is buggy but you can’t fix it right now, add a FIXME marker. Later on, it will be easy to grep the source tree for TODOs and FIXMEs and analyze them, especially if they’re very well commented.
Comment your own code.

Seriously: you’re going to forget, sooner than you think. Just invest 5% of your time in writing commented code. Never assume that code is self-explanatory, just write a couple of lines for everything you do. Comments are not only meant to generate doxygen documentation. You have to assume that somebody else will read your code and need to modify/extend it.
Use a versioning system even if you’re working alone.

Yes, versioning is not just for working in a team. Use Darcs or SVN even if you’re working alone: you won’t regret it. Commit often and try to be professional all the time. Later on somebody else might join you. Or then you might find useful to revert to previous versions of your program. And it will help you to keep trace of what you’re doing.
Use a bug tracking system even if you’re working alone.

Things like Bugzilla are EXTREMELY useful. Usually you will forget bout a bug after less than 2 days. Everytime you find a bug, either fix it immediately, or mark it to your personal bugzilla. And always fix bugs first, and then write new code.

Common errors

It compiles, so it works.
It works here, so it works everywhere.
Commenting? We don’t have time to waste, we gotta ship!
Plan code so that we can reuse it is useless: we’ll end up writing everything from scratch anyway.
Unit tests are a waste of time.

Is your stacktrace really corrupted?

Oct 17, 2006

You may encounter, during your debugging sessions, the `stack corruption’ problem. Usually you will find it out after seeing your program run into a segmentation fault. Otherwise, it must mean that some very malicious and subtle code has been injected into your program, usually through a buffer overrun. What is a buffer overrun? Let’s examine the following short C code:

#include <stdio.h>

void bar(char* str) {
    char buf[4];
    strcpy( buf, str );
}

void foo() {
    printf("Hello from foo!");
}

int main(void) {
    bar("This string definitely is too long, sorry!");
    foo();
    return 0;
}

There’s clearly something wrong with it: as you can see, we are copying str' tobuf’ without first checking the size of str'. First of all there is a security issue, because ifstr’ didn’t just come from a fixed string like in this case, but got inputted from somewhere (maybe on a website), then there could be a string long enough to overwrite the code of `foo’, and run malicious code on its behalf. What we have here, anyhow, is just a segmentation fault. Let’s debug the program.

(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) run
Starting program: /home/siovene/stack

Program received signal SIGSEGV, Segmentation fault.
0x6f6c206f in ?? ()
(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7df9970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

Obviously something must have gone wrong. In order to better understand what is going on, let’s make a step back, and let’s examine a working example instead:

#include <stdio.h>

void bar(char* str) {
    char buf[4];
    strcpy( buf, str );
}

void foo() {
    printf("Hello from foo!");
}

int main(void) {
    bar("abc");
    foo();
    return 0;
}

This is the same code, but it’s been stripped off of the long string that caused the segmentation fault, and in its place we find a harmless 3 character string: `abc’. Let’s name the program stack.c anc compile it with debug informaion:

$> gcc -g -o stack stack.c

Now let’s debug it:

(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) break bar
Breakpoint 1 at 0x80483ca: file stack.c, line 5.
(gdb) run
Starting program: /home/siovene/stack

Breakpoint 1, bar (str=0x8048545 "abc") at stack.c:5
5         strcpy( buf, str );

We have entered the bar() function, let’s examine the backtrace:

(gdb) backtrace
#0  bar (str=0x8048545 "abc") at stack.c:5
#1  0x0804840e in main () at stack.c:13
</code>

What is the address of the bar() function?

(gdb) print bar
$1 = {void (char *)} 0x80483c4

Let’s now be paranoid and check this out producing a dump of our executable:

$> objdump -tD stack > stack.dis

Open the file with your favorite editor and look for `80483c4′, the address of bar():

080483c4 <bar>:
 80483c4: 55                    push   %ebp
 80483c5: 89 e5                 mov    %esp,%ebp
 80483c7: 83 ec 28              sub    $0x28,%esp
 80483ca: 8b 45 08              mov    0x8(%ebp),%eax
 80483cd: 89 44 24 04           mov    %eax,0x4(%esp)
 80483d1: 8d 45 e8              lea    0xffffffe8(%ebp),%eax
 80483d4: 89 04 24              mov    %eax,(%esp)
 80483d7: e8 0c ff ff ff        call   80482e8
 80483dc: c9                    leave
 80483dd: c3                    ret

Perfect, that’s our function. But now let’s get curious. Where’s the stack pointer in the CPU registers?

(gdb) info registers
eax            0x0      0
ecx            0xb7ed11b4       -1209200204
edx            0xbff04f60       -1074770080
ebx            0xb7ecfe9c       -1209205092
esp            0xbff04f10       0xbff04f10
ebp            0xbff04f38       0xbff04f38
esi            0xbff04fd4       -1074769964
edi            0xbff04fdc       -1074769956
eip            0x80483ca        0x80483ca
eflags         0x282    642
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

The esp register, on the architecture this article is written on, is the stack pointer. Its address is 0xbff04f10. Let’s examine the memory at that point:

(gdb) x/20xw 0xbff04f10
0xbff04f10:  0x00000000   0x08049638   0xbff04f28   0x080482b5
0xbff04f20:  0xb7ecfe90   0xbff04f34   0xbff04f48   0x0804843b
0xbff04f30:  0xbff04fdc   0xb7ecfe9c   0xbff04f48   0x0804840e
0xbff04f40:  0x08048545   0x08048480   0xbff04fa8   0xb7db3970
0xbff04f50:  0x00000001   0xbff04fd4   0xbff04fdc   0x00000000

With this command we have told GDB to examine 20 words in exadecimal format at the address 0xbff04f10. That’s because the value of the stack pointer is the address of the back-chain pointer to the previous stack frame. So address 0×00000000 is the address of the previous stack frame. But 0×00000000 is put in the stack frame in concurrence of the program entry point, i.e. the main() function. This agrees with the fact that we know bar() was called by main()!

Everything looks ok and in place, since the program works perfectly we weren’t expecting anything different. Let’s now do the same with the faulty program. At the moment of the segmentation fault, the backtrace looked like this:

(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7df9970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

To see exactly what goes on, it would be better to debug it more carefully:

(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) break bar
Breakpoint 1 at 0x80483ca: file stack.c, line 5.
(gdb) run
Starting program: /home/siovene/stack

Breakpoint 1, bar (str=0x8048580
                    "This string definitely is too long, sorry!")
                  at stack.c:5
5         strcpy( buf, str );
(gdb) next
6       }
(gdb) next
0x6f6c206f in ?? ()
(gdb) next
Cannot find bounds of current function

Let’s then try to follow back the stacktrace, as we did previously:

(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7e9b970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

(gdb) info registers
eax            0xbfeed1e0       -1074867744
ecx            0xb7ea4c5f       -1209381793
edx            0x80485ab        134514091
ebx            0xb7fb7e9c       -1208254820
esp            0xbfeed200       0xbfeed200
ebp            0x6f742073       0x6f742073
esi            0xbfeed294       -1074867564
edi            0xbfeed29c       -1074867556
eip            0x6f6c206f       0x6f6c206f
eflags         0x246    582
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

(gdb) x/20xw 0xbfeed200
0xbfeed200:  0x202c676e   0x72726f73   0xbf002179   0xb7e9b970
0xbfeed210:  0x00000001   0xbfeed294   0xbfeed29c   0x00000000
0xbfeed220:  0xb7fb7e9c   0xb7fee540   0x08048480   0xbfeed268
0xbfeed230:  0xbfeed210   0xb7e9b932   0x00000000   0x00000000
0xbfeed240:  0x00000000   0xb7feeca0   0x00000001   0x08048300

(gdb) x/20xw 0x202c676e
0x202c676e:     Cannot access memory at address 0x202c676e

There’s only one explanation to that: the stack memory has been overwritten and now contains gibberish. We have been very unlucky with our example, but this gave us the tools to imagine another case. Let’s assume the stack got actually corrupted not because it was overwritten accidentally, but because GDB was failing to build it. In this case you are still able to navigate it backwards. All you need to do it keep following the value of the stack frames, starting from the esp register, until you reach 0×000000. Write all the addresses down, and then use `objdump’ to obtain the disassembly and symbols information from the binary. All is left, now, is to check the names of the symbols matching the pinned up addresses.

If you can actually do that, than you have successfully reconstructed your stacktrace. It wasn’t really corrupted by a bug in your program, but simply GDB missed to keep it up with it.

« Page 14 / 14