The Role of Metrics in Object-Oriented Development

Jeffrey S. Poulin

IBM Federal Systems Company

Owego, NY

The IBM Object-Oriented Technology Center (OOTC) studied the problem of measurement in OO development, with the goal of identifying and recommending metrics for use in IBM's OO community. However, after an extensive survey of the industry, and after speaking with the numerous OO developers with whom they consult, the OOTC concluded that the practice of OO had not yet evolved to the point where they could clearly identify key metrics for OO development. Of course OO needs metrics; we need them to help us evaluate how we do our jobs, and managers need them to help make their business decisions. However, we have yet to reach consensus on the most effective way to quantify OO development.

During this study the OOTC identified over 40 elements of observable, "raw" data that might contribute to interesting metrics. Examples of observable data include the number of classes, the number of methods, the number of iterations, and the depth of the class hierarchy. The OOTC then created a list of 114 metrics that a developer could derive from these observable data elements. However, OO developers clearly cannot handle that many metrics. Even if each metric provided some key part of the measurement puzzle, a project that actually tried to collect them all would spend more time counting beans than writing code.

In fact, the OOTC concluded that of the 114 metrics, only about 32 might reveal truly interesting results. They then recommended that project leaders decide for themselves which metrics they wanted to use; perhaps the metrics they chose would come from the core set of 32, perhaps not. Admitting the limits of the field, the OOTC told the IBM OO community that its members could decide what metrics meant something to them.

Of course, from the point of view of a software measurement purist, not every entry on the original list of 114 would stand up as a real metric. For one thing, you really want independence between metrics and the data that contributes to them. For example, if you know the average number of errors per class, you can easily derive the average number of errors per line of code once you know the size of the project. You do not need to track both metrics; pick the one that makes the most sense for you. It does not do much good to spend valuable resources maintaining long lists of redundant numbers.
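
To see why, consider a minimal sketch in Python (the numbers are invented purely for illustration) of how one metric determines the other:

    # Hypothetical project data; illustration only.
    total_classes = 120
    total_loc = 18000           # total lines of code
    errors_per_class = 0.5      # average errors found per class

    # The per-class average already determines the total error count...
    total_errors = errors_per_class * total_classes

    # ...so errors per LOC carries no independent information.
    errors_per_loc = total_errors / total_loc
    print(f"errors/KLOC: {1000 * errors_per_loc:.2f}")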

As OO developers and managers, however, we need not concern ourselves with these measurement theory issues. The bottom line comes down to the following first rule of software metrics:


Rule #1: "Show me something that works."

Throughout our metrics work here with the OOTC and the IBM Reuse Technology Support Center (RTSC), we have continuously stressed the use of practical and useful metrics [Poulin93]. This means that collecting the data required for the metrics must not place any undue burden on the project, and the metrics must easily convey the important information to the person looking at them. This lets the project collect the metrics often and use them to help improve its product or the way it does business.

Essentially, we use metrics to quantify our progress, or expected progress, in four areas (see the sketch after this list):

  1. Estimating - to predict the level of effort expected on an upcoming project.
  2. Management - to tell you how the project progresses; for example, the number of classes completed, the number of iterations completed, or the percent of budget expended. I put all "in-progress" metrics into this category.
  3. Productivity - to indicate how much you did, including the overall size of the effort; e.g., the total number of classes you wrote or the total person-months expended.
  4. Quality - to tell you how well you did your work.
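
One way to picture this grouping (a hypothetical sketch; the field names are illustrative, not an OOTC standard) is as a simple record of the numbers a project might track under each heading:

    from dataclasses import dataclass

    @dataclass
    class ProjectMetrics:
        # Estimating: predicted level of effort for upcoming work.
        estimated_person_months: float
        # Management: "in-progress" indicators.
        classes_completed: int
        iterations_completed: int
        percent_budget_expended: float
        # Productivity: how much you did.
        total_loc: int
        total_person_months: float
        # Quality: how well you did it.
        errors_found: int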

I do not claim that this represents the only way to look at categories of metrics. For example, the OOTC leaves out "Productivity" in favor of "Testing," which I consider a Management metric. That does not mean that the OOTC does not care about productivity, but they do not emphasize asking the question "How much code did you write?" either.

So how do we apply these groups to real life? You need to know how you do in each of these four areas, and to do so, you need a compact but revealing set of metrics. I have observed that when I work with development groups or review experience reports, I tend to see a few key metrics over and over. In the Productivity area I see two key indicators of size: total lines of code (LOC) and total programmers. Note that even on OO projects, we naturally express how much we wrote in terms of LOC, not classes or methods. Despite their shortcomings [Firesmith88], LOC remain the most meaningful unit with which we express our programming effort. In the Management area, we tend to report how long the project lasted (or has lasted so far). From size and duration we can derive indicators of cycle time and productivity rates. Finally, in the Quality area, we tend to report how well we did, especially if the latest effort showed a positive trend over the last one. I see errors/LOC as the most commonly reported quality metric.
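
To make the arithmetic concrete, here is a small sketch (with invented figures) showing how the derived indicators fall out of this handful of reported numbers:

    # Hypothetical figures for a completed project.
    total_loc = 25000
    programmers = 5
    duration_months = 10
    errors_found = 75

    # Cycle time and productivity rates derive from size and duration.
    person_months = programmers * duration_months
    loc_per_person_month = total_loc / person_months

    # The most commonly reported quality metric: errors per KLOC.
    errors_per_kloc = 1000 * errors_found / total_loc

    print(f"{loc_per_person_month:.0f} LOC/person-month, "
          f"{errors_per_kloc:.1f} errors/KLOC")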

Notice that I did not include a metric for estimating. I do not want to say that I do not trust estimates, but most of the time we do not do a very good job of making them. In practice, our estimation models consist of looking at the metrics that we collected on past projects, factoring in our experiences and lessons learned, making the necessary adjustments for the new project, and coming up with some numbers. Not very scientific, but right now we do not have the reliable models we need to do better.
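
A sketch of such an estimate by analogy, assuming a single past project as the baseline (every number here, including the adjustment factor, is an invented judgment call, which is exactly the point):

    # Baseline from a past project (hypothetical figures).
    past_loc = 25000
    past_person_months = 50
    past_productivity = past_loc / past_person_months   # LOC per person-month

    # New project: expected size, adjusted for experience and lessons learned.
    expected_loc = 40000
    adjustment = 0.85   # e.g., new domain, less OO experience; a guess

    estimated_effort = expected_loc / (past_productivity * adjustment)
    print(f"Estimated effort: {estimated_effort:.0f} person-months")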

The interesting thing about these metrics comes from the fact that these numbers tend to get reported up the management chain and often form the basis upon which the really big management decisions get made. However, by themselves, they do not constitute a complete picture with which developers, first-line managers, and team leads can effectively analyze their work so they can do better the next time. They will point the team in the right direction, but using metrics to improve a process does not just happen. Just as we have become accustomed to seeing a standard set of metrics that mean something to executives, we need to select and use a set of metrics that mean something to us. These might not be the same for every group, but once you have identified your key indicators, remember the second rule of software metrics:


Rule #2: "Use the metrics you collect."


After all, if you do not use the metrics to drive changes that make you better, why do you collect them in the first place? Make note of the metrics you collect, use, and learn to trust. Use them as indicators of how well you do and use them to help identify problem areas when you do not do well. Fix those problem areas and do it better the next time. A small set of metrics can make all the difference.

References

[Firesmith88] Firesmith, Donald G., "Managing Ada Projects: The People Issues," Proceedings of TRI-Ada '88, Charleston, WV, 24-27 Oct. 1988, pp. 610-619.

[Poulin93] Poulin, Jeffrey S., and Joseph M. Caruso, "Determining the Value of a Corporate Reuse Program," Proceedings of the IEEE Computer Society International Software Metrics Symposium, Baltimore, MD, 21-22 May 1993, pp. 16-27.