Jeffrey S. Poulin
IBM Federal Systems Company
Owego, NY
The IBM Object-Oriented Technology Center (OOTC)
studied the problem of applying metrics to OO development, with the goal
of identifying and recommending metrics for use in IBM's OO community.
However, after an extensive survey of the industry, and after
speaking with the numerous OO developers with whom they consult,
the OOTC decided that the practice of OO had not yet evolved to
the point where they could clearly identify key metrics for OO
development. Of course OO needs metrics; we need them to help
us evaluate how we do our jobs, and managers need them to help
make their business decisions. However, we have yet to reach
consensus on the most effective way to quantify OO development.
During this study the OOTC identified over 40 elements
of observable, "raw" data that might contribute to interesting
metrics. Examples of observable data include the number of classes,
methods, iterations, and depth of the class hierarchy. The OOTC
then created a list of 114 metrics that a developer could derive
from these 40 observable data elements. However, OO developers
clearly cannot handle that many metrics. Even if each metric
provided some key part of the measurement puzzle, if a project
actually tried to collect them, the team members would spend more
time counting beans than writing code.
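To make the distinction between raw data and derived metrics concrete, the
short Python sketch below computes a few illustrative ratios from hypothetical
raw counts. Both the counts and the derived ratios are my own examples; they
are not the OOTC's actual 40 data elements or 114 metrics.

    # Hypothetical raw, observable counts of the kind the OOTC catalogued.
    raw = {
        "classes": 120,
        "methods": 960,
        "max_hierarchy_depth": 5,
        "loc": 30_000,
    }

    # A few metrics a developer could derive from those counts.
    derived = {
        "methods_per_class": raw["methods"] / raw["classes"],
        "loc_per_class": raw["loc"] / raw["classes"],
        "loc_per_method": raw["loc"] / raw["methods"],
    }

    for name, value in derived.items():
        print(f"{name}: {value:.1f}")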
In fact, the OOTC concluded that of the 114 metrics,
about 32 might reveal truly interesting results. They then recommended
that project leaders decide which metrics they wanted to use.
Perhaps the metrics they chose would come from the core set of 32;
perhaps they would not. Admitting the limits of the field, the OOTC
told the IBM OO community that each project could decide which metrics
meant something to them.
Of course, from the point of view of a software measurement
purist, not every entry on the original list of 114 metrics would stand
up as a real metric. For one thing, you really want independence
between metrics and the data that contributes to them. For example,
if you know the average number of errors per class, you can easily
derive the average number of errors per line of code if you know
the size of the project. You do not need to track both metrics;
pick the one that makes the most sense for you. It does not do
much good to spend valuable resources maintaining long lists of
numbers.
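To see why tracking both metrics buys you nothing, here is a minimal Python
sketch with made-up project numbers: given the average errors per class, the
class count, and the total LOC, errors per KLOC follows by simple arithmetic.

    # Made-up project totals, for illustration only.
    total_classes = 250
    total_loc = 40_000
    errors_per_class = 0.8        # average errors found per class

    # Errors per KLOC is fully determined by the numbers above.
    total_errors = errors_per_class * total_classes
    errors_per_kloc = total_errors / (total_loc / 1000)

    print(f"Total errors:    {total_errors:.0f}")
    print(f"Errors per KLOC: {errors_per_kloc:.2f}")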
As OO developers and managers, though, we need not concern ourselves
with these kinds of measurement theory issues. The bottom line comes
down to the first rule of software metrics:
Rule #1: "Collect practical and useful metrics."
Throughout our metric work here with the OOTC and
the IBM Reuse Technology Support Center (RTSC), we have continuously
stressed the use of practical and useful metrics [Poulin93].
This means that collecting the data required for the metrics must
not cause any undue burden on the project, and the metrics must
easily convey the important information to the person looking
at them. This lets the project collect the metrics often and
use them to help improve their product or the way they do business.
Essentially, we use metrics to quantify our progress
or expected progress in four areas:
I do not claim that this represents the only way
to look at categories of metrics. For example, the OOTC leaves
out "Productivity" in favor of "Testing,"
which I consider a Management metric. That does not mean that
the OOTC does not care about productivity, but they do not emphasize
asking the question "How much code did you write?" either.
So how do we apply these groups to real life? You
need to know how you do in each of these four areas, and to do
so, you need a compact but revealing set of metrics. I have observed
that when I work with development groups or review experience
reports, I tend to see a couple of key metrics over and over.
In the Productivity area I see two key indicators of size: total
lines of code (LOC) and total programmers. Note that even on
OO projects, we naturally express how much we wrote in terms of
LOC, not classes or methods. Despite their shortcomings [Firesmith88],
LOC remain the most meaningful unit with which we express our
programming effort. In the Management area, we tend to report how
long the project lasted (or how long it has lasted so far).
From this we can derive indicators of cycle time and productivity
rates. Finally, in the Quality area, we tend to say how well
we did, especially if the latest effort showed a positive trend
from the last time. I see errors/LOC as the most commonly reported
quality metric.
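As a simple illustration of how the derived indicators fall out of these few
reported numbers, the Python sketch below computes a productivity rate and a
defect density from hypothetical project figures; the figures are invented,
not taken from any real project.

    # Hypothetical project figures, for illustration only.
    total_loc = 60_000
    programmers = 8
    duration_months = 10
    errors_found = 150

    productivity = total_loc / (programmers * duration_months)   # LOC per person-month
    defect_density = errors_found / (total_loc / 1000)           # errors per KLOC

    print(f"Productivity:   {productivity:.0f} LOC/person-month")
    print(f"Defect density: {defect_density:.2f} errors/KLOC")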
Notice that I did not include a metric for estimating.
I do not want to say that I do not trust estimates, but most
of the time we do not do a very good job of it. In practice,
our estimation models consist of looking at the metrics that we
collected on past projects, factoring in our experiences and lessons
learned, making the necessary adjustments for the new project,
and coming up with some numbers. Not very scientific, but right
now we do not have the reliable models we need to do better.
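For what it is worth, the informal model described above amounts to little
more than the following back-of-the-envelope calculation; every number in
the sketch is hypothetical.

    # Estimation by analogy: start from a measured rate on a past project,
    # adjust for what you expect to differ, and multiply out.
    past_productivity = 750        # LOC per person-month on a previous project
    adjustment = 0.85              # e.g., newer team, less familiar domain
    estimated_size = 45_000        # LOC expected for the new project
    team_size = 6

    effort = estimated_size / (past_productivity * adjustment)   # person-months
    schedule = effort / team_size                                 # calendar months

    print(f"Estimated effort:   {effort:.0f} person-months")
    print(f"Estimated schedule: {schedule:.1f} months")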
The interesting thing about these metrics comes from
the fact that these numbers tend to get reported up the management
chain and often form the basis upon which the really big management
decisions get made. However, by themselves, they do not constitute
a complete picture with which developers, first-line managers,
and team leads can effectively analyze their work so they can
do better the next time. They will point the team in the right
direction, but using metrics to improve a process does not just
happen. Just as we have become accustomed to seeing a standard
set of metrics that mean something to executives, we need to select
and use a set of metrics that mean something to us. These might
not be the same for every group, but once you have identified
your key indicators, remember the second rule of software metrics:
Rule #2: "Use the metrics you collect."
After all, if you do not use the metrics to drive
changes that make you better, why do you collect them in the first
place? Make note of the metrics you collect, use, and learn to
trust. Use them as indicators of how well you do and use them
to help identify problem areas when you do not do well. Fix those
problem areas and do it better the next time. A small set of metrics
can make all the difference.
[Firesmith88] Firesmith, Donald G., "Managing Ada Projects: The People Issues," Proceedings of TRI-Ada '88, Charleston, WV, 24-27 Oct. 1988, pp. 610-619.
[Poulin93] Poulin, Jeffrey S., and Joseph M. Caruso, "Determining the Value of a Corporate Reuse Program," Proceedings of the IEEE Computer Society International Software Metrics Symposium, Baltimore, MD, 21-22 May 1993, pp. 16-27.