Jeffrey S. Poulin, Lockheed Martin Federal
Systems
Almost everyone agrees that reuse involves using
something that someone originally developed for "someplace
else." However, when it actually comes time to calculate
reuse levels and reuse benefits, this understanding leaves a lot
of room for interpretation. This article describes some of the
pitfalls of measuring software reuse.
In a very thorough and well-executed reuse survey
of industry and government organizations by the Defense Information
Systems Agency (DISA), software development leaders consistently
reported impressive results. Managers gave reuse returns on investment
(ROI) as high as four to one (4:1) after two years, cycle time
improvements of more than 30%, and "a significantly increased
competitive edge" [DISA95]. However, what metrics did the
managers use to come to these conclusions? When the survey team
sought to understand how the survey participants measure success,
they found that the survey participants provided testimonials
and anecdotal results rather than objective evidence derived from
quantitative measures. This article explains how reuse metrics,
like all software metrics, can leave a lot of open questions [Poulin97].
Leaving Room for Interpretation
The values we get from any software metric depend
on the data that we gather to put into the metric. Unfortunately,
developers usually exercise considerable discretion and creativity
when deciding how to count different classes of code. To understand
the true value of software development practices such as reuse,
we must objectively evaluate our data.
To complicate our efforts, most people want to use
metrics to show increased productivity (e.g., lines of code per
labor-month) and improved quality (e.g., errors per thousand lines
of code). This creates an intuitive tendency to "look better
by writing more," which conflicts with our motivation in
reuse to avoid software development costs by "writing less."
We must acknowledge this conflict or else we will see it arise
in subtle ways when people attempt to use the same data to show
excellence in reuse as well as traditional, productivity-sensitive
metrics.
Management has considerable influence over what takes
place in their organizations. If management emphasizes reuse,
we will see a variety of common practices appear under the reuse
banner. My favorite stories involve cases where developers
use code generators to artificially achieve extremely high levels
of "reuse." (For example, Graphical User Interface
(GUI)-intensive applications commonly use code generators. After
designing the GUI with a graphical layout editor, the developers
simply push a button to create thousands of lines of code (LOC)
that implement the GUI.) Likewise, the developers might also
count the generated code to inflate their productivity metrics
(I have seen productivity rates as high as 15,000 LOC per labor-month;
a super-human feat by any standard!). Finally, the developers
might include the generated code (usually close to defect-free)
to distort their quality metrics with uncommonly good defect rates.
Code generators yield a significant advantage when we can use them. However, from a metrics point of view, the code they generate requires special consideration when calculating metrics such as productivity, quality, and reuse. We handle this by reporting the generated code separately from all other code.
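To make the point concrete, here is a minimal sketch in Python, with invented totals, of how lumping generated code in with hand-written code inflates productivity and quality figures, and what reporting it separately looks like:

```python
# A minimal sketch with invented numbers; "LM" is a labor-month.
hand_written_loc = 20_000   # code the team designed and wrote
generated_loc    = 60_000   # GUI code emitted by a layout tool at the push of a button
labor_months     = 40
defects          = 120      # defects found during test

# Lumping generated code in with hand-written code inflates both metrics.
productivity_all = (hand_written_loc + generated_loc) / labor_months        # 2,000 LOC/LM
quality_all      = defects / ((hand_written_loc + generated_loc) / 1_000)   # 1.5 defects/KLOC

# Reporting generated code separately keeps the metrics tied to the
# engineering effort the developers actually expended.
productivity_dev = hand_written_loc / labor_months                          # 500 LOC/LM
quality_dev      = defects / (hand_written_loc / 1_000)                     # 6.0 defects/KLOC

print(f"all code counted:  {productivity_all:,.0f} LOC/LM, {quality_all:.1f} defects/KLOC")
print(f"hand-written only: {productivity_dev:,.0f} LOC/LM, {quality_dev:.1f} defects/KLOC")
```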
Other situations can also significantly impact software
metric values. For example, one practice comes from groups performing
maintenance. When making a new version of an application via
changes to a large software baseline, the group might claim all
of the software from that baseline as "reuse" in the
subsequent version of the application. As with the "generated
code" example, including baseline code will severely inflate
the values of software metrics such as reuse, productivity, and
quality.
Another example arises in an "Open Systems Environment,"
where we often find the same application available for various
hardware, operating systems, and flavors of the same operating
system. Porting software to similar platforms usually requires
little more than recompiling the source code for the new targets.
However, some developers will report reuse levels of 99.9% for
each "port," and claim a corresponding financial benefit
from "reuse."
Reuse levels of this magnitude (85-99.9%) may not survive
scrutiny. On the other hand, I have seen reuse
metrics collected and combined in many interesting and creative
ways. For example, take a low reuse level on the initial development,
average it with a couple of 99% values from porting,
and arrive at a very plausible and impressive value of 65-85%.
Of course, this involves the mathematically unsound practice
of averaging percentages without weighting them by the size of each effort.
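A short worked example, with hypothetical project sizes and reuse claims, shows how the simple average of percentages diverges from a level weighted by the size of each effort (setting aside whether the ports should count at all):

```python
# Hypothetical efforts: (total LOC, LOC claimed as reused). The ports claim
# 99% "reuse"; the initial development achieves 10%. All numbers are invented.
efforts = [
    (400_000, 40_000),    # initial development: 10% reuse
    (100_000, 99_000),    # port to platform A:  99% "reuse"
    (100_000, 99_000),    # port to platform B:  99% "reuse"
]

# Mathematically unsound: average the percentages as if each effort
# carried the same weight.
naive_average = sum(reused / total for total, reused in efforts) / len(efforts)

# Size-weighted: total reused LOC over total LOC across all efforts.
weighted = sum(r for _, r in efforts) / sum(t for t, _ in efforts)

print(f"average of percentages: {naive_average:.1%}")   # ~69.3%
print(f"size-weighted level:    {weighted:.1%}")         # ~39.7%
```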
Metric Values Can Vary for Many Reasons
Reuse metrics do not necessarily require large influences
to cause their values to vary. The previous examples illustrate
factors that can cause a major impact on metric values. However,
I have also seen metric values vary significantly due to inconspicuous
and seemingly innocuous factors. For example, what units do we
use in our metrics? Most organizations use lines of code, which
have many well-documented advantages and disadvantages. We could
also use units such as function points or objects. If we use
objects, how do we account for the wide variation in the size
of objects? Do we make an allowance for the fact that the use
or reuse of a small object counts exactly the same as the use
or reuse of a large object? I have seen reuse levels vary by
as much as a factor of 8 depending on whether we counted
reuse by line of code or by object.
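The following sketch, using an invented component inventory, shows how the same project yields very different reuse levels depending on whether we count by lines of code or by objects:

```python
# Hypothetical component inventory: (name, LOC, reused from elsewhere?).
# The sizes are invented to show how the unit of measure changes the level.
components = [
    ("gui_framework", 12_000, True),    # one large reused object
    ("string_utils",     300, True),    # one small reused object
    ("roster_logic",   2_000, False),
    ("duty_scheduler", 1_500, False),
    ("report_writer",  1_200, False),
    ("db_interface",     900, False),
]

reused_loc = sum(loc for _, loc, reused in components if reused)
total_loc  = sum(loc for _, loc, _ in components)

reused_objects = sum(1 for c in components if c[2])
total_objects  = len(components)

print(f"reuse level by LOC:    {reused_loc / total_loc:.0%}")         # ~69%
print(f"reuse level by object: {reused_objects / total_objects:.0%}") # ~33%
```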
This wide variation in metric values can come from a simple
management decision or from a condition in our development environment.
Obviously, we need to understand these factors and how they
affect our metric values. Our metric values will only be as
good as the guidance we provide to our developers.
Remove the Ambiguity
We have seen how the choice of units can affect metric
values, but the need to remove ambiguity in "what to count"
extends beyond code. If we intend to reuse life-cycle products
such as requirements, documentation, and designs, we also need
to specify how to reuse these items and how to quantify them.
What units do we use for these life-cycle phases? How do we
count words, scripts, or graphics? Does a picture equal a thousand
words? With the exception of the design phase, where some design
environments clearly identify measurable "design objects,"
we do not have standard, agreed-upon units of measure.
Management needs to give specific directions when
requesting metric data. Otherwise, developers will almost certainly
use some latitude to interpret the request to their advantage.
Developers do not necessarily manipulate the data, but
they will present what they have done in the best possible light.
For example, another of my favorite reuse stories involves
the difference between "use" and "reuse."
When a program needs to do the same thing many times, we naturally
code that need into a procedure, function, method, subroutine,
remote call, or similar language feature. When asked for reuse
data, I have found that groups commonly exaggerate their reuse
level by reporting every invocation of these routines as "reuse."
Management should specifically state that multiple "uses"
do not count as "reuse."
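As a rough illustration (the routines, sizes, and invocation counts are hypothetical), compare counting every call as "reuse" with counting each externally obtained routine once:

```python
# Hypothetical data: routines written for this project vs. routines obtained
# unmodified from a reuse library, plus a log of invocations at run time.
written_here = {"fmt_date": 45, "sort_roster": 120}   # name -> LOC
from_library = {"checksum": 60}                       # name -> LOC
invocations  = ["fmt_date"] * 4 + ["sort_roster"] * 2 + ["checksum"] * 2

# Exaggerated: every call to any routine reported as an instance of "reuse."
claimed_reuses = len(invocations)                     # 8

# Disciplined: count each routine obtained from elsewhere once, by its size.
reused_loc  = sum(from_library.values())              # 60 LOC
total_loc   = sum(written_here.values()) + reused_loc # 225 LOC
reuse_level = reused_loc / total_loc                  # ~27%

print(f'invocations claimed as "reuse": {claimed_reuses}')
print(f"reuse level (unmodified library code / total code): {reuse_level:.0%}")
```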
Just as we expect developers to create routines when
they have to do the same thing many times, we expect them to copy
and modify software when possible. This widespread practice,
which we call "reengineering," routinely saves our developers
significant implementation effort. However, how do we track the
amount of modified software in each routine? In addition to the
difficulties of measuring reengineered software, how does it compare
to the disciplined use of unmodified software that we have designed
and implemented for reuse? Note that a plethora of evidence shows
that reengineering can actually cost more than new development.
This penalty can occur if we modify as little as 20% of the code
and almost certainly if we modify more than half [e.g., Stutzke96].
Including modified software in reuse metrics without adjusting
for these penalties will inflate the apparent benefits from "reuse."
Altogether, these facts make a cogent argument for the design
and reuse of unmodified software.
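The sketch below shows one way to discount reuse credit for modified components; the relative-cost values are assumptions chosen only to mirror the thresholds above, not figures from [Stutzke96]:

```python
# A sketch of discounting reuse credit for modified components. The cost
# curve below is an assumption that mirrors the thresholds in the text
# (penalties near 20% modification, worse than new development past half);
# it is NOT a model taken from [Stutzke96].
def relative_cost(fraction_modified: float) -> float:
    """Adaptation cost as a fraction of the cost of writing the component new."""
    if fraction_modified == 0.0:
        return 0.05   # unmodified, as-is reuse: small integration cost
    if fraction_modified < 0.2:
        return 0.5    # light modification: real effort to understand and retest
    if fraction_modified < 0.5:
        return 1.0    # roughly the cost of new development
    return 1.2        # heavily modified: often costs more than starting over

# Hypothetical components copied into a new application: (LOC, fraction modified).
components = [(5_000, 0.0), (3_000, 0.1), (4_000, 0.3), (2_000, 0.6)]

claimed_reuse_loc = sum(loc for loc, _ in components)                     # 14,000 LOC
avoided_effort    = sum(loc * (1.0 - relative_cost(f)) for loc, f in components)

print(f"LOC claimed as reuse:              {claimed_reuse_loc:,}")
print(f"LOC-equivalents of avoided effort: {avoided_effort:,.0f}")        # ~5,850
```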
Programming Language Support
Language features provided by [insert your favorite
programming language here] can make it easier to use software
that someone originally wrote for use someplace else. Procedures,
functions, generics, templates, etc., all provide very effective
mechanisms for reuse [Biddle96]. However, these features do not
guarantee reuse, nor does their use automatically mean reuse
has taken place. The following examples expose a common method
of exaggerating reuse levels through the use of programming language
features:
Traditional example:
Programmer J. Average sits down to design and implement a couple
of thousand lines of code for a new personnel roster and duty
assignment system. J. Average decides to use the C language.
Since many of the functions performed by officers, NCOs, and
enlisted men do exactly the same thing, J. Average codes these
functions as, well, C functions. The program calls these C functions
to process data records for officers, NCOs, and enlisted men,
and it takes J. Average 3 months to complete the program. When asked
about reuse, J. Average responds "I didn't do any; I wrote
the whole program myself."
Object-Oriented example:
Programmer J. Average sits down to design and implement a couple
of thousand lines of code for a new personnel roster and duty
assignment system. J. Average decides to use the C++ language.
Since many of the functions performed by officers, NCOs, and
enlisted men do exactly the same thing, J. Average codes these
functions as C++ methods. Classes in the program inherit these
methods to process data objects for officers, NCOs, and enlisted
men, and it takes J. Average 3 months to complete the program.
When asked about reuse, J. Average responds "at least 60%,
because I reused the same code over and over."
Sometimes we wonder why we do not always see the
benefits promised by purveyors of a particular process, method,
or tool. If both efforts took 3 months, what happened to the
benefits of reuse in the second example? In fact, no reuse took
place because both programmers did all the work themselves. The
programmers just placed different labels on what they did; both
used good software engineering techniques and the design mechanisms
provided by their programming language.
What We Can Do
More often than not, reuse metrics should include
a warning. Knowing to watch for the "reuse warning label"
will help everyone better understand the true benefits of reuse
and ultimately help them make the best possible business decisions
involving their reuse investments. Reuse metrics make a very effective
technology insertion tool, even though people will always try to find
ways to make the numbers sing their praises. To help reuse and
reuse metrics work, try the following: define exactly what counts
as "reuse" before collecting any data; state that multiple uses of
a routine do not count as reuse; report generated, ported, and modified
code separately from the code designed and written for the application;
and weight reuse levels by the size of the software involved rather
than averaging percentages.
Conclusions
We have all seen experience reports that reflect
glowing results from reuse. Although reuse can lead to significant
benefits, reuse does not automatically provide the panacea that
some experience reports would like us to believe [Pfleeger96].
Experience reports that do not mention where and how they obtained
their data probably rely on testimonials and anecdotes rather
than objective evidence. Armed with a little knowledge about
how metric values can vary based on who does the counting, we
can better understand reuse activities and the benefits of reuse
to our organization.
Acknowledgments
I would like to thank Marilyn Gaska, Karen Holm,
Allen Matheson, and Will Tracz for their encouragement and help
during the preparation of this paper.
About the Author
Jeff Poulin works as a
Senior Programmer and software architect with Lockheed Martin
Federal Systems (formerly Loral Federal Systems and IBM Federal
Systems Company) in Owego, NY. As a member of the Advanced Technology
Group, he works on software reuse and architecture issues on a variety
of software development efforts across Lockheed Martin. Active
in numerous professional activities and conference committees,
Dr. Poulin has over 40 publications, including a book on reuse
metrics and economics recently published by Addison-Wesley. A
Hertz Foundation Fellow, Dr. Poulin earned his Bachelor's degree
at the United States Military Academy at West Point and his Master's
and Ph.D. degrees at Rensselaer Polytechnic Institute in Troy,
New York.
References
[Biddle96] Biddle, Robert L. and Ewan D. Tempero,
"Understanding the Impact of Language Features on Reusability,"
Fourth International Conference on Software Reuse, Orlando,
FL, 23-26 April 1996, pp. 52-61.
[DISA95] DISA/SRI, "Software Reuse Benchmarking
Study: Learning from Industry and Government Leaders," Defense
Information Systems Agency Center for Software, Software Reuse
Initiative, Document DCA100-93-D-0066, Delivery Order 0037,
10 November 1995.
[Pfleeger96] Pfleeger, Shari Lawrence, "Measuring
Reuse: A Cautionary Tale," IEEE Software, Vol. 13,
No. 4, July 1996, pp. 118-127.
[Poulin97] Poulin, Jeffrey S., Measuring Software
Reuse: Principles, Practices, and Economic Models. Addison-Wesley
(ISBN 0-201-63413-9), Reading, MA, 1997.
[Stutzke96] Stutzke, Richard D., "Software Estimating
Technology: A Survey," Crosstalk: The Journal of Defense
Software Engineering, Vol. 9, No. 5, May 1996, pp. 17-22.