Reuse Metrics Deserve a Warning Label:

The Pitfalls of Measuring Software Reuse

Jeffrey S. Poulin, Lockheed Martin Federal Systems

Almost everyone agrees that reuse involves using something that someone originally developed for "someplace else." However, when it actually comes time to calculate reuse levels and reuse benefits, this understanding leaves a lot of room for interpretation. This article describes some of the pitfalls of measuring software reuse.

In a very thorough and well-executed reuse survey of industry and government organizations by the Defense Information Systems Agency (DISA), software development leaders consistently reported impressive results. Managers gave reuse returns on investment (ROI) as high as four to one (4:1) after two years, cycle time improvements of more than 30%, and "a significantly increased competitive edge" [DISA95]. However, what metrics did the managers use to come to these conclusions? When the survey team sought to understand how the participants measured success, they found that the participants offered testimonials and anecdotal results rather than objective evidence derived from quantitative measures. This article explains how reuse metrics, like all software metrics, can leave a lot of open questions [Poulin97].

Leaving Room for Interpretation

The values we get from any software metric depend on the data that we gather to put into the metric. Unfortunately, developers usually exercise considerable discretion and creativity when deciding how to count different classes of code. To understand the true value of software development practices such as reuse, we must objectively evaluate our data.

To complicate our efforts, most people want to use metrics to show increased productivity (e.g., lines of code per labor-month) and improved quality (e.g., errors per thousand lines of code). This creates an intuitive tendency to "look better by writing more," which conflicts with our motivation in reuse to avoid software development costs by "writing less." We must acknowledge this conflict or we will see it arise in subtle ways when people attempt to use the same data to show excellence in reuse as well as in traditional, productivity-sensitive metrics.

Management has considerable influence over what takes place in their organizations. If management emphasizes reuse, we will see a variety of common practices appear under the reuse banner. My personal favorite stories involve cases where developers use code generators to artificially achieve extremely high levels of "reuse." (For example, Graphical User Interface (GUI)-intensive applications commonly use code generators. After designing the GUI with a graphical layout editor, the developers simply push a button to create thousands of lines of code (LOC) that implement the GUI.) Likewise, the developers might also count the generated code to inflate their productivity metrics (I have seen productivity rates as high as 15,000 LOC per labor-month, a superhuman feat by any standard!). Finally, the developers might include the generated code (usually close to defect-free) to distort their quality metrics with uncommonly good defect rates.

Code generators yield a significant advantage when we can use them. However, the code they generate requires special consideration when we calculate metrics such as productivity, quality, and reuse. We do this by reporting the generated code separately from all other code.
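
As a sketch of what "reporting separately" can look like, the fragment below (C++, with illustrative counts) computes a reuse level under one simple definition, unmodified library code divided by all hand-developed code, and keeps the generator output out of both the numerator and the denominator. Both the numbers and the definition itself are assumptions for illustration, not a prescribed formula.

  #include <cstdio>

  // Hypothetical line counts for one delivered application (illustration only).
  struct CodeCounts {
      long newLoc;       // code written by hand for this application
      long reusedLoc;    // unmodified code taken from a reuse library
      long generatedLoc; // code emitted by a GUI builder or other generator
  };

  // One simple reuse-level definition: reused LOC over all hand-developed LOC.
  // Generated code is excluded here and reported on its own line below.
  double reuseLevel(const CodeCounts &c) {
      return 100.0 * c.reusedLoc / (c.newLoc + c.reusedLoc);
  }

  int main() {
      CodeCounts app = {8000, 2000, 40000};  // 40 KLOC of GUI code from a generator

      printf("Reuse level (generated code excluded): %.1f%%\n", reuseLevel(app));
      printf("\"Reuse level\" if generated code counts: %.1f%%\n",
             100.0 * (app.reusedLoc + app.generatedLoc) /
                     (app.newLoc + app.reusedLoc + app.generatedLoc));  // 84% vs. 20%
      printf("Generated code, reported separately:   %ld LOC\n", app.generatedLoc);
      return 0;
  }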

Other situations can also significantly impact software metric values. For example, one practice comes from groups performing maintenance. When making a new version of an application via changes to a large software baseline, the group might claim all of the software from that baseline as "reuse" in the subsequent version of the application. As with the "generated code" example, including baseline code will severely inflate the values of software metrics such as reuse, productivity, and quality.
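
A hypothetical maintenance release, with numbers invented only for illustration, shows how much the untouched baseline can distort the figure:

  #include <cstdio>

  int main() {
      // Hypothetical release: 2,000 LOC changed or added on top of a 200,000 LOC
      // baseline, with 400 of those 2,000 LOC taken unmodified from a reuse library.
      long baselineLoc = 200000;
      long releaseLoc  = 2000;
      long libraryLoc  = 400;

      // Claiming the untouched baseline as "reuse" in the new version:
      double withBaseline = 100.0 * (baselineLoc + libraryLoc) / (baselineLoc + releaseLoc);

      // Counting only the work actually performed for this release:
      double releaseOnly = 100.0 * libraryLoc / releaseLoc;

      printf("Reuse level with the baseline counted:    %.1f%%\n", withBaseline); // ~99.2%
      printf("Reuse level of this release's new work:   %.1f%%\n", releaseOnly);  // 20.0%
      return 0;
  }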

Another example comes in an "Open Systems Environment," where we often find the same application available for various hardware, operating systems, and flavors of the same operating system. Porting software to similar platforms usually requires little more than recompiling the source code for the new targets. However, some developers will report reuse levels of 99.9% for each "port," and claim a corresponding financial benefit from "reuse."

Reuse levels of this magnitude (85-99.9%) may not make it far without scrutiny. On the other hand, I have seen reuse metrics collected and combined in many interesting and creative ways. For example, take a low reuse level on the initial development and average it with a couple of 99% values from porting to arrive at a very plausible and impressive value of 65-85%. Of course, this involves the mathematically unsound practice of averaging percents.
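
A small example with hypothetical figures shows why the arithmetic misleads:

  #include <cstdio>

  int main() {
      // Hypothetical project: one initial development plus two "ports" that merely
      // recompile the same 100 KLOC code base for new platforms.
      double initialReusePct = 10.0;  // reuse level of the initial development
      double portReusePct    = 99.9;  // reuse level claimed for each recompiled port

      // Averaging the three percentages treats a recompile as if it were a full
      // development effort, and yields the plausible-looking 65-85% figure.
      double avgOfPercents = (initialReusePct + portReusePct + portReusePct) / 3.0;

      printf("Average of the reported percentages:    %.1f%%\n", avgOfPercents);   // ~69.9%
      printf("Reuse level of code actually developed: %.1f%%\n", initialReusePct); // 10.0%
      return 0;
  }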

Metric Values Can Vary for Many Reasons

It does not take a large influence to make reuse metric values vary. The previous examples illustrate factors that can have a major impact on metric values. However, I have also seen metric values vary significantly due to inconspicuous and seemingly innocuous factors. For example, what units do we use in our metrics? Most organizations use lines of code, which have many well-documented advantages and disadvantages. We could also use units such as function points or objects. If we use objects, how do we account for the wide variation in the size of objects? Do we make an allowance for the fact that the use or reuse of a small object counts exactly the same as the use or reuse of a large object? I have seen reuse levels vary by as much as a factor of 8 depending on whether we counted reuse by line of code or by object.
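
The counts below are hypothetical, chosen to reproduce a spread of that size; they show how the two units diverge when the reused objects happen to be small:

  #include <cstdio>

  int main() {
      // Hypothetical application built from 10 objects, 5 of them reused from a
      // library. The reused objects are small utility classes; the new objects
      // carry most of the application logic.
      int  reusedObjects = 5,   newObjects = 5;
      long reusedLoc     = 500, newLoc     = 7500;  // 100 LOC vs. 1,500 LOC per object

      double byObject = 100.0 * reusedObjects / (reusedObjects + newObjects);
      double byLoc    = 100.0 * reusedLoc / (reusedLoc + newLoc);

      printf("Reuse level counted by object: %.1f%%\n", byObject); // 50.0%
      printf("Reuse level counted by LOC:    %.2f%%\n", byLoc);    // 6.25%, a factor of 8 lower
      return 0;
  }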

This wide variation in metric values can come from a simple management decision or from a condition in our development environment. Obviously, we need to understand these factors and how they affect our metric values. Our metrics will have values only as good as the guidance we provide to our developers.

Remove the Ambiguity

We have seen how the choice of units can affect metric values, but the need to remove ambiguity in "what to count" extends beyond code. If we intend to reuse life-cycle products such as requirements, documentation, and designs, we also need to specify how to reuse these items and how to quantify them. What units do we use for these life-cycle phases? How do we count words, scripts, or graphics? Does a picture equal a thousand words? With the exception of the design phase, where some design environments clearly identify measurable "design objects," we do not have standard, agreed-upon units of measure.

Management needs to give specific directions when requesting metric data. Otherwise, developers will almost certainly use some latitude to interpret the request to their advantage. Developers do not necessarily manipulate the data, but they will present what they have done in the best possible light. For example, another of my personal favorite reuse stories involves the difference between "use" and "reuse." When a program needs to do the same thing many times, we naturally code that need into a procedure, function, method, subroutine, remote call, or similar language feature. When asked for reuse data, I have found that groups commonly exaggerate their reuse level by reporting every invocation of these routines as "reuse." Management should specifically state that multiple "uses" do not count as "reuse."

Just as we expect developers to create routines when they have to do the same thing many times, we expect them to copy and modify software when possible. This widespread practice, which we call "reengineering," routinely saves our developers significant implementation effort. However, how do we track the amount of modified software in each routine? In addition to the difficulties of measuring reengineered software, how does it compare to the disciplined use of unmodified software that we have designed and implemented for reuse? Note that a great deal of evidence shows that reengineering can actually cost more than new development. This penalty can occur if we modify as little as 20% of the code, and it almost certainly occurs if we modify more than half [e.g., Stutzke96]. Including modified software in reuse metrics without adjusting for these penalties will inflate the apparent benefits from "reuse." Altogether, these facts make a cogent argument for the design and reuse of unmodified software.
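
One way to keep modified software from inflating the figures is to discount it by an estimated relative cost of adaptation. The sketch below uses a purely hypothetical cost table; its shape merely reflects the penalties cited above and does not represent a validated model:

  #include <cstdio>

  // Hypothetical relative cost of adapting an existing component, as a fraction of
  // the cost of writing it from scratch. The exact values are assumptions; the shape
  // reflects the observation that heavy modification erases the savings.
  double relativeCostOfAdaptation(double fractionModified) {
      if (fractionModified <= 0.05) return 0.2;  // near-verbatim reuse
      if (fractionModified <= 0.20) return 0.6;  // light modification
      if (fractionModified <= 0.50) return 1.0;  // savings essentially gone
      return 1.2;                                // costs more than new development
  }

  int main() {
      long   componentLoc = 4000;
      double modified     = 0.30;  // 30% of the component was rewritten

      double costNew     = 1.0 * componentLoc;  // normalized cost of new development
      double costAdapted = relativeCostOfAdaptation(modified) * componentLoc;

      // Counting this component as 4,000 LOC of "reuse" overstates the benefit;
      // the discounted figure shows that no effort was actually avoided.
      printf("Effort avoided: %.0f LOC-equivalents\n", costNew - costAdapted); // 0
      return 0;
  }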

Programming Language Support

Language features provided by [insert your favorite programming language here] can make it easier to use software that someone originally wrote for use someplace else. Procedures, functions, generics, templates, and the like all provide very effective mechanisms for reuse [Biddle96]. However, these features do not guarantee reuse, nor does their use automatically mean that reuse has taken place. The following examples expose a common way of exaggerating reuse levels through the use of programming language features:

Traditional example: Programmer J. Average sits down to design and implement a couple of thousand lines of code for a new personnel roster and duty assignment system. J. Average decides to use the C language. Since many of the functions performed by officers, NCOs, and enlisted men do exactly the same thing, J. Average codes these functions as, well, C functions. The program calls these C functions to process data records for officers, NCOs, and enlisted men, and it takes J. Average 3 months to complete the program. When asked about reuse, J. Average responds, "I didn't do any; I wrote the whole program myself."

Object-Oriented example: Programmer J. Average sits down to design and implement a couple of thousand lines of code for a new personnel roster and duty assignment system. J. Average decides to use the C++ language. Since many of the functions performed by officers, NCOs, and enlisted men do exactly the same thing, J. Average codes these functions as C++ methods. Classes in the program inherit these methods to process data objects for officers, NCOs, and enlisted men, and it takes J. Average 3 months to complete the program. When asked about reuse, J. Average responds, "at least 60%, because I reused the same code over and over."

Sometimes we wonder why we do not always see the benefits promised by purveyors of a particular process, method, or tool. If both efforts took 3 months, what happened to the benefits of reuse in the second example? In fact, no reuse took place because both programmers did all the work themselves. The programmers just placed different labels on what they did; both used good software engineering techniques and the design mechanisms provided by their programming language.
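
A hypothetical fragment of J. Average's C++ version makes the point concrete: the inherited method is written, used, and maintained entirely within this one program, so counting each invocation as "reuse" measures nothing more than ordinary good design.

  #include <iostream>
  #include <string>
  #include <utility>

  // Sketch of J. Average's design: a method written once in a base class and
  // inherited by several subclasses in the same program.
  class Soldier {
  public:
      explicit Soldier(std::string name) : name_(std::move(name)) {}
      void assignDuty(const std::string &duty) {   // written once by J. Average
          std::cout << name_ << " assigned to " << duty << "\n";
      }
  private:
      std::string name_;
  };

  class Officer  : public Soldier { public: using Soldier::Soldier; };
  class NCO      : public Soldier { public: using Soldier::Soldier; };
  class Enlisted : public Soldier { public: using Soldier::Soldier; };

  int main() {
      Officer  o("CPT Smith");
      NCO      n("SGT Jones");
      Enlisted e("PVT Brown");

      // Three invocations of code written inside this same program: good software
      // engineering, and "use" of a shared method, but not reuse of anything
      // developed someplace else.
      o.assignDuty("staff duty");
      n.assignDuty("motor pool");
      e.assignDuty("guard duty");
      return 0;
  }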

What We Can Do

More often than not, reuse metrics should come with a warning label. Knowing to watch for the "reuse warning label" will help everyone better understand the true benefits of reuse and ultimately help them make the best possible business decisions involving their reuse investments. Reuse metrics make a very effective technology insertion tool, even if people will always try to find ways to make the numbers sing their praises. To help reuse and reuse metrics work, try the following:

  1. Have a management structure that supports reuse. Dedicate a team of programmers to the task of building and supporting code that all your teams can use. All developers can use this team as a resource during application design and implementation.
  2. Develop and enforce a software architecture for all your teams. Provide the resources early in your project for domain analysis and to develop a common software architecture.
  3. Define what you will measure as reuse and how you will measure it. Make sure everyone understands what you will count and why.
  4. Regularly track your reuse metrics. Use metrics as a technology insertion tool to improve all of your processes and methods.



Conclusions

We have all seen experience reports that reflect glowing results from reuse. Although reuse can lead to significant benefits, it does not automatically provide the panacea that some experience reports would have us believe [Pfleeger96]. Experience reports that do not mention where and how they obtained their data probably rely on testimonials and anecdotes rather than objective evidence. Armed with a little knowledge about how metric values can vary based on who does the counting, we can better understand reuse activities and the benefits of reuse to our organization.

Acknowledgments

I would like to thank Marilyn Gaska, Karen Holm, Allen Matheson, and Will Tracz for their encouragement and help during the preparation of this paper.

About the Author

Jeff Poulin works as a Senior Programmer and software architect with Lockheed Martin Federal Systems (formerly Loral Federal Systems and IBM Federal Systems Company) in Owego, NY. As a member of the Advanced Technology Group, he works on software reuse and architecture issues on a variety of software development efforts across Lockheed Martin. Active in numerous professional activities and conference committees, Dr. Poulin has over 40 publications, including a book on reuse metrics and economics recently published by Addison-Wesley. A Hertz Foundation Fellow, Dr. Poulin earned his bachelor's degree at the United States Military Academy at West Point and his master's and Ph.D. degrees at Rensselaer Polytechnic Institute in Troy, New York.

Dr. Jeffrey S. Poulin

Lockheed Martin Federal Systems

Mail Drop 0210

Owego, NY 13827

Voice: 607-751-6899

Fax: 607-751-6025

E-mail: Jeffrey.Poulin@lmco.com

Web: http://www.owego.com/~poulinj

References

[Biddle96] Biddle, Robert L. and Ewan D. Tempero, "Understanding the Impact of Language Features on Reusability," Fourth International Conference on Software Reuse, Orlando, FL, 23-26 April 1996, pp. 52-61.

[DISA95] DISA/SRI, "Software Reuse Benchmarking Study: Learning from Industry and Government Leaders," Defense Information Systems Agency Center for Software, Software Reuse Initiative, Document DCA100-93-D-0066, Delivery Order 0037, 10 November 1995.

[Pfleeger96] Pfleeger, Shari Lawrence, "Measuring Reuse: A Cautionary Tale," IEEE Software, Vol. 13, No. 4, July 1996, pp. 118-127.

[Poulin97] Poulin, Jeffrey S., Measuring Software Reuse: Principles, Practices, and Economic Models. Addison-Wesley (ISBN 0-201-63413-9), Reading, MA, 1997.

[Stutzke96] Stutzke, Richard D., "Software Estimating Technology: A Survey," Crosstalk: The Journal of Defense Software Engineering, Vol. 9, No. 5, May 1996, pp. 17-22.