Integrated Support for Software Reuse in Computer-Aided Software Engineering (CASE)
Jeffrey S. Poulin
Federal Systems Company
International Business Machines
Corporation
Abstract
The success and acceptance of reuse tools and libraries depend on
their integration into existing software development environments. However, the
addition of large libraries of reusable components to software design databases
only exacerbates the problem of design data management. Object-oriented
databases originated to meet the requirements of design data management that
relational databases could not satisfy. This paper describes a semantic data
model for an object-oriented database supporting an integrated Computer-Aided
Software Engineering (CASE) environment. The data model promotes reuse by
providing objects that match program design requirements to existing components
in the reuse library.[1]
KEYWORDS: Software reuse, Computer-Aided Software
Engineering, CASE, Semantic data modeling, Object-Oriented Database Systems.
To successfully insert reuse into the software development
process, we must integrate support for reuse into existing software tools and
CASE environments [15]. Even in current CASE systems, software developers
cannot conveniently search reuse libraries. A programmer often has to exit the
programming environment, use a limited set of tools to search for possible
reusable components, and then reenter the design environment. This causes the
programmer to not only break important lines of thought in problem solving but
also to make an additional investment in time and effort. This additional
effort often prevents reuse from taking place. Therefore, we believe that the
success of reuse technology depends on the integration of reuse libraries into
our design and programming environments [23].
Our work in CAD/CAM
systems for VLSI shows the importance of modeling design data in a way that
supports the developer [18]. We believe successful design systems for both CAD
and CASE require an underlying data model that matches the user’s conceptual
view of the problem. To this end we built on existing CAD/CAM modeling
techniques and created a data model for CASE that specifically supports
software reuse.
One attribute of software that makes it more difficult to produce
than hardware comes from the relatively abstract nature of problem solving.
Physical bounds do not constrain software components and designs as they do for
hardware. The “hardness” of packaging assists hardware designers not only by
providing physical limits (and challenges) but also by allowing them to see
what they build [7]. In contrast, we find it much more difficult to quantify
and describe the ideas and concepts that make up software. This complicates
software reuse because we do not generally have formal methods to specify
requirements for what we need nor do we have verifiable ways to catalog what we
have created.
The cross-lifecycle nature of reuse has enormous potential
benefits if integrated into a CASE system that guides development of software
across the entire lifecycle. Reuse of requirements and high-level design early
in the design process results in the reuse of subsequent lifecycle products
[14]. We emphasize the importance of
reuse early in the lifecycle by attempting to match developing requirements and
designs with reusable components in the reuse library. This paper describes
design data management issues and solution techniques. The paper then describes
semantic data modeling for CASE and an implementation built using our model.
Storing software design data causes difficulties due to the many
forms it may take. Not only do requirements, design, test cases, and code all
have different formats and potentially different media, various classifications
and descriptions may represent the information. Describing and defining the
abstract nature of the work products exceeds the capabilities of the
traditional relational model. Today’s workstation-based design systems must
store and manipulate data rich in semantics and programmer knowledge.
The success of the relational model in business and management
does not fully extend to design applications. The reason comes from the
restrictions caused by the simple and powerful method the relational model uses
to organize data into tables. Each item in the database must fit the scheme
established by the table columns; storage difficulties arise with items that do
not fit these well-defined columns. Furthermore, the relations do not explicitly
include semantics regarding the information contained in the tables. This
requires application programs to interpret and enforce all semantics. Because
real-world objects do not fit well into the relational model, we must make
artificial manipulations or extensions to accommodate the data. These
techniques can not only adversely affect efficiency and performance but can also
cause the loss of important semantic information [10].
Although many VLSI and CAD/CAM systems use relational databases,
many researchers recognize the inadequacy of the relational model for
representing and storing design data [9]. To meet the need for a powerful and
straightforward representation of design objects, complex objects and
object-oriented databases emerged. Complex objects consist of hierarchical
groups of tuples starting with a root tuple that represents the design object,
and sets of dependent tuples that define the object. Complex objects can
succinctly represent the recursive, non-disjoint objects that the relational
model cannot easily handle [12], [13]. Design data commonly includes recursive
data because real world objects often consist of smaller objects that retain
many of the characteristics of the entire object. The following examples show
how we can represent data modeling abstractions [24] using a tuple format:
1. Aggregation Abstractions: require all tuple members. For example, a Point
has both X and Y coordinates.
Point: (X: real, Y: real)
2. Generalization Abstractions: require at most one tuple member. For example,
we can declare Numbers of type INTEGER or of type REAL.
Number: (INTEGER: integer + REAL: real)
3. Aggregation with Association Abstractions: require all tuple members, zero
or more times. For example, a Polygon has any number (0-N) of (X,Y) coordinate
pairs.
Polygon: (X: real, Y: real)(*)
4. Generalization with Association Abstractions: require at most one tuple
member, zero or more times. For example, a Number collection may have 0-N
numbers, each of either type INTEGER or of type REAL.
Number_collection: (INTEGER: integer + REAL: real)(*)
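To make the four abstractions concrete, the following Python sketch restates them with standard type annotations. It only illustrates the tuple notation above and is not the storage format of any particular database; the helper is_number_collection is a hypothetical name introduced here.

```python
from dataclasses import dataclass
from typing import List, Union


# 1. Aggregation: every member of the tuple is required.
@dataclass
class Point:
    x: float
    y: float


# 2. Generalization: a value takes one of the alternative member types.
Number = Union[int, float]

# 3. Aggregation with association: zero or more complete (X, Y) tuples.
Polygon = List[Point]

# 4. Generalization with association: zero or more values,
#    each of which is either an INTEGER or a REAL.
NumberCollection = List[Number]


def is_number_collection(values) -> bool:
    """Return True if every element is an int or a float (bool is excluded)."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


triangle: Polygon = [Point(0.0, 0.0), Point(1.0, 0.0), Point(0.0, 1.0)]
mixed: NumberCollection = [1, 2.5, 3]
```

An empty list is a valid Polygon or Number collection here, which corresponds to the "zero or more times" reading of the (*) suffix.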
The molecular object model serves as an important example of a
complex object based model. Developed for VLSI CAD systems, the model defines
two distinct parts: an interface and an implementation [2], [5]. The interface
of the object consists of all connections to the outside world and defines how
other objects use and access the object. The implementation of the object
defines how the object does its job. An interface may have several
implementations. In VLSI, a molecular representation of a circuit would have an
interface consisting of a list of pins and an implementation consisting of sub
circuits and the wires that interconnect them.
When using the molecular model, a designer may refer to an
interface without specifying an implementation for the interface. If the
designer chooses not to specify, or bind, an implementation to the interface,
we refer to the un-bound interface as a socket in the design of the circuit.
The designer must specify an implementation for the interface by plugging the
socket before he can complete the design. The plugged socket results in an
instance of the sub circuit implementation in the design of the circuit.
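The socket-and-plug behavior described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the molecular model's actual implementation; the class names (CircuitInterface, CircuitImplementation, Socket) and the function is_complete are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CircuitInterface:
    """How the outside world sees a circuit: a name and a pin list."""
    name: str
    pins: List[str]


@dataclass
class CircuitImplementation:
    """One way of realizing an interface from sub circuits and wires."""
    interface: CircuitInterface
    sockets: List["Socket"] = field(default_factory=list)
    wires: List[tuple] = field(default_factory=list)


@dataclass
class Socket:
    """A reference to an interface that may not yet be bound."""
    interface: CircuitInterface
    plug: Optional[CircuitImplementation] = None

    def plug_in(self, implementation: CircuitImplementation) -> None:
        # Only an implementation of this socket's interface may be plugged in.
        if implementation.interface is not self.interface:
            raise ValueError("implementation does not match the socket interface")
        self.plug = implementation


def is_complete(implementation: CircuitImplementation) -> bool:
    """A design is complete once every socket, recursively, is plugged."""
    return all(
        s.plug is not None and is_complete(s.plug) for s in implementation.sockets
    )
```

A design with any unplugged socket reports is_complete as False, which mirrors the rule that the designer must plug every socket before completing the design.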
The molecular object model
conceptually matches the way VLSI designers create circuits. VLSI designers
create circuits by recursively decomposing the circuit into smaller sub
circuits. When implementations for the sub circuits exist, they get wired into
the design without alteration. When they do not exist, the designer must
implement the sub circuit using basic gates and wires. The molecular object
model succeeds in VLSI CAD systems because the semantics of the molecular model
match the VLSI design process.
Data models for CASE have
many of the same requirements as those for VLSI CAD and other CAD/CAM data
models [17], [21], [25]. We believe the key to storing CASE design data lies in
the underlying semantic data model used to represent the information. The more
closely the model maps to the users’ conceptual view of the program, the better
it will support their needs. Furthermore, a data model that provides a single
representation of the design data uses space more efficiently, reduces the
complexity of the database, and improves data integrity by guaranteeing that
multiple copies of the same information do not become inconsistent.
We base our work with CASE data models on that of CAD/CAM data
modeling because of our previous work in VLSI CAD systems. We also recognize
the natural correspondence between the molecular view of VLSI objects and the
object-based view of software modules. For example, inputs and outputs in the
VLSI interface consist of a pin-list, which has an analogy in the
parameter-list of the software interface. Furthermore, the VLSI implementation
consists of gates, sub circuits, and wires, which have an analogy in the data
declarations, subprograms, and program statements of the software
implementation.
However, molecular
concepts and definitions do not fully map onto those for software. First, the
concept of instantiation has a different meaning in programming than it does in
VLSI. In VLSI, instances of components refer to copies of the component.
Although the database records only one definition of the component, in the
final hardware product every instance of the component in the design results in
a copy of that circuit in silicon. The software concept of instantiation has a
different meaning. In the final software product, just as in software design,
only one copy of the component exists.[2]
Calls to subprograms take the form of references to software services.
Normally, copies of that service do not exist until execution of the program.
We do not find this view of instantiation, which implies the creation of
multiple copies of the design object, appropriate for software.
In the module specification:
Procedure Sort (Var:Variable1: Array[min..max] of integer, Variable2: Boolean);
In the module design:
Sort(Integer_array, Error_flag);
Figure 1. The two roles of the software interface.
As shown in Figure 1, the second change to the VLSI molecular
model involves the dual role of the software interface. The interface found in
the software specification (e.g., declaration) and the interfaces found in the
software requirements (e.g., design) have different functions. In the first
case, the interface represents a reusable module. Because the interface
represents possibly multiple implementations and versions of the module, we
must limit how the user can modify the interface. The interface in the design,
however, represents a request for service. It differs from the interface in the
specification because it evolves with the design. During early phases of the
design, the service request may not represent any actual software module. But
as the design develops, the specification evolves into either:
· a new module, or
· a call to an existing reusable module.
Ideally, the designer
fills the software specification with an existing module. Operations on the object
should guide designers toward satisfying their needs with existing reusable
software.
Our model extends the two-part molecular model to fully support
the unique requirements of the software design process. We do this by providing
a form of meta-interface for the designer to use while he develops the software
requirements. We call this new object type a Call and refer to the resulting
three-part model as the Interactive Development Model (IDM).
The IDM consists of Interface,
Alternative, and Call objects. The Interface portion of the IDM gives a
specification of the module behavior, thereby serving the declarative role of
the interface in the VLSI molecular model. The Alternative portion of the IDM
describes the design, code, and implementation details of the software module.
The Call serves to abstractly represent software requirements. As shown in
Figure 2, a reusable software library consists of a generalization
with association abstraction of the three IDM objects where each IDM object
serves as a base type in the database. The following paragraphs describe each
of the three IDM objects in more detail.
Reuse_Library: (Call: call + Interface: interface + Alternative: alternative)(*)
Figure 2. Reusable software library data model
The Interface object represents all implementations for the
module. It consists of four major components: Header, Classifiers, Parameters,
and Alternatives. The Header component contains the administrative information
about the Interface; e.g., the designer name, date of creation, and owning
organization. The Classifiers component contains information describing the
Interface. The classification information describes the function and purpose of
the service that the Interface represents. In part, the Classifiers contain an
aggregation with association abstraction of descriptive keywords, but this can
vary depending on local requirements. The Parameters component also consists of
an aggregation with association abstraction containing all data input and
output by the Interface. The final component, Alternatives, contains an
aggregation with association abstraction of all valid Alternatives existing to
implement the Interface.
Interface: (Header: header_info, Classifiers: classifier_info, Parameters: param_list, Alternatives: alt_list)
Figure 3. The IDM Interface
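As one possible reading of Figure 3, the Interface object could be written as a Python record like the one below. The field types, the direction encoding for parameters, and all sample values (designer, date, organization, alternative names) are placeholders assumed for illustration; only the four component names come from the model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Header:
    """Administrative information about the object."""
    designer: str
    created: str        # creation date, kept as plain text in this sketch
    organization: str


@dataclass
class Parameter:
    name: str
    type_name: str
    direction: str      # "in", "out", or "inout" (hypothetical encoding)


@dataclass
class Interface:
    header: Header
    # Each list below is an aggregation with association: zero or more entries.
    classifiers: List[str] = field(default_factory=list)    # descriptive keywords
    parameters: List[Parameter] = field(default_factory=list)
    alternatives: List[str] = field(default_factory=list)   # names of valid Alternatives


# Placeholder data modeled on the Sort procedure of Figure 1.
sort_interface = Interface(
    header=Header("J. Doe", "1993-01-01", "Example Org"),
    classifiers=["sort", "array", "integer"],
    parameters=[
        Parameter("Variable1", "Array[min..max] of integer", "inout"),
        Parameter("Variable2", "Boolean", "out"),
    ],
    alternatives=["quick_sort", "insertion_sort"],
)
```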
The Alternative represents an implementation of an Interface; an
Interface may have several implementation Alternatives. The Alternative object
consists of six major components; the first two components closely match those
of the Interface. The Header contains administrative information about the
Alternative, and Classifiers contains descriptive information about the function
and purpose of the Alternative. The Declarations component consists of an
aggregation with association abstraction of required local variables or
variables in the functional scope of the Alternative. Although the same kind of
abstraction, Performance attributes differs from Classifiers in that it
contains information about externally visible properties of the Alternative,
such as the space or time complexity for a given input. The Body of the
Alternative consists of a generalization with association abstraction of
requirements represented by Call objects. We use a generalization abstraction
rather than aggregation to allow for program constructs such as iteration and
conditionals. By identifying these program constructs, we can map the design
into a pseudocode for use during the coding phase of development. Finally, the
Versions component manages the configuration and version control by linking
descendent versions of the Alternative.
Alternative: (Header: header_info, Classifiers: classifier_info, Declarations: declare_list, Performance_attributes: performance_data, Versions: version_list, Body: call_list)
Figure 4. The IDM Alternative
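The role of the generalization in the Body can be illustrated with a short Python sketch in which each Body entry is a call, a loop, or a conditional, and a recursive walk emits pseudocode. The class names (CallRef, Loop, Conditional), the function to_pseudocode, and the pseudocode keywords are assumptions made for this example; the model itself does not prescribe them.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class CallRef:
    """A request for a software service, with its argument names."""
    service: str
    arguments: List[str] = field(default_factory=list)


@dataclass
class Loop:
    condition: str
    body: List["BodyItem"] = field(default_factory=list)


@dataclass
class Conditional:
    condition: str
    then_body: List["BodyItem"] = field(default_factory=list)
    else_body: List["BodyItem"] = field(default_factory=list)


# Generalization: each Body entry is exactly one of these kinds.
BodyItem = Union[CallRef, Loop, Conditional]


def to_pseudocode(body: List[BodyItem], indent: int = 0) -> List[str]:
    """Walk the Body recursively and return indented pseudocode lines."""
    pad = "  " * indent
    lines: List[str] = []
    for item in body:
        if isinstance(item, CallRef):
            lines.append(f"{pad}CALL {item.service}({', '.join(item.arguments)})")
        elif isinstance(item, Loop):
            lines.append(f"{pad}WHILE {item.condition} DO")
            lines.extend(to_pseudocode(item.body, indent + 1))
            lines.append(f"{pad}END WHILE")
        elif isinstance(item, Conditional):
            lines.append(f"{pad}IF {item.condition} THEN")
            lines.extend(to_pseudocode(item.then_body, indent + 1))
            if item.else_body:
                lines.append(f"{pad}ELSE")
                lines.extend(to_pseudocode(item.else_body, indent + 1))
            lines.append(f"{pad}END IF")
        else:
            raise TypeError(f"unsupported Body item: {item!r}")
    return lines


example_body = [
    CallRef("Read_input", ["Integer_array"]),
    Conditional("not Error_flag", [CallRef("Sort", ["Integer_array", "Error_flag"])]),
]
```

A plain aggregation could only list calls in sequence; allowing each entry to be one of several kinds is what lets the Body carry iteration and conditionals.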
The Call represents requirements for a software service. As work
progresses, the designer creates a more detailed description of the required
service using the Call object. At a high level, requirements engineers and high
level designers create and form the software service descriptions. Component
level designers continue to evolve the design by further modifying and binding
the Call objects with Interfaces. To complete implementation, coders either
bind or develop Alternatives for the Interfaces already bound to the Call
objects.
The Call contains a Header and a Classifier component similar to
the Interface and Alternative objects. The Call also contains a parameter list
for anticipated input and output variables and a list of Performance
constraints. Both components consist of an aggregation with association
abstraction.
In general, the classification problem has a recurrent solution
structure [6]. A collection of data, generated from several sources, gets
interpreted as a predefined pattern. The search proceeds by mapping the
recognized pattern into a set of possible solutions from which the designer
selects the most appropriate result for the given case. In the IDM, the search
engine matches developing software requirements represented by the Call to
existing reusable Interfaces and Alternatives using the components contained
within the respective objects. The Classifier component in the Call represents
the need; the Classifier component in the other two objects indicates what can
meet the need. The search engine also matches the anticipated Parameters in the
Call to existing Parameters in available Interfaces to complete the search. Finally,
the search concludes when the Performance attributes identified in the
Alternative object satisfy the performance constraints identified in the Call
object. Once the program designer examines and accepts an Interface and
Alternative, the designer binds the selected objects to the Call using the
simple aggregation abstractions provided by Bound interface and Bound
alternative.
Call: (Header: header_info, Classifiers: classifier_info, Parameters: param_list, Performance_constraints: performance_data, | Bound_interface: Interface, Bound_alternative: Alternative)
Figure 5. The IDM Call
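The three matching steps described above (Classifiers, then Parameters, then Performance) might be sketched in Python as follows. The matching rules here are simplifying assumptions of ours, not the prototype's search engine: classifiers match by keyword subset, parameters match as an unordered list of type names, and every performance constraint is treated as an upper bound. The names search, satisfies, and bind are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Alternative:
    name: str
    performance: Dict[str, float] = field(default_factory=dict)


@dataclass
class Interface:
    name: str
    classifiers: List[str]
    parameter_types: List[str]
    alternatives: List[Alternative] = field(default_factory=list)


@dataclass
class Call:
    classifiers: List[str]
    parameter_types: List[str]
    performance_constraints: Dict[str, float] = field(default_factory=dict)
    bound_interface: Optional[Interface] = None
    bound_alternative: Optional[Alternative] = None


def satisfies(alt: Alternative, constraints: Dict[str, float]) -> bool:
    """Assumed rule: each constrained attribute must be known and within its bound."""
    return all(
        key in alt.performance and alt.performance[key] <= limit
        for key, limit in constraints.items()
    )


def search(call: Call, library: List[Interface]) -> List[Tuple[Interface, Alternative]]:
    """Return candidate (Interface, Alternative) pairs for the designer to examine."""
    candidates = []
    for iface in library:
        # Step 1: the Call's keywords must all appear among the Interface's.
        if not set(call.classifiers) <= set(iface.classifiers):
            continue
        # Step 2: anticipated parameter types must match the Interface's.
        if sorted(call.parameter_types) != sorted(iface.parameter_types):
            continue
        # Step 3: keep Alternatives that satisfy the performance constraints.
        for alt in iface.alternatives:
            if satisfies(alt, call.performance_constraints):
                candidates.append((iface, alt))
    return candidates


def bind(call: Call, iface: Interface, alt: Optional[Alternative] = None) -> None:
    """Record the designer's accepted choice on the Call."""
    call.bound_interface = iface
    call.bound_alternative = alt
```

The search only proposes candidates; binding remains a separate step because the designer examines and accepts the objects before they are attached to the Call.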
A prototype developed in the ROSE (Relational Object System for
Engineering) Database currently exists on DEC VAXstations. The object-oriented
ROSE data model maps closely to the IDM and provides an excellent way to
leverage our knowledge from CAD data management into CASE.
ROSE consists of an integrated experimental database system,
graphics, and user interface toolkit for developing CAD applications. The ROSE
design gives fast object access by managing a cache of logical data clusters as
physical objects. ROSE provides access to the database through a combination of
powerful control structures based on the ‘C’ programming language and database
commands extended from relational algebra [8].
We can represent ROSE complex object data models in several ways.
AND/OR trees [16] give an expressive representation of all abstractions when
the user desires a schematic of the data model. Each node in the AND/OR tree serves
to define the domain of an object or one of its sub-objects. Alternately, we
can use the LISP-like tuple format to describe ROSE data structures. The LISP
notation has an advantage in that it maps closely to the internal storage
tuples used by the ROSE object manager.
The complete ROSE CASE system includes several graphical editors
for design data input and a full graphical user interface. Each editor presents
the user with a view of the object base; because one underlying data model
supports all views, a change in one view automatically results in all applicable
changes in the other views. The tool supports structured design, program structure, and
Input-Process-Output (IPO) design methods and views [3]. The tools also allow
multiple levels of abstraction for viewing each representation of the design.
To support software reuse we classify design objects using facets
[20] and free-text keywords [19]. The ROSE CASE tool maps requirements
specified by Call objects to Interfaces and Implementations using these classifiers.
As the designer develops the requirements represented by the Call object, the
search engine tries to identify existing Interface and Implementation objects
matching the specification. The tool presents candidate reusable objects to the
designer who may retrieve and examine them for possible use. When designers
choose a suitable object from the candidate list, they bind it to the Call
object. The designer may bind both an Interface and Implementation to the Call,
or only bind an Interface. In this case, the designer leaves development of a
suitable implementation for the coding phase. The designer completes the design
by either binding all Call objects or leaving them unbound for later
development. As shown in Figure 6, a program design consists of a collection of
(either bound or unbound) Call objects.
Design: (Call: call)(*)
Figure 6. IDM program design representation
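A design as a collection of Call objects in different binding states could be tracked as in the following Python sketch. The enumeration BindingState, the function remaining_work, and the use of plain strings for bound objects are assumptions made to keep the example small.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class BindingState(Enum):
    UNBOUND = "unbound"
    INTERFACE_ONLY = "interface only"
    FULLY_BOUND = "interface and implementation"


@dataclass
class Call:
    name: str
    bound_interface: Optional[str] = None
    bound_alternative: Optional[str] = None

    def __post_init__(self) -> None:
        # An implementation can only be bound together with its Interface.
        if self.bound_alternative is not None and self.bound_interface is None:
            raise ValueError("cannot bind an implementation without an Interface")

    def state(self) -> BindingState:
        if self.bound_interface is None:
            return BindingState.UNBOUND
        if self.bound_alternative is None:
            return BindingState.INTERFACE_ONLY
        return BindingState.FULLY_BOUND


def remaining_work(design: List[Call]) -> List[str]:
    """Names of Calls that still need development in a later phase."""
    return [c.name for c in design if c.state() is not BindingState.FULLY_BOUND]


# A design is simply zero or more Calls, bound or unbound.
design = [
    Call("Read_input"),
    Call("Sort", "Sort_interface"),
    Call("Print", "Print_interface", "print_v1"),
]
```

In this example remaining_work(design) lists Read_input and Sort: the first has no Interface yet, and the second leaves its implementation for the coding phase.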
The ARIES system uses a single underlying representation to store
and present requirements knowledge [11]. ARIES composes system descriptions
from the following basic units: types, relations, events, and invariants. The
system descriptions take the form of a collection of objects, each of which
represents some element of the system. However, although ARIES addresses reuse
of requirements knowledge, ARIES focuses on presentation, reasoning, and
evolution of the knowledge.
The CARE (Computer Aided Reuse Engineering) system at the
University of Maryland supports a process model for extracting candidates for
reusable components from existing software [1]. CARE has two main parts, the
component identifier and the component qualifier, and supports the derivation
of program specifications and the verification of whether or not the programs
meet those specifications.
Techniques for classifying and identifying candidate reusable
components include use of polymorphic types [22]. Polymorphic types classify
both defined components in a library and contexts of free variables in
partially written programs. A system using polymorphic types may help
programmers make better use of software libraries by implementing a retrieval
system that matches the types representing software requirements to those
defining existing reusable components.
We intend to extend the IDM by encapsulating retrieval algorithms
and expert system strategies into the Call object. A C++ implementation allows
us to evaluate the feasibility of alternative reusable components through
message passing and member functions. Upon receipt of a requirements message,
the Call object invokes the necessary methods to unify the requirement
specification with available Alternatives.
To support interoperability with other CASE systems and reuse
libraries, we will extend the current database to record required information
about member libraries and organizations. The database will contain library
structure, physical location, access method, validation data, and other
information required for interoperability. When the Call object fails to bind
with a suitable local reusable interface and implementation, the object invokes
a binding method against the database. The object then formats and initiates a
query for associated libraries using the appropriate inference rules, query,
and access method. When the object receives the query result, it evaluates
binding options using the same rules as for local reusable objects.
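The planned fallback from the local library to member libraries could look roughly like the Python sketch below. It is speculative, since this extension is future work: the catalog format, the keyword-subset matching rule, and the names make_lookup and resolve are all assumptions, and a real system would also consult the recorded access method and validation data.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A lookup takes the required keywords and returns a component name or None.
Lookup = Callable[[List[str]], Optional[str]]


def make_lookup(catalog: Dict[str, List[str]]) -> Lookup:
    """Build a lookup over a catalog mapping component names to keywords."""
    def lookup(keywords: List[str]) -> Optional[str]:
        for name, classifiers in catalog.items():
            # The same matching rule applies to local and remote libraries.
            if set(keywords) <= set(classifiers):
                return name
        return None
    return lookup


def resolve(
    keywords: List[str], local: Lookup, remotes: Dict[str, Lookup]
) -> Optional[Tuple[str, str]]:
    """Try the local library first, then each member library in order."""
    hit = local(keywords)
    if hit is not None:
        return ("local", hit)
    for library_name, lookup in remotes.items():
        hit = lookup(keywords)
        if hit is not None:
            return (library_name, hit)
    return None  # the requirement stays unbound for later development


local_library = make_lookup({"Sort": ["sort", "array"]})
member_libraries = {
    "site_b": make_lookup({"Matrix_invert": ["matrix", "invert"]}),
}
```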
The database can also provide useful information about composite
programs resulting from cross-library reuse. For example, the product test
group can determine statistical reliability based on library quality data and
can identify modules without proven histories or test results. This information
can help predict maintenance costs and resource allocation.
By dividing the molecular interface into a requirements object and
a definition object, the IDM permits a high level of flexibility during the
design process. Since molecular interfaces define modules, the designer cannot
modify an interface without affecting existing implementations or without
creating a new object. However, Call objects represent requirement
specifications and may adapt interactively to the dynamic needs of the
designer.
The IDM provides support for all stages of the software
engineering lifecycle. The IDM supports high-level design and product
maintenance by storing requirements with the code. At component level design
the IDM reflects both the control flow and declaration structure of the
program. At the implementation level, the language-independent pseudocode
representation directly maps to the source code constructs required to
implement the product.
The IDM model supports software designers by mirroring the process
of software development. The data model promotes reuse by creating Call objects
that represent program design requirements and by matching the requirements to
existing Interface and Alternative components in the reuse library. This not
only makes a CASE system that implements the model conceptually easy to use,
but by integrating support for reuse into the development environment makes
reuse a natural result of software design.
[1] Abd-El-Hafiz, S.K., V.R. Basili, and G. Caldiera, “Towards
Automated Support for Extraction of Reusable Components,” Proceedings of the
Conference on Software Maintenance, Sorrento, Italy, 15-17 October, 1991,
pp. 212-219.
[2] Batory, D.S. and Won Kim, “Modeling Concepts for VLSI CAD
Objects,” ACM Transactions on Database Systems, Vol. 10, No. 3,
September 1985, pp. 322-346.
[3] Bergland, G.D., “A Guided Tour of Program Design
Methodologies,” in IEEE Tutorial on Software Quality Assurance, ed. Tsun
S. Chow, IEEE Computer Society Press, Silver Spring, Maryland, 1985, pp.
219-243.
[4] Bourland, D. David and Paul Dennithorne Johnston, ed., To
Be or Not: An E-Prime Anthology, International Society for General
Semantics, San Francisco, CA, 1991.
[5] Buchmann, Alejandro P. and Concepcion Perez de Celis, “An
Architecture and Data Model for CAD Databases,” Proceedings of the 11th
International Conference on Very Large Databases, Stockholm, 1985, pp.
105-114.
[6] Clancey, William J., “Classification Problem Solving,” Proceedings
3rd National Conference on Artificial Intelligence (AAAI), August 1984.
[7] Grady, Robert B., “Work-Product Analysis: The Philosopher’s
Stone of Software,” IEEE Software, March 1990, pp. 27-34.
[8] Hardwick, Martin, “Why ROSE is Fast: Five Optimizations in the
Design of an Experimental Database System for CAD/CAM Applications,” Proceedings
of ACM SIGMOD, San Francisco, CA, May 1987, pp. 292-298.
[9] Heiler, Sandra, Umeshwar Dayal, Jack Orenstein, and Susan
Radke-Sproull, “An Object-Oriented Approach to Data Management: Why Design
Databases Need It,” Proceedings of the 24th Design Automation Conference,
Las Vegas, Nevada, 1987, pp. 335-340.
[10] Hurson, A.R., Simin H. Pakzad, and Jia-bing Cheng,
“Object-Oriented Database Management Systems: Evolution and Performance
Issues,” IEEE Computer, February 1993, pp. 48-60.
[11] Johnson, W. Lewis, Martin S. Feather, and David R. Harris,
“Representation and Presentation of Requirements Knowledge,” IEEE Transactions
on Software Engineering, Vol. 18, No. 10, October 1992, pp. 853-869.
[12] Kim, Won, Hong-Tai Chou and Jay Banerjee, “Operations and
Implementation of Complex Objects,” Proceedings of the 3rd International Conference
on Data Engineering, Los Angeles, CA, 1987, pp. 626-633.
[13] Lorie, Raymond and Wilfred Plouffe, “Complex Objects and
Their Use in Design Transactions,” Proceedings of the Annual Meeting of
Engineering Design Applications, San Jose, CA, May 1983, pp. 115-121.
[14] Lubars, Mitch D, “Reusing Designs for Rapid Application
Development,” Proceedings of the International Conference on Communications,
Denver, CO, 23-26 June 1991, pp. 1515-1519.
[15] Matsumoto, Masao, “Automatic Software Reuse Process in
Integrated CASE Environment,” IEICE Transactions on Information Systems,
Vol. E75-D, No. 5, September 1992, pp. 657-73.
[16] McLeod, D., et al., “An Approach to Information Management
for CAD/VLSI Applications,” Proceedings of ACM Database Week, SIGMOD
Conference, San Jose, CA, May 1983.
[17] Onuegbe, Emmanuel O., “Database Management System
Requirements for Software Engineering Environments,” Proceedings of the 3rd
International Conference on Data Engineering, Los Angeles, CA, 1987, pp.
501-509.
[18] Poulin, Jeffrey S. and Martin Hardwick, “Adapting
Object-Oriented CAD Database Concepts for Computer Aided Software Engineering,”
Proceedings of the International Symposium on Database Systems for Advanced
Applications, Seoul, Korea, April 1989, pp. 201-208.
[19] Poulin, Jeffrey S., and Kathryn P. Yglesias, “Experiences
with a Faceted Classification Scheme in a Large Reusable Software Library
(RSL),” to appear, Seventeenth Annual International Computer Software and
Applications Conference (COMPSAC), Phoenix, AZ, 3-5 November 1993.
[20] Prieto-Diaz, Ruben, and Peter Freeman, “Classifying Software
for Reusability,” IEEE Software, Jan. 1987, pp. 6-16.
[21] Roman, Gruia-Catalin, “Data Engineering in Software
Development Environments,” Proceedings of the 3rd International Conference
on Data Engineering, Los Angeles, CA, 1987, pp. 85-86.
[22] Runciman, C. and I. Toyn, “Retrieving reusable software
components by polymorphic type,” Journal of Functional Programming,
Vol. 1, pt. 2, April 1991, pp. 191-211.
[23] Shriver, Bruce D., “Reuse Revisited,” IEEE Software,
Jan. 1987, p. 5.
[24] Smith, J. and D. Smith, “Data Abstractions: Aggregation and
Generalization,” ACM Transactions on Database Systems, Vol. 3, No. 3,
1977, pp. 105-133.
[25] Yau, Stephen S., “Relationship Between Data Engineering and
Software Engineering,” Proceedings of the 3rd IEEE International Conference
on Data Engineering, Los Angeles, CA, 1987, p. 84.
JEFFREY S. POULIN (poulinj@vnet.ibm.com). IBM Federal Systems
Company, MD 0220, Owego, New York, 13827. Dr. Poulin works with the IBM FSC
Open Systems Development group where he conducts applied research on software
reuse and leads the Integrated Software Development Environment team for the
Army Sustaining Base Information Systems (SBIS) project. He participates in the
IBM Corporate Reuse Council, the Association for Computing Machinery, and the
IEEE Computer Society. A Hertz Foundation Fellow, Dr. Poulin earned his
Bachelors degree at the United States Military Academy at West Point and his
Masters and Ph.D. degrees at Rensselaer Polytechnic Institute in Troy, New
York.
[2] We consider macros as an implementation technique used for efficiency. Although multiple copies of macros may exist (just as in VLSI) the software designer treats them in concept like other dynamically expanded software services.