Integrated Support for Software Reuse in Computer-Aided Software Engineering (CASE)

Jeffrey S. Poulin

Federal Systems Company

International Business Machines Corporation

Abstract

The success and acceptance of reuse tools and libraries depend on their integration into existing software development environments. However, the addition of large libraries of reusable components to software design databases only exacerbates the problem of design data management. Object-oriented databases originated to meet the requirements of design data management that relational databases could not satisfy. This paper describes a semantic data model for an object-oriented database supporting an integrated Computer-Aided Software Engineering (CASE) environment. The data model promotes reuse by providing objects that match program design requirements to existing components in the reuse library.[1]

KEYWORDS: Software reuse, Computer-Aided Software Engineering, CASE, Semantic data modeling, Object-Oriented Database Systems.

1 Overview

To successfully insert reuse into the software development process, we must integrate support for reuse into existing software tools and CASE environments [15]. Even in current CASE systems, software developers cannot conveniently search reuse libraries. A programmer often has to exit the programming environment, use a limited set of tools to search for possible reusable components, and then reenter the design environment. This causes the programmer to not only break important lines of thought in problem solving but also to make an additional investment in time and effort. This additional effort often prevents reuse from taking place. Therefore, we believe that the success of reuse technology depends on the integration of reuse libraries into our design and programming environments [23].

Our work in CAD/CAM systems for VLSI shows the importance of modeling design data in a way that supports the developer [18]. We believe successful design systems for both CAD and CASE require an underlying data model that matches the user’s conceptual view of the problem. To this end we built on existing CAD/CAM modeling techniques and created a data model for CASE that specifically supports software reuse.

One attribute of software that makes it more difficult to produce than hardware comes from the relatively abstract nature of problem solving. Physical bounds do not constrain software components and designs as they do for hardware. The “hardness” of packaging assists hardware designers not only by providing physical limits (and challenges) but also by allowing them to see what they build [7]. In contrast, we find it much more difficult to quantify and describe the ideas and concepts that make up software. This complicates software reuse because we do not generally have formal methods to specify requirements for what we need nor do we have verifiable ways to catalog what we have created.

The cross-lifecycle nature of reuse has enormous potential benefits if integrated into a CASE system that guides development of software across the entire lifecycle. Reuse of requirements and high-level design early in the design process results in the reuse of subsequent lifecycle products [14]. We emphasize the importance of reuse early in the lifecycle by attempting to match developing requirements and designs with reusable components in the reuse library. This paper describes design data management issues and solution techniques. The paper then describes semantic data modeling for CASE and an implementation built using our model.

2 Data Management in CAD and CASE

Storing software design data causes difficulties due to the many forms it may take. Not only do requirements, design, test cases, and code all have different formats and potentially different media, but various classifications and descriptions may also represent the information. Describing and defining the abstract nature of the work products exceeds the capabilities of the traditional relational model. Today’s workstation-based design systems must store and manipulate data rich in semantics and programmer knowledge.

2.1 Limitations of the Relational Model

The success of the relational model in business and management does not fully extend to design applications. The reason comes from the restrictions caused by the simple and powerful method the relational model uses to organize data into tables. Each item in the database must fit the scheme established by the table columns; storage difficulties arise with items that do not fit these well-defined columns. Furthermore, the relations do not explicitly include semantics regarding the information contained in the tables. This requires application programs to interpret and enforce all semantics. Because real-world objects do not fit well into the relational model, we must make artificial manipulations or extensions to accommodate the data. These techniques can not only adversely affect efficiency and performance but can also cause the loss of important semantic information [10].

2.2 Data Models for CAD/CAM

Although many VLSI and CAD/CAM systems use relational databases, many researchers recognize the inadequacy of the relational model for representing and storing design data [9]. To meet the need for a powerful and straightforward representation of design objects, complex objects and object-oriented databases emerged. Complex objects consist of hierarchical groups of tuples starting with a root tuple that represents the design object, and sets of dependent tuples that define the object. Complex objects can succinctly represent the recursive, non-disjoint objects that the relational model cannot easily handle [12], [13]. Design data commonly includes recursive data because real world objects often consist of smaller objects that retain many of the characteristics of the entire object. The following examples show how we can represent data modeling abstractions [24] using a tuple format:

1. Aggregation Abstraction: requires all tuple members. For example, a Point has both X and Y coordinates.

Point: (X: real, Y: real)

2. Generalization Abstraction: requires at most one tuple member. For example, we can declare Numbers of type INTEGER or of type REAL.

Number: (INTEGER: integer + REAL: real)

3. Aggregation with Association Abstraction: requires all tuple members, zero or more times. For example, a Polygon has any number (0-N) of (X,Y) coordinate pairs.

Polygon: (X: real, Y: real)(*)

4. Generalization with Association Abstraction: requires at most one tuple member, zero or more times. For example, a Number collection may have 0-N numbers, each of either type INTEGER or of type REAL.

Number_collection: (INTEGER: integer + REAL: real)(*)
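The four abstractions above map onto ordinary type declarations. The following minimal Python sketch shows one possible encoding; the choice of Python and the names `describe`, `triangle`, and `mixed` do not come from the paper and serve only as illustration.

```python
from dataclasses import dataclass
from typing import List, Union


# 1. Aggregation: a Point requires both members.
@dataclass(frozen=True)
class Point:
    x: float
    y: float


# 2. Generalization: a Number holds one alternative, INTEGER or REAL.
Number = Union[int, float]

# 3. Aggregation with association: zero or more (X, Y) pairs.
Polygon = List[Point]

# 4. Generalization with association: zero or more Numbers.
NumberCollection = List[Number]


def describe(n: Number) -> str:
    """Report which branch of the generalization a value takes."""
    # Assumption: callers never pass bool, which Python treats as an int.
    return "INTEGER" if isinstance(n, int) else "REAL"


triangle: Polygon = [Point(0.0, 0.0), Point(1.0, 0.0), Point(0.0, 1.0)]
mixed: NumberCollection = [1, 2.5, 3]
```
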

The molecular object model serves as an important example of a complex object based model. Developed for VLSI CAD systems, the model defines two distinct parts: an interface and an implementation [2], [5]. The interface of the object consists of all connections to the outside world and defines how other objects use and access the object. The implementation of the object defines how the object does its job. An interface may have several implementations. In VLSI, a molecular representation of a circuit would have an interface consisting of a list of pins and an implementation consisting of sub circuits and the wires that interconnect them.

When using the molecular model, a designer may refer to an interface without specifying an implementation for the interface. If the designer chooses not to specify, or bind, an implementation to the interface, we refer to the unbound interface as a socket in the design of the circuit. The designer must specify an implementation for the interface by plugging the socket before completing the design. The plugged socket results in an instance of the sub circuit implementation in the design of the circuit.

The molecular object model conceptually matches the way VLSI designers create circuits. VLSI designers create circuits by recursively decomposing the circuit into smaller sub circuits. When implementations for the sub circuits exist, they get wired into the design without alteration. When they do not exist, the designer must implement the sub circuit using basic gates and wires. The molecular object model succeeds in VLSI CAD systems because the semantics of the molecular model match the VLSI design process.
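The interface, implementation, and socket concepts described above can be sketched as follows. This hypothetical Python fragment assumes class and function names (`Socket.bind`, `design_complete`) that do not appear in the cited molecular model papers; it only illustrates how an unplugged socket keeps a design incomplete.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Implementation:
    name: str
    # Sub circuits appear as sockets; wires omitted for brevity.
    subcircuits: List["Socket"] = field(default_factory=list)


@dataclass
class Interface:
    name: str
    pins: List[str]
    # One interface may have several implementations.
    implementations: Dict[str, Implementation] = field(default_factory=dict)


@dataclass
class Socket:
    interface: Interface
    plug: Optional[Implementation] = None  # None marks an unbound socket

    def bind(self, impl_name: str) -> None:
        # Plugging refers to the single stored definition, not a copy.
        self.plug = self.interface.implementations[impl_name]


def design_complete(impl: Implementation) -> bool:
    """True only when every socket, at every level, has a plug."""
    return all(
        s.plug is not None and design_complete(s.plug)
        for s in impl.subcircuits
    )


nand = Interface("NAND2", pins=["a", "b", "out"])
nand.implementations["cmos"] = Implementation("cmos")
latch = Implementation("sr_latch", [Socket(nand), Socket(nand)])

complete_before = design_complete(latch)
for socket in latch.subcircuits:
    socket.bind("cmos")
complete_after = design_complete(latch)
```
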

2.3 Data models for CASE

Data models for CASE have many of the same requirements as those for VLSI CAD and other CAD/CAM data models [17], [21], [25]. We believe the key to storing CASE design data lies in the underlying semantic data model used to represent the information. The more closely the model maps to the users’ conceptual view of the program, the better it will support their needs. Furthermore, a data model that provides a single representation of the design data uses space more efficiently, reduces the complexity of the database, and improves data integrity by guaranteeing that multiple copies of the same information do not become inconsistent.

We base our work with CASE data models on that of CAD/CAM data modeling because of our previous work in VLSI CAD systems. We also recognize the natural correspondence between the molecular view of VLSI objects and the object-based view of software modules. For example, inputs and outputs in the VLSI interface consist of a pin-list, which has an analogy in the parameter-list of the software interface. Furthermore, the VLSI implementation consists of gates, sub circuits, and wires, which have an analogy in the data declarations, subprograms, and program statements of the software implementation.

However, molecular concepts and definitions do not fully map onto those for software. First, the concept of instantiation has a different meaning in programming than it does in VLSI. In VLSI, instances of components refer to copies of the component. Although the database records only one definition of the component, in the final hardware product every instance of the component in the design results in a copy of that circuit in silicon. The software concept of instantiation has a different meaning. In the final software product, just as in software design, only one copy of the component exists.[2] Calls to subprograms take the form of references to software services. Normally, copies of that service do not exist until execution of the program. We do not find this view of instantiation, which implies the creation of multiple copies of the design object, appropriate for software.

In the module specification:

Procedure Sort (Var Variable1: Array[min..max] of integer; Variable2: Boolean);

In the module design:

Sort(Integer_array, Error_flag);

Figure 1. The two roles of the software interface.

As shown in Figure 1, the second change to the VLSI molecular model involves the dual role of the software interface. The interface found in the software specification (e.g., declaration) and the interfaces found in the software requirements (e.g., design) have different functions. In the first case, the interface represents a reusable module. Because the interface represents possibly multiple implementations and versions of the module, we must limit how the user can modify the interface. The interface in the design, however, represents a request for service. It differs from the interface in the specification because it evolves with the design. During early phases of the design, the service request may not represent any actual software module. But as the design develops, the specification evolves into either:

· a new module, or,

· a call to an existing reusable module.

Ideally, the designer fills the software specification with an existing module. Operations on the object should guide designers toward satisfying their needs with existing reusable software.

2.4 The Interactive Development Model

Our model extends the two-part molecular model to fully support the unique requirements of the software design process. We do this by providing a form of meta-interface for the designer to use while he develops the software requirements. We call this new object type a Call and refer to the resulting three-part model as the Interactive Development Model (IDM).

The IDM consists of Interface, Alternative, and Call objects. The Interface portion of the IDM gives a specification of the module behavior, thereby serving the declarative role of the interface in the VLSI molecular model. The Alternative portion of the IDM describes the design, code, and implementation details of the software module. The Call serves to abstractly represent software requirements. As shown in Figure 2, a reusable software library consists of a generalization with association abstraction of the three IDM objects where each IDM object serves as a base type in the database. The following paragraphs describe each of the three IDM objects in more detail.

Reuse_Library:

(Call: call + Interface: interface + Alternative: alternative)(*)

Figure 2. Reusable software library data model

2.4.1 The Interface

The Interface object represents all implementations for the module. It consists of four major components: Header, Classifiers, Parameters, and Alternatives. The Header component contains the administrative information about the Interface; e.g., the designer name, date of creation, and owning organization. The Classifiers component contains information describing the Interface. The classification information describes the function and purpose of the service that the Interface represents. In part, the Classifiers contain an aggregation with association abstraction of descriptive keywords, but this can vary depending on local requirements. The Parameters component also consists of an aggregation with association abstraction containing all data input and output by the Interface. The final component, Alternatives, contains an aggregation with association abstraction of all valid Alternatives existing to implement the Interface.

Interface:

(Header: header_info, Classifiers: classifier_info, Parameters: param_list, Alternatives: alt_list)

Figure 3. The IDM Interface

2.4.2 The Alternative

The Alternative represents an implementation of an Interface; an Interface may have several implementation Alternatives. The Alternative object consists of six major components; the first two components closely match those of the Interface. The Header contains administrative information about the Alternative and Classifiers contains descriptive information about the function and purpose of the Alternative. The Declarations component consists of an aggregation with association abstraction of required local variables or variables in the functional scope of the Alternative. Although the same kind of abstraction, Performance attributes differs from Classifiers in that it contains information about externally visible properties of the Alternative, such as the space or time complexity for a given input. The Body of the Alternative consists of a generalization with association abstraction of requirements represented by Call objects. We use a generalization abstraction rather than aggregation to allow for program constructs such as iteration and conditionals. By identifying these program constructs, we can map the design into a pseudocode for use during the coding phase of development. Finally, the Versions component manages the configuration and version control by linking descendant versions of the Alternative.

Alternative:

(Header: header_info, Classifiers: classifier_info, Declarations: declare_list, Performance_attributes: performance_data, Versions: version_list, Body: call_list)

Figure 4. The IDM Alternative

2.4.3 The Call

The Call represents requirements for a software service. As work progresses, the designer creates a more detailed description of the required service using the Call object. At a high level, requirements engineers and high level designers create and form the software service descriptions. Component level designers continue to evolve the design by further modifying and binding the Call objects with Interfaces. To complete implementation, coders either bind or develop Alternatives for the Interfaces already bound to the Call objects.

The Call contains a Header and a Classifier component similar to the Interface and Alternative objects. The Call also contains a parameter list for anticipated input and output variables and a list of Performance constraints. Both components consist of an aggregation with association abstraction.

In general, the classification problem has a recurrent solution structure [6]. A collection of data, generated from several sources, gets interpreted as a predefined pattern. The search proceeds by mapping the recognized pattern into a set of possible solutions from which the designer selects the most appropriate result for the given case. In the IDM, the search engine matches developing software requirements represented by the Call to existing reusable Interfaces and Alternatives using the components contained within the respective objects. The Classifier component in the Call represents the need; the Classifier component in each of the other two objects indicates what can meet the need. The search engine also matches the anticipated Parameters in the Call to existing Parameters in available Interfaces to complete the search. Finally, the search concludes when the Performance attributes identified in the Alternative object satisfy the performance constraints identified in the Call object. Once the program designer examines and accepts an Interface and Alternative, the designer binds the selected objects to the Call using the simple aggregation abstractions provided by Bound_interface and Bound_alternative.

Call:

(Header: header_info, Classifiers: classifier_info, Parameters: param_list, Performance_constraints: performance_data, | Bound_interface: Interface, Bound_alternative: Alternative)

Figure 5. The IDM Call
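The three-stage match described above (Classifiers, then Parameters, then Performance) can be sketched in Python. The paper does not specify the matching rules, so this sketch assumes them: keyword subset for Classifiers, exact equality of parameter type lists, and upper bounds for Performance constraints. Header, Declarations, Body, and Versions components do not appear, and all field and function names remain hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Alternative:
    name: str
    classifiers: Set[str]
    performance: Dict[str, float]  # assumed: lower values mean better


@dataclass
class Interface:
    name: str
    classifiers: Set[str]
    parameters: List[str]  # parameter types only, in order
    alternatives: List[Alternative] = field(default_factory=list)


@dataclass
class Call:
    classifiers: Set[str]
    parameters: List[str]
    constraints: Dict[str, float]  # assumed: upper bounds
    bound_interface: Optional[Interface] = None
    bound_alternative: Optional[Alternative] = None


def satisfies(alt: Alternative, call: Call) -> bool:
    """Every constraint must have a recorded attribute within its bound."""
    return all(
        key in alt.performance and alt.performance[key] <= limit
        for key, limit in call.constraints.items()
    )


def search(call: Call, library: List[Interface]) -> List[Tuple[Interface, Alternative]]:
    """Return candidate (Interface, Alternative) pairs for the designer."""
    hits = []
    for iface in library:
        if call.parameters != iface.parameters:
            continue
        for alt in iface.alternatives:
            offered = iface.classifiers | alt.classifiers
            if call.classifiers <= offered and satisfies(alt, call):
                hits.append((iface, alt))
    return hits


def bind(call: Call, iface: Interface, alt: Alternative) -> None:
    call.bound_interface, call.bound_alternative = iface, alt


quick = Alternative("quicksort", {"in-place"}, {"time_ms": 5.0})
bubble = Alternative("bubblesort", {"stable"}, {"time_ms": 50.0})
sort_if = Interface("Sort", {"sort", "integer"}, ["int[]", "bool"], [bubble, quick])

call = Call({"sort", "integer"}, ["int[]", "bool"], {"time_ms": 10.0})
candidates = search(call, [sort_if])
bind(call, *candidates[0])
```

In this sketch the search only proposes candidates; the explicit `bind` step mirrors the designer's decision to accept one of them.
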

2.5 Prototype System

A prototype developed in the ROSE (Relational Object System for Engineering) Database currently exists on DEC VAXstations. The object-oriented ROSE data model maps closely to the IDM and provides an excellent way to leverage our knowledge from CAD data management into CASE.

ROSE consists of an integrated experimental database system, graphics, and user interface toolkit for developing CAD applications. The ROSE design gives fast object access by managing a cache of logical data clusters as physical objects. ROSE provides access to the database through a combination of powerful control structures based on the ‘C’ programming language and database commands extended from relational algebra [8].

We can represent ROSE complex object data models in several ways. AND/OR trees [16] give an expressive representation of all abstractions when the user desires a schematic of the data model. Each node in the AND/OR tree serves to define the domain of an object or one of its sub-objects. Alternately, we can use the LISP-like tuple format to describe ROSE data structures. The LISP notation has an advantage in that it maps closely to the internal storage tuples used by the ROSE object manager.

The complete ROSE CASE system includes several graphical editors for design data input and a full graphical user interface. Each editor presents the user with a view of the object base; because one underlying data model supports all views, a change in one view automatically results in all applicable changes in the other views. The tool supports structured design, program structure, and Input-Process-Output (IPO) design methods and views [3]. The tools also allow multiple levels of abstraction for viewing each representation of the design.

To support software reuse we classify design objects using facets [20] and free-text keywords [19]. The ROSE CASE tool maps requirements specified by Call objects to Interfaces and Implementations using these classifiers. As the designer develops the requirements represented by the Call object, the search engine tries to identify existing Interface and Implementation objects matching the specification. The tool presents candidate reusable objects to the designer, who may retrieve and examine them for possible use. When designers choose a suitable object from the candidate list, they bind it to the Call object. The designer may bind both an Interface and Implementation to the Call, or only bind an Interface. In the latter case, the designer leaves development of a suitable implementation for the coding phase. The designer completes the design by either binding all Call objects or leaving them unbound for later development. As shown in Figure 6, a program design consists of a collection of (either bound or unbound) Call objects.

Design:

(Call: call)(*)

Figure 6. IDM program design representation
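A design held as a collection of Call objects lets a tool report which services still need work. The following Python sketch illustrates the three binding states described above; the `Binding` enumeration, the string-valued fields, and `left_for_coding` represent assumptions made for illustration rather than parts of the ROSE prototype.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Binding(Enum):
    UNBOUND = "unbound"
    INTERFACE_ONLY = "interface only"
    FULL = "interface and implementation"


@dataclass
class Call:
    service: str
    interface: Optional[str] = None       # name of the bound Interface
    implementation: Optional[str] = None  # name of the bound Implementation

    def state(self) -> Binding:
        # An Implementation without an Interface counts as unbound here,
        # since the text only allows binding an Interface alone or both.
        if self.interface is None:
            return Binding.UNBOUND
        if self.implementation is None:
            return Binding.INTERFACE_ONLY
        return Binding.FULL


Design = List[Call]


def left_for_coding(design: Design) -> List[str]:
    """Services that still need an implementation in the coding phase."""
    return [c.service for c in design if c.state() is not Binding.FULL]


design: Design = [
    Call("sort records", "Sort", "quicksort"),
    Call("parse input", "Parse"),
    Call("format report"),
]
remaining = left_for_coding(design)
```
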

3 Related Work

The ARIES system uses a single underlying representation to store and present requirements knowledge [11]. ARIES composes system descriptions from the following basic units: types, relations, events, and invariants. The system descriptions take the form of a collection of objects, each of which represents some element of the system. However, although ARIES addresses reuse of requirements knowledge, ARIES focuses on presentation, reasoning, and evolution of the knowledge.

The CARE (Computer Aided Reuse Engineering) system at the University of Maryland supports a process model for extracting candidates for reusable components from existing software [1]. CARE has two main parts, the component identifier and the component qualifier, and supports the derivation of program specifications and the verification of whether or not the programs meet those specifications.

Techniques for classifying and identifying candidate reusable components include use of polymorphic types [22]. Polymorphic types classify both defined components in a library and contexts of free variables in partially written programs. A system using polymorphic types may help programmers make better use of software libraries by implementing a retrieval system that matches the types representing software requirements to those defining existing reusable components.

4 Future Work

We intend to extend the IDM by encapsulating retrieval algorithms and expert system strategies into the Call object. A C++ implementation allows us to evaluate the feasibility of alternative reusable components through message passing and member functions. Upon receipt of a requirements message, the Call object invokes the necessary methods to unify the requirement specification with available Alternatives.

To support interoperability with other CASE systems and reuse libraries, we will extend the current database to record required information about member libraries and organizations. The database will contain library structure, physical location, access method, validation data, and other information required for interoperability. When the Call object fails to bind with a suitable local reusable interface and implementation, the object invokes a binding method against the database. The object then formats and initiates a query for associated libraries using the appropriate inference rules, query, and access method. When the object receives the query result, it evaluates binding options using the same rules as for local reusable objects.
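The planned fallback from local to member libraries can be sketched as follows. This Python fragment only illustrates the control flow under stated assumptions: a keyword-subset matching rule, a `query` callable standing in for each library's access method, and placeholder library names and locations. None of these details come from the paper, which describes this capability as future work.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    keywords: FrozenSet[str]


@dataclass
class MemberLibrary:
    name: str
    location: str  # placeholder for the recorded physical location
    query: Callable[[Set[str]], List[Component]]  # stands in for the access method


def matches(needed: Set[str], component: Component) -> bool:
    # Assumed rule: the component must offer every requested keyword.
    return needed <= component.keywords


def resolve(
    needed: Set[str],
    local: List[Component],
    members: List[MemberLibrary],
) -> Optional[Tuple[str, Component]]:
    """Try the local library first, then each member library in turn."""
    for component in local:
        if matches(needed, component):
            return ("local", component)
    for library in members:
        for component in library.query(needed):
            # Remote results pass through the same rule as local ones.
            if matches(needed, component):
                return (library.name, component)
    return None


local_lib = [Component("Sort", frozenset({"sort", "integer"}))]
remote_fft = Component("FFT", frozenset({"transform", "signal"}))
site_b = MemberLibrary("site-b", "placeholder-location", lambda kw: [remote_fft])

local_hit = resolve({"sort"}, local_lib, [site_b])
remote_hit = resolve({"transform"}, local_lib, [site_b])
no_hit = resolve({"compress"}, local_lib, [site_b])
```
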

The database can also provide useful information about composite programs resulting from cross-library reuse. For example, the product test group can determine statistical reliability based on library quality data and can identify modules without proven histories or test results. This information can help predict maintenance costs and resource allocation.

5 Conclusion

By dividing the molecular interface into a requirements object and a definition object, the IDM permits a high level of flexibility during the design process. Since molecular interfaces define modules, the designer cannot modify an interface without affecting existing implementations or without creating a new object. However, Call objects represent requirement specifications and may adapt interactively to the dynamic needs of the designer.

The IDM provides support for all stages of the software engineering lifecycle. The IDM supports high-level design and product maintenance by storing requirements with the code. At component level design the IDM reflects both the control flow and declaration structure of the program. At the implementation level, the language-independent pseudocode representation directly maps to the source code constructs required to implement the product.

The IDM supports software designers by mirroring the process of software development. The data model promotes reuse by creating Call objects that represent program design requirements and by matching the requirements to existing Interface and Alternative components in the reuse library. This not only makes a CASE system that implements the model conceptually easy to use but also, by integrating support for reuse into the development environment, makes reuse a natural result of software design.

6 Cited References

[1] Abd-El-Hafiz, S.K., V.R. Basili, and G. Caldiera, “Towards Automated Support for Extraction of Reusable Components,” Proceedings of the Conference on Software Maintenance, Sorrento, Italy, 15-17 October 1991, pp. 212-219.

[2] Batory, D.S. and Won Kim, “Modeling Concepts for VLSI CAD Objects,” ACM Transactions on Database Systems, Vol. 10, No. 3, September 1985, pp. 322-346.

[3] Bergland, G.D., “A Guided Tour of Program Design Methodologies,” in IEEE Tutorial on Software Quality Assurance, ed. Tsun S. Chow, IEEE Computer Society Press, Silver Spring, Maryland, 1985, pp. 219-243.

[4] Bourland, D. David and Paul Dennithorne Johnston, ed., To Be or Not: An E-Prime Anthology, International Society for General Semantics, San Francisco, CA, 1991.

[5] Buchmann, Alejandro P. and Concepcion Perez de Celis, “An Architecture and Data Model for CAD Databases,” Proceedings of the 11th International Conference on Very Large Databases, Stockholm, 1985, pp. 105-114.

[6] Clancey, William J., “Classification Problem Solving,” Proceedings 3rd National Conference on Artificial Intelligence (AAAI), August 1984.

[7] Grady, Robert B., “Work-Product Analysis: The Philosopher’s Stone of Software,” IEEE Software, March 1990, pp. 27-34.

[8] Hardwick, Martin, “Why ROSE is Fast: Five Optimizations in the Design of an Experimental Database System for CAD/CAM Applications,” Proceedings of ACM SIGMOD, San Francisco, CA, May 1987, pp. 292-298.

[9] Heiler, Sandra, Umeshwar Dayal, Jack Orenstein, and Susan Radke-Sproull, “An Object-Oriented Approach to Data Management: Why Design Databases Need It,” Proceedings of the 24th Design Automation Conference, Las Vegas, Nevada, 1987, pp. 335-340.

[10] Hurson, A.R., Simin H. Pakzad, and Jia-bing Cheng, “Object-Oriented Database Management Systems: Evolution and Performance Issues,” IEEE Computer, February 1993, pp. 48-60.

[11] Johnson, W. Lewis, Martin S. Feather, and David R. Harris, “Representation and Presentation of Requirements Knowledge,” IEEE Transactions on Software Engineering, Vol. 18, No. 10, October 1992, pp. 853-869.

[12] Kim, Won, Hong-Tai Chou and Jay Banerjee, “Operations and Implementation of Complex Objects,” Proceedings of the 3rd International Conference on Data Engineering, Los Angeles, CA, 1987, pp. 626-633.

[13] Lorie, Raymond and Wilfred Plouffe, “Complex Objects and Their Use in Design Transactions,” Proceedings of the Annual Meeting of Engineering Design Applications, San Jose, CA, May 1983, pp. 115-121.

[14] Lubars, Mitch D., “Reusing Designs for Rapid Application Development,” Proceedings of the International Conference on Communications, Denver, CO, 23-26 June 1991, pp. 1515-1519.

[15] Matsumoto, Masao, “Automatic Software Reuse Process in Integrated CASE Environment,” IEICE Transactions on Information and Systems, Vol. E75-D, No. 5, September 1992, pp. 657-673.

[16] McLeod, D., et al., “An Approach to Information Management for CAD/VLSI Applications,” Proceedings of ACM Database Week, SIGMOD Conference, San Jose, CA, May 1983.

[17] Onuegbe, Emmanuel O., “Database Management System Requirements for Software Engineering Environments,” Proceedings of the 3rd International Conference on Data Engineering, Los Angeles, CA, 1987, pp. 501-509.

[18] Poulin, Jeffrey S. and Martin Hardwick, “Adapting Object-Oriented CAD Database Concepts for Computer Aided Software Engineering,” Proceedings of the International Symposium on Database Systems for Advanced Applications, Seoul, Korea, April 1989, pp. 201-208.

[19] Poulin, Jeffrey S., and Kathryn P. Yglesias, “Experiences with a Faceted Classification Scheme in a Large Reusable Software Library (RSL),” to appear, Seventeenth Annual International Computer Software and Applications Conference (COMPSAC), Phoenix, AZ, 3-5 November 1993.

[20] Prieto-Diaz, Ruben, and Peter Freeman, “Classifying Software for Reusability,” IEEE Software, Jan. 1987, pp. 6-16.

[21] Roman, Gruia-Catalin, “Data Engineering in Software Development Environments,” Proceedings of the 3rd International Conference on Data Engineering, Los Angeles, CA, 1987, pp. 85-86.

[22] Runciman, C. and I. Toyn, “Retrieving Reusable Software Components by Polymorphic Type,” Journal of Functional Programming, Vol. 1, Pt. 2, April 1991, pp. 191-211.

[23] Shriver, Bruce D., “Reuse Revisited,” IEEE Software, Jan. 1987, p. 5.

[24] Smith, J. and D. Smith, “Database Abstractions: Aggregation and Generalization,” ACM Transactions on Database Systems, Vol. 2, No. 2, 1977, pp. 105-133.

[25] Yau, Stephen S., “Relationship Between Data Engineering and Software Engineering,” Proceedings of the 3rd IEEE International Conference on Data Engineering, Los Angeles, CA, 1987, p. 84.

7 Biography

JEFFREY S. POULIN (poulinj@vnet.ibm.com). IBM Federal Systems Company, MD 0220, Owego, New York, 13827. Dr. Poulin works with the IBM FSC Open Systems Development group where he conducts applied research on software reuse and leads the Integrated Software Development Environment team for the Army Sustaining Base Information Systems (SBIS) project. He participates in the IBM Corporate Reuse Council, the Association for Computing Machinery, and the IEEE Computer Society. A Hertz Foundation Fellow, Dr. Poulin earned his Bachelor’s degree at the United States Military Academy at West Point and his Master’s and Ph.D. degrees at Rensselaer Polytechnic Institute in Troy, New York.

[1] ACM Software Engineering Notes (SEN), Vol. 18, No. 4, October 1993, pp. 75-82.

[2] We consider macros as an implementation technique used for efficiency. Although multiple copies of macros may exist (just as in VLSI) the software designer treats them in concept like other dynamically expanded software services.