Organization and Component Classification in the IBM Reuse Library

Jeffrey S. Poulin
Kathryn P. Yglesias
Reuse Technology Support Center
International Business Machines Corporation

Abstract

This paper presents experiences with software classification in a large corporate reuse software library (RSL) at IBM. Facets are one popular method of classification used extensively in the IBM RSL. However, facets alone cannot adequately provide all the information needed to fully classify and understand a reusable component. Experience with an operational RSL reveals that we require a combination of classification techniques to meet the needs of software developers. Following an overview of the IBM classification method, we discuss the issues surrounding the use of facets and software classification in a large reuse system and give techniques used at IBM to address those issues.

KEYWORDS: Software Reuse, Classifying Software, Faceted Classification

1.0 Overview

Storing, searching, and retrieving software from a repository of reusable components is central to the practice of reuse. Each of these activities relies on the existence of a systematic method of organizing the components so reusable parts can be matched to existing needs.

Classifying software allows the reusers to organize collections of components into structures that can be searched easily. Classification usually results in the creation of an index that can assist in the physical storage of components in a library or database and provides the input to search tools. The method of classification is an important ingredient in determining the types of indices that can be used, the types of searches that can be conducted, and the types of tools that can be used or required. The method of classification also determines the accuracy of possible searches and the precision of the results [19].

However, classification requires an investment in resources. The classification scheme and the instances of the scheme need to be created and maintained. Fast, interactive searches can require large and complex indices that are not necessary in other situations. Automated indexing methods, manual classification, the nature of the reuse repository, and the types of available retrieval tools all influence the classification method.

We use the popular method of faceted classification extensively in the internal IBM RSL. A facet describes a key aspect of the software; each facet has a related set of terms to describe the possible values of that facet. Selecting an appropriate component consists of matching facet terms of software in the RSL to a specific need. However, facets alone cannot adequately provide all the information needed to fully classify and understand a reusable component. We find that we require a combination of classification techniques to help software developers locate, assess, and integrate reusable components into their products.

This paper presents experiences with software classification in a large corporate reuse library system at IBM. An overview of software classification precedes an explanation of the IBM reuse system.
Issues surrounding the use of facets and techniques used at IBM to address those issues are then presented. Following the IBM software classification experiences is a discussion of enhancements to software classification required in a large production environment.

2.0 Software classification

Systematically ordering software into classes that specify the allowable uses for the software is particularly complex. Unlike the specific functions provided by computer hardware, software often possesses an overall ambiguity and generality that is difficult to express. There has been success identifying math and I/O routines. However, although the software community understands simple utilities and abstract data types very well, this understanding becomes lost when the software is composed of more abstract ideas and algorithms containing intermingled functions and side effects.

Various methods to classify software have been proposed and implemented. These include various formal and automated techniques. However, most of the methods actually in practice are based on classification lessons learned in library science. Of these methods, there are four major categories [5]:

1. Enumerated
2. Attribute-value
3. Facets
4. Free-text

In Enumerated classification, parts are organized into classes. These classes are usually hierarchical; an example of enumerated classification is the Dewey Decimal system. An example in a reusable software library that uses an enumerated classification might go as follows: to find a routine for solving 3rd order differential equations, you first look for Math Routines, then Calculus Routines, then Differential Calculus Routines, then examine a list of what is available. Searching for software is analogous to looking something up in the table of contents in a book. Enumerated classification therefore has the advantage that most people understand it well, which makes it easy to use.
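The lookup path in the example above (Math Routines, then Calculus Routines, then Differential Calculus Routines) can be sketched as a walk down a nested table of contents. This is a minimal illustration and not the IBM RSL implementation; the `CATALOG` contents, the part names at the leaves, and the `browse` helper are all hypothetical.

```python
# Hypothetical enumerated hierarchy. The class names follow the example in
# the text; the part names at the leaves are invented for illustration.
CATALOG = {
    "Math Routines": {
        "Calculus Routines": {
            "Differential Calculus Routines": [
                "ode_solver_3rd_order",
                "ode_solver_rk4",
            ],
            "Integral Calculus Routines": ["simpson_integrator"],
        },
        "Linear Algebra Routines": {},
    },
}


def browse(catalog, path):
    """Follow a list of class names from the root and return what is stored there."""
    node = catalog
    for depth, name in enumerate(path):
        # A list is a leaf: there are no further classes to descend into.
        if not isinstance(node, dict) or name not in node:
            raise KeyError(f"no class {name!r} below {path[:depth]}")
        node = node[name]
    return node


parts = browse(
    CATALOG,
    ["Math Routines", "Calculus Routines", "Differential Calculus Routines"],
)
print(parts)  # ['ode_solver_3rd_order', 'ode_solver_rk4']
```

Each part lives at exactly one path in this sketch, so a part that plausibly belongs in two classes has to be filed in one of them and may be missed by a reuser who browses the other.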
Enumerated classification has the following disadvantages:

o the index must be built manually, which can be costly and prone to error
o the ambiguity of a part can cause it to fit several places in the scheme, which can make it difficult to locate
o the structure is not easily balanced, which makes it awkward to use if there are many parts of one type and few of another.

In Attribute-Value classification, parts are described by a set of attributes and their values, e.g., if the attribute is "Author_name," the value might be "John Smith." The attribute can take any value assigned by the person classifying the part. Locating a part consists of specifying an exact value or range of values for a subset or all of the attributes. This method is easy to use and can be partially automated, such as for the attribute "Object_code_size." However, it requires a more sophisticated search mechanism and incurs relatively poor performance during searches. Attribute-value also suffers the same ambiguity problem as does enumerated classification. Without some way to control the attribute values, reusers can use different terms to describe the same part, thereby making it difficult to locate. Some libraries implement synonym lists or a thesaurus to address this problem.

In Faceted classification, parts are described by a set of terms, or facets, and facet values. In this way facets are similar to the attribute-value method. However, with facets the choice of values is limited. This eliminates the problem of ambiguity in deciding the best value for a term or attribute. For example, if the facet is "Operating_System," the values can only be one of (AIX, VM, MVS, OS/2). Because the choice of terms is limited, search performance can be very good.

In Free Text Keyword classification, parts are described by English words or phrases.
The text can be entered by a person who is classifying the part or can be extracted automatically from the documentation or source files [13]. Library science has two general strategies for handling free text indexing: controlled and uncontrolled vocabularies. Three of the four methods for classifying software (enumerated, attribute-value, and facets) are controlled vocabulary techniques. Free text, however, can be either uncontrolled or controlled. Free text is uncontrolled when terms are either chosen ad hoc by the person classifying the part or are automatically extracted from text, either with or without syntax. Free text is controlled when keywords are matched to a synonym list or thesaurus. Free text is easy to implement and use but suffers from difficulty in matching requirements to existing software unless, like attribute-value, a sophisticated search tool is used.

A study on the relative recall,(1) precision,(2) search times,(3) overlap,(4) and user preference of each classification method shows that each method has strengths and weaknesses, and therefore the use of as many as possible is recommended [5]. In fact, IBM uses all four techniques in the organization of the IBM corporate reuse environment. The environment is structured into inventories and libraries using enumerated techniques, free text is exploited in searches, and classifying individual reusable units is done with facets and attribute-values.

2.1 Overview of the IBM Classification Scheme

The IBM reuse environment consists of high level groupings of related libraries called inventories. We use inventories as the entry point for the reuser in the environment. Example inventories include "documents," "commercial software," and "federal software."

Inventories contain one or more libraries. Libraries further enumerate the options available to the reuser.
Examples of libraries within the "federal software" inventory are "Ada collection packages," "flight simulation software," and "aerospace navigation software."

Libraries consist of collections of components. A component is a bundle containing all the information needed to ensure a part can be reused efficiently; the component is the basic unit of reuse. Figure 1 shows how libraries consist of components and how components consist of elements of information. The information elements can themselves be reusable or can simply be information elements, such as "integration instructions," which are intended solely to assist the reuser.

---------------
(1) Recall is the number of relevant items retrieved compared to the total number of relevant items in the database.
(2) Precision is the number of relevant parts retrieved compared to the total number of parts retrieved.
(3) Search time is the elapsed time required to locate an applicable reusable part.
(4) Overlap is a measure of the total unique parts retrieved by each classification method.

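Footnotes (1) and (2) translate directly into set arithmetic. The sketch below works through one hypothetical search; the part names and the two helper functions are illustrative assumptions and are not data or code from the study in [5].

```python
def recall(retrieved, relevant):
    """Relevant items retrieved divided by all relevant items in the library."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant parts retrieved divided by all parts retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical search: four parts come back, and three of the five parts
# that would actually satisfy the need are among them.
retrieved = ["stack_adt", "queue_adt", "list_adt", "sort_util"]
relevant = ["stack_adt", "queue_adt", "list_adt", "deque_adt", "ring_adt"]

print(recall(retrieved, relevant))     # 3 of 5 relevant parts found: 0.6
print(precision(retrieved, relevant))  # 3 of 4 retrieved parts relevant: 0.75
```

A search that returns the whole library scores a recall of 1.0 with very low precision, which is why the two measures are reported together.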
[Figure 1 diagram: Library X holds a library abstract element, a library legal element, and components A through N. Component A expands into elements 1 through n, each marked either "reusable" or "information."]

Figure 1. Libraries, components, and elements. Libraries contain related components. Components contain related elements. Elements contain either reusable parts or information about the reusable parts.

Note that classification involves information that is used specifically for searching for reusable components. The other elements of information are required to help reusers evaluate and reuse software; those information requirements are not included in this discussion. However, in some cases elements originally intended for informational purposes become part of a search mechanism.
For example, short (less than one page) free-text abstracts of reusable components are very useful for understanding a reusable component. Although abstracts are not used in faceted classification, some search tools, including the IBM tool, use abstracts to build indices of the reuse library or scan the abstract during the search process.

Facets and attribute-values are together called classifiers. As shown in Figure 2, classifiers describe libraries, components, and elements. The use of both methods rather than the exclusive use of facets was one of the earliest issues addressed when developing the IBM software classification scheme. The need to track non-enumerable data such as individual author name and the need to track data with potentially large ranges of values such as department codes mandated the use of attribute-value classifiers.

[Figure 2 diagram: Library X holds a library abstract, a library legal element, and components A through N. Component B expands into element 1 (reusable), element 2 (information), through element n. A separate Classifiers box holds Library, Component, and Element classifiers, with arrows pointing to Library X, to Component B, and to element 1 respectively.]

Figure 2. Libraries, components, elements and classifiers. Classifiers describe libraries and components. We use classifiers to identify which libraries to search and to locate and retrieve potentially reusable components.

The organization and required contents of the classification scheme which IBM uses are described in [8], [9].

2.2 The origin of faceted software classification

In January 1987 Prieto-Diaz and Freeman published an article in IEEE Software proposing a faceted classification scheme for software [15], [16]. Recognizing that previously existing methods of organizing software were inadequate for large, continuously growing libraries, they proposed a component description format based on a standard vocabulary of terms. This faceted approach was more descriptive and extensible than earlier enumerative methods. The Prieto-Diaz and Freeman approach centers on the six facets shown in Table 1 and seeks to provide a preliminary schedule intended for functionally identifiable software products ranging from about 50 to 200 lines of code.

+----------------------------------------------------------------------------------------+
| Table 1. Prieto-Diaz and Freeman faceted classification schedule                       |
+-----------------+-----------------------------------------------+----------------------+
| FACET           | DESCRIPTION                                   | EXAMPLES             |
+-----------------+-----------------------------------------------+----------------------+
| Function        | Specific function performed by the component. | add, delete          |
| Object          | What the component acts on.                   | arrays, files        |
| Medium          | Where the action is executed.                 | buffer, tree         |
| System type     | Functional or application independent area.   | compiler, scheduler  |
| Functional area | Application dependent activity.               | budgeting, DB design |
| Setting         | Where the application takes place.            | advertising, finance |
+-----------------+-----------------------------------------------+----------------------+

2.3 Issues with a faceted taxonomy

Although many RSLs use the faceted taxonomy, many implementation issues need to be addressed. Our experience shows that the following issues are the most significant:

2.3.1 Consistency

Keeping the classification consistent requires a thorough understanding of the domain or domains which the classification covers. The first step in understanding a domain requires working with domain experts and users of the classification to understand their use of terms. The analyst must then look for conceptual similarities and differences among the terms of the various user groups. The analyst seeks to create a set of terms at as similar a conceptual level as possible by including the variations on a shared concept within a single "group" (or root term).
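Grouping variations under a root term, and then matching facet terms against a need, can be sketched as follows. This is a minimal illustration under stated assumptions: the `ROOT_TERMS` vocabulary, the `LIBRARY` entries, and both helper functions are hypothetical and do not reproduce the IBM schedule or its synonym matching tool. Only the "Operating_System" values come from the example earlier in the paper.

```python
# Root terms and their accepted variations, per facet (hypothetical vocabulary).
ROOT_TERMS = {
    "Function": {
        "sort": {"sort", "order", "arrange"},
        "add": {"add", "insert", "append"},
    },
    "Operating_System": {
        "AIX": {"aix"},
        "VM": {"vm"},
        "MVS": {"mvs"},
        "OS/2": {"os/2", "os2"},
    },
}


def to_root(facet, term):
    """Map a user's term to the root term of its group, or None if unknown."""
    wanted = term.strip().lower()
    for root, variations in ROOT_TERMS.get(facet, {}).items():
        if wanted in variations:
            return root
    return None


def matches(component, query):
    """True when every queried facet resolves to the component's root term."""
    for facet, term in query.items():
        root = to_root(facet, term)
        # An uncontrolled term never matches; it should be reported instead.
        if root is None or component.get(facet) != root:
            return False
    return True


# Hypothetical components, each classified with one root term per facet.
LIBRARY = {
    "quick_sorter": {"Function": "sort", "Operating_System": "AIX"},
    "list_inserter": {"Function": "add", "Operating_System": "OS/2"},
}

query = {"Function": "Arrange", "Operating_System": "aix"}
hits = [name for name, facets in LIBRARY.items() if matches(facets, query)]
print(hits)  # ['quick_sorter']
```

Because every variation resolves to one root term before comparison, two user groups can keep their own wording while the library stores a single consistent value per facet.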
Several techniques can help, including conceptual closeness [15], [6], lattices [4], and thesauri [2]. IBM uses a synonym matching tool to help unify terms into consistent structures.

Because every user group has a select area of specialization, small differences of concept become very significant. For example, the first IBM classification schedule had numerous terms related to operating systems and aerospace because IBM did a lot of work in these areas. However, even within these domains a term would have several interpretations. For example, a developer working on the MVS operating system will dispute the similarity of a concept that is also used in the VM operating system. There may be differences, but it is the task of the classification analyst to look for where subtle differences in a term will compromise the consistency of the rest of the classification scheme.

2.3.2 Brevity

Brevity is one of the most important considerations when defining a classification scheme. The set of facets and the terms in each facet should be kept as brief as possible. A diverse user group with many different views of the salient characteristics of any given domain makes this difficult to achieve. For example, the original classification scheme at IBM included all of the attributes which Grady Booch used to define his Ada Abstract Data Types (ADTs), since his was one of the first commercially available RSLs [1]. Eventually we condensed many of the Booch attributes by aggregating related functions, characteristics, and features into single facets and then grouping these ADT-unique facets under an ADT domain.

2.3.3 Resolving ambiguous terms

Ambiguity of terms arises when more than one group of users will retrieve parts from the RSL. The developers of operating systems and office systems both use the term address, which has a completely separate meaning for each group.
In this case we can resolve the ambiguity fairly easily, but other domains have much greater overlap and so require a much more detailed analysis.

To resolve this we defined a guideline which states that the same term cannot appear as a root term more than once within a facet, even across domain boundaries. This guideline helps decrease user confusion and simplifies implementation of the RSL. User confusion occurs when the user considers several domains to be good candidates for finding parts but each domain contains the same term with different meanings. An example is the term cursor for the "Object" facet in the graphical user interface (GUI) and operating systems domains. In a GUI, cursor refers to a symbol on the output device, whereas in the operating system domain cursor refers to a type of pointer.

If the ambiguous term is significant in both domains, we resolve the situation by making it a synonym of a root term which developers identify within the domain as conceptually similar. The decision on which domain will maintain ownership of the root term usually depends on the existence of an appropriate alternative in the alternate domains. In the case of cursor, it was retained as a root term for the "Object: user interface" facet (with a synonym of mouse pointer) and pointer became the root term for "Object: data structure."

2.3.4 Extensibility

Any classification scheme must be able to adapt to new requirements. The addition of new domains affects the classification structure by potentially requiring new facets and the need for new terms in existing facets. The most difficult aspect of extensibility involves controlling the ambiguity and brevity of the classification scheme while being responsive to user needs and concerns. A decision to limit addition of terms (and control brevity) made on a purely conceptual basis may not be acceptable to one or more user sets whose terms are being condensed or grouped.
Experience can be used to weight the factors to determine a solution. The factors can include:

o willingness of the groups involved to accept an explanation of the rationale used,
o number of proposed changes which will not be accepted,
o frequency of use of the RSL by the groups,
o number of users in each group, and
o cohesiveness of each group.

2.3.5 Maintaining facets and terms

The dynamic nature of programming requires constant maintenance of not only software but also of the terms needed to describe it. For example, we changed the facet "Function" to "Function (What it does)" because many people confused the intended meaning with several other possibilities. We also changed the facet "Proven Operating System" to "Software Environment" to accommodate network operating systems and other environments (such as IBM's Customer Information Control System (CICS)) which are not technically operating systems.

2.3.6 Balancing administrative and technical needs

Technical users frequently comment that they only want a few simple facets (e.g., "their" facets and terms). They request simplification of the classification schedule by removing all facets that they feel do not apply to them. In fact, the facets they target for removal are usually the administrative classifiers which identify the library, owner, legal, and other restrictions related to reusing the component. Our experience shows these administrative classifiers and information elements must be included to provide the necessary information to all organizations affected by the component: designers, testers, maintainers, legal reviewers, etc. However, part suppliers and reusers can find the wealth of information surrounding a reusable component somewhat overwhelming.

3.0 Experiences with the faceted taxonomy

Early work in faceted classification provided the architectural basis for the IBM taxonomy [15].
We redesigned an existing prototype reuse system to address needs of an expanded audience and range of life cycle products. We then implemented the faceted taxonomy in the new tool. Since establishing the initial set of classifiers, we have conducted two major classification evaluations and re-designs. Furthermore, we made approximately forty operational updates to the classification to incorporate changes requested through nearly constant user feedback.

3.1 The first set of IBM facets

IBM based the first set of facets on the work done for the STARS program. Many of the same individuals working on the IBM STARS team also worked on the IBM RSL. However, due to differences in the audiences and goals of the RSLs, the resulting classifications and RSLs are completely different tools.

The best feature of the first set of IBM facets was the close mapping of the classification to one of our major libraries, Booch's Ada abstract data types. This strength rapidly disappeared as difficulties were encountered due to terminology differences in other ADT libraries as well as in the addition of avionics part sets from IBM Federal Systems Company. The avionics library included math models, ADTs, and fairly large reusable components for navigation and guidance.

Agreement on the first set of facets was difficult due to the need to expand the scope of the library. For example, we did not have adequate examples of how to classify user documentation, designs, or test cases. Consequently, the first set of facets included internal (process) and external (customer) documentation, but did not attempt to cover the other software life cycle products. We reviewed the classification quite extensively, but until recently we did not have sufficient experience to determine the adequacy of the scheme. Although few changes have been requested, we continue to conduct research in this area.

3.2 Current set of Classifiers

The current set of IBM classifiers resulted from the issues and experiences described above. These classifiers have proven to work across all phases of the software life cycle. They have also proven flexible enough to describe and discriminate between parts developed by contractors, vendors, and IBM organizations over the wide diversity of environments in which we operate. Despite these successes, user feedback alerts us to a continuing need to analyze usability and retrieval precision [19].

A partial list of the IBM classifiers appears in the following table. The evolution from the original set of facets defined by Prieto-Diaz and Freeman (Table 1) can be noted. It is also interesting to note that IBM's experience has led to a classifier set with similarities to those of the other companies or groups shown in 4.0, "Related Work."

+----------------------------------------------------------------------------------------------+
| Table 2. Partial listing of the IBM classifiers                                              |
+-------------------------------+-------------------------------------+------------------------+
| FACET                         | DESCRIPTION                         | EXAMPLES               |
+-------------------------------+-------------------------------------+------------------------+
| Algorithm                     | Technique to perform an action      | bubble, merge          |
| Application Domain            | Broad area of application           | advertising, aerospace |
| Certification Level           | IBM quality rating                  | Certified, as-is       |
| Component Size                | Component size in LOC or words      | 512 LOC, 8245 words    |
| Data Structure Characteristic | How the data object is implemented. | bounded, directed      |
| Development Standards         | Compliance with standards           | ISO 9000, DoD-2167     |
| Function                      | What the component does.            | sort, add              |
| Object                        | What the component acts on.         | buffer, array          |
| Implementation                | Describes various techniques.       | sequential, dynamic    |
| Implementation Language       | Programming language                | C++, Ada               |
| National Language             | National language                   | English, French        |
| Proven Hardware               | Tested computer platforms           | R/6000, S390           |
| Proven O.S.                   | Tested Operating Systems            | AIX, OS/2              |
| Support Level                 | Guaranteed service provided         | Limited, Full          |
+-------------------------------+-------------------------------------+------------------------+

3.3 Further enhancements

Although IBM has been involved with formal reuse for many years, the ability to satisfy the reuser customer set requires continuous evolution of the available capabilities.

3.3.1 Integration with text search techniques

Although the extensive list of classifiers provided a complete descriptive scheme, customers requested the ability to execute key word searches in addition to classifier searches. There were several reasons for this.
Groups with limited domains are often familiar with the parts available for their domain and choose to access them directly by appropriate keyword search. Other groups find the time required to understand the classification a luxury which they cannot justify (a real example of short term goals off-setting long term benefits).

We encourage reusers to use the faceted classification to locate potentially usable components from a broad set of libraries, then to iterate on the search results through use of keyword search. This two-fold approach helps control search costs and provides options for very specialized winnowing of the first order selection.

3.3.2 Hierarchical facets

To further reduce the number of classifiers initially shown to the reuser, we defined a hierarchical ordering of existing classifiers. By grouping the facets on which users did not want to focus (such as administrative facets), the programmer can more quickly re-define the correct search scope as his requirements evolve.

Because of the interdependence among facets, the hierarchical structure is not pure. The three-dimensional model which reflects the actual dependencies of the independent views (e.g., facets) must be simplified and to some extent hidden from the user by the RSL's interface. Hierarchical facets are one of the most effective ways found so far to help accomplish this goal.

3.3.3 Domain requirements

Despite extensive domain research and analysis, no set of classifiers will satisfy all users. Development organizations with unique products or computing environments demanded the ability to create custom sets of classifiers for use by their organization. Differences in jargon and culture further complicate customer needs. As a result, we adopted a mechanism to allow selected user groups to add facets for local use.

The object oriented development environment offers a challenge for adaptation of faceted classification [10].
We must encourage Object-Oriented programmers to use and contribute to corporate assets. They are usually ready and willing to do so, especially if their compilers, browsers, and other development tools integrate well with the corporate RSL.

3.3.4 User perspectives

We face a fundamental problem in classification in that reusers often have different understandings of the terms. The cognitive processing of the potential reuser often causes the reuser to develop a concept of the problem in such a way as to allow for a variety of potential solutions. This tendency to think in terms of the "solution space" rather than the "problem space" means the reuser will fail to consider potential solutions which fall outside the scope of the solution as he sees it.

The nature of language and of classification for reuse leaves room for interpretation. Once the analyst has completed the classification, the responsibility to interpret the classification and think in the problem space shifts to the reuser. Education on classification and reuse must accompany the installation of an RSL. Furthermore, the education must extend beyond use of the features of the RSL; it must provide the potential reuser with the ability to flexibly evaluate a problem and find full or partial solutions. The following experience illustrates the importance of reuse education in determining the appropriate use of Abstract Data Types (ADTs) developed for reuse.

One of our development projects reported a performance problem with a list ADT from our RSL which they chose to reuse in their code. Our technical consultant investigated with questions related to their abstraction requirements rather than the specifics of the alleged performance problem. He found that the abstraction required a direct access method, multiple iterations, fast sequential access, fast insertion, and fast key-based retrieval. He recommended that the project use an AVL-tree rather than a list.
With this choice of ADT, the performance problem went away.

3.4 When to apply facets

The use of facets depends on the situation. The previous sections have discussed some of the benefits and some of the limitations of faceted classifications. IBM's reusers are a heterogeneous set of developers located around the world. They develop software ranging from operating systems to business applications to medical systems and write in programming languages ranging from COBOL to C++ to Assembler. The needs of such a diverse group require a mechanism such as a faceted classification.

Few other companies have as extreme a set of programming requirements as IBM. As the set of requirements is scaled down (smaller groups of programmers, fewer languages, and fewer domains), the requirements for a faceted classification decrease. At some point, the maintenance requirements and overhead for retrieval cause facets to become a detriment.

              Decreasing need for facets
      ------------------------------------->
      distributed sites            one group
      many assets                  few assets

Remember that the reason we use facets is to provide a systematic method for focusing on the problem space rather than the solution space. This means that a programmer can use code written for an entirely different application, often even in a different language, when the programmer focuses on what the code accomplishes (and its performance characteristics) and not on a preconceived solution. Finding these opportunities for reuse depends on a reliable classification method and RSL.

4.0 Related Work

The following discussion is limited to those techniques based on library science. However, library science is not the only field contributing to classification research. For example, the mathematical field of category theory has established methods to characterize algebraic structures.
Formal specification languages using relation algebras can formally specify the behavior of software components [12]. Denotational semantics and predicate calculus specifications consisting of preconditions and invariant assertions can also provide accurate, provably correct descriptions of software [7]. However, most operational libraries rely on principles adapted from library science.

Facets are a proven method of organizing information and are used in numerous disciplines, such as zoology. Within computer science, facets have been used to classify documents for reuse [11], to organize and retrieve appropriate programming tools [14], and to group, analyze, and predict errors in software development [3], [18].

Table 3 contains a sample of operational reuse libraries, the organizations which sponsor or developed them, and the classification method used by each. Several of these libraries have been selected for more detailed description below.

+------------------------------------------------------------------------+
| Table 3. Sample Reuse Libraries                                        |
+---------------+-------------------+------------------------------------+
| LIBRARY       | DEVELOPER         | CLASSIFICATION METHOD              |
+---------------+-------------------+------------------------------------+
| Catalog       | Bell Labs         | Free text                          |
+---------------+-------------------+------------------------------------+
| Asset-CA?D    | Dept. of Defense  | Faceted                            |
+---------------+-------------------+------------------------------------+
| REBOOT        | Europe            | Faceted                            |
+---------------+-------------------+------------------------------------+
| Reuse         | Texas Instruments | Free text (controlled keyword)     |
+---------------+-------------------+------------------------------------+
| RAASP         | Westinghouse      | Enumerated, Faceted                |
+---------------+-------------------+------------------------------------+
| RLF           | Unisys            | Faceted                            |
+---------------+-------------------+------------------------------------+
| Asset Library | GTE               | Faceted                            |
+---------------+-------------------+------------------------------------+
| RSL           | Intermetrics      | Free text (uncontrolled keyword),  |
|               |                   | Enumerated                         |
+---------------+-------------------+------------------------------------+
| CAMP-PES      | US Air Force      | Enumerated, Attribute-Value        |
+---------------+-------------------+------------------------------------+

4.1 REBOOT

In Europe, the ESPRIT project REBOOT (REuse Based on Object-Oriented Techniques) has also chosen a component-based approach to reuse. REBOOT is a major four-year study being conducted by a consortium of European companies, including Bull S.A., Cap Gemini Innovation, and Siemens. REBOOT components are like IBM components in that they bundle work products from all phases of the software development cycle. REBOOT focuses on object technology and chose a facet-based classification adapted for object-oriented components. The four facets, shown in Table 4, each have an associated term space which is connected in a specialization/generalization hierarchy with synonyms [10].

+-------------------------------------------------------------------------+
| Table 4. The REBOOT faceted classification schedule                     |
+--------------+------------------------------------------+---------------+
| FACET        | DESCRIPTION                              | EXAMPLES      |
+--------------+------------------------------------------+---------------+
| Abstraction  | A noun that characterizes the component. | stack, queue  |
+--------------+------------------------------------------+---------------+
| Operations   | What the component does.                 |               |
+--------------+------------------------------------------+---------------+
| Operates On  | What the component acts on.              | integers, set |
+--------------+------------------------------------------+---------------+
| Dependencies | Characteristics which affect reuse.      | C++ based,    |
|              |                                          | Unix-based    |
+--------------+------------------------------------------+---------------+

4.2 STARS and Reuse Interoperability Group

Current work by the Reuse Interoperability Group (RIG), an organization originating from the Software Technology for Adaptable, Reliable Systems (STARS) program, is based on facets, attribute-value, and keyword classifiers. The RIG is a volunteer group formed by the United States government and private organizations to investigate, develop, and propose standards to assist in the exchange of information between heterogeneous reuse libraries.

The RIG proposes a bi-level model in which one set of classifiers, the Basic Interoperability Data Model (BIDM), is a subset of another functionally complete set, the Universal Data Model (UDM) [17]. The choice of model depends on the application. The two models are object-based and use attributes and facets to describe the components. The complete UDM consists of 59 attributes or information items. The BIDM is the core, or subset, of 21 items. Since the purpose of the models is to help assess the reusability of a part, many of the items are for information and are not likely to be used in classification. Table 5 lists those likely to be used in classification.
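As an illustration of how the three kinds of classifiers mentioned in this section (facets, attribute-value pairs, and keywords) might coexist in one asset description, consider the following minimal Python sketch. It is not the RIG data model: the class, the function names, the split of items between facets and attributes, and the sample records are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Asset:
    """A reusable component described by three kinds of classifiers (hypothetical)."""
    name: str
    facets: Dict[str, Set[str]] = field(default_factory=dict)  # controlled terms per facet
    attributes: Dict[str, str] = field(default_factory=dict)   # one value per attribute
    keywords: Set[str] = field(default_factory=set)            # unrestricted free text


def matches(asset, facets=None, attributes=None, keywords=None):
    """Return True if the asset satisfies every criterion that was supplied."""
    for facet, term in (facets or {}).items():
        if term not in asset.facets.get(facet, set()):
            return False
    for attr, value in (attributes or {}).items():
        if asset.attributes.get(attr) != value:
            return False
    wanted = {k.lower() for k in (keywords or [])}
    have = {k.lower() for k in asset.keywords}
    return wanted <= have  # every requested keyword must be present


def search(library, **criteria):
    """Return the names of all assets in the library that match the criteria."""
    return [a.name for a in library if matches(a, **criteria)]


# Made-up sample records; they do not describe real library contents.
LIBRARY = [
    Asset("sort_util", {"Language": {"Ada"}, "Domain": {"avionics"}},
          {"Media": "CD-ROM"}, {"sort"}),
    Asset("ledger_add", {"Language": {"COBOL"}, "Domain": {"business"}},
          {"Media": "electronic"}, {"add"}),
    Asset("nav_sort", {"Language": {"Ada"}, "Domain": {"avionics"}},
          {"Media": "electronic"}, {"sort", "guidance"}),
]

if __name__ == "__main__":
    # Broad facet query first, then narrow with an attribute and a keyword.
    print(search(LIBRARY, facets={"Language": "Ada"}))
    print(search(LIBRARY, facets={"Language": "Ada"},
                 attributes={"Media": "electronic"}, keywords=["guidance"]))
```

The sketch treats each kind of classifier as an independent filter, so a reuser can start with a broad facet query and then narrow the result with attributes or keywords.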
+----------------------------------------------------------------------------+
| Table 5. Partial list of the RIG classification schedule                   |
+--------------+----------------------------------------+--------------------+
| FACET        | DESCRIPTION                            | EXAMPLES           |
+--------------+----------------------------------------+--------------------+
| Abstract     | General text description of the asset. |                    |
+--------------+----------------------------------------+--------------------+
| Descriptive  | Word(s) describing the asset.          | sort, add          |
| Keyword      |                                        |                    |
+--------------+----------------------------------------+--------------------+
| Domain       | Broad area of application              | avionics guidance  |
|              |                                        | system             |
+--------------+----------------------------------------+--------------------+
| Language     | Language (usually computer) used.      | Ada, COBOL         |
+--------------+----------------------------------------+--------------------+
| Target       | Computer, OS, and compiler types.      | R/6000, AIX,       |
| Environment  |                                        |                    |
+--------------+----------------------------------------+--------------------+
| Element Type | The type of asset.                     | requirements,      |
|              |                                        | make file          |
+--------------+----------------------------------------+--------------------+
| Media        | How element is available               | CD-ROM, electronic |
+--------------+----------------------------------------+--------------------+

4.3 Intermetrics

Intermetrics uses a combination of attributes and keywords to classify objects in their Reusable Software Library (RSL). They permit the user to list up to five descriptive keywords for each component and do not restrict the length or content of these keywords. The RSL also uses a category code system similar to that used by libraries. A partial listing of the RSL classification appears in Table 6 [2].

+------------------------------------------------------------------------+
| Table 6. Partial listing of the Intermetrics RSL schedule              |
+----------------+-------------------------------------------------------+
| ATTRIBUTE      | DESCRIPTION                                           |
+----------------+-------------------------------------------------------+
| Unitname       | Name of the reusable object.                          |
+----------------+-------------------------------------------------------+
| Category Code  | A predefined code describing the component function.  |
+----------------+-------------------------------------------------------+
| Machine        | The computer on which the component was programmed.   |
+----------------+-------------------------------------------------------+
| Compiler       | The compiler on which the component was programmed.   |
+----------------+-------------------------------------------------------+
| Keywords       | Up to five unrestricted keywords may be provided.     |
+----------------+-------------------------------------------------------+
| Requirements   | Information about things the component needs to run.  |
+----------------+-------------------------------------------------------+
| Overview       | A brief text description of the component.            |
+----------------+-------------------------------------------------------+
| Errors         | Describes error handling and exception handling.      |
+----------------+-------------------------------------------------------+
| Algorithm      | Describes the algorithm used by the component.        |
+----------------+-------------------------------------------------------+
| Documentation  | Describes where to find information and test cases    |
| and Testing    | for the component.                                    |
+----------------+-------------------------------------------------------+

5.0 Future Work

The dynamic nature of our development environment demands ongoing maintenance and enhancement of the classification schedule so we can remain responsive to customers. In addition, all corporate practices and guidelines must include the standards for software classification and reuse if those standards are to become fully ingrained in the development process.
Identifying affected areas, including new technologies which overlap with reuse (such as object-oriented technology), ensures that we continue our progress.

As we build our sets of reusable parts into new domains and languages, we must expand the set of classifiers to embrace the new concepts and terms. We constantly apply what we have learned in our early classification experience to new areas. These include defining tool requirements to ease part retrieval and developing standards to support packaging and storing of reusable parts.

One area of future work involves the level of abstraction in the current set of facets and terms. We must provide the appropriate abstractions to conduct optimal searches of class libraries containing a wide range of methods and features. This conflicts with the desire to reduce the overall number of facets for general usability, since we tend to add new facets and classifiers as we add new functions and classes to the RSL.

A natural synergism exists between object-oriented (OO) programming and reuse. Although many organizations that now develop in COBOL or other older languages do not believe in the potential of OO, we believe that class libraries will make reuse a mainstream technology in software development. Accommodating class libraries in the existing RSL requires special operators, alternate system designs, and special user interfaces that provide graphical browsing and retrieval.

6.0 Conclusion

A faceted taxonomy applies best when many parts must be shared between diverse, geographically distributed organizations. Under these conditions, facets provide one vehicle to normalize terminology and to counter programmer predispositions toward a particular solution. Facets provide a systematic approach to the problem space, thereby providing opportunities to locate and reuse parts which might otherwise not contribute to a solution.

However, facets cannot satisfy all classification requirements.
Other techniques must also contribute to the knowledge base, including attribute-value, enumerated, and free-text classification. Users demand flexibility and require more options. The classification analyst must balance the needs of the various user groups and provide a meaningful, accurate, and useful classification standard. Failing to moderate the numerous demands for information and special requirements can result in overwhelming the user with information.

7.0 Cited References

[1] Booch, Grady, Software Components with Ada: Structures, Tools, and Subsystems, Benjamin/Cummings, Menlo Park, CA, 1987.
[2] Burton, Bruce A., et al., "The Reusable Software Library," IEEE Software, July 1987, pp. 25-33.
[3] Chillarege, R., et al., "Orthogonal Defect Classification - A Concept for In-Process Measurements," IEEE Transactions on Software Engineering, Vol. 18, No. 11, November 1992, pp. 943-956.
[4] Eichmann, David and John Atkins, "Design of a Lattice-Based Faceted Classification System," Proceedings of the Second International Conference on Software Engineering and Knowledge Engineering, Skokie, IL, 21-23 June 1990.
[5] Frakes, William, "Empirical Study of Representation Methods for Reusable Software," Software Engineering Guild Presentation, Yorktown, NY, 7 February 1992.
[6] Gagliano, R.A., M.D. Fraser, G.S. Owen, and P.A. Honkanen, "Issues in reusable Ada library tools," Empirical Foundations of Information and Software Science, 1990, pp. 427-435.
[7] Goguen, Joseph A., "Parameterized Programming," IEEE Transactions on Software Engineering, Vol. SE-10, No. 5, September 1984, pp. 528-543.
[8] "IBM Reuse Methodology: Classification Standards for Reusable Components," IBM Document Number Z325-0681, 2 October 1992.
[9] "IBM Reuse Methodology: Qualification Standards for Reusable Components," IBM Document Number Z325-0683, 2 October 1992.
[10] Karlsson, Even-Andre, Sivert Sorumgard, and Eirik Tryggeseth, "Classification of Object-Oriented Components for Reuse," Proceedings of TOOLS'7, Dortmund, Prentice-Hall, 1992.
[11] Laitinen, Kari, "Document Classification for Software Quality Systems," ACM Software Engineering Notes, Vol. 17, No. 4, October 1992, pp. 32-39.
[12] Litvintchouk, Steven D. and Allen S. Matsumoto, "Design of Ada Systems Yielding Reusable Components: An Approach Using Structured Algebraic Specification," IEEE Transactions on Software Engineering, Vol. SE-10, No. 5, September 1984, pp. 544-551.
[13] Maarek, Yoelle S., Daniel M. Berry, and Gail E. Kaiser, "An Information Retrieval Approach for Automatically Constructing Software Libraries," IEEE Transactions on Software Engineering, Vol. 17, No. 8, August 1991, pp. 800-813.
[14] Pfleeger, S.L. and J.C. Fitzgerald, Jr., "Software metrics tool kit: support for selection, collection and analysis," Inf. Softw. Technol., Vol. 33, No. 7, September 1991, pp. 477-482.
[15] Prieto-Diaz, Ruben, and Peter Freeman, "Classifying Software for Reusability," IEEE Software, January 1987, pp. 6-16.
[16] Prieto-Diaz, Ruben, "Implementing Faceted Classification for Software Reuse," Communications of the ACM, Vol. 34, No. 5, May 1991, pp. 88-97.
[17] RIG Subcommittee Draft Standard SDS-00001 Version 2, "A Basic Reuse Interoperability Model for Reuse Libraries," Reuse Interoperability Group Technical Committee #2, 5 February 1993.
[18] Straub, Pablo A. and Eduardo J. Ostertag, "EDF: A Formalism for Describing and Reusing Software Experience," Proceedings of the 1991 International Symposium on Software Reliability Engineering, Austin, TX, 17-18 May 1991, pp. 106-113.
[19] Yglesias, Kathryn P., "Limitations of Certification Standards in Achieving Successful Parts Retrieval," Proceedings of the 5th International Workshop on Software Reuse, Palo Alto, California, 26-29 October 1992.

8.0 Biography

JEFFREY S. POULIN joined IBM's Reuse Technology Support Center, Poughkeepsie, New York, in 1991 as an advisory programmer. His primary responsibilities include developing and applying corporate standards for reusable component classification, certification, and measurements. He participates in the IBM Corporate Reuse Council and the Association for Computing Machinery, and serves as Vice-Chair of the Mid-Hudson Valley Chapter of the IEEE Computer Society. A Hertz Foundation Fellow, Dr. Poulin earned his Bachelor's degree at the United States Military Academy at West Point, New York, and his Master's and Ph.D. degrees at Rensselaer Polytechnic Institute in Troy, New York.

KATHRYN P. YGLESIAS is an advisory systems analyst on the staff of the IBM Reuse Technology Support Center. Her current work includes information model definition, classification evolution, and requirements definition for the corporate standards and tools. Previously she coordinated the initiative to define formal methods for reusing non-code work products, especially customer documentation.
Prior to joining IBM, she worked for ten years in aerospace. Her experience includes engineering and project management for the Space Shuttle program and customer liaison tasks for an internal computer systems organization. She is a member of the AIAA and the Society for Software Quality.