Organization and Component Classification
                              in the IBM Reuse Library

                                 Jeffrey S. Poulin
                                Kathryn P. Yglesias

                          Reuse Technology Support Center
                    International Business Machines Corporation

                                          Abstract

          This paper presents experiences with software classification in a large
        corporate reuse software library (RSL) at IBM.  Facets are one popular method
        of classification used extensively in the IBM RSL.  However, facets alone
        cannot adequately provide all the information needed to fully classify and
        understand a reusable component.  Experience with an operational RSL reveals
        that we require a combination of classification techniques to meet the needs
        of software developers.  Following an overview of the IBM classification
        method, we discuss the issues surrounding the use of facets and software
        classification in a large reuse system and give techniques used at IBM to
        address those issues.

          KEYWORDS: Software Reuse, Classifying Software, Faceted Classification


                                        1.0  Overview


          Storing,  searching,  and retrieving software from a repository of reusable
        components is central to the practice of reuse.   Each  of  these  activities
        relies  on  the existence of a systematic method of organizing the components
        so reusable parts can be matched to existing  needs.    Classifying  software
        allows the reusers to organize collections of components into structures that
        can be searched easily.  Classification usually results in the creation of an
        index  that  can assist in the physical storage of components in a library or
        database and provides the input to search tools.  The method  of  classifica-
        tion  is an important ingredient in determining the types of indices that can
        be used, the types of searches that can be conducted, and the types of  tools
        that  can  be used or required.  The method of classification also determines
        the accuracy of possible searches and the precision of the results [19].

          However, classification requires an investment in resources.   The  classi-
        fication  scheme and the instances of the scheme need to be created and main-
        tained.  Fast, interactive searches can require  large  and  complex  indices
        that  are  not  necessary  in other situations.   Automated indexing methods,
        manual classification, the nature of the reuse repository, and the  types  of
        available retrieval tools all influence the classification method.


          We  use  the  popular  method  of faceted classification extensively in the
        internal IBM RSL.  A facet term describes a key aspect of the software;  each
        facet  has  a  related  set  of terms to describe the possible values of that
        facet.  Selecting an appropriate component consists of matching  facet  terms
        of software in the RSL to a specific need.  However, facets alone cannot ade-
        quately provide all the information needed to fully classify and understand a
        reusable  component.  We find that we require a combination of classification
        techniques to help software developers locate, assess, and integrate reusable
        components into their products.

          This paper presents experiences with software  classification  in  a  large
        corporate  reuse  library system at IBM.  An overview of software classifica-
        tion precedes an explanation of the IBM reuse system.  Issues surrounding the
        use of facets and techniques used at IBM to address  those  issues  are  then
        presented.    Following the IBM software classification experiences is a dis-
        cussion of enhancements to software classification required in a  large  pro-
        duction environment.


                                2.0  Software classification


          Systematically  ordering  software  into classes that specify the allowable
        uses for the software is particularly complex.  Unlike the specific functions
        provided by computer hardware, software often possesses an overall ambiguity
        and  generality that is difficult to express.  There has been success identi-
        fying math and I/O routines.  However, although the software community under-
        stands simple utilities and abstract data types very well, this understanding
        becomes lost when the software is composed of more abstract ideas  and  algo-
        rithms containing intermingled functions and side effects.

          Various  methods  to  classify software have been proposed and implemented.
        These include various formal and automated techniques.  However, most of  the
        methods  actually  in practice are based on classification lessons learned in
        library science.  Of these methods, there are four major categories [5]:

        1.  Enumerated
        2.  Attribute-value
        3.  Facets
        4.  Free-text

          In Enumerated classification parts are organized into classes.  These
        classes are usually hierarchical; an example of enumerated classification is
        the Dewey Decimal system.  A search in a reusable software library that
        uses an enumerated classification might go as follows:  to find a routine for
        solving 3rd order differential equations, you first look for Math Routines,
        then Calculus Routines, then Differential Calculus Routines, then examine a
        list of what is available.  Searching for software in this way is analogous
        to looking something up in the table of contents of a book.  Enumerated
        classification therefore has the advantage that most people understand it
        well and find it easy to use.  Enumerated classification has the following
        disadvantages:

        o   the index must be built manually, which can be costly and prone to error
        o   the ambiguity of a part can cause it to fit several places in the scheme,
            which can make it difficult to locate
        o   the  structure  is  not easily balanced, which makes it awkward to use if
            there are many parts of one type and few of another.
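
          The lookup described above can be sketched as a walk down a nested
        structure.  This is a minimal illustration, not the IBM implementation: the
        class names come from the example in the text, while the part names, the
        "Matrix Routines" class, and the helper functions are assumptions made for
        the sketch.

```python
# Hypothetical enumerated index: nested classes with part names at the leaves.
# Class names follow the example in the text; part names are invented.
ENUMERATED_INDEX = {
    "Math Routines": {
        "Calculus Routines": {
            "Differential Calculus Routines": ["ode3_solver", "ode_euler"],
            "Integral Calculus Routines": ["simpson_integrate"],
        },
        "Matrix Routines": ["matrix_invert"],
    },
}


def browse(index, path):
    """Walk down the hierarchy one class at a time, like a table of contents."""
    node = index
    for class_name in path:
        node = node[class_name]  # raises KeyError if the class does not exist
    return node


def class_sizes(node, prefix=()):
    """Yield (path, part count) for every leaf class, which exposes imbalance."""
    if isinstance(node, list):
        yield prefix, len(node)
    else:
        for class_name, child in node.items():
            yield from class_sizes(child, prefix + (class_name,))


parts = browse(
    ENUMERATED_INDEX,
    ["Math Routines", "Calculus Routines", "Differential Calculus Routines"],
)
sizes = dict(class_sizes(ENUMERATED_INDEX))
print(parts)
print(sizes)
```

          The reuser must already know the path to follow; a part that plausibly
        belongs under two classes has to be filed under one of them or duplicated.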

          In Attribute-Value classification parts are described by a set of
        attributes and their values; e.g., if the attribute is "Author_name," the
        value might be "John Smith."  The attribute can take any value assigned by
        the person classifying the part.  Locating a part consists of specifying an
        exact value or range of values for a subset or all of the attributes.  This
        method is easy to use and can be partially automated, such as for the
        attribute "Object_code_size."  However, it requires a more sophisticated
        search mechanism and incurs relatively poor performance during searches.
        Attribute-value classification also suffers from the same ambiguity problem
        as enumerated classification.  Without some way to control the attribute
        values, reusers can use different terms to describe the same part, thereby
        making it difficult to locate.  Some libraries implement synonym lists or a
        thesaurus to address this problem.
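
          A minimal sketch of an attribute-value search follows.  The attribute
        names "Author_name" and "Object_code_size" come from the text; the part
        records, the tuple convention for ranges, and the function names are
        assumptions made for illustration only.

```python
# Hypothetical part records; each attribute takes whatever value the classifier typed.
PARTS = [
    {"name": "stack_pkg", "Author_name": "John Smith", "Object_code_size": 2048},
    {"name": "queue_pkg", "Author_name": "J. Smith", "Object_code_size": 4096},
    {"name": "nav_filter", "Author_name": "Mary Jones", "Object_code_size": 65536},
]


def matches(part, query):
    """A query maps attributes to an exact value or to a (low, high) range."""
    for attribute, wanted in query.items():
        if attribute not in part:
            return False
        value = part[attribute]
        if isinstance(wanted, tuple):
            low, high = wanted
            if not low <= value <= high:
                return False
        elif value != wanted:
            return False
    return True


def search(parts, query):
    """Scan every part; no index is used, so cost grows with the library size."""
    return [part["name"] for part in parts if matches(part, query)]


small = search(PARTS, {"Object_code_size": (0, 8192)})
by_author = search(PARTS, {"Author_name": "John Smith"})
print(small)
print(by_author)
```

          The second query misses "queue_pkg" because its author was entered as
        "J. Smith," which is the uncontrolled-value problem described above.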

          In Faceted classification parts are described by a set of terms, or facets,
        and  facet  values.    In  this way facets are similar to the attribute-value
        method.  However, with facets the choice of values is limited.   This  elimi-
        nates  the  problem  of  ambiguity  in  deciding the best value for a term or
        attribute.  For example, if the facet is "Operating_System," the  values  can
        only be one of (AIX, VM, MVS, OS/2).  Because the choice of terms is limited,
        search performance can be very good.
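
          The effect of a limited vocabulary can be sketched as follows.  The facet
        "Operating_System" and its four terms come from the text; the FacetIndex
        class, its method names, and the part name are hypothetical.

```python
from collections import defaultdict

# Controlled vocabulary: each facet lists the only terms it accepts.
FACET_TERMS = {"Operating_System": {"AIX", "VM", "MVS", "OS/2"}}


class FacetIndex:
    """Index parts by (facet, term); terms outside the vocabulary are rejected."""

    def __init__(self, facet_terms):
        self.facet_terms = facet_terms
        self.index = defaultdict(set)

    def classify(self, part, facet, term):
        if term not in self.facet_terms.get(facet, set()):
            raise ValueError(f"{term!r} is not an allowed term for facet {facet!r}")
        self.index[(facet, term)].add(part)

    def find(self, facet, term):
        # A single dictionary lookup; no scan over all parts is needed.
        return set(self.index.get((facet, term), set()))


idx = FacetIndex(FACET_TERMS)
idx.classify("print_spooler", "Operating_System", "MVS")

try:
    idx.classify("print_spooler", "Operating_System", "Multiple Virtual Storage")
    rejected = False
except ValueError:
    rejected = True

print(idx.find("Operating_System", "MVS"), rejected)
```

          Because every part classified for MVS carries exactly the term "MVS," the
        index key is predictable, which is one reason search performance can be good.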

          In Free Text Keyword classification parts are described by English words or
        phrases.   The text can be entered by a person who is classifying the part or
        can be extracted automatically from the documentation or source files [13].

          Library science has two general strategies for handling free text indexing:
        the use of controlled and uncontrolled vocabularies.  Three of the four
        methods for classifying software (enumerated, attribute-value, and facets)
        are controlled vocabulary techniques.  Free text, however, can be either
        uncontrolled or controlled.  Free text is uncontrolled when terms are either
        chosen ad hoc by the person classifying the part or are automatically
        extracted from text, either with or without syntax.  Free text is controlled
        when keywords are matched to a synonym list or thesaurus.  Free text is easy
        to implement and use but makes it difficult to match requirements to
        existing software unless, as with attribute-value, a sophisticated search
        tool is used.
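
          The difference between uncontrolled and controlled free text can be
        sketched as below.  The thesaurus entries, the sample description, and the
        function names are assumptions for illustration; a production thesaurus
        would be far larger and would likely handle phrases as well as single words.

```python
import re

# Hypothetical thesaurus: maps each synonym to its preferred keyword.
THESAURUS = {"remove": "delete", "erase": "delete", "insert": "add", "append": "add"}


def uncontrolled_keywords(text):
    """Keywords exactly as they appear in the text (lowercased)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def controlled_keywords(text):
    """Keywords after matching each word against the thesaurus."""
    return {THESAURUS.get(word, word) for word in uncontrolled_keywords(text)}


description = "Erase an entry from the list"
query = "delete entry"

# A part matches when every query keyword appears among its keywords.
uncontrolled_hit = uncontrolled_keywords(query) <= uncontrolled_keywords(description)
controlled_hit = controlled_keywords(query) <= controlled_keywords(description)
print(uncontrolled_hit, controlled_hit)
```

          Without the thesaurus the query fails because the description says
        "erase" rather than "delete"; with it, both map to the same keyword.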


          A   study   on  the  relative  recall,(1)  precision,(2)  search  times,(3)
        overlap,(4) and user preference of each classification method shows that each
        method has strengths and weaknesses, and therefore the use of as many as pos-
        sible is recommended [5].   In fact, IBM uses  all  four  techniques  in  the
        organization  of  the  IBM  corporate reuse environment.   The environment is
        structured into inventories and libraries using enumerated  techniques,  free
        text  is  exploited in searches, and classifying individual reusable units is
        done with facets and attribute-values.
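
          The footnoted definitions of recall and precision translate directly into
        set arithmetic.  The sketch below is only an illustration; the part names
        and the convention of returning 0.0 for an empty denominator are
        assumptions.

```python
def recall(retrieved, relevant):
    """Relevant items retrieved, divided by all relevant items in the database."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant items retrieved, divided by all items retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical search: four relevant parts exist, three parts come back.
relevant = {"sort_a", "sort_b", "sort_c", "sort_d"}
retrieved = {"sort_a", "sort_b", "hash_x"}

print(recall(retrieved, relevant))     # 2 of the 4 relevant parts were found
print(precision(retrieved, relevant))  # 2 of the 3 retrieved parts are relevant
```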


        2.1  Overview of the IBM Classification Scheme

          The IBM reuse environment consists of high level groupings of related
        libraries called inventories.  We use inventories as the entry point for the
        reuser in the environment.  Example inventories include "documents,"
        "commercial software," and "federal software."

          Inventories contain one or more libraries.  Libraries further enumerate
        the options available to the reuser.  Examples of libraries within the
        "federal software" inventory are "Ada collection packages," "flight
        simulation software," and "aerospace navigation software."

          Libraries consist of collections of components.  A component is a bundle
        containing all the information needed to ensure a part can be reused
        efficiently; the component is the basic unit of reuse.  Figure 1 shows
        how libraries consist of components and how components consist of elements of
        information.  The elements can themselves be reusable or can simply be
        information elements, such as "integration instructions," which are
        intended solely to assist the reuser.


---------------

        (1) Recall is the number of relevant items retrieved compared to the
            total number of relevant items in the database.

        (2) Precision is the number of relevant parts retrieved compared  to
            the total number of parts retrieved.

        (3) Search time is the elapsed time required to locate an applicable
            reusable part.

        (4) Overlap is a measure of the total unique parts retrieved by each
            classification method.


         *--------------------------*             *--------------------------*
         |        LIBRARY X         |             |        Component A       |
         |   *------------------*   |             |                          |
         |   |     Library      |   |             |   *------------------*   |
         |   |    Abstract      |   |             |   |    Element 1     |   |
         |   |     Element      |   |             |   |   (reusable)     |   |
         |   *------------------*   |             |   *------------------*   |
         |                          |             |   *------------------*   |
         |   *------------------*   |             |   |    Element 2     |   |
         |   |     Library      |   |             |   |  (information)   |   |
         |   |      Legal       |   |             |   *------------------*   |
         |   |     Element      |   |             |   *------------------*   |
         |   *------------------*   |             |   |    Element 3     |   |
         |--------------------------|             |   |   (reusable)     |   |
         |                          |             |   *------------------*   |
         |   *------------------*   |             |   *------------------*   |
         |   |  Component A     |<--+-------------|   |    Element 4     |   |
         |   *------------------*   |             |   |  (information)   |   |
         |                          |             |   *------------------*   |
         |   *------------------*   |             |   *------------------*   |
         |   |  Component B     |   |             |   |    Element 5     |   |
         |   *------------------*   |             |   |    (reusable)    |   |
         |                          |             |   *------------------*   |
         |   *------------------*   |             |   *------------------*   |
         |   |  Component C     |   |             |   |    Element 6     |   |
         |   *------------------*   |             |   |  (information)   |   |
         |            o             |             |   *------------------*   |
         |            o             |             |            o             |
         |            o             |             |            o             |
         |   *------------------*   |             |            o             |
         |   |  Component N     |   |             |   *------------------*   |
         |   *------------------*   |             |   |    Element n     |   |
         |                          |             |   |                  |   |
         |                          |             |   *------------------*   |
         *--------------------------*             *--------------------------*


        Figure 1. Libraries,  components,  and  elements.   Libraries contain related
                  components.  Components contain related elements.  Elements contain
                  either reusable parts or information about the reusable parts.
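
          The containment structure in Figure 1 can be sketched with a few record
        types.  This is an assumed data model for illustration, not the schema of
        the IBM RSL: the class and field names are hypothetical, and only the
        inventory, library, and element examples quoted in the text are taken from
        the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    name: str
    reusable: bool  # False means the element only informs the reuser


@dataclass
class Component:
    """The basic unit of reuse: a bundle of related elements."""
    name: str
    elements: List[Element] = field(default_factory=list)

    def reusable_elements(self):
        return [element.name for element in self.elements if element.reusable]


@dataclass
class Library:
    name: str
    abstract: str  # library-level abstract element
    legal: str     # library-level legal element
    components: List[Component] = field(default_factory=list)


@dataclass
class Inventory:
    """High level grouping of related libraries; the reuser's entry point."""
    name: str
    libraries: List[Library] = field(default_factory=list)


component_a = Component(
    "Component A",
    [
        Element("source code", reusable=True),
        Element("integration instructions", reusable=False),
        Element("test cases", reusable=True),
    ],
)
library = Library(
    "Ada collection packages",
    abstract="placeholder abstract",
    legal="placeholder legal notice",
    components=[component_a],
)
inventory = Inventory("federal software", [library])
print(component_a.reusable_elements())
```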

          Note that classification involves information that is used specifically for
        searching for reusable components.  The other elements of information
        help reusers evaluate and reuse software; those information
        requirements are not included in this discussion.  However, in some cases
        elements originally intended for informational  purposes  become  part  of  a
        search  mechanism.    For  example,  short  (less  than  one  page) free-text
        abstracts of reusable components are very useful for understanding a reusable
        component.  Although abstracts are not used in faceted  classification,  some
        search  tools,  including the IBM tool, use abstracts to build indices of the
        reuse library or scan the abstract during the search process.
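
          One common way a search tool can build an index from abstracts is an
        inverted index that maps each word to the components whose abstracts contain
        it.  The sketch below illustrates that general technique; it is not a
        description of how the IBM tool works, and the abstracts and function names
        are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(abstracts):
    """Map every word to the set of components whose abstract contains it."""
    index = defaultdict(set)
    for component, text in abstracts.items():
        for word in set(tokenize(text)):
            index[word].add(component)
    return index


def search(index, query):
    """Return components whose abstracts contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical abstracts (real ones would run up to a page).
abstracts = {
    "stack_pkg": "Bounded stack abstract data type in Ada",
    "queue_pkg": "Bounded queue abstract data type in Ada",
    "nav_filter": "Kalman filter for aerospace navigation",
}
index = build_index(abstracts)
print(search(index, "bounded ada"))
print(search(index, "navigation filter"))
```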

          Facets and attribute-values are together called classifiers.  As shown in
        Figure 2, classifiers describe libraries, components, and elements.
        The use of both methods rather than the exclusive use of facets was one of
        the earliest issues addressed when developing the IBM software classification
        scheme.  The need to track non-enumerable data such as individual author name
        and the need to track data with potentially large ranges of values such as
        department codes mandated the use of attribute-value classifiers.

        *-------------------*                       *-----------------------*
        |     LIBRARY X     |                       |      CLASSIFIERS      |
        |                   |                       |                       |
        |  *-------------*  |                       |  *-----------------*  |
        |  |Lib. Abstract|  |<----------------------+--|     Library     |  |
        |  *-------------*  |                       |  *-----------------*  |
        |  *-------------*  |                       |  *-----------------*  |
        |  |Lib. Legal   |  |                *------+--|    Component    |  |
        |  *-------------*  |                |      |  *-----------------*  |
        |                   |                |      |  *-----------------*  |
        |-------------------|                |      |  |     Element     |--+-*
        |  *-------------*  |                |      |  *-----------------*  | |
        |  | Component A |  |                |      |                       | |
        |  *-------------*  |                V      *-----------------------* |
        |  *-------------*  |          *---------------------------------*    |
        |  | Component B |--+--------->|           COMPONENT B           |    |
        |  *-------------*  |          |                                 |    |
        |         o         |          |  *---------------------------*  |    |
        |         o         |          |  |  Element 1  (reusable)    |<-+----*
        |  *-------------*  |          |  *---------------------------*  |
        |  | Component N |  |          |  *---------------------------*  |
        |  *-------------*  |          |  |  Element 2  (information) |  |
        *-------------------*          |  *---------------------------*  |
                                       |                o                |
                                       |                o                |
                                       |  *---------------------------*  |
                                       |  |  Element n                |  |
                                       |  *---------------------------*  |
                                       *---------------------------------*


        Figure 2. Libraries,  components,  elements  and  classifiers.    Classifiers
                  describe  libraries and components.  We use classifiers to identify
                  which libraries to search and to locate  and  retrieve  potentially
                  reusable components.

          The  organization  and required contents of the classification scheme which
        IBM uses are described in [8], [9].


        2.2  The origin of faceted software classification

          In January 1987 Prieto-Diaz and Freeman published an article in IEEE
        Software proposing a faceted classification scheme for software [15], [16].
        Recognizing that previously existing methods of organizing software were
        inadequate for large, continuously growing libraries, they proposed a
        component description format based on a standard vocabulary of terms.  This
        faceted approach was more descriptive and extensible than earlier enumerative
        methods.  The Prieto-Diaz and Freeman approach centers on the six facets
        shown in Table 1 and seeks to provide a preliminary schedule
        intended for functionally identifiable software products ranging from about
        50 to 200 lines of code.

        +----------------------------------------------------------------------------------------+
        | Table 1. Prieto-Diaz and Freeman faceted classification schedule                       |
        +-----------------+-----------------------------------------------+----------------------+
        | FACET           | DESCRIPTION                                   | EXAMPLES             |
        +-----------------+-----------------------------------------------+----------------------+
        | Function        | Specific function performed by the component. | add, delete          |
        +-----------------+-----------------------------------------------+----------------------+
        | Object          | What the component acts on.                   | arrays, files        |
        +-----------------+-----------------------------------------------+----------------------+
        | Medium          | Where the action is executed.                 | buffer, tree         |
        +-----------------+-----------------------------------------------+----------------------+
        | System type     | Functional or application independent area.   | compiler, scheduler  |
        +-----------------+-----------------------------------------------+----------------------+
        | Functional area | Application dependent activity.               | budgeting, DB design |
        +-----------------+-----------------------------------------------+----------------------+
        | Setting         | Where the application takes place.            | advertising, finance |
        +-----------------+-----------------------------------------------+----------------------+
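
          A component description under this schedule is one term per facet.  The
        sketch below shows one possible representation; the facet names and example
        terms come from Table 1, while the component names, the particular
        combinations of terms, and the helper functions are assumptions made for
        illustration.

```python
# The six facets of the schedule, in table order.
FACETS = ("Function", "Object", "Medium", "System type", "Functional area", "Setting")


def make_descriptor(*terms):
    """Build a descriptor with exactly one term per facet."""
    if len(terms) != len(FACETS):
        raise ValueError("a descriptor needs one term per facet")
    return dict(zip(FACETS, terms))


def matches(descriptor, wanted):
    """True when the descriptor agrees with every facet named in the query."""
    return all(descriptor[facet] == term for facet, term in wanted.items())


# Hypothetical catalog; terms are the examples listed in Table 1.
catalog = {
    "array_add": make_descriptor(
        "add", "arrays", "buffer", "compiler", "DB design", "finance"),
    "file_delete": make_descriptor(
        "delete", "files", "tree", "scheduler", "budgeting", "advertising"),
}

hits = [name for name, descriptor in catalog.items()
        if matches(descriptor, {"Function": "add", "Object": "arrays"})]
print(hits)
```

          A query may name any subset of the facets, so a reuser who only knows the
        function and the object can still narrow the search.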


        2.3  Issues with a faceted taxonomy

          Although  many  RSLs  use  the faceted taxonomy, many implementation issues
        need to be addressed.  Our experience shows that the following issues are the
        most significant:


        2.3.1  Consistency

          Keeping the classification consistent requires a thorough understanding  of
        the  domain  or  domains  which the classification covers.  The first step in
        understanding a domain requires working with domain experts and users of  the
        classification  to understand their use of terms.  The analyst must then look
        for conceptual similarities and differences among the terms  of  the  various
        user groups.

          The analyst seeks to create a set of terms at as similar a conceptual level
        as possible by including the variations on a shared concept within a single
        "group" (or root term).  Several techniques can support this grouping,
        including conceptual closeness [15], [6], lattices [4], and thesauri [2].
        IBM uses a synonym matching tool to help unify terms into consistent
        structures.  Because every user group has a select area of specialization,
        small differences of concept become very significant.  For example, the first
        IBM classification schedule had numerous terms related to operating systems
        and aerospace because IBM did a lot of work in these areas.  However, even
        within these domains a term would have several interpretations.  For example,
        a developer working on the MVS operating system may disagree about how
        similar a concept is to its counterpart in the VM operating system.  There
        may be differences, but it is the task of the classification analyst to look
        for where subtle differences in a term will compromise the consistency of the
        rest of the classification scheme.


        2.3.2  Brevity

          Brevity is one of the most important considerations when defining a classi-
        fication  scheme.    The  set of facets and the terms in each facet should be
        kept as brief as possible.  A diverse user group with many different views of
        the salient characteristics of any given domain makes this difficult to
        achieve.  For example, the original classification scheme at IBM included all
        of  the  attributes  which  Grady  Booch used to define his Ada Abstract Data
        Types (ADTs), since this was one of the  first  commercially  available  RSLs
        [1].    Eventually  we  condensed many of the Booch attributes by aggregating
        related functions, characteristics, and features into single facets and  then
        grouping these ADT-unique facets under an ADT domain.


        2.3.3  Resolving ambiguous terms

          Ambiguity of terms arises when more than one group of users will retrieve
        parts from the RSL.  The developers of operating systems and office systems
        both refer to the term "address," which has two completely separate meanings.
        In this case we can resolve the ambiguity fairly easily, but other domains
        have much greater overlap and so require a much more detailed analysis.

          To resolve this we defined a guideline which states that a term cannot
        appear more than once as a root term within the same facet, even across
        domain boundaries.  This guideline helps decrease user confusion and
        simplifies implementation of the RSL.  User confusion occurs when the user
        considers several domains to be good candidates for finding parts but each
        domain contains the same term with different meanings.  An example is the
        term "cursor" for the "Object" facet in the graphical user interface (GUI)
        and operating systems domains.  In a GUI, "cursor" refers to a symbol on the
        output device, whereas in the operating system domain "cursor" refers to a
        type of pointer.

          If the ambiguous term is significant in both domains, we resolve the
        situation by making it a synonym of a root term which developers identify
        within the domain as conceptually similar.  The decision on which domain will
        maintain ownership of the root term usually depends on the existence of an
        appropriate alternative in the other domains.  In the case of "cursor," it
        was retained as a root term for the "Object: user interface" facet (with a
        synonym of "mouse pointer") and "pointer" became the root term for "Object:
        data structure."
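
          The root-term guideline lends itself to a mechanical check.  The sketch
        below flags any root term that appears in more than one domain of a single
        facet.  The "cursor" and "pointer" terms come from the text; the function
        name, the data layout, and the extra terms "window" and "file" are
        assumptions for illustration.

```python
from collections import defaultdict


def duplicate_root_terms(facet_schedule):
    """Return root terms claimed by more than one domain within one facet.

    facet_schedule maps a domain name to the set of root terms it defines.
    """
    owners = defaultdict(list)
    for domain, root_terms in facet_schedule.items():
        for term in root_terms:
            owners[term].append(domain)
    return {term: sorted(domains)
            for term, domains in owners.items() if len(domains) > 1}


# "Object" facet before resolution: both domains define "cursor".
before = {
    "user interface": {"cursor", "window"},
    "data structure": {"cursor", "file"},
}
# After resolution: "pointer" is the root term in the data structure domain.
after = {
    "user interface": {"cursor", "window"},
    "data structure": {"pointer", "file"},
}

print(duplicate_root_terms(before))
print(duplicate_root_terms(after))
```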


        2.3.4  Extensibility

          Any classification scheme must be able to adapt to new requirements.    The
        addition  of  new domains affects the classification structure by potentially
        requiring new facets and the need for new terms in existing facets.  The most
        difficult aspect of extensibility  involves  controlling  the  ambiguity  and
        brevity of the classification scheme while being responsive to user needs and
        concerns.

          A  decision  to  limit  addition  of  terms (and control brevity) made on a
        purely conceptual basis may not be acceptable to one or more user sets  whose
        terms  are  being condensed or grouped.  Experience can be used to weight the
        factors to determine a solution.  The factors can include:

        o   willingness of the groups  involved  to  accept  an  explanation  of  the
            rationale used,
        o   number of proposed changes which will not be accepted,
        o   frequency of use of the RSL by the groups,
        o   number of users in each group, and
        o   cohesiveness of each group.


        2.3.5  Maintaining facets and terms

          The dynamic nature of programming requires constant maintenance of not only
        software  but  also  of  the  terms  needed to describe it.   For example, we
        changed the facet "Function" to "Function (What it does)" because many people
        confused the intended meaning with several  other  possibilities.    We  also
        changed  the  facet  "Proven  Operating  System" to "Software Environment" to
        accommodate network operating systems and other environments (such  as  IBM's
        Customer  Information  Control System (CICS)) which are not technically oper-
        ating systems.
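
          When a facet is renamed, existing classifications that still use the old
        name must keep working.  One simple way to handle this, sketched below under
        our own assumptions, is a rename table applied when a classification is
        read.  The two renames come from the text; the function names and the
        sample classification are hypothetical, and the text does not say how the
        IBM RSL performed the migration.

```python
# Old facet name -> current facet name (both renames are described in the text).
FACET_RENAMES = {
    "Function": "Function (What it does)",
    "Proven Operating System": "Software Environment",
}


def current_facet_name(name):
    """Follow the rename table until a name with no further rename is reached."""
    seen = set()
    while name in FACET_RENAMES and name not in seen:
        seen.add(name)  # guards against an accidental rename cycle
        name = FACET_RENAMES[name]
    return name


def migrate(classification):
    """Rewrite a facet -> term mapping so that it uses current facet names."""
    return {current_facet_name(facet): term for facet, term in classification.items()}


# Hypothetical classification recorded under the old facet names.
old = {"Function": "sort", "Proven Operating System": "MVS"}
new = migrate(old)
print(new)
```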


        2.3.6  Balancing administrative and technical needs

          Technical users frequently comment that they only want a few simple  facets
        (e.g., "their" facets and terms).  They request simplification of the classi-
        fication schedule by removing all facets that they feel do not apply to them.
        In  fact,  the  facets they target for removal are usually the administrative
        classifiers which identify the library, owner, legal, and other  restrictions
        related to reusing the component.  Our experience shows these administrative
        classifiers and information elements must be included to provide  the  neces-
        sary  information  to all organizations affected by the component: designers,
        testers, maintainers, legal reviewers, etc.    However,  part  suppliers  and
        reusers  can  find the wealth of information surrounding a reusable component
        somewhat overwhelming.


                         3.0  Experiences with the faceted taxonomy


          Early work in faceted classification provided the architectural  basis  for
        the  IBM  taxonomy [15].  We redesigned an existing prototype reuse system to
        address needs of an expanded audience and range of life cycle products.    We
        then  implemented  the  faceted taxonomy in the new tool.  Since establishing
        the initial set of classifiers, we have conducted  two  major  classification
        evaluations  and re-designs.  Furthermore, we made approximately forty opera-
        tional updates to the classification  to  incorporate  changes  requested  by
        nearly constant user feedback.


        3.1  The first set of IBM facets

          IBM  based  the first set of facets on the work done for the STARS program.
        Many of the same individuals working on the IBM STARS team also worked on the
        IBM RSL.  However, due to differences in the audiences and goals of the RSLs,
        the resulting classifications and RSLs are completely different tools.

          The best feature of the first set of IBM facets was the  close  mapping  of
        the  classification  to one of our major libraries, Booch's Ada abstract data
        types.  This strength rapidly disappeared as we encountered terminology
        differences in other ADT libraries and added avionics part sets from
        IBM Federal Systems Company.  The avionics
        library included math models, ADTs, and fairly large reusable components  for
        navigation and guidance.

          Agreement  on  the  first  set  of  facets was difficult due to the need to
        expand the scope of the library.  For example, we did not have adequate exam-
        ples of how to classify user documentation, designs, or test cases.    Conse-
        quently,  the  first  set  of facets included internal (process) and external
        (customer) documentation, but did not attempt to  cover  the  other  software
        life  cycle products.   We reviewed the classification quite extensively, but
        until recently we did not have sufficient experience to  determine  the  ade-
        quacy  of  the scheme.  Although few changes have been requested, we continue
        to conduct research in this area.


        3.2  Current set of Classifiers

          The current set of IBM classifiers resulted from the issues and experiences
        described above.  These classifiers have proven to work across all phases  of
        the  software  lifecycle.   They have also proven flexible enough to describe
        and discriminate between parts developed by  contractors,  vendors,  and  IBM
        organizations  over  the  wide diversity of environments in which we operate.
        Despite these successes, user feedback alerts us  to  a  continuing  need  to
        analyze usability and retrieval precision [19].


          A partial list of the IBM classifiers appears in the following table,
        which shows the evolution from the original set of facets defined by
        Prieto-Diaz and Freeman (Table 1 on page 7).  IBM's experience has also
        led to a classifier set with similarities to those of the other
        companies or groups shown in 4.0, "Related Work" on page 15.


        +------------------------------------------------------------------+
        | Table 2. Partial listing of the IBM classifiers                  |
        +---------+----------------------------------------------+---------+
        | FACET   | DESCRIPTION                                  | EXAM-   |
        |         |                                              | PLES    |
        +---------+----------------------------------------------+---------+
        | Algo-   | Technique to perform an action               | bubble, |
        | rithm   |                                              | merge   |
        +---------+----------------------------------------------+---------+
        | Appli-  | Broad area of application                    | adver-  |
        | cation  |                                              | tising, |
        | Domain  |                                              | aero-   |
        |         |                                              | space   |
        +---------+----------------------------------------------+---------+
        | Certif- | IBM quality rating                           | Certi-  |
        | ication |                                              | fied,   |
        | Level   |                                              | as-is   |
        +---------+----------------------------------------------+---------+
        | Compo-  | Component size                               | 512     |
        | nent    | in LOC or words                              | LOC,    |
        | Size    |                                              | 8245    |
        |         |                                              | words   |
        +---------+----------------------------------------------+---------+
        | Data    | How the data object                          | bounded,|
        | Struc-  | is implemented.                              | directed|
        | ture    |                                              |         |
        | Charac- |                                              |         |
        | teristic|                                              |         |
        +---------+----------------------------------------------+---------+
        | Devel-  | Compliance with                              | ISO     |
        | opment  | standards                                    | 9000,   |
        | Stand-  |                                              | DoD-2167|
        | ards    |                                              |         |
        +---------+----------------------------------------------+---------+
        | Func-   | What the component does.                     | sort,   |
        | tion    |                                              | add     |
        +---------+----------------------------------------------+---------+
        | Object  | What the component acts on.                  | buffer, |
        |         |                                              | array   |
        +---------+----------------------------------------------+---------+
        | Imple-  | Describes various                            | sequen- |
        | menta-  | techniques.                                  | tial,   |
        | tion    |                                              | dynamic |
        +---------+----------------------------------------------+---------+
        | Imple-  | Programming                                  | C++,    |
        | menta-  | language                                     | Ada     |
        | tion    |                                              |         |
        | Lan-    |                                              |         |
        | guage   |                                              |         |
        +---------+----------------------------------------------+---------+
        | National| National                                     | English,|
        | Lan-    | language                                     | French  |
        | guage   |                                              |         |
        +---------+----------------------------------------------+---------+
        | Proven  | Tested computer                              | R/6000, |
        | Hard-   | platforms                                    | S390    |
        | ware    |                                              |         |
        +---------+----------------------------------------------+---------+
        | Proven  | Tested                                       | AIX,    |
        | O.S.    | Operating Systems                            | OS/2    |
        +---------+----------------------------------------------+---------+
        | Support | Guaranteed                                   | Limited,|
        | Level   | service provided                             | Full    |
        +---------+----------------------------------------------+---------+
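
          As a minimal sketch of how a part might carry the classifiers listed in
        Table 2, assume a simple in-memory record that maps each facet to a set of
        terms.  The names Component and matches, and the sample part itself, are
        illustrative assumptions and do not describe the IBM RSL's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A reusable part described by facet -> set-of-terms pairs."""
    name: str
    facets: dict[str, set[str]] = field(default_factory=dict)

    def matches(self, query: dict[str, str]) -> bool:
        """True when every queried facet holds the requested term."""
        return all(term in self.facets.get(facet, set())
                   for facet, term in query.items())


# Hypothetical part classified with facets and example terms from Table 2.
sort_part = Component(
    name="array_sort",
    facets={
        "Function": {"sort"},
        "Object": {"array"},
        "Algorithm": {"merge"},
        "Implementation Language": {"Ada"},
        "Certification Level": {"as-is"},
    },
)

found = sort_part.matches({"Function": "sort", "Object": "array"})
missed = sort_part.matches({"Function": "sort",
                            "Implementation Language": "C++"})
```

          A query names only the facets the reuser cares about; facets left out of
        the query do not restrict the match.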


        3.3  Further enhancements

          Although  IBM  has  been  involved  with  formal  reuse for many years, the
        ability to satisfy the reuser customer set requires continuous  evolution  of
        the available capabilities.


        3.3.1  Integration with text search techniques

          Although the extensive list of classifiers provided a complete descriptive
        scheme, customers requested the ability to execute keyword searches in
        addition to classifier searches, for several reasons.  Groups with limited
        domains are often familiar with the parts available for their domain and
        choose to access them directly by an appropriate keyword search.  Other
        groups find the time required to understand the classification a luxury
        which they cannot justify (a real example of short-term goals offsetting
        long-term benefits).

          We encourage reusers to use the faceted classification to  locate  feasibly
        usable  components  from  a  broad  set  of libraries, then to iterate on the
        search results through use of keyword search.  This two-fold  approach  helps
        control  search  costs and provides options for very specialized winnowing of
        the first order selection.
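
          The two-fold approach can be sketched as a facet filter followed by a
        keyword filter over the surviving parts.  This is only an illustration
        under stated assumptions: the catalog entries, the "abstract" field, and
        the function names facet_search and keyword_refine are hypothetical.

```python
def facet_search(parts, query):
    """First pass: keep parts whose facets contain every queried term."""
    return [p for p in parts
            if all(term in p["facets"].get(facet, ())
                   for facet, term in query.items())]


def keyword_refine(parts, keyword):
    """Second pass: keep parts whose free-text abstract mentions the keyword."""
    needle = keyword.lower()
    return [p for p in parts if needle in p["abstract"].lower()]


# Hypothetical catalog; facet names and terms follow Table 2.
catalog = [
    {"name": "nav_filter", "abstract": "Kalman filter for navigation",
     "facets": {"Application Domain": {"aerospace"},
                "Implementation Language": {"Ada"}}},
    {"name": "guid_math", "abstract": "Math models for guidance",
     "facets": {"Application Domain": {"aerospace"},
                "Implementation Language": {"Ada"}}},
    {"name": "ad_sort", "abstract": "Sorts advertising records",
     "facets": {"Application Domain": {"advertising"},
                "Implementation Language": {"C++"}}},
]

first_pass = facet_search(catalog, {"Application Domain": "aerospace"})
second_pass = keyword_refine(first_pass, "navigation")
```

          Running the cheaper, broader facet query first keeps the keyword pass
        small, which is the cost control the approach aims for.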


        3.3.2  Hierarchical facets

          To further reduce the number of classifiers initially shown to the  reuser,
        we  defined a hierarchical ordering of existing classifiers.  By grouping the
        facets on which users did not want to focus (such as administrative  facets),
        the  programmer  can  more  quickly re-define the correct search scope as his
        requirements evolve.  Because of the interdependence among facets, the
        hierarchical structure is  not  pure.    The  three-dimensional  model  which
        reflects the actual dependencies of the independent views (e.g., facets) must
        be simplified and to some extent hidden from the user by the RSL's interface.
        Hierarchical  facets  are one of the most effective ways found so far to help
        accomplish this goal.
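
          One way to picture the grouping is a two-level structure in which the
        interface shows group headings and expands only the groups the reuser
        selects.  The facet names below come from Table 2, but the group names,
        the assignment of facets to groups, and the function visible_facets are
        assumptions made for illustration.

```python
# Hypothetical two-level grouping of classifiers.
FACET_GROUPS = {
    "Technical": ["Function", "Object", "Algorithm",
                  "Implementation Language"],
    "Environment": ["Proven Hardware", "Proven O.S."],
    "Administrative": ["Certification Level", "Support Level",
                       "Development Standards"],
}


def visible_facets(expanded):
    """Return display rows: group headings, plus facets of expanded groups."""
    rows = []
    for group, facets in FACET_GROUPS.items():
        rows.append(group)
        if group in expanded:
            rows.extend("  " + facet for facet in facets)
    return rows


initial_view = visible_facets({"Technical"})
```

          With only the technical group expanded, the reuser sees six facet rows
        fewer than the full schedule while the administrative classifiers remain
        one step away.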


        3.3.3  Domain requirements

          Despite extensive domain research and analysis, no set of classifiers  will
        satisfy  all  users.   Development organizations with unique products or com-
        puting environments demanded the ability to create custom sets of classifiers
        for use by their organization.   Differences in jargon  and  culture  further
        complicate  customer  needs.    As  a result, we adopted a mechanism to allow
        selected user groups to add facets for local use.
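
          A local-facet mechanism of this kind might look like the following
        sketch, in which each group sees the corporate schedule plus its own
        additions.  The group names, the facet "Flight Phase", and the functions
        add_local_facet and schedule_for are hypothetical; the source does not
        describe how the IBM RSL implements the mechanism.

```python
CORPORATE_FACETS = {"Function", "Object", "Algorithm", "Application Domain"}

# Hypothetical local extensions, keyed by user group.
local_facets = {}


def add_local_facet(group, facet):
    """Register a facet for one group; corporate facet names stay reserved."""
    if facet in CORPORATE_FACETS:
        raise ValueError(f"{facet!r} is already a corporate facet")
    local_facets.setdefault(group, set()).add(facet)


def schedule_for(group):
    """Facets a group sees: the corporate set plus its own local additions."""
    return CORPORATE_FACETS | local_facets.get(group, set())


add_local_facet("avionics", "Flight Phase")
```

          Keeping local facets separate from the corporate set means one group's
        jargon does not enlarge the schedule shown to every other group.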

          The object-oriented development environment offers a challenge for the
        adaptation of faceted classification [10].  We must encourage object-oriented
        programmers to use and contribute to corporate assets.  They are usually ready and
        willing to do so, especially if their compilers, browsers, and other develop-
        ment tools integrate well with the corporate RSL.


        3.3.4  User perspectives

          We  face a fundamental problem in classification in that reusers often have
        different understandings of the terms.    The  cognitive  processing  of  the
        potential  reuser often causes the reuser to develop a concept of the problem
        in such a way as to allow for a variety of potential solutions.   This  tend-
        ency  to  think  in  terms  of  the "solution space" rather than the "problem
        space" means the reuser will fail to consider potential solutions which  fall
        outside the scope of the solution as he sees it.

          The nature of language and of classification for reuse leaves room for
        interpretation.  Once the analyst has completed the classification, the
        responsibility to interpret the classification and think in the problem
        space shifts to the reuser.  Education on classification and reuse must
        accompany the installation of an RSL.  Furthermore, the education must
        extend beyond use of the features of the RSL; it must provide the
        potential reuser with the ability to flexibly evaluate a problem and find
        full or partial solutions.
        The following experience illustrates the importance  of  reuse  education  in
        determining  the  appropriate use of Abstract Data Types (ADTs) developed for
        reuse.

          One of our development projects reported a performance problem with a  list
        ADT from our RSL which they chose to reuse in their code.  Our technical con-
        sultant investigated with questions related to their abstraction requirements
        rather  than the specifics of the alleged performance problem.  He found that
        the abstraction required a direct access method,  multiple  iterations,  fast
        sequential  access,  fast insertion, and fast key-based retrieval.  He recom-
        mended that the project use an AVL-tree rather than a list.  With this choice
        of ADT, the performance problem went away.
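
          The consultant's reasoning, starting from the abstraction requirements
        rather than from a preselected structure, can be sketched as a set-cover
        check.  The capability ratings below are textbook generalizations that we
        assume for illustration ("fast" meaning better than a linear scan); they
        are not data from the RSL, and the function name candidates is
        hypothetical.

```python
# Assumed capability sets for three common ADTs.
ADT_CAPABILITIES = {
    "list": {"multiple iterations", "fast sequential access",
             "fast insertion"},
    "hash table": {"direct access", "multiple iterations",
                   "fast insertion", "fast key-based retrieval"},
    "AVL-tree": {"direct access", "multiple iterations",
                 "fast sequential access", "fast insertion",
                 "fast key-based retrieval"},
}


def candidates(required):
    """Return the ADTs whose capabilities cover every stated requirement."""
    return [adt for adt, caps in ADT_CAPABILITIES.items()
            if required <= caps]


# Requirements reported for the project in the text above.
required = {"direct access", "multiple iterations", "fast sequential access",
            "fast insertion", "fast key-based retrieval"}
suitable = candidates(required)
```

          Under these assumed ratings only the AVL-tree covers all five
        requirements, which is consistent with the recommendation described
        above.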


        3.4  When to apply facets

          The use of facets depends on the situation.   The  previous  sections  have
        discussed  some of the benefits and some of the limitations of use of faceted
        classifications.  IBM's reusers are a heterogeneous set of developers located
        around the world.  They develop software ranging from  operating  systems  to
        business  applications  to medical systems and write in programming languages
        ranging from COBOL to C++ to Assembler.  The needs of such  a  diverse  group
        require a mechanism such as a faceted classification.

          Few  other  companies  have as extreme a set of programming requirements as
        IBM.  As the set of requirements scales down (smaller groups of
        programmers, fewer languages, and fewer domains), the need for a faceted
        classification decreases.  At some point, the maintenance requirements and
        overhead for retrieval cause facets to become a detriment.

                       Decreasing need for facets
                  ------------------------------------->
             distributed sites                   one group
                many assets                      few assets

          Remember  that  the  reason we use facets is to provide a systematic method
        for focusing on the problem space rather than the solution space.  This means
        that a programmer can use code written for an entirely different application,
        often even in a different language, when the programmer focuses on  what  the
        code  accomplishes  (and  its  performance characteristics) and not on a pre-
        conceived solution.  Finding these opportunities for reuse depends on a reli-
        able classification method and RSL.


                                      4.0  Related Work


          The following discussion is limited to those techniques  based  on  library
        science.    However,  library  science  is not the only field contributing to
        classification research.  For example, the mathematical field of category
        theory  has established methods to characterize algebraic structures.  Formal
        specification languages using relation  algebras  can  formally  specify  the
        behavior  of  software components [12].  Denotational semantics and predicate
        calculus specifications consisting of preconditions and invariant  assertions
        can  also  provide  accurate,  provably correct descriptions of software [7].
        However, most operational libraries rely on principles adapted  from  library
        science.

          Facets  are  a  proven  method  of  organizing  information and are used in
        numerous disciplines, such as zoology.  Within computer science, facets  have
        been  used  to  classify  documents  for reuse [11], to organize and retrieve
        appropriate programming tools [14], and to group, analyze, and predict errors
        in software development [3], [18].


          Table 3 on page 16 contains a sample of operational  reuse  libraries,  the
        organizations  which sponsor or developed them, and the classification method
        used by each.   Several of  these  libraries  have  been  selected  for  more
        detailed description below.

        +------------------------------------------------------------------+
        | Table 3. Sample Reuse Libraries                                  |
        +---------+----------------------------------------------+---------+
        | LIBRARY | DEVELOPER                                    | CLASSI- |
        |         |                                              | FICATION|
        |         |                                              | METHOD  |
        +---------+----------------------------------------------+---------+
        | Catalog | Bell Labs                                    | Free    |
        |         |                                              | text    |
        +---------+----------------------------------------------+---------+
        | Asset-  | Dept. of Defense                             | Faceted |
        | CARDS   |                                              |         |
        +---------+----------------------------------------------+---------+
        | REBOOT  | Europe                                       | Faceted |
        +---------+----------------------------------------------+---------+
        | Reuse   | Texas Instruments                            | Free    |
        |         |                                              | text    |
        |         |                                              | (con-   |
        |         |                                              | trolled |
        |         |                                              | keyword)|
        +---------+----------------------------------------------+---------+
        | RAASP   | Westinghouse                                 | Enumer- |
        |         |                                              | ated    |
        |         |                                              | Faceted |
        +---------+----------------------------------------------+---------+
        | RLF     | Unisys                                       | Faceted |
        +---------+----------------------------------------------+---------+
        | Asset   | GTE                                          | Faceted |
        | Library |                                              |         |
        +---------+----------------------------------------------+---------+
        | RSL     | Intermetrics                                 | Free    |
        |         |                                              | Text    |
        |         |                                              | (uncon- |
        |         |                                              | trolled |
        |         |                                              | keyword)|
        |         |                                              | Enumer- |
        |         |                                              | ated    |
        +---------+----------------------------------------------+---------+
        | CAMP-PES| US Air Force                                 | Enumer- |
        |         |                                              | ated    |
        |         |                                              | Attri-  |
        |         |                                              | bute-   |
        |         |                                              | Value   |
        +---------+----------------------------------------------+---------+


        4.1  REBOOT

          In Europe, the ESPRIT project REBOOT (REuse Based on Object-Oriented
        Techniques) has also chosen a component-based approach to reuse.  REBOOT
        is a major four-year study being conducted by a consortium of European
        companies, including Bull S.A., Cap Gemini Innovation, and Siemens.
        REBOOT components are like IBM components in that they bundle work
        products from all phases of the software development cycle.  REBOOT
        focuses on object technology, and chose a facet-based classification
        adapted for object-oriented components.  The four facets, shown in
        Table 4 on page 17, each have an associated term space which is connected
        in a specialization/generalization hierarchy with synonyms [10].

        +------------------------------------------------------------------+
        | Table 4. The REBOOT faceted classification schedule              |
        +---------+----------------------------------------------+---------+
        | FACET   | DESCRIPTION                                  | EXAM-   |
        |         |                                              | PLES    |
        +---------+----------------------------------------------+---------+
        | Abstrac-| A noun that characterizes the component.     | stack,  |
        | tion    |                                              | queue   |
        +---------+----------------------------------------------+---------+
        | Oper-   | What the component does.                     |         |
        | ations  |                                              |         |
        +---------+----------------------------------------------+---------+
        | Oper-   | What the component acts on.                  | inte-   |
        | ates On |                                              | gers,   |
        |         |                                              | set     |
        +---------+----------------------------------------------+---------+
        | Depend- | Characteristics which affect reuse.          | C++     |
        | encies  |                                              | based,  |
        |         |                                              | Unix-   |
        |         |                                              | based   |
        +---------+----------------------------------------------+---------+
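
          A term space with a specialization/generalization hierarchy and
        synonyms can be sketched with two small mappings.  The terms "sequence",
        "collection", "lifo", and "fifo", and all function names, are assumptions
        chosen for illustration; they are not taken from the REBOOT term space.

```python
# Hypothetical fragment of a term space for one facet.
PARENT = {"stack": "sequence", "queue": "sequence", "sequence": "collection"}
SYNONYMS = {"lifo": "stack", "fifo": "queue"}


def canonical(term):
    """Map a synonym to its preferred term; other terms pass through."""
    return SYNONYMS.get(term.lower(), term.lower())


def generalizations(term):
    """Walk up the hierarchy from a term to the root, most specific first."""
    chain = [canonical(term)]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def matches(query_term, component_term):
    """A component matches when its term equals or specializes the query."""
    return canonical(query_term) in generalizations(component_term)


lifo_chain = generalizations("LIFO")
```

          A query for a general term then retrieves components classified under
        any of its specializations, while a query for one specific term does not
        retrieve its siblings.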


        4.2  STARS and Reuse Interoperability Group

          Current work by the Reuse Interoperability  Group  (RIG),  an  organization
        originating  from  the  Software  Technology  for Adaptable, Reliable Systems
        (STARS)  program,  is  based  on   facets,   attribute-value,   and   keyword
        classifiers.  The RIG is a volunteer group formed by the United States
        government and private organizations to investigate, develop, and propose
        standards to assist in the exchange of information between heterogeneous reuse
        libraries.    The  RIG  proposes  a  bi-level  model  in  which  one  set  of
        classifiers, the Basic Interoperability Data Model (BIDM),  is  a  subset  of
        another  functionally complete set, the Universal Data Model (UDM) [17].  The
        choice of model depends on the application.  The two models are object  based
        and  use  attributes and facets to describe the components.  The complete UDM
        consists of 59 attributes or information items.   The BIDM is  the  core,  or
        subset,  of  21 items.  Since the purpose of the models is to help assess the
        reusability of a part, many of the items are  for  information  and  are  not
        likely  to  be used in classification.  Table 5 on page 18 lists those likely
        to be used in classification.


        +------------------------------------------------------------------+
        | Table 5. Partial list of the RIG classification schedule         |
        +---------+----------------------------------------------+---------+
        | FACET   | DESCRIPTION                                  | EXAM-   |
        |         |                                              | PLES    |
        +---------+----------------------------------------------+---------+
        | Abstract| General text description of the asset.       |         |
        +---------+----------------------------------------------+---------+
        | Descrip-| Word(s) describing the asset.                | sort,   |
        | tive    |                                              | add     |
        | Keyword |                                              |         |
        +---------+----------------------------------------------+---------+
        | Domain  | Broad area of application                    | avionics|
        |         |                                              | guid-   |
        |         |                                              | ance    |
        |         |                                              | system  |
        +---------+----------------------------------------------+---------+
        | Lan-    | Language (usually computer) used.            | Ada,    |
        | guage   |                                              | COBOL   |
        +---------+----------------------------------------------+---------+
        | Target  | Computer, OS, and compiler types.            | R/6000, |
        | Envi-   |                                              | AIX,    |
        | ronment |                                              |         |
        +---------+----------------------------------------------+---------+
        | Element | The type of asset.                           | require-|
        | Type    |                                              | ments,  |
        |         |                                              | make    |
        |         |                                              | file    |
        +---------+----------------------------------------------+---------+
        | Media   | How element is                               | CD-ROM, |
        |         | available                                    | elec-   |
        |         |                                              | tronic  |
        +---------+----------------------------------------------+---------+


        4.3  Intermetrics

          Intermetrics uses a combination of attributes and keywords to classify
        objects in their Reusable Software Library (RSL).  They permit the user to
        list up to five descriptive keywords for each component and do not
        restrict the length or content of these keywords.  The RSL also uses a
        category code system similar to that used by libraries.  A partial
        listing of the RSL classification appears in Table 6 on page 19 [2].


        +------------------------------------------------------------------+
        | Table 6. Partial listing of the Intermetrics RSL schedule        |
        +----------------+-------------------------------------------------+
        | ATTRIBUTE      | DESCRIPTION                                     |
        +----------------+-------------------------------------------------+
        | Unitname       | Name of the reusable object.                    |
        +----------------+-------------------------------------------------+
        | Category Code  | A predefined code describing the component      |
        |                | function.                                       |
        +----------------+-------------------------------------------------+
        | Machine        | The computer on which the component was pro-    |
        |                | grammed.                                        |
        +----------------+-------------------------------------------------+
        | Compiler       | The compiler on which the component was pro-    |
        |                | grammed.                                        |
        +----------------+-------------------------------------------------+
        | Keywords       | Up to five unrestricted keywords may be pro-    |
        |                | vided.                                          |
        +----------------+-------------------------------------------------+
        | Requirements   | Information about things the component needs    |
        |                | to run.                                         |
        +----------------+-------------------------------------------------+
        | Overview       | A brief text description of the component.      |
        +----------------+-------------------------------------------------+
        | Errors         | Describes error handling and exception han-     |
        |                | dling.                                          |
        +----------------+-------------------------------------------------+
        | Algorithm      | Describes the algorithm used by the component.  |
        +----------------+-------------------------------------------------+
        | Documentation  | Describes where to find information and test    |
        | and Testing    | cases for the component.                        |
        +----------------+-------------------------------------------------+
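
          An attribute-and-keyword record of this kind can be sketched as
        follows.  Only the attribute names and the five-keyword limit come from
        the text; the class name RslEntry, the sample values, and the category
        code "DS.01" are hypothetical.

```python
from dataclasses import dataclass, field

MAX_KEYWORDS = 5  # the text states a limit of five keywords per component


@dataclass
class RslEntry:
    """Hypothetical record holding a few of the attributes in Table 6."""
    unitname: str
    category_code: str
    overview: str = ""
    keywords: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Keyword content and length are unrestricted; only the count is.
        if len(self.keywords) > MAX_KEYWORDS:
            raise ValueError(f"at most {MAX_KEYWORDS} keywords are allowed")


entry = RslEntry("stack_pkg", "DS.01", "Bounded stack",
                 ["stack", "LIFO", "bounded"])
```

          Because the keywords are uncontrolled, two analysts may describe the
        same part differently; the predefined category code supplies the
        controlled half of the scheme.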


                                      5.0  Future Work


          The  dynamic  nature of our development environment demands ongoing mainte-
        nance and enhancement of the classification schedule so we can remain respon-
        sive to customers.  In addition, all corporate practices and guidelines  must
        include  the  standards  for  software  classification  and reuse for them to
        become fully ingrained in the  development  process.    Identifying  affected
        areas, including new technologies which overlap with reuse (such as
        object-oriented technology), ensures that we continue our progress.

          As we build our sets of reusable parts into new domains and  languages,  we
        must expand the set of classifiers to embrace the new concepts and terms.  We
        constantly  apply what we have learned in our early classification experience
        to new areas.  These include defining tool requirements to ease part
        retrieval and developing standards to support packaging and storing of
        reusable parts.

          One  area  of  future work involves the level of abstraction in the current
        set of facets and terms.   We must provide the  appropriate  abstractions  to
        conduct  optimal  searches  of  class  libraries  containing  a wide range of
        methods and features.  This conflicts with the desire to reduce  the  overall
        number  of  facets for general usability, since we tend to add new facets and
        classifiers as we add new functions and classes to the RSL.

          A natural synergism exists between object-oriented programming and reuse.
        Although many organizations that now develop in COBOL or other older
        languages do not believe in the potential of OO, we believe that class
        libraries will make reuse a mainstream technology in software
        development.  To accommodate class libraries in the existing RSL requires
        special operators, alternate system designs, and special user interfaces
        that provide graphical browsing and retrieval.


                                       6.0  Conclusion


          A faceted taxonomy applies best when many  parts  must  be  shared  between
        diverse,  geographically  distributed organizations.  Under these conditions,
        facets provide one vehicle  to  normalize  terminology  and  programmer  pre-
        dispositions  to a particular solution.  Facets provide a systematic approach
        to the problem space, thereby providing opportunities  to  locate  and  reuse
        parts which might otherwise not contribute to a solution.

          However,  facets  cannot  satisfy  all classification requirements.   Other
        techniques must also contribute to the knowledge base,  including  attribute-
        values, enumerated, and free-text.  Users demand flexibility and require more
        options.    The  classification analyst must balance the needs of the various
        user groups and provide a meaningful,  accurate,  and  useful  classification
        standard.    Failing  to  moderate  the  numerous demands for information and
        special requirements can result in overwhelming the user with information.


                                    7.0  Cited References


        [1]  Booch, Grady.  Software Components with Ada: Structures, Tools, and
             Subsystems.  Benjamin Cummings, Menlo Park, CA, 1987.


        [2]  Burton, Bruce A., et al., "The Reusable Software Library," IEEE
             Software, July 1987, pp. 25-33.

        [3]  Chillarege, R., et.al., "Orthogonal Defect Classification- A Concept for
             In-Process  Measurements,"  IEEE  Transactions  on Software Engineering,
                                         ____________________________________________
             Vol.18, No.11., November 1992, pp. 943-956.

        [4]  Eichman, David and John Atkins, "Design of a Lattice-Based Faceted Clas-
             sification System," Proceedings of the Second  International  Conference
                                 ____________________________________________________
             on  Software  Engineering  and  Knowledge Engineering, Skokie, IL, 21-23
             ______________________________________________________
             June 1990.

        [5]  Frakes, William, "Empirical Study of Representation Methods  for  Reuse-
             able Software," Software Engineering Guild Presentation, Yorktown, NY, 7
                             ________________________________________
             February 92.

        [6]  Gagliano,  R.A.,  M.D.  Fraser, G.S. Owen, and P.A. Honkanen, "Issues in
             reusable Ada library tools," Empirical Foundations  of  Information  and
                                          ___________________________________________
             Software Science, 1990, pp. 427-35.
             _________________

        [7]  Goguen,  Joseph  A.,  "Parameterized  Programming," IEEE Transactions on
                                                                 ____________________
             Software Engineering, Vol. SE-10, No. 5, September 1984, pp. 528-543.
             _____________________

        [8]  "IBM Reuse Methodology: Classification  Standards  for  Reusable  Compo-
             nents," IBM Document Number Z325-0681, 2 October 1992.
                     ______________________________

        [9]  "IBM   Reuse   Methodology:   Qualification   Standards   for   Reusable
             Components," IBM Document Number Z325-0683, 2 October 1992.
                          ______________________________

        [10] Karlsson, Even-Andre, Sivert Sorumgard, and Eirik Tryggeseth.   "Classi-
             fication  of  Object-Oriented  Components  for  Reuse,"  Proceedings  of
                                                                      _______________
             TOOLS'7,, Dortmund, Prentice-Hall, 1992.
             _________

        [11] Laitinen, Kari, "Document Classification for Software Quality  Systems,"
             ACM Software Engineering Notes, V. 17, No. 4, October 1992, pp.32-9.
             _______________________________

        [12] Litvintchouk,  Steven  D. and Allen S. Matsumoto, "Design of Ada Systems
             Yielding Reusable Components: An  Approach  Using  Structured  Algebraic
             Specification,"  IEEE  Transactions on Software Engineering, Vol. SE-10,
                              ___________________________________________
             No. 5, September 1984, pp. 544-551.

        [13] Maarek, Yoelle S., Daniel M. Berry, and Gail E. Kaiser, "An  Information
             Retrieval  Approach  for Automatically Constructing Software Libraries,"
             IEEE Transactions on Software Engineering, Vol. 17, No. 8, August  1991,
             __________________________________________
             pp. 800-813.

        [14] Pfleeger,  S.L.    Fitzgerald,  J.C.,  Jr., " Software metrics tool kit:
             support for selection, collection and analysis," Inf.  Softw.  Technol.,
                                                              _______________________
             Vol.33, No.7 September 1991 pp. 477-82.

        [15] Prieto-Diaz,  Ruben, and Peter Freeman, "Classifying Software for Reusa-
             bility," IEEE Software, Jan. 1987, pg. 6-16.
                      ______________


        [16] Prieto-Diaz, Ruben, "Implementing Faceted  Classification  for  Software
             Reuse," Communications of the ACM,, Vol. 34, No. 5, May 1991, pp. 88-97.
                     ___________________________

        [17] RIG  Subcommittee  Draft  Standard  SDS-00001  Version 2, "A Basic Reuse
             Interoperability Model  for  Reuse  Libraries,"  Reuse  Interoperability
                                                              _______________________
             Group Technical Committee #2, 5 February 1993.
             _____________________________

        [18] Straub,  Pablo  A.  and  Eduardo  J.  Ostertag,  "EDF:  A  Formalism for
             Describing and Reusing Software Experience,"  Proceedings  of  the  1991
                                                           __________________________
             International Symposium on Software Reliability Engineering, Austin, TX,
             ____________________________________________________________
             17-18 May 1991, pp. 106-13.

        [19] Yglesias,   Kathryn  P.,  "Limitations  of  Certification  Standards  in
             Achieving Successful Parts Retrieval," Proceedings of the  5th  Interna-
                                                    _________________________________
             tional  Workshop on Software Reuse, Palo Alto, California, 26-29 October
             ___________________________________
             1992.


                                       8.0  Biography


          JEFFREY S. POULIN joined IBM's Reuse Technology Support Center,
        Poughkeepsie, New York, in 1991 as an advisory programmer. His primary
        responsibilities include developing and applying corporate standards for
        reusable component classification, certification, and measurements. He
        participates in the IBM Corporate Reuse Council and the Association for
        Computing Machinery, and serves as Vice-Chair of the Mid-Hudson Valley
        Chapter of the IEEE Computer Society. A Hertz Foundation Fellow, Dr.
        Poulin earned his Bachelor's degree at the United States Military Academy
        at West Point, New York, and his Master's and Ph.D. degrees at Rensselaer
        Polytechnic Institute in Troy, New York.

          KATHRYN P. YGLESIAS is an advisory systems analyst on the staff of the  IBM
        Reuse Technology Support Center.  Her current work includes information model
        definition,  classification  evolution,  and  requirements definition for the
        corporate standards and tools.  Previously she coordinated the initiative  to
        define formal methods for reusing non-code work products, especially customer
        documentation. Prior to joining IBM, she worked for ten years in
        aerospace. Her experience includes engineering and project management for
        the Space Shuttle program and customer liaison tasks for an internal
        computer systems organization. She is a member of the AIAA and the Society
        for Software Quality.