Whenever we develop a new skill or extend an old one, we have to emphasize the relative importance of some aspects and features over others. We can then place these into neat levels only when we discover systematic ways to do so. Then our classifications can resemble level-schemes and hierarchies. But the hierarchies always end up getting tangled and disorderly because there are also exceptions and interactions to each classification scheme. -- Marvin Minsky, The Society of Mind
Classification is needed to provide context for navigation in any knowledge medium, e.g. the WWW. We want to know where we are, and how to go elsewhere. Maps, coordinates, signposts, landmarks, etc. are what we use in the physical world, and equivalents are needed in cyberspace. But present methods are cumbersome: you guess at keywords or click on the most promising link, wait for the results, click some more, wait some more. Searching for specific information can get really frustrating.
Conventional Library Classification
Much work has been done by librarians and information scientists to create good classification systems. Classification involves the development and use of a scheme for the systematic organization of knowledge:
- The enumerative scheme is based on the concept of a universe of knowledge which is divided into successively narrower and more specific subjects. Theoretically, all topics are to be represented. Library of Congress (LC) is an enumerative scheme. Enumerative classification attempts to assign headings for every subject and alphabetically enumerates them.
- Hierarchical classification uses a more philosophical approach based on the inherent organization of the subject being classified, and establishes logical rules for dividing topics into classes, divisions, and subdivisions.
- A synthetic scheme is one in which new class numbers can be developed for new topics not already listed. The Dewey Decimal Classification (DDC), although primarily enumerative, approaches a synthetic scheme with each revision.
- Analytico-synthetic classification assigns terms to individual concepts and provides rules for the local cataloger to use in constructing headings for composite subjects.
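The following minimal Python sketch makes the analytico-synthetic idea concrete. It shows a cataloger's rule for building a composite heading from single-concept terms. The facet names, their citation order, and the " / " separator are hypothetical choices for illustration. They are not taken from any published scheme.

```python
# Hypothetical facets, listed in a fixed citation order: the order in
# which single-concept terms are combined into a composite heading.
CITATION_ORDER = ("subject", "aspect", "place", "period")


def synthesize_heading(concepts: dict[str, str], separator: str = " / ") -> str:
    """Build a composite heading from single-concept terms.

    Terms are always emitted in CITATION_ORDER, whatever order the
    cataloger supplies them in, so one subject always yields one heading.
    """
    unknown = set(concepts) - set(CITATION_ORDER)
    if unknown:
        raise ValueError(f"unknown facet(s): {sorted(unknown)}")
    return separator.join(concepts[f] for f in CITATION_ORDER if f in concepts)


heading = synthesize_heading(
    {"period": "18th century", "place": "Great Britain",
     "subject": "Politics", "aspect": "Humor"}
)
print(heading)  # Politics / Humor / Great Britain / 18th century
```

The fixed citation order is the "rule" part. An enumerative scheme would have to list this composite heading in advance. Here it is assembled on demand from its parts.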
Dewey Decimal Classification
The Dewey Decimal Classification (DDC) system is a general knowledge organization tool that is continuously revised to keep pace with knowledge. The system was conceived by Melvil Dewey in 1873 and first published in 1876.
The Dewey Decimal Classification is the most widely used library classification system in the world. It is used in more than 135 countries and has been translated into over 30 languages. In the United States, 95% of all public and school libraries, 25% of all college and university libraries, and 20% of special libraries use the DDC. In addition, Dewey is used for other purposes, e.g., as a browsing mechanism for resources on the World Wide Web.
Dewey divides knowledge into ten top-level classes, each with its own sub-classes. A problem with Dewey is that, for some topics, you must go down several levels before you get to useful information. For other topics, there may be less information available, and only the top two or three levels will be sufficient.
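Dewey's decimal notation makes this descent mechanical, because each added digit is meant to name a narrower class. The Python sketch below lists the chain of broader classes above a given number. It assumes a purely notational hierarchy, in which every extra digit is one level deeper. Real DDC only approximates this. The number 516.3 is just an example input.

```python
def broader_classes(number: str) -> list[str]:
    """Return the chain of classes from the main class down to `number`.

    Assumes a purely notational hierarchy (each digit is one level);
    actual DDC notation has exceptions, so this is an approximation.
    """
    whole, _, decimals = number.partition(".")
    if len(whole) != 3 or not whole.isdigit() or (decimals and not decimals.isdigit()):
        raise ValueError(f"not a DDC-style class number: {number!r}")
    chain: list[str] = []
    # Main class (e.g. 500), division (510), section (516); repeated
    # candidates caused by zero digits are collapsed into one level.
    for candidate in (whole[0] + "00", whole[:2] + "0", whole):
        if candidate not in chain:
            chain.append(candidate)
    # Each digit after the decimal point narrows the subject one more level.
    for i in range(1, len(decimals) + 1):
        chain.append(f"{whole}.{decimals[:i]}")
    return chain


print(broader_classes("516.3"))  # ['500', '510', '516', '516.3']
print(broader_classes("500"))    # ['500']
```

The length of the returned list is the number of levels a browser must descend. A number such as 516.3 sits four levels down, whereas a broad class like 500 is reached in one step.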
The Library of Congress
The Library of Congress was originally created to provide the US Congress with the information it needed to pass laws, and its classification system (LC) was later devised to organize that collection. Some of its top-level classes, such as Naval Science, will probably have a limited number of entries. In addition, there is a definite US bias to the categories. On the other hand, the LC system ... [TBD]
Breaking Free
Classical classification and navigation structures follow the familiar hierarchical model, e.g. Dewey, Yahoo!, etc. We instinctively think in terms of topics and their subtopics, and map these to file-system directory trees in our web sites. Comfortable as it may be, this approach has various problems:
- Web pages don't always fit neat single categories; they might well span several, e.g. "The Humor of British Politics in the 18th Century". In fact, most topics or web pages span several categories to some extent.
- Hierarchies are inflexible; it can be hard or inconvenient to re-assign items if a mistake is uncovered, the field evolves in an unexpected direction, or a better structure is discovered. Many major web sites have been restructured at some time, leaving many broken links.
- Hierarchical classifications are prone to subjectivity and cultural bias. Look at DDC and you'll probably guess that it was devised by a Christian American over a hundred years ago. LC has military and naval science as top-level categories. And you've probably, at some time, descended a subject hierarchy such as Yahoo!, expecting to find a topic in a certain place, only to find it elsewhere.
- Rigid hierarchical classification schemes cannot keep up with scientific advances. Sections of the widely-used schemes -- notably Dewey -- are restructured periodically, but there are always protests from the library community when the revisions necessitate reclassification of large parts of a collection.
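A toy model shows the first two problems. The Python sketch below assumes a site whose URLs mirror a single-parent topic tree, as described above. The item name and paths are hypothetical. It shows two things: an item filed on one branch is invisible from another, and reclassifying the item changes its URL, so existing links to it break.

```python
# Each item has exactly one position (path) in a topic tree.
Path = tuple[str, ...]


def url_for(path: Path, item: str) -> str:
    """Derive the item's URL from its single position in the tree."""
    return "/" + "/".join(path + (item,))


def browse(tree: dict[str, Path], branch: Path) -> list[str]:
    """List every item filed somewhere under `branch`."""
    return sorted(item for item, path in tree.items() if path[: len(branch)] == branch)


item = "humor-of-british-politics"
tree = {item: ("history", "britain", "18th-century")}

print(browse(tree, ("history",)))  # ['humor-of-british-politics']
print(browse(tree, ("humor",)))    # [] (the humor branch never sees it)

old_url = url_for(tree[item], item)
tree[item] = ("humor", "politics")  # reclassify after a rethink
new_url = url_for(tree[item], item)
print(old_url)  # /history/britain/18th-century/humor-of-british-politics
print(new_url)  # /humor/politics/humor-of-british-politics
```

Any page that linked to the old URL now points at nothing, unless someone maintains a redirect for every move.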
What we want are new methods that harness the power of the computer to take over the drudgery and present better 'maps' of cyberspace. Imagine a DHTML system that presents a matrix of options and, when you've selected some, immediately presents more if needed, querying the server only when you're done.
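The interaction described here amounts to narrowing a local set of choices before any server round trip. In a browser this would be client-side script. The sketch below shows the same logic in Python for consistency with the other examples. The facet names, the sample items, and the `/search` endpoint are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlencode

Item = dict[str, str]  # facet name -> value


def matches(item: Item, selected: Item) -> bool:
    """True if the item agrees with every choice made so far."""
    return all(item.get(facet) == value for facet, value in selected.items())


def remaining_options(items: list[Item], selected: Item) -> dict[str, Counter]:
    """For each facet not yet chosen, count how many items each value still reaches."""
    options: dict[str, Counter] = {}
    for item in items:
        if matches(item, selected):
            for facet, value in item.items():
                if facet not in selected:
                    options.setdefault(facet, Counter())[value] += 1
    return options


def final_query(selected: Item) -> str:
    """Build the single server request, only once the user is done."""
    return "/search?" + urlencode(sorted(selected.items()))


catalog = [
    {"language": "Python", "platform": "Unix", "function": "sort"},
    {"language": "C", "platform": "Unix", "function": "sort"},
    {"language": "Python", "platform": "Windows", "function": "search"},
]

choice = {"language": "Python"}
print(remaining_options(catalog, choice))  # computed locally, no server call
choice["platform"] = "Unix"
print(final_query(choice))  # /search?language=Python&platform=Unix
```

The counts are computed locally, so options that would lead to an empty result can be hidden or greyed out immediately.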
Faceted Classification
Traditional notions of simple hierarchical classification need to be augmented or replaced with more powerful methods. I don't think a hierarchical classification scheme is good enough for a modern web-based catalog of any substantial size. Entries rarely fit exactly into one leaf node.
Ruben Prieto-Diaz proposed "faceted classification" for a reusable software library, a concept he found in library science [R. Prieto-Diaz and P. Freeman, "Classifying Software for Reusability," IEEE Software, 4(1):6-16, January 1987]. Faceted classification originated with S. R. Ranganathan's Colon Classification in the 1930s. It works by classifying domains in terms of 'facets'. In a faceted classification scheme, the facets may be considered to be dimensions in a Cartesian classification space, and the value of a facet is the position of the artifact in that dimension. For software, one might have facets such as "Operand", "Functionality", "Platform", "Language", .... Prieto-Diaz claims that a fixed (and small) number of facets is sufficient for classifying all software.
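If facets are treated as dimensions, each component becomes a point whose coordinates are its facet values. A query that fixes only some facets then selects a whole slice of the space. The Python sketch below uses the example facets named above. The component names and their values are hypothetical. Exact matching on facet values is a simplifying assumption of this sketch.

```python
from typing import NamedTuple

# Facets play the role of axes; a component's values are its coordinates.
FACETS = ("functionality", "operand", "platform", "language")


class Component(NamedTuple):
    name: str
    functionality: str
    operand: str
    platform: str
    language: str


def query(library: list[Component], **wanted: str) -> list[str]:
    """Names of components matching every facet given; omitted facets act as wildcards."""
    unknown = set(wanted) - set(FACETS)
    if unknown:
        raise ValueError(f"unknown facet(s): {sorted(unknown)}")
    return [c.name for c in library
            if all(getattr(c, facet) == value for facet, value in wanted.items())]


library = [
    Component("sort-c", "sort", "array", "Unix", "C"),
    Component("sort-py", "sort", "list", "any", "Python"),
    Component("search-c", "search", "array", "Unix", "C"),
]

print(query(library, functionality="sort"))           # ['sort-c', 'sort-py']
print(query(library, language="C", operand="array"))  # ['sort-c', 'search-c']
```

Adding a component never forces a choice between branches, because it simply occupies one more point. A new facet value likewise extends one axis without restructuring the others.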
Faceted classification is an analytico-synthetic scheme. It is analytic because it subdivides broader elements into single concepts that are clearly defined through facet analysis. It is synthetic in that new elements can be developed. The process of facet analysis can also be used to construct thesauri. There is renewed interest in this system because some believe that older systems such as DDC and LC:
- may require complex or lengthy notation,
- are often difficult to use to locate materials,
- may not provide for enough coordination of terms,
- may not meet the needs of the individual or special library,
- may not provide enough detail to accurately describe all subjects in all media.
The facet development process begins by defining the subject to be covered, by examining existing classifications or thesauri, or the titles or objects in the prospective database. The derived topics are broken down into facets, each with a distinct label. Items are organized into homogeneous, mutually exclusive groups that differ from the main group by one characteristic. Within each facet, subfacets or more specific topics are listed, and the breakdown continues into subfacets within subfacets. In general, the items in each subfacet are ordered from more general to more specific, complex or concrete.
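The outcome of that process can be represented as a small nested structure. Each facet holds mutually exclusive terms, subfacets nest inside them, and listing order runs from general to specific. The Python sketch below assumes a hypothetical two-facet scheme. `walk`, `outline`, `duplicate_terms` and `classify` are illustrative helpers, not part of any standard tool.

```python
from collections.abc import Iterator

# A facet maps each term to its own subfacet (an empty dict for a leaf).
Facet = dict[str, "Facet"]

SCHEME: dict[str, Facet] = {
    "Material": {"Metal": {"Iron": {}, "Copper": {}}, "Wood": {}},
    "Form": {"Sheet": {}, "Rod": {}},
}


def walk(facet: Facet, depth: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (depth, term) pairs, each term before its more specific subterms."""
    for term, subfacet in facet.items():
        yield depth, term
        yield from walk(subfacet, depth + 1)


def outline(facet: Facet) -> list[str]:
    """Indented listing of a facet, general to specific."""
    return ["  " * depth + term for depth, term in walk(facet)]


def duplicate_terms(facet: Facet) -> set[str]:
    """Terms that occur more than once, which would break mutual exclusivity."""
    seen: set[str] = set()
    dups: set[str] = set()
    for _, term in walk(facet):
        if term in seen:
            dups.add(term)
        seen.add(term)
    return dups


def classify(scheme: dict[str, Facet], **choices: str) -> dict[str, str]:
    """Assign at most one term per facet, rejecting unknown facets or terms."""
    for facet_name, term in choices.items():
        if facet_name not in scheme:
            raise ValueError(f"unknown facet: {facet_name}")
        if term not in {t for _, t in walk(scheme[facet_name])}:
            raise ValueError(f"{term!r} is not a term in facet {facet_name!r}")
    return dict(choices)


print("\n".join(outline(SCHEME["Material"])))
print(classify(SCHEME, Material="Copper", Form="Rod"))
```

An item supplies one keyword per facet, so it cannot hold two terms from the same facet. This enforces the mutual-exclusivity rule structurally. `duplicate_terms` checks the same property on the scheme itself.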

