
Statement of Competency G

In which I show my understanding of standards and methods used to control and create information structures and my competency in applying the basic principles involved in the organization and representation of knowledge.


California State Archives
Photo Vlasta Radan, 2006.

In any system, information needs to be sorted in some way. The way things are sorted implies that things in the same group have something in common and that they are in some kind of relationship to each other. This helps people to create mental models of how the information is organized. With mental models, people make sense of complex systems, like databases or web sites.

The Dewey Decimal Classification (DDC) system organizes human knowledge into ten main classes, each of which is subdivided into ten divisions, each of which is in turn subdivided into ten sections. The decimal notation keeps the system numerical and allows it to be extended indefinitely. The tree structure, of which DDC is a classic example, is the most common way to represent hierarchical relationships between data. A tree structure has a root set of classes that sit at the same level and from which child levels branch off. Branches can be ordered in sequence and can branch further.
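The decimal hierarchy described above can be sketched in a few lines of Python. This is only an illustration of the notation, not part of DDC itself: the function name ddc_path is my own, and the sketch assumes a simplified number made of three digits with optional decimals.

```python
def ddc_path(number: str) -> list[str]:
    """Return the chain of DDC-style numbers from main class down to `number`."""
    whole, _, decimals = number.partition(".")
    if len(whole) != 3 or not whole.isdigit() or (decimals and not decimals.isdigit()):
        raise ValueError(f"not a DDC-style number: {number!r}")

    # Main class (x00), division (xy0), section (xyz).
    candidates = [whole[0] + "00", whole[:2] + "0", whole]
    # Each further decimal digit adds one more level below the section.
    candidates += [f"{whole}.{decimals[:i]}" for i in range(1, len(decimals) + 1)]

    # A number such as "500" is its own class, division, and section: list it once.
    path = []
    for node in candidates:
        if not path or path[-1] != node:
            path.append(node)
    return path


print(ddc_path("516.35"))  # ['500', '510', '516', '516.3', '516.35']
```

Every number yields exactly one chain of ancestors, which is what makes the notation a tree.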

A tree structure ensures that every piece of information has a unique address, with only one possible path to reach it from the root. However, the strict tree structure of DDC can be modified by the use of a faceted classification scheme. To best define the subject content of a book, a classification number is formed by combining different elements (divisions and sections) from across the structure. In this way it is possible to classify items that could fit into multiple classes.
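The difference between a single path and a faceted description can be illustrated with a small Python sketch. The records, facet names, and facet values below are hypothetical and chosen only for illustration; they are not taken from DDC or from any real catalog.

```python
# Hypothetical records: titles, facet names, and facet values are illustrative only.
RECORDS = [
    {"title": "Record A", "topic": "basketry", "place": "California", "form": "catalog"},
    {"title": "Record B", "topic": "basketry", "place": "Arizona", "form": "essay"},
    {"title": "Record C", "topic": "pottery", "place": "California", "form": "catalog"},
]


def find(records, **facets):
    """Return titles of records whose facets match every requested value."""
    return [
        record["title"]
        for record in records
        if all(record.get(name) == value for name, value in facets.items())
    ]


print(find(RECORDS, place="California"))               # ['Record A', 'Record C']
print(find(RECORDS, topic="basketry", form="catalog"))  # ['Record A']
```

In a strict tree, Record A would have to sit under either its topic or its place. With facets, the same record is reachable from either starting point.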

During my internship in the Special Collections of the UC Riverside Libraries, I applied my understanding of the classification of information while processing the Anne McCaffrey Papers collection. My task was to complete the processing, create an electronic database, enter the information, and write the finding aid for the collection. The collection was composed of manuscripts, galleys, notes, and correspondence created throughout most of Anne McCaffrey’s writing career and, with only a few exceptions, related exclusively to the writing of her books.

Before starting to build the McCaffrey Papers Access database, I devised the classification system and the organization of the information that would be entered into the database. The organizational chart of the field organization and classification of the titles is my EVIDENCE 1 for this competency. A quick look through the boxes provided basic information about the format of the material (manuscripts, letters, and maps) and the rough scope of her literary work in the collection.

Anne McCaffrey’s literary interests ranged from science fiction to romantic novels, and she often developed characters from a single novel into an extensive book series, which would then branch into thematic subseries. For example, her series Ship Who Sang started as a novel, which then grew into a series of books, a number of which McCaffrey wrote in partnership with other authors. Because the collection was centered on McCaffrey’s books, the collection series were designed to reflect the bibliographical structure of the material.

Various bibliographies of McCaffrey’s works sort her books differently, and there is no "official" classification of her opus. The material was grouped as correspondence, nonfiction, or fiction. The fiction material was further divided by book series, collections of stories, and so on.

While classification systems define the structure of and relationships between pieces of information, indexing expresses the aboutness of a document. To ensure that the description of the same thing does not vary from one indexer to another, information professionals use a controlled vocabulary. Documents belonging to different classes in a classification scheme can share the same controlled vocabulary terms. Organizing information into logical groups neither requires nor rules out uniformity in how the information is presented. In databases that do not use classification, a controlled vocabulary is even more important, because only the uniform use of one word for one idea ensures that documents with similar content are retrieved together.
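A controlled vocabulary can be pictured as a lookup from the many words an indexer might reach for to the one preferred term. The Python sketch below assumes a tiny, hypothetical vocabulary; the variant terms and the function name normalize are mine and do not come from any published thesaurus.

```python
# Hypothetical vocabulary: each variant (lowercase) maps to one preferred term.
PREFERRED_TERMS = {
    "information retrieval": "Information Retrieval",
    "document retrieval": "Information Retrieval",
    "ir": "Information Retrieval",
    "indexing": "Indexing",
    "subject indexing": "Indexing",
}


def normalize(term: str) -> str:
    """Map an indexer's term to the preferred term, or reject it."""
    key = " ".join(term.lower().split())  # ignore case and extra spaces
    if key not in PREFERRED_TERMS:
        raise ValueError(f"term not in controlled vocabulary: {term!r}")
    return PREFERRED_TERMS[key]


print(normalize("Document  Retrieval"))  # Information Retrieval
print(normalize("subject indexing"))     # Indexing
```

Two indexers who choose different words for the same idea end up assigning the same term, and a term outside the vocabulary is rejected rather than silently added.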

As part of the assignment in indexing and vocabulary design in my Information Retrieval class, students as a group created the Library Literature Database “for use by LIS students interested in the professional literature in the domain of Information Search, Storage and Retrieval.” From the list of suggested literature, the group selected a number of articles, read them, and created pre-coordinate and post-coordinate controlled vocabulary lists. After forming the lists, we indexed the articles using both indexing systems.

The aboutness of a document, or the knowledge encapsulated in it, can be presented through a set of single words or short phrases. For example, an article describing a cognitive process model of document indexing could be indexed with the terms Cognitive Studies and Indexing (post-coordinate indexing) or with the phrase Information Retrieval–Models–Cognitive (pre-coordinate indexing). Post-coordinate controlled vocabularies do not have pre-assembled headings that must be applied in an exact order to retrieve the document. Instead, they have a limited pool of terms that are assigned to a document individually and combined only at search time.
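The contrast between the two approaches can be sketched in Python. The index terms for doc1 are the ones given above; the document identifiers, the entries for doc2, and both search functions are hypothetical and only illustrate the idea, not the database we built in class.

```python
# Post-coordinate: each document carries a set of independent terms.
POST_INDEX = {
    "doc1": {"Cognitive Studies", "Indexing"},
    "doc2": {"Indexing", "Thesauri"},  # hypothetical entry
}

# Pre-coordinate: terms are assembled into one ordered heading at indexing time.
PRE_INDEX = {
    "doc1": "Information Retrieval–Models–Cognitive",
    "doc2": "Information Retrieval–Vocabulary Control",  # hypothetical entry
}


def search_post(index, *terms):
    """Combine terms at search time: return documents carrying all of them."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in index.items() if wanted <= assigned)


def search_pre(index, heading):
    """Match the heading as assembled: the whole string or its leading part."""
    return sorted(
        doc for doc, assigned in index.items()
        if assigned == heading or assigned.startswith(heading + "–")
    )


print(search_post(POST_INDEX, "Indexing", "Cognitive Studies"))  # ['doc1']
print(search_pre(PRE_INDEX, "Information Retrieval"))            # ['doc1', 'doc2']
print(search_pre(PRE_INDEX, "Models"))                           # []
```

The last search finds nothing because Models appears only in the middle of a pre-coordinated heading. The order of the elements matters there, while the post-coordinate terms can be combined in any order.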

Regardless of the system of indexing, describing the documents with the right terms is really an art and requires significant insider knowledge of the domain. Without that knowledge, it is difficult to judge if the term used in a document is a widely accepted expression in the field or just an idiosyncrasy of the author’s vocabulary.

During my internship at the Autry National Center in the Fall semester of 2007, I had the opportunity to hone my indexing skills while preparing electronic records for public access in the Internet catalog. As I was cataloging basket records, I was asked whether I would like to devise a system for indexing the various patterns and designs of baskets.

The vocabulary of subject headings and index terms is, in theory, controlled by the thesaurus that is part of the Mimsy XG database used by the Autry National Center. Although the database comes with a generic thesaurus developed for the humanities, that thesaurus does not reflect the structure and peculiar character of the Autry National Center collections.

Because of the problems with the thesaurus, the various patterns used on baskets were indexed in existing records with a number of different terms, and there was no consistency in applying similar terms to similar designs. This made it very difficult, if not almost impossible, to retrieve records that are related by some visual element. It also diminished the relevance of searches and the usefulness of that particular subject heading in all the records.

To ensure that I used the most appropriate and accurate terms for the subject, I surveyed the literature in the field. That research established only that the patterns have no religious significance and that there is no generally accepted terminology for the designs. A further problem was that most of the relevant literature dated from the early to mid 19th century and used language that is obsolete and is today considered culturally insensitive. Even if the use of those terms made sense in indexing or in the subject headings of contemporary documents, it could not be applied to the visual appearance of objects, which were expected to be viewed by a wide range of users. The use of complicated and archaic English expressions was also problematic in the context of the Internet, where records could be accessed and viewed by users with limited proficiency in English.

The vocabulary that I designed uses a very limited set of simple geometric terms (square, triangle, horizontal band, human figure, etc.) to describe designs. However, the system does not use a completely controlled vocabulary, which would be almost impossible with purely visual design elements, but rather defines the rules and restricts the range of the terms.

Examples of records indexed using my system of design subject headings, as applied to the designs on Native American basketry, can be seen in the records for Pomo baskets and Mission Indian baskets.

The main problem with indexing and applying subject headings is the highly subjective process of deciding what a document is about. Humans give meaning to words and understand them from context. We know that in the phrase “pen in the box” the pen cannot have anything to do with a pig pen. Sometimes deciding what something is about has more to do with our knowledge of the subject than with the words used in the abstract or the title of the document. However, it is exactly this elusive contextual or background knowledge that makes the system unpredictable and arbitrary. That is also the reason why the complete automation of classification and indexing is almost impossible.

 

Last update 04/2008

This web site was developed to satisfy the graduation requirements for
the School for Library and Information Science at San Jose State University California
Text, design, and digital imaging by Vlasta Radan