Subjects

Information Organization – LIS 5703
Dr. Mary Prentice
Subjects Paper
By Daniel Lanza
4 August 2011

The best approach for locating information resources is searching digital content. Digital searching is much faster than scanning through physical objects such as printed books. A vast number of resources are available today, especially on the Internet. Users typically start their quest for knowledge by performing a keyword search on a system like Google. These results can often be overwhelming, so it is almost always more efficient to use a controlled vocabulary system like a library catalog to find published resources in a specific subject area. Folksonomy and controlled vocabulary systems differ greatly in the way they retrieve and organize information. For the purposes of this assignment, I chose to focus on a specific subject area related to the resource: Libraries, the Internet, and scholarship (Contribute 2).

Controlled Vocabulary

Taylor and Joudrey define a controlled vocabulary as “a list or database of subject terms in which all terms or phrases representing a concept are brought together” (Taylor & Joudrey 2009). The terms are meant to be as consistent, reliable, and predictable as possible. There are three categories of controlled vocabularies: subject heading lists, thesauri, and ontologies. Some of the challenges of controlled vocabulary systems are specificity of terms, synonymous terms, sequence and form of multi-word terms, and qualification of terms.
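The idea of bringing all terms for a concept together can be illustrated with a small lookup table. The following is only a minimal sketch: the variant terms, the preferred headings, and the function name normalize are made up for illustration and have not been checked against any real vocabulary.

```python
# Hypothetical authority table: each variant term points to one preferred heading.
# The entries are illustrative only and are not checked against a real vocabulary.
PREFERRED_TERM = {
    "cars": "Automobiles",
    "autos": "Automobiles",
    "automobiles": "Automobiles",
    "movies": "Motion pictures",
    "films": "Motion pictures",
    "motion pictures": "Motion pictures",
}


def normalize(term):
    """Return the preferred heading for a term, or None if it is not in the vocabulary."""
    return PREFERRED_TERM.get(term.strip().lower())


print(normalize("Cars"))      # Automobiles
print(normalize(" films "))   # Motion pictures
print(normalize("vehicles"))  # None
```

Because every synonym resolves to the same heading, a search on any variant retrieves the same set of resources. A term outside the table returns nothing, which is the price of that predictability.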

Subject heading lists are primarily used by the library community; the most popular and widely used list is the Library of Congress Subject Headings (LCSH). The LCSH strives to provide subject access to information resources by using terminology that is consistent and reliable rather than uncontrolled and unpredictable. The Library of Congress (LOC) calls the records in the LCSH “Authority Records” because they cover more than just subjects. The LOC describes an authority record as “a tool used by librarians to establish forms of names (for persons, places, meetings, and organizations), titles, and subjects used on bibliographic records” (LOC 2009). It is important to understand that authority records do not represent materials in the Library’s collection; rather, they are a tool used to help organize the library catalog and assist users in finding those materials. The scope of the LCSH is wide ranging because the number of possible subjects is almost infinite.

In contrast, thesauri, the other common type of controlled vocabulary, have been created and used mostly by the indexing communities. Thesauri are almost always much narrower in scope. They are usually made up of terms from a very specific subject area like Electrical Engineering. The terms are also very hierarchical, meaning each term has a clearly defined place relative to the broader and narrower terms around it. The Institute of Electrical and Electronics Engineers (IEEE) has a thesaurus, which is described as “a tool to improve and enhance the indexing and retrieval of articles and other material from IEEE periodicals, conferences, standards, and other publications that are made available on IEEE Xplore. It provides a common and consistent language for authors, researchers, and online discovery” (IEEE 2011).

MARC Subject Headings (Controlled Vocabulary)
The following list is taken from the Library of Congress MARC record for the resource http://lccn.loc.gov/2005029985.

Libraries |x Special collections |x Electronic information resources
Libraries and scholars
Digital libraries
Research libraries
Internet in higher education
Communication in learning and scholarship
Libraries and the Internet

The list of MARC subject headings above is a good example of LCSH controlled vocabulary. Each subject is fairly consistent and descriptive. The first entry is interesting because it contains “|x”, which denotes a general subdivision. The word “libraries” appears in five of the seven headings, and it is used consistently every time. The subjects are related, yet each one is distinct. I believe that these subject headings were chosen very wisely, and they certainly help represent the concepts that the book describes.
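To show how the “|x” marker structures a heading, here is a minimal sketch that splits a heading string into its main heading and its general subdivisions. It assumes the display form used in the list above and handles only “|x”; the function name parse_heading is hypothetical, and a real MARC parser would work from the record's subfield codes instead.

```python
def parse_heading(heading, delimiter="|x"):
    """Split a heading string into its main heading and general subdivisions."""
    parts = [part.strip() for part in heading.split(delimiter)]
    return {"main": parts[0], "subdivisions": parts[1:]}


first = parse_heading("Libraries |x Special collections |x Electronic information resources")
print(first["main"])          # Libraries
print(first["subdivisions"])  # ['Special collections', 'Electronic information resources']

plain = parse_heading("Digital libraries")
print(plain["subdivisions"])  # []
```

The first heading reads from general to specific, and a heading with no subdivisions simply returns an empty list.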

Folksonomy

The word folksonomy, a blend of “folk” and “taxonomy,” was coined by Thomas Vander Wal, who defines it as “the result of personal free tagging of information and objects (anything with a URL) for one’s own retrieval” (Vander Wal 2007). Tagging is a popular feature of interactive Web 2.0 technologies that allows users to group similar resources together by using their own terms or labels with few restrictions. “Today, tagging is a widespread phenomenon popularized by applications such as social bookmarking (Del.icio.us) and social photo sharing (Flickr)” (Gruber 2005).

Tagging is amplified by the tremendous number of people who participate in the categorization. As Gruber puts it, “we now have an entirely new source of data for finding and organizing information: user participation…. Tags introduce distributed human intelligence into the system” (Gruber 2005). However, it is certainly possible for a malicious user to go into a record and tag it with false information. Since there are few or no restrictions, the system must rely on users’ goodwill for reliable and honest tagging. If one bad user sabotages the information, there should be another good user right behind them to clean it up. By definition, folksonomy is uncontrollable and unpredictable; thankfully, in practice there seems to be more good than bad.
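One simple way to see how many honest taggers can outweigh a single bad one is to count how many different users applied each tag. The sketch below is my own illustration and not how any particular tagging site works; the users, the resource, the tags, and the min_users threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical tagging events as (user, resource, tag); none of this is real data.
taggings = [
    ("ana", "photo-1", "sunset"),
    ("ben", "photo-1", "sunset"),
    ("cara", "photo-1", "beach"),
    ("dan", "photo-1", "sunset"),
    ("eve", "photo-1", "free-prizes"),  # a misleading tag from one bad actor
]


def tag_counts(events, resource):
    """Count how many distinct users applied each tag to a resource."""
    seen = {(user, tag.lower()) for user, res, tag in events if res == resource}
    return Counter(tag for _, tag in seen)


def trusted_tags(events, resource, min_users=2):
    """Keep only tags applied by at least min_users different users."""
    counts = tag_counts(events, resource)
    return sorted(tag for tag, n in counts.items() if n >= min_users)


print(tag_counts(taggings, "photo-1")["sunset"])  # 3
print(trusted_tags(taggings, "photo-1"))          # ['sunset']
```

The misleading tag is filtered out because only one user applied it, but so is the honest tag “beach”. A threshold like this trades completeness for reliability.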

One common misconception is that search engines like Google are folksonomy-based systems. This is not entirely true, since search engines are primarily based on computer algorithms and not public tags. Google’s algorithm is based on PageRank, which ranks a page based on the number and importance of the other pages that link to it. It can be argued that, since the Internet is a public domain and technically anything is possible (including manipulating PageRank), users still have some influence over search results. Google has also implemented features like “+1” to enhance search results with user-based intelligence. I believe that these are steps toward the Semantic Web movement, an endeavor that promises to make all information on the web more meaningful for search engines. The World Wide Web Consortium says, “The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web” (W3C 2004).
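The link-based ranking idea can be sketched in a few lines. This is a simplified textbook version of PageRank, not Google's production algorithm. The four-page link graph is hypothetical, the function and parameter names are my own, and the damping value of 0.85 is the commonly cited default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by repeatedly redistributing rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "home".
web = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home"],
    "contact": ["home"],
}
scores = pagerank(web)
print(max(scores, key=scores.get))  # home
```

The page called “home” scores highest because three pages link to it, and “about” scores well because its single incoming link comes from a highly ranked page. No user-supplied tags are involved, which is why this kind of ranking is algorithmic and not a folksonomy.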

Tags/Keywords (Folksonomy)
This list comes from my own searching of folksonomy-style systems such as IMDb.

Internet
Searching
Information
Organization
Research
Library

The keywords that I came up with were very broad and generalized. While searching different systems, I encountered a wide variety of related results, but nothing specific. I chose such broad keywords because I didn’t really research or think about them very much beforehand. This is usually the case with most web searching: users don’t give their searches enough thought before pressing go, and they often have to refine the search further to find what they’re looking for. This is one reason why folksonomy systems flourish with a large number of tags rather than just a few specific ones.

References

Taylor, A. G., & Joudrey, D. N. (2009). The organization of information. Westport, CT: Libraries Unlimited, Inc.

W3C. (2004). OWL Web Ontology Language Overview. Retrieved on July 20, 2011, from http://www.w3.org/TR/owl-features/

Gruber, T. (2005). Ontology of Folksonomy: A Mash-up of Apples and Oranges. Retrieved on July 22, 2011, from http://tomgruber.org/writing/ontology-of-folksonomy.htm

Vander Wal, T. (2007). Folksonomy Coinage and Definition. Retrieved on July 22, 2011, from http://vanderwal.net/folksonomy.html

Library of Congress. (2009). Frequently Asked Questions (Library of Congress Authorities). Retrieved on July 24, 2011, from http://authorities.loc.gov

Institute of Electrical and Electronics Engineers. (2011). IEEE Thesaurus. Retrieved on July 24, 2011, from http://www.ieee.org/publications_standards/publications/services/thesaurus2.html