Thursday, December 31, 2015

Vocabularies and Controlled Vocabularies

I have long considered a taxonomy as a particular, structured kind of controlled vocabulary. More recently, however, I have been hearing of “vocabularies” without the word “controlled” in front, although still for the purposes of information management and retrieval, which is cause to wonder: are controlled vocabularies and vocabularies the same thing or not?

Controlled Vocabularies


Definition

It’s the standards that drive the definitions and also the scope of meaning. “Controlled vocabularies” have been most authoritatively defined and scoped by ANSI/NISO Z39.19-2005 Guidelines for the construction, format, and management of monolingual controlled vocabularies. The Standard’s glossary defines it as: “A list of terms that have been enumerated explicitly.” Vocabulary control is an important part of the definition of controlled vocabularies, whereby synonyms are linked together, homographs are distinguished, and unambiguous concepts are defined or scoped.
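To make the idea of vocabulary control concrete, here is a minimal sketch in Python. It is only an illustration and is not taken from either standard: the class names, the sample terms, and the convention of using a parenthetical qualifier to separate homographs are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept: a single preferred label plus any synonyms (entry terms)."""
    preferred: str
    synonyms: set = field(default_factory=set)
    scope_note: str = ""  # optional text that defines or scopes the concept


class ControlledVocabulary:
    """Hypothetical container that enforces one concept per label."""

    def __init__(self):
        self._by_label = {}

    def add(self, concept):
        keys = {label.casefold() for label in {concept.preferred, *concept.synonyms}}
        clashes = keys & set(self._by_label)
        if clashes:
            # A homograph must be distinguished first, e.g. with a qualifier
            # such as "Mercury (planet)" versus "Mercury (element)".
            raise ValueError(f"label(s) already in use: {sorted(clashes)}")
        for key in keys:
            self._by_label[key] = concept

    def resolve(self, label):
        """Return the preferred term for any known label (KeyError if unknown)."""
        return self._by_label[label.casefold()].preferred


vocab = ControlledVocabulary()
vocab.add(Concept("Automobiles", {"Cars", "Motor cars"}))
vocab.add(Concept("Mercury (planet)", scope_note="The planet nearest the Sun."))
vocab.add(Concept("Mercury (element)", {"Quicksilver"}))

print(vocab.resolve("cars"))         # Automobiles
print(vocab.resolve("Quicksilver"))  # Mercury (element)
```

Synonyms resolve to one preferred term, and a label that would be ambiguous is rejected until it is qualified, which is the essence of the control described above.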

Although not part of the standard’s name, ISO 25964 Thesauri and interoperability with other vocabularies (parts 1 and 2 published in 2011 and 2013) also defines controlled vocabularies in its glossary, where it states that a controlled vocabulary is a “prescribed list of terms, headings or codes, each representing a concept.” It is also noted: “Controlled vocabularies are designed for applications in which it is useful to identify each concept with one consistent label, for example when classifying documents, indexing them and/or searching them.”

Scope
As for what is included within the scope of controlled vocabularies, ANSI/NISO Z39.19-2005 states in its Scope section, on the first page, that controlled vocabularies include:
  • Lists of controlled terms
  • Synonym rings
  • Taxonomies
  • Thesauri
In ISO 25964, the scope of inclusion of controlled vocabularies is less clear. In the glossary definition for controlled vocabulary, it states: “Thesauri, subject heading schemes and name authority lists are examples of controlled vocabularies,” but a complete list of controlled vocabularies is not presented.

What is significant is that ISO 25964 does make a distinction between “controlled vocabulary” and just vocabulary. ISO 25964 describes more kinds of vocabularies, but then addresses the issue of vocabulary control in each.  Types of vocabularies that ISO 25964 discusses as having vocabulary control are:
  • Thesauri
  • Classification schemes
  • Classification schemes for records management
  • Taxonomies
  • Subject heading schemes
  • Name authority lists
According to ISO 25964 part 2, terminologies and ontologies usually have vocabulary control, but vocabulary control is not a requirement. So, it can be inferred that most but not all terminologies (discussed in my last blog post) or ontologies are controlled vocabularies. Name authority lists are “usually controlled vocabularies” according to ISO 25964 part 2 (section 23.1.1). Synonym rings do not have vocabulary control (section 24.2.3).

Structured Vocabularies


Definition

A less commonly used designation is “structured vocabulary.” It appears in the name of the British Standard BS 8723 Structured vocabularies for information retrieval – Guide. BS 8723 was published in five parts from 2005 to 2008, revising and expanding on the earlier BS and ISO standards for monolingual and multilingual thesauri, and, in turn, became the basis for the current ISO 25964 pair of standards.

ISO 25964 also includes “structured vocabulary” in its glossary, defined as an “organized set of terms, headings or codes representing concepts and their inter-relationships, which can be used to support information retrieval,” and goes on to note: “A structured vocabulary can also be used for other purposes. In the context of information retrieval, the vocabulary needs to be accompanied by rules for how to apply the terms.”  Meanwhile, ANSI/NISO Z39.19-2005 does not mention “structured vocabularies.”

Scope
As for what is included within the scope of structured vocabularies, while that is not so clearly stated, it can be assumed, based on the title of BS 8723 Structured vocabularies for information retrieval – Guide, that the vocabularies included within the standard are all “structured vocabularies.” These are:
  • Thesauri
  • Classification schemes
  • Business classification schemes for records management
  • Taxonomies
  • Subject heading schemes
  • Ontologies
  • Authority lists
ISO 25964 seems to use “vocabularies” and “structured vocabularies” somewhat interchangeably. While the standard’s title refers to “thesauri and … other vocabularies,” its foreword states “ISO 25964-2 will cover interoperability between different thesauri and with other types of structured vocabulary, such as classification schemes, name authority lists, ontologies, etc.”

If all the types of vocabularies in part 2 are indeed considered as “structured vocabularies” then the scope of structured vocabularies would cover:
  • Thesauri
  • Classification schemes
  • Classification schemes for records management
  • Taxonomies
  • Subject heading schemes
  • Ontologies
  • Terminologies
  • Name authority lists
  • Synonym rings
The last two, however, might not be included as structured vocabularies. ISO 25964 part 2 says that name authority lists “may also be structured vocabularies” (23.1.1), implying that they are not always structured vocabularies, and it also explains that synonym rings are “not hierarchically structured.”

Vocabularies


The simple one-word designation of “vocabulary,” when used in the context of support for information retrieval, comprises all controlled and structured vocabularies, including those at the margin of the definitions or not always meeting the strict requirements for controlled or structured vocabularies, such as ontologies, terminologies, name authority lists, and synonym rings, along with other flat (unstructured) term lists.

Vocabularies, not necessarily controlled or structured, are also what are referred to in other frameworks or web contexts, such as SKOS (simple knowledge organization system) vocabularies, Semantic Web Vocabularies, and Linked Open Vocabularies.

What is interesting to note is which other topics are discussed when the terms “controlled vocabulary” and “vocabulary” alone are used in ISO 25964 part 2 Interoperability with other vocabularies. Controlled vocabularies are discussed in the context of entry terms, pre-coordination, post-coordination, near synonyms, and indexing. Vocabularies in general are discussed in the context of equivalence mapping, interoperability, resources and authorities, registries, multilingual types, and management software/systems.

Conclusions


Taxonomies, thesauri, subject heading schemes, and classification schemes are both controlled vocabularies and structured vocabularies. Most controlled vocabularies are structured vocabularies, and almost all structured vocabularies are controlled vocabularies.  But there are other vocabularies that do not meet the criteria of one definition or another, and to recognize and include them, especially as resources or for the mapping of terms, we refer to them as just vocabularies.

Friday, November 27, 2015

Taxonomies and Terminologies

The current specialties of taxonomy management and terminology management have different histories and serve different purposes, but they are in fact closely related, and taxonomies and terminologies can be linked to share knowledge. At the annual Taxonomy Boot Camp conference in Washington, DC, earlier this month I met a terminologist attendee (Beate Früh of Büro b3) from Germany, who explained to me that the fields are quite similar, and that’s why she was attending a taxonomy conference. Also at the conference I met a vendor of a new software company (Jochen Hummel, CEO of Coreon), whose product provides both taxonomy and terminology management.

As with the field of taxonomies and taxonomy management, there are varying definitions of terminologies and terminology management.  The original meanings of both taxonomy and terminology are as fields of study, with taxonomy being the study of naming and classifying and terminology being the study of terms and their use. More commonly though, we refer to taxonomies and terminologies as sets of terms or concepts for a particular subject area or purpose.

Definitions of terminology include “technical or special terms used in a business, art, science, or special subject” (www.merriam-webster.com), and a “set of designations belonging to one special language” (ISO 1087-1:2000, 3.5.1), with “each designation representing a concept” (ISO 25964-2:2013). According to the International Information Centre for Terminology (Infoterm): “The systematic organization and definition of concepts is called terminology management – which also includes classification.” (T.E.R.M.I.N.O.L.O.G.Y. PDF)

Differences


There are several differences between taxonomies and terminologies. The most obvious difference is that taxonomies have hierarchical relationships between the terms/concepts so as to create an overall hierarchical structure, and terminologies generally do not. Another difference is that terminologies contain more detailed terms than are found in a taxonomy for a comparable subject area. Furthermore, while taxonomies are limited to nouns and noun phrases (including verbal nouns), terminologies may contain some specific adjectives. Terminologies generally include definitions for every term, which is not so typical for taxonomies. Many terminologies are used to support foreign language translation, so there are usually foreign language equivalents for every term, something found in only a small minority of taxonomies. In general, there is more data for a term in a terminology than in a taxonomy.

The most significant difference between taxonomies and terminologies is how they are used. Taxonomies serve information retrieval, through a combination of indexing/tagging use and browsing/navigation and/or search support. Rather than serve information retrieval, the main purposes of terminologies are to support standard use of terms, especially technical terms, with agreed-upon meaning for creating technical documentation and for foreign language translations. Translation has historically been the field of greatest use of terminologies. As such, many terminologists have a background in translation or linguistics. The co-authors of a leading book in the field of terminology, Handbook of Terminology Management, are both professors of translation.

Another difference is in regional use. Taxonomies are especially widely used in the United States and other English-speaking countries, while growing elsewhere too, whereas terminologies are more widely used in Europe and bilingual countries such as Canada. Member organizations of Infoterm, the independent international association focused on terminology, include numerous organizations in Europe, a few in each of Africa, Asia, Latin America, and Canada, but there are no organizations in the United States.

Finally, there are a greater number of standards for terminologies. There are a large number of currently published standards of ISO committee 37 for Terminology and Other Language and Content Resources, including five standards of the Principles and Methods subcommittee, 14 of the Terminographical and Lexicographical Working Methods subcommittee, and five standards of the Systems to Manage Terminology, Knowledge and Content subcommittee, including ISO 30042:2008 TermBase eXchange (TBX). For taxonomies, on the other hand, standards are fewer, or, if considering specifically taxonomies, there actually are no standards, as the most relevant standards are for thesauri (ISO 25964 or ANSI/NISO Z39.19), ontologies (OWL, based on RDF), or more broadly web-based knowledge organization systems (SKOS).

Similarities


Despite their differences, taxonomies and terminologies both are kinds of vocabularies or controlled vocabularies (depending on how “controlled vocabulary” is defined, the topic of my next blog post). The international standard ISO 25964 Thesauri and interoperability with other vocabularies (part 1 in 2011 and part 2 in 2013) discusses the following “other” vocabularies (as listed in its table of contents): classification schemes, taxonomies, subject heading schemes, ontologies, terminologies, name authority lists, and synonym rings. Thus, terminologies are listed right along with taxonomies and ontologies. The United States standard ANSI/NISO Z39.19-2005 Guidelines for the Construction, Format and Management of Monolingual Controlled Vocabularies, however, does not include terminologies in its more limited scope: “Controlled vocabularies covered by this Standard include lists of controlled terms, synonym rings, taxonomies, and thesauri.” (Section 2 Scope).

The most important similarity is that both taxonomies and terminologies refer to terms and unique concepts and not to mere words. As such, they often include and bring together synonyms or other variants to disambiguate concepts. While terminologies don’t characteristically have relationships between terms, they sometimes do.

Linkages


Due to these similarities, it is quite feasible to have connections, links, mappings, etc., between terms in a taxonomy and in a terminology. Taxonomies and terminologies for internal content within the same organization will have a lot of overlap, so it makes sense to leverage the same knowledge bases and either reuse the same terms in taxonomies and terminologies or at least link/map the equivalencies, both to save effort and to ensure consistency of understanding across an organization. ISO 25964-2 Thesauri and interoperability with other vocabularies includes a section on guidelines for the interoperability between thesauri (and, by extension, taxonomies) and terminologies:
  • Concepts may be mapped between a thesaurus and a terminology, and should follow the same methods and best practices as mapping between two thesauri (22.3.2)
  • Terminologies are useful as sources of concepts or terms when building or maintaining a thesaurus. They can also be referred to when writing scope notes. (22.3.3)
  • A search thesaurus or synonym ring may be built using a combination of a thesaurus and a terminology. (22.3.4)
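The first and third points above can be sketched together in a few lines of Python. This is a rough illustration under stated assumptions, not a procedure from ISO 25964-2: the sample thesaurus, the sample terminology, the reviewed equivalence mappings, and both function names are hypothetical.

```python
# Hypothetical thesaurus: preferred term -> entry terms (non-preferred synonyms).
thesaurus = {
    "Heart attacks": {"Coronary attacks"},
    "Hypertension": {"High blood pressure"},
}

# Hypothetical terminology: designation -> variants, including a translation.
terminology = {
    "myocardial infarction": {"MI", "infarctus du myocarde"},
    "arterial hypertension": {"HTN"},
}

# Equivalence mappings assumed to have been reviewed by a person:
# thesaurus preferred term -> terminology designation.
mappings = {
    "Heart attacks": "myocardial infarction",
    "Hypertension": "arterial hypertension",
}


def build_synonym_rings(thesaurus, terminology, mappings):
    """Merge each thesaurus concept with its mapped terminology entry."""
    rings = []
    for preferred, entry_terms in thesaurus.items():
        ring = {preferred, *entry_terms}
        designation = mappings.get(preferred)
        if designation is not None:
            ring |= {designation, *terminology[designation]}
        rings.append(frozenset(term.casefold() for term in ring))
    return rings


def expand_query(term, rings):
    """Return every equivalent term for a search term, sorted for display."""
    key = term.casefold()
    for ring in rings:
        if key in ring:
            return sorted(ring)
    return [key]


rings = build_synonym_rings(thesaurus, terminology, mappings)
print(expand_query("MI", rings))
```

A search on any member of a ring can then be expanded to all of its equivalents, regardless of which of the two vocabularies the term originally came from.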

Hopefully, more organizations will develop both taxonomies and terminologies where they are lacking and also build connections between the two.

Find out more about terminologies


Tuesday, October 6, 2015

Taxonomies and Tables of Contents

A table of contents and a hierarchical taxonomy appear to be quite similar. In my last blog post I looked at taxonomies and indexes, and in the end concluded: “A taxonomy serves a purpose that is both, or something in-between, that of a table of contents and a back-of-the-book index. It’s for searching (like in an index) and also for navigating (like in a table of contents), but it points to the subsection level (as in a detailed table of contents), not to a page (as in an index).” Taxonomies, especially the thesaurus kind, have many similarities to indexes when it comes to looking up a topic. Taxonomies, especially the hierarchical kind, are also similar to a table of contents or the navigation aid to a set of content.

Despite the apparent similarities in hierarchical structure and the purpose of supporting browse navigation, the differences between a table of contents and a hierarchical taxonomy are far greater than the differences between a displayed index and a search-supporting thesaurus.

A table of contents provides navigation, whether for a printed book or large document or for an electronic document or collection. In fact, in an MS Word document with headings, a table of contents that is generated in the left margin pane from those headings is called “Navigation.” Labels in a table of contents or navigation system are arranged like a taxonomy but are not exactly a kind of taxonomy.

Navigation is not a taxonomy

 

Navigation or a table of contents has to perfectly reflect the content that it belongs to. It is completely customized. Two books on the same subject cannot have the same table of contents.  The same taxonomy, however, may be used for more than one content source and typically is. In a table of contents or navigation, each navigation entry, menu label, or heading matches one-to-one to a single, specific section or web page.  Terms in a taxonomy are intended to be used more than once, so each term in a taxonomy is linked to multiple documents or content items.  As such, taxonomy terms need to be somewhat generic, whereas labels or headings in a table of contents or navigation can be specific. Taxonomy terms also need to be created with the anticipation of serving not only current content but also future content, whereas navigation or table of contents entries need only reflect the current content.

Different label wording 

In addition to being more generic, taxonomy terms differ from table of contents entries or navigation labels in other ways.

  • The names of chapters and headings may be longer descriptions (such as “Procedures to Enhance the Accuracy and Integrity of Information Furnished”), whereas taxonomy terms should be concise to aid skimming. A complex topic with a complex heading can be covered with a combination of taxonomy terms instead of a single complex term, because taxonomy terms do not need to match all content one-to-one (such as the combination of terms: Information accuracy, Information integrity, and Information-gathering procedures).
  • The names of chapters and headings might be question phrases (such as “Why study statistics?”), whereas taxonomy terms should be nouns or adjective-noun phrases and start off with a “keyword” likely to be looked up (not “Why”) to support alphabetical lookup options. Even in a hierarchical taxonomy display, a list of terms at the same hierarchical level tends to be arranged alphabetically.
  • Table of contents entries may be context-specific based on the parent/broader level (such as “Identification and General Terms” or “Special Concerns”), and, in fact, the same sub-heading could repeat under different broader headings. In a taxonomy, each term should be independently unambiguous.
  • Tables of contents often start off naming introductory information (such as “Introduction to Identity Theft”) or have sections for Conclusions, neither of which should be terms in a taxonomy. If the same topic is covered three times, in an introduction, body, and conclusions, it will be indexed with the same single taxonomy term, and the end-user will retrieve all indexed results on that topic grouped together.
  • Table of contents or navigation headings can be like titles, which may be “catchy” or enticing to the reader, especially at the top level. Taxonomy terms, by contrast, are clear, concise, and common (based on what most users would call the concept), and not especially creative.

Different structure

 

Tables of contents and taxonomies also differ in their structure. Tables of contents or navigation schemes reflect the organization of content, which may be chronological, pedagogical, from fundamental to detailed, from most important to least important, or the order of perceived user interest. In a taxonomy, the terms at each hierarchical level are arranged alphabetically by default. In a navigation there are no “related terms”, so what appear as subtopics might not be taxonomical narrower terms, but just related terms. Taxonomies, on the other hand, must follow the ANSI/NISO Z39.19 guidelines or ISO 25964 with respect to structuring hierarchical relationships: narrower terms must be specific types, instances, or integral parts of their broader terms. By having this standard format, a taxonomy provides organizational predictability for all kinds of users and all kinds of content.
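The hierarchical rule can be illustrated with a small Python sketch. It assumes that each subtopic candidate taken from a table of contents has already been labeled by a person with the kind of relationship it has to its parent. The enum, the sample terms, and the function name are hypothetical and are not drawn from either standard.

```python
from enum import Enum


class Link(Enum):
    GENERIC = "is a type of"
    INSTANCE = "is an instance of"
    PARTITIVE = "is an integral part of"
    ASSOCIATIVE = "is merely related to"  # valid as a related term only


# Only these three kinds of link qualify a term as a narrower term.
HIERARCHICAL = {Link.GENERIC, Link.INSTANCE, Link.PARTITIVE}

# (child, relationship, parent) triples, as a person might label them.
links = [
    ("Sedans", Link.GENERIC, "Automobiles"),
    ("Convertibles", Link.GENERIC, "Automobiles"),
    ("Automobile insurance", Link.ASSOCIATIVE, "Automobiles"),
    ("Lake Erie", Link.INSTANCE, "Lakes"),
    ("Ontario", Link.PARTITIVE, "Canada"),
]


def split_children(links, parent):
    """Separate true narrower terms from related terms for one parent term."""
    narrower, related = [], []
    for child, kind, broader in links:
        if broader != parent:
            continue
        (narrower if kind in HIERARCHICAL else related).append(child)
    # Terms at the same level are arranged alphabetically by default.
    return sorted(narrower), sorted(related)


narrower, related = split_children(links, "Automobiles")
print(narrower)  # ['Convertibles', 'Sedans']
print(related)   # ['Automobile insurance']
```

A subtopic such as “Automobile insurance” could sit under “Automobiles” in a table of contents, but in a taxonomy it would be a related term, not a narrower term.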

There are certain editorial conventions for content, such as having units of a roughly standard length, which then impact the table of contents or navigation. While there are some variations, one chapter or section is typically not twice as long as another. To achieve balance, a large topic may be spread out over two or more sections, whereas several small topics are grouped together under a heading that is a serial list (such as “Poverty, Inequality, and Mobility”), or under “Other.” Thus, the topics in a table of contents are based on the amount of material presented. Taxonomy structure, on the other hand, looks at the terms/concepts only, and does not take into consideration the amount of content per term. There is one concept per term, not a list. Rare occurrences of two concepts combined into a single term, such as “Author voice and tone,” are the consequence of two topics being very closely related with overlapping meaning and usage.

Conclusions


While a table of contents or navigation system is not a taxonomy, nor should it be used as a taxonomy, when a legacy print source is converted to units of digital content, a table of contents is still an excellent source for creating a taxonomy.




Monday, August 31, 2015

Taxonomies and Indexes



Taxonomies and indexes are similar in that they both help guide people to find desired information on a selected topic. While they could be searched, they are designed specifically to be browsed. The obvious difference is that taxonomies for end-users are arranged hierarchically (or by facets), and indexes are arranged alphabetically. I have blogged previously on a comparison of index creation and taxonomy/thesaurus creation, but for those who are not already skilled at creating one or the other, let’s step back and further compare taxonomies and indexes themselves.

Taxonomy and Index Similarities and Differences


Taxonomies and indexes were developed for different kinds of media. Modern taxonomies are designed to function well in online implementations (through clicking on hyperlinks to narrower topics or plus signs to expand hierarchical trees), although taxonomies have existed in print as well. Indexes, specifically the back-of-the-book style, are designed to function well in print (through scanning a large number of entries and subentries on a page), although displayed indexes occasionally exist online as site A-Z indexes on small, static websites. Hyperlinked indexes at the end of ebooks are also possible, but the inadequate application of ebook standards has hindered such indexes from becoming commonplace.

Taxonomies and indexes serve different kinds of content. Taxonomies work well for content in a subject area that is easy or logical to categorize: products or product types, industries, geographic areas, occupational areas, media or document types, etc. Indexes work well for content in a subject area that is more abstract and does not lend itself to hierarchical categories: management concepts, history, news, etc. Indexes, since they are arranged alphabetically, are also excellent for browsing names/proper nouns. Taxonomies work well for a defined scope, such as collections of documents of the same type (all resumes, all marketing materials, all legal documents, etc.). Indexes, on the other hand, tend to serve better for content with a less defined scope, such as general encyclopedic information or detailed user manuals. Not surprisingly, book-like content continues to be best served by indexes.

The differences in structure are not as simple as taxonomies being hierarchical and indexes being alphabetical. Taxonomies also have alphabetical aspects, as terms at the same level of a hierarchy are typically (or by default) arranged alphabetically. Indexes, meanwhile, also have hierarchical aspects, as there are main entries with subentries under them. Some large indexes even have a third level of sub-subentries. Then there are kinds of taxonomies, called thesauri, which are structured more around terms and relationships than hierarchical trees, and such thesauri may be arranged alphabetically. In fact, the same thesaurus can be arranged either hierarchically or alphabetically, with the click of a toggle button in a thesaurus management system. But re-sorting a thesaurus alphabetically does not change it into an index. It will still lack the subentry features of an index.

The defining difference between a taxonomy and an index is that an index is not an index unless it is linked to content, as the word “index” means “to indicate” or “to point,” as in to point to content. A taxonomy is still a taxonomy whether or not it is linked to content. (But it is not really useful, unless it is linked to content.)

Where Taxonomies and Indexes Meet


In addition to back-of-the-book indexes, there also exist periodical article indexes, such as the green-bound printed volumes of the Readers’ Guide to Periodical Literature and subsequent online periodical and reference databases accessed through libraries (InfoTrac, ProQuest, EBSCOhost, etc.). What happens is that indexers index the articles with terms from the taxonomy (or thesaurus or controlled vocabulary). The result of the indexing, an alphabetical arrangement of taxonomy terms that were used in the indexing with their links to content, constitutes an index. So, the index comprises terms in the taxonomy that are linked to content and arranged alphabetically. Displayed browsable alphabetical indexes, however, have become less common in online services, as they have been replaced by features that search on the index terms instead.
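How an index emerges from taxonomy-based indexing can be sketched in Python as follows. The taxonomy terms, article identifiers, and function name are made up for illustration, and a real periodical database would be far more elaborate.

```python
from collections import defaultdict

# Hypothetical taxonomy terms that indexers may choose from.
taxonomy_terms = {"Indexing", "Ontologies", "Taxonomies", "Thesauri"}

# Hypothetical indexing results: article ID -> terms assigned by an indexer.
indexing = {
    "article-001": ["Taxonomies", "Indexing"],
    "article-002": ["Thesauri"],
    "article-003": ["Indexing", "Thesauri"],
}


def build_index(indexing, taxonomy_terms):
    """Invert the indexing into an alphabetical list of (term, articles)."""
    postings = defaultdict(list)
    for article_id, terms in indexing.items():
        for term in terms:
            if term not in taxonomy_terms:
                raise ValueError(f"{term!r} is not a term in the taxonomy")
            postings[term].append(article_id)
    # Only terms actually used in indexing appear, in alphabetical order.
    return sorted(
        ((term, sorted(articles)) for term, articles in postings.items()),
        key=lambda entry: entry[0].casefold(),
    )


index = build_index(indexing, taxonomy_terms)
for term, articles in index:
    print(f"{term}: {', '.join(articles)}")
```

The term “Ontologies” is in the taxonomy but not in the resulting index, because nothing was indexed with it. This matches the point that a taxonomy exists independently of content, whereas an index entry exists only where it points to content.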

The trend toward “multi-channel publishing” means that the same original content may appear in different formats and media, such as print and online. Online, however, may mean more than just a PDF or other ebook format of the printed version. Rather, digital text content gets chunked into units of the size or length that could be indexed as a whole with taxonomy terms, and images and new multimedia exist as separate assets that can also be indexed with taxonomy terms.  What this means is that a manual, user guide, or textbook that in print had a back-of-the-book index, in the digital or online medium consists of multiple files for each section or unit and for each media asset, which are indexed and thus retrieved by taxonomy terms instead of using the back-of-the-book index.

Index Entries for Taxonomy Terms?


I have worked on projects where printed content (books, manuals, etc.) was digitized and put into small chunks or files to be indexed with a taxonomy, and the original printed volume had a back-of-the-book index. So, the issue arose: to what extent should the legacy back-of-the-book index be utilized when developing the new digital retrieval taxonomy? I had access to the index for candidate taxonomy terms and was encouraged to utilize it.

My conclusions have been that the back-of-the-book index serves a slightly different purpose for users than does an indexed taxonomy. A back-of-the-book index serves to locate the page where something was mentioned on a specific topic. Users of a reference work, however, may at other times consult the table of contents to navigate and find the relevant sections and sub-sections. A taxonomy serves a purpose that is both, or something in-between, that of a table of contents and a back-of-the-book index. It’s for searching (like in an index) and also for navigating (like in a table of contents), but it points to the subsection level (as in a detailed table of contents), not to a page (as in an index). Also, more content is expected to be linked to a taxonomy term (a section unit, and often multiple such units) than content indicated by an index entry (as little as one sentence). So, it would not be right to use all or most of the main entries of a back-of-the-book index to create a taxonomy for the same content.



Wednesday, July 1, 2015

Taxonomies for Indexing Images

It’s becoming more common to index images with taxonomy terms, instead of just text documents or instead of just keyword-tagging of images. A taxonomy for the subject-indexing of images need not be significantly different than a taxonomy for indexing textual documents, but other metadata differs, and the indexing activity is also quite different.

A dedicated taxonomy for images might be needed for various reasons:
1.    There is no subject-indexing of text documents by an organization.
2.    Different software systems are used by the same organization for managing images and for managing text documents.
3.    Text documents of the same organization are large and thus indexed or cataloged at a broader level.

1.    No text indexing
Some organizations have a large image collection, and that is what they focus their indexing efforts on. They thus design or adapt a taxonomy specific to their image collection. They likely did not have any taxonomy for indexing text. They either don’t find the need for text document search and retrieval, or if they do, they will simply use the search engine instead, since, after all, search engines can search on text, unlike images.

2.    Different systems
Large image collections are increasingly managed in dedicated digital asset management systems, which are designed to support the various metadata associated with images and other nontext media files. Text documents, on the other hand, may be managed in document management systems, records management systems, or collaboration systems such as SharePoint. Each of these kinds of system supports some form of controlled vocabulary for tagging content. But if the images are in one system and the text documents are in another system, different controlled vocabularies are likely to be developed. Of course, a generic “content management system” may be used for both images and text documents, but many organizations don’t manage all their content in a single system.

3.    Different levels of indexing detail
The classic example of different levels of detail is for materials at the Library of Congress, which had developed Subject Headings for descriptive cataloging for library materials, which are generally monographs, such as books, or video-recordings of films, or sound recordings of music collections. While the subjects of these works might be quite specific, they are often not as specific as an individual graphic material. (An entire book may have numerous specific images.) But over the years, individual images also became part of its collection, and the LC Subject Headings were not specific enough, so the Library of Congress developed the Thesaurus for Graphic Materials, which is freely available. The fact that the Thesaurus for Graphic Materials exists does not mean that a dedicated thesaurus for images is always needed, but that it was needed in the context of the Library of Congress collections and the shortcomings of the Library of Congress Subject Headings for indexing images.

If you already have a detailed taxonomy for documents, it certainly can be used for images, as well. Some terms, such as for abstract concepts (such as “Beliefs”), will simply not be needed in the image indexing, whereas new terms might need to be added (such as the name of a specific type of flower).

There is definitely unique metadata for images, of which subjects for indexing are just a part. Examples of other possible image metadata include Creator/photographer, Location shown, Location of creation (camera location), Collection name, Time or part of day (especially if outdoors), Date taken (in contrast to date the image was digitized or edited), Number of people depicted, Copyright, Intended purpose, etc. The Thesaurus for Graphic Materials has had a separate “genre” facet that is very specific for types of graphical works (such as terms for Abstract paintings, Family trees, HVAC drawings, and Magazine covers). Image metadata standards include the IPTC (International Press Telecommunications Council)’s Photo Metadata for photojournalism. Different metadata may be needed for different kinds of images (news, commercial/advertising, art, etc.).

Indexing images is different from indexing text documents. First of all, it’s mostly manual because automation is very limited in image detection (although it may be able to detect people’s faces). It’s also more subjective as to what is of key importance in an image versus a document. An indexer may also tend to index not for what is actually depicted but for what is implied, which often, but not always, should be avoided.

I recently attended a conference presentation on this subject, “Get the Picture: Use Your Taxonomy to Classify Images” at the SLA conference in Boston earlier this month. The presenter, Ann Poole from Corbis, mentioned various challenges of image indexing, including over-indexing by photographer-submitters, indexing for emotions depicted or implied, and indexing for the backstory of an image in a known place.

Thursday, June 4, 2015

Taxonomist Trends



Last month I conducted an online survey of 150 taxonomists (described in my last blog post). Although the results will be used in another publication, it is interesting to note at this time a few comparisons between the results of this survey and those of a similar one I had conducted in late 2008 for my book, The Accidental Taxonomist. While I added further questions this time, some of the questions stayed the same for comparison.

We would expect that, over time, more taxonomists would have been doing the work for longer. While this is the case for those in the field for 8-15 years, for those involved for the longest period, over 15 years, the survey results surprisingly did not indicate this. Those who have done taxonomy work for 15 years or more were 26.2% in 2008 but only 17.6% now. The raw numbers for over 15 years did, however, increase. So, the survey percentage indicates that there are proportionally more people who have been involved in taxonomies for an intermediate period of time. At the most beginner level, the numbers and percentage of respondents with less than a year of experience in taxonomies declined, from 9.2% to 3.4%. Those with 1-4 years of experience are about the same, and those with 4-15 years of experience increased from 32.4% to 41.2%. So, these numbers could indicate a maturing of the taxonomist profession, but not a graying of the field.

The taxonomist work situation has not changed much with respect to taxonomy being a primary vs. secondary job responsibility and with respect to freelance vs. full-time employment. There was a noticeable difference, though, among those who are freelancers (totaling 17% before and 16% now): more of them are now doing freelance taxonomy work only “occasionally” compared with before, 8% now compared with 4.7% in 2008, and not as many are doing it “often” as before, 8% compared with 12.5%. The fact that there is work for those who want to do freelance taxonomy work only occasionally, whether on top of another job or in combination with other kinds of freelance work, is encouraging for those individuals who want to gradually break into taxonomy work.

Regarding professional and educational background, the leading degree and prior profession of taxonomists today remains that of librarian, and the percentage has, in fact, increased slightly. Meanwhile, those with a technical background have proportionally decreased. The percentage with an MLS/MLIS degree increased from 48.4% to 54.4% of respondents, and for the options of prior work experience, “librarian” increased from 27.7% to 28.3%. Those with an M.S. or M. Eng. degree decreased from 14.1% to 8.7%. Those with a background in Software/IT decreased from 12.3% to 8.3%, and those with a background in database design, development, or administration decreased from 6.2% to 1.5%. While the taxonomy field can certainly benefit from those with a technical background, it is not a necessary skill, and we might assume that the decline in IT people in taxonomy work since 2008 is due to an improvement in the economy, with more of those people having found work in IT again.

In other areas, knowledge management, content management, and content strategy are backgrounds that have become more common, whereas “document management” has decreased. This is likely due to the fact that “content” of various formats is becoming more common than mere “documents.” Digital asset management was not even presented as an option, but three respondents wrote it in the blank under “Other.”

Despite the preponderance of MLS/MLIS graduates, still only a minority of respondents had training in taxonomies/classification in college courses, and only a few percentage points more than before, merely reflecting that there were more MLS/MLIS graduates. Those having taken continuing education courses or workshops on taxonomies increased from 13.8% to 20.1%, but there are now more such courses that did not exist before (including mine). On-the-job training remains the primary means of learning how to create taxonomies. There has been a slight increase in on-the-job “formal” training over “informal” learning and experience, with the percentage with formal on-the-job training having increased from 21.5% to 28.9%. Since this particular survey question permitted multiple responses, the leading response of informal on-the-job learning was 71.1%, but this was the only response option with a decrease (down from 83.1%). This is a good sign that taxonomists seem to be learning the skill through more varied means than the dominant on-the-job experience.