Sunday, February 19, 2017

Avoiding Mistakes in Taxonomy Hierarchical Relationships

Perhaps the most important issue in designing a hierarchical taxonomy is creating hierarchical relationships between terms correctly. This makes the taxonomy intuitively easy to understand and navigate by all kinds of users, regardless of whether they have had any training on using a taxonomy.

The basic principles of the hierarchical relationship are described in the ANSI/NISO Z39.19 and ISO 25964-1 standards for thesauri. As a quick summary, the relationship is created between terms in the following circumstances:

  • a broader term which is generic and a narrower term which is a more specific type of the generic broader term,
  • a broader term which is generic and the narrower term is a named instance (proper noun) of the generic broader term,
  • a broader term which is a whole entity and a narrower term which is an integral part.
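For those who keep a taxonomy in a spreadsheet or script rather than in dedicated software, the three circumstances above can be recorded explicitly on each relationship. The following is only a minimal sketch: the HierarchyKind names, the sample terms, and the narrower_terms helper are my own illustrative assumptions and are not taken from the standards.

```python
from enum import Enum


class HierarchyKind(Enum):
    """The three circumstances for a hierarchical relationship (names are illustrative)."""
    GENERIC = "generic-specific"   # narrower term is a kind of the broader term
    INSTANCE = "generic-instance"  # narrower term is a named instance of the broader term
    PARTITIVE = "whole-part"       # narrower term is an integral part of the broader term


# Hypothetical sample data: (narrower term, broader term, kind of relationship).
relationships = [
    ("Glaucoma", "Eye diseases", HierarchyKind.GENERIC),
    ("Lake Superior", "Lakes", HierarchyKind.INSTANCE),
    ("Retina", "Eyes", HierarchyKind.PARTITIVE),
]


def narrower_terms(relationships, broader, kind=None):
    """List the narrower terms of a broader term, optionally limited to one kind."""
    return [
        narrower
        for narrower, parent, rel_kind in relationships
        if parent == broader and (kind is None or rel_kind == kind)
    ]


print(narrower_terms(relationships, "Eye diseases"))
```

Recording the kind of relationship makes it easier to review the generic-specific relationships separately, since those are the ones most prone to error.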

It is the first, generic-specific type that is most common, but is also most prone to errors by those not experienced in creating taxonomies. Typical errors include confusing refinement and narrower terms, too closely reflecting the source content hierarchy, and creating narrower terms that are applications, uses, or examples of a broader term.

Confusing Refinements with Narrower Terms

We envision users browsing a hierarchical taxonomy from top down, from broad topic to more specific topic. A more specific topic is a narrower term (NT) of a broader topic. However, instead of providing more specific topics, the creator of a taxonomy might mistakenly provide refinements of the broader topic, which are aspects of the topic, but not actually narrower terms. A term that is an aspect or refinement is not a unique stand-alone term/concept, but rather it is meant to be used in combination with its parent term.

An example of such an erroneous hierarchy would be:

  Eye diseases
     NT: Diagnosis

Diagnosis is an aspect or refinement of Eye diseases (and of other disease-type terms), and not a narrower term. A narrower term would be a specific type of eye disease:

  Eye diseases
     NT: Glaucoma

A refinement term might not be as obvious as it is in the above example. If the same term, however, appears duplicated as a narrower term to different broader terms, but with a different implied/contextual meaning in each case, this should be a red flag that the duplicated narrower term is really a refinement term. For example, the duplication of the term Waiver in a legal taxonomy as:

  Objections to evidence
     NT: Waiver

  Right to jury trial
     NT: Waiver

In this case, the duplicate narrower term should be changed to be specific in each case, such as: Objections to evidence waiver and Right to jury trial waiver.
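If the taxonomy can be exported as a list of broader term-narrower term pairs, this red flag can be checked for automatically. The following is a minimal sketch under that assumption; the nt_pairs data and the function name are hypothetical. A repeated label is only a candidate for review, since a term can legitimately have more than one broader term when its meaning is the same in each place.

```python
from collections import defaultdict

# Hypothetical export of (broader term, narrower term) pairs.
nt_pairs = [
    ("Eye diseases", "Glaucoma"),
    ("Objections to evidence", "Waiver"),
    ("Right to jury trial", "Waiver"),
]


def find_repeated_narrower_terms(pairs):
    """Return narrower-term labels that appear under more than one broader term."""
    broader_by_label = defaultdict(set)
    for broader, narrower in pairs:
        broader_by_label[narrower].add(broader)
    return {
        label: sorted(broader_terms)
        for label, broader_terms in broader_by_label.items()
        if len(broader_terms) > 1
    }


# Each result still needs a human decision: same meaning in both places, or a refinement?
print(find_repeated_narrower_terms(nt_pairs))
```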

Novice taxonomists might create such incorrect broader term-narrower term relationships because they have seen them formed as such elsewhere, such as Library of Congress Subject Headings plus Subdivisions or back-of-the-book index main entries plus subentries. A subheading or a subentry is not the same as a narrower term, because a subheading or a subentry only has usage and meaning in the context of the main heading it is associated with (appears under). A taxonomy narrower term, on the other hand, is not a different kind of term, but is rather a description of a relationship between terms. The meaning of a term in a taxonomy is constant and not dependent on its location in the taxonomy.

Too Closely Reflecting the Source Content Hierarchy

Some taxonomies are based heavily on certain text sources, such as the table of contents of one or a limited number of books or manuals, where the text is structured into units, chapters, main heading sections, subheading sections, etc. It is thus natural to make use of the structure of the text as a basis for the structure of the hierarchy. But there can be issues.

In the following example of a chapter and its headings from a textbook, greater hierarchical structure is needed for the corresponding taxonomy terms, and one of the topics (Units of Measure) does not belong within this hierarchy.

  Microbiology Laboratory
  --Microbiology Lab Personnel
  --Introduction to the Microscope
  --Units of Measure
  --Types of Microscopes
  --Laboratory Staining Methods
  --Culture Media

These concepts may appear in a taxonomy arranged hierarchically as follows:

  Medical laboratory technology
  NT: Laboratory equipment and supplies
       NT: Culture media
       NT: Microscopes
            NT: Microscope types
  NT: Laboratory personnel
  NT: Microscope use
       NT: Microscopy stains
  NT: Serology

  NT: Measurements and calculations
       NT: Units of measure

Another issue is that, even when the hierarchy from the source is acceptable, the subheading-based terms are short, generic, and without context. An example is as follows:

  Eye Medications
  --Anti-inflammatory Agents
  --Antiglaucoma Agents
  --Local Anesthetics

The only correct narrower term above is Antiglaucoma Agents, as the other terms are not specific to eye medications. They could be linked as related terms instead.

Applications, Uses, or Example-Type Terms

Relying too much on certain text sources for the taxonomy may also result in erroneously creating narrower terms for the applications, uses, or examples of the broader term concept, because the text presents content that way.

Following are several examples:

  Web Applications
  --Tourism and Travel
  --Higher Education
  --Financial Institutions
  --Software Distribution
  --Health Care

  Decision making issues
  --Ethical conflicts
  --Information sources
  --Intraorganizational conflicts
  --Social influences

  Globalization challenges
  --Cultural differences
  --Economic risk
  --Political risk
  --Managerial limitations

Each of these so-called narrower terms is merely an example within the context of the broader term. All of the "narrower terms" could have other uses beyond the context of the broader term. To make the hierarchy correct, either:
1) the relationship should be changed from narrower-term (NT) to related-term (RT). This would be the case if these terms can logically exist elsewhere in the taxonomy. Indexing of the concepts may then require a pair of terms (such as Globalization challenges AND Economic risk); or
2) the narrower terms should be modified and clarified, such as Cultural challenges to globalization, Economic risk challenges to globalization, Political challenges to globalization, and Managerial challenges to globalization. This would be the case if these terms do not exist elsewhere in the taxonomy.

In conclusion, hierarchical relationships need to be constructed independent of any sources for terms, and they need to be universal and not subject to certain contexts.

Friday, January 20, 2017

Orphan Terms in a Taxonomy

A taxonomy has hierarchical relationships between all of its terms, so one of the quality control checks on a taxonomy is to ensure that there are no “orphan” terms, which are terms that lack hierarchical relationships. One of the purposes of a taxonomy is for users to be able to navigate it (whether it is fully displayed or whether the links between only the selected terms are displayed), in order to find terms of interest. An orphan term, thus, cannot be found by browsing, only by searching.

Taxonomy/thesaurus management software can generate orphan term reports. However, as there are different kinds or definitions of taxonomies or thesauri, there are also different kinds or definitions of orphan terms.  Certain definitions of orphans may be permitted, other kinds of orphans may be permitted in only certain kinds of controlled vocabularies, and some kinds of orphans are never permitted in any taxonomy or thesaurus.

Differences between taxonomies and thesauri

There are two main differences between strictly defined taxonomies and thesauri that have an impact on orphan terms.
  1. A taxonomy has only hierarchical (broader-narrower) relationships between its terms, whereas a thesaurus has both hierarchical and associative (related-term) relationships between terms.
  2. In a taxonomy, all terms belong to a single or limited number of hierarchies, each with a designated, broad-meaning “top term,” whereas in a thesaurus hierarchical relationships are created between terms merely as appropriate, without regard to any larger hierarchies or top terms.  A taxonomy thus has a top-down inverted tree structure, whereas a thesaurus does not necessarily have an over-arching hierarchical structure.

Different kinds of orphan terms

The loosest and easiest to remember definition of an orphan term is a term which lacks a “parent”. In other words, the term has no broader term, but it may have other kinds of relationships to terms.  A “top term” report of taxonomy/thesaurus management software will get this result, since all top terms are, by this definition, orphans.

An orphan term could also be defined as a term that has no hierarchical relationships, whether broader or narrower. In a thesaurus, such terms could have associative relationships only. In a taxonomy (lacking associative relationships), these terms then would have no relationships to other terms in the taxonomy.

Under the strictest definition, an orphan term is a term which lacks any relationships to any other term. This would be the same in a taxonomy or a thesaurus.

Finally, taxonomy/thesaurus management software may have a feature that allows you to define your own orphans, that is, to designate a relationship type and then generate a list of terms that lack that relationship type to any other terms.
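The different definitions of orphan terms above amount to the same check run against different sets of relationship types. The following is a minimal sketch, assuming a tiny hypothetical thesaurus held in a dictionary; the terms data and the orphans function are illustrative only and do not represent the report of any particular software product.

```python
# Hypothetical miniature thesaurus: each term lists its broader terms (BT),
# narrower terms (NT), and related terms (RT).
terms = {
    "Eye diseases": {"BT": [], "NT": ["Glaucoma"], "RT": []},
    "Glaucoma": {"BT": ["Eye diseases"], "NT": [], "RT": ["Antiglaucoma agents"]},
    "Antiglaucoma agents": {"BT": [], "NT": [], "RT": ["Glaucoma"]},
    "Serology": {"BT": [], "NT": [], "RT": []},
}


def orphans(terms, relation_types):
    """Return terms that have no relationship of any of the given types."""
    return sorted(
        term
        for term, relations in terms.items()
        if not any(relations[rel_type] for rel_type in relation_types)
    )


no_parent = orphans(terms, ["BT"])                # loosest: no broader term (includes top terms)
no_hierarchy = orphans(terms, ["BT", "NT"])       # no hierarchical relationship at all
no_relations = orphans(terms, ["BT", "NT", "RT"])  # strictest: no relationships of any kind

print(no_parent)
print(no_hierarchy)
print(no_relations)
```

Passing a different list of relationship types, such as ["RT"] alone, corresponds to defining your own kind of orphan.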

Which kind of orphans to avoid

Orphans defined merely as those lacking broader terms are not necessarily a problem, since every taxonomy or thesaurus has top terms. For quality control, you would want to ensure that these parent-less “orphans” are indeed the top terms that you want. For a taxonomy, there are strict criteria for top terms. They must be broad-meaning categories under which are extensive hierarchical trees, perhaps even of a similar depth and breadth for each top term. For thesauri, the requirements for top terms are usually not strict, but it is still a good idea to review the top terms to ensure that there really is no appropriate broader term to move them under.

An orphan report of the kind that indicates terms that lack any hierarchical relationship (narrower or broader) but may have associative (related-term) relationships is quite helpful when editing thesauri. It will depend on the thesaurus owner whether the policy should permit such “hierarchical orphans.” Generally, such orphans should at least be avoided and perhaps permitted in only exceptional circumstances.

Orphans defined as terms that lack any relationships to other terms in the taxonomy should not be permitted in any circumstance. They don’t serve the navigation feature of a taxonomy, as there is no way to find them without search. If a suitable broader term within the taxonomy cannot be found, then they may be out of scope of the taxonomy/thesaurus. Usually, though, such orphan terms are the results of taxonomist error. If the taxonomy management software permits duplicate terms, these orphans could be duplicates of synonyms/nonpreferred terms/alternative labels.

Resolving orphan terms

In the case of orphan terms that lack broader terms but are not obviously top terms, the taxonomist should search the taxonomy/thesaurus for a suitable broader term. If one cannot be found, careful consideration should be given to whether a new term should be added that would both serve as a broader term for the orphan term and have a suitable broader term of its own already in the taxonomy/thesaurus. If dealing with a thesaurus rather than a taxonomy, then it may be OK to leave the term without a broader term, but then the related-term relationships should be checked and possibly enhanced so that there are multiple related-term relationships.

Sometimes stretching the thesaurus rules for hierarchical relationships may be desired to provide a broader term to an orphan. This is generally acceptable in a taxonomy but not in a thesaurus. Following are examples of former orphan terms whose candidate broader terms are not 100% correct broader terms (the narrower term is not a kind of or a part of its broader term), but they are close, so these relationships could be made, even in a thesaurus. What follows in parentheses are theoretical broader terms which are not practical terms to create.
  • College applications BT College admissions (and not a BT of Applications)
  • Behavior problems BT Behavior (and not a BT of Problems)
  • Atmospheric composition BT Atmosphere (and not a BT of Composition)
  • Conflict termination (Military science) BT Wars (and not a BT of Termination)

Orphans that lack any relationships are usually the result of taxonomist error. Perhaps the taxonomist got interrupted and did not complete the process of relating a term and then forgot. In many cases these orphans should have been made as synonyms/nonpreferred terms/alternative labels. The taxonomist should run orphan reports frequently enough to remember whether the orphan term was intended to be a preferred or a nonpreferred name.

More examples of how to resolve orphan terms are in a PDF of a PowerPoint presentation “Managing Mature Taxonomies: Resolving Orphan Terms” I gave as an SLA Taxonomy Division webinar in December 2016.

Sunday, December 11, 2016

Use Cases for Taxonomy Development

Developing use cases in the initial design of a taxonomy is something I did not learn about until I went into consulting, but it is a useful approach to taxonomy and metadata design in any circumstance, regardless of the involvement of an external taxonomy consultant.
The use case technique comes from the field of systems analysis, and especially software and systems engineering, but use cases are increasingly applied in the development of systems and structures for knowledge management, content management, information management, etc. Typically use cases for a taxonomy are not limited to the taxonomy alone but are for the design of all metadata and the broader information or knowledge management system. “System” means the combination of software, content, metadata/taxonomies, and users.

What is a use case?

A use case describes a scenario of how a user uses a system to accomplish a particular goal. A use case should not be confused with a case study. It need not be long and detailed, although use cases may vary in their descriptive length. All use cases include:
  1. A designated user type and role (sometimes called “actor”), which could be as simple as an internal organization job title. Examples of external users could be designated as: undergraduate college student, paralegal, pharmaceutical corporate librarian, experienced online shopper, etc.
  2. A task that the user is engaged in which uses the system. This will likely be described in more detail than the description of the user. Taxonomy use cases would typically involve a specific aspect of one of the following tasks: indexing/tagging, using search to find information, using browse to find information, discovering/exploring for related information, and finding/retrieving certain content items.
  3. A goal and perhaps ultimate purpose of the user’s task.
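When collecting use cases from many stakeholders, it can help to record them in a consistent structure with these three parts. The following is a minimal sketch; the UseCase class, its field names, and the sample goal wording are my own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """One taxonomy use case: who the user is, what they do, and why."""
    actor: str  # designated user type and role
    task: str   # what the user does with the system
    goal: str   # what the user ultimately wants to achieve

    def summary(self) -> str:
        """Render the use case as a single sentence for review with stakeholders."""
        return f"{self.actor}: {self.task}, in order to {self.goal}."


# Hypothetical example use case.
shopper = UseCase(
    actor="Experienced online shopper",
    task="searches for carry-on luggage and filters results by price, color, and reviews",
    goal="purchase a suitable bag",
)

print(shopper.summary())
```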
I once participated in a consulting project in which the stakeholders were advised to create use cases that went so far as identifying fictitious personas, a practice that is often done in marketing planning. I don’t think it’s necessary to go that far in taxonomy use case development, although it might be useful if there are users of the taxonomy who are external customers/clients.

Why create use cases for taxonomies and other metadata?

The task of developing taxonomies and other metadata can benefit from use cases in particular ways:
  • It grounds the taxonomy in reality, ensuring that it is designed to be usable, rather than being an academic taxonomy on a subject domain.
  • It engages the users and other stakeholders in the taxonomy development process, who then become more interested in supporting/promoting or using the taxonomy, especially when the taxonomy serves their user needs and solves their problems.
  • It provides sample situations which can then be utilized for testing the draft taxonomy before the taxonomy and content are fully implemented in the system. As a taxonomist who has led taxonomy testing activities among sample users, I have personally found use cases to be valuable for this purpose.

What are examples of use cases for taxonomies and other metadata?

The following brief fictitious use case examples are of the kind that could be used for taxonomy development.

Internal organization use cases:

  • A subject-matter-expert author who is required to tag authored documents with subject categories so that users can find documents by subject.
  • A digital asset manager in an advertising agency, who needs to ensure that image files are assigned the proper copyright information.
  • A content manager at a publishing company who, as a major responsibility, needs to assign full metadata to XML file content for various downstream purposes to assemble digital content products.
  • A marketing copywriter seeking an expert on a specific subject among a company’s employees to give feedback on the accuracy of a blog post the copywriter is writing and who is inclined to browse subjects if available. 
  • A manager who wants to find historical information on a product offered in order to prepare a presentation about the product.
  • A digital marketer who needs to update the public website with seasonal images that were not used last year (but two years ago is OK).
External/customer use cases:
  • An undergraduate student who uses the default search to look for information on the events leading up to the fall of the Berlin Wall for a history class paper.
  • An experienced online shopper who is searching to purchase carry-on luggage and wants to filter results by price, color, and positive reviews.
  • A corporate librarian conducting competitive intelligence research on market strategies of leading competitor companies in the same industry and who would like to use advanced and/or Boolean searching, if possible.
  • A lawyer specialized in commercial law who needs to find out where and how to file a financing statement in the proper jurisdiction for a client who wants to secure a loan, but lacks experience in legal research.
  • A cancer patient searching for an oncologist with a certain type of cancer specialty, acceptance of certain insurance, within a certain geographic region, and with a number of good patient reviews.
  • A compliance officer who needs to find regulations and associated policies and procedures that pertain to various departments and product lines of his employer, who knows the names of statutes but not the titles of associated regulations.

How are taxonomy use cases utilized?

In addition to serving the purposes of engaging stakeholders and ensuring the taxonomy is content- and user-focused, use cases can have additional specific applications, such as:

  • Identifying or validating who all the different types of users are, so that their issues and feedback can be taken into consideration in the future.
  • Suggesting improvements in the user interface design.
  • Developing walk-through scenarios, with specific search criteria or topics of browsing spelled out, for offline testing of the taxonomy usability (including adequate depth and breadth) for both indexing/tagging and retrieval. (Read more at the post "Testing Taxonomies.")
  • Providing scenarios that can be used in other taxonomy/knowledge management project research, such as ROI (return on investment) research.

Wednesday, November 30, 2016

Popular Topics in Taxonomies

This month marks the 5th anniversary of The Accidental Taxonomist blog, so it is a fitting time to look back and see which posts were most popular. Following are the top 10 posts with the most visits (pageviews) from the time they were published to date, with the number of visits indicated:

1)  3722  E-Commerce Taxonomies (Nov 26, 2012)
2)  3267  Taxonomy Software Directories (Apr 11, 2014)
4)  2462  Taxonomies vs. Classification (Apr 2, 2013)
5)  1859  Taxonomies vs. Thesauri (Jan 28, 2014)                         
6)  1743  Digital Asset Management and Taxonomies (May 28, 2012)
7)  1725  Information Architecture and Taxonomies (Nov 9, 2013)       
8)  1670  Taxonomy Design for Content Management Systems (May 4, 2016)                     
9)  1621  Taxonomy Governance (Dec 9, 2013)
10) 1448  Topics and Document Types in Taxonomies (May 6, 2013)

The post on taxonomies for e-commerce has been the most popular blog post since shortly after it was published. This does not necessarily mean that e-commerce is the most common implementation of taxonomies, but it is clearly defined, whereas others, such as enterprise taxonomies, could go by different names, such as business taxonomies, internal taxonomies, organizational taxonomies, intranet taxonomies, etc. Nevertheless, e-commerce is a very significant application of taxonomies. Among my presentations on SlideShare, the presentation on e-commerce taxonomies is also by far the most popular.

Other popular blog post topics on taxonomies tend to be those in combination with other significant topics in the blog title, such as software, ontologies, digital asset management, content management, content management systems, information architecture, and governance. This is not surprising. I am a little more surprised at the popularity of the topics “Taxonomies vs. Classification,” “Taxonomies vs. Thesauri,” and especially “Topics and Document Types in Taxonomies.”

Other posts with high pageview numbers (although not in the top 10) include “Card Sorting and Taxonomies,” “Taxonomies and Content Management,” “Evaluating Taxonomies,” “Faceted Search vs. Faceted Browse,” and “Business Taxonomies.”

Blog posts that were less popular (besides the first two) were ones about taxonomists, and not taxonomies, despite the title of this blog, such as “The Remote Taxonomist” and “Mentoring Taxonomists.” The post on “Multilingual Taxonomies” surprisingly has one of the fewest page views, but I had posted it only in my first month, November 2011, of the blog, when the blog was not well known. I would expect it to be found later through searches, though.

Some posts will get high numbers of visits based on their titles, and some will not, such as “Tags and Categories” or “Taxonomies for Multiple Kinds of Users,” even if the topics are of particular interest to taxonomists. It sometimes seems as if I have already posted on all of the leading topics related to taxonomies, and there is not much more to write about. However, there will continue to be interesting topics to write about, but I may simply run out of blog post titles that have high SEO (search engine optimization) value.

Monday, October 31, 2016

Taxonomy Boot Camp London Conference

I was fortunate to attend and present at the inaugural Taxonomy Boot Camp London conference earlier this month. After 11 successful years in the United States (initially in New York in 2005, then for four years in San Jose, CA, and six years in Washington, DC), Taxonomy Boot Camp held its first overseas conference at the Olympia Conference Centre in London, October 17-18, 2016. Although taxonomy-related topics are presented at many other conferences, Taxonomy Boot Camp remains the only conference dedicated to taxonomies.

Conference Format 

The conference was very similar and comparable to Taxonomy Boot Camp in the U.S., with respect to scope and range of topics covered, level of detail, and quality. The only difference was some more UK examples/case studies, rather than US examples/case studies. Sessions were also a similar mix of general topics and case studies. The format was also similar, but not identical. Whereas the U.S. conference has had, in recent years, two tracks on the first day and a combined track the second day, Taxonomy Boot Camp London maintained two tracks on both days, except for the keynotes and one plenary session. As a result, I had to make more decisions about which sessions to attend. The number of speakers is about the same at both conferences, so by holding more concurrent sessions, Taxonomy Boot Camp London had slightly longer sessions per speaker on average. At Taxonomy Boot Camp in the U.S., an individual speaker may speak for only 15-20 minutes in many sessions. A half-day afternoon pre-conference workshop on “Taxonomy Fundamentals” was also part of Taxonomy Boot Camp London, whereas Taxonomy Boot Camp in the U.S. has not had half-day pre-conference workshops, shared with KM World, since 2009, as now the conference starts on the Monday of the KM World pre-conference workshops. Instead, Taxonomy Boot Camp in the U.S. has a 1.5-hour taxonomy basics session on the first day, concurrent with other sessions.

Attendance was strong for a first-time specialized conference, with 173 attendees (including 42 speakers). While not as many attendees as Taxonomy Boot Camp in Washington, DC, which has about 200, this was more attendees than the U.S. Taxonomy Boot Camp conference had in its earlier years. There was, as expected, greater international participation from throughout Europe. There were probably slightly more whose interest in taxonomies is for internal organization information management, rather than for published content, whether corporate, nongovernmental organization, or government agency. While there were some publishers, there was a noticeable lack of those involved in ecommerce. I led the half-day pre-conference workshop, and received the list of 37 attendees and their affiliations for the workshop, and I assume they are a representative sample of the conference attendees.

As with Taxonomy Boot Camp in the U.S., the conference is not held by itself, but is co-located with another conference by the same organizer, Information Today Inc. Whereas in the U.S. Taxonomy Boot Camp is currently co-located with KM World, Enterprise Search & Discovery, and SharePoint Symposium, Taxonomy Boot Camp London was co-located with Internet Librarian International (ILI), which has been taking place in London every October since 2008. Taxonomy Boot Camp London and ILI (which now has the tagline “The Library Innovation Conference”) are not as integrated as Taxonomy Boot Camp and KM World are. The attendees were more distinct in their professions and interests. Whereas in the U.S. attendees may register for a “platinum” pass which allows access to any of the co-located conference sessions, in London the registrations for the two conferences were distinct. There were no shared keynotes, and meals and breaks were in slightly different areas. Taxonomy Boot Camp attendees had access to the ILI sponsor booths, but ILI attendees did not have access to the three Taxonomy Boot Camp sponsor booths, which were located within one of the session rooms. I imagine this might change in the future, if the number of Taxonomy Boot Camp London sponsors grows. On the other hand, the relatively contained nature of Taxonomy Boot Camp London made it an excellent networking opportunity.

Taxonomy Boot Camp London also had an association partner, the International Society for Knowledge Organization (ISKO), whose UK chapter is very active. Its chapter for Canada and the United States is not so active. Its membership also tends to be more academic, with variations by chapter, but its vice president, Stella Dextre Clarke, who gave a brief presentation, said that the organization hoped to broaden its membership more beyond academia.

Specific Sessions

The two keynotes, one on each morning, were both excellent and relevant to the audience. Mike Atherton, a content strategist at Facebook and formerly an information architect at the BBC for its websites, spoke on “Designing Future-Friendly Content” as the opening keynote. He presented a case study of designing the website for the IA Summit conference, which is redone every year. Some of his key points were: Agree to the strategy, argue the tactics, stand up for taxonomy for information architecture, and be a teacher.

Patrick Lambe, partner of the knowledge management consultancy Straits Knowledge, and a frequent speaker at Taxonomy Boot Camp in the U.S., presented the second day’s keynote: “Gathering evidence for a taxonomy – knowledge mapping or content modellings.” He spoke of the key issues/decision points as: purpose, constraints, principles, and scope. He said that subject matter experts should only be engaged for feedback on specific questions at the end of a taxonomy project. Design is based on evidence and desired outcomes. Warrant is the evidence behind the design and includes content warrant, user warrant, and standards warrant. There are different approaches for building different kinds of taxonomies. For building an internal/enterprise taxonomy, Patrick recommends undertaking knowledge auditing and knowledge mapping, mapping both activities and assets. For building an external-use taxonomy, or one with both internal and external sources and scope, knowledge mapping does not work. Rather, content modeling is done with use case scenarios and just a sampling of content.

Other informative sessions of note included “How to fast-track taxonomy projects using linked data” by Dave Clarke, CEO of Synaptica. He explained the difference between linked open data and linked enterprise data (behind the firewall), both of which have their uses and benefits. Mapping to linked open data resources can be done for semantic enrichment, pulling information from outside into an organizational system.

Ben Licciardi, Manager (consultant) at PwC, presented "Taxonomies and the systems in which they reside: Is the technology-agnostic approach right for you?" He presented the benefits of both scenarios. Developing a technology-agnostic taxonomy, in addition to enabling the taxonomy to be used in different systems, also gets you thinking outside the box and helps future-proof the taxonomy. A system-focused taxonomy, on the other hand, keeps you grounded in reality, is designed to the customers, and is budget-conscious.

A panel comprising two consultants and a user experience architect spoke in a session titled “Working within multi-disciplinary teams - taxonomist tales from the trenches.” Among other things, they discussed that more people want to be involved on the team developing taxonomies, and more people should be talked to, including scrum masters, QA team, tech leaders, user experience people, software people, content strategists, product managers, business analysts, data modelers, enterprise architects, etc.

Congratulations to conference chair Helen Lippell for a successful conference. The date for the next Taxonomy Boot Camp London conference has already been set for October 17-18, 2017, at the same venue.