Thursday, June 6, 2013

How Many Facets


Faceted taxonomies (taxonomies with attributes, dimensions, filters, etc. to limit search results based on the combination of selected criteria) are becoming increasingly popular with the support of web database technology. Unlike traditional hierarchical taxonomies, designing a faceted taxonomy first requires a decision on how many facets to create. There are various factors to take into consideration. 

What the content supports

The nature of the content is always the most important factor. It may seem ironic, but content that is more limited in scope can support more facets than content that is broad in scope. For example, an ecommerce site selling just computers could have a relatively large number of facets by which to limit laptop computers: brand, price range, hard drive, screen size, operating system, processor brand, processor type, webcam inclusion, and online/in-store availability (9 facets). On the other hand, if a content repository comprises all kinds of articles, then there is not much else beyond “subject” and article type to classify them by (2 facets). (Other metadata fields, such as author, title, and date, may also be used to limit results, but these do not involve taxonomy terms.)
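To make the idea of limiting results by a combination of selected criteria concrete, here is a minimal sketch in Python. The product records, the brand names, and the function names filter_by_facets and facet_counts are all hypothetical, invented for illustration; only the facet names come from the laptop example above. The sketch assumes the common convention that values selected within one facet are alternatives (OR), while selections in different facets must all be satisfied (AND).

```python
from collections import Counter

# Hypothetical product records; facet names follow the laptop example above.
LAPTOPS = [
    {"name": "Laptop A", "brand": "Acme", "screen_size": "13 in", "operating_system": "Windows"},
    {"name": "Laptop B", "brand": "Acme", "screen_size": "15 in", "operating_system": "Linux"},
    {"name": "Laptop C", "brand": "Globex", "screen_size": "15 in", "operating_system": "Windows"},
    {"name": "Laptop D", "brand": "Globex", "screen_size": "13 in", "operating_system": "Windows"},
]


def filter_by_facets(items, selections):
    """Return the items that match every selected facet.

    selections maps a facet name to the set of values the user ticked.
    Assumed convention: OR within a facet, AND across facets.
    """
    return [
        item
        for item in items
        if all(item.get(facet) in values for facet, values in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of one facet (for display next to the value)."""
    return Counter(item[facet] for item in items if facet in item)


if __name__ == "__main__":
    chosen = {"brand": {"Acme", "Globex"}, "screen_size": {"15 in"}}
    for laptop in filter_by_facets(LAPTOPS, chosen):
        print(laptop["name"])
    print(dict(facet_counts(LAPTOPS, "brand")))
```

The more facets the content supports, the more keys a record like this can carry. A repository of general articles would have little beyond a subject and an article type to put in each record.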

What the end-user user interface supports

More facets can be included if they are stacked one above another vertically, such as in a left margin, than if they are displayed horizontally across the width of the screen. This is because horizontal scrolling is something users dislike and is avoided in content design, whereas limited vertical scrolling is acceptable.

Sometimes a website or intranet is created in a web content management system that does not give as much flexibility in taxonomy display. For example, SharePoint requires a horizontal list of facets, if the facets are to be used to filter content displayed in “columns,” where facet names are the column headers. Furthermore, SharePoint will by default create columns for document format type, content type, author, date created, and date modified. While you can hide these columns, if you want to use some of these defaults, that will limit the number of other descriptive facets for columns to about three or four.

Facets that limit search results are typically displayed in the left margin, so more facets can be created. However, the number of facets should be limited so that all of the facet labels (although not necessarily all of their contents/facet values/terms) display by default without scrolling. The first 4-6 terms or values within a facet should be displayed to give the user a good understanding of what is in there, with a link or button to “show more.” Scrolling can be used when a facet category is expanded. So, what needs to be considered is the vertical space required if all facets display at least some values and, if that does not fit, whether some facets can be collapsed by default. The example below of the facets for limiting people search results on LinkedIn shows the default display of two facets with the first 6 terms, one facet with all 5 terms, and 12 facets collapsed (an unusually high number of facets).
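Separately from the LinkedIn example, the vertical-space reasoning in this section can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name plan_facet_display, the idea of counting screen space in "rows," and the greedy top-to-bottom order are all hypothetical choices, not a description of how any particular product lays out its facets. Each facet label takes one row, each displayed value takes one row, and a “show more” link takes one row.

```python
def plan_facet_display(facets, max_rows, preview=6):
    """Decide which facets are expanded by default and which are collapsed.

    facets   -- list of (label, values) pairs, in display order
    max_rows -- assumed number of rows visible without scrolling
    preview  -- how many values to show per expanded facet (4-6 is typical)

    Every label always gets a row. Facets are then expanded from the top
    down for as long as their preview values still fit.
    """
    remaining = max_rows - len(facets)  # rows left once every label has a row
    if remaining < 0:
        raise ValueError("too many facets: the labels alone do not fit")

    plan = []
    for label, values in facets:
        shown = list(values[:preview])
        show_more = len(values) > preview
        cost = len(shown) + (1 if show_more else 0)
        if cost <= remaining:
            remaining -= cost
            plan.append({"label": label, "collapsed": False,
                         "shown": shown, "show_more": show_more})
        else:
            plan.append({"label": label, "collapsed": True,
                         "shown": [], "show_more": False})
    return plan


if __name__ == "__main__":
    # Hypothetical facets with made-up value names.
    facets = [
        ("Brand", [f"Brand {i}" for i in range(8)]),
        ("Screen size", ["13 in", "15 in", "17 in"]),
        ("Operating system", [f"OS {i}" for i in range(10)]),
    ]
    for entry in plan_facet_display(facets, max_rows=15):
        state = "collapsed" if entry["collapsed"] else f"{len(entry['shown'])} values shown"
        print(entry["label"], "-", state)
```

If the function raises the error, the labels alone exceed the available space, which is the signal that the taxonomy has more facets than the interface can support.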



What the tagging process supports

For manual tagging, you have to consider who is doing the tagging, what their knowledge and experience are, what level of training is practical, how much time and effort can practically be devoted to tagging, and what the tagging user interface looks like. As with the end-user UI, the tagging interface also needs to display all facets and facet values in an easy-to-use manner. Usually, people who tag content for internal content management are not dedicated indexers. To simplify tagging and to ensure that it is done correctly, and done at all, internal tagging should not involve too many facets (around 3, for example).

Organizations which tag/index content for subscription sale, on the other hand, where content indexing is core to their business, will invest in dedicated indexers who can be given thorough training in assigning terms from multiple facets and will also check their indexing for quality. Thus, for professional indexing, a greater number of facets can be supported.

In automated tagging, it’s not so much a matter of how many facets there are, but rather how distinct the facets are and how easily they lend themselves to automated tagging. There are different technologies out there, but, in general, named entities/proper nouns are easier to distinguish than topical subjects. So, facets for author, location, department, product name, etc., are easy to classify automatically. Language and a document type that is based on file format are also straightforward for auto-classification. Subject or Topic could be a catch-all for high-ranked keywords. If you want to create facets for different kinds of topics, though, such as Purpose, Activity, Significance, Origin, etc., the distinctions will likely be too challenging for an auto-classification tool.