Monday, February 29, 2016

Free Taxonomy Management Software

There is always an interest in free taxonomy or thesaurus management software. Many people who create taxonomies try to save money by not buying dedicated taxonomy management software at all and instead using something they already have, such as Excel. Those who are developing either very large taxonomies or more complex thesauri, however, realize that a dedicated taxonomy/thesaurus management system will save a lot of time and headaches in the long term.

Various free thesaurus management software offerings have been available since the early 1990s. Many have their origins in academic projects in computer science, information science, or library science at universities; others have been government projects. Some free software of the previous decade is no longer available, though. Discontinued software is still listed for posterity on the web directory of "Software for building and editing thesauri," started by Leonard Will and now managed on the Taxobank website. For example, two of the free products listed there ran on MS-DOS, and one ran on nothing later than Windows 3.1.

The first free thesaurus software I was familiar with was TheW, a simple thesaurus management program developed by Tim Craven, a professor of information science at the University of Western Ontario, since retired. I actually ran across it because I was at the time exploring another software program of Prof. Craven’s for creating website indexes. TheW32, which is available for Windows XP, Vista, and 8 and for Java, is no longer maintained. It was last updated for Windows in 2007 and for Java in 2009. At this point, I would no longer recommend it.

Protégé Ontology Editor is an established free and open-source ontology editor from Stanford University. It is quite robust, has an active user community and support groups, and continues to be upgraded (with version 5.0.0 recently released in beta). The issue with Protégé is that it is a native ontology management tool, not a thesaurus management program (or even an ontology “lite” tool, as some thesaurus management software can manage semantic relationships and classes). Thus, it takes a very different approach to modeling and building vocabularies, one that is not intuitive to taxonomists such as myself. Although I downloaded it, I never found it worth the difficulty to learn. If you can truly consider yourself an ontologist, though, then great, this might just be the solution for you.

I had explored some other free software offerings when writing my book, The Accidental Taxonomist, six years ago and came across TemaTres and ThManager. At the time I did not find that they adequately enforced valid relationships between terms, so I was somewhat dismissive of the software. Recently I revisited these products.

TemaTres, which has its origins in the Library and the University of Buenos Aires, Argentina, still does allow creating duplicate terms, which was my initial cause for concern. However, the user interface of the latest version (2.1) now offers a configuration option for quality policies, which can allow or disallow duplicate terms. Thus, TemaTres is a suitable free thesaurus software product if used by a knowledgeable and experienced taxonomist who knows to set the options and understands the alerts. TemaTres is being supported, and its latest version was released just this winter, 2016. The software is web-based, which means that it requires PHP, MySQL, and an HTTP web server, so it may not be a configuration that every independent taxonomist would set up and install in a small/home office. Otherwise, TemaTres is worth looking into.

ThManager is from the University of Zaragoza and GeoSpatiumLab S.L., both in Zaragoza, Spain. ThManager supports the SKOS standard rather than ANSI/NISO Z39.19 or ISO 25964, which means it does not by default enforce all rules of the latter standards. But I have since found this to be a trend in new vocabulary management software: compliance with SKOS by default, with support for ANSI/NISO Z39.19 or ISO 25964 available as a configurable option. Thus, I no longer complain if a product does not support ANSI/NISO Z39.19 by default. The main problem with ThManager, though, is that it is not kept up to date. It was last significantly updated in 2006. Installation even on Windows 7 requires a “portable” version due to an installation bug.
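To make concrete what "enforcing rules" can mean in practice, here is a minimal Python sketch of two checks a thesaurus tool might run: flagging duplicate terms and flagging broader/narrower (BT/NT) relationships that are not reciprocal. It is only an illustration under stated assumptions. The dictionary layout, the function names, and the sample terms are all hypothetical, and the code does not reflect how TemaTres, ThManager, or any other product mentioned here is implemented.

```python
from collections import defaultdict

# Hypothetical in-memory thesaurus (an assumption for illustration):
# each term maps to its sets of broader (BT) and narrower (NT) terms.
thesaurus = {
    "Animals": {"BT": set(), "NT": {"Dogs", "Cats"}},
    "Dogs": {"BT": {"Animals"}, "NT": set()},
    "Cats": {"BT": set(), "NT": set()},  # reciprocal BT "Animals" is missing
    "dogs": {"BT": set(), "NT": set()},  # duplicates "Dogs" except for case
}


def find_duplicate_terms(terms):
    """Return groups of terms that differ only by case or surrounding spaces."""
    groups = defaultdict(list)
    for term in terms:
        groups[term.strip().casefold()].append(term)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def find_missing_reciprocals(thesaurus):
    """Return (term, relation, target) triples whose inverse relation is absent."""
    inverse = {"BT": "NT", "NT": "BT"}
    problems = []
    for term, relations in thesaurus.items():
        for relation, targets in relations.items():
            for target in targets:
                back_links = thesaurus.get(target, {}).get(inverse[relation], set())
                if term not in back_links:
                    problems.append((term, relation, target))
    return sorted(problems)


print(find_duplicate_terms(thesaurus))      # [['Dogs', 'dogs']]
print(find_missing_reciprocals(thesaurus))  # [('Animals', 'NT', 'Cats')]
```

A tool that applies such checks by default would block or warn on the "dogs" and "Cats" entries above, while a tool that treats them as optional quality policies leaves it to the taxonomist to switch them on and act on the alerts.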

More recently I discovered another free thesaurus management program, VocBench. It was developed originally for the management of the AGROVOC thesaurus of the Food and Agriculture Organization (FAO) of the United Nations as a joint project of FAO, which is based in Rome, Italy, and the Artificial Intelligence Research group at the University of Rome Tor Vergata. VocBench, like TemaTres, is SKOS-compliant, rather than ANSI/NISO Z39.19 compliant. VocBench is web-based, with web server requirements of Apache Tomcat, MySQL, and OWLIM installed on a Sesame2 server.

In addition to being free, these applications tend to have the advantage of running on multiple platforms while still being installable and usable by a single user. The editing features may be a little less standard and thus less intuitive, and documentation and support tend to be less extensive than for commercial software. Yet, they are worth considering for long-term experimentation (with no time limit as in commercial demo software), for use in nonprofit or low-budget projects, or by anyone with a strong interest in working with open source software.