Archive for the 'information architecture' Category

Dec 02 2012

Libraries, Media, and the Semantic Web meetup at the BBC

In a bit of a blog cleanup, I discovered this post languishing unpublished. The event took place earlier this year but the videos of the presentations are still well worth watching. It was an excellent session with short but highly informative talks by some of the smartest people currently working in the semantic web arena. The videos of the event are available on YouTube.

Historypin

Jon Voss of Historypin was a true “information altruist”, describing libraries as a “radical idea”. The concept that people should be able to get information for free at the point of access, paid for by general taxation, has huge political implications. (Many of our libraries were funded by Victorian philanthropists who realised that an educated workforce was a more productive workforce, something that appears to have been largely forgotten today.) Historypin is seeking to build a new library, based on personal collections of content and metadata – a “memory-sharing” project. Jon eloquently explained how the Semantic Web reflects the principles of the first librarians in that it seeks ways to encourage people to open up and share knowledge as widely as possible.

MIMAS

Adrian Stevenson of MIMAS described various projects including Archives Hub, an excellent project helping archives, and in particular small archives that don’t have much funding, to share content and catalogues.

rNews

Evan Sandhaus of the New York Times explained the IPTC’s rNews – a news markup standard that should help search engines and search analytics tools to index news content more effectively.

schema.org

Dan Brickley’s “compare and contrast” of Universal Decimal Classification with schema.org was wonderful, and he reminded technologists that it is very easy to forget that librarians and classification theorists were attempting to solve search problems far in advance of the invention of computers. He showed an example of “search log analysis” from 1912, queries sent to the Belgian international bibliographic service – an early “semantic question answering service”. The “search terms” were fascinating and not so very different to the sort of things you’d expect people to be asking today. He also gave an excellent overview of Lonclass, the BBC Archive’s largest classification scheme, which is based on UDC.

BBC Olympics online

Silver Oliver described how BBC Future Media is pioneering semantic technologies and using the Olympic Games to showcase this work on a huge and fast-paced scale. By using semantic techniques, dynamic rich websites can be built and kept up to the minute, even once results start to pour in.

World Service audio archives

Yves Raimond talked about a BBC Research & Development project to automatically index World Service audio archives. The World Service, having been a separate organisation to the core BBC, has not traditionally been part of the main BBC Archive, and most of its content has little or no useful metadata. Nevertheless, the content itself is highly valuable, so anything that can be done to preserve it and make it accessible is a benefit. The audio files were processed through speech-to-text software, and then automated indexing was applied to generate suggested tags. The accuracy rate is about 70%, so human help is needed to sort out the good tags from the bad (and occasionally offensive!) tags, but this is still a lot easier than tagging everything from scratch.

Sep 06 2012

The Shape of Knowledge - review of ISKO UK event

Published by Fran under KO, information architecture

On Tuesday I attended a very interesting event about information visualization and I have written a review for the ISKO UK blog.

I was particularly fascinated by the ideas suggested by Martin Dodge of mapping areas that are not “space” and what this means for the definition of a “map”. So, the idea of following the “path” of a device such as a phone through the electromagnetic spectrum brings a geographical metaphor into a non-tangible “world”. Conversely, is the software and code that devices such as robots use to navigate the world a new form of “map”? Previously, I have thought of code as “instructions” and “graphs” but have always thought of the “graph” as a representation of coded instructions, visualized for the benefit of humans, rather than the machines. However, now that machines are responding more directly to visual cues, perhaps the gap between their “maps” and our “maps” is vanishing.

Aug 11 2012

SLA Conference in Chicago

Last month I had a wonderful time at the SLA (Special Libraries Association) conference in Chicago. I had never previously been to an SLA conference, even though there is a lively SLA Europe division. SLA is very keen to be seen as “not just for librarians” and the conference certainly spanned a vast range of information professions. The Taxonomy Division is thriving and there seem to be far more American than British taxonomists, which, although not surprising, was a pleasure as I don’t often find myself as one of a crowd! The conference has a plethora of receptions and social events, including the “legendary” IT division dance party.

There were well over 100 presentation sessions, as well as divisional meetings, panel discussions, and networking events that ranged from business breakfasts to tours of Chicago’s architectural sights. There was plenty of scope to avoid or embrace the wide range of issues and areas under discussion and I focused on taxonomies, Linked Data, image metadata, and then took a diversion into business research and propaganda.

I also thoroughly enjoyed the vendor demonstrations, especially the editorially curated and spam-free search engine Blekko, the legal information vendors FastCase and Law360, and EOS library management systems.

My next posts will cover a few of the sessions I attended in more detail. Here’s the first:

Adding Value to Content through Linked Data

Joseph Busch of Taxonomy Strategies offered an overview of the world of Linked Data. The majority of Linked Data available in the “Linked Data Cloud” is US government data, with Life Sciences data in second place, which reflects the communities that are willing and able to make their data freely and publicly available. It is important to keep in mind the distinction between concept schemes - Dublin Core, FOAF, SKOS, which provide structures but no meanings - and semantic schemes - taxonomies, controlled vocabularies, ontologies, which provide meanings. Meanings are created through context and relationships, and many people assume that equivalence is simple and association is complex. However, establishing whether something is the “same” as something else is often far more difficult than simply asserting that two things are related to each other.

Many people also fail to use the full potential of their knowledge organization work. Vocabularies are tools that can be used to help solve problems by breaking down complex issues into key components, giving people ways of discussing ideas, and challenging perceptions.

The presentation by Joel Richard, web developer at the Smithsonian Libraries, focused on their botanic semantic project – digitizing and indexing Taxonomic Literature II. (I assume they have discussed taxonomies of taxonomy at some point!) This is a fifteen-volume guide to the literature of systematic botany published between 1753 and 1940. The International Association for Plant Taxonomy (IAPT) granted permission to the Smithsonian to release the work on the web under an open licence.

The books were scanned using OCR, which produced 99.97% accuracy. That sounds impressive, but it actually means 5,000-12,000 errors – far too many for serious researchers. Errors in general text were less of a concern than errors in citations and other structured information, where, for example, mistaking an 8 for a 3 could be very misleading. After some cleanup work, the team next identified terms such as names and dates that could be parsed and tagged, and selected sets of pre-existing identifiers and vocabularies. They are continuing to look for ontologies that may be suitable for their data set. Other issues to think about are software and storage. They are using Drupal rather than a triplestore, but are concerned about scalability, so are trying to avoid creating billions of triples to manage.
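
A rough back-of-the-envelope calculation shows why a 99.97% accuracy rate still leaves thousands of errors. The sketch below assumes the accuracy figure is measured per character, which the talk did not specify, and the character counts are hypothetical values chosen only to land in the quoted range.

```python
def expected_errors(total_chars: int, accuracy: float) -> int:
    """Estimate misread characters for a given per-character OCR accuracy."""
    return round(total_chars * (1 - accuracy))


# Hypothetical text sizes (in characters); not figures from the project.
for total_chars in (20_000_000, 40_000_000):
    errors = expected_errors(total_chars, 0.9997)
    print(f"{total_chars:,} characters -> about {errors:,} errors")
```

Even a tiny error rate matters at this scale, and an error inside a date or page number in a citation does far more damage than one in running prose.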

Joel also outlined some of the benefits of using Linked Data, gave some examples of successful projects, and provided links to further resources.

Jun 19 2012

Building bridges: Linking diverse classification schemes as part of a technology change project

My paper about my work on the linking and migration of legacy classification schemes, taxonomies, and controlled vocabularies has been published in the journal Business Information Review.

Jun 06 2012

Building, visualising and deploying taxonomies and ontologies; the reality - Content Intelligence Forum event

I have been trying to get to the Content Intelligence Forum meetups for some time as they always seem to offer excellent speakers on key topics that don’t tend to get the attention they deserve, so I was delighted to be able to attend Stephen D’Arcy’s talk a little while ago on taxonomies and ontologies.

Stephen has many years of experience designing semantic information systems for large organisations, ranging from health care providers, to banks, to media companies. His career illustrates the transferability and wide demand for information skills.

His 8-point checklist for a taxonomy project was extremely helpful – Define, Audit, Tools, Plan, Build, Deploy, Governance, Documentation – as were his tips for managing stakeholders, IT departments in particular. He warned against the pitfalls of not including taxonomy management early enough in search systems design, and the problems that you can be left with if you do not have a flexible and dynamic way of managing your taxonomy and ontology structures. He also included a lot of examples that illustrated the fun aspects of ontologies when used to create interesting pathways through entertainment content in particular.

The conversation after the talk was very engaging and I enjoyed finding out about common problems that information professionals face, including how best to define terms, how to encourage clear thinking, and how to communicate good research techniques.

Apr 23 2012

To embed or not to embed – metadata and IDs

One of the problems with the word metadata (apart from the fact that no-one can decide whether it should be singular or plural - as a former classicist I am quite happy to use it in the Anglicized singular form!) is that the word covers such a wide range of data required for a huge variety of uses.

At a recent presentation I gave as part of a “knowledge share” session at the digital design agency Tobias and Tobias, I was rightly challenged by Patrick from Golant Media Ventures, when I said that you should not embed metadata in your content, but manage it separately. He pointed out that for copyright and rights management purposes embedded metadata is extremely useful and in fact many content creators are actively campaigning to make sure that software and service providers do not strip metadata out of content when it is transferred or transcoded.

Embedding information versus embedding IDs

He is quite right, but I was right too – just in a different sense. It is a complex and important point, so I thought it was worth expanding on. I was talking about not embedding metadata structures in assets when you can manage structures of primarily semantic metadata separately. You can do this by embedding only IDs in the assets, and then using those as lookups to access the structure as and when you need to, picking up the structure “on the fly”. The principle remains the same whether you are talking about “private” localised IDs or “public” IDs, such as Linked Open Data dereferenceable URIs (i.e. website addresses you can look up). Such an approach allows you to manage the structures and meanings contextualising those IDs separately from managing the assets themselves.

The reason is mainly technical. If you wish to add to or edit the structure of your taxonomy (or ontology) or change the information your URI points to, it is far easier to do this in one place than it is to find all the assets containing that metadata and re-index them all individually every time you make a change. So, if you store taxonomy pathways as hard-coded text strings in a piece of content, but then you decide to alter the hierarchy, you have to go back to each and every occurrence of that text string applied to content and update it, in each and every asset record that contains it. Sometimes this might be fine – if you know that you are hardly ever going to change the structure or if you have very few assets, or if you have a very powerful and sophisticated re-indexing service. Generally, however, given that language is constantly evolving and asset collections are constantly growing and changing, the “hard-coding” approach is going to require an awful lot of processing and so will be very resource hungry.

If, on the other hand, all you embed in your asset record is an ID, you can use an external system to provide the context for that ID – the pathways of the taxonomy, the relationships of an ontology, the semantic sense of a URI. You can then alter your taxonomy’s hierarchies (e.g. adding and moving concept nodes) or develop your ontology (e.g. adding new classes and relationships) in one centralised system without having to go back to every individual indexed asset in turn. This also means that you can de-couple your taxonomy or ontology management system from your digital asset management or content management system. This is important if you want more sophisticated metadata management than standard DAM, search, or CMS software provides, or if you want to future proof your semantic structures.
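
The difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the concept IDs, labels, and the `taxonomy` and `assets` dictionaries are all hypothetical stand-ins for a real taxonomy management system and a real DAM or CMS.

```python
# Hypothetical central taxonomy store: concept ID -> label and parent ID.
taxonomy = {
    "c1": {"label": "Music", "parent": None},
    "c2": {"label": "Classical", "parent": "c1"},
    "c3": {"label": "Opera", "parent": "c2"},
}

# Asset records embed only concept IDs, never the pathway text itself.
assets = {
    "asset-001": {"title": "Recital recording", "concept_ids": ["c3"]},
}


def pathway(concept_id, taxonomy):
    """Walk the parent links to build the current pathway for a concept."""
    labels = []
    current = concept_id
    while current is not None:
        node = taxonomy[current]
        labels.append(node["label"])
        current = node["parent"]
    return " > ".join(reversed(labels))


def describe(asset_id):
    """Resolve an asset's embedded IDs against the taxonomy on the fly."""
    return [pathway(cid, taxonomy) for cid in assets[asset_id]["concept_ids"]]


before = describe("asset-001")

# Restructure the hierarchy in one place: add a node and move "Opera" under it.
taxonomy["c4"] = {"label": "Vocal music", "parent": "c1"}
taxonomy["c3"]["parent"] = "c4"

after = describe("asset-001")
print(before)  # ['Music > Classical > Opera']
print(after)   # ['Music > Vocal music > Opera']
```

The asset record is never touched, yet it picks up the new pathway the next time it is looked up. Had the string "Music > Classical > Opera" been stored in the asset, every record carrying it would have needed re-indexing.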

Modular systems are more future proof

By keeping asset management and metadata management separate you can upgrade either part without having to upgrade the other. As semantic technologies – such as ontology editing systems – are going through a rapid phase of development, and in general evolving faster than search, DAM, and other consuming systems, maintaining your semantic structures in as transferable and system-agnostic a form as possible shows foresight. Conversely, you may want to invest only a little in a DAM system, with the hope that business will grow and you will be able to upgrade as your content collections increase. If you have a separate metadata management system you should be able to keep that, while changing your DAM system.

Rights management is different

However, all this primarily concerns internal content and metadata management. Embedding metadata in the asset itself makes most sense when you want that metadata to remain fixed to the asset and be published with it - for example, details of where a photo was taken, who owns the copyright, and how to get in touch with them to license re-use of that photo. Making that information hard to strip out means that when your asset wanders out into the public world of the Internet and frequent uncontrollable copying, users can still easily find out the origins of the image and its ownership.

A huge problem for collection of royalties and licensing payments is that people who would be willing to pay simply don’t know who to pay. Deliberate piracy will always be a cost – just as shops will always have to allow for a certain amount of “shrinkage” due to shoplifting, but physical shops tend to be pretty good at making sure customers who are willing to pay can find plenty of checkout tills, self-service checkouts, or sales assistants. Keeping rights information embedded in assets is the equivalent of the checkout, not the security camera.

How important is being up to date?

Of course, the problem of updating remains – so if copyrights are transferred, all those assets that have gone out with old embedded metadata contain out of date information. So, rights managers are increasingly moving towards a system of embedding dereferenceable IDs as well. One example is the EIDR system that uses this method (as well as other techniques) to manage rights. By embedding an ID that links to a centralised rights registry, information can be updated once within that central registry, and then whenever someone looks up that ID, they get the most up to date details.
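
A minimal sketch of the idea follows. It does not reproduce EIDR's actual identifiers or interfaces; the registry, the ID format, and the names and contact details are all hypothetical, and a real lookup would resolve the ID over the network rather than in a local dictionary.

```python
import copy

# Hypothetical central rights registry: rights ID -> current rights details.
registry = {
    "rights:0001": {"owner": "A. Photographer", "contact": "licensing@example.org"},
}

# The published asset carries only the ID, not the rights details themselves.
published_asset = {"filename": "harbour.jpg", "rights_id": "rights:0001"}

# Once published, copies circulate beyond the owner's control.
copy_in_the_wild = copy.deepcopy(published_asset)


def lookup_rights(asset, registry):
    """Resolve the embedded ID against the registry (a plain dict lookup here)."""
    return registry[asset["rights_id"]]


# The copyright is transferred: one update, made once, in the registry.
registry["rights:0001"] = {"owner": "New Agency Ltd", "contact": "rights@example.com"}

current = lookup_rights(copy_in_the_wild, registry)
print(current["owner"])  # New Agency Ltd
```

The copy made before the transfer still resolves to the current owner, because the only thing it carries is the ID.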

So, we are both right in a way. Embedding IDs and managing metadata separately to managing assets has many advantages. Embedding the metadata itself can also be useful, especially if it is rights information of assets that will be released onto the public Internet and is information that you may not need to update, but that you do not want to be lost when the asset is copied.

Nov 13 2011

Holodecks, marketing, and crime scenes - the DAM link between different worlds

In the last two weeks I have attended three very different conferences, with DAM as the common thread. The first was Media Pro Expo, where I spoke on a panel with the DAM Foundation, alongside Mark Davey, Madi Solomon, and David Lipsey. The second was Createasphere’s first European DAM conference, and the third (co-located with the Createasphere event) was the SPAR Europe Conference on 3D Imaging and Data Management for Engineering, Construction, Manufacturing, and Security.

The contrast between Media Pro and SPAR, and their respective audiences, was striking, but so were the similarities of the problems they faced, such as the common need to manage rich media assets and huge volumes of data. Media Pro was aimed at marketing companies, and had lots of amusing exhibits showcasing ways of using technology to create engaging and entertaining campaigns. (I enjoyed playing with an interactive magazine cover linked to a camera that allowed you to put your picture “on the cover” and select your favourite headlines.) Marketing companies are concerned with keeping, curating and mining data not just about customers’ contact details, but also their likes, social connections, and shopping habits in order to create personalised campaigns, so they have become great consumers of metadata.

3D Imaging and Data Management

SPAR was all about scanning and mapping, not in the sense that I am familiar with, but literally surveying the Earth and making maps. There were companies that use lasers to create roadmaps, others that carry out aerial surveys, and some that create 3-D representations of buildings. There are systems for surveying and modelling building sites to make sure that construction avoids sewers, pipes, and underground cables, and even a system for creating 3-D photosets of crime scenes to help the police in investigation and evidence gathering.

Createasphere

At Createasphere I talked about managing metadata in complex information environments and how we need to treat metadata as content in its own right. There was a range of excellent and diverse presentations, covering topics from the potential of immersive virtual worlds and the huge volumes of data they produce, to descriptions of technical metadata exchange projects.

I began to think about the crossover point between the creativity and imagination of the media and marketing companies and the power and accuracy of the surveying companies and how this is going to bring about hugely powerful fantasy “Holodeck” worlds that will make Second Life and the Sims look quainter than the Mickey Mouse cartoons of the 1930s.

Better than the real world

One challenge for information professionals is to think about how we can create navigation and search systems that do more than just replicate the real-world paradigms we are used to at the moment (I am thinking of things like road signs and timetables), and that instead harness the best of semantic techniques and data mining processes to create reactive intuitive worlds that work better than the real one. Ed Lantz of Vortex Immersion Media spoke of “intelligent spaces” that automatically access our data, our assets, information about us, and arrange themselves to suit us. How do we prepare for a world in which the likes of Apple’s speech recognition system Siri aren’t genies in bottles, but are the environment around us? We used to worry about ghosts in the machine, but will we end up as the ghosts inside the machine? We worry about putting our assets out there into the cloud, but perhaps we should be thinking more about what it will be like when we step inside the cloud or bring the cloud into our homes?

There was a post circulating on Twitter recently describing the library of the future as a hellish place where characters from books come alive and stalk the readers in the rooms. It was somewhat derided as a childish joke, but if we create Holodecks and then try to live in them, it could well come true. Its implicit warning, that we could inadvertently trap ourselves in a place where privacy, rights, control, and manipulation are so hidden from view that we lose our sense of self, seems to me very mature and insightful. Another post I read was about how interface designers are currently working on “pictures under glass” and need to start to use the full tactile, haptic, and 360 degree expressivity of our physical bodies, as we are beginning to do with technologies like the Wii and Kinect.

Making work fun

Theresa Regli of the Real Story Group pointed out that the world we are in now is one in which people still don’t grasp the importance of labelling their images, so immersive virtual worlds seem a long way off, but she also talked of the need for corporate interfaces to embrace “gamification”, as employees are far more productive when their jobs are fun. It may take some time, but I like the idea of a Holodeck meeting room where people make presentations and collaborate on plans by dancing around, rather than sitting staidly at a table. Rather than the hellish library where AI brings fictional monsters to life, it might turn out to be a lot of fun and all that movement may even be good for our health!

Feb 07 2011

Serendipity and large video collections

I enjoyed this blog post: On Serendipity. Ironically, it was recommended to me, and I am now recommending it!

Serendipity is rarely of use to the asset manager, who wants to find exactly what they expect to find, but is a delight for the consumer or leisure searcher. People sometimes cite serendipity as being a reason to abandon classification, but in my experience classification often enhances serendipity, which can be lost in simple online search systems.

For example, when browsing an alphabetically ordered collection in print, such as an encyclopedia or dictionary, you just can’t help noticing the entries that sit next to the one you were looking for. This can lead you to all sorts of interesting connections - for example, looking up crescendo, I couldn’t help noticing that crepuscular means relating to twilight, and that there is a connection between crepe paper and the crepes you can eat (from the French for “wrinkly”), but crepinette has a different derivation (from the French for “caul”). What was really interesting was the fact that there was no connection, other than an accident of alphabetical order. I wasn’t interested in things crepuscular, or crepes and crepinettes, and I can’t imagine anyone deliberately modelling connections between all these things as “related concepts”.

Wikipedia’s “random article” function is an attempt to generate serendipity algorithmically. On other sites the “what people are reading/borrowing/watching now” functions use chronological order to throw out unsought items from a collection in the hope that they will be interesting. Twitter’s “trending topics” use a combination of chronological order and statistics on the assumption that what is popular just now is intrinsically interesting. These techniques look for “interestingness” out of what can be calculated and it is easy to see how they work, but the semantic web enthusiasts aim to open up to automated processing the kind of free associative links that human brains are so good at generating.

Jan 09 2011

Online Information Conference – day two

Linked Data in Libraries

I stayed in the Linked Data track for Day 2 of the Online Information Conference, very much enjoying Karen Coyle’s presentation on metadata standards - FRBR, FRSAR, FRAD, RDA - and Sarah Bartlett’s enthusiasm for using Linked Data to throw open bibliographic data to the world so that fascinating connections can be made. She explained that while the physical sciences have been well mapped and a number of ontologies are available, far less work has been done in the humanities. She encouraged humanities researchers to extend RDF and develop it.

In the world of literature, the potential connections are infinite and very little numerical analysis has been done by academics. For example, “intertextuality” is a key topic in literary criticism, and Linked Data that exposes the references one author makes to another can be analysed to show the patterns of influence a particular author had on others. (Google ngrams is a step in this direction, part index, part concordance.)

She stressed that libraries and librarians have a duty of care to understand, curate, and manage ontologies as part of their professional role.

Karen and Sarah’s eagerness to make the world a better place by making sure that the thoughtfully curated and well-managed bibliographic data held by libraries is made available to all was especially poignant at a time when library services in the UK are being savaged.

The Swedish Union Catalogue is another library project that has benefited from a Linked Data approach. With a concern to give users more access to and pathways into the collections, Martin Malmsten asked if APIs are enough. He stressed the popularity of just chucking the data out there in a quick and dirty form and making it as simple as possible for people to interact with it. However, he pointed out that licences need to be changed and updated, as copyright law designed for a print world is not always applicable for online content.

Martin pointed out that in a commercialised world, giving anything away seems crazy, but that allowing others to link to your data does not destroy your data. If provenance (parametadata) is kept and curated, you can distinguish between the metadata you assert about content and anything that anybody else asserts.

During the panel discussion, provenance and traceability (parametadata), which the W3C is now focusing on, were discussed, and it was noted that allowing other people to link to your data does not destroy your data, and often makes it more valuable. The question of what the “killer app” for the semantic web might be was raised, as was the question of how we might create user interfaces that allow the kinds of multiple pathway browsing that can render multiple relationships and connections comprehensible to people. This could be something a bit like topic maps - but we probably need a 13-year-old who takes all this data for granted to have a clear vision of its potential!

Tackling Linked Data Challenges

The second session of day two was missing Georgi Kobilarov of Uberblic who was caught up in the bad weather. However, the remaining speakers filled the time admirably.

Paul Nelson of Search Technologies pointed out that Google is not “free” to companies, as they pay billions in search engine optimisation (SEO) to help Google. Google is essentially providing a marketing service, and companies are paying huge amounts trying to present their data in the way that suits Google. It is therefore worth bearing in mind that Google’s algorithms are not resulting in a neutral view of available information resources, but are providing a highly commercial view of the web.

John Sheridan described using Linked Data at the National Archives to open up documentation that previously had very little easily searchable metadata. Much of the documentation in the National Archives is structured – forms, lists, directories, etc. – which presents particular problems for free text searches, but makes it a prime source for mashing up and querying.

Taxonomies, Metadata, and Semantics: Frameworks and Approaches

There were some sensible presentations on how to use taxonomies and ontologies to improve search results in the third session.

Tom Reamy of KAPS noted the end of the “religious fervour” about folksonomy that flourished a few years ago, now that people have realised that there is no way for folksonomies to get better and they offer little help to infrequent users of a system. They are still useful as a way of getting insights into the kind of search terms that people use, and can be easier to analyse than search logs. A hybrid approach, using a lightweight faceted taxonomy over the top of folksonomic tags, is proving more useful.
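
One way to picture the hybrid approach is as a thin mapping layer over free tags. The sketch below is an assumption about how such a layer might look, not something shown in the talk; the facets, preferred terms, and tag variants are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lightweight faceted taxonomy:
# facet -> preferred term -> known folksonomic variants.
FACETS = {
    "place": {"London": {"london", "ldn"}},
    "topic": {"Semantic Web": {"semweb", "semantic web", "semantic-web"}},
}


def organise(tags):
    """Sort free tags into facets; return unmatched tags for human review."""
    variant_index = {
        variant: (facet, term)
        for facet, terms in FACETS.items()
        for term, variants in terms.items()
        for variant in variants
    }
    faceted = defaultdict(set)
    unmapped = []
    for tag in tags:
        key = tag.strip().lower()
        if key in variant_index:
            facet, term = variant_index[key]
            faceted[facet].add(term)
        else:
            unmapped.append(tag)
    return dict(faceted), unmapped


faceted, unmapped = organise(["LDN", "semweb", "kittens"])
print(faceted)
print(unmapped)
```

The unmapped tags are the useful by-product: they show the taxonomy team which terms people actually use and which may deserve a place in the controlled structure.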

Taxonomies remain key in providing the structure on which autocategorisation and text analytics are based, so having a central taxonomy team that engages in regular and active dialogue with users is vital. Understanding the “basic concepts” (i.e. Lakoff and Rosch’s “basic categories”) that are the most familiar terms to the community of users is essential for constructing a helpful taxonomy. Labels should be as short and simple as possible, and should be chosen for their distinctiveness and expressiveness.

He also pointed out that adults and children have different learning strategies, which is worth remembering. I was also pleased to hear his clear and emphatic distinction between leisure and workplace search needs. It’s a personal bugbear of mine that people don’t realise that looking for a hairdresser in central London – where any one of a number will do – is not the same as trying to find a specific shot of a particular celebrity shortly after that controversial haircut a couple of years ago from the interview they gave about it on a chat show.

Tom highlighted four key functions for taxonomies:

  • knowledge organisation systems (for asset management)
  • labelling systems (for asset management)
  • navigation systems (for retrieval and discovery)
  • search systems (for retrieval)

He pointed out that text analytics needs taxonomy to underpin it, to base contextualisation rules on. He also stressed the importance of data quality, as data quality problems cause the majority of search project failures. People often focus on cool new features and fail to pay attention to the underlying data structures they need to put in place for effective searching.

He noted that the volumes of data and metadata that need to be processed are growing at a furious rate. He highlighted Comcast as a company that is very highly advanced in the search and data management arena, managing multiple streams of data that are constantly being updated, for an audience that expects instant and accurate information.

He stated that structure will remain the key to findability for the foreseeable future. Autonomy is often hailed as doing something different to other search engines because it uses statistical methods, but at heart it still relies on structure in the data.

Richard Padley made it through the snow despite a four-hour train journey from Brighton, and spoke at length about the importance of knowledge organisation to support search. He explained the differences between controlled vocabularies, indexes, taxonomies, and ontologies and how each performs a different function.

Marianne Lykke then talked about information architecture and persuasive design. She also referred to “basic categories” as well as the need to guide people to where you want them to go via simple and clear steps.

Taxonomies, Metadata, and Semantics in Action

I spoke in the final session of the day, on metadata life cycles, asset lifecycles, parametadata, and managing data flows in complex information “ecosystems” with different “pace layers”.

Neil Blue from Biowisdom gave a fascinating and detailed overview of Biowisdom’s use of semantic technologies, in particular ontology-driven concept extraction. Biowisdom handle huge complex databases of information to do with the biological sciences and pharmaceuticals, so face very domain-specific issues, such as how to bridge the gap between “hard” scientific descriptions and “soft” descriptions of symptoms and side-effects typically given by patients.

In the final presentation of the day, Alessandro Pica outlined the use of semantic technologies by Italian news agency AGI.


May 02 2010

The power of parametadata

First we had content, then not long after that we had metadata, although no-one called it that. Now we need parametadata – the metadata about metadata!

Neither metadata nor parametadata is anything new, but what is new is how central they have become to all sorts of business processes. People think there is something modern and techie about metadata, but ever since the first author signed their initials on a piece of work, or added a title, we have had metadata. Librarians are just one group who have been using metadata for centuries.

Thanks to technological advances, there is now a huge amount of processing that can be done with metadata, indeed that needs to be done if we are to have any idea what assets we have available. Metadata has become the active driver of numerous business processes. You couldn’t operate a computer without the metadata that tells you the name of a file, its location, when it was last saved, etc. and this sort of metadata is so ubiquitous that nobody tends to think about it too much. Now that metadata is so pervasive, it is becoming increasingly important to talk about it and define different aspects and types.

One key distinction is the one between objective and subjective metadata. Subjective metadata refers to classification, tagging, taxonomies, etc. This metadata is subjective because it is always possible to argue about it. Objective metadata on the other hand is uncontroversial and typically process-driven – a file format is what it is, the time the file was last saved might cause consternation after a PC crash, but is unarguable. However, there is actually surprisingly little uncontroversial metadata. Even something like a title can be edited and changed – what do you do when some content acquires a popular or folk title that is not the same as its official title? This happens a lot with comedy sketches and songs, but can also happen to names of projects, working groups, etc.

Parametadata (or meta-metadata) is another subset of metadata – it is the metadata about the metadata, giving its provenance, date of creation, technical specifications, etc. Once you start to think about metadata as content in its own right, it becomes obvious that just as you wish to track the author, title, and so on of the core content, so too you need to track the author(s), provenance, date of creation and latest update of the metadata as well. For subjective metadata, parametadata becomes hugely useful. Because you can have multiple classifications of an asset, it is very important to track the source – distinguishing between author-added keywords, indexer keywords, and folksonomic tags, for example – so that people can tell where a tag has come from.
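To make the idea concrete, here is a minimal sketch in Python of a tag that carries its own parametadata. The class names, field names, and sample values are all invented for illustration and do not come from any particular system; the three sources simply mirror the author keywords, indexer keywords, and folksonomic tags mentioned above.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class TagSource(Enum):
    """Hypothetical provenance categories for a tag."""
    AUTHOR = "author keyword"
    INDEXER = "indexer keyword"
    FOLKSONOMY = "folksonomic tag"


@dataclass(frozen=True)
class Tag:
    label: str          # the metadata itself
    source: TagSource   # parametadata: where the tag came from
    created_by: str     # parametadata: who added it
    created_on: date    # parametadata: when it was added


def tags_from(tags, source):
    """Return only the tags that came from the given source."""
    return [t for t in tags if t.source == source]


# Invented sample data: one asset described by three different sources.
tags = [
    Tag("official title", TagSource.INDEXER, "archive indexer", date(2009, 1, 15)),
    Tag("folk title", TagSource.FOLKSONOMY, "anonymous viewer", date(2010, 4, 30)),
    Tag("comedy", TagSource.AUTHOR, "author", date(2009, 1, 10)),
]

indexer_labels = [t.label for t in tags_from(tags, TagSource.INDEXER)]
folk_labels = [t.label for t in tags_from(tags, TagSource.FOLKSONOMY)]
```

Because every tag records its source, the official and folk titles can sit side by side on the same asset without being confused with one another.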

As long as you know where tags have come from, you can decide whether or not you want to trust in their authority. In an increasingly muddled web, it is helpful to be told the source of a comment or an opinion in order to try to distinguish sound information from propaganda or uninformed speculation. Anecdotally, many people who were initially excited about citizen review sites – rating hotels, etc. – have now given up on them on the grounds that the people who contribute to them tend to have some kind of axe – or worse – to grind, so you can’t take them seriously. Even reviews that aim to be fair may not be relevant if the reviewer is too dissimilar to the reader. The perfect holiday for a group of teenagers is unlikely to be what a retired couple are looking for. So any review needs to carry sufficient information so that the reader can work out how relevant the content is to them. A good review site would carry a range of reviews aimed at different audiences.

Similarly, a rich navigation system needs to offer a range of tags and taxonomies, but these will only be useful when there is sufficient parametadata to tell the user where each scheme or tag came from, who created it, how up to date it is, etc. From a user perspective, being able to choose from a range of well-documented navigation systems means they can make an informed choice about whether to have fun with the randomness of folksonomic tags, to follow a specialist taxonomy in order to learn how a subject is handled by experts, or to use a guide constructed by the content curators for a general audience.

Interface designers can use the parametadata to make different sources of metadata distinct – with different visual or other cues, for example, to indicate different navigation environments. This means you can create a range of different “navigation worlds” and let your users wander to and fro while always making sure they know where – in terms of trust and authority – they are.
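As a rough sketch of how an interface might do this, the Python below turns a tag's source into a CSS class so that each source can be styled differently. The source names, the class-naming scheme, and the `render_tag` function are assumptions made up for this example, not part of any real design system.

```python
import html

# Hypothetical set of metadata sources the interface knows how to style.
KNOWN_SOURCES = {"curator", "specialist", "folksonomy"}


def render_tag(label, source):
    """Render a tag as an HTML span whose class reflects where it came from."""
    # Fall back to a neutral class if the provenance is not recognised.
    css_source = source if source in KNOWN_SOURCES else "unknown"
    return f'<span class="tag tag-{css_source}">{html.escape(label)}</span>'


curated = render_tag("haircuts", "curator")
folk = render_tag("that haircut", "folksonomy")
untracked = render_tag("misc", "imported")
```

A stylesheet could then give each class its own colour or border, so users always have a visual cue about which “navigation world” a tag belongs to.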

