UK Archives Discovery Forum

6th March, 2011 Fran

I very much enjoyed the UKAD UK Archives Discovery Forum event at the National Archives. There were three tracks as well as plenary sessions, so I couldn’t attend everything.

Linked Data and archives

After an introduction from Oliver Morley, John Sheridan opened by talking about the National Archives and Linked Data. Although his talk was not as detailed as the one he gave at the Online Information Conference last December, he still gave the rallying call for opening up data and spoke of a “new breed” of IT professionals who put the emphasis on the I rather than the T. He spoke about Henry Maudslay, who invented the screw-cutting lathe, which enabled standardisation of nuts and bolts. This basically enabled the industrial revolution to happen. Previously, all nuts and bolts were made individually as matching pairs, but because the process was manual, each pair was unique and not interchangeable. If you lost the bolt, you needed a new pair. This created huge amounts of management and cataloguing of individual pairs, especially if a machine had to be taken apart and re-assembled, and meant interoperability of machinery was almost impossible. Sheridan asserted that we are at that stage with data – all our data ought to fit together, but at the moment all the nuts and bolts have to be hand-crafted. Linked Data is a way of standardising so that we can make our data interchangeable with other people’s. (I like the analogy because it makes clear the importance of interoperability, but obviously getting the nuts and bolts to fit is only a very small part of what makes a successful machine, let alone a whole factory or production line. Similarly, Linked Data isn’t going to solve broad publishing or creative and design problems, but it makes those big problems easier to work on collaboratively.)

Richard Wallis from Talis spoke about Linked Data. He likes to joke that you haven’t been to a Linked Data presentation unless you’ve seen the Linked Open Data cloud diagram. My version is that you haven’t been to a Linked Data event unless at least one of the presenters was from Talis! He is always an engaging speaker, and his descriptions of compartmentalisation of content and of the distinctions between Linked Data, Open Data, and Linked Open Data were very helpful. He likes to predict evangelically that the effects of linking data will be more profound for the way we do business than the changes brought about by the web itself. When I chatted to him over tea, he said his impression was that a year ago people were curious about Linked Data and just wanted to find out what it could do, but that this year they are feeling a bit more comfortable with the concepts and are starting to ask how they can put them into practice. There certainly seemed to be a lot of enthusiasm in the archive sector, which is generally cash-strapped but highly co-operative, with a lot of people passionate about their collections and their data and eager to reach as wide an audience as possible.

A Vision of Britain

Humphrey Southall introduced us to A Vision of Britain, which is a well-curated online gazetteer of Britain, with neat functions for providing alternative spellings of placenames, and ways of tackling the problems of boundaries, especially of administrative divisions, that move over time. I’m fascinated by maps, and they have built in some interesting historical map functionality too.

JISC and federated history archives

David Flanders from JISC talked about how JISC and its Resource Discovery Task Force can provide help and support to educational collections, especially in federation and Linked Data projects. He called on archives managers to use hard times to skill up, so that when more money becomes available staff are full of knowledge, skills, and ideas and ready to act. He also pointed out how much can be done in the Linked Data arena with very little investment in technology.

I really enjoyed Mike Pidd’s talk about the JISC-funded Connected Histories Project. They have adopted a very pragmatic approach to bringing together various archives and superimposing a federated search system based on metadata rationalisation. Although all they are attempting in terms of search and browse functionality is a simple set of concept extractions to pick out people, places, and dates, they are having numerous quality control issues even with those. However, getting all the data into a single format is a good start. I was impressed that one of their data sets took 27 days to process and that they still take delivery of data on drives through the post. They found this was much easier to manage than FTP or other electronic transfer, just because of the terabyte volumes involved (something that many people tend to forget when scaling up from little pilot projects to bulk processes). Mike cautioned against using RDF and MySQL as processing formats. They found that MySQL couldn’t handle the volumes and that RDF was too “verbose”. They chose a fully Lucene-based solution, which enabled them to bolt in new indexes rather than reprocess whole data sets when they wanted to make changes. They can still publish out to RDF.

Historypin

Nick Stanhope enchanted the audience with Historypin, an offering from wearewhatwedo.org. Historypin allows people to upload old photos, and soon also audio and video, and set them in Google Street View. Although Flickr has some similar functions, Historypin has volunteers who help to place the image in exactly the right place, and Google have been offering support and are working on image recognition techniques to help place photos precisely. This allows rich historical street views to be built up. What impressed me most, however, was that Nick made the distinction between subjective and objective metadata, defining objective metadata as metadata that can be corrected and subjective metadata as metadata that can’t. So he sees objective metadata as the time and the place that a photo was taken – if it is wrong, someone might know better and be able to correct it. Subjective metadata is the stories, comments, and opinions that people have about the content, which others cannot correct – if you upload a story or a memory, no-one else can tell you that it is wrong. We could split hairs over this definition, but the point is apposite when it comes to provenance tracking. He also made the astute observation that people very often note the location that a photo is “of”, but it is far more unusual for them to note where it was taken “from”. However, where it was taken from is often of more use for augmented reality and other applications that try to create virtual models or images of the world. Speaking to him afterwards, I asked about parametadata, provenance tracking, etc., and he said these are important issues they are striving to work through.
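To make Nick’s distinction concrete, here is a minimal Python sketch of how a photo record might treat the two kinds of metadata differently. It is not Historypin’s actual data model: the class name, the field names (including separate “of” and “from” locations), and the sample values are all my own assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical field names, not Historypin's real data model.
# "Objective" fields are the ones another person may correct.
OBJECTIVE_FIELDS = {"year_taken", "location_of", "location_from"}


@dataclass
class PinnedPhoto:
    title: str
    year_taken: Optional[int] = None
    # Where the subject of the photo is (latitude, longitude).
    location_of: Optional[Tuple[float, float]] = None
    # Where the camera was standing (latitude, longitude).
    location_from: Optional[Tuple[float, float]] = None
    # Subjective metadata: (author, text) pairs that others cannot overwrite.
    stories: List[Tuple[str, str]] = field(default_factory=list)
    # Audit trail: (field, old value, new value, corrected by).
    corrections: List[Tuple[str, object, object, str]] = field(default_factory=list)

    def correct(self, field_name: str, new_value, corrected_by: str) -> None:
        """Correct an objective field and record who changed it."""
        if field_name not in OBJECTIVE_FIELDS:
            raise ValueError(f"{field_name!r} is not a correctable field")
        old_value = getattr(self, field_name)
        setattr(self, field_name, new_value)
        self.corrections.append((field_name, old_value, new_value, corrected_by))

    def add_story(self, author: str, text: str) -> None:
        """Stories are only ever added, never corrected by someone else."""
        self.stories.append((author, text))


# Made-up sample data.
photo = PinnedPhoto(title="High street, looking north", year_taken=1935)
photo.correct("year_taken", 1937, corrected_by="local historian")
photo.add_story("uploader", "My grandmother worked in the shop on the left.")
```

Keeping the list of corrections alongside the record is one simple way to start on the provenance tracking mentioned above, since every change to an objective field carries the name of whoever made it.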

Women’s history

Theresa Doherty from the Women’s Library ended the day with a call to stay enthusiastic and committed despite the recession. She pointed out that it is an achievement that archives are still running despite the cuts, and that this shows how valued data and archives are in the national infrastructure and how important recording our history is. While archivists continue to value their collections, enjoy their visitors and users, and want their data to reach a wider audience, the sector will continue to progress. She described how federating the Genesis project within the Archives Hub had boosted use of their collections, but pointed out that funders of archives need to recognise that online usage of collections is just as valid as getting people to physically turn up. At the moment funding is typically allocated on visitor numbers through the doors, which puts too much emphasis on trying to drag people in off the street at the expense of trying to reach a potentially vast global audience online.

The power of parametadata

2nd May, 2010 Fran

First we had content, then not long after that we had metadata, although no-one called it that. Now we need parametadata – the metadata about metadata!

Neither metadata nor parametadata is anything new, but what is new is how central they have become to all sorts of business processes. People think there is something modern and techie about metadata, but ever since the first author signed their initials on a piece of work, or added a title, we have had metadata. Librarians are just one group who have been using metadata for centuries.

Thanks to technological advances, there is now a huge amount of processing that can be done with metadata, and indeed that needs to be done if we are to have any idea what assets we have available. Metadata has become the active driver of numerous business processes. You couldn’t operate a computer without the metadata that tells you the name of a file, its location, when it was last saved, etc., and this sort of metadata is so ubiquitous that nobody tends to think about it too much. Now that metadata is so pervasive, it is becoming increasingly important to talk about it and define its different aspects and types.

One key distinction is the one between objective and subjective metadata. Subjective metadata refers to classification, tagging, taxonomies, etc. This metadata is subjective because it is always possible to argue about it. Objective metadata, on the other hand, is uncontroversial and typically process-driven. A file format is what it is, and the time a file was last saved might cause consternation after a PC crash but is unarguable. However, there is actually surprisingly little uncontroversial metadata. Even something like a title can be edited and changed – what do you do when some content acquires a popular or folk title that is not the same as its official title? This happens a lot with comedy sketches and songs, but can also happen to names of projects, working groups, etc.

Parametadata (or meta-metadata) is another subset of metadata – it is the metadata about the metadata, giving its provenance, date of creation, technical specifications, etc. Once you start to think about metadata as content in its own right, it becomes obvious that just as you wish to track the author, title, and so on of the core content, so too you need to track the author(s), provenance, date of creation, and latest update of the metadata. For subjective metadata, parametadata becomes hugely useful. Because you can have multiple classifications of an asset, it is very important to track the source – distinguishing between author-added keywords, indexer keywords, and folksonomic tags, for example – so that people can tell where a tag has come from.
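As a rough illustration, a tag that carries its own parametadata could be sketched in Python like this. The class, the field names, the three source labels, and the sample tags are all assumptions of mine rather than any standard schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Tag:
    label: str      # the metadata itself
    # Everything below is parametadata: metadata about the tag.
    source: str     # assumed values: "author", "indexer", "folksonomy"
    added_by: str
    created: date
    updated: date


def labels_from(tags: List[Tag], source: str) -> List[str]:
    """Return only the labels that came from one kind of source."""
    return [tag.label for tag in tags if tag.source == source]


# Made-up sample data for a single asset.
SAMPLE_TAGS = [
    Tag("comedy", "author", "scriptwriter", date(2010, 1, 5), date(2010, 1, 5)),
    Tag("Television sketches", "indexer", "catalogue team", date(2010, 2, 1), date(2010, 3, 9)),
    Tag("funny", "folksonomy", "visitor42", date(2010, 4, 2), date(2010, 4, 2)),
    Tag("classic", "folksonomy", "visitor77", date(2010, 4, 20), date(2010, 4, 20)),
]

folk_labels = labels_from(SAMPLE_TAGS, "folksonomy")
```

Because the source travels with every tag, the same asset can hold several competing classifications without anyone losing track of who said what.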

As long as you know where tags have come from, you can decide whether or not you want to trust in their authority. In an increasingly muddled web, it is helpful to be told the source of a comment or an opinion in order to try to distinguish sound information from propaganda or uninformed speculation. Anecdotally, many people who were initially excited about citizen review sites – rating hotels, etc. – have now given up on them on the grounds that the people who contribute to them tend to have some kind of axe – or worse – to grind, so you can’t take them seriously. Even reviews that aim to be fair may not be relevant if the reviewer is too dissimilar to the reader. The perfect holiday for a group of teenagers is unlikely to be what a retired couple are looking for. So any review needs to carry sufficient information so that the reader can work out how relevant the content is to them. A good review site would carry a range of reviews aimed at different audiences.

Similarly, a rich navigation system needs to offer a range of tags and taxonomies, but these will only be useful when there is sufficient parametadata to tell the user where each scheme or tag came from, who created it, how up to date it is, etc. From a user perspective, being able to choose from a range of well-documented navigation systems means they can make an informed choice about whether to have fun with the randomness of folksonomic tags, to follow a specialist taxonomy in order to learn how a subject is handled by experts, or to use a guide constructed by the content curators for a general audience.
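One way to picture this is a menu of navigation schemes in which each entry shows where the scheme came from. The Python below is only a sketch under my own assumptions: the `Scheme` class, its fields, the three kinds of scheme, and the sample entries are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Scheme:
    name: str
    kind: str            # assumed values: "folksonomy", "specialist taxonomy", "curated guide"
    created_by: str
    last_updated: date


def describe(scheme: Scheme) -> str:
    """Build the one-line provenance label shown next to a scheme."""
    return (
        f"{scheme.name} ({scheme.kind}; by {scheme.created_by}; "
        f"updated {scheme.last_updated.isoformat()})"
    )


def navigation_menu(schemes: List[Scheme]) -> List[str]:
    """List the available schemes, most recently updated first."""
    ordered = sorted(schemes, key=lambda s: s.last_updated, reverse=True)
    return [describe(s) for s in ordered]


# Made-up sample data.
SCHEMES = [
    Scheme("Subject thesaurus", "specialist taxonomy", "subject experts", date(2009, 11, 2)),
    Scheme("Visitor tags", "folksonomy", "site users", date(2010, 4, 30)),
    Scheme("Collection guide", "curated guide", "content curators", date(2010, 1, 15)),
]

menu = navigation_menu(SCHEMES)
```

The user sees the kind of scheme, its creator, and its freshness before choosing which one to browse, which is the informed choice described above.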

Interface designers can use the parametadata to make different sources of metadata distinct – with different visual or other cues, for example, to indicate different navigation environments. This means you can create a range of different “navigation worlds” and let your users wander to and fro while always making sure they know where – in terms of trust and authority – they are.

A Taxonomy of the City

28th February, 2009 Fran

I recently returned from a visit to New York. The numerical street/avenue system seemed to make navigation very easy, but it is important not to make mistakes over the numbers (I almost confused 145th with 45th Street – they are miles apart!) and you always need two numbers to make a grid reference. In London, you need to remember more names, but usually only one number. I would never try to find somewhere in London with just a street name and a number – I always want to know the area. This seems to make navigation harder, but once you have the area right, you won’t be that far away, even if you get the street name and number wrong, whereas the temptation to rely on numbers rather than area names in New York means you have effectively no error-recovery mechanism.

I personally find it easier to remember names than numbers (maybe Americans are better at numbers), and I navigate London by remembering nearest tube station names. I found subway stations in New York trickier, as so many have what to me are not very memorable names – “8th Street” just doesn’t seem to stick in the same way that “Colindale” or “South Acton” do. If I’ve lost the recall, recognition of numbers is also difficult. I might recognise “Colindale” as being the right shape or sound of word, but if I’ve forgotten 8th, being shown it among 6th, 10th, and 12th doesn’t help. So although New York at first seemed far easier to navigate than London, I still felt I had to work quite hard to build a mental map. It would be interesting to know if one system really is more user-friendly, or if you just get used to either in time.

I suspect that I just prefer the London system because that is what I am used to, and I would still prefer it even if it were demonstrably less efficient than the numerical system – a good illustration of why change management is so difficult. Even if you introduce a simpler and more efficient system, people yearn for the old familiar one with all its complexities and peculiarities.

I had a look to see what studies on urban navigation are out there, but instead happened on this rather charming public art project:
Wooster Collective: Urban Flora – A Taxonomy Of The City.
