Mar 06 2011
Linked Data and archives
After an introduction from Oliver Morley, John Sheridan opened by talking about the National Archives and Linked Data. Although not as detailed as the talk he gave at the Online Information Conference last December, he still gave the rallying call for opening up data and spoke of a “new breed” of IT professionals who put the emphasis on the I rather than the T. He spoke about Henry Maudslay who invented the screw-cutting lathe, which enabled standardisation of nuts and bolts. This basically enabled the industrial revolution to happen. Previously, all nuts and bolts were made individually as matching pairs, but because the process was manual, each pair was unique and not interchangeable. If you lost the bolt, you needed a new pair. This created huge amounts of management and cataloguing of individual pairs, especially if a machine had to be taken apart and re-assembled, and meant interoperability of machinery was almost impossible. Sheridan asserted that we are at that stage with data – all our data ought to fit together but at the moment, all the nuts and bolts have to be hand crafted. Linked Data is a way of standardising so that we can make our data interchangeable with other people’s. (I like the analogy because it makes clear the importance of interoperability, but obviously getting the nuts and bolts to fit is only a very small part of what makes a successful machine, let alone a whole factory or production line. Similarly Linked Data isn’t going to solve broad publishing or creative and design problems, but it makes those big problems easy to work on collaboratively.)
Richard Wallis from Talis spoke about Linked Data. He likes to joke that you haven’t been to a Linked Data presentation unless you’ve seen the Linked Open Data cloud diagram. My version is that you haven’t been to a Linked Data event unless at least one of the presenters was from Talis! Always an engaging speaker, his descriptions of compartmentalisation of content and distinctions between Linked Data, Open Data, and Linked Open Data were very helpful. He likes to predict evangelically that the effects of linking data will be more profound to the way we do business than the changes brought about by the web itself. Chatting to him over tea, he has the impression that a year ago people were curious about Linked Data and just wanted to find out what it could do, but this year they are now feeling a bit more comfortable with the concepts and are starting to ask about how they can put them into practice. There certainly seemed to be a lot of enthusiasm in the archive sector, which is generally cash-strapped, but highly co-operative, with a lot of people passionate about their collections and their data and eager to reach as wide an audience as possible.
A Vision of Britain
Humphrey Southall introduced us to A Vision of Britain, which is a well-curated online gazetteer of Britain, with neat functions for providing alternative spellings of placenames, and ways of tackling the problems of boundaries, especially of administrative divisions, that move over time. I’m fascinated by maps, and they have built in some interesting historical map functionality too.
JISC and federated history archives
David Flanders from JISC talked about how JISC and its Resource Discovery Task Force can provide help and support to educational collections especially in federation and Linked Data projects. He called on archives managers to use hard times to skill up, so that when more money becomes available staff are full of knowledge, skills, and ideas and ready to act. He also pointed out how much can be done in the Linked Data arena with very little investment in technology.
I really enjoyed Mike Pidd’s talk about the JISC-funded Connected Histories Project. They have adopted a very pragmatic approach to bringing together various archives and superimposing a federated search system based on metadata rationalisation. Although all they are attempting in terms of search and browse functionality is a simple set of concept extractions to pick out people, places, and dates, they are having numerous quality control issues even with those. However, getting all the data into a single format is a good start. I was impressed that one of their data sets took 27 days to process and they still take delivery of data on drives through the post. They found this was much easier to manage than ftp or other electronic transfer, just because of the terabyte volumes involved (something that many people tend to forget when scaling up from little pilot projects to bulk processes). Mike cautioned against using RDF and MySql as processing formats. They found that MySql couldn’t handle the volumes, and RDF they found too “verbose”. They chose to use a fully Lucene solution, which enabled them to bolt in new indexes, rather than reprocess whole data sets when they wanted to make changes. They can still publish out to RDF.
Nick Stanhope enchanted the audience with Historypin, an offering from wearewhatwedo.org. Historypin allows people to upload old photos, and soon also audio and video, and set them in Google streetview. Although flickr has some similar functions, historypin has volunteers who help to place the image in exactly the right place, and Google have been offering support and are working on image recognition techniques to help place photos precisely. This allows rich historical street views to be built up. What impressed me most, however, was that Nick made the distinction between subjective and objective metadata, with his definition being objective metadata is metadata that can be corrected and subjective metadata is data that can’t. So, he sees objective metadata as the time and the place that a photo was taken - if it is wrong someone might know better and be able to correct it, and subjective metadata as the stories, comments, and opinions that people have about the content, which others cannot correct - if you upload a story or a memory, no-one else can tell you that it is wrong. We could split hairs over this definition, but the point is apposite when it comes to provenance tracking. He also made the astute observation that people very often note the location that a photo is “of”, but it is far more unusual for them to note where it was taken “from”. However, where it was taken from is often more use for augmented reality and other applications that try to create virtual models or images of the world. Speaking to him afterwards, I asked about parametadata, provenance tracking, etc. and he said these are important issues they are striving to work through.
Theresa Doherty from the Women’s Library ended the day with a call to stay enthusiastic and committed despite the recession, pointing out that it is an achievement that archives are still running despite the cuts, and that this shows how valued data and archives are in the national infrastructure, how important recording our history is, and that while archivists continue to value their collections, enjoy their visitors and users, and continue to want their data to reach a wider audience the sector will continue to progress. She described how federating the Genesis project within the Archives hub had boosted use of their collections, but pointed out that funders of archives need to recognise that online usage of collections is just as valid as getting people to physically turn up. At the moment funding typically is allocated on visitor numbers through the doors, and that this puts too much emphasis on trying to drag people in off the street at the expense of trying to reach a potentially vast global audience online.