Jun 29 2011

Digital Asset Management - DAM EU Conference - Third Session

Published by Fran under Digital Asset Management

Sustaining your DAM

Sara Winmill from the V&A talked about the huge shifts in mindset that were needed to accompany their DAM work. They needed to stop thinking about storing pictures of things and start thinking about managing those digital images as things in their own right. Their storage needs were vastly underestimated at first. Contrary to the myth, storage is not cheap - the V&A need some £330K for storage annually. They have been investigating innovative approaches such as “backup bartering” - finding a similar organisation and storing a copy of each other’s data, so that the backups exist offsite but without the expense of using commercial storage companies.

Despite having a semantically enabled website, they have not been able to link their Library Catalogue’s MARC records with the images, and have three sets of identifiers that are not mapped.

One of their major DAM problems is trying to stop people storing multiple copies and refusing to delete anything. The core collections images need to be kept, but publicity and marketing material is now being stored in the system without any selection and disposal policies in place. The original system was designed with no delete button at all.

Can we fix it? Yes we Can! Successfully Implementing a Multi-faceted DAM system at HiT entertainment

It was a pleasure to hear of Tabitha Yorke’s successful DAM implementation at HiT as they built their first digital library. This was a relatively constrained collection and two full-time members of staff were able to catalogue it in a year. This provided the metadata they needed for a straightforward taxonomy-based search system that is simple and easy to use. This meant that self-research was supported, saving the team much time and increasing productivity hugely. They are now working to integrate the library with rights systems. They worked hard at getting users to test the metadata and made sure that they were cataloguing with terms the users wanted to search with, rather than those that occurred first to the cataloguers. They now have two digital librarians managing 150,000 assets.

Tabitha stayed on the stage and was joined in a panel session by David Bercovic, Digital Project Manager at Hachette UK, and Fearghal Kelly of Kit digital. The afternoon ended with David Lipsey’s concluding remarks.


Jun 28 2011

Digital Asset Management - DAM EU Conference - Second Session

Published by Fran under Digital Asset Management

Serco Artemis Digital - Realising the Value of Archives and Rehabilitating Prisoners

Bruce Hellman from Serco described the work they have been doing to employ prisoners as cataloguers and transcribers. The work, which varied from project to project, but which included typing up handwritten archival documents that were not suitable for OCR capture techniques and adding metadata, was very popular with prisoners.

Bruce argued that it gave them a chance to develop skills that would be useful in the workplace on their release, and allowed organisations to get work done more cheaply than by paying standard market rates.

How Metadata and Semantic Technologies will Revolutionise your Workflow

John O’Donovan of the Press Association gave an entertaining presentation about using semantic technologies to index (or re-index) content from a range of systems, including legacy systems and external feeds, and publish it to the web. He pointed out - with a series of amusing ambiguities and unintentional innuendos - that simple text search lacks context, and that newspaper headlines often contain jokes, ambiguous terms, and terms that quickly become obsolete. So, metadata is vital in assembling assets that are about the same topic.

He stressed the importance of keeping your metadata management separate from your content management, so that metadata can be changed without having to re-index assets. (An exception is rights and other non-subjective metadata that needs to be embedded in the asset for further tracking. This is not a major concern to the Press Association as they do not track assets once they are published onto the web. I wasn’t sure what would happen if you decided to repurpose your content and so needed a new set of metadata, nor how you link content and metadata or manage them within their separate stores.)

The PA are using MarkLogic as the content repository and a BigOWLIM triplestore to handle the associated metadata. Content is fed into the content store, then out again to a suite of indexing technologies, including concept extraction and other text-processing systems, as well as facial recognition software, to create semantic metadata. Simple ontologies are used to model the content, mainly indexing people, places, and events - themes chosen as covering the most popular search terms entered by users of the website.
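
The separation of content and metadata described above can be sketched as two stores joined only by an asset ID. This is a minimal illustration in plain Python, not the PA's implementation: the `ContentStore` and `TripleStore` classes, the asset IDs, and the predicate names are all hypothetical, and a real system would use a document database and an RDF triplestore instead.

```python
class ContentStore:
    """Holds asset bodies keyed by ID; knows nothing about subject metadata."""

    def __init__(self):
        self._assets = {}

    def put(self, asset_id, body):
        self._assets[asset_id] = body

    def get(self, asset_id):
        return self._assets[asset_id]


class TripleStore:
    """Holds (subject, predicate, object) statements about assets."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def remove(self, subject, predicate, obj):
        self._triples.discard((subject, predicate, obj))

    def subjects(self, predicate, obj):
        """Return the sorted IDs of every subject with this predicate and object."""
        return sorted(s for s, p, o in self._triples if p == predicate and o == obj)


content = ContentStore()
metadata = TripleStore()

# Made-up assets and statements, for illustration only.
content.put("asset:1", "Minister opens new bridge in Leeds")
content.put("asset:2", "Leeds festival draws record crowd")
metadata.add("asset:1", "mentionsPlace", "Leeds")
metadata.add("asset:2", "mentionsPlace", "Leeds")
metadata.add("asset:1", "mentionsEvent", "bridge opening")

# Assemble a collection by topic using only the metadata store.
leeds_stories = [content.get(a) for a in metadata.subjects("mentionsPlace", "Leeds")]

# Correct the metadata later; the stored content is left untouched.
metadata.remove("asset:2", "mentionsPlace", "Leeds")
metadata.add("asset:2", "mentionsPlace", "Reading")
```

The point of the design is visible in the last two lines: a metadata correction is a change to the triple store alone, so nothing in the content store has to be re-ingested.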

John argued that such gathering and indexing of assets in order to automatically create and publish collections of associated content was simpler and easier than ingesting diverse content and metadata into traditional search, content management, and online publishing systems.

DAM for Content Marketing, Curation, and Knowledge Organisation

Mark Davey of the DAM Foundation took us on an animated and musical tour of different perspectives on metadata, engagement, social media, and how different the “digital natives” - young people who have grown up with digital technologies - will be from previous generations. Kids of the future will be able to have an idea in the morning, go to an online app and create their site, their brand, and their marketing strategy in the afternoon, and be engaging with their potential clients by the evening.

Mark pointed out that people have moved on from the initial narcissism of social media and self-publishing and now want compelling stories they can engage with. He pointed out that as semantic technologies advance, we are caught in a feedback loop with them - we are the ontology that is driving the machines - and so we should be aware and vigilant. As the technologies become more powerful and all pervasive, we may lose sight of how they are working to serve us, rather than how we are serving up information about ourselves to them.

Marketing will have to become more sophisticated. Amongst the many statistics he quoted, I noted that 84% of 25-34 year olds have left a favourite website because of ads. At the same time, our networks become more interconnected. In a “six degrees of separation” game, we discovered that three people in the audience had met the Dalai Lama, and we are linking to more and more people through social media sites every day.

The metaphor of information as water is a familiar one, especially in the knowledge management area, but Mark’s colleague Dave pointed out how appropriate it is when talking about a DAM/dam. The DAM system forms the reservoir of content.

(I couldn’t help comparing and contrasting the ever-changing semantic seas of information at the Press Association with the more manageable streams of content that flow within smaller organisations, and how very different approaches are needed for such different contexts. The other day I saw the metaphor used again, in an interview with - apparently - one of the LulzSec hackers who talked about their pirate boat and “copywrong” as an enemy of the seas.)

Black Holes and Revelations: DAM and a museum collection

As if to continue the water metaphor, the next speaker was Douglas McCarthy from the National Maritime Museum. However, he took the metaphor up a stage, to space ships and black holes, with their content assets hidden in black holes as 100,000 uncatalogued image files.

Having catalogued and improved their DAM system, the Museum’s Picture Library is now showing a healthy profit. Many sales come from the “long tail” of images that no-one anticipated anyone would want. Rather than saturating the market, putting the images online has been stimulating demand, with customers calling for more collections to be made available.


Aug 01 2010

Content Identifiers for Digital Rights Persistence

This is another write-up from the Henry Stewart DAM London conference.

Identity and identification

Robin Wilson discussed the issue of content identifiers, which are vitally important for digital rights management, yet tend to be overlooked. He argued that although people become engaged in debates about titles and the language used in labels and classification systems, they overlook the need to achieve consensus on basic identification.

(I was quite surprised, as I have always thought that people would argue passionately about what something should be called and how using the wrong terminology affects usability, but that they would settle on machine-readable IDs quite happily. Perhaps it is the neutrality of such codes that makes the politics intractable. If you have invested huge amounts of money in a database that demands certain codes, you will argue that those codes should be used by everyone else, to save you the costs of translation or of acquiring a compatible system, and there are no appeals to usability, or brokerage via editorial policy, that can be made. It simply becomes a matter of whoever shouts the loudest gets to spend the least money in the short term.)

Robin argued that the only way to create an efficient digital marketplace is to have a trusted authority oversee a system of digital identifiers that are tightly bound within the digital asset, so they cannot easily be stripped out even when an asset is divided, split, shared, and copied. The authority needs to be trusted by consumers and creators/publishers in terms of political neutrality, stability, etc.

(I could understand how this system would make it easier for people who are willing to pay for content to see what rights they need to buy and who they should pay, but I couldn’t see how the system could help content owners identify plagiarism without an active search mechanism. Presumably a digital watermark would persist throughout copies of an asset, provided that it wasn’t being deliberately stripped, but if the user simply decided not to pay, I don’t see how the system would help identify rights breaches. Robin mentioned in conversation Turnitin’s plagiarism management, which has become more lucrative than their original work on content analysis, but it requires an active process instigated by the content owner to search for unauthorised use of their content. This is fine for the major publishers of the world, who can afford to pay for such services, but is less appealing to individuals, whether professional freelances or amateur content creators, who would need a cheap and easy solution that would alert them to breaches of copyright without their having to spend time searching.)

The identifiers themselves need to be independent of any specific technology. At the moment, DAM systems are often proprietary and therefore identifiers and metadata cannot easily flow from one system to another. Some systems even strip away any metadata associated with a file on import and export.

Robin described five types of identifier currently being used or developed:

  • Uniform Resource Name (URN)
  • Handle System
  • Digital Object Identifier
  • Persistent URL (PURL)
  • ARK (Archival Resource Key).

He outlined three essential qualities for identifiers - that they be unique, globally registered, and locally resolved.
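
One way to picture those three qualities together is a two-step lookup: a shared registry knows which organisation owns each identifier prefix, and the organisation itself knows where the asset lives. The sketch below is only an illustration of that idea, not how any of the five schemes listed above actually works; the `Registry` class, the prefixes, and the URLs are all made up.

```python
class Registry:
    """Hypothetical global registry: maps a naming-authority prefix to a local resolver."""

    def __init__(self):
        self._resolvers = {}

    def register(self, prefix, resolver):
        # Refusing duplicate prefixes keeps every full identifier unique.
        if prefix in self._resolvers:
            raise ValueError(f"prefix {prefix!r} is already registered")
        self._resolvers[prefix] = resolver

    def resolve(self, identifier):
        prefix, _, local_name = identifier.partition("/")
        resolver = self._resolvers[prefix]  # global step: find who owns the prefix
        return resolver(local_name)         # local step: the owner locates the asset


# Each organisation keeps its own lookup table (made-up names and locations).
museum_assets = {"img-0001": "https://images.example.org/0001.tif"}
publisher_assets = {"img-0001": "https://cdn.example.com/covers/0001.jpg"}

registry = Registry()
registry.register("museum.example", museum_assets.get)
registry.register("publisher.example", publisher_assets.get)

museum_url = registry.resolve("museum.example/img-0001")
publisher_url = registry.resolve("publisher.example/img-0001")
```

Both organisations use the local name `img-0001`, but the registered prefix keeps the full identifiers distinct, and each owner can move its files without telling the registry.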

So why don’t we share?

Robin argued that it is easier for DAM vendors to build “safe” systems that lock all content within an enterprise environment. Only those with a public service/archival remit tend to be collaborative and open. DAM vendors resist a federated approach online and prefer to use a one-to-one or directly intermediated transaction model. Federated identifier management services exist but vendors and customers don’t trust them. The problem is mainly social, not technological.

One of the problems is agreeing to share the costs of services, such as infrastructure, registration and validation, governance and development of the system, administration, and outreach and marketing.

(Efforts to standardise may well benefit the big players more than the small players and so there is a strong argument for them bearing the initial costs and offering support for smaller players to join. Once enough people opt in, the system gains critical mass: it becomes easier to join, and the costs of joining become less of an unquantifiable risk – you can benefit from the experiences of others. The semantic web is currently attempting to acquire this “critical mass”. As marketers realise the potential of semantic web technology to make money, no doubt we will see an upsurge in interest. Facebook’s “like” button may well be heralding the advent of the ad-driven semantic web, which will probably drive uptake far faster than the worthy efforts of academics to improve the world by sharing research data!)


Jul 03 2010

Procuring a Digital Asset Management system

Published by Fran under Digital Asset Management

This is the first of a series of summaries of the Henry Stewart DAM London conference on June 30, chaired by David Lipsey. The panels (one of which included me) were a pleasing mix of very practical information and more theoretical discussion.

Classic DAM vendor “overstatements”

Theresa Regli, who does a great job as a “professional sceptic”, stressed the need for a calm and considered approach to procurement, with the most important stage being the testing stage. You wouldn’t buy a car without taking it for a test drive, but people buy software without finding out if it can handle their content. Nobody’s assets and business processes are exactly the same, and just because a system suited somebody else perfectly doesn’t mean it is right for you. Vendors will say that they can do anything, but that’s their job so don’t take their word for it. Don’t be distracted by the coolest of the cool new features or other bells and whistles. Cool costs - but may not make - money for your business. On the one hand, if the cool features don’t actually improve your specific business processes, they won’t benefit you, and on the other, vendors have become increasingly adept at marketing the same old features in new ways, so it is very important to dig beneath the surface to find out how they are doing what they claim. Surprisingly little has changed technologically in the DAM vendor landscape over the last five years. So, a wonderful new system for automatically indexing images directly may in fact just be the familiar territory of analysing textual metadata associated with images.

Speech to text

One area that has moved on is the technology to convert speech to text. This means that you can, to an extent, subtitle a film automatically (which isn’t quite the same as a system that can “watch a movie and understand what’s going on scene by scene”). This then gives you a chunk of textual metadata you can search and analyse (“understanding” what’s going on relies on simple text analysis – looking up words in thesauruses, so, for example, if the dialogue mentions guns, shooting, and bullets a lot, the software could suggest it is a gunfight scene). However, accuracy rates are patchy and the systems require training, which could be labour intensive, so you need to make sure those training costs and the time required are included in budgets and schedules. The systems work best if you can get everything read by someone like Patrick Stewart, as he has very clear and even enunciation. Anyone with an unusual accent or who mumbles is far more difficult to process. As usual, the software is easiest to train if you are working within a specific context, so you can focus on relevant words and accents, rather than anything anyone anywhere in the world might happen to say.
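
The word lookup described above can be sketched in a few lines. This is a toy illustration under loud assumptions: the `THESAURUS` dictionary, the scene labels, and the `min_hits` threshold are all invented here, and commercial products will be considerably more sophisticated.

```python
import re
from collections import Counter

# Hypothetical thesaurus: scene label -> words that suggest it.
THESAURUS = {
    "gunfight": {"gun", "guns", "shoot", "shooting", "bullet", "bullets"},
    "car chase": {"drive", "driving", "brakes", "faster", "road"},
}


def suggest_scene(transcript, min_hits=2):
    """Return the label whose words occur most often, or None if too few hits."""
    words = re.findall(r"[a-z']+", transcript.lower())
    hits = Counter()
    for label, vocabulary in THESAURUS.items():
        hits[label] = sum(1 for word in words if word in vocabulary)
    label, count = hits.most_common(1)[0]
    return label if count >= min_hits else None


dialogue = "He's got a gun! Stop shooting, we're out of bullets."
suggestion = suggest_scene(dialogue)
```

Here `suggestion` comes out as `"gunfight"` because three of the spoken words appear under that label. The result is only ever as good as the transcript and the word lists behind it, which is why the training costs matter.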

A clever use of the technology is by the car industry to save time analysing focus group interviews. They asked interviewers to “audio index” their interviews by saying a key “trigger” word when somebody in the focus group said something interesting. The technology was set to clip out a section of video a few seconds before and after the trigger word, so the interviewers could then automatically generate “edited” versions of the interviews, saving a lot of time. I can see this being a great tool for anyone processing ethnographic data or conducting UX or similar testing based on interviews.
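
The clipping step is easy to picture once the speech-to-text pass has produced timestamped words. The sketch below is an assumption about how such a tool might work rather than a description of the system that was demonstrated: the `clip_windows` function, the five-second margins, and the trigger word "bookmark" are all hypothetical, and the word list is assumed to be sorted by time.

```python
def clip_windows(words, trigger, before=5.0, after=5.0):
    """Return merged (start, end) clip times around each occurrence of the trigger.

    `words` is a time-ordered list of (time_in_seconds, word) pairs.
    """
    windows = []
    for time, word in words:
        if word.lower() != trigger.lower():
            continue
        start, end = max(0.0, time - before), time + after
        if windows and start <= windows[-1][1]:
            # Overlaps the previous clip, so extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


transcript = [
    (2.0, "so"),
    (3.0, "bookmark"),
    (40.0, "Bookmark"),
    (44.0, "bookmark"),
    (90.0, "thanks"),
]
clips = clip_windows(transcript, "bookmark")
```

With this input `clips` is `[(0.0, 8.0), (35.0, 49.0)]`: the first clip is cut short at the start of the recording, and the two triggers at 40 and 44 seconds are merged into one clip.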

Zooming in on the detail

Another feature Theresa demonstrated was a high definition zooming tool, so that you can see very fine detail in your digital images – lovely for museums and art galleries but costly in terms of storage space and bandwidth. I could see it working well as an in-gallery interactive guide to certain collections. It wouldn’t be so good if you were trying to access it externally from a dodgy wifi or bandwidth-limited connection.

(The British Library’s Magnificent Maps exhibition – which I saw on a London IA visit – has an interesting interactive zoom feature that works entirely differently, but was very popular. It worked by using a “magnifying glass” – actually a device with some LED transmitters that send an infrared signal to a webcam to trigger a zoom response through a special display interface.)

Procurement process tips

The other panel members talked through various DAM system procurement processes, from a huge global project for Cambridge University Press that began with a list of 452 vendors, through to a very detailed process for adidas with a smaller initial list but a large number of criteria to be fulfilled. It was pleasing that the panel agreed that cultural fit can be as important as any technical specifications. A state of the art or very large vendor who just doesn’t get your world is very unlikely to provide you with a good solution, but a mid-range vendor who really understands your particular context is much more likely to find or develop something that matches your business processes.

Although the use of personas (popular in the UX world) in procurement is quite unusual, Theresa suggested that user stories could be more effective than requirements spreadsheets. Vendors are likely to tick all the boxes in the spreadsheet without getting to grips with the business processes behind them. It is also hard to explain complex interactions as sets of requirements, but telling a story can make it clear what the system as a whole should provide, e.g. Sue has to research images for marketing campaigns and make sure that editors based in offices around the globe can see them to approve them; designers need to be able to access them remotely; and the images then need to be output in a variety of formats for publication both in print and online.

It is also worth making sure that any arrangements with outsourced suppliers are checked. Sometimes vendors will provide case studies of a successful implementation but not mention that they have never worked with your supplier before.

I noted the emphasis panellists placed on making sure taxonomies and vocabularies are user-friendly and effective in order to get the best out of any DAM system.

Manage your metadata

Sarah Saunders of Electric Lane discussed the importance of controlled vocabularies and managed metadata for image search and management. Speech-to-text software can’t help with stills collections, or when part of your collection is video without accompanying audio (e.g. a rushes collection – the “spare” footage that wasn’t used in a broadcast and which often has no associated dialogue or voiceover script). She described advances in visual sorting software that use a combination of textual metadata and content-based image retrieval (CBIR) to refine search results. Although CBIR is still in its infancy, when running over a small image set pre-selected by text searching it can be very helpful. CBIR can identify basic features like the colour that is used the most in an image, which is not much help if you run it over a large image collection with no other metadata (i.e. “give me all the mainly red pictures” will bring up images of everything from fire engines to strawberries – fun if all you want is inspiration, not so good if you have something more specific in mind). However, if you have a set of images of the Eiffel Tower for example, it could distinguish between close-ups and shots with lots of blue sky. If you like the blue sky ones, you can click on one and ask for “more like this” and be offered other mainly blue sky ones.
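
The “more like this” idea can be sketched with nothing more than colour counts. This is a deliberately crude stand-in for real CBIR, which uses much richer visual features: the `colour_share` and `more_like_this` functions, the image names, and the two sample pixel colours are all invented for illustration, and the images are assumed to have been pre-selected by a text search.

```python
def colour_share(pixels):
    """Return the fraction of pixels whose strongest channel is red, green, blue."""
    counts = [0, 0, 0]
    for pixel in pixels:
        counts[pixel.index(max(pixel))] += 1
    return [count / len(pixels) for count in counts]


def more_like_this(chosen, candidates):
    """Rank the other images by how close their colour shares are to the chosen one."""
    target = colour_share(candidates[chosen])

    def distance(name):
        return sum(abs(a - b) for a, b in zip(target, colour_share(candidates[name])))

    return sorted((name for name in candidates if name != chosen), key=distance)


# Tiny made-up "images": lists of (red, green, blue) pixels.
SKY, IRON = (90, 150, 230), (120, 90, 70)
images = {
    "tower_wide_1": [SKY] * 8 + [IRON] * 2,
    "tower_wide_2": [SKY] * 7 + [IRON] * 3,
    "tower_closeup": [SKY] * 1 + [IRON] * 9,
}
ranked = more_like_this("tower_wide_1", images)
```

Starting from a mostly blue-sky shot, the other wide shot ranks ahead of the close-up. Run over a whole unfiltered collection, the same measure would happily match sky with sea or a blue car, which is the limitation described above.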

The second panel will be the subject of my next post.
