Archive for October, 2011

Oct 09 2011

More than a schedule, give me an index

Published by Fran under KO, archives, semantic web

People have started to talk about the death of the schedule. The talk often comes with complaints that broadcasters are ill-prepared for this inevitability. Schedulers, in turn, complain that no-one appreciates their skill in placing programmes appropriately and in context. One example of that skill is “hammocking”: making sure that viewers receive a “varied diet” across an evening, perhaps by placing the news between two lighthearted pop culture programmes.

Meanwhile, the anti-schedulists point out that people, given the choice, behave differently. Some will download and watch an entire series in one marathon session (people have “Torchwood weekends”) rather than commit to being in front of the TV at 9pm every Thursday. Others will watch a film broken into 20-minute sections on their mobile phone while commuting. Schedulism and anti-schedulism can seem like a major culture clash, but the conflict is easily resolved when you think purely in terms of knowledge organisation.

A schedule is just metadata

A schedule is merely a set of metadata about programmes. It used to be the most important set of metadata for most people (along with the programme title!) as it was the key to not missing the programme. Now that we have catchup services and archives, knowing exactly when a programme will be broadcast or was broadcast may be less significant for finding that programme again, leading some people to claim that schedules are no longer needed. However, there are plenty of people who don’t want to look for specific programmes but want to sit down and be entertained for the evening. For them, schedules remain vital as they outline what is available. Scheduling in this sense is editorial selection, with all the craftsmanship and judgement that implies.

People are fascinated to know what was broadcast on the day they were born, and which programmes went out together. Schedules offer all sorts of socio-political and cultural information, giving snapshots of which topics were popular or contentious over time.

Schedule data is less significant in a vast online digital archive, but it is still useful. For example, you might want to find an episode you missed in a long-running series. You probably won’t know that it was episode 12 of 26. You might remember, though, that you missed it because you were out celebrating a friend’s birthday, which is a date you know. Looking up that date may be a lot quicker than reading through the episode descriptions. Those are usually too vague to help, because the writers don’t want to give away “spoilers” such as the final cliffhanger, which is often the part of the episode you remember best. Programme descriptions are intended to entice you to watch, not to help you work out whether you have already seen the programme.
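The birthday example can be made concrete with a minimal Python sketch. It treats a schedule as nothing more than metadata records and looks up an episode by transmission date. The series title, start date and function names are hypothetical. They are chosen so that episode 12 of 26 falls on the missed Thursday.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Broadcast:
    """One transmission of one episode: the schedule as plain metadata."""
    series: str
    episode: int
    transmitted: date


def episodes_broadcast_on(broadcasts, series, day):
    """Return the episodes of `series` that went out on `day`."""
    return [b for b in broadcasts if b.series == series and b.transmitted == day]


# Hypothetical schedule: a 26-part weekly series starting Thursday 6 January 2011.
first_night = date(2011, 1, 6)
schedule = [
    Broadcast("Example Drama", n, first_night + timedelta(weeks=n - 1))
    for n in range(1, 27)
]

# The night you were out at your friend's birthday party.
missed = episodes_broadcast_on(schedule, "Example Drama", date(2011, 3, 24))
print([b.episode for b in missed])  # [12]
```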

Don’t ditch the schedule, add to it

What is important to bear in mind is that digital archives can offer schedule data almost effortlessly, but they can offer many more metadata streams as well. These streams are in many ways innovative and can lead to fascinating new ways of grouping programmes and promoting content. Rich subject metadata (such as a subject index) becomes an engine that can drive all sorts of automatically created content channels. You can group programmes by theme or topic as well as by series and genre. You no longer have to rely on when something was shown. Instead, you can use an index to gather all the programmes about fishing, or harpsichords, or the miners’ strike. For the miners’ strike, that would bring together:

- documentaries (Heart of the Matter, Panorama);
- news and current affairs (Question Time, Newsnight, even The Money Programme);
- plays (The Price of Coal);
- even comedies (The Comic Strip Presents... The Strike).
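A subject index can drive channels almost mechanically. You invert the programme-to-subject assignments, and every index term becomes a list of programmes. The sketch below assumes a tiny hypothetical catalogue, with invented titles, genres and terms. It shows one term pulling together a documentary, a news item and a comedy.

```python
from collections import defaultdict

# Hypothetical catalogue: title -> (genre, subject index terms).
catalogue = {
    "Documentary A": ("documentary", {"miners' strike", "coal industry"}),
    "News bulletin B": ("news", {"miners' strike"}),
    "Play C": ("drama", {"coal industry"}),
    "Sitcom D": ("comedy", {"miners' strike"}),
    "Recital E": ("music", {"harpsichords"}),
}


def build_subject_index(catalogue):
    """Invert title -> subjects into subject -> sorted list of titles."""
    index = defaultdict(set)
    for title, (_genre, subjects) in catalogue.items():
        for subject in subjects:
            index[subject].add(title)
    return {subject: sorted(titles) for subject, titles in index.items()}


index = build_subject_index(catalogue)

# Each index entry is, in effect, a ready-made channel that cuts across genres.
channel = index["miners' strike"]
print(channel)  # ['Documentary A', 'News bulletin B', 'Sitcom D']
print(sorted({catalogue[t][0] for t in channel}))  # ['comedy', 'documentary', 'news']
```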

Such subject metadata also gives you extra context. In the case of the miners’ strike, for example, it shows you that there were miners’ strikes in 1921, 1926, 1955, 1972, 1974 and 1981, as well as in 1984. It also shows that miners around the world have gone on strike at various times. This historical perspective is hard to pick out from schedule data. Even if you could see that programmes about miners’ strikes had been broadcast in these years, you would have to do further research to find out whether they were covering contemporary events. If the programmes have such metadata attached, any user of the archive can effectively build rich personalised channels on their favourite topics or themes. They can then share those channels with others who have similar interests.
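The difference between schedule data and subject data can be sketched in a few lines. Assume, hypothetically, that each programme record carries both its broadcast year and the year of the strike it is about. The schedule alone only shows when programmes went out. The subject metadata surfaces the earlier strike covered by a retrospective. All records are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Programme:
    title: str
    broadcast_year: int  # schedule metadata: when it went out
    subject: str         # subject metadata: what it is about...
    event_year: int      # ...including which strike it covers


# Hypothetical records; "Retrospective B" looks back at an earlier strike.
programmes = [
    Programme("Report A", 1972, "miners' strike", 1972),
    Programme("Retrospective B", 1984, "miners' strike", 1926),
    Programme("Report C", 1984, "miners' strike", 1984),
]


def broadcast_years(programmes, subject):
    """What schedule data alone tells you."""
    return sorted({p.broadcast_year for p in programmes if p.subject == subject})


def events_covered(programmes, subject):
    """What subject metadata adds: the years of the events themselves."""
    return sorted({p.event_year for p in programmes if p.subject == subject})


print(broadcast_years(programmes, "miners' strike"))  # [1972, 1984]
print(events_covered(programmes, "miners' strike"))   # [1926, 1972, 1984]
```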

Metadata advertises content

If the metadata is in a Linked and Open format, the associative trails can wander beyond your collection to others, reaching new audiences, perhaps via social networks. This releases the “long tail” of content that is otherwise hard to find and re-use, and it also puts popular content into context. Making your metadata more widely available gives more people more routes into exploring your archive, even if you choose to restrict that access to in-house teams or paying subscribers.
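Here is a rough illustration of what “Linked and Open” can mean in practice. The sketch writes a programme description as N-Triples using Dublin Core terms, with the subject given as a URI rather than a text string. Both URIs are hypothetical placeholders. The point is that if another collection uses the same subject URI, the two descriptions connect without any further work.

```python
DCTERMS = "http://purl.org/dc/terms/"


def literal(text):
    """Quote a plain string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def programme_triples(programme_uri, title, subject_uris):
    """Describe one programme, pointing its subjects at shared URIs."""
    lines = [f"<{programme_uri}> <{DCTERMS}title> {literal(title)} ."]
    for subject_uri in subject_uris:
        lines.append(f"<{programme_uri}> <{DCTERMS}subject> <{subject_uri}> .")
    return lines


# Hypothetical URIs: your archive's programme, someone else's subject vocabulary.
lines = programme_triples(
    "http://archive.example.org/programmes/123",
    "Example documentary",
    ["http://vocab.example.org/subjects/miners-strike"],
)
for line in lines:
    print(line)
```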

Either way, you may want to sell individual programmes or parts of programmes. If so, the rich semantic metadata you have added makes a very useful sales and marketing tool, because it tells you not just when you transmitted them but exactly what they are about.

Oct 07 2011

Transforming and extending classification systems - UDCC Seminar

Published by Fran under KO, semantic web

This post is the last in a series about the UDC consortium international seminar in The Hague, 19-20 September, 2011

Joan S. Mitchell, OCLC (USA), and Marcia Lei Zeng, Kent State University (USA), supported by Maja Žumer, University of Ljubljana (Slovenia), talked about extending models for controlled vocabularies to classification systems. Their case was modelling the Dewey Decimal Classification (DDC) with the Functional Requirements for Subject Authority Data (FRSAD) model. This led to interesting discussions about FRSAD’s concepts of “nomen” and “thema”.
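For readers unfamiliar with FRSAD, a thema is anything used as the subject of a work, and a nomen is any sign by which a thema is known. Here is a much simplified sketch of that split. It assumes a class is modelled as a thema, with its notation and caption as separate nomens. The class names and example values are illustrative, not the speakers’ actual model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Nomen:
    """A sign a thema is known by, e.g. a class number or a caption."""
    value: str
    kind: str


@dataclass
class Thema:
    """Anything that can be the subject of a work, such as a DDC class."""
    identifier: str
    nomens: list = field(default_factory=list)

    def has_appellation(self, nomen):
        self.nomens.append(nomen)
        return self  # allow chaining

    def names(self, kind):
        return [n.value for n in self.nomens if n.kind == kind]


# Illustrative values only: one class, two appellations.
ddc_class = (Thema("thema-1")
             .has_appellation(Nomen("641.5", "notation"))
             .has_appellation(Nomen("Cooking", "caption")))
print(ddc_class.names("notation"), ddc_class.names("caption"))  # ['641.5'] ['Cooking']
```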

Along with my former colleague Andy Heather, now CTO at DODS Parliamentary Communications Ltd, I talked about our work on the data migration of classifications from a legacy database into new taxonomy management software, presenting our paper: Transformation of a legacy UDC-based classification system: exploiting and remodelling semantic relationships.

Conclusions

The key ideas I took away from the conference were:
1) Classifications and ontologies are not an either/or choice. They have different properties and different strengths and weaknesses and so should be chosen according to the task in hand.
2) It is difficult to turn a classification into an ontology, but easy to turn an ontology into a taxonomy, so if you don’t have either to start with and can’t decide, an ontology is a safer bet. If you already have a classification, you need to think carefully about whether it is worth turning it into a fully modelled ontology, as converting it to RDF or SKOS is likely to be much easier. However, at the moment, RDF and SKOS have limitations, especially in handling faceted taxonomies, so beware of losing semantic richness in the conversion process. Polyhierarchies offer a way of expressing facets in SKOS.
3) Vocabulary control and alignment continue to be significant issues for the Semantic Web.
4) Ontology curation, management, and semantic alignment will be increasingly important issues for the Semantic Web.
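Converting an existing classification to SKOS is the easier route, and this sketch also shows where richness can leak away. It turns rows from a hypothetical legacy table (code, label, parent code) into SKOS N-Triples. The table layout and namespace are assumptions. Note that every kind of legacy parent link ends up as the same skos:broader property.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/classification/"  # hypothetical namespace


def concept_uri(code):
    return f"<{BASE}{quote(code, safe='')}>"


def literal(text, lang=None):
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def legacy_to_skos(rows):
    """rows: (code, label, parent_code or None) from a hypothetical legacy table."""
    triples = []
    for code, label, parent in rows:
        uri = concept_uri(code)
        triples.append(f"{uri} <{RDF_TYPE}> <{SKOS}Concept> .")
        triples.append(f"{uri} <{SKOS}notation> {literal(code)} .")
        triples.append(f"{uri} <{SKOS}prefLabel> {literal(label, 'en')} .")
        if parent is not None:
            # Whatever the legacy "parent" meant, it all becomes skos:broader.
            triples.append(f"{uri} <{SKOS}broader> {concept_uri(parent)} .")
    return triples


rows = [("A", "Top class", None), ("A1", "First subclass", "A"), ("A2", "Second subclass", "A")]
for triple in legacy_to_skos(rows):
    print(triple)
```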

Slides and audio recordings of all 21 talks can now be downloaded from the conference website.

Conference proceedings are published by Ergon Verlag and can now be purchased/ordered online from http://seminar.udcc.org/2011/php/proceedings.php.

Oct 03 2011

Classification and ontology in specific subjects - UDCC Seminar

Published by Fran under KO, semantic web

Day two of the UDC consortium international seminar opened with two subject-specific talks. Wolfram Sperber described a classification of mathematics. Andrew Buxton used the ChEBI ontology to show how similar chemistry classifications and ontologies are. He also described the different ways in which classifications and ontologies could support each other, and talked about the lack of good graphical tools and visualisations for representing ontologies.

Categories and relations: key elements of ontologies - Categorial Distinctions

Roberto Poli, University of Trento (Italy), talked about the complexities of part-whole relationships. There are simple wholes, composed of the sum of their parts, but some parts of wholes cannot simply be added together. The social, psychological, and physical aspects of a person are one example. He also discussed the difference between science as epistemological (dealing with what can be known) and as ontological (dealing with what exists).

Towards a relation ontology for the Semantic Web

Dagobert Soergel made the bold claim that the Semantic Web can deliver on its promise only if we adopt a relation ontology and map each dataset to that standard, allowing interoperability. He pointed out that you “do not get semantics from syntax alone”.
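Here is a toy version of the idea, using invented names. Each dataset uses its own relation labels, and only a mapping onto a shared relation ontology (hypothetical here) lets us ask one question across both. Nothing in the strings is_part_of and componentOf tells a machine they mean the same thing. The mapping supplies that semantics.

```python
# Hypothetical standard relation ontology and per-dataset mappings onto it.
STANDARD_RELATIONS = {"partOf", "causes", "locatedIn"}

MAPPINGS = {
    "dataset_a": {"is_part_of": "partOf", "leads_to": "causes"},
    "dataset_b": {"componentOf": "partOf", "situatedIn": "locatedIn"},
}


def normalise(dataset, triples):
    """Rewrite a dataset's own relation names into the standard relations."""
    mapping = MAPPINGS[dataset]
    result = []
    for subject, relation, obj in triples:
        standard = mapping.get(relation)
        if standard not in STANDARD_RELATIONS:
            raise KeyError(f"{dataset}: no standard mapping for {relation!r}")
        result.append((subject, standard, obj))
    return result


merged = (normalise("dataset_a", [("wheel", "is_part_of", "car")])
          + normalise("dataset_b", [("engine", "componentOf", "car")]))

# Only after mapping can we ask one question across both datasets.
print(sorted(s for s, r, o in merged if r == "partOf" and o == "car"))  # ['engine', 'wheel']
```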

Relations in the notational hierarchy of the Dewey Decimal Classification

Rebecca Green from OCLC described the difficulties encountered when trying to automatically create ontologies from the Dewey Decimal Classification. These included semantic differences in the way subclasses had been defined, meaning that no single rule would handle them all appropriately.
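The sketch below shows the kind of single mechanical rule at issue. It derives a class’s parent purely from its notation, first dropping decimal digits and then zeroing digits of the three-digit base. The rule is a hypothetical illustration, not OCLC’s procedure, and it works only on notation strings. It captures the notational hierarchy, but it cannot tell how a subclass relates to its parent. That is exactly where the talk found that no single rule would do.

```python
def notational_parent(notation):
    """Naive rule: derive the parent purely from the digits of the notation."""
    if "." in notation:
        base, decimals = notation.split(".", 1)
        decimals = decimals[:-1]              # drop the last decimal digit
        return f"{base}.{decimals}" if decimals else base
    digits = list(notation)
    for i in range(len(digits) - 1, 0, -1):  # never touch the first digit
        if digits[i] != "0":
            digits[i] = "0"                   # e.g. 641 -> 640 -> 600
            return "".join(digits)
    return None                               # a main class such as 600


def notational_ancestors(notation):
    """Apply the naive rule repeatedly until no parent remains."""
    chain = []
    parent = notational_parent(notation)
    while parent is not None:
        chain.append(parent)
        parent = notational_parent(parent)
    return chain


print(notational_ancestors("641.59"))  # ['641.5', '641', '640', '600']
```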

Modelling concepts and structures in analytico-synthetic classifications

The eminent Ingetraut Dahlberg compared Aristotle’s and Ranganathan’s key facets, as well as the UDC and Colon Classification systems. She also presented a survey of academic subject areas analysed into facets.

Representing the structural elements of a freely faceted classification

Claudio Gnoli, of the University of Pavia, talked about freely faceted classifications in comparison with systems such as UDC. He emphasised the urgency of publishing classifications online. He also highlighted the limitations of SKOS and OWL in fully expressing faceted systems, even though faceted systems are extremely good tools for obtaining precise search results. Faceted systems are also excellent for combining information across disciplines, allowing you to combine aspects of one subject area with aspects of another. Interdisciplinarity is becoming an increasingly important approach, as innovation often happens at the boundaries between disciplines.
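A simple filter can show why faceted systems give precise results and how they combine disciplines. Documents are analysed into facets and matched on every selected facet at once. The documents, facet names and values below are hypothetical.

```python
# Hypothetical documents, each analysed into facets (facet -> set of values).
documents = {
    "doc1": {"discipline": {"biology"}, "method": {"simulation"}, "object": {"proteins"}},
    "doc2": {"discipline": {"computer science"}, "method": {"simulation"}, "object": {"networks"}},
    "doc3": {"discipline": {"biology", "computer science"},
             "method": {"machine learning"}, "object": {"proteins"}},
}


def faceted_search(documents, **selected):
    """AND across facets, OR within the values chosen for any one facet."""
    return sorted(
        doc_id
        for doc_id, facets in documents.items()
        if all(facets.get(facet, set()) & set(values) for facet, values in selected.items())
    )


print(faceted_search(documents, method=["simulation"]))                                 # ['doc1', 'doc2']
print(faceted_search(documents, method=["simulation"], object=["proteins"]))            # ['doc1']
print(faceted_search(documents, discipline=["computer science"], object=["proteins"]))  # ['doc3']
```

Each extra facet narrows the result set, and a single document can carry values from more than one discipline.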

He pointed out that a polyhierarchical approach can be modelled in SKOS as a way of representing facets, but that this approach is often overlooked. He also called for more work to be done on SKOS so that it can represent facets directly.
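The polyhierarchical workaround can be sketched as plain SKOS: a compound concept simply gets one skos:broader link into each facet’s hierarchy. The namespace and concepts are hypothetical. Notice that nothing in the triples records which facet each link belongs to. That is only implied by where the parent sits, and it is the gap a direct SKOS representation of facets would close.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/music/"  # hypothetical namespace

# A compound concept gets one skos:broader link into each facet's hierarchy.
BROADER = {
    "baroque-harpsichord-music": ["harpsichord-music", "baroque-music"],
    "harpsichord-music": ["music-by-instrument"],  # instrument facet
    "baroque-music": ["music-by-period"],          # period facet
}


def to_ntriples(broader):
    return [f"<{EX}{child}> <{SKOS}broader> <{EX}{parent}> ."
            for child, parents in broader.items() for parent in parents]


def ancestors(concept, broader):
    """Follow skos:broader repeatedly (the closure skos:broaderTransitive names)."""
    seen, stack = set(), list(broader.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, []))
    return seen


for triple in to_ntriples(BROADER):
    print(triple)
print(sorted(ancestors("baroque-harpsichord-music", BROADER)))
```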

Facet analysis as a tool for modelling subject domains and terminologies

Vanda Broughton, University College London, offered the Bliss Classification as a useful tool for online subject classification, but asked for help in deciding how best to publish it for general use. Should it be released as a text document or a database? Or should work be done to convert it into an ontology, and if so, in what form?

She stressed how the logical approach of facet analysis, together with a regular syntax, makes the scheme predictable and hence ideal for machine manipulation.
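A toy notation can show why a regular syntax suits machines. This is not Bliss’s actual notation. It assumes three hypothetical facets, each written as a letter followed by digits and always cited in a fixed order. Because the syntax is regular, the same small program can both build class marks and reliably take them apart again, rejecting marks that break the citation order.

```python
import re

# Hypothetical facets, always cited in this order: Thing, Process, Location.
CITATION_ORDER = ["T", "P", "L"]
WELL_FORMED = re.compile(r"(?:[A-Z]\d+)+")
COMPONENT = re.compile(r"([A-Z])(\d+)")


def synthesise(**facets):
    """Build a class mark by citing the chosen facets in the fixed order."""
    return "".join(f"{letter}{facets[letter]}" for letter in CITATION_ORDER if letter in facets)


def parse(mark):
    """Split a class mark back into facets, rejecting anything irregular."""
    if not WELL_FORMED.fullmatch(mark):
        raise ValueError(f"malformed class mark: {mark!r}")
    parts = COMPONENT.findall(mark)
    letters = [letter for letter, _ in parts]
    if any(letter not in CITATION_ORDER for letter in letters):
        raise ValueError(f"unknown facet in {mark!r}")
    positions = [CITATION_ORDER.index(letter) for letter in letters]
    if positions != sorted(set(positions)):
        raise ValueError(f"facets repeated or out of citation order in {mark!r}")
    return dict(parts)


mark = synthesise(L="45", T="12")
print(mark, parse(mark))  # T12L45 {'T': '12', 'L': '45'}
```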

Analytico-synthetic approach for handling knowledge diversity in media content analysis

Devika P. Madalli, Indian Statistical Institute, DRTC (India), described the Living Knowledge project. It used an analytico-synthetic approach to bring diverse content, drawn from different sources and using varied means of expression, together around useful themes. This supported a rich faceted search system.

Slides and audio recordings of all 21 talks can now be downloaded from the conference website.

Conference proceedings are published by Ergon Verlag and can now be purchased/ordered online from http://seminar.udcc.org/2011/php/proceedings.php.