May 13 2012

I friend dead people - Are social media mature enough to cope with bereavement?

Published by Fran under archives, culture

This is a very personal post about topics in which I am not an expert, so I welcome comments and suggestions.

When “like” and “lol” don’t help

In February, a young man I had never met died in sad circumstances. He was a friend of a friend and I was supposed to meet him on the day he died. Completely coincidentally, within a fortnight I myself lost a dear friend, someone I had known for over 20 years.

The closeness in timing has thrown up sharp contrasts in the way that these deaths have reverberated around my social media worlds (obviously the real world impacts have been huge, but I am not going to discuss those here).

In many ways, dealing with the death of my own friend on social media has been easier. Being well known to her family and her circle of closest friends has meant that I have felt able to post messages of condolence and remembrance as I instinctively know what is appropriate, and I know that most of the people reading them will know me. It has been strange to see her name pop up as a “friend available on chat” when I know any activity in her account must be one of her family members logging in to maintain the page. Yesterday was her birthday, and the reminders in my calendar and the little birthday gift “event reminder” were bittersweet, but not unwelcome. I think of her and her family often, and do not want to forget.

Just after she died, I received a message through a social media site from someone I had never met or even heard of, who had been a schoolfriend of hers long ago, asking what had happened to our mutual friend, and I felt comfortable in answering. It helped me to talk about her with this stranger. I even flattered myself that I was doing some good, in that they clearly felt awkward about contacting her family directly while I was able to act as an “information resource” meaning the family and closest friends could focus on their own grieving.

I friend dead people

In contrast, how to cope with the loss of an almost-friend on social media has been strange and unnerving. One social media application has tactlessly and repeatedly suggested him as a friend, noting how many friends we had (have?) in common. Somehow I didn’t have the heart to click on “ignore”. I realise now I should have done just that, because I was anguished when I accidentally clicked on “confirm”. I worried that his friends and relatives might see my “friend request” and be distressed by it. Maybe they would never spot the notification, maybe they would assume it was sent at a time before his death - just another reminder of what might have been, maybe they would even be comforted by the continuation of these distant social interactions with almost-strangers. (I immediately emailed the site in question asking them to retract my request, but received no reply.)

My uncertainty about the appropriate “social media etiquette” was no doubt increased rather than diminished by our social distance. I do not know his family and friends well enough to mention this casually in passing, to express that this had been a mistake and was not intended to distress, or even to know what sort of people they are and whether this is the sort of thing that might upset them. However, it is exactly these sorts of loose “one degree of separation” relationships that online social media foster, and this incident struck me as illustrating how inadequate such media are when interactions need to go beyond chirruping about the weather, saying a website is cool, or asking whether or not someone wants to go to a party.

Digital memorials

My friend’s social media pages have slipped into being a form of digital memorial, but this also raises new issues. There have been stories in the press of “trolls” deliberately desecrating memorial pages in an online equivalent of upturning flowers left on a grave or kicking over and spraying graffiti on a headstone (e.g. http://gawker.com/5868503/why-people-troll-dead-kids-on-facebook). The only way to deal with this seems to be to remove the page, which is a shame and in a way seems to mean the bullies have won. It also highlights a strange transition from personal to public. Our graveyards are either public spaces that the authorities monitor and maintain or privately curated grounds. I have previously thought of my social media pages as more like a private garden - people may peer over the wall, but it is essentially “my” space to maintain. People are starting to think more and more about their digital legacies (the British Computer Society recently held an event on this theme).

There are already “digital memorial” companies offering guarantees of “permanent” archiving and access to sites (e.g. Much Loved). Other sites offer memorial pages that allow people to make donations to charity, but presumably these are not expected to remain in place forever.

However, these sites are aimed at those who remain setting up the sites, not taking over the sites that belonged to their loved ones. The value of someone’s posts and pages changes dramatically when they become precious memories, and not just ephemeral chatter. If we (or our loved ones) want our own sites to go on after us, do we need to bequeath our passwords to trusted friends or family? How does that affect our contracts with hosts and service providers? What rights do families have to “reclaim” the pages and content if there is no such bequest? How would disputes over inheritance of such sites be decided? What recourse do we have if the site owner decides to shut down and delete the content or simply loses it?

It seems to me that such issues have the potential to cause far more distress than the strangenesses we encounter when automated reminders and friend suggestions behave as if we are all immortal.

Apr 23 2012

To embed or not to embed – metadata and IDs

One of the problems with the word metadata (apart from the fact that no-one can decide whether it should be singular or plural - as a former classicist I am quite happy to use it in the Anglicized singular form!) is that the word covers such a wide range of data required for a huge variety of uses.

At a recent presentation I gave as part of a “knowledge share” session at the digital design agency Tobias and Tobias, I was rightly challenged by Patrick from Golant Media Ventures, when I said that you should not embed metadata in your content, but manage it separately. He pointed out that for copyright and rights management purposes embedded metadata is extremely useful and in fact many content creators are actively campaigning to make sure that software and service providers do not strip metadata out of content when it is transferred or transcoded.

Embedding information versus embedding IDs

He is quite right, but I was right too – just in a different sense. It is a complex and important point, so I thought it was worth expanding on. I was talking about not embedding metadata structures in assets when you can manage structures of primarily semantic metadata separately. You can do this by embedding only IDs in the assets, and then using those as lookups to access the structure as and when you need to, picking up the structure “on the fly”. The principle remains the same whether you are talking about “private” localised IDs or “public” IDs, such as Linked Open Data dereferenceable URIs (i.e. website addresses you can look up). Such an approach allows you to manage the structures and meanings contextualising those IDs separately from managing the assets themselves.

The reason is mainly technical. If you wish to add to or edit the structure of your taxonomy (or ontology) or change the information your URI points to, it is far easier to do this in one place than it is to find all the assets containing that metadata and re-index them all individually every time you make a change. So, if you store taxonomy pathways as hard-coded text strings in a piece of content, but then you decide to alter the hierarchy, you have to go back to each and every occurrence of that text string applied to content and update it, in each and every asset record that contains it. Sometimes this might be fine – if you know that you are hardly ever going to change the structure or if you have very few assets, or if you have a very powerful and sophisticated re-indexing service. Generally, however, given that language is constantly evolving and asset collections are constantly growing and changing, the “hard-coding” approach is going to require an awful lot of processing and so will be very resource hungry.

If, on the other hand, all you embed in your asset record is an ID, you can use an external system to provide the context for that ID – the pathways of the taxonomy, the relationships of an ontology, the semantic sense of a URI. You can then alter your taxonomy’s hierarchies (e.g. adding and moving concept nodes) or develop your ontology (e.g. adding new classes and relationships) in one centralised system without having to go back to every individual indexed asset in turn. This also means that you can de-couple your taxonomy or ontology management system from your digital asset management or content management system. This is important if you want more sophisticated metadata management than standard DAM, search, or CMS software provides, or if you want to future proof your semantic structures.
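As a rough illustration of the difference, here is a minimal Python sketch. Everything in it is made up for the example: the concept IDs, the labels, the asset records, and the `pathway` helper are assumptions, not any particular taxonomy tool or DAM system. The asset records hold only an ID, the pathway is worked out “on the fly” from a central structure, and a change to the hierarchy is made once without touching any asset.

```python
# Hypothetical central taxonomy: concept ID -> (label, parent ID or None).
taxonomy = {
    "c1": ("Animals", None),
    "c2": ("Mammals", "c1"),
    "c3": ("Dogs", "c2"),
}

# Asset records embed only the concept ID, never the pathway text.
assets = [
    {"file": "photo-001.jpg", "concept_id": "c3"},
    {"file": "photo-002.jpg", "concept_id": "c3"},
]


def pathway(concept_id, taxonomy):
    """Build the pathway for an ID on the fly by walking up its parents."""
    labels = []
    while concept_id is not None:
        label, parent = taxonomy[concept_id]
        labels.append(label)
        concept_id = parent
    return " > ".join(reversed(labels))


before = [pathway(a["concept_id"], taxonomy) for a in assets]

# Restructure the hierarchy in one place: insert "Pets" between Mammals and Dogs.
taxonomy["c4"] = ("Pets", "c2")
taxonomy["c3"] = ("Dogs", "c4")

# The asset records are untouched, yet every lookup now sees the new pathway.
after = [pathway(a["concept_id"], taxonomy) for a in assets]

print(before[0])
print(after[0])
```

Had the assets stored the text string “Animals > Mammals > Dogs” instead of the ID, each record would have needed editing and re-indexing after the change.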

Modular systems are more future proof

By keeping asset management and metadata management separate you can upgrade either part without having to upgrade the other. As semantic technologies – such as ontology editing systems – are going through a rapid phase of development, and in general evolving faster than search, DAM, and other consuming systems, maintaining your semantic structures in as transferable and system-agnostic a form as possible shows foresight. Conversely, you may want to invest only a little in a DAM system, with the hope that business will grow and you will be able to upgrade as your content collections increase. If you have a separate metadata management system you should be able to keep that, while changing your DAM system.

Rights management is different

However, all this primarily concerns internal content and metadata management. Embedding metadata in the asset itself makes most sense when you want that metadata to remain fixed to the asset and be published with it - for example, details of where a photo was taken, who owns the copyright and how to get in touch with them to license re-use of that photo. This is because when your asset wanders out into the public world of the Internet, with its frequent uncontrollable copying, you want users to be able to find out easily the origins of the image and its ownership, and making that information hard to strip out helps to ensure that they can.

A huge problem for collection of royalties and licensing payments is that people who would be willing to pay simply don’t know who to pay. Deliberate piracy will always be a cost – just as shops will always have to allow for a certain amount of “shrinkage” due to shoplifting, but physical shops tend to be pretty good at making sure customers who are willing to pay can find plenty of checkout tills, self-service checkouts, or sales assistants. Keeping rights information embedded in assets is the equivalent of the checkout, not the security camera.

How important is being up to date?

Of course, the problem of updating remains – so if copyrights are transferred, all those assets that have gone out with old embedded metadata contain out of date information. So, rights managers are increasingly moving towards a system of embedding dereferenceable IDs as well. One example is the EIDR system that uses this method (as well as other techniques) to manage rights. By embedding an ID that links to a centralised rights registry, information can be updated once within that central registry, and then whenever someone looks up that ID, they get the most up to date details.
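The same lookup idea can be sketched for rights information. This is a toy Python example under stated assumptions: the registry, the ID format, the owner names, and the `resolve_rights` helper are all hypothetical and are not EIDR’s actual identifiers or interface. It only shows why a single update in a central registry reaches every copy that carries the ID.

```python
# Hypothetical central rights registry: rights ID -> current rights details.
registry = {
    "rights:0001": {
        "owner": "Original Owner Ltd",
        "contact": "licensing@original.example",
    },
}

# The published asset carries only the ID, and so does every copy of it.
published_copy = {"file": "film-still.jpg", "rights_id": "rights:0001"}


def resolve_rights(asset, registry):
    """Look up the current rights details for the ID embedded in an asset."""
    return registry[asset["rights_id"]]


owner_before = resolve_rights(published_copy, registry)["owner"]

# Copyright is transferred: one update in the registry, no asset is touched.
registry["rights:0001"] = {
    "owner": "New Owner plc",
    "contact": "rights@newowner.example",
}

owner_after = resolve_rights(published_copy, registry)["owner"]

print(owner_before)
print(owner_after)
```

The trade-off is that the lookup only works while the registry is reachable and maintained, which is one reason embedding some fixed rights information in the asset itself can still be worthwhile.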

So, we are both right in a way. Embedding IDs and managing metadata separately to managing assets has many advantages. Embedding the metadata itself can also be useful, especially if it is rights information of assets that will be released onto the public Internet and is information that you may not need to update, but that you do not want to be lost when the asset is copied.

Apr 02 2012

Change, technology, understanding, and the information professions

Published by Fran under culture, information management

Not being a morning person, I was unsure whether a networking breakfast would suit me, but the recruitment agent Sue Hill’s event offered good food and interesting conversation, so I thought I would give it a try. I wasn’t disappointed – the food was excellent and the big round tables promoted lively group discussion.

We were a mix of information professionals from public and private sector, at different stages of our careers, but three key themes prompted the most debate.

Change management

Managing technology change and bridging the cultural and political divisions within organisations in order to bring about change were key concerns. Information professionals can contribute by explaining how new technologies work, how technologies can be catalysts of changes in behaviour, and how they mitigate or increase informational and archival risks. Even simply letting people know new technology is out there can be hugely valuable. Knowledge and information workers can help manage change on political and cultural levels by understanding the corporate culture they are working in and helping their organisation to understand itself and so make good decisions about systems procurement. Information professionals can also often help to break down cultural barriers, such as barriers to sharing information.

Social media

Social media are now being used to differing degrees within organisations – some having embraced the technologies wholeheartedly, others seeing them as a problem or a threat. There was a general concern that technology is being adopted and used faster than we can understand its impacts and devise strategies for mitigating any risks.

Personal and cultural understanding of the divisions between the public and the private seemed to be a problematic area. Young people in particular were perceived as being vulnerable to “over exposure” as they seemed not to notice that postings about them – pictures especially – would remain available for decades to come and could compromise them in their future careers. Recruitment agents use social media to find out about potential job candidates, and notice inconsistencies between a very professional image presented in a CV or at interview and a Twitter feed that paints a picture of carelessness, foolishness, or irresponsibility.

Information literacy

Awareness of how to use and abuse social media, search engines and research tools, and data and statistics was seen as an arena in which information professionals can offer advice and mentoring, to young people, but also to organisations. Information professionals should also set good examples of how to use social media tools, adopt new working practices, and evaluate new technologies. They should also be able to explain how search engines work, what the pitfalls of poorly planned or too narrow research strategies are, and how to research in a more efficient and effective manner.

A new area that information professionals also need to understand is data analytics and how statistics and algorithmic data mining can be used or abused. Information professionals need not be advanced mathematicians to contribute in this area – knowing how to interpret data, recognising the political and cultural issues that can bias interpretations, framing questions to get statistically significant results, and understanding the importance of outliers and statistical anomalies are skills that are becoming more important every day.

Overall, I thoroughly enjoyed being woken up by such thoughtful and interesting breakfast companions and went about the rest of the day with a head full of fresh ideas.

Mar 29 2012

On Location - geospatial information

Published by Fran under semantic web

I attended an event co-hosted by ISKO UK and the British Computer Society about location data and have written about it for the ISKO UK blog.

Mar 24 2012

DAM, BAM, MAM - taxonomy, metadata, and marketing

Published by Fran under Digital Asset Management

I was delighted to be asked by the DAM Foundation to join a panel for Communicate Magazine’s Transform conference, talking about metadata and its importance in digital asset management and brand management, with a focus on re-branding (hence the conference title). I spoke alongside Mark Davey, from the DAM Foundation, Romney Whitehead of Net a Porter, and Phil Morton, of Freestyle Interactive.

Marketing meets metadata

I was heartened by the growing enthusiasm for bringing together “geeky” metadata specialists with “creative” marketing people. I think both communities of practice have a lot to learn from each other, even though it may seem we speak totally different languages and care about totally different things. I do my best to act as a “boundary object” or translator bringing together our different perspectives - I like to joke that my role is all about putting the “sexy into taxonomy”.

I find conferences are a great source of serendipitous discovery, making me think about business needs and processes that I don’t usually encounter in my day job. One example was a presentation about the recent major re-branding project by Global Blue - the tax free shopping corporation. I learned about the breakdown of the distinction between global and local marketing campaigns and customer service. This is because the luxury shopping market is aimed at the global travelling community who fly around the world and prefer shopping to visiting museums and art galleries. High-end brands need to provide services for these customers not locally to their home towns, but in the shops they visit, for example by having Russian- and Chinese-speaking sales assistants in outlets in Paris, London, and Berlin.

In data terms, this means making sure customer relations management is global, and marketing campaigns travel with the customer as they move, rather than being tied to places. This requires sophisticated personal and social metadata management. It is no longer enough to keep customer data in localised silos, as a customer wants to be recognised at every store in the chain everywhere in the world. In other words, there is no such thing as local for these shoppers any more, there is only location - local is wherever they happen to be right now.

This may seem like an obvious point, but for information managers it poses many challenges regarding data security and compliance with different laws in different countries. For brand managers, marketers, and designers, it means devising campaigns that make sense in local cultures, but also appeal to viewers from all over the world.

DAM to protect good supplier as well as customer relations

Romney Whitehead described how important it is for Net a Porter to track the rights, restrictions, and usage of their content as it is used in different media and in different parts of the world. So, if they have acquired the rights to images of a fashion show in Paris, they may only be able to use those shots in certain locations or in particular publications (Europe, but not the USA, print but not online, etc.). Mistakes could lead to their suppliers refusing to provide images in future, which would be very damaging for the company. Their DAM system is therefore vital to their business.

Internally social

Phil Morton talked about the rise of social media and how corporations need to embrace social media in particular for in-house corporate communications and knowledge management. Many organisations still see social media as something that happens “outside” the organisation, but for younger workers and for collaborative projects, internal social media is becoming a key daily business tool, so information managers have to consider how to provide access to and archiving of key social media conversations.

This is clearly a hot topic, as it was discussed at a Sue Hill networking event I attended last week and which I will write about in my next post.

Mar 11 2012

Isn’t search the same as browse?

Published by Fran under KO, search

I nearly wept when one of our young rising IT stars queried in a meeting why we had separated “search” and “browse” as headings for our discussions on archive navigation functionality. So, to spare me further tears, here are some distinctions and similarities. There won’t be anything new for information professionals, but I hope it will be useful if any of your colleagues in IT need a little help. I am sure this is far from comprehensive, so please leave additions and comments!

Differences between search and browse

Search is making a beeline to a known target, browse is wandering around and exploring.
Search is for when you know what you are looking for, browse is for when you don’t.
Search is for when you know what you are looking for exists, browse is for when you don’t.

Search expects you to look for something that is findable, browse shows you the sort of thing you can find.
Search is for when you already know what is available in a collection or repository, browse is how you find out what is there, especially if you are a newcomer.
Search is difficult when you don’t know the right words to use, browse offers suggestions.
Search is a quickfire answer, browse is educative.
Search is about one-off actions, browse is about establishing familiar pathways that can be followed again or varied with predictable results.

Search relies on the seeker to do all the thinking, browse offers suggestions.
Search is a tricky way of finding content on related topics, browse is an easy way of finding related content.
Search is difficult when you are trying to distinguish between almost identical content, browse can highlight subtle distinctions.
Search rarely offers completeness, browse often offers completeness.

Search is pretty much a “black box” to most people, so it is hard to tell how well it has worked, browse systems are visible so it is easy to judge them.
Search uses complex processing that most people don’t want to see, browse uses links and connections that most people like to see.
Search is based on calculations and assumptions that are under the surface, browse systems offer frameworks that are more open.

Search works well on the web, because the web is so big no-one has had time to build an easy way to browse it, browse works well on smaller structured collections.
Search can run across vast collections, browse needs to be offered at human-readable scales.
Search does not usually give an indication of the size or scope of a collection, browse can be designed to indicate scale.

Similarities between search and browse

Search and browse are both ways of finding content.
Search and browse can both be configured in a huge variety of ways.
Search and browse both have many different mechanisms and implementations.
Search and browse should both be tailored to users’ needs.
Search and browse systems both require thought and editorial judgement in their creation so that they work effectively for any particular collection.
Search and browse systems can often both be created largely automatically.
Search and browse often both involve metadata.
Search and browse behaviours may be intertwined, with users switching from one to the other.
Search and browse may be used by the same users for different tasks at different times.
Search and browse both offer serendipity, although serendipitous opportunities are often hidden by interface design.

Should I offer my users search or browse?

Almost always, you should offer both, unless you are very sure that your users will always be performing the same kind of task and have the same level of familiarity with your content. With small static collections of content, it may not matter too much, but for most content collections users will probably want both. Which you make your main focus depends on the context and collection.

Shops might have lots of images and very little text, so a beautifully designed navigation system will help customers find - and buy - products they might not know about, while only a simple search system might be needed to cover searches for product names. A library will need to support lots of searches for titles and across catalogue text with a good search system, but will also need to help educate and inform users with a clear user-friendly browsable navigation system. A large incoherent collection of unstructured text with no particular purpose is likely to be difficult to navigate no matter what you design, so will need good search, but - apart from the web itself - such unbounded and unmanaged collections tend to be quite unusual.

Feb 12 2012

Data: The New Black Gold?

Published by Fran under culture, information management

Last week I attended a seminar organised by The British Screen Advisory Council and Intellect, the technology trade association, and hosted by the law firm SNR Denton. The panellists included Derek Wyatt, internet visionary and former politician, Dr Rob Reid, Science Policy Adviser, Which?, Nick Graham, of SNR Denton, Steve Taylor, creative mentor, Donna Whitehead, Government Affairs Manager, Microsoft, Theo Bertram, UK Policy Manager, Google, David Boyle, Head of Insight, Zeebox, and Louisa Wong, Aegis Media.

Data as oil

The event was chaired by Adam Singer, BSAC chairman, who explored the metaphor of “data as oil”. Like oil, raw data is a valuable commodity, but usually needs processing and refining before it can be used, especially by individual consumers. Like oil, data can leak and spill, and if mishandled can be toxic.

It struck me over the course of the evening that, just as with oil, we are in danger of allowing control of data to fall into the hands of a very small number of companies, who could easily form cartels and lock out competition. It became increasingly obvious during the seminar that Google has immense power because of the size of the “data fields” it controls, with Facebook and others trying to stake their claims. All the power Big Data offers - through data mining, analytics, etc. - is dependent on scale. If you don’t have access to data on a huge scale, you cannot get statistically significant results, so you cannot fine tune your algorithms in the way that Google can. The implication is that individual companies will never be able to compete in the Big Data arena, because no matter how much data they gather on their customers, they will only ever have data on a comparatively small number of people.

How much is my data worth?

At an individual level, people seemed to think that “their” data had a value, but could not really see how they could get any benefit from it, other than by trading it for “free” services in an essentially hugely asymmetrical arrangement. The value of “my” data on its own - i.e. what I could sell it for as an individual - is little, but when aggregated, as on Facebook, the whole becomes worth far more than the sum of its parts.

At the same time, the issue of who actually owns data becomes commercially significant. Do I have any rights to data about my shopping habits, for example? There are many facts about ourselves that are simply public, whether we like it or not. If I walk down a public street, anybody can see how tall I am, guess my age, weight, probably work out my gender, social status, where I buy my clothes, even such “personal” details as whether I am confident or nervous. If they then observe that I go into a certain supermarket and purchase several bags of shopping, do I have any right to demand that they “forget” or do not use such observations?

New data, new laws?

It was repeatedly stated that the law as it stands is not keeping up with the implications of technological change. It was suggested that we need to re-think laws about privacy, intellectual property, and personal data.

It occurred to me that we may need laws that deal with malicious use of data, rather than ownership of data. I don’t mind people merely seeing me when I walk down the street, but I don’t want them shouting out observations about me, following me home, or trying to sell me things, as in the “Minority Report” scenario of street signs acting like market hawkers, calling out your name as you walk by.

What sort of a place is the Internet?

Technological change has always provoked psychological and political unease, and some speakers mentioned that younger people are simply adapting to the idea that the online space is a completely open public space. The idea that “on the Internet, no-one knows you are a dog” will be seen as a temporary quirk - a rather quaint notion amongst a few early idealists. Nowadays, not only does everyone know you are a dog, they know which other dogs you hang out with, what your favourite dog food is, and when you last went to the vet.

The focus of the evening seemed to be on how to make marketing more effective, with a few mentions of using Big Data to drive business process efficiencies. A few examples of how Big Data analytics can be used to promote social goods, such as monitoring outbreaks of disease, were also offered.

There were clear differences in attitudes. Some people wanted to keep their data private, and accept in return less personalised marketing. They also seemed to be more willing to pay for ad-free services. Others were far more concerned that data about them should be accurate and they wanted easy ways of correcting their own records. This was not just to ensure factual accuracies, but also because they wanted targeted, personalised advertising and so actively wanted to engage with companies to tell them their preferences and interests. They were quite happy with “Minority Report” style personalisation, provided that it was really good at offering them products they genuinely wanted. They were remarkably intolerant of “mistakes”. The complaint “I bought a book as a present for a friend on Amazon about something I have no interest in, now all it recommends to me are more books on that subject” was common. Off-target recommendations seemed to upset people far more than the thought of companies amassing vast data sets in the first place.

Lifting the lid of the Big Data black box

The issue that I like to raise in these discussions is one that Knowledge Organisation theorists have been concerned about for some time - that we build hidden biases so deeply into our data collection methods, our algorithms, and processes, that our analyses of Big Data only ever give us answers we already knew.

We already know you are more likely to sell luxury cars to people who live in affluent areas, and we already know where those areas are. If all our Big Data analysis does is refine the granularity of this information, it probably won’t gain us that many more sales or improve our lives. If we want Big Data to do more for us, we need to ask better questions - questions that will challenge rather than confirm our existing prejudices and assumptions and promote innovation and creativity, not easy questions that merely consolidate the status quo.

Jan 22 2012

Your organization is not the Internet

Published by Fran under KO, search

Many people find it very difficult to understand why search within an organization can’t “just be like Google”. This is often because they haven’t thought about the differences between an organization and the Internet.

Your organization is smaller than the Internet

Search engines like Google work because they have access to big data. Google gets billions of searches to process, from billions of users. Even if your organization is a large one, it won’t have that many users either searching or contributing content, so it cannot number crunch on the same scale as Google. Your IT department is probably a lot smaller than Google’s and your enterprise search team’s daily budget is unlikely to cover more than the tiniest fraction of what Google spends. Last, but by no means least, your organization doesn’t have as much content as the Internet, so it probably needs to be far more careful about not losing any that is valuable.

Surfing the net is not many people’s job

There are important differences between how and why people search when they are at work and when they are not, and between how and why they search the Internet and their organization’s Intranet or archives. People rarely surf their organization’s Intranet for fun, to be entertained, or to while away the time. The differences between serious research behaviour and leisure searching are well documented, so I am going to write about another difference between the Internet and organizations that is often overlooked.

Putting stuff online is not the same as writing a business report

There are vast differences in the ways that people create and curate content on the Internet and within an organization. These differences have a significant effect on the way search functions. The key difference is in how much they link their content to that of others. Of course, there are people whose jobs are to create and curate online content - all the web editors, content strategists, copywriters, social media marketers, etc. - but they will be the first to explain that they have a very specialised set of skills focused on making their content searchable, commercial, or otherwise user friendly. They do a whole lot of things that most people as part of the day job neither know how nor have the time to do.

Links are a form of Knowledge Organization that Google gets for free

One of the key things that web professionals and unpaid web enthusiasts do with their content is to add and manage links. Links are what organize the web. Links are what group sites into clusters by content. Links are the web’s classification scheme. Back in 2005, Clay Shirky said “there is no shelf”, but it makes just as much sense to think of millions of shelves – infinite shelves going off in all directions, with new ones being created and old ones being discarded. The web is not linear in the way a shelf is, but it is not without structure. Google effectively picks one of the near infinity of shelves and offers it up as a linear list whenever you do a search. It chooses the shelf that seems to be the most popular, or that fits its commercial model. First on the shelf is often a paid-for advertisement or a Wikipedia entry, followed by other big well-established commercial sites. Out there on the Internet, people do an awful lot of shopping, and not much work, so that’s fine. (If they are doing more shopping than work when they are at work, your organization probably has bigger problems than search to deal with.)

For many other searches, especially more thematic research, people would be disappointed with the results, were it not for the magic of the way the web works – the links. As long as Google slings a site at you that has lots of links to other sites, it doesn’t have to take you straight to what you want, it lets you and the links do the rest of the work. Links gather together similar content, so they function like a classification scheme. The links associate content that is aimed at similar audiences, is on similar topics, is of a similar age. The links represent a huge amount of sorting, cataloguing, and classification work. Google did not have to pay for this work (genius business model). People do this work for Google for free. They do this work as part of creating and curating their content.
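As a rough illustration of how links can stand in for a classification scheme, the sketch below groups a handful of pages purely by following the links between them. The page names and links are invented for the example, and treating every link as a two-way association is a simplifying assumption; real search engines do something far more elaborate.

```python
from collections import defaultdict

# Hypothetical pages and the links their authors added (illustrative data only).
pages = {
    "band-fan-site", "band-wiki-article", "band-discography-list",
    "knitting-blog", "knitting-pattern-index", "orphan-report",
}
links = [
    ("band-fan-site", "band-wiki-article"),
    ("band-wiki-article", "band-discography-list"),
    ("knitting-blog", "knitting-pattern-index"),
]


def clusters(pages, links):
    """Group pages into sets that can reach one another by following links."""
    neighbours = defaultdict(set)
    for a, b in links:
        # Assumption: a link associates both pages, whichever way it points.
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in sorted(pages):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            page = stack.pop()
            if page in group:
                continue
            group.add(page)
            stack.extend(neighbours[page] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


groups = clusters(pages, links)
for group in groups:
    print(group)
```

The band pages end up on one “shelf” and the knitting pages on another, without anyone having assigned a category. The unlinked report sits alone, which is the position most documents on an Intranet are in.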

Many of Google’s volunteer librarians do this work for fun. They create fan sites, they write Wikipedia articles, they produce lists and generate indexes to their favourite content. They provide cataloguing descriptions and context. They do all this work partly because they enjoy it and partly because they hope to get “repaid” by their site becoming popular. They hope this will either lead to monetary reward (their band will get signed, they’ll get a better job, they’ll sell advertising) or social reward (they’ll make online “friends”, get positive feedback from comments, etc.).

From the commercial angle, people do this work because they expect to gain financial reward. They want to sell more products and make money. This is why there are howls of pain whenever Google tweaks its algorithms. Companies that balk at investing in internal search systems will spend fortunes chasing SEO.

Are your staff content curators?

If you want your organization’s search to be “just like Google” you need to think about how linked your content is. Do people who create content in your organization do so for the same reasons and with the same motivations as people who create and link content on the web? It is very unlikely that you have lots of “fans” who will spend their free time creating lists of your company’s best information resources, or collecting and rating and reviewing reports and documents. Most employees are too busy getting on with their day jobs to spend office hours pursuing their “fan” projects. Even if your staff have plenty of spare time, how many of them are big enough fans of some aspect of work to treat it like a hobby? If you want people to start looking out for similar documents on your Intranet and linking their own documents to them, you will probably have to find ways of motivating them to do this as a special initiative. It is not likely to come “for free”, like it does for the web search engines.

For some organizations, encouraging and incentivising “fan”-type behaviour may work. If the organization already has a strong collaborative culture, with people sharing ideas and using social media, it may be a small step to get them to think of their documents and presentations as blog posts. Including content creation and curation in people’s job roles and rewarding those who do well will foster a link-rich Intranet. By recognising and rewarding people who promote useful links and lists and get them to rank highly in your enterprise searches, you could bring an element of gamification to encourage this sort of behaviour. For other organizations, the culture may support this kind of web-style content creation, but people are generally too busy, have skill sets too far from what is required, or need training and encouragement. In such organizations it may make sense to have the equivalent of web editors, content strategists, user experience specialists, search engine optimizers, etc. working with the organization’s internal content to promote the most valuable resources. In other words, a layer of “linkers” who work alongside the content originators.

For other organizations, where it would be inappropriate, too time consuming, or too far from established culture to encourage web-like information behaviour, enterprise search will never work “just like Google”. More formal and standardized metadata management processes are likely to be needed. Organizations that generate a lot of very specific content that is unlikely to be useful in broader contexts, confidential content, or large volumes of very similar structured content are likely to find it hard to move away from directed and standardised searching.

Many organizations will have a “mixed economy” with different types of content and different departments operating with different styles (e.g. what works in a marketing department is unlikely to work in the same way in a finance department).

Without links, search is a lot of dead ends

Without links, each search result is isolated. This stops the searcher in their tracks and means they cannot surf in the way they do on the Internet. They will have to check search results one after another in a linear fashion. If your search engine is not getting the most relevant results to the top of that list, your staff will be spending a huge amount of time working their way through that list. They cannot plump for one likely looking result then follow the trail of links, as they do on the web. The links as a form of classification do not exist, so you need another mechanism (taxonomy, ontology, index, directory) to help people find groups of related content and browse through from one document to another.

So, even though you may have the technology and the budget to match Google’s, unless your content creators are linking freely, you will never completely succeed in turning your Intranet into a mini-Internet.

Dec 04 2011

Big Data

Published by Fran under information management

I recently attended a gem of an event at the British Computer Society. There were three top class speakers and great networking with interesting people afterwards. I am always a little concerned that I’m not “hard” IT enough for these events, but metadata and taxonomies are such a cross-domain topic I always end up in fascinating conversations.

Why Information Professionals should care about Big Data

There are huge skills shortages in Big Data, mainly on the understanding and interpretation side, rather than the IT angle. The feeling seemed to be that knowing how to build the machines to do the work is the easy bit, but understanding the meanings of data, avoiding the semantic traps and pitfalls, the interpretation of statistics, and above all the appreciation of risk in order to make sound judgement calls are far harder. I believe this is where information scientists and professionals, taxonomists and ontologists, librarians, archivists, and historians have a vital role to play. Our skills of understanding and interpretation are vital, so we need to step up and embrace Big Data collections as a new form of libraries and archives.

Big Data is scary because there is so much of it, it changes so fast, and it has the power to create feedback loops. An especial danger, and one that is familiar to classificationists (as Bowker and Star discuss in their book Sorting Things Out), is that interpretations, biases, and assumptions can be “hard coded” into raw data right from the point of collection. We know that the way you ask a survey question skews the answers you are likely to get, and which data sets you decide to collect and compare will lead you to certain interpretations above others. Make the wrong choices and the algorithms will send you down false alleyways while appearing to be coldly neutral and scientific (a theme that Jonah Bossewitch inspired me to think about). Miss a key factor, and you will make an apparently unarguable leap to a wrong conclusion.

For example, if you see some statistics that indicate a recent increase in clickthroughs to advertisements, you might conclude that your new mobile marketing strategy is a success. However, further investigation might reveal that most of those clickthroughs don’t result in a sale. If you look again you might see that most of the clickthroughs are followed by people immediately returning to the site they were on before, making it far more likely that they are just accidentally clicking on adverts. If you refine the results to identify those who were using a specific new device, it seems likely most of the hits were because people were getting used to an unfamiliar touchscreen. Your apparently successful marketing strategy might in fact just be annoying your users.
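The same chain of reasoning can be sketched in a few lines of code. Everything here is invented for illustration: the log records, the field names, and the three-second threshold for counting a click as an accidental “bounce” are all assumptions, not figures from the talk.

```python
# Hypothetical clickthrough log; every field name and value is made up.
clicks = [
    {"device": "new-touchscreen", "seconds_on_ad": 1, "sale": False},
    {"device": "new-touchscreen", "seconds_on_ad": 2, "sale": False},
    {"device": "new-touchscreen", "seconds_on_ad": 1, "sale": False},
    {"device": "desktop", "seconds_on_ad": 45, "sale": True},
    {"device": "desktop", "seconds_on_ad": 30, "sale": False},
]

# Assumed threshold: leaving the advert this quickly suggests an accidental click.
BOUNCE_SECONDS = 3


def summarise(rows):
    """Return the click count, sale rate, and bounce rate for a set of clicks."""
    total = len(rows)
    sales = sum(1 for r in rows if r["sale"])
    bounces = sum(1 for r in rows if r["seconds_on_ad"] < BOUNCE_SECONDS)
    return {
        "clicks": total,
        "sale_rate": sales / total,
        "bounce_rate": bounces / total,
    }


# The headline figure: clicks are up, which looks like success.
overall = summarise(clicks)

# The same data split by device tells a different story.
devices = sorted({r["device"] for r in clicks})
by_device = {d: summarise([r for r in clicks if r["device"] == d]) for d in devices}

print(overall)
for device, summary in by_device.items():
    print(device, summary)
```

In this made-up sample the new device accounts for most of the clicks, none of the sales, and all of the bounces. The raw click count alone would never show that.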

How Big Data is changing the world

The first speaker, John Morton, Chief Technology Officer, SAS UK, described how increases in processing power have meant that data capture, collection, and analysis are now taking place on a mind-boggling scale at incredible speeds. One example of how this has changed over the last decade is that a processing job that used to take 167 hours can now be carried out in 84 seconds. At the same time, huge amounts of unstructured and unmanaged data that used to be uncollectable ephemera are now being stored and can be analysed. He spoke of an urgent need for “data curators” and suggested they would be the librarians of the future, as without management and curation the value of the data is lost. Organisations are typically managing only 5% of their data. Issues familiar to librarians and archivists, such as the importance of metadata, quality tracking, and provenance, are vital in a Big Data world, where the quality of your analysis depends on the quality of the data you mine. Much of this quality lies within the structure of the data and its metadata.
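To put the 167 hours versus 84 seconds figure in proportion, the arithmetic is simple enough to check. The variable names below are just labels for this calculation.

```python
# Convert the old running time to seconds so both figures share a unit.
before_seconds = 167 * 60 * 60
after_seconds = 84

# How many times faster the job now runs.
speedup = before_seconds / after_seconds

print(before_seconds)
print(round(speedup))
```

That works out at a little over 7,000 times faster.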

Big Data and Artificial Intelligence

Peter Waggett, Emerging Technology Programme Leader, IBM, talked about how Big Data analysis was used to power the “intelligence” of Watson, the computer that won the US quiz game Jeopardy, beating human contestants. The mistakes Watson made were in many ways more interesting than the questions it answered correctly, as was the speed of answering. Watson always needed some processing time, whereas humans were much faster when they just “knew” the answer.

Although Watson just seems like fun, there are many examples of how Big Data techniques can be used in practice. The retail industry is one of the leaders in Big Data analysis, using data on shopping behaviour gathered from sources like loyalty cards and online shopping orders. Some companies are now using RFID chips (e.g. clothing company Zara) to gather data about the physical movement of products.

(I wondered if retailers were leaping ahead because they can use Big Data to reap rewards with comparatively little risk. In retail, there are huge profits to be made by better tailoring stock to consumer demands, but mistakes are generally not disastrous - a product line that doesn’t sell well and has to be dropped is not a new problem and is one that business models and buyers’ expertise already make allowances for.)

Another example of Big Data analysis is the football team AC Milan, where analysing data about players’ physiology and movements has helped predict injury rates and manage players in order to minimise risks.

The Internet of Things is going to generate even more Big Data and understanding its applications in new arenas - sporting or otherwise - is going to be a huge challenge for managers of the future.

Big Data and Bias

Brijesh Malkan, Founding Partner, EGO Ventures, highlighted some issues to be resolved as we move into a Big Data age. The nature of privacy is changing as oceans of personal data sweep around the world (Facebook already knows you), and so organisations are going to need transparent ethical policies about handling such data. We have reached our “cognitive limits” with so much data to read and so information visualisation is going to be of increasing importance. Data quality also needs to be managed if data mining techniques are to be effective and if algorithmic processing is to produce sensible, useful results. Brijesh talked about “Bayesian filters” and “cognitive scrubbers” to help compensate for biases in data, whether these biases are embedded in the data capture process, in the choices of data used, in the algorithms processing the data, or ultimately in the decisions made by the humans who are interpreting the data.

He spoke of the need for more understanding of psychology, especially of groupthink, echo chambers, and risk perception. Financial markets in particular are prone to “stampede” behaviour, creating bubbles and panics in markets. Data mining of social networking can be prone to creating feedback loops and encouraging risky behaviour. He also spoke of a desperate shortage of people who understand statistics and probability, even within the scientific community.

(This reminded me of the question Patrick Lambe raised at the ISKO UK conference in the summer, asking how information professionals can do something useful for science and for society. Understanding how to interpret and capture data and account for biases, explaining how easy it is to manipulate people’s perceptions through the way information is presented, and teaching how knowledge requires judgement as well as number crunching would seem to be skills that we can offer to the Big Data world already.)

Nov 20 2011

Data Ghosts in the Facebook Machine by Fantasticlife

Published by Fran under culture, search, semantic web

Understanding how data mining works is going to become increasingly important. There is a huge gap in popular and even professional knowledge about what organisations can now do “under the surface” with our data. For a very clear and straightforward explanation of how social graphs work and why we should be paying attention read Data Ghosts in the Facebook Machine.
