Archive for the 'search' Category

May 11 2013

Tagging the cart before the horse - Getting your project plan in order

Published by Fran under KO, search

When people launch search improvement or information organization projects, one of the commonest mistakes is to be over-eager to “just get the content indexed or tagged” without spending enough time and thought on the structure of an index, what should be tagged, and how the tags themselves should be structured.

This typically happens for two reasons:
1. The project managers - often encouraged by service providers who just want to get their hands on the cheque - simply underestimate the amount of preparatory work involved, whether it is structuring and testing a taxonomy, setting up and checking automated concept extraction rules, or developing a comprehensive domain model and tag set, so they fail to include enough - if any - of a development and testing stage in the plan. This often happens when the project is led by people who do not work closely with the content itself. Projects led by marketing or IT departments often fall into this trap.

2. The project managers include development and testing, with iterative correction and improvement phases, but are put under pressure to cut corners, or to compress deadlines. This tends to happen when external forces affect timescales - for example local government projects that have to spend the budget before the end of the financial year. It can also happen when stakeholder power is unevenly distributed - for example, the advice of information professionals is sought but then over-ruled by more powerful stakeholders who have a fixed deadline in mind, such as launching a new website in time for the Christmas market.

Forewarned is forearmed

Prevention is better than cure in both these scenarios, but easier said than done. Your best defence is to understand organizational culture, politics, and history and to evangelize the role and importance of information work and your department. Find out which departments have initiated information projects in the past, which have the biggest budgets, which have the most proactive leadership teams, then actively seek allies in those departments. Find out if there are meetings on information issues you could attend, offer to help, or even do something like conduct a survey on information use and needs and ask for volunteers to be interviewed.  Simply by talking to people at any level in those departments you will start to find out what is going on, and you will remind people in those departments of your existence and areas of expertise.

On a more formal level, you can look at organizational structures and hierarchies and make sure that you have effective chains of communication that follow chains of command. This may mean supporting your boss in promoting the work of your department to their boss. This is especially important in organizations with lots of layers of middle management, as middle managers can get so caught up in day to day work that longer term strategy can get put on the back burner, so offer support.

If you find out about projects early enough, you have a chance of influencing the project planning stages to make sure information and content issues are given the attention they need, right from the start.

Shutting the stable door…

Sometimes, despite our best efforts, we end up in a project that is already tripping over itself. A common scenario is for tagging work to be presented as a fait accompli. This is particularly likely with fully automated tagging work, as processing can be done far faster than any manual tagging effort. However, it is highly unusual for any project to be undertaken without its being intended to offer some sort of service or solve some recognized problem.

Firstly, assess how well it achieves its intended goals. If you have only been called into the project at a late stage, is this because it is going off the rails and the team want a salvage solution, or is it because it works well in one context and the team want to see if it can be used more widely? If it is the latter, that’s great - you can enjoy coming up with lots of positive and creative proposals. However, the core business planning principles are pretty much the same whether you are proposing to extend a successful project or corralling one that is running out of control.

Once you know what the project was meant to achieve, assess how much budget and time you have left, as that will determine the scope to make changes and improvements. Work out what sort of changes are feasible. Can you get an additional set of tags applied for example? Can you get sets of tags deleted? Are you only able to make manual adjustments or can you re-run automated processes? How labour intensive are the adjustment processes? Is chronology a factor – in other words can you keep the first run for legacy content but evolve the processes for future content?

These assessments are especially valuable for projects that are at an intermediate stage as there is much more scope to alter their direction. In these cases it is vital to prioritize and focus on what can be changed in a pragmatic way. For example, if the team are working chronologically through a set of documents, you may have time to undertake planning and assessment work focused on the most recent and have that ready before they get to a logical break point. So, you prioritize developing a schema relevant to the current year, and make a clean break on a logical date, such as January 1. If they have been working topic by topic, is there a new search facet you could introduce and get a really good set for that run as a fresh iteration?

If there are no clean breakpoints or clear sets of changes to be made, focus on anything that is likely to cause user problems or confusion or serious information management problems in future. What are likely to be the real pain points? Which of those are the worst?

Once you have identified the worst issues and clarified the resources you have for making the changes, you have the basis for working up the time and money you need to carry them out. This can form the basis of your business case and project plan either to improve a faltering project and pull it back on track or to add scope to a project that is going well.

…after the horse has bolted

If there is limited scope to make changes, and the project is presented as already complete, it is still worth assessing how well it meets its goals as this will help you work out how you can best use and present the work that has been done. For example, can it be offered as an “optional extra” to existing search systems?

It is also worth assessing the costs and resources involved in making the changes you would recommend, even if it seems there is no immediate prospect of getting that work done. It is likely that sooner or later someone will want to re-visit the work, especially if it is not meeting its goals. Then it will be useful to know whether it can be fixed with a small injection of resource or whether it requires a major re-working, or even abandoning and starting afresh. Such a prospect may seem daunting, but if you can learn lessons and avoid repeating mistakes the next time around, then that can be seen as a positive. If one of the problems with the project was the lack of input from the information team early on, then it is worth making sure for the sake of the information department and the organization as a whole that the same mistake does not happen again. If you demonstrate well enough how you would have done things differently, you might even get to be in charge next time!


Apr 13 2013

ISKO UK 2013 - provisional programme

Published by Fran under KO, search, semantic web

I will probably be on the other side of the Atlantic when the ISKO UK conference takes place in July in London, UK. I will be sorry to miss it, because the committee have brought together a diverse, topical, and fascinating collection of speakers.

ISKO UK excels in unifying academic and practitioner communities, and the conference promises to investigate the barriers that separate research from practice and to seek out boundary objects that can bring the communities together.

This is demonstrated in person by the keynote speakers Patrick Lambe of Straits Knowledge and Martin White of Intranet Focus Ltd - both respected for their commercial as well as academic contributions to the field of Knowledge Organization.

Amidst what is already shaping up to be a very full and varied programme, the presentations by Jeremy Tarling and Matt Shearer (BBC News) and Jarred McGinnis and Helen Lippell (Press Association) will show how research in semantic techniques is now being put to practical use in managing the fast-flowing oceans of information that news organizations handle.

The programme also includes a whole session on combining ontologies with other tools, as well as papers on facet analysis and construction of controlled vocabularies. There’s even some epistemology to please pure theoreticians.


Dec 02 2012

Libraries, Media, and the Semantic Web meetup at the BBC

In a bit of a blog cleanup, I discovered this post languishing unpublished. The event took place earlier this year but the videos of the presentations are still well worth watching. It was an excellent session with short but highly informative talks by some of the smartest people currently working in the semantic web arena. The videos of the event are available on YouTube.

Historypin

Jon Voss of Historypin was a true “information altruist”, describing libraries as a “radical idea”. The concept that people should be able to get information for free at the point of access, paid for by general taxation, has huge political implications. (Many of our libraries were funded by Victorian philanthropists who realised that an educated workforce was a more productive workforce, something that appears to have been largely forgotten today.) Historypin is seeking to build a new library, based on personal collections of content and metadata – a “memory-sharing” project. Jon eloquently explained how the Semantic Web reflects the principles of the first librarians in that it seeks ways to encourage people to open up and share knowledge as widely as possible.

MIMAS

Adrian Stevenson of MIMAS described various projects including Archives Hub, an excellent project helping archives, and in particular small archives that don’t have much funding, to share content and catalogues.

rNews

Evan Sandhaus of the New York Times explained the IPTC’s rNews – a news markup standard that should help search engines and search analytics tools to index news content more effectively.

schema.org

Dan Brickley’s “compare and contrast” of Universal Decimal Classification with schema.org was wonderful, and he reminded technologists that it is very easy to forget that librarians and classification theorists were attempting to solve search problems far in advance of the invention of computers. He showed an example of “search log analysis” from 1912, queries sent to the Belgian international bibliographic service – an early “semantic question answering service”. The “search terms” were fascinating and not so very different to the sort of things you’d expect people to be asking today. He also gave an excellent overview of Lonclass, the BBC Archive’s largest classification scheme, which is based on UDC.

BBC Olympics online

Silver Oliver described how BBC Future Media is pioneering semantic technologies and using the Olympic Games to showcase this work on a huge and fast-paced scale. By using semantic techniques, dynamic rich websites can be built and kept up to the minute, even once results start to pour in.

World Service audio archives

Yves Raimond talked about a BBC Research & Development project to automatically index World Service audio archives. The World Service, having been a separate organisation to the core BBC, has not traditionally been part of the main BBC Archive, and most of its content has little or no useful metadata. Nevertheless, the content itself is highly valuable, so anything that can be done to preserve it and make it accessible is a benefit. The audio files were processed through speech-to-text software, and then automated indexing applied to generate suggested tags. The accuracy rate is about 70% so human help is needed to sort out the good tags from the bad (and occasionally offensive!) tags, but this is still a lot easier than tagging everything from scratch.
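As a rough illustration of the human review step described above, the sketch below splits machine-suggested tags into kept and rejected piles according to a reviewer's decisions and reports the observed accuracy. The tags, the reviewer's choices, and the function name review_tags are all hypothetical; the talk did not describe the actual BBC tooling.

```python
def review_tags(suggested, approved_by_reviewer):
    """Split suggested tags into kept and rejected lists.

    `suggested` is a list of machine-generated tags; `approved_by_reviewer`
    is the set of tags a human has confirmed as correct.
    """
    kept = [t for t in suggested if t in approved_by_reviewer]
    rejected = [t for t in suggested if t not in approved_by_reviewer]
    # Accuracy here is simply the share of suggestions the reviewer kept.
    accuracy = len(kept) / len(suggested) if suggested else 0.0
    return kept, rejected, accuracy


# Hypothetical suggestions for one programme; the reviewer keeps 7 of 10.
suggested = ["Nairobi", "elections", "Kenya", "radio", "football",
             "interview", "economy", "weather", "opera", "banking"]
approved = {"Nairobi", "elections", "Kenya", "radio", "interview",
            "economy", "banking"}

kept, rejected, accuracy = review_tags(suggested, approved)
print(f"kept {len(kept)}, rejected {len(rejected)}, accuracy {accuracy:.0%}")
```

The reviewer only has to say yes or no to each suggestion, which is the sense in which reviewing is easier than tagging from scratch.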


Oct 07 2012

Local is the new social – location data startups

Published by Fran under culture, search

A few weeks ago I attended an event by Dreamstake featuring a collection of startup companies that are using open geographical data – such as the data released by Ordnance Survey. There was much championing of the possibilities of much money to be made by using data that organisations release for free. This seems obvious to me – someone else has paid to do all the preparatory work so others can cash in. No-one seems concerned about the ethics of this. If UK taxpayers have paid for the OS work to be done, should they not automatically be shareholders in any company that profits from the fruits of this investment?

The companies showcased all had new twists on using location data. What I found especially interesting was the emphasis on context. When selling services, place alone is not enough. Time is important and also the circumstances. So, a businesswoman on a work trip will probably want different products and services to when she is out with her family.

The speakers were
James Pursey of Sortedapp
Sadiq Qasim of LoYakk
Craig Wareham of Viewranger
Tim Buick of Streetpin

Location-based marketing

James Pursey opened by giving a brief history of location-based marketing, pointing out that this was pioneered by the Yellow Pages (now yell.com). His company attempts to match time, place, and location and makes the consumer the advertiser and the service provider the respondent. He explained this as a “reverse Ebay”. Instead of providers advertising their products and services, consumers post details of what they want, e.g. I need someone to clean my flat before my wife gets home (the data game still seems to be a man’s world!). The message is then pushed to local cleaners who have a window of time in which to respond. The app works on the location of your mobile phone, but you can alter that on a map so that you can be at home but arrange a service to be provided near your workplace, etc.

Chatting about a shared experience

Sadiq Qasim explained that LoYakk – local yakking – recognises that conversations are often focused around specific places and events. Social media links tend to be based on static lists of friends, with very little contextualisation. However, social relationships and conversations are often transient. You might want to chat to someone at a conference, but that doesn’t mean you want to become lifelong friends. By creating an app that mirrors the real world nature of such connections, people can drop in, chat to people in the vicinity and leave again. Events such as conferences, arts and sporting events, and holiday destinations are particularly well suited to this approach.

Mobile is local

Craig Wareham described Viewranger, which is an app for outdoorsy people. It combines guidebook information, a social community, and a marketplace, all based around location, and has become popular with search and rescue teams.

Tim Buick of Streetpin emphasised that about half of searches on mobiles – perhaps unsurprisingly – are for something local. However, time is very relevant - he might be near a great pub that has a special offer on beer but he doesn’t want to be told about it at 8 in the morning when he has just dropped the kids off at nursery, but in the same location 12 hours later with his mates, the offer might be just what they want. The right information, to the right person, at the right place and at the right time is what matters.
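The “right place, right time” idea can be sketched as a simple filter that only surfaces an offer when the user is both nearby and inside the offer's time window. This is a minimal illustration under assumed data; the offers, distances, and the 500 m default radius are invented, and nothing here reflects how Streetpin actually works.

```python
from datetime import time

# Hypothetical offers near one location, each with a validity window.
OFFERS = [
    {"name": "Pub beer offer", "distance_m": 150,
     "start": time(17, 0), "end": time(23, 0)},
    {"name": "Coffee and croissant", "distance_m": 300,
     "start": time(7, 0), "end": time(10, 30)},
    {"name": "Cinema two-for-one", "distance_m": 5000,
     "start": time(18, 0), "end": time(22, 0)},
]


def relevant_offers(now, max_distance_m=500):
    """Return names of offers that are both close enough and currently valid."""
    return [
        offer["name"]
        for offer in OFFERS
        if offer["distance_m"] <= max_distance_m
        and offer["start"] <= now <= offer["end"]
    ]


print(relevant_offers(time(8, 0)))   # morning nursery run
print(relevant_offers(time(20, 0)))  # same place, twelve hours later
```

The same location yields different results at 8 in the morning and 8 in the evening, which is the point being made about context.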

The distinction between what is useful information and what is marketing becomes very blurred.

Place, space, maps

Thinking about this event along with the Shape of Knowledge event’s discussions of maps of cyberspace, and the Superhuman exhibition’s raising the question of the potential of transhumans to relate to space in a different way to current humans, made me wonder how location-based services will change in future. The technologically enhanced human will, presumably, need maps that make sense to computers as well as maps that make sense in real space and time. Navigation and location are most likely going to change beyond all recognition.


Aug 22 2012

Digital Asset Management Techniques for Indexing Non-Textual Content - SLA Chicago

Published by Fran under KO, cataloguing, search

David Riecks of Controlled Vocabulary gave a presentation about indexing images. He pointed out that metadata is all around us, but we don’t tend to notice it. He described the sort of metadata needed to make an asset “smart” and how organizations like the PLUS registry are attempting to provide a simple, one-stop shop for rights and licensing metadata. The Embedded Metadata Manifesto sets out details of metadata that needs to be included in image files to promote easy and legal re-use of content and so protect the rights of photographers and others in the content creation and related industries.

David also provided an extremely useful list of metadata resources, including a handy link to a website that checks whether metadata is being stripped from files at the point of upload.

Laura Fu talked us through a recent Digital Asset Management (DAM) implementation.

Randall Marcinko of Marcinko Enterprises Inc. then talked about using different elements of assets to act as indexing mechanisms. He gave an example of where they were able to use the images associated with pieces of text as disambiguators to distinguish between the texts. He also pointed out the dangers of trying to make every information project the same, and urged us to think carefully about what is needed. It is easy to fall into the trap of simply offering all clients the same solution, whether that works best for them or not. Depending on what you are trying to achieve, a simple list may be all that is needed, not a complex taxonomy or thesaurus, and the simpler the method of solving a problem, the easier and cheaper it is likely to be to implement.


Jul 05 2012

Photo metadata conference

Published by Fran under Digital Asset Management, KO, search

I was very grateful to Sarah Saunders of Electric Lane for inviting me to speak at the CEPIC Conference at the IPTC Congress in May.

These are just a few of my personal highlights from a very full conference.

Image content for mobile devices

Dittmar Frohmann, Director of International Product at iStock and Getty Images, the keynote speaker of the day, covered a lot of ground, but I was struck by his recognition of the need for new business models for photo libraries. As has happened to the book publishing and music industries, the photo industries are reeling from the shock of the transition to a digital world.

Professional photographers are finding it harder to manage rights and licensing of their images, as digital copies are now so cheap and easy to produce and distribute around the world, and at the same time images taken on ubiquitous mobile phones have become fashionable. “Citizen photographers”, including those taking out-of-focus badly lit mobile phone photos, are producing huge numbers of images that often do not meet traditional professional standards. However, such images are seen as “authentic” and “intimate” and have become popular with consumers in an age of austerity where slick, aspirational hyper-reality and glamorous models (Photoshop handsome?) are increasingly failing to chime with ordinary people.

This means that “un-professional” images are actively being sought by advertising agencies. Photographic styles go in and out of fashion, but never before has it been so easy for “amateurs” to produce high resolution images. At the same time, image libraries find themselves faced with a deluge of digital files and have to manage these files to ensure they don’t inadvertently breach rights agreements, while trying to add value to their services.

For image libraries, rights management and search/retrieval have become the two hottest topics as the key areas where economies of scale can offer improvements over “DIY” online sales and marketing. Libraries are effectively aggregators, and therefore service providers - gathering independent collections and individual photographers in one place can provide a one-stop shop for purchasers. If this is combined with fast and easy rights and re-use clearing services, along with distribution, then the libraries can still provide a useful and profitable service to both the producers of content (the photographers) and the consumers.

(I was surprised that very little was said about an editorial role for image collections - another area that value can be added is through collection curation and branding. So, you know that the best place to get UK landscape shots is from such-and-such a collection, etc. However, this is much harder to maintain, manage, and promote.)

Image metadata

I gave an overview of the history of metadata for knowledge organisation, with an emphasis on aspects that are peculiar to image libraries. For example, still images do not come with text attached, so natural language processing and concept extraction techniques that can drive document and text-based search systems can only be a second step for image libraries, once some text has been generated to associate with stills.

I was very pleased that a couple of the key themes that I introduced in my talk were picked up and elaborated on by other presenters.

Linked Data and crowdsourcing

Mary Forster from Getty Images went into detail about Linked Data and how this is being used to enhance Getty’s services and image management, by using linked data concept URIs to index images. She explained the differences between text matching and concept linking, and how text matching is far more noisy and imprecise than concept linking, and how using concepts enables flexible management of metadata structures so that creation of complex associations can be automated.
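The difference between text matching and concept linking can be shown with a toy example. In the sketch below, each image record carries both free-text keywords and concept URIs; the records and the example.org URIs are invented for illustration and are not Getty's actual identifiers or data model.

```python
# Hypothetical image records: free-text keywords plus concept URIs.
# The example.org URIs are made up for illustration only.
IMAGES = [
    {"id": 1, "keywords": ["jaguar", "rainforest"],
     "concepts": {"http://example.org/concept/jaguar-animal"}},
    {"id": 2, "keywords": ["jaguar", "e-type", "classic car"],
     "concepts": {"http://example.org/concept/jaguar-car"}},
    {"id": 3, "keywords": ["big cat", "panthera onca"],
     "concepts": {"http://example.org/concept/jaguar-animal"}},
]


def text_match(term):
    """Return ids of images whose keywords contain the literal term."""
    return [img["id"] for img in IMAGES if term in img["keywords"]]


def concept_match(uri):
    """Return ids of images indexed with the given concept URI."""
    return [img["id"] for img in IMAGES if uri in img["concepts"]]


# Literal matching mixes the cat with the car and misses image 3.
print(text_match("jaguar"))
# Concept matching returns both animal images, whatever words were used.
print(concept_match("http://example.org/concept/jaguar-animal"))
```

Text matching is noisy (it returns the car) and incomplete (it misses the image keyworded only with the Latin name), whereas the concept URI picks out exactly the images of the animal.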

Andrew Ellis from the My Paintings project with the Public Catalogue Foundation talked about how they had successfully managed crowdsourcing by putting in place a number of sophisticated ways of managing the capture of the metadata. For example, rather than only offering unconstrained free tagging, taggers were invited to select tags from a dictionary list, in order to disambiguate concepts. They were also invited to select from a number of pre-set facets driven by controlled vocabularies - image type, style, etc. This made it easy to integrate the free tagging within an existing navigational scheme.
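A constrained tagging step of this kind might look something like the sketch below, which accepts only tags found in a dictionary list and facet values found in a controlled vocabulary. The vocabularies, facet names, and the function validate_tagging are assumptions made for the example; the project's real lists and workflow were not given in the talk.

```python
# Hypothetical controlled vocabularies for two pre-set facets.
FACETS = {
    "image_type": {"portrait", "landscape", "still life"},
    "style": {"impressionist", "abstract", "realist"},
}
# Hypothetical dictionary list; qualifiers disambiguate similar words.
DICTIONARY = {"horse", "river", "bridge", "bank (river)", "bank (building)"}


def validate_tagging(free_tags, facet_choices):
    """Keep only dictionary tags and facet values from the allowed lists."""
    accepted_tags = sorted(t for t in free_tags if t in DICTIONARY)
    accepted_facets = {
        facet: value
        for facet, value in facet_choices.items()
        if value in FACETS.get(facet, set())
    }
    return accepted_tags, accepted_facets


tags, facets = validate_tagging(
    ["bank (river)", "horsey", "bridge"],
    {"image_type": "landscape", "style": "moody"},
)
print(tags)
print(facets)
```

Because every accepted value already exists in a known list, the crowd's contributions slot straight into an existing navigation scheme.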

Content-based image retrieval

Mathieu from Xerox then talked about content-based image retrieval. Xerox have been working on sophisticated image analysis techniques designed to find images that have similar qualities to other images. They have a series of algorithms that analyse image “texture” and create a “Digital fingerprint” of an image. Other images with very similar fingerprints tend to look similar. This means that you can train the system with sets of example images, and it can then identify similar images in the collection. This can be used as an image autoclassification tool, as you can set up your training sets to be useful categories (famous landmarks, pop stars, tigers, etc.) and then sort your images into these categories. Xerox trained their system’s 706 categories using 1.5 million images.
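To make the idea of training sets and fingerprints more concrete, here is a deliberately simplified sketch. It treats each fingerprint as a short list of numbers, averages the training examples for each category, and assigns a new image to the category whose average it most resembles (a nearest-centroid classifier using cosine similarity). This is a stand-in technique chosen for brevity, not Xerox's algorithm, and the three-number fingerprints are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical training sets: a few example fingerprints per category.
TRAINING = {
    "landmark": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
    "tiger": [[0.1, 0.9, 0.7], [0.2, 0.8, 0.9]],
}
CENTROIDS = {label: centroid(vectors) for label, vectors in TRAINING.items()}


def classify(fingerprint):
    """Return the category whose centroid is most similar to the fingerprint."""
    return max(CENTROIDS, key=lambda label: cosine(fingerprint, CENTROIDS[label]))


print(classify([0.85, 0.15, 0.1]))
print(classify([0.15, 0.85, 0.8]))
```

A real system would derive fingerprints with thousands of dimensions from the image pixels, but the sorting step works on the same principle: similar fingerprints end up in the same category.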

The system works very well with distinct and easily recognisable images - iconic images like the Sydney Opera House for example - and on large collections where there are clear and obvious “hits” and “misses”. It doesn’t work well with concepts such as politics or history, as it is hard to come up with key images for the training set, nor moods - inspirational, happy, tranquil, etc. However, for large collections with no metadata, it offers a good way of adding structured metadata to make a collection navigable. Another interesting use is to identify duplicate images, so you could use it to assess the contents of a collection to find gaps (“we have hundreds of images of Tower Bridge, but none of the Golden Gate bridge”, etc.).

Perhaps it even has a potential use for TV producers editing rushes on a shoot - “we already have hundreds of shots of the sunset over the mountains, but hardly any close-ups of skiers”, for example.

I guess one day there will be a market for “controlled imageries” - training sets of example images to use as basis for such autoclassification software.

You can try it here.

Rights, IPO, orphan works

Nancy Wolff and Antoinette Graves of the IPO talked about rights and the law. Nancy stated that the need to be found is becoming more critical. Orphan works legislation advocates in the US want to de-risk usage so that images can be used even when it is not clear who they belong to or the owner is known but cannot be found.
Nancy noted that proposals for rights registries are being enthusiastically supported by Google but also that whoever owns such registries will not only make a lot of money but will also control access to and usage of content.

Antoinette pointed out that in the UK at present there is no diligent search that will allow for the use of an “orphan work”. This makes it very hard for publishers to be sure that they will not be prosecuted. There is a notable difference between “old” orphan works in museums, etc. and “new” orphans caused by metadata stripping.

Future of image search and rights management

In the afternoon I attended an interesting breakout session on the future of search, with a large and impressive panel. Rights management was cited as a huge issue to resolve, with a call for slick seamless user-friendly payment systems, to enable people to buy images and re-use them legally, without friction and effort. Technology was seen as the answer to an essentially technology-created problem. Free distribution over the internet meant that people had a sense of entitlement - a sense that content ought to be free, mistaking the differences between free content and freedom of information.

Managing digital rights is not the same as imposing “lockout” DRM systems. There is a need to devise licensing methods that are based on understanding machine-to-machine communication, rights description metadata, etc. No-one wants to invest in content creation any more, largely because the protection of rights is so difficult, making content creation a very risky business. If this trend is to be reversed, technological solutions to the problems of rights clearances must be found.

Predictions for the future were that crowdsourcing would become increasingly important. Interestingly, crowdsourcing relies on the notion of people working for nothing, and I couldn’t help noticing the contrast: professional photographers were trying to stop “amateurs” destroying their living by providing images without expecting payment, yet were perfectly happy for people to add metadata without being paid for their work.

The need to get money into the system somewhere in order to enable anyone to get paid was emphasised and I suppose when an industry is facing diminishing returns, everybody involved in the supply chain puts pressure on everyone else to cut their costs or work for nothing.
I can’t help thinking that the deluge of images from all sources is going to mean that findability - and hence metadata - will become even more significant as more and more images chase fewer and fewer users willing to pay for them.


Jun 19 2012

Building bridges: Linking diverse classification schemes as part of a technology change project

My paper about my work on the linking and migration of legacy classification schemes, taxonomies, and controlled vocabularies has been published in the journal Business Information Review.


May 20 2012

Google goes semantic

Published by Fran under search, semantic web

A happy week for ontologists, taxonomists, and other knowledge organisers as Google reveals its knowledge graph.

Patrick Lambe sums it up wonderfully:
Google Finally Comes Out of the Closet on Taxonomies.

Here’s a great post by Seth Earley:
Google Knowledge Graph and Taxonomy - It’s in There.


Mar 11 2012

Isn’t search the same as browse?

Published by Fran under KO, search

I nearly wept when one of our young rising IT stars queried in a meeting why we had separated “search” and “browse” as headings for our discussions on archive navigation functionality. So, to spare me further tears here are some distinctions and similarities. There won’t be anything new for information professionals, but I hope it will be useful if any of your colleagues in IT need a little help. I am sure this is far from comprehensive, so please leave additions and comments!

Differences between search and browse

Search is making a beeline to a known target, browse is wandering around and exploring.
Search is for when you know what you are looking for, browse is for when you don’t.
Search is for when you know what you are looking for exists, browse is for when you don’t.

Search expects you to look for something that is findable, browse shows you the sort of thing you can find.
Search is for when you already know what is available in a collection or repository, browse is how you find out what is there, especially if you are a newcomer.
Search is difficult when you don’t know the right words to use, browse offers suggestions.
Search is a quickfire answer, browse is educative.
Search is about one-off actions, browse is about establishing familiar pathways that can be followed again or varied with predictable results.

Search relies on the seeker to do all the thinking, browse offers suggestions.
Search is a tricky way of finding content on related topics, browse is an easy way of finding related content.
Search is difficult when you are trying to distinguish between almost identical content, browse can highlight subtle distinctions.
Search rarely offers completeness, browse often offers completeness.

Search is pretty much a “black box” to most people, so it is hard to tell how well it has worked, browse systems are visible so it is easy to judge them.
Search uses complex processing that most people don’t want to see, browse uses links and connections that most people like to see.
Search is based on calculations and assumptions that are under the surface, browse systems offer frameworks that are more open.

Search works well on the web, because the web is so big no-one has had time to build an easy way to browse it, browse works well on smaller structured collections.
Search can run across vast collections, browse needs to be offered at human-readable scales.
Search does not usually give an indication of the size or scope of a collection, browse can be designed to indicate scale.
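For colleagues who think in code, a couple of the distinctions above can be shown with a toy collection. In this sketch, search is a literal match on titles and browse is a count of items per subject heading; the collection, the subject headings, and both function names are made up for illustration.

```python
from collections import Counter

# Hypothetical mini-collection with one subject heading per item.
COLLECTION = [
    {"title": "Tower Bridge at dawn", "subject": "bridges"},
    {"title": "Golden Gate in fog", "subject": "bridges"},
    {"title": "Tiger in long grass", "subject": "wildlife"},
    {"title": "Sydney Opera House", "subject": "landmarks"},
]


def search(query):
    """Return titles containing the query text (case-insensitive)."""
    q = query.lower()
    return [item["title"] for item in COLLECTION if q in item["title"].lower()]


def browse():
    """Return the number of items under each subject heading."""
    return Counter(item["subject"] for item in COLLECTION)


# Search finds only the title that happens to contain the word.
print(search("bridge"))
# Browse shows what headings exist and how much sits under each.
print(dict(browse()))
```

Searching for “bridge” misses the Golden Gate picture because the word is not in its title, while the browse headings show at a glance that there are two bridge images and how large each part of the collection is.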

Similarities between search and browse

Search and browse are both ways of finding content.
Search and browse can both be configured in a huge variety of ways.
Search and browse both have many different mechanisms and implementations.
Search and browse should both be tailored to users’ needs.
Search and browse systems both require thought and editorial judgement in their creation so that they work effectively for any particular collection.
Search and browse systems can often both be created largely automatically.
Search and browse often both involve metadata.
Search and browse behaviours may be intertwined, with users switching from one to the other.
Search and browse may be used by the same users for different tasks at different times.
Search and browse both offer serendipity, although serendipitous opportunities are often hidden by interface design.

Should I offer my users search or browse?

Almost always, you should offer both, unless you are very sure that your users will always be performing the same kind of task and have the same level of familiarity with your content. With small static collections of content, it may not matter too much, but for most content collections users will probably want both. Which one you make your main focus depends on the context and collection.

Shops might have lots of images and very little text, so a beautifully designed navigation system will help customers find - and buy - products they might not know about, while only a simple search system might be needed to cover searches for product names. A library will need to support lots of searches for titles and across catalogue text with a good search system, but will also need to help educate and inform users with a clear user-friendly browsable navigation system. A large incoherent collection of unstructured text with no particular purpose is likely to be difficult to navigate no matter what you design, so will need good search, but - apart from the web itself - such unbounded and unmanaged collections tend to be quite unusual.


Jan 22 2012

Your organization is not the Internet

Published by Fran under KO, search

Many people find it very difficult to understand why search within an organization can’t “just be like Google”. This is often because they haven’t thought about the differences between an organization and the Internet.

Your organization is smaller than the Internet

Search engines like Google work because they have access to big data. Google gets billions of searches to process, from billions of users. Even if your organization is a large one, it won’t have that many users either searching or contributing content, so it cannot number crunch on the same scale as Google. Your IT department is probably a lot smaller than Google’s and your enterprise search team’s daily budget is unlikely to cover more than the tiniest fraction of what Google spends. Last, but by no means least, your organization doesn’t have as much content as the Internet, so it probably needs to be far more careful about not losing any that is valuable.

Surfing the net is not many people’s job

There are important differences between how and why people search when they are at work and when they are not, and between how and why they search the Internet and their organization’s Intranet or archives. People rarely surf their organization’s Intranet for fun, to be entertained, or to while away the time. The differences between serious research behaviour and leisure searching are well documented, so I am going to write about another aspect of the differences between the Internet and organizations that is often overlooked.

Putting stuff online is not the same as writing a business report

There are vast differences in the ways that people create and curate content on the Internet and within an organization. These differences have a significant effect on the way search functions. The key difference is in how much they link their content to that of others. Of course, there are people whose jobs are to create and curate online content - all the web editors, content strategists, copywriters, social media marketers, etc. - but they will be the first to explain that they have a very specialised set of skills focused on making their content searchable, commercial, or otherwise user friendly. They do a whole lot of things that most people, as part of their day job, have neither the know-how nor the time to do.

Links are a form of Knowledge Organization that Google gets for free

One of the key things that web professionals and unpaid web enthusiasts do with their content is to add and manage links. Links are what organize the web. Links are what group sites into clusters by content. Links are the web’s classification scheme. Clay Shirky back in 2005 said “there is no shelf” but it makes just as much sense to think of millions of shelves – infinite shelves going off in all directions, with new ones being created and old ones being discarded. The web is not linear – like a shelf – but it is not without structure. Google effectively picks one of the near infinity of shelves and offers it up as a linear list whenever you do a search. It chooses the shelf that seems to be the most popular, or that fits its commercial model. First on the shelf is often a paid-for advertisement or a Wikipedia entry, followed by other big well-established commercial sites. Out there on the Internet, people do an awful lot of shopping, and not much work, so that’s fine. (If they are doing more shopping than work when they are at work, your organization probably has bigger problems than search to deal with.)

For many other searches, especially more thematic research, people would be disappointed with the results, were it not for the magic of the way the web works – the links. As long as Google slings a site at you that has lots of links to other sites, it doesn’t have to take you straight to what you want, it lets you and the links do the rest of the work. Links gather together similar content, so they function like a classification scheme. The links associate content that is aimed at similar audiences, is on similar topics, is of a similar age. The links represent a huge amount of sorting, cataloguing, and classification work. Google did not have to pay for this work (genius business model). People do this work for Google for free. They do this work as part of creating and curating their content.
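To make the idea of links as a classification scheme a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a description of how Google or any real search engine works. The page names, the `links` data, and the 0.3 threshold are all invented for the example. It treats two pages as related when a large share of the pages they link to are the same.

```python
# Hypothetical link graph: each page maps to the set of pages it links to.
links = {
    "fan-site-a": {"band-home", "band-wiki", "tour-dates"},
    "fan-site-b": {"band-home", "band-wiki", "lyrics"},
    "garden-blog": {"seed-shop", "rose-society"},
}


def jaccard(a, b):
    """Share of link targets two pages have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pages(page, links, threshold=0.3):
    """Return other pages whose outbound links overlap with this page's,
    most similar first. The threshold is an arbitrary illustrative value."""
    scores = {
        other: jaccard(links[page], targets)
        for other, targets in links.items()
        if other != page
    }
    matches = [other for other, score in scores.items() if score >= threshold]
    return sorted(matches, key=lambda other: -scores[other])


print(related_pages("fan-site-a", links))
```

The two fan sites end up grouped together purely because of the linking work their authors did, with no one having assigned them to a category.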

Many of Google’s volunteer librarians do this work for fun. They create fan sites, they write Wikipedia articles, they produce lists and generate indexes to their favourite content. They provide cataloguing descriptions and context. They do all this work partly because they enjoy it and partly because they hope to get “repaid” by their site becoming popular. They hope this will either lead to monetary reward (their band will get signed, they’ll get a better job, they’ll sell advertising) or social reward (they’ll make online “friends”, get positive feedback from comments, etc.).

From the commercial angle, people do this work because they expect to gain financial reward. They want to sell more products and make money. This is why there are howls of pain whenever Google tweaks its algorithms. Companies that balk at investing in internal search systems will spend fortunes chasing SEO.

Are your staff content curators?

If you want your organization’s search to be “just like Google” you need to think about how linked your content is. Do people who create content in your organization do so for the same reasons and with the same motivations as people who create and link content on the web? It is very unlikely that you have lots of “fans” who will spend their free time creating lists of your company’s best information resources, or collecting and rating and reviewing reports and documents. Most employees are too busy getting on with their day jobs to spend office hours pursuing their “fan” projects. Even if your staff have plenty of spare time, how many of them are big enough fans of some aspect of work to treat it like a hobby? If you want people to start looking out for similar documents on your Intranet and linking their own documents to them, you will probably have to find ways of motivating them to do this as a special initiative. It is not likely to come “for free”, like it does for the web search engines.

For some organizations, encouraging and incentivising “fan”-type behaviour may work. If the organization already has a strong collaborative culture, with people sharing ideas and using social media, it may be a small step to get them to think of their documents and presentations as blog posts. Including content creation and curation in people’s job roles and rewarding those who do well will foster a link-rich Intranet. By recognising and rewarding people who promote useful links and lists and get them to rank highly in your enterprise searches, you could bring an element of gamification to encourage this sort of behaviour. For other organizations, the culture may support this kind of web-style content creation, but people are generally too busy, have skill sets too far from what is required, or need training and encouragement. In such organizations it may make sense to have the equivalent of web editors, content strategists, user experience specialists, search engine optimizers, etc. working with the organization’s internal content to promote the most valuable resources. In other words, a layer of “linkers” would work alongside the content originators.

For other organizations, where it would be inappropriate, too time consuming, or too far from established culture to encourage web-like information behaviour, enterprise search will never work “just like Google”. More formal and standardized metadata management processes are likely to be needed. Organizations that generate a lot of very specific content that is unlikely to be useful in broader contexts, confidential content, or large volumes of very similar structured content are likely to find it hard to move away from directed and standardized searching.

Many organizations will have a “mixed economy” with different types of content and different departments operating with different styles (e.g. what works in a marketing department is unlikely to work in the same way in a finance department).

Without links, search is a lot of dead ends

Without links, each search result is isolated. This stops the searcher in their tracks and means they cannot surf in the way they do on the Internet. They will have to check search results one after another in a linear fashion. If your search engine is not getting the most relevant results to the top of that list, your staff will be spending a huge amount of time working their way through that list. They cannot plump for one likely looking result then follow the trail of links, as they do on the web. The links as a form of classification do not exist, so you need another mechanism (taxonomy, ontology, index, directory) to help people find groups of related content and browse through from one document to another.
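As a rough illustration of how a taxonomy can stand in for the missing links, here is a minimal Python sketch. It assumes each document has already been tagged with terms from a controlled vocabulary. The document names, the tags, and the helper functions are all invented for the example. Given one search result, it finds other documents that share taxonomy terms with it, so the searcher has somewhere to go next.

```python
from collections import defaultdict

# Hypothetical documents, each tagged with terms from a taxonomy.
documents = {
    "q3-budget-report": {"finance", "budget"},
    "q4-budget-forecast": {"finance", "budget", "forecast"},
    "brand-guidelines": {"marketing"},
}


def build_index(documents):
    """Map each taxonomy term to the set of documents tagged with it."""
    index = defaultdict(set)
    for doc, terms in documents.items():
        for term in terms:
            index[term].add(doc)
    return index


def related_documents(doc, documents, index):
    """Return other documents sharing at least one term with `doc`,
    ordered by number of shared terms, then alphabetically."""
    shared = defaultdict(int)
    for term in documents[doc]:
        for other in index[term]:
            if other != doc:
                shared[other] += 1
    return sorted(shared, key=lambda other: (-shared[other], other))


index = build_index(documents)
print(related_documents("q3-budget-report", documents, index))
```

A document with no shared terms, like the brand guidelines here, is still a dead end, which is a reminder that the tagging has to be done consistently for this to help.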

So, even though you may have the technology and the budget to match Google’s, unless your content creators are linking freely, you will never completely succeed in turning your Intranet into a mini-Internet.
