Sep 06 2012

The Shape of Knowledge - review of ISKOUK event

Published by Fran under KO, information architecture

On Tuesday I attended a very interesting event about information visualization and I have written a review for the ISKO UK blog.

I was particularly fascinated by the ideas suggested by Martin Dodge of mapping areas that are not “space” and what this means for the definition of a “map”. So, the idea of following the “path” of a device such as a phone through the electromagnetic spectrum brings a geographical metaphor into a non-tangible “world”. Conversely, is the software and code that devices such as robots use to navigate the world a new form of “map”? Previously, I have thought of code as “instructions” and “graphs” but have always thought of the “graph” as a representation of coded instructions, visualized for the benefit of humans, rather than the machines. However, now that machines are responding more directly to visual cues, perhaps the gap between their “maps” and our “maps” is vanishing.


Nov 20 2011

Data Ghosts in the Facebook Machine by Fantasticlife

Published by Fran under culture, search, semantic web

Understanding how data mining works is going to become increasingly important. There is a huge gap in popular and even professional knowledge about what organisations can now do “under the surface” with our data. For a very clear and straightforward explanation of how social graphs work and why we should be paying attention, read Data Ghosts in the Facebook Machine.


Jun 25 2011

The Organizational Digital Divide

Published by Fran under KO, culture

Catching up on my reading, I found this post by Jonah Bossewitch, Pick a Corpus, Any Corpus, and was particularly struck by his clear articulation of the growing information gulf between organizations and individuals.

I have since been thinking about the contrast between our localised knowledge organization systems and the semantic super-trawlers of the information oceans that are only affordable - let alone accessible - to the megawealthy. It is hard not to see this as a huge disempowerment of ordinary people, swamping the democratizing promise of the web as a connector of individuals. The theme has also cropped up in KIDMM discussions about the fragmentation of the information professions. The problem goes far beyond the familiar digital divide, beyond just keeping our personal data safe, to how we can render such meta-industrial scale technologies open for ordinary people to use. Perhaps we need public data mines to replace public libraries? It seems particularly bad timing that our public institutions - our libraries and universities - are under political and financial attack just at the point when we need them to be at the technological (and expensive) cutting edge.

We rely on scientists and experts to advise us on how to use, store and transport potentially hazardous but generally useful chemicals, radioactive substances, even weapons, and information professionals need to step up to the challenges of handling our new potentially hazardous data and data analysis tools and systems. I am reassured that there are smart people like Jonah rising to the call, but we all need to engage with the issues.


Nov 03 2010

Assumptions, mass data, and ghosts in the machine

Published by Fran under culture

Back in the summer, I was very lucky to meet Jonah Bossewitch (thanks Sam!), an inspiring social scientist, technical architect, software developer, metadatician, and futurologist. His article The Bionic Social Scientist is a call to arms for the social sciences to recognise that technological advances have led to a proliferation of data. This is assumed to be unequivocally good, but is also fuelling a shadow science of analysis that is using data but failing to challenge the underlying assumptions that went into collecting that data. As I learned from Bowker and Star, assumptions - even at the most basic stage of data collection - can skew the results obtained, so any analysis of such data may well be built on shaky (or at the very least prejudiced) foundations. When this is compounded by software that analyses data, the presuppositions of the programmers, the developers of the algorithms, etc. stack assumption on top of assumption. Jonah points out that if nobody studies this phenomenon, we are in danger of losing any possibility of transparency in our theories and analyses.

As software becomes more complex and data sets become larger, it is harder for human beings to perform “sanity checks” or apply “common sense” to the reports produced. Results that emerge from de facto “black boxes” of calculation, based on collections of information so huge that no lone unsupported human can hope to grasp them, are very hard to dispute. The only possibility of equal debate is amongst other scientists, and probably only those working in the same field. Helen Longino’s work on science as social practice emphasised the need for equality of intellectual authority, but how do we measure that if the only possible intellectual peer is another computer? The danger is that the humans in the scientific community become even more like high priests guarding the machines that utter inscrutable pronouncements than they are currently. What can we do about this? More education, of course, with the academic community needing to devise ways of exposing the underlying assumptions and the lay community needing to become more aware of how software and algorithms can “code in” biases.

This may appear to be a rather obscure academic debate about subjectivity in software development, but it strikes at the heart of the nature of science itself. If science cannot be self-correcting and self-criticising, can it still claim to be science?

A more accessible example is offered by a recent article claiming that Facebook filters and selects updates. This example illustrates how easy it is to allow people to assume a system is doing one thing with massed data when in fact it is doing something quite different. Most people think that Facebook’s “Most Recent” updates provide a snapshot of the latest postings by all your friends, and if you haven’t seen updates from someone for a while, it is because they haven’t posted anything. The article claims that Facebook prioritises certain types of update over others (links take precedence over plain text) and updates from certain people. Doing this risks creating an echo chamber effect, steering you towards the people who behave how Facebook wants them to (essentially, posting a lot of monetisable links) in a way that most people would never notice.
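To make the difference concrete, here is a toy sketch in Python of a purely chronological feed next to a weighted one. Everything in it is invented for illustration: the `Update` class, the `KIND_WEIGHT` values and the `favoured_authors` boost are my own assumptions, not anything Facebook has published or the article describes in detail.

```python
from dataclasses import dataclass


@dataclass
class Update:
    author: str
    kind: str         # "link" or "text"
    minutes_ago: int  # age of the update


# Hypothetical weights, made up for this example only.
KIND_WEIGHT = {"link": 3.0, "text": 1.0}


def chronological(updates):
    """What most people assume they are seeing: newest first, nothing else."""
    return sorted(updates, key=lambda u: u.minutes_ago)


def weighted(updates, favoured_authors=frozenset()):
    """A feed that quietly favours certain kinds of update and certain people."""
    def score(u):
        s = KIND_WEIGHT.get(u.kind, 1.0) / (1 + u.minutes_ago)
        if u.author in favoured_authors:
            s *= 2.0  # assumed boost, purely illustrative
        return s
    return sorted(updates, key=score, reverse=True)


updates = [
    Update("Ann", "text", 1),
    Update("Bob", "link", 3),
    Update("Cat", "text", 10),
]
print([u.author for u in chronological(updates)])
print([u.author for u in weighted(updates)])
```

In this toy data Bob's slightly older link overtakes Ann's newer plain-text update in the weighted feed, and nothing on screen tells the reader that a choice has been made.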

Another familiar example is automated news aggregation - an apparently neutral process that actually involves sets of selection and prioritisation decisions. Automated aggregations used to be based on very simple algorithms, so it was easy to see why certain articles were chosen and others excluded, but very rapidly such processing has advanced to the point that it is almost impossible (and almost certainly impractical) for a reader to unpick the complex chain of choices.

In other words, there certainly is a ghost in the machine, it might not be doing what we expect, and so we really ought to be paying attention to it.


Mar 06 2009

Information visualisation

Published by Fran under information management

I heard a talk by Ben Shneiderman about information visualisation yesterday for the Cambridge Usability Professionals Group. (It was ironic that I had a “locational usability” problem and was almost late, having made the novice error of trying to find Microsoft Research in the William Gates Building - which is indeed named after Microsoft’s Bill Gates foundation - but Microsoft Research in Cambridge was set up by Roger Needham, so it is in his building!)

The talk itself was very easy to follow with lively demonstrations of a number of visualisation tools. Shneiderman was very careful to point out that you need to have a good question and good data to get good results from information visualisation, and that it is no panacea, but when it works, it is fantastically powerful. One of the key strengths is that it makes it easy to spot outliers or anomalies in huge masses of data, particularly when there is a general underlying correlation. It is almost impossible to detect trends in a big spreadsheet full of numbers, but convert that into a visual form and the trends leap out. This means that you can see at a glance things like which companies’ stocks are rising when all the others are falling. Of course, graphs are nothing new, but the range of analytical tools that are now available means you can quickly pick out things like spikes and shapes in your data in a way that would have been painstaking previously. There are also very important applications in medical research and diagnosis, as the ability to track the order in which certain events happen helps researchers establish whether one condition causes another and could even be used to generate personal health alerts.
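The "one stock rising while the rest fall" case can also be approximated numerically, which is roughly what the eye does when a treemap cell is the wrong colour. The sketch below is my own illustration, not one of the tools demonstrated in the talk: the company names and percentage changes are made up, and the z-score threshold of 2.0 is an arbitrary assumption.

```python
from statistics import mean, stdev


def find_outliers(changes, threshold=2.0):
    """Return names whose value lies more than `threshold` sample
    standard deviations away from the mean of all values."""
    values = list(changes.values())
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # every value identical, so nothing stands out
    return [name for name, v in changes.items()
            if abs(v - mu) / sigma > threshold]


# Hypothetical daily percentage changes: most stocks falling, one rising.
daily_change = {
    "Alpha": -1.0, "Bravo": -1.2, "Charlie": -0.9, "Delta": -1.1,
    "Echo": -1.0, "Foxtrot": -0.8, "Golf": -1.0, "Rising Co": 4.0,
}
print(find_outliers(daily_change))
```

A simple cut-off like this only works when there really is a general underlying pattern to deviate from, which is the same caveat about good questions and good data.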

I liked the smart-money style treemaps (although the choice of red-green can’t be great for anyone who is colourblind), but I found the marumushi newsmap fun but not much more informative than traditional newspapers, mainly because the newsmap crams in more words than I can take in. Newspapers are really pretty good at writing headlines that work, and you can usually see at a glance what today’s top story is anyway - it’s the one in big letters at the top! However, if you need an aggregation of global news for international comparison, the newsmap does give you quick access to a lot of international sources all together.**

One of the great pleasures of these events is getting to talk to other people who are there and I met a fascinating researcher who had been monitoring the importance of stories by keyword frequency, showing that when something happens you get a burst of news activity around the relevant keywords, a ripple effect, and then it dies away. By looking at those patterns you can produce a measure of the impact of different events.
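I do not know how the researcher actually calculated this, but a very rough sketch of the idea might look like the following. The function names, the three-day baseline and the choice of "mentions above baseline" as the impact score are all my own assumptions for illustration.

```python
def daily_keyword_counts(headlines_by_day, keyword):
    """Count how many headlines mention the keyword on each day."""
    kw = keyword.lower()
    return [sum(kw in h.lower() for h in day) for day in headlines_by_day]


def burst_impact(counts, baseline_days=3):
    """Crude impact score: total mentions above the average level
    seen in the first `baseline_days` days (assumed to be 'normal')."""
    baseline = sum(counts[:baseline_days]) / baseline_days
    return sum(max(0.0, c - baseline) for c in counts[baseline_days:])


# Hypothetical daily counts: quiet, then a burst, then the ripple dies away.
counts = [1, 1, 1, 6, 4, 2, 1]
print(burst_impact(counts))
```

Comparing the score for different keywords over the same period gives a simple way of ranking events by how much news activity they generated.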

**Update: Rayogram gives you images of actual newspaper front pages, with some options for sorting.
Creative Review - interesting post on tube maps.
