Apr 28, 2009
Content Based Image Retrieval - Google and Similar Image Search | Synaptica Central is a useful critique of the state of Google’s new similar images search, which doesn’t seem to be that good yet. I hope it’s not my fault for ruining their metadata with all the daft things I type into the totally addictive Google Image Labeler!
Apr 01, 2009
Google Invests in Pixazza, An AdSense for Images reports on another neat little crowdsourcing initiative. What interested me the most was this: “As James Everingham, Pixazza’s CTO, states, ‘No computer algorithm can identify a black pair of Jimmy Choo boots from the 2009 fall collection in the same way a person can. Rather than rely on image analysis algorithms, our platform enlists product experts to drive the process.’”
In other words, they are paying indexers/cataloguers. Not very much, it’s true, but it is still good to see someone in tech admitting that old-fashioned human beings still have their uses! Image recognition algorithms are getting better all the time, but we haven’t even really conquered text processing yet.
Sep 22, 2008
New method for building multilingual ontologies appeared on AlphaGalileo.Org - the Internet-based news centre for European science, engineering and technology. Researchers at the Universidad Politécnica de Madrid’s School of Computing (FIUPM) claim to have created a language-independent ontology-building tool. I think it will work very well for consistent, well-structured information - for example in catalogues and directories - but it seems to me that it is essentially an “auto-indexer” that only really works if you control linguistic forms, and perhaps even vocabulary, very tightly. That’s great - and means plenty of work for editors making sure everything is neat, tidy, and consistent to suit the system - but isn’t it going to be an awful lot of work? Or am I massively missing the point?
Mar 29, 2008
Reuters Wants The World To Be Tagged. This article on the ReadWrite Web blog is about the new API (does anyone else pronounce this “appy”?) sent out into the world by Reuters. They are hoping it will encourage tagging of articles in a way they can then harvest. It sounds like it is fairly basic at the moment - it is only recognising a few bits and pieces like people and places. It would be interesting to see how well it does with people like Jack London and places like Congo (Brazzaville) and Congo (Kinshasa). When I worked on a similar project we had lots of problems disambiguating the Guineas (Papua New, Equatorial, etc) and Salvadors (El or San) in particular. I assume they have lots of authority files backed up by rules that will sort all those out. It would be nice to see “under the bonnet” as it were!
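As a rough illustration of the sort of authority-file lookup described above, here is a minimal Python sketch. It is a guess at the general technique, not a look under the bonnet of the Reuters API: the `AUTHORITY` table, the preferred terms, and the `tag` function are all invented for the example. The one rule it applies is "longest match wins", so that "Papua New Guinea" is not tagged as plain "Guinea" and "Jack London" is not tagged as the city. Forms that still map to more than one preferred term, such as a bare "Congo", come back with every candidate so that a human (or a further rule) can decide.

```python
import re

# Hypothetical authority file: each preferred term lists the surface forms
# that should resolve to it. Invented for illustration only.
AUTHORITY = {
    "Congo (Brazzaville)": ["Congo-Brazzaville", "Republic of the Congo", "Congo"],
    "Congo (Kinshasa)": ["Congo-Kinshasa", "Democratic Republic of the Congo", "Congo"],
    "Papua New Guinea": ["Papua New Guinea"],
    "Equatorial Guinea": ["Equatorial Guinea"],
    "Guinea": ["Guinea"],
    "El Salvador": ["El Salvador"],
    "San Salvador": ["San Salvador"],
    "London, Jack": ["Jack London"],
    "London": ["London"],
}


def build_index(authority):
    """Map each surface form to the set of preferred terms it may refer to."""
    index = {}
    for preferred, forms in authority.items():
        for form in forms:
            index.setdefault(form, set()).add(preferred)
    return index


def tag(text, authority=AUTHORITY):
    """Return (matched text, candidate preferred terms) pairs, in text order.

    Surface forms are tried longest first, so a long name is matched as a
    whole before any shorter name it happens to contain.
    """
    index = build_index(authority)
    forms = sorted(index, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(f) for f in forms) + r")\b")
    return [(m.group(0), sorted(index[m.group(0)])) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "Jack London sailed from Papua New Guinea to Congo."
    for surface, candidates in tag(sample):
        status = "ambiguous" if len(candidates) > 1 else "resolved"
        print(f"{surface!r} -> {candidates} ({status})")
```

A real system would need more than string matching to settle the ambiguous cases, for example rules that look at nearby words such as "Kinshasa" or "Brazzaville", which is where the editorial effort tends to go.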