Introduction to Session 1, 0945-1115
Friday February 29th 2008
Wikipedia & Wikiscanner
Google's Attention Deficit
Librarians against the torrent
The Health Ranger and the Sunstein Effect
Subvert & Profit
Digg versus Arts and Letters
Crowdsourcing: Newstrust, NewsAssignment, and User Generated Content
Google's central PageRank algorithm harnesses the Wisdom of the Crowd to sort web pages into a hierarchy of relevance.
There are three steps to Google's process:
- a ``similarity'' measure: statistically analyse the content of pages and assess whether a given page is likely to be ``about'' the query. For example, the query ``Should we have gone to war in Iraq'' will return all pages which, based on content analysis, seem to be about the decision to go to war.
- an ``authority'' measure based on ``links are votes''. For each page that might be relevant, determine how many pages link in to it; for each of those linking pages, take its ``authority'' to be the number of pages that link in to it in turn. Use the authority-weighted count of pages pointing to a ``relevant'' page to rank that page against all the other candidates.
- serve the ordered list to the client screen, with, of course, the list of right-hand-side, paid-for advertising links that could be relevant to you.
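The ``links are votes'' step can be sketched in a few lines of Python. This is only a toy reading of the description above, not Google's implementation: the link graph, the page names and the function names are all made up for illustration, and the real PageRank applies the same idea recursively (with a damping factor) rather than stopping at one level of inbound links.

```python
from collections import defaultdict


def inbound_counts(links):
    """Map each page to the number of distinct other pages linking in to it."""
    inbound = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a page cannot vote for itself
                inbound[target].add(source)
    return {page: len(sources) for page, sources in inbound.items()}


def authority_weighted_ranking(links, candidates):
    """Rank candidate pages by the summed authority of the pages linking to them.

    The authority of a linking page is simply its own inbound-link count,
    as in the second step described in the text.
    """
    authority = inbound_counts(links)
    scores = {}
    for page in candidates:
        scores[page] = sum(
            authority.get(source, 0)
            for source, targets in links.items()
            if page in targets and source != page
        )
    ranking = sorted(candidates, key=lambda p: scores[p], reverse=True)
    return ranking, scores


# Hypothetical link graph: each key links out to the pages in its list.
links = {
    "a": ["x"],
    "b": ["x", "a"],
    "c": ["a", "y"],
    "d": ["y"],
}

# "x" and "y" both have two inbound links, but "x" is linked from "a",
# which is itself linked to by two pages, so "x" ranks first.
ranking, scores = authority_weighted_ranking(links, ["y", "x"])
print(ranking, scores)
```

Even in this toy form the weakness discussed below is visible: anyone who can cheaply create pages and links can manufacture ``votes''.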
We all know that the results range from the spectacular to the disappointing. Google's own assessment is that:
``Google's technology uses the collective intelligence of the web to determine a page's importance. There is no human involvement or manipulation of results, which is why users have come to trust Google as a source of objective information untainted by paid placement.''
My own straw man for today is that this used to be true, but is much less so today. Google and PageRank did a remarkable and valuable job in the ``age of innocence'' of the Internet. It was a hugely liberating force. It is no longer, for deep, structural reasons. We do not yet have a replacement.
As every web content producer adjusts to Google, its results necessarily become less and less compelling. The joy of Google past was to think hard about the search query and get a first screen of results full of relevant but quirky, even obscure material. A Google result today is much less sensitive to the driver, because every content maker is trying to ``buy'' space that it can't pay for in genuine relevance.
SEO will ossify Google and a better solution will wipe it out with the speed of an epidemic. The web has become over-fitted to Google's algorithm, like a strain of wheat that becomes over-designed for a specific ecology. The web is covered in content strategies hyper-aligned to Google, and a new mechanism will find a source of meaningful, un-manipulated information, just as the hyper-link was before PageRank made it a gameable commodity.
PageRank ``worked'' in a sense directly analogous to the one in which the market ``works'' for Hayek: in both cases, an unintended by-product of meaningful actions by individuals is aggregated into a socially useful measure, price in one case and PageRank in the other. And PageRank manipulation is just like the exercise of market power: not a show-stopper, but a cause for concern.
If Google's PageRank is like the market, Digg works like the political system. Members vote on stories, and high votes lead to high salience. Digg is a substantial driver of traffic--a Digg front page offers a huge boost to the visibility of a piece of material.
Digg is extremely entertaining, especially if you have a taste for the (mostly male) North American geek interests: technology, science, science fiction, libertarian politics, etc.
But as the model for ``democratic news'' it has suffered the recurrent (and maybe endemically democratic?) problems of cliques, subversion, manipulation and sometimes, quite dramatically, populism and mob rule.
Compare Arts and Letters Daily (ALD). Three articles a day are ``dugg'', always by the same New Zealand-based professor of aesthetics. It is no more transparent than Digg, and just as ``stamped'' by its character. But we imagine ALD is not ``buyable'' or manipulable ... except by writing to the taste of the editor!
Digg and ALD perform very similar functions: they harness the natural desire to share information you have found interesting and turn that into a powerful filter. ALD relies on one talented and dedicated person, and Digg on thousands. Can we create ``mid-points'' in this space, leveraging the efforts of distributed communities in mechanisms that combine ALD-style focus with Digg-style distribution of effort?
Google and Digg are all about extracting signal, first from an accidental by-product of building the web, then from votes: in both cases small but supposedly significant pieces of user input. The downside of both techniques is that they are rough filters, quite cheaply manipulated.
Maybe the magical signal-extracting formula will come from attempts to harness the efforts of a more dedicated crowd. This is the hope both of the ``crowd-sourcers'' and of User Generated Content.
NewsTrust is a sort of ``super-Digg''. If you really want to contribute to filtering the signal from the noise, here are the tools to do it. The method is more extensive and more effortful than a Digg. Your reputation determines the weight you carry in the average rank of a story. And still ... NewsTrust is stamped by the character of its community. It is not the objective measure it set out to be, but it is valuable and expressive of a set of values.
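To make ``reputation counts in the weight'' concrete, here is a minimal sketch of a reputation-weighted average. NewsTrust's actual formula is not given here, so the plain weighted mean, the function name and the sample numbers are all assumptions for illustration.

```python
def reputation_weighted_rating(reviews):
    """Return the reputation-weighted mean rating of a story.

    `reviews` is a list of (rating, reputation) pairs. Returns None when
    there is no positive total weight to average over.
    """
    total_weight = sum(reputation for _, reputation in reviews)
    if total_weight <= 0:
        return None
    weighted_sum = sum(rating * reputation for rating, reputation in reviews)
    return weighted_sum / total_weight


# Hypothetical story: a newcomer rates it 5, a trusted reviewer rates it 2.
reviews = [(5, 1.0), (2, 3.0)]
story_rating = reputation_weighted_rating(reviews)
print(story_rating)
```

An unweighted average of the same two ratings would be 3.5; giving the trusted reviewer three times the weight pulls the result towards their view. Whoever sets the reputations shapes the outcome, which is one way a community's character ends up stamped on the ranking.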
NewsAssignment, a project by Jay Rosen and Dan Cohn, goes a step further in the effort that it expects from participants: NewsAssignment looks to open out the process of making an article. It is for all to enter and all to see. Here is Wired's first crowd-sourced article, appropriately enough about Wikipedia, crowdsourcing and quality.
Want to know how it got put together? Where its biases are? Look at the project files on NewsAssignment. Here, the questions to be put to Jimmy Wales.
In a sense crowdsourcing is a refinement of User Generated Content, like comments under articles on the BBC or Guardian, or any number of forums, here on Comment-is-Free. The commenter is part of the media product.
Openness changes content. Here is a powerful and direct example from the Guardian in Feb 2008. When the crowd does not like what the institution is doing, it can exercise power because it is an essential and un-contracted part of the creative process. Populist waves hit Digg. They will hit every other crowd-sourced media creator, where the editor's role becomes more and more like that of a politician: balancing interests and coalitions, and looking over their shoulder to see who can bring them down today. Newsmaking used to be largely about politics; it now is politics.
tony curzon price 2008-02-26