Tuesday, September 09, 2008

It's Not the Content Silly, but the Metadata You Should be Listening to

I have been biting my bottom lip all day, at first because I really did not feel like commenting on the United Airlines story, and specifically because I didn't know the 'facts'. But after noticing a swollen lip, I decided to go ahead.

Since I happen to work for a media provider (Dow Jones) whose model is currently focused on providing 'authoritative' premium content, something many clients are willing to pay a lot of money for so that things like today's events are not a common occurrence, I knew that any post of mine would most likely come across as blatantly pro-premium-subscription. Trust me: if someone had checked a Dow Jones news feed product, they would have seen with little effort that the story was 'bogus'. So a story like this perhaps makes us drool, because it provides a perfect case study of why 'free' is not better, certainly not when millions of dollars are at stake.

A Wired blog post gives a good explanation of what happened:
the article in the Sun Sentinel's archive had no date on it. But when Google's spider grabbed it, it assigned a current date to the piece, which then resulted in the article being placed in the top results of Google News. When the employee from Income Securities Advisors ran a Google search on "2008 bankruptcies," the old United Airlines story appeared as the top link in the results, with a September 6, 2008 date on it. (Google has now released a screenshot that shows the UAL story as it appeared on the Sun Sentinel web site. The only date in the screenshot is September 7, 2008, the date Google accessed the page. There is no date under the story's headline to indicate when it was published.) At 11 am Monday, the employee added the story to a feed that is included in a Bloomberg subscription service and within minutes, 15 million shares of United Airlines stock had been sold before trading on the stock was halted.

But although this makes a compelling case for trusting only authoritative sources and premium content aggregators, the story is not only about 'free' content, because the traders (human or machine, I might ask?) acted on a news story from a reputable and costly service: Bloomberg.

So who is to blame? Of course many are talking about it in both mainstream media and blogs, and sure, things like this have happened before. But I think the simplest answer is to ask why the person who pressed the send button to Bloomberg didn't vet the story. Perhaps it was early in the morning (OK, 11 am is not that early, but let's go with that excuse...) and he got in late the night before thanks to a delayed United flight, and all the recent news about the airline industry losing money and charging $15 for a pillow was enough for him to believe the story and pass it along as true without any vetting.

But the interesting part to me is the metadata associated with the news story, because essentially that was the technical culprit: the article did not have metadata (in this case a publication date) to tell Google that it was an article from 2002 that had been republished on the Sun Sentinel's website. The problem is that there really is no standard for providing that information that online news providers adhere to. And as more of their archives, which were traditionally available only through premium aggregators that normalize the content, come online for 'free', more unknowns are thrown at online news services like Google News.
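To make the failure mode concrete, here is a minimal Python sketch. It is not how Google's spider actually works; the names `DatedArticle`, `naive_date`, `label_article` and `is_fresh` are hypothetical, and the two-day freshness window is an arbitrary assumption. It simply contrasts silently substituting the crawl date for a missing publication date with carrying the "unknown" forward so downstream consumers can see it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DatedArticle:
    headline: str
    date: Optional[datetime]  # None when the publisher supplied no date
    date_source: str          # "publisher" or "unknown"


def naive_date(published: Optional[datetime], crawled_at: datetime) -> datetime:
    """The failure mode: a missing publication date silently becomes the crawl date."""
    return published if published is not None else crawled_at


def label_article(headline: str, published: Optional[datetime]) -> DatedArticle:
    """Keep the date's provenance instead of inventing a date."""
    if published is None:
        return DatedArticle(headline, None, "unknown")
    return DatedArticle(headline, published, "publisher")


def is_fresh(article: DatedArticle, now: datetime, max_age_days: int = 2) -> bool:
    """Treat an article as fresh news only if the publisher dated it recently."""
    if article.date is None:
        return False  # undated items never count as fresh
    return now - article.date <= timedelta(days=max_age_days)
```

With something like this, an undated archive page would be labeled "unknown" and kept out of a "latest news" ranking, rather than inheriting the date on which it happened to be crawled.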

One of the core benefits of premium aggregator services (e.g. Factiva from Dow Jones, LexisNexis) is the normalization of content from 'trusted' sources. This ensures that the publication date and time, sometimes down to the millisecond, is provided, and that the consuming application, whether it is a trading system, a news portal or an SMS alert sent to the banker on the run, gets it right.
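As a rough illustration of what that normalization might look like, the sketch below converts a publisher's timestamp into a single UTC format with millisecond precision and refuses to guess when it cannot. This is not Factiva's or anyone else's actual pipeline; the function name `normalize_timestamp` and the list of accepted input formats are assumptions made for the example.

```python
from datetime import datetime, timezone

# Hypothetical input formats a feed might use; every one carries a UTC offset.
KNOWN_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f%z",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%d %H:%M:%S %z",
)


def normalize_timestamp(raw: str) -> str:
    """Return the timestamp as ISO 8601 in UTC with millisecond precision.

    Raises ValueError instead of guessing when the input matches no known format.
    """
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # try the next known format
        return parsed.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    raise ValueError(f"unrecognized publication timestamp: {raw!r}")
```

The design choice that matters here is the last line: an item whose date cannot be parsed is rejected or flagged, not stamped with the time it arrived.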


Image|Flickr|arimoore

1 comment :

Anonymous said...

hAtom is a fine standard to use for this, but doesn't help if there is no metadata...

Reminds me of the Douglas Adams quote about trusted sources:

http://bit.ly/DA

Because the Internet is so new we still don’t really understand what it is. We mistake it for a type of publishing or broadcasting, because that’s what we’re used to. So people complain that there’s a lot of rubbish online, or that it’s dominated by Americans, or that you can’t necessarily trust what you read on the web. Imagine trying to apply any of those criticisms to what you hear on the telephone. Of course you can’t ‘trust’ what people tell you on the web anymore than you can ‘trust’ what people tell you on megaphones, postcards or in restaurants. Working out the social politics of who you can trust and why is, quite literally, what a very large part of our brain has evolved to do. For some batty reason we turn off this natural scepticism when we see things in any medium which require a lot of work or resources to work in, or in which we can’t easily answer back – like newspapers, television or granite. Hence ‘carved in stone.’ What should concern us is not that we can’t take what we read on the internet on trust – of course you can’t, it’s just people talking – but that we ever got into the dangerous habit of believing what we read in the newspapers or saw on the TV – a mistake that no one who has met an actual journalist would ever make. One of the most important things you learn from the internet is that there is no ‘them’ out there. It’s just an awful lot of ‘us’.