Nice catch by Ian Kennedy, as usual, over on his blog, pointing us to this post on The New York Times Open Source blog about the metadata they are making available as part of their online open archive. A good portion of the metadata is available for all of their electronic Web content from 2001 onwards.
In my last post on this topic I mentioned some of the challenges of adding metadata, especially when using only machines. So I certainly grinned when I read how the NY Times deals with it:
"Summarization is a particularly tough one. At The Times, our goal is to apply our metadata to describe the essential summary of the story; this is more than mere entity extraction is capable of doing. Instead, we have tackled this problem by developing the most advanced computational text-categorizing system known to mankind: a crack team of whipsmart librarians. Armed with some guidelines and an organizational zeal, they’re able to maintain consistent tagging rules on our daily output. They and their predecessors have been doing this for our material all the way back to 1851."
The NY Times seems to be keeping an eye on those out there hacking different news views using the metadata that is available, and even goes so far as to ask people to hack something even cooler. Nice.
photo credit: emdot