Wednesday, January 04, 2006

Skewering of the Semantic Web

See Clay Shirky's article "The Semantic Web, Syllogism, and Worldview" for a real skewering of the Semantic Web initiative. Some quotes:

The Semantic Web's Proposed Uses


Dodgson's syllogisms actually demonstrate the limitations of the form, a pattern that could be called "proof of no concept", where the absurdity of an illustrative example undermines the point being made. So it is with the Semantic Web. Consider the following, from the W3C's own site:


Q: How do you buy a book over the Semantic Web? A: You browse/query until you find a suitable offer to sell the book you want. You add information to the Semantic Web saying that you accept the offer and giving details (your name, shipping address, credit card information, etc). Of course you add it (1) with access control so only you and seller can see it, and (2) you store it in a place where the seller can easily get it, perhaps the seller's own server, (3) you notify the seller about it. You wait or query for confirmation that the seller has received your acceptance, and perhaps (later) for shipping information, etc. [http://www.w3.org/2002/03/semweb/]


One doubts Jeff Bezos is losing sleep.


This example sets the pattern for descriptions of the Semantic Web. First, take some well-known problem. Next, misconstrue it so that the hard part is made to seem trivial and the trivial part hard. Finally, congratulate yourself for solving the trivial part.


All the actual complexities of matching readers with books are waved away in the first sentence: "You browse/query until you find a suitable offer to sell the book you want." Who knew it was so simple? Meanwhile, the trivial operation of paying for it gets a lavish description designed to obscure the fact that once you've found a book for sale, using a credit card is a pretty obvious next move. (boldface mine)

...



From time to time, proselytizers of the Semantic Web try to give it a human face:

For example, we may want to prove that Joe loves Mary. The way that we came across the information is that we found two documents on a trusted site, one of which said that ":Joe :loves :MJS", and another of which said that ":MJS daml:equivalentTo :Mary". We also got the checksums of the files in person from the maintainer of the site.

To check this information, we can list the checksums in a local file, and then set up some FOPL rules that say "if file 'a' contains the information Joe loves mary and has the checksum md5:0qrhf8q3hfh, then record SuccessA", "if file 'b' contains the information MJS is equivalent to Mary, and has the checksum md5:0892t925h, then record SuccessB", and "if SuccessA and SuccessB, then Joe loves Mary". [http://infomesh.net/2001/swintro/]


You may want to read that second paragraph again, to savor its delicious mix of minutia and cluelessness.


Anyone who has ever been 15 years old knows that protestations of love, checksummed or no, are not to be taken at face value. And even if we wanted to take love out of this example, what would we replace it with? The universe of assertions that Joe might make about Mary is large, but the subset of those assertions that are universally interpretable and uncomplicated is tiny.
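To make the quoted checksum rules concrete, here is a minimal Python sketch of what they amount to. It is not the infomesh author's code: the function names, the in-memory "documents", and the way the trusted checksums are obtained are all assumptions made for illustration (the checksums in the quote are placeholders, so the sketch computes its own).

```python
import hashlib


def md5_of(text: str) -> str:
    """Return the hex MD5 digest of a document's text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def rule_holds(doc: str, statement: str, trusted_md5: str) -> bool:
    """One rule in the quoted style: the file contains the statement
    AND its checksum matches the one we were handed in person."""
    return statement in doc and md5_of(doc) == trusted_md5


def joe_loves_mary(doc_a: str, doc_b: str, md5_a: str, md5_b: str) -> bool:
    """'If SuccessA and SuccessB, then Joe loves Mary.'"""
    success_a = rule_holds(doc_a, ":Joe :loves :MJS", md5_a)
    success_b = rule_holds(doc_b, ":MJS daml:equivalentTo :Mary", md5_b)
    return success_a and success_b


# Hypothetical file contents standing in for the two documents in the quote.
doc_a = ":Joe :loves :MJS .\n"
doc_b = ":MJS daml:equivalentTo :Mary .\n"

# Stand-ins for the checksums "got in person from the maintainer".
trusted_a = md5_of(doc_a)
trusted_b = md5_of(doc_b)

print(joe_loves_mary(doc_a, doc_b, trusted_a, trusted_b))
```

Note what the sketch actually establishes: that two files were not altered and that each contains a particular string. Whether Joe loves Mary is not something the checksums can speak to, which is the point of the passage above.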

...

Meta-data is Not A Panacea


The Semantic Web runs on meta-data, and much meta-data is untrustworthy, for a variety of reasons that are not amenable to easy solution. (See for example Doctorow, Pilgrim, Shirky.) Though at least some of this problem comes from people trying to game the system, the far larger problem is that even when people publish meta-data that they believe to be correct, we still run into trouble.


Consider the following assertions:

  • Count Dracula is a Vampire
  • Count Dracula lives in Transylvania
  • Transylvania is a region of Romania
  • Vampires are not real


You can draw only one non-clashing conclusion from such a set of assertions -- Romania isn't real. That's wrong, of course, but the wrongness is nowhere reflected in these statements.
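The Dracula example can be played out mechanically. The following is a rough Python sketch, not anything from Shirky's article: the triple encoding, the predicate names, and especially the propagation rule (an unreal thing's home is unreal, and so is whatever an unreal place is a region of) are assumptions chosen to mimic a naive reasoner that tries to keep the assertions from clashing.

```python
# The assertions from the list above, as (subject, predicate, object) triples.
FACTS = {
    ("Count Dracula", "is_a", "Vampire"),
    ("Count Dracula", "lives_in", "Transylvania"),
    ("Transylvania", "region_of", "Romania"),
}

# "Vampires are not real"
UNREAL_CLASSES = {"Vampire"}


def unreal_entities(facts, unreal_classes):
    """Propagate 'not real' along lives_in/region_of until nothing changes.

    The propagation rule is an illustrative assumption, not a standard
    inference rule from RDF or any other Semantic Web specification.
    """
    # Seed: every member of an unreal class is itself unreal.
    unreal = {s for (s, p, o) in facts if p == "is_a" and o in unreal_classes}

    changed = True
    while changed:
        changed = False
        for s, p, o in facts:
            if p in ("lives_in", "region_of") and s in unreal and o not in unreal:
                unreal.add(o)
                changed = True
    return unreal


print(sorted(unreal_entities(FACTS, UNREAL_CLASSES)))
```

Under these assumed rules the reasoner concludes that Count Dracula, Transylvania, and Romania are all unreal. Every input statement was published in good faith, and nothing in the data marks the conclusion as wrong.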


The postings on this site are my own and don't necessarily represent IBM's positions, strategies, or opinions.
