Thursday, January 28, 2010

HTML5, RDFa, Microdata

I wrote the rough draft of this as a reply to a comment, but I thought it was worth turning into a full-blown blog post.

To summarise: I'm pro-RDFa, a web developer, and an open data geek. I'm also a participant in a lot of communities, the biggest probably being PEAR, Freebase and

This is just a summary of problems (not technical problems) I see with Microdata and its role in HTML5: things which need solving before I'd be happy to put my weight behind it.
Unfortunately, some of these are extremely difficult to accomplish.

Tool support, and will it work how we think?

Microdata parsers: Google

I can only find one popular result at this time, a Perl CPAN package.

RDFa parsers: Google

Oodles of results.

What I draw from these extremely limited samplings is basically:
(1) A fair few people are out there consuming RDFa.
(2) There are far fewer people out there who have parsed microdata.

What are the challenges that will be faced by implementors of microdata?
What are the deficiencies in microdata which aren't yet apparent to us?
What are the hurdles publishers must learn about?

In my view, RDFa has answers to these questions and is proven in the field to some degree.

Real world examples:
Google / Yahoo indexing RDFa GoodRelations / product data; publishers like Best Buy rendering their information as RDFa.

At the moment, there are few implementors of microdata on a large scale that I'm aware of.


RDFa has the bulk of the ontologies/vocabularies (e.g. FOAF, DC, GoodRelations) that exist in the RDF/Semantic Web world sitting right behind it.

The people who create these kinds of ontologies in the current semantic web field are researchers, academics, and pedantic people. They are generally fine with RDF/XML, triples, etc. Being the kind of people they are, they also love the linked data tool stack.

When these people make efforts to reach out, document and share their work, they are unlikely to choose microdata to publish their tutorials in, particularly because of exactly the kind of discussion that has been taking place: a very adversarial one.

What this means is that web authors who want to publish structured data won't simply be able to google "microdata goodrelations" and get meaningful results at the drop of a hat.

Real world example: Microformats vs RDF, RDFa + Ontologies

Microformats
A handful of microformats are documented: hCal, hCard, XFN, etc. Total: fewer than 20 de facto standards.
These have gained widespread adoption, and that's great.

RDF, RDFa + Ontologies
Swoogle has over 10,000 ontologies indexed.
The most popular ontologies/vocabularies, off the top of my head:
FOAF, Music Ontology, Geo, Dublin Core, RSS 1.0, GoodRelations, SIOC, BioRDF, etc. Each has millions of triples and many implementors/publishers.

For me, this means I can search for "gene data rdfa" and get the answers I'm looking for, the same answers as everyone else who has an interest in this. I can't do "gene data microformat" (edit: oh crap, I can: hGene. Either way, I was thinking mostly about BioRDF when I wrote this).

At the moment, I can't do that with microdata. Because of the interaction between the RDF(a) community and the HTML5 community, I don't see it being possible in the next 5 years.

The biggest problem I have with that: making standards is *hard* work. To throw away the linked data / semantic web folks' work of the last few years and start over strikes me as a tremendous expenditure of energy for little gain.
