Recently I’ve been reading about and playing with Microdata. If you’ve never heard of Microdata, you may have heard of microformats or RDFa. Still no? Let me explain.

Web sites are made up of lots of interesting information: a movie review on IMDb, a telephone number on Qype or product details on eBay, for example. Then there is Google, your favourite search engine, which spiders these web sites and tries to make sense of them. Google is pretty good at understanding these sites now. How many times have you typed your favourite restaurant into Google and seen an address, map and phone number as the first result, along with a few reviews? Impressive, eh?

The problem, though, is that despite the clever folk at Google, this will never be a perfect science. For humans, extracting information from web pages is pretty easy. We understand the hundreds of tiny imperfections and differences in how similar data is displayed, whereas computer software, such as search engine spiders, finds this far more difficult. This is where microdata comes in.

Remember that even though we see pretty gradients, fancy typography and jQuery-enabled navigation, this isn’t how the machines see it. They see the code: the HTML, XML, JavaScript, CSS and everything else that makes up the page. Think about the countless different ways of marking up the same information. Microdata provides a standard for developers to mark up their existing web sites, to say “OK, this piece of information is a product, this is a review, this is a person”, and so on. The markup itself is actually quite simple. What’s more, Google, Bing and Yahoo! have teamed up to agree on the standard, and have come up with a rather extensive list of different types and all their properties. Check out http://schema.org for more information than you probably need.
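To give a feel for how simple the markup is, and what a machine gets out of it, here is a minimal sketch in Python. The product snippet is a made-up example: itemscope starts an item, itemtype names its schema.org type, and itemprop labels each property. MicrodataExtractor is a hypothetical helper, not a real library; it only handles a single, non-nested item whose text properties contain plain text, which is far less than a real Microdata parser has to cope with.

```python
from html.parser import HTMLParser

# A hypothetical product snippet marked up with Microdata attributes.
PRODUCT_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <h1 itemprop="name">Acme Kettle</h1>
  <p itemprop="description">A 1.7 litre electric kettle.</p>
  <img itemprop="image" src="kettle.jpg" alt="Acme Kettle">
</div>
"""


class MicrodataExtractor(HTMLParser):
    """Hypothetical sketch: reads itemtype and itemprop values from one flat item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None  # itemprop whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The first element carrying itemscope defines the item's type.
        if "itemscope" in attrs and self.item_type is None:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        # Some elements take their property value from an attribute, not their text.
        if tag in ("img", "audio", "video", "source", "embed", "iframe"):
            self.properties[prop] = attrs.get("src", "")
        elif tag in ("a", "area", "link"):
            self.properties[prop] = attrs.get("href", "")
        elif tag == "meta":
            self.properties[prop] = attrs.get("content", "")
        else:
            self._current_prop = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        # Only collect text that sits inside an itemprop element.
        if self._current_prop is not None:
            self.properties[self._current_prop] += data

    def handle_endtag(self, tag):
        # Assumes plain-text content: any closing tag ends the current property.
        if self._current_prop is not None:
            self.properties[self._current_prop] = self.properties[self._current_prop].strip()
        self._current_prop = None


parser = MicrodataExtractor()
parser.feed(PRODUCT_HTML)
print(parser.item_type)   # http://schema.org/Product
print(parser.properties)  # {'name': 'Acme Kettle', 'description': 'A 1.7 litre electric kettle.', 'image': 'kettle.jpg'}
```

The point is that the spider no longer has to guess which heading is the product name; the markup says so explicitly.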

So we now have a way of tagging all the data in our web pages, making it easy to extract information from our web sites. Sadly, though, adding that tagging will never be easy to automate. As a programmer, I love to be able to write a program to do my work for me, instead of actually doing the work. Taking an existing web site and adding all the microdata is a fairly manual process. If you think about it, this makes perfect sense: if there existed a program to automatically convert your web pages into microdata, there would be no need for microdata.

As always, though, there are ways of making things a little bit easier. Instead of trying to remember all the different types (items) of microdata, whether a given item is a child or parent of another, what properties each one contains, or exactly how to add the markup to our code, we can abstract the bulk of that away and wrap it in some form of tasty GUI or template system that does all that nasty business for us. This is what I hoped to achieve when faced with this very problem.
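As a rough illustration of that kind of abstraction, here is a minimal Python sketch. It is not the class this post refers to; SCHEMA, allowed_properties and render_item are hypothetical names, and SCHEMA is a tiny hand-picked subset of the schema.org vocabulary. The helper walks up the parent chain to collect inherited properties, rejects properties the type doesn't have, and writes the itemscope/itemtype/itemprop markup so you don't have to.

```python
from html import escape

# A tiny, hypothetical subset of schema.org: each type names its parent
# and the properties it adds. The real vocabulary is far larger.
SCHEMA = {
    "Thing": {"parent": None, "properties": {"name", "description", "url", "image"}},
    "Product": {"parent": "Thing", "properties": {"brand", "offers", "review"}},
    "Person": {"parent": "Thing", "properties": {"email", "telephone", "jobTitle"}},
}


def allowed_properties(item_type):
    """Return the properties of item_type, including those inherited from parents."""
    props = set()
    while item_type is not None:
        entry = SCHEMA[item_type]
        props |= entry["properties"]
        item_type = entry["parent"]
    return props


def render_item(item_type, values, tag="div"):
    """Wrap plain-text values in Microdata markup, rejecting unknown properties."""
    unknown = set(values) - allowed_properties(item_type)
    if unknown:
        raise ValueError(f"{item_type} has no properties {sorted(unknown)}")
    lines = [f'<{tag} itemscope itemtype="http://schema.org/{item_type}">']
    for prop, text in values.items():
        # Escape the text so characters like & or < don't break the page.
        lines.append(f'  <span itemprop="{prop}">{escape(text)}</span>')
    lines.append(f"</{tag}>")
    return "\n".join(lines)


print(render_item("Person", {"name": "Jane Doe", "telephone": "01234 567890"}))
```

A GUI or template system could sit on top of something like this, offering only the properties that are valid for the chosen type.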

After I’d finished what I needed to do, I decided to expand on it a little and provide a number of helpful functions for working with the microdata standard and the many items and properties it encapsulates. I bunged this all in a class and popped it on GitHub, in the hope that someone, somewhere, might find it useful.

I believe that microdata is going to be a pretty big thing. At work, and in my personal projects, I’m beginning to implement microdata now, so that as Google (and other search engines) add more and more support for it, my rich data will show up in the Google results before everyone else’s. Organic results seem to be pushed further and further down now, with images, videos, maps, product reviews and more taking precedence. I’d like to think this bandwagon would be a good one to jump on.

Check it out on GitHub.
