Greg Heartsfield home

Facebook SemWeb Exporter

This long-overdue post is about a Facebook app I wrote almost a year ago. Since Facebook announced that they were adding a “Download Your Information” feature, there has been renewed discussion about data portability and liberation. Kudos to the Facebook team for making it easier for users to retrieve more of what they have created and entrusted to the service. However, as others have noted, this is but the smallest of steps in the right direction.

Although I do not yet have access to Facebook’s new download function, from what I can gather it gives nothing more than the most basic information about friends. It seems to be useful for little other than exporting photos (and who keeps their only copy of their photos on Facebook anyway?). The social graph is the most valuable asset on Facebook, and it’s a bit of a cruel joke that even a v1.0 Facebook exporter won’t give you your friends’ email addresses.

For some time now I’ve been an RDF geek. I keep reading lists, medical history, a personal profile, project profiles, and now social graph data in RDF. It’s not for everybody, especially if you haven’t heard of SPARQL, but I have found it to be both a powerful and practical system for managing rich, interconnected data. The natural choice for representing friends and social graph data in RDF is the combination of the FOAF and SIOC ontologies. These provide a standard way of representing exactly what we want to get out of Facebook: person information, friend relationships, accounts, groups, and group membership. So my motivation has been to pull this data out of Facebook easily and have it under my own control. And I want it before Facebook does something that prompts me to hit the “Deactivate Account” button.
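To make the FOAF and SIOC combination a little more concrete, here is a minimal sketch of how a person, an account, friends, and group memberships might be written out as N-Triples. It is an illustration only, not the exporter's actual code: the function names, the example.org IRIs, and the choice of `foaf:account`, `sioc:UserAccount`, `sioc:Usergroup`, and `sioc:member_of` as the terms to use are all assumptions made for the example, and the literal escaping is deliberately simplified.

```python
# Illustrative sketch only: names and IRIs here are hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<%s>" % value


def literal(value):
    """Quote a plain string literal (simplified escaping for the sketch)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"%s"' % escaped


def person_triples(person_iri, name, account_iri, friend_iris, group_iris):
    """Describe one person, their account, friends, and group memberships."""
    triples = [
        (iri(person_iri), iri(RDF_TYPE), iri(FOAF + "Person")),
        (iri(person_iri), iri(FOAF + "name"), literal(name)),
        (iri(person_iri), iri(FOAF + "account"), iri(account_iri)),
        (iri(account_iri), iri(RDF_TYPE), iri(SIOC + "UserAccount")),
    ]
    for friend in friend_iris:
        # Friend relationships hang off the person.
        triples.append((iri(person_iri), iri(FOAF + "knows"), iri(friend)))
    for group in group_iris:
        # Group membership hangs off the account.
        triples.append((iri(group), iri(RDF_TYPE), iri(SIOC + "Usergroup")))
        triples.append((iri(account_iri), iri(SIOC + "member_of"), iri(group)))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "".join("%s %s %s .\n" % t for t in triples)


triples = person_triples(
    "http://example.org/people/alice#me",
    "Alice Example",
    "http://example.org/accounts/alice",
    ["http://example.org/people/bob#me"],
    ["http://example.org/groups/rdf-geeks"],
)
print(to_ntriples(triples))
```

Keeping the person and the account as separate resources is the main design point: FOAF describes who someone is and whom they know, while SIOC describes what their account belongs to.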

Frustration with an existing tool for exporting RDF data out of Facebook (which generated invalid RDF/XML) led me to create the Facebook application Semantic Web Exporter. It is Free software, of course, with the source available through a Mercurial repository. I wrote it in Python, using the Web2py and PyFacebook libraries, over the course of a couple of weeks. The Facebook API is a little strange (I’ll be quite happy to never touch it again), but apart from some tricks needed to get a non-proxied download link to the actual RDF data, I’d classify it as only mildly annoying. Thankfully, the parts of the API I touch have been stable over the past 9 months, so it hasn’t been a chore to maintain. My biggest worry was whether Facebook would actually approve it, but they did so in just a few days.
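One common way to end up with invalid RDF/XML is to build the document by pasting strings together, so that a name containing `&` or `<` breaks the markup. As a hedged sketch of the alternative, and not a description of how Semantic Web Exporter itself works, the following builds a small FOAF document with Python's standard `xml.etree.ElementTree`, which handles the escaping. The function name and the example.org IRIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("foaf", FOAF_NS)


def friends_to_rdfxml(owner_iri, friends):
    """Return RDF/XML saying that owner_iri foaf:knows each friend.

    `friends` is a list of (friend_iri, friend_name) pairs.
    """
    root = ET.Element("{%s}RDF" % RDF_NS)
    owner = ET.SubElement(
        root, "{%s}Person" % FOAF_NS, {"{%s}about" % RDF_NS: owner_iri}
    )
    for friend_iri, friend_name in friends:
        knows = ET.SubElement(owner, "{%s}knows" % FOAF_NS)
        friend = ET.SubElement(
            knows, "{%s}Person" % FOAF_NS, {"{%s}about" % RDF_NS: friend_iri}
        )
        name = ET.SubElement(friend, "{%s}name" % FOAF_NS)
        # Assigning to .text lets the library escape special characters.
        name.text = friend_name
    return ET.tostring(root, encoding="unicode")


doc = friends_to_rdfxml(
    "http://example.org/people/alice#me",
    [("http://example.org/people/bob#me", "Bob & <Co>")],
)
print(doc)
```

Because the awkward name is set as element text rather than interpolated into a string, the output stays well-formed XML and parses back to the original value.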

I’ve been pleased with it, and hopefully others have been as well. Almost half a million triples have been exported so far by several hundred people (most of that due to a recent surge in traffic from Twitter). It’s in Facebook’s own best interest to make this type of data available to its users. We (data portability advocates and developers) need to take the initiative and responsibility in this as well. When one-off APIs are offered, we should proactively offer solutions that remove data lock-in. If API access is denied to data portability applications, then we should raise a stink. Until that happens, it’s unfair to say that Facebook is holding anyone’s data hostage when they’ve given us the raw tools necessary to liberate it.

SemWebExporter Screenshot
