About Me

My photo
Web person at the Imperial War Museum, just completed PhD about digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, and usually museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Monday, October 18, 2010

Open Culture 2010 ruminations #1: Linked Data

I just came back from the Europeana plenary conference, Open Culture 2010, in Amsterdam. Before the conference I went to meetings of Working Party 1 (Users) and WP3 (Technical), and at all three gatherings I found myself ruminating on a few key areas: the question of Linked Data and the API; how social media and user generated content relate to the distribution model for Europeana; and the future of the project itself. In this first post I'll look at Linked Data and why I think we need to worry less about some things and more about others that aren't getting much attention; and I'll suggest some analogies etc that we might use to help sell the idea a bit.

Linked (Open) Data was a constant refrain at the meetings (OK, not at the WP1 meeting) and the conference, and two things struck me. Firstly, there’s still lots of emphasis on creating out-bound links and little discussion of the trickier(?) issue of acting as a hub for inbound links, which to my mind is every bit as important. Secondly, there’s a lot of worry about persuading content providers that it’s the right thing to do. Now the very fact that it was a topic of conversation probably means that there really is a challenge there, and it’s worth then taking some time to get our ducks in a row so we can lay out very clearly to providers why it is not going to bring the sky crashing down on their heads.
During a brainstorming session on Linked Data, the table I sat with paid quite a lot of attention to this latter issue of selling the idea to institutions. The problem needs teasing apart, though, because it has several strands – some of which I think have been answered already. We were posed the questions “Is your institution technically ready for Linked Data” and “Does it have a business issue with LD?”, but we wondered if it’s even relevant if the institution is technically ready: Europeana’s technical ability is the question, and it can step into the breach for individual institutions that aren't technically ready yet. With regard to the "business issue" question, one wonders whether such issues are around out-going links, or incoming links? Then, for inbound linkage, is it the actual fact of linkage, or the metadata at the end of the link that are more likely to be problematic? And what are people’s worries about outbound links?
What we resolved it down to in the end was that we expected people would be most worried about (a) their content being “purloined”, and (b) links to poor-quality outside data sources. But how new are these worries? Not new at all, is the answer, and Linked Data really does nothing to make them more likely to be realised, when you think about what we already enable. In fact, there’s a case to be made that not only does LD increase business opportunities but it might also increase organisations’ control over “their” data, and improve the quality of things that are done with it: letting go of your data means people don’t do a snatch-and-grab instead.
Ultimately, I think, Linked Data really doesn’t need a sales effort of its own. If Europeana has won people over to the idea of an API and the letting-go of metadata that it implies, then Linked Data is nothing at all to worry about. What does it add to what the API and HTML pages already do? Two things:
  • A commitment to giving resources a URI (for all intents and purposes, read “stable URL”), which they should have for the HTML representation anyway. In fact, the HTML page could even be at that URI and either contain the necessary data as, say, RDFa in the HTML, or through content negotiation offer it in some purer data format (say, EDM-XML).
  • Links to other data sources to say “this concept/thing is the sameAs that concept/thing”. People or machines can then optionally either say “ah, I know what you mean now”, or go to that resource to learn more. Again, links are as old as the Web and, not to labour the point, are kinda implicit in its name.

So really there’s little reason to worry, especially if the API argument has already been put to bed. However I thought it might be an idea to list some ways in which we can translate the idea of LD so it’s less scary to decision-makers.

  • Remember the traditional link exchange? There’s nothing new in links, and once upon a time we used to try to arrange link exchanges like a babysitting circle or something. We desperately wanted incoming links, so where’s the reason in now saying, “we’re comfortable linking out, but don’t want people linking in to our data”?
  • Linked data as SEO. Organisations go to great lengths to optimise their sites so they fare well in search engine rankings. In other words, we already encourage Google, Bing and the like spider, copy and index our entire websites in the name of making them easier to discover. Now, search is fine, but it would be still better to let people use our content in more places (that’s what the API is about), and Linked Data acts like SEO for applications that could do that: if other resources link to ours, applications will “visit”.
    The other thing here is that we let search engines take our content for analysis, knowing they won’t use it for republication. We should also licence our content for complete ingestion so that applications indexing it can be as powerful as possible.
  • It’s already out there, take control! We let go of our content the moment we put it on the web, and we all know that doing that was not just a good thing, it’s the only right thing. But whilst the only way to use it is cut-n-paste (a) it’s not reused and seen nearly as much as it should be, and (b) it’s completely out of our control, lacking our branding and “authority”, and not feeding people back to us. Paradoxically, if we make it easier to reuse our content our way than it is to cut and paste, we can change this for the better: maintain the link with the rest of our content, keep intellectual ownership, drive people back to us. Helping reuse through linked data and APIs thus potentially gives us more control.
  • Get there first. There is no doubt that if we don’t offer our own records of our things in a reusable form online then bit by bit others will do it for us, and not in the way we might like. Wikipedia/DBPedia is filling up with records of artworks great and small, and will therefore be the reference URIs for many objects.
  • Your objects as context. Linked data lets us surround things/concepts with context;

So if I think fears about LD should something of a non-issue, what do I think are the more important questions we should be worrying about? Basically, it’s all about what’s at the end of the reference URI and what we can let people do with it. Again, it’s really a question as much about the API as it is about Linked Data, but it’s a question Europeana needs to bottom out. How we license the use of data we’re releasing from the bounds of our sites is going to become a hotter area of debate, I reckon, with issues like:

  • Is Europeana itself technically prepared to offer its contents as resources for use in the LD web? Are we ready to offer stable URIs and, where appropriate, indicate the presence of alternative URIs for objects?
  • What entities will Europeana do this for? Is it just objects (relatively simple because they are frequently unique), or is it for concepts and entities that may have URIs elsewhere?
  • What’s the right licence for simple reuse?
  • Does that licence apply to all data fields?
  • Does it apply to all providers’ data?
  • Does it apply to Europeana-generated enrichments?
  • Who (if anyone) gets the attribution for the data? The provider? Aggregator? Disseminator (Europeana)?
  • Do we need to add legal provisions for static downloads of datasets as opposed to dynamic, API-based use of data?

Just to expand a little on the last item, the current nature of semantic web (or SW-like) applications is that the tricky operation of linking the data in your system to that in another isn't often done on the fly: often it happens once and the results ingested and indexed. Doing a SPARQL query over datasets on opposite sides of the Atlantic is a slow business you don’t want to repeat for every transaction, and joining more sets than that is something to avoid. The implication of this is that, if a third party wanted to work with a graph that spread across Europeana and their own dataset, it might be much more practical for them to ingest the relevant part of the Europeana dataset and index and query it locally. This is in contrast to the on-the-fly usage of the metadata which I suspect most people have in mind for the API. Were we be allow data downloads we might wish to add certain conditions to what they could do with the data beyond using it for querying.

In short I think most of the issues around Linked Data and Europeana are just issues around opening the data full stop. LD adds nothing especially problematic beyond what an API throws up, and in fact it's a chance to get some payback for that because it facilitates inbound links. But we need to get our ducks in a row to show organisations that there's little to be worried about and a lot to gain from letting Europeana get on with it.


David Haskiya said...

"In short I think most of the issues around Linked Data and Europeana are just issues around opening the data full stop. "

From a business point of view, yes.

And certainly the point of Europeana publishing Linked Open Data is so that our individual 1500 data providers don't need to! As for are we technically ready? Not right now, but we certainly aim to be.

The word Open in Linked Open Data implies a little bit more openess, than the API. The API will in near time demand an API key to be able to use. So the API-garden will have a gatekeeper for now.

Needing a key or log-in to call a linked object URI though seems silly. Would that really be Linked OPEN Data?

We want those 1500 data providers to be on board. So that's why we had an Open Culture theme to the plenary, linked open data tracks, a risk and rewards track and also hold workshops on the benefits of opening up content etc.

So more good arguments for opening up are needed! As well as good examples of the rewards of open APIs or Linked Open Data brings. There are already many, but we need to make them more known within the GLAM-sector in Europe.

Jeremy said...

Thanks David, you're absolutely right of course. We need to be making those arguments clearly and demonstrating the rewards. I think the main thing I wanted to say was that, in terms of selling LOD to content providers, we shouldn't make a big deal of it as being distinct from the API, because it introduces nothing new that should worry them; to all intents and purposes it's a facet of the API, albeit one that as you say won't require a key (after all, linked data requests are for a single resource rather than a more expensive search) and which promises stable URLs. So I reckon that it's just better to bundle it in with the arguments and business case for the API, rather than split it out and make a rod for our own back!
That doesn't mean that it wasn't worth having the open data track, far from it. Linked data is indeed a vitally important aspect of what we should be doing, and we need to make sure that it is part of the vision, so raising consciousness amongst the partners is still important. The conference let us start to flag up how we might use it to the benefit of Europeana, its stakeholders and the whole linked data web, all very important stuff!

David Haskiya said...

Coming out as a bit of Linked Data scpetic here... I think Linked Data will reach fruition a few years down the line. Basically, the apps are missing. Perhaps because the user tends to be missing in the equation? BBC Wildlife is the only decent end-user service I've seen built on LinkedData. Maybe there are others?

LinkedData and Semantic Web will be a success when we stop talking about them. They need to be under layers of good usability.

A "simple" API though. That you can do things with here and now!

Still, yes, it's all about variations of content re-use.Sometimes I try to put tools for re-use on a scale from simple to complex:


Cite record

Embed record

Search widgets of various shapes and forms

Search APIs

Read-Write APIs

Linked Open Data

After a while I hope we'll have some support across this spectrum.