About Me

Web person at the Imperial War Museum, just completed a PhD about digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, usually about museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Thursday, February 25, 2010

Linked Data meeting at the Collections Trust

[December 2010: I don't even know anymore if this was ever published, or if I simply edited it and it went back into draft. If the latter, duh. If the former, well, in the spirit of an end-of-year clearout here's something I wrote many months ago]
[UPDATE, March 2010: Richard Light's presentation is now available here]

On February 22nd Collections Trust hosted a meeting about Linked Data (LD) at their London Bridge offices. Aside from yours truly and a few other admitted newbies amongst the very diverse set of people in the room, there was a fair amount of experience in LD-related issues, although I think only a few could claim to have actually delivered the genuine article to the real world. We did have two excellent case studies to start discussion, though, with Richard Light and Joe Padfield both taking us through their work. CT's Chief Executive Nick Poole had invited Ross Parry to chair and tasked him with squeezing out of us a set of principles from which CT could start to develop a forward plan for the sector, although it should be noted that they didn’t want to limit things too tightly to the UK museum sector.

In the run-up to the meeting I’d been party to a few LD-related exchanges, but they’d mainly been concentrated into the 140 characters of tweets, which is pragmatic but can be frustrating for all concerned, I think. The result was that the merits, problems, ROI, technical aspects etc of LD sometimes seemed to disappear into a singularity where all the dimensions were mashed into one. For my own sanity, in order to understand the why (as well as the how) of Linked Data, I hoped to see the meeting tease these apart again as the foundation for exploring how LD can serve museums and how museums can serve the world through LD. I was thinking about these as axes for discussion:


  • Creating vs consuming Linked Data
  • End-user (typically, web) vs business, middle-layer or behind-the-scenes user
  • Costs vs benefits. ROI may be thrown about as a single idea, but it’s composed of two things: the investment and the return.
  • On-the-fly use of Linked Data vs ingested or static use of Linked Data
  • Public use vs internal drivers

So I took to this meeting a matchbox full of actual knowledge, a pocket full of confusion and this list of axes of inquiry. In the end the discussion did tread some of these axes whilst others went somewhat neglected, but it was productive in ways I didn’t expect and managed to avoid getting mired in too much technology.

To start us off, Richard Light spoke about his experiments with the Wordsworth Trust’s ModesXML database (his perennial sandbox), taking us through his approach to rendering RDF using established ontologies, to linking with other data nodes on the web (at present I think limited to GeoNames for location data, grabbed on the fly), and to cool URIs and content negotiation. Concerning ontologies, we all know the limitations of Dublin Core, but CIDOC-CRM is problematic in its own way (it’s a framework, after all, not a solution), and Richard posed the question of whether we need any specific “museum” properties, or whether we should even broaden the scope to a “history” property set. He touched on LIDO, a harvesting format, but one well placed to present documents about museum objects, which tries to act as a bridge between North American formats (CDWALite) and European initiatives including CIDOC-CRM and SPECTRUM (LIDO intro here, in depth here (both PDF)). LIDO could be expressed as RDF for LD purposes.
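As a rough illustration of the content negotiation idea Richard described (the URIs, the format mapping and the function here are my own hypothetical sketch, not his implementation), the point of a single "cool" URI is that a server inspects the Accept header and serves a format-specific representation:

```python
# Hypothetical sketch of content negotiation for a "cool" object URI.
# A single URI like /object/1234 resolves to an HTML or RDF
# representation depending on the client's Accept header.

def negotiate(accept_header: str) -> str:
    """Return a representation suffix for a given Accept header.

    Deliberately naive: it scans the comma-separated media types in
    order of appearance and picks the first one we can serve. Real
    servers also honour q-values when ranking alternatives.
    """
    offered = {
        "application/rdf+xml": ".rdf",
        "text/turtle": ".ttl",
        "text/html": ".html",
    }
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in offered:
            return offered[media_type]
    return ".html"  # default for browsers and unknown clients


# A semantic web client asking for RDF/XML gets the RDF document...
print("/object/1234" + negotiate("application/rdf+xml"))  # /object/1234.rdf
# ...while a browser gets the human-readable web page.
print("/object/1234" + negotiate("text/html,application/xhtml+xml"))  # /object/1234.html
```

The attraction is that the same URI can be cited everywhere, by humans and machines alike, with each getting the representation they can use.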

For Richard, the big LD challenges for museums are agreeing an ontology for cross-collection queries via SPARQL; establishing shared URLs for common concepts (people, places, events etc); developing mechanisms for getting URLs into museum data; and getting existing authorities available as LD. Richard has kindly allowed me to upload his presentation Adventures in Linked Data: bringing RDF to the Wordsworth Trust to Slideshare.
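To make the shared-URIs point concrete, here is a minimal and entirely hypothetical Turtle sketch (the object URI and the GeoNames identifier are invented for illustration; only the Dublin Core Terms and GeoNames namespaces are real) of a record that points at a shared place URI rather than a free-text place name:

```turtle
@prefix dc: <http://purl.org/dc/terms/> .

# Hypothetical object URI; the point is that the place is a shared,
# dereferenceable GeoNames URI, not a local string that every
# museum spells differently.
<http://example.org/object/1234>
    dc:title   "Manuscript notebook" ;
    dc:creator "William Wordsworth" ;
    dc:spatial <http://sws.geonames.org/0000000/> .
```

Once two collections both point at the same GeoNames URI, a cross-collection query about that place becomes a join rather than a string-matching exercise.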

Joe Padfield took us through a number of semantic web-based projects he’s worked on at the National Gallery. I’m afraid I was too busy listening to take many notes, but go and ferret out some of his papers from conferences or look here. I did register that he was suggesting 4store as an alternative to Sesame for a triple store; that they use a CRM-based data model; that they have a web prototype built on a SPARQL interface which is damn quick; and that data mining is the key to getting semantic info out of their extensive texts because data entry is a mare. A notable selling point of SW to the “business” is that the system doesn’t break every time you add a new bit of data to the model.
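I won't pretend to know what the National Gallery's SPARQL interface actually looks like, but as a generic sketch (the endpoint URL and query are made up), querying any SPARQL-over-HTTP store such as 4store or Sesame boils down to an HTTP GET with the query carried in a URL parameter:

```python
import urllib.parse

# Hypothetical endpoint; 4store, Sesame and most other triple stores
# expose SPARQL over plain HTTP in essentially this shape.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?painting ?title WHERE {
    ?painting <http://purl.org/dc/terms/title> ?title .
} LIMIT 10
"""

def sparql_get_url(endpoint: str, query: str) -> str:
    """Build the GET URL for a SPARQL query. Result format (SPARQL
    Results XML or JSON) would normally be requested via an Accept
    header on the actual request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})

url = sparql_get_url(ENDPOINT, QUERY)
print(url[:40])  # the query travels URL-encoded in the "query" parameter
```

This simplicity is part of why a SPARQL interface can sit behind a quick web prototype: the store does the joining, and the web layer just sends text and renders results.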

Beyond this, my notes aren’t up to the task of transcribing the discussion, but I will put down here the things that stuck with me, which may be other people's ideas or assertions or my own; I’m often no longer sure!

My thoughts in bullet-y form
I’m now more confident in my personal simplification that LD is basically about an implementation of the Semantic Web “up near the surface”, where regular developers can deploy and consume it. It seems like SW with the “hard stuff” taken out, although it’s far from trivial. It reminds me a lot of microformats (and in fact the two can overlap, I believe) in this surfacing of SW to, or near to, the browsable level that feels more familiar.

Each audience to which LD needs explaining or “selling” will require a different slant. For policy makers and funders, the open data agenda from central government should be enough to encourage them (a) that we have to make our data more readily available and (b) that LD-like outputs should be attached as a condition to more funding; they can also be sold on the efficiency argument of doing more with less, avoiding the duplication of effort and using networked information to make things possible that would otherwise not be. For museum directors and managers, strings attached to funding, the “ethical” argument of open data, the inevitability argument, and the potential for within-institution and within-partnership use of semantic web technology might all be motives for publishing LD, whilst for consuming it we can point to (hopefully) increased efficiency and cost savings, the avoidance of duplication etc. For web developers, for curators and registrars, for collections management system vendors, there are different motives again. But all would benefit from some co-ordination so that there genuinely is a set of services, products and, yes, data upon which museums can start to build their LD-producing and LD-consuming applications.

There was a lot of focus on producing LD but less on consuming it; more than this, there was a lot of focus on producing linkable data, i.e. RDF documents, rather than linking it in some useful fashion. It's a bit like that packaging that says "made of 100% recyclable materials": OK, that's good, but I'd much rather see "made of 100% recycled materials". All angles of attack should be used to encourage museums to get involved. I think that the consumption aspect needs a bit of shouting about, but it could also do with some investment from organisations like Collections Trust that are potentially in a position to develop, certify, recommend, validate or otherwise facilitate LD sources that museums, suppliers etc. will feel they can depend upon. This might be a matter of partnering with Getty, OCLC or Wikipedia/dbPedia to open up or fill in gaps in existing data, or giving a stamp of recommendation to GeoNames or similar sources of referenceable data. Working with CMS vendors to make it easy to use LD in Modes, Mimsy, TMS, KE EMu etc., and in fact make it more efficient than not using LD; now that would make a difference. The benefits depend upon an ecosystem developing, so bootstrapping that is key.

SPARQL: it ain’t exactly inviting. But then again I can’t help but feel that if the data was there, we knew where to find it and had the confidence to use it, more museum web bods like me would give it a whirl. The fact that more people are not taking up the challenge of consuming LD may be partly down to this sort of technical barrier, but may also be to do with feeling that the data are insecure or unreliable. Whilst we can “control” our own data sources and feel confident to build on top of them, we can’t control dbPedia etc., so we lack the confidence to build apps that depend on them (Richard observed that dbPedia contains an awful lot of muddled and wrong data, and Brian Kelly's recent experiment highlighted the same problem). In the few days since the meeting there have been more tweets on this subject, including references to this interesting-looking Google Code project for a Linked Data API to make it simpler to negotiate SPARQL. With Jeni Tennison (who has furnished me with many an XSLT insight and countless code snippets) as an owner, it might actually come to something.
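The appeal of that Linked Data API idea, as I understand it, is that a friendly URL stands in for a SPARQL query. A toy sketch of the translation (the URL scheme, namespaces and query template here are my inventions, not the project's actual design):

```python
def url_to_sparql(path: str) -> str:
    """Translate a friendly path like /doc/person/wordsworth into a
    SPARQL query. Purely illustrative: real Linked Data APIs are
    configuration-driven and far more capable than this."""
    _, _, rdf_type, slug = path.split("/")
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  ?s a <http://example.org/type/{rdf_type}> ;\n"
        f'     <http://example.org/id> "{slug}" ;\n'
        "     ?p ?o .\n"
        "}"
    )

# The developer deals in readable URLs; the SPARQL stays behind the curtain.
print(url_to_sparql("/doc/person/wordsworth"))
```

If something like this took off, the technical barrier would shift from "learn SPARQL" to "fetch a URL", which is territory most of us are already comfortable in.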

Tools for integrating LD into development UIs for normal devs like me – where are they?

If LD in cultural heritage needs mass in order for people to take it up, then as with semantic web tech in general we should not appeal to the public benefit angle but to internal drivers: using LD to address needs in business systems, just as Joe has shown, or between existing partners.

What do we need? Shared ontologies, LD embedded in software, help with finding data sources, someone to build relationships with intermediaries like publishers and broadcasters that might use the LD we could publish.

Outcomes of the meeting
So what did we come up with as a group? Well, Ross chaired a discussion at the end that did result in a set of principles. Hopefully we'll see them written up soon coz I didn't write them down, but they might be legible on these images:

Tuesday, May 20, 2008

To Leuven (expenses paid), but who will pay for EDL?

Tomorrow sees EDL's working group 1 meeting at the Katholieke Universiteit in Leuven, not far from Brussels. Quite exciting to be visiting, albeit fleetingly, a place that played host to Matsys, Bouts, Erasmus and Vesalius, amongst others (not to mention, apparently, the infamous AQ Khan). I'm looking forward to attending, though with no expectation of being able to contribute a lot since this group covers different ground to the one I've worked with up till now. I'm not even sure if I'm part of the group or simply in attendance. Anyway, the meeting will look at progress with Europeana so far, and consider the business issues facing it, particularly how to move to phase 2 (i.e. following the prototype, to be launched in November) and how to build a sustainable future. Working my way through 100-odd pages of reading matter in preparation for the meeting, I'm struck by how big a challenge it will be to find the necessary ongoing resources, but also the fact that they are tackling the problem head-on and examining a wide variety of options, from direct subsidy, through subscription by contributors or users, to corporate partnership or sponsorship.

A significant factor in the search for revenue-raising avenues is the fact that Europeana is not going to be a content owner in any significant way, but rather a broker/facilitator for accessing content owned by others. One possibility that I believe could be worth exploring is some form of partnership with a search provider. Yahoo! may be a bit too distracted to talk at the moment, but along with Google could be a productive partner. Both sides could benefit by working on an interface and aligning their data structures, and EDL could perhaps offer quite a bit to such a partner in terms of preferential access to the semantically enriched data it will hold. This might be directly to do with searching the resources in EDL, or it might be, say, helping to clean up datasets of people and places. In exchange, maybe either some cash or technological assistance? Perhaps some of the semantic-y startups currently taking wing could also be interesting to work with, but they won't be as well resourced. Cultural heritage organisations have a lot of knowledge and context to offer here so maybe there's a business model to be had.

Tuesday, March 11, 2008

Here's that EDLNet presentation and notes

Yesterday I put the EDLNet Paris slideshow onto Slideshare, but since Scribd also lets me put up other stuff I'm putting that and the notes there too. If you can't see these coz I've screwed up the Scribd embed link or something, go here for the presentation and here for the notes.


EDLNet Paris presentation:

Presentation notes:

Read this doc on Scribd: EDL WP3 Presentation Paris 20080304 v2

And just in case...

The previous post with options for API input parameters is also on Scribd (UPDATED 17/3/2008)

Friday, March 07, 2008

EDL WP3 Paris meeting

So, Friday evening, perhaps a few moments to write up (some of) the Paris meeting. I did mean to attach my presentation too but the last version is on a memory stick at home. Too damn portable, never where you need it!
It was a successful meeting, I would say, and it was a pleasure to see a couple of faces I already knew, and meet others for the first time. On Monday (with me still reeling from a 4am start) we were taken through the results of the user testing. These were overwhelmingly positive, which needs to be taken with caution given the guided nature of the demo (especially with the online questionnaire, but also perhaps the expert users and the focus groups). All the same there were criticisms that provided something to get our teeth into, particularly around the home page and the purpose of the "who and what" tab. Search result ordering was an issue, a particularly thorny one in fact that we tackled on Tuesday as best we could. Clearly a lot of users don't really understand tagging, though they thought they liked it. Other plusses were for the timeline and map.
There was a good session with representatives of a French organisation for the blind and visually disabled after lunch (a bloomin' good lunch, in fact. Good wine, too. I love France!). Aside from HTML accessibility they talked extensively about Daisy, and it would be marvellous if some of the text content that may end up there could be daisified. No-one had heard of TEI (or DocBook) but it struck me that these formats are pretty close to what Daisy sounds like, and that there may be TEI material amongst the content we'll be aggregating, so translations to Daisy could be relatively straightforward. Anyone know?
Personalisation took us to the end of the day and we distinguished between activities done for private purposes (though perhaps with public benefits) like bookmarking with tagging, or tailoring search preferences, setting up alerts, or saving searches; and explicitly public activities like enriching content, suggesting and tagging (when not bookmarking). The question of downloads (what? how? assets or data?) and the related issue of licensing came up. I think we worked out that possibly four levels of privacy would be useful, extending the way Flickr and other sites work, with private, public, friends/family, and "share with institutions". The latter is really about saying, I will let me and my mates and the organisation whose objects I'm tagging/annotating look at the data, but not everyone. I think it's important and should be encouraged, as it lets those institutions do interesting stuff with the resulting UGC for everyone's benefit. We ran over into the next day to deal with communities (still plenty to think about there, I would say) and results display, a practical and useful discussion that touched on the fields that might be searched across and how they would be used in ranking.
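If those four privacy levels were ever implemented, the model might be as simple as an ordered enumeration. This is purely my own sketch of the Flickr-plus-institutions idea from the discussion (the names and ordering are mine):

```python
from enum import IntEnum

class Visibility(IntEnum):
    """Hypothetical privacy levels for user-generated content, ordered
    from most to least restricted. SHARE_WITH_INSTITUTIONS is the extra
    level beyond the familiar Flickr-style set: the holding institution
    may see and reuse the annotation, but the general public may not."""
    PRIVATE = 0
    FRIENDS_FAMILY = 1
    SHARE_WITH_INSTITUTIONS = 2
    PUBLIC = 3

def visible_to_institution(level: Visibility) -> bool:
    # An institution can see anything at its threshold level or above.
    return level >= Visibility.SHARE_WITH_INSTITUTIONS

print(visible_to_institution(Visibility.PRIVATE))  # False
print(visible_to_institution(Visibility.PUBLIC))   # True
```

The ordering matters: each level strictly widens the audience, so an access check is a single comparison rather than a tangle of special cases.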
Finally my bit came up. Although Fleur had suggested that I talk for maybe 15-20 minutes to kick off discussions on the API, I, feeling unsure of my ground, prepared pretty thoroughly, with the result that I had material that kept me talking for an hour or more, I think, albeit with some digressions for debating what I was saying. On the whole it went down quite well, I think, but I learned a bit about what I should have added (proper, simple explanations of APIs, and more examples of how they're used) and what I should have left out (a section where, for the sake of completeness, I referred to the management of collection data, which is not part of the public API anyway and is outside the scope of our WP; this led to a digression that was I think still useful, but not to the topic of that moment). And then, seeding the discussion with a use case related to VLEs, we tried to figure out in more detail what functions and parameters would be needed in an API call, and what would be returned. And that, my friends, I will write up shortly; for now I want my dinner. Home calls.