About Me

My photo
Web person at the Imperial War Museum; just completed a PhD on digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, usually museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Thursday, February 28, 2008

Tim Berners-Lee talks to Talis, namechecks museums (generally)

TB-L just did an interview with Paul Miller of the estimable Talis (transcript), talking inevitably about the Semantic Web. I've yet to read it all, but it was nice to see that in the first paragraph of his answer to the first question, he raised museums and libraries:
I think the Semantic Web is such a broad set of technologies and is going to do
so many different things for different people. It is really difficult to put it
on one thing. What are the steps necessary right now for the life sciences
community to be able to use it for their data about proteins is probably
different from which steps do we need to be able to get interoperability between
repositories of library data and museum data.
Which is true enough, and not exactly controversial. Good to have it from the Don, though. It's all about repositories in this vision, which is fine as far as it goes, but I'm not clear that it gets us all the way. TB-L points out that the common worry that the SW means marking up HTML pages reflects only a small part of the picture, and that really the bulk of the SW effort will be about databases (remember, I'm skimming!).
My feeling at the moment is that the semantic web is building, but in good part through the efforts of those who are bypassing the "classical" technology. Inferred semantics are crucial in this phase, at least for the businesses that seem to be making some of the most interesting stuff. RDF and OWL are bit players here, although they have much more of a role in dealing with data that's already nicely structured. As TB-L points out, the steps towards the full SW do have payoffs on the way (as they must, not least in our own sector), and data integration is just such a self-rewarding step:
In fact, the gain from the Semantic Web comes much before that. So maybe we
should have written about enterprise and intra-enterprise data integration and
scientific data integration. So, I think, data integration is the name of the
game. That's happening, it's showing benefits. Public data as well; public data
is happening and it is providing the fodder for all kinds of mashups.

On the public side, light and loosely-coupled stuff is giving us (and will keep giving us) payoffs at lower cost than going hardcore SW, and yet provides useful stepping stones on the way. Microformats, public APIs etc.: lowish cost, relatively immediate reward (potentially).
One more quote, and then I'll have to stop even skimming. This is about what to say to a CIO who wants to understand what the SW could do for their company, but it applies to museum people too:

"Well you should take an inventory of what you have got in the way of data and
you should think about how valuable each piece of data in the company would be
if it were available to other people across the company, or if it were available
publicly, and if it were available to your partners."

And then, you should make a list of these things and tackle them in order. You should make sure you don't change the way any of your data is existing, is managed, so you don't mess up the existing systems and so on.

He then goes on to talk about the developing technological picture including SPARQL, GRDDL and the rest, and from that point on I need to read a lot more attentively....

Wednesday, February 27, 2008

The EDL API debate - Museum Computer Group thread

Recently I kicked off a debate on the MCG mailing list (archive here, check out the February 2008 threads "APIs and EDL" and "API use-cases"). It was really productive. Inevitably the debate strayed beyond the strict bounds of considering the relevance of an API to EDL, and the functionality that it might include. Quite a bit of scepticism was heard concerning the whole project, and there was much debate around the barriers to participation, especially the generation of content and its publication through OAI gateways. In preparation for the WP3 meeting next week, I spent some time yesterday collating and summarising the discussion, which I'm posting below. I did receive some off-list responses and queries, which are not included here.
For those in a hurry, the quick summary of recommendations for an API is this:
  • be “'open', feature-rich and based on established and agreed metadata models/standards/schemas that allow multiple sources and minimise data loss.”
  • feature most of the functionality that can be accessed from the back-end
  • include terms and conditions that specifically require that UGC be flexible enough to allow any reuse with attribution
  • include a key to enable differentiated access to services for different types of users
  • enable the addition of “crowd-sourced” user-generated metadata
  • be lightweight, using REST, XML and possibly RSS and JSON
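To make the recommendations above a little more concrete, here's a minimal sketch of what a lightweight, key-carrying REST call and its XML payload might look like. The endpoint, parameter names and response schema are all invented for illustration; nothing like this has been specified for EDL.

```python
# Hypothetical sketch of a lightweight REST-style request to an EDL-like API.
# Endpoint, parameters and response schema are invented for illustration.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def build_search_url(base, api_key, query, fmt="xml"):
    """Compose a REST query string; the key enables differentiated access."""
    return base + "?" + urlencode({"key": api_key, "q": query, "format": fmt})

url = build_search_url("http://api.example.org/edl/search",
                       "DEMO-KEY", "roman pottery")

# A response in this style might look like the sample below.
sample_response = """<results total="2">
  <record id="mol-101"><title>Samian ware bowl</title></record>
  <record id="mol-102"><title>Amphora fragment</title></record>
</results>"""

root = ET.fromstring(sample_response)
titles = [r.findtext("title") for r in root.findall("record")]
print(titles)  # → ['Samian ware bowl', 'Amphora fragment']
```

The point of the key parameter is the fourth bullet above: the same endpoint can serve different types of users at different levels of access.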

I'm still extremely interested in any more opinions on the whys and hows of an API for EDL (or even, more generally, for any digital resource built for a museum) so please do comment or e-mail me if you have anything to add.

Summary of MCG EDL/API thread
Jeremy Ottevanger, web developer, Museum of London
Tehmina Goskar
David Dawson, Senior Policy Adviser (Digital Futures), MLA
Mike Ellis, Solutions Architect, Eduserv
Martyn Farrows, Director, Lexara Ltd
Dr John Faithfull, Hunterian Museum, University of Glasgow
Sebastian Chan, Manager, Web Services, Powerhouse Museum
Nick Poole, Chief Executive, MDA
Terry Makewell, Technical Manager, National Museums Online Learning Project
Robert Bud, Science Museum
Matthew Cock, Head of Web, The British Museum
Douglas Tudhope, Professor, Faculty of Advanced Technology, University of Glamorgan
Kate Fernie, MLA
Trevor Reynolds, Collections Registrar, English Heritage
Dylan Edgar, London Hub ICT Development Officer
Joe Cutting, consultant (ex-NMSI)
Richard Light, SGML/XML & Museum Information Consultancy (DCMI & SPECTRUM contributor, developer of MODES)
Ian Rowson, General Manager, ADLIB Information Systems
Graham Turnbull, Head of Education & Editorial, Scran
Frankie Roberto, Science Museum, London

The discussion kicked off with an introduction to EDL from JO, and a request for responses to the idea of an API for it, specifically:

  • whether and why an API would be useful to them, or influence their decision on whether to contribute content to EDL
  • what features might prove useful
  • any examples of APIs or of their application that they think provide a model for what EDL's API could offer or enable

A second e-mail followed, offering some possible use cases for museums, libraries and archives; for strategic bodies; and for third parties.
Responses fell into three main (interconnected) strands:

  • attempting to understand the role and purpose of EDL itself, and debating the value of participation
  • problems relating to the practicalities of cataloguing and digitisation of collections, and the publication/aggregation of the data
  • the API question

As well as providing useful ideas in respect of an API, the discussion made it clear that in the UK at least there is a need for some public relations work to be done to make the case for EDL, to explain its use for museums and to demonstrate that it will be doing something genuinely new and valuable. Barriers need to be as low as possible, and payoffs immediate and demonstrable. There are alternative routes to securing contributors: coercion, with funding made dependent upon participation, or a backdoor route in which content aggregated for other purposes is submitted by aggregators. But ensuring institutional buy-in will be the best route to success and will garner the most support. As Nick Poole (NP) himself stated:

The real question, to my mind, is whether museums perceive enough value in participating in something like the EDL to be worth the time it takes to get
involved. People have been burned in the past by services such as Cornucopia
which have tended to be relatively resource-intensive, but with little direct
payoff for individual museums - I'm not surprised people are sceptical.


Questions included how EDL would fit in with existing EU and UK projects
such as MICHAEL, Cornucopia, and the People's Network Discovery Service. David
Dawson (DD) offered a detailed overview of its position in this network.

Cataloguing and other barriers

As John Faithfull (JF) expressed it:

I think that the current lack of killer "one stop" apps in the museum sector
is not so much due to lack of projects, technologies, or even standards, but
lack of available basic collection content for them to work with.

While supportive of APIs, he felt that it was the lack of online collection data that was the main problem. Infrastructural problems, such as access to a web server to enable automatic content harvesting in a sustainable fashion, were a big challenge. Nevertheless, he suggested that "the amount [not] publicly available online is bizarre, bewildering and indefensible, given how technically simple the basic task has been for a long time." Getting even flawed records out there is great for users (a point supported by Matthew Cock), though Robert Bud objected that this might just add "noise" and confusion to the internet.

NP also argued that shiny front ends tended to get financial priority over sorting out the data, but that we should get on and make the best of what we have (EDL being one means). He also felt that curators often put up resistance to getting their data online.

DD explained the planned architecture for content aggregation, which led to a discussion of software capable of acting as an OAI gateway, Trevor Reynolds pointing out that implementing an OAI gateway is not necessarily that simple. Richard Light (RL), Graham Turnbull, Ian Rowson and DD pointed to various products that do or might offer OAI servers (Modes, Scran-in-a-box, Adlib, MimsyXG and possibly others). NP indicated, too, that the solution should not be oriented at one service (EDL) or one protocol, but should be multilingual, and “the burden of responsibility has to be shifted onto the services themselves to ensure that they capture and preserve as much of the value in the underlying datasets as possible.”
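The harvesting step under discussion can be sketched fairly concretely, because the OAI-PMH verbs and parameters are standardised; the repository address and the sample record below are made up.

```python
# A minimal sketch of the OAI-PMH harvesting step: an aggregator issues a
# ListRecords request and reads Dublin Core titles from the response.
# The repository URL is hypothetical; verb/parameter names are standard OAI-PMH.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def list_records_url(repository_base):
    return repository_base + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": "oai_dc"})

url = list_records_url("http://museum.example.org/oai")

# Trimmed sample of what a repository would return:
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <dc xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Flint hand-axe</dc:title>
      </dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

ns = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}
root = ET.fromstring(sample)
titles = [t.text for t in root.findall(".//dc:title", ns)]
print(titles)  # → ['Flint hand-axe']
```

The hard part, as Trevor Reynolds noted, isn't the request syntax but getting a collections management system to expose such an endpoint at all.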

Dylan Edgar pointed to the need to measure or demonstrate impact, if only in order to get funding, whilst DD reminded us that Renaissance and Designation funding, at least, came with a requirement to make metadata available to the PNDS.

An API for EDL

Mike Ellis (ME) argued that:

The notion of an API in *any* content-rich application should be moving not
only in our sphere of knowledge ("I know what an API is") but *fast* into our
sphere of requirement ("give me an API or I won't play")…
…EDL should have a feature-rich API. A good rule of thumb for this
functionality is to ask: "how much of what can be done by back-end and
developer built web systems can be done and accessed via the API?" In an
ideal world it'd be 100%. If it's 0 then run away, fast!

Applications must give us “easy, programmatic access into our data”.

Lexara’s Martyn Farrows made the case, from experience in the commercial software sector, that any API should be “'open', feature-rich and based on established and agreed metadata models/standards/schemas that allow multiple sources and minimise data loss.”

Sebastian Chan suggested that APIs may be “a *practical* alternative to the never ending (dis)agreement on 'standards'.” He suggested an API key to manage security levels and access to different services for various types of users. With regard to user generated content:

it would be prudent to have a T&C that specifically requires that UGC be
flexible enough to allow any reuse with attribution. (A CC with attribution
license may be a good option).

NP pointed out that in the cultural heritage sector the APIs of recent years have generally been one way i.e. enabling content aggregation. There is a need for evidence of the value that this returns to the content provider, in exchange for the cost of participation. He suggested that opening up the content to third parties is no different: the value is not gained directly by the content provider, and the cost of providing something adequate to all uses is probably too high. He wondered if therefore an API might be inbound as well as outbound, to allow “crowd-sourcing” of value-adding metadata creation.

JF was sceptical of the idea of working with an application housing his institution’s data, at least if this meant another obligation (providing the data):

We need stuff that makes everything easier/cheaper/faster/better rather than
having extra things to do, at extra cost.

He pointed out that the Hunterian can already do all that they wish with their own data, and doubted that any central initiative could offer much to help them add to their capacity.

Joe Cutting (JC) suggested as his main use-case the creation of exhibition displays and interactives. He indicated the problems such applications can have, such as copyright, data integrity, completeness and validity, and service level. His recommendations could well inform an API for EDL.

In terms of technology, ME argued for "lightweight every step of the way", meaning widespread and simple technology. REST and XML (perhaps RSS too) were his preferences, rather than SOAP or JSON, a view JC backed up. RL added the proviso that the XML should conform to a community-agreed application (for example the SPECTRUM interchange format). Frankie Roberto argued for both XML and JSON, since the latter has advantages for data exchange and for overcoming cross-site security issues with JavaScript.
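A tiny sketch of the XML-versus-JSON point (the record fields are invented): the same record can be serialised both ways, with the JSON optionally wrapped in a callback, which at the time was the usual JSONP dodge around cross-site restrictions in browser JavaScript.

```python
# Sketch of serving the same (invented) record as both JSON and XML.
# The JSONP wrapper shows the historical trick for cross-site JavaScript use.
import json
import xml.etree.ElementTree as ET

record = {"id": "mol-101", "title": "Samian ware bowl", "creator": "Unknown"}

# JSON serialisation, plus a JSONP wrapper for cross-site callers:
as_json = json.dumps(record)
as_jsonp = "handleRecord(" + as_json + ");"

# Equivalent XML serialisation:
root = ET.Element("record", id=record["id"])
for field in ("title", "creator"):
    ET.SubElement(root, field).text = record[field]
as_xml = ET.tostring(root, encoding="unicode")

print(as_jsonp)
print(as_xml)
```

Serving both from one back-end costs little, which is why offering the pair was an easy recommendation.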

Friday, February 22, 2008

What have I been doing?

Well, more than I realised perhaps. It's felt both hectic and sometimes unproductive lately but I have moved forwards on quite a few things, though a couple of thorns remain firmly embedded in my side. This has been a week of payoffs for the previous few, as things come to fruition. In no particular order, in fact in random chaos because I'm in a rush, here's some stuff I've been busy with.
The discussion I kicked off on the MCG list about APIs and the EDL was really stimulating, though I never managed to find the time to reply to several of the interesting responses I had. There was interesting feedback on the idea of the EDL itself, or any effort at centralising content. There were also plenty of thoughts about the characteristics of a good API and what its uses might be, plus suggestions of examples. I now need to use these in preparing a talk for the EDL WP3 meeting in Paris on 3-4 March. I've been working on this quite hard, well, sort of, in between the rest, and my own ideas about APIs and indeed the Semantic Web have been evolving a little. Preparation of my survey questionnaire is another area of (out-of-hours) activity.
What else? I spent three days preparing an out-of-print guide to pottery fabrics for the web. I started with a PDF of the 200-page Quark document from 10 years ago, exported XML, did some cleaning by hand, passed it through three sets of transformations to structure the data and then on to TEILite, and adapted another transformation to display this as HTML. It's not perfect yet and we need to hook all the images into it, as well as proofread etc., but it was a good proof of how one can take this content and semi-automate the creation of a nicely structured version. The plan had been to database it, and I was just going to get structured data on each fabric out, but what I ended up doing seemed so close to that that I went ahead. It felt good.
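As a toy illustration of the restructuring stage (the real pipeline chained XSLT transformations; the element names here are invented, not the actual Quark export or TEILite tags used), flat export XML can be rebuilt into a TEILite-style hierarchy like this:

```python
# Toy sketch: restructure flat, hand-cleaned export XML into a TEILite-style
# div/head/p hierarchy. Element names are illustrative, not the real schema.
import xml.etree.ElementTree as ET

flat = ET.fromstring("""<export>
  <heading>SAM: Samian ware</heading>
  <para>Fine red-slipped Roman tableware.</para>
</export>""")

# Build the nested TEI-flavoured structure from the flat fields.
tei = ET.Element("div", type="fabric")
ET.SubElement(tei, "head").text = flat.findtext("heading")
ET.SubElement(tei, "p").text = flat.findtext("para")

print(ET.tostring(tei, encoding="unicode"))
```

One transformation like this per stage, chained together, is essentially what the three-step pipeline did before the final HTML rendering.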
Then there was the rest:
  • more preparation for moving the events database out of Oracle and into SQL Server
  • refined some ideas with the help of the microformats list (in short, sod that, it's going to be eRDF for me or something like it)
  • met Mike Ellis to talk about the ways that Eduserv can work with museums, gab about microformats and all the rest
  • talked with a consultant we have working on the Port of London Archives, held at Museum in Docklands, about IT requirements, OAI-PMH and so on (followed up today with a grand assembly of all his interviewees, very interesting)
  • worked on the London's Burning site for key stage 1 kids (more fancy-pants XSLT for me, but the good bit will be a great interactive by ON 101)
  • worked on integrating a new game, Family Favourites, soon to be launched at MiD
  • got RSS feeds (and other XML sources) drawn into our site server-side (not in a public place yet, but working), and along the way rediscovered a baby REST interface I'd built in the autumn. At present this just lets you search the MoLAS publications, but events and collections are but a short way off if I get a few minutes
  • advised on the procurement of a new map application for the London Sugar and Slavery gallery, although we only just made one. Still, there's money to pour away and it must be poured before the end of the financial year. Damn that financial year, it's the bane of my life right now!
  • tried to get this blasted generic timeline project back on track. Happily, today I had a meeting where we got ourselves a plan of action, and just a moment ago I spoke to the designer about the next steps, so now I hope there's time to get it done before, you guessed it, the end of the financial year...

Quick note: Waibel and Godby reporting on MCN 2007 (just published in Ariadne) made some interesting observations, including: "During the public CDWA Lite Advisory Committee meeting, Inge Stein (Konrad-Zuse Zentrum für Informationstechnik, Berlin) presented on Museumdat [4], a harmonisation of CDWA (Categories for the Description of Works of Art) Lite with the CIDOC (Committee on Documentation of the International Council of Museums) Conceptual Reference Model. One of the motivations for this effort: with a small amount of changes, CDWA Lite could be used for all objects across the cultural heritage spectrum, whereas currently it is optimised exclusively for fine art. All delegates agreed that the changes proposed by the German museum community should form the basis for the next version of CDWA Lite. Monika Hagedorn-Saupe (Institute for Museums Studies, Germany), Inge Stein and the Advisory Committee agreed that a single international version of the standard would be desirable."
"The Town Hall Meeting on Intellectual Property: Museum Image Licensing – The Next Generation provoked a lively debate, with many points of view represented by both presenters and delegates, and little evidence of an emerging consensus around the business model for sustaining digital image provision: the room seemed divided between those who feel that the museum community can make the most impact in our information economy by providing open access whenever legally possible, and those who favour business models of cost recovery or even revenue generation. "

Thursday, February 21, 2008

Getting the Semantic Web to the level of the human author

Authoring tools for the Semantic Web are a problem, addressed thoughtfully here.

Wednesday, February 20, 2008

MLA reorganisation outlined

24 Hour Museum (looks like it's still called that) says that the MLA Board has now announced its plans for the imminent reorganisation.
So now we can see the plan. Well, sort of. I can't really fathom too much from the management rhetoric in here, except that they're being obliged to slim down, reshaping (for what is this, the fifth time in a decade?) and things are going to be tight. However it may all work out well, at least insofar as making it more comprehensible and transparent to outsiders. I for one have found the structure of the MLA and its agencies confusing, and this goes for the Hubs too - we may be the lead member of one such, but I still find the relationship between Hub, partners and MLA(regionalagencyhere) to be, um, enigmatic.
Hopefully the stuff about "finding new ways to share information in a digital age" is a good sign. MLA has good people on board and perhaps they'll work more tightly with frontline museum folk if their own resources are more limited. Or not - they may have less time for consultation. Time will tell.

Tuesday, February 19, 2008

Listening Post at the Science Museum

I like the sound of this (no pun intended): a fascinating new artwork at the Science Museum, although this is not its first stop: apparently this piece was launched in 2002 and recently purchased for £120k+. It will be supported until 2010, and draws upon live internet chat, using screens and synthesised voice. I need to talk to the NMSI team about the arrangements they have for supporting this, and what will happen after 2010.

Thursday, February 14, 2008

The USA's EDL? Prob'ly not

Günter Waibel writes on Hanging Together about a Mellon-funded Museum Data Exchange project that shares some aims and problems with EDL. OK, they're not the same thing at all, but all the same the challenges they face in terms of consolidating data in one place may offer a useful comparison, and I'm presuming that in due course they may be offering a public interface to this aggregation. Will it flourish, will it bring in other institutions and might it one day touch edges with EDL? This is simply an experiment at the moment, of course, but an interesting one. I'd like to see how CDWA Lite works in this context as opposed to OAI-PMH.

Tuesday, February 12, 2008

Europeana demo site launched. It's cool.

Well, here we go then. The retitled EDLNet maquette is now known as Europeana and has gone live. Looks good, and plenty of functionality too. I was a bit taken aback by how cool it actually is. There's still clearly a case for distributing the functionality, but this could become a pretty attractive destination. Congratulations to those involved!

Thursday, February 07, 2008

I think I get Dapper now

I hadn't really twigged what Dapper was all about, but now with Marshall Kirkpatrick's latest post it's a lot clearer, as is how they're hoping to propagate it and make money. Cunning! And yet again an example of kind of retro-fitting the semantic web to the infinite variety of content already out there.

Wednesday, February 06, 2008


One for the top-down semantic web: Reuters' new API, OpenCalais, which uses AI, NLP and a mahoosive database of people, places, organisations and events to extract semantics from content submitted to it. You can see more about the API here and read the RWW analysis here. It spits out RDF, actually, demonstrating the overlap between the automated, AI-driven SW that is clearly going to happen and the formal RDF-based vision, which is going to be a real force.