About Me

Web person at the Imperial War Museum, just completed a PhD on digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, usually museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Wednesday, February 27, 2008

The EDL API debate - Museum Computer Group thread

Recently I kicked off a debate on the MCG mailing list (archive here, check out the February 2008 threads "APIs and EDL" and "API use-cases"). It was really productive. Inevitably the debate strayed beyond the strict bounds of considering the relevance of an API to EDL, and the functionality that it might include. Quite a bit of scepticism was heard concerning the whole project, and there was much debate around the barriers to participation, especially the generation of content and its publication through OAI gateways. In preparation for the WP3 meeting next week, I spent some time yesterday collating and summarising the discussion, which I'm posting below. I did receive some off-list responses and queries, which are not included here.
For those in a hurry, the quick summary of recommendations for an API is this:
  • be “'open', feature-rich and based on established and agreed metadata models/standards/schemas that allow multiple sources and minimise data loss.”
  • feature most of the functionality that can be accessed from the back-end
  • include terms and conditions that specifically require that UGC be flexible enough to allow any reuse with attribution
  • include a key to enable differentiated access to services for different types of users
  • enable the addition of “crowd-sourced” user-generated metadata
  • be lightweight, using REST, XML and possibly RSS and JSON

I'm still extremely interested in any more opinions on the whys and hows of an API for EDL (or even, more generally, for any digital resource built for a museum) so please do comment or e-mail me if you have anything to add.

Summary of MCG EDL/API thread
Jeremy Ottevanger, web developer, Museum of London
Tehmina Goskar
David Dawson, Senior Policy Adviser (Digital Futures), MLA
Mike Ellis, Solutions Architect, Eduserv
Martyn Farrows, Director, Lexara Ltd
Dr John Faithfull, Hunterian Museum, University of Glasgow
Sebastian Chan, Manager, Web Services, Powerhouse Museum
Nick Poole, Chief Executive, MDA
Terry Makewell, Technical Manager, National Museums Online Learning Project
Robert Bud, Science Museum
Matthew Cock, Head of Web, The British Museum
Douglas Tudhope, Professor, Faculty of Advanced Technology University of Glamorgan
Kate Fernie, MLA
Trevor Reynolds, Collections Registrar, English Heritage
Dylan Edgar, London Hub ICT Development Officer
Joe Cutting, consultant (ex-NMSI)
Richard Light, SGML/XML & Museum Information Consultancy (DCMI & SPECTRUM contributor, developer of MODES)
Ian Rowson, General Manager, ADLIB Information Systems
Graham Turnbull, Head of Education & Editorial, Scran
Frankie Roberto, Science Museum, London

The discussion kicked off with an introduction to EDL from JO, and a request for responses to the idea of an API for it, specifically:

  • whether and why an API would be useful to them, or influence their decision on whether to contribute content to EDL
  • what features might prove useful
  • any examples of APIs or of their application that they think provide a model for what EDL's API could offer or enable

A second e-mail followed, offering some possible use cases for museums, libraries and archives; for strategic bodies; and for third parties.
Responses fell into three main (interconnected) strands:

  • attempting to understand the role and purpose of EDL itself, and debating the value of participation
  • problems relating to the practicalities of cataloguing and digitisation of collections, and the publication/aggregation of the data
  • the API question

As well as providing useful ideas in respect of an API, the discussion made it clear that in the UK at least there is a need for some public relations work to make the case for EDL, to explain its use for museums and to demonstrate that it will be doing something genuinely new and valuable. Barriers need to be as low as possible, and payoffs immediate and demonstrable. There are alternative routes to securing contributors: coercion, by making funding dependent upon participation; or a backdoor route wherein content aggregated for other purposes is submitted by the aggregators. But ensuring institutional buy-in will be the best route to success and will garner the most support. As Nick Poole (NP) himself stated:

The real question, to my mind, is whether museums perceive enough value in participating in something like the EDL to be worth the time it takes to get involved. People have been burned in the past by services such as Cornucopia which have tended to be relatively resource-intensive, but with little direct payoff for individual museums - I'm not surprised people are sceptical.

Questions included how EDL would fit in with existing EU and UK projects such as MICHAEL, Cornucopia, and the People’s Network Discover Service. David Dawson (DD) offered a detailed overview of its position in this network.

Cataloguing and other barriers

As John Faithfull (JF) expressed it:

I think that the current lack of killer "one stop" apps in the museum sector
is not so much due to lack of projects, technologies, or even standards, but
lack of available basic collection content for them to work with.

While supportive of APIs, he felt that it was the lack of online collection data that was the main problem. Infrastructural problems, such as access to a web server to enable automatic content harvesting in a sustainable fashion, were a big challenge. Nevertheless, he suggested that “the amount publicly available online is bizarre, bewildering and indefensible, given how technically simple the basic task has been for a long time.” Getting even flawed records out there is great for users (a point supported by Matthew Cock). Robert Bud objected that releasing flawed records might simply add “noise” and confusion to the internet.

NP also argued that shiny front ends tended to get financial priority over sorting out the data, but that we should get on and make the best of what we have (EDL being one means). He also felt that curators often put up resistance to getting their data online.

DD explained the planned architecture for content aggregation, which led to a discussion of software capable of acting as an OAI gateway, with Trevor Reynolds pointing out that implementing an OAI gateway is not necessarily that simple. Richard Light (RL), Graham Turnbull, Ian Rowson and DD pointed to various products that do or might offer OAI servers (Modes, Scran-in-a-box, Adlib, MimsyXG and possibly others). NP indicated, too, that the solution should not be oriented at one service (EDL) or one protocol, but should be multilingual, and that “the burden of responsibility has to be shifted onto the services themselves to ensure that they capture and preserve as much of the value in the underlying datasets as possible.”
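
For readers unfamiliar with what an OAI gateway actually exposes, the protocol is just HTTP GET with a handful of query parameters: a harvester (here, EDL's aggregator) asks a museum's endpoint for batches of metadata records and follows a resumptionToken until the set is exhausted. A minimal sketch of building such a request, with an invented base URL purely for illustration:

```python
from urllib.parse import urlencode

# Hypothetical OAI-PMH endpoint for a museum's collections data;
# the URL is an assumption for illustration, not a real service.
BASE_URL = "https://example-museum.org/oai"

def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    On the first request the harvester names the metadata format it
    wants (oai_dc, simple Dublin Core, is the format every gateway
    must support); on follow-up requests it passes only the
    resumptionToken returned by the previous batch.
    """
    params = {"verb": "ListRecords"}
    if resumption_token:
        # Per the OAI-PMH spec, resumptionToken is an exclusive argument.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
    return f"{base_url}?{urlencode(params)}"

print(build_list_records_url(BASE_URL))
# https://example-museum.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

The point TR makes stands, though: the hard part is not issuing these requests but serving them correctly against a live collections database.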

Dylan Edgar pointed to the need to measure or demonstrate impact, if only in order to get funding, whilst DD reminded us that Renaissance and Designation funding, at least, came with a requirement to make metadata available to the PNDS.

An API for EDL

Mike Ellis (ME) argued that:

The notion of an API in *any* content-rich application should be moving not only in our sphere of knowledge ("I know what an API is") but *fast* into our sphere of requirement ("give me an API or I won't play")…
…EDL should have a feature-rich API. A good rule of thumb for this functionality is to ask: "how much of what can be done by back-end and developer built web systems can be done and accessed via the API?" In an ideal world it'd be 100%. If it's 0 then run away, fast!

Applications must give us “easy, programmatic access into our data”.

Lexara’s Martyn Farrows made the case, from experience in the commercial software sector, that any API should be “'open', feature-rich and based on established and agreed metadata models/standards/schemas that allow multiple sources and minimise data loss.”

Sebastian Chan suggested that APIs may be “a *practical* alternative to the never ending (dis)agreement on 'standards'.” He suggested an API key to manage security levels and access to different services for various types of users. With regard to user generated content:

it would be prudent to have a T&C that specifically requires that UGC be
flexible enough to allow any reuse with attribution. (A CC with attribution
license may be a good option).
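
SC's API-key suggestion is easy to picture in code: the key identifies the caller and maps to a service tier with its own permissions. The key values and tier names below are invented for illustration; a minimal sketch:

```python
# Hypothetical key registry: in practice this would live in a database
# behind the API, with keys issued on registration.
API_KEYS = {
    "abc123": "public",   # anonymous mash-ups: read-only access
    "def456": "partner",  # contributing institutions: full access
}

TIER_PERMISSIONS = {
    "public": {"search"},
    "partner": {"search", "harvest", "submit_ugc"},
}

def is_allowed(api_key, operation):
    """Return True if the key's tier grants the requested operation."""
    tier = API_KEYS.get(api_key)
    if tier is None:
        return False  # unknown key: deny everything
    return operation in TIER_PERMISSIONS[tier]
```

Differentiated access like this is also what would let an API stay "open" at the base tier while reserving heavier services (bulk harvesting, writes) for identified users.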

NP pointed out that in the cultural heritage sector the APIs of recent years have generally been one-way, i.e. enabling content aggregation. There is a need for evidence of the value that this returns to the content provider, in exchange for the cost of participation. He suggested that opening up the content to third parties is no different: the value is not gained directly by the content provider, and the cost of providing something adequate to all uses is probably too high. He therefore wondered whether an API might be inbound as well as outbound, to allow “crowd-sourcing” of value-adding metadata creation.
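
NP's inbound direction can be sketched as an endpoint that accepts user-contributed tags against an aggregated record, keeping attribution so that reuse terms (e.g. the CC-BY licence SC suggests) can be honoured. Everything here (record identifiers, the in-memory store) is a hypothetical illustration:

```python
# In-memory stand-in for the aggregator's store of contributed metadata.
user_metadata = {}  # record_id -> list of {"tag", "contributor"} dicts

def submit_tag(record_id, tag, contributor):
    """Attach a user-contributed tag to a record, with attribution.

    Normalises the tag (trim, lowercase) so crowd-sourced terms can be
    matched against each other; rejects empty submissions.
    """
    tag = tag.strip().lower()
    if not tag:
        raise ValueError("empty tag")
    user_metadata.setdefault(record_id, []).append(
        {"tag": tag, "contributor": contributor}
    )
    return user_metadata[record_id]

submit_tag("object:1903-71", "Steam Engine", "jottevanger")
```

The interesting design question NP raises is upstream of the code: whether contributed metadata flows back to the originating institution or stays with the aggregator.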

JF was sceptical of the idea of working with an application housing his institution’s data, at least if this meant another obligation (providing the data):

We need stuff that makes everything easier/cheaper/faster/better rather than
having extra things to do, at extra cost.

He pointed out that the Hunterian can already do all that they wish with their own data, and doubted that any central initiative could offer much to help them add to their capacity.

Joe Cutting (JC) suggested as his main use-case the creation of exhibition displays and interactives. He indicated the problems such applications can have, such as copyright, data integrity, completeness and validity, and service level. His recommendations could well inform an API for EDL.

In terms of technology, ME argued for “lightweight every step of the way”, meaning widespread and simple technology. REST and XML (perhaps RSS too) were his preferences, rather than SOAP or JSON, a position JC backed up. RL added the proviso that the XML should conform to a community-agreed application profile (for example the SPECTRUM interchange format). Frankie Roberto argued for both XML and JSON, since the latter has advantages for data exchange and for overcoming cross-site security issues with JavaScript.
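
Offering both formats FR mentions costs very little once the record exists as structured data: the same record can be serialised to XML for standards-based interchange and to JSON for easy consumption from JavaScript. A minimal sketch with an invented record and element names:

```python
import json
from xml.etree.ElementTree import Element, SubElement, tostring

# Invented collection record, for illustration only.
record = {"id": "obj-42", "title": "Roman oil lamp", "museum": "Example Museum"}

def to_json(rec):
    """JSON view: what a browser-side mash-up would consume."""
    return json.dumps(rec)

def to_xml(rec):
    """Flat XML view: one child element per field. A real service would
    use an agreed schema (e.g. SPECTRUM interchange) rather than raw
    field names like these."""
    root = Element("record")
    for key, value in rec.items():
        SubElement(root, key).text = value
    return tostring(root, encoding="unicode")

print(to_json(record))
print(to_xml(record))
```

Lightweight in ME's sense: no SOAP envelope, no WSDL, just HTTP plus whichever of these representations the caller asks for.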