The Doofer Call: digital heritage


Saturday, June 25, 2011

The LOD-LAM star system....

....is not in a galaxy far far away, although I believe it came out of San Francisco so that may not be too much of a stretch. The recent LOD-LAM workshop there on the question of Linked Open Data in Libraries Archives and Museums seems to have been a very lively and stimulating event and resulted in, amongst other things, a star rating system for linked open cultural metadata.
Mia yesterday posted a question to the MCG list asking for reactions to the scheme, which addresses in particular the issue of rights and rights statements for metadata - both the nature of the licence, which must reach a minimum level of openness, and the publication of that licence/waiver. Specifically she asked whether the fact that even the minimum one-star rating required data to be available for both non-commercial and commercial use was a problem for institutions.

My reply was that I felt it essential, in order for it to count as linked data (so I'm very pleased to see it required for the most basic level of conformance). But here I'd like to expand on that a bit and also start to tease out a distinction that I think has been somewhat ignored: between the use of data to infer, reason, search, analyse, and the re-publication of data.

First, the commercial/non-commercial question. I suppose one could consider that as long as the data isn't behind a paywall or password or some other barrier then it's open, but that's not my view: I think that if it's restricted to a certain group of users then it's not open. Placing requirements on those users (e.g. attribution) is another matter; that's a limitation (and a pain, perhaps) but it's not closing the data off per se, whereas making it NC only is. Since the 4 different star levels in the LOD-LAM scheme all seem to reflect the same belief that's cool with me.

The commercial use question is a problem that has bedevilled Europeana in recent months, and so it is a very live issue in my mind. The need to restrict the use of the metadata to non-commercial contexts absolutely cripples the API's utility and undermines efforts to create a more powerful, usable, sustainable resource for all, and indeed to drive the creative economy in the way that the European Commission originally envisaged. With a bit of luck and imagination this won't stay a problem for long, because a new data provider agreement will encourage much more permissive licences for the data, and in the meantime a subset of data with open licences (over 3M objects) has been partitioned off and was released this very week as Linked Open Data. Hurrah!

This brings us to the question of how LOD is used and whether we need a more precise understanding of how this might relate to the restrictions (e.g. non-commercial only) and requirements (e.g. giving attribution) that could be attached to data. I see two basic types of usage of someone else's metadata/content: publication e.g. displaying some facts from a 3rd party LOD source in your application; and reasoning with the data, whereby you may use data from 3rd party A to reach data from 3rd party B, but not necessarily republish any of the data from A.

If LOD sources used for reasoning have to be treated in the same way as those used for publication you potentially have a lot more complexity to deal with*, because every node involved in a chain of reasoning needs to be checked for conformance with whatever restrictions might apply to the consuming system. When a data source might contain data with a mixture of licences, so you have to check each piece of data, this is pretty onerous and will make developers think twice about following any links to that resource, so it's really important that aggregators like Culture Grid and Europeana can apply a single licence to a set of data.

If, on the other hand, licences can be designed that apply only to republication, not to reasoning, then client systems can use LOD without having to check that commercial use is permitted for every step along the way, and without having to give attribution to each source regardless of whether it’s published or not. I'm not sure that Creative Commons licences are really set up to allow for this distinction, although ODC-ODbL might be. Besides, if data is never published to a user interface, who could check whether it had been used in the reasoning process along the way? If my application finds a National Gallery record that references Pieter de Hooch’s ULAN record (just so that we’re all sure we’re talking about the same de Hooch), and I then use that identifier to query, say, the Amsterdam Museum dataset, does ULAN need crediting? Here ULAN is used only to ensure co-reference, of course. What if I used the ULAN record’s statement that he was active in Amsterdam between 1661 and 1684 to query DBpedia and find out what else happened in Amsterdam in the years that he was active there? I still don’t republish any ULAN data, but I use it for reasoning to find the data I do actually publish. At what point am I doing something that requires me to give attribution, or to be bound by restrictions on commercial use? Does the use of ULAN identifiers for co-reference bind a consuming system to the terms of use of ULAN? I guess not, but between this and republishing the ULAN record there’s a spectrum of possible uses.
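To make the distinction a bit more concrete, here is a minimal Python sketch of the de Hooch example. Everything in it is hypothetical: the `Step` record, the `allows_commercial` flag and the choice of which sources end up on screen are invented for illustration, and the flags say nothing about the real terms of ULAN or anyone else. It simply compares the two models: one where every source touched in a chain of reasoning binds the client, and one where only republished sources do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hop in a chain of inference (hypothetical structure)."""
    source: str              # dataset the data came from
    allows_commercial: bool  # invented flag, not a real licence term
    published: bool          # True if data from this source reaches the UI


def relevant_steps(chain, republication_only):
    """Steps whose licence terms the client has to honour.

    republication_only=False: every source touched counts.
    republication_only=True: only sources whose data is displayed count.
    """
    return [s for s in chain if s.published or not republication_only]


def sources_to_credit(chain, republication_only):
    """Names of the sources that would need attribution."""
    return [s.source for s in relevant_steps(chain, republication_only)]


def commercial_use_ok(chain, republication_only):
    """True if no relevant source forbids commercial use."""
    return all(s.allows_commercial
               for s in relevant_steps(chain, republication_only))


# Assumed scenario: ULAN is used only for co-reference and reasoning.
chain = [
    Step("National Gallery", allows_commercial=True, published=True),
    Step("ULAN", allows_commercial=False, published=False),
    Step("Amsterdam Museum", allows_commercial=True, published=True),
]

print(sources_to_credit(chain, republication_only=False))
print(sources_to_credit(chain, republication_only=True))
print(commercial_use_ok(chain, republication_only=False))  # False
print(commercial_use_ok(chain, republication_only=True))   # True
```

Under the stricter model one restricted link in the middle of the chain blocks the whole application; under the republication-only model it doesn't, which is really the nub of the argument.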

Here's an analogy: when writing a book (or a thesis!), if one quotes from someone else's work they must be credited - and if it's a big enough chunk you may have to pay them. But if someone's work has merely informed your thinking, perhaps tangentially, and you don't quote them; or if perhaps you started by reading a review paper and end up citing only one of the papers it directed you to, then there's not the same requirement either to seek their permission to use their work or to credit them in the reference list. There's perhaps a good reason to try to do so, because it gives your own work more authority and credibility if you reference sources, but there's not a requirement - in fact it's sometimes hard to find a way to give the credit you wish to someone who's informed your thinking! As with quotations and references, so with licensing data: attributing the source of data you republish is different to giving attribution to something that helped you to get somewhere else; nevertheless, it does your own credibility good to show how you reached your conclusions.

Another analogy: search engines already adopt a practical approach to the question of rights, reasoning and attribution. "Disallow: /" in a robots.txt file amounts to an instruction not to index and search (reason) and therefore not to display content. If this isn't there, then they may crawl your pages, reason with the data they gather, and of course display (publish) it in search results pages. Whilst the content they show there is covered by "fair use" laws in some countries, in others that’s not the case so there has occasionally been controversy about the "publication" part of what they do, and it has been known for some companies to get shirty with Google for listing their content (step forward, Agence France-Presse, for this exemplary foot-shooting). As far as attribution goes, one could argue that this happens through the simple act of linking to the source site. When it comes to the reasoning part of what search engines do, though, there's been no kerfuffle concerning giving attribution for that. No one minds not being credited for their part in the PageRank score of a site they linked to – who pays it any mind at all? – and yet this is absolutely essential to how Google and co. work. To me, this seems akin to the hidden role that linked data sources can play in-between one another.
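For what it's worth, the robots.txt convention is easy to try out for yourself. The sketch below uses Python's standard `urllib.robotparser`; the example.org URL and the two rule sets are made up for illustration, and it shows only the crawl-permission check, not anything a real search engine does afterwards.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules supplied as a list of lines
    return parser.can_fetch(agent, url)


# A site that opts out of crawling altogether.
blocked = ["User-agent: *", "Disallow: /"]
# A site that publishes no rules at all.
no_rules = []

url = "http://example.org/collections/1"  # made-up address
print(may_crawl(blocked, url))   # False
print(may_crawl(no_rules, url))  # True
```

Note what the file doesn't say: it's a single yes/no about crawling, with nothing about how the gathered data may be reasoned with or credited.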

Of course, the “reasoning” problem has quite a different flavour depending upon whether you’re reasoning across distributed data sources or ingesting data into a single system and reasoning there. As Mia noted, the former is not what we tend to see at the moment. All of the good examples I know of digital heritage employing LOD actually use it by ingesting the data and integrating it into the local index, whether that's Dan Pett's nimble PAS work or Europeana's behemoth. But that doesn't mean that it's a good idea for us to build a model that assumes this will always be the case. Right now we're in the earliest stages of the LOD/semweb project really gathering pace - which I believe it finally is. People will do more ambitious things as the data grows, and the current pragmatic paradigm of identifying a data source that could be good for enriching your data and ingesting it into your own store where you can index it and make it actually scale may not stay the predominant one. It makes it hard to go beyond a couple of steps of inference because you can't blindly follow all the links you find in the LOD you ingest and ingest them too – you could end up ingesting the whole of the web of data. As the technology permits and the idea of making more agile steps across the semantic graph beds in I expect we'll see more solutions appear where reasoning is done according to what is found in various linked data sources, not according to what a system designer has pre-selected. As the chains of inference grow longer, the issue of attribution becomes keener, and so in the longer term there will be no escaping the need to be able to reason without giving attribution.
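The "you could end up ingesting the whole of the web of data" worry is easy to see with a toy example. Here's a minimal Python sketch, assuming an in-memory dictionary stands in for the web of data; the identifiers and links are invented, and a real client would be dereferencing URIs rather than looking up a dict. It follows links breadth-first and stops after a fixed number of hops.

```python
from collections import deque


def follow_links(graph, start, max_hops):
    """Breadth-first walk from `start`, going at most `max_hops` links deep.

    Returns the set of nodes reached (including `start`).
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop budget spent: don't follow this node's links
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen


# Invented toy graph: each key links to the resources in its list.
web_of_data = {
    "museum:record1": ["ulan:artist"],
    "ulan:artist": ["dbpedia:Amsterdam", "dbpedia:Delft"],
    "dbpedia:Amsterdam": ["dbpedia:Netherlands"],
    "dbpedia:Delft": ["dbpedia:Netherlands"],
    "dbpedia:Netherlands": ["dbpedia:Europe"],
}

for hops in (1, 2, 4):
    reached = follow_links(web_of_data, "museum:record1", hops)
    print(hops, len(reached))
# prints: 1 2, then 2 4, then 4 6
```

Even in this tiny graph the set of sources grows with every extra hop, and each one is a potential attribution or licence check if reasoning is treated like republication.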

This is the detail we could do with ironing out in licensing LOD, and I’d be pleased to see it discussed in relation to the LOD-LAM star scheme.

Tuesday, May 05, 2009

CFP for VALA2010

i.e. a trip to Australia. VALA 2010 looks like an interesting conference:

VALA promotes the use and understanding of information and communication
technologies across the Galleries, Libraries, Archives and Museum sectors.

The CFP is here but the deadline is nearly up (although the conference isn't until Feb 2010)

Monday, May 04, 2009

ICHIM and DISH

I hadn't twigged that the 2007 ICHIM was in fact the last of that long-running series of biennial conferences, which ran, amazingly, from 1991. April's issue of Curator starts off with an interview with David Bearman on ICHIM's history, why it ended, and what next. Let's not forget that dbear and Jennifer Trant also run the universally adored and enormous Museums and the Web conferences, but ICHIM covered somewhat different territory and arguably there's a space that needs filling now...

...which is why it was timely that on the same day I found that interview, I also read about DISH2009:

"Digital Strategies for Heritage (DISH) is a new bi-annual international
conference on digital heritage and the opportunities it offers to cultural
organisations."

DISH 2009 takes place in Rotterdam December 8-10th, and the CFP is up. It looks interesting: taking a step back to look at strategic questions of innovation, collaboration, management etc.

Saturday, April 25, 2009

Catching up with Europeana v1.0 [pt.1]

Last November, a prototype Europeana launched. Many (perhaps even both) of you will know that the results were mixed: the index itself was successful, at least given its proof-of-concept status, but personalisation features were not optimised and led rapidly to a crash as the user sessions racked up. It seems that the solution to this was essentially configuration, but politics meant that more had to be seen to be done and so hardware was thrown at the problem. A couple of weeks later the site was back but under the radar and without the personalisation bit ("My Europeana"), and more recently this too has returned - go and have a play here.

Prototyping done, the bid was assembled to develop a full-blown service, "Europeana v1.0". This bid to the European Commission was successful and just before Easter a kick-off meeting was held at the Koninklijke Bibliotheek in The Hague to initiate the project. This is actually but one of a suite of projects under the EDLFoundation umbrella, all working in the same direction, but I guess you could say it's the one responsible for tying them together.

So how is Europeana shaping up now? Having spent three days finding out I can tell you now that I came back feeling good - and not just because I was heading straight off again on holiday. Day 1 was about travel and (obviously) a long and lovely trip to the Mauritshuis, but it ended with an hour in the company of Sjoerd Siebinga, lead developer on the project, and a session with Jill Cousins, Europeana's director. I went to see Sjoerd because I wanted to find out how Europeana's technical solution would fit with our plans at the Museum of London for a root-and-branch overhaul of our collections online delivery system. I knew that they'd be opening the source code up later this year, and I also knew that in essence what Europeana does is a superset of what we want to do, so I figured, find out if there'll be a good fit and whether there are things I could start to use or plan for now. Laughably, I thought that we might actually be able to help out by testing and developing the code further in a different environment - as if they needed me! I'll save this for another post, but in short Sjoerd took me on a tour of what they use as the core of the system (Solr) and blew me away. There are layers that they have built/will build above and below Solr that make Europeana what it is and may also prove helpful to us, but straight out of the box Solr is, quite simply, the bollocks. I've known of it for ages, but until given a tour of it didn't really grasp how it would work for us. Many, many thanks to Sjoerd for that.

Next I met with Jill for an interview for my research on digital sustainability in museums, where we dug into the roots of Europeana, its vision, key challenges, and of course sustainability (especially in terms of financial and political support). This was fascinating and revealing and added a lot to my understanding of the context of the project's birth and its fit in the historical landscape of EC-funded initiatives in digital/digitised cultural heritage. As a research exercise it was a test of my ability to work as an embedded researcher; one who is not just observing the processes of the project but contributing and arguing and necessarily developing opinions of his own. I really don't know how well I did in this regard - I'm not sure how often my attempts to be probing may in fact be leading, or whether my concerns with the project distort the approach I take in interviewing. Equally I don't know if this matters. A debate to expand upon another time, perhaps.

Days 2 and 3 were the kick-off meeting, and I'll put that in another post.

Friday, December 05, 2008

Building communities pt.2

In my previous post about "Building Communities in the Digital Arts and Humanities", the workshop I recently attended, I mentioned that one concrete suggestion had caught my imagination, and that of others, in the final discussion, but I forgot to actually write about it. Rather than heavily editing that post, I'll outline it here.

John Byron, Executive Director of the Australian Academy of the Humanities, proposed a sort of helpdesk for the digital humanities. The situation at present for anyone with a problem can be tricky: non-specialists may have no clue where to turn to find advice on, say, digital preservation, whilst techies might wonder who to ask about, for example, reconciling two metadata schemas; and yet, if you knew who to ask, there's almost certainly someone out there who could answer that query, in a centre of expertise, a grass-roots network, a software house etc. But what if there was one website (or just an e-mail address!) you could go to with your problem, which would direct the query to the right place to get it answered? The model might be one of triage - a crack squad of dedicated elves with a deep knowledge of the sources of expertise decide who to send the question to - or of an expertise marketplace, akin to Experts Exchange and the like, where a problem would be posted to a suitable forum (perhaps by elves again) and the community there would propose answers. The beneficiary might be able to assign points for the help they're given.
The proposal is not at heart about how to build communities, of course, but it would face that problem in two areas - building the community of experts, and that of users. Perhaps it would also build on what we learned from the meeting, too, because the idea would be to build on existing communities, creating a community of communities in fact, although quite how would I guess depend upon each community. It would also, hopefully, adapt itself to the needs of the target (user) community too, providing services that it needs rather than what someone else thinks it needs.
I really liked the idea, which would need some funds to get off the ground and to keep going, but which I think is quite easy to explain and show the benefits of. The problem may be one of gaining resources for a project that benefits people worldwide. But there are examples of this working (OCLC, for one). I hope it goes somewhere.

Thursday, November 20, 2008

UK press on Europeana's launch

Some UK press on the launch of Europeana:

BBC online: European online library launches
Guardian: Dante to dialects: EU's online renaissance
Associated Press: European history, culture and art goes digital (well not really UK but never mind)

I'll keep editing this stuff as I find more. Plenty of concentration on the awesome content as well as the fact the site was brought to its knees by huge traffic, which I'd see as a success of sorts - best see how the traffic holds up over the next few weeks, though.

Tuesday, July 15, 2008

Why Gnip caught my eye: a bit more depth (just a bit)

Eric Marcoullier commented on my last post on Gnip and I wrote him the following e-mail because, as I say, it's about time I worked through a little bit the reason why his baby caught my attention (not that it's a particularly worked through working through, but hey, it's a start). It was a bit much for a comment but enough for a post, so here you go.

*******************************************************************

Many thanks for taking the time to look at my brief notes, you must be a busy man so I really appreciate it. It's definitely time I tried to put some flesh on the bones because it's true, I've barely sketched the link between Gnip and my own preoccupations.

My research is looking at how museums keep their digital stuff useful; in other words, how and when we keep on trying to squeeze value out of the digital stuff we've invested in. I'm trying to put a particularly museum-y spin on it because it would be all too easy to look, for example, at general questions related to digital preservation (yawn). Hence I'm exploring the specific conditions and challenges that museums have to face, as well as the way in which they value what they hold - as a "memory institution" with a remit to preserve and to serve the public, a museum has potentially got a slightly different way of valuing what it holds, though arguably this won't really apply to digital material except in special cases (like digital art). So that's the basic thread of my research: looking at how museums can and do decide a strategy for maximising value from their digital assets, and for planning new ones.

Of course, no museum is an island (that's kind of the point of the 'net, right?) and I'm inevitably thinking a lot about the relationships between museums and other parties that might provide or use services and data to/from them - this is key to extracting value, but it's also a dependency for which we need to understand the risks. In the museum community, a lot of the talk (for a couple of decades or more, now) is about how we share our most obvious USP: our collections data. Loads of work has been done on this and yet we still seem to be a long way from the dream of a way of effectively integrating the collections of more than a few institutions. This is why I've been working with the European Digital Library/Europeana project. The reason that Gnip caught my eye was because it suggests another model for data interchange. It may not be appropriate for the scenario of sharing collections data, and one could argue that in some ways other museum initiatives share some of its characteristics (federated search, metadata harvesters etc.), but I was interested in whether we could learn from the model of a neutral mediating agent rather than a central pool of data. We're not short of standards but we are short of co-ordinating mechanisms that we can all trust and that we feel leave us with some control over "our" data.

The actual purpose of Gnip as an exchange for social data was probably of secondary interest to me, but of interest all the same - it's just an area I don't know much about. I think that on the whole museums won't need to concern themselves directly about how whatever it is they do will relate to Gnip: I presume that if they incorporate a third party service in their site, or perhaps have an installation of WordPress, then a lot of the mechanics may be dealt with already (or will be in due course). But concern about interoperability and data portability may well be a reason why many museums (my own included) haven't yet done an awful lot with social software - although there are some notable exceptions. If Gnip helps to address these concerns then all that will still be lacking is our imagination!

One other possibility is that museum applications could indeed work with Gnip to integrate individuals' public information with their own services - say, by drawing links between a person's list of interests or music preferences, and what's in a museum's (or a library's) collection; or by suggesting events to attend based on user location, age and interests. I don't understand Gnip well enough to know if this is plausible, but it's an intriguing prospect.

*******************************************************************

Thanks again to Eric for taking the time to contact me. I think it speaks well of new ventures like this (OpenCalais was another) when the key people go out of their way to make contact with the people that are talking about them.

Thursday, May 22, 2008

Leuven that aside...

I'm not quite sure where to go with a pun of profound lameness even by my own pitiful standards, but I'm giving it a Creative Commons Attribution Sharealike licence in case anyone can make it pay. Not holding my breath.

Anyway, to the point: bloody hell, railways! Well there are other points, but just wanted to get that off my chest. To Belgium by Eurostar is lovely, unless there's a strike. Eurostar to their credit did all they could to help, moving me onto a later train in the not-quite-certainty that it would get past Lille. They also told my hotel I'd be late. I got there, well, the right side of dawn but the wrong side of midnight. And travelling back I had the more day-to-day joy of British trains being titsup. Not forgetting Junior puking on the journey to the train before all of that. I must have offended St Christopher or something.

But the travel trauma was all worthwhile. The meeting itself was really good and I would love to return to Leuven; it's a beautiful and tranquil place. Imagine a city centre so quiet at rush hour that most of the traffic consists of parents cycling next to their infants on the way to school.

My take-homes from the meeting were many. Here are some.

There seem still to be divergent thoughts on whether there will be a record page, although I think it's looking very likely (as seen in the maquette). This means different things for different types of institution and material. Where the original DO on the institution's website is more than an image of moderate size, the visitor will have a motivation to leave Europeana and visit the original. For films and audio this will be clearcut. For assets where the surrogate is in any case an image, they may be less likely to leave. Perhaps this depends upon the size of the largest image that Europeana will show. It does bring home, however, how vital it will be to demonstrate to content contributors that there will be superb reporting of usage of their material on that site.

Linked to the previous point: if EDL hosts only image surrogates (and occasional small derivatives of multimedia?), it keeps its costs down and traffic to institutions high, but we must keep in mind quality control of the originals hosted elsewhere - what will be acceptable standards for different formats, and how do we control this when they're actually never ingested?

A drawback to the plan of ingesting only thumbnails is that for institutions with no existing online presence this is not very helpful. Perhaps EDL should offer a premium service, or one for limited numbers of surrogates per institution, where they can offer to hold a full size image/media asset and display this in a modified details/record page. I'm very keen on this, as a means to attract the participation of tiny museums by, essentially, offering to get them online for the first time. The current model presupposes that every partner is already online - we need a 180 degree turn on this. It's also possibly a modest source of revenue.

EDLLocal is obviously the plan to encourage the participation of the minnows at the moment, and I need to find out more about this. Nevertheless, if it still presupposes a web destination outside of Europeana for all DOs it's not enough, IMHO. We need to see just how low we can set the barrier to participation, and how big we can make the reward. Rather than requiring a URL for each DO, it would be better to be able to take whatever a contributor can give - a DVD of images and a spreadsheet of metadata, say - and offer them:
- fuller record pages on Europeana
- API access and code fragments for dropping onto a blog or whatever

I like the idea of using a wiki for UGC related to objects. It has the advantages of being
- cheap and out of the box
- very easy to create new pages
- EDL GUIDs/DOIs usable for page names
- microformat/POSH friendly?
- familiar to many
- clearly distinct from the "authorised" content

The new home page that Jon Purday showed looks like progress. The concept is a good step; the graphical side isn't finished but is getting there.

There was plenty to read and discuss about reorganising EDL for the next phase of the project, the build of version 1.0 (first we have to launch a series of prototypes). It's too early to talk about this.

We worked on the vision and mission, which was quite fun and threw into relief some differing ideas of what the whole thing is about. Personally I like for a vision something like: Europeana.eu: culture and heritage, connected and shared. It's short, it emphasises both the connections being made between knowledge and the sharing of this with people.

Money is tight. Realistically EDL will be relying on a subsidy of some sort for a good while to come, but there may be some good commercial opportunities. These needn't conflict with either the ownership of the source data and digital assets by contributors or the public service/public good ethos. The semantic graph derived from the combined dataset will belong to EDL, and this could be very marketable. I have to work up ideas here; I have OpenCalais in mind as some sort of model.

One other outcome was that the cabbie who finally got me home solved a UFO mystery I'd been intrigued by for a few months. One evening last autumn I watched a string of lights for about ten minutes as they emerged over the horizon and slowly rose through the clouds. Either it was something fast but a long way off - I was thinking shed-loads of planes from one of the Suffolk airbases - or actually as slow as they appeared and nearby. Turned out they were the latter - a whole load of candles floating under balloons, released from Gosfield School to confuse those who Don't Even Want To Believe But Are Intrigued By The Lights.

Monday, March 17, 2008

A few more Paris notes and an update

A dull post. I listened to my recording of my talk in Paris and jotted down notes on a few things that came up in the discussion, thought I'd get them down here. Also, I updated the list of input parameters I put up before and updated the version on Scribd.

Those extra points:

Geo search
There are geographical search (and geo plus time) projects going on in eContent Plus and IST, using co-ordinates, place names, changing boundaries etc. We would hope to incorporate these (possibly post-prototype). Everything in Europeana will be public domain (development-wise), therefore the software will be there for the taking (I hope I got that right!).
"Privileged" tags
We mooted the possibility of privileged tags, i.e. those produced by certain authorised users, perhaps agreed by certain groups. Tags created by these users (most likely content contributors) would be treated differently so that we could pull out only certain items with a tag. But probably, rather than giving them some specific "privileged" status, we could achieve the same thing just by identifying them by contributor, user group or contributor type.
Stuff to clarify

Licensing data model and assumptions
Core common data
Where is the boundary between Europeana and the contributor sites? Maquette seemed to include considerable data and the actual content displayed in-site for some types of asset e.g. images, but others might be held off-site. What are the rules?
What needs to be added to the API to work well for libraries and archives?
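
On the "privileged" tags point above, here's a minimal Python sketch of what filtering by contributor type rather than by a special status might look like. The `Tag` record, its field names and the contributor types are all invented for illustration; nothing here reflects an actual Europeana data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A tag applied to an item (hypothetical structure)."""
    item_id: str
    label: str
    contributor: str
    contributor_type: str  # invented values, e.g. "institution" or "public"


def items_with_tag(tags, label, contributor_types=None):
    """Item ids carrying `label`, optionally limited to some contributor types."""
    return sorted({
        t.item_id
        for t in tags
        if t.label == label
        and (contributor_types is None
             or t.contributor_type in contributor_types)
    })


tags = [
    Tag("obj-1", "roman", "museum-a", "institution"),
    Tag("obj-2", "roman", "visitor-42", "public"),
    Tag("obj-3", "medieval", "museum-a", "institution"),
]

print(items_with_tag(tags, "roman"))                   # ['obj-1', 'obj-2']
print(items_with_tag(tags, "roman", {"institution"}))  # ['obj-1']
```

No tag is marked as special here; the "privilege" comes entirely from the query, which is the point made in the notes.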

Thursday, August 30, 2007

Odds and sods 2

A couple of really interesting links from today:

From hand-crafted to mass digitized by Gunter Waibel at Hanging Together. There's loads I identify with in these summarised remarks, and a fair bit to argue with. An evolving discussion on the balance between the practical, the ideal, and the flexible. Gunter also wrote recently about the environment in which LAMs operate, pointing to Lawrence Lessig's "modalities of constraint" - factors regulating behaviour. The latter fits nicely into the paper I'm writing, at least if I can work through the 32-odd pages of this reference.
Where Do We Put It? Fitting the Web Into Museums from Nina Simon (on Museum 2.0) pointed me to this thesis by Karen A. Verschooren. Fascinating thinking and material on internet art, much of which logic can probably be applied beyond that precise field and onto other museum digital resources. I'll have to read another 200 pages or so to confirm this, but it looks promising! Nina's post is interesting in itself for its response to the thesis. I'd like to do them both justice here but it will have to wait.

Thursday, May 03, 2007

Cross-posted comment: on real museums and digital value

Cross-posted comment to Holly Witchey's post on musematic (since it's what I had on my mind today anyway):

Hi Holly, you won’t remember me but we shared breakfast in Pasadena…. I do understand how you can sometimes get to feel like this. I don’t have much to add to your thoughts on communication breakdown – you’re right, communication takes both effective transmission and reception. Your remarks on the value of our whole digital enterprise, though, chimed very serendipitously with my own musings on the way to work this morning. I was thinking: some museums hold a preponderance of “real” objects, others contain more in the way of dioramas, reconstructions, replicas, interactives and experiences; indeed some have nothing “real” at all. Does this lead them to have different attitudes to their digital holdings or place different value upon them? Despite the AAM’s Code of Ethics, not all museums (in the broad definition that the AAM also holds) have collections per se. In fact, the section you quote includes not just collections but “exhibition materials”, and maybe that’s where we can salve our consciences a little: exhibition materials could well include digital resources. In the end it’s true, most of the time in most cases it’s the collections that really count and they must take priority, and museums always have to balance, to choose, and they do. As well as going on building and maintaining collections they have to use them in all sorts of ways to get value now as well as in the future. That’s where we come in – only rarely are we creating works of art; mostly we’re making stuff that brings art, history, science, ideas to people. It would be foolish to spend too much on that, like it would be foolish to spend all the money on gallery refurbs and none on building and caring for collections, but still it’s valuable work. Occasionally we might even make something that could become digital heritage (i.e. a digital thing worth keeping) as opposed to digitized heritage, and in this time of flux and exciting experiments we are surely seeing some genuinely valuable bodies of knowledge and experience that perhaps we’ll want to “preserve for posterity”. I’m really interested to find out if and when our digital stuff is anywhere near as precious as our real stuff (it’s my research area, as it happens), but anyway don’t be too down! If people needed museums in the past, not just collections, then they need us now, in the same way.
