About Me

Web person at the Imperial War Museum, just completed PhD about digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, and usually museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Friday, May 30, 2008

WHATWG, RIAs and online/offline applications

TechCrunch has a really useful survey of the current status of browser storage, what it really means, how it relates to WHATWG's work on HTML5, [Google] Gears and current and forthcoming browsers. It's explained a lot to me and given me at least the idea that WHATWG's work is coming to something. Looks like we can't yet know quite how the RIA technologies like AIR, Silverlight etc will tie in with this, but the article ends on a very optimistic note about being able to programme to one API since the only mature product (Gears) should be fully HTML5 compliant. All good.

Freebase - a LOT is interesting about this

Alex Iskold (again) gives a great explanation on ReadWriteWeb of what Freebase is, how it works, and where it fits into the Semantic Web. I've got nothing to add, except that I've realised I must explore it, but it's blindingly obvious that this is one possible path we could explore to binding together information from multiple museums and linking it to more broadly understood concepts and data from outside that domain. I love the sound of the query language the API uses, too (metaweb query language, MQL), it's so obvious and yet so novel and useful.
Iskold follows up with another post putting Freebase and others into a broader context of semantic search: Semantic Search: The Myth and Reality

Google Earth in your browser

GE in your browser: about time too. Just wanted to test it, so using the utility that O'Reilly Radar pointed to I've embedded GE below to show some more pics, this time from Chalkney Wood in bluebell season. They're not all that well placed, it must be said!

You'll need to install the plugin, of course, and it's currently Windows only, and for the main browsers.

Thursday, May 29, 2008

Screen-scraping and POSH

I hesitate to put this post in thing-versus-another-thing terms, and I won't. For one thing I think that the two alternatives I'll discuss are not in opposition but complement one another. But there are strengths and weaknesses to each.

Mike Ellis and Dan Zambonini recently unveiled hoard.it, which Mike showed me a draft of some time ago but which looks like it's come a long way. Hoard.it is basically not unlike Dapper in that you can teach it to screen-scrape sites, but instead of making an API onto those sites it will ingest the data and then offer it up through an API once scraped (not sure if this is live yet). It can screen-scrape by spidering specified sites. I don't know if their other plans are yet implemented but they also hope to let you scrape a given page via a bookmarklet, and to give the app the ability to match up a given page to templates in its memory. What they're showing is more targeted than the application is capable of because it's aimed specifically at gathering museum collections data, and displays it all appropriately, whereas of course the whole screen-scraping thing that hoard.it is capable of has many more uses than that. It's cool, and in typical style it's an example of "just get it done" tech designed to get something useful underway, even if it's but a stepping stone to something else.

Making templates for a screen-scraper is one way to gather data from the "surface" of the web. Another way to achieve something similar is to embed some sort of standardised "meaning" in the HTML. Microformats are one such route, as are various other flavours and approaches to Plain Old Semantic HTML (POSH). Early last year I put aside my effort to test this idea out for indicating and gathering museum objects. I called it Salad Bowl but Gathery has taken its place as my favoured moniker. Nothing else has changed in the last year, though.

Gathery is a test of two things, really: firstly, the idea of using microformat-like POSH for museum objects; and secondly, the dream I've long had of being able to gather things I liked as I explored the web. Technically there are three elements, I suppose: the POSH; a bookmarklet to identify any number of objects on a page and pass your selected one to the application; and the application that takes that data, processes it (including looking for better quality data at a "home" URL - say, an OAI-PMH source) and lets the user do stuff like tag and describe it, and ultimately feed it back out. It's functional but not complete (I laid it aside because I was unsure which direction to go with the POSH, not to mention being quite busy), but in many ways it's similar to what Dan and Mike are doing with hoard.it. When you boil it down, the differences come down to the screen-scraping as opposed to the POSH approach.

So I've been trying to draw out the pros and cons of hoard.it and Gathery (the latter will probably never come to anything), but essentially it's a comparison of screen-scraping and POSH (though a pro for one isn't automatically a con for the other). Bear in mind that I have at the back of my mind a set of questions relating to how we move towards a semantic web for museums, as well as how to achieve the things I've always wanted as a user, namely an easy way to search and gather material from collections all round the web. Obviously my comparison is based on a good knowledge of what I built and pretty thin knowledge about hoard.it, and since I've not publicised Gathery it's not easy for you to test the comparison (though do ask for the URL if you're interested). But take my pros and cons as relating to screen-scraping and POSH and let me know if you think they cover the important aspects, and tell me where I'm wrong and what I'm missing.

Gathery (or microformat/POSH approaches)

  • m-object, the POSH I drafted and tested with Gathery, has two levels: content indicator (points to the "home" URL where the best content resides, includes a GUID and an institutional identifier); and content carrier.
  • content carrier is optional, so it is not necessary for content to reside on the page: the author can choose to do no more than indicate that an object's record exists somewhere else
  • markup in which authors use explicitly chosen standards should be less fragile than screen-scraping
  • Gathery is focussed around user behaviour and goes where they go, including outside museum sites. If someone embeds in a blog or Wikipedia a pointer to an object on the Museum Of London site, it can be submitted to Gathery which will go to the "home" URL to look for the fuller record
  • content owners get to decide the relationship between their data and the fields in POSH of whatever sort (a microformat, the m-object quasi-format I dreamt up, RDFa etc)
  • data content can be in any number of forms other than the m-object snippet on the page itself


  • POSH or microformats require explicit choices and action by website owners, whether they are museums or private individuals etc.
  • the m-object content carrier part is inflexible [whereas screen-scraping is in some respects as versatile as the scraper's designer wishes]
  • content creators have decided how to align their data with standard fields, as opposed to the gatherer (see also pros!)
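
To make the two-level idea more concrete, here is a minimal Python sketch of how a gatherer might read an m-object-style snippet. The class names (`m-object`, `home`, `guid`, `institution`, `carrier`) and the sample markup are hypothetical, invented for this illustration rather than taken from the m-object draft, and a real format would need agreeing.

```python
from html.parser import HTMLParser

# Hypothetical m-object markup: the class names are illustrative assumptions.
SAMPLE = """
<div class="m-object">
  <a class="home" href="http://example.org/objects/123">Roman oil lamp</a>
  <span class="guid">example-org:123</span>
  <span class="institution">Example Museum</span>
  <p class="carrier">Ceramic lamp, 1st century AD.</p>
</div>
"""

class MObjectParser(HTMLParser):
    """Collects the indicator fields and the optional carrier text."""
    FIELDS = {"guid", "institution", "carrier"}

    def __init__(self):
        super().__init__()
        self.objects = []    # one dict per m-object found on the page
        self._field = None   # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "m-object" in classes:
            self.objects.append({})
        elif self.objects and "home" in classes:
            # Content indicator: the pointer to the "home" URL.
            self.objects[-1]["home"] = attrs.get("href")
        elif self.objects:
            self._field = next((c for c in classes if c in self.FIELDS), None)

    def handle_data(self, data):
        if self._field and data.strip():
            self.objects[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None

def parse_m_objects(html):
    """Return a list of dicts, one per m-object in the given HTML."""
    parser = MObjectParser()
    parser.feed(html)
    return parser.objects

objects = parse_m_objects(SAMPLE)
```

Because the carrier is optional, a snippet holding only the link and the GUID parses just as well; the application would then follow `home` to look for the fuller record.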

Hoard.it (or screen-scraping approaches)

  • scraping of (well-structured) content requires only that the template be built: nothing extra is required of the site owner
  • it is adaptable to fit the available data and the preferred data model of the operator (the ingesting application), and to an extent the template creator
  • clumsy, semantically overloaded HTML is avoided
  • hoard.it includes a spider (though of course this is just as possible for a POSH-based application)
  • when the bookmarklet is available (if it's not already) then hopefully users will be able to gather data from wherever, and apply an existing or new template to it


  • screen-scraping by definition depends on content at the surface of the web i.e. on web pages. All the content you wish to grab needs to be there
  • data is rarely structured on screen in an unambiguous and subtle way. Data that is structured for machine-to-machine communication or indexing is. Using this where possible would therefore be better
  • the scraper template designer tries to fit what they find to the data model they have, whilst the data's owner may have other ideas about the best fit
  • fragile (part 1) - a change of HTML breaks the template and may make previous data models for an item unworkable, breaking the logical link between versions of an item
  • fragile (part 2) - a change of location breaks the knowledge of an item because there is no concept of a home URL or a unique identifier.
  • if users are declaring their own templates, there are no common standards for the data they are ingesting
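
The first kind of fragility can be shown with a toy scraper. This is a sketch only: the template format (a field name mapped to a regular expression) and the sample pages are assumptions made up for illustration, and say nothing about how hoard.it actually stores or applies its templates.

```python
import re

# A hypothetical template: field name -> regex with one capture group.
TEMPLATE = {
    "title": r'<h1 class="title">(.*?)</h1>',
    "maker": r'<td class="maker">(.*?)</td>',
}

def scrape(html, template):
    """Apply each pattern; fields that no longer match come back as None."""
    record = {}
    for field, pattern in template.items():
        match = re.search(pattern, html, re.S)
        record[field] = match.group(1).strip() if match else None
    return record

PAGE_V1 = ('<h1 class="title">Oil lamp</h1>'
           '<table><tr><td class="maker">Unknown</td></tr></table>')
# The same content after a redesign: the markup changed, the data did not.
PAGE_V2 = ('<h2 class="object-title">Oil lamp</h2>'
           '<dl><dd class="maker">Unknown</dd></dl>')

before = scrape(PAGE_V1, TEMPLATE)  # both fields found
after = scrape(PAGE_V2, TEMPLATE)   # both fields lost
```

Nothing in `after` tells the scraper that it is looking at the same object as before, which is the second kind of fragility: without a GUID or home URL there is no thread connecting the two versions.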

This is all a bit of a muddle, and perhaps it's unwise to mix up discussion of two particular (and very alpha) applications with discussion of the two broad approaches they take to similar problems, but for me it's a good way of teasing out the merits of each. I also think that there's scope to combine them - for example, whilst hoard.it might continue with a gunslinging, screen-scraping, get it done approach, it could also advocate to museum web techs that they use some minimalist POSH (a bit like COINS) to uniquely identify their objects and give them a "home URL", ideally an end-point with proper structured data that it could also ingest (CDWALite or whatever). It could demonstrate the merit of this relatively easily. In this way something that didn't require too much from authors could add an extra dimension to an application that requires nothing from them at all, other than nice and regular HTML.

OT: Leuven photos

As I said, Leuven is a beautiful place. In truth the only chance I had to explore it (having arrived at half past midnight, and rushing back to the station before the meeting even finished) was between leaving the hotel and the start of the meeting. I took my time, though, and snapped incessantly on my intentionally circuitous route. This slide show is the best of it. Hope you like it. They're all mapped, too, if that tickles your fancy.

Created with Admarket's flickrSLiDR.

Tuesday, May 27, 2008

Good 'Times for APIs

Not quite as abstract and "strategic", perhaps, as Reuters' OpenCalais play, but the news that the New York Times is opening up as an API of some sort is significant. They have to make money from their content, and yet they look like they're giving a lot of control over it to other programmers. What stronger example could one wish for to argue that it's good to open up access to content that we actually want people to use as fully and freely as possible?

RWW's article points also to a post that I didn't remark on first time around: APIs and Developer Platforms: A Discussion on the Pros and Cons.

Slicing the market mini-, sorry Micro-update: TO'R on MicroHoo

In his piece MicroHoo: corporate penis envy? Tim O'Reilly makes lots of interesting arguments concerning where Microsoft should be putting its energy - as far as he's concerned, it should forget about trying to grab a slice of the search market and focus instead on building the internet operating system. For Yahoo!'s part, it should realise it's the number one internet media company and make the most of it. There's lots here to chew on (and object to, as Michael Arrington does here, and I happen to agree with him and Jakob Nielsen that search, especially semantically intelligent search, is far from ticked off), but the paragraph that grabbed my attention concerned a couple of other players:

Apple's apparent success with an "own the stack, from the device to cloud" strategy is misleading. With both the iPod and the iPhone, a key element of success is precisely the device's openness to what Apple does not own. Imagine an iPod where you could only buy music from the Apple music store instead of ripping your own CDs (this is Amazon's mistake with the Kindle). Imagine an iphone without the Safari browser (opening a world of web apps to the phone) or the Google Maps application. Apple owns key elements of the stack, but it's a permeable stack, and getting more so.

This is handy material for making the argument that museums shouldn't try to (or be required to) "own the stack". Far better to focus on a layer in the stack and make it permeable. iTunes (the music store) is clearly an example of trying to corner a part of the market outside Apple's home turf, but (a) they're big and ballsy enough to try it (and who else was doing so effectively at that time? Aside from Napster...) and (b) the core offering is still actually the hardware, and it allows you to acquire music by other means. Amazon and Kindle is interesting too. Perhaps it's too early to say they've made a mistake, though they probably have. I doubt it will stay closed and succeed. Nevertheless, as I acknowledged before, they are making a grab for more of the stack. But let's not forget, they're HUGE!

What can we learn? Well there's that point about openness/permeability. If we must insist on claiming the vertical market from top to toe, from collections management system to the end user's screen, then at least make it permeable and open. Otherwise your carefully grown fruit will wither on the vine, forgotten and increasingly past its sell-by date.

Thursday, May 22, 2008

Leuven that aside...

I'm not quite sure where to go with a pun of profound lameness even by my own pitiful standards, but I'm giving it a Creative Commons Attribution Sharealike licence in case anyone can make it pay. Not holding my breath.

Anyway, to the point: bloody hell, railways! Well there are other points, but just wanted to get that off my chest. To Belgium by Eurostar is lovely, unless there's a strike. Eurostar to their credit did all they could to help, moving me onto a later train in the not-quite-certainty that it would get past Lille. They also told my hotel I'd be late. I got there, well, the right side of dawn but the wrong side of midnight. And travelling back I had the more day-to-day joy of British trains being titsup. Not forgetting Junior puking on the journey to the train before all of that. I must have offended St Christopher or something.

But the travel trauma was all worthwhile. The meeting itself was really good and I would love to return to Leuven, it's a beautiful and tranquil place. Imagine a city centre so quiet at rush hour that most of the traffic consists of parents cycling next to their infants on the way to school.

My take-homes from the meeting were many. Here are some.
  • There seem still to be divergent thoughts on whether there will be a record page, although I think it's looking very likely (as seen in the maquette). This means different things for different types of institution and material. Where the original DO on the institution's website is more than an image of moderate size, the visitor will have a motivation to leave Europeana and visit the original. For films and audio this will be clearcut. For assets where the surrogate is in any case an image, they may be less likely to leave. Perhaps this depends upon the size of the largest image that Europeana will show. It does bring home, however, how vital it will be to demonstrate to content contributors that there will be superb reporting of usage of their material on that site.

  • linked to the previous point, if EDL hosts only image surrogates (and occasional small derivatives of multimedia?), it keeps its costs down and traffic to institutions high, but we must keep in mind quality control of the originals hosted elsewhere - what will be acceptable standards for different formats, and how do we control this when they're actually never ingested?

  • a drawback to the plan of ingesting only thumbnails is that for institutions with no existing online presence this is not very helpful. Perhaps EDL should offer a premium service, or one for limited numbers of surrogates per institution, where they can offer to hold a full size image/media asset and display this in a modified details/record page. I'm very keen on this, as a means to attract the participation of tiny museums by, essentially, offering to get them online for the first time. The current model presupposes that every partner is already online - we need a 180 degree turn on this. It's also possibly a modest source of revenue.

  • EDLLocal is obviously the plan to encourage the participation of the minnows at the moment, and I need to find out more about this. Nevertheless, if it still presupposes a web destination outside of Europeana for all DOs it's not enough, IMHO. We need to see just how low we can set the barrier to participation, and how big we can make the reward. Rather than requiring a URL for each DO, it would be better to be able to take whatever a contributor can give - a DVD of images and a spreadsheet of metadata, say - and offer them

    • fuller record pages on Europeana

    • API access and code fragments for dropping onto a blog or whatever

  • I like the idea of using a wiki for UGC related to objects. It has the advantages of being

    • cheap and out of the box

    • very easy to create new pages

    • EDL GUIDs/DOIs usable for page names

    • microformat/POSH friendly?

    • familiar to many

    • clearly distinct from the "authorised" content

  • The new home page that Jon Purday showed looks like progress. The concept is a good step, the graphical side isn't finished but getting there

  • There was plenty to read and discuss about reorganising EDL for the next phase of the project, the build of version 1.0 (first we have to launch a series of prototypes). It's too early to talk about this.

  • We worked on the vision and mission, which was quite fun and threw into relief some differing ideas of what the whole thing is about. Personally I like for a vision something like: Europeana.eu: culture and heritage, connected and shared. It's short, it emphasises both the connections being made between knowledge and the sharing of this with people.

  • Money is tight, realistically EDL will be relying on a subsidy of some sort for a good while to come, but there may be some good commercial opportunities. These needn't conflict with either the ownership of the source data and digital assets by contributors or the public service/public good ethos. The semantic graph derived from the combined dataset will belong to EDL, and this could be very marketable. I have to work up ideas here, I have OpenCalais in mind as some sort of model.
One other outcome was that the cabbie who finally got me home solved a UFO mystery I'd been intrigued by for a few months. One evening last autumn I watched a string of lights for about ten minutes as they emerged over the horizon and slowly rose through the clouds. Either it was something fast but a long way off - I was thinking shed-loads of planes from one of the Suffolk airbases - or actually as slow as they appeared and nearby. Turned out they were the latter - a whole load of candles floating under balloons, released from Gosfield School to confuse those who Don't Even Want To Believe But Are Intrigued By The Lights

Tuesday, May 20, 2008

To Leuven (expenses paid), but who will pay for EDL?

Tomorrow sees EDL's working group 1 meeting at the Katholieke Universiteit in Leuven, not far from Brussels. Quite exciting to be visiting, albeit fleetingly, a place that played host to Matsys, Bouts, Erasmus and Vesalius, amongst others (not to mention, apparently, the infamous AQ Khan). I'm looking forward to attending, though with no expectation of being able to contribute a lot since this group covers different ground to the one I've worked with up till now. I'm not even sure if I'm part of the group or simply in attendance. Anyway, the meeting will look at progress with Europeana so far, and consider the business issues facing it, particularly how to move to phase 2 (i.e. following the prototype, to be launched in November) and how to build a sustainable future. Working my way through 100-odd pages of reading matter in preparation for the meeting, I'm struck by how big a challenge it will be to find the necessary ongoing resources, but also the fact that they are tackling the problem head-on and examining a wide variety of options, from direct subsidy, through subscription by contributors or users, to corporate partnership or sponsorship.

A significant factor in the search for revenue-raising avenues is the fact that Europeana is not going to be a content owner in any significant way, but rather a broker/facilitator for accessing content owned by others. One possibility that I believe could be worth exploring, for two reasons, is some form of partnership with a search provider. Yahoo! may be a bit too distracted to talk at the moment, but it and Google could both be productive partners. Both sides could benefit by working on an interface and aligning their data structures, and EDL could perhaps offer quite a bit to such a partner in terms of preferential access to the semantically enriched data it will hold. This might be directly to do with searching the resources in EDL, or it might be, say, helping to clean up datasets of people and places. In exchange, maybe either some cash or technological assistance? Perhaps some of the semantic-y startups currently taking wing could also be interesting to work with, but they won't be as well resourced. Cultural heritage organisations have a lot of knowledge and context to offer here so maybe there's a business model to be had.

Friday, May 16, 2008

Yay! Follow the Search Monkey!

Well, rock'n'roll, looks like Yahoo!'s* Search Monkey is going live today. Apparently this will allow us as site owners to (cribbing from RWW's report) "share structured data with Yahoo!, using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction." On the basis of that data, other developers will build apps and users will enhance their search. This seems to be precisely the sort of thing we wished for in the SWTT (in fact legendary Mike Lowndes pointed to earlier signs of this move last month). It's also what I had my doh! moment about last week.

So if it's what I hope it is, we can co-ordinate with others in the sector on some standard fields (and keep it simple initially), push our content into Yahoo! and build apps on top of their search engine. My reservation would be that at the moment it seems to be about building either "Infobars" or "Enhanced Results", but perhaps there's something more API-like and programmable there, or on the way.

* is this the right way to punctuate the possessive of that annoyingly-punctuated name? Answers on a postcard (to Jerry Y!ang).
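
As a rough illustration of the "standard fields, kept simple" idea, the sketch below builds a small XML feed from a record. The element names and the record are invented for the example; this is not the format Search Monkey expects, only the general shape of a shared, minimal field set that several museums could agree on.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared field set, as might be agreed across the sector.
FIELDS = ("id", "title", "institution", "url")

def build_feed(records):
    """Serialise records to XML, keeping only the agreed fields."""
    root = ET.Element("objects")
    for record in records:
        item = ET.SubElement(root, "object")
        for field in FIELDS:
            if field in record:
                ET.SubElement(item, field).text = str(record[field])
    return ET.tostring(root, encoding="unicode")

# Made-up sample record; the extra key is deliberately left out of the feed.
records = [
    {
        "id": "example-org:123",
        "title": "Oil lamp",
        "institution": "Example Museum",
        "url": "http://example.org/objects/123",
        "internal_note": "not for publication",
    },
]
feed = build_feed(records)
```

Anything outside the agreed list stays behind, so each institution keeps control of what it pushes out while the consumer sees one predictable structure.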

Thursday, May 15, 2008

Microsoft's prototype chirpy cheap cheap multi-touch interface: Touch Wall

No point rewriting this post on RWW, this is cool though. Watch the vid. Shame there are no plans to "productize" now, but the cost sounds low and the interface intuitive (and obeying the conventions developing around multi-touch).

It's also worth looking at the Touch blog now and then. It's mainly about near-field communication (lots of RFID) but also touch interfaces (nothing on this MS thing yet).

Wednesday, May 14, 2008

Free font creation web app. Cool!

I like: Font Making Made Easy (ReadWriteWeb), pointing at FontStruct

Digital archiving and risk management, that's DRAMBORA's game

The DRAMBORA project is blogged about on ULCC's da blog. It's another initiative in the digital curation/archiving community that could have outputs that can be directly transported into good management practice beyond formal archiving.

Tuesday, May 13, 2008

The Passively Multiplayer Online Game, PMOG

TechCrunch led me to the Passively Multiplayer Online Game, PMOG. Looks like it could be fun, and I wonder if it will turn out to have a bit of overlap with the "web quests" idea of the sort that NMOLP was apparently inspired by (not much point clicking that link, though, it's password protected. Google it and look at their cache if you like, though :-). Just keep escaping the login windows).

The Yahoo! Internet Location Platform

Brady Forrest writes about the Yahoo! Internet Location Platform, which sounds very cool. Couldn't be simpler, really. I'll have to have a look at just what sort of entities that relate to MoL have a WOEID (What On Earth ID), but doubtless there's lots we can do with this. If we can find the time...
He also points to this for how it all ties to Flickr.

Monday, May 12, 2008

Slicing the market

To follow up my recent posts on the 21st century museum and the National Collections Online feasibility study, here's a little more about the question of how funds are allocated by bodies like HLF, and how come we've tended to see them requiring digitisation projects to run from end to end, from digitisation to web publication.

I don't really know enough about the programmes that were run in the past like, for example, NOF Digitise and the HLF (still funding today). However from the projects that I do know that drew on these funding streams, it seems clear enough that concentrating on digitisation per se was never enough - a user-facing product was expected. These did have a sort of aggregated interface (http://www.enrichuk.net/, now pointing at MICHAEL UK) but not really what I've got in mind.

It's quite understandable that this practice persists, indeed there is a good case for it. For funders, they want to see direct user impacts, and a website that can generate stats is nice for that. For museums, well they have a two-fold role - to hold and to document their collections; and to research and interpret them for various audiences. Naturally, then, there will be forces within them urging them to do something with the stuff they digitise, and do it now. The downside is that this has led to fragmentation of the materials digitised for these programmes.

My argument is that for modest sized projects it's not good to spread yourself too thinly trying to do everything, but instead to concentrate on part of the value chain. It's a straightforward architectural thing, and essentially I've realised that I'm just asking for a shift in the assumptions of funding bodies from demanding vertical slices to demanding (contributions to) horizontal slices - in short, to encourage the building of a strong sector-wide layered architecture. Vertical businesses deal in multiple (if not all) links in the chain from production to consumption. This is not really suited to much of what we do. Indeed even behemoths like Amazon don't yet do this (though they are strategically building out from their core business) - they don't actually write books or commission, edit and publish them, print them, or market them, or at the other end deliver them. They flog 'em and dispatch them, and act as an agent for others to do the same.

It's crazy to ask, as a precondition for funding, that a museum or partnership does the whole chain from digitisation and record enhancement, to search engine development, perhaps aggregating records, through to building the user experience as a website, a game etc. Things get done again and again and again, effects are diluted, money wasted. Lots of good stuff will still be made, of course. I don't know what this horizontal architecture would be like, though just possibly we can see parts of it falling into place with PNDS and the Integrated Architecture, and I don't know to what degree it would play a part in many projects in which UK museums engage, but I'm just suggesting that instead of HLF, for one, getting hung up on projects doing the whole end-to-end thing they should support us in coming up with a model to slice the market differently, working on one step at a time, and recognising that the very process of creating the digitised content is a valuable step that is somewhat independent of the creation of an audience-facing instance of it. Vertical slices make silos.

For more on slicing and dicing see here. Shalom.

Saturday, May 10, 2008

National Collections Online: why, what, how...in fact, ?!?

Friday afternoon we gathered at the Science Museum's Dana Centre, perhaps two dozen people sampling a variety of disciplines but all, for one reason or another, interested in the National Collections Online project. This was a workshop for the feasibility study being run by Flow Associates (Bridget McKenzie and Mark Stevenson) following up the discussions they've had with, presumably, all or most of those there. NMSI and the V&A had good representation, not just techy and new media but curatorial and from the collections data side. The National Maritime Museum, the third museum partner in the project, had a couple of representatives too. Culture24 is another partner, and Jane and Jon were there. The NMDC, who I take to be the initiating partner, had no direct representation but will hopefully learn the lessons anyway. The rest of us included e-learning experts, a couple of people from other museums and the Wellcome Trust, Ross flying the flag for academia, Ben Hayman of Lexara doing the same for the commercial sector, Nick of Collections Trust, Jill Cousins from EDL (which I thought was brilliant, unfortunately she couldn't stay until the denouement but I have a feeling that her presence may have swayed some sentiments in the direction of EDL), and finally Tom Steinberg, director of mySociety. A very interesting crowd.

The biggest problem with the session was, I guess, exactly the same as that which it was trying to address, and perhaps will have helped us to break through it before the next gathering. That problem is the vagueness of the project, which made it almost impossible to talk incisively about it. Someone mentioned a 10,000lb gorilla at one point; all I could see was a big cloud in the middle of the room, who knows what was at the heart of it. I've not talked about NCO before so I should do the basics: Flow's inquiry is into the "viability of an online resource to integrate national museum collections", and perhaps we can orient ourselves as to what this might mean by the well-known points of reference that they have mentioned: CHIN/CCO, CAN, culture.fr, Artstore, Powerhouse. It seems that many people said "not a portal", which was the common response at the session too. Given this, we needed both to decide what we were talking about if not a portal, and to know that in advance, to make much progress on questions like whether a "single point of access" would be of use to various audiences. This isn't a criticism of how the session was run, it was sure to be tricky as a natural consequence of the fact that, even before the project properly starts, it may be engaged in a major change of direction (and this is the time to do it!).

Anyway, it was a useful process and it did seem to bring out a good degree of consensus on the unsuitability of an old-skool "portal" approach. I took away a couple of key points though, including a useful reminder:
  1. Fiona Romeo made a remark that inspired a "doh!" moment in me: why not talk to Google about them ingesting our content directly? This time last year we were talking about this very prospect at the UK Museums Semantic Web Think Tank, and it still seems like a really good plan (Yahoo! too, especially since they seem at least as keen on adopting semantic technology). Still, somehow, I'd stopped thinking about this line of attack having been concentrating on the opportunity (or threat, if done badly) offered by EDL. And in the session I didn't shy away from pushing EDL as the obvious place in which collections data should be aggregated, so as to scale to all museums and not duplicate efforts. I argued that NCO should wait for this to firm up before deciding what to do that would complement or build upon it. However when Fiona mentioned the prospect of working with Google, I realised that over the last 6 months of talking to EDL about how to make sure that it wasn't seen as irrelevant by museums, I'd started to forget that it needn't be the only path down which we go to achieve our goals - just the one that grabbed my attention late last year. I think that the two approaches can be complementary, and in fact EDL itself would be well advised to talk directly to the search providers about their ingesting structured data. NCO could, in theory, provide something of a breakthrough that would be genuinely extensible and scalable. This would also give the lie to one of my contentions, which was that there was little point in doing something that only involved a small number of nationals. On the contrary, if it opened a very wide door for cross-collection search as this approach might, it would be very worthwhile.
  2. I had a moment of clarity, which followed on from the recent hooha on the MCG list concerning the disappointment or otherwise of the NOF Digitise programme. One of my arguments was that it would be better to make distinctions in projects between those that do digitisation, those that build functionality, and those that build user experiences. I realised that I'm talking basically about slicing funding differently, changing from vertical to horizontal slicing, and that it's not unlike talking about markets. I'm going to post separately with more thoughts on that.
  3. We had useful input from Tom. He was kind of preaching to the converted about the idea of making our content as widely available as possible, which is not surprising, but he also furnished us with some useful parallels and metaphors. From our later chat at the pub it's clear enough that he's as keen on the lightweight dissemination of semantic data as I am (albeit sceptical about the Semantic Web - but then in a sense so am I, it's semantic technology that is making the headway, and there are riches to be found there that do appear to be speeding us in the general direction of a more semantic web in any case).

Tuesday, May 06, 2008

The slightly better shape we're in

Well one thing I didn't mention yesterday was the good news, which is that Friday saw the official launch internally of a new set of guidelines for "digital programmes". This is something that Pete had been working on for some time, doing exactly what I've been arguing for from the point of view of planning, not just for current usefulness, but for longevity. That is, he makes explicit links between the strategic and business aims of the organisation, and our digital activities. From there he turns this into a set of principles or activities that must be followed/carried out by "any member of staff considering the creation of a digitally based resource either as an item in its own right or in support of an exhibition, publicity campaign or event". It's a very thoughtful document and a really big step for us. In fact it's pretty big full stop - 25 pages big.

Another part of the document outlines the planned ICT committee (heavy on the web), and gives a dozen strategic considerations that underlie the document such as centralised programme assessment, interoperability, technical fit, and some interesting ones like flexibility, which relates to the ability of potential partners to respond to our needs.

There's plenty more in there. There are still holes to fill, but I'm hopeful that this document will make a difference to how commissioning takes place from now on, with fewer nasty surprises for us and better allocation of resources for all.

Monday, May 05, 2008

The shape we're in (with a musical digression)

Today I'm still making the most of the quiet to dig into my vinyl and I'm working to the sounds of the German underground, mainly circa 1970-75. Right now, though, Phew (1981). This just so blows my mind. Can fans must have it, and they can get it on CD now (perhaps I will too, it's so hard to get at my LPs most of the time). Given that I've been writing about what our organisation needs to do to get its digital act together perhaps it's good that the previous listening has been pretty calming: Alpha Centauri, Tone Float and Outside the Dream Syndicate (not that mellow, actually, the last one). Phew has lovely peaceful moments (like Dream, now), but a driving, anxious motion to some tracks too (Signal). The legendary Roger of defunct Revolver Records in Bristol sold me my copy, I don't think I've thanked him enough and probably never will, but thanks anyway, Rog.

I've been thinking about the holes in our policies (notably concerning the preservation of UGC - we may not want to bother, but we have to at least formulate a policy towards it), and I've been looking back over the 6 years I've been there and thinking about how our responsibilities have developed. We need to work up a business case for a larger team, and perhaps one of a different shape. Frankly it's a no-brainer, but the case needs to be made and it's a good exercise to outline those changes. We also need to work out with the Godot-like web committee (I'm sure it will be along any day now, with purposeful stride and a keen sense of what needs to be done) quite what role we want the web folk at MoL to have. There is a wide range of things we could be doing, and right now we try to do most of them, with some key exceptions like web design and rich media work (Flash, video), which we always farm out. But we do too much and can't always do it properly, and I feel the need right now to consolidate, to work on our infrastructure. If we can decide on exactly what should be done in-house, and what can be effectively handled out of house, then we can get and give a clear idea of just what resources we need. Right now we have our content manager Bilkis (the one net gain since 2002), Mia, who does a lot of web work as well as various databases, and me, essentially just web.

Personally I reckon we need more people in the team, but there are alternatives, even if they're idiotic: the Museum could decide to draw in its horns and use our expertise just to commission, integrate and manage third party work, as well as content. We certainly do plenty of advising on this at the moment, and most of the vendors we've used recently have done great work for us (please drop me a line if you want to know whose work was not up to scratch), but I'm one of those who think the core work of a museum web team must also include taking care of the plumbing and probably building/running the core CMS too. Perhaps when I've finished my musings I'll post a proper list here of what functions I identify, what I think we should take into our remit, and what I think we should hand out. Best get on with it, then.

Sunday, May 04, 2008

The MCG thread: 21st Century digital curation

Bridget McKenzie has blogged and kicked off a great discussion on the MCG list following a seminar last week with Carole Souter of HLF and Roy Clare of MLA. I've written a reply but as usual I do go on a bit so I'm sending a brief version there and putting more depth here.

There are so many strands to this exchange that I want to jabber about so I’m going back to the start – Bridget’s post. Bridget, if I get your drift, yes, I agree: there is a balance to be struck between the effort put into basic “digitisation” (though I think there are various ideas circulating about what that term implies) and interpreting our digitised collections, and I’m not sure we’re being helped to strike the right balance at present. Raves and gripes about past projects aside, it’s how we spend the scant funds now available that bothers me. Going from a time of relative plenty to a time where most budgets are Spartan rather than Olympian, how do we plan clearly how we spend them?

I’m torn. On the one hand, I get the sentiment which says, we need to be making things that people will enjoy, with some sort of mediation and aimed at well-defined audiences, just like any exhibition or public event we host. I accept that HLF amongst others want their funds to be used on things that we don’t automatically do, things that aren’t our “core activities”. And I take Dylan’s point that, when resources are scarce (which is always), demonstrating (actually, having) impact is really important. But….

On the other, I would suggest that digitisation being referred to as a core activity betrays the fact that it is indeed now something we have to do. The trouble is, it may be a part of our core work, but it has never been core funded (at least not in terms of funds additional to the pre-digitisation days). So it’s a cop-out to say “it’s really important, and therefore we won’t pay for it”. Who will, then? Until our core funders, whoever they might be, come up with the cash to do this newly core activity on an ongoing rather than project basis, we’ll have to act as though it’s not “core” and go begging precisely because it’s so important. But apparently HLF think it’s important enough to not fund it too, so we’re stuffed. I’m glad to hear that in Birmingham at least digitisation is seen as something to drive with internal funds, I guess HLF are hoping more places will go that way.

The important thing that NOF Digi did (along with other HLF and DCMS funded projects) was get a load of collections records in some sort of order and snap some nice shots of the objects (which seems to be commonly accepted as equating to the “digitisation” of a typical museum/gallery item – making a good record and a decent photo). All the other stuff isn’t flim-flam – for those many users that had a great experience of Port Cities and other such projects, that experience was in no small part due to the contextualisation and linking built on top of the vital digitisation effort. I’m sure that BMAG’s Pre-Raphaelite website will be great, but as Rachel says, the payoff is much more than whatever users it attracts. Like an exhibition, the mediated experience will be more transient than the collections (physical or surrogate) on which it is built. We can’t equate web-ready records and surrogates with physical collections, but nevertheless they are the bricks and mortar from which our public-facing offering is built, and they will last longer than the wallpaper, lighting, video games and Persian rugs with which we make it “engaging”. Besides, sometimes all that stuff ends up seeming really forced just so we could secure the funding. Kind of like the way I’m now listening to Paul’s Boutique and hoping to find a clever excuse to quote some lyrics in support of my argument!

Unsurprisingly, then, I come out on the side of those like Bridget and Mike (E, not D) who argue that investing in the fundamentals in such a way that they can be built upon in future is the way to go. To me this means, basically, getting those records and surrogates done irrespective of anticipated clicks: if the object is worth accessioning, it’s worth recording properly. Getting that content into some public-facing form is, frankly, less vital, and we need to be considering intelligent ways of doing it. Building stuff in such a way that others can do good things with it is a step in that direction, which is why I’ve been pinning my hopes on EDL doing the Right Thing with an API. If this works the right way, any size of museum could contribute content and use Europeana’s centralised brain-power to do all the hard work. Then, once the basics are done, the mediating content can be tied in to it. But “digitisation” is the essential part. Like Mike, I’ve told both EDL and NCO (via Bridget) that they could reasonably drop a user interface altogether if they provide for other people to programme against an aggregated collection, but conversely a public UI without an API is pointless. I say “Rapunzel, Rapunzel, let down your hair” [see side 2 for details]. And Nick, the way you describe the vision for IA is very encouraging for this precise reason. Wish we’d had the chance to talk about it t’other night!

Stephen Lowy also raised the term sustainability. What he described is better termed preservation to my mind (although successful preservation needs sustaining…), but sustainability is at the core of this, and for this we need to make a clear distinction both conceptually and architecturally between the layers of the “digitisation” – the records and surrogates ([collections] data layer), access to them (the layer of functionality), contextualisation (a mediating layer of more content), showing them (a UI layer), perhaps including engagement with the resource in Web 2-ish ways (an additional set of social layers?). We shouldn’t be required to provide all of these parts; it’s ridiculous, even with multi-institution projects. Of course we often want to anyway, but for major funders to say, “we are only interested in helping you with the basics if you’ll build all the other layers too” is dumb.

I should add that I do actually believe that interpreting and being creative with our collections (and in fact going beyond them) is also a core activity of museums, and this obviously carries into the digital realm. To take a current famous example, Launchball, which I had a riot playing with my 5 and 7 year old yesterday: this is exactly what museums are for, but it could have been (relatively) rubbish if it had been compromised in the way Mike describes, perhaps tacked onto a digitisation initiative for the sake of funds and shorn of its purity. It has all it needs: food, sickles and girls

Mike again: “cash for sustainability is either not considered or frowned upon by funders who simply don't recognise that this is an absolute requirement in any successful (web) project.”, and Tehmina also wonders what measures are in place to avoid a repeat of situations where there is no planning for maintenance and continued content development. They’re right. At the start of each project we must be stating (a) what’s important about it (b) how long we want the important aspects to last (c) what strategies will be built in to assist this (d) what other potential sources of value the resource offers. This way we can build it appropriately (funds permitting) to ensure that we can indeed continue to realise value from the thing for the proposed lifespan. And once that period is over, we should also be in a better position to re-examine the resource, decide what’s still got potential to advance the organisation’s purpose, and maybe squeeze more value from it. And if we take the right approach to architecture, with conceptual and technical divisions between the layers, then if we’ve decided that one part is for a couple of years and another is “forever” we’ll be able to put our efforts where it matters.

Bridget asked: “what such lead bodies [HLF and MLA] should be doing to invest in 21st century digital curation?” Basically, I’d say, they should put their funds in three areas and realise that they need to be seen as separate endeavours:
  1. invest properly in strategic, sector-wide initiatives like the Information Architecture, that one would hope will do the plumbing job we need, and feed into EDL (and beyond?). Fingers crossed for this one.
  2. support simple digitisation to create the straight-ahead content to go into EDL and/or IA. It’s still got to be done. If it’s not supported with funds then MLA must ensure that digitisation is recognised by those providing the core funding as a core activity, and is adequately provided for on an ongoing basis. Not too optimistic.
  3. yes, still fund us to build some imaginative and innovative, born-to-die experimental exciting digital stuff aimed directly at the public. Who knows? Maybe.
  4. and funds need to have the right strings attached. Maybe this is sometimes related to “impact”; it should also be about identifying the “sources of value” in a resource, budgeting realistically for supporting them for a specific period, and planning for the end of its life.

Can you tell it’s a holiday weekend and I’m the only one at home?

PS. About those sickles: sorry, the Mummies also made it onto the turntable.

Saturday, May 03, 2008

PhotoLondon, genealogists and GEDCOM

Since I discovered that at least one family history website was sending users in the direction of the newly (if belatedly) launched "Database of 19th Century Photographers and Allied Trades in London: 1841-1901", I've been thinking more and more about how we can serve this audience.

GEDCOM (for which I guess this is effectively the official homepage) seems to be the data standard of choice for interoperability in genealogy software. The latest non-XML version dates to 1995, but although its Mormon keepers have been using the XML form for several years now (it was published in 2002) apparently none of the software out there in general use supports it still. How tragic is that? I guess in the museum world we're not quite the worst example of data standards paralysis! Anyway, if it had to be that which I would offer to the millions of family historians out there, so be it. It would make a lot of sense, though, to talk to that audience a bit, which I started to do on this thread. Useful feedback, not just on the worth of offering GEDCOM at all, but reminding me also of various things I'd forgotten (maybe overlooked) about that site. Copyright, sources, addresses (we have lots more structured data than is visible there), all things we could improve (given some resources).

It all ties in with an announcement this week from Stephen Brown to the MCG and MCN lists of the launch of Exhibitions of the Royal Photographic Society 1870-1915. The data in this (and an earlier site) seem so congruent with the photoLondon data that it would be lovely to explore how they might be tied together. A case for large or small semantic web, for feeds and APIs, for literally pooling data, for imaginative use of search engines or good old-fashioned web content creation and management... I don't know, but perhaps we'll explore this. And now I've remembered that some of our data is more precise and structured than I had recalled, perhaps the possibilities are that much greater.

Now to get stuck into some GEDCOM. I might succumb to the XML flavour, though, because frankly that 5.5 version looks like a dog.
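
For anyone wondering what that 5.5 flavour actually looks like, here is a rough sketch in Python of turning one photographer record into GEDCOM 5.5-style lines. It is only an illustration under assumptions: the `Photographer` and `Residence` classes, the field names and the sample person are all invented for the example and have nothing to do with how the photoLondon data is really stored. The tags themselves (INDI, NAME, OCCU, RESI, DATE, ADDR) are standard GEDCOM ones, and a complete file would also need a header and trailer record around this.

```python
from dataclasses import dataclass, field


@dataclass
class Residence:
    """Hypothetical studio or home address with a GEDCOM date phrase."""
    address: str
    date: str  # e.g. "FROM 1861 TO 1870"


@dataclass
class Photographer:
    """Hypothetical in-memory record; not the real photoLondon schema."""
    xref: str  # record identifier, e.g. "I1"
    given: str
    surname: str
    occupation: str = "Photographer"
    residences: list[Residence] = field(default_factory=list)


def to_gedcom(person: Photographer) -> list[str]:
    """Serialise one person as GEDCOM 5.5-style lines.

    Each line is "<level> [<xref>] <tag> [<value>]"; the level number
    expresses nesting, and the surname is wrapped in slashes.
    """
    lines = [
        f"0 @{person.xref}@ INDI",
        f"1 NAME {person.given} /{person.surname}/",
        f"1 OCCU {person.occupation}",
    ]
    for res in person.residences:
        lines.append("1 RESI")
        lines.append(f"2 DATE {res.date}")
        lines.append(f"2 ADDR {res.address}")
    return lines


if __name__ == "__main__":
    # Invented sample data, purely for illustration.
    sample = Photographer(
        xref="I1",
        given="Jane",
        surname="Example",
        residences=[Residence("1 Example Street, London", "FROM 1861 TO 1870")],
    )
    print("\n".join(to_gedcom(sample)))
```

The point of the sketch is just that the structured addresses and dates we already hold would map fairly directly onto repeated RESI blocks, whichever flavour of GEDCOM ends up being offered.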

Thursday, May 01, 2008

Broadband: have you tried it?

Well tonight of all nights is one where I simply have to blog at the first chance I get. I've been telling everyone that today I finally join the late 20th century, because up until now I was quite likely the last web developer in Britain never to have had broadband at home. Today, ahead of the planned date, it was switched on. Even before then my ISP (PlusNet) had doubled my allowance to 15GB/month, which should do for now...
Took me five minutes from unwrapping the router to being online (a little less to set up wireless), so it's no surprise that Fiona has been asking why it needed to take so long. Good question. There was an economic issue at one point, and then at about the time it stopped being one (prices falling, dial-up costs rising as we did more online and the modem got flakier and everything dropped to about 3kbps) it all became just TOO much for me to get my head around: so many choices, contract lengths, how much of what do I need? to bundle or not? And so on. So a good 18 months after getting the green light I finally get around to it and the whole thing is a breeze. Of course, it's too early to review PlusNet as yet, but aside from a slightly confusing sign-up process they've really impressed so far. I love the way they tell you everything on their website, so much more transparent than anything else I've seen.
As for broadband itself, well, to be honest I use it all day anyway so of course it doesn't blow me away, but what I hope it will do is have a profound impact on how I work and study. If I can work effectively from home at last and avoid a 2 hour commute every now and then; and if I can do my research away from the office, that will be all I could wish for. Speed is actually much less important than having no restrictions on what I can do and how long for, but of course I get both. There's a ton of stuff I want to experiment with, too, which there isn't time for at work. Yippee!
So for most of you that might be reading this (but probably aren't) this is not new at all, and for me it's not really a shiny new toy (well, a bit). But it is a big change, and I'm dead excited. How millennial!