About Me

My photo
Web person at the Imperial War Museum, having just completed a PhD about digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, usually museum tech stuff, but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Thursday, May 22, 2008

Leuven that aside...

I'm not quite sure where to go with a pun of profound lameness even by my own pitiful standards, but I'm giving it a Creative Commons Attribution Sharealike licence in case anyone can make it pay. Not holding my breath.

Anyway, to the point: bloody hell, railways! Well, there are other points, but I just wanted to get that off my chest. To Belgium by Eurostar is lovely, unless there's a strike. Eurostar, to their credit, did all they could to help, moving me onto a later train in the not-quite-certainty that it would get past Lille. They also told my hotel I'd be late. I got there, well, the right side of dawn but the wrong side of midnight. And travelling back I had the more day-to-day joy of British trains being titsup. Not forgetting Junior puking on the journey to the train before all of that. I must have offended St Christopher or something.

But the travel trauma was all worthwhile. The meeting itself was really good and I would love to return to Leuven, it's a beautiful and tranquil place. Imagine a city centre so quiet at rush hour that most of the traffic consists of parents cycling next to their infants on the way to school.

My take-homes from the meeting were many. Here are some.
  • There still seem to be divergent thoughts on whether there will be a record page, although I think it's looking very likely (as seen in the maquette). This means different things for different types of institution and material. Where the original digital object (DO) on the institution's website is more than an image of moderate size, the visitor will have a motivation to leave Europeana and visit the original. For films and audio this will be clear-cut. For assets where the surrogate is in any case an image, they may be less likely to leave. Perhaps this depends upon the size of the largest image that Europeana will show. It does bring home, however, how vital it will be to demonstrate to content contributors that there will be superb reporting of the usage of their material on that site.

  • linked to the previous point, if EDL hosts only image surrogates (and occasional small derivatives of multimedia?), it keeps its costs down and traffic to institutions high, but we must keep in mind quality control of the originals hosted elsewhere - what will be acceptable standards for different formats, and how do we control this when they're actually never ingested?

  • a drawback to the plan of ingesting only thumbnails is that for institutions with no existing online presence this is not very helpful. Perhaps EDL should offer a premium service, or one for limited numbers of surrogates per institution, whereby it offers to hold a full-size image/media asset and display this in a modified details/record page. I'm very keen on this as a means to attract the participation of tiny museums by, essentially, offering to get them online for the first time. The current model presupposes that every partner is already online - we need a 180-degree turn on this. It's also possibly a modest source of revenue.

  • EDLLocal is obviously the plan to encourage the participation of the minnows at the moment, and I need to find out more about this. Nevertheless, if it still presupposes a web destination outside of Europeana for all DOs it's not enough, IMHO. We need to see just how low we can set the barrier to participation, and how big we can make the reward. Rather than requiring a URL for each DO, it would be better to be able to take whatever a contributor can give - a DVD of images and a spreadsheet of metadata, say - and offer them

    • fuller record pages on Europeana

    • API access and code fragments for dropping onto a blog or whatever

  • I like the idea of using a wiki for UGC related to objects. It has the advantages of being

    • cheap and out of the box

    • very easy to create new pages

    • EDL GUIDs/DOIs usable for page names

    • microformat/POSH friendly?

    • familiar to many

    • clearly distinct from the "authorised" content

  • The new home page that Jon Purday showed looks like progress. The concept is a good step; the graphical side isn't finished but is getting there.

  • There was plenty to read and discuss about reorganising EDL for the next phase of the project, the build of version 1.0 (first we have to launch a series of prototypes). It's too early to talk about this.

  • We worked on the vision and mission, which was quite fun and threw into relief some differing ideas of what the whole thing is about. Personally, for a vision I like something like: Europeana.eu: culture and heritage, connected and shared. It's short, and it emphasises both the connections being made between knowledge and the sharing of it with people.

  • Money is tight; realistically EDL will be relying on a subsidy of some sort for a good while to come, but there may be some good commercial opportunities. These needn't conflict with either the ownership of the source data and digital assets by contributors, or the public service/public good ethos. The semantic graph derived from the combined dataset will belong to EDL, and this could be very marketable. I have to work up ideas here; I have OpenCalais in mind as some sort of model.

One other outcome was that the cabbie who finally got me home solved a UFO mystery I'd been intrigued by for a few months. One evening last autumn I watched a string of lights for about ten minutes as they emerged over the horizon and slowly rose through the clouds. Either it was something fast but a long way off - I was thinking shed-loads of planes from one of the Suffolk airbases - or actually as slow as they appeared and nearby. It turned out they were the latter: a whole load of candles floating under balloons, released from Gosfield School to confuse those who Don't Even Want To Believe But Are Intrigued By The Lights.

Tuesday, May 20, 2008

To Leuven (expenses paid), but who will pay for EDL?

Tomorrow sees EDL's working group 1 meeting at the Katholieke Universiteit in Leuven, not far from Brussels. Quite exciting to be visiting, albeit fleetingly, a place that played host to Matsys, Bouts, Erasmus and Vesalius, amongst others (not to mention, apparently, the infamous AQ Khan). I'm looking forward to attending, though with no expectation of being able to contribute a lot since this group covers different ground to the one I've worked with up till now. I'm not even sure if I'm part of the group or simply in attendance. Anyway, the meeting will look at progress with Europeana so far, and consider the business issues facing it, particularly how to move to phase 2 (i.e. following the prototype, to be launched in November) and how to build a sustainable future. Working my way through 100-odd pages of reading matter in preparation for the meeting, I'm struck by how big a challenge it will be to find the necessary ongoing resources, but also the fact that they are tackling the problem head-on and examining a wide variety of options, from direct subsidy, through subscription by contributors or users, to corporate partnership or sponsorship.

A significant factor in the search for revenue-raising avenues is the fact that Europeana is not going to be a content owner in any significant way, but rather a broker/facilitator for accessing content owned by others. One possibility that I believe could be worth exploring is some form of partnership with a search provider. Yahoo! may be a bit too distracted to talk at the moment, but it, along with Google, could be a productive partner. Both sides could benefit by working on an interface and aligning their data structures, and EDL could perhaps offer quite a bit to such a partner in terms of preferential access to the semantically enriched data it will hold. This might be directly to do with searching the resources in EDL, or it might be, say, helping to clean up datasets of people and places. In exchange, maybe either some cash or technological assistance? Perhaps some of the semantic-y startups currently taking wing could also be interesting to work with, but they won't be as well resourced. Cultural heritage organisations have a lot of knowledge and context to offer here, so maybe there's a business model to be had.

Friday, May 16, 2008

Yay! Follow the Search Monkey!

Well, rock'n'roll, looks like Yahoo!'s* Search Monkey is going live today. Apparently this will allow us as site owners to (cribbing from RWW's report) "share structured data with Yahoo!, using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction." On the basis of that data, other developers will build apps and users will enhance their search. This seems to be precisely the sort of thing we wished for in the SWTT (in fact the legendary Mike Lowndes pointed to earlier signs of this move last month). It's also what I had my doh! moment about last week.

So if it's what I hope it is, we can co-ordinate with others in the sector on some standard fields (and keep it simple initially), push our content into Yahoo! and build apps on top of their search engine. My reservation would be that at the moment it seems to be about building either "Infobars" or "Enhanced Results", but perhaps there's something more API-like and programmable there, or on the way.


* is this the right way to punctuate the possessive of that annoyingly-punctuated name? Answers on a postcard (to Jerry Y!ang).

Thursday, May 15, 2008

Microsoft's prototype chirpy cheap cheap multi-touch interface: Touch Wall

No point rewriting this post on RWW, but this is cool. Watch the vid. Shame there are no plans to "productize" now, but the cost sounds low and the interface intuitive (and it obeys the conventions developing around multi-touch).

It's also worth looking at the Touch blog now and then. It's mainly about near-field communication (lots of RFID) but also touch interfaces (nothing on this MS thing yet).

Wednesday, May 14, 2008

Free font creation web app. Cool!

I like: Font Making Made Easy (ReadWriteWeb), pointing at FontStruct

Digital archiving and risk management, that's DRAMBORA's game

The DRAMBORA project is blogged about on ULCC's da blog. It's another initiative in the digital curation/archiving community that could have outputs that can be directly transported into good management practice beyond formal archiving.

Tuesday, May 13, 2008

The Passively Multiplayer Online Game, PMOG

TechCrunch led me to the Passively Multiplayer Online Game, PMOG. Looks like it could be fun, and I wonder if it will turn out to have a bit of overlap with the "web quests" idea of the sort that NMOLP was apparently inspired by (not much point clicking that link, though - it's password protected. Google it and look at their cache if you like :-). Just keep escaping the login windows).

The Yahoo! Internet Location Platform

Brady Forrest writes about the Yahoo! Internet Location Platform, which sounds very cool. Couldn't be simpler, really. I'll have to have a look at just what sort of entities that relate to MoL have a WOEID (Where On Earth ID), but doubtless there's lots we can do with this. If we can find the time...
He also points to this for how it all ties to Flickr.

Monday, May 12, 2008

Slicing the market

To follow up my recent posts on the 21st century museum and the National Collections Online feasibility study, here's a little more about the question of how funds are allocated by bodies like the HLF, and why we've tended to see them requiring digitisation projects to run from end to end, from digitisation to web publication.

I don't really know enough about the programmes that were run in the past - NOF Digitise, for example, or the HLF's (still funding today). However, from the projects I do know of that drew on these funding streams, it seems clear enough that concentrating on digitisation per se was never enough - a user-facing product was expected. These did have a sort of aggregated interface (http://www.enrichuk.net/, now pointing at MICHAEL UK), but it's not really what I've got in mind.

It's quite understandable that this practice persists; indeed there is a good case for it. Funders want to see direct user impacts, and a website that can generate stats is nice for that. Museums, for their part, have a two-fold role: to hold and document their collections, and to research and interpret them for various audiences. Naturally, then, there will be forces within them urging them to do something with the stuff they digitise, and do it now. The downside is that this has led to fragmentation of the materials digitised for these programmes.

My argument is that for modest-sized projects it's not good to spread yourself too thinly trying to do everything; better to concentrate on part of the value chain. It's a straightforward architectural thing, and essentially I've realised that I'm just asking for a shift in the assumptions of funding bodies from demanding vertical slices to demanding (contributions to) horizontal slices - in short, to encourage the building of a strong sector-wide layered architecture. Vertical businesses deal in multiple (if not all) links in the chain from production to consumption. This is not really suited to much of what we do. Indeed, even behemoths like Amazon don't yet do this (though they are strategically building out from their core business) - they don't actually write books, or commission, edit and publish them, print them, market them, or, at the other end, deliver them. They flog 'em and dispatch them, and act as an agent for others to do the same.

It's crazy to ask, as a precondition for funding, that a museum or partnership does the whole chain from digitisation and record enhancement, to search engine development (perhaps aggregating records), through to building the user experience as a website, a game etc. Things get done again and again and again, effects are diluted, money is wasted. Lots of good stuff will still be made, of course. I don't know quite what this horizontal architecture would look like, though just possibly we can see parts of it falling into place with the PNDS and the Integrated Architecture, and I don't know to what degree it would play a part in many of the projects in which UK museums engage. But instead of getting hung up on projects doing the whole end-to-end thing, HLF, for one, should support us in coming up with a model that slices the market differently: working on one step at a time, and recognising that the very process of creating the digitised content is a valuable step that is somewhat independent of the creation of an audience-facing instance of it. Vertical slices make silos.

For more on slicing and dicing see here. Shalom.

Saturday, May 10, 2008

National Collections Online: why, what, how...in fact, ?!?

Friday afternoon we gathered at the Science Museum's Dana Centre, perhaps two dozen people sampling a variety of disciplines but all, for one reason or another, interested in the National Collections Online project. This was a workshop for the feasibility study being run by Flow Associates (Bridget McKenzie and Mark Stevenson), following up the discussions they've had with, presumably, all or most of those there. NMSI and the V&A had good representation, not just techy and new media but curatorial and from the collections data side. The National Maritime Museum, the third museum partner in the project, had a couple of representatives too. Culture24 is another partner, and Jane and Jon were there. The NMDC, who I take to be the initiating partner, had no direct representation but will hopefully learn the lessons anyway. The rest of us included e-learning experts, a couple of people from other museums and the Wellcome Trust, Ross flying the flag for academia, Ben Hayman of Lexara doing the same for the commercial sector, Nick of the Collections Trust, Jill Cousins from EDL (which I thought was brilliant; unfortunately she couldn't stay until the denouement, but I have a feeling that her presence may have swayed some sentiments in the direction of EDL), and finally Tom Steinberg, director of mySociety. A very interesting crowd.

The biggest problem with the session was, I guess, exactly the same as that which it was trying to address, and perhaps it will have helped us to break through it before the next gathering. That problem is the vagueness of the project, which made it almost impossible to talk incisively about it. Someone mentioned a 10,000lb gorilla at one point; all I could see was a big cloud in the middle of the room, and who knows what was at the heart of it. I've not talked about NCO before so I should do the basics: Flow's inquiry is into the "viability of an online resource to integrate national museum collections", and perhaps we can orient ourselves as to what this might mean by the well-known points of reference that they have mentioned: CHIN/CCO, CAN, culture.fr, Artstor, Powerhouse. It seems that many people said "not a portal", which was the common response at the session too. Given this, we both needed to decide what we were talking about, if not a portal, and to know that in advance to make much progress on questions like whether a "single point of access" would be of use to various audiences. This isn't a criticism of how the session was run; it was sure to be tricky as a natural consequence of the fact that, even before the project properly starts, it may be engaged in a major change of direction (and this is the time to do it!).

Anyway, it was a useful process and it did seem to bring out a good degree of consensus on the unsuitability of an old-skool "portal" approach. I took away a couple of key points though, including a useful reminder:
  1. Fiona Romeo made a remark that inspired a "doh!" moment in me: why not talk to Google about them ingesting our content directly? This time last year we were talking about this very prospect at the UK Museums Semantic Web Think Tank, and it still seems like a really good plan (Yahoo! too, especially since they seem at least as keen on adopting semantic technology). Still, somehow, I'd stopped thinking about this line of attack, having been concentrating on the opportunity (or threat, if done badly) offered by EDL. And in the session I didn't shy away from pushing EDL as the obvious place in which collections data should be aggregated, so as to scale to all museums and not duplicate efforts. I argued that NCO should wait for this to firm up before deciding what to do that would complement or build upon it. However, when Fiona mentioned the prospect of working with Google, I realised that over the last 6 months of talking to EDL about how to make sure that it wasn't seen as irrelevant by museums, I'd started to forget that it needn't be the only path down which we go to achieve our goals - just the one that grabbed my attention late last year. I think that the two approaches can be complementary, and in fact EDL itself would be well advised to talk directly to the search providers about their ingesting structured data. NCO could, in theory, provide something of a breakthrough that would be genuinely extensible and scalable. This would also give the lie to one of my contentions, which was that there was little point in doing something that only involved a small number of nationals. On the contrary, if it opened a very wide door for cross-collection search, as this approach might, it would be very worthwhile.
  2. I had a moment of clarity, which followed on from the recent hooha on the MCG list concerning the disappointment or otherwise of the NOF Digitise programme. One of my arguments was that it would be better to make distinctions in projects between those that do digitisation, those that build functionality, and those that build user experiences. I realised that I'm talking basically about slicing funding differently, changing from vertical to horizontal slicing, and that it's not unlike talking about markets. I'm going to post separately with more thoughts on that.
  3. We had useful input from Tom. Though he was, not surprisingly, kind of preaching to the converted about making our content as widely available as possible, he also furnished us with some useful parallels and metaphors. From our later chat at the pub it's clear enough that he's as keen on the lightweight dissemination of semantic data as I am (albeit sceptical about the Semantic Web - but then in a sense so am I; it's semantic technology that is making the headway, and there are riches to be found there that do appear to be speeding us in the general direction of a more semantic web in any case).

Tuesday, May 06, 2008

The slightly better shape we're in

Well, one thing I didn't mention yesterday was the good news, which is that Friday saw the official internal launch of a new set of guidelines for "digital programmes". This is something that Pete had been working on for some time, doing exactly what I've been arguing for from the point of view of planning, not just for current usefulness but for longevity. That is, he makes explicit links between the strategic and business aims of the organisation and our digital activities. From there he turns this into a set of principles or activities that must be followed/carried out by "any member of staff considering the creation of a digitally based resource either as an item in its own right or in support of an exhibition, publicity campaign or event". It's a very thoughtful document and a really big step for us. In fact it's pretty big full stop - 25 pages big.


Another part of the document outlines the planned ICT committee (heavy on the web), and gives a dozen strategic considerations that underlie the document such as centralised programme assessment, interoperability, technical fit, and some interesting ones like flexibility, which relates to the ability of potential partners to respond to our needs.


There's plenty more in there. There are still holes to fill, but I'm hopeful that this document will make a difference to how commissioning takes place from now on, with fewer nasty surprises for us and better allocation of resources for all.

Monday, May 05, 2008

The shape we're in (with a musical digression)

Today I'm still making the most of the quiet to dig into my vinyl and I'm working to the sounds of the German underground, mainly circa 1970-75. Right now, though, Phew (1981). This just so blows my mind. Can fans must have it, and they can get it on CD now (perhaps I will too, it's so hard to get at my LPs most of the time). Given that I've been writing about what our organisation needs to do to get its digital act together, perhaps it's good that the previous listening has been pretty calming: Alpha Centauri, Tone Float and Outside the Dream Syndicate (not that mellow, actually, the last one). Phew has lovely peaceful moments (like Dream, now), but a driving, anxious motion to some tracks too (Signal). The legendary Roger of the defunct Revolver Records in Bristol sold me my copy; I don't think I've thanked him enough and probably never will, but thanks anyway, Rog.

I've been thinking about the holes in our policies (notably concerning the preservation of UGC - we may not want to bother, but we have to at least formulate a policy towards it), and I've been looking back over the 6 years I've been there and thinking about how our responsibilities have developed. We need to work up a business case for a larger team, and perhaps one of a different shape. Frankly it's a no-brainer, but the case needs to be made and it's a good exercise to outline those changes. We also need to work out with the Godot-like web committee (I'm sure it will be along any day now, with purposeful stride and a keen sense of what needs to be done) quite what role we want the web folk at MoL to have. There is a wide range of things we could be doing, and right now we try to do most of them, with some key exceptions like web design and rich media work (Flash, video), which we always farm out. But we do too much and can't always do it properly, and I feel the need right now to consolidate, to work on our infrastructure. If we can decide on exactly what should be done in-house, and what can be effectively handled out of house, then we can get and give a clear idea of just what resources we need. Right now we have our content manager Bilkis (the one net gain since 2002), Mia, who does a lot of web work as well as various databases, and me, essentially just web.

Personally I reckon we need more people in the team, but there are alternatives, even if they're idiotic: the Museum could decide to draw in its horns and use our expertise just to commission, integrate and manage third party work, as well as content. We certainly do plenty of advising on this at the moment, and most of the vendors we've used recently have done great work for us (please drop me a line if you want to know whose work was not up to scratch), but I'm one of those who think the core work of a museum web team must also include taking care of the plumbing and probably building/running the core CMS too. Perhaps when I've finished my musings I'll post a proper list here of what functions I identify, what I think we should take into our remit, and what I think we should hand out. Best get on with it, then.

Sunday, May 04, 2008

The MCG thread: 21st Century digital curation

Bridget McKenzie has blogged and kicked off a great discussion on the MCG list following a seminar last week with Carole Souter of HLF and Roy Clare of MLA. I've written a reply but as usual I do go on a bit so I'm sending a brief version there and putting more depth here.

There are so many strands to this exchange that I want to jabber about so I’m going back to the start – Bridget’s post. Bridget, if I get your drift, yes, I agree: there is a balance to be struck between the effort put into basic “digitisation” (though I think there are various ideas circulating about what that term implies) and interpreting our digitised collections, and I’m not sure we’re being helped to strike the right balance at present. Raves and gripes about past projects aside, it’s how we spend the scant funds now available that bothers me. Going from a time of relative plenty to a time where most budgets are Spartan rather than Olympian, how do we plan clearly how we spend them?

I’m torn. On the one hand, I get the sentiment which says, we need to be making things that people will enjoy, with some sort of mediation and aimed at well-defined audiences, just like any exhibition or public event we host. I accept that HLF amongst others want their funds to be used on things that we don’t automatically do, things that aren’t our “core activities”. And I take Dylan’s point that, when resources are scarce (which is always), demonstrating (actually, having) impact is really important. But….

On the other, I would suggest that digitisation being referred to as a core activity betrays the fact that it is indeed now something we have to do. The trouble is, it may be a part of our core work, but it has never been core funded (at least not in terms of funds additional to those of the pre-digitisation days). So it’s a cop-out to say “it’s really important, and therefore we won’t pay for it”. Who will, then? Until our core funders, whoever they might be, come up with the cash to do this newly core activity on an ongoing rather than project basis, we’ll have to act as though it’s not “core” and go begging precisely because it’s so important. But apparently HLF think it’s important enough to not fund it too, so we’re stuffed. I’m glad to hear that in Birmingham, at least, digitisation is seen as something to drive with internal funds; I guess HLF are hoping more places will go that way.

The important thing that NOF Digi did (along with other HLF and DCMS funded projects) was get a load of collections records in some sort of order and snap some nice shots of the objects (which seems to be commonly accepted as equating to the “digitisation” of a typical museum/gallery item – making a good record and a decent photo). All the other stuff isn’t flim-flam – for those many users that had a great experience of Port Cities and other such projects, that experience was in no small part due to the contextualisation and linking built on top of the vital digitisation effort. I’m sure that BMAG’s Pre-Raphaelite website will be great, but as Rachel says, the payoff is much more than whatever users it attracts. Like an exhibition, the mediated experience will be more transient than the collections (physical or surrogate) on which it is built. We can’t equate web-ready records and surrogates with physical collections, but nevertheless they are the bricks and mortar from which our public-facing offering is built, and they will last longer than the wallpaper, lighting, video games and Persian rugs with which we make it “engaging”. Besides, sometimes all that stuff ends up seeming really forced just so we could secure the funding. Kind of like the way I’m now listening to Paul’s Boutique and hoping to find a clever excuse to quote some lyrics in support of my argument!

Unsurprisingly, then, I come out on the side of those like Bridget and Mike (E, not D) who argue that investing in the fundamentals in such a way that they can be built upon in future is the way to go. To me this means, basically, getting those records and surrogates done irrespective of anticipated clicks: if the object is worth accessioning, it’s worth recording properly. Getting that content into some public-facing form is, frankly, less vital, and we need to be considering intelligent ways of doing it. Building stuff in such a way that others can do good things with it is a step in that direction, which is why I’ve been pinning my hopes on EDL doing the Right Thing with an API. If this works the right way, any size of museum could contribute content and use Europeana’s centralised brain-power to do all the hard work. Then the basics are done, and the mediating content can be tied in to them. But “digitisation” is the essential part. Like Mike, I’ve told both EDL and NCO (via Bridget) that they could reasonably drop a user interface altogether if they provide for other people to programme against an aggregated collection; conversely, a public UI without an API is pointless. I say “Rapunzel, Rapunzel, let down your hair” [see side 2 for details]. And Nick, the way you describe the vision for IA is very encouraging for this precise reason. Wish we’d had the chance to talk about it t’other night!

Stephen Lowy also raised the term sustainability. What he described is better termed preservation to my mind (although successful preservation needs sustaining…), but sustainability is at the core of this, and for this we need to make a clear distinction both conceptually and architecturally between the layers of the “digitisation” – the records and surrogates ([collections] data layer), access to them (the layer of functionality), contextualisation (a mediating layer of more content), showing them (a UI layer), perhaps including engagement with the resource in Web 2-ish ways (an additional set of social layers?). We shouldn’t be required to provide all of these parts; it’s ridiculous, even with multi-institution projects. Of course we often want to anyway, but for major funders to say, “we are only interested in helping you with the basics if you’ll build all the other layers too” is dumb.

I should add that I do actually believe that interpreting and being creative with our collections (and in fact going beyond them) is also a core activity of museums, and this obviously carries into the digital realm. To take a current famous example, Launchball, which I had a riot playing with my 5- and 7-year-olds yesterday: this is exactly what museums are for, but it could have been (relatively) rubbish if it had been compromised in the way Mike describes, perhaps tacked onto a digitisation initiative for the sake of funds and shorn of its purity. It has all it needs: food, sickles and girls.

Mike again: “cash for sustainability is either not considered or frowned upon by funders who simply don't recognise that this is an absolute requirement in any successful (web) project.” Tehmina also wonders what measures are in place to avoid a repeat of situations where there is no planning for maintenance and continued content development. They’re right. At the start of each project we must be stating (a) what’s important about it, (b) how long we want the important aspects to last, (c) what strategies will be built in to assist this, and (d) what other potential sources of value the resource offers. This way we can build it appropriately (funds permitting) to ensure that we can indeed continue to realise value from the thing for the proposed lifespan. And once that period is over, we should also be in a better position to re-examine the resource, decide what’s still got potential to advance the organisation’s purpose, and maybe squeeze more value from it. And if we take the right approach to architecture, with conceptual and technical divisions between the layers, then if we’ve decided that one part is for a couple of years and another is “forever”, we’ll be able to put our efforts where it matters.

Bridget asked: “what such lead bodies [HLF and MLA] should be doing to invest in 21st century digital curation?” Basically, I’d say, they should put their funds in three areas and realise that they need to be seen as separate endeavours:
  1. invest properly in strategic, sector-wide initiatives like the Information Architecture, that one would hope will do the plumbing job we need, and feed into EDL (and beyond?). Fingers crossed for this one.
  2. support simple digitisation to create the straight-ahead content to go into EDL and/or IA. It’s still got to be done. If it’s not supported with funds, then MLA must ensure that digitisation is recognised by those providing the core funding as a core activity, and is adequately provided for on an ongoing basis. Not too optimistic.
  3. yes, still fund us to build some imaginative and innovative, born-to-die experimental exciting digital stuff aimed directly at the public. Who knows? Maybe.
  4. and funds need to have the right strings attached. Maybe this is sometimes related to “impact”; it should also be about identifying the “sources of value” in a resource, budgeting realistically for supporting them for a specific period, and planning for the end of its life.

Can you tell it’s a holiday weekend and I’m the only one at home?

PS. About those sickles: sorry, the Mummies also made it onto the turntable.

Saturday, May 03, 2008

PhotoLondon, genealogists and GEDCOM

Since I discovered that at least one family history website was sending users in the direction of the newly (and belatedly) launched "Database of 19th Century Photographers and Allied Trades in London: 1841-1901", I've been thinking more and more about how we can serve this audience.

GEDCOM (for which I guess this is effectively the official homepage) seems to be the data standard of choice for interoperability in genealogy software. The latest non-XML version dates to 1995, but although its Mormon keepers have been using the XML form for several years now (it was published in 2002), apparently none of the software out there in general use supports it yet. How tragic is that? I guess in the museum world we're not quite the worst example of data standards paralysis! Anyway, if that's what I have to offer the millions of family historians out there, so be it. It would make a lot of sense, though, to talk to that audience a bit, which I started to do on this thread. Useful feedback, not just on the worth of offering GEDCOM at all, but also reminding me of various things I'd forgotten (maybe overlooked) about that site. Copyright, sources, addresses (we have lots more structured data than is visible there): all things we could improve (given some resources).

It all ties in with an announcement this week from Stephen Brown to the MCG and MCN lists of the launch of Exhibitions of the Royal Photographic Society 1870-1915. The data in this (and an earlier site) seem so congruent with the photoLondon data that it would be lovely to explore how they might be tied together. A case for large or small semantic web, for feeds and APIs, for literally pooling data, for imaginative use of search engines or good old-fashioned web content creation and management... I don't know, but perhaps we'll explore this. And now I've remembered that some of our data is more precise and structured than I had recalled, perhaps the possibilities are that much greater.

Now to get stuck into some GEDCOM. I might succumb to the XML flavour, though, because frankly that 5.5 version looks like a dog.
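
In fairness, the 5.5 line syntax itself is dead simple - each line is a level number, an optional @-delimited cross-reference ID, a tag and an optional value, and the level numbers stitch the flat lines back into a tree. A minimal parsing sketch (my own illustration, nothing from the spec):

// parse one GEDCOM 5.5 line of the form: LEVEL [@XREF@] TAG [VALUE]
function parseGedcomLine(line) {
  var m = line.match(/^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$/);
  if (!m) return null;
  return { level: +m[1], xref: m[2] || null, tag: m[3], value: m[4] || '' };
}
// e.g. parseGedcomLine('1 NAME John /Smith/')
//   -> { level: 1, xref: null, tag: 'NAME', value: 'John /Smith/' }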

Thursday, May 01, 2008

Broadband: have you tried it?

Well tonight of all nights is one where I simply have to blog at the first chance I get. I've been telling everyone that today I finally join the late 20th century, because up until now I was quite likely the last web developer in Britain never to have had broadband at home. Today, ahead of the planned date, it was switched on. Even before then my ISP (PlusNet) had doubled my allowance to 15GB/month, which should do for now...
Took me five minutes from unwrapping the router to being online (a little less to set up wireless), so it's no surprise that Fiona has been asking why it needed to take so long. Good question. There was an economic issue at one point, and then at about the time it stopped being one (prices falling, dial-up costs rising as we did more online and the modem got flakier and everything dropped to about 3kbps) it all became just TOO much for me to get my head around: so many choices, contract lengths, how much of what do I need? to bundle or not? And so on. So a good 18 months after getting the green light I finally get around to it and the whole thing is a breeze. Of course, it's too early to review PlusNet as yet, but aside from a slightly confusing sign-up process they've really impressed so far. I love the way they tell you everything on their website, so much more transparent than anything else I've seen.
As for broadband itself, well, to be honest I use it all day anyway so of course it doesn't blow me away, but what I hope it will do is have a profound impact on how I work and study. If I can work effectively from home at last and avoid a 2-hour commute every now and then, and if I can do my research away from the office, that will be all I could wish for. Speed is actually much less important than having no restrictions on what I can do and how long for, but of course I get both. There's a ton of stuff I want to experiment with, too, which there isn't time for at work. Yippee!
So for most of you that might be reading this (but probably aren't) this is not new at all, and for me it's not really a shiny new toy (well, a bit). But it is a big change, and I'm dead excited. How millennial!

Tuesday, April 29, 2008

KML to go

Well I've finally bridged the gap between our site summaries (which have long been available online) and GMaps (likewise). I told part of the story earlier - the summaries that our archaeology service write are compiled into an XML document (processed out of Word via a macro...), and transformed into HTML with XSLT. But because the location data is always in OS grid references it's no good for online mapping apps (which all like latitude and longitude). So I've been trying to find a way to get lat/long for the sites (which number many thousands) in order to let us plot the data for World And Dog, if they're not otherwise engaged.

Step one was to get a way to clean up the TQ-style OSGB data. Step two, adapt code/write a web service to enable me to pass that lot in and get back latitudes and longitudes. From there I needed to combine the resultant XML with the original site summaries. I tried doing stuff with Yahoo! Pipes but it wasn't too keen on my XML, or at least it wouldn't show me the items. Anyway, instead of that I thought I'd draw both datasets into one XSLT transformation and output KML, which is what I've done today (thanks in part to the inspiration of Raymond Yee's great "Pro Web 2.0 Mashups" book from Apress). I would have liked to just pass in a single variable (year) and go through all these steps automatically but it wasn't worth the hassle since every step needed a touch of hand-massaging on the data.
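
In case KML is new to you, the end product is essentially just a list of Placemark elements, one per site. A rough sketch of the shape of each one (the real transformation happens in XSLT; this JavaScript version, with invented field names, is only to show the structure):

// sketch: one output Placemark per site summary (field names invented)
function toPlacemark(site) {
  return '<Placemark>\n' +
    '  <name>' + site.name + '</name>\n' +
    '  <description><![CDATA[' + site.summary + ']]></description>\n' +
    '  <Point><coordinates>' + site.lon + ',' + site.lat + ',0</coordinates></Point>\n' +
    '</Placemark>';
}
// NB KML puts longitude before latitude in the coordinates element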

The KML includes all the site summary descriptive content. Looking at the resultant Google Maps I see there are some glitches, like things in the wrong place, and things without coordinates, and actually I need to check out 1999 which has a fatal error somewhere. I don't have time to fix these right now, but overall I'm pretty excited: at long last, we have a nice mapping interface for the public to look at all those thousands of excavations, desktop assessments, surveys etc. that MoLAS has conducted since 1992 (but not 2007 yet). Well I say all, in fact some of those from outside the London area are not included.

Now I'm hoping that someone will come and do something cool with the KML. In due course I'll have a go myself, but if you come up with anything please let me know!

So here are links to the functioning maps. Save them to My Maps and do something with the result!

[Edit: you can also see these embedded into our website. Here's 2006]

MoLAS site summaries 1992
MoLAS site summaries 1993
MoLAS site summaries 1994
MoLAS site summaries 1995
MoLAS site summaries 1996
MoLAS site summaries 1997
MoLAS site summaries 1998
[MoLAS site summaries 1999 - bust right now]
MoLAS site summaries 2000
MoLAS site summaries 2001
MoLAS site summaries 2002
MoLAS site summaries 2003
MoLAS site summaries 2004
MoLAS site summaries 2005
MoLAS site summaries 2006

Thursday, April 24, 2008

OSGB-lat/long web service for GIS

We have loads of geographical information. Trouble is, it's almost all in OSGB grid reference form, which is no good for feeding to apps like Google Maps. Worse, much of it uses old-style 100km squares (mainly TQ, which covers London). We've taken a couple of approaches to this - hand-making a few maps in Google (for example, our Olympics work is mapped here), and using batch processing scripts from ESRI or others to manipulate the data in the geographical fields of our archaeology database, creating latitude and longitude values to accompany the OS grid references. However there is still a good set of data that needs another approach - for example, our site summaries for the last decade and more are available online. These XML-driven pages contain only OSGB data. I plot them very crudely onto our own ESRI-based map application, but would much rather have KML to work with.

So to the point. I came across a great script on the Hairy Spider site that also runs a web service. I wanted to take this further so that (a) I could pass in lots of values, not just one at a time, (b) it could handle the TQ-style syntax, and (c) I'd not keep on hitting friendly Mr Spider's server. The code available is for the conversion from OS eastings and northings to lat/long only, and I've not tried to reproduce his "proper" web service, but I do now have something that will work for my needs. I can pass in a querystring like
gr=tq709098,SW465987,tl123456,51232456,512300245600
and get back, for each input, a latitude, a longitude and the echoed grid reference, something like:

50.8618352280453  0.431144625126249   (tq709098)
50.7318874223712  -5.59642485933937   (SW465987)
52.0968934973257  -0.358719120254802  (tl123456)
52.0968934973257  -0.358719120254802  (51232456)
52.0968934973257  -0.358719120254802  (512300245600)

[note that the last three values in the query were all for the same grid reference but in different formats, producing the same lat/long]
For me this is pretty useful. I may well extend this to take more parameters and pass out KML, but the main thing is having a means to convert the data on the fly over HTTP.
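
The fiddly part is (b), expanding the two-letter 100km square into full eastings and northings, but that at least is a standard, well-documented algorithm. A minimal sketch of it (mine, not Hairy Spider's code, and handling only the letter-prefixed flavour):

// expand an OSGB reference like "TQ709098" into metre eastings/northings
function gridRefToEN(ref) {
  ref = ref.toUpperCase().replace(/\s+/g, '');
  var l1 = ref.charCodeAt(0) - 65, l2 = ref.charCodeAt(1) - 65;
  if (l1 > 7) l1--; // 'I' isn't used in the grid, so shuffle
  if (l2 > 7) l2--; // the letters after it down by one
  var e100k = ((l1 - 2) % 5) * 5 + (l2 % 5);                    // TQ -> 5
  var n100k = 19 - Math.floor(l1 / 5) * 5 - Math.floor(l2 / 5); // TQ -> 1
  var digits = ref.slice(2), half = digits.length / 2;
  var e = +(digits.slice(0, half) + '00000').slice(0, 5); // pad to metres
  var n = +(digits.slice(half) + '00000').slice(0, 5);
  return { easting: e100k * 100000 + e, northing: n100k * 100000 + n };
}
// gridRefToEN('TQ709098') -> { easting: 570900, northing: 109800 }

From there it's the standard OSGB36 eastings/northings to lat/long maths, which is exactly the hard work the script already does.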

Many thanks to Hairy Spider for doing all the hard work on this. I've tested the outputs and they're very close to the OS's own tool, so good work HS!

Next thing is to use this. I'll let you know when I do, and we might also be open to making the service publicly available if that would be of assistance to anyone who might be reading this.

Tuesday, April 22, 2008

AdaptiveBlue offers AB Meta. Did the earth move for you?

AdaptiveBlue is an interesting company. Although I've not found myself using their Blue Organiser tool all that much, I can see which way they're pointing and I like it. Now they have announced how they wish to refresh the old familiar META tags in the heads of web pages with their take on object-centric metadata. AB Meta (apparently developed with other web companies) is all about surfacing semantic data into the layer that we typically interact with, so that even non-tech people can hopefully author it without too much trouble. From their page:

AB Meta is a simple and open format for annotating pages that are about things.

A book publisher can use AB Meta to provide information about a book such as the author and ISBN, a restaurant owner can provide information such as the cuisine, phone number and address and a movie reviewer can annotate reviews with movie titles and directors.

The format allows site owners to describe the main thing on the HTML page in a very simple way - using standard META headers. AB Meta is purposefully simple and understandable by anyone. AB Meta is based on eRDF Standard.


I'm especially interested in this "surface" expression/implementation of SW. It's clear to me that much of the running in recent times has been made by companies looking to SW-style concepts and aspirations to deliver real benefits to their business, and only in a few cases has this led to them taking a classic-ish SW path (c.f. Reuters with OpenCalais). AdaptiveBlue and many others have instead set out along the light-weight, near-the-surface route, and as an eternal optimist (for some reason), I am hopeful that this will ultimately deliver the meat that the heavy-weight, deep SW needs to do something exciting. Thus killing the chicken/egg situation, with pay-offs along the way. This was the real take-home for me of last year's SW think tank.
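
Consuming it should be about as light as the authoring, too. Out of curiosity, a sketch of how a client might harvest such head-level annotations - note that the "object." name prefix is my guess for illustration, not something from their spec:

// sketch: gather object-centric metadata from a page's META tags;
// the "object." prefix is a guessed convention, not the AB Meta spec
function readObjectMeta(doc) {
  var props = {}, metas = doc.getElementsByTagName('meta');
  for (var i = 0; i < metas.length; i++) {
    var name = metas[i].getAttribute('name');
    if (name && name.indexOf('object.') === 0) {
      props[name.substring(7)] = metas[i].getAttribute('content');
    }
  }
  return props;
}
// e.g. a meta tag named "object.title" would surface as props.title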

Whether AB Meta has a part in this for museums I can't say. It's certainly lightweight but whether it will be different enough from existing alternatives to persuade our sector to adopt it, I don't know. Perhaps the earth will yet move.

As a PS, I should add that I dropped them a line to ask about a detail (whether it would be possible to include more than one object in the head of a page) and the reply came from CEO Alex Iskold. I think that's pretty impressive: presumably he's a busy guy (and he writes a good blog post, too) and yet he took the time to reply to a pretty pedestrian inquiry.

Wednesday, April 16, 2008

That IE7 prompt issue...

So here's why: Working around IE7s prompt bug, er feature (includes a possible solution)
Damn their eyes!

EDIT:
There's an alternative, but similar, solution here: http://www.anyexample.com/webdev/javascript/ie7_javascript_prompt()_alternative.xml. Both solutions require a relatively extensive script and callback, so it may be best if I stick it all into an external JS file and embed and call this with the bookmarklet. However I also now know that it's a security setting, so I've fixed my own IE installation. If you want to do the same, you must enable "Allow websites to prompt for information using scripted windows". Seems to all work then.
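
In other words the bookmarklet itself shrinks to a loader, something like this sketch (spread over several lines for readability; the URL is a placeholder for wherever the workaround script would live):

// sketch: a loader bookmarklet that pulls in the bulky prompt
// workaround from an external file (URL is a placeholder)
javascript:(function(){
  var s = document.createElement('script');
  s.src = 'http://example.org/prompt-workaround.js';
  document.getElementsByTagName('head')[0].appendChild(s);
})();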

Bookmarklet update

OK, there are problems with the prompting bookmarklet in IE. It's all to do with the prompt. Yesterday it worked, but only on the first use; then it would stop prompting for the language-pair value (and in fact ignore the default value I'd put in there) and just skip straight to the translation page, which, without a language pair, can't do much. I think it may be to do with security, since it occasionally shows that "website trying to show active content" warning for a moment before scooting straight off to Google without a by-your-leave, let alone that prompt.

So for IE, for now, I'm just using a straightforward Italian-English bookmarklet.
For Mozilla, the prompting version is now also Google-based and goes there. Here it is: translate

Tuesday, April 15, 2008

A Babelfish bookmarklet

I've been longing for a way to do on-page translation - you know, highlight a bit of text and see its translation inline (dodgy though machine translation is). It's not a HUGE bother to go to Babelfish and do the job there but still just a bit too much of a bother. Today I wanted to see what a Portuguese blog was saying about us (beyond what I could hazard with my spotty knowledge of other Romance languages) so I thought, sod it, time to try and do this.

Well, there's no public API for Babelfish (at Google, Yahoo! or Altavista) as far as I can tell, so doing what I really want to do isn't going to be straightforward. Getting the text translated means receiving the results as a full HTML page, so embedding the translation alone will involve some screen-scraping. The next best thing would at least be to highlight some text and go straight to the translation, so I've made a bookmarklet for the job: trans PT_EN

If you want this, drag it to your Links bar in IE (or right-click, save to Favourites>>Links), and in Mozilla drag it to your Bookmarks toolbar (I may have remembered this wrong). By changing the language pair indicated at the end of the redirect URL you can modify this for lots of other languages (this one is pt_en i.e. Portuguese to English). Personally, I may not use it very often, I'll have to see. Let me know if it's any good for you. You'll currently need a different bookmarklet for each language pair, of course.
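
For anyone wondering what's inside such a thing: a bookmarklet is just a javascript: URL, roughly like this (spread over several lines for readability; the Babelfish URL and its lp/trtext parameters are from memory, so treat them as illustrative):

// roughly the innards of the pt_en bookmarklet (all one line in practice);
// the translation URL and parameter names are from memory, not gospel
javascript:(function(){
  var s = window.getSelection ? String(window.getSelection())
                              : document.selection.createRange().text; // old IE
  location.href = 'http://babelfish.altavista.com/tr?lp=pt_en&trtext='
    + encodeURIComponent(s);
})();

A prompting version (see the edit below) need only swap the hard-coded 'pt_en' for something like prompt('Language pair?', 'pt_en').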

I hope I can make some improvements. One would be letting the user set the language pair each time - perhaps with a prompt box. Perhaps the next thing would be to pass the translated page through a Yahoo! Pipe and scrape out the translation, to drop it straight on the page.

EDIT: Oh sod it, here's a version that lets you set the language pair in a prompt: translate

EDIT AGAIN: Just seen that Google's translate page offers something pretty much the same, dammit - though not my second option with the prompt. Perhaps I should mod their code, it will be better than mine...

KML goes open

ReadWriteWeb comments on Google's announcement that KML is being handed over to the Open Geospatial Consortium. As RWW says: "For something as boring and painful as it is - standards work is very sexy". This gives us all more confidence that it is a format that's going somewhere and should be reliable for a good while to come. No more hangups about being too tied to Google's proprietary format. Cool!

Friday, April 11, 2008

Standing back for a moment [cross-post from mymuseumoflondon.org.uk]

[cross-post from mymuseumoflondon.org.uk]

Hi, it’s the web-monkey again. Things have been pretty intense lately, due in large part to the end of the financial year and the need to wrap up all sorts of budgets.


My part in the various projects I work on ranges from major to peripheral - sometimes some serious programming, sometimes offering advice on commissioning, sometimes just doing a little tweaking ready for integrating someone else’s work. All the same I’ll flag up a couple of things I’ve been involved in lately, at least those that have now launched, even if I didn’t do that much myself - after all, where else do we sing about some of this stuff? Too often it ends up sort of dribbling out because we’re all too busy or exhausted to make a song and dance about it. So, here we go:

  • The Great Fire of London website, orientated at children of Key Stage 1 age (5-7) and their teachers. This is the result of a partnership between the Museum of London, National Portrait Gallery, The National Archives, London Metropolitan Archives, and London Fire Brigade Museum. It’s cool. Thanks to ON101 for building the game and designing the site, and our own Mariruth Leftwich for shepherding the whole thing. Also via Mariruth comes a game to complement our Digging Up the Romans learning resource.

  • At last we have sort of launched “The Database of 19th Century Photographers and Allied Trades in London: 1841-1901“. This is the electronic representation of the amazing work done by David Webb in cataloguing thousands of people in that industry in Victorian times. I built the database, hmm, several years ago for another partnership we’re in, but it was never launched for reasons that even now seem obscure. Anyway, it’s now live and, though it needs an overhaul even now, it’s great to think it may at last start being useful. I want to open the data up for mash-ups….when I get some time.

  • The Sainsbury Archive, a fantastic resource at the Museum in Docklands, has a new site through the efforts of archivist Clare Wood.

  • I can’t tell you about the work I’ve been doing on republishing an archaeological reference text, because it’s not ready yet. If you can find the test URL, well, you’re very sneaky.

  • Any day now we’ll see the launch of the “Family Favourites” pages on the Museum in Docklands website. Go and seek it out, there’s a fun game and an introduction to various highlights of the galleries there.

  • It’s just a promo site until the exhibition itself happens, but have a look at the Jack the Ripper pages. That’s gonna be well worth a visit - get yourself some tickets!

  • Geek stuff: some time ago I made a machine-friendly interface to look at the database of publications our archaeology service (MoLAS) produces. Whilst working towards the launch of http://www.museumoflondonarchaeology.org.uk/ I decided I wanted to change the architecture of the publications application, which for one thing makes it easy to drop little nuggets of info about our publications around the site, all fed from a database. The solution I went for also works for machine access by anyone, and I hope it will be just a start: we’d like to make our events available like this, and in time our collections. For the record, it’s basically REST/XML, drop us a line if you want to use it (though I imagine that it will be the collections and events that will have wider appeal - note that events already have an RSS feed, which is used on sites like docklands.co.uk).

  • And check out our events programme, I’ve just uploaded the May to August programme.


Now, what have I forgotten to mention?


Of course, there’s more in the pipeline, keep your eyes on all our sites!

Thursday, April 10, 2008

FlickrSLiDR

This is nicer than the badge:


Created with Admarket's flickrSLiDR.

Tuesday, April 08, 2008

Significant properties workshop - report

DCC/JISC significant properties workshop (British Library, 7/4/2008)

I'm not going to write up in detail all that was presented on Monday, but highlight a few things that seemed important to me, and work out a couple of thoughts/responses of my own. I haven't yet had a chance to read the papers that were sometimes referred to at the workshop (links to them here, some are huge!) so my questions may be answered there.
  • JISC’s INSPECT project, run by CeRch at KCL, has set a framework for identifying and assessing the value of significant properties (SPs), and the success of their preservation; and initiated several case studies looking at SPs in the context of sets of similar file formats (still images, moving images etc) and categories of digital object (including e-learning objects and software).
  • 5 broad SP “classes” (behaviour, appearance/rendering, content, context and structure) are identified by INSPECT. These don’t seem to include space to describe the “purpose” of a digital object (DO), unless this is somehow the combined result of all other SPs. But an objective such as “fun” or “communicates a KS2 concept effectively to the target audience” needs to be represented, especially for complex, service-level resources. Preserving behaviour or content but somehow failing to achieve the purpose would be to miss the point.
  • Something I’m still unclear on: is it that a range of SPs is identified, each of which can be given a value of significance for a given “medium” or format? Or is it that a set of SPs is identified for a format, and the value given according to each instance (or set of instances) submitted for preservation? In other words, is the judgement made of the significance of a property for a format/medium, or for a given preservation target?
  • Once identified, SPs provide a means for measuring the success of preservation of a file format (whether the preservation activities entail migration to or from that format, or emulation of systems that support it).
  • The two classes of object explored in the workshop (software and e-learning objects) are typically compound, and are much more variable than file formats. They will inherit some (potential) SPs from their components, but others (many behaviours, for example) may be implicit in the whole assemblage.
  • Andrew Wilson (keynote speaker, NAA) raised the importance of authenticity. The archivist’s perspective on this concept is not identical with that in museums, or that which I'm using in my research, but it’s useful nonetheless. I have, however, already discarded it as a significant property for most museum digital resources, with the exception of the special case of DRs held either as evidence or accessioned into collections. Archivists’ focus on informational value and “evidence” as the core measure of (and motivation for) authenticity isn’t always useful for DRs, but it is nice and clear-cut.
  • The software study drew out the differences between preservation for preservation’s sake – the museum collecting approach – and preservation for use, where the outputs are the ultimate measure of success. The SPs for these scenarios differ. This paper was very interesting, and perhaps (along with the Learning Objects paper) came closest to my own concerns, but the huge variety of material under the banner of “software” clearly makes it very difficult to characterise SPs. The result is that many of those identified look more like preservation challenges than SPs in themselves. Specifically, dependencies of various sorts might count as a significant property in a “pure preservation” scenario; but in most cases they are, more likely, simply a challenge to address in order to maintain significant properties of other sorts, such as functionality, rendering, and the accuracy of the outputs.
  • I suggested in Q&As that my reason for being interested in SPs probably differed from that of a DO-preserving project or organisation, although they have plenty in common. Andrew Wilson said that he saw the sort of preservation (sustaining value) that I was talking about as being the same as preserving in the archiving sense. I disagree, in part at least, because:
    • He made the case for authenticity. This doesn’t apply when one is using SPs to help plan for good management, where we just want to make sure that we’re making best use of our resources.
    • For me, SPs could prove an important approach for planning new resources, whilst for archives they are primarily for analysing what they’ve received and need to preserve (although they could in theory feed into future formats, or software purchasing decisions).
    • Whilst for preservation purposes it may often be necessary to decide at a batch or format level what SPs are highly valued and hence what efforts will be invested in their maintenance, for questions of managing complex resources for active use, case-by-case decisions (based on idiosyncratic SPs?) may be the norm.
    • For preservation, the “designated community” is essentially a presumptive audience, whose needs should be considered. For museums looking to maximise value from their resources, the SPs will reflect the needs of the museum itself (its business objectives and strategic aims), although ultimately various other audiences are the targets of these objectives. Perhaps there’s not so much difference here.
    • Fundamental to all these differences is the fact that for archives etc, the preservation operation in which they are engaged is the core activity of the organisation. In other situations, like planning for sustainability, it is not preservation of a digital object, but its continued utility in some form (any form), i.e. the continued release of value, that counts.

    These differences are largely of degree, but to me there is still a worthwhile distinction between preservation and sustainability. In a sense, preservation is the action and sustainability the continued ability to perform that action, so SPs are a way of reconciling preservation with the need for it to be sustainable. Perhaps the lack of a category that outlines the objectives, rather than the behaviour, of a digital object reflects this difference between preserving and sustaining.
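
That per-format versus per-instance question from earlier is easier to see in code. The sketch below is purely my own toy illustration, not INSPECT’s actual model: it just shows significance recorded once per format versus per preservation target.

```python
# My own toy sketch, not the INSPECT data model: two places a significance
# value could live - on the format-level property definition, or on each
# object actually submitted for preservation.
from dataclasses import dataclass, field

SP_CLASSES = ("behaviour", "appearance", "content", "context", "structure")
# Note there is no "purpose" class - the gap discussed above.

@dataclass
class SignificantProperty:
    name: str                      # e.g. "colour depth"
    sp_class: str                  # one of SP_CLASSES
    format_significance: int = 0   # option 1: judged once per format

@dataclass
class PreservationTarget:
    identifier: str
    fmt: str
    # option 2: significance judged per instance (or set of instances)
    instance_significance: dict[str, int] = field(default_factory=dict)

colour_depth = SignificantProperty("colour depth", "appearance", format_significance=8)
scan = PreservationTarget("object-001", "TIFF",
                          instance_significance={"colour depth": 10})
```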

Testing oneTag

Well, I'm not at MW2008, more's the pity, but I'd like to try out Mike Ellis's latest tomfoolery which is, as usual, a bloody good idea. OneTag lets you bring together all the stuff tagged with your choice of tag, from your choice of sources. It's in action on the MW2008 conference site so let's see if this post gets in there. First OneTag spam, anyone?
Cheers, Mike!

[edit] The answer to this is: it didn't work, and it didn't work, and it didn't work. So I decided to look at the Pipe, followed that lead to Technorati, and found that it hadn't updated my site's content since February. I pinged it and it's now listed, but because I have very little authority (a measly 3) it won't show up in the feed that's currently in the Pipe. Bummer. Still, at least I found out that Technorati had forgotten about me!

Thursday, March 27, 2008

Chris Rusbridge on significant properties

I've been a fan of Chris Rusbridge for a while now, increasingly so as I delved into the work he's been involved in previously, not least (as he mentions in his latest post) CEDARS. I was thinking along the same lines in terms of the need to identify what's important about a digital resource before you settle on a strategy for "sustaining" it (ideally before you build it, actually), and then I came across the work they did on "significant properties". Clearly the properties for the sorts of data and file-based assets that digital curators typically deal with (not to mention the context and purpose of their work) will often differ from the properties that a museum cherishes in its digital investments, but the framework that CEDARS developed is a great starting point for me.
CR has just blogged about this topic again, in the run-up to a JISC workshop that I'll unfortunately miss next week.

Wednesday, March 26, 2008

The Semantic Web now - Alex Iskold's latest great primer

Alex Iskold's latest guide to SW tech is great, his best yet. Really clear, with useful classifications of the kinds of technology and applications that we're starting to see. If you need a primer or an update, have a look.

Thursday, March 20, 2008

OT: awesome freestyling

Gollito and Paskowski in fullest effect. Up until recently I could imagine more ridiculous moves than were actually being pulled off by guys like this, but now, well, my imagination would be very stretched to exceed this!


Wednesday, March 19, 2008

What's new

Stuff seen:

SearchMe beta, a search engine which shows visual results (images of the web pages), categorised (as e.g. museum, art, shopping, fishing). Quite nice. Silverlight, I think. It's a bit SW (in its results clustering, for example), though how it goes about this I don't know; other "semantic" search stuff has shown up lately, too. TextWise (small "sw", I guess) has just been reviewed by TechCrunch, which was doubtless part of the point of offering a $1m prize for suggesting uses for its technology. Hakia is another such.

Stuff I've been doing the last week or two:

  • the Great Fire of London site for Key Stage 1 kids finally soft-launched.


  • working on templates for the Londinium site - the bulk of my time right now


  • preparing the digital republication of an out-of-print handbook for identifying Roman pottery fabrics. I've probably mentioned it before: it involved the export of Quark to PDF, the export of PDF to XML, translation via several XSLT steps and manual clean-up to TEI-Lite, and finally modification of some XSLT to display this as an HTML page (a sketch of chaining transforms like these follows this list). Most of this was a while ago; right now I'm getting ready for the images, which will need to be embedded once all the scanned thin sections are ready.


  • testing out and integrating Flash interactives with our CMS. Several are pretty much ready for launch, including two from the London Sugar and Slavery exhibition, and two games.


  • advising as best I can on the development of the replacement map interface for the LSS gallery


  • fretting over the re-branding exercise the MoL group is engaged in. How much work is it worth doing right now to fix issues on the sites if we'll be overhauling the whole thing in the autumn?

  • testing new search engine SearchMe (see above). Didn't get good results for "roman london" yet, but it's only indexed a billion pages or so....
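
As promised above, here’s what chaining XSLT steps like those in the pottery handbook item can look like in script form. A sketch under assumptions: the stylesheet and file names are invented, it relies on the third-party lxml library, and the real pipeline also involved manual clean-up between stages.

```python
# Sketch of chaining XSLT transforms with lxml; stylesheet and file names
# are invented, and the real pipeline included manual clean-up steps too.
from lxml import etree

STEPS = ["pdf-export-to-raw.xsl", "raw-to-tei.xsl", "tei-to-html.xsl"]  # hypothetical

doc = etree.parse("handbook.xml")  # hypothetical output of the PDF-to-XML export
for stylesheet in STEPS:
    transform = etree.XSLT(etree.parse(stylesheet))
    doc = transform(doc)

doc.write("handbook.html", pretty_print=True)
```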

One-stop shop for non-profits at Google

TechCrunch points out Google's new portal for non-profit organisations, which should at least make it simpler to sign up for the free versions of their services for museums et al. Off now to try it out...

Monday, March 17, 2008

A few more Paris notes and an update

A dull post. I listened to the recording of my talk in Paris and jotted down notes on a few things that came up in the discussion, so I thought I'd get them down here. Also, I updated the list of input parameters I put up before, and the version on Scribd.

Those extra points:

Geo search
There are geographical search (and geo-plus-time) projects going on in eContent Plus and IST, using co-ordinates, place names, changing boundaries etc. We would hope to incorporate these (possibly post-prototype). Everything in Europeana will be public domain (development-wise), so the software will be there for the taking (I hope I got that right!).
"Privileged" tags
We mooted the possibility of privileged tags, i.e. those produced by certain authorised users, perhaps agreed by certain groups. Tags created by these users (most likely content contributors) would be treated differently, so that we could pull out only certain items with a tag. But probably, rather than giving them some specific "privileged" status, we could achieve the same thing just by identifying them by contributor, user group or contributor type - something like the sketch below.
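
Entirely hypothetical structures, nothing agreed at the meeting, but this is the sort of thing I mean: filtering tagged items by contributor type rather than flagging the tags themselves as privileged.

```python
# Hypothetical sketch: filter tags by contributor type instead of storing
# a special "privileged" flag on the tag itself.
tags = [
    {"tag": "armada", "item": "obj-1", "contributor": "museum-x", "type": "content_contributor"},
    {"tag": "armada", "item": "obj-2", "contributor": "jo_public", "type": "user"},
]

def tagged_items(tag, contributor_types=None):
    """Items carrying a tag, optionally limited to certain contributor types."""
    return [t["item"] for t in tags
            if t["tag"] == tag
            and (contributor_types is None or t["type"] in contributor_types)]

print(tagged_items("armada"))                           # everyone's tags
print(tagged_items("armada", {"content_contributor"}))  # the "privileged" view
```
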
Stuff to clarify
  • Licensing data model and assumptions
  • Core common data
  • Where is the boundary between Europeana and the contributor sites? The maquette seemed to include considerable data, with the actual content displayed in-site for some types of asset (e.g. images), but others might be held off-site. What are the rules?
  • What needs to be added to the API to work well for libraries and archives?

Friday, March 14, 2008

Shock discovery: poor communication wastes resources

Yesterday I found out about some work that's being done on our behalf by the company that advertises our jobs. At the request of our HR people, they've almost finished building some custom pages to mimic the look of our group portal pages. Trouble is, the web techs (in this case, me) were never part of the discussions and I can't help but feel that we'd probably have got things done very differently if we had been. Certainly I have some concerns about what we'll get and I would have liked to explore other options. What are my concerns? They can mostly be expressed adequately with the word "sustainability":
  • Content maintenance. It mimics the look of our CMS pages, but the content isn't integrated with our CMS. Changes to site structure won't be reflected in the menus, nor will updated content.
  • Visual maintenance. The look of this site will change (we dearly hope) and I can't change their pages.
  • Google. I don't know how they look upon sites that look like copies of existing sites and point at their pages. I suspect it might look like spamming and I wouldn't want to be blacklisted.
  • Site stats. We can't (readily) integrate the job site's stats with ours (if we get them at all). Not a huge deal to me but a factor.
  • Cost. I don't know what this will have cost, but five minutes after getting hold of an RSS feed from their site I had integrated it into our own, replicating the most important part of what they'd done (a sketch of that sort of integration follows below). We could have done it cheaper, in short!
I don't know if, had we talked about this properly, we would have ended up doing something different. It depends upon what the important parts of the "site" are, and on what the job site can offer beyond a pretty sparse RSS feed, but I think we could have negated the need for at least some of what they did. There's meant to be a process we follow for every single new media project, so that it passes by the right eyes and lets us make any recommendations. This time that broke. HR did speak to people on our team, but the plans didn't reach me (at least, not that I recall), and I think this has something to do with a communication failure about just what the plans involved. I may not have made clear the sorts of things we are able to do in terms of integrating third-party sites (or at least what we'd be up for trying), or perhaps the scope of HR's plans wasn't really clear to start with. Or perhaps I just need to make it clear that every single piece of third-party work, no matter how small it seems, must go through me. One way or another, our communications have been lacking and it looks like we've ended up doing things the wrong way.
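
For what it's worth, that five-minute RSS job looks roughly like the sketch below. The feed URL is a placeholder, it assumes the third-party feedparser library, and it isn't our actual code - just the shape of the task.

```python
# Rough sketch of pulling a jobs RSS feed into our own pages. The feed URL
# is a placeholder and this isn't our real CMS code; it assumes the
# third-party feedparser library (pip install feedparser).
import feedparser

feed = feedparser.parse("http://jobs.example.com/museum-of-london.rss")
for entry in feed.entries[:10]:
    print(f'<li><a href="{entry.link}">{entry.title}</a></li>')
```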

Thursday, March 13, 2008

Yahoo semanticises(?) business

Sorry, that's a lame title. "Means business", I mean (sic). Anyway, exciting reports of their plans for using various forms of structured content that are out there, including key microformats, eRDF and RDFa, Dublin Core(!). There'll be a developer platform, too, and the ability to "create mods for Yahoo search that leverage their semantic data". This sounds like more than Google Base or something; this is very cool, and I wonder if it might mean that using custom POSH will also work.
Hmm, exciting!
[edit] see this too

The PSP is dead in the water

The Reg reports that Ofcom's mooted Public Service Publisher idea is now dead. They loathed it, but from our point of view in museums it had potential, or at least we could imagine a potential role for ourselves. There may be other ways to get to the same thing, although to be honest as a sector we're so busy trying to do the basics that I rather doubt most of us would find the time to develop any ideas to bid for PSP cash. So, goodbye to all that.

Tuesday, March 11, 2008

Here's that EDLNet presentation and notes

Yesterday I put the EDLNet Paris slideshow onto Slideshare, but since Scribd also lets me put up other stuff I'm putting that and the notes there too. If you can't see these coz I've screwed up the Scribd embed link or something, go here for the presentation and here for the notes.


EDLNet Paris presentation:

Presentation notes:

Read this doc on Scribd: EDL WP3 Presentation Paris 20080304 v2


And just in case...


The previous post with options for API input parameters is also on Scribd (UPDATED 17/3/2008)

Monday, March 10, 2008

Paris presentation online

http://www.slideshare.net/guest3fb875/europeana-wp3-api-presentation-paris-432008/

New world speed-sailing record

Antoine Albeau, who has had most other windsurfing titles in his time, now has not only the windsurfing speed record but the outright record for all sail-powered watercraft (at least, the record most of us count: over a fixed 500m course). Rock on! He broke the record with a shade over 49 knots (beating Finian Maynard's previous mark by 0.4kts) on March 5th in the mistral blowing at Saintes-Maries-de-la-Mer, southern France. Video here (I'm guessing he's the first rider in the clip; hang on for some vicious wipeouts too).
Fingers crossed they might even top that at Southend today. Go Dave White!

Saturday, March 08, 2008

Public API inputs

Public API inputs and outputs

[edited 17/3/2008]

We discussed at the Paris meeting the range of parameters that we thought an API might need to handle to perform the sort of (public-facing) tasks we envisaged. We didn't actually talk about output, except in regard to the ability to specify return fields, but I think that this is actually much the simpler part to work out. I've reworked our discussion, added a few bits of my own (including the UGC bit), and split it into sections relating to general parameters, filters for collections queries, and UGC. No doubt lots more clarification and revision are needed, and I'm pretty unclear on some bits myself, but it's something!

Input parameters

The “profile” includes various elements defining the operation in terms of function, languages, values and format of returned data, etc. Collections data requests will be required for some functions, and consist of various filters. The third table relates to operations on user-generated content, including adding, editing and getting (by user or group). We may decide that some operations are only open to specific users or categories of user; for example, accessing UGC of some categories might only be possible for the owner of that UGC (via their associated API key) or the owner of the collections related to that UGC. TBC!

Query profile (data access and data addition/editing functions)

Parameter | Access or edit | Example values, notes
Function [required] | A | search, compare, translate, add, update
Return format [required] | A, E | DC-XML, RSS, geoRSS, CDWALite, JSON, CSV. This might instead be implicit in the target URL.
Return fields | A | Array of field names, but a default set would perhaps include GUID, title, thumbnail, short description, owner, owner type, media. Might also provide shortcuts to preset field groups. Will vary according to target entities.
Search data | A | Formal metadata; all data; expert and user tags; user tags only; “expert” tags only; specific user/expert/group tags.
Expanded terms | A | True, false [use/don’t use thesauri etc.]
Requesting language | A, E | EN, FR etc.
Return language | A, E | As above. If only one is present, presume the same.
Key [required] | A, E | API user key
User ID [required for some operations] | A, E | For the end user. Required for accessing/modifying data attached to specific users or groups. Presumably we need to authenticate and authorise in some way, too, for some operations.
Rights/licence | A, E | Perhaps multi-value, specifying rights/licensing parameters. Likely to be more complex than one field!

Collection data filters (access only)

Parameter | Example values, notes
Target entities | Objects, people, places, subjects [if we are enabling anything more than objects]
GUID | A unique identifier given to every record in Europeana
Set ID | ID for a set of entities, which may require the appropriate key, depending upon privacy settings for that set.
Keyword | Tricycle, ww2, treaty, Anne Briggs, documentary [multi-field search].
Structured data: name | Name of object, person or place. If these use different fields, the right one should be inferred from the target entity. Examples: photograph; sunflowers; Forlì (or Forli); Max Brod (or M Brod); Rockall.
Structured data: date [point, range, older, younger] | 14 July 1792; 19th century. “Older than 1850” might be expressed as “-20000000 – 1850”; “younger than” as “1850-2050”; uncertainty like “1850 +- 5” as a range, “1845-1855”, though this isn’t perfect.
Structured data: related person | For returning objects, people and places
Structured data: related place | For returning objects, people and places. See also “geographical” below.
Subject |
Original language | Of the object, principally; for documents (if this data is well expressed)
Originating institution | Good structured data, ideally (we may require an ID), but we could permit a string search across the relevant field.
Originating institution country | For searching by current location
Originating institution type | Museum, library, archive, A/V archive
Location | Including sub-parameters for grid reference, coordinates, place name, and the location of concern (e.g. place of creation, place of publication, location of subject matter)
Sorting | Keyword occurrence, date precision, location, location relative to user, institution type – perhaps sorting partly inferred from the fields used in search, but if these are mixed (e.g. date and place plus keyword) we need to sort on one before the other.
Media | Text, audio, image, video, or more specifically PDF, WAV, MPEG etc.
Format [item type] | Map, book, video perhaps. Is this data held in a structured way, and is it distinct from the media metadata?

UGC operations (add, edit, view)

These operations will need user ID (or group ID) plus authentication and authorisation for certain operations (but not for viewing public data).

Parameter | Access or edit | Example values, notes
tag | A, E | For modifying a tag (i.e. deleting it) or viewing associated items
note | A, E | For modifying or viewing
UGC contributor | A, E | Perhaps multiple values, including groups, so we can look for stuff with a given tag but only when tagged by a certain set of UGC contributors
UGC contributor type | A, E | Content contributor vs. other user
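
To make the tables above concrete, here is a minimal sketch of how a client might assemble a call combining the query profile with some collection filters. Everything is hypothetical - there is no Europeana API yet, so the endpoint and parameter names are invented purely to illustrate the structure.

```python
# Hypothetical sketch only: there is no Europeana API yet, so the endpoint
# and parameter names below are invented to illustrate the split between
# the query profile and the collection-data filters tabled above.
from urllib.parse import urlencode

BASE = "http://api.europeana.example/"  # invented endpoint

def build_call(profile, filters):
    """Combine a query profile with collection-data filters into a URL."""
    params = {**profile, **filters}
    function = params.pop("function")  # the function names the operation
    return BASE + function + "?" + urlencode(params)

url = build_call(
    profile={
        "function": "search",              # 'Function [required]'
        "format": "RSS",                   # 'Return format [required]'
        "fields": "GUID,title,thumbnail",  # 'Return fields'
        "expand": "true",                  # 'Expanded terms' (use thesauri)
        "lang": "EN",                      # 'Requesting language'
        "key": "MY-API-KEY",               # 'Key [required]'
    },
    filters={
        "entity": "objects",               # 'Target entities'
        "keyword": "tricycle",             # 'Keyword'
        "date": "1845-1855",               # 'Structured data: date' as a range
        "inst_type": "museum",             # 'Originating institution type'
        "media": "image",                  # 'Media'
    },
)
print(url)
```

The same shape would serve the UGC operations, with the function switched to add or update and the user ID and authentication fields added to the profile.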


Friday, March 07, 2008

EDL WP3 Paris meeting

So, Friday evening, perhaps a few moments to write up (some of) the Paris meeting. I did mean to attach my presentation too but the last version is on a memory stick at home. Too damn portable, never where you need it!
It was a successful meeting, I would say, and it was a pleasure to see a couple of faces I already knew and to meet others for the first time. On Monday (with me still reeling from a 4am start) we were taken through the results of the user testing. These were overwhelmingly positive, which needs to be taken with caution given the guided nature of the demo (especially with the online questionnaire, but also perhaps the expert users and the focus groups). All the same, there were criticisms that provided something to get our teeth into, particularly around the home page and the purpose of the "who and what" tab. Search result ordering was an issue, a particularly thorny one in fact, which we tackled on Tuesday as best we could. Clearly a lot of users don't really understand tagging, though they thought they liked it. Other plusses were for the timeline and map.
There was a good session with representatives of a French organisation for the blind and visually disabled after lunch (a bloomin' good lunch, in fact. Good wine, too. I love France!). Aside from HTML accessibility they talked extensively about Daisy, and it would be marvellous if some of the text content that may end up there could be daisified. No-one had heard of TEI (or DocBook) but it struck me that these formats are pretty close to what Daisy sounds like, and that there may be TEI material amongst the content we'll be aggregating, so translations to Daisy could be relatively straightforward. Anyone know?
Personalisation took us to the end of the day and we distinguished between activities done for private purposes (though perhaps with public benefits) like bookmarking with tagging, or tailoring search preferences, setting up alerts, or saving searches; and explicitly public activities like enriching content, suggesting and tagging (when not bookmarking). The question of downloads (what? how? assets or data?) and the related issue of licensing came up. I think we worked out that possibly four levels of privacy would be useful, extending the way Flickr and other sites work, with private, public, friends/family, and "share with institutions". The latter is really about saying, I will let me and my mates and the organisation whose objects I'm tagging/annotating look at the data, but not everyone. I think it's important and should be encouraged, as it lets those institutions do interesting stuff with the resulting UGC for everyone's benefit. We ran over into the next day to deal with communities (still plenty to think about there, I would say) and results display, a practical and useful discussion that touched on the fields that might be searched across and how they would be used in ranking.
Finally my bit came up. Although Fleur had suggested that I talk for maybe 15-20 minutes to kick off discussions on the API, I, feeling unsure of my ground, prepared pretty thoroughly, with the result that I had material that kept me talking for an hour or more, I think, albeit with some digressions for debating what I was saying. On the whole it went down quite well, I think, but I learned a bit about what I should have added (proper, simple explanations of APIs, and more examples of how they're used) and what I should have left out (a section where, for the sake of completeness, I referred to the management of collection data, which is not part of the public API anyway and is outside the scope of our WP; this led to a digression that was, I think, still useful, but not to the topic of that moment). And then, seeding the discussion with a use case related to VLEs, we tried to figure out in more detail what functions and parameters would be needed in an API call, and what would be returned. And that, my friends, I will write up shortly, for now I want my dinner. Home calls.

Thursday, March 06, 2008

MS new stuff: IE8 and super-cool image zooming in Silverlight

IE8 beta released, with some interesting developments (the microformats list is debating the rights and wrongs of "web slices", which are based on hAtom and are intended to let you highlight part of a page to treat as a feed).
Also from MS, SeaDragon (see also Photosynth) in Silverlight 2. More than a bit useful for us cultural heritage types. [edit] And I should add that the video on that TechCrunch page is of a cultural heritage application - Hard Rock Cafe's memorabilia application, which was the demo shown at MIX08. They talk about the role of imaging for authentication and for bringing objects to life, and though it's obviously a business, their business is really not so far from ours (albeit for profit).

Sunday, March 02, 2008

Hey MCG Listers!

Thank you for visiting. I see that some 30-odd listers have had a peek at my summary of the recent EDL/API thread - I hope it was worth the trip to this blog, and I'm really pleased that the thread must have piqued the interest of a fair few people, still more than actively participated in it (and lurking is just fine by me; I lurk on many a list). Anyway, I switched on comments after posting (though I don't recall ever turning them off), but it turns out that this isn't retrospective, so if you wanted to say anything in response to the EDL/API post you couldn't. Hence this one - you can stick any responses to that API stuff here if you like. No pressure, though!

Cheers, Jeremy