About Me

Web person at the Imperial War Museum, just completed a PhD on digital sustainability in museums (the original motivation for this blog was as my research diary). I post occasionally, usually museum tech stuff, but I'm prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Wednesday, December 02, 2009

UKMW2009

Yesterday the V&A in London hosted first the UK Museums on the Web 2009 conference, run by the Museums Computer Group (MCG, spanking new website here), and then the Jodi Awards, both of which I attended and both of which I found immensely stimulating. As always, it was also fantastic to catch up with various peers, some of them old friends (or indeed supervisors!), others people I knew only from Twitter or their blog, or not at all. I only wish we'd had a week to spend so that all the discussions I found myself in could have flourished fully, but that's the nature of events like this. Hopefully those nascent conversations started with @janetedavis, @psychemedia, Dave Patten, @gsturtridge, Carl Hogsden and so many others will continue elsewhere, soon.
UKMW09 had a very lively back-channel on Twitter. Check out the tag #ukmw09 to see what I mean. I had enough of a job keeping up with what the people I follow were tweeting, but I dropped in every now and then to see what others were saying and would find 60, 80, 100 new tweets: too much to follow and still pay attention, so I may do some catching up today. Some papers really seemed to get people excited, though, notably Paul Golding's, and perhaps this showed how well the organising committee had identified what MCG people needed to hear about.
In the middle of the conference there was a break for the MCG AGM. The Group has had a major constitutional overhaul and the changes implied by this are only beginning to be evident, starting with that website and new logo. A very exciting research partnership is in the offing, and together with membership rules that now include mail list members as formal members of MCG, plans for sandboxes and evolving relationships with strategic bodies, it looks like the MCG is getting a real shot in the arm, thanks to an imaginative committee and valuable facilitation by Flow Associates.
This stuff is the notes I took down live at #UKMW09. I'm not planning to edit it* - that would only slow it down so much that I'd never publish this - so naturally you'll find it somewhat impenetrable, full of bits that amount to straight transcription coz you don't know where the speaker's leading you, and other parts lacking enough context to make any sense. But hey, if you find anything of interest I'd urge you to look deeper into that speaker's work, because all of these papers were way more fascinating than my notes could ever convey - even if I did edit them!
* but you never know.
Enough preamble and caveats, here you go. Oh, and any stuff in [square brackets] is one of my own interjections [or is it Someone Else?]

UKMW 2009
Ross Parry, chair
[...basically I missed Ross pt. 1. Shame, he sets this stuff up so well but if you've heard him before you'll know this.]
Mike Ellis
Today's experiment: QR codes on delegate badges so we can stalk each other. Will get e-mailed vCard of whoever we scan (or key the number in for). Done through Mike's onetag.org service

Session 1: Social (Bridget McKenzie, chair)
BMcK intro: Social is deep: connected with museological issues of contested ownership, authority etc. Web can power a civic society.
Matthew Cock (British Museum) & Andrew Caspari (BBC)
A History of the World partnership (AHOW) BM+BBC, 2010-2012 and beyond. 100-episode Radio 4 series, involving 350 museums, local radio stations, website, kids' programmes, plus 100 objects from BM. Radio rather than TV for speed: story rather than visual focus. MC: opportunity for a social site and engagement.
AC: 100 BM objects woven into a history of the past 2 million years by Neil MacGregor on R4, 13 then done for CBBC. 600 objects from round the country telling regions' relevance to story of UK & world. Beyond that, UGC: public invited to upload their objects to weave into the story. World Service will overlap with this. Hope to encourage conversation off AHOW, i.e. in Twitter etc.
Forcing partnerships, encouraging wide participation, building new audiences for digital, museums and history. Pan-platform. "Permanent" collection [very interesting to see how this will work]. Each object has own page, journeys through geography and time via objects.
MC: priority for BM object pages is to get people to listen again to radio show. There will be video, 3D for some, related objects, other contributions, (requested) comments from others as well as open for public comments and, for limited time, questions.
Other museums and the public can tag their objects in the same way as the BM has done for findability, via a simple uploader with variable levels of detail. This is open to people worldwide, and moderated too.
Launch January.
Qs
BMcK: will the collections gathered through this feed into Culture Grid? MC: not decided, governed by BBC T&Cs at present.
Denise Drake (Tower Hamlets Summer University): Staying social online
Small independent charity. Free summer courses to all young people (11-25) in TH. Actually year round. 26 staff. Have helped set up similar summer uni in every London boro, coordinated but independent, 50k places in all.
2 websites, active on 13 social network sites/accounts. Bursaries for film/photo projex, blogs for these.
Asked for a vote for an award, got some strong negative responses but regarded as an opportunity to react positively, quickly. -ve comments left visible. Tries not to do social stuff out of office hours, partly for protection. Child protection issues: don't make "friends" of u-18s; be careful with images (make them small, no name)
Nadia Arbach (now @ V&A)
Wikipedia Loves Art campaign, a way of generating images for Wikipedia, Feb '09. BMA led 16 museums, V&A only UK participant. Next year V&A will lead a proper UK project, Britain Loves Wikipedia, and they're looking for museums that want to take part. [see also Nick Poole's blog here]. BMA encouraged their users to join in, V&A targeted the "London" Flickr group. Each museum had own guidelines and routes to participation, interesting use of existing networks.
Hosted a special day blitzing the museum (though could do any time in Feb). Competitive in terms of numbers uploaded by inds/teams on that day.
Official WLA photopool, museums checking correct data attached to photos. Process changing next time so that quality is prioritised over quantity - an uploader?
Museums have had people asking them to photograph particular artefacts for them via WLA, or to add images to Flickr groups.
CC licence required for all contributions.
Session Q&A
Q from Jude Habib: how did BBC engage local museums? A: Local radio has a buddy in each station for museums to contact.
Ruth Harper: [my summary: sounds like C24 want in on AHOW]
Q from Mike Ellis: should we build it or should we use what's there? MC: never considered using e.g. Wikipedia because they had the BBC platform.
From me: the "permanent" collection? MC/AC: no plan yet, too busy getting this ready to support the Cultural Olympiad etc. though definitely intend to find a way to sustain this, perhaps through integration with the offer in the BBC History site and the like.

Session 2: Situational (Loic Tallon, chair)
Clients asking how to create a web-like experience in the museum - is that confusing? Mobile growing, also calls for another experience.
Paul Golding (wirelesswanders.com): Situational web
How you give an experience based on where you are, overview of technologies.
Cells have IDs we use to tell roughly where phones are. Public info which e.g. Google can use. Location gateways queried for this, with estimates of uncertainty. "Self location" UI important coz errors can be big, want to override. Dense urban areas 300m, semi-urban 600-1200m, rural up to 10km.
GPS way better. Devices like iPhone offer this plus cell and wifi fallbacks. Getting location info as a programmer can be done on handset (note: location API in HTML5 JS). Or can ask phone servers for location; then use Fire Eagle, GMaps etc.
Proximity services: RFID and the like; barcodes and QRs; Bluetooth/WiFi/ZigBee; visual recognition. WiFi works where the network has been mapped and is quite dense e.g. in a warehouse. Visual recognition i.e. cameras recognising images - good for museums?
AR apps like Layar, Wikitude, Junaio. Markup not necessary. Could create a "digital fingerprint" for artwork and connect to information record. Camera as "third eye". "Disintermediation" of exhibitor/producer's presentation i.e. can bypass the information you're presented with directly in the space.
VWs: massive growth in the tweenies market, our market of tomorrow.
"Conversation via place": e.g. flook.it. "Leave" a note in a space to be picked up by someone moving into that space.
Trends and predictions: 80% penetration of smartphones by 2015, ready for mass consumption. HTML5 browsers on them. AR will be done via these rather than specific gadgets. Location key "web 3" enabler. Indoor location sensitivity/AR popular for enhancing events, shopping etc by '13. VWs will be the popular UI metaphor for some handsets.
Andy Ramsden (University of Bath - UB): QR codes
UB project: what does QR offer to learning opportunities? What do they offer museums, are they a fad?
QRs connect something physical to something electronic. Souped-up barcode readable by phone. Alternatives coming tho using similar principles. Require an activity/task suited to a small screen device. Will cost users if they aren't connecting thru WiFi. When you read a QR, a URL is decoded, you decide if you want to follow this i.e. perform the action.
Creating a QR code: can do on the Bath Uni site: http://www.bath.ac.uk/barcodes.
Thinking about how QR codes could be used in a more social constructivist approach to learning. Tie to a blog. QRs in library for adding books to your reading list. Subscribing to RSS feed: loads easier than plugging long URLs into phone.
Connecting phys to virtual learning materials, because these can be dislocated rather than obliging learner to perform learning task at a PC.
Students becoming much more aware of QRs, 10% have now used them (UB survey)
Mike Ellis: how's this work in museums?
Convergence of technologies finally making lots of things possible, at last. Networks getting cheaper and mass-market. Data the norm with mobile devices now. Computing power increasing and APIs flourishing, copyright barriers lowering/easy licensing too. And Google. Highly available services.
"Vastpoint sensing" using massive contribution of content esp location-based. With many consumers' devices as sensors, masses of data can be added in realtime to e.g. GMaps.
Predictions: city-wide wireless networks; increased understanding of the tech into our psyche; less geek, more invisible.
Session Q&As
ME: cost shouldn't be a huge barrier, it's dropping
Linda Ellis: Where should people put their stuff to get it picked up by these services? Paul: wherever there's a public API e.g. Flickr. Mike: where the people are.
Joe Cutting: how have things changed in terms of contract/PAYG phones, replacement of handsets? Paul: contract phones replaced on avg every 18 months, but 80% estimate by 2015 for smartphones.
Gail Durbin: ideas for how to use the QR codes they already have on their object labels. Andy: treasure hunt activities tied to OPAC (UBath library expt). Paul: run a competition to answer this question!

Open Mic session
Linda Spurdle (Birmingham Museum and Art Gallery): Pre-Raphaelite Online Resource
BMAG-built, JISC and MLA funded. Targeted at HE students and researchers. Audience research indicated low demand for social features, commenting etc. Not about fun! "Wikirage" of lecturers a deterrent to making it social. Instead put objects at the heart with zoomable images (via a moderately controversial but beautifully effective Silverlight interface)
Julian ...(Manchester Art Gallery): QR codes in practice
Revealing Histories display hooked into website, ability to submit content.
Current trial: Manchester public sculptures interpreted via QR codes. There's already a GMap of where the sculptures are. Should they try RFID? Small redesign of pages for mobile leads to questions over main collections pages, which aren't mobile friendly. [if end-points were, say, Wikipedia pages, might it be easier to pull into Layar etc?]
Tim Boundy, JANET UK: Use JANET for video-conferencing! Please!
JANET hooks up museums with schools for VC [including the Museum of London]. Both the infrastructure and the booking system made super-easy; number of registered schools growing rapidly (4k now). You need the hardware, JANET can provide the software.
Andrew @ V&A: blogs
Audience, contributors and plan all needed! Knowing limits of your technology and agreeing a schedule. The plan: if for an exhibition, start well before launch!
Shona (Museum of Hartlepool): Articus
Educational resource aiming to increase footfall in the galleries (specifically, booked school groups), aimed at children and teachers. Various activities include creating art offline and uploading, curating own galleries, gathering images to use on IWBs. Launched February but less uptake than expected - is this because of registration requirement?
[They really want school groups to book visits following from this. I wonder, how is the site positioned to tempt people in rather than do it all online? Are they not encouraged to submit, and to place value in the statistics of the site's usage in itself? It seems a bit C20th to value only the visits to the physical museum that it may encourage. It's treating it a bit like fancy brochure-ware, when really what it is, is a valuable service for schools in its own right that simply needs them to find a workable way of measuring its impact in terms of MH's mission. As for why they're losing visitors, what do the stats say on bounce rates - is there real evidence there that registration is off-putting? Tweet @snowflakeshona with your suggestions]

Next, Ross does some defragging.

Keynote - Richard Morgan (V&A): Making the digital museum relevant in people's everyday lives
[the following, in retrospect, makes not a lot of sense, which is my fault, not Richard's! My brain was failing.]
What's people's daily experience of museums? Are they as likely to see the commercial arm as anything else?
Making money is important; our digital presence is scattered, how do we make sense of it for people? Will SW do anything for this?
The Q: "maverick activity" that leads to all this, fitted into the corporate narrative (and defragmenting that corporate narrative too).
Capture your data, tho this requires a leap of faith. V&A's new Search The Collections has got 1M records out there, from a low base. Put focus on relationships between records, to be articulated in UI and underneath, to encourage reuse (API).
Interfaces: browsing; visualising through mapping
Browsing continuous variables and topologies? FABRIC, TSB-funded project about content-based image retrieval, variables including colour, "texture" (shape and angles, really).
World Beach Project: bringing world into collections.
V&A Wedding fashion site: data includes where clothes bought, an implicit link.
Lots of curatorial stuff and lots of UGC stuff now. How to join it up with real semantic connections?
Collections of photos/paintings of localities, being places of significance, have a fair chance of being in e.g. Flickr too (and tagged). So can we dynamically hook our content into this?
Museums good at finding strong niches we can build networks around. We can delegate to these what the museum cannot provide. Might this even be a way to make cash, selling services around, say, weddings or fashion?
Moving from niches to "web intelligence and insight", looking for stuff that's not so obvious, the signal in the trends. We do have lots of data after all. Can also identify weaker signals; can we even anticipate trends? Helps us argue to funders about the value we (could) give.
Finding the stories in the data is one of the things a technologist in a museum should be doing. [a key point slipped in right at the end there! I'm really interested in where we might find the boundaries of our work moving, or where we might wish to expand them, as we find that digital media peeps have maybe accidentally found themselves to be information curators, analysts, interpreters, and disseminators. There are professionals in museums trained exactly for some of this, but if the webby people know how to hook it all together, plumb in the visualisation tools and the metadata enrichment tools and so on, are we gradually moving onto that turf too? The role of the information or computing professional is always evolving, whether in large or small organisations; I think it's good sometimes to reflect on where it's evolving to]

Session 3 (Mia Ridge, chair): Sensory
Joe Cutting: Telling stories with games
Company of Merchant Adventurers of York: good-looking trading game [did they refactor the code from Elite?] Instructional and fun, that's what it's all about. So what is a game anyway? JC's practical proposition: given goals in a situation, make choices, get feedback, make more choices i.e. back to step 2. Learning through iteration. "Active prolonged engagement", a term coined at the Exploratorium. [think I ballsed up the definition a bit here, sorry]
Need enough info to make a good choice. Success or failure must be gradual.
Game models: lots of console games played obsessively by a small audience. Not really what we're after. Arcade games better. Web-based MMORPGs better still, but 3D makes things harder, not easier.
Anne Kahr-Hojland (DREAM in Denmark): Ego-trap
Ego-trap comes out of her PhD work. Visitors guided by mobiles through exhibition at Experimentarium. 2 narrative layers, three levels. Personality test, questions from a woman who has called you; a level at which suspicion is aroused by another person contacting you; ...
AR gameplay, digital narrative determined by physical setting.
Targeting secondary students. Objectives: to stimulate interest in science, improve learning in that setting by prompting reflection. Reflection prompted by predictions and evaluations, narrative structure, discussion with others. Works well as an exhibition guide; high levels of engagement and recall of exhibitions after play.
The meta-narrative gets less commitment than the personality test; is it coz they know what to do in context of a test? Does this interfere with the critical reflection aspect?
[once again, I couldn't really keep up with the ideas properly whilst tweeting and writing this and this definitely shows! There's a lesson there, but I'm too preoccupied with the personality test to realise it]
Victoria Tillotson (iShed, c/o Watershed Media Centre)
Project to bring together practitioners, researchers and users in immersive experiences. "A space for risk", inspire innovation and share ideas, create market place.
Includes artists, creative industry, IT co's, community etc.
mscapers.com: software to create location-based mobile games, which will be hosted on the mscapers site. Cool!
HP facilitate annual festival: mScape Fest.
mScape only works on iPaqs. Oh. But work underway to port to Android and iPhone. Uses GPS so currently only outdoors, indoors version coming[?], also downloadable versions of games.
Pervasive Media Studio: pmstudio.co.uk will be home to cool stuff in due course...
Pervasive gaming gradually spreading in pockets. Face to face, on the streets, on the 'net, without technology. All sorts of genres and timescales.
Simongames.co.uk: game built around location of a Romany caravan, parked around London. Done for Soho Theatre. Interaction between public and travellers in the caravan. [not too sure what the game part was tho, same old story: too distracted. Oops]
Duncan Speakman: sound to navigate public spaces. "subtle mobs" of people gathering to listen to a set of instructions and act on them. "As if it were the last time": a subtle mob in Bristol, coming to London soon. See http://youtube.com/watch?v=FY6S4GkCZ9c

Final session (Marcus Weisen, chair): Accessible digital museums
Lots of physical exhibitions are still failing on accessibility for people with physical or sensory impairments.
Helen Petrie & Christopher Power: Accessible digital culture
Trying to make this an interesting challenge rather than a burden often tackled as an add-on must-do at the end.
The digital past was in large part about websites. There was a burst of interest (following WCAG) with the DRC investigation into web accessibility, eGov targets for govt websites, Culture Online funding dependent on accessibility, MLA audit. Then the govt unit closed down, sites got more complex - moving target - DRC merged into EHRC, no legal cases brought against failing orgs. EC push, but have the targets been set too high? "Is it too hard to implement accessibility in digital culture and related areas?". UK gov now has a Digital Inclusion Champion in Martha Lane Fox. New EC initiatives will target culture more.
Christopher moves us on to the present: WCAG has aged poorly; technology, interaction and user changes; new WCAG out last year tho not much fanfare. Few tools to address WCAG2 conformance. Is it harder to understand than v1? Guidelines are mainly tech-independent and so "future-proofed", grouped into 4 principles: perceivable, operable, understandable, robust. It's about "transmitting meaning" after all.
Success criteria don't tell you how to test your technology, perhaps this just makes it trickier - you have to go to "techniques" section, dealing with implementation. These only deal with W3C technologies. Often not evidence-based but value-based.
Outcome: WCAG has moved the responsibility for developing the correct tests onto the development community.
The future: interacting with content, focusing on users' objectives not the technology. Communicating meaning is the point. About content, not delivery, whether digital or not. Personalisation of content to match user preferences. How do we measure experience and communication of meaning? [the self same problem we have for assessing effectiveness of our digital media even if we were oblivious to accessibility questions]
Contract soon with EC to investigate this problem in the digital culture arena.

Jodi Awards
I didn't take any notes during the awards, and only part way through did I start to tweet coz I couldn't see anyone else doing it (until I looked at my twitter stream!) It was the first time I'd been along to these awards, perhaps because it was also the first time we'd been up for anything and whilst I can't claim any credit at all for that (or for MOL ultimately winning the award we were nominated for) it felt like I had some small justification for attending what was an over-subscribed event.
I found it all pretty moving. The last year has brought disability closer to home, well, into the home for me and I'm some way through a process where the sort of knowledge and values about disability that I've always known in a sort of factual "of course that's the right way to think" kind of way, are turning into the sort of knowledge that is more internalised, that is felt and truly believed, not simply accepted. There's a difference between that knowing and believing. I guess it's called awakening. Anyway, attending the awards was a privilege and an inspiration and I want now to make sure that I embed a righter way of thinking into my work, rather than doing the things I know (or am told) need to be done and assuming that is sufficient.
I was struck in Helen and Chris's talk before the awards (enough to tweet it in very compressed form) that they had identified the communication of meaning as the core of what accessibility should mean, and that this resonated so well with the way I've been thinking about sustaining/sustainability: the purpose of sustaining is not to continue with the thing the way it is, but with it serving what it's for (even this might change, in fact), and a radical change in form might be the best way for something to carry on serving the same purpose. I like this synergy very much, this fact that perhaps we can focus on what a resource is for and serve both sustainability and accessibility ends.
You can read who was nominated, who was commended and who won here, for this and previous years. What you probably can't read, or relive in any way, was the wonderful Skype moment we had with the Karlovy Vary recipients of the jointly-awarded International Award, who couldn't hear us at all. Turns out that the very excellent Matthew Cock was using the wrong mic. Thank you Matthew for the ensuing comedy on a heartwarming but quite serious evening. Thank you too to Martha Lane Fox, who announced the awards and took us through a bit of her own journey. And finally humble congratulations to all the winners and nominees on that list, not least our own Lucie Fitton and Jude Habib (not our own) who deservedly won the Digital Access Online award.

Friday, October 02, 2009

Unbuggering SQL Server - xpstar.dll fix (Feb '10 edit)

I'd spent lots of time googling and going down various blind alleys trying to fix an error on our live site servers, where trying to run a DTS, looking at user properties and various other things all threw up a pretty uninformative error related to being unable to find xpstar.dll. Having sorted it I thought I'd post what worked for me in case it saves anyone else from too much time-wasting.

The root of the problem was the virus and trojan s***storm that hit us a few weeks back, but once that was remedied the SQL Server 2000 issues remained.
Lesson one
For a while I was looking on the wrong server, coz I assumed that the server running the DTS would be the one requiring its own copy, but it turns out that the server it was targeting was the one lacking xpstar. Don't forget to check the other server!
Lesson two
Plopping a new copy of it into the relevant place didn't help. Perhaps it needed re-registering in some way. A couple of threads in Googledom talked about MDAC, and reinstalling it seemed like a good idea, but fooling round looking for Windows Components to reinstall led nowhere.
Lesson three
One of those threads mentioned reinstalling SP4. Fortunately we had copies of the installation media for both SP4 and SQL Server 2000 on the server, but the first time we ran the SP4 install it transpired that the virus had actually messed about with these too, deleting a crucial directory (Binn) - not within the SP4 media but the SQL installation media. Once this was sorted out, an SP4 reinstallation was all that was required. It fixed the xpstar error without a restart.
Lesson four
Don't know if there is a lesson four, but if it turns out that we didn't get the virus off properly, or I should have done a restart after all, I'll let you know.

HTH Jeremy

Postscript
January 2010: we had more problems which in part were related to, you guessed it, xpstar.dll "disappearing". All the SQL Server jobs disappeared, for one. Reapplying SP4 once more did the trick: turned out the jobs were just hiding when the server "lost" xpstar. I need to get to the bottom of why it happened a second time, but there you go: SP4 fix definitely works for us.
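Incidentally, a quick way to check whether xpstar has gone walkabout again, before the jobs start vanishing, is to ask SQL Server which DLLs its extended stored procedures map to and eyeball the xpstar.dll entries. On SQL Server 2000 something like this does it (treat it as a sanity check, not a diagnosis):

USE master
GO
-- lists each extended stored proc alongside the DLL it lives in
EXEC sp_helpextendedproc
GO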

Thursday, October 01, 2009

Google Translate widget: awesome

Some time back I had an abortive go at making a bookmarklet to do various things with Google Translate that weren't that straightforward. It all went a bit off the boil but thankfully it's pretty irrelevant now that Google's done what was needed all along and given us a dead easy widget to drop into your site and do good-enough translations on-the-fly. Links on the page are also appended with parameters so that it will translate subsequent pages too, as long as the widget is on them. Nice.
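For reference, the embed boils down to a placeholder div plus a script include - roughly the below, though take the snippet Google generates for you as gospel rather than my memory of it:

<div id="google_translate_element"></div>
<script type="text/javascript">
// callback invoked by element.js once it has loaded
function googleTranslateElementInit() {
  new google.translate.TranslateElement({pageLanguage: 'en'}, 'google_translate_element');
}
</script>
<script type="text/javascript" src="http://translate.google.com/translate_a/element.js?cb=googleTranslateElementInit"></script>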
Have a look at the MOL pages. The layout is not optimal (it's down at the bottom) so I need to experiment with changing that, but it works nicely on the pages where the one footer I've changed is used. Took 2 minutes. I had to refresh the page after its first load to get it to work, which may be because I have the widget at the bottom and the javascript file didn't load properly or something. Wha'evah.
Actually I'd still like a bookmarklet (not G Toolbar) that lets me highlight a bit of text and translate to/from a language of my choice. Got a bit stuck with the whole IE prompt thing but hey, perhaps I'll have another go.

Monday, September 28, 2009

Jennifer Trust outreach programme wins the National Lottery Award

A while back I blogged/tweeted about the fact that the Jennifer Trust, which supports sufferers of Spinal Muscular Atrophy and their families, had been shortlisted for a National Lottery Award for Best Health Project, recognising the quality of the work of their outreach programme. I'm really pleased to hear that it won the award (Lottery news item here).
Unfortunately the Lottery funding came to an end in May and the award itself brings no cash at all, but I dearly hope it will raise awareness of and support for the Jennifer Trust's work, which offers hope to many thousands of people in the UK, and indeed support to those who have no hope and may have lost their beloved children.

Monday, September 21, 2009

Anglo-Saxon partnership

Not being much of a historian I don't know how much the Angles and the Saxons saw themselves as a partnership or how much they were just lumped together as such by the residents of these islands left behind after the Romans' holiday, or by later centuries of ignoramuses. I suppose it's a question I might find an answer to if the following suggestion ever came to be.

Today I read of a wonderful find from the North-East, where the burial of an Anglo-Saxon "princess" has been uncovered and the rich remains are to go on show in a new display at Kirkleatham Museum, Redcar. The cross-over with the Prittlewell "prince" that MOL's archaeology unit excavated a few years back is obvious, no less the uber-famous royal burial at Sutton Hoo [tour]. The fantastic Portable Antiquities Scheme has brought many smaller finds to the knowledge of the heritage community and no doubt they all add to the sum of our knowledge about life at that time, whether noble or otherwise. It struck me, as one who often talks about partnerships but is lacking the imagination to come up with many good candidates for it, that this was one such, where the evidence of Anglo-Saxon royalty that's thinly scattered round the country could be united digitally. Not exactly revolutionary, but it shouldn't be too hard to make it happen, at least in part via the magic of machine interfaces. The PAS has Dan Pett's excellent API, the BM (home of many Sutton Hoo treasures, as well as the PAS, as it happens) has its fab new-ish Merlin system, and we have...oh bugger, nothing at present but in due course the Museum of London's Collections Online system will emerge. Sadly the Prittlewell finds aren't ours, though we look after them for now, but we have plenty of info about the site as well as other A-S riches from within Lundenwic. Perhaps it's a nice student project to bring these all together. Anyone up for it?

Now I'm looking forward to Thursday, for when Dan is threatening exciting news. Can't wait!

Thursday, September 10, 2009

Not very news: I won a competition

Well, well, well. Apparently back in March I actually won something, but it's only by stumbling across this blog post today that I found out. It's quite cool, actually, that I get to have a product (an "Open Source animal bones database for use by Archaeologists") named with my suggestion, although I really like some of the others (SQLETON, anyone?). No other prize, but as a latent bones person myself it's really nice (human bones, though). Apparently, looking back at my reply to the Antiquist mailing list, I suggested "zooos" because (obviously) the zoo relates to animals and the "os" to:

bone, as in "os animalis". Short and sweet! And maybe the extra "o" makes
it more memorable in a way. Then again, it could just be frivolos :-)


Mixing my languages of course but never mind. Strangely I didn't make the point Joseph makes, which is that the OS also puns with Open Source, and which is why they've amended it to zooOS. Shame I don't do PHP much, or PostgreSQL.

More important than the name is the idea of the Open Archaeology Software Suite itself, not to mention Oxford Archaeology's Open Archaeology project that sits behind it. I mean to look at these more carefully and to prod our Archaeological Applications Development Manager, Pete, to do the same. Cool idea.

Museum websites aren't down

Just thought I should mention, some time after the fact, that the sites aren't down any more. For a while they were up-n-down like a wh... no, like my eyelids during one of those structural geology lessons back in my undergrad days (mainly down, then), but now they appear to be reasonably stable, so I'll tempt fate by saying as much.

That really was a crap week. Looking forward to moving on and catching up now.

Friday, September 04, 2009

Museum of London websites down

...and will be for a bit. Our just-appointed Head of ICT, Adam Monnery, is doing his bit. With any luck things will be running by the weekend but don't hold your breath.

Thursday, August 20, 2009

The great escape

Well today I don't feel like moaning. Pretty fecking remarkable, huh? Stuff went pretty well, we're close to finishing a very important stage in the Collections Online project, I unbroke some things earlier this week so I could get on with some actual work, I talked with a curator about an exciting project that's still far enough in the future that we can dream big dreams and not worry about the inevitable slap in the face that reality will give us...
On top of all that I managed to find a few minutes to do some development, which is pretty good by current standards. One thing I wanted to do was simply make a map link from an object record in our Solr index. Now, Solr URLs have their reserved characters as well as normal URL escaping. XSL, too, with which I transform the Solr output, likes escaped characters. Google Maps URLs, of the sort that you make to overlay KML on a map, well, of course they also require characters in the KML URL parameter to be escaped. The end result is a URL for a map with overlay that looks something like this:

http://maps.google.co.uk/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fwww.museumoflondon.org.uk:8080%2Fsolr%2Fselect%2F%3Fq%3Dtext:knife%2BAND%2B(start_latitude%5B-1%2BTO%2B1%5D)%26version%3D2.2%26start%3D0%26rows%3D30%26fl%3Dname,caption,start_latitude,start_longitude,site,accNum%26wt%3Dxslt%26tr%3Dkml.xsl&ie=UTF8&z=14

Ugly, huh? [BTW, once I put the new multicore index up this URL won't work]

Escape, escape, escape, and I've had plenty of fun and games in the past trying to escape stuff in XSL the way I want it without XSL then re-escaping or unescaping or otherwise ballsing up the output, so this time I thought, sod this, I'll just make a page to take in a nice simple set of parameters and redirect to the map. This makes it a whole lot easier to write the links in XSLT without worrying so much about the escape nightmare. A link like:

http://www.museumoflondon.org.uk/scripts/solrgmapredirect.asp?q=knife+AND+(start_latitude[0+TO+*])&s=0&r=30
[the "+" can be "%20" instead]

I don't know how much time I saved but I know it only took 5 minutes. It takes in a Solr query, record count and start index, escapes characters as befits GMaps KML URLs, and inserts them into a Solr query URL (including the KML transform bit, of course: wt=xslt&tr=kml.xsl, in our case). This is put into the GMaps URL and we do the response.redirect (yes, it's classic ASP). It's brittle: it will break if the GMaps URL format changes, or if the Solr URL or output format change; but hey, it's simple and works (for now).
Side benefits
It was only after making the script for these pragmatic reasons that I realised that having such a page is, of course, good for several other reasons, including:
  • it will give us stats on people following the map links
  • that same brittleness is more of a problem if I'm making links like this in lots of scripts and transformations around the site. This way I only need to point all similar links to one script and change that
  • if I decide to scrap Google Maps and use, say, OpenStreetMap, or if I want to get my KML from somewhere else, again, one script to change

I will probably add a couple of other parameters but don't want to make it heavy. Specifying the data source is one (besides Solr, we can get KML out of, for example, our publications database); specifying the target service is another, so that we could use GMaps, OSM, Yahoo! and so on. Shit, anything but Streetmap (how much do you not miss having to use that piece of crap? Best thing about the last few years in mapping is the fact that you never see it anymore).

[edit 21/8/2009]

I've done some further work this morning, along the lines suggested above. It now takes in a data source and a target service parameter (though the latter only works for GMaps at present), which means I can pull in the publications KML and may start getting MOLA sites by site code too. Much more flexible now, and a single point for all map requests is going to be handy. More work to do to use more powerful aspects of pubs search.

It may seem odd to blog about a 5 minute job when I've been doing much more challenging and complex things that take months, but it's very satisfying when it works so quickly, plus my belated realisation of the useful side effects made me think it was worth talking about. Here's the gist of the code as it now stands, for interest - a simplified sketch rather than the script verbatim, with endpoint paths and helper details from memory:
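<%
' mapredirect.asp (sketch): take simple params, build the escaped
' GMaps-with-KML-overlay URL and bounce the visitor to it.
' q = Solr query, s = start index, r = rows, src = data source, t = target service
Dim q, s, r, src, t, kmlUrl, mapUrl
q   = Request.QueryString("q")
s   = Request.QueryString("s")
r   = Request.QueryString("r")
src = Request.QueryString("src")
t   = Request.QueryString("t")   ' only "gmap" handled so far

If src = "pubs" Then
    ' publications database KML endpoint (path illustrative)
    kmlUrl = "http://www.museumoflondon.org.uk/scripts/pubskml.asp?q=" & q
Else
    ' default: Solr, with the XSLT response writer turning results into KML
    kmlUrl = "http://www.museumoflondon.org.uk:8080/solr/select/?q=" & q & _
             "&version=2.2&start=" & s & "&rows=" & r & _
             "&fl=name,caption,start_latitude,start_longitude,site,accNum" & _
             "&wt=xslt&tr=kml.xsl"
End If

' the crux: escape the whole KML URL once so it survives as GMaps' q parameter
mapUrl = "http://maps.google.co.uk/maps?f=q&ie=UTF8&q=" & Server.URLEncode(kmlUrl)
Response.Redirect mapUrl
%>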

Here's a link to the new script, looking at publications data:

http://www.museumoflondon.org.uk/scripts/mapredirect.asp?r=30&q=roman&s=0&t=gmap&src=pubs

Follow that and see the GMaps URL I no longer have to write!

Saturday, August 15, 2009

TwitsTwotsBitsBotsDeliciousDosAndDoNotsIfsAndButs

So I've got into using bit.ly for my short links, particularly on Twitter. I'm sure I don't need to explain why, but aside from these serving the obvious need for brevity within a tweet (but never here...), I appreciate the stats, which appear in real time at minute-scale granularity and so are in some ways clearly superior to what you get from Google Analytics. Here's an example: http://bit.ly/info/3NuA3P . I'm going to talk more about the stats later on but before that a brief digression about Delicious, which I think will repeat some of what I saw in a post by Tony Hirst recently, but it's been brewing so gotta get it out.
What's wong wiv bit.ly and Delicious
The problem with bit.ly is that the things I want to tweet I typically also want to bookmark using Delicious, for whilst bit.ly keeps hold of your tasty links it's not got tagging (and why not, I wonder? That would make it a much more useful and social service). It got to the point where I was wondering why Delicious wasn't offering an integrated short URL service, since right now if you want the full benefits of online/social bookmarking and short URLs neither bit.ly nor Delicious cuts it. Or should I say, right then, since about a week ago (and within a week of my tweeting my bemusement that Delicious wasn't doing this), it did. Bookmark with Delicious now and you get the option to share your link, which produces a short URL. Cool, and yet.... it's not good enough for me. You can only do it by letting Delicious e-mail or tweet the links for you, at which point you see the short URL. But you can't simply view the short code immediately so that you can cut-n-paste at will. Delicious should create one for every single bookmark, with the option of custom links. To do what bit.ly does and tempt me away from it, it must also create unique URLs for each person's version of a link, so that each can be tracked individually (together with the shared one for aggregate data), and it must offer decent stats.
So, that's why Delicious isn't up to snuff yet for me to jump ship from bit.ly, even if that would mean just one operation for tweeting and bookmarking my fave URLs. Hmm, come to think of it perhaps bit.ly could offer OPML output or some simple export or integration with Delicious so you could just synchronise periodically? That might keep me using both services happily. I should say, I'm perfectly aware that there are alternatives to Delicious and that some of them offer better integration with Twitter, but that's my chosen poison and with social stuff the size of the network is vital to its gravity; ain't no bookmarking service with more gravity than Delicious.
Who follows?
But what about those stats about link followers that bit.ly offers? Let's dig into them. What do they really tell us?
I started to get suspicious that so many of the followers of links I tweeted were from the US, and often at a time when normal people would be a-bed across the Atlantic. The real-time stats showed that they were also very quick off the mark, and whilst the streaming nature of Twitter means that you expect responses to be quick or not at all, sometimes the click-throughs seemed to come even before I tweeted (via Spaz, the Air client I normally use). Super-quick, US-based (which only a few of my followers are), and very steady numbers for most tweeted links; were these clicks from real people at all?
Short answer
No, lots of them weren't; a steady residue of link follows came from bots of one sort or another.
Long answer: an experiment
I did a couple of experiments to test this. First I made a page on my own web space just fo' the bots, made a bit.ly link to it, and tweeted it asking humans NOT to click the link. This being a highly scientific experiment, I should here state that an explicit assumption was that my followers deem themselves to be human (though I know this was violated occasionally, including by myself. Doh!). I hoped that the stats from this web page would give me the answer as to how many click-throughs shown in the bit.ly stats were via browsers and how many were bots. Well, yes and no. I forgot how lame the stats on that web-space are. No breakdown by day nor details of users by page, only for the site. Nevertheless I get minimal visits to those pages and a massive peak on the day of that tweet, so I can probably tell enough. All the same I thought I'd better try Google Analytics too, so having set that up for my site I repeated my tweet. Then I thought, perhaps I should have used a new link? So I created a custom name for my URL and tweeted once again.
Some numbers
Of 17 link follows reported by bit.ly (http://bit.ly/info/3wQZ51), 15 were "direct", which would include bots but also most other applications, e-mail clients etc. One was from bit.ly itself (that was me, oops) and one from tweetdeck (Mike, that you?). 10 were from the UK and 7 from the US, and pretty much all of them happened within seconds or minutes of my tweets.
My original "for bots only" tweet on the 8th yielded 6 of the follows, plus that accidental click from me. According to PlusNet web stats package I had 51 hits and 22 visits that day, which were almost exclusively to that page (with some other pollution from yours truly, no doubt, as I clicked round the site setting stuff up). I guess that means that once they'd found the page via the bit.ly link some of the followers came back a few times. Now that's definitely not human.
Once I had my Google Analytics bit sorted out, on the 11th, I sent a second tweet. This resulted I think in one visit, by bit.ly's stats. This tweet used the original bit.ly short URL (http://bit.ly/3wQZ51), so presumably the bots figured it wasn't worth going there again. Looking back at the tweet, actually, I think I left the "http://" off so perhaps that's the real answer.
Anyway finally I did the same thing again but using a new custom link (http://bit.ly/bottest), which to the bots would appear to be a new link (all except bit.ly's own bots, perhaps?). This produced another 7 follows, and the next day there were two more when I wasn't watching. Google Analytics reported one visit to the target page, from a Firefox/Windows user in south London, so I presume that one of those 10 follows was via a browser. According to PlusNet, there were 28 hits and 15 visits on the 11th (1 visit is more normal).
So how many of the visits were bots? Well, putting GA together with bit.ly's stats I'd say only 1 out of 10 follows on the 11th/12th was not a bot, though it's possible that others were human users that just didn't fire the GA code for one reason or another.
Overall, in August so far 13 hits in the web logs are attributed to the bitlybot user agent, 4 to the Tweetmemebot, 2 to twitturls.com's bot, 4 to Spaz, which I know as a user makes requests to something or other (bitly, or the target URL perhaps) to get some page info. A bunch of other bots and non-browser UAs are in there too but I can't say if they're related to the tweets.
Conclusions
I don't think I can squeeze much more from this paltry sample and the crappy and contradictory web log stats, but clearly nearly all of the visits via Twitter/bit.ly were, as I hoped, not from humans and most likely came from bit.ly's own bot and those of Tweetmeme and Twitturl. From this evidence, if bit.ly reports that I get half a dozen "clicks" on a short URL I've tweeted then I can assume they're probably bots. More than that and they're probably at least partly human. Whether this applies to other twitterers I can't say, but you can do your own experiments. I'd like to repeat this using a site with a better stats package as bait, and perhaps using a few different twitterers to throw out the link, to see whether there's any relationship between numbers of followers and numbers of bots. Quite likely not, but who knows.
Is this any use? I dunno, but I'm a little better informed about the impact of my tweeted URLs now.

[[edit: ironically, looking for the custom link to this post that I made at the weekend, I found their user forums where there's more discussion of the bot problem e.g. http://feedback.bit.ly/pages/5239-suggestions/suggestions/126917-show-me-if-hits-are-bots-human-or-rss-readers-etc-]]

Tuesday, August 04, 2009

Spinal Muscular Atrophy links to act on

Yesterday I heard about three separate activities concerning Spinal Muscular Atrophy. This is a nasty disease that kills more infants than any other genetic disorder and in its milder forms leads to varying degrees of disability or threats to life. Recently you may have heard about one prominent sufferer of SMA, Baroness Campbell, a commissioner on the Equality and Human Rights Commission (good profile/interview in the Guardian).
  • There is currently a petition on the Number 10 website seeking more funds for research into SMA, for whilst there are, according to Wikipedia, various "cures" being trialled, there's nothing realistic in the offing. If you feel that, amongst the many competing claims on your taxes, this is a worthy cause, I'd urge you to sign up.
  • Secondly, the Jennifer Trust is a huge boon to sufferers of SMA and their families and is currently on the shortlist for a National Lottery Award for Best Health Project, for the quality of its outreach programme. As well as more exposure the award brings a little cash, which would be nice, and of course recognition for the wonderful people that do this work. Again, there are other laudable health projects in the shortlist but we'd love your support in the form of your vote!
  • Finally, I found out that my colleague Adam Monnery (acting head of IT) is doing a charity triathlon which is raising money for a variety of charities, including those supporting ill and disabled children (see the Tri For Life site for more info). Here's his team's Just Giving page. [As an aside, I have to say it perplexes me that the '000s of charities in the UK haven't come up with their own alternative to JG (which takes a slice of the donation), but perhaps the economic benefits aren't worth it.]
Pardon the naked self-interest in this off-topic post, but frankly it's more important than any of the digital heritage stuff I'd usually put up! Thanks for reading.

Tuesday, July 21, 2009

Another catch-up post

Time for a catch-up. It's been quite a while, after all (umm, not counting the Ithaka post I wrote whilst failing to finish this one), and though I've got a bunch of drafts waiting they'll never come to anything, so here goes with a bunch of things that have happened or caught my eye recently-ish.
  • Didn't get a job. Went for one, lucky enough to be interviewed, didn't clear that hurdle but did learn a bit along the way. Firstly, I really need to get more structured project management experience. Secondly, gotta calibrate my confidence gauge correctly. Am I accurately putting across what I'm capable of? Do I really know what I'm capable of? I don't want a job I'm unable to perform well but I do want to be stretched; it's a fine line and I think the employer is vital in assessing this question but they need the most accurate information to decide this (rather than bullshit), but equally I need to be able to assess myself objectively. For when the next job comes up.

  • Did get some help. Back in June we finally got me some help from Julia Fernee, a contractor (at present) with a museum/art background and whizzo tech skills who's just a god-send (hope that doesn't compromise my agnostic credentials). Julia's been working on the LAARC access system, which is one of those systems that's been broken since our Mimsy upgrade in late '07. She's worked methodically through the system fixing all the routines and various bugs, enabling downloads of digital archives (yay!), auditing, documenting, unf***ing stuff. We're going to look at the whole data access layer next and rebuild it in a proper service-orientated way, so that we can finally start re-using that amazing resource in other places and ultimately offer a public API. All assuming we can keep JF for long enough.

  • Got a boss. I posted before about Antony Robbins joining MOL. He started earlier this month and now we in the web team (i.e. Bilkis and I) need to start thinking of ourselves really as part of the Communication department. It's hard - we still sit with our old IT buds most of the time - but they're a good lot in Comms and there's an enthusiasm for e-marketing and social media. At the same time a number of other things are happening that hopefully bode well. These include MOL taking the first steps to a proper digital (or is it web?) strategy; the creation of a "digital museum manager" post to lead our team; and the initiation of a social media group with participants from many departments.

  • MOLA and Nomensa. We had a very useful review of the MOL Archaeology website from Nomensa. We were well aware of many of the problems but having some fresh eyes to help develop ideas on how to solve them is really helpful. They also picked up various points we'd not really noticed. Lots of the issues relate to our complicated new brand, which has made confusing messages almost inevitable; nevertheless we can do better.

  • Open Repository. I went over to Gray's Inn Road to talk to the folks at BioMed about Open Repository, which hosts a never-realised-but-still-paid-for repository that was intended to provide an OAI gateway for some PNDS data. They were wondering (rather honourably I thought) whether we fancied making anything of the investment. It seemed like a good chance to reduce my ignorance of exactly how repository software fits into the general scheme, its overlap with e.g. DAMS and so on. There's a nest of problems in our Collections Online Delivery System that such software might play a part in addressing, but equally it will be but one part of the architecture and does it fit better than alternatives, or a custom-made "black box" (see below)? I learnt a lot, but haven't reached a resolution yet.

  • CODS. So, speaking of CODS, we struggle on. Can I bear to go into it now? Not really. We're getting closer to defining the edges of the bit we can't define (the "Black Box") but whether anyone will want to build it for us, or think we're anything other than insane for proposing it, is another matter. The Black Box, by the way, is the part that takes data from multiple sources and aggregates it, but also enables its enrichment via the creation of new associated content and relationships between entities. It doesn't have to do the discover part or offer many services but it does have to offer a reasonable authoring/management interface. It's not a hole that seems to fit any off-the-shelf software, so as I say perhaps we're simply stupid to dig that hole in the first place.
    Anyway, deadlines loom and something must congeal before then. Should be a laugh seeing what it is. Oh god.
  • IT dead people. I'm helping to put together a proposal for a project I won't be able to give details of as yet, but it's an interesting opportunity to wed multiple strands of archaeological/historical research with popular interests, notably family history.

  • MCN2009. I'm honoured to have been invited to present a paper for which I submitted an abstract, it seems like an age ago. MCN2009 takes place in Portland, Oregon in November - again, seems like an age away but I'd better not leave it too long before I get scribbling in earnest. I'm very excited, both to be asked there and by the paper itself, which I think should be quite fun to write.
  • Went to Paris in the spring. Well, June, when I tripped over to IRLIS (next to the Pompidou Centre) for a Europeana meeting to develop API requirements. It was an interesting exercise and I think we made progress with evolving ideas for end-uses and for figuring out priorities. One of the most important things to work out is how to intertwine the API build with that of the internal architecture, so that we can make best use of what's going to be built anyway.

I think that covers most of it for now. Still awake at the back?

Ithaka report ith a catastrophe (for me)

I am tho thcrewed.
Ithaka S+R has produced a report for JISC, the SCA, NEH and NSF entitled "Sustaining digital resources: an on-the-ground view of projects today". It's a great report, outlining a sensible approach to what sustainability actually means, strategies to achieve it, and how a number of current projects put these strategies into practice. It's a follow-up to "Sustainability and revenue models for online academic resources" (2008).
The problem, for me, is that much of the originality that I hoped my PhD work would offer has vanished overnight. Their definition of sustainability; their explicit linking of financial sustainability to the value offer; their arguments for leadership that offers clarity of purpose and evidence of success; their clear-eyed distinctions between the financial versus mission-based value on which non-profits must base their measures of success and arguments for their future: all of these could have been plucked from my private writings and debates with my supervisor over the last 3 years. Yet of course they haven't been; they're the result of parallel thinking, but these guys have gathered the evidence that I'm still at the early stages of assembling.
It's not that my thoughts were especially profound, but I've never previously found any authors linking sustainability and the value proposition so clearly to the digital work of cultural heritage institutions. Now that niche is no longer empty. I'm really pleased to see that there are others of like mind out there, and of course a report like this gives me a great reference in support of my work, but the problem for me is not just one of dented pride, because I have to demonstrate considerable originality in my work and without that, no PhD. It's a pickle.
On the upside, there are some differences. The Ithaka report emphasises how important it is to be clear on the sources of value (and cost) in order to make the case to funders for continued support. I will too, but I hope to investigate more deeply the influence of decision-making processes and the sources of "friction" that may cause a discrepancy between what a resource is "worth" and what people are prepared to invest in it (complicated, of course, by questions of opportunity cost and uncertainty). Ithaka mainly looked at digitisation projects, in the sense of those offering sets of digitised material such as images, maps, papers or databases. I'm just as interested in the value of learning objects, games, mobile phone tours, exhibition websites (though admittedly my core case studies are less diverse than this), and only in a cultural heritage context, not education. I'm interested in certain "modalities of constraint" (c.f. Lessig) that aren't addressed by Ithaka, and in questions of risk management. I am looking at three varied partnerships and the effects that collaboration has on decision-making, value and funding, and partnership is an area that the Ithaka report, by its own admission, examines only briefly. So perhaps all is not lost, but whilst I genuinely think that this report is excellent and provides a lot for people in the digital humanities to chew over, it's given me quite a tricky challenge. I'm not about to give up my PhD, but with everything else that's happened over the last year I have to confess it's taken me pretty bloody close.

P.S. Ah bollocks, now Nick's at it too (para 6). Shows how unoriginal I was in the first place.

*Nancy L. Maron, K. Kirby Smith and Matthew Loy, 2009. "Sustaining digital resources: an on-the-ground view of projects today. Ithaka case studies in sustainability". http://www.ithaka.org/ithaka-s-r/strategy/ithaka-case-studies-in-sustainability

Tuesday, May 12, 2009

Do the Europeana survey for purely selfish reasons

[edited from a circular e-mail, here's a chance to win something shiny. And why not?]

Please use Europeana.eu, the European digital library with 4 million digital objects, and take part in a survey for the chance to win an iPod Touch.

Friday, May 08, 2009

Solr: lessons in love

OK, maybe this will turn out to be the promised foundational intro to Solr/Lucene and the reasons why I've found myself seeing them as some kinda saviour. Or it may just be a few of the lessons I've learned thus far in my experiments.

Solr is...

...a wrapper around Lucene, the mature open-source search engine from the Apache Foundation, and it gives Lucene an HTTP REST interface - in other words, it's got an out-of-the-box web API. So why do we not hear more about it in the world of the digital museum? I've been vaguely aware of it for a long time, and quite possibly there are many museums out there using it, but if so I've not heard of them. We've been talking for so long about the need to get our stuff out there as APIs and yet there's an easy solution right there. OK, perhaps the API isn't quite what you might specify a discovery API to be, but then again maybe it is. It certainly enables pretty sophisticated queries (though see below) and one virtue is that the parameters (though not the field names) would be the same across many installations, so if every museum used Solr we'd have the start of a uniform search API. A good start.
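To make that concrete, here's the sort of thing I mean - a minimal sketch (Python, untested here) of hitting a default local Solr install on port 8983; the field name and query value are invented for illustration:

import json
import urllib.parse
import urllib.request

# Build a query against the stock /solr/select endpoint
params = urllib.parse.urlencode({
    'q': 'borough:Hackney',  # hypothetical field:value pair
    'rows': 10,              # how many results to return
    'wt': 'json',            # ask for JSON rather than the default XML
})
with urllib.request.urlopen('http://localhost:8983/solr/select?' + params) as resp:
    results = json.loads(resp.read().decode('utf-8'))
print(results['response']['numFound'])

That's the whole API story: one HTTP GET, no SOAP, no SDK.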

Installation and configuration
Dead easy. Read the tutorial. Download the nightly (it comes with examples), though it's not as easy to find as it should be. Being the nightly, it needs the JDK rather than the JRE to run it, and on my machine I had to fiddle about a bit because I have a couple of versions of Java running, so I couldn't rely on the environment variables to start Solr with the right one. This just means putting the full path to the JDK java EXE into your command prompt if you're running the lightweight servlet container, Jetty, that it comes with. This is the easiest way to get going. Anyway I wrote a little BAT file for the desktop to make all this easier and stop fannying about with the CMD window each time I wanted to start Solr.

The other thing to remember with Jetty is that it runs as part of your user session. Now, when you're using Remote Desktop and you log off you see a message to the effect that your programmes will keep running after you've logged off. Well, for one server this seemed to be true, but when I tried to get Jetty going on the live web server (albeit going via another, intermediate, RDP) it stopped when I disconnected. I thought I'd use Tomcat instead, since that was already running (for a mapping app, ArcIMS), and by following the instructions I had it going in minutes. Now that may seem unremarkable, but I've installed so many apps over the years, and pretty much anything oriented at developers (and much more besides) requires extra configuration, undocumented OS-specific tweaks, additional drivers or whatever. With Solr, it's download, unpack, run - well it was for me. Bloody marvellous.

Replication

Well this couldn't be much easier, and so far no problems in the test environment. Use the lines in the sample config file to specify the master and slave servers, put the same config on both of them (of course) and the slave will pick it up and start polling the master for data.
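If you want to check the slave really is polling, you can interrogate the replication handler itself over HTTP. A quick sketch (this assumes the handler is registered at /replication as in the sample config; the host name is invented):

import urllib.request

url = 'http://slave-box:8983/solr/replication?command=details&wt=json'
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode('utf-8'))  # polling status, index version etc.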

Data loads/updates

The downside to this seems to be the time it can take to do a full re-indexing, but then that's my fault, really, because I've not done what's necessary to do "delta" updates, i.e. just the changes. It can take a couple of hours to index 3000 records from our collections database - these have some additional data from one-to-many relationships, which slows things down.
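Both kinds of load are just HTTP commands to the DataImportHandler, which is what I'm using to pull from the database. Something like this (assuming the handler is registered at /dataimport, as in the examples):

import urllib.request

base = 'http://localhost:8983/solr/dataimport'
# Full rebuild - the slow option I'm currently stuck with
urllib.request.urlopen(base + '?command=full-import')
# Just the changes - needs deltaQuery/deltaImportQuery set up per entity
# in the data config file, which is the bit I haven't done yet
urllib.request.urlopen(base + '?command=delta-import')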

Data modelling and the denormalised life

Before indexing and updates, though, comes modelling. I've only done one thing so far, which is grab records from a relational database, and after a straightforward basic version (the object record) I played around with adding in related records from subject, date, and site tables. Here I found a problem, which was that the denormalised nature of Solr was, well, more denormalised than was healthy for me. I still haven't quite got my head around whether this is a necessary corollary of a flattened index, or a design limitation that could be overcome. You get to group your multiple records into a sub-element, but instead of a sub-element for each related record, you get a sub-element for each related column. Basically I wanted a repeating element for, say, "subject", and in that element further elements for ID, subject name, hierarchy level. Instead I get an element containing ID elements, another with subject names, and so on. Like this I cannot confidently link ID and subject name. My work-around was a pipe-separated concatenated field that I split up as needed, but that's really not ideal.
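For what it's worth, unpicking that work-around on the way out looks like this (the field name and packing order are invented; each value in the multi-valued field packs one related record as id|name|level):

def unpack_subjects(doc):
    """Rebuild related subject records from the pipe-separated field."""
    subjects = []
    for packed in doc.get('subject_packed', []):  # hypothetical field
        subj_id, name, level = packed.split('|')
        subjects.append({'id': subj_id, 'name': name, 'level': level})
    return subjects

doc = {'subject_packed': ['12|Roman pottery|2', '47|Ceramics|1']}
print(unpack_subjects(doc))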

The other thing I've not yet tried to deal with is bringing in other types of record (or "documents" in the Solr vocab). For instance, full subject records searchable in their own right. Probably they belong in the same index, but it's possible to run a multi-core instance, which might be the proper way to handle this. Dunno. Better find out soon though.

Incidentally, one of the nice things I've found with this denormalised way of life, with the whole thing set up for speedy search rather than efficient storage or integrity (you have to assume that this is assured by the original data source), is that some of the nightmare data modelling problems I'd struggled with - duplicate images, messy joins - don't really matter much here. Go for speed and clean up some of the crap with a bit of XSLT afterwards.

Oh, and a small JDBC tip. I've not used JDBC before (I needed it to connect to SQL Server) but installation was a breeze once I'd figured out which version of the connection string I needed for SQL Server 2000. I needed to drop the jar file into the Solr WAR directory, if I recall correctly - that's where it looks first for any jars - so whilst there may be more "proper" solutions this was effective and easy.

Oops, wrong geo data! (pt. 1)

I already talked about one problem, the lack of a structure to hold related multi-value records usefully. One other problem I had with data modelling was with lat/long data. First problem: it wasn't lat/long data in the lat/long fields. Arse. No wonder it seemed a bit short on decimal points - it was OSGB. I brought in the right data from another database (lat/longs are not in Multi Mimsy at the moment). Job done.

Data typing problems - floats and doubles; oops, wrong geo data! (pt. 2)

...or not. To float or not to float? I'm not very good with databases or, indeed, data types. I knew I needed a trie field in order to do range queries on it for the geo data. Clearly an integer field would not do, either, these being numbers with up to six digits to the right of the decimal point. A float was my first port of call. Turned out not to work, though, so I tried a double. This worked, but I think I need to change the precision (precisionStep) value so that I can use finer-grained query values. Do look at trie fields if you're going to use Solr for range queries - they're pretty damn quick. Sjoerd at Europeana gave me the tip and it works for them.
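Here's the shape of the bounding-box query I ended up with (field names invented; assumes lat/lng were indexed as trie doubles). Note the brackets around each clause - see the grumble about query order under "Other limitations" below:

import urllib.parse
import urllib.request

q = '(lat:[51.46 TO 51.56]) AND (lng:[-0.15 TO 0.05]) AND (text:pottery)'
params = urllib.parse.urlencode({'q': q, 'wt': 'json'})
with urllib.request.urlopen('http://localhost:8983/solr/select?' + params) as resp:
    print(resp.read().decode('utf-8'))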

wt=xslt and KML

One nice thing with Solr, being a REST-y HTTP wrapper for Lucene, is that you can do it all in the querystring. One such thing is specifying the transform, so you get your results out of Solr as you want them, rather than having to pull them into some other environment and transform them there. So whilst I was at the in-laws over the bank holiday weekend I could RDP to the web server, set up Tomcat and write a quick transform that could be called from the querystring to return KML instead of plain Solr XML. It was at this point that I realised there were geo data problems, but once they were resolved the wt=xslt method was sweet. Though you can't use your favoured XSLT engine - it's Saxon, for better or worse.
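In code it's nothing more than two extra parameters (the kml.xsl here is my own stylesheet, so invented for illustration; whatever you use has to sit in the core's conf/xslt directory):

import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    'q': 'borough:Hackney',
    'wt': 'xslt',     # use the XSLT response writer
    'tr': 'kml.xsl',  # stylesheet to apply, from conf/xslt
})
with urllib.request.urlopen('http://localhost:8983/solr/select?' + params) as resp:
    print(resp.read().decode('utf-8'))  # KML, straight out of Solr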

Other limitations

This is based on my limited knowledge and experience and so subject to being completely wrong. However...

  • It's not an RDB: no sub-selects, and only awkward ways of joining data.
  • I've found that indexing seems to give inconsistent results. It might be that there have been small differences between the data config files each time with big consequences, but I'm pretty sure that it's just cocking up sometimes. Maybe the machine I'm running it on or the database server is over-worked, but sometimes I get 2k records rather than 3k, and I may also find that a search for a particular borough returns no results even though I can see records in there with that borough in the text AND the borough field. Something's wrong there.
  • Flaky query order. I got no results with a query along the lines of "text contains keyword, and latitude is between val1 and val2", whereas if I did "latitude is between val1 and val2, and text contains keyword" I got loads. Various other fields were sensitive to being first in the query. I can't find this documented as being intentional. Some situations were helped by putting brackets around the bits and pieces, but some weren't. I'd recommend lots of brackets and judicious use of both quotation marks and "+" signs.

So OK, it's not been totally wrinkle-free and I'm sure I've forgotten or mixed up stuff, but on the whole Solr has been a great experience. I have a lot still to find out and plenty to fix before my current test dataset is good, but I'm confident that this will end up at the heart of the discovery and web service parts of our planned collections online delivery system. Check it out.

Thursday, May 07, 2009

A new Head of Communications

Well, after many months it seems we have a new Head of Communications on the way. Congratulations to Antony Robbins (LinkedIn profile), who will be our boss starting from July. This is, I suppose, when we'll start to find out what it really means for the digital media "team" to be split off from the rest of IT and integrated into a department with Press and Marketing. There are obvious synergies there, but there are synergies with IT too (and several other departments), so I hope we can develop a healthy and balanced perspective on what the digital museum is all about.


Hopefully before that point we will have started or even completed the process of finding a replacement for Mia, who we lost to the Science Museum all those months ago. The vacant post is being boosted to "Digital Museum Manager", to make up for the fact that we have had no manager responsible for web and digital media since October, for reasons it would be imprudent to expand upon here. We need someone at that level to take on the planning, policy and strategic work that the HoC will be too busy to deal with, given that he's covering the whole of communications (internal and external), but we also have to have a developer to fill the gap that Mia left, so this will be a pretty hands-on post, with probably more time coding than managing. We'll have to see if this proves sufficient, since even when we were fully staffed we were short-staffed.

Looking at Mr Robbins's profile it is good to see that internal communications are part of his skill-set. I think it's broadly felt at all levels here that MOL needs to work on this area in order to strengthen us as a corpus of colleagues with a commonly understood direction, and it will be interesting to see how our internal comms evolve in the coming months. Between now and July there's a lot that needs doing, so we'll have to muddle on in the meantime, but overall it's an interesting time ahead.

Tuesday, May 05, 2009

CFP for VALA2010

i.e. a trip to Australia. VALA 2010 looks like an interesting conference:

VALA promotes the use and understanding of information and communication
technologies across the Galleries, Libraries, Archives and Museum sectors.

The CFP is here but the deadline is nearly up (although the conference isn't until February 2010).

Museums Association digital events

The Museums Association is bit by bit getting more involved in the digital side of museums. There's never much in the Museums Journal, to be honest, but Museum Practice has regular web reviews in it and recently ran a feature on in-gallery digital media.
The only conference in that area that I recall the MA running was, ooh, 2006 or so, but there are two more coming up. In June we have World wide wonder: museums on the web (NOT to be confused with the long-standing MCG-run UK Museums on the Web conference that I presume will take place later that month). There are some great people lined up for that, with perspectives ranging from academic to managerial to dirty-hands coder to strategic.
Then on September 18th is "Go digital: New trends in electronic media", which looks like it draws upon the sources interviewed for the MP special (including the director of public programmes here, David Spence). In contrast to June, it looks like it's going to be focussed on off-line media.

Monday, May 04, 2009

A dawning realisation?

[third in a recent series of observations and unfinished semi-coherent thoughts I just need to get out of the way]

Nowadays everyone I talk to questions the metrics they use. More than that, people seem keener to dig into what they may mean in terms of value. Seb Chan is amongst those in our sector who are exploring how to make better measurements, and better use of them; closer to home, Dylan Edgar's work with the London Hub dug into similar issues.


Last week in a catch-up with the director of my division we touched on his own objective of "improving the website". In itself it's encouraging that the objective is there, as part of the reorganisation we are currently experiencing, but "improving the website" is a pretty broad ambition. I think it's a subject that we'll revisit in more depth soon, but it was clear that our director was as aware as we web types were that when you lift up that rock you'll find a tangled mess of questions. Before you talk about "improving" you need to identify what you consider to be valuable, and to disentangle theoretical "improvements" from impact, preparedness, experimentation etc. Obviously a set of measurements that to some degree reflect these valued qualities is a sine qua non for managing their realisation, and so here's a reference to provoke a little more thought on the subject, one that I won't dig into here but that has had me rethinking my own attitudes to web stats and the whole evaluation problem: Douglas W. Hubbard, 2007, How to Measure Anything: Finding the Value of "Intangibles" in Business.*

In any case I find it encouraging that in this discussion and others with senior colleagues there seems to be a dawning awareness that we have a complex, multidimensional environment to deal with, wherein the varieties of "success" may be as diverse as between all the departments within a museum. I'm not sure that it would always have been the case that the higher echelons were aware of the perils of trying to evaluate our digital programmes, although perhaps any senior manager worth their salt will have long ago twigged that a website is not "improved" merely by adding pages, Flash splashes and video - evaluating the more familiar physical museum is no easier, after all, and nor is improving it. We do need to have that conversation about what we mean by "website" with senior management, though. Is it only geeks who see this as just one part of our digital presence?


When it comes to the use of web stats of various sorts, there have always been lots of complaints about them, but I suspect that in this discussion too we are seeing greater recognition that it's not about visitors versus hits. Maybe it's not even enough to focus on "impact", since the heart of the matter arguably lies a level deeper than that: the first step is figuring out what impact itself means in the context of the museum's mission and, in this networked environment, in the mission of the meta-museum that we must realise we are a part of.

Rhetorical question for the day, then: Is there a mission for the meta-museum, and do we measure up to it?


*I hope to post about this book properly, eventually, but don't wait for that: check out the book, which, for all its flaws of repetition, is full of useful ideas and tools.

From the library: Renaissance and metrics

Every now and then I drop into the MOL library and flick through the latest journals. There's usually something to catch the eye. Last week, as well as the Bearman interview I mentioned already, I picked up March's Cultural Trends, which includes an article from Sian Everitt that reviews data collection and documentation practices for Renaissance in the Regions.

It's not a brilliant piece, to be honest; it's limited by its reliance on online publications and it ends up muddling the question of what data are gathered with that of what is made available on public websites. Everitt was writing in advance of a review being conducted for the MLA (review FAQs) by an advisory group led by Sara Selwood, Phase 1 of which was to be completed last autumn so as to inform the business plan for the years ahead [note to self: track down other Selwood refs on data collection in cultural heritage]. Because of this it's quite likely that Everitt's findings were out of date before they were even accepted for publication. All the same there are some interesting points within the paper. For example, despite the declared intention of Renaissance to standardise methods of evaluating impact, Everitt finds notable variability in how this is actually undertaken. Two Public Service Agreement targets are applied to Renaissance, and measurements against these seem to be uniform, but beyond this and the headline figures there is less consistency; likewise the approaches to making their data, analysis and reports public vary greatly. I also discovered that the MLA offer a set of Data Collection Guidelines and templates, which I now need to digest. Presumably this 2008 manual (PDF) is the replacement for the 2006 version that Everitt was referring to, and here's a page on the MLA site about the results to 2006.

I look forward to seeing whatever parts of the Selwood-led review are published. The overall direction of Renaissance is up for grabs, it would seem, which could have a big impact on the Museum of London, for one. I will be especially interested, though, in the data collection strand, and in how they suggest we evaluate impact.

ICHIM and DISH

I hadn't twigged that the 2007 ICHIM was in fact the last of that long-running series of biennial conferences, which ran, amazingly, from 1991. April's issue of Curator starts off with an interview with David Bearman on ICHIM's history, why it ended, and what comes next. Let's not forget that dbear and Jennifer Trant also run the universally adored and enormous Museums and the Web conferences, but ICHIM covered somewhat different territory and arguably there's a space that needs filling now...

...which is why it was timely that on the same day I found that interview, I also read about DISH2009:


"Digital Strategies for Heritage (DISH) is a new bi-annual international
conference on digital heritage and the opportunities it offers to cultural
organisations."

DISH 2009 takes place in Rotterdam December 8-10th, and the CFP is up. It looks interesting: taking a step back to look at strategic questions of innovation, collaboration, management etc.

Thursday, April 30, 2009

NMM, YQL, COBOAT, CODS

Jim O'Donnell organised a talk on Tuesday at the National Maritime Museum from Christian Heilmann of Yahoo! Mia wrote up her notes already and I've not got much to add, but it was a very enjoyable presentation, and when he reached the juicy bit about YQL and BOSS, both of which I'd left for another day's exploration, I learned a lot. Clearly there's a lot of potential there (especially now it's augmented by YQL Execute, announced yesterday), and it looks like it will let you do a bunch of things that Pipes can't do, or that are a pain to do in Pipes (the GUI is great and yet infuriating). YQL gives a common API meta-interface (I guess that's the word) for loads of other APIs and for things with no API; it also handles all the crap with authentication, tokens etc.; and it will act as the gatekeeper for your API so you don't get hammered by unreasonable numbers of requests.

As with similar tools/services (Pipes, Dapper, dbpedia, and various things nearer the surface like GMaps), YQL is clearly a blessing from both ends of the telescope: we get to use it for its intended purpose - to be "select * from Internet" is the grandiose ambition - knitting together data sources from Yahoo! and beyond; and we also get to offer our data in a developer-friendly way to encourage its reuse by creating Open Tables [note that these are purely a machine-friendly description of how to access data: no data is handed over as such]. Jim has already been busy creating Open Tables and experimenting with YQL.
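To give a flavour, here's the canonical party trick - pulling the headings out of an arbitrary page via the public YQL endpoint. This is cribbed from the console examples and untested here, so treat the details as my best understanding:

import urllib.parse
import urllib.request

yql = 'select * from html where url="http://example.com" and xpath="//h1"'
params = urllib.parse.urlencode({'q': yql, 'format': 'json'})
with urllib.request.urlopen('http://query.yahooapis.com/v1/public/yql?' + params) as resp:
    print(resp.read().decode('utf-8'))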

Following the talk we headed for a pint (and one of the most jaw-dropping jokes I've heard, from Chris), and it was good to talk to Tristan from Cogapp. When I stopped raving incoherently about the marvel that is Solr (yes, still in love even as I gradually find out more about it), Tristan cleared up some questions for me about Cogapp's COBOAT app. They recently open-sourced this (as far as possible), in the context of the Museum Data Exchange project with OCLC (see Gunter Waibel's recent post), where it plays the role of connecting various collections management systems to an OAI Gateway-in-a-box, OAICatMuseum (well, it seems like it's only used with TMS in the project, but the point of COBOAT is that it just makes life easier for mapping one data structure to another, and another CollMS would slot in just fine).

For me, both COBOAT and OAICatMuseum are of interest for the role they could play in the revamped Collections Online Delivery System* we'll build this year, resources allowing (in other words, don't hold your breath. Mission critical, yeah, but worth paying for? I await the answer with interest). Integrating and re-mapping data sources, an OAI gateway, and sophisticated and fast search are key requirements, as is a good clean API, and taking these two applications along with Solr I feel like I may have identified candidates for achieving all of these aims. We're a long way from a decision, of course, at least on the architecture as a whole, but I have some tasty stuff to investigate, and I'm already well down the track in my tests of Solr.

Thanks again to Jim for arranging the talk. He's got another great guest coming up, hopefully I can make it to that one too.

*I'm resigned to this thing being called CODS but still hoping for something less, well, shit.

Sunday, April 26, 2009

macro-blogging about micro-blogging

am on Twitter at last learning to be brief. Not easy.

Saturday, April 25, 2009

Catching up with Europeana v1.0 [pt.2]

[see part 1 for stuff about what I did before the kick-off meeting]

So April 2nd/3rd saw the kick-off meeting for Europeana 1.0, the project to take the prototype that launched last November and develop it into a full service. There may have been glitches at the launch but at the meeting there was a tremendous feeling of optimism, sustained I suppose by the knowledge that those glitches were history, and by the strength of the vision that has matured in people's minds.

The meeting was about getting the various re-shuffled (and trimmed) work-groups organised, with their scope understood by their members and refined in some initial discussions before the proper work begins. There are tight dependencies going in all directions between the work-groups. My problem was, on reflection, a very encouraging one: it was difficult to decide which WG I should work with, since they nearly all now have some mention of APIs in their core tasks. Given that concern over APIs was the reason I got involved with Europeana, it's great to see how central a place they occupy in the plans for v1.0. Not surprising, perhaps, given the attitudes I've discovered since joining, but it feels more real now that they're boosted up the agenda. For those who worry (as I used to) that Europeana is all about a portal, this shows that the fear is groundless. Jill Cousins (the project's director) distilled the essence of Europeana's purpose as being an aggregator, distributor, catalyst, innovator and facilitator; the portal, whilst necessary, is but a small part of this vision.

In the end I elected to join WG3.3, which will develop the technical specs of the service, including APIs. Jill is also organising a group to work up the user requirements (to feed to WG3.3), which I'll participate in. I guess this will also help to co-ordinate all the other API-related activity, and I'm thrilled to see several great names on the list for that group, not least Fiona Romeo of the National Maritime Museum. Hi Fiona! I hope to see more from the UK museum tech community raising their hand to contribute to a project that's actually going to do something, but for now it's great to have this vote of confidence from the museum that puts many of us to shame for their attitude and their actions.

So we heard about the phasing of developments; about the "Danube" and "Rhine" releases planned for the next two years; about the flotilla of projects like EuropeanaLocal, ApeNet, Judaica, Biodiversity Heritage Library, and especially EuropeanaConnect (a monster of a project supplying some core semantic and multilingual technology, and content too); and about the sandbox environment that will in due course be opened up to developers to test out Europeana, share code and develop new ideas. Though we await more details, this last item is particularly exciting for people like me, who will have the chance to both play with the contents and perhaps contribute to the codebase of Europeana itself, whilst becoming part of a community of like-minded digi-culture heads.

Man, you know, I've got so much stuff in my notes about specific presentations and discussions but you don't want all that so here's the wrap. As you can tell I've come away feeling pretty positive about the shape it's all taking, but there are undoubtedly big challenges, both in achieving detailed aims in areas like semantic search and multilinguality and in ensuring the long-term viability of the service Europeana hopes to supply; nevertheless the plans are good and, crucially, there are big rewards even if some ambitions aren't realised.

Within the UK there are a number of large museums with great digital teams and programmes that are not yet part of Europeana. There are also, obviously, lots of smaller ones with arguably even more to gain from being in it, but they have more of a practical challenge to participation right now. But why is it that those big fish are not on board yet? Is it just too early for them, or are there major deterrents at work?

I know that there are people out there, including friends of mine, who are sceptical of Europeana's chances of success and sometimes of its validity as an idea. The former is still fair enough I suppose, or at least the long-term prospects are hard to predict; the latter, though, still mystifies me. If we want cross-collection, cross-domain search - and other functionality - based on the structured content of large numbers of institutions, there's really no alternative to bringing the metadata (not the content) into one place. Google and the like are not adequate stand-ins, despite their undoubted power and despite the future potential for enabling more passive means of aggregation by getting, say, Yahoo! to take content off the page with POSH of some sort (which certainly gets my vote, but again relies on agreed standards). Mike Ellis and Dan Zambonini, and I myself separately, have done experiments with this sort of scraping into a centralised index, turning the formal aggregation model around, and there's something in that approach, it's true. Federated search is no panacea given that it requires an API from each content holder and is inferior for a plethora of reasons. Both are good approaches in their own ways and for the right problem - as Mike often reminds us, we can do a lot with relatively little effort and needn't get fixated on delivering the perfect heavyweight system if quick and light is going to get us most of the way sooner and cheaper.

But I can't help but detect some sort of submerged philosophical or attitudinal* objection to putting content into Europeana - a big, sophisticated, and (perhaps the greatest sin of all) European service. I sense a paranoia that being part of it could somehow reduce our own control of our content or make us seem less clever by doing things we haven't done, even if we're otherwise agile, clever web teams in big and influential museums. But the fact is that a single museum is by definition incapable of doing this, and if you believe in network effects, in the wisdom of crowds, in the virtues of having many answers to a question available in one place, then you need also to accept that your content and your museum should be part of that crowd, a node in that network, an answer amongst many. If your stuff is good, it will be found. Stay out of the crowd and you don't become more conspicuous, you become less so. Time will doubtless throw up other solutions to this challenge, but right now a platform for building countless cultural heritage applications on top of content from across Europe (and beyond?) looks pretty good to me. It's heavyweight, sure, but that's not innately bad.

If your heritage organisation is inside the EU but isn't part of Europeana, or if it's in it but you aren't part of the discussions that are helping to shape it, then get on board and get some influence!

Flippin' 'eck, I didn't really plan on a rant.

*is this a made-up word?

Catching up with Europeana v1.0 [pt.1]

Last November, a prototype Europeana launched. Many (perhaps even both) of you will know that the results were mixed: the index itself was successful, at least given its proof-of-concept status, but the personalisation features were not optimised and rapidly led to a crash as the user sessions racked up. It seems that the solution to this was essentially configuration, but politics meant that more had to be seen to be done and so hardware was thrown at the problem. A couple of weeks later the site was back but under the radar and without the personalisation bit ("My Europeana"), and more recently this too has returned - go and have a play here.


Prototyping done, the bid was assembled to develop a full-blown service, "Europeana v1.0". This bid to the European Commission was successful and just before Easter a kick-off meeting was held at the Koninklijke Bibliotheek in the Hague to initiate the project. This is actually but one of a suite of projects under the EDL Foundation umbrella, all working in the same direction, but I guess you could say it's the one responsible for tying them together.


So how is Europeana shaping up now? Having spent three days finding out I can tell you now that I came back feeling good - and not just because I was heading straight off again on holiday. Day 1 was about travel and (obviously) a long and lovely trip to the Mauritshuis, but it ended with an hour in the company of Sjoerd Siebinga, lead developer on the project, and a session with Jill Cousins, Europeana's director. I went to see Sjoerd because I wanted to find out how Europeana's technical solution would fit with our plans at the Museum of London for a root-and-branch overhaul of our collections online delivery system. I knew that they'd be opening the source code up later this year, and I also knew that in essence what Europeana does is a superset of what we want to do, so I figured I'd find out whether there'd be a good fit and whether there are things I could start to use or plan for now. Laughably, I thought that we might actually be able to help out by testing and developing the code further in a different environment - as if they needed me! I'll save this for another post, but in short Sjoerd took me on a tour of what they use as the core of the system (Solr) and blew me away. There are layers that they have built/will build above and below Solr that make Europeana what it is and may also prove helpful to us, but straight out of the box Solr is, quite simply, the bollocks. I've known of it for ages, but until given a tour of it didn't really grasp how it would work for us. Many, many thanks to Sjoerd for that.

Next I met with Jill for an interview for my research on digital sustainability in museums, where we dug into the roots of Europeana, its vision, key challenges, and of course sustainability (especially in terms of financial and political support). This was fascinating and revealing and added a lot to my understanding of the context of the project's birth and its fit in the historical landscape of EC-funded initiatives in digital/digitised cultural heritage. As a research exercise it was a test of my ability to work as an embedded researcher; one who is not just observing the processes of the project but contributing and arguing and necessarily developing opinions of his own. I really don't know how well I did in this regard - I'm not sure how often my attempts to be probing may in fact be leading, or whether my concerns with the project distort the approach I take in interviewing. Equally I don't know if this matters. A debate to expand upon another time, perhaps.

Days 2 and 3 were the kick-off meeting, and I'll put that in another post.

Thursday, April 16, 2009

Museums and digital sustainability: the other meaning

Well, this is not an angle I'd considered before in my research (the sustainability of digital resources in museums, since you ask), but I guess it's one interpretation of the problem:
Pirate Bay server becomes museum artefact
Whether the Swedish National Museum of Science and Technology will be sustaining the file-sharing service is another matter.
I guess, joking aside, that really does highlight the key difference between (my definitions of) sustaining and preserving: the latter is about keeping stuff in existence, the former about keeping it fulfilling its purpose.