The Doofer Call: April 2008

Wednesday, April 30, 2008

"da blog" the WCT workflow tool

Tuesday, April 29, 2008

KML to go

Well I've finally bridged the gap between our site summaries (which have long been available online) and GMaps (likewise). I told part of the story earlier - the summaries that our archaeology service write are compiled into an XML document (processed out of Word via a macro...), and transformed into HTML with XSLT. But because the location data is always in OS grid references it's no good for online mapping apps (which all like latitude and longitude). So I've been trying to find a way to get lat/long for the sites (which number many thousands) in order to let us plot the data for World And Dog, if they're not otherwise engaged.

Step one was to get a way to clean up the TQ-style OSGB data. Step two, adapt code/write a web service to enable me to pass that lot in and get back latitudes and longitudes. From there I needed to combine the resultant XML with the original site summaries. I tried doing stuff with Yahoo! Pipes but it wasn't too keen on my XML, or at least it wouldn't show me the items. Anyway, instead of that I thought I'd draw both datasets into one XSLT transformation and output KML, which is what I've done today (thanks in part to the inspiration of Raymond Yee's great "Pro Web 2.0 Mashups" book from Apress). I would have liked to just pass in a single variable (year) and go through all these steps automatically but it wasn't worth the hassle since every step needed a touch of hand-massaging on the data.
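For anyone wanting to try the last step without XSLT, here is a minimal sketch in Python of joining the two datasets and writing KML. It is not the transformation I actually ran: the record fields (`site_code`, `grid_ref`, `summary`), the `build_kml` function and the sample site are all made up for illustration, and the namespace is the OGC KML 2.2 one. Sites with no converted coordinates are simply skipped.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(sites, coords, title="Site summaries"):
    """Join site records to converted coordinates and return a KML string.

    sites  -- list of dicts with hypothetical keys site_code, grid_ref, summary
    coords -- dict mapping a lower-cased grid reference to (lat, lon)
    """
    ET.register_namespace("", KML_NS)  # serialise with a default namespace

    def tag(name):
        return f"{{{KML_NS}}}{name}"

    kml = ET.Element(tag("kml"))
    doc = ET.SubElement(kml, tag("Document"))
    ET.SubElement(doc, tag("name")).text = title

    for site in sites:
        latlon = coords.get(site["grid_ref"].lower())
        if latlon is None:
            continue  # no coordinates for this site, so leave it off the map
        lat, lon = latlon
        placemark = ET.SubElement(doc, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = site["site_code"]
        ET.SubElement(placemark, tag("description")).text = site["summary"]
        point = ET.SubElement(placemark, tag("Point"))
        # KML wants longitude first, then latitude.
        ET.SubElement(point, tag("coordinates")).text = f"{lon},{lat}"

    return ET.tostring(kml, encoding="unicode")


# Made-up example data: one site with coordinates, one without.
example_sites = [
    {"site_code": "XYZ01", "grid_ref": "TQ709098", "summary": "Example summary."},
    {"site_code": "XYZ02", "grid_ref": "TQ000000", "summary": "No coordinates."},
]
example_coords = {"tq709098": (50.8618352280453, 0.431144625126249)}
example_kml = build_kml(example_sites, example_coords)
```

Lower-casing the grid reference before the lookup is just one way of making the join tolerant of mixed-case source data; any consistent key would do.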

The KML includes all the site summary descriptive content. Looking at the resultant Google Maps I see there are some glitches, like things in the wrong place, and things without coordinates, and actually I need to check out 1999 which has a fatal error somewhere. I don't have time to fix these right now, but overall I'm pretty excited: at long last, we have a nice mapping interface for the public to look at all those thousands of excavations, desktop assessments, surveys etc. that MoLAS has conducted since 1992 (but not 2007 yet). Well I say all, in fact some of those from outside the London area are not included.

Now I'm hoping that someone will come and do something cool with the KML. In due course I'll have a go myself, but if you come up with anything please let me know!

So here are links to the functioning maps. Save them to My Maps and do something with the result!

[Edit: you can also see these embedded into our website. Here's 2006]

MoLAS site summaries 1992
MoLAS site summaries 1993
MoLAS site summaries 1994
MoLAS site summaries 1995
MoLAS site summaries 1996
MoLAS site summaries 1997
MoLAS site summaries 1998
[MoLAS site summaries 1999 - bust right now]
MoLAS site summaries 2000
MoLAS site summaries 2001
MoLAS site summaries 2002
MoLAS site summaries 2003
MoLAS site summaries 2004
MoLAS site summaries 2005
MoLAS site summaries 2006

Thursday, April 24, 2008

OSGB-lat/long web service for GIS

We have loads of geographical information. Trouble is, it's almost all in OSGB grid reference form, which is no good for feeding to apps like Google Maps. Worse, much of it uses old-style 100km squares (mainly TQ, which covers London). We've taken a couple of approaches to this - hand-making a few maps in Google (for example, our Olympics work is mapped here), and using batch processing scripts from ESRI or others to manipulate the data in the geographical fields of our archaeology database, creating latitude and longitude values to accompany the OS grid references. However there is still a good set of data that needs another approach - for example, our site summaries for the last decade and more are available online. These XML-driven pages contain only OSGB data. I plot them very crudely onto our own ESRI-based map application, but would much rather have KML to work with.

So to the point. I came across a great script on the Hairy Spider site that also runs a web service. I wanted to take this further so that (a) I could pass in lots of values, not just one at a time, (b) it could handle the TQ-style syntax, and (c) I wouldn't keep on hitting friendly Mr Spider's server. The code available is for the conversion from OS eastings and northings to lat/long only and I've not tried to reproduce his "proper" web service, but I do now have something that will work for my needs. I can pass in a querystring like

gr=tq709098,SW465987,tl123456,51232456,512300245600

and get back something like:

50.8618352280453
0.431144625126249
tq709098

50.7318874223712
-5.59642485933937
SW465987

52.0968934973257
-0.358719120254802
tl123456

52.0968934973257
-0.358719120254802
51232456

52.0968934973257
-0.358719120254802
512300245600

[note that the last three values in the query were all for the same grid reference but in different formats, producing the same lat/long]
For me this is pretty useful. I may well extend this to take more parameters and pass out KML, but the main thing is having a means to convert the data on the fly over HTTP.
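The part that happens before the maths, turning the mixed bag of grid reference formats into plain eastings and northings, can be sketched roughly as below. This is an illustration in Python, not the service's actual code: the function names are invented, it assumes all-numeric references split evenly into an easting half and a northing half of at most six digits each, and it stops short of the lat/long conversion itself (that is Hairy Spider's hard work).

```python
import re
from urllib.parse import parse_qs


def parse_grid_ref(ref):
    """Return (easting, northing) in metres for an OSGB grid reference.

    Handles letter-prefixed references such as 'tq709098' and all-numeric
    ones such as '51232456' or '512300245600'.
    """
    ref = ref.strip().upper().replace(" ", "")
    match = re.fullmatch(r"([A-HJ-Z]{2})?(\d+)", ref)
    if not match or len(match.group(2)) % 2:
        raise ValueError(f"unrecognised grid reference: {ref!r}")
    letters, digits = match.groups()
    half = len(digits) // 2
    e_part, n_part = digits[:half], digits[half:]

    if letters:
        if half > 5:
            raise ValueError(f"too many digits after letters: {ref!r}")
        # Grid letters run A-Z without I, laid out in 5x5 blocks.
        l1, l2 = (ord(c) - ord("A") for c in letters)
        if l1 > 7:
            l1 -= 1
        if l2 > 7:
            l2 -= 1
        e100km = ((l1 - 2) % 5) * 5 + l2 % 5
        n100km = 19 - (l1 // 5) * 5 - l2 // 5
        easting = e100km * 100_000 + int(e_part.ljust(5, "0"))
        northing = n100km * 100_000 + int(n_part.ljust(5, "0"))
    else:
        if half > 6:
            raise ValueError(f"too many digits: {ref!r}")
        # Numeric form: pad each half out to a full six-digit metre value.
        easting = int(e_part.ljust(6, "0"))
        northing = int(n_part.ljust(6, "0"))
    return easting, northing


def parse_query(query):
    """Split a 'gr=a,b,c' querystring into (reference, (easting, northing)) pairs."""
    refs = parse_qs(query)["gr"][0].split(",")
    return [(ref, parse_grid_ref(ref)) for ref in refs]


results = parse_query("gr=tq709098,SW465987,tl123456,51232456,512300245600")
```

With the querystring above, the last three references all come out as easting 512300, northing 245600, which is why they share a lat/long in the output.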

Many thanks to Hairy Spider for doing all the hard work on this. I've tested the outputs and they're very close to the OS's own tool, so good work HS!

Next thing is to use this. I'll let you know when I do, and we might also be open about making the service publicly available if that would be of assistance to anyone that might be reading this.

Tuesday, April 22, 2008

AdaptiveBlue offers AB Meta. Did the earth move for you?

AdaptiveBlue is an interesting company. Although I've not found myself using their Blue Organiser tool all that much, I can see which way they're pointing and I like it. Now they have announced how they wish to refresh the old familiar META tags in the heads of web pages with their take on object-centric metadata. AB Meta (apparently developed with other web companies) is all about surfacing semantic data into the layer that we typically interact with, and that even non-tech people can hopefully author without too much trouble. From their page:

AB Meta is a simple and open format for annotating pages that are about things.
A book publisher can use AB Meta to provide information about a book such as the author and ISBN, a restaurant owner can provide information such as the cuisine, phone number and address and a movie reviewer can annotate reviews with movie titles and directors.
The format allows site owners to describe the main thing on the HTML page in a very simple way - using standard META headers. AB Meta is purposefully simple and understandable by anyone. AB Meta is based on eRDF Standard.

I'm especially interested in this "surface" expression/implementation of SW. It's clear to me that much of the running in recent times has been made by companies looking to SW-style concepts and aspirations to deliver real benefits to their business, and only in a few cases has this led to them taking a classic-ish SW path (c.f. Reuters with OpenCalais). AdaptiveBlue and many others have instead set out along the light-weight, near-the-surface route, and as an eternal optimist (for some reason), I am hopeful that this will ultimately deliver the meat that the heavy-weight, deep SW needs to do something exciting. Thus killing the chicken/egg situation, with pay-offs along the way. This was the real take-home for me of last year's SW think tank.

Whether AB Meta has a part in this for museums I can't say. It's certainly lightweight but whether it will be different enough from existing alternatives to persuade our sector to adopt it, I don't know. Perhaps the earth will yet move.

As a PS, I should add that I dropped them a line to ask about a detail (whether it would be possible to include more than one object in the head of a page) and the reply came from CEO Alex Iskold. I think that's pretty impressive: presumably he's a busy guy (and he writes a good blog post, too) and yet he took the time to reply to a pretty pedestrian inquiry.

Wednesday, April 16, 2008

That IE7 prompt issue...

So here's why: Working around IE7s prompt bug, er feature (includes a possible solution)
Damn their eyes!

EDIT:
There's an alternative, but similar, solution here: http://www.anyexample.com/webdev/javascript/ie7_javascript_prompt()_alternative.xml. Both solutions require a relatively extensive script and callback, so it may be best if I stick it all into an external JS file and embed and call this with the bookmarklet. However I also now know that it's a security setting, so I've fixed my own IE installation. If you want to do the same, you must enable "Allow websites to prompt for information using scripted windows". Seems to all work then.

Bookmarklet update

OK, there are problems with the prompting bookmarklet in IE. It's all to do with the prompt. Yesterday it worked, but only on the first use, then it would stop prompting for the language-pair value (and in fact ignore the default value I'd put in there) and just skip straight to the translation page, which, without a language pair, can't do much. I think it may be to do with security, since it occasionally shows that "website trying to show active content" warning for a moment before scooting straight off to Google without a by-your-leave, let alone that prompt.

So for IE, for now, I'm just using a straightforward Italian-English bookmarklet.
For Mozilla, the prompting version is now also Google-based and goes there. Here it is: translate

Tuesday, April 15, 2008

A Babelfish bookmarklet

I've been longing for a way to do on-page translation - you know, highlight a bit of text and see its translation inline (dodgy though machine translation is). It's not a HUGE bother to go to Babelfish and do the job there but still just a bit too much of a bother. Today I wanted to see what a Portuguese blog was saying about us (beyond what I could hazard with my spotty knowledge of other Romance languages) so I thought, sod it, time to try and do this.

Well, there's no public API for Babelfish (at Google, Yahoo! or Altavista) as far as I can tell, so doing what I really want to do isn't going to be straightforward. Getting the text translated means receiving the results as a full HTML page, so embedding the translation alone will involve some screen-scraping. The next best thing would at least be to highlight some text and go straight to the translation, so I've made a bookmarklet for the job: trans PT_EN

If you want this, drag it to your Links bar in IE (or right-click, save to Favourites>>Links), and in Mozilla drag it to your Bookmarks toolbar (I may have remembered this wrong). By changing the language pair indicated at the end of the redirect URL you can modify this for lots of other languages (this one is pt_en i.e. Portuguese to English). Personally, I may not use it very often, I'll have to see. Let me know if it's any good for you. You'll currently need a different bookmarklet for each language pair, of course.

I hope I can make some improvements. One would be letting the user set the language pair each time - perhaps with a prompt box. Perhaps the next thing would be to pass the translated page through a Yahoo! Pipe and scrape out the translation, to drop it straight on the page.

EDIT: Oh sod it, here's a version that lets you set the language pair in a prompt: translate

EDIT AGAIN: Just seen that Google's translate page offers something pretty much the same, dammit - though not my second option with the prompt. Perhaps I should mod their code, it will be better than mine...

KML goes open

ReadWriteWeb comments on Google's announcement that KML is being handed over to the Open Geospatial Consortium. As RWW says: "For something as boring and painful as it is - standards work is very sexy". This gives us all more confidence that it is a format that's going somewhere and should be reliable for a good while to come. No more hangups about being too tied to Google's proprietary format. Cool!

Friday, April 11, 2008

Standing back for a moment [cross-post from mymuseumoflondon.org.uk]

Hi, it’s the web-monkey again. Things have been pretty intense lately, due in large part to the end of the financial year and the need to wrap up all sorts of budgets.

My part in the various projects I work on ranges from major to peripheral - sometimes some serious programming, sometimes offering advice on commissioning, sometimes just doing a little tweaking ready for integrating someone else’s work. All the same I’ll flag up a couple of things I’ve been involved in lately, at least of those that have now launched, even if I didn’t do that much myself - after all, where else do we sing about some of this stuff? Too often it ends up sort of dribbling out because we’re all too busy or exhausted to make a song and dance about it. So, here we go:

The Great Fire of London website, orientated at children of Key Stage 1 age (5-7) and their teachers. This is the result of a partnership between the Museum of London, National Portrait Gallery, The National Archives, London Metropolitan Archives, and London Fire Brigade Museum. It’s cool. Thanks to ON101 for building the game and designing the site, and our own Mariruth Leftwich for shepherding the whole thing. Also via Mariruth comes a game to complement our Digging Up the Romans learning resource.

At last we have sort of launched “The Database of 19th Century Photographers and Allied Trades in London: 1841-1901”. This is the electronic representation of the amazing work done by David Webb in cataloguing thousands of people in that industry in Victorian times. I built the database, hmm, several years ago for another partnership we’re in, but it was never launched for reasons that even now seem obscure. Anyway, it’s now live and, though it needs an overhaul even now, it’s great to think it may at last start being useful. I want to open the data up for mash-ups... when I get some time.

The Sainsbury Archive, a fantastic resource at Museum in Docklands, has a new site through the efforts of archivist Clare Wood.

I can’t tell you about the work I’ve been doing on republishing an archaeological reference text, because it’s not ready yet. If you can find the test URL, well, you’re very sneaky.

Any day now we’ll see the launch of the “Family Favourites” pages on the Museum in Docklands website. Go and seek it out, there’s a fun game and an introduction to various highlights of the galleries there.

It’s just a promo site until the exhibition itself happens, but have a look at the Jack the Ripper pages. That’s gonna be well worth a visit - get yourself some tickets!

Geek stuff: some time ago I made a machine-friendly interface to look at the database of publications our archaeology service (MoLAS) produces. Whilst working towards the launch of http://www.museumoflondonarchaeology.org.uk/ I decided I wanted to change the architecture of the publications application, which for one thing makes it easy to drop little nuggets of info about our publications around the site, all fed from a database. The solution I went for also works for machine access by anyone, and I hope it will be just a start: we’d like to make our events available like this, and in time our collections. For the record, it’s basically REST/XML, drop us a line if you want to use it (though I imagine that it will be the collections and events that will have wider appeal - note that events already have an RSS feed, which is used on sites like docklands.co.uk).

And check out our events programme, I’ve just uploaded the May to August programme.

Now, what have I forgotten to mention?

Of course, there’s more in the pipeline, keep your eyes on all our sites!

Thursday, April 10, 2008

FlickrSLiDR

This is nicer than the badge:

Created with Admarket's flickrSLiDR.

Tuesday, April 08, 2008

Significant properties workshop - report

DCC/JISC significant properties workshop (British Library, 7/4/2008)

I'm not going to write up in detail all that was presented on Monday, but highlight a few things that seemed important to me, and work out a couple of thoughts/responses of my own. I haven't yet had a chance to read the papers that were sometimes referred to at the workshop (links to them here, some are huge!) so my questions may be answered there.

JISC’s INSPECT project, run by CeRch at KCL, has set a framework for identifying and assessing the value of significant properties (SPs), and the success of their preservation; and initiated several case studies looking at SPs in the context of sets of similar file formats (still images, moving images etc) and categories of digital object (including e-learning objects and software).
5 broad SP “classes” (behaviour, appearance/rendering, content, context and structure) are identified by INSPECT. These don’t seem to include space to describe the “purpose” of a digital object (DO), unless this is somehow the combined result of all other SPs. But an objective such as “fun” or “communicates a KS2 concept effectively to the target audience” needs to be represented, especially for complex, service-level resources. Preserving behaviour or content but somehow failing to achieve the purpose would be to miss the point.
Something I’m still unclear on: is it that a range of SPs are identified that can be given a value of significance for a given “medium” or format? Or is it that a set of SPs is identified for a format, and the value given according to each instance (or set of instances) submitted for presentation? In other words, is it a judgement made of the significance of a property for a format/medium, or for a given preservation target?
Once identified, SPs provide a means for measuring the success of preservation of a file format (whether the preservation activities entail migration to or from that format, or emulation of systems that support it).
The two classes of object explored in the workshop (software and e-learning objects) are typically compound, and are much more variable than file formats. They will inherit some (potential) SPs from their components, but others (many behaviours, for example) may be implicit in the whole assemblage.
Andrew Wilson (keynote speaker, NAA) raised the importance of authenticity. His archivists’ point of view of this concept is not identical with that in museums, or that which I'm using in my research, but it’s useful nonetheless. I have, however, already discarded it as a significant property for most museum digital resources, with the exception of the special case of DRs held as either evidence, or accessioned into collections. Archivists’ focus on informational value and “evidence” as the core measure of (and motivation for) authenticity isn’t always useful for DRs, but it is nice and clear-cut.
The software study drew out the differences between preservation for preservation’s sake – the museum collecting approach – and preservation for use, where the outputs are the ultimate measure of success. The SPs for these scenarios differ. This paper was very interesting, and perhaps (along with the Learning Objects paper) came closest to my own concerns, but the huge variety of material under the banner of “software” clearly makes it very difficult to characterise SPs. The result is that many of those identified look more like preservation challenges than SPs in themselves. Specifically, dependencies of various sorts might count as a significant property in a “pure preservation” scenario; but in most cases they are, more likely, simply a challenge to address to maintain significant properties of other sorts, such as functionality, rendering, and the accuracy of the outputs.
I suggested in Q&As that my reason for being interested in SPs probably differed from that of a DO-preserving project or organisation, although they have plenty in common. Andrew Wilson said that he saw the sort of preservation (sustaining value) that I was talking about as being the same as preserving in the archiving sense. I disagree, in part at least, because:
- He made the case for authenticity. This doesn’t apply when one is using SPs to help planning for good management, where we just want to make sure that we’re making best use of our resources.
- For me, SPs could prove an important approach for planning new resources, whilst for archives they are primarily for analysing what they’ve received and need to preserve (although they could in theory feed into future formats, or software purchasing decisions).
- Whilst for preservation purposes it may often be necessary to decide at a batch or format level what SPs are highly valued and hence what efforts will be invested in their maintenance, for questions of managing complex resources for active use, case-by-case decisions (based on idiosyncratic SPs?) may be the norm.
- For preservation, the “designated community” is essentially a presumptive audience, whose needs should be considered. For museums looking to maximise value from their resources, the SPs will reflect the needs of the museum itself (its business objectives and strategic aims), although ultimately various other audiences are the targets of these objectives. Perhaps there’s not so much difference here.
- Fundamental to all these differences is the fact that for archives etc, the preservation operation in which they are engaged is the core activity of the organisation. In other situations, like planning for sustainability, it is not preservation of a digital object, but its continued utility in some form (any form), i.e. the continued release of value, that counts.
These differences are largely of degree, but to me there is still a worthwhile distinction between preservation and sustainability. In a sense, preservation is the action and sustainability the continued ability to perform that action, so SPs are a way of reconciling preservation with the need for it to be sustainable. Perhaps the lack of a category that outlines the objectives, rather than the behaviour, of a digital object reflects this difference between preserving and sustaining.

Testing oneTag

Well, I'm not at MW2008, more's the pity, but I'd like to try out Mike Ellis's latest tomfoolery which is, as usual, a bloody good idea. OneTag lets you bring together all the stuff tagged with your choice of tag, from your choice of sources. It's in action on the MW2008 conference site so let's see if this post gets in there. First OneTag spam, anyone?
Cheers, Mike!

[edit] the answer to this is it didn't work and it didn't work and it didn't work and I decided to look at the Pipe, followed that lead to Technorati and found that it hadn't updated my site's content since February, pinged it and it's now listed, but because I have very little authority (a measly 3) it won't show up with the feed that's currently in the Pipe. Bummer. Still, at least I found out that Technorati had forgotten about me!
