About Me

Web person at the Imperial War Museum, just completed a PhD on digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, usually about museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Friday, June 27, 2008

Mike Dunn on "technical due diligence"

Extracted from an interview with Mike Dunn of Hearst, published on ReadWrite Web, these are the top five aspects of "technical due diligence" that Dunn looks for when considering a startup for acquisition.


  1. The primary things I look for are a thorough understanding of a company's current technology state and a roadmap of their future. I then fill in the building blocks to paint a picture of the company and its structure via the next 4 areas.
  2. Staffing: The company should have a proper ratio of dedicated to outsourced staff. The focus for in-house staff should be on owning and extending the company's value-add. The focus of the outsourced staff / service should be on areas where technology is available at a reasonable price.
  3. Infrastructure and Architectural: I look for alignment between the infrastructure in place and their roadmap. I try to understand their architecture, i.e., have they designed something that will be stable, yet scale and grow as their business requires? Have they over or under built, are their investments proper for current state and extensible as their growth requires?
  4. Workflow and Processes: This is usually the hardest part of my interviews with startups because while most have ways they do things, they often aren't comfortable expressing them. They also aren't normally done in a way that's repeatable to the point where they could be called a workflow. This is OK. As they mature, standardized workflows and processes will be established, normally out of a necessity to ensure they're providing a stable environment that doesn't get negatively affected as they introduce change.
  5. Costs: This is the spreadsheet part of the conversations. What has been spent to get them to the point they're at, what do they need to spend near term, possibly with funding from my company, and what do they envision they'll need to spend? I look for a grounded approach to spending.


With a little flipping around, these considerations probably apply just as well to museums.

Wednesday, June 25, 2008

Conference ketchup

Well it's been a pretty busy time. After many years of avoiding presenting at conferences, following a number of crappy performances in '99, I bit the bullets kindly shot at me by Ross and Jill and opened my cakehole to several hundred unfortunate captives, first at the UK Museums on the Web conference in Leicester, and then at the EDL plenary conference in the Hague. And I'm truly grateful to both Ross and Jill for the opportunity to do this: it's very flattering, humbling, really, that they felt I'd have something worth saying to such informed and inquisitive audiences.

In the end, nervous anticipation gave way to the onrush of time, and once I was up there in front of faces familiar and not I felt more at ease than I would have expected. Having listened to the recordings, well, there were a lot more "ums" and "errs" than ideal, but hey, I didn't forget too many things and I kept pretty close to time, which is a big improvement on my earlier debacles.

So what was I talking about? In Leicester, I talked about Europeana. It was not meant to be an overview as such (that's not really my role), but an account of my involvement and interest, focussing on my hopes for the project and, of course, the role that APIs play in that. During Q&As and coffee breaks I had a lot of really useful feedback on my question: what is stopping many more UK museums from getting involved in the project? On the whole, the answers revolved around the burden and mechanics of providing data, which was pretty much as I suspected. It's made me more determined to do what I can to simplify those processes, but also to ensure that the pay-off to partners is as high as it can be and as well understood as possible. Perhaps we have the furthest to go to achieve the latter.

At the Koninklijke Bibliotheek in the Hague I had an even shorter slot, which was fine by me, as part of a panel whose other members were intimidatingly illustrious. The subject of the conference was "Users expect the interoperable", and this particular session had two panels discussing interoperability in relation to archives and museums, respectively. I took part in the latter panel. I still don't know if I actually said anything, really, because I had little in the way of conclusions to offer: I just teased out some ways in which I thought "interoperability" questions pertained to APIs in a museum context. I also looked at a few examples from the world of semantic enrichment - a strange choice, perhaps, but made because there are really no proper museum APIs to compare to, and in order to show that a lack of standardisation in that area is no barrier to those APIs (Calais, Hakia, and Yahoo! Term Extractor) being useful. Simplicity gets you a long way, as does the use of existing data formats (e.g. DC or microformats). These also fit well with the other drum I was banging, the services that EDL could offer to contributors and third parties for enriching content. So, a kind of bitty talk but at least it was brief!

On Tuesday the conference wrapped up (and I do want to talk a lot more about it ASAP, because apart from anything else the first prototype was shown off and it's COOL!). I attended a hurried meeting of WP1 and Harry Verwayen presented his paper on the business model. I think he's done a great job, although this is so far outside my area of competence I scarcely dare comment. He'd also done a lot of work integrating some of my suggestions into the plan, and it became still clearer to me how much of this hangs off the success of the semantic web tech part of the project.

Both conferences were really rewarding in their own ways and I'll try to offer some proper notes from them as soon as I find my feet again.

Friday, June 20, 2008

Hakia, semantic enrichment, and EDL

Moving in the same direction as Reuters, with its OpenCalais service, Hakia has started offering two new APIs, one related to search and the other to content summarising and enhancement (see RWW's story). Perhaps it has some way to go before this is a really useful service in terms of the quality of its output (if RWW's experience is any guide), but it's early days. In any case, put this alongside Calais and Yahoo!'s Term Extractor (not to mention other semantic enhancement services that extract, for example, location data) and it shows at least that quite a few people out there think there's a market for this sort of service.
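For anyone who hasn't played with one of these, here's roughly what a call involves, taking Yahoo!'s Term Extractor as the example because its interface is the simplest of the bunch. A minimal C# sketch, assuming the endpoint and parameter names as documented at the time of writing; YOUR_APP_ID stands in for a real application ID from developer.yahoo.com:

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;
using System.Xml;

class TermExtractionSketch
{
    static void Main()
    {
        // Yahoo!'s Term Extraction service (endpoint as documented in mid-2008).
        string endpoint =
            "http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction";

        // The two required parameters: your application ID and the text to mine.
        NameValueCollection form = new NameValueCollection();
        form["appid"] = "YOUR_APP_ID"; // placeholder: register for a real one
        form["context"] = "The Museum of London holds photographs of "
            + "nineteenth-century photographers' studios in the City.";

        using (WebClient client = new WebClient())
        {
            // UploadValues POSTs the form and returns the raw response bytes.
            byte[] response = client.UploadValues(endpoint, form);

            XmlDocument doc = new XmlDocument();
            doc.LoadXml(Encoding.UTF8.GetString(response));

            // Each <Result> element in the response is one extracted term.
            foreach (XmlNode term in doc.GetElementsByTagName("Result"))
                Console.WriteLine(term.InnerText);
        }
    }
}
```

POST some text, get back a flat list of terms: the whole interface is about that simple, which is no small part of why these services are so easy to bolt onto other things.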

Semantic enhancement (as well as data validation) is a service that I've mooted as a possibility for Europeana. With a specialist and very authoritative data set, it could appeal to those needing to enrich cultural heritage content. There may not be a lot of money in it, but as Harry Verwayen pointed out to me, that's not necessarily the only benefit to the service provider (or I doubt Reuters would be in this game). Building traffic around the site and strengthening the brand is a benefit. Similarly, increasing the use of the ontology/thesauri used by EDL increases its influence.

Harry is putting together a presentation for next week's WG1 meeting after the EDL plenary, and he's generously put my name on the front too although I have little to contribute beyond these slightly flaky suggestions. We'll be throwing these ideas into the mix in a discussion of business models for EDL when it goes live.

Tuesday, June 10, 2008

Clarification

Following my off-the-cuff post about the Museum of London's strike (in which I did not participate, not being a member of a union), it's become apparent that there is some confusion as to the status of this blog. It's important, therefore, that I make it very clear that the contents of The Doofer Call are a personal expression only. This is a research diary for my PhD that I've chosen to make public in order to engage in debate and help people find things that interest me; my profile states this. However, because I talk regularly about things I'm doing at the Museum I can see that casual readers might mistakenly believe that I am writing in an official capacity, or with official sanction, and any confusion is my fault and for me to address. Hence this post.

Friday's missive may turn out to be factually incorrect (one might argue that the pay at MoL is not rubbish, or that the pay rise was not over a year late) or inaccurate in its representation of the reasons for the disgruntlement that led to the unions striking, but that's not really the point. What's important is that it's clear that what I wrote was not an official press release, but the completely unofficial, independent expression of someone who is also an employee, written at home, out of office hours (like this).

So for clarity, I am posting a disclaimer that can also be read as a declaration of interests. Please read it here.

My apologies to anyone who previously mistook this blog for anything other than my thoughts in my words; I hope this has made things clearer.

Note: all of the organisations with which I have a formal relationship are included as tags on this post in the hope that people searching for the organisations and stumbling across this blog will also see this clarification.

Disclaimer and declaration of interests

The following are organisations with which I have a formal relationship (most of which I've written about at some point). The opinions expressed on this blog are mine alone, and are not to be attributed on the basis of this blog to any of these organisations:
  1. Museum of London, my employer. This includes Museum in Docklands and MoLAS. The Museum is also a contracted partner in my PhD

  2. the City of London Corporation write the contracts and the pay cheques, and the Greater London Authority are 50% partners in the Museum

  3. University of Leicester, Department of Museum Studies, at which I am studying for a PhD

  4. Lexara (and previously Simulacra and MWR), the non-academic partner in my PhD. Lexara also supply some technology to the Museum of London.

  5. Richard de Clare Primary School, where I am a parent governor with oversight over ICT across the curriculum

  6. EDLNet, the EC project for which I am the official representative of the Museum of London

None of the above sanctions or previews anything that I write here. Equally, you should read my remarks in the light of my commitments to them all.

If anything written here is inaccurate or unclear, please contact me directly or through comments about changing it.

Monday, June 09, 2008

Victorian photographers API now also alphatastic

Didn't manage (didn't dare) to upload this before going-home time on Friday. There is now another string to the bow of the nascent REST alpharama here. You can query the people in the database of 19th Century photographic London (pardon the contraction) at the PhotoLondon website we launched recently. Right now, it only sends you back a list of people matching your criteria (100 records per page) with no details. I will soon produce a detailed machine-friendly record page for each person (an example of a human-friendly one is here).

The querystring takes most of the parameters you can see on the form, including surname, forename, search text (though, unlike the form, not multiple words), gender, year of birth/death, "alive in..." year, place of origin, photographic occupation, non-photographic occupation, and presence of attached images. Here's an example search so you can see what you'll get back right now. Here's a wider one.
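To make that concrete, a query is just a GET with a querystring. A C# sketch; note that the base URL and parameter names below are invented for illustration (the real ones are behind the links above, and will be in the documentation when I write it):

```csharp
using System;
using System.Net;
using System.Web; // for HttpUtility.UrlEncode (reference System.Web.dll)

class PhotoLondonQuerySketch
{
    static void Main()
    {
        // Hypothetical endpoint: see the links in the post for the real one.
        string baseUrl = "http://www.photolondon.org.uk/api/people.aspx";

        // Build up the same criteria the search form offers.
        string url = baseUrl
            + "?surname=" + HttpUtility.UrlEncode("Fenton")
            + "&gender=M"
            + "&aliveIn=1860"; // the "alive in..." year

        using (WebClient client = new WebClient())
        {
            // What comes back at present: an XML list of matching
            // people, 100 records per page, no details yet.
            string xml = client.DownloadString(url);
            Console.WriteLine(xml);
        }
    }
}
```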

There are bugs, no doubt, and it will be of more use when I can send you the person's details, as well as a list of countries and occupations to build queries from. I realise I also need to write some proper documentation. Still, it's a start. Please tell me if it might be useful to you, what you'd like to see it do, and what you'd do with it.

Guardian bigs up the 24 Hour Museum

Jack Schofield today writes up 24HM here. It's funny timing - as he himself points out, it's looking at a relaunch pretty soon - but perhaps there are clues in that as to the timing of this review. It's a pretty positive piece on the whole but suggests that "Culture 24 could be so much better" (after its relaunch). I'm as intrigued as JS to know what's planned - and whether, for example, it will be able to consume/aggregate RSS feeds as well as publish them!

Friday, June 06, 2008

Small API update

A couple of small advances on the API front (again, see here).


  • fixed a bug on the geo thing. For some reason an imbecilic code error wasn't breaking the script on my machine, but did on the web server. Now fixed.

  • a CDWALite-lite output for individual object records (example). There's more to add, glitches to fix, and ideally a better solution to the URL, but it's a start. The next thing is a search interface, but that depends upon agreement within the Museum. A good solution may be to combine CDWALite and OpenSearch-style RSS, with the records enabling users to find the data end-point, as well as the HTML rendering. In due course I'll probably add tags to HTML record pages to point at data like this (see the sketch after this list), or I may do it with some POSH.

  • the photoLondon website data now has a basic API, which I'll put on the live site next week. It returns basic person details and search parameters include: surname, forename, keyword, birth year, death year, "alive in" year, gender, country of origin, photographic occupation and non-photographic occupation. I'll work on the search result format soon, as well as the person details.
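As for the record-page tags promised in the first item, what I have in mind is plain old rel="alternate" autodiscovery, the same trick browsers use to find RSS feeds. An ASP.NET sketch, with made-up page and URL names:

```csharp
using System;
using System.Web.UI;
using System.Web.UI.HtmlControls;

// Code-behind for a (hypothetical) HTML object record page, adding a
// <link> to the <head> that points at the machine-readable CDWALite
// rendering of the same record.
public partial class ObjectRecordPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        string objectId = Request.QueryString["id"];

        HtmlLink dataLink = new HtmlLink();
        dataLink.Href = "object-data.aspx?id=" + objectId + "&mode=cdwalite";
        dataLink.Attributes["rel"] = "alternate";
        dataLink.Attributes["type"] = "text/xml";
        dataLink.Attributes["title"] = "CDWALite record for this object";

        // Needs a <head runat="server"> in the page markup.
        Page.Header.Controls.Add(dataLink);
    }
}
```

Anything that already understands autodiscovery then gets the data end-point for free from the human-friendly page.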

Wednesday, June 04, 2008

MoL APIs live (but very alpha)

Well, there's more to do, but an alpha version of three services is now there if you want to play. All of them currently put out only XML, no JSON or raw text, GEDCOM or whatever else, but this may change and new services may be added. Have a read here. There is an events database, publications from our Archaeology Service, and a sort of geocoding tool. I've rejigged the code so that adding a different (XML) output format to these involves just writing XSLT and putting in a new value for the "mode" parameter, rather than fiddling around with C#, recompiling and all that. All feedback most welcome. But don't bug me about a collections API!
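For the curious, the rejigged plumbing amounts to something like the following. This is a sketch rather than the actual code: the file names and the database step are invented, and in real life you'd want to sanitise the "mode" value before letting it anywhere near the file system:

```csharp
using System;
using System.Web;
using System.Xml;
using System.Xml.Xsl;

// One handler per service: fetch the raw XML, then either return it
// as-is or transform it with whichever stylesheet "mode" names.
public class ApiHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        string mode = context.Request.QueryString["mode"] ?? "raw";

        XmlDocument raw = GetRawXml(context); // straight-from-the-database XML

        context.Response.ContentType = "text/xml";
        if (mode == "raw")
        {
            raw.Save(context.Response.Output);
        }
        else
        {
            // Adding a new output format means dropping rss2.xslt,
            // xcal.xslt or whatever into this folder: no recompiling.
            XslCompiledTransform xslt = new XslCompiledTransform();
            xslt.Load(context.Server.MapPath("~/xslt/" + mode + ".xslt"));
            xslt.Transform(raw, null, context.Response.Output);
        }
    }

    public bool IsReusable { get { return true; } }

    private XmlDocument GetRawXml(HttpContext context)
    {
        // Placeholder: the real thing builds this from the database query.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<results/>");
        return doc;
    }
}
```

The point is that a new output format is now just a stylesheet in a folder rather than a recompile.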

To start you off, here are links to one request from each service.
  • events API. This won't work forever as the events will expire. Uses the format that Upcoming outputs with xCal, DC and geo extensions
  • publications API
  • the geothingy converter whatsit

Tuesday, June 03, 2008

The browser on the server

I've been wondering about the possibility of running JavaScript outside the browser, in fact as part of the web server. Of course, old-style ASP could be written with that language but that's not really the same - what I want is the ability to take and manipulate objects on the server that you'd otherwise be expected to work with in a browser context. I was thinking initially about JSON - I know that it's perfectly possible to use JSON from other languages (libraries exist for PHP, Python, .Net etc) but wouldn't it be better just to do it with a JavaScript engine? And then I thought, well, Mozilla's engine would be the obvious thing, I presume they've built it in such a way it can run in other contexts. So I started googling and obviously this is a very well-trodden trail that I'm just late in exploring: there's loads of stuff out there. I've not checked out any of it properly but thought I'd put some links here in case anyone else is toying with the same idea, or is prompted to now.

Incidentally, one of the things that's nice about JavaScript is that it's all focussed on the DOM, unlike many other languages, and I'd think it's pretty forgiving too (depending on the engine). All of which makes it a good candidate for screen scraping. And one thing I've not yet found is a way to tie this into .Net nicely.
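For JSON, at least, the nearest thing I've turned up is hardly "nice": the framework ships with the old JScript compiler, and Microsoft.JScript's Eval class can be pressed into service to evaluate a JSON string as a JavaScript object literal. A rough, untested sketch (you need a reference to Microsoft.JScript.dll, and bear in mind that eval'ing untrusted JSON runs it as code):

```csharp
using System;
using Microsoft.JScript;
using Microsoft.JScript.Vsa;

class JsonEvalSketch
{
    static void Main()
    {
        // Some JSON such as an API might return (hypothetical data).
        string json = "{ \"title\": \"Docklands walk\", \"places\": 12 }";

        // Spin up the JScript engine that ships with the framework.
        VsaEngine engine = VsaEngine.CreateEngine();

        // Wrapping the JSON in parentheses makes it an expression rather
        // than a block; picking off a property in the same expression
        // avoids fiddling with JSObject reflection afterwards.
        object title = Eval.JScriptEvaluate("(" + json + ").title", engine);
        object places = Eval.JScriptEvaluate("(" + json + ").places", engine);

        Console.WriteLine("{0} ({1} places)", title, places);
    }
}
```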

Monday, June 02, 2008

Made an API

No, not a collections database interface, Mia's onto that one. But I'm scheduled to work on the events database about now, and with the mashup day (now full up, I think) coming, it seemed a good idea to grab a day and finish off this job. For ages we've had an RSS feed which gets a lot of use, including consumption by sites like dockland.co.uk, and of course search functionality onto the events web pages, but the feed is (a) a bit basic and (b) just shows everything for the next 14 days.

I wanted to turn the static feed into a properly searchable REST interface, as well as load it with more good data. So I plugged a hole whereby only a start date could be specified, put in filters for audience, event type, and keyword (like the web pages), and neatened up the separation into XML. Now it can be churned out in raw straight-from-the-database XML or in RSS 2.0 with extra hCalendar. I've got a couple of tweaks to make (adding in addresses and geo data) and then I'll invite the world. The plan is that you can set the format in the query string, so RSS2 will be one option but so will xCal (as used in Upcoming's XML) and whatever else - it'll just pick up a stylesheet and apply it on the fly. There are more filters I could apply, and documentation to do. Then I'm thinking about pagination with OpenSearch extensions... and then who knows. Most of it will wait, so I'll post soon with the details on the where and how.