About Me
- Jeremy
- Web person at the Imperial War Museum, just completed PhD about digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, and usually museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger
Thursday, April 30, 2009
NMM, YQL, COBOAT, CODS
As with similar tools/services (Pipes, Dapper, DBpedia, and various things nearer the surface like GMaps), YQL is clearly a blessing from both ends of the telescope: we get to use it for its intended purpose - to be "select * from Internet" is the grandiose ambition - knitting together data sources from Yahoo! and beyond; and we also get to offer our data in a developer-friendly way to encourage its reuse by creating Open Tables [note that these are purely a machine-friendly description of how to access data: no data is handed over as such]. Jim has already been busy creating Open Tables and experimenting with YQL.
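To make the "select * from Internet" idea a bit more concrete, here is a minimal sketch of how a YQL query might be wrapped up as a simple GET request. Treat it with caution: the endpoint URL is my assumption about Yahoo!'s public YQL service, the table name nmm_objects and the example.org table definition are entirely hypothetical, and build_yql_url is just a helper name I made up. It only builds the URL; it doesn't call anything.

```python
from urllib.parse import urlencode

# Assumption: the public YQL endpoint. Check Yahoo!'s documentation before relying on it.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(table, where=None, table_definition_url=None, fmt="json"):
    """Build a GET URL for a simple 'select *' YQL statement (hypothetical helper)."""
    statement = f"select * from {table}"
    if where:
        statement += f" where {where}"
    if table_definition_url:
        # An Open Table is only a description of how to reach the data,
        # so the query first points YQL at that description with 'use'.
        statement = f'use "{table_definition_url}" as {table}; {statement}'
    # urlencode percent-escapes the statement so it is safe in a query string.
    return YQL_ENDPOINT + "?" + urlencode({"q": statement, "format": fmt})


if __name__ == "__main__":
    # Hypothetical table and definition URL, purely for illustration.
    print(build_yql_url(
        "nmm_objects",
        where="title like '%sextant%'",
        table_definition_url="http://example.org/nmm_objects.xml",
    ))
```

The point of the sketch is the division of labour: the museum publishes a small description of its data source, and the developer writes something SQL-ish against it without caring how the data is actually fetched.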
Following the talk we headed for a pint (and one of the most jaw-dropping jokes I've heard, from Chris), and it was good to talk to Tristan from Cogapp. When I stopped raving incoherently about the marvel that is Solr (yes, still in love even as I gradually find out more about it), Tristan cleared up some questions for me about Cogapp's COBOAT app. They recently open-sourced this (as far as possible), in the context of the Museum Data Exchange project with OCLC (see Gunter Waibel's recent post), where it plays the role of connecting various collections management systems to an OAI Gateway-in-a-box, OAICatMuseum (well, it seems like it's only used with TMS in the project, but the point of COBOAT is that it just makes life easier for mapping one data structure to another, and another CollMS would slot in just fine).
For me, both COBOAT and OAICatMuseum are of interest for the role they could play in the revamped Collections Online Delivery System* we'll build this year, resources allowing (in other words, don't hold your breath. Mission critical, yeah, but worth paying for? I await the answer with interest). Integrating and re-mapping data sources, an OAI gateway, and sophisticated and fast search are key requirements, as is a good clean API, and taking these two applications along with Solr I feel like I may have identified candidates for achieving all of these aims. We're a long way from a decision, of course, at least on the architecture as a whole, but I have some tasty stuff to investigate, and I'm already well down the track in my tests of Solr.
Thanks again to Jim for arranging the talk. He's got another great guest coming up; hopefully I can make it to that one too.
*I'm resigned to this thing being called CODS but still hoping for something less, well, shit
Tuesday, December 09, 2008
Zemanta: another channel for Europeana content?
So what is Zemanta? Well, TechCrunch just wrote about the launch of its public API, and from what they say Zemanta looks to be amongst a burgeoning sector of semantic enhancement tools - another with an API announcement this week was uClassify, and you can also look to OpenCalais, Hakia, AdaptiveBlue's BlueOrganizer and others including Yahoo!. These are tools that take in (text) content, analyse it, and identify entities within or characteristics of that text. These might be embedded into the text, or returned as recommendations, classifications, or links to related material. Sometimes we're talking about a machine-facing service, sometimes an end-user one e.g. the BlueOrganizer plugin. With Hakia and Yahoo!, these are services built on the power of their search engines. Zemanta sounds like it's squarely in this area, digesting content and returning links, images, keywords etc. from a database including (of course) Wikipedia, Amazon and Flickr. Looks like it's a plugin too.
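For a feel of the basic shape of these services (text in, entities and suggested links out), here is a toy sketch. It is emphatically not how Zemanta or any of the others actually work: it is plain string matching against a tiny hand-made lookup list, where the real tools do proper language analysis and disambiguation. The GAZETTEER entries, the example.org links and the suggest_links name are all made up for illustration.

```python
import re

# Hypothetical lookup list: entity name -> link to related material.
GAZETTEER = {
    "Moby Dick": "http://example.org/works/moby-dick",
    "Herman Melville": "http://example.org/people/herman-melville",
    "Nantucket": "http://example.org/places/nantucket",
}


def suggest_links(text, gazetteer=GAZETTEER):
    """Return (entity, link) pairs for names found in text, in order of first appearance."""
    found = []
    for name, link in gazetteer.items():
        # Whole-word, case-insensitive match; only the first occurrence is recorded.
        match = re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), name, link))
    # Sort by position in the text so suggestions follow the reading order.
    return [(name, link) for _, name, link in sorted(found)]


if __name__ == "__main__":
    for entity, link in suggest_links("Herman Melville set sail from Nantucket."):
        print(entity, "->", link)
```

Swap the hand-made list for a large, authoritative data set and you can see why the quality of the underlying database matters so much to what comes back.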
uClassify is a little different - it learns to classify your text as you train it. I'm characterising it as a semantic enhancement technology but that may not be right in a strict sense. In any case, it will "enrich" the content you submit by putting it into categories you've assigned. That said, when I used oFaust, one of the apps built on top of its API, it took my snippet of Moby Dick and told me it was like Edgar Allan Poe, but needed work! Hmm. Whether that was down to the classifier or the training, though, I don't know.
So to go back to how Zemanta might fit in with Europeana, it's basically that we could work with them to digest our content and create relevant links to Europeana's vast (hopefully) and authoritative collection of cultural heritage content: artefacts, media, documents, people, events, and places. This is where I expect it helps to be big and standardised, as it should be easier for companies like Zemanta to work with one provider of cultural heritage content than with thousands of museums, libraries and archives.
To read more about Europeana (formerly EDL) check out my earlier posts: Europeana and EDL
Wednesday, June 25, 2008
Conference ketchup
In the end, nervous anticipation gave way to the onrush of time and once I was up there in front of faces familiar and not I felt more at ease than I would have expected. Having listened to the recordings, well, there were a lot more "ums" and "errs" than ideal, but hey, I didn't forget too many things and I kept pretty close to time, which is a big improvement on my earlier debacles.
So what was I talking about? In Leicester, I talked about Europeana. It was not meant to be an overview as such (that's not really my role), but an account of my involvement and interest, focussing on my hopes for the project and, of course, the role that APIs play in that. During Q&As and coffee breaks I had a lot of really useful feedback to my question: what is stopping many more UK museums from getting involved in the project? On the whole these revolved around the burden and mechanics of providing data, which was pretty much as I suspected. It's made me more determined to do what I can to simplify these processes, but also to ensure that the pay-off to partners is as high as it can be and as well understood as possible. Perhaps we have the furthest to go to achieve the latter.
At the Koninklijke Bibliotheek in the Hague I had an even shorter slot, which was fine by me, as part of a panel whose other members were intimidatingly illustrious. The subject of the conference was "Users expect the interoperable", and this particular session had two panels discussing interoperability in relation to archives and museums, respectively. I took part in the latter panel. I still don't know if I actually said anything, really, because I had little in the way of conclusions to offer: I just teased out some ways in which I thought "interoperability" questions pertained to APIs in a museum context. I also looked at a few examples from the world of semantic enrichment - a strange choice, perhaps, but made because there are really no proper museum APIs to compare to, and in order to show that a lack of standardisation in that area is no barrier to those APIs (Calais, Hakia, and Yahoo! Term Extractor) being useful. Simplicity gets you a long way, as does the use of existing data formats (e.g. DC or microformats). These also fit well with the other drum I was banging, the services that EDL could offer to contributors and third parties for enriching content. So, a kind of bitty talk but at least it was brief!
On Tuesday the conference wrapped up (and I do want to talk a lot more about it ASAP, because apart from anything else the first prototype was shown off and it's COOL!). I attended a hurried meeting of WP1 and Harry Verwayen presented his paper on the business model. I think he's done a great job, although this is so far outside my area of competence I scarcely dare comment. He'd also done a lot of work integrating some of my suggestions into the plan, and it became still clearer to me how much of this hangs off the success of the semantic web tech part of the project.
Both conferences were really rewarding in their own ways and I'll try to offer some proper notes from them as soon as I find my feet again.
Friday, June 20, 2008
Hakia, semantic enrichment, and EDL
Semantic enhancement (as well as data validation) is a service that I've mooted as a possibility for Europeana. With a specialist and very authoritative data set, it could appeal to those needing to enrich cultural heritage content. There may not be a lot of money in it, but as Harry Verwayen pointed out to me, that's not necessarily the only benefit to the service provider (or I doubt Reuters would be in this game). Building traffic around the site and strengthening the brand is a benefit. Similarly, increasing the use of the ontology/thesauri used by EDL increases its influence.
Harry is putting together a presentation for next week's WG1 meeting after the EDL plenary, and he's generously put my name on the front too although I have little to contribute beyond these slightly flaky suggestions. We'll be throwing these ideas into the mix in a discussion of business models for EDL when it goes live.
Tuesday, May 27, 2008
Slicing the market mini-, sorry Micro-update: TO'R on MicroHoo
Apple's apparent success with an "own the stack, from the device to cloud" strategy is misleading. With both the iPod and the iPhone, a key element of success is precisely the device's openness to what Apple does not own. Imagine an iPod where you could only buy music from the Apple music store instead of ripping your own CDs (this is Amazon's mistake with the Kindle). Imagine an iPhone without the Safari browser (opening a world of web apps to the phone) or the Google Maps application. Apple owns key elements of the stack, but it's a permeable stack, and getting more so.
This is handy material for making the argument that museums shouldn't try to (or be required to) "own the stack". Far better to focus on a layer in the stack and make it permeable. iTunes (the music store) is clearly an example of trying to corner a part of the market outside Apple's home turf, but (a) they're big and ballsy enough to try it (and who else was doing so effectively at that time? Aside from Napster...) and (b) the core offering is still actually the hardware, and it allows you to acquire music by other means. Amazon and Kindle is interesting too. Perhaps it's too early to say they've made a mistake, though they probably have. I doubt it will stay closed and succeed. Nevertheless, as I acknowledged before, they are making a grab for more of the stack. But let's not forget, they're HUGE!
What can we learn? Well there's that point about openness/permeability. If we must insist on claiming the vertical market from top to toe, from collections management system to the end user's screen, then at least make it permeable and open. Otherwise your carefully grown fruit will wither on the vine, forgotten and increasingly past its sell-by date.
Tuesday, May 20, 2008
To Leuven (expenses paid), but who will pay for EDL?
A significant factor in the search for revenue-raising avenues is the fact that Europeana is not going to be a content owner in any significant way, but rather a broker/facilitator for accessing content owned by others. One possibility that I believe could be worth exploring for two reasons is some form of partnership with a search provider. Yahoo! may be a bit too distracted to talk at the moment, but they, along with Google, could be productive partners. Both sides could benefit by working on an interface and aligning their data structures, and EDL could perhaps offer quite a bit to such a partner in terms of preferential access to the semantically enriched data it will hold. This might be directly to do with searching the resources in EDL, or it might be, say, helping to clean up datasets of people and places. In exchange, maybe either some cash or technological assistance? Perhaps some of the semantic-y startups currently taking wing could also be interesting to work with, but they won't be as well resourced. Cultural heritage organisations have a lot of knowledge and context to offer here so maybe there's a business model to be had.
Friday, May 16, 2008
Yay! Follow the Search Monkey!
So if it's what I hope it is, we can co-ordinate with others in the sector on some standard fields (and keep it simple initially), push our content into Yahoo! and build apps on top of their search engine. My reservation would be that at the moment it seems to be about building either "Infobars" or "Enhanced Results", but perhaps there's something more API-like and programmable there, or on the way.
* is this the right way to punctuate the possessive of that annoyingly-punctuated name? Answers on a postcard (to Jerry Y!ang).
Tuesday, May 13, 2008
The Yahoo! Internet Location Platform
He also points to this for how it all ties to Flickr.
Thursday, March 13, 2008
Yahoo semanticises(?) business
Hmm, exciting!
[edit] see this too
Monday, January 21, 2008
...and Yahoo! using user-created tags
Thursday, January 17, 2008
Yahoo! and OpenID
OK, not much to add to this: another one on board, and a very big one at that. According to RWW, this triples the number of people with an OpenID (or access to one), or will when it goes live at the end of the month.