The Doofer Call: yahoo


Thursday, April 30, 2009

NMM, YQL, COBOAT, CODS

Jim O'Donnell organised a talk on Tuesday at the National Maritime Museum from Christian Heilmann of Yahoo! Mia wrote up her notes already and I've not got much to add, but it was a very enjoyable presentation, and when he reached the juicy bit about YQL and BOSS, both of which I'd left for another day's exploration, I learned a lot. Clearly there's a lot of potential there (especially now it's augmented by YQL Execute, announced yesterday), and it looks like it will let you do a bunch of things that Pipes can't do, or that are a pain to do (the Pipes GUI is great and yet infuriating). YQL gives a common API meta-interface (I guess that's the word) for loads of other APIs and for things with no API; it also handles all the crap with authentication, tokens etc; and it will act as the gatekeeper for your API so you don't get hammered by unreasonable numbers of requests.

As with similar tools/services (Pipes, Dapper, dbpedia, and various things nearer the surface like GMaps), YQL is clearly a blessing from both ends of the telescope: we get to use it for its intended purpose - "select * from Internet" is the grandiose ambition - knitting together data sources from Yahoo! and beyond; and we also get to offer our data in a developer-friendly way to encourage its reuse by creating Open Tables [note that these are purely a machine-friendly description of how to access data: no data is handed over as such]. Jim has already been busy creating Open Tables and experimenting with YQL.
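
To make the "select * from Internet" idea a bit more concrete, here's a minimal sketch of how a YQL statement gets wrapped up as a plain GET request. The endpoint is my assumption of the public YQL URL, not something taken from the talk, and the table name `museum.collections` is entirely hypothetical, standing in for whatever Open Table you might define. Nothing is actually fetched; the code only builds the URL.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: the public YQL endpoint; not taken from the talk itself.
YQL_PUBLIC_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(query: str, fmt: str = "json") -> str:
    """Return a GET URL asking YQL to run `query` and reply in `fmt`."""
    return YQL_PUBLIC_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


# Hypothetical Open Table name and field, for illustration only.
statement = 'select * from museum.collections where keyword = "sextant"'
url = build_yql_url(statement)
print(url)

# Decoding the query string again gives back the original statement.
print(parse_qs(urlparse(url).query)["q"][0])
```

The point is simply that the whole "API" is one SQL-ish string in a query parameter, which is a big part of why it feels so approachable.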

Following the talk we headed for a pint (and one of the most jaw-dropping jokes I've heard, from Chris), and it was good to talk to Tristan from Cogapp. When I stopped raving incoherently about the marvel that is Solr (yes, still in love even as I gradually find out more about it), Tristan cleared up some questions for me about Cogapp's COBOAT app. They recently open-sourced this (as far as possible), in the context of the Museum Data Exchange project with OCLC (see Gunter Waibel's recent post), where it plays the role of connecting various collections management systems to an OAI Gateway-in-a-box, OAICatMuseum (well, it seems like it's only used with TMS in the project, but the point of COBOAT is that it just makes life easier for mapping one data structure to another, and another CollMS would slot in just fine).
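
As an illustration of what "mapping one data structure to another" boils down to, here's a minimal sketch that renames fields from a collections-management-style record to Dublin Core-style element names. This is not COBOAT's method or configuration format; the source field names, the mapping and the sample record are all hypothetical.

```python
from typing import Dict

# Hypothetical source-field -> Dublin Core element mapping. Real field names
# depend on the collections management system; none of this comes from COBOAT.
FIELD_MAP: Dict[str, str] = {
    "ObjectTitle": "title",
    "MakerName": "creator",
    "DateMade": "date",
    "ObjectNumber": "identifier",
}


def remap(record: Dict[str, str], field_map: Dict[str, str] = FIELD_MAP) -> Dict[str, str]:
    """Rename mapped fields, trimming whitespace and dropping empty or unmapped ones."""
    return {
        target: record[source].strip()
        for source, target in field_map.items()
        if record.get(source, "").strip()
    }


# Made-up record: one empty field, one field with no mapping.
source_record = {
    "ObjectTitle": "Marine chronometer ",
    "MakerName": "",
    "ObjectNumber": "EX.1",
    "StoreLocation": "Bay 12",
}
print(remap(source_record))
# prints: {'title': 'Marine chronometer', 'identifier': 'EX.1'}
```

Swapping in another CollMS would then mean swapping the mapping table, not the code, which I take to be the appeal.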

For me, both COBOAT and OAICatMuseum are of interest for the role they could play in the revamped Collections Online Delivery System* we'll build this year, resources allowing (in other words, don't hold your breath. Mission critical, yeah, but worth paying for? I await the answer with interest). Integrating and re-mapping data sources, an OAI gateway, and sophisticated and fast search are key requirements, as is a good clean API, and taking these two applications along with Solr I feel like I may have identified candidates for achieving all of these aims. We're a long way from a decision, of course, at least on the architecture as a whole, but I have some tasty stuff to investigate, and I'm already well down the track in my tests of Solr.

Thanks again to Jim for arranging the talk. He's got another great guest coming up; hopefully I can make it to that one too.

*I'm resigned to this thing being called CODS but still hoping for something less, well, shit

Tuesday, December 09, 2008

Zemanta: another channel for Europeana content?

OK there are several ways I could frame this post, but obviously one is that here is another opportunity for Europeana to channel its content.

So what is Zemanta? Well, TechCrunch just wrote about the launch of its public API, and from what they say Zemanta looks to be amongst a burgeoning sector of semantic enhancement tools - another with an API announcement this week was uClassify, and you can also look to OpenCalais, Hakia, AdaptiveBlue's BlueOrganizer and others including Yahoo!. These are tools that take in (text) content, analyse it, and identify entities within or characteristics of that text. These might be embedded into the text, or returned as recommendations, classifications, or links to related material. Sometimes we're talking about a machine-facing service, sometimes an end-user one e.g. the BlueOrganizer plugin. With Hakia and Yahoo!, these are services built on the power of their search engines. Zemanta sounds like it's squarely in this area, digesting content and returning links, images, keywords etc. from a database including (of course) Wikipedia, Amazon and Flickr. Looks like it's a plugin too.
uClassify is a little different - it learns to classify your text as you train it. I'm characterising it as a semantic enhancement technology but that may not be right in a strict sense. In any case, it will "enrich" the content you submit by putting it into categories you've assigned. That said, when I used oFaust, one of the apps built on top of its API, it took my snippet of Moby Dick and told me it was like Edgar Allan Poe, but needed work! Hmm. Whether that was down to the classifier or the training, though, I don't know.
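
To give a feel for what "learns to classify your text as you train it" can mean, here's a minimal sketch of a trainable text classifier using multinomial naive Bayes with add-one smoothing. That technique is swapped in purely for illustration; I'm not claiming it is how uClassify works, and the class name, labels and training snippets are all made up.

```python
import math
from collections import Counter, defaultdict


class TinyTextClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, label):
        """Add one labelled text to the model."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable label, or None if nothing has been trained."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word] + 1
                score += math.log(count / (label_total + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


classifier = TinyTextClassifier()
classifier.train("the whale and the harpoon and the sea", "maritime")
classifier.train("ship sails across the sea", "maritime")
classifier.train("oil painting of a landscape on canvas", "art")
classifier.train("portrait painting in a gilt frame", "art")
print(classifier.classify("a painting on canvas"))  # prints: art
```

With so little training text the answers are only as good as the examples fed in, which is rather the point made above about the classifier versus the training.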
So to go back to how Zemanta might fit in with Europeana, it's basically that we could work with them to digest our content and create relevant links to Europeana's vast (hopefully) and authoritative collection of cultural heritage content: artefacts, media, documents, people, events, and places. This is where I expect it helps to be big and standardised, as it should be easier for companies like Zemanta to work with one provider of cultural heritage content than with thousands of museums, libraries and archives.
To read more about Europeana (formerly EDL) check out my earlier posts: Europeana and EDL

Wednesday, June 25, 2008

Conference ketchup

Well it's been a pretty busy time. After many years of avoiding presenting at conferences, following a number of crappy performances in '99, I bit the bullets kindly shot at me by Ross and Jill and opened my cakehole to several hundred unfortunate captives, first at the UK Museums on the Web conference in Leicester, and then at the EDL plenary conference in the Hague. And I'm truly grateful to both Ross and Jill for the opportunity to do this: it's very flattering, humbling, really, that they felt I'd have something worth saying to such informed and inquisitive audiences.

In the end, nervous anticipation gave way to the onrush of time and once I was up there in front of faces familiar and not I felt more at ease than I would have expected. Having listened to the recordings, well, there were a lot more "ums" and "errs" than ideal, but hey, I didn't forget too many things and I kept pretty close to time, which is a big improvement on my earlier debacles.

So what was I talking about? In Leicester, I talked about Europeana. It was not meant to be an overview as such (that's not really my role), but an account of my involvement and interest, focussing on my hopes for the project and, of course, the role that APIs play in that. During Q&As and coffee breaks I had a lot of really useful feedback to my question: what is stopping many more UK museums from getting involved in the project? On the whole these revolved around the burden and mechanics of providing data, which was pretty much as I suspected. It's made me more determined to do what I can to simplify these processes, but also to ensure that the pay-off to partners is as high as it can be and as well understood as possible. Perhaps we have the furthest to go to achieve the latter.

At the Koninklijke Bibliotheek in the Hague I had an even shorter slot, which was fine by me, as part of a panel whose other members were intimidatingly illustrious. The subject of the conference was "Users expect the interoperable", and this particular session had two panels discussing interoperability in relation to archives and museums, respectively. I took part in the latter panel. I still don't know if I actually said anything, really, because I had little in the way of conclusions to offer: I just teased out some ways in which I thought "interoperability" questions pertained to APIs in a museum context. I also looked at a few examples from the world of semantic enrichment - a strange choice, perhaps, but made because there are really no proper museum APIs to compare to, and in order to show that a lack of standardisation in that area is no barrier to those APIs (Calais, Hakia, and Yahoo! Term Extractor) being useful. Simplicity gets you a long way, as does the use of existing data formats (e.g. DC or microformats). These also fit well with the other drum I was banging, the services that EDL could offer to contributors and third parties for enriching content. So, a kind of bitty talk but at least it was brief!

On Tuesday the conference wrapped up (and I do want to talk a lot more about it ASAP, because apart from anything else the first prototype was shown off and it's COOL!). I attended a hurried meeting of WP1 and Harry Verwayen presented his paper on the business model. I think he's done a great job, although this is so far outside my area of competence I scarcely dare comment. He'd also done a lot of work integrating some of my suggestions into the plan, and it became still clearer to me how much of this hangs off the success of the semantic web tech part of the project.

Both conferences were really rewarding in their own ways and I'll try to offer some proper notes from them as soon as I find my feet again.

Friday, June 20, 2008

Hakia, semantic enrichment, and EDL

Moving in the same direction as Reuters, with its OpenCalais service, Hakia has started offering two new APIs, one related to search and the other to content summarising and enhancement (see RWW's story). Perhaps it has some way to go before this is a really useful service in terms of the quality of its output (if RWW's experience is any guide) but it's early days. In any case, putting this alongside Calais and Yahoo!'s Term Extractor (not to mention other semantic enhancement services extracting, for example, location data), this shows at least that there are quite a few people out there that think there's a market for this sort of service.

Semantic enhancement (as well as data validation) is a service that I've mooted as a possibility for Europeana. With a specialist and very authoritative data set, it could appeal to those needing to enrich cultural heritage content. There may not be a lot of money in it, but as Harry Verwayen pointed out to me, that's not necessarily the only benefit to the service provider (or I doubt Reuters would be in this game). Building traffic around the site and strengthening the brand is a benefit. Similarly, increasing the use of the ontology/thesauri used by EDL increases its influence.

Harry is putting together a presentation for next week's WG1 meeting after the EDL plenary, and he's generously put my name on the front too although I have little to contribute beyond these slightly flaky suggestions. We'll be throwing these ideas into the mix in a discussion of business models for EDL when it goes live.

Tuesday, May 27, 2008

Slicing the market mini-, sorry Micro-update: TO'R on MicroHoo

In his piece MicroHoo: corporate penis envy? Tim O'Reilly makes lots of interesting arguments concerning where Microsoft should be putting its energy - as far as he's concerned, it should forget about trying to grab a slice of the search market and focus instead on building the internet operating system. For Yahoo!'s part, it should realise it's the number one internet media company and make the most of it. There's lots here to chew on (and object to, as Michael Arrington does here, and I happen to agree with him and Jakob Nielsen that search, especially semantically intelligent search, is far from ticked off), but the paragraph that grabbed my attention concerned a couple of other players:

Apple's apparent success with an "own the stack, from the device to cloud" strategy is misleading. With both the iPod and the iPhone, a key element of success is precisely the device's openness to what Apple does not own. Imagine an iPod where you could only buy music from the Apple music store instead of ripping your own CDs (this is Amazon's mistake with the Kindle). Imagine an iphone without the Safari browser (opening a world of web apps to the phone) or the Google Maps application. Apple owns key elements of the stack, but it's a permeable stack, and getting more so.)

This is handy material for making the argument that museums shouldn't try to (or be required to) "own the stack". Far better to focus on a layer in the stack and make it permeable. iTunes (the music store) is clearly an example of trying to corner a part of the market outside Apple's home turf, but (a) they're big and ballsy enough to try it (and who else was doing so effectively at that time? Aside from Napster...) and (b) the core offering is still actually the hardware, and it allows you to acquire music by other means. Amazon and Kindle is interesting too. Perhaps it's too early to say they've made a mistake, though they probably have. I doubt it will stay closed and succeed. Nevertheless, as I acknowledged before, they are making a grab for more of the stack. But let's not forget, they're HUGE!

What can we learn? Well there's that point about openness/permeability. If we must insist on claiming the vertical market from top to toe, from collections management system to the end user's screen, then at least make it permeable and open. Otherwise your carefully grown fruit will wither on the vine, forgotten and increasingly past its sell-by date.

Tuesday, May 20, 2008

To Leuven (expenses paid), but who will pay for EDL?

Tomorrow sees EDL's working group 1 meeting at the Katholieke Universiteit in Leuven, not far from Brussels. Quite exciting to be visiting, albeit fleetingly, a place that played host to Matsys, Bouts, Erasmus and Vesalius, amongst others (not to mention, apparently, the infamous AQ Khan). I'm looking forward to attending, though with no expectation of being able to contribute a lot since this group covers different ground to the one I've worked with up till now. I'm not even sure if I'm part of the group or simply in attendance. Anyway, the meeting will look at progress with Europeana so far, and consider the business issues facing it, particularly how to move to phase 2 (i.e. following the prototype, to be launched in November) and how to build a sustainable future. Working my way through 100-odd pages of reading matter in preparation for the meeting, I'm struck by how big a challenge it will be to find the necessary ongoing resources, but also the fact that they are tackling the problem head-on and examining a wide variety of options, from direct subsidy, through subscription by contributors or users, to corporate partnership or sponsorship.

A significant factor in the search for revenue-raising avenues is the fact that Europeana is not going to be a content owner in any significant way, but rather a broker/facilitator for accessing content owned by others. One possibility that I believe it could be worth exploring for two reasons is some form of partnership with a search provider. Yahoo! may be a bit too distracted to talk at the moment, but along with Google could be productive partners. Both sides could benefit by working on an interface and aligning their data structures, and EDL could perhaps offer quite a bit to such a partner in terms of preferential access to the semantically enriched data it will hold. This might be directly to do with searching the resources in EDL, or it might be, say, helping to clean up datasets of people and places. In exchange, maybe either some cash or technological assistance? Perhaps some of the semantic-y startups currently taking wing could also be interesting to work with, but they won't be as well resourced. Cultural heritage organisations have a lot of knowledge and context to offer here so maybe there's a business model to be had.

Friday, May 16, 2008

Yay! Follow the Search Monkey!

Well, rock'n'roll, looks like Yahoo!'s* Search Monkey is going live today. Apparently this will allow us as site owners to (cribbing from RWW's report) "share structured data with Yahoo!, using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction." On the basis of that data, other developers will build apps and users will enhance their search. This seems to be precisely the sort of thing we wished for in the SWTT (in fact legendary Mike Lowndes pointed to earlier signs of this move last month). It's also what I had my doh! moment about last week.

So if it's what I hope it is, we can co-ordinate with others in the sector on some standard fields (and keep it simple initially), push our content into Yahoo! and build apps on top of their search engine. My reservation would be that at the moment it seems to be about building either "Infobars" or "Enhanced Results", but perhaps there's something more API-like and programmable there, or on the way.

* is this the right way to punctuate the possessive of that annoyingly-punctuated name? Answers on a postcard (to Jerry Y!ang).

Tuesday, May 13, 2008

The Yahoo! Internet Location Platform

Brady Forrest writes about the Yahoo! Internet Location Platform, which sounds very cool. Couldn't be simpler, really. I'll have to have a look at just what sort of entities that relate to MoL have a WOEID (Where On Earth ID), but doubtless there's lots we can do with this. If we can find the time...
He also points to this for how it all ties to Flickr.

Thursday, March 13, 2008

Yahoo semanticises(?) business

Sorry, that's a lame title. "Means business", I mean (sic). Anyway, exciting reports of their plans for using various forms of structured content that are out there, including key microformats, eRDF and RDFa, Dublin Core(!). There'll be a developer platform, too, and the ability to "create mods for Yahoo search that leverage their semantic data". This sounds like more than Google Base or something; it's very cool, and I wonder if it might mean that using custom POSH will also work.
Hmm, exciting!
[edit] see this too

Monday, January 21, 2008

...and Yahoo! using user-created tags

TechCrunch reports that Yahoo! is starting to put delicious rankings with its search results. It's unclear if delicious tags are actually used to calculate rankings, which would be a great test of the power of the lower-case semantic web, based on UGC/social tagging. Fingers crossed. Next we'll have to push them to look at other sources of socially tagged data. Come on out, Steve

Thursday, January 17, 2008

Yahoo! and OpenID

Take back your digital ID

OK, not much to add to this: another one on board, and a very big one at that. According to RWW, this triples the number of people with an OpenID (or access to one), or will when it goes live at the end of the month.
