Thursday, March 27, 2008

Chris Rusbridge on significant properties

I've been a fan of Chris Rusbridge for a while now, increasingly so as I delved into the work he's been involved in previously, not least (as he mentions in his latest post) CEDARS. I was thinking along the same lines in terms of the need to identify what's important about a digital resource before you settle on a strategy for "sustaining" it (ideally before you build it, actually), and then I came across the work they did on "significant properties". Clearly the properties for the sorts of data and file-based assets that digital curators typically deal with (not to mention the context and purpose of their work) will often be different from the properties that a museum cherishes in its digital investments, but the framework that CEDARS developed is a great starting point for me.
CR has just blogged about this topic again, in the run-up to a JISC workshop that I'll unfortunately miss next week.

Wednesday, March 26, 2008

The Semantic Web now - Alex Iskold's latest great primer

Alex Iskold's latest guide to SW tech is great, his best yet. Really clear, with useful classifications of the kinds of technology and applications that we're starting to see. If you need a primer or an update, have a look.

Thursday, March 20, 2008

OT: awesome freestyling

Gollito and Paskowski in fullest effect. Up until recently I could imagine more ridiculous moves than were actually being pulled off by guys like this, but now, well, my imagination would be very stretched to exceed this!

Wednesday, March 19, 2008

What's new

Stuff seen:

SearchMe beta, a search engine that shows visual results (images of the web pages) categorised by topic (e.g. museum, art, shopping, fishing). Quite nice. Silverlight, I think. It's a bit SW (in its clustering of results, for example), though I don't know how it goes about this. Other "semantic" search tools have shown up lately too: TextWise (small "sw", I guess) has just been reviewed by TechCrunch, which was doubtless part of the point of offering a $1m prize for suggesting uses for its technology. Hakia is another.

Stuff I've been doing the last week or two:

the Great Fire of London site for Key Stage 1 kids finally soft-launched.

working on templates for the Londinium site - the bulk of my time right now

preparing the digital republication of an out-of-print handbook for identifying Roman pottery fabrics. I've probably mentioned it before: it involved exporting from Quark to PDF, converting the PDF to XML, transforming that into TEI-Lite via several XSLT steps and manual clean-up, and finally modifying some XSLT to display the result as an HTML page. Most of this was done a while ago; right now I'm getting ready for the images, which will need to be embedded once all the scanned thin sections are ready.

testing out and integrating Flash interactives with our CMS. Several are pretty much ready for launch, including two from the London Sugar and Slavery exhibition, and two games.

advising as best I can on the development of the replacement map interface for the LSS gallery

fretting over the re-branding exercise the MoL group is engaged in. How much work is it worth doing right now to fix issues on the sites if we'll be overhauling the whole thing in the autumn?

testing new search engine SearchMe (see above). Didn't get good results for "roman london" yet, but it's only indexed a billion pages or so....
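For the pottery handbook republication listed above, the last stage turned TEI-Lite into HTML with XSLT. As a rough illustration of what that stage does, here is a minimal Python sketch that uses the standard library's `xml.etree.ElementTree` in place of XSLT (a swapped-in technique, not the stylesheets actually used). It assumes un-namespaced TEI-Lite, and the element mapping in `TEI_TO_HTML` is hypothetical and covers only a handful of elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few TEI-Lite elements to HTML elements.
# Anything not listed falls back to <span>.
TEI_TO_HTML = {
    "div": "section",
    "head": "h2",
    "p": "p",
    "hi": "em",
    "list": "ul",
    "item": "li",
}


def _convert(elem: ET.Element) -> ET.Element:
    """Recursively copy a TEI element into an HTML element, keeping text and tails."""
    out = ET.Element(TEI_TO_HTML.get(elem.tag, "span"))
    out.text = elem.text
    out.tail = elem.tail  # text that follows this element inside its parent
    for child in elem:
        out.append(_convert(child))
    return out


def tei_to_html(tei_xml: str) -> str:
    """Render the <body> of an un-namespaced TEI-Lite document as an HTML <body>."""
    root = ET.fromstring(tei_xml)
    body = root.find(".//body")
    if body is None:
        raise ValueError("no <body> element found")
    html_body = ET.Element("body")
    for child in body:
        html_body.append(_convert(child))
    return ET.tostring(html_body, encoding="unicode")


sample = (
    "<TEI><text><body><div><head>Fabric A</head>"
    "<p>Red <hi>sandy</hi> ware.</p></div></body></text></TEI>"
)
print(tei_to_html(sample))
# <body><section><h2>Fabric A</h2><p>Red <em>sandy</em> ware.</p></section></body>
```

XSLT remains the more natural tool for a multi-step pipeline like this one; the sketch only shows the shape of the mapping.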

One-stop shop for non-profits at Google

TechCrunch points out Google's new portal for non-profit organisations, which should at least make it simpler to sign up for the free versions of their services for museums et al. Off now to try it out...

Monday, March 17, 2008

A few more Paris notes and an update

A dull post. I listened to the recording of my talk in Paris, jotted down notes on a few things that came up in the discussion, and thought I'd get them down here. Also, I updated the list of input parameters I put up before and updated the version on Scribd.

Those extra points:

Geo search
There are geographical search (and geo plus time) projects going on in eContent Plus and IST, using co-ordinates, place names, changing boundaries etc. We would hope to incorporate these (possibly post-prototype). Everything in Europeana will be public domain (development-wise), so the software will be there for the taking (I hope I got that right!)
"Privileged" tags
We mooted the possibility of privileged tags, i.e. those produced by certain authorised users, perhaps agreed by certain groups. Tags created by these users (most likely content contributors) would be treated differently, so that we could pull out only those items with a given tag that these users had applied. But rather than giving them some specific "privileged" status, we could probably achieve the same thing just by identifying them by contributor, user group or contributor type.
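A sketch of the contributor-based alternative to "privileged" tags: rather than a special flag, each tag records who applied it and what kind of contributor they are, and queries filter on that. The `Tag` record, the `items_with_tag` function and the contributor-type values are all hypothetical, written in Python for illustration, and not part of any agreed Europeana design.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Tag:
    item_id: str
    label: str
    contributor_id: str
    contributor_type: str  # hypothetical values: "content_contributor" or "user"


def items_with_tag(
    tags: Iterable[Tag],
    label: str,
    contributor_types: Optional[Set[str]] = None,
    contributors: Optional[Set[str]] = None,
) -> List[str]:
    """Return sorted item IDs carrying `label`, optionally counting only tags
    applied by the given contributor types and/or contributor IDs."""
    found = set()
    for tag in tags:
        if tag.label != label:
            continue
        if contributor_types is not None and tag.contributor_type not in contributor_types:
            continue
        if contributors is not None and tag.contributor_id not in contributors:
            continue
        found.add(tag.item_id)
    return sorted(found)


tags = [
    Tag("obj1", "roman", "museumA", "content_contributor"),
    Tag("obj2", "roman", "alice", "user"),
    Tag("obj3", "roman", "museumB", "content_contributor"),
    Tag("obj3", "pottery", "alice", "user"),
]
print(items_with_tag(tags, "roman", contributor_types={"content_contributor"}))
# ['obj1', 'obj3']
```

The same filter covers user groups too, if group membership is resolved into a set of contributor IDs before the query.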
Stuff to clarify

Licensing data model and assumptions
Core common data
Where is the boundary between Europeana and the contributor sites? The maquette seemed to include considerable data, and to display the actual content on-site for some types of asset (e.g. images), but others might be held off-site. What are the rules?
What needs to be added to the API to work well for libraries and archives?

Friday, March 14, 2008

Shock discovery: poor communication wastes resources

Yesterday I found out about some work that's being done on our behalf by the company that advertises our jobs. At the request of our HR people, they've almost finished building some custom pages to mimic the look of our group portal pages. Trouble is, the web techs (in this case, me) were never part of the discussions and I can't help but feel that we'd probably have got things done very differently if we had been. Certainly I have some concerns about what we'll get and I would have liked to explore other options. What are my concerns? They can mostly be expressed adequately with the word "sustainability":

Content maintenance. It mimics the look of our CMS pages, but the content isn't integrated with our CMS. Changes to site structure won't be reflected in the menus, nor would updated content.
Visual maintenance. The look of this site will change (we dearly hope) and I can't change their pages
Google. I don't know how they look upon sites that look like copies of existing sites and point at their pages. I suspect it might look like spamming and I wouldn't want to be blacklisted.
Site stats. We can't (readily) integrate the job site's stats with ours (if we get them at all). Not a huge deal to me but a factor.
Cost. I don't know what this will have cost, but five minutes after getting hold of an RSS feed from their site I had integrated it into our own, replicating the most important part of what they'd done. I suspect we could have done it cheaper, in short!
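The RSS shortcut mentioned under "Cost" amounts to something like the following Python sketch. It assumes a plain RSS 2.0 feed (`rss/channel/item` elements with `title` and `link`) and uses only the standard library; `rss_to_html_list` is a hypothetical helper, not what was actually deployed, and a real page would also need to fetch and cache the feed.

```python
import html
import xml.etree.ElementTree as ET


def rss_to_html_list(rss_xml: str, limit: int = 10) -> str:
    """Turn the first `limit` items of an RSS 2.0 feed into an HTML list of links."""
    root = ET.fromstring(rss_xml)
    rows = []
    for item in root.findall("./channel/item")[:limit]:
        title = html.escape(item.findtext("title", default="(untitled)").strip())
        link = html.escape(item.findtext("link", default="#").strip(), quote=True)
        rows.append(f'  <li><a href="{link}">{title}</a></li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


feed = (
    '<rss version="2.0"><channel><title>Jobs</title>'
    "<item><title>Web Officer</title><link>http://example.org/jobs/1</link></item>"
    "<item><title>Curator &amp; Keeper</title>"
    "<link>http://example.org/jobs/2?a=1&amp;b=2</link></item>"
    "</channel></rss>"
)
print(rss_to_html_list(feed))
```

Because the feed's titles and links are re-escaped before they go into the page, stray ampersands in job titles or URLs don't break the surrounding markup.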

I don't know whether, had we talked about this properly, we would have ended up doing something different. It depends on what the important parts of the "site" are, and on what the job site can offer beyond a pretty sparse RSS feed, but I think we could have removed the need for at least some of what they did. There's meant to be a process we follow for every single new media project, so that it passes the right eyes and we can make any recommendations. This time that broke. HR did speak to people on our team but the plans didn't reach me (at least, not that I recall), and I think this has something to do with a communication failure about just what the plans involved. I may not have made clear the sort of things we are able to do in terms of integrating third-party sites (or at least what we'd be up for trying), or perhaps the scope of HR's plans wasn't really clear to start with. Or perhaps I just need to make it clear that every single piece of third-party work, no matter how small it seems, must go through me. One way or another, our communications have been lacking and it looks like we've ended up doing things the wrong way.

Thursday, March 13, 2008

Yahoo semanticises(?) business

Sorry, that's a lame title. "Means business", I mean (sic). Anyway, there are exciting reports of their plans for using various forms of structured content that are out there, including key microformats, eRDF and RDFa, and Dublin Core(!). There'll be a developer platform, too, and the ability to "create mods for Yahoo search that leverage their semantic data". This sounds like more than Google Base or the like. It's very cool, and I wonder whether it might mean that custom POSH will also work.
Hmm, exciting!
[edit] see this too

The PSP is dead in the water

The Reg reports that Ofcom's mooted Public Service Publisher idea is now dead. The Reg loathed it, but from our point of view in museums it had potential, or at least we could imagine a potential role for ourselves. There may be other ways to get to the same thing, although to be honest as a sector we're so busy trying to do the basics that I rather doubt most of us would find the time to develop any ideas to bid for PSP cash. So, goodbye to all that.

Tuesday, March 11, 2008

Here's that EDLNet presentation and notes

Yesterday I put the EDLNet Paris slideshow onto Slideshare, but since Scribd also lets me put up other stuff I'm putting that and the notes there too. If you can't see these coz I've screwed up the Scribd embed link or something, go here for the presentation and here for the notes.

EDLNet Paris presentation:

Presentation notes:

And just in case...

The previous post with options for API input parameters is also on Scribd (UPDATED 17/3/2008)

Monday, March 10, 2008

Paris presentation online

http://www.slideshare.net/guest3fb875/europeana-wp3-api-presentation-paris-432008/

New world speed-sailing record

Antoine Albeau, who has held most other windsurfing titles in his time, now has not only the windsurfing speed record but the outright record for all sail-powered watercraft (at least, the record most of us count: over a fixed 500m course). Rock on! He broke the record with a shade over 49 knots (beating Finian Maynard's previous record by 0.4 kts) on March 5th in the mistral blowing at Saintes-Maries-de-la-Mer, southern France. Video here (I'm guessing he's the first rider in the clip. Hang on for some vicious wipeouts too)
Fingers crossed they might even top that at Southend today. Go Dave White!

Saturday, March 08, 2008

Public API inputs

Public API inputs and outputs

[edited 17/3/2008]

We discussed at the Paris meeting the range of parameters that we thought an API might need to handle to perform the sort of (public-facing) tasks we envisaged. We didn't actually talk about output, except with regard to the ability to specify return fields, but I think that this is actually much the simpler part to work out. I've reworked our discussion, added a few bits of my own (including the UGC bit), and split it into sections relating to general parameters, filters for collections queries, and UGC. No doubt lots more clarification and revision are needed and I'm pretty unclear on some bits myself, but it's something!

Input parameters

The “profile” includes various elements defining the operation in terms of function, languages, values and format of returned data etc. Collections data requests will be required for some functions, and consist of various filters. The third table relates to operations on user-generated content, including adding, editing and getting (by user or group). We may decide that some operations are only open to specific users or categories of user; for example, accessing UGC of some categories might only be possible for the owner of that UGC (via their associated API key) or the owner of the collections related to that UGC. TBC!

Query profile (data access and data addition/editing functions)

*Parameter*	*Access or edit*	*Example values, notes*
Function [required]	A	search, compare, translate, add, update
Return format [required]	A, E	DC-XML, RSS, geoRSS, CDWALite, JSON, CSV. This might instead be implicit in the target URL.
Return fields	A	Array of field names, but a default set would perhaps include GUID, title, thumbnail, short description, owner, owner type, media. Might also provide shortcuts to preset field groups. Will vary according to target entities
Search data	A	Formal metadata; all data; expert and user tags; user tags only; “expert” tags only; specific user/expert/group tags.
Expanded terms	A	True, false [use/don’t use thesauri etc.]
Requesting language	A, E	EN, FR etc.
Return language	A, E	As above. If only one is present presume the same.
Key [required]	A, E	API user key
User ID [required for some operations]	A, E	For the end user. Required for accessing/modifying data attached to specific users or groups. Presumably we need to authenticate and authorise in some way, too, for some operations.
Rights/licence	A, E	Perhaps multi-value, specifying rights/licensing parameters. Likely to be more complex than one field!
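To make the query profile a little more concrete, here is a hedged Python sketch that checks the required parameters and serialises a request as a GET URL. The endpoint, the parameter names (`function`, `format`, `key`, `lang`, `return_lang`, `fields`) and the comma-separated encoding of return fields are all assumptions for illustration; as the table notes, the return format might instead be implied by the target URL.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names; nothing here is a real Europeana API.
BASE_URL = "http://api.example.org/europeana"
REQUIRED = ("function", "format", "key")
FUNCTIONS = {"search", "compare", "translate", "add", "update"}


def build_request(params: dict) -> str:
    """Validate a query profile and serialise it as a GET URL."""
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    if params["function"] not in FUNCTIONS:
        raise ValueError("unknown function: " + params["function"])
    query = dict(params)
    # "If only one is present presume the same": copy whichever language is given.
    if "lang" in query and "return_lang" not in query:
        query["return_lang"] = query["lang"]
    elif "return_lang" in query and "lang" not in query:
        query["lang"] = query["return_lang"]
    # Multi-valued return fields travel as one comma-separated value.
    if "fields" in query:
        query["fields"] = ",".join(query["fields"])
    return BASE_URL + "?" + urlencode(sorted(query.items()))


print(build_request({"function": "search", "format": "json", "key": "abc123",
                     "lang": "EN", "fields": ["guid", "title"]}))
# http://api.example.org/europeana?fields=guid%2Ctitle&format=json&function=search&key=abc123&lang=EN&return_lang=EN
```

The user ID and rights parameters would slot in the same way; authentication for the operations that need it is left out here.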

Collection data filters (access only)

*Parameter*	*Example values, notes*
Target entities	Objects, people, places, subjects [if we are enabling anything more than objects]
GUID	A unique identifier given to every record in Europeana
Set ID	ID for a set of entities, which may require the appropriate key, depending upon privacy settings for that set.
Keyword	Tricycle, ww2, treaty, Anne Briggs, documentary [multi-field search].
Structured data: name	Name of object, person or place. If these use different fields, then the right one should be inferred from the target entity. Examples: photograph; sunflowers; Forlì (or Forli); Max Brod (or M Brod); Rockall
Structured data: date [point, range, older, younger]	14 July 1792; 19th century. “Older than 1850” might be expressed as “-20000000-1850” and “younger than” as “1850-2050”; uncertainty like “1850 +- 5” as a range, “1845-1855”, though this isn’t perfect
Structured data: related person	For returning objects, people and places
Structured data: related place	For returning objects, people and places. See also “geographical” below.
Subject
Original language	Of object, principally, for documents (if this data is well expressed)
Originating institution	Good structured data, ideally (we may require an ID), but we could permit a string search across the relevant field.
Originating institution country	For searching by current location
Originating institution type	Museum, library, archive, A/V archive
Location	Including sub-parameters for grid reference, coordinates, place name, and the location of concern (e.g. place of creation, place of publication, location of subject matter)
Sorting	Keyword occurrence, date precision, location, location relative to user, institution type – perhaps sorting partly inferred from the fields used in search, but if these are mixed e.g. date and place plus keyword, need to sort on one before the other.
Media	text, audio, image, video or more specifically PDF, WAV, MPEG etc.
Format [item type]	Map, book, video perhaps. Is this data held in a structured way, and is it distinct from the media metadata?
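The date row above suggests reducing every date expression to a year range. A minimal Python sketch of that normalisation follows; the sentinel bounds come from the table's own examples, while the accepted phrasings, the function name `to_year_range` and the century convention (19th century = 1800-1899) are assumptions. As the table says, turning "1850 +- 5" into a hard range isn't perfect.

```python
import re
from typing import Tuple

# Assumed sentinel bounds for open-ended queries, taken from the table's
# "-20000000-1850" (older than) and "1850-2050" (younger than) examples.
EARLIEST_YEAR = -20_000_000
LATEST_YEAR = 2050


def to_year_range(expr: str) -> Tuple[int, int]:
    """Normalise a date expression to an inclusive (start, end) range of years."""
    text = expr.strip().lower()
    m = re.fullmatch(r"older than (-?\d+)", text)
    if m:
        return (EARLIEST_YEAR, int(m.group(1)))
    m = re.fullmatch(r"younger than (-?\d+)", text)
    if m:
        return (int(m.group(1)), LATEST_YEAR)
    m = re.fullmatch(r"(-?\d+)\s*\+-\s*(\d+)", text)  # e.g. "1850 +- 5"
    if m:
        centre, spread = int(m.group(1)), int(m.group(2))
        return (centre - spread, centre + spread)
    m = re.fullmatch(r"(\d+)(?:st|nd|rd|th) century", text)
    if m:
        century = int(m.group(1))
        # Assumed convention: 19th century = 1800-1899.
        return ((century - 1) * 100, century * 100 - 1)
    m = re.fullmatch(r"\d{1,2} [a-z]+ (-?\d+)", text)  # e.g. "14 July 1792"
    if m:
        year = int(m.group(1))
        return (year, year)
    m = re.fullmatch(r"(-?\d+)\s*-\s*(-?\d+)", text)  # an explicit range
    if m:
        return (int(m.group(1)), int(m.group(2)))
    if re.fullmatch(r"-?\d+", text):  # a single year
        return (int(text), int(text))
    raise ValueError(f"unrecognised date expression: {expr!r}")


print(to_year_range("1850 +- 5"))        # (1845, 1855)
print(to_year_range("Older than 1850"))  # (-20000000, 1850)
```

Point dates are reduced to their year here; keeping day and month would need a richer range type.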

UGC operations (add, edit, view)

These operations will need user ID (or group ID) plus authentication and authorisation for certain operations (but not for viewing public data).

*Parameter*	*Access/edit*	*Example values, notes*
tag	A, E	For modifying a tag (e.g. deleting it) or viewing its associated items
note	A, E	For modifying or viewing
UGC contributor	A, E	Perhaps multiple values, including groups, so we can look for stuff with a given tag but only when tagged by a certain set of UGC contributors
UGC contributor type	A, E	Content contributor vs. other user
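One possible reading of the access rules for UGC (public data viewable without authentication, private data visible to the UGC owner or the owner of the related collection, edits restricted to the owner) is sketched below in Python. It assumes authentication has already happened upstream, and `Annotation` and `may_perform` are hypothetical names, not an agreed API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    owner_id: str             # the user who created the tag or note
    collection_owner_id: str  # the institution whose object it is attached to
    public: bool


def may_perform(operation: str, annotation: Optional[Annotation],
                user_id: Optional[str] = None) -> bool:
    """Decide whether an (already authenticated) user may perform a UGC operation."""
    if operation == "add":
        return user_id is not None  # adding always needs a known user
    if annotation is None:
        raise ValueError(operation + " needs an existing annotation")
    if operation == "view":
        # Public data needs no user; otherwise only the UGC owner or the
        # owner of the related collection may see it.
        return annotation.public or user_id in (annotation.owner_id,
                                                annotation.collection_owner_id)
    if operation in ("edit", "delete"):
        return user_id is not None and user_id == annotation.owner_id
    raise ValueError("unknown operation: " + operation)


note = Annotation(owner_id="alice", collection_owner_id="museumA", public=False)
print(may_perform("view", note, "museumA"))  # True
print(may_perform("edit", note, "museumA"))  # False
```

Group-owned UGC would need the owner check widened to group membership, which is left out here.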

Friday, March 07, 2008

EDL WP3 Paris meeting

So, Friday evening, perhaps a few moments to write up (some of) the Paris meeting. I did mean to attach my presentation too but the last version is on a memory stick at home. Too damn portable, never where you need it!
It was a successful meeting, I would say, and it was a pleasure to see a couple of faces I already knew, and meet others for the first time. On Monday (with me still reeling from a 4am start) we were taken through the results of the user testing. These were overwhelmingly positive, which needs to be taken with caution given the guided nature of the demo (especially with the online questionnaire, but also perhaps the expert users and the focus groups). All the same there were criticisms that provided something to get our teeth into, particularly around the home page and the purpose of the "who and what" tab. Search result ordering was an issue, a particularly thorny one in fact that we tackled on Tuesday as best we could. Clearly a lot of users don't really understand tagging, though they thought they liked it. Other plusses were for the timeline and map.
There was a good session with representatives of a French organisation for the blind and visually disabled after lunch (a bloomin' good lunch, in fact. Good wine, too. I love France!). Aside from HTML accessibility they talked extensively about Daisy, and it would be marvellous if some of the text content that may end up there could be daisified. No-one had heard of TEI (or DocBook) but it struck me that these formats are pretty close to what Daisy sounds like, and that there may be TEI material amongst the content we'll be aggregating, so translations to Daisy could be relatively straightforward. Anyone know?
Personalisation took us to the end of the day and we distinguished between activities done for private purposes (though perhaps with public benefits) like bookmarking with tagging, tailoring search preferences, setting up alerts, or saving searches; and explicitly public activities like enriching content, suggesting and tagging (when not bookmarking). The question of downloads (what? how? assets or data?) and the related issue of licensing came up. I think we worked out that possibly four levels of privacy would be useful, extending the way Flickr and other sites work, with private, public, friends/family, and "share with institutions". The latter is really about saying, I'll let myself, my mates and the organisation whose objects I'm tagging/annotating look at the data, but not everyone. I think it's important and should be encouraged, as it lets those institutions do interesting stuff with the resulting UGC for everyone's benefit. We ran over into the next day to deal with communities (still plenty to think about there, I would say) and results display, a practical and useful discussion that touched on the fields that might be searched across and how they would be used in ranking.
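The four privacy levels discussed could be checked with something like this Python sketch. The `Visibility` names, the `can_view` signature, and the choice that "share with institutions" also includes the owner's friends (following the "myself, my mates and the organisation" description) are assumptions, not an agreed design.

```python
from enum import Enum
from typing import Set


class Visibility(Enum):
    PRIVATE = "private"
    FRIENDS = "friends/family"
    INSTITUTION = "share with institutions"
    PUBLIC = "public"


def can_view(visibility: Visibility, viewer: str, owner: str,
             friends: Set[str], institution: str) -> bool:
    """Return True if `viewer` may see UGC that `owner` attached to an object
    held by `institution`, under the given visibility level."""
    if visibility is Visibility.PUBLIC or viewer == owner:
        return True
    if visibility is Visibility.FRIENDS:
        return viewer in friends
    if visibility is Visibility.INSTITUTION:
        # The owner's friends plus the institution whose object was tagged.
        return viewer in friends or viewer == institution
    return False  # PRIVATE: owner only


print(can_view(Visibility.INSTITUTION, "museumA", "alice", {"bob"}, "museumA"))  # True
print(can_view(Visibility.FRIENDS, "museumA", "alice", {"bob"}, "museumA"))      # False
```

Treating the levels as nested (each one sees everything the level below it sees) keeps the rule easy to explain to users, which matters given how unsure many testers were about tagging.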
Finally my bit came up. Although Fleur had suggested that I talk for maybe 15-20 minutes to kick off discussions on the API, I, feeling unsure of my ground, prepared pretty thoroughly with the result that I had material that kept me talking for an hour or more, I think, albeit with some digressions for debating what I was saying. On the whole it went down quite well, I think, but I learned a bit about what I should have added (proper, simple explanations of APIs, and more examples of how they're used) and what I should have left out (a section where for the sake of completeness I referred to the management of collection data, which is not part of the public API anyway and is outside the scope of our WP. This led to a digression that was I think still useful, but not to the topic of that moment). And then, seeding the discussion with a use case related to VLEs, we tried to figure out in more detail what functions and parameters would be needed in an API call, and what would be returned. And that, my friends, I will write up shortly for now I want my dinner. Home calls.

Thursday, March 06, 2008

MS new stuff: IE8 and super-cool image zooming in Silverlight

IE8 beta released, with some interesting developments (the microformats list is debating the rights and wrongs of "web slices", which are based on hAtom and are intended to let users highlight part of a page and treat it as a feed)
Also from MS, Seadragon (see also Photosynth) in Silverlight 2. More than a bit useful for us cultural heritage types. [edit] I should add that the video on that TechCrunch page is of a cultural heritage application: Hard Rock Cafe's memorabilia application, which was the demo shown at MIX08. They talk about the role of imaging for authentication and for bringing objects to life, and though it's obviously a business, their business is really not so far from ours (albeit for profit)

Sunday, March 02, 2008

Hey MCG Listers!

Thank you for visiting. I see that some 30-odd listers have had a peek at my summary of the recent EDL/API thread. I hope it was worth the trip to this blog, and I'm really pleased that the thread piqued the interest of a fair few people, more than actively participated in it (and lurking is just fine by me; I lurk on many a list). Anyway, I switched on comments after posting (though I don't remember ever turning them off), but it turns out that this isn't retrospective, so if you wanted to say anything in response to the EDL/API post you couldn't. Hence this one: you can stick any responses to that API stuff here if you like. No pressure, though!

Cheers, Jeremy

The Doofer Call
