About Me

Web person at the Imperial War Museum; just completed a PhD on digital sustainability in museums (the original motivation for this blog was as my research diary). I post occasionally, usually about museum tech stuff, but I'm prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Friday, November 16, 2012

The Mystery Trend

This is a little case study of digging into our web stats a bit to understand a recent trend. Doing this sort of digging is pretty addictive and can, of course, throw up some interesting insights.
Recently we noticed a bit of an odd trend in our user stats on iwm.org.uk. Although direct comparisons are imperfect, we're happy overall with how they've looked in the year since we switched to the new site, with decent improvements in all the main crude metrics that might mean, well, something or other that's meant to be good. Last month we exceeded 500,000 visitor sessions for the first time. Here's how things have looked week-by-week from early May to mid-November:

What with a general above-target performance, an odd summer (a big dip - thank you, Olympics!) and peaks in months that historically haven't been so prominent, the trends have been a bit of a departure from the norm so I thought I'd dig into it. It turned out that direct visits had increased noticeably from mid-September onwards - like, doubled.

Further, the growth was in the mobile segment and, more specifically, in iOS devices, split equally between iPhones and iPads. But why? This is weird. I mean, we're seeing the rapid growth in mobile traffic like everyone else, but whilst the number of iOS users overall increased by a couple of percent between May and November, direct visitors increased 4-fold just from September to November (whilst direct visits from Android are pretty much flat). Yep, that much.

To me, directs from mobile mean visits referred from apps, and in this case I certainly doubt that all those Apple fans - and only them - have suddenly bookmarked our site or emailed it to each other. So it's probably about referrals from apps, but which ones? There are few self-explanatory clues, either in where they arrive or in the timing. Plenty land on the home page:

our branch pages and visiting info:
(but not Duxford)

and a healthy number also land on collections record pages:

If the trend is evident for all of these it can't be from a single suddenly-popular app, right?
Predictably, our site gets healthy spikes of traffic from places like Reddit, the BBC or newspapers, which shows up in our referrer traffic; perhaps the app equivalents of these websites are responsible for this other trend. But at first look the spikes don't seem to correlate, as this graph of iOS visits from Reddit shows. I'd expect referrals from a Reddit app to follow a similar pattern, and it's not much like the "directs" trend:

It just seems odd that there's been such growth in whatever apps are responsible in that period. We have a couple of apps of our own, of course, but again, looking in detail at the pages they may link to doesn't provide an explanation. At first I suspected social media apps: Facebook and Twitter can't have suddenly quadrupled their users over 8 weeks, but they seemed a plausible explanation for a rise in direct traffic to such a range of pages. Then I thought about search. Perhaps there's a newly popular search app on Apple devices? Aha, perhaps we're onto something:

So, referrals from Google drop off amongst iOS users at the same point that direct visits increase. Not being an iOS user I still have no idea what's caused this, though I assume that there's a new appified version of Google that's being taken up rapidly. I think I'll need to do some more detailed analysis to make sense of this, but it looks like (a) the rise in directs doesn't mean an absolute rise in iOS users, and (b) our overall increase in visitors has nothing to do with this trend. This actually makes things a little worse for us, because we now have less information about the searches that bring traffic to us. Hmm.
I'd be interested to know whether any of you lot have seen a similar trend in direct traffic on iOS devices over recent months. Comments please!

PS if you found this blog post hoping to download stuff by the Mystery Trend, sorry to disappoint you. Interesting band though. More here.

Thursday, October 11, 2012

IWM and the Google Cultural Institute

Just a very quick one. Yesterday Google launched the new historical part of what they now call the Cultural Institute*, consisting of object records and media plus a number of exhibits assembled from them. IWM was amongst a small group of institutions in this first wave. We contributed metadata and media for a few dozen items in our collection relating to the Second World War, and put together an exhibition about D-Day. The exhibition tool is very effective, but what I think is more fundamentally interesting is that this project (like Europeana, albeit on a fraction of the scale) enables the remixing of collections from various kinds of organisation. Some of our material was used in the Anne Frank exhibit, for instance. So just like Europeana Exhibitions, the potential for remixing allows for new combinations of material culture and new stories to be told. I wonder how the "market" here will shake out...
It was an interesting project to work on. The data standards evolved as we went along and were significantly more useful by the end, and it was really good to be part of that group of GLAMs that could test it out and help to make it more fit for purpose. The process consisted of one of our historians, Mandy, putting together the story and selecting the items she wanted to use to illustrate it, which involved a bit of new digitisation and data cleaning. Then this metadata needed turning into the XML format Google needed. I resisted hand-coding the XML even though it would have been easy enough given the data we had and the smallish number of items. Instead I wanted to be ready to do this lots of times for lots of objects and exhibitions, so we used the CIIM middleware to organise all the object records and add context-specific data and media (such as a YouTube video). Then I wrote an XSL transform for the Solr XML it churned out, so it will be trivial to put together new batches of metadata in the future. Once the data and media were uploaded to Google (still a pretty manual process) my colleague Jesse put the exhibit itself together (with Google working out some of the kinks in the tool as we went along). All in all it was a good process. Perhaps it took more time than we'd envisaged, but that is in the nature of doing something new, and some of the bottlenecks are now gone.
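For anyone curious what that transform step looks like in outline, here's a purely illustrative sketch. The field names and the target elements are invented for the example; Google's actual schema (and ours) differed, so treat this as the shape of the approach rather than the real mapping:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch: turn a Solr XML response (/response/result/doc)
     into a simple target format. Element and field names are made up. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/response/result">
    <items>
      <xsl:apply-templates select="doc"/>
    </items>
  </xsl:template>

  <xsl:template match="doc">
    <item>
      <title><xsl:value-of select="str[@name='title']"/></title>
      <description><xsl:value-of select="str[@name='description']"/></description>
    </item>
  </xsl:template>
</xsl:stylesheet>
```

Once a stylesheet like this exists, producing a new batch is just a matter of running the latest Solr output through it again.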
I think the end results are great, not just ours but those assembled by other partners too. This first group are very strong in a few areas (the Holocaust and South Africa, in particular, but not forgetting the simply huge photo archive from LIFE), which was an interesting approach. I think it was a good idea in that it facilitated strong cross-institutional combinations of material. Hopefully we'll see a wider spread of subject matter in coming months, though (and keep an eye out for other IWM exhibits there).

* I think the Cultural Institute now gathers together some of the other cultural projects Google has done in the past (Art Project, Dead Sea Scrolls etc) but the historical exhibitions seem to take centre stage.

Wednesday, September 19, 2012

Mobile thinking

(well, pontificating.)

So, how to go mobile? Don’t ask me, every time we go around this one all the same questions come out. One site or multiple? One URL or more? (these are not the same question.) Is responsive design the way forward? What about performance? Do different users need different content and IA? Is the device or the use-case most important, and should we infer one from the other? When must we resort to an app? Everything seems to depend on something else, so how do we cut that Gordian knot? At IWM thus far we’ve only dabbled in mobile websites so we really haven’t settled on our approach yet, but recently we spent some time with colleagues from the National Gallery and Tate talking it over, and some things seemed to get a little clearer to me. Not answers, but perhaps a way through to some decisions. Bear in mind, of course, that given my limited experience of actually doing this stuff it’s possibly just BS, but anyways...

For starters, I think it’s three knots. Each is a bundle of objectives, factors, constraints and decisions. Some of these decisions will be easy to settle, or there may be no choice to make, whilst others are our degrees of freedom. The knots themselves seem to me to have limited impact on each other, though there are connections. What I hope is that they help us to address questions of principle and implementation in the right order.

Problem 1: Knowing which mode of display is required

How do we decide what version of a site or rendition of a piece of content is shown to a user to allow for their device, and what is the importance of user choice in this? What aspects of the device are significant? Really this is a question about whether we think we can guess what people need in order to achieve their objectives, and what happens if we guess wrong.

Informing considerations

• User choice: is it important for users to have explicit control over how they see your site? Are the tasks they may want to achieve guessable from the platform they use, and does this matter?
• Source: is the source a user comes from relevant (e.g. a social share), and should the URL they request determine what they see regardless of their device?
• Technical characteristics of devices: what are the significant technical dimensions along which devices differ (screen size? touch vs keyboard/mouse vs voice interface? functional capabilities?)?

User choice and source have a somewhat inverse relationship with the importance of technical characteristics in governing the experience, but they aren’t mutually exclusive.


Various technical choices depend upon how we answer this problem (the role of URLs; mechanisms for allowing user choice or automatic detection), but maybe even content or the user journey are impacted. And the decisions over user choice, sources and technology will affect each other.
In terms of technology, for instance, unprompted client-side detection of window size (as would be used to perform media queries for a responsive design) is one option, whilst server-side detection of user agent properties is another (perhaps directly determining what rendition is shown; or else offering the user a choice that is saved as a cookie or redirects them to another URL).
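To make the server-side option concrete, here's a minimal sketch of detection logic with a user-choice override. The user-agent patterns and the "mobile"/"desktop" split are illustrative assumptions, not an actual IWM implementation:

```javascript
// A sketch of server-side rendition selection: an explicit user choice
// (e.g. saved in a cookie) trumps automatic detection; otherwise we fall
// back to a crude user-agent sniff. Patterns here are examples only.
function renditionFor(userAgent, cookieChoice) {
  if (cookieChoice === "mobile" || cookieChoice === "desktop") {
    return cookieChoice; // respect the user's saved preference
  }
  // Naive detection: treat common mobile UA tokens as "mobile"
  return /Mobi|Android|iPhone|iPad/i.test(userAgent || "")
    ? "mobile"
    : "desktop";
}
```

The interesting design question is the first branch: however clever the detection, giving the user a way to override it (and remembering that choice) guards against guessing wrong.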

Problem 2: What to show differently

Problem 1 concerned identifying the important dimensions of difference (devices, tasks) and how much we should try to work this out on behalf of users or leave it to them. Once that choice is made comes the question of what should actually be different about the different renditions of a site to (hopefully) adapt the experience to users’ needs.

Informing considerations

• Users and the tasks they wish to accomplish (again): if we decided in Problem 1 that tasks were important in selecting the rendition, that probably means renditions will differ along the dimension of tasks too.
• Form factors and device capabilities.
• Accessibility.
• Location: might we change the content according to where the user is?


User experience and design will be affected, as will the delivery of media. If we choose to focus on facilitating different tasks for different platforms, then information architecture, content and functionality may also be affected.

Problem 3: How do we display it?

Finally, the bit we usually seem to skip right to: how do we get the right stuff onto the page and get it to look right once it’s there? There are links here back to 1 and 2, because the technical solution to rendering stuff to the display will be some combination of server- and client-side technology that is also going to relate to how (and where) the results of those two decisions are governed.

Informing considerations

Devices; technology; cost; performance.
Assuming we’ve gone down the HTML rather than the app route, there are (I think) three basic options, which can be combined: get the server to spit out different HTML depending upon what is required (on the basis of platform, user choice or whatever – see above); spit out the same basic HTML and change the layout with CSS via fluid design and/or media queries and the like (which may also load some different assets); or spit out the same basic HTML (preferably HTML5) and remodel it with JavaScript-y goodness, possibly loading in different content at the same time.
Doing it all server-side may of course be difficult depending upon the technology you use to assemble your web-pages, or it might make a stronger case for a separate mobile site. Using CSS to do the layout is probably the simplest approach but it has limitations and means that with some minor exceptions all content will be loaded, whether it’s shown to the user or not (though I am no CSS maven so may have missed something here). Using the power of HTML5, combined with JavaScript, we can do a lot more, including pull in content only as it is needed. It’s possible to request images to be made on the fly with dimensions that suit the user’s screen, which could be a boon for page load weights. As it depends upon client-side code the results are perhaps a bit less in the hands of the website owner and users of older desktop browsers may suffer, but there are plenty of libraries out there now that profess to make the whole cross-browser thing less of an issue, and recent browsers do tend to play a bit nicer together than once upon a time. I suspect this is a pretty demanding route technically, or it could be (especially if on the server side you need to set up services for dynamically loading content), and thinking about how to make it play nicely with Drupal themes gives me shivers, but it’s got a lot going for it.
To be honest I get confused as to which of these approaches people mean when they say “responsive design”, and no doubt they can be hybridised anyway.
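The "images made on the fly" idea is worth a sketch, since it's where the page-weight savings come from. This is a hypothetical example: the `/resize?w=` endpoint and the breakpoint values are invented, and a real service would need server-side support:

```javascript
// Build an image URL whose width parameter suits the viewport, snapping
// up to a fixed set of breakpoints so the server only ever renders (and
// caches) a handful of sizes rather than arbitrary ones.
var BREAKPOINTS = [320, 640, 1024, 2048]; // illustrative values

function sizedImageUrl(basePath, viewportWidth) {
  var width = BREAKPOINTS[BREAKPOINTS.length - 1]; // default to largest
  for (var i = 0; i < BREAKPOINTS.length; i++) {
    if (BREAKPOINTS[i] >= viewportWidth) {
      width = BREAKPOINTS[i]; // smallest breakpoint that covers the screen
      break;
    }
  }
  return basePath + "/resize?w=" + width;
}
```

So a 300px-wide phone would request the 320px rendition rather than a full-size image, which is the kind of saving that matters most on mobile connections.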


The choice here basically leads us to our technical approach: whether the hard bits of the coding we have to do are client-side or server-side, feeding back to how we tell what type of rendering is required, whether we need to develop content feeds or other services, and so on. All will require continuing maintenance, especially if there is some sort of automated detection going on rather than user choice. And there will be cost implications whichever way, but they will depend upon the complexity of the requirements, the capabilities and flexibility of existing systems, in-house expertise and so on.


Huh? Well, basically I wrote this just to rescue myself from all those circular discussions that seem to happen, but if I'm talking through my hat, speak up!

Friday, July 13, 2012

Solr replication "invalid version" error

A quick one for googlers: I had this error in my catalina log when trying to replicate a Solr index:
invalid version expected 2 but 10 or the data in not in 'javabin' format
I first saw it when I moved between versions of Solr, because versions 3+ use a different format. Most things I read were about this issue, and it was easy enough to resolve (I just made sure all my Solrs were on the same version - not so easy for everyone but worth the effort). But in one case I still had the error, and this whole javabin thing turned out to be a red herring. The answer (from here) is simply that the configuration of the slave included "admin" in the replication URL, which is not where the replication service itself exists. Doh! There was indeed something at the http://www.example.com/solr/admin/replication/ address, but it's the admin interface to replication rather than the endpoint itself, which was at http://www.example.com/solr/replication/. I'm not sure how the error crept in for me but clearly I'm not the only one! Perhaps I snipped it from some faulty documentation, but all those false positives going on about javabin format kept me busy trying all sorts of other things before I stumbled across the right answer. Anyway, check your solrconfig.xml on the slave to be sure your masterUrl string doesn't include "/admin" in the path.
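For reference, here's what the corrected slave-side configuration looks like. The host, port and poll interval are illustrative; the point is that masterUrl targets the replication handler itself, not its admin page:

```xml
<!-- slave solrconfig.xml: masterUrl must point at the replication
     handler, NOT the admin page for it -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- wrong: http://master.example.com:8983/solr/admin/replication/ -->
    <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```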

Wednesday, June 20, 2012

What would a European Cultural Commons be?

The motion: “This house believes that all content in the European Cultural Commons should be freely available for reuse”

Last week’s, um, Uxford Onion debate at the Europeana plenary in Leuven (all presentations here), on the proposition above, was stimulating and useful but left most of us still scratching our heads about just what it is. I’d been fortunate to take part in a break-out session the day before where Louise Edwards gave us some background on the idea of a cultural commons and we heard about two case studies showing different facets of the idea. Louise then did another version for the whole conference before the debate. I’ve snipped a little from my notes at the end of this post but I can't do it justice. Look at Louise's presentation here (PDF), or else go to the source and read the work of the late Elinor Ostrom and of Charlotte Hess and extrapolate from there (I must also do this myself). I must admit I was still pretty unclear how the general idea was meant to translate into an actual thing, and I wasn’t the only one, but it was a very stimulating idea nonetheless and an excellent debate too, spearheaded by Tony Ageh and Nick Poole (for and against) and seconded by Gary Hall and Susan Hazan, respectively, with input from the floor too (and chaired by the one and only Jill Cousins to strict Uxford Onion rules).

To me, the question appeared moot because it seemed that a fundamental of a commons is that it is free to all – certainly within the identified community, although perhaps in some cases amongst anyone who happens to stumble across the resource in question. But still, it does beg the question of what was required of the Commons or of something that was “in” it. It also threw up many possibilities as to what “free” might mean; whether some sorts of “freedom” might look like restrictions from some perspectives; and whether all dimensions of “free” would be required in order for something to pass muster as part of a commonly-held good.

One respondent from the floor argued forcefully that GLAMs should not have to allow people to do what they liked with their material, because they were the authoritative source. My immediate reaction was, well, then it doesn’t belong in the commons (also: get a grip). It seemed to me that we could thereby end up with two cultural commons: one the actual culture that people live in day to day (arguably myriad cultures, actually), replete as it is with restrictions both legal and normative but essentially an organically growing thing in which ideas are transmitted, mutated, opposed and which is actually in all but limited circumstances very free (if you think about the number of ideas, memes, fashions, habits etc that get passed around they far outweigh those encumbered by patents, copyright etc); the other a Commons with a C, in which authorised voices in a partitioned corner of that approved “culture” get to control what is said about various ideas, things etc. This is neither common nor cultural (that is, passed between members of a society and evolving along the way), and it would probably be counterproductive, putting a barrier between GLAMs and unauthorised participants by treating the latter as second-class members of the “culture”. OK, that’s how stuff often happens now, and it’s not necessarily illegitimate, but it is nothing to do with the idea of a commons in which participants develop ways to look after a shared resource responsibly.

Degrees of freedom

Well, after my initial reaction to that speaker making the case for museum authority in a commons, I did reflect further. In his closing remarks Nick Poole almost had me voting for the “opposition” by arguing persuasively for the sustainability of a commons to be remembered (I think he may have been arguing against his beliefs but god, he’s a fine debater!). It reminded me that for a commons to succeed it is very likely to have restrictions. According to Ostrom’s model they will be set by the community (perhaps they will even end up ossified into law), complete with sanctions and means for these to be applied, and everyone in the community can participate in rule-making.

So is a restriction that requires, for example, the respectful use of an image or reasonable attribution of a source necessarily incompatible with a commonly-held good? Would, say, a requirement to give attribution (a BY clause) stop something being “free” and be incompatible with a commons? Perhaps not. A commons is about shared responsibility, so if an organisation contributes something to the commons it gives up the right to dictate the correct use of the item, but the community may still demand respectful use. As we said above, a code of behaviour and system for establishing breaches and applying sanctions are part of a commons too, and rules around respectful behaviour could help to make a cultural commons sustainable, not least by giving museums the confidence to add more material. Free-riders – for example, those that make offensive use of commonly-held media – would need to be identifiable and open to sanction by the community, and perhaps one could/would still need to depend upon legal means (licences etc) for this. This is not the same by any means as a museum being able to impose its own limitations on the use of things it had contributed to the commons.

Perhaps the deal with the commons, then, is not that restrictions upon totally libre use of an item are absent, but that the contributor must place its trust in the community (if it accepts the item), which will then assume responsibility for setting any restrictions and applying any sanctions. Once it is part of the commons, though, a contributor must accept that it has lost exclusive control and becomes one voice amongst many. By participating as a persuasive member of a commons, a museum might have a stronger voice, but ownership is gone – that just ain’t a commons, and it ain’t culture either.
I wonder if this extends to the other kind of free: does something not being gratis-free stop it also being free-as-a-bird free? Well it does limit freedom, in that it prevents use by certain parties that are unable to pay; but if it is what the community deems necessary for the commonwealth to be sustainable, perhaps again it’s not actually incompatible with a commons after all. I surprise myself.

Yeah, but still, wtf is it?

So back to the question: what concrete form might a European Cultural Commons take, and what would that mean someone would actually do in order to add material to it, and what role might Europeana play, if any? Some ideas.

A badge
It might be as simple as a public domain dedication of the sort that already exists. Perhaps it could be branded in some new fashion, if there was some political or stakeholder rationale for that, but it would basically be nothing new.

A place
It might be a place where items badged like this are held, or where an index of them is held. Again, things like this exist already and it could be an extension of what Europeana does. If it was a physical repository for the items it would be more like Wikimedia Commons.

An agreement
...so it could simply be an agreement with Wikimedia Commons to do this.

A framework
It could be a set of policies or guidelines to which content contributors could subscribe that vouches for certain beliefs and practices, perhaps including the badge above.

A vision
It could be a concept for policy makers to get hold of, a flag to rally around. But even then you’d think it probably needs something more concrete to talk about otherwise no-one will know what to do with the idea. And if it’s not concrete it can’t be measured, which will be important for many people.
What this does have in its favour is that it may be vague enough for a wider range of organisations to subscribe to it. If the BBC or Supraphon or Lionhead Games wanted to be a part of this sort of commons, perhaps it could be vague enough to enable it. It would be a commons with a small “c”, perhaps: simply an expression of the shared culture(s) of Europe, with no warranty implied about the nature of the content or how people might use it to “participate” – the ability to use (consume) alone might be enough, not to reuse or repurpose.

Here’s a snip from my conference notes. Apologies for their unusual brevity and inevitable errors. [edit: now that I've linked to Louise Edwards' presentation I suggest you look at that instead]
  • Principles for a commons (Ostrom):
    • Establish cty boundaries
    • & rules
    • All can participate & change rules
    • A system for monitoring
    • Sanctions for breaking rules
    • Conflict resolution mechanism
    • Polycentric systems better than centralised ones
  • Charlotte Hess: what makes a new commons different?
    • Complex, variable membership unknown to each other
    • No established norms
    • Egs:
      • Libraries
      • Educational commons
      • Cultural commons
What is a cultural commons? Cultures shared by a community. Shares intellectual resources, ideas, creativity, styles. [It’s obviously not new!]

Saturday, June 09, 2012

A memorial post

On Wednesday my great uncle Raoul died, a couple of months shy of his hundredth birthday. It wasn't unexpected really, but still sad. Though I didn't see him nearly enough, he was a lovely man and really unlike anyone else I've met, yet at the same time he evoked many of the feelings and sense of mental orientation common to the eastern European Jewish branch of my family. That evening at Fiona's suggestion we had vodka and smoked salmon for dinner, the perfect way to toast Raoul's passing. The next morning I scrawled some thoughts as they fell out of my head on the train to work. Here they are pretty much unaltered, errors and all (but small clarifications follow).

It was a hollow day when that last thread was cut, but it ended with salmon and chilled vodka in icy glasses and reminiscence and warm thoughts.

The last of a generation was the least of what Raoul was, but he was that for our family too, and losing him is losing all those already missing, a little more. But it’s Raoul we’re missing now. Of course, I never knew the baby born in Samara or whatever that far off city was, or the little boy fled to or through Odessa with his family when the Russian revolution came. I never knew Raoul in Czechoslovakia or later in Estonia or Latvia or whichever Baltic enclave where he worked in... was it some printing trade? I never knew the Raoul my aunt Grete knew, who met and married him; I’m vague even on the how or the when. The Raoul who came pre-war to London on business dealings for his father, who by myth or legend seems to have had a slightly roguish, edgy existence, but later became a restaurateur, businessman, landlord, investor; I was no peer or pal of his either. My history is hazy, and I could easily clear it up but it doesn’t seem to matter in a way, except for the pleasure of the tales. I knew – I loved – but a small corner of the man. I knew him as a child knows an adult, perhaps even in adulthood myself. But his long history, all of it, was marbled through him, salted and spiced him, was bound into his manner, his good humoured warmth and generosity. It peppered his parlance and points of reference and if I only heard him talk (hilariously) of his early years in his very late years, I’d felt it long, long before.

Because Raoul, to me, was the man I knew as a little kid, part of Grete and Raoul with their sidekick Misty, who would roll in with their Peugeot saloon and often as not with Tic Tac Granny to our house in St Margarets, or who we’d visit in Ladbrooke Grove. He was the big hairy hand, those particular fingers, the complicit pat on the wrist and later, very hard of hearing, the waving away of some missed part of a conversation because he knew it only really mattered that we were sharing time. Too little of it, of course, and I wish more than anything that my kids had seen enough of him to know him a little, and vice versa. But I know how he cared for them regardless.

I can hear his voice now, not saying anything, I just hear the tone, the syllables and prosody and that same layered and weathered, weathered and layered person, complete just in that sound. I wish I could really, really hear it.

Some factual corrections following a little asking around. It turns out that it was probably Saratov, another city on the Volga, where Raoul was born in 1912, and that as a 7-year old his family fled to Czechoslovakia and came to know my grandfather’s family, especially my great uncle. The little sister, Greta, 8 years his junior, wouldn’t have been that noticeable but when they met again years later in London and she was a young woman, presumably that was no longer true. They married in 1946. Before being sent to London by his father to learn English and business skills and to work for an uncle, Raoul was in Lithuania and I think that’s where he must have worked as a sports reporter. Sports remained important to him. Presumably the family had all gone together to Lithuania, in fact. It's where Raoul's father was killed in 1941, like so many other Jews. His mother managed to bribe her way from the camps to freedom and eventually to London too. Tic Tac Granny was Greta's mother, my Grandfather's step-mother. And Misty? She was a highland terrier, and you know how animals make an impression on kids...

Thursday, May 17, 2012

Off you go, my lovelies: embedding film and sound from IWM

On Tuesday we made a few changes to the website, one of which we've been working towards for a long time: freeing up our streaming media so that you - and you and you and you - can use it on your own websites (subject to the terms of the IWM User Licence).
Since we launched the new site in November we've offered over 40,000 images for free re-use either by embedding HTML we offer, or by downloading the files, also using this licence, but we weren't quite ready to do that for our videos or (in particular) our sound recordings. There are more complex rights issues around these, as well as ethical issues, and as a consequence we had to think carefully, not only about what to let go of, but what to put on the website at all.
Now, though, we have approval to apply the same licence to parts of our sound and film collections and are applying it bit by bit. We also have new physical infrastructure in place for our streaming media that should be able to cope if something gets popular. We currently have about 660 films digitised and all of them are online, with about half of them cleared for you to reuse. Of the tens of thousands of items in our sound collections a fair portion are digitised but clearing them for the web is a huge job, so we are working our way through, and now there are over 1800 you can listen to, over 1400 of which you can reuse.
How can you reuse our sound and film? We allow you to embed them, for which we provide some very simple HTML. If you go to a page such as this  or this or this and look to the right of the media player you will see text that says "This item is available to share and reuse under the terms of the IWM Non Commercial Licence." Click on "share and reuse" and expand the "Embed HTML" to grab the code. Then paste it like I've done below.
Credit for this is due to lots of people. I'd mention particularly Debbie McDonnell and Naomi Korn, whose work on (and advocacy of) the IWM User Licence was fundamental. However the whole of IWM's Copyright Group deserves thanks for its support for releasing our content like this, from our film and sound curators through to those responsible for e-commerce and for the maintenance of our digital assets. Taking this step is no small risk to those charged with the guardianship of our collections or those that need to generate income for IWM, but the evolution of attitudes from our first discussions to this stage showed great open-mindedness and was really encouraging. ICT's work to beef up the infrastructure was another crucial ingredient.
Please let us know what you think. There are various things we know need improving but we are keen to have some feedback first. We are hoping that later this year we'll be able to offer an HTML5 player and much higher quality video too, as well as (with any luck) more media. No doubt a filter for media with this licence would be useful too, so let us know what you're after.
If you want to use our films or sound recordings for purposes outside the IWM User Licence, do please contact us: for film, see our film sales site; for audio, get in touch.

"Fit to fight" [more]

Tests of the "Highball" and "Upkeep" bouncing bombs [more]

Recollections of a POW  in WW2 [more]

Saturday, March 03, 2012

Installing Windows 8 Consumer Preview as a virtual machine

Windows 8 Consumer Preview is out. By all accounts the new OS is a radical departure from the Windows UX paradigm of the last two decades, and it's quite possibly going to be a very important OS, not least because of the convergence it apparently shows between desktop and mobile, with the interface rethought around touch. I've not really played with it yet because I've only just completed the installation, but I thought I'd put up a few words about that process. So far it's evident that the look is very different indeed.
Most of these instructions are explained in more detail and with screenshots here, so you might want to consult that too, but that guide was written for an earlier release and I hit a couple of gotchas for which I've gleaned answers, so I thought it worth putting them here. I've used VMWare Player, which is free. There are alternatives; just pick your version carefully as not all will work with Win8. So here's a quick step-by-step.

Step 1
Grab the 32 bit ISO from here
Step 2
Install VMWare Player 4. Version 3.x doesn't work and will give you a "HAL_INITIALIZATION_FAILED" error (I know, I tried). VMWare Workstation 8 also works apparently, as do some versions of other virtualisation software.
Step 3
Create a new VM. Just point at the ISO, wherever you saved it, and VMWare will do the rest. Pick "Windows" and then "Windows 7", don't put in a licence key. Let it go.
If you get an error saying "Windows cannot read the setting from the unattended answer file", disable the floppy drive in your VM - it's an icon at the bottom right (see here).
Step 4
When you get to the licence key screen, use the following: DNJXJ-7XBW8-2378T-X22TX-BKG7J (see here)
Step 5
Set up your account. I'm not sure how optional this was but I did it anyway. They ask for a lot of compulsory info but it does open up SkyDrive and other interesting aspects of the new OS so it's worth doing, I think. You can always bullshit.

You're done. Then find your way around. Hint: the Windows key is useful.

Wednesday, January 25, 2012

Solr to Google Earth

This is a basic how-to you can probably find done better elsewhere, but since I didn't find all the bits in one place myself I thought I may as well put this up.

The task: show query results from Solr on a map or in Google Earth using the latitude/longitude data in there. Make the results update as you move around, because there may be too many to bring back all at once.
The technology: Solr, XSLT, KML, PHP, Apache web server

This is a pretty common scenario, and for basic needs, and for people who aren't already into full-blown mapping solutions, this is going to be a good choice. That said, there are plenty of heavier-duty options and you may want to investigate, for instance, OSGeo/OSGeo4W.

I had a small amount of time to evaluate some possibilities for a future project, so I needed something quick and familiar. I'd done some Solr-based mapping a couple of years back, so I had some code to nick. Having turned some OSGB36 data into WGS84 lat/longs (thanks for the help, @portableant!) I got it into a Solr index, which I'm not going to go into here except to say that I used the "tdouble" datatype: a trie field seems like a good idea for efficient range searching, and you need something that can cope with all those floating-point values. There are proper geo data types that would let you do fancier proximity searches and the like, but I'm ashamed to say I've not even looked at them; basic is fine for me. So here are the relevant bits from the schema.xml:

<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<field name="latitude" type="tdouble" indexed="true" stored="true"/>
<field name="longitude" type="tdouble" indexed="true" stored="true"/>
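Just for orientation, a document headed for this index uses Solr's XML update format; here's a little Python sketch of how one might be built (the "id" and "title" fields are my assumption here, alongside the lat/long fields from the schema above):

```python
from xml.etree import ElementTree as ET

def solr_add_xml(doc_id, title, lat, lon):
    """Build a Solr <add> update message for one geocoded document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # one <field> element per schema field
    for name, value in (("id", doc_id), ("title", title),
                        ("latitude", lat), ("longitude", lon)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

xml = solr_add_xml("place-1", "Example church", 51.5, -0.12)
```

POSTing that string to Solr's update handler (and committing) gets the point into the index.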

With the index done and queries working, it was a matter of getting some KML out. Solr can transform its XML output on the fly with XSLT, so converting it into KML is really not hard. Plus I already had a transform to cannibalise. It goes something like this:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.opengis.net/kml/2.2" version="1.0">
<xsl:output method="xml" media-type="text/xml; charset=UTF-8"/>
<xsl:preserve-space elements="*"/>
<xsl:template match="/">
  <kml>
    <Document>
      <!-- only docs with a longitude value become placemarks -->
      <xsl:apply-templates select="//doc[double/@name='longitude']"/>
    </Document>
  </kml>
</xsl:template>
<xsl:template match="doc">
  <Placemark id="{str[@name='id']}">
    <name><xsl:value-of select="str[@name='title']"/></name>
    <description>
      <!-- emit a literal CDATA section around the HTML description -->
      <xsl:text disable-output-escaping="yes"><![CDATA[<![CDATA[]]></xsl:text>
      <xsl:value-of select="str[@name='title']"/>
      <xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
    </description>
    <Point>
      <coordinates><xsl:value-of select="double[@name='longitude']"/>,<xsl:value-of select="double[@name='latitude']"/>,0</coordinates>
    </Point>
  </Placemark>
</xsl:template>
</xsl:stylesheet>

Put your own preferred fields in here, of course. Some notes: the CDATA bits are there because the HTML in your "description" element needs to go into CDATA, but outputting this with XSLT takes a little lateral thinking, because the stylesheet needs to see your CDATA declaration as... CDATA. Hence that structure (and the bit that closes the section with "]]>"). Secondly, note that I start by selecting the "doc" elements that Solr returns, filtered for the presence of "longitude", since we only want to show things that have a point - though you could do the filtering in the Solr query (or a filter query) instead, as we'll see later. Finally, your fields will obviously be different and you'll want to put something interesting into the "description" element, which is what pops up in balloons on Google Maps and the like.
Getting KML out like this is just fine for showing this stuff on Google Maps, but this wasn't working for me on Google Earth. The reason is that GE wants the correct content type, whereas GMaps, OpenLayers etc don't care. GE isn't bothered by the file extension AFAIK, but headers? Yes. The other thing I wanted to do was create a network link, which is a means by which a request for KML can be updated to restrict it to a geographical area (a bounding box). With a network link you can specify how the north, east, south and west limits are expressed, which makes it pretty easy to slot those values into a Solr URL. However because of the content type issue this wasn't going to work.
So, here's a PHP file that proxies the Solr query and spits it out with the right content type (the Solr URL, parameter names and the name of the transform will of course be whatever yours are):

<?php
// Google Earth wants text/plain or application/vnd.google-earth.kml+xml,
// so this script is a proxy: it passes the parameters on to Solr and
// returns the results with the right headers.
// Set up a bunch of defaults: the number of points you want and a default
// bounding box, which is basically the whole world.
$rows = isset($_GET["rows"]) ? (int)$_GET["rows"] : 100;
$lat0 = isset($_GET["lat0"]) ? $_GET["lat0"] : "-90";
$lat1 = isset($_GET["lat1"]) ? $_GET["lat1"] : "90";
$lon0 = isset($_GET["lon0"]) ? $_GET["lon0"] : "-180";
$lon1 = isset($_GET["lon1"]) ? $_GET["lon1"] : "180";
$solrbaseurl = "http://localhost:8080/solr/myindex/select/?";
$q = "latitude:[$lat0 TO $lat1] AND longitude:[$lon0 TO $lon1]";
if (!empty($_GET["q"])) $q = "text:" . $_GET["q"] . " AND " . $q;
// wt=xslt&tr=... asks Solr to apply the KML transform on the way out
$url = $solrbaseurl . "q=" . urlencode($q) . "&rows=$rows&wt=xslt&tr=kml.xsl";
$s = file_get_contents($url);
header('Content-Type: application/vnd.google-earth.kml+xml');
echo $s;

[CAUTION: see below for a note on web server configuration for another necessary step.]
This script was written with a network link in mind, so I decided to pass in latitude and longitude start and finish values using the network link's preferred parameters, but it's up to you how you do it (see the docs here, though they don't make it clear that you can pick your own format for the bounding box parameters). I take them (lat0, lat1, lon0, lon1) and put them into the Solr query so that you end up with something like:

q=text:church+AND+latitude:[lat0+TO+lat1]+AND+longitude:[lon0+TO+lon1]

(the "text:church+AND+" part is only put in if a query term was also specified)
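If it helps to see the moving parts, here's roughly how that URL gets put together, sketched in Python (the base URL and field names just follow the earlier examples):

```python
from urllib.parse import urlencode

# base URL as used in the PHP proxy script
SOLR_BASE = "http://localhost:8080/solr/myindex/select/"

def bounded_query_url(lat0, lat1, lon0, lon1, text=None, rows=100):
    """Build a Solr query URL restricted to a lat/long bounding box."""
    q = "latitude:[%s TO %s] AND longitude:[%s TO %s]" % (lat0, lat1, lon0, lon1)
    if text:
        # the text clause is only ANDed in when a query term is given
        q = "text:%s AND %s" % (text, q)
    return SOLR_BASE + "?" + urlencode({"q": q, "rows": rows})

url = bounded_query_url(50.0, 52.0, -1.5, 0.5, text="church")
```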
So that gets you a set of results within a bounding box, and if you can update it with a new bounding box every time the user's viewport changes it becomes pretty useful. The next thing, then, is the network link itself. It's another KML file, which again I needed to make on the fly so I could call it from a form, and, of course, I had to send it with the right headers, so here's another PHP script:

<?php
// this script creates the KML for a network link, again with the right headers
header('Content-Type: application/vnd.google-earth.kml+xml');
print("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<NetworkLink>
<description>A network link to some results</description>
<Link>
<href>http://localhost/myapp/solrKmlLatlongs.php?q=<?php echo $_GET["q"];?></href>
<viewRefreshMode>onStop</viewRefreshMode>
<viewFormat>lat0=[bboxSouth]&amp;lat1=[bboxNorth]&amp;lon0=[bboxWest]&amp;lon1=[bboxEast]</viewFormat>
</Link>
</NetworkLink>
</kml>

The "Link" element in that KML points at the other file, the one proxying Solr (in this case located at http://localhost/myapp/solrKmlLatlongs.php).
"viewFormat" is the part where you specify how the bounding box parameters are passed into your KML-emitting script. There's other stuff in there you can look up for yourself.
Basically, if you call this script with your Solr query it will chuck out the KML with the network link, which you can view in GE (it will be switched off by default). Then whenever you zoom in or move around, the query will refresh with a new bounding box. For me, with a dataset too big to load all at once, I might start with a text query covering the whole of the UK (no bounding box parameters) but limited to, say, 100 results, which may be scattered all over the map; as you zoom in and shift around, it loads more and more results according to your current view.
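To make the mechanics concrete, what GE does with the viewFormat template is essentially token substitution - [bboxNorth], [bboxSouth], [bboxEast] and [bboxWest] are its built-in tokens, and this little Python sketch (purely illustrative) shows the idea:

```python
# The viewFormat template from the network link KML: each [bbox...]
# token is replaced with the current viewport's edge before GE
# requests the network link URL.
VIEW_FORMAT = "lat0=[bboxSouth]&lat1=[bboxNorth]&lon0=[bboxWest]&lon1=[bboxEast]"

def fill_view_format(template, north, south, east, west):
    """Substitute the current viewport edges into a viewFormat template."""
    return (template.replace("[bboxNorth]", str(north))
                    .replace("[bboxSouth]", str(south))
                    .replace("[bboxEast]", str(east))
                    .replace("[bboxWest]", str(west)))

query = fill_view_format(VIEW_FORMAT, north=52.0, south=50.0, east=0.5, west=-1.5)
# e.g. "lat0=50.0&lat1=52.0&lon0=-1.5&lon1=0.5"
```

Those lat0/lat1/lon0/lon1 pairs are exactly what the Solr proxy script turns into its range query.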
The final thing to note is that it's probably not enough to set the content type in PHP. In my case I needed to add a couple of lines to my Apache config (httpd.conf) so it knew what to do:

AddType application/vnd.google-earth.kml+xml .kml
AddType application/vnd.google-earth.kmz .kmz

I hope this helps.