About Me

Web person at the Imperial War Museum, just completed PhD about digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, and usually museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Thursday, August 20, 2009

The great escape

Well today I don't feel like moaning. Pretty fecking remarkable, huh? Stuff went pretty well, we're close to finishing a very important stage in the Collections Online project, I unbroke some things earlier this week so I could get on with some actual work, I talked with a curator about an exciting project that's still far enough in the future that we can dream big dreams and not worry about the inevitable slap in the face that reality will give us...
On top of all that I managed to find a few minutes to do some development, which is pretty good by current standards. One thing I wanted to do was simply make a map link from an object record in our Solr index. Now, Solr URLs have their reserved characters as well as normal URL escaping. XSL, too, with which I transform the Solr output, likes escaped characters. Google Maps URLs, of the sort that you make to overlay KML on a map, well, of course they also require characters in the KML URL parameter to be escaped. The end result is a URL for a map with overlay that looks something like this:


Ugly, huh? [BTW, once I put the new multicore index up this URL won't work]
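To make the layering concrete, here's a minimal Python sketch of how a Solr query ends up escaped twice. The original work was XSLT and classic ASP, so this is only an illustration. The query is escaped once as a parameter of the Solr URL, and then the whole Solr URL is escaped again as the `q` parameter of the Google Maps URL. The host names, the example query and the `maps.google.com/maps?q=<KML URL>` form are assumptions. The `wt=xsl&tr=kml.xsl` transform is the one used for our Solr KML output.

```python
from urllib.parse import quote, urlencode

# Hypothetical endpoints: neither host is the museum's real address.
SOLR_SELECT = "http://solr.example.org/solr/select"
GMAPS = "http://maps.google.com/maps"


def solr_kml_url(query: str, rows: int = 10, start: int = 0) -> str:
    """Layer 1: escape the Solr query as a parameter value of the Solr URL."""
    params = {"q": query, "rows": rows, "start": start, "wt": "xsl", "tr": "kml.xsl"}
    return SOLR_SELECT + "?" + urlencode(params)


def gmaps_overlay_url(kml_url: str) -> str:
    """Layer 2: escape the entire Solr URL again as the map's q parameter."""
    return GMAPS + "?q=" + quote(kml_url, safe="")


inner = solr_kml_url('place:"Tower Hill"')
outer = gmaps_overlay_url(inner)
print(inner)
print(outer)
```

The `%3A` in the inner URL becomes `%253A` in the outer one. Add XSL's own escaping on top and it's easy to see how the output gets mangled.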

Escape, escape, escape, and I've had plenty of fun and games in the past trying to escape stuff in XSL the way I want it without XSL then re-escaping or unescaping or otherwise ballsing up the output, so this time I thought, sod this, I'll just make a page to take in a nice simple set of parameters and redirect to the map. This makes it a whole lot easier to write the links in XSLT without worrying so much about the escape nightmare. A link like:

[the "+" can be "%20" instead]

I don't know how much time I saved but I know it only took 5 minutes. It takes in a Solr query, record count and start index, escapes characters as befits GMaps KML URLs, and inserts them into a Solr query URL (including the KML transform bit, of course: wt=xsl&tr=kml.xsl, in our case). This is put into the GMaps URL and we do the response.redirect (yes, it's classic ASP). It's brittle: it will break if the GMaps URL format changes, or if the Solr URL or output format changes; but hey, it's simple and works (for now).
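The redirect page can be sketched like this, using Python/WSGI in place of the classic ASP actually used. It reads a query, record count and start index, builds the Solr KML URL, escapes it for Google Maps, and answers with a 302 redirect. Several details are assumptions: the parameter names `q`, `rows` and `start`, their defaults, the host names, and the Maps URL form.

```python
from urllib.parse import parse_qs, quote, urlencode

# Hypothetical endpoints; substitute the real Solr core and map service.
SOLR_SELECT = "http://solr.example.org/solr/select"
GMAPS = "http://maps.google.com/maps"


def build_map_url(query: str, rows: int, start: int) -> str:
    """Build the Solr KML URL, then escape it as the Google Maps q parameter."""
    solr = SOLR_SELECT + "?" + urlencode(
        {"q": query, "rows": rows, "start": start, "wt": "xsl", "tr": "kml.xsl"}
    )
    return GMAPS + "?q=" + quote(solr, safe="")


def app(environ, start_response):
    """WSGI app: take simple parameters and redirect to the map."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", ["*:*"])[0]  # assumed default: match everything
    try:
        rows = int(params.get("rows", ["10"])[0])
        start = int(params.get("start", ["0"])[0])
    except ValueError:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"rows and start must be integers"]
    start_response("302 Found", [("Location", build_map_url(query, rows, start))])
    return [b""]
```

Any WSGI server can host `app`, e.g. `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`. `parse_qs` decodes both `+` and `%20` to a space, so either form works in the incoming link.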
Side benefits
It was only after making the script for these pragmatic reasons that I realised that having such a page is, of course, good for several other reasons, including:
  • it will give us stats on people following the map links
  • that same brittleness is more of a problem if I'm making links like this in lots of scripts and transformations around the site. This way I only need to point all similar links to one script and change that
  • if I decide to scrap Google Maps and use, say, OpenStreetMap, or if I want to get my KML from somewhere else, again, one script to change

I will probably add a couple of other parameters but don't want to make it heavy. Specifying the data source is one (other than Solr we can get KML out of, for example, our publications database); specifying the target service is another, so that we could use GMaps, OSM, Yahoo! and so on. Shit, anything but Streetmap (how much do you not miss having to use that piece of crap? Best thing about the last few years in mapping is the fact that you never see that anymore).

[edit 21/8/2009]

I've done some further work this morning, along the lines suggested above. It now takes in a data source and a target service parameter (though the latter only works for GMaps at present), which means I can pull in the publications KML and may start getting MOLA sites by site code too. Much more flexible now, and a single point for all map requests is going to be handy. More work to do to use more powerful aspects of pubs search.
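With a data source and a target service parameter, the script boils down to two lookup tables. That is what makes it the single point of change for all map links. Here's a minimal sketch. It assumes hypothetical KML endpoints for Solr and the publications database, and assumes that Google Maps accepts a KML URL as its `q` parameter. Only Google Maps is wired up, matching the current state of the script.

```python
from urllib.parse import quote, urlencode

# Hypothetical KML endpoints for each data source.
KML_SOURCES = {
    "solr": lambda q: "http://solr.example.org/solr/select?"
    + urlencode({"q": q, "wt": "xsl", "tr": "kml.xsl"}),
    "pubs": lambda q: "http://pubs.example.org/search.kml?" + urlencode({"q": q}),
}

# Map services that can overlay a KML URL; only Google Maps so far.
MAP_TARGETS = {
    "gmaps": lambda kml: "http://maps.google.com/maps?q=" + quote(kml, safe=""),
}


def map_url(source: str, target: str, query: str) -> str:
    """Look up the KML source and the map service, then combine them."""
    try:
        kml_for = KML_SOURCES[source]
    except KeyError:
        raise ValueError(f"unknown data source: {source!r}") from None
    try:
        overlay = MAP_TARGETS[target]
    except KeyError:
        raise ValueError(f"unsupported map service: {target!r}") from None
    return overlay(kml_for(query))
```

Adding OpenStreetMap or another KML source later would mean adding one entry to a table, leaving the links elsewhere on the site untouched.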

It may seem odd to blog about a 5 minute job when I've been doing much more challenging and complex things that take months, but it's very satisfying when it works so quickly, plus my belated realisation of the useful side effects made me think it was worth talking about. Here's the salient part of the code as it now stands, for interest.

Here's a link to the new script, looking at publications data:


Follow that and see the GMaps URL I no longer have to write!

Saturday, August 15, 2009


So I've got into using bit.ly for my short links, particularly on Twitter. I'm sure I don't need to explain why, but aside from these serving the obvious need for brevity within a tweet (but never here...), I appreciate the stats, which appear in real time at minute-scale granularity and so are in some ways clearly superior to what you get from Google Analytics. Here's an example: http://bit.ly/info/3NuA3P . I'm going to talk more about the stats later on but before that a brief digression about Delicious, which I think will repeat some of what I saw in a post by Tony Hirst recently, but it's been brewing so gotta get it out.
What's wong wiv bit.ly and Delicious
The problem with bit.ly is that the things I want to tweet I typically also want to bookmark using Delicious, for whilst bit.ly keeps hold of your tasty links it's not got tagging (and why not, I wonder? That would make it a much more useful and social service). It got to the point where I was wondering why Delicious wasn't offering an integrated short URL service, since right now if you want the full benefits of online/social bookmarking and short URLs neither bit.ly nor Delicious cuts it. Or should I say, right then, since about a week ago (and within a week of my tweeting my bemusement that Delicious wasn't doing this), it did. Bookmark with Delicious now and you get the option to share your link, which produces a short URL. Cool, and yet... it's not good enough for me. You can only do it by letting Delicious e-mail or tweet the links for you, at which point you see the short URL. But you can't simply view the short code immediately so that you can cut-n-paste at will. Delicious should create one for every single bookmark, with the option of custom links. To do what bit.ly does and tempt me away from it, it must also create unique URLs for each person's version of a link, so that each can be tracked individually (together with the shared one for aggregate data), and it must offer decent stats.
So, that's why Delicious isn't up to snuff yet for me to jump ship from bit.ly, even if that would mean just one operation for tweeting and bookmarking my fave URLs. Hmm, come to think of it perhaps bit.ly could offer OPML output or some simple export or integration with Delicious so you could just synchronise periodically? That might keep me using both services happily. I should say, I'm perfectly aware that there are alternatives to Delicious and that some of them offer better integration with Twitter, but that's my chosen poison and with social stuff the size of the network is vital to its gravity; ain't no bookmarking service with more gravity than Delicious.
Who follows?
But what about those stats about link followers that bit.ly offers? Let's dig into them. What do they really tell us?
I started to get suspicious that so many of the followers of links I tweeted were from the US, and often at a time when normal people would be a-bed across the Atlantic. The real-time stats showed that they were also very quick off the mark, and whilst the streaming nature of Twitter means that you expect responses to be quick or not at all, sometimes the click-throughs seemed to come even before I tweeted (via Spaz, the Air client I normally use). Super-quick, US-based (which only a few of my followers are), and very steady numbers for most tweeted links; were these clicks from real people at all?
Short answer
No, lots of them weren't; a steady residue of link follows came from bots of one sort or another.
Long answer: an experiment
I did a couple of experiments to test this. First I made a page on my own web space just fo' the bots, made a bit.ly link to it, and tweeted it asking humans NOT to click the link. This being such a highly scientific experiment, I should here state that an explicit assumption was that my followers deem themselves to be human (though I know this was violated occasionally, including by myself. Doh!). I hoped that the stats from this web page would tell me how many of the click-throughs shown in the bit.ly stats were via browsers and how many were bots. Well, yes and no. I forgot how lame the stats on that web space are: no breakdown by day, nor details of users by page, only figures for the whole site. Nevertheless I get minimal visits to those pages and there was a massive peak on the day of that tweet, so I can probably tell enough. All the same I thought I'd better try Google Analytics too, so having set that up for my site I repeated my tweet. Then I thought, perhaps I should have used a new link? So I created a custom name for my URL and tweeted once again.
Some numbers
Of 17 link follows reported by bit.ly (http://bit.ly/info/3wQZ51), 15 were "direct", which would include bots but also most other applications, e-mail clients etc. One was from bit.ly itself (that was me, oops) and one from tweetdeck (Mike, that you?). 10 were from the UK and 7 from the US, and pretty much all of them happened within seconds or minutes of my tweets.
My original "for bots only" tweet on the 8th yielded 6 of the follows, plus that accidental click from me. According to the PlusNet web stats package I had 51 hits and 22 visits that day, almost exclusively to that page (with some other pollution from yours truly, no doubt, as I clicked round the site setting stuff up). I guess that means that once they'd found the page via the bit.ly link, some of the followers came back a few times. Now that's definitely not human.
Once I had my Google Analytics bit sorted out, on the 11th, I sent a second tweet. By bit.ly's stats, I think this resulted in one visit. This tweet used the original bit.ly short URL (http://bit.ly/3wQZ51), so presumably the bots figured it wasn't worth going there again. Looking back at the tweet, actually, I think I left the "http://" off, so perhaps that's the real answer.
Anyway finally I did the same thing again but using a new custom link (http://bit.ly/bottest), which to the bots would appear to be a new link (all except bit.ly's own bots, perhaps?). This produced another 7 follows, and the next day there were two more when I wasn't watching. Google Analytics reported one visit to the target page, from a Firefox/Windows user in south London, so I presume that one of those 10 follows was via a browser. According to PlusNet, there were 28 hits and 15 visits on the 11th (1 visit is more normal).
So how many of the visits were bots? Well, putting GA together with bit.ly's stats I'd say only 1 out of 10 follows on the 11th/12th was not a bot, though it's possible that others were human users who just didn't fire the GA code for one reason or another.
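Tallying the follows described above shows how the 17 add up and where the "1 out of 10" figure comes from. The counts are the ones reported in this post. Treating the single Google Analytics visit as the only human follow is the same assumption as in the text. Because GA may have missed some humans, 90% is an upper bound on the bot share.

```python
# Follow counts as reported by bit.ly in this experiment.
follows = {
    "8th: original tweet": 6,
    "8th: author's own click": 1,
    "11th: repeat tweet, same link": 1,
    "11th: custom link": 7,
    "12th: custom link": 2,
}

total = sum(follows.values())
# Follows after Google Analytics was set up (the 11th and 12th).
later = sum(v for k, v in follows.items() if not k.startswith("8th"))
ga_human_visits = 1  # the single Firefox/Windows visit logged by GA
bot_share = (later - ga_human_visits) / later

print(total, later, f"{bot_share:.0%}")  # prints: 17 10 90%
```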
Overall, in August so far 13 hits in the web logs are attributed to the bitlybot user agent, 4 to the Tweetmemebot, 2 to twitturls.com's bot, and 4 to Spaz, which (as a user of it, I know) makes requests to something or other (bit.ly, or perhaps the target URL) to get some page info. A bunch of other bots and non-browser UAs are in there too but I can't say if they're related to the tweets.
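Counting user agents like this can be scripted against a raw access log instead of relying on the host's stats package. Here's a sketch. It assumes the log is in combined log format, with the user agent as the last quoted field. It also assumes that these lowercase substrings identify each client; the exact strings the bots actually send may differ.

```python
import re
from collections import Counter

# Assumed User-Agent substrings (lowercase) mapped to a readable label.
KNOWN_AGENTS = {
    "bitlybot": "bit.ly",
    "tweetmemebot": "Tweetmeme",
    "twitturls": "twitturls.com",
    "spaz": "Spaz",
}

# The user agent is the last double-quoted field in combined log format.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')


def classify(log_lines):
    """Count log lines per known client; everything else is 'other'."""
    counts = Counter()
    for line in log_lines:
        m = UA_AT_END.search(line)
        ua = m.group(1).lower() if m else ""
        label = next(
            (name for key, name in KNOWN_AGENTS.items() if key in ua), "other"
        )
        counts[label] += 1
    return counts
```

Anything that lands in "other" still needs a human eye, since plenty of non-browser agents are unrelated to tweeted links.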
I don't think I can squeeze much more from this paltry sample and the crappy and contradictory web log stats, but clearly nearly all of the visits via Twitter/bit.ly were, as I hoped, not from humans and most likely came from bit.ly's own bot and those of Tweetmeme and Twitturls. From this evidence, if bit.ly reports that I get half a dozen "clicks" on a short URL I've tweeted then I can assume they're probably bots. More than that and they're probably at least partly human. Whether this applies to other twitterers I can't say, but you can do your own experiments. I'd like to repeat this using a site with a better stats package as bait, and perhaps using a few different twitterers to throw out the link, to see whether there's any relationship between numbers of followers and numbers of bots. Quite likely not, but who knows.
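The rule of thumb can be written down as a tiny function. The baseline of six bot follows per tweeted link comes from this one small experiment. For anyone else's account it is an assumption, not a measured figure.

```python
def interpret_clicks(reported: int, bot_baseline: int = 6) -> tuple[str, int]:
    """Apply the half-a-dozen rule of thumb to a bit.ly click count.

    bot_baseline is roughly how many follows bots alone produced per tweeted
    link in this experiment; it may well differ for other accounts.
    Returns a verdict and the number of clicks above the baseline.
    """
    excess = max(0, reported - bot_baseline)
    verdict = "probably all bots" if excess == 0 else "probably partly human"
    return verdict, excess
```

For example, `interpret_clicks(15)` gives `("probably partly human", 9)`, while six or fewer clicks are written off as bots.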
Is this any use? I dunno, but I'm a little better informed about the impact of my tweeted URLs now.

[[edit: ironically, looking for the custom link to this post that I made at the weekend, I found their user forums where there's more discussion of the bot problem e.g. http://feedback.bit.ly/pages/5239-suggestions/suggestions/126917-show-me-if-hits-are-bots-human-or-rss-readers-etc-]]

Tuesday, August 04, 2009

Spinal Muscular Atrophy links to act on

Yesterday I heard about three separate activities concerning Spinal Muscular Atrophy. This is a nasty disease that kills more infants than any other genetic disorder and in its milder forms leads to varying degrees of disability or threats to mortality. Recently you may have heard about one prominent sufferer of SMA, Baroness Campbell, a commissioner on the Equality and Human Rights Commission (good profile/interview in the Guardian).
  • There is currently a petition on the Number 10 website seeking more funds for research into SMA, for whilst there are according to Wikipedia various "cures" being trialled there's nothing realistic in the offing. If you feel that, amongst the many competing claims on your taxes, this is a worthy cause, I'd urge you to sign up.
  • Secondly, the Jennifer Trust is a huge boon to sufferers of SMA and their families and is currently on the shortlist for a National Lottery Award for Best Health Project, for the quality of its outreach programme. As well as more exposure, the award brings a little cash, which would be nice, and of course recognition for the wonderful people who do this work. Again, there are other laudable health projects on the shortlist but we'd love your support in the form of your vote!
  • Finally, I found out that my colleague Adam Monnery (acting head of IT) is doing a charity triathlon which is raising money for a variety of charities, including those supporting ill and disabled children (see the Tri For Life site for more info). Here's his team's Just Giving page. [As an aside, I have to say it perplexes me that the '000s of charities in the UK haven't come up with their own alternative to JG (which takes a slice of the donation), but perhaps the economic benefits aren't worth it.]
Pardon the naked self-interest in this off-topic post, but frankly it's more important than any of the digital heritage stuff I'd usually put up! Thanks for reading.