About Me

Web person at the Imperial War Museum, just completed PhD about digital sustainability in museums (the original motivation for this blog was as my research diary). Posting occasionally, and usually museum tech stuff but prone to stray. I welcome comments if you want to take anything further. These are my opinions and should not be attributed to my employer or anyone else (unless they thought of them too). Twitter: @jottevanger

Monday, November 07, 2011

New IWM websites pt.III: the, um, website

So, on the evening before we switch our new website over from beta to fully live status, I finally get round to the website part of this series of blog posts. In part 1 we did brand, e-commerce and hosting. In part 2, collections and licensing. Here, we'll look in too much detail at building the core website itself. Sorry, it's a long 'un.

Why a new website?
For the last 8 (I think) years, IWM has used BoxUK’s Amaxus CMS to run its websites. Naturally the sites were getting creaky and the CMS itself has been superseded, IWM itself has changed, and so has how the web works – both technically and in terms of the behaviour of web users and the language of interaction that they understand. A clean sweep was in order, which means a variety of strands of work. This much was clear when Carolyn Royston (our head) and Wendy Orr (our Digital Projects Manager, and so lead on the website project) outlined their ambitions to me when I started at IWM in May 2010.

Choosing the platform
Research for the new sites had begun some time before that May but planning really kicked off in June. For me, the first key deliverable was the selection of a technical solution, but although I had a fair idea which way I would go I wanted to know a variety of other things first. How can you choose a CMS without knowing the functional specification, and how can you really know that without settling the information architecture to some degree, and the ways that people will find content and interact with the site? Decisions on whether we’d be supporting a separate mobile site, for instance (we don’t, at least not for now), and our plans for legacy sites all could have an impact. But of course you can only work out so much of this beforehand, and most questions seem to lead to others in a Gordian knot, so in the end you have to assess the situation as best you can, put together your own set of technical priorities, and make your selection as something of a leap of faith. I had the benefit of advice from various knowledgeable people in the sector who told us of their experiences with various CMSs, in particular IMA’s Drupal mage Rob Stein and the V&A’s Richard Morgan and Rich Barrett-Small, and we also had demos of a couple of commercial CMSs. Most importantly, though, we had Monique Szpak, whose role in this project (and my learning process at IWM) really needs a blog post of its own. Her experience with various open source products including Drupal was key, and after we identified that as our preferred solution she built us a proof-of-concept late last year to confirm that Drupal was likely to be able to do what we needed, and to assess the likelihood that Drupal 7, which at that point was still in alpha, would be ready when we needed it. With this information we took an informed gamble that it would be, and the choice was made.

As I already said, we started development work even before settling finally on Drupal, as a piloting project, and this continued whilst we were developing our plans for content, IA and design. There were a number of things we knew we wanted, even if the functionality was still hazy – with Monique’s help we’ve instituted agile practices which positively encourage trial, error, testing and improvement. This change, in fact, together with the development environment we’ve gradually (& painfully) pieced together and the implementation of tools like Jira and Subversion, has been fundamental to making this project work, and it would have been impossible without Monique. Whilst she worked on prototyping more functionality, I did some groundwork on indexing shop and external sites. Then in the spring Toby Bettridge joined us, fresh from working on the Drupal part of the V&A’s new site. He and Monique worked very closely (with the help of Skype) and long before the design work was complete we had basic versions of the taxonomy, events, multi-index search and collections functionality done, amongst other things.
Although I’ve been paying attention to what they do, my hands-on involvement in Drupal development has been pretty much nil and I still understand the CMS far less than I’d like, so anything I say about Drupal development here needs to be read with that in mind! I do get, though, that one picks modules carefully, develops new ones with reserve, and never hacks core. We started developing with Drupal 7 before it was released, and even when it was there were (and remain) quite a lot of modules that weren’t ready to use. We thought the gamble was worthwhile, though, and forged ahead. In time we did incorporate some of them, although unfortunately we still don’t have some of the things promised by e.g. Workbench. Along the way Monique and Toby also did some vital module development of their own, notably a custom collections search module (using Search Api Solr Search), media embedding for authors using IWM’s oEmbed service, entity lists (old-style Drupal nodes had lists, but new-style entities didn’t), and some administrative tools.
My role in development? I’ve often felt somewhat awkward, if I’m honest, about my fit, because having elected to go with an entire technology stack and various development practices that were new to me, I often found I couldn’t really contribute practically, even where I understood some things well. For instance, although I have plenty of experience with Solr, my practical contribution to integrating it with Drupal was negligible; likewise if I knew what was required to fix some HTML/JS/CSS at the front end, I could not implement this in an unfamiliar environment for fear of messing up Drupal or making some Subversion faux pas. I think I’ve made but one single (successful) check-in of Drupal code. I concentrated instead on sorting out development and live hosting, working on getting the collections data right, filling the holes in the spec as we noticed them, and so on. I spent a good while working out how the media streaming worked and how to embed that in our pages, using the DAMS’ web service to build a light-weight SOAP-free alternative (an oEmbed service) that could both serve our websites and potentially 3rd parties. When everything calms down, though, I need to properly get to grips with the codebase.
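For anyone who hasn't met oEmbed, here is a minimal sketch of the kind of JSON response such a service might hand back for a video. The field names (type, version, html, width, height) come from the oEmbed 1.0 specification; the function name, the player URL and the dimensions are made up for illustration and are not taken from our actual service or the DAMS behind it.

```python
import json
from html import escape


def oembed_video_response(player_url, title, width=640, height=360):
    """Build an oEmbed 1.0 'video' response as a plain dict.

    player_url is a hypothetical URL of an embeddable player page;
    the real service and its DAMS back end are not modelled here.
    """
    iframe = (
        f'<iframe src="{escape(player_url, quote=True)}" '
        f'width="{width}" height="{height}" frameborder="0"></iframe>'
    )
    return {
        "type": "video",         # required by the oEmbed spec
        "version": "1.0",        # required by the oEmbed spec
        "title": title,          # optional
        "provider_name": "IWM",  # optional
        "html": iframe,          # required for the video type
        "width": width,          # required for the video type
        "height": height,        # required for the video type
    }


# Hypothetical player URL, purely for illustration.
response = oembed_video_response(
    "https://media.example.org/player/12345", "Example film"
)
print(json.dumps(response, indent=2))
```

The attraction is that a consumer (Drupal, or a third party) only needs an HTTP GET and a JSON parser, with no SOAP client in sight.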

Information architecture, discovery & URLs
Working out the logic for a site that’s going to function for several years is not easy. One can change that logic if necessary, but you really need to know how likely that is to happen in order to give your designers some parameters to work within – how flexible do menus have to be? How directed will the user be, and how much should any one piece of content be located in a specific part of the site? As I said earlier, the brand structure and the 5 IWM branches were a big factor in how we had to organise content, since we needed to make things readily discoverable whatever the user’s journey, but without making them context-free and confusing. Another pair of conflicting priorities were the wish to avoid having too many top-level menu items and the wish to keep the site fairly flat without obliging too many clicks to find content.
Sites like the V&A, which relaunched earlier this year, have taken adventurous routes to delivering masses of content to users (or users to content) – in the V&A’s case, centring around search and introducing a sort of machine-learning to categorise content and indeed to identify what categories might exist. Brave stuff, and a great solution to the huge volume of content they have there.
At IWM we played for a while with the idea of a taxonomy driven site, wondering if we could use a set of taxonomies as facets onto different aspects of the site that would let users cut across a traditional hierarchical organisation of content. We’ve kind of gone with a watered-down version of that, wherein the structure of the content is fairly obvious and on the whole quite flat but we’ve used controlled terms and free tagging to help make things more discoverable to users coming from other angles. This is pretty conventional and at the moment of limited power, but in due course we will make greater efforts to align our taxonomies (in particular our history taxonomy) with the controlled terminology used in our collections. This was too much to do in this phase, but when that happens we should be able to make ever-better connections between our collections and pages like our “Collections In Context” history pages, learning resources, galleries and perhaps events. A learning-focused vocabulary will do the same, but right now our e-learning resources are pretty much non-existent.
Perhaps more important than taxonomy at the moment is search, which has been a key way of integrating content that lies outside our main site. We’ve elected to run 4 separate Solr indexes for this, and to keep them separate owing to the distinctive nature of their content. We have the Drupal index itself; collections data; an index of products extracted from our Cybertill e-shop; and a crawl (using Nutch) of a number of IWM sites that are outside of Drupal, such as blogs and the “Their Past, Your Future” learning resource. The last one needs a lot more work but as a quick-and-dirty way of ensuring that those legacy sites weren’t left out in the cold it works. And yes, a Google custom search engine would have been an alternative but then it would not have worked in the same way as the other searches, with deep integration into Drupal and the ability to treat the results as entities and reuse them elsewhere.
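To make the four-index idea a bit more concrete, here is a rough sketch of sending one search term to each index and keeping the results in separate buckets. It is not how our Drupal module does it: the host, the core names and the helper functions are all assumptions, and only the standard Solr select parameters (q, rows, wt) are real.

```python
from urllib.parse import urlencode

# Hypothetical host and core names, one core per index described above.
SOLR_BASE = "http://localhost:8983/solr"
CORES = ["website", "collections", "shop", "legacy_crawl"]


def build_queries(term, rows=5):
    """Return one Solr select URL per core for the same search term."""
    params = urlencode({"q": term, "rows": rows, "wt": "json"})
    return {core: f"{SOLR_BASE}/{core}/select?{params}" for core in CORES}


def group_results(responses):
    """Keep each index's hits in its own bucket rather than merging them.

    `responses` maps a core name to its parsed Solr JSON response.
    """
    return {
        core: body.get("response", {}).get("docs", [])
        for core, body in responses.items()
    }


urls = build_queries("spitfire")
for core, url in urls.items():
    print(core, url)
```

Keeping the buckets apart means each kind of result (a web page, a collection object, a shop product) can be displayed and ranked on its own terms.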
One obvious change with the new site is that, well, it’s one site. Previously we used a morass of subdomains for somewhat independent branch sites and even for the collections-pages-that-weren’t-collections-search (collections search had its own domain, no less). I for one found it pretty confusing. With the rebrand making the “IWM-ness” of all of our branches more prominent we were able to do the same on the website. I had been through a similar exercise at the Museum of London, and though it was not an identical situation some conundrums and dilemmas were shared by both. How to make it easy to access non-branch specific content and information to all users in the same place as branch-specific content, and how to make sure that people are well aware that the latter pertains only to one physical site? How to cross-promote? Like MoL we had no specific digital brand, nor a mother brand to distinguish particular projects or sites from cross-organisational activities. I hope we found a solution that works for our users and not just for IWM itself, but time (and more user-testing) will tell.
The expectation that we’d move content around and that the organisation of material on the site, as seen by users in menus etc, would not be forever, prompted me to seek a URL structure that was a little more abstract. I didn’t want URL components necessarily to be the same as top-level menu items, which might disappear, but to relate to more stable concepts of what IWM does and offers whilst remaining meaningful. That doesn’t mean permanent URLs but hopefully relatively long-lasting and predictable ones. In one area – collections – we do aim for the URLs to be “permanent”, though (whatever that means). What I tried to do was put what I imagined to be the most stable aspects towards the left hand end of the URL path, things like “corporate” and “visits”, because I envisaged these as being more stable than even branch names (we might get more branches, or rename them again). I also wanted to be able to put non-branch content under these. The result is that we don’t have branch names at the top of a hierarchy but reappearing in a few places – visits/iwm-london as well as events/iwm-london and others. It may seem messy but I hope it’s reasonably predictable all the same, and it means we never need a catch-all URL to cope with the miscellany that we hadn’t foreseen would ever exist outside branches.
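The ordering rule is easier to see in a few lines of code. This is only a sketch of the principle (stable concept first, branch second, page last); the helper names and the press office page are invented, and the set of sections is an assumption beyond the three mentioned in this post.

```python
import re

# "corporate", "visits" and "events" appear in the post; treating them as
# the complete set of top-level sections is an assumption for this sketch.
STABLE_SECTIONS = {"corporate", "visits", "events"}


def slugify(text):
    """Lower-case a label and hyphenate it for use in a URL path."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_path(section, branch=None, page=None):
    """Put the most stable concept leftmost, then the branch, then the page."""
    if section not in STABLE_SECTIONS:
        raise ValueError(f"unknown section: {section}")
    parts = [section]
    if branch:
        parts.append(slugify(branch))
    if page:
        parts.append(slugify(page))
    return "/" + "/".join(parts)


print(build_path("visits", "IWM London"))
print(build_path("events", "IWM London"))
print(build_path("corporate", page="Press Office"))
```

Because the branch is optional, non-branch content sits directly under the same stable section without needing a catch-all path.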

We appointed the Bureau for Visual Affairs, who were responsible for the National Maritime Museum’s new website’s design, to do the same for us. Judge for yourself how they’ve done, although good or bad the credit or blame are not all theirs, even when it comes to aesthetics. Design and content go hand in hand, and in some places we’re still working to improve the latter to make the best of the former. Under the covers, too, the HTML that’s spat onto the page is the result of BVA’s HTML coders’ fine work at one end, Drupal at the other, and the best efforts of our devs to bridge the gap. And sometimes the gap was pretty big.
The theming process was one area where our plans went somewhat awry. We had two experienced Drupal developers on our team, but as there was plenty for them to do in back-end development we were planning on the theming being handled by whoever we appointed as designers. BVA, however, are not a Drupal house but their design was what got us all excited, so we reached an arrangement with a third company to subcontract to BVA to do this part of the work. Having done it once, this is not something I would recommend - at least not unless you can make it very clear who answers to whom and where the line lies between development work, theming, and HTML development (and who pays who for what). We ended up some weeks behind but got back on track with the help of Ed Conolly of http://www.inetdigital.co.uk, who moonlighted as a themer for a few weeks and helped put a spring back in everyone’s step. Bravo Ed!

Early in our content planning we decided what we’d migrate from the old sites (not a lot), what we’d need to keep going (a small set of microsites) and, broadly speaking, what we’d want to add to the new site. Killing off content doesn’t usually sit too well with me, who’s a conservationist and archivist by inclination. My instinct is that it’s sure to be useful to someone to have pretty much everything we’ve ever done remain available, but that’s nonsense really and far from helping people could end up confusing them, not to mention sucking up resources for maintenance that would be much better spent on creating new content of real worth. We did have an awful lot of pages that related to old exhibitions and so on, and were very keen to disentangle ourselves as fully as possible from our old content management system, Amaxus 3. In the end we have kept three or four microsites from that. Other content needed substantial alterations to bring it up to date and suit it to the new site structure.
However, beyond the core, practical information about visits etc., we wanted to do something that would directly serve the core purpose of the IWM: to tell the stories of conflict through the material we hold; and we wanted to do it in a rich, immersive way. BVA came up with a solution that looked lovely, although we went through a few iterations in order to make it easier to create the content and to draw parts of it from the collections middleware. We wanted HTML that could be generated almost automatically, which opens up other potential uses for the template. This took away some of the visual sophistication with which BVA won our hearts, and I suspect that they were a little unhappy to see this go, but this is a site that we want to add to frequently and without having to use HTML developers to do it, so I think we found a happy medium. Our “Collections in Context” (or simply, “history”) section contains over 100 articles at present, using images, audio and video to tell stories spanning from the First World War to the present conflicts in which the UK is involved. They were written by one of IWM’s historians and carefully worked into the CMS by our team in a close collaboration that we hope to turn into a rolling programme of content creation, perhaps reflecting current events or notable anniversaries. I hope in due course we can extend the use of the format to other parts of the site and other voices, perhaps enabling its use as a tool for our website visitors. The people who deserve a shout-out for writing, editing, and/or inputting the hundreds of content pages that make up the new site are New Media’s Jesse Alter and Janice Phillips together with Maggie Hills, who has joined us for a busy few months.
BVA brought a couple of other bits of bling to the site, with the aim of a more engrossing, immersive experience. First amongst these is the “visual browse”, a slideshow mechanism that underlies many of our pages and is brought to the fore by clicking a tab at the top left. We can make any number of these and surface them on the pages where they are relevant – for instance, each branch has its own visual browse.

Is it any good?
When I stand back from whatever details might be preoccupying me on a given day I’m really pleased with the overall effect of what we’ve done, but of course I am not a typical user and what will count will be the feedback we get from our users. But for me, I’m especially pleased with the history pages and the way that our collections are now used there and in the search pages. I am also pretty pleased with the balance we’ve found between the individual branches (essentially, the needs of the physical visitor) and the cross-branch/non-branch activities and content, but because this is necessarily a compromise I expect that it will not work for everyone.
I have reservations too. I think the lack of a mother brand is a problem, and I think we need to make the home page work harder to offer a powerful message of what IWM as a whole is. The lack of fly-out menus is galling to me, although the ones for branches work well. It means more of a leap into the unknown and more clicks to find what you’re after. Our lovely, lovely history content is hard to find. Mobile performance is not that great – the whole site is too wide to load full-width with the text legible, and a ton of stuff loads onto the page that is not important for the mobile user. It functions OK, but it’s far from an optimised experience.
So, my opinions aside, there’s plenty to do over the coming months. But it will feel mighty good to have this milestone out of the way: November 8th – switchover day.

Wednesday, November 02, 2011

New IWM websites pt.II: Collections and licensing

This is the second post about our "big bang" and the things we launched on October 4th. In the first post I talked about the new brand, e-commerce, and hosting. Here I'll talk a bit about collections and the closely related issue of licensing.

The collections
So, now we’re getting more into the area I can talk about more knowledgeably. The opportunity to help reimagine how IWM’s collections are brought to the public through digital media was one of the key attractions that brought me here 18 months ago (though there were several other extremely compelling reasons. I think some of them are still in post). I felt I could bring something useful from my experiences at the Museum of London, having been part of the team that brought an ambitious new system there just before I left.
I hope I’ll be able to write it all up properly soon, but I’ll keep it brief here. The collections online project at IWM had two objectives: firstly to build the foundational infrastructure for all future data-driven collections applications; and secondly to build the first public interface onto that infrastructure with the collections search pages in the new website. Only the bare essentials of the infrastructure were to be built in this phase: purely what was necessary to deliver to the web application and to provide enough of an architecture for us to plug in the planned extra features later on. We have plenty in line for phase 2, but we’ll tie all that stuff in with specific front-end requirements.
Simon Chambers came on board with us to project manage this one and he did an incredible job of marshalling the requirements, prioritising them, working with a number of departments and strong-willed people and getting things as quickly as possible to the point where we could deliver the baseline of what the website needed. We ultimately decided to work with Knowledge Integration, who built the CIIM for us at MoL and who could bring us an existing application that fitted our needs very well.
Essentially the CIIM, at least the part we’ve implemented so far, pulls data from the collections management system (in our case Adlib) and remodels it to serve the needs of discovery and delivery as opposed to data management. These are very different things, and that a CollMS may do the latter very well doesn’t make it ideal for the former. This is as much about how the database is used across the organisation’s varied collections as it is about the technical qualities of, say, Adlib specifically, because this architecture allows us to intervene in the data between its source and front-end applications that use it – to remodel and align it, to integrate it with other data sources or enrich it, to prepare associated media, and to optimise it for full text searching, for instance. The big job since February, when things kicked off in earnest, has been modelling the data correctly. I have to admit I seriously underestimated the complexity of getting this right, and we had a series of problems to do with the API it was extracting from and the readiness of some of the data, but in a way this illustrates why it’s a good thing to be able to do all this work away from the front-end.
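As a toy illustration of what "remodelling for discovery" means, the sketch below flattens a nested, management-oriented record into a flat document suited to a search index. Every field name here is hypothetical (none is taken from Adlib, the CIIM or our Solr schema) and the real modelling is far more involved.

```python
def to_discovery_doc(record):
    """Flatten a nested collections record into a flat search document.

    All field names are hypothetical; they stand in for whatever the
    collections management system and the index actually use.
    """
    makers = [m["name"] for m in record.get("makers", []) if m.get("name")]
    title = record.get("title") or "Untitled"
    description = record.get("description", "")
    return {
        "id": record["object_number"],
        "title": title,
        "makers": makers,
        # A simple flag is easier to filter on than a nested media list.
        "has_media": bool(record.get("media")),
        # One combined field, built up front, for full-text searching.
        "text": " ".join(p for p in [title, description, *makers] if p),
    }


# An invented record, shaped the way a management system might store it.
record = {
    "object_number": "EX 1",
    "title": "Example poster",
    "description": "A recruitment poster.",
    "makers": [{"name": "A. Artist", "role": "artist"}],
    "media": [],
}
doc = to_discovery_doc(record)
print(doc)
```

The point is that this reshaping happens once, between the source system and the index, so the front end never has to understand how the data is managed.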
The result is a first-pass at a Solr index that, along with 3 others, lies at the heart of discovery on our new website. Try out the search engine here, or here are some good searches to get you going. Watch a video. Listen to interviews or momentous radio broadcasts (Czech alert). Oh and if you find an object like this you'll see that it's part of a collection, and can see a complete listing of that collection. Our priorities mean that we’ve deferred implementation of some of the features we want in the longer term but we know we can leap into action soon. In fact, Tom Grinsted, our multimedia manager, has put together a project with UCL and K-Int that gives us a focus for some of this functionality and is just getting underway now, which is rather exciting. Luke Smith and Giv Parvaneh are also busy planning various projects for the next few years as part of the centenary of the First World War that will also draw on and feed into the system. So watch this space.

All that e-commerce work around collections has also meant reviewing the way we license our material. Recent developments at national and European level – notably the creation of the Open Government Licence by the National Archives (TNA) – and the steps that some of our peers have made in offering their assets for creative reuse to the benefit of all, have also had an impact. IWM has now launched its User Licence (essentially the OGL), which frees up almost 200,000 images, audio recordings and films, for non-commercial use. Regular "fair dealing" restrictions apply to others, like this nice Ronald Searle picture. I'm afraid we've not yet got a filter in collections search for items with this licence, but try right-clicking on an image on an item page to see if you can download or embed it.
The licence applies to IWM-generated content and data too, so although we don’t yet have a public API to our collections the data around them is up for grabs. Hopefully I'll have more to report on this before too long.

In the next post I'll talk through the website itself. Stay awake at the back!