Fun with Trove Newspapers

The NLA’s Trove Newspapers database is a magnificent resource for digital history, but it’s currently not very easy to do detailed analysis of its content. I’ve been working on a few tools which make this easier. I’d be interested in giving a bit of a how-to session to explain them and the technologies they use, and to kick off a ‘what next’ discussion.

At the base of my tools is a screen-scraper which, in the absence of an official API, retrieves article information in machine-readable form. I’ve used this to create a harvester, which you can use to dump the results of your search to a CSV file for further analysis. It can also retrieve text and PDF versions of the articles. You can read more about the harvester on my blog.
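As a rough illustration of the 'dump to CSV' step, here is a minimal Python sketch. It is not the harvester's actual code: the field names, the `write_articles_csv` function and the sample records are all made up for the example, and the scraping itself is left out.

```python
import csv
import io

# Hypothetical column names; the real harvester's fields may differ.
FIELDS = ["id", "title", "newspaper", "date", "url"]


def write_articles_csv(articles, out):
    """Write an iterable of article dicts to a file-like object as CSV.

    Returns the number of rows written (excluding the header).
    """
    # extrasaction="ignore" drops keys that are not in FIELDS;
    # missing keys are written as empty cells.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for article in articles:
        writer.writerow(article)
        count += 1
    return count


# Made-up records standing in for scraped search results.
sample = [
    {"id": "example-1", "title": "First example article",
     "newspaper": "Example Gazette", "date": "1914-08-05",
     "url": "http://example.org/article/1"},
    {"id": "example-2", "title": "Second example article",
     "newspaper": "Example Times", "date": "1915-04-26"},  # no url
]

buffer = io.StringIO()
rows_written = write_articles_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("results.csv", "w", newline="")`.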

To provide a quick and dirty picture of search results over time, I’ve created another little harvest tool that gets the number of results for each year and calculates what proportion this is of the total articles in Trove for that year. I’ve used this to create a few interactive graphs as a demonstration.
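The per-year calculation can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `yearly_proportions` is hypothetical, the counts are invented, and fetching the real counts from Trove is not shown.

```python
def yearly_proportions(hits_by_year, totals_by_year):
    """Return {year: hits / total} for each year with a non-zero total.

    hits_by_year   -- search results per year, e.g. {1914: 50}
    totals_by_year -- total articles per year, e.g. {1914: 1000}
    """
    proportions = {}
    for year, total in sorted(totals_by_year.items()):
        if total == 0:
            # Skip years with no articles to avoid dividing by zero.
            continue
        # Years with no matching results count as zero hits.
        proportions[year] = hits_by_year.get(year, 0) / total
    return proportions


# Invented counts, purely for illustration.
hits = {1914: 50, 1915: 120}
totals = {1914: 1000, 1915: 1500, 1916: 0, 1917: 2000}

result = yearly_proportions(hits, totals)
for year, share in result.items():
    print(year, f"{share:.1%}")
```

Dividing by the yearly total matters because the number of digitised articles varies from year to year, so raw result counts alone can mislead.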

I’ve also used the scraper to create my own ‘unofficial’ Newspapers API on Google App Engine. I put this up to encourage people to have a play and think about the possibilities.

Depending on people’s interests, I’m happy to walk through the development of these tools as well as describe how to use them.

But I’m also very interested in talking about further possibilities for analysing and visualising the results. I’ve been playing around with VoyeurTools, but I wonder what other ideas people have. What about comparisons with other data sources like Google’s ngrams? What other tools do we need? What other databases should we mine?

And lastly it would be good to talk about how we engage directly with the NLA to build a community of digital researchers and help encourage the development of tools like APIs.

About Tim Sherratt

I'm a digital historian, web developer and cultural data hacker who's been developing online resources relating to archives, museums and history since 1993. I've written on weather, progress and the atomic age, and developed resources including Bright Sparcs, Mapping our Anzacs and The History Wall. I'm currently employed by the National Museum of Australia, as well as being an Adjunct Associate Professor in the Digital Design and Media Arts Research Cluster at the University of Canberra. I was one of the organisers of THATCamp Canberra in 2010.
This entry was posted in Session Proposal.
