Libraries and Open Data #Libraries hacked By Dave Rowe
CILIP blog Posted on 15 August 2016
(NOTE: Dave Rowe is a freelance software engineer and open data/hackathon enthusiast. He currently works part-time in library systems administration for a public library authority.)
From the blog post
“Libraries Hacked is a project to promote library open data and the creative reuse of that data. Open data, and producing tech solutions with that data, has already proved to be of great benefit to public organisations. It is a benefit that can be particularly applied to libraries.
Open Data is data made public with a non-restrictive licence, available to everyone to use for any purpose. The 5-star open data plan provides guidance for open data quality. This ranges from just making stuff available (maybe a PDF - 1 star) to open formats linked to other datasets (linked open data - 5 stars!).
The UK is currently top ranked for open data by the Open Data Barometer, and 2nd by the Open Knowledge Foundation. This is partly due to enthusiasts pushing for more national and local open data, and to how organisations have responded to that demand.
Studies point to wider economic benefit, such as an Open Data Institute (ODI) and Nesta report suggesting a 5 to 10-fold return on investment over 3 years. But for an organisation, benefits can be imagined by asking very simple questions. Are we using data to its full potential? Could it be merged with data from elsewhere? Are those sources open and available? Do we have the time to do all these things or should we let other people have a go? Barriers to a growth in open data have been a suspicion and fear of data requests, associating that process with transparency obligations. However, public and private organisations are realising that data-sharing benefits everyone, and a genuine external demand for their data could be of direct benefit to them.
Bath and North East Somerset Council have an open data policy which states they will open up any data requested of them. The few exemptions are for data that is personal/sensitive, third-party owned/commercial, or that could pose a security risk. It's a policy that has led the community to dictate what they want, requesting real-time car park occupancy, cycling/traffic counts, mapping of green spaces, and much more. Far from being an exercise limited to increased transparency, this has resulted in a process of community engagement, using local data to create solutions to specific problems, as well as informing council policy. In other words, it has led to the emergence of a citizen-led ‘smart city’.
It's easy to enthuse about all this, but where's the link to libraries? Part of the problem is that there rarely is one. The 'smart city' agenda promoted by government has largely ignored libraries, but cities have been ‘smart’ since they've had public libraries. It is odd that a movement to inform citizens about their immediate environment, and enable community solutions, would exclude libraries, where such activity has always been promoted.
Articles often try to envisage a 'library of the future', imagining a changing role of libraries. But lack of library involvement in open data (which can often receive significant funding) is a departure from a historic role in the community, not just a future opportunity. Addressing this is not a suggestion of any change in focus or skills. Open data needs libraries, and existing professional library skills. Local and national open data portals, such as data.gov.uk, are often a chaotic mess with few metadata standards, conflicting structures, poor categorisations and no conventions. They simply need some of these information professional skills.
There is also a lack of open data about public libraries themselves, such as library catalogues, usage data, opening hours, or static and mobile library locations.
The benefits to libraries in offering comprehensive open data are clear. Current national (non-open) library performance data is based on historic opinions on what needs to be measured, and isn’t always successful in collecting that data. But where such measures attempt to collect specific data, an open data strategy simply releases data. It is the public that define the performance of their library. The data released engages citizens and encourages them to form their own questions of it. Rather than being told answers to questions they may not agree with in the first place, it allows individuals to produce their own insight.
With a lack of official open data, library-related data can still appear in many places. Ian Anstice releases regular news posts on Public Libraries News (PLN). Looking at these it's easy to see them as a dataset, with regular structure: local news by authority, changes, international news. A Libraries Hacked project queries PLN for new posts each night and extracts the individual stories. The data is then embedded into the sidebar of the PLN site as an interactive map to show the spread of news across the UK.”
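The nightly extraction described above could be sketched roughly as follows. This is only an illustration, not the Libraries Hacked code: the post does not describe how PLN is queried or how stories are laid out, so the inline sample feed, the "Authority: story" convention separated by "|", and the function names `extract_stories` and `count_by_authority` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed. The real PLN feed layout is not given in the post,
# so this structure is an assumption made purely for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example library news</title>
    <item>
      <title>Daily round-up</title>
      <pubDate>Mon, 15 Aug 2016 08:00:00 GMT</pubDate>
      <description>Bath: new opening hours announced. | Leeds: mobile library route changed.</description>
    </item>
  </channel>
</rss>"""


def extract_stories(feed_xml):
    """Split each feed item into individual stories keyed by authority."""
    root = ET.fromstring(feed_xml)
    stories = []
    for item in root.iter("item"):
        published = item.findtext("pubDate", default="")
        description = item.findtext("description", default="")
        # Assumed convention: stories separated by "|", each "Authority: text".
        for entry in description.split("|"):
            authority, separator, text = entry.partition(":")
            if not separator:
                continue  # skip fragments that do not name an authority
            stories.append({
                "authority": authority.strip(),
                "story": text.strip(),
                "published": published,
            })
    return stories


def count_by_authority(stories):
    """Count stories per authority, e.g. to size markers on a map."""
    counts = {}
    for story in stories:
        counts[story["authority"]] = counts.get(story["authority"], 0) + 1
    return counts


if __name__ == "__main__":
    for story in extract_stories(SAMPLE_FEED):
        print(story["authority"], "-", story["story"])
```

A per-authority count like this is the kind of summary an interactive map would need; geocoding each authority and rendering the map are left out of the sketch.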
(statutory* and non-statutory) as on 1 July 2016.
Published by Department for Culture, Media and Sport. Licensed under the Open Government Licence.
Also includes libraries that were temporarily closed on that date, e.g. for refurbishment. Publication of this basic dataset provides a definitive source of data on public libraries in England that everyone can point to and use.
This dataset will form the basis of a wider core dataset for public libraries in England. Building on this initial exercise, we will look at existing data collection (largely focused on inputs and outputs) and also aim to capture data which covers outcomes and impacts and the wider variety of activities libraries undertake. This dataset has been validated by all 151 library services in England, but handling this quantity of data means the occasional error is possible. If you spot any errors or missing information, please contact firstname.lastname@example.org.
From the LIS_PUB_LIB listserv, 22 July 2017
Library and Information Officer
“In Newcastle Libraries (Newcastle upon Tyne), we have been publishing some information as open data since March 2016. The aim is to publish everything we possibly can, though we have started the easy way with things we already collate, such as:
• visitor figures
• number of enquiries
• number of loans
• PC usage
• online resources usage
• list of titles in the catalogue
• list of items in the catalogue
• locations of libraries
You can see the full detail on our open data page. We currently publish our open data “with metadata” on the Data Mill North (where it all looks nice and tidy), and the same plus some “untidy” data on GitHub (where the explanation of what the terms used in the data set mean is missing - simply because we haven't yet had time to write the metadata to go with it). If you visit our GitHub repository you will see some of the other data sets we're looking into but haven't put as much work into yet: library opening hours over the years, comments about our events from feedback forms, information about members (while obviously being mindful of protecting our users' personal information)… I would also be interested in publishing information about staffing levels and budgets - though as you can guess this may not be possible, or only in a limited way.
If you look for libraries data sets on the Data Mill North you will see there is also data from Leeds and Durham - if there is anyone from those library authorities on this list, I would be curious to know how much input you have had into your data being published there.
We have not had any problems with ownership of our data - our suppliers have said that the data belongs to us (though they might charge if we need their help getting it out of their systems!). At the moment the retrieval and publication of data is done manually rather than being automated - I'm not sure if that is what Helen Leech meant about the situation in Surrey?
My colleague Luke Burton and I are also interested in some kind of “standard” for how we publish our data sets (I think the technical people call it a schema). If several other library authorities start publishing data, can we make sure what we publish is easily comparable? Can we all publish the same things in the same way, so when a new library service gets on board they can simply replicate what we have all agreed to do (e.g. same columns on the spreadsheet in the same order, counting the same thing) by using some kind of template and copying over the metadata (so they don't have to do that work when others have done it already)? If any of you are interested in this, Luke and I would be keen for you to get in touch - Luke can be contacted on email@example.com and I can be reached on firstname.lastname@example.org.”
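As a rough illustration of the "template" idea in the message above, a shared column list can be checked automatically before a file is published. This is a minimal sketch under stated assumptions: the column names in `LOANS_TEMPLATE`, the function name `check_against_template`, and the sample rows are all hypothetical and are not an agreed standard or real figures from any library service.

```python
import csv
import io

# Hypothetical agreed template: same columns, in the same order, for every
# authority. These column names are assumptions made for the example.
LOANS_TEMPLATE = ["library", "month", "loans"]


def check_against_template(csv_text, template):
    """Return a list of problems found when comparing a CSV to the template."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    problems = []
    if header != template:
        problems.append(
            f"header {header} does not match template {template}"
        )
    # Data rows start on line 2 of the file, after the header.
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(template):
            problems.append(
                f"line {line_number}: expected {len(template)} values, "
                f"found {len(row)}"
            )
    return problems


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    good = "library,month,loans\nExample Library,2016-03,1200\n"
    bad = "month,library,loans\n2016-03,Example Library\n"
    print(check_against_template(good, LOANS_TEMPLATE))  # no problems
    print(check_against_template(bad, LOANS_TEMPLATE))   # two problems
```

A check like this only covers layout (column names, order, and row width). Agreeing that everyone is "counting the same thing" still depends on shared written definitions in the metadata.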