Wuthering Hacks

A data dashboard to display Newcastle libraries open data. Currently published at https://newcastle.librarydata.uk

What is it?

Newcastle public libraries publish as much of their data as possible under a Public Domain licence (https://creativecommons.org/publicdomain/zero/1.0/). Details of existing datasets can be found at Libraries data sets.

They also have a GitHub account at ToonLibraries, and an open data repository within this account at library-open-data.

Dashboard pages

The dashboard splits visualisations into pages, focussing on different areas of the library data provided by Newcastle.

Page	Description
Usage	Details of issues, computer use, enquiries, and visits by month and by library
Catalogue	Details on the library catalogue - from titles and items data
Members	Details on membership by postcode area and date joined/active

Data provided

The dashboard uses CSVs published by Newcastle libraries under the Public Domain licence.

Data	Link	Description
Current Libraries	CSV	Location of current Newcastle City Council Libraries along with number of public access computers and Wi-Fi provision
Monthly computer usage	CSV	Monthly computer usage figures by branch for April 2008 to Present
Monthly enquiries	CSV	Monthly enquiry figures by branch for April 2008 to Present
Monthly issues	CSV	Monthly loan figures (number of items issued) by branch for April 2008 to Present
Monthly visits	CSV	Monthly visit figures by branch for April 2008 to Present
Members	CSV	Anonymised member data including postcode district, library registered at, date added and last used
Catalogue	CSV	Extract from the Library Management System (LMS) catalogue
Items	CSV	Items in the Library Management System (LMS) catalogue

The code does not link directly to these files but uses a copy held within the project. This means that updates to those open data files need to be manually copied into this project. See Build section for instructions.

Data definitions

The data that the dashboard uses is converted from the source data, and put into a format that is most efficient for the code to use.

However, the original data can be copied into this project when it is published. The definitions of the datasets used are included below.

Monthly enquiries

Field	Description	Example
Library	The name of the library	Blakelaw
2008-04	The number of enquiries for the month	312

The columns go on to cover each month in the form of YYYY-MM.

Comments

It would be nice to have this dataset with the month in a row, rather than a column header. For example:

Field	Description	Example
Library	The name of the library	Blakelaw
Month	The month	2008-04
Enquiries	The number of enquiries for the month	312

That way the structure would be fixed to three columns and would increase in rows (rather than columns) as new months are added. The same applies to the following datasets on usage.
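As a rough illustration of that reshaping, the sketch below turns the published wide layout into the three-column layout using only the Python standard library. The function name `wide_to_long` and the second month of sample data are made up for the example; it is not code from this project.

```python
import csv
import io


def wide_to_long(wide_rows, value_name):
    """Turn rows with a Library column plus one column per YYYY-MM month
    into one row per library and month."""
    long_rows = []
    for row in wide_rows:
        library = row["Library"]
        for column, value in row.items():
            if column == "Library":
                continue  # every other column is assumed to be a month
            long_rows.append({"Library": library, "Month": column, value_name: value})
    return long_rows


# Small sample in the published (wide) layout; the 2008-05 figure is invented.
sample = "Library,2008-04,2008-05\nBlakelaw,312,298\n"
enquiries = wide_to_long(csv.DictReader(io.StringIO(sample)), "Enquiries")
for row in enquiries:
    print(row)
```

The same function could be reused for issues, visits, and computer usage by changing `value_name`.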

Monthly issues

Field	Description	Example
Library	The name of the library	Blakelaw
2008-04	The number of issues for the month	1048

Monthly visits

Field	Description	Example
Library	The name of the library	Blakelaw
2008-04	The number of visits for the month	1768

Monthly computer usage

Field	Description	Example
Library	The name of the library	Blakelaw
2008-04	The percentage of computer utilisation	50%

Online resources usage

Field	Description	Example
Online Resource	The type of online resource	19th Century British Library Newspapers
Jan-05	The usage figure for the month	300

Membership

Field	Description	Example
Postcode	The postcode district of the member	AB10
Library Registered At	The library the member is registered at	CITY
Date Added	The date the user was added as a member	04/09/15 or 04/08/2005
Time Added	The time the user was added as a member	8:45:00 or Empty
Last Used Date	The date the member last used services	04/09/15
Last Used Time	The time the member last used services	8:45:00
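The Date Added examples above appear with both two-digit and four-digit years, so any script reading this file has to cope with both. The following is a minimal sketch of one way to do that. It assumes the dates are day/month/year (not confirmed by the source), and `parse_member_date` is a hypothetical helper name.

```python
from datetime import datetime

# Assumption: dates are day/month/year, with either a four- or two-digit year.
DATE_FORMATS = ("%d/%m/%Y", "%d/%m/%y")


def parse_member_date(text):
    """Return a date for values like '04/09/15' or '04/08/2005'.

    Returns None when the value is empty or matches neither format.
    """
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next format
    return None


for value in ("04/09/15", "04/08/2005", ""):
    print(repr(value), parse_member_date(value))
```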

Catalogue

Field	Description	Example
rcn	Unique identifier for the title	413396703
isbn	The International Standard Book Number of the title record	9780413396709
publ_y	The year the title was published	1980
author	Main author of the work	Osborne, Charles
title	Main title as on title page or equivalent	W.H. Auden : the life of a poet
price	Price of 1 copy	£0.0
langua	Main language of the work. Note: for most works in English the language is not specified.
editio	Edition or version of the work
class	Main classification allocated by library staff or by the supplier for the title	821AUDE
publisher	Name of the publisher	EYRE METH
firstcopydate	Date the first copy was added. Note: field rarely used.
acpy	Number of copies in stock for that ISBN	1

Comments

It's probably down to the tool used to create the CSV, but the header is on the second row. The first row includes a timestamp.

__ Mon Sep,19 13:33:20 2016,______

Never mind though, it'll be easy enough to ignore. There seems to be a final column on the end that doesn't have any data in. Will ignore that. Pound (£) signs are included in the price column. I don't think there are any other currencies so will remove these and just have a decimal number. The number of copies sometimes seems to have a pipe character (|), maybe some remnant from a MARC field, so will also remove these.
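The clean-up steps described above could look something like the sketch below. It is an illustration only, not the script used by this project: `clean_catalogue` is a hypothetical name, the sample has been cut down to four columns, and the position of the pipe character in the sample is a guess.

```python
import csv
import io


def clean_catalogue(lines):
    """Yield cleaned catalogue rows as dicts from an iterable of CSV lines."""
    reader = csv.reader(lines)
    next(reader)           # first row holds the export timestamp, not data
    header = next(reader)  # the real header is on the second row
    # Drop the trailing column when its header is empty.
    width = len(header) - 1 if header and header[-1].strip() == "" else len(header)
    header = [name.strip() for name in header[:width]]
    for raw in reader:
        row = dict(zip(header, raw[:width]))
        # Remove the pound sign so the price is a plain decimal number.
        row["price"] = row.get("price", "").replace("£", "").strip()
        # Remove any stray pipe characters from the number of copies.
        row["acpy"] = row.get("acpy", "").replace("|", "").strip()
        yield row


# Cut-down sample: timestamp row, header row with an empty final column, one title.
sample = (
    "__ Mon Sep,19 13:33:20 2016,______\n"
    "rcn,isbn,price,acpy,\n"
    "413396703,9780413396709,£0.0,1|,\n"
)
rows = list(clean_catalogue(io.StringIO(sample)))
print(rows[0])
```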

Items

Field	Description	Example
item	A unique ID for the item.	C203255900
rcn	The unique title record (links to the catalogue title data above).	573011680
catego	A category ID for the item.	2
text	Text for the category ID.	ADULT NON FICTION
homebr	An ID for the item branch.	46
name	Name of the item location.	CITY STACK
added	Date and time added to the catalogue.	22/01/2007 14:26
issues current branch	Number of issues at the current branch.	0
issues previous branch	Number of issues at the previous branch.	0
renewals current branch	Number of renewals at the current branch.	0
renewals previous branch	Number of renewals at the previous branch.	0

Combining usage data

To produce a file that is efficient for the dashboard to load, it's worth merging several of the usage files: enquiries, issues, visits, and computer usage. Each of these includes libraries and months, so keeping them separate duplicates a lot of data.

The goal will be to produce a file to be used by the dashboard that looks like the following.

Field	Description	Example
Library	The name of the library	Blakelaw
Month	The month	2008-04
Enquiries	The number of enquiries for the month	312
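Issues	The number of issues for the month	1048
Visits	The number of visits for the month	1768
Computer Usage	The percentage of computer utilisation	50%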

The data is created using a Python script. This is included in the scripts directory of this project and produces one file.

dashboard_usage.csv

This file is then used in the usage page of the data dashboard.
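For readers who want a feel for the merge, here is a minimal standard-library sketch. It is not the script in the scripts directory: the function names `read_wide` and `combine_usage` are hypothetical, and the inline sample stands in for the four published CSVs.

```python
import csv
import io


def read_wide(text):
    """Return {(library, month): value} from a wide CSV with a Library column."""
    values = {}
    for row in csv.DictReader(io.StringIO(text)):
        library = row.pop("Library")
        for month, value in row.items():
            values[(library, month)] = value
    return values


def combine_usage(sources):
    """Merge wide CSVs into one row per library and month.

    sources maps an output column name (e.g. "Issues") to wide CSV text.
    Missing library/month combinations are left as empty strings.
    """
    tables = {name: read_wide(text) for name, text in sources.items()}
    keys = sorted(set().union(*(table.keys() for table in tables.values())))
    return [
        {"Library": library, "Month": month,
         **{name: table.get((library, month), "") for name, table in tables.items()}}
        for library, month in keys
    ]


# One library and one month per file, using the example values from the tables above.
sources = {
    "Enquiries": "Library,2008-04\nBlakelaw,312\n",
    "Issues": "Library,2008-04\nBlakelaw,1048\n",
    "Visits": "Library,2008-04\nBlakelaw,1768\n",
    "Computer Usage": "Library,2008-04\nBlakelaw,50%\n",
}
combined = combine_usage(sources)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Library", "Month", *sources])
writer.writeheader()
writer.writerows(combined)
print(out.getvalue())
```

Keying every value by library and month means each source file can be read independently and in any order before the rows are written out.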

Combining and aggregating catalogue and items

Both the catalogue and item extracts are fairly large files (29MB and 27MB). Given that this project mainly processes data client-side (in the web browser), those files are too large to expect users to download.

We mainly need aggregated data (e.g. x thousand items, x thousand items of a particular category). For this purpose I have created a single aggregated dataset for catalogue and items. This is made smaller by using IDs for category and branch. These lookups are then included as separate exports. The three tables below define the category lookup, the branch lookup, and the aggregated dataset.

Field	Description	Example
CategoryId	An integer ID of the category type.	1
Category	The textual name of the category.	ADULT NON FICTION

Field	Description	Example
BranchId	An integer ID of the branch.	0
Branch	The textual name of the location.	CITY STACK

Field	Description	Example
CategoryId	Derived from the text field in the item data.	1
BranchId	Derived from the name field from the item table.	1
Added	Month the items were added to the catalogue.	2016-01
Count	A count of the number of items.	1
Issues	A count of the number of issues	419757
Renewals	A count of the number of renewals	605263
Price	Taken from the price field of the title data, in this case a total price for the items. Will be in pounds but with no symbol.	462969.67

So, the above example would show that

The data is created using a Python script. This is included in the scripts directory of this project and produces the following files.

dashboard_catalogue.csv
dashboard_catalogue_grouped.csv
dashboard_catalogue_branches.csv
dashboard_catalogue_categories.csv

These files are then used in the catalogue page of the data dashboard.
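The aggregation described in this section could be sketched roughly as follows. This is an illustration under several assumptions, not the project's script: `aggregate_items` is a hypothetical name, IDs are assigned from 0 in order of first appearance, issues and renewals are taken as current branch plus previous branch, and the price in the sample is invented.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_items(items, prices):
    """Group item rows by category, branch, and month added.

    items: dicts keyed by the item field names listed earlier.
    prices: {rcn: price as a float}, e.g. from the cleaned title data.
    Returns (aggregated rows, category lookup, branch lookup).
    """
    categories, branches = {}, {}  # name -> integer ID
    groups = defaultdict(lambda: {"Count": 0, "Issues": 0, "Renewals": 0, "Price": 0.0})
    for item in items:
        category_id = categories.setdefault(item["text"], len(categories))
        branch_id = branches.setdefault(item["name"], len(branches))
        added = datetime.strptime(item["added"], "%d/%m/%Y %H:%M").strftime("%Y-%m")
        group = groups[(category_id, branch_id, added)]
        group["Count"] += 1
        # Assumption: totals combine the current and previous branch figures.
        group["Issues"] += int(item["issues current branch"]) + int(item["issues previous branch"])
        group["Renewals"] += int(item["renewals current branch"]) + int(item["renewals previous branch"])
        group["Price"] += prices.get(item["rcn"], 0.0)
    rows = [
        {"CategoryId": category_id, "BranchId": branch_id, "Added": month, **totals}
        for (category_id, branch_id, month), totals in sorted(groups.items())
    ]
    return rows, categories, branches


# One item based on the example values in the Items table; the price is made up.
items = [{
    "item": "C203255900", "rcn": "573011680", "text": "ADULT NON FICTION",
    "name": "CITY STACK", "added": "22/01/2007 14:26",
    "issues current branch": "0", "issues previous branch": "0",
    "renewals current branch": "0", "renewals previous branch": "0",
}]
rows, categories, branches = aggregate_items(items, {"573011680": 9.99})
print(rows)
print(categories, branches)
```

Replacing the category and branch names with small integers keeps the aggregated file compact, while the two lookups let the dashboard restore the names for display.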

Technologies used and licences

The following technologies (with licences listed) are used in this project.

Technology	Used for	Link	Licence
Bootstrap	To provide the page structure. Currently using version 4 Alpha 6.	Bootstrap	MIT
jQuery	Required by Bootstrap and to provide JavaScript code shortcuts	jQuery	MIT
DC JS	Dimensional Charting JavaScript library - used for the dynamic charts	dc.js	Apache
Crossfilter	Required by DC JS, provides the cross-filtering functionality	Crossfilter	Apache
D3	Required by DC JS, provides the data driven graphs	D3JS	BSD
Leaflet	JavaScript library for mapping.	LeafletJS	Open Source
CartoJS	Specific functions for mapping using data stored in Carto.	CartoJS	Open Source

Licence

This code is licensed under the MIT Licence.
