Keeping addressbase up to date

chris48s edited this page Nov 28, 2024 · 16 revisions

AddressBase is released every 6 weeks. We want to keep it up to date both on our websites and locally, where we use it for writing import scripts.

Updating Addressbase Locally

This should be as simple as:

AWS_PROFILE=<your-dem-club-profile> ./manage.py update_addressbase \
        --addressbase-s3-uri='s3://pollingstations.private.data/addressbase/2024-08-06/addressbase_cleaned/AddressBasePlus_FULL_2024-08-06_addressbase_cleaned.csv' \
        --uprntocouncil-s3-uri='s3://pollingstations.private.data/addressbase/2024-08-06/uprn-to-council/AddressBasePlus_FULL_2024-08-06_uprn-to-councils.csv'

You can also download the files (e.g. overnight) and then use the --addressbase-path and --uprntocouncil-path flags to import from files on disk:

     ./manage.py update_addressbase \
        --addressbase-path /path/to/addressbase_cleaned.csv \
        --uprntocouncil-path /path/to/uprn-to-councils.csv

These commands assume you've completed the setup described in the readme, i.e. set up a venv, installed the Python dependencies, created a PostGIS database, run the migrations, imported ONSPD, and imported councils.
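For the "download first, import from disk" route, the download step could be scripted along the following lines. This is only a sketch: the helper name `s3_download_command` and the destination directory are made up for illustration, and it assumes the AWS CLI (`aws s3 cp`) is installed and that `AWS_PROFILE` is set as in the first example. It only builds and prints the command, it does not run it.

```python
from __future__ import annotations

import shlex
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def s3_download_command(s3_uri: str, dest_dir: str) -> list[str]:
    """Build an `aws s3 cp` argv that copies one S3 object into dest_dir.

    Hypothetical helper, not part of the repository.
    """
    parsed = urlparse(s3_uri)
    key = parsed.path.strip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not an S3 object URI: {s3_uri!r}")
    # Keep the object's own file name so the local copy is easy to recognise.
    local_path = Path(dest_dir) / PurePosixPath(key).name
    return ["aws", "s3", "cp", s3_uri, str(local_path)]


if __name__ == "__main__":
    uri = (
        "s3://pollingstations.private.data/addressbase/2024-08-06/"
        "addressbase_cleaned/AddressBasePlus_FULL_2024-08-06_addressbase_cleaned.csv"
    )
    # Print a copy-pasteable shell command rather than executing it.
    print(shlex.join(s3_download_command(uri, "/tmp/addressbase")))
```

The resulting local paths are what you would then pass to --addressbase-path and --uprntocouncil-path.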

Updating Addressbase in Production

If someone has already preprocessed the data to create the addressbase_cleaned.csv and uprn-to-councils.csv files, the process is the same as for local updates, except that you need to target the RDS database by passing --database principal.

If no one has done the preprocessing yet, follow the steps in the next section.

Updating Addressbase from a new release

At a high level the steps to update addressbase are:

  • Note any upcoming (by-)elections: re-importing AddressBase means we will have to re-run the import scripts for them
  • Download new Addressbase Plus
  • Preprocess Addressbase to create addressbase_cleaned.csv and uprn-to-councils.csv
  • Upload preprocessed files to S3
  • Import on production RDS
  • Re-run the import scripts we noted at the start of the process

In a bit more detail:

Note any upcoming by-elections

If we're updating AddressBase while live data is imported, note which imports are already in place. We will need to re-run those import scripts at the end of the process.

Download new Addressbase Plus

  • Get the latest version from OS Data Hub. This is only possible for DC staff.
  • Look in Bitwarden for a login, or ask someone.

Preprocess

This step creates addressbase_cleaned.csv and uprn-to-councils.csv.

  • This should be done locally or on a worker instance that isn't serving traffic or replicating from RDS.
  • Run python manage.py clean_addressbase_plus folder/with/addressbase/csvs/ to create addressbase_cleaned.csv.
  • Run python manage.py create_uprn_council_lookup to create uprn-to-councils.csv.
  • Check these work with the update_addressbase command.
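Before running update_addressbase against the freshly generated files, a cheap structural pre-check can catch a truncated or malformed CSV early. The sketch below is an assumption-laden illustration rather than part of the repository: `summarise_csv` is a made-up name, UTF-8 encoding is assumed, and because the column layout is not described here it only checks that every row has the same number of fields. It does not replace the real check of running the command.

```python
import csv
from pathlib import Path


def summarise_csv(path):
    """Return (row_count, field_count) for a CSV file.

    Hypothetical helper: raises ValueError if the file is missing, empty,
    or contains rows with differing numbers of fields.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        raise ValueError(f"{p} is missing or empty")
    row_count = 0
    field_count = None
    # Assumption: the files are UTF-8 encoded.
    with p.open(newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if field_count is None:
                # The first row defines the expected width.
                field_count = len(row)
            elif len(row) != field_count:
                raise ValueError(
                    f"{p}: row {row_count + 1} has {len(row)} fields, "
                    f"expected {field_count}"
                )
            row_count += 1
    return row_count, field_count
```

Calling `summarise_csv("addressbase_cleaned.csv")` and comparing the row count with the previous release gives a rough signal that nothing was lost during preprocessing.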

Upload preprocessed files to S3

  • Follow the existing naming convention: s3://bucket/addressbase/yyyy-mm-dd/<addressbase_cleaned|uprn-to-council>/<release_name>_<addressbase_cleaned|uprn-to-councils>.csv (note that the directory is uprn-to-council, while the file suffix is uprn-to-councils)
  • e.g. s3://pollingstations.private.data/addressbase/2024-08-06/addressbase_cleaned/AddressBasePlus_FULL_2024-08-06_addressbase_cleaned.csv and s3://pollingstations.private.data/addressbase/2024-08-06/uprn-to-council/AddressBasePlus_FULL_2024-08-06_uprn-to-councils.csv
  • The dates are the release dates, i.e. the date in the file name of the release you downloaded from OS Data Hub.
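To avoid typos when following the convention above, the two URIs can be derived from the release date. This is a minimal sketch, not existing project code: the function name `addressbase_s3_uris` is hypothetical, and the release name pattern `AddressBasePlus_FULL_<date>` is inferred from the single example above, so check it against the file you actually downloaded.

```python
from datetime import date


def addressbase_s3_uris(release_date, bucket="pollingstations.private.data"):
    """Return the S3 URIs for both preprocessed files of one release.

    Hypothetical helper. Assumes the release name follows the pattern seen
    in the 2024-08-06 example: AddressBasePlus_FULL_<yyyy-mm-dd>.
    """
    day = release_date.isoformat()  # e.g. "2024-08-06"
    release_name = f"AddressBasePlus_FULL_{day}"
    base = f"s3://{bucket}/addressbase/{day}"
    return {
        "addressbase": (
            f"{base}/addressbase_cleaned/{release_name}_addressbase_cleaned.csv"
        ),
        # The directory is "uprn-to-council"; the file suffix has a trailing "s".
        "uprntocouncil": (
            f"{base}/uprn-to-council/{release_name}_uprn-to-councils.csv"
        ),
    }


if __name__ == "__main__":
    for name, uri in addressbase_s3_uris(date(2024, 8, 6)).items():
        print(name, uri)
```

Taking a `date` object rather than a string means the yyyy-mm-dd formatting cannot drift from the convention.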

Import on production RDS

  • Don't forget the --database principal flag.
  • ./manage.py update_addressbase --database principal \
        --addressbase-s3-uri='s3://pollingstations.private.data/addressbase/2024-08-06/addressbase_cleaned/AddressBasePlus_FULL_2024-08-06_addressbase_cleaned.csv' \
        --uprntocouncil-s3-uri='s3://pollingstations.private.data/addressbase/2024-08-06/uprn-to-council/AddressBasePlus_FULL_2024-08-06_uprn-to-councils.csv'
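Since the --database flag is easy to forget, one option is to assemble the production command in a small helper that refuses to build it without a database. The sketch below is illustrative only: `production_update_argv` is a made-up name, and it uses nothing beyond the flags shown in the command above. It prints the command for review instead of running it.

```python
import shlex


def production_update_argv(addressbase_uri, uprntocouncil_uri, *, database):
    """Build the update_addressbase argv for a production import.

    Hypothetical helper: `database` is a required keyword argument so the
    --database flag cannot be left out by accident.
    """
    if not database:
        raise ValueError("a database alias (e.g. 'principal') is required")
    return [
        "./manage.py",
        "update_addressbase",
        "--database",
        database,
        f"--addressbase-s3-uri={addressbase_uri}",
        f"--uprntocouncil-s3-uri={uprntocouncil_uri}",
    ]


if __name__ == "__main__":
    argv = production_update_argv(
        "s3://pollingstations.private.data/addressbase/2024-08-06/"
        "addressbase_cleaned/AddressBasePlus_FULL_2024-08-06_addressbase_cleaned.csv",
        "s3://pollingstations.private.data/addressbase/2024-08-06/"
        "uprn-to-council/AddressBasePlus_FULL_2024-08-06_uprn-to-councils.csv",
        database="principal",
    )
    # shlex.join quotes each argument so the output is safe to paste into a shell.
    print(shlex.join(argv))
```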

Re-run import scripts

SSH in and re-run any import scripts we noted at the start of the process.