Keeping addressbase up to date
AddressBase is released every 6 weeks. We want to keep it up to date on our websites, and locally for writing import scripts.
This should be as simple as:
AWS_PROFILE=<your-dem-club-profile> ./manage.py update_addressbase \
--addressbase-s3-uri='s3://pollingstations.private.data/addressbase/2024-08-06/addressbase_cleaned/AddressBasePlus_FULL_2024-08-06_addressbase_cleaned.csv' \
--uprntocouncil-s3-uri='s3://pollingstations.private.data/addressbase/2024-08-06/uprn-to-council/AddressBasePlus_FULL_2024-08-06_uprn-to-councils.csv'
You can also download the files (e.g. overnight) and then use the --addressbase-path and --uprntocouncil-path flags to import from files on disk:
./manage.py update_addressbase \
--addressbase-path /path/to/addressbase_cleaned.csv \
--uprntocouncil-path /path/to/uprn-to-councils.csv
This assumes you've completed the setup described in the readme, i.e. set up a venv, installed the Python dependencies, made a PostGIS db, run the migrations, imported ONSPD, and imported councils.
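For the download-then-import route, here is a minimal Python sketch (not part of the repository) that maps an S3 URI to a local file name and builds an `aws s3 cp` command for it. The helper names `local_path_for` and `download_command` and the download directory are hypothetical, and it assumes the AWS CLI is installed and configured with the same profile.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_path_for(s3_uri: str, download_dir: str) -> Path:
    """Map an s3:// object URI to a file of the same name inside download_dir."""
    parsed = urlparse(s3_uri)
    # Require a bucket (netloc) and a non-empty object key (path).
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an s3:// object URI: {s3_uri!r}")
    return Path(download_dir) / Path(parsed.path).name


def download_command(s3_uri: str, download_dir: str) -> list:
    """Build the argv for `aws s3 cp <uri> <local file>`; nothing is executed here."""
    return ["aws", "s3", "cp", s3_uri, str(local_path_for(s3_uri, download_dir))]
```

Each command list could be run with `subprocess.run(cmd, check=True)`, and the local paths returned by `local_path_for` are what you would then pass to --addressbase-path and --uprntocouncil-path.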
If someone has already preprocessed the data to create the addressbase_cleaned.csv and uprn-to-councils.csv files, then updating the production database is the same process as for local updates. However, you will need to specify the RDS db as the target with --database principal.
If no one has done the preprocessing, the steps are described in the next section.
At a high level the steps to update addressbase are:
- Note any upcoming (by-)elections: re-importing AddressBase means we will have to re-run their import scripts
- Download new Addressbase Plus
- Preprocess AddressBase to create addressbase_cleaned.csv and uprn-to-councils.csv
- Upload preprocessed files to s3.
- Import on production RDS
- Re-run the import scripts we noted at the start of the process
In a bit more detail:
If we're updating AddressBase when there is live data imported, note anything we already have imported. We're going to need to re-run these import scripts at the end of the process.
- Get latest version from OS Data Hub. This is only possible for DC staff.
- Look in bitwarden for a login or ask someone.
The preprocessing step creates addressbase_cleaned.csv and uprn-to-councils.csv.
- This should be done locally or on a worker instance that isn't serving traffic or replicating from RDS.
- Run python manage.py clean_addressbase_plus folder/with/addressbase/csvs/ to create addressbase_cleaned.csv.
- Run python manage.py create_uprn_council_lookup to create uprn-to-councils.csv.
- Check these work with the update_addressbase command.
- When uploading to S3, follow the existing naming convention:
s3://bucket/addressbase/yyyy-mm-dd/<addressbase_cleaned|uprn-to-council>/<release_name>_<addressbase_cleaned|uprn-to-councils>.csv
- e.g.
s3://pollingstations.private.data/addressbase/2024-08-06/addressbase_cleaned/AddressBasePlus_FULL_2024-08-06_addressbase_cleaned.csv
and
s3://pollingstations.private.data/addressbase/2024-08-06/uprn-to-council/AddressBasePlus_FULL_2024-08-06_uprn-to-councils.csv
- The dates are the release dates, i.e. what's in the file name of the release you downloaded from OS Data Hub.
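The naming convention above can be sketched as a small Python helper that derives both S3 URIs from a release date. This is illustrative only: the function name `addressbase_s3_uris` is hypothetical, and the `AddressBasePlus_FULL` prefix and bucket name are taken from the example URIs, on the assumption that the release name is always that prefix followed by the release date.

```python
from datetime import date

# Bucket and release prefix copied from the example URIs; treat both as assumptions.
BUCKET = "pollingstations.private.data"
RELEASE_PREFIX = "AddressBasePlus_FULL"


def addressbase_s3_uris(release_date: date,
                        release_prefix: str = RELEASE_PREFIX,
                        bucket: str = BUCKET) -> dict:
    """Return the two S3 URIs for a release, following the existing naming convention."""
    stamp = release_date.isoformat()  # yyyy-mm-dd
    release_name = f"{release_prefix}_{stamp}"
    base = f"s3://{bucket}/addressbase/{stamp}"
    return {
        # The folder is "uprn-to-council" but the file ends in "uprn-to-councils",
        # matching the example URIs.
        "addressbase": f"{base}/addressbase_cleaned/{release_name}_addressbase_cleaned.csv",
        "uprntocouncil": f"{base}/uprn-to-council/{release_name}_uprn-to-councils.csv",
    }
```

Calling `addressbase_s3_uris(date(2024, 8, 6))` reproduces the two example URIs, which gives a quick check against typos before uploading.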
- When importing on production RDS, don't forget the --database principal flag:
./manage.py update_addressbase --database principal \
--addressbase-s3-uri='s3://pollingstations.private.data/addressbase/2024-08-06/addressbase_cleaned/AddressBasePlus_FULL_2024-08-06_addressbase_cleaned.csv' \
--uprntocouncil-s3-uri='s3://pollingstations.private.data/addressbase/2024-08-06/uprn-to-council/AddressBasePlus_FULL_2024-08-06_uprn-to-councils.csv'
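Because the local and production invocations differ only in their flags, a small wrapper can make the --database flag harder to forget. The following is a sketch under assumptions, not code from the repository: `update_addressbase_argv` is a hypothetical helper that only builds the argument list, choosing the `-s3-uri` or `-path` variant of each flag from the form of the source string.

```python
from typing import List, Optional


def update_addressbase_argv(addressbase: str,
                            uprntocouncil: str,
                            database: Optional[str] = None) -> List[str]:
    """Build the argv for ./manage.py update_addressbase; nothing is executed here."""

    def source_flag(kind: str, source: str) -> List[str]:
        # s3:// sources use the -s3-uri flags; anything else is treated as a local path.
        suffix = "s3-uri" if source.startswith("s3://") else "path"
        return [f"--{kind}-{suffix}", source]

    argv = ["./manage.py", "update_addressbase"]
    if database:
        # For production this would be "principal", as described above.
        argv += ["--database", database]
    argv += source_flag("addressbase", addressbase)
    argv += source_flag("uprntocouncil", uprntocouncil)
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the project root; passing `database="principal"` corresponds to the production command above.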
SSH in and re-run any import scripts we noted at the start of the process