Woaq joiner #21

Open
wants to merge 25 commits into
base: master
4 changes: 4 additions & 0 deletions .gitignore
@@ -4,3 +4,7 @@ shifts/*
*.txt
yml_template/*
!requirements.txt
env/*
.idea/*

*.iml
27 changes: 14 additions & 13 deletions README.md
@@ -22,26 +22,27 @@ This data was collected in a series of surveyor shifts in which a surveyor colle

### Generating the files

Simply run `python get_shifts.py`. It will create a directory called "shifts" and then save a bunch of CSV files in it.

```bash
docker-compose up
The name of each CSV file designates the device name (A, B, etc) and the expected range of readings, for reference. Columns include timestamp, lat/long, the filter size used on this shift, and PM (particulate matter) reading.

# creates a directory called 'shifts' and writes CSVs to it
python scripts/get_shifts.py
```
Orphaned PM and GPS data is *not* included. That is, only readings that contain both PM and lat/long will be present in these files.

### Joining Air Quality Data with GPS Data

The name of each CSV file designates the device name (A, B, etc) and the expected range of readings, for reference. Columns include timestamp, lat/long, the filter size used on this shift, and PM (particulate matter) reading.
The `files/joiner.py` script joins air quality data with GPS data. The air quality data should be a CSV, as produced by a DustTrak II device (see examples/8530C_2-5_002.csv). The GPS data should be a log file containing NMEA sentences (see examples/GPS_20140717_193858_8530C.log). Additionally, an empty file should be passed that will become the output CSV file. If necessary, this can be obtained via a getter function called on the object (`.getFile()`). The steps listed below for obtaining a file will also work.
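
As a rough illustration of what reading the GPS log involves, the sketch below parses a single NMEA GGA sentence into a UTC time and decimal-degree coordinates. It is an assumption-laden example, not the code in joiner.py: `parse_gga` and `nmea_to_decimal` are hypothetical helper names, the sample sentence is a generic GGA illustration rather than a line from the example log, and checksum validation is left out.

```python
from datetime import time


def nmea_to_decimal(value, hemisphere):
    """Convert an NMEA ddmm.mmm / dddmm.mmm field to signed decimal degrees."""
    dot = value.index(".")
    degrees = int(value[:dot - 2])    # everything before the two minute digits
    minutes = float(value[dot - 2:])  # mm.mmm
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence):
    """Return (utc_time, latitude, longitude) from one GGA sentence.

    Hypothetical helper for illustration; the checksum is not verified.
    """
    body = sentence.strip().lstrip("$").split("*")[0]
    fields = body.split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    hhmmss = fields[1]
    utc = time(int(hhmmss[0:2]), int(hhmmss[2:4]), int(hhmmss[4:6]))
    lat = nmea_to_decimal(fields[2], fields[3])
    lon = nmea_to_decimal(fields[4], fields[5])
    return utc, lat, lon


# Generic sample sentence (an assumption, not from the example log file).
sample = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
print(parse_gga(sample))
```

Real logs usually mix several sentence types, so a reader of the log would skip any line that raises `ValueError` here.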

Orphaned PM and GPS data is *not* included. That is, only readings that contain both PM and lat/long will be present in these files.
The output of the script is a CSV file containing both GPS and air quality data (see examples/joiner-output.csv).

The script can be run as `files/joiner.py --aq <air-quality-file.csv> --gps <gps-file.log> --out <output.csv> --tolerance 1 --filter 2.5`

The command line options are as follows:

- `-a`/`--aq` : the path to the CSV file containing air quality data
- `-g`/`--gps` : the path to the log file containing GPS data
- `-o`/`--out` : the path to where the output file will be created
- `-t`/`--tolerance` : the maximum difference (in seconds) between an air quality datum and a GPS datum for the two to be joined. For example, if there is an air quality datum at 12:00:00 and the closest GPS datum is at 12:00:02, the output will include a row combining them if the value of `-t` is >= 2; otherwise that air quality datum is dropped.
- `-f`/`--filter` : the size of the filter used to collect the air quality data (2.5 is most common; 10 may also be used). Often this is embedded in the filename, e.g. for 8530C_2-5_002.csv the filter size was 2.5.
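
The tolerance rule described for `-t` can be sketched as a nearest-timestamp match. This is a minimal illustration under stated assumptions, not the implementation in joiner.py: `join_with_tolerance` is a hypothetical name, rows are plain tuples, the GPS rows are assumed to be sorted by time, and the readings are made-up values.

```python
from bisect import bisect_left
from datetime import datetime


def join_with_tolerance(aq_rows, gps_rows, tolerance_s):
    """Pair each air quality row with its nearest GPS row.

    aq_rows:  list of (datetime, pm) tuples
    gps_rows: list of (datetime, lat, lon) tuples, sorted by time
    Rows whose nearest GPS fix is more than tolerance_s seconds away are dropped.
    """
    gps_times = [row[0] for row in gps_rows]
    joined = []
    for stamp, pm in aq_rows:
        i = bisect_left(gps_times, stamp)
        # The nearest fix is either just before or at/after the insertion point.
        candidates = gps_rows[max(i - 1, 0):i + 1]
        if not candidates:
            continue  # no GPS data at all
        nearest = min(candidates,
                      key=lambda row: abs((row[0] - stamp).total_seconds()))
        if abs((nearest[0] - stamp).total_seconds()) <= tolerance_s:
            joined.append((stamp, nearest[1], nearest[2], pm))
    return joined


# Made-up readings mirroring the 12:00:00 / 12:00:02 example above.
aq = [(datetime(2014, 7, 17, 12, 0, 0), 0.05)]
gps = [(datetime(2014, 7, 17, 12, 0, 2), 37.8, -122.3)]
print(len(join_with_tolerance(aq, gps, 2)))  # tolerance of 2 s: row kept
print(len(join_with_tolerance(aq, gps, 1)))  # tolerance of 1 s: row dropped
```

Using a binary search keeps each lookup logarithmic in the number of GPS fixes, which matters when a shift log contains one fix per second.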

```bash
# merges shifts from the same month into the shift_by_month directory
scripts/get_shift_by_month.sh

# Generates markdown pages for jekyll. Writes them to _posts
scripts/make_markdown.sh
```
