Input data overhaul #184

Merged 37 commits on Jul 16, 2020

Commits
d28af3f
moved csv to its own class
scotthavens Jul 6, 2020
498fb1e
added wrf loading
scotthavens Jul 6, 2020
75f463a
netcdf loading
scotthavens Jul 6, 2020
086c37d
hrrr grib loading
scotthavens Jul 6, 2020
d4089b9
some cleanup
scotthavens Jul 6, 2020
739ece2
organized the tests to expand testing of data
scotthavens Jul 6, 2020
df20f44
initial hrrr timestep working with the non threaded run
scotthavens Jul 6, 2020
fa91111
loading of hrrr working for non threaded run
scotthavens Jul 6, 2020
c249d72
data queue that loads all data for the threads to access
scotthavens Jul 7, 2020
a802564
working for now with hrrr loading as a thread
scotthavens Jul 7, 2020
df7d460
logging in tests
scotthavens Jul 7, 2020
b845bb4
Merge branch '18_threading' into gridded_data
scotthavens Jul 7, 2020
1220ff2
needed the updated threading scheme for the hrrr data
scotthavens Jul 7, 2020
394c3e4
Topo class and some logging leftovers
scotthavens Jul 7, 2020
c39f9dc
added a hrrr cloud factor interpolator for a single time step
scotthavens Jul 7, 2020
0390815
flake8 and isort
scotthavens Jul 7, 2020
ffd1ca8
order import for data
scotthavens Jul 7, 2020
851a2a7
performing a test on tuolumne and had to make some fixes
scotthavens Jul 7, 2020
23266f0
default clear sky if first is night time
scotthavens Jul 8, 2020
9090a48
added an argument for the create-distributed_threads for awsm
scotthavens Jul 9, 2020
c611013
Merge remote-tracking branch 'upstream/master' into gridded_data
scotthavens Jul 9, 2020
750455c
messed up on the merge conflicts
scotthavens Jul 9, 2020
5b185d8
Merge remote-tracking branch 'upstream/master' into gridded_data
scotthavens Jul 14, 2020
ed9616c
some last cleanup from the merge
scotthavens Jul 14, 2020
f019a48
renamed the data classes, started to consolidate the threads
scotthavens Jul 14, 2020
8c261f6
streamlined the thread creation as all need the smrf and data queues,…
scotthavens Jul 14, 2020
2152cd9
moved initializeDistribution to create_distribution (going to break A…
scotthavens Jul 14, 2020
787c851
cleaned up wrf, all gridded datasets are treated the same
scotthavens Jul 15, 2020
b06eb00
Merge remote-tracking branch 'upstream/master' into gridded_data
scotthavens Jul 15, 2020
b63d531
updating for new test structure
scotthavens Jul 15, 2020
fea4ed5
removed mysql as an input data type, fix #127
scotthavens Jul 15, 2020
73b5f90
flake8 and isort
scotthavens Jul 15, 2020
5aecc16
fixing some import errors after isort
scotthavens Jul 15, 2020
634a42f
addressing review comments
scotthavens Jul 15, 2020
a44b65f
flake8
scotthavens Jul 15, 2020
4699dab
data types for functions
scotthavens Jul 15, 2020
bdd04a5
one more data type
scotthavens Jul 15, 2020
40 changes: 28 additions & 12 deletions docs/api/smrf.data.rst
@@ -4,34 +4,50 @@ smrf.data package
Submodules
----------

smrf.data.loadData module
-------------------------
smrf.data.csv module
--------------------

.. automodule:: smrf.data.loadData
.. automodule:: smrf.data.csv
:members:
:undoc-members:
:show-inheritance:

smrf.data.loadGrid module
-------------------------
smrf.data.hrrr\_grib module
---------------------------

.. automodule:: smrf.data.loadGrid
.. automodule:: smrf.data.hrrr_grib
:members:
:undoc-members:
:show-inheritance:

smrf.data.loadTopo module
-------------------------
smrf.data.load\_data module
---------------------------

.. automodule:: smrf.data.loadTopo
.. automodule:: smrf.data.load_data
:members:
:undoc-members:
:show-inheritance:

smrf.data.mysql\_data module
----------------------------
smrf.data.load\_topo module
---------------------------

.. automodule:: smrf.data.mysql_data
.. automodule:: smrf.data.load_topo
:members:
:undoc-members:
:show-inheritance:

smrf.data.netcdf module
-----------------------

.. automodule:: smrf.data.netcdf
:members:
:undoc-members:
:show-inheritance:

smrf.data.wrf module
--------------------

.. automodule:: smrf.data.wrf
:members:
:undoc-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion docs/getting_started/run_smrf.rst
@@ -30,7 +30,7 @@ below is the function :mod:`run_smrf <smrf.framework.model_framework.run_smrf>`.
s.loadTopo()

# initialize the distribution
s.initializeDistribution()
s.create_distribution()
Contributor Author: This will break AWSM. Should be included in AWSM #68


# initialize the outputs if desired
s.initializeOutput()
112 changes: 10 additions & 102 deletions docs/user_guide/auto_config.rst
@@ -112,115 +112,15 @@ csv
|


mysql
-----

| **air_temp**
| name of the table column containing station air temperature
| *Default: air_temp*
| *Type: string*
|

| **cloud_factor**
| name of the table column containing station cloud factor
| *Default: cloud_factor*
| *Type: string*
|

| **data_table**
| name of the database table containing station data
| *Default: tbl_level2*
| *Type: string*
|

| **database**
| name of the database containing station data
| *Default: weather_db*
| *Type: string*
|

| **host**
| IP address to server.
| *Default: None*
| *Type: string*
|

| **metadata**
| name of the database table containing station metadata
| *Default: tbl_metadata*
| *Type: string*
|

| **password**
| password used for database login.
| *Default: None*
| *Type: password*
|

| **port**
| Port for MySQL database.
| *Default: 3606*
| *Type: int*
|

| **precip**
| name of the table column containing station precipitation
| *Default: precip_accum*
| *Type: string*
|

| **solar**
| name of the table column containing station solar radiation
| *Default: solar_radiation*
| *Type: string*
|

| **station_table**
| name of the database table containing client and source
| *Default: tbl_stations*
| *Type: string*
|

| **stations**
| List of station IDs to use for distributing any of the variables
| *Default: None*
| *Type: station*
|

| **user**
| username for database login.
| *Default: None*
| *Type: string*
|

| **vapor_pressure**
| name of the table column containing station vapor pressure
| *Default: vapor_pressure*
| *Type: string*
|

| **wind_direction**
| name of the table column containing station wind direction
| *Default: wind_direction*
| *Type: string*
|

| **wind_speed**
| name of the table column containing station wind speed
| *Default: wind_speed*
| *Type: string*
|


gridded
-------

| **data_type**
| Type of gridded input data
| *Default: hrrr_netcdf*
| *Default: hrrr_grib*
| *Type: string*
| *Options:*
*wrf hrrr_grib netcdf hrrr_netcdf*
*wrf hrrr_grib netcdf*
|

| **hrrr_directory**
@@ -235,6 +135,14 @@ gridded
| *Type: bool*
|

| **hrrr_load_method**
| Method to load the HRRR data: either load all data first or load each timestep as needed
| *Default: first*
| *Type: string*
| *Options:*
*first timestep*
|
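Taken together, the gridded options above might appear in a SMRF configuration file roughly as follows (a hypothetical fragment; the directory path is a placeholder and all other required sections are omitted):

```ini
[gridded]
# new default loader reads HRRR GRIB2 files directly
data_type:        hrrr_grib
hrrr_directory:   /path/to/hrrr
# 'first' loads all data up front; 'timestep' loads each step as needed
hrrr_load_method: timestep
```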

| **netcdf_file**
| Path to the netCDF file containing weather data
| *Default: None*
40 changes: 2 additions & 38 deletions docs/user_guide/input_data.rst
@@ -129,44 +129,8 @@ Example data files can be found in the ``tests`` directory for RME.
MySQL Database
``````````````

The MySQL database is more flexible than CSV files but requires more effort to setup. However,
SMRF will only import the data and stations that were requested without loading in additional
data that isn't required. See :mod:`smrf.data.mysql_data` for more information.

The data table contains all the measurement data with a single row representing a measurement
time for a station. The date column (i.e. ``date_time``) must be a ``DATETIME`` data type with
a unique constraint on the ``date_time`` column and ``primary_id`` column.

================ ========== ==== ==== === =====
date_time primary_id var1 var2 ... varN
================ ========== ==== ==== === =====
10/01/2008 00:00 ID_1 5.2 13.2 ... -1.3
10/01/2008 00:00 ID_2 1.1 0 ... -10.3
10/01/2008 01:00 ID_1 6.3 NAN ... -2.5
10/01/2008 01:00 ID_2 0.3 7.1 ... 9.4
================ ========== ==== ==== === =====

The metadata table is the same format as the CSV files, with a primary_id, X, Y, and elevation
column. A benefit to using MySQL is that we can use a ``client`` as a way to group multiple
stations to be used for a given model run. For example, we can have a client named BRB, which
will have all the station ID's for the stations that would be used to run SMRF. Then we can
specify the client in the configuration file instead of listing out all the station ID's. To use
this feature, a table must be created to hold this information. Then the station ID's matching
the client will only be imported. The following is how the table should be setup. Source is used
to track where the data is coming from.

========== ====== ======
station_id client source
========== ====== ======
ID_1 BRB Mesowest
ID_2 BRB Mesowest
ID_3 TUOL CDEC
... ... ...
ID_N BRB Mesowest
========== ====== ======

Visit the `Weather Database GitHub page <https://github.com/USDA-ARS-NWRC/weather_database>`_ if you'd
like to use a MySQL database.
The MySQL database has been deprecated as of SMRF v0.11.0. If that feature is needed,
we recommend using v0.9.X or exporting the tables to csv format.
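For users exporting their tables, the long-format rows of a table like ``tbl_level2`` (one row per station per timestep) can be pivoted into the wide, one-column-per-station layout the csv section expects; a minimal pandas sketch with made-up data:

```python
import io

import pandas as pd

# Stand-in for an export of `SELECT date_time, primary_id, air_temp
# FROM tbl_level2` (hypothetical values).
exported = pd.read_csv(io.StringIO(
    "date_time,primary_id,air_temp\n"
    "2008-10-01 00:00,ID_1,5.2\n"
    "2008-10-01 00:00,ID_2,1.1\n"
    "2008-10-01 01:00,ID_1,6.3\n"
    "2008-10-01 01:00,ID_2,0.3\n"
), parse_dates=['date_time'])

# Pivot long -> wide: date_time index, one column per station.
air_temp = exported.pivot(
    index='date_time', columns='primary_id', values='air_temp')

csv_text = air_temp.to_csv()
print(csv_text.splitlines()[0])  # date_time,ID_1,ID_2
```

The resulting text can be written to one file per variable, alongside a ``metadata.csv`` with the primary_id, X, Y, and elevation columns described above.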


Weather Research and Forecasting (WRF)
1 change: 0 additions & 1 deletion requirements.txt
@@ -1,7 +1,6 @@
coloredlogs
Cython>=0.28.4
inicheck>=0.9.0,<0.10.0
mysql-connector-python-rf==2.2.2
netCDF4>=1.2.9
numpy>=1.14.0,<1.19.0
pandas>=0.23.0
4 changes: 2 additions & 2 deletions smrf/__init__.py
@@ -9,6 +9,7 @@
__version__ = get_distribution(__name__).version
except DistributionNotFound:
__version__ = 'unknown'

__core_config__ = os.path.abspath(
os.path.dirname(__file__) + '/framework/CoreConfig.ini')
__recipes__ = os.path.abspath(os.path.dirname(
@@ -21,7 +22,6 @@
"time": "Dates to run model",
"stations": "Stations to use",
"csv": "CSV section configurations",
"mysql": "MySQL database",
"gridded": "Gridded datasets configurations",
"air_temp": "Air temperature distribution",
"vapor_pressure": "Vapor pressure distribution",
@@ -36,7 +36,7 @@
"system": "System variables and Logging"
}

# from . import data, distribute, envphys, framework, output, spatial, utils # isort:skip
from . import utils, data, distribute, envphys, framework, output, spatial # isort:skip

__config_header__ = "Config File for SMRF {0}\n" \
"For more SMRF related help see:\n" \
8 changes: 7 additions & 1 deletion smrf/data/__init__.py
@@ -1,3 +1,9 @@
# -*- coding: utf-8 -*-
# flake8: noqa
from . import loadData, loadGrid, loadTopo, mysql_data
from .csv import InputCSV
from .hrrr_grib import InputGribHRRR
from .load_topo import Topo
from .netcdf import InputNetcdf
from .wrf import InputWRF

from .load_data import InputData # isort:skip
76 changes: 76 additions & 0 deletions smrf/data/csv.py
@@ -0,0 +1,76 @@
import logging

import pandas as pd

from smrf.utils.utils import check_station_colocation


class InputCSV():

def __init__(self, *args, **kwargs):

for keys in kwargs.keys():
setattr(self, keys, kwargs[keys])

self._logger = logging.getLogger(__name__)

if self.stations is not None:
self._logger.debug('Using only stations {0}'.format(
", ".join(self.stations)))

def load(self):
"""
Load the data from a csv file
Fields that are operated on
- metadata -> dictionary, one for each station,
must have at least the following:
primary_id, X, Y, elevation
- csv data files -> dictionary, one for each time step,
must have at least the following columns:
date_time, column names matching metadata.primary_id
"""

self._logger.info('Reading data coming from CSV files')

variable_list = list(self.config.keys())
variable_list.remove('stations')

self._logger.debug('Reading {}...'.format(self.config['metadata']))
metadata = pd.read_csv(
self.config['metadata'],
index_col='primary_id')
# Ensure all stations are all caps.
metadata.index = [s.upper() for s in metadata.index]
self.metadata = metadata
variable_list.remove('metadata')

for variable in variable_list:
filename = self.config[variable]

self._logger.debug('Reading {}...'.format(filename))

df = pd.read_csv(
filename,
index_col='date_time',
parse_dates=[0])
df = df.tz_localize(self.time_zone)
df.columns = [s.upper() for s in df.columns]

if self.stations is not None:
df = df[df.columns[(df.columns).isin(self.stations)]]

# Only get the desired dates
df = df[self.start_date:self.end_date]

if df.empty:
raise Exception("No CSV data found for {0}"
"".format(variable))

setattr(self, variable, df)

def check_colocation(self):
# Check all sections for stations that are colocated
colocated = check_station_colocation(metadata=self.metadata)
if colocated is not None:
self._logger.error(
"Stations are colocated: {}".format(','.join(colocated[0])))
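The loading steps in ``InputCSV.load`` can be sketched standalone with pandas — the in-memory CSVs, station list, and timezone below are made up for illustration:

```python
import io

import pandas as pd

# Metadata: index on primary_id, upper-case station IDs (as in InputCSV.load).
metadata = pd.read_csv(io.StringIO(
    "primary_id,X,Y,elevation\n"
    "id_1,100,200,1500\n"
    "id_2,150,250,1600\n"
), index_col='primary_id')
metadata.index = [s.upper() for s in metadata.index]

# A variable file: parse date_time, localize to the model time zone,
# upper-case columns, subset stations, then slice the model date range.
df = pd.read_csv(io.StringIO(
    "date_time,id_1,id_2\n"
    "2008-10-01 00:00,5.2,1.1\n"
    "2008-10-01 01:00,6.3,0.3\n"
), index_col='date_time', parse_dates=[0])
df = df.tz_localize('UTC')
df.columns = [s.upper() for s in df.columns]

stations = ['ID_1']
df = df[df.columns[df.columns.isin(stations)]]
df = df['2008-10-01 00:00':'2008-10-01 01:00']

print(list(metadata.index))  # ['ID_1', 'ID_2']
print(list(df.columns))      # ['ID_1']
```

In the real class the file paths come from the ``[csv]`` config section and each resulting frame is set as an attribute named after its variable.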