How the 2023 coal model transportation distances by facility were created #273

dt-woods · 2024-12-05T16:50:38Z

dt-woods
Dec 5, 2024
Collaborator

In the 2016 baseline, coal transportation data were provided by a third-party data provider (see here). In the 2023 coal model (see here), transportation distances are publicly provided; however, they are reported by NERC region and coal basin rather than by facility (see the Transportation tab in the Excel workbook).

Challenge: Map transportation data at the regional level to their associated facilities based on NERC region and coal basin attributes.

Solution: The following shows how this may be done. In this approach, the goal is to replace the CSV file with updated values from the 2023 coal model.

This method uses the in-development version 2 of the ElectricityLCI Python package.

Start by importing needed Python packages.
Note that the imports from ElectricityLCI will prompt for a model specification file.
In this example, the 'ELCI_2020' model was chosen.
Once implemented in the code, this won't be necessary.

import os
import pandas as pd

from electricitylci.coal_upstream import basin_codes
from electricitylci.coal_upstream import generate_upstream_coal_map
from electricitylci.eia860_facilities import eia860_balancing_authority

Generate the coal upstream map, which labels each facility with its coal source code: a three-part combo of coal basin, coal type, and mine type. We want the coal basin data from this.

>>> coal_map_df = generate_upstream_coal_map(2020)
>>> coal_map_df.head()
   plant_id  coal_source_code      quantity    heat_input
0         3           IMP-B-S  1.292499e+06  2.618872e+07
1        26           IMP-B-S  9.670682e+04  1.971177e+06
2        26            SA-B-S  9.577832e+05  2.310327e+07
3        51            GL-L-S  6.140940e+05  8.208023e+06
4        56            CA-B-S  1.803470e+05  4.130585e+06

Using string split, we can take just the coal basin code from the coal source code string.

>>> coal_map_df["Basin"] = coal_map_df["coal_source_code"].str.split("-").str[0]
>>> coal_map_df.head()
   plant_id  coal_source_code      quantity    heat_input  Basin
0         3           IMP-B-S  1.292499e+06  2.618872e+07    IMP
1        26           IMP-B-S  9.670682e+04  1.971177e+06    IMP
2        26            SA-B-S  9.577832e+05  2.310327e+07     SA
3        51            GL-L-S  6.140940e+05  8.208023e+06     GL
4        56            CA-B-S  1.803470e+05  4.130585e+06     CA

The coal basin code is known!
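
Because the source code has three parts, the same split can also expose the coal type and mine type if they are ever needed. The sketch below is a minimal illustration on toy rows copied from the output above; the column names `basin`, `coal_type`, and `mine_type` are labels chosen here for illustration, not names used by ElectricityLCI.

```python
import pandas as pd

# Toy rows copied from the head() output above; only the codes matter here.
toy_df = pd.DataFrame({
    "plant_id": [3, 26, 26, 51, 56],
    "coal_source_code": ["IMP-B-S", "IMP-B-S", "SA-B-S", "GL-L-S", "CA-B-S"],
})

# expand=True returns one column per part instead of a list per row.
parts = toy_df["coal_source_code"].str.split("-", expand=True)
# Hypothetical column labels (assumed order: basin, coal type, mine type).
parts.columns = ["basin", "coal_type", "mine_type"]

toy_split_df = toy_df.join(parts)
print(toy_split_df)
```

Only the first part is needed for the transportation mapping, which is why the walkthrough keeps just `.str[0]`.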

Now, let's find the NERC region.
Note that this example hard-codes the generation year, which is to be replaced with the EIA generation year from model_specs.

>>> BA_region_df = eia860_balancing_authority(2020, regional_aggregation=None)
>>> BA_region_df.head()
   Plant Id  State  NERC Region  Balancing Authority Code                 Balancing Authority Name
0         1     AK          NaN                       NaN                                      NaN
1         2     AL         SERC                      SOCO  Southern Company Services, Inc. - Trans
2         3     AL         SERC                      SOCO  Southern Company Services, Inc. - Trans
3         4     AL         SERC                      SOCO  Southern Company Services, Inc. - Trans
4         7     AL         SERC                      SOCO  Southern Company Services, Inc. - Trans

This provides us with NERC regions for each facility.
Note that Plant IDs are strings in this dataset.
Let's create a dictionary that maps facilities to their NERC region, fixing the plant ID from string to integer along the way.
We don't need the heat input or the old coal source code, so let's drop them.

>>> region_dict = dict(zip(BA_region_df["Plant Id"], BA_region_df["NERC Region"]))
>>> region_dict = {int(k): v for k, v in region_dict.items()}
>>> coal_map_df['NERC Region'] = coal_map_df['plant_id'].map(region_dict)
>>> coal_map_df = coal_map_df.drop(columns=['coal_source_code', 'heat_input'])
>>> coal_map_df.head()
   plant_id      quantity  Basin  NERC Region
0         3  1.292499e+06    IMP         SERC
1        26  9.670682e+04    IMP         SERC
2        26  9.577832e+05     SA         SERC
3        51  6.140940e+05     GL         SERC
4        56  1.803470e+05     CA         SERC

That's the basis of our mapping!
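
One thing worth checking at this point is whether every coal facility actually received a NERC region, since `Series.map` leaves `NaN` for plant IDs that are missing from the dictionary. A minimal sketch on toy data (the frames and plant ID 9999 are hypothetical; 9999 is deliberately absent from the lookup):

```python
import pandas as pd

# Toy stand-in for the EIA 860 table; plant IDs are strings, as noted above.
ba_toy = pd.DataFrame({
    "Plant Id": ["3", "26", "51"],
    "NERC Region": ["SERC", "SERC", "SERC"],
})
# Toy stand-in for the coal map; 9999 is a made-up ID with no match.
coal_toy = pd.DataFrame({"plant_id": [3, 26, 26, 51, 9999]})

# Same dictionary approach as above, casting the string IDs to int.
region_lookup = {int(k): v for k, v in zip(ba_toy["Plant Id"], ba_toy["NERC Region"])}
coal_toy["NERC Region"] = coal_toy["plant_id"].map(region_lookup)

# Plant IDs that did not receive a region.
unmapped_ids = coal_toy.loc[coal_toy["NERC Region"].isna(), "plant_id"].tolist()
print(unmapped_ids)
```

Any facility listed here would later miss the region-and-basin merge and fall back to the U.S. average.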

We now have facilities and their NERC region and their coal basin.
Now, let's pull the new coal transportation data from the coal model.

Download NETL's 2023 coal Excel model, find the Transportation tab, and scroll over to the transportation data (see screen shot).

Copy the transportation data values (AW3:BD46) to its own workbook (see screen shot); note that the headers were adjusted for convenience.

UPDATE (12/6/2024): The worksheet was exported to CSV after removing the Total column, TRIM-ing the whitespace from the NERC region names, and converting the scientific values to General.
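
Given the update above, the cleaned CSV could presumably be read directly, without the strip and drop steps. The sketch below assumes the CSV keeps the adjusted column headers from the workbook; the inline text stands in for the exported file (its name is not given here) and holds two rows copied from the transportation table shown further down.

```python
import io

import pandas as pd

# Stand-in for the exported CSV; a real run would pass the file path instead.
csv_text = (
    "Basin,NERC Region,Belt,Truck,Barge,Ocean Vessel,Train\n"
    "Central Appalachia,FRCC,0.0,0.0,0.0,0.0,1107.977752\n"
    "Central Appalachia,MRO,0.0,0.0,0.0,307.0,560.085252\n"
)

# No Total column to drop and no whitespace to strip, per the update note.
coal_trans_csv_df = pd.read_csv(io.StringIO(csv_text))
print(coal_trans_csv_df)
```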

Now, read the coal model transportation data.
Remove the extra spaces in the NERC region column, and drop the total column.

>>> coal_trans_df = pd.read_excel("Transport-Data-From-Coal-Model.xlsx")
>>> coal_trans_df['NERC Region'] = coal_trans_df['NERC Region'].str.strip()
>>> coal_trans_df = coal_trans_df.drop(columns='Total')
>>> coal_trans_df.head()
                Basin  NERC Region  Belt    Truck       Barge  Ocean Vessel        Train
0  Central Appalachia         FRCC   0.0  0.00000    0.000000      0.000000  1107.977752
1  Central Appalachia          MRO   0.0  0.00000    0.000000    307.000000   560.085252
2  Central Appalachia         NPCC   0.0  0.00000    0.000000      0.000000   885.108703
3  Central Appalachia          RFC   0.0  5.74655  104.386181      2.733887    82.462229
4  Central Appalachia         SERC   0.0  1.21293    7.506436      0.000000   432.026518

The 2023 coal model uses a slightly different naming scheme for the WNW coal basin ("West/North West" instead of "West/Northwest").
So, let's correct that!

>>> basin_codes_new = {k:v for k, v in basin_codes.items()}
>>> del basin_codes_new["West/Northwest"]
>>> basin_codes_new["West/North West"] = "WNW"
>>> basin_codes_new
{'Central Appalachia': 'CA',
 'Central Interior': 'CI',
 'Gulf Lignite': 'GL',
 'Illinois Basin': 'IB',
 'Lignite': 'L',
 'Northern Appalachia': 'NA',
 'Powder River Basin': 'PRB',
 'Rocky Mountain': 'RM',
 'Southern Appalachia': 'SA',
 'Import': 'IMP',
 'West/North West': 'WNW'}

Now, map the basin names to their basin codes.
Note that this works for all basins except for "U.S. Average", which has no basin code and maps to NaN.

>>> coal_trans_df["Basin"] = coal_trans_df["Basin"].map(basin_codes_new)
>>> coal_trans_df
   Basin  NERC Region  Belt    Truck       Barge  Ocean Vessel        Train
0     CA         FRCC   0.0  0.00000    0.000000      0.000000  1107.977752
1     CA          MRO   0.0  0.00000    0.000000    307.000000   560.085252
2     CA         NPCC   0.0  0.00000    0.000000      0.000000   885.108703
3     CA          RFC   0.0  5.74655  104.386181      2.733887    82.462229
4     CA         SERC   0.0  1.21293    7.506436      0.000000   432.026518

Some facilities may not map to our coal model, so let's save the U.S. average and use it for them.

>>> us_ave_coal_trans = coal_trans_df.loc[coal_trans_df['Basin'].isna(), :]
>>> us_ave_coal_trans = us_ave_coal_trans.reset_index(drop=True)
>>> us_ave_coal_trans
   Basin   NERC Region      Belt     Truck      Barge  Ocean Vessel       Train
0    NaN  U.S. Average  0.398091  3.778319  35.092287     42.137498  577.272915

Drop the NaNs from our coal transportation data frame (i.e., the U.S. average that we saved separately).

>>> coal_trans_df = coal_trans_df.dropna().copy()

Now, let's put it all together by merging our transportation data and our coal data using the NERC region and coal basin codes as the common attributes.

>>> final_df = pd.merge(
...     left=coal_map_df,
...     right=coal_trans_df,
...     on=['Basin', 'NERC Region'],
...     how='left',
... )
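
Before filling gaps, it can help to count how many rows found no match. One option, sketched here on toy rows rather than the full data, is the `indicator` argument of `pd.merge`, which adds a `_merge` column marking each row as `both` or `left_only`. The toy frames are hypothetical stand-ins that reuse two basin and region pairs from the tables above.

```python
import pandas as pd

# Toy facility rows (basin and region pairs taken from the tables above).
coal_toy = pd.DataFrame({
    "plant_id": [3, 56],
    "Basin": ["IMP", "CA"],
    "NERC Region": ["SERC", "SERC"],
})
# Toy transportation table holding only the CA/SERC row.
trans_toy = pd.DataFrame({
    "Basin": ["CA"],
    "NERC Region": ["SERC"],
    "Train": [432.026518],
})

checked_df = pd.merge(
    left=coal_toy,
    right=trans_toy,
    on=["Basin", "NERC Region"],
    how="left",
    indicator=True,  # adds the _merge column
)

# Rows present only on the left side found no transportation data.
unmatched_df = checked_df.loc[
    checked_df["_merge"] == "left_only", ["plant_id", "Basin", "NERC Region"]
]
n_unmatched = len(unmatched_df)
print(n_unmatched)
print(unmatched_df)
```

Listing the unmatched basin and region pairs this way also shows where the U.S. average ends up being applied.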

We were right: some facilities do not map to any transportation data.
Let's give them the U.S. average.

>>> final_df = final_df.fillna({
...     'Belt': us_ave_coal_trans.loc[0, 'Belt'],
...     'Truck': us_ave_coal_trans.loc[0, 'Truck'],
...     'Barge': us_ave_coal_trans.loc[0, 'Barge'],
...     'Ocean Vessel': us_ave_coal_trans.loc[0, 'Ocean Vessel'],
...     'Train': us_ave_coal_trans.loc[0, 'Train'],
... })

Lastly, the transportation data from the coal model are in miles.
Let's convert miles to kilometers, and calculate the kg*km values by multiplying the quantity (kg of coal) by transportation distance (miles converted to km).

>>> mi_to_km = 1.60934
>>> trans_cols = ["Belt", "Truck", "Barge", "Ocean Vessel", "Train"]
>>> final_df[trans_cols] = final_df[trans_cols].mul(mi_to_km)
>>> final_df[trans_cols] = final_df[trans_cols].mul(final_df["quantity"], axis=0)
>>> final_df.head()
   plant_id      quantity  Basin  NERC Region           Belt         Truck         Barge  Ocean Vessel         Train
0         3  1.292499e+06    IMP         SERC  828058.135049  7.859169e+06  7.299442e+07  8.764896e+07  1.200768e+09
1        26  9.670682e+04    IMP         SERC   61956.622188  5.880355e+05  5.461559e+06  6.558034e+06  8.984339e+07
2        26  9.577832e+05     SA         SERC       0.000000  5.029310e+07  4.339558e+07  0.000000e+00  2.080195e+06
3        51  6.140940e+05     GL         SERC       0.000000  4.941430e+05  0.000000e+00  0.000000e+00  0.000000e+00
4        56  1.803470e+05     CA         SERC       0.000000  3.520403e+05  2.178665e+06  0.000000e+00  1.253912e+08
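
As a sanity check on the unit handling, the last row can be reproduced by hand from the printed values: plant 56 (CA basin, SERC) with the CA/SERC truck and train distances from the coal model table. This is only an arithmetic sketch; the inputs are the rounded values shown above, so agreement is to printed precision.

```python
import math

MI_TO_KM = 1.60934           # same conversion factor as above
quantity_kg = 1.803470e+05   # plant 56 quantity, as printed
truck_mi = 1.21293           # CA/SERC truck distance (miles)
train_mi = 432.026518        # CA/SERC train distance (miles)

# kg*km = quantity (kg) x distance (mi) x km per mile
truck_kg_km = quantity_kg * truck_mi * MI_TO_KM
train_kg_km = quantity_kg * train_mi * MI_TO_KM

# Compare against the Truck and Train values printed for row 4.
print(math.isclose(truck_kg_km, 3.520403e+05, rel_tol=1e-5))
print(math.isclose(train_kg_km, 1.253912e+08, rel_tol=1e-5))
```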

This is essentially the same data as the CSV file from the 2016 baseline, updated with transportation data from the 2023 coal model, where gaps are filled using the U.S. average.
Arguably, the U.S. average may be too broad a fallback for regions where barges and ocean vessels do not make sense.

m-jamieson · 2024-12-05T18:11:12Z

m-jamieson
Dec 5, 2024
Maintainer

In the 2016 baseline, coal transportation data were provided by a third-party data provider (see here. In the 2023 coal model (see here), transportation distances are now publicly provided

To be fully transparent, the data still come from the third party; they are just now publicly available as part of the report. And while it's the 2023 report, the report and data are for 2016 data year coal.

I think the approach outlined above is great! Thanks for putting it together.

1 reply

dt-woods Dec 5, 2024
Collaborator Author

That's a good reminder of data vintage. Thanks, Matt!
