adding file for calculating comparison of hn zones as well as plottin… #81

helloaidank · 2024-11-06T19:06:38Z

Description

All files are contained within the asf_heat_pump_suitability/analysis/hn_zones directory.

The script comparison_of_hn_zones.py processes and compares DESNZ heat network (HN) to Nesta's heat pump suitability data for Liverpool, and outputs some files for plotting.

Steps taken:

Load DESNZ HN GeoData and perform a spatial join with LSOA polygons.
Process Nesta heat pump suitability scores for Liverpool.
Identify LSOAs in DESNZ HN list not in HP suitability data.
Calculate and log average scores and Mean Absolute Error (MAE).
Save processed data to files.

Outputs:

'liverpool_with_desnz_hn_lsoa.gpkg': GeoDataFrame with LSOA codes added to HN zones.
'liverpool_hp_suitability_lsoas.json': List of unique LSOA codes in Liverpool.
'liverpool_hp_suitability_scores_with_desnz.parquet': Processed suitability scores with DESNZ pilot scores and Mean Absolute Error (MAE).
'script_output.log': Log file with metrics such as avg HN scores and MAE

The script comparison_of_hn_zones.py visualises heat network (HN) and heat pump suitability data for Liverpool.
Steps taken:

Load and preprocess data:
- Read Parquet and GeoPackage files.
- Extract unique LSOA geometries.
- Load LSOA geometries and convert CRS if needed.
- Filter Liverpool LSOAs.
Plot Liverpool LSOA geometries and save as PNG.
Merge suitability scores with geometries and convert to GeoDataFrame.
Plot and save overlay of DESNZ Pilot Score = 1 on Liverpool heat network zones.
Define and use a function to plot absolute error maps by DESNZ Pilot Score, saving the plots in the output plots directory.

Fixes #80

Instructions for Reviewer

In order to test the code in this PR you need to run the following commands:
python asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

This should output a few data files to a directory called output_data, please check that they are there. If you could see how long this script will take as we have to scale it to ~20 other local authorities. For me it takes 30s-1 minute to run, which seems OK as we likely won't be running this repeatedly (i.e. it's good enough) but any major efficiency improvements are always welcome.

Then to visualise some of this data, run the following:
python asf_heat_pump_suitability/analysis/hn_zones/plot_comparison_of_hn_zones.py
This will output some visuals to output_plots for the Liverpool LA.

Please pay special attention to whether the outputs/data we are getting seem sensible and that the MAE checks out, this is the primary motivation for this PR.

Two things to be aware of regarding the methodology and visuals:

For the DESNZ Liverpool HN zones, a couple of the LSOAs are actually outside the LA of Liverpool and at the moment these are not taken into account in the script. There could be a case to include them in an exception but for the moment I have left them out.
The shape file with the UK LSOAs geometries that I do read in include land and water, whereas the DESNZ HN Zones are only for land. In terms of the absolute error, this does not make a difference (I don't think), but it does mean that the visuals for the DESNZ HN Zones look different to the Liverpool LSOA map as shown by the overlay plot.

For the next evolution of this script, we will scale it to the other 20 LAs we have and I think it's probably a good call to put these file paths into a config file. I am also curious as to where you think this should live in the repo!

Thanks so much for taking the time to review! It's much appreciated.

Checklist:

…g script for now

crispy-wonton

Hey @helloaidank.
I've reviewed your comparison_of_hn_zones.py script and left comments. I need to come back and review the plotting script but wanted to publish these comments already so you can start scaling up :)

The code looks great! I do think we need to change the methodology slightly to exclude any small bits of LSOAs that might be skewing the error results because they look so similar for DESNZ pilot and non-pilot zones. And we want to make sure our results are representative of the same/similar locations so that we can assess them. If they still come out like this then maybe the features for HN model need to be improved!
Let me know if you have questions !

asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

crispy-wonton · 2024-11-07T17:08:11Z

asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

+def main():
+    # Define the output directory and set up logging
+    output_dir = os.path.join(
+        PROJECT_DIR, "asf_heat_pump_suitability/analysis/hn_zones/output_data/"


I would suggest moving the hn_zone dir to outputs/hn_zones and then just save the results directly into that dir :)

Just on this, the outputs directory is in the .gitignore, so I am unsure if this is the best place to put the hn_zones directory?

I think it's fine that the outputs go into a folder that's ignored by git. I don't think we want to push the results to github.

I can see you've change it in the new script. To clarify, I just meant to save the output files to the outputs dir. I think the scripts themselves should stay in the analysis folder. :) But perhaps I misunderstood your reasoning. What do you think?

Lastly, you could also save your outputs directly to S3 if you wanted. At least the .json files, if not the plots. We could make a new evaluation dir in the asf-heat-pump-suitability bucket.

Ah ok, I misunderstood, I thought you meant moving the whole directory to the outputs folder. I will move it over after we've finished this review :).

After weighing it up in my head, I do think it probably makes sense to have a version saved to s3 (so people can play around with the plotting scripts without needing to run the comparison script) but if you are iterating locally, then make sure you save the files locally because it takes ages otherwise hehe and is a bit annoying.

asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

crispy-wonton · 2024-11-07T17:39:43Z

asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

@@ -0,0 +1,350 @@
+"""


First of all, this is great work. The code is clear and neat and I really like the logging. Great job!

Below are the logs from running the script. The results of average HN score for DESNZ pilot 0 and 1 seem very similar and with really high mean error rates. At least 0 has a lower score on avg than 1, but they are so close together and also both very central values at 0.52-0.61. If I saw those results out of context I don't think I would interpret that 0.52 is unsuitable for a HN but 0.61 really is.

It might be worth looking at the underlying data for the suitability features for these LSOAs and try to identify if they have anything in common that we seem to be misclassifying.

Looking at the relevant features for Heat Networks in the model, they are: not a listed building; not in building conservation zone (bear in mind a lot of properties will have a NAN flag here because of the patchiness of the conservation zone data. We assigned those not in a cons zone as NAN because we couldn't know if they were in one or if there was just cons data missing). ; off-gas; property in building with other properties (i.e. a flat); and in an urban area.

One reason that they have such similar scores could be that a lot of properties will score very similarly in a city even with variation between them. E.g. property density doesn't count for HN so there's no scale to density except urban/rural binary, and presumably all the LSOAs in this sample are urban. I'm guessing most properties in Liverpool will be not listed buildings and not off-gas(?). Those 3 features alone give a score of 0.47 (2.25/4.75). But still we would expect those on the higher scores to be around 0.9 if they are flats as well, which would put the score at 0.89 (4.25/4.75).

I think it might be true what you said in one of the stand ups, that small parts of LSOAs that have a tiny overlap with the DESNZ zones might be affecting the results. Maybe we need to set a threshold for the amount of overlap with DESNZ zone required to be considered in the analysis. We can start by setting it at 50% area (or 50% of OAs in an LSOA).

NFO: Loaded Liverpool with DESNZ HN LSOA data.
INFO: Created 133 records
INFO: Processed Nesta HP suitability scores for Liverpool.
INFO: number of LSOAs in DESNZ list not in HP suitability data: 2
INFO: Average HN_N_avg_score_weighted: 0.5608973509933775
INFO: Avg HN_N_avg_score_weighted for DESNZ_pilot_score = 1: 0.6132439024390244
INFO: Avg HN_N_avg_score_weighted for DESNZ_pilot_score = 0: 0.524927374301676
INFO: Mean Absolute Error (MAE) for DESNZ_pilot_score vs HN_N_avg_score_weighted: 0.46865231788079464
INFO: Mean Absolute Error (MAE) for DESNZ_pilot_score = 1: 0.3867560975609756
INFO: Mean Absolute Error (MAE) for DESNZ_pilot_score = 0: 0.524927374301676

crispy-wonton

I've reviewed the plot comparison code. It looks good and I don't see any issues or bugs. Mainly just a couple of suggestions for documentation.

I got 4 maps output from the script and they look the same as the ones you showed in the sprint review (see file names below). Great job with the heat mapping, it looks really good and the code is really clean and readable and simple! Really nice work!!

crispy-wonton · 2024-11-08T14:04:38Z

asf_heat_pump_suitability/analysis/hn_zones/plot_comparison_of_hn_zones.py

+INPUT_DIR = os.path.join(
+    PROJECT_DIR, "asf_heat_pump_suitability/analysis/hn_zones/output_data/"


same as above - can move this to the main outputs dir :)

As above, I meant moving the output_data and output_plots dirs only, not the scripts :)

asf_heat_pump_suitability/analysis/hn_zones/plot_comparison_of_hn_zones.py

crispy-wonton · 2024-11-08T14:24:44Z

asf_heat_pump_suitability/analysis/hn_zones/plot_comparison_of_hn_zones.py

+
+
+def merge_data(
+    liverpool_hp_suitability_scores_pd: pd.DataFrame,


In general, you could shorten argument names and in-function variable names if you wanted. If you document what they represent well in the docstring, then you can use names like df and gdf here. I tend to do that in my code because I think it makes the code easier to read, because the lines then become a lot shorter.

Of course it's a matter of preference because your descriptive names can make it easier to understand what's going on in more complex pieces of code. So do whatever you prefer :)

I think at least initially I like keeping the names descriptive especially on first iteration as I have a tendency to forget what each variable means! I'm also sure we can factor this down, as it will be for every LA rather than just Liverpool. But thinks like heat network (can abbreviate to hn as I'm very familiar with that by now).

No problem, go with what you find easier to use. As you've said we can shorten them later if necessary.

asf_heat_pump_suitability/analysis/hn_zones/plot_comparison_of_hn_zones.py

…ather than binary

…core vs desnz fraction

helloaidank · 2024-12-05T09:28:49Z

asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

Hi @crispy-wonton!

Thank you for your comments, I’ve tweaked the methodology so that we are looking at the fraction of area covered in an LSOA by the DESNZ HN zones. We have taken this fraction to be the ‘DESNZ HN pilot score’ and this is what we compare to the Nesta HN score per LSOA.

We also now output a parquet file with the avg Nesta HN s as a function of the DESNZ HN pilot score (fraction of the LSOA that is covered by a DESNZ HN zone).
I’ve also moved this all to the ‘output’ directory as I mentioned, but we probably should put it somewhere else because I keep on getting the git .ignore warning.

In the plotting script, I’ve added a plot of avg Nesta HN vs DESNZ HN pilot zone and also perform and plot a linear regression and show the R^{2} on it.

Thanks for taking another look and for your really useful comments!

crispy-wonton

Hi @helloaidank ,

This looks great thank you!
I have mainly suggested some cosmetic changes and updates to documentation and a couple of places to simplify the inputs of functions.

There is one point on outputs/hn_zones/comparison_of_hn_zones.py that suggests to use overlay instead of sjoin with an intersection. I think it will be easier to apply. Let me know if you have questions about this.

Secondly, sorry my point in the previous review was not clear. I had meant for you to save the outputs of the script to the outputs directory in the repo, but I think the scripts themselves should be in the analysis folder where you originally had them.

Having said this, to make the final review easier and quicker, I would suggest keeping the files in the same place for now as you make further edits to the code. This means when I do my final review, I can review only the changes since this review, rather than the entire file showing up as diffs (due to it being moved to a new folder). If there's a way to do this on Github anyway that I don't know about , please let me know! After the final review, we can then move the files back to the analysis folder.

Lastly, I think you could create a new folder in S3, called evaluation/liverpool_heat_networks or something, and save your intermediate (and also final!) outputs to there if you wanted. I think this is better than saving locally because then the whole project team has direct access to the results :)

Great work! I ran your code and it produced the expected outputs and was clearly documented.

Let me know if you have any questions :)

crispy-wonton · 2024-12-05T10:40:59Z

outputs/hn_zones/comparison_of_hn_zones.py

+
+    Returns:
+        Tuple[pl.DataFrame, float, float]:
+            - Updated DataFrame with 'DESNZ_pilot_fraction' column added (fraction of area covered by HN zones).


Suggested change

- Updated DataFrame with 'DESNZ_pilot_fraction' column added (fraction of area covered by HN zones).

- Updated DataFrame with 'DESNZ_pilot_fraction' column added (fraction of local authority area covered by HN zones).

Is this correct?

Yep that's correct!

asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

crispy-wonton · 2024-12-05T10:48:57Z

asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

+def main():
+    # Define the output directory and set up logging
+    output_dir = os.path.join(
+        PROJECT_DIR, "asf_heat_pump_suitability/analysis/hn_zones/output_data/"


I think it's fine that the outputs go into a folder that's ignored by git. I don't think we want to push the results to github.

I can see you've change it in the new script. To clarify, I just meant to save the output files to the outputs dir. I think the scripts themselves should stay in the analysis folder. :) But perhaps I misunderstood your reasoning. What do you think?

Lastly, you could also save your outputs directly to S3 if you wanted. At least the .json files, if not the plots. We could make a new evaluation dir in the asf-heat-pump-suitability bucket.

crispy-wonton · 2024-12-05T10:51:40Z

asf_heat_pump_suitability/analysis/hn_zones/plot_comparison_of_hn_zones.py

+INPUT_DIR = os.path.join(
+    PROJECT_DIR, "asf_heat_pump_suitability/analysis/hn_zones/output_data/"


As above, I meant moving the output_data and output_plots dirs only, not the scripts :)

crispy-wonton · 2024-12-05T10:53:33Z

asf_heat_pump_suitability/analysis/hn_zones/plot_comparison_of_hn_zones.py

+
+
+def merge_data(
+    liverpool_hp_suitability_scores_pd: pd.DataFrame,


No problem, go with what you find easier to use. As you've said we can shorten them later if necessary.

crispy-wonton · 2024-12-06T13:48:14Z

outputs/hn_zones/plot_comparison_of_hn_zones.py

+    coefficients = np.polyfit(x, y, 1)
+    model = np.poly1d(coefficients)
+    predicted_y = model(x)
+    r_squared = r2_score(y, predicted_y)


Haven't used numpy for linear models before, cool!

crispy-wonton · 2024-12-06T13:55:13Z

outputs/hn_zones/plot_comparison_of_hn_zones.py

+    # Break mark on the y-axis
+    d = 0.015
+    kwargs = dict(transform=ax.transAxes, color="k", clip_on=False)
+    ax.plot((-d, +d), (-d, +d), **kwargs)
+    ax.plot((-d, +d), (1 - d, 1 + d), **kwargs)


crispy-wonton · 2024-12-06T13:56:26Z

outputs/hn_zones/plot_comparison_of_hn_zones.py

+    ax.plot((-d, +d), (1 - d, 1 + d), **kwargs)
+
+    ax.set_title(
+        "Average Nesta HN Score vs Fraction of DESNZ HN zone in LSOA", fontsize=14


Suggested change

"Average Nesta HN Score vs Fraction of DESNZ HN zone in LSOA", fontsize=14

"Average Nesta HN Score vs Fraction of LSOA in DESNZ Heat Network zone", fontsize=14

Should it be this way round? Otherwise lmk because I got confused!

It's the fraction of the DESNZ HN zone in the LSOA. We are increasing the threshold, so as we increase the amount of DESNZ HN zone in the LSOA, our average Nesta HN score. This is what we expect intuitively!

crispy-wonton · 2024-12-06T13:57:29Z

outputs/hn_zones/plot_comparison_of_hn_zones.py

+    ax.text(
+        0.05,
+        0.95,
+        f"$R^2 = {r_squared:.2f}$",


For my understanding, what does the dollar sign do here?

It's LaTeX notation, entering math mode!

crispy-wonton · 2024-12-17T16:17:03Z

outputs/hn_zones/comparison_of_hn_zones.py

+    and log the result.
+
+    Args:
+        hp_suitability_scores (pl.DataFrame): DataFrame containing the DESNZ pilot (actual) and Nesta HN suitability score (predicted) columns.


It's really helpful to label them as 'actual' and 'predicted' here! :)

helloaidank · 2024-12-23T16:46:50Z

Hi there, I've added in some of the changes that you suggested!

The code for evaluation has become quite lengthy, so I think in the next PR I will likely scale to the other LAs but also modularise the code a bit across a few different scripts. Hopefully this will improve things in terms of readability!

Thanks for taking the time to go through the code again and making suggestions, very useful ones, especially the overlay one.

I am happy to merge the PR and then make the changes I mentioned above in another one, once we're obviously happy with the code.

helloaidank added 2 commits November 6, 2024 18:36

adding file for calculating comparison of hn zones as well as plottin…

e4cd637

…g script for now

delete heat network zone

3bef3c2

helloaidank marked this pull request as ready for review November 7, 2024 09:29

helloaidank requested a review from crispy-wonton November 7, 2024 10:05

crispy-wonton requested changes Nov 7, 2024

View reviewed changes

crispy-wonton reviewed Nov 8, 2024

View reviewed changes

helloaidank added 2 commits December 2, 2024 14:52

changes to methodology to have the desnz hn pilot score as fraction r…

17d69a3

…ather than binary

small changes to methods script alongside adding plot of average hn s…

4100bfd

…core vs desnz fraction

helloaidank commented Dec 5, 2024

View reviewed changes

crispy-wonton reviewed Dec 17, 2024

View reviewed changes

helloaidank added 2 commits December 23, 2024 16:33

suggested changes

128474b

changes to the plotting script

94e8130

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adding file for calculating comparison of hn zones as well as plottin… #81

adding file for calculating comparison of hn zones as well as plottin… #81

helloaidank commented Nov 6, 2024 •

edited

Loading

crispy-wonton left a comment

crispy-wonton Nov 7, 2024

helloaidank Dec 2, 2024

crispy-wonton Dec 5, 2024

helloaidank Dec 20, 2024

helloaidank Dec 20, 2024

crispy-wonton Nov 7, 2024

crispy-wonton left a comment

crispy-wonton Nov 8, 2024

crispy-wonton Dec 5, 2024

crispy-wonton Nov 8, 2024

helloaidank Dec 2, 2024

crispy-wonton Dec 5, 2024

helloaidank Dec 5, 2024

crispy-wonton left a comment

crispy-wonton Dec 5, 2024

helloaidank Dec 20, 2024

crispy-wonton Dec 5, 2024

crispy-wonton Dec 5, 2024

crispy-wonton Dec 5, 2024

crispy-wonton Dec 6, 2024

crispy-wonton Dec 6, 2024

crispy-wonton Dec 6, 2024

helloaidank Dec 23, 2024

crispy-wonton Dec 6, 2024

helloaidank Dec 23, 2024

crispy-wonton Dec 17, 2024

helloaidank commented Dec 23, 2024

		INPUT_DIR = os.path.join(
		PROJECT_DIR, "asf_heat_pump_suitability/analysis/hn_zones/output_data/"



		def merge_data(
		liverpool_hp_suitability_scores_pd: pd.DataFrame,

	- Updated DataFrame with 'DESNZ_pilot_fraction' column added (fraction of area covered by HN zones).
	- Updated DataFrame with 'DESNZ_pilot_fraction' column added (fraction of local authority area covered by HN zones).

	"Average Nesta HN Score vs Fraction of DESNZ HN zone in LSOA", fontsize=14
	"Average Nesta HN Score vs Fraction of LSOA in DESNZ Heat Network zone", fontsize=14

adding file for calculating comparison of hn zones as well as plottin… #81

Are you sure you want to change the base?

adding file for calculating comparison of hn zones as well as plottin… #81

Conversation

helloaidank commented Nov 6, 2024 • edited Loading

Description

Instructions for Reviewer

Checklist:

crispy-wonton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

crispy-wonton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

crispy-wonton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

helloaidank commented Dec 23, 2024

helloaidank commented Nov 6, 2024 •

edited

Loading