Meeting Notes 2022 Software
Feel free to reorder items in the list; put them in priority order rather than the order in which they were entered. We always need a good chunk of time for prioritization and looking at issues. Anyone is free to add to the agenda and to reorder it, although in general we let items that came first go first. Agenda items can also carry one of these markers: brainstorm, make-decision, long-term planning, or make-criteria-for-decision. We recommend periodically adding highlights or wins of the week.
- Erik: Some thoughts on the purposes of this meeting:
- CTSM science leads are here to direct priorities of software efforts
- CTSM scientists are here to represent users of the system
- CTSM SEs are here partially to get priority direction from CTSM leads
- CTSM SEs are also here to get technical help on their work for the week (let's make sure this is working for everyone and that all SEs are getting the input/feedback they need)
- FATES SEs are critically important parts of the team, both as CTSM/FATES users and as CTSM/FATES SEs, so we need your input to make sure the CTSM system works well from a FATES user and SE perspective. We may want a guideline for discussion length: if something isn't resolved reasonably quickly, we set up a separate meeting for it (because of the different needs of attendees). By the way, I think Python SE development needs a community to collaborate on best practices.
- Feb 4th, 2022 -- LMWG spring meeting -- no meeting
- Feb 18th, 2022 -- what do we need to change in response to the LMWG meeting?
- Apr 7th, 2022 -- SEA conference -- no meeting
- Apr 14th, 2022 -- What can we improve that we learned from the SEA conference?
- May 9th, 2022 -- What do we need to have done by CESM workshop?
- Jun 16th, 2022 -- CESM Workshop -- no meeting
- Jun 23rd, 2022 -- What do we need to change in response to the CESM Workshop?
- Sep 29th, 2022 -- Last day of the fiscal year, do a fall assessment
- Nov 24th, 2022 -- Thanksgiving -- no meeting
- Dec 2nd, 2022 -- Winter long term assessment
- Dec 22nd, 2022 -- No meeting (holidays)
- Dec 29th, 2022 -- No meeting (holidays)
- Jan 5th, 2023 -- First meeting of the new year
- Jan 12th, 2023 -- What needs to be done by the LMWG meeting?
- No immediate impacts on CTSM
- In progress: E3SM 5345 (SP fix) (integrated in ELM only), 958 (drought deciduous phenology) (Marcos: new version, easier to integrate; an elongation factor may allow for a slower, non-instantaneous drop; drought deciduousness is now PFT-specific), 959 (allometry storage) (Marcos: option to scale by the untrimmed leaf from the day before)
- Long term: E3SM 5369 (ELM-FATES fire API) (Greg: ongoing work to bring ELM up to CLM-FATES-SPITFIRE), 769 (leaf memory) (Ryan: scientific testing results show slight differences, not b4b), 888 (C-based harvest) --> API 25-compliant PR (Shijie, in progress, ELM only). Ryan wants the patch levels cleaned up: no history, no allocation, daily next level.
- Calibration progress update - no update
- CTSM-FATES priorities by category
- Highest priority - Land Use w/ FATES
Following a shortened CTSM-Software meeting, we had an extended meeting with Beatrice, where the focus was on making this and other meetings more effective. We covered a lot of ground in this meeting, and there were a number of useful recommendations. Here I'll just capture what I think is the most significant set of recommendations for the CTSM-Software meeting:
We're going to try to reduce this meeting to an hour. Along with this, we will try to move any discussion on medium-to-long-term prioritization of SE and other resources out of the CTSM-software meeting to the monthly TSS meetings. We hope that this will help achieve two goals: (1) shortening the CTSM-software meetings; (2) providing a more clear mechanism for all members of TSS to be part of the prioritization discussions, avoiding the feeling that these decisions get made in meetings that they do not attend. (We imagine that questions of prioritization will continue to come up in the CTSM-software meeting, but when we notice this, rather than talking about it then, we will add them as agenda items for the next TSS meeting.)
- I ran a case with CTSM and mizuRoute with lakes on.
- Sam has trimmed down SLIM to about the right size
- No immediate impacts on CTSM
- In progress: 958 (drought decid phen) (Marcos), 959 (allometry storage) (Marcos), 888 (c-based harvest) (Ryan's branch to bring Shijie's E3SM PR up to API 25), 769 (leaf memory) (Ryan)
- Calibration:
- Rosie is preparing to run an ensemble of NIR parameters in FATES. Highlighted in the albedo bias discussion
- There are differences in the surface data files between CLM and ELM; Jessie is documenting these. Highlighted in the ELM-SP issue
- CTSM-FATES priorities by category
- FATES functional with land use (highest priority)
- Create a design doc per Adrianna's recommendation?
- Examples from ELM-FATES: the seed-dispersal design doc and the fire data API
- FATES nutrients (high priority)
- implemented in ELM-FATES
- FATES-SP (functional and in calibration)
- Calibration work (in progress, high priority)
- Rosie: CTSM-FATES global calibration albedo bias
- Jessie: ELM calibration global and demographic benchmarking w/ VDM MIP
- MEGAN and Dry Dep seem to be functional in FATES-SP
- Long-term work to get them functional for other FATES complexity modes (lower priority than LU and nutrients)
- Erik and Jackie reviewed these issues over summer 2022
- CTSM-HH-FATES functional w/ FATES; need better way to analyze data
- Erik: For excess ice to be truly tested, we need to run long enough that at least one grid cell melts. Should we set up a test for this?
- Erik: This is unfortunately my fault. Peter had a couple of PRs about shifting cultivation. We integrated part of it, so that unrepresented change is zero on the current surface datasets. There's another PR that in my head was tied up with CTSM5.2, but it probably could have come to main without CTSM5.2. The one issue with it is that to know whether it's functioning properly we need datasets with non-zero values to truly test it, and we only have those with the TRENDY and CTSM5.2 datasets.
- Status on surface datasets and CTSM5.2 branch discussion #1868
- Land use transitions with FATES, can we make some progress on the technical side of passing information between the HLM and FATES?
- Erik: We've been talking about CESM grid aliases in CSEG because of mizuRoute. We plan to have mizuRoute grids spelled out, for example: f09_f09_rHDMAlk_mg17. Is there still a need for tri-grids? And is there work where the atmosphere will run on a regular grid and CTSM on HRUs?
- Erik: On mizuRoute we plan to bring mizuRoute into CTSM main when we get lakes fully functioning and are passing lake precip/ET from CTSM to mizuRoute.
Erik and Naoki think: once we have lakes working and are passing lake precip/ET from CTSM to mizuRoute, that's probably the time to bring it into CTSM main.
However, mizuRoute isn't ready to couple to the ocean yet, because we aren't passing ice streams.
It would be good to evaluate whether mizuRoute with lakes fixes the negative runoff issues.
mizuRoute will change lake depth, but for now not lake area.
Technically mizuRoute could have a lake totally dry out, but there isn't a way to feed this back to CTSM.
- Bill: until we have fully dynamic two-way connections, I'm thinking we'll need to keep negative runoff to avoid long-term transients that take us out of water balance.
Bill: what's the plan for connecting dynamic lake areas to mizuRoute?
- Erik: unsure. We should get the right people together to understand this.
Will: He, Sam and Keith L have been working on a new spinup method (Newton-Krylov) that is an alternative to matrix and AD. This seems close to working. It may not be as fast as matrix, but works with MIMICS (and may work with FATES, in contrast to CN-Matrix), may be more sustainable, and has in-house expertise – and is consistent with the ocean model spinup (the main difference being that the ocean model spinup needs to deal with lateral transport).
With FATES, an issue could be patches changing in time: you could spin up woody pools, but maybe not deal with patches changing in time.
Jackie has put together https://github.com/orgs/ESCOMP/projects/10/views/1
Highest priorities are LULCC and nutrients. MEGAN/dry-dep lower priority right now... would eventually be needed for coupled runs, although this should already work for FATES-SP (pending a little more testing).
We should consider merging https://github.com/ESCOMP/CTSM/projects/28 with this.
It would be good to have a test of the behavior of fully melting. But we don't want to slow down our whole test suite for this purpose.
One possibility would be a single-point run with really hot anomaly forcing. That could have some side benefits of stress-testing the model.
But, since it will take some work to put that test together, it could be worth deferring that until excess ice becomes a standard part of the model.
Erik: Best plan is probably: Bring this into master (won't have any impact, but also won't really be tested well in that context), then merge it to the 5.2 branch, and have Peter test it there with the new datasets to make sure it's working properly.
Others agree with this plan.
Erik: It would be good for the whole group to review outstanding PRs periodically. This would help prevent things from falling through the cracks.
(Today we reviewed from old to new, getting through #1596.)
See https://docs.google.com/document/d/1up1pfdfiA9IPoOH8o-eM7bBX80SszEb6F2PMtTQ7jnY/edit#
NCAR organization may be most appropriate.
Can tag repos with something like CTSM for discoverability.
- Bill solved Iris's CTSM-WRF problem (it turned out to be a subtle bug in PIO). Nice. Erik: Bill, show us what you did.
- Erik: I do appreciate efforts that Jackie has done in working on project management type things. Making sure tasks that need to happen are accomplished and the like. Good job on running the pasture conversation by the way. The reminder that she gave me was particularly helpful as well.
- ?
- 880 (nutrients v2) (E3SM), 955 (hydro bug fix) (Junyan, E3SM-Hydro), 888 (C-based wood harvest) (E3SM; use recent CTSM API)
- Make use_bedrock default on for CTSM-FATES; this triggers issues with the root density arrays (Greg, in progress)
- Decreasing memory arrays (Ryan in progress)
- Atkin respiration model: update to an option, not the default (Charlie, in progress)
- Calibration progress:
- (Rosie) FATES albedo bias: high bias; complexity in the RTM resolving empty layers for Norman radiation; most of the bias is in the NIR; running an ensemble for NIR parameters
- (Jessie) demographic benchmarking in progress
- Pasture/LU discussion:
- schedule meeting to discuss data processing for gross transitions and dynamic land units (January)
- Erik: Do we have a way to "picture" where we are at in terms of progress for CTSM/FATESv1?
- Bill: after a lot of discussion, the plan is to put pasture as a column (or multiple columns) on the natural vegetated landunit, rather than having a separate landunit for pasture. The main rationale for this is that there doesn't seem to be a clear-cut rule for the distinction between pasture vs. natural veg: there could be gradients of behavior (e.g., natural veg could have some herbivory), and there is some desire to run natural PFTs each on their own columns, so there isn't a clear-cut distinction in terms of the subgrid structure, either.
- Erik: Beatrice has time on Dec 22nd at 10am that we could use. Who will be available then? She also has time the week after, but I suspect many of us (including me) will be out then.
Keith: the current TRENDY data are on campaign store. But he created a link as part of Globus that will let you get the data even if you don't have access to campaign store.
Jackie: Critical things are having FATES functional with land use (highest priority); nutrients, FATES-SP, FATES calibration and PPE; MEGAN and Dry Dep (lowest priority)
- FATES functional with land use (highest priority)
- FATES nutrients (high priority)
- FATES-SP (functional and in calibration)
- Calibration work (in progress)
- Rosie: CTSM-FATES global calibration albedo bias
- Jessie: ELM calibration global and demographic benchmarking w/ VDM MIP
- MEGAN and Dry Dep seem to be functional in FATES-SP
- Long-term work to get them functional for other FATES complexity modes (lower priority than LU and nutrients)
- Erik and Jackie reviewed these issues over summer 2022
- CTSM-HH-FATES functional w/ FATES; need better way to analyze data
After a lot of discussion, the plan is to put pasture as a column (or multiple columns) on the natural vegetated landunit, rather than having a separate landunit for pasture. The main rationale for this is that there doesn't seem to be a clear-cut rule for the distinction between pasture vs. natural veg: there could be gradients of behavior (e.g., natural veg could have some herbivory), and there is some desire to run natural PFTs each on their own columns, so there isn't a clear-cut distinction in terms of the subgrid structure, either.
Part of this is conceptualizing this as consumption rather than just grazing. So there is a gradient of landuse, rather than a clear-cut distinction. And in the future we may have things like fertilization and irrigation on some natural tiles.
Part of this is also that it's easiest to get started this way and then add a separate landunit if we decide we want it down the line.
- ctsm5.1.dev114 fixed a bunch of NEON issues, became our first development tag labeled as a pre-release tag with a DOI
- Value of belonging in work environments: https://docs.google.com/presentation/d/13zgi97r0-jXz_RG351NOdNWcMv8UCaZmw5waTQk3rqA
- ?
- FATES detail notes Nov 28 2022
- 769 (reduce cohort memory arrays) part of leaf layer optimization, make things local or send to history, should speed things up (Ryan Knox)
- Calibration update: Rosie: albedo bias responds to leaf layering, but not solved with increased layers.
- Recommendations on setup for 1907 regional cases
- Erik: Jackie and I will meet with Beatrice to talk about this meeting and improving communication
- Erik/Jackie: Jackie notes that the regional setup assumes 100% land cover, so you need to modify the mesh mask to mask out inactive points. (From Negin: giving `--mask` plus the name of the land mask variable to mesh_maker.py creates a mesh file with the land mask on. When `--mask` is not given, it assumes all points are land points. For example: `--mask LANDMASK`. This is similar to how we have to identify the names of the lat and lon variables in the mesh_maker script.)
- Erik: I've been working on the excess_ice PR. Kaveh is now gone and Marius is defending his thesis this month, so I will probably do the update to the newer version myself if it's easy.
- Bill: quick update on mksurfdata changes
- Bill: quick update on Iris's WRF issue
- Status on surface datasets and CTSM5.2 branch discussion #1868
- Land use transitions with FATES, can we make some progress on the technical side of passing information between the HLM and FATES?
- Erik: We've been talking about CESM grid aliases in CSEG because of mizuRoute. We plan to have mizuRoute grids spelled out, for example: f09_f09_rHDMAlk_mg17. Is there still a need for tri-grids? And is there work where the atmosphere will run on a regular grid and CTSM on HRUs?
- FATES detail notes Nov 14 2022
- FATES api update w/ Ryan's nutrient work
- 880 (nutrients v2) Ready to test.
- 935 (Longwave rad error) - CTSM or forcing issue
- use_bedrock PR
- run failures from NaN for rootr_patch (Greg)
- Calibration:
- Rosie: more study on clumping sensitivity with ILAMB. Differences are significant in the tropics
- Jessie: working on respiration, studying the base rate in Ryan et al.; found that increasing base respiration improved global biomass but degraded LAI and GPP. Atkin respiration is showing collapse in certain areas.
- Pasture discussion for FATES LU development targets/needs
- Erik: Will run_neon be run outside of cheyenne?
- No meeting next week, but then in two weeks?
- Current thoughts on % wetland
- Communications article from Jackie -- for two weeks?
- Erik: CSEG talked about mizuRoute grid aliases. At least Jim Edwards wanted shorter names for mizuRoute. HDMA, MERIT-Hydro, USGS-GF could be: H, M, U? Lake just l? I need Naoki to agree to this as well.
- group discussion Nov 14 & online discussion [pastures dev](https://github.com/NGEET/fates/discussions/936)
- further discussion CLM meeting Dec 1
- FATES group will set recommendations for LU development needs
- Matvey and Sam Rabin proposing same infrastructure development
- advance development to consider FATES with LU
- question of whether it's worth the effort to put pasture on its own landunit
- this raised a big set of questions around how to implement landuse with FATES; part of this is the desire to switch from net to gross transitions
Erik: will run_neon be run outside of cheyenne? The issue is that its defaults assume the path on cheyenne. A better way to do it would be to have a command-line input giving the path to inputdata so you can run on any machine (with a default set up for cheyenne), but this depends on how important it is to run places other than cheyenne.
Will: It has run in containers and on the cloud; not sure how Brian has gotten that working.
- Erik wonders if Brian set up the containers with paths mimicking cheyenne
Will does feel it would be good to allow this to work outside of cheyenne, like Erik suggested.
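Erik's suggestion (a command-line input giving the path to inputdata, with a default set up for cheyenne) could be sketched with `argparse` as below. The option name `--inputdata-dir`, the helper functions, and the default path are assumptions for illustration only; this is not the actual run_neon.py interface.

```python
import argparse
import os

# Assumption: illustrative cheyenne inputdata root; confirm the real default
# used by run_neon.py before relying on it.
DEFAULT_INPUTDATA = "/glade/p/cesmdata/cseg/inputdata"


def build_parser():
    """Build a parser with a hypothetical --inputdata-dir option."""
    parser = argparse.ArgumentParser(description="Sketch of a run_neon-style CLI")
    parser.add_argument(
        "--inputdata-dir",
        default=DEFAULT_INPUTDATA,
        help="root of the CESM inputdata tree (default: the cheyenne path)",
    )
    return parser


def resolve_inputdata(argv):
    """Parse the given argument list and return the inputdata root as an absolute path."""
    args = build_parser().parse_args(argv)
    return os.path.abspath(args.inputdata_dir)
```

On a POSIX system, `resolve_inputdata(["--inputdata-dir", "/scratch/inputdata"])` returns `/scratch/inputdata`, while `resolve_inputdata([])` falls back to the cheyenne default. Parser-level checks like these are cheap to run as unit tests, which helps ensure that command-line options keep working.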
Jackie: Regional subsetting runs are functional, but NUOPC is significantly slower in runtime – something like 3x slower.
Jackie didn't subset the forcing data, but she didn't on the MCT side either.
Bill: let's look at the timing files between the MCT and NUOPC runs.
- Jackie notes that the MCT runs she has are about a year old, but she wouldn't expect big timing differences between then and now on the FATES side.
Will noticed really slow single-point runs last week, but it seems that was due to running in the share queue.
- Could force using JOB_QUEUE=premium
Sam: has had faster regional runs compared to someone else due to some settings he made based on Mariana's suggestions. This included forcing use of pnetcdf, which is likely the main thing that helps.
Suggestion: Jackie will run 10-day runs with:
- Current setup
- Changing NTASKS=-1 for all components
- Additionally using pnetcdf rather than netcdf
Then we can look at timing files for those different runs, see if the final run achieves a reasonable turnaround, and determine whether to change any out-of-the-box settings.
There is general agreement that, moving forward, the guts of python scripts should go in the python directory (the exception potentially being if someone initially develops something outside of CTSM and contributes it: then it can go as is into the contrib directory). In terms of existing scripts: run_neon is a relatively high priority to do this with, as are the new mksurfdata tools.
- Have a PR for regional work
- Bill helped me solve a python issue in 2 seconds
- long FATES notes Nov 7, 2022
- Integrated
- 916 (batchpatchparams)
- 788 (crown damage) (much cheering)
- Calibration - Rosie working on ILAMB for ensembles
- Issue with albedo related to clumping (higher value lets light to understory)
- SP runs with clumping have high bias, 0 clumping reduces bias
- Perhaps an issue with LAI from MODIS (already accounts for clumping)?
- Handle this by reducing clumping, use clumping and increase LAI, other?
- AD spin-up and CTSM_AD
- AD seems to be working, but TOTSOMC should not be so large
- Too much ending up on soil C - needs investigation
- Non-spin off differences suggest problem (TOTSOMC and TOTECOSYSC)
- use_bedrock issue
- Being turned off on CLM side?
- Interface should be able to accommodate bedrock update
- Should be default ON - test impact of this change (impact calibration)
- TASK - Katie M test with her simulations
- Rosie test with SP calibration
- LU pastures and crops
- Rosie started discussion on this
- A number of decisions need to be made regarding crops and pastures.
- Task of implementing pasture practices on gridded cells: Rosie had planned for her and Matvey to work on this (but this can shift)
- separate task (someone else) transitions in and out of pasture through connections to LUH2 data
- Ryan Knox - there is overlap with patches on unique soil columns
- we should synchronize and build in future compatibility including functionality with FATES and future efforts
- working together we can avoid redundancy
- Respiration (Atkin 2017)
- Jessie Needham, C Koven testing new respiration model (Atkin 2017)
- Erik: FYI. mizuRoute Lake will need a different grid. Thinking about names like: hcru_hcru_mt13_rHDMA-lake
- Erik: Planning issue for ndep. Mariana implemented passing ndep in datm, so now it appears that ndep comes from both datm and CTSM, which is confusing. Starting to use ndep from datm will likely change answers. And what do we do about ndep from CAM for non-WACCM cases? There's an issue in CAM to pass ndep, but as far as I can tell it's not done.
- Longwave issue
- How are discussions working?
- Erik: We had some offline discussion about keeping healthy lines of communication open. Talking about our work as we are making decisions works better than working in isolation and then others having issues with the completed work. How do we keep these lines open? Does everyone feel able to ask for help or questions on your work? Beatrice's communication model of Frisbee vs. Shotput helps with this. As well as the idea of psychological safety. Do we feel safe enough to bring up problems, or admit not knowing something? The ability to have difficult conversations also rests on our level of trust with each other. The one idea I have here right now is to make sure we all have an open Google chat line with each other? And possibly we should make a point of doing a small project with someone we haven't worked with before? This will need some long term thinking and we'll need to come back to it...
- Erik: For the long term, is SLIM something that LMWG supports? I think its long-term support won't be a lot, maybe a few hours a month at most (similar to support of RTM/MOSART). But it is non-zero. One reason for LMWG to support it is that its code base is more similar to CTSM than to CAM.
- Erik: I'm noticing glitchy issues with run_neon.py that indicate to me that we should add more testing for it. Currently we only have a few tests that run it at the system level. I think the guts should be moved to the python/ctsm directory and then unit and system python tests be added. Specifically we want to make sure all of the command line options work and continue to work.
- Erik: I'm noticing we are opening new issues for existing issues. I think making sure we search for relevant older issues can be helpful for that. Also some issues are hard to figure out what the "definition of done" is. It would be good to have this in mind when we create an issue (unless the purpose is for discussion, when it should be opened as a discussion). It would be good to be able to close many issues that have been mostly addressed.
Jackie's impression is that we should have a larger group discussion on pastures, so we'll probably wait to talk more with Sam Rabin about this until we can schedule a larger discussion that includes FATES folks.
Erik proposes hcru_hcru_mt13_rHDMA-lake. Question about whether the dash is okay.
Note that glacier grids have the glacier grid before the mask. We'll talk about which feels best.
Keith feels this has been working so far. He directs people to GitHub discussions for things that seem like they would benefit from more input.
For the long-term, is SLIM something that LMWG supports? Hopefully this won't be a huge amount of support, but it's non-zero.
General feeling is that it makes sense for LMWG to support it since the code base is similar to CTSM. But we also need recognition that we can't keep supporting more and more things with flat resources.
- Nice to have Negin back as our guest.
- We had a planning session about SLIM that happened and went well.
- Danny was able to show that he gets similar results with his dust model updated to ctsm5.1.dev106
- Short term: no CTSM API impact
- integration of 788 (crown damage), 916 (batchpatchparams), 891 (grass fire handling)
- Long term: integration of 851 (hydro stability), 880 (nutrients v2), 888 (ELM c-based harvest)
- MCT & regional subset data
- Negin working on CTSM_PR1735 python
- CTSM_1773 two line is NOT a long-term solution (CIME dependence for module compatibility)
- gnu MCT fail https://github.com/ESCOMP/CTSM/issues/1887#issuecomment-1295665202
- Erik work around
- Filtering discussion
- Updated filter map concept
- FATES group agrees this is a good idea
- with filters at patch and column levels, use filter most appropriate for process
- work on fates side to identify columns for fates use (crop or natural)
- Would clean up and clarify the code by removing if/else statements
- Ryan Knox needs filters for nutrient work (CTSM 1046), and will add them incrementally.
- Greg suggests sketching it out in design document to build off of later
- Negin: Update on the regional mesh subset_data work.
- Bill: planning to move ahead with other changes to mksurfdata (to handle coastal areas more rigorously)
- Erik: Three NEON questions. I have it set up to run KONA and STER with BGC/Crop; if you really do want to run with SP, or BGC without CROP, you'd need to explicitly do this afterwards. Also, because there are now two compsets for run_neon.py, you have to overwrite the previous main case that you clone from. This could be more robust by checking the compset of the case being cloned from. I think that's the right thing to do.
- Erik: The other NEON/subset_data question is that when you set dompft to a natural veg type, the PCT_CFT array is still filled. I wonder if it should be reset just so that it "looks right"; otherwise it looks "wrong". It doesn't actually cause a problem, but it gives me pause when I look at the datasets. There is a similar problem for an AG site.
- Erik: For the long term, is SLIM something that LMWG supports? I think its long-term support won't be a lot, maybe a few hours a month at most (similar to support of RTM/MOSART). But it is non-zero. One reason for LMWG to support it is that its code base is more similar to CTSM than to CAM.
- Erik: I'm noticing glitchy issues with run_neon.py that indicate to me that we should add more testing for it. Currently we only have a few tests that run it at the system level. I think the guts should be moved to the python/ctsm directory and then unit and system python tests be added. Specifically we want to make sure all of the command line options work and continue to work.
- Erik: I'm noticing we are opening new issues for existing issues. I think making sure we search for relevant older issues can be helpful for that. Also some issues are hard to figure out what the "definition of done" is. It would be good to have this in mind when we create an issue (unless the purpose is for discussion, when it should be opened as a discussion). It would be good to be able to close many issues that have been mostly addressed.
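The filter-map idea from the filtering discussion above can be illustrated with a small NumPy sketch: each named filter is an array of column indices, so a process applies itself only to the filter that fits it instead of branching with if/else on every column. The per-column flags, filter names, and helper functions here are hypothetical; CTSM's actual filters are Fortran index arrays, and this only shows the shape of the idea.

```python
import numpy as np


def build_filter_map(col_active, col_is_crop, col_is_fates):
    """Return named column filters as arrays of column indices.

    All three inputs are hypothetical per-column boolean flags.
    """
    active = np.asarray(col_active, dtype=bool)
    crop = np.asarray(col_is_crop, dtype=bool)
    fates = np.asarray(col_is_fates, dtype=bool)
    return {
        "active": np.flatnonzero(active),
        "crop": np.flatnonzero(active & crop),
        "fates_natural": np.flatnonzero(active & fates & ~crop),
        "nonfates_natural": np.flatnonzero(active & ~fates & ~crop),
    }


def apply_on_filter(values, filt, process):
    """Apply `process` only to the columns listed in `filt`; other columns are unchanged."""
    out = np.array(values, dtype=float)
    out[filt] = process(out[filt])
    return out
```

For four columns where column 2 is inactive, column 1 is crop, and column 0 is FATES, `build_filter_map([True, True, False, True], [False, True, False, False], [True, False, True, False])` gives a crop filter of `[1]`, a FATES-natural filter of `[0]`, and a non-FATES natural filter of `[3]`. A crop-only process then touches only column 1.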
Keith L is close to getting his Newton-Krylov method working. In contrast to CN-Matrix, it doesn't assume linearity, so will hopefully be useful for MIMICS.
Keith wonders if it could be a helpful method for FATES as well. Ryan suggests talking to Charlie about this.
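Newton-Krylov spinup treats spinup as root finding: it solves for pool sizes at which every tendency is zero. The Jacobian is only needed through Jacobian-vector products, which are approximated by finite differences, so no linearity assumption is required (unlike CN-Matrix). The toy two-pool model below illustrates this with SciPy's generic `newton_krylov` solver. The model, its parameters, and the function names are hypothetical and loosely MIMICS-like in spirit. This is not Keith's implementation.

```python
import numpy as np
from scipy.optimize import newton_krylov

# Hypothetical toy parameters, not CTSM or MIMICS values.
LITTER_INPUT = 1.0   # litter input rate
V_MAX = 0.5          # maximum decomposition rate
K_HALF = 1.0         # microbial biomass at half the maximum rate
CUE = 0.3            # microbial carbon use efficiency
TURNOVER = 0.1       # microbial turnover rate


def tendency(pools):
    """Time derivative of [litter, microbes] in a toy two-pool model.

    Decomposition depends on litter times a saturating function of microbial
    biomass, so the system is nonlinear.
    """
    litter, microbes = pools
    decomp = V_MAX * litter * microbes / (K_HALF + microbes)
    return np.array([LITTER_INPUT - decomp, CUE * decomp - TURNOVER * microbes])


def spin_up(initial_guess):
    """Find the steady state by solving tendency(pools) = 0 with Newton-Krylov."""
    return newton_krylov(tendency, np.asarray(initial_guess, dtype=float), f_tol=1e-8)


def analytic_steady_state():
    """Closed-form steady state of the toy model, used as a check."""
    microbes = CUE * LITTER_INPUT / TURNOVER
    litter = LITTER_INPUT * (K_HALF + microbes) / (V_MAX * microbes)
    return np.array([litter, microbes])
```

Starting from `[1.0, 1.0]`, `spin_up` should converge in a few Newton iterations to roughly `[2.667, 3.0]`, which matches the closed form. A real land-model application would replace `tendency` with one model year of pool changes, which is where most of the cost and the FATES patch-dynamics complications come in.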
Rather than subsetting mesh files, Negin's new approach involves creating a mesh from the center coordinates on the surface dataset. This works for curvilinear grids, though not for unstructured grids; feeling is that that is probably an uncommon use case for regional cases.
- There is a separate workflow that could be used for unstructured grids.
The new approach also leverages dask arrays for the sake of performance.
Negin will open a new PR for this, since the old PR was based on the approach of subsetting an existing mesh file.
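Building a regional mesh from the center coordinates on a surface dataset needs cell corners, and those can be estimated from the centers on a logically rectangular (including curvilinear) grid. The NumPy sketch below also builds an element mask that is all land unless a land-mask variable is supplied, mirroring the mesh_maker.py `--mask` behaviour described in these notes. It is an assumption-laden illustration, not Negin's code: it does not use dask, does not handle longitude wraparound or poles, and its function names are hypothetical.

```python
import numpy as np


def corners_from_centers(center):
    """Estimate (ny+1, nx+1) corner values from (ny, nx) cell-center values.

    Edges are extended by linear extrapolation (odd reflection), then each
    corner is the mean of the four centers around it.
    """
    center = np.asarray(center, dtype=float)
    padded = np.pad(center, 1, mode="reflect", reflect_type="odd")
    return 0.25 * (padded[:-1, :-1] + padded[1:, :-1]
                   + padded[:-1, 1:] + padded[1:, 1:])


def element_mask(shape, landmask=None):
    """Return a 0/1 element mask: all land unless a land mask is supplied."""
    if landmask is None:
        return np.ones(shape, dtype=np.int32)
    landmask = np.asarray(landmask)
    if landmask.shape != tuple(shape):
        raise ValueError("land mask shape does not match the grid")
    return (landmask > 0).astype(np.int32)
```

For a 2x2 grid with center longitudes 0.5 and 1.5, `corners_from_centers` gives corner longitudes 0, 1, and 2 along each row. Calling `element_mask((2, 2))` without a mask marks every point as land, which is the default behaviour that regional cases need to override.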
Suggestion of using BGC-Crop for everything. (It shouldn't matter much, if at all, to use BGC-Crop for sites without crop.)
For FATES, though, you currently can't run with crop. So, given that issue, let's not change the compset across the board to BGC-Crop.
- CGD-DEI lunch about MGEN yesterday
- Nutrients - Ryan completed initial scoping work for CTSM-FATES (maybe early 2023)
- Higher order history output CLM/ELM, abandon multiplexing? - Erik Kluzek & Ryan Knox per FATES-PR880
- Can CLM accommodate this with reliance on netcdf libraries?
- Subset data
- Once Sam Levis adds nco commands to Negin's python script we can deprecate MCT
- This will take nco commands from CTSM_1773 two line and add them to CTSM_PR1735 python
- Erik: Proposal for turning direct_to_outlet on in MOSART. One way would be a user-mod directory that is turned on when it sees the compset uses MOM. This could be done either at the CESM level for all-active compsets, or in MOSART. Another way would be for the buildnml to query COMP_OCN and if it's MOM turn direct_to_outlet to true. The query could also be to check for either MOM or POP, or if COMP_OCN is not: socn, xocn, or docn.
- Erik: When ctsm5.2 comes to main the surface datasets and thus answers will change for all physics options. Should we continue to make answer changing ctsm5.2 tags or fairly quickly go to ctsm5.3 tags, once we have something that changes answers?
- Erik/Bill: Also along with above should the one change for clm5_2 physics be to do beaches rather than wetlands for coastal/islands that disagree with ocean mask?
- Bill: How important is it to have this option differ for 5_2 vs. 5_1, and more generally how important is it to introduce a 5_2 physics option? I realized that a lot of changes are needed to introduce a new 5_2 physics option. (This is another thing that I think will become easier once we pythonize build-namelist.)
- Bill: Do we want the negative runoff fixes (https://github.com/ESCOMP/CTSM/discussions/1835#discussioncomment-3926444) to only apply for fully-coupled cases, or for all cases? (See also some discussion in https://github.com/ESCOMP/CTSM/issues/1878 )
- Bill: Connected with the above point: it seems like LND_TUNING_MODE should be set to use the CAM-specific tunings when doing a cplhist-forced run. Is that what's typically done?
- Erik: Should the I1Pt compsets be set to use the new 2018_control and 2018-PD_transient use-cases I'm setting up?
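The buildnml variant of Erik's direct_to_outlet proposal queries COMP_OCN and turns direct_to_outlet on when an active ocean is present. It could look roughly like the sketch below. The function name and the `mom_only` switch, which chooses between the "only MOM" variant and the "anything except socn/xocn/docn" variant, are hypothetical. This illustrates one proposed option, not necessarily the plan that was adopted.

```python
# Assumption: "socn", "xocn", and "docn" are the stub, dead, and data ocean
# components, i.e. the cases without an active ocean model.
INACTIVE_OCEANS = frozenset({"socn", "xocn", "docn"})


def direct_to_outlet_default(comp_ocn, mom_only=False):
    """Return the proposed direct_to_outlet default for a given COMP_OCN.

    In a real buildnml the value would presumably come from the case object,
    e.g. case.get_value("COMP_OCN").
    """
    comp = comp_ocn.strip().lower()
    if mom_only:
        return comp == "mom"
    return comp not in INACTIVE_OCEANS
```

With the default variant, both `"mom"` and `"pop"` turn direct_to_outlet on and `"docn"` leaves it off. With `mom_only=True`, only MOM turns it on.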
We support Ryan's ideas for new filters for the sake of FATES.
Two main questions:
- Should these things apply only in coupled runs, or all?
- Should these things apply only in ctsm5_2 or all phys versions?
On question (1), the river model changes are probably the more important. There is probably some scientific benefit to only applying these in coupled cases where they're needed.
The wetland change can apply to clm51 and beyond – no need for a new clm52 for that. That's going to be awkward for the river model changes, though.
The initial plan will be:
- Do not depend on coupled vs. uncoupled (we might revisit that)
- Wetland change will apply to clm51 and beyond; river model changes will be the new river model default (not depending on CTSM physics version)
Keith notes that the differences from the direct_to_outlet+wet2veg in terms of runoff are small.
Bill: it seems like LND_TUNING_MODE should be set to use the CAM-specific tunings when doing a cplhist-forced run. Is that what's typically done?
Keith: For the official spinups for CESM2, he used CLM5_CAM6 tunings for the CPLHIST-based spinups.
Erik points out that we're going to use new surface datasets that apply to all physics options. Do we want to put new answer changes in CLM52, or quickly move on to CLM53?
In the past, we've used a change in the tag version to indicate the introduction of a new physics version, without implying a change in the behavior of past physics versions. CLM52 will differ in that it implies a change to all previous physics versions.
People are okay with the idea that we will introduce a CLM53 once we want new answer changes beyond the new surface datasets.
We notice that they're currently using CLM50... we should suggest changing to CLM51.
Will: The I1Pt compsets will probably be used for a variety of purposes, so doesn't want these to be tied too much to NEON. But doesn't see strong arguments one way or the other. Would ideally like it to be clear to users that you might need to change this whenever running an I1Pt compset.
link to long FATES notes: Oct 17, 2022
- Testing regional subsetting method by @slevis
- two regional subset methods CTSM_1773 two line & CTSM_PR1735 python
- Integrated
- 914 (ERS b4b tveg)
- 917 (github automated issue to project board)
- Of interest to CTSM? method from @glemieux
- Still in testing
- Will plans to introduce discussion feature re. CTSM5.2 at CLM meeting today (this will not be a focus of the meeting).
- What happened in the CTSM5.2 BGC testing with transient lakes? Are there additional 'issues' with LULCC?
- Science and SE priorities for CTSM5.x
- Erik: Gordon's point about things that are "experimental", don't fully work, but need to come in so others can work on them. What are different ways to approach this? What standards do we need to insist on and what can we let slide for this category? What about maintaining long-term branches for these types of efforts?
The one thing that isn't working in Negin's script is subsetting mesh files. Sam has an nco-based method for that piece. Sam is still using Negin's script for subsetting the surface dataset.
Will asks if we can add the nco commands to the python script. Sam thinks that could be possible.
Once we have this working, we can move ahead with the deprecation of MCT.
https://github.com/NGEET/fates/pull/917
Bill described a proposed strategy from the LIWG to greatly reduce negative ice runoff from ice sheets by integrating in space and time. Feeling is that this is probably worth moving forward with.
Plan is to do three things:
- Changing wetlands to bare ground [Bill]
- River model option to make positive runoff offset negative [Erik]
- (Hardest) global spreading of the remaining negative runoff, which we remove from positive runoff [TBD]
- Where would we want to do this? If it were in the land model, then you'd be spreading before dealing with the offsetting of positive with negative. So maybe spread in the river model or in the coupler? The coupler would have the advantage of only needing to do it once, not in each river model, but the implementation might be harder for that.
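The first and third steps of the plan above (let positive runoff offset negative runoff, then spread what is left globally and remove it from positive runoff) can be sketched as follows. This is a minimal illustration only, not the MOSART, mizuRoute or coupler implementation, and it leaves open where the spreading would live. It assumes runoff is a per-cell flux with a matching cell area, that offsetting happens within a river basin (the `basin_id` labels are hypothetical, standing in for routing to the outlet), and that remaining negative volume is removed from positive runoff in proportion to each cell's positive volume; that proportional rule is an assumption, not a decision from the meeting.

```python
from collections import defaultdict


def offset_and_spread(runoff, area, basin_id):
    """Remove negative runoff in two steps; return (adjusted_runoff, unresolved).

    runoff   -- per-cell runoff flux (may be negative)
    area     -- per-cell area used to convert flux to volume
    basin_id -- hypothetical per-cell basin label; offsetting happens within a basin
    adjusted_runoff has no negative values; unresolved is negative volume that
    could not be removed because there was not enough positive runoff globally.
    """
    n = len(runoff)
    volume = [runoff[i] * area[i] for i in range(n)]

    # Group cells by basin.
    basins = defaultdict(list)
    for i, b in enumerate(basin_id):
        basins[b].append(i)

    # Step 1: within each basin, positive volume offsets negative volume.
    adjusted = [0.0] * n
    leftover = 0.0  # negative volume not offset within its own basin
    for cells in basins.values():
        pos = sum(volume[i] for i in cells if volume[i] > 0)
        neg = -sum(volume[i] for i in cells if volume[i] < 0)
        if pos > neg:
            # Shrink positive cells proportionally so the basin total is kept.
            scale = (pos - neg) / pos
            for i in cells:
                if volume[i] > 0:
                    adjusted[i] = volume[i] * scale
        else:
            # All positive volume in this basin is consumed (cells stay at 0).
            leftover += neg - pos

    # Step 2: spread the leftover globally, removing it from positive runoff
    # in proportion to each cell's remaining positive volume.
    total_pos = sum(adjusted)
    unresolved = 0.0
    if leftover > 0:
        if total_pos > leftover:
            factor = 1.0 - leftover / total_pos
        else:
            factor = 0.0
            unresolved = leftover - total_pos
        adjusted = [v * factor for v in adjusted]

    return [adjusted[i] / area[i] for i in range(n)], unresolved
```

The global total is conserved: the sum of adjusted volume minus `unresolved` equals the original total volume.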
Sam got BGC with transient lakes working for hybrid cases by bypassing the ch4 balance checks at the start of the run.
Longer-term we should resolve this correctly (https://github.com/escomp/ctsm/issues/43)
We are happy to bring in experimental / partially-working things in some cases. We have done that for a number of things. It's often better to bring it in than to maintain a long-term branch, since the latter can have a high maintenance cost.
- I got the sparse matrix case to work with NUOPC!
link to long FATES notes: Oct 10, 2022
- FATES_PR914 will fix FATES_908 (bare ground area_pft) and FATES_911 (veg temp looping). Looks good; Greg is investigating expected differences.
- CTSM_PR1849 (add long restarts) still in testing to fix FATES_897 (ERS fail in long tests). Retest after FATES_PR914.
- E3SM#5106 (C-based harvests) in review. CTSM_PR1040 (area-based harvest): need to test read of LUH2 or update to streams. CTSM_1077 (read raw LUH2).
- Update on regional script status
- Greg: Some considerations of polymorphism -- next time?
- Erik: I'd like to go through the "bug impacts science" list of issues. Can we mark most as "low priority" if they aren't important? Some are more involved and maybe can't be marked as low priority either. -- next time?
- ctsm5.1.dev111 brought work from a bunch of people to get NEON working again
- Systems of oppression was good. You can watch it for a limited time at https://operations.ucar.edu/erg. An impactful insight for me was that racism was the coupling of prejudice with power. Minority groups by definition don't have power.
link to long FATES notes: Oct 3, 2022
- CTSM_PR1827 (updated test mods) uncovered issues FATES_908 (bare ground area_pft) & FATES_911 (veg temp looping)
- FATES_PR914 (bareground & veg temp) will fix FATES_908 (bare ground area_pft) and FATES_911 (veg temp looping)
- CTSM_PR1849 (add long-term restarts) still in testing to fix FATES_897 (ERS test fail for long tests)
- Erik: Sparse grid status. Need to change forcing "mesh" to a list of disconnected points. I think we should make this something that we turn into a user-mod that we test, because of the effort to get it working. Should we check in the forcing data and put it in the standard location? It's currently under scratch and only 4 GB.
- Erik: FAN work in CAM. We need to get a group together for this and look at how this was done before.
- Erik: Are all the direct_to_outlet fixes in place? https://github.com/ESCOMP/MOSART/pull/57
Greg figured out exact restart issue around vegetation temperatures. The issue ended up being with the bare ground patch, and a mismatch between two loops.
Ryan's thoughts on FATES-CTSM coupling:
- He has been working on the FATES side of nutrients
- A few people might start working on a design document to scope out changing the communication level with FATES: currently, have FATES patches communicating with CTSM patches. But because of nutrients, hydraulics and land use, thinking of having each FATES patch point to a column.
- Each patch would have its own column, so that each patch has its own nutrient and hydrologic environment.
- Bill: Currently, you can have multiple vegetated columns, although transient land use doesn't work with that yet. One concern is the computational expense. Would also probably want to treat conservation of below-ground water and energy more rigorously when changing column areas, splitting or merging columns.
Suggestion from Bob Oehmke is to create infinitesimally small points in the mesh that have no connectivity. Erik will try this.
Erik suggests having a usermod directory for this and having a test for it.
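A minimal sketch of Bob Oehmke's suggestion: give each forcing point its own tiny quadrilateral cell with four private nodes, so no two cells share a node and there is no connectivity between points. The output is modeled loosely on the node-coordinate / element-connectivity arrays of an ESMF unstructured mesh file; the function name, the `half_width` value and the counterclockwise, 1-based node ordering are assumptions for illustration, not taken from an actual CTSM mesh.

```python
def tiny_cell_mesh(points, half_width=1.0e-4):
    """Build one tiny, disconnected quadrilateral per (lon, lat) point.

    half_width -- hypothetical half-size of each cell, in degrees
    Returns (node_coords, element_conn, center_coords):
      node_coords   -- list of (lon, lat) nodes, 4 per point, never shared
      element_conn  -- per element, 1-based indices into node_coords,
                       ordered counterclockwise (assumed convention)
      center_coords -- the original points, used as element centers
    """
    node_coords = []
    element_conn = []
    for lon, lat in points:
        start = len(node_coords)
        # Counterclockwise corners: SW, SE, NE, NW.
        node_coords.extend([
            (lon - half_width, lat - half_width),
            (lon + half_width, lat - half_width),
            (lon + half_width, lat + half_width),
            (lon - half_width, lat + half_width),
        ])
        # Each element points only at its own four nodes (1-based).
        element_conn.append([start + k + 1 for k in range(4)])
    center_coords = [tuple(p) for p in points]
    return node_coords, element_conn, center_coords
```

Points near the poles or the dateline would need extra care; this sketch does not handle them.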
Are roughness changes in? No, these are still outstanding in https://github.com/ESCOMP/CTSM/pull/1596
- Not a win, but we will miss Negin working with us.
- Sam did get a couple tags in place
- Naoki with mizuRoute has handling of irrigation, qgrwl field, and route_to_outlet in place
- Erik: Things to finish dropping support of MCT. https://github.com/orgs/ESCOMP/projects/2/views/12
We wonder if it's worth adding a test to cover the NEON spinup. Bill thinks it might be overkill to have an extra test to cover just a few lines of code.
Will suggests having a separate test suite that's run when updating externals that could include things like this. We like this idea.
- We could consider not having baseline comparisons for this – or generate baselines on the previous tag before running this.
- Erik/Sam: How to handle lake datasets for depth. We have three files for depth 1850, 1900, 2017. If depth is just on the surface dataset it will be fixed at 1850 throughout a historical or future SSP. Also for a 2000 control if 2017 is used it will be off. It would be better to have depth also be transient. Is that even a possibility? That would likely mess up the science when it changes. So it would take research science effort to have it be transient. Alternatively we could have it fixed for one year and would need to pick what year that should be (presumably 2017).
FATES restart issue: nocomp case works now, though there are still restart issues with more complex cases. TVEG is the field showing restart problems. TVEG24 shows differences only in the ERP threaded test (non-threaded tests don't show differences).
The new lake dataset includes a transient feature, so there are %LAKE data from 1850-2100. The baseline files (1850, 1900 and 2017) contain both %LAKE and LAKEDEPTH. So there is a potential inconsistency.
We should check if LAKEDEPTH is the same for all of these files. If so, then the solution could be to separate LAKEDEPTH from %LAKE onto different files. Then we could point to one year from the transient dataset for the sake of the surface dataset.
- Design docs
There is general support for the design doc template that Adrianna put together – thanks, Adrianna!
There are potentially two purposes that these serve:
- Aiding an initial design review before implementation begins
- Long-term documentation of design / architectural decisions (particularly useful for big things)
Where should we put these?
- Possibly putting them in the issue, PR and/or ChangeLog (at least putting the relevant information there)
- Probably put it as a comment, possibly as part of the initial PR comment (as opposed to attaching a file)
- For things we want to stick around longer-term – especially big, architectural decisions – putting them in the doc/design directory in the repository.
We'll put the template in the repository.
We'll be flexible and treat this as an evolving process – e.g., using this template as a guide but not necessarily feeling the need to fill out every section for everything.
There's a balance here: we don't want to have design documents for every little thing, but we've probably been too much on the side of not having design documents when they would be helpful.
Sam: Should we consider having an issue type with a template for this design doc, so that if people are opening an issue where they have an idea for a design they have a template that they can fill out?
- Adrianna got a difficult externals tag in!
- Datasets for CTSM5.2
- Status of various tools
- Adrianna Ozone and externals update
- Erik: Greg show us the FATES issue board you put together
- Should we have a longer discussion on datasets for CTSM5.2 next week with Sam Levis?
- How are Discussions working? When / how do we want to start conditioning others to use them?
- Erik: I am running into trouble with FAN. I'm seeing why we weren't able to get it in previously. FAN doesn't work with threading (which is a problem for running with CAM). It also doesn't work in DEBUG mode (which I might have solved). There are some fields that are read in with Infinity, which I interpreted as missing value, but it's possible it's intended to indicate a large value. Nitrogen balance checks are an issue. I kind of want to move forward in the midst of a bunch of problems just to get us to the end of the merge and then bring in the group. But, I'm afraid that will cause problems harder to solve.
- Erik: CGD Peeps be sure to take the CGD culture survey which will be announced today...
Greg is finding that long tests (year-long, with restart mid-year) fail exact restart, in most FATES modes (SP is fine).
For non-FATES – especially for crop – we have a variety of multi-year restart tests:
- Multi-year with restarts on the year boundary
- Multi-year with mid-year restarts
- Long (20-year) single-point with restarts
Erik: I am running into trouble with FAN. I'm seeing why we weren't able to get it in previously. FAN doesn't work with threading (which is a problem for running with CAM). It also doesn't work in DEBUG mode (which I might have solved). There are some fields that are read in with Infinity, which I interpreted as missing value, but it's possible it's intended to indicate a large value. Nitrogen balance checks are an issue. I kind of want to move forward in the midst of a bunch of problems just to get us to the end of the merge and then bring in the group. But, I'm afraid that will cause problems harder to solve.
We're nervous about spending a lot of time on this right now. Probably go back to Peter and his group about where things stand and what is needed.
- Quick updates on outstanding projects, the list below is likely incomplete. This can also be something we discuss at another time...
- anomaly forcing (usermods_dirs?, compsets? documentation?)
- regional cases
- CTSM5.2 branch, (have we tried running with any of the updated datasets)?
- Others? (Erik: We could make this kind of update a regular thing for important projects. Have to think how that would look...)
- Erik: Help Response levels. I want to get a handle on my own time spent on helping people, and think it would be good if we coordinate this. I suggest a four level response, with only one level being immediate.
Four level system:
- A CTSM-Software Team: (Bill, Adrianna, Negin, Sam, Naoki) should respond fairly quickly
- B CTSM/CESM Leadership: (Will, Dave, Danica, Mariana) Respond quickly, but schedule
- C CESM Wide at NCAR: Scheduled response
- C CESM Active Collaborators: Scheduled response
- D CESM Wide: In general – make them go through the forum
- D CESM Forums: Keith and I monitor daily, and respond when mentioned or when an answer doesn't come
Expertise:
- Negin: CTSM-WRF, NEON and subset_data
- Adrianna/Jackie: FATES for NCAR folks
- Greg/Ryan: FATES for non-NCAR folks
In general direct people with issues to the expert. When they can't respond we point out they are out. If they get stuck they bring it up to the team, and then we work on it together (part of the reason why we should be most responsive to the team).
- Adrianna: Design doc template.
Erik is doing 4 updates to different versions... will take longer than he originally thought, but not forever. He's probably 1/4 of the way done.
Working through a restart problem.
Want to add a test to the FATES test suite that does longer runs.
Want to get an ILAMB automated test going. Probably wouldn't be in the standard test suite, but could be run similarly. Probably, rather than giving you images, it would give you some summary numbers telling you the difference from the baseline.
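One way to picture "summary numbers telling you the difference from the baseline" is a comparison of scalar scores against a stored baseline. This is a minimal sketch under stated assumptions: the metric names, the dictionary format and the 1% relative tolerance are hypothetical, and it does not use the ILAMB package itself.

```python
def compare_to_baseline(current, baseline, rel_tol=0.01):
    """Compare scalar summary metrics against a baseline.

    current, baseline -- dicts mapping a metric name to a float score
    rel_tol           -- hypothetical relative tolerance for flagging a change
    Returns a dict: metric -> (baseline, current, relative difference, flagged).
    A metric missing from either side is always flagged.
    """
    report = {}
    for name in sorted(set(current) | set(baseline)):
        if name not in current or name not in baseline:
            report[name] = (baseline.get(name), current.get(name), None, True)
            continue
        b, c = baseline[name], current[name]
        if b != 0:
            rel = abs(c - b) / abs(b)
        else:
            rel = 0.0 if c == 0 else float("inf")
        report[name] = (b, c, rel, rel > rel_tol)
    return report
```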
Erik: we have an issue saying what to do. What's left to do is to bring this into a CDEPS tag and make it happen automatically. Probably fairly straightforward (involving some changes to namelist defaults), but Erik hasn't actually tried doing it.
Will: Should also add some documentation of this.
- Erik: ideally, in the end, this should just give you what you need automatically, so may not need a lot of documentation.
- But there is some documentation that Sean added, which will need to be updated: what Sean added is for MCT rather than NUOPC. (The MCT documentation will still be around in the 5.0 release documentation.)
Adrianna has pointed out that it would help our issue management if we made it more clear what "done" looks like. Let's try to be more explicit about this moving forward.
Sam has been working on this – mostly focusing on making mksurfdata_esmf easier to use.
See Erik's proposed four-level system above.
What does "fairly quickly" mean for responding to emails among the software team?
- Bill's approach: use email for things that don't need an absolute immediate (within the day) response... using chat for things that need a more immediate response.
Should we have more of an email ticketing system?
- Is there an advantage of this over our existing systems (forums, GitHub)
- Would GitHub Discussions be useful... or just one more thing?
- Possible concern that someone would be uncomfortable posting basic questions in a public forum.
There is some interest in at least exploring GitHub Discussions and seeing if it could work well... though concerns about overlap with the Forums. Maybe long-term we just keep one.... For now, if people email us, let's point people to Discussions.
In terms of expertise:
- For FATES, who should answer a question isn't necessarily clear-cut based on institution. Maybe it's more of a question of infrastructure/software vs. science.
- We should be comfortable telling people that the point person on a particular topic is out but will answer this when back to work.
- Baby shower for Danica and Adrianna was awesome.
- FORTRAN is a language that you can be fluent in (Joke at the multicultural workshop)
- Erik: Outstanding PR's, notably prioritization for FAN #767.
- Erik: Passing filters into InitCold? I suggested this for #1787 I think I should probably withdraw that request.
- Erik/Kaveh: With the Norway connection being important I think we should get access to a Norway machine. Who? Is the Norway machine a standard part of ccs_config? Should we run testing there on a regular basis?
- Erik/Kaveh: I'm going to work with Kaveh on the excess_ice tag together. Then we'll also do the PR's he's prepared. Doing two tags together, first just watching, second driving, is a good way to onboard someone to the team. I think we should make that a standard way of bringing new people on.
There are some research priorities with FAN, but it's questionable whether the 3-year-old PR is still useful as it is.
What would the feasibility be to bring the current branch up to date with the latest master?
- Note that there are quite a few conflicting files, so this probably won't be trivial.
- On the other hand, the changes in FAN are pretty minimal outside of a few files, so maybe it won't be too bad???
For the excess ice PR, Erik initially suggested changing a type conditional to a filter. But now we realize that this is in initCold, and we generally don't seem to use filters there. So it seems okay to use a type-based conditional there.
There is support there for adding someone from here to the Norway machine. Probably one person from here is fine for now.
In terms of porting: they currently have a .cime directory that they point people to.
There are two issues that add complexity to FATES test mods:
- FATES compsets leverage user mods
- We'd like to do away with this, leveraging other mechanisms to set options. For history variables, there could be code in FATES to turn off some history variables with certain options
- There is a lot of layering in the use of FATES test mods
- There is general agreement that we should flatten this structure, even if it means some duplication between test mods. Adrianna will give this a shot.
There are some islands where we have negative runoff. We currently classify those islands as wetlands (due to a mismatch between the landfrac being used in the run and the landfrac used in generating the surface dataset).
Note that MOSART doesn't allow routing of negative runoff. The standard option is to pull that directly from the ocean. MOSART has an option to route the negative runoff to the river mouth so that positive and negative runoff can cancel out; it doesn't seem to be working currently, but it's probably worth investigating this further.
If we had mizuRoute with dynamic lakes, that could solve the lake issue – since a lake would just dry up rather than generating a persistent negative water balance. The potential issue with this is that it could be less realistic scientifically: lakes could dry up during model spinup.
In terms of the negative water balance from wetlands: feeling is that we should try modeling those areas as land rather than wetland to avoid negative water balance. That isn't ideal scientifically, but maybe is okay. There is a question of what the surface (i.e., vegetation) type should be in those areas. If it's a grid cell that already has some vegetation present, then we can apply the vegetation fraction from the land-covered portion to the entire grid cell (Bill's note: I think that might be what is already done). For a land grid cell without any vegetation information, we guess we'll leave it as bare ground (we'll have big beaches in CLM!)... this doesn't feel ideal scientifically, but hopefully there aren't too many grid cells where this is happening.
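The per-grid-cell rule described above can be sketched as follows. This is a minimal illustration, assuming the wetland and natural-vegetation percentages are percent of the grid cell's land, that PFT percentages are percent of the natural-vegetation landunit (summing to 100), and that bare ground is PFT index 0 as in CTSM's natural PFT list; the function name and data layout are hypothetical, not taken from mksurfdata_esmf.

```python
BARE_GROUND = 0  # assumed index of the bare-ground ("not vegetated") PFT


def wetland_to_natveg(pct_wetland, pct_natveg, pft_pct):
    """Fold the wetland landunit into the natural-vegetation landunit.

    pct_wetland -- % of the grid cell's land that is wetland
    pct_natveg  -- % of the grid cell's land that is natural vegetation
    pft_pct     -- dict PFT index -> % of the natural-veg landunit (sums to 100),
                   or an empty dict if there is no vegetation information
    Returns (new_pct_wetland, new_pct_natveg, new_pft_pct).
    """
    new_natveg = pct_natveg + pct_wetland
    if pct_natveg > 0 and pft_pct:
        # Vegetation present: the existing PFT distribution (percent of the
        # landunit) now covers the enlarged landunit unchanged.
        new_pft = dict(pft_pct)
    else:
        # No vegetation information: the whole landunit becomes bare ground.
        new_pft = {BARE_GROUND: 100.0}
    return 0.0, new_natveg, new_pft
```

Because PFT percentages are relative to the landunit, keeping them unchanged is what "applying the vegetation fraction from the land-covered portion to the entire grid cell" amounts to in this layout.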
- Erik: The code review of the excess ice work largely looks good. I think we need to meet with Kaveh and Matvey to plan how to finish it out.
- Bill: extra step for LILAC
- Erik: Dust work. Prigent data comes in 2005 and 2012 versions. It looks like both Leung and Meier use the 2005 version. Danny's version is "drag partition factor" and modifies it with an expression. So we can use the Meier version of the data and use it for the dust work as well. Danny is verifying that his version and Ronny's version compares well.
- Erik/Danica: How do we communicate to outside groups more about how their PR's can come into CTSM?
- Erik: I'd like to go through the "bug impacts science" list of issues. Can we mark most as "low priority" if they aren't important? Some are more involved and maybe can't be marked as low priority either.
Erik is suggesting using a stream so that we can just have one resolution that gets interpolated at runtime.
Why use the 2005 version? Apparently Danny was advised to use the 2005 version by Katherine.
Dave: let's use the 1/4 degree version for the stream to help support higher-resolution simulations – though we could start with the 1/2 degree and swap it out later.
Erik is thinking of starting with the 1/2 degree and making sure that the interpolation ends up with something similar to the 1 degree file.
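A simple offline sanity check for "does the 1/2 degree data end up similar to the 1 degree file" is to average 2x2 blocks of the half-degree field and compare with the 1 degree field. This is a minimal sketch, not the runtime stream interpolation (which ESMF performs). It assumes aligned regular grids (each 1 degree cell covers exactly four half-degree cells), approximates cell area with cos(latitude) weights, and uses an arithmetic mean, which may not be the right aggregation for every field.

```python
import math


def coarsen_2x2(field, lats):
    """Area-weighted average of 2x2 blocks of a regular half-degree grid.

    field -- 2-D list [nlat][nlon] on a 0.5-degree grid (nlat, nlon even)
    lats  -- cell-center latitude (degrees) of each row, for cos(lat) weights
    Returns the coarsened field [nlat // 2][nlon // 2].
    """
    nlat, nlon = len(field), len(field[0])
    out = []
    for j in range(0, nlat, 2):
        # Rows at different latitudes get different area weights.
        w0 = math.cos(math.radians(lats[j]))
        w1 = math.cos(math.radians(lats[j + 1]))
        row = []
        for i in range(0, nlon, 2):
            num = (w0 * (field[j][i] + field[j][i + 1])
                   + w1 * (field[j + 1][i] + field[j + 1][i + 1]))
            row.append(num / (2.0 * (w0 + w1)))
        out.append(row)
    return out


def max_abs_diff(a, b):
    """Largest absolute difference between two equally shaped 2-D lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```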
- Erik/Negin: Go over py_env_create usage and testing
- Adrianna: Bug in FATES SP compset with latest API update?
- Erik: Dust work. Prigent data comes in 2005 and 2012 versions. It looks like both Leung and Meier use the 2005 version. Danny's version is "drag partition factor" and must be something different from momentum roughness for Meier work.
- Erik/Danica: How do we communicate to outside groups more about how their PR's can come into CTSM?
- Erik: I'd like to go through the "bug impacts science" list of issues. Can we mark most as "low priority" if they aren't important? Some are more involved and maybe can't be marked as low priority either.
We now have two possible approaches working:
- Negin has a script based on a script from Sean to subset a mesh file
- Sam has an approach based on an nco command to create a mesh from a dataset
Minor API update last week (PR 1515)
Adrianna: it seems like some of the hist_fexcls for FATES SP are wrong with the latest update. (This is related to the usermods that are picked up by the compset.)
Probably the tests currently override the usermods, which is why this wasn't detected. We should probably change the testmods to use the usermods directly so that we're actually testing what a user would get.
Erik has changed the name of this to py_env_create.
This is set up to list exact versions of a number of things.
- python itself: conda does manage this, so it can pull down a specific version. There is a bug on CGD systems that forces us to use 3.7.0, but if you have installed your own version of conda on CGD systems, then you can use something else.
- xarray is changing frequently, so it's helpful to list a specific version
- pylint gives different errors for different versions, so it's important that we use a stable version
- black: in principle version could matter, though Erik sees that, for our code, the latest version of black doesn't change anything.
Once you install it, you would just activate that environment.
Potentially there would be a different environment for different versions of CTSM, so need to make sure you have the right environment loaded for the version of CTSM you're using. So you should rerun py_env_create after updating your CTSM version.
- Jackie: but sometimes updating your environment can be really slow... can we avoid needing to rerun it all the time?
- It could also be painful to need to change your environment for each version, because you're often going back and forth between a few different versions
- So maybe we don't expect users to update all the time... and maybe that's actually overkill given that the packages we depend on probably don't introduce breaking changes too often.
- We could have a section in the ChangeLog that indicates whether you need to update your ctsm python environment for a given tag.
- Erik does note, though, that running the script is very fast if all of the packages are already installed.
On cheyenne, you need to module unload python and load conda. izumi has an overall similar process, but need to module load lang/python and use the cgd-specific file.
Will points out that there is a pre-existing ctsm_py (started by Joe Hammon), so we should consider using a different name.
- Adrianna suggests ctsm_pylib
Bill: Longer-term, thinking about how this might fit into a broader CESM python ecosystem – e.g., running tools across multiple components, where presumably a user wouldn't want separate environments for each tool. Would it work, for example, to specify a minimum and maximum version and then have automated testing to ensure that the min and max versions work with our code – so that we could hope to fit in with other components in terms of the requirements for different versions.
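Bill's min/max idea could be checked automatically with something like the following. This is a minimal sketch: the package names and version ranges in the usage are hypothetical, not the pins in py_env_create, and the simplified version parser only reads leading numeric components (real tooling would more likely use the `packaging` library).

```python
def parse_version(text, width=3):
    """Turn '2022.10.0' into (2022, 10, 0), padding to `width` components.

    Stops at the first non-numeric suffix (e.g. '3rc1' -> 3); a simplification.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break  # stop at a pre-release or other suffix
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def in_range(installed, minimum, maximum):
    """True if minimum <= installed <= maximum (inclusive bounds)."""
    return parse_version(minimum) <= parse_version(installed) <= parse_version(maximum)


def check_env(installed, requirements):
    """Return a sorted list of problems.

    installed    -- dict package -> installed version string
    requirements -- dict package -> (min_version, max_version)
    """
    problems = []
    for name in sorted(requirements):
        low, high = requirements[name]
        if name not in installed:
            problems.append(f"{name}: not installed")
        elif not in_range(installed[name], low, high):
            problems.append(f"{name}: {installed[name]} not in [{low}, {high}]")
    return problems
```

Testing the code against both ends of each range, as Bill suggests, would then amount to building two environments (one at each bound) and running the same test suite in each.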
- First CN Matrix tag came in!
- LILAC is working again!
- Erik: The veg-type "IVT" vector and PFT types in CTSM is wrong for FATES, to get it correct we'd need to use what's on the FATES parameter file. Is this what we should do? This is needed for MEGAN/DryDep as put together now.
- Erik: For FATES-SP mode both generic crops should map to a C3 grass, so that FATES and the CTSM parameter file agree. Currently the last generic crop maps to C4.
- Erik: What is the status of PR #1647? Is someone working on this?
- Erik: What is the status of PR #1249? Is this still something we want?
- Erik: Created a DUST project. Can I add Danny Leung to ESCOMP and CTSM so he can triage the board? Otherwise, I need to move to Google Docs.
- Erik: manage_python_env, what should we change the name to? Should we go over how this works?
- Bill: brief AgSys update
- Bill: brief hillslope update
- Erik: For DUST we need the surface roughness data from Catherine Prigent that's part of the surface roughness PR. We had talked about not doing this part, as it involves bringing in new datasets, either into CTSM as a stream or on the surface dataset. Because of its importance for dust, we do need to incorporate it.
- Erik: I'd like to go through the "bug impacts science" list of issues. Can we mark most as "low priority" if they aren't important? Some are more involved and maybe can't be marked as low priority either.
Will have an update coming in with some history field fixes, and added a FATES land fraction. Also includes some bug fixes from Greg, and externals update to support an LBL machine.
Testing a PR that has potential answer changes with grasses. Jackie is testing to see how big the impacts are.
MEGAN:
- Thanks to Erik for doing work related to this.
- Thinking of a FATES parameter that gives mapping from FATES PFTs to MEGAN PFTs (so not tied to one default parameter file in FATES).
- One thing Erik realized is that the FATES parameter file overrides the CTSM parameter file in terms of PFT names, etc. This is probably good, but may be confusing to some.
- One thing you want to do with names in the parameter file is check to make sure they agree with expectations.
- Erik notes: The MEGAN file has a list of the 78 CTSM PFTs, but groups them into 6 classes. But this is based on the CTSM PFTs, which is a problem when running with FATES. Jackie would like this list of 6 to be expanded to more. Will discuss this with Louisa tomorrow.
Erik: ivt is incorrect when running with FATES.
- Ryan feels like it fundamentally can't be right with FATES.
- Erik sees a way this could work with a connection to the FATES parameter file. This could work in SP mode. It wouldn't work in general when a patch is a mix of PFTs.
- Erik suggests setting ivt to negative numbers with FATES, so that it aborts if you try to use it. As far as he can tell, the only time it's being used with FATES is with drydep and MEGAN.
Erik: FATES-SP is incorrect right now in its treatment of the generic crop: both generic crops are C3 on the CTSM side, whereas FATES treats one as C4. It seems like the correct thing to do would be to treat all of the generic crop as a C3 grass.
Rename suggestion: py_env_create
Bill asks about this approach vs. an approach in which the user defines their own grid and uses nco commands (to create a SCRIP grid file) together with the ESMF tool to convert a SCRIP file to mesh.
Adrianna points out that there is real value in the regional subsetting approach in that the user doesn't need to make as many decisions. Will echoes this.
So we'll likely have two approaches for regional runs:
- The subsetting approach for people who don't have a specific resolution in mind, but just want to run over a region clipped from an arbitrary global dataset
- An approach that lends itself better to running over a specific regional grid
We need to bring this in somehow. Question is whether to bring it in on the surface dataset or as a stream.
There is an open question about what regridding means here. Previous discussions have highlighted that results are very sensitive to the resolution.
- Mariana/Bill/Erik: Don't remake init files if they already exist.
Mariana is about to submit a PR that changes logic for an initial run. Currently, we always interpolate initial conditions. This can take a long time (13 min for ne30, 45 min for 7.5 km). Knowledgeable users can reuse this interpolated file after the first run of the initial case, but most people don't know to do this. (There is a similar issue for the creation of the land fraction in initialization – though that's a memory issue more than a time issue.)
Mariana has introduced a new directory in your run directory (init_generated_files). These two files – finidat_interp_dest.nc and the land fraction file (which is new in this PR) – are put in that directory when they are created. Then, if they already exist, they are reused rather than being regenerated. (Technically, it looks for a status file that flags successful creation of each file.)
For tests, this init_generated_files directory is removed before running, so that rerunning a test will do the same thing the second time.
There are occasional, rare cases when a user would need to manually remove the init_generated_files directory – e.g., when changing things about N dominant landunits or PFTs.
Bill: we may want a test that exercises this and ensures things are bit-for-bit.
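The reuse logic described above can be sketched in Python. This is a minimal illustration of the pattern (reuse a generated file only if its status file exists, otherwise regenerate it), not Mariana's implementation; the `.done` status-file suffix and the `create` callback are hypothetical.

```python
import os


def get_or_create(gen_dir, filename, create):
    """Return (path, created) for filename in gen_dir, creating it only if needed.

    A file is reused only if its status file exists; the status file is written
    after `create` returns, so a crash mid-creation forces regeneration.
    The '.done' suffix is a hypothetical naming convention.
    create -- callable that writes the file at the given path
    """
    os.makedirs(gen_dir, exist_ok=True)
    path = os.path.join(gen_dir, filename)
    status = path + ".done"
    if os.path.exists(status) and os.path.exists(path):
        return path, False  # reused from an earlier run
    create(path)
    with open(status, "w") as f:
        f.write("ok\n")
    return path, True  # newly created
```

Deleting the directory before a run, as the tests do, makes every call regenerate, which is also what a user would do in the rare cases where reuse is wrong.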
Have incorporated LULC changes for ELM. Will require an API change (though no impact on the CTSM side).
Summary of a recent discussion:
- Short-term: FATES needs a way to do something for crop areas so that there is at least LAI that exists over crop areas. (There are a few different ways that could be done.)
- Medium-term: There are a few different ways for handling this – e.g., the collapse capability, or using FATES's ability for mapping PFTs. Would like to map out the different options, considering the complexity of balancing FATES, CTSM and ELM.
- Long-term: When FATES becomes the standard way of running the model, FATES needs some way of handling crop. (Although another option would be to allow the host land model to continue to handle crops – though that has some undesirable aspects in terms of crop using different biogeophysics and biogeochemistry for crops than for natural vegetation.)
More on crops:
- We're tentatively thinking that we should put our focus on APSIM and FATES (rather than the existing crop model)
- A first step is adding flexibility for FATES to either handle the crop landunit or not. (Ryan feels like this wouldn't be too hard to first order, but dynamic landunits may be hard.)
- For generic crops, there should at a minimum be two rather than one generic crops: C3 and C4.
- We should have a bigger meeting on crops with a broader group.
- CESM workshop is done
- ctsm5.1.dev099 has a nice advancement for FATES: memory is only allocated for the patches FATES needs
- Erik: I want to propose that DryDep and MEGAN only be invoked for FATES-SP. We should be able to get that working soon, but normal FATES patches don't lend themselves to how DryDep and MEGAN use the veg-type in the patch to determine behavior. I also propose that Patch%itype should be a special value for normal FATES natural veg (maybe negative so that subscript checking will show an error if it's used).
- Erik: Do we want FATES and collapse options to work together? We don't test, but most should work. Should we keep that attitude, of expect it to work, but don't test? It does seem like possibly collapse options might be useful with FATES-SP mode.
- Erik: Both FATES and CTSM have options to collapse patches/PFTs. Should we do this on the CTSM side or the FATES side, for example to collapse crops?
- Erik: I put together a github action to run black on python code. This sort of thing can be used for various other things as well. It dovetails with the pre-commit hook that Negin is working on. It'll run even if you don't opt into pre-commit hooks. Let me show how it will work.
- Erik/Negin: Proposed sequence to bring in black PR's.
- Black tells you how to run black on saves in your editor (for a particular list of editors). Should we all learn about this?
- Black reformat commits should go into the .git-blame-ignore-revs file. Using it is something you have to opt into for git.
- After that point we make sure python code is black clean when reviews are asked for.
- Erik: What did we learn from the CESM meeting that we need to act on (short or long term)?
Now, FATES dictates number of patches allocated, rather than going with the CTSM default. This will save memory in typical cases, and give flexibility to run with a larger number of patches if desired.
Also, overhauled the naming convention of namelist parameters. Namelist parameters now have a prefix, so you can see which parameters relate to a particular aspect of the model.
Erik's proposal: DryDep and MEGAN only be invoked for FATES-SP. Ryan's PR got this working, though we want to confirm that it's working correctly.
Problem: FATES patches don't lend themselves to how DryDep and MEGAN use the patch type. Erik suggests setting patch%itype to a negative value for standard FATES runs so that debug tests will fail if you try to use it.
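To illustrate why a negative sentinel helps, here is a minimal Python sketch, assuming a hypothetical constant FATES_NATVEG_ITYPE and a hypothetical lookup helper pft_param (neither exists in CTSM). Python lists silently wrap negative indices, so the sketch checks bounds explicitly; in the Fortran model the equivalent failure would come from compiler subscript checking in debug builds.

```python
# Hypothetical sentinel: standard FATES patches get a negative type so any
# attempt to use it as a PFT index fails loudly instead of picking a PFT.
FATES_NATVEG_ITYPE = -1


def pft_param(values, itype):
    """Look up a per-PFT parameter, refusing out-of-range indices.

    values: per-PFT parameter values, indexed from 0.
    itype: the patch type used as the index.
    The explicit check stands in for Fortran bounds checking, since a plain
    Python values[-1] would quietly return the last PFT's value.
    """
    if not 0 <= itype < len(values):
        raise IndexError(f"invalid patch itype {itype} for PFT lookup")
    return values[itype]
```

A regular vegetated patch passes a valid index; a FATES patch carrying the sentinel raises immediately, which is the behavior Erik wants debug tests to catch.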
For SP and nocomp, with each PFT on its own patch, you don't need to make any choices about how to summarize the different PFTs within a patch. But for FATES with competition, you do need to figure out how to aggregate things for different PFTs.
Note also that MEGAN currently calculates its own PAR... ideally it would get PAR from the model.
Ideally we would refactor MEGAN so that both FATES and non-FATES could call the same subroutine.
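One way to picture a shared MEGAN routine is as a per-patch weighted sum over the PFTs a patch contains. The Python sketch below is hypothetical (not MEGAN's actual formulation), assuming a single scalar activity factor and made-up emission factors. One-PFT patches (non-FATES, FATES-SP, nocomp) pass a single weight of 1, while FATES with competition would pass several weights. How to choose those weights is exactly the open aggregation question.

```python
from typing import Mapping


def patch_emission(pft_weights: Mapping[int, float],
                   emission_factor: Mapping[int, float],
                   activity: float) -> float:
    """Emission for one patch as a weighted sum over the PFTs it contains.

    pft_weights: fraction of the patch attributed to each PFT; assumed to
        sum to 1. The choice of weighting (canopy area, leaf area, ...) is
        an assumption left to the caller.
    emission_factor: hypothetical per-PFT emission factors.
    activity: a single hypothetical environmental activity factor
        (standing in for PAR, temperature, and similar dependencies).
    """
    total = sum(pft_weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"PFT weights sum to {total}, expected 1")
    return activity * sum(w * emission_factor[p] for p, w in pft_weights.items())
```

With this shape, non-FATES and FATES callers differ only in the weights they pass, not in which subroutine they call.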
Is it acceptable to defer getting this working in FATES-comp mode? General feeling is yes.
There is a challenge with changing PFT definitions when coupled with CAM, since CAM uses an emissions file that is tied in with this.
We want to make MEGAN folks aware of the challenge – and opportunity – related to coupling with FATES. Also the issue of where PAR is calculated.
Do we want FATES and collapse options to work together?
FATES can't work with the dominant PFT option, but it should work with everything else.
We won't worry about getting FATES working with the dominant PFT option for now.
One possibility is that Kaveh follows along with Erik's review of the upcoming excess ice PR.
Also taking on a small tag.
Erik has added a black check via a GitHub action.
This dovetails with a pre-commit hook that Negin added – which is opt-in – which will actually run black on the code (among other things).
Running black in your editor: Let's get some experience with this vs. using the pre-commit hook so we can share experiences with which has the least friction.
If you do a commit where you have just run black, then add it to the .git-blame-ignore-revs file.
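A small helper could make this step less error-prone. The sketch below is hypothetical (the function add_ignore_rev is not an existing CTSM tool). It relies on git's format for this file, one full commit hash per line with "#" lines treated as comments, and records the commit message as a comment. Each developer still opts in once with `git config blame.ignoreRevsFile .git-blame-ignore-revs`.

```python
import re
from pathlib import Path

# Full, unabbreviated SHA-1 commit hash.
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def add_ignore_rev(sha, message, path=".git-blame-ignore-revs"):
    """Append a formatting-only commit to the blame-ignore file.

    Writes the commit message as a comment line followed by the hash.
    Returns False (and writes nothing) if the hash is already listed.
    """
    sha = sha.strip().lower()
    if not SHA_RE.match(sha):
        raise ValueError(f"expected a full 40-character commit hash, got {sha!r}")
    ignore_file = Path(path)
    text = ignore_file.read_text() if ignore_file.exists() else ""
    if sha in text.splitlines():
        return False
    # Make sure a file without a trailing newline doesn't get its last line merged.
    prefix = "" if not text or text.endswith("\n") else "\n"
    with ignore_file.open("a") as f:
        f.write(f"{prefix}# {message}\n{sha}\n")
    return True
```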
Erik's proposal (Bill agrees) is that the pre-commit hooks just be run under the python directory for now.
We now have some out-of-date information in the wiki and User's Guide, especially related to running single-point and regional cases.
Given the ease of creating tutorial notebooks, Will is somewhat inclined to use them for this kind of documentation moving forward.
However, there is potentially a distinction between a tutorial aimed at first-time users vs. more of a reference manual.
Maybe we should have a hackathon to work together to update the documentation, sometime before the next release.
See https://documentation.divio.com/ for some useful information on writing good documentation.
Jackie suggests:
- Making documentation searchable
- An indication of when documentation was last updated (ideally for each page; would there be a way to get this automatically from Sphinx?)
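On the Sphinx question: Sphinx's html_last_updated_fmt option adds a last-updated timestamp to every page, which could be a starting point in the docs' conf.py. Note that the date it records is when the docs were built, not when each page last changed. Per-page dates would need something that looks at git history for each source file, and whether the timestamp is shown depends on the HTML theme in use.

```python
# conf.py (fragment)
# When set, Sphinx adds a last-updated timestamp to each page using this
# strftime() format. The date is the build time, the same for every page.
html_last_updated_fmt = "%Y-%m-%d"
```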
- Excess ice PR coming in, Sean's taken a first look and it seems pretty good.
- Erik juggles lots of tasks with a smile and laugh.
- Ryan, Marcos, & Greg have integrated a bunch of things into several PRs (from #800) that include a number of improvements and a bug fix.
- CTSM PR #1766 addresses the patch count issue and the parameter file format, and should be merged shortly (pending Erik's review). This PR will have impacts on FATES users.
- Erik noted that currently FATES won't run with generic crop land units, except for FATES-SP mode.
- Ryan suggested that we just need to create the mapping between generic crops and the FATES pfts.
- Greg noted that Rosie has a work-in-progress PR to facilitate this (#817).
- Currently FATES-SP with generic crops still won't be on their own land units. Dave noted that this is something we'll want (both for running with SP and BGC crops on their own land unit). Ryan thinks this should be doable, but Erik seems less sure...
- Long term, it would be nice to decide if you're using FATES or BGC-CROP on any land unit. This will be a task for Erik soon.
- Erik said we'd want to use the 78-pft surface dataset, as we're doing for CTSM-SP, and doesn't want to maintain the 16-pft surface dataset in CTSM5.2. This would require either collapsing all the crops on the CFT array into one FATES crop functional type, or handling this mapping in the binary file that maps between the surface dataset and the FATES parameter file.
- Ryan will bring this up at the next FATES-SE meeting (in two weeks).
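The collapsing option amounts to summing crop percentages by their target FATES PFT. Below is a minimal sketch, assuming a hypothetical mapping dictionary and made-up CFT indices. Mapping every CFT to one index gives the single generic crop. Mapping to two indices would give, for example, the C3/C4 split suggested for generic crops.

```python
from collections import defaultdict


def collapse_cfts(pct_cft, cft_to_fates):
    """Collapse crop functional types into (fewer) FATES crop PFTs.

    pct_cft: {cft_index: percent of the crop landunit}, summing to 100.
    cft_to_fates: hypothetical {cft_index: fates_pft_index} mapping.
    Returns {fates_pft_index: percent of the crop landunit}.
    """
    collapsed = defaultdict(float)
    for cft, pct in pct_cft.items():
        collapsed[cft_to_fates[cft]] += pct
    return dict(collapsed)
```

Doing this on the surface dataset side versus in the surface-dataset-to-parameter-file mapping gives the same percentages. The difference is only where the mapping table lives.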
- #1515: Greg is working on cleaning up some more stuff...
- Adrianna asked about setting bounds on parameter file values, within which the model won't crash.
- Dave noted that the PPE spreadsheet sets these limits for users, which may be a lower cost mechanism to accomplish this same thing.
- Ryan suggested that the modify_FATES_params.py script could be modified pretty easily to provide these range checks, by adding min/max data to the metadata attributes on the parameter file.
- These are potentially two issues: one involves absolute ranges for some parameters (e.g., 0-1) that could be stored as ranges on the parameter file; the other involves imposed 'fuzzy' ranges based on information in the PPE spreadsheet. Adrianna's question for FATES seems to be in this second category, and is a longer-term goal (maybe sooner for FATES).
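A range check of the kind Ryan describes could be as small as the sketch below. It assumes the range is stored in attributes named valid_min / valid_max (the CF convention names, an assumption about how the parameter file would store them). The attributes are passed as a plain dict here so the check itself doesn't depend on a netCDF library, and the parameter name in the usage is hypothetical.

```python
def check_param_range(name, values, attrs):
    """Raise ValueError if any value of a parameter falls outside its declared range.

    attrs: the metadata attributes of one variable on the parameter file.
    Parameters without valid_min/valid_max are treated as unbounded.
    NaN values always fail, since they compare False against any bound.
    """
    lo = attrs.get("valid_min", float("-inf"))
    hi = attrs.get("valid_max", float("inf"))
    bad = [v for v in values if not lo <= v <= hi]
    if bad:
        raise ValueError(f"{name}: values {bad} outside [{lo}, {hi}]")
```

With the netCDF4 library, attrs could be built as `{k: var.getncattr(k) for k in var.ncattrs()}`. The absolute (0-1 style) ranges fit this hard check. The 'fuzzy' PPE ranges might be better reported as warnings than as errors.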
Negin suggested that we hold off all topics related to Black until Bill gets back.
- Erik: I put together a github action to run black on python code. This sort of thing can be used for various other things as well. It dovetails with the pre-commit hook that Negin is working on. It'll run even if you don't opt into pre-commit hooks. Let me show how it will work.
- Erik/Negin: Proposed sequence to start running black for supported python:
- Erik brings in a run of black as part of his tag, along with the GitHub action for the black check.
- Negin adds the black pre-commit hook in her regional tag.
- Black tells you how to run black on saves in your editor (for a particular list of editors). Should we all learn about this?
- Black reformat commits should go into the .git-blame-ignore-revs file. Using it is something you have to opt into for git.
- After that point we make sure python code is black clean when reviews are asked for.
- CTSM Tutorial last week! Jupyter notebooks and the AWS cloud were amazing. Thanks: Will, Danica, Adrianna, Negin, Brian, and everyone else that helped.
- Ryan and Erik working on a big PR that includes several issues (#1766)
- FATES patch count issue.
- FATES-SP adjustments have been made.
- Parameter files updates are still in progress, but a bunch of renaming comes in with #1766.
- LULCC changes require an API change (including passing product pools back to HLM). Changes are needed on CTSM side to point to the right API.
- There are special considerations for SP and NoComp mode, where FATES gets information from the HLM surface dataset.
- PR isn't passing BFB CTSM tests.
- These differences may be expected, but Ryan is planning to split this PR to isolate the issues.
- Keith: Status of anomaly forcing update? Several requests for this are coming to different people, and there are several issues about it already.
- Currently the data are in Keith's directory (not in the input data directory); this should be fixed.
- Erik is going to work on the nuopc issue
- Sean asked if Negin could bring his python script into 'official' tools for CTSM.
- Keith: How to proceed on surface roughness testing
- Migrating to CRU-JRA: should we set the forcing height (10 m)? Can we use the LW forcing from the new dataset?
- set zeta max to 2 globally.
- Will suggested we meet next week to discuss further.
- Adrianna is working on bringing Ronnie's roughness length approach into FATES too.
- Keith: Effects of CAM reordering on CTSM shortwave/albedo calculations.
- Skip this for now, but we need to make sure the CAM radiation calculations are synced up correctly with the CLM albedo (so far Keith doesn't see any issues).
- Negin: Show us more about the pre-commit hook.
- PR for using black to correct python scripts
- Pre-commit hooks run before git commit and automatically format python code with black.
- Does the whole repository need to be cleaned up for everyone?
- Erik suggested we get our feet wet with this (e.g. the python directory) before expanding to additional functionality in .f90 code.
- Erik asked if we needed to re-run testing after formatting is changed with pre-commit hooks.
- Interest in expanding to fortran code, but Erik's cautious too.
- Erik: There's a PR that Jim needs to work with SCAM.
- Erik will include with some BFB tags
- Erik: Anything that needs to happen before the CESM workshop?
- Will would appreciate clarification of our plan to articulate to the LMWG
- Erik: I started a prototype for a python manager tool. I'd like to do some more work with Negin on getting this in a more final form.
- Erik: How is the tutorial coming?
- Erik/Negin: We plan to put together some slides on the SEA ISS conference and focus on what we learned that could help the team here. We should probably present it after the workshop, since the tutorial is next and then the workshop shortly after that.
- Erik: Is MIMICS the only reason that going beyond 4-digit years would be useful for CTSM? It was something needed for TG compsets for CISM, and I suspect useful for Paleo work, but CESM is not clamoring for it.
Will: we'd really like to have a better spinup, so we don't need to spin up for > 10k years. So it's not worth working on extending things to support such long runs.
Erik suggests adding a stream for forcing height. This way it differs for different forcing streams, which feels like the right way to do it.
For the future, it might be clearer to users (and more self-documenting) if it's on one of the existing files. Either way could be fine.
- Thanks, Erik, for the tag just before Cheyenne went down that fixed a number of issues
- We have izumi to run on this week!
- Erik: I'd like to talk to the team about conda environments. Should this be a subgroup discussion? I added a file under the python directory that can be used for a working conda environment. Is this the way we want it to work? How do we envision having conda environments set up for CTSM? I also realize that we will need to update the python environment on occasion and that will mean updating the python code.
- Erik: The FATES-SP issue reminds me of #942 and more work we should do on designing how build-namelist and XML settings should work. How they work now, and what we should work towards. This also requires having a small group discussion on how things currently work and the different options (we have several ways to do the same thing). I proposed doing that before, but we decided to delay it. I think I should schedule a meeting with a subgroup: Bill, Erik, Negin, Greg, Ryan?, Adrianna? who else? We want to start with how things currently work, and the pros/cons of each option and then get a feel for what we are working toward.
- Erik: Improving Scientific Software Conference takeaways. I'd like to have our team do the RateYourProject survey on our practices. My rating showed that "planning" is what we are short in. If we all take it, we can see what we agree we are short in and then make steps at improving that part. Any other takeaways from the conference on how we can improve? Takeaways for Erik:
- gcov for coverage of our code in testing
- Scientific software quality levels
- Project Tracking Cards for process improvement (PSIP) https://betterscientificsoftware.github.io/PSIP-Tools/PSIP-Overview.html
- More we can do with github actions? CI for CTSM?
- Using GPUs for mizuRoute? (work is proportional to the number of OpenMP directives, which is small)
- MPI interface improvements may remove our need for wrappers to MPI code
Erik has started with a conda environment... may have a discussion of how exactly to manage this.
Which netcdf library should we use for things like modifying the param file: scipy or netcdf4? scipy seems like it may be more ubiquitous, but netcdf4 is more feature-rich (including supporting netcdf4-formatted files). Let's keep talking about this; it would be ideal to settle on one for the use case of modifying the param file for FATES and CTSM.
Negin notes that she has been using xarray or netcdf4 for the python tools.
Original question from Erik: The FATES-SP issue reminds me of #942 and more work we should do on designing how build-namelist and XML settings should work. How they work now, and what we should work towards. This also requires having a small group discussion on how things currently work and the different options (we have several ways to do the same thing). I proposed doing that before, but we decided to delay it. I think I should schedule a meeting with a subgroup: Bill, Erik, Negin, Greg, Ryan?, Adrianna? who else? We want to start with how things currently work, and the pros/cons of each option and then get a feel for what we are working toward.
Bill feels the decision hinges partly on what users feel would be the most usable way to tweak settings after you have set up a compset. Bill's feeling has been that, in general, we should aim to have individual xml variables that are set by the compset (and can later be adjusted by users), moving away from the catch-all CLM_BUILDNML_OPTS. Others agree with that.
We'll start with a proposal that we can discuss in the software group. Then can discuss it with the scientists.
Greg: For FATES, they want to do seed dispersal. They've been looking at the prognostic beetle code for how the communication was done there.
This method is inefficient, but probably sufficient for once-a-year communication, which would be the case for seed dispersal as well as beetles.
Greg sees some potential to save some time by storing some information about nearest neighbors in initialization instead of recalculating it each time.
Other potential uses of inter-grid cell communication are fire and lateral water flow. Those are different in that they would need more frequent communication.
To keep this general for the types of grids we may want in the future, we should design this to work with unstructured grids. May be able to use the connectivity information on the mesh file to facilitate this.
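Greg's idea of storing nearest neighbors at initialization, combined with using mesh connectivity, could look like the Python sketch below. It assumes element-to-node connectivity of the kind an unstructured mesh file carries, treats two elements as neighbors when they share an edge (whether corner-sharing cells should also count is a design choice), and ignores the fact that in a parallel run some neighbors live on other MPI tasks. The function name is hypothetical.

```python
from collections import defaultdict


def build_neighbors(element_nodes):
    """Precompute edge-sharing neighbors for each element of an unstructured mesh.

    element_nodes: entry i lists the node ids of element i in order around
    its boundary. Works for any polygon shape, so it is not tied to a
    regular lat/lon grid. Intended to run once at initialization, so that
    infrequent communication (e.g., yearly seed dispersal) only has to look
    up the stored lists instead of recomputing them.
    """
    edge_owners = defaultdict(list)
    for elem, nodes in enumerate(element_nodes):
        # Pair each node with the next one, wrapping around to close the polygon.
        for a, b in zip(nodes, nodes[1:] + nodes[:1]):
            edge_owners[frozenset((a, b))].append(elem)
    neighbors = [set() for _ in element_nodes]
    for owners in edge_owners.values():
        for e in owners:
            neighbors[e].update(o for o in owners if o != e)
    return [sorted(n) for n in neighbors]
```

For a 2x2 block of quadrilaterals, each cell gets its two edge-sharing neighbors and not the diagonal one.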
- FATES-MIMICS is working and with threading!
- Nutrient Refactor
- Couple more ctsm5.2 alpha tags have happened, the build is pretty robust for cime machines
- Bob Oehmke is going to work on getting ESMF regridding to work with HRU grids like for mizuRoute
- Erik: Unfortunately our list for removing support for MCT is the longest in CESM. The recommendation is still for it to stay for one or two CESM beta tags. There was thought to be a problem with MARBL, but that seems to have been resolved.
- Erik: The FATES-SP issue reminds me of #942 and more work we should do on designing how build-namelist and XML settings should work. How they work now, and what we should work towards. This also requires having a small group discussion on how things currently work and the different options (we have several ways to do the same thing). I proposed doing that before, but we decided to delay it. I think I should schedule a meeting with a subgroup: Bill, Erik, Negin, Greg, Ryan?, Adrianna? who else? We want to start with how things currently work, and the pros/cons of each option and then get a feel for what we are working toward.
- Will/Erik: As part of this meeting we should advise Will on what he should tell CTSM scientists as part of the CTSM science meeting. There might not be anything important each week, but likely something roughly monthly.
- Erik/Bill: Software priorities for CESM3. We especially need to add our thoughts about land diagnostics. https://docs.google.com/document/d/1LRDC9Un3faqZtPJLhejjN5MFgzs-EvlSR_ns66YXJ-M
- Erik: Improving Scientific Software Conference takeaways. I'd like to have our team do the RateYourProject survey on our practices. My rating showed that "planning" is what we are short in. If we all take it, we can see what we agree we are short in and then make steps at improving that part. Any other takeaways from the conference on how we can improve? Takeaways for Erik:
- gcov for coverage of our code in testing
- Scientific software quality levels
- Project Tracking Cards for process improvement (PSIP) https://betterscientificsoftware.github.io/PSIP-Tools/PSIP-Overview.html
- More we can do with github actions? CI for CTSM?
- Using GPUs for mizuRoute? (work is proportional to the number of OpenMP directives, which is small)
- MPI interface improvements may remove our need for wrappers to MPI code
- Erik: The SourceMod workflow that we show causes problems in being able to bring work back into the model. Examples of this include Danny's and LongLei's work on Dust, as well as the SIF work. I wonder if we should give more instruction on how to set up a workflow using git that would allow work to be moved over more easily. We should take this into account for the CTSM tutorial as well.
See https://github.com/orgs/ESCOMP/projects/2/views/12
Erik was working with Ufuk's tool to convert a domain file to a mesh file. Had to fix some issues. Also some issues with establishing the right python environment – maybe because of Dask requirement.
- Bill: there may be some broader work by CSEG and/or ESMF to have a supported tool. So we should check in on the plans for that before making our own supported long-term tool.
Negin: new mesh subsetting takes a few minutes. Should we use Dask to speed things up?
- Feeling is that it's better to keep things simple, even if it takes a few minutes.
Dave: it would be useful to have a tutorial on some realistic science project workflows. For example, incrementally developing along a branch and running cases as you go... how would you manage that with git?
- mksurfdata_esmf merged to ctsm5.2 branch
- Greg is doing his final presentation for his grad school class.
- Sounds like Jim has a more robust build for mksurfdata_esmf
- Erik has a go at turning off soil BGC for fates-SP
- Sounds like work on the tutorial is coming along nicely
- What else?
- Erik: Would like to have some regular structures to make sure we are hearing about progress in FATES. I think a regular "What's up with FATES?" question each week would be helpful. And on a roughly monthly basis we should go over what's upcoming for FATES-v1 to happen. FATES people are very tied into what CTSM is doing; I feel like we could use a little more in the other direction. Are there FATES meetings that Erik should attend more often? One example of something that has fallen through the cracks is that it's on me to get FATES single point simulations to work with crop datasets, and I haven't gotten back to that. Regular accountability helps with making sure things happen.
- Jackie: In the spirit of above, update us on what's up with FATES
- Erik: Should we add back the ability to bypass the 1km ELEV dataset handling in mksurfdata_esmf?
- Bill: ozone updates; stream vs. surface dataset
- Keith: We would like to run a fully coupled future scenario simulation (SSP585) with dynamic urban. The release-cesm2.2.0 doesn't support the BSSP585 compset and release-cesm2.1.3 doesn't support CTSM. Is there an alpha/beta tag that is tested that we could use and then update the CTSM external?
- Bill / Keith: Plan for https://github.com/ESCOMP/CTSM/issues/1716 : should it be a priority for the ctsm5.2 surface datasets, and who should work on it?
- If it's a priority, then we should have a follow-up meeting with the relevant people to chart a more detailed path forward.
MIMICS coming in: very close.
There is an issue about land cover / land use change; this is really just about finalizing things on the ELM side.
Ryan has been cleaning up logging.
Longer-term:
- Work on FATES-SP (lots of discussion on that)
- Upcoming meeting on land cover / land use change
- Ryan will be making recommendations for linking nutrients (which has been done on the ELM side, but needs to be done on the CTSM side). Ryan is going to take a couple of days to investigate then will report back.
Ryan is working on combining a bunch of parameter changes so they can do a batch of changes to the parameter file. This will require an API update – but will be non-answer changing.
Jackie suggests putting something on the project board for changes that will require an API update.
Dave has filed an issue on the relative costs of FATES vs. non-FATES.
If you want to use a more recent version, you can go into your FATES directory and do a git checkout of master.
For API changes, there will be a coordinated CTSM tag. Without API changes, you can update the FATES version safely.
They also keep a table of compatibility: https://github.com/NGEET/fates/wiki/Table-of-FATES-API-and-HLM-STATUS
Should we add back in the capability of bypassing the 1km elevation dataset? The old tool had a flag that set this to a constant.
Bill's feeling: depends on the minimum resource needs of mksurfdata_esmf with and without the 1km file.
Erik has suggested using streams for more things.
Discussion in March, 2019:
- Erik: Suggestion for adding fields to surface datasets. We should only add ones that are used all the time, high resolution, and non-transient. Optional, low-resolution, or transient (beyond yearly) fields should be added as streams. So, for example, Erik suggests soil pH should be its own stream.
- When it's a mix of these characteristics, we need to make a case-by-case call. But Erik feels that, in general, optional fields should be on a stream.
- Dave points out that sometimes a field is tied to another field on the surface dataset (e.g., soil texture).
- Bill: can we come up with some default options and wrapper code so that it's easy to add a time-constant stream like pH, with a few lines of code rather than a page of new code? Erik thinks that could be possible. (For transient things, there truly are more settings that you need to consciously set.)
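Bill's idea of default options plus a wrapper can be pictured as follows. This is a hypothetical Python sketch, not CTSM's Fortran stream code: the option names loosely mirror CDEPS stream settings (taxmode, tintalgo, mapalgo), and the particular default values chosen for a time-constant field are assumptions. The point is that a single-time-slice field like pH needs only its file names and field name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamSpec:
    """Settings for reading one field from a stream file (hypothetical)."""
    name: str
    data_file: str
    mesh_file: str
    field: str
    year_first: int
    year_last: int
    year_align: int
    taxmode: str    # behavior outside the data's time range
    tintalgo: str   # time interpolation method
    mapalgo: str    # spatial mapping method


def time_constant_stream(name, data_file, mesh_file, field, mapalgo="bilinear"):
    """Build a spec for a field with a single time slice, such as soil pH.

    Every time-related setting gets an assumed fixed default, so only the
    files and the field name need to be given. Transient streams would
    still set their years and interpolation explicitly.
    """
    return StreamSpec(name=name, data_file=data_file, mesh_file=mesh_file,
                      field=field, year_first=1, year_last=1, year_align=1,
                      taxmode="cycle", tintalgo="none", mapalgo=mapalgo)
```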
Some reasons not to use streams that we have discussed in the past are:
- PCT fields
- Fields that need special / custom mapping (e.g., some soils data)
Erik suggests only putting things on the surface dataset if it is tied together with other things – like the PCT fields. Erik feels it would be better if other things are not on the surface dataset.
A downside of putting the data on the surface dataset is that, particularly for high resolution, this is wasteful in terms of space.
A theoretical downside of streams is the initialization time; it would be good to get a rough sense of how much initialization time is added for each stream (calculating the regridding weights).
Dave: another nice thing about streams is that you can look in the namelist to see what's used rather than needing to dig into the surface dataset.
General feeling: for things that are interpolated using a standard approach and aren't tied in with other fields (e.g., the PCT fields), let's default to using streams.
- This is at least for moving forward... separate question about whether we'd want to revisit this for existing fields.
What about urban parameters? These have historically accounted for a large portion of the surface dataset size. For now these could be indexed just by region (without having them explicitly spatial)... could make this a stream if streams can handle fields that don't have explicit spatial dimensions.
Erik thinks that we have support for the basic SSP compsets in the latest development code. Will need to figure out appropriate initial conditions. However, this might not be a great idea scientifically (in terms of TOA balance, etc.).
Regarding https://github.com/ESCOMP/CTSM/issues/1716
Bill's initial suggestion was: let's at least get consistency in terms of raw datasets being specified as pct of grid cell area (requires change in urban data set) and fix the mksurfdata_esmf code to regrid these PCT datasets so that they are remapped in a way that is correct for PCT of grid cell area.
But then Bill realized that making this change for urban would make things worse for urban in terms of PCT urban cover in coastal grid cells unless we also do the other changes suggested in this issue.
So to some extent we need to view these changes as a package and should do them all together.
There is uncertainty about the priority of this. On the one hand, it would be nice to just go ahead and fix this so that we can be consistent and stop thinking about it. On the other hand, the effects are probably small, so it's hard to justify this being very high priority.
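The reason the PCT convention matters for remapping can be shown with a coastal example. This is an illustrative Python sketch of the arithmetic, not mksurfdata_esmf's ESMF-based implementation, and the function names are hypothetical. A percent of grid cell area can be area-averaged over the whole destination cell. A percent of land area has to be weighted by land overlap instead, otherwise ocean portions dilute it.

```python
def remap_pct_of_gridcell(src_pct, overlap_area, dst_area):
    """Conservatively remap a field given as percent of *grid cell* area.

    overlap_area[i]: area where source cell i overlaps the destination cell.
    dst_area: total area of the destination cell. Ocean source cells carry
    pct = 0, so they simply contribute nothing.
    """
    return sum(p * a for p, a in zip(src_pct, overlap_area)) / dst_area


def remap_pct_of_land(src_pct, src_landfrac, overlap_area):
    """Remap a field given as percent of *land* area by weighting on land overlap."""
    land = [f * a for f, a in zip(src_landfrac, overlap_area)]
    total_land = sum(land)
    if total_land == 0:
        return 0.0
    return sum(p * l for p, l in zip(src_pct, land)) / total_land
```

For a destination cell half covered by an all-land source cell with 50% urban and half by ocean, the grid-cell convention gives 25% of the grid cell and the land convention gives 50% of the land. These agree only if the destination land fraction (0.5) is applied consistently, which is why the changes in this issue need to be done as a package.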
- mksurfdata_esmf is ready to be turned into a tag!
- Hillslope Hydrology has a PR now!
- Mariana / Bill: Follow-up on timeline for dropping MCT support
- Mariana/Sam: Bring #1663 to ctsm5.2 branch? Is there a reason it needs to come to main-dev now? We have some notes on this for Jan/21/2021. Note this expression from Bill "As a general rule, we should aim to have mksurfdata_map generate datasets that are the same as what's being used out-of-the-box, to avoid accidental answer changes if someone generates their own surface dataset."
- Dave: new mask file for CRUJRA https://github.com/ESCOMP/CTSM/issues/1701#issuecomment-1096102550
- Erik: Naming convention and standards for ctsm5.2 branch. I think I'd like to shorten the names. Propose removing "branch_tags/" part. Shorten to ctsm5.2.mksrf02_ctsm5.1.dev099? https://github.com/ESCOMP/CTSM/wiki/Tag-naming-conventions#mksurfdata-branches-and-tags When making a new mksurf tag should also update to latest ctsm main-dev tag at the same time. Should always run tools testing for these tags. And sometimes should run the mksurfdata Makefile to make sure you can create all surface datasets. I don't think we should manage a ChangeLog for this, just include updates on each tag in the log for it. This will then be assembled for the ChangeLog for the first ctsm5.2 tag that brings the updates to main-dev. Should we send info. on these tags to ctsm-dev? Do it automatically?
- Erik: Currently modify_fsurdat and modify_mesh are tools that each live in their own directory. It seems to me that these could be combined into one directory for "modify_tools" or something to that effect?
- Erik/Dave: FATES-SP timing results.
- Ryan: Reducing log messaging https://github.com/NGEET/fates/pull/792
- Bill: PCT convention on raw datasets; wetland issues
Mariana presented reasons for dropping MCT support relatively soon – probably order of a couple of months.
There are a number of things that we still need to get in place, but feeling is that a couple of months should give us time for this, so we are okay with this time frame.
For subsetting mesh files for regional cases:
- Either Adrianna or Negin will add a capability to do this in subset_data
- Note that Sam has a recent set of changes that changes the land/ocean mask on a mesh file, but that's a different need
Plan is for this not to come to master yet. It will be the first tag on the 5.2 branch.
Erik points out that this is a more important / stable / tested branch than our typical branches, arguing for not having a branch_tags prefix in this case.
However, Bill & Will would like some clear way to distinguish this from our more supported tags that users are expected to use.
So the plan is to prefix these tags with something like mksrf, pre, or alpha.
A big culprit is history averaging.
- Ryan isn't too surprised because this involves looping over all points every time step
- Ryan suggests opting out of these high frequency variables by default
- However, right now they don't have a good way to say that a certain thing doesn't need to be calculated. Ryan would like to add logic to have this.
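The logic Ryan wants, skipping work for diagnostics that no history file uses, could look like the sketch below. It is a hypothetical Python illustration, not FATES or CTSM history code. Diagnostics are passed as callables so that inactive ones are never evaluated, and both the accumulation and the calculation are avoided.

```python
from typing import Callable, Dict, Iterable


class HistoryAccumulator:
    """Running time averages for only the fields active on some history file."""

    def __init__(self, active_fields: Iterable[str]):
        self.active = set(active_fields)
        self.sums: Dict[str, float] = {name: 0.0 for name in self.active}
        self.nsteps = 0

    def accumulate(self, diagnostics: Dict[str, Callable[[], float]]) -> None:
        """Evaluate and add only the diagnostics that are actually output."""
        for name in self.active:
            self.sums[name] += diagnostics[name]()
        self.nsteps += 1

    def averages(self) -> Dict[str, float]:
        """Time averages over the steps accumulated so far."""
        return {name: s / self.nsteps for name, s in self.sums.items()}
```

Opting out of high-frequency variables by default would then mean leaving them out of active_fields unless a user requests them.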
Radiation is also more expensive. Again, Ryan isn't too surprised because of the extra canopy layers that need to be looped over.
- The ISS conference confirmed Bill's work on the test suite. They talked about how testing is a "tax" and that it needs to be set at the right amount. By lowering the tax for standard tags, Bill is helping us to get more tags through.
- The MPI guy at the ISS conference said FORTRAN is a better language than "C" (for MPI because you can overload subroutine calls). I only hear derogatory remarks about FORTRAN so that was good to hear!
- Adrianna suggested a change to create_newcase, that Erik was able to make a quick PR for and get it approved and merged into cime.
- I'm going to call this a win. I was able to close a bunch of PR's and issues that were superseded by mksurfdata_esmf. We did put effort into things that won't be coming in. But, we did learn a lot from that effort that was important to take us to where we are now. Some of the thinking and analysis has been carried into the current work. There were sticking points to the work that we couldn't get past with the previous methodology.
- What else?
- Bill: Testing the answer changing tags for CTSM5.1
- Look at upcoming tags
- Erik: FATES-SP compset user-mod for tutorial and longer term plan for how it should be connected.
- Erik/Adrianna/Dave: CESM now requires NUOPC since CICE6 is now the standard sea-ice and it doesn't have MCT implemented. What are the things we know how to do in MCT but not NUOPC? Things I think of: Sparse grid, CRUJRA datasets, anomaly forcing
- Erik: FYI. Danny's dust work has resolution dependent datasets that need to come in. He also wants to hear about the difference between CESM1 and CESM2 in regard to surface soil moisture. There was some type of surface moisture resistance added in. The DUST looks totally different because of this. Who should Danny connect with to learn more about this?
- Adrianna/Ryan: We would like to be able to have FATES (via the parameter file) dictate the number of patches per site, but right now this (i.e. the max) is dictated by CTSM. We would like to update this!
- Erik/Will: Restart issue likely in ctsm5.1.dev054 that Will noticed?
- Erik: Should we close the project board on mksurf portability? Some things still apply... https://github.com/ESCOMP/CTSM/projects/20
- Erik: FYI, plan to check with a couple people to ensure this meeting is working for them. Let me know if you'd like to give feedback.
Brian Dobbins did a timing comparison of FATES-SP with non-FATES; FATES-SP took 3.5x longer. We'd like to investigate this to see if we can find some major culprits that we can speed up, because we don't expect this big of a difference (and it could be that finding the culprit here could help improve the timing of the full FATES runs).
Adrianna suggests getting Intel VTune running on this. This could complement our existing manual timing calls.
FATES does have some timing calls in it already. But they are all currently at the interface level. We could consider adding some lower-level timing calls within FATES.
Mariana and Will got things working with the new soil data. Advantages of this are more up-to-date data, and more consistency between different variables. We are ending up using an approach like before: a dominant type approach based on regions.
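The dominant-type approach mentioned above can be illustrated with a minimal Python sketch. It assumes each output grid cell is covered by several fine-resolution source cells, each tagged with a soil-type ID and an overlap weight; the function names, the two-property profiles, and the tie-breaking rule are hypothetical and not taken from mksurfdata_esmf. Picking the dominant type keeps all properties of one real profile together, whereas averaging each property independently can produce combinations that no source profile actually has.

```python
from collections import defaultdict


def dominant_type(overlaps):
    """Return the soil-type ID with the largest total overlap weight.

    overlaps: list of (soil_type_id, weight) pairs for one output cell.
    Ties go to the smallest ID so the result is deterministic.
    """
    totals = defaultdict(float)
    for soil_type, weight in overlaps:
        totals[soil_type] += weight
    return min(totals, key=lambda t: (-totals[t], t))


def map_dominant(overlaps, properties):
    """Dominant-type mapping: copy the whole profile of the dominant type,
    so all properties in the output cell come from one real soil."""
    return dict(properties[dominant_type(overlaps)])


def weighted_mean(overlaps, properties, name):
    """Standard area-weighted averaging of a single property, done
    independently of the other properties."""
    total_weight = sum(weight for _, weight in overlaps)
    return sum(properties[t][name] * w for t, w in overlaps) / total_weight


if __name__ == "__main__":
    # Hypothetical two-type example (percent sand / clay).
    props = {"A": {"sand": 90.0, "clay": 5.0}, "B": {"sand": 10.0, "clay": 60.0}}
    cell = [("A", 0.6), ("B", 0.4)]
    print(map_dominant(cell, props))           # profile of type A
    print(weighted_mean(cell, props, "sand"))  # blend of A and B
```

For the example cell, the dominant-type mapping gives type A's whole profile, while independent averaging gives a sand value of about 58, a blend that belongs to neither source type.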
Things Erik can think of: Sparse grid, CRUJRA datasets, anomaly forcing
- Started on CRUJRA but ran into trouble with it
- Sparse grid will differ because you have a mesh file
Regional runs will also differ – because of the need to mess with a mesh file.
Keith's script for running historical cases needs to be ported over to NUOPC (hopefully simple).
- Keith: there is a CDEPS issue that needs to be resolved – for the transition from 1901-1920 to the full series.
There could be others that we'll run into.
When are we comfortable dropping MCT support? Not imminently. One reason for keeping MCT around longer is to provide a step-wise approach for people updating to the latest code: they can first update to the latest code base, then as a separate step update their scripts (e.g., for single-point / regional runs) to use the NUOPC method.
Dave's recollection is that the resolution-dependence might have some trickiness – e.g., as a calibration factor. We probably want to circle back with him on this.
Regarding change between CESM1 and CESM2: the dust looks totally different, presumably due to the change to have a dry surface layer making it harder for soil to evaporate. He's wondering if he should change the dust based on that.
Erik thinks we'll end up with 2 PRs:
- Base-level work implementing the 2014 formulation
- Danny's work on top of this, probably with a switch to turn it on/off
Would like to have FATES dictate the number of patches per site. This is easy on the FATES side, but some questions on the CTSM side.
Plan is to move the read of FATES parameters to (much) earlier in initialization.
In most cases, Ryan imagines that there will be fewer patches than current. But in one scenario, they need somewhat more patches than currently – and for some sensitivity analyses, it could be much higher.
Will ran into a problem trying to use a dev034 restart file in a dev074 run with init_interp = .true. (but could run without init_interp).
- It died during interpolation; last variable reported during the interpolation was SOIL10(?).
Keith found that dev054 is the issue.
Workaround is to do a two-step process: create a restart file from an 1850 run in the newer code base, then run with init_interp using that.
Feeling is: unless others run into this, let's just document this but not try to figure it out.
- Mariana spent four days figuring out the tricky coszen restart problem! Part of what helped her was creating a much shorter test that reproduced the problem. She thought four days was too long, but I thought that was very quick for such a tricky problem. She also added some extra debug output that might help in the future. There was also a comment that explained the issue (which of course is hard to find until you identify the problem), but the comment helps you feel more confident that it's right. So yay Mariana! Yay comments! Yay for finding small tests that show the problem!
- CAM is under the gun to move their testing to NUOPC. This makes me glad that we are well past that hurdle! Thanks to Keith, Bill, Mariana, and Erik for their work in getting that to happen.
- Erik: No meeting next week because of SEA ISS conference
- Erik: It's kind of late right now, but I realized that there is a small amount of refactoring for CN-Matrix that would've been helpful in keeping matrix up to date in the code. These are small changes: formatting changes I've kept around, the introduction of "if ( .not. use_matrix )", and some minor movement of lines. I could bring this in as a bit-for-bit tag right away. This might also be good because it shows people where the matrix code comes into play, so it's a simple introduction. Thoughts?
- Erik: The coszen issue in CESM will probably take a while, as the next critical things are getting CAM running with NUOPC and moving to CICE6.
- Bill: tag for changing Clm45 crop allocation
- Bill: ozone
- Bill & Will: Time averaging priorities for history files in CESM3.
- Bill's slides from co-chairs meeting
- Bill: Mariana suggests that I should make (3) a priority. Feelings on that?
- Erik: In CN-Matrix we can increase the matrix size to account for the new reproductive pools, but this would take some work and validation. The simple thing I can do is either to require matrix to have only one grain pool, or to have the matrix use the sum of the pools and then assume they are evenly divided (which is probably a bad assumption). An improvement would be to have a fixed weighting for the grain pools, but you probably should add the new pool at that point. Since there is currently only one reproductive pool, it seems reasonable to file an issue for this and resolve it later. I'm pretty sure that I understand the matrix well enough that I could add a new pool, but it would take some figuring out and some work. If possible, I'd have Chris validate what I did, and I think our balance checks will ensure it's correct in the end. Doing that could be a useful exercise, but it would also delay bringing matrix in.
- Will: Development plans, timeline & resources for CESM3 (Summer 2023)
- Erik: Some notes on the STRATA workshop that apply to improving the performance of teams. The highest-performing teams couple high psychological safety with accountability. High accountability without psychological safety is stressful, but can be productive in the short term. High psychological safety without accountability leads to complacency. When both are low, there's apathy. "Psychological safety is a condition in which human beings feel (1) included, (2) safe to learn, (3) safe to contribute, and (4) safe to challenge the status quo – all without fear of being embarrassed, marginalized, or punished in some way." (LeaderFactor https://www.leaderfactor.com) https://drive.google.com/drive/folders/1RMcF8z_DFNKDaE_Au_liRxSU6gofyjCo
- Erik was able to guess that a problem with CAM was just due to use of year zero.
- Negin's tag enabled NEON folks to run again!
- Bill's simplification was great to see in.
- Erik is up to dev085 for CN-Matrix.
- Okay not to allow year 0, or need workaround in code?
- Erik/Bill: Bill had an interesting solution for small PE layouts that I think we should all understand. Running datm concurrently on 1 node and CLM on many nodes works well when they run at nearly the same rate, but Bill found that datm doesn't scale, so you shouldn't give it as many processors. This is good for all of us to know.
- Erik: Mariana talked to me about another resource improvement to mksurfdata_esmf. The idea is that all the time is spent on the 1km file, so save the output of that for creation of new datasets. It's already fast, so I don't think that matters much (although it does cut the time in half), but it also lowers memory requirements, which allows you to use more processors. I think this makes sense, especially if added in a general way, and especially if we think we are going to update surface datasets more often.
- Erik: Note SEA meeting is week after next, should we cancel for the 7th?
- Erik: Jim has new container support for CESM on cheyenne for the cray compiler. Negin, do you know more about this? What can you tell us about it? I wonder if this is something that we should look more into. Should we think of adding container testing to our standard testing?
- Erik: I advocated for the STRATA workshop last week. The slides are now available; I'd like to go over a few of them that talk about performance (maybe next week)?
Bill: how much time should we spend – or ask others to spend – making sure we understand reasons for answer changes?
General feeling from Bill, Erik and Adrianna is that it's generally worth spending time to make sure we understand reasons for answer changes, as long as this doesn't take too long (for some uncertain definition of too long).
However, Dave raised a good point: How much time is worth spending on things like this could depend partly on how much confidence we have in the original code that's being replaced. For something like photosynthesis that has been around for a long time and we feel is probably correct, we'd want to spend more time understanding any answer changes. But for the crop phenology, where our intuition is that the original code could have some issues, it feels less worthwhile to understand all of the reasons for answer changes: if the new code looks right, and we have reason to suspect that the old code might have been wrong, it could be safe to assume that answer changes are coming from fixing issues in the old code, rather than from introducing new issues, and so save some time by not feeling a need to look carefully into the reasons for the answer changes.
- Thank you to Erik and Greg for dealing with the cheyenne mpi-serial module change fire drill on Friday. Their putting in the time to be proactive about this saves the rest of us headaches. Happy Saint Patty's day everyone!
- Bill: To greatly reduce cost and queue wait times of tests, I'm thinking of changing ALL higher-resolution tests to use small-PE count layouts. Production resolutions will be tested via ctsm_sci test list.
- However, it probably doesn't make sense to change the PFS test to small-PE count. I'm thinking of moving it to the ctsm_sci test list, because I haven't been finding this test as valuable as I thought I would.
- Any objections? (If so, what alternative plan would you suggest that will allow us to reduce test turnaround time and reduce the cost of running the test suite?)
- Keith: Participants for meeting on Meier2022 negative 2-m humidity
- Meeting next week?
- Ryan/others: Questions about how history field arrays are (or are not) re-initialized from their last values upon a restart.
- Erik: STRATA workshop was really useful. A lot of it was how "psychological safety" along with accountability leads to the highest performing teams. They are going to share the slides with us. I highly recommend everyone taking the workshop if they do another offering.
- Erik/Adrianna: In our meeting yesterday Adrianna had a good point that sometimes it might make sense to make some design notes and work out the user interface for a project first.
Bill plans to reduce PE layouts of our high resolution tests. No objection to this.
Bill originally didn't think it made sense to run the PFS test with a small PE count, but on further reflection there probably is still value in running this with a small PE count: it won't be exactly representative of performance at production PE layouts, but should still give good indication of significant performance changes.
Erik suggests running this when we change answers.
This makes sense to Bill - answer changes bigger than roundoff.
- Dave: Or another way of saying this is: when it changes answers because we intend to change answers.
We will update https://github.com/ESCOMP/CTSM/wiki/Answer-changing-tags soon with recent tags, and then Keith will run a set of simulations comparing against the ones from a year ago.
We will list anything that changes answers for clm51 configurations now. Don't bother listing FATES changes.
Keith has a possible fix for this negative 2-m humidity issue: if you change the stability parameter (lower it to be a bit closer to what we used to have – like 20, as opposed to the current 100... note that this used to be something like 2), this issue goes away. But he would like to consult with some others on this.
Ryan: in FATES, when they restart the model, they are currently reloading all of the history variables to allow for the possibility that there will be some history averaging time steps before FATES next updates the variable.
Some options for how to avoid the need for this:
- Have a flag associated with a history field saying "this only needs to be averaged on the day boundary, not every time step"
- And/or have a flag associated with a history field saying "store the most recent value on the history restart file", which would then get reloaded when you restart a run that stopped in the middle of a history file averaging interval. (This could be enough as long as you aren't trying to write history files more often than the frequency at which FATES updates its variables. If, for example, we are writing a history file every 6 hours and restart on this 6 hour boundary, then the history restart file wouldn't be invoked... however, it doesn't really make sense to have a daily FATES variable on a 6-hourly file anyway.)
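The two options above can be sketched as flags on a history field. This is a minimal Python model, not the CTSM history code (which is Fortran); the class, attribute, and method names are hypothetical. `update` stands in for FATES recomputing a variable, and `accumulate` for the per-time-step history averaging.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoryField:
    """Hypothetical model of one history field's time-averaging state.

    daily_only: accumulate only on day boundaries (first option above).
    restart_last_value: save the most recent value with the history
        restart data so it can be reloaded mid-interval (second option).
    """
    name: str
    daily_only: bool = False
    restart_last_value: bool = False
    current: Optional[float] = None  # value the owning model last provided
    total: float = 0.0               # running sum over the averaging interval
    count: int = 0                   # number of accumulated samples

    def update(self, value: float) -> None:
        """Called when the owning model (e.g. FATES) recomputes the field."""
        self.current = value

    def accumulate(self, is_day_boundary: bool) -> None:
        """Called every time step; assumes update() already ran this step."""
        if self.daily_only and not is_day_boundary:
            return
        if self.current is None:
            raise RuntimeError(f"{self.name}: no value available to average")
        self.total += self.current
        self.count += 1

    def mean(self) -> float:
        return self.total / self.count

    def restart_state(self) -> dict:
        """Data written to the history restart file for this field."""
        state = {"total": self.total, "count": self.count}
        if self.restart_last_value:
            state["current"] = self.current
        return state

    @classmethod
    def from_restart(cls, name: str, state: dict, **flags) -> "HistoryField":
        """Rebuild the field after a restart; current stays None unless saved."""
        fld = cls(name, **flags)
        fld.total = state["total"]
        fld.count = state["count"]
        fld.current = state.get("current")
        return fld
```

With `restart_last_value=True`, a run restarted in the middle of an averaging interval can keep averaging before FATES next updates the field. With `daily_only=True`, nothing is accumulated until the next day boundary, by which time (under the sketch's assumption) FATES will have provided a fresh value.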
Software Wins for the week: Negin/Iris and co. getting Norwegian domain to work! Keith figured some things out with the surface roughness PR. Sounds like Sam and Bill are doing good work with the crop changes.
- Erik/Will: Move single point tests over to subset data? See #1674
- Erik/Bill: Standup meeting. Talked about doing the following to shorten (it's been going up to an hour)...
- We will not all share updates: if there's nothing you're stuck on or need help with, then just pass
- We can do a little on-the-spot problem solving, but try to keep it to no more than about 5 minutes per person; if more time is needed, then the right people should schedule a follow-up
- We should keep a brief discussion of upcoming tags to try to keep them moving along
- Erik/Greg/Ryan: We need to add inventory files to startup FATES, so more of the code is exercised on our tests. Would this just be a specific single point site?
For single-point cases, our recommended / supported workflow is to use subset_data.
For regional, it is case-by-case: in some cases it makes sense to subset, and in other cases it makes sense to make a surface dataset directly at your resolution with mksurfdata.
The reason why single-point generally / always will use subset_data is because you generally will override the important properties like vegetation type anyway.
Erik asks if it makes sense to switch over the process for creating our out-of-the-box single-point surface datasets now. General feeling is yes. (See issue #1674.)
Will points out that, for the sake of running in a container, it could make sense to do the subsetting from a relatively coarse-resolution dataset (e.g., 1 degree) to avoid the need for downloading a high-resolution dataset in the container.
Negin: the actual downloading of the data is the part that can sometimes be problematic. One possible solution is to bundle the necessary data with the Docker image.
Will points out that the climate data are going to be the especially big piece, so that's more of an issue than the surface dataset anyway.
Talked about doing the following to shorten (it's been going up to an hour)...
- We will not all share updates: if there's nothing you're stuck on or need help with, then just pass
- We can do a little on-the-spot problem solving, but try to keep it to no more than about 5 minutes per person; if more time is needed, then the right people should schedule a follow-up
- We should keep a brief discussion of upcoming tags to try to keep them moving along
We need to add inventory files to startup FATES, so more of the code is exercised on our tests. Would this just be a specific single point site?
Currently, regression tests with FATES start from cold start. This doesn't exercise as much of FATES as we'd like.
A fear with this is: there are a lot of restart variables in FATES, and it could be a pain to remake all of the restart files when new state variables are added, etc.
A solution they have been considering is: Have a small initialization file that would provide the key needed variables, but not try to provide all state variables needed for a restart.
This would likely follow similar logic to the inventory file initialization – but inventory files are text files that don't lend themselves well to global, gridded datasets.
Erik: in the short-term, could you use the inventory approach for single-point or small cases? Ryan: it's possible.
Another possibility could be to come up with something in the code that gives you some sort of idealized startup case that is an alternative to cold start – probably leveraging the LAI data.
If we went with the inventory text file approach, there is some question about where we would store the data – noting that there would be a separate file for each grid cell. This starts to feel kind of messy. The other approach would be to write these data to a netCDF files; or to create idealized conditions (e.g., dummy cohorts) based on LAI data.
- Will & Erik: SoilGrids Data update, #1303
- 5 km .nc file produced w/ mesh file (Will and Ufuk)
- Next steps are to evaluate compatibility in mksurf for using bilinear interpolation (Sam or Erik)
- Additional considerations about how best to interpolate fields without mapping units (Erik, Bill, Sam, Will, Mariana, others e.g. ESMF)
- Note, SoilGrids team is looking into potential issues with data (soil C:N ratios don't make sense), so updates to 'raw' data may be forthcoming.
Will produced a 5-km netcdf file. Ufuk made a mesh file from this, which should now be compatible with mksurfdata_map. Next step: plug in the new soils dataset and try mapping it.
Right now mksurfdata_map is trying to use mapping units, but the new dataset doesn't have mapping units on it.
As a first step, we should try just regridding this using our standard mapping approaches.
However, for the real mapping, the problem is that we can't just do a standard mapping, because you'd end up with a weird, unrealistic soil. We need to find out what standard approaches there are for doing these regriddings of soil properties.
Negin got her long-outstanding tag on subset_data in! New FATES tag in despite random answer change issue. Bill's work on single point testing.
- Erik: What came up in the LMWG meetings that we need to respond to? SCAM update needs to come in. Emily had a nice illustration of doing simple offline modeling that shows importance, which is then moved into the model.
- Erik/Ryan: We should discuss how FATES should interact with soil BGC and work in AD mode, as well as should FATES be run with supplemental Nitrogen? Who needs to be in this discussion? (Charlie, Ryan, Will, Erik, who else wants to be?)
- Erik: Dave and others have suggested the idea of having a "less supported release tag". I think the way is to mark the "last stable tag". We did think that we should do more testing on certain tags, and also have Keith periodically run simulations with certain tags. I think the question is how do we mark the latest stable and what are the testing standards for it?
- Erik/Negin: Negin has some slides on python environments that she gave to CSEG we should discuss this as well. Let's plan on when to do that. Should we wait until CSEG decides on a path forward?
- Erik/Negin/Bill: We had a bunch of discussion on python coding in Negin's PR. I'd like to step back from the details of that discussion and have a subgroup talk about standards in python.
Longer Term Design Discussion: (Should just SE's discuss this?) There is utility in everyone understanding these different elements...
- Erik: Let's decide what subgroup should discuss this, and setup a time for a meeting. See below for the details that I had to cover.
Erik points out that a number of recent successful scientific developments started their life outside of CTSM before coming into CTSM.
- Will points out that MIMICS worked that way, too
One presentation in the LMWG meeting mentioned unexpected differences between cplhist forced runs and the underlying coupled runs.
We do expect some differences between these two. One possible need may be for better documentation of what's expected in this respect. And it might be worth doing some more verification that we're getting scientifically basically the same result with cplhist as in the underlying coupled runs.
CLM5.0 is getting old, and no longer getting many updates. Erik suggests that we change the recommendation for what people use for science.
Will: it could make sense to do this once we have a "soft" CTSM5.1 release.
We can plan to have "soft" releases like this for each minor version (CTSM5.1, CTSM5.2, etc.).
What would be done for these?
- We will run the clm_science test list on it
- We plan to have some level of simulations / diagnostics showing scientific reasonableness
Realistically, we'll probably need release branches for these. Ideally we wouldn't need to put many updates on these release branches, but realistically we'll probably need a few critical bug fixes over time.
There was some discussion in the CSEG meeting where Negin presented some info on options for managing python environments.
General sense is that most scientists in CGD who are using python for analysis (Jupyter notebooks, etc.) are already using conda. (This could be partly a result of that being the recommendation from Anderson and others, and what their tutorials recommend.)
The conclusions from the CSEG meeting from this are unclear, though one takeaway is the importance of staying with very few, well-supported, stable packages for anything that is invoked in the standard CESM run process (case.setup, case.build, case.submit). So, for example, probably staying away from xarray for things invoked in that process (sticking with lower-level netcdf libraries).
Motivation: Keith is working with Ryan to test out a spinup process with FATES.
- Since FATES has no dependence on soil carbon, people working with FATES until now have been running without bothering to spinup soil BGC. This is something that really is worth addressing.
Erik points out a difference between the approaches for carbon-only in big-leaf CTSM vs. FATES: For CTSM, supplemental nitrogen is added to make nitrogen non-limiting, whereas FATES doesn't do anything with nitrogen when running carbon-only. Should those approaches be reconciled?
Ryan: there is a practical reason for the different approaches: FATES is very memory intensive, so it is worth not bothering to allocate memory for things that aren't needed.
Regarding testing of AD with FATES: We have an SSP system test that runs through the spinup process. We should either make sure this works for FATES or introduce a new test like it that would work for FATES.
Erik got the answer changing tag in after working through some issues.
- Mariana: new mksurfdata_map
- Bill: path forward for use of doalb around phenology / ecosystem-related calls
Longer Term Design Discussion: (Should just SE's discuss this?) There is utility in everyone understanding these different elements...
- Erik: Let's decide what subgroup should discuss this, and set up a time for a meeting. See below for the details that I had to cover.
- Erik: Some design ideas for where to place different settings: we should think about when to put something in each place: compset, case XML settings, use-case, user-mods, namelist-defaults, or offline run script.
- Have as few compsets as possible for single-point, and use the same compset for generic tower sites as well as NEON.
- Have the process for running NEON be as similar to generic tower sites as possible.
- It's OK for I-only cases to have compsets that don't encapsulate the exact year being simulated.
- In contrast, higher-level compsets (B, E, F, etc.) always keep the exact year in the name.
- It's OK for single-point cases to use the same compset for both spinup and transient simulations, in contrast to global simulations, where we separate the two. I feel like those came directly out of the discussion, but I'd like to add a few that we will need to discuss and make sure we all agree on.
- Avoid putting complex logic in user-mod shell_commands.
- The original design had build-namelist/configure as a tool that was used outside of the context of a CESM case, and the list of XML control variables for CLM was designed to be short and general and not needing expansion. I don't think we need that restriction anymore. Bill added some new XML variables and they seem to work well. It's a bigger deal to add new XML variables, but it can be a good way to go.
- Avoid putting things in user-mods that can be handled in use-cases or namelist-defaults.
- Avoid putting things in use-cases that can be handled in namelist-defaults.
- Use-cases can't always have complex attribute logic for elements in them (only high-level options like physics-version, resolution, BGC-mode, etc.).
- Avoid putting customization in run-scripts that can be handled in the above infrastructure, as that allows people who don't use the run-scripts to have access to it as well.
- Adding history settings in user-mods is useful as a starting point for users to modify the case by themselves. When they are buried in use-cases or build-namelist, users can't easily modify them.
- A caveat to user-mods is that when someone clones a case from an older version, it brings in that case but doesn't bring in any updates that were made to the user-mods in the newer version.
- Using user-mods with a clone is awkward: which takes priority, the cloned case or the user-mods? Adrianna noticed that it appends the user-mods to the end, which is not always ideal.
- From easiest to hardest to implement: offline-run script, user-mods, use-case, compset, namelist-defaults, case XML settings.
Background: combination of needs for simpler models project and recent experience with challenges creating surface dataset for very high resolution. This made Mariana realize that we could put in place a better surface dataset generation workflow / tool: Rather than creating mapping files as a separate step, create them in mksurfdata_map itself. Two big advantages of this are that it simplifies the workflow and that it is scalable (for memory and time), supporting the creation of high resolution surface datasets.
Another thing Mariana has done is to rework the xml file used for mksurfdata_map. It is MUCH shorter now, largely due to generalizing the looping over years, and partly due to now looking for a mesh file in the existing configuration directory (ccs_config).
One change that Mariana made to the original design for the updated toolchain is that she has gone back to explicitly listing the mesh file for each raw data file, rather than having this in netcdf metadata. There were a number of reasons for this; one big one was that we want to allow the mesh file to change without needing to create a new copy of the raw dataset (this is how things are done in other places in CESM now, and Mariana has found it to be helpful to have mesh files listed separately from the dataset itself for this purpose).
Will: is there a method for generating the mesh files?
- Mariana: current method is to first create a SCRIP grid file, then use that to create a mesh file (though different people do this in different ways)
The new mksurfdata_map code has essentially been totally rewritten: now it is parallel, leveraging pio for i/o and esmf for regridding (rather than using the offline esmf regrid tool). mkmapdata is no longer needed.
One somewhat tricky thing is getting the right processor count for the mksurfdata_map run. For the sake of the 1km data, she is using 24 nodes with 12 tasks per node.
With this, Mariana can create a 2-degree surface dataset (along with generating all of the necessary mappings) in 8 to 10 minutes using 24 nodes.
So this still requires a relatively large system, but doesn't require large memory nodes like it did before. So this doesn't completely solve the problem of people wanting to create surface datasets on small-ish machines, but if anything it should be better than before.
For job submission: at least for now, user will need to write a job submission script themselves. We'll need to provide some documentation or support for how many nodes / processors are needed.
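As a starting point for that documentation, the resource request for the layout described above can be written down explicitly. This minimal Python sketch assumes a PBS Pro batch system with 36-core nodes (as on Cheyenne); the helper names are hypothetical, and the user would still need to add the actual mksurfdata_map launch command to the job script. For 24 nodes with 12 tasks per node it gives 288 MPI tasks.

```python
def total_tasks(nodes: int, tasks_per_node: int) -> int:
    """Total number of MPI tasks for a uniform layout."""
    return nodes * tasks_per_node


def pbs_select(nodes: int, tasks_per_node: int, cores_per_node: int = 36) -> str:
    """Build a PBS Pro resource line that may under-populate each node.

    cores_per_node=36 is an assumption matching a Cheyenne node;
    change it for other machines.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 1 <= tasks_per_node <= cores_per_node:
        raise ValueError("tasks_per_node must be between 1 and cores_per_node")
    return (f"#PBS -l select={nodes}:ncpus={cores_per_node}"
            f":mpiprocs={tasks_per_node}")


if __name__ == "__main__":
    print(pbs_select(24, 12))   # #PBS -l select=24:ncpus=36:mpiprocs=12
    print(total_tasks(24, 12))  # 288
```

Requesting all 36 cores per node while placing only 12 MPI tasks on each is a common way to give each task more of the node's memory; the notes don't say whether that is why 12 tasks per node was chosen.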
Bill: It seems like, besides the clear use of doalb for controlling when radiation / albedo calculations are done, it also is used to control some phenology-related calls. The use of a doalb conditional was causing problems in the context of Sam Rabin's PR for adding crop planting & harvest date output, so Bill has removed this conditional around the call to CropPhenology, but he is wondering if it should be removed from other calls as well.
Bill looked into the history of this: Back around clm3_6, most / all of the CN code was in a doalb conditional. However, in a commit in the 3_6 series, Peter Thornton changed this so that most of CN was called on the main CLM time step, not the radiation time step. But it was still left in place for a few things.
People can't remember much about why these doalb conditionals were put in, except for recollections that it might have been an attempt to improve performance.
Bill's plan for now is to just remove the doalb conditional from around CropPhenology, leaving the other uses in place.
- MIMICS is in! We survived the LMWG meeting with lots of great participation and enthusiasm. It's cool to see the international involvement and see everything on YouTube.
- Will/Dave: Soil datasets
- Erik/Negin/Bill/Adrianna: Discuss adding the type to variable names in python? Or have we discussed this enough? :-) I really do like the fact that we are forging our way in making standards for our python coding. https://github.com/negin513/ctsm/pull/4#pullrequestreview-878071990
Each field is in a GeoTIFF dataset.
May be worth checking with Sean Swenson about what he needed to do for the 1km topography data.
Do we want 1 km, 5 km or both?
- Because we've had a lot of trouble with the 1km resolution, Erik suggests doing it at 5km.
- The situation with 1 km will likely improve with some work Mariana is doing with mksurfdata_map, though it could still make sense to have 5 km for now if we don't need 1 km scientifically for now.
Preserving metadata / provenance? We haven't generally preserved the scripts needed to make raw data files in the past, but there is a push across CESM to preserve more metadata in any file that goes into inputdata. So, if it's not too much burden, it would be good to put the script on GitHub somewhere and point to it in metadata on the file. In addition, point to the original raw data somewhere.
- Erik: Bill's careful work in cleaning up the year-length calendar issues for Gregorian. Sorry for the mess this was. Thanks for your careful work on this, which fixes some legit issues and makes it more obvious what is happening (the worst part of the previous hack was that it was hidden). We as a group are also making good progress on developing standards for our CTSM python coding, which will enable us to work more effectively as an engineering team. It's hard work, but we are making progress.
- Erik: This is late to ask. What do we need to have done by the LMWG workshop? What can we have done by then? Is it helpful to have things to "checkoff" during the WG meetings?
- Erik: We should have Sam bring in the current version of MIMICS as an experimental option now, as soon as he can schedule it. This will prevent having to maintain it outside of normal development.
- Erik: Does lawrencium-lr3 have ESMF installed on it? This will be required for NUOPC. Are there machines people use for CTSM besides: cheyenne, bluewaters, hobart, and izumi? The CTSM container for NEON includes ESMF so personal laptops are taken care of that way. For someone to build without the container on their own laptop would require installing ESMF though.
- Erik: Note that Brian Dobbins let us know about an upcoming workshop on Intel tools. Personally, I like the idea of using the ARM tools in their place, as they are compiler independent. But I do think it's good for some to know about other options as well.
- Erik: Sad note for CTSM community, Johan Feddema is in the equivalent of hospice care...
- Negin: quick updates about subset_data.py
- Erik: Negin has some slides on Python environments that she prepared for CSEG (but CSEG meetings keep getting cancelled). I'd like to have her show those slides to us, as this will be helpful to this group. The same issue is actually more important for us in CTSM as we require certain Python packages. I want to advocate for making conda the standard for python environments (since conda is now supported in modules on cheyenne).
Longer Term Design Discussion: (Should just SE's discuss this?) There is utility in everyone understanding these different elements...
- Erik: Some design ideas for where to place different settings: we should think about when to put something in each place: compset, case XML settings, use-case, user-mods, namelist-defaults, or offline run script.
- Have as few compsets as possible for single-point, and use the same compset for generic tower sites as well as NEON.
- Have the process for running NEON as similar to generic tower sites as possible.
- It's OK for I-only cases to have compsets that don't encapsulate the exact year being simulated.
- In contrast, higher-level compsets (B, E, F, etc.) always keep the exact year in the name.
- It's OK for single-point cases to use the same compset for both spinup and transient simulations, in contrast to global simulations, where we separate the two.
I feel like those came directly out of the discussion. But I'd like to add a few that we will need to discuss and make sure we all agree on.
- Avoid putting complex logic in user-mod shell_commands.
- The original design had build-namelist/configure as a tool that was used outside of the context of a CESM case, and the list of XML control variables for CLM was designed to be short and general, without needing expansion. I don't think we need that restriction anymore. Bill added some new XML variables and they seem to work well. Adding new XML variables is a bigger deal, but it can be a good way to go.
- Avoid putting things in user-mods that can be handled in use-cases or namelist-defaults.
- Avoid putting things in use-cases that can be handled in namelist-defaults.
- Use-cases can't always have complex attribute logic for elements in them (only high-level options like physics-version, resolution, BGC-mode, etc.).
- Avoid putting customization in run-scripts that can be handled in the above infrastructure, as that allows people who don't use the run-scripts to access it as well.
- Adding history settings in user-mods is useful as a starting point for users to modify the case themselves. When these settings are buried in use-cases or build-namelist, users can't easily modify them.
- A caveat to user-mods is that when someone clones a case from an older version, the clone brings in that case's settings but not any updates that were made to the user-mods in the newer version.
- Using user-mods with a clone is awkward: which takes precedence, the cloned case or the user-mods? Adrianna noticed that it appends the user-mods to the end, which is not always ideal.
- From easiest to hardest to implement: offline-run script, user-mods, use-case, compset, namelist-defaults, case XML settings.
Erik asks if it is useful to tie software development goals / deadlines to the timing of workshops.
Will: Yes, this can be useful.
Some possibilities for this upcoming workshop:
- subset_data / single-point work
- emphasizing how easy it is to run single-point cases now
- dynamic urban
Will is happy to give a brief update on some of these things in his talk.
Dave: in general, want to let people know about capabilities; don't need to give them details on how to do it. Can put contact info for more info. Can also include additional slides at the end of the talk for people to look at on their own. (Need to make sure that slides get posted, and in a timely fashion.)
Dave, back to the original question: he doesn't feel we really need to tie our timelines to working group meetings.
Will: one thing that can be helpful is going through the tags since the last meeting in order to pull out highlights.
Will: would it be useful for people in the broader community to have a new release tag that they could use for their work?
- A related question is whether we treat this as a real release. This would probably imply additional testing as well as possibly long-term support.
- Dave suggests a "soft release", where we announce a tag but don't promise any support for it. This could be an opportunity to describe what is available through FATES, what parts are robust, what parts are more experimental, etc.
What is needed for a release (we are not talking about a release with the updated datasets: we're talking about a release before then):
- CN-Matrix is the big one
- One possible challenge with this is interaction with the MIMICS work... probably will just say that you can't run CN-Matrix with MIMICS.
- Other things, like the new tools, would be nice to have, but wouldn't hold up a release
Greg thinks that lawrencium probably doesn't have ESMF on it. Erik suggests getting that installed.
Keith has been running single-point jobs on casper, and it has been working really well in terms of stability and the number of jobs you can run at once. The latest cime has a configuration for it (using --machine casper).
Erik and Bill will look into getting cime set up to run single-point jobs automatically on casper.
Keith notes that the default is to use pgi on casper.
- All: Any CTSM "wins" for the week? Erik is happy about having an initial working test list for mizuRoute. Fixing exact restart for FATES-SP was significant even if the tag itself was small.
- Erik: CDEPS issue #143 for nldas2 in DEBUG mode
- Erik: surface roughness PR? What is the priority now that Ronny is out?
- Erik/Bill: We started working with Beatrice on meetings. She has a lot to offer to improve how we are currently doing things. Erik is planning on implementing some of those ideas now. Bill and I should meet with Will/Dave to clarify each of our roles, and report back to the group.
- Erik: Changes to get FATES to work with NEON. See issues #1363 and #1609. I think we should add two new use-cases for NEON (one for spinup and one for transient) as well as a FATES compset and a transient compset for BGC. We haven't distinguished between spinup and transient for NEON to this point; I think we should start doing that.
- Follow up on logging level discussion
Question of how this would apply in a FATES run.
- Greg notes that FATES does its own calculation of surface roughness.
- The broader question is how we handle things like this in this interim period when we're working with both big-leaf CLM and FATES. One possibility that Dave suggests is that, once we can show that FATES works well in various aspects in coupled runs, we say we'll stop trying to develop and improve big-leaf CLM. This needs to be discussed in a much broader venue.
- Erik asks: for PRs like this, should he ping someone on the FATES side to have them look at how this might apply to FATES? Unclear at this point. Greg offers to look at PRs (and possibly bring them to others' attention) if we ping him, but need to keep in mind that he and others might not be able to devote a lot of time to this. In general, continuing communication and bringing things to each other's attention is probably a good thing.
Erik wonders if we should have two new use-cases for NEON (one for spinup and one for transient).
Will and Negin suggest this may not be needed with the current workflow.
Erik: one issue is that some aspects of transient don't apply to FATES.
- You still need to spin it up, but you don't need AD mode, for example.
With the standard model, you do the spinup in a standard I1850 compset.
Dave feels we should do something similar for FATES (or these NEON cases in general); he feels that you use a HIST case only when everything is transient. His suggestion is that we use a present-day compset as a starting point for these single-point runs, and then just turn on the appropriate transient aspects.
Erik: but for NEON, we are actually doing just about everything transient, but then just using constant land use. Right now we're doing that by using a present-day compset but then choosing a transient use case, which is inconsistent.
On use cases: This is something that is hidden from most users and can potentially trip people up. Bill raises the question of whether it would make sense (eventually) to ditch use cases and just use namelist defaults, because they are really doing the same thing in different ways. (Where we currently set a use case based on the compset, you could instead set an xml variable that is used to set the default namelist options via namelist_defaults, as is done for other xml variables.) This idea of ditching use cases is not on the table right now, but it might at least argue for not making things more confusing by choosing a use case that disagrees with the compset.
Will likes having a generic compset for single point and then adjust things as needed. Adrianna echoes this: she feels there would be too many compsets, etc. if we wanted to introduce separate compsets for each of these.
One argument for putting the year in the compset is for consistency with how things are done across CESM. This is important for something like an F or B compset.
However, we're coming around to the point made by Will, Adrianna, and others. So that will be the plan: use a standard I1Pt compset (but there will be separate ones for BGC, FATES, or SP). Then Erik plans to make a new use case for FATES runs.
But Adrianna points out that we need a different surface dataset for FATES NEON runs than for non-FATES runs.
- Erik wonders if we can get things running with the non-FATES (78 pft) surface dataset, as long as you set use_crop=.false.
- Adrianna suggests using a variable in the file name to distinguish FATES vs. non-FATES, avoiding the need for separate user_mods directories (if I understood correctly).
Negin raises the point that it's confusing that we can do things both in run_neon.py and shell_commands. She points out that it would be clearer if this was done in just one place.
- Erik agrees with this general point; he suggests that shell_commands might be the most appropriate place, since this is accessible to people who are creating a case without using the run_neon script.
Erik will open a PR to illustrate what he's thinking.
- Bill: Two testing issues with my recent tag:
- SMS_Ld5.f10_f10_mg37.ISSP370Clm50BgcCrop.cheyenne_gnu.clm-ciso_dec2050Start had unexpected baseline failures in a bunch of fields when I first ran it. I reran it twice and it passed both times. Probably a system glitch, but FYI.
- ERS_Ly20_Mmpi-serial.1x1_numaIA.I2000Clm50BgcCropQianRs.cheyenne_intel.clm-cropMonthOutput exceeded wallclock time (1:40:00) twice. I have increased the wallclock time to 4:00:00, but I consider that an unreasonable amount of time for a test. The third time through, it completed in a little over an hour. Not sure what to do here.
- Update (2022-01-12): I brought this up in the software standup meeting. No need to revisit it here.
- Erik: FYI Note: Bill and I are going to be talking with Beatrice Meyer PhD about improving these meetings. I've been talking to her a bit about it, and talked to some of you individually about it as well. After we meet with Beatrice we'll likely talk to the group about what we learn. I have some notes above of some questions I put together in that regard, we'll refine it later. But, I'm still interested in hearing from anyone individually if you have any thoughts about the meeting.
- Erik: Starting Feb 3rd, CISL is running a GPU tutorial. Should we change our meeting time? Who should go to those classes from our group?
- Erik: More questions on command line interface standard. Add --silent? Should --debug just be logging or also a dry-run? Is it OK to bundle single letter arguments together? Also think we should eliminate abbreviations of arguments.
Adding --silent: agreement that this should be done; it will change the logging level to errors only.
--dry-run: agreement that this should be separate from --debug. --dry-run should print out what's going to happen without actually doing anything. This makes sense for some scripts but not others.
Bundling single letter arguments (e.g., ls -lrt): agreement is we should do whatever argparse does as a standard.
Abbreviations of arguments: perl allows fuzzy matching of argument names. We don't think python's argparse allows that, and we think it again makes sense to do things the way argparse does.
- Update: python's argparse does allow prefix matching by default; Erik suggests disallowing it with allow_abbrev=False; no objections to that.
We had a long discussion on logging levels in python scripts. See also https://github.com/ESCOMP/CTSM/issues/1601 and https://github.com/ESCOMP/CTSM/pull/1598 .
The central question was how to handle output from scripts that we typically want to appear, e.g., something along the lines of "this process completed successfully." Currently, we follow the python logging standard (laid out in https://docs.python.org/3/howto/logging.html ) of having the default logging level be WARNING, so anything printed via logger.info only appears if you specify the -v / --verbose flag. We additionally use print statements for output that we feel should always appear, i.e., output that feels fundamental to the script. (It is admittedly somewhat subjective when to use print vs. when to use a logging-based message; the above howto gives some guidance, but it is not black and white.) (One problem with this is that there is then no way to turn off these print statements.)
The issue is that we often seem to want more output from our scripts than what typically comes out of Unix utilities: The Unix standard is for commands to typically be silent if everything went well, but our group more often wants confirmation that things worked successfully. It feels like this difference leads to friction between what we want and what the standard is for using python logging.
One possibility is to change our default logging level to INFO, so anything printed via logger.info would appear by default. We then might want to change some of our logger.info calls to instead be logger.debug. It's not entirely clear what we would do with the --verbose and --debug flags in this case, but one possibility is that --verbose would set the logging level to DEBUG (thus printing all logging messages), and --debug would do the same but would additionally enable the code that dumps you into the python debugger when the code aborts. We generally feel that this could be a good solution for what we want. The main downside is that we're then doing something non-default with respect to logging, which might be confusing for people contributing to our python code who are coming in with the assumption that logger.info calls won't give any output by default.
Adrianna raises another possibility: defining our own logging level between INFO and WARNING, and making that new level the default. We feel like that could be a good solution, although the caution in https://docs.python.org/3/howto/logging.html#custom-levels is a little worrisome.
So the three possibilities we would like to consider are:
- The current status quo: default is WARNING; messages that we want to have typically appear can be done with print; other messages go in info, and if people want to see them they need to add -v.
- Change the default level to INFO.
- Define a new level between INFO and WARNING and use that as our default level.
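For the third option, a custom level can be registered with logging.addLevelName. This is a minimal sketch assuming the hypothetical name NOTICE and value 25 (INFO is 20 and WARNING is 30); neither is a CTSM standard, and the howto's caution about custom levels still applies, especially if other code defines its own levels.

```python
import logging

# Hypothetical level between INFO (20) and WARNING (30).
NOTICE = 25
logging.addLevelName(NOTICE, "NOTICE")


def make_logger(stream, level=NOTICE):
    """Return a logger writing "LEVEL: message" lines to stream.

    With the default threshold of NOTICE, info() calls are hidden while
    NOTICE and WARNING messages are shown.
    """
    logger = logging.getLogger("ctsm_notice_sketch")
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # keep messages out of the root logger
    return logger
```

A script would then emit its "Success" message with `logger.log(NOTICE, "Success")`, which appears by default, while `logger.info(...)` stays hidden unless the threshold is lowered.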
Here were notes from the discussion of this issue last August, with Sam, Erik and Negin:
We'll keep the default log level at warning. We discussed changing the default to info, at least for some scripts such as in the toolchain, so that users see more info about what the tools are doing. But a default level of warning is what's noted as the default in https://docs.python.org/3/howto/logging.html, and makes sense with both a --verbose and a --debug flag (so --verbose turns on info-level output and --debug turns on debug-level output). Moreover, Sam points out that it can be helpful to by default only see issues in a script, not a bunch of things providing general info, so that issues / warnings stand out more. In general, it seems like in CESM we are more verbose than standard unix utilities, and it might be a good thing to be a little less verbose. So we agree that we'll make the default be warning-level, and users can specify --verbose if they want more output (and we might recommend that people use --verbose the first time they run a script).
Some other things I found after the discussion:
- I looked back at the CIME code, and see that it actually uses a default level of INFO (with the --silent flag changing this to WARN). (This may be partly responsible for perpetuating the verbosity of CESM scripts noted above.)
- I tried looking at various guides on logging best practices, but most of what I can find seems aimed at applications where you're using the logging library to write log files that can be investigated later if needed; that might be appropriate for, say, a web application, but such applications seem to have different requirements than a command-line application.
I realized afterwards (as Bill points out above) that adding "--silent" as an option might essentially shift this discussion toward more logging. In general, by default I like having scripts tell you something about what they're doing, especially if it takes some time to do it. This is especially true for scripts we've developed that we can't guarantee are as robust as Linux commands. When I see a script taking time, I wonder whether it is working at all. The tools testing also assumes that the script ends with some type of "Success" message so you know it worked. And I as a human also like getting those types of messages.
But that could all be handled by having --verbose turn on "info" output (including the "Success" message). The tools testing would then just always activate --verbose. And as Bill points out above, as a user I would just always run with --verbose to get the extra logging I like. This would mean that the default would be WARN and --silent would shift it to ERROR (which in many cases wouldn't make a difference). But it is sometimes helpful to turn off even warnings, especially if you are going to capture the output.
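The scheme described above (default WARNING, --verbose for INFO, --debug for DEBUG, --silent for ERROR) can be sketched as follows. The precedence when several flags are given, the logger name, and the warning text are assumptions for illustration only; a tools test would run such a script with --verbose and look for the "Success" line.

```python
import argparse
import logging


def level_from_args(args):
    """Map parsed flags to a logging level (precedence is an assumption)."""
    if args.debug:
        return logging.DEBUG
    if args.verbose:
        return logging.INFO
    if args.silent:
        return logging.ERROR
    return logging.WARNING  # default: only warnings and errors


def run_tool(argv, stream):
    """Hypothetical tool body showing which messages appear at each level."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--silent", action="store_true")
    args = parser.parse_args(argv)

    logger = logging.getLogger("ctsm_tool_sketch")
    logger.handlers.clear()
    # A handler with no formatter prints just the message text.
    logger.addHandler(logging.StreamHandler(stream))
    logger.propagate = False
    logger.setLevel(level_from_args(args))

    logger.debug("internal details")
    logger.warning("input dataset is older than expected")  # hypothetical warning
    logger.info("Success")  # the message tools testing would look for
```

With no flags only the warning is printed; `-v` adds "Success"; `--silent` suppresses both.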
- Erik/Adrianna/Negin: Talk about defaults for subset_data.
- Dave/Mariana: High resolution urban streams and other stream data for ultra high-resolution grids
- Erik: Bill or I should add a testmod and hillslope tests to the hillslope branch. Who should do this? It won't take long. Yifan did spot a restart issue in the branch. I told Sean how to do a simple restart test without this in place. But, we might as well start getting this in place on the branch since several people are using it.
- Erik: We need to spin off a 2022 version of these notes.
- Erik: A CTSM win. Keith helped a user get reasonable results with CTSM on the forum. Good job Keith! https://bb.cgd.ucar.edu/cesm/threads/the-simulation-result-gpp-and-le-of-clm4-5-single-point-is-much-lower-than-the-observational-data.6941/#post-42865
- Erik: FYI fire-emission changes are more complex than I originally thought, as it turns out it requires several component tags and a CESM tag to be coordinated.
- Erik: One of the things I was going to put in with answer changes only changes answers for diagnostic fields that are not normally output. So you can argue that it's a simple-bfb change. But this makes me think that we should put into place at least one test that turns on all history fields.
- Bill: see https://github.com/ESCOMP/CTSM/issues/29
- Erik: Go over updated datasets project as well as upcoming tags.
We talked about using a "super" flag option and a few different ways that could work. We decided against that, since it would require more internal checking.
Negin has a wrapper script that creates the NEON site data, so the standard NEON case is taken care of: the wrapper script chooses the correct options for it.
Jackie's view is that for anything "non-standard" (beyond just subsetting the data), the user should make an intentional choice.
Negin points out that it's nice to have defaults so that users can at least run the script out of the gate; defaults make the script easy to run for novice users.
Erik will add a test-mod and tests for hillslope, and work with Sean on the restart issue.
Mariana: for EarthWorks, she had a problem with the performance of mapping the urban streams data when using 18000 processors: a buffer overflows when creating route handles. So it would probably help to have a somewhat higher-resolution file; suggestion of a 25 km file.
The only thing the urban streams file does is specify something for air conditioning. There is actually the same value for all points within a given urban region. So in principle we could just have 33-ish values that we use the urban region to index into. It's done as a stream right now to allow it to be time varying. But it seems like we could have a stream that has no spatial dimension, but has one dimension of urban region and another of time.
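The idea of replacing the gridded field with a small region-by-time table can be sketched as a lookup. This sketch assumes each gridcell already carries a 0-based urban region index (CTSM's actual region IDs and variable names may differ) and uses made-up values; it only illustrates dropping the spatial dimension.

```python
from typing import List, Optional, Sequence


def values_for_gridcells(
    table: Sequence[Sequence[float]],
    region_of_gridcell: Sequence[Optional[int]],
    time_index: int,
) -> List[Optional[float]]:
    """Look up table[region][time_index] for every gridcell.

    table has one row per urban region (33-ish in practice) and one column
    per time slice. Gridcells with no urban region (None) get None.
    """
    return [
        None if region is None else table[region][time_index]
        for region in region_of_gridcell
    ]
```

Because the table is only regions by times, its size is independent of the model grid, which is what would sidestep the mapping cost at very high resolution.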
For now, Keith will provide a 25 km dataset; suggestion of using 1/4 degree (same resolution as PFT data)
Dave raises the point that a lot of our input datasets aren't at high enough resolution to support high resolution runs.
Erik was able to create a 7.5 km surface dataset using our existing tools (though there was a problem with the 1km source grid).
But there's also the science problem of having higher resolution source input data.
In the short term, we can have a couple of tests that turn on all diagnostic fields (see https://github.com/ESCOMP/CTSM/issues/29 ). Once that works, consider transitioning to having our default testmod turn on all diagnostic fields (still maintaining some tests that don't use that testmod and so use the out-of-the-box behavior).