[Processing Request]: High Priority CalVal sites Request #55

Open
3 of 13 tasks
gracebato opened this issue Sep 23, 2024 · 19 comments

@gracebato

gracebato commented Sep 23, 2024

Venue

PST

Product

DISP-S1

SAS Version

No response

SDS Version

No response

Input Data

High Priority Frames:

  • F08622
  • F08882
  • F09156
  • F11116
  • F12640
  • F18903
  • F28486
  • F33039
  • F33065
  • F36542
  • F42779

Share Results

  • Google Earth Engine (GEE)
  • NASA Distributed Active Archive Centers (DAAC) User Acceptance Testing (UAT)

Additional Notes

F11116 and F08882 were already processed in: #53

@gracebato

Parameters should be similar to #53, e.g.

Date range: 20160701 - 20240905
k=15
m=5

@philipjyoon philipjyoon self-assigned this Sep 23, 2024
@philipjyoon

philipjyoon commented Sep 23, 2024

@gracebato just to confirm: you'd like these products delivered to ASF UAT, correct? #53 did not request that.

Also, does it make a difference if we process these on our INT venue instead of PST? The difference is that if we process on PST, we keep the products in PST S3 indefinitely, whereas INT S3 is wiped clean on every deployment. So the question is: do these products need to be archived in PST S3 indefinitely, or is delivering to ASF UAT sufficient?

EDIT: After speaking with @LucaCinquini we decided that we will process on the PST venue and deliver to ASF UAT.

@philipjyoon

We want to make sure that all frames process at least 4 years' worth of data, so I'll process 2016-2021, covering 5 years. If any frames still don't have 4 years' worth of data after that, which is possible, I can extend the time range of those specific frames until they do.

This is a bit of manual work but not too bad. I can predetermine which frames would not have 4 years' worth of data in the first 5 calendar years using the historical database; see the sketch below.
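
For reference, a minimal sketch of that pre-check, assuming the historical database can be exported as a JSON file mapping frame IDs to their sensing datetimes (the file name and structure here are hypothetical):

import json
from datetime import datetime, timedelta

# Hypothetical export of the historical database:
# {"8622": ["2016-07-12T04:30:10", ...], "9156": [...], ...}
with open("frame_sensing_dates.json") as f:
    frame_dates = json.load(f)

window_start = datetime(2016, 7, 1)
window_end = datetime(2021, 2, 1)
min_span = timedelta(days=4 * 365)

for frame, dates in sorted(frame_dates.items()):
    in_window = sorted(datetime.fromisoformat(d) for d in dates
                       if window_start <= datetime.fromisoformat(d) < window_end)
    span = in_window[-1] - in_window[0] if in_window else timedelta(0)
    if span < min_span:
        print(f"Frame {frame}: only {span.days} days of data in window; needs an extended end date")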

@gracebato

gracebato commented Sep 24, 2024

Hi @philipjyoon, all DISP-S1 products produced going forward go to UAT, so the request #53 products would also go to UAT. Thanks.

@philipjyoon

This request will be executed in 2 variations with one dependency. @gracebato please correct me if my understanding is incorrect:

  1. Variation 1: Using OPERA PCM version 3.1.0-rc.6.0 (the latest version as of today), process these frames for at least 4 years. Product version is still v0.6.
  2. Dependency: After OPERA PCM version 3.1.0-rc.7.0 is released next week, first run the 3 frames in request [Processing Request]: DISP-S1 QC check for v0.6 products #53 using product version v0.7 for 2016-2024.
  3. Once the above dependency is satisfied, run these frames using the same software and product version for the entire historical period 2016-2024.

@philipjyoon

Will use the following batch_proc for Variation 1:

{
 "enabled": true,
 "label": "PST_Request_55",
 "processing_mode": "historical",
 "include_regions": "",
 "exclude_regions": "",
 "temporal": true,
 "data_start_date": "2016-07-01T00:00:00",
 "data_end_date": "2021-02-01T00:00:00",
 "k": 15,
 "m": 6,
 "frames": [8622, 9156, 12640, 18903, 28486, 33039, 33065, 36542, 42779],
 "wait_between_acq_cycles_mins": 10,
 "job_type": "cslc_query_hist",
 "provider_name": "ASF",
 "job_queue": "opera-job_worker-cslc_data_query_hist",
 "download_job_queue": "opera-job_worker-cslc_data_download_hist",
 "chunk_size": 1
 }
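
Before enabling a batch_proc like the one above, a quick sanity check can catch malformed fields. A sketch only; the required-key list is inferred from the batch_procs in this ticket, not from an official schema:

import json
from datetime import datetime

# Keys inferred from the batch_procs in this ticket; not an official schema
REQUIRED_KEYS = {
    "enabled", "label", "processing_mode", "temporal",
    "data_start_date", "data_end_date", "k", "m", "frames",
    "job_type", "provider_name", "job_queue", "download_job_queue",
}

def validate_batch_proc(path):
    with open(path) as f:
        proc = json.load(f)
    missing = REQUIRED_KEYS - proc.keys()
    assert not missing, f"missing keys: {missing}"
    start = datetime.fromisoformat(proc["data_start_date"])
    end = datetime.fromisoformat(proc["data_end_date"])
    assert start < end, "data_start_date must precede data_end_date"
    assert all(isinstance(fr, int) for fr in proc["frames"]), "frames must be ints"
    print(f"{proc['label']}: OK, {len(proc['frames'])} frames, {start.date()} to {end.date()}")

validate_batch_proc("batch_proc_request_55.json")  # hypothetical file name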

@philipjyoon

~65% complete as of now.

f28486 has finished. f33039 is taking by far the longest; it's currently only 35% complete, processing data from around 2018.

@philipjyoon

80% complete. The rate is about 1% per hour.

@philipjyoon

86% complete. There was a JPL-wide network issue between last night and this morning. It appears to have just been resolved, and we've resumed processing.

frame_completion_percentages    ['33039: 62%', '9156: 82%', '8622: 94%', '28486: 100%', '36542: 90%', '18903: 88%', '33065: 89%', '12640: 99%', '42779: 87%']
last_processed_datetimes        {'33039': '2019-08-07T04:30:10', '9156': '2020-08-01T02:07:32', '8622': '2020-11-04T22:51:11', '28486': '2021-01-21T00:36:31', '36542': '2020-10-07T01:59:19', '18903': '2020-09-26T13:51:31', '33065': '2020-10-12T04:39:58', '12640': '2021-01-04T23:28:50', '42779': '2020-09-26T16:13:06'}
progress_percentage             86%

@philipjyoon

Logging into the SCIFLO verdi machines and manually stopping the CloudWatch agent service, which uses up one whole CPU core. This frees up that core for the actual DISP-S1 processing. The next OPERA PCM release has a fix for this CloudWatch agent inefficiency.

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a stop
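
Rather than logging into each verdi machine by hand, the same stop command could be pushed to the whole fleet via AWS SSM. A sketch, assuming the instances run the SSM agent and carry a tag that identifies the SCIFLO workers (the tag key/value here are hypothetical):

import boto3

ec2 = boto3.client("ec2")
ssm = boto3.client("ssm")

# Hypothetical tag identifying the SCIFLO verdi workers
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:Venue", "Values": ["opera-pst"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]
instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]

# Stop the CloudWatch agent on every worker (send_command caps InstanceIds at 50)
ssm.send_command(
    InstanceIds=instance_ids,
    DocumentName="AWS-RunShellScript",
    Parameters={"commands": [
        "sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a stop"
    ]},
)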

@philipjyoon

Processing is complete. Here is the listing of all products:

request_55_product_paths.txt

@philipjyoon

philipjyoon commented Oct 8, 2024

Due to an operator error while working around a database file issue (which is now fixed going forward), we need to reprocess the last 4 runs of frame 8622, starting with the query job that generated the following batch IDs:
f8622_a1032 f8622_a1020 f8622_a996 f8622_a984 f8622_a972 f8622_a960 f8622_a948 f8622_a936 f8622_a924 f8622_a912 f8622_a900 f8622_a888 f8622_a864 f8622_a852 f8622_a840

To do this, we will need to perform the following actions:

  1. Delete all compressed CSLC records from the GRQ ES index grq_1_l2_cslc_s1_compressed that were generated by the SCIFLO runs we are going to re-run (see the sketch after this list).
  2. Create a DISP-S1 historical processing batch_proc with frame 8622. We also need to add the field "frame_states": {"8622": 60}; data_start_date can remain the original start date in 2016. This way, processing will cover sensing dates 61 to 75 in the frame 8622 series.
  3. Purge all query, download, sciflo, cnm-s, and cnm-r jobs that correspond to those runs so that the re-runs don't get deduped.
  4. Ask ASF to delete the old files.
  5. Start the batch_proc and monitor it... make sure the first product reference date is correct.
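
A minimal sketch of step 1, assuming GRQ Elasticsearch is reachable and that the batch ID is stored under a metadata field (the endpoint and field name are assumptions; verify them against the actual index mapping before deleting anything):

from elasticsearch import Elasticsearch  # elasticsearch-py 7.x style

BATCH_IDS = [
    "f8622_a1032", "f8622_a1020", "f8622_a996", "f8622_a984", "f8622_a972",
    "f8622_a960", "f8622_a948", "f8622_a936", "f8622_a924", "f8622_a912",
    "f8622_a900", "f8622_a888", "f8622_a864", "f8622_a852", "f8622_a840",
]

es = Elasticsearch("http://grq-es.internal:9200")  # hypothetical endpoint
query = {"query": {"terms": {"metadata.batch_id": BATCH_IDS}}}  # field name is an assumption

# Dry run first: count the compressed CSLC records that would be deleted
count = es.count(index="grq_1_l2_cslc_s1_compressed", body=query)["count"]
print(f"{count} records match")

resp = es.delete_by_query(index="grq_1_l2_cslc_s1_compressed", body=query)
print(f"deleted {resp['deleted']} records")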

@philipjyoon

philipjyoon commented Oct 8, 2024

{
 "enabled": true,
 "label": "PST_Request_55_partial_8622",
 "processing_mode": "historical",
 "include_regions": "",
 "exclude_regions": "",
 "temporal": true,
 "data_start_date": "2016-07-01T00:00:00",
 "data_end_date": "2021-02-01T00:00:00",
 "k": 15,
 "m": 6,
 "frames": [8622],
 "frame_states": {"8622": 60},
 "wait_between_acq_cycles_mins": 5,
 "job_type": "cslc_query_hist",
 "provider_name": "ASF",
 "job_queue": "opera-job_worker-cslc_data_query_hist",
 "download_job_queue": "opera-job_worker-cslc_data_download_hist",
 "chunk_size": 1
 }
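
The frame_states arithmetic from step 2, spelled out: with k=15, a state of 60 marks sensing cycles 1-60 of frame 8622 as already processed, so the next ministack covers cycles 61 through 75:

k = 15
frame_state = 60  # sensing cycles of frame 8622 already processed

next_ministack = list(range(frame_state + 1, frame_state + k + 1))
print(next_ministack)  # [61, 62, ..., 75]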

@philipjyoon

Reprocessing of the last 4 runs of frame 8622 has started.

@philipjyoon

Had to stop and restart because I hadn't deleted the compressed CSLCs from those 4 incorrect runs.

We can use Tosca to delete unwanted compressed CSLC records. In this case we want to delete all C-CSLC products that have the reference date 20181103T000000Z:

  1. Create a filter in Tosca matching that reference date.
    (screenshot: Tosca filter)
  2. Use that filter to run an on-demand "Purge datasets" job.
    (screenshot: on-demand Purge datasets job)
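
The query behind that Tosca filter amounts to a term match on the reference date. A dry-run sketch of the same check, with the metadata field name assumed rather than taken from the actual index mapping:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://grq-es.internal:9200")  # hypothetical endpoint

# Field name is an assumption; confirm it in the grq_1_l2_cslc_s1_compressed mapping
query = {"query": {"term": {"metadata.reference_datetime": "20181103T000000Z"}}}
count = es.count(index="grq_1_l2_cslc_s1_compressed", body=query)["count"]
print(f"{count} C-CSLC records match the 20181103T000000Z reference date")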

@philipjyoon

Reprocessing of the last 4 runs of frame 8622 was successful. Below is the corrected listing of all products from this run.

request_55_product_paths_fixed_8622.txt

@arothjpl

Processing started on 10-28. Still processing as of 10-30.

@philipjyoon

philipjyoon commented Nov 14, 2024

Used the following batch_proc to run this request again using the latest PCM release, 3.1.0 final. Product version is v0.8:

{
 "enabled": true,
 "label": "PST_Request_55",
 "processing_mode": "historical",
 "include_regions": "",
 "exclude_regions": "",
 "temporal": true,
 "data_start_date": "2016-07-01T00:00:00",
 "data_end_date": "2024-09-05T00:00:00",
 "k": 15,
 "m": 6,
 "frames": [8622, 9156, 12640, 18903, 28486, 33039, 33065, 36542, 42779],
 "wait_between_acq_cycles_mins": 10,
 "job_type": "cslc_query_hist",
 "provider_name": "ASF",
 "job_queue": "opera-job_worker-cslc_data_query_hist",
 "download_job_queue": "opera-job_worker-cslc_data_download_hist",
 "chunk_size": 1
 }

@philipjyoon

Running this ticket again using product version v0.9 and PCM version 3.1.1. We are running several other requests concurrently; the following 3 frames are unique to this request: 18903, 28486, 33065.
