Cloud Awareness for PySPEDAS #1061

Open
wants to merge 13 commits into
base: master

edmondb
Collaborator

@edmondb edmondb commented Nov 20, 2024

This is the completed PR that brings Cloud Awareness to PySPEDAS using the fsspec filesystem protocol (includes AWS and GCS support). The proposed changes do not break current API usage.

Functionality/Summary:

  • To invoke Cloud Awareness, one of the SPEDAS_DATA_DIR, local_data_dir, or remote_data_dir environment variables must be changed.
  • Retrieve files from http/https servers and place them on URI-path data storage (e.g., pull from a host and put onto AWS).
  • Read/stream from URI storage (the preferred method of use) when given a URI for the remote or local path.
  • Update one URI-path storage from a separate URI-path storage if the source file's modification time is newer.
  • Allow a user to force download from a remote URI (NOT RECOMMENDED) by using the force_download option.
  • Attempt to read/stream from local storage (POSIX or URI-based) if remote fails.
  • Mock cloud storage unit testing using moto; the moto dependency is not included in the requirements.txt file.
  • Documentation updates for PySPEDAS
  • Note: MAVEN STS files have issues under Cloud Awareness because PyTplot is not cloud aware yet. The approach taken is to report the error and skip the file that could not be read; see maven_load.py:543.
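
The decision logic behind three of the bullets above (URI versus POSIX paths, updating only when the source is newer, and falling back to local storage) can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: the names `is_uri`, `needs_update`, and `read_with_fallback` are hypothetical, and the actual file access in PySPEDAS goes through fsspec rather than the plain callables assumed here.

```python
from typing import Callable, Optional
from urllib.parse import urlparse


def is_uri(path: str) -> bool:
    """True for 's3://bucket/key' style paths, False for POSIX or Windows paths."""
    scheme = urlparse(path).scheme
    # A one-letter scheme is a Windows drive letter (C:\data), not a URI.
    return len(scheme) > 1


def needs_update(source_mtime: float, target_mtime: Optional[float]) -> bool:
    """Copy when the target is missing or the source is strictly newer."""
    return target_mtime is None or source_mtime > target_mtime


def read_with_fallback(read_remote: Callable[[], bytes],
                       read_local: Callable[[], bytes]) -> bytes:
    """Try the remote reader first; fall back to local storage on I/O errors."""
    try:
        return read_remote()
    except OSError:
        # Hypothetical policy: any OSError from the remote triggers the fallback.
        return read_local()
```

For example, `is_uri("s3://bucket/data/file.cdf")` is true while `is_uri("/home/user/pyspedas_data/file.cdf")` is false, so a loader could route the first to a cloud filesystem and the second to ordinary local I/O.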

Dependencies:

  • fsspec
  • s3fs (for AWS)
  • aioboto3 (necessary due to cdflib's boto3 cloud implementation)
  • cdflib >= 1.0.0 (contains cloud storage reading functionality)
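
A quick way to see which of these packages are importable in a given environment is sketched below. The helper name `missing_cloud_dependencies` is hypothetical and not part of PySPEDAS; the check only tests importability and does not verify the cdflib >= 1.0.0 version requirement.

```python
from importlib.util import find_spec

# Import names taken from the dependency list above.
CLOUD_DEPENDENCIES = ("fsspec", "s3fs", "aioboto3", "cdflib")


def missing_cloud_dependencies(names=CLOUD_DEPENDENCIES):
    """Return the names that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_cloud_dependencies()
    if missing:
        print("Missing cloud dependencies:", ", ".join(missing))
    else:
        print("All cloud dependencies are importable.")
```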

This code was tested using the AWS CLI on an EC2 resource provided by the HelioCloud project. Details for setup and temporary credential management are not included.

Finally, a separate contribution of a Jupyter notebook will be submitted to the pyspedas_examples repository for use with public AWS storage of mission data (e.g., from CDAWEB).

closes #416

@edmondb edmondb linked an issue Nov 20, 2024 that may be closed by this pull request
Collaborator Author

edmondb commented Dec 2, 2024

@jameswilburlewis The checks were not successful due to the MAVEN server being hit with too many requests. Let us know if there's anything you'd like us to do on our end regarding this PR.

@edmondb edmondb requested a review from nickssl December 5, 2024 20:38
Collaborator Author

edmondb commented Dec 5, 2024

This PR was clean but will now need conflict resolution due to a recently merged PR from @nickssl.

b568d2d

Collaborator Author

edmondb commented Dec 23, 2024

Resolved the conflict with the merged commit and reinstated Cloud Awareness in download.py. This should be ready to be incorporated now, @jameswilburlewis.

Development

Successfully merging this pull request may close these issues.

Add support for loading data from Amazon S3
4 participants