Commit: Merge branch 'subsetter_argo' into develop_v2
Showing 24 changed files with 355 additions and 452 deletions.
@@ -0,0 +1,28 @@

# Subsetter FastAPI
A Python FastAPI application that submits the subsetter workflow templates to an Argo instance provisioned with the subsetter workflow templates at `./argo/`.
The Dockerfile declares a Python base image, installs the dependencies declared in `requirements.txt` and `requirements-dev.txt`, and starts the FastAPI application on port 8000.
`subsetter/main.py` is the entrypoint to the FastAPI application and configures the routers. The file also contains a startup event hook that initializes the MongoDB database with [beanie ODM](https://beanie-odm.dev/). The startup event hook also sets up a MinIO client for the [CUAHSI MinIO instance](https://console.minio.cuahsi.io); the client is used for synchronizing user-specific access policies and keys/secrets.
API documentation is rendered at https://subsetter-api-jbzfw6l52q-uc.a.run.app/redoc (this will be updated to https://api.subsetter.cuahsi.io/redocs pending certificate creation). The OpenAPI spec is generated from the code defining the API endpoints (FastAPI) and the input/output models (Pydantic).
User authentication is achieved by configuring the [fastapi_users](https://github.com/fastapi-users/fastapi-users) module with [CUAHSI SSO](https://auth.cuahsi.org/) using the `OpenID Connect` protocol. On registration, an S3 bucket is created for the user on [CUAHSI MinIO](https://console.minio.cuahsi.io) (TODO: create a default quota of 5 GB). An admin may increase the quota on a case-by-case basis.
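Per-user bucket creation has to respect S3 bucket naming rules (3-63 characters; lowercase letters, digits, and hyphens; starting and ending with a letter or digit). A minimal sketch of deriving a bucket name from a username — the naming scheme and helper name here are assumptions for illustration, not the API's actual behavior:

```python
import re


def bucket_name_for(username: str) -> str:
    """Derive an S3-compatible bucket name from a username (hypothetical scheme)."""
    name = username.lower()
    name = re.sub(r"[^a-z0-9-]+", "-", name)  # replace characters S3 disallows
    name = name.strip("-")[:63]               # trim to the 63-character limit
    if len(name) < 3:
        name = (name + "usr")[:63]            # pad very short names to the minimum
    return name


print(bucket_name_for("Jane_Doe@example.org"))  # jane-doe-example-org
```

A scheme like this keeps bucket creation deterministic per user, so the registration hook can be retried safely.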
The Subsetter API is divided into four routers defined in `subsetter/app/routers/`.
## Routers
### Access Control Router
The `access_control` router contains prototyped synchronization of view/edit access for paths on the CUAHSI MinIO instance that are referenced by a HydroShare resource. In the [mongo_discovery-access-control](https://github.com/hydroshare/hydroshare/compare/develop...mongo-discovery-access-control) HydroShare branch, event hooks export Resource and User access to a Mongo database. This database is queried to look up the resources for which a user has view/edit privileges, and the matching view/edit policies are assigned to the user on CUAHSI MinIO storage. This means a path in a user's bucket may be registered on HydroShare and enjoy the same access control capabilities as a HydroShare Composite Resource.
### Argo Router
Contains the API endpoints for submitting a subsetter workflow, tracking submissions, and generating a presigned download URL for the resulting datasets.
### Discovery Router
A copy of the IGUIDE discovery router that includes endpoints for searching resource metadata. The subsetter workflows run the HydroShare metadata extraction tool to extract the same metadata that a HydroShare Composite Resource extracts from recognized file formats. The resulting metadata can then be written to the Discovery database on Atlas. TODO: collect the metadata extracted from subsetter outputs into a discovery database.
### Storage Router
Contains the endpoints to generate presigned URLs for PUT and GET of objects on S3. This is not currently used, but could be used to create a resource landing page for resources stored on S3, equivalent to a resource on HydroShare.
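The presigned URLs above are produced by MinIO's real S3 signing; the underlying idea — an expiry and a signature over the request that the server can verify without further authentication — can be sketched with the standard library. This is a toy illustration, not the S3 SigV4 algorithm, and all names in it are hypothetical:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"server-side-secret"  # stands in for the S3 secret key


def presign(method: str, path: str, expires_in: int = 3600) -> str:
    """Build a toy presigned URL: signature = HMAC(secret, method|path|expiry)."""
    expiry = int(time.time()) + expires_in
    msg = f"{method}|{path}|{expiry}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expiry, 'signature': sig})}"


def verify(method: str, url: str) -> bool:
    """Server side: recompute the signature and check the expiry."""
    path, query = url.split("?", 1)
    params = dict(p.split("=") for p in query.split("&"))
    expiry = int(params["expires"])
    if time.time() > expiry:
        return False
    msg = f"{method}|{path}|{expiry}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, params["signature"])


url = presign("GET", "/user-bucket/dataset.nc")
print(verify("GET", url))  # True
```

Because the signature covers the HTTP method and path, a URL presigned for GET cannot be replayed as a PUT, which is the property the storage endpoints rely on.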
```diff
@@ -1 +1 @@
-from subsetter.app.routers.argo.router import router
+from .router import router
```
```diff
@@ -0,0 +1 @@
+from .router import router
```
```diff
@@ -0,0 +1 @@
+from .router import router
```
@@ -0,0 +1,50 @@

```python
import json
import tempfile
from typing import Any, Union

import google.cloud.logging as logging
from fastapi import APIRouter, Depends
from pydantic import BaseModel

from subsetter.app.db import User
from subsetter.app.users import current_active_user
from subsetter.config import get_minio_client, get_settings

if get_settings().cloud_run:
    logging_client = logging.Client()
    logging_client.setup_logging()

router = APIRouter()


class HydroShareMetadata(BaseModel):
    title: str
    description: str


class DatasetMetadataRequestModel(BaseModel):
    file_path: str
    # bucket_name: str
    metadata: Union[HydroShareMetadata, Any]


@router.post('/dataset/metadata')
async def create_metadata(metadata_request: DatasetMetadataRequestModel, user: User = Depends(current_active_user)):
    with tempfile.NamedTemporaryFile(delete=False) as fp:
        metadata_json_str = json.dumps(metadata_request.metadata)
        print(metadata_json_str)
        fp.write(str.encode(metadata_json_str))
        fp.close()
        get_minio_client().fput_object(user.bucket_name, metadata_request.file_path, fp.name)


@router.put('/dataset/metadata')
async def update_metadata(metadata_request: DatasetMetadataRequestModel, user: User = Depends(current_active_user)):
    get_minio_client().remove_object(user.bucket_name, metadata_request.file_path)
    return await create_metadata(metadata_request, user)


class DatasetExtractRequestModel(BaseModel):
    file_path: str = None
    # bucket_name: str
    metadata: Union[HydroShareMetadata, Any] = None
```
```diff
@@ -1 +1 @@
-from subsetter.app.routers.storage.router import router
+from .router import router
```
@@ -1,4 +1,42 @@

# Argo Workflows for Subsetting
Argo Workflows are containerized DAG workflows, declared in YAML, that run on Kubernetes. Each node in the workflow is a Docker container with S3 storage.
*An example graph of the parflow subsetter workflow*
![Parflow subsetter workflow graph](.images/workflow-graph.png)
The `templates/` directory contains workflows that declare composable [templates](https://argo-workflows.readthedocs.io/en/latest/workflow-templates/). They are referenced by the workflows in the `workflows/` directory with a [templateRef](https://argo-workflows.readthedocs.io/en/latest/workflow-templates/#referencing-other-workflowtemplates).
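For example, a workflow step can reference a template inside a `WorkflowTemplate` resource by name; the resource and template names below are hypothetical, not the ones in `templates/`:

```yaml
steps:
  - - name: subset
      templateRef:
        name: parflow-subset-templates   # WorkflowTemplate resource (hypothetical name)
        template: subset-domain          # template declared within it (hypothetical name)
```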
Artifact storage is S3 via MinIO. We use artifacts to store input and output files for our workflows. Each user is given a bucket (TODO: configurable version control and quotas) for storing the output data of their workflows. The output of one workflow may be used as input to subsequent workflow runs.
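Chaining works by pointing a later workflow's input artifact at an earlier run's output key. A sketch, with hypothetical parameter names and the credential secret omitted for brevity:

```yaml
inputs:
  artifacts:
    - name: prior-subset
      path: /input          # where the earlier run's output is mounted in this container
      s3:
        endpoint: api.minio.cuahsi.io
        bucket: '{{workflow.parameters.user-bucket}}'
        key: 'argo_workflows/parflow/{{workflow.parameters.prior-run-id}}'
```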
The three supported subsetter workflows (nwm1, nwm2, parflow) write their results to S3 storage in their own bucket at `/argo_workflows/{workflow_template}/{GUID}`. A GUID is generated for each run of a subsetter workflow and is used as the workflow run name. An example parflow subsetter output viewed in the MinIO browser is shown below.
![Example user bucket with parflow subsetter output](.images/minio-view-subsetter-ouput.png)
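The per-run GUID and output key can be derived at submission time. A minimal sketch — the path layout is taken from above, while the helper name is illustrative:

```python
import uuid


def output_key(workflow_template: str) -> tuple[str, str]:
    """Generate a per-run GUID and the bucket key where its results land."""
    run_id = str(uuid.uuid4())  # also used as the workflow run name
    return run_id, f"argo_workflows/{workflow_template}/{run_id}"


run_id, key = output_key("parflow")
print(key)  # e.g. argo_workflows/parflow/<GUID>
```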
The subsetter input datasets are stored on the [CUAHSI MinIO instance](https://console.minio.cuahsi.io) in a bucket with public read access. The workflows use these datasets as input artifacts within a subsetter workflow. A workflow conveniently maps an artifact to a path within a container, which can then be used as an input or output location by a program running in the container.
*Example output declaration with configurable output locations. [ArtifactRepositoryRef](https://argo-workflows.readthedocs.io/en/latest/artifact-repository-ref/) could be used to simplify artifact use.*

```yaml
outputs:
  artifacts:
    - name: subsetter-result
      path: /output
      s3:
        endpoint: api.minio.cuahsi.io
        bucket: '{{inputs.parameters.output-bucket}}'
        accessKeySecret:
          name: minio-credentials
          key: accessKey
        secretKeySecret:
          name: minio-credentials
          key: secretKey
        key: '{{inputs.parameters.output-path}}'
```
# `minio-credentials` access key/secret setup
1. Create an access key/secret in the MinIO UI at
2. Save the key/secret as a secret in Kubernetes in the `workflows` namespace:
   `kubectl create secret generic minio-credentials --namespace workflows --from-literal=accessKey='<key>' --from-literal=secretKey='<secret>'`
These workflows should eventually be set up to automatically sync to https://workflows.argo.cuahsi.io