-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mount public wpsoutputs data #360
Merged
Merged
Changes from all commits
Commits
Show all changes
19 commits
Select commit
Hold shift + click to select a range
e57d421
mount public wpsoutputs data with cowbird and jupyterlab instances
cwcummings c040992
bump cowbird to 2.0.0 and mongo to 5.0 to support the new cowbird ver…
cwcummings 438b3de
isolate mongodb 5.0 required for cowbird
cwcummings d5f9ded
update changelog
cwcummings fce59da
review changelog
cwcummings 5ed155c
remove unused wpsoutputs mount
cwcummings 53c5cb8
Merge branch 'master' of github.com:bird-house/birdhouse-deploy into …
cwcummings d800550
updates after pr feedback
cwcummings 3bc3133
add secure-data-proxy name variable to cowbird
cwcummings 2520f80
add new optional cowbird config parameters as comments
cwcummings 71fc669
Merge branch 'master' of github.com:bird-house/birdhouse-deploy into …
cwcummings 5673593
review wps outputs variables
cwcummings 7afd194
bump cowbird version and move changes in changelog
cwcummings c8c8728
small adjust cowbird vars
cwcummings 1c66ed4
use 'build.os' instead of deprecated 'build.image' in readthedocs config
cwcummings cadd9f4
try newer python version for readthedocs
cwcummings 90611a7
Merge branch 'master' of github.com:bird-house/birdhouse-deploy into …
cwcummings 7119db0
Merge branch 'master' of github.com:bird-house/birdhouse-deploy into …
cwcummings 5ffcb0f
Bump version: 1.32.0 → 1.33.0
cwcummings File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
1.32.0 2023-09-22T16:09:24Z | ||
1.33.0 2023-09-25T13:02:37Z |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woo, this has write-access to the entire
$DATA_PERSIST_ROOT
!!! Isn't this too much access?I am guessing it only needs access to
JUPYTERHUB_USER_DATA_DIR="$DATA_PERSIST_ROOT/jupyterhub_user_data"
and Thredds wps_output dir?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChaamC correct me if I'm wrong... but the reason for this is that both of those directories need to be mounted in the same volume for hardlinks to work:
@tlvu In #356 we introduce the
DATA_PERSIST_SHARED_ROOT
variable here instead. This gives us the ability to mount only the required subdirectories as as single volume.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I am a bit behind in most things cowbird. Understood that for hard links to work, the src and dest of the link should be on the same filesystem.
So if we need to expose Thredds wps_outputs and possibly Geoserver workspace in the Jupyter env workspace, then we need to only volume-mount those 3, in the cowbird container, not the entire
/data
.If
DATA_PERSIST_SHARED_ROOT
is the same asDATA_PERSIST_ROOT
by default, then it comes back to the same as exposing the entire/data
to cowbird writable.Basically would something like the following works:
So the paths inside and outside of the cowbird container is exactly the same, giving it the impression it has access to the entire
/data
but it's not true.By the way, I am not sure
${DATA_PERSIST_SHARED_ROOT}/datasets/wps_outputs
actually works since I think the actual wps_outputs is in a data-volume https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/config/wps_outputs-volume/docker-compose-extra.ymlUsage of that wps_outputs data-volume:
Anyways, many inter-connected pieces so not simple to wrap my head around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it would. But... I would like to see
DATA_PERSIST_SHARED_ROOT
not be the same asDATA_PERSIST_ROOT
. The only reason it is like that by default is to maintain backwards compatibility.No, unfortunately that wouldn't work. They actually need to be mounted at the same mount-point. The relative location matters for symlinks, but a shared mount-point matters for hard-links.
I don't have enough knowledge to comment on this. Sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. If we want to share the wps_outputs data to the user, we need to use hardlinks, and cowbird requires the src and destination of the hardlinks to be in the same volume/file partition, or it will trigger a
Cross-device link
.I needed to mount the full data directory to be able to make hardlinks between files from the
/data/wps_outputs
and/data/user_workspaces
directories.We can update the volume mount to use the upcoming new variable
DATA_PERSIST_SHARED_ROOT
as long as all files used by cowbird for this PR's feature are contained in this directory.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to double check, for existing deployments, if we want
DATA_PERSIST_SHARED_ROOT=/data_shared_mount
different thanDATA_PERSIST_ROOT=/data
, we have to moveI think this migration should be documented because anyone starting with the current default values and do not want Cowbird to have access to the entire
/data
will have to perform this manual migration.Or we set a different value immediately in
default.env
and inenv.local.example
we document the backward-compatible valueDATA_PERSIST_SHARED_ROOT=/data
and warn this gives Cowbird too much write-access. I prefer different default value now to avoid migration later. Cowbird is not in the default enable list right now so this should not break too many existing deployments.Also, I still do not understand why we are referring to
/data/wps_outputs
when currently wps_outputs is a data-volume as mentioned in my comment #360 (comment) above.This means different volume-mount between
wps_outputs
dir anduser_workpaces
dir and hardlink probably do not work.Does Cowbird fallback to a regular copy when hardlink do not work? Does it log somewhere when hardlink do not work?
Is there a notebook or test script that tests Cowbird in the PAVICS stack I can try?
Also question for the future as I heard maybe Cowbird will manage Geoserver workspace too. Does this means we will need to migrate Geoserver data dir under
DATA_PERSIST_SHARED_ROOT
later?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's it, at least concerning this PR's changes.
Although, I'm thinking about something else to watch out for. Not sure if this variable's path could be affected, but changing the
JUPYTERHUB_USER_DATA_DIR
variable's path could break the symlink that is used in the cowbird workspace to link to theJUPYTERHUB_USER_DATA_DIR
directory. So, maybe something to watch out for. The user workspaces could eventually be resynchable if we decide to change the jupyterhub path, but the resync is not yet implemented in cowbird for this symlink, so for now it would need a manual recreation of symlinks in each user workspace if that variable is changed.If we know we will eventually do the migration, I don't see any reason not to do it now. I think it would be okay to change it now if you prefer.
I am not sure to understand the problem here. Do you mean the fact that the birds who will produce wps_outputs data only mount the wps_outputs volume (and not the user workspace volume)? Any bird that produces wps_outputs only needs access to the wps_outputs volume. Only cowbird requires the full access to the data directory to allow the creation of hardlinks between the wps_outputs and user workspace directories. After that, when the hardlink is created, the volume mount does not require the full data mount to have the hardlink accessible. It is really only for the moment of creation, in cowbird that we require the full mount of the data directory.
If the hardlink fails, the failure is only logged in Cowbird's logs. We want to avoid having 2 independant copies that could diverge. We did implement the resync function for the wps_outputs data, so if we need, we can call that API endpoint on cowbird, and cowbird will regenerate any missing hardlinks that could have resulted from a previous failure.
Nope, not at the moment, although I am checking to create a notebook to verify the different parts of the user workspace (notebook symlink, wps_outputs hardlinks, geoserver data, etc.), but I am not sure yet if it is achievable.
Cowbird currently already manages geoserver's file permissions on the user workspace. Since the data is only found in the user workspace directly, it doesn't require any symlink/hardlinks, as data is already accessible to the user,it shouldn't be a problem.
Unless there is external geoserver data to be made accessible to the user that I am not aware of yet?