Document how to employ a subset of components #357
Some things we've discussed in the past:
My initial thought then is that the following should be part of the minimal components:
Then we also have the question of cowbird: even though it is not strictly necessary in all cases, it might be worth adding it to the list just to keep the node set-up consistent. I'm not sure about this though. |
From my limited understanding of what Cowbird does, I think it allows Jupyter users to share notebooks with other Jupyter users via dynamically created symlinks on the recipient's side. Therefore, it is not such a "generic" component. I might be wrong, please correct me on this. Back to the primary point about documenting the minimal set, I would just put them in `DEFAULT_CONF_DIRS`. Should we finalize the move of everything under `birdhouse/config` to `birdhouse/components`? |
Cowbird syncs permissions of corresponding resources between services. Considering that many services and files were assuming public access before, Cowbird didn't accomplish much more, hence why it must have felt not that important so far. However, as soon as you toggle that public switch, nothing is accessible properly if Cowbird is not involved. |
Sounds good. |
Another item discussed in today's executive committee meeting was to consider `weaver` as part of the minimal components. The reasoning behind this is mostly that it is the only current service in birdhouse-deploy that can perform federated operations, namely, dispatching processing steps to various DACCS nodes in a network according to the specified data-source URLs. This goes well with STAC which, once #297 is completed, can offer a federated catalogue to search for data over the DACCS network, provided that the STAC populator behind it gradually syncs available metadata between the nodes. It is expected in the long run that any WPS output produced by Weaver (and therefore all other WPS birds of each node, since it can wrap their processing monitoring) would be inserted into the STAC catalogue. |
If we say that weaver is required, then we're essentially saying that you can't have a data-only node. I am in favour of saying that weaver is required if there are any WPS services, so that weaver can wrap their processes. |
Maybe a better way of thinking about this is in terms of component dependencies. Here is a sketch of what I imagine we should do with the stack:

Minimal components required in every deployment:
Cowbird is required if:
If any WPS or weaver components are enabled, then thredds is required (to serve wps outputs). If any WPS components are enabled, then weaver is required to wrap their services. |
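To make these rules concrete, here is a hedged sketch (hypothetical; nothing like this exists in birdhouse-deploy, and the component paths are illustrative) of how the "WPS implies weaver" rule could be checked against the enabled conf dirs:

```shell
#!/bin/sh
# Hypothetical dependency check over the enabled configuration directories.
ALL_CONF_DIRS="${DEFAULT_CONF_DIRS} ${EXTRA_CONF_DIRS}"
has() { case " ${ALL_CONF_DIRS} " in *"$1"*) return 0 ;; *) return 1 ;; esac; }

# "If any WPS components are enabled, then weaver is required to wrap their services."
for wps in ./config/finch ./config/raven ./config/hummingbird; do
  if has "$wps" && ! has ./components/weaver; then
    echo "WARNING: $wps is enabled without weaver to wrap its processes" >&2
  fi
done
```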
Jupyter by itself is sufficient to have cowbird active.
This is not true. The WPS outputs are accessible by themselves on `/wpsoutputs`. |
Ok so to update my understanding of component dependencies:
Am I missing anything? |
Cowbird could be needed for WPS outputs if WPS services are enabled as well. |
@fmigneault The STAC one is interesting... what still needs to be implemented to integrate STAC with cowbird? |
Some of the items published in STAC could be NetCDF files (or others) accessible through THREDDS. |
cowbird required if jupyterhub enabled: Agreed.

cowbird required if thredds and geoserver are enabled AND one of the WPS birds is enabled: If no WPS bird is enabled, nothing will write to the WPS outputs location.

Basically, a data-only node does not need cowbird, unless I still do not fully understand what cowbird does. |
But it sounds like, if we have a data-only node that is serving data through thredds, we still need cowbird. |
Cowbird also needed to sync permissions between GeoServer and THREDDS, even without any WPS. |
Oh I missed this. Can you remind me what GeoServer is trying to access on Thredds and vice-versa? If only one of GeoServer or Thredds is enabled, then there is no need for Cowbird, right? |
Could we say the same for GeoServer? I don't think anything depends on GeoServer. |
Some shapefiles/layers that could be shared within user-workspaces.
Correct, unless some other service needs synchronization, such as in #360 for WPS outputs, or some other use cases we could come up with (e.g. an MLflow instance in the works with JupyterHub that could need some user-workspace file sharing as well).
I believe this is the case. |
I think we're going in circles here. Let me talk about this a different way:
So in every configuration of the node, cowbird is required. If you can think of a configuration where cowbird is not required, please let us know |
Yes. However, I believe this is an edge case, and it is safe to assume that STAC would also refer to local data provided by another service of the same instance. Another situation that could make Cowbird unnecessary is if the instance is configured to be fully open with public access and user workspaces are not used (e.g. a data-only node without JupyterHub). Cowbird would only create redundant permissions for users that already have public access. Given all that, having Cowbird running in the background shouldn't pose any issue even if those use cases are encountered. |
😆
Yeah I agree |
Ok so the proposed minimal subset of components required are now:
|
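For reference, the set that the eventual PR (#399) settled on is proxy, magpie, twitcher, stac, and cowbird; as a sketch, using the directory layout shown later in this thread:

```shell
# Sketch of the decided minimal/default set (paths as used before the
# config -> components move discussed below):
export DEFAULT_CONF_DIRS='
./config/proxy
./config/magpie
./config/twitcher
./components/stac
./components/cowbird
'
```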
Agreed |
So based on the discussion above, I believe we've decided on the following action items:
I feel like this should all be done in one PR (or all in the same version update) since this will require all current deployments to make some manual changes to their env.local files |
I'm not quite sure how to handle this one. |
I think that this is necessarily going to be a breaking change and will require current deployments to update their `env.local` files. If we want to ease the transition, we can create a migration script that can be run to update `env.local` files to automatically update the relevant variables. We can even configure it to run as part of |
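To illustrate the idea (this script was never implemented; the variable names, paths, and component list below are assumptions), such a migration could look roughly like:

```shell
#!/bin/sh
# Hypothetical migration sketch: for each component that the old DEFAULT_CONF_DIRS
# enabled but the new one drops, append it to EXTRA_CONF_DIRS in an existing
# env.local so the deployment keeps running the same services.
ENV_LOCAL="${1:-./env.local}"
REMOVED_DEFAULTS="./config/canarie-api ./config/geoserver ./config/finch
./config/raven ./config/hummingbird ./config/thredds ./config/portainer
./config/jupyterhub"

for dir in $REMOVED_DEFAULTS; do
  if ! grep -q "$dir" "$ENV_LOCAL"; then
    printf 'EXTRA_CONF_DIRS="${EXTRA_CONF_DIRS} %s"\n' "$dir" >> "$ENV_LOCAL"
  fi
done
```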
But then the migration script will be enabled by default? So by default, all the currently enabled components will still be enabled even if they are not in `DEFAULT_CONF_DIRS` anymore.

If each of us, for each existing deployment, has to manually edit each `env.local` anyway, given the dir list in `DEFAULT_CONF_DIRS`, we might as well do it up front.

Basically, instead of editing each existing `env.local` after the rename, we prepare them beforehand. That "rename PR" has to wait for each org to approve, saying "I have prepared all my `env.local` files". |
Sure, if you think that it's easier to coordinate all of the existing deployments, that works too. |
I think it's just simpler: no migration script to write, and the same effort of searching and editing all the existing `env.local` files. Otherwise, if the migration script is activated by default, it means the same components will still be deployed by default, which defeats the purpose of moving them out of `DEFAULT_CONF_DIRS`. To ease further the editing of the various `env.local` files, see the sketch below. |
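For instance (a sketch only; the host names and install path are placeholders), reviewing what each deployment currently sets could be scripted before preparing the edits:

```shell
#!/bin/sh
# Hypothetical helper: print the EXTRA_CONF_DIRS block of each deployment's
# env.local, to plan which lines must be copied over from DEFAULT_CONF_DIRS.
for host in node1.example.org node2.example.org; do
  echo "== $host =="
  ssh "$host" "grep -n -A 15 EXTRA_CONF_DIRS /opt/birdhouse-deploy/birdhouse/env.local"
done
```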
I disagree. This repository is not exclusively for DACCS/Marble nodes. I don't think there is any technical issue in this case that forces us to cause major/breaking changes.
I don't like this idea. The user should be in control of what they enable. This can cause undesired side-effects.
Expanding on that, I think this is the key. We only need to make sure that `EXTRA_CONF_DIRS` falls back to `DEFAULT_CONF_DIRS` when it is not overridden. What we need to watch for is the order defined here:

birdhouse-deploy/birdhouse/read-configs.include.sh
Lines 252 to 254 in 5c06b4b
We must make sure not to resolve `export EXTRA_CONF_DIRS="${EXTRA_CONF_DIRS:-${DEFAULT_CONF_DIRS}}"` before `env.local` has had the chance to be evaluated. Therefore, this definition cannot be directly in `default.env`.
If evaluated in the right order, existing instances that already override `EXTRA_CONF_DIRS` would keep working as-is. |
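A sketch of that ordering constraint (the file layout here is hypothetical; only the `export` line is quoted from the comment above):

```shell
# Read order required for the fallback to be safe:
. ./default.env    # defines DEFAULT_CONF_DIRS, must NOT resolve the fallback itself
. ./env.local      # user may override DEFAULT_CONF_DIRS and/or EXTRA_CONF_DIRS here

# Only now, after env.local has been evaluated, can the fallback be applied:
export EXTRA_CONF_DIRS="${EXTRA_CONF_DIRS:-${DEFAULT_CONF_DIRS}}"
```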
I don't see how making `DEFAULT_CONF_DIRS` the default for `EXTRA_CONF_DIRS` helps in this case. I think what @tlvu is suggesting is that we just ask everyone to copy or move some lines from `DEFAULT_CONF_DIRS` to `EXTRA_CONF_DIRS` before they update to the new version. |
Let me give an example so that we're sure we're talking about the same thing...

BEFORE:

```shell
export DEFAULT_CONF_DIRS='
./config/proxy
./config/canarie-api
./config/geoserver
./config/finch
./config/raven
./config/hummingbird
./config/thredds
./config/portainer
./config/magpie
./config/twitcher
./config/jupyterhub
'
export EXTRA_CONF_DIRS='
./components/monitoring
./components/cowbird
./components/weaver
'
```

AFTER:

```shell
export DEFAULT_CONF_DIRS='
./config/proxy
./config/magpie
./config/twitcher
./components/stac
./components/cowbird
'
export EXTRA_CONF_DIRS='
./config/canarie-api
./config/geoserver
./config/finch
./config/raven
./config/hummingbird
./config/thredds
./config/portainer
./config/jupyterhub
./components/monitoring
./components/weaver
'
```

A deployment that has the configuration in the BEFORE section can manually edit `EXTRA_CONF_DIRS` so that it looks like the one in the AFTER section without any major change to the services that their deployment offers. |
Exact! That's what I have in mind: during the same edit of my various existing `env.local` files.
Exact. I meant for all orgs to prepare all the various existing `env.local` files beforehand. |
@tlvu |
Backward/forward-compatible config/components locations have been applied for CI instances. |
…oy the stack (#399)

## Overview

Changes `DEFAULT_CONF_DIRS` to refer exclusively to the proxy, magpie, twitcher, stac, and cowbird components. Also moves all components that were previously under the `birdhouse/config` directory to the `birdhouse/components` directory. This removes the arbitrary distinction between these groups of components, which didn't have any functional or logical reason.

Because this change updates the default components, it is not backwards compatible unless the following changes are made to the local environment file (`birdhouse/env.local` by default):

- add any components no longer in the `DEFAULT_CONF_DIRS` list to the `EXTRA_CONF_DIRS` list. For example, to keep the jupyterhub component enabled, add `./components/jupyterhub` to the `EXTRA_CONF_DIRS` list.

## Changes

**Non-breaking changes**

- changes `PROXY_ROOT_LOCATION` to refer to the magpie login page by default, but to the jupyterhub login page when jupyterhub is also enabled.

**Breaking changes**

- moves all components under `./config` to `./components`
- sets `DEFAULT_CONF_DIRS` to refer exclusively to the proxy, magpie, twitcher, stac, and cowbird

## Related Issue / Discussion

- Resolves #357

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false
Summary
The intent of this issue is to document the recommended approach for future users, such that we reduce the potential combinations of unexpected overrides they might attempt and then ask us to maintain support for.
Description

Using the `EXTRA_CONF_DIRS`, it is possible to apply additional configurations on top of `DEFAULT_CONF_DIRS`. The activated components are then the super-set of `DEFAULT_CONF_DIRS | EXTRA_CONF_DIRS`. However, this still forces the user to have, at least, the `DEFAULT_CONF_DIRS` set of components enabled. Technically, it would be perfectly possible to override `DEFAULT_CONF_DIRS` to start the instance with an even smaller subset than the proposed default services.

The recommended approach to do so should be better documented in:
birdhouse-deploy/birdhouse/env.local.example
Lines 45 to 47 in 2b344d3
Also, there might be a need to document a "minimal set" of dependencies (i.e.: what must absolutely be defined for the instance to start without error). Notably, `proxy` (to have `nginx`) and "some service" that overrides the root location would be needed. Maybe more?

birdhouse-deploy/birdhouse/env.local.example
Lines 216 to 219 in 2b344d3
This "minimal set" could be simply documented, and it would be up to the new node maintainers to keep them in
DEFAULT_CONF_DIRS
, or we could be more proactive and make anotherMINIMAL_CONF_DIRS
variable (or some other method...?).The text was updated successfully, but these errors were encountered: