Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix regressions introduced by PR #359 "Flexible locations for data served by thredds" #478

Merged
merged 6 commits into from
Nov 22, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 14 additions & 1 deletion CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,20 @@
[Unreleased](https://github.com/bird-house/birdhouse-deploy/tree/master) (latest)
------------------------------------------------------------------------------------------------------------------

[//]: # (list changes here, using '-' for each new entry, remove this when items are added)
## Fixes

- Fix regressions introduced by PR #359 "Flexible locations for data served by thredds"

In [PR #359](https://github.com/bird-house/birdhouse-deploy/pull/359/):

`secure-thredds/config/magpie/permissions.cfg` started to use variable but was never renamed to `.template`
so those variable never get template expanded
(commit [317d96c3](https://github.com/bird-house/birdhouse-deploy/commit/317d96c39db7a6d79d1568a7094441ccdedc55ae)).

`bootstrap-testdata` default value was removed but did not source `read-configs.include.sh` so the variable
stayed blank (commit [4ab0fc74](https://github.com/bird-house/birdhouse-deploy/commit/4ab0fc74cb8fa601d75ecfc2a94749b23f60109c)).
The default value was there initially so the script can be used in standalone situation (not inside a checkout).


[2.6.0](https://github.com/bird-house/birdhouse-deploy/tree/2.6.0) (2024-11-19)
------------------------------------------------------------------------------------------------------------------
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
permissions.cfg
mishaschwartz marked this conversation as resolved.
Show resolved Hide resolved
File renamed without changes.
18 changes: 17 additions & 1 deletion birdhouse/scripts/bootstrap-testdata
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,22 @@
# Need write-access to DATASET_ROOT (/data/datasets/).


THIS_FILE="$(readlink -f "$0" || realpath "$0")"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred way to do this is to use the birdhouse executable which should be used as the interface for interacting with the stack (see #428).

birdhouse configs -c birdhouse/scripts/bootstrap-testdata

I know a lot of the other scripts do this for backwards compatibility reasons but going forward I would rather encourage using the birdhouse executable.

I'm fine with this for the PR since it puts it in line with some of the other scripts in this directory but I'd like to move away from this strategy soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the scripts having that "just in case" they are called directly.
If omitted, a server using the old way would suddenly get some operations working and others not.

That being said, we should promote using birdhouse binary rather than calling the scripts directly.
However, it needs to be improved (other PRs), since the above command actually executes the script, and in this specific case, that leads to downloading all the test data. So, not that great for a "print configs" command.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmigneault I agree with everything you're saying.

I'm not sure I understand what you mean by:

So, not that great for a "print configs" command.

Do you mind clarifying

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

birdhouse configs -c birdhouse/scripts/bootstrap-testdata

Oh thanks for the reminder, my habits still stayed pre 2.0 version !

going forward I would rather encourage using the birdhouse executable.

I think this is absolutely amazing for various external one-off scripts outside of this repo because then the user do not have to remember the boilerplate to source read-configs.include.sh.

But for official scripts committed to this repo, I think I would prefer the current backward compatible behavior for the following reasons:

  • backward-compatible, can still be used standalone (without birdhouse executable and even without being in a checkout). Historically all the scripts in scripts/ actually started as fully standalone scripts without the need to be in a checkout. I wrote them for my own quick usage, then I realized they could be useful for other, so I commit them in the repo for sharing.
  • when one script calls another script (ex: this bootstrap-testdata is actually being called by bootstrap-instance-for-testsuite, the single entrypoint for any CI to prep the instance with test data so we do not need to update the CI steps if we change our testdata initialization process), then the script should be able to run "standalone".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mishaschwartz
I've tried doing birdhouse configs -c birdhouse/scripts/bootstrap-testdata, and the operation proceeded to download test data, although birdhouse -h indicates that configs only "prints a command that can be used to load configuration". I assumed that means it should only print the configurable variables, but not run the script itself? Must be a side effect of the sourcing mechanism that runs the script anyway.

Copy link
Collaborator

@fmigneault fmigneault Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with this more detailed help, I'm not sure what is expected 😅
What is the use of the -p flag (load variant?) vs not having it (execute variant?) ? Shouldn't load be the default for a "configs" operation?

I'm somewhat expecting birdhouse configs to behave like docker compose config or birdhouse-compose config. It does any relevant templating replacement and prints the resulting "state" of what would be run (or available variables), but doesn't actually run it.

Is that correct expectation?
If not, why would one use configs to run the script, rather than just running the script?
It would be clearer to have something like birdhouse run -c <script-path> for that purpose.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you have a minute, run birdhouse configs -h and check out the examples at the bottom to see if that clears up some of your questions.

To try to answer some of them here:

What is the use of the -p flag (load variant?) vs not having it (execute variant?)

with -p you can essentially source the environment variables into your current process. So for example, we could replace the code that we're discussing right now with this to get the same result:

eval $(birdhouse configs -p)

I'm somewhat expecting birdhouse configs to behave like docker compose config or birdhouse-compose config

You can do that with birdhouse compose config and you'll get the result you're expecting.

Is that correct expectation?

Not really... think of it this way... we're solving two problems:

  • with -c we can load all the environment variables and then run a command in a subprocess
  • with -p we can load all of the environment variables in the current process (like we were sourcing read-configs.include.sh)

If not, why would one use configs to run the script, rather than just running the script?

Because now the script doesn't need to require that all relevant environment variables are sourced. Also we can run arbitrary commands/scripts so it makes testing and script writing easier.

It would be clearer to have something like birdhouse run -c for that purpose.

That's currently what we can do with birdhouse configs -c <script-path>. Are you saying that you'd prefer run as a subcommand?

Copy link
Collaborator

@fmigneault fmigneault Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I wasn't clear by saying "I'm somewhat expecting birdhouse configs to behave like docker compose config [...]". I understand the confusion I created.

What I actually meant was not literally the "same command", but "same behavior", as in, it evaluates the configs as if it would do its usual command, but stops just before and shows the "config" instead. In the case of birdhouse configs, the "configs" would just print the list of resolved variable names and values. Doing a birdhouse compose ... right after should resolve exactly the same name/values, but do the compose operation with them. Basically, I was expecting birdhouse configs to be equivalent to a pip freeze, or any other <tool> list command, and adding the -c meant "using this config file instead of the default" (rather than run this command). Basically, a "show me resolved values" command.
Maybe that could be a birdhouse configs --list ?

I've read the examples, and yeah, it's clearer to me what it does now. I guess the -c is potentially misleading because it could do way more that just "load configs" depending on what the script contains (as it did in this case downloading test data). Maybe there needs to be more warnings about that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, a "show me resolved values" command. Maybe that could be a birdhouse configs --list ?

Ah ok I see what you mean now. The only issue I see with implementing this is that all the configuration settings get loaded as environment variables. So I don't see a great way to correctly display all the resolved values without including all the other environment variables that exist on the system as well. You can just do:

birdhouse configs -c printenv

This also includes other environment variables that are used internally but may not have much relevance to the user such as DELAYED_EVAL and VARS.

If we prefixed all our environment variables with BIRDHOUSE_ then we could do it but as it stands, we don't have a clean way of implementing this.

If you can think of something I'm missing please let me know.

and adding the -c meant "using this config file instead of the default"

Are you thinking of the -e or --env-file flag here?

I guess the -c is potentially misleading because it could do way more that just "load configs" depending on what the script contains (as it did in this case downloading test data). Maybe there needs to be more warnings about that.

I mean the help string for this command says "Execute the given command after loading configuration settings", I'm not sure how else to warn the user that the command that they provide will be executed other than by saying "execute the given command"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge this PR. Maybe you guys can continue the conversation over #479?

THIS_DIR="$(dirname "${THIS_FILE}")"
COMPOSE_DIR="${COMPOSE_DIR:-$(dirname "${THIS_DIR}")}"

if [ -f "${COMPOSE_DIR}/read-configs.include.sh" ]; then
. "${COMPOSE_DIR}/read-configs.include.sh"

# Get THREDDS_SERVICE_DATA_LOCATION_ON_HOST
read_configs
fi


if [ -z "${DATASET_ROOT}" ]; then
DATASET_ROOT="${BIRDHOUSE_DATA_PERSIST_ROOT}/${THREDDS_SERVICE_DATA_LOCATION_ON_HOST}"
# Default for when unable to source read-configs.include.sh (ie when
# used standalone outside of the checkout).
DATASET_ROOT="${THREDDS_SERVICE_DATA_LOCATION_ON_HOST:=/data/datasets}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be :- instead of :=

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will set THREDDS_SERVICE_DATA_LOCATION_ON_HOST if undefined, and then set DATASET_ROOT with it, which will result in the same behavior in this case.

Since the rest of the script doesn't rely on THREDDS_SERVICE_DATA_LOCATION_ON_HOST, not a big deal, but we can use :- since it is more common to indicate the default value.

However, maybe consider BIRDHOUSE_DATA_PERSIST_ROOT as well:

DATASET_ROOT="${THREDDS_SERVICE_DATA_LOCATION_ON_HOST:-${BIRDHOUSE_DATA_PERSIST_ROOT:-/data}/datasets}"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I started to use := instead of :- to actually set THREDDS_SERVICE_DATA_LOCATION_ON_HOST because had the sourcing of read-configs.include.sh worked, that THREDDS_SERVICE_DATA_LOCATION_ON_HOST would also exist for real.

I was trying to emulate the successful sourcing of read-configs.include.sh even in standalone mode, but it was not needed since we don't use THREDDS_SERVICE_DATA_LOCATION_ON_HOST afterwards. I can revert to :- no problem.

fi

FROM_SERVER=${FROM_SERVER:-"https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse"}
Expand Down Expand Up @@ -39,6 +53,8 @@ for afile in ${FILE_LIST}; do
if [ ! -d "${PARENT_DIRS}" ]; then
mkdir -p "${PARENT_DIRS}"
fi
set -e # Fail on error.
curl "${FROM_SERVER}/${afile}" --output "${afile}"
set +e
fi
done
Loading