
Jupyterhub: allow users created before Cowbird was enabled to spawn jupyterlab #480

Open
wants to merge 3 commits into master
Conversation

mishaschwartz
Collaborator

Overview

Users created before Cowbird was enabled will not have a "workspace directory" created. A workspace directory is a symlink to the directory that contains their Jupyterhub data.

When Cowbird is enabled, Jupyterhub checks if the workspace directory exists and raises an error if it doesn't.

This change allows Jupyterhub to create the symlink if it doesn't exist, instead of raising an error.
This means that users without a "workspace directory" can continue using Jupyterhub as before, without manual intervention by a system administrator who would otherwise need to create the symlink for them.

Changes

Non-breaking changes

  • changes jupyterhub configuration

Breaking changes
None

Related Issue / Discussion

Additional Information

CI Operations

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false

@github-actions github-actions bot added component/jupyterhub Related to JupyterHub as development frontend with notebooks documentation Improvements or additions to documentation labels Nov 22, 2024
@mishaschwartz
Collaborator Author

@tlvu please test this in your staging environment to ensure that this works for all of your users who were created before Cowbird was enabled by default (version 2.0).

@tlvu
Collaborator

tlvu commented Nov 22, 2024

@mishaschwartz Great! I will test this next week.

By the way, do you plan to roll all the changes that make Cowbird compatible with existing Magpie users and the poor-man sharing into one PR? Basically all the work-arounds found in #425, whenever they make sense and are possible, of course.

@mishaschwartz
Collaborator Author

By the way, do you plan to roll all the changes that make Cowbird compatible with existing Magpie users and the poor-man sharing into one PR? Basically all the work-arounds found in #425, whenever they make sense and are possible, of course.

The workaround described here #425 (comment) can be enabled just by updating your env.local file. I don't think there are any additional code changes that are needed.

Collaborator

@fmigneault fmigneault left a comment


Looks good. Thanks for the fix!
I'll let @tlvu do a more in-depth test to validate.

@mishaschwartz
Collaborator Author

By the way, do you plan to roll all the changes that make Cowbird compatible with existing Magpie users and the poor-man sharing into one PR? Basically all the work-arounds found in #425, whenever they make sense and are possible, of course.

The workaround described here #425 (comment) can be enabled just by updating your env.local file. I don't think there are any additional code changes that are needed.

Check out #481 which adds better documentation to avoid this for other users in the future.

@tlvu
Collaborator

tlvu commented Nov 26, 2024

The workaround described here #425 (comment) can be enabled just by updating your env.local file. I don't think there are any additional code changes that are needed.

Check out #481 which adds better documentation to avoid this for other users in the future.

Right, sorry, I forgot that all the work-arounds are just configs in env.local and no code change is required.

@tlvu
Collaborator

tlvu commented Dec 10, 2024

@mishaschwartz
I backported your change to the state that Ouranos is at, as of commit efb8485.

Then, starting from env.local.example, I enabled the poor-man sharing plus:

export EXTRA_CONF_DIRS="
    ./components/cowbird
    ./optional-components/canarie-api-full-monitoring
    ./optional-components/all-public-access
    ./optional-components/secure-thredds
    ./optional-components/wps-healthchecks
"
PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="share"

So with this very minimal env.local, everything works fine: I can log in to Jupyter and see previous content.

Then I enabled a config that is more similar to our production

export EXTRA_CONF_DIRS="
  ./components/canarie-api
  ./components/geoserver
  ./components/finch
  ./components/raven
  ./components/hummingbird
  ./components/thredds
  ./components/portainer
  ./components/jupyterhub
  ./components/cowbird
  ./optional-components/canarie-api-full-monitoring
  ./optional-components/wps-healthchecks
  ./optional-components/testthredds
  ./optional-components/generic_bird
  ./optional-components/x-robots-tag-header
  ./optional-components/proxy-json-logging
  ./components/weaver
  ./components/monitoring
  /path/no/exist
"

With this I am unable to log in to Jupyter anymore. Here are the logs from docker logs jupyterhub:

[I 2024-12-10 06:18:04.527 JupyterHub log:191] 200 GET /jupyter/hub/login (@172.21.0.1) 2.13ms
[D 2024-12-10 06:18:09.731 JupyterHub log:191] 200 GET /jupyter/hub/static/components/font-awesome/fonts/fontawesome-webfont.woff2?v=4.7.0 (@10.10.10.6) 1.05ms
[W 2024-12-10 06:18:09.920 JupyterHub base:843] Failed login for lvu
[I 2024-12-10 06:18:09.922 JupyterHub log:191] 200 POST /jupyter/hub/login?next=%2Fjupyter%2Fhub%2F (@10.10.10.6) 178.20ms

Direct login to https://HOST/magpie with my user and password works, so there is no problem on Magpie.

I removed ./components/cowbird from EXTRA_CONF_DIRS and I am able to log in to Jupyter again!

This is a bit weird; I do not see how the other components can affect Cowbird. I'll continue investigating another time.

@mishaschwartz
Collaborator Author

mishaschwartz commented Dec 10, 2024

@tlvu

If you are unable to log in, that is a different issue than the one addressed here. This PR fixes the issue that users were not able to spawn a jupyterlab container after they had logged in.

I'm also confused by your examples: in the first EXTRA_CONF_DIRS you're not enabling the jupyterhub service, yet you say that you can log in to jupyterhub? Is there some other setting in your env.local that's overriding this?

In your second example can you please try with ./components/cowbird before ./components/jupyterhub in EXTRA_CONF_DIRS? Since cowbird is a required component in version 2, it will always be loaded before jupyterhub.
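For illustration, the requested ordering would look like this in env.local (an excerpt showing only the two relevant entries; the rest of the list stays as before):

```shell
# Hypothetical excerpt: load cowbird before jupyterhub, mirroring the
# default component ordering in version 2.
export EXTRA_CONF_DIRS="
  ./components/cowbird
  ./components/jupyterhub
"
```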

@tlvu
Collaborator

tlvu commented Dec 10, 2024

in the first EXTRA_CONF_DIRS you're not enabling the jupyterhub service but you say that you can log in to jupyterhub??

This is the Ouranos stack pre 2.0.0; jupyterhub, magpie and others are enabled by default 😄

In your second example can you please try with ./components/cowbird before ./components/jupyterhub in EXTRA_CONF_DIRS? Since cowbird is a required component in version 2, it will always be loaded before jupyterhub.

Oh right! I never thought about this one, but it is very true that the ordering could matter.

I have something to do today, will retry this investigation probably Thursday.

@mishaschwartz
Collaborator Author

This is the Ouranos stack pre 2.0.0; jupyterhub, magpie and others are enabled by default 😄

Right, right... I forgot about that, sorry.

I have something to do today, will retry this investigation probably Thursday.

sounds good

@tlvu
Collaborator

tlvu commented Dec 19, 2024

I re-added ./components/cowbird to EXTRA_CONF_DIRS, in the same place (not before ./components/jupyterhub), and suddenly I can still log in to Jupyterhub! Basically I cannot reproduce the problem from my earlier comment #480 (comment).

So I continued my testing: I created a new user in Magpie to see how the Cowbird trigger works, and I noticed it creates the following:

$ ls -l /data/user_workspaces/testcowbird01
total 4
lrwxrwxrwx. 1 root  root   40 Dec 19 20:58 notebooks -> /data/jupyterhub_user_data/testcowbird01
drwxrwxrwx+ 2 root  root    6 Dec 19 20:58 shapefile_datastore

So maybe in this PR, we might want to replicate the same behavior as the real Cowbird instead?

Then I was curious about this new shapefile_datastore dir, so I looked up the Cowbird code and found this:

$ ack shapefile_datastore                     
docs/components.rst                                 
38:    /user_workspaces/<user_name>/shapefile_datastore  # Managed by the `GeoServer` handler                                                                                                                      

cowbird/handlers/impl/geoserver.py                  
72:DEFAULT_DATASTORE_DIR_NAME = "shapefile_datastore"                                                    
686:        return f"shapefile_datastore_{workspace_name}"   

Notice there is a {workspace_name} suffix after shapefile_datastore. I have an older Cowbird installed; the code I searched is from the tip of Cowbird's master branch. I hope the folder name did not change?

This leads me to think that maybe we should not try to replicate Cowbird's behavior here manually, but instead trigger Cowbird's new-user creation hook again so that any naming change is transparent for us. That assumes we can call the same Magpie new-user trigger from Jupyterhub.
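The idea above could be sketched as follows. This is only an illustration, not a tested integration: the function names and the `cowbird:7000` host are assumptions, and the payload shape follows Cowbird's POST /webhooks/users endpoint (which requires a `callback_url` field, even if empty).

```python
import json
import urllib.request

def build_user_created_payload(user_name: str) -> dict:
    # Payload for Cowbird's POST /webhooks/users endpoint; the
    # callback_url field is required by the schema, even when empty.
    return {"event": "created", "user_name": user_name, "callback_url": ""}

def trigger_cowbird_user_created(user_name: str,
                                 base_url: str = "http://cowbird:7000/cowbird") -> int:
    # Hypothetical helper: fire the same hook Magpie would call, so that
    # Cowbird (re)creates the user's workspace structure on disk.
    req = urllib.request.Request(
        f"{base_url}/webhooks/users",
        data=json.dumps(build_user_created_payload(user_name)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Jupyterhub could call such a helper when it detects a missing workspace, instead of re-implementing Cowbird's directory layout itself.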

@tlvu
Collaborator

tlvu commented Dec 20, 2024

Trying to trigger the hook manually:

$ curl -X POST "http://lvu8.ouranos.ca:7000/cowbird/webhooks/users?format=application%2Fjson" -H  "accept: application/json" -H  "Accept: application/json" -H  "Content-Type: application/json" -d "{  \"event\": \"created\",  \"user_name\": \"testcowbird02\"}"

Got back this error and I am unable to understand why it fails:

{
  "param": {
    "conditions": {"not_none": false, "is_type": false},
    "value": null,
    "name": "callback_url",
    "compare": "Type[str]"
  },
  "code": 422,
  "detail": "Invalid value specified.",
  "type": "application/json",
  "path": "/webhooks/users",
  "url": "http://lvu8.ouranos.ca:7000/cowbird/webhooks/users?format=application%2Fjson",
  "method": "POST"
}

docker logs cowbird gives

[2024-12-20 02:48:22,409] INFO       [ThreadPoolExecutor-0_0][cowbird.utils] Request: [POST lvu8.ouranos.ca:7000 /cowbird/webhooks/users]
[2024-12-20 02:48:22,411] DEBUG      [ThreadPoolExecutor-0_0][cowbird.utils] Request details:
URL: http://lvu8.ouranos.ca:7000/cowbird/webhooks/users?format=application%2Fjson
Path: /cowbird/webhooks/users
Method: POST
Headers:
  Host: lvu8.ouranos.ca:7000
  User-Agent: curl/7.76.1
  Accept: application/json
  Content-Type: application/json
  Content-Length: 53
Parameters:
  format: application/json
Body:
  b'{  "event": "created",  "user_name": "testcowbird02"}'
[2024-12-20 02:48:22,413] DEBUG      [ThreadPoolExecutor-0_0][cowbird.api.webhooks.views] Received user webhook event [created] for user [testcowbird02].

docker logs cowbird-worker has nothing useful.

I turned on DEBUG logging and exposed the port this way:

$ git diff
diff --git a/birdhouse/components/cowbird/config/cowbird/cowbird.ini.template b/birdhouse/components/cowbird/config/cowbird/cowbird.ini.template
index 3aa33da2..b4355c15 100644
--- a/birdhouse/components/cowbird/config/cowbird/cowbird.ini.template
+++ b/birdhouse/components/cowbird/config/cowbird/cowbird.ini.template
@@ -75,7 +75,7 @@ keys = console
 keys = generic
 
 [logger_root]
-level = INFO
+level = DEBUG
 handlers = console
 formatter = generic
 
diff --git a/birdhouse/components/cowbird/default.env b/birdhouse/components/cowbird/default.env
index 0d160735..54a17a4d 100644
--- a/birdhouse/components/cowbird/default.env
+++ b/birdhouse/components/cowbird/default.env
@@ -45,7 +45,7 @@ export COWBIRD_MONGODB_PORT=27017
 #   DEBUG:  logs detailed information about operations/settings (not for production, could leak sensitive data)
 #   INFO:   reports useful information, not leaking details about settings
 #   WARN:   only potential problems/unexpected results reported
-export COWBIRD_LOG_LEVEL=INFO
+export COWBIRD_LOG_LEVEL=DEBUG
 
 # Subdirectory of DATA_PERSIST_SHARED_ROOT containing the user workspaces used by Cowbird
 export USER_WORKSPACES="user_workspaces"
diff --git a/birdhouse/components/cowbird/docker-compose-extra.yml b/birdhouse/components/cowbird/docker-compose-extra.yml
index 5ad76749..d7a59413 100644
--- a/birdhouse/components/cowbird/docker-compose-extra.yml
+++ b/birdhouse/components/cowbird/docker-compose-extra.yml
@@ -11,6 +11,8 @@ services:
   cowbird:
     image: pavics/cowbird:${COWBIRD_VERSION}-webservice
     container_name: cowbird
+    ports:
+      - 7000:7000
     environment:
       HOSTNAME: $HOSTNAME
       FORWARDED_ALLOW_IPS: "*"

I followed this documentation to craft my curl:

(two screenshots of the Cowbird webhook API documentation, taken 2024-12-19)

How do we debug this kind of error? Is there a way to also turn on DEBUG logging for the cowbird-worker container?

@mishaschwartz
Collaborator Author

@tlvu

So maybe in this PR, we might want to replicate the same behavior as the real Cowbird instead?

It's a good idea but I think that this needs to be tackled in a different PR. The issue you're describing is much bigger and needs careful consideration to figure out how to implement properly. Consider all of these scenarios that have to be handled:

  • a user is created after cowbird is enabled
  • a new cowbird user_created action is implemented after a user is created
  • a new cowbird handler is implemented
  • a user_created action in cowbird is modified to create/delete/modify a different resource than before
  • etc.

The PR here is really just supposed to fix the immediate issue that some users can't spawn jupyterlab containers.

Got back this error and I am unable to understand why it fails:

It's telling you that you need to specify a callback URL.

@tlvu
Collaborator

tlvu commented Dec 20, 2024

It's telling you that you need to specify a callback URL

OMG! Apparently I cannot read a JSON response. Now that you tell me, it looks clear, but I didn't "catch" it yesterday.

So I did this:

$ curl -X POST "http://lvu8.ouranos.ca:7000/cowbird/webhooks/users?format=application%2Fjson" -H  "accept: application/json" -H  "Accept: application/json" -H  "Content-Type: application/json" -d "{  \"event\": \"created\",  \"user_name\": \"lvu2\", \"callback_url\": \"\"}"

And it actually works; the folder structures are created on disk.

But the returned message is quite misleading, with a whole bunch of NotImplementedError entries in docker logs cowbird as well:

{
  "webhook": {"event": "created", "user_name": "lvu2", "callback_url": ""},
  "exception": "WebhookDispatchException([NotImplementedError(), NotImplementedError(), NotImplementedError()])",
  "code": 500,
  "detail": "Failed to handle user webhook event.",
  "type": "application/json",
  "path": "/webhooks/users",
  "url": "http://lvu8.ouranos.ca:7000/cowbird/webhooks/users?format=application%2Fjson",
  "method": "POST"
}

I am guessing this is the reason for your fix in #488.

So given that this works, how about we call this hook from JupyterHub if the folder structure is missing on disk?

@tlvu
Collaborator

tlvu commented Dec 20, 2024

So if the cowbird container is responding to hooks, what is the role of the cowbird-worker container then? Sorry, I did not have time to fully RTFM.

@fmigneault
Collaborator

@tlvu
The worker runs the actual operations without blocking the API. Basically, any function decorated with @shared_task that is called somewhere in the code is handled by the worker "at some point".

For example, when a user_created event is received by the API, it iterates over all enabled "handlers" and calls the corresponding method. Each of those methods can then sprawl out into many operations that could take more or less time, or could even depend on one another, depending on how the configuration is defined. Trying to do the operations directly in the API could lead to "soft lock" combinations.

In the case of the Geoserver handler, for example (https://github.com/Ouranosinc/cowbird/blob/4990fd505d5bc76cc73019daa27979adcfa2e35f/cowbird/handlers/impl/geoserver.py#L183-L188),
the chain(create_workspace.si(user_name), create_datastore.si(user_name)) call only puts the workspace creation and datastore creation "in queue". Since those tasks themselves need to request GeoServer's API, parse results, create directories, etc., it could cause the Cowbird API to become unresponsive if there are too many or slow operations (e.g. batch-creating many users), or it could cause some operations to fail if they have "race condition"-like dependencies (the directory creation, for example).

class Geoserver(Handler, FSMonitor):
    # [...]
    def user_created(self, user_name: str) -> None:
        self._create_datastore_dir(user_name)
        res = chain(create_workspace.si(user_name), create_datastore.si(user_name))
        res.delay()
        LOGGER.info("Start monitoring datastore of created user [%s]", user_name)
        Monitoring().register(self._shapefile_folder_dir(user_name), True, Geoserver)
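The API/worker split described above follows a generic task-queue pattern. The stdlib sketch below imitates it with a plain queue and a worker thread; it is not Cowbird's actual code (which uses Celery's shared_task and chain), just an illustration of why the API can return immediately while work happens "at some point".

```python
# Generic sketch of the API/worker split: the "API side" only enqueues a
# chain of steps; a worker thread runs them later, in order.
import queue
import threading

task_queue: queue.Queue = queue.Queue()
results = []

def create_workspace(user_name):
    results.append(f"workspace:{user_name}")

def create_datastore(user_name):
    results.append(f"datastore:{user_name}")

def worker():
    while True:
        job = task_queue.get()
        if job is None:
            break
        for func, arg in job:  # run the chained steps in order
            func(arg)
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# "API side": enqueue the chain and return immediately, like res.delay().
task_queue.put([(create_workspace, "alice"), (create_datastore, "alice")])
task_queue.join()  # only here for the demo; the real API would not wait
print(results)  # ['workspace:alice', 'datastore:alice']
```

Celery's .si() immutable signatures play the role of the (func, arg) tuples here: each step carries its own arguments rather than receiving the previous step's result.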

Labels: component/jupyterhub (Related to JupyterHub as development frontend with notebooks), documentation (Improvements or additions to documentation)

Successfully merging this pull request may close this issue: 🐛 [BUG]: Cowbird is not backward compatible with existing Jupyter users

3 participants