[Bioc2024] Bioconductor Workshop for BiocPy #89

Open
jkanche opened this issue Jul 17, 2024 · 21 comments
@jkanche

jkanche commented Jul 17, 2024

Hi, I am presenting a workshop next week on BiocPy: interoperability between R and Python.

Most of the content is in Python, so folks will follow along in a Jupyter environment that already contains all the necessary packages. Should I provide a Docker image with Jupyter Notebook and the packages preinstalled? If so, how do I do this?

I currently use Quarto to publish the tutorial website, and the source is hosted here: https://github.com/BiocPy/BiocWorkshop2024

@almahmoud
Collaborator

We've only ever done RStudio containers in the past, but I'm happy to do the manual work to support Jupyter.
Please let me know the container, the port at which Jupyter is exposed, and a name/description for your workshop.

I'll do my best to get it up by tomorrow. Given the tight deadline, with the conference starting in a few days, I'd encourage you to also plan a backup such as Colab in case it isn't ready by then, but let's try using the platform first.

@jkanche
Author

jkanche commented Jul 18, 2024

Thank you @almahmoud, I would like to use https://jupyter-docker-stacks.readthedocs.io/en/latest/ and it runs on port 8888

@almahmoud
Collaborator

Hey @jkanche, you pointed to the general Jupyter Docker docs, not to a specific container. Please let me know which of the many Jupyter containers you'd want to use. Better yet, create a custom container with your packages pre-installed on top of that general Jupyter container. If you don't have experience using Docker, or you don't need a specific container and any Jupyter environment will do, please let me know which PyPI/conda/R packages you need and I'll try to build it on my side for you.
Also, do you need a Jupyter container with both R and Python kernels, or just the Python kernel?

@jkanche
Author

jkanche commented Jul 19, 2024

Hi @almahmoud, thank you so much for helping me out here. I tried to use bioconductor_docker:devel to create an image but ran into many issues. It would be super helpful to have one with both the Python and R kernels. I have the packages listed in the workshop repository:

python dependencies: https://github.com/BiocPy/BiocWorkshop2024/blob/master/requirements.txt
R dependencies: https://github.com/BiocPy/BiocWorkshop2024/blob/master/rpackages.R

If having both R and Python is too much trouble, just a simple jupyter image with the Python packages installed would also be very helpful. I would really appreciate any help here.

@jkanche
Author

jkanche commented Jul 19, 2024

Quick update: I was able to publish an image containing the notebook and the relevant Python packages to the GitHub registry: https://github.com/BiocPy/BiocWorkshop2024/pkgs/container/biocworkshop2024%2Fbuilder
The Dockerfile used for the build is here: https://github.com/BiocPy/BiocWorkshop2024/blob/master/Dockerfile

Jupyter Notebook runs on port 8889. It uses tokens; do you know if there's a way to disable token-based authentication?
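
On the token question: Jupyter's token and password checks are controlled by the server's `token` and `password` settings, and setting both to an empty string turns authentication off. Below is a minimal sketch, assuming a Jupyter Server based image where the `ServerApp.token` and `ServerApp.password` options apply (older images use the `NotebookApp` prefix, and newer releases may prefer `IdentityProvider.token`). The helper name `jupyter_command` is hypothetical, and port 8889 is the one mentioned in this comment.

```python
import shlex


def jupyter_command(port=8889, prefix="ServerApp"):
    """Build a `jupyter lab` argv list with token and password auth disabled.

    `prefix` is an assumption: "ServerApp" for Jupyter Server based images,
    "NotebookApp" for images that still run the classic notebook server.
    """
    return [
        "jupyter", "lab",
        "--ip=0.0.0.0",             # listen on all interfaces inside the container
        f"--port={port}",
        "--no-browser",
        f"--{prefix}.token=",       # empty token: no token prompt
        f"--{prefix}.password=",    # empty password: no password screen
    ]


if __name__ == "__main__":
    # Print the command as a single shell-quoted line, e.g. for a Dockerfile CMD.
    print(shlex.join(jupyter_command()))
```

Running without a token only makes sense when something else (a proxy or the hosting platform) already restricts access to the container.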

@almahmoud
Collaborator

Hey @jkanche, I actually made a container for you already, and it's now deployed to the instance at workshop.bioconductor.org. Please try it out and let me know if it works.

@jkanche
Author

jkanche commented Jul 19, 2024

Awesome, thank you so much. I am on a password screen, do you know what the default password is?
image

@almahmoud
Collaborator

almahmoud commented Jul 19, 2024

Sorry about that, the startup command didn't take effect as expected the first time. Try again now: there should be no password, and you should have both R and Python kernels. Here is my simple Dockerfile:

FROM jupyter/r-notebook:r-4.3.1
USER root
RUN apt update -qq && \
    apt install python3-dev build-essential -y && \
    curl -O https://raw.githubusercontent.com/Bioconductor/bioconductor_docker/devel/bioc_scripts/install_bioc_sysdeps.sh && \
    bash install_bioc_sysdeps.sh 3.18 && \
    pip install -r <(curl -s https://raw.githubusercontent.com/BiocPy/BiocWorkshop2024/master/requirements.txt) && \
    curl -s https://raw.githubusercontent.com/BiocPy/BiocWorkshop2024/master/rpackages.R | Rscript -

I used the latest available R notebook image to make the Jupyter setup easiest, but that means you have to use Bioc 3.18 and R 4.3.1. Let me know if that's an issue and I can try to make an updated container.

@jkanche
Author

jkanche commented Jul 19, 2024

@almahmoud thank you so much. I ran into a couple of issues during this session: 1) file permission errors when packages download something, and 2) the sqlite version shipped in the container is too old.

(2) can be fixed by:

# Download and build SQLite3 from source
RUN wget --no-check-certificate https://www.sqlite.org/2024/sqlite-autoconf-3450300.tar.gz && \
    tar -xvf sqlite-autoconf-3450300.tar.gz && \
    cd sqlite-autoconf-3450300 && \
    ./configure && \
    make && \
    make install && \
    export PATH="/usr/local/lib:$PATH" && \
    cd .. && \
    rm -rf sqlite-autoconf-3450300.tar.gz sqlite-autoconf-3450300

# Set environment variable for LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/lib

Do you know what's causing (1)?

@almahmoud
Collaborator

I can modify the container, thank you for providing the commands! Re 1) Are you writing to /home/jovyan? I believe the default working directory might have permission issues, as I didn't account for the user in the Jupyter container on the NFS, but you shouldn't need that anyway. If you also can't write to /home/jovyan/, let me know, and if you can provide a reproducible example that'd be really helpful too.
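
A quick way to narrow down a permission problem like this is to test, from a notebook cell, whether the current user can actually create a file in the directories involved. The following is a minimal sketch and not part of the deployed container. The helper name `can_write` is hypothetical, and it creates a real temporary file rather than relying on `os.access`, on the assumption that a real write is the more reliable check on network filesystems.

```python
import os
import tempfile


def can_write(directory):
    """Return True if a temporary file can actually be created in `directory`."""
    try:
        # The file is deleted automatically when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers missing directories as well as permission errors.
        return False


if __name__ == "__main__":
    # os.getuid is not available on every platform, so guard the call.
    uid = os.getuid() if hasattr(os, "getuid") else None
    print("running with uid:", uid)
    for path in (os.getcwd(), os.path.expanduser("~")):
        print(path, "-> writable" if can_write(path) else "-> NOT writable")
```

Running this once in the notebook's working directory and once for the home directory shows which of the two locations is affected.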

@jkanche
Author

jkanche commented Jul 19, 2024

I was running this chunk in the container, from notebook/genomic_ranges.ipynb. It downloads the BED file to the current working directory.

from geniml.bbclient import BBClient

bbclient = BBClient(cache_folder="cache", bedbase_api="https://api.bedbase.org")
bedfile_id = "be4054acf6e3feeb4dc490e6430e358e" 
bedfile = bbclient.load_bed(bedfile_id)
peaks = bedfile.to_granges()

filter_chr22 = [x == "chr22" for x in peaks.get_seqnames()]
peaks_chr22 = peaks[filter_chr22]

print(peaks_chr22)

@almahmoud
Collaborator

@jkanche Thanks for the details! That was my bad, I forgot to chown the git directory since it's being cloned as root at startup. It should be fixed now, and container updated! Let me know if you encounter any other issues!

@jkanche
Author

jkanche commented Jul 22, 2024

@almahmoud Thank you, this resolves the directory issue. Is there any way we can update the sqlite version in the container? It needs a newer version than the one available through the distros - #89 (comment)

@almahmoud
Collaborator

Hey @jkanche, are you not seeing the updated sqlite version? I ran your command from above and updated the container already.

@jkanche
Author

jkanche commented Jul 22, 2024

The notebook says the sqlite version is 3.43 instead of 3.45. I'm checking to see if there's another environment variable I should be setting.

image
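
To see which sqlite library the notebook's Python is really using, it helps to print the runtime library version together with the interpreter's install prefix, since a library built under /usr/local is not necessarily the one the interpreter loads. This is a minimal sketch. The helper names are hypothetical, and the 3.45.0 threshold is an assumption taken from the version mentioned in this comment, not a documented requirement of any package.

```python
import sqlite3
import sys


def version_tuple(text):
    """Turn a dotted version string such as '3.45.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def sqlite_is_at_least(minimum="3.45.0"):
    """Compare the sqlite library loaded by Python against `minimum`."""
    return version_tuple(sqlite3.sqlite_version) >= version_tuple(minimum)


if __name__ == "__main__":
    # sys.prefix shows which environment (e.g. a conda env) the kernel runs from.
    print("interpreter prefix:", sys.prefix)
    # sqlite_version is the version of the C library, not of the Python module.
    print("sqlite library version:", sqlite3.sqlite_version)
    print("meets assumed minimum 3.45.0:", sqlite_is_at_least())
```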

@almahmoud
Collaborator

I'm currently running the command as root. When you tried that installation command, did you run it as jovyan within the container, or also as root?

@jkanche
Author

jkanche commented Jul 22, 2024

Seems like the notebooks are run as jovyan.

image

@jkanche
Author

jkanche commented Jul 22, 2024

If you are testing this, running section 1.1 from the annotate cell types notebook should give you the list of datasets.

Right now it's an error. I had the same issue before, so I know it's the sqlite version.
image

@almahmoud
Collaborator

almahmoud commented Jul 22, 2024

I had not tested anything; I simply added your sqlite upgrade suggestion, assuming you had tested it and seen it work. I have now added conda update -y -c conda-forge libsqlite instead, which actually updates the version of sqlite you see in Python. Try it out now. Looking quickly in the container, I see:

>>> import sqlite3
>>> sqlite3.sqlite_version
'3.46.0'

Trying it out in the notebook, seems to work

image

@jkanche
Author

jkanche commented Jul 23, 2024

awesome! thank you very much!

@jkanche
Author

jkanche commented Jul 30, 2024

Hi @almahmoud, I am trying to build a Docker image and register both R and Python kernels. Does the version you published to workshop.bioconductor.org look something like this?

https://github.com/BiocPy/BiocWorkshop2024/blob/master/Dockerfile.bioc
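
Whether both kernels ended up registered in a built image can be checked from inside the container by listing the installed kernelspecs. The following is a minimal sketch that shells out to `jupyter kernelspec list --json`. The helper names are hypothetical, and the kernel names `python3` and `ir` are assumed to be the defaults used by ipykernel and IRkernel; a custom image may register them under other names.

```python
import json
import subprocess

# Assumed default kernelspec names for ipykernel and IRkernel.
REQUIRED = ("python3", "ir")


def missing_kernels(installed, required=REQUIRED):
    """Return the required kernel names that are absent from `installed`."""
    return [name for name in required if name not in installed]


def installed_kernels():
    """Ask the `jupyter` CLI which kernelspecs are registered."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return sorted(json.loads(result.stdout)["kernelspecs"])


if __name__ == "__main__":
    try:
        print("missing kernels:", missing_kernels(installed_kernels()))
    except (OSError, subprocess.CalledProcessError, ValueError, KeyError) as err:
        # Raised when jupyter is not installed or its output is unexpected.
        print("could not query jupyter:", err)
```

An empty list means both kernels are visible to Jupyter in that environment.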
