Pysam returning libcurl error 77 when accessing public S3 file? #1257

Open
daisieh opened this issue Dec 20, 2023 · 6 comments

Comments

@daisieh

daisieh commented Dec 20, 2023

This started happening for us in pysam 0.22.0; it doesn't happen in 0.21.0.

In case it helps, here is pip list and a quick example of the error:

root@ef448fe76fd3:/app/htsget_server# pip list
Package                   Version
------------------------- ----------
attrs                     23.1.0
candigv2-authx            1.0.0
certifi                   2023.11.17
charset-normalizer        3.3.2
click                     8.1.7
clickclick                20.10.2
connexion                 2.14.1
exceptiongroup            1.2.0
Flask                     2.2.5
Flask-Cors                3.0.10
greenlet                  3.0.2
idna                      3.6
inflection                0.5.1
iniconfig                 2.0.0
itsdangerous              2.1.2
Jinja2                    3.1.2
jsonschema                4.20.0
jsonschema-specifications 2023.11.2
MarkupSafe                2.1.1
minio                     7.1.14
packaging                 23.2
pip                       23.0.1
pluggy                    1.3.0
psycopg2-binary           2.9.9
pysam                     0.22.0
pytest                    7.2.0
PyYAML                    6.0.1
referencing               0.32.0
requests                  2.31.0
rpds-py                   0.15.2
setuptools                65.5.1
six                       1.16.0
SQLAlchemy                1.4.44
swagger-ui-bundle         0.0.9
tomli                     2.0.1
urllib3                   2.1.0
uWSGI                     2.0.23
Werkzeug                  2.3.8
wheel                     0.42.0

[notice] A new release of pip is available: 23.0.1 -> 23.3.2
[notice] To update, run: pip install --upgrade pip
root@ef448fe76fd3:/app/htsget_server# python
Python 3.10.13 (main, Dec 19 2023, 20:49:50) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysam
>>> x = pysam.VariantFile("https://1000genomes.s3.us-east-1.amazonaws.com/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz")
[E::easy_errno] Libcurl reported error 77 (Problem with the SSL CA cert (path? access rights?))
[E::hts_open_format] Failed to open file "https://1000genomes.s3.us-east-1.amazonaws.com/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" : Input/output error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pysam/libcbcf.pyx", line 4117, in pysam.libcbcf.VariantFile.__init__
  File "pysam/libcbcf.pyx", line 4342, in pysam.libcbcf.VariantFile.open
OSError: [Errno 5] could not open variant file `b'https://1000genomes.s3.us-east-1.amazonaws.com/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz'`: Input/output error

The main error seems to be

[E::easy_errno] Libcurl reported error 77 (Problem with the SSL CA cert (path? access rights?))

But it seems odd that this error doesn't happen with pysam 0.21.0 or lower...

@daisieh
Author

daisieh commented Jan 4, 2024

It's possible that this is a Docker thing...?

@litaifang

I created a duplicate issue several weeks ago. This happens with other remote locations too (Google Cloud and HTTPS BAM files): #1268

@jmarshall
Member

Thanks for the report. Increasing the verbosity helps identify the problem:

>>> import pysam
>>> pysam.set_verbosity(9)
3
>>> pysam.AlignmentFile('s3://example/foo.bam')
[…]
*   Trying 3.5.7.133...
* TCP_NODELAY set
* Connected to example.s3.amazonaws.com (3.5.7.133) port 443 (#0)
* ALPN, offering http/1.1
* error setting certificate verify locations:
  CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
[E::easy_errno] Libcurl reported error 77 (Problem with the SSL CA cert (path? access rights?))
[E::hts_open_format] Failed to open file "s3://example/foo.bam" : Input/output error
[…]

This /etc/pki/tls/certs/ca-bundle.crt path is the Red Hat/CentOS/Fedora convention for the CAfile. You are probably running on Debian or Ubuntu, where the conventional path is /etc/ssl/certs/ca-certificates.crt, so the path it's looking for does not exist.

You can work around this by exporting CURL_CA_BUNDLE so that pysam's libcurl will look for these files in the right place:

export CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
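If exporting the variable in the shell is awkward (for example inside a container entrypoint you don't control), the same setting can probably be applied from Python before the first remote file is opened. This is a minimal sketch, not something confirmed in this thread: it assumes the variable is read when the file is opened rather than at import time, and `ensure_ca_bundle` is a hypothetical helper name, not part of pysam.

```python
import os

# Debian/Ubuntu bundle path quoted earlier in this thread.
DEBIAN_CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def ensure_ca_bundle(path=DEBIAN_CA_BUNDLE, environ=os.environ):
    """Point CURL_CA_BUNDLE at `path` unless it is already set.

    Hypothetical helper. Returns the value in effect afterwards, or None
    if nothing was set because `path` does not exist on this machine.
    """
    current = environ.get("CURL_CA_BUNDLE")
    if current:
        return current  # respect an explicit setting from the shell
    if os.path.isfile(path):
        environ["CURL_CA_BUNDLE"] = path
        return path
    return None


if __name__ == "__main__":
    print("CURL_CA_BUNDLE:", ensure_ca_bundle())
    # Afterwards, open the remote file as usual, e.g. pysam.VariantFile(url).
```

If this has no effect in your setup, fall back to the shell export above, which sets the variable before the process starts.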

@jmarshall
Member

jmarshall commented May 9, 2024

As for why this has started happening in 0.22.0, it has to do with how wheels are built. The problem is in the system libraries that are distributed inside manylinux wheels, so this is related to #1097 and #1276.

The pysam-0.21.0 Linux wheels were built for manylinux_2_24, which is now EOL and was based on Debian 9. These wheels contain a copy of libcurl-9f97daa0.so.4.4.0, which as it was built on Debian defaults to the /etc/ssl/… path.

More recent pysam releases' Linux wheels have been built for manylinux_2_28, which is based on AlmaLinux 8. These wheels contain a copy of libcurl-14d1b62d.so.4.5.0, which as it was built on a Red Hat-compatible distribution defaults to the /etc/pki/… path.

Fedora (40, at least) contains symlinks under /etc so that both styles of path point to the same certificate bundle, so it works with both flavours of wheel. (But e.g. Rocky and Alma do not; see also this bug.) Debian and Ubuntu do not have such symlinks either, so only the wheel containing the Debian-style libcurl.so will work there (without assistance from the environment variable).

This would appear to be a limitation in manylinux's claim to be making wheels that are portable across distributions!

This can be worked around by having everyone set $CURL_CA_BUNDLE as appropriate, but that is less than ideal. Ways of dealing with this when building future wheels would include:

  1. Manylinux may find a way to fix the libcurl.so that they ship.

  2. Because manylinux_2_24 is EOL, reverting to building that flavour of wheel is a non-starter.

  3. Pysam could patch its copy of hfile_libcurl.c to detect what paths are available at runtime and set CURLOPT_CAINFO accordingly, so that it would automatically work with whichever path style was present.

  4. The real problem here is the large number of system libraries that get pulled into our manylinux wheels.

    If we omitted the plugins from pysam wheels, libcurl.so and many other libraries would not be pulled into our wheels, and this and the two issues mentioned above would be fixed at a stroke. These plugins should not really be shipped within the Python world at all; it would be better if pysam could access externally-provided (non-Pythonised) plugin object files, via $HTS_PATH if necessary. But transitioning to that model would be a non-trivial deployment problem.

Long-term the correct approach is (4), as it solves numerous problems: it fixes these three issues and also reduces the size of our wheels. It may be worth doing (3) too in the interim.
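To illustrate the selection logic behind option (3), here is a rough Python sketch of probing the two conventional paths named in this thread and picking the first one that is readable. It is only an illustration: the actual fix would live in C inside hfile_libcurl.c and pass the result to CURLOPT_CAINFO. The name `find_ca_file`, the candidate order, and the injectable `is_readable` check are assumptions made for this example.

```python
import os

# The two conventional CAfile locations named in this thread.
CANDIDATE_CA_FILES = (
    "/etc/ssl/certs/ca-certificates.crt",  # Debian / Ubuntu
    "/etc/pki/tls/certs/ca-bundle.crt",    # Red Hat family
)


def _default_is_readable(path):
    """True if `path` is an existing file the current user can read."""
    return os.path.isfile(path) and os.access(path, os.R_OK)


def find_ca_file(candidates=CANDIDATE_CA_FILES, is_readable=None):
    """Return the first readable candidate path, or None if none is found.

    Hypothetical helper: `is_readable` can be swapped out so the logic
    can be exercised without touching the real filesystem.
    """
    if is_readable is None:
        is_readable = _default_is_readable
    for path in candidates:
        if is_readable(path):
            return path
    return None


if __name__ == "__main__":
    print("Detected CA file:", find_ca_file())
```

On a system with the symlinks described above, both candidates would exist and the first one listed wins; the order here is arbitrary.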

@daisieh
Author

daisieh commented May 10, 2024

From my POV, the fix suggested works for us! Thank you so much, and if you all feel that the root issue should be carried on elsewhere instead of in here, feel free to close.

@jmarshall
Member

Glad to hear it does the trick.

Let's keep this one open to represent the interim fix (3), and in due course I'll open another issue to represent (4).
