read FITS straight out of AWS S3 storage #3

Open
trjaffe opened this issue Aug 30, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@trjaffe
Contributor

trjaffe commented Aug 30, 2023

Can cfitsio be made to read data directly from an AWS S3 location using the AWS SDK for C++ (https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/examples-s3.html)? For example, a user working on an AWS EC2 instance would be able to run a command like

fdump infile=s3://nasa-heasarc/chandra/data/byobsid/5/4475/primary/acisf04475N004_full_img2.fits.gz

This would be extremely useful for users working in AWS, and would be analogous to the ability of astropy.io.fits to read straight out of S3 using the boto3 and s3fs libraries.
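For comparison, the Astropy route mentioned above might look roughly like the sketch below. It assumes astropy 5.2 or later with fsspec and s3fs installed, and it assumes the bucket allows anonymous access. The helper names `split_s3_uri` and `print_fits_info` are hypothetical and only serve the illustration. Whether a gzip-compressed file such as the one in the fdump example benefits from partial reads is not something this sketch establishes.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); hypothetical helper."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key


def print_fits_info(uri: str) -> None:
    """Print the HDU summary of a FITS file stored in S3."""
    split_s3_uri(uri)  # fail early on a malformed URI

    # Imported lazily so the helper above works without astropy installed.
    from astropy.io import fits

    # Assumption: the bucket permits anonymous (unsigned) reads.
    with fits.open(uri, use_fsspec=True, fsspec_kwargs={"anon": True}) as hdul:
        hdul.info()
```

Calling `print_fits_info()` with an `s3://` URI needs network access. The point is only that the S3 logic lives in the Python layer (fsspec and s3fs), not in a compiled FITS library.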

@trjaffe trjaffe added the enhancement New feature or request label Aug 30, 2023
@zoghbi-a

A simpler step before this may be to do it over a remote http address.

@trjaffe
Contributor Author

trjaffe commented Sep 1, 2023

For caldb, yes, but long term, we will need this anyway.

@lpsinger
Contributor

FYI, in healpy, we recently took pains to make sure that we were not linking cfitsio against libcurl because it turns out that it is a very large dependency (healpy/healpy#906). At least for Python users, support for URLs usually comes from a higher level than cfitsio itself; that would also be my preference for S3 support.

@pkgw

pkgw commented Oct 17, 2024

For DASCH, I've implemented a version of this capability by wrapping the CFITSIO code in a Rust framework that uses the AWS S3 Rust SDK to implement the I/O backend:

https://github.com/pkgw/dasch-science-lambda/blob/dev/src/s3fits.rs

This uses the fits_register_driver() function to add s3:// as a supported URL form. This hook appears to be totally undocumented, but as far as I can see the I/O subsystem is actually very nicely pluggable. If you only need read-only support, it's conceptually quite straightforward: the basic I/O tasks translate directly into HTTP HEAD and ranged GET requests. For efficiency, it makes a big difference to add a buffering layer since CFITSIO wants to make a lot of small reads. I wrote a blog post describing my use case.
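To make the buffering point concrete, here is a minimal Python sketch of a read-only buffering layer. It is not the Rust code linked above and it does not use the CFITSIO driver interface; the class name `BufferedRangeReader` and the `fetch_range` callback are hypothetical. In a real backend, `size` would presumably come from an HTTP HEAD request and `fetch_range(start, end)` would issue a GET with a `Range: bytes=start-(end-1)` header. Here the callback is left pluggable so the caching logic can be exercised without a network.

```python
from typing import Callable, Dict


class BufferedRangeReader:
    """Read-only, seekable view of a remote object, served from cached blocks."""

    def __init__(
        self,
        size: int,
        fetch_range: Callable[[int, int], bytes],
        block_size: int = 1 << 20,
    ) -> None:
        self.size = size                  # total object size in bytes
        self._fetch_range = fetch_range   # fetch_range(start, end_exclusive) -> bytes
        self._block_size = block_size
        self._blocks: Dict[int, bytes] = {}
        self._pos = 0
        self.requests = 0                 # number of backend fetches issued

    def seek(self, offset: int) -> None:
        if not 0 <= offset <= self.size:
            raise ValueError(f"offset {offset} outside object of size {self.size}")
        self._pos = offset

    def _block(self, index: int) -> bytes:
        # Fetch a whole block on first touch; later small reads reuse it.
        if index not in self._blocks:
            start = index * self._block_size
            end = min(start + self._block_size, self.size)
            self._blocks[index] = self._fetch_range(start, end)
            self.requests += 1
        return self._blocks[index]

    def read(self, nbytes: int) -> bytes:
        end = min(self._pos + nbytes, self.size)
        out = bytearray()
        while self._pos < end:
            index, offset = divmod(self._pos, self._block_size)
            block = self._block(index)
            take = min(end - self._pos, len(block) - offset)
            if take <= 0:
                raise IOError("backend returned a short block")
            out += block[offset:offset + take]
            self._pos += take
        return bytes(out)


if __name__ == "__main__":
    payload = bytes(2880 * 4)  # stand-in for a remote object
    reader = BufferedRangeReader(len(payload), lambda s, e: payload[s:e], block_size=5760)
    for _ in range(36):        # 36 small reads of 80 bytes each
        reader.read(80)
    print("backend requests:", reader.requests)
```

This sketch never evicts cached blocks, so memory use grows with the amount of the file touched. A real backend would likely cap the cache size.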

There's no reason that one couldn't do this in C++ with the language-appropriate AWS SDK, but I personally would only do so under duress.

It remains true that adding support for this inevitably brings in a bunch of HTTPS infrastructure that, as Leo mentioned, makes the library a lot heavier than it is otherwise. For my use case, that's fine, and for Tess's fdump example it would be necessary. If you want the features at a higher level while keeping the compiled library small and simple, you could imagine a Python package like Astropy adding a fairly small set of C shims to expose the pluggable I/O functionality at the Python-language level. The HTTPS logic would have to come in somewhere, but that would keep it out of CFITSIO itself.

@lpsinger
Contributor

I'd prefer that this not be built into cfitsio. The built-in HTTP support already brings along dependencies on curl and SSL that make building downstream dependencies more complicated than it would otherwise be.

4 participants