Support for cloud object storage (S3, Swift) ? #231
Not at the moment. To do it, you'd have to find a way to mount your Swift storage as a virtual file system. But it would indeed be good to be able to access things like Swift and AWS storage directly through IIPImage. It risks being quite inefficient, however, unless your images are optimized for cloud storage: you'd have to use something like Cloud Optimized GeoTIFF (https://www.cogeo.org/) to make sure random access is fast enough. |
Thanks for the answer! Indeed, we are currently facing a dilemma. We must move our JPEG 2000 images from ordinary filesystems into Swift storage, but we were worried that IIPImage would no longer be able to serve them. From what I understand from your answer, our current options are:
|
As your images are currently in JPEG2000 format, I think your best and most flexible option at this stage would be to mount your Swift storage as a virtual FS. In the longer term, I'll look into adding native Swift support to IIPImage to avoid the need for a virtual FS. In such a configuration, a format such as COG would be much faster than JPEG2000 or normal TIFF. |
When I started evaluating IIIF image servers for my institution, I was initially taken aback by IIPImage's lack of storage options. Afterwards, I actually found this limitation to be a good thing that keeps IIPImage simple and reliable. S3 is several times slower than an SSD directly attached to the server or mounted via NFS over a fiber-channel network. We ended up writing a small piece of Python middleware that does the following:
Along with some basic maintenance tools, such as clearing the cache volume on demand and routinely pruning older files, it's a relatively simple, low-maintenance addition that allows you to have your image sources anywhere, without depending on the image server's features. It also lets you provide fast access to frequently used sources without paying a fortune to store all your images on SSD. |
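For readers who want a concrete picture of the cache-in-front-of-object-storage idea described above, here is a minimal Python sketch. It is not the middleware from the comment, whose steps are not shown in this thread; `ensure_cached`, `prune` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever S3 or Swift client you use. It copies a file into a local cache directory on first request and deletes files that have not been used for a while.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import BinaryIO, Callable, Optional


def ensure_cached(key: str, cache_dir: Path,
                  fetch: Callable[[str], BinaryIO]) -> Path:
    """Return a local path for `key`, copying it from object storage on a miss.

    `fetch` is an assumed callable that opens the remote object as a
    binary stream; plug in your own S3 or Swift client here.
    """
    target = cache_dir / key
    if target.exists():
        os.utime(target)  # refresh mtime so pruning treats it as recently used
        return target

    target.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary file in the same directory, then rename, so the
    # image server never sees a half-written image.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent)
    try:
        with os.fdopen(fd, "wb") as tmp, fetch(key) as src:
            shutil.copyfileobj(src, tmp)
        os.replace(tmp_name, target)
    finally:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
    return target


def prune(cache_dir: Path, max_age_seconds: float,
          now: Optional[float] = None) -> int:
    """Delete cached files not used for `max_age_seconds`; return the count."""
    now = time.time() if now is None else now
    removed = 0
    for path in list(cache_dir.rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed += 1
    return removed
```

The image server would then be pointed at `cache_dir` as if it were an ordinary filesystem. Validation of `key` (for example rejecting `..` components) and locking between concurrent downloads of the same file are left out of this sketch.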
@scossu Thanks for the report! Our situation is that we have ~150 TB of JPEG 2000 images (around 4 million files) that are currently served from a filesystem through IIPImage. The vast majority of those images will rarely, if ever, be served, and will remain in the archive for years without anyone touching them. More are added every month. Converting everything to COG in advance seems like huge overkill in terms of processing power and storage cost, since a COG would be 3 to 10 times bigger than the original image to maintain lossless quality. My initial thought was to keep the JPEG2000 archive as is, but generate a lossy COG copy with
In your scenario, the file would be fetched from object storage into a nearby cache to be served as is by IIPImage. Off the top of my head, I don't know which scenario would have the lowest latency from user request to image display, but it seems yours should be faster, since there would be no conversion step. However, once the COG is generated, it would be served by a simple HTTP server. This would appear to scale better and would offload a lot of work from the backend. |
@scossu's solution is indeed a good option. The only drawback is that the very first request to a new image not in the cache will be very slow, as you have to copy the whole file across first. All subsequent requests will, however, be very fast.
Regarding the use of COG directly through HTTP, it really depends on what you want to be able to do with the images. Don't forget that COG is still a TIFF file. The only difference between COG and classic TIFF is how the internal TIFF metadata in the file is ordered. COG puts all this information at the beginning of the file, whereas in classic TIFF it can be scattered throughout. A COG HTTP request will give you direct access to the compressed tiles and not to transcoded images as you would get with an image server such as IIPImage. You also won't be able to get anything that isn't a tile, such as image overviews or arbitrary regions, or apply any image processing, unless you handle this through some client-side JavaScript.
If you want the fastest possible access to tiles with no intervening image server and no need for client-side JS, then the old Zoomify or Deepzoom approach would be your best bet - you just pre-generate the JPEG tiles and store them all as separate files on your cloud server, which would serve them directly to the browser.
By the way, lossless tiled pyramid TIFF (COG or not) will be about twice as large as lossless JPEG2000 (and similar in size to the raw image size). |
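To give a sense of what pre-generating tiles as separate files implies, here is a rough Python sketch of the Deep Zoom pyramid layout as it is commonly described: each level halves the resolution until the image fits in a single pixel, and tiles are stored under `<name>_files/<level>/<col>_<row>.<ext>`. The function names and the 256-pixel tile size are assumptions chosen for illustration, and tile overlap is ignored.

```python
def dz_levels(width: int, height: int) -> list:
    """Return (level, level_width, level_height) for every pyramid level."""
    # Highest level holds the full-resolution image: ceil(log2(max side)).
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        # Integer ceiling division, so small levels never drop below 1 pixel.
        levels.append((level, -(-width // scale), -(-height // scale)))
    return levels


def dz_tile_count(width: int, height: int, tile_size: int = 256) -> int:
    """Total number of tile files across all levels (overlap ignored)."""
    total = 0
    for _, w, h in dz_levels(width, height):
        total += -(-w // tile_size) * -(-h // tile_size)
    return total


def dz_tile_path(name: str, level: int, col: int, row: int,
                 ext: str = "jpg") -> str:
    """Relative path of one tile in the conventional Deep Zoom layout."""
    return f"{name}_files/{level}/{col}_{row}.{ext}"
```

Multiplying `dz_tile_count` by the number of images in an archive gives a quick estimate of how many objects the storage backend would have to hold with this approach.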
The goal is to be able to visualize the (8-bit) images in a browser. The users are the general public. So: pan, zoom, that's it. No need for image processing.
If we switched to COG, we would use OpenLayers with the GeoTIFF source. This would remove most of the load from the server and reduce costs. If we stayed with JPEG2000, we would probably keep IIPImage, with the aforementioned drawbacks. |
Agreed that this would be the fastest way to serve tiles at large scale. However, maybe we could get speed and dynamic serving together. If we think of IIP serving as including the computing power of the browser, why couldn't we manipulate tiles in the browser with something like WASM and keep all tiles static on the image server? |
Yes, if you use something like COG, you would have direct access to the raw image data and would be able to offload all processing to the browser itself through JS and WASM. |
Excuse me, what's COG? |
@joesong168 Cloud Optimized GeoTIFF is a way to stream imagery (a bit like what IIPServ does). The benefit of COGs is that they don't need a particular service (a simple HTTP server is enough). |
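As an illustration of why a simple HTTP server is enough, here is a small Python sketch of how a client could ask for a single tile with an HTTP `Range` header. It assumes the client has already read the TIFF `TileOffsets` and `TileByteCounts` arrays from the start of the file, and that the image is a single-plane tiled TIFF where tiles are numbered row by row. The function names and the URL are hypothetical, and no request is actually sent.

```python
import urllib.request


def tile_range_header(offsets, byte_counts, image_width, tile_width,
                      col, row) -> str:
    """Build the Range header value covering one compressed tile."""
    tiles_across = -(-image_width // tile_width)  # ceiling division
    index = row * tiles_across + col              # row-major tile numbering
    start = offsets[index]
    end = start + byte_counts[index] - 1          # HTTP ranges are inclusive
    return f"bytes={start}-{end}"


def build_tile_request(url, offsets, byte_counts, image_width, tile_width,
                       col, row) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for a single tile."""
    header = tile_range_header(offsets, byte_counts, image_width,
                               tile_width, col, row)
    return urllib.request.Request(url, headers={"Range": header})
```

Passing the resulting request to `urllib.request.urlopen` against a server that supports range requests should return only the compressed bytes of that tile, which the client then has to decode itself.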
@sguimmara Thanks for your explanation. Have you tried JuiceFS? It is a cloud-native file system that might serve your needs. https://juicefs.com/en/ |
Hello,
Does IIPImage support accessing images stored somewhere other than a filesystem, such as cloud object storage (and particularly Swift)?
Thank you