Proposal: Image Acceleration(Apparate) #165
Signed-off-by: fanjiankong <[email protected]>
This proposal sounds conceptually very similar to https://github.com/containerd/stargz-snapshotter 🤔 |
And even more similar to the nydus image acceleration service: https://github.com/dragonflyoss/image-service We've been discussing with the Harbor team how to create a pluggable image conversion mechanism that works for different image formats (currently nydus and estargz are included). Maybe Apparate can join forces as well ;) /cc @ktock |
what's the difference between https://github.com/dragonflyoss/image-service and this one? |
👍 Recently a variety of image formats have been discussed in the community (e.g. nydus, estargz, zstd:chunked...), not only Apparate, so it would be great to have a generic (and pluggable) conversion mechanism that works for all of them. |
A pluggable image conversion mechanism has also been proposed here: #167 |
It seems like another image-service (https://github.com/dragonflyoss/image-service), and another stargz (https://github.com/containerd/stargz-snapshotter). |
@lovecontainers Standardization of lazy pulling in the current version of OCI Image Spec (v1) is discussed in opencontainers/image-spec#815. |
@ktock Yeah, I hope so for the next OCI spec. But nydus looks quite similar to stargz; as its doc illustrates, nydus is an improvement on stargz. In fact, almost all newer remote image formats look the same. So I think it may be better to bring up a stargz v2 rather than so many stargz-like formats. At this moment, wide discussion is necessary, but repeated proposals are meaningless. |
@lovecontainers Yes, repeated ones are meaningless. And there's a novel solution open-sourced recently: |
@lihuiba Thank you, this is the first time I have learned about overlaybd, as I am a beginner with containers. It looks like a traditional VM image and is natively friendly to remote access. The most interesting point for me is that your implementation does not depend on FUSE. |
There's a fundamental difference between stargz and nydus:) |
@malc0lm Pls subscribe and discuss here, we need to answer questions from the community. |
It is really a great improvement. It is hard to say there is a fundamental difference, and the same goes for Apparate. These similar proposals may compete for business, since they stand for different companies, but that does not help the community reach an agreement on the next OCI spec. |
@lovecontainers @tianon Overlaybd is a combination of container image and VM image. It is a layered image in the form of a block device. It doesn't depend on FUSE / virtio-fs. I believe this design gathers the best of both worlds (container and VM), and it is applicable to both worlds. |
interesting,good for you, you are so funny |
Obviously a filesystem has a higher abstraction level than a block device, which means more business value can be added on top of it, and I don't understand what makes you think overlaybd is the best of both worlds just because it does not depend on FUSE and virtio-fs. But you depend on TCM, which is another ko, so what is the advantage? You are welcome to identify the pros and cons of the different approaches; instead you keep saying you are the best and others are meaningless, which makes me feel disgusted. |
@xujihui1985 Hi, jihui. "It doesn't depend on FUSE / virtio-fs" is just a statement of fact, and a confirmation to lovecontainers. The reasons why I believe overlaybd is the best are complicated, and I suggest you read the papers mentioned above. There are paragraphs discussing this topic. Thanks! |
@xujihui1985 A higher abstraction level doesn't necessarily mean a better solution. For example, Python is a higher-level language than Java or C/C++, but Python is not necessarily better in every aspect. The best(-fit) abstractions vary in different scenarios. The abstraction of a block device doesn't preclude a file system abstraction on top of it. Actually, we have made an internal solution that includes an enhanced file system, called rofs, atop overlaybd. This solution unleashes all the imagination about the file system abstraction, while retaining the advantages of a block device, i.e. simplicity and efficiency. |
@lihuiba I don't get this metaphor, what's the matter with Python? 😂 And I'm pleased to know you are working on a filesystem solution. Welcome to join forces. :) |
@xujihui1985 I did some research on the basis of stargz, and really felt the bottleneck of FUSE, in both performance and stability. Does FUSE have any alternatives? Or does nydus have some improvements on that (no related statement found in the nydus docs)? |
@lovecontainers My team is also trying to improve FUSE's performance, and we have an upcoming paper on this topic: https://www.usenix.org/conference/atc21/presentation/hsu . But there's one more thing to solve: failure recovery. If the FUSE server process crashes, or gets killed, the file system instance may not recover. These problems (performance, fault tolerance, etc.) do not exist in overlaybd. |
At the early stage of developing fs-based image acceleration technologies, FUSE is a good choice. When the technology becomes mature, an in-kernel read-only fs may be a better solution. |
@lovecontainers FUSE itself is not the bottleneck; the problem is how FUSE is used. The advantage of stargz is its compatibility with targz, which is a really good one. The problem IMO is
What nydus does to improve this is to do the "overlay" at build stage and build the final view of the rootfs in metadata, so that there is one FUSE mountpoint per image and the underlying blob files are shared. |
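The build-stage "overlay" idea above can be sketched roughly as follows. This is only an illustration, not the nydus metadata format: `ChunkRef` and `merge_layers` are hypothetical names, and the only thing taken from a real spec is the OCI whiteout naming (`.wh.<name>` and `.wh..wh..opq`). Layers are merged once at build time into a single path-to-blob map, so a single mount can serve the final view while the blobs stay shared between images.

```python
from dataclasses import dataclass

WHITEOUT_PREFIX = ".wh."        # OCI whiteout: ".wh.<name>" deletes <name>
OPAQUE_MARKER = ".wh..wh..opq"  # OCI opaque marker: hides lower content of a dir


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical pointer to file data inside a shared blob."""
    blob_id: str
    offset: int
    size: int


def _drop_subtree(view, target):
    """Remove `target` and everything below it from the merged view."""
    for path in [p for p in view if p == target or p.startswith(target + "/")]:
        del view[path]


def merge_layers(layers):
    """Merge layers (lowest first) into one final view: path -> ChunkRef.

    Each layer is a dict of relative paths; whiteout entries map to None.
    """
    view = {}
    for layer in layers:
        # Apply deletions from this layer to what lower layers provided.
        for path in layer:
            parent, _, name = path.rpartition("/")
            if name == OPAQUE_MARKER:
                prefix = parent + "/" if parent else ""
                for p in [p for p in view if p.startswith(prefix)]:
                    del view[p]
            elif name.startswith(WHITEOUT_PREFIX):
                hidden = name[len(WHITEOUT_PREFIX):]
                _drop_subtree(view, f"{parent}/{hidden}" if parent else hidden)
        # Then add or override regular entries from this layer.
        for path, ref in layer.items():
            if not path.rpartition("/")[2].startswith(WHITEOUT_PREFIX):
                view[path] = ref
    return view
```

With a merged view like this, lookups never need to walk a stack of per-layer mounts; each path resolves directly to a blob and offset.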
yeah, I have already tried something similar to your solutions. thank you. |
@kofj Where is the git repository of Apparate? I am curious about Apparate's solution for recovering the FUSE process :) |
An important goal of this proposal is to create a vendor-neutral sub-project in the goharbor community. |
@lovecontainers Sorry, there is currently no Apparate repository on GitHub. Recovering the FUSE process is a core ability of Apparate. First, the userspace FUSE server and the kernel FUSE module communicate through the /dev/fuse fd, so the FUSE process must be separated from the process holding the fd. We also need FUSE request tracing in case I/O hangs during recovery. Finally, for a read/write FUSE filesystem, we also need to record the opened fds. |
Looking forward to seeing your implementation on GitHub |
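The "separate the fd holder from the FUSE process" point can be illustrated with plain fd passing over a Unix socket. This is a minimal sketch, not Apparate's implementation (which is not public in this thread): a pipe stands in for the /dev/fuse session fd, the helper names are made up, and everything runs in one process for brevity. It assumes Python 3.9+ on a Unix system, where `socket.send_fds` and `socket.recv_fds` are available.

```python
import os
import socket


def hand_over(holder_sock, fd):
    """Holder side: pass a copy of `fd` to a (re)started server (SCM_RIGHTS)."""
    socket.send_fds(holder_sock, [b"fd"], [fd])


def receive(server_sock):
    """Server side: receive the session fd from the holder."""
    _msg, fds, _flags, _addr = socket.recv_fds(server_sock, 16, 1)
    return fds[0]


def demo():
    # Assumption: the pipe's read end plays the role of the /dev/fuse fd,
    # and writes to the other end play the role of kernel requests.
    session_fd, kernel_side = os.pipe()
    holder_sock, server_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # The first server instance gets the fd and handles one request.
    hand_over(holder_sock, session_fd)
    fd1 = receive(server_sock)
    os.write(kernel_side, b"req1")
    got1 = os.read(fd1, 4)
    os.close(fd1)  # simulate a crash: only the server's copy is closed

    # A request arrives while no server is running; it stays queued because
    # the holder still keeps the original fd open.
    os.write(kernel_side, b"req2")

    # The restarted server receives the same open file and picks up the request.
    hand_over(holder_sock, session_fd)
    fd2 = receive(server_sock)
    got2 = os.read(fd2, 4)

    for fd in (fd2, session_fd, kernel_side):
        os.close(fd)
    holder_sock.close()
    server_sock.close()
    return got1, got2
```

The request tracing and opened-fd bookkeeping mentioned above are not covered here; the sketch only shows why keeping the fd in a separate, long-lived holder lets a replacement server resume the session.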
repo is here: https://github.com/goharbor/acceleration-service |
slack channel https://cloud-native.slack.com/archives/C01U31AK2LX |
A status update about the nydus image service project (https://github.com/dragonflyoss/image-service). Recently we published nydus v2.0, which includes an experimental rafsv6 image format. The rafsv6 image format is compatible with the in-kernel EROFS filesystem, so a rafsv6 image can be mounted directly by EROFS. A patchset to integrate EROFS with the fscache subsystem has also been merged into Linux 5.19-rc1. With all this, a rafsv6 image can be used in two ways:
We are preparing articles to give more information about this topic too. Thanks! |
FYI: with the latest Linux 5.19-rc1, a nydus image can be mounted by the in-kernel EROFS :) https://d7y.io/blog/2022/06/06/evolution-of-nydus/ |
@kofj Can you check https://github.com/goharbor/acceleration-service and decide whether to close this or rework it? Thank you! |
WIP. Update later.
Signed-off-by: fanjiankong [email protected]