support for submitting jobs to Kubernetes #181
Hello, we have no experience/knowledge of kubernetes-native batch. At first look, it seems to me that it could be just another plugin added to https://github.com/DIRACGrid/DIRAC/tree/integration/src/DIRAC/Resources/Computing (in DIRAC, or later in DiracX; it does not seem different to me). What would the use case be?
Hi @fstagni, thanks for the info. Kubernetes is quite popular and, aside from providing a wide array of capabilities that are not possible in traditional batch systems, is also gaining feature parity for batch system scheduling functionality. There are a few ATLAS T2 sites that are native Kubernetes batch clusters thanks to the k8s plugin of Harvester for Panda, which was developed around 2018 (for reference: CHEP2023 presentation, CHEP2023 paper). I was wondering if experiments adopting DIRAC would also be able to support Kubernetes sites. Particularly for new experiments, it can be more feasible and attractive to start developing a distributed computing framework using modern cloud-native technologies. It looks like the development effort would mainly involve writing a KubernetesComputingElement.py file? Just curious at this point. Authentication to the Kubernetes API can be done with X509 certificates (not proxies) or with OIDC and tokens; presumably DIRAC already has some support for that? Thanks!
That would be the way to do it. Through these plugins, DIRAC supports the traditional HTCondor and ARC CEs as well as "SSH" CEs and computing clouds (https://github.com/DIRACGrid/DIRAC/blob/integration/src/DIRAC/Resources/Computing/CloudComputingElement.py, which uses libcloud under the hood).
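To illustrate what such a plugin might do, here is a minimal, hypothetical sketch of one core piece of a KubernetesComputingElement: building a batch/v1 Job manifest for a pilot. The function name, image, namespace, and command are illustrative assumptions, not part of DIRAC; a real plugin would implement the DIRAC ComputingElement interface and submit the manifest through the Kubernetes API (for example with the official `kubernetes` Python client's `BatchV1Api.create_namespaced_job`).

```python
# Hypothetical sketch (not DIRAC code): build a kubernetes batch/v1 Job
# manifest, as a plain dict, for a single pilot job. All names below
# (job name, image, namespace, command) are illustrative assumptions.
# A real KubernetesComputingElement could submit the result with, e.g.:
#   kubernetes.client.BatchV1Api().create_namespaced_job(namespace, manifest)

def make_pilot_job_manifest(name, image, command, namespace="dirac-pilots"):
    """Return a batch/v1 Job manifest (plain dict) for one pilot job."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "backoffLimit": 0,                # do not retry a failed pilot
            "ttlSecondsAfterFinished": 3600,  # auto-clean finished Jobs
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "pilot",
                        "image": image,
                        "command": command,
                    }],
                }
            },
        },
    }


manifest = make_pilot_job_manifest(
    name="dirac-pilot-0001",
    image="almalinux:9",
    command=["/bin/bash", "-c", "echo pilot"],
)
```

The manifest-building part is deliberately pure (no cluster connection), so submission, authentication (X509 or OIDC tokens via kubeconfig), and status polling could be layered on separately, mirroring how the existing CloudComputingElement wraps libcloud.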
That should not pose an issue. Normally, since we are a small and busy group, we do not embark on developments without a requirement (from a VO using DIRAC). Questions:
Okay, thanks. For now I was just gathering information to see how much work it would take, how much of a priority it might be, whether it would be straightforward for a potential contributor to work on, etc. In ATLAS, the NET2 site in the US is also k8s-native, as are the ATLAS Google Cloud project and a site in Taiwan. Several other sites are also interested and experimenting; in total there are 7 Panda queues for Kubernetes in ATLAS. I'm not sure what other VOs they might support, but if ATLAS is the only VO using a workflow management system (Panda + Harvester) that supports Kubernetes (as far as I know; I could be wrong), that would limit the options for adoption by other VOs. As for new experiments, SKAO is looking into a Kubernetes-based approach and has considered using DIRAC.
Hello,
Do you envision that diracx might have support for submitting jobs to kubernetes clusters (as kubernetes-native batch/v1 jobs), similar to the kubernetes plugin of Harvester for Panda, along with submitting to traditional batch clusters?
Thanks!