We're building an end-to-end multi-modal MoE that works with 3D parallelism and is pre-trained in a decentralized way, as proposed in the DiLoCo paper.

If you want to contribute, please check the following links
