manta

A lightweight P2P-based cache system for model distributions on Kubernetes.


Name Story: the name Manta is inspired by the Dota 2 item Manta Style, which creates two images of your hero, much like peers in a P2P network.

We're reframing Manta into a general distributed cache system with a POSIX promise; the current capabilities are still available with the latest v0.0.4 release. Let's see what happens.

Architecture

architecture

Note: llmaz is just one kind of integration; Manta can be deployed and used independently.

Features Overview

  • Model Hub Support: models can be downloaded directly from model hubs (Hugging Face, etc.) or object storage, with no extra effort.
  • Model Preheat: models can be preloaded to the cluster, or to specified nodes, to accelerate model serving.
  • Model Cache: models are cached as chunks after downloading for faster loading.
  • Model Lifecycle Management: model lifecycles are managed automatically with different strategies, such as Retain or Delete.
  • Plugin Framework: Filter and Score plugins can be extended to pick the best candidates.
  • Memory Management (WIP): manage the memory reserved for caching, with an LRU algorithm for GC.

You Should Know Before

  • Manta is not an all-in-one solution for model management; instead, it offers a lightweight way to utilize idle bandwidth and cost-effective disk, helping you save money.
  • It requires no additional components like databases or storage systems, simplifying setup and reducing effort.
  • All models are stored under the host path /mnt/models/.
  • After all, it's just a cache system.

Quick Start

Installation

Read the Installation for guidance.

Preheat Model

A sample to preload the Qwen/Qwen2.5-0.5B-Instruct model. Once preheated, the model is loaded from the cache instead of being fetched from a cold start.

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct

If you want to preload the model to specific nodes, use the nodeSelector:

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  nodeSelector:
    foo: bar
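The nodeSelector matches against node labels, as in standard Kubernetes scheduling. For illustration, a node eligible for preheating would carry the corresponding label (the node name here is hypothetical; foo: bar is the placeholder label from the example above):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1        # hypothetical node name
  labels:
    foo: bar            # matches the Torrent's nodeSelector
```

In practice you would add the label to an existing node rather than define it in a manifest.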

Use Model

Once you have a Torrent, you can access the model directly from the host path `/mnt/models/`. All you need to do is set the Pod label like:

metadata:
  labels:
    manta.io/torrent-name: "torrent-sample"
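For illustration, a minimal Pod sketch that combines the label with a hostPath mount of the cache directory. Only the label is required by Manta; the container name, image, and volume wiring below are assumptions for the example, not part of Manta's API:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-server                   # hypothetical name
  labels:
    manta.io/torrent-name: "torrent-sample"
spec:
  containers:
  - name: server
    image: my-serving-image:latest     # hypothetical serving image
    volumeMounts:
    - name: models
      mountPath: /mnt/models
  volumes:
  - name: models
    hostPath:
      path: /mnt/models                # host path where Manta stores cached models
      type: Directory
```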

Note: you can make the Torrent standby by setting preheat to false (true by default); preheating then happens at runtime, which will obviously slow down model loading.

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  preheat: false
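A complete standby Torrent would presumably still declare its source; a sketch combining the fields shown earlier in this guide:

```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  preheat: false                        # standby: download at runtime instead of preheating
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
```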

Delete Model

If you want the model weights removed once the Torrent is deleted, set reclaimPolicy to Delete (the default is Retain):

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  reclaimPolicy: Delete

For more details, refer to the APIs.

Roadmap

In the long term, we hope to make Manta a unified cache system for MLOps.

  • Preloading datasets from model hubs
  • RDMA support for faster model loading
  • More integrations with MLOps systems, including training and serving

Community

Join us for more discussions:

Contributions

All kinds of contributions are welcome! Please follow CONTRIBUTING.md.
