vectorch-ai / ScaleLLM Public

Issues: vectorch-ai/ScaleLLM

ScaleLLM Roadmap

#84 opened Mar 16, 2024 by guocuimi

Open 3

ScaleAttention: a custom CUDA kernel, optimized for inference.

#356 opened Jan 1, 2025 by guocuimi

Open

33 Open 43 Closed

Issues list

ScaleAttention: a custom CUDA kernel, optimized for inference.

Labels: performance, roadmap

#356 opened Jan 1, 2025 by guocuimi

RuntimeError: Timed out

#310 opened Aug 16, 2024 by spongxin

Are there any plans to support Int8 weight quant?

#276 opened Jul 18, 2024 by sitabulaixizawaluduo

[Issue] Qwen-14B-Chat init fail and performance issue.

#275 opened Jul 16, 2024 by liutongxuan

LoRA: QLoRA/S-LoRA: Serving thousands of LoRA adapters

#166 opened Apr 28, 2024 by guocuimi

Introducing the Mamba model

#165 opened Apr 28, 2024 by guocuimi

Introducing a ring attention mechanism for handling long contexts

#164 opened Apr 28, 2024 by guocuimi

Quantization: Supporting FP8 for both models and KV caches

#163 opened Apr 28, 2024 by guocuimi

Enhancing documentation for improved usability

#162 opened Apr 28, 2024 by guocuimi

Exploring other chips such as TPU, etc.

Labels: backlog

#160 opened Apr 28, 2024 by guocuimi

Loosening coupling with PyTorch for easy deployment

#159 opened Apr 28, 2024 by guocuimi

Adding more Prometheus metrics and creating a Grafana dashboard for monitoring.

#158 opened Apr 28, 2024 by guocuimi

Adding more benchmarks and unittests for kernels and dependencies

#157 opened Apr 28, 2024 by guocuimi

Extending support to macOS and Windows platforms

Labels: backlog

#156 opened Apr 28, 2024 by guocuimi

Structural Decoding: Function Calling

Labels: backlog

#155 opened Apr 28, 2024 by guocuimi

Structural Decoding: Json format

#154 opened Apr 28, 2024 by guocuimi

Structural Decoding: Json format

#153 opened Apr 28, 2024 by guocuimi

GPU Arch: Turing architecture (sm75)

Labels: enhancement

#152 opened Apr 28, 2024 by guocuimi

Adding support for Apple chips

Labels: enhancement

#151 opened Apr 28, 2024 by guocuimi

Introducing multi-modal models (LLaVA model)

Labels: enhancement

#150 opened Apr 28, 2024 by guocuimi

Implementing MoE (Mixture of Experts) kernels

Labels: performance

#149 opened Apr 28, 2024 by guocuimi

Implementing fused FFN (Feed-Forward Network) to enhance efficiency

Labels: performance

#148 opened Apr 28, 2024 by guocuimi

Exploring the feasibility of adopting the flashinfer library

Labels: performance

#147 opened Apr 28, 2024 by guocuimi

Exploring lookahead decoding support

Labels: enhancement

#146 opened Apr 28, 2024 by guocuimi

ScaleLLM vs vLLM in performance

Labels: roadmap

#144 opened Apr 27, 2024 by WangErXiao
