feat(llm): Determine the best LLM deployment config automatically #2396

gaocegege · 2024-07-25T11:42:37Z

What you would like to be added?

Inspired by the research paper "Vidur: A Large-Scale Simulation Framework For LLM Inference":

Optimizing the deployment of large language models (LLMs) is expensive today since it requires experimentally running an application workload against an LLM implementation while exploring the large configuration space formed by system knobs such as parallelization strategies, batching techniques, and scheduling policies.

We present Vidur-Search, a configuration search tool that helps optimize LLM deployment. Vidur-Search uses Vidur to automatically identify the most cost-effective deployment configuration that meets application performance constraints. For example, Vidur-Search finds the best deployment configuration for LLaMA2-70B in one hour on a CPU machine, in contrast to a deployment-based exploration which would require 42K GPU hours, costing 218K dollars.
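To make the idea concrete, here is a minimal sketch of a simulator-driven configuration search: enumerate candidate configs, estimate latency and throughput with a model instead of a real deployment, and keep the cheapest config that meets a latency constraint. This is not Vidur-Search and does not use Vidur's API. `DeployConfig`, `simulate`, the latency formula, and `GPU_COST_PER_HOUR` are all hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple

GPU_COST_PER_HOUR = 2.0  # hypothetical price in dollars, not a real quote


@dataclass(frozen=True)
class DeployConfig:
    """One point in the (hypothetical) deployment configuration space."""
    tensor_parallel: int
    pipeline_parallel: int
    max_batch_size: int

    @property
    def num_gpus(self) -> int:
        return self.tensor_parallel * self.pipeline_parallel


def simulate(config: DeployConfig) -> Tuple[float, float]:
    """Toy stand-in for a simulator; returns (latency_ms, throughput_qps).

    The formula is made up. A real setup would call an actual simulator here.
    """
    latency_ms = (
        400.0 / config.tensor_parallel
        + 20.0 * config.pipeline_parallel
        + 5.0 * config.max_batch_size
    )
    throughput_qps = config.max_batch_size * 1000.0 / latency_ms
    return latency_ms, throughput_qps


def cost_per_query(config: DeployConfig, throughput_qps: float) -> float:
    """Dollars per query at the given sustained throughput."""
    return config.num_gpus * GPU_COST_PER_HOUR / (throughput_qps * 3600.0)


def search(
    space: Dict[str, List[int]], max_latency_ms: float
) -> Optional[Tuple[DeployConfig, float]]:
    """Exhaustively search the space; return (config, cost) or None if infeasible."""
    best: Optional[Tuple[DeployConfig, float]] = None
    for tp, pp, bs in product(
        space["tensor_parallel"],
        space["pipeline_parallel"],
        space["max_batch_size"],
    ):
        config = DeployConfig(tp, pp, bs)
        latency_ms, qps = simulate(config)
        if latency_ms > max_latency_ms:
            continue  # violates the performance constraint
        cost = cost_per_query(config, qps)
        if best is None or cost < best[1]:
            best = (config, cost)
    return best


SPACE = {
    "tensor_parallel": [1, 2, 4, 8],
    "pipeline_parallel": [1, 2, 4],
    "max_batch_size": [1, 8, 32],
}

if __name__ == "__main__":
    print(search(SPACE, max_latency_ms=300.0))
```

Because every evaluation is a cheap function call rather than a GPU deployment, even an exhaustive sweep like this is affordable. A smarter search algorithm would only change the loop in `search`.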

Why is this needed?

Not sure if this is in the scope of Katib, but glad to raise an issue here.

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.

Electronic-Waste · 2024-08-03T06:33:12Z

I guess it may belong to the scope of KServe, since Katib focuses on hyperparameter tuning of models :)

gaocegege · 2024-08-05T02:25:15Z

It's more like a tuning job. You can think of it as tuning the deployment configs (e.g. the distributed strategy).
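As a rough illustration of the "tuning job" framing, a single trial could take one deployment config as its parameters and report metrics for it. The sketch below assumes parameters arrive as strings (as command-line arguments would) and that metrics are printed as `name=value` lines, which is the shape Katib's stdout metrics collector is documented to parse by default. The parameter names, the latency formula, and `GPU_COST_PER_HOUR` are hypothetical.

```python
from typing import Dict, List

GPU_COST_PER_HOUR = 2.0  # hypothetical price in dollars


def run_trial(params: Dict[str, str]) -> Dict[str, float]:
    """Evaluate one deployment config with a made-up cost and latency model."""
    tp = int(params["tensor_parallel"])
    pp = int(params["pipeline_parallel"])
    bs = int(params["max_batch_size"])
    latency_ms = 400.0 / tp + 20.0 * pp + 5.0 * bs  # toy formula, not measured
    return {
        "latency_ms": latency_ms,
        "gpu_cost_per_hour": tp * pp * GPU_COST_PER_HOUR,
    }


def format_metrics(metrics: Dict[str, float]) -> List[str]:
    """Render metrics as name=value lines, one per metric."""
    return [f"{name}={value:.4f}" for name, value in metrics.items()]


if __name__ == "__main__":
    trial_params = {
        "tensor_parallel": "4",
        "pipeline_parallel": "1",
        "max_batch_size": "8",
    }
    for line in format_metrics(run_trial(trial_params)):
        print(line)
```

In this framing the deployment knobs play the role that hyperparameters play in a training job, and the objective is a cost metric subject to a latency bound.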

andreyvelich · 2024-08-05T13:12:03Z

Thank you for creating this @gaocegege!

Yes, I think optimizing LLM deployment makes sense, since Katib can perform any optimization task (not only ML) and orchestrate any resources as Trials.

It would be nice to get someone from the Kubeflow community who can explore the Vidur aspects and see how Katib can be useful.

/help
/area llm
/remove-label lifecycle/needs-triage

google-oss-prow · 2024-08-05T13:12:06Z

@andreyvelich:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.


google-oss-prow · 2024-08-05T13:12:07Z

@andreyvelich: The label(s) /remove-label lifecycle/needs-triage cannot be applied. These labels are supported: tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?


gjyotin305 · 2024-09-12T02:33:33Z

@gaocegege @andreyvelich I would love to look into this. Can I work on it?

/assign

andreyvelich · 2024-09-23T17:29:58Z

Yes, that would be amazing @gjyotin305!
If you want, feel free to propose this topic in the AutoML and Training WG call when you explore it.

/assign @gjyotin305

gjyotin305 · 2024-09-23T18:01:43Z

Sure

github-actions · 2024-12-22T20:05:27Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

gaocegege added kind/feature lifecycle/needs-triage labels Jul 25, 2024

google-oss-prow bot added the area/llm LLMs related content label Aug 5, 2024

google-oss-prow bot added the help wanted Extra attention is needed label Aug 5, 2024

andreyvelich removed the lifecycle/needs-triage label Aug 5, 2024

google-oss-prow bot assigned gjyotin305 Sep 23, 2024

github-actions bot added the lifecycle/stale label Dec 22, 2024
