Support QoS in api_server #877

Merged
merged 4 commits into from
Dec 27, 2023

sallyjunjun
Contributor

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

Motivation

This PR introduces Quality of Service (QoS) support. When the system is overloaded, requests are scheduled by user priority so that high-priority users are served first. Among users with the same priority, resources are shared according to the configured quota ratio.
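
As a rough illustration of the behavior described above, here is a minimal sketch of strict priority across levels with quota-weighted sharing inside a level. It is not the implementation in this PR: the class name `PriorityQuotaScheduler`, the config keys `id` and `priority`, and the rule that a larger number means higher priority are all assumptions made for the example.

```python
import collections


class PriorityQuotaScheduler:
    """Toy scheduler: strict priority across levels, quota-weighted
    sharing among users of the same level. Illustrative only."""

    def __init__(self, users):
        # users: list of dicts with assumed keys 'id', 'priority', 'quota_pct'.
        self.priority = {u["id"]: u["priority"] for u in users}
        self.quota = {u["id"]: u["quota_pct"] for u in users}
        self.queues = {u["id"]: collections.deque() for u in users}
        self.served = collections.Counter()

    def enqueue(self, user_id, request):
        self.queues[user_id].append(request)

    def dequeue(self):
        # Only users with pending requests can be scheduled.
        waiting = [uid for uid, queue in self.queues.items() if queue]
        if not waiting:
            return None
        # Assumption: a larger number means a higher priority.
        top = max(self.priority[uid] for uid in waiting)
        candidates = [uid for uid in waiting if self.priority[uid] == top]
        # Within the level, serve the user furthest below their quota share.
        chosen = min(candidates,
                     key=lambda uid: self.served[uid] / self.quota[uid])
        self.served[chosen] += 1
        return chosen, self.queues[chosen].popleft()
```

With one high-priority user and two same-priority users holding quotas of 75 and 25, the high-priority request is dequeued first, and the remaining requests are then served in roughly a 3:1 ratio.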

Modification

  1. Added the QoS (Quality of Service) module along with configuration file parsing.
  2. Introduced three rate-limiting interfaces: chat_completions_v1_qos, completions_v1_qos, and chat_interactive_v1_qos.
  3. Included test files for this module.
  4. No impact on the functionality of existing interfaces.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@lvhan028
Collaborator

Could you provide a user guide?

@lvhan028
Collaborator

Please resolve the linting error

lmdeploy/serve/openai/api_server.py (resolved)
lmdeploy/serve/openai/api_server.py (outdated, resolved)
lmdeploy/serve/openai/protocol.py (resolved)
lmdeploy/serve/qos_engine/qos_engine.py (outdated, resolved)
@lvhan028
Collaborator

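We don't recommend adding images to the repo as files, because that makes the project too large. Please use GitHub's image hosting and add links to the images instead.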

@sallyjunjun
Contributor Author

> May provide user guide

@sallyjunjun reopened this on Dec 22, 2023
@awslshadowstar

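> We don't recommend adding images to the repo as files, because that makes the project too large. Please use GitHub's image hosting and add links to the images instead.

The links you get from GitHub's image hosting now all point to the private-user-images.githubusercontent.com domain and carry an expiring JWT, so they are not suitable for documentation. Would it be acceptable to use a third-party image hosting tool?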

@lvhan028
Collaborator

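> We don't recommend adding images to the repo as files, because that makes the project too large. Please use GitHub's image hosting and add links to the images instead.

> The links you get from GitHub's image hosting now all point to the private-user-images.githubusercontent.com domain and carry an expiring JWT, so they are not suitable for documentation. Would it be acceptable to use a third-party image hosting tool?

Drag the image into the comment box of this PR and a URL will be generated automatically; just use that URL.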

@lvhan028
Collaborator

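The linting errors are still there:

pip install pre-commit
cd lmdeploy
pre-commit install
pre-commit run --all-files

The commands above check every file for formatting and style errors before you commit, so you can fix them locally and submit the fixes together.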

@lvhan028 changed the title from "merge qos feature to sched_main" to "Support QoS in api_server" on Dec 25, 2023
@lvhan028
Collaborator

Consider improving the docstring coverage of the qos_engine module.


self.qos_user_group = QosGroupQueue(self.qos_config)

self.usage_stats = UsageStats(60, 6, 0,
Collaborator

Consider using key=value arguments when instantiating UsageStats, to clarify what 60, 6, 0 mean.

okk
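
The suggestion above can be illustrated with a stand-in class. The real `UsageStats` signature is not shown in this conversation, so the class name `WindowStats` and all three field names below are hypothetical, chosen only to show how keyword arguments document themselves at the call site.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Hypothetical stand-in; NOT the real UsageStats signature."""
    total_duration: int  # assumed meaning: seconds covered by the window
    buckets: int         # assumed meaning: number of buckets in the window
    start_index: int     # assumed meaning: index of the first bucket


# Positional form: the reader has to look up what 60, 6, 0 stand for.
positional = WindowStats(60, 6, 0)

# Keyword form: each value is labeled where the object is created.
keyword = WindowStats(total_duration=60, buckets=6, start_index=0)
```

Both calls build the same object; only the second one tells the reviewer what each number is without opening the class definition.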

def dequeue(self, usage_stats):
    return self.qos_user_group.dequeue(usage_stats)

def stop(self):
Collaborator

When will stop be called?

it will be deleted

self.user_queue_map[user_id] = collections.deque()
self.user_quota_map[user_id] = item['quota_pct'] / total_quota

self.lock = threading.Lock()
Collaborator

self.lock is defined but not used

It'll be deleted
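
For readers following the snippet under review, the per-user setup it shows (one `collections.deque` per user and a quota share of `quota_pct / total_quota`) can be sketched as a standalone function. This is an illustration, not the PR's code: the function name `build_user_maps`, the `id` key, and the error handling are assumptions; only `quota_pct`, `user_queue_map`, `user_quota_map`, and `total_quota` come from the snippet.

```python
import collections


def build_user_maps(users):
    """Return (user_queue_map, user_quota_map) for a list of user configs.

    Each user gets an empty request queue and a quota share normalized
    so that all shares sum to 1.
    """
    total_quota = sum(item["quota_pct"] for item in users)
    if total_quota <= 0:
        raise ValueError("quota_pct values must sum to a positive number")

    user_queue_map = {}
    user_quota_map = {}
    for item in users:
        user_id = item["id"]  # 'id' is an assumed key name
        user_queue_map[user_id] = collections.deque()
        user_quota_map[user_id] = item["quota_pct"] / total_quota
    return user_queue_map, user_quota_map
```

Normalizing by the sum means the configured percentages do not need to add up to 100; quotas of 30 and 10 yield shares of 0.75 and 0.25.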

Collaborator

@AllentDan left a comment

LGTM once the remaining comments are resolved.

docs/en/qos.md (outdated, resolved)
@sallyjunjun force-pushed the qos_feature branch 4 times, most recently from 661d436 to 270e92d, on December 27, 2023 06:54
@lvhan028 merged commit ddfa8c4 into InternLM:main on Dec 27, 2023
3 of 5 checks passed
5 participants