OPEA Triage Tool for ChatQnA #1185

Open
wants to merge 9 commits into
base: main

louie-tsai
Collaborator

@louie-tsai louie-tsai commented Nov 23, 2024

Description

To help customers debug OPEA issues, this PR adds a Triage Tool that runs tests and gathers the information needed for debugging.
It mainly targets Docker Compose scenarios.

This first draft covers ChatQnA only.

How to Use:
Run from the root folder of OPEA Examples.
Xeon:
python .Triage.py ChatQnA/tests/ChatQnA_Xeon.json
Gaudi:
python .Triage.py ChatQnA/tests/ChatQnA_Gaudi.json
Only the JSON file needs to change to target a different architecture such as Gaudi.

The Triage Tool runs some simple tests, including:

  1. health_check
  2. microservice tests
  3. statistics
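As an illustration of the first step, a health check over Docker Compose services can be as simple as probing each published port. The sketch below is not the code in this PR; the `services` list layout, the `host` and `port` keys, and the function names are assumptions made for the example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def health_check(services, probe=port_is_open):
    """Map each service name to whether its port accepts connections.

    `services` is a hypothetical list of dicts with "name", "port", and an
    optional "host" key; the real JSON schema in this PR may differ.
    """
    return {
        svc["name"]: probe(svc.get("host", "localhost"), svc["port"])
        for svc in services
    }
```

The `probe` parameter is only there so the check can be exercised without a running deployment; a real health check would more likely call each microservice's own health endpoint.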

The following information is gathered after the tests above:

  1. system info
  2. env variable info
  3. all docker logs for microservices
  4. (optional) profiling results, such as vLLM PyTorch profiling
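For readers who want a feel for what gathering this information involves, here is a minimal sketch using only the Python standard library and the `docker logs` CLI command. The function names, the selection of fields, and the masking of sensitive-looking variables are assumptions for illustration, not a description of what Triage.py does.

```python
import os
import platform
import subprocess

# Assumption: variables whose names contain these markers should not be
# written to a report in clear text.
SENSITIVE_MARKERS = ("TOKEN", "KEY", "PASSWORD", "SECRET")


def collect_system_info() -> dict:
    """Collect a few basic facts about the host."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }


def collect_env(environ=None) -> dict:
    """Copy environment variables, masking values that look sensitive."""
    environ = os.environ if environ is None else environ
    return {
        name: "***" if any(m in name.upper() for m in SENSITIVE_MARKERS) else value
        for name, value in environ.items()
    }


def collect_docker_logs(container: str, tail: int = 200) -> str:
    """Return the last `tail` log lines of a container, or an error note."""
    try:
        result = subprocess.run(
            ["docker", "logs", "--tail", str(tail), container],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"<could not read logs for {container}: {exc}>"
    # Containers may write to either stream, so keep both.
    return result.stdout + result.stderr
```

Masking is worth considering here because the report is meant to be shared with maintainers, and environment dumps often contain access tokens.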

Report:
The plan is to produce an HTML report.
Right now we have:

  1. simple console output
  2. a simple HTML report with all Docker logs embedded
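One way to embed logs in a single HTML file is to escape each log and wrap it in a collapsible `<details>` element. This is a sketch of that idea under assumed names (`render_report`, a `logs` dict keyed by service name); the report generated by this PR may be structured differently.

```python
import html


def render_report(title: str, logs: dict) -> str:
    """Build a standalone HTML page with one collapsible section per log."""
    sections = []
    for name, text in logs.items():
        # Escape both the name and the log text so raw log content cannot
        # break the page markup.
        sections.append(
            f"<details><summary>{html.escape(name)}</summary>"
            f"<pre>{html.escape(text)}</pre></details>"
        )
    body = "\n".join(sections)
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{safe_title}</title></head>\n"
        f"<body><h1>{safe_title}</h1>\n{body}\n</body></html>\n"
    )
```

Keeping everything in one file means the report can be zipped and attached to an issue without any extra assets.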

Console output
Below are screenshots of the console output.

Xeon
image

Gaudi
image

HTML report with all Docker logs embedded and the profile log for vLLM

Xeon
image

Gaudi
image

Other info:

  1. All input, output, and port configurations live in the data.json file, so configuration and data changes require no code changes.
  2. Each example will be implemented as a separate Python class.
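The two points above (configuration in JSON, one class per example) could fit together roughly as follows. Everything here is hypothetical: the `example` and `services` keys, the class names, and the registry are assumptions for the sake of the sketch, not the schema or class layout used in this PR.

```python
import json
from pathlib import Path


class ExampleTriage:
    """Base class that holds the parsed configuration for one example."""

    name = "base"

    def __init__(self, config: dict):
        self.config = config

    def service_ports(self) -> dict:
        """Return a name-to-port mapping read from the configuration."""
        return {s["name"]: s["port"] for s in self.config.get("services", [])}


class ChatQnATriage(ExampleTriage):
    """Example-specific checks would be added here."""

    name = "ChatQnA"


# Adding a new example means adding a class and listing it here.
REGISTRY = {cls.name: cls for cls in (ChatQnATriage,)}


def load_triage(config_path) -> ExampleTriage:
    """Read a JSON config file and return the matching triage object."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    example = config["example"]
    try:
        cls = REGISTRY[example]
    except KeyError:
        raise ValueError(f"no triage class registered for {example!r}") from None
    return cls(config)
```

With this layout, switching between Xeon and Gaudi only changes which JSON file is passed in, while the class for the example stays the same.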

ToDo:

  1. improve the HTML report
  2. make sure all microservices are tested on both Xeon and Gaudi
  3. move files into right folder
  4. apply to all examples

Issues

n/a.

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

NA

Tests

Manual Testing

vllm profiling result from Triage Tool

image

@louie-tsai
Collaborator Author

a sample report zip file
ChatQnA_Xeon.json_11-25_06-34.zip

@louie-tsai
Collaborator Author

[ToDo]: decide whether we need to debug RESTful API requests here or use telemetry instead

ChatQnA/tests/Triage.py: 17 review threads, outdated and resolved

@alexsin368 alexsin368 left a comment

Remove any commented out code

@louie-tsai louie-tsai requested a review from hshen14 December 10, 2024 00:34
@louie-tsai louie-tsai changed the title [WIP] First Draft of OPEA Triage Tool OPEA Triage Tool for ChatQnA Dec 10, 2024
@xiguiw
Collaborator

xiguiw commented Dec 23, 2024

@louie-tsai
This feature is useful!
Will you check the CI failure?
I will push the development team to review it.

Labels
None yet
Projects
Status: In progress
3 participants