[BUG] ai.onnx.ml uses older version of onnx runtime, failing to deploy "llama 3.2 1b instruct" in OpenSearch 2.18 #3204

Open
maxlepikhin opened this issue Nov 6, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@maxlepikhin
Contributor

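What is the bug?
"Llama 3.2 1b instruct" in ONNX format fails to deploy in OpenSearch 2.18. The DJL library (0.28.0) uses an older ONNX Runtime that officially supports the ai.onnx.ml domain only up to opset 3, while the model is stamped with opset 5. See the log below.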

[2024-11-06T14:20:26,260][ERROR][o.o.m.e.a.DLModel        ] Failed to deploy model fOtVA5MB-tMBOFEUsTd7
ai.djl.MalformedModelException: ONNX Model cannot be loaded
	at ai.djl.onnxruntime.engine.OrtModel.load(OrtModel.java:90) ~[onnxruntime-engine-0.28.0.jar:?]
	at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:166) ~[api-0.28.0.jar:?]
	at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:174) ~[api-0.28.0.jar:?]
	at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:217) ~[opensearch-ml-algorithms-2.18.0.0.jar:?]
	at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:286) [opensearch-ml-algorithms-2.18.0.0.jar:?]
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) [?:?]
	at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:252) [opensearch-ml-algorithms-2.18.0.0.jar:?]
	at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:142) [opensearch-ml-algorithms-2.18.0.0.jar:?]
	at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:139) [opensearch-ml-algorithms-2.18.0.0.jar:?]
	at org.opensearch.ml.model.MLModelManager.lambda$deployModel$55(MLModelManager.java:1119) [opensearch-ml-2.18.0.0.jar:2.18.0.0]
	at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.18.0.jar:2.18.0]
	at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$76(MLModelManager.java:1745) [opensearch-ml-2.18.0.0.jar:2.18.0.0]
	at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.18.0.jar:2.18.0]
	at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.18.0.jar:2.18.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005) [opensearch-2.18.0.jar:2.18.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.18.0.jar:2.18.0]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: ai.onnxruntime.OrtException: Error code - ORT_FAIL - message: Load model from /home/max/search/opensearch/opensearch-2.18.0/data/ml_cache/models_cache/models/fOtVA5MB-tMBOFEUsTd7/3/llama-3.2-1b_onnx/llama-3.2-1b_onnx.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model_load_utils.h:46 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::basic_string<char>, int>&, const onnxruntime::logging::Logger&, bool, const string&, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 5 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx.ml is till opset 3.
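
The mismatch reported in the `Caused by` line can be pulled out mechanically. The following is only a rough sketch: the helper `parse_opset_mismatch` and the `OpsetMismatch` type are hypothetical names, and the regular expressions assume the message wording stays exactly as in the log above.

```python
import re
from typing import NamedTuple, Optional


class OpsetMismatch(NamedTuple):
    """Hypothetical container for the values named in the ORT message."""
    domain: str
    requested: int
    supported: int


# Patterns assume the exact wording of the message in the log above.
_REQUESTED = re.compile(r"Opset (\d+) is under development")
_SUPPORTED = re.compile(r"support for domain (\S+) is till opset (\d+)")


def parse_opset_mismatch(message: str) -> Optional[OpsetMismatch]:
    """Return the domain and opset numbers, or None if the wording differs."""
    requested = _REQUESTED.search(message)
    supported = _SUPPORTED.search(message)
    if requested is None or supported is None:
        return None
    return OpsetMismatch(
        domain=supported.group(1),
        requested=int(requested.group(1)),
        supported=int(supported.group(2)),
    )


# Relevant excerpt of the message from the log above.
message = (
    "Opset 5 is under development and support for this is limited. "
    "Current official support for domain ai.onnx.ml is till opset 3."
)
mismatch = parse_opset_mismatch(message)
print(mismatch)
```

Here the requested opset (5) is higher than the highest officially supported one (3) for the `ai.onnx.ml` domain, which is why the load is rejected.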

How can one reproduce the bug?

  1. Export the Llama model to ONNX with optimum.
  2. ZIP the model directory.
  3. Register the model with a POST register call to OpenSearch (make sure the build includes the fix for #3197, "[BUG] FileUtils.splitFileIntoChunks casts file size to int resulting in negative number for files > 2GB").
  4. Issue the POST deploy call and observe the failure after a minute or so.
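
Step 2 can be sketched as below. This is a minimal sketch, not the exact procedure used for this report: `zip_model_dir` is a hypothetical helper, placing the files at the archive root is an assumption about the expected layout, and the returned SHA-256 digest is included on the assumption that the register call asks for a content hash of the ZIP (the `model_content_hash_value` field, as far as I recall).

```python
import hashlib
import zipfile
from pathlib import Path


def zip_model_dir(model_dir: Path, zip_path: Path) -> str:
    """Zip every file under model_dir and return the SHA-256 of the archive.

    zip_path should live outside model_dir, otherwise the archive would
    try to include itself.
    """
    # allowZip64 matters here because the exported model can exceed 2 GB.
    with zipfile.ZipFile(
        zip_path, "w", compression=zipfile.ZIP_DEFLATED, allowZip64=True
    ) as archive:
        for file in sorted(model_dir.rglob("*")):
            if file.is_file():
                # Store paths relative to the model directory (assumed layout).
                archive.write(file, arcname=file.relative_to(model_dir).as_posix())

    # Hash in chunks so a multi-gigabyte archive is never loaded into memory.
    digest = hashlib.sha256()
    with zip_path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example call (paths are placeholders):
# sha256 = zip_model_dir(Path("llama-3.2-1b_onnx"), Path("llama-3.2-1b_onnx.zip"))
```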

What is the expected behavior?
The deploy call succeeds.

What is your host/environment?
Ubuntu 24.04

Do you have any screenshots?
N/A

Do you have any additional context?
N/A

maxlepikhin added the bug and untriaged labels Nov 6, 2024
mingshl removed the untriaged label Nov 19, 2024
@mingshl
Collaborator

mingshl commented Nov 19, 2024

Hi, for local models, this is the list of model types that ml-commons currently supports:

https://github.com/opensearch-project/ml-commons/blob/4f21953157cd4e04672e034dab5c9b401a2c07a2/common/src/main/java/org/opensearch/ml/common/FunctionName.java#L16C2-L34C15

Currently, this is not a supported model type. Please create a new issue if you want to introduce this new model type. Thanks.

mingshl moved this to In Progress in ml-commons projects Nov 19, 2024