
Should SKLearn operators be assumed to produce a single output? #656

Open
stillmatic opened this issue Nov 25, 2022 · 7 comments

Comments

@stillmatic
Contributor

See https://github.com/microsoft/hummingbird/blob/main/hummingbird/ml/_parse.py#L256

Consider models that implement both predict and predict_proba functions. These return both labels and probabilities as outputs. The current logic means that we cannot name the outputs in the hummingbird conversion step (i.e., with the output_names argument to extra_config) and instead have to perform some ONNX graph surgery afterwards.

@stillmatic
Contributor Author

took an extremely silly hard-coded approach here: stillmatic@0a2fc96

@interesaaat
Collaborator

Hey, let me take a stab at this. It requires several changes to make it generic, but I agree that it could be the source of the error behind #676.

@interesaaat
Collaborator

Of course, unless you want to try to fix this yourself!

@stillmatic
Contributor Author

yeah, the generic case is quite tricky; hard-coding it is much easier ;)

tried to do a cleaner implementation by adding XGBClassifier to the supported sklearn operator map, but that probably requires changes to onnxmltools upstream. for right now, happy with stillmatic@0a2fc96 for myself; it's probably too hacky to upstream though.

I don't think it's the problem behind #676: after debugging more, that problem appears to be specifically in the GEMM -> ONNX conversion.

@interesaaat
Collaborator

@stillmatic can you check if this repo will solve your problem?

@stillmatic
Contributor Author

interesting, this definitely looks like it is properly generating the two output variables. on one of my internal models, I'm seeing this output:

In [10]: hmclf.model.graph.output
Out[10]:
[name: "variable1"
type {
  tensor_type {
    elem_type: 7
    shape {
      dim {
        dim_param: "sym"
      }
    }
  }
}
, name: "variable2"
type {
  tensor_type {
    elem_type: 11
    shape {
      dim {
        dim_param: "sym"
      }
      dim {
        dim_param: "Concatvariable2_dim_1"
      }
    }
  }
}
]

though I can't generate a reproducible example. it doesn't seem to affect prediction, and ONNX is happy with it. let me dig a bit more over the next few days!

@stillmatic
Contributor Author

@interesaaat -- sorry for the delayed response. I have tested your branch with internal models and everything seems to be working, so I think it could be upstreamed. Thanks for the help!
