
Should SKLearn operators be assumed to produce a single output? #656

Open
stillmatic opened this issue Nov 25, 2022 · 7 comments

Comments

@stillmatic
Contributor

See https://github.com/microsoft/hummingbird/blob/main/hummingbird/ml/_parse.py#L256

Consider models that implement both predict and predict_proba functions. These return both labels and probabilities as outputs. The current logic means that we cannot name the outputs in the hummingbird conversion step (i.e., with the output_names argument to extra_config) and instead have to perform some ONNX graph surgery afterwards.

@stillmatic
Contributor Author

took an extremely silly hard-coded approach here: stillmatic@0a2fc96

@interesaaat
Collaborator

Hey, let me take a stab at this. It requires several changes to make it generic, but I agree that it could be the source of the error behind #676.

@interesaaat
Collaborator

Of course, unless you want to try to fix this yourself!

@stillmatic
Contributor Author

yeah, the generic case is quite tricky; hard-coding it is much easier ;)

tried to do a cleaner implementation by adding XGBClassifier to the supported sklearn operator map, but that probably requires changes to onnxmltools upstream. for right now, happy with stillmatic@0a2fc96 for myself; it's probably too hacky to upstream though.

I don't think it's the problem behind #676: after debugging more, that problem appears to be specifically in the GEMM -> ONNX conversion.

@interesaaat
Collaborator

@stillmatic can you check if this repo will solve your problem?

@stillmatic
Contributor Author

interesting, this definitely looks like it is properly generating the two output variables. on one of my internal models, I'm seeing this output:

In [10]: hmclf.model.graph.output
Out[10]:
[name: "variable1"
type {
  tensor_type {
    elem_type: 7
    shape {
      dim {
        dim_param: "sym"
      }
    }
  }
}
, name: "variable2"
type {
  tensor_type {
    elem_type: 11
    shape {
      dim {
        dim_param: "sym"
      }
      dim {
        dim_param: "Concatvariable2_dim_1"
      }
    }
  }
}
]

though I can't generate a reproducible example. it doesn't seem to affect prediction, and ONNX is happy with it. let me dig a bit more over the next few days!

@stillmatic
Contributor Author

@interesaaat -- sorry for the delayed response. I have tested your branch with internal models and everything seems to be working, so I think it could be upstreamed. Thanks for the help!
