Using Encoder.oneHot like scikit-learn LabelBinarizer #28

Open
CaptainDario opened this issue Feb 10, 2021 · 7 comments

@CaptainDario

CaptainDario commented Feb 10, 2021

First of all, thanks for this nice package! Sadly, I am already stuck at the beginning.

I am trying to use Encoder.oneHot like the LabelBinarizer from scikit-learn, but I am not sure how to achieve that, or whether it is even possible.

What I want is basically this:

from sklearn.preprocessing import LabelBinarizer

# create a one-hot encoder for my labels
y = ["a", "b", "c", ...]   # the labels I want to one-hot encode
lb = LabelBinarizer()
lb.fit(y)
o_y = lb.transform(y)

# inference of CNN
...

# use the encoder on a prediction of a CNN to get the label (string) of the class
prediction = lb.inverse_transform(predicted)

Encoder.oneHot forces me to provide a DataFrame instance to the constructor. However, it is not clear to me from the README what that DataFrame should look like (also, could you please update the link to the Black Friday data set?).

Your help would be highly appreciated!
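
The round trip described above (fit on labels, one-hot encode, then map a prediction back to its label) can be sketched in plain Python. This is only an illustration under stated assumptions: SimpleLabelBinarizer is a hypothetical stand-in, not scikit-learn's class and not part of this package, and unlike scikit-learn it always emits one column per class, even for two classes.

```python
class SimpleLabelBinarizer:
    """Hypothetical minimal label binarizer (not scikit-learn's implementation)."""

    def fit(self, labels):
        # Sorted unique labels define the column order of the encoding.
        self.classes_ = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        # One row per label, with a single 1 in that label's column.
        rows = []
        for label in labels:
            row = [0] * len(self.classes_)
            row[self._index[label]] = 1
            rows.append(row)
        return rows

    def inverse_transform(self, rows):
        # Take the column with the largest score, so raw network outputs
        # (not only exact 0/1 vectors) can be decoded as well.
        return [
            self.classes_[max(range(len(row)), key=row.__getitem__)]
            for row in rows
        ]


lb = SimpleLabelBinarizer().fit(["a", "b", "c", "d"])
encoded = lb.transform(["a", "d"])
decoded = lb.inverse_transform([[0.1, 0.2, 0.1, 0.6]])
```

Decoding by largest score rather than by exact match is a design choice that suits classifier outputs, which are rarely exact 0/1 vectors.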

@gyrdym
Owner

gyrdym commented Feb 11, 2021

@CaptainDario Thank you for creating the issue! Indeed, the README says too little about encoding. I'd recommend looking at the live example. Although a different encoder is used there, the key idea is the same: encoders from this lib infer labels from the provided data on their own, which is why you need to provide the data first (as a DataFrame). I think it would be a good idea to add the ability to provide labels directly to encoders; I'll consider this in future updates of the lib.
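
The idea that an encoder infers its labels from the data it is given can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names infer_categories and one_hot_column, the "label" column name, and the first-appearance ordering of categories are all assumptions.

```python
def infer_categories(rows, feature_name):
    """Collect the distinct values of one column from a header-first table."""
    header, *data = rows
    column = header.index(feature_name)
    categories = []
    for row in data:
        value = row[column]
        if value not in categories:  # keep first-appearance order (assumption)
            categories.append(value)
    return categories


def one_hot_column(rows, feature_name):
    """One-hot encode a column using categories inferred from the same data."""
    categories = infer_categories(rows, feature_name)
    column = rows[0].index(feature_name)
    return [
        [1 if row[column] == category else 0 for category in categories]
        for row in rows[1:]
    ]


# The first row is the header, the remaining rows are observations.
table = [["label"], ["a"], ["b"], ["c"], ["a"]]
categories = infer_categories(table, "label")
encoded = one_hot_column(table, "label")
```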

@gyrdym
Owner

gyrdym commented Feb 11, 2021

@CaptainDario And regarding the additional info in the README: I got your point, it really needs some words on encoding, and I'll fix the link.

@CaptainDario
Author

Thank you for your quick help.

If I understand that right, I need to create a DataFrame with a feature containing all my values, like this:

DataFrame([
["My Feature"],
["a"],
["b"],
["c"],
...,
["z"]
])

and then the created encoder will be able to convert new instances back to the label, right?

@CaptainDario
Author

CaptainDario commented Feb 11, 2021

Okay, I tried the above approach and it seems to be working.
However, the application crashes if the optional parameter featureNames is not given. Maybe it would be good to encode all labels/features if the parameter is unset.

But does an encoder provide a method to reverse the one-hot encoding? Something like unprocess, which takes a DataFrame like this:

final dataFrame = DataFrame([
    ["character"], ["a"], ["b"], ["c"], ["d"],
  ]);
final encoder = Encoder.oneHot(dataFrame, featureNames: ["character"]);

final prediction = DataFrame([[0], [0], [0], [1]]);
final decoded = encoder.unprocess(prediction);

decoded would then contain the value "d".
That would be really helpful.

@gyrdym
Owner

gyrdym commented Feb 11, 2021

@CaptainDario Thank you very much for such valuable feedback. I'll consider adding this functionality to the lib. Do you have any other problems with the package?

@CaptainDario
Author

Otherwise, the package seems to be doing exactly what I want. Thank you!
Because I need something like an unprocess method to make progress with my app, I will try to implement it for Encoder.oneHot.
Do you think encoder_impl.dart would be a suitable place to add unprocess?

@gyrdym
Owner

gyrdym commented Feb 12, 2021

@CaptainDario I need to think it over; unprocess sounds a bit unclear to me.
