add fix_encoding to preprocessing #129
Comments
This is certainly useful, I'm just not sure how common these errors are. I assume we would not put it into the standard Then the only case this would be used is if a user notices he has this encoding error in his Series. Would he then not just google the problem, land on StackOverflow, import ftfy and fix it himself? I guess I'm just not really seeing when a user would look for a texthero function to do this. The only exception I can see is if these errors are much more common than I think, but I'm not sure.
Agree with @henrifroese. The way we would implement this is by simply calling

@cedricconol, if you believe this function might be useful to many people, you could write a blog article on the subject. The idea would be to load a dataset, explain the problem, and show the code that fixes it. I'm closing this for now, as the idea is to prioritize (see #85).
Thanks for your feedback @henrifroese and @jbesomi.
I think it would be nice to have a `fix_encoding` function in preprocessing to fix bad encoding in input text. We can build this using ftfy. Examples from ftfy's readme:
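To make concrete what such a function would do, here is a minimal stdlib-only sketch of the single most common repair ftfy performs: undoing UTF-8 text that was mis-decoded as Windows-1252 (so "café" shows up as "cafÃ©"). This is a simplified stand-in chosen for illustration, not ftfy's algorithm. ftfy also handles other codecs, multiple layers of mojibake, HTML entities and more. The name `fix_encoding` is the one proposed in this issue; its body here is hypothetical.

```python
def fix_encoding(text: str) -> str:
    """Undo one layer of UTF-8 text that was wrongly decoded as Windows-1252.

    Hypothetical helper: a simplified stand-in for ftfy.fix_text, which
    covers far more cases than this single round-trip heuristic.
    """
    try:
        # Recover the raw bytes the text was mis-decoded from,
        # then decode them as the UTF-8 they originally were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside cp1252, or bytes that are not valid UTF-8:
        # assume the text is not mojibake and return it unchanged.
        return text


print(fix_encoding("cafÃ©"))           # café
print(fix_encoding("âœ” No problems"))  # ✔ No problems
print(fix_encoding("café"))            # café (already correct, left alone)
```

Since texthero functions take a pandas Series, this would be applied element-wise, e.g. `s.apply(fix_encoding)`. The ftfy-backed version proposed here would simply be `s.apply(ftfy.fix_text)`. Note that the heuristic above misses mojibake involving the few bytes Python's `cp1252` codec leaves undefined (such as 0x81), which ftfy handles.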