Key Error in f2i Dict #105

Closed
zeinab-sheikhi opened this issue Apr 4, 2024 · 3 comments
Labels: bug, question

Comments

@zeinab-sheikhi

Hi,

I recently came across an issue while working with your library's Latin example. Upon inspection, I noticed that the test dataset used in the example is actually part of the training dataset.

Additionally, I attempted to apply your library to my custom language dataset, where the test dataset is distinctly separate from the training dataset. However, during implementation, I encountered a "KeyError in f2i" for the test dataset. This error indicates that some trigrams in the test dataset are not present in the f2i (feature-to-index) mapping.
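The failure mode described above can be reproduced with a minimal Python sketch. This is not the library's code: the helpers `trigrams` and `build_f2i`, the `#` boundary marker, and the toy word lists are all assumptions made for illustration. It only shows why a mapping built from training data alone raises a `KeyError` when a test word contains an unseen trigram.

```python
def trigrams(word):
    """Return the letter trigrams of a word padded with '#' boundary markers."""
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_f2i(words):
    """Map each distinct trigram to an index, in order of first appearance."""
    f2i = {}
    for word in words:
        for tri in trigrams(word):
            f2i.setdefault(tri, len(f2i))
    return f2i


# Hypothetical toy data: the test word shares only one trigram with training.
train_words = ["vocat", "vocas"]
test_words = ["clamat"]

# The mapping is built from the training words only.
f2i = build_f2i(train_words)

missing = None
try:
    test_columns = [f2i[tri] for tri in trigrams(test_words[0])]
except KeyError as err:
    # The first trigram that never occurred in training triggers the error.
    missing = err.args[0]
    print(f"trigram {missing!r} is missing from f2i")
```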

Could you please provide guidance on how to handle this scenario? It seems crucial for the library to support cases where the test dataset contains trigrams not present in the f2i mapping.

Thank you for your attention to this matter.

@MariaHei
Collaborator

MariaHei commented Apr 8, 2024

Hi @zeinab-sheikhi,

thanks for raising this issue! I'll have a look into the Latin example.

Regarding your issue, there is a function that deals with exactly that: make_combined_cue_matrix (see documentation here). You have to provide both the training and the validation data to the function, and it will make sure that the C matrix also has columns for any trigrams that occur only in the validation data.
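The idea behind a combined cue matrix can be sketched in plain Python. This is an illustration of the concept only, not the library's implementation or the signature of make_combined_cue_matrix: the helper names (`trigrams`, `build_f2i`, `cue_matrix`), the `#` boundary marker, and the toy word lists are assumptions. The feature-to-index mapping is built from training and validation words together, so both matrices share the same columns.

```python
def trigrams(word):
    """Return the letter trigrams of a word padded with '#' boundary markers."""
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_f2i(words):
    """Map each distinct trigram to a column index, in order of first appearance."""
    f2i = {}
    for word in words:
        for tri in trigrams(word):
            f2i.setdefault(tri, len(f2i))
    return f2i


def cue_matrix(words, f2i):
    """Build a binary word-by-trigram matrix as a list of rows."""
    rows = []
    for word in words:
        row = [0] * len(f2i)
        for tri in trigrams(word):
            row[f2i[tri]] = 1
        rows.append(row)
    return rows


# Hypothetical toy data.
train_words = ["vocat", "vocas"]
val_words = ["clamat"]

# Build one shared mapping from BOTH datasets, then one matrix per dataset.
f2i = build_f2i(train_words + val_words)
C_train = cue_matrix(train_words, f2i)
C_val = cue_matrix(val_words, f2i)
```

In this sketch, columns for trigrams that occur only in the validation words are all zeros in `C_train`, and looking up a validation trigram no longer fails.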

Hope that helps!
Maria

MariaHei added the bug and question labels on Apr 8, 2024
@MariaHei
Collaborator

MariaHei commented Apr 8, 2024

The Latin issue is now fixed in both the README and the documentation.

@MariaHei
Collaborator

MariaHei commented Jul 3, 2024

Closing this because the issue seems to be fixed. Please let me know in case you still have any questions.

MariaHei closed this as completed on Jul 3, 2024