I recently came across an issue while working with your library's Latin example. Upon inspection, I noticed that the test dataset used in the example is actually part of the training dataset.
Additionally, I attempted to apply your library to my custom language dataset, where the test dataset is distinctly separate from the training dataset. However, during implementation, I encountered a "KeyError in f2i" for the test dataset. This error indicates that some trigrams in the test dataset are not present in the f2i (feature-to-index) mapping.
Could you please provide guidance on how to handle this scenario? It seems crucial for the library to support cases where the test dataset contains trigrams not present in the f2i mapping.
Thank you for your attention to this matter.
Thanks for raising this issue! I'll look into the Latin example.
Regarding your KeyError, the make_combined_cue_matrix function handles exactly this case (see documentation here). You provide both the training and the validation data to the function, and it makes sure that the C matrix also has columns for any trigrams that occur only in the validation data.
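To illustrate why the combined approach avoids the KeyError, here is a minimal plain-Python sketch of the idea. It is not the library's code: the helper names (trigrams, build_f2i, cue_matrix), the "#" boundary marker, and the toy word lists are all assumptions made for illustration. It only shows that an f2i mapping built from the training words alone fails on unseen trigrams, while a mapping built from training and validation words together gives both cue matrices the same columns.

```python
def trigrams(word, boundary="#"):
    """Split a word into letter trigrams, padded with a boundary marker (assumed)."""
    padded = f"{boundary}{word}{boundary}"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_f2i(words):
    """Map each distinct trigram to a column index, in order of first appearance."""
    f2i = {}
    for word in words:
        for gram in trigrams(word):
            f2i.setdefault(gram, len(f2i))
    return f2i


def cue_matrix(words, f2i):
    """Binary word-by-trigram matrix; raises KeyError for a trigram missing from f2i."""
    rows = []
    for word in words:
        row = [0] * len(f2i)
        for gram in trigrams(word):
            row[f2i[gram]] = 1
        rows.append(row)
    return rows


# Toy data (hypothetical): the validation word shares only one trigram with training.
train_words = ["vocat", "vocas"]
val_words = ["clamat"]

# Mapping built from training data only: the validation word has unseen trigrams.
f2i_train = build_f2i(train_words)
try:
    cue_matrix(val_words, f2i_train)
    failed = False
except KeyError:
    failed = True

# Mapping built from both datasets: every validation trigram has a column.
f2i_combined = build_f2i(train_words + val_words)
C_train = cue_matrix(train_words, f2i_combined)
C_val = cue_matrix(val_words, f2i_combined)
```

In this sketch, the columns for trigrams that occur only in the validation data are all zero in C_train, so they are present in the matrix but carry no information learned from the training words.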