[FIX] Import documents: normalize imported text and file names #568

Merged
merged 1 commit into from
Sep 10, 2020

PrimozGodec
Collaborator

Issue

File names (and potentially text too) can contain characters written in decomposed form (for example, č stored as the character c followed by a separate combining caron). This causes problems when filtering documents, because the user types č as a single precomposed Unicode character, which does not match the decomposed sequence.
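
To illustrate the mismatch, here is a minimal sketch using only the Python standard library. The file name and query below are made up for the example and are not taken from the widget's code.

```python
import unicodedata

precomposed = "\u010d"   # "č" as a single code point
decomposed = "c\u030c"   # "c" followed by a combining caron

# Both render the same on screen, but they are different strings.
print(unicodedata.name(precomposed))       # LATIN SMALL LETTER C WITH CARON
print(unicodedata.name(decomposed[1]))     # COMBINING CARON
print(precomposed == decomposed)           # False
print(len(precomposed), len(decomposed))   # 1 2

# Hypothetical file name stored in decomposed form
filename = "poro" + decomposed + "ilo.txt"
# Filter string as a user would typically type it (precomposed)
query = "poro" + precomposed + "ilo"

# A plain substring filter misses the document.
print(query in filename)                   # False
```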

Description of changes

With this PR, imported text and file names are normalized (all decomposed characters are converted to their precomposed forms) before they are output from the widget.
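
As a rough sketch of this kind of normalization (not necessarily the exact code in this PR), Python's `unicodedata.normalize` with the `"NFC"` form composes decomposed sequences. The helper name `normalize_nfc` and the sample documents are assumptions made for the example.

```python
import unicodedata


def normalize_nfc(value: str) -> str:
    """Return `value` with decomposed sequences replaced by precomposed characters."""
    return unicodedata.normalize("NFC", value)


# Hypothetical imported documents as (file name, text) pairs in decomposed form
documents = [("c\u030cebela.txt", "C\u030cebela leti.")]

# Normalize both the file name and the text before passing them on
normalized = [(normalize_nfc(name), normalize_nfc(text)) for name, text in documents]

name, text = normalized[0]
print(name == "\u010debela.txt")   # True: "č" is now a single code point
print("\u010c" in text)            # True: "Č" is now a single code point
```

After normalization, a filter string typed with precomposed characters matches the file name and text as expected.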

Includes
  • Code changes
  • Tests
  • Documentation

@codecov-commenter

Codecov Report

Merging #568 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #568   +/-   ##
=======================================
  Coverage   73.80%   73.81%           
=======================================
  Files          66       66           
  Lines        7464     7465    +1     
  Branches     1000     1000           
=======================================
+ Hits         5509     5510    +1     
  Misses       1744     1744           
  Partials      211      211           

@ajdapretnar ajdapretnar merged commit 3f7526b into biolab:master Sep 10, 2020
@PrimozGodec PrimozGodec deleted the unicode-normalize branch March 29, 2023 10:37