Carefulcase eats words it can't generate #35

unhammer · 2018-10-25T09:31:16Z

If the dictionary has

<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
 <alphabet/>
 <sdefs>
   <sdef n="n"/>
   <sdef n="m"/>
   <sdef n="pl"/>
   <sdef n="def"/>
 </sdefs>
 <section id="main" type="standard">

<e><p><l>kakene</l><r>kake<s n="n"/><s n="m"/><s n="pl"/><s n="def"/></r></p></e>

<e><p><l>pc-ane</l><r>pc<s n="n"/><s n="m"/><s n="pl"/><s n="def"/></r></p></e>
<e><p><l>PC-ane</l><r>PC<s n="n"/><s n="m"/><s n="pl"/><s n="def"/></r></p></e>

 </section>
</dictionary>

then we get

$ echo '^kake<n><m><pl><def>$ ^KAKE<n><m><pl><def>$ ^kake<n><m><pl><def>$'|lt-proc -C nob.autogen.bin 
kakene  kakene

I would like it to just fall back to "normal" generation for words it can't find the exact case for, i.e.

$ echo '^kake<n><m><pl><def>$ ^KAKE<n><m><pl><def>$ ^kake<n><m><pl><def>$'|lt-proc -C nob.autogen.bin 
kakene KAKENE kakene

while still retaining the -C functionality for words that do have an exact-case match:

$ echo '^PC<n><m><pl><def>$ ^pc<n><m><pl><def>$' | lt-proc -C nob.autogen.bin
PC-ane pc-ane
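A minimal sketch of the requested fallback, written in Python rather than lttoolbox's own C++. The compiled transducer is replaced by a plain dict holding the three entries above, and `careful_generate`, `copy_case` and `split_analysis` are hypothetical names, not lttoolbox functions. What matters is the order of attempts: an exact-case match first, then the lowercased lemma with the input's case copied back onto the output, and only then a `#` generation error.

```python
import re

# Hypothetical stand-in for nob.autogen.bin: analysis -> surface form,
# holding the same three entries as the example .dix.
GENERATOR = {
    "kake<n><m><pl><def>": "kakene",
    "pc<n><m><pl><def>": "pc-ane",
    "PC<n><m><pl><def>": "PC-ane",
}


def split_analysis(analysis):
    """Split 'KAKE<n><m><pl><def>' into ('KAKE', '<n><m><pl><def>')."""
    m = re.match(r"([^<]*)(.*)", analysis)
    return m.group(1), m.group(2)


def copy_case(surface, lemma):
    """Simplified case copying (an assumption, not lt-proc's exact rule)."""
    if len(lemma) > 1 and lemma.isupper():
        return surface.upper()  # KAKE -> KAKENE
    if lemma[:1].isupper():
        return surface[:1].upper() + surface[1:]  # Kake -> Kakene
    return surface


def careful_generate(analysis):
    lemma, tags = split_analysis(analysis)
    # 1. An exact-case match wins, so PC and pc keep their distinct forms.
    if analysis in GENERATOR:
        return GENERATOR[analysis]
    # 2. Fallback: look up the lowercased lemma, then re-apply the input's case.
    lowered = lemma.lower() + tags
    if lowered in GENERATOR:
        return copy_case(GENERATOR[lowered], lemma)
    # 3. Still nothing: emit a generation error instead of dropping the word.
    return "#" + lemma


for a in ["kake<n><m><pl><def>", "KAKE<n><m><pl><def>",
          "PC<n><m><pl><def>", "pc<n><m><pl><def>"]:
    print(careful_generate(a))
```

Run as-is, it prints `kakene`, `KAKENE`, `PC-ane` and `pc-ane`, one per line. The `copy_case` rule is a deliberately small approximation; the point is only that the fallback reuses ordinary case copying instead of eating the word.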

jimregan · 2018-10-28T15:03:54Z

I lost my laptop three weeks ago, so it'll be a while before I can look at this.


unhammer · 2018-10-28T16:27:37Z

ouch :((

…nowns fix #34 (Carefulcase option -C not compatible with -g), but #35 (Carefulcase eats words it can't generate) still doesn't work if we get started on an ambiguous path

unhammer · 2018-11-17T13:04:47Z

I added some tests in fd6e6dc – it turns out to be problematic if we start generating ^KAKE<n><f><pl><def>$ and see a possible path that starts ^K but then only ends up in other analyses (e.g. ^KK<np>$). Then we end up with #KAKE where we should have tried a lowercased analysis.

But if there are no such garden paths, ^KAKE<n><f><pl><def>$ does get generated; see the difference between the two test .dix files added in fd6e6dc#diff-839e968af7bf80a08ea4d97247cbe7fdR1
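A toy model that reproduces the garden-path symptom. It assumes (this is an assumption, not a reading of lt-proc's source) that the walk decides case symbol by symbol: take the exact-case transition if one exists, otherwise the lowercased one, and never reconsider. With a `KK<np>` entry in the dictionary, `K` is accepted exactly, the following `A` has nowhere to go, and generation fails as `#KAKE`; without that entry the same greedy walk lowercases `K` and succeeds. The backtracking variant retries the lowercase branch when the exact-case branch dies. `Node`, `build_trie`, `greedy_generate` and `backtracking_generate` are hypothetical names, and re-applying the input's case to the lowercase result is left out.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    surface: Optional[str] = None  # set on the node that completes an analysis


def tokenize(analysis):
    # One symbol per lemma character, one per tag such as <n>.
    return re.findall(r"<[^>]+>|.", analysis)


def build_trie(entries):
    """Hypothetical stand-in for the compiled generator transducer."""
    root = Node()
    for analysis, surface in entries.items():
        node = root
        for tok in tokenize(analysis):
            node = node.children.setdefault(tok, Node())
        node.surface = surface
    return root


def greedy_generate(root, analysis):
    """Commit to the exact-case symbol whenever one exists; never go back."""
    node = root
    for tok in tokenize(analysis):
        if tok in node.children:
            node = node.children[tok]
        elif tok.lower() in node.children:
            node = node.children[tok.lower()]
        else:
            return None  # dead end, would surface as #KAKE
    return node.surface


def backtracking_generate(node, tokens, i=0):
    """Prefer the exact-case symbol, but retry lowercase if that path dies."""
    if i == len(tokens):
        return node.surface
    # dict.fromkeys keeps order and drops the duplicate when tok == tok.lower()
    for cand in dict.fromkeys([tokens[i], tokens[i].lower()]):
        child = node.children.get(cand)
        if child is not None:
            result = backtracking_generate(child, tokens, i + 1)
            if result is not None:
                return result
    return None


with_garden = build_trie({"kake<n><f><pl><def>": "kakene", "KK<np>": "KK"})
query = "KAKE<n><f><pl><def>"
print(greedy_generate(with_garden, query))                  # None
print(backtracking_generate(with_garden, tokenize(query)))  # kakene
```

Per-symbol backtracking can blow up exponentially on long, heavily ambiguous input. Under the same model, a cheaper alternative would be one complete exact-case lookup followed, only if it yields nothing, by one fully lowercased lookup.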

unhammer · 2023-04-20T11:00:02Z

@mr-martian Do you think this is solvable? I'd love to have a solution for this (but in bilingual mode, lt-proc -b), so that I can e.g. have a dix with

<e>       <re>[a-zA-Z]+</re><p><l></l><r><s n="np"/></r></p></e>
<e>       <i>med</i>        <p><l></l><r><s n="pr"/></r></p></e>

and get

$ echo '^Med<pr>$ ^AbCd<np>$' |lt-proc -C -b nob-nno.autogen.bin
^Med<pr>/Med$ ^AbCd<np>/AbCd$

Currently, we can get either the one or the other:

$ echo '^Med<pr>$ ^AbCd<np>$' |lt-proc  -C tmp.bin # eats Med
 AbCd

$ echo '^Med<pr>$ ^AbCd<np>$' |lt-proc  -b tmp.bin # includes extra "Abcd"
^Med<pr>/Med$ ^AbCd<np>/AbCd/Abcd$

$ echo '^Med<pr>$ ^AbCd<np>$' |lt-proc  -c -g tmp.bin # fails to generate Med since lemma is lowercase
#Med AbCd
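A sketch of the bilingual behaviour asked for here, under a similar simplification: the two dix entries become Python predicates that translate a lemma to itself, and `ENTRIES`, `lookup`, `copy_case` and `careful_bilingual` are hypothetical names, not lttoolbox APIs. The lowercased lemma is only consulted when the exact-case lookup returns nothing. That is what keeps `Abcd` out of the `AbCd<np>` result while still letting `Med<pr>` fall back to `med`.

```python
import re

# Hypothetical stand-in for the two dix entries: (lemma matcher, tag);
# both entries translate the lemma to itself.
ENTRIES = [
    (lambda lemma: re.fullmatch(r"[a-zA-Z]+", lemma) is not None, "<np>"),
    (lambda lemma: lemma == "med", "<pr>"),
]


def lookup(lemma, tags):
    return [lemma for matches, tag in ENTRIES if tag == tags and matches(lemma)]


def copy_case(surface, lemma):
    """Simplified case copying (an assumption, not lt-proc's exact rule)."""
    if len(lemma) > 1 and lemma.isupper():
        return surface.upper()
    if lemma[:1].isupper():
        return surface[:1].upper() + surface[1:]
    return surface


def careful_bilingual(lu):
    """'^Med<pr>$' -> '^Med<pr>/Med$': exact case first, lowercase fallback."""
    m = re.fullmatch(r"\^([^<]*)(.*)\$", lu)
    lemma, tags = m.group(1), m.group(2)
    translations = lookup(lemma, tags)
    if not translations:
        # Only now try the lowercased lemma, so AbCd<np> never also yields Abcd.
        translations = [copy_case(t, lemma) for t in lookup(lemma.lower(), tags)]
    if not translations:
        # Assumed '@' marker for an untranslated word.
        return f"^{lemma}{tags}/@{lemma}{tags}$"
    return "^" + lemma + tags + "".join("/" + t for t in translations) + "$"


print(careful_bilingual("^Med<pr>$"), careful_bilingual("^AbCd<np>$"))
```

This prints `^Med<pr>/Med$ ^AbCd<np>/AbCd$`, the output requested above.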

Possibly related to #167


unhammer assigned jimregan Oct 25, 2018

unhammer added a commit to apertium/apertium-nno-nob that referenced this issue Apr 22, 2023

namemerge: capital initial letter, so we don't generate Abcd/abcd

7370f5b

cf. apertium/lttoolbox#35 (comment)

unhammer added a commit to apertium/apertium-nno-nob that referenced this issue Apr 26, 2023

namemerge: capital initial letter, so we don't generate Abcd/abcd

90865d1

cf. apertium/lttoolbox#35 (comment)

unhammer added a commit to apertium/apertium-nno-nob that referenced this issue Jun 3, 2023

namemerge: capital initial letter, so we don't generate Abcd/abcd

b179309

cf. apertium/lttoolbox#35 (comment)
