lt-proc -g -b should output @ symbol when there are unconsumed tags #182

Open
unhammer opened this issue Apr 4, 2024 · 0 comments
unhammer commented Apr 4, 2024

With a regular bidix lookup (lt-proc -b), unconsumed tags are simply copied over to the output, which is what we want:

$ echo '^kake<n><m><unconsumed>$' |lt-proc -b nob-nno.autobil.bin
^kake<n><m><unconsumed>/kake<n><f><unconsumed>$

With regular generation (lt-proc -g), unconsumed tags lead to #-marks:

$ echo '^kake<n><f><sg><ind><unconsumed>$' |lt-proc -g nob-nno.autogen.bin
#kake

$ echo '^kake<n><f><sg><ind><unconsumed>$' |lt-proc --debugged-gen nob-nno.autogen.bin
#kake\<n\>\<f\>\<sg\>\<ind\>

But when using lt-proc in bilingual mode on a generator, we get the unconsumed tag without any debug symbol:

$ echo '^kake<n><f><sg><ind><unconsumed>$' |lt-proc -g -b nob-nno.autogen.bin
^kake<n><f><sg><ind><unconsumed>/kake<unconsumed>$

(while completely-unmatched words do get a @)

This can lead to hard-to-debug issues when there is a partial match: after the subsequent cg-proc step we see only the lemma, as if it were the generated form, with no hint that the analysis was not found in the generator.

Ideally, when the -b switch is given after -g (or -d), we would get an @ whenever there are unconsumed input tags. Note that we don't want an @ merely because there are tags in the output, e.g.

$ echo '^lykke<n><f><sg><ind>$' |lt-proc -g -b nob-nno.autogen.bin
^lykke<n><f><sg><ind>/lykke/lukke<v:lykke_lukke.vok-y2u>$

is still correct: the whole input is consumed and nothing is left over, even though there is a tag in the output. But for the partial match we want

$ echo '^kake<n><f><sg><ind><unconsumed>$' |lt-proc -g -b nob-nno.autogen.bin
^kake<n><f><sg><ind><unconsumed>/@kake$

and perhaps

$ echo '^kake<n><f><sg><ind><unconsumed>$' |lt-proc -d -b nob-nno.autogen.bin
^kake<n><f><sg><ind><unconsumed>/@kake\<unconsumed\>$

(though the details of -g vs -d are less important than just having the @ in there)
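The requested behaviour can be summarised as a small decision rule. The sketch below is only a model of the proposal, not how lt-proc is implemented: the function `render_bilingual_generation` and its `consumed` parameter (how many leading input tags the generator matched) are hypothetical names introduced for illustration.

```python
def render_bilingual_generation(lemma, tags, consumed, forms, debug=False):
    """Render one lexical unit as proposed for `-g -b` (or `-d -b` if debug).

    lemma    -- input lemma, e.g. "kake"
    tags     -- input tags without angle brackets
    consumed -- number of leading input tags the generator matched
                (hypothetical model of the transducer's progress)
    forms    -- generated readings, used only when all tags were consumed
    debug    -- True models the suggested `-d -b` variant
    """
    source = lemma + "".join(f"<{t}>" for t in tags)
    leftover = tags[consumed:]
    if not leftover:
        # Whole input consumed: pass the output through, tags and all.
        target = "/".join(forms)
    else:
        # Unconsumed input tags: mark the unit with @ instead of
        # silently copying the leftovers.
        target = "@" + lemma
        if debug:
            target += "".join(f"\\<{t}\\>" for t in leftover)
    return f"^{source}/{target}$"


if __name__ == "__main__":
    kake = ["n", "f", "sg", "ind", "unconsumed"]
    print(render_bilingual_generation("kake", kake, 4, ["kake"]))
    print(render_bilingual_generation("kake", kake, 4, ["kake"], debug=True))
    print(render_bilingual_generation(
        "lykke", ["n", "f", "sg", "ind"], 4,
        ["lykke", "lukke<v:lykke_lukke.vok-y2u>"]))
```

The key point of the rule is that the @ depends only on leftover input tags, never on whether the output happens to contain tags.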
