Convert all Unicode to Ascii #34

GeorgeR227 · 2024-06-06T22:38:21Z

In Decapodes specifically, but perhaps more generally as well, all code that relies on symbol-name matching should match against the ASCII version of that symbol name. Unicode names can still be supported, but under the hood they should be converted to ASCII when needed.

This makes the underlying code more maintainable, since we only ever need to worry about the ASCII name. Making the ASCII name the base also makes the code more accessible to users.

jpfairbanks · 2024-06-06T23:34:56Z

Having better ways to centralize the Unicode and ASCII aliases would be good. But we can't just always convert to one or the other. If users wrote it in Unicode, they will want to see Unicode labels on plots and other output.

GeorgeR227 · 2024-06-06T23:58:03Z

Yeah, I understand, which is why this conversion won't happen directly on user-facing things, like a user-made ACSet, but rather on back-end copies of that ACSet or through on-the-spot conversions.

For example, type inference currently looks for multiple variants of each operator name:

DiagrammaticEquations.jl/src/deca/deca_acset.jl, lines 9 to 14 in fa4fe1e:

    (src_type = :Form0, tgt_type = :Form0, op_names = [:∂ₜ,:dt]),
    (src_type = :Form1, tgt_type = :Form1, op_names = [:∂ₜ,:dt]),
    # Rules for d
    (src_type = :Form0, tgt_type = :Form1, op_names = [:d, :d₀]),
    (src_type = :DualForm0, tgt_type = :DualForm1, op_names = [:d, :dual_d₀, :d̃₀]),

The idea would be to instead convert these names, inside the type-inference function itself and hidden from the user, to an ASCII base name, and then have these rules match only on that base name.
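A minimal Julia sketch of that idea, assuming a single hypothetical alias table (`OP_ALIASES`) and hypothetical helpers `canonical` and `infer_tgt`; none of these are existing DiagrammaticEquations.jl functions. Only the `∂ₜ`/`dt` and `d`/`d₀` rules from the snippet are shown. The dual `d` rule is left out, since its variants (`dual_d₀`, `d̃₀`) would need their own ASCII base name.

```julia
# Hypothetical alias table: maps each accepted spelling of an operator
# to a single ASCII base name. Names chosen for illustration only.
const OP_ALIASES = Dict{Symbol,Symbol}(
    :∂ₜ => :dt,
    :d₀ => :d,
)

# Canonicalize a name: aliases map to their base, anything else is kept as-is.
canonical(op::Symbol) = get(OP_ALIASES, op, op)

# Rules now list one ASCII base name instead of every spelling.
const RULES = [
    (src_type = :Form0, tgt_type = :Form0, op_name = :dt),
    (src_type = :Form1, tgt_type = :Form1, op_name = :dt),
    (src_type = :Form0, tgt_type = :Form1, op_name = :d),
]

# Return the target type inferred for `op` applied to a `src`-typed argument,
# or `nothing` if no rule matches. Only a temporary canonical copy of the
# name is used; the user's original symbol is never modified.
function infer_tgt(op::Symbol, src::Symbol)
    base = canonical(op)
    for r in RULES
        if r.op_name == base && r.src_type == src
            return r.tgt_type
        end
    end
    return nothing
end
```

Because the canonical name is computed on the fly, user-facing ACSets keep their Unicode labels, and the alias table is the one place that needs updating when a new spelling is added.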

lukem12345 · 2024-06-07T00:10:37Z

Yeah, we can come up with some useful wrappers around symbols, tagging them as "USER_INPUTTED" and that kind of thing. For recognized symbols (or some distinct class of symbol wrappers), we can emit different names at to_graphviz time vs. colanguage time vs. gensim time.
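One possible shape for such a wrapper, sketched in Julia. `TaggedName`, `emit_name`, and the stage symbols are hypothetical. The sketch assumes diagram output (to_graphviz time) should show the user's spelling while generated simulation code (gensim time) uses the ASCII base name; colanguage time is left out because its naming needs aren't specified here.

```julia
# Hypothetical wrapper: keeps the name exactly as the user typed it,
# alongside an ASCII base name used internally. Fields are illustrative.
struct TaggedName
    user::Symbol      # spelling the user wrote, e.g. :∂ₜ
    base::Symbol      # ASCII base name used for matching, e.g. :dt
    origin::Symbol    # provenance tag, e.g. :USER_INPUTTED
end

# Choose which spelling to emit for a given output stage.
# Assumption: diagrams show the user's spelling, generated code uses ASCII.
function emit_name(n::TaggedName, stage::Symbol)
    if stage === :to_graphviz
        return n.user
    elseif stage === :gensim
        return n.base
    else
        throw(ArgumentError("unknown stage: $stage"))
    end
end
```

For example, `emit_name(TaggedName(:∂ₜ, :dt, :USER_INPUTTED), :gensim)` returns `:dt`, while the same wrapper emits `:∂ₜ` for `:to_graphviz`.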

lukem12345 · 2024-06-07T00:15:45Z

A sufficiently advanced version of this feature could track the lifetime of a symbol behind the scenes, storing information such as:

"This variable was originally from the Foo decapode with name Bar. It was composed with Biz from Baz and called Buzz."

Or

"This variable was created by calling expand_operators on Baz and called Biz, then composed with Bar from the Foo decapode."

i.e., some of the tools needed to provide a "stacktrace" or other debugging information.
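A rough Julia sketch of such provenance tracking, assuming each rename is logged as an event on the variable. `ProvEvent`, `TrackedVar`, `record!`, and `trace` are hypothetical names, not existing Decapodes functions.

```julia
# Hypothetical provenance event: what happened, in what context,
# and the variable's name after the event.
struct ProvEvent
    action::String   # e.g. "originated in", "composed with"
    detail::String   # e.g. "the Foo decapode"
    name::Symbol     # variable name after this event
end

# A variable together with its full renaming history.
mutable struct TrackedVar
    name::Symbol
    history::Vector{ProvEvent}
end

# Start tracking a variable at its point of origin.
TrackedVar(name::Symbol, source::String) =
    TrackedVar(name, [ProvEvent("originated in", source, name)])

# Record a renaming step (e.g. composition or operator expansion).
function record!(v::TrackedVar, action::String, detail::String, newname::Symbol)
    push!(v.history, ProvEvent(action, detail, newname))
    v.name = newname
    return v
end

# Render the history as a simple "stacktrace"-style list of lines.
trace(v::TrackedVar) = ["$(e.action) $(e.detail) as $(e.name)" for e in v.history]
```

Replaying the first example: after `v = TrackedVar(:Bar, "the Foo decapode")` and `record!(v, "composed with", "Biz from Baz", :Buzz)`, `trace(v)` returns `["originated in the Foo decapode as Bar", "composed with Biz from Baz as Buzz"]`.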

GeorgeR227 mentioned this issue on Jun 25, 2024 in "Add operator name canonicalization #54" (Open).
