Convert all Unicode to Ascii #34

Open
GeorgeR227 opened this issue Jun 6, 2024 · 4 comments

@GeorgeR227
Contributor

In Decapodes specifically, but perhaps more generally as well, all code that relies on symbol name matching should match against the ASCII version of that symbol name. Unicode names can still be supported, but under the hood they should be converted to ASCII when needed.

Doing this makes the underlying code more maintainable, since we only ever need to worry about the ASCII name, and having the ASCII name as the base makes the code more accessible to users.

@jpfairbanks
Member

Having better ways to centralize the Unicode and ASCII aliases would be good. But we can't just always convert to one or the other. If users wrote it in Unicode, then they will want to see output with Unicode labels on plots and so on.

@GeorgeR227
Contributor Author

Yeah, I understand, which is why this conversion won't happen directly on user-facing things, like a user-made ACSet, but either on back-end copies of that ACSet or through on-the-spot conversions.

So, for example, when we do type inference, what we do now is look for multiple variations of an operator name:

(src_type = :Form0, tgt_type = :Form0, op_names = [:∂ₜ,:dt]),
(src_type = :Form1, tgt_type = :Form1, op_names = [:∂ₜ,:dt]),
# Rules for d
(src_type = :Form0, tgt_type = :Form1, op_names = [:d, :d₀]),
(src_type = :DualForm0, tgt_type = :DualForm1, op_names = [:d, :dual_d₀, :d̃₀]),

The idea would be to instead convert these names to a base name, represented in ASCII, within the type-inference function itself (so hidden from the user), and then have these rules match only on that base name.
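A minimal sketch of that idea, assuming a single alias table that maps each accepted spelling to one ASCII base name. The names `OP_ALIASES`, `canonical_name`, `RULES`, and `infer_tgt` are hypothetical and not part of Decapodes, and the ASCII base names (`:d0`, `:dual_d0`) are chosen only for illustration. The overloaded name `:d`, which the rules above resolve by source type, is left out for brevity.

```julia
# Hypothetical alias table: each accepted spelling maps to one ASCII base name.
# The spellings are taken from the rules quoted above; the base names are assumed.
const OP_ALIASES = Dict{Symbol,Symbol}(
    :∂ₜ => :dt,
    :d₀ => :d0,
    :d̃₀ => :dual_d0,
    :dual_d₀ => :dual_d0,
)

# Return the ASCII base name, leaving unrecognized names unchanged.
canonical_name(op::Symbol) = get(OP_ALIASES, op, op)

# Each rule lists one base name instead of every variant.
const RULES = [
    (src_type = :Form0, tgt_type = :Form0, op_name = :dt),
    (src_type = :Form1, tgt_type = :Form1, op_name = :dt),
    (src_type = :Form0, tgt_type = :Form1, op_name = :d0),
    (src_type = :DualForm0, tgt_type = :DualForm1, op_name = :dual_d0),
]

# Look up the target type for an operator applied to a source type.
# Canonicalization happens here, so the caller's symbol is never rewritten.
function infer_tgt(op::Symbol, src_type::Symbol)
    base = canonical_name(op)
    for rule in RULES
        if rule.op_name == base && rule.src_type == src_type
            return rule.tgt_type
        end
    end
    return nothing  # no rule matched
end
```

With this shape, `infer_tgt(:∂ₜ, :Form0)` and `infer_tgt(:dt, :Form0)` hit the same rule, while the symbol stored in the user's ACSet stays as written.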

@lukem12345
Member

lukem12345 commented Jun 7, 2024

Yeah we can come up with some useful wrappers around symbols, tagging them as "USER_INPUTTED" and that kind of thing. For recognized symbols (or some distinct class of symbol wrappers) we can emit different names at to_graphviz time vs. colanguage time vs. gensim time.
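One way such a wrapper might look, as a sketch only: the struct `TaggedSymbol`, the function `emit_name`, and the stage names `:display` and `:codegen` are hypothetical placeholders, not existing Decapodes API. The wrapper keeps the user's spelling next to an ASCII base name and picks between them depending on where the name is emitted.

```julia
# Hypothetical wrapper: keeps the spelling the user typed alongside an ASCII
# base name and a tag recording where the symbol came from.
struct TaggedSymbol
    user_name::Symbol   # as written by the user, e.g. :∂ₜ
    base_name::Symbol   # ASCII name used for internal matching, e.g. :dt
    origin::Symbol      # e.g. :USER_INPUTTED or :GENERATED
end

# Pick a name for a given stage. The stage names are placeholders.
function emit_name(s::TaggedSymbol, stage::Symbol)
    if stage == :display
        # Plots and diagrams show what the user wrote.
        return s.user_name
    elseif stage == :codegen
        # Generated code and rule matching use the ASCII base name.
        return s.base_name
    else
        throw(ArgumentError("unknown stage: $stage"))
    end
end
```

For example, `emit_name(TaggedSymbol(:∂ₜ, :dt, :USER_INPUTTED), :display)` returns the Unicode name, and the same call with `:codegen` returns `:dt`.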

@lukem12345
Member

A sufficiently advanced feature would be able to track the lifetime of a symbol behind the scenes, storing information such as:

"This variable was originally from the Foo decapode with name Bar. It was composed with Biz from Baz and called Buzz."

Or

"This variable was created by calling expand_operators on Baz and called Biz, then composed with Bar from the Foo decapode."

i.e., some of the tools needed to provide a "stacktrace" or debugging information.
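A rough sketch of what that bookkeeping could look like, assuming each symbol carries an append-only list of events. The types `ProvenanceEvent` and `TrackedSymbol` and the functions `track`, `record`, and `trace` are hypothetical, as are the action names.

```julia
# Hypothetical provenance record: one event per step in a symbol's history.
struct ProvenanceEvent
    action::Symbol   # e.g. :defined, :composed, :expanded
    model::Symbol    # decapode in which the event took place
    name::Symbol     # name the variable had after the event
end

struct TrackedSymbol
    name::Symbol
    history::Vector{ProvenanceEvent}
end

# Start tracking a variable defined in `model`.
track(name::Symbol, model::Symbol) =
    TrackedSymbol(name, [ProvenanceEvent(:defined, model, name)])

# Record a rename or composition; the earlier record is left untouched.
function record(s::TrackedSymbol, action::Symbol, model::Symbol, new_name::Symbol)
    history = push!(copy(s.history), ProvenanceEvent(action, model, new_name))
    return TrackedSymbol(new_name, history)
end

# Render the history oldest-first, one line per event, like a stack trace.
function trace(s::TrackedSymbol)
    return join(("$(e.action) as $(e.name) in $(e.model)" for e in s.history), "\n")
end
```

The first example above would then be `record(track(:Bar, :Foo), :composed, :Baz, :Buzz)`, and `trace` on the result lists both steps in order.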
