Optionally supported implementation-defined values #707
Comments
Seconded. As an embedded developer I see value in using TOML over other languages (well-defined spec and end-users familiar with INI syntax), but creating a compliant implementation while targeting tiny microcontrollers without even a full C standard library would likely be a no-go. At best the implementation would be fully compliant while allowing the user to opt out of some parts of the spec. Devices I have in mind would likely read that file from a pendrive or an SD card and likely have no graphical interface either. Note that both C and C++ have very relaxed requirements when it comes to these tiny devices. Even |
@jaskij what's the blocker for your use case? If you want to use a well-defined subset, there's nothing stopping you from doing so. It won't be implementing the TOML spec in its entirety, but that's fine given the constraints of the environment. This issue is asking for more flexibility than TOML provides today. Are you asking for more flexibility somehow? |
@pradyunsg I might have worded it badly. For embedded use cases, a strong requirement for datetimes and times of day, which are often not used in those systems, is, if not a blocker, then at least undesired. I'm thinking of devices which have something like 256 KiB total, where every kilobyte is precious. I wouldn't be surprised if users did not want to include date handling code. Making them optional would allow such an implementation to be compliant without undue burden. If I know from the outset that I cannot make my implementation compliant, why even start writing it? And yes, personally, I put great emphasis on allowing an implementation to call itself compliant. In a different use case (this time on Linux) I do use time spans, which this proposal would make implementation-defined. Embedded systems use time spans regularly. I might have been a little misguided earlier, but yes, I do need more flexibility, at least in making timezones and datetimes optional. |
You can put things in strings:
And then the implementation can process that however it likes.
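As a rough illustration of the "put it in a string" approach, the sketch below assumes the TOML parser has already produced a plain dictionary of strings and that the application wants to read one of them as a duration. The `parse_duration` helper, its unit table, and the `timeout` key are hypothetical names chosen for this example; none of them come from the TOML spec or from this thread.

```python
import re

# Hypothetical unit table; the thread does not define one.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Convert a string such as "10ms" or "1.5h" into seconds."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a duration: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit]


# The TOML parser hands back plain strings; the application decides their meaning.
config = {"timeout": "10ms", "name": "sensor-a"}
timeout_seconds = parse_duration(config["timeout"])
```

The parser itself stays fully compliant here, because all interpretation happens in application code after parsing.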
At the very least, it would need a different syntax than
Or maybe using a different key/value separator such as
But then you still have the problem "what kind of value is this?" The only way you know that the In YAML this is fixed by telling it what object type to use:
The above only shows a problem with this: allowing to use any object is also a security risk; parsing random TOML documents is no longer safe (just as parsing random YAML documents isn't, unless you disable this feature which things like Overall, I'm not in favour of this. I would be in favour of adding durations to the specification, but that's a different issue. I might be in favour of adding some functions to the specification (such as btw, what if someone defined a TOML document like this:
That's basically a DoS attack since computing the power of large values is computationally quite expensive. |
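To make the cost concern concrete, here is a minimal sketch of how an application that chose to evaluate such expressions might bound the work before doing it. The `safe_pow` function and its `max_bits` limit are assumptions made for this example; nothing like them is part of TOML or of the proposal.

```python
def safe_pow(base: int, exponent: int, max_bits: int = 4096) -> int:
    """Compute base ** exponent, refusing results that would be very large."""
    if exponent < 0:
        raise ValueError("negative exponents are not supported in this sketch")
    # base ** exponent needs at most exponent * bit_length(base) bits,
    # so the size can be bounded before any expensive arithmetic happens.
    estimated_bits = exponent * max(base.bit_length(), 1)
    if estimated_bits > max_bits:
        raise ValueError(f"result would need about {estimated_bits} bits")
    return base ** exponent
```

A check like this only limits one operation; an application that evaluates arbitrary expressions from a config file would need similar limits everywhere.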
That seems like a very bad idea. Parser implementations should always treat strings as plain text, users and applications should be able to distinguish strings from implementation-defined values.
TOML documents with implementation-defined values could have a separate
That is the point of implementation-defined values. If the TOML implementation does not support them, or does not have them enabled, it should just skip over them or treat the TOML document as invalid. TOML is not a data exchange format; it's mainly intended for configuration. The purpose of the proposal is to provide a forward- and backward-compatible way to satisfy the common need for configuration files to have minimal, readable syntax for application-specific values, while continuing to co-operate with existing and future TOML infrastructure, such as TOML validators, TOML formatters, TOML parsers, TOML writers, TOML editor plugins, etc.
The TOML implementation does not do anything with the value; it simply treats it as a list of tokens and lets the application choose what to do with them. I hope the proposal made it sufficiently clear that the feature is very much intended to be opt-in. If an application decides to define or enable syntax that runs arbitrary computations like this, then that is on them, but at that point they probably have a compelling reason to do so. The same applies to many other uses of implementation-defined values, such as including files, inheriting parts of the TOML document recursively, substituting environment variables and strings, etc. These are all features which quite obviously don't belong in the TOML spec and should not be enabled by default, but which nonetheless are very valuable for many applications of TOML. |
Anything can have a whole lot of different semantic meanings; "plain text" is often not "just plain text" (or "just a number"); for example:
These are all different "types" (regexp, email, directory, file, unix permission bits, monetary unit). I don't really see the difference between treating a string with the semantic meaning of an email address or directory as any different from treating it with the semantic meaning of an expression or something else. |
With all due respect to @yyny, I believe that any considerations of validation, syntax highlighting, security, or even practicality of implementation-specific modifications that fall outside of the TOML standard should be handled solely by the implementing project alone. We should not open ourselves up to these things. Either a TOML implementation is compliant with a given version of TOML, or it is not. Full stop. If they are not compliant, they should describe how they are "nearly compliant" or "almost compliant", but the burden falls on them to describe their variations. They can do so in their READMEs, and many specific implementations already exist with such caveats. You can review different implementations via the wiki. Parser writers have a lot of flexibility to make what suits their needs, and they could even come up with valuable extensions that may ultimately find their way into the TOML standard. But compliance is up to us to define. Non-standard variations are not. They are their creators' burdens first and foremost. No changes need to be made to the standard to express this sentiment. |
I've been busy on and off during the last months writing TOML parsers using different approaches. As such, I would like to propose prefixed strings. They are essentially strings prefixed with an unquoted key, e.g.:
For existing fully compliant parsers, supporting this should be trivial. Note that this is different to just using plain strings, since it gives semantic meaning to the string contents which is useful for e.g. syntax highlighting. Edit: Just noticed #603 which is related I suppose. |
@CheaterCodes Strings are a cheap fallback for many implementation problems. The semantics of your strings ultimately depend upon the application though. Tags like those in #603 could prove useful for microformats or syntax highlighting someday. More work needs to be done on this front though. Since you're writing your own parsers, try tags for yourself like e.g. I don't think something like a As for regexes, just use single-quote literal strings. That's what they're for. I know that you picked
# TOML v1.0.0
time = 10:32:00.555
offset = 1987-07-05 17:45:56+13:00
regex = '[a-zA-Z]+' |
The problem was never with syntax anyway, but with the semantics of it all. What if an implementation doesn't support regular expressions? What should The syntax is unimportant until the semantics can be figured out. |
I guess my explanation just completely missed the point, my bad. So I agree, tags work just fine, syntax doesn't matter. As an alternative, consider tagged strings (with adjusted rules): This provides the (simple) rules for a parser to parse any string, no matter its contents. Tags could then be used to clarify how the content should be handled, e.g. escaped or not, treated as time, etc. Maybe this is already too hard to read to be suitable for TOML; I might consider drafting my own format then. |
The way strings work is not my favourite part of TOML either, but that's not going to change no matter what we decide here, because changing it would break compatibility. Also, datetimes aren't really strings; you can express them as |
Which isn't technically a problem for a potential 2.0 release, but obviously this wouldn't be my choice.
Yes, but it was suggested above that simple parsers don't need to convert it to an actual date-time but can treat it as a string. Even with 4 string types my point stands, but I would still argue that datetimes fall into the same category. |
There are no plans for any v2 release and I would be surprised if there was ever an incompatible release. Might as well just create a new format because that's effectively the same as releasing an incompatible version of an existing format. |
Yes, I understand and somewhat agree. |
If you're writing the parser for your own use, then nothing's preventing you from making it work just for yourself. Make a parser that only recognizes one string syntax and treats all date/time values as strings; write all your TOML configs (which would still be valid TOML) to use that format; and don't rely on excluded data types within your application. Your parser won't be fully TOML-compliant, but you can clearly say how it diverges from the standard. That's better than nothing. |
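A minimal sketch of what such a subset parser could look like, under heavy assumptions: it handles only flat `key = value` lines with bare keys, recognizes one string syntax (double quotes, with no escape handling), integers and booleans, and keeps everything else, including dates and times, as raw text. The name `parse_subset` is hypothetical, and this is deliberately not a compliant TOML parser.

```python
def _parse_value(value: str, line_number: int):
    """Interpret one value; anything unrecognized stays a raw string."""
    if value.startswith('"'):
        end = value.find('"', 1)
        if end == -1:
            raise ValueError(f"line {line_number}: unterminated string")
        return value[1:end]  # simplification: no escape sequences
    value = value.split("#", 1)[0].strip()  # drop a trailing comment
    if not value:
        raise ValueError(f"line {line_number}: missing value")
    if value in ("true", "false"):
        return value == "true"
    try:
        return int(value)
    except ValueError:
        return value  # dates, times, floats and so on are kept as text


def parse_subset(text: str) -> dict:
    """Parse flat `key = value` lines from a small TOML subset."""
    result = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"line {line_number}: expected 'key = value'")
        result[key.strip()] = _parse_value(value.strip(), line_number)
    return result
```

Documents written for this parser remain valid TOML as long as they stay inside the subset, which is the property the comment above relies on.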
With all these restrictions, this is strictly less useful than strings handled in a special manner would be. I'd say that expressing rich information of this form as strings is what we want people to do. Putting the description provided into the sort of language that TOML uses in its specification (and describing these rules in the corresponding ABNF) would likely rival the length and complexity of most of the language. I don't think the restricted (conditional) flexibility afforded by this is worth that complexity.
I don't -- at that point, it's effectively admitting that this is a new language with different considerations and guarantees. Given that nothing stops people from defining their own supersets, folks are welcome to define their own supersets already.
This is another indicator that what you're really describing is an optional superset of TOML, and that this isn't strictly needed everywhere. I don't think having conditionally parsable TOML files is a good idea.
This isn't compatible with TOML's design objective of:
I'm going to say that this sort of conditionally-supported undefined behaviour isn't going to be added to TOML, and close this issue. Thanks for making this proposal @yyny -- it's certainly intriguing and, while it doesn't seem like a good fit for TOML, it is appreciated that you put in the time to write this down in the amount of detail that you did. :) |
There have been several proposals to add new types/syntax for TOML values.
I personally don't like adding more types to the core of TOML, even date/time overdid it IMHO (Makes it harder to write a dependency-free parser in low-level languages that don't have a built-in "Date" type), but I do see the need for custom types.
I think there should be an extension to the TOML spec to allow for implementation-defined value syntax, so that's what I'm proposing here.
Proposal
A value which does not have the syntax of any of the currently existing basic types is considered "implementation defined".
Implementation-defined values are defined as follows:
An implementation defined value...
"'{[
" characterstrue
,false
10ms
and100%
can be very useful(
is an error).)
is an error).( { ) }
is an error).print " "
andprint "\x20"
are exactly the same implementation-defined token).#
comments and ignore them (but not the newline after the comment).]}
closing delimiters while inside inline-tables and inline-arrays when not inside unmatched delimiters (e.g.{ name = implementation("defined") }
is a valid inline-table).\
and the character following it separately.\"
quote to prevent starting string.\
is removed in this case is IMPLEMENTATION DEFINED
Additionally:
The current "Date" syntax would be moved from a "required type" to a "recommended implementation-defined type".
This way, implementations that do not care about date values can simply leave them unsupported and throw an error when they encounter one, without breaking TOML compliance.
Other proposed types can also be added as recommendations.
Implementation
Parser implementations that do not need implementation-defined values can simply throw an error when they encounter one.
Additionally, this means that all old TOML parsers would still be standard-compliant.
Parsing libraries that want to allow for implementation-defined values should allow the parser user to register a callback that gets called whenever an implementation-defined value token is recognized. They should get a string (or list of tokens) containing the full implementation-defined token that was parsed, with certain pre-processing as outlined above (e.g. removing leading and trailing spaces). These callbacks can then either return nothing/return an error to signal they do not recognize the string, or an implementation-defined value.
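The callback mechanism described above could be sketched roughly as follows. `ValueRegistry`, `register`, `resolve`, and the `percent` callback are hypothetical names invented for this example and do not belong to any existing TOML library; returning `None` stands in for "return nothing to signal the string is not recognized".

```python
from typing import Callable, List, Optional

Callback = Callable[[str], Optional[object]]


class ValueRegistry:
    """Holds callbacks that are tried in order for each raw value token."""

    def __init__(self) -> None:
        self._callbacks: List[Callback] = []

    def register(self, callback: Callback) -> None:
        self._callbacks.append(callback)

    def resolve(self, raw_token: str) -> object:
        token = raw_token.strip()  # pre-processing: trim surrounding spaces
        for callback in self._callbacks:
            value = callback(token)
            if value is not None:
                return value
        raise ValueError(f"unrecognized implementation-defined value: {token!r}")


def percent(token: str) -> Optional[float]:
    """Recognize values such as 100% and return them as a fraction."""
    if not token.endswith("%"):
        return None
    try:
        return float(token[:-1]) / 100.0
    except ValueError:
        return None


registry = ValueRegistry()
registry.register(percent)
```

A parser that does not support the feature would simply never construct such a registry and would raise an error at the same point instead.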
Parsing libraries may restrict implementation-defined values, for example, only allow values that look like function syntax:
expr(1 + 2) # This may search for a callback registered as "expr"
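One possible way to restrict values to function syntax is sketched below, assuming the parser has already removed the trailing comment and passes only the token text. The `dispatch` function and the `add_integers` callback are hypothetical; the callback only sums integers separated by `+` and is not a general expression evaluator.

```python
import re
from typing import Callable, Dict

_CALL_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\((.*)\)", re.DOTALL)


def dispatch(token: str, callbacks: Dict[str, Callable[[str], object]]) -> object:
    """Look up the callback named before the parentheses and call it."""
    match = _CALL_RE.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not function syntax: {token!r}")
    name, argument_text = match.groups()
    if name not in callbacks:
        raise ValueError(f"no callback registered as {name!r}")
    return callbacks[name](argument_text)


def add_integers(argument_text: str) -> int:
    """Toy callback: sum integers separated by '+'."""
    return sum(int(part) for part in argument_text.split("+"))


callbacks = {"expr": add_integers}
result = dispatch("expr(1 + 2)", callbacks)
```

Values that do not look like a call, or that name an unregistered callback, are rejected rather than guessed at.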
Multi-line values, indentation counting and preprocessing are really nice features, but implementations that do not wish to implement them can ignore them.
Additionally, implementations may wish to preprocess implementation-defined values however they see fit (e.g. remove leading space or ignore spaces completely), including no preprocessing at all.
Syntax Highlighting
The biggest drawback to this proposal is that a lot of syntax highlighters do not yet support this syntax.
Some syntax highlighters are not advanced enough to support some of the syntax proposed here at all (for example, counting leading indentation).
In addition, many current syntax highlighters clearly mark unrecognized tokens as invalid.
This nice feature would no longer be possible in a mix of implementation-defined and implementation-free TOML.
I don't consider this a big problem, since the syntax is implementation-defined by definition, and syntax highlighting should (mostly) be a concern for the implementation, not for the TOML spec.
Examples
List of proposals that propose new types/value syntax:
Glossary
Details
Delimiter
One of "[](){}"
Paired delimiters
One of: "[]", "{}", "()"
Opening Delimiter
One of "[{("
Closing Delimiter
One of "]})"
Matching Delimiter
An opening delimiter followed at some point by the closing delimiter in the same pair, with matching delimiters inside
Key
Standard TOML token
Equal sign
Standard TOML token
Registered callback
A function that gets called by the implementation when an implementation-defined value has been parsed
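The "matching delimiter" definition in the glossary can be checked with a small stack, as in the sketch below. It is a simplification: it looks only at the three delimiter pairs and does not treat delimiters inside quoted strings specially. The function name `delimiters_match` is hypothetical.

```python
PAIRS = {"[": "]", "{": "}", "(": ")"}
CLOSERS = set(PAIRS.values())


def delimiters_match(token: str) -> bool:
    """Return True if every opening delimiter is closed by its own pair, in order."""
    expected_closers = []
    for char in token:
        if char in PAIRS:
            expected_closers.append(PAIRS[char])
        elif char in CLOSERS:
            # A closer must match the most recently opened delimiter.
            if not expected_closers or expected_closers.pop() != char:
                return False
    return not expected_closers
```

Under this check `implementation("defined")` is accepted, while an interleaved sequence such as `( { ) }` is rejected.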