-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
style_guide ideas by KOLANICH #7
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -251,10 +251,9 @@ KS enforces specific identifier style in the language - | |
styles of identifier spelling, like `UpperCamelCase` or | ||
`lowerCamelCase`, which some target languages use). | ||
|
||
KS allows omitting `id`. One MUST omit `id` to mark up attributes of | ||
unknown/undetermined purpose, i.e. unfinished reverse engineering | ||
work. One MUST NOT omit `id` to mark up reserved/unused attributes and | ||
padding, i.e. placeholder that are known to be empty and unused. | ||
KS allows omitting `id`. One MUST NOT omit `id` to mark up reserved/unused | ||
attributes and padding, i.e. placeholder that are known to be empty | ||
and unused. | ||
|
||
One SHOULD use the following rules to maintain consistency across | ||
various KSY files. Doing that would maintain the "principle of least | ||
|
@@ -263,28 +262,37 @@ guesswork. | |
|
||
* For simple non-repeated fields, use a simple singular form — | ||
e.g. `width`, `header`, `transaction_id`, `file`. | ||
* For an array of objects (i.e. with `repeat: something`), use plural | ||
* For a sequence of objects (i.e. with `repeat: something`), use plural | ||
form — e.g. `files`, `transactions`. | ||
* Don't be overly verbose: use commonly understood abbreviations | ||
liberally, if it will improve readability — e.g. `src_mac` or | ||
`src_mac_addr` instead of `source_media_access_control_address` | ||
* Repeated fields which cannot be packed into a sequence should | ||
have `id`s containing a number in the end. Numbers may have a | ||
visually-obvious structure, like "the first digit is major, the second | ||
one is minor". | ||
* For fields that are designed to be used to detect file type (AKA | ||
"magic values"), use `magic` name, or, if there are several of them, | ||
`magic1`, `magic2`, etc. | ||
"magic values"), use `signature` or `magic` name, or, if there are | ||
several of them, like `signature` or `magic0`, `magic1`, etc. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, so, basically:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 1 at KSY author's disposal. I personally prefer "signature" because it clearly states that this field is used for identification. |
||
* For reserved fields which are *known* to be unused, use `reserved` | ||
name (or `reserved1`, `reserved2`, etc, if there are many of them) | ||
name (or `reserved0`, `reserved1`, etc, if there are many of them) | ||
* For fields that designate *number / count* of something (in | ||
particular, number of repetitions of some other structure), use | ||
`num_` prefix and plural form — i.e. `num_questions`, `num_blocks`, | ||
`num_nodes` | ||
particular, number of repetitions of some other structure), use either | ||
`_count` suffix and a plural form, contrary to prescribed by English | ||
grammar, — i.e. `questions_count`, `blocks_count`, `nodes_count` | ||
(mentally replace `_count` by `.count` to understand the logic | ||
behind that). | ||
* For fields that designate *offset* to some particular data structure, | ||
use `ofs_` prefix and name of that data structure (as it would appear | ||
in the file) — i.e. `ofs_block`, `ofs_queries`, `ofs_path` | ||
use `_offset` suffix and name of that data structure (as it would | ||
appear in the file) — i.e. `block_offset`, `queries_offset`, `path_offset` | ||
* For fields that designate *size* of some particular data structure | ||
(in bytes or some other fixed units), use `len_` prefix and name of | ||
that data structure — i.e. `len_block` (length of a single `block` | ||
entry), `len_blocks` (total length of whole `blocks` array, made of | ||
(in bytes or some other fixed units), use `_size` suffix and name of | ||
that data structure — i.e. `block_size` (length of a single `block` | ||
entry), `blocks_size` (total length of whole `blocks` array, made of | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, it boils down to:
In my opinion, "prefix" is better and definitely more widespread, i.e. as in majority of languages that use prefixes to differentiate parameters vs instance members, or Apps variety of Hungarian notation. Prefix does not try to sound like "proper English", and it avoids lots of ambiguilities and spelling problems (like that "count" stuff I've demonstrated above). For example, there are tricky nouns in English (like "water" or "money" or "news" or "advice" or "hair"), which make proper English phrasing difficult. Last, but not least, I definitely prefer shorter form of spelling. Given that we have to make "one size fits all or most" identifiers, we're not going to have full-length Java-style identifiers like "NormalBucketDistributionProgressionOfLongIntegerConstantsForHadoopNodeFactory" — (1) there are languages that have certain ID length limits, (2) a vast majority people I've interviewed do not like such long identifiers very much, (3) it's actually hard to work with them without a IDE with expression autocompletion and stuff — and we don't have one (and I'm not very keen about having IDE as the only way to work with the language). So, all and all, clear 3 character prefix looks like a good trade-off for me — it's non-ambiguous, it makes it very easy to detect whatever this attribute is about by always looking at the beginning, and it fits concise C-like style well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It doesn't mean that we should do this too. The good part in postfix is that since we read left-to-right all the properties of the same member look similar to each other. Especially when aligned. But with prefix they look too differrent. Also take in mind the argument with the point.
Valid.
Neither do I. But In the case of short enough identifiers, which is the majority of them, I prefer to have them readable. It is only a guide, not a strict requirement.
Valid. BTW some text editors, like Notepad++, have autocompletion which autocompletes any tokens. |
||
`block` entries). | ||
* Fields of unknown/undetermined purpose, i.e. unfinished reverse | ||
engineering work SHOULD either NOT HAVE an `id` or HAVE an | ||
`id` matching the `/unkn(?:own)?(_\w+)?\d*/` regular expression. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is, again, not how style guide should be written. It's pointless to leave so much choices to do for ksy developer. It's better to decide on this once and for all, and it will be easy for everyone — to remember, to follow, to understand, to write automated tools for that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 1 Having an |
||
|
||
[NOTE] | ||
See <<transcribing>> for more info on preserving / renaming | ||
|
@@ -367,6 +375,13 @@ here as well. Keys MUST appear in this order: | |
. All other keys (except for `id` and `-orig-id`), in order specified | ||
in <<seq-attr>> | ||
|
||
[[enum-attr]] | ||
== Enum attributes | ||
|
||
* In the case of multiple enums with the same name it's usually | ||
better to append the integer value rather than a sequence | ||
number to its `id`. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had some problems understanding what you've meant here. Probably an example should help. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, that's a good idea. But we should describe it in more detail, with an example, and think of some other choices to be made here. For example, should it be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No |
||
[[transcribing]] | ||
== Transcribing existing specs | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pretty vague and can be interpreted in multiple ways. Probably an example would help. Right now, it's not clear (1) whether
foo_1
orfoo1
fit this, (2) how one starts the numbering and how it should progress, (3) what is "visually-obvious structure".There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 I use
foo0
2,3 at KSY author's disposal. There are differrent situations and we want to be flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, yet again, the whole point of having a standard is not to be flexible, but to be rigid and provide no-brains-needed solutions, which are also easy to code into machine checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanna some checks, IMHO it's better to extend KS language instead of introducing a convention making the names ugly.
These numbers are not for checks, but for developer's convenience.