Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

style_guide ideas by KOLANICH #7

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 31 additions & 16 deletions ksy_style_guide.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -251,10 +251,9 @@ KS enforces specific identifier style in the language -
styles of identifier spelling, like `UpperCamelCase` or
`lowerCamelCase`, which some target languages use).

KS allows omitting `id`. One MUST omit `id` to mark up attributes of
unknown/undetermined purpose, i.e. unfinished reverse engineering
work. One MUST NOT omit `id` to mark up reserved/unused attributes and
padding, i.e. placeholder that are known to be empty and unused.
KS allows omitting `id`. One MUST NOT omit `id` to mark up reserved/unused
attributes and padding, i.e. placeholder that are known to be empty
and unused.

One SHOULD use the following rules to maintain consistency across
various KSY files. Doing that would maintain the "principle of least
Expand All @@ -263,28 +262,37 @@ guesswork.

* For simple non-repeated fields, use a simple singular form —
e.g. `width`, `header`, `transaction_id`, `file`.
* For an array of objects (i.e. with `repeat: something`), use plural
* For a sequence of objects (i.e. with `repeat: something`), use plural
form — e.g. `files`, `transactions`.
* Don't be overly verbose: use commonly understood abbreviations
liberally, if it will improve readability — e.g. `src_mac` or
`src_mac_addr` instead of `source_media_access_control_address`
* Repeated fields which cannot be packed into a sequence should
have `id`s containing a number in the end. Numbers may have a
visually-obvious structure, like "the first digit is major, the second
one is minor".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pretty vague and can be interpreted in multiple ways. Probably an example would help. Right now, it's not clear (1) whether foo_1 or foo1 fit this, (2) how one starts the numbering and how it should progress, (3) what is "visually-obvious structure".

Copy link
Contributor Author

@KOLANICH KOLANICH Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 I use foo0
2,3 at KSY author's disposal. There are differrent situations and we want to be flexible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, yet again, the whole point of having a standard is not to be flexible, but to be rigid and provide no-brains-needed solutions, which are also easy to code into machine checks.

Copy link
Contributor Author

@KOLANICH KOLANICH Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole point of having a standard is not to be flexible, but to be rigid and provide no-brains-needed solutions, which are also easy to code into machine checks.

If we wanna some checks, IMHO it's better to extend KS language instead of introducing a convention making the names ugly.

These numbers are not for checks, but for developer's convenience.

* For fields that are designed to be used to detect file type (AKA
"magic values"), use `magic` name, or, if there are several of them,
`magic1`, `magic2`, etc.
"magic values"), use `signature` or `magic` name, or, if there are
several of them, like `signature` or `magic0`, `magic1`, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so, basically:

  1. You specify "signature or magic", and don't specify any way to choose between these two. Given that its essentially the same stuff, I see no point in separation this into two names, and adding headache for ksy developers to choose between two proposed alternatives.
  2. You want every numbered thing to start counting from 0. Any reasons to do so, i.e. any major examples of someone doing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 at KSY author's disposal. I personally prefer "signature" because it clearly states that this field is used for identification.
2 Just taste. If we start arrays from index 0 in the most of languages why not to follow the same convention here?

* For reserved fields which are *known* to be unused, use `reserved`
name (or `reserved1`, `reserved2`, etc, if there are many of them)
name (or `reserved0`, `reserved1`, etc, if there are many of them)
* For fields that designate *number / count* of something (in
particular, number of repetitions of some other structure), use
`num_` prefix and plural form — i.e. `num_questions`, `num_blocks`,
`num_nodes`
particular, number of repetitions of some other structure), use either
`_count` suffix and a plural form, contrary to prescribed by English
grammar, — i.e. `questions_count`, `blocks_count`, `nodes_count`
(mentally replace `_count` by `.count` to understand the logic
behind that).
* For fields that designate *offset* to some particular data structure,
use `ofs_` prefix and name of that data structure (as it would appear
in the file) — i.e. `ofs_block`, `ofs_queries`, `ofs_path`
use `_offset` suffix and name of that data structure (as it would
appear in the file) — i.e. `block_offset`, `queries_offset`, `path_offset`
* For fields that designate *size* of some particular data structure
(in bytes or some other fixed units), use `len_` prefix and name of
that data structure — i.e. `len_block` (length of a single `block`
entry), `len_blocks` (total length of whole `blocks` array, made of
(in bytes or some other fixed units), use `_size` suffix and name of
that data structure — i.e. `block_size` (length of a single `block`
entry), `blocks_size` (total length of whole `blocks` array, made of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it boils down to:

  1. Prefix vs suffix
  2. Long of short spelling

In my opinion, "prefix" is better and definitely more widespread, i.e. as in majority of languages that use prefixes to differentiate parameters vs instance members, or Apps variety of Hungarian notation. Prefix does not try to sound like "proper English", and it avoids lots of ambiguilities and spelling problems (like that "count" stuff I've demonstrated above). For example, there are tricky nouns in English (like "water" or "money" or "news" or "advice" or "hair"), which make proper English phrasing difficult.

Last, but not least, I definitely prefer shorter form of spelling. Given that we have to make "one size fits all or most" identifiers, we're not going to have full-length Java-style identifiers like "NormalBucketDistributionProgressionOfLongIntegerConstantsForHadoopNodeFactory" — (1) there are languages that have certain ID length limits, (2) a vast majority people I've interviewed do not like such long identifiers very much, (3) it's actually hard to work with them without a IDE with expression autocompletion and stuff — and we don't have one (and I'm not very keen about having IDE as the only way to work with the language).

So, all and all, clear 3 character prefix looks like a good trade-off for me — it's non-ambiguous, it makes it very easy to detect whatever this attribute is about by always looking at the beginning, and it fits concise C-like style well.

Copy link
Contributor Author

@KOLANICH KOLANICH Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, "prefix" is better and definitely more widespread, i.e. as in majority of languages that use prefixes to differentiate parameters vs instance members, or Apps variety of Hungarian notation.

It doesn't mean that we should do this too.

The good part in postfix is that since we read left-to-right all the properties of the same member look similar to each other. Especially when aligned. But with prefix they look too differrent. Also take in mind the argument with the point.

(1) there are languages that have certain ID length limits,

Valid.

(2) a vast majority people I've interviewed do not like such long identifiers very much

Neither do I. But In the case of short enough identifiers, which is the majority of them, I prefer to have them readable. It is only a guide, not a strict requirement.

(3) it's actually hard to work with them without a IDE with expression autocompletion and stuff

Valid. BTW some text editors, like Notepad++, have autocompletion which autocompletes any tokens.

`block` entries).
* Fields of unknown/undetermined purpose, i.e. unfinished reverse
engineering work SHOULD either NOT HAVE an `id` or HAVE an
`id` matching the `/unkn(?:own)?(_\w+)?\d*/` regular expression.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is, again, not how style guide should be written. It's pointless to leave so much choices to do for ksy developer. It's better to decide on this once and for all, and it will be easy for everyone — to remember, to follow, to understand, to write automated tools for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 Having an id for a property of unknown meaning is useful when doing reverse-engineering, especially if you have more than one such a property in the same struct.
2 But if a developer doesn't need that id, why to force him to have it?


[NOTE]
See <<transcribing>> for more info on preserving / renaming
Expand Down Expand Up @@ -367,6 +375,13 @@ here as well. Keys MUST appear in this order:
. All other keys (except for `id` and `-orig-id`), in order specified
in <<seq-attr>>

[[enum-attr]]
== Enum attributes

* In the case of multiple enums with the same name it's usually
better to append the integer value rather than a sequence
number to its `id`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some problems understanding what you've meant here. Probably an example should help.

Copy link
Contributor Author

@KOLANICH KOLANICH Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good idea. But we should describe it in more detail, with an example, and think of some other choices to be made here. For example, should it be unknown31 or unknown_31? unknown_31 or unknown_1f or unknown0x1f?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, should it be unknown31 or unknown_31? unknown_31 or unknown_1f or unknown0x1f?

No 0x, 0o and 0b prefixes, no underscore, the number part is the same as in the left part.

[[transcribing]]
== Transcribing existing specs

Expand Down