
doc: change to hyphenated keys #5909

Open: wants to merge 11 commits into base: main


holmanb (Member) commented Dec 2, 2024

Fixes GH-5555

Proposed Commit Message

doc: change to hyphenated keys

Some words, especially "metadata", are used in multiple
ways to describe different things. Convert them to the
following standard:

user data -> user-data
vendor data -> vendor-data
meta data -> meta-data
instance data -> instance-data

Clarifications:

- "instance-data" describes data which is gathered from the
  IMDS: user-data, vendor-data, and meta-data.

- "metadata" may be used as part of the phrase "instance
  metadata service" (IMDS) to describe the service from
  which instance-data is gathered. Otherwise, avoid the
  word "metadata": it is too abstract to be generally useful
  in cloud-init documentation because it could correctly
  describe almost any data.

This also changes some spellings to US English.

Fixes GH-5555

Additional Context

Fixes #5555

Most of this work was completed with a simple sed script. I reviewed the changes and reverted any errors.
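The sed script itself isn't included in the PR. As a rough, hypothetical sketch of the same kind of substitution (the names `TERMS`, `hyphenate_terms`, and the helpers are illustrative, not from the PR), the mapping from the commit message can be applied in Python with case-insensitive, whitespace-tolerant patterns, so that capitalized forms at the start of a sentence or heading, and terms wrapped across a line break, are converted too:

```python
import re

# Spaced term -> preferred hyphenated term, per the proposed commit message.
TERMS = {
    "user data": "user-data",
    "vendor data": "vendor-data",
    "meta data": "meta-data",
    "instance data": "instance-data",
}


def _compile(spaced: str) -> re.Pattern:
    # Allow any whitespace between the words (including a line break in
    # wrapped reST) and match case-insensitively, so "User data" at the
    # start of a sentence or heading is caught as well.
    words = [re.escape(w) for w in spaced.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


PATTERNS = [(_compile(spaced), hyphenated) for spaced, hyphenated in TERMS.items()]


def _match_case(found: str, replacement: str) -> str:
    # Keep a leading capital: "User data" -> "User-data".
    if found[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def hyphenate_terms(text: str) -> str:
    """Rewrite spaced terms to their hyphenated form (hypothetical helper)."""
    for pattern, hyphenated in PATTERNS:
        text = pattern.sub(lambda m, r=hyphenated: _match_case(m.group(0), r), text)
    return text


print(hyphenate_terms("User data is merged with vendor data."))
# prints: User-data is merged with vendor-data.
```

The word boundaries leave single words such as "metadata" and "datasource" untouched. Like the original script, the output of a pass like this would still need a manual review.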

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires a commit message for each commit, each referencing the pull request number (#<PR_NUM>)

holmanb requested a review from s-makin December 2, 2024 19:21
github-actions bot added the documentation label Dec 2, 2024
holmanb force-pushed the holmanb/hyphenated-keys branch from 12b6c93 to 06556e7 December 2, 2024 22:57
holmanb force-pushed the holmanb/hyphenated-keys branch from 06556e7 to 2ff05c7 December 3, 2024 06:56
s-makin (Contributor) left a comment

Managed to find some not picked up by the regex.

By changing the terms in the URLs, all of the links are now 404ing/broken. You'll probably want to revert those ones at least :)

For simplicity, I've not looked at the rendered preview this round, only what's presented here - but one pattern I noticed is that space-separated, capitalised terms (e.g. in headers and at the start of sentences) didn't seem to be picked up by the regex either. You might want to do a second pass to include those.

I spotted a couple of GB_en spellings as well; I've included those as suggestions.

You may also want to include the context around what we're standardizing on and why in the PR description - I was missing that context as I was going through, so I'm not sure if all my suggestions are actually correct. I'm also still not clear (and actually less clear now) on the difference between meta-data, instance-data, instance-metadata, and instance meta-data (some of these are still used interchangeably).

- **user data**, **vendor data**: Two words, not to be combined or hyphenated.
- **datasource**: One word.
- **user-data**, **vendor-data**, **cloud-config**, **instance-data**: Two
words, not to be combined or hyphenated.

Description doesn't match the change


It's probably worth briefly rewriting this whole paragraph to make sure that it correctly reflects the current situation (i.e., if all terms are to be hyphenated for consistency, does metadata need to be on a separate bullet point?)

doc/rtd/development/summit/2023_summit.rst (outdated, resolved)
doc/rtd/explanation/configuration.rst (outdated, resolved)
doc/rtd/explanation/instancedata.rst (outdated, resolved)
doc/rtd/explanation/instancedata.rst (resolved)
doc/rtd/reference/network-config.rst (outdated, resolved)
doc/rtd/reference/yaml_examples/datasources.rst (outdated, resolved)
doc/rtd/reference/yaml_examples/scripts.rst (outdated, resolved)
doc/rtd/tutorial/qemu.rst (outdated, resolved)
doc/rtd/tutorial/qemu.rst (outdated, resolved)
holmanb (Member Author) commented Dec 3, 2024

Thanks for the review @s-makin :-)

I spotted a couple of GB_en spellings as well, I included those as suggestions.

Nice, thanks!

one pattern I noticed is that space-separated, capitalised terms (e.g. in headers and at the start of sentences) didn't seem to be picked up by the regex either

Thanks for spotting that! I'll re-run the script with that included and check what falls out after that change.

You may also want to include the context around what we're standardizing on and why in the PR description - I was missing that context as I was going through, so I'm not sure if all my suggestions are actually correct.

Will do, thanks!

I'm also still not clear (and actually less clear now) on the difference between meta-data, instance-data, instance-metadata, and instance meta-data (some of these are still used interchangeably)

I'll clarify in the docs, but the tl;dr is that "metadata" is a heavily overused term that I'd like us to avoid as much as possible, for two reasons:

  1. cloud-init docs call multiple specific things "metadata".
  2. The word "metadata" is too abstract to be generally useful. We could call most data used by cloud-init "metadata" and be technically correct, yet the word adds cognitive load because it requires the reader to disambiguate between the specific things (mentioned in 1.) and the general use of the term.

Users might benefit from a page of definitions. Here is a start on the ones relevant to this document (I wouldn't want to document or use the "synonyms"):

| Synonyms currently in use | Preferred word | Definition |
|---|---|---|
| instance metadata server, metadata server | IMDS | a "service" from which user-data, vendor-data, and meta-data are gathered; often an HTTP server |
| metadata | meta-data | cloud-specific data exposed by the IMDS that is not user-data or vendor-data (can include network-data); only one key is common between all clouds |
| instance metadata, instance data | instance-data | a data structure stored as JSON, part of which consists of data gathered from the IMDS |

We could (should?) probably replace all use of the word "metadata" in our docs with the word "data" simply to avoid the need for disambiguation.

This PR aims to standardize the terms in the above table (in addition to vendor-data and user-data).

There is obviously a lesson to be learned from this whole mess about the benefits of selecting good names for core concepts (and using them consistently!).

Thanks again @s-makin for pointing out the confusing bits. Please let me know if this explanation helps. I can add it to the PR message and try to standardize the term instance-data better in the docs.

holmanb (Member Author) commented Dec 3, 2024

@TheRealFalcon could you please review my last comment?

TheRealFalcon (Member) commented:

I think a big part of the problem is that these concepts are taken directly from EC2.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html

"user data" is a thing. "Instance metadata" is a thing. We then layered meta-data on top of that for reasons. Now that we're supporting several more clouds, I think using our own terminology like user-data and instance-data is fine, especially since we're not simply storing all the instance metadata coming from EC2. Your proposed words there look good to me.

I generally prefer "instance metadata server" to IMDS just because it's another acronym that not many people reading our docs will know. If a page uses it a lot, I think writing "instance metadata server (IMDS)" or providing a link on first use on that page is fine, but if a random reference appears out of the blue, I would prefer using the whole "instance metadata server".

holmanb (Member Author) commented Dec 13, 2024

@TheRealFalcon I adjusted the terminology to prefer "instance metadata service", and addressed the remaining comments from @s-makin.

Successfully merging this pull request may close this issue: [docs]: a second look at the spelling of common terms