Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate lang links #6618

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft

Automate lang links #6618

wants to merge 3 commits into from

Conversation

rikivillalba
Copy link
Contributor

In response to #6613

This is my proposal to automate link generation:

  • The logic is in .translation_links.R in the vignettes directory, and it is sourced in each vignette
  • The message "Translations of this document are available in the following languages" is translated in the same source (there is a switch which defaults to english)
  • Currently the output format is en | fr | es | ..., that is, not a list but all inline.

Note that currently, in documents with a TOC (datatable-faq.Rmd) the link list is below the toc in the packaged vignettes. That is unchanged as the TOC is built automatically by commonmark.
Also, #6394 should be considered.

Copy link

codecov bot commented Nov 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.61%. Comparing base (546259d) to head (d2cef04).
Report is 15 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6618      +/-   ##
==========================================
+ Coverage   98.60%   98.61%   +0.01%     
==========================================
  Files          79       79              
  Lines       14518    14559      +41     
==========================================
+ Hits        14316    14358      +42     
+ Misses        202      201       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

block = sprintf(
"%s: %s\n",
switch(lang,
fr = "Des traductions de ce document sont disponibles dans les langues suivantes",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method for defining translations is non-standard. I think this would be better as gettext(...)
or to simplify, just use English. Only list the other languages at the top of the English vignette. and then the other vignettes can just link back to English.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think in gettext however I don't figure out how to get the string for the appropriate language to compile the vignettes in all languages, other than set Sys.setLanguage() in each call.
Another option is to hardcode the header and let the code generate only the links.
But I think your second option is ok.

@rikivillalba
Copy link
Contributor Author

I made the following changes:

  • The script is now is a function to which the "translations available" message is an arg, so it is not harcoded but passed from the vignette text.
  • The message only prints if there are any translations other than current language (as now french version of datatable-joins.Rmd is missing)
  • renamed .translation_links.R to _translation_links.R so ls can see it (it is part of the vignettes after all)
  • NOTE: I manually merge datatable-keys-fast-subset.Rmd, datatable-reference-semantics.Rmd and datatable-secondary-indices-and-auto-indexing.Rmd because Using hyperlinks instead of vignette() calls for readability; doing the same for vignette titles without links for consistency #6617 modified lines adjacent to this branch.

_translation_links.R must be bundled in the package so r CMD check can build the vignettes. However that source is 'knitted' and present in each .R file generated from .Rmd and installed in doc.

French vignettes are no modified at all. An equivalent to "from English original vignette" message linking back to english might be added to translated vignettes, or code similar to that of the english ones, it is up to the owner to decide. In the script case it will add all available links excluding that of the current vignette language determined from the folder name.

As part of spanish translators there is work in progres in https://github.com/cienciadedatos/traduccion-vignettes-datatable
tks

@MichaelChirico MichaelChirico added the translation issues/PRs related to message translation projects label Dec 3, 2024
# build a link list of alternative languages (may be character(0))
.write.translation.links <- function(fmt) {
url = "https://rdatatable.gitlab.io/data.table/articles"
path = dirname(knitr::current_input(TRUE))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a litedown equivalent (#6583)?

Copy link
Contributor Author

@rikivillalba rikivillalba Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only alternative I found (other than searching up the call stack) was litedown:::.env$input, that stores the file name (the input path to litedown::fuse()) during the execution. Tested to work building the package from . and from ..

transl_lang[transl_lang != lang],
file.path(url, sub("(?i)\\.Rmd$", ".html", translation[transl_lang != lang]))
)))
} else ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about leaving unlinked text in for the current language instead?

Available languages: fr | es

vs

Available languages: en | fr | es

* [French](https://rdatatable.gitlab.io/data.table/articles/fr/datatable-benchmarking.html)
```{r echo=FALSE, file='_translation_links.R'}
```
`r .write.translation.links("Translations of this document are available in: %s")`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the idea is the translation team will just translate this string themselves directly in the .Rmd file, is that right? (seems fine, just want to confirm my understanding)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also have this helper script, let's keep a consistent naming style (personally, I like your snake case style here):

https://github.com/Rdatatable/data.table/blob/master/vignettes/.check.translations.R

Also note that this should probably be added to .Rbuildignore? Maybe just ignore all vignettes/_.* files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, adding it to .Rbuildignore causes R check to fail. It checks the vignettes to be able to be built from the tarball bundle alone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh got it. I guess that's why it must be prefixed with _ to not be "hidden", which in turn is why there's no R CMD check failure.

I am only concerned about R CMD check passing, it is fine to include it.

@rikivillalba
Copy link
Contributor Author

This is the language-link printing function, with the propositions, adapted to litedown. Must be after #6583

# build a link list of alternative languages (may be character(0))
# idea is to look like 'Other languages: en | fr | de'
.write.translation.links <- function(fmt) {
  url = "https://rdatatable.gitlab.io/data.table/articles"
  # TODO: better litedown alternative to knitr::current_input()
  # fuse() setwd the .Rmd dir before evaluating chunks
  input_file = basename(litedown:::.env$input)
  path = dirname(normalizePath(input_file, mustWork = TRUE))
  
  if (basename(path) == "vignettes") {
    lang = "en"
  } else {
    lang = basename(path)
    path = dirname(path)
  }
  translation = dir(path,
    recursive = TRUE,
    pattern = glob2rx(input_file)
  )
  transl_lang = ifelse(dirname(translation) == ".", "en", dirname(translation))
  block = if (!all(transl_lang == lang)) {
    sprintf(fmt, paste(collapse = " | ", ifelse(
      transl_lang != lang,
      sprintf("[%s](%s)", transl_lang,  file.path(url, sub("(?i)\\.Rmd$", ".html", translation))),
      transl_lang     # print no link, only language code.
    )))
  } else ""
  block
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
translation issues/PRs related to message translation projects
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants