get_description() and parse_description() assume native encoding #13

Open
bastistician opened this issue Jan 30, 2018 · 2 comments

@bastistician

authors <- as.person(eval(parse(text=description$`Authors@R`)))

Calling parse() on the text from Authors@R fails if that field contains non-ASCII characters and the DESCRIPTION file is not in the native encoding of the system processing the package (here UTF-8). Typical examples are "latin1" packages with accented characters in author names, e.g.:

res <- process_package("https://cran.r-project.org/src/contrib/flexrsurv_1.4.1.tar.gz", "flexrsurv", "cran")
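To illustrate the underlying problem without downloading a package, here is a minimal sketch using base R only. The name "Rémy" and the variable names are made up for the example, and it assumes that read.dcf() hands back the file's bytes without an encoding mark, as described above.

```r
# "Rémy" as latin1 bytes: the accented "é" is the single byte 0xE9.
# This mimics a field read from a latin1 DESCRIPTION (hypothetical name).
raw_name <- rawToChar(as.raw(c(0x52, 0xe9, 0x6d, 0x79)))

# Without a mark, R assumes the bytes are in the native encoding.
Encoding(raw_name)   # "unknown"

# Declaring the true encoding lets R convert the string correctly.
marked <- raw_name
Encoding(marked) <- "latin1"

# enc2utf8() is used here so the result does not depend on the locale.
utf8_name <- enc2utf8(marked)
charToRaw(utf8_name) # 52 c3 a9 6d 79
```

On a UTF-8 system the unmarked string is an invalid byte sequence, which is why the subsequent parse() of Authors@R breaks.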

Proper handling of package descriptions is provided by the desc package. However, a simple fix to just support packages in latin1 encoding in addition to UTF-8 is to mark the Encoding() in get_description() as in utils:::.read_description():

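get_description <- function(pkg_folder) {
  desc_path <- file.path(pkg_folder, "DESCRIPTION")
  out <- read.dcf(desc_path)[1, ]
  # Check that the field exists first: out[["Encoding"]] errors on a
  # character vector when DESCRIPTION has no Encoding field.
  if ("Encoding" %in% names(out) && identical(out[["Encoding"]], "latin1")) {
    Encoding(out) <- "latin1"
  }
  as.list(out)
}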

This might fix datacamp/rdocumentation-app#386.

@filipsch
Contributor

@WastlM interesting! Thanks for the digging and the pointer. @ludov04 or I will have a look asap, hopefully it solves the parsing issues!

@bastistician
Author

Here's a better fix, which converts the input (regardless of its encoding) to the native encoding in case the DESCRIPTION has an Encoding field:

get_description <- function(pkg_folder) {
  desc_path <- file.path(pkg_folder, "DESCRIPTION")
  out <- read.dcf(desc_path)[1L, ]
  if ("Encoding" %in% names(out)) {
    Encoding(out) <- out[["Encoding"]]
    out <- enc2native(out)
  }
  as.list(out)
}

Conversion to the native encoding ensures that the subsequent parsing and evaluation of the Authors@R field works on the system which runs the code. I think that's the way to go!
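A self-contained way to try this out is sketched below. It repeats the function from this comment, writes a tiny latin1 DESCRIPTION into a temporary folder, and then parses Authors@R as the package code does. The package name "latin1pkg" and the author "Rémy Dupont" are invented for the example, and the last step is only expected to work on a system whose native encoding can represent "é" (e.g. UTF-8).

```r
# Copy of the proposed fix, repeated so the example runs on its own.
get_description <- function(pkg_folder) {
  desc_path <- file.path(pkg_folder, "DESCRIPTION")
  out <- read.dcf(desc_path)[1L, ]
  if ("Encoding" %in% names(out)) {
    Encoding(out) <- out[["Encoding"]]
    out <- enc2native(out)
  }
  as.list(out)
}

# Hypothetical package folder with a DESCRIPTION stored as latin1 bytes.
pkg <- file.path(tempdir(), "latin1pkg")
dir.create(pkg, showWarnings = FALSE)
lines <- c(
  "Package: latin1pkg",
  "Encoding: latin1",
  "Authors@R: person(\"R\u00e9my\", \"Dupont\", role = \"aut\")"
)
con <- file(file.path(pkg, "DESCRIPTION"), open = "wb")
# Convert to latin1 and write the bytes unchanged.
writeLines(iconv(lines, from = "UTF-8", to = "latin1"), con, useBytes = TRUE)
close(con)

desc <- get_description(pkg)

# Same parse/eval step as in the package code; requires a native
# encoding that can represent the accented character.
if (isTRUE(l10n_info()[["UTF-8"]])) {
  authors <- as.person(eval(parse(text = desc[["Authors@R"]])))
  print(authors)
}
```

Running the same steps with the original get_description(), which skips the Encoding() mark, should reproduce the failure on a UTF-8 system.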
