get_description() and parse_description() assume native encoding #13

bastistician · 2018-01-30T10:23:25Z

Line 36 in cb48a03

authors <- as.person(eval(parse(text=description$`Authors@R`)))

Parsing the text from Authors@R with parse() fails if that field contains non-ASCII characters and the DESCRIPTION file is not in the native encoding of the system processing the package (here UTF-8). Typical examples are "latin1" packages with accented characters in author names, e.g.:

res <- process_package("https://cran.r-project.org/src/contrib/flexrsurv_1.4.1.tar.gz", "flexrsurv", "cran")

Proper handling of package descriptions is provided by the desc package. However, a simple fix to just support packages in latin1 encoding in addition to UTF-8 is to mark the Encoding() in get_description() as in utils:::.read_description():

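get_description <- function(pkg_folder) {
  desc_path <- file.path(pkg_folder, "DESCRIPTION")
  out <- read.dcf(desc_path)[1L, ]
  # out["Encoding"] is NA if the field is absent, whereas out[["Encoding"]] would error
  if (identical(unname(out["Encoding"]), "latin1")) {
    # Mark the strings as latin1 so that R can re-encode them when needed
    Encoding(out) <- "latin1"
  }
  as.list(out)
}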

This might fix datacamp/rdocumentation-app#386.

filipsch · 2018-01-30T10:33:04Z

@WastlM interesting! Thanks for the digging and the pointer. @ludov04 or I will have a look asap, hopefully it solves the parsing issues!

bastistician · 2018-03-07T09:21:56Z

Here's a better fix, which converts the input (regardless of its encoding) to the native encoding in case the DESCRIPTION has an Encoding field:

get_description <- function(pkg_folder) {
  desc_path <- file.path(pkg_folder, "DESCRIPTION")
  out <- read.dcf(desc_path)[1L, ]
  if ("Encoding" %in% names(out)) {
    Encoding(out) <- out[["Encoding"]]
    out <- enc2native(out)
  }
  as.list(out)
}

Conversion to the native encoding ensures that the subsequent parsing and evaluation of the Authors@R field works on the system which runs the code. I think that's the way to go!
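
To illustrate the two steps (marking the declared encoding, then converting), here is a minimal sketch. The DESCRIPTION content, the package name "demo" and the author name are made up for the illustration. It uses enc2utf8() instead of enc2native() only so that the result does not depend on the locale of the machine running it; on a UTF-8 system both should give the same string.

```r
# Write a tiny latin1-encoded DESCRIPTION file (hypothetical content).
# The byte 0xe9 is "é" in latin1; it is not valid UTF-8 on its own.
desc_path <- tempfile()
desc_bytes <- c(
  charToRaw("Package: demo\nEncoding: latin1\nAuthor: Jos"),
  as.raw(0xe9),
  charToRaw("\n")
)
writeBin(desc_bytes, desc_path)

# read.dcf() returns the raw bytes without any encoding mark
out <- read.dcf(desc_path)[1L, ]

# Step 1: mark the strings with the encoding declared in the file
# (this changes only the mark, not the bytes)
Encoding(out) <- out[["Encoding"]]

# Step 2: re-encode; the latin1 byte 0xe9 becomes the UTF-8 bytes 0xc3 0xa9
author_utf8 <- enc2utf8(out[["Author"]])

charToRaw(author_utf8)
unlink(desc_path)
```

Without step 1 the conversion has nothing to go on, because R treats unmarked strings as already being in the native encoding.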
