Skip to content

Commit

Permalink
closes #6556
Browse files Browse the repository at this point in the history
  • Loading branch information
venom1204 committed Dec 25, 2024
1 parent 3b2812b commit 8ba2483
Show file tree
Hide file tree
Showing 2 changed files with 26 additions and 4 deletions.
3 changes: 3 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -1013,4 +1013,7 @@ rowwiseDT(

20. Some clarity is added to `?GForce` for the case when subtle changes to `j` produce different results because of differences in locale. Because `data.table` _always_ uses the "C" locale, small changes to queries which activate/deactivate GForce might cause confusingly different results when sorting is involved, [#5331](https://github.com/Rdatatable/data.table/issues/5331). The inspirational example compared `DT[, .(max(a), max(b)), by=grp]` and `DT[, .(max(a), max(tolower(b))), by=grp]` -- in the latter case, GForce is deactivated owing to the _ad-hoc_ column, so the result for `max(a)` might differ for the two queries. An example is added to `?GForce`. As always, there are several options to guarantee consistency, for example (1) use namespace qualification to deactivate GForce: `DT[, .(base::max(a), base::max(b)), by=grp]`; (2) turn off all optimizations with `options(datatable.optimize = 0)`; or (3) set your R session to always sort in C locale with `Sys.setlocale("LC_COLLATE", "C")` (or temporarily with e.g. `withr::with_locale()`). Thanks @markseeto for the example and @michaelchirico for the improved documentation.

Merge.data.table: Improved Error Handling
Argument by in merge.data.table() now provides more informative error messages. When columns specified in by are missing from either of the data tables (x or y), the error will now clearly list which columns are missing from each table, making debugging easier. This improvement ensures better diagnostics for users when performing merges with invalid column names. This change will help users resolve errors more efficiently during data table merges.

# data.table v1.14.10 (Dec 2023) back to v1.10.0 (Dec 2016) has been moved to [NEWS.1.md](https://github.com/Rdatatable/data.table/blob/master/NEWS.1.md)
27 changes: 23 additions & 4 deletions R/merge.R
Original file line number Diff line number Diff line change
Expand Up @@ -50,11 +50,30 @@ merge.data.table = function(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FAL
by = intersect(nm_x, nm_y)
if (length(by) == 0L || !is.character(by))
stopf("A non-empty vector of column names for `by` is required.")
if (!all(by %chin% intersect(nm_x, nm_y)))
stopf("Elements listed in `by` must be valid column names in x and y")
by = unname(by)
by.x = by.y = by
# UPDATED PART STARTS HERE
if (!all(by %chin% intersect(nm_x, nm_y))) {
# Identify which keys are missing from each data table
missing_x = setdiff(by, nm_x)
missing_y = setdiff(by, nm_y)

Check warning on line 58 in R/merge.R

View workflow job for this annotation

GitHub Actions / lint-r

file=R/merge.R,line=58,col=1,[trailing_whitespace_linter] Remove trailing whitespace.
# Construct a more detailed error message
error_message = "Elements listed in `by` must be valid column names in x and y."

Check warning on line 61 in R/merge.R

View workflow job for this annotation

GitHub Actions / lint-r

file=R/merge.R,line=61,col=1,[trailing_whitespace_linter] Remove trailing whitespace.
if (length(missing_x) > 0) {
error_message = paste(error_message, "\nMissing columns in 'x':", paste(missing_x, collapse = ", "))

Check warning on line 63 in R/merge.R

View workflow job for this annotation

GitHub Actions / lint-r

file=R/merge.R,line=63,col=73,[paste_linter] toString(.) is more expressive than paste(., collapse = ", "). Note also glue::glue_collapse() and and::and() for constructing human-readable / translation-friendly lists
}
if (length(missing_y) > 0) {
error_message = paste(error_message, "\nMissing columns in 'y':", paste(missing_y, collapse = ", "))

Check warning on line 66 in R/merge.R

View workflow job for this annotation

GitHub Actions / lint-r

file=R/merge.R,line=66,col=73,[paste_linter] toString(.) is more expressive than paste(., collapse = ", "). Note also glue::glue_collapse() and and::and() for constructing human-readable / translation-friendly lists
}

Check warning on line 68 in R/merge.R

View workflow job for this annotation

GitHub Actions / lint-r

file=R/merge.R,line=68,col=1,[trailing_whitespace_linter] Remove trailing whitespace.
# Raise the error with the detailed message
stopf(error_message)
}
# UPDATED PART ENDS HERE

Check warning on line 73 in R/merge.R

View workflow job for this annotation

GitHub Actions / lint-r

file=R/merge.R,line=73,col=1,[trailing_whitespace_linter] Remove trailing whitespace.
by = unname(by)
by.x = by.y = by
}

# warn about unused arguments #2587
if (length(list(...))) {
Expand Down

0 comments on commit 8ba2483

Please sign in to comment.