Use sort order for second dataset when using orderedComparison = false & ignoreColumnNames = true #93
Comments
@pkoplik24 - Thanks for pointing out this edge case. I think the function should error out if orderedComparison=false and ignoreColumnNames=true. We can have it return a descriptive error message that explains why the combination of options doesn't make sense. Does that sound like an OK approach to you?
Hey @MrPowers, sorry for the delay, busy week. I actually do think this combination of parameters makes sense, which is how I came across this. I think the fix will be something similar to #91. As an example, I would expect this test to pass, but it does not because of the row ordering.
Hey @MrPowers, any thoughts on this?
@MrPowers Bump.
Why do this
instead of this?
If orderedComparison is set to false, the unordered comparison sorts the rows of each dataset by all of its columns, taken in alphabetical order of column name, to give the dataset a deterministic row order.
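    import org.apache.spark.sql.Dataset
    import org.apache.spark.sql.functions.col

    // Sort rows by every column, with columns taken in alphabetical name order.
    def defaultSortDataset[T](ds: Dataset[T]): Dataset[T] = {
      val colNames = ds.columns.sorted
      val cols = colNames.map(col)
      ds.sort(cols: _*)
    }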
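If the expected and actual datasets use different column names (which ignoreColumnNames = true allows), sorting the names alphabetically can put their columns in different orders. The rows of the two datasets are then sorted by different keys, so they no longer line up and the assertion fails even when the data matches.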
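To make the failure concrete, here is a minimal sketch in plain Scala collections, without Spark. `Table`, `nameOrder`, `sortRows` and the sample data are all made up for illustration and are not part of the library. It shows that sorting each table by its own alphabetical column order gives mismatched rows, while reusing the first table's column positions for the second (one possible reading of the issue title, not a confirmed fix) makes them line up.

```scala
object SortOrderSketch {
  // Hypothetical stand-in for a Dataset: column names plus rows in column order.
  final case class Table(columns: Vector[String], rows: Vector[Vector[Int]])

  // Lexicographic "less than": decided by the first position where the keys differ.
  def lexLess(a: Seq[Int], b: Seq[Int]): Boolean =
    a.zip(b).find { case (x, y) => x != y }.exists { case (x, y) => x < y }

  // Column positions, listed in alphabetical order of column name.
  def nameOrder(t: Table): Vector[Int] =
    t.columns.zipWithIndex.sortBy(_._1).map(_._2)

  // Sort rows using the values at the given column positions as the sort key.
  def sortRows(t: Table, positions: Vector[Int]): Vector[Vector[Int]] =
    t.rows.sortWith((r1, r2) => lexLess(positions.map(i => r1(i)), positions.map(i => r2(i))))

  // Same data, different column names, different row order.
  val expected = Table(Vector("a", "b"), Vector(Vector(1, 9), Vector(2, 3)))
  val actual   = Table(Vector("z", "y"), Vector(Vector(2, 3), Vector(1, 9)))

  // Each table sorted by its own names: "a, b" versus "y, z" (positions 1, 0).
  val expectedSorted   = sortRows(expected, nameOrder(expected))
  val actualByOwnNames = sortRows(actual, nameOrder(actual))

  // Second table sorted with the first table's column positions instead.
  val actualByExpectedOrder = sortRows(actual, nameOrder(expected))
}
```

Here `expectedSorted` and `actualByOwnNames` differ row by row, whereas `expectedSorted` and `actualByExpectedOrder` are identical.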
Proposed fixes: