Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix generic type validation to support datetimes and non-ASCII strings. #291

Merged
merged 7 commits into from
Oct 17, 2023

Conversation

teo-tsirpanis
Copy link
Member

@teo-tsirpanis teo-tsirpanis commented Oct 11, 2023

SC-24612

The C# API's rules around type validation in its generic methods have some defects that prevent datetime types and non-ASCII strings from being used. This PR improves the type validation to fix these cases.

Summary of changes

  • When performing type validation, instead of converting the .NET Type to a TileDB DataType and compare them, we do the opposite. This happens because (with few exceptions), the mapping between .NET types and TileDB data types is one-to-many, and so by "narrowing down" we can validate that datetime data types match with long, while long maps to DataType.Int64.
    • The validation logic was centralized in the ErrorHandling.CheckDataType. It accepts a generic type parameter and a DataType regular parameter, and fails if they are incompatible.
  • There are two special cases for validation: DataType.Boolean and StringUtf16. Besides byte and ushort, they can also match with bool and char (in .NET it's 2 bytes long) respectively.
    • We could do the same with StringUtf32 and System.Text.Rune, but the Core must validate that UTF-32 strings contain valid Unicode codepoints (if it already does, please let me know).
  • Behavior breaking change: The EnumUtil.DataTypeToType method mapped all string DataTypes to sbyte, which is an obviously wrong behavior, and inconsistent with what the C++ API does. This PR changes it to map ASCII and UTF-8 strings to byte, UTF-16 strings to ushort, and UTF-32 strings to uint.
    • In case of DataTypes with multiple valid types (like StringUtf16 and Boolean), DataTypeToType returns the matching numeric type.
  • Minor behavior breaking change: Some generic methods throw ArgumentException which is more appropriate than InvalidOperationException. Not commonly expected to hit, but worth mentioning.

And also document which type is compatible with which `DataType`.
…ing.

If the generic type is string, we check ourselves that the `DataType` is a string one. This ensures the exception thrown is `ArgumentException` instead of `TileDBException`.
The exception message also is more descriptive.
@teo-tsirpanis
Copy link
Member Author

The failing examples in #258 succeed after this.

@ihnorton
Copy link
Member

Behavior breaking change: The EnumUtil.DataTypeToType method

Is there any way that external users could be relying on this behavior?

@teo-tsirpanis
Copy link
Member Author

Besides the validation we do, one other use case I could find for this method is to dynamically construct an array of that type using reflection, as part of processing TileDB arrays of unknown schemata, but there are better ways of doing this.

Another thing we could do is make an internal edition of this method with the updated behavior and deprecate the public TypeToDataType and DataTypeToType methods, keeping the existing behavior in the meantime.

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing we could do is make an internal edition of this method with the updated behavior and deprecate the public TypeToDataType and DataTypeToType methods, keeping the existing behavior in the meantime.

+1, let's do it that way, thanks 💪

@@ -55,5 +56,39 @@ public static void ThrowIfManagedType<T>()
ThrowHelpers.ThrowManagedType();
}
}

/// <summary>
/// Returns whether values of type <typeparamref name="T"/> can be stored or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ihnorton
Copy link
Member

Thanks

@teo-tsirpanis teo-tsirpanis merged commit e645f63 into TileDB-Inc:main Oct 17, 2023
4 checks passed
@teo-tsirpanis teo-tsirpanis deleted the type-validation branch October 17, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants