-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix type of the a batch returned by make_batch_reader when TransformSpec's function returns column with all values being None #750
base: master
Are you sure you want to change the base?
Conversation
@@ -209,6 +209,33 @@ def preproc_fn1(x): | |||
(3, 4, 5) == sample['tensor_col_2'].shape[1:] | |||
|
|||
|
|||
@pytest.mark.parametrize('null_column_dtype', [np.float64, np.unicode_]) | |||
@pytest.mark.parametrize('reader_factory', _D) | |||
def test_transform_spec_returns_all_none_values(scalar_dataset, null_column_dtype, reader_factory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test cases assume the transform_spec func is creating the null values. In the more common case, there are missing values in fields unedited by the transform_spec. I believe this solution already addresses both cases but it would be good to demonstrate this in the tests either with an additional test case or additional non-edited fields in this test case.
@pytest.mark.parametrize('reader_factory', _D) | ||
def test_transform_spec_returns_all_none_values_in_a_list_field(scalar_dataset, reader_factory): | ||
def fill_id_with_nones(x): | ||
return pd.DataFrame({'int_fixed_size_list': [[None for _ in range(3)]] * len(x)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also ran into the NoneType issue with lists of strings. Consider adding string types to the test as well.
The NoneType problem also occurs when only some of the values in the list are None, e.g. ['a', 'b', None]
. What about a test_transform_spec_returns_some_none_values_in_a_list_field
?
…rSpec's function sets an entire column to None. Resolves uber#744 We implemented a Unischema->Pyarrow-schema conversion and explicitly set the pyarrow schema when converting a pandas dataframe returned by transform spec function into a pyarrow table. This way, pyarrow does not have to guess the type of data from the data itself (which it obviously could not do before, since all values were None).
The test tests properly columns with scalars only. Will need to verify correct behavior with columns that are lists separately. Will extend the tests in the following commits.
…all elements being None. Change the type of numpy dtype to np.object instead of np.unicode_
5d28818
to
1fcf22f
Compare
Yevgeni Litvin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Resolves #744
We implemented a Unischema->Pyarrow-schema conversion and explicitly
set the pyarrow schema when converting a pandas dataframe returned by
transform spec function into a pyarrow table. This way, pyarrow does not
have to guess the type of data from the data itself (which it obviously
could not do before, since all values were None).