Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import Documents: Read metas as the right type #677

Merged
merged 3 commits into from
Jul 21, 2021

Conversation

ajdapretnar
Copy link
Collaborator

@ajdapretnar ajdapretnar commented Jul 6, 2021

Issue

Import Documents added meta files if found, but they were all read as StringVariables not as their own proper types.

Description of changes

Use table_from_frame to translate numeric values to ContinuousVariables and categorical to DiscreteVariables.

CONS: now some attributes are in X, not in metas. Any easy way to fix this?

Decided to follow @VesnaT suggestion and sanitize variables before sending them to guess_data_type, which is now used instead of StringVariable.

Includes
  • Code changes
  • Tests
  • Documentation

@ajdapretnar ajdapretnar requested a review from VesnaT July 6, 2021 13:33
@VesnaT
Copy link
Contributor

VesnaT commented Jul 8, 2021

CONS: now some attributes are in X, not in metas. Any easy way to fix this?

Not that I'm aware of.
Have you considered using guess_data_type function (and keeping the old approach)?

filtered[column].to_numpy(),
to_metas=True
)
filtered = table_from_frame(filtered)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to use the table_from_frame, you need to reset the df.index:

filtered = table_from_frame(filtered.reset_index(drop=True))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try the above suggestion first, thanks! ^^

@ajdapretnar ajdapretnar force-pushed the metas-proper-types branch 3 times, most recently from 0833d28 to 25b7676 Compare July 9, 2021 08:39
@codecov-commenter
Copy link

codecov-commenter commented Jul 16, 2021

Codecov Report

Merging #677 (42405e2) into master (b9b9612) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #677      +/-   ##
==========================================
+ Coverage   74.02%   74.07%   +0.04%     
==========================================
  Files          72       72              
  Lines        9155     9230      +75     
  Branches     1239     1253      +14     
==========================================
+ Hits         6777     6837      +60     
- Misses       2138     2150      +12     
- Partials      240      243       +3     

@@ -284,6 +287,8 @@ def _read_meta_data(self):
content = data.content
if isinstance(content, dict):
content = pd.DataFrame(content, index=[0])
# if reader is YamlMetaReader:
# content = content.replace("None", np.nan)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VesnaT Should we keep this? Currently, null values in yamls are treated as "None" (strings).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather put fixes like this inside the YamlMetaReader.
Besides, I'm not sure the fix works correctly for theStringVariables.

Copy link
Contributor

@VesnaT VesnaT Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this could work:

class YamlMetaReader(Reader):
    ext = [".yaml"]

    def read_file(self):
        with open(self.path, "r") as f:
            self.content = yaml.safe_load(f)
            for k in self.content:
                if self.content[k] is None:
                    self.content[k] = ""
   

@VesnaT VesnaT merged commit 3fdb5e0 into biolab:master Jul 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants