Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create basic ddi parser #13

Merged
merged 19 commits into from
Apr 16, 2024
Merged

Create basic ddi parser #13

merged 19 commits into from
Apr 16, 2024

Conversation

00krishna
Copy link
Collaborator

This PR concentrates on developing a basic DDI file parser for an IPUMs extract. The DDI file provides all of the IPUMs and variable metadata, which is necessary for importing the data file itself. In the basic IPUMs download, there is no metadata in the data file.

Included items:

  • created a new folder called parsers and added ddi_parser.jl file.
  • created DDIInfo and DDIVariable objects to hold metadata for the extract.
  • created a new function parse_ddi that takes in a DDI XML file and return a DDI object. The details of the DDI object may be updated later.
  • added export and includes to the IPUMS.jl file.
  • added a testdata subfolder to the test folder, and include some small test files.
  • added 2 test cases to the runtests.jl file.

TODO items:

  • Extract the categorical variable metadata from the DDI file.

Copy link

codecov bot commented Apr 6, 2024

Codecov Report

Attention: Patch coverage is 98.05825% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 20.26%. Comparing base (12e8c49) to head (9bd0d2f).
Report is 2 commits behind head on main.

Files Patch % Lines
src/parsers/ddi_parser.jl 97.97% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            main      #13       +/-   ##
==========================================
+ Coverage   1.19%   20.26%   +19.07%     
==========================================
  Files         25       26        +1     
  Lines        420      523      +103     
==========================================
+ Hits           5      106      +101     
- Misses       415      417        +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@TheCedarPrince TheCedarPrince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Left some comments as a first pass -- mostly to simplify some things and to document/organize various parts of the code. I'll test out the API to see how it feels as well. Thanks @00krishna !

Project.toml Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
Copy link
Member

@TheCedarPrince TheCedarPrince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great stuff here -- the docstrings are fantastic as are the examples within them! Left some feedback and thoughts.

What else are you planning to do with this PR? I am almost wondering if we could make the package extensions a separate PR as well as definitely making the way the parsed information gets returned into a separate PR. What do you think @00krishna ?

Project.toml Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
src/parsers/ddi_parser.jl Outdated Show resolved Hide resolved
src/structs.jl Outdated Show resolved Hide resolved
src/structs.jl Outdated Show resolved Hide resolved
src/structs.jl Outdated Show resolved Hide resolved
test/runtests.jl Show resolved Hide resolved
@TheCedarPrince
Copy link
Member

Oh also, do you know why Documenation is failing in the CI @00krishna ? Finally, in the PR, apparently the argumenterrors are not being code covered. Could you perhaps investigate why that is? Link here: https://app.codecov.io/gh/JuliaHealth/IPUMS.jl/pull/13?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaHealth

@TheCedarPrince
Copy link
Member

Ah wait -- I broke documentation. 🤦‍♂️

src/constants.jl Outdated Show resolved Hide resolved
@00krishna
Copy link
Collaborator Author

Really great stuff here -- the docstrings are fantastic as are the examples within them! Left some feedback and thoughts.

What else are you planning to do with this PR? I am almost wondering if we could make the package extensions a separate PR as well as definitely making the way the parsed information gets returned into a separate PR. What do you think @00krishna ?

Yeah, we can create a separate PR for the package extensions stuff. That is not too bad. I think I am done with this PR, as I have the parsing working, and I also created a simple output that shows the captured metadata. The next PR will actually import the datafile into a dataframe--based upon the parsing info.

@TheCedarPrince TheCedarPrince merged commit 67c8286 into JuliaHealth:main Apr 16, 2024
5 of 6 checks passed
@TheCedarPrince TheCedarPrince deleted the create_basic_ddi_parser branch April 16, 2024 04:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants