mamluk studies journal ingestion #2

verbalhanglider · 2017-05-06T15:13:06Z

The metadata extracted from the 291 PDF files needs to be evaluated by stakeholders, and a consensus should be reached on:

A. whether it looks acceptable
B. what should or should not be added for records that have no value in a particular field

I've created a mapper class and am feeding an input dict to each mapper to generate a Dublin Core ElementTree containing the required elements with the appropriate values.

TODO: get stakeholder approval of this mapping before moving forward.
TODO: fold the existing SAF creation code base into this one.
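As a rough illustration of the mapper idea described above, here is a minimal sketch. The class name `DublinCoreMapper`, the `FIELD_MAP` table, and the input keys are all hypothetical and are not taken from the repository; the `dublin_core`/`dcvalue` layout follows the DSpace Simple Archive Format, and the actual mapping is still subject to stakeholder approval.

```python
import xml.etree.ElementTree as ET


class DublinCoreMapper:
    """Hypothetical mapper: turns one input dict into a dublin_core ElementTree."""

    # input key -> (element, qualifier); an illustrative mapping only,
    # not the mapping that stakeholders still need to approve
    FIELD_MAP = {
        "title": ("title", "none"),
        "author": ("contributor", "author"),
        "date": ("date", "issued"),
        "keywords": ("subject", "none"),
    }

    def __init__(self, record):
        self.record = record

    def build(self):
        root = ET.Element("dublin_core")
        for key, (element, qualifier) in self.FIELD_MAP.items():
            value = self.record.get(key)
            if not value:
                # Skip fields with no value; what to do here is an open question
                continue
            values = value if isinstance(value, list) else [value]
            for item in values:
                node = ET.SubElement(
                    root, "dcvalue", element=element, qualifier=qualifier
                )
                node.text = str(item)
        return ET.ElementTree(root)
```

Calling `DublinCoreMapper({"title": "Some article", "keywords": ["a", "b"]}).build()` would give a tree with one `title` element and two `subject` elements, and records with empty fields simply produce fewer elements.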

Split the major loops in main() into two separate functions. Also consolidated code that was being re-typed in several places into single functions that are called multiple times.

I've created a separate function for each field being extracted from the data, since each one needs to be read and normalized in its own way. This should make errors in the data a lot easier to understand than the previous approach, where everything was done in one giant main() function.
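The one-function-per-field approach might look roughly like the sketch below. The function names, the field names, and the normalization rules are assumptions made for illustration and do not reflect what extractor.py actually does; the point is only that a failure can be attributed to a single field.

```python
import re


def normalize_title(raw):
    """Collapse runs of whitespace; return None when nothing usable is left."""
    if raw is None:
        return None
    cleaned = " ".join(str(raw).split())
    return cleaned or None


def normalize_keywords(raw):
    """Split a keyword string on ';' or ',' and drop empty entries."""
    if not raw:
        return []
    return [part.strip() for part in re.split(r"[;,]", raw) if part.strip()]


def normalize_year(raw):
    """Return the first plausible four-digit year found in the value, if any."""
    match = re.search(r"(1[89]|20)\d{2}", raw or "")
    return match.group(0) if match else None


# Hypothetical field names mapped to their reader functions
FIELD_READERS = {
    "title": normalize_title,
    "keywords": normalize_keywords,
    "year": normalize_year,
}


def extract_record(pdf_metadata):
    """Run every reader and report failures per field instead of aborting."""
    record, errors = {}, {}
    for field, reader in FIELD_READERS.items():
        try:
            record[field] = reader(pdf_metadata.get(field))
        except Exception as exc:
            errors[field] = repr(exc)
    return record, errors
```

Because each reader is called on its own, a malformed keyword value shows up under `errors["keywords"]` while the other fields are still extracted.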

verbalhanglider · 2017-08-25T15:25:46Z

SAFs for full volumes have been generated. Per stakeholder request, the DOIs for the volumes are to be added to the individual articles, so SAF generation for the individual articles is waiting on the volume uploads and their DOIs.
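For reference, writing one article as a DSpace Simple Archive Format item could be sketched as follows. The helper name `write_saf_item`, the `item_NNN` directory naming, and in particular the choice of `relation.ispartof` to carry the volume DOI are assumptions for illustration; the field that should hold the volume DOI has not been decided in this thread.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(archive_dir, index, dc_root, pdf_path, volume_doi=None):
    """Write one SAF item directory: dublin_core.xml, the PDF, and a contents file."""
    item_dir = Path(archive_dir) / f"item_{index:03d}"
    item_dir.mkdir(parents=True, exist_ok=True)

    if volume_doi:
        # Assumption: the volume DOI is recorded as dc.relation.ispartof
        node = ET.SubElement(
            dc_root, "dcvalue", element="relation", qualifier="ispartof"
        )
        node.text = volume_doi

    ET.ElementTree(dc_root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy the bitstream next to the metadata and list it in the contents file
    pdf_path = Path(pdf_path)
    shutil.copy(pdf_path, item_dir / pdf_path.name)
    (item_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
    return item_dir
```

With this shape, the article SAFs can be generated as soon as each volume's DOI is known, by passing it in as `volume_doi`.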

verbalhanglider added this to the ingest 291 PDFs into knowledgespace.uchicago.edu milestone May 6, 2017

verbalhanglider assigned c-blair May 6, 2017

verbalhanglider changed the title ~~metadata extracted needs to be evaluated~~ mamluk studies journal ingestion May 6, 2017

verbalhanglider pushed a commit that referenced this issue Jul 6, 2017

issue #2; added looking for keyword field in pdf metadata

8ab8568

verbalhanglider pushed a commit that referenced this issue Jul 6, 2017

issue #2: started metadata generator class

4abb5fe

verbalhanglider pushed a commit that referenced this issue Jul 12, 2017

issue #2; started generating required elements.

4b035fe

verbalhanglider pushed a commit that referenced this issue Jul 17, 2017

issue #2: refactored extractor.py

7161d5a

split major loops in main() into two separate functions. also, moved all functions that were being re-typed into single functions being called multiple times.

verbalhanglider pushed a commit that referenced this issue Nov 7, 2017

issue #2: refactored extractor.py

862538f

split major loops in main() into two separate functions. also, moved all functions that were being re-typed into single functions being called multiple times.
