Can we apply more efficient conversion logic? #220

Open
Yueqiao12Zhang opened this issue Nov 8, 2024 · 2 comments · May be fixed by #221

@Yueqiao12Zhang
Contributor

No description provided.

@Yueqiao12Zhang Yueqiao12Zhang self-assigned this Nov 8, 2024
@Yueqiao12Zhang Yueqiao12Zhang linked a pull request Nov 8, 2024 that will close this issue
@Yueqiao12Zhang
Contributor Author

Yueqiao12Zhang commented Nov 15, 2024

11-8-2024

MusicBrainz & Other Potential Databases:

  • Applied a new approach using JSON logic
  • Merged with old CSV2RDF logic for parsing JSON file

Advantages of Using JSON Logic:

  1. Data Structure Preservation: RDF's graph model can represent JSON's nested objects and arrays directly, so complex data layouts are kept intact. CSV, by contrast, struggles with nested or hierarchical data.
  2. Simplified Reconciliation: Flattening the JSON into CSV produced an excessive number of columns for the nested fields, which complicated reconciliation. Converting straight to RDF avoids this and makes reconciliation more straightforward.
  3. Data Integrity: In CSV, values might be truncated and many cells end up blank. RDF records only the values that actually exist, so no data is lost.
  4. Direct RDF Import for Reconciliation: RDF files can be imported directly into OpenRefine for reconciliation, so the extra CSV conversion step can be skipped.
  5. Old Functions Preserved: The same functions from the old CSV2RDF, such as language tagging and datatype detection, can be reused unchanged.
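
As an illustration of point 5, language tagging and datatype detection could look roughly like the sketch below. This is not the actual CSV2RDF code: `to_literal` is a hypothetical helper, and the choice of XSD datatypes is an assumption. It only shows that the same per-value logic works regardless of whether the value came from a CSV cell or a JSON field.

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"
_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def to_literal(value, lang=None):
    """Serialize a Python value as an N-Triples literal (hypothetical helper)."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'

    # Everything else is treated as a string; escape it for N-Triples.
    text = str(value)
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    # A language tag takes priority: a literal cannot carry both a tag and a datatype.
    if lang:
        return f'"{escaped}"@{lang}'

    # Strings shaped like YYYY-MM-DD that are real calendar dates become xsd:date.
    if _ISO_DATE.fullmatch(text):
        try:
            date.fromisoformat(text)
            return f'"{escaped}"^^<{XSD}date>'
        except ValueError:
            pass  # e.g. "2024-13-40": keep it as a plain string
    return f'"{escaped}"'


print(to_literal("Abbey Road", lang="en"))
print(to_literal("1969-09-26"))
print(to_literal(17))
```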

Disadvantage:

  1. Query Complexity: The generated RDF relies on blank nodes, which can make querying the data more challenging.
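
To make the trade-off concrete, here is a minimal sketch of how nested JSON can be turned into triples, with one blank node per nested object. It uses only the standard library and plain tuples rather than a real RDF library. The function name `json_to_triples`, the `http://example.org/` IRIs and the sample record are all made up for illustration, and it assumes keys are safe to use in IRIs and that lists contain only scalars or objects.

```python
import itertools


def json_to_triples(data, subject, base="http://example.org/vocab/"):
    """Walk a JSON object and return (subject, predicate, object) tuples."""
    counter = itertools.count()
    triples = []

    def walk(node, subj):
        for key, value in node.items():
            pred = f"<{base}{key}>"
            # Treat a single value like a one-element list so both cases share code.
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, dict):
                    # A nested object has no identifier of its own: use a blank node.
                    bnode = f"_:b{next(counter)}"
                    triples.append((subj, pred, bnode))
                    walk(item, bnode)
                elif item is not None:
                    # JSON null produces no triple at all (no "blank cell").
                    triples.append((subj, pred, item))

    walk(data, subject)
    return triples


record = {
    "title": "Abbey Road",
    "artist": {"name": "The Beatles", "area": {"name": "United Kingdom"}},
    "tags": ["rock", "pop"],
    "disambiguation": None,
}
triples = json_to_triples(record, "<http://example.org/release/1>")
for triple in triples:
    print(triple)
```

The nesting survives without any column flattening, but reaching the artist's area name means following two blank nodes (`_:b0`, then `_:b1`) from the release, which is the kind of multi-hop query the disadvantage above refers to.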

@Yueqiao12Zhang
Contributor Author

Workflow for Converting JSON to RDF and Reconciling with OpenRefine

Steps

  1. Extract predicates from the JSON file

    • Begin by extracting all predicates from the given JSON data. These predicates will form the basis for mapping the data.
  2. Map the predicates to Wikidata properties

    • Establish mappings between the extracted predicates and relevant Wikidata properties. This will ensure consistency and alignment with existing semantic data.
  3. Convert JSON to RDF file with all properties already mapped

    • Using the mapped predicates, convert the JSON file into RDF (Resource Description Framework). Ensure every property in the RDF uses its mapped Wikidata property.
  4. Upload RDF into OpenRefine

    • Import the generated RDF file into OpenRefine for further refinement and reconciliation.
  5. Reconcile using OpenRefine

    • Reconcile the RDF data using OpenRefine, linking your data to external references (e.g., Wikidata).
  6. (Cannot export RDF using OpenRefine)

    • Note that OpenRefine does not support exporting data back into RDF format directly.
  7. Export CSV from OpenRefine and map the reconciliation data back to the RDF file

    • Export the reconciled data as a CSV file from OpenRefine. Map the reconciliation data from this CSV back to the original RDF structure.
  8. Successfully reconcile RDF data

    • Finalize the reconciliation by ensuring all mappings are accurate and the RDF data is consistent with external references.
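
Steps 1, 2 and 7 are the ones that need custom code, so here is a rough sketch of them using only the standard library. Everything named here is an assumption for illustration: the three function names, the hand-written `mapping` dictionary, and the `value`/`qid` column names of the exported CSV (the real column names depend on how the OpenRefine project is set up). Triples are plain tuples, not objects from an RDF library.

```python
import csv
import io


def extract_predicates(node, found=None):
    """Step 1: collect every key that appears anywhere in the JSON data."""
    found = set() if found is None else found
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            extract_predicates(value, found)
    elif isinstance(node, list):
        for item in node:
            extract_predicates(item, found)
    return found


def map_predicates(predicates, mapping):
    """Step 2: split predicates into mapped ones and ones still needing a property."""
    mapped = {p: mapping[p] for p in predicates if p in mapping}
    unmapped = sorted(p for p in predicates if p not in mapping)
    return mapped, unmapped


def apply_reconciliation(triples, csv_text, value_col="value", qid_col="qid"):
    """Step 7: swap reconciled string values for their Wikidata entity IRIs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows with an empty QID were not matched in OpenRefine and are skipped.
    lookup = {row[value_col]: row[qid_col] for row in reader if row[qid_col]}
    result = []
    for s, p, o in triples:
        if isinstance(o, str) and o in lookup:
            o = f"<http://www.wikidata.org/entity/{lookup[o]}>"
        result.append((s, p, o))
    return result


data = {"title": "Abbey Road", "artist": {"name": "The Beatles"}}

# Illustrative hand-written mapping (P1476 = title, P175 = performer).
mapping = {
    "title": "http://www.wikidata.org/prop/direct/P1476",
    "artist": "http://www.wikidata.org/prop/direct/P175",
}
mapped, unmapped = map_predicates(extract_predicates(data), mapping)
print("still unmapped:", unmapped)

# Stand-in for the CSV exported from OpenRefine after reconciliation.
exported = "value,qid\nThe Beatles,Q1299\nUnknown Band,\n"
before = [("_:b0", "<http://example.org/vocab/name>", "The Beatles")]
after = apply_reconciliation(before, exported)
print(after)
```

Listing the unmapped predicates after step 2 is a convenient checkpoint: anything left in that list would end up in the RDF without a Wikidata property.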
