PDF Generation Alternative - GSoC #2132

Merged
merged 48 commits into from
Aug 23, 2024

Conversation

DraKen0009
Contributor

@DraKen0009 DraKen0009 commented May 7, 2024

Documentation link

https://docs.ohc.network/docs/care/CEP/Completed/pdf-generation

Associated Issue

Merge Checklist

  • Updated DockerFile.
  • Update prod DockerFile.
  • Added Test to check proper installation of Typst binary.
  • Adding helper for pdf generation using Typst.
  • Added Test to check that the PDF is generated.
  • Finding Library to read and test pdf content.
  • [Enhancement] Improving Error handling for pdf_generation functions.
  • Designing Template for Typst.
  • Integrating Typst with Django Templates
  • Writing tests
  • Improving test by adding more sample data

Proposal Preface

Care is currently using django-hardcopy to generate discharge reports, which utilizes Chromium Headless for rendering Django templates in Chrome and converting them to PDF. However, this method is not very efficient and could be improved by using a native Python package that can directly convert HTML to PDF without the need for browser rendering. This would result in lower resource consumption and increased efficiency.

Currently, this function is responsible for generating PDFs, and the following is the Django template used for generating reports using Tailwind CSS and Django templates.

Drawbacks of Current Approach

  • Dependency on Chromium Headless: Adds significant overhead and resource consumption.
  • Increased Docker Image Size: Including Chromium Headless impacts deployment and scalability.
  • Slower and Less Efficient Rendering: Generating PDFs through browser emulation is slower and less efficient compared to native PDF generation.
  • Higher Memory Usage: Requires higher memory usage during PDF generation, potentially affecting performance and responsiveness.
  • Lack of Recent Maintenance: The absence of recent maintenance and updates for django-hardcopy, last seen in July 2018, has raised concerns regarding possible security vulnerabilities.

Proposed Solution

Initial Solution Approach

Initially, I considered utilizing a Python library like WeasyPrint or xhtml2pdf. Both libraries offer Django-friendly code. Here are the respective documentation links for further reference:

After conducting further research and receiving feedback, I came across a recent article by Zerodha that discussed PDF generation. This is where I discovered Typst.

Final Solution (Using Typst)

According to the Zerodha article, they initially generated PDFs from HTML using Puppeteer, which involved spawning headless instances of Chrome. Their earlier tech stack, similar to ours, relied on headless Chrome for PDF rendering, which proved inefficient at scale. They then transitioned to Typst, noting its efficiency and scalability benefits. This sparked my interest and led to further research, in which various references supported Typst's effectiveness and credibility.

The official documentation has detailed information about the tool.

I engaged with the Typst community and moderators on Discord, where I received predominantly positive feedback and had productive interactions. The notable advantage I found with Typst is its freshness compared to competitors like LaTeX. While it may have fewer available resources for reference, my experience within the Typst community has been positive. The community is extremely helpful and responsive, making any potential lack of resources a minor concern.

By adopting Typst for our PDF generation needs, we can significantly improve the efficiency and scalability of our discharge report generation process, ensuring lower resource consumption and enhanced performance.

Implementation Plan

  • Step 1: Update Docker Files to Add Typst Dependencies

  • Step 2: Update Helper Functions for Typst and Create a Wrapper for Typst to Compile Templates

  • Step 3: Create a Static Template for Our Report

  • Step 4: Integrate Typst with Django Templates in Our Project

  • Step 5: Create Tests

  • Step 6: Remove All Previous Dependencies and Remove Chromium and django-hardcopy

  • Step 7: Update production files for the changes

Step 1: Update Docker Files to Add Typst Dependencies

Since there is no apt package for Typst, we download the binary from the official releases, choosing the one that matches the build we are working on. Updated Dockerfile: dev.Dockerfile
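The merge checklist mentions a test that checks the Typst binary is installed properly. A minimal sketch of such a check is below. It assumes the binary is on PATH under the name `typst`; the helper name `binary_version` is hypothetical and not taken from the PR.

```python
import shutil
import subprocess


def binary_version(name="typst"):
    """Return the output of `<name> --version`, or None if the binary is not on PATH.

    Hypothetical helper for illustration; the PR's actual test may differ.
    """
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

In a test, asserting that `binary_version()` is not `None` would be enough to catch a Docker image that was built without the binary.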

Step 2: Update Helper Functions and Create Typst Wrapper

First, we create a wrapper function that lets the Typst binary compile our template. The updated function: compile_typ

With the wrapper in place, we can update our helper functions. Updated helper function: generate_discharge_summary_pdf
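To illustrate the wrapper idea, here is a rough sketch rather than the code from this PR. It passes the rendered Typst markup on stdin (the `-` input argument, as in the test snippet in Step 5) and converts CLI failures into a single exception. The names `compile_typst` and `TypstCompileError`, and the injectable `run` parameter, are assumptions made for this example.

```python
import subprocess


class TypstCompileError(Exception):
    """Hypothetical error type raised when the Typst CLI fails."""


def compile_typst(content, output_path, run=subprocess.run):
    """Compile Typst markup passed on stdin into a file at output_path.

    `run` is injectable so the wrapper can be tested without the binary.
    """
    try:
        run(
            ["typst", "compile", "-", str(output_path)],
            input=content.encode("utf-8"),
            capture_output=True,
            check=True,
        )
    except subprocess.CalledProcessError as exc:
        # Surface the compiler's stderr so template errors are easy to debug.
        stderr = (exc.stderr or b"").decode("utf-8", errors="replace")
        raise TypstCompileError(
            f"typst exited with {exc.returncode}: {stderr}"
        ) from exc
    except FileNotFoundError as exc:
        raise TypstCompileError("typst binary not found on PATH") from exc
    return output_path
```

Wrapping both failure modes in one exception type keeps the calling helper simple: it only has to decide what to do when PDF generation fails, not why.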

Step 3: Create Static Template for Reports

Progress on the static report template can be seen here: Report Template.

For now, this document only contains the different components that I've used in the template.

Step 4: Integrate Typst with Django Templates in Our Project

Replaced the previous HTML/CSS template with a Typst one. The template can be found at patient_discharge_summary_pdf_template.typ

Step 5: Create Tests

The tests generate PNGs of the PDF using Typst and compare them using the Pillow library. Sample PNG images of the PDF are stored in the care/facility/tests/sample_reports folder and compared with the newly generated PNGs; if they are identical the test case passes, otherwise it raises an error.

To update the sample PNG files, add the code below to the test_compile_typ test function, below line 59.

subprocess.run(
    ["typst", "compile", "-", sample_file_path, "--format", "png"],
    input=content.encode("utf-8"),
    capture_output=True,
    check=True,
    cwd="/",
)

To investigate any errors, remove the finally block from test_compile_typ. The test will then leave the test_output{n}.png files in the care/facility/tests/sample_reports folder, where an image diff checker can be used to investigate the differences.

If more data is added to the test function in the future and the number of pages increases, number_of_pngs_generated must also be updated to the number of pages in the generated PDF.
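The image comparison in this step can be sketched with Pillow roughly as follows. This is an illustration under assumptions, not the test code from the PR: the function name `images_identical` is hypothetical, and converting to RGB means any alpha channel is ignored.

```python
from PIL import Image, ImageChops


def images_identical(expected, actual):
    """Return True when two Pillow images have the same size and RGB pixel data."""
    if expected.size != actual.size:
        return False
    # Normalise both images to RGB so the difference is computed per channel.
    diff = ImageChops.difference(expected.convert("RGB"), actual.convert("RGB"))
    # getbbox() returns None when every pixel of the difference image is zero.
    return diff.getbbox() is None
```

In a test, each sample page and each generated page would be loaded with `Image.open(path)` and passed to this helper, one pair per page.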

Step 6: Remove All Previous Dependencies and Remove Chromium and django-hardcopy

Updated all the functions that used the older dependencies to use the new ones, and removed django-hardcopy from the Pipfile and Chromium from the Dockerfile.

Step 7: Update production files for the changes

Updated prod.Dockerfile to remove the older dependencies and add the new ones.

Updates in the template are listed below

Patient Detail section

  • Removed Date of Birth field

Admission Details Section

  • Removed Decision After Consultation field
  • Removed Examination details and Clinical conditions field
  • Removed From field
  • Added Duration of Admission field, which shows the time span for which the patient was admitted (discharge date - encounter date)
  • Added Admitted to field, which shows the bed and its type
  • Added Diagnosis at admission field, which lists the ICD-11 diagnoses in the following order: confirmed, provisional, unconfirmed, differential
  • Added Reported Allergies field, which shows the list of allergies
  • Added Symptoms at admission field, which shows the list of all active symptoms at the time of admission
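As an illustration of the Duration of Admission field (discharge date minus encounter date), here is a small sketch. It follows the rule agreed later in this thread: show days when the duration is over 24 hours, otherwise show hours instead of "0 days". The function name and the exact output strings are assumptions, not the PR's implementation.

```python
from datetime import datetime


def format_duration(encounter_date, discharge_date):
    """Describe the admission span in whole days, or in hours when it is 24 hours or less."""
    total_hours = int((discharge_date - encounter_date).total_seconds() // 3600)
    if total_hours > 24:
        days = total_hours // 24
        return f"{days} day{'s' if days != 1 else ''}"
    return f"{total_hours} hour{'s' if total_hours != 1 else ''}"
```

For example, an encounter at 10:00 on 1 August and a discharge at 12:00 on 4 August spans 74 hours, which this sketch reports as whole days only.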

Health Insurance Details

  • Kept the table as the last item, below the discharge details

Treatment Summary Section

  • Combined Prescriptions medication details into a single table with updated format
  • Removed Treatment Plan
  • Removed General Instructions
  • Removed Special Instructions
  • Removed Prescription notes, as they are not relevant to the discharge summary

Discharge Summary Section

  • Renamed Discharge Notes to Discharge Advice
  • Added Discharge Prescription table

Others

  • Removed Symptoms and Diagnosis (ICD-11) tables and Health Status at admission section
  • Removed Daily Round section
  • Created three new template tags: one to format prescriptions, one to convert text to sentence case (format_to_sentence_case), and one to handle empty data
  • Added conditions to update field names according to admission status
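The sentence-case and empty-data template tags mentioned above could look roughly like the plain functions below. This is a sketch under assumptions: only the name format_to_sentence_case comes from this description, while `format_empty_data`, the "N/A" default, and the function bodies are illustrative. In the project these would be registered as Django template filters; they are shown unregistered here so the example runs without Django.

```python
def format_to_sentence_case(value):
    """Turn an identifier-like value such as 'OXYGEN_SUPPORT' into 'Oxygen support'."""
    words = str(value).replace("_", " ").split()
    return " ".join(words).capitalize()


def format_empty_data(value, placeholder="N/A"):
    """Return the placeholder for None or blank strings, otherwise the value unchanged.

    Hypothetical name and default; the PR's tag may behave differently.
    """
    if value is None or (isinstance(value, str) and not value.strip()):
        return placeholder
    return value
```

Handling empty values in one place keeps the Typst template free of repeated conditionals for missing fields such as weight and height.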

Results

  • Typst generates PDFs 8-15 times faster than the previous Chromium-based approach
  • Container size decreased by 30%
  • Typst uses 10-15% less memory than the previous approach
  • We eliminate the overhead associated with browser-based rendering, resulting in a more efficient and scalable process.

Links

@DraKen0009 DraKen0009 requested a review from a team as a code owner May 7, 2024 04:13
@sainak sainak marked this pull request as draft May 7, 2024 05:31

codecov bot commented Jul 22, 2024

Codecov Report

Attention: Patch coverage is 63.20755% with 39 lines in your changes missing coverage. Please review.

Project coverage is 64.20%. Comparing base (8cd1032) to head (fe02b4f).
Report is 1 commits behind head on develop.

Files                                                 Patch %   Lines
care/facility/utils/reports/discharge_summary.py      65.90%    10 Missing and 5 partials ⚠️
care/facility/models/patient.py                       43.75%    8 Missing and 1 partial ⚠️
care/facility/api/viewsets/patient_consultation.py    33.33%    8 Missing ⚠️
care/facility/templatetags/data_formatting_tags.py    72.00%    5 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2132      +/-   ##
===========================================
- Coverage    64.21%   64.20%   -0.02%     
===========================================
  Files          239      241       +2     
  Lines        13495    13582      +87     
  Branches      1917     1940      +23     
===========================================
+ Hits          8666     8720      +54     
- Misses        4472     4502      +30     
- Partials       357      360       +3     


@nihal467
Member

nihal467 commented Jul 30, 2024

image

@sainak @DraKen0009 the discharge summary is not generating and the test is failing in the PR as well

@DraKen0009
Contributor Author

  1. As per the format shared by aparna, the admitted-to field should show all the types of beds the patient was admitted to during the consultation period, not just the last admitted-to bed name and type. For example, in the above screenshot case, it will be "Admitted to: ICU, Bed with Oxygen Support".

Fixed it, and also added a template tag to format the text in a more readable form.

  2. Dummy bed 6 is a bed with oxygen support, but in the discharge summary it is mentioned as a regular bed. Make sure bed types are properly fetched.

Fixed.

  3. As per the PR description and the format shared by aparna, the date of admission field was planned to be removed, but in the discharge summary this field still exists. DraKen0009, was it intentionally kept, or was it a mistake?

It's intentional; I was asked to keep it. I have updated the PR description accordingly.

  4. As per the shared format, the duration of admission should be just the number of days, so remove the time-related details from the field.

Updated to show only days if the duration is >24 hours; otherwise it shows the time in hours instead of 0 days.

  5. When the weight and height are not available, just show N/A, without the units.

Fixed.

  6. According to the format provided, we should only display symptoms present at the time of admission. However, in the screenshot mentioned, the first two symptoms were added during the consultation creation, and the remaining three were added while creating a log update. Therefore, instead of showing just the two symptoms, the discharge summary displays all five symptoms. Please confirm whether this was intentional with sainak & aparna.

Updated it. Now only the symptoms with an onset date before the admission date are shown in the list.

  7. As per the shared format, it was expected to be the last item after the discharge details, but in the discharge summary you kept it on top of the discharge details. Was it intentional?

Fixed. It is now kept below the Discharge Summary and above the Annexure (file details).

  8. As per the new format, you forgot to rename prescription medication to Medication administered.

Fixed.

  9. As per the format, rename discharge prescription medication to discharge prescription.

Fixed.

  10. As per the shared format, the treating physician name and the discharge prescription table have space between them, so keep it.

Added a line space above Treating Physician.

@nihal467 @sainak

@sainak sainak marked this pull request as ready for review August 16, 2024 08:12
@sainak sainak requested a review from rithviknishad August 16, 2024 08:37
@nihal467
Member

nihal467 commented Aug 19, 2024

@DraKen0009
image
image

  • If the patient is under 1 year old, show the age in months and days, not just months

image

  • If the patient was admitted to an ICU bed twice during the consultation, show it only once, not twice
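One way to satisfy the "show each bed type only once" request is an order-preserving de-duplication. The sketch below is illustrative only; the function name is hypothetical, and it assumes the bed types are already available as a list of display strings in admission order.

```python
def unique_bed_types(bed_types):
    """Join bed types into one string, keeping the first occurrence of each type."""
    # dict.fromkeys preserves insertion order, so repeated types are dropped in place.
    return ", ".join(dict.fromkeys(bed_types))
```

Using `dict.fromkeys` rather than `set` keeps the types in the order the patient was admitted to them.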

@nihal467
Member

LGTM @DraKen0009 nice work

@nihal467 nihal467 requested a review from vigneshhari August 20, 2024 09:18
@vigneshhari vigneshhari merged commit 45b51a8 into ohcnetwork:develop Aug 23, 2024
3 of 5 checks passed
Development

Successfully merging this pull request may close these issues.

Explore other alternatives to generate PDF reports
4 participants