Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Another copy command implementation #375

Closed
wants to merge 4 commits into from
Closed

Conversation

pnadolny13
Copy link

@pnadolny13 pnadolny13 commented Jun 18, 2024

FWIW I also had a draft for using COPY similar to #370. It worked for me and was significantly faster but I havent tested enough to say its ready to merge. I based it off of the transferwise logic https://github.com/transferwise/pipelinewise-target-postgres/blob/bef5a2786afab3c1600e62a77b6d121625e7bc8e/target_postgres/db_sync.py#L357.

Without reviewing in more detail I dont know if this or #370 is a better approach. I'm happy with either as long as we get the performance improvement 😄 . I wanted to open this to share another implementation if its helpful in getting to the final solution.

cc @kinghuang

@kinghuang
Copy link
Contributor

Nice! The biggest difference that I see is that using csv.DictWriter will just serialize values by str(), which won't handle things like None (null), arrays of jsonb, and other complex types correctly for transmission to PostgreSQL. Most of my code deals with the serialization process to CSV. I also prepare the stream data in memory instead of writing to a temporary file.

@pnadolny13
Copy link
Author

@kinghuang thanks for explaining the differences!

The biggest difference that I see is that using csv.DictWriter will just serialize values
Great! Yeah this seems like a more robust approach.

I also prepare the stream data in memory instead of writing to a temporary file.
I wonder which approach is better. I guess writing to disk is likely more memory efficient but is a little slower. I wonder what copy_expert is doing with the input behind the scenes.

@pnadolny13
Copy link
Author

I'll close this draft though in favor of #370

@pnadolny13 pnadolny13 closed this Jun 25, 2024
@edgarrmondragon edgarrmondragon deleted the copy_command branch June 25, 2024 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants