Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: boto3 connector #1655

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

feat: boto3 connector #1655

wants to merge 12 commits into from

Conversation

pnadolny13
Copy link
Contributor

@pnadolny13 pnadolny13 commented May 1, 2023

Closes #1651

  • Adds a boto3 connector for simplifying the interface between taps and boto3. Providing default config options to inject into the tap then using those to authenticate with boto3 in all the variations that are common e.g. using config keys, env vars, local profiles, assuming roles, overriding endpoint urls, etc. There might be more ways that I'm missing but this would give us a consistent mechanism to add them to all taps/targets that want to use boto3.
  • Includes tests that all passed for me locally. I needed to add moto in the dev group for poetry.
  • I didnt install any new dependency to the SDK itself though because boto3 isnt always needed. I have the boto3 import statement raise an error telling the user to install boto3 in their connector implementation project.
  • I wanted to add docs to explain how to inject the default configs like I did in https://github.com/MeltanoLabs/tap-dynamodb/blob/6d1e2a17ff9c7a6e84704d7ae7e74e15f8ef70de/tap_dynamodb/tap.py#L60. Or if theres a better way. But I didnt know the best place for them and I had trouble getting even the class docs added locally. I'm seeing similar errors in the CI tests as well. Any suggestions?
  • tested using this feature branch in tap-dynamodb and everything worked as expected https://github.com/MeltanoLabs/tap-dynamodb/blob/use_sdk_boto_connector/tap_dynamodb/dynamodb_connector.py

@edgarrmondragon I see in #1649 that you're working on a base connector implementation. I'm happy to refactor this to match that structure once its ready.


📚 Documentation preview 📚: https://meltano-sdk--1655.org.readthedocs.build/en/1655/

@pnadolny13 pnadolny13 requested review from a team, edgarrmondragon and cjohnhanson as code owners May 1, 2023 14:40
@pnadolny13
Copy link
Contributor Author

I also see that theres poetry.lock conflicts. Any recommendations on the best way youv'e found to merge those?

@z3z1ma
Copy link
Contributor

z3z1ma commented May 1, 2023

Rebase, ensure you have the latest pyproject toml, run poetry lock --no-update, the commit the updated lock file.

Or just use the upstream lock file.

EDIT: just sharing what I do with poetry

@pnadolny13 pnadolny13 requested a review from kgpayne May 4, 2023 20:40
@pnadolny13
Copy link
Contributor Author

@edgarrmondragon @kgpayne any thoughts on this PR?

Comment on lines +100 to +111
if config.get("use_aws_env_vars"):
self.aws_access_key_id = os.environ.get("AWS_ACCESS_KEY_ID")
self.aws_secret_access_key = os.environ.get("AWS_SECRET_ACCESS_KEY")
self.aws_session_token = os.environ.get("AWS_SESSION_TOKEN")
self.aws_profile = os.environ.get("AWS_PROFILE")
self.aws_default_region = os.environ.get("AWS_DEFAULT_REGION")
else:
self.aws_access_key_id = config.get("aws_access_key_id")
self.aws_secret_access_key = config.get("aws_secret_access_key")
self.aws_session_token = config.get("aws_session_token")
self.aws_profile = config.get("aws_profile")
self.aws_default_region = config.get("aws_default_region")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for users to want to use a mix of env vars and JSON config? For example, set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in their environment but switch regions via the config file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgarrmondragon potentially, I dont know all the common patterns people use but my vision for this was to try to make it more explicit where AWS credentials are coming from. I find that having so many variations of where credentials come from (json, profiles, env vars, machine inheritances) and potentially mixing them makes it confusing and sometimes leads to a tap using credentials you weren't intending to use. What do you think about that? Am I exaggerating the challenge a bit?

In tap-dynamodb I was raising an error if credentials weren't provided but some users need to use implicit credentials MeltanoLabs/tap-dynamodb#14. This has me thinking about adding a config option like use_implicit_credentials=True, meaning the user needs to explicitly tell the tap that it should use a naked call like boto3.Session() with no args/kwargs to defer to the boto hierarchy of searching for credentials on the machine.

I'm also almost in favor of eliminating the use_aws_env_vars option and requiring that all users configure them via their meltano.yml. If they want to access env vars they should use templating in their meltano.yml explicitly.

Any thoughts?

@edgarrmondragon
Copy link
Collaborator

@edgarrmondragon @kgpayne any thoughts on this PR?

@pnadolny13 I'm still iterating on the connector interface but the guide in in https://meltano-sdk--1649.org.readthedocs.build/en/1649/guides/custom-connector.html gives an idea of where I'd want it to go.

Currently only the get_connector method is part of the interface, which would probably make sense for it to return a boto3.Session if it supports being used as a context manager (e.g. with session: ..., or a custom wrapper class if not.

@pnadolny13
Copy link
Contributor Author

Currently only the get_connector method is part of the interface, which would probably make sense for it to return a boto3.Session if it supports being used as a context manager (e.g. with session: ..., or a custom wrapper class if not.

@edgarrmondragon that makes sense. Its a little awkward with boto3 since it already handles opening and closing the session for you so I'd be adding a context manager for basically no purpose other than to fit the connector abstraction. I'm not quite sure how this works but could I override get_connection without a context manager annotation to avoid needing the with syntax, or does that break the abstraction too much?

@pnadolny13
Copy link
Contributor Author

@edgarrmondragon any new thoughts on this PR? I know you said you were still iterating on the connector interface. Is this something that we'd want to merge sometime soon? If so, I can clean it up to use the get_connector method based on your feedback and get it ready for a re-review.

Copy link

stale bot commented Jun 21, 2024

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

@stale stale bot added the stale label Jun 21, 2024
@stale stale bot closed this Jul 12, 2024
@stale stale bot removed the stale label Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: add boto connector
3 participants