Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional --discover capability #8

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

aaronsteers
Copy link

@aaronsteers aaronsteers commented Aug 27, 2020

Description of change

This PR adds (optional) discovery capability. This is a broadly used tap and care must be taken to avoid breaking existing samples.

In order to not break downstream consumers and samples:

  • Sync must run with or without the --catalog argument provided.
  • Discovery must produce equivalent schema as would otherwise be emitted during tap's sync operation.
  • When --catalog or -c argument is provided, the tap should respect the singer spec, for instance column selections.

Manual QA steps

  • executed using --discover flag and confirmed emitted json matches with expected schema.
  • (not done) Run sync with or without the provided catalog.
  • (not done) Run sync with the primary stream deselected; confirm that no data is emitted.
  • (not done) Run sync with specific columns deselected; confirm that ignored column data is omitted.

Risks

  • Would break critical samples and onboarding if merged with breaking changes.

Rollback steps

  • revert this branch

@aaronsteers

This comment has been minimized.

@aaronsteers aaronsteers changed the title Add optional --discovery capability Add optional --discover capability Aug 27, 2020
@aaronsteers
Copy link
Author

Update:

I've now got it working successfully with or without the catalog input.

Still "todo" is to ensure that if something is deselected in the catalog that it would also be ignored/excluded from the data stream.

Comment on lines +88 to +89
# else:
# logger.info(catalog_stream_override)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo: remove these after testing.

Comment on lines +96 to +97
# else:
# logger.info(metadata_override)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. Remove after testing.

singer.Transformer().transform(
data=record,
schema=catalog_stream_override[0]["schema"],
# metadata=metadata_override, # TODO: debug (not working)
Copy link
Author

@aaronsteers aaronsteers Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get errors when I try to pass the metadata arg.

  • Any ideas?? ❓

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be that the metadata is in list format and it's expecting dict or vice-versa. My gut says it needs to be a dict, but haven't written this code in awhile.

Copy link
Author

@aaronsteers aaronsteers Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmosorast

I think you are correct. The below seems to resolve the error, now pulling in the full dict which is one level up from the "metadata" element (which is itself of type list).

                            singer.Transformer().transform(
                                data=record,
                                schema=catalog_stream_override[0]["schema"],
                                metadata=catalog_stream_override[0],
                            )

Note: It's still possible that I have this wrong and the failure just went away because I fixed the argument type.

  • More testing needed to confirm whether those metadata arguments (especially selected status) are correctly getting piped through.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More to follow (and more code cleanup to do) once I confirm the results are as expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the behavior might be incorrect. Without 100% tracing the code, I would expect it to be something like metadata.to_map(metadata_override) using the metadata module in singer-python.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API for the metadata is a bit strange since it uses a module/"functional" pattern, so I can chat over DM if anything needs cleared up. I'm trying to figure out how to explain it myself.

Copy link
Contributor

@dmosorast dmosorast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good start. I like how safe the catalog passing is for a gentle introduction. The code is a bit more complex, so that might be able to be refactored into functions to divide the intro (no config, no catalog, etc) from the advanced (discovery + catalog + metadata) and make the "stages" clearer to a reader.

Just a couple of comments/changes that came to mind. Thanks for taking this up!

tap_exchangeratesapi/__init__.py Show resolved Hide resolved
.vscode/settings.json Outdated Show resolved Hide resolved
singer.Transformer().transform(
data=record,
schema=catalog_stream_override[0]["schema"],
# metadata=metadata_override, # TODO: debug (not working)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be that the metadata is in list format and it's expecting dict or vice-versa. My gut says it needs to be a dict, but haven't written this code in awhile.

tap_exchangeratesapi/__init__.py Outdated Show resolved Hide resolved
@dmosorast
Copy link
Contributor

One thing that occurs to me just now is that the -c parameter I notice in your checklist refers to --config rather than --catalog, there's no generally accepted short code for --catalog unfortunately due to the name conflict.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants