-
Notifications
You must be signed in to change notification settings - Fork 671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add azure_remote_url and corresponding test. #4629
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4629 +/- ##
===========================================
+ Coverage 36.99% 47.71% +10.72%
===========================================
Files 1318 637 -681
Lines 132449 54251 -78198
===========================================
- Hits 48994 25885 -23109
+ Misses 79202 26042 -53160
+ Partials 4253 2324 -1929
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
looks like the other implementations skip testing stow error responses, so i kept my unit test to a happy path |
Signed-off-by: Chris Grass <[email protected]>
Signed-off-by: Chris Grass <[email protected]>
Is there any update here? |
@davidmirror-ops - this PR needs to be reviewed and merged if your reference impl needs to support presigned urls. that feature is not a hard requirement for flyte to work in azure but is required for feature parity with other cloud providers. |
@cgrass Is there anything we can do to help you take this to the finish line? Thanks! |
Signed-off-by: Chris Grass <[email protected]>
Signed-off-by: Flyte-Bot <[email protected]> Co-authored-by: flyte-bot <[email protected]> Signed-off-by: Chris Grass <[email protected]>
Signed-off-by: Flyte-Bot <[email protected]> Co-authored-by: evalsocket <[email protected]> Signed-off-by: Chris Grass <[email protected]>
Signed-off-by: Chris Grass <[email protected]>
Why are the changes needed?
This PR adds an Azure implementation for
RemoteURLInterface
. This is needed to keep feature parity with existing (S3, GCP) implementations and their use viaAdminServiceServer
. However, I'm not sure pre-signed URLs generated by this interface are actively used or supported at the moment. Please review the test section below for my findings on existing behavior (S3) and the limitations with this implementation given fsspec behavior.What changes were proposed in this pull request?
Stow was recently updated to include a full azure blob storage implementation. That includes a feature to return SAS tokens (presigned urls). I created a PR for flyte that updated the flyte-binary helm chart to use the new stow config, but found that the presignedUrl feature (for some workflows) was not implemented for azure.
How was this patch tested?
Unit Test: I added a new unit test to cover basic functionality.
E2E Testing: This is somewhat difficult because the signed-url output does not seem to be used in the Go codebase. I tested e2e by deploying my code and setting up a little pytest pasted below. I was then able to step through the flytekit code, confirming the token returned from the Flyte API was properly constructed and valid.
This test confirms my code works but also discovered a couple bugs related to fsspec:
get
operations. I am discussing options with the fsspec community here, but the current workaround would require a fix to flytekit and is not ideal. The bug only pops when all of the below are truea)
remoteData
is configured to setsignedUrls:true
b) the presigned URL is larger than n (file system limit) characters (common for S3)
c) the payload is larger than
maxSizeInBytes
d)
FlyteRemote
is used to interact with the execution - I am not 100% confident in this, but it's the only use case I could findget
with a URL forrpath
and a filename forlpath
results inlpath
being used as a path dir. e.g., ifrpath
ishttp://mydomain/output.pb
andlpath
is/var/output.pb
a file gets written at/var/output.pb/output.pb
.this causes an issue in the flytekit code because remote._get_output_literal_map() creates
tmp_name
with the filename ("output.pb"), and then usestmp_name
as the location for the resulting file. in reality, that path is a dir that contains the actual file.to demonstrate what works with fsspec i put together a hacky update to
remote._get_output_literal_map()
for my particular use case:Obviously the above is not a prod solution, but is meant to highlight the issue and gives some hints as to a proper fix.
Notes
script_mode.tar.gz
file, but that calls a different API (service.CreateUploadLocation()
).Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link