Unable to load S3 #453
Hi @AllamSudhakara:
I am currently using Metorikku and I am able to write Parquet files to S3. I am using this output configuration:
- dataFrameName: df_name
  outputType: File
  format: parquet
  outputOptions:
    saveMode: Overwrite
    path: s3a://<s3_bucket_name>/path/to/file
It looks like you are not building the path correctly.
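For context, a minimal sketch of how the two files fit together, as I understand Metorikku's conventions: the job file's output.file.dir sets the base location, and each metric output's relative path is appended under it. The bucket and file names below are hypothetical:

# job.yml (hypothetical name): sets the base output directory
output:
  file:
    dir: s3a://<s3_bucket_name>/output

# metric.yml (hypothetical name): the relative path below is appended
# under dir, so this writes to s3a://<s3_bucket_name>/output/df_name.parquet
output:
  - dataFrameName: df_name
    outputType: File
    format: parquet
    outputOptions:
      saveMode: Overwrite
      path: df_name.parquet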
Hi Luis,
Thanks for the reply. Would you know if there is any pipeline-builder GUI
built on Metorikku that generates the .yml files, runs the pipeline,
visualizes its progress, and surfaces any errors? Please provide some
details on whether YotpoLtd has one and can supply it under a license fee.
It would be great if this GUI could read from enterprise metadata, so that
data scientists/analysts can build pipelines and progressively consolidate
the enterprise's data assets.
Regards,
Sudhakar
I have a very simple configuration file and job file in which I select 20 rows from the Hadoop system using the Hive catalog and push them to an S3 bucket. The job populates the data frame but does not create a file in S3. Could you please verify the following and give me some insight into what I am doing wrong? Thanks in advance for the help.
Command
spark-submit --conf spark.sql.catalogImplementation=hive --conf spark.hadoop.dfs.nameservices=mycluster --conf spark.hadoop.fs.s3a.fast.upload=True --conf spark.hadoop.fs.s3a.path.style.access=True --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem --conf spark.hadoop.fs.s3a.access.key=<DEV-ACCESS_KEY> --conf spark.hadoop.fs.s3a.secret.key=<DEV-SECRET_KEY> --class com.yotpo.metorikku.Metorikku /home/myEdgenodePath/metorikku_2.11.jar -c /myHadoopFS/job-StraightLoad.yml
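One possibility worth checking, since the actual error message isn't shown: the S3A filesystem classes live in the hadoop-aws module, which is not always on the Spark classpath. A sketch of the same command with the module pulled in explicitly; the version below is only a placeholder and must match your cluster's Hadoop build:

# Same submit, plus hadoop-aws so org.apache.hadoop.fs.s3a.S3AFileSystem
# can actually be loaded; 2.7.3 is a placeholder version.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:2.7.3 \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.hadoop.fs.s3a.access.key=<DEV-ACCESS_KEY> \
  --conf spark.hadoop.fs.s3a.secret.key=<DEV-SECRET_KEY> \
  --class com.yotpo.metorikku.Metorikku \
  /home/myEdgenodePath/metorikku_2.11.jar \
  -c /myHadoopFS/job-StraightLoad.yml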
Job
metrics:
variables:
  StartDate: 2021-09-01
  EndDate: 2021-09-07
  TrimmedDateFormat: yyyy-mm-dd
output:
  file:
    dir: s3a://dev-files-exchange/output
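Note that metrics: is empty above (possibly lost in the paste); in Metorikku the job file lists the metric files to run. A sketch of the expected shape, with a hypothetical path:

metrics:
  # metric files for Metorikku to execute (this path is hypothetical)
  - /myHadoopFS/metric-StraightLoad.yml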
Metric
steps:
  sql:
    select * from mySchema.my_aggregate where exp_dt = ${EndDate} LIMIT 20
  ignoreOnFailures: false
output:
  outputType: Parquet
  outputOptions:
    saveMode: Overwrite
    path: MYMonthly.parquet
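For comparison, metric files in the Metorikku examples declare steps and output as lists and name the data frame in both places, which the file above does not. A minimal sketch of that shape; the data frame name my_aggregate_sample is hypothetical, and quoting ${EndDate} is an assumption on my part, since the unquoted expansion exp_dt = 2021-09-07 would be parsed as integer subtraction and match nothing:

steps:
  - dataFrameName: my_aggregate_sample   # hypothetical name
    sql:
      SELECT * FROM mySchema.my_aggregate WHERE exp_dt = '${EndDate}' LIMIT 20
    ignoreOnFailures: false
output:
  - dataFrameName: my_aggregate_sample
    outputType: Parquet
    outputOptions:
      saveMode: Overwrite
      # relative path: appended under the job file's output.file.dir,
      # i.e. s3a://dev-files-exchange/output/MYMonthly.parquet
      path: MYMonthly.parquet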