Since porting to 2.1.0, Dataflow is leaving Datasets/Tables behind in BigQuery #609
Comments
I haven't seen anything like this in 2.0.0; we run batch jobs on a daily basis and have restarted our streaming pipelines a few times now. Is this in streaming, batch, or both? |
Only seen it in batch so far, and cannot reproduce yet. |
Still happening in 2.2.0 templated batch jobs on our side. We're currently managing it with cleanup scripts but it's a PITA. |
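For anyone handling it the same way, here is a minimal sketch of such a cleanup script. It assumes the orphaned datasets share a recognizable name prefix and that anything older than a day is safe to drop; the project id, prefix, and age threshold below are placeholders for illustration, not values Beam guarantees.

```python
"""Sketch of a cleanup script for leftover Dataflow temp datasets in BigQuery.

Assumptions (not from this thread): leftover datasets share the TEMP_PREFIX
below, and anything older than MAX_AGE can be deleted. Adjust before use.
"""
import datetime
from google.cloud import bigquery

PROJECT = "my-project"          # hypothetical project id
TEMP_PREFIX = "temp_dataset_"   # assumed prefix of the orphaned datasets
MAX_AGE = datetime.timedelta(days=1)

client = bigquery.Client(project=PROJECT)
cutoff = datetime.datetime.now(datetime.timezone.utc) - MAX_AGE

for item in client.list_datasets():
    if not item.dataset_id.startswith(TEMP_PREFIX):
        continue
    dataset = client.get_dataset(item.reference)
    if dataset.created < cutoff:
        # delete_contents also removes any leftover temp tables inside it
        client.delete_dataset(dataset, delete_contents=True, not_found_ok=True)
        print(f"Deleted {dataset.dataset_id}")
```

Running this on a schedule (cron, Cloud Scheduler, etc.) keeps the project tidy until the underlying bug is fixed. |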
I was just thinking about this today because it happened yet again. Agree, auto expire on the datasets makes sense. |
So I did a little investigation, and it does look like that's actually implemented... not sure why it's still happening, though.
I think I'll try to do a bit more debugging of my own... p.s. is this the correct forum to be discussing this? |
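For reference, the dataset-level expiration being discussed can also be set directly with the BigQuery client, independently of whatever Beam does internally. A sketch using the google-cloud-bigquery Python client, with a hypothetical project and dataset name:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project id
dataset = client.get_dataset("my-project.temp_dataset_example")  # hypothetical dataset

# Any table created in this dataset after the update expires automatically
# once it is older than the default table expiration (24 hours here).
dataset.default_table_expiration_ms = 24 * 60 * 60 * 1000
client.update_dataset(dataset, ["default_table_expiration_ms"])
```

Note this only causes the tables to expire; the empty dataset itself still has to be deleted separately.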
[email protected] is a good place and also by opening a tracking issue on https://issues.apache.org/jira/projects/BEAM so people can follow the bug. |
I am also facing this issue. When a job failed, I observed that the table got deleted after 1 day, but the dataset still remains. Can we have an option to clean up the temp dataset and tables immediately if the job fails? Does anyone have a better idea? |
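Nothing in this thread suggests such an option exists in BigQueryIO, but a similar effect can be approximated on the orchestration side by dropping the temp dataset in a finally block around the job submission. A sketch, where run_pipeline, PROJECT, and TEMP_DATASET are hypothetical placeholders:

```python
from google.cloud import bigquery

PROJECT = "my-project"                 # hypothetical project id
TEMP_DATASET = "temp_dataset_example"  # hypothetical leftover dataset name

def run_pipeline():
    """Placeholder for whatever launches the Dataflow job and waits for it."""
    raise NotImplementedError

client = bigquery.Client(project=PROJECT)
try:
    run_pipeline()
finally:
    # Runs whether the job succeeds, fails, or is cancelled locally.
    client.delete_dataset(TEMP_DATASET, delete_contents=True, not_found_ok=True)
```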
Check out https://beam.apache.org/community/contact-us/ for ways to reach the Beam community with bug reports and questions. |
Since porting to 2.1.0, Dataflow is leaving Datasets/Tables behind in BigQuery when the pipeline is cancelled or when it fails. We were on 1.8.0/1.9.0 prior to this, and we never saw this before. We skipped 2.0.0, so we're unsure which version it was actually introduced in.
I cancelled a job (2017-10-08_18_35_30-13495977675828673253), and it left behind a dataset and table in BigQuery: