job is executed multiple times unintentionally #362

GilShmaya · 2022-04-11T10:20:11Z

Hey,

We are hitting an issue in which a job is executed multiple times unintentionally, even though a comment in the reconciler describes this as an exceptional situation. (https://github.com/spotify/flink-on-k8s-operator/blob/v0.4.0-beta.7/controllers/flinkcluster/flinkcluster_reconciler.go#:~:text=//%20This%20is%20an%20exceptional%20situation.)

The scenario:
1 - A new job version introduced a logical bug in the job code. After this version was deployed to the Flink cluster, some of the TaskManagers crashed with an exception after a few seconds of runtime. This started a restart loop in which the job would try to re-run and crash again after a few seconds, repeatedly.
2 - The bug was identified and fixed, and we wanted to update the running job with a new JAR containing the fix.

Expected: only one job runs, without errors.
Actual: two jobs are up.

After that, when we try to cancel the unexpected job, the Flink cluster is cancelled as well.

Thanks,
Gil

live-wire · 2022-04-25T12:36:23Z

Hey @GilShmaya

If you deploy your FlinkCluster as a Job Cluster / Application Cluster, cancelling the job from the Flink console will cancel the cluster too. (That is the intended behaviour.)
Try a Session Cluster (sample) if you want to reuse your FlinkCluster for different jobs. (You'd have to submit your jobs to the JobManager yourself using the Flink CLI.)
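
Before cancelling anything, it may help to confirm how many jobs the JobManager actually considers active. Below is a minimal sketch, not part of the operator, that reads the Flink REST endpoint `/jobs/overview` and keeps the jobs in the `RUNNING` or `RESTARTING` state. The base URL `http://localhost:8081` is an assumption (for example, the JobManager REST port forwarded locally), and the helper names `fetch_overview` and `running_jobs` are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_overview(base_url: str) -> dict:
    """GET <base_url>/jobs/overview from the JobManager REST API."""
    with urlopen(f"{base_url}/jobs/overview", timeout=10) as resp:
        return json.load(resp)


def running_jobs(overview: dict) -> list:
    """Return the jobs whose state is RUNNING or RESTARTING."""
    active = {"RUNNING", "RESTARTING"}
    return [job for job in overview.get("jobs", []) if job.get("state") in active]


if __name__ == "__main__":
    # Assumption: the JobManager REST port is reachable on localhost:8081.
    jobs = running_jobs(fetch_overview("http://localhost:8081"))
    for job in jobs:
        print(job["jid"], job["name"], job["state"])
    if len(jobs) > 1:
        print("more than one active job found")
```

If this reports two active jobs with the same name, the job IDs it prints identify which one to inspect. Keep in mind that in a Job Cluster / Application Cluster, cancelling a job tears down the cluster as described above.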
