Skip to content

Latest commit

 

History

History
67 lines (43 loc) · 3.73 KB

README.md

File metadata and controls

67 lines (43 loc) · 3.73 KB

Understanding Exactly-Once Processing And Windowing In Streaming Pipelines (with Apache Beam)

This repo contains the code showcased in the Beam Summit talk Understanding Exactly-Once Processing And Windowing In Streaming Pipelines

In that talk, I explained how windowing works in streaming pipelines, and what are the decisions you have to make in order to do complex event processing in streaming with Apache Beam.

Whenever we apply a window, there are always doubts about whether the window will drop data, or how many times (and when) will be the output triggered.

This repo contains a sample pipeline that uses unit testing to check if your window would drop data, and how many times would the window be trigered. You write your window in a function, and then use the unit test to check the output of that pipeline. If the window drops data, the test will fail. In addition, you get a CSV output that you can examine to see how and when your window produced output.

Watch the talk

Watch the video at the Beam Summit website, or at YouTube:

Check also the slides and the notes of each slide.

The tested pipeline

The pipeline processes 60 messages. 50 messages produced on time, and 10 messages that arrive after the watermark (late data).

How to test your own window?

First: add your window

Add a new window to src/main/java/com/google/cloud/pso/windows/SomeSampleWindow.java.

For that, just add a new method with this signature:

public Window<KV<String, MyDummyEvent>> myCustomWindow()

(maybe with some input parameters if you want to use those in your window).

See some examples of windows in that file

Second: apply your window

TODO

Copyright

Copyright 2020 Google LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.