Let’s plan it – Scheduling in Apache NiFi
One key feature of Apache NiFi is scheduling. NiFi dataflows should cover both streaming and batch use cases. But how can we bring these two worlds together, and which settings are available? In this blog post, we look at the scheduling options and show how to use them correctly.
General scheduling options
The dataflow engineer has two main options for scheduling a processor. Some special processors offer a third possibility, but it is marked as experimental.
The two main options are timer driven and CRON driven.
As a rule of thumb: use the CRON driven option as a batch-oriented trigger for your flow. The timer driven option is better suited to periodic flows.
Let's first take a more in-depth look at the timer driven option.
The basics: Run Schedule and Timer driven
The default scheduling option for all processors is timer driven, so we start here. The dataflow engineer has only one setting to configure: the Run schedule. It defines the amount of time that should elapse before the processor executes again.
Let's check out some examples:
1. Run schedule set to 0 s (default) → As fast as possible. Whenever a new FlowFile arrives, the processor executes immediately.
2. Run schedule set to 5 mins → After an execution, the processor waits 5 minutes until the next execution.
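The two examples above can be sketched in a few lines of Python. This is only an illustration of the idea, not NiFi's own scheduler: the helper names `parse_run_schedule` and `next_triggers` and the small unit table are assumptions made for this sketch, and it assumes that each execution finishes well within the configured interval.

```python
from datetime import datetime, timedelta
from typing import List

# Illustrative unit table (an assumption for this sketch, not NiFi's parser).
UNIT_SECONDS = {
    "s": 1, "sec": 1, "secs": 1,
    "min": 60, "mins": 60,
    "hr": 3600, "hrs": 3600,
}


def parse_run_schedule(text: str) -> timedelta:
    """Turn a string such as '5 mins' into a timedelta."""
    amount, unit = text.split()
    return timedelta(seconds=int(amount) * UNIT_SECONDS[unit.lower()])


def next_triggers(start: datetime, run_schedule: str, count: int) -> List[datetime]:
    """Return the next `count` trigger times after `start`."""
    interval = parse_run_schedule(run_schedule)
    return [start + interval * i for i in range(1, count + 1)]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    # An interval of zero models "as fast as possible": no waiting time at all.
    print(parse_run_schedule("0 s"))
    # With "5 mins", the triggers are spaced five minutes apart.
    for trigger in next_triggers(start, "5 mins", 3):
        print(trigger)
```

With a Run schedule of 0 s the interval is zero, so nothing delays the next execution. With 5 mins, the computed trigger times are spaced five minutes apart.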
It's planned: CRON Driven options
The CRON driven option is the better fit for batch-oriented dataflows, such as loading tables every day or transferring data at the end of the month. CRON driven scheduling is closely modeled on the well-known Linux crontab syntax.
A CRON expression consists of six required fields and one optional field, separated by spaces, in the following order:
- Seconds
- Minutes
- Hours
- Day of Month
- Month
- Day of Week
- Year (optional)
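To make the field order easier to read, here is a minimal Python sketch that labels each part of a CRON expression. It assumes the Quartz-style order that NiFi uses (seconds, minutes, hours, day of month, month, day of week, optional year). The function name `label_cron_fields` is hypothetical, and the sketch only splits the string; it does not validate the individual values.

```python
# Field names in the assumed Quartz-style order; the year is optional.
FIELD_NAMES = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def label_cron_fields(expression: str) -> dict:
    """Map each whitespace-separated part of a CRON expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so a missing year is simply left out.
    return dict(zip(FIELD_NAMES, parts))


if __name__ == "__main__":
    # Runs at 1:00 PM every day.
    print(label_cron_fields("0 0 13 * * ?"))
    # Runs at 2:20 PM every Monday through Friday.
    print(label_cron_fields("0 20 14 ? * MON-FRI"))
```

Reading an expression field by field like this is a quick way to spot a value that has slipped into the wrong position, for example an hour written into the minutes field.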
As always, I can highly recommend the Apache NiFi Documentation, where you can also find some examples.
If you're not familiar with the CRON syntax but want to use it, I recommend an online editor such as Free Online Cron Expression Generator and Describer to generate the expression for you. You can copy and paste the result directly into Apache NiFi. These editors often show additional "debug" information as well, for example the time of the next execution.
We have seen two compelling scheduling options for dataflows in Apache NiFi. Together, they cover both batch-oriented and streaming use cases. For some processors, a third option exists: event driven scheduling. This option is still "under development", so we won't cover it today.
If you are interested in Apache NiFi or are already using it, feel free to write us an e-mail.
Consultant at ORDIX