
Let’s plan it – Scheduling in Apache NiFi

One key feature of Apache NiFi is scheduling. NiFi dataflows should cover both streaming and batch use cases. But how can we connect these two worlds, and what settings are available? In this blog post, we look at the scheduling possibilities and show how to use them correctly.

General scheduling options 

A dataflow engineer has two main options for scheduling a processor. Some special processors offer a third option, but it is marked as experimental.

The first two options are timer driven and CRON driven.

As a rule of thumb: use the CRON driven option as a batch-oriented trigger for your flow, and the timer driven option for periodic flows.

Let's first take a more in-depth look at the timer driven option.

The basics: Run Schedule and Timer driven

The default scheduling option for all processors is timer driven, so we look at it first. The data engineer has only one setting to configure: the Run Schedule. It defines how much time should elapse before the processor executes again.

Let's check out some examples:

1. Run schedule set to 0 s (default) → The processor runs as fast as possible. Whenever a new FlowFile arrives, the processor executes immediately.

2. Run schedule set to 5 mins → After an execution, the processor waits 5 minutes before the next execution.

3. Run schedule set to 1 week → After an execution, the processor waits 1 week before the next execution.

You are probably wondering now: which time units are possible? We have seen "s", "mins" and "week", but several other units are available as well.
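To make the three examples above more tangible, here is a minimal Python sketch that converts a Run Schedule string into seconds. The function name parse_run_schedule and the unit table are assumptions made for this illustration. The table only covers the units seen in this post plus a few obvious aliases, and it is not NiFi's own parser or its complete list of units.

```python
import re

# Hypothetical unit table for this sketch only. It covers the units used in
# the examples plus a few obvious aliases; it is NOT NiFi's official list.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "secs": 1,
    "min": 60, "mins": 60,
    "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
    "week": 604800, "weeks": 604800,
}


def parse_run_schedule(value: str) -> int:
    """Convert a string such as '5 mins' into a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", value)
    if match is None:
        raise ValueError(f"not a '<number> <unit>' value: {value!r}")
    amount = int(match.group(1))
    unit = match.group(2).lower()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unit not covered by this sketch: {unit!r}")
    return amount * _UNIT_SECONDS[unit]


# The three Run Schedule values from the examples above.
for example in ("0 s", "5 mins", "1 week"):
    print(example, "->", parse_run_schedule(example), "seconds")
```

A result of 0 seconds corresponds to the "as fast as possible" case, while any larger value is the pause between two executions.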

It's planned: CRON Driven options

Especially for batch-oriented data flows, such as daily table loads or data transfers at the end of the month, the CRON driven option is more suitable. CRON driven scheduling is closely modeled on the well-known Linux crontab syntax.

A CRON expression consists of six required fields and one optional field, in the following order:

  • Seconds
  • Minutes
  • Hours
  • Day of Month
  • Month
  • Day of Week
  • Year (optional)
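The field order listed above can be illustrated with a short Python sketch that splits an expression into named parts. The names CRON_FIELDS and split_cron are hypothetical helpers for this post, and the code only checks the number of fields, not whether each value is valid.

```python
# Field names in the order given in the list above (the last one is optional).
CRON_FIELDS = (
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
)


def split_cron(expression: str) -> dict:
    """Map each whitespace-separated part of the expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so 'year' is simply absent
    # when only the six required fields are given.
    return dict(zip(CRON_FIELDS, parts))


for name, value in split_cron("0 0 11 ? * * *").items():
    print(f"{name:>12}: {value}")
```

Reading an expression field by field like this makes it easier to see which part controls what before pasting it into a processor configuration.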

Since you can use the crontab syntax, you can specify static values, e.g., every day at 11 AM (0 0 11 ? * * *), but also increments, such as every 15 minutes, or ranges.
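How static values, increments and ranges translate into concrete points in time can be sketched in a few lines of Python. The helper expand_field is hypothetical and deliberately simplified: it treats "?" like "*" and ignores month or weekday names and other special characters, so it should be read as an illustration rather than as the parser NiFi uses.

```python
from typing import List


def expand_field(field: str, lowest: int, highest: int) -> List[int]:
    """Expand one numeric cron field into the sorted list of matching values."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"step must be positive: {part!r}")
        if base in ("*", "?"):
            # Simplification: '?' is treated like '*', i.e. the full range.
            start, end = lowest, highest
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # 'start/step' runs up to the maximum; a bare number is one value.
            end = highest if step_text else start
        if not (lowest <= start <= end <= highest):
            raise ValueError(f"value out of range: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


print(expand_field("11", 0, 23))     # static value in the hours field
print(expand_field("0/15", 0, 59))   # increment in the minutes field
print(expand_field("9-17", 0, 23))   # range in the hours field
```

With this reading, "0/15" in the minutes field stands for minutes 0, 15, 30 and 45 of every matching hour.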


As always, I can highly recommend the Apache NiFi Documentation, where you can also find some examples.

If you are not familiar with the CRON syntax but want to use it, I recommend online editors such as Free Online Cron Expression Generator and Describer to generate the expression for you. You can copy and paste the result directly into Apache NiFi. These tools often also show "debug" information, for example the time of the next execution.
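For a simple schedule, the time of the next execution can also be worked out by hand. The following Python sketch does this for the "every day at 11 AM" example. The function next_daily_run is a hypothetical helper that only handles a fixed daily time and does not evaluate general CRON expressions.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int, minute: int = 0, second: int = 0) -> datetime:
    """Return the next moment strictly after 'now' at the given time of day."""
    candidate = now.replace(hour=hour, minute=minute, second=second, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is right now), so use tomorrow.
        candidate += timedelta(days=1)
    return candidate


# "0 0 11 ? * * *" fires at 11:00:00 every day.
print(next_daily_run(datetime(2024, 1, 1, 9, 30), hour=11))
print(next_daily_run(datetime(2024, 1, 1, 14, 0), hour=11))
```

Comparing such a hand-computed value with the one shown by an online editor is a quick sanity check before deploying a schedule.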

Conclusion 

We have seen two compelling scheduling options for data flows in Apache NiFi. Together, they cover batch-oriented and streaming use cases. Some processors offer another option, event driven scheduling. This option is still "under development", so we won't cover it today.

If you are interested in Apache NiFi or are already using it, feel free to write us an e-mail.


Senior Consultant at ORDIX

 
