In the rapidly evolving landscape of data processing, Apache NiFi has emerged as a powerful tool for automating the flow of data between systems. With NiFi 2, developers and data engineers can implement new processors directly in Python, which is a significant change. This blog post shows how you can deploy new processors into NiFi as part of a Continuous Deployment (CD) pipeline. You can easily adapt the process to your CI/CD tool, for example Jenkins, Tekton or GitLab CI/CD.
NiFi 2 and Python Processors
Apache NiFi 2 offers many new features, but today we focus on deploying Python processors. They allow developers to leverage the versatility of Python for data transformation and processing tasks. By integrating Python into NiFi, users can create custom processors tailored to their specific data processing needs, which extends what their data flows can do. And yes, since it is a hot topic: you can integrate a lot of AI functionality into your Python processors.
We will not go into much detail on Continuous Integration (CI), but here is a quick rundown. We assume that your workflow looks roughly like this:
You develop locally and test your processor with some basic flows. We highly recommend using at least pylint, and ideally pytest as well, for your processor. When you are done with local development, you commit and push your code. At this point your pipeline takes over: the lint and test jobs run (hopefully successfully), and the last step of the CI part builds the processor into a NAR file with hatch-datavolo-nar.
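To make the pytest step useful without a running NiFi, it helps to keep the actual transformation logic in plain Python functions. The sketch below assumes a hypothetical processor that normalizes JSON records; the function name normalize_record is illustrative only. The NiFi-facing class (in NiFi 2, for instance a FlowFileTransform subclass from the nifiapi package) would stay a thin wrapper that calls such a function from its transform method, so only the pure logic is tested here.

```python
"""Sketch: keep processor logic testable without a running NiFi.

Hypothetical example: a NiFi processor's transform() method would delegate
to normalize_record(); pytest only exercises the pure function.
"""
import json


def normalize_record(raw: bytes) -> bytes:
    """Parse a JSON object, lower-case its keys and serialize it again."""
    record = json.loads(raw.decode("utf-8"))
    normalized = {str(key).lower(): value for key, value in record.items()}
    # sort_keys makes the output deterministic, which keeps assertions simple
    return json.dumps(normalized, sort_keys=True).encode("utf-8")


# pytest collects functions named test_* automatically in the CI test job
def test_normalize_record_lowercases_keys():
    result = normalize_record(b'{"Name": "NiFi", "VERSION": 2}')
    assert json.loads(result) == {"name": "NiFi", "version": 2}


def test_normalize_record_is_deterministic():
    first = normalize_record(b'{"B": 1, "A": 2}')
    second = normalize_record(b'{"A": 2, "B": 1}')
    assert first == second == b'{"a": 2, "b": 1}'
```

Because these tests need neither NiFi nor the nifiapi package, they run in any plain Python CI image before the NAR build step.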
So the question is: after developing our processors, how can we deploy them to our staging or even production NiFi environment?
The Answer: Using NiFi Registry as a NAR Provider
NiFi Registry offers a robust feature set for managing versions of data flows. While flow versioning is probably its best-known feature, what makes it the answer to our deployment problem is its ability to act as a NAR provider. NiFi can then pull new processor versions from the registry dynamically, so you can update custom processors without restarting NiFi or interrupting running data flows.
Several NAR provider implementations are available. NiFi Registry is one of them, but the same approach works with the HDFS provider as well. The NAR provider is configured in the nifi.properties file.
On the NiFi Registry side, nothing needs to be configured; it works out of the box. In NiFi, however, we must set two values in the nifi.properties file: the implementation class of the NAR provider and the URL of the registry.
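If your CD tooling also manages the NiFi configuration, those two values can be written programmatically. The following is a minimal sketch, assuming the property pattern nifi.nar.library.provider.<providerName>.implementation / .url and the class org.apache.nifi.registry.extension.NiFiRegistryExternalResourceProvider as described in the NAR Providers section of the NiFi Administration Guide; please verify both against your NiFi version. The provider name "nifi-registry" is an arbitrary choice for this example, and the helper assumes plain key=value lines as used in nifi.properties.

```python
"""Sketch: configure NiFi Registry as NAR provider in nifi.properties.

Assumptions: property names and implementation class taken from the NiFi
Administration Guide (verify for your version); "nifi-registry" is an
arbitrary provider name; the file uses key=value lines.
"""
from pathlib import Path

PROVIDER_NAME = "nifi-registry"  # free-form name, only used inside the keys
PREFIX = f"nifi.nar.library.provider.{PROVIDER_NAME}"
IMPLEMENTATION = (
    "org.apache.nifi.registry.extension.NiFiRegistryExternalResourceProvider"
)


def nar_provider_properties(registry_url: str) -> dict[str, str]:
    """Return the two properties that point NiFi at a NiFi Registry."""
    return {
        f"{PREFIX}.implementation": IMPLEMENTATION,
        f"{PREFIX}.url": registry_url,
    }


def apply_properties(text: str, updates: dict[str, str]) -> str:
    """Overwrite existing keys in a properties text and append missing ones."""
    remaining = dict(updates)
    lines = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if not line.lstrip().startswith("#") and key in remaining:
            lines.append(f"{key}={remaining.pop(key)}")
        else:
            lines.append(line)  # comments and unrelated keys stay untouched
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


def configure(properties_file: Path, registry_url: str) -> None:
    """Rewrite nifi.properties in place with the NAR provider settings."""
    text = properties_file.read_text(encoding="utf-8")
    updated = apply_properties(text, nar_provider_properties(registry_url))
    properties_file.write_text(updated, encoding="utf-8")
```

NiFi reads nifi.properties at startup, so this change needs one restart; afterwards, new processor versions arrive through the provider without further restarts.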
What happens now? NiFi uses the NiFiRegistryExternalResourceProvider to connect to the configured registry. It then checks the registry for NAR bundles every 5 minutes by default and automatically pulls and loads new ones. You can change this polling interval if necessary.
Uploading NARs to NiFi Registry
After building your Python processor (we recommend hatch-datavolo-nar for this), you get a new NAR file. So how can you upload it to the registry?
One of the easiest ways is the NiFi Registry REST API. Using the API, you can also automate the upload of NAR files to the registry as part of your CD pipeline.
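Uploads go into a bucket, so a pipeline usually needs the bucket's identifier rather than its human-readable name. Here is a minimal sketch that resolves the name via the REST API, assuming that GET <registry-url>/nifi-registry-api/buckets returns a JSON array of bucket objects with "identifier" and "name" fields, and that the registry is unsecured (no TLS client certificates or tokens). The bucket name "python-processors" in the usage below is a hypothetical example.

```python
"""Sketch: resolve a NiFi Registry bucket name to its identifier.

Assumptions: GET <registry>/nifi-registry-api/buckets returns a JSON array
of bucket objects with "identifier" and "name"; unsecured registry.
"""
import json
import urllib.request


def find_bucket_id(buckets_json: str, bucket_name: str) -> str:
    """Return the identifier of the bucket called bucket_name."""
    for bucket in json.loads(buckets_json):
        if bucket.get("name") == bucket_name:
            return bucket["identifier"]
    raise LookupError(f"bucket {bucket_name!r} not found in NiFi Registry")


def fetch_bucket_id(registry_url: str, bucket_name: str) -> str:
    """Query the registry over HTTP and resolve bucket_name."""
    url = f"{registry_url.rstrip('/')}/nifi-registry-api/buckets"
    with urllib.request.urlopen(url, timeout=30) as response:
        return find_bucket_id(response.read().decode("utf-8"), bucket_name)
```

A pipeline step could call fetch_bucket_id("http://localhost:18080", "python-processors") once and pass the result to the upload. Keeping the JSON parsing separate from the HTTP call makes it testable offline.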
If you prefer not to call the REST API directly, there are other options for deploying your custom NAR files, for example the NiFi Toolkit, which wraps the API.
Upload via cURL:
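You can upload a new NAR bundle with a simple cURL command: a multipart POST request that sends the NAR file as the form field "file" to the endpoint /nifi-registry-api/buckets/{bucketId}/bundles/nifi-nar of your NiFi Registry.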
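If your pipeline runs Python steps rather than shell steps, the same multipart upload can be done with the standard library alone. This is a minimal sketch, assuming the upload endpoint <registry-url>/nifi-registry-api/buckets/<bucket-id>/bundles/nifi-nar with a form field named "file", and an unsecured registry; the file and bucket names in the test are placeholders.

```python
"""Sketch: upload a NAR to NiFi Registry using only the standard library.

Roughly equivalent cURL call (placeholders in angle brackets):
  curl -X POST -F "file=@<path-to.nar>" \
       <registry-url>/nifi-registry-api/buckets/<bucket-id>/bundles/nifi-nar
Assumes an unsecured registry; add TLS and authentication for real setups.
"""
import urllib.request
import uuid
from pathlib import Path


def build_upload_request(registry_url: str, bucket_id: str,
                         nar_path: Path) -> urllib.request.Request:
    """Create (but do not send) the multipart POST for one NAR bundle."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{nar_path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    url = (f"{registry_url.rstrip('/')}/nifi-registry-api/buckets/"
           f"{bucket_id}/bundles/nifi-nar")
    return urllib.request.Request(
        url,
        data=head + nar_path.read_bytes() + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def upload_nar(registry_url: str, bucket_id: str, nar_path: Path) -> int:
    """Send the upload; urlopen raises HTTPError on 4xx/5xx responses."""
    request = build_upload_request(registry_url, bucket_id, nar_path)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.status
```

Building the request separately from sending it keeps the multipart encoding unit-testable, and a failed upload surfaces as an exception that fails the pipeline job.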
By integrating the REST API call into your CI/CD pipeline, you can automate the process of pushing new processor versions into NiFi Registry.
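In the pipeline itself, the upload step mostly needs to know which NAR the build produced and where to send it. The sketch below is hypothetical glue code: it assumes the build writes NAR files to ./dist and that the pipeline provides the environment variables NIFI_REGISTRY_URL and NIFI_REGISTRY_BUCKET_ID (names chosen for this example, not defined by NiFi).

```python
"""Sketch: CI/CD glue that selects the freshly built NAR for upload.

Hypothetical conventions: NARs are built into ./dist; the pipeline exposes
NIFI_REGISTRY_URL and NIFI_REGISTRY_BUCKET_ID as environment variables.
"""
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class UploadJob:
    registry_url: str
    bucket_id: str
    nar_path: Path


def find_single_nar(dist_dir: Path) -> Path:
    """Return the only *.nar in dist_dir; fail if there are zero or several."""
    nars = sorted(dist_dir.glob("*.nar"))
    if len(nars) != 1:
        raise RuntimeError(
            f"expected exactly one NAR in {dist_dir}, found {len(nars)}"
        )
    return nars[0]


def job_from_env(env: Mapping[str, str],
                 dist_dir: Path = Path("dist")) -> UploadJob:
    """Collect what the upload step needs; missing variables raise KeyError."""
    return UploadJob(
        registry_url=env["NIFI_REGISTRY_URL"],
        bucket_id=env["NIFI_REGISTRY_BUCKET_ID"],
        nar_path=find_single_nar(dist_dir),
    )
```

A pipeline step would call job_from_env(os.environ) and hand the result to its upload routine. Refusing to proceed when several NARs are present is a deliberate choice: it prevents stale artifacts from an earlier build from being pushed to the registry by accident.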
Conclusion: Simple CD Pipeline for your Python Processors with NiFi Registry
Deploying new processor versions in NiFi does not have to be a hassle. By using NiFi Registry as a NAR provider, you can automate the deployment process and ensure that your custom processors are always up to date. Whether you are working with Python processors or Java-based NARs, this approach keeps your data flows agile and your deployment process streamlined.