Practice 13 - Advanced Apache NiFi and Edge computing with MiNiFi
In this practice session we will continue working with Apache NiFi. You will learn some more advanced features of NiFi and how to set up NiFi data pipelines across multiple devices. We will set up two instances, one NiFi server for central data management and one smaller instance running MiNiFi, which will represent more resource constrained Edge devices.
MiNiFi is a special NiFi sub-project for resource-constrained devices, without a graphical interface and with a limited number of Processors. We will design a multi-stage data pipeline in which data is first collected, preprocessed and filtered on the MiNiFi device and then passed along to the NiFi server, where it is processed and written to a CouchDB JSON non-relational database.
The lab content is related to the RADON Horizon 2020 EU project (http://radon-h2020.eu/), whose goal is to unlock the benefits of serverless FaaS for the European software industry using the TOSCA language and the tools developed in the project.
G. Casale, M. Artac, W.-J. van den Heuvel, A. van Hoorn, P. Jakovits, F. Leymann, M. Long, V. Papanikolaou, D. Presenza, A. Russo, S. N. Srirama, D.A. Tamburri, M. Wurster, L. Zhu RADON: rational decomposition and orchestration for serverless computing. SICS Software-Intensive Cyber-Physical Systems. 2019 Jan 1-11. https://doi.org/10.1007/s00450-019-00413-w
References
- Apache NiFi In Depth https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#flowfile-repository
- Apache NiFi documentation http://nifi.apache.org/docs.html
- MiNiFi documentation: https://nifi.apache.org/minifi/getting-started.html
Practical session communication!
Lab supervisors will provide support through Slack
- We have set up a Slack workspace for lab and course related discussions. Invitation link was sent separately to all students through email.
- Use the #practice13-minifi Slack channel for lab related questions and discussion.
- When asking the lab assistants questions or requesting support, please make sure to also provide all needed information, including screenshots, configurations, and even code if needed.
- If code needs to be shown to the lab supervisor, send it (or a link to it) through Slack Direct Messages.
Exercise 13.1. Setting up Apache NiFi server!
In this task we will use OpenStack to run Apache NiFi inside an instance using Docker.
- Create an OpenStack instance for NiFi, just like you did in practice session 12.
- Flavour should be m2.tiny
- Install Docker on the instance, just like we did in lab 3 "Containers: Working with Docker"
- Make sure to follow all the steps of installing Docker in the guide, including pre-configuring the network addresses before installing Docker. Otherwise you may lose access to the instance after installing Docker.
- Create a Docker container with the Apache NiFi image from Docker Hub using the following command:
sudo docker run --name nifi \
  -p 8080:8080 \
  -p 8081:8081 \
  -d \
  -e NIFI_WEB_HTTP_PORT='8080' \
  apache/nifi:latest
- NiFi will run inside the Docker container, and the NiFi web interface port 8080 will be made available through the instance.
- To access the NiFi web interface, direct your browser to http://<IP_of_Nifi_Instance>:8080/nifi after replacing the IP with the OpenStack instance IP.
Let's also set up a CouchDB JSON database on this instance through Docker.
- Run the following command:
- Make sure to change CHANGE_THIS_PASSWORD in the following command to an actual password.
docker run -p 8082:5984 -d -e COUCHDB_USER=admin -e COUCHDB_PASSWORD=CHANGE_THIS_PASSWORD --name couchdb couchdb
- Remember the username and password, you will need them later.
- The CouchDB server will be exposed through port 8082.
- You can access the CouchDB web interface at http://<IP_of_Nifi_Instance>:8082/_utils/
- Create a new CouchDB database (NOT partitioned) called test. We will insert documents into this database later.
- We do not need to use the CouchDB server further at this moment, but we will come back to it in the last exercise.
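As an alternative to the web interface, the test database could also be created through CouchDB's HTTP API, where a PUT request to /<database name> creates a database (non-partitioned unless requested otherwise). The following is a minimal sketch, assuming curl is installed and the CouchDB container from the command above is running. The helper names couchdb_url and create_db are hypothetical, and the IP, user and password arguments are placeholders.

```shell
# Hypothetical helper: build the URL of a CouchDB database on the NiFi instance.
# $1 = instance IP, $2 = database name. Port 8082 is the host port mapped above.
couchdb_url() {
  printf 'http://%s:8082/%s\n' "$1" "$2"
}

# Hypothetical helper: create a database with an HTTP PUT request.
# $1 = instance IP, $2 = CouchDB user, $3 = CouchDB password, $4 = database name.
create_db() {
  curl -s -X PUT -u "$2:$3" "$(couchdb_url "$1" "$4")"
}
```

For example, `create_db <IP_of_Nifi_Instance> admin <your password> test` should answer with a JSON response containing "ok":true when the database was created.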
Exercise 13.2. Setting up a MiNiFi device!
MiNiFi, as its name indicates, is a mini version of NiFi, which is designed to be deployed on more resource-constrained devices. MiNiFi does not have a graphical interface and the number of supported processors is lower. There are two versions of MiNiFi, Java and C++; the C++ version supports even fewer processors than the Java version.
You can read more about MiNiFi in the MiNiFi documentation: https://nifi.apache.org/minifi/getting-started.html
In this task we will use OpenStack to run Apache Java MiNiFi inside another instance. As a result we will have two instances running.
- Create an OpenStack instance for MiNiFi
- Flavor should be m2.tiny
- Install Java
sudo apt-get update
sudo apt install openjdk-8-jdk
- Install unzip
sudo apt install unzip
- Install MiNiFi service
wget https://downloads.apache.org/nifi/minifi/0.5.0/minifi-0.5.0-bin.zip
unzip minifi-0.5.0-bin.zip
cd minifi-0.5.0/
sudo bin/minifi.sh install
- Install MiNiFi toolkit
cd ~
wget https://downloads.apache.org/nifi/minifi/0.5.0/minifi-toolkit-0.5.0-bin.zip
unzip minifi-toolkit-0.5.0-bin.zip
- Modify the hosts file on the MiNiFi instance to define which IP address matches the hostname of the NiFi Docker container:
sudo nano /etc/hosts
- NB! Don't skip this step, or you will run into issues later that are hard to debug.
- NB! Also make sure you use the NiFi Docker container ID and not the CouchDB Docker container ID!
- Add a new line inside the hosts file, which contains the following values:
- NIFI_INSTANCE_IP NIFI_DOCKER_CONTAINER_ID
- You can check the Docker container ID by running the following command on the NiFi instance, which lists all running Docker containers:
docker ps
- The result should look something like this (with your own Docker container ID and NiFi instance IP):
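The hosts entry is needed because a Docker container uses its short container ID as its hostname by default, and the MiNiFi instance must be able to resolve that name to the NiFi instance IP. The following is a minimal sketch of composing the line. The helper name hosts_entry is hypothetical, and the IP and container ID in the comments are made-up example values, not output from a real instance.

```shell
# Hypothetical helper: print one /etc/hosts line.
# $1 = NiFi instance IP, $2 = NiFi Docker container ID (first column of "docker ps").
hosts_entry() {
  printf '%s %s\n' "$1" "$2"
}

# Example with made-up values:
#   hosts_entry 172.17.90.10 3f2a9c1b7d4e
# On the MiNiFi instance the line could be appended without opening an editor:
#   hosts_entry 172.17.90.10 3f2a9c1b7d4e | sudo tee -a /etc/hosts
```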
Exercise 13.3. Designing data pipelines in NiFi Web interface
As mentioned at the start of the practice session, MiNiFi has no graphical interface for designing data pipelines. Thus, we will design all the required data pipelines in the NiFi web interface and later migrate one of the pipelines to the MiNiFi device.
To make it easier to manage NiFi pipelines and to create templates, we will create two NiFi Processor groups, and create pipelines inside the processor groups.
- Create processor groups named:
- NiFi Pipeline
- MiNiFi Pipeline
- The result should look like this:
Let's design the NiFi pipeline first, which will receive data from the MiNiFi data pipeline.
- Enter the NiFi Pipeline processor group by double clicking on it or using right click and enter group.
- Create a remote Input port for accepting remote connections to send data to this pipeline
- Allow Remote Access: true
- External NiFi/MiNiFi pipelines can connect to this input port and send data through it.
- Input and output ports, together with process groups and remote process groups, make it easy to combine smaller data pipelines into larger ones, even if they are located on different NiFi/MiNiFi servers.
- Create an UpdateAttribute processor for defining the mime type of each incoming FlowFile
- Add a new Property to the Processor:
- mime.type: application/json
- Connect Input port output to UpdateAttribute input
- Create an InvokeHTTP processor for putting the data into our CouchDB non-relational database named test
- Remote URL: http://<IP_OF_NIFI_INSTANCE>:8082/test/
- NB! Make sure the port is 8082 and the URL path is /test (not /nifi)
- HTTP Method: POST
- Basic Authentication Username: <COUCHDB_USERNAME>
- Basic Authentication Password: <COUCHDB_PASSWORD>
- Connect UpdateAttribute output to InvokeHTTP input
- You can exit the NiFi Pipeline processor group and start or stop the whole pipeline by using commands on the processor group.
- You should also recheck that you have created the CouchDB database named test inside the CouchDB server.
- The result should look something like this.
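To check the CouchDB URL and credentials independently of NiFi, the request that InvokeHTTP is configured to send can be reproduced by hand. The following is a rough sketch, assuming curl is installed and that InvokeHTTP takes its Content-Type header from the mime.type attribute (its default setting), which would explain why the UpdateAttribute step is there: CouchDB expects application/json on document POST requests. The helper name post_doc and the DRY_RUN switch are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: POST one JSON document to the "test" database.
# $1 = NiFi instance IP, $2 = CouchDB user, $3 = CouchDB password, $4 = JSON document.
post_doc() {
  url="http://$1:8082/test/"
  # With DRY_RUN=1, only print the request line instead of contacting the server.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "POST $url"
    return 0
  fi
  # The Content-Type header plays the role of the mime.type attribute
  # that UpdateAttribute sets on each FlowFile.
  curl -s -X POST -u "$2:$3" -H 'Content-Type: application/json' -d "$4" "$url"
}
```

For example, `post_doc <IP_OF_NIFI_INSTANCE> <COUCHDB_USERNAME> <COUCHDB_PASSWORD> '{"temp": 80}'` should return a JSON response containing "ok":true if the URL, database name and credentials are correct.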
Let's design the MiNiFi pipeline now. We will design it so that we can first test it fully inside the NiFi server before migrating it to the MiNiFi device.
- Enter the MiNiFi Pipeline processor group by double clicking on it or using right click and enter group.
- Create a GenerateFlowFile processor for generating input data.
- Let's limit how often FlowFiles are generated. Under scheduling, change:
- Run Schedule: 1 sec
- Let's generate JSON objects containing random environment sensor data (temp, CO2, humidity)
- Change Custom Text value to:
{"temp": ${random():mod(50):plus(50)}, "co2": ${random():mod(200):plus(150)}, "humidity": ${random():mod(60):plus(15)} }
- Create an EvaluateJsonPath processor for extracting the needed values from the generated JSON document as NiFi attributes.
- Let's create a new tempAttribute by reading the temp value from the FlowFile.
- Change the following property:
- Destination: flowfile-attribute
- This specifies that the EvaluateJsonPath processor should update the attributes and not the content (JSON) of the FlowFile
- Add a new property to the Processor with the following key and value:
- tempAttribute: $.temp
- The JSONPath expression $.temp looks up the JSON element named temp in each FlowFile.
- Connect GenerateFlowFile output to EvaluateJsonPath input
- Create a RouteOnAttribute processor for defining how to filter the data which is sent to the NiFi server
- We will forward data to NiFi only if the temperature value (which should now be available in the FlowFile tempAttribute attribute) is larger than 75.
- Add a new property to the processor to define a new routing rule:
- highTemp: ${tempAttribute:gt(75)}
- Connect EvaluateJsonPath matched relationship output to RouteOnAttribute input
- Create a Remote Processor Group for connecting to a NiFi server over the internet
- URLs: http://<IP_OF_NIFI_INSTANCE>:8080/nifi/
- Transport Protocol: HTTP
- Connect RouteOnAttribute highTemp relationship output to Remote Processor Group input
- Create a PutFile processor for storing the original generated files in the filesystem, which is useful for checking that the MiNiFi flow is working properly.
- Change the processor to output FlowFiles into the /tmp/nifi folder
- Later, when running the pipeline in MiNiFi, you can check that new files are being created in this folder to confirm that the pipeline was properly started.
- Connect GenerateFlowFile output to PutFile input
- You can exit the MiNiFi Pipeline processor group and start or stop the whole pipeline by using commands on the processor group.
- The result should look something like this.
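As a rough illustration of what the MiNiFi pipeline computes, the generate, extract and filter steps can be imitated in plain shell. This is only a sketch of the logic, not how NiFi implements it. It assumes that random() yields a non-negative whole number, so that mod(50):plus(50) gives a temperature between 50 and 99, and it uses /dev/urandom and sed as stand-ins. All function names are hypothetical.

```shell
# Mirrors ${random():mod(N):plus(M)}: a random integer in [M, M+N-1].
# $1 = modulus N, $2 = offset M. Two random bytes are read from /dev/urandom.
rand_range() {
  n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
  echo $(( n % $1 + $2 ))
}

# Mirrors the GenerateFlowFile Custom Text: one JSON sensor reading.
make_reading() {
  printf '{"temp": %d, "co2": %d, "humidity": %d }\n' \
    "$(rand_range 50 50)" "$(rand_range 200 150)" "$(rand_range 60 15)"
}

# Mirrors EvaluateJsonPath with $.temp: extract the temp value from a reading.
temp_of() {
  printf '%s\n' "$1" | sed -n 's/.*"temp": *\([0-9]*\).*/\1/p'
}

# Mirrors RouteOnAttribute with ${tempAttribute:gt(75)}: succeeds only if temp > 75.
is_high_temp() {
  [ "$(temp_of "$1")" -gt 75 ]
}

# Generate ten readings and print only those that would be forwarded to NiFi.
for i in 1 2 3 4 5 6 7 8 9 10; do
  reading=$(make_reading)
  if is_high_temp "$reading"; then
    echo "forward: $reading"
  fi
done
```

With temperatures spread over 50 to 99, roughly half of the generated readings pass the filter, so the NiFi pipeline should receive noticeably fewer FlowFiles than PutFile writes to /tmp/nifi.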
Exercise 13.4. Migrating data pipeline to MiNiFi
In this task we will move the MiNiFi data pipeline to the MiNiFi device, and configure it to connect and send data to the pipeline running inside the NiFi server.
- Inside the NiFi web interface: create a template of the MiNiFi pipeline processor group and download it
- Move the template XML file to the MiNiFi instance.
- You can use the scp command (on Windows, this is available in Git Bash, for example) to move the file to the MiNiFi device/instance.
- Converting the NiFi XML template into MiNiFi YAML format:
- Assuming the file you copied or created is named flow.xml, you can use the following command:
~/minifi-toolkit-0.5.0/bin/config.sh transform flow.xml config.yml
NB! If you get an error about validation failing for the created config file, the generated file needs to be fixed manually.
- If the error looks like this:
- It means that the last connection, between RouteOnAttribute and the Remote Processor Group, has wrong IDs in the generated config file. This issue comes from using the newest NiFi version with a MiNiFi toolkit version that has not been updated to match.
- We can fix it manually by editing the config.yml file.
- These are the ID values that need to be changed:
- Replace the IDs marked with red boxes in your config file with the value marked with the green box.
- The result should look like this:
Let's now use the modified configuration file.
- Move the converted pipeline configuration file into MiNiFi configuration folder
mv config.yml ~/minifi-0.5.0/conf/config.yml
- Starting MiNiFi service:
~/minifi-0.5.0/bin/minifi.sh start
- If you need to stop the MiNiFi service (do not run this command unless you need to stop or restart MiNiFi):
~/minifi-0.5.0/bin/minifi.sh stop
- This is useful when you get an error about MiNiFi already running. In that case, first stop MiNiFi and then start it again.
- You can check for any issues with MiNiFi in the MiNiFi logs at ~/minifi-0.5.0/logs/minifi-app.log
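When the log grows long, a quick count of error lines can show whether anything needs attention before reading it in full. The following is a minimal sketch, assuming that log lines carry the level as a separate word (for example "... ERROR [thread] ..."), which should be checked against your own log file. The helper name count_errors is hypothetical.

```shell
# Hypothetical helper: print how many lines of a log file contain " ERROR ".
# $1 = path to the log file.
count_errors() {
  grep -c ' ERROR ' "$1"
}

# Typical use on the MiNiFi instance:
#   count_errors ~/minifi-0.5.0/logs/minifi-app.log
#   tail -n 50 ~/minifi-0.5.0/logs/minifi-app.log    # show the most recent entries
```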
Check that the whole MiNiFi -> NiFi -> CouchDB flow is working
- Check that new files are created under /tmp/nifi inside the MiNiFi device
- Take a screenshot of the folder content
- Check that FlowFiles arrive in NiFi server
- Take a screenshot of the NiFi Pipeline
- Check that new documents are being added to the CouchDB server.
- You may have to refresh the page, as CouchDB login sessions expire quite quickly
- Take a screenshot of the CouchDB web interface displaying the documents in the test database
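Besides the web interface, the number of stored documents can be read from the database information that CouchDB returns for a GET request on the database URL, which includes a doc_count field. The following is a minimal sketch, assuming curl and sed are available. The helper name doc_count is hypothetical, and the sample values in the comments are placeholders.

```shell
# Hypothetical helper: read CouchDB database info JSON on stdin
# and print the value of its doc_count field.
doc_count() {
  sed -n 's/.*"doc_count": *\([0-9]*\).*/\1/p'
}

# Typical use (replace the placeholders with your own values):
#   curl -s -u admin:CHANGE_THIS_PASSWORD http://<IP_of_Nifi_Instance>:8082/test | doc_count
# Running it twice a few seconds apart should show a growing number
# while the MiNiFi -> NiFi -> CouchDB flow is active.
```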
Deliverables:
- Screenshots from task 13.4
- Templates from tasks 13.3 and 13.4
- MiNiFi pipeline template
- NiFi pipeline template
- MiNiFi config.yml file