OPR4AA stands for On-the-edge PRocessing for Acquisition and Actuation: it is a platform based on open-source FIWARE/Apache components that aims to cover the entire industrial data value chain. Built on top of the DIDA (Digital Industries Data Analytics) platform, it can be deployed for edge processing: it collects data, processes them, and communicates results to external cloud modules.
The component allows you to bring in input data by calling external REST APIs, process them with AI algorithms (e.g. Python with TensorFlow and Keras), persist them in the internal persistence layer, or export them to external modules through REST APIs.
The OPR4AA platform is composed of:
- FIWARE Draco (deprecated): based on Apache NiFi. NiFi is a dataflow system built on the concepts of flow-based programming, designed to automate the flow of data between systems and to support scalable directed graphs of data routing and transformation.
- Apache Airflow: an open-source platform that allows users to programmatically author, schedule, and monitor workflows. It is designed to be highly scalable, extensible, and modular, making it ideal for creating complex data processing pipelines.
- Apache Hadoop Distributed File System (HDFS): designed to reliably store very large files across machines in a large cluster.
- Apache Spark: an open-source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads.
- Apache Livy: a service that enables easy interaction with a Spark cluster over a REST interface. Through it you can submit Spark jobs or snippets of Spark code, retrieve results synchronously or asynchronously, and manage Spark contexts, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers (a minimal submission example follows this list).
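To illustrate the Livy REST interface, the sketch below submits a PySpark script as a Livy batch once the platform is running (Livy listens on port 8998, see the endpoint list further down); the script path is a placeholder, not a file shipped with OPR4AA:

```bash
# Submit a PySpark script as a Livy batch; "file" is the only required field
# (the HDFS path here is hypothetical).
curl --location --request POST 'http://localhost:8998/batches' \
--header 'Content-Type: application/json' \
--data-raw '{"file": "hdfs:///jobs/my_job.py", "name": "example-batch"}'

# List submitted batches and their state
curl --location 'http://localhost:8998/batches'
```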
Prerequisites:
- Docker Engine
- Docker Compose >= 1.29
- At least 8 GB of RAM
Before starting, choose whether to run Airflow or NiFi as the ingestion module, and comment/uncomment docker-compose.yml accordingly.
```bash
# Create the shared bridge network used by the containers
docker network create -d bridge network-bridge

# Start the core stack
docker-compose up --build -d

# Start Airflow, if chosen as the ingestion module
docker-compose -f airflow.yml up --build -d
```
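To verify that the services came up, standard Docker Compose commands are enough (the service name below is an example; use the names defined in your compose file):

```bash
# List the services with their state and exposed ports
docker-compose ps

# Follow the logs of a single service, e.g. a hypothetical spark-master
docker-compose logs -f spark-master
```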
Once the containers are up, the web UIs are reachable at:
- Airflow at http://localhost:8087/
- NiFi at https://localhost:8443/nifi
- HDFS at http://localhost:9870/explorer.html
- Spark Master at http://localhost:8080/
- Spark Worker at http://localhost:8081/
- Livy UI at http://localhost:8998/
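A quick smoke test of the endpoints (sketch only; -k skips certificate verification for NiFi's self-signed certificate, and each command prints the UI's HTTP status code):

```bash
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8087/
curl -sk -o /dev/null -w '%{http_code}\n' https://localhost:8443/nifi
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9870/
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8998/
```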
In Airflow you can configure your own data flow (see the Airflow documentation) in a more lightweight way. The solution provides processing on Spark.
Default credentials:

User | Password |
---|---|
airflow | airflow |
For example, a run of the test-pipeline DAG can be triggered via the Airflow REST API:

```bash
curl --location --request POST 'http://localhost:8087/api/v1/dags/test-pipeline/dagRuns' \
--header 'Authorization: Basic YWlyZmxvdzphaXJmbG93' \
--header 'Content-Type: application/json' \
--data-raw '{
    "conf": {
        "host": "api.host.cloud",
        "username": "*****",
        "password": "*****",
        "source_entity": {
            "entity_id": "OPR4AA-Execution-Test",
            "entity_type": "Entity-Type-Test",
            "attribute": "image"
        },
        "sink_entity": {
            "entity_id": "OPR4AA-Execution-Test",
            "entity_type": "Entity-Type-Test",
            "attribute": "evaluation"
        }
    }
}'
```
The Basic Authorization header is generated by base64-encoding the string "airflow:airflow".
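For reference, the encoded value can be reproduced with the standard base64 tool:

```bash
# -n suppresses the trailing newline so it is not included in the encoding
echo -n 'airflow:airflow' | base64
# prints: YWlyZmxvdzphaXJmbG93
```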
A Postman collection for the DAG run is provided.
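Run status can then be checked through the same API; a minimal sketch using the same credentials:

```bash
# List runs of the test-pipeline DAG together with their state
curl --location 'http://localhost:8087/api/v1/dags/test-pipeline/dagRuns' \
--header 'Authorization: Basic YWlyZmxvdzphaXJmbG93'
```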
In Draco you can configure your own data flow (see the NiFi documentation). The solution provides processing on Draco or Spark. Algorithm & data ingestion can be done by calling the provided APIs. Inside the template pre-loaded on Draco you can activate the flows you prefer and configure each NiFi processor following the notes in the UI.
Default credentials:

User | Password |
---|---|
admin | ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB |
Draco starts with a pre-uploaded template that exposes the following ingestion routes:
HTTP Method | Port | Path | Description |
---|---|---|---|
POST | 8085 | /ingest-algorithm | Algorithm ingestion route. Accepts a .zip file that contains the algorithm files. |
POST | 8086 | /ingest-data | Input data ingestion route. Accepts a file. |
A Postman collection for algorithms & data ingestion is provided.
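Without Postman, both routes can be exercised with curl; a minimal sketch assuming each endpoint expects a multipart form field named file, with placeholder file names (check the processor notes in the NiFi UI for the exact contract):

```bash
# Upload an algorithm package as a .zip archive
curl -X POST -F 'file=@algorithm.zip' http://localhost:8085/ingest-algorithm

# Upload an input data file
curl -X POST -F 'file=@input-image.png' http://localhost:8086/ingest-data
```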
The solution comes with an example you can run to test it: an AI Python image classifier, including a pre-trained, ready-to-use neural network model based on TensorFlow and Keras (see the dedicated documentation).