Streamsets and ARIS Process Mining

This article outlines the steps involved in implementing a solution using Streamsets, Integration, and ARIS Process Mining.

1 Introduction

1.1 Why Streamsets?

Streamsets is a DataOps platform that works with data at large scale. Smart data pipelines can be built and deployed across hybrid and multi-cloud environments from a single portal.

1.2 Why ARIS Process Mining?

ARIS Process Mining lets you understand business processes to find bottlenecks and opportunities for improvement. Compare designed processes to as-is processes to see if they execute as planned, and make changes before deviations impact the bottom line.

1.3 How can they work together?

ARIS Process Mining needs process execution data (logs/audit trails) to mine processes, and it is more effective when there is a continuous stream of incoming data. Streamsets, which processes data at large scale, can extract a meaningful process audit trail and send it to ARIS Process Mining, which can then analyse the data and produce meaningful insights about the processes.

1.4 Use Case for this article

Assume all the systems involved in the process publish audit data/activities to Kafka. From the audit data available in Kafka we can build a Process Mining pipeline.
Streamsets can extract the data from Kafka, then aggregate and transform it. The resulting data can be sent to ARIS Process Mining via Integration.
ARIS Process Mining can use this data to mine the processes and show near real-time process analytics and dashboards. Integration is required to build a workflow that uploads data to ARIS Process Mining using the Data APIs; to ingest data, client applications need to call a series of APIs. As Streamsets is a DataOps platform, it is not suited to building a functional app. Integration, with its existing ARIS Process Mining connector, is the suitable platform to build such a workflow.

2 Pre-requisite

• ARIS Process Mining cloud tenant with Integration enabled
• Streamsets cloud tenant and a Streamsets Data Collector instance; the Data Collector must have network access to the Kafka instance and to the internet
• Kafka and ZooKeeper setup

3 Implementation

3.1 Configure ARIS Process Mining Instance

3.1.1 Enable the Data Ingest APIs

Log in to the ARIS Process Mining instance with a user having the Engineer and Process Mining Admin roles.

Go to Administration > System Integration.

Add a System Integration of type 'Data Ingest API' with auth type 'Client Credentials'.

This creates client credentials: a Client ID and a Client Secret.

These credentials are used to obtain access tokens for calling the APIs.

Curl command to obtain an access token:

curl --location --request POST '' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Accept: application/json' \
--data-urlencode 'clientId={Client Id}' \
--data-urlencode 'clientSecret={Client Secret}' \
--data-urlencode 'tenant={Tenant Name}'
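For reference, the same token request can be sketched in Python using only the standard library. The endpoint URL is passed in as a parameter (it is tenant-specific and not shown here), and the form-field names are taken from the curl command above.

```python
import urllib.parse
import urllib.request

def build_token_request(url, client_id, client_secret, tenant):
    # Form-encode the same fields the curl command above sends.
    body = urllib.parse.urlencode({
        "clientId": client_id,
        "clientSecret": client_secret,
        "tenant": tenant,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )

# Passing the built request to urllib.request.urlopen() would return
# the token response as JSON.
```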

3.1.2 Create a Process Mining Project and Data Collection

Go to Projects, create a project with an associated Data Collection, and create an Analysis in the project.

3.1.3 Add Data Ingest API License to Data Collection

Go to Data Collections and open the newly created Data Collection.

Go to Connections.

Add a new connection, give it a name, select the System Integration, and attach an available license. If you don't have a license, contact your administrator to get Data Ingest licenses.

This step is critical: without this connection you will not have permission to upload data using the REST APIs.


3.1.4 Create Table in the Data Collection using REST API

The Ingest APIs do not work with tables created directly on the portal, so tables need to be created via the APIs.

Go to Source Tables.

Create a table using REST APIs

Curl command to create the table:

curl --location --request POST '' \
--header 'Authorization: Bearer {Access Token}' \
--header 'Content-Type: application/json' \
--data-raw '[
    {
        "name": "parceldelivery_csv",
        "namespace": "default",
        "columns": [
            { "dataType": "STRING", "name": "Case_ID" },
            { "dataType": "STRING", "name": "Activity" },
            { "dataType": "FORMATTED_TIMESTAMP", "name": "Start", "format": "dd.MM.yyyy HH:mm" },
            { "dataType": "FORMATTED_TIMESTAMP", "name": "End", "format": "dd.MM.yyyy HH:mm" },
            { "dataType": "STRING", "name": "Product" },
            { "dataType": "STRING", "name": "Customer" },
            { "dataType": "STRING", "name": "Country" },
            { "dataType": "STRING", "name": "Delivery type" }
        ]
    }
]'
After executing the API call, check the portal for the newly created table.
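The table-definition payload from the curl command can also be generated programmatically, which is handy when the column list changes. A minimal Python sketch, assuming the same column layout as above:

```python
import json

def column(name, data_type, fmt=None):
    """One column entry for the table-definition payload."""
    col = {"dataType": data_type, "name": name}
    if fmt:
        col["format"] = fmt
    return col

# Same table definition as the curl command above.
tables = [{
    "name": "parceldelivery_csv",
    "namespace": "default",
    "columns": [
        column("Case_ID", "STRING"),
        column("Activity", "STRING"),
        column("Start", "FORMATTED_TIMESTAMP", "dd.MM.yyyy HH:mm"),
        column("End", "FORMATTED_TIMESTAMP", "dd.MM.yyyy HH:mm"),
        column("Product", "STRING"),
        column("Customer", "STRING"),
        column("Country", "STRING"),
        column("Delivery type", "STRING"),
    ],
}]

payload = json.dumps(tables, indent=2)
```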

3.2 Workflow to send data to Process Mining

Create a workflow and use the ARIS Process Mining connector.
Add an account for ARIS Process Mining.
The ARIS Process Mining Data Ingest APIs need to be called in a particular order to ingest data. Implement the order as shown below.


Create a webhook to accept a JSON array as input.

The input should be the process data to be submitted to Process Mining.
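As an illustration, the webhook body could look like the following. The column names follow the source table created earlier; the event values themselves are invented for demonstration.

```python
import json

# Hypothetical process events; only the field names come from the
# table definition, the values are made up for illustration.
events = [
    {
        "Case_ID": "C-1001",
        "Activity": "Parcel registered",
        "Start": "01.03.2024 09:15",
        "End": "01.03.2024 09:20",
        "Product": "Standard parcel",
        "Customer": "Sample customer",
        "Country": "DE",
        "Delivery type": "Express",
    },
]

webhook_body = json.dumps(events)
```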


3.3 Streamsets Data Pipeline

Streamsets supports HTTP Client as a destination, so REST API calls can be implemented with it. It is highly configurable, which makes it easy to implement most REST API calls.

However, the ARIS Process Mining Data Ingest API is a complex set of API calls. Implementing such a workflow is not a good use case for Streamsets, as Streamsets is meant for data processing, not for building functions and app integrations.

To complete the use case, call the workflow from Streamsets via the webhook's REST endpoint, using the HTTP Client destination.

Set the Data Format to JSON array of objects.

Below is a simple data pipeline in Streamsets.

Data is sourced from Kafka, using the Kafka Multitopic Consumer origin.

The destination is HTTP Client, calling the REST API.
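The transform step between the Kafka origin and the HTTP Client destination can be sketched outside Streamsets as a plain mapping function. The input field names (caseId, activity, and so on) are assumptions about what the audit messages in Kafka might contain, not part of any product API.

```python
import json

def audit_to_event(raw_message):
    """Map one hypothetical Kafka audit message to the event layout
    expected by the webhook. All input field names are assumptions."""
    msg = json.loads(raw_message)
    return {
        "Case_ID": msg["caseId"],
        "Activity": msg["activity"],
        "Start": msg["startTime"],
        "End": msg["endTime"],
        # Optional attributes default to empty strings if absent.
        "Product": msg.get("product", ""),
        "Customer": msg.get("customer", ""),
        "Country": msg.get("country", ""),
        "Delivery type": msg.get("deliveryType", ""),
    }
```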




4 Results

When data is uploaded, ARIS Process Mining starts processing it. The status can be seen on the overview page of the Data Collection.

4.1 Streamsets



4.2 Integration transactions


4.3 ARIS Process Mining Data Collection Overview

Displays the current status: 'Processing data' while the uploaded data is being processed; once complete, the status changes to 'Data loaded'.




5 Next steps

Use a real world business process from a customer project to implement this solution.
