Apache SeaTunnel is a next-generation, high-performance, distributed data integration and synchronization tool that has been widely recognized and adopted in the industry. SeaTunnel supports three deployment modes: Local Mode, Hybrid Cluster Mode, and Separated Cluster Mode.
This article aims to introduce the deployment of SeaTunnel in Separated Cluster Mode on Kubernetes, providing a comprehensive deployment process and configuration examples for those with relevant needs.
1. Preparation
Before starting deployment, the following environments and components must be ready:
- Kubernetes cluster environment
- kubectl command-line tool
- docker
- helm (optional)
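Before moving on, it is worth a quick check that the tools are in place and the cluster is reachable (a minimal sanity check; exact versions are not critical):
$ kubectl version
$ kubectl get nodes
$ docker --version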
If you are familiar with Helm, you can refer directly to the official Helm deployment tutorial.
This article mainly covers deployment based on Kubernetes and the kubectl tool.
2. Build SeaTunnel Docker Image
The official images of various versions are already provided and can be pulled directly. For details, please refer to the official documentation: Set Up With Docker.
docker pull apache/seatunnel:<version_tag>
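If your Kubernetes nodes pull images from a private registry, you can re-tag the official image and push it there. The sketch below assumes version 2.3.10 and a placeholder registry address (registry.example.com); substitute your own:
docker pull apache/seatunnel:2.3.10
docker tag apache/seatunnel:2.3.10 registry.example.com/bigdata/seatunnel:2.3.10
docker push registry.example.com/bigdata/seatunnel:2.3.10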
Since we need to deploy cluster mode, the next step is to configure cluster network communication. The network service of the SeaTunnel cluster is implemented via Hazelcast, so we will configure this part next.
3. Configure Cluster Networking
Headless Service Configuration
A Hazelcast cluster is a network of cluster members running Hazelcast that automatically join together to form a cluster. This automatic joining is achieved through the discovery mechanisms that cluster members use to find each other.
Hazelcast supports the following discovery mechanisms:
- Auto Discovery, supporting environments like:
  - AWS
  - Azure
  - GCP
  - Kubernetes
- TCP
- Multicast
- Eureka
- Zookeeper
For this article's cluster deployment, we configure Hazelcast using the Kubernetes auto-discovery mechanism. Detailed principles can be found in the official document: Kubernetes Auto Discovery.
Hazelcast’s Kubernetes auto discovery mechanism (DNS Lookup mode) requires Kubernetes Headless Service to work. Headless Service resolves the service domain name into a list of IP addresses of all matching Pods, enabling Hazelcast cluster members to discover each other.
First, we create a Kubernetes Headless Service:
# use for hazelcast cluster join
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
  ports:
  - port: 5801
    name: hazelcast
Key parts of the above configuration:
- metadata.name: seatunnel-cluster: the service name; Hazelcast nodes and clients discover cluster members through this name.
- spec.clusterIP: None: the critical setting that declares this a Headless Service with no virtual IP.
- spec.selector: the label selector that determines which Pods this Service matches.
- spec.ports: the port exposed for Hazelcast (5801).
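To confirm that the Headless Service really resolves to the individual Pod IPs rather than a single virtual IP, you can run a one-off DNS lookup inside the cluster once the Pods are up (a quick check; the bigdata namespace matches the service DNS used later in this article, so adjust it if yours differs):
kubectl -n bigdata run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup seatunnel-cluster.bigdata.svc.cluster.local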
Meanwhile, to access the cluster externally via REST API, we define another Service for the master node Pod:
# use for access seatunnel from outside system via rest api
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster-master
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
    app.kubernetes.io/name: seatunnel-cluster-master
    app.kubernetes.io/component: master
  ports:
  - port: 8080
    name: "master-port"
    targetPort: 8080
    protocol: TCP
After defining the above Kubernetes Services, the next step is to configure the hazelcast-master.yaml and hazelcast-worker.yaml files according to Hazelcast's Kubernetes discovery mechanism.
Hazelcast Master and Worker YAML Configurations
In SeaTunnel's separated cluster mode, all network-related configuration is contained in hazelcast-master.yaml and hazelcast-worker.yaml.
An example hazelcast-master.yaml:
hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
Key configuration items:
- cluster-name: identifies whether nodes belong to the same cluster; only nodes with the same cluster-name join the same Hazelcast cluster, and nodes with different cluster-names reject each other's requests.
- Network configuration:
  - rest-api.enabled: the Hazelcast REST service is disabled by default in SeaTunnel 2.3.10 and must be explicitly enabled here.
  - service-dns (required): the full domain name of the Headless Service, generally ${SERVICE-NAME}.${NAMESPACE}.svc.cluster.local.
  - service-port (optional): the Hazelcast port; if specified and greater than 0, it overrides the default port (5701).
With this Kubernetes join mechanism, when a Hazelcast Pod starts, it resolves service-dns to obtain the IP list of all member Pods (via the Headless Service), and the members then attempt TCP connections to each other on port 5801.
Similarly, the hazelcast-worker.yaml configuration is:
hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
  member-attributes:
    rule:
      type: string
      value: worker
With the above, the Kubernetes-based Hazelcast cluster member discovery configuration is complete. Next, we configure the SeaTunnel engine.
4. Configure SeaTunnel Engine
All configuration related to the SeaTunnel engine lives in the seatunnel.yaml file. Below is a sample seatunnel.yaml configuration for reference:
seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    classloader-cache-mode: true
    http:
      enable-http: true
      port: 8080
      enable-dynamic-port: false
      port-range: 100
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 300000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: hdfs://xxx:8020 # Ensure directory has write permission
    telemetry:
      metric:
        enabled: true
This includes the following configuration items:
- history-job-expire-minutes: 1440: job history records are retained for 24 hours (1440 minutes) and are cleaned up automatically afterwards.
- backup-count: 1: the number of backup replicas kept for job state.
- queue-type: blockingqueue: use a blocking queue to manage jobs and avoid resource exhaustion.
- print-execution-info-interval: 60: print job execution status every 60 seconds.
- print-job-metrics-info-interval: 60: output job metrics (such as throughput and latency) every 60 seconds.
- classloader-cache-mode: true: enable classloader caching to reduce repeated class-loading overhead and improve performance.
- dynamic-slot: true: allow the number of task slots to be adjusted dynamically based on load to optimize resource utilization.
- checkpoint.interval: 300000: trigger a checkpoint every 5 minutes.
- checkpoint.timeout: 60000: checkpoint timeout of 1 minute.
- telemetry.metric.enabled: true: enable collection of job runtime metrics (e.g., latency, throughput) for monitoring.
5. Create Kubernetes YAML Files to Deploy the Application
After completing the above workflow, the final step is to create Kubernetes YAML files for Master and Worker nodes, defining deployment-related configurations.
To decouple configuration files from the application, the above-mentioned configuration files are merged into one ConfigMap, mounted under the container’s configuration path for unified management and easier updates.
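One straightforward way to build that ConfigMap is to create it directly from the configuration files prepared above. This is only a sketch: it assumes the files sit in a local config/ directory and that the bigdata namespace is used, as elsewhere in this article:
kubectl -n bigdata create configmap seatunnel-cluster-configs \
  --from-file=config/hazelcast-master.yaml \
  --from-file=config/hazelcast-worker.yaml \
  --from-file=config/seatunnel.yaml \
  --from-file=config/hazelcast-client.yaml \
  --from-file=config/log4j2.properties \
  --from-file=config/log4j2_client.properties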
Below are sample configurations for seatunnel-cluster-master.yaml and seatunnel-cluster-worker.yaml, covering ConfigMap mounting, container startup commands, and Deployment resource definitions.
seatunnel-cluster-master.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-master
spec:
  replicas: 2  # modify replicas according to your scenario
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-master
      app.kubernetes.io/component: master
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-master"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-master
        app.kubernetes.io/component: master
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeAffinity-key
                operator: Exists
      containers:
      - name: seatunnel-master
        image: seatunnel:2.3.10
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5801
          name: hazelcast
        - containerPort: 8080
          name: "master-port"
        command:
        - /opt/seatunnel/bin/seatunnel-cluster.sh
        - -r
        - master
        resources:
          requests:
            cpu: "1"
            memory: 4G
        volumeMounts:
        - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
          name: seatunnel-configs
          subPath: hazelcast-master.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
          name: seatunnel-configs
          subPath: hazelcast-worker.yaml
        - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
          name: seatunnel-configs
          subPath: seatunnel.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
          name: seatunnel-configs
          subPath: hazelcast-client.yaml
        - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
          name: seatunnel-configs
          subPath: log4j2_client.properties
        - mountPath: "/opt/seatunnel/config/log4j2.properties"
          name: seatunnel-configs
          subPath: log4j2.properties
      volumes:
      - name: seatunnel-configs
        configMap:
          name: seatunnel-cluster-configs
Deployment Strategy
- Use multiple replicas (replicas: 2) to ensure service high availability.
- Use a rolling update strategy for zero-downtime deployment:
  - maxUnavailable: 25%: ensures at least 75% of Pods keep running during updates.
  - maxSurge: 50%: temporarily allows up to 50% extra Pods during the transition for a smooth upgrade.
Label Selectors
- Use the Kubernetes-recommended standard label set.
- spec.selector.matchLabels: defines, based on labels, which Pods the Deployment manages.
- spec.template.metadata.labels: labels assigned to newly created Pods so that the selector matches them.
Node Affinity
- Configure affinity to control which nodes the Pods are scheduled on.
- Replace nodeAffinity-key with a label that actually exists on your Kubernetes nodes (see the labeling example after this list).
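Since the affinity rule above only requires that the key exists, labeling the target nodes is enough. A sketch, where node-1 and node-2 are placeholders for your own node names:
kubectl label nodes node-1 node-2 nodeAffinity-key=true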
Config File Mounting
- Centralize the core configuration files in a ConfigMap to decouple configuration management from the application.
- Use subPath to mount individual files from the ConfigMap.
The seatunnel-cluster-worker.yaml configuration is:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-worker
spec:
  replicas: 3  # modify replicas according to your scenario
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-worker
      app.kubernetes.io/component: worker
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-worker"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-worker
        app.kubernetes.io/component: worker
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeAffinity-key
                operator: Exists
      containers:
      - name: seatunnel-worker
        image: seatunnel:2.3.10
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5801
          name: hazelcast
        command:
        - /opt/seatunnel/bin/seatunnel-cluster.sh
        - -r
        - worker
        resources:
          requests:
            cpu: "1"
            memory: 10G
        volumeMounts:
        - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
          name: seatunnel-configs
          subPath: hazelcast-master.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
          name: seatunnel-configs
          subPath: hazelcast-worker.yaml
        - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
          name: seatunnel-configs
          subPath: seatunnel.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
          name: seatunnel-configs
          subPath: hazelcast-client.yaml
        - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
          name: seatunnel-configs
          subPath: log4j2_client.properties
        - mountPath: "/opt/seatunnel/config/log4j2.properties"
          name: seatunnel-configs
          subPath: log4j2.properties
      volumes:
      - name: seatunnel-configs
        configMap:
          name: seatunnel-cluster-configs
After defining the above master and worker YAML files, you can deploy them to the Kubernetes cluster by running:
kubectl apply -f seatunnel-cluster-master.yaml
kubectl apply -f seatunnel-cluster-worker.yaml
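The two Services and the ConfigMap defined earlier must also exist in the cluster before the Pods can discover each other. Assuming the Service manifests were saved to the files below (the file names are placeholders for whatever you chose), apply them as well:
kubectl apply -f seatunnel-cluster-service.yaml
kubectl apply -f seatunnel-cluster-master-service.yaml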
Under normal circumstances, you will see the SeaTunnel cluster running with 2 master nodes and 3 worker nodes:
$ kubectl get pods | grep seatunnel-cluster
seatunnel-cluster-master-6989898f66-6fjz8 1/1 Running 0 156m
seatunnel-cluster-master-6989898f66-hbtdn 1/1 Running 0 155m
seatunnel-cluster-worker-87fb469f7-5c96x 1/1 Running 0 156m
seatunnel-cluster-worker-87fb469f7-7kt2h 1/1 Running 0 155m
seatunnel-cluster-worker-87fb469f7-drm9r 1/1 Running 0 156m
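To double-check that all five Pods joined a single Hazelcast cluster, you can look for Hazelcast's membership summary in a master Pod's log (a quick sanity check; the exact wording depends on the Hazelcast version bundled with SeaTunnel):
$ kubectl logs deploy/seatunnel-cluster-master | grep -A 7 "Members {"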
At this point, we have successfully deployed the SeaTunnel cluster in Kubernetes using the separated cluster mode. Now that the cluster is ready, how do clients submit jobs to it?
6. Client Submits Jobs to the Cluster
Submit Jobs Using the Command-Line Tool
All client-side configuration for SeaTunnel is located in the hazelcast-client.yaml file.
First, download the binary installation package to the client machine (it contains the bin and config directories), and make sure the SeaTunnel installation path is consistent with the server's. This is what the official documentation means by setting SEATUNNEL_HOME the same as the server; otherwise, errors such as "cannot find connector plugin path on the server" may occur because the server-side plugin path differs from the client-side path.
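As a sketch of the client-side installation (the download URL follows the Apache archive naming convention for 2.3.10; adjust the version and install path to your environment, keeping the path consistent with the server's /opt/seatunnel):
wget https://archive.apache.org/dist/seatunnel/2.3.10/apache-seatunnel-2.3.10-bin.tar.gz
tar -xzvf apache-seatunnel-2.3.10-bin.tar.gz -C /opt
mv /opt/apache-seatunnel-2.3.10 /opt/seatunnel
export SEATUNNEL_HOME=/opt/seatunnel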
Enter the installation directory and modify the config/hazelcast-client.yaml file to point to the Headless Service address created earlier:
hazelcast-client:
  cluster-name: seatunnel-cluster
  properties:
    hazelcast.logging.type: log4j2
  connection-strategy:
    connection-retry:
      cluster-connect-timeout-millis: 3000
  network:
    cluster-members:
      - seatunnel-cluster.bigdata.svc.cluster.local:5801
After the client configuration is done, you can submit jobs to the cluster. There are two main ways to configure JVM options for job submission (a sketch of the first option follows this list):
- Configure JVM options in the config/jvm_client_options file: options configured here apply to all jobs submitted via seatunnel.sh, whether they run in local or cluster mode, so all submitted jobs share the same JVM configuration.
- Specify JVM options on the command line when submitting a job: when submitting via seatunnel.sh, you can pass JVM parameters directly, e.g., sh bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template -DJvmOption="-Xms2G -Xmx2G". This allows specifying JVM options individually for each job submission.
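As a hedged sketch, config/jvm_client_options is simply a list of JVM flags, one per line; the values below are illustrative, not recommendations:
# JVM heap settings for the SeaTunnel client (illustrative values)
-Xms1g
-Xmx1g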
Next, here is a sample job configuration to demonstrate submitting a job to the cluster:
env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  FakeSource {
    parallelism = 2
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
  }
}
Use the following command on the client to submit the job:
sh bin/seatunnel.sh --config config/v2.streaming.example.template -m cluster -n st.example.template -DJvmOption="-Xms2G -Xmx2G"
On the Master node, list running jobs with:
$ sh bin/seatunnel.sh -l
Job ID Job Name Job Status Submit Time Finished Time
------------------ ------------------- ---------- ----------------------- -----------------------
964354250769432580 st.example.template RUNNING 2025-04-15 10:39:30.588
You can see the job named st.example.template is currently in the RUNNING state. In the Worker node logs, you should observe log entries like:
2025-04-15 10:34:41,998 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=1: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bdaUB, 110348049
2025-04-15 10:34:41,998 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1 rowIndex=1: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : mOifY, 1974539087
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=2: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : jKFrR, 1828047742
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1 rowIndex=2: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : gDiqR, 1177544796
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=3: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bCVxc, 909343602
...
This confirms the job has been successfully submitted to the SeaTunnel cluster and is running normally.
Submit Jobs Using the REST API
SeaTunnel also provides a REST API for querying job status, statistics, submitting, and stopping jobs. We configured a Headless Service for Master nodes with port 8080 exposed. This allows submitting jobs via REST API from clients.
You can submit a job by uploading the configuration file via curl:
curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/submit-job/upload' --form 'config_file=@"/opt/seatunnel/config/v2.streaming.example.template"' --form 'jobName=st.example.template'
{"jobId":"964553575034257409","jobName":"st.example.template"}
If submission succeeds, the API returns the job ID and job name as above.
To list running jobs, query:
curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/running-jobs'
[{"jobId":"964553575034257409","jobName":"st.example.template","jobStatus":"RUNNING","envOptions":{"job.mode":"STREAMING","checkpoint.interval":"2000","parallelism":"2"}, ...}]
The response shows the job status and additional metadata, confirming the REST API job submission method works correctly.
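The REST API can also stop a running job. The sketch below assumes the stop-job endpoint of the V2 API; check the documentation linked below for the exact request body expected by your version:
curl -X POST 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/stop-job' \
  -H 'Content-Type: application/json' \
  -d '{"jobId": 964553575034257409, "isStopWithSavePoint": false}'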
More details on the REST API can be found in the official documentation: RESTful API V2
7. Summary
This article focused on how to deploy SeaTunnel in Kubernetes using the recommended separated cluster mode. To summarize, the main deployment steps include:
- Prepare the Kubernetes environment: Ensure a running Kubernetes cluster and necessary tools are installed.
- Build SeaTunnel Docker images: Use the official image if no custom development is needed; otherwise, build locally and create your own image.
- Configure the Headless Service and Hazelcast cluster: Hazelcast's Kubernetes auto-discovery (DNS Lookup mode) requires a Kubernetes Headless Service, so create one and configure Hazelcast with its service DNS. The Headless Service resolves to all Pods' IPs, enabling Hazelcast cluster member discovery.
- Configure the SeaTunnel engine: modify seatunnel.yaml to set engine parameters.
- Create Kubernetes deployment YAML files: define Master and Worker Deployments with node affinity, startup commands, resources, and volume mounts, then deploy them to Kubernetes.
- Configure the SeaTunnel client: install SeaTunnel on the client, ensure SEATUNNEL_HOME matches the server, and configure hazelcast-client.yaml to connect to the cluster.
- Submit and run jobs: submit jobs from the client to the SeaTunnel cluster for execution.
The configurations and examples presented here are intended as a reference; there are many other configuration options and details not covered. Feedback and discussion are welcome. I hope this is helpful!