Module: Kafka
Kafka is an open-source distributed event streaming platform.
Installation | Configuration | Administration | Playbook | Monitoring | Parameters | Resources
Overview
The Kafka module is currently available in Pigsty Pro as a Beta Preview.
Installation
Pigsty provides Kafka 3.8.0 RPM and DEB packages in the official Infra repository. If you are using the open-source version of Pigsty, you can download and install Kafka and its Java dependencies on a specified node with the following command:
./node.yml -t node_install -e '{"node_repo_modules":"infra","node_packages":["kafka"]}'
Kafka requires a Java runtime environment, so you also need to install an available JDK alongside it. OpenJDK 17 is used by default, but other JDKs and versions, such as 8 and 11, also work:
# EL7 (no JDK 17 support)
./node.yml -t node_install -e '{"node_repo_modules":"node","node_packages":["java-11-openjdk-headless"]}'
# EL8 / EL9 (use OpenJDK 17)
./node.yml -t node_install -e '{"node_repo_modules":"node","node_packages":["java-17-openjdk-headless"]}'
# Debian / Ubuntu (use OpenJDK 17)
./node.yml -t node_install -e '{"node_repo_modules":"node","node_packages":["openjdk-17-jdk"]}'
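After installation, you can sanity-check that a usable JDK is on the PATH of the target node (run this on the node where Kafka will be installed):
java -version   # should report an OpenJDK 17 (or 8 / 11) runtime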
Configuration
Single-node Kafka configuration example. Note that in Pigsty's single-node deployment mode, port 9093 on the admin node is already occupied by AlertManager, so it is recommended to use another port, such as 9095, when installing Kafka on the admin node:
kf-main:
  hosts:
    10.10.10.10: { kafka_seq: 1, kafka_role: controller }
  vars:
    kafka_cluster: kf-main
    kafka_data: /data/kafka
    kafka_peer_port: 9095   # 9093 is already held by alertmanager
Three-node KRaft mode Kafka cluster configuration example:
kf-test:
  hosts:
    10.10.10.11: { kafka_seq: 1, kafka_role: controller }
    10.10.10.12: { kafka_seq: 2, kafka_role: controller }
    10.10.10.13: { kafka_seq: 3, kafka_role: controller }
  vars:
    kafka_cluster: kf-test
Administration
Here are some basic Kafka cluster management operations:
Create Kafka clusters with kafka.yml playbook:
./kafka.yml -l kf-main
./kafka.yml -l kf-test
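Once the playbook finishes, you can check the health of the KRaft quorum from any cluster node. kafka-metadata-quorum.sh ships with Kafka 3.3 and later; localhost:9092 assumes the default kafka_port:
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status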
Create a topic named test:
kafka-topics.sh --create --topic test --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092
Here --replication-factor 1 means each message is kept in a single copy (i.e., no replication), and --partitions 1 means only one partition will be created.
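On the three-node kf-test cluster, you could instead spread data across multiple partitions and keep redundant copies; the topic name test3 below is made up for illustration:
kafka-topics.sh --create --topic test3 --partitions 3 --replication-factor 3 --bootstrap-server localhost:9092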
Use the following command to view the list of Topics in Kafka:
kafka-topics.sh --bootstrap-server localhost:9092 --list
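To inspect the partition count, leader, and replica placement of a specific topic, use --describe:
kafka-topics.sh --describe --topic test --bootstrap-server localhost:9092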
Use the built-in Kafka producer to send messages to the test Topic:
kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
>haha
>xixi
>hoho
>hello
>world
> ^D
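You can also produce messages non-interactively by piping stdin into the producer, which is handy in scripts:
echo 'hello from a script' | kafka-console-producer.sh --topic test --bootstrap-server localhost:9092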
Use the built-in Kafka consumer to read messages from the test Topic:
kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
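When consuming as part of a consumer group, offsets are tracked on the broker and can be inspected with kafka-consumer-groups.sh; the group name mygroup below is just an example:
kafka-console-consumer.sh --topic test --group mygroup --bootstrap-server localhost:9092
kafka-consumer-groups.sh --describe --group mygroup --bootstrap-server localhost:9092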
Playbook
Pigsty provides one playbook related to the Kafka module for managing Kafka clusters.
kafka.yml
The kafka.yml playbook for deploying a Kafka KRaft mode cluster contains the following subtasks:
kafka-id : generate kafka instance identity
kafka_clean : remove existing kafka instance (DANGEROUS)
kafka_user : create os user kafka
kafka_pkg : install kafka rpm/deb packages
kafka_link : create symlink to /usr/kafka
kafka_path : add kafka bin path to /etc/profile.d
kafka_svc : install kafka systemd service
kafka_dir : create kafka data & conf dir
kafka_config : generate kafka config file
kafka_boot : bootstrap kafka cluster
kafka_launch : launch kafka service
kafka_exporter : launch kafka exporter
kafka_register : register kafka service to prometheus
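Since kafka.yml is a regular Ansible playbook, individual subtasks can be run selectively with the -t flag. As a sketch, assuming the task names above are also usable as tags, the following would regenerate the config and restart the service on one cluster:
./kafka.yml -l kf-test -t kafka_config,kafka_launch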
Monitoring
Pigsty provides two monitoring dashboards related to the KAFKA module:
KAFKA Overview shows the overall monitoring metrics of the Kafka cluster.
KAFKA Instance shows the monitoring metrics details of a single Kafka instance.
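Both dashboards are built on metrics scraped from kafka_exporter. To confirm that metrics are being exposed, you can query the exporter endpoint directly (9308 is the default kafka_exporter_port, and kafka_brokers is a standard kafka_exporter metric):
curl -s localhost:9308/metrics | grep kafka_brokers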
Parameters
Available parameters for Kafka module:
#kafka_cluster: #CLUSTER # kafka cluster name, required identity parameter
#kafka_role: controller #INSTANCE # kafka role, controller, broker, or controller-only
#kafka_seq: 0 #INSTANCE # kafka instance seq number, required identity parameter
kafka_clean: false # cleanup kafka during init? false by default
kafka_data: /data/kafka # kafka data directory, `/data/kafka` by default
kafka_version: 3.8.0 # kafka version string
scala_version: 2.13 # kafka binary scala version
kafka_port: 9092 # kafka broker listen port
kafka_peer_port: 9093 # kafka broker peer listen port, 9093 by default (conflict with alertmanager)
kafka_exporter_port: 9308 # kafka exporter listen port, 9308 by default
kafka_parameters: # kafka parameters to be added to server.properties
num.network.threads: 3
num.io.threads: 8
socket.send.buffer.bytes: 102400
socket.receive.buffer.bytes: 102400
socket.request.max.bytes: 104857600
num.partitions: 1
num.recovery.threads.per.data.dir: 1
offsets.topic.replication.factor: 1
transaction.state.log.replication.factor: 1
transaction.state.log.min.isr: 1
log.retention.hours: 168
log.segment.bytes: 1073741824
log.retention.check.interval.ms: 300000
#log.retention.bytes: 1073741824
#log.flush.interval.ms: 1000
#log.flush.interval.messages: 10000
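Anything placed under kafka_parameters ends up in the rendered server.properties. A quick way to confirm the effective values on a node is to grep the rendered file; the path below is an assumption and may differ in your installation:
grep -E 'num.partitions|log.retention.hours' /etc/kafka/server.properties   # config path is an assumption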
Resources
Pigsty provides some Kafka-related extension plugins for PostgreSQL:
kafka_fdw: A useful FDW that allows users to read and write Kafka Topic data directly from PostgreSQL
wal2json: Used to logically decode WAL from PostgreSQL and generate JSON-formatted change data
wal2mongo: Used to logically decode WAL from PostgreSQL and generate BSON-formatted change data
decoder_raw: Used to logically decode WAL from PostgreSQL and generate SQL-formatted change data
test_decoding: Used to logically decode WAL from PostgreSQL and generate RAW-formatted change data