This article follows the official documentation, adapted to the actual hosts, to replicate TiDB incremental data to Kafka through TiCDC on CentOS 7.
For installing TiDB and Kafka, refer to the links below.
Deploying TiDB with TiUP on CentOS 7
Installing ZooKeeper and Kafka on CentOS 7
On the control machine, write the TiCDC scale-out topology file cdc.yaml:
global:
  user: "tidb"
  ssh_port: 2333
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

cdc_servers:
  - host: 192.168.58.10
    gc-ttl: 86400
    data_dir: /data/deploy/install/data/cdc-8300
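Before applying the scale-out, you can optionally run a risk check on the new topology and target host. This is a hedged sketch based on TiUP's scale-out check syntax; the user and authentication options may need adjusting to your environment.
# Optional: check the scale-out topology against the existing cluster
$ tiup cluster check tidb-test cdc.yaml --cluster -u root -p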
Run the scale-out command.
# Install TiCDC
$ tiup cluster scale-out tidb-test cdc.yaml -u root -p
# Check the status of the cluster components
$ tiup cluster display tidb-test
tiup is checking updates for component cluster ...
Starting component `cluster`: /root/.tiup/components/cluster/v1.11.3/tiup-cluster display tidb-test
Cluster type: tidb
Cluster name: tidb-test
Cluster version: v6.5.0
Deploy user: root
SSH type: builtin
Dashboard URL: http://192.168.58.10:2379/dashboard
Grafana URL: http://192.168.58.10:3000
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
192.168.58.10:8300 cdc 192.168.58.10 8300 linux/x86_64 Up /data/deploy/install/data/cdc-8300 /tidb-deploy/cdc-8300
192.168.58.10:3000 grafana 192.168.58.10 3000 linux/x86_64 Up - /tidb-deploy/grafana-3000
192.168.58.10:2379 pd 192.168.58.10 2379/2380 linux/x86_64 Up|L|UI /tidb-data/pd-2379 /tidb-deploy/pd-2379
192.168.58.10:9090 prometheus 192.168.58.10 9090/12020 linux/x86_64 Up /tidb-data/prometheus-9090 /tidb-deploy/prometheus-9090
192.168.58.10:4000 tidb 192.168.58.10 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
192.168.58.10:20160 tikv 192.168.58.10 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
192.168.58.10:20161 tikv 192.168.58.10 20161/20181 linux/x86_64 Up /tidb-data/tikv-20161 /tidb-deploy/tikv-20161
192.168.58.10:20162 tikv 192.168.58.10 20162/20182 linux/x86_64 Up /tidb-data/tikv-20162 /tidb-deploy/tikv-20162
Total nodes: 8
On the host where TiCDC is installed, create the replication channel to Kafka.
$ cd /tidb-deploy/cdc-8300/bin
# Create the replication task (changefeed)
$ ./cdc cli changefeed create --server=http://192.168.58.10:8300 --sink-uri="kafka://192.168.58.10:9092/test-topic?protocol=canal-json&kafka-version=3.4.0&partition-num=1&max-message-bytes=67108864&replication-factor=1" --changefeed-id="simple-replication-task"
Parameter description:
--server: the IP address and port of the TiCDC server.
--changefeed-id: the ID of the replication task. It must match the regular expression ^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$. If the ID is not specified, TiCDC automatically generates a UUID (version 4) as the ID.
--sink-uri: the downstream address of the replication task, with the following components:
192.168.58.10:9092: the address and port of the Kafka service;
test-topic: the name of the Kafka topic;
protocol: the message protocol output to Kafka; possible values are canal-json, open-protocol, canal, avro, and maxwell;
partition-num: the number of Kafka partitions in the downstream (optional; must not be greater than the actual number of partitions, otherwise creating the replication task fails; default 3);
max-message-bytes: the maximum amount of data sent to the Kafka broker per batch (optional, default 10MB); increasing it is recommended;
replication-factor: the number of replicas kept for Kafka messages (optional, default 1).
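After creating the changefeed, you can confirm that it is running. A minimal sketch using the same cdc binary, pointed at the TiCDC server above:
# List all changefeeds and check the state of simple-replication-task
$ ./cdc cli changefeed list --server=http://192.168.58.10:8300
# Query the details of a specific changefeed
$ ./cdc cli changefeed query --server=http://192.168.58.10:8300 --changefeed-id=simple-replication-task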
{"id":0,"database":"test","table":"student","pkNames":["s_id"],"isDdl":false,"type":"INSERT","es":1678934551053,"ts":1678934551348,"sql":"","sqlType":{"s_id":4,"s_name":12},"mysqlType":{"s_id":"int","s_name":"varchar"},"old":null,"data":[{"s_id":"2","s_name":"Jack"}]}
database: the database name;
table: the table name;
pkNames: the array of primary key column names;
isDdl: whether the event is a schema (DDL) change;
type: the change type; when an UPDATE modifies a row's primary key, the event is split into one DELETE and one INSERT;
sqlType: the data type codes of the columns in TiDB (matching java.sql.Types, e.g. 4 = INTEGER, 12 = VARCHAR);
mysqlType: the corresponding MySQL data types;
old: the previous values; null when type is DELETE or INSERT;
data: the latest row data.
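Before starting a consumer, you can check that the topic exists and that its partition count matches partition-num. A sketch using Kafka's bundled CLI, run from the Kafka bin directory as with the consumer below:
# Describe the topic to confirm its partitions and replication factor
$ ./kafka-topics.sh --describe --topic test-topic --bootstrap-server 192.168.58.10:9092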
# Start a console consumer on the Kafka host
$ ./kafka-console-consumer.sh --bootstrap-server 192.168.58.10:9092 --topic test-topic --from-beginning
# Now make a data change in TiDB, for example insert a row; the consumer console then prints the following:
{"id":0,"database":"test","table":"student","pkNames":["s_id"],"isDdl":false,"type":"INSERT","es":1678934551053,"ts":1678934551348,"sql":"","sqlType":{"s_id":4,"s_name":12},"mysqlType":{"s_id":"int","s_name":"varchar"},"old":null,"data":[{"s_id":"2","s_name":"Jack"}]}
The canal-json messages can also be consumed programmatically, for example with the following Java consumer based on kafka-clients and fastjson, which parses each change event and detects primary key changes:
package kafka;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.Date;
import java.util.Properties;

public class KafkaDemo {
    public static void main(String[] args) throws Exception {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "192.168.58.10:9092");
        prop.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.put("group.id", "con-1");
        prop.put("auto.offset.reset", "latest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(prop);
        ArrayList<String> topics = new ArrayList<>();
        topics.add("test-topic");
        consumer.subscribe(topics);

        while (true) {
            ConsumerRecords<String, String> poll = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : poll) {
                System.out.println("Change time: " + new Date(record.timestamp()));
                System.out.println("Raw message: " + record.value());

                JSONObject object = JSONObject.parseObject(record.value());
                // Skip schema (DDL) changes
                if (object.getBoolean("isDdl")) {
                    continue;
                }
                System.out.println("Change type: " + object.get("type"));
                System.out.println("Database: " + object.get("database"));
                System.out.println("Table: " + object.get("table"));

                JSONArray pkArray = object.getJSONArray("pkNames");
                String pkName = pkArray.get(0).toString();
                System.out.println("Primary key column: " + pkName);

                JSONArray oldArray = object.getJSONArray("old");
                JSONArray dataArray = object.getJSONArray("data");
                String newPkValue = dataArray.getJSONObject(0).getString(pkName);
                System.out.println("Primary key value: " + newPkValue);

                // "old" is non-null only for UPDATE events; compare old and new primary key values
                if (oldArray != null) {
                    JSONObject old = oldArray.getJSONObject(0);
                    String oldPkValue = old.getString(pkName);
                    if (!newPkValue.equals(oldPkValue)) {
                        System.out.println("Primary key changed, old value: " + oldPkValue);
                    }
                }
            }
        }
    }
}
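A note on dependencies (an assumption about the build setup, not part of the original steps): the example needs org.apache.kafka:kafka-clients and com.alibaba:fastjson on the classpath; choose a kafka-clients release compatible with the Kafka 3.4.0 broker used above.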