前言
随着业务增长,单表数据量越来越大,查询越来越慢
所以决定用ShardingSphere做分库分表,记录一下整合过程
其实分库分表是最后的手段,能优化SQL就优化SQL,实在不行再考虑分库分表
环境准备
依赖
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
| <dependency> <groupId>org.apache.shardingsphere</groupId> <artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId> <version>5.3.2</version> </dependency>
<dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-data-jpa</artifactId> </dependency>
<dependency> <groupId>mysql</groupId> <artifactId>mysql-connector-java</artifactId> </dependency>
|
分表策略
假设我们有一个订单表,数据量很大,决定按用户ID分表
数据库准备
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
| CREATE DATABASE db0; CREATE DATABASE db1;
CREATE TABLE db0.t_order0 ( id BIGINT PRIMARY KEY, user_id BIGINT NOT NULL, order_no VARCHAR(50), amount DECIMAL(10, 2), create_time DATETIME );
CREATE TABLE db0.t_order1 ( id BIGINT PRIMARY KEY, user_id BIGINT NOT NULL, order_no VARCHAR(50), amount DECIMAL(10, 2), create_time DATETIME );
CREATE TABLE db1.t_order0 ( id BIGINT PRIMARY KEY, user_id BIGINT NOT NULL, order_no VARCHAR(50), amount DECIMAL(10, 2), create_time DATETIME );
CREATE TABLE db1.t_order1 ( id BIGINT PRIMARY KEY, user_id BIGINT NOT NULL, order_no VARCHAR(50), amount DECIMAL(10, 2), create_time DATETIME );
|
配置文件
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
| spring: shardingsphere: datasource: names: ds0,ds1 ds0: type: com.zaxxer.hikari.HikariDataSource driver-class-name: com.mysql.cj.jdbc.Driver jdbc-url: jdbc:mysql://localhost:3306/db0 username: root password: root123456 ds1: type: com.zaxxer.hikari.HikariDataSource driver-class-name: com.mysql.cj.jdbc.Driver jdbc-url: jdbc:mysql://localhost:3306/db1 username: root password: root123456
rules: sharding: tables: t_order: actual-data-nodes: ds$->{0..1}.t_order$->{0..1} database-strategy: standard: sharding-column: user_id sharding-algorithm-name: db_mod table-strategy: standard: sharding-column: user_id sharding-algorithm-name: table_mod key-generate-strategy: column: id key-generator-name: snowflake
sharding-algorithms: db_mod: type: MOD props: sharding-count: 2 table_mod: type: MOD props: sharding-count: 2
key-generators: snowflake: type: SNOWFLAKE
props: sql-show: true
|
实体类
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
| @Entity @Table(name = "t_order") @Data public class Order { @Id private Long id;
private Long userId;
private String orderNo;
private BigDecimal amount;
private Date createTime; }
|
Repository
1 2 3
| public interface OrderRepository extends JpaRepository<Order, Long> { List<Order> findByUserId(Long userId); }
|
Service
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
| @Service public class OrderService {
@Autowired private OrderRepository orderRepository;
public Order createOrder(Order order) { order.setId(null); order.setCreateTime(new Date()); return orderRepository.save(order); }
public List<Order> getOrdersByUserId(Long userId) { return orderRepository.findByUserId(userId); }
public Order getOrderById(Long id) { return orderRepository.findById(id).orElse(null); } }
|
测试
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
| @RestController @RequestMapping("/order") public class OrderController {
@Autowired private OrderService orderService;
@PostMapping public Order createOrder(@RequestBody Order order) { return orderService.createOrder(order); }
@GetMapping("/user/{userId}") public List<Order> getOrdersByUserId(@PathVariable Long userId) { return orderService.getOrdersByUserId(userId); }
@GetMapping("/{id}") public Order getOrderById(@PathVariable Long id) { return orderService.getOrderById(id); } }
|
广播表
有些表数据量不大,但是在每个库都要有,比如配置表
1 2 3 4 5 6 7 8 9 10
| spring: shardingsphere: rules: sharding: tables: t_config: actual-data-nodes: ds$->{0..1}.t_config broadcast-tables: - t_config
|
这样往t_config插入数据时,会自动同步到所有库
绑定表
如果两个表的分片键相同,可以配置为绑定表,避免笛卡尔积
1 2 3 4 5 6
| spring: shardingsphere: rules: sharding: binding-tables: - t_order,t_order_item
|
这样查询的时候会优化:
1 2 3
| SELECT * FROM t_order o LEFT JOIN t_order_item i ON o.id = i.order_id WHERE o.user_id = 1
|
读写分离
如果主从配置了读写分离:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
| spring: shardingsphere: datasource: names: master,slave0,slave1 master: type: com.zaxxer.hikari.HikariDataSource driver-class-name: com.mysql.cj.jdbc.Driver jdbc-url: jdbc:mysql://localhost:3306/db username: root password: root123456 slave0: type: com.zaxxer.hikari.HikariDataSource driver-class-name: com.mysql.cj.jdbc.Driver jdbc-url: jdbc:mysql://localhost:3307/db username: root password: root123456 slave1: type: com.zaxxer.hikari.HikariDataSource driver-class-name: com.mysql.cj.jdbc.Driver jdbc-url: jdbc:mysql://localhost:3308/db username: root password: root123456
rules: readwrite-splitting: data-sources: prds: static-strategy: write-data-source-name: master read-data-source-names: slave0,slave1 load-balancers: round-robin: type: ROUND_ROBIN
|
注意事项
1. 分片键选择
选择合适的分片键很重要:
- 查询频率高的字段
- 数据分布均匀的字段
- 避免热点数据
2. 跨库查询
尽量避免跨库查询,如果一定要跨库,可以考虑:
- 在应用层聚合
- 使用全局表(广播表)
- 使用ShardingSphere的复杂查询能力
3. 分页查询
分页查询会查所有分片然后聚合,可以考虑:
4. 分布式事务
ShardingSphere支持分布式事务,但性能会下降:
1 2 3 4
| spring: shardingsphere: props: xa-enabled: true
|
5. SQL限制
有些SQL不支持或者有限制:
- 不支持跨库的关联查询
- 不支持跨库的子查询
- 不支持INSERT时SELECT
常见问题
问题1:数据分布不均
使用MOD算法可能导致数据分布不均,可以考虑:
1 2 3 4 5
| table_mod: type: INLINE props: algorithm-expression: t_order$->{user_id % 2}
|
或者使用STANDARD复杂分片算法
问题2:批量插入性能
批量插入时要注意,每个数据可能都要路由到不同的库:
1 2 3 4 5
| Map<Long, List<Order>> grouped = orders.stream() .collect(Collectors.groupingBy(Order::getUserId));
|
总结
ShardingSphere用起来还是挺方便的,但是分库分表会增加系统复杂度
建议:
- 先优化SQL和索引
- 考虑读写分离
- 最后再考虑分库分表
使用ShardingSphere时要注意:
- 合理选择分片键
- 避免跨库查询
- 注意分布式事务
- 充分测试
暂时就先记录这么多