springboot + shardingSphere5.0 + mybatis + mysql 实现分库分表情况下,查询跨年的数据,返回了同一时间的两条数据,其中一条为0

方便更快捷的说明问题,可以按需填写(可删除)

使用环境:

springboot + shardingSphere5.0 + mybatis + mysql

场景、问题:

实现分库分表情况下,查询跨年的数据,返回了同一时间的两条数据,其中一条为0

已进行操作:

现状:

配置

这种情况需要开启 sql-show,并分析 Logic SQL 和 Actual SQL。

report_site_day_2021 传的参数还是 2021-12-20, 2022-01-18
report_site_day_2022 传的参数还是 2021-12-20, 2022-01-18


这种应该怎么解决

看了你的描述,我不太清楚具体问题。
能不能提供一个 demo 让我复现一下?

dimdate表:
CREATE TABLE dimdate (
DateKey INT(11) NOT NULL,
FullDateAlternateKey DATE NOT NULL,
PRIMARY KEY (DateKey),
KEY DateKey (DateKey) USING BTREE
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT=‘时间明细表’

数据表:
CREATE TABLE report_site_day (
datatime INT(11) NOT NULL,
innum DATE NOT NULL,
PRIMARY KEY (datatime),
KEY datatime (datatime) USING BTREE
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT=‘数据表’

一个查询语句查询的时间条件是2021-12-20到2022-01-18。
report_site_day为数据表 用时间和dimdate关联
sql:
SELECT dim.FullDateAlternateKey,rs.innum FROM dimdate dim LEFT JOIN report_site_day rs ON dim.FullDateAlternateKey = rs.datatime
WHERE dim.FullDateAlternateKey BETWEEN ‘2021-12-20’ AND ‘2022-01-18’

用分库report_site_day_2021 ,report_site_day_2022 分别对主表dimdate进行查询,最后实际返回的数量为原来的两倍

能把相关分表配置贴一下吗?

  shardingsphere:
    datasource:
      ds0:
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: ***
        password: 
        type: com.zaxxer.hikari.HikariDataSource
        username: root
      names: ds0
    #      names: ds2015,ds2016,ds2017,ds2018,ds2019,ds2020,ds2021
    rules:
      sharding:
        tables:
          report_site_day:
            actual-data-nodes: ds0.report_site_day_$->{2015..2022}
            table-strategy:
              standard:
                sharding-column: datatime
                sharding-algorithm-name: my-db-range-sharding-algorithm
        shardingAlgorithms:
          my-db-range-sharding-algorithm:
            type: CLASS_BASED
            props:
              strategy: standard
              algorithmClassName: com.winner.algorithm.MyDBRangeShardingAlgorithm
    props:
      #      sql-federation-enabled: true
      sql-show: true
      # sql-simple: true

dimdate 数据:
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211220’,‘2021-12-20’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211221’,‘2021-12-21’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211222’,‘2021-12-22’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211223’,‘2021-12-23’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211224’,‘2021-12-24’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211225’,‘2021-12-25’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211226’,‘2021-12-26’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211227’,‘2021-12-27’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211228’,‘2021-12-28’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211229’,‘2021-12-29’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211230’,‘2021-12-30’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20211231’,‘2021-12-31’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220101’,‘2022-01-01’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220102’,‘2022-01-02’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220103’,‘2022-01-03’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220104’,‘2022-01-04’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220105’,‘2022-01-05’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220106’,‘2022-01-06’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220107’,‘2022-01-07’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220108’,‘2022-01-08’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220109’,‘2022-01-09’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220110’,‘2022-01-10’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220111’,‘2022-01-11’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220112’,‘2022-01-12’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220113’,‘2022-01-13’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220114’,‘2022-01-14’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220115’,‘2022-01-15’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220116’,‘2022-01-16’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220117’,‘2022-01-17’);
insert into dimdate (DateKey, FullDateAlternateKey) values(‘20220118’,‘2022-01-18’);

package com.winner.algorithm;
import org.apache.shardingsphere.sharding.api.sharding.standard.PreciseShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.RangeShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.StandardShardingAlgorithm;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
public class MyDBRangeShardingAlgorithm implements StandardShardingAlgorithm<Integer> {
    @Override
    public String doSharding(Collection<String> collection, PreciseShardingValue<Integer> preciseShardingValue) {
        /**
         * databaseNames 所有分片库的集合
         * shardingValue 为分片属性,其中 logicTableName 为逻辑表,columnName 分片健(字段),value 为从 SQL 中解析出的分片健的值
         */
        String year = String.valueOf(preciseShardingValue.getValue()).substring(0, 4);
        for (String databaseName : collection) {
            if (databaseName.endsWith(year)) {
                System.out.println("return :" + year);
                return databaseName;
            }
        }
        return "";
    }

    @Override
    public Collection<String> doSharding(Collection<String> collection, RangeShardingValue<Integer> rangeShardingValue) {
        Set<String> result = new LinkedHashSet<>();
//        // between and 的起始值
        String lowerStr = String.valueOf(rangeShardingValue.getValueRange().lowerEndpoint()).substring(0, 4);
        String upperStr = String.valueOf(rangeShardingValue.getValueRange().upperEndpoint()).substring(0, 4);
        int lower = Integer.valueOf(lowerStr);
        int upper = Integer.valueOf(upperStr);
//        // 循环范围计算分库逻辑
        for (int i = lower; i <= upper; i++) {
            for (String databaseName : collection) {
                if (databaseName.endsWith(String.valueOf(i))) {
                    result.add(databaseName);
                }
            }
        }
        return result;
    }

    @Override
    public void init() {
    }

    @Override
    public String getType() {
        return null;
    }
}

我看到你的配置了,report_site_day 这张表是分表,dimdate 这张表是字典表。
执行 sql

SELECT dim.FullDateAlternateKey,rs.innum FROM dimdate dim LEFT JOIN report_site_day rs ON dim.FullDateAlternateKey = rs.datatime
WHERE dim.FullDateAlternateKey BETWEEN '2021-12-20' AND '2022-01-18' 

执行的结果有重复数据?
是这个问题吗?

是的,跨一年就有两倍的数据

看起来你的查询条件中的分片键和配置中的分片键不同。
改成这个试试

SELECT dim.FullDateAlternateKey,rs.innum FROM dimdate dim LEFT JOIN report_site_day rs ON dim.FullDateAlternateKey = rs.datatime WHERE rs.datatime BETWEEN '2021-12-20' AND '2022-01-18' 

这种写法就达不到时间字典表的想要的效果。

@用户1 确实有数据重复的问题。后期我们可能会考虑采用查询优化修复这个问题。
从原理上看,因为字典表只有一张,但是分表有五张,另外没有与分片键相关的查询条件,那么会采用全路由的方式,即将 sql 拆成 5 条 sql 执行,并对结果进行归并,所以会出现重复数据。
可以考虑将分表做为主表。

可以考虑inner join或者加个条件判断 WHERE rs.innum is not null

貌似解决不了吧。

你可以在页面选择俩个年份数据 并保证这俩个年份都不在数据的情况下 是否对获取分页 COUNT 报数组越界

京ICP备2021015875号