Root-cause analysis of abnormally high CPU usage in the SSP process


Environment:

5.5.30-ShardingSphere-Proxy 5.3.0

Scenario and problem:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND  
17282  root      20   0   28.4g   2.6g  23108 S 735.0  0.7 236:27.05 java                                                                                                            
 27352 mysql     20   0   36.5g   5.7g  13176 S  34.0  1.5  37384:04 mysqld 

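CPU usage of the java process occasionally climbs to 4344% or higher.
The backend database load is very low. While CPU is high, both logging in through SSP and running any query show high latency.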
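
To put the percentages in perspective, here is a rough conversion from top's %CPU to busy cores. It assumes top is running in its default per-core (Irix) mode, where 100% means one fully used core; the helper name cores_busy is made up for this sketch.

```python
def cores_busy(top_cpu_percent):
    """Convert a top %CPU reading to an approximate number of busy cores.

    Assumption: top is in its default Irix mode, so 100% equals one core.
    """
    return top_cpu_percent / 100.0


# Readings taken from the report: the sampled value and the occasional peak.
for label, pct in [("sampled", 735.0), ("peak", 4344.0)]:
    print(f"{label}: {pct}% is roughly {cores_busy(pct):.2f} cores")
```

Under that assumption the sampled reading corresponds to a little over 7 cores and the peak to more than 43.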

mysql> select count(*) from xxxx;
+----------+
| count(*) |
+----------+
|     1008 |
+----------+
1 row in set (4.39 sec)

As a result, business requests are now hitting timeout errors more frequently.

Actions taken so far:

After a restart, behavior is normal for a few minutes; then CPU usage gradually rises and latency increases with it.

Current state:

mysql> SHOW DIST VARIABLES;
+---------------------------------------+----------------+
| variable_name                         | variable_value |
+---------------------------------------+----------------+
| sql_show                              | false          |
| sql_simple                            | false          |
| kernel_executor_size                  | 0              |
| max_connections_size_per_query        | 1              |
| check_table_metadata_enabled          | false          |
| sql_federation_type                   | NONE           |
| proxy_frontend_database_protocol_type |                |
| proxy_frontend_flush_threshold        | 128            |
| proxy_hint_enabled                    | false          |
| proxy_backend_query_fetch_size        | -1             |
| proxy_frontend_executor_size          | 0              |
| proxy_backend_executor_suitable       | OLAP           |
| proxy_frontend_max_connections        | 0              |
| proxy_backend_driver_type             | JDBC           |
| proxy_mysql_default_version           | 5.7.22         |
| proxy_default_port                    | 8066           |
| proxy_netty_backlog                   | 1024           |
| proxy_instance_type                   | Proxy          |
| agent_plugins_enabled                 | true           |
| cached_connections                    | 0              |
| transaction_type                      | LOCAL          |
+---------------------------------------+----------------+
21 rows in set (0.00 sec)
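
For comparing settings across the SSP nodes, the pasted table can be turned into a dictionary. This is only a sketch: parse_mysql_table is a hypothetical helper, and the sample text is a shortened copy of the output above.

```python
def parse_mysql_table(text):
    """Parse mysql-client ASCII table output into {first column: second column}."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and the "N rows in set" footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(cells)
    _header, *body = rows  # the first row holds the column names
    return {name: value for name, value in body}


sample = """
+---------------------------------------+----------------+
| variable_name                         | variable_value |
+---------------------------------------+----------------+
| kernel_executor_size                  | 0              |
| proxy_frontend_database_protocol_type |                |
| proxy_backend_executor_suitable       | OLAP           |
| agent_plugins_enabled                 | true           |
+---------------------------------------+----------------+
"""

variables = parse_mysql_table(sample)
print(variables["proxy_backend_executor_suitable"])
print(variables["agent_plugins_enabled"])
```

Running the same parser on the output from a healthy node and diffing the two dictionaries would show whether the slow node is configured differently.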

Java agents currently enabled:

-javaagent:/data/sharding-proxy/agent/shardingsphere-agent.jar
-javaagent:/opt/skywalking-agent/skywalking-agent.jar

Of these, -javaagent:/opt/skywalking-agent/skywalking-agent.jar was added temporarily in response to the high CPU usage.

top -H -p 74892
top - 12:35:15 up 335 days,  3:47,  7 users,  load average: 2.03, 1.94, 2.22
Threads: 173 total,   1 running, 172 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.5 us,  0.0 sy,  0.0 ni, 95.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 39468630+total,  8460452 free, 15467635+used, 23154950+buff/cache
KiB Swap:        0 total,        0 free,        0 used. 23456804+avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                          
 74947 root      20   0   16.4g   2.4g  20964 R 99.9  0.6  20:28.64 java 
# jstack 74892 | grep 124c3 -A 100
"VM Thread" os_prio=0 tid=0x00007f787c2ec800 nid=0x124c3 runnable 

"Gang worker#0 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c024800 nid=0x12490 runnable 

"Gang worker#1 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c026000 nid=0x12491 runnable 

"Gang worker#2 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c028000 nid=0x12492 runnable 

"Gang worker#3 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c02a000 nid=0x12493 runnable 

"Gang worker#4 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c02b800 nid=0x12494 runnable 

"Gang worker#5 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c02d800 nid=0x12495 runnable 

"Gang worker#6 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c02f800 nid=0x12496 runnable 

"Gang worker#7 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c031000 nid=0x12497 runnable 

"Gang worker#8 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c033000 nid=0x12498 runnable 

"Gang worker#9 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c035000 nid=0x12499 runnable 

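The grep pattern 124c3 above is the hexadecimal form of thread ID 74947 from top -H, which jstack prints as nid. A small sketch of that mapping follows; the helper names are made up for illustration and the sample text is copied from the jstack output above.

```python
import re


def tid_to_nid(tid):
    """Convert the decimal thread ID shown by top -H to jstack's hex nid."""
    return hex(tid)


def thread_name_for(tid, jstack_text):
    """Return the name of the jstack thread whose nid matches tid, or None."""
    nid = tid_to_nid(tid)
    for line in jstack_text.splitlines():
        match = re.match(r'"([^"]+)".*\bnid=(0x[0-9a-f]+)', line)
        if match and match.group(2) == nid:
            return match.group(1)
    return None


jstack_sample = '''
"VM Thread" os_prio=0 tid=0x00007f787c2ec800 nid=0x124c3 runnable
"Gang worker#0 (Parallel GC Threads)" os_prio=0 tid=0x00007f787c024800 nid=0x12490 runnable
'''

print(tid_to_nid(74947))
print(thread_name_for(74947, jstack_sample))
```

So the thread at 99.9% CPU is "VM Thread", not a worker thread handling queries. In HotSpot that thread executes VM operations, including stop-the-world GC work, so GC logs would be a reasonable next thing to collect.
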
The current business logic is simple:

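  1. The service queries the database to check whether the record exists
  2. If it exists, the service runs an update
  3. If it does not exist, the service runs an insert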

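Each decision is made in the application layer, each step runs as its own independent transaction, and throughput is about 10,000 operations per minute.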
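
The flow described above can be sketched roughly as follows. This is not the real service code: sqlite3 stands in for the MySQL connection to SSP, and the table t with columns k and v is hypothetical.

```python
import sqlite3


def check_then_write(conn, key, value):
    """Check-then-write in the application layer, each step committed separately."""
    row = conn.execute("SELECT 1 FROM t WHERE k = ?", (key,)).fetchone()
    conn.commit()  # step 1 ends as its own transaction
    if row is not None:
        conn.execute("UPDATE t SET v = ? WHERE k = ?", (value, key))
        action = "update"
    else:
        conn.execute("INSERT INTO t (k, v) VALUES (?, ?)", (key, value))
        action = "insert"
    conn.commit()  # step 2 or 3 is a second, independent transaction
    return action


conn = sqlite3.connect(":memory:")  # stand-in for the connection to SSP
conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v TEXT)")
conn.commit()

print(check_then_write(conn, "a", "1"))
print(check_then_write(conn, "a", "2"))

# Load implied by the reported throughput: two statements per operation.
ops_per_minute = 10_000
statements_per_second = ops_per_minute * 2 / 60
print(round(statements_per_second))
```

At the reported rate this works out to roughly 167 operations, or about 333 statements, per second through the single connected node.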

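There are 4 SSP nodes, but the application currently connects to only 1 of them. That node times out frequently (1-second timeout), while manual queries against the other SSP nodes behave normally.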
