您的当前位置:首页正文

clickhouse亿级数据性能测试

2021-05-22 来源:个人技术集锦
clickhouse亿级数据性能测试

clickhouse 在数据分析技术领域早已声名远扬,如果还不知道可以 了解下。

最近由于项⽬需求使⽤到了 clickhouse 做分析数据库,于是⽤测试环境做了⼀个单表 6 亿数据量的性能测试,记录⼀下测试结果,有做超⼤数据量分析技术选型需求的朋友可以参考下。服务器信息

CPU:Intel Xeon Gold 6240 @ 8x 2.594GHz内存:32G

系统:CentOS 7.6Linux内核版本:3.10.0磁盘类型:机械硬盘⽂件系统:ext4Clickhouse信息

部署⽅式:单机部署版本:20.8.11.17测试情况

测试数据和测试⽅法来⾃ clickshouse 官⽅的

按照官⽅指导造出了测试数据之后,先看⼀下数据量和空间占⽤情况。数据量和空间占⽤

表名suppliercustomerpartlineorder

列数67816

数据⾏数200,0003,000,0001,400,000

原始⼤⼩11.07 MiB

压缩⼤⼩压缩率7.53 MiB

68

168.83 MiB114.72 MiB6834.29 MiB

24.08 MiB16.67 GiB

706955

600,037,90224.03 GiB

lineorder_flat37688,552,212111.38 GiB61.05 GiB

可以看到 clickhouse 的压缩率很⾼,压缩率都在 50 以上,基本可以达到 70 左右。数据体积的减⼩可以⾮常有效的减少磁盘空间占⽤、提⾼ I/O 性能,这对整体查询性能的提升⾮常有效。

supplier、customer、part、lineorder 为⼀个简单的「供应商-客户-订单-地区」的星型模型,lineorder_flat 为根据这个星型模型数据关系合并的⼤宽表,所有分析都直接在这张⼤宽表中执⾏,减少不必要的表关联,符合我们实际⼯作中的分析建表逻辑。以下性能测试的所有分析 SQL 都在这张⼤宽表中运⾏,未进⾏表关联查询。查询性能测试详情Query 1.1

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenueFROM lineorder_flat

WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)┌────────revenue─┐│ 44652567249651 │└────────────────┘

1 rows in set. Elapsed: 0.242 sec. Processed 91.01 million rows, 728.06 MB (375.91 million rows/s., 3.01 GB/s.)扫描⾏数:91,010,000 ⼤约9100万耗时(秒):0.242查询列数:2结果⾏数:1Query 1.2

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenueFROM lineorder_flat

WHERE (toYYYYMM(LO_ORDERDATE) = 199401) AND ((LO_DISCOUNT >= 4) AND (LO_DISCOUNT <= 6)) AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))┌───────revenue─┐│ 9624332170119 │└───────────────┘

1 rows in set. Elapsed: 0.040 sec. Processed 7.75 million rows, 61.96 MB (191.44 million rows/s., 1.53 GB/s.)扫描⾏数:7,750,000 775万耗时(秒):0.040查询列数:2返回⾏数:1Query 2.1SELECT

sum(LO_REVENUE),

toYear(LO_ORDERDATE) AS year, P_BRAND

FROM lineorder_flat

WHERE (P_CATEGORY = 'MFGR#12') AND (S_REGION = 'AMERICA')GROUP BY year,

P_BRANDORDER BY year ASC,

P_BRAND ASC

┌─sum(LO_REVENUE)─┬─year─┬─P_BRAND───┐│ 64420005618 │ 1992 │ MFGR#121 ││ 63389346096 │ 1992 │ MFGR#1210 ││ ........... │ .... │ ..........│

│ 39679892915 │ 1998 │ MFGR#128 ││ 35300513083 │ 1998 │ MFGR#129 │

└─────────────────┴──────┴───────────┘

280 rows in set. Elapsed: 8.558 sec. Processed 600.04 million rows, 6.20 GB (70.11 million rows/s., 725.04 MB/s.)扫描⾏数:600,040,000 ⼤约6亿耗时(秒):8.558

查询列数:3结果⾏数:280Query 2.2SELECT

sum(LO_REVENUE),

toYear(LO_ORDERDATE) AS year, P_BRAND

FROM lineorder_flat

WHERE ((P_BRAND >= 'MFGR#2221') AND (P_BRAND <= 'MFGR#2228')) AND (S_REGION = 'ASIA')GROUP BY year,

P_BRANDORDER BY year ASC,

P_BRAND ASC

┌─sum(LO_REVENUE)─┬─year─┬─P_BRAND───┐│ 66450349438 │ 1992 │ MFGR#2221 ││ 65423264312 │ 1992 │ MFGR#2222 ││ ........... │ .... │ ......... │

│ 39907545239 │ 1998 │ MFGR#2227 ││ 40654201840 │ 1998 │ MFGR#2228 │

└─────────────────┴──────┴───────────┘

56 rows in set. Elapsed: 1.242 sec. Processed 600.04 million rows, 5.60 GB (482.97 million rows/s., 4.51 GB/s.) 扫描⾏数:600,040,000 ⼤约6亿耗时(秒):1.242查询列数:3结果⾏数:56Query 3.1SELECT

C_NATION, S_NATION,

toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenueFROM lineorder_flat

WHERE (C_REGION = 'ASIA') AND (S_REGION = 'ASIA') AND (year >= 1992) AND (year <= 1997)GROUP BY C_NATION, S_NATION, year

ORDER BY year ASC,

revenue DESC

┌─C_NATION──┬─S_NATION──┬─year─┬──────revenue─┐│ INDIA │ INDIA │ 1992 │ 537778456208 ││ INDONESIA │ INDIA │ 1992 │ 536684093041 ││ ..... │ ....... │ .... │ ............ │

│ CHINA │ CHINA │ 1997 │ 525562838002 ││ JAPAN │ VIETNAM │ 1997 │ 525495763677 │

└───────────┴───────────┴──────┴──────────────┘

150 rows in set. Elapsed: 3.533 sec. Processed 546.67 million rows, 5.48 GB (154.72 million rows/s., 1.55 GB/s.) 扫描⾏数:546,670,000 ⼤约5亿4千多万耗时(秒):3.533查询列数:4结果⾏数:150Query 3.2SELECT C_CITY, S_CITY,

toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenueFROM lineorder_flat

WHERE (C_NATION = 'UNITED STATES') AND (S_NATION = 'UNITED STATES') AND (year >= 1992) AND (year <= 1997)GROUP BY C_CITY, S_CITY, year

ORDER BY year ASC,

revenue DESC

┌─C_CITY─────┬─S_CITY─────┬─year─┬────revenue─┐│ UNITED ST6 │ UNITED ST6 │ 1992 │ 5694246807 ││ UNITED ST0 │ UNITED ST0 │ 1992 │ 5676049026 ││ .......... │ .......... │ .... │ .......... │

│ UNITED ST9 │ UNITED ST9 │ 1997 │ 4836163349 ││ UNITED ST9 │ UNITED ST5 │ 1997 │ 4769919410 │└────────────┴────────────┴──────┴────────────┘

600 rows in set. Elapsed: 1.000 sec. Processed 546.67 million rows, 5.56 GB (546.59 million rows/s., 5.56 GB/s.)扫描⾏数:546,670,000 ⼤约5亿4千多万耗时(秒):1.00查询列数:4结果⾏数:600

Query 4.1SELECT

toYear(LO_ORDERDATE) AS year, C_NATION,

sum(LO_REVENUE - LO_SUPPLYCOST) AS profitFROM lineorder_flat

WHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA') AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))GROUP BY year,

C_NATIONORDER BY year ASC,

C_NATION ASC

┌─year─┬─C_NATION──────┬────────profit─┐│ 1992 │ ARGENTINA │ 1041983042066 ││ 1992 │ BRAZIL │ 1031193572794 ││ .... │ ...... │ ............ │

│ 1998 │ PERU │ 603980044827 │

│ 1998 │ UNITED STATES │ 605069471323 │└──────┴───────────────┴───────────────┘

35 rows in set. Elapsed: 5.066 sec. Processed 600.04 million rows, 8.41 GB (118.43 million rows/s., 1.66 GB/s.) 扫描⾏数:600,040,000 ⼤约6亿耗时(秒):5.066查询列数:4结果⾏数:35Query 4.2SELECT

toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY,

sum(LO_REVENUE - LO_SUPPLYCOST) AS profitFROM lineorder_flat

WHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA') AND ((year = 1997) OR (year = 1998)) AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))GROUP BY year,

S_NATION, P_CATEGORYORDER BY year ASC,

S_NATION ASC, P_CATEGORY ASC

┌─year─┬─S_NATION──────┬─P_CATEGORY─┬───────profit─┐│ 1997 │ ARGENTINA │ MFGR#11 │ 102369950215 ││ 1997 │ ARGENTINA │ MFGR#12 │ 103052774082 ││ .... │ ......... │ ....... │ ............ │

│ 1998 │ UNITED STATES │ MFGR#24 │ 60779388345 ││ 1998 │ UNITED STATES │ MFGR#25 │ 60042710566 │

└──────┴───────────────┴────────────┴──────────────┘

100 rows in set. Elapsed: 0.826 sec. Processed 144.42 million rows, 2.17 GB (174.78 million rows/s., 2.63 GB/s.)扫描⾏数:144,420,000 ⼤约1亿4千多万耗时(秒):0.826查询列数:4结果⾏数:100性能测试结果汇总

查询语句SQL简要说明Q1.1Q1.2Q2.1Q2.2Q3.1Q3.2Q4.1Q4.2

乘积、汇总、4个条件、⾸次运⾏Q1.1增加1个条件运⾏

汇总、函数、2列分组、2列排序、⾸次运⾏Q2.1增加1个条件运⾏

汇总、函数、3列分组、2列排序、⾸次运⾏Q3.1更换条件运⾏

扫描⾏数返回⾏数扫描⾏数返回⾏数查询列数查询列数耗时耗时(秒)91,010,0007,750,000600,040,000600,040,000546,670,000546,670,000

112805615060035100

22334444

0.2420.0408.5581.2423.533

15.0060.826

相减、汇总、函数、2列分组、2列排序、⾸次运⾏600,040,000Q4.1增加2个条件运⾏

144,420,000

在当前软硬件环境下,扫描 6 亿多⾏数据,常见的分析语句⾸次运⾏最慢在 8 秒左右能返回结果,相同的分析逻辑更换条件再次查询的时候效率有明显的提升,可以缩短到 1

秒左右,如果只是简单的列查询没有加减乘除、聚合等逻辑,扫描全表 6 亿多⾏数据⾸次查询基本可以在 2 秒内执⾏完成。

因篇幅问题不能全部显示,请点此查看更多更全内容