分组查询
分组GROUP BY
操作是数据统计中不可缺少的功能,sqala也对分组查询有着深入的支持。
简单分组
分组操作通常与聚合函数配合使用,sqala中使用groupBy
方法创建分组,比如我们需要统计每个频道的帖子数量:
val q = query:
from(Post)
.groupBy(p => p.channelId)
.map((g, p) => (g, count()))
生成的SQL为:
SELECT
"t1"."channel_id" AS "c1",
COUNT(*) AS "c2"
FROM
"post" AS "t1"
GROUP BY
"t1"."channel_id"
在分组后的sortBy
/orderBy
/having
/map
中,均多了一个参数,这个参数代表当前分组集(如果是两表连接之后的分组则有三个参数,以此类推)。
使用元组或命名元组支持支持多个分组表达式:
val q = query:
from(Post)
.groupBy(p => (cid = p.channelId, title = p.title))
.map((g, p) => (g.cid, g.title, count()))
生成的SQL为:
SELECT
"t1"."channel_id" AS "c1",
"t1"."title" AS "c2",
COUNT(*) AS "c3"
FROM
"post" AS "t1"
GROUP BY
"t1"."channel_id",
"t1"."title"
需要注意的是,大多数数据库都不允许在分组后的操作中出现非分组表达式,因此以下的SQL是错误的:
SELECT
"t1"."channel_id" AS "c1",
"t1"."title" AS "c2",
COUNT(*) AS "c3"
FROM
"post" AS "t1"
GROUP BY
"t1"."channel_id"
sqala也会在这样的查询中生成编译错误。
如果我们只想在一个分组中返回任意的title
字段值,使用ANY_VALUE
聚合函数即可,在sqala中对应的方法为anyValue
:
val q = query:
from(Post)
.groupBy(p => p.channelId)
.map((g, p) => (g, anyValue(p.title), count()))
分组后过滤
having
方法用于分组后过滤数据,使用方式与filter
类似:
val q = query:
from(Post)
.groupBy(p => p.channelId)
.having((g, p) => count() > 1)
.map((g, p) => (g, count()))
生成的SQL为:
SELECT
"t1"."channel_id" AS "c1",
COUNT(*) AS "c2"
FROM
"post" AS "t1"
GROUP BY
"t1"."channel_id"
HAVING
COUNT(*) > 1
多维分组
除了简单的分组外,sqala支持多维分组操作。
CUBE分组
SQL的GROUP BY CUBE(a, b, c)
将会在分组中生成每一种组合的情况,也就是说CUBE (a, b, c)
相当于合并以下分组:
分组 |
---|
无分组 |
a |
b |
c |
a, b |
a, c |
b, c |
a, b, c |
共8种情况(2的n次方)。
CUBE
分组在制作多维数据集时非常有用,sqala既然宣称支持:“不止是简单的CRUD”,当然也支持CUBE
分组功能。
我们以一个人口表的“年龄段”,“民族”,“性别”三个维度统计人数。年龄段以“小于30岁分为青少年,30-60岁分为中年,60岁以上分为老年”分段统计。
case class Person(id: Int, age: Int, nation: String, gender: String)
val q = query:
from(Person)
.groupByCube: p =>
(
age = `if` (p.age <= 30) `then` "青少年" `else if` (p.age <= 60 && p.age > 30) `then` "中年" `else` "老年",
nation = p.nation,
gender = p.gender
)
.map: (g, p) =>
(
age = g.age
nation = g.nation,
gender = g.gender,
count = count()
)
生成的SQL为:
SELECT
CASE WHEN "t1"."age" <= 30 THEN '青少年' WHEN "t1"."age" <= 60 AND "t1"."age" > 30 THEN '中年' ELSE '老年' END AS "c1",
"t1"."nation" AS "c2",
"t1"."gender" AS "c3",
COUNT(*) AS "c4"
FROM
"person" AS "t1"
GROUP BY
CUBE(
CASE WHEN "t1"."age" <= 30 THEN '青少年' WHEN "t1"."age" <= 60 AND "t1"."age" > 30 THEN '中年' ELSE '老年' END,
"t1"."nation",
"t1"."gender"
)
可以看到,分组使用命名元组可以极大程度简化代码。
需要注意的是,即使数据中没有可空字段,但CUBE
还是会生成本来不存在的空值,这是CUBE
的语义决定的,因此,sqala会将此查询结果推断成:
case class Person(id: Int, age: Int, nation: String, gender: String)
val q = query:
from(Person)
.groupByCube: p =>
(
age = `if` (p.age <= 30) `then` "青少年" `else if` (p.age <= 60 && p.age > 30) `then` "中年" `else` "老年",
nation = p.nation,
gender = p.gender
)
.map: (g, p) =>
(
age = g.age
nation = g.nation,
gender = g.gender,
count = count()
)
// 返回类型为List[(age: Option[String], nation: Option[String], gender: Option[String], count: Long]]
val result = db.fetch(q)
GROUPING表达式
既然多维分组集会产生额外的空值,那我们怎么区分“分组产生的空值是数据原本就有的,还是因为多维分组额外产生的”,为此,SQL推出了配套的GROUPING
表达式,GROUPING
的结果为1则说明空值是分组额外产生的,结果为0则说明空值是原本数据中就有的,在sqala中,我们使用grouping
方法创建此表达式:
val q = query:
from(Person)
.groupByCube: p =>
(
age = `if` (p.age <= 30) `then` "青少年" `else if` (p.age <= 60 && p.age > 30) `then` "中年" `else` "老年",
nation = p.nation,
gender = p.gender
)
.map: (g, p) =>
(
age = g.age,
groupingAge = grouping(g.age),
nation = g.nation,
groupingNation = grouping(g.nation),
gender = g.gender,
groupingGender = grouping(g.gender),
count = count()
)
生成的SQL为:
SELECT
CASE WHEN "t1"."age" <= 30 THEN '青少年' WHEN "t1"."age" <= 60 AND "t1"."age" > 30 THEN '中年' ELSE '老年' END AS "c1",
GROUPING(CASE WHEN "t1"."age" <= 30 THEN '青少年' WHEN "t1"."age" <= 60 AND "t1"."age" > 30 THEN '中年' ELSE '老年' END) AS "c2",
"t1"."nation" AS "c3",
GROUPING("t1"."nation") AS "c4",
"t1"."gender" AS "c5",
GROUPING("t1"."gender") AS "c6",
COUNT(*) AS "c7"
FROM
"person" AS "t1"
GROUP BY
CUBE(
CASE WHEN CASE WHEN "t1"."age" <= 30 THEN '青少年' WHEN "t1"."age" <= 60 AND "t1"."age" > 30 THEN '中年' ELSE '老年' END,
"t1"."nation",
"t1"."gender"
)
ROLLUP分组
SQL的GROUP BY ROLLUP(a, b, c)
将会在分组中生成逐级分组,也就是说ROLLUP (a, b, c)
相当于合并以下分组:
分组 |
---|
无分组 |
a |
a, b |
a, b, c |
共4种情况(n + 1)。
在sqala中使用groupByRollup
方法生成ROLLUP
分组,使用方式与groupByCube
类似,不赘述。
GROUPING SETS分组
SQL的GROUP BY GROUPING SETS
分组将按指定的分组集来产生分组结果,比如我们想做到类似ROLLUP
的效果,则需要写成:
GROUP BY GROUPING SETS((), a, (a, b), (a, b, c))
空分组集使用()
来生成,为了做到这一点,sqala中的groupBySets
方法接收一个初始的分组集,和该分组集产生的分组的嵌套元组,空分组集使用Unit
类型的字面量()
来指定:
val q = query:
from(Entity)
.groupBySets(e => (a = e.a, b = e.b, c = e.c))(g => ((), g.a, (g.a, g.b), (g.a, g.b, g.c)))
.map((g, e) => (g.a, g.b, g.c, count()))