Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

16 游戏玩法分析 IV #21

Open
astak16 opened this issue Jan 11, 2022 · 0 comments
Open

16 游戏玩法分析 IV #21

astak16 opened this issue Jan 11, 2022 · 0 comments
Labels

Comments

@astak16
Copy link
Owner

astak16 commented Jan 11, 2022

题目

查询首次登录游戏并且第二天再次登录游戏玩家的比率,四舍五入到小数点后两位。

create table activity (
	player_id int,
	device_id int,
	event_date date,
	games_played int
);

insert into activity values
(1, 2, '2016-03-01', 5),
(1, 2, '2016-03-02', 6),
(1, 3, '2017-06-25', 1),
(3, 1, '2016-03-02', 0),
(3, 4, '2018-07-03', 5);

SQL:方法一

select round(select count(player_id) from (
	select
		player_id,
		event_date,
		lag(event_date)
			over(partition by player_id order by event_date) as next_date,
		rank() over(partition by player_id order by event_date) as login_times
	from activity
) as temp where datediff(event_date, next_date) = 1 and login_times = 2
/
select count(player_id) from (
	select player_id from activity group by player_id
) as temp, 2) as fraction;

解析

需要求出两个值:第一个值是总人数,第二个值是连续登录两天的玩家数

  1. 求总人数:

activity 表按照 player_id 进行分组在计算出总人数

select count(player_id) from (
	select player_id from activity group by player_id
) as temp

还有一种写法:

select count(distinct player_id) from activity
  1. 求连续两天登录游戏的玩家数:

使用窗口函数 lag(event_date) 将日期按照 player_id 分组,并按照 event_date 升序排列,然后偏移一天:

lag(event_date) over(partition by player_id order by event_date) as next_date

使用 rank 对日期按照 player_id 分组,并按照 event_date 升序排列,然后排序:

rank() over(partition by player_id order by event_date) as login_times

组合成完整的 SQL

select
	player_id,
	event_date,
	lag(event_date)
		over(partition by player_id order by event_date) as next_date,
	rank() over(partition by player_id order by event_date) as login_times
from activity

筛选出日期相差一天,并且是第一次连续登录,这里 login_times 取值 2 ,因为上面按照登录日期进行排序过了,第一天登录是 1 ,第二天登录是 2

计算出 player_id 的个数就可以算出连续登录两天的玩家了数了:

select count(player_id) from (
	select
		player_id,
		event_date,
		lag(event_date)
			over(partition by player_id order by event_date) as next_date,
		rank() over(partition by player_id order by event_date) as login_times
	from activity
) as temp where datediff(event_date, next_date) = 1 and login_times = 2
  1. 最后:

使用 round() 保留两位小数

SQL:方法二

select round((
	(select count(player_id) from (
		select
			player_id,
			datediff(event_date, min(event_date) over(partition by player_id)) as diff
		from activity
	) as temp where diff = 1) / (select count(distinct player_id) from activity)
), 2) as fraction;

解析

  1. 计算出每个用户最近两个登陆日期时间差:
select
	player_id,
	datediff(event_date, min(event_date) over(partition by player_id)) as diff
from activity
  1. 计算出连续两天登录的用户数, diff1
select count(player_id) from (
	select
		player_id,
		datediff(event_date, min(event_date) over(partition by player_id)) as diff
	from activity
) as temp where diff = 1
  1. 总人口计算方法,在方法一种已经给出了
  2. 最后使用 round() 保留两位小数

SQL:方法三

with temp as (
	select
		player_id,
		datediff(event_date, min(event_date) over(partition by player_id)) as diff
	from activity
) select round(
	sum(case diff when 1 then 1 else 0 end) /
	count(distinct player_id),
2) as fraction from temp;

解析

  1. 计算出每个用户最近两个登陆日期时间差,在方法二中给出了
  2. 将它作为临时表 temp ,用 with
  3. diff = 1 的和是连续两天登录的用户:
sum(case diff when 1 then 1 else 0 end)
  1. 临时表 temp 中记录了每个用户每次登录的时间差,去重计算 player_id
count(distinct player_id)
  1. 最后使用 round() 保留两位小数

SQL:方法四

select round(avg(event_date is not null), 2) as fraction from (
	select player_id, min(event_date) as first_login from activity
	group by player_id
) temp left join activity
on temp.player_id = activity.player_id
and datediff(event_date, first_login) = 1;

解析

  1. 使用 min 算出每个用户第一次登录的日期:
select player_id, min(event_date) as first_login from activity
group by player_id
  1. 左连 activity 表:
select * from (
	select player_id, min(event_date) as first_login from activity
	group by player_id
) temp left join activity
on temp.player_id = activity.player_id
and datediff(event_date, first_login) = 1;
  1. 因为 temp 只有每个用户第一次登录游戏的时间,加上连接条件 datediff(event_date, first_login) = 1 后,不满足条件的用户数据都 null
  2. 使用 avg(event_date is not null) 算出平均数
    1. temp 表是每个用户第一次登录的时间
    2. activity 表是存储着所有数据
    3. temp 左连 activity 筛选出第一次连续两天登录的玩家,满足条件的玩家是有值的,没有满足条件的玩家是 null
    4. event_date is not null 结果是 1 或者 0 ,不是日期了,也没有 null 了。
  3. 使用 round() 计算出平均数

Tips

datediff 语法

rank 语法

@astak16 astak16 added the 中等 label Jan 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant