Taobao user behavior analysis

1, Data source

        The data comes from Alibaba Tianchi. The source data has about 100 million records. Due to hardware reasons, only the data from November 25, 2017 to December 4, 2017 are intercepted for analysis.

2, Data structure

Column nameexplain
user_idUser id, integer type, serialized user id
item_idCommodity id, integer type, serialized commodity id
category_idCommodity category id, integer type, category id of serialized commodity
hehavior_typeBehavior type, string, enumeration type, including ('pv ',' buy ',' cart ',' fav ')
dateTimestamp, the timestamp at which the behavior occurs
Behavior typeexplain
pvProduct details page pv, equivalent to clicking
buyCommodity purchase
cartAdd items to shopping cart
favCollect goods

3, Data cleaning

1. In the original data, the timestamp is a 10 digit number that needs to be changed to a date

update userbe set date=FROM_UNIXTIME(date,'%Y-%m-%d %H:%i:%s')

2. After data inspection, there are no duplicate values, missing values and abnormal values. The following data analysis is directly carried out.

3. View data volume

4, Data analysis

1. pv/uv analysis

select riqi,count(behavior_type) pv, count(distinct user_id) uv,count(behavior_type)/count(distinct user_id) 'Page views per capita' from users where behavior_type='pv' group by riqi order by riqi

        From November 25 to December 4, it is in the end of double 11 and the warm-up period of double 12. From the perspective of pv and uv, there was a significant increase on December 2. The reason may be that the other day is in the warm-up period of double 12.

2. User behavior analysis

create table usersbe as(
select user_id,sum(case when behavior_type='pv' then 1 else 0 end) as 'Number of views',
sum(case when behavior_type='fav' then 1 else 0 end) as 'Collection times',
sum(case when behavior_type='cart' then 1 else 0 end) as 'Additional purchase times',
sum(case when behavior_type='buy' then 1 else 0 end) as 'Number of purchases' from users group by user_id)

select count(case when `Number of views`>0 and `Collection times`=0 and `Additional purchase times`=0 and `Number of purchases`=0 then user_id else null end) 'Browse only',
count(case when `Number of views`>0 and `Collection times`>0 and `Additional purchase times`=0 and `Number of purchases`=0 then user_id else null end) 'Browse collection not purchased',
count(case when `Number of views`>0 and `Collection times`=0 and `Additional purchase times`>0 and `Number of purchases`=0 then user_id else null end) 'Browse additional purchase not purchased',
count(case when `Number of views`>0 and `Collection times`>0 and `Additional purchase times`>0 and `Number of purchases`=0 then user_id else null end) 'Browse collection add purchase not purchased',
count(case when `Number of views`>0 and `Collection times`=0 and `Additional purchase times`=0 and `Number of purchases`>0 then user_id else null end) 'Browse purchase',
count(case when `Number of views`>0 and `Collection times`>0 and `Additional purchase times`=0 and `Number of purchases`>0 then user_id else null end) 'Browse collection purchases',
count(case when `Number of views`>0 and `Collection times`=0 and `Additional purchase times`>0 and `Number of purchases`>0 then user_id else null end) 'Browse add on purchase',
count(case when `Number of views`>0 and `Collection times`>0 and `Additional purchase times`>0 and `Number of purchases`>0 then user_id else null end) 'Browse collections and purchase',
count(case when `Number of views`=0 and `Number of purchases`=0 then user_id else null end) 'Not browsed not purchased',
count(case when `Number of views`=0 and `Number of purchases`>0 then user_id else null end) 'Purchase not browsed'
from usersbe

          Browsing plus purchase accounts for the largest proportion, which is 32.82%. Compared with browsing collection purchase, it shows that users prefer to buy after joining the shopping cart.

 

 

          Users' browsing, collection and additional purchase behaviors are mainly at 21:00 and 22:00, and their purchase behaviors are from 9:00 to 15:00 and 21-22:00.

This is more in line with people's work and rest rules. They are busy working during the day and shopping at night.

2. Conversion rate

        The conversion rate of collection / additional purchase is 9.33%, which is relatively low. It may be caused by problems such as goods not meeting the demand or unfriendly commodity details page. Further analysis and mining can be carried out for the reasons.

4. User product preference

        Sales volume top 10, commodity category

Tags: Data Analysis

Posted on Sun, 05 Sep 2021 22:34:54 -0400 by LanceEh