E-commerce User Behavior Analysis

1. Project Background

This project uses a data set from the Taobao app and analyzes user behavior through common industry metrics to explore the behavior patterns of Taobao users. The metrics are: daily PV and daily UV, payment rate, repurchase behavior, funnel loss, and user value (RFM).

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=72423

2. Data Processing

2.1 Field Description

user_id - User ID

item_id - Item ID

behavior_type - User behavior type (click, favorite, add to cart, payment, encoded as 1, 2, 3, 4 respectively)

user_geohash - Geographic location

item_category - Category ID

time - Time at which the behavior occurred

2.2 Data cleaning

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
data=pd.read_csv('/Users/kuebiko/Downloads/Taobao User Behavior.csv')
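# Count fully duplicated rows
data.duplicated().sum()
4092866
# Remove duplicate rows (time is recorded only to the hour, so repeated identical actions within the same hour are dropped as well)
data.drop_duplicates(inplace=True)
# Count missing values per column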
data.isnull().sum()
user_id                0
item_id                0
behavior_type          0
user_geohash     4308015
item_category          0
time                   0
dtype: int64
# Missing values are concentrated in the user geographic information column, so drop that column
del data['user_geohash']
# Split the time column into date and hour, then drop the original time column
data['date']=data.time.str.split(' ').str[0]
data['hour']=data.time.str.split(' ').str[1]
del data['time']
# Convert data types: user_id and item_id to strings, date to datetime, hour to integer
for i in data.columns[:2]:
    data[i]=data[i].astype('str')
data['date']=pd.to_datetime(data.date)
data['hour']=data.hour.astype('int')
# Sort by time
data=data.sort_values(by=['date','hour'])
data=data.reset_index(drop=True)
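
The split-and-convert steps above can also be done with a single parse. The following is a minimal sketch on made-up rows, assuming the `time` column has the layout `YYYY-MM-DD HH` (which is what the split on a space and the integer cast of the second part suggest); the `toy` frame is hypothetical and not part of the data set.

```python
import pandas as pd

# Made-up rows in the assumed 'YYYY-MM-DD HH' layout of the time column
toy = pd.DataFrame({
    "user_id": [1, 1, 2],
    "time": ["2014-12-12 21", "2014-11-18 05", "2014-12-12 00"],
})

# Parse once with an explicit format, then derive date and hour from the timestamp
ts = pd.to_datetime(toy["time"], format="%Y-%m-%d %H")
toy["date"] = ts.dt.normalize()  # midnight of the same day, still a datetime
toy["hour"] = ts.dt.hour         # integer hour, 0-23

# Drop the raw column and sort chronologically, as in the cleaning step
toy = toy.drop(columns="time").sort_values(["date", "hour"]).reset_index(drop=True)
print(toy)
```

An explicit `format` makes the parse fail loudly on rows that do not match the expected layout, rather than guessing.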

3. Data Analysis

3.1 Analysis Framework

1. Traffic metric diagnosis: PV, UV

2. User purchase analysis: transaction volume, payment rate, purchases per capita, repurchase rate

3. User behavior funnel conversion rates: click-to-payment, click-to-favorite, click-to-cart, favorite/cart-to-payment

4. User segmentation: RFM model analysis

3.2 Traffic Indicator Diagnosis

3.2.1 Traffic Indicators

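# Daily PV: number of click events (behavior_type == 1) per day
pv_daily=data[data.behavior_type==1].groupby('date')['user_id'].count().reset_index().rename(columns={'user_id':'pv'})
# Daily UV: number of distinct users with any behavior per day
uv_daily=data.groupby('date')['user_id'].nunique().reset_index().rename(columns={'user_id':'uv'})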

fig,axes=plt.subplots(2,1,figsize=(12,8),sharex=True)
pv_daily.plot(x='date',y='pv',ax=axes[0])
uv_daily.plot(x='date',y='uv',ax=axes[1])
axes[0].set_title('pv_daily')
axes[1].set_title('uv_daily')
Text(0.5, 1.0, 'uv_daily')

[Figure: daily PV (top) and daily UV (bottom) by date]

Over the month, visits and user counts rose slowly overall and peaked during the Double Twelve (December 12) promotion, which shows that the promotion had a clear effect on traffic.
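
To make the two metrics concrete, here is a small sketch on made-up events. It follows the same definitions as the code above (PV counts click events, UV counts distinct users with any behavior); the `toy` frame and the `pv_per_uv` column are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# Made-up events: u1 clicks twice and u2 once on Dec 11; u2 pays and u3 clicks on Dec 12
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3"],
    "behavior_type": [1, 1, 1, 4, 1],
    "date": pd.to_datetime(
        ["2014-12-11", "2014-12-11", "2014-12-11", "2014-12-12", "2014-12-12"]
    ),
})

# PV: click events per day; UV: distinct users per day (any behavior)
clicks = toy[toy["behavior_type"] == 1]
pv_daily = clicks.groupby("date")["user_id"].count().rename("pv")
uv_daily = toy.groupby("date")["user_id"].nunique().rename("uv")

# Put both metrics side by side and add the average clicks per visitor
traffic = pd.concat([pv_daily, uv_daily], axis=1)
traffic["pv_per_uv"] = traffic["pv"] / traffic["uv"]
print(traffic)
```

On this toy data a user who clicks twice on the same day adds two to PV but only one to UV, which is why the two curves can diverge.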

3.2.2 Period Traffic Indicators

# Hourly PV (all behavior types are counted here) and hourly UV (distinct users)
pv_hour=data.groupby('hour')['user_id'].count().reset_index().rename(columns={'user_id':'pv'})
uv_hour=data.groupby('hour')['user_id'].nunique().reset_index().rename(columns={'user_id':'uv'})
fig,axes=plt.subplots(2,1,sharex=True)
pv_hour.plot(x='hour',y='pv',ax=axes[0])
uv_hour.plot(x='hour',y='uv',ax=axes[1])
axes[0].set_title('pv_hour')
axes[1].set_title('uv_hour')

Text(0.5, 1.0, 'uv_hour')

[Figure: hourly PV (top) and hourly UV (bottom)]

Visits and user counts decline from 23:00 to 05:00 the next day, then rise to a relatively stable level between 05:00 and 18:00. Visits increase sharply from 18:00 to 22:00 and peak at 21:00 and 22:00; user counts follow a similar but less pronounced trend.

Summary: 18:00-22:00 is the active period, 05:00-18:00 is the generally active period, and 23:00-05:00 is the inactive period.
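
The three periods in the summary can be written as a small lookup, for example to tag each event before aggregating. This is a sketch only: the function name `activity_period` is hypothetical, and the boundary handling (hour 5 counted as generally active, hour 18 as active) is an assumption, since the ranges in the summary share their endpoints.

```python
def activity_period(hour: int) -> str:
    """Map an hour of day (0-23) to the activity period from the summary."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if 18 <= hour <= 22:
        return "active"      # evening peak
    if 5 <= hour < 18:
        return "general"     # daytime plateau
    return "inactive"        # 23:00 through 04:59

# Example: label a few hours
for h in (0, 5, 12, 18, 21, 23):
    print(h, activity_period(h))
```

With pandas, the same mapping could be applied to the `hour` column via `data['hour'].map(activity_period)`.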

3.3 User Purchase Analysis

3.3.1 Volume and Rate Analysis

# Daily transaction volume: number of payment events per day
buy_daily=data[data.behavior_type==4].groupby('date')['behavior_type'].count()

# Payment rate = number of paying users / number of active users, per day
paying_user=data[data.behavior_type==4].groupby('date')['user_id'].nunique()
active_user=data.groupby('date')['user_id'].nunique()
paying_rate=paying_user/active_user

plt.figure(figsize=(12,8))
# Use dedicated axes variables instead of overwriting the axes array from the previous figure
ax_buy=plt.subplot(211)
ax_buy.set_title('buy_daily')
ax_buy.set_ylabel('buy')
plt.plot(buy_daily)
ax_rate=plt.subplot(212)
ax_rate.set_title('paying_rate_daily')
ax_rate.set_ylabel('paying_rate')
plt.plot(paying_rate)
[<matplotlib.lines.Line2D at 0x7feba7c94090>]

[Figure: daily transaction volume (top) and daily payment rate (bottom)]

Transaction volume and payment rate stay at a generally stable level, and both peak on December 12.

Summary: Transaction volume follows basically the same trend as PV and UV, which confirms the positive relationship between user traffic and transaction volume. The payment rate on December 12 rose by nearly 50%, indicating that the Double Twelve promotion has a clear effect on raising the payment rate.
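
One way to quantify a statement such as "rose by nearly 50%" is to compare the promotion day with the average of the other days. The sketch below uses made-up payment rates, not values from the data set, and the choice of baseline (mean of all non-promotion days) is an assumption.

```python
import pandas as pd

# Hypothetical daily payment rates (made-up numbers for illustration only)
paying_rate = pd.Series(
    [0.20, 0.22, 0.30, 0.21],
    index=pd.to_datetime(["2014-12-10", "2014-12-11", "2014-12-12", "2014-12-13"]),
)

promo_day = pd.Timestamp("2014-12-12")

# Baseline: mean payment rate over all days except the promotion day
baseline = paying_rate.drop(promo_day).mean()

# Relative lift of the promotion day over the baseline
lift = paying_rate.loc[promo_day] / baseline - 1
print(f"baseline={baseline:.3f}, lift={lift:.1%}")
```

Applied to the `paying_rate` series computed above, the same two lines would give the lift for the real data.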

3.3.2 User Purchase Number Analysis

# Distribution of user purchases
user_buy_time=data[data.behavior_type==4].groupby('user_id').count()['item_id']
plt.hist(x=user_buy_time,bins=10,range=[0,100])
plt.xlabel('buy_time')
plt.ylabel('num_of_user')
plt.title('user_buy_time')
Text(0.5, 1.0, 'user_buy_time')

[Figure: histogram of purchase counts per user]

# Number of purchases per capita
total_buy_time=data[data.behavior_type==4].count()['user_id']
total_paying_user=data[data.behavior_type==4].nunique()['user_id']
capita_user_buy_time=total_buy_time/total_paying_user
capita_user_buy_time
12.45464776052217

3.3.3 Repurchase Rate

# Repurchase rate = number of users who purchased twice or more / number of users who purchased
user_buy_time=data[data.behavior_type==4].groupby('user_id').count()
# Note: the filter below uses > 2 (three or more purchases), which is stricter than the definition above; >= 2 would match it exactly
user_twice=user_buy_time[user_buy_time['item_id']>2].count()['item_id']
user_buy=data[data.behavior_type==4].nunique()['user_id']
rebuy_rate=user_twice/user_buy
rebuy_rate
0.8359216745442268

The repurchase rate is 83.6%, and the average number of purchases per paying user is about 12. The distribution of purchase counts shows that users with fewer than 20 purchases, and especially those with fewer than 10, make up a very large share.

Summary: Judging by the repurchase rate and the number of purchases per capita, Taobao's paying users are highly loyal, and this loyalty should be maintained. Judging by the distribution of purchase counts, attention should go to users with fewer than 20 purchases, especially fewer than 10, to further tap their purchasing power.
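
The two purchase metrics are easy to check by hand on a tiny example. The sketch below uses made-up purchase events and shows how the repurchase rate depends on the threshold: "twice or more" (`>= 2`) versus "more than twice" (`> 2`). All names and values here are illustrative assumptions.

```python
import pandas as pd

# Made-up payment events: user a buys 3 times, b twice, c once
purchases = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b", "c"],
    "item_id": ["i1", "i2", "i3", "i1", "i4", "i2"],
})

# Number of purchases per paying user
buy_count = purchases.groupby("user_id").size()

# Purchases per capita: total purchases / number of paying users
per_capita = buy_count.mean()

# Share of paying users at or above each threshold
rebuy_rate_ge2 = (buy_count >= 2).mean()  # users a and b qualify
rebuy_rate_gt2 = (buy_count > 2).mean()   # only user a qualifies

print(per_capita, rebuy_rate_ge2, rebuy_rate_gt2)
```

Taking the mean of a boolean Series gives the fraction of `True` values, which avoids a separate count and division.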

3.4 Funnel Analysis of User Behavior

# Count events per behavior type once: 1=click, 2=favorite, 3=add to cart, 4=payment
behavior_count=data.groupby('behavior_type')['user_id'].count()
pv=behavior_count.loc[1]
favor=behavior_count.loc[2]
cart=behavior_count.loc[3]
buy=behavior_count.loc[4]

3.4.1 Click Conversion Rate

pv_favor=favor/pv*100
pv_cart=cart/pv*100
pv_buy=buy/pv*100
print('Click-to-favorite conversion rate:','%.2f'%pv_favor,'%')
print('Click-to-cart conversion rate:','%.2f'%pv_cart,'%')
print('Click-to-payment conversion rate:','%.2f'%pv_buy,'%')
Click-to-favorite conversion rate: 3.22 %
Click-to-cart conversion rate: 4.46 %
Click-to-payment conversion rate: 1.48 %

3.4.2 Favorite/Cart Conversion Rate

favor_cart_buy=buy/(favor+cart)*100
print('Favorite/cart-to-payment conversion rate:','%.2f'%favor_cart_buy,'%')
Favorite/cart-to-payment conversion rate: 19.27 %

Overall conversion is low: click-to-payment conversion is only 1.48%. Click-to-favorite and click-to-cart conversion are higher, but still only about 3% and 4%. The conversion rate from favorite or cart to payment is 19.27%, well above the click-to-payment rate.

Summary: Click conversion is generally low, while conversion rises sharply once users show interest in an item by adding it to favorites or to the cart.
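
The funnel rates are plain ratios of event counts, which the following sketch reproduces on made-up events. It assumes the encoding 1=click, 2=favorite, 3=add to cart, 4=payment from the field description; the `events` Series and the `rates` dictionary are hypothetical names.

```python
import pandas as pd

# Made-up events: 20 clicks, 2 favorites, 3 cart additions, 1 payment
events = pd.Series([1] * 20 + [2] * 2 + [3] * 3 + [4] * 1, name="behavior_type")

# Count by behavior code; reindex so a missing behavior type yields 0 instead of an error
counts = events.value_counts().reindex([1, 2, 3, 4], fill_value=0)
pv, favor, cart, buy = (counts.loc[k] for k in (1, 2, 3, 4))

# Conversion rates in percent, relative to clicks or to favorite + cart events
rates = {
    "click_to_favorite": 100 * favor / pv,
    "click_to_cart": 100 * cart / pv,
    "click_to_payment": 100 * buy / pv,
    "favorite_cart_to_payment": 100 * buy / (favor + cart),
}
for name, value in rates.items():
    print(f"{name}: {value:.2f} %")
```

Looking counts up by behavior code rather than by position keeps the result correct even if one behavior type is absent from a filtered subset. Note that these are event-level ratios; a user-level funnel (distinct users reaching each step) would need a different computation.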




Suggestions:

1) To raise click conversion, run more time-limited promotions that prompt users to order sooner, and expand off-site traffic sources to increase clicks;

2) In feature design, guide users to add items to favorites and to the cart, and add reminders on the favorites and cart pages to raise purchase conversion.

3.5 User Value Classification Using RFM Model

# The data set contains no purchase amounts, so only the R and F dimensions are analyzed
# Build the R value: days between the reference date and each user's last purchase
last_time=data[data.behavior_type==4].groupby('user_id')['date'].max()
recency=(pd.to_datetime('2014-12-18')-last_time).dt.days
recency_avg=recency.mean()
# r=1 when the gap since the last purchase is above average (a less recent buyer), otherwise 0
# Note: RFM scoring usually gives the higher R score to more recent buyers, so check this encoding against the segment labels below
recency=(recency>recency_avg).astype('int')
# Build the F value: number of purchases per user; f=1 when above average, otherwise 0
frequency=data[data.behavior_type==4].groupby('user_id')['item_id'].count()
frq_avg=frequency.mean()
frequency=(frequency>frq_avg).astype('int')
rfm=pd.merge(recency,frequency,on='user_id',how='inner')
rfm=rfm.reset_index().rename(columns={'date':'r','item_id':'f'})
rfm=rfm[['r','f']].astype('str')
rfm['user_type']=rfm['r']+rfm['f']
rfm.groupby('user_type').count()['r'].sort_index(ascending=False).rename(index={'11':'Important Value Users','10':'Important Development Users','01':'Important Keep Users','00':'Important Retained Users'})

user_type
Important Value Users           464
Important Development Users    3042
Important Keep Users           2412
Important Retained Users       2968
Name: r, dtype: int64

Important value users are the most important user group, but they are few: only 464. The other three groups are of similar size, each with more than 2,000 users. Operational measures should be chosen according to the characteristics of each group.
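
The segmentation idea can be traced on a handful of users. The sketch below is not the code above: it uses made-up purchases and an explicit encoding in which `r = 1` marks a buyer whose last purchase is more recent than average and `f = 1` a buyer who purchases more often than average. That encoding, the short segment names, and all column names are assumptions of this illustration.

```python
import pandas as pd

# Made-up payment events for four users
buys = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "c", "c", "c", "d"],
    "date": pd.to_datetime([
        "2014-12-01", "2014-12-10", "2014-12-17",  # a: frequent, recent
        "2014-12-16",                              # b: infrequent, recent
        "2014-11-18", "2014-11-19", "2014-11-20",  # c: frequent, not recent
        "2014-11-25",                              # d: infrequent, not recent
    ]),
})

ref_date = pd.Timestamp("2014-12-18")  # reference date used for recency

# Last purchase date and purchase count per user
per_user = buys.groupby("user_id")["date"].agg(last_buy="max", frequency="count")
per_user["recency_days"] = (ref_date - per_user["last_buy"]).dt.days

# Binarize against the mean: r=1 means more recent than average, f=1 more frequent
per_user["r"] = (per_user["recency_days"] < per_user["recency_days"].mean()).astype(int)
per_user["f"] = (per_user["frequency"] > per_user["frequency"].mean()).astype(int)

# Map the (r, f) pair to a segment name
labels = {(1, 1): "value", (1, 0): "development", (0, 1): "keep", (0, 0): "retained"}
per_user["segment"] = [
    labels[(r, f)] for r, f in zip(per_user["r"].tolist(), per_user["f"].tolist())
]
print(per_user[["recency_days", "frequency", "r", "f", "segment"]])
```

Keeping `r` and `f` as separate integer columns and mapping the pair through a dictionary makes the meaning of each code explicit, so the direction of the recency score is easy to audit.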




Conclusion:

1) Important value users are the best user group. Focus on keeping them engaged while continuing to guide their consumption, for example by offering them VIP services;

2) Important development users have purchased recently but not frequently. The strategy is to increase their purchase frequency, with measures such as promotional reminders and special volume-discount campaigns;

3) Important keep users purchase frequently but have not purchased for some time. The strategy is to reactivate them: attract their attention through app push messages and off-site advertising to prompt repurchase;

4) Important retained users have not purchased recently and purchase infrequently, so they may churn if no action is taken. Keep them exposed to the platform by continuing to push promotions and discount information, and study their interests and needs further in order to choose effective operating strategies.

4. Analysis conclusions

4.1 Traffic Indicator Diagnosis

Platform traffic metrics rose steadily, and the Double Twelve promotion clearly drove traffic.

Similar campaigns can be replicated around more holidays to create traffic peaks, while the campaign rules continue to be optimized to improve conversion.

4.2 User Purchase Analysis

Taobao's paying users are highly loyal, with a high repurchase rate and a high number of purchases per capita, but the platform's transaction volume and payment rate have stayed essentially flat.

On the one hand, continue to tap the purchasing power of existing paying users; on the other, raise the conversion rate to grow the number of paying users and the transaction volume.

4.3 Funnel Analysis of User Behavior

Click conversion is generally low, while conversion from favorites and the cart is relatively high.

To raise click conversion, run more time-limited promotions that prompt users to order sooner, and expand off-site traffic sources to increase clicks.

In feature design, guide users to add items to favorites and to the cart, and add reminders on the favorites and cart pages to raise purchase conversion.

4.4 RFM Model - User Population Hierarchy

Operational measures should be taken according to the characteristics of user groups.


Important value users are the best user group. Focus on keeping them engaged while continuing to guide their consumption, for example by offering them VIP services. Because this group is small, the other user groups also need continuous operation so that more of them become important value users and platform revenue grows.

Important development users have purchased recently but not frequently. The strategy is to increase their purchase frequency, with measures such as promotional reminders and special volume-discount campaigns.

Important keep users purchase frequently but have not purchased for some time. The strategy is to reactivate them: attract their attention through app push messages and off-site advertising to prompt repurchase.

Important retained users have not purchased recently and purchase infrequently, so they may churn if no action is taken. Keep them exposed to the platform by continuing to push promotions and discount information, and study their interests and needs further in order to choose effective operating strategies.

Tags: Data Analysis Data Mining

Posted on Mon, 29 Nov 2021 17:20:25 -0500 by wutanggrenade