Simple fitting analysis of e-commerce user data

The main modules used in this analysis are numpy, matplotlib, pandas, and LinearRegression from sklearn.linear_model.


1. Data reading

The data consists of two tables: user behavior data and VIP user behavior data.
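A minimal sketch of the reading step, assuming the two tables are stored as CSV. The in-memory strings below are hypothetical stand-ins for the real files, and the rows are made up; only the column names seller_id, item_id, action_type, merchant_id and label come from the analysis in this post (user_id is an assumption).

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two CSV files.
# With real files, pass the file path to pd.read_csv instead.
behavior_csv = io.StringIO(
    "user_id,item_id,seller_id,action_type\n"
    "1,10,100,click\n"
    "1,10,100,order\n"
    "2,11,101,cart\n"
)
vip_csv = io.StringIO(
    "user_id,merchant_id,label\n"
    "1,100,1\n"
    "2,101,0\n"
)

user_behavior_data = pd.read_csv(behavior_csv)
vip_user_data = pd.read_csv(vip_csv)

print(user_behavior_data.shape, vip_user_data.shape)
```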

2. Data analysis

When we use


After confirming that the data contains no missing values, we look at the type of each column and pick out the data we need.
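One possible way to run these two checks is sketched below. The toy frame and its values are made up for illustration; isnull().sum() counts missing values per column and dtypes lists the column types.

```python
import pandas as pd

# Toy frame; the values are invented for illustration only.
df = pd.DataFrame({
    "item_id": [10, 10, 11],
    "seller_id": [100, 100, 101],
    "action_type": ["click", "order", "cart"],
})

missing_per_column = df.isnull().sum()  # number of missing values in each column
column_types = df.dtypes                # dtype of each column

print(missing_per_column)
print(column_types)
```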

Our main task this time is to analyze user behavior, so we first drop the columns we do not need, such as the brand of similar goods and the shopping cart id (the brand column could be kept for a separate brand analysis).
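Dropping the unneeded columns could look like the sketch below. The column names brand_id and cart_id are hypothetical, since the post does not show the actual names.

```python
import pandas as pd

df = pd.DataFrame({
    "item_id": [10, 11],
    "seller_id": [100, 101],
    "brand_id": [7, 8],   # hypothetical column name
    "cart_id": [1, 2],    # hypothetical column name
})

# errors="ignore" skips any listed column that is not present in the frame.
trimmed = df.drop(columns=["brand_id", "cart_id"], errors="ignore")
print(list(trimmed.columns))
```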

3. Extract and analyze useful data

I. In vip_user_data, the merchant_id column is the same thing as seller_id, so we rename it so that the tables can be merged later. We also count the number of VIPs of each merchant here for use in those merges. The code is as follows:

vip_user_data = vip_user_data.rename(columns={"merchant_id":"seller_id"})
judge_seller_data = vip_user_data[["seller_id","label"]].groupby(by = "seller_id").sum()
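To see what this produces, here is the same rename and groupby on a toy table. The rows are made up, and the sketch assumes label is 1 for a VIP and 0 otherwise, so that the sum per seller is a VIP count.

```python
import pandas as pd

vip_user_data = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "merchant_id": [100, 100, 101, 101],
    "label": [1, 1, 0, 1],   # assumed encoding: 1 = VIP, 0 = not VIP
})

vip_user_data = vip_user_data.rename(columns={"merchant_id": "seller_id"})
# Summing the 0/1 label per seller gives the number of VIPs for that seller.
judge_seller_data = vip_user_data[["seller_id", "label"]].groupby("seller_id").sum()
print(judge_seller_data)
```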


II. The action_type column of user_behavior_data has dtype object, which is inconvenient for statistics. We convert it into numeric 0/1 columns for later use, and build item_order_data from them:

user_behavior_data["click"] = user_behavior_data["action_type"].apply(lambda l: 1 if l=="click" else 0  )
user_behavior_data["cart"] = user_behavior_data["action_type"].apply(lambda l:1 if l == "cart" else 0)
user_behavior_data["order"] = user_behavior_data["action_type"].apply(lambda l:1 if l == "order" else 0)
user_behavior_data["fav"] = user_behavior_data["action_type"].apply(lambda l:1 if l == "fav" else 0)
item_order_data =user_behavior_data[["item_id","click","order","cart","fav"]]
item_order_data = item_order_data.groupby("item_id").sum()

The code is shown above.
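The same indicator columns can also be built without apply, by comparing the column to each action name. This is an alternative sketch on made-up rows, not the code used in this analysis.

```python
import pandas as pd

user_behavior_data = pd.DataFrame({
    "item_id": [10, 10, 10, 11, 11],
    "action_type": ["click", "click", "order", "cart", "fav"],
})

actions = ["click", "cart", "order", "fav"]
# The comparison yields a boolean Series; astype(int) turns it into 0/1.
for action in actions:
    user_behavior_data[action] = (user_behavior_data["action_type"] == action).astype(int)

# Summing the 0/1 columns per item counts each kind of action per item.
item_order_data = user_behavior_data[["item_id"] + actions].groupby("item_id").sum()
print(item_order_data)
```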

III. As shown in our mind map, the next step is to analyze the relationship between the number of goods in a store and its order quantity. First we take seller_id and item_id together. The same item appears in many rows, because a single item can be ordered more than once, so we de-duplicate on item_id and then group by seller_id to count the items of each seller.

# Relationship between the number of goods in the store and sales volume
seller_pronum_data = user_behavior_data[["seller_id","item_id"]]
# Remove duplicates so that each item is counted once
seller_pronum_data = seller_pronum_data.drop_duplicates(["item_id"]).copy()
# Set item_id to 1 so that summing per seller counts the items
seller_pronum_data["item_id"] = 1
seller_pronum_data = seller_pronum_data.groupby("seller_id").sum().rename(columns={"item_id":"item_num"})
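A shorter way to get the same kind of count is nunique, sketched here on made-up rows. It counts distinct items per seller, which matches the approach above as long as each item_id belongs to a single seller (an assumption about the data).

```python
import pandas as pd

user_behavior_data = pd.DataFrame({
    "seller_id": [100, 100, 100, 101],
    "item_id": [10, 10, 11, 12],
})

# Count distinct item_id values per seller and name the result column item_num.
seller_pronum_data = (
    user_behavior_data.groupby("seller_id")["item_id"]
    .nunique()
    .to_frame("item_num")
)
print(seller_pronum_data)
```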

IV. Next, we merge the tables we have built and analyzed

1. Create item_data, containing seller_id and item_id, and merge it with item_order_data

2. Merge the resulting item_data with judge_seller_data

3. Merge in seller_pronum_data, the number of goods in each store

We found that label, that is, the number of VIPs, is NaN for sellers that have no value. We convert these to 0.0 for our subsequent calculation


item_data = user_behavior_data[["item_id","seller_id"]]
# item_id is unique in item_order_data, so de-duplicate item_data on item_id before merging
item_data = item_data.drop_duplicates("item_id")
item_data = item_order_data.reset_index().merge(item_data,on="item_id",how="left")

# Merge item_data with judge_seller_data (step 2)
item_data = item_data.merge(judge_seller_data,how="left",on="seller_id")
# Merge in seller_pronum_data (step 3), which adds the item_num column
item_data = item_data.merge(seller_pronum_data,how="left",on="seller_id")

# Sellers without a VIP count are NaN after the left merge; treat them as 0
item_data.label = item_data.label.fillna(0.0)
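The left merge and the NaN handling can be checked on a toy example. The rows below are made up; validate="many_to_one" is an optional safeguard added in this sketch, which makes pandas raise an error if seller_id is duplicated in the right-hand table.

```python
import pandas as pd

item_data = pd.DataFrame({"item_id": [10, 11, 12], "seller_id": [100, 100, 101]})
judge_seller_data = pd.DataFrame({"seller_id": [100], "label": [2]})

# Left merge keeps every item; seller 101 has no VIP row, so its label becomes NaN.
merged = item_data.merge(
    judge_seller_data, how="left", on="seller_id", validate="many_to_one"
)
merged["label"] = merged["label"].fillna(0.0)
print(merged)
```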

V. Model building

From the table above, the dependent variable is order, and there are five independent variables: click, cart, fav, label and item_num. We use these five variables in a regression model to predict order. To do that, we separate the independent variables from the dependent variable and split the data into a training set and a test set, so that the fitted model can be checked against data it has not seen.

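from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Independent variables: everything except the identifiers and the target
x = item_data.drop(columns=["item_id","seller_id","order"])
# Dependent variable
y = item_data["order"]

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.2,random_state=42)
model = LinearRegression()
res = model.fit(x_train,y_train)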
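The held-out test set can be used to check how well the model fits. The sketch below is not part of the original experiment: it uses synthetic data as a stand-in for the five features, with a made-up linear target, and scores the fit with r2_score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the five features (click, cart, fav, label, item_num).
x = rng.uniform(0, 100, size=(200, 5))
# Made-up target: a known linear combination of the features plus small noise.
true_coef = np.array([0.1, 0.5, 0.2, 0.3, 0.05])
y = x @ true_coef + rng.normal(0, 1.0, size=200)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(x_train, y_train)

# R^2 on data the model has not seen during fitting.
r2 = r2_score(y_test, model.predict(x_test))
print(f"R^2 on held-out data: {r2:.3f}")
```

On the real item_data, the same two last lines would apply to the x_test and y_test produced by the split above.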

Suppose we want to predict the result for one set of feature values (click, cart, fav, label, item_num):

pred = res.predict([[100,2,6,10,30]])

We obtain a prediction of 19 orders for this input, which completes the e-commerce prediction experiment.
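As a side note on the prediction call: when a model is fitted on a DataFrame, scikit-learn remembers the column names and warns if predict later receives a plain list. Passing a one-row DataFrame with the same columns avoids the warning and makes the feature order explicit. The sketch below uses made-up training data with a noise-free linear target, so its output is unrelated to the 19 orders above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["click", "cart", "fav", "label", "item_num"]
rng = np.random.default_rng(1)
# Made-up training features, for illustration only.
x = pd.DataFrame(rng.integers(0, 50, size=(50, 5)), columns=features)
# Made-up, noise-free target that depends on click, cart and fav.
y = 0.1 * x["click"] + 0.5 * x["cart"] + 0.2 * x["fav"]

res = LinearRegression().fit(x, y)

# One new sample, with the same column names and order as the training data.
new_item = pd.DataFrame([[100, 2, 6, 10, 30]], columns=features)
pred = res.predict(new_item)
print(pred)
```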

Tags: Python Machine Learning sklearn

Posted on Fri, 26 Nov 2021 16:41:51 -0500 by robsgaming