Analysis of airline customer value with K-means algorithm

With the advent of the information age, the focus of enterprise marketing has shifted from products to customers, and Customer Relationship Management (CRM) has become a core concern of enterprises. The key problem in CRM is customer clustering: separating low-value customers from high-value customers. Enterprises can then design personalized service plans for customers of different value, adopt different marketing strategies, and concentrate limited marketing resources on high-value customers in order to maximize profit. Accurate customer clustering results are therefore an important basis for optimizing the distribution of marketing resources.

Project objectives:

This article uses airline customer data, combined with the LRFMC model, and applies the K-Means clustering algorithm to cluster customers, compare the customer value of the different groups, and formulate corresponding marketing strategies. Project learning objectives: (1) Become familiar with the steps and process of airline customer value analysis. (2) Understand the basic principles of the RFM model. (3) Master the basic principle and application of the K-Means algorithm. (4) Compare the customer value of different types of customers and formulate corresponding marketing strategies.

Project task 1: Understand the current situation of the airline and customer value analysis

Task description: Facing fierce market competition, airlines have launched more and more discounts to attract customers. A domestic airline is facing a business crisis: loss of frequent flyers, declining competitiveness and underused resources. The goal is to build a reasonable customer value evaluation model, classify customers, analyze and compare the customer value of the different groups, and formulate marketing strategies that provide personalized services to each group.

Task analysis: (1) Understand the current situation of the airline. (2) Understand customer value analysis. (3) Become familiar with the steps and process of airline customer value analysis.

Understand the current situation of the airline: The airline has accumulated a large amount of member profile information and flight records. Taking March 31, 2014 as the end time, a two-year period is selected as the observation window, and the detailed data of all customers with flight records in that window are extracted as historical data, 62988 records in total. The data include membership card number, membership time, gender, age, membership card level, city of work, province of work, country of work, end time of the observation window, total accumulated points, total flight kilometers in the observation window, number of flights in the observation window, average flight time interval and average discount coefficient. Data table structure: shape (62988, 44).

Table properties:

Feature description (with the column name where the code in this post uses it)

Membership card number
Membership time (FFP_DATE)
First flight date
Membership card level
City of work
Province of work
Country of work
Number of flights in the observation window (FLIGHT_COUNT)
End time of the observation window (LOAD_TIME)
Time from the last flight to the end of the observation window (LAST_TO_END)
Average discount rate (avg_discount)
Fare income in the observation window
Total flight kilometers in the observation window (SEG_KM_SUM)
Last flight date
Average flight time interval
Maximum flight interval
Number of points redemptions
Total elite points
Promotion points
Partner points
Total cumulative points
Number of non-flight point changes
Total basic points

Combined with the airline's current data, the following objectives can be achieved: (1) Group customers using the airline customer data. (2) Analyze the characteristics of the different customer groups and compare their customer value. (3) Provide personalized services for customer groups of different value and formulate corresponding marketing strategies. Note: the data is in air_data.csv (the git link is given later in this post).

Understanding customer value analysis: The global economic and market environment has quietly changed, and businesses have gradually shifted from being product oriented to being customer-demand oriented. A "customer-centric" business model is taking shape and receiving unprecedented attention. However, maintaining customer relationships has a cost, and only some of an enterprise's customers bring it profit. Enterprise resources are limited; ignoring high-potential customers and giving every customer the same service prevents those resources from producing the greatest profit. Every enterprise must earn profit to survive and develop, so enterprises cannot and should not maintain the same relationship with all customers. Jay and Adam Curry, advocates of customer marketing, distilled the following rules of thumb from implementing customer marketing at hundreds of foreign companies:
(1) 80% of the company's revenue comes from the top 20% of customers.
(2) 20% of customers have a profit margin of 100%.
(3) More than 90% of revenue comes from existing customers.
(4) Most of the marketing budget is often spent on people who are not existing customers.
(5) 5% to 30% of customers have the potential to move up in the customer pyramid.
(6) Moving 2% of customers up in the customer pyramid means a 10% increase in sales revenue and a 50% increase in profit.

These rules of thumb may not be exact, but they reveal the trend toward customer differentiation and illustrate how urgent and necessary customer value analysis is. Analyzing customer profitability shows that the profit structure has changed significantly and that only a specific subset of customers brings profit to the enterprise. To achieve long-term development, enterprises must identify and manage these customers effectively; treating every customer the enterprise does business with in the same way will not succeed. Although many managers know that customer value analysis is important, few know how to carry it out. How to consider the factors of customer value from multiple angles and analyze customer value effectively is a question every enterprise has to take seriously. Only by selecting valuable customers and focusing on them can an enterprise strengthen its competitiveness and grow. In the field of customer value analysis, the most influential and empirically tested theories and models include customer lifetime value theory, the customer value pyramid model, the strategy evaluation matrix method and the RFM customer value analysis model. This project uses an improved RFM customer value model for the analysis.

Be familiar with the steps and process of airline customer value analysis: The overall process of the airline customer value analysis project is shown in Figure 7-1. It consists of the following four steps. (1) Extract the airline's data from April 1, 2012 to March 31, 2014. (2) Clean the extracted data, construct features and standardize them. (3) Cluster customers with the K-Means algorithm, based on the RFM model. (4) Use the model results to apply different marketing methods to customers of different value and provide customized services.

Project task 2: Preprocess the airline customer data

Task description: The original airline customer data contain a small number of missing and abnormal values and can be used for analysis only after cleaning. Because the original data have too many features to be used directly for customer value analysis, the features must be screened and the key features that measure customer value selected.

Task analysis: Preprocessing the airline customer data can be divided into three steps: (1) Handle missing and abnormal values. (2) Screen features using the RFM model. (3) Standardize the screened data.

1. Handle missing and abnormal values: Inspecting the data shows that the original data contain records whose ticket price is empty, and records with a ticket price of 0 or a discount rate of 0 even though the total flight kilometers are greater than 0. An empty ticket price may mean that the customer has no boarding record; the other records may come from customers flying on 0%-discount tickets or on tickets redeemed with points. Because the original data set is large and such records are a small proportion with little impact on the problem, they are discarded. The specific rules are: (1) Discard records with an empty ticket price. (2) Discard records in which the fare is 0, the average discount rate is not 0, and the total flight kilometers are greater than 0.

import pandas as pd                 # data processing library; install with: pip install pandas
import numpy as np                  # scientific computing library; install with: pip install numpy
from sklearn.cluster import KMeans  # K-Means clustering; install with: pip install scikit-learn
import matplotlib.pyplot as plt     # plotting library; install with: pip install matplotlib

'''Handling missing values and outliers'''
# "ansi" is a Windows-only codec alias; on other systems pass the file's actual encoding.
data = pd.read_csv("air_data.csv", encoding="ansi")             # read the csv file with pandas
print(data.shape)                                               # (rows, columns) of the raw data

## Requirement 1: discard records with an empty ticket price,
## i.e. keep only rows where both yearly fares SUM_YR_1 and SUM_YR_2 are present.
data = data[data["SUM_YR_1"].notnull() & data["SUM_YR_2"].notnull()]
# print(data.shape)

## Requirement 2: discard records whose fare is 0 while the average discount rate
## is not 0 and the total flight kilometers are greater than 0.
doc1 = (data["SUM_YR_1"] == 0) & (data["SUM_YR_2"] == 0)         # fare is 0 in both years ('&' is and, '|' is or, '~' is not)
doc2 = (data["SEG_KM_SUM"] > 0) & (data["avg_discount"] != 0)    # flight kilometers > 0 and average discount != 0
data = data[~(doc1 & doc2)].copy()                               # keep everything else; copy so later column assignments are safe
# print(data.shape)
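
To see which rows the two cleaning rules keep and which they drop, here is a minimal sketch on a made-up five-row table. The values are invented for illustration; only the column names come from the data set.

```python
import numpy as np
import pandas as pd

# Toy records; values are invented, column names follow air_data.csv.
toy = pd.DataFrame({
    "SUM_YR_1":     [100.0, np.nan, 0.0, 0.0, 0.0],
    "SUM_YR_2":     [200.0, 50.0,   0.0, 0.0, 30.0],
    "SEG_KM_SUM":   [5000,  1200,   800, 0,   900],
    "avg_discount": [0.8,   0.6,    0.5, 0.0, 0.7],
})

# Rule 1: drop rows where either yearly fare is missing.
has_fare = toy["SUM_YR_1"].notnull() & toy["SUM_YR_2"].notnull()

# Rule 2: drop rows with zero fare in both years, a non-zero discount and km > 0.
zero_fare = (toy["SUM_YR_1"] == 0) & (toy["SUM_YR_2"] == 0)
suspicious = zero_fare & (toy["avg_discount"] != 0) & (toy["SEG_KM_SUM"] > 0)

cleaned = toy[has_fare & ~suspicious]
print(cleaned.index.tolist())   # rows 1 (missing fare) and 2 (zero fare but flown) are dropped
```

Row 3 is kept because a zero fare with zero kilometers and zero discount is not contradictory, and row 4 is kept because one of its two yearly fares is non-zero.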

2. Build the key features for airline customer value analysis: Construct the LRFMC model. In this article, the consumption amount (M in the RFM model) is replaced by two features: the customer's accumulated flight mileage M over a period of time, and the average value C of the discount coefficient corresponding to the customer's cabin class over that period. In addition, how long a customer has been a member affects customer value to some extent, so the customer relationship length L is added to the model as another distinguishing feature. The five features, customer relationship length L, consumption time interval R, consumption frequency F, flight mileage M and average discount coefficient C, are used as the airline's features for identifying customer value and are referred to as the LRFMC model.

According to the airline customer value LRFMC model, the six features related to it, FFP_DATE, LOAD_TIME, FLIGHT_COUNT, avg_discount, SEG_KM_SUM and LAST_TO_END, are selected. Irrelevant, weakly related or redundant features, such as membership card number, gender, city of work, country of work and age, are deleted.

Since the five features of the LRFMC model are not given directly in the original data, they must be derived from it.

(1) L: the number of months from the membership time to the end of the observation window.
    L = LOAD_TIME - FFP_DATE (unit: month)    (7-1)

(2) R: the number of months from the customer's last flight with the company to the end of the observation window.
    R = LAST_TO_END (unit: month)    (7-2)

(3) F: the number of flights the customer took with the company in the observation window.
    F = FLIGHT_COUNT (unit: times)

(4) M: the customer's flight mileage in the observation window.
    M = SEG_KM_SUM (unit: km)

(5) C: the average value of the discount coefficient corresponding to the customer's cabin class in the observation window.
    C = avg_discount (unit: none)


'''Build the key features for airline customer value analysis'''   # the LRFMC model needs the five attributes defined above

## 1. L = end time of the observation window (LOAD_TIME) - membership time (FFP_DATE), in months
data["LOAD_TIME"] = pd.to_datetime(data["LOAD_TIME"])               # parse the date strings as datetimes
data["FFP_DATE"] = pd.to_datetime(data["FFP_DATE"])
data["Membership time"] = data["LOAD_TIME"] - data["FFP_DATE"]      # membership length as a timedelta
data["Membership month"] = data["Membership time"].dt.days // 30    # whole months, counting 30 days per month
# print(data["Membership month"])

## 2. R = time from the last flight to the end of the observation window, converted to months by dividing by 30
data["LAST_TO_END"] = data["LAST_TO_END"] / 30

## F = FLIGHT_COUNT (number of flights), M = SEG_KM_SUM (total flight kilometers), C = avg_discount (average discount rate)
my_data = data[["Membership month", "LAST_TO_END", "FLIGHT_COUNT", "SEG_KM_SUM", "avg_discount"]]

Standardize the five features:

After the five features are constructed, the distribution of each feature is analyzed; the value ranges are shown in the table. The ranges of the five features differ greatly, so the data must be standardized to remove the effect of the different orders of magnitude. (You can also try standard-deviation standardization.)

# Data standardization: decimal scaling
def decimal_clean(arr):
    """Scale arr by moving its decimal point.

    :param arr: input sequence to be scaled
    :return: arr / 10**k with k = ceil(log10(max |arr|)), so values fall within [-1, 1]
    """
    # np.abs: absolute value, np.max: maximum, np.log10: base-10 logarithm, np.ceil: round up
    k = np.ceil(np.log10(np.max(np.abs(arr))))
    return arr / 10 ** k

# Apply the scaling to each of the five columns; the result keeps the column names.
xyzqw = my_data.apply(decimal_clean)
# print(xyzqw)
# print(type(xyzqw))
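
The standard-deviation (z-score) standardization suggested above could be sketched as follows. It is an alternative to the decimal scaling used in this post, not what the results here are based on; the helper name zscore_clean and the toy values are made up for illustration.

```python
import pandas as pd

def zscore_clean(df):
    """Column-wise z-score: subtract the mean, divide by the standard deviation."""
    # ddof=0 uses the population standard deviation (an assumption; ddof=1 also works).
    return (df - df.mean()) / df.std(ddof=0)

# Toy LRFMC-like values, invented for illustration only.
toy = pd.DataFrame({
    "Membership month": [10, 40, 70],
    "SEG_KM_SUM": [1000, 20000, 60000],
    "avg_discount": [0.5, 0.7, 0.9],
})
scaled = zscore_clean(toy)
print(scaled.round(3))
```

After this transformation every column has mean 0 and standard deviation 1, so no single feature dominates the Euclidean distances that K-Means relies on.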

'''K-Means clustering'''
x = xyzqw[['Membership month','LAST_TO_END','FLIGHT_COUNT','SEG_KM_SUM','avg_discount']]
kms = KMeans(n_clusters=5)                            # K-Means model with five clusters
y = kms.fit_predict(x)                                # fit the cluster centers and return the cluster index of each sample
# print(y)                                            # y is a numpy array
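
For readers who want to see the basic principle behind KMeans, the following is a minimal NumPy sketch of the usual assign-then-update iteration (Lloyd's algorithm). It is a teaching sketch on made-up 2-D points, not scikit-learn's actual implementation; the function name simple_kmeans, the random choice of k starting points and the fixed iteration count are assumptions.

```python
import numpy as np

def simple_kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd iteration: assign each point to its nearest center, then re-average."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct points drawn at random (fancy indexing copies).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):            # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels, centers

# Two well-separated toy blobs (invented data).
pts = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
labels, centers = simple_kmeans(pts, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; only which points share a label is meaningful, which is also why cluster indices from KMeans can differ between runs.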

Analysis of project results:

The first plot: five separate radar charts, one per cluster:

'''Radar charts: one polar subplot per cluster'''
def draw():
    plt.rcParams['font.sans-serif'] = ['KaiTi']          # font for Chinese labels; not needed for the English labels used here
    features = ['Membership month','LAST_TO_END','FLIGHT_COUNT','SEG_KM_SUM','avg_discount']
    markers = [(5,1), 'x', 'o', 'o', 'o']                # one marker style per cluster
    theta = np.linspace(0,2*np.pi,5,endpoint=False)      # five evenly spaced angles in [0, 2π)
    theta = np.concatenate((theta,[theta[0]]))           # close: repeat the first angle at the end
    for k in range(5):
        tu = plt.subplot(3,2,k+1,polar=True)             # subplots 321 to 325
        values = x.loc[y==k, features].to_numpy().T      # shape (5 features, customers in cluster k)
        values = np.concatenate((values, values[:1]))    # close: repeat the first feature row at the end
        tu.plot(theta,values,marker=markers[k])          # one line per customer in the cluster
        tu.set_thetagrids(np.degrees(theta[:-1]), features)   # feature names on the axes
    plt.show()

draw()



The second plot: a 5-in-1 radar chart with the cluster centers as coordinates:

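plt.rcParams['font.sans-serif'] = ['KaiTi']          # font for Chinese labels; not needed for the English labels used here
tu  = plt.subplot(111,polar=True)
datalenth = 5                                        # number of features, one axis each
angle = np.linspace(0,2*np.pi,datalenth,endpoint=False)
angle = np.concatenate((angle,[angle[0]]))           # close: repeat the first angle at the end
data = kms.cluster_centers_.T                        # cluster centers, transposed to shape (features, clusters)
data = np.concatenate((data,data[:1]))               # close: repeat the first feature row at the end
# labels = ['1','2','3','4','5']
plt.title('Cluster center')
labels1 = np.array(['Membership month','LAST_TO_END','FLIGHT_COUNT','SEG_KM_SUM','avg_discount'])    # axis labels
tu.set_thetagrids(np.degrees(angle[:-1]), labels1)   # feature names on the axes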

It is better to pass one more attribute, linestyle, to plt.polar. Because the cluster centers are used as the coordinates, the appearance of the graph changes: plt.polar(angle,data,marker=(5,1),linestyle=':')
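
A table of per-cluster sizes and mean feature values is a handy companion to the radar chart, since at convergence each K-Means center is the mean of the samples assigned to it. Below is a minimal sketch with made-up values; x_toy and y_toy are stand-ins for the standardized features x and the labels y.

```python
import numpy as np
import pandas as pd

# Stand-ins for the standardized features `x` and the labels `y` (invented values).
x_toy = pd.DataFrame({
    "FLIGHT_COUNT": [0.10, 0.12, 0.80, 0.90, 0.85],
    "avg_discount": [0.50, 0.60, 0.90, 0.80, 0.70],
})
y_toy = np.array([0, 0, 1, 1, 1])

summary = x_toy.groupby(y_toy).mean()          # one row per cluster, comparable to cluster_centers_
summary["size"] = np.bincount(y_toy)           # number of customers in each cluster
print(summary)
```

Reading the rows side by side makes it easier to decide which cluster is high on F and C and low on R than reading the chart alone.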

This case defines five customer categories: important customers to maintain, important customers to develop, important customers to win back, general customers and low-value customers.

(1) Important customers to maintain. These customers have a high average discount coefficient (C) (they generally fly in higher cabin classes), a short time since their last flight (R is low), and a high number of flights (F) or total mileage (M). They are the airline's high-value customers and the most ideal customer type. They contribute the most to the airline, but they make up a small proportion of customers. The airline should give them priority in resources, apply differentiated management and one-to-one marketing, improve their loyalty and satisfaction, and prolong their high level of consumption for as long as possible.

(2) Important customers to develop. These customers have a high average discount coefficient (C) and a short time since their last flight (R is low), but a low number of flights (F) or total mileage (M). Their membership time (L) is short, and they are the airline's potential-value customers. Although their current value is not very high, they have great development potential. The airline should encourage them to increase their consumption with the company and its partners, that is, to increase its share of the customer's wallet. By raising customer value, the airline strengthens these customers' satisfaction, raises their cost of switching to competitors, and gradually turns them into loyal customers.

(3) Important customers to win back. These customers had a high average discount coefficient (C), number of flights (F) or total mileage (M) in the past, but they have not taken the company's flights for a long time (R is high) or they fly less often. Changes in their value are highly uncertain. Because the reasons for their decline differ, it is particularly important to keep up-to-date information on them and to keep interacting with them. The airline should infer changes in their consumption from their most recent flight time and number of flights, list these customers, contact them as a priority, and take marketing measures to prolong the customer life cycle.

(4) General customers and low-value customers. These customers have a very low average discount coefficient (C), have not taken the company's flights for a long time (R is high), have a low number of flights (F) or total mileage (M), and have a short membership time (L). They are the airline's general and low-value customers, and they may fly with the company only when tickets are discounted or promoted.

The three types of important customers (to develop, to maintain and to win back) correspond to three stages of customer life cycle management: the development period, the stable period and the recession period. According to the characteristics of each customer type, the customer value of the groups is ranked, and the results are shown in Table 7-10. Different products and services are provided to the different customer groups in order to raise the value of important customers to develop, stabilize and prolong the high-level consumption of important customers to maintain, and prevent the loss of important customers to win back while actively restoring the relationship.

This model is built from historical data, and the observation window of the analyzed data moves as time passes. Therefore, taking into account new customer records and the actual business situation, it is suggested that the model be run once a month: new customers are assigned through the cluster centers, and the characteristics of these new customers are analyzed. If the incremental data differ greatly from the model's judgments, the business department needs to pay special attention, investigate the reasons for the change and confirm the stability of the model. If the model's stability has changed substantially, the model needs to be retrained. There is currently no unified standard for how often to retrain; it is mostly determined by experience, and as a rule of thumb retraining every six months is suggested.
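
Judging a new customer through the cluster centers amounts to finding the nearest center in the standardized feature space, which is also what kms.predict does for a fitted scikit-learn KMeans model. Below is a minimal NumPy sketch; the centers and customers are invented, and nearest_center is a hypothetical helper name.

```python
import numpy as np

def nearest_center(new_points, centers):
    """Return, for each row of new_points, the index of the closest center (Euclidean)."""
    diff = new_points[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

# Invented two-feature centers and three new customers, already standardized.
centers = np.array([[0.1, 0.2], [0.8, 0.9]])
new_customers = np.array([[0.15, 0.25], [0.7, 0.95], [0.0, 0.0]])
print(nearest_center(new_customers, centers))
```

New records have to be scaled with the same per-column factors as the training data before this comparison; otherwise the distances to the stored centers are not meaningful.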

Model application

According to the feature analysis of each customer group, the following marketing methods and strategies can be adopted, as a reference for the airline's management of its valuable customer groups.

1. Member upgrade and tier retention

Airline members can be divided into platinum, gold, silver and ordinary card members; members other than ordinary card members can collectively be called the airline's elite members. Although each airline has its own features and rules, membership systems are managed in similar ways. To become an elite member, a customer generally has to accumulate a certain flight mileage or number of flight segments within a certain period (such as one year). Having met this requirement, the customer is an elite member for the validity period (usually two years) and enjoys the corresponding higher level of service. At the end of the validity period, the airline evaluates whether the customer qualifies to remain an elite member, and upgrades or downgrades the customer accordingly.

However, many customers are unaware of, or do not understand, the timing and requirements for upgrading or keeping their tier (the relevant documents are often complex and hard to understand). After the evaluation period they often find that they were only slightly short of upgrading or keeping their tier but missed the chance, so the mileage they had accumulated was wasted. This can also make customers dissatisfied enough to give up flying with the company altogether. Therefore, before the evaluation time point, the airline can remind high-spending customers who are close to but have not yet met the requirement, or even run promotions, to encourage them to reach the standard through consumption. In this way the airline gains revenue, improves customer satisfaction and increases its number of elite members.

2. First redemption

The most attractive part of an airline frequent flyer program is that customers can redeem the mileage they accumulate for free tickets or free upgrades. Each airline has a first-redemption standard: once the customer's accumulated mileage or flight segments reach a certain level, the first redemption becomes possible. This standard is higher than the normal mileage redemption standard. However, at many companies accumulated mileage is reduced over time; for example, some companies halve the mileage accumulated during the year at year end. As a result, many members who are unaware of the rule lose hard-earned mileage and may never manage a first redemption, which again causes dissatisfaction or customer loss. A possible measure is to extract from the database the members who are close to but have not reached the first-redemption standard and to remind them or offer promotions so that they reach the standard through consumption. Once the first redemption has been made, redeeming again with this company is much easier for the customer than with other companies, which in effect raises the switching cost. In addition, reminding customers before special time points (such as the date on which mileage is halved) can increase customer satisfaction.

3. Cross-selling

Through cooperation with non-aviation enterprises, such as issuing co-branded cards, customers can earn the company's points while consuming at other enterprises, which strengthens their contact with the company and improves their loyalty. For example, the airline can check how important customers accumulate mileage at non-aviation partners, identify their habitual ways of accumulating mileage (whether they often consume at partners, and which types of partner products they prefer), and promote to them accordingly.

The customer identification and development periods lay the foundation of the customer relationship, but the relationship formed in these two periods is short-lived and unstable. To earn long-term profit, an enterprise must have stable, high-quality customers. Keeping customers matters not only because winning a new customer costs far more than keeping an existing one, but also because losing customers directly reduces the company's revenue. During the stable period, therefore, the airline should work to keep the customer relationship at a high level, maximize the value of the interaction between company and customer over the life cycle, and prolong this high level as long as possible. For customers at this stage, satisfaction should be improved mainly by providing high-quality service products and raising the service level. The list of important customers can be obtained by segmenting customers through analysis of the frequent flyer database. These customers have a high average discount coefficient (C), a short time since their last flight (R is low), and a high number of flights (F) or total mileage (M). They are the airline's most valuable customers and the most ideal customer type: they contribute the most, but their proportion is relatively small. The airline should give them priority in resources, apply differentiated management and one-to-one marketing, improve their loyalty and satisfaction, and prolong their high level of consumption as long as possible.
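
The first-redemption reminder described above, extracting members who are close to but below the standard, could look roughly like this in pandas. Everything specific here is an assumption made for illustration: the column names member_no and accumulated_km, the 10000 km standard and the 15% margin do not come from the airline's data.

```python
import pandas as pd

# Hypothetical member table; column names and values are invented.
members = pd.DataFrame({
    "member_no": [1, 2, 3, 4],
    "accumulated_km": [9500, 4000, 10200, 8800],
})
FIRST_REDEMPTION_KM = 10000     # assumed first-redemption standard
MARGIN = 0.15                   # assumed: "close" means within 15% below the standard

# Members below the standard but within the margin are the ones to remind.
close = members[
    (members["accumulated_km"] < FIRST_REDEMPTION_KM)
    & (members["accumulated_km"] >= FIRST_REDEMPTION_KM * (1 - MARGIN))
]
print(close["member_no"].tolist())
```

The same filter, with a tier threshold in place of the redemption standard, would cover the reminder before a member upgrade or tier evaluation.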

github source code + data address:

Posted on Mon, 22 Nov 2021 08:55:52 -0500 by anthonyaykut