Visual data analysis of MySQL kettle superset E-commerce

1. Project overview

demand

Analyze the data of users, commodities and orders in e-commerce business, and observe the operation

framework

Business database: Mysql: store the most original data

ETL: Kettle

Data warehouse: Mysql: store data to be analyzed

Analysis processing: SQL/Kettle

Visualization: Superset

 

2. Preparations

system

linux system

Software

VMware virtual machine -- installing linux operating system

1 download address for windows:
2 https://www.vmware.com/

Final shell -- remote operating system

1 download address for windows:
2 http://www.hostbuf.com/downloads/finalshell_install.exe
 3 installation and tutorial for Mac and Linux:
4 http://www.hostbuf.com/t/1059.html

mysql - database (installation and compression)

 
 
1 download address for windows:
2 https://www.mysql.com//downloads/

DataGrid -- database management tool

Link: https://pan.baidu.com/s/1k1pix9uziaakoaifghmlnw
Extraction code: lhr4

Navicat -- database management tool

Link: https://pan.baidu.com/s/1eaw3cmhen x5sjvgs7enw 
Extraction code: fqov

kettle - please take care of yourself if you have any installation problems

1. Download and installation of kettle (this article uses the version of kettle as pdi-ce-7.1.0.0-12) click the download address Official website

Visualization tools

superset - please ask Du Niang if you have any questions

linux Environment installation dependency
yum upgrade python-setuptools
yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel
install superset
supersetcd /root/anaconda3/
pip install email_validator -i https://pypi.douban.com/simple
pip install superset==0.30.0 -i https://pypi.douban.com/simple





 

3. Data environment

1. Import business data

Download and run this sql code to generate database and table

Link: https://pan.baidu.com/s/1uVYISah6hYkBqiyhIk407w 
Extraction code: sfdm 

2. Building data warehouse

Extract business data into data analysis database through kettle

Link: https://pan.baidu.com/s/1shH0zexh3WraQnMt17n-SA 
Extraction code: ao7n

Generate table - kettle operation strategy

mysql> use itcast_shop_bi;

Database changed
mysql
> show tables; +--------------------------+ | Tables_in_itcast_shop_bi | +--------------------------+ | ods_itcast_good_cats |Commodity classification table
| ods_itcast_goods |Commodity list
| ods_itcast_order_goods |Order and details
| ods_itcast_orders |Order form
| ods_itcast_users |User table
| ods_itcast_area      |Administrative area table
+--------------------------+

3. Implementation of automated build extraction

1. Automatic extraction of region table and commodity classification table

 

 

2. Commodity table, order table, order details table, user table

3. Set timed auto run

 

 

 

4. Data analysis

Demand 1

Demand: count the total amount of 2019-09-05 order payment and the total number of orders

Evolution: count the total amount of daily order payment and the total number of orders

Indicator: total amount, total number of orders

Dimension: day

-- Create result table
use
itcast_shop_bi; create table app_order_total( id int primary key auto_increment, dt date, total_money double, total_cnt int );
-- Save the results of the analysis to the results table
insert
into app_order_total select null, substring(createTime,1,10) as dt,-- 2019-09-05 Date of the day round(sum(realTotalMoney),2) as total_money, -- Total amount of all orders on the day after grouping count(orderId) as total_cnt -- Total number of orders on the day after grouping from ods_itcast_orders where substring(createTime,1,10) = '2019-09-05' group by substring(createTime,1,10);
-- Table structure and content
mysql> desc app_order_user; +----------------+------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------------+------+------+-----+---------+----------------+ | id | int | NO | PRI | NULL | auto_increment | | dt | date | YES | | NULL | | | total_user_cnt | int | YES | | NULL | | +----------------+------+------+-----+---------+----------------+ 3 rows in set (0.00 sec) mysql> select * from app_order_user; +----+------------+----------------+ | id | dt | total_user_cnt | +----+------------+----------------+ | 1 | 2019-09-05 | 11 | | 2 | 2019-09-05 | 11 | +----+------------+----------------+ 2 rows in set (0.01 sec)

Demand 2

Demand: count the total number of users placing orders on September 5, 2019

Evolution: count the number of user IDs of all orders on 2019-09-05 in the order table

-- Create result table
use itcast_shop_bi;
create table app_order_user(
    id int primary key auto_increment,
    dt date,
    total_user_cnt int
);
-- Insert result data
insert into app_order_user
select
  null,
  substring(createTime,1,10) as dt,-- 2019-09-05 Date of the day
  count(distinct userId) as total_user_cnt
from
  ods_itcast_orders
where
  substring(createTime,1,10) = '2019-09-05'
group by
  substring(createTime,1,10);

Demand 3

Demand; analysis of total order amount / number of orders in different payment methods every day

Indicator: total order amount, total number of orders

Dimension: time dimension [day], payment method dimension

-- Create result table
create
table app_order_paytype( id int primary key auto_increment, dt date, pay_type varchar(20), total_money double, total_cnt int );
-- Insert result data
insert into app_order_paytype
select
  null,
  substring(createTime,1,10) as dt,-- Get the date of each day
  case payType when 1 then 'Alipay' when 2 then 'WeChat' when 3 then 'cash' else 'other' end as pay_type,
  round(sum(realTotalMoney),2) as total_money, -- Total amount of all orders on the day after grouping
  count(orderId) as total_cnt -- Total number of orders on the day after grouping
from
  ods_itcast_orders
group by
  substring(createTime,1,10),payType;

Demand 4

Demand: Top 5 users with the most orders in September 2019, i.e. top 5 users

Mode 1: the above is a simple case, only the top 5 people with the largest number of orders

select
    date_format(dt,'%Y-%m') as dt,
    userId,
    userName,
    count(orderId) as total_cnt
from
    ods_itcast_orders
where
    date_format(dt,'%Y-%m') = '2019-09'
group by
    date_format(dt,'%Y-%m'),userId,userName
order by
    total_cnt desc
limit 5;

Mode 2: we want to get the top 5 with the largest number of orders. If the number is the same, the ranking is the same

select
       *
from (
          select *,
                 dense_rank() over (partition by dt order by total_cnt desc) as rn
          from (
                   select date_format(dt, '%Y-%m') as dt,
                          userId,
                          userName,
                          count(orderId)           as total_cnt
                   from ods_itcast_orders
                   where date_format(dt, '%Y-%m') = '2019-09'
                   group by date_format(dt, '%Y-%m'), userId, userName
               ) tmp1
) tmp2 where rn < 6;

Demand 5

Demand: Statistics of total order amount and total number of orders in different classifications [similar to statistics of total order amount and total number of orders in different payment types]

-- Create result table
use itcast_shop_bi;
drop table if exists app_order_goods_cat;
create table app_order_goods_cat(
    id int primary key auto_increment,
    dt date,
    cat_name varchar(20),
    total_money double,
    total_num int
);
-- step2: First, the relationship between the three-level classification and the first level classification is constructed
-- use join realization
drop table if exists tmp_goods_cats;
create temporary table tmp_goods_cats as
select
    t3.catId as t3Id,-- Three level classification id
    t3.catName as t3Name, -- Class III classification name
    t2.catId as t2Id,
    t2.catName as t2Name,
    t1.catId as t1Id,
    t1.catName as t1Name
from
    ods_itcast_good_cats t3  join ods_itcast_good_cats t2 on t3.parentId = t2.catId
    join ods_itcast_good_cats t1 on t2.parentId = t1.catId;
    
    
CREATE UNIQUE INDEX idx_goods_cat3 ON tmp_goods_cats(t3Id);
CREATE UNIQUE INDEX idx_itheima_goods ON ods_itcast_goods(goodsId);
CREATE INDEX idx_itheima__order_goods ON ods_itcast_order_goods(goodsId);

-- Insert result data
insert

into app_order_goods_cat select null, substring(c.createtime,1,10) as dt, a.t1Name, sum(c.payPrice) as total_money, count(distinct orderId) as total_num from tmp_goods_cats a left join ods_itcast_goods b on a.t3Id = b.goodsCatId left join ods_itcast_order_goods c on b.goodsId = c.goodsId where substring(c.createtime,1,10) = '2019-09-05' group by substring(c.createtime,1,10),a.t1Name;

5. Build automatic Kettle job to realize automatic analysis

Create a job

Configure SQL script

Defining variables for a job

 

6. Visual construction

Total order sales

Total number of orders

Total order users

Proportion of total order amount of different payment methods

Number of orders with different payment methods

Total order amount of different commodity categories

Total number of orders in different commodity categories

Cloud of words

7. Building Kanban

Tags: Database MySQL Linux Python

Posted on Sun, 17 May 2020 02:56:29 -0400 by kevdotbadger