Decoding the 2021 Hurun Rich List with Python: Ma Yun (Jack Ma) didn't even make the top three

A while ago, the Hurun Research Institute released the 2021 Hurun Rich List, the 23rd consecutive edition since the list began in 1999. The entry threshold has stayed at 2 billion yuan for the ninth year in a row. By analyzing this year's list, we can see who these rich people are and which industries they mainly work in. Come and take a look with me.

1. Data reading and preprocessing

import pandas as pd

# Load the list (the path is specific to the author's environment)
df = pd.read_csv('/home/mw/input/hrbf9490/2021 Hurun Report  - List.csv')
df.replace('New ~','New',inplace=True)
# Drop the 3-character prefix in front of each industry value
df['industry'] = df['industry'].map(lambda x:x[3:])
# Reduce numeric rank changes to Up / Down / Unchanged; 'New' stays as is
df['Ranking change'] = df['Ranking change'].map(lambda x:x if x=='New' else('Up' if int(x)>0 else('Down' if int(x)<0 else 'Unchanged')))
df['wealth'] = df['wealth'].astype('int')
# 'Gender' holds "gender age" for one or two people, separated by a comma
df['Character 1'] = df['Gender'].map(lambda x:x.split(',')[0])
# A second person is assumed present only when the cell is exactly 13 characters long
df['Character 2'] = df['Gender'].map(lambda x:x.split(',')[1] if len(x) == 13 else '')
df.drop('Gender',axis=1,inplace=True)
# Split each person into separate gender and age columns
df['Character 1_Gender'] = df['Character 1'].map(lambda x:x.split()[0])
df['Character 1_Age'] = df['Character 1'].map(lambda x:x.split()[1])
df['Character 2_Gender'] = df['Character 2'].map(lambda x:x.split()[0] if len(x) != 0 else '')
df['Character 2_Age'] = df['Character 2'].map(lambda x:x.split()[1] if len(x) != 0 else '')
df.drop(['Character 1','Character 2'],axis=1,inplace=True)
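
To make the two trickiest preprocessing steps concrete, here is a minimal sketch using plain Python helpers on made-up cell values. The helper names `classify_change` and `split_people` and the sample strings are assumptions for illustration; the real CSV may format these fields differently. Note that the sketch splits on the comma instead of checking for a fixed string length, which is a swapped-in, more tolerant alternative to the length check above.

```python
def classify_change(raw):
    """Map a raw rank-change cell to 'New', 'Up', 'Down' or 'Unchanged'."""
    if raw == 'New':
        return 'New'
    n = int(raw)
    if n > 0:
        return 'Up'
    if n < 0:
        return 'Down'
    return 'Unchanged'


def split_people(cell):
    """Split 'gender age[,gender age]' into [gender1, age1, gender2, age2].

    Assumes each person is written as exactly two whitespace-separated tokens.
    A missing second person is returned as two empty strings.
    """
    people = [p.split() for p in cell.split(',') if p.strip()]
    while len(people) < 2:
        people.append(['', ''])
    return people[0] + people[1]


# Hypothetical cell values, not taken from the real file
print(classify_change('New'))             # New
print(classify_change('-3'))              # Down
print(split_people('Male 58'))            # ['Male', '58', '', '']
print(split_people('Male 60,Female 55'))  # ['Male', '60', 'Female', '55']
```

Both helpers can be passed to `Series.map`, for example `df['Ranking change'].map(classify_change)`.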

2. The top 10 of the rich list are visualized in the figure below:


Seeing this chart, I wondered what had happened to Ma Yun, the richest man in my mind. The top spot goes to Zhong Shanshan of Nongfu Spring with 390 billion yuan; Zhang Yiming, the founder of ByteDance, ranks second with 340 billion yuan; and Zeng Yuqun of CATL (Ningde Times) ranks third with 320 billion yuan. That is what being truly rich looks like.

2.1 Source code:

from pyecharts.charts import *
import pyecharts.options as opts
from pyecharts.commons.utils import JsCode
# Bucket wealth (in units of 100 million yuan) into ranges for the pie chart
bins = [0,50,100,500,1000,1800,10000000]
labels = ['0-50','50-100','100-500','500-1000','1000-1800','1800+']
df['wealth_cut'] = pd.cut(df['wealth'],bins,labels=labels)
df_t = df.head(10).sort_values('wealth',ascending = True)
df_t = df_t[['wealth','full name','enterprise']]
df_t['full name'] = df_t['full name']+'   '+df_t['enterprise']
# Rich text
rich_text1 = {
    "b": {"color": "#ffffff","fontSize": 12, "lineHeight": 12},
    "per": {
        "color": "#ffffff",
        },
}
bar = (Bar(init_opts=opts.InitOpts(width='980px',theme='light',bg_color='#070B50'))
    .add_xaxis([y for x, y, z in df_t.values])
    .add_yaxis('',[x for x, y, z in df_t.values],
        itemstyle_opts={
            'shadowBlur': 10, 
            'shadowColor': 'rgba(0, 0, 0, 0.5)',
            'shadowOffsetY': 5,
            'shadowOffsetX': 5,
            'barBorderRadius': [10, 10, 10, 10],
    },
        label_opts=opts.LabelOpts(
            is_show=True,
            position='insideRight',
            formatter='{b}: {c} hundred million yuan'
            ))
)
bar.reversal_axis()

items = df['wealth_cut'].value_counts().index.tolist()
value = df['wealth_cut'].value_counts().values.tolist()
pie =(Pie()
    .add('',[list(z) for z in zip(items,value)],radius=['15%','30%'],center=['77%','70%'])
    .set_series_opts(label_opts=opts.LabelOpts(is_show=True,formatter="{b|{b}: }{per|{d}%}  ",
                     rich=rich_text1))
    .set_global_opts(legend_opts=opts.LegendOpts(is_show=False))
)
bar.overlap(pie)
bar.set_global_opts(title_opts=opts.TitleOpts(title='2021 China Hurun rich list Top10',
            subtitle='Data source: Hengchang Shaofang · 2021 Hurun Rich List',pos_left='center',
            title_textstyle_opts=opts.TextStyleOpts(color='white')),
            legend_opts = opts.LegendOpts(is_show=False),
            xaxis_opts=opts.AxisOpts(is_show=False),
            yaxis_opts=opts.AxisOpts(is_show=False),
    )
bar.render_notebook()
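
The `pd.cut` call near the top of this listing assigns each fortune to a range for the pie chart. Here is a minimal sketch of how that binning behaves, using made-up wealth values. The values are assumed to be in units of 100 million yuan, as the bar label suggests; the bins and labels are the ones from the listing.

```python
import pandas as pd

# Hypothetical wealth figures in units of 100 million yuan (made up for illustration)
wealth = pd.Series([30, 50, 75, 500, 1200, 3900])

bins = [0, 50, 100, 500, 1000, 1800, 10000000]
labels = ['0-50', '50-100', '100-500', '500-1000', '1000-1800', '1800+']

# pd.cut uses right-closed intervals by default: (0, 50], (50, 100], ...
# so a value of exactly 50 lands in '0-50', and 500 lands in '100-500'
wealth_cut = pd.cut(wealth, bins, labels=labels)
print(wealth_cut.tolist())

# Number of entries per range, the input for the pie chart
counts = wealth_cut.value_counts()
print(counts)
```

Because the intervals are closed on the right, a value that sits exactly on a boundary is counted in the lower range.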

3. Ranking changes compared with last year and the gender ratio of the rich are shown in the figure below:


The rankings of 1,605 entrepreneurs declined, accounting for 55% of the list. There are 838 rising stars, accounting for 28.72%. Men clearly outnumber women, at a ratio of nearly 9:1. I wonder when I will become the tycoon of my dreams.
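
The two percentages quoted above can be cross-checked against each other with a little arithmetic. This is only a sanity check built from the numbers in the text; the total it prints is inferred, not read from the data file.

```python
# Figures quoted in the text above
declined = 1605        # entries whose ranking declined
rising = 838           # entries described as rising stars
rising_share = 0.2872  # 28.72 %

# Infer the total number of entries from the count and share of rising stars
total = round(rising / rising_share)

# Share of entries whose ranking declined, based on the inferred total
declined_share = declined / total

print(total)                    # 2918
print(f"{declined_share:.1%}")  # 55.0%
```

The inferred total is consistent with the 55% quoted for declining rankings.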

3.1 Source code:

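# Count genders across both people attached to an entry, skipping empty slots
genders = pd.concat([df['Character 1_Gender'], df['Character 2_Gender']])
df_t = genders[genders != ''].value_counts().reset_index()
df_t.columns = ['sex','count']
# Bracket access is required because the column name contains a space
df_t1 = df['Ranking change'].value_counts().reset_index()
# Name the columns explicitly; the default names differ between pandas versions
df_t1.columns = ['change','count']
label = df_t['sex'].tolist()
value = df_t['count'].tolist()
label1 = df_t1['change'].tolist()
value1 = df_t1['count'].tolist()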
# Rich text
rich_text1 = {
    "b": {"color": "#ffffff","fontSize": 16, "lineHeight": 40},
    "per": {
        "color": "#ffffff",
        "backgroundColor": "#334455",
        "padding": [4, 2],
        "borderRadius": 2,
    },
}
pie =(Pie(init_opts=opts.InitOpts(width='980px',bg_color='#070B50',theme='light'))
    .add('',[list(z) for z in zip(label,value)],radius=['25%','45%'],center=['75%','55%'],)
    .add('',[list(z) for z in zip(label1,value1)],radius=['25%','45%'],center=['30%','55%'],)
    .set_series_opts(label_opts=opts.LabelOpts(position='outside',formatter="{b|{b}: }{c}  {per|{d}%}  ",rich=rich_text1))
    .set_global_opts(
        title_opts=[
            dict(
                text='2021 Ranking change and gender ratio of China Hurun rich list',
                left='center',
                top='5%',
                textStyle=dict(
                    color='#ffffff',
                    fontSize=20)),
            dict(
                text='Data source: Hengchang Shaofang · 2021 Hurun Rich List',
                left='center',
                top='12%',
                textStyle=dict(
                    color='#C0C0C0',
                    fontSize=14)),
            dict(
                text='Ranking change',
                left='25%',
                top='52%',
                textStyle=dict(
                    color='#ffffff',
                    fontSize=22)),
            dict(
                text='Gender',
                left='72%',
                top='52%',
                textStyle=dict(
                    color='#ffffff',
                    fontSize=22))
    ],
        legend_opts=opts.LegendOpts(is_show=False),
        )
)
pie.render_notebook()

4. Which industries do the rich mainly work in? The results are as follows:

4.1 Source code:

## Industry word cloud
hy = []
for i in df['industry'].map(lambda x:x.split(',')):
    hy.extend(i)
df_t = pd.DataFrame(hy,columns=['industry'])
df1 = df_t['industry'].value_counts().reset_index()
cloud_words = [tuple(xi) for xi in df1.values]
wc = (
    WordCloud()
    .add("", cloud_words,word_size_range=[10, 120],shape='diamond')
    .set_global_opts(title_opts=opts.TitleOpts(title='2021 China Hurun top industries',
            subtitle='Data source: Hengchang Shaofang · 2021 Hurun Rich List',pos_left='center',))
)
wc.render_notebook()


Sure enough, real estate looks like the most profitable business: it has the most people on the list, followed by investment and pharmaceuticals. Time to go and sell real estate.
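
The counting step behind the word cloud can also be sketched with the standard library alone. The industry cells below are made up for illustration; the only assumption carried over from the listing is that one cell may hold several industries separated by commas.

```python
from collections import Counter

# Hypothetical industry cells (made up); one cell may list several industries
industry_cells = [
    'real estate,investment',
    'pharmaceuticals',
    'real estate',
    'investment,real estate',
]

counts = Counter()
for cell in industry_cells:
    # strip() guards against stray spaces around the separator
    counts.update(part.strip() for part in cell.split(',') if part.strip())

# (word, count) pairs, most frequent first: the shape the word cloud takes as input
cloud_words = counts.most_common()
print(cloud_words)
# [('real estate', 3), ('investment', 2), ('pharmaceuticals', 1)]
```

Counting each industry separately means a person active in two industries contributes to both totals, which matches how the listing extends `hy` with every split value.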


Tags: Python Back-end data visualization

Posted on Mon, 08 Nov 2021 05:39:47 -0500 by cricher