Why is EDG so popular? Analyzing a wave of bullet comments with Python: the fans are going wild

On November 6, Beijing time, in the League of Legends S11 finals, EDG Esports Club of China's LPL division beat DK of Korea's LCK division 3:2 to win the 2021 World Championship.

The match drew attention across the entire internet:

  • It topped the Weibo hot-search list, with 81.94 million views;
  • On bilibili, the broadcast drew a reported 350 million in popularity, with bullet comments filling the screen;
  • On Tencent Video, 6 million people watched;
  • The Douyu and Huya platforms also saw huge traffic;
  • After the match, CCTV News posted on Weibo to congratulate EDG on winning the championship.

With the match this hot, what were people actually saying?

We analyzed 31,000 bullet comments with Python, and the screen was full of fans' blessings and emotions.

Live broadcasts and news let us follow the match itself; with Python, we can also analyze the hot spots and feel the fans' enthusiasm.

A hands-on guide to fetching the bullet-comment data

1. Overview

It doesn't matter if you missed the live broadcast: there are replays! The whole event has been organized into 7 videos, from the opening ceremony, through the five games, to the moment of victory.


Each video carries the bullet comments posted by fans. What we'll do today is fetch the bullet-comment data from each video and see what the fans had to say in their excitement.

I have to say, bilibili changes its site quickly. I remember the bullet-comment interface was easy to find last year, but I couldn't locate it today.

It doesn't matter, though: the old bullet-comment API endpoint still works, so we'll use it directly.

API: https://api.bilibili.com/x/v1/dm/list.so?oid=XXX

The oid is just a string of digits, and each video has a unique oid.
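That API returns XML in which every bullet comment sits inside a `<d>` element. Here's a minimal offline sketch of the response shape; the sample XML and its `p` attribute values are invented for illustration, and the parsing uses the standard library instead of the regex approach shown later:

```python
import xml.etree.ElementTree as ET

# A made-up snippet in the same shape as the list.so XML response:
# each bullet comment is a <d> element whose text is the comment body.
sample = (
    '<i>'
    '<d p="12.3,1,25,16777215,1636182000,0,abcd1234,0">EDG yyds</d>'
    '<d p="45.6,1,25,16777215,1636182001,0,efgh5678,0">Congratulations!</d>'
    '</i>'
)

root = ET.fromstring(sample)
comments = [d.text for d in root.findall('d')]
print(comments)  # ['EDG yyds', 'Congratulations!']
```

The `p` attribute carries metadata such as the timestamp; only the element text matters for our analysis.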

2. Finding the oid

This section walks you step by step through finding the oid. To get an oid, you first need something called a cid.

Press F12 to open the developer tools, then complete steps 1-5 as marked in the screenshot.

  • Step 3: there are many requests on the page, but you need to find the one whose name starts with pagelist.
  • Step 4: under the corresponding Headers tab there is a Request URL; the cid we want comes back from this URL.
  • Step 5: under the corresponding Preview tab you can see the response to that Request URL; the cid fields we want are circled in the screenshot.

3. Fetching the cid

Now that we have the Request URL, we just need to send a request and pull out the cid values.

import requests
import json

# Request the video's page list; each part of the video has its own cid
url = 'https://api.bilibili.com/x/player/pagelist?bvid=BV1EP4y1j7kV&jsonp=jsonp'
res = requests.get(url).text
json_dict = json.loads(res)

for i in json_dict["data"]:
    cid = i["cid"]
    print(cid)

The results are as follows:

In fact, the cid value here is exactly the string of digits that goes after oid= in the bullet-comment API.

4. Splicing the url

We now have both the bullet-comment API and the cid values, so we can splice them together into the final URLs.

url = 'https://api.bilibili.com/x/player/pagelist?bvid=BV1EP4y1j7kV&jsonp=jsonp'
res = requests.get(url).text
json_dict = json.loads(res)

# Append each cid to the bullet-comment API as the oid parameter
for i in json_dict["data"]:
    cid = i["cid"]
    api = "https://api.bilibili.com/x/v1/dm/list.so?oid="
    url = api + str(cid)
    print(url)

The results are as follows:

There are 7 URLs, corresponding to the bullet-comment data of the 7 videos.

Click any of them to view the data:

5. Extracting and saving the bullet-comment data

With the complete URLs, all that's left is to extract the data inside. We'll use a regular expression directly, taking one of the videos as an example.

import re
import requests
import chardet

final_url = "https://api.bilibili.com/x/v1/dm/list.so?oid=437729555"
final_res = requests.get(final_url)
# Detect the response encoding so the Chinese text decodes correctly
final_res.encoding = chardet.detect(final_res.content)['encoding']
final_res = final_res.text
# Each bullet comment sits inside a <d ...>...</d> element
pattern = re.compile('<d.*?>(.*?)</d>')
data = pattern.findall(final_res)

with open("bullet chat.txt", mode="w", encoding="utf-8") as f:
    for i in data:
        f.write(i)
        f.write("\n")

The results are as follows:

This is only one video's worth of data, about 7,200 comments in total.
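To see exactly what the regex grabs, here is an offline sketch run on an invented snippet shaped like the real response:

```python
import re

# Two invented <d> elements in the same shape as the real XML response
sample = ('<d p="12.3,1,25,16777215">EDG yyds</d>'
          '<d p="45.6,1,25,16777215">DK played well too</d>')

# Non-greedy match: skip the opening tag's attributes, capture the body text
pattern = re.compile('<d.*?>(.*?)</d>')
matches = pattern.findall(sample)
print(matches)  # ['EDG yyds', 'DK played well too']
```

The non-greedy `.*?` matters: a greedy `.*` would swallow everything up to the last `</d>` and return a single mangled match.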

Complete code

Each step has been explained above; here the code is wrapped into functions.

import requests
import json
import re
import chardet


def get_cid():
    """Request the page list of the BV id and collect every video's cid."""
    url = 'https://api.bilibili.com/x/player/pagelist?bvid=BV1EP4y1j7kV&jsonp=jsonp'
    res = requests.get(url).text
    json_dict = json.loads(res)
    cid_list = []
    for i in json_dict["data"]:
        cid_list.append(i["cid"])
    return cid_list


def concat_url(cid):
    """Splice a cid onto the bullet-comment API as the oid parameter."""
    api = "https://api.bilibili.com/x/v1/dm/list.so?oid="
    url = api + str(cid)
    return url


def get_data(url):
    """Download one video's bullet comments and extract the text with a regex."""
    final_res = requests.get(url)
    final_res.encoding = chardet.detect(final_res.content)['encoding']
    final_res = final_res.text
    pattern = re.compile('<d.*?>(.*?)</d>')
    data = pattern.findall(final_res)
    return data


def save_to_file(data):
    """Append the comments to a text file, one per line."""
    with open("Barrage data.txt", mode="a", encoding="utf-8") as f:
        for i in data:
            f.write(i)
            f.write("\n")


cid_list = get_cid()
for cid in cid_list:
    url = concat_url(cid)
    data = get_data(url)
    save_to_file(data)

The results are as follows:

Great: 31,000 comments in total!
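Before moving on, it's worth a quick sanity check on what was saved: how many comments there are and which ones repeat most. A minimal sketch, using invented lines in place of a real read of "Barrage data.txt":

```python
from collections import Counter

# Invented sample standing in for lines read from "Barrage data.txt"
lines = ["EDG yyds", "EDG yyds", "Congratulations EDG", "yyds"]

total = len(lines)
top = Counter(lines).most_common(2)  # the 2 most frequent comments
print(total)  # 4
print(top)    # [('EDG yyds', 2), ('Congratulations EDG', 1)]
```

With the real file, replace the `lines` list with `open("Barrage data.txt", encoding="utf-8").read().splitlines()`.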

A step-by-step word cloud tutorial

Using the data we collected, let's make a good-looking word cloud with an EDG background image.

import pandas as pd
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from imageio import imread

import warnings
warnings.filterwarnings("ignore")

# Register fan slang so jieba does not split these phrases apart
for i in ["EDG", "Eternal God", "yyds", "fucking great", "Send a congratulatory message"]:
    jieba.add_word(i)

# Read the bullet comments and cut each one into words
with open("Barrage data.txt", encoding="utf-8") as f:
    txt = f.read()
txt = txt.split()
txt = [i.upper() for i in txt]  # unify case, e.g. edg -> EDG
data_cut = [jieba.lcut(x) for x in txt]

# Load the stop-word list and filter stop words out
with open("stoplist.txt", encoding="utf-8") as f:
    stop = f.read()
stop = stop.split()
stop = [" "] + stop

s_data_cut = pd.Series(data_cut)
all_words_after = s_data_cut.apply(lambda x: [i for i in x if i not in stop])

# Flatten the word lists and count word frequencies
all_words = []
for i in all_words_after:
    all_words.extend(i)
word_count = pd.Series(all_words).value_counts()

# Use the EDG picture as the word cloud mask
back_picture = imread("EDG.jpg")

wc = WordCloud(font_path="simhei.ttf",  # a font that can render Chinese
               background_color="white",
               max_words=1000,
               mask=back_picture,
               max_font_size=200,
               random_state=42
              )
wc2 = wc.fit_words(word_count)

plt.figure(figsize=(16, 8))
plt.imshow(wc2)
plt.axis("off")
plt.show()
wc.to_file("ciyun.png")
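The `value_counts` step above is just a word-frequency count. The same counting can be sketched with the standard library's `Counter` (sample tokens invented for illustration), and `WordCloud.fit_words` should accept any word-to-frequency mapping such as a `Counter` or dict:

```python
from collections import Counter

# Invented tokens standing in for the flattened jieba output (all_words)
all_words = ["EDG", "yyds", "EDG", "champion", "yyds", "EDG"]

word_count = Counter(all_words)
print(word_count.most_common(3))  # [('EDG', 3), ('yyds', 2), ('champion', 1)]
```

Using `Counter` avoids the pandas dependency if all you need is the frequency table.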

The results are as follows:


Posted on Tue, 09 Nov 2021 00:08:02 -0500 by blackcow