EDG won the championship, and we analyzed the hype with Python: the fans went wild

EDG won the championship, and the fans went wild!

On November 6, Beijing time, EDG, an esports club from China's LPL division, beat DK of Korea's LCK division 3:2 in the League of Legends S11 finals to win the 2021 World Championship.

The match drew attention across the entire internet:

  • It topped the Weibo hot search, with 81.94 million viewers reported;
  • On bilibili, the stream drew 350 million in popularity, with danmaku filling the screen;
  • 6 million people watched on Tencent Video;
  • The Douyu and Huya live-streaming platforms also saw heavy traffic;
  • After the match, CCTV News posted on Weibo to congratulate EDG on the championship;


With the match this hot, what was everyone saying?

We analyzed 31,000 danmaku (bullet chat) comments with Python, and the screen was full of fans' blessings and emotions.

Live broadcasts and news coverage let us follow the match itself, but analyzing the hot spots with Python lets us feel the fans' enthusiasm as well.

The source code, font file, stop-word file, and background image used in this article can be obtained by adding the author as a friend.

A step-by-step guide to getting the danmaku data

1. Brief description

It doesn't matter if you missed the live broadcast; there are replays. The whole event has been uploaded for everyone as 7 videos, from the opening ceremony, through the five games, to the moment of winning the championship.


Each video carries danmaku comments posted by fans. Our task today is to fetch the danmaku data from each video and see what the excited fans were saying.

I have to say, bilibili changes its site quickly. I remember the endpoint was easy to find last year, but I couldn't find it today.

That doesn't matter; we can simply reuse the old danmaku API endpoint.

API: https://api.bilibili.com/x/v1/dm/list.so?oid=XXX

The oid is just a string of digits, and each video has a unique oid.

2. Finding the oid

This section walks you through finding the oid step by step. To find an oid, you first need something called the cid.

Press F12 to open the developer tools, then complete operations 1-5 as prompted in the figure.

  • Step 3: this page shows many requests; find the one whose name starts with pagelist.
  • Step 4: under its Headers tab there is a Request URL; the cid we want comes from requesting this URL.
  • Step 5: under the Preview tab is the response to that Request URL; the cid data we want is circled in the figure.
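For orientation, the pagelist response is JSON whose data array holds one entry per video part, each carrying a cid. A minimal sketch of its shape — the cid value is the one used later in this article, while the other fields and values are illustrative:

```python
# Illustrative shape of the pagelist JSON response; only the fields we
# actually use are real, the "part" title is made up for this sketch.
sample_response = {
    "code": 0,
    "data": [
        {"cid": 437729555, "page": 1, "part": "Opening ceremony"},
    ],
}

# Pulling the cids out is a one-line comprehension
cids = [part["cid"] for part in sample_response["data"]]
print(cids)
```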

3. Getting the cid

We found the Request URL above. Now we just need to request it and pull the cid values out of the response.

import requests
import json

# Page list endpoint for the video (the bvid identifies the whole upload)
url = 'https://api.bilibili.com/x/player/pagelist?bvid=BV1EP4y1j7kV&jsonp=jsonp'
res = requests.get(url).text
json_dict = json.loads(res)

# Each entry in "data" is one part of the video; its cid is the oid we need
for i in json_dict["data"]:
    oid = i["cid"]
    print(oid)

The results are as follows:

In fact, the cid values here are exactly the digit strings that go after oid= in the danmaku API.

4. Splicing the url

We now have both the danmaku API endpoint and the cid values, so we can splice them together into the final urls.

url = 'https://api.bilibili.com/x/player/pagelist?bvid=BV1EP4y1j7kV&jsonp=jsonp'
res = requests.get(url).text
json_dict = json.loads(res)

# Append each cid to the danmaku API as its oid
for i in json_dict["data"]:
    oid = i["cid"]
    api = "https://api.bilibili.com/x/v1/dm/list.so?oid="
    url = api + str(oid)
    print(url)

The results are as follows:

We get 7 urls, one for each video's danmaku data.

Click any one of them to view it:

5. Extracting and saving the danmaku data

With the complete urls, all that remains is extracting the data inside. Here we use a regular expression directly. Let's take one of the videos as an example.

import re
import requests
import chardet

final_url = "https://api.bilibili.com/x/v1/dm/list.so?oid=437729555"
final_res = requests.get(final_url)
# The response is XML; detect its encoding before decoding
final_res.encoding = chardet.detect(final_res.content)['encoding']
final_res = final_res.text
# Each comment is the text content of a <d ...>...</d> element
pattern = re.compile('<d.*?>(.*?)</d>')
data = pattern.findall(final_res)

with open("bullet chat.txt", mode="w", encoding="utf-8") as f:
    for i in data:
        f.write(i)
        f.write("\n")
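Since the endpoint returns XML, an alternative to the regex is to parse the <d> elements with the standard library's XML parser. A minimal sketch over a hand-made sample document (the comment texts here are made up):

```python
import xml.etree.ElementTree as ET

# Hand-made sample mimicking the danmaku XML: each <d> element holds
# one comment as its text content.
sample_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<i><d p="1.0,1,25,16777215,0,0,abc,1">EDG yyds</d>'
    '<d p="2.0,1,25,16777215,0,0,def,2">We are the champions</d></i>'
)

root = ET.fromstring(sample_xml)
comments = [d.text for d in root.iter("d")]
print(comments)
```

A real parser is more robust than a regex if a comment ever contains angle brackets or escaped entities.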

The results are as follows:

This is the data from just one video, about 7,200 comments in total.
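The regex above keeps only the comment text, but each <d> element also carries a p attribute whose first comma-separated field appears to be the comment's timestamp within the video, in seconds (an assumption about the public danmaku format, not confirmed by this article). A sketch that captures both, run against a made-up sample line:

```python
import re

# Made-up sample line in the danmaku XML format; the first field of the
# p attribute is assumed to be the in-video timestamp in seconds.
sample = '<d p="56.3,1,25,16777215,1636270000,0,abc,1">EDG yyds</d>'

pattern = re.compile(r'<d p="([^"]*)"[^>]*>(.*?)</d>')
rows = []
for p_attr, text in pattern.findall(sample):
    seconds = float(p_attr.split(",")[0])
    rows.append((seconds, text))
print(rows)
```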

Complete code

Each step of the process was explained above; here I wrap the code up into functions.

import os
import requests
import json
import re
import chardet

# Get the list of cids
def get_cid():
    url = 'https://api.bilibili.com/x/player/pagelist?bvid=BV1EP4y1j7kV&jsonp=jsonp'
    res = requests.get(url).text
    json_dict = json.loads(res)
    cid_list = []
    for i in json_dict["data"]:
        cid_list.append(i["cid"])
    return cid_list

# Splice the danmaku url
def concat_url(cid):
    api = "https://api.bilibili.com/x/v1/dm/list.so?oid="
    url = api + str(cid)
    return url

# Extract comments with a regex
def get_data(url):
    final_res = requests.get(url)
    final_res.encoding = chardet.detect(final_res.content)['encoding']
    final_res = final_res.text
    pattern = re.compile('<d.*?>(.*?)</d>')
    data = pattern.findall(final_res)
    return data

# Append the comments to a file
def save_to_file(data):
    with open("Barrage data.txt", mode="a", encoding="utf-8") as f:
        for i in data:
            f.write(i)
            f.write("\n")
            
cid_list = get_cid()
for cid in cid_list:
    url = concat_url(cid)
    data = get_data(url)
    save_to_file(data)

The results are as follows:

That's great: 31,000 comments in total!
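Before building the word cloud, a quick sanity check with collections.Counter over the saved lines shows the most common comments. A minimal sketch — the tiny list here stands in for reading "Barrage data.txt" as produced by the complete code above:

```python
from collections import Counter

# Stand-in for the lines of "Barrage data.txt"; in practice you would
# read the file and split it into lines first.
lines = ["EDG yyds", "EDG", "yyds", "EDG", "We are the champions"]

counts = Counter(lines)
print(counts.most_common(2))
```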

A beginner-friendly word cloud tutorial

Using the data we obtained, let's make a good-looking word cloud with an EDG background image.

# 1. Import related libraries
import pandas as pd
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from imageio import imread

import warnings
warnings.filterwarnings("ignore")

# Note: add custom words so jieba keeps them intact during segmentation
for i in ["EDG","Eternal God","yyds","fucking great","Send a congratulatory message"]:
    jieba.add_word(i)

# 2 read the text file and segment it with jieba's lcut() method
with open("Barrage data.txt",encoding="utf-8") as f:
    txt = f.read()
txt = txt.split()
txt = [i.upper() for i in txt]
data_cut = [jieba.lcut(x) for x in txt]

# 3 read stop words
with open("stoplist.txt",encoding="utf-8") as f:
    stop = f.read()
stop = stop.split()
stop = [" "] + stop

# 4 remove stop words from the segmented comments
s_data_cut = pd.Series(data_cut)
all_words_after = s_data_cut.apply(lambda x:[i for i in x if i not in stop])

# 5 word frequency statistics
all_words = []
for i in all_words_after:
    all_words.extend(i)
word_count = pd.Series(all_words).value_counts()

# 6 drawing of word cloud map
# 1) Read background picture
back_picture = imread("EDG.jpg")

# 2) Set word cloud parameters
wc = WordCloud(font_path="simhei.ttf",
               background_color="white",
               max_words=1000,
               mask=back_picture,
               max_font_size=200,
               random_state=42
              )
wc2 = wc.fit_words(word_count)

# 3) Draw word cloud
plt.figure(figsize=(16,8))
plt.imshow(wc2)
plt.axis("off")
plt.show()
wc.to_file("ciyun.png")

The results are as follows:

Tags: Python

Posted on Sun, 07 Nov 2021 22:20:28 -0500 by godwheel