30 lines of Python code to crawl every hero and skin in League of Legends

Preface

League of Legends has well over a hundred heroes, and each hero has multiple skins. As a collector, I really want the artwork for every skin, but buying them all is beyond my budget!

Preliminary analysis

The skin images of every hero can be found in the official game data pages (https://lol.qq.com/data/info-heros.shtml).

Downloading a skin image is not hard in itself: you just save the response data to a file in binary form. The hard part is finding the URL of each skin image. So let's go to the official website and open a few heroes' skins to analyze them.
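Saving an image really is just writing the downloaded bytes to a file opened in binary mode. A minimal sketch; `save_binary` is a hypothetical helper name, and the sample bytes stand in for the `content` of a requests response:

```python
import os
import tempfile


def save_binary(path: str, data: bytes) -> int:
    """Write raw bytes (e.g. an image body) to path and return how many bytes were written."""
    # "wb" opens the file for writing in binary mode, so the bytes are stored unchanged
    with open(path, "wb") as f:
        return f.write(data)


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "demo_skin.jpg")
    written = save_binary(target, b"\xff\xd8\xff\xe0")  # the first bytes of a JPEG file
    print(written, "bytes written to", target)
```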

Pick any hero, press F12 to open the developer tools, and locate the image address of the hero's skin.

Once we have found where the skin image is, we collect a few more skin image addresses and compare them to find the pattern:

//Hero one
https://game.gtimg.cn/images/lol/act/img/skin/big1000.jpg
https://game.gtimg.cn/images/lol/act/img/skin/big1001.jpg
https://game.gtimg.cn/images/lol/act/img/skin/big1002.jpg
//Hero two
https://game.gtimg.cn/images/lol/act/img/skin/big2000.jpg
https://game.gtimg.cn/images/lol/act/img/skin/big2001.jpg
https://game.gtimg.cn/images/lol/act/img/skin/big2002.jpg

Comparing the three skin image addresses of each of the two heroes, we see that only the trailing number changes. So we can guess that skin images follow the general format https://game.gtimg.cn/images/lol/act/img/skin/bigxxxx.jpg, where the leading digit(s) give the hero number and the last three digits give that hero's skin number:

[hero number][3-digit skin number], e.g. 1 + 002 -> big1002.jpg
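The guessed rule is simple arithmetic: the number in the file name equals hero number × 1000 + skin number. A small sketch under that assumption (the helper names are hypothetical, and the rule itself is only the guess made above, not a documented format):

```python
SKIN_URL = "https://game.gtimg.cn/images/lol/act/img/skin/big{}.jpg"


def skin_url(hero_num: int, skin_num: int) -> str:
    """Build a skin image URL under the guessed rule: hero number + 3-digit skin number."""
    if not 0 <= skin_num < 1000:
        raise ValueError("skin number must fit in three digits")
    # hero_num * 1000 + skin_num is the same as appending skin_num zero-padded to 3 digits
    return SKIN_URL.format(hero_num * 1000 + skin_num)


def split_skin_id(skin_id: int) -> tuple:
    """Inverse of the rule: recover (hero number, skin number) from e.g. 2001."""
    return divmod(skin_id, 1000)


if __name__ == "__main__":
    for hero in (1, 2):
        for skin in range(3):
            print(skin_url(hero, skin))  # reproduces the six addresses listed earlier
    print(split_skin_id(2001))
```

As the next paragraph explains, the weak point is the hero number, not the arithmetic.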

Following this rule, we could compute every hero's skin image address. But... hero numbers are not actually consecutive. For example, the new hero Sett has the number 875, even though he is only the 148th hero on the live server.

So the hero number alone follows no predictable pattern. Don't worry: on the hero list page, open the F12 developer console, capture the network requests, and look at the files that are loaded. One of them is hero_list.js, a JSON file that contains the hero numbers we are looking for.
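Extracting the hero numbers from that file only needs the `hero` array and its `heroId` fields, the same keys the article's code reads later. The sample below is a hypothetical, heavily trimmed stand-in (placeholder values, most fields omitted) so the extraction can be tried offline:

```python
import json

# Hypothetical sample: only the keys the article's code uses are kept; values are placeholders
SAMPLE = """{
  "version": "<version>",
  "fileTime": "<fileTime>",
  "hero": [{"heroId": "1"}, {"heroId": "2"}, {"heroId": "875"}]
}"""


def extract_hero_ids(herolist_json: dict) -> list:
    """Return every heroId from the parsed hero list, in file order."""
    return [hero["heroId"] for hero in herolist_json["hero"]]


if __name__ == "__main__":
    data = json.loads(SAMPLE)
    print(data["version"], extract_hero_ids(data))
```

Note that the IDs are used exactly as found; the sketch does not assume they are consecutive (they are not, as the Sett example shows).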

With the hero numbers in hand, the next step is to find each hero's skin numbers. This is trickier because every hero has a different number of skins, and a skin may also come with chroma variants. Regex matching on the HTML page would work, but I'm lazy ~ so let's see whether the details page loads a file similar to hero_list.js.

XD ~ Each hero has its own JSON file named after its hero number, which records the hero's skin image addresses: mainImg is the skin's large image and iconImg is the skin's avatar icon. The JSON file also records other information about the hero, such as position, skills, and description. If you're interested, try crawling those yourself.

Since each hero's JSON file already contains the skin image addresses, we don't need to construct them ourselves. Instead, we build the request URL of each hero's JSON file and parse the skin image addresses out of it.
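Pulling the large images out of a per-hero file means walking `skins` and reading `mainImg`. A sketch against a hypothetical sample (placeholder values); it assumes, in line with the article's note on chroma skins, that a chroma entry has an empty or missing `mainImg`, so such entries are skipped:

```python
# Hypothetical per-hero JSON sample: only hero.name/title and skins[].name/mainImg/iconImg are shown
SAMPLE_HERO = {
    "hero": {"name": "<name>", "title": "<title>"},
    "skins": [
        {"name": "default skin",
         "mainImg": "https://game.gtimg.cn/images/lol/act/img/skin/big1000.jpg",
         "iconImg": "<icon-url>"},
        {"name": "a chroma", "mainImg": "", "iconImg": "<icon-url>"},
    ],
}


def skin_images(herojs: dict) -> list:
    """Return (skin name, large-image URL) pairs, skipping skins without a mainImg."""
    pairs = []
    for skin in herojs["skins"]:
        url = skin.get("mainImg")  # assumed empty or missing for chroma skins
        if url:
            pairs.append((skin["name"], url))
    return pairs


if __name__ == "__main__":
    for name, url in skin_images(SAMPLE_HERO):
        print(name, url)
```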
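Building those request links is a single string format per hero number, using the per-hero address pattern that appears in the article's code. A tiny sketch; `hero_json_urls` is a hypothetical helper name:

```python
HERO_JS = "https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js"


def hero_json_urls(hero_ids) -> dict:
    """Map each hero number to the URL of that hero's JSON file."""
    return {hid: HERO_JS.format(hid) for hid in hero_ids}


if __name__ == "__main__":
    for hid, url in hero_json_urls(["1", "2", "875"]).items():
        print(hid, url)
```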

Code implementation

We need the os and requests libraries. The first step is to fetch the JSON file with the hero numbers. Its address, which can be found in the debug console, is https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js, and it is requested with GET. After downloading it, parse the JSON and extract the hero numbers.

import os
import requests

# Download the list of all heroes (the .js file actually contains JSON)
list_url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
herolist = requests.get(list_url, timeout=10)
herolist.raise_for_status()  # stop early if the request failed

# Parse the JSON
herolist_json = herolist.json()  # convert the response body to a Python dict
version = herolist_json['version']  # game version
fileTime = herolist_json['fileTime']  # time the JSON file was generated
heroId = [hero['heroId'] for hero in herolist_json['hero']]  # all hero numbers

With the hero numbers, we can build the address of each hero's own JSON file, whose general format is https://game.gtimg.cn/images/lol/act/img/js/hero/[hero number].js. Send a GET request to that address to get the hero's JSON file, then parse the skin addresses out of it:

for heroid in heroId:

	herojs_url = "https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js".format(heroid)
	herojs = requests.get(herojs_url, timeout=10).json()  # fetch and parse the hero's JSON
	heroname = herojs['hero']['name'] + herojs['hero']['title']  # hero title + name, used as the folder name
	try:
		os.mkdir(heroname)  # create a folder for this hero
		print("\n\n[+] Folder [{}] created, start downloading skins".format(heroname))
	except FileExistsError:
		print("\n\n[!] Folder [{}] already exists, start downloading skins".format(heroname))

	for skin in herojs['skins']:
		skinname = skin['name']  # skin name
		skinurl = skin.get('mainImg')  # link to the large skin image
		# Chroma skins have no large image, so there is no link to download
		if not skinurl:
			print("[!] [{}] is a chroma skin without a large image".format(skinname))
			continue
		print("[-] Downloading [{}]".format(skinname))
		try:
			skin_jpg = requests.get(skinurl, timeout=10)
		except requests.RequestException as e:
			print("[!] [{}] download failed: {}".format(skinname, e))
			continue
		if skin_jpg.status_code == 200:
			# Replace characters that are illegal in file names (e.g. the "/" in "K/DA")
			filename = "".join("_" if c in '\\/:*?"<>|' else c for c in skinname) + ".jpg"
			with open(os.path.join(heroname, filename), "wb") as f:
				f.write(skin_jpg.content)

There are two pieces of error handling: one when creating each hero's folder (the folder may already exist), and one for chroma skins. Chroma skins have no standalone large image, so their JSON entries contain no large-image link.

Conclusion

Excluding comments and blank lines, about 30 lines of code are enough to download every skin image of every hero in League of Legends.
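If you want to check the "about 30 lines" figure on your own copy, a rough counter that skips blank lines and full-line `#` comments is enough. A sketch with a hypothetical helper name; it does not treat docstrings specially:

```python
import sys


def count_code_lines(source: str) -> int:
    """Count lines that are neither blank nor full-line '#' comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


if __name__ == "__main__":
    # Usage: python count_lines.py path/to/script.py (file names are examples)
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            print(count_code_lines(f.read()))
```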

Complete code

The code creates folders with relative paths, so they end up in the directory the program is run from. Start cmd (or PowerShell) in the directory that contains the source file; otherwise, the folders will be created somewhere unexpected.
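One way to avoid depending on the working directory is to resolve the folders relative to the script file instead. A sketch of that alternative; `script_dir` and `hero_folder` are hypothetical helper names, and `os.makedirs(..., exist_ok=True)` replaces the separate "already exists" message of the original:

```python
import os
from typing import Optional


def script_dir() -> str:
    """Directory containing this script, falling back to the current directory."""
    try:
        return os.path.dirname(os.path.abspath(__file__))
    except NameError:  # __file__ is undefined in an interactive session
        return os.getcwd()


def hero_folder(heroname: str, base: Optional[str] = None) -> str:
    """Absolute path of one hero's folder under base (default: the script's directory), created if missing."""
    path = os.path.join(base or script_dir(), heroname)
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    print(script_dir())
```

Files would then be saved with `os.path.join(hero_folder(heroname), filename)`, wherever the terminal happens to be.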

# encoding: utf-8

import os
import requests

# Download the list of all heroes (the .js file actually contains JSON)
list_url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
herolist = requests.get(list_url, timeout=10)
herolist.raise_for_status()  # stop early if the request failed

# Parse the JSON
herolist_json = herolist.json()  # convert the response body to a Python dict
version = herolist_json['version']  # game version
fileTime = herolist_json['fileTime']  # time the JSON file was generated
heroId = [hero['heroId'] for hero in herolist_json['hero']]  # all hero numbers
# Status messages
print("The current version is: {}".format(version))
print("The hero list was updated on: {}\n".format(fileTime))
print("Ready to start downloading skins....")

# Download every hero's skins
for heroid in heroId:

	herojs_url = "https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js".format(heroid)
	herojs = requests.get(herojs_url, timeout=10).json()  # fetch and parse the hero's JSON
	heroname = herojs['hero']['name'] + herojs['hero']['title']  # hero title + name, used as the folder name
	try:
		os.mkdir(heroname)  # create a folder for this hero
		print("\n\n[+] Folder [{}] created, start downloading skins".format(heroname))
	except FileExistsError:
		print("\n\n[!] Folder [{}] already exists, start downloading skins".format(heroname))

	for skin in herojs['skins']:
		skinname = skin['name']  # skin name
		skinurl = skin.get('mainImg')  # link to the large skin image
		# Chroma skins have no large image, so there is no link to download
		if not skinurl:
			print("[!] [{}] is a chroma skin without a large image".format(skinname))
			continue
		print("[-] Downloading [{}]".format(skinname))
		try:
			skin_jpg = requests.get(skinurl, timeout=10)
		except requests.RequestException as e:
			print("[!] [{}] download failed: {}".format(skinname, e))
			continue
		if skin_jpg.status_code == 200:
			# Replace characters that are illegal in file names (e.g. the "/" in "K/DA")
			filename = "".join("_" if c in '\\/:*?"<>|' else c for c in skinname) + ".jpg"
			with open(os.path.join(heroname, filename), "wb") as f:
				f.write(skin_jpg.content)

Tags: JSON Lambda Python Fragment

Posted on Tue, 14 Jan 2020 01:22:54 -0500 by kristolklp