Preface
Use Python to crawl cat pictures, then use those pictures to build a picture of a cat: a mosaic made of thousands of images!
Crawling cat pictures
This article uses Python 3.10.0, which can be downloaded directly from the official website: https://www.python.org.

Installing and configuring Python is not covered in detail here; any quick search online turns up a tutorial!
1. Crawl an art material website
Website to crawl: cat pictures (huiyi8.com)
First install the necessary libraries:
```
pip install beautifulsoup4
pip install requests
pip install urllib3
pip install lxml
```
Image-crawling code:
```python
from bs4 import BeautifulSoup
import requests
import os

# Page 1 of the cat picture website
url = 'https://www.huiyi8.com/tupian/tag-%E7%8C%AB%E5%92%AA/1.html'
# Image save path; the r prefix means the string is not escaped
path = r"/Users/lpc/Downloads/cats/"
# Create the directory if it does not already exist
if not os.path.exists(path):
    os.mkdir(path)

# Build the addresses of all pages of cat pictures
def allpage():
    all_url = []
    # Turn 19 pages
    for i in range(1, 20):
        # Swap in the page number; url[-6] is the sixth character from the end,
        # i.e. the '1' in '1.html'
        each_url = url.replace(url[-6], str(i))
        # Add each obtained URL to the all_url array
        all_url.append(each_url)
    # Return all collected addresses
    return all_url

# Main entry point
if __name__ == '__main__':
    # Call allpage to get all page addresses
    img_url = allpage()
    for url in img_url:
        # Fetch the page source
        requ = requests.get(url)
        req = requ.text.encode(requ.encoding).decode()
        html = BeautifulSoup(req, 'lxml')
        # Array of qualifying image tags
        img_urls = []
        # Walk every <img> tag in the page
        for img in html.find_all('img'):
            # Keep only src values that start with http and end with jpg
            if img["src"].startswith('http') and img["src"].endswith("jpg"):
                img_urls.append(img)
        # Loop over every matched tag
        for k in img_urls:
            # Picture URL
            img = k.get('src')
            # Picture name; the str() cast matters because alt may be None
            name = str(k.get('alt'))
            # Name the picture file
            file_name = path + name + '.jpg'
            # Download the cat picture via its URL and name
            with open(file_name, "wb") as f, requests.get(img) as res:
                f.write(res.content)
            # Print the crawled picture
            print(img, file_name)
```
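The `url.replace(url[-6], str(i))` trick only works because the page number's digit appears nowhere else in the address; on another URL it would silently corrupt the link. A minimal, sturdier sketch (the helper name is my own) formats the page number in directly:

```python
# Hypothetical helper: build page URLs by formatting the page number,
# instead of replacing a single character of the first page's address.
def build_page_urls(pages=19):
    base = 'https://www.huiyi8.com/tupian/tag-%E7%8C%AB%E5%92%AA/{}.html'
    return [base.format(i) for i in range(1, pages + 1)]
```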
Note: the code above cannot be copied and run as-is. You need to change the image download path /Users/lpc/Downloads/cats to a save path on your own machine!
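If you would rather not edit the script by hand, a small sketch using the standard library's pathlib can derive the save path from the current user's home directory (the folder name "cats" is just an example):

```python
from pathlib import Path

# Save under the current user's Downloads folder; created on first run.
save_dir = Path.home() / "Downloads" / "cats"
save_dir.mkdir(parents=True, exist_ok=True)
path = str(save_dir) + "/"  # keep the trailing slash the crawler expects
```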
Successful crawling:

A total of 346 cat pictures were crawled!
2. Crawl the ZOL website
Website to crawl: ZOL's adorable-cat wallpapers
Crawling code:
```python
import requests
import time
import os
from lxml import etree

# Page to request
url = 'https://desk.zol.com.cn/dongwu/mengmao/1.html'
# Image save path; the r prefix means the string is not escaped
path = r"/Users/lpc/Downloads/ZOL/"
# Create the directory if it does not already exist
if not os.path.exists(path):
    os.mkdir(path)

# Request headers
headers = {
    "Referer": "http://desk.zol.com.cn/dongman/1920x1080/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
}
headers2 = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36 SE 2.X MetaSr 1.0",
}

# Build the addresses of all pages
def allpage():
    all_url = []
    # Turn 3 pages
    for i in range(1, 4):
        # Swap in the page number
        each_url = url.replace(url[-6], str(i))
        all_url.append(each_url)
    # Return the address list
    return all_url

# Fetch and parse each HTML page
if __name__ == '__main__':
    img_url = allpage()  # call the function
    for url in img_url:
        # Send the request
        resq = requests.get(url, headers=headers)
        # Show whether the request succeeded
        print(resq)
        # Parse the returned page
        html = etree.HTML(resq.text)
        # Collect the links to the HD picture pages from the <a class="pic"> tags
        hrefs = html.xpath('.//a[@class="pic"]/@href')
        # Go one level deeper to fetch the high-definition pictures
        for i in range(1, len(hrefs)):
            # Request the detail page
            resqt = requests.get("https://desk.zol.com.cn" + hrefs[i], headers=headers)
            # Parse it
            htmlt = etree.HTML(resqt.text)
            srct = htmlt.xpath('.//img[@id="bigImg"]/@src')
            # Cut the picture name out of the URL
            imgname = srct[0].split('/')[-1]
            # Fetch the picture itself
            img = requests.get(srct[0], headers=headers2)
            # Write the picture to a file
            with open(path + imgname, "ab") as file:
                file.write(img.content)
            # Print the crawled picture
            print(img, imgname)
```
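The script above fires its requests back to back. If ZOL starts refusing connections, a common fix is a short pause between detail-page requests; here is a minimal sketch (the helper name and the half-second delay are my own choices):

```python
import time
import requests

def polite_get(url, headers, delay=0.5):
    """requests.get with a short pause beforehand, to avoid hammering the server."""
    time.sleep(delay)
    return requests.get(url, headers=headers)
```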
Successful crawling:

A total of 81 cat pictures were crawled!
3. Crawl the Baidu Images website
Website to crawl: Baidu Images cat pictures
1. Image-crawling code:
```python
import requests
import os

path = r"/Users/lpc/Downloads/baidu1/"
# Create the directory if it does not already exist
if not os.path.exists(path):
    os.mkdir(path)

page = input('Please enter how many pages to crawl: ')
page = int(page) + 1
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
n = 0
pn = 1  # pn is the index of the first picture to fetch; Baidu returns 30 results per request by default
for m in range(1, page):
    url = 'https://image.baidu.com/search/acjson?'
    param = {
        'tn': 'resultjson_com',
        'logid': '7680290037940858296',
        'ipn': 'rj',
        'ct': '201326592',
        'is': '',
        'fp': 'result',
        'queryWord': 'Kitty',
        'cl': '2',
        'lm': '-1',
        'ie': 'utf-8',
        'oe': 'utf-8',
        'adpicid': '',
        'st': '-1',
        'z': '',
        'ic': '0',
        'hd': '1',
        'latest': '',
        'copyright': '',
        'word': 'Kitty',
        's': '',
        'se': '',
        'tab': '',
        'width': '',
        'height': '',
        'face': '0',
        'istype': '2',
        'qc': '',
        'nc': '1',
        'fr': '',
        'expermode': '',
        'nojc': '',
        'acjsonfr': 'click',
        'pn': pn,  # which picture to start from
        'rn': '30',
        'gsm': '3c',
        '1635752428843=': '',
    }
    page_text = requests.get(url=url, headers=header, params=param)
    page_text.encoding = 'utf-8'
    page_text = page_text.json()
    print(page_text)
    # First take out the list of dictionaries that hold all the links
    info_list = page_text['data']
    # The last dictionary returned this way is empty, so drop the last element
    del info_list[-1]
    # List for the picture addresses
    img_path_list = []
    for i in info_list:
        img_path_list.append(i['thumbURL'])
    # Download every picture; n becomes the file name
    for img_path in img_path_list:
        img_data = requests.get(url=img_path, headers=header).content
        img_path = path + str(n) + '.jpg'
        with open(img_path, 'wb') as fp:
            fp.write(img_data)
        n = n + 1
    pn += 30  # advance to the next batch of 30
```
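Baidu occasionally answers with a page that is missing the expected data key, or with malformed JSON, either of which crashes the loop above. A defensive sketch (the function name is mine) that returns an empty batch instead of raising:

```python
import requests

def fetch_image_batch(url, header, param):
    """Return the thumbnail URLs for one result page, or [] on any failure."""
    try:
        resp = requests.get(url=url, headers=header, params=param, timeout=10)
        resp.encoding = 'utf-8'
        data = resp.json().get('data', [])
        # Entries can be empty dicts, so keep only those that carry a thumbURL.
        return [item['thumbURL'] for item in data if item.get('thumbURL')]
    except (requests.RequestException, ValueError):
        return []
```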
2. Crawling code (this variant also crops each download into a 50 x 50 thumbnail):
```python
# -*- coding:utf-8 -*-
import requests
import re, time
import os
import random
import urllib.parse
from PIL import Image  # image-processing module

imgDir = r"/Volumes/DBA/python/img/"
# Set several headers and rotate them to dodge anti-crawling checks
# chrome, firefox, Edge
headers = [
    {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Connection': 'keep-alive'
    },
    {
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Connection': 'keep-alive'
    },
    {
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19041',
        'Accept-Language': 'zh-CN',
        'Connection': 'keep-alive'
    }
]

picList = []  # list that accumulates picture URLs
keyword = input("Please enter a keyword to search for: ")
kw = urllib.parse.quote(keyword)  # URL-encode the keyword

# Fetch one page of Baidu's thumbnail list (30 thumbnails per page)
def getPicList(kw, n):
    global picList
    weburl = r"https://image.baidu.com/search/acjson?tn=resultjson_com&logid=11601692320226504094&ipn=rj&ct=201326592&is=&fp=result&queryWord={kw}&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&hd=&latest=&copyright=&word={kw}&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn={n}&rn=30&gsm=1e&1611751343367=".format(kw=kw, n=n * 30)
    req = requests.get(url=weburl, headers=random.choice(headers))
    req.encoding = req.apparent_encoding  # prevent garbled Chinese characters
    webJSON = req.text
    imgurlReg = '"thumbURL":"(.*?)"'  # regex that pulls out the thumbnail URLs
    picList = picList + re.findall(imgurlReg, webJSON, re.DOTALL | re.I)

# The loop count is deliberately generous; if there are fewer pictures, picList simply stops growing
for i in range(150):
    getPicList(kw, i)

for item in picList:
    # File extension and name
    hz = ".jpg"
    picName = str(int(time.time() * 1000))  # millisecond timestamp
    # Request the picture
    imgReq = requests.get(url=item, headers=random.choice(headers))
    # Save it
    with open(imgDir + picName + hz, "wb") as f:
        f.write(imgReq.content)
    # Open the saved picture with the Image module
    im = Image.open(imgDir + picName + hz)
    # Width/height ratio, used to resize the picture proportionally
    bili = im.width / im.height
    # Resize so that the smaller side becomes 50 pixels
    if bili >= 1:
        newIm = im.resize((round(bili * 50), 50))
    else:
        newIm = im.resize((50, round(50 * im.height / im.width)))
    # Crop out the 50 x 50 top-left part of the picture
    clip = newIm.crop((0, 0, 50, 50))
    # Save the cropped picture
    clip.convert("RGB").save(imgDir + picName + hz)
    print(picName + hz + " processing completed")
```
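The resize-then-crop steps at the end of this script can also be done in one call with Pillow's ImageOps.fit, which scales and crops to the target size; note that it crops around the center rather than the top-left corner. A minimal sketch (the file names are placeholders):

```python
from PIL import Image, ImageOps

im = Image.open("example.jpg")      # any downloaded picture
thumb = ImageOps.fit(im, (50, 50))  # scale and crop to exactly 50 x 50
thumb.convert("RGB").save("example_50.jpg")
```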
Successful crawling:

Summary: 1,600 cat pictures were crawled from the three websites!
Thousand-image mosaic
After crawling thousands of pictures, the next step is to stitch them into a single cat picture: a mosaic assembled from thousands of small images.
1. Using the Foto-Mosaik-Edda software
First download the software: Foto-Mosaik-Edda Installer. If that download fails, search for foto mosaik edda directly on Baidu!
Installing Foto-Mosaik-Edda on Windows is straightforward!
Note: the .NET Framework 2 must be installed first, otherwise the error below is reported and the installation will not succeed!

How to enable .NET Framework 2:





Confirm that it has been successfully enabled:

Then you can continue the installation!






After installation, open the program:

Step 1: create a gallery:





Step 2: generate the thousand-image mosaic:



Here, select the gallery created in step 1:






The moment of magic:

Make another lovely cat:

Done!
2. Implementation using Python
First, select a picture:

Run the following code:
```python
# -*- coding:utf-8 -*-
from PIL import Image
import os
import numpy as np

imgDir = r"/Volumes/DBA/python/img/"
bgImg = r"/Users/lpc/Downloads/494.jpg"

# Get the average color value of an image
def compute_mean(imgPath):
    '''
    Get the average color value of an image
    :param imgPath: thumbnail path
    :return: the (r, g, b) average of the whole thumbnail
    '''
    im = Image.open(imgPath)
    im = im.convert("RGB")  # switch to RGB mode
    # Convert the image into an array; each row stores the colors of one row of pixels, e.g.:
    # [[ 60  33  24] [ 58  34  24] ... [188 152 136] [ 99  96 113]]
    imArray = np.array(im)
    # mean() computes the average of the given data
    R = np.mean(imArray[:, :, 0])  # average of all R values
    G = np.mean(imArray[:, :, 1])
    B = np.mean(imArray[:, :, 2])
    return (R, G, B)

def getImgList():
    """
    Get the path and average color of every thumbnail
    :return: a list storing image paths and average color values
    """
    imgList = []
    for pic in os.listdir(imgDir):
        imgPath = imgDir + pic
        imgRGB = compute_mean(imgPath)
        imgList.append({
            "imgPath": imgPath,
            "imgRGB": imgRGB
        })
    return imgList

def computeDis(color1, color2):
    '''
    Compute the color difference between two images as a distance in color space:
    dis = (R**2 + G**2 + B**2) ** 0.5
    Parameters: color1, color2 are (r, g, b) color tuples
    '''
    dis = 0
    for i in range(len(color1)):
        dis += (color1[i] - color2[i]) ** 2
    dis = dis ** 0.5
    return dis

def create_image(bgImg, imgDir, N=2, M=50):
    '''
    Fill a new picture with thumbnails, following the background picture
    bgImg: path of the background picture
    imgDir: thumbnail directory
    N: scaling factor for the background picture
    M: size of each thumbnail (M x M)
    '''
    # Get the thumbnail list
    imgList = getImgList()
    # Read the background picture
    bg = Image.open(bgImg)
    # bg = bg.resize((bg.size[0] // N, bg.size[1] // N))  # zoom; scaling the original is recommended, a large picture takes very long
    bgArray = np.array(bg)
    width = bg.size[0] * M   # width of the new picture: every pixel is magnified M times
    height = bg.size[1] * M  # height of the new picture
    # Create a new blank picture
    newImg = Image.new('RGB', (width, height))
    # Fill the picture cell by cell
    for x in range(bgArray.shape[0]):      # x: rows, i.e. the height of the original picture
        for y in range(bgArray.shape[1]):  # y: columns, i.e. the width of the original picture
            # Find the thumbnail with the smallest color distance
            minDis = 10000
            index = 0
            for img in imgList:
                dis = computeDis(img['imgRGB'], bgArray[x][y])
                if dis < minDis:
                    index = img['imgPath']
                    minDis = dis
            # After the loop, index holds the path of the closest-colored picture
            # and minDis holds its color difference
            # Fill: open the picture with the smallest color distance
            tempImg = Image.open(index)
            # Resize it; this step can be skipped if the thumbnails were already resized at download time
            tempImg = tempImg.resize((M, M))
            # Paste the small picture onto the new one; note that x / y are row / column, so they swap here
            newImg.paste(tempImg, (y * M, x * M))
            print('(%d, %d)' % (x, y))  # print progress: formatted output of x, y
    # Save the final picture
    newImg.save('final.jpg')

create_image(bgImg, imgDir)
```
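The heart of the matching is computeDis: every thumbnail's mean RGB is compared with the current background pixel by Euclidean distance. A quick worked example of the formula:

```python
# Euclidean distance in RGB space between a target pixel and a thumbnail's mean color
pixel = (100, 150, 200)
thumb_mean = (110, 140, 190)
dis = sum((a - b) ** 2 for a, b in zip(pixel, thumb_mean)) ** 0.5
print(dis)  # sqrt(300) ≈ 17.32
```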
The result:

As the figure above shows, the mosaic's clarity rivals the original picture, and even after zooming in, each small picture is still clearly visible!
Note: running this in Python is slow!
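Most of that runtime goes into the brute-force nearest-color search: every background pixel is compared against every thumbnail. If SciPy is available, a k-d tree can answer all of those queries in one shot; this sketch assumes the imgList structure built by getImgList() above:

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_thumbnails(imgList, bgArray):
    """For every background pixel, return the index of the closest-colored thumbnail."""
    colors = np.array([img["imgRGB"] for img in imgList])  # shape (num_thumbs, 3)
    pixels = bgArray.reshape(-1, 3).astype(float)          # shape (num_pixels, 3)
    tree = cKDTree(colors)
    _, idx = tree.query(pixels)                            # nearest neighbor per pixel
    return idx.reshape(bgArray.shape[0], bgArray.shape[1])
```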
Closing words
It's great to get my cat fix again~
