Cartoonizing Photos with Python, Punching Through the Wall to the 2D World | Machine Learning






This follows my last article on using an open-source machine learning project: How to turn photos into cartoons, animegan2-pytorch Machine Learning Project Use | Machine Learning_ Alan's Blog - CSDN Blog

I'll keep tinkering with that project a little, again turning it into a single Python file that processes one picture at a time, so that it becomes a tool you can use directly.

Project github address: github address

Project structure

The samples directory contains some sample pictures for testing. The weights directory contains the four models from the original project. The Python environment needs a few dependencies installed, mainly PyTorch. For installing PyTorch, see another article of mine: Machine Learning Base Environment Deployment|Machine Learning Series_ Alan's Blog - CSDN Blog
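
Before running the script, it can help to confirm that the dependencies are importable. The following is only a minimal sketch and not part of the project: the helper name `check_dependencies` is made up for illustration, and the module list is an assumption taken from the imports in the core code (torch, torchvision, and PIL, which is provided by the Pillow package).

```python
import importlib.util

# Modules imported by the core code; "PIL" is provided by the Pillow package.
REQUIRED_MODULES = ("torch", "torchvision", "PIL")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any module reported as missing would need to be installed before the script can run.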

Core Code

Without further ado, here is the core code.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2021/12/4 22:34
# @Author: Swordsman Alan_ ALiang
# @Site    : 
# @File    :

from PIL import Image
import torch
from torchvision.transforms.functional import to_tensor, to_pil_image
from torch import nn
import os
import torch.nn.functional as F
import uuid

# -------------------------- hy add 01 --------------------------
class ConvNormLReLU(nn.Sequential):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, pad_mode="reflect", groups=1, bias=False):
        pad_layer = {
            "zero": nn.ZeroPad2d,
            "same": nn.ReplicationPad2d,
            "reflect": nn.ReflectionPad2d,
        }
        if pad_mode not in pad_layer:
            raise NotImplementedError

        super(ConvNormLReLU, self).__init__(
            # explicit padding layer, which is why the convolution itself uses padding=0
            pad_layer[pad_mode](padding),
            nn.Conv2d(in_ch, out_ch, kernel_size=kernel_size, stride=stride, padding=0, groups=groups, bias=bias),
            nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True),
            nn.LeakyReLU(0.2, inplace=True)
        )

class InvertedResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, expansion_ratio=2):
        super(InvertedResBlock, self).__init__()

        self.use_res_connect = in_ch == out_ch
        bottleneck = int(round(in_ch * expansion_ratio))
        layers = []
        if expansion_ratio != 1:
            layers.append(ConvNormLReLU(in_ch, bottleneck, kernel_size=1, padding=0))

        # dw
        layers.append(ConvNormLReLU(bottleneck, bottleneck, groups=bottleneck, bias=True))
        # pw
        layers.append(nn.Conv2d(bottleneck, out_ch, kernel_size=1, padding=0, bias=False))
        layers.append(nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True))

        self.layers = nn.Sequential(*layers)

    def forward(self, input):
        out = self.layers(input)
        if self.use_res_connect:
            out = input + out
        return out

class Generator(nn.Module):
    def __init__(self, ):
        super().__init__()

        self.block_a = nn.Sequential(
            ConvNormLReLU(3, 32, kernel_size=7, padding=3),
            ConvNormLReLU(32, 64, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(64, 64)
        )

        self.block_b = nn.Sequential(
            ConvNormLReLU(64, 128, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(128, 128)
        )

        self.block_c = nn.Sequential(
            ConvNormLReLU(128, 128),
            InvertedResBlock(128, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            ConvNormLReLU(256, 128),
        )

        self.block_d = nn.Sequential(
            ConvNormLReLU(128, 128),
            ConvNormLReLU(128, 128)
        )

        self.block_e = nn.Sequential(
            ConvNormLReLU(128, 64),
            ConvNormLReLU(64, 64),
            ConvNormLReLU(64, 32, kernel_size=7, padding=3)
        )

        self.out_layer = nn.Sequential(
            nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0, bias=False),
            # squash the output to [-1, 1], as in the upstream animegan2-pytorch model
            nn.Tanh()
        )
    def forward(self, input, align_corners=True):
        out = self.block_a(input)
        half_size = out.size()[-2:]
        out = self.block_b(out)
        out = self.block_c(out)

        if align_corners:
            out = F.interpolate(out, half_size, mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_d(out)

        if align_corners:
            out = F.interpolate(out, input.size()[-2:], mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_e(out)

        out = self.out_layer(out)
        return out

# -------------------------- hy add 02 --------------------------

def load_image(image_path, x32=False):
    img = Image.open(image_path).convert("RGB")

    if x32:
        def to_32s(x):
            return 256 if x < 256 else x - x % 32

        w, h = img.size
        img = img.resize((to_32s(w), to_32s(h)))

    return img

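def handle(image_path: str, output_dir: str, type: int, device='cpu'):
    # keep the input file's extension for the output file
    _ext = os.path.basename(image_path).strip().split('.')[-1]
    # type 1 selects the scenery model, type 2 the portrait model;
    # append the matching model file name from the weights directory to each path
    if type == 1:
        _checkpoint = './weights/'
    elif type == 2:
        _checkpoint = './weights/'
    else:
        raise Exception('type not support')
    os.makedirs(output_dir, exist_ok=True)
    net = Generator()
    net.load_state_dict(torch.load(_checkpoint, map_location="cpu"))
    net.to(device).eval()
    image = load_image(image_path)

    with torch.no_grad():
        # scale pixel values from [0, 1] to [-1, 1] and add a batch dimension
        image = to_tensor(image).unsqueeze(0) * 2 - 1
        out = net(image.to(device), False).cpu()
        # map the output back to [0, 1] before converting it to a PIL image
        out = out.squeeze(0).clip(-1, 1) * 0.5 + 0.5
        out = to_pil_image(out)
    result = os.path.join(output_dir, '{}.{}'.format(uuid.uuid1().hex, _ext))
    out.save(result)
    return result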

if __name__ == '__main__':
    print(handle('samples/images/fengjing.jpg', 'samples/images_result/', 1))
    print(handle('samples/images/renxiang.jpg', 'samples/images_result/', 2))
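
The value-range handling in `handle` and the optional `x32` resizing in `load_image` are easy to check with plain numbers. The sketch below reimplements only that arithmetic in pure Python for illustration; `to_model_range` and `to_image_range` are hypothetical names, not functions from the project, and they work on single values rather than tensors.

```python
def to_32s(x):
    """Same rule as in load_image: at least 256, otherwise round down to a multiple of 32."""
    return 256 if x < 256 else x - x % 32


def to_model_range(v):
    """Map a pixel value from [0, 1] (what to_tensor produces) to [-1, 1]."""
    return v * 2 - 1


def to_image_range(v):
    """Clip a model output to [-1, 1], then map it back to [0, 1]."""
    clipped = max(-1.0, min(1.0, v))
    return clipped * 0.5 + 0.5


if __name__ == "__main__":
    print(to_32s(200), to_32s(1000))                 # 256 992
    print(to_model_range(0.0), to_model_range(1.0))  # -1.0 1.0
    print(to_image_range(1.7))                       # 1.0
```

An output value outside [-1, 1], such as 1.7, is clipped before the conversion, so the saved picture never contains out-of-range pixel values.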

Code Description

1. The handle method turns a picture into a cartoon-style picture. Its parameters are the picture path, the output directory, the type (1 for scenery pictures, 2 for portraits), and the device type (default cpu; cuda can also be selected).

2. As tested in my previous article, the models suited to scenery differ from the models suited to portraits, so the two cases are handled separately.

3. The output picture name uses a uuid so that file names never collide.
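
The device choice from point 1 and the uuid naming from point 3 can be sketched as two small helpers. This is only an illustration under assumptions: `pick_device` and `unique_output_path` are hypothetical names that do not exist in the project, and falling back to cpu when PyTorch or CUDA is unavailable is an added suggestion rather than something the original code does.

```python
import os
import uuid


def pick_device(prefer_cuda=True):
    """Return "cuda" only if PyTorch is installed and reports a usable GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if prefer_cuda and torch.cuda.is_available() else "cpu"


def unique_output_path(image_path, output_dir):
    """Build output_dir/<uuid hex>.<ext>, keeping the input file's extension."""
    ext = os.path.basename(image_path).strip().split('.')[-1]
    return os.path.join(output_dir, '{}.{}'.format(uuid.uuid1().hex, ext))


if __name__ == "__main__":
    print(pick_device())
    print(unique_output_path('samples/images/fengjing.jpg', 'samples/images_result/'))
```

The value returned by `pick_device()` could then be passed as the `device` argument of `handle`.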


First, the prepared pictures



Execution results

The result looks like this


OK, no problem.


The overall effect is good. Recently I have been wondering whether to record a video of the operation process, since that might be easier to follow. I am not sure whether it is needed, so I am asking for your opinion; feel free to tell me by private message or in the comments.

Next I'll change this project further. Wouldn't it be even nicer to make it take video as input?


        I want to become a gentle person, because I have been treated gently by gentle people, and I deeply understand what it feels like to be treated that way.

                                                                                                        -- Natsume's Book of Friends

If this article helped you, please give it a like. Thank you!

Tags: Python Machine Learning Deep Learning image processing

Posted on Sat, 04 Dec 2021 12:30:53 -0500 by Superian