
Python cartoonizes GIFs and videos, kicking the dimensional wall down with both feet | machine learning

2022-02-02 09:42:10 Swordsman a Liang_ ALiang

Contents

Preface

Environment dependencies

Core code

GIF cartoonization

Video cartoonization

Summary


Preface

Following up on my last article: Python cartoonizes photos, one punch breaks the dimensional wall | machine learning _ A Liang's blog - CSDN Blog

I'll keep modifying it so the model can also cartoonize GIFs and videos. After all, if a single picture works, so should a video; no problem. Having punched through the dimensional wall, I'm now adding two kicks.

Project GitHub address: github Address

Environment dependencies

In addition to the dependencies from the previous article, you need to add a few more; requirements.txt is as follows:
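The listing itself did not survive in this copy of the article. Judging from the imports used in the code below, the additions are plausibly the packages listed here (names only, no version pins; treat this as an inference rather than the author's exact file):

# assumed additions to requirements.txt, inferred from the imports in the code below
imageio
ffmpy
opencv-python
numpy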

If anything else is unclear, see the article linked in the preface; it has detailed instructions.

Core code

No more talking; the GIF code comes first.

GIF cartoonization

The implementation code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2021/12/5 18:10
# @Author  : Swordsman a Liang _ALiang
# @File    : gif_cartoon_tool.py

from PIL import Image, ImageEnhance, ImageSequence
import torch
from torchvision.transforms.functional import to_tensor, to_pil_image
from torch import nn
import os
import torch.nn.functional as F
import uuid
import imageio


# -------------------------- hy add 01 --------------------------
class ConvNormLReLU(nn.Sequential):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, pad_mode="reflect", groups=1, bias=False):
        pad_layer = {
            "zero": nn.ZeroPad2d,
            "same": nn.ReplicationPad2d,
            "reflect": nn.ReflectionPad2d,
        }
        if pad_mode not in pad_layer:
            raise NotImplementedError

        super(ConvNormLReLU, self).__init__(
            pad_layer[pad_mode](padding),
            nn.Conv2d(in_ch, out_ch, kernel_size=kernel_size, stride=stride, padding=0, groups=groups, bias=bias),
            nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True),
            nn.LeakyReLU(0.2, inplace=True)
        )


class InvertedResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, expansion_ratio=2):
        super(InvertedResBlock, self).__init__()

        self.use_res_connect = in_ch == out_ch
        bottleneck = int(round(in_ch * expansion_ratio))
        layers = []
        if expansion_ratio != 1:
            layers.append(ConvNormLReLU(in_ch, bottleneck, kernel_size=1, padding=0))

        # dw
        layers.append(ConvNormLReLU(bottleneck, bottleneck, groups=bottleneck, bias=True))
        # pw
        layers.append(nn.Conv2d(bottleneck, out_ch, kernel_size=1, padding=0, bias=False))
        layers.append(nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True))

        self.layers = nn.Sequential(*layers)

    def forward(self, input):
        out = self.layers(input)
        if self.use_res_connect:
            out = input + out
        return out


class Generator(nn.Module):
    def __init__(self, ):
        super().__init__()

        self.block_a = nn.Sequential(
            ConvNormLReLU(3, 32, kernel_size=7, padding=3),
            ConvNormLReLU(32, 64, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(64, 64)
        )

        self.block_b = nn.Sequential(
            ConvNormLReLU(64, 128, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(128, 128)
        )

        self.block_c = nn.Sequential(
            ConvNormLReLU(128, 128),
            InvertedResBlock(128, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            ConvNormLReLU(256, 128),
        )

        self.block_d = nn.Sequential(
            ConvNormLReLU(128, 128),
            ConvNormLReLU(128, 128)
        )

        self.block_e = nn.Sequential(
            ConvNormLReLU(128, 64),
            ConvNormLReLU(64, 64),
            ConvNormLReLU(64, 32, kernel_size=7, padding=3)
        )

        self.out_layer = nn.Sequential(
            nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0, bias=False),
            nn.Tanh()
        )

    def forward(self, input, align_corners=True):
        out = self.block_a(input)
        half_size = out.size()[-2:]
        out = self.block_b(out)
        out = self.block_c(out)

        if align_corners:
            out = F.interpolate(out, half_size, mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_d(out)

        if align_corners:
            out = F.interpolate(out, input.size()[-2:], mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_e(out)

        out = self.out_layer(out)
        return out


# -------------------------- hy add 02 --------------------------

def handle(gif_path: str, output_dir: str, type: int, device='cpu'):
    _ext = os.path.basename(gif_path).strip().split('.')[-1]
    if type == 1:
        _checkpoint = './weights/paprika.pt'
    elif type == 2:
        _checkpoint = './weights/face_paint_512_v1.pt'
    elif type == 3:
        _checkpoint = './weights/face_paint_512_v2.pt'
    elif type == 4:
        _checkpoint = './weights/celeba_distill.pt'
    else:
        raise Exception('type not support')
    os.makedirs(output_dir, exist_ok=True)
    net = Generator()
    net.load_state_dict(torch.load(_checkpoint, map_location="cpu"))
    net.to(device).eval()
    result = os.path.join(output_dir, '{}.{}'.format(uuid.uuid1().hex, _ext))
    img = Image.open(gif_path)
    out_images = []
    for frame in ImageSequence.Iterator(img):
        frame = frame.convert("RGB")
        with torch.no_grad():
            image = to_tensor(frame).unsqueeze(0) * 2 - 1
            out = net(image.to(device), False).cpu()
            out = out.squeeze(0).clip(-1, 1) * 0.5 + 0.5
            out = to_pil_image(out)
            out_images.append(out)
    # out_images[0].save(result, save_all=True, loop=True, append_images=out_images[1:], duration=100)
    imageio.mimsave(result, out_images, fps=15)
    return result


if __name__ == '__main__':
    print(handle('samples/gif/128.gif', 'samples/gif_result/', 3, 'cuda'))

Code notes:

1. The main handle method takes these parameters: GIF path, output directory, type, and device (default cpu; pass cuda to use the GPU).

2. The type parameter selects the model; 3 works best, as it renders portraits more vividly.
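To make the cpu/cuda choice concrete, here is a minimal sketch of a call with a CPU fallback. The sample paths and the module name gif_cartoon_tool come from the listing above; wrapping the device choice this way is my suggestion, not part of the original script:

import torch
from gif_cartoon_tool import handle  # file name from the header comment above

# type=3 selects ./weights/face_paint_512_v2.pt, which the author recommends for portraits
device = 'cuda' if torch.cuda.is_available() else 'cpu'
result_path = handle('samples/gif/128.gif', 'samples/gif_result/', 3, device)
print(result_path)

As an aside, the commented-out out_images[0].save(...) line inside handle is the PIL alternative to imageio.mimsave; with it you control the per-frame duration in milliseconds instead of passing an fps value.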

Let's run a quick test.

Here is the GIF material I prepared.

The result is as follows.

Take a look at the effect.

Haha, that's rather fun.

Video cartoonization

The implementation code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2021/12/5 0:26
# @Author  : Swordsman a Liang _ALiang
# @File    : video_cartoon_tool.py

from PIL import Image, ImageEnhance
import torch
from torchvision.transforms.functional import to_tensor, to_pil_image
from torch import nn
import os
import torch.nn.functional as F
import uuid
import cv2
import numpy as np
import time
from ffmpy import FFmpeg


# -------------------------- hy add 01 --------------------------
class ConvNormLReLU(nn.Sequential):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, pad_mode="reflect", groups=1, bias=False):
        pad_layer = {
            "zero": nn.ZeroPad2d,
            "same": nn.ReplicationPad2d,
            "reflect": nn.ReflectionPad2d,
        }
        if pad_mode not in pad_layer:
            raise NotImplementedError

        super(ConvNormLReLU, self).__init__(
            pad_layer[pad_mode](padding),
            nn.Conv2d(in_ch, out_ch, kernel_size=kernel_size, stride=stride, padding=0, groups=groups, bias=bias),
            nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True),
            nn.LeakyReLU(0.2, inplace=True)
        )


class InvertedResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, expansion_ratio=2):
        super(InvertedResBlock, self).__init__()

        self.use_res_connect = in_ch == out_ch
        bottleneck = int(round(in_ch * expansion_ratio))
        layers = []
        if expansion_ratio != 1:
            layers.append(ConvNormLReLU(in_ch, bottleneck, kernel_size=1, padding=0))

        # dw
        layers.append(ConvNormLReLU(bottleneck, bottleneck, groups=bottleneck, bias=True))
        # pw
        layers.append(nn.Conv2d(bottleneck, out_ch, kernel_size=1, padding=0, bias=False))
        layers.append(nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True))

        self.layers = nn.Sequential(*layers)

    def forward(self, input):
        out = self.layers(input)
        if self.use_res_connect:
            out = input + out
        return out


class Generator(nn.Module):
    def __init__(self, ):
        super().__init__()

        self.block_a = nn.Sequential(
            ConvNormLReLU(3, 32, kernel_size=7, padding=3),
            ConvNormLReLU(32, 64, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(64, 64)
        )

        self.block_b = nn.Sequential(
            ConvNormLReLU(64, 128, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(128, 128)
        )

        self.block_c = nn.Sequential(
            ConvNormLReLU(128, 128),
            InvertedResBlock(128, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            ConvNormLReLU(256, 128),
        )

        self.block_d = nn.Sequential(
            ConvNormLReLU(128, 128),
            ConvNormLReLU(128, 128)
        )

        self.block_e = nn.Sequential(
            ConvNormLReLU(128, 64),
            ConvNormLReLU(64, 64),
            ConvNormLReLU(64, 32, kernel_size=7, padding=3)
        )

        self.out_layer = nn.Sequential(
            nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0, bias=False),
            nn.Tanh()
        )

    def forward(self, input, align_corners=True):
        out = self.block_a(input)
        half_size = out.size()[-2:]
        out = self.block_b(out)
        out = self.block_c(out)

        if align_corners:
            out = F.interpolate(out, half_size, mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_d(out)

        if align_corners:
            out = F.interpolate(out, input.size()[-2:], mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_e(out)

        out = self.out_layer(out)
        return out


# -------------------------- hy add 02 --------------------------

def handle(video_path: str, output_dir: str, type: int, fps: int, device='cpu'):
    _ext = os.path.basename(video_path).strip().split('.')[-1]
    if type == 1:
        _checkpoint = './weights/paprika.pt'
    elif type == 2:
        _checkpoint = './weights/face_paint_512_v1.pt'
    elif type == 3:
        _checkpoint = './weights/face_paint_512_v2.pt'
    elif type == 4:
        _checkpoint = './weights/celeba_distill.pt'
    else:
        raise Exception('type not support')
    os.makedirs(output_dir, exist_ok=True)
    # Extract the video's audio track
    _audio = extract(video_path, output_dir, 'wav')
    net = Generator()
    net.load_state_dict(torch.load(_checkpoint, map_location="cpu"))
    net.to(device).eval()
    result = os.path.join(output_dir, '{}.{}'.format(uuid.uuid1().hex, _ext))
    capture = cv2.VideoCapture(video_path)
    size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    print(size)
    videoWriter = cv2.VideoWriter(result, cv2.VideoWriter_fourcc(*'mp4v'), fps, size)
    cul = 0
    with torch.no_grad():
        while True:
            ret, frame = capture.read()
            if ret:
                print(ret)
                image = to_tensor(frame).unsqueeze(0) * 2 - 1
                out = net(image.to(device), False).cpu()
                out = out.squeeze(0).clip(-1, 1) * 0.5 + 0.5
                out = to_pil_image(out)
                contrast_enhancer = ImageEnhance.Contrast(out)
                img_enhanced_image = contrast_enhancer.enhance(2)
                enhanced_image = np.asarray(img_enhanced_image)
                videoWriter.write(enhanced_image)
                cul += 1
                print('Processed frame {}'.format(cul))
            else:
                break
    videoWriter.release()
    # Merge the original audio back into the video
    _final_video = video_add_audio(result, _audio, output_dir)
    return _final_video


# -------------------------- hy add 03 --------------------------
def extract(video_path: str, tmp_dir: str, ext: str):
    file_name = '.'.join(os.path.basename(video_path).split('.')[0:-1])
    print('file name: {}, extracting audio'.format(file_name))
    if ext == 'mp3':
        return _run_ffmpeg(video_path, os.path.join(tmp_dir, '{}.{}'.format(uuid.uuid1().hex, ext)), 'mp3')
    if ext == 'wav':
        return _run_ffmpeg(video_path, os.path.join(tmp_dir, '{}.{}'.format(uuid.uuid1().hex, ext)), 'wav')


def _run_ffmpeg(video_path: str, audio_path: str, format: str):
    ff = FFmpeg(inputs={video_path: None},
                outputs={audio_path: '-f {} -vn'.format(format)})
    print(ff.cmd)
    ff.run()
    return audio_path


# Add audio to a video
def video_add_audio(video_path: str, audio_path: str, output_dir: str):
    _ext_video = os.path.basename(video_path).strip().split('.')[-1]
    _ext_audio = os.path.basename(audio_path).strip().split('.')[-1]
    if _ext_audio not in ['mp3', 'wav']:
        raise Exception('audio format not support')
    _codec = 'copy'
    if _ext_audio == 'wav':
        _codec = 'aac'
    result = os.path.join(
        output_dir, '{}.{}'.format(
            uuid.uuid4(), _ext_video))
    ff = FFmpeg(
        inputs={video_path: None, audio_path: None},
        outputs={result: '-map 0:v -map 1:a -c:v copy -c:a {} -shortest'.format(_codec)})
    print(ff.cmd)
    ff.run()
    return result


if __name__ == '__main__':
    print(handle('samples/video/981.mp4', 'samples/video_result/', 3, 25, 'cuda'))

Code notes:

1. The main handle method takes: video path, output directory, type, fps (frame rate), and device (default cpu; choose cuda for GPU mode).

2. The type parameter selects the model; 3 works best, as portraits come out more vivid.

3. Design idea: first extract the audio from the video, then cartoonize the video frame by frame and write the frames to a new video, and finally merge the new video with the original audio (see the ffmpeg sketch after this list).

For how to extract audio from a video, see my other article: python Extract audio from video | Python Tool class _ A Liang's blog - CSDN Blog

For how to merge video and audio, see my other article:

Python Video add audio (code attached) | Python Tools _ A Liang's blog - CSDN Blog

4. Intermediate temporary files are generated along the way and are not cleaned up; if needed, modify the code to delete them yourself.
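To see concretely what the two FFmpeg helpers above ask ffmpeg to do (note 3), the snippet below rebuilds the same commands with ffmpy and only prints them, nothing is executed. The file names are placeholders for the uuid-based names the real code generates:

from ffmpy import FFmpeg

# Same option strings as _run_ffmpeg() and video_add_audio(); .cmd only builds the command line.
extract_cmd = FFmpeg(inputs={'input.mp4': None},
                     outputs={'audio.wav': '-f wav -vn'}).cmd
merge_cmd = FFmpeg(inputs={'cartoon.mp4': None, 'audio.wav': None},
                   outputs={'final.mp4': '-map 0:v -map 1:a -c:v copy -c:a aac -shortest'}).cmd
print(extract_cmd)  # ffmpeg -i input.mp4 -f wav -vn audio.wav
print(merge_cmd)    # ffmpeg -i cartoon.mp4 -i audio.wav -map 0:v -map 1:a -c:v copy -c:a aac -shortest final.mp4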

Let's check the result.

Below is a screenshot of the video material I prepared; I'll upload it to GitHub.

Execution results:

Take a look at the screenshot.

The result is still quite good.

Summary

There is actually plenty to summarize this time. First, some problems with the open-source project's current models:

1. I tested many pictures. In general, Asian faces are not cartoonized very well, while European and American faces come out better. The training data is probably insufficient, which is understandable; just assembling labeled cartoon data is a headache. So if you use this, keep an eye on whether the project has released newer models.

2. If the video has hard-coded subtitles, they get cartoonized too. Consider finding material where the video and subtitles are separate; the result will look better.

Overall, this project is quite good, and the two days I spent studying it taught me a lot.

Something to share:

        You cannot jump out of this world, because you don't know how big it is; once you do know, you have already gone beyond it.

        ——《Wukong Zhuan》

If this article helped you, please give it a like. Thanks!

Copyright notice
Author: Swordsman a Liang_ ALiang. Please include a link to the original when reprinting, thank you.
https://en.pythonmana.com/2022/02/202202020942086065.html
