
Implementing CycleGAN with Jetson Nano and Google Colab: Turning Your Photos and Videos into Van Gogh Style – Training, Inference and Application


Training CycleGAN

First, fetch the training data:

from tqdm import tqdm
import torchvision.utils as vutils
# zip() stops at the shorter loader, so one epoch covers the smaller dataset
total_len = min(len(dataA_loader), len(dataB_loader))
for epoch in range(epochs): 
    progress_bar = tqdm(enumerate(zip(dataA_loader, dataB_loader)), total = total_len) 
    for idx, data in progress_bar: 
        ############ define training data & label ############ 
        real_A = data[0][0].to(device)    # vangogh image
        real_B = data[1][0].to(device)    # real picture

We train the generator G first. There are three criteria for measuring the generator:

1. Can it fool the discriminator? (Adversarial Loss):

Take G_B2A as an example: it converts real_B into a style-A image (fake_A). We feed fake_A to the discriminator D_A, give it a target label of 1 (real), and compute the MSE between D_A's output and that label, so the generator is rewarded when it manages to fool the discriminator.

        ############ Train G ############ 
        optim_G.zero_grad()
        ############  Train G - Adversarial Loss  ############
        fake_A = G_B2A(real_B)
        fake_out_A = D_A(fake_A) 
        fake_B = G_A2B(real_A)
        fake_out_B = D_B(fake_B)
        real_label = torch.ones( (fake_out_A.size()) , dtype=torch.float32).to(device)
        fake_label = torch.zeros( (fake_out_A.size()) , dtype=torch.float32).to(device) 
        adversial_loss_B2A = MSE(fake_out_A, real_label)
        adversial_loss_A2B = MSE(fake_out_B, real_label)
        adv_loss = adversial_loss_B2A + adversial_loss_A2B

2. Can it reconstruct the original image? (Cycle-Consistency Loss):

For example, after G_B2A(real_B) produces a style-A image (fake_A), we feed fake_A into G_A2B to reconstruct a style-B image (rec_B), and then compute the L1 distance between real_B and rec_B.

        ############  G - Consistency Loss (Reconstruction)  ############
        rec_A = G_B2A(fake_B)
        rec_B = G_A2B(fake_A) 
        consistency_loss_B2A = L1(rec_A, real_A)
        consistency_loss_A2B = L1(rec_B, real_B) 
        rec_loss = consistency_loss_B2A + consistency_loss_A2B

3. Can it keep an image of its own target style unchanged? (Identity Loss):

Take G_A2B as an example: if we feed it an image that is already in style B (real_B), it should output a style-B image that stays essentially the same.

        ############  G - Identity  Loss ############
        idt_A = G_B2A(real_A)
        idt_B = G_A2B(real_B) 
        identity_loss_A = L1(idt_A, real_A)
        identity_loss_B = L1(idt_B, real_B) 
        idt_loss = identity_loss_A + identity_loss_B

Finally, compute the gradients of the combined loss and update the parameters. Note that the reconstruction (cycle) loss is multiplied by 10 and the identity loss by 5, which shows how heavily CycleGAN weights the ability to reconstruct.

        ############  G - Total Loss  ############
        lambda_rec = 10
        lambda_idt = 5 
        loss_G = adv_loss + ( rec_loss * lambda_rec ) + ( idt_loss * lambda_idt )
        ############  G - Backward & Update  ############
        loss_G.backward()
        optim_G.step()

Next we train the discriminator D. Its only job is to do its own part well, namely to tell whether an image of a given style is real or generated.

        ############ Train D ############ 
        optim_D.zero_grad()
        ############ D - Adversarial D_A Loss ############  
        real_out_A = D_A(real_A)
        real_out_A_loss = MSE(real_out_A, real_label) 
        fake_out_A = D_A(fake_A_sample.push_and_pop(fake_A))
        fake_out_A_loss = MSE(fake_out_A, fake_label)
        loss_DA = real_out_A_loss + fake_out_A_loss
        ############  D - Adversarial D_B Loss  ############
        real_out_B = D_B(real_B)
        real_out_B_loss = MSE(real_out_B, real_label) 
        fake_out_B = D_B(fake_B_sample.push_and_pop(fake_B))
        fake_out_B_loss = MSE(fake_out_B, fake_label)
        loss_DB = ( real_out_B_loss + fake_out_B_loss )
        ############  D - Total Loss ############
        loss_D = ( loss_DA + loss_DB ) * 0.5
        ############  Backward & Update ############
        loss_D.backward()
        optim_D.step()
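
The two blocks above call fake_A_sample.push_and_pop and fake_B_sample.push_and_pop, the image replay buffers that were set up in the model-building part of this series; they feed the discriminators a mix of freshly generated and previously generated images. If you are following along with this article only, a minimal sketch of such a buffer (modelled on the commonly used PyTorch CycleGAN implementation; the class name and max_size here are my own choices) looks like this, created once before the epoch loop:

import random

class ReplayBuffer:
    # keeps up to max_size previously generated images and randomly swaps
    # them in, which stabilises discriminator training
    def __init__(self, max_size=50):
        self.max_size = max_size
        self.data = []

    def push_and_pop(self, batch):
        to_return = []
        for element in batch.detach():
            element = element.unsqueeze(0)
            if len(self.data) < self.max_size:
                self.data.append(element)
                to_return.append(element)
            elif random.uniform(0, 1) > 0.5:
                i = random.randint(0, self.max_size - 1)
                to_return.append(self.data[i].clone())
                self.data[i] = element
            else:
                to_return.append(element)
        return torch.cat(to_return)

fake_A_sample = ReplayBuffer()
fake_B_sample = ReplayBuffer()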

Finally, we can print some of this information through tqdm:

        ############ progress info ############
        progress_bar.set_description(
            f"[{epoch}/{epochs - 1}][{idx}/{len(dataloader) - 1}] "
            f"Loss_D: {(loss_DA + loss_DB).item():.4f} "
            f"Loss_G: {loss_G.item():.4f} "
            f"Loss_G_identity: {(idt_loss).item():.4f} "
            f"loss_G_GAN: {(adv_loss).item():.4f} "
            f"loss_G_cycle: {(rec_loss).item():.4f}")

A very important part of training a GAN is remembering to save the weights, because the result at epoch 100 may well be better than at epoch 200, so it is common to save a checkpoint every fixed number of epochs. Saving is simple and documented on the official PyTorch website; broadly there are two ways to do it:

1. Save the model structure together with the weights

torch.save(model, PATH)

2. Save only the weights (the state dict)

torch.save(model.state_dict(), PATH)
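
For reference, the corresponding load calls would look roughly like this (a minimal sketch; model, Generator and PATH here are placeholders rather than names from the training code above):

# 1. the file contains the whole model object
model = torch.load(PATH)

# 2. the file contains only the weights, so build an identical model first
model = Generator()
model.load_state_dict(torch.load(PATH))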

I chose to save only the weights, which is also the officially recommended option:

        
        if idx % log_freq == 0:    # log_freq: how often (in iterations) to dump sample images
            vutils.save_image(real_A, f"{output_path}/real_A_{epoch}.jpg", normalize=True)
            vutils.save_image(real_B, f"{output_path}/real_B_{epoch}.jpg", normalize=True)
            fake_A = ( G_B2A( real_B ).data + 1.0 ) * 0.5
            fake_B = ( G_A2B( real_A ).data + 1.0 ) * 0.5
            vutils.save_image(fake_A, f"{output_path}/fake_A_{epoch}.jpg", normalize=True)
            vutils.save_image(fake_B, f"{output_path}/fake_B_{epoch}.jpg", normalize=True)
    torch.save(G_A2B.state_dict(), f"weights/netG_A2B_epoch_{epoch}.pth")
    torch.save(G_B2A.state_dict(), f"weights/netG_B2A_epoch_{epoch}.pth")
    torch.save(D_A.state_dict(), f"weights/netD_A_epoch_{epoch}.pth")
    torch.save(D_B.state_dict(), f"weights/netD_B_epoch_{epoch}.pth")
    ############ Update learning rates ############
    lr_scheduler_G.step()
    lr_scheduler_D.step()
############ save the last checkpoint ############
torch.save(G_A2B.state_dict(), "weights/netG_A2B.pth")
torch.save(G_B2A.state_dict(), "weights/netG_B2A.pth")
torch.save(D_A.state_dict(), "weights/netD_A.pth")
torch.save(D_B.state_dict(), "weights/netD_B.pth")

Testing

Testing is actually very simple; just follow the steps below:

1. Import the libraries

import os
import torch
import torchvision.datasets as dsets
from torch.utils.data import DataLoader
import torchvision.transforms as transforms
from tqdm import tqdm
import torchvision.utils as vutils

2. Build a dataset from the test data and load it with a DataLoader: here I created a custom folder under the dataset root for my own images, and an output folder to make it easy to check the results.

  batch_size = 12
  device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
  transform = transforms.Compose( [transforms.Resize((256,256)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])
  root = r'vangogh2photo'
  targetC_path = os.path.join(root, 'custom')
  output_path = os.path.join('./', r'output')
  if not os.path.exists(output_path):
    os.mkdir(output_path)
    print('Create dir : ', output_path)
  dataC_loader = DataLoader(dsets.ImageFolder(targetC_path, transform=transform), batch_size=batch_size, shuffle=True, num_workers=4)

3. Instantiate the generator, load the weights (load_state_dict) and choose the mode (train or eval). In eval mode PyTorch automatically disables Dropout. Since I only want to turn real photos into Van Gogh style, I declare G_B2A only:

  # get generator
  G_B2A = Generator().to(device)
  # Load state dicts
  G_B2A.load_state_dict(torch.load(os.path.join("weights", "netG_B2A.pth")))
  # Set model mode
  G_B2A.eval()

4. Run the prediction:

Fetch the data > feed it into the model to get the output > save the image

progress_bar = tqdm(enumerate(dataC_loader), total=len(dataC_loader))
for i, data in progress_bar:
    # get data
    real_images_B = data[0].to(device)
    # Generate output
    fake_image_A = 0.5 * (G_B2A(real_images_B).data + 1.0)
    # Save image files
    vutils.save_image(fake_image_A.detach(), f"{output_path}/FakeA_{i + 1:04d}.jpg", normalize=True)
    progress_bar.set_description(f"Process images {i + 1} of {len(dataC_loader)}")

5. Check the results in the output folder:

Probably because I only trained for 100 epochs, the fine lines of the Van Gogh style have not been fully learned yet. You can try training longer; in theory around 200 epochs should already give decent results!

ORIGINAL

(original input photo)

TRANSFORM

(Van Gogh-style output)

Alright, now that we know how to build, train, and run inference, let's find a way to apply it! When it comes to Style Transfer applications, the first thing that comes to mind is Microsoft's Style Transfer Azure website.

Style Transfer on Azure


Website link: https://styletransfers.azurewebsites.net/

Being able to take a photo and convert it on the spot feels great! In theory we can achieve the same thing with a simple OpenCV program, but before implementing it, go and try Style Transfer for yourself.


Pressing Create brings you to a page where you can click Capture to take a photo for conversion, or click Upload a picture to upload one; there are four styles to choose from in total.


It really is super cool! So let's try to implement a similar feature ourselves.

Style transfer on the Jetson Nano

1. First, copy the weights onto the Jetson Nano. I created a weights folder, put the .pth files inside, and added a Jupyter notebook at the same level.


2. Rebuild the generator and load the weights
There may be version issues here; in my case I had to upgrade to Torch 1.6 (the installation steps are in the appendix at the end of this article). Back to the main point: remember that earlier we saved only the weights, so we have to build a model identical to the one used for training before they can be loaded. So let's copy the generator we wrote before!

import os
import cv2
import numpy as np
import torch
from torch import nn
from PIL import Image
import torchvision.transforms as transforms    # these extra imports are used by the inference and capture code further down
from torchsummary import summary
def conv_norm_relu(in_dim, out_dim, kernel_size, stride = 1, padding=0):
    layer = nn.Sequential(nn.Conv2d(in_dim, out_dim, kernel_size, stride, padding),
                          nn.InstanceNorm2d(out_dim), 
                          nn.ReLU(True))
    return layer
def dconv_norm_relu(in_dim, out_dim, kernel_size, stride = 1, padding=0, output_padding=0):
    layer = nn.Sequential(nn.ConvTranspose2d(in_dim, out_dim, kernel_size, stride, padding, output_padding),
                          nn.InstanceNorm2d(out_dim), 
                          nn.ReLU(True))
    return layer
class ResidualBlock(nn.Module):
    def __init__(self, dim, use_dropout):
        super(ResidualBlock, self).__init__()
        res_block = [nn.ReflectionPad2d(1),
                     conv_norm_relu(dim, dim, kernel_size=3)]
        if use_dropout:
            res_block += [nn.Dropout(0.5)]
        res_block += [nn.ReflectionPad2d(1),
                      nn.Conv2d(dim, dim, kernel_size=3, padding=0),
                      nn.InstanceNorm2d(dim)]
        self.res_block = nn.Sequential(*res_block)
    def forward(self, x):
        return x + self.res_block(x)
class Generator(nn.Module):
    def __init__(self, input_nc=3, output_nc=3, filters=64, use_dropout=True, n_blocks=6):
        super(Generator, self).__init__()
        # downsampling
        model = [nn.ReflectionPad2d(3),
                 conv_norm_relu(input_nc   , filters * 1, 7),
                 conv_norm_relu(filters * 1, filters * 2, 3, 2, 1),
                 conv_norm_relu(filters * 2, filters * 4, 3, 2, 1)]
        # bottleneck (residual blocks)
        for i in range(n_blocks):
            model += [ResidualBlock(filters * 4, use_dropout)]
        # upsampling
        model += [dconv_norm_relu(filters * 4, filters * 2, 3, 2, 1, 1),
                  dconv_norm_relu(filters * 2, filters * 1, 3, 2, 1, 1),
                  nn.ReflectionPad2d(3),
                  nn.Conv2d(filters, output_nc, 7),
                  nn.Tanh()]
        self.model = nn.Sequential(*model)    # model is a Python list; nn.Sequential needs it unpacked with *
    def forward(self, x):
        return self.model(x)

Next, instantiate the model and load the weights:

def init_model():
    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
    G_B2A = Generator().to(device)
    G_B2A.load_state_dict(torch.load(os.path.join("weights", "netG_B2A.pth"), map_location=device ))
    G_B2A.eval()
    return G_B2A

3. Taking a photo on the Jetson Nano

I first wrote a helper function for running the model. Remember to apply the same transform to the incoming image: resize it to 256, convert it to a tensor and normalize it. The unsqueeze call adds a batch dimension so the input matches the batched format the model expects:

def test(G, img): 
    device = 'cuda:0' if torch.cuda.is_available() else 'cpu' 
    transform = transforms.Compose([transforms.Resize((256,256)),
                                    transforms.ToTensor(),
                                    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])
    data = transform(img).to(device) 
    data = data.unsqueeze(0) 
    out = (0.5 * (G(data).data + 1.0)).squeeze(0)
    return out

Next we use OpenCV to take the picture: press q to quit and s to save. When s is pressed we run the style transfer and save both versions of the image. Note that PyTorch works with PIL images, so the OpenCV ndarray has to be converted to a PIL.Image first (OpenCV frames are also in BGR channel order, so if the colours look off, converting to RGB with cv2.cvtColor before building the PIL image may help):

if __name__=='__main__':
    G = init_model()
    trans_path = 'test_transform.jpg'
    org_path = 'test_original.jpg'
    cap = cv2.VideoCapture(0)
    while(True):
        ret, frame = cap.read()
        cv2.imshow('webcam', frame)
        key = cv2.waitKey(1)
        if key==ord('q'):
            cap.release()
            cv2.destroyAllWindows()
            break
        elif key==ord('s'):
            output = test(G, Image.fromarray(frame))
            style_img = np.array(output.cpu()).transpose([1,2,0])
            org_img = cv2.resize(frame, (256, 256))
            cv2.imwrite(trans_path, style_img*255)
            cv2.imwrite(org_path, org_img)
            break
    cap.release()
    cv2.destroyWindow('webcam')

The running window looks like this:

(screenshot: webcam preview window)
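
After the capture, we can also place the stylized image and the rescaled original side by side for comparison; the short snippet below (run after the capture loop has finished) reuses style_img and org_img from the code above: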

res = np.concatenate((style_img, org_img/255), axis=1)
cv2.imshow('res', res)
cv2.waitKey(0)
cv2.destroyAllWindows()

(screenshot: original photo and Van Gogh-style result side by side)

Real-time video style transfer on the Jetson Nano

The idea is the same as the photo version, except that the style transfer is applied right after each frame is grabbed from the camera. I added an extra check: pressing t toggles the style transfer on and off, and cv2.putText draws the current mode label in the top-left corner.

if __name__=='__main__':
    G = init_model() 
    cap = cv2.VideoCapture(0) 
    change_style = False
    save_img_name = 'test.jpg'
    cv2text = ''
    while(True): 
        ret, frame = cap.read()
        # Do Something Cool 
        ############################ 
        if change_style:
            style_img = test(G, Image.fromarray(frame))
            out = np.array(style_img.cpu()).transpose([1,2,0])
            cv2text = 'Style Transfer'
        else:
            out = frame
            cv2text = 'Original'
        out = cv2.resize(out, (512, 512))
        out = cv2.putText(out, f'{cv2text}', (20, 40), cv2.FONT_HERSHEY_SIMPLEX ,  
                   1, (255, 255, 255), 2, cv2.LINE_AA) 
        ########################### 
        cv2.imshow('webcam', out) 
        key = cv2.waitKey(1) 
        if key==ord('q'):
            break
        elif key==ord('s'):
            if change_style==True:
                cv2.imwrite(save_img_name,out*255)
            else:
                cv2.imwrite(save_img_name,out) 
        elif key==ord('t'):
            change_style = not change_style
    cap.release()
    cv2.destroyAllWindows()

Real-time style transfer results

(screenshot: frames from the real-time style transfer demo)

Conclusion

That wraps up this GAN style-transfer project. Using Colab to train a style-transfer example is honestly a bit demanding; even though we only trained for 100 epochs it still took a little over half a day. But that is GANs for you: they are models that require patience, and without running for two or three days they will not give you particularly good results.

As for inference, the Jetson Nano still plays the leading role. There is a little latency, but it is acceptable. You could consider converting the model to ONNX and then to TensorRT, which should speed things up considerably. Stay tuned for the next GAN example, or leave a comment and tell me what you would like to see.
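
As a starting point for that optimisation, a minimal sketch of exporting the generator to ONNX might look like the following (the output file name, input size and opset version are my own assumptions; converting the ONNX file to a TensorRT engine is a separate step):

import torch

G = init_model()    # the generator defined above, already in eval mode
dummy = torch.randn(1, 3, 256, 256).to(next(G.parameters()).device)
torch.onnx.export(G, dummy, "netG_B2A.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)    # opset 11 covers the layers used here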

Appendix – Installing Torch 1.6 on the Nano

First of all, JetPack has to be upgraded to version 4.4, otherwise the bundled CUDA version will not match. The official site already has an upgrade guide, so I will not repeat it here.

Update PyTorch and its dependencies to version 1.6:

$ wget https://nvidia.box.com/shared/static/yr6sjswn25z7oankw8zy1roow9cy5ur1.whl -O torch-1.6.0rc2-cp36-cp36m-linux_aarch64.whl
$ sudo apt-get install python3-pip libopenblas-base libopenmpi-dev 
$ pip3 install Cython
$ pip3 install torch-1.6.0rc2-cp36-cp36m-linux_aarch64.whl

Update TorchVision to the matching version:

$ sudo apt-get install libjpeg-dev zlib1g-dev
$ git clone --branch v0.7.0 https://github.com/pytorch/vision torchvision
$ cd torchvision
$ export BUILD_VERSION=0.7.0  # where 0.x.0 is the torchvision version  
$ sudo python3 setup.py install     # use python3 if installing for Python 3.6
$ cd ../  # attempting to load torchvision from build dir will result in import error
CAVEDU Education is devoted into robotics education and maker movement since 2008, and is intensively active in teaching fundamental knowledge and skills. We had published many books for readers in all ages, topics including Deep Learning, edge computing, App Inventor, IoT and robotics. Please check CAVEDU's website for more information: http://www.cavedu.com, http://www.appinventor.tw