Training CycleGAN
First, grab the training data:
from tqdm import tqdm
import torchvision.utils as vutils

# zip() stops at the shorter loader, so the progress-bar total is the minimum
total_len = min(len(dataA_loader), len(dataB_loader))
for epoch in range(epochs):
    progress_bar = tqdm(enumerate(zip(dataA_loader, dataB_loader)), total=total_len)
    for idx, data in progress_bar:
        ############ define training data & label ############
        real_A = data[0][0].to(device)  # Van Gogh image
        real_B = data[1][0].to(device)  # real photo
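This loop leans on pieces defined earlier in the article: the two generators, the two discriminators, the Adam optimizers, and the MSE/L1 criteria. As a reminder, here is a minimal sketch of that assumed setup (the hyperparameters are placeholders, and Discriminator stands for the class built in the previous part):

import itertools
import torch
import torch.nn as nn

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
G_A2B = Generator().to(device)  # style A -> style B
G_B2A = Generator().to(device)  # style B -> style A
D_A = Discriminator().to(device)
D_B = Discriminator().to(device)
MSE = nn.MSELoss()
L1 = nn.L1Loss()
# one optimizer drives both generators, another drives both discriminators
optim_G = torch.optim.Adam(itertools.chain(G_A2B.parameters(), G_B2A.parameters()),
                           lr=0.0002, betas=(0.5, 0.999))
optim_D = torch.optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()),
                           lr=0.0002, betas=(0.5, 0.999))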
We train G first. There are three criteria for measuring the generator:
1. Can it fool the discriminator (adversarial loss)?
Take G_B2A: it turns B into A, the fake A goes to D_A with a "real" label of 1, and we measure the distance between the discriminator's output and that label.
############ Train G ############
optim_G.zero_grad()
############ Train G - Adversarial Loss ############
fake_A = G_B2A(real_B)
fake_out_A = D_A(fake_A)
fake_B = G_A2B(real_A)
fake_out_B = D_B(fake_B)
real_label = torch.ones(fake_out_A.size(), dtype=torch.float32).to(device)
fake_label = torch.zeros(fake_out_A.size(), dtype=torch.float32).to(device)
adversarial_loss_B2A = MSE(fake_out_A, real_label)
adversarial_loss_A2B = MSE(fake_out_B, real_label)
adv_loss = adversarial_loss_B2A + adversarial_loss_A2B
2. Can it reconstruct the original (cycle-consistency loss)?
For example, after G_B2A(real_B) produces a style-A image (fake_A), we feed it into G_A2B(fake_A) to rebuild a style-B image (rec_B), then measure the gap between real_B and rec_B.
############ G - Consistency Loss (Reconstruction) ############
rec_A = G_B2A(fake_B)
rec_B = G_A2B(fake_A)
consistency_loss_B2A = L1(rec_A, real_A)
consistency_loss_A2B = L1(rec_B, real_B)
rec_loss = consistency_loss_B2A + consistency_loss_A2B
3. Does it leave same-style inputs alone (identity loss)?
Take G_A2B: if we feed it a real_B image, which is already in style B, does it indeed output a style-B picture that stays essentially unchanged?
############ G - Identity Loss ############
idt_A = G_B2A(real_A)
idt_B = G_A2B(real_B)
identity_loss_A = L1(idt_A, real_A)
identity_loss_B = L1(idt_B, real_B)
idt_loss = identity_loss_A + identity_loss_B
Finally, compute gradients for all of these losses and update the parameters. Notice that the reconstruction loss is multiplied by 10 and the identity part by 5, which shows how heavily CycleGAN weights the ability to reconstruct.
############ G - Total Loss ############
lambda_rec = 10
lambda_idt = 5
loss_G = adv_loss + (rec_loss * lambda_rec) + (idt_loss * lambda_idt)
############ G - Backward & Update ############
loss_G.backward()
optim_G.step()
Next we train D. Its only job is to mind its own business: can it tell whether an image in its style is real or generated?
############ Train D ############
optim_D.zero_grad()
############ D - Adversarial D_A Loss ############
real_out_A = D_A(real_A)
real_out_A_loss = MSE(real_out_A, real_label)
# fake_A_sample is an image replay buffer (see the sketch below);
# detach so no gradient flows back into the generator here
fake_out_A = D_A(fake_A_sample.push_and_pop(fake_A).detach())
fake_out_A_loss = MSE(fake_out_A, fake_label)
loss_DA = real_out_A_loss + fake_out_A_loss
############ D - Adversarial D_B Loss ############
real_out_B = D_B(real_B)
real_out_B_loss = MSE(real_out_B, real_label)
fake_out_B = D_B(fake_B_sample.push_and_pop(fake_B).detach())
fake_out_B_loss = MSE(fake_out_B, fake_label)
loss_DB = real_out_B_loss + fake_out_B_loss
############ D - Total Loss ############
loss_D = (loss_DA + loss_DB) * 0.5
############ Backward & Update ############
loss_D.backward()
optim_D.step()
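A note on fake_A_sample and fake_B_sample: they are image replay buffers, a trick from the CycleGAN paper where the discriminator is occasionally shown older generated images instead of only the freshest batch, which stabilizes training. They aren't defined in this snippet; a minimal sketch of such a buffer, modeled on common PyTorch CycleGAN implementations (the 50-image capacity is the usual choice, not something fixed by this article), could look like this:

import random
import torch

class ReplayBuffer:
    def __init__(self, max_size=50):
        self.max_size = max_size
        self.data = []

    def push_and_pop(self, batch):
        # For each new image: store it, and with probability 0.5 hand the
        # discriminator an older image from the buffer instead.
        out = []
        for element in batch.detach():
            element = element.unsqueeze(0)
            if len(self.data) < self.max_size:
                self.data.append(element)
                out.append(element)
            elif random.random() > 0.5:
                i = random.randint(0, self.max_size - 1)
                out.append(self.data[i].clone())
                self.data[i] = element
            else:
                out.append(element)
        return torch.cat(out)

fake_A_sample = ReplayBuffer()
fake_B_sample = ReplayBuffer()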
Finally, we can print some information through tqdm:
############ progress info ############
progress_bar.set_description(
    f"[{epoch}/{epochs - 1}][{idx}/{total_len - 1}] "
    f"Loss_D: {(loss_DA + loss_DB).item():.4f} "
    f"Loss_G: {loss_G.item():.4f} "
    f"Loss_G_identity: {idt_loss.item():.4f} "
    f"loss_G_GAN: {adv_loss.item():.4f} "
    f"loss_G_cycle: {rec_loss.item():.4f}")
A crucial habit when training GANs is remembering to save the weights, because the results at epoch 100 may well beat those at epoch 200, so the usual practice is to save every fixed number of epochs. Saving is simple and documented on the PyTorch website; broadly there are two ways:
1. Save the model structure together with the weights:
torch.save(model, PATH)
2. Save only the weights:
torch.save(model.state_dict(), PATH)
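For reference, loading mirrors the two saving styles (PATH is a placeholder here):

# 1. the whole pickled model comes back directly
model = torch.load(PATH)
# 2. build the architecture first, then restore its weights
model = Generator()
model.load_state_dict(torch.load(PATH))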
I went with saving only the weights, which is also the officially recommended approach:
if epoch % log_freq == 0:  # log_freq: save samples and weights every log_freq epochs
    vutils.save_image(real_A, f"{output_path}/real_A_{epoch}.jpg", normalize=True)
    vutils.save_image(real_B, f"{output_path}/real_B_{epoch}.jpg", normalize=True)
    fake_A = (G_B2A(real_B).data + 1.0) * 0.5
    fake_B = (G_A2B(real_A).data + 1.0) * 0.5
    vutils.save_image(fake_A, f"{output_path}/fake_A_{epoch}.jpg", normalize=True)
    vutils.save_image(fake_B, f"{output_path}/fake_B_{epoch}.jpg", normalize=True)
    torch.save(G_A2B.state_dict(), f"weights/netG_A2B_epoch_{epoch}.pth")
    torch.save(G_B2A.state_dict(), f"weights/netG_B2A_epoch_{epoch}.pth")
    torch.save(D_A.state_dict(), f"weights/netD_A_epoch_{epoch}.pth")
    torch.save(D_B.state_dict(), f"weights/netD_B_epoch_{epoch}.pth")
############ Update learning rates ############
lr_scheduler_G.step()
lr_scheduler_D.step()
############ save last checkpoint ############
torch.save(G_A2B.state_dict(), "weights/netG_A2B.pth")
torch.save(G_B2A.state_dict(), "weights/netG_B2A.pth")
torch.save(D_A.state_dict(), "weights/netD_A.pth")
torch.save(D_B.state_dict(), "weights/netD_B.pth")
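The lr_scheduler_G and lr_scheduler_D objects stepped above are assumed to be created alongside the optimizers before the loop. A minimal sketch, using the linear-decay schedule from the original CycleGAN setup (the epoch counts below are placeholders):

from torch.optim.lr_scheduler import LambdaLR

def linear_decay(epoch, total_epochs=200, decay_start=100):
    # constant learning rate for the first decay_start epochs,
    # then decay linearly to zero over the remaining ones
    return 1.0 - max(0, epoch - decay_start) / float(total_epochs - decay_start)

lr_scheduler_G = LambdaLR(optim_G, lr_lambda=linear_decay)
lr_scheduler_D = LambdaLR(optim_D, lr_lambda=linear_decay)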
Testing
Testing is actually very simple; just follow these steps:
1. Import the libraries
import os
import torch
import torchvision.datasets as dsets
from torch.utils.data import DataLoader
import torchvision.transforms as transforms
from tqdm import tqdm
import torchvision.utils as vutils
2. Build a dataset from the test data and load it through a DataLoader. Here I created a Custom folder for my own data, plus an output folder to make it easy to inspect the results.
batch_size = 12
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
transform = transforms.Compose([transforms.Resize((256, 256)),
                                transforms.ToTensor(),
                                transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])
root = r'vangogh2photo'
targetC_path = os.path.join(root, 'custom')
output_path = os.path.join('./', r'output')
if not os.path.exists(output_path):
    os.mkdir(output_path)
    print('Create dir : ', output_path)
dataC_loader = DataLoader(dsets.ImageFolder(targetC_path, transform=transform), batch_size=batch_size, shuffle=True, num_workers=4)
3. Instantiate the generator, load the weights (load_state_dict), and choose the mode (train or eval). Choosing eval makes PyTorch automatically turn off Dropout. Since I only want to turn real photos into Van Gogh style, I only declare G_B2A:
# get generator
G_B2A = Generator().to(device)
# Load state dicts
G_B2A.load_state_dict(torch.load(os.path.join("weights", "netG_B2A.pth")))
# Set model mode
G_B2A.eval()
4. Run the predictions:
Grab the data > feed it through the model for the output > save the images
progress_bar = tqdm(enumerate(dataC_loader), total=len(dataC_loader))
for i, data in progress_bar:
    # get data
    real_images_B = data[0].to(device)
    # Generate output
    fake_image_A = 0.5 * (G_B2A(real_images_B).data + 1.0)
    # Save image files
    vutils.save_image(fake_image_A.detach(), f"{output_path}/FakeA_{i + 1:04d}.jpg", normalize=True)
    progress_bar.set_description(f"Process images {i + 1} of {len(dataC_loader)}")
5. Check the results in the output folder:
Probably because I only trained for 100 epochs, the fine line work of the Van Gogh style hasn't been learned yet. You can try training longer; in theory 200 epochs should give decent results!
[Image: side-by-side results, ORIGINAL | TRANSFORM]
Alright, now that we can build, train, and predict, let's find a way to apply it! When it comes to style-transfer applications, the first thing that comes to mind is Microsoft's Style Transfer Azure Website.
Style Transfer on Azure
Taking a photo and converting it on the spot feels fantastic! In theory we can pull off the same thing with a simple OpenCV program, but before implementing it, go try Style Transfer for yourself.
Press Create to reach this page; click Capture to take a photo to convert, or click Upload a picture to upload one. There are four styles to choose from:
It really does feel super cool! So let's implement a similar feature ourselves.
Style transfer on the Jetson Nano
1. First, move the weights onto the Jetson Nano. I created a weights folder and put the .pth files inside, and added a Jupyter notebook at the same level:
2. Rebuild the generator and load the weights
There may be version issues here; in my case I had to upgrade to Torch 1.6 (how to install PyTorch on the Nano is covered at the end of this article). Back on topic: remember that we saved only the weights, so we have to build a model identical to the one used in training before we can load them. So let's copy the generator we wrote earlier:
import torch
from torch import nn
from torchsummary import summary
def conv_norm_relu(in_dim, out_dim, kernel_size, stride=1, padding=0):
    layer = nn.Sequential(nn.Conv2d(in_dim, out_dim, kernel_size, stride, padding),
                          nn.InstanceNorm2d(out_dim),
                          nn.ReLU(True))
    return layer

def dconv_norm_relu(in_dim, out_dim, kernel_size, stride=1, padding=0, output_padding=0):
    layer = nn.Sequential(nn.ConvTranspose2d(in_dim, out_dim, kernel_size, stride, padding, output_padding),
                          nn.InstanceNorm2d(out_dim),
                          nn.ReLU(True))
    return layer
class ResidualBlock(nn.Module):
    def __init__(self, dim, use_dropout):
        super(ResidualBlock, self).__init__()
        res_block = [nn.ReflectionPad2d(1),
                     conv_norm_relu(dim, dim, kernel_size=3)]
        if use_dropout:
            res_block += [nn.Dropout(0.5)]
        res_block += [nn.ReflectionPad2d(1),
                      nn.Conv2d(dim, dim, kernel_size=3, padding=0),
                      nn.InstanceNorm2d(dim)]
        self.res_block = nn.Sequential(*res_block)

    def forward(self, x):
        return x + self.res_block(x)
class Generator(nn.Module):
    def __init__(self, input_nc=3, output_nc=3, filters=64, use_dropout=True, n_blocks=6):
        super(Generator, self).__init__()
        # downsampling
        model = [nn.ReflectionPad2d(3),
                 conv_norm_relu(input_nc, filters * 1, 7),
                 conv_norm_relu(filters * 1, filters * 2, 3, 2, 1),
                 conv_norm_relu(filters * 2, filters * 4, 3, 2, 1)]
        # bottleneck (residual blocks)
        for i in range(n_blocks):
            model += [ResidualBlock(filters * 4, use_dropout)]
        # upsampling
        model += [dconv_norm_relu(filters * 4, filters * 2, 3, 2, 1, 1),
                  dconv_norm_relu(filters * 2, filters * 1, 3, 2, 1, 1),
                  nn.ReflectionPad2d(3),
                  nn.Conv2d(filters, output_nc, 7),
                  nn.Tanh()]
        self.model = nn.Sequential(*model)  # model is a list; nn.Sequential takes separate modules, so unpack it with *

    def forward(self, x):
        return self.model(x)
Next, instantiate the model and load the weights:
import os

def init_model():
    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
    G_B2A = Generator().to(device)
    G_B2A.load_state_dict(torch.load(os.path.join("weights", "netG_B2A.pth"), map_location=device))
    G_B2A.eval()
    return G_B2A
3. Take a photo with the webcam
I first wrote a helper function for running the model. Remember that the input image also needs the transform: resize to 256, convert to a tensor, and normalize. The unsqueeze step fakes a batch dimension so the input matches the batched format the model expects:
import torchvision.transforms as transforms

def test(G, img):
    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
    transform = transforms.Compose([transforms.Resize((256, 256)),
                                    transforms.ToTensor(),
                                    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])
    data = transform(img).to(device)
    data = data.unsqueeze(0)  # add a batch dimension
    out = (0.5 * (G(data).data + 1.0)).squeeze(0)  # rescale from [-1, 1] to [0, 1], drop the batch dim
    return out
Next we use OpenCV to take the photo: press q to quit, press s to save. When s is pressed we run the style transfer and save both versions of the picture. Note that PyTorch consumes PIL images, so the OpenCV ndarray has to be converted to PIL.Image format (and since OpenCV frames are BGR while PIL assumes RGB, strictly speaking the channels should also be swapped with cv2.cvtColor first):
import cv2
import numpy as np
from PIL import Image

if __name__ == '__main__':
    G = init_model()
    trans_path = 'test_transform.jpg'
    org_path = 'test_original.jpg'
    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        cv2.imshow('webcam', frame)
        key = cv2.waitKey(1)
        if key == ord('q'):
            cap.release()
            cv2.destroyAllWindows()
            break
        elif key == ord('s'):
            output = test(G, Image.fromarray(frame))
            style_img = np.array(output.cpu()).transpose([1, 2, 0])
            org_img = cv2.resize(frame, (256, 256))
            cv2.imwrite(trans_path, style_img * 255)  # output is in [0, 1], scale back to [0, 255]
            cv2.imwrite(org_path, org_img)
            break
    cap.release()
    cv2.destroyWindow('webcam')
The result on screen looks like this:
res = np.concatenate((style_img, org_img / 255), axis=1)
cv2.imshow('res', res)
cv2.waitKey(0)
cv2.destroyAllWindows()
Real-time style transfer on the Jetson Nano
The idea is similar to the photo conversion, but here we run the style transfer right after grabbing each camera frame. I added an extra check: pressing t toggles the style transfer, and cv2.putText draws the current style label in the top-left corner.
if __name__ == '__main__':
    G = init_model()
    cap = cv2.VideoCapture(0)
    change_style = False
    save_img_name = 'test.jpg'
    cv2text = ''
    while True:
        ret, frame = cap.read()
        # Do Something Cool
        ############################
        if change_style:
            style_img = test(G, Image.fromarray(frame))
            out = np.array(style_img.cpu()).transpose([1, 2, 0])
            cv2text = 'Style Transfer'
        else:
            out = frame
            cv2text = 'Original'
        out = cv2.resize(out, (512, 512))
        out = cv2.putText(out, f'{cv2text}', (20, 40), cv2.FONT_HERSHEY_SIMPLEX,
                          1, (255, 255, 255), 2, cv2.LINE_AA)
        ###########################
        cv2.imshow('webcam', out)
        key = cv2.waitKey(1)
        if key == ord('q'):
            break
        elif key == ord('s'):
            if change_style:
                cv2.imwrite(save_img_name, out * 255)  # transferred frame is in [0, 1]
            else:
                cv2.imwrite(save_img_name, out)
        elif key == ord('t'):
            change_style = not change_style
    cap.release()
    cv2.destroyAllWindows()
Real-time style transfer results
Conclusion
That wraps up this round of GAN image style transfer. Training a style-transfer example on Colab is honestly a bit of a stretch: even though we only trained 100 epochs, it still ran for a bit over half a day. But GANs are models that demand patience; they won't give you great results without running for two or three days.
As for inference, the Jetson Nano still carries the load here. There is a little latency, but it's acceptable; converting the model through ONNX to TensorRT should speed things up considerably. Stay tuned for the next GAN example, or leave a comment and tell me what you'd like to see.
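As a first step down that ONNX road, exporting the trained generator is a short snippet (a sketch only; the file name and opset version are my own choices, and G_B2A/device are assumed set up as in the test script):

import torch

# trace the generator with a dummy input at the training resolution
dummy = torch.randn(1, 3, 256, 256, device=device)
torch.onnx.export(G_B2A, dummy, "netG_B2A.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)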
Appendix – Installing Torch 1.6 on the Nano
First, upgrade JetPack to 4.4, otherwise the CUDA cores won't match; the official site has an upgrade guide, so I won't repeat it here.
Update PyTorch and its dependencies to version 1.6:
$ wget https://nvidia.box.com/shared/static/yr6sjswn25z7oankw8zy1roow9cy5ur1.whl -O torch-1.6.0rc2-cp36-cp36m-linux_aarch64.whl
$ sudo apt-get install python3-pip libopenblas-base libopenmpi-dev
$ pip3 install Cython
$ pip3 install torch-1.6.0rc2-cp36-cp36m-linux_aarch64.whl
Update TorchVision to the matching version:
$ sudo apt-get install libjpeg-dev zlib1g-dev
$ git clone --branch v0.7.0 https://github.com/pytorch/vision torchvision
$ cd torchvision
$ export BUILD_VERSION=0.7.0 # where 0.x.0 is the torchvision version
$ sudo python3 setup.py install # use python3 if installing for Python 3.6
$ cd ../ # attempting to load torchvision from build dir will result in import error
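As a final sanity check (my own suggestion, not part of the official steps), confirm both packages import cleanly and CUDA is visible:
$ python3 -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"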