Toybrick

Title: Deconvolution problem when converting PyTorch to ONNX

Author: 18651669016    Time: 2020-3-2 17:21
I converted a CenterNet PyTorch model to ONNX, and loading it with rknn.load_onnx(model.onnx) fails. The model contains PyTorch's deconvolution (transposed convolution) operation, nn.ConvTranspose2d. Does RKNN not support PyTorch deconvolution?


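Author: gyq_    Time: 2020-4-1 15:53
I ran into the same problem. Has the original poster solved it?
Author: jefferyzhang    Time: 2020-4-1 15:55
This post was last edited by jefferyzhang on 2020-4-1 15:56

1. PyTorch must be version 1.2.
2. Deconvolution is supported. Turn on verbose and check what the conversion error log says.
3. Before converting the ONNX model to RKNN, load it and run inference to see whether that succeeds.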
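
Point 1 can be checked mechanically. A minimal sketch, assuming the version string follows the usual `major.minor.patch[+build]` form; the helper name `is_torch_1_2` is made up for illustration and is not part of PyTorch or rknn_toolkit:

```python
def is_torch_1_2(version: str) -> bool:
    """Return True when a version string such as '1.2.0' or '1.2.0+cpu' is a 1.2.x release."""
    # Drop any local build suffix ('+cpu', '+cu100') before splitting.
    release = version.split("+", 1)[0]
    parts = release.split(".")
    # Compare the components as strings so that '1.12.0' is not mistaken for '1.2'.
    return len(parts) >= 2 and parts[0] == "1" and parts[1] == "2"
```

In a real environment it would be called as `is_torch_1_2(torch.__version__)` before running the conversion.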
Author: gyq_    Time: 2020-4-1 16:20
I'm converting the .pt file directly.
D RKNN output shape(batchnormalize): (0 16 16 320)
D Process convolution_at_input136.1_36 ...
D RKNN output shape(convolution): (0 16 16 1280)
D Process batch_norm_at_input137.1_35 ...
D RKNN output shape(batchnormalize): (0 16 16 1280)
D Process hardtanh__at_input138.1_34 ...
D RKNN output shape(relun): (0 16 16 1280)
D Process convolution_at_input139.1_33 ...
D RKNN output shape(convolution): (0 8 8 1280)
D Process batch_norm_at_input140.1_32 ...
D RKNN output shape(batchnormalize): (0 8 8 1280)
D Process relu__at_input141.1_31 ...
D RKNN output shape(relu): (0 8 8 1280)
D Process convolution_at_input142.1_30 ...
D RKNN output shape(convolution): (0 4 4 16)
D Process batch_norm_at_input143.1_29 ...
D RKNN output shape(batchnormalize): (0 4 4 16)
D Process relu__at_input144.1_28 ...
D RKNN output shape(relu): (0 4 4 16)
D Process convolution_at_input145.1_27 ...
D RKNN output shape(convolution): (0 2 2 16)
D Process batch_norm_at_input146.1_26 ...
D RKNN output shape(batchnormalize): (0 2 2 16)
D Process relu__at_input147.1_24 ...
D RKNN output shape(relu): (0 2 2 16)
D Process convolution_at_input148.1_18 ...
D RKNN output shape(convolution): (0 2 2 64)
D Process relu__at_input149.1_12 ...
D RKNN output shape(relu): (0 2 2 64)
D Process convolution_at_hm_x.1_6 ...


D Process output_of_convolution_at_2269_5 ...
D RKNN output shape(output): (0 2 2 2)
I Build mobilenet1 complete.
D Optimizing network with force_1d_tensor, swapper, merge_layer, auto_fill_bn, resize_nearest_transformer, auto_fill_multiply, merge_avgpool_conv1x1, auto_fill_zero_bias, proposal_opt_import
D Merge ['convolution_at_input145.1_27', 'batch_norm_at_input146.1_26'] (convolution)
D Merge ['convolution_at_input142.1_30', 'batch_norm_at_input143.1_29'] (convolution)
D Merge ['convolution_at_input139.1_33', 'batch_norm_at_input140.1_32'] (convolution)
E Catch exception when loading pytorch model: /Users/gyq/DeepLearning/CenterNet_in_mac/mobilenet1.pt!
E Traceback (most recent call last):
E   File "rknn/api/rknn_base.py", line 567, in rknn.api.rknn_base.RKNNBase.load_pytorch
Load pytorch model failed!
E   File "rknn/base/RKNNlib/app/importer/import_pytorch.py", line 121, in rknn.base.RKNNlib.app.importer.import_pytorch.ImportPytorch.run
E   File "rknn/base/RKNNlib/app/helper/mergehelper.py", line 155, in rknn.base.RKNNlib.app.helper.mergehelper.MergeHelper.merge
E   File "rknn/base/RKNNlib/optimize/optimizer.py", line 299, in rknn.base.RKNNlib.optimize.optimizer.Optimizer.apply
E   File "rknn/base/RKNNlib/optimize/rules/merge_layer.py", line 81, in rknn.base.RKNNlib.optimize.rules.merge_layer.MergeLayer.apply
E   File "rknn/base/RKNNlib/optimize/rules/merge_layer.py", line 105, in rknn.base.RKNNlib.optimize.rules.merge_layer.MergeLayer._loop
E   File "rknn/base/RKNNlib/optimize/rules/merge_layer_ext_proc.py", line 86, in rknn.base.RKNNlib.optimize.rules.merge_layer_ext_proc.m_l1_bn
E ValueError: operands could not be broadcast together with shapes (4,4,16,1280) (16,)
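Author: jefferyzhang    Time: 2020-4-1 16:26
gyq_ posted on 2020-4-1 16:20
I'm converting the .pt file directly.
D RKNN output shape(batchnormalize): (0 16 16 320)
D Process convolution_at_i ...

Confirm that PyTorch is version 1.2; neither a higher nor a lower version will work.
Then confirm that this .pth model can be loaded and run inference with PyTorch 1.2.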

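Author: gyq_    Time: 2020-4-1 16:33
jefferyzhang posted on 2020-4-1 16:26
Confirm that PyTorch is version 1.2; neither a higher nor a lower version will work.
Then confirm that this .pth model can be loaded and run inference with PyTorch 1.2.
...

PyTorch is confirmed to be 1.2.0, and the model loads and runs inference.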
Author: jefferyzhang    Time: 2020-4-1 16:37
gyq_ posted on 2020-4-1 16:33
PyTorch is confirmed to be 1.2.0, and the model loads and runs inference.

Try this beta version:

rknn_toolkit v1.3.1 beta3:
Link: https://pan.baidu.com/s/1Kn2FGAdF_j3CMLNEsC3OPw Extraction code: rcds
Author: gyq_    Time: 2020-4-1 19:25
jefferyzhang posted on 2020-4-1 16:37
Try this beta version:

rknn_toolkit v1.3.1 beta3:

The same error still occurs.
Author: gyq_    Time: 2020-4-2 00:31
jefferyzhang posted on 2020-4-1 16:37
Try this beta version:

rknn_toolkit v1.3.1 beta3:
import torch
import torch.nn as nn
import os
import math
import logging
import torch.utils.model_zoo as model_zoo
import numpy as np
import cv2
from rknn.api import RKNN
import rknn.api

import torchvision.models as models


model = nn.Sequential(
    nn.ConvTranspose2d(
        in_channels=1280,
        out_channels=16,
        kernel_size=4,
        stride=2,
        padding=1,
        output_padding=0,
        bias=True),
    # nn.BatchNorm2d(16, momentum=0.1),
    nn.ReLU(inplace=True),

    nn.ConvTranspose2d(
        in_channels=16,
        out_channels=16,
        kernel_size=4,
        stride=2,
        padding=1,
        output_padding=0,
        bias=True),
    nn.BatchNorm2d(16, momentum=0.1),
    nn.ReLU(inplace=True),

    nn.ConvTranspose2d(
        in_channels=16,
        out_channels=16,
        kernel_size=4,
        stride=2,
        padding=1,
        output_padding=0,
        bias=True),
    nn.BatchNorm2d(16, momentum=0.1),
    nn.ReLU(inplace=True),
)

trace_model = torch.jit.trace(model, torch.Tensor(1, 1280, 16, 16))
trace_model.save('test_error.pt')

input_size_list = [[1280, 16, 16]]

# Create RKNN object
rknn = RKNN(verbose=True)
print('--> Loading model')
ret = rknn.load_pytorch(model='test_error.pt', input_size_list=input_size_list)


If the first BatchNorm is commented out, the conversion succeeds; if it is kept, the error above occurs. Why is that?
Author: gyq_    Time: 2020-4-2 00:41
D Process input_of_graph/out1_10 ...
D RKNN output shape(input): (0 16 16 1280)
D Process convolution_at_input0.1_9 ...
D RKNN output shape(convolution): (0 8 8 1280)
D Process batch_norm_at_input1.1_8 ...
D RKNN output shape(batchnormalize): (0 8 8 1280)
D Process relu__at_input2.1_7 ...
D RKNN output shape(relu): (0 8 8 1280)
D Process convolution_at_input3.1_6 ...
D RKNN output shape(convolution): (0 4 4 16)
D Process batch_norm_at_input4.1_5 ...
D RKNN output shape(batchnormalize): (0 4 4 16)
D Process relu__at_input5.1_4 ...
D RKNN output shape(relu): (0 4 4 16)
D Process convolution_at_input6.1_3 ...
D RKNN output shape(convolution): (0 2 2 16)
D Process batch_norm_at_input7.1_2 ...
D RKNN output shape(batchnormalize): (0 2 2 16)
D Process relu__at_135_1 ...
D RKNN output shape(relu): (0 2 2 16)
D Process output_of_relu__at_135_0 ...
D RKNN output shape(output): (0 2 2 16)
I Build test_error complete.
D Optimizing network with force_1d_tensor, swapper, merge_layer, auto_fill_bn, resize_nearest_transformer, auto_fill_multiply, merge_avgpool_conv1x1, auto_fill_zero_bias, proposal_opt_import
D Merge ['convolution_at_input6.1_3', 'batch_norm_at_input7.1_2'] (convolution)
D Merge ['convolution_at_input3.1_6', 'batch_norm_at_input4.1_5'] (convolution)
D Merge ['convolution_at_input0.1_9', 'batch_norm_at_input1.1_8'] (convolution)
E Catch exception when loading pytorch model: test_error.pt!
E Traceback (most recent call last):
E   File "rknn/api/rknn_base.py", line 567, in rknn.api.rknn_base.RKNNBase.load_pytorch
E   File "rknn/base/RKNNlib/app/importer/import_pytorch.py", line 121, in rknn.base.RKNNlib.app.importer.import_pytorch.ImportPytorch.run
E   File "rknn/base/RKNNlib/app/helper/mergehelper.py", line 155, in rknn.base.RKNNlib.app.helper.mergehelper.MergeHelper.merge
E   File "rknn/base/RKNNlib/optimize/optimizer.py", line 299, in rknn.base.RKNNlib.optimize.optimizer.Optimizer.apply
E   File "rknn/base/RKNNlib/optimize/rules/merge_layer.py", line 81, in rknn.base.RKNNlib.optimize.rules.merge_layer.MergeLayer.apply
E   File "rknn/base/RKNNlib/optimize/rules/merge_layer.py", line 105, in rknn.base.RKNNlib.optimize.rules.merge_layer.MergeLayer._loop
E   File "rknn/base/RKNNlib/optimize/rules/merge_layer_ext_proc.py", line 86, in rknn.base.RKNNlib.optimize.rules.merge_layer_ext_proc.m_l1_bn
E ValueError: operands could not be broadcast together with shapes (4,4,16,1280) (16,)
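
One plausible reading of this traceback, offered as an assumption since the thread never confirms the cause: the merge step multiplies the convolution weight by a per-output-channel BatchNorm scale of shape (16,). NumPy aligns shapes from the last axis, so that only works when the weight's last axis is 16. Here the last axis is 1280, the first layer's in_channels, which would also explain why the two later layers (16 in, 16 out) merge without complaint. The sketch below re-implements NumPy's broadcasting rule in plain Python to show the difference; `broadcast_shape` is an illustrative helper, not rknn_toolkit code.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of a and b under NumPy's rules, or raise ValueError."""
    result = []
    # Align shapes on their trailing dimensions; missing leading dimensions count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(
                "operands could not be broadcast together with shapes %s %s" % (a, b)
            )
    return tuple(reversed(result))


# Layers with in_channels == out_channels == 16: the scale lines up with the last axis.
print(broadcast_shape((4, 4, 16, 16), (16,)))

# First layer (1280 in, 16 out): the last axis is 1280, so a (16,) scale cannot broadcast.
try:
    broadcast_shape((4, 4, 16, 1280), (16,))
except ValueError as err:
    print(err)
```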

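Author: jefferyzhang    Time: 2020-4-2 08:21
Package the model and the conversion script, upload them to Baidu Netdisk, and send me the link. I will file a bug with the NPU team.
Author: gyq_    Time: 2020-4-2 09:15
jefferyzhang posted on 2020-4-2 08:21
Package the model and the conversion script, upload them to Baidu Netdisk, and send me the link. I will file a bug with the NPU team.

Link: https://pan.baidu.com/s/1jGD54_j_n0o6dj-xF-8SCw  Password: wor0
Author: jefferyzhang    Time: 2020-4-2 09:22
gyq_ posted on 2020-4-2 09:15
Link: https://pan.baidu.com/s/1jGD54_j_n0o6dj-xF-8SCw  Password: wor0

Your problem has been reported.
As for the original poster's problem, where has the original poster gone?
Author: gyq_    Time: 2020-4-2 09:51
jefferyzhang posted on 2020-4-2 09:22
Your problem has been reported.
As for the original poster's problem, where has the original poster gone?

One more question: is the deconvolution being executed incorrectly here? The correct h and w should go from 16 to 32 to 64 to 128, but instead they shrink down to 2.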
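
For reference, the shapes in the log match what an ordinary strided convolution would produce rather than a transposed one. A small sketch using the output-size formulas from the PyTorch documentation for nn.Conv2d and nn.ConvTranspose2d, with the layer parameters from the script above (kernel_size=4, stride=2, padding=1); the helper names are illustrative only:

```python
def conv_out(size, kernel=4, stride=2, padding=1, dilation=1):
    """Spatial output size of nn.Conv2d."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0, dilation=1):
    """Spatial output size of nn.ConvTranspose2d."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def chain(fn, size, layers=3):
    """Apply the same size formula `layers` times and collect every intermediate size."""
    sizes = [size]
    for _ in range(layers):
        sizes.append(fn(sizes[-1]))
    return sizes


print("ConvTranspose2d:", chain(deconv_out, 16))  # what PyTorch computes
print("Conv2d:         ", chain(conv_out, 16))    # what the RKNN log reports
```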
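Author: jefferyzhang    Time: 2020-4-2 09:54
gyq_ posted on 2020-4-2 09:51
One more question: is the deconvolution being executed incorrectly here? The correct h and w should go from 16 to 32 to 64 to 128, but instead they shrink down to 2 ...

I can't tell from this. You can follow the troubleshooting document and try dumping each layer's result to check whether it matches expectations.
Author: gyq_    Time: 2020-4-3 11:54
I also tried converting to ONNX first and then to RKNN, and found a problem: when a deconvolution layer's input channel count differs from its output channel count, the following error occurs: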
D RKNN output shape(convolution): (1 16 16 1280)
D Real output shape: (1, 16, 16, 1280)
D Process ConvTranspose_30_9 ...
D RKNN output shape(deconvolution): (1 32 32 1280)
E Catch exception when building RKNN model!
E Traceback (most recent call last):
E   File "rknn/api/rknn_base.py", line 737, in rknn.api.rknn_base.RKNNBase.build
E   File "rknn/api/rknn_base.py", line 1644, in rknn.api.rknn_base.RKNNBase._quantize2
E   File "rknn/base/RKNNlib/app/medusa/quantization.py", line 105, in rknn.base.RKNNlib.app.medusa.quantization.Quantization.run
E   File "rknn/base/RKNNlib/app/medusa/quantization.py", line 44, in rknn.base.RKNNlib.app.medusa.quantization.Quantization._run_quantization
E   File "rknn/base/RKNNlib/app/medusa/workspace.py", line 129, in rknn.base.RKNNlib.app.medusa.workspace.Workspace.run
E   File "rknn/base/RKNNlib/app/medusa/workspace.py", line 99, in rknn.base.RKNNlib.app.medusa.workspace.Workspace._setup_graph
E   File "rknn/base/RKNNlib/app/medusa/workspace.py", line 100, in rknn.base.RKNNlib.app.medusa.workspace.Workspace._setup_graph
E   File "rknn/base/RKNNlib/RKNNnetbuilder.py", line 274, in rknn.base.RKNNlib.RKNNnetbuilder.RKNNNetBuilder.build
E   File "rknn/base/RKNNlib/RKNNnetbuilder.py", line 278, in rknn.base.RKNNlib.RKNNnetbuilder.RKNNNetBuilder.build
E   File "rknn/base/RKNNlib/RKNNnetbuilder.py", line 305, in rknn.base.RKNNlib.RKNNnetbuilder.RKNNNetBuilder.build_layer
E   File "rknn/base/RKNNlib/RKNNnetbuilder.py", line 305, in rknn.base.RKNNlib.RKNNnetbuilder.RKNNNetBuilder.build_layer
E   File "rknn/base/RKNNlib/RKNNnetbuilder.py", line 305, in rknn.base.RKNNlib.RKNNnetbuilder.RKNNNetBuilder.build_layer
E   [Previous line repeated 2 more times]
E   File "rknn/base/RKNNlib/RKNNnetbuilder.py", line 331, in rknn.base.RKNNlib.RKNNnetbuilder.RKNNNetBuilder.build_layer
E   File "rknn/base/RKNNlib/RKNNnetbuilder.py", line 336, in rknn.base.RKNNlib.RKNNnetbuilder.RKNNNetBuilder.build_layer
E   File "rknn/base/RKNNlib/layer/RKNNlayer.py", line 287, in rknn.base.RKNNlib.layer.RKNNlayer.RKNNLayer.compute_tensor
E   File "rknn/base/RKNNlib/layer/deconvolution.py", line 100, in rknn.base.RKNNlib.layer.deconvolution.Deconvolution.compute_out_tensor
E   File "/Users/gyq/DeepLearning/CenterNet_in_mac/venv/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1242, in conv2d_transpose
E     filter.get_shape()[2]))
E ValueError: output_shape does not match filter's output channels, 1280 != 16


If the two channel counts are equal, the error does not occur.
import torch
import torch.nn as nn
import cv2
from rknn.api import RKNN

channels = 1280

model = nn.Sequential(
    nn.Conv2d(
        3, 32,
        kernel_size=4, stride=2,
        padding=1, dilation=1, bias=False),
    nn.Conv2d(
        32, 32,
        kernel_size=4, stride=2,
        padding=1, dilation=1, bias=False),
    nn.Conv2d(
        32, 32,
        kernel_size=4, stride=2,
        padding=1, dilation=1, bias=False),
    nn.Conv2d(
        32, 32,
        kernel_size=4, stride=2,
        padding=1, dilation=1, bias=False),
    nn.Conv2d(
        32, channels,
        kernel_size=4, stride=2,
        padding=1, dilation=1, bias=False),
    # When ConvTranspose2d's input channel count differs from its output
    # channel count, the build step fails.
    # Adding a Conv2d layer to convert the channels first makes it run.
    # nn.Conv2d(
    #     channels, 16,
    #     kernel_size=3, stride=1,
    #     padding=1, dilation=1, bias=False),
    nn.ConvTranspose2d(
        in_channels=channels,
        # in_channels=16,
        out_channels=16,
        kernel_size=4,
        stride=2,
        padding=1,
        output_padding=0,
        bias=True),
    nn.BatchNorm2d(16, momentum=0.1),
    nn.ReLU(inplace=True),

    nn.ConvTranspose2d(
        in_channels=16,
        out_channels=16,
        kernel_size=4,
        stride=2,
        padding=1,
        output_padding=0,
        bias=False),
    nn.BatchNorm2d(16, momentum=0.1),
    nn.ReLU(inplace=True),

    nn.ConvTranspose2d(
        in_channels=16,
        out_channels=16,
        kernel_size=4,
        stride=2,
        padding=1,
        output_padding=0,
        bias=False),
    nn.BatchNorm2d(16, momentum=0.1),
    nn.ReLU(inplace=True),
)

# Save as ONNX
torch.onnx.export(model,
                  torch.Tensor(1, 3, 512, 512),
                  'test_error2.onnx',
                  verbose=True,
                  do_constant_folding=False,  # whether to apply constant-folding optimization
                  input_names=["input"],  # input name
                  output_names=["output"],  # output name
                  )

input_size_list = [[3, 512, 512]]

# Create RKNN object
rknn = RKNN(verbose=True)
print('--> Loading model')

ret = rknn.load_onnx(model='test_error2.onnx')
print('--> building model')
ret = rknn.build(do_quantization=True, dataset='dataset.txt')

print('--> export model')
ret = rknn.export_rknn('test_error2.rknn')

ret = rknn.load_rknn('test_error2.rknn')

img = cv2.imread('space_shuttle_512.jpg')
ret = rknn.init_runtime()

print('--> inference')
outputs = rknn.inference(inputs=[img])
print(outputs[0].shape)

Author: gyq_    Time: 2020-4-3 11:55
What is the cause of this?
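
A hedged reading of the log, not confirmed anywhere in the thread: the toolkit reports the deconvolution output as (1 32 32 1280), keeping the input channel count, while the filter carries 16 output channels. TensorFlow's conv2d_transpose expects an NHWC output shape and a filter laid out as (height, width, output_channels, in_channels), and it rejects the call when the two channel counts disagree. The sketch below mimics only that check in plain Python; both function names are illustrative and are neither TensorFlow nor rknn_toolkit code.

```python
def torch_deconv_weight_to_tf(weight_shape):
    """Reorder a ConvTranspose2d weight shape (C_in, C_out, kH, kW) to (kH, kW, C_out, C_in)."""
    c_in, c_out, k_h, k_w = weight_shape
    return (k_h, k_w, c_out, c_in)


def check_conv2d_transpose(output_shape, filter_shape):
    """Mimic the output-channel check of conv2d_transpose for NHWC data.

    output_shape is (N, H, W, C_out); filter_shape is (kH, kW, C_out, C_in).
    """
    if output_shape[3] != filter_shape[2]:
        raise ValueError(
            "output_shape does not match filter's output channels, "
            "{} != {}".format(output_shape[3], filter_shape[2])
        )
    return True


# Weight shape of nn.ConvTranspose2d(in_channels=1280, out_channels=16, kernel_size=4).
filter_shape = torch_deconv_weight_to_tf((1280, 16, 4, 4))

# Output shape with the expected 16 channels passes the check.
print(check_conv2d_transpose((1, 32, 32, 16), filter_shape))

# Output shape as printed in the log, with 1280 channels, is rejected.
try:
    check_conv2d_transpose((1, 32, 32, 1280), filter_shape)
except ValueError as err:
    print(err)
```

When in_channels equals out_channels the two counts coincide, which is consistent with the observation that equal channel counts avoid the error.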
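Author: gyq_    Time: 2020-4-3 14:26
jefferyzhang posted on 2020-4-2 09:54
I can't tell from this. You can follow the troubleshooting document and try dumping each layer's result to check whether it matches expectations ...

I dumped the outputs and checked them; they are indeed the opposite of what is expected.
I also just tried making the first deconvolution layer's input and output channel counts equal, saving the model as .pt, and converting it to RKNN. That runs, but the final inference output is still the opposite of what is expected (the deconvolution shrinks the feature map at every layer).
Author: jefferyzhang    Time: 2020-4-3 14:55
gyq_ posted on 2020-4-3 14:26
I dumped the outputs and checked them; they are indeed the opposite of what is expected.
I also just tried making the first deconvolution layer's input and output channel counts ...

Try beta7:
Link: https://pan.baidu.com/s/1DuLeBawfoBP62mu0ADNGAA
Extraction code: qtys
Author: gyq_    Time: 2020-4-3 15:49
jefferyzhang posted on 2020-4-3 14:55
Try beta7:
Link: https://pan.baidu.com/s/1DuLeBawfoBP62mu0ADNGAA
Extraction code: qtys

It converts correctly now. Thank you very much!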



