Toybrick

Title: PyTorch model converts to ONNX and runs in onnxruntime, but conversion to RKNN fails

Author: kkkaaa    Time: 2020-4-22 17:29
Last edited by kkkaaa at 2020-4-22 17:42

The PyTorch model converts to ONNX successfully, the ONNX model runs in onnxruntime, and the ONNX and PyTorch models produce the same inference results.

However, converting to RKNN fails with the following error:

E Try match Gather_1322:out0 failed, catch exception!
W ----------------Warning(1)----------------
E Catch exception when loading onnx model: ../debug_log/truncated_debug_d0_num_classes80.onnx!
E Traceback (most recent call last):
E   File "rknn/base/RKNNlib/converter/convert_onnx.py", line 1071, in rknn.base.RKNNlib.converter.convert_onnx.convert_onnx.match_paragraph_and_param
E   File "rknn/base/RKNNlib/converter/convert_onnx.py", line 980, in rknn.base.RKNNlib.converter.convert_onnx.convert_onnx._onnx_push_ready_tensor
E TypeError: 'NoneType' object is not iterable
E During handling of the above exception, another exception occurred:
E Traceback (most recent call last):
E   File "rknn/api/rknn_base.py", line 513, in rknn.api.rknn_base.RKNNBase.load_onnx
E   File "rknn/base/RKNNlib/converter/convert_onnx.py", line 1077, in rknn.base.RKNNlib.converter.convert_onnx.convert_onnx.match_paragraph_and_param
E   File "rknn/api/rknn_log.py", line 312, in rknn.api.rknn_log.RKNNLog.e
E ValueError: Try match Gather_1322:out0 failed, catch exception!
Load effdet_d0 failed!
Thanks!


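Update:
I am trying to pin down the problem by changing what the model returns. After changing the outputs, the error message changed (the ONNX and PyTorch models still produce the same inference results):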
--> Building model
W The target_platform is not set in config, using default target platform rk1808.
W Genreate input meta fail, please check model.
W External input meta file "/tmp/tmpl2_5vz4x/torchjitexport_inputmeta.yml" is not exists.
Traceback (most recent call last):
  File "test_no_quant.py", line 83, in <module>
    ret = rknn.build(do_quantization=False, dataset='./dataset.txt')
  File "/data01/wens/venv/rknn/lib/python3.6/site-packages/rknn/api/rknn.py", line 240, in build
    ret = self.rknn_base.build(do_quantization=do_quantization, dataset=dataset, pack_vdata=pre_compile, batch_size=rknn_batch_size)
  File "rknn/api/rknn_base.py", line 791, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 2328, in rknn.api.rknn_base.RKNNBase._generate_inputmeta
IndexError: list index out of range


Author: kkkaaa    Time: 2020-4-22 17:30
Not match tensor Gather_1322:out0



Author: jefferyzhang    Time: 2020-4-22 17:56
kkkaaa posted on 2020-4-22 17:30

Not match tensor Gather_1322:out0
That means we can't recognize this odd Gather_1322 op.
Please check what this op actually is...
Author: kkkaaa    Time: 2020-4-22 17:57
jefferyzhang posted on 2020-4-22 17:56
Not match tensor Gather_1322:out0
That means we can't recognize this odd Gather_1322 op.
Please check what this op ...

How do I locate this op? Thanks.
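Author: kkkaaa    Time: 2020-4-22 17:58
jefferyzhang posted on 2020-4-22 17:56
Not match tensor Gather_1322:out0
That means we can't recognize this odd Gather_1322 op.
Please check what this op ...

Also... there is another error in the original post. Could you help me figure out what the problem is?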
Author: jefferyzhang    Time: 2020-4-22 18:06
kkkaaa posted on 2020-4-22 17:58
Also... there is another error in the original post. Could you help me figure out what the problem is?

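The second error is probably caused by how you specified the outputs: the output index cannot be found.
For the first error, just open the model in Netron and look at the node's name. We can only convert common, standard ops; if you wrote some strange custom op, we have no way of knowing what it means. In that case you either implement it through RKNN's custom-op mechanism or modify the model.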

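Judging from what you have here, it is quite possible that even ONNX cannot recognize this op, so it was given an odd name and the torch op was simply copied over.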

Does converting directly from PyTorch to RKNN have problems too?
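Author: kkkaaa    Time: 2020-4-22 19:20
jefferyzhang posted on 2020-4-22 18:06
The second error is probably caused by how you specified the outputs: the output index cannot be found.
For the first error, just open the model in Netron and look at the node's name ...

Converting directly from PyTorch to RKNN gives a different error (in fact, the same as the second error in the original post):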

W:tensorflow:From /data01/wens/venv/rknn/lib/python3.6/site-packages/onnx_tf/handlers/backend/upsample.py:13: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

/data01/wens/venv/rknn/lib/python3.6/site-packages/onnx_tf/common/__init__.py:87: UserWarning: FrontendHandler.get_outputs_names is deprecated. It will be removed in future release.. Use node.outputs instead.
  warnings.warn(message)
./truncated_jit_trace_d0_num_classes80.pt ********************
WARNING: Token 'COMMENT' defined, but not used
WARNING: There is 1 unused token
Syntax error in input! LexToken(NAMED_IDENTIFIER,'str',3,103)
done
--> Building model
W The target_platform is not set in config, using default target platform rk1808.
W Genreate input meta fail, please check model.
W External input meta file "/tmp/tmpt2y3w943/truncated_jit_trace_d0_num_classes80_inputmeta.yml" is not exists.
Traceback (most recent call last):
  File "convert_pytorch_to_rknn.py", line 80, in <module>
    ret = rknn.build(do_quantization=False, dataset='./dataset.txt')  
  File "/data01/wens/venv/rknn/lib/python3.6/site-packages/rknn/api/rknn.py", line 240, in build
    ret = self.rknn_base.build(do_quantization=do_quantization, dataset=dataset, pack_vdata=pre_compile, batch_size=rknn_batch_size)
  File "rknn/api/rknn_base.py", line 791, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/rknn_base.py", line 2328, in rknn.api.rknn_base.RKNNBase._generate_inputmeta
IndexError: list index out of range
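Author: kkkaaa    Time: 2020-4-22 20:22
jefferyzhang posted on 2020-4-22 18:06
The second error is probably caused by how you specified the outputs: the output index cannot be found.
For the first error, just open the model in Netron and look at the node's name ...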

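I changed the model outputs again, this time returning the backbone features directly, and found that:
Converting PyTorch to ONNX, and then ONNX to RKNN, both succeed.
PyTorch and ONNX give the same inference results.
However, the RKNN results differ greatly from the ONNX results (there are several feature levels; the first level differs only slightly, the later levels differ a lot).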
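When comparing RKNN outputs against ONNX outputs level by level, one summary number per feature map makes it easier to see where the divergence starts. The following is a minimal NumPy-only sketch; the helper name `compare_outputs` is hypothetical, and how the two lists of arrays are obtained (onnxruntime on one side, the RKNN runtime on the other) is left out. It assumes both sides have already been put into the same layout and order, because flattening arrays with mismatched layouts (for example NCHW vs NHWC) would give misleading numbers.

```python
import numpy as np


def compare_outputs(reference, candidate, names=None):
    """Compare two lists of output arrays, one entry per feature level.

    Returns (name, cosine_similarity, max_abs_diff) for each level.
    Arrays are flattened, so both sides must already share the same
    layout and element order.
    """
    if len(reference) != len(candidate):
        raise ValueError(f"got {len(reference)} reference vs {len(candidate)} candidate outputs")
    if names is None:
        names = [f"output_{i}" for i in range(len(reference))]
    report = []
    for name, ref, cand in zip(names, reference, candidate):
        ref = np.asarray(ref, dtype=np.float64).ravel()
        cand = np.asarray(cand, dtype=np.float64).ravel()
        if ref.size != cand.size:
            raise ValueError(f"{name}: {ref.size} vs {cand.size} elements")
        denom = np.linalg.norm(ref) * np.linalg.norm(cand)
        # Cosine similarity near 1.0 means the same "shape" of activations.
        cosine = float(ref @ cand / denom) if denom > 0 else float("nan")
        max_abs_diff = float(np.max(np.abs(ref - cand))) if ref.size else 0.0
        report.append((name, cosine, max_abs_diff))
    return report
```

A cosine similarity that stays close to 1.0 while the maximum difference grows points to a scale problem. If cosine similarity itself drops from one level to the next, the first level where it falls is a good place to look more closely.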

Converting from PyTorch to RKNN directly via torch.jit.trace fails with the following error:
WARNING: Token 'COMMENT' defined, but not used
WARNING: There is 1 unused token
E Catch exception when loading pytorch model: ./truncated_jit_trace_d0_num_classes80.pt!
E Traceback (most recent call last):
E   File "rknn/api/rknn_base.py", line 611, in rknn.api.rknn_base.RKNNBase.load_pytorch
E   File "rknn/base/RKNNlib/app/importer/import_pytorch.py", line 97, in rknn.base.RKNNlib.app.importer.import_pytorch.ImportPytorch.run
E   File "rknn/base/RKNNlib/converter/convert_pytorch.py", line 573, in rknn.base.RKNNlib.converter.convert_pytorch.convert_pytorch.__init__
E   File "rknn/base/RKNNlib/converter/convert_pytorch.py", line 657, in rknn.base.RKNNlib.converter.convert_pytorch.convert_pytorch.model_simplify
E   File "rknn/base/RKNNlib/converter/convert_pytorch.py", line 113, in rknn.base.RKNNlib.converter.convert_pytorch.torch_inference_engine.shape_pick
E   File "rknn/base/RKNNlib/converter/convert_pytorch.py", line 148, in rknn.base.RKNNlib.converter.convert_pytorch.torch_inference_engine.__ir_shape_inference
E   File "rknn/base/RKNNlib/converter/convert_pytorch.py", line 256, in rknn.base.RKNNlib.converter.convert_pytorch.torch_inference_engine.convolution_shape
E   File "rknn/base/RKNNlib/converter/convert_pytorch.py", line 113, in rknn.base.RKNNlib.converter.convert_pytorch.torch_inference_engine.shape_pick
E   File "rknn/base/RKNNlib/converter/convert_pytorch.py", line 148, in rknn.base.RKNNlib.converter.convert_pytorch.torch_inference_engine.__ir_shape_inference
E KeyError: 'aten::constant_pad_nd'
Load pytorch model failed!

Could the problem be in the ONNX to RKNN conversion?
Thanks

Author: jefferyzhang    Time: 2020-4-22 22:47
E KeyError: 'aten::constant_pad_nd'
This means the op is not supported.
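To see which ops a traced model contains before handing it to RKNN, and so spot ones like `aten::constant_pad_nd` early, you can walk the TorchScript graph directly. This sketch assumes a PyTorch version where traced modules expose `inlined_graph`, and the helper name `traced_op_kinds` is hypothetical. Which op kinds RKNN-Toolkit accepts must be checked against its own supported-op list, which is not reproduced here. As far as I know, `aten::constant_pad_nd` is PyTorch's constant-padding op (what `F.pad` in constant mode traced to in releases of that period); newer PyTorch releases may record padding under a different name.

```python
from collections import Counter

import torch


def traced_op_kinds(module: torch.nn.Module, example_input: torch.Tensor) -> Counter:
    """Trace `module` with `example_input` and count the TorchScript
    op kinds (e.g. 'aten::relu') in the inlined graph."""
    module.eval()
    with torch.no_grad():
        traced = torch.jit.trace(module, example_input)
    # inlined_graph expands submodule calls so their ops are visible too.
    return Counter(node.kind() for node in traced.inlined_graph.nodes())
```

For example, `sorted(traced_op_kinds(model, torch.randn(1, 3, 512, 512)))` lists every op kind. The input size here is only a placeholder and must match what the model expects.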
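Large differences can be debugged by printing the output of each layer. If it is a quantized model, check whether the unquantized model gives the same results.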




Welcome to Toybrick (https://t.rock-chips.com/) Powered by Discuz! X3.3