rknn 1.3
D RKNNAPI: API: 1.3.0
D RKNNAPI: DRV: 1.3.0
torch 1.2.0
tensorflow 1.14.0
onnx 1.4.1
onnx-tf 1.2.1
The goal is to export and test an LSTM computation.
Full code:
File names:
xxx-pt builds the model with PyTorch, xxx-tf with TensorFlow;
xxx-run skips model creation/conversion and only loads the .rknn and runs it (a minimal sketch of this flow is given below);
op-check-xx tests a single operator.
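For reference, a minimal sketch of what an xxx-run style script presumably looks like, assuming the RKNN-Toolkit 1.3 Python API (load_rknn / init_runtime / inference); the file name lstm.rknn and the input shapes are placeholders, not the actual values:

import numpy as np
from rknn.api import RKNN

rknn = RKNN()
# Load an already converted model instead of rebuilding it.
ret = rknn.load_rknn('./lstm.rknn')                  # placeholder path
ret = rknn.init_runtime()                            # pass target='rk1808' etc. when running on a device
x  = np.random.rand(1, 4, 64).astype(np.float32)     # placeholder shapes
h0 = np.random.rand(1, 1, 64).astype(np.float32)
c0 = np.random.rand(1, 1, 64).astype(np.float32)
outputs = rknn.inference(inputs=[x, h0, c0])
rknn.release()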
The PyTorch model definition code:
import platform
import os
import torch
import numpy as np
from rknn.api import RKNN
import onnx
from onnx_tf.backend import prepare


class LstmNode(torch.nn.Module):
    """One gate of the LSTM cell: Linear(x) + Linear(h)."""
    def __init__(self, input_size, hidden_size):
        super(LstmNode, self).__init__()
        self._fc_x = torch.nn.Linear(input_size, hidden_size)
        self._fc_hc = torch.nn.Linear(hidden_size, hidden_size)

    def forward(self, x, hc0):
        a = self._fc_x(x)
        b = self._fc_hc(hc0)
        return a + b


class HardTanh(torch.nn.Module):
    def __init__(self):
        super(HardTanh, self).__init__()

    def forward(self, x):
        return x * 0.5 + 0.5


class PicewiseLinear(torch.nn.Module):
    def __init__(self, in_size, out_size):
        super(PicewiseLinear, self).__init__()
        self._linear = torch.nn.Linear(in_size, out_size)

    def forward(self, x):
        ox = self._linear(x)
        ox = ox.clamp_(0, 1)
        return ox


class LstmUnit(torch.nn.Module):
    def __init__(self, input_size, hidden_size):
        super(LstmUnit, self).__init__()
        self._tanh = torch.nn.Hardtanh()  # torch.nn.Tanh()
        self._sigmoid = torch.nn.Sigmoid()
        self._fc_it = LstmNode(input_size, hidden_size)
        self._fc_ft = LstmNode(input_size, hidden_size)
        self._fc_gt = LstmNode(input_size, hidden_size)
        self._fc_ot = LstmNode(input_size, hidden_size)

    def forward(self, x, h0, c0):
        # # With _tanh replaced by _sigmoid: loading via onnx computes wrong results;
        # # load_pytorch converts, but model initialization fails.
        # it = self._sigmoid(self._fc_it(x, h0))
        # ft = self._sigmoid(self._fc_ft(x, h0))
        # gt = self._sigmoid(self._fc_gt(x, h0))  # self._tanh
        # ot = self._sigmoid(self._fc_ot(x, h0))
        # ct = ft * c0 + it * gt
        # ht = ot * self._sigmoid(ct)  # ot * self._tanh(ct)

        # With sigmoid/tanh removed, only matrix multiply/add remain;
        # loading via onnx computes wrong results, load_pytorch is correct.
        it = self._fc_it(x, h0)
        ft = self._fc_ft(x, h0)
        gt = self._fc_gt(x, h0)  # self._tanh
        ot = self._fc_ot(x, h0)
        ct = ft * c0 + it * gt
        ht = ot * ct  # ot * self._tanh(ct)
        return ot, ht, ct


class LSTM(torch.nn.Module):
    def __init__(self, seq_len, input_size, hidden_size):
        super(LSTM, self).__init__()
        self._seq_len = seq_len
        self._input_size = input_size
        self._hidden_size = hidden_size
        # self._lstm = torch.nn.LSTM(input_size, hidden_size, num_layers=1, batch_first=True)
        # self._fc = torch.nn.Linear(input_size, hidden_size)
        self._lstm_unit = LstmUnit(input_size, hidden_size)

    def forward(self, x, h0, c0):
        # Single application of the cell to the flattened inputs.
        ix = x.view(1, -1)
        ih = h0.view(1, -1)
        ic = c0.view(1, -1)
        y, oh, oc = self._lstm_unit(ix, ih, ic)
        oy = y.view(1, self._seq_len, -1)
        oh = oh.view(1, 1, -1)
        oc = oc.view(1, 1, -1)

        # This loop generates a select operator; rknn.load_pytorch raises an error on it.
        # ix = x.view(1, self._seq_len, -1)
        # ih = h0.view(1, -1)
        # ic = c0.view(1, -1)
        # y = []
        # for i in range(self._seq_len):
        #     xt = ix[0][i].view(1, -1)
        #     yt, ih, ic = self._lstm_unit(xt, ih, ic)
        #     y.append(yt)
        # oy = torch.cat(y)
        # oh = ih.view(1, 1, -1)
        # oc = ic.view(1, 1, -1)

        # PyTorch built-in LSTM
        # ix = x.view(1, self._seq_len, -1)
        # ih = h0.view(1, 1, -1)
        # ic = c0.view(1, 1, -1)
        # oy, (oh, oc) = self._lstm(ix, (ih, ic))
        return oy, oh, oc
        # return self._sigmoid(ix), self._sigmoid(ih), self._sigmoid(ic)
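The conversion and inference part of the scripts is not pasted above. As context for the experiments below, here is a rough sketch of the load_pytorch route, with the onnx route as a commented alternative; sizes, file names, and the exact input_size_list layout are assumptions, not the actual script:

import torch
import numpy as np
from rknn.api import RKNN

seq_len, input_size, hidden_size = 4, 64, 64         # placeholder sizes

net = LSTM(seq_len, input_size, hidden_size)
net.eval()
x  = torch.randn(1, seq_len, input_size)
h0 = torch.randn(1, 1, hidden_size)
c0 = torch.randn(1, 1, hidden_size)

# load_pytorch expects a TorchScript model, so trace and save it first.
trace = torch.jit.trace(net, (x, h0, c0))
trace.save('./lstm.pt')

rknn = RKNN()
# rknn.config(...) omitted; one input_size_list entry per model input
# (layout guessed from the toolkit examples, check the docs).
ret = rknn.load_pytorch(model='./lstm.pt',
                        input_size_list=[[seq_len, input_size],
                                         [1, hidden_size],
                                         [1, hidden_size]])
# The onnx route mentioned in the experiments below, as an alternative:
# torch.onnx.export(net, (x, h0, c0), './lstm.onnx',
#                   input_names=['x', 'h0', 'c0'],
#                   output_names=['y', 'hn', 'cn'])
# ret = rknn.load_onnx(model='./lstm.onnx')
ret = rknn.build(do_quantization=False)
ret = rknn.export_rknn('./lstm.rknn')
ret = rknn.init_runtime()
outputs = rknn.inference(inputs=[x.numpy(), h0.numpy(), c0.numpy()])
rknn.release()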
Experiments:
1. Exporting PyTorch's own nn.LSTM directly: .load_pytorch succeeds, but .build raises an error.
2. Exporting a single-layer dynamic LSTM from TensorFlow: unsupported operator TensorArrayGatherV3; if the output node is moved up, the error becomes: E AttributeError: 'NoneType' object has no attribute 'op' (a minimal graph that reproduces this is sketched after the experiment list).
3. Building the LSTM cell and its iteration from PyTorch's elementary operators, i.e. the code posted above:
(1) tanh raises an error: E KeyError: 'aten::tanh';
(2) with the tanh calls removed: load_pytorch can convert the model to rknn, but init_runtime raises an error;
(3) with sigmoid removed as well, leaving only Linear0 + Linear1 plus element-wise multiply and add: load_pytorch -> rknn gives correct results.
All of the steps above also convert successfully via onnx -> rknn, but with multiple inputs the input order gets scrambled, and even after rearranging the inputs to match the scrambled order, the results of the onnx-converted model are incorrect.
Also for onnx -> rknn: the onnx model's inputs and outputs are named, but after conversion to rknn, the input names queried through the C API at runtime are empty.
(4) On top of (3), keeping only multiply and add and using a for loop to iterate the basic LSTM cell over the sequence length:
load_pytorch raises: E KeyError: 'aten::select';
load_onnx converts, but build raises: E ValueError: Try match Gather_26ut0 failed, catch exception!
4. Tests of individual operators:
load_tensorflow -> RKNN supports both tanh and sigmoid, and the computed results are correct (a sketch of this kind of single-operator check is also given after the list).
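Regarding experiment 2, a minimal TensorFlow 1.14 graph of the kind that triggers the TensorArrayGatherV3 error: tf.nn.dynamic_rnn unrolls through a while loop with TensorArray ops, and gathering the per-step outputs is the node RKNN rejects. Node names, shapes, and the input_size_list layout below are placeholders:

import tensorflow as tf
from tensorflow.python.framework import graph_util
from rknn.api import RKNN

seq_len, input_size, hidden_size = 4, 64, 64          # placeholder sizes

x = tf.placeholder(tf.float32, [1, seq_len, input_size], name='x')
cell = tf.nn.rnn_cell.LSTMCell(hidden_size)
# dynamic_rnn builds a while_loop + TensorArray; collecting all time
# steps produces the TensorArrayGatherV3 op.
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
y = tf.identity(outputs, name='y')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    frozen = graph_util.convert_variables_to_constants(sess, sess.graph_def, ['y'])
    with tf.gfile.GFile('./lstm_tf.pb', 'wb') as f:
        f.write(frozen.SerializeToString())

rknn = RKNN()
ret = rknn.load_tensorflow(tf_pb='./lstm_tf.pb',
                           inputs=['x'], outputs=['y'],
                           input_size_list=[[seq_len, input_size]])
ret = rknn.build(do_quantization=False)    # conversion reports the unsupported op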
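The single-operator checks in item 4 can be reproduced along these lines: build a one-op TensorFlow graph, convert it with load_tensorflow, and compare the rknn output against a numpy reference. A sketch with placeholder names and shapes (swap tf.tanh for tf.sigmoid to check sigmoid):

import numpy as np
import tensorflow as tf
from rknn.api import RKNN

# One-operator graph: a single tanh.
x = tf.placeholder(tf.float32, [1, 16], name='x')
y = tf.tanh(x, name='y')

with tf.Session() as sess:
    with tf.gfile.GFile('./op_tanh.pb', 'wb') as f:
        f.write(sess.graph_def.SerializeToString())    # no variables, so no freezing needed

rknn = RKNN()
rknn.load_tensorflow(tf_pb='./op_tanh.pb', inputs=['x'], outputs=['y'],
                     input_size_list=[[1, 16]])        # layout guessed; check the toolkit docs
rknn.build(do_quantization=False)
rknn.init_runtime()

data = np.random.rand(1, 16).astype(np.float32)
out = rknn.inference(inputs=[data])
# Compare against the numpy reference.
print(np.max(np.abs(out[0].reshape(data.shape) - np.tanh(data))))
rknn.release()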
A few questions:
1. Apart from writing a custom operator, is there any other way to run an LSTM?
2. Could support for a two-layer bidirectional dynamic LSTM be added in the next version? Its main computation is Linear, so the NPU should still give a useful speedup. Building it out of custom operators would add scheduling overhead, whereas implementing the operator in the low-level driver should be more efficient.
3. When writing custom operators, could the NPU-accelerated functionality be wrapped in something like NEON.h (intrinsics), so that we don't have to write raw instructions?
4. What is the full name of the PPU module mentioned in the custom-operator documentation? Is there more detailed material on it? Which operations can this module accelerate?