PS: This question is now obsolete; I'm keeping it only as a record of my debugging process.
Today I was deploying a classification network on the RK1808 platform and the results came out wrong. Going by past experience with other acceleration frameworks, that almost always means something is off with the network input.
First, the known-correct output of the network is 0.958134 0.041866. Preprocessing: take the image read by OpenCV, subtract a mean of 128, divide by a std of 1, and rearrange the data from interleaved BGRBGRBGR... to planar BBB...GGG...RRR...
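For reference, that pipeline amounts to roughly this in Python (a minimal sketch; the file name is a placeholder, and the 224x224 size is taken from the C++ code further down):

import cv2
import numpy as np

img = cv2.imread('test.jpg')             # HWC, BGR, uint8; placeholder file name
img = cv2.resize(img, (224, 224))        # 224x224 matches the C++ code below
x = img.astype(np.float32) - 128.0       # subtract mean 128, std is 1
x = np.transpose(x, (2, 0, 1))           # HWC -> CHW, i.e. BBB...GGG...RRR...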
So I zeroed in on rknn.config().
OpenCV reads images in BGR order by default (for historical reasons).
The RKNN docs say that for Caffe models, reorder_channel should be set to "2 1 0" and the mean values should be given in BGR order.
So I figured: my network input is BGR and I read images with OpenCV, therefore reorder_channel should be "2 1 0".
My network only subtracts a mean of 128 from each of the three input channels, with no division by std,
so the model should be converted with rknn.config(channel_mean_value='128.0 128.0 128.0 1.0', reorder_channel='2 1 0') (the 1.4.0 API).
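For context, a minimal sketch of the conversion script at this point (assuming the RKNN-Toolkit 1.4.x Python API; the model paths are placeholders):

from rknn.api import RKNN

rknn = RKNN()
rknn.config(channel_mean_value='128.0 128.0 128.0 1.0',   # (x - 128) / 1 per channel
            reorder_channel='2 1 0')                      # reverse channel order
rknn.load_caffe(model='./model.prototxt', proto='caffe',  # placeholder paths
                blobs='./model.caffemodel')
rknn.build(do_quantization=False)                         # no quantization -> fp16
rknn.export_rknn('./model.rknn')
rknn.release()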
Since I'm working in C++, a few fields also have to be filled in before calling rknn_inputs_set():
inputs[0].index = 0;
inputs[0].type  = RKNN_TENSOR_INT8;   // OpenCV pixels are actually unsigned 8-bit; RKNN_TENSOR_UINT8 may be the better match
inputs[0].size  = img.cols * img.rows * img.channels();
inputs[0].fmt   = RKNN_TENSOR_NHWC;   // OpenCV layout is HWC
inputs[0].buf   = img.data;
The image comes straight from OpenCV, so it is INT8 NHWC; but the network input expects bbb...ggg...rrr..., i.e. NCHW, and I couldn't find where to set that.
Never mind, run it and look: 0.000730, 0.999512. Hard to believe that result! Change inputs[0].fmt to NCHW and try again: 0.640625, 0.359619. What is that supposed to be?
Fine, I'll write the preprocessing myself and skip yours.
Convert the model with rknn.config(channel_mean_value='0.0 0.0 0.0 1.0', reorder_channel='2 1 0'),
and process the OpenCV image like this (uchar to float32, subtract 128, BGRBGRBGR... to BBB...GGG...RRR...):
float *inputdata = new float[224 * 224 * 3];
img.convertTo(img, CV_32FC3, 1.0, 0);              // uchar -> float32

float *pB = inputdata;                             // planar B plane
float *pG = inputdata + img.rows * img.cols;       // planar G plane
float *pR = inputdata + 2 * img.rows * img.cols;   // planar R plane
const int count = img.rows * img.cols;
const float *d = (const float *)img.data;          // interleaved BGR source

for (int i = 0; i < count; i++)
{
    pB[i] = *d++ - 128.0f;                         // subtract mean 128 per channel
    pG[i] = *d++ - 128.0f;
    pR[i] = *d++ - 128.0f;
}
And set this up before rknn_inputs_set() (float32, NCHW):
inputs[0].index = 0;
inputs[0].type  = RKNN_TENSOR_FLOAT32;
inputs[0].size  = img.cols * img.rows * img.channels() * sizeof(float);  // size is in bytes
inputs[0].fmt   = RKNN_TENSOR_NCHW;
inputs[0].buf   = inputdata;
Result: 0.785645, 0.214233. Where does that come from? No idea. What could possibly differ between my own preprocessing and yours, and what are you doing internally? I checked my code and it's fine! There has to be something going on inside rknn_inputs_set(). I'd read the source, but it doesn't seem to be open. Now what...
After thinking about it for ages: the config describes how the input data gets processed, and I hadn't touched the second parameter yet. Change it:
convert the model with rknn.config(channel_mean_value='0.0 0.0 0.0 1.0', reorder_channel='0 1 2').
Run it: 0.947266, 0.052887. That looks familiar. We're close to the right answer now, but only down to the first decimal place.
Check the docs: with rknn.build(do_quantization=False) the model runs at fp16. How can fp16 only match to the first decimal place? Something is off; it needs to agree to at least the third decimal place to be usable, and if the float precision isn't there, quantization is hopeless.
After a long time: could it be the BN layers? Delete the BN and Slice layers and look: 0.000000 1.000000. Right, it must be going through softmax and saturating.
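That saturation is easy to check numerically with the logits that turn up just below (-2708, 1323); a quick sketch:

import numpy as np

logits = np.array([-2708.0, 1323.0])
e = np.exp(logits - logits.max())   # stable softmax; exp(-4031) underflows to 0
print(e / e.sum())                  # -> [0. 1.]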
Delete the last layer and look at the one before it: -2708.000000, 1323.000000. Hmm, very large.
Compare with Caffe: -2734.590332 1336.125732. Hmm. It feels like you're switching to int8 internally?? I never asked for that. But no, that's not int8 behavior either.
I can't crack it this way. Let me write some Python to dump every layer's output and compare against the original model layer by layer; I'll update after that.
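A sketch of the kind of comparison script I have in mind (assumes pycaffe; the blob name 'scale_conv1', the one-value-per-line dump format, and the matching element order are my assumptions):

import numpy as np
import caffe

caffe.set_mode_cpu()
net = caffe.Net('model.prototxt', 'model.caffemodel', caffe.TEST)
net.blobs['data'].data[...] = x[np.newaxis]   # x: the preprocessed CHW input
net.forward()
ref = net.blobs['scale_conv1'].data.ravel()   # 'scale_conv1': placeholder blob name

# RKNN side: parse one of the dumped tensor files (element order may still differ)
out = np.loadtxt('tensorName_uid_4_out_0_0000_NodeID_108_ConvolutionReluPoolingLayer2_w_112_h_112_d_64_batchID_0_out_0.txt').ravel()

print('max abs diff:', np.abs(ref - out).max())
print('cosine sim  :', ref @ out / (np.linalg.norm(ref) * np.linalg.norm(out)))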
The first conv layer's result is out, though I'm not sure I grabbed the right tensor.
The docs do mention that some layers get fused, e.g. conv+bn+scale is merged into a single conv, in which case you have to compare against the output of the original model's scale layer.
But I'm not sure whether ReLU is folded in as well. After setting the environment variable, these files came out; some carry a uid and some don't:
tensorName__0005_NodeID_147_ConvolutionReluPoolingLayer2_w_128_h_1568_d_1_batchID_0_out_0.txt
tensorName__0008_NodeID_148_ConvolutionReluPoolingLayer2_w_128_h_1568_d_1_batchID_0_out_0.txt
tensorName__0011_NodeID_149_ConvolutionReluPoolingLayer2_w_128_h_1568_d_1_batchID_0_out_0.txt
tensorName__0015_NodeID_150_ConvolutionReluPoolingLayer2_w_128_h_784_d_1_batchID_0_out_0.txt
tensorName__0018_NodeID_151_ConvolutionReluPoolingLayer2_w_128_h_784_d_1_batchID_0_out_0.txt
tensorName__0021_NodeID_152_ConvolutionReluPoolingLayer2_w_128_h_784_d_1_batchID_0_out_0.txt
tensorName__0024_NodeID_153_ConvolutionReluPoolingLayer2_w_128_h_784_d_1_batchID_0_out_0.txt
tensorName__0028_NodeID_154_ConvolutionReluPoolingLayer2_w_128_h_392_d_1_batchID_0_out_0.txt
tensorName__0031_NodeID_155_ConvolutionReluPoolingLayer2_w_128_h_392_d_1_batchID_0_out_0.txt
tensorName__0034_NodeID_156_ConvolutionReluPoolingLayer2_w_128_h_392_d_1_batchID_0_out_0.txt
tensorName__0037_NodeID_157_ConvolutionReluPoolingLayer2_w_128_h_392_d_1_batchID_0_out_0.txt
tensorName__0040_NodeID_158_ConvolutionReluPoolingLayer2_w_128_h_392_d_1_batchID_0_out_0.txt
tensorName__0043_NodeID_159_ConvolutionReluPoolingLayer2_w_128_h_392_d_1_batchID_0_out_0.txt
tensorName__0047_NodeID_160_ConvolutionReluPoolingLayer2_w_128_h_196_d_1_batchID_0_out_0.txt
tensorName__0050_NodeID_161_ConvolutionReluPoolingLayer2_w_128_h_196_d_1_batchID_0_out_0.txt
tensorName__0053_NodeID_162_ConvolutionReluPoolingLayer2_w_128_h_196_d_1_batchID_0_out_0.txt
tensorName_uid_100_out_0_0033_NodeID_131_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_108_out_0_0035_NodeID_132_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_109_out_0_0036_NodeID_133_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_117_out_0_0038_NodeID_134_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_118_out_0_0039_NodeID_135_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_126_out_0_0041_NodeID_136_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_127_out_0_0042_NodeID_137_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_12_out_0_0003_NodeID_110_ConvolutionReluPoolingLayer2_w_56_h_56_d_64_batchID_0_out_0.txt
tensorName_uid_132_out_0_0044_NodeID_138_ConvolutionReluPoolingLayer2_w_7_h_7_d_512_batchID_0_out_0.txt
tensorName_uid_138_out_0_0045_NodeID_139_ConvolutionReluPoolingLayer2_w_7_h_7_d_512_batchID_0_out_0.txt
tensorName_uid_139_out_0_0046_NodeID_140_ConvolutionReluPoolingLayer2_w_7_h_7_d_512_batchID_0_out_0.txt
tensorName_uid_13_out_0_0004_NodeID_111_ConvolutionReluPoolingLayer2_w_56_h_56_d_64_batchID_0_out_0.txt
tensorName_uid_147_out_0_0048_NodeID_141_ConvolutionReluPoolingLayer2_w_7_h_7_d_512_batchID_0_out_0.txt
tensorName_uid_148_out_0_0049_NodeID_142_ConvolutionReluPoolingLayer2_w_7_h_7_d_512_batchID_0_out_0.txt
tensorName_uid_156_out_0_0051_NodeID_143_ConvolutionReluPoolingLayer2_w_7_h_7_d_512_batchID_0_out_0.txt
tensorName_uid_157_out_0_0052_NodeID_144_ConvolutionReluPoolingLayer2_w_7_h_7_d_512_batchID_0_out_0.txt
tensorName_uid_162_out_0_0054_NodeID_87_PoolingLayer2_w_1_h_1_d_512_batchID_0_out_0.txt
tensorName_uid_163_out_0_0055_NodeID_145_FullyConnectedReluLayer_w_512_h_1_d_1_batchID_0_out_0.txt
tensorName_uid_166_out_0_0056_NodeID_146_FullyConnectedReluLayer_w_2_h_1_d_1_batchID_0_out_0.txt
tensorName_uid_167_out_0_0057_NodeID_91_SoftMax2_w_2_h_1_d_1_batchID_0_out_0.txt
tensorName_uid_21_out_0_0006_NodeID_112_ConvolutionReluPoolingLayer2_w_56_h_56_d_64_batchID_0_out_0.txt
tensorName_uid_22_out_0_0007_NodeID_113_ConvolutionReluPoolingLayer2_w_56_h_56_d_64_batchID_0_out_0.txt
tensorName_uid_30_out_0_0009_NodeID_114_ConvolutionReluPoolingLayer2_w_56_h_56_d_64_batchID_0_out_0.txt
tensorName_uid_31_out_0_0010_NodeID_115_ConvolutionReluPoolingLayer2_w_56_h_56_d_64_batchID_0_out_0.txt
tensorName_uid_36_out_0_0012_NodeID_116_ConvolutionReluPoolingLayer2_w_28_h_28_d_128_batchID_0_out_0.txt
tensorName_uid_42_out_0_0013_NodeID_117_ConvolutionReluPoolingLayer2_w_28_h_28_d_128_batchID_0_out_0.txt
tensorName_uid_43_out_0_0014_NodeID_118_ConvolutionReluPoolingLayer2_w_28_h_28_d_128_batchID_0_out_0.txt
tensorName_uid_4_out_0_0000_NodeID_108_ConvolutionReluPoolingLayer2_w_112_h_112_d_64_batchID_0_out_0.txt
tensorName_uid_51_out_0_0016_NodeID_119_ConvolutionReluPoolingLayer2_w_28_h_28_d_128_batchID_0_out_0.txt
tensorName_uid_52_out_0_0017_NodeID_120_ConvolutionReluPoolingLayer2_w_28_h_28_d_128_batchID_0_out_0.txt
tensorName_uid_5_out_0_0001_NodeID_2_PoolingLayer2_w_56_h_56_d_64_batchID_0_out_0.txt
tensorName_uid_60_out_0_0019_NodeID_121_ConvolutionReluPoolingLayer2_w_28_h_28_d_128_batchID_0_out_0.txt
tensorName_uid_61_out_0_0020_NodeID_122_ConvolutionReluPoolingLayer2_w_28_h_28_d_128_batchID_0_out_0.txt
tensorName_uid_69_out_0_0022_NodeID_123_ConvolutionReluPoolingLayer2_w_28_h_28_d_128_batchID_0_out_0.txt
tensorName_uid_6_out_0_0002_NodeID_109_ConvolutionReluPoolingLayer2_w_56_h_56_d_64_batchID_0_out_0.txt
tensorName_uid_70_out_0_0023_NodeID_124_ConvolutionReluPoolingLayer2_w_28_h_28_d_128_batchID_0_out_0.txt
tensorName_uid_75_out_0_0025_NodeID_125_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_81_out_0_0026_NodeID_126_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_82_out_0_0027_NodeID_127_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_90_out_0_0029_NodeID_128_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_91_out_0_0030_NodeID_129_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
tensorName_uid_99_out_0_0032_NodeID_130_ConvolutionReluPoolingLayer2_w_14_h_14_d_256_batchID_0_out_0.txt
Going by feel, tensorName_uid_4_out_0_0000_NodeID_108_ConvolutionReluPoolingLayer2_w_112_h_112_d_64_batchID_0_out_0.txt should be the first file generated: 112×112×64.
But the docs say the fusion doesn't include ReLU and only goes up to Scale, and I can't tell which file is the ReLU output; on the other hand the file name ConvolutionReluPoolingLayer2 suggests ReLU is included. So I'll just compare it against my conv1+bn+scale+relu:
tensorrt fp32 (verified, high accuracy): 1.114020 1.124940 1.117931
tensorrt fp16 (mixed precision):         1.114020 1.124940 1.117931
rknn fp16:                               1.101562 1.112305 1.105469  (still not sure whether ReLU is included; most likely it is. Let me check the docs again.)
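Side note (my own arithmetic, not from the docs): fp16 rounding of the reference values alone cannot explain this gap, since all three differ by a constant ~0.0127, about 13 fp16 ulps at this magnitude:

import numpy as np

trt = np.array([1.114020, 1.124940, 1.117931], dtype=np.float32)
rk  = np.array([1.101562, 1.112305, 1.105469], dtype=np.float32)
print(trt.astype(np.float16))                          # [1.114 1.125 1.118]
print(trt.astype(np.float16).astype(np.float32) - rk)  # ~0.0127 each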
Actually, forget the docs. Strip the model down to a single conv layer, no BN/Scale/ReLU, and look at the result:
tensorrt fp32
0: -104.967720
1: -179.990646
2: -189.679260
3: -193.201340
4: -188.290512
5: -186.644836
6: -186.533371
7: -189.821106
8: -193.032257
9: -191.848618
tensorrt fp16: surprisingly, the first conv layer loses no precision here. (I later found out this was mixed precision; when fp16 is forced everywhere the precision drop is severe too, which is why I now feel this whole question is moot.)
0: -104.967720
1: -179.990646
2: -189.679260
3: -193.201340
4: -188.290512
5: -186.644836
6: -186.533371
7: -189.821106
8: -193.032257
9: -191.848618
RKNN FP16
-104.937500
-180.000000
-189.625000
-193.125000
-188.250000
-186.625000
-186.500000
-189.750000
-193.000000
-191.750000
-191.500000
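For what it's worth (my own check, comparing the ten values that line up), the conv-only RKNN numbers agree with the fp32 reference to within about one fp16 ulp, which is roughly what you would expect if only the compute precision differs:

import numpy as np

trt_fp32 = np.array([-104.967720, -179.990646, -189.679260, -193.201340,
                     -188.290512, -186.644836, -186.533371, -189.821106,
                     -193.032257, -191.848618])
rknn_fp16 = np.array([-104.937500, -180.000000, -189.625000, -193.125000,
                      -188.250000, -186.625000, -186.500000, -189.750000,
                      -193.000000, -191.750000])
print(np.abs(trt_fp32 - rknn_fp16).max())  # ~0.099, under one fp16 ulp (0.125) here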
By this point you can probably see what problem I ran into.
1. How exactly should these places be set up for my case: the config, and the fields before rknn_inputs_set()? I wanted to rely on config in my code, but the results differed too much, so in the end I wrote the preprocessing myself.
2. I'd like a way to inspect the data actually fed into the first conv layer. Because config does its own preprocessing, you sometimes can't tell how your input has been transformed. Please consider an interface that bypasses config; as it stands, the current design isn't really convenient for C++ developers in most cases.
3. Consider exposing an FP32 option. With hybrid quantization, what would happen if everything were set to fp32?
4. Any plans to open-source part of the code?
PS: attachments
Link: https://pan.baidu.com/s/1hGYF9gg2JyGu3jdeQAQd8Q
Access code: zvie