Large gap between simulated and measured performance data: what are the possible causes?

Posted 2020-12-1 10:19:59

Board: rk3399pro
rknn_toolkit version: rknn-1.3.0, using the C++ interface provided by rknn-api
Test model: mobilenetv1

Performance data reported by the simulator:
root@ffd0ced59bb3:/examples/tflite/mobilenet_v1# python3 ./test.py
--> config model
done
--> Loading model
done
--> Building model
W The target_platform is not set in config, using default target platform rk1808.
W The channel_mean_value filed will not be used in the future!
W:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py:3632: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py:1941: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
tf.py_function, which takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.

done
--> Export RKNN model
done
--> Init runtime environment
done
--> Running model
mobilenet_v1
-----TOP 5-----
[156]: 0.85107421875
[155]: 0.09173583984375
[205]: 0.01358795166015625
[284]: 0.006465911865234375
[194]: 0.002239227294921875

done
--> Begin evaluate model performance
W When performing performance evaluation, inputs can be set to None to use fake inputs.
========================================================================
                           Performance
========================================================================
Layer ID Name                                        Time(us)
0          tensor.transpose_3                         72
44       convolution.relu.pooling.layer2_2          363
59       convolution.relu.pooling.layer2_2          201
45       convolution.relu.pooling.layer2_2          185
60       convolution.relu.pooling.layer2_2          243
46       convolution.relu.pooling.layer2_2          98
61       convolution.relu.pooling.layer2_2          149
47       convolution.relu.pooling.layer2_2          104
62       convolution.relu.pooling.layer2_2          120
48       convolution.relu.pooling.layer2_2          72
63       convolution.relu.pooling.layer2_2          101
49       convolution.relu.pooling.layer2_2          92
64       convolution.relu.pooling.layer2_2          99
50       convolution.relu.pooling.layer2_2          110
65       convolution.relu.pooling.layer2_2          107
51       convolution.relu.pooling.layer2_2          212
66       convolution.relu.pooling.layer2_2          107
52       convolution.relu.pooling.layer2_2          212
67       convolution.relu.pooling.layer2_2          107
53       convolution.relu.pooling.layer2_2          212
68       convolution.relu.pooling.layer2_2          107
54       convolution.relu.pooling.layer2_2          212
69       convolution.relu.pooling.layer2_2          107
55       convolution.relu.pooling.layer2_2          212
70       convolution.relu.pooling.layer2_2          107
56       convolution.relu.pooling.layer2_2          174
71       convolution.relu.pooling.layer2_2          220
57       convolution.relu.pooling.layer2_2          353
28       pooling.layer2_1                            36
58       fullyconnected.relu.layer_3                110
30       softmaxlayer2.layer_1                      90
Total Time(us): 4694
FPS(600MHz): 159.78
FPS(800MHz): 213.04
Note: Time of each layer is converted according to 800MHz!
========================================================================

done

Performance measured on the rk3399pro using the C++ interface:
firefly@firefly:~/Sync_firefly_3399pro/rknn_c++_api/rknn_api_sdk$ time ./build/rknn_mobilenet ../../dog_224x224.jpg ../../mobilenet_v1.rknn ../tmp/labels.txt
n_devices = 1
0: type=PCIE, id=0123456789ABCDEF
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI: API: 1.3.0 (c5654ea build: 2019-12-25 14:12:00)
D RKNNAPI: DRV: 1.3.1 (6ebb4d7 build: 2020-01-02 09:37:58)
D RKNNAPI: ==============================================
chrono: rknn_run cost time 0.132324ms
chrono: rknn inference cost time 29.2068 ms
0.851074: 156 Shih-Tzu
0.0917358: 155 Pekinese
0.013588: 205 Lhasa
0.00646591: 284 Persian cat
0.00223923: 194 Australian terrier
chrono: rknn_run cost time 0.126953ms
chrono: rknn inference cost time 30.5913 ms
0.851074: 156 Shih-Tzu
0.0917358: 155 Pekinese
0.013588: 205 Lhasa
0.00646591: 284 Persian cat
0.00223923: 194 Australian terrier
chrono: rknn_run cost time 0.151367ms
chrono: rknn inference cost time 30.1765 ms
0.851074: 156 Shih-Tzu
0.0917358: 155 Pekinese
0.013588: 205 Lhasa
0.00646591: 284 Persian cat
0.00223923: 194 Australian terrier
perf_run.run_duration = 16640 us
perf_run.perf_data =
Layer id:    Name:    Operation id:    Operator:    Target:    Uid:    Time(us):
  0  MobilenetV1/MobilenetV1/Conv2d_0_1_acuity_mark_perm_60  0  TENSOR_TRANS  TP  60       874
  3  MobilenetV1/MobilenetV1/Conv2d_0_1  0  CONVOLUTION  NN  1       547
  18  MobilenetV1/MobilenetV1/Conv2d_1_depthwise_3  0  CONVOLUTION  NN  3       404
  4  MobilenetV1/MobilenetV1/Conv2d_1_pointwise_5  0  CONVOLUTION  NN  5       322
  19  MobilenetV1/MobilenetV1/Conv2d_2_depthwise_7  0  CONVOLUTION  NN  7       431
  5  MobilenetV1/MobilenetV1/Conv2d_2_pointwise_9  0  CONVOLUTION  NN  9       263
  20  MobilenetV1/MobilenetV1/Conv2d_3_depthwise_11  0  CONVOLUTION  NN  11       359
  6  MobilenetV1/MobilenetV1/Conv2d_3_pointwise_13  0  CONVOLUTION  NN  13       278
  21  MobilenetV1/MobilenetV1/Conv2d_4_depthwise_15  0  CONVOLUTION  NN  15       297
  7  MobilenetV1/MobilenetV1/Conv2d_4_pointwise_17  0  CONVOLUTION  NN  17       322
  22  MobilenetV1/MobilenetV1/Conv2d_5_depthwise_19  0  CONVOLUTION  NN  19       283
  8  MobilenetV1/MobilenetV1/Conv2d_5_pointwise_21  0  CONVOLUTION  NN  21       389
  23  MobilenetV1/MobilenetV1/Conv2d_6_depthwise_23  0  CONVOLUTION  NN  23       303
  9  MobilenetV1/MobilenetV1/Conv2d_6_pointwise_25  0  CONVOLUTION  NN  25       269
  24  MobilenetV1/MobilenetV1/Conv2d_7_depthwise_27  0  CONVOLUTION  NN  27       373
  10  MobilenetV1/MobilenetV1/Conv2d_7_pointwise_29  0  CONVOLUTION  NN  29       362
  25  MobilenetV1/MobilenetV1/Conv2d_8_depthwise_31  0  CONVOLUTION  NN  31       376
  11  MobilenetV1/MobilenetV1/Conv2d_8_pointwise_33  0  CONVOLUTION  NN  33       354
  26  MobilenetV1/MobilenetV1/Conv2d_9_depthwise_35  0  CONVOLUTION  NN  35       365
  12  MobilenetV1/MobilenetV1/Conv2d_9_pointwise_37  0  CONVOLUTION  NN  37       364
  27  MobilenetV1/MobilenetV1/Conv2d_10_depthwise_39  0  CONVOLUTION  NN  39       366
  13  MobilenetV1/MobilenetV1/Conv2d_10_pointwise_41  0  CONVOLUTION  NN  41       354
  28  MobilenetV1/MobilenetV1/Conv2d_11_depthwise_43  0  CONVOLUTION  NN  43       366
  14  MobilenetV1/MobilenetV1/Conv2d_11_pointwise_45  0  CONVOLUTION  NN  45       358
  29  MobilenetV1/MobilenetV1/Conv2d_12_depthwise_47  0  CONVOLUTION  NN  47       400
  15  MobilenetV1/MobilenetV1/Conv2d_12_pointwise_49  0  CONVOLUTION  NN  49       281
  30  MobilenetV1/MobilenetV1/Conv2d_13_depthwise_51  0  CONVOLUTION  NN  51       519
  16  MobilenetV1/MobilenetV1/Conv2d_13_pointwise_53  0  CONVOLUTION  NN  53       375
  1  MobilenetV1/Logits/AvgPool_1a_55  0  POOLING  SH  55       670
  17  trans_MobilenetV1/Logits/Conv2d_1c_1x1_56  0  FULLYCONNECTED  TP  56       424
  2  SoftMax2  0  SOFTMAX  SH  -1       736

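Question: the simulated run takes about 5 ms (4694 us), but the measured run on the board takes 16.6 ms, which seems like a big gap. Judging from other developers' tests online, the mobilenet family should do better than 16.6 ms. Is there something wrong with how I am running it, and what could be affecting the measured numbers? (Note: mobilenetv2 behaves like mobilenetv1; its on-board results also differ substantially from the simulated ones.)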
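To put the two logs side by side, here is a minimal Python sketch that uses only numbers copied from the output above. It assumes throughput scales linearly with clock frequency, which matches how the simulator derives its own 600 MHz and 800 MHz FPS lines. The function and key names are made up for illustration. Note that the simulator log was built for the default target rk1808 and its per-layer times are normalized to 800 MHz, so the two totals are not measured under the same conditions.

```python
# Numbers copied from the two logs in this post.
SIM_TOTAL_US = 4694    # simulator "Total Time(us)", normalized to 800 MHz
BOARD_RUN_US = 16640   # perf_run.run_duration on the rk3399pro

# Per-layer Time(us) column of perf_run.perf_data, in listed order.
BOARD_LAYER_US = [
    874, 547, 404, 322, 431, 263, 359, 278, 297, 322, 283,
    389, 303, 269, 373, 362, 376, 354, 365, 364, 366, 354,
    366, 358, 400, 281, 519, 375, 670, 424, 736,
]

# "rknn inference cost time" lines printed by the C++ demo, in ms.
WALL_CLOCK_MS = [29.2068, 30.5913, 30.1765]


def fps(total_us):
    """Frames per second for one inference taking total_us microseconds."""
    return 1_000_000 / total_us


def scale_fps(fps_at_ref, ref_mhz, target_mhz):
    """Rescale FPS, assuming throughput is proportional to clock frequency."""
    return fps_at_ref * target_mhz / ref_mhz


def summarize():
    """Collect the headline figures from both logs in one dictionary."""
    layer_sum = sum(BOARD_LAYER_US)
    return {
        "sim_fps_800mhz": fps(SIM_TOTAL_US),
        "sim_fps_600mhz": scale_fps(fps(SIM_TOTAL_US), 800, 600),
        "board_fps": fps(BOARD_RUN_US),
        "board_over_sim": BOARD_RUN_US / SIM_TOTAL_US,
        "board_layer_sum_us": layer_sum,
        "board_unattributed_us": BOARD_RUN_US - layer_sum,
        "wall_clock_mean_ms": sum(WALL_CLOCK_MS) / len(WALL_CLOCK_MS),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.2f}")
```

The on-board per-layer times add up to 12384 us, so 4256 us of the 16640 us run_duration is not attributed to any layer in the table; the log does not say what that remainder consists of. The wall-clock "rknn inference cost time" averages about 30 ms, which is larger again than run_duration.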

Posted 2020-12-1 17:52:08

There are plenty of factors: DDR, CPU frequency, NPU frequency, and so on.
We can only tell which ones matter most if you test on a Toybrick board.
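Of the factors listed, CPU frequency is the one that can be checked through the standard Linux cpufreq sysfs interface. The sketch below reads the governor and current/maximum frequency for each core. It assumes the board's kernel exposes cpufreq under /sys/devices/system/cpu; the function name read_cpufreq is made up for illustration. The NPU and DDR frequency nodes are board-specific and are not given in this thread, so they are deliberately left out.

```python
from pathlib import Path

# Standard cpufreq attributes; frequencies are reported in kHz.
CPUFREQ_FILES = ("scaling_governor", "scaling_cur_freq", "scaling_max_freq")


def read_cpufreq(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {attribute: value or None}} for every cpuN directory."""
    result = {}
    for cpu_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*")):
        entry = {}
        for name in CPUFREQ_FILES:
            try:
                entry[name] = (cpu_dir / "cpufreq" / name).read_text().strip()
            except OSError:
                # Attribute missing or unreadable on this kernel/board.
                entry[name] = None
        result[cpu_dir.name] = entry
    return result


if __name__ == "__main__":
    for cpu, attrs in read_cpufreq().items():
        print(cpu, attrs)
```

Recording these values alongside each timing run makes it easier to tell whether two measurements were taken under the same CPU clock settings.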

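Posted 2020-12-1 19:08:33

OK. This is my first time using an RK board, and I was mainly worried that I had overlooked something in my code that hurt performance. A quick side question: do the rk3399pro and rk1808 differ in NPU-related parameters, such as DDR frequency or NPU frequency?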

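Posted 2020-12-1 19:55:00

markluo posted on 2020-12-1 19:08
OK. This is my first time using an RK board, and I was mainly worried that I had overlooked something in my code that hurt performance. A quick side question: rk3399pro and rk1808 ...

That depends on the board. If you are not using one of our Toybrick boards, we have no way of knowing what memory it is fitted with or what frequencies it is set to.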

Posted 2020-12-1 20:27:59

All right, thanks a lot!
