|
Dev board: rk3399pro
rknn_toolkit version: rknn-1.3.0, using the C++ interface provided by rknn-api
Test model: mobilenetv1
Performance data reported by the simulator:
root@ffd0ced59bb3:/examples/tflite/mobilenet_v1# python3 ./test.py
--> config model
done
--> Loading model
done
--> Building model
W The target_platform is not set in config, using default target platform rk1808.
W The channel_mean_value filed will not be used in the future!
W:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py:3632: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/control_flow_ops.py:1941: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
tf.py_function, which takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
done
--> Export RKNN model
done
--> Init runtime environment
done
--> Running model
mobilenet_v1
-----TOP 5-----
[156]: 0.85107421875
[155]: 0.09173583984375
[205]: 0.01358795166015625
[284]: 0.006465911865234375
[194]: 0.002239227294921875
done
--> Begin evaluate model performance
W When performing performance evaluation, inputs can be set to None to use fake inputs.
========================================================================
Performance
========================================================================
Layer ID Name Time(us)
0 tensor.transpose_3 72
44 convolution.relu.pooling.layer2_2 363
59 convolution.relu.pooling.layer2_2 201
45 convolution.relu.pooling.layer2_2 185
60 convolution.relu.pooling.layer2_2 243
46 convolution.relu.pooling.layer2_2 98
61 convolution.relu.pooling.layer2_2 149
47 convolution.relu.pooling.layer2_2 104
62 convolution.relu.pooling.layer2_2 120
48 convolution.relu.pooling.layer2_2 72
63 convolution.relu.pooling.layer2_2 101
49 convolution.relu.pooling.layer2_2 92
64 convolution.relu.pooling.layer2_2 99
50 convolution.relu.pooling.layer2_2 110
65 convolution.relu.pooling.layer2_2 107
51 convolution.relu.pooling.layer2_2 212
66 convolution.relu.pooling.layer2_2 107
52 convolution.relu.pooling.layer2_2 212
67 convolution.relu.pooling.layer2_2 107
53 convolution.relu.pooling.layer2_2 212
68 convolution.relu.pooling.layer2_2 107
54 convolution.relu.pooling.layer2_2 212
69 convolution.relu.pooling.layer2_2 107
55 convolution.relu.pooling.layer2_2 212
70 convolution.relu.pooling.layer2_2 107
56 convolution.relu.pooling.layer2_2 174
71 convolution.relu.pooling.layer2_2 220
57 convolution.relu.pooling.layer2_2 353
28 pooling.layer2_1 36
58 fullyconnected.relu.layer_3 110
30 softmaxlayer2.layer_1 90
Total Time(us): 4694
FPS(600MHz): 159.78
FPS(800MHz): 213.04
Note: Time of each layer is converted according to 800MHz!
========================================================================
done
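As a sanity check on the simulator report above, the two FPS lines can be reproduced from the total time. The report says the layer times are converted to 800MHz, so FPS(800MHz) is simply 1e6 divided by the total. The 600MHz figure appears to be the same value scaled linearly by clock; that scaling is an assumption inferred from the numbers, not something the log states.

```python
# "Total Time(us)" from the simulator report, stated to be at 800MHz.
total_us_800mhz = 4694

fps_800 = 1000000 / total_us_800mhz
# Assumption: throughput scales linearly with the NPU clock.
fps_600 = fps_800 * 600 / 800

print("FPS(800MHz): {:.2f}".format(fps_800))
print("FPS(600MHz): {:.2f}".format(fps_600))
```

Both values match the report (213.04 and 159.78), so the simulator's FPS is a pure per-layer compute estimate with no host-side or transfer overhead included.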
Measured performance on the rk3399pro using the C++ interface:
firefly@firefly:~/Sync_firefly_3399pro/rknn_c++_api/rknn_api_sdk$ time ./build/rknn_mobilenet ../../dog_224x224.jpg ../../mobilenet_v1.rknn ../tmp/labels.txt
n_devices = 1
0: type=PCIE, id=0123456789ABCDEF
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI: API: 1.3.0 (c5654ea build: 2019-12-25 14:12:00)
D RKNNAPI: DRV: 1.3.1 (6ebb4d7 build: 2020-01-02 09:37:58)
D RKNNAPI: ==============================================
chrono: rknn_run cost time 0.132324ms
chrono: rknn inference cost time 29.2068 ms
0.851074: 156 Shih-Tzu
0.0917358: 155 Pekinese
0.013588: 205 Lhasa
0.00646591: 284 Persian cat
0.00223923: 194 Australian terrier
chrono: rknn_run cost time 0.126953ms
chrono: rknn inference cost time 30.5913 ms
0.851074: 156 Shih-Tzu
0.0917358: 155 Pekinese
0.013588: 205 Lhasa
0.00646591: 284 Persian cat
0.00223923: 194 Australian terrier
chrono: rknn_run cost time 0.151367ms
chrono: rknn inference cost time 30.1765 ms
0.851074: 156 Shih-Tzu
0.0917358: 155 Pekinese
0.013588: 205 Lhasa
0.00646591: 284 Persian cat
0.00223923: 194 Australian terrier
perf_run.run_duration = 16640 us
perf_run.perf_data =
Layer id: Name: Operation id: Operator: Target: Uid: Time(us):
0 MobilenetV1/MobilenetV1/Conv2d_0_1_acuity_mark_perm_60 0 TENSOR_TRANS TP 60 874
3 MobilenetV1/MobilenetV1/Conv2d_0_1 0 CONVOLUTION NN 1 547
18 MobilenetV1/MobilenetV1/Conv2d_1_depthwise_3 0 CONVOLUTION NN 3 404
4 MobilenetV1/MobilenetV1/Conv2d_1_pointwise_5 0 CONVOLUTION NN 5 322
19 MobilenetV1/MobilenetV1/Conv2d_2_depthwise_7 0 CONVOLUTION NN 7 431
5 MobilenetV1/MobilenetV1/Conv2d_2_pointwise_9 0 CONVOLUTION NN 9 263
20 MobilenetV1/MobilenetV1/Conv2d_3_depthwise_11 0 CONVOLUTION NN 11 359
6 MobilenetV1/MobilenetV1/Conv2d_3_pointwise_13 0 CONVOLUTION NN 13 278
21 MobilenetV1/MobilenetV1/Conv2d_4_depthwise_15 0 CONVOLUTION NN 15 297
7 MobilenetV1/MobilenetV1/Conv2d_4_pointwise_17 0 CONVOLUTION NN 17 322
22 MobilenetV1/MobilenetV1/Conv2d_5_depthwise_19 0 CONVOLUTION NN 19 283
8 MobilenetV1/MobilenetV1/Conv2d_5_pointwise_21 0 CONVOLUTION NN 21 389
23 MobilenetV1/MobilenetV1/Conv2d_6_depthwise_23 0 CONVOLUTION NN 23 303
9 MobilenetV1/MobilenetV1/Conv2d_6_pointwise_25 0 CONVOLUTION NN 25 269
24 MobilenetV1/MobilenetV1/Conv2d_7_depthwise_27 0 CONVOLUTION NN 27 373
10 MobilenetV1/MobilenetV1/Conv2d_7_pointwise_29 0 CONVOLUTION NN 29 362
25 MobilenetV1/MobilenetV1/Conv2d_8_depthwise_31 0 CONVOLUTION NN 31 376
11 MobilenetV1/MobilenetV1/Conv2d_8_pointwise_33 0 CONVOLUTION NN 33 354
26 MobilenetV1/MobilenetV1/Conv2d_9_depthwise_35 0 CONVOLUTION NN 35 365
12 MobilenetV1/MobilenetV1/Conv2d_9_pointwise_37 0 CONVOLUTION NN 37 364
27 MobilenetV1/MobilenetV1/Conv2d_10_depthwise_39 0 CONVOLUTION NN 39 366
13 MobilenetV1/MobilenetV1/Conv2d_10_pointwise_41 0 CONVOLUTION NN 41 354
28 MobilenetV1/MobilenetV1/Conv2d_11_depthwise_43 0 CONVOLUTION NN 43 366
14 MobilenetV1/MobilenetV1/Conv2d_11_pointwise_45 0 CONVOLUTION NN 45 358
29 MobilenetV1/MobilenetV1/Conv2d_12_depthwise_47 0 CONVOLUTION NN 47 400
15 MobilenetV1/MobilenetV1/Conv2d_12_pointwise_49 0 CONVOLUTION NN 49 281
30 MobilenetV1/MobilenetV1/Conv2d_13_depthwise_51 0 CONVOLUTION NN 51 519
16 MobilenetV1/MobilenetV1/Conv2d_13_pointwise_53 0 CONVOLUTION NN 53 375
1 MobilenetV1/Logits/AvgPool_1a_55 0 POOLING SH 55 670
17 trans_MobilenetV1/Logits/Conv2d_1c_1x1_56 0 FULLYCONNECTED TP 56 424
2 SoftMax2 0 SOFTMAX SH -1 736
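Question: the simulator reports an execution time of about 5 ms (4694 us), while the run measured on the board takes 16.6 ms (perf_run.run_duration = 16640 us). That gap seems large. Judging from tests other developers have posted online, the mobilenet family should run faster than 16.6 ms. Is there something wrong with how I am running it, and what could be affecting the measured performance? (Note: mobilenetv2 behaves much like mobilenetv1; its on-board results also differ considerably from the simulator.)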
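To make the gap concrete, the sketch below adds up the per-layer times from both reports and compares them with `perf_run.run_duration` and with the three `chrono` inference timings. All numbers are copied from the logs above; the variable names are only illustrative. It shows where the time goes, not why.

```python
# Per-layer Time(us) column from the simulator report, in listed order.
sim_layers_us = [
    72, 363, 201, 185, 243, 98, 149, 104, 120, 72, 101, 92, 99, 110, 107,
    212, 107, 212, 107, 212, 107, 212, 107, 212, 107, 174, 220, 353,
    36, 110, 90,
]

# Per-layer Time(us) column from perf_run.perf_data on the board.
board_layers_us = [
    874, 547, 404, 322, 431, 263, 359, 278, 297, 322, 283, 389, 303, 269,
    373, 362, 376, 354, 365, 364, 366, 354, 366, 358, 400, 281, 519, 375,
    670, 424, 736,
]

run_duration_us = 16640                        # perf_run.run_duration
chrono_ms = [29.2068, 30.5913, 30.1765]        # "rknn inference cost time"

sim_total = sum(sim_layers_us)
board_total = sum(board_layers_us)
outside_layers_us = run_duration_us - board_total
chrono_mean_ms = sum(chrono_ms) / len(chrono_ms)

print("simulator layer sum (us):", sim_total)
print("on-board layer sum (us): ", board_total)
print("on-board / simulator:     {:.2f}x".format(board_total / sim_total))
print("run_duration - layer sum: {} us".format(outside_layers_us))
print("run_duration / simulator: {:.2f}x".format(run_duration_us / sim_total))
print("mean chrono inference:    {:.2f} ms".format(chrono_mean_ms))
```

With these inputs, the on-board layer times sum to 12384 us, roughly 2.6 times the simulator's 4694 us. Another 4256 us of `run_duration` is not attributed to any layer, and the wall-clock `chrono` timing of about 30 ms is nearly twice `run_duration` again. So the difference is spread across three places: slower individual layers, time inside the run that no layer accounts for, and time outside the run itself.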
|
|