Toybrick

Title: Running RKNN model inference on TB-RK3399ProD: rknn_inputs_set takes too long

Author: ttxls    Posted: 2022-5-26 09:16
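I'm using a slightly modified YOLOX-s model, with preprocessing based on the official rknpu yolov5 demo. The rknn_inputs_set call alone sometimes takes more than 60 ms, while model inference only takes about 75 ms. Could anyone advise how to speed up this input step? Many thanks!! System information and part of the code are below: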
System: Linux debian10.toybrick 4.4.194 aarch64
API version: 1.7.0
DRV version: 1.7.0
    Loading RKNN Model ...
    Initializing RKNN Envs ...
    D RKNNAPI: ==============================================
    D RKNNAPI: RKNN VERSION:
    D RKNNAPI:   API: 1.7.0 (7880361 build: 2021-08-05 11:25:07)
    D RKNNAPI:   DRV: 1.7.0 (7880361 build: 2021-08-16 16:16:21)
    D RKNNAPI: ==============================================
Note: the RGA and DRM contexts are initialized together with the model.

    // RKNN model input settings
    printf("Processing Input ... \n");
    auto t_p0 = chrono::high_resolution_clock::now();
    memset(this->inputs, 0, sizeof(this->inputs));
    this->inputs[0].index = 0;
    this->inputs[0].type = RKNN_TENSOR_UINT8;
    this->inputs[0].size = this->width * this->height * this->channel;
    this->inputs[0].fmt = RKNN_TENSOR_NHWC;
    this->inputs[0].pass_through = 0;    // 0: the runtime converts the data from type/fmt to the model's input format

    auto t_p1 = chrono::high_resolution_clock::now();
    float pre_time1 = chrono::duration<float, milli>(t_p1 - t_p0).count();
    cout << "Done. Preprocessing1 Time: " << pre_time1 << " ms." << endl;    // input initialization time

    // DRM: allocate a buffer and copy the source image into it
    this->drm_buf = drm_buf_alloc(&this->drm_ctx, this->drm_fd, img_width, img_height, this->channel * 8,
                                  &this->buf_fd, &this->ctx_handle, &this->actual_size);
    memcpy(this->drm_buf, input_data, img_width * img_height * this->channel);
    auto t_p2 = chrono::high_resolution_clock::now();
    float pre_time2 = chrono::duration<float, milli>(t_p2 - t_p1).count();
    cout << "Done. Preprocessing2 Time: " << pre_time2 << " ms." << endl;    // DRM processing time

    // RGA: resize the DRM buffer to the model input size (the RGA context was initialized with the model)
    img_resize_slow(&this->rga_ctx, this->drm_buf, img_width, img_height, this->resize_buf, this->width, this->height);
    this->inputs[0].buf = this->resize_buf;
    auto t_p3 = chrono::high_resolution_clock::now();
    float pre_time3 = chrono::duration<float, milli>(t_p3 - t_p2).count();
    cout << "Done. Preprocessing3 Time: " << pre_time3 << " ms." << endl;    // RGA resize (slow) time

    ret = rknn_inputs_set(this->ctx, this->io_num.n_input, this->inputs);
    if (ret < 0) printf("rknn_inputs_set failed: ret=%d\n", ret);

    auto t_p4 = chrono::high_resolution_clock::now();
    float pre_time4 = chrono::duration<float, milli>(t_p4 - t_p3).count();
    cout << "Done. Preprocessing4 Time: " << pre_time4 << " ms." << endl;    // rknn_inputs_set time

    // Model output settings
    memset(this->outputs, 0, sizeof(this->outputs));
    this->outputs[0].want_float = 0;

    printf("Inferencing ... \n");
    auto t_0 = chrono::high_resolution_clock::now();
    ret = rknn_run(this->ctx, NULL);
    if (ret < 0) printf("rknn_run failed: ret=%d\n", ret);
    ret = rknn_outputs_get(this->ctx, this->io_num.n_output, this->outputs, NULL);
    if (ret < 0) printf("rknn_outputs_get failed: ret=%d\n", ret);
    auto t_1 = chrono::high_resolution_clock::now();
    float inf_time = chrono::duration<float, milli>(t_1 - t_0).count();
    cout << "Done. RKNN Infer Time: " << inf_time << " ms." << endl;    // rknn_run + rknn_outputs_get time
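The code above repeats the same chrono timing pattern for every stage. Below is a minimal sketch of a small lap timer that prints the same log lines. The StageTimer class is a hypothetical helper, not part of the RKNN API or the rknpu demo. It uses std::chrono::steady_clock, which the C++ standard guarantees to be monotonic, whereas high_resolution_clock may be an alias of the adjustable system_clock on some standard libraries.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <string>

// Hypothetical helper (not part of the RKNN API or the rknpu demo):
// reports the wall-clock time of consecutive stages in milliseconds.
class StageTimer {
public:
    StageTimer() : last_(std::chrono::steady_clock::now()) {}

    // Returns the milliseconds elapsed since construction or the previous lap(),
    // and prints them in the same format as the original log lines.
    float lap(const std::string& label, std::ostream& os = std::cout) {
        const auto now = std::chrono::steady_clock::now();
        const float ms = std::chrono::duration<float, std::milli>(now - last_).count();
        assert(ms >= 0.0f);  // steady_clock is monotonic, so this should never fire
        last_ = now;         // reset before printing: the print cost lands in the next stage
        os << "Done. " << label << " Time: " << ms << " ms." << std::endl;
        return ms;
    }

private:
    std::chrono::steady_clock::time_point last_;
};
```

For example, creating `StageTimer timer;` where t_p0 is taken and calling `timer.lap("Preprocessing1");` through `timer.lap("Preprocessing4");` after each stage replaces the t_p0 to t_p4 variables. Because last_ is reset before printing, the cost of writing each line is counted in the following stage, as in the original code.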
Output:


    Processing Input ...
    Done. Preprocessing1 Time: 0.001458 ms.
    Done. Preprocessing2 Time: 1.28944 ms.
    Done. Preprocessing3 Time: 6.79251 ms.
    Done. Preprocessing4 Time: 36.0082 ms.
    Inferencing ...
    Done. RKNN Infer Time: 76.1672 ms.
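Summing the logged stages, preprocessing takes about 44.1 ms, of which rknn_inputs_set accounts for about 36.0 ms (roughly 82%), compared with about 76.2 ms for rknn_run plus rknn_outputs_get. With pass_through = 0, the runtime converts the supplied uint8 NHWC buffer into the model's expected input format. Whether that conversion includes a layout change, normalization, or quantization done on the CPU is an assumption not confirmed in this thread. One way to check whether a CPU-side layout change alone could plausibly explain the cost is to time a plain HWC to CHW repack at the model's input size. Here is a minimal sketch; hwc_to_chw is a hypothetical helper, not part of the RKNN API or the rknpu demo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: repack an interleaved HWC (NHWC with N = 1) uint8 image
// into planar CHW (NCHW with N = 1). Illustration only; this is not RKNN code.
std::vector<std::uint8_t> hwc_to_chw(const std::vector<std::uint8_t>& src,
                                     std::size_t height, std::size_t width,
                                     std::size_t channels) {
    assert(src.size() == height * width * channels);  // caller must pass a full frame
    std::vector<std::uint8_t> dst(src.size());
    const std::size_t plane = height * width;          // elements per channel plane
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t pixel = y * width + x;
            for (std::size_t c = 0; c < channels; ++c) {
                // HWC index (pixel * channels + c) -> CHW index (c * plane + pixel)
                dst[c * plane + pixel] = src[pixel * channels + c];
            }
        }
    }
    return dst;
}
```

On the board, wrapping a call such as hwc_to_chw(frame, this->height, this->width, 3) in the same chrono timing used above gives a rough estimate of what a pure CPU layout change costs at that resolution. If it is far below 36 ms, layout conversion alone is unlikely to explain the delay.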






Welcome to Toybrick (https://t.rock-chips.com/) Powered by Discuz! X3.3