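I am using a slightly modified YOLOX-s model. After preprocessing the input the same way as the official rknpu yolov5 demo, the rknn_inputs_set call sometimes takes more than 60 ms, while model inference itself takes only about 75 ms. Could anyone advise how to speed up setting the model input? Many thanks! Part of the code is shown below:
System: Linux debian10.toybrick 4.4.194 aarch64
API version: 1.7.0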
DRV version: 1.7.0
- Loading RKNN Model ...
- Initializing RKNN Envs ...
- D RKNNAPI: ==============================================
- D RKNNAPI: RKNN VERSION:
- D RKNNAPI: API: 1.7.0 (7880361 build: 2021-08-05 11:25:07)
- D RKNNAPI: DRV: 1.7.0 (7880361 build: 2021-08-16 16:16:21)
- D RKNNAPI: ==============================================
*: RGA and DRM have already been initialized together with the model.
- // RKNN model Input Settings
- printf("Processing Input ... \n");
- auto t_p0 = chrono::high_resolution_clock::now();
- memset(this->inputs, 0, sizeof(this->inputs));
- this->inputs[0].index = 0;
- this->inputs[0].type = RKNN_TENSOR_UINT8;
- this->inputs[0].size = this->width * this->height * this->channel;
- this->inputs[0].fmt = RKNN_TENSOR_NHWC;
- this->inputs[0].pass_through = 0;
- auto t_p1 = chrono::high_resolution_clock::now();
- float pre_time1 = chrono::duration<float, milli>(t_p1 - t_p0).count();
- cout << "Done. Preprocessing1 Time: " << pre_time1 << " ms." << endl; // input initialization time
- // DRM alloc buffer
- this->drm_buf = drm_buf_alloc(&this->drm_ctx, this->drm_fd, img_width, img_height, this->channel * 8,
- &this->buf_fd, &this->ctx_handle, &this->actual_size);
- memcpy(this->drm_buf, input_data, img_width * img_height * this->channel);
- auto t_p2 = chrono::high_resolution_clock::now();
- float pre_time2 = chrono::duration<float, milli>(t_p2 - t_p1).count();
- cout << "Done. Preprocessing2 Time: " << pre_time2 << " ms." << endl; // DRM Processing time
- // RGA resize to the model input size (the RGA context was initialized earlier)
- img_resize_slow(&this->rga_ctx, this->drm_buf, img_width, img_height, this->resize_buf, this->width, this->height);
- this->inputs[0].buf = this->resize_buf;
- auto t_p3 = chrono::high_resolution_clock::now();
- float pre_time3 = chrono::duration<float, milli>(t_p3 - t_p2).count();
- cout << "Done. Preprocessing3 Time: " << pre_time3 << " ms." << endl; // RGA Resize (slow) time
- rknn_inputs_set(this->ctx, this->io_num.n_input, this->inputs);
- auto t_p4 = chrono::high_resolution_clock::now();
- float pre_time4 = chrono::duration<float, milli>(t_p4 - t_p3).count();
- cout << "Done. Preprocessing4 Time: " << pre_time4 << " ms." << endl; // rknn_inputs_set time
- // Model Output Settings
- memset(this->outputs, 0, sizeof(this->outputs));
- outputs[0].want_float = 0;
- printf("Inferencing ... \n");
- auto t_0 = chrono::high_resolution_clock::now();
- ret = rknn_run(this->ctx, NULL);
- ret = rknn_outputs_get(this->ctx, this->io_num.n_output, this->outputs, NULL);
- auto t_1 = chrono::high_resolution_clock::now();
- float inf_time = chrono::duration<float, milli>(t_1 - t_0).count();
- cout << "Done. RKNN Infer Time: " << inf_time << " ms." << endl; // rknn inference time
Output:
- Processing Input ...
- Done. Preprocessing1 Time: 0.001458 ms.
- Done. Preprocessing2 Time: 1.28944 ms.
- Done. Preprocessing3 Time: 6.79251 ms.
- Done. Preprocessing4 Time: 36.0082 ms.
- Inferencing ...
- Done. RKNN Infer Time: 76.1672 ms.