Running RKNN model inference on TB-RK3399ProD: rknn_inputs_set takes too long

Posted on 2022-5-26 09:16:01

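I'm using a slightly modified YOLOX-s model, with preprocessing done the same way as in the official rknpu yolov5 demo. The rknn_inputs_set line sometimes takes more than 60 ms, while model inference itself takes only about 75 ms. Could anyone advise how to speed up this input step? Many thanks!! Part of the code is below: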
System: Linux debian10.toybrick 4.4.194 aarch64
API version: 1.7.0
DRV version: 1.7.0

```text
Loading RKNN Model ...
Initializing RKNN Envs ...
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.7.0 (7880361 build: 2021-08-05 11:25:07)
D RKNNAPI:   DRV: 1.7.0 (7880361 build: 2021-08-16 16:16:21)
D RKNNAPI: ==============================================
```

*: RGA and DRM are initialized together with the model.

```cpp
    // RKNN model input settings
    printf("Processing Input ... \n");
    auto t_p0 = chrono::high_resolution_clock::now();

    memset(this->inputs, 0, sizeof(this->inputs));
    this->inputs[0].index = 0;
    this->inputs[0].type = RKNN_TENSOR_UINT8;                            // 8-bit unsigned pixel values
    this->inputs[0].size = this->width * this->height * this->channel;   // bytes in the resized image
    this->inputs[0].fmt = RKNN_TENSOR_NHWC;                              // channels interleaved per pixel (H x W x C)
    this->inputs[0].pass_through = 0;                                    // 0: the runtime converts the data for the model

    auto t_p1 = chrono::high_resolution_clock::now();
    float pre_time1 = chrono::duration<float, milli>(t_p1 - t_p0).count();
    cout << "Done. Preprocessing1 Time: " << pre_time1 << " ms." << endl;    // input initialization time

    // DRM: allocate a buffer and copy the source image into it
    this->drm_buf = drm_buf_alloc(&this->drm_ctx, this->drm_fd, img_width, img_height, this->channel * 8,
                                  &this->buf_fd, &this->ctx_handle, &this->actual_size);
    memcpy(this->drm_buf, input_data, img_width * img_height * this->channel);

    auto t_p2 = chrono::high_resolution_clock::now();
    float pre_time2 = chrono::duration<float, milli>(t_p2 - t_p1).count();
    cout << "Done. Preprocessing2 Time: " << pre_time2 << " ms." << endl;    // DRM processing time

    // RGA: resize the source image to the model input size
    img_resize_slow(&this->rga_ctx, this->drm_buf, img_width, img_height, this->resize_buf, this->width, this->height);
    this->inputs[0].buf = this->resize_buf;

    auto t_p3 = chrono::high_resolution_clock::now();
    float pre_time3 = chrono::duration<float, milli>(t_p3 - t_p2).count();
    cout << "Done. Preprocessing3 Time: " << pre_time3 << " ms." << endl;    // RGA resize (slow) time

    rknn_inputs_set(this->ctx, this->io_num.n_input, this->inputs);

    auto t_p4 = chrono::high_resolution_clock::now();
    float pre_time4 = chrono::duration<float, milli>(t_p4 - t_p3).count();
    cout << "Done. Preprocessing4 Time: " << pre_time4 << " ms." << endl;    // rknn_inputs_set time

    // Model output settings
    memset(this->outputs, 0, sizeof(this->outputs));
    outputs[0].want_float = 0;

    printf("Inferencing ... \n");
    auto t_0 = chrono::high_resolution_clock::now();
    ret = rknn_run(this->ctx, NULL);
    ret = rknn_outputs_get(this->ctx, this->io_num.n_output, this->outputs, NULL);
    auto t_1 = chrono::high_resolution_clock::now();
    float inf_time = chrono::duration<float, milli>(t_1 - t_0).count();
    cout << "Done. RKNN Infer Time: " << inf_time << " ms." << endl;    // RKNN inference time (rknn_run + rknn_outputs_get)
```
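Because rknn_inputs_set only sometimes exceeds 60 ms (the single run reported here shows about 36 ms), one sample per stage says little about the spikes. A minimal sketch for collecting min/mean/max over many frames might look like the following; `StageStats` and `time_stage` are hypothetical helpers, not part of the RKNN API, and only standard C++ is used.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <string>
#include <utility>

// Hypothetical helper: accumulates repeated timings of one pipeline stage
// (for example the rknn_inputs_set call), so occasional spikes show up as a
// large max_ms instead of being hidden in a single sample.
struct StageStats {
    std::string name;                 // label for printing, e.g. "rknn_inputs_set"
    std::size_t count = 0;            // number of recorded samples
    double total_ms = 0.0;            // sum of all samples
    double min_ms = std::numeric_limits<double>::infinity();
    double max_ms = 0.0;

    // Records one duration in milliseconds.
    void add(double ms) {
        ++count;
        total_ms += ms;
        min_ms = std::min(min_ms, ms);
        max_ms = std::max(max_ms, ms);
    }

    // Average duration; 0.0 when nothing has been recorded yet.
    double mean_ms() const { return count ? total_ms / count : 0.0; }
};

// Hypothetical helper: runs f() once and records its duration in stats.
// steady_clock is monotonic, so wall-clock adjustments cannot skew samples.
template <typename F>
void time_stage(StageStats& stats, F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto t1 = std::chrono::steady_clock::now();
    stats.add(std::chrono::duration<double, std::milli>(t1 - t0).count());
}
```

In the per-frame loop this could wrap the existing call, e.g. `time_stage(set_stats, [&] { rknn_inputs_set(this->ctx, this->io_num.n_input, this->inputs); });`, and print `set_stats.min_ms`, `mean_ms()` and `max_ms` after a few hundred frames. steady_clock is used instead of high_resolution_clock because the latter may be an alias of system_clock on some standard libraries.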

Output:

```text
Processing Input ...
Done. Preprocessing1 Time: 0.001458 ms.
Done. Preprocessing2 Time: 1.28944 ms.
Done. Preprocessing3 Time: 6.79251 ms.
Done. Preprocessing4 Time: 36.0082 ms.
Inferencing ...
Done. RKNN Infer Time: 76.1672 ms.
```
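Adding up the stages of this single run gives a rough per-frame budget (it covers only the timed regions, so post-processing and anything outside them are not included; the inference figure includes rknn_outputs_get):

```latex
T_{\text{frame}} \approx 0.0015 + 1.29 + 6.79 + 36.01 + 76.17 \approx 120.26\ \text{ms}

\frac{T_{\text{rknn\_inputs\_set}}}{T_{\text{frame}}} \approx \frac{36.01}{120.26} \approx 29.9\%,
\qquad
\frac{1000\ \text{ms}}{120.26\ \text{ms}} \approx 8.3\ \text{frames/s}
```

So in this run rknn_inputs_set accounts for roughly 30% of the measured time, more than the DRM copy and RGA resize combined (about 8.1 ms).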
