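I'm using a lightly modified YOLOX-s model. With preprocessing based on the official rknpu yolov5 demo, the rknn_inputs_set call sometimes takes more than 60 ms, while model inference itself takes only about 75 ms. Does anyone know how to speed up this model-input step? Many thanks! Part of the code is shown below.

System: Linux debian10.toybrick 4.4.194 aarch64
API version: 1.7.0
DRV version: 1.7.0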
 
* RGA and DRM are already initialized together with the model.

Loading RKNN Model ...
Initializing RKNN Envs ...
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.7.0 (7880361 build: 2021-08-05 11:25:07)
D RKNNAPI:   DRV: 1.7.0 (7880361 build: 2021-08-16 16:16:21)
D RKNNAPI: ==============================================
 
Code and result output:
    // RKNN model Input Settings
    printf("Processing Input ... \n");
    auto t_p0 = chrono::high_resolution_clock::now();
    memset(this->inputs, 0, sizeof(this->inputs));
    this->inputs[0].index = 0;
    this->inputs[0].type = RKNN_TENSOR_UINT8;
    this->inputs[0].size = this->width * this->height * this->channel;
    this->inputs[0].fmt = RKNN_TENSOR_NHWC;
    this->inputs[0].pass_through = 0;
    auto t_p1 = chrono::high_resolution_clock::now();
    float pre_time1 = chrono::duration<float, milli>(t_p1 - t_p0).count();
    cout << "Done. Preprocessing1 Time: " << pre_time1 << " ms." << endl;    // input initialization time
    // DRM alloc buffer
    this->drm_buf = drm_buf_alloc(&this->drm_ctx, this->drm_fd, img_width, img_height, this->channel * 8,
                                  &this->buf_fd, &this->ctx_handle, &this->actual_size);
    memcpy(this->drm_buf, input_data, img_width * img_height * this->channel);
    auto t_p2 = chrono::high_resolution_clock::now();
    float pre_time2 = chrono::duration<float, milli>(t_p2 - t_p1).count();
    cout << "Done. Preprocessing2 Time: " << pre_time2 << " ms." << endl;    // DRM Processing time
    // RGA resize from the DRM buffer into the model input buffer
    img_resize_slow(&this->rga_ctx, this->drm_buf, img_width, img_height, this->resize_buf, this->width, this->height);
    this->inputs[0].buf = this->resize_buf;
    auto t_p3 = chrono::high_resolution_clock::now();
    float pre_time3 = chrono::duration<float, milli>(t_p3 - t_p2).count();
    cout << "Done. Preprocessing3 Time: " << pre_time3 << " ms." << endl;    // RGA Resize (slow) time
    rknn_inputs_set(this->ctx, this->io_num.n_input, this->inputs);
    auto t_p4 = chrono::high_resolution_clock::now();
    float pre_time4 = chrono::duration<float, milli>(t_p4 - t_p3).count();
    cout << "Done. Preprocessing4 Time: " << pre_time4 << " ms." << endl;    // rknn_inputs_set time
    // Model Output Settings
    memset(this->outputs, 0, sizeof(this->outputs));
    outputs[0].want_float = 0;
    printf("Inferencing ... \n");
    auto t_0 = chrono::high_resolution_clock::now();
    ret = rknn_run(this->ctx, NULL);
    ret = rknn_outputs_get(this->ctx, this->io_num.n_output, this->outputs, NULL);
    auto t_1 = chrono::high_resolution_clock::now();
    float inf_time = chrono::duration<float, milli>(t_1 - t_0).count();
    cout << "Done. RKNN Infer Time: " << inf_time << " ms." << endl;    // rknn inference time
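
The timing code above repeats the same three lines for every stage. As a minimal sketch, the same measurements could be taken with a small lap timer. `StageTimer` is a hypothetical helper, not part of the RKNN API, and it assumes `std::chrono::steady_clock` is preferable to `high_resolution_clock` here because it is guaranteed to be monotonic.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of the RKNN API): a lap timer for
// measuring consecutive stages such as the DRM copy, the RGA resize,
// rknn_inputs_set, and rknn_run.
class StageTimer {
public:
    using Clock = std::chrono::steady_clock;

    StageTimer() : last_(Clock::now()) {}

    // Returns the milliseconds elapsed since construction or since the
    // previous lap() call, then restarts the measurement.
    float lap() {
        const Clock::time_point now = Clock::now();
        const float ms =
            std::chrono::duration<float, std::milli>(now - last_).count();
        assert(ms >= 0.0f);  // steady_clock never goes backwards
        last_ = now;
        return ms;
    }

private:
    Clock::time_point last_;
};
```

In the code above this would amount to constructing one `StageTimer` before the input setup and calling `lap()` once after each stage, for example right after `rknn_inputs_set`.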
 
 
 Processing Input ...
Done. Preprocessing1 Time: 0.001458 ms.
Done. Preprocessing2 Time: 1.28944 ms.
Done. Preprocessing3 Time: 6.79251 ms.
Done. Preprocessing4 Time: 36.0082 ms.
Inferencing ...
Done. RKNN Infer Time: 76.1672 ms.
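
For reference, the figures in this run add up to roughly 120 ms per frame. The rknn_inputs_set stage (Preprocessing4) accounts for about 30% of that, or nearly half of the inference time. Below is a small sketch of that arithmetic, using only the numbers printed above. `Stage`, `measured_stages`, `total_ms`, and `share_of_total` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One measured pipeline stage and its duration in milliseconds.
struct Stage {
    std::string name;
    double ms;
};

// Timings copied from the log output of this single run.
std::vector<Stage> measured_stages() {
    return {
        {"input struct setup", 0.001458},
        {"DRM alloc + memcpy", 1.28944},
        {"RGA resize", 6.79251},
        {"rknn_inputs_set", 36.0082},
        {"rknn_run + rknn_outputs_get", 76.1672},
    };
}

// Sum of all stage durations in milliseconds.
double total_ms(const std::vector<Stage>& stages) {
    double sum = 0.0;
    for (const Stage& s : stages) {
        sum += s.ms;
    }
    return sum;
}

// Fraction (0..1) of the total time spent in the stage at `index`.
double share_of_total(const std::vector<Stage>& stages, std::size_t index) {
    assert(index < stages.size());
    return stages[index].ms / total_ms(stages);
}
```

These numbers describe one frame only. Since the rknn_inputs_set time reportedly varies and sometimes exceeds 60 ms, averaging over many frames would give a more reliable picture.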