Hybrid quantization: why do I get an out of memory error even with batch_size set to 1?

Posted 2019-12-12 17:10:32

Last edited by xsky on 2019-12-12 17:20

    # rknn is an RKNN() instance created earlier (not shown in the post)
    # pre-process config
    print('--> config model')
    rknn.config(batch_size=1, channel_mean_value='123 117 104 1', reorder_channel='0 1 2', epochs=100, quantized_dtype='asymmetric_quantized-u8')
    print('done')

    model_file = '../XMC2-Det_student_detector.pth.tar_op-v9.onnx'
    print('--> Loading model', model_file)
    ret = rknn.load_onnx(model=model_file)
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Hybrid quantization, step 1
    print('--> hybrid_quantization_step1')
    ret = rknn.hybrid_quantization_step1(dataset='./dataset.txt')
    if ret != 0:
        print('hybrid_quantization_step1 failed!')
        exit(ret)
    print('done')

Error output:
W Warning: Axis may need to be adjusted according to original model shape.
W Warning: Axis may need to be adjusted according to original model shape.
W Unhandle status: the input shape of reshape layer Reshape_125_3 is not 4-D
W Warning: Axis may need to be adjusted according to original model shape.
W Warning: Axis may need to be adjusted according to original model shape.
W Unhandle status: the input shape of reshape layer Reshape_126_4 is not 4-D
W Warning: Axis may need to be adjusted according to original model shape.
W:tensorflow:From D:\Program Files\Python\Python36\lib\site-packages\rknn\api\rknn.py:194: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2019-12-12 17:06:39.913429: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-12-12 17:06:39.917815: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 4294967296
2019-12-12 17:06:39.937048: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 3865470464 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-12-12 17:06:39.941542: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 3865470464
2019-12-12 17:06:39.959247: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 3478923264 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-12-12 17:06:39.963639: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 3478923264
2019-12-12 17:06:47.350046: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-12-12 17:06:47.355453: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592
2019-12-12 17:06:47.400406: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-12-12 17:06:47.406481: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592
2019-12-12 17:06:47.465856: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-12-12 17:06:47.470446: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592

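Setting epochs to 100 triggers the error; setting it to 1 does not, but obviously epochs=1 is not usable in practice.

The input images are 1080p, and dataset.txt lists 2164 image paths.
batch_size in config is already set to 1, so why does it still report out of memory?
According to the documentation, a smaller batch_size should reduce memory usage. The model also needs batch_size=1 in actual use, and if the batch size used for the ONNX export does not match the one passed to rknn.config, an error is reported.
Environment: rknn 1.2.1 on Windows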
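For reference, the sizes in the log above are easier to read in GiB. The following is a minimal sketch that pulls the failed allocation sizes out of the log text; the helper names (`failed_host_allocations`, `to_gib`) are made up for illustration, and the sample lines are copied from the output above. Note that the failing source file is TensorFlow's cuda_driver.cc, so these messages come from TensorFlow's CUDA layer.

```python
import re

# Matches the "failed to alloc N bytes on host" lines from the log above.
ALLOC_FAILURE = re.compile(r"failed to alloc (\d+) bytes on host")


def failed_host_allocations(log_text):
    """Return the byte count of every failed host allocation, in log order."""
    return [int(m.group(1)) for m in ALLOC_FAILURE.finditer(log_text)]


def to_gib(num_bytes):
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


# Two error lines copied verbatim from the log in this post.
sample_log = (
    "2019-12-12 17:06:39.913429: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] "
    "failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory\n"
    "2019-12-12 17:06:47.350046: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] "
    "failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory\n"
)

for size in failed_host_allocations(sample_log):
    print(f"{size} bytes = {to_gib(size):.2f} GiB")
```

So the requests start at 4 GiB and later reach 8 GiB of pinned host memory.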

Posted 2019-12-13 08:43:38

Last edited by jefferyzhang on 2019-12-13 08:45

What does CUDA_ERROR_OUT_OF_MEMORY have to do with hybrid quantization...

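Posted 2019-12-13 16:07:52

Last edited by xsky on 2019-12-16 10:30

jefferyzhang wrote on 2019-12-13 08:43:
What does CUDA_ERROR_OUT_OF_MEMORY have to do with hybrid quantization...

This error occurs precisely when running step 1 of hybrid quantization. The log says it needs to allocate 8 GB of host memory, but according to the documentation a smaller batch_size should reduce memory consumption. batch_size is already 1, so why does it still need this much memory? At that point the host's virtual memory had already grown to 20 GB.
Also, lowering epochs to 20 still reports out of memory; only at 15 does the error go away. This must be related to the quantization mechanism. These are just 1080p images. Have you tried this yourselves? How much memory is actually enough? Needing this much memory with batch_size already down to 1 does not seem reasonable.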

2019-12-13 15:53:14.846719: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592
I iterations: 196
2019-12-13 15:53:14.854817: E tensorflow/stream_executor/cuda/cuda_driver.cc:890] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-12-13 15:53:14.859445: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592


Posted 2019-12-16 08:21:32

rknn does not allocate GPU memory.
We use system memory...

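Posted 2019-12-16 10:31:56

Last edited by xsky on 2019-12-16 10:37

jefferyzhang wrote on 2019-12-16 08:21:
rknn does not allocate GPU memory.
We use system memory...

Does hybrid quantization (running step1 and step2) follow the same flow on Win10, without using the GPU? This error was reported by hybrid quantization step1 with toolkit 1.2.1 on Win10.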

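Posted 2019-12-16 10:38:46

xsky wrote on 2019-12-16 10:31:
Does hybrid quantization follow the same flow on Win10, without using the GPU? This is with toolkit 1.2.1 on Win10 ...

You're overthinking it.
Quantization and conversion are pure CPU code written in C. If C code could automatically use the GPU, and run CUDA on top of that, it would be truly magical..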

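Posted 2019-12-16 16:20:09

Last edited by xsky on 2019-12-16 17:34

jefferyzhang wrote on 2019-12-16 10:38:
You're overthinking it.
Quantization and conversion are pure CPU code written in C. If C code could automatically use the GPU, and run CUDA on top of that, ...

Earlier I had installed tensorflow-gpu so I could run the ONNX model directly for comparison and validation. Since you say this has nothing to do with the GPU, I uninstalled the -gpu package and reinstalled tensorflow (for 1.14 the default package is the CPU build), and the error is no longer reported. Before that, once this error appeared, the quantization iterations would hang.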
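For anyone who wants to keep tensorflow-gpu installed, a possible alternative that I have not verified with toolkit 1.2.1 is to hide the GPU from CUDA before TensorFlow is imported. `CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable; that setting it avoids the pinned host memory allocation in this particular case is an assumption, and the helper name `hide_cuda_devices` is made up for illustration.

```python
import os


def hide_cuda_devices(env=None):
    """Set CUDA_VISIBLE_DEVICES=-1 so CUDA-based libraries see no GPU.

    `env` defaults to os.environ; a plain dict can be passed instead,
    for example to build the environment for a subprocess.
    """
    env = os.environ if env is None else env
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


# This has to run before `import tensorflow`, or before any import that
# pulls TensorFlow in (assumed here to include the toolkit itself).
hide_cuda_devices()
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

If this works, the conversion script would call `hide_cuda_devices()` at the very top, ahead of the toolkit import. Reinstalling the CPU build, as described above, is the fix actually confirmed in this thread.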