Toybrick

Creating a Swap Partition to Fix OOM Problems

zhangzj (Super Moderator, points: 1117)
OP | Posted 2019-2-21 10:52:12 | Views: 9362 | Replies: 2
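When you use rknn-toolkit on the board to convert a large model with quantization enabled, the board may run out of memory and hit an OOM (out-of-memory) error.
The error log looks like this: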
  --> Loading model
  done
  --> Building model
  Killed
Serial console output:
  [toybrick@localhost work]$ [746756.249341] python3 invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
  [746756.250074] python3 cpuset=/ mems_allowed=0
  [746756.250506] CPU: 0 PID: 21169 Comm: python3 Not tainted 4.4.167 #17
  [746756.251071] Hardware name: rockchip,rk3399pro-toybrick-prod-linux (DT)
  [746756.251660] Call trace:
  [746756.251905] [<ffffff8008088948>] dump_backtrace+0x0/0x220
  [746756.252394] [<ffffff8008088b8c>] show_stack+0x24/0x30
  [746756.252851] [<ffffff80083a85ac>] dump_stack+0x94/0xbc
  [746756.253307] [<ffffff80081a5224>] dump_header.isra.5+0x50/0x15c
  [746756.253831] [<ffffff8008166bd4>] oom_kill_process+0x94/0x3d4
  [746756.254341] [<ffffff8008167188>] out_of_memory+0x1d8/0x2a0
  [746756.254840] [<ffffff800816b670>] __alloc_pages_nodemask+0x6b0/0x724
  [746756.255407] [<ffffff8008165a6c>] filemap_fault+0x24c/0x35c
  [746756.255907] [<ffffff8008226e40>] ext4_filemap_fault+0x40/0x60
  [746756.256430] [<ffffff8008185dec>] __do_fault+0x78/0xdc
  [746756.256883] [<ffffff800818906c>] handle_mm_fault+0x538/0xca4
  [746756.257394] [<ffffff80080944e0>] do_page_fault+0x214/0x36c
  [746756.257892] [<ffffff800809468c>] do_translation_fault+0x54/0xc8
  [746756.258425] [<ffffff8008080b08>] do_mem_abort+0x54/0xac
Or this:
  --> Loading model
  done
  --> Building model
  E Catch exception when building RKNN model!
  T Traceback (most recent call last):
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
  T     return fn(*args)
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
  T     options, feed_dict, fetch_list, target_list, run_metadata)
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
  T     run_metadata)
  T tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[7,608,608,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  T        [[Node: convolution_1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](convolution_1_2/Pad, convolution_1/weight/read)]]
  T Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
  T During handling of the above exception, another exception occurred:
  T Traceback (most recent call last):
  T   File "rknn/api/rknn_base.py", line 470, in rknn.api.rknn_base.RKNNBase.build
  T   File "rknn/api/rknn_base.py", line 888, in rknn.api.rknn_base.RKNNBase._quantize
  T   File "rknn/base/rknnlib/app/tensorzone/quantization.py", line 248, in rknn.base.rknnlib.app.tensorzone.quantization.Quantization.run
  T   File "rknn/base/rknnlib/app/tensorzone/quantization.py", line 141, in rknn.base.rknnlib.app.tensorzone.quantization.Quantization._run_quantization
  T   File "rknn/base/rknnlib/app/tensorzone/graph.py", line 98, in rknn.base.rknnlib.app.tensorzone.graph.Graph.run
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
  T     run_metadata_ptr)
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
  T     feed_dict_tensor, options, run_metadata)
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
  T     run_metadata)
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
  T     raise type(e)(node_def, op, message)
  T tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[7,608,608,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  T        [[Node: convolution_1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](convolution_1_2/Pad, convolution_1/weight/read)]]
  T Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
  T Caused by op 'convolution_1_2/Conv2D', defined at:
  T   File "rknn_transform.py", line 29, in <module>
  T     rknn.build(do_quantization=True, dataset='./dataset.txt')
  T   File "/usr/local/lib64/python3.6/site-packages/rknn/api/rknn.py", line 162, in build
  T     ret = self.rknn_base.build(do_quantization=do_quantization, dataset=dataset, pack_vdata=pre_compile)
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 956, in conv2d
  T     data_format=data_format, dilations=dilations, name=name)
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
  T     op_def=op_def)
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
  T     return func(*args, **kwargs)
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
  T     op_def=op_def)
  T   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
  T     self._traceback = tf_stack.extract_stack()
  T ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[7,608,608,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  T        [[Node: convolution_1_2/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](convolution_1_2/Pad, convolution_1/weight/read)]]
  T Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
  done

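Either of these logs means the system has run out of memory. In that case you can add swap space to make up the shortfall; this post uses a swap file rather than a dedicated partition.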
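To get a feel for why memory runs out, the second log names the tensor it failed to allocate: shape [7,608,608,32] of type float. Assuming "float" here means 32-bit float (4 bytes per element), a quick calculation gives the size of that single intermediate tensor:

```python
def tensor_bytes(shape, bytes_per_elem=4):
    """Size in bytes of a dense tensor; the default of 4 bytes assumes float32."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_elem


# Shape taken from the ResourceExhaustedError in the log above.
oom_shape = (7, 608, 608, 32)
size = tensor_bytes(oom_shape)
print(f"{size} bytes = {size / 2**20:.1f} MiB")
# prints: 331218944 bytes = 315.9 MiB
```

So one activation of one convolution layer already needs about 316 MiB, and quantization keeps several such tensors alive at once. The leading 7 is presumably the number of input images processed together (an assumption; the log itself does not say).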
Check the current swap size with free -mh. By default there should be no swap at all:
  $ free -mh
                total        used        free      shared  buff/cache   available
  Mem:           3.8G        149M        3.2G        365M        439M        3.2G
  Swap:            0B          0B          0B
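Instead of reading the free output by eye, a script can check for swap directly. free takes its figures from /proc/meminfo, whose lines look like "SwapTotal:  0 kB". A minimal sketch (the function names are illustrative, not part of any tool) that parses that format:

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value} (values in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def swap_total_kb(meminfo_text):
    """Total configured swap in kB; 0 means no swap is active."""
    return parse_meminfo(meminfo_text).get("SwapTotal", 0)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        kb = swap_total_kb(f.read())
    print("no swap configured" if kb == 0 else f"swap: {kb // 1024} MiB")
```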

To create the swap space from a file, run the following (here the swap file goes in /mnt and is 500 MB; adjust the size as needed, 2048 MB is recommended):
  cd /mnt
  sudo dd if=/dev/zero of=swap bs=1M count=500
  sudo chmod 600 swap
  sudo mkswap swap
  sudo swapon swap
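The same steps can be driven from a script so that the size becomes a single parameter. This is a hypothetical helper, not part of rknn-toolkit or any Toybrick tool: it uses an absolute path instead of cd /mnt plus a relative name, only prints the commands by default, and actually running them (dry_run=False) needs sudo rights on the board.

```python
import shlex
import subprocess


def swap_file_commands(path="/mnt/swap", size_mb=2048):
    """Return the post's command sequence for a swap file of size_mb MiB."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return [
        # bs=1M count=N writes N MiB of zeros
        ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_mb}"],
        # owner-only access; swapon warns about looser permissions
        ["chmod", "600", path],
        ["mkswap", path],
        ["swapon", path],
    ]


def create_swap_file(path="/mnt/swap", size_mb=2048, dry_run=True):
    for cmd in swap_file_commands(path, size_mb):
        if dry_run:
            print("sudo " + " ".join(shlex.quote(arg) for arg in cmd))
        else:
            subprocess.run(["sudo", *cmd], check=True)
```

Calling create_swap_file(size_mb=500) prints the four sudo commands from the post for /mnt/swap without executing anything.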
Once it's created, check the swap status:
  $ swapon -s
  Filename                                Type            Size    Used    Priority
  /mnt/swap                               file            511996  459592  -1
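swapon -s prints the contents of /proc/swaps, where Size and Used are in KiB. The sketch below parses that table (assuming no whitespace in file names). It also explains why the 500 MB file shows 511996 rather than 512000 KiB: mkswap reserves the first page of the file for the swap header, so with 4 KiB pages (an assumption about this kernel) the usable size is one page smaller.

```python
def parse_swaps(text):
    """Parse `swapon -s` / /proc/swaps output into dicts; sizes are in KiB."""
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        filename, kind, size, used, priority = line.split()
        entries.append({
            "filename": filename,
            "type": kind,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return entries


# Output copied from the post.
sample = """Filename                                Type            Size    Used    Priority
/mnt/swap                               file            511996  459592  -1"""

for e in parse_swaps(sample):
    # 500 MiB file = 512000 KiB, minus one 4 KiB header page = 511996 KiB
    pct = 100 * e["used_kib"] / e["size_kib"]
    print(f'{e["filename"]}: {e["size_kib"] // 1024} MiB, {pct:.0f}% used')
# prints: /mnt/swap: 499 MiB, 90% used
```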


Running free -mh again now shows about 500 MB of swap (reported as 499M):
  $ free -mh
                total        used        free      shared  buff/cache   available
  Mem:           3.8G         82M         25M        371M        3.7G        3.3G
  Swap:          499M        449M         50M
To enable the swap file automatically at every boot, edit /etc/fstab:
  sudo vi /etc/fstab
and add this line:
  /mnt/swap swap swap defaults 0 0
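Editing /etc/fstab by hand is fine; if the step is scripted, it's worth making sure the line is not added twice. A minimal sketch that works on the file's text (the function name is hypothetical; writing the result back to /etc/fstab needs root):

```python
def add_swap_entry(fstab_text, path="/mnt/swap"):
    """Return fstab text with a swap entry for `path`, unless one already exists."""
    for line in fstab_text.splitlines():
        fields = line.split()
        # Match on the first field (the device or file). Commented-out lines
        # start with '#', so they never match and don't count as present.
        if fields and fields[0] == path:
            return fstab_text
    # Fields: file, mount point ("swap"), type, options, dump, fsck pass.
    entry = f"{path} swap swap defaults 0 0"
    if fstab_text and not fstab_text.endswith("\n"):
        fstab_text += "\n"
    return fstab_text + entry + "\n"
```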
For what each fstab field means, see man 5 fstab or search online.

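If you no longer need the swap file, turn swap off first and then delete the file (and remove the /mnt/swap line from /etc/fstab if you added one):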
  sudo swapoff /mnt/swap
  sudo rm /mnt/swap
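If /etc/fstab still points at the deleted file, the next boot will try to enable swap that no longer exists, so the entry should go as well. A minimal sketch that drops every entry whose first field is the swap file path, assuming the one-line format used in this post (hypothetical helper; write the result back with root privileges):

```python
def remove_swap_entry(fstab_text, path="/mnt/swap"):
    """Return fstab text without entries whose first field is `path`."""
    kept = []
    for line in fstab_text.splitlines(keepends=True):
        fields = line.split()
        # Keep blank lines, comments and entries for other devices untouched.
        if fields and fields[0] == path:
            continue
        kept.append(line)
    return "".join(kept)
```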




administer (Intermediate Member, points: 311)
#2 | Posted 2019-3-7 20:10:37
You're missing a line: sudo swapon swap, which enables the swap area.

程子 (Intermediate Member, points: 386)
#3 | Posted 2019-3-8 02:32:53
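I'd still recommend doing quantization and pre-compilation on a development machine rather than on the board.
The board's performance is limited to begin with, and putting swap on the even slower eMMC helps only so much.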

Reference: http://t.rock-chips.com/forum.php?mod=viewthread&tid=127