```python
import cv2
import timeit

print('OpenCL available:', cv2.ocl.haveOpenCL())

# A simple image pipeline that runs on both Mat (numpy array) and UMat
def img_cal(img, mode='none'):
    if mode == 'UMat':
        # Upload the image into a UMat so OpenCV can take its OpenCL (T-API) path
        img = cv2.UMat(img)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (7, 7), 1.5)
    img = cv2.Canny(img, 0, 50)
    if isinstance(img, cv2.UMat):
        # Download the result back to host memory as a numpy array
        img = img.get()
    return img

# Timing function: returns the average time per call in milliseconds.
# Note: for the UMat path, the upload (cv2.UMat) and download (.get())
# are inside the timed region.
def run(processor, function, n_threads, N):
    cv2.setNumThreads(n_threads)
    t = timeit.timeit(function, globals=globals(), number=N) / N * 1000
    print('%s avg. with %d threads: %0.2f ms' % (processor, n_threads, t))
    return t

img = cv2.imread('a.tif')
if img is None:
    raise SystemExit("could not read 'a.tif'")
N = 100
threads = [1, 6]
processor = {'GPU': "img_cal(img,'UMat')",
             'CPU': "img_cal(img)"}
results = {}
for n in threads:
    for pro in processor:
        results[pro, n] = run(processor=pro,
                              function=processor[pro],
                              n_threads=n, N=N)

# Each "[%]" figure is (baseline time / compared time) * 100, i.e. relative
# speed: 100 means equal, below 100 means slower. The last line compares the
# GPU (UMat) path run with 1 vs. 6 CPU threads.
print('\nGPU speed increase over 1 CPU thread [%%]: %0.2f' %
      (results[('CPU', 1)] / results[('GPU', 1)] * 100))
print('CPU speed increase on 6 threads versus 1 thread [%%]: %0.2f' %
      (results[('CPU', 1)] / results[('CPU', 6)] * 100))
print('GPU speed increase versus 6 threads [%%]: %0.2f' %
      (results[('GPU', 1)] / results[('GPU', 6)] * 100))
```
The test results:

```
toybrick@debian10:~/temp/pytest$ /usr/bin/python3 /home/toybrick/temp/pytest/main.py
OpenCL available: True
GPU avg. with 1 threads: 25.12 ms
CPU avg. with 1 threads: 3.34 ms
GPU avg. with 6 threads: 16.39 ms
CPU avg. with 6 threads: 3.14 ms
GPU speed increase over 1 CPU thread [%]: 13.31
CPU speed increase on 6 threads versus 1 thread [%]: 106.54
GPU speed increase versus 6 threads [%]: 153.28
```
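The "[%]" lines in the log are time ratios multiplied by 100, so a value below 100 means the first-named configuration is slower. A minimal sketch, using the averages copied from the log above, that turns them into plain slowdown factors; the variable names are illustrative only. Recomputing from the rounded millisecond values gives 13.30 rather than the logged 13.31, because the script divides the unrounded timings.

```python
# Average times (ms) copied from the log above
cpu1, gpu1 = 3.34, 25.12  # 1 thread
cpu6, gpu6 = 3.14, 16.39  # 6 threads

# How many times longer the OpenCL/UMat path takes than the plain Mat path
gpu_slowdown_1 = gpu1 / cpu1
gpu_slowdown_6 = gpu6 / cpu6

# The script's "[%]" figure is the inverse ratio times 100
pct = cpu1 / gpu1 * 100

print(f"UMat path is {gpu_slowdown_1:.1f}x slower with 1 thread")
print(f"UMat path is {gpu_slowdown_6:.1f}x slower with 6 threads")
print(f"script-style figure: {pct:.2f} %")
```

In other words, "13.31 %" means the UMat pipeline ran at roughly an eighth of the CPU pipeline's speed.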
After switching to OpenCL/GPU, performance actually got worse. Strange?!

```
toybrick@debian10:~$ clinfo
Number of platforms 1
Platform Name ARM Platform
Platform Vendor ARM
Platform Version OpenCL 1.2 v1.r18p0-01rel0.5630b190419266e7fe8b09ec0007fb39
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory
Platform Extensions function suffix ARM
Platform Name ARM Platform
Number of devices 1
Device Name Mali-T860
Device Vendor ARM
Device Vendor ID 0x8602000
Device Version OpenCL 1.2 v1.r18p0-01rel0.5630b190419266e7fe8b09ec0007fb39
Driver Version 1.2
Device OpenCL C Version OpenCL C 1.2 v1.r18p0-01rel0.5630b190419266e7fe8b09ec0007fb39
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 4
Max clock frequency 5MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 4
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (cl_khr_fp16)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 4029292544 (3.753GiB)
Error Correction support No
Max memory allocation 1007323136 (960.7MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 262144 (256KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 32 bytes
Pitch alignment for 2D image buffers 16 pixels
Max 2D image size 65536x65536 pixels
Max 3D image size 65536x65536x65536 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Global
Local memory size 32768 (32KiB)
Max number of constant args 8
Max constant buffer size 65536 (64KiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) ARM Platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [ARM]
clCreateContext(NULL, ...) [default] Success [ARM]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name ARM Platform
Device Name Mali-T860
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name ARM Platform
Device Name Mali-T860
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name ARM Platform
Device Name Mali-T860
toybrick@debian10:~$
```
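One factor worth ruling out (an assumption, not something the post establishes) is that the measurement includes one-time and per-call overheads. OpenCV's OpenCL path typically builds its kernels on the first call that uses a UMat, and in `img_cal` the upload via `cv2.UMat(img)` and the download via `.get()` both sit inside the timed region. A minimal sketch of a timing helper that runs a few untimed warm-up calls first and reports the median; `bench`, `warmup` and `repeat` are hypothetical names, not part of OpenCV.

```python
import statistics
import time


def bench(fn, warmup=3, repeat=100):
    """Return the median wall-clock time of fn() in milliseconds.

    The first `warmup` calls run but are not timed, so one-time costs
    (for example OpenCL kernel compilation on a first UMat call) do not
    inflate the result. The median is less sensitive to outliers than
    the mean that timeit-based averaging reports.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

With the pipeline above it could be called as `bench(lambda: img_cal(img, 'UMat'))` and `bench(lambda: img_cal(img))`. To separate transfer cost from compute cost, one could also time a variant that receives an already-uploaded `cv2.UMat`, while still forcing completion before the clock stops (for example by calling `.get()` on the result), since OpenCL work may be queued asynchronously.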
Toybrick (https://t.rock-chips.com/)