Please refer to Building commands on linux and Building commands on windows in the CUDA Platform Guide.
Please refer to How to run benchmark in the CUDA Platform Guide.
Each function in our benchmark has three implementations: the CUDA implementation in ppl.cv, and its x86 and CUDA counterparts in OpenCV. All of them run on a series of parameter combinations covering common usage, and the elapsed time is recorded. Besides the function-specific parameters, each function is tested across its supported data types (uchar/float), channel counts (1/3/4), and commonly used image sizes. The input images are filled with randomly generated pixel values.
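
The test matrix described above can be sketched as a Cartesian product of data types, channel counts, and image sizes. This is only an illustration, not the benchmark's actual harness: the function names `parameter_grid` and `random_image` are hypothetical, and the image sizes are placeholders, since the concrete sizes are not listed here.

```python
import itertools
import random

# The data types and channel counts come from the text above.
# The image sizes are placeholders (assumption), not the benchmark's real list.
DATA_TYPES = ("uchar", "float")
CHANNELS = (1, 3, 4)
IMAGE_SIZES = ((320, 240), (640, 480), (1280, 720), (1920, 1080))


def parameter_grid(data_types=DATA_TYPES, channels=CHANNELS, sizes=IMAGE_SIZES):
    """Return an iterator over every (data type, channels, (width, height)) case."""
    return itertools.product(data_types, channels, sizes)


def random_image(dtype, channels, size, rng):
    """Return a flat list of width * height * channels random pixel values."""
    width, height = size
    count = width * height * channels
    if dtype == "uchar":
        # 8-bit pixels: integers in [0, 255]
        return [rng.randrange(256) for _ in range(count)]
    # float pixels: values in [0.0, 1.0)
    return [rng.random() for _ in range(count)]


if __name__ == "__main__":
    cases = list(parameter_grid())
    print(f"{len(cases)} test cases per function")
    tiny = random_image("uchar", 3, (4, 2), random.Random(0))
    print(f"tiny image holds {len(tiny)} values")
```

A real run would additionally iterate over each function's own parameters (kernel size, interpolation mode, and so on) and time every implementation on the same input.
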
We report performance as an acceleration ratio (speedup), using whichever OpenCV implementation, x86 or CUDA, is the fastest as the baseline. For each function, we sort the speedups and report the minimum, the median, and the maximum. These three values form a compact box-plot-style summary that characterizes the acceleration ratio, instead of a single average speedup.
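
As a rough sketch of this summary, the snippet below computes per-case speedups and reduces them to (minimum, median, maximum). It assumes the baseline is chosen per test case as the faster of the two OpenCV timings, and that the median of an even number of cases is the mean of the two middle values; neither detail is stated above. The function names and the sample timings are made up for illustration.

```python
import statistics


def speedups(ppl_ms, opencv_x86_ms, opencv_cuda_ms):
    """Speedup of ppl.cv over the faster OpenCV implementation, per test case."""
    result = []
    for ppl, x86, cuda in zip(ppl_ms, opencv_x86_ms, opencv_cuda_ms):
        baseline = min(x86, cuda)  # assumption: baseline picked per test case
        result.append(baseline / ppl)
    return result


def summarize(values):
    """Return (minimum, median, maximum) of the sorted speedups."""
    ordered = sorted(values)
    return ordered[0], statistics.median(ordered), ordered[-1]


if __name__ == "__main__":
    # Made-up elapsed times in milliseconds for three test cases.
    ppl = [1.0, 2.0, 4.0]
    x86 = [2.0, 8.0, 4.0]
    cuda = [3.0, 3.0, 6.0]
    print(summarize(speedups(ppl, x86, cuda)))  # (1.0, 1.5, 2.0)
```

Reporting the three order statistics keeps outliers visible: a function can have a large maximum speedup while its minimum still falls below 1.0.
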
Information of machines:

- X86 desktop computer with GeForce GTX 1060 GPU:
  - CPU: Intel® Core™ i7-7700 CPU (8 cores, 3.60 GHz)
  - GPU: GeForce GTX 1060 (1280 CUDA Cores, 1772 MHz)
  - Host memory: 32 GB
  - Device memory: 6 GB
  - OS: Ubuntu 16.04
- X86 Cloud server with Tesla V100 GPU:
  - CPU: Intel® Xeon® Gold 6146 CPU (12 cores, 3.20 GHz)
  - GPU: Tesla V100 (5120 CUDA Cores, 1230 MHz)
  - Host memory: 396 GB
  - Device memory: 64 GB
  - OS: Ubuntu 16.04

Each entry lists the (minimum, median, maximum) speedup, followed by the data type in parentheses.

Function | GeForce GTX 1060 | Tesla V100 |
---|---|---|
Abs | (1.027618, 1.216567, 2.522977)(schar), (1.002612, 1.079896, 1.212746)(float) | (1.920000, 2.250000, 4.000000)(schar), (1.428571, 1.600000, 2.666667)(float) |
Add | (1.037273, 1.317647, 3.866667)(uchar), (1.012752, 1.135303, 1.318750)(float) | (1.484422, 2.346158, 3.628854)(uchar), (1.169862, 1.400410, 2.473603)(float) |
AddWeighted | (0.911404, 1.069565, 2.500000)(uchar), (1.032401, 1.068750, 1.312500)(float) | (0.887411, 1.310023, 2.540298)(uchar), (1.162151, 1.358260, 2.297834)(float) |
Subtract | (1.062500, 1.321429, 3.444444)(uchar), (1.025086, 1.073438, 1.256250)(float) | (1.187220, 1.922802, 3.456477)(uchar), (1.200039, 1.639410, 3.316506)(float) |
Mul | (1.083189, 1.327586, 3.193548)(uchar), (1.020218, 1.062124, 1.213873)(float) | (1.591886, 2.275094, 3.521160)(uchar), (1.169695, 1.402987, 2.497707)(float) |
Div | (1.197259, 1.318625, 1.967651)(uchar), (1.020744, 1.091126, 1.343856)(float) | (1.393939, 2.000000, 2.666667)(uchar), (1.465116, 1.800000, 3.666667)(float) |
BGR2BGRA | (1.076923, 1.188437, 2.466667)(uchar), (1.009084, 1.058051, 1.319444)(float) | (0.996509, 2.210639, 3.574978)(uchar), (1.191974, 1.460745, 3.175911)(float) |
BGRA2BGR | (1.061947, 1.235294, 2.666667)(uchar), (1.013997, 1.085965, 1.347518)(float) | (1.198190, 1.848153, 2.867657)(uchar), (1.173317, 1.402871, 2.416639)(float) |
BGR2RGB | (1.052308, 1.251572, 2.700000)(uchar), (1.009910, 1.076321, 1.300000)(float) | (1.209468, 1.870198, 2.894736)(uchar), (1.178372, 1.450653, 2.412296)(float) |
BGRA2RGBA | (1.052252, 1.294118, 3.550000)(uchar), (1.005998, 1.061538, 1.250000)(float) | (1.008355, 2.209785, 3.238027)(uchar), (1.184873, 1.405101, 3.121947)(float) |
BGR2GRAY | (1.297546, 1.875000, 3.272727)(uchar), (1.018468, 1.121212, 2.225000)(float) | (1.398824, 2.286852, 3.272300)(uchar), (1.187580, 2.308537, 3.109362)(float) |
BGRA2GRAY | (1.172324, 1.976190, 3.850000)(uchar), (1.022443, 1.125000, 2.050000)(float) | (1.324805, 2.282110, 3.271641)(uchar), (1.212494, 1.904405, 3.039953)(float) |
GRAY2BGR | (1.063415, 1.540000, 2.666667)(uchar), (1.019014, 1.107143, 1.454545)(float) | (1.102559, 1.845679, 2.813610)(uchar), (1.187604, 1.552207, 2.345931)(float) |
GRAY2BGRA | (1.211268, 1.688889, 3.550000)(uchar), (1.016974, 1.095238, 1.960000)(float) | (0.964918, 2.304777, 3.295088)(uchar), (1.212406, 1.816164, 3.188364)(float) |
BGR2YCrCb | (1.115718, 1.437500, 2.933333)(uchar), (1.009922, 1.078431, 1.357143)(float) | (1.241286, 1.960112, 2.976867)(uchar), (1.197845, 1.590102, 2.509307)(float) |
YCrCb2BGR | (1.000899, 1.245714, 2.275000)(uchar), (1.023173, 1.066667, 1.275168)(float) | (1.240044, 1.953158, 3.116737)(uchar), (1.199772, 1.588004, 2.517634)(float) |
BGR2HSV | (1.047619, 1.124324, 1.440000)(uchar), (0.997503, 1.044550, 1.516779)(float) | (1.244666, 1.442114, 2.198618)(uchar), (1.185546, 1.439000, 2.350594)(float) |
HSV2BGR | (1.087786, 1.182353, 1.625000)(uchar), (1.296982, 1.423002, 1.751773)(float) | (1.205051, 1.746410, 2.417964)(uchar), (1.799301, 1.971470, 2.591022)(float) |
BGR2LAB | (1.057958, 1.116981, 1.347518)(uchar), (4.817276, 4.958534, 5.178571)(float) | (1.218432, 1.340922, 2.003351)(uchar), (11.744559, 16.295735, 16.833752)(float) |
LAB2BGR | (3.492866, 3.535597, 3.842932)(uchar), (1.005988, 1.049924, 1.215789)(float) | (7.991927, 10.267584, 11.016621)(uchar), (1.196817, 1.574198, 2.219194)(float) |
NV122BGR | (11.103975, 14.618688, 16.317706)(uchar) | (22.728164, 76.905314, 80.594786)(uchar) |
NV122BGRA | (16.251937, 17.611833, 18.670433)(uchar) | (25.602489, 92.221827, 126.598710)(uchar) |
NV212BGR | (11.066975, 15.031184, 16.289727)(uchar) | (21.985959, 76.702585, 80.624540)(uchar) |
NV212BGRA | (16.354588, 17.128284, 18.803167)(uchar) | (25.452630, 92.121791, 126.082495)(uchar) |
BGR2I420 | (10.693974, 14.342727, 15.599600)(uchar) | (18.157460, 65.449673, 96.908186)(uchar) |
BGRA2I420 | (10.988100, 13.469619, 14.967882)(uchar) | (19.512714, 71.242601, 102.685346)(uchar) |
I4202BGR | (12.730026, 14.859967, 16.374234)(uchar) | (21.091450, 72.681859, 75.285153)(uchar) |
I4202BGRA | (16.132000, 16.995249, 18.781433)(uchar) | (24.304845, 89.937122, 126.809467)(uchar) |
YUV2GRAY | (1.474327, 2.118086, 3.434171)(uchar) | (1.091379, 14.942844, 20.622220)(uchar) |
UYVY2BGR | (12.124640, 14.798133, 15.474752)(uchar) | (23.692301, 54.666932, 60.882893)(uchar) |
UYVY2GRAY | (13.245150, 17.255159, 29.511351)(uchar) | (11.636411, 47.939781, 54.628228)(uchar) |
YUYV2BGR | (10.991902, 14.631356, 15.423861)(uchar) | (23.766830, 55.792111, 62.559629)(uchar) |
YUYV2GRAY | (15.908143, 18.193933, 28.971842)(uchar) | (11.908574, 45.874270, 54.474318)(uchar) |
AdaptiveThreshold | (3.573068, 14.689693, 25.344412)(uchar) | (21.294030, 73.243000, 102.870909)(uchar) |
BilateralFilter | (1.168011, 1.525880, 2.761905)(uchar), (1.030715, 1.982063, 2.684054)(float) | (81.962000, 170.308496, 2558.305913)(uchar), (237.644286, 462.073981, 574.847162)(float) |
BitwiseAnd | (1.000000, 3.181818, 8.497829)(uchar) | (1.147059, 4.000000, 35.471800)(uchar) |
BoxFilter | (1.480552, 4.495130, 7.620557)(uchar), (1.448339, 6.352779, 13.262363)(float) | (7.124271, 25.163939, 46.066635)(uchar), (14.106000, 38.753665, 77.675652)(float) |
CalcHist | (1.150568, 1.770833, 2.394737)(uchar) | (1.857143, 3.000000, 3.533333)(uchar) |
ConvertTo | (0.993691, 1.073810, 1.603448)(uchar), (0.998486, 1.052786, 1.187879)(float) | (0.670103, 1.344444, 4.500000)(uchar), (0.780702, 1.176471, 2.666667)(float) |
CopyMakeBorder | (1.000000, 1.370000, 2.757983)(uchar), (1.057269, 1.162304, 3.528545)(float) | (1.596447, 1.717750, 14.224575)(uchar), (1.389611, 2.373997, 21.272053)(float) |
Crop | (5.246094, 10.568061, 17.913457)(uchar), (3.501557, 12.897787, 23.676354)(float) | (8.071733, 48.336800, 62.190875)(uchar), (27.270250, 42.298500, 82.963085)(float) |
Dilate | (0.700053, 3.018605, 36.491972)(uchar), (1.233902, 4.462496, 30.425474)(float) | (1.521163, 12.164694, 39.407692)(uchar), (4.035320, 33.010542, 91.460005)(float) |
DistanceTransform | (4.748947, 10.090304, 53.176053)(float) | (5.214643, 15.715885, 175.633388)(float) |
EqualizeHist | (1.282700, 1.964115, 3.808168)(uchar) | (1.896552, 2.444444, 22.475000)(uchar) |
Erode | (0.712260, 2.985459, 37.392623)(uchar), (1.166272, 4.408501, 30.251434)(float) | (1.519858, 12.016352, 40.007692)(uchar), (4.043243, 31.962500, 91.159956)(float) |
Filter2D | (0.857971, 2.707717, 10.080080)(uchar), (1.158228, 2.812923, 11.549172)(float) | (1.064132, 5.109709, 10.239130)(uchar), (1.180978, 3.344444, 10.202247)(float) |
Flip | (1.166667, 1.250000, 1.885246)(uchar), (1.020772, 1.088785, 1.247059)(float) | (2.266297, 2.692010, 2.764538)(uchar), (1.430543, 1.496240, 2.699739)(float) |
GaussianBlur | (1.642779, 3.304591, 12.553922)(uchar), (1.660031, 2.951287, 6.443099)(float) | (9.550909, 15.166667, 77.000000)(uchar), (9.790741, 22.002174, 56.855556)(float) |
GuidedFilter | (1.841109, 4.446838, 11.442694)(uchar), (1.914174, 4.654867, 12.122168)(float) | (6.002427, 33.592295, 85.662757)(uchar), (6.084409, 29.738052, 103.347549)(float) |
Integral | (0.336724, 0.616571, 1.143994)(uchar), (0.560649, 1.074805, 2.191565)(float) | (0.447493, 1.962876, 2.007879)(uchar), (0.788689, 2.565471, 3.641776)(float) |
Laplacian | (5.719577, 9.622927, 55.736000)(uchar), (2.665474, 5.248192, 15.066487)(float) | (31.377286, 75.869500, 234.550952)(uchar), (17.290625, 35.339333, 106.552075)(float) |
Mean | (0.498830, 18.990729, 59.166509)(uchar), (1.752304, 12.701592, 34.480694)(float) | (0.802057, 35.056429, 221.736000)(uchar), (6.057550, 39.561500, 157.173500)(float) |
MeanStdDev | (0.337072, 11.151297, 40.957974)(uchar), (4.841667, 8.282971, 17.452190)(float) | (0.536569, 23.880556, 145.785484)(uchar), (8.946910, 32.976917, 94.712100)(float) |
MedianBlur | (0.351890, 1.116563, 3.519904)(uchar), (2.136163, 3.724323, 4.381040)(float) | (3.459940, 7.941560, 23.388209)(uchar), (18.604885, 20.476500, 21.021053)(float) |
Merge | (2.278780, 2.644595, 6.276601)(uchar), (2.353986, 8.241453, 10.674901)(float) | (3.113387, 17.123593, 20.245900)(uchar), (16.259525, 24.317619, 46.897456)(float) |
MinMaxLoc | (0.326772, 3.114965, 6.056915)(uchar), (1.952189, 10.600955, 16.410043)(float) | (0.464410, 4.004704, 11.220379)(uchar), (1.564421, 15.698727, 33.511538)(float) |
Norm | (0.256201, 1.656461, 39.051089)(uchar), (1.036796, 5.491927, 31.023884)(float) | (0.132152, 5.693815, 108.367206)(uchar), (1.217980, 22.487100, 111.887975)(float) |
Normalize | (1.477920, 6.778663, 30.741967)(uchar), (1.928009, 11.314839, 27.490531)(float) | (4.358423, 15.967336, 77.594030)(uchar), (9.863286, 30.135352, 85.963588)(float) |
Ones | (34.007361, 102.137812, 110.859572)(uchar), (14.834672, 20.067127, 34.036056)(float) | (42.736150, 170.238000, 361.399606)(uchar), (30.091450, 60.011938, 119.932833)(float) |
PerspectiveTransform | (24.956857, 30.550903, 53.705714)(float) | (77.732667, 120.337187, 236.752000)(float) |
PyrDown | (0.855491, 1.760697, 3.200000)(uchar), (0.783599, 0.996094, 1.968254)(float) | (0.888889, 1.840000, 3.000000)(uchar), (0.967742, 1.714286, 2.500000)(float) |
PyrUp | (0.982715, 1.141379, 3.225000)(uchar), (1.018668, 1.101715, 1.277778)(float) | (1.092308, 1.714286, 2.750000)(uchar), (0.982332, 1.104167, 1.714286)(float) |
Remap | (1.000000, 1.500000, 3.093750)(uchar), (0.979498, 1.380000, 3.125000)(float) | (1.192308, 2.666667, 3.333333)(uchar), (1.117647, 2.500000, 3.666667)(float) |
Resize | (1.000943, 1.531532, 2.875000)(uchar), (0.993286, 1.147826, 2.619048)(float) | (1.030841, 2.471922, 3.428373)(uchar), (1.131494, 2.197102, 3.316012)(float) |
Rotate | (0.574651, 1.043956, 3.076923)(uchar), (0.546294, 0.665658, 1.033333)(float) | (0.805556, 2.016667, 5.550000)(uchar), (0.480392, 0.957143, 4.000000)(float) |
SepFilter2D | (1.364119, 1.908174, 9.654275)(uchar), (1.341837, 1.863272, 9.666667)(short), (1.340006, 2.333333, 6.084475)(float) | (9.454545, 19.282143, 84.583333)(uchar), (9.413043, 19.357143, 84.583333)(short), (10.227273, 18.721429, 63.787500)(float) |
SetTo | (1.000000, 1.600000, 5.615385)(uchar), (1.006349, 1.281250, 3.380952)(float) | (0.735294, 2.875000, 6.153846)(uchar), (0.542540, 1.304348, 4.500000)(float) |
Sobel | (2.329529, 5.088146, 11.560976)(uchar), (2.374818, 4.994074, 11.308411)(short), (2.209220, 4.286020, 6.318538)(float) | (20.446154, 46.628319, 72.314286)(uchar), (19.476015, 42.983740, 90.071429)(short), (21.000000, 39.061538, 70.411111)(float) |
Split | (1.067019, 1.238372, 3.166667)(uchar), (1.006090, 1.067797, 1.450000)(float) | (1.230769, 3.000000, 4.000000)(uchar), (1.070588, 1.461538, 4.000000)(float) |
Transpose | (8.403667, 11.634840, 15.109060)(uchar), (5.698850, 12.042905, 15.199568)(float) | (28.730750, 64.849000, 72.014231)(uchar), (28.907429, 72.115600, 126.537647)(float) |
WarpAffine | (0.730088, 3.073171, 96.212000)(uchar), (0.967412, 2.549020, 118.212581)(float) | (1.200000, 4.000000, 260.023697)(uchar), (1.230769, 4.333333, 404.470000)(float) |
WarpPerspective | (1.010352, 3.270270, 172.198672)(uchar), (1.022066, 3.275000, 172.292843)(float) | (1.266667, 4.000000, 673.100000)(uchar), (1.257143, 4.000000, 983.360000)(float) |
Zeros | (0.952381, 1.000000, 1.087719)(uchar), (0.975610, 1.001887, 1.019608)(float) | (1.000000, 1.000000, 2.000000)(uchar), (1.000000, 1.000000, 1.066667)(float) |
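
Each cell in the table above packs one or more speedup triples, each tagged with a data type. For readers who want to post-process the numbers, a small parsing sketch might look like this. The helper `parse_cell` is hypothetical and not part of ppl.cv, and the regular expression assumes cells keep the exact `(a, b, c)(type)` layout used above.

```python
import re

# Matches one "(min, median, max)(dtype)" group inside a table cell.
CELL_PATTERN = re.compile(r"\(([\d.]+),\s*([\d.]+),\s*([\d.]+)\)\((\w+)\)")


def parse_cell(cell):
    """Map each data type in a cell to its (min, median, max) speedup triple."""
    return {
        dtype: (float(lo), float(mid), float(hi))
        for lo, mid, hi, dtype in CELL_PATTERN.findall(cell)
    }


if __name__ == "__main__":
    # The GeForce GTX 1060 cell of the Abs row.
    cell = "(1.027618, 1.216567, 2.522977)(schar), (1.002612, 1.079896, 1.212746)(float)"
    for dtype, (lo, mid, hi) in parse_cell(cell).items():
        print(f"{dtype}: min={lo} median={mid} max={hi}")
```
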