Slowdown when using alloc!
#33
Comments
Your problem is that you don't release the allocated arrays, so you end up allocating quite a bit out of the buffer. If you correct this, you get a small performance increase over the "normal" version:
using BenchmarkTools, Bumper
function work0(polys; use_custom_allocator=false)
    if use_custom_allocator
        custom_allocator = Bumper.SlabBuffer()
        work1(polys, custom_allocator)
    else
        work1(polys)
    end
end
# Very important work
function work1(polys, custom_allocator)
    res = 0
    for poly in polys
        # Memory taken from the buffer inside this block is released at its `end`,
        # so every iteration reuses the same space.
        @no_escape custom_allocator begin
            new_poly = work2(poly, custom_allocator)
            res += sum(new_poly)
        end
    end
    res
end
function work1(polys)
    res = 0
    for poly in polys
        new_poly = work2(poly)
        res += sum(new_poly)
    end
    res
end
# Heap-allocated temporary.
function work2(poly::Vector{T}) where {T}
    new_poly = Vector{T}(undef, length(poly))
    work3!(new_poly)
end
# Temporary taken from the bump allocator; the caller is responsible for releasing it.
function work2(poly::Vector{T}, custom_allocator) where {T}
    new_poly = Bumper.alloc!(custom_allocator, T, length(poly))
    work3!(new_poly)
end
function work3!(poly::AbstractVector{T}) where {T}
    poly[1] = one(T)
    for i in 2:length(poly)
        poly[i] = convert(T, i)^3 - poly[i - 1]
    end
    poly
end
## #
m, n = 1_000, 10_000
polys = [rand(UInt32, rand(1:m)) for _ in 1:n];
@btime work0(polys, use_custom_allocator=false)
# 3.430 ms (10001 allocations: 20.20 MiB)
#0x0000e20ca5f991fb
@btime work0(polys, use_custom_allocator=true)
# 3.032 ms (4 allocations: 192 bytes)
#0x0000e20ca5f991fb
The code is not that well optimized, so it will give you only a very small performance improvement.
But that is precisely the point!
The problem is that when you don't release arrays that have gone out of scope, you end up using much more memory than you actually need. In this example the data you touch per iteration should fit into L1 cache (a little over 8 kB). But if you don't release the temporary data, you end up with a buffer that does not fit into cache at all (about 20 MB). Edit: the default SlabBuffer is about 8 MB, so it should fit in L3 cache.
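For reference, here is a rough back-of-the-envelope version of those numbers. It is only a sketch: it assumes the lengths are uniform on 1:m as in the benchmark setup, and it ignores array headers and allocator rounding, so it will not match the @btime output exactly. The names per_iter_bytes and total_bytes are made up for this illustration.

```julia
# Rough memory estimate for the benchmark (an estimate, not a measurement).
m, n = 1_000, 10_000
elsize = sizeof(UInt32)               # 4 bytes per coefficient

# Lengths are drawn uniformly from 1:m, so the expected length is (m + 1) / 2.
mean_len = (m + 1) / 2

# Temporaries released every iteration: at most one input poly plus one
# temporary of the same length are in use at any time.
per_iter_bytes = 2 * m * elsize       # worst case, longest possible poly

# Temporaries never released: all n of them stay alive in the buffer.
total_bytes = n * mean_len * elsize   # expected total over all polys

println("worst-case working set per iteration: ", per_iter_bytes, " bytes")
println("expected total of all temporaries:    ",
        round(total_bytes / 2^20, digits = 2), " MiB")
```

The per-iteration figure is the "little over 8 kB" mentioned above, and the total lands in the same range as the roughly 20 MiB that @btime reports for the heap-allocating version.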
Thanks for your answer, and sorry for not being clear; let me explain. My example is purposefully inefficient.
Hi,
I have the following example where I observe a 2x slowdown with Bumper.alloc!.
Could you please confirm that I use the package correctly?
Do you have ideas on how to fix this?
Thank you!
Running on