-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set MALLOC_ARENA_MAX=2 #751
Comments
cc @nateberkopec thoughts? |
|
@mperham This is interesting. How come you are suggesting making this change in the buildpack vs developers adjusting this at the app level as needed? Also, does this environment variable work with the I ended up looking at these two articles: |
@abinoda Because for every developer that makes the adjustment, there will be 100 that don't. Ruby will continue to get a reputation for memory bloat. @nateberkopec I see jemalloc as a lot more risky but perhaps the buildpack could ship jemalloc 5.0.1 and allow the user to opt into it by setting LD_PRELOAD. |
I wouldn't say that it's off the table, but it's definitely more work. There's other things to consider here as well, such as how do we roll this out. Even if it's a better experience for 99% of customers, we still have to think about that 1% who's experience is worse and how can we manage that experience. These are examples of things we need to consider: Should this be the default for ALL apps? Should we only default it for new apps? Maybe only apps on a new stack etc. Should we warn people when we make the change? How can people opt out once the change is made (i.e. if we're defaulting to a MALLOC_ARENA_MAX, then they cannot unset the environment variable, we have to have docs on what the "normal" default would be and how to detect and set it). |
@mperham As someone who's been working with Ruby apps on Heroku almost a decade, I agree 💯 |
@mperham Adding
I'm not sure this is accurate. Malloc needs a lock on the arena to access any memory inside of it, including memory which may not have anything to do with Ruby code (c extensions, just parts of Ruby which aren't behind the GVL). Terrence's research has shown that there is a performance impact (even if it is much less than the memory impact!). @schneems Yeah. This is why I write performance guides and don't run a PaaS 😆 |
We've rolled out jemalloc 5.0.1 to roughly a dozen applications on Heroku over the past four months or so. Of those, we've had issues with two of them segfaulting which was fixed by switching to jemalloc 3.6.0. Making jemalloc enabled by default could end up causing new subtle issues that are hard to debug where they're coming from. Asking people to set LD_PRELOAD gets tricky causing it's own set of issues |
I knew as soon as I typed that someone would call me out on it. 😆 |
@gaffneyc thanks for chiming in. It seems like setting |
Not sure if GitHub notifies or not but I opened up #752 which adds MALLOC_ARENA_MAX=2 as a default for new apps and anything on our next stack. Would be good to also look at default heap growth factors for GC as well. It looks like some of this changed with Ruby 2.4:
We currently recommend |
Would you consider shipping jemalloc 3.6 or 5.0.1 in the buildpack so people could opt into it by setting LD_PRELOAD? Having an official documented path to jemalloc seems like it would solve all of these issues with people hacking in jemalloc with other buildpacks. |
Just a data point – upgrading my app to Heroku-16 stack impacted my memory usage so severely that I had to revert to Cedar-14. Baking in better defaults (MALLOC_ARENA_MAX) is definitely welcome, as is jemalloc. |
The changes to the memory allocator happened going from cedar-12 to cedar-14. It's why this article is "testing cedar 14 memory use" https://devcenter.heroku.com/articles/testing-cedar-14-memory-use. There is no change from cedar-14 to heroku-16 in terms of memory allocator. |
@abinoda thanks, updated |
This PR will set MALLOC_ARENA_MAX=2 by default for new Ruby apps and for any apps running on the heroku-18 stack (when released). While we currently have [documentation on tuning the memory behavior of glibc by setting this environment variable](https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior) the default does not produce good results for Ruby applications that are using threads: - https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/ - https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html In general most Ruby applications are memory bound and by decreasing the memory footprint of the application we can enable scaling out via more workers. Less memory might also mean a cheaper to run application, as it consumes fewer resources. Setting this value is not entirely free. It does come with a performance trade off. For more information, see how we originally benchmarked this setting: - https://devcenter.heroku.com/articles/testing-cedar-14-memory-use If a customer’s application is not memory bound and would prefer slightly faster execution over the decreased memory use, they can set their MALLOC_ARENA_MAX to a higher value. The default as specified by the [linux man page](http://man7.org/linux/man-pages/man3/mallopt.3.html) is 8 times the number of cores on the system. Or they can use the 3rd party [jemalloc buildpack](https://elements.heroku.com/buildpacks/mojodna/heroku-buildpack-jemalloc). Our documentation will be updated to reflect this change once the PR is merged and deployed.
jemalloc 5.0.1 segfaults at random hardwares. Not deterministic, but in the ones that fails, it is consistent |
How about jemalloc 5.1.0, which is released 17 days ago ? |
However, we are still using slightly too much memory to downgrade (from PM dyno's) to 2X dyno's now. Could anyone (@schneems ?) tell me if trying jemalloc instead of
|
@kreintjes It's hard to tell based just on current memory usage what kind of improvement you would see with jemalloc (if any at all). Fortunately it is easy to run a test using the jemalloc buildpack. It's low risk and can be disabled without a code deploy if issues come up. heroku buildpacks:add --index 1 https://github.com/gaffneyc/heroku-buildpack-jemalloc.git -a [your app]
heroku config:set JEMALLOC_ENABLED=true -a [your app]
# You'll need to push a new commit after adding the buildpack for it to be installed
git commit --allow-empty -m "Empty commit to rebuild slug for jemalloc"
git push heroku master To turn off jemalloc: heroku config:set JEMALLOC_ENABLED=false -a [your app] |
@gaffneyc thanks! Will try that. I now see that #752 isn't merged yet, so the effects I described (huge memory reduction after upgrading to heroku-18 and no use setting MALLOC_ARENA_MAX ourselves) can't be caused by it. So most of my earlier comment was bullshit, sorry for that. Apparently I was a bit tired and overheated (37 degrees celsius is not something we are used to in the Netherlands). Maybe memory consumption is lower due to code improvements and/or ruby/rails updates, and we only saw it after upgrading the heroku stack. |
@gaffneyc tried the jemalloc buildpack for the last three days. We see a significant reduction in memory consumption. We still sometimes hit the 2X limit (1024 MB) during our nightly jobs, but for much shorter periods of time. Memory consumption of our workers for the past seven days: Memory consumption of our web dyno's for the past seven days: So much less memory than before. It looks like we can even downgrade our web dyno's from Pm to 2X now. Thanks! |
Small update about the memory consumption after switching to jemalloc. The workers are doing absolutely fine. However we see some strange patterns in the memory consumption of our web dyno's. Usually the memory consumption is fine and much lower than before. However, at some points we suddenly get extreme peaks in memory consumption of up to 5 GB (see screenshots and these aren't even the most extreme examples). Before switching to jemalloc we did not have this problem and the memory consumption never exceeded the limit (around 2 GB was the max we saw). Anyone any thoughts of what could be the cause or how to identify/fix the problem? @gaffneyc ? |
99% percent probability your jobs simply require that much memory. Use tools like GC.stat, Scout, etc to identify which jobs are allocating the most objects. |
As Nate said, it's most likely that there are some requests that just take that much memory to run. Your best bet would be to figure out which requests are allocating the most memory / objects and see if you can optimize them. Regardless of anything else you do, I would start digging in there as any other changes are probably just going to hide some unexpectedly expensive behavior within the app. You should also look at response times since allocating memory from the OS isn't necessarily bad unless it impacts overall performance. Like most algorithms there are trade offs that malloc implementations make around performance and memory growth. Jemalloc (compared with ptmalloc2 which is the Linux default) does a better job of keeping memory compact (which shows up as lower total memory usage) but can be more aggressive about requesting memory from the OS when it thinks it needs it. Jemalloc does have a number of tuning options that you could look at. Another option would be to give tcmalloc (which comes out of Google) a try with the tcmalloc buildpack that I also maintain. It has different properties from both jemalloc and ptmalloc2 but we've seen similar results to jemalloc in testing (low growth from overall compactness). Your mileage may vary and any algorithm changes probably won't fix the underlying issue even if it improves the metric. |
Yeah. My guess is you could tune all day and those spikes would still go be ~2.5-3GB, which is what they were before jemalloc. |
Also these kinds of actions/controllers/jobs tend to be very dependent on the input data you feed into them. So the fact that it's using more memory now could just be because you're now iterating over 10,000 records instead of 5,000, etc. You may think "nothing has changed except jemalloc" but that's ignoring that the data underneath has probably changed. |
One thing Nate has done before is actually remove jemalloc from https://www.codetriage.com to see what it does to the memory use. If the spikes are very regular (i.e. once a day etc.) then you can test your theory that they're related to jemalloc by removing it and seeing how the spikes change. If they're more sporadic then it's harder. I would put in some work to see what triggers the increase in memory in the first place. As @nateberkopec mentioned, the most likely culprit is a change in data "shape/size" (i.e. number of records being processed). |
I was brainstorming what tooling I would like to diagnose the "shape" issues and It would be killer if sidekiq could output memory allocation information per job similar to rails/rails#34136 but for sidekiq jobs. Which would be pretty cool. I do think some APM services support workers and allocations if you want to go a non-OSS route. |
Thanks everybody for the quick and helpful responses! I did not expect this much help this soon. To clarify, the problem isn't the workers/jobs, but the web dyno's (so web requests) now. The workers are doing absolutely better with jemalloc. The web dyno's most of time do so as well, however as pointed out, there are some irregularities. Also at moments of extremely high memory consumption our response times increase drastically as well, even leading to request time outs (although we didn't receive any user complaints as of yet). I understand the spikes in memory consumption could also be caused by changes in requests, code or data (growth of data). However, we started seeing these spikes very shortly after switching to jemalloc. I only did not record screenshots of them back then, because I wanted to monitor for a longer period. Then due to my holiday I only recently got a chance to summarise my findings and get back here. Also before jemalloc we indeed did have spikes as well, but back then they never exceeded the 2 GB (we never had a memory quota exceeded error on our Pm web dyno's), while quickly after switching to jemalloc they ran up to 5 GB. So therefore I think the increase in height of the spikes is caused by jemalloc, although indeed, I can't be certain. The growth of our (database) data is steadily and predictable, but of course the data users send with their requests could vary a lot. Anyway, as you pointed out, we likely always had a problem and the spikes of 2 GB probably already indicated that. Jemalloc might have made the spikes sharper (higher), but it would be best to fix the underlying problem(s) instead of tweaking the memory allocator. So I will start using GC.stat and/or Scout/NewRelic to track down the requests and code that cause the memory spikes and improve it. Thanks again everybody for your responses. |
This PR will set MALLOC_ARENA_MAX=2 by default for new Ruby apps While we currently have [documentation on tuning the memory behavior of glibc by setting this environment variable](https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior) the default does not produce good results for Ruby applications that are using threads: - https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/ - https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html In general most Ruby applications are memory bound and by decreasing the memory footprint of the application we can enable scaling out via more workers. Less memory might also mean a cheaper to run application, as it consumes fewer resources. Setting this value is not entirely free. It does come with a performance trade off. For more information, see how we originally benchmarked this setting: - https://devcenter.heroku.com/articles/testing-cedar-14-memory-use If a customer’s application is not memory bound and would prefer slightly faster execution over the decreased memory use, they can set their MALLOC_ARENA_MAX to a higher value. The default as specified by the [linux man page](http://man7.org/linux/man-pages/man3/mallopt.3.html) is 8 times the number of cores on the system. Or they can use the 3rd party [jemalloc buildpack](https://elements.heroku.com/buildpacks/mojodna/heroku-buildpack-jemalloc). Our documentation will be updated to reflect this change once the PR is merged and deployed.
[close #751] Default MALLOC_ARENA_MAX new apps
* upstream/master: (42 commits) [changelog skip] Fix logic for Outdated check Test current ruby version is also same as default ruby version Clean up OutdatedRubyVersion#join Bump Ruby version Safer interface for outdated ruby version Store values in thread directly No idea why this fails sometimes but this should be more robust. Add changelog for Ruby warnings mend Warn when app is not using the latest Ruby version Bump default Ruby version to 2.5.7 [changelog skip] Introduce change log check to all PRs v206 Add changelog for PR Switch back to a version of jruby that officially exists Pre-release binary is heroku-18 only Use 1.7.27 for jruby tests Remove app that's not used Bundler 1.17.3 [close heroku#751] Default MALLOC_ARENA_MAX new apps ...
This env variable greatly reduces memory fragmentation when using the GNU glibc memory allocator with MRI. Because MRI has the GIL, it doesn't need a lot of arenas to avoid thread contention; glibc defaulting to a large number only hurts Ruby users.
The text was updated successfully, but these errors were encountered: