Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set MALLOC_ARENA_MAX=2 #751

Closed
mperham opened this issue Apr 25, 2018 · 30 comments
Closed

Set MALLOC_ARENA_MAX=2 #751

mperham opened this issue Apr 25, 2018 · 30 comments

Comments

@mperham
Copy link

mperham commented Apr 25, 2018

This env variable greatly reduces memory fragmentation when using the GNU glibc memory allocator with MRI. Because MRI has the GIL, it doesn't need a lot of arenas to avoid thread contention; glibc defaulting to a large number only hurts Ruby users.

@schneems
Copy link
Contributor

cc @nateberkopec thoughts?

@nateberkopec
Copy link

jemalloc is a bigger benefit and doesn't have any performance penalty. Is it off the table?

@abinoda
Copy link

abinoda commented Apr 25, 2018

@mperham This is interesting. How come you are suggesting making this change in the buildpack vs developers adjusting this at the app level as needed?

Also, does this environment variable work with the Heroku 16 stack as well as the Cedar 14?

I ended up looking at these two articles:

@mperham
Copy link
Author

mperham commented Apr 25, 2018

@abinoda Because for every developer that makes the adjustment, there will be 100 that don't. Ruby will continue to get a reputation for memory bloat.

@nateberkopec I see jemalloc as a lot more risky but perhaps the buildpack could ship jemalloc 5.0.1 and allow the user to opt into it by setting LD_PRELOAD.

@schneems
Copy link
Contributor

jemalloc is a bigger benefit and doesn't have any performance penalty. Is it off the table?

I wouldn't say that it's off the table, but it's definitely more work. There's other things to consider here as well, such as how do we roll this out. Even if it's a better experience for 99% of customers, we still have to think about that 1% who's experience is worse and how can we manage that experience.

These are examples of things we need to consider:

Should this be the default for ALL apps? Should we only default it for new apps? Maybe only apps on a new stack etc. Should we warn people when we make the change? How can people opt out once the change is made (i.e. if we're defaulting to a MALLOC_ARENA_MAX, then they cannot unset the environment variable, we have to have docs on what the "normal" default would be and how to detect and set it).

@abinoda
Copy link

abinoda commented Apr 25, 2018

@mperham As someone who's been working with Ruby apps on Heroku almost a decade, I agree 💯

@nateberkopec
Copy link

@mperham Adding jemalloc only sounds riskier. In reality, it's 100% rock-solid and I've never heard of anyone having troubles w/ruby and jemalloc. And apps I've seen that have tried both MALLOC_ARENA_MAX and jemalloc always benefit more from the latter.

Because MRI has the GIL, it doesn't need a lot of arenas to avoid thread contention

I'm not sure this is accurate. Malloc needs a lock on the arena to access any memory inside of it, including memory which may not have anything to do with Ruby code (c extensions, just parts of Ruby which aren't behind the GVL). Terrence's research has shown that there is a performance impact (even if it is much less than the memory impact!).

@schneems Yeah. This is why I write performance guides and don't run a PaaS 😆

@gaffneyc
Copy link

We've rolled out jemalloc 5.0.1 to roughly a dozen applications on Heroku over the past four months or so. Of those, we've had issues with two of them segfaulting which was fixed by switching to jemalloc 3.6.0. Making jemalloc enabled by default could end up causing new subtle issues that are hard to debug where they're coming from. Asking people to set LD_PRELOAD gets tricky causing it's own set of issues

@nateberkopec
Copy link

I knew as soon as I typed that someone would call me out on it. 😆

@hone
Copy link
Member

hone commented Apr 26, 2018

@gaffneyc thanks for chiming in. It seems like setting MALLOC_ARENA_MAX seems to be a better bet here.

@schneems
Copy link
Contributor

schneems commented Apr 26, 2018

Not sure if GitHub notifies or not but I opened up #752 which adds MALLOC_ARENA_MAX=2 as a default for new apps and anything on our next stack.

Would be good to also look at default heap growth factors for GC as well. It looks like some of this changed with Ruby 2.4:

 * * RUBY_GC_HEAP_INIT_SLOTS
 *   - Initial allocation slots.
 * * RUBY_GC_HEAP_FREE_SLOTS
 *   - Prepare at least this amount of slots after GC.
 *   - Allocate slots if there are not enough slots.
 * * RUBY_GC_HEAP_GROWTH_FACTOR (new from 2.1)
 *   - Allocate slots by this factor.
 *   - (next slots number) = (current slots number) * (this factor)
 * * RUBY_GC_HEAP_GROWTH_MAX_SLOTS (new from 2.1)
 *   - Allocation rate is limited to this number of slots.
 * * RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO (new from 2.4)
 *   - Allocate additional pages when the number of free slots is
 *     lower than the value (total_slots * (this ratio)).
 * * RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO (new from 2.4)
 *   - Allocate slots to satisfy this formula:
 *       free_slots = total_slots * goal_ratio
 *   - In other words, prepare (total_slots * goal_ratio) free slots.
 *   - if this value is 0.0, then use RUBY_GC_HEAP_GROWTH_FACTOR directly.
 * * RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO (new from 2.4)
 *   - Allow to free pages when the number of free slots is
 *     greater than the value (total_slots * (this ratio)).
 * * RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR (new from 2.1.1)
 *   - Do full GC when the number of old objects is more than R * N
 *     where R is this factor and
 *           N is the number of old objects just after last full GC.

We currently recommend RUBY_GC_HEAP_GROWTH_FACTOR but it looks like that's only used as a fallback now? I'm not totally sure. It seems like setting a "goal" along with a min and a max is the new way to do it. I'm not sure the history here though and I've not played with any of the newer GC values.

@mperham
Copy link
Author

mperham commented Apr 26, 2018

Would you consider shipping jemalloc 3.6 or 5.0.1 in the buildpack so people could opt into it by setting LD_PRELOAD? Having an official documented path to jemalloc seems like it would solve all of these issues with people hacking in jemalloc with other buildpacks.

@abinoda
Copy link

abinoda commented Apr 27, 2018

@schneems Your previous comment I think meant to link to #752 (you linked to #751)

@elsurudo
Copy link

Just a data point – upgrading my app to Heroku-16 stack impacted my memory usage so severely that I had to revert to Cedar-14. Baking in better defaults (MALLOC_ARENA_MAX) is definitely welcome, as is jemalloc.

@schneems
Copy link
Contributor

Just a data point – upgrading my app to Heroku-16 stack impacted my memory usage so severely that I had to revert to Cedar-14. Baking in better defaults (MALLOC_ARENA_MAX) is definitely welcome, as is jemalloc.

The changes to the memory allocator happened going from cedar-12 to cedar-14. It's why this article is "testing cedar 14 memory use" https://devcenter.heroku.com/articles/testing-cedar-14-memory-use. There is no change from cedar-14 to heroku-16 in terms of memory allocator.

@schneems
Copy link
Contributor

@abinoda thanks, updated

schneems added a commit that referenced this issue Apr 30, 2018
This PR will set MALLOC_ARENA_MAX=2 by default for new Ruby apps and for any apps running on the heroku-18 stack (when released).

While we currently have [documentation on tuning the memory behavior of glibc by setting this environment variable](https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior) the default does not produce good results for Ruby applications that are using threads:

- https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/
- https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html

In general most Ruby applications are memory bound and by decreasing the memory footprint of the application we can enable scaling out via more workers. Less memory might also mean a cheaper to run application, as it consumes fewer resources.

Setting this value is not entirely free. It does come with a performance trade off. For more information, see how we originally benchmarked this setting:

- https://devcenter.heroku.com/articles/testing-cedar-14-memory-use

If a customer’s application is not memory bound and would prefer slightly faster execution over the decreased memory use, they can set their MALLOC_ARENA_MAX to a higher value. The default as specified by the [linux man page](http://man7.org/linux/man-pages/man3/mallopt.3.html) is 8 times the number of cores on the system. Or they can use the 3rd party [jemalloc buildpack](https://elements.heroku.com/buildpacks/mojodna/heroku-buildpack-jemalloc).

Our documentation will be updated to reflect this change once the PR is merged and deployed.
@brodock
Copy link

brodock commented May 3, 2018

jemalloc 5.0.1 segfaults at random hardwares. Not deterministic, but in the ones that fails, it is consistent

@kirantpatil
Copy link

How about jemalloc 5.1.0, which is released 17 days ago ?

@kreintjes
Copy link

kreintjes commented Jun 25, 2019

Since upgrading to heroku-18 stack we saw a huge decrease in memory consumption for both our worker and web dyno's, probably because of #752. Thanks everyone!

However, we are still using slightly too much memory to downgrade (from PM dyno's) to 2X dyno's now. Could anyone (@schneems ?) tell me if trying jemalloc instead of MALLOC_ARENA_MAX=2 is worth a shot in order to reduce memory consumption further? Our workers currently use 1100 MB MAX/600 MB AVG. Our web dyno's currently use 1500 MB MAX/800 MB AVG. Our app is low-traffic, but has many complex/memory-intensive pages/jobs.

Ps. We didn't notice any performance reduction when upgrading to heroku-18 (i.e. setting MALLOC_ARENA_MAX=2).

PPs. After upgrading to heroku-18 I also tried setting MALLOC_ARENA_MAX=2 manually based on these articles https://devcenter.heroku.com/articles/ruby-memory-use#excess-memory-use-due-to-malloc-in-a-multi-threaded-environment and https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior. This obviously had no effect and then I found this issue and corresponding PR, indicating this is the default in heroku-18. Could anyone here change these articles to indicate this solution is enabled by default in heroku-18 or should I contact Heroku support for that?

@gaffneyc
Copy link

@kreintjes It's hard to tell based just on current memory usage what kind of improvement you would see with jemalloc (if any at all). Fortunately it is easy to run a test using the jemalloc buildpack. It's low risk and can be disabled without a code deploy if issues come up.

heroku buildpacks:add --index 1 https://github.com/gaffneyc/heroku-buildpack-jemalloc.git -a [your app]
heroku config:set JEMALLOC_ENABLED=true -a [your app]

# You'll need to push a new commit after adding the buildpack for it to be installed
git commit --allow-empty -m "Empty commit to rebuild slug for jemalloc"
git push heroku master

To turn off jemalloc:

heroku config:set JEMALLOC_ENABLED=false -a [your app]

@kreintjes
Copy link

@gaffneyc thanks! Will try that.

I now see that #752 isn't merged yet, so the effects I described (huge memory reduction after upgrading to heroku-18 and no use setting MALLOC_ARENA_MAX ourselves) can't be caused by it. So most of my earlier comment was bullshit, sorry for that. Apparently I was a bit tired and overheated (37 degrees celsius is not something we are used to in the Netherlands).

Maybe memory consumption is lower due to code improvements and/or ruby/rails updates, and we only saw it after upgrading the heroku stack.

@kreintjes
Copy link

@gaffneyc tried the jemalloc buildpack for the last three days. We see a significant reduction in memory consumption. We still sometimes hit the 2X limit (1024 MB) during our nightly jobs, but for much shorter periods of time.

Memory consumption of our workers for the past seven days:
(the d519552b release monday july 1st is the switch to jemalloc)
image
We went from several thousands of R14's in a night to only five in three nights!

Memory consumption of our web dyno's for the past seven days:
image

So much less memory than before. It looks like we can even downgrade our web dyno's from Pm to 2X now. Thanks!

@kreintjes
Copy link

Small update about the memory consumption after switching to jemalloc. The workers are doing absolutely fine. However we see some strange patterns in the memory consumption of our web dyno's. Usually the memory consumption is fine and much lower than before. However, at some points we suddenly get extreme peaks in memory consumption of up to 5 GB (see screenshots and these aren't even the most extreme examples).

image

image

Before switching to jemalloc we did not have this problem and the memory consumption never exceeded the limit (around 2 GB was the max we saw). Anyone any thoughts of what could be the cause or how to identify/fix the problem? @gaffneyc ?

@nateberkopec
Copy link

However, at some points we suddenly get extreme peaks in memory consumption of up to 5 GB

99% percent probability your jobs simply require that much memory. Use tools like GC.stat, Scout, etc to identify which jobs are allocating the most objects.

@gaffneyc
Copy link

gaffneyc commented Aug 7, 2019

As Nate said, it's most likely that there are some requests that just take that much memory to run. Your best bet would be to figure out which requests are allocating the most memory / objects and see if you can optimize them. Regardless of anything else you do, I would start digging in there as any other changes are probably just going to hide some unexpectedly expensive behavior within the app. You should also look at response times since allocating memory from the OS isn't necessarily bad unless it impacts overall performance.

Like most algorithms there are trade offs that malloc implementations make around performance and memory growth. Jemalloc (compared with ptmalloc2 which is the Linux default) does a better job of keeping memory compact (which shows up as lower total memory usage) but can be more aggressive about requesting memory from the OS when it thinks it needs it. Jemalloc does have a number of tuning options that you could look at. Another option would be to give tcmalloc (which comes out of Google) a try with the tcmalloc buildpack that I also maintain. It has different properties from both jemalloc and ptmalloc2 but we've seen similar results to jemalloc in testing (low growth from overall compactness). Your mileage may vary and any algorithm changes probably won't fix the underlying issue even if it improves the metric.

@nateberkopec
Copy link

Your mileage may vary and any algorithm changes probably won't fix the underlying issue even if it improves the metric.

Yeah. My guess is you could tune all day and those spikes would still go be ~2.5-3GB, which is what they were before jemalloc.

@nateberkopec
Copy link

Also these kinds of actions/controllers/jobs tend to be very dependent on the input data you feed into them. So the fact that it's using more memory now could just be because you're now iterating over 10,000 records instead of 5,000, etc. You may think "nothing has changed except jemalloc" but that's ignoring that the data underneath has probably changed.

@schneems
Copy link
Contributor

schneems commented Aug 7, 2019

One thing Nate has done before is actually remove jemalloc from https://www.codetriage.com to see what it does to the memory use.

If the spikes are very regular (i.e. once a day etc.) then you can test your theory that they're related to jemalloc by removing it and seeing how the spikes change.

If they're more sporadic then it's harder. I would put in some work to see what triggers the increase in memory in the first place. As @nateberkopec mentioned, the most likely culprit is a change in data "shape/size" (i.e. number of records being processed).

@schneems
Copy link
Contributor

schneems commented Aug 7, 2019

I was brainstorming what tooling I would like to diagnose the "shape" issues and It would be killer if sidekiq could output memory allocation information per job similar to rails/rails#34136 but for sidekiq jobs. Which would be pretty cool.

I do think some APM services support workers and allocations if you want to go a non-OSS route.

@kreintjes
Copy link

Thanks everybody for the quick and helpful responses! I did not expect this much help this soon.

To clarify, the problem isn't the workers/jobs, but the web dyno's (so web requests) now. The workers are doing absolutely better with jemalloc. The web dyno's most of time do so as well, however as pointed out, there are some irregularities. Also at moments of extremely high memory consumption our response times increase drastically as well, even leading to request time outs (although we didn't receive any user complaints as of yet).

I understand the spikes in memory consumption could also be caused by changes in requests, code or data (growth of data). However, we started seeing these spikes very shortly after switching to jemalloc. I only did not record screenshots of them back then, because I wanted to monitor for a longer period. Then due to my holiday I only recently got a chance to summarise my findings and get back here. Also before jemalloc we indeed did have spikes as well, but back then they never exceeded the 2 GB (we never had a memory quota exceeded error on our Pm web dyno's), while quickly after switching to jemalloc they ran up to 5 GB. So therefore I think the increase in height of the spikes is caused by jemalloc, although indeed, I can't be certain. The growth of our (database) data is steadily and predictable, but of course the data users send with their requests could vary a lot.

Anyway, as you pointed out, we likely always had a problem and the spikes of 2 GB probably already indicated that. Jemalloc might have made the spikes sharper (higher), but it would be best to fix the underlying problem(s) instead of tweaking the memory allocator. So I will start using GC.stat and/or Scout/NewRelic to track down the requests and code that cause the memory spikes and improve it.

Thanks again everybody for your responses.

schneems added a commit that referenced this issue Sep 12, 2019
This PR will set MALLOC_ARENA_MAX=2 by default for new Ruby apps

While we currently have [documentation on tuning the memory behavior of glibc by setting this environment variable](https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior) the default does not produce good results for Ruby applications that are using threads:

- https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/
- https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html

In general most Ruby applications are memory bound and by decreasing the memory footprint of the application we can enable scaling out via more workers. Less memory might also mean a cheaper to run application, as it consumes fewer resources.

Setting this value is not entirely free. It does come with a performance trade off. For more information, see how we originally benchmarked this setting:

- https://devcenter.heroku.com/articles/testing-cedar-14-memory-use

If a customer’s application is not memory bound and would prefer slightly faster execution over the decreased memory use, they can set their MALLOC_ARENA_MAX to a higher value. The default as specified by the [linux man page](http://man7.org/linux/man-pages/man3/mallopt.3.html) is 8 times the number of cores on the system. Or they can use the 3rd party [jemalloc buildpack](https://elements.heroku.com/buildpacks/mojodna/heroku-buildpack-jemalloc).

Our documentation will be updated to reflect this change once the PR is merged and deployed.
schneems added a commit that referenced this issue Sep 23, 2019
[close #751] Default MALLOC_ARENA_MAX new apps
krisrang added a commit to skyltmax/heroku-buildpack-ruby that referenced this issue Oct 13, 2019
* upstream/master: (42 commits)
  [changelog skip] Fix logic for Outdated check
  Test current ruby version is also same as default ruby version
  Clean up OutdatedRubyVersion#join
  Bump Ruby version
  Safer interface for outdated ruby version
  Store values in thread directly
  No idea why this fails sometimes but this should be more robust.
  Add changelog for Ruby warnings
  mend
  Warn when app is not using the latest Ruby version
  Bump default Ruby version to 2.5.7
  [changelog skip] Introduce change log check to all PRs
  v206
  Add changelog for PR
  Switch back to a version of jruby that officially exists
  Pre-release binary is heroku-18 only
  Use 1.7.27 for jruby tests
  Remove app that's not used
  Bundler 1.17.3
  [close heroku#751] Default MALLOC_ARENA_MAX new apps
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

11 participants