This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Hangs forever in refresh after 5 minutes when 2ary unreachable #331

Open
wants to merge 2 commits into mongoid:master from simplybusiness:timeout_in_read

Conversation

@telent (Contributor) commented Nov 3, 2014

tl;dr: setsockopt(SO_RCVTIMEO) doesn't work for us in MRI Ruby 2.[01]; infinite hangs result.

We're having trouble in our Mongo cluster when a secondary becomes unreachable (due to e.g. crashed instance or network partition): the symptom is that after five minutes all attempts to contact the primary seem to hang indefinitely (or until Unicorn kills our worker process). Note that our mongoid.yml has options: {read: primary}

Attempting to replicate the problem in a test environment gets us a call stack something like this (some frames omitted for brevity):

     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:47:in `read'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:47:in `block in read'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:119:in `handle_socket_errors'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:47:in `read'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection.rb:221:in `with_connection'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection.rb:114:in `read'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/protocol/query.rb:213:in `receive_replies'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/node.rb:180:in `block in ensure_connected'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/node.rb:432:in `refresh'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/cluster.rb:182:in `block in refresh'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/cluster.rb:151:in `nodes'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/cluster.rb:240:in `with_primary'
     # /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/read_preference/primary.rb:55:in `block in with_node'

and after some digging the problem seems to be that implementing timeouts using socket options doesn't work in Ruby MRI. Looking at it with strace, I see a call to read that returns after five seconds with EAGAIN, followed by a call to ppoll; the EAGAIN never makes it as far as interpreted code.

getsockname(7, {sa_family=AF_INET, sin_port=htons(44829), sin_addr=inet_addr("192.168.195.6")}, [16]) = 0
setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, "\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
futex(0x7f24117f6654, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x7f24117f66d0, FUTEX_WAKE_PRIVATE, 1) = 0
select(8, [7], NULL, [7], {0, 0})       = 0 (Timeout)
read(7, 0x7f2413130f00, 100)            = -1 EAGAIN (Resource temporarily unavailable)
ppoll([{fd=7, events=POLLIN}], 1, NULL, NULL, 8  <hang indefinitely>

See also http://stackoverflow.com/questions/9853516/set-socket-timeout-in-ruby-via-so-rcvtimeo-socket-option
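
As a rough illustration (a standalone sketch, not part of the patch; the socket setup here is made up for the example), setting SO_RCVTIMEO on a socket in MRI and then reading from a peer that never sends data simply hangs instead of timing out:

require 'socket'

# Hypothetical repro: a connected socket whose peer never sends anything.
server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])

# struct timeval { tv_sec = 5, tv_usec = 0 } packed as two native longs
client.setsockopt(Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, [5, 0].pack('l_2'))

# Expected: give up after ~5s. Observed on MRI: read(2) returns EAGAIN
# internally, the interpreter retries via ppoll, and this call blocks forever.
client.read(100)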

The attached patch has a spec that demonstrates the problem (try running it under strace) and a fix that implements read timeouts "by hand" using select() instead of relying on socket options. A fuller solution would probably address the write() case and be tested on SSL connections as well as plaintext ones, but I thought I'd push this early diagnosis up for comment first, to see if I'm on the right track.

/cc @dawid-sklodowski @cpoo22 just fyi

In MRI 2.1.3 - and probably most other versions - setting the
SO_RCVTIMEO option on a socket will cause read(2) to return EWOULDBLOCK
as expected, but the interpreter internals will retry the operation
instead of passing the failure up to interpreted code.  Thus, the
timeout has no visible effect.

What we should do instead is call Kernel.select with the required
timeout to check if there is data available, *then* call read(2).
If .select fails we raise an Errors::ConnectionFailure on the assumption
that something higher up in the stack will catch it and cope.
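
For illustration, here is a minimal sketch of the select-before-read approach described above (not the actual patch; read_with_timeout is a made-up name, and Errors::ConnectionFailure refers to Moped's connection error class):

# Hypothetical sketch only: wait with Kernel.select before reading, so the
# timeout is enforced in Ruby code rather than via SO_RCVTIMEO.
def read_with_timeout(socket, length, timeout)
  ready = Kernel.select([socket], nil, [socket], timeout)
  unless ready
    # Moped's connection error; something higher up the stack catches it.
    raise Errors::ConnectionFailure,
          "Took more than #{timeout} seconds to receive data"
  end
  socket.read(length)
end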
@coveralls

Coverage Status

Coverage decreased (-0.06%) when pulling c56498f on simplybusiness:timeout_in_read into 7e5ac3e on mongoid:master.

@telent (Contributor, Author) commented Dec 19, 2014

cc @ngsmrk @dcoxall

@jonhyman (Contributor) commented Apr 8, 2015

We experience this issue as well. Would love to see it merged.

jonhyman added a commit to jonhyman/moped that referenced this pull request Apr 20, 2015
jonhyman added a commit to jonhyman/moped that referenced this pull request Apr 24, 2015
@jonhyman (Contributor)

@telent, have you been running this in production? I tested it out in production and saw some really strange errors (Moped cursors returning empty documents) that were sporadic and hard to reproduce. I reverted the change for now but will do some more investigation later.

@jonhyman (Contributor)

Though to be clear, I had done it by copying the code into a fork of Moped 1.5. I didn't try with Moped 2 yet.

@jonhyman (Contributor)

I tried testing this again by issuing a scatter-gather query on a massive collection with hundreds of millions of documents. This wreaked havoc on the database: the query appears to have been continually retried without the cursor ever being closed. Before I Ctrl+C'd it, we had 19 identical queries running, which caused massive queued reads and blocking. I'm not sure this is safe. That said, I had retrofitted the change to 1.5, so it may work in 2.0, but I'm bearish on this right now given what happened to us.

@jonhyman (Contributor)

@durran have you thought about this problem at all in the new Ruby driver? As @telent points out, SO_RCVTIMEO simply does not work in MRI anymore. We experience this issue when there is a bad networking event or when a physical host running multiple mongods freezes or crashes; our processes completely hang and must be kill -9'd. We get a stack trace identical to the one @telent describes in this issue. Reviewing the code for the new driver, I think we'll have the same issue as long as IO.select is not used before the socket read. This pull as it stands is problematic for us: my guess is that if a query takes longer than @timeout to return, the resulting ConnectionFailure causes it to be re-issued (as I mentioned above, we wreaked havoc on a database with this pull and a 10-second timeout). Still, this is a real operational issue that causes really bad problems for us.

@durran (Member) commented Aug 23, 2015

@jonhyman I have a branch up on the new driver that actually does an IO.select before selecting a server to use in an operation... (https://github.com/mongodb/mongo-ruby-driver/pull/670/files#diff-ea2082a78a5d9b36b80ce79b7705943dR86) The server.connectable? call is eventually calling a select before returning it, and the operation would execute immediately after.

@jonhyman (Contributor)

I may be wrong, but I don't think that's going to help. Going back to Moped and this issue for a second: the Moped code already has a check_if_alive! immediately before the read, but one issue is that alive? is defined like this (in Moped; it's roughly identical in the new driver):

def alive?
  if Kernel::select([ self ], nil, [ self ], 0)
    !eof? rescue false
  else
    true
  end
rescue IOError
  false
end

so if select returns nil, alive? still returns true. The alive? check uses a 0-second timeout, so all it tells you is that the socket is not dead; if select returns nil, the socket has no data on it right now and is not ready for reading. Because the current Moped code does this:

check_if_alive!
handle_socket_errors { super }

you don't know if the socket is ready for reading when you go to do a blocking read, and the super call hangs indefinitely as @telent's stack trace shows (and we have that exact stack trace when it happens to us).
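To make those select semantics concrete, here is an illustrative snippet (not from Moped; the server/socket setup is made up) showing that a zero-timeout select returns nil on a perfectly healthy socket that simply has no data yet:

require 'socket'

# A connected socket whose peer has not written anything yet.
server = TCPServer.new('127.0.0.1', 0)
sock   = TCPSocket.new('127.0.0.1', server.addr[1])

# Zero-timeout select: returns nil because there is nothing to read,
# even though the socket is alive and well.
Kernel.select([sock], nil, [sock], 0)  # => nil

# alive? maps that nil to true, so the blocking read that follows can hang.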

In your pull, connectable? simply does an alive? check, so if select returns nil (signifying that the socket is not ready for reading), connectable? will still return true. Then you'll end up calling @socket.read in https://github.com/mongodb/mongo-ruby-driver/blob/master/lib/mongo/socket.rb#L172 and I think this code will be vulnerable to hanging again (I have not yet tested; your pull is the last one I'm waiting on before we perform failover testing scenarios with the new driver 😁).

From all my testing locally, I can't find a way to solve this hanging case in Ruby that doesn't involve using select to set a "maximum read time", by making read_from_socket look something like this:

def read_from_socket(length)
  begin
    # Non-blocking read: raises IO::WaitReadable (or the SSL equivalent)
    # when no data is available yet.
    @socket.read_nonblock(length) || String.new
  rescue IO::WaitReadable, OpenSSL::SSL::SSLErrorWaitReadable
    # Wait up to 300 seconds for the socket to become readable, then retry.
    ready = IO.select([@socket], nil, [@socket], 300)
    if ready
      retry
    else
      raise StandardError.new("Maximum query time exceeded, no data from socket returned in 300 seconds.")
    end
  end
end

Note that here I'm not raising a ConnectionFailure to retry; I'm literally failing. If the query genuinely should take longer than the max timeout, a ConnectionFailure would cause it to be re-issued, and you can get into the loop we got into 4 months ago, when I tested this, ran a dozen and a half huge, slow scatter-gather queries on a very large cluster, and caused high locking/queued reads.
