Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async demo has undefined behavior #20

Open
neachdainn opened this issue Jul 24, 2020 · 6 comments
Open

Async demo has undefined behavior #20

neachdainn opened this issue Jul 24, 2020 · 6 comments

Comments

@neachdainn
Copy link

In the asynchronous REQREP demo, an exception is thrown in the callback [1]. However, the callback takes a direct function pointer [2] and nng may or may not have been built with -fexception (or equivalent) [3]. There is a function-try-block which looks like it will catch errors in the callback but the thread running the callback originates inside of the nng threadpool - which is in C.

One way to solve this could be to not directly provide nng with the user-provided callback. Instead, have the aio constructor take an std::function and provide that to an argument to a "trampoline" function which catches any and all exceptions from the user-provided function. This trampoline is what you would then provide to nng as the AIO callback.

@cwzx
Copy link
Owner

cwzx commented Jul 28, 2020

You are correct. We shouldn't be emitting exceptions from nng callbacks.
Your suggestion is a safe way to ensure this never happens.

@samarai-jk
Copy link

samarai-jk commented Oct 24, 2020

Found the same problem. May I ask, has there been update on this?
@neachdainn what was your solution?
Do you know how nng reacts on errors in this situation? Lets say there's an error sending (in my case that would happen asynchronously) should we try managing the failed message ourselves or does nng do some kind of management? The original nng demo exits the process on any error, doesn't seem like it would manage anything...

@neachdainn
Copy link
Author

neachdainn commented Oct 24, 2020

This really isn't an nng thing - it comes down to the behavior of the compiler. The nng library operates within the C specification - meaning it doesn't address C++ exceptions at all. I think most modern compilers do something sane (I think), but the real solution is to catch exceptions at the FFI boundary.

In my own code, I catch exceptions at the FFI boundary (i.e., the callback). In my older code, I would occasionally check a flag in my own threads to see if an exception occurred. In my newer code, I just abort the program if I catch an exception - not ideal but better than undefined behavior. If someone comes up with a better solution, I would love to hear it.

EDIT:

May I ask, has there been update on this?

Based on the latest tag in this repo, I do not think so. The changes I described in the original post aren't too crazy, so perhaps I should just get around to making them myself.

@neachdainn
Copy link
Author

One thing that might mostly work is to have aio_view::wait check a global / aio / context / socket specific variable and re-raise the exception if present. That should work if you can require that each asynchronous operation eventually is waited on, though that probably ends up being a project-specific policy rather than something a library can enforce.

@samarai-jk
Copy link

@neachdainn Thanks much for coming back. I've added a thread-safe context to our worker class which allows me to inform the server coordinator in case of an error inside the callback. The problem is only... what then? As you said, we can't allow undefined behavior. But are there cases where we can save the running process? And when will we get errors in the first place? Does nng e.g. retry sending failed messages before it emits the error? Anyway, that's questions for Garrett/nng

@neachdainn
Copy link
Author

Apologies if these responses are too delayed to be of any use.

I believe that if NNG reports an error, it will not attempt to do anything else with the message. That's the logical expectation given that the user owns the message upon error. That being said, some protocols are more "reluctant" to return an error than others; some have built-in retry (e.g., REP) and others will just drop messages (e.g., PUB).

If you're the one consuming the code, then the correct answer is probably pretty specific to your application. If you're writing a library, I don't know if there's a perfect answer. In my wrapper, I document that the user needs to either come up with their own scheme for handling errors or just not emit errors (though I could see plenty of argument for silently eating the errors instead of killing the program).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants