Recently, we hit a problem with Ruby’s “exit” command. If something went horribly wrong and it made no sense for our application to continue in its current state, we would abort with “exit 1”. We use supervisord to manage processes, so when we exited with a status of 1, supervisord would assume something had gone wrong and restart the process for us. Or at least that is what we thought…
“exit 1” does not terminate the process directly; it raises a SystemExit exception, which can be rescued like any other exception. This is very clearly explained in the documentation.
    begin
      exit
      puts "never get here"
    rescue SystemExit
      puts "rescued a SystemExit exception"
    end
    puts "after begin block"
In our case, nothing was catching the exception. So what was it?
The other way to handle exits is to run an “at_exit” block. Ruby runs any “at_exit” blocks when it gets an “exit” call, but does not run them on an “exit!” call.
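To make the difference concrete, here is a minimal sketch that runs two one-line scripts in child Ruby processes and captures what they print. The helper name `run_snippet` is hypothetical, and the sketch assumes a Ruby interpreter can be launched via `RbConfig.ruby`.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a snippet in a fresh Ruby process and
# return [stdout, exit status].
def run_snippet(code)
  stdout, status = Open3.capture2(RbConfig.ruby, "-e", code)
  [stdout, status.exitstatus]
end

# Soft exit: the at_exit block should get a chance to run.
soft_output, soft_status = run_snippet('at_exit { puts "cleanup ran" }; exit(1)')

# Hard exit: the at_exit block should be skipped.
hard_output, hard_status = run_snippet('at_exit { puts "cleanup ran" }; exit!(1)')

puts "exit  -> output=#{soft_output.inspect} status=#{soft_status}"
puts "exit! -> output=#{hard_output.inspect} status=#{hard_status}"
```

Both children should report a status of 1; only the “exit” child should print anything.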
Here is what our rather naive at_exit block looked like…
    at_exit do
      # do cleanup
      # now exit for real
      exit
    end
This harmless-looking piece of code was turning our “exit 1” into an “exit 0”. When supervisord sees one of its processes exit with a status of zero, it assumes all is good and does not try to restart the process. This is a big problem for uptime, and restarting failed processes is a major reason for using a tool like supervisord.
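The effect can be reproduced outside our application. The following is a minimal sketch, not our production code: it runs a script in a child Ruby process and reports the exit status a supervisor would see. The helper name `exit_status_of` is hypothetical, and the sketch assumes a Ruby interpreter can be launched via `RbConfig.ruby`.

```ruby
require "rbconfig"

# Hypothetical helper: run a snippet in a child Ruby process and
# return the exit status that a supervisor would observe.
def exit_status_of(code)
  system(RbConfig.ruby, "-e", code)
  $?.exitstatus
end

naive = <<~'RUBY'
  at_exit do
    # do cleanup
    exit # no argument, so this is a successful exit (status 0)
  end
  exit 1
RUBY

puts "plain exit 1:       #{exit_status_of('exit 1')}"
puts "with naive at_exit: #{exit_status_of(naive)}"
```

The first status should be 1 and the second 0, which is exactly the case where supervisord decides no restart is needed.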
Instead we should be using “exit!” if we want to die hard [with a vengeance] and be sure that the process exited with a status of 1 and is restarted by supervisord. This would bypass all SystemExit rescue blocks and at_exit blocks.
Alternatively, and this is the better solution, we never call “exit” inside an “at_exit” block, and we make sure that all SystemExit rescue blocks and at_exit blocks are used with great caution and pass along the original exit status when necessary.
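For the rescue case, one way to pass along the original status is to re-raise the same exception after cleaning up. This is a minimal sketch under that assumption; `with_cleanup` and `status_after_cleanup` are hypothetical names, and the outer rescue exists only so the example can inspect the status instead of terminating.

```ruby
# Hypothetical wrapper: run a block, do cleanup if it tries to exit,
# then let the original SystemExit continue on its way.
def with_cleanup
  yield
rescue SystemExit => e
  warn "cleaning up before exiting with status #{e.status}"
  raise # re-raise the same exception so the status is not lost
end

# Demonstration only: catch the re-raised exception and return its
# status rather than letting this example process terminate.
def status_after_cleanup(status)
  with_cleanup { exit(status) }
rescue SystemExit => e
  e.status
end

puts status_after_cleanup(1)
puts status_after_cleanup(3)
```

A bare `raise` inside the rescue clause re-raises the exception currently being handled, so whatever status the original “exit” carried is the one that propagates.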
Jonathan Rochkind made a really great clarification of the exit / at_exit situation in a comment below. I think it is important to read it.
A good reminder; not enough people understand Ruby’s process termination system. Incidentally, I prefer `abort("Some error message")` to `exit(1)`. It sets the exit status to 1, and also sets the error message in the SystemExit exception.
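A small sketch of what `abort` raises, for anyone who wants to check this: the method name `inspect_abort` is hypothetical, and the rescue is there only so the example keeps running and can look at the exception. Note that `abort` also writes the message to standard error.

```ruby
# Demonstration only: rescue SystemExit so the example keeps running
# and can inspect what abort raised.
def inspect_abort(message)
  abort(message)
rescue SystemExit => e
  { status: e.status, message: e.message, success: e.success? }
end

p inspect_abort("Some error message")
```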
Great tip Avdi! I’ve not used abort() before.
Yes, did not know about the abort() either! I’ve been hardcoding ‘exit 255’ in my scripts. Thanks!
Any clue on how one would do this? Is there any way an “at_exit” block can access the ‘original exit status’?
Or wait, is the key thing just that you do not need to and should not call `exit` inside `at_exit`? `at_exit` should do its thing without calling `exit`, and then when it’s done being executed the process will still exit on its own, with the original exit value. Is that right?
Good question, Jonathan. Yes, I think putting an “exit” inside an “at_exit” block makes no sense, because the application is already in the process of exiting. Putting an “exit!” inside an “at_exit” block should also be avoided: if you have multiple at_exit blocks, some would run and others would not, because one of the blocks decided to do a hard exit. It is technically possible to reason about the order in which these “at_exit” blocks execute (“If multiple handlers are registered, they are executed in reverse order of registration”), but who is going to keep track of this? It makes for better code to avoid “exit” or “exit!” within “at_exit” blocks altogether, especially a soft exit (“exit”). It is possible that some code within an at_exit block realizes that it needs to hard exit (“exit!”) so that nothing else is executed, but I would try to re-engineer the code so that this is not needed.
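Both points, the reverse ordering and a hard exit cutting the chain short, can be seen in a minimal sketch like the following. The helper name `run_snippet` is hypothetical, and the sketch assumes a Ruby interpreter can be launched via `RbConfig.ruby`. The second script turns on `$stdout.sync` so that its output does not depend on buffers being flushed at exit.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a snippet in a fresh Ruby process and
# return [stdout, exit status].
def run_snippet(code)
  stdout, status = Open3.capture2(RbConfig.ruby, "-e", code)
  [stdout, status.exitstatus]
end

# Three handlers, no exit calls: they run in reverse order of registration.
ordered = <<~'RUBY'
  at_exit { puts "registered first" }
  at_exit { puts "registered second" }
  at_exit { puts "registered third" }
RUBY

# Same handlers, but the middle one does a hard exit.
hard_exit = <<~'RUBY'
  $stdout.sync = true
  at_exit { puts "registered first" }
  at_exit { puts "registered second"; exit!(1) }
  at_exit { puts "registered third" }
RUBY

puts run_snippet(ordered).inspect
puts run_snippet(hard_exit).inspect
```

In the second run the handler registered first should never execute, because the hard exit happens before Ruby reaches it.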
You can always find out what the exit status is by looking at the SystemExit exception, which will be found in `$!` (aka `$ERROR_INFO` if you require ‘English’). If I may shamelessly self-promote for a moment, I go into this in depth in Exceptional Ruby.
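As a rough sketch of that idea (the helper name `run_snippet` is hypothetical, and it assumes a Ruby interpreter can be launched via `RbConfig.ruby`), an `at_exit` block can check whether `$!` is a SystemExit and read its status without calling `exit` itself:

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a snippet in a fresh Ruby process and
# return [stdout, exit status].
def run_snippet(code)
  stdout, status = Open3.capture2(RbConfig.ruby, "-e", code)
  [stdout, status.exitstatus]
end

script = <<~'RUBY'
  at_exit do
    # $! holds the exception that is ending the process, if any.
    if $!.is_a?(SystemExit)
      puts "exiting with status #{$!.status}"
    end
    # No exit call here, so the original status is left alone.
  end
  exit 3
RUBY

output, status = run_snippet(script)
puts output
puts "child exit status: #{status}"
```

When the process ends normally without any exception, `$!` is nil, which is why the sketch checks the type before reading the status.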
Great tip Avdi. I love shameless self-promotions!
> Yes, I think putting a “exit” inside an “at_exit” block makes no sense
Pardon the phrase, but this is the real WTF. Maybe at_exit good practices would be a good blog post. I’ve never had to use it so I’ve never looked into it myself. Also, I wholeheartedly recommend Avdi’s exceptional book.
What a coincidence! I just blogged about something very similar!
After running into my own weird at_exit related bug, I discovered this MRI bug report:
I think a LOT of people’s at_exit related problems are actually due to this bug. Note that while the bug is marked fixed in the tracker, the reproduction still reproduces for me in 1.9.3p194, so I don’t think it’s made it into a release yet.
And that bug report reproduction case also reveals something interesting — you ARE allowed to call `exit` in an `at_exit` block. It does not keep subsequent `at_exit` blocks from running. If everything is working properly, the exit code of the last `at_exit` block to call `exit` ‘wins’. But because of that bug, everything may not be working properly. There is a workaround with monkey-patch redefining `at_exit` in that bug report.
That bug doesn’t actually account for YOUR problem as above. Your problem was a legitimate software error: don’t call `exit` in an `at_exit` block unless you actually WANT to set the exit code in the `at_exit` block. If you don’t, and don’t need to call `exit` in it, you’re fine. If you DO need to call `exit` in an `at_exit` block to set the exit code… then the MRI bug may interfere with your desires.
Thanks Jonathan for this great clarification! I think anyone who reads this post should read your comment. I will make a note of it at the bottom of the post.