I have a multi-threaded C++ application that serves as an end component for one of our integration channels. For the last 2-3 weeks, I have been facing a strange issue: the application hangs and stops responding.
Debugging started with `strace', and I found that the application was hanging in a futex call:
futex(0x5ac9df, FUTEX_WAIT, ....
I thought it had something to do with the pthread_mutex that I was using to share a queue across the application. Since problem solving always starts with the assumption that the problem is in your own code, I checked that every pthread_mutex_lock was correctly paired with a pthread_mutex_unlock.
I did that and also put more debug statements around the usage of the `mutex'. Unfortunately, that did not help; the problem persisted.
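For readers doing the same audit, one way to make the lock/unlock pairing hard to get wrong is a small scope guard around the pthread mutex. This is only a minimal sketch, not my application's code: the names `ScopedLock' and `SharedQueue' are made up for illustration, and error checking of the pthread calls is left out for brevity.

```cpp
#include <pthread.h>
#include <queue>
#include <string>

// Hypothetical helper: locks in the constructor, unlocks in the destructor,
// so every exit path from a scope releases the mutex.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t& m) : m_(m) { pthread_mutex_lock(&m_); }
    ~ScopedLock() { pthread_mutex_unlock(&m_); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;
private:
    pthread_mutex_t& m_;
};

// Hypothetical queue shared between threads; all access goes through the guard.
class SharedQueue {
public:
    SharedQueue() { pthread_mutex_init(&mutex_, nullptr); }
    ~SharedQueue() { pthread_mutex_destroy(&mutex_); }

    void push(const std::string& item) {
        ScopedLock lock(mutex_);
        items_.push(item);
    }

    // Returns false when the queue is empty instead of blocking.
    bool try_pop(std::string& out) {
        ScopedLock lock(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }

private:
    pthread_mutex_t mutex_;
    std::queue<std::string> items_;
};
```

With a guard like this, a missed unlock on an early return is ruled out by construction, which narrows the search to other causes.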
I was on the verge of restructuring the entire application when, on googling `futex_wait hangs', I stumbled upon a link with a discussion of the same issue I was facing. Some text from the link:
Unfortunately, ctime() is not defined on this list. So, glibc does not guarantee the sane behavior when one uses ctime() in signal handler. BTW, I'm surprised that sysklogd calls some functions in signal handler.
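Unfortunately, my application was doing the same thing: calling `localtime' (which calls __libc_lock_lock() in glibc) from a signal handler. I couldn't believe it. Although the purpose of the signal handler was to clean up resources and exit the application, I was also logging some data, and the logging function called `localtime'. If the signal arrives while the interrupted thread already holds that internal lock, the handler waits on it forever, which is exactly the futex wait that `strace' showed.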
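A common way around this is to keep the handler trivial and do the logging and cleanup from normal program flow. The following is a minimal sketch under that assumption, not the code from my application: `g_stop', `on_terminate', `log_line', `install_handler' and `run_until_stopped' are all hypothetical names, and the short sleep stands in for real work.

```cpp
#include <signal.h>
#include <stdio.h>
#include <time.h>

// Hypothetical flag: the only thing the handler is allowed to touch.
static volatile sig_atomic_t g_stop = 0;

// The handler just records the request. No localtime(), no stdio, no malloc().
extern "C" void on_terminate(int) { g_stop = 1; }

// Called from normal program flow only, where taking libc locks is fine.
void log_line(const char* msg) {
    time_t now = time(nullptr);
    struct tm tm_buf;
    localtime_r(&now, &tm_buf);  // thread-safe, but still not for use in handlers
    char stamp[32];
    strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tm_buf);
    fprintf(stderr, "%s %s\n", stamp, msg);
}

// Installs on_terminate for the given signal; returns true on success.
bool install_handler(int signo) {
    struct sigaction sa = {};
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, nullptr) == 0;
}

// Main loop: polls the flag, then logs and cleans up outside the handler.
void run_until_stopped() {
    while (!g_stop) {
        struct timespec ts = {0, 100000000};  // 100 ms placeholder for real work
        nanosleep(&ts, nullptr);
    }
    log_line("signal received, cleaning up and exiting");
    // resource cleanup would go here
}
```

The point of the design is that the handler never calls anything that might take a lock; the logging that uses the time functions runs only after the main loop has noticed the flag.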
Sadly, it has not yet been fixed. It seems that this is a problem with glibc on the 2.6 kernel, and application programmers are now at the mercy of the glibc or kernel developers to fix it.