tls_close causes exception on Windows #266
Comments
Do you have an example program that you can share?
It appears that this commit, libressl/openbsd@6be5c4c, relates to this issue.
See this Gist. When run, this program prints the issuer information and then crashes in the
I don't think it's the same case; I had done a
I have a crude fix for this issue:
--- tls/tls.c 2017-01-25 18:20:41.087748566 +0000
+++ /home/vinay/tmp/tls.c 2017-01-29 08:26:00.343170297 +0000
@@ -635,6 +635,21 @@
return (rv);
}
+static BOOL
+socket_is_valid(SOCKET fd)
+{
+ fd_set fds;
+ int rc;
+ TIMEVAL t;
+
+ ZeroMemory(&fds, sizeof(fds));
+ t.tv_sec = t.tv_usec = 0;
+ fds.fd_count = 1;
+ fds.fd_array[0] = fd;
+ rc = select(1, &fds, &fds, &fds, &t);
+ return rc >= 0;
+}
+
int
tls_close(struct tls *ctx)
{
@@ -651,12 +666,14 @@
if (ctx->ssl_conn != NULL) {
ERR_clear_error();
- ssl_ret = SSL_shutdown(ctx->ssl_conn);
- if (ssl_ret < 0) {
- rv = tls_ssl_error(ctx, ctx->ssl_conn, ssl_ret,
- "shutdown");
- if (rv == TLS_WANT_POLLIN || rv == TLS_WANT_POLLOUT)
- goto out;
+ if (socket_is_valid(ctx->ssl_conn->wbio->num)) {
+ ssl_ret = SSL_shutdown(ctx->ssl_conn);
+ if (ssl_ret < 0) {
+ rv = tls_ssl_error(ctx, ctx->ssl_conn, ssl_ret,
+ "shutdown");
+ if (rv == TLS_WANT_POLLIN || rv == TLS_WANT_POLLOUT)
+ goto out;
+ }
}
}
With the above patch applied, the crash no longer occurs. This is perhaps a bit of a hack (not portable, violates encapsulation, etc.), so treat this information as just a data point. I also noticed that this bug doesn't always manifest on Windows 7; the failure happens consistently on Windows 10.
The socket must be closed after you have called tls_close(); otherwise there is not much point in calling tls_close(). That said, it does seem strange that this triggers an exception on Windows. In the Unix world this would just fail with an error that we would see following the SSL_shutdown() call. If anything, this should be fixed in posix_write(), certainly not in tls_close().
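For comparison, here is a small POSIX-only sketch of the Unix behaviour described in the previous comment: writing to a socket descriptor that has already been closed simply fails with EBADF instead of aborting the process. A socket pair stands in for the TLS connection, and write_after_close_errno is a hypothetical name; none of this is LibreSSL code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustration only: close one end of a socket pair, then write to the
 * now-closed descriptor, mimicking what SSL_shutdown() ends up doing when
 * tls_close() runs after the application has closed the socket.
 * Returns the errno observed (EBADF is expected on POSIX systems),
 * or 0 if the write unexpectedly succeeded.
 */
int
write_after_close_errno(void)
{
	int sv[2];
	ssize_t n;
	int saved;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return errno;

	close(sv[0]);			/* the application closes its socket */
	n = write(sv[0], "x", 1);	/* ...and the library later writes to it */
	saved = (n == -1) ? errno : 0;

	close(sv[1]);
	return saved;
}
```

In this single-threaded sketch nothing can reuse the closed number before write() runs; in a real program another thread could, so the write would land on an unrelated descriptor.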
Sure, as I said, it was just a point of information; I wasn't suggesting that this was the right fix. On Linux, the program does fail with a
Does it do any harm if we just omit the
Talking of object lifetimes, I found what looks like another bug. Before I log it as a separate issue, can you confirm whether it is a bug or just a documentation issue? I found that freeing the config after creating a context (but before using it) also caused a "dangling pointer" kind of problem (access violation), even though the documentation states
To observe this, take the Gist linked above and move the "free up config" section so that it comes after the "get context and configure" section and before the "connect the socket to the context and do the handshake" section. Running the resulting program causes a segfault.
Writing to a file descriptor after you have closed it is a serious bug, whether it causes an exception or not. Even if other systems do not raise an exception, you could easily write to the descriptor inadvertently after it has already been reused in another context. We could make posix_write check whether something is a socket before using socket operations on it (we do this in the poll() emulation in openssl(1) to determine whether something is selectable), but this is still vulnerable to TOCTOU bugs.
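A POSIX sketch of the kind of check being discussed, using fstat() and S_ISSOCK(); a Windows shim would need a different mechanism. classify_fd and enum fd_kind are hypothetical names. The comment in the code marks the TOCTOU window: the answer can already be stale when the caller acts on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical classification result; not part of LibreSSL. */
enum fd_kind { FD_KIND_INVALID, FD_KIND_SOCKET, FD_KIND_OTHER };

/*
 * Report what kind of object fd refers to right now.
 * TOCTOU: the result is only valid at the moment fstat() runs.  Another
 * thread may close the descriptor, and the number may be reused for a
 * different object, before the caller uses the answer.
 */
enum fd_kind
classify_fd(int fd)
{
	struct stat st;

	if (fstat(fd, &st) == -1)
		return FD_KIND_INVALID;	/* typically EBADF: not open */
	return S_ISSOCK(st.st_mode) ? FD_KIND_SOCKET : FD_KIND_OTHER;
}
```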
Do you mean a bug in my program? Possibly, though in that case the
ssize_t
posix_write(int fd, const void *buf, size_t count)
{
ssize_t rc = send(fd, buf, count, 0);
if (rc == SOCKET_ERROR) {
int err = WSAGetLastError();
return (err == WSAENOTSOCK || err == WSAEBADF) ?
write(fd, buf, count) : wsa_errno(err);
}
return rc;
}
It appears to try a
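The shim quoted above tries send() first and falls back to write() when Winsock reports WSAENOTSOCK or WSAEBADF. A rough POSIX analogue of that fallback follows; compat_write is a hypothetical name, not part of LibreSSL. On POSIX, send() on a non-socket fails with ENOTSOCK, and on a closed descriptor it fails with EBADF. This sketch falls back only on ENOTSOCK, since retrying write() on EBADF would just repeat a doomed call.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try a socket send first; if the kernel says the descriptor is not a
 * socket (e.g. a pipe or regular file), fall back to plain write().
 * A closed descriptor makes send() fail with EBADF, and that error is
 * returned to the caller rather than aborting the process.
 */
ssize_t
compat_write(int fd, const void *buf, size_t count)
{
	ssize_t rc = send(fd, buf, count, 0);

	if (rc == -1 && errno == ENOTSOCK)
		return write(fd, buf, count);
	return rc;
}
```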
N.B. the NTQuery internal API can distinguish between socket and file handles.
@kinichiro @busterb can we solve this more sanely with NTQuery?
Hi, it can be reproduced by closing the underlying socket while it is blocked in a recv or send call (we do this in order to cancel blocking socket calls). The LibreSSL code as it is seems incorrect, since it treats a socket handle as a file descriptor. On Windows the two are different, hence the crash. The LibreSSL code should be fixed so that it does not call file functions on socket handles.
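One alternative to closing a socket in order to cancel a blocked call, not something proposed in this thread, is shutdown(): on Linux, shutdown(fd, SHUT_RDWR) wakes a thread blocked in recv() on that socket (recv() then returns 0) while the descriptor stays allocated. A later tls_close() would then operate on a still-open descriptor and get an ordinary error instead of touching a closed or reused one (on POSIX such a write can raise SIGPIPE unless it is ignored). This sketch is POSIX-only, its helper names are hypothetical, and I have not checked how Winsock behaves in the same situation.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct reader {
	int fd;
	ssize_t result;		/* return value of recv() in the reader thread */
};

/* Hypothetical worker: blocks in recv() until data, EOF or shutdown. */
static void *
blocking_reader(void *arg)
{
	struct reader *r = arg;
	char c;

	r->result = recv(r->fd, &c, 1, 0);
	return NULL;
}

/*
 * Cancel a blocked recv() by shutting the socket down rather than closing
 * it, and only close() once nothing else will use the descriptor.
 * Returns what recv() returned in the reader thread (0 means end of stream).
 */
ssize_t
cancel_blocked_recv(void)
{
	int sv[2];
	pthread_t tid;
	struct reader r;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	r.fd = sv[0];
	r.result = -2;
	if (pthread_create(&tid, NULL, blocking_reader, &r) != 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	shutdown(sv[0], SHUT_RDWR);	/* wakes the reader; sv[0] stays valid */
	pthread_join(tid, NULL);
	/* A tls_close() on a context using sv[0] would go here. */
	close(sv[0]);
	close(sv[1]);
	return r.result;
}
```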
To call
I also tested using
I wonder whether there is any safe way to detect whether a descriptor is a socket or not, in any state.
@kinichiro Then why not just use
Just use
To check this, I wrote several is_socket_...() functions and tried them out. To build with this posix_win.c, add ntdll after ws2_32 in CMakeLists.txt.
In the end, I still couldn't find a sane way to distinguish a file descriptor from a socket.
Hmmm, before I posted my comment, I did a quick smoke test (Windows 10, but using Python bindings to the underlying Windows libraries). The script in this gist: https://gist.github.com/vsajip/0d1ff9d6e94561cc7f4466dbcf86748c gives results which suggest
and with Python 2:
Python 2 was compiled with VS 2008, Python 3 with VS >= 2015.
I saw that Python uses
Here is the gist for a new posix_win.c. Does this work for you?
P.S.
I would suggest that the fundamental issue here is some kind of confusion between socket handles and file handles, introduced by the use of the POSIX emulation layer. I think the solution is to resolve this confusion, not to try to catch it with 'an ambulance at the bottom of a cliff' by asking the OS what kind of handle it is, or by suppressing exceptions.
There was a strong move in the upstream library code to remove conditional OS paths and target POSIX interfaces only. Whatever deviations arise in the supported OSes would suggest either fixing those OSes or adding compatibility shims. Sometimes it feels like being between a rock and a hard place when deciding whether to add #ifdefs back into the portable version. In the case of libtls, the file and socket descriptor paths to close are unambiguous. While we're not likely to get new Windows #ifdefs added upstream, it would be good to think about how to signal intention to posix_close() more clearly, without needing too many out-of-tree patches. OTOH there are fewer than a dozen calls in libtls, so maybe it's not too bad to special-case the calls here and just patch libtls? I worry more about the BIO layer, openssl(1), etc. Another solution (sorry if this sounds slightly crazier) could be to track file descriptors allocated by posix_open in a list of known file descriptors used internally by the library, so we don't have to ask the OS later on. Anything not in the file list could be assumed to be a socket, though I suppose posix_socket could also be added to track socket descriptors explicitly.
As I mention in my "Strange logic" issue, this happens on
Would be good to get this fixed. It's a pretty bad piece of design in libtls.
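A minimal sketch of the descriptor-tracking idea, assuming a single-threaded caller and descriptors below a fixed limit. fd_registry_add, fd_registry_remove, fd_registry_is_file and FD_REGISTRY_MAX are hypothetical names, not LibreSSL API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Bitmap of descriptors that posix_open() handed out.  Hypothetical limit;
 * descriptors at or above it are never recorded and so are treated as
 * sockets.  A real shim would need a lock around these updates.
 */
#define FD_REGISTRY_MAX 1024

static unsigned char file_fds[FD_REGISTRY_MAX / 8];

static bool
fd_in_range(int fd)
{
	return fd >= 0 && fd < FD_REGISTRY_MAX;
}

/* Call after a successful open() inside posix_open(). */
void
fd_registry_add(int fd)
{
	if (fd_in_range(fd))
		file_fds[fd / 8] |= (unsigned char)(1u << (fd % 8));
}

/* Call from posix_close() before closing a tracked file descriptor. */
void
fd_registry_remove(int fd)
{
	if (fd_in_range(fd))
		file_fds[fd / 8] &= (unsigned char)~(1u << (fd % 8));
}

/* True if fd came from posix_open(); anything else is assumed to be a socket. */
bool
fd_registry_is_file(int fd)
{
	return fd_in_range(fd) && (file_fds[fd / 8] & (1u << (fd % 8))) != 0;
}
```

posix_close() would then consult fd_registry_is_file() to choose between close() and closesocket(). The scheme also assumes a socket value never coincides numerically with a live tracked CRT descriptor, which I have not verified for Winsock.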
based on discussion in libressl#266 and https://bugs.python.org/issue23524 adjust the compat layer for Windows to use _get_osfhandle in combination with _set_thread_local_invalid_parameter_handler if applicable to more reliably determine if a handle is a socket, file, or closed socket. This prevents assertions when calling tls_close on an already-closed socket.
I know this has been sitting for a while. I got some time today to focus on it, and reworked @kinichiro's prototype in PR #883. If anyone in this thread is still affected by this, I would be interested in any feedback you have on this change. Thanks! See #883
When running the signertest, or the test project in libressl#266 an assertion window pops up. This was fixed in afcd4be for a release compiled library. To prevent the issue in debug mode, it looks like it is necessary to also disable the assertion window popup. With this all tests pass when compiling and running them with a Debug, Release or RelWithDebInfo CMake build on windows (for me).
@busterb Yet another crazy-sounding solution would be to exploit the rumor that valid Windows socket descriptors are kernel object handles, which, being offsets into some table whose entries require some alignment, can probably only take even numbers. To ensure that file descriptors returned from posix_open are always odd:
static int
oddify_fd(int fd)
{
if (fd & 1) /* also catches an eventual -1 from using up all descriptors */
return fd;
int clone = oddify_fd(dup(fd));
close(fd);
return clone;
}
int
posix_open(const char *path, ...)
{
va_list ap;
int mode = 0;
int flags;
va_start(ap, path);
flags = va_arg(ap, int);
if (flags & O_CREAT)
mode = va_arg(ap, int);
va_end(ap);
flags |= O_BINARY;
if (flags & O_CLOEXEC) {
flags &= ~O_CLOEXEC;
flags |= O_NOINHERIT;
}
flags &= ~O_NONBLOCK;
return oddify_fd(open(path, flags, mode));
}
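The parity scheme can be sanity-checked on a POSIX system, where the file side behaves the same way. This sketch reuses oddify_fd() from the comment above; open_odd() and looks_like_crt_fd() are hypothetical helpers. It only checks that opened files always come back odd; the claim that Windows socket handles are always even remains the rumor mentioned above and is not tested here.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Same idea as oddify_fd() above: dup() until the descriptor is odd. */
static int
oddify_fd(int fd)
{
	int clone;

	if (fd & 1)		/* odd, or -1 when open()/dup() failed */
		return fd;
	clone = oddify_fd(dup(fd));
	close(fd);
	return clone;
}

/* Hypothetical stand-in for posix_open(): read-only, always odd (or -1). */
int
open_odd(const char *path)
{
	return oddify_fd(open(path, O_RDONLY));
}

/* The classification posix_close() could rely on: odd means CRT file. */
int
looks_like_crt_fd(int fd)
{
	return fd >= 0 && (fd & 1);
}
```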
Here's what I do to solve this problem: in posix_open:
in posix_close:
in tls_config_load_file:
This is more elegant IMO than relying on some dubious OS calls to try to work out later whether the handle is a file or a socket.
Given the following sequence of events:
tls_connect_socket.
tls_close is called on the context.
On Windows (but not Linux) the program crashes in the tls_close call. The stack trace indicates that a write is being attempted to the closed socket, which causes Windows to abort the program with "An invalid parameter was passed to a function that considers invalid parameters fatal".
Stack trace:
ucrtbased.dll!00007ff92c379fb8()
libcrypto.dll!posix_write(int fd, const void * buf, unsigned __int64 count) Line 188
libcrypto.dll!sock_write(bio_st * b, const char * in, int inl) Line 154
libcrypto.dll!BIO_write(bio_st * b, const void * in, int inl) Line 252
libssl.dll!ssl3_write_pending(ssl_st * s, int type, const unsigned char * buf, unsigned int len) Line 813
libssl.dll!do_ssl3_write(ssl_st * s, int type, const unsigned char * buf, unsigned int len, int create_empty_fragment) Line 789
libssl.dll!ssl3_dispatch_alert(ssl_st * s) Line 1414
libssl.dll!ssl3_send_alert(ssl_st * s, int level, int desc) Line 1400
libssl.dll!ssl3_shutdown(ssl_st * s) Line 2544
libssl.dll!SSL_shutdown(ssl_st * s) Line 1029
libtls.dll!tls_close(tls * ctx) Line 654
The fd in the posix_write call is the same as the fd of the connected socket passed to tls_connect_socket. This doesn't seem like desirable behaviour for tls_close, or is there an expectation that the socket must be closed later?