Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tls_close causes exception on Windows #266

Closed
vsajip opened this issue Jan 26, 2017 · 23 comments · Fixed by #883
Closed

tls_close causes exception on Windows #266

vsajip opened this issue Jan 26, 2017 · 23 comments · Fixed by #883

Comments

@vsajip
Copy link

vsajip commented Jan 26, 2017

Given the following sequence of events:

  1. A connected socket is associated with a context using tls_connect_socket.
  2. The socket is closed (after handshake).
  3. tls_close is called on the context.

On Windows (but not Linux) the program crashes in the tls_close call. The stack trace indicates that a write is being attempted to the closed socket, which is causing Windows to abort the program with An invalid parameter was passed to a function that considers invalid parameters fatal.

Stack trace:
ucrtbased.dll!00007ff92c379fb8()
libcrypto.dll!posix_write(int fd, const void * buf, unsigned __int64 count) Line 188
libcrypto.dll!sock_write(bio_st * b, const char * in, int inl) Line 154
libcrypto.dll!BIO_write(bio_st * b, const void * in, int inl) Line 252
libssl.dll!ssl3_write_pending(ssl_st * s, int type, const unsigned char * buf, unsigned int len) Line 813
libssl.dll!do_ssl3_write(ssl_st * s, int type, const unsigned char * buf, unsigned int len, int create_empty_fragment) Line 789
libssl.dll!ssl3_dispatch_alert(ssl_st * s) Line 1414
libssl.dll!ssl3_send_alert(ssl_st * s, int level, int desc) Line 1400
libssl.dll!ssl3_shutdown(ssl_st * s) Line 2544
libssl.dll!SSL_shutdown(ssl_st * s) Line 1029
libtls.dll!tls_close(tls * ctx) Line 654

The fd in the posix_write call is the same as the fd of the connected socket passed to tls_connect_socket. This doesn't seem like desirable behaviour for tls_close, or is there an expectation that the socket must be closed later?

@busterb
Copy link
Contributor

busterb commented Jan 27, 2017

Do you have an example program that you can share?

@kinichiro kinichiro added the bug label Jan 27, 2017
@kinichiro
Copy link
Contributor

It appears to this commit libressl/openbsd@6be5c4c relates to this issue.
Can you confirm this solve your issue ?

@vsajip
Copy link
Author

vsajip commented Jan 27, 2017

Do you have an example program that you can share?

See this Gist.

When run, this program prints the issuer information and then crashes in the tls_close call:

Failure dialog

Can you confirm this solve your issue ?

I don't think it's the same case - I had done a tls_handshake , as demonstrated by the program linked above.

@vsajip
Copy link
Author

vsajip commented Jan 29, 2017

I have a crude fix for this issue:

--- tls/tls.c	2017-01-25 18:20:41.087748566 +0000
+++ /home/vinay/tmp/tls.c	2017-01-29 08:26:00.343170297 +0000
@@ -635,6 +635,21 @@
 	return (rv);
 }
 
+static BOOL
+socket_is_valid(SOCKET fd)
+{
+	fd_set fds;
+	int rc;
+	TIMEVAL t;
+
+	ZeroMemory(&fds, sizeof(fds));
+	t.tv_sec = t.tv_usec = 0;
+	fds.fd_count = 1;
+	fds.fd_array[0] = fd;
+	rc = select(1, &fds, &fds, &fds, &t);
+	return rc >= 0;
+}
+
 int
 tls_close(struct tls *ctx)
 {
@@ -651,12 +666,14 @@
 
 	if (ctx->ssl_conn != NULL) {
 		ERR_clear_error();
-		ssl_ret = SSL_shutdown(ctx->ssl_conn);
-		if (ssl_ret < 0) {
-			rv = tls_ssl_error(ctx, ctx->ssl_conn, ssl_ret,
-			    "shutdown");
-			if (rv == TLS_WANT_POLLIN || rv == TLS_WANT_POLLOUT)
-				goto out;
+		if (socket_is_valid(ctx->ssl_conn->wbio->num)) {
+			ssl_ret = SSL_shutdown(ctx->ssl_conn);
+			if (ssl_ret < 0) {
+				rv = tls_ssl_error(ctx, ctx->ssl_conn, ssl_ret,
+					"shutdown");
+				if (rv == TLS_WANT_POLLIN || rv == TLS_WANT_POLLOUT)
+					goto out;
+			}
 		}
 	}

With the above patch applied, the crash no longer occurs. This is a bit of a hack, perhaps (not portable, violates encapsulation, etc.) - you can treat this info as just a data point.

I also noticed that this bug doesn't always present on Windows 7. The failure happens consistently on Windows 10.

@4a6f656c
Copy link
Contributor

The socket must be closed after you have called tls_close() - otherwise there is not much point in calling tls_close(). That said, it does seem strange that this triggers an exception on Windows. In the Unix world this would just fail with an error that we would see following the SSL_shutdown() call. If anything, this should be fixed in posix_write() - certainly not in tls_close().

@vsajip
Copy link
Author

vsajip commented Jan 29, 2017

If anything, this should be fixed in posix_write() - certainly not in tls_close().

Sure - as I said, it was just a point of information. I wasn't suggesting that this was the right fix.

On Linux, the program does fail with a shutdown failed: Bad file descriptor.

Does it do any harm if we just omit the tls_close(), if the socket is to be closed anyway? Does tls_close() close the socket if it was created internally (via tls_connect())?

Talking of object lifetimes, I found what looks like another bug ... before I log it as a separate issue, can you confirm whether it is a bug or just a documentation issue? I found that freeing the config after creating a context (but before using it) also caused a "dangling pointer" kind of problem (access violation) - even though the documentation states

When no more contexts are to be created, the tls_config object should be freed by calling tls_config_free().

To observe this, in the above linked Gist, just move the "free up config" section to after the "get context and configure" section and before the "connect the socket to the context and do the handshake" section. Running the resulting program causes a segfault.

@busterb
Copy link
Contributor

busterb commented Jan 29, 2017

Writing to a file descriptor after you have closed is a serious bug, whether it causes an exception or not. Even if other systems do not raise an exception, you could easily inadvertently write to the descriptor after it has already been reused in another context. We could make posix_write check if something is a socket before using socket operations on it (we do this in the poll() emulation in openssl(1) to determine if something is selectable) but this is still vulnerable to TOCTOU bugs.

@kinichiro kinichiro removed the bug label Jan 29, 2017
@vsajip
Copy link
Author

vsajip commented Jan 29, 2017

Writing to a file descriptor after you have closed is a serious bug

Do you mean a bug in my program? Possibly, though in that case the tls_connect_socket() documentation could be clearer about the socket lifetime as compared to the context lifetime. (I see that @kinichiro has removed the bug label.) I think it would be OK to return the same error as you get on POSIX, rather than an abnormal termination of the program. I assume this problem could also occur on other operations e.g. posix_read, etc. Looking at posix_write():

ssize_t
posix_write(int fd, const void *buf, size_t count)
{
	ssize_t rc = send(fd, buf, count, 0);
	if (rc == SOCKET_ERROR) {
		int err = WSAGetLastError();
		return (err == WSAENOTSOCK || err == WSAEBADF) ?
			write(fd, buf, count) : wsa_errno(err);
	}
	return rc;
}

It appears to try a send(), and if that returns SOCKET_ERROR with WSAENOTSOCK, it's doing a write(), presumably assuming that fd must be a file descriptor. However, WSAENOTSOCK is also returned for closed sockets, and I think this is more like a category error, which I presume is what is causing Windows to complain. Possibly WSAEBADF and WSAENOTSOCK should be treated differently? Note the assertion dialog message referring to fh not being in range [I suppose of a normal file descriptor].

@vsajip
Copy link
Author

vsajip commented Feb 5, 2017

N.B. the NTQuery internal API can distinguish between socket and file handles.

@4a6f656c
Copy link
Contributor

@kinichiro @busterb can we solve this more sanely with NTQuery?

@Ono-Sendai
Copy link

Hi,
We're running into this exception on Windows as well.

It can be replicated by closing the underlying socket while it's blocked on a recv or send call.

(We do this in order to cancel blocking socket calls)

The LibreSSL code as it is seems incorrect, since it is treating a socket handle as a file descriptor. On windows the two are different, hence the crash.

The LibreSSL code should be fixed to not call file functions on socket handles.

@kinichiro
Copy link
Contributor

To call NtQueryObject, file descriptor is required to be trasformed to HANDLE by _get_osfhandle.
But _get_osfhandle can be used only for file descriptor, not for socket.
I don't think we can use NtQueryObject for detecting it is socket or not.

I also tested to use getsockname for checking if it is socket or not.
I thought getsockname will always return SUCCESS if it is a socket.
But, it gets WSAENOTSOCK if socket was closed.

I wonder if is there any safe way to detect that it is socket or not in any status.

@vsajip
Copy link
Author

vsajip commented Mar 19, 2020

@kinichiro Then why not just use _get_osfhandle(sock_or_fd)? If it returns INVALID_HANDLE and errno == EBADF we have a socket (closed or open), else we have a file descriptor.

@kinichiro
Copy link
Contributor

Just use _get_osfhandle for socket, it appears to get assertion ...
And other several Windows API return the same error status for fd and closed socket.
This investigation was done with Visual Studio 16 (2019) on Windows10.

Windows API file descriptor socket closed socket
_get_osfhandle OK Assertion (*1) -
_open_osfhandle -1 OK -1
NtQueryObject STATUS_INVALID_HANDLE OK STATUS_INVALID_HANDLE
GetFileType FILE_TYPE_UNKNOWN FILE_TYPE_PIPE FILE_TYPE_UNKNOWN
getsockopt WSAENOTSOCK OK WSAENOTSOCK

(*1) fh >= 0 && (unsigned)fh < (unsigned)_nhandle

To check this, I had wrote several is_socket_...() code and tried to see how it works.
Here is the gist for that test version of posix_win.c.
https://gist.github.com/kinichiro/facdfdfa311dabe1ab30e51acadc54d8

To build with this posix_win.c, add ntdll after ws2_32 in CMakeLists.txt.

set(PLATFORM_LIBS ${PLATFORM_LIBS} ws2_32 ntdll)

After all, I still couldn't find the sane way to detect fd or socket.

@vsajip
Copy link
Author

vsajip commented Mar 20, 2020

Hmmm - before I posted my comment, I did a quick smoke test (Windows 10, but using Python bindings to the underlying Windows libraries). The script in this gist:

https://gist.github.com/vsajip/0d1ff9d6e94561cc7f4466dbcf86748c

gives results which suggest _get_osfhandle() should be usable. With Python 3:

~\Projects\scratch> python3 socktest.py
3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)]
Windows-10-10.0.18362-SP0
fd 0 -> handle 8
fd 1 -> handle 12
fd 2 -> handle 16
open socket 576: [Errno 9] Bad file descriptor
closed socket 576: [Errno 9] Bad file descriptor
Trying bad integer handle values to see if we can get a crash:
No crash.

and with Python 2:

~\Projects\scratch> python2 socktest.py                              
2.7.13 (default, Jan 18 2017, 15:40:43) [MSC v.1500 64 bit (AMD64)]  
Windows-10-10.0.18362                                                
fd 0 -> handle 8                                                     
fd 1 -> handle 12                                                    
fd 2 -> handle 16                                                    
open socket 916: [Errno 9] Bad file descriptor                       
closed socket 916: [Errno 9] Bad file descriptor                     
Trying bad integer handle values to see if we can get a crash:       
No crash.

Python 2 was compiled with VS 2008, Python 3 with VS >= 2015.

@kinichiro
Copy link
Contributor

I saw that python uses _set_thread_local_invalid_parameter_handler to handle exception.
This appears to work fine.
I also added _CrtSetReportMode to avoid showing dialog.

Here is the gist for new posix_win.c.
Added BEGIN_SUPPRESS_IPH and END_SUPPRESS_IPH macro to set and release handler while calling _get_osfhandle.
https://gist.github.com/kinichiro/facdfdfa311dabe1ab30e51acadc54d8

Does this work for you?

P.S.
I wonder if we should adopt this to posix_close too.

@Ono-Sendai
Copy link

I would suggest that the fundamental issue here is some kind of confusion between socket handles and file handles, introduced due to the use of the Posix emulation layer. I think the solution is to resolve this confusion, not try to catch it with 'an ambulance at the bottom of a cliff' with asking the OS what kind of handle it is, or suppressing exceptions.

@busterb
Copy link
Contributor

busterb commented Mar 21, 2020

There was a strong move in the upstream library code to remove conditional OS paths, targeting POSIX interfaces only. Whatever deviations arise in the supported OSes would suggest either fixing those OSes, or adding compatibility shims. Sometimes it feels like being between a rock and a hard place when deciding whether to add #ifdefs back into the portable version.

In the case with libtls, the file and socket descriptor paths to close are unambiguous. While we're not likely to get new Windows #ifdef's added upstream, it would be good to think about how to more clearly signal intention to posix_close(), without needing too many out-of-tree patches. OTOH there are less than a dozen calls in libtls, maybe it's not too bad to special case calls here and just patch libtls? I worry more about the BIO layer, openssl(1), etc.

Another solution (and sorry if this sounds slightly crazier) could be to track file descriptors allocated by posix_open as a list of known file descriptors used internally by the library, so we don't have to ask the OS later on. Anything not in the file list could be assumed to be a socket, though I suppose posix_socket could also be added to track socket descriptors explicitly as well.

@jimpark
Copy link

jimpark commented Aug 7, 2020

I saw that python uses _set_thread_local_invalid_parameter_handler to handle exception.
This appears to work fine.
I also added _CrtSetReportMode to avoid showing dialog.

Here is the gist for new posix_win.c.
Added BEGIN_SUPPRESS_IPH and END_SUPPRESS_IPH macro to set and release handler while calling _get_osfhandle.
https://gist.github.com/kinichiro/facdfdfa311dabe1ab30e51acadc54d8

Does this work for you?

P.S.
I wonder if we should adopt this to posix_close too.

As I mention in my "Strange logic" issue, this happens on posix_close for me. So I'd recommend that this socket test is added to it as well. While this issue gets sorted out officially, I'm going to take kinichiro's proposed posix_win.c.

@Ono-Sendai
Copy link

Would be good to get this fixed. It's a pretty bad piece of design in libtls.
libtls should not be confused if something is a file handle or a socket handle. Trying to distinguish the two from error codes from failed function calls is a bad technique.
I can't use the unmodified libtls source in my programs without hitting assertion failures in windows files.
I can work around this with various dirty hacks, but it's not pretty. If I didn't already have the code to use libtls set up, I'd change SSL library due to this issue.

busterb added a commit to busterb/portable that referenced this issue Jul 6, 2023
based on discussion in libressl#266
and https://bugs.python.org/issue23524 adjust the compat layer for
Windows to use _get_osfhandle in combination with
_set_thread_local_invalid_parameter_handler if applicable to more
reliably determine if a handle is a socket, file, or closed socket.

This prevents assertions when calling tls_close on an already-closed
socket.
@busterb
Copy link
Contributor

busterb commented Jul 6, 2023

I know this has been sitting for a while. I got some time today to focus on it, and reworked @kinichiro's prototype in PR #883 . If anyone in this thread is still being affected by this, would be interested any feedback you have on this change. Thanks! See #883

RandomInEqualities added a commit to RandomInEqualities/libressl that referenced this issue Feb 4, 2024
When running the signertest, or the test project in 
libressl#266 an assertion window
pops up. This was fixed in afcd4be for a release compiled library.
To prevent the issue in debug mode, it looks like it is necessary to 
also disable the assertion window popup. 

With this all tests pass when compiling and running them with a Debug,
Release or RelWithDebInfo CMake build on windows (for me).
busterb pushed a commit to busterb/portable that referenced this issue Mar 3, 2024
When running the signertest, or the test project in 
libressl#266 an assertion window
pops up. This was fixed in afcd4be for a release compiled library.
To prevent the issue in debug mode, it looks like it is necessary to 
also disable the assertion window popup. 

With this all tests pass when compiling and running them with a Debug,
Release or RelWithDebInfo CMake build on windows (for me).
busterb pushed a commit to busterb/portable that referenced this issue Mar 3, 2024
When running the signertest, or the test project in 
libressl#266 an assertion window
pops up. This was fixed in afcd4be for a release compiled library.
To prevent the issue in debug mode, it looks like it is necessary to 
also disable the assertion window popup. 

With this all tests pass when compiling and running them with a Debug,
Release or RelWithDebInfo CMake build on windows (for me).
@datadiode
Copy link

datadiode commented Jul 14, 2024

Another solution (and sorry if this sounds slightly crazier) could be to track file descriptors allocated by posix_open as a list of known file descriptors used internally by the library, so we don't have to ask the OS later on. Anything not in the file list could be assumed to be a socket, though I suppose posix_socket could also be added to track socket descriptors explicitly as well.

@busterb Yet another crazy-sounding solution would be to exploit the rumor that valid Windows socket descriptors are kernel object handles, which, being offsets into some table whose entries require some alignment, can probably only take even numbers.

In order to ensure that file descriptors returned from posix_open are always odd:

static int
oddify_fd(int fd)
{
	if (fd & 1) /* also catches an eventual -1 from using up all descriptors */
		return fd;
	int clone = oddify_fd(dup(fd));
	close(fd);
	return clone;
}

int
posix_open(const char *path, ...)
{
	va_list ap;
	int mode = 0;
	int flags;

	va_start(ap, path);
	flags = va_arg(ap, int);
	if (flags & O_CREAT)
		mode = va_arg(ap, int);
	va_end(ap);

	flags |= O_BINARY;
	if (flags & O_CLOEXEC) {
		flags &= ~O_CLOEXEC;
		flags |= O_NOINHERIT;
	}
	flags &= ~O_NONBLOCK;
	return oddify_fd(open(path, flags, mode));
}

@Ono-Sendai
Copy link

Ono-Sendai commented Jul 15, 2024

Here's what I do to solve this problem:

in posix_open:

const int fh = open(path, flags, mode);
return fh | 0x80000000; // Set high bit to mark file descriptor as a file handle, as opposed to a socket handle. // GLARE NEWCODE

in posix_close:

// GLARE NEWCODE
const int is_file = (fd & 0x80000000) != 0;

if (is_file)
	return close(fd & 0x7FFFFFFF);
else
{
	if (closesocket(fd) == SOCKET_ERROR)
		return wsa_errno(WSAGetLastError());
	else
		return 0;
}
posix_read(int fd, void *buf, size_t count)
{
	// GLARE NEWCODE
	const int is_file = (fd & 0x80000000) != 0;

	if(is_file)
		return read(fd & 0x7FFFFFFF, buf, count);
	else
	{
		const ssize_t rc = recv(fd, buf, count, 0);

		if (rc == SOCKET_ERROR)
			return wsa_errno(WSAGetLastError());
		else
			return rc;
	}
}

in tls_config_load_file:

if (fstat(fd & 0x7FFFFFFF, &st) != 0) { // GLARE NEWCODE
	tls_error_set(error, "failed to stat %s file '%s'",
	    filetype, filename);
	goto err;
}

This is more elegant IMO than relying on some dubious OS calls to try and decipher if the handle is a file or socket later.
Better to store that information explictly. Better still would be to use a completely different C++ type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants