[lldb] Multithreading lldb-server works on Windows now; fixed gdb port mapping #100670
Conversation
@llvm/pr-subscribers-lldb Author: Dmitry Vasilyev (slydiman)

Changes: Removed fork(). Used threads and a common thread-safe port map for all platform connections. Updated lldb::FileSystem to use llvm::vfs::createPhysicalFileSystem() with its own virtual working directory per thread. This patch depends on #100659, #100666. This patch fixes #97537, #90923, #56346. lldb-server has been tested on Windows with 50 connections and 100 processes launched simultaneously. Also tested a cross build with a Linux x86_64 host and a Linux AArch64 target.

Patch is 33.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100670.diff 8 Files Affected:
diff --git a/lldb/include/lldb/Host/FileSystem.h b/lldb/include/lldb/Host/FileSystem.h
index 640f3846e448c..5e25414a894d3 100644
--- a/lldb/include/lldb/Host/FileSystem.h
+++ b/lldb/include/lldb/Host/FileSystem.h
@@ -47,6 +47,12 @@ class FileSystem {
static FileSystem &Instance();
+ static void InitializePerThread() {
+ lldbassert(!InstancePerThread() && "Already initialized.");
+ InstancePerThread().emplace(llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>(
+ llvm::vfs::createPhysicalFileSystem().release()));
+ }
+
template <class... T> static void Initialize(T &&...t) {
lldbassert(!InstanceImpl() && "Already initialized.");
InstanceImpl().emplace(std::forward<T>(t)...);
@@ -206,6 +212,7 @@ class FileSystem {
private:
static std::optional<FileSystem> &InstanceImpl();
+ static std::optional<FileSystem> &InstancePerThread();
llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> m_fs;
std::unique_ptr<TildeExpressionResolver> m_tilde_resolver;
std::string m_home_directory;
diff --git a/lldb/source/Host/common/FileSystem.cpp b/lldb/source/Host/common/FileSystem.cpp
index 5153a0a9ec513..cb76086616d6b 100644
--- a/lldb/source/Host/common/FileSystem.cpp
+++ b/lldb/source/Host/common/FileSystem.cpp
@@ -49,7 +49,15 @@ void FileSystem::Terminate() {
InstanceImpl().reset();
}
+std::optional<FileSystem> &FileSystem::InstancePerThread() {
+ static thread_local std::optional<FileSystem> t_fs;
+ return t_fs;
+}
+
std::optional<FileSystem> &FileSystem::InstanceImpl() {
+ std::optional<FileSystem> &fs = InstancePerThread();
+ if (fs)
+ return fs;
static std::optional<FileSystem> g_fs;
return g_fs;
}
diff --git a/lldb/source/Host/posix/PipePosix.cpp b/lldb/source/Host/posix/PipePosix.cpp
index f35c348990df6..1aa02efe86610 100644
--- a/lldb/source/Host/posix/PipePosix.cpp
+++ b/lldb/source/Host/posix/PipePosix.cpp
@@ -324,6 +324,18 @@ Status PipePosix::ReadWithTimeout(void *buf, size_t size,
bytes_read += result;
if (bytes_read == size || result == 0)
break;
+
+ // This is the workaround for the following bug in Linux multithreading
+ // select() https://bugzilla.kernel.org/show_bug.cgi?id=546
+ // ReadWithTimeout() with a non-zero timeout is used only to
+ // read the port number from the gdbserver pipe
+ // in GDBRemoteCommunication::StartDebugserverProcess().
+ // The port number may be "1024\0".."65535\0".
+ if (timeout.count() > 0 && size == 6 && bytes_read == 5 &&
+ static_cast<char *>(buf)[4] == '\0') {
+ break;
+ }
+
} else if (errno == EINTR) {
continue;
} else {
diff --git a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerCommon.cpp b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerCommon.cpp
index f9d37490e16ae..cef836e001adf 100644
--- a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerCommon.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerCommon.cpp
@@ -646,7 +646,9 @@ GDBRemoteCommunicationServerCommon::Handle_vFile_Size(
packet.GetHexByteString(path);
if (!path.empty()) {
uint64_t Size;
- if (llvm::sys::fs::file_size(path, Size))
+ FileSpec file_spec(path);
+ FileSystem::Instance().Resolve(file_spec);
+ if (llvm::sys::fs::file_size(file_spec.GetPath(), Size))
return SendErrorResponse(5);
StreamString response;
response.PutChar('F');
@@ -725,7 +727,9 @@ GDBRemoteCommunicationServerCommon::Handle_vFile_unlink(
packet.SetFilePos(::strlen("vFile:unlink:"));
std::string path;
packet.GetHexByteString(path);
- Status error(llvm::sys::fs::remove(path));
+ FileSpec file_spec(path);
+ FileSystem::Instance().Resolve(file_spec);
+ Status error(llvm::sys::fs::remove(file_spec.GetPath()));
StreamString response;
response.Printf("F%x,%x", error.GetError(), error.GetError());
return SendPacketNoLock(response.GetString());
@@ -744,6 +748,13 @@ GDBRemoteCommunicationServerCommon::Handle_qPlatform_shell(
// uint32_t timeout = packet.GetHexMaxU32(false, 32);
if (packet.GetChar() == ',')
packet.GetHexByteString(working_dir);
+ else {
+ auto cwd = FileSystem::Instance()
+ .GetVirtualFileSystem()
+ ->getCurrentWorkingDirectory();
+ if (cwd)
+ working_dir = *cwd;
+ }
int status, signo;
std::string output;
FileSpec working_spec(working_dir);
diff --git a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp
index 65f1cc12ba307..6e3b7b4a351e0 100644
--- a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp
@@ -18,13 +18,13 @@
#include <sstream>
#include <thread>
-#include "llvm/Support/FileSystem.h"
#include "llvm/Support/JSON.h"
#include "llvm/Support/Threading.h"
#include "lldb/Host/Config.h"
#include "lldb/Host/ConnectionFileDescriptor.h"
#include "lldb/Host/FileAction.h"
+#include "lldb/Host/FileSystem.h"
#include "lldb/Host/Host.h"
#include "lldb/Host/HostInfo.h"
#include "lldb/Interpreter/CommandCompletions.h"
@@ -44,8 +44,17 @@ using namespace lldb;
using namespace lldb_private::process_gdb_remote;
using namespace lldb_private;
+// Copy assignment operator to avoid copying m_mutex
+GDBRemoteCommunicationServerPlatform::PortMap &
+GDBRemoteCommunicationServerPlatform::PortMap::operator=(
+ const GDBRemoteCommunicationServerPlatform::PortMap &o) {
+ m_port_map = std::move(o.m_port_map);
+ return *this;
+}
+
GDBRemoteCommunicationServerPlatform::PortMap::PortMap(uint16_t min_port,
- uint16_t max_port) {
+ uint16_t max_port)
+ : m_mutex() {
assert(min_port);
for (; min_port < max_port; ++min_port)
m_port_map[min_port] = LLDB_INVALID_PROCESS_ID;
@@ -54,11 +63,13 @@ GDBRemoteCommunicationServerPlatform::PortMap::PortMap(uint16_t min_port,
void GDBRemoteCommunicationServerPlatform::PortMap::AllowPort(uint16_t port) {
assert(port);
// Do not modify existing mappings
+ std::lock_guard<std::mutex> guard(m_mutex);
m_port_map.insert({port, LLDB_INVALID_PROCESS_ID});
}
llvm::Expected<uint16_t>
GDBRemoteCommunicationServerPlatform::PortMap::GetNextAvailablePort() {
+ std::lock_guard<std::mutex> guard(m_mutex);
if (m_port_map.empty())
return 0; // Bind to port zero and get a port, we didn't have any
// limitations
@@ -75,6 +86,7 @@ GDBRemoteCommunicationServerPlatform::PortMap::GetNextAvailablePort() {
bool GDBRemoteCommunicationServerPlatform::PortMap::AssociatePortWithProcess(
uint16_t port, lldb::pid_t pid) {
+ std::lock_guard<std::mutex> guard(m_mutex);
auto pos = m_port_map.find(port);
if (pos != m_port_map.end()) {
pos->second = pid;
@@ -84,6 +96,7 @@ bool GDBRemoteCommunicationServerPlatform::PortMap::AssociatePortWithProcess(
}
bool GDBRemoteCommunicationServerPlatform::PortMap::FreePort(uint16_t port) {
+ std::lock_guard<std::mutex> guard(m_mutex);
std::map<uint16_t, lldb::pid_t>::iterator pos = m_port_map.find(port);
if (pos != m_port_map.end()) {
pos->second = LLDB_INVALID_PROCESS_ID;
@@ -94,6 +107,7 @@ bool GDBRemoteCommunicationServerPlatform::PortMap::FreePort(uint16_t port) {
bool GDBRemoteCommunicationServerPlatform::PortMap::FreePortForProcess(
lldb::pid_t pid) {
+ std::lock_guard<std::mutex> guard(m_mutex);
if (!m_port_map.empty()) {
for (auto &pair : m_port_map) {
if (pair.second == pid) {
@@ -106,15 +120,22 @@ bool GDBRemoteCommunicationServerPlatform::PortMap::FreePortForProcess(
}
bool GDBRemoteCommunicationServerPlatform::PortMap::empty() const {
+ std::lock_guard<std::mutex> guard(m_mutex);
return m_port_map.empty();
}
+GDBRemoteCommunicationServerPlatform::PortMap
+ GDBRemoteCommunicationServerPlatform::g_port_map;
+std::set<lldb::pid_t> GDBRemoteCommunicationServerPlatform::g_spawned_pids;
+std::mutex GDBRemoteCommunicationServerPlatform::g_spawned_pids_mutex;
+
// GDBRemoteCommunicationServerPlatform constructor
GDBRemoteCommunicationServerPlatform::GDBRemoteCommunicationServerPlatform(
- const Socket::SocketProtocol socket_protocol, const char *socket_scheme)
- : GDBRemoteCommunicationServerCommon(),
- m_socket_protocol(socket_protocol), m_socket_scheme(socket_scheme),
- m_spawned_pids_mutex(), m_port_map(), m_port_offset(0) {
+ const Socket::SocketProtocol socket_protocol, const char *socket_scheme,
+ const lldb_private::Args &args, uint16_t port_offset)
+ : GDBRemoteCommunicationServerCommon(), m_socket_protocol(socket_protocol),
+ m_socket_scheme(socket_scheme), m_inferior_arguments(args),
+ m_port_offset(port_offset) {
m_pending_gdb_server.pid = LLDB_INVALID_PROCESS_ID;
m_pending_gdb_server.port = 0;
@@ -159,11 +180,72 @@ GDBRemoteCommunicationServerPlatform::GDBRemoteCommunicationServerPlatform(
GDBRemoteCommunicationServerPlatform::~GDBRemoteCommunicationServerPlatform() =
default;
+lldb::thread_result_t GDBRemoteCommunicationServerPlatform::ThreadProc() {
+ // We need a virtual working directory per thread.
+ FileSystem::InitializePerThread();
+
+ Log *log = GetLog(LLDBLog::Platform);
+
+ if (IsConnected()) {
+ LLDB_LOGF(log,
+ "GDBRemoteCommunicationServerPlatform::%s() "
+ "Thread started...",
+ __FUNCTION__);
+
+ if (m_inferior_arguments.GetArgumentCount() > 0) {
+ lldb::pid_t pid = LLDB_INVALID_PROCESS_ID;
+ std::optional<uint16_t> port;
+ std::string socket_name;
+ Status error = LaunchGDBServer(m_inferior_arguments,
+ "", // hostname
+ pid, port, socket_name);
+ if (error.Success())
+ SetPendingGdbServer(pid, *port, socket_name);
+ }
+
+ bool interrupt = false;
+ bool done = false;
+ Status error;
+ while (!interrupt && !done) {
+ if (GetPacketAndSendResponse(std::nullopt, error, interrupt, done) !=
+ GDBRemoteCommunication::PacketResult::Success)
+ break;
+ }
+
+ if (error.Fail()) {
+ LLDB_LOGF(log,
+ "GDBRemoteCommunicationServerPlatform::%s() "
+ "GetPacketAndSendResponse: %s",
+ __FUNCTION__, error.AsCString());
+ }
+ }
+
+ LLDB_LOGF(log,
+ "GDBRemoteCommunicationServerPlatform::%s() "
+ "Disconnected. Killing child processes...",
+ __FUNCTION__);
+ for (lldb::pid_t pid : m_spawned_pids)
+ KillSpawnedProcess(pid);
+
+ // Do not wait for child processes. See comments in
+ // DebugserverProcessReaped() for details.
+
+ FileSystem::Terminate();
+
+ LLDB_LOGF(log,
+ "GDBRemoteCommunicationServerPlatform::%s() "
+ "Thread exited.",
+ __FUNCTION__);
+
+ delete this;
+ return {};
+}
+
Status GDBRemoteCommunicationServerPlatform::LaunchGDBServer(
const lldb_private::Args &args, std::string hostname, lldb::pid_t &pid,
std::optional<uint16_t> &port, std::string &socket_name) {
if (!port) {
- llvm::Expected<uint16_t> available_port = m_port_map.GetNextAvailablePort();
+ llvm::Expected<uint16_t> available_port = g_port_map.GetNextAvailablePort();
if (available_port)
port = *available_port;
else
@@ -181,23 +263,25 @@ Status GDBRemoteCommunicationServerPlatform::LaunchGDBServer(
if (hostname.empty())
hostname = "127.0.0.1";
- Log *log = GetLog(LLDBLog::Platform);
- LLDB_LOGF(log, "Launching debugserver with: %s:%u...", hostname.c_str(),
- *port);
+ auto cwd = FileSystem::Instance()
+ .GetVirtualFileSystem()
+ ->getCurrentWorkingDirectory();
+ if (cwd)
+ debugserver_launch_info.SetWorkingDirectory(FileSpec(*cwd));
// Do not run in a new session so that it can not linger after the platform
// closes.
debugserver_launch_info.SetLaunchInSeparateProcessGroup(false);
debugserver_launch_info.SetMonitorProcessCallback(
- std::bind(&GDBRemoteCommunicationServerPlatform::DebugserverProcessReaped,
- this, std::placeholders::_1));
+ &GDBRemoteCommunicationServerPlatform::DebugserverProcessReaped);
std::ostringstream url;
// debugserver does not accept the URL scheme prefix.
#if !defined(__APPLE__)
url << m_socket_scheme << "://";
#endif
- uint16_t *port_ptr = &*port;
+ uint16_t child_port = *port;
+ uint16_t *port_ptr = &child_port;
if (m_socket_protocol == Socket::ProtocolTcp) {
std::string platform_uri = GetConnection()->GetURI();
std::optional<URI> parsed_uri = URI::Parse(platform_uri);
@@ -208,19 +292,44 @@ Status GDBRemoteCommunicationServerPlatform::LaunchGDBServer(
port_ptr = nullptr;
}
+ Log *log = GetLog(LLDBLog::Platform);
+ LLDB_LOGF(log,
+ "GDBRemoteCommunicationServerPlatform::%s() "
+ "Host %s launching debugserver with: %s...",
+ __FUNCTION__, hostname.c_str(), url.str().c_str());
+
Status error = StartDebugserverProcess(
url.str().c_str(), nullptr, debugserver_launch_info, port_ptr, &args, -1);
pid = debugserver_launch_info.GetProcessID();
+
+ if (error.Success()) {
+ LLDB_LOGF(log,
+ "GDBRemoteCommunicationServerPlatform::%s() "
+ "debugserver launched successfully as pid %" PRIu64,
+ __FUNCTION__, pid);
+ } else {
+ LLDB_LOGF(log,
+ "GDBRemoteCommunicationServerPlatform::%s() "
+ "debugserver launch failed: %s",
+ __FUNCTION__, error.AsCString());
+ }
+
+ // TODO: Be sure gdbserver uses the requested port.
+ // assert(!port_ptr || *port == 0 || *port == child_port)
+ // Use only the original *port returned by GetNextAvailablePort()
+ // for AssociatePortWithProcess() or FreePort() below.
+
if (pid != LLDB_INVALID_PROCESS_ID) {
- std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
- m_spawned_pids.insert(pid);
+ AddSpawnedProcess(pid);
if (*port > 0)
- m_port_map.AssociatePortWithProcess(*port, pid);
+ g_port_map.AssociatePortWithProcess(*port, pid);
} else {
if (*port > 0)
- m_port_map.FreePort(*port);
+ g_port_map.FreePort(*port);
}
+ if (port_ptr)
+ *port = child_port;
return error;
}
@@ -230,10 +339,6 @@ GDBRemoteCommunicationServerPlatform::Handle_qLaunchGDBServer(
// Spawn a local debugserver as a platform so we can then attach or launch a
// process...
- Log *log = GetLog(LLDBLog::Platform);
- LLDB_LOGF(log, "GDBRemoteCommunicationServerPlatform::%s() called",
- __FUNCTION__);
-
ConnectionFileDescriptor file_conn;
std::string hostname;
packet.SetFilePos(::strlen("qLaunchGDBServer;"));
@@ -255,18 +360,9 @@ GDBRemoteCommunicationServerPlatform::Handle_qLaunchGDBServer(
Status error =
LaunchGDBServer(Args(), hostname, debugserver_pid, port, socket_name);
if (error.Fail()) {
- LLDB_LOGF(log,
- "GDBRemoteCommunicationServerPlatform::%s() debugserver "
- "launch failed: %s",
- __FUNCTION__, error.AsCString());
return SendErrorResponse(9);
}
- LLDB_LOGF(log,
- "GDBRemoteCommunicationServerPlatform::%s() debugserver "
- "launched successfully as pid %" PRIu64,
- __FUNCTION__, debugserver_pid);
-
StreamGDBRemote response;
assert(port);
response.Printf("pid:%" PRIu64 ";port:%u;", debugserver_pid,
@@ -317,28 +413,45 @@ GDBRemoteCommunicationServerPlatform::Handle_qKillSpawnedProcess(
lldb::pid_t pid = packet.GetU64(LLDB_INVALID_PROCESS_ID);
+ if (SpawnedProcessFinished(pid))
+ m_spawned_pids.erase(pid);
+
// verify that we know anything about this pid. Scope for locker
- {
- std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
- if (m_spawned_pids.find(pid) == m_spawned_pids.end()) {
- // not a pid we know about
- return SendErrorResponse(10);
- }
+ if ((m_spawned_pids.find(pid) == m_spawned_pids.end())) {
+ // not a pid we know about
+ return SendErrorResponse(10); // ECHILD
}
// go ahead and attempt to kill the spawned process
- if (KillSpawnedProcess(pid))
+ if (KillSpawnedProcess(pid)) {
+ m_spawned_pids.erase(pid);
return SendOKResponse();
- else
- return SendErrorResponse(11);
+ } else
+ return SendErrorResponse(11); // EDEADLK
+}
+
+void GDBRemoteCommunicationServerPlatform::AddSpawnedProcess(lldb::pid_t pid) {
+ std::lock_guard<std::mutex> guard(g_spawned_pids_mutex);
+
+ // If MonitorChildProcessThreadFunction() failed hope the system will not
+ // reuse pid of zombie processes.
+ // assert(g_spawned_pids.find(pid) == g_spawned_pids.end());
+
+ g_spawned_pids.insert(pid);
+ m_spawned_pids.insert(pid);
+}
+
+bool GDBRemoteCommunicationServerPlatform::SpawnedProcessFinished(
+ lldb::pid_t pid) {
+ std::lock_guard<std::mutex> guard(g_spawned_pids_mutex);
+ return (g_spawned_pids.find(pid) == g_spawned_pids.end());
}
bool GDBRemoteCommunicationServerPlatform::KillSpawnedProcess(lldb::pid_t pid) {
// make sure we know about this process
- {
- std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
- if (m_spawned_pids.find(pid) == m_spawned_pids.end())
- return false;
+ if (SpawnedProcessFinished(pid)) {
+ // it seems the process has been finished recently
+ return true;
}
// first try a SIGTERM (standard kill)
@@ -346,46 +459,30 @@ bool GDBRemoteCommunicationServerPlatform::KillSpawnedProcess(lldb::pid_t pid) {
// check if that worked
for (size_t i = 0; i < 10; ++i) {
- {
- std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
- if (m_spawned_pids.find(pid) == m_spawned_pids.end()) {
- // it is now killed
- return true;
- }
+ if (SpawnedProcessFinished(pid)) {
+ // it is now killed
+ return true;
}
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
- {
- std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
- if (m_spawned_pids.find(pid) == m_spawned_pids.end())
- return true;
- }
+ if (SpawnedProcessFinished(pid))
+ return true;
// the launched process still lives. Now try killing it again, this time
// with an unblockable signal.
Host::Kill(pid, SIGKILL);
for (size_t i = 0; i < 10; ++i) {
- {
- std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
- if (m_spawned_pids.find(pid) == m_spawned_pids.end()) {
- // it is now killed
- return true;
- }
+ if (SpawnedProcessFinished(pid)) {
+ // it is now killed
+ return true;
}
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
// check one more time after the final sleep
- {
- std::lock_guard<std::recursive_mutex> guard(m_spawned_pids_mutex);
- if (m_spawned_pids.find(pid) == m_spawned_pids.end())
- return true;
- }
-
- // no luck - the process still lives
- return false;
+ return SpawnedProcessFinished(pid);
}
GDBRemoteCommunication::PacketResult
@@ -442,12 +539,14 @@ GDBRemoteCommunication::PacketResult
GDBRemoteCommunicationServerPlatform::Handle_qGetWorkingDir(
StringExtractorGDBRemote &packet) {
- llvm::SmallString<64> cwd;
- if (std::error_code ec = llvm::sys::fs::current_path(cwd))
- return SendErrorResponse(ec.value());
+ auto cwd = FileSystem::Instance()
+ .GetVirtualFileSystem()
+ ->getCurrentWorkingDirectory();
+ if (!cwd)
+ return SendErrorResponse(cwd.getError());
StreamString response;
- response.PutBytesAsRawHex8(cwd.data(), cwd.size());
+ response.PutBytesAsRawHex8(cwd->data(), cwd->size());
return SendPacketNoLock(response.GetString());
}
@@ -458,7 +557,9 @@ GDBRemoteCommunicationServerPlatform::Handle_QSetWorkingDir(
std::string path;
packet.GetHexByteString(path);
- if (std::error_code ec = llvm::sys::fs::set_current_path(path))
+ if (std::error_code ec = FileSystem::Instance()
+ .GetVirtualFileSystem()
+ ...
[truncated]
Just edited the description to make the "fixes" links work automatically (https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword). Will review shortly.
I wanted to change this code to use threads for a long time, but the idea of using a per-thread virtual file system absolutely terrifies me. That only works if all of the accesses go through the file system (no direct syscalls) and if we're very strict about which code executes on which thread. Right now, I don't think we do either, and I don't see how we could ever achieve it. (That's not exactly true for the code in the lldb-server, but we also don't really have enforceable rules for which code can be run in the lldb-server, and the code this is changing affects all of lldb.)
(Also, I guess this is the reason for the ETXTBSY workaround)
One obvious and safe (though less performant and less satisfying) alternative is to replace the fork with spawning a new process (fork+execve) instead. Have you considered that?
// This is the workaround for the following bug in Linux multithreading
// select() https://bugzilla.kernel.org/show_bug.cgi?id=546
// ReadWithTimeout() with a non-zero timeout is used only to
// read the port number from the gdbserver pipe
// in GDBRemoteCommunication::StartDebugserverProcess().
// The port number may be "1024\0".."65535\0".
if (timeout.count() > 0 && size == 6 && bytes_read == 5 &&
    static_cast<char *>(buf)[4] == '\0') {
  break;
}
It sounds like the consensus on that bug is that this is the application's (i.e. our) fault. We should fix the issue instead.
I don't think so. Note that select() works fine with the pipe closed on the other side in a single thread; select() hangs when called simultaneously from multiple threads. I tried multiple simultaneous connections to lldb-server platform to launch lldb-server gdbserver. It worked 50/50 with 2 connections and failed 100% with 3+ connections. Instead of using select() I tried
- using poll()
- using read(size = 0)
- using a non-blocking pipe and calling read() without select() or poll()
- changing the pipe buffer size
Nothing helped. It is a bug in the kernel: read() hangs too if the pipe is closed on the other side, and a non-blocking read() returns EAGAIN instead of 0. The system just does not recognize the closed pipe in the multithreaded case.
So, I don't see a way to fix this on Linux. The only way is a workaround.
I'm sorry, but I find this extremely hard to believe. Pipes and the ability to read them until EOF are as old as UNIX. We're not doing anything special here. It's going to take a lot more than a reference to a 20-year-old REJECTED INVALID bug to convince me this is a kernel issue.
Also note that the situation mentioned in that bug is different from what (I think) you're describing here. In their case, the pipe is NOT closed on the other side. The pipe is closed on the side that's doing the selecting:
Consider a multithreaded program (pthreads). One thread reads from
a file descriptor (typically a TCP or UDP socket). At some time,
another thread decides to close(2) the file descriptor while the
first thread is blocked, waiting for input data.
In other words, this is your basic race in the application code, and it's the application's (our) responsibility to fix it. While I'm not a kernel maintainer, I think I have a pretty good idea why they said the application is buggy, and why they didn't want to fix it -- it's because fixing it probably will not make the application correct.
The problem there is that the application has calls (select (or friends) and close) on two threads with no synchronization between them. Now if select happens to run first, then it's not completely unreasonable to expect that close will terminate that select, and the kernel could in theory make sure it does that (apparently, some operating systems do just that). The problem is what happens if select does not run first. What if close does? In this case, select will return an error (as the bug reporter expects), but only if the FD hasn't been reused in the meantime. And since Linux always assigns the lowest FD available (I think POSIX mandates that), it's very likely that the very next operation (perhaps on a third thread) which creates an fd will get the same FD as we've just closed. If that happens, then the select will NOT return an error (the kernel has no way to know that it's referring to the old fd) and will happily start listening on the new fd. Since you usually aren't able to control all operations that could possibly create a new FD, this kind of pattern would be buggy except in extremely limited circumstances.
One way I can imagine this happening is if the FileSystem instance was local to a (That said, it still may be better to just spawn a new process instead. I don't think this is a particularly hot code path (I think the test suite is basically the only user of multithreaded platforms), and it will make sure we don't hit the ETXTBSY issue.)
We need the virtual working directory only for a few requests via the platform protocol; it is only necessary to resolve the path. Note I have added only 2 updates for that. All the rest of the code already uses lldb::FileSystem correctly.
I don't see the connection. See more details in #100659 discussion.
To fix all gdb port mapping issues we need a common port mapping available in all platform connection handlers. That is possible only with multithreading. And I don't see a way to delegate the accepted incoming connection to a spawned process on Windows.
GDBRemoteCommunicationServerPlatform extends GDBRemoteCommunicationServerCommon, GDBRemoteCommunication, etc. The working directory may be used for its own tasks (for example, loading/saving settings) and to handle requests with a relative path. Most such requests are handled in GDBRemoteCommunicationServerCommon. It seems everything works correctly since all tests passed.
How will the spawned process help? I think the only way to fix the ETXTBSY issue is to copy the executable to the target and launch it from the same process. It seems macOS uses the system's gdbserver instead of
I believe that the code that's necessary to run I certainly can't do it, and that's kind of my point: I think this is a bad abstraction. Outside of a very limited use case, it's impossible to reason about this and/or prove that introducing a thread-local cwd is safe and correct. We're often optimizing code by moving it from one thread to another, or farming it out to a thread pool, and we also have code that can run on several threads in different contexts. All of these things could cause "correct" code to suddenly stop working because completely unrelated code has been changed to run on a different thread. If this is really used "only for few requests via the platform protocol", then why not make the CWD a property of the platform object? (Either through a virtual filesystem, or just by having it as a string, and resolving things explicitly.)
Ack.
I think that using threads for the management of a common port resource is a very elegant solution. I did not realize you're trying to solve that, for which I apologise. That said, I think it's a stretch to say this is the only way to solve this issue. I can think of at least four other potential solutions (with different tradeoffs) right now. This is my favourite: Don't use port mappings (at least in the multi-process The way this would work is by letting the platform instance delegate/upgrade/convert the platform connection into a gdbserver one, something like this:
On non-darwin platform (darwin uses
Though I'm not a Windows expert, I'm pretty sure that's possible. Windows has the concept of inheritable
That may be the case, and if that's true, then I think the base class should have a filesystem parameter as well. Either way, we're imposing some sort of a requirement on the code around us. In one case it's "a path string needs to be resolved relative to a certain filesystem", in the other "a path string needs to be resolved on a specific thread". I think the first one is better because it's more explicit -- you know which code is prepared to handle "alternate" filesystems by seeing if it accepts a filesystem argument -- and more flexible -- you can pass paths across threads by making sure the filesystem travels along with them.
That would certainly help, but I don't think it's necessary. I've explained what I think the problem is on the other PR.
It is possible to store its own FileSystem object in the platform handler, but it requires updating 80% of GDBRemoteCommunicationServerCommon.cpp and implementing some behavior switch in inherited classes. I tried to minimize changes. I have added the new FileSystem::InitializePerThread(), which is used only in GDBRemoteCommunicationServerPlatform and its base classes in case of multithreading. All other code uses the same FileSystem; nothing changed. FileSystem::InitializePerThread() uses the CWD of the app, so the behavior for the thread is the same as for a forked child process. I don't see any other threads where FileSystem is used. Anyway, if FileSystem::InitializePerThread() was not called, any new thread uses the common app's FileSystem. It is safe.
That does not worry me. In fact, I would say that if all we need to update is GDBRemoteCommunicationServerCommon and its subclasses, then we're pretty good.
I realize all that, but I still don't think it's a good tradeoff -- complicating one low-level library (which is pretty complicated on its own) for the sake of one very specific user. I can see how someone might view it differently and you're welcome to find those people and get them on board with your approach. I just don't think I'm going to be one of them.
But debugserver on darwin will not support this feature. It will still require the port map or disabling the firewall.
I think it should already support that. If it doesn't, and it comes down to this, I volunteer to help you with making sure that works.
@labath It seems we do not need qSupported and qUpgradeToGdbConnection. We can run

Option 1:
We need a pipe for communication between the main

Option 2: And we need to update
I don't see any sockets here https://learn.microsoft.com/en-us/windows/win32/procthread/inheritance I'm not sure sockets may be inherited on Windows. But here is an example of how to share a socket with a child process using WSADuplicateSocket and memory-mapped files
I like how you're thinking.
An interesting idea. I think we could find a way to implement this on windows, but the thing I don't like here is that it limits us to a single in-flight connection (otherwise, we wouldn't be able to match up the incoming connections). If anything goes wrong with one connection, we have to wait 10 seconds (maybe more -- I'm not sure if 10s is enough of a timeout) to serve the next one.
I like this. It still means that the user has to forward two ports (and the gdb port needs to be forwarded to the same value), but at least it's not N ports, and we can get out of this port map business. I mean, I still like my idea more, but if you're going to be implementing this, I think you can choose how to go about it. One thing I realized now (and this is a common problem for all of the implementations proposed here) is that if we can't match gdb-server and platform connections, we can't rely on the gdb server inheriting things from the platform process. I think the only piece of information that matters here (maybe) is the CWD, but I think we can make sure that works either by resetting the gdb server cwd explicitly, or by using absolute paths.
Interesting. If that's what it takes, then ok -- I guess. However, I'd still try to check if it's possible to pass it directly, as that page doesn't say sockets can't be inherited either, and this page mentions the

BTW, I think I've figured out what would be the best solution to the leaked FD problem (I don't claim to have invented this, I've seen this implementation in several places already): create a "forkserver" process. Basically, we fork a new process at startup, and then use this process to perform any additional fork operations (which would be requested via IPC). Since we fork the process before opening any other FDs, it will not accidentally inherit any FDs except the ones we pass it explicitly. (This idea is mostly relevant for the multithreading implementation, but in principle, it could be interesting for the |
Ok. First, I will prepare a new patch to support

If socket sharing works well enough on Windows, I will prepare a second patch to listen on the single gdb port and provide the accepted connection to
See this article. BTW, here is a good question - what does a socket handle mean in Windows? I will try to use lldb::Pipe to provide the socket to the child process on Windows. |
…t on Windows `lldb-server --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Added also PipeWindows::WriteWithTimeout(), fixed PipeWindows::ReadWithTimeout() and missing initialization of m_read_overlapped.hEvent in the constructor PipeWindows(lldb::pipe_t read, lldb::pipe_t write). Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use --fd and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
…t on Windows `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Added also PipeWindows::WriteWithTimeout(), fixed PipeWindows::ReadWithTimeout() and missing initialization of m_read_overlapped.hEvent in the constructor PipeWindows(lldb::pipe_t read, lldb::pipe_t write). Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
…t on Windows `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Added also PipeWindows::WriteWithTimeout(), fixed PipeWindows::ReadWithTimeout() and missing initialization of m_read_overlapped.hEvent in the constructor PipeWindows(lldb::pipe_t read, lldb::pipe_t write). Fixes llvm#90923, fixes llvm#56346. Depends on llvm#101326. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
…t on Windows `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Depends on llvm#101383. Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
…t mapping Removed fork(). Used threads and the common thread-safe port map for all platform connections. Updated lldb::FileSystem to use llvm::vfs::createPhysicalFileSystem() with an own virtual working directory per thread. This patch depends on llvm#100659, llvm#100666. This patch fixes llvm#97537, llvm#90923, llvm#56346. lldb-server has been tested on Windows with 50 connections and 100 processes launched simultaneously. Tested also the cross build with Linux x86_64 host and Linux Aarch64 target.
e69a04e to 17916b6
`lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
#101283) `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Fixes #90923, fixes #56346. This is the part 1 of the replacement of #100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix #97537.
See the replacement #101283. |
llvm#101283) `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
llvm#101283) `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537. (cherry picked from commit 82ee31f)
The main purpose of this patch is to fix all issues related to the gdb port mapping in lldb-server. We need a common gdb port map which is available to all platform connection handlers. Currently `lldb-server platform` forks a child process to handle the incoming connection, and this child process cannot request a gdb port from the common map. The optimal solution is to make lldb-server multithreaded in the platform mode; lldb-server in the gdbserver mode remains the same.

This solution also brings a new advantage: `lldb-server platform --server` works on Windows now.

Note that multithreaded connection handlers require their own virtual working directory per thread. Most of the lldb-server code uses lldb::FileSystem and already resolves paths, so llvm::vfs::createPhysicalFileSystem() can be used as the lldb::FileSystem with minimal updates.
Fixes #97537, fixes #90923, fixes #56346, fixes #101475.
lldb-server has been tested on Windows with 50 connections and 100 processes launched simultaneously. A cross build with a Linux x86_64 host and a Linux AArch64 target (8 concurrent threads/connections) has also been tested.