You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is an issue that'll collect all potential technical issues that removing twisted will encounter, and try to provide suitable solutions for those.
Ever since I've seen synapse, I've poked at trying to replace its asynchronous execution framework, twisted, with the standard asyncio library.
However, i've gotten to realise that replacing twisted is a much more involved task.
Specifically, twisted covers four main problem domains;
Asynchronous Execution
Deferred, mainly, but also by providing IO primitives and such.
Web Frameworks
twisted.web; Resources, routing, servelets, etc.
Database Interaction
twisted.enterprise.dbapi provides a wrapper around any DB-APIv2-compatible database driver, making sure any blocking calls are executed on a threadpool, while returning an awaitable.
Integrated Testing
Twisted's trial, working through a large body of tests under tests/
I was initially mainly focused on the first one, with asyncio replacing twisted's asynchronous primitives. This was a naive approach, though now that I know better, I'll write down my findings on the rest.
I'll first explain each domain, how twisted handles it, and how an asyncio-equivalent could replace it.
Asynchronous Execution
Twisted provides Deferred, which provides callback-based asynchronous support.
It has been sugared over the years to program the same as asyncio's (and python 3.4+'s) coroutine, first via @inlineCallbacks (which provided a way to yield a Deferred to the executor, to resume processing once that deferred has finished, de-callback-ing the function signature, very much like await), and then ultimately also supporting async/await natively.
This all does not do away with Deferred's callback-based nature, and a bunch of hacks exist in the synapse codebase to deal with this in a 'sane' manner.
Enter asyncio, which has three async primitives;
coroutine, a language-internal object, essentially a more opinionated generator.
asyncio.Future, a low-level callback-based async primitive, mainly used for IO-based operations, cross-thread execution, and other black magic.
asyncio.Task, an object wrapping any awaitable, this will run on the loop until completion. Any other async thread can await this object at any time to yield until the result is ready. (Even after the task has finished, then it returns immediately).
As a rule of thumb, Future is a low-level plumbing object, while Task is used to drive coroutines forward concurrently (and can be used to keep track of these threads of execution).
All are awaitable, which is to say that await is able to be used on them while running in an event loop, and that loop will handle that object.
Specifically, this addresses these major uses of twisted in synapse;
Reactor => Event Loop
IDelayedCall => Task + asyncio.sleep() or loop.call_later
LoopingCall => Task + while True:
Failure => simple exception raising
defer.gatherResults => asyncio.gather
Web Frameworks
Twisted provides an extensive web framework, based around servelets, Resources which can respond to HTTP Requests (which synapse subclasses with SynapseRequest, to add additional details.)
This is by far the most extensive and integrated bit of twisted, which made it somewhat hard to find a suitable replacement.
By far, python has a ton of microservice frameworks, notably all similarly derived from the same idea that flask brings; annotation-based path routing.
I found two libraries which somewhat could fit the niche that synapse requires;
Tornado
Tornado provides a somewhat similar interface like twisted.web, though in my opinion, it has some problems, such as a stagnated release cycle (the last release was 2 years ago), and some crusted API which hasn't fully reconciled with their migration to asyncio.
This makes me recommend tornado less for a look at the following;
aiohttp
aiohttp provides a modest server module with a ton of features.
Most notably, while also adopting the annotation-based routing that flask does, aiohttp does two things which i think help it adapt to synapse's usecase;
First, it allows "Views", these look very similar to applets;
Returning this object and collating it all into a generic add_routes on the app router could replace register_servlets.
Possibly, things like nested applications can logically separate large parts of the codebase, such as the admin apis, these could then be re-added as follows; app.add_subapp('/_synapse/admin/', admin).
aiohttp is a plenty old library, but it also finds a niche between simple and involved, I think it'll be maintained for a very long time.
Database Interaction
This is a bit of a tricky one, the main way where synapse touches twisted here is through twisted.enterprise.dbapi, which provides a DB-APIv2 wrapper around blocking calls, automatically placing this on a threadpool.
I've tried to find a replacement, but ultimately came up short, I think that there are a few reasons for this;
First, DB-API was created to harmonise the python ecosystem around database adapters, making it possible to create twisted.enterprise.dbapi in the first place. This has largely been obsoleted by asyncio, as it provides IO primitives instead.
An asynchronous version of DB-API was never created, while I remember encountering a draft somewhere, I can't find it again. Regardless, nothing of the sort exists in the PEP.
Large databases nowadays are abstracted behind ORMs, which provide more pythonic and friendlier interfaces, instead of directly interfacing with the databases themselves.
Regardless, .dbapi ultimately only provides a thin wrapper around the DB-API actual, scheduling blocking calls on threads.
asyncio largely obsoletes this, by instead inlining the IO into the current event loop.
While this might look like a performance hit on first glance, i think that it might actually become a slight improvement, as asyncio would be able to cooperatively advance execution on every single connection at once every loop, and an alternate IO-optimised loop (such as uvloop) could bring even further improvements.
psycopg3provides an asyncio interface, which will largely mirror its existing interface, only with async def in front of most operations.
For sqlite, aiosqlite exists, scheduling all calls onto a single thread, allowing the event loop to continue while it churns away.
Integrated Testing
This is the bit i'm not entirely sure sure about yet, excluding the costs of rewriting a large amount of twisted-dominated test code with a pytest-esc equivalent (or the likes), synapse make heavy use of pump, basically time-travelling the reactor to get more results quicker.
I could not find a good equivalent in asyncio, while both asynctest and aiomock exist, both are not maintained.
This is still a somewhat open area of research, though the ecosystem is rich.
Motivation
Of course, all of this is not actionable without any motivation, "why switch from twisted?". The main reasons for staying would be; Stability, Developer Experience, Framework Integration (twisted.web).
All of these are extremely valid reasons, and I'm not rejecting them, this issue is more a gathering of "what if", or "what would it take" to switch to asyncio.
Though still, I think there are some good reasons to switch to asyncio;
Modern Ecosystem
Around the time of python 3.4, asyncio was poised to become the standard-library async framework, while many more existed at the time (Twisted, Tornado, gevent, etc.) concurrently.
This all happened around 2016, asyncio was the new kid on the block, twisted was the established framework, I wasn't around in 2016 to observe this, but I'd bet that for synapse, this choice was very obvious back then. (It also had an integrated asynchronous web framework, which i believe wasn't exactly a given back then)
Now, the situation has (in some definition) turned on its head, where asyncio is the prominent library in the ecosystem, and twisted is more and more becoming an exotic framework in comparison.
This is fine for synapse, but because of this, it cannot take advantage of modern libraries and speedups, such as uvloop (a libuv-based asyncio reactor, see also matrix-org/synapse#10366)
And don't forget to mention the large amount of up-to-date and maintained libraries which can rely on asyncio's large userbase (through python), versus the few libraries which still support twisted through some combination of factors. I don't have exact details, but i believe quite a fair bit of helper libraries have been rejected because they do not have twisted support, only provide asyncio support, and/or are blocking. This could change that.
Future-proofing
Synapse is relatively stable, its now at a stage where it provides a good matrix experience to a sizable portion of the ecosystem.
However, with servers like dendrite and conduit each aiming to take a slice of synapse's share, this may change, and with its relative slowness to compiled code, synapse may become somewhat of a legacy product for matrix.
Twisted does not help with this, as it increasingly becomes alien to many new python developers, who have only known asyncio. This may also introduce bugs, as community contributors could rely on asyncio-like behaviour that twisted does not provide, which adds maintenance overhead.
It also doesn't attract community developers to really work with synapse, as most of its codebase is intricately intertwined with twisted's framework, which can be a daunting task to learn, work with, interact with, and then subsequently submit patches for. For asyncio, they could possibly already have pre-existing knowledge about its behaviour, and certain libraries.
Ultimately, I think that asyncio will outlast twisted, due to it being co-developed with python, and twisted being a very old library with many alternatives.
Unseen benefits
Furthermore, integrating with asyncio might bring unseen benefits.
The first one that i know of is speed, asyncio has some "C-accelerated" parts (such as Future, Task, and some other primitives.) which could already provide a speed boost compared to twisted.
This can be extended with uvloop, which (as described above) provides a libuv-based event loop optimised for heavy IO handling.
Secondly, with pyo3-asyncio, it is possible to write extensions in Rust that accelerate synapse considerably, and are exposed to python as normal awaitables. These would run on a separate rust event loop thread, providing actual access to multithreading. (Whereas the GIL prohibits a lot of concurrent execution, even with DB adapters often enough).
Acknowledging the maintenance cost
Of course, this all isn't as simple as the technical reasoning above makes it sound; there is a huge up-front investment and maintenance cost associated with this, first with the rewriting effort itself, then with adjusting to the new framework(s), and lastly maintainer retraining/teaching/learning, etc.
Only in the long run do the results really pay off, where synapse can become simpler to maintain and easier to be contributed to by community members, and potentially find new maintainers for.
The question (of course) for matrix-org (or element, who currently are synapse's maintainers) is if this is worth it.
Some reasoning for this could be visions for the future of synapse, a personal one (and one that I've had since i saw the project) would be that synapse is a testbed for MSCs and other experiments, as python trades strong type guarantees and speed for developer happiness.
As Synapse is today, with it being a rather centre piece of "Matrix", with it still being the largest server, forgoing stability for flexibility and future-proofing like this wouldn't make much sense, but this situation/cost-benefit calculation might (or could) change, and that's what this issue is for.
The text was updated successfully, but these errors were encountered:
Conversation on this went stale on the previous repo and wasn't continued here. I'm totally new to Synapse, so I really have no good idea of priorities and the working dynamic of the maintainers, but this issue is very detailed and I resonate quite a lot with it. Twisted is very exotic nowadays and asyncio is where the entire ecosystem (pretty much) is developing, with lots of high quality libraries, such as aiohttp, asyncpg and uvloop.
Since the last discussion was about 2 years ago, I wanted to spark it again and see if priorities have shifted and if the cost of such a rewrite is more/less beneficial and interesting now.
This issue has been migrated from #11777.
This is an issue that'll collect all potential technical issues that removing
twisted
will encounter, and try to provide suitable solutions for those.Ever since I've seen
synapse
, I've poked at trying to replace its asynchronous execution framework,twisted
, with the standardasyncio
library.However, i've gotten to realise that replacing
twisted
is a much more involved task.Specifically, twisted covers four main problem domains;
Deferred
, mainly, but also by providing IO primitives and such.twisted.web
; Resources, routing, servelets, etc.twisted.enterprise.dbapi
provides a wrapper around any DB-APIv2-compatible database driver, making sure any blocking calls are executed on a threadpool, while returning an awaitable.tests/
I was initially mainly focused on the first one, with asyncio replacing twisted's asynchronous primitives. This was a naive approach, though now that I know better, I'll write down my findings on the rest.
I'll first explain each domain, how twisted handles it, and how an asyncio-equivalent could replace it.
Asynchronous Execution
Twisted provides
Deferred
, which provides callback-based asynchronous support.It has been sugared over the years to program the same as asyncio's (and python 3.4+'s)
coroutine
, first via@inlineCallbacks
(which provided a way toyield
aDeferred
to the executor, to resume processing once that deferred has finished, de-callback-ing the function signature, very much likeawait
), and then ultimately also supportingasync
/await
natively.This all does not do away with
Deferred
's callback-based nature, and a bunch of hacks exist in the synapse codebase to deal with this in a 'sane' manner.Enter
asyncio
, which has three async primitives;coroutine
, a language-internal object, essentially a more opinionated generator.asyncio.Future
, a low-level callback-based async primitive, mainly used for IO-based operations, cross-thread execution, and other black magic.asyncio.Task
, an object wrapping any awaitable, this will run on the loop until completion. Any other async thread can await this object at any time to yield until the result is ready. (Even after the task has finished, then it returns immediately).As a rule of thumb,
Future
is a low-level plumbing object, whileTask
is used to drive coroutines forward concurrently (and can be used to keep track of these threads of execution).All are awaitable, which is to say that
await
is able to be used on them while running in an event loop, and that loop will handle that object.Specifically, this addresses these major uses of twisted in synapse;
Reactor
=> Event LoopIDelayedCall
=>Task
+asyncio.sleep()
orloop.call_later
LoopingCall
=>Task
+while True:
Failure
=> simple exception raisingdefer.gatherResults
=>asyncio.gather
Web Frameworks
Twisted provides an extensive web framework, based around servelets,
Resources
which can respond to HTTPRequest
s (which synapse subclasses withSynapseRequest
, to add additional details.)This is by far the most extensive and integrated bit of twisted, which made it somewhat hard to find a suitable replacement.
By far, python has a ton of microservice frameworks, notably all similarly derived from the same idea that flask brings; annotation-based path routing.
I found two libraries which somewhat could fit the niche that synapse requires;
Tornado
Tornado provides a somewhat similar interface like
twisted.web
, though in my opinion, it has some problems, such as a stagnated release cycle (the last release was 2 years ago), and some crusted API which hasn't fully reconciled with their migration to asyncio.This makes me recommend tornado less for a look at the following;
aiohttp
aiohttp
provides a modest server module with a ton of features.Most notably, while also adopting the annotation-based routing that flask does, aiohttp does two things which i think help it adapt to synapse's usecase;
First, it allows "Views", these look very similar to applets;
Second, it allows
RouteTableDef
, which is an object that collects routes, using annotations which self-document the route it is getting.This can also be applied to views;
Returning this object and collating it all into a generic
add_routes
on the app router could replaceregister_servlets
.Possibly, things like nested applications can logically separate large parts of the codebase, such as the admin apis, these could then be re-added as follows;
app.add_subapp('/_synapse/admin/', admin)
.aiohttp
is a plenty old library, but it also finds a niche between simple and involved, I think it'll be maintained for a very long time.Database Interaction
This is a bit of a tricky one, the main way where synapse touches twisted here is through
twisted.enterprise.dbapi
, which provides a DB-APIv2 wrapper around blocking calls, automatically placing this on a threadpool.I've tried to find a replacement, but ultimately came up short, I think that there are a few reasons for this;
twisted.enterprise.dbapi
in the first place. This has largely been obsoleted by asyncio, as it provides IO primitives instead.Regardless,
.dbapi
ultimately only provides a thin wrapper around the DB-API actual, scheduling blocking calls on threads.asyncio
largely obsoletes this, by instead inlining the IO into the current event loop.While this might look like a performance hit on first glance, i think that it might actually become a slight improvement, as asyncio would be able to cooperatively advance execution on every single connection at once every loop, and an alternate IO-optimised loop (such as
uvloop
) could bring even further improvements.psycopg3
provides an asyncio interface, which will largely mirror its existing interface, only withasync def
in front of most operations.For sqlite, aiosqlite exists, scheduling all calls onto a single thread, allowing the event loop to continue while it churns away.
Integrated Testing
This is the bit i'm not entirely sure sure about yet, excluding the costs of rewriting a large amount of twisted-dominated test code with a
pytest
-esc equivalent (or the likes), synapse make heavy use ofpump
, basically time-travelling the reactor to get more results quicker.I could not find a good equivalent in asyncio, while both
asynctest
andaiomock
exist, both are not maintained.This is still a somewhat open area of research, though the ecosystem is rich.
Motivation
Of course, all of this is not actionable without any motivation, "why switch from twisted?". The main reasons for staying would be; Stability, Developer Experience, Framework Integration (
twisted.web
).All of these are extremely valid reasons, and I'm not rejecting them, this issue is more a gathering of "what if", or "what would it take" to switch to asyncio.
Though still, I think there are some good reasons to switch to asyncio;
Modern Ecosystem
Around the time of python 3.4,
asyncio
was poised to become the standard-library async framework, while many more existed at the time (Twisted, Tornado, gevent, etc.) concurrently.This all happened around 2016,
asyncio
was the new kid on the block,twisted
was the established framework, I wasn't around in 2016 to observe this, but I'd bet that for synapse, this choice was very obvious back then. (It also had an integrated asynchronous web framework, which i believe wasn't exactly a given back then)Now, the situation has (in some definition) turned on its head, where
asyncio
is the prominent library in the ecosystem, andtwisted
is more and more becoming an exotic framework in comparison.This is fine for synapse, but because of this, it cannot take advantage of modern libraries and speedups, such as
uvloop
(a libuv-based asyncio reactor, see also matrix-org/synapse#10366)And don't forget to mention the large amount of up-to-date and maintained libraries which can rely on asyncio's large userbase (through python), versus the few libraries which still support twisted through some combination of factors. I don't have exact details, but i believe quite a fair bit of helper libraries have been rejected because they do not have twisted support, only provide asyncio support, and/or are blocking. This could change that.
Future-proofing
Synapse is relatively stable, its now at a stage where it provides a good matrix experience to a sizable portion of the ecosystem.
However, with servers like dendrite and conduit each aiming to take a slice of synapse's share, this may change, and with its relative slowness to compiled code, synapse may become somewhat of a legacy product for matrix.
Twisted does not help with this, as it increasingly becomes alien to many new python developers, who have only known
asyncio
. This may also introduce bugs, as community contributors could rely on asyncio-like behaviour that twisted does not provide, which adds maintenance overhead.It also doesn't attract community developers to really work with synapse, as most of its codebase is intricately intertwined with twisted's framework, which can be a daunting task to learn, work with, interact with, and then subsequently submit patches for. For asyncio, they could possibly already have pre-existing knowledge about its behaviour, and certain libraries.
Ultimately, I think that asyncio will outlast twisted, due to it being co-developed with python, and twisted being a very old library with many alternatives.
Unseen benefits
Furthermore, integrating with asyncio might bring unseen benefits.
The first one that i know of is speed,
asyncio
has some "C-accelerated" parts (such asFuture
,Task
, and some other primitives.) which could already provide a speed boost compared to twisted.This can be extended with
uvloop
, which (as described above) provides alibuv
-based event loop optimised for heavy IO handling.Secondly, with
pyo3-asyncio
, it is possible to write extensions in Rust that accelerate synapse considerably, and are exposed to python as normal awaitables. These would run on a separate rust event loop thread, providing actual access to multithreading. (Whereas the GIL prohibits a lot of concurrent execution, even with DB adapters often enough).Acknowledging the maintenance cost
Of course, this all isn't as simple as the technical reasoning above makes it sound; there is a huge up-front investment and maintenance cost associated with this, first with the rewriting effort itself, then with adjusting to the new framework(s), and lastly maintainer retraining/teaching/learning, etc.
Only in the long run do the results really pay off, where synapse can become simpler to maintain and easier to be contributed to by community members, and potentially find new maintainers for.
The question (of course) for matrix-org (or element, who currently are synapse's maintainers) is if this is worth it.
Some reasoning for this could be visions for the future of synapse, a personal one (and one that I've had since i saw the project) would be that synapse is a testbed for MSCs and other experiments, as python trades strong type guarantees and speed for developer happiness.
As Synapse is today, with it being a rather centre piece of "Matrix", with it still being the largest server, forgoing stability for flexibility and future-proofing like this wouldn't make much sense, but this situation/cost-benefit calculation might (or could) change, and that's what this issue is for.
The text was updated successfully, but these errors were encountered: