-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include plugins in the architecture diagrams #36513
Include plugins in the architecture diagrams #36513
Conversation
This is part of preparation to the last stage of multi-tentant last AIP and documentation around it. It was also partially result of the issue #36510 which shows that Airflow's architecture - especially when it comes to "which component accesses what" and also "who has access to what" is oftem misunderstood (for some reason - for example - Trieggerer in Composer is configured as a component that does not have access to DAG files. Which was not really how we intended it to work. Since we have not put Triggers in Plugins our intention was that Triggerer CAN use Tiriggers defined in the DAG files and we allow DAG Authors to develop new Triggers (unlike Timetables, Listeners, Extra Linnks - all of which can only come from installed packages and plugins. I think adding visual representation of it and showing the difference between DAG files, Plugins and packages is important to avoid confusion on what rolese all these "entities" have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Introducing a third role in the architecture diagrams looks too complex to me. Would stick with "DAG author" and "admin".
- I feel all the different arrow styles require a legend. What does blue vs red and solid vs dashed indicate?
- All the crossed dashed arrows make it difficult to follow the arrow. Would stick with solid lines.
docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.py
Outdated
Show resolved
Hide resolved
fa447f6
to
b78ac61
Compare
Yep. Will add it. There is a good reasoning for that and deliberate style/coloring. I will describe it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're changing the diagram, how about displaying a generic "database" icon instead of the PostgreSQL logo? PostgreSQL is the most used database but we do support others too.
Yep. I will work a bit on those. |
3d07321
to
b6a19fc
Compare
--> diagrams.programming.flowchart.Database ? |
b6a19fc
to
691ae3f
Compare
Should be quite a bit nicer (I will still need to figure out how to do anchoring a bit better but the icons should be nicer now. |
e7354c5
to
53ff3e2
Compare
Right I think I found a nice-enough layout. Looks like "ortho" layout for edges is rather difficult for edge anchoring (even if Iike it most) - but with some graphviz tweaking seems that "spline" , "polyline", "line" layouts for "basic", "distributed", "dag processor" diagrams respectively combined wiht minlen for edges for better spacing produced much nicer and cleaner diagrams. |
53ff3e2
to
6599548
Compare
And even a but nicer now :) |
269a6d2
to
6c691c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see several roles being mentioned, but don't think it's currently clear what those are (Deployment manager and Operations User). Such roles can have different names in different companies, so I don't think it's immediately clear what those mean.
Can we align/have a doc describing Airflow's vision on different roles in an organisation? That way we can link to it when mentioned in a doc, and the reader can map Airflow's role names to their own organisation.
I will take a closer look at other comments but one thing first:
We already have it. It's called "User types" but we shoud change it to User Role - and its rather comprehensively explained in our security model (which my change links to from the architecture) : Also Deployment Manager's responsibilities are very well described in the installation document: https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html where Deployment Manager is not mentioned "per se" but in this page "you" when referred in the docs means "Deployment Manager" - which we should clarify. The question is - should it be a separate page with just user roles, and link to it from all three ?
Sounds like a new "User Roles" page in the "Administration and Deployment" ? WDYT? |
cdae819
to
0dcddf1
Compare
Applied the review comments @BasPH. I decided not to separate out the "roles" into separate document - but to briefly mention the roies and explicitly referring to security-model document for more details. |
0dcddf1
to
3af36f4
Compare
When it comes to access and management plugins management follows different pattern than DAG files. While DAG files can (and should) be modified by DAG authors, the whole idea of Plugins was to make it only possible to modify plugins folder (and installl plugin-enabled packages) by the Deploymenet Managers, not DAG authors. The difference is quite important because even in a simplest installation, airflow webserver never needs to access DAG files, while it should be able to access plugins. This is even more profound in the environment (leading in the future to multi-tenant deployments) plugins are not 'per-tenant" - they must be installed and managed by deployment manager, because those plugins can be used by Airflow Webservers. In the future we might want to make distinction between these two different types of plugins, because theorethically it would be possible to distingquish "scheduler, worker & triggerer" plugins from the "webserver" plugins - however we do not have such disctinction today and whoever manages plugins folder is impacting both webserver and workers. This change also re-adds the "basic" architecture which is targetted as single-user and single machine deployment and presents it as the first architecture that the user encounters - which might make it more digestible, while it also explains tha this is a simplified architecture and is followed by more complete and complex deployment scenarios involving distributed architecture, different user roles and security boundaries.
3af36f4
to
5fffd25
Compare
@BasPH -> do you think it's cool to go ? |
Ok. Will merge it now - we can always fix things later. |
#36513) When it comes to access and management plugins management follows different pattern than DAG files. While DAG files can (and should) be modified by DAG authors, the whole idea of Plugins was to make it only possible to modify plugins folder (and installl plugin-enabled packages) by the Deploymenet Managers, not DAG authors. The difference is quite important because even in a simplest installation, airflow webserver never needs to access DAG files, while it should be able to access plugins. This is even more profound in the environment (leading in the future to multi-tenant deployments) plugins are not 'per-tenant" - they must be installed and managed by deployment manager, because those plugins can be used by Airflow Webservers. In the future we might want to make distinction between these two different types of plugins, because theorethically it would be possible to distingquish "scheduler, worker & triggerer" plugins from the "webserver" plugins - however we do not have such disctinction today and whoever manages plugins folder is impacting both webserver and workers. This change also re-adds the "basic" architecture which is targetted as single-user and single machine deployment and presents it as the first architecture that the user encounters - which might make it more digestible, while it also explains tha this is a simplified architecture and is followed by more complete and complex deployment scenarios involving distributed architecture, different user roles and security boundaries. (cherry picked from commit c47dcc5)
When it comes to access and management plugins management follows different pattern than DAG files. While DAG files can (and should) be modified by DAG authors, the whole idea of Plugins was to make it only possible to modify plugins folder (and installl plugin-enabled packages) by the Deploymenet Managers, not DAG authors.
The difference is quite important because even in a simplest installation, airflow webserver never needs to access DAG files, while it should be able to access plugins.
This is even more profound in the environment (leading in the future to multi-tenant deployments) plugins are not 'per-tenant" - they must be installed and managed by deployment manager, because those plugins can be used by Airflow Webservers.
In the future we might want to make distinction between these two different types of plugins, because theorethically it would be possible to distingquish "scheduler, worker & triggerer" plugins from the "webserver" plugins - however we do not have such disctinction today and whoever manages plugins folder is impacting both webserver and workers.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.