Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include plugins in the architecture diagrams #36513

Merged

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Dec 31, 2023

When it comes to access and management plugins management follows different pattern than DAG files. While DAG files can (and should) be modified by DAG authors, the whole idea of Plugins was to make it only possible to modify plugins folder (and installl plugin-enabled packages) by the Deploymenet Managers, not DAG authors.

The difference is quite important because even in a simplest installation, airflow webserver never needs to access DAG files, while it should be able to access plugins.

This is even more profound in the environment (leading in the future to multi-tenant deployments) plugins are not 'per-tenant" - they must be installed and managed by deployment manager, because those plugins can be used by Airflow Webservers.

In the future we might want to make distinction between these two different types of plugins, because theorethically it would be possible to distingquish "scheduler, worker & triggerer" plugins from the "webserver" plugins - however we do not have such disctinction today and whoever manages plugins folder is impacting both webserver and workers.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Dec 31, 2023

This is part of preparation to the last stage of multi-tentant last AIP and documentation around it.

It was also partially result of the issue #36510 which shows that Airflow's architecture - especially when it comes to "which component accesses what" and also "who has access to what" is oftem misunderstood (for some reason - for example - Trieggerer in Composer is configured as a component that does not have access to DAG files. Which was not really how we intended it to work. Since we have not put Triggers in Plugins our intention was that Triggerer CAN use Tiriggers defined in the DAG files and we allow DAG Authors to develop new Triggers (unlike Timetables, Listeners, Extra Linnks - all of which can only come from installed packages and plugins.

I think adding visual representation of it and showing the difference between DAG files, Plugins and packages is important to avoid confusion on what rolese all these "entities" have.

Copy link
Contributor

@BasPH BasPH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Introducing a third role in the architecture diagrams looks too complex to me. Would stick with "DAG author" and "admin".
  • I feel all the different arrow styles require a legend. What does blue vs red and solid vs dashed indicate?
  • All the crossed dashed arrows make it difficult to follow the arrow. Would stick with solid lines.

docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the include-plugins-in-airflow-architecture-diagrams branch 2 times, most recently from fa447f6 to b78ac61 Compare December 31, 2023 14:25
@potiuk
Copy link
Member Author

potiuk commented Dec 31, 2023

I feel all the different arrow styles require a legend. What does blue vs red and solid vs dashed indicate?

Yep. Will add it. There is a good reasoning for that and deliberate style/coloring. I will describe it.

Copy link
Contributor

@BasPH BasPH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're changing the diagram, how about displaying a generic "database" icon instead of the PostgreSQL logo? PostgreSQL is the most used database but we do support others too.

@potiuk
Copy link
Member Author

potiuk commented Dec 31, 2023

All the crossed dashed arrows make it difficult to follow the arrow. Would stick with solid lines.

Yep. I will work a bit on those.

@potiuk potiuk force-pushed the include-plugins-in-airflow-architecture-diagrams branch 4 times, most recently from 3d07321 to b6a19fc Compare December 31, 2023 17:07
@jscheffl
Copy link
Contributor

While we're changing the diagram, how about displaying a generic "database" icon instead of the PostgreSQL logo? PostgreSQL is the most used database but we do support others too.

--> diagrams.programming.flowchart.Database ?

@potiuk potiuk force-pushed the include-plugins-in-airflow-architecture-diagrams branch from b6a19fc to 691ae3f Compare December 31, 2023 18:25
@potiuk
Copy link
Member Author

potiuk commented Dec 31, 2023

Should be quite a bit nicer (I will still need to figure out how to do anchoring a bit better but the icons should be nicer now.

@potiuk potiuk force-pushed the include-plugins-in-airflow-architecture-diagrams branch 2 times, most recently from e7354c5 to 53ff3e2 Compare December 31, 2023 19:31
@potiuk
Copy link
Member Author

potiuk commented Dec 31, 2023

Right I think I found a nice-enough layout.

Looks like "ortho" layout for edges is rather difficult for edge anchoring (even if Iike it most) - but with some graphviz tweaking seems that "spline" , "polyline", "line" layouts for "basic", "distributed", "dag processor" diagrams respectively combined wiht minlen for edges for better spacing produced much nicer and cleaner diagrams.

@potiuk potiuk force-pushed the include-plugins-in-airflow-architecture-diagrams branch from 53ff3e2 to 6599548 Compare December 31, 2023 19:51
@potiuk
Copy link
Member Author

potiuk commented Dec 31, 2023

And even a but nicer now :)

@potiuk potiuk force-pushed the include-plugins-in-airflow-architecture-diagrams branch 3 times, most recently from 269a6d2 to 6c691c6 Compare December 31, 2023 21:16
Copy link
Contributor

@BasPH BasPH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see several roles being mentioned, but don't think it's currently clear what those are (Deployment manager and Operations User). Such roles can have different names in different companies, so I don't think it's immediately clear what those mean.

Can we align/have a doc describing Airflow's vision on different roles in an organisation? That way we can link to it when mentioned in a doc, and the reader can map Airflow's role names to their own organisation.

docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
docs/apache-airflow/core-concepts/overview.rst Outdated Show resolved Hide resolved
@potiuk
Copy link
Member Author

potiuk commented Jan 2, 2024

I will take a closer look at other comments but one thing first:

I see several roles being mentioned, but don't think it's currently clear what those are (Deployment manager and Operations User). Such roles can have different names in different companies, so I don't think it's immediately clear what those mean.

Can we align/have a doc describing Airflow's vision on different roles in an organisation? That way we can link to it when mentioned in a doc, and the reader can map Airflow's role names to their own organisation.

We already have it. It's called "User types" but we shoud change it to User Role - and its rather comprehensively explained in our security model (which my change links to from the architecture) :

https://airflow.apache.org/docs/apache-airflow/stable/security/security_model.html#airflow-security-model-user-types

Also Deployment Manager's responsibilities are very well described in the installation document: https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html where Deployment Manager is not mentioned "per se" but in this page "you" when referred in the docs means "Deployment Manager" - which we should clarify.

The question is - should it be a separate page with just user roles, and link to it from all three ?

  • Security Model
  • Installation
  • Architecure Overview

Sounds like a new "User Roles" page in the "Administration and Deployment" ? WDYT?

@potiuk potiuk force-pushed the include-plugins-in-airflow-architecture-diagrams branch 2 times, most recently from cdae819 to 0dcddf1 Compare January 3, 2024 19:26
@potiuk
Copy link
Member Author

potiuk commented Jan 3, 2024

Applied the review comments @BasPH. I decided not to separate out the "roles" into separate document - but to briefly mention the roies and explicitly referring to security-model document for more details.

@potiuk potiuk force-pushed the include-plugins-in-airflow-architecture-diagrams branch from 0dcddf1 to 3af36f4 Compare January 3, 2024 19:39
@potiuk
Copy link
Member Author

potiuk commented Jan 3, 2024

I think with all those changes, iterations it starts to look reallly good and rather professional:

This:

Screenshot 2024-01-03 at 20 43 29

And this:

Screenshot 2024-01-03 at 20 34 39

@potiuk
Copy link
Member Author

potiuk commented Jan 3, 2024

And:

Screenshot 2024-01-03 at 20 45 41

When it comes to access and management plugins management follows
different pattern than DAG files. While DAG files can (and should) be
modified by DAG authors, the whole idea of Plugins was to make it only
possible to modify plugins folder (and installl plugin-enabled packages)
by the Deploymenet Managers, not DAG authors.

The difference is quite important because even in a simplest
installation, airflow webserver never needs to access DAG files, while
it should be able to access plugins.

This is even more profound in the environment (leading in the future to
multi-tenant deployments) plugins are not 'per-tenant" - they must be
installed and managed by deployment manager, because those plugins can
be used by Airflow Webservers.

In the future we might want to make distinction between these two
different types of plugins, because theorethically it would be possible
to distingquish "scheduler, worker & triggerer" plugins from the
"webserver" plugins - however we do not have such disctinction today and
whoever manages plugins folder is impacting both webserver and workers.

This change also re-adds the "basic" architecture which is targetted
as single-user and single machine deployment and presents it as the
first architecture that the user encounters - which might make it more
digestible, while it also explains tha this is a simplified architecture
and is followed by more complete and complex deployment scenarios
involving distributed architecture, different user roles and security
boundaries.
@potiuk potiuk force-pushed the include-plugins-in-airflow-architecture-diagrams branch from 3af36f4 to 5fffd25 Compare January 4, 2024 15:16
@potiuk
Copy link
Member Author

potiuk commented Jan 4, 2024

@BasPH -> do you think it's cool to go ?

@potiuk
Copy link
Member Author

potiuk commented Jan 5, 2024

Ok. Will merge it now - we can always fix things later.

@potiuk potiuk merged commit c47dcc5 into apache:main Jan 5, 2024
50 checks passed
@potiuk potiuk deleted the include-plugins-in-airflow-architecture-diagrams branch January 5, 2024 14:37
@ephraimbuddy ephraimbuddy added the type:doc-only Changelog: Doc Only label Jan 10, 2024
ephraimbuddy pushed a commit that referenced this pull request Jan 11, 2024
#36513)

When it comes to access and management plugins management follows
different pattern than DAG files. While DAG files can (and should) be
modified by DAG authors, the whole idea of Plugins was to make it only
possible to modify plugins folder (and installl plugin-enabled packages)
by the Deploymenet Managers, not DAG authors.

The difference is quite important because even in a simplest
installation, airflow webserver never needs to access DAG files, while
it should be able to access plugins.

This is even more profound in the environment (leading in the future to
multi-tenant deployments) plugins are not 'per-tenant" - they must be
installed and managed by deployment manager, because those plugins can
be used by Airflow Webservers.

In the future we might want to make distinction between these two
different types of plugins, because theorethically it would be possible
to distingquish "scheduler, worker & triggerer" plugins from the
"webserver" plugins - however we do not have such disctinction today and
whoever manages plugins folder is impacting both webserver and workers.

This change also re-adds the "basic" architecture which is targetted
as single-user and single machine deployment and presents it as the
first architecture that the user encounters - which might make it more
digestible, while it also explains tha this is a simplified architecture
and is followed by more complete and complex deployment scenarios
involving distributed architecture, different user roles and security
boundaries.

(cherry picked from commit c47dcc5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants