AIP-47 - Migrate apache pig DAGs to new design #22439 #24212

Merged
Merged 1 commit on Jun 5, 2022


chethanuk
Contributor

No description provided.

@potiuk merged commit 0046f12 into apache:main on Jun 5, 2022
socar-humphrey added a commit to socar-inc/airflow that referenced this pull request on Jun 10, 2022
* Doc: Add column names for DB Migration Reference (apache#23853)

Before the automation: https://airflow.apache.org/docs/apache-airflow/2.2.5/migrations-ref.html
Currently (with missing column names): https://airflow.apache.org/docs/apache-airflow/2.3.0/migrations-ref.html

* Fix exception trying to display moved table warnings (apache#23837)

If you still have an old dangling table from the 2.2 migration, this
would fail. Make it more resilient and cope with both styles of moved
table name.

* Update sample dag and doc for RDS (apache#23651)

* Fix DataprocJobBaseOperator not being compatible with dotted names (apache#23439). (apache#23791)

* job_name parameter is now sanitized, replacing dots by underscores.

* Upgrade `pip` to 22.1.1 version (just released) (apache#23854)

* Add better feedback to Breeze users about expected action timing (apache#23827)

There are a few actions in Breeze that can take more or less time
when invoked. This mostly happens when you need to upgrade Breeze or
update to the latest version of the image because some dependencies
were added or the image was modified.

While we have significantly reduced the waiting time involved
(and caching problems have been fixed to make it as fast as
possible), there are still a few situations where you need
good connectivity and a little time to run the upgrade. This
is often not something you want to spend time on when you need
to do things fast.

Usually Breeze does not force the user to perform such long
actions - it allows them to continue without doing them (either by
timeout or by letting the user answer "no" to the question asked).

Previously Breeze did not inform the user about the expected
time of running such an operation, but with this change it tells
them what the expected delay is - allowing the user to make an
informed decision whether they want to run the upgrade or not.

* Fix UnboundLocalError when sql is empty list in DbApiHook (apache#23816)

* Fix UnboundLocalError when sql is empty list in DatabricksSqlHook (apache#23815)

* Add number of node params only for single-node cluster in RedshiftCreateClusterOperator (apache#23839)

* Sql to gcs with exclude columns (apache#23695)

* Add support for associating custom tags to job runs submitted via EmrContainerOperator (apache#23769)

Co-authored-by: Sandeep Kadyan <[email protected]>

* Add Deferrable Databricks operators (apache#19736)

* Fix Amazon EKS example DAG raises warning during Imports (apache#23849)


Co-authored-by: eladkal <[email protected]>

* Fix databricks tests (apache#23856)

* Add __wrapped__ property to _TaskDecorator (apache#23830)

Co-authored-by: Sanjay Pillai <sanjaypillai11 [at] gmail.com>

* Highlight task states by hovering on legend row (apache#23678)

* Rework the legend row and add the hover effect.

* Move horevedTaskState to state and fix merge conflicts.

* Add tests.

* Order of items in the LegendRow; add no_status support

* Clean up f-strings in logging calls (apache#23597)

* update K8S-KIND to 0.14.0 (apache#23859)

* Replaced all days_ago functions with datetime functions (apache#23237)

Co-authored-by: Dev232001 <[email protected]>

* Add clear DagRun endpoint. (apache#23451)

* Ignore the DeprecationWarning in test_days_ago (apache#23875)

Co-authored-by: alexkru <[email protected]>

* Speed up Breeze experience on Mac OS (apache#23866)

This change should significantly speed up the Breeze experience (and
especially iterating over a change in Breeze) for macOS users,
independently of whether you are using x86 or ARM architecture.

The problem with Docker on macOS is the particularly slow filesystem
used to map sources from the host to the Docker VM. It is particularly
bad when many small files are involved.

The improvements come from two areas:
* removing duplicate pycache cleaning
* moving MyPy cache to docker volume

When entering Breeze we are - just in case - cleaning .pyc and
__pycache__ files potentially generated outside of the Docker
container. This is particularly useful if you use a local IDE
and do not have bytecode generation disabled (we have it
disabled in Breeze). Generated Python bytecode might lead to
various problems when you are switching branches and Python
versions, so for Breeze development, where the files change
often anyway, disabling it and removing the files when they are
found is important. This happens when entering Breeze and it might
take a second or two, depending on whether you have locally
generated bytecode.

It could happen that the init script was called twice (depending on
which script was called), therefore the time could be double the one
that was actually needed. Also, if you ever generated provider
packages, the time could be much longer, because node_modules
generated in provider sources were not excluded from searching
(and on macOS that takes a LOT of time).
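The cleanup described above can be sketched in Python as follows. This is only an illustration: the actual Breeze code is a shell script, and the function name and the default set of excluded directories are assumptions. The key point is pruning `node_modules` from the directory walk so it is never searched at all.

```python
import os
import shutil


def clean_bytecode(root, excluded_dirs=("node_modules", ".git")):
    """Remove __pycache__ directories and stray .pyc files under root.

    Hypothetical helper: directories listed in excluded_dirs are pruned
    from the walk, so large trees such as node_modules are never searched.
    Returns the number of entries removed.
    """
    removed = 0
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Modifying dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        if "__pycache__" in dirnames:
            shutil.rmtree(os.path.join(dirpath, "__pycache__"))
            dirnames.remove("__pycache__")  # already deleted, do not walk into it
            removed += 1
        for name in filenames:
            if name.endswith(".pyc"):
                os.remove(os.path.join(dirpath, name))
                removed += 1
    return removed
```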

This also doubled the time of exit, as the initialization code
installed traps that were also run twice. The traps, however, were
rather fast, so they had no negative influence on performance.

The change adds a guard so that initialization is only ever executed
once.
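A Python analogue of such a run-once guard might look like this (the real change is in Breeze's shell scripts; the environment variable name below is hypothetical). Setting the flag before running the steps also protects against the init code being re-entered while it is still running, and child processes inherit the flag through the environment.

```python
import os

# Hypothetical guard variable name, for illustration only.
INIT_GUARD_VAR = "SKETCH_BREEZE_INIT_DONE"


def initialize_once(init_steps):
    """Run init_steps() only the first time; return True if it ran."""
    if os.environ.get(INIT_GUARD_VAR) == "true":
        return False  # already initialized: skip cleanup and trap installation
    os.environ[INIT_GUARD_VAR] = "true"
    init_steps()
    return True
```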

The second part of the change is moving the mypy cache to a Docker
volume rather than using it from the local source folder (the default
when complete sources are mounted). We were already using selective
mounts to make sure macOS filesystem slowness affects us in a minimal
way - but with this change, the cache will be stored in a Docker
volume that does not suffer from the same problems as mounting
volumes from the host. The Docker volume is preserved until the
`docker stop` command is run - which means that iterating over
a change should be WAY faster now: the observed speed-up was around
5x for the MyPy pre-commit.

* Add default task retry delay config (apache#23861)

* Move MappedOperator tests to mirror code location (apache#23884)

At some point during the development of AIP-42 we moved the code for
MappedOperator out of baseoperator.py to mappedoperator.py, but we
didn't move the tests at the same time.

* Enable clicking on DAG owner in autocomplete dropdown (apache#23804)

PR#18991 introduced directly navigating to a DAG when selecting one
from the typeahead search results. Unfortunately, the search results
also include DAG owner names, and selecting one of those navigates to
a DAG with that name, which almost certainly doesn't exist.

This extends the autocompletion endpoint to return the type of result,
and adjusts the typeahead selection to use this to know which way to
navigate.
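A minimal sketch of the idea, assuming a simple result shape: each autocomplete result carries a `type`, so the client can decide where to navigate. The field names, function names, and URLs here are hypothetical and are not Airflow's actual endpoint contract.

```python
from typing import Dict, List


def autocomplete(query: str, dag_ids: List[str], owners: List[str]) -> List[Dict[str, str]]:
    """Return matching DAG ids and owners, each tagged with its result type."""
    q = query.lower()
    results = [{"name": d, "type": "dag"} for d in dag_ids if q in d.lower()]
    results += [{"name": o, "type": "owner"} for o in sorted(set(owners)) if q in o.lower()]
    return results


def target_url(result: Dict[str, str]) -> str:
    """Navigate to the DAG itself, or to a filtered DAG list for an owner."""
    if result["type"] == "dag":
        return f"/dags/{result['name']}/grid"
    return f"/home?search={result['name']}"
```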

* Document LocalKubernetesExecutor support in chart (apache#23876)

* Avoid extra questions in `breeze build image` command. (apache#23898)

Fixes: apache#23867

* Update INTHEWILD.md (apache#23892)

* Split contributor's quick start into separate guides. (apache#23762)

The foldable parts were not good: they broke links and were not
very discoverable.

Fixes: apache#23174

* Avoid printing exception when exiting tests command (apache#23897)

Fixes: apache#23868

* Move string arg evals to `execute()` in `EksCreateClusterOperator` (apache#23877)

Currently there are string-value evaluations of `compute`, `nodegroup_role_arn`, and `fargate_pod_execution_role_arn` args in the constructor of `EksCreateClusterOperator`. These args are all listed as template fields, so it's entirely possible that the value(s) passed in to the operator are a Jinja expression or an `XComArg`. Either of these value types could cause a false-negative `ValueError` (in the case of unsupported `compute` values) or a false-positive (in the cases of explicit checks for the *arn values), since the values themselves have not been rendered.

This PR moves the evaluations of these args to the `execute()` scope.
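The pattern can be illustrated with a simplified, hypothetical stand-in class (not the real `EksCreateClusterOperator`; the set of supported `compute` values is an assumption). The constructor only stores raw values, and validation happens in `execute()`, after rendering. Rendering is simulated in the usage below by assigning the attributes directly.

```python
class SketchEksCreateClusterOperator:
    """Hypothetical, simplified stand-in for an operator with templated args."""

    template_fields = ("compute", "nodegroup_role_arn")
    SUPPORTED_COMPUTE = (None, "nodegroup", "fargate")  # assumed values

    def __init__(self, compute=None, nodegroup_role_arn=None):
        # No validation here: these may still be unrendered Jinja strings
        # such as "{{ params.compute }}" or XComArg placeholders.
        self.compute = compute
        self.nodegroup_role_arn = nodegroup_role_arn

    def execute(self, context=None):
        # Template fields have been rendered by the time execute() runs,
        # so the checks below see the real values.
        if self.compute not in self.SUPPORTED_COMPUTE:
            raise ValueError(f"Unsupported compute type: {self.compute!r}")
        if self.compute == "nodegroup" and not self.nodegroup_role_arn:
            raise ValueError("nodegroup_role_arn is required for compute='nodegroup'")
        return self.compute
```

Constructing the operator with `compute="{{ params.compute }}"` no longer raises; only the rendered value is checked.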

* Update .readthedocs.yml (apache#23903)

Use a string instead of an int; see https://docs.readthedocs.io/en/stable/config-file/v2.html

* Make --file command in static-checks autocomplete file name (apache#23896)

The --verbose and --dry-run flags caused the --files command to fail,
and the flag was "artificial" - it was equivalent to a bool flag;
the actual files were taken from arguments.

This PR fixes it by turning the arguments into multiple ``--file``
options - each with its own completion for local files.
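A repeatable option can be sketched with the standard library's `argparse` (Breeze uses its own CLI framework, so this is only an analogy; the `prog` name is hypothetical). Each `--file` occurrence appends one path, so every path is attached to its own option, and boolean flags such as `--verbose` and `--dry-run` no longer interfere with the file list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="static-checks-sketch")
    # action="append" collects one value per --file occurrence into a list.
    parser.add_argument("--file", dest="files", action="append", default=[])
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser
```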

* Chart: Update default airflow version to `2.3.1` (apache#23913)

* Fix Breeze documentation typo (apache#23919)

* Update environments documentation links (apache#23920)

* `2.3.1` has been released (apache#23912)

* Make CI and PROD image builds consistent (apache#23841)

Simple refactoring to make the jobs more consistent.

* Alphabetizes two tables (apache#23923)

The rest of the page has consistently alphabetized tables. This commit fixes three `extras` that were not alphabetized.

* Use "remote" pod when patching KPO pod as "checked" (apache#23676)

When patching as "checked", we have to use the current version of the pod otherwise we may get an error when trying to patch it, e.g.:

```
Operation cannot be fulfilled on pods \"test-kubernetes-pod-db9eedb7885c40099dd40cd4edc62415\": the object has been modified; please apply your changes to the latest version and try again"
```

This error would not cause a failure of the task, since errors in `cleanup` are suppressed. However, it would fail to patch.

I believe one scenario in which the pod may be updated is when retrieving XCom, since the sidecar is terminated after extracting the value.

Concerning some changes in the tests regarding the "already_checked" label: it was added to a few "expected pods" recently, when we changed it to patch even in the case of a successful pod.

Since we are changing the "patch" code to patch with the latest read of the pod that we have (i.e. using the `remote_pod` variable), and no longer the pod object stored on `k.pod`, the label no longer shows up in those tests (that's because `k.pod` isn't actually a read of the remote pod; it just happens to get mutated in the patch function before it is used to actually patch the pod).

Further, since `remote_pod` is a local variable, we can't observe it in tests. So we have to read the pod using the k8s API. _But_ our "find pod" function excludes "already checked" pods! So we have to make this configurable.

So, now we have a proper integration test for the "already_checked" behavior (there was already a unit test).

* Clarify manual merging of PR in release doc (apache#23928)

It was not clear to me what this really means

* Fix broken main (apache#23940)

main breaks with:

```
Traceback:
  /usr/local/lib/python3.7/importlib/__init__.py:127: in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
  tests/providers/amazon/aws/hooks/test_cloud_formation.py:31: in <module>
      class TestCloudFormationHook(unittest.TestCase):
  tests/providers/amazon/aws/hooks/test_cloud_formation.py:67: in TestCloudFormationHook
      @mock_cloudformation
  /usr/local/lib/python3.7/site-packages/moto/__init__.py:30: in f
      module = importlib.import_module(module_name, "moto")
  /usr/local/lib/python3.7/importlib/__init__.py:127: in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
  /usr/local/lib/python3.7/site-packages/moto/cloudformation/__init__.py:1: in <module>
      from .models import cloudformation_backends
  /usr/local/lib/python3.7/site-packages/moto/cloudformation/models.py:18: in <module>
      from .parsing import ResourceMap, OutputMap
  /usr/local/lib/python3.7/site-packages/moto/cloudformation/parsing.py:17: in <module>
      from moto.apigateway import models  # noqa  # pylint: disable=all
  /usr/local/lib/python3.7/site-packages/moto/apigateway/__init__.py:1: in <module>
      from .models import apigateway_backends
  /usr/local/lib/python3.7/site-packages/moto/apigateway/models.py:9: in <module>
      from openapi_spec_validator import validate_spec
  E   ModuleNotFoundError: No module named 'openapi_spec_validator'
```

The fix is already in place in moto (getmoto/moto#5165), but version 3.1.11 has not been released yet.

* Update INSTALL_PROVIDERS_FROM_SOURCES instructions. (apache#23938)

* Add typing to Azure Cosmos Client Hook (apache#23941)

New release of Azure Cosmos library has added typing information
and it broke main builds with mypy verification.

* Remove redundant register exit signals in `dag-processor` command (apache#23886)

* Disable rebase workflow (apache#23943)

The change of the release workflow in apache#23928 removed the reason
why we needed the rebase workflow. We only needed to rebase when we
merged the test branch into the stable branch, and since we are now
doing it manually, there is no more reason to have it in the
GitHub UI.

* Prevent UI from crashing if grid task instances are null (apache#23939)

* UI fix for null task instances

* improve tests without global vars

* fix test data

* Grid fix details button truncated and small UI tweaks (apache#23934)

* Show details button and wrap on LegendRow.

* Update following Brent's review

* Fix display on small width

* Rotate icon for a 'ReadLess' effect

* Fix and speed up grid view (apache#23947)

This fetches all TIs for a given task across dag runs, leading to
significantly faster response times. It also fixes a bug where Nones
were being passed to the UI when a new task was added to a DAG with
existing runs.

* Removes duplicate code block (apache#23952)

There are two code blocks with identical text in the helm-chart docs. This commit removes one of them.

* Update dep for databricks apache#23917 (apache#23927)

* Use '--subdir' argument value for standalone dag processor. (apache#23864)

* Revert "Add limit for JPype1 (apache#23847)" (apache#23953)

This turned out to be a mistake in manual submission. Fixed
on the JPype1 side.

This reverts commit 3699be4.

* Faster grid view (apache#23951)

* Disallow calling expand with no arguments (apache#23463)

* [FEATURE] KPO use K8S hook (apache#22086)

* Add cascade to `dag_tag` to `dag` foreignkey (apache#23444)

Bulk delete does not work if the cascade behaviour of a foreign key
is set on the Python side (relationship configuration). To allow bulk
deletes of DAGs, we need to set up cascade deletion in the DB.

The warning on query.delete at
https://docs.sqlalchemy.org/en/14/orm/session_basics.html#selecting-a-synchronization-strategy
stated that:

The operations do not offer in-Python cascading of relationships - it is assumed that ON UPDATE CASCADE and/or ON DELETE CASCADE is configured for any foreign key references which require it, otherwise the database may emit an integrity violation if foreign key references are being enforced.

Another alternative is avoiding bulk delete of dags but I prefer we support bulk deletes.

This will break offline SQL generation for MSSQL (already broken before now :) ). Also, since there's only one foreign key
in the `dag_tag` table, I assume that the foreign key would be named `dag_tag_ibfk_1` in MySQL. This
avoids having to query the DB for the name.

The foreign key is now explicitly named, which will make future upgrades easier.
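The SQLAlchemy warning quoted above can be demonstrated with the standard library's `sqlite3` module (used instead of SQLAlchemy to keep the sketch self-contained; the table layout is simplified and the constraint name `dag_tag_dag_id_fkey` is hypothetical). A bulk `DELETE` loads no ORM objects, so only a database-level `ON DELETE CASCADE` removes the dependent `dag_tag` rows; without it, the delete violates the foreign key.

```python
import sqlite3


def bulk_delete_dags(cascade: bool) -> int:
    """Bulk-delete all DAGs and return how many dag_tag rows remain."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
        on_delete = "ON DELETE CASCADE" if cascade else ""
        conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
        conn.execute(
            f"""
            CREATE TABLE dag_tag (
                name TEXT NOT NULL,
                dag_id TEXT NOT NULL,
                PRIMARY KEY (name, dag_id),
                CONSTRAINT dag_tag_dag_id_fkey FOREIGN KEY (dag_id)
                    REFERENCES dag (dag_id) {on_delete}
            )
            """
        )
        conn.executemany("INSERT INTO dag VALUES (?)", [("etl",), ("report",)])
        conn.executemany(
            "INSERT INTO dag_tag VALUES (?, ?)",
            [("daily", "etl"), ("finance", "etl"), ("daily", "report")],
        )
        # A bulk DELETE: no Python-side relationship cascade can run here.
        conn.execute("DELETE FROM dag")
        return conn.execute("SELECT COUNT(*) FROM dag_tag").fetchone()[0]
    finally:
        conn.close()
```

With `cascade=True` all tag rows are gone after the delete; with `cascade=False` the `DELETE` raises `sqlite3.IntegrityError`.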

* DagFileProcessorManager: Start a new process group only if current process not a session leader (apache#23872)

* Introduce `flake8-implicit-str-concat` plugin to static checks (apache#23873)

* Fix UnboundLocalError when sql is empty list in ExasolHook (apache#23812)

* Fix inverted section levels in best-practices.rst (apache#23968)

This PR fixes inverted levels in the sections added to the "Best Practices" document in apache#21879.

* Add support to specify language name in PapermillOperator (apache#23916)

* Add support to specify language name in PapermillOperator

* Replace getattr() with simple attribute access

* [23945] Icons in grid view for different dag types (apache#23970)

* Helm logo no longer a link (apache#23977)

* Fix links in documentation (apache#23975)

* fix links
* added right link to breeze

* Add TaskInstance State 'REMOVED' to finished states and success states (apache#23797)

Now that we support dynamic task mapping, we should treat the 'REMOVED'
state of task instances as a finished state, because
for dynamic tasks with a removed task instance, the dagrun would be stuck in
the running state if 'REMOVED' is not in the finished states.
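A simplified illustration of why the run gets stuck (the state names mirror the text, but the sets and the helper below are hypothetical and far smaller than Airflow's real state definitions). If a run may leave "running" only when every task instance is in a finished state, a single REMOVED instance blocks it until REMOVED joins that set.

```python
from enum import Enum


class TIState(str, Enum):
    """Simplified task-instance states (subset, for illustration only)."""

    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"
    REMOVED = "removed"


OLD_FINISHED = frozenset({TIState.SUCCESS, TIState.FAILED, TIState.SKIPPED})
NEW_FINISHED = OLD_FINISHED | {TIState.REMOVED}


def dag_run_finished(ti_states, finished_states) -> bool:
    """A run can leave 'running' only when every TI is in a finished state."""
    return all(state in finished_states for state in ti_states)
```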

* Remove `xcom_push` from `DockerOperator` (apache#23981)

* Fix missing shorthand for docker buildx rm -f (apache#23984)

Latest version of buildx removed -f as shorthand for --force flag.

* use explicit --mount with types of mounts rather than --volume flags (apache#23982)

The --volume flag is an old style of specifying mounts used by docker,
the newer and more explicit version is --mount where you have to
specify type, source, destination in the form of key/value pairs.

This is more explicit and avoids some guesswork when volumes are
mounted (for example, it seems that on WSL2 a volume name might be
wrongly interpreted as a path). The change explicitly specifies which
of the mounts are bind mounts and which are volume mounts.

Another nice side effect of this change is that when the source is
missing, Docker will not automatically create directories with the
missing name; it will fail instead. This is nicer because previously
it led to directories being created when they were missing (for
example .bash_aliases and similar). This allows us to avoid some
cleanups that accounted for those files being created - instead we
simply skip those mounts if the file/folder does not exist.
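Building explicit `--mount` flags and skipping missing bind sources could be sketched like this (the helper name and the container paths in the usage are hypothetical; the `type=...,source=...,target=...` syntax is Docker's standard `--mount` form).

```python
import os
from typing import List, Tuple


def mount_args(mounts: List[Tuple[str, str, str]]) -> List[str]:
    """Build explicit --mount flags from (type, source, target) triples.

    Bind mounts whose source does not exist are skipped instead of letting
    Docker fail (with --mount) or create an empty directory (with -v).
    """
    args: List[str] = []
    for mount_type, source, target in mounts:
        if mount_type == "bind" and not os.path.exists(source):
            continue
        args += ["--mount", f"type={mount_type},source={source},target={target}"]
    return args
```

Named volumes are passed through unchanged, since Docker creates them on demand.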

* Force colors in yarn test output in CI (apache#23986)

* Fix breeze failures when there is no buildx installed on Mac (apache#23988)

If you have no buildx plugin installed on Mac (for example when
you use colima instead of Docker Desktop), the breeze check was
failing - but buildx is in fact not needed to run typical breeze
commands, and breeze already has support for it - it was just
wrongly handled.

* Replace generation of docker volumes to be done from python (apache#23985)

The pre-commit to generate docker volumes in docker compose
file is now written in Python and it also uses the newer "volume:"
syntax to define the volumes mounted in the docker-compose.

* Replace `use_task_execution_date` with `use_task_logical_date` (apache#23983)

* Replace `use_task_execution_date` with `use_task_logical_date`
We have some operators/sensors that use `*_execution_date` as class parameters. This PR deprecates the usage of these parameters and replaces them with `logical_date`.
There is no change in functionality; under the hood the functionality already uses `logical_date`. This is just about the parameter names as exposed to users.
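A typical deprecation path for such a rename can be sketched as follows. The class is hypothetical and not one of the affected Airflow operators. The old keyword is still accepted, emits a `DeprecationWarning`, and maps onto the new name, so behaviour does not change.

```python
import warnings


class SketchDateSensor:
    """Hypothetical sensor showing a parameter rename with a deprecation path."""

    def __init__(self, use_task_logical_date=False, use_task_execution_date=None):
        if use_task_execution_date is not None:
            warnings.warn(
                "Parameter 'use_task_execution_date' is deprecated; use 'use_task_logical_date'.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Same behaviour as before: only the public parameter name changes.
            use_task_logical_date = use_task_execution_date
        self.use_task_logical_date = use_task_logical_date
```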

* Remove pinning for xmltodict (apache#23992)

We now have moto 3.1.9+ in constraints, so we should remove the limit.

Fixes: apache#23576

* Remove fixing cncf.kubernetes provider when generating constraints (apache#23994)

When we yanked the cncf.kubernetes provider, we temporarily pinned
3.1.2 for provider generation. This removes the pinning, as
we are already at version 4.0.2.

* Add better diagnostics capabilities for pre-commits run via CI image (apache#23980)

The pre-commits that require the CI image run a docker command under
the hood that is highly optimized for performance (it only mounts
files that need to be mounted), in order to improve
performance on macOS and make sure that artifacts are not left
in the Airflow source code.

However, that makes the command slightly more difficult to debug,
because the scripts dynamically generate the docker command used,
including the volumes that should be mounted when the docker
command is run.

This PR adds better diagnostics to the pre-commit scripts,
allowing VERBOSE="true" and DRY_RUN="true" variables that can
help with diagnosing problems such as running the scripts on
WSL2.

It also fixes a few documentation bugs that were missed
after changing the names of the image-related static checks, and
thanks to separating the common code into a utility function,
it allows setting the SKIP_IMAGE_PRE_COMMITS variable to true,
which will skip running all pre-commit checks that require the
breeze image to be available locally.

* Disable fail-fast on pushing images to docker cache (apache#24005)

There is an issue with pushing cache to the docker registry that
is connected to a containerd bug but has started to appear more
frequently recently (as evidenced for example by
https://github.community/t/buildx-failed-with-error-cannot-reuse-body-request-must-be-retried/253178
). The issue is still open in containerd:
containerd/containerd#5978.

Until it is fixed, we disable fail-fast on pushing cache,
so that even if it happens, we just have to re-run the single
Python version that actually failed. Currently there is a much
lower chance of success, because all 4 builds have to succeed.

* Add automated retries on retryable condition for building images in CI (apache#24006)

There is some flakiness in pushing cache images to ghcr.io, therefore
we want to add automated retries when pushing the images fails intermittently.

The root cause of the problem is tracked in containerd:
containerd/containerd#5978

* Ensure @contextmanager decorates generator func (apache#23103)

* Revert "Add automated retries on retryable condition for building images in CI (apache#24006)" (apache#24016)

This reverts commit 7cf0e43.

* Cleanup `BranchDayOfWeekOperator` example dag (apache#24007)

* Cleanup BranchDayOfWeekOperator example dag
There is no need for `dag=dag` when using context manager.

* Added missing project_id to the wait_for_job (apache#24020)

* Only run separate per-platform build when preparing build cache (apache#24023)

Apparently pushing multi-platform images when building cache on CI
has had some problems recently, connected with ghcr.io being more
vulnerable to the race condition described in this issue:

containerd/containerd#5978

Apparently when two layers for different platforms are pushed at about
the same time to ghcr.io, the error
"cannot reuse body, request must be retried" is generated.

However, we actually do not even need to build the multi-platform
latest images, because as of recently we have a separate cache for each
platform, and the ghcr.io/:latest images are not used any more,
not even for docker builds. We always build images rather than
pull them, and we use --from-cache for that - specific per platform.
The only image pulling we do is when we pull the :COMMIT_HASH images
in CI - but those are single-platform images (amd64), and even if we
add tests for ARM, they will have a different tag.

Hopefully we can still build release images without causing the
race condition too frequently - this is more likely because when
we build images for cache we use machines with different performance
characteristics, and the same layers are pushed at different times
from different platforms.

* Preparing buildx cache is allowed without --push-image flag (apache#24028)

The previous version of buildx cache preparation implied the --push-image
flag, but now this is completely separated (we do not push the image,
we just prepare the cache), so when multi-platform buildx preparation is
run we should also allow the cache to be prepared without the --push-image flag.

* Add partition related methods to GlueCatalogHook: (apache#23857)

* "get_partition" to retrieve a Partition
* "create_partition" to create a Partition

* Adds foldable CI group for command output (apache#24026)

* Add foldable groups in CI outputs in commands that need it (apache#24035)

This is a follow-up to apache#24026, which added the capability of
deciding selectively for each breeze command whether its output
should be a "foldable" group. All CI output has been reviewed, and
the commands which "need" it were identified.

This also fixes a problem introduced there - the command itself
was not a "foldable" group.

* Increase size of ARM build instance (apache#24036)

Our ARM cache builds started to hang recently at the yarn prod step.
The most likely reason is the limited resources we had for the ARM
instance to run the docker build - it was a rather small instance
with 2GB RAM, which is likely not nearly enough to cope with
recent changes related to the Grid View, where we likely need much
more memory during the yarn build step.

This change increases the instance memory to 8 GB (c6g.xlarge).
This instance type also gives a 70% cost saving and has a very low
probability of being evicted (it's not in high demand in the Ohio
region of AWS).

Also, the AMI used is refreshed with the latest software (docker).

* Remove unused [github_enterprise] from ref docs (apache#24033)

* Add enum validation for [webserver]analytics_tool (apache#24032)

* Support impersonation service account parameter for Dataflow runner (apache#23961)

* Fix closing connection dbapi.get_pandas_df (apache#23452)

* Light Refactor and Clean-up AWS Provider (apache#23907)

* Removing magic numbers from exceptions (apache#23997)

* Removing magic numbers from exceptions

* Running pre-commit

* Upgrade to pip 22.1.2 (apache#24043)

Pip 22.1.2 was released 12 minutes ago. Time to
catch up.

* Shaves-off about 3 minutes from usage of ARM instances on CI (apache#24052)

Preparing airflow packages and provider packages does not
need to be done on ARM and actually the ARM instance is idle
while they are prepared during cache building.

This change moves preparation of the packages to before
the ARM instance is started which saves about 3 minutes of ARM
instance time.

* SSL Bucket, Light Logic Refactor and Docstring Update for Alibaba Provider (apache#23891)

* Use KubernetesHook to create api client in KubernetesPodOperator (apache#20578)

Add support for k8s hook in KPO; use it always (even when no conn id); continue to consider the core k8s settings that KPO already takes into account but emit deprecation warning about them.

KPO historically takes into account a few settings from the core airflow cfg (e.g. verify ssl, tcp keepalive, context, config file, and in_cluster). So to use the hook to generate the client, somehow the hook has to take these settings into account. But we don't want the hook to consider these settings in general. So we read them in KPO and, if necessary, patch the hook and warn.

* Re-add --force-build flag (apache#24061)

After apache#24052 we also need to add the --force-build flag, as for
Python 3.7 rebuilding the CI cache would otherwise have been silently
skipped, since no image building would be needed.

* Fix grid view for mapped tasks (apache#24059)

* Fix StatsD timing metric units (apache#21106)

Co-authored-by: Tzu-ping Chung <[email protected]>
Co-authored-by: Tzu-ping Chung <[email protected]>

* Drop Python 3.6 compatibility objects/modules (apache#24048)

* Remove hack from BigQuery DTS hook (apache#23887)

* Spanner assets & system tests migration (AIP-47) (apache#23957)

* Run the `check_migration` loop at least once (apache#24068)

This has been broken since 2.3.0: if a user specifies a migration_timeout
of 0, no migration check is run at all.
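The difference between a loop that depends on the timeout to run at all and one that always checks at least once can be sketched like this. Both functions are hypothetical illustrations, not Airflow's actual `check_migrations` code.

```python
import time


def check_migrations_buggy(is_up_to_date, timeout: int) -> bool:
    # With timeout=0 the loop body never runs, so nothing is checked.
    for _ in range(timeout):
        if is_up_to_date():
            return True
        time.sleep(1)
    return False


def check_migrations(is_up_to_date, timeout: float, poll_interval: float = 1.0) -> bool:
    """Poll is_up_to_date() until it returns True or the timeout expires.

    The check always runs at least once, even when timeout is 0.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_up_to_date():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```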

* Bump eventsource from 1.0.7 to 1.1.1 in /airflow/ui (apache#24062)

Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.0.7 to 1.1.1.
- [Release notes](https://github.com/EventSource/eventsource/releases)
- [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md)
- [Commits](EventSource/eventsource@v1.0.7...v1.1.1)

---
updated-dependencies:
- dependency-name: eventsource
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove certifi limitations from eager upgrade limits (apache#23995)

The certifi limitation was introduced to keep snowflake happy while
performing eager upgrade, because it added limits on certifi. However,
it seems this is no longer a limitation in the latest versions of the
snowflake python connector, so we can safely remove it from here.

The only remaining limit is dill, but this one still holds.

* fix style of example block (apache#24078)

* Handle occasional deadlocks in trigger with retries (apache#24071)

Fixes: apache#23639

* Adds Pura Scents, edits The Dyrt (apache#24086)

* Migrate Yandex example DAGs to new design AIP-47 (apache#24082)

closes: apache#22470

* set color to operators in cloud_sql.py (apache#24000)

* Migrate HTTP example DAGs to new design AIP-47 (apache#23991)

closes: apache#22448 , apache#22431

* Make expand() error vague so it's not misleading (apache#24018)

* Use github for postgres chart index (apache#24089)

Bitnami's CloudFront CDN is seemingly having issues, so point at github
direct instead until it is resolved.

* Fix the link to google workplace (apache#24080)

* Bring MappedOperator members in sync with BaseOperator (apache#24034)

* Add note about Docker volume remount issues in WSL 2 (apache#24094)

* Convert Athena Sample DAG to System Test (apache#24058)

* Self-update pre-commit to latest versions (apache#24106)

* Temporarily fix bitnami index problem (apache#24112)

We started to experience "Internal Error" when installing the
Helm chart, and apparently bitnami "solved" the problem by
removing software older than 6 months(!) from their index.

This makes our CI fail, but it is much worse: it
renders all our charts useless for people to install.
This is terribly wrong, and I raised this in the issue
here:

bitnami/charts#10539 (comment)

* Fix small typos in static code checks doc (apache#24113)

- Trivial typo fix in the command to run static checks on the last commit
- Update "run all tests" to "run all checks" where applicable for consistency

* Really workaround bitnami chart problem (apache#24115)

The original fix in apache#24112 did not work due to:
* not updated lock
* EOL characters at the end of multiline long URL

This PR fixes it.

* Reduce grid view API calls (apache#24083)

* Reduce API calls from /grid

- Separate /grid_data from /grid
- Remove need for formatData
- Increase default query stale time to prevent extra fetches
- Fix useTask query keys

* consolidate grid data functions

* fix www tests

test grid_data instead of /grid

* Removing magic status code numbers from api_connexion (apache#24050)

* Do not support MSSQL less than v2017 in code (apache#24095)

Our experimental support for MSSQL starts from v2017 (in README.md), but
we still supported 2000 & 2005 in code.
This PR removes this support, allowing us to use mssql.DATETIME2 in all
MSSQL databases.

* Rename Permissions to Permission Pairs. (apache#24065)

* Note that yarn dev needs webserver in debug mode (apache#24119)

* Note that yarn dev needs webserver -d

* Update CONTRIBUTING.rst

Co-authored-by: Jed Cunningham <[email protected]>

* Use -D

* Revert "Use -D"

This reverts commit 94d63ad.

Co-authored-by: Jed Cunningham <[email protected]>

* fixing SSHHook bug when using allow_host_key_change param (apache#24116)

* Adds mssql volumes to "all" backends selection (apache#24123)

The "stop" command of Breeze uses the "all" backend to remove all
volumes - but mssql has a special approach where the volumes
defined depend on the filesystem used, and we need to add the
specific docker-compose files to the list of files used when
we use the stop command.

* Breeze must create `hooks\` and `dags\` directories for bind mounts (apache#24122)

  Now that Breeze uses --mount instead of --volume (unlike --volume,
  --mount does not create missing mount directories; see
  https://docs.docker.com/storage/bind-mounts/#differences-between--v-and---mount-behavior),
  we need to create these directories explicitly.
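A minimal sketch of the idea, assuming the mount sources live directly under the Airflow sources root and are mounted at a hypothetical `/opt/airflow/<name>` target; `ensure_mount_sources` and `mount_args` are illustrative names, not Breeze's actual functions:

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical bind-mount sources, relative to the Airflow sources root.
MOUNTED_DIRS = ["dags", "hooks"]


def ensure_mount_sources(airflow_sources: Path) -> List[Path]:
    """Create any missing bind-mount source directories.

    Unlike `-v/--volume`, `docker run --mount type=bind,...` errors out when
    the source path does not exist, so the directories must exist first.
    """
    paths = []
    for name in MOUNTED_DIRS:
        path = airflow_sources / name
        path.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
        paths.append(path)
    return paths


def mount_args(airflow_sources: Path) -> List[str]:
    """Build `--mount` arguments for each (now guaranteed) directory."""
    args: List[str] = []
    for path in ensure_mount_sources(airflow_sources):
        # The /opt/airflow target is an assumption for illustration.
        args += ["--mount", f"type=bind,source={path},target=/opt/airflow/{path.name}"]
    return args


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(mount_args(Path(tmp)))
```

Calling it twice is safe because `mkdir(exist_ok=True)` leaves existing directories alone.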

* AIP-47 | Migrate Trino example DAGs to new design (apache#24118)

* Update production-deployment.rst (apache#24121)

The sql_alchemy_conn option is in the database section, not the core section.  Simple typo fix.

* Migrate Zendesk example DAGs to new design apache#22471 (apache#24129)

* Migrate JDBC example DAGs to new design apache#22450 (apache#24137)

* Migrate Jenkins example DAGs to new design apache#22451 (apache#24138)

* Migrate Microsoft example DAGs to new design apache#22452 - mssql (apache#24139)

* Migrate MySQL example DAGs to new design apache#22453 (apache#24142)

* Migrate Opsgenie example DAGs to new design apache#22455 (apache#24144)

* Migrate Presto example DAGs to new design apache#22459 (apache#24145)

* Migrate Plexus example DAGs to new design apache#22457 (apache#24147)

* Migrate SQLite example DAGs to new design apache#22461 (apache#24150)

* Migrate Telegram example DAGs to new design apache#22468 (apache#24126)

* AIP-47 - Migrate Tableau DAGs to new design (apache#24125)

* Migrate Salesforce example DAGs to new design apache#22463 (apache#24127)

* Update credentials when using ADC in Compute Engine (apache#23773)

* Improve Windows development compatibility for breeze (apache#24098)

* Migrate Asana example DAGs to new design apache#22440 (apache#24131)

* Migrate Neo4j example DAGs to new design apache#22454 (apache#24143)

* Workflows assets & system tests migration (AIP-47) (apache#24105)

* Workflows assets & system tests migration (AIP-47)

Co-authored-by: Wojciech Januszek <[email protected]>

* Add disabled_algorithms as an extra parameter for SSH connections (apache#24090)

* Migrate Postgres example DAGs to new design apache#22458 (apache#24148)

* Migrate Postgres example DAGs to new design apache#22458

* Fix static checks

* Migrate Snowflake system tests to new design apache#22434 (apache#24151)

* Migrate Snowflake system tests to new design apache#22434

* Fix flake8

* Migrate Qubole example DAGs to new design apache#22460 (apache#24149)

* Migrate Qubole example DAGs to new design apache#22460

* Migrate Microsoft example DAGs to new design apache#22452 - azure (apache#24141)

* Migrate Microsoft example DAGs to new design apache#22452 - azure

* Migrate Microsoft example DAGs to new design apache#22452 - winrm (apache#24140)

* Migrate Microsoft example DAGs to new design apache#22452 - winrm

* Fix static checks

* Migrate Influx example DAGs to new design apache#22449 (apache#24136)

* Migrate Influx example DAGs to new design apache#22449

* Fix static checks

* Migrate DingTalk example DAGs to new design apache#22443 (apache#24133)

* Migrate DingTalk example DAGs to new design apache#22443

* Migrate Cncf.Kubernetes example DAGs to new design apache#22441 (apache#24132)

* Migrate Cncf.Kubernetes example DAGs to new design apache#22441

* Migrate Alibaba example DAGs to new design apache#22437 (apache#24130)

* Migrate Alibaba example DAGs to new design apache#22437

* Pass connection extra parameters to wasb BlobServiceClient (apache#24154)

* fix BigQueryInsertJobOperator (apache#24165)

* Migrate Singularity example DAGs to new design apache#22464 (apache#24128)

* Better summary of status of AIP-47 (apache#24169)

Result is here: apache#24168

* Remove old Athena Sample DAG (apache#24170)

* removed old files (apache#24172)

* Chart: Default to Airflow 2.3.2 (apache#24184)

* Update 'rich' to latest version across the board. (apache#24186)

This also includes regenerating the Breeze output images.

* Fix BigQuery system tests (apache#24013)

* Change execution_date to data_interval_start in BigQueryInsertJobOperator job_id

Change-Id: Ie1f3bba701169ceb2b39d693da320564de145c0c

* Change jinja template path to relative path

Change-Id: I6cced215124f69e9f4edf8ac08bb71d3ec3c8afc

Co-authored-by: Bartlomiej Hirsz <[email protected]>

* `2.3.2` has been released (apache#24182)

* Add verification step to image release process (apache#24177)

* Added impersonation_chain for DataflowStartFlexTemplateOperator and DataflowStartSqlJobOperator (apache#24046)

* Add key_secret_project_id parameter which specifies a project with KeyFile (apache#23930)

* Add built-in External Link for ExternalTaskMarker operator (apache#23964)

* fix: DatabricksSubmitRunOperator and DatabricksRunNowOperator cannot define .json as template_ext (apache#23622) (apache#23641)

* fix: StepFunctionHook ignores explicit set `region_name` (apache#23976)

* Remove `GithubOperator` use in `GithubSensor.__init__()` (apache#24214)

The constructor for `GithubSensor` was instantiating `GithubOperator` to use its `execute()` method as the driver for the result of the sensor's `poke()` logic. However, this could raise a `DuplicateTaskIdFound` error when used in DAGs.

This PR updates the `GithubSensor` to use the `GithubHook` instead.
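A toy model of why the hook-based design avoids the error, assuming the inner operator was built with the sensor's own task_id. None of these classes are Airflow's; `ToyDag`, `ToyOperator`, `ToyHook` and the exception are hypothetical stand-ins that only mimic "every operator registers its task_id with the DAG":

```python
class DuplicateTaskIdFound(Exception):
    """Hypothetical stand-in for the error raised on a repeated task_id."""


class ToyDag:
    def __init__(self):
        self.task_ids = set()

    def register(self, task_id):
        if task_id in self.task_ids:
            raise DuplicateTaskIdFound(task_id)
        self.task_ids.add(task_id)


class ToyOperator:
    def __init__(self, task_id, dag):
        dag.register(task_id)  # every operator instance registers itself
        self.task_id = task_id


class ToyHook:
    """Hooks are plain API clients: constructing one registers nothing."""

    def run(self, method):
        return method()


class SensorUsingOperator(ToyOperator):
    """Old pattern: builds a second operator with the same task_id."""

    def __init__(self, task_id, dag):
        super().__init__(task_id, dag)
        self.helper = ToyOperator(task_id, dag)  # second registration fails


class SensorUsingHook(ToyOperator):
    """New pattern: delegates the API call to a hook inside poke()."""

    def __init__(self, task_id, dag, method):
        super().__init__(task_id, dag)
        self.method = method

    def poke(self):
        return bool(ToyHook().run(self.method))
```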

* Mac M1 postgres and doc fix (apache#24200)

* AIP-47 - Migrate dbt DAGs to new design apache#22472 (apache#24202)

* AIP-47 - Migrate databricks DAGs to new design apache#22442 (apache#24203)

* AIP-47 - Migrate hive DAGs to new design apache#22439 (apache#24204)

* AIP-47 - Migrate kylin DAGs to new design apache#22439 (apache#24205)

* AIP-47 - Migrate drill DAGs to new design apache#22439 (apache#24206)

* AIP-47 - Migrate druid DAGs to new design apache#22439 (apache#24207)

* AIP-47 - Migrate cassandra DAGs to new design apache#22439 (apache#24209)

* AIP-47 - Migrate spark DAGs to new design apache#22439 (apache#24210)

* AIP-47 - Migrate apache pig DAGs to new design apache#22439 (apache#24212)

* Migrate GitHub example DAGs to new design apache#22446 (apache#24134)

* Remove warnings when starting breeze (apache#24183)

Breeze when started produced three warnings that were harmless,
but we should fix them to remove "false positives".

* AIP-47 - Migrate livy DAGs to new design apache#22439 (apache#24208)

* Remove escaping which is wrong in latest rich version (apache#24217)

With the latest rich version, escaping the extra `[` in Markdown URLs
is no longer needed.

* Parse error for task added to multiple groups (apache#23071)

This raises an exception if a task that already belongs to a task group
(including a task added to a DAG, since such a task is automatically added
to the DAG's root task group) is added to another group.

Also, according to the issue response, manually calling TaskGroup.add()
is not considered a supported way to add a task to a group. So a
meta-marker is added to the function docstring to prevent it from
showing up in documentation and users from trying to use it.
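A minimal sketch of the check, using hypothetical `ToyTask`/`ToyTaskGroup` classes and a hypothetical `TaskAlreadyInGroupError` rather than Airflow's real TaskGroup; it only shows the rule that a task with an existing group cannot be moved into a second one:

```python
class TaskAlreadyInGroupError(Exception):
    """Hypothetical stand-in for the error raised on a second group."""


class ToyTask:
    def __init__(self, task_id):
        self.task_id = task_id
        self.task_group = None


class ToyTaskGroup:
    def __init__(self, group_id):
        self.group_id = group_id
        self.children = {}

    def add(self, task):
        # A task added to a DAG already sits in the DAG's root group, so a
        # second add() is rejected instead of silently re-parenting the task.
        if task.task_group is not None and task.task_group is not self:
            raise TaskAlreadyInGroupError(
                f"{task.task_id} already belongs to {task.task_group.group_id}"
            )
        task.task_group = self
        self.children[task.task_id] = task
```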

* Fix xfail test in test_scheduler.py (apache#23731)

* Migrate Papermill example DAGs to new design apache#22456 (apache#24146)

* Migrate Asana system tests to new design AIP-47 (apache#24226)

closes: apache#22428
related: apache#22440

* Migrate Microsoft system tests to new design AIP-47 (apache#24225)

closes: apache#22432
related: apache#22452

* Migrate CNCF system tests to new design AIP-47 (apache#24224)

closes: apache#22429
related: apache#22441

* Migrate Postgres system tests to new design (apache#24223)

closes: apache#22433
related: apache#22458

* AIP-47 - Migrate beam DAGs to new design apache#22439 (apache#24211)

* AIP-47 - Migrate beam DAGs to new design apache#22439

* Add explanatory note for contributors about updating Changelog (apache#24229)

* Fix backwards-incompatibility introduced by fixing mypy problems (apache#24230)

There was a backwards-incompatibility introduced by apache#23716 in
two providers that used the get_mandatory_value config method.

This PR restores backwards compatibility and updates the 2.1
compatibility pre-commit to check for forbidden usage of
get_mandatory_value.
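One way a provider can avoid a getter that older Airflow versions lack is to read the option with the plain `get()` and do the "mandatory" check itself. A minimal sketch, using the standard-library `ConfigParser` as a stand-in for Airflow's config object; `get_required_option` is a hypothetical helper, not the code from this PR:

```python
from configparser import ConfigParser


def get_required_option(conf: ConfigParser, section: str, key: str) -> str:
    """Return a config value, raising if it is missing or empty.

    Relies only on the basic get() API, so it works wherever get() exists.
    """
    value = conf.get(section, key, fallback=None)  # None if section/key absent
    if not value:
        raise ValueError(f"The value {section}/{key} should be set!")
    return value
```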

* Bump moto version (apache#24222)

* Bump moto version
Version 3.1.10 broke main, but the issue has since been fixed in moto.
related: getmoto/moto#5165

* fix moto

* Add `PrestoToSlackOperator` (apache#23979)

* Add `PrestoToSlackOperator`
Adding the functionality to run a single query against Presto and send the result as a Slack message.
Similar to `SnowflakeToSlackOperator`.

* Fix BigQuery Sensors system test (apache#24245)

Co-authored-by: Bartlomiej Hirsz <[email protected]>

* Add AWS_DEFAULT_REGION to the docs, since boto3 expects it in the environment variables (apache#24181)

* Unify return_code interface for task runner (apache#24093)

* Update dbt.py (apache#24218)

* Fix GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix (apache#24039)
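A minimal sketch of the selection problem, assuming object names are plain strings; `select_objects` is a hypothetical helper, not the operator's code. A bare prefix match on `data/file` would also pick up `data/file_backup`, so this keeps only an exact match or objects under `source` treated as a folder:

```python
from typing import List


def select_objects(object_names: List[str], source: str) -> List[str]:
    """Return the objects a copy of `source` should include."""
    folder = source if source.endswith("/") else source + "/"
    return [
        name
        for name in object_names
        # exact object (single file) or something under source/ (folder)
        if name == source or name.startswith(folder)
    ]
```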

* Adding fnmatch type regex to SFTPSensor (apache#24084)
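A minimal sketch of fnmatch-style matching in a sensor's poke, using the standard-library `fnmatch.filter`; `list_directory` is a hypothetical callable standing in for the SFTP listing, and the function names are illustrative rather than SFTPSensor's actual parameters:

```python
import fnmatch
from typing import Callable, List


def matching_files(remote_files: List[str], pattern: str) -> List[str]:
    """Return file names matching a shell-style pattern such as 'report_*.csv'."""
    return fnmatch.filter(remote_files, pattern)


def poke(list_directory: Callable[[str], List[str]], path: str, pattern: str) -> bool:
    """Succeed once at least one file in `path` matches `pattern`."""
    return bool(matching_files(list_directory(path), pattern))
```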

* docs: amazon-provider retry modes (apache#23906)

* Cloud Storage assets & StorageLink update (apache#23865)

Co-authored-by: Wojciech Januszek <[email protected]>

* Fix useTasks crash on error (apache#24152)

* Prevent UI from crashing on Get API error

* add test

* don't show API errors in test logs

* use setMinutes inline

* Refactor GlueJobHook get_or_create_glue_job method. (apache#24215)

When invoked, create_job takes into account the provided 'Command' argument instead of having it hardcoded.
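A minimal sketch of honoring a caller-supplied `Command`, assuming the request is assembled as a dict of Glue `create_job` arguments; `build_create_job_args` and its default command are hypothetical, not GlueJobHook's code. Popping `Command` from a copy of the kwargs keeps it from being passed twice and leaves the caller's dict untouched:

```python
from typing import Any, Dict, Optional


def build_create_job_args(
    job_name: str,
    role: str,
    script_location: str,
    create_job_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    kwargs = dict(create_job_kwargs or {})  # copy; do not mutate caller's dict
    # Prefer the caller's Command; otherwise fall back to a default ETL command.
    command = kwargs.pop("Command", {"Name": "glueetl", "ScriptLocation": script_location})
    return {"Name": job_name, "Role": role, "Command": command, **kwargs}
```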

* Fix delete_cluster not using TriggerRule.ALL_DONE (apache#24213)

related: apache#24082

* docker new system test (apache#23167)

* chore: Refactoring and Cleaning Apache Providers (apache#24219)

* Fix await_container_completion condition (apache#23883)

* Migrate Apache Beam system tests to new design AIP-47 (apache#24256)

closes: apache#22427

* Migrate Apache Beam system tests to new design apache#22427 (apache#24241)

* Migrate Google leveldb system tests to new design AIP-47 (apache#24255)

related: apache#22447, apache#22430

* Add param docs to KubernetesHook and KubernetesPodOperator (apache#23955) (apache#24054)

* Enable dbt Cloud provider to interact with single tenant instances (apache#24264)

* Enable provider to interact with single tenant

* Define single tenant arg on Operator

* Add test for single tenant endpoint

* Enable provider to interact with single tenant

* Define single tenant arg on Operator

* Add test for single tenant endpoint

* Code linting from black

* Code linting from black

* Pass tenant to dbtCloudHook in DbtCloudGetJobRunArtifactOperator class

* Make Tenant a connection-level setting

* Remove tenant arg from Operator

* Make tenant connection-level param that defaults to 'cloud'

* Remove tenant param from sensor

* Remove leftover param string from hook

* Update airflow/providers/dbt/cloud/hooks/dbt.py

Co-authored-by: Josh Fell <[email protected]>

* Parameterize test_init_hook to test single and multi tenant connections

* Integrate test simplification suggestion

* Add connection to TestDbtCloudJobRunSensor

Co-authored-by: Josh Fell <[email protected]>
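A minimal sketch of a tenant-aware base URL, assuming the `https://<tenant>.getdbt.com/api/v2/accounts/` pattern and that the tenant comes from a connection-level field defaulting to 'cloud'; `dbt_cloud_base_url` is a hypothetical helper, not DbtCloudHook's API:

```python
from typing import Optional


def dbt_cloud_base_url(tenant: Optional[str] = None) -> str:
    """Build the API base URL for a dbt Cloud tenant.

    Multi-tenant accounts use the default 'cloud' tenant; single-tenant
    instances substitute their own tenant name in the host.
    """
    tenant = tenant or "cloud"  # connection-level setting, defaults to 'cloud'
    return f"https://{tenant}.getdbt.com/api/v2/accounts/"
```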

* Apply per-run log templates to log handlers (apache#24153)

* AIP-47 - Migrate google leveldb DAGs to new design apache#22447 (apache#24233)

* Fix choosing backend versions in breeze's command line (apache#24228)

Choosing backend versions was broken when command line switches
were used. The _VERSION variables were "hard-coded" to defaults
rather than taken from the command line. This was a remnant of the
initial implementation and of converting the parameters to "cacheable" ones.

While looking at the versions we also found that PARAM_NAME_FLAG
is no longer used, so we took the opportunity to remove it.
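A minimal sketch of the bug pattern, with `argparse` standing in for Breeze's own command-line handling; the option name and default below are hypothetical. Returning the module-level default instead of the parsed value would silently ignore the switch:

```python
import argparse
from typing import List, Optional

# Hypothetical default; Breeze's real defaults and option names may differ.
DEFAULT_POSTGRES_VERSION = "10"


def chosen_postgres_version(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--postgres-version", default=DEFAULT_POSTGRES_VERSION)
    args = parser.parse_args(argv)
    # Buggy variant: `return DEFAULT_POSTGRES_VERSION` ignores the switch.
    return args.postgres_version
```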

* Fix link broken after apache#24082 (apache#24276)

apache#24082

* Add command to regenerate breeze command output images (apache#24216)

* Make numpy effectively an optional dependency for Oracle provider (apache#24272)

Better fix to apache#23132
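A minimal sketch of the usual "optional dependency" pattern, assuming numpy is only needed to convert numpy scalar values; `to_native` is a hypothetical helper, not the Oracle provider's code:

```python
try:
    import numpy
except ImportError:  # numpy not installed: fall back to plain values
    numpy = None


def to_native(value):
    """Convert numpy scalars to plain Python values when numpy is available."""
    if numpy is not None and isinstance(value, numpy.generic):
        return value.item()
    return value
```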

* Add SMAP Energy to list of companies using Airflow (apache#24268)

* fix command and typo (apache#24282)

* Update doc and sample dag for EMR Containers (apache#24087)

* schedule_interval nullable: true added in OpenAPI (apache#24253)

* Check that edge nodes actually exist (apache#24166)

* Prepare docs for May 2022 provider's release (apache#24231)

This documentation update also (following the rule agreed in
https://github.com/apache/airflow/blob/main/README.md#support-for-providers)
bumps minimum supported version of Airflow for all providers
to 2.2 and it constitutes a breaking change and major version bump
for all providers.

* pydocstyle D202 added (apache#24221)

* Update provider templates for new Airflow 2.2+ req (apache#24291)

I imagine we could update this somewhat programmatically and/or add this update to instructions somewhere. Let me know what you think.

* Update package description to remove double min-airflow specification (apache#24292)

* Airflow UI fix vulnerabilities - Prototype Pollution (apache#24201)

* Mention context variables and logging (apache#24304)

* Mention context variables and logging

* Fix static checks

* Remove limit of presto-python-client version (apache#24305)

* Fix language override in papermill operator (apache#24301)

* Also mention airflow 2 only in readme template (apache#24296)

* Fix permission issue for dag that has dot in name (apache#23510)

How we determine whether a DAG is a subdag in airflow.security.permissions.resource_name_for_dag is not right.
If a dag_id contains a dot, the permission is not recorded correctly.

The current solution makes a query every time we check permissions for DAGs that have a dot in their name. It is not ideal,
but it is better than the other options considered, such as changing how subdags are named, which would hurt UX.
Another option considered was making a query at parse time; that is undesirable, and it is avoided
by passing root_dag to resource_name_for_dag.

Co-authored-by: Ash Berlin-Taylor <[email protected]>
Co-authored-by: Tzu-ping Chung <[email protected]>
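A simplified illustration of why a dot-based heuristic breaks, assuming a `DAG:` resource prefix and that the old logic treated the text before the first dot as the parent DAG id; both functions are hypothetical and are not Airflow's `resource_name_for_dag`:

```python
RESOURCE_DAG_PREFIX = "DAG:"  # assumed prefix, for illustration only


def resource_name_naive(dag_id: str) -> str:
    # Heuristic: anything before the first dot is taken to be a parent DAG,
    # which also mangles ordinary DAG ids that merely contain a dot.
    return RESOURCE_DAG_PREFIX + dag_id.split(".")[0]


def resource_name_from_root(root_dag_id: str) -> str:
    # Caller passes the root DAG id explicitly, so dots are kept as-is.
    return RESOURCE_DAG_PREFIX + root_dag_id
```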

* Check that DAG schedule_interval matches timetable (apache#23113)

This guards against the DAG's timetable or schedule_interval being
changed after it's created. Validation is done by creating a timetable
and checking that its summary matches schedule_interval. The logic is not
bullet-proof, especially if a custom timetable does not provide a useful
summary, but this is the best we can do.
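A minimal sketch of the summary comparison, using a hypothetical `ToyTimetable` whose summary is just the `str()` of the schedule; real timetables build their summaries differently, so this only shows the shape of the check:

```python
from datetime import timedelta
from typing import Union

ScheduleInterval = Union[str, timedelta]


class ToyTimetable:
    """Minimal stand-in: only the human-readable summary matters here."""

    def __init__(self, summary: str):
        self.summary = summary


def timetable_from_schedule(schedule_interval: ScheduleInterval) -> ToyTimetable:
    # Cron strings summarize as themselves; timedeltas as their str() form.
    return ToyTimetable(str(schedule_interval))


def check_schedule_matches(schedule_interval: ScheduleInterval, timetable: ToyTimetable) -> None:
    expected = timetable_from_schedule(schedule_interval).summary
    if expected != timetable.summary:
        raise ValueError(
            f"schedule_interval {schedule_interval!r} does not match "
            f"timetable summary {timetable.summary!r}"
        )
```

As the commit notes, a custom timetable with an unhelpful summary would defeat this kind of check.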

* fix: patches apache#24215. Won't raise KeyError when 'create_job_kwargs' contains the 'Command' key. (apache#24308)

* Fix D202 issue (apache#24322)

* Check for run_id for grid group summaries (apache#24327)

* Workaround job race bug on BigQuery to GCS transfer (apache#24330)

Fixes: apache#24277

* Update release notes for RC2 release of Providers for May 2022 (apache#24307)

Also updates links to example dags to work properly
following apache#24331

* feat(README): Add a custom README (#1)

* feat(README): Add a custom README

* fix(README): Add the custom README content above the original README

Co-authored-by: Kaxil Naik <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
Co-authored-by: Vincent <[email protected]>
Co-authored-by: Guilherme Martins Crocetti <[email protected]>
Co-authored-by: Jarek Potiuk <[email protected]>
Co-authored-by: Dmytro Kazanzhy <[email protected]>
Co-authored-by: pankajastro <[email protected]>
Co-authored-by: 서재권(Data Platform) <[email protected]>
Co-authored-by: Sandeep <[email protected]>
Co-authored-by: Sandeep Kadyan <[email protected]>
Co-authored-by: Eugene Karimov <[email protected]>
Co-authored-by: Vedant Bhamare <[email protected]>
Co-authored-by: eladkal <[email protected]>
Co-authored-by: pierrejeambrun <[email protected]>
Co-authored-by: sanjayp <[email protected]>
Co-authored-by: Josh Fell <[email protected]>
Co-authored-by: raphaelauv <[email protected]>
Co-authored-by: Tzu-ping Chung <[email protected]>
Co-authored-by: Dev232001 <[email protected]>
Co-authored-by: Karthikeyan Singaravelan <[email protected]>
Co-authored-by: Alex Kruchkov <[email protected]>
Co-authored-by: alexkru <[email protected]>
Co-authored-by: Sumit Maheshwari <[email protected]>
Co-authored-by: Mark Norman Francis <[email protected]>
Co-authored-by: Jed Cunningham <[email protected]>
Co-authored-by: Vincent Koc <[email protected]>
Co-authored-by: Ephraim Anierobi <[email protected]>
Co-authored-by: Igor Tavares <[email protected]>
Co-authored-by: Marty Jackson <[email protected]>
Co-authored-by: Daniel Standish <[email protected]>
Co-authored-by: Andrey Anshin <[email protected]>
Co-authored-by: Brent Bovenzi <[email protected]>
Co-authored-by: mhenc <[email protected]>
Co-authored-by: Kengo Seki <[email protected]>
Co-authored-by: John Green <[email protected]>
Co-authored-by: David Skoda <[email protected]>
Co-authored-by: Edith Puclla <[email protected]>
Co-authored-by: Łukasz Wyszomirski <[email protected]>
Co-authored-by: Kamil Breguła <[email protected]>
Co-authored-by: Hubert Pietroń <[email protected]>
Co-authored-by: Bernardo Couto <[email protected]>
Co-authored-by: viktorvia <[email protected]>
Co-authored-by: Tzu-ping Chung <[email protected]>
Co-authored-by: henriqueribeiro <[email protected]>
Co-authored-by: Wojciech Januszek <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ishiis <[email protected]>
Co-authored-by: Chenglong Yan <[email protected]>
Co-authored-by: François de Metz <[email protected]>
Co-authored-by: Paul Williams <[email protected]>
Co-authored-by: D. Ferruzzi <[email protected]>
Co-authored-by: James Timmins <[email protected]>
Co-authored-by: Niko <[email protected]>
Co-authored-by: chethanuk-plutoflume <[email protected]>
Co-authored-by: DataFusion4All <[email protected]>
Co-authored-by: chethanuk-plutoflume <[email protected]>
Co-authored-by: Maksim <[email protected]>
Co-authored-by: Wojciech Januszek <[email protected]>
Co-authored-by: Paul Williams <[email protected]>
Co-authored-by: Tanel Kiis <[email protected]>
Co-authored-by: Bowrna <[email protected]>
Co-authored-by: Bartłomiej Hirsz <[email protected]>
Co-authored-by: Bartlomiej Hirsz <[email protected]>
Co-authored-by: Jonathan Simon Prates <[email protected]>
Co-authored-by: Rafael Carrasco <[email protected]>
Co-authored-by: Ping Zhang <[email protected]>
Co-authored-by: GitStart-AirFlow <[email protected]>
Co-authored-by: akakakakakaa <[email protected]>
Co-authored-by: Maria Sumedre <[email protected]>
Co-authored-by: Elize Papineau <[email protected]>
Co-authored-by: peter-volkov <[email protected]>
Co-authored-by: Hank Ehly <[email protected]>
Co-authored-by: Malthe Borch <[email protected]>
Co-authored-by: Ash Berlin-Taylor <[email protected]>
Co-authored-by: socar-dini <[email protected]>