
fix(ingestion/prefect-plugin): fixed the unit tests #10643

Merged

Conversation

dushayntAW
Contributor

@dushayntAW dushayntAW commented Jun 4, 2024

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added, a usage guide has been added for it.
  • For any breaking change, potential downtime, deprecation, or other major change, an entry has been made in Updating DataHub

Summary by CodeRabbit

  • New Features

    • Introduced Prefect integration in DataHub, allowing users to manage Prefect flow and task metadata, flow run details, and lineage information.
    • Added a Prefect flow example for an ETL process, including data extraction, transformation, and interaction with Snowflake datasets.
  • Documentation

    • Added new documentation on Prefect integration, including instructions on metadata event emission, block registration, and troubleshooting.
  • Chores

    • Updated build and test workflows to include Prefect plugin configurations and artifact handling.
    • Adjusted documentation sidebar entries to include the new Prefect documentation link.

Contributor

coderabbitai bot commented Jul 1, 2024

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting `reviews.review_status` to false in the CodeRabbit configuration file.

Walkthrough

The updates introduce a GitHub Actions workflow for the Prefect plugin, enhance artifact handling, and integrate Prefect with DataHub. Concurrently, Gradle and TypeScript changes support the new Prefect capabilities by including Prefect in the metadata ingestion modules, ensuring proper build processes, and updating documentation to feature the Prefect integration and corresponding examples.

Changes

  • .github/workflows/build-and-test.yml: Added exclusion for metadata-ingestion-modules:prefect-plugin tasks and updated the artifact upload path.
  • .github/workflows/prefect-plugin.yml: Introduced a new workflow for the Prefect plugin, including testing, artifact uploads, and coverage reporting.
  • .github/workflows/test-results.yml: Included "Prefect Plugin" in the workflows that trigger "Test Results".
  • .../docs-website/build.gradle: Added a dependency on the prefect-plugin:buildWheel task in the yarnGenerate task.
  • .../docs-website/generateDocsDir.ts: Updated the copy_python_wheels function to handle prefect-plugin wheels.
  • .../docs-website/sidebars.js: Added a doc entry for Prefect and adjusted other documentation links.
  • docs/lineage/prefect.md: Documented the Prefect integration with DataHub, covering flow and task metadata management and lineage information.
  • .../prefect-plugin/src/.../flow.py: Introduced an example Prefect flow for an ETL process demonstrating integration with DataHub.

Sequence Diagrams

sequenceDiagram
    participant Developer
    participant GitHub Actions
    participant DataHub
    Developer ->> GitHub Actions: Pushes to master/PR/release
    GitHub Actions ->> GitHub Actions: Trigger Prefect Plugin workflow
    GitHub Actions ->> Python: Set up environment and dependencies
    GitHub Actions ->> Gradle: Build metadata ingestion modules
    GitHub Actions ->> Python: Run Prefect flow
    Python ->> DataHub: Emit metadata events
    GitHub Actions -->> Developer: Upload artifacts and report coverage

Poem

🎀
A rabbit hops through code so bright,
Prefect flows now take their flight.
DataHub sings, as tasks emit,
Metadata winds the kit.
In workflows new and tasks so neat,
The project’s heartbeat finds its beat.
🐰✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 36

Outside diff range and nitpick comments (13)
.github/workflows/test-results.yml (1)

Line range hint 19-19: Fix potential issues with read and quoting.

The read command should use the -r option to prevent backslash mangling. Additionally, the variables should be quoted to prevent globbing and word splitting.

- read name url <<< "$artifact"
+ IFS=$'\t' read -r name url <<< "$artifact"
- gh api $url > "$name.zip"
+ gh api "$url" > "$name.zip"
metadata-ingestion/developing.md (11)

Line range hint 67-67: Add a comma after the introductory clause.

A comma is needed after "pip install".

- If you've already run the pip install but running `datahub` in your command line doesn't work, then there is likely an issue with your PATH setup and Python.
+ If you've already run the pip install, but running `datahub` in your command line doesn't work, then there is likely an issue with your PATH setup and Python.

Line range hint 112-112: Avoid wordiness.

Consider replacing "a variety of" with a more concise phrase.

- The sources pull metadata from a variety of data systems, while the sinks are primarily for moving this metadata into DataHub.
+ The sources pull metadata from various data systems, while the sinks are primarily for moving this metadata into DataHub.

Line range hint 116-116: Avoid redundancy: "CLI interface".

The phrase "CLI interface" is redundant. Use "CLI".

- The CLI interface is defined in [entrypoints.py](./src/datahub/entrypoints.py) and in the [cli](./src/datahub/cli) directory.
+ The CLI is defined in [entrypoints.py](./src/datahub/entrypoints.py) and in the [cli](./src/datahub/cli) directory.

Line range hint 117-117: Use a hyphen for compound adjectives.

Use a hyphen for "high level".

- The high level interfaces are defined in the [API directory](./src/datahub/ingestion/api).
+ The high-level interfaces are defined in the [API directory](./src/datahub/ingestion/api).

Line range hint 147-147: Avoid repetition: "probably".

The word "probably" is repeated in nearby sentences. Consider replacing it for variety.

- You probably should not be defining a `__hash__` method yourself. Using `@dataclass(frozen=True)` is a good way to get a hashable class.
+ You should not be defining a `__hash__` method yourself. Using `@dataclass(frozen=True)` is a good way to get a hashable class.

Line range hint 154-154: Add a comma after "package".

A comma is needed after "package".

- are not required by the "core" package but instead can be optionally installed using Python "extras".
+ are not required by the "core" package, but instead can be optionally installed using Python "extras".

Line range hint 162-162: Avoid wordiness.

Consider a shorter alternative to avoid wordiness.

- In order to ensure that the configs are consistent and easy to use, we have a few guidelines:
+ To ensure that the configs are consistent and easy to use, we have a few guidelines:

Line range hint 168-168: Add a comma after "For example".

A comma is needed after "For example".

- For example `client_id` or `tenant_id` over a bare `id` and `access_secret` over a bare `secret`.
+ For example, `client_id` or `tenant_id` over a bare `id` and `access_secret` over a bare `secret`.

Line range hint 38-38: Add blank lines around heading.

Headings should be surrounded by blank lines.

- ### (Optional) Set up your Python environment for developing on Prefect Plugin
+ 
+ ### (Optional) Set up your Python environment for developing on Prefect Plugin
+ 

Line range hint 37-37: Add blank lines around fenced code blocks.

Fenced code blocks should be surrounded by blank lines.

- ```shell
+ 
+ ```shell

Line range hint 109-109: Add alt text for images.

Images should have alternate text.

- <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/datahub-metadata-ingestion-framework.png"/>
+ <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/datahub-metadata-ingestion-framework.png" alt="DataHub Metadata Ingestion Framework"/>
docs-website/generateDocsDir.ts (1)

Line range hint 352-367: Remove unnecessary else clause.

The else clause can be omitted because previous branches break early.

- } else {
-   // Find first h1 header and use it as the title.
-   const headers = contents.content.match(/^# (.+)$/gm);
- 
-   if (!headers) {
-     throw new Error(
-       `${filepath} must have at least one h1 header for setting the title`
-     );
-   }
- 
-   if (headers.length > 1 && contents.content.indexOf("```") < 0) {
-     throw new Error(`too many h1 headers in ${filepath}`);
-   }
-   title = headers[0].slice(2).trim();
- }
+ // Find first h1 header and use it as the title.
+ const headers = contents.content.match(/^# (.+)$/gm);
+ 
+ if (!headers) {
+   throw new Error(
+     `${filepath} must have at least one h1 header for setting the title`
+   );
+ }
+ 
+ if (headers.length > 1 && contents.content.indexOf("```") < 0) {
+   throw new Error(`too many h1 headers in ${filepath}`);
+ }
+ title = headers[0].slice(2).trim();
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed between the base of the PR and commits 8f40229 and b796f59.

Files selected for processing (24)
  • .github/workflows/build-and-test.yml (2 hunks)
  • .github/workflows/prefect-plugin.yml (1 hunks)
  • .github/workflows/test-results.yml (1 hunks)
  • docs-website/build.gradle (1 hunks)
  • docs-website/generateDocsDir.ts (1 hunks)
  • docs-website/sidebars.js (2 hunks)
  • docs/lineage/prefect.md (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/.gitignore (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/README.md (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/build.gradle (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/pyproject.toml (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/scripts/release.sh (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/setup.cfg (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/setup.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/__init__.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/datahub_emitter.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/entities.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/example/flow.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/example/save_block.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/tests/integration/integration_test_dummy.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/tests/unit/test_block_standards.py (1 hunks)
  • metadata-ingestion-modules/prefect-plugin/tests/unit/test_datahub_emitter.py (1 hunks)
  • metadata-ingestion/developing.md (2 hunks)
  • settings.gradle (1 hunks)
Files not reviewed due to errors (1)
  • docs-website/sidebars.js (no review received)
Files skipped from review due to trivial changes (5)
  • metadata-ingestion-modules/prefect-plugin/pyproject.toml
  • metadata-ingestion-modules/prefect-plugin/setup.cfg
  • metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/__init__.py
  • metadata-ingestion-modules/prefect-plugin/tests/integration/integration_test_dummy.py
  • settings.gradle
Additional context used
GitHub Check: actionlint
.github/workflows/test-results.yml

[failure] 19-19:
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2162:info:5:91: read without -r will mangle backslashes [shellcheck]

Raw Output:
.github/workflows/test-results.yml:19:9: shellcheck reported issue in this script: SC2162:info:5:91: read without -r will mangle backslashes [shellcheck]


[failure] 19-19:
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2162:info:7:13: read without -r will mangle backslashes [shellcheck]

Raw Output:
.github/workflows/test-results.yml:19:9: shellcheck reported issue in this script: SC2162:info:7:13: read without -r will mangle backslashes [shellcheck]


[failure] 19-19:
[actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:8:10: Double quote to prevent globbing and word splitting [shellcheck]

Raw Output:
.github/workflows/test-results.yml:19:9: shellcheck reported issue in this script: SC2086:info:8:10: Double quote to prevent globbing and word splitting [shellcheck]

LanguageTool
docs/lineage/prefect.md

[misspelling] ~15-~15: This word is normally spelled with a hyphen.
Context: ...ther Prefect Cloud (recommended) or the self hosted Prefect server. 2. Refer [Cloud Quickst...

(EN_COMPOUNDS_SELF_HOSTED)


[misspelling] ~17-~17: This word is normally spelled with a hyphen.
Context: ...refect.io/latest/guides/host/) to setup self hosted Prefect server. 4. Make sure the Prefec...

(EN_COMPOUNDS_SELF_HOSTED)


[uncategorized] ~40-~40: A comma is probably missing here.
Context: .../concepts/blocks/#saving-blocks). While saving you can provide below configurations. D...

(MISSING_COMMA_AFTER_INTRODUCTORY_PHRASE)


[uncategorized] ~45-~45: Possible missing comma found.
Context: ...belong to. For more detail and possible values refer [here](https://datahubproject.io/...

(AI_HYDRA_LEO_MISSING_COMMA)


[typographical] ~46-~46: Consider adding a comma here.
Context: ...y this recipe belong to. For more detail please refer [here](https://datahubproject.io/...

(PLEASE_COMMA)


[uncategorized] ~100-~100: A comma may be missing after the conjunctive/linking adverb ‘Otherwise’.
Context: ...sks, user compulsory need to emit flow. Otherwise nothing will get emit. ## Concept mapp...

(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)

metadata-ingestion-modules/prefect-plugin/README.md

[style] ~25-~25: Consider a shorter alternative to avoid wordiness.
Context: ... Getting Started ### Setup DataHub UI In order to use 'prefect-datahub' collection, you'l...

(IN_ORDER_TO_PREMIUM)


[uncategorized] ~30-~30: Possible missing preposition found.
Context: ...cessful deployment of DataHub will lead creation of DataHub GMS service running on 'http...

(AI_HYDRA_LEO_MISSING_TO)


[uncategorized] ~36-~36: A comma is probably missing here.
Context: .../concepts/blocks/#saving-blocks). While saving you can provide below configurations. D...

(MISSING_COMMA_AFTER_INTRODUCTORY_PHRASE)


[uncategorized] ~36-~36: You might be missing the article “the” here.
Context: ...g-blocks). While saving you can provide below configurations. Default value will get ...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~36-~36: A determiner appears to be missing. Consider inserting it.
Context: ...g you can provide below configurations. Default value will get set if not provided whil...

(AI_EN_LECTOR_MISSING_DETERMINER)


[uncategorized] ~41-~41: Possible missing comma found.
Context: ...belong to. For more detail and possible values refer [here](https://datahubproject.io/...

(AI_HYDRA_LEO_MISSING_COMMA)


[typographical] ~42-~42: Consider adding a comma here.
Context: ...y this recipe belong to. For more detail please refer [here](https://datahubproject.io/...

(PLEASE_COMMA)


[uncategorized] ~81-~81: This verb may not be in the correct form. Consider using a different form for this context.
Context: ...lows to help you emit metadata event as show below! ```python import asyncio from ...

(AI_EN_LECTOR_REPLACEMENT_VERB_FORM)


[uncategorized] ~124-~124: A comma may be missing after the conjunctive/linking adverb ‘Otherwise’.
Context: ...sks, user compulsory need to emit flow. Otherwise nothing will get emit. ## Resources F...

(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)


[style] ~149-~149: The phrase ‘feel free to’ is used quite frequently. Consider using a less frequent alternative to set your writing apart from others and make it sound more professional.
Context: ... community](https://prefect.io/slack). Feel free to star or watch [datahub](https://githu...

(FEEL_FREE_TO_STYLE_ME)


[style] ~150-~150: Using many exclamation marks might seem excessive (in this case: 8 exclamation marks for a text that’s 3263 characters long)
Context: ...datahub-project/datahub) for updates too! ### Contributing If you'd like to hel...

(EN_EXCESSIVE_EXCLAMATION)


[style] ~154-~154: Consider using a different verb for a more formal wording.
Context: ...ng If you'd like to help contribute to fix an issue or add a feature to `prefect-d...

(FIX_RESOLVE)

metadata-ingestion/developing.md

[grammar] ~67-~67: Possibly, a comma is missing after an introductory clause, the verb inflection is not correct, or a question mark is missing. Did you mean “pip, install” or “pip installs”?
Context: ...ll If you've already run the pip install, but running datahub in your command ...

(IF_DT_NN_VBZ)


[style] ~112-~112: The phrase “a variety of” may be wordy. To make your writing clearer, consider replacing it.
Context: ...objects. The sources pull metadata from a variety of data systems, while the sinks are prima...

(A_VARIETY_OF)


[style] ~116-~116: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ...ta into DataHub. ## Code layout - The CLI interface is defined in [entrypoints.py](./src/da...

(ACRONYM_TAUTOLOGY)


[uncategorized] ~117-~117: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...li](./src/datahub/cli) directory. - The high level interfaces are defined in the [API dire...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[style] ~147-~147: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...er method over a @staticmethod. - You probably should not be defining a __hash__ met...

(REP_PROBABLY)


[uncategorized] ~154-~154: Possible missing comma found.
Context: ...s are not required by the "core" package but instead can be optionally installed usi...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~162-~162: Consider a shorter alternative to avoid wordiness.
Context: ...l.io/) to define the ingestion configs. In order to ensure that the configs are consistent ...

(IN_ORDER_TO_PREMIUM)


[typographical] ~168-~168: After the expression ‘for example’ a comma is usually used.
Context: ...ternative isn’t descriptive enough. For example client_id or tenant_id over a bare ...

(COMMA_FOR_EXAMPLE)

Markdownlint
docs/lineage/prefect.md

58-58: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


114-114: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


124-124: Expected: 1; Actual: 0; Above
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


134-134: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


19-19: null
Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


21-21: null
Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


123-123: null
Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


18-18: null
Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


22-22: null
Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


44-44: null
Bare URL used

(MD034, no-bare-urls)


120-120: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


66-66: Expected: fenced; Actual: indented
Code block style

(MD046, code-block-style)

metadata-ingestion-modules/prefect-plugin/README.md

26-26: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


63-63: Expected: 0 or 2; Actual: 1
Trailing spaces

(MD009, no-trailing-spaces)


21-21: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


34-34: Expected: 1; Actual: 2
Multiple consecutive blank lines

(MD012, no-multiple-blanks)


30-30: null
Bare URL used

(MD034, no-bare-urls)


40-40: null
Bare URL used

(MD034, no-bare-urls)


7-7: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


9-9: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


11-11: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


14-14: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


71-71: Expected: fenced; Actual: indented
Code block style

(MD046, code-block-style)

metadata-ingestion/developing.md

165-165: Expected: h3; Actual: h4
Heading levels should only increment by one level at a time

(MD001, heading-increment)


38-38: Expected: 1; Actual: 0; Above
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


37-37: null
Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


109-109: null
Images should have alternate text (alt text)

(MD045, no-alt-text)


256-256: null
Files should end with a single newline character

(MD047, single-trailing-newline)

Biome
docs-website/generateDocsDir.ts

[error] 352-367: This else clause can be omitted because previous branches break early.

Unsafe fix: Omit the else clause.

(lint/style/noUselessElse)


[error] 356-367: This else clause can be omitted because previous branches break early.

(lint/style/noUselessElse)


[error] 365-367: This else clause can be omitted because previous branches break early.

(lint/style/noUselessElse)

Ruff
metadata-ingestion-modules/prefect-plugin/tests/unit/test_datahub_emitter.py

550-550: Use `key in dict` instead of `key in dict.keys()`

Remove `.keys()`

(SIM118)
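
For illustration, the two forms are equivalent in Python 3: a dict's membership test already checks its keys, so the .keys() call is redundant. A tiny sketch:

d = {"a": 1, "b": 2}

# Idiomatic: membership on a dict tests its keys directly.
assert "a" in d

# Equivalent result, but flagged by Ruff (SIM118) because .keys() adds nothing.
assert "a" in d.keys()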

Additional comments not posted (40)
metadata-ingestion-modules/prefect-plugin/scripts/release.sh (1)

4-8: Ensure proper handling of build and install commands.

The script conditionally runs gradlew build and gradlew install. Ensure that these commands are correctly handling failures and that the conditions are mutually exclusive.

Verification successful

Ensure proper handling of build and install commands.

The script conditionally runs gradlew build and gradlew install. The gradlew file is accessible at ../../jailuser/git/gradlew. The conditions for RELEASE_SKIP_TEST and RELEASE_SKIP_INSTALL are mutually exclusive, and the script uses set -euxo pipefail for robust error handling.

  • Update the script to use the correct path: ../../jailuser/git/gradlew.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the build and install commands are correctly handling failures and that the conditions are mutually exclusive.

# Test: Check the exit status of the commands.
../../gradlew build || { echo "Build failed"; exit 1; }
../../gradlew install || { echo "Install failed"; exit 1; }

Length of output: 154



Script:

#!/bin/bash
# Locate the `gradlew` file to ensure it exists in the expected directory.
fd gradlew

# Check the conditions and error handling in the release.sh script.
cat metadata-ingestion-modules/prefect-plugin/scripts/release.sh

Length of output: 1225



Script:

#!/bin/bash
# Verify the correct relative path to `gradlew` from the release.sh script location.
fd gradlew --exec realpath --relative-to=metadata-ingestion-modules/prefect-plugin/scripts/

# Check the mutual exclusivity of the conditions in the release.sh script.
ast-grep --lang bash --pattern $'if [[ ! ${RELEASE_SKIP_TEST:-} ]]; then\n\t$_\nelif [[ ! ${RELEASE_SKIP_INSTALL:-} ]]; then\n\t$_\nfi' metadata-ingestion-modules/prefect-plugin/scripts/release.sh

Length of output: 634



Script:

#!/bin/bash
# Locate the `gradlew` file and determine its relative path from the script's location.
find ../../ -name gradlew -exec realpath --relative-to=metadata-ingestion-modules/prefect-plugin/scripts/ {} +

# Verify the mutual exclusivity of conditions in the release.sh script.
grep -E 'if \[\[ ! \${RELEASE_SKIP_TEST:-} \]\]; then|elif \[\[ ! \${RELEASE_SKIP_INSTALL:-} \]\]; then' metadata-ingestion-modules/prefect-plugin/scripts/release.sh

Length of output: 411



Script:

#!/bin/bash
# Locate the `gradlew` file and display the paths.
find ../../ -name gradlew

Length of output: 235



Script:

#!/bin/bash
# Verify if the `gradlew` file is accessible at the intended path.
if [ -f ../../jailuser/git/gradlew ]; then
  echo "gradlew is accessible at ../../jailuser/git/gradlew"
else
  echo "gradlew is not accessible at ../../jailuser/git/gradlew"
fi

Length of output: 152

metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/entities.py (3)

9-13: LGTM!

The _Entity class is well-defined and follows best practices for abstract base classes.


16-30: LGTM!

The Dataset class is well-implemented and makes good use of attr.s for attribute management.


33-46: LGTM!

The Urn class is well-implemented and makes good use of attr.s for attribute management and custom validation.

metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/example/flow.py (2)

17-20: LGTM!

The extract function is simple and well-implemented.


49-55: LGTM!

The etl function is well-implemented and follows best practices for flow functions.
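
For context, the example flow under review presumably follows the documented prefect-datahub usage. A minimal sketch; the block name "datahub-emitter" and the Snowflake table names are illustrative:

from prefect import flow, task

from prefect_datahub.datahub_emitter import DatahubEmitter
from prefect_datahub.entities import Dataset

# Load a previously saved emitter block by name.
datahub_emitter = DatahubEmitter.load("datahub-emitter")

@task(name="Extract", description="Extract the data")
def extract() -> str:
    return "This is sample data"

@task(name="Transform", description="Transform the data")
def transform(data: str) -> list:
    parts = data.split(" ")
    # Record dataset lineage for this task run.
    datahub_emitter.add_task(
        inputs=[Dataset("snowflake", "mydb.schema.tableA")],
        outputs=[Dataset("snowflake", "mydb.schema.tableC")],
    )
    return parts

@flow(name="ETL", description="Example ETL flow")
def etl():
    data = extract()
    transform(data)
    # Emitting the flow is what actually sends the collected metadata.
    datahub_emitter.emit_flow()

if __name__ == "__main__":
    etl()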

metadata-ingestion-modules/prefect-plugin/.gitignore (1)

1-143: LGTM!

The .gitignore file is well-structured and includes common patterns for Python projects.

.github/workflows/prefect-plugin.yml (1)

1-86: LGTM!

The workflow file is well-structured and follows best practices for GitHub Actions workflows.

metadata-ingestion-modules/prefect-plugin/setup.py (7)

1-2: Optimize imports.

Combine the os and pathlib imports for better readability.

import os
import pathlib

16-24: Clarify and document dependency groups.

Add comments to explain the purpose of each dependency group.

# Common dependencies for all environments.
rest_common = {"requests", "requests_file"}

# Base dependencies required for the package.
base_requirements = {
    # For python 3.7 and importlib-metadata>=5.0.0, build failed with attribute error
    "importlib-metadata>=4.4.0,<5.0.0; python_version < '3.8'",
    # Actual dependencies.
    "prefect >= 2.0.0",
    *rest_common,
    f"acryl-datahub == {package_metadata['__version__']}",
}

46-69: Update dev_requirements for clarity.

Group related dependencies and add comments for better readability.

# Development dependencies.
dev_requirements = {
    *base_requirements,
    *mypy_stubs,
    # Code formatting and linting tools.
    "black==22.12.0",
    "coverage>=5.1",
    "flake8>=3.8.3",
    "flake8-tidy-imports>=4.3.0",
    "isort>=5.7.0",
    "mypy>=1.4.0",
    # Pydantic version compatibility with mypy.
    "pydantic>=1.10",
    # Testing tools.
    "pytest>=6.2.2",
    "pytest-asyncio>=0.16.0",
    "pytest-cov>=2.8.1",
    "tox",
    # Miscellaneous tools.
    "deepdiff",
    "requests-mock",
    "freezegun",
    "jsonpickle",
    "build",
    "twine",
    "packaging",
}

76-121: Add keywords metadata for better discoverability.

Include relevant keywords in the setuptools.setup call to improve package discoverability.

setuptools.setup(
    # Package metadata.
    name=package_metadata["__package_name__"],
    version=package_metadata["__version__"],
    url="https://datahubproject.io/",
    project_urls={
        "Documentation": "https://datahubproject.io/docs/",
        "Source": "https://github.com/datahub-project/datahub",
        "Changelog": "https://github.com/datahub-project/datahub/releases",
    },
    license="Apache License 2.0",
    description="Datahub prefect block to capture executions and send to Datahub",
    long_description=get_long_description(),
    long_description_content_type="text/markdown",
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3 :: Only",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Intended Audience :: Developers",
        "Intended Audience :: Information Technology",
        "Intended Audience :: System Administrators",
        "License :: OSI Approved",
        "License :: OSI Approved :: Apache Software License",
        "Operating System :: Unix",
        "Operating System :: POSIX :: Linux",
        "Environment :: Console",
        "Environment :: MacOS X",
        "Topic :: Software Development",
    ],
    # Package info.
    zip_safe=False,
    python_requires=">=3.7",
    package_dir={"": "src"},
    packages=setuptools.find_namespace_packages(where="./src"),
    entry_points=entry_points,
    # Dependencies.
    install_requires=list(base_requirements),
    extras_require={
        "dev": list(dev_requirements),
    },
    keywords="datahub prefect metadata integration",
)

71-73: Check entry points for correctness.

Ensure the entry point definition correctly maps to the DatahubEmitter class.

Verification successful

Entry point definition is correct.

The DatahubEmitter class is correctly defined in the prefect_datahub module as specified in the entry point.

  • File: metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/datahub_emitter.py
  • Class Definition: class DatahubEmitter(Block)
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the entry point definition correctly maps to the `DatahubEmitter` class.

# Test: Search for `DatahubEmitter` class definition. Expect: Definition in the specified module.
rg --type python --pattern "class DatahubEmitter" ./src/prefect_datahub/prefect_datahub.py

Length of output: 504



Script:

#!/bin/bash
# Description: Verify that the entry point definition correctly maps to the `DatahubEmitter` class.

# Test: Search for `DatahubEmitter` class definition. Expect: Definition in the specified module.
rg "class DatahubEmitter" ./src/prefect_datahub/prefect_datahub.py

Length of output: 149



Script:

#!/bin/bash
# Description: Verify that the entry point definition correctly maps to the `DatahubEmitter` class.

# Test: Search for `DatahubEmitter` class definition across the entire repository.
rg "class DatahubEmitter"

Length of output: 278
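
For reference, the mapping being verified presumably has this shape in setup.py. The prefect.collections entry-point group follows Prefect's convention for discoverable collections and is an assumption here:

entry_points = {
    # Registers prefect_datahub with Prefect's collection registry so the
    # DatahubEmitter block can be discovered and registered by name.
    "prefect.collections": [
        "prefect_datahub = prefect_datahub",
    ],
}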


28-44: Verify dependency versions for mypy_stubs.

Ensure that the specified versions for mypy_stubs are compatible with the rest of the dependencies.

Verification successful

Verify dependency versions for mypy_stubs.

The specified versions for mypy_stubs appear to be compatible with the rest of the dependencies listed in base_requirements and framework_common. No conflicts were found based on the available version constraints.

  • base_requirements and framework_common contain dependencies with specific versions that do not conflict with the mypy_stubs versions.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the specified versions for `mypy_stubs` are compatible with the rest of the dependencies.

# Test: Check for dependency conflicts. Expect: No conflicts.
pip install types-dataclasses sqlalchemy-stubs types-pkg_resources types-six types-python-dateutil types-requests types-toml types-PyYAML types-freezegun types-cachetools types-click==0.1.12 types-tabulate types-pytz

Length of output: 289



Script:

#!/bin/bash
# Description: List all dependencies and their versions in the setup.py file to manually verify compatibility.

# Locate the setup.py file in the metadata-ingestion-modules/prefect-plugin directory
fd setup.py -a -t f -x cat {} | grep -E "install_requires|mypy_stubs"

Length of output: 404



Script:

#!/bin/bash
# Description: Search for the definitions of base_requirements and framework_common to list all dependencies and their versions.

# Locate the setup.py file in the metadata-ingestion-modules/prefect-plugin directory and search for base_requirements and framework_common definitions
fd setup.py -a -t f -x cat '{}' | grep -A 10 -E 'base_requirements|framework_common'

Length of output: 5934


6-8: Ensure __init__.py contains necessary metadata.

Verify that ./src/prefect_datahub/__init__.py contains the __package_name__ and __version__ keys used later in the setup.

Verification successful

Ensure __init__.py contains necessary metadata.

The metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/__init__.py file contains the __package_name__ and __version__ keys, confirming that the necessary metadata is present.

  • __package_name__ = "prefect-datahub"
  • __version__ = "1!0.0.0.dev0"
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that `__package_name__` and `__version__` are defined in `__init__.py`.

# Test: Search for the definitions. Expect: Definitions of `__package_name__` and `__version__`.
rg --type python --pattern "__package_name__|__version__" ./src/prefect_datahub/__init__.py

Length of output: 505



Script:

#!/bin/bash
# Description: Verify that `__package_name__` and `__version__` are defined in `__init__.py`.

# Test: Search for the definitions. Expect: Definitions of `__package_name__` and `__version__`.
rg '__package_name__|__version__' ./src/prefect_datahub/__init__.py

Length of output: 143



Script:

#!/bin/bash
# Description: Locate `__init__.py` and verify that `__package_name__` and `__version__` are defined.

# Step 1: Locate `__init__.py` in the repository.
fd --type f --name '__init__.py'

# Step 2: Once located, check for the definitions of `__package_name__` and `__version__`.
fd --type f --name '__init__.py' --exec rg '__package_name__|__version__' {}

Length of output: 613



Script:

#!/bin/bash
# Description: Locate `__init__.py` and verify that `__package_name__` and `__version__` are defined.

# Step 1: Locate `__init__.py` in the repository.
fd '__init__.py'

# Step 2: Once located, check for the definitions of `__package_name__` and `__version__`.
fd '__init__.py' --exec rg '__package_name__|__version__' {}

Length of output: 10052
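
The usual pattern in DataHub's setup.py files for consuming these keys is to exec the module source into a dict. A sketch, assuming the layout confirmed above:

package_metadata: dict = {}
with open("./src/prefect_datahub/__init__.py") as f:
    # Run the module source so its top-level assignments land in the dict.
    exec(f.read(), package_metadata)

assert package_metadata["__package_name__"] == "prefect-datahub"
assert package_metadata["__version__"] == "1!0.0.0.dev0"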

docs-website/build.gradle (1)

89-89: LGTM! Ensure the build task is necessary.

The addition of the prefect-plugin:buildWheel task looks good.

Verify that this task is necessary for the documentation build process.

Verification successful

The buildWheel task for the prefect-plugin is necessary.

The buildWheel task is consistently referenced across multiple Gradle files, indicating its necessity in the build process.

  • metadata-ingestion-modules/prefect-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the `prefect-plugin:buildWheel` task is necessary for the documentation build process.

# Test: Search for references to `buildWheel` tasks. Expect: References in relevant build scripts.
rg --type gradle --pattern "buildWheel"

Length of output: 451



Script:

#!/bin/bash
# Description: Verify that the `prefect-plugin:buildWheel` task is necessary for the documentation build process.

# Test: Search for references to `buildWheel` tasks. Expect: References in relevant build scripts.
rg "buildWheel" --type gradle

Length of output: 813

.github/workflows/build-and-test.yml (1)

89-90: Ensure exclusions are necessary.

The exclusion of the prefect-plugin:build and prefect-plugin:check tasks looks good.

Verify that these exclusions are necessary for the build process.

Verification successful

Exclusions are necessary.

The exclusions for prefect-plugin:build and prefect-plugin:check tasks are correctly specified and necessary for the build process.

  • The metadata-ingestion-modules/prefect-plugin/build.gradle file includes references to both build and check tasks, confirming their presence in the build process.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the exclusions for `prefect-plugin:build` and `prefect-plugin:check` tasks are necessary for the build process.

# Test: Search for references to `build` and `check` tasks. Expect: References in relevant build scripts.
rg --type gradle --pattern "build|check"

Length of output: 454



Script:

#!/bin/bash
# Description: Verify that the exclusions for `prefect-plugin:build` and `prefect-plugin:check` tasks are necessary for the build process.

# Test: Search for references to `build` and `check` tasks. Expect: References in relevant build scripts.
rg "build|check" --type gradle

Length of output: 19935

metadata-ingestion-modules/prefect-plugin/build.gradle (13)

1-3: LGTM! Ensure the base plugin is necessary.

The inclusion of the base plugin looks good.

Verify that the base plugin is necessary for the build process.

Verification successful

The base plugin is necessary for the build process.

The base plugin is referenced in multiple build.gradle files across different modules, indicating its necessity for the build configuration.

  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • entity-registry/custom-test-model/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the `base` plugin is necessary for the build process.

# Test: Search for references to the `base` plugin. Expect: References in relevant build scripts.
rg --type gradle --pattern "id 'base'"

Length of output: 458



Script:

#!/bin/bash
# Description: Verify that the `base` plugin is necessary for the build process.

# Test: Search for references to the `base` plugin. Expect: References in relevant build scripts.
rg "id 'base'" --type gradle

Length of output: 348


30-41: Ensure package installation task is necessary.

The task to install the package looks good.

Verify that this task is necessary for the build process.

Verification successful

The installPackage task is necessary for the build process.

The installPackage task is referenced in multiple build scripts, confirming its necessity.

  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the package installation task is necessary for the build process.

# Test: Search for references to the package installation task. Expect: References in relevant build scripts.
rg --type gradle --pattern "installPackage"

Length of output: 455



Script:

#!/bin/bash
# Description: Verify that the package installation task is necessary for the build process.

# Test: Search for references to the package installation task. Expect: References in relevant build scripts.
rg "installPackage" --type gradle

Length of output: 1360


14-15: LGTM! Ensure the pip_install_command definition is correct.

The definition for the pip_install_command looks good.

Verify that this definition is correct for the environment.

Verification successful

LGTM! Ensure the pip_install_command definition is correct.

The definition for the pip_install_command looks good. It is referenced multiple times in the metadata-ingestion-modules/prefect-plugin/build.gradle file, confirming its usage.

  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the definition for the `pip_install_command` is correct for the environment.

# Test: Search for references to `pip_install_command`. Expect: References in relevant build scripts.
rg --type gradle --pattern "pip_install_command"

Length of output: 460



Script:

#!/bin/bash
# Description: Verify that the definition for the `pip_install_command` is correct for the environment.

# Test: Search for references to `pip_install_command`. Expect: References in relevant build scripts.
rg "pip_install_command" --type gradle

Length of output: 1467


10-12: LGTM! Ensure the extra_pip_requirements property is necessary.

The definition for the extra_pip_requirements property looks good.

Verify that this property is necessary for the build process.

Verification successful

The extra_pip_requirements property is necessary for the build process.

The property is referenced across multiple build scripts and is used in pip installation commands, indicating its importance in managing pip dependencies.

  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • datahub-web-react/src/app/ingest/source/builder/NameSourceStep.tsx
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the `extra_pip_requirements` property is necessary for the build process.

# Test: Search for references to `extra_pip_requirements`. Expect: References in relevant build scripts.
rg --type gradle --pattern "extra_pip_requirements"

Length of output: 463



Script:

#!/bin/bash
# Description: Verify that the `extra_pip_requirements` property is necessary for the build process.

# Test: Search for references to `extra_pip_requirements` across all files.
rg "extra_pip_requirements"

Length of output: 2217


20-27: Ensure environment setup task is necessary.

The task to set up the environment looks good.

Verify that this task is necessary for the build process.

Verification successful

Environment setup task is necessary.

The environmentSetup task is a prerequisite for multiple tasks across various modules, confirming its necessity for the build process.

  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
  • metadata-ingestion-modules/prefect-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the environment setup task is necessary for the build process.

# Test: Search for references to the environment setup task. Expect: References in relevant build scripts.
rg --type gradle --pattern "environmentSetup"

Length of output: 457



Script:

#!/bin/bash
# Description: Verify that the environment setup task is necessary for the build process.

# Test: Search for references to the environment setup task. Expect: References in relevant build scripts.
rg "environmentSetup" --type gradle

Length of output: 1702


46-54: Ensure development installation task is necessary.

The task to install development dependencies looks good.

Verify that this task is necessary for the build process.

Verification successful

The installDev task is necessary for the build process.

The installDev task is referenced in multiple build.gradle files and is a dependency for various other tasks across the project.

  • smoke-test/build.gradle
  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle

These references confirm its necessity within the build process.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the development installation task is necessary for the build process.

# Test: Search for references to the development installation task. Expect: References in relevant build scripts.
rg --type gradle --pattern "installDev"

Length of output: 451



Script:

#!/bin/bash
# Description: Verify that the development installation task is necessary for the build process.

# Test: Search for references to the development installation task. Expect: References in relevant build scripts.
rg "installDev" --type gradle

Length of output: 4020


5-8: LGTM! Ensure the Python executable and virtual environment name are correct.

The definitions for python_executable and venv_name look good.

Verify that these definitions are correct for the environment.

Verification successful

Verification successful for python_executable and venv_name.

The definitions for python_executable and venv_name are consistent and correctly used across the relevant build scripts.

  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • smoke-test/build.gradle
  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the definitions for `python_executable` and `venv_name` are correct for the environment.

# Test: Search for references to `python_executable` and `venv_name`. Expect: References in relevant build scripts.
rg --type gradle --pattern "python_executable|venv_name"

Length of output: 470



Script:

#!/bin/bash
# Description: Verify that the definitions for `python_executable` and `venv_name` are correct for the environment.

# Test: Search for references to `python_executable` and `venv_name`. Expect: References in relevant build scripts.
rg 'python_executable|venv_name' --type gradle

Length of output: 12093


16-18: Ensure Python version check is necessary.

The task to check the Python version looks good.

Verify that this task is necessary for the build process.

Verification successful

Python version check task is necessary for the build process.

The checkPythonVersion task is referenced as a dependency for the environmentSetup task in multiple build.gradle files, indicating its necessity for the build process.

  • metadata-ingestion-modules/dagster-plugin/build.gradle
  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion-modules/airflow-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the Python version check task is necessary for the build process.

# Test: Search for references to the Python version check task. Expect: References in relevant build scripts.
rg --type gradle --pattern "checkPythonVersion"

Length of output: 459



Script:

#!/bin/bash
# Description: Verify that the Python version check task is necessary for the build process.

# Test: Search for references to the Python version check task. Expect: References in relevant build scripts.
rg "checkPythonVersion" --type gradle

Length of output: 851


73-80: Ensure development test installation task is necessary.

The task to install development test dependencies looks good.

Verify that this task is necessary for the build process.

Verification successful

Verified: The installDevTest task is necessary for the build process.

The task is referenced by multiple other tasks, confirming its necessity.

  • metadata-ingestion-modules/prefect-plugin/build.gradle
  • metadata-ingestion/build.gradle
  • metadata-ingestion-modules/dagster-plugin/build.gradle
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the development test installation task is necessary for the build process.

# Test: Search for references to the development test installation task. Expect: References in relevant build scripts.
rg --type gradle --pattern "installDevTest"

Length of output: 455



Script:

#!/bin/bash
# Description: Verify that the development test installation task is necessary for the build process.

# Test: Search for references to the development test installation task in Gradle files.
rg "installDevTest" --type gradle

Length of output: 1697


56-63: Ensure linting task is necessary.

The task to lint the code looks good.

Verify that this task is necessary for the build process.

Verification successful

The linting task is necessary for the build process.

The lint task is referenced as a dependency of the check task in the metadata-ingestion-modules/prefect-plugin/build.gradle file, confirming its necessity.

  • metadata-ingestion-modules/prefect-plugin/build.gradle: check.dependsOn lint
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the linting task is necessary for the build process.

# Test: Search for references to the linting task. Expect: References in relevant build scripts.
rg --type gradle --pattern "lint"

Length of output: 445



Script:

#!/bin/bash
# Description: Verify that the linting task is necessary for the build process.

# Test: Search for references to the linting task. Expect: References in relevant build scripts.
rg "lint" --type gradle

Length of output: 1714


96-103: Ensure quick test task is necessary.

The task to run quick tests looks good.

Verify that this task is necessary for the build process.

Verification successful

The testQuick task is necessary for the build process.

The testQuick task is referenced in multiple build scripts, indicating its importance in the build process. Specifically, it is a dependency for the check task and the testFull task.

  • Files and references:
    • metadata-ingestion-modules/dagster-plugin/build.gradle
      • Line: task testQuick(type: Exec, dependsOn: installDevTest)
      • Line: task testFull(type: Exec, dependsOn: [testQuick, installDevTest])
      • Line: check.dependsOn testQuick
    • metadata-ingestion-modules/prefect-plugin/build.gradle
      • Line: task testQuick(type: Exec, dependsOn: installDevTest)
      • Line: task testFull(type: Exec, dependsOn: [testQuick, installDevTest])
      • Line: check.dependsOn testQuick
    • metadata-ingestion/build.gradle
      • Line: task testQuick(type: Exec, dependsOn: [installDev, ':metadata-models:generateJsonSchema'])
      • Line: check.dependsOn testQuick
    • metadata-ingestion-modules/airflow-plugin/build.gradle
      • Line: task testQuick(type: Exec, dependsOn: installTest)
      • Line: check.dependsOn testQuick
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the quick test task is necessary for the build process.

# Test: Search for references to the quick test task. Expect: References in relevant build scripts.
rg --type gradle --pattern "testQuick"

Length of output: 450



Script:

#!/bin/bash
# Description: Verify that the quick test task is necessary for the build process.

# Test: Search for references to the quick test task. Expect: References in relevant build scripts.
rg "testQuick" --type gradle

Length of output: 1030


82-94: Ensure single test task is necessary.

The task to run a single test looks good.

Verify that this task is necessary for the build process.

Verification successful

Ensure single test task is necessary.

The task testSingle is indeed referenced in the documentation (metadata-ingestion/developing.md), indicating that it is used for running specific test files. This confirms its necessity for the build process.

  • metadata-ingestion/developing.md: References to testSingle for running specific test files.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the single test task is necessary for the build process.

# Test: Search for references to the single test task. Expect: References in relevant build scripts.
rg --type gradle --pattern "testSingle"

Length of output: 451



Script:

#!/bin/bash
# Description: Verify that the single test task is necessary for the build process.

# Test: Search for references to the single test task. Expect: References in relevant build scripts.
rg "testSingle" --type gradle

Length of output: 312



Script:

#!/bin/bash
# Description: Verify if the single test task is invoked or referenced elsewhere in the codebase.

# Test: Search for invocations or references to the single test task.
rg "testSingle"

Length of output: 705


64-71: Ensure lint fixing task is necessary.

The task to fix linting issues looks good.

Verify that this task is necessary for the build process.

Verification successful

Lint fixing task is necessary

The lintFix task is referenced in multiple build scripts and is part of the dependency chain for running tests (installTest and testQuick). This confirms its necessity in the build process.

  • metadata-ingestion-modules/airflow-plugin/build.gradle: installTest should run after lintFix.
  • metadata-ingestion-modules/airflow-plugin/build.gradle: testQuick depends on installTest.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the lint fixing task is necessary for the build process.

# Test: Search for references to the lint fixing task. Expect: References in relevant build scripts.
rg --type gradle --pattern "lintFix"

Length of output: 448



Script:

#!/bin/bash
# Description: Verify that the lint fixing task is necessary for the build process.

# Test: Search for references to the lint fixing task. Expect: References in relevant build scripts.
rg "lintFix" --type gradle

Length of output: 513



Script:

#!/bin/bash
# Description: Verify the role of the `installTest` task in the build process.

# Test: Search for references to the `installTest` task. Expect: References in relevant build scripts.
rg "installTest" --type gradle

Length of output: 426

metadata-ingestion-modules/prefect-plugin/src/prefect_datahub/datahub_emitter.py (10)

104-114: LGTM!

The method correctly converts a list of _Entity objects to a list of DatasetUrn objects.


116-145: LGTM!

The method fetches the workspace name and handles errors gracefully.


174-189: LGTM!

The method correctly emits the browse path.


190-254: LGTM!

The method correctly generates a DataJob entity and handles various scenarios.


256-324: LGTM!

The method correctly generates a DataFlow entity and handles various scenarios.


326-394: LGTM!

The method correctly emits task metadata and handles upstream dependencies.


395-458: LGTM!

The method correctly emits flow run metadata and handles various scenarios.


459-552: LGTM!

The method correctly emits task run metadata and handles various scenarios.


554-612: LGTM!

The method correctly stores the current running task metadata.


613-659: LGTM!

The method correctly emits the current running flow metadata.

f"(prefect,{flow_run_ctx.flow.name},PROD),{task_run_ctx.task.task_key})"
)

assert expected_datajob_urn in datahub_emitter._datajobs_to_emit.keys()

Optimize dictionary key check.

Use `key in dict` instead of `key in dict.keys()`.

-  assert expected_datajob_urn in datahub_emitter._datajobs_to_emit.keys()
+  assert expected_datajob_urn in datahub_emitter._datajobs_to_emit
Committable suggestion

Suggested change
assert expected_datajob_urn in datahub_emitter._datajobs_to_emit.keys()
assert expected_datajob_urn in datahub_emitter._datajobs_to_emit
Tools
Ruff

550-550: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
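
For illustration, both membership tests below are equivalent for a plain dict; testing against the dict itself is the idiomatic form flagged by SIM118 and avoids building a keys view (a small hypothetical check, not code from this PR):

d = {"urn:li:dataJob:abc": object()}

# Both assertions pass; `in d` is the idiomatic spelling (Ruff SIM118).
assert "urn:li:dataJob:abc" in d
assert "urn:li:dataJob:abc" in d.keys()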

Comment on lines 1 to 3
import asyncio

from prefect_datahub.datahub_emitter import DatahubEmitter

Consider making the configuration values externally configurable.

Hardcoding values like `datahub_rest_url`, `env`, `platform_instance`, and `token` reduces flexibility. Consider accepting these as parameters or reading them from environment variables.

import os

datahub_rest_url = os.getenv("DATAHUB_REST_URL", "http://localhost:8080")
env = os.getenv("ENV", "DEV")
platform_instance = os.getenv("PLATFORM_INSTANCE", "local_prefect")
token = os.getenv("TOKEN", None)  # generate auth token in the datahub and provide here if gms endpoint is secure

datahub_emitter = DatahubEmitter(
    datahub_rest_url=datahub_rest_url,
    env=env,
    platform_instance=platform_instance,
    token=token,
)

Comment on lines 6 to 14
async def save_datahub_emitter():
    datahub_emitter = DatahubEmitter(
        datahub_rest_url="http://localhost:8080",
        env="DEV",
        platform_instance="local_prefect",
        token=None,  # generate auth token in the datahub and provide here if gms endpoint is secure
    )

    await datahub_emitter.save("BLOCK-ID", overwrite=True)

Add error handling for the asynchronous operation.

The function `save_datahub_emitter` lacks error handling around the `await datahub_emitter.save(...)` call. Consider adding a try/except block to handle potential exceptions.

async def save_datahub_emitter():
    try:
        datahub_emitter = DatahubEmitter(
            datahub_rest_url="http://localhost:8080",
            env="DEV",
            platform_instance="local_prefect",
            token=None,  # generate auth token in the datahub and provide here if gms endpoint is secure
        )
        await datahub_emitter.save("BLOCK-ID", overwrite=True)
    except Exception as e:
        print(f"Failed to save DatahubEmitter: {e}")

Comment on lines 10 to 14
async def load_datahub_emitter():
    datahub_emitter = DatahubEmitter()
    emitter = datahub_emitter.load("BLOCK-ID")
    print(emitter)
    return emitter

Consider making the block ID a parameter.

The function is well-implemented, but making the block ID a parameter would improve flexibility.

-async def load_datahub_emitter():
-    emitter = datahub_emitter.load("BLOCK-ID")
+async def load_datahub_emitter(block_id: str):
+    emitter = datahub_emitter.load(block_id)

Committable suggestion was skipped due to low confidence.
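
A self-contained sketch of the parameterized version, mirroring the example under review; the `block_id` name and the `asyncio.run` driver are illustrative, not part of the original:

import asyncio

from prefect_datahub.datahub_emitter import DatahubEmitter


async def load_datahub_emitter(block_id: str) -> DatahubEmitter:
    # Load a previously saved DatahubEmitter block by its ID.
    datahub_emitter = DatahubEmitter()
    emitter = datahub_emitter.load(block_id)
    print(emitter)
    return emitter


if __name__ == "__main__":
    asyncio.run(load_datahub_emitter("BLOCK-ID"))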

Comment on lines +23 to +46
@task(name="Transform", description="Transform the data")
def transform(
    data: str, datahub_emitter: DatahubEmitter
) -> Tuple[List[str], DatahubEmitter]:
    data_list_str = data.split(" ")
    datahub_emitter.add_task(
        inputs=[
            Dataset(
                platform="snowflake",
                name="mydb.schema.tableA",
                env=datahub_emitter.env,
                platform_instance=datahub_emitter.platform_instance,
            )
        ],
        outputs=[
            Dataset(
                platform="snowflake",
                name="mydb.schema.tableB",
                env=datahub_emitter.env,
                platform_instance=datahub_emitter.platform_instance,
            )
        ],
    )
    return data_list_str, datahub_emitter

Consider passing the task's lineage details as parameters.

The function is well-implemented, but accepting the input and output datasets as parameters would improve flexibility; a self-contained sketch follows the suggestion below.

-    datahub_emitter.add_task(
-        inputs=[
-            Dataset(
-                platform="snowflake",
-                name="mydb.schema.tableA",
-                env=datahub_emitter.env,
-                platform_instance=datahub_emitter.platform_instance,
-            )
-        ],
-        outputs=[
-            Dataset(
-                platform="snowflake",
-                name="mydb.schema.tableB",
-                env=datahub_emitter.env,
-                platform_instance=datahub_emitter.platform_instance,
-            )
-        ],
-    )
+    datahub_emitter.add_task(inputs=inputs, outputs=outputs)

Committable suggestion was skipped due to low confidence.
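
A minimal sketch of the parameterized task, assuming the caller supplies the lineage datasets; the `inputs`/`outputs` parameter names are illustrative, and the `Dataset` import path is assumed to match the example under review:

from typing import List, Tuple

from prefect import task
from prefect_datahub.datahub_emitter import DatahubEmitter
from prefect_datahub.entities import Dataset


@task(name="Transform", description="Transform the data")
def transform(
    data: str,
    datahub_emitter: DatahubEmitter,
    inputs: List[Dataset],
    outputs: List[Dataset],
) -> Tuple[List[str], DatahubEmitter]:
    # Lineage endpoints are passed in, so the task is reusable across
    # tables instead of being hard-wired to tableA/tableB.
    data_list_str = data.split(" ")
    datahub_emitter.add_task(inputs=inputs, outputs=outputs)
    return data_list_str, datahub_emitter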

Config | Type | Default | Description
--- | --- | --- | ---
datahub_rest_url | `str` | *http://localhost:8080* | DataHub GMS REST URL
env | `str` | *PROD* | The environment that all assets produced by this orchestrator belong to. For more detail and possible values refer [here](https://datahubproject.io/docs/graphql/enums/#fabrictype).

Add a comma after "possible values".

A comma is needed between the introductory phrase "For more detail and possible values" and "refer".

- For more detail and possible values refer [here]
+ For more detail and possible values, refer [here]
Committable suggestion

Suggested change
env | `str` | *PROD* | The environment that all assets produced by this orchestrator belong to. For more detail and possible values refer [here](https://datahubproject.io/docs/graphql/enums/#fabrictype).
env | `str` | *PROD* | The environment that all assets produced by this orchestrator belong to. For more detail and possible values, refer [here](https://datahubproject.io/docs/graphql/enums/#fabrictype).
Tools
LanguageTool

[uncategorized] ~41-~41: Possible missing comma found.
Context: ...belong to. For more detail and possible values refer [here](https://datahubproject.io/...

(AI_HYDRA_LEO_MISSING_COMMA)


### Load the saved block in prefect workflows

After installing `prefect-datahub` and [saving the configution](#saving-configurations-to-a-block), you can easily use it within your prefect workflows to help you emit metadata event as show below!

Correct verb form: "shown"

The correct verb form is "shown".

- as show below!
+ as shown below!
Committable suggestion

Suggested change
After installing `prefect-datahub` and [saving the configution](#saving-configurations-to-a-block), you can easily use it within your prefect workflows to help you emit metadata event as show below!
After installing `prefect-datahub` and [saving the configution](#saving-configurations-to-a-block), you can easily use it within your prefect workflows to help you emit metadata event as shown below!
Tools
LanguageTool

[uncategorized] ~81-~81: This verb may not be in the correct form. Consider using a different form for this context.
Context: ...lows to help you emit metadata event as show below! ```python import asyncio from ...

(AI_EN_LECTOR_REPLACEMENT_VERB_FORM)

Comment on lines +11 to +13
def get_long_description():
    root = os.path.dirname(__file__)
    return pathlib.Path(os.path.join(root, "README.md")).read_text()

Use pathlib consistently.

For consistency, use `pathlib` for path operations instead of `os.path`.

def get_long_description():
    root = pathlib.Path(__file__).parent
    return (root / "README.md").read_text()

@@ -241,4 +253,4 @@ tox -- --update-golden-files

# Update golden files for a specific environment.
tox -e py310-airflow26 -- --update-golden-files
```
```

Add a single newline character at the end of the file.

Files should end with a single newline character.

Committable suggestion was skipped due to low confidence.

Tools
Markdownlint

256-256: null
Files should end with a single newline character

(MD047, single-trailing-newline)

Comment on lines +106 to +109
task testFull(type: Exec, dependsOn: [testQuick, installDevTest]) {
    commandLine 'bash', '-x', '-c',
        "source ${venv_name}/bin/activate && pytest -m 'not slow_integration' -vv --continue-on-collection-errors --junit-xml=junit.full.xml"
}

Tip

Codebase Verification

Ensure full test task is necessary.

The testFull task is referenced in the metadata-ingestion/developing.md file but is not used in the build process itself.

  • metadata-ingestion/developing.md: ../gradlew :metadata-ingestion:testFull

This indicates that the testFull task is used during the development process but not required for the build process.

Analysis chain

Ensure full test task is necessary.

The task to run full tests looks good.

Verify that this task is necessary for the build process.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the full test task is necessary for the build process.

# Test: Search for references to the full test task. Expect: References in relevant build scripts.
rg --type gradle --pattern "testFull"

Length of output: 449



Script:

#!/bin/bash
# Description: Verify that the full test task is necessary for the build process.

# Test: Search for references to the full test task. Expect: References in relevant build scripts.
rg "testFull" --type gradle

Length of output: 362



Script:

#!/bin/bash
# Description: Verify that the full test task is necessary for the build process.

# Test: Search for references to the full test task in all files within the repository.
rg "testFull"

Length of output: 421

@david-leifker removed the devops label (PR or Issue related to DataHub backend & deployment) on Jul 31, 2024
Fix build
@treff7es treff7es merged commit 6204cba into datahub-project:master Aug 29, 2024
72 of 75 checks passed
Labels: docs (Issues and Improvements to docs), ingestion (PR or Issue related to the ingestion of metadata)