Releases: datahub-project/datahub

DataHub v0.8.33

15 Apr 18:46
72046bf

Release Highlights

User Experience

Refreshed the ML Entity page to match the feel of all other entity types; improved ML lineage functionality

Ingestion Improvements

  • Airflow Improvements - as demoed in the March Town Hall
    • Add support to capture Airflow execution runs from the lineage backend
    • Introduce a new high-level API for generating dataflow/job/dataprocessinstance
  • MS SQL ingestion now captures table & column descriptions
  • Trino platform support for Great Expectations
  • New Presto-on-Hive ingestion source
  • BigQuery ingestion now supports extraction of usage info from audit logs
  • Fix to Looker ingestion to extract Explore Views from join names
  • Fix to Tableau ingestion to avoid duplicating schema in URNs for upstream tables
  • Simplify & annotate Redshift Usage source

Full Commit Log

  • feat(gms): Expose kafka listener concurrency as a GMS setting by @jjoyce0510 in #4536
  • feat(ingest): add option for external Spark cluster by @kevinhu in #4571
  • fix(upgrade): Renaming kafka producer since it clashes with spring-internal by @dexter-mh-lee in #4573
  • feat(GraphQL): Add data platform query to GraphQL API by @jjoyce0510 in #4574
  • build(ui): Fix Windows UI lint by @mattmatravers in #4556
  • doc: make note prominent on quickstart by @anshbansal in #4558
  • fix(protobuf) minor bugfixes for protobuf by @leifker in #4553
  • feat(docs) Improves docs around developing datahub, removes deprecated docs on building metadata service by @pedro93 in #4552
  • chore: cleanup extra file by @anshbansal in #4541
  • feat(snowflake): reduce permissions provisioned by default by @anshbansal in #4543
  • fix(ingestion): Redshift usage refactoring - simplify, annotate, fix bugs by @rslanka in #4572
  • fix(graphql): Adding PRE FabricType to GraphQL by @jjoyce0510 in #4582
  • feat(search) - add DATETIME FieldType by @aditya-radhakrishnan in #4407
  • fix(tableau): fix for incorrect schema returned by tableau api for sn… by @mayurinehate in #4577
  • chore: update default cli for managed ingestion by @anshbansal in #4581
  • feat(okta) - add support for filtering/searching when ingesting Okta groups and users by @aditya-radhakrishnan in #4586
  • doc(snowflake): add example of table pattern by @anshbansal in #4580
  • fix(doc): try to fix broken link by @daha in #4593
  • fix(bigquery): incorrect lineage when views are present by @anshbansal in #4568
  • feat(metadata-service): Supporting a configurable Authorizer Chain by @jjoyce0510 in #4584
  • fix(search): Make sure home page and search pages are consistent by @dexter-mh-lee in #4588
  • fix(browse): Reduce browse aggregation size by @dexter-mh-lee in #4601
  • doc: add page for handling deprecations, breaking changes etc. by @anshbansal in #4590
  • docs(GraphQL): fix typo by @Falci in #4605
  • feat(search): Add SearchScore annotation to use fields for search ranking by @dexter-mh-lee in #4596
  • feat(ingestion): Redshift Usage Source - simplify OperationalStats workunit generation. by @rslanka in #4585
  • feat(tableau): add some logic to normalize table names in tableau by @gabe-lyons in #4609
  • fix: urlencode slash in urns too by @daha in #4527
  • fix(bigquery): fix lineage bug, improve docs, add dataset filter config by @anshbansal in #4607
  • fix(protobuf) fix test instability by @leifker in #4612
  • fix(ui): Fix dashboard tags display by @jjoyce0510 in #4611
  • feat(ui): Adding GraphQL queries to fetch entity deprecation status by @jjoyce0510 in #4614
  • feat(ingest): enable connection string for all sqlalchemy datasources by @ms32035 in #4508
  • fix(docs): add grant statements for redshift-ingestion by @Abhiram98 in #4559
  • chore: fix lint and remove incorrect integration mark from unit tests by @anshbansal in #4621
  • feat: adding gradle, pip cache via gh cache, docker cache via dockerhub by @anshbansal in #4387
  • doc(scheduling): make it easier to find ui ingestion by @anshbansal in #4610
  • feat(glue): add CatalogId parameter for cross-account access by @BoyuanZhangDE in #4608
  • doc(cli): add env variables and options for ingest command by @anshbansal in #4598
  • fix(ingest): Restricting pytest docker version to <0.12 by @treff7es in #4639
  • fix(cypress) - add waits for cypress search test to remove flakiness by @aditya-radhakrishnan in #4640
  • Revert "feat: adding gradle, pip cache via gh cache, docker cache via dockerhub" by @dexter-mh-lee in #4637
  • feat(search): Only reindex if the mappings for an existing field changed by @dexter-mh-lee in #4629
  • feat: add presto-on-hive metadata ingestion source by @jchen0824 in #4625
  • feat(ingest): add trino platform for great expectations by @ms32035 in #4594
  • fix(kafka): Stop overriding kafka registry props with empty values by @jsotelo in #4604
  • [model]: Dataprocess instance entity to model datajob/jobflow runs by @treff7es in #4459
  • feat(ingest): add Urn python library for DataJob, DataFlow, Domain and Tag by @tc350981 in #4618
  • fix(ingestion): ensure source/sink reports are always logged by @anshbansal in #4592
  • fix(ingestion): extract explore views from join name in Looker by @dyanarose in #4627
  • feat(ingestion): Enable lower-casing of the name part of dataset urn if env variable is set. by @rslanka in #4649
  • feat: Enable the ingestion of bigquery audit logs to parse usage info… by @tha23rd in #4441
  • fix(ingest): Fix snowflake KEY_PAIR auth by @mkamalas in #4638
  • fix(home): Fix issue where some browse cards are missing by @dexter-mh-lee in #4652
  • fix(tableau): avoid duplicate schema in URNs for upstream tables by @maaaikoool in #4645
  • feat(ingest): capture MSSQL table+column descriptions by @kevinhu in #4579
  • feat(ml): bringing ml screens up to date w/ the modern ui layout & improving ml lineage by @gabe-lyons in #4651
  • (feat:airflow) Add support to capture airflow executions + high level dataflow/jobs api by @treff7es in #4615
  • fix(ingestion): add missing workunit ids by @anshbansal in #4657
  • fix(ingestion): Adding missing init.py by @anshbansal in #4659
  • fix(bigquery-usage): missing dependency by @anshbansal in #4661
  • feat(cypress) - add cypress dashboard view to CI by @aditya-radhakrishnan in #4654
  • feat(autocomplete): show fully qualified name in autocomplete by @gabe-lyons in #4663
  • feat(ingestion) dbt: Fixing issue with strip_user_ids_from_email and adding owner_naming_pattern by @arunvasudevan in #4587
  • fix(sqlparser): fix sqlparser breaking due to # sign by @anshbansal in #4662
  • fix(ingestion): validate datasource in Tableau connector, before creating its upstream by @nandacamargo in #4613
  • Added Relative Routing on the Users & Groups screen by @Ankit-Keshari-Vituity in #4664
  • fix(airflow): Not importing emitters directly to eliminate unneeded dependency by @treff7es in #4668
  • docs:...
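
The "urlencode slash in urns too" fix (#4527) concerns URNs whose names contain "/", such as path-like dataset names. The change itself is not reproduced here; the following is only a minimal standard-library sketch of the idea, and the helper name and sample URN are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_urn_for_url(urn: str) -> str:
    """Percent-encode a URN so it fits in a single URL path segment.

    Hypothetical helper for illustration; not DataHub's own code.
    """
    # safe="" makes quote() encode "/" as well. With the default
    # safe="/", slashes would be left as-is and split the path.
    return quote(urn, safe="")


# Sample URN with slashes in the dataset name (made up for this example).
urn = "urn:li:dataset:(urn:li:dataPlatform:s3,my-bucket/raw/events,PROD)"
encoded = encode_urn_for_url(urn)

print(encoded)
print(unquote(encoded) == urn)
```

Decoding the encoded value with `unquote` returns the original URN, so the encoding is reversible.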

DataHub v0.8.32

04 Apr 21:27
ede6547

Release Highlights

User Experience

We're excited to announce View-based RBAC Policies! You can now create and apply view-only permissions to your DataHub end-users, providing more robust access controls.

We've also included some small (but impactful!) improvements to UX, including:

  • Display recent search terms when beginning the search flow
  • Consistently display entity subtypes for dbt, Looker, Kafka, & more. For example, Kafka entities are now displayed as "topics" instead of "datasets"

Ingestion Highlights

  • New! Protobuf ingestion (shoutout to @leifker for this Community-led contribution!)
  • Initial work to support a "Notebook" entity (shoutout to @tc350981 for spearheading this work!!)
  • Stateful ingestion for dbt is now supported
  • Ongoing improvements to our Tableau ingestion source from @nandacamargo & @cuong-pham
  • Improvements to handling database aliases for Redshift ingestion
  • Improvements to S3 source:
    • Add containers for datasets
    • Support platform_instance
    • Support for folder level datasets
    • Increased flexibility to specify dataset paths
  • Ingestion Fixes:
    • Snowflake Usage - log a warning instead of erroring out, plus other error-handling improvements
    • Snowflake allow/deny patterns
    • Examples of allow/deny patterns added to docs
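
For readers new to allow/deny patterns, here is a rough sketch of how such filtering typically behaves. It assumes the patterns are regular expressions matched from the start of the fully qualified name, and that a deny match always wins. The function name and table names are hypothetical; this is not the ingestion framework's own implementation.

```python
import re
from typing import Iterable


def is_allowed(
    name: str,
    allow: Iterable[str] = (".*",),
    deny: Iterable[str] = (),
) -> bool:
    """Return True if `name` passes the allow/deny filter (illustrative only)."""
    # Assumption: deny patterns take precedence over allow patterns.
    if any(re.match(pattern, name) for pattern in deny):
        return False
    # A name is kept only if at least one allow pattern matches.
    return any(re.match(pattern, name) for pattern in allow)


# Hypothetical fully qualified table names.
tables = [
    "analytics.public.orders",
    "analytics.public.tmp_orders",
    "raw.staging.events",
]

kept = [
    table
    for table in tables
    if is_allowed(table, allow=[r"analytics\..*"], deny=[r".*\.tmp_.*"])
]
print(kept)
```

With these inputs only `analytics.public.orders` is kept: the `tmp_` table is denied, and the `raw` table matches no allow pattern.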

DataHub v0.8.31

17 Mar 23:22
2f078c9

Bugfix release that prevents reindexing of the system metadata index in Elasticsearch from failing

Full Commit Log

  • #4440 @pedro93 fix(cli) Makes filtered search deletes include BOTH removed and non-removed
  • #4444 @pedro93 fix(cli) Adds elasticsearch mapping
  • #4432 @leifker feat(protobuf): Gradle protobuf example project

DataHub v0.8.30

17 Mar 13:55
2d82531

V0.8.30

Release Highlights

  • Fix for OIDC encryption bug from v0.8.29
  • Adds platform instance id to the container id generation, and support for migrating the old container ids to the new ones via the datahub migrate CLI.
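
A migration is needed because an id derived from a set of key fields changes as soon as a new field is added to that set. The sketch below illustrates the effect only. The hashing scheme (MD5 over sorted JSON), the field names, and the sample values are all assumptions made for this example, not DataHub's actual id generation.

```python
import hashlib
import json
from typing import Optional


def container_id(platform: str, database: str, instance: Optional[str] = None) -> str:
    """Derive a deterministic id from a container's key fields (hypothetical)."""
    key = {"platform": platform, "database": database}
    if instance is not None:
        # Adding the platform instance changes the serialized key,
        # and therefore the resulting id.
        key["instance"] = instance
    serialized = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


old_id = container_id("snowflake", "analytics")
new_id = container_id("snowflake", "analytics", instance="prod_us")

print(old_id)
print(new_id)
print(old_id != new_id)
```

Because the two ids differ, containers written under the old scheme have to be mapped to the new ids, which is what the migration support in the datahub migrate CLI addresses.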

Notable UI-Based Features

  • Showing recent searches in autocomplete.

What's Changed

  • fix(ui): some small ui fixes for lineage by @gabe-lyons in #4381
  • fix(docs): change cabify link by @maaaikoool in #4373
  • Fixed Bug: Alpha slider doesn’t move, only the color slider is movable in tag color picker by @Ankit-Keshari-Vituity in #4359
  • feat(GE): add option to disable sql parsing, use default parser by @mayurinehate in #4377
  • fix(removed): Make sure removed entities do not appear on recommendations by @dexter-mh-lee in #4353
  • fix(browse): fix browse double click issue by @gabe-lyons in #4382
  • fix(oidc): Update group membership each login (and make group extraction disabled by default) by @jjoyce0510 in #4380
  • feat(ingestion): add java protobuf schema ingestion by @leifker in #4178
  • Docs/update docs by @RyanHolstien in #4393
  • Revert "Fixed Bug: Alpha slider doesn’t move, only the color slider is movable in tag color picker" by @gabe-lyons in #4390
  • feat(ingestion): improve logging, docs for bigquery, snowflake, redshift by @anshbansal in #4344
  • fix(ingest) Azure AD: support nested groups (#4367) by @cccs-eric in #4368
  • fix: add missing logo by @anshbansal in #4386
  • feat(spark-lineage): add support to custom env and platform_instance by @MugdhaHardikar-GSLab in #4208
  • fix(containers) - configure domain resolver for containers by @aditya-radhakrishnan in #4404
  • feat(*): Support setting owner type when assigning ownership by @jjoyce0510 in #4354
  • fix: telemetry failure should not cause CLI failure by @anshbansal in #4406
  • feat(autocomplete): Show recent searches + improved autocomplete by @jjoyce0510 in #4400
  • fix(ingestion): Fix mypy error stateful committable & restore mypy version. by @rslanka in #4408
  • build(markupsafe): update markupsafe pinning for Airflow compatibility by @set5think in #4388
  • feat(search): Add flag to enable caching on search service by @dexter-mh-lee in #4335
  • fix(query_combiner): add try block to handle queries of type str by @WaStCo in #4397
  • fix(ingestion): read all tables from redshift by @Abhiram98 in #4345
  • fix(ingestion): Invoke SqlLineageSQLParser's implementation in a separate process by @rslanka in #4391
  • fix(ingest): handle endpoints without 200 response in openapi by @JorgenEvens in #4332
  • feat(ingestion): Add the ability to query the latest timeseries aspect value via the get_cli. by @rslanka in #4395
  • Refactoring the queries into a single one to get the search results on Home Page by @Ankit-Keshari-Vituity in #4372
  • feat(lineage): hide soft deleted nodes in lineage & adds banner in entity page by @gabe-lyons in #4410
  • fix(lineage): Move lineage registry to entity-registry module by @dexter-mh-lee in #4412
  • feat(cli) Changes rollback behaviour to apply soft deletes by default by @pedro93 in #4358
  • fix(looker): various looker fixes by @gabe-lyons in #4394
  • fix(oidc): Fixing OIDC encryption bug in v0.8.29 by @jjoyce0510 in #4418
  • feat(oidc): Adding support for extracting single string groups claim by @jjoyce0510 in #4419
  • fix: change log levels to debug by @anshbansal in #4411
  • tests(cypress): reduce cypress flakiness by retrying login on failure by @gabe-lyons in #4423
  • fix(ingest): extract redshift platform correctly from sqlalchemy uri by @mayurinehate in #4421
  • build: Fix line endings for Windows check-out by @mattmatravers in #4370
  • feat(gql): make gql layer resistant to unresolvable relationships by @gabe-lyons in #4424
  • fix(ingestion) containers: Adding platform instance to container keys by @treff7es in #4279
  • fix: don't set None default by @anshbansal in #4422
  • Flexible search on soft delete by @pedro93 in #4405
  • fix(no-code metadata models in ui): fixes bug with rendering renderSpec aspects by @gabe-lyons in #4430

Full Changelog: v0.8.29...v0.8.30

DataHub v0.8.29

10 Mar 19:15
d474387

v0.8.29

NOTICE

This version is affected by an OIDC (SSO) related issue with the following stack trace:

datahub-datahub-frontend-8d7f7cf6f-xvjwm datahub-frontend Caused by: java.security.InvalidKeyException: Invalid AES key length: 30 bytes
datahub-datahub-frontend-8d7f7cf6f-xvjwm datahub-frontend 	at com.sun.crypto.provider.AESCrypt.init(AESCrypt.java:87)

The DataHub core team is working to address this. For now, we recommend staying on 0.8.28 if you actively use OIDC.
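
For context on the stack trace: AES only accepts keys of 16, 24, or 32 bytes (128, 192, or 256 bits), so a 30-byte key is rejected when the cipher is initialized. The check below is a minimal sketch of that rule. The function name is hypothetical, and it makes no claim about where DataHub's key comes from or how the issue is fixed.

```python
# Key sizes permitted by the AES standard, in bytes.
VALID_AES_KEY_LENGTHS = (16, 24, 32)


def check_aes_key(key: bytes) -> None:
    """Raise ValueError if `key` is not a valid AES key size (illustrative)."""
    if len(key) not in VALID_AES_KEY_LENGTHS:
        raise ValueError(f"Invalid AES key length: {len(key)} bytes")


check_aes_key(b"k" * 32)  # accepted: 256-bit key

try:
    check_aes_key(b"k" * 30)  # same length as in the stack trace above
except ValueError as err:
    print(err)
```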

Release Highlights

  • Fix for MAE & MCE consumer healthcheck
  • Upgrade to Java 11 and Gradle 6

DataHub v0.8.28

07 Mar 23:57
beb51eb

Release Highlights

Notable UI-Based Features

Quickly view, search, and filter the downstream dependencies of any Entity! By using the Impact Analysis Lineage view, you can now see the full set of downstream entities that may be impacted by a change to a given entity. You can also search, filter, and export the list of entities to CSV; try it for yourself here.

View Dataset- and Column-Level Data Validation outcomes in DataHub. We now support surfacing outcomes from Great Expectations validations in Dataset Entities! Easily view the full history of validation outcomes to understand the trustworthiness of your data.

User Groups, Policies, and Tags have a new look!

  • The User Group page has a new look, allowing you to assign an email address, Slack Channel, Group Owner, and more. Easily add/remove Group Members from the UI - test it out here.
  • We refreshed the Policies Page, allowing you to see Policy membership and status at a glance.
  • The Tag Details page has been overhauled! You can now edit the definition, assigned owners, and tag color via the UI (try it here).

Notable Metadata Model & Ingestion-Based Features

First Milestone: Column-Level Lineage is complete! The Metadata Model now supports “fine-grained” lineage for Datasets; see documentation here for details, including adding fine-grained lineage to a dataset or a datajob.

Define Dataset-to-Dataset lineage via YAML. As demonstrated in the February 2022 Town Hall, you can now set Dataset-level lineage via YAML. This is great for teams that have more bespoke lineage needs that cannot be auto-extracted by the current set of supported ingestion sources.

Track all changes to entities using the Timeline API. This unified timeline of changes to entities in the metadata graph provides a robust picture of how your metadata has evolved over time. Upcoming work will support surfacing this detail via the DataHub UI. See the overview from Town Hall here.

Miscellaneous Metadata Ingestion Updates:

  • Incubating: PowerBI Ingestion Source
  • BigQuery Profiling: ability to disable profiling by partition
  • Tableau improvements: Workbooks are now modeled as “Containers”

DataHub Release Candidate v0.8.28 (rc1)

05 Mar 00:53
18dd5b6
Pre-release

DataHub v0.8.28 Release Candidate 1

Full Changelog: v0.8.27...v0.8.28rc1

Release Candidate v0.8.28

05 Mar 00:14
18dd5b6
Pre-release

Release Candidate for Version 0.8.28.

Full Changelog: v0.8.27...RC-v0.8.28

DataHub v0.8.27

23 Feb 19:44
49a8ece

Release Highlights

Notable UI-Based Features

  • The User Page has a new look! You can now quickly filter & search for entities owned by a User, update/edit the user profile, and see details of which Groups the User belongs to. See it in action here.

  • Search for Entities by Owner - Easily filter search results by User/Group Owner

  • Edit existing Glossary Terms - you can now edit/update Glossary Term descriptions via the UI. Future work will allow creating Terms from the UI as well - stay tuned!

  • Improved Metadata Analytics - keep tabs on your DataHub entities across Domains, Platforms, Glossary Terms, Environments, & more. Check out the new & improved Analytics tab!

Notable Metadata Model & Ingestion-Based Features

  • ClickHouse integration is now incubating! This is a 100% Community-led integration - huge shoutout to @ne1r0n & @havramar for pushing initial code & moving this work through!

  • Kafka Stateful Ingestion - shoutout to @claudio-benfatto for building this out!

  • Extract Airflow Task Description - big thanks to @guidoturtu for the contrib!

  • BigQuery: profile latest Partition/Shard - We know that Data Profiling can be computationally expensive for partitioned/sharded BQ instances. We now support profiling only the latest partition/shard to minimize processing load.

Notable Docs Updates

  • NEW! Tips for Searching within DataHub - Ever wondered how to make the most of Searching within DataHub? Check out this doc put together by @xiphl

  • Improvements to Metadata Model Docs - This is a huge win for the Community - we’re taking a big step toward providing auto-generated & curated docs related to the Metadata Model - take a look here.

DataHub v0.8.26

08 Feb 23:22
3668de8

This is a bugfix release that addresses the issue in version 0.8.25 with adding Glossary Terms to Dataset fields.

Release Highlights

  • Fixes a bug in the previous release where Glossary Terms could not be added to Dataset fields.