
Releases: neo4j-field/bigquery-connector

0.6.1

18 Dec 18:41
8c3ca52

What's Changed

  • Added push to dockerhub and code owners by @ali-ince in #2
  • Use poetry as the build tool and apply linting & styling by @ali-ince in #4
  • Use labels from arrows by @ali-ince in #5

Full Changelog: 0.6.0...0.6.1

0.6.0

05 Jul 23:25
94ef9de

This release includes the following changes:

  • Improved error handling
  • Surfaces the neo4j_action parameter so that a new database can also be created
  • Arrow connection information is now auto-discovered via the Neo4j Bolt connection; the neo4j_host parameter is replaced by neo4j_uri, which expects an ordinary Neo4j URI (see the sketch after this list)
  • Added support for model validation
  • Added pattern support for the GDS->BigQuery direction; multiple patterns can be included in a single run
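
For illustration, here's a minimal sketch of what the connection-related parameters might look like after this change. Only neo4j_uri and neo4j_action are named above; the exact keys, accepted values, and how they're passed to the template are assumptions.

```python
# Hypothetical parameter map for a template run; only "neo4j_uri" and
# "neo4j_action" are named in the notes above -- everything else here
# (key spelling, values) is an illustrative assumption.
params = {
    # An ordinary Neo4j URI; the Arrow host/port are auto-discovered
    # over the Bolt connection, so no separate neo4j_host is needed.
    "neo4j_uri": "neo4j+s://example.databases.neo4j.io",
    # Action to take before import, e.g. creating a new database.
    "neo4j_action": "create_database",
}
```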

0.5.1 - 🤏 Fix bug with write-backs for tiny datasets

12 Apr 16:53

When using a very small dataset (say, a graph projection of fewer than 10,000 nodes) and writing data back to BigQuery, a RuntimeError could be triggered by a call to the finalizing method on a BigQuerySink when there's no defined BigQuery stream name.
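
One way to picture the failure mode and the obvious guard (class and attribute names here are illustrative stand-ins, not the connector's actual code):

```python
from typing import Optional


class BigQuerySink:
    """Illustrative stand-in for the connector's sink; names are hypothetical."""

    def __init__(self) -> None:
        # A write stream is only opened once data actually flows, so a
        # tiny write-back can finish without this ever being set.
        self.stream_name: Optional[str] = None

    def finalize(self) -> None:
        # Finalizing with no stream used to raise a RuntimeError; when
        # nothing was written there is nothing to commit, so just return.
        if self.stream_name is None:
            return
        # ... commit/close the BigQuery write stream here ...
```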

0.5.0 - 🤪 Bumping 3rd Party Dependencies

12 Apr 15:59

This version mostly handles updates to the following dependencies:

  • google-dataproc-templates -- for some reason they pulled a package/release from PyPI again 🤬!!!
  • neo4j_arrow -- updates to v0.5.0 to pull in fixes for database creation for self-managed GDS Enterprise users.

0.4.0 - ↩️ Write-backs to BigQuery

23 Mar 21:20

Initial support for streaming data back to BigQuery from Neo4j AuraDS (or self-managed GDS) using a new template: Neo4jGDSToBigQueryTemplate.

Supports streaming of both nodes (with or without properties) and relationships/edges (with or without properties). Any properties are stored in the resulting BigQuery table using a JSON field for flexibility.
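
For example, reading a few rows back and parsing the JSON properties column might look like the sketch below. Project, dataset, table, and column names are placeholders; the notes above only state that properties land in a JSON field.

```python
import json

from google.cloud import bigquery

# Placeholder names: the release notes only say that properties land in
# a JSON field of the resulting table.
client = bigquery.Client()
query = """
    SELECT *
    FROM `my-project.my_dataset.gds_nodes`
    LIMIT 5
"""
for row in client.query(query).result():
    props = row.get("properties")
    # If the column arrives as a JSON-encoded STRING, parse it.
    if isinstance(props, str):
        props = json.loads(props)
    print(props)
```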

0.3.1 - Field Filtering Fix (F^3)

16 Mar 21:58

Primarily a fix for supporting field filters, i.e., targeting specific fields in BigQuery tables based on the graph model.

While one could argue it's a new feature, the feature exists in the underlying neo4j_arrow module and wasn't properly wired in, so I consider this a bugfix 😉.
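
Conceptually, a field filter narrows which BigQuery columns feed a node or edge. The sketch below is purely illustrative; the real schema lives in the neo4j_arrow graph model and is not reproduced in these notes.

```python
# Invented keys for illustration only -- not the actual neo4j_arrow
# graph model schema. The idea: name the source table and the subset
# of its fields that should be read during import.
node_mapping = {
    "table": "my_dataset.customers",   # BigQuery source table (placeholder)
    "label": "Customer",               # Neo4j node label (placeholder)
    "key_field": "customer_id",        # unique id column (placeholder)
    # The field filter: only these columns are pulled from the table.
    "fields": ["customer_id", "name", "signup_date"],
}
```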

0.3.0 - 🤫 It's a Secret

02 Mar 23:10

Refinements since the initial prototype:

  • supports using Google Secret Manager 🤫 to store the Neo4j password and any other settings (see the sketch below)
  • switches to using native ARRAY<STRING> types for the node_tables and edge_tables inputs
  • bug fix 🐛 where the provided graph name wasn't overriding one in the graph model file
  • updated docs/README 📄

The stored procedure signature is now a bit cleaner as a result.
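
As a minimal sketch of the Secret Manager lookup (project and secret names are placeholders), assuming the standard google-cloud-secret-manager client:

```python
from google.cloud import secretmanager

# Placeholder project/secret names; the notes only say the Neo4j
# password (and other settings) can live in Google Secret Manager.
client = secretmanager.SecretManagerServiceClient()
name = "projects/my-project/secrets/neo4j-password/versions/latest"
response = client.access_secret_version(request={"name": name})
neo4j_password = response.payload.data.decode("UTF-8")
```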

0.2.0 - BQ to Neo4j Prototype

02 Mar 13:30

Initial functional prototype of using a BigQuery Stored Procedure for Apache Spark (say that 5 times fast) to lift and shift a ~50 GiB dataset from BigQuery into Neo4j AuraDS using Neo4j's Apache Arrow Flight service.

🥳 It Lives

[demo animation: BQ to Neo4j]

🤗 What's Known to Work

  • Pre-engineered datasets compliant with Neo4j GDS should lift/shift fine. This means a pre-designed node id space; the import job doesn't currently address that, but there's a back-of-the-napkin design on my desk for it.
  • Pushing tens of GiBs of data works great when staying in-region. Making sure the BQ dataset, the Apache Spark "connection", and the AuraDS instance are co-located in the same region (e.g. europe-west1) keeps throughput high without becoming network-bound. (I've observed roughly an order-of-magnitude drop in throughput going from the US to the EU.)

⚠️ Currently Known Gotchas

  • The Dataproc "template" works great in Dataproc Serverless (that's where it was originally built) but requires some odd hacks to get running under BigQuery. There are currently some...issues...just running the same container image because of how BigQuery orchestrates the Apache Spark environment.
  • We need to document all the cloud setup steps for this, specifically all the IAM roles that need to be granted to the service account used by BigQuery's Dataproc runner and access permissions to the Docker image. (As well as how to publish/host it.)

🔥 Hot Topics for the Next Release

  • Reading back from AuraDS into BigQuery using the Storage Write API
  • Integration with GCP Secret Manager to get rid of plain-text passwords (gross)
  • Clean up the stored proc inputs (currently all BigQuery STRING args), e.g. passing the table list as an ARRAY<STRING>, and maybe tuck seldom-used config options into a BigQuery STRUCT like GDS does to make the procedure signature shorter

0.1.0 - Initial Prototype

24 Feb 23:51
  • Can recreate the GraphConnect 2022 demo in under 10 minutes total runtime (including orchestration).
  • Includes a user-agent string identifying Neo4j as the cloud partner driving the consumption of BigQuery data.
  • Tested with both self-managed GDS on GCE and AuraDS.