🎉 Destination Azure blob storage: introduced new connector with jsonl and csv formats #5332

Merged

Changes from 18 of 21 commits
a5b25cb
[3447] Added skeleton for CHECK() and SPECK() methods
Jul 17, 2021
adc3e07
Merge branch 'master' into etsybaev/3447-azure-blob-storage-destination
Jul 20, 2021
47c5fa2
[3447] Added skeleton for JSONL writer (some of tests still fail, nee…
Aug 2, 2021
af87e82
Merge branch 'master' into etsybaev/3447-azure-blob-storage-destination
Aug 2, 2021
8d8cc0d
[3447] Added some acceptance tests for azure blob storage jsonl client
Aug 4, 2021
bf653d9
[3447] minor refactoring
Aug 4, 2021
459fe72
[3447] updated the way blob name is handled and check method's client
Aug 5, 2021
ab89ffd
[3447] Fixed check method if url was not set explicitly by customer
Aug 5, 2021
e82a3f8
[3447] Updated README.md
Aug 5, 2021
20f527a
[3447] Added basic CSV writer
Aug 6, 2021
fe55528
[3549] Destination:BigQuery: Added skeleton for AVRO (still doesn't w…
Aug 7, 2021
eb4305e
[3549] Destination:BigQuery: Added AVRO comment
Aug 11, 2021
c927141
[3549] Destination:BigQuery: cleared code
Aug 11, 2021
49e9cf9
[3549] Destination:BigQuery: cleared code and added docs
Aug 11, 2021
1991005
[3549] Destination:BigQuery: made key arg as a secret type on UI
Aug 12, 2021
b755633
[3549] Destination:BigQuery: added secrets for CI
Aug 12, 2021
16ec69d
updated spec UI as per https://docs.airbyte.io/connector-development/…
Aug 16, 2021
0705839
Fixed code as per comments in PR
Aug 17, 2021
35c01b9
fixed code as per comments in PR
Aug 28, 2021
9e25067
Merge branch 'master' into etsybaev/3447-azure-blob-storage-destinati…
Aug 28, 2021
c74d443
Added azure blob storage destination_definitions
Aug 28, 2021
1 change: 1 addition & 0 deletions .github/workflows/publish-command.yml
@@ -148,6 +148,7 @@ jobs:
ZOOM_INTEGRATION_TEST_CREDS: ${{ secrets.ZOOM_INTEGRATION_TEST_CREDS }}
PLAID_INTEGRATION_TEST_CREDS: ${{ secrets.PLAID_INTEGRATION_TEST_CREDS }}
DESTINATION_S3_INTEGRATION_TEST_CREDS: ${{ secrets.DESTINATION_S3_INTEGRATION_TEST_CREDS }}
DESTINATION_AZURE_BLOB_CREDS: ${{ secrets.DESTINATION_AZURE_BLOB_CREDS }}
DESTINATION_GCS_CREDS: ${{ secrets.DESTINATION_GCS_CREDS }}
APIFY_INTEGRATION_TEST_CREDS: ${{ secrets.APIFY_INTEGRATION_TEST_CREDS }}
- run: |
1 change: 1 addition & 0 deletions .github/workflows/test-command.yml
@@ -146,6 +146,7 @@ jobs:
ZOOM_INTEGRATION_TEST_CREDS: ${{ secrets.ZOOM_INTEGRATION_TEST_CREDS }}
PLAID_INTEGRATION_TEST_CREDS: ${{ secrets.PLAID_INTEGRATION_TEST_CREDS }}
DESTINATION_S3_INTEGRATION_TEST_CREDS: ${{ secrets.DESTINATION_S3_INTEGRATION_TEST_CREDS }}
DESTINATION_AZURE_BLOB_CREDS: ${{ secrets.DESTINATION_AZURE_BLOB_CREDS }}
DESTINATION_GCS_CREDS: ${{ secrets.DESTINATION_GCS_CREDS }}
APIFY_INTEGRATION_TEST_CREDS: ${{ secrets.APIFY_INTEGRATION_TEST_CREDS }}
- run: |
1 change: 1 addition & 0 deletions airbyte-integrations/builds.md
@@ -80,6 +80,7 @@

|name |status |
| :--- | :--- |
| Azure Blob Storage | [![destination-azure-blob-storage](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-azure-blob-storage%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-azure-blob-storage) |
| BigQuery | [![destination-bigquery](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-bigquery%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-bigquery) |
| Google Cloud Storage (GCS) | [![destination-gcs](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-s3%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-gcs) |
| Google PubSub | [![destination-pubsub](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fdestination-pubsub%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/destination-pubsub) |
@@ -0,0 +1,3 @@
*
!Dockerfile
!build
@@ -0,0 +1,11 @@
FROM airbyte/integration-base-java:dev

WORKDIR /airbyte
ENV APPLICATION destination-azure-blob-storage

COPY build/distributions/${APPLICATION}*.tar ${APPLICATION}.tar

RUN tar xf ${APPLICATION}.tar --strip-components=1

LABEL io.airbyte.version=0.1.0
LABEL io.airbyte.name=airbyte/destination-azure-blob-storage
@@ -0,0 +1,26 @@
# Azure Blob Storage Test Configuration

In order to test the Azure Blob Storage destination, you need a Microsoft account.

## Community Contributor

As a community contributor, you will need access to Azure to run the integration tests.

- Create an Azure Blob Storage account for testing. Verify that it works under https://portal.azure.com/ -> "Storage explorer (preview)".
- Get your `azure_blob_storage_account_name` and `azure_blob_storage_account_key` that can read and write to the Azure Container.
- Paste the account name and account key into the config files under [`./sample_secrets`](secrets).
- Rename the directory from `sample_secrets` to `secrets`.
- Feel free to test different format settings by modifying the config built in the acceptance test files (e.g. `AzureBlobStorageJsonlDestinationAcceptanceTest.java`, method `getFormatConfig`), as long as the config still follows the schema defined in [spec.json](src/main/resources/spec.json).

## Airbyte Employee
- Access the `Azure Blob Storage Account` secrets in LastPass.
- Replace the `config.json` under `sample_secrets`.
- Rename the directory from `sample_secrets` to `secrets`.

## Add New Output Format
- Add a new enum value in `AzureBlobStorageFormat`.
- Modify `spec.json` to specify the configuration of this new format.
- Update `AzureBlobStorageFormatConfigs` to be able to construct a config for this new format.
- Create a new package under `io.airbyte.integrations.destination.azure_blob_storage`.
- Implement a new `AzureBlobStorageWriter`. The implementation can extend `BaseAzureBlobStorageWriter`.
- Write an acceptance test for the new output format. The test can extend `AzureBlobStorageDestinationAcceptanceTest` (see the illustrative sketch after this list).
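
To make the steps above concrete, here is a minimal, hypothetical sketch of the writer step. The package name, class name, and `write(...)` signature are illustrative assumptions, not the actual `AzureBlobStorageWriter`/`BaseAzureBlobStorageWriter` API; only those type names and `AzureBlobStorageFormat` come from this connector.

```java
// Hypothetical sketch only: a writer for an imagined new output format. In a real
// implementation this class would extend BaseAzureBlobStorageWriter; the write(...)
// method below is a placeholder whose signature is assumed for illustration.
package io.airbyte.integrations.destination.azure_blob_storage.example;

import com.azure.storage.blob.specialized.AppendBlobClient;
import io.airbyte.protocol.models.AirbyteRecordMessage;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class AzureBlobStorageExampleWriter {

  private final AppendBlobClient appendBlobClient;

  public AzureBlobStorageExampleWriter(final AppendBlobClient appendBlobClient) {
    this.appendBlobClient = appendBlobClient;
  }

  // Serialize one record and append it to the blob. The "serialization" here is a
  // placeholder; a real writer would emit the new format instead of raw JSON.
  public void write(final AirbyteRecordMessage record) {
    final byte[] bytes = (record.getData().toString() + System.lineSeparator())
        .getBytes(StandardCharsets.UTF_8);
    appendBlobClient.appendBlock(new ByteArrayInputStream(bytes), bytes.length);
  }

}
```

A matching acceptance test would then extend `AzureBlobStorageDestinationAcceptanceTest`, as described in the last step above.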
@@ -0,0 +1,25 @@
plugins {
id 'application'
id 'airbyte-docker'
id 'airbyte-integration-test-java'
}

application {
mainClass = 'io.airbyte.integrations.destination.azure_blob_storage.AzureBlobStorageDestination'
}

dependencies {
implementation project(':airbyte-config:models')
implementation project(':airbyte-protocol:models')
implementation project(':airbyte-integrations:bases:base-java')
implementation project(':airbyte-integrations:connectors:destination-jdbc')
implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs)

implementation 'com.azure:azure-storage-blob:12.12.0'
implementation 'org.apache.commons:commons-csv:1.4'

testImplementation 'org.apache.commons:commons-lang3:3.11'

integrationTestJavaImplementation project(':airbyte-integrations:bases:standard-destination-test')
integrationTestJavaImplementation project(':airbyte-integrations:connectors:destination-azure-blob-storage')
}
@@ -0,0 +1,6 @@
{
"azure_blob_storage_endpoint_domain_name": "blob.core.windows.net",
"azure_blob_storage_account_name": "your_account_name_here",
"azure_blob_storage_account_key": "your_account_key_here",
"azure_blob_storage_container_name": "testcontainername"
}
@@ -0,0 +1,140 @@
/*
* MIT License
*
* Copyright (c) 2020 Airbyte
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/

package io.airbyte.integrations.destination.azure_blob_storage;

import com.azure.core.http.rest.PagedIterable;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.models.BlobItem;
import com.azure.storage.blob.specialized.AppendBlobClient;
import com.azure.storage.blob.specialized.SpecializedBlobClientBuilder;
import com.azure.storage.common.StorageSharedKeyCredential;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AzureBlobStorageConnectionChecker {

private static final String TEST_BLOB_NAME_PREFIX = "testConnectionBlob";
private BlobContainerClient containerClient; // analogous to a schema in a SQL DB
private final AppendBlobClient appendBlobClient; // analogous to a table in a SQL DB
private final boolean overwriteDataInStream;

private static final Logger LOGGER = LoggerFactory.getLogger(
AzureBlobStorageConnectionChecker.class);

public AzureBlobStorageConnectionChecker(
AzureBlobStorageDestinationConfig azureBlobStorageConfig,
boolean overwriteDataInStream) {

this.overwriteDataInStream = overwriteDataInStream;

StorageSharedKeyCredential credential = new StorageSharedKeyCredential(
azureBlobStorageConfig.getAccountName(),
azureBlobStorageConfig.getAccountKey());

this.appendBlobClient =
new SpecializedBlobClientBuilder()
.endpoint(azureBlobStorageConfig.getEndpointUrl())
.credential(credential)
.containerName(azureBlobStorageConfig.getContainerName()) // Like schema in DB
.blobName(TEST_BLOB_NAME_PREFIX + UUID.randomUUID()) // Like table in DB
.buildAppendBlobClient();
}

// This is effectively a test method, used by the CHECK operation to make sure everything
// works with the current config.
public void attemptWriteAndDelete() {
initTestContainerAndBlob();
writeUsingAppendBlock("Some test data");
listBlobsInContainer()
.forEach(
blobItem -> LOGGER.info(
"Blob name: " + blobItem.getName() + "Snapshot: " + blobItem.getSnapshot()));

deleteBlob();
}

private void initTestContainerAndBlob() {
// create the container if absent (aka SQL schema)
this.containerClient = appendBlobClient.getContainerClient();
if (!containerClient.exists()) {
containerClient.create();
}

// create the blob if absent (aka a table in a SQL DB)
if (!appendBlobClient.exists()) {
appendBlobClient.create(overwriteDataInStream);
LOGGER.info("blobContainerClient created");
} else {
LOGGER.info("blobContainerClient already exists");
}
}

// This option can be used to write and flush right away. It fails for empty lines, but those
// are not supposed to be written here.
public void writeUsingAppendBlock(String data) {
LOGGER.info("Writing test data to Azure Blob storage: " + data);
if (overwriteDataInStream) {
LOGGER.info("Override option is enabled. Old data is will be removed");
appendBlobClient.delete();
appendBlobClient.create();
}
byte[] dataBytes = data.getBytes(StandardCharsets.UTF_8);
InputStream dataStream = new ByteArrayInputStream(dataBytes);

Integer blobCommittedBlockCount = appendBlobClient.appendBlock(dataStream, dataBytes.length)
.getBlobCommittedBlockCount();

LOGGER.info("blobCommittedBlockCount: " + blobCommittedBlockCount);
}

/*
* List the blob(s) in our container.
*/
public PagedIterable<BlobItem> listBlobsInContainer() {
return containerClient.listBlobs();
}

/*
* Delete the blob we created earlier.
*/
public void deleteBlob() {
LOGGER.info("Deleting blob: " + appendBlobClient.getBlobName());
appendBlobClient.delete(); // remove the blob (aka "SQL table") used for the test
}

/*
* Delete the container. Be very careful when you use it: it removes the whole container and is
* supposed to be used in the check connection operation only, for writing temporary data.
*/
public void deleteContainer() {
LOGGER.info("Deleting blob: " + containerClient.getBlobContainerName());
containerClient.delete(); // remove aka "SQL Schema" used
}

}
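
For context, here is a minimal sketch of how a destination's CHECK operation might drive this class. Only the constructor and `attemptWriteAndDelete()` come from the file above; the `AzureBlobStorageDestinationConfig.getAzureBlobStorageConfig(...)` factory method and the surrounding `check(...)` wrapper are assumptions made for illustration.

```java
// Illustrative only: a possible check() flow built around AzureBlobStorageConnectionChecker.
package io.airbyte.integrations.destination.azure_blob_storage;

import com.fasterxml.jackson.databind.JsonNode;
import io.airbyte.protocol.models.AirbyteConnectionStatus;
import io.airbyte.protocol.models.AirbyteConnectionStatus.Status;

public class AzureBlobStorageCheckExample {

  public AirbyteConnectionStatus check(final JsonNode config) {
    try {
      // Assumed factory method; how the destination parses its JSON config is not shown in this PR.
      final AzureBlobStorageDestinationConfig destinationConfig =
          AzureBlobStorageDestinationConfig.getAzureBlobStorageConfig(config);
      // "true" lets the checker overwrite any leftover data in its temporary test blob.
      final AzureBlobStorageConnectionChecker checker =
          new AzureBlobStorageConnectionChecker(destinationConfig, true);
      checker.attemptWriteAndDelete(); // create container/blob, write test data, list, delete
      return new AirbyteConnectionStatus().withStatus(Status.SUCCEEDED);
    } catch (final Exception e) {
      return new AirbyteConnectionStatus()
          .withStatus(Status.FAILED)
          .withMessage("Could not connect to Azure Blob Storage: " + e.getMessage());
    }
  }

}
```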