
ADLS #6071
Merged 42 commits on Nov 1, 2019

Commits (42):
- 1362589  Added base files for ADLS (gapra-msft, Sep 30, 2019)
- 3f70a3d  Basic methods on PathAsyncClient (rickle-msft, Oct 10, 2019)
- 440e583  Regenerated off swagger (rickle-msft, Oct 10, 2019)
- 03f26bc  Regenerated code, not up to date with main branch. Made progress on F… (gapra-msft, Oct 11, 2019)
- 9e535fd  Merge branch 'storage/ADLSdev' of github.com:gapra-msft/azure-sdk-for… (gapra-msft, Oct 11, 2019)
- 595ae8e  Added more FileSystemClient and DataLakeServiceClient code (gapra-msft, Oct 14, 2019)
- 28f9a14  Merge branch 'master' into storage/ADLSdev (gapra-msft, Oct 14, 2019)
- 8fbe577  Added a few service apis (rickle-msft, Oct 15, 2019)
- ef414ac  Merge branch 'storage/ADLSdev' of github.com:gapra-msft/azure-sdk-for… (rickle-msft, Oct 15, 2019)
- dd0148e  Added tests for blob APIs in file system and data lake service client (gapra-msft, Oct 16, 2019)
- 6a7dd0e  Merge branch 'storage/ADLSdev' of github.com:gapra-msft/azure-sdk-for… (gapra-msft, Oct 16, 2019)
- bb28178  Added all public facing APIs to ADLS no complete test coverage (gapra-msft, Oct 18, 2019)
- 4282e97  Added javadoc and samples to datalake service client (gapra-msft, Oct 21, 2019)
- de275f5  Merge branch 'master' into storage/ADLSdev (gapra-msft, Oct 21, 2019)
- d36d5e6  Added more java doc code snippets (gapra-msft, Oct 21, 2019)
- 4403dda  Fixed all build issues due to merging with master (gapra-msft, Oct 22, 2019)
- c4687dd  Added rename to file and directory clients (gapra-msft, Oct 23, 2019)
- d1c7a7d  Adding tests for filesystemclient and serviceclient (gapra-msft, Oct 23, 2019)
- 25004e9  Added all tests except file system list paths does not deserialize th… (gapra-msft, Oct 25, 2019)
- e760c3b  Added tests for list paths (gapra-msft, Oct 25, 2019)
- 7c2b62a  Merge branch 'master' into storage/ADLSdev (gapra-msft, Oct 25, 2019)
- 8ae6c3d  Fixed build issues due to merge (gapra-msft, Oct 28, 2019)
- 547542c  renamed PathHTTPHeaders to PathHttpHeaders and changed (gapra-msft, Oct 28, 2019)
- 86e67ea  Merge branch 'master' into storage/ADLSdev (gapra-msft, Oct 28, 2019)
- 734dabd  Added SAS to datalake need to test (gapra-msft, Oct 28, 2019)
- 6901f0e  Added all java doc code snippets and moved pathhttpheaders (gapra-msft, Oct 29, 2019)
- e408112  Fixed checkstyle and build issues (gapra-msft, Oct 29, 2019)
- 11dc017  Added application insights files back (gapra-msft, Oct 29, 2019)
- 96e4a15  Merge branch 'master' into storage/ADLSdev (gapra-msft, Oct 29, 2019)
- 5bb2332  Added basic samples (gapra-msft, Oct 29, 2019)
- 467bdb1  Added one more sample and changed identity version in pom (gapra-msft, Oct 30, 2019)
- f1cd8dd  Addressed some cr comments and added README (gapra-msft, Oct 30, 2019)
- 27c15bb  Addressed more concerns from CR and added Readme (gapra-msft, Oct 31, 2019)
- d407eff  Merge branch 'master' into storage/ADLSdev (gapra-msft, Oct 31, 2019)
- bf60dbf  Updated blob version to 12.0.0 (gapra-msft, Oct 31, 2019)
- cd4781b  Fixed issue with checkstyle (gapra-msft, Oct 31, 2019)
- 77daad3  Changed name of client (gapra-msft, Oct 31, 2019)
- 9622a00  Fixed some code samples issues (gapra-msft, Oct 31, 2019)
- 7ef48bf  Updated readme (gapra-msft, Oct 31, 2019)
- 9cac381  Merge branch 'master' into storage/ADLSdev (gapra-msft, Oct 31, 2019)
- ff5bf8b  Changed md5 to capital should fix in swagger for both file share and … (gapra-msft, Nov 1, 2019)
- 42d2904  Added test recordings (gapra-msft, Nov 1, 2019)
@@ -208,6 +208,8 @@
<suppress checks="com.azure.tools.checkstyle.checks.ServiceClientCheck" files="com.azure.storage.blob.batch.BlobBatchAsyncClient.java"/>
<suppress checks="com.azure.tools.checkstyle.checks.ServiceClientCheck" files="com.azure.data.appconfiguration.ConfigurationClient.java"/>
<suppress checks="com.azure.tools.checkstyle.checks.ServiceClientCheck" files="com.azure.data.appconfiguration.ConfigurationAsyncClient.java"/>
<suppress checks="com.azure.tools.checkstyle.checks.ServiceClientCheck" files="com.azure.storage.file.datalake.DataLakeLeaseClient.java"/>
<suppress checks="com.azure.tools.checkstyle.checks.ServiceClientCheck" files="com.azure.storage.file.datalake.DataLakeLeaseAsyncClient.java"/>

<!-- Suppress public/private constructor check since BlobClients need protected constructors to create EncryptedClients -->
<suppress checks="com.azure.tools.checkstyle.checks.ServiceClientCheck" files="com.azure.storage.blob.BlobAsyncClient.java"/>
@@ -538,7 +538,7 @@
<!-- It is fine to have un-used variables, unread fields, anonymous static inner classes in javadoc code samples. -->
<Match>
<Or>
<Class name="~.*JavaDoc(CodeSnippets|CodeSamples|Samples)"/>
<Class name="com.azure.storage.blob.batch.ReadmeCodeSamples"/>
</Or>
<Bug pattern="DLS_DEAD_LOCAL_STORE,
311 changes: 308 additions & 3 deletions sdk/storage/azure-storage-file-datalake/README.md
@@ -1,20 +1,325 @@
# Azure File Data Lake client library for Java

Azure Data Lake Storage is Microsoft's optimized storage solution for big
data analytics workloads. A fundamental part of Data Lake Storage Gen2 is the
addition of a hierarchical namespace to Blob storage. The hierarchical
namespace organizes objects/files into a hierarchy of directories for
efficient data access.

[Source code][source] | [API reference documentation][docs] | [REST API documentation][rest_docs] | [Product documentation][product_docs] | [Samples][samples]

## Getting started

### Prerequisites

- [Java Development Kit (JDK)][jdk] with version 8 or above
- [Azure Subscription][azure_subscription]
- [Create Storage Account][storage_account]

### Adding the package to your product

Add a dependency on Azure Storage File Data Lake
[//]: # ({x-version-update-start;com.azure:azure-storage-file-datalake;current})
```xml
<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-storage-file-datalake</artifactId>
  <version>12.0.0-preview.5</version>
</dependency>
```
[//]: # ({x-version-update-end})

### Default HTTP Client
All client libraries, by default, use the Netty HTTP client. Adding the above dependency will automatically configure
Storage Data Lake to use the Netty HTTP client.

### Alternate HTTP client
If you prefer OkHttp over Netty, an HTTP client is available for that as well. Exclude the default
Netty client and include the OkHttp client in your pom.xml.

[//]: # ({x-version-update-start;com.azure:azure-storage-file-datalake;current})
```xml
<!-- Add the Storage Data Lake dependency without the Netty HTTP client -->
<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-storage-file-datalake</artifactId>
  <version>12.0.0-preview.5</version>
  <exclusions>
    <exclusion>
      <groupId>com.azure</groupId>
      <artifactId>azure-core-http-netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```
[//]: # ({x-version-update-end})
[//]: # ({x-version-update-start;com.azure:azure-core-http-okhttp;current})
```xml
<!-- Add the OkHTTP client to use with Storage Data Lake -->
<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-core-http-okhttp</artifactId>
  <version>1.0.0</version>
</dependency>
```
[//]: # ({x-version-update-end})

### Configuring HTTP Clients
When an HTTP client is included on the classpath, as shown above, it is not necessary to specify it in the client library [builders](#create-a-datalakeserviceclient) unless you want to customize it in some fashion. In that case, the `httpClient` builder method lets you provide a custom (or customized) `com.azure.core.http.HttpClient` instance.

For starters, by having the Netty or OkHttp dependencies on your classpath, as shown above, you can create new instances of these `HttpClient` types using their builder APIs. For example, here is how you would create a Netty `HttpClient` instance:

```java
HttpClient client = new NettyAsyncHttpClientBuilder()
    .port(8080)
    .wiretap(true)
    .build();
```
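
Once built, the custom client can be supplied to any of the library's builders through the `httpClient` method. A minimal sketch (the endpoint and SAS token are placeholders, and `client` is the Netty instance built above):

```java
// Pass the customized HttpClient to the service client builder.
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
    .endpoint("<your-storage-dfs-url>")
    .sasToken("<your-sasToken>")
    .httpClient(client)
    .buildClient();
```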

### Create a Storage Account
To create a Storage Account you can use the [Azure Portal][storage_account_create_portal] or [Azure CLI][storage_account_create_cli].
Note: To use data lake, your account must have hierarchical namespace enabled.

```bash
az storage account create \
    --resource-group <resource-group-name> \
    --name <storage-account-name> \
    --location <location>
```

### Authenticate the client

In order to interact with the Storage service you'll need to create an instance of the service client class.
To make this possible you'll need the account SAS (shared access signature) string of the Storage Account. Learn more at [SAS Token][sas_token].

#### Get credentials

##### SAS Token

a. Use the Azure CLI snippet below to get the SAS token from the Storage Account.

```bash
az storage blob generate-sas \
    --account-name {Storage Account name} \
    --container-name {container name} \
    --name {blob name} \
    --permissions {permissions to grant} \
    --expiry {datetime to expire the SAS token} \
    --services {storage services the SAS allows} \
    --resource-types {resource types the SAS allows}
```

Example:

```bash
CONNECTION_STRING=<connection-string>

az storage blob generate-sas \
    --connection-string $CONNECTION_STRING \
    --container-name MyContainer \
    --name MyBlob \
    --permissions racdw \
    --expiry 2020-06-15
```

b. Alternatively, get the Account SAS Token from the Azure Portal.

1. Go to your Storage Account
2. Select `Shared access signature` from the menu on the left
3. Click on `Generate SAS and connection string` (after setup)

##### Shared Key Credential

a. Use Account name and Account key. Account name is your Storage Account name.

1. Go to your Storage Account
2. Select `Access keys` from the menu on the left
3. Under `key1`/`key2` copy the contents of the `Key` field

or

b. Use the connection string.

1. Go to your Storage Account
2. Select `Access keys` from the menu on the left
3. Under `key1`/`key2` copy the contents of the `Connection string` field
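
Either value can then be passed to a client builder. A minimal sketch, assuming `StorageSharedKeyCredential` from the `azure-storage-common` package (account name and key are placeholders):

```java
// Build a service client authenticated with the account name and key.
StorageSharedKeyCredential credential =
    new StorageSharedKeyCredential("<account-name>", "<account-key>");

DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
    .endpoint("<your-storage-dfs-url>")
    .credential(credential)
    .buildClient();
```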

## Key concepts

This preview package for Java includes ADLS Gen2-specific API support made available in the Blob SDK. This includes:
1. New directory-level operations (Create, Rename/Move, Delete) for both hierarchical namespace enabled (HNS) and HNS-disabled storage accounts. For HNS-enabled accounts, the rename/move operations are atomic.
2. Permission-related operations (Get/Set ACLs) for HNS-enabled accounts.

HNS-enabled accounts in ADLS Gen2 can now also leverage all of the operations available in the Blob SDK. Support for file-level semantics for ADLS Gen2 is planned for a later release of the Blob SDK. In the meantime, the table below maps ADLS Gen2 terminology to Blob terminology; a short directory example follows the table.

| ADLS Gen2  | Blob      |
| ---------- | --------- |
| Filesystem | Container |
| Folder     | Directory |
| File       | Blob      |
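
For example, the new directory-level operations are exposed through `DataLakeDirectoryClient`. A minimal sketch, assuming a `DataLakeFileSystemClient` named `dataLakeFileSystemClient` (created as shown in the Examples section); the method names follow the released API surface and may differ slightly in this preview:

```java
// Create a directory, then rename/move it within the same file system.
// On HNS-enabled accounts the rename is atomic; passing null for the
// destination file system keeps the directory in the current one.
DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient("mydir");
directoryClient.create();
DataLakeDirectoryClient renamedClient = directoryClient.rename(null, "myrenameddir");
```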

## Examples

The following sections provide several code snippets covering some of the most common Azure Storage Data Lake tasks, including:

- [Create a `DataLakeServiceClient`](#create-a-datalakeserviceclient)
- [Create a `DataLakeFileSystemClient`](#create-a-datalakefilesystemclient)
- [Create a `DataLakeFileClient`](#create-a-datalakefileclient)
- [Create a file system](#create-a-file-system)
- [Upload a file from a stream](#upload-a-file-from-a-stream)
- [Download a file to a stream](#download-a-file-to-a-stream)
- [Enumerate paths](#enumerate-paths)
- [Authenticate with Azure Identity](#authenticate-with-azure-identity)

### Create a `DataLakeServiceClient`

Create a `DataLakeServiceClient` using the [`sasToken`](#get-credentials) generated above.

```java
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
    .endpoint("<your-storage-dfs-url>")
    .sasToken("<your-sasToken>")
    .buildClient();
```

### Create a `DataLakeFileSystemClient`

Create a `DataLakeFileSystemClient` using a `DataLakeServiceClient`.

```java
DataLakeFileSystemClient dataLakeFileSystemClient = dataLakeServiceClient.getFileSystemClient("myfilesystem");
```

or

Create a `DataLakeFileSystemClient` using the builder and the [`sasToken`](#get-credentials) generated above.

```java
DataLakeFileSystemClient dataLakeFileSystemClient = new DataLakeFileSystemClientBuilder()
    .endpoint("<your-storage-dfs-url>")
    .sasToken("<your-sasToken>")
    .fileSystemName("myfilesystem")
    .buildClient();
```

### Create a `DataLakeFileClient`

Create a `DataLakeFileClient` using a `DataLakeFileSystemClient`.

```java
DataLakeFileClient fileClient = dataLakeFileSystemClient.getFileClient("myfile");
```

or

Create a `DataLakeFileClient` using the builder and the [`sasToken`](#get-credentials) generated above.

```java
DataLakeFileClient fileClient = new DataLakePathClientBuilder()
    .endpoint("<your-storage-dfs-url>")
    .sasToken("<your-sasToken>")
    .fileSystemName("myfilesystem")
    .pathName("myfile")
    .buildClient();
```

### Create a file system

Create a file system using a `DataLakeServiceClient`.

```java
dataLakeServiceClient.createFileSystem("myfilesystem");
```

or

Create a file system using a `DataLakeFileSystemClient`.

```java
dataLakeFileSystemClient.create();
```

### Upload a file from a stream

Upload from an `InputStream` to a file using a `DataLakeFileClient` generated from a `DataLakeFileSystemClient`.

```java
DataLakeFileClient fileClient = dataLakeFileSystemClient.getFileClient("myfile");
fileClient.create();
String dataSample = "samples";
try (ByteArrayInputStream dataStream = new ByteArrayInputStream(dataSample.getBytes())) {
    fileClient.append(dataStream, 0, dataSample.length());
}
fileClient.flush(dataSample.length());
```
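
For local files, a convenience upload is also available; a sketch assuming `uploadFromFile`, which was added to `DataLakeFileClient` around GA and may not be present in this preview (the file name and local path are placeholders):

```java
// Upload the full contents of a local file in one call; the file
// resource is created as part of the operation.
DataLakeFileClient localFileClient = dataLakeFileSystemClient.getFileClient("myuploadedfile");
localFileClient.uploadFromFile("path/to/local/file.txt");
```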

### Download a file to a stream

Download a file to an `OutputStream` using a `DataLakeFileClient`.

```java
try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
    fileClient.read(outputStream);
}
```
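
To download directly to a local file instead, a sketch assuming `readToFile` (also added around GA; the destination path is a placeholder):

```java
// Write the file's contents straight to a local path.
fileClient.readToFile("path/to/destination/file.txt");
```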

### Enumerate paths

Enumerate all paths using a `DataLakeFileSystemClient`.

```java
dataLakeFileSystemClient.listPaths()
    .forEach(pathItem -> System.out.println("This is the path name: " + pathItem.getName()));
```
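
Listing can also be scoped to a directory and made recursive; a sketch assuming the `ListPathsOptions` type and its `setPath`/`setRecursive` setters, which follow the released API and may differ in this preview:

```java
// List only the paths under "mydir", descending into subdirectories.
ListPathsOptions options = new ListPathsOptions()
    .setPath("mydir")
    .setRecursive(true);

dataLakeFileSystemClient.listPaths(options, null)
    .forEach(pathItem -> System.out.println("This is the path name: " + pathItem.getName()));
```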

### Authenticate with Azure Identity

The [Azure Identity library][identity] provides Azure Active Directory support for authenticating with Azure Storage.

```java
DataLakeServiceClient storageClient = new DataLakeServiceClientBuilder()
    .endpoint(endpoint)
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildClient();
```

## Troubleshooting

When interacting with data lake using this Java client library, errors returned by the service correspond to the same HTTP
status codes returned for [REST API][error_codes] requests. For example, if you try to retrieve a file system or path that
doesn't exist in your Storage Account, a `404` error is returned, indicating `Not Found`.
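
These errors surface in Java as exceptions that carry the HTTP status code; a minimal sketch, assuming `DataLakeStorageException` from the models package:

```java
// Inspect the status code and service message when a request fails.
try {
    fileClient.getProperties();
} catch (DataLakeStorageException e) {
    System.out.println("Service returned " + e.getStatusCode() + ": " + e.getServiceMessage());
}
```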

## Next steps

Several Storage Data Lake Java SDK [samples][samples_readme] are available to you in the SDK's GitHub repository.

## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a [Contributor License Agreement (CLA)][cla] declaring that you have the right to, and actually do, grant us the rights to use your contribution.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct][coc]. For more information see the [Code of Conduct FAQ][coc_faq] or contact [opencode@microsoft.com][coc_contact] with any additional questions or comments.

<!-- LINKS -->
[source]: src
[samples_readme]: src/samples/README.md
[docs]: http://azure.github.io/azure-sdk-for-java/
[rest_docs]: https://docs.microsoft.com/en-us/rest/api/storageservices/data-lake-storage-gen2
[product_docs]: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
[sas_token]: https://docs.microsoft.com/azure/storage/common/storage-dotnet-shared-access-signature-part-1
[error_codes]: https://docs.microsoft.com/rest/api/storageservices/blob-service-error-codes
[jdk]: https://docs.microsoft.com/java/azure/jdk/
[azure_subscription]: https://azure.microsoft.com/free/
[storage_account]: https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal
[storage_account_create_cli]: https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-cli
[storage_account_create_portal]: https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal
[identity]: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/identity/azure-identity/README.md
[samples]: src/samples
[cla]: https://cla.microsoft.com
[coc]: https://opensource.microsoft.com/codeofconduct/
[coc_faq]: https://opensource.microsoft.com/codeofconduct/faq/
[coc_contact]: mailto:opencode@microsoft.com

![Impressions](https://azure-sdk-impressions.azurewebsites.net/api/impressions/azure-sdk-for-java/sdk/storage/azure-storage-file-data-lake/README.png)