Improve ADF Tool #21

Merged
tongwu-sh merged 36 commits into master on Mar 24, 2020

Conversation

tongwu-sh
Contributor

  1. Support parallel processing in the ADF Tool
  2. Support cross-region sources and destinations
  3. Switch to streaming mode to avoid disk overhead

tongwu-sh requested a review from moria97 March 18, 2020 12:43

Console.WriteLine($"[{stopWatch.Elapsed.ToString()}][tid:{args.CurrentThreadId}]: {processedCount} Completed. {processedErrorCount} Failed. {consumedCount} consume completed.");
};

await executor.ExecuteAsync(CancellationToken.None, false, progress).ConfigureAwait(false);

Can/should we build resiliency against a node crash in the middle of processing a large blob (say, by picking up only unprocessed data on restart)?

Contributor Author

Maybe we can add file-level retry in the ADF pipeline first? We could use a ForEach activity to run the anonymizer tool on a single file and retry if it fails.

To support resuming, it looks like we need somewhere to store partial status. In batch mode, we could do this with an additional storage table or ... If we switch to Azure Functions, it looks like we could easily leverage the partial status.
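
A minimal sketch of the two ideas in this comment, per-item retry plus a persisted partial status that lets a restart skip finished work. This is not the code in this PR: `FileCheckpointStore`, `ResumableRunner`, and the one-name-per-line checkpoint file are hypothetical, and the local file only stands in for a storage table or similar.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical checkpoint store: one completed item name per line in a local file.
// A storage table would play the same role in a real pipeline.
public sealed class FileCheckpointStore
{
    private readonly string _path;
    private readonly HashSet<string> _done;

    public FileCheckpointStore(string path)
    {
        _path = path;
        _done = File.Exists(path)
            ? new HashSet<string>(File.ReadAllLines(path))
            : new HashSet<string>();
    }

    public bool IsDone(string item) => _done.Contains(item);

    public void MarkDone(string item)
    {
        // Persist immediately so a crash after this point does not lose the record.
        if (_done.Add(item))
        {
            File.AppendAllText(_path, item + Environment.NewLine);
        }
    }
}

public static class ResumableRunner
{
    // Processes only items missing from the checkpoint, retrying each up to maxAttempts times.
    // Returns the items that still failed after the last attempt.
    public static List<string> Run(
        IEnumerable<string> items,
        FileCheckpointStore store,
        Action<string> process,
        int maxAttempts = 3)
    {
        var failed = new List<string>();
        foreach (var item in items.Where(i => !store.IsDone(i)))
        {
            for (var attempt = 1; attempt <= maxAttempts; attempt++)
            {
                try
                {
                    process(item);
                    store.MarkDone(item);
                    break;
                }
                catch (Exception) when (attempt < maxAttempts)
                {
                    // Not the last attempt: swallow the error and retry.
                }
                catch (Exception)
                {
                    // Last attempt failed: record the item and move on.
                    failed.Add(item);
                }
            }
        }
        return failed;
    }
}
```

On restart, constructing `FileCheckpointStore` with the same path reloads the completed items, so `Run` only touches what is left. Resuming inside a single large blob would need a finer-grained checkpoint (for example a byte or record offset), which this sketch does not attempt.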

Should we add a story to track this work?

Contributor Author

tongwu-sh, Mar 23, 2020

Created a story in the backlog for tracking. Thanks! https://microsofthealth.visualstudio.com/Health/_workitems/edit/73121/

tongwu-sh requested a review from deepak-ms March 20, 2020 06:08
deepak-ms previously approved these changes Mar 20, 2020
tongwu-sh merged commit c6b87eb into master Mar 24, 2020
tongwu-sh deleted the personal/tongwu/adf-pipeline-streaming branch March 24, 2020 08:00