
Assignment 3: Report

Last edited by Aniruddha Patil on May 4, 2020.

Problem Statement

To understand the architecture of Apache Airavata Managed File Transfer (MFT), we set up MFT on our local machines and studied the transfer flow between protocols and how the components of the system interact with each other. We then decided to extend the transport module with submodules for the following protocols:

  1. Google Drive
  2. OneDrive

Differences from Initial Problem Statement

Our initial proposal was to contribute scheduled file transfers to Airavata MFT.

We considered delivering API endpoints that could later be integrated into the dashboard, and planned to build a scheduler and a notifier as modules on top of the existing MFT architecture. However, discussions on the dev list showed that these features were expected to be integrated into the existing architecture itself. Given the complexity of that work and our time constraints, we explained this challenge to the Apache Airavata community and decided not to go ahead with this plan.

The communication links about this can be found here:

Problem Statement Development:

  • After setting up MFT locally, we first studied how the system works.
  • We learned the responsibilities of the key components: Agent, Controller, Transport Mediator, Resource Service and Secret Service.
  • We started a discussion with the Apache developer community and redefined our problem statement.
  • The developer community answered the questions we raised and guided us in implementing our new proposal.

Methodology:

  • We studied the APIs provided by Google Drive and OneDrive and started discussing their implementation with the Apache developer community.
  • We forked the Apache Airavata MFT codebase and looked at previous contributions to the project to understand the changes required to integrate the Google Drive and OneDrive transport protocols.
  • We identified the parameters a client needs to transfer files to and from each storage provider.
  • We made a list of all the Java files that needed changes.

Implementation:

1. Google Drive

  • After having all the necessary details, we started our development.
  • For the Secret Service, we chose the service account configuration, since we needed a system-to-system application that works without user interaction.
  • For the Resource Service, we examined how files and folders are uploaded and downloaded and designed the service accordingly.
  • Once this mechanism was finalized, we added a new transport module and began implementing the MetadataCollector, Sender and Receiver for Google Drive.
  • We created proto files for the Secret and Resource services.
  • We then built the gRPC stubs from these proto files to extend the API service's gRPC communication to Google Drive.
  • We documented how to set up a service account, to help users get started with their own accounts.
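Choosing a service account means no user ever signs in; instead, the application proves its identity with the service account's private key. As a rough illustration of what that key is used for, the sketch below builds the signed JWT assertion that Google's OAuth 2.0 server-to-server flow exchanges for an access token. It is a minimal sketch, not our module's code: `ServiceAccountJwt` and its methods are hypothetical, the client email, scope and private key are assumed to come from the service account's JSON key file, and the claim values are assumed not to need JSON escaping. In practice the Google API client libraries perform this step for you.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.Signature;
import java.util.Base64;

// Hypothetical helper: builds the signed JWT that Google's OAuth 2.0
// server-to-server flow exchanges for an access token.
final class ServiceAccountJwt {
    static final String TOKEN_URI = "https://oauth2.googleapis.com/token";

    // JWT segments use unpadded base64url encoding.
    private static String b64url(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Returns header.claims.signature, signed with RS256 (SHA256withRSA).
    // Assumes clientEmail and scope contain no characters that need JSON escaping.
    static String build(String clientEmail, String scope, PrivateKey key, long nowEpochSeconds) {
        String header = "{\"alg\":\"RS256\",\"typ\":\"JWT\"}";
        String claims = "{\"iss\":\"" + clientEmail + "\""
                + ",\"scope\":\"" + scope + "\""
                + ",\"aud\":\"" + TOKEN_URI + "\""
                + ",\"iat\":" + nowEpochSeconds
                + ",\"exp\":" + (nowEpochSeconds + 3600) + "}"; // assertion valid for at most one hour
        String signingInput = b64url(header.getBytes(StandardCharsets.UTF_8)) + "."
                + b64url(claims.getBytes(StandardCharsets.UTF_8));
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(signingInput.getBytes(StandardCharsets.US_ASCII));
            return signingInput + "." + b64url(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not sign JWT", e);
        }
    }

    // Form body for the POST to TOKEN_URI; the JWT itself is already URL-safe.
    static String tokenRequestBody(String jwt) {
        return "grant_type="
                + URLEncoder.encode("urn:ietf:params:oauth:grant-type:jwt-bearer", StandardCharsets.UTF_8)
                + "&assertion=" + jwt;
    }
}
```

The body would be POSTed to `https://oauth2.googleapis.com/token` with `Content-Type: application/x-www-form-urlencoded`, and the JSON response carries the `access_token` used for Drive API calls.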

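The Receiver and Sender in the transport module can be pictured as a producer and a consumer joined by a stream: the Receiver reads bytes from the source storage and the Sender pushes them to the destination as they arrive. The sketch below illustrates only that idea, using JDK piped streams. The `Receiver`, `Sender` and `PipedTransfer` types are hypothetical simplifications written for this example and are not MFT's actual transport interfaces.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, simplified roles; NOT MFT's real transport interfaces.
interface Receiver {
    // Reads the file from the source storage and writes its bytes into `sink`.
    void receive(OutputStream sink) throws IOException;
}

interface Sender {
    // Reads bytes from `source` and uploads them to the destination storage.
    void send(InputStream source) throws IOException;
}

final class PipedTransfer {
    static void transfer(Receiver receiver, Sender sender) throws Exception {
        PipedInputStream in = new PipedInputStream(64 * 1024); // bounded buffer between the two sides
        PipedOutputStream out = new PipedOutputStream(in);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Producer: the receiver runs on its own thread, as piped streams require.
            Future<?> producer = pool.submit(() -> {
                try (OutputStream sink = out) { // closing marks end-of-stream for the sender
                    receiver.receive(sink);
                }
                return null;
            });
            // Consumer: the sender runs on the calling thread.
            try (InputStream source = in) {
                sender.send(source);
            }
            producer.get(); // rethrows any receiver failure, wrapped in ExecutionException
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The pipe's fixed-size buffer gives back-pressure: if the Sender falls behind, the Receiver blocks on write until space frees up, so the whole file need not be held in memory.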
2. OneDrive

  • We created a scaffold for the OneDrive transport submodule in this fork.
  • Our transport submodule needed to emulate the queries Microsoft Graph Explorer makes, but without an intermediate login screen.
  • The core logic of onedrive-transport depends on being able to access OneDrive through Microsoft Graph.
  • Microsoft provides two main flows for generating an authentication token.
  • Regardless of which flow is chosen, Microsoft's documentation points to creating an Azure Active Directory (AAD) app on the Azure Portal.
  • We created an AAD app, granted it the necessary permissions and generated the relevant credentials for it.

The two authentication flows are as follows:

1. OAuth 2.0 authorization code grant flow

We investigated the following samples in an attempt to generate the access token:

  1. Nuxeo’s onedrive-java-client.

    • It does not provide a method to complete OAuth and obtain the access token.
  2. An unofficial third-party client, onedrive-sdk-java.

    • Requires spawning an intermediate browser window for authentication.
  3. Microsoft's own Postman guide

    • We generated an authentication token, but were unable to list our drive files due to a license error (our attempted license resolution is described in the next item).
  4. Azure Free Trial for adding OneDrive to the AAD

    • Assuming that OneDrive within the AAD was a paid service, we tried enabling the provided Azure free subscription.
    • We could not determine why a user added to the AAD did not get a dedicated OneDrive.
  5. To circumvent the intermediate authentication step, we looked at the implicit grant flow, but were unable to obtain an authentication token from the /authorize endpoint even though we had enabled it in the AAD application.

  6. MS Graph Explorer is able to query OneDrive because it is already a web application.
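For reference, the authorization code grant starts with a browser redirect to the Microsoft identity platform's `/authorize` endpoint, and the code returned to the redirect URI is then redeemed at `/token`. The sketch below only builds these two requests; it shows why this flow needs a login screen, since the authorize URL must be opened by a user who signs in. Setting `response_type` to `token` instead of `code` gives the implicit grant request from item 5, where the token comes back in the redirect URI's fragment. `AadAuthCodeFlow` and all argument values are hypothetical; the tenant, client ID, secret and redirect URI would come from the AAD app registration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that only builds Microsoft identity platform (v2.0) requests.
final class AadAuthCodeFlow {
    // Percent-encodes a value, using %20 rather than '+' for spaces.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    static String form(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    // Step 1: URL the user must open in a browser and sign in to.
    // responseType "code" = authorization code grant; "token" = implicit grant.
    static String authorizeUrl(String tenant, String clientId, String redirectUri,
                               String scope, String responseType, String state) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("client_id", clientId);
        p.put("response_type", responseType);
        p.put("redirect_uri", redirectUri);
        p.put("scope", scope);
        p.put("state", state); // echoed back so the client can match the response to its request
        return "https://login.microsoftonline.com/" + tenant + "/oauth2/v2.0/authorize?" + form(p);
    }

    // Step 2: form body POSTed to .../oauth2/v2.0/token to redeem the returned code.
    static String redeemCodeBody(String clientId, String clientSecret, String code,
                                 String redirectUri, String scope) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("grant_type", "authorization_code");
        p.put("client_id", clientId);
        p.put("client_secret", clientSecret);
        p.put("code", code);
        p.put("redirect_uri", redirectUri);
        p.put("scope", scope);
        return form(p);
    }
}
```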

2. OAuth 2.0 client credentials flow

  1. This flow rules out the /me alias for querying MS Graph, because an app-only token has no signed-in user, and requires /users/{user_id} instead.

  2. Thus the files within a user's OneDrive cannot be enumerated with /me/drive/root/children, as MS Graph Explorer does.
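To make the /me versus /users/{user_id} difference concrete, the sketch below builds the two requests the client credentials flow would involve: the app-only token request, which uses the `https://graph.microsoft.com/.default` scope and no user sign-in, and a listing of a named user's OneDrive root. It uses Java 11's `java.net.http.HttpRequest` and sends nothing. `GraphAppOnly` and its arguments are hypothetical, and we assume the AAD app has been granted a suitable application permission (for example `Files.Read.All`) with admin consent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; builds (but does not send) app-only Microsoft Graph requests.
final class GraphAppOnly {
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Client credentials grant: the app authenticates as itself, with no user sign-in.
    static String tokenBody(String clientId, String clientSecret) {
        return "client_id=" + enc(clientId)
                + "&scope=" + enc("https://graph.microsoft.com/.default")
                + "&client_secret=" + enc(clientSecret)
                + "&grant_type=client_credentials";
    }

    static HttpRequest tokenRequest(String tenant, String clientId, String clientSecret) {
        return HttpRequest.newBuilder(URI.create(
                        "https://login.microsoftonline.com/" + tenant + "/oauth2/v2.0/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(tokenBody(clientId, clientSecret)))
                .build();
    }

    // An app-only token has no signed-in user, so /me cannot be resolved;
    // the user whose drive is listed must be named explicitly.
    static HttpRequest listRootChildren(String userId, String accessToken) {
        return HttpRequest.newBuilder(URI.create(
                        "https://graph.microsoft.com/v1.0/users/" + enc(userId) + "/drive/root/children"))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }
}
```

Either request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the token response is JSON containing an `access_token` field.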

We were unable to emulate the queries that MS Graph Explorer makes within the timeframe of this project; however, we hope to resolve this issue with the help of the Airavata developer community.

Evaluation:

  • We thoroughly tested our module against other protocols, e.g. Google Drive to S3, S3 to Google Drive, Google Drive to Local, Local to Google Drive and Google Drive to Google Drive. We have captured the results of our tests here
  • The test cases for GDrive transport can be found here
  • Results of the test cases can be found here
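A transfer test of this kind ultimately has to confirm that the destination holds the same bytes as the source. One way to check this, assumed here rather than taken from our test suite, is to compare MD5 digests of the two copies. Google Drive file metadata includes an `md5Checksum` field for binary files, so a digest computed this way could also be compared against Drive's own value. `TransferCheck` is a hypothetical helper that uses only the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical test helper: compares two copies of a file by MD5 digest.
final class TransferCheck {
    static String md5Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5"); // every Java SE implementation must support MD5
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            md.update(buf, 0, n); // stream the content so large files are not loaded into memory
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
        }
        return hex.toString();
    }

    static boolean sameContent(InputStream source, InputStream destination) throws IOException {
        return md5Hex(source).equals(md5Hex(destination));
    }
}
```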

Conclusions and Outcomes

  • This project gave us a detailed understanding of the Apache Airavata MFT architecture. We also saw how gRPC works and how it is used in the framework.
  • With the knowledge we gained, we implemented the Google Drive protocol, which is currently under review by the Apache developer community.
  • We also researched the OneDrive API in depth and shared all our findings with the community.
  • Overall, we had a great opportunity to work with an open-source community, communicate with like-minded developers and improve our skills through the feedback we received.

Team Member Contributions:

Team Members

  • Aniruddha Patil
  • Nikita Bafna
  • Shivali Jejurkar

All team members were actively involved in the communication and development of the protocols. As the framework is complex, we divided the work equally among ourselves and were able to reach our goal. The links below show our commits and our discussions with the developer community.