
Webhook for deployments #562

Open
jayantbh opened this issue Oct 1, 2024 · 9 comments
Labels: advanced, enhancement

Comments

jayantbh (Contributor) commented Oct 1, 2024

Why do we need this?

Currently, PR merges are assumed to be deployments for a repo, which is a fair assumption for any repo that runs on some kind of CI.

But for the many repos that don't, we should at least support a webhook-based mechanism that allows me to feed my deployment/workflow run data into Middleware.

That will give me a better picture of my DORA metrics, with more accurate Lead Time and Deployment Frequency.

jayantbh added the enhancement, hacktoberfest, and advanced labels on Oct 1, 2024
Kamlesh72 (Contributor) commented

@jayantbh working on this.

jayantbh (Contributor, Author) commented Oct 2, 2024

Sure. Do share your approach before you begin implementation.

jayantbh (Contributor, Author) commented Oct 3, 2024

Important

This issue is tagged advanced. By taking this up you acknowledge that this will be a non-trivial change and may require thorough testing and review.
Of course, this also means that we offer swag to anyone who goes out of their way to tackle issues tagged advanced. 🚀
It also means we'll follow up on this regularly, and in case of inactivity the issue will be unassigned.

Kamlesh72 (Contributor) commented

@jayantbh Currently we take PR merges or workflows (like GitHub Actions) as deployments, correct?

I am thinking of creating a route that collects workflow/deployment webhook data.
The captured data would be mapped and pushed into RepoWorkflowRuns.
There would be a separate adapter for each provider, like Bitbucket, CircleCI, GitLab, etc.

This is the basic idea, although more brainstorming is needed.

jayantbh (Contributor, Author) commented Oct 4, 2024

This should ideally happen on the Python backend (the apiserver dir). But yes, you have the right idea. I'll let @adnanhashmi09 explain further.

adnanhashmi09 (Contributor) commented

Keep the following in mind while implementing the workflow:

  1. Use authorization headers or custom headers for authenticating the workflow user. We should create a mechanism for users to create and update API keys. This would also include UI development efforts.

  2. The webhook should never cause the workflow to fail or take an excessively long time. It should return a status of 200 in all cases. In case of an error, the response body should contain the error message and possible ways to fix it.

  3. We need a mechanism to map these workflows to repositories linked with Middleware. Therefore, the webhook should also receive repository data for each workflow run.

  4. The processing of data should be asynchronous and not block the API response. The API request should resolve almost immediately after the request has been sent.

  5. The data should be processed in chunks, and the end user should send data in chunks, i.e., no more than 500 workflow runs' worth of data in a single call. This webhook should be able to sync large amounts of data and/or a single workflow run. Users can make a call to this webhook at the start and end of their workflow, and we can infer the duration of the workflow run from that. Another case could be a user sending a number of their older workflow runs for us to process.

  6. A simple validation of the received data should be performed when someone tries to upload data. If the required fields are not present, we should return a processing-error body with a status code of 200. We don't keep erroneous data.

  7. We would also need an API to prune the data synced if someone uploaded incorrect data and wanted to delete it.

  8. An API to revoke/generate API tokens is necessary.

  9. A frontend page to manage API tokens should be developed.

  10. Implement alerting/notification in case of erroneous data.

  11. A data dump for the request type, request body, response and error should be saved in case of an error. The data received from the end-user can be saved here and then later picked up for processing. So this could serve multiple purposes.

  12. We need some event-based system to process workflow runs asynchronously without blocking the main thread. So whenever someone sends a request to our webhook, we register an "event" which is picked up by a listener. When that event is invoked, the listener queries the database for the latest data to process and starts processing.

  13. The request body could look like the following:

{
    "workflow_runs":[
        {          
            "workflow_name":"custom_workflow",
            "repo_names":["middleware"],
            "event_actor":"adnanhashmi09",
            "head_branch":"master",
            "workflow_run_unique_id":"unique_item",
            "status":"SUCCESS",
            "duration":"200", // can be provided, or we can infer this
            "workflow_run_conducted_at":"2024-09-28T20:35:45.123456+00:00"
        }
    ]
}

Read through the workflow sync once to check all the fields required for creating a RepoWorkflowRun.

  14. A RepoWorkflow shall be created based on workflow_name and repo_names if not already present. This shall also be part of the validation, i.e., if a RepoWorkflow cannot be created because the repo_names are wrong or are not linked to Middleware, that should result in an error.

So there are a lot of moving parts in this implementation, and it will require a thorough understanding of our system. Please read through the sync and document your approach here before starting to implement. This is a rather comprehensive task and will take a while to implement.
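
To make the above concrete, here is a minimal sketch of what such an endpoint could look like on the Flask apiserver. The route path, header name, and helper functions (is_valid_api_key, save_raw_request_and_register_event) are placeholders for illustration, not existing Middleware code; they stand in for points 1, 2, 4, 5, 6, 11 and 12 above.

# Hypothetical sketch of the webhook endpoint; route, header name, and helpers are placeholders.
import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)

MAX_WORKFLOW_RUNS_PER_CALL = 500
REQUIRED_FIELDS = {
    "workflow_name", "repo_names", "event_actor", "head_branch",
    "workflow_run_unique_id", "status", "workflow_run_conducted_at",
}


def is_valid_api_key(api_key):
    # Stub: would look the key up in the API keys table (point 1).
    return bool(api_key)


def save_raw_request_and_register_event(api_key, body):
    # Stub: would persist the raw payload as a data dump and register an
    # "event" for the async processor to pick up (points 11 and 12).
    return str(uuid.uuid4())


@app.route("/public/webhook/workflow-runs", methods=["POST"])
def receive_workflow_runs():
    # Authenticate via a custom header (point 1).
    if not is_valid_api_key(request.headers.get("X-API-KEY")):
        return jsonify({"status": "error", "message": "Invalid or missing API key"}), 200

    body = request.get_json(silent=True) or {}
    runs = body.get("workflow_runs", [])

    # Only cheap, synchronous validation here: chunk size and required fields (points 5 and 6).
    if len(runs) > MAX_WORKFLOW_RUNS_PER_CALL:
        return jsonify({
            "status": "error",
            "message": f"Send at most {MAX_WORKFLOW_RUNS_PER_CALL} workflow runs per call",
        }), 200
    missing = sorted({field for run in runs for field in REQUIRED_FIELDS - run.keys()})
    if missing:
        return jsonify({"status": "error", "message": f"Missing required fields: {missing}"}), 200

    # Persist the raw payload and hand off to async processing, then resolve
    # immediately with a 200 (points 2 and 4).
    event_id = save_raw_request_and_register_event(request.headers.get("X-API-KEY"), body)
    return jsonify({"status": "accepted", "event_id": event_id}), 200

The key property is that the handler only does cheap checks inline and always returns 200 quickly; everything heavy happens after the event is registered.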

Kamlesh72 (Contributor) commented

@adnanhashmi09 Providers like GitHub Actions, GitLab, CircleCI, etc. are the sources of user deployment data. What other platforms can send data to our system? The structure of the data will be different for each source, so will we need an adapter to process the data from each, or will the user send an already structured payload?

We can store all the incoming data in Redis, to be picked up later for processing. We can only verify errors like an invalid API key before sending the response, since the data is not yet processed but we need to return 200 ASAP. So how do we satisfy point 6 if we are not processing data synchronously?

We also fetch data from the GitHub Actions REST API. So both REST API and webhook data will be stored in the same table, right? Can a user also prune data fetched from the GitHub Actions REST API (point 7)?

Can you please elaborate on point 11?

adnanhashmi09 (Contributor) commented Oct 10, 2024

> @adnanhashmi09 Providers like GitHub Actions, GitLab, CircleCI, etc. are the sources of user deployment data. What other platforms can send data to our system? The structure of the data will be different for each source, so will we need an adapter to process the data from each, or will the user send an already structured payload?

This webhook implementation is platform-agnostic. We don't care about the workflow provider, as the provider is not responsible for sending data. It is the user who integrates our webhook into their workflow who is responsible for sending the correct data. We will define a set of fields we require in the request body in order to register a RepoWorkflow and RepoWorkflowRuns. It is up to the end user to make sure correct values are being sent.

> We can store all the incoming data in Redis, to be picked up later for processing. We can only verify errors like an invalid API key before sending the response, since the data is not yet processed but we need to return 200 ASAP. So how do we satisfy point 6 if we are not processing data synchronously?

Well, we can check for a few errors besides API key errors: for instance, the maximum amount of data allowed in a single request, and whether the repo_names sent are linked with Middleware or not. These checks are fairly quick to compute.

> We also fetch data from the GitHub Actions REST API. So both REST API and webhook data will be stored in the same table, right? Can a user also prune data fetched from the GitHub Actions REST API (point 7)?

I don't think anybody would send GitHub Actions data through both the integration and the webhook. But yes, in practice we keep both sets of data. We don't give the option to prune GitHub Actions data, since users can always unlink that integration.

> Can you please elaborate on point 11?

We can save the entire request data in a database table, including the data we receive for processing. This way we can check for errors and show alerts to the user by reading from that table. It can also serve as a data dump to check what data has been received by our system for processing.

Kamlesh72 (Contributor) commented

@adnanhashmi09 Here is my approach. The ?? marks fields I'm not sure whether to add.

API KEYS

  • User will be able to Create, Read and Delete API Keys.
  • The API Key settings can be accessed as follows:
    (Screenshot: API Key navigation in Settings)
// APIKeys table schema in Postgres
API_KEYS {
    keyname: "",
    secret_key: "",
    expiry_at: "",
    is_deleted: "",
    scope: "",  // [ WORKFLOW, INCIDENT ]
    org_id: ""  // ??
}
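
For illustration, the proposed table could also be expressed as a SQLAlchemy model along these lines (a sketch only; the column types and the extra id/created_at columns are assumptions, not the project's actual models):

# Sketch of the proposed APIKeys table as a SQLAlchemy model; column names mirror
# the pseudo-schema above, everything else is assumed.
import uuid
from datetime import datetime
from sqlalchemy import Boolean, Column, DateTime, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class APIKey(Base):
    __tablename__ = "api_keys"

    id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
    keyname = Column(String, nullable=False)
    secret_key = Column(String, nullable=False, unique=True)  # store a hash, not the raw key
    expiry_at = Column(DateTime, nullable=True)
    is_deleted = Column(Boolean, default=False)               # soft delete (see Discussion below)
    scope = Column(String, nullable=False)                    # e.g. "WORKFLOW" or "INCIDENT"
    org_id = Column(String, nullable=True)                    # open question (the ?? above)
    created_at = Column(DateTime, default=datetime.utcnow)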

Receiving Webhook Data

The "/webhook" API will be added to the Flask server.

{
    event_type: "WORKFLOW", // or "INCIDENT",
    payload: {
        workflow_runs: [{
            workflow_name: "name",
            provider_workflow_id: "", // ??
            repo_name: "middleware",
            event_actor: "githubusername",
            head_branch: "master",
            workflow_run_id: "",
            status: "SUCCESS",
            duration: "200",
            workflow_run_conducted_at: "date",
            html_url: "url"
        }]
    }
}

// Headers: "X-Secret-Key": "secret_key"
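
As an illustration of how an end user's CI step might call this, here is a sketch using Python's requests library; the host, header value, and field values are placeholders following the draft payload above, not a finalized API:

# Hypothetical client-side call from a user's CI step; URL and values are placeholders.
import requests

payload = {
    "event_type": "WORKFLOW",
    "payload": {
        "workflow_runs": [{
            "workflow_name": "deploy-production",
            "repo_name": "middleware",
            "event_actor": "githubusername",
            "head_branch": "master",
            "workflow_run_id": "run-2024-10-10-001",
            "status": "SUCCESS",
            "duration": "200",
            "workflow_run_conducted_at": "2024-09-28T20:35:45.123456+00:00",
            "html_url": "https://example.com/runs/run-2024-10-10-001",
        }]
    },
}

resp = requests.post(
    "https://middleware.example.com/webhook",   # placeholder host
    json=payload,
    headers={"X-Secret-Key": "<secret_key>"},
    timeout=10,
)
# The endpoint always returns 200; any error is reported in the response body.
print(resp.json())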

Pre-Processing Validation

  • Verify API Key
  • Verify size of data
  • Verify required fields
  • Verify repo_name exists in middleware
    If there is an error, send 200 with an error message and notify the user about the erroneous data via email/Slack (a rough sketch of these checks follows this list).
    The notification module can be developed separately and integrated later.
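
A rough sketch of these pre-processing checks as a single function (field names follow the payload draft above; the set of linked repos is assumed to be available from Middleware's existing data):

# Sketch of the synchronous pre-processing checks; limits and field names follow
# the draft above and are assumptions, not finalized values.
MAX_RUNS_PER_REQUEST = 500
REQUIRED_FIELDS = {
    "workflow_name", "repo_name", "event_actor", "head_branch",
    "workflow_run_id", "status", "workflow_run_conducted_at",
}


def validate_webhook_payload(payload: dict, linked_repos: set) -> list:
    """Return a list of error messages; an empty list means the payload passed validation."""
    errors = []
    runs = payload.get("payload", {}).get("workflow_runs", [])

    if not runs:
        errors.append("workflow_runs is empty")
    if len(runs) > MAX_RUNS_PER_REQUEST:
        errors.append(f"at most {MAX_RUNS_PER_REQUEST} workflow runs allowed per request")

    for i, run in enumerate(runs):
        missing = REQUIRED_FIELDS - run.keys()
        if missing:
            errors.append(f"workflow_runs[{i}] missing fields: {sorted(missing)}")
        repo = run.get("repo_name")
        if repo and repo not in linked_repos:
            errors.append(f"workflow_runs[{i}]: repo '{repo}' is not linked with Middleware")

    return errors

If the returned list is non-empty, the route would still respond with 200, put the error messages in the body, and trigger the notification module.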

Store the data for processing

  1. Store the data in a Postgres table WebhookEventRequests (which acts as the data-dump table).
    The same table stores both Workflow and Incident webhook data.
WebhookEventRequests {
    request_type: "DEPLOYMENT", // Or INCIDENT
    request_data: "{ workflow_runs: [] }",
    status: "",
    error: "",
    created_in_db_at: "",
    processed_at: "date",
    response_data: "",
    retries: 0
}
  2. Call a Celery task to process the data asynchronously; the broker will be Redis (a sketch of the task follows this list).
  3. If there is any error, the WebhookEventRequest will be updated accordingly with status and error.
    The user will also be notified about the error.
  4. If there is no error, update the WebhookEventRequest and store the data in RepoWorkflow and RepoWorkflowRuns.
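
A sketch of what the Celery side could look like; only the Celery wiring is real API, while the task name and the helper stubs stand in for Middleware's persistence and notification code:

# Sketch of the async processing step; helpers are stubs for illustration.
from celery import Celery

celery_app = Celery("webhook_processing", broker="redis://localhost:6379/0")


# --- stubs standing in for real persistence / notification code ---
def load_webhook_event(event_id):
    # Would read the WebhookEventRequests row for this event.
    return {"id": event_id, "request_data": {"workflow_runs": []}}

def upsert_repo_workflow_and_runs(runs):
    # Would create/update RepoWorkflow and RepoWorkflowRuns.
    pass

def mark_event(event_id, status, error=None):
    # Would update status, error, and processed_at on the WebhookEventRequest.
    pass

def notify_user_of_error(event_id, exc):
    # Would send the email/Slack alert about erroneous data.
    pass


@celery_app.task(bind=True, max_retries=3)
def process_webhook_event(self, event_id):
    event = load_webhook_event(event_id)
    try:
        runs = event["request_data"].get("workflow_runs", [])
        upsert_repo_workflow_and_runs(runs)
        mark_event(event_id, status="PROCESSED")
    except Exception as exc:
        mark_event(event_id, status="FAILED", error=str(exc))
        notify_user_of_error(event_id, exc)
        raise self.retry(exc=exc, countdown=60)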

UI

There will be 3 pages, namely: Deployments, Incidents, API Keys.
These pages will show the incoming data in table form with columns like WebhookEventRequest_id, processed_at, and status.
The data could be streamed from the WebhookEventRequests table using Server-Sent Events.
For now, we will just use API calls, so the user needs to refresh to get new data.
Another option is to have 3 tabs (API Keys, Deployments, Incidents) on a single page.

Discussion

  • Should the API Key actually be deleted or just marked is_deleted?
  • Why repo_names and not repo_name?
  • If the incoming data is small, e.g. fewer than 10 workflow runs, it could be processed synchronously or in batches in Celery (to be decided later).
