Integration task architecture spec'd #5968

Dschoordsch · 2022-01-26T11:38:32Z

Now we have a generic data structure for multiple integrations and should move towards unifying the existing Jira and GitHub integrations to avoid duplication and simplify adding new ones. We're adding Jira Server and GitLab as new integrations and these should already follow a new generic design.
Notification integrations (i.e. slack/mattermost) are out of scope here.

Current state

IntegrationProvider stores the different cloud and privately hosted instances of different integration services. It is a separate table.
service describes the type of the integration (i.e. 'gitlab', 'github', 'jira', ...) and thus which integration manager is used
scope describes which users have access to an IntegrationProvider, it can be 'org', 'team' or 'global'. Global IntegrationProviders are also stored in the DB and added in yarn postDeploy based on environment variables.
TeamMemberIntegrationAuth stores the authentication of a TeamMember for a specific IntegrationProvider and is used instead of separate tables per service. Conceptually it is a union of different authentication methods (e.g. OAuth1, OAuth2, PAT, ...)

New and shiny mutations

addIntegrationProvider
removeIntegrationProvider
updateIntegrationProvider
createTaskIntegration¹
addTeamMemberIntegrationAuth
removeTeamMemberIntegrationAuth

Legacy mutations

GitHub and Jira both have their own tables and mutations.

addGitHubAuth
addAtlassianAuth
removeAtlassianAuth
createGitHubTask
createJiraTask
createGitHubTaskIntegration¹
createJiraTaskIntegration¹
persistGitHubSearchQuery
persistJiraSearchQuery
updateGitHubDimensionField
updateJiraDimensionField
pushEstimateToGitHub
addMissingJiraField

Legacy queries

fetchGitHubRepos
fetchAtlassianProjects

Goal

We want to make adding new IntegrationProvider services easy. Service specific logic on the backend side should be encapsulated in their respective <Service>Manager and the data stored in a common table.
If there is something specific to one service only (e.g. adding a missing field), this does not need to be forced into one generic method.

Acceptance criteria

new table schema for integration specific data (e.g. search query, last used project, voting field etc.)
map of new mutations and legacy mutations they are replacing
interface for ServiceManager which supports these new mutations

Next steps (out of scope)

define a migration path for Jira and GitHub which is taking into account the work we're doing to add GitLab and Jira Server ("While we're in there")

createGitHubTaskIntegration and createJiraTaskIntegration are currently combined into createTaskIntegration in Make createJiraTaskIntegration/createGitHubTaskIntegration generic #5819 ↩ ↩² ↩³

The text was updated successfully, but these errors were encountered:

Dschoordsch · 2022-01-26T20:17:04Z

Storing integration for `Task`

Currently Task has the following

Task {
  [...]
  integrationHash: GitHubIssueId.join(projectId, issueNumber),
  integration: {
     accessUserId: this.accessUserId!,
     service: 'github',
     issueNumber,
     nameWithOwner: projectId
  },

or

  integrationHash: JiraIssueId.join(cloudId, issueKey),
  integration: {
    accessUserId: this.accessUserId!,
    service: 'jira',
    cloudId,
    issueKey
  },

integrationHash needs to be unique across all integration providers and thus should include the providerId. If integrationHash includes also projectId and issueNumber these fields are redundant in the integration field.
integration needs to have accessUserId and service is need in order to choose the correct format to parse the integrationHash
all other fields are redundant.

So going forward it would be enough if we had sth. like

  integrationHash: JiraServerIssueId.join(providerId, projectId, issueKey)
  integration: {
    accessUserId
    service
  }

This could be flattened, but this can be addressed when Task is migrated to Postgres.

Dschoordsch · 2022-01-28T18:38:55Z

Storing integration provider specific data

Legacy integration auth tables GitHubAuth and AtlassianAuth also store settings and cache values:
githubSearchQueries and jiraSearchQueries, cloudIds,

cloudIds belongs to the authentication and caches which projects are accessible and should be stored in TeamMemberIntegrationAuth.
searchQueries should be stored separately. It belongs to the TeamMember and when we allow to reuse a team members auth, a user could have a searchQuery to store without having authenticated with anything. Same goes for voting fields for poker. These settings could be either stored in TeamMember or in a new IntegrationSettings table

mattkrick · 2022-02-01T00:50:04Z

Thank you so much for getting this kicked off!

Questions:

For <Service>Manager if a service has multiple authStrategies (e.g. PAT and OAuth2) should both strategies be a part of the same manager?
Would JiraIssueId still include cloudId? I think it would have to if we are to compare scoping phase search result items to Tasks that already exist in our system, but wanted to make sure
I love the single source of truth, but would removing fields in the integration object make things too difficult to query? For example, if I wanted to see if there was a task in our database that was connected to this GitHub issue, today i could filter on integration.nameWithOwner. Without that, i'd have to filter on integration.service = 'github', then parse the returned integrationHash rows with a SQL equivalent of GitHubIssueId.split. I suppose we could always index on that computation if we used it in prod.
Would jiraDimensionFields, which currently lives on the Teams table, get its own table?

igorlesnenko · 2022-02-03T09:23:27Z

I understand that it is inconvenient that all integrations would have different field names for storing their data.

Not sure if we need to store separate fields as a single string, as I agree with Matt, it will not allow us to use database constraints, indexes and make queries more complex.
Also, could it feel conceptually wrong? If several fields should be in a single field then something like JSON feels more suitable.

What if we go this way:

integrationHash: JiraIssueId.join(cloudId, issueKey) // etc.
  integration: {
    accessUserId: this.accessUserId!,
    service: 'jira',
    projectId: // nameWithOwner, JiraProjectId.join(cloudId, projectKey)
    issueId // issueKey, issueNumber etc
  },

We do not use integrationHash as a single source of truth, only for determining unique integration.
Also hash word feels like it should not be decoded back. But maybe it is only me.

We use only two fields, projectId and issueId.
Storing cloudId in projectId for jira would be an exception to the rule.

However, it is also not a problem to have an additional cloudId field. Not sure I fully understand how having only common fields in query would simplify the code, as we need to parse and handle it individually anyway.

So alternative is :

integrationHash: JiraIssueId.join(cloudId, issueKey) // etc.
  integration: {
    accessUserId: this.accessUserId!,
    service: 'jira',
    projectId: // nameWithOwner, **still** JiraProjectId.join(cloudId, projectKey)
    issueId // issueKey, issueNumber etc,
    cloudId // valid only for jira, but maybe we can reuse that field for other integrations
  },

I would still prefer to have no parsing probably and go with having separate field for cloudId

What do you think?

Dschoordsch · 2022-02-03T10:46:25Z

<Service>Manager is currently only determined by the IntegrationProviderServiceEnum thus it cannot distinguish between authentication methods. This means we have one manager which internally has to deal with PAT vs. OAuth2 which shouldn't be too difficult
I don't have objections against storing necessary fields separately, but I don't understand the mentioned use case. I would imagine to look if a task already exists, we would hash it and check for the integrationHash. I agree searching for existing tasks by cloudId would be more cumbersome and we could store it separately just for that, but we're not querying for it at the moment. If we store it, we should still try to keep these integration specific inputs out of mutations if not necessary.
@igorlesnenko We set the precedence in TeamMemberIntegrationAuth that we prefer having additional columns instead of JSON, so let's keep it consistent at least across our current integration journey. I would also keep everything out that's not needed to query directly from the DB.
cloudId is an interesting case because as far as I can see there are very limited uses:
- we pass it around to construct a projectId or integrationHash
- it is stored in the authentication table and used to avoid 1 request when fetching available projects (which means the cloudIds stored could be stale)
- it is used together with projectKey or projectKey and issueId in AtlassianManager which could be replaced by projectId and integrationHash usage instead
- JiraDimensionFieldId has only cloudId but should really be project specific as fields can be configured on project level
We have 2 options to keep the DB consistent, we could add cloudId as generated field or add a check that it matches the integrationHash. Both would be done on writes, but the first one doesn't require the field to be updated manually, so it has a slight edge here.
@mattkrick Thanks for pointing me to jiraDimensionFields.
We should be consistent and aim for simplicity. Since integrations are team specific and an integral part of the experience, let's put Team specific integration settings into Team and TeamMember specific settings into TeamMember. Only downside I can see is adding a couple of null columns to every TeamMember.

Which exact fields are on Task does not need to be decided right now since it's still stored in rethink and that's something we need to have decided before moving it to postgres. However I propose for new integrations trying to add no additional fields to Task and when transitioning to postgres trying to get along without the additional columns and only adding these as GENERATED ... STORED once we realize these are necessary.

mattkrick · 2022-02-03T23:13:36Z

If we store it, we should still try to keep these integration specific inputs out of mutations if not necessary.

100% I'm all for storing it in the integration object. we don't use it today, but for security if someone disables an entire cloud, it'd be nice to just nuke all the tasks (or whatever unknown future use case may arise)
I agree with you, I don't think we have to have a solution for task.integration yet. When we revisit, perhaps we make this a separate table or JSONB. github & gitlab are very similarly shaped, jira is a wild animal, who knows what other integration we'll add. lots more to learn!
deriving projectKey from issueKey works today, but it's possible that some jira admin doesn't use the default naming mechanism (a - delimiter). If that happens, we'll want to store projectKey separately. No action needed, but something to keep in the back of your mind

We have 2 options to keep the DB consistent, we could add cloudId as generated field or add a check that it matches the integrationHash. Both would be done on writes, but the first one doesn't require the field to be updated manually, so it has a slight edge here.

No strong preference here, just as long as it's not cumbersome or expensive for tasks connected via github, gitlab, et al

igorlesnenko · 2022-03-23T20:27:15Z

I faced this issue while doing #6238: "Store issueId instead of issueNumber for GitHub issue integration #6252"

We need to recalculate GitHub hashes or store issueId in integration.

Considering this, I would suggest the following:

integration: {
    accessUserId: this.accessUserId!,
    service: 'jira',
    projectId: JiraProjectId.join(cloudId, projectKey)
    issueId: JiraIssueId.join(cloudId, projectKey)
  },

mattkrick · 2022-03-23T23:58:40Z

do we already have decided where we store integration specific values like search queries and project ids?

i think search queries probably deserve their own table. It makes sense to index them on teamId.
Keeping them on the Team table is OK, but 90% of the time you query a Team row you don't need the search query, so it's a little overfetch.

projectIds

IIRC projectIds are just a GUID consisting of a cloudId + projectKey. If that's correct, then we can just make it as long as we have both of those values.
today, an integrated jira task has {projectKey, cloudId, issueKey}, which is probably sufficient.

nickoferrall · 2022-03-24T09:13:17Z

I haven't started implementing the search query for GitLab, but creating their own table sounds good to me.

For GitLab, I'm storing the gid that we get from GitLab, and the providerId to ensure the id is unique for self-hosted solutions. See here.

jordanh · 2022-07-20T21:16:56Z

@Dschoordsch now that we're a few integrations in, should this issue be closed?

Dschoordsch · 2022-07-29T17:57:27Z

I would have to take another look, but I suspect we could unify search and also could migrate GitHub and Jira to IntegrationProvider. Both doesn't solve any immediate problems, so let's just skip this for now until we're in there again.

Dschoordsch added discussion enhancement labels Jan 26, 2022

This was referenced Feb 7, 2022

Jira Server: persist search query #5820

Closed

GitLab Architectural Spec Written #5564

Closed

This was referenced Feb 11, 2022

Sprint Poker: GitLab v1: Create new issue #6033

Closed

GitLab v3: Task Integration Spec Written #6050

Closed

GitLab Architecture Proposal #6058

Closed

Dschoordsch added Story Points: 13 Generated by Parabol Next Sprint: Y Generated by Parabol labels Mar 15, 2022

igorlesnenko mentioned this issue Mar 22, 2022

Store issueId instead of issueNumber for GitHub issue integration #6252

Closed

2 tasks

jordanh removed the Next Sprint: Y Generated by Parabol label May 3, 2022

Dschoordsch self-assigned this Jul 21, 2022

Dschoordsch closed this as completed Jul 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integration task architecture spec'd #5968

Integration task architecture spec'd #5968

Dschoordsch commented Jan 26, 2022 •

edited

Loading

Dschoordsch commented Jan 26, 2022

Dschoordsch commented Jan 28, 2022

mattkrick commented Feb 1, 2022

igorlesnenko commented Feb 3, 2022 •

edited

Loading

Dschoordsch commented Feb 3, 2022

mattkrick commented Feb 3, 2022

igorlesnenko commented Mar 23, 2022 •

edited

Loading

mattkrick commented Mar 23, 2022

nickoferrall commented Mar 24, 2022

jordanh commented Jul 20, 2022

Dschoordsch commented Jul 29, 2022

Integration task architecture spec'd #5968

Integration task architecture spec'd #5968

Comments

Dschoordsch commented Jan 26, 2022 • edited Loading

Current state

New and shiny mutations

Legacy mutations

Legacy queries

Goal

Acceptance criteria

Next steps (out of scope)

Footnotes

Dschoordsch commented Jan 26, 2022

Storing integration for Task

Dschoordsch commented Jan 28, 2022

Storing integration provider specific data

mattkrick commented Feb 1, 2022

igorlesnenko commented Feb 3, 2022 • edited Loading

Dschoordsch commented Feb 3, 2022

mattkrick commented Feb 3, 2022

igorlesnenko commented Mar 23, 2022 • edited Loading

mattkrick commented Mar 23, 2022

nickoferrall commented Mar 24, 2022

jordanh commented Jul 20, 2022

Dschoordsch commented Jul 29, 2022

Dschoordsch commented Jan 26, 2022 •

edited

Loading

Storing integration for `Task`

igorlesnenko commented Feb 3, 2022 •

edited

Loading

igorlesnenko commented Mar 23, 2022 •

edited

Loading