Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration task architecture spec'd #5968

Closed
3 tasks
Dschoordsch opened this issue Jan 26, 2022 · 11 comments
Closed
3 tasks

Integration task architecture spec'd #5968

Dschoordsch opened this issue Jan 26, 2022 · 11 comments
Assignees

Comments

@Dschoordsch
Copy link
Contributor

Dschoordsch commented Jan 26, 2022

Now we have a generic data structure for multiple integrations and should move towards unifying the existing Jira and GitHub integrations to avoid duplication and simplify adding new ones. We're adding Jira Server and GitLab as new integrations and these should already follow a new generic design.
Notification integrations (i.e. slack/mattermost) are out of scope here.

Current state

  • IntegrationProvider stores the different cloud and privately hosted instances of different integration services. It is a separate table.
  • service describes the type of the integration (i.e. 'gitlab', 'github', 'jira', ...) and thus which integration manager is used
  • scope describes which users have access to an IntegrationProvider, it can be 'org', 'team' or 'global'. Global IntegrationProviders are also stored in the DB and added in yarn postDeploy based on environment variables.
  • TeamMemberIntegrationAuth stores the authentication of a TeamMember for a specific IntegrationProvider and is used instead of separate tables per service. Conceptually it is a union of different authentication methods (e.g. OAuth1, OAuth2, PAT, ...)

New and shiny mutations

  • addIntegrationProvider
  • removeIntegrationProvider
  • updateIntegrationProvider
  • createTaskIntegration1
  • addTeamMemberIntegrationAuth
  • removeTeamMemberIntegrationAuth

Legacy mutations

GitHub and Jira both have their own tables and mutations.

  • addGitHubAuth
  • addAtlassianAuth
  • removeAtlassianAuth
  • createGitHubTask
  • createJiraTask
  • createGitHubTaskIntegration1
  • createJiraTaskIntegration1
  • persistGitHubSearchQuery
  • persistJiraSearchQuery
  • updateGitHubDimensionField
  • updateJiraDimensionField
  • pushEstimateToGitHub
  • addMissingJiraField

Legacy queries

  • fetchGitHubRepos
  • fetchAtlassianProjects

Goal

We want to make adding new IntegrationProvider services easy. Service specific logic on the backend side should be encapsulated in their respective <Service>Manager and the data stored in a common table.
If there is something specific to one service only (e.g. adding a missing field), this does not need to be forced into one generic method.

Acceptance criteria

  • new table schema for integration specific data (e.g. search query, last used project, voting field etc.)
  • map of new mutations and legacy mutations they are replacing
  • interface for ServiceManager which supports these new mutations

Next steps (out of scope)

  • define a migration path for Jira and GitHub which is taking into account the work we're doing to add GitLab and Jira Server ("While we're in there")

Footnotes

  1. createGitHubTaskIntegration and createJiraTaskIntegration are currently combined into createTaskIntegration in Make createJiraTaskIntegration/createGitHubTaskIntegration generic #5819 2 3

@Dschoordsch
Copy link
Contributor Author

Storing integration for Task

Currently Task has the following

Task {
  [...]
  integrationHash: GitHubIssueId.join(projectId, issueNumber),
  integration: {
     accessUserId: this.accessUserId!,
     service: 'github',
     issueNumber,
     nameWithOwner: projectId
  },

or

  integrationHash: JiraIssueId.join(cloudId, issueKey),
  integration: {
    accessUserId: this.accessUserId!,
    service: 'jira',
    cloudId,
    issueKey
  },
  • integrationHash needs to be unique across all integration providers and thus should include the providerId. If integrationHash includes also projectId and issueNumber these fields are redundant in the integration field.
  • integration needs to have accessUserId and service is need in order to choose the correct format to parse the integrationHash
  • all other fields are redundant.

So going forward it would be enough if we had sth. like

  integrationHash: JiraServerIssueId.join(providerId, projectId, issueKey)
  integration: {
    accessUserId
    service
  }

This could be flattened, but this can be addressed when Task is migrated to Postgres.

@Dschoordsch
Copy link
Contributor Author

Storing integration provider specific data

Legacy integration auth tables GitHubAuth and AtlassianAuth also store settings and cache values:
githubSearchQueries and jiraSearchQueries, cloudIds,

  • cloudIds belongs to the authentication and caches which projects are accessible and should be stored in TeamMemberIntegrationAuth.
  • searchQueries should be stored separately. It belongs to the TeamMember and when we allow to reuse a team members auth, a user could have a searchQuery to store without having authenticated with anything. Same goes for voting fields for poker. These settings could be either stored in TeamMember or in a new IntegrationSettings table

@mattkrick
Copy link
Member

Thank you so much for getting this kicked off!

Questions:

  • For <Service>Manager if a service has multiple authStrategies (e.g. PAT and OAuth2) should both strategies be a part of the same manager?
  • Would JiraIssueId still include cloudId? I think it would have to if we are to compare scoping phase search result items to Tasks that already exist in our system, but wanted to make sure
  • I love the single source of truth, but would removing fields in the integration object make things too difficult to query? For example, if I wanted to see if there was a task in our database that was connected to this GitHub issue, today i could filter on integration.nameWithOwner. Without that, i'd have to filter on integration.service = 'github', then parse the returned integrationHash rows with a SQL equivalent of GitHubIssueId.split. I suppose we could always index on that computation if we used it in prod.
  • Would jiraDimensionFields, which currently lives on the Teams table, get its own table?

@igorlesnenko
Copy link
Contributor

igorlesnenko commented Feb 3, 2022

I understand that it is inconvenient that all integrations would have different field names for storing their data.

Not sure if we need to store separate fields as a single string, as I agree with Matt, it will not allow us to use database constraints, indexes and make queries more complex.
Also, could it feel conceptually wrong? If several fields should be in a single field then something like JSON feels more suitable.

What if we go this way:

integrationHash: JiraIssueId.join(cloudId, issueKey) // etc.
  integration: {
    accessUserId: this.accessUserId!,
    service: 'jira',
    projectId: // nameWithOwner, JiraProjectId.join(cloudId, projectKey)
    issueId // issueKey, issueNumber etc
  },

We do not use integrationHash as a single source of truth, only for determining unique integration.
Also hash word feels like it should not be decoded back. But maybe it is only me.

We use only two fields, projectId and issueId.
Storing cloudId in projectId for jira would be an exception to the rule.

However, it is also not a problem to have an additional cloudId field. Not sure I fully understand how having only common fields in query would simplify the code, as we need to parse and handle it individually anyway.

So alternative is :

integrationHash: JiraIssueId.join(cloudId, issueKey) // etc.
  integration: {
    accessUserId: this.accessUserId!,
    service: 'jira',
    projectId: // nameWithOwner, **still** JiraProjectId.join(cloudId, projectKey)
    issueId // issueKey, issueNumber etc,
    cloudId // valid only for jira, but maybe we can reuse that field for other integrations
  },

I would still prefer to have no parsing probably and go with having separate field for cloudId

What do you think?

@Dschoordsch
Copy link
Contributor Author

  • <Service>Manager is currently only determined by the IntegrationProviderServiceEnum thus it cannot distinguish between authentication methods. This means we have one manager which internally has to deal with PAT vs. OAuth2 which shouldn't be too difficult
  • I don't have objections against storing necessary fields separately, but I don't understand the mentioned use case. I would imagine to look if a task already exists, we would hash it and check for the integrationHash. I agree searching for existing tasks by cloudId would be more cumbersome and we could store it separately just for that, but we're not querying for it at the moment. If we store it, we should still try to keep these integration specific inputs out of mutations if not necessary.
    @igorlesnenko We set the precedence in TeamMemberIntegrationAuth that we prefer having additional columns instead of JSON, so let's keep it consistent at least across our current integration journey. I would also keep everything out that's not needed to query directly from the DB.
    cloudId is an interesting case because as far as I can see there are very limited uses:
    • we pass it around to construct a projectId or integrationHash
    • it is stored in the authentication table and used to avoid 1 request when fetching available projects (which means the cloudIds stored could be stale)
    • it is used together with projectKey or projectKey and issueId in AtlassianManager which could be replaced by projectId and integrationHash usage instead
    • JiraDimensionFieldId has only cloudId but should really be project specific as fields can be configured on project level
  • We have 2 options to keep the DB consistent, we could add cloudId as generated field or add a check that it matches the integrationHash. Both would be done on writes, but the first one doesn't require the field to be updated manually, so it has a slight edge here.
  • @mattkrick Thanks for pointing me to jiraDimensionFields.
    We should be consistent and aim for simplicity. Since integrations are team specific and an integral part of the experience, let's put Team specific integration settings into Team and TeamMember specific settings into TeamMember. Only downside I can see is adding a couple of null columns to every TeamMember.

Which exact fields are on Task does not need to be decided right now since it's still stored in rethink and that's something we need to have decided before moving it to postgres. However I propose for new integrations trying to add no additional fields to Task and when transitioning to postgres trying to get along without the additional columns and only adding these as GENERATED ... STORED once we realize these are necessary.

@mattkrick
Copy link
Member

If we store it, we should still try to keep these integration specific inputs out of mutations if not necessary.

  • 100% I'm all for storing it in the integration object. we don't use it today, but for security if someone disables an entire cloud, it'd be nice to just nuke all the tasks (or whatever unknown future use case may arise)
  • I agree with you, I don't think we have to have a solution for task.integration yet. When we revisit, perhaps we make this a separate table or JSONB. github & gitlab are very similarly shaped, jira is a wild animal, who knows what other integration we'll add. lots more to learn!
  • deriving projectKey from issueKey works today, but it's possible that some jira admin doesn't use the default naming mechanism (a - delimiter). If that happens, we'll want to store projectKey separately. No action needed, but something to keep in the back of your mind

We have 2 options to keep the DB consistent, we could add cloudId as generated field or add a check that it matches the integrationHash. Both would be done on writes, but the first one doesn't require the field to be updated manually, so it has a slight edge here.

No strong preference here, just as long as it's not cumbersome or expensive for tasks connected via github, gitlab, et al

@igorlesnenko
Copy link
Contributor

igorlesnenko commented Mar 23, 2022

I faced this issue while doing #6238: "Store issueId instead of issueNumber for GitHub issue integration #6252"

We need to recalculate GitHub hashes or store issueId in integration.

Considering this, I would suggest the following:

integration: {
    accessUserId: this.accessUserId!,
    service: 'jira',
    projectId: JiraProjectId.join(cloudId, projectKey)
    issueId: JiraIssueId.join(cloudId, projectKey)
  },

@mattkrick
Copy link
Member

do we already have decided where we store integration specific values like search queries and project ids?

i think search queries probably deserve their own table. It makes sense to index them on teamId.
Keeping them on the Team table is OK, but 90% of the time you query a Team row you don't need the search query, so it's a little overfetch.

projectIds

IIRC projectIds are just a GUID consisting of a cloudId + projectKey. If that's correct, then we can just make it as long as we have both of those values.
today, an integrated jira task has {projectKey, cloudId, issueKey}, which is probably sufficient.

@nickoferrall
Copy link
Contributor

I haven't started implementing the search query for GitLab, but creating their own table sounds good to me.

For GitLab, I'm storing the gid that we get from GitLab, and the providerId to ensure the id is unique for self-hosted solutions. See here.

@jordanh jordanh removed the Next Sprint: Y Generated by Parabol label May 3, 2022
@jordanh
Copy link
Contributor

jordanh commented Jul 20, 2022

@Dschoordsch now that we're a few integrations in, should this issue be closed?

@Dschoordsch Dschoordsch self-assigned this Jul 21, 2022
@Dschoordsch
Copy link
Contributor Author

I would have to take another look, but I suspect we could unify search and also could migrate GitHub and Jira to IntegrationProvider. Both doesn't solve any immediate problems, so let's just skip this for now until we're in there again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants