-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitize mentions for merge requests in Gitlab #3437
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is what you want, this will just sanitize links to github on pull requests created on GitLab. GitLab MergeRequests shouldn't create issue/PR references on github.
Are you looking at sanitizing links to GitLab MergeRequests and Issues?
Yes, exactly, you need to sanitize mentions in Gitlab Merge Requests as well. There is currently an issue, that if dependabot creates some merge request where release notes contain a mention of @andrcuns for example, it would send a notification to this user because it is also a valid handle in Gitlab which also uses I have had this change as monkey patch for quite a while already, but I think this should be added in core since I imagine quite a few users run dependabot as a script on Gitlab. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrcuns yeah makes sense 👍 we won't be able to rely on the LinkAndMentionSanitizer
for this though as this only works for github users/links/PRs and re-writes @userhandles
to github.com links. I think the way to go here would be adding a GitLabLinkAndMentionSanitizer
and using this instead when the source is gitlab.
We're love a contribution to fix this and happy to review it.
Well, technically it works as expected, because all users/links/PR's always reference Github, I don't think dependabot fetches any info from non Github sources, does it? Here is an example: https://gitlab.com/dependabot-gitlab/dependabot/-/merge_requests/567. The changelog actually points to Github user profiles, which is who actually contributed the changes. I don't think it would make much sense to sanitize these user profiles and create links that point to Gitlab profile counter parts. Do You think it makes sense to implement |
It depends on the dependency, if the package is hosted on gitlab it will. I'm still not following exactly what the problem is. The MergeRequest you link to has two user mentions, Is the problem not with gitlab usernames getting mentions on gitlab? |
The problem has nothing to do with Github, it's Gitlab only :) I must be bad at explaining it. The problem is exactly that these 2 users would receive a notification on Gitlab about a merge request in a project that has nothing to do with them simply because they are mentioned in the changelog. The example MR works correctly and did not ping those 2 users, because it was created by a script that has the proposed changes as monkey patch and it transformed the usernames to urls to github. I need to dig deeper in the code then, if description can reference release notes or changelog that comes from gitlab, then indeed it would need it's own sanitizer. |
Ah gotcha, think I follow now. So it will work as expected if the mentioned users are always github users. I was thinking of the case where these users are only on gitlab, in which case this change would suddenly link them to non-existent github users. Hm not entirely sure what the best approach is here. Will have a poke at the code to see if we know where we are fetching changelogs/release notes from and see if we can toggle based on this instead of the updated repo. |
Yes, exactly what I was thinking now. If it's possible to know where release notes come from, it should be pretty straight forward to implement the proper sanitation. I was not aware that release notes can even come from Gitlab so didn't think that linking to non existent Github accounts might be an issue. I guess similar problem can happen with referencing issues and PR's/MR's in changelog since it could reference to incorrect platform. Let me know what you find. I'm willing to try and implement it. |
@@ -244,7 +244,7 @@ def source_provider_supports_html? | |||
end | |||
|
|||
def sanitize_links_and_mentions(text, unsafe: false) | |||
return text unless source.provider == "github" | |||
return text unless source.provider == "github" || source.provider == "gitlab" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly we need something like...
return text if running_on?(:github) && source.provider == 'github'
return text if running_on?(:gitlab) && source.provider == 'gitlab'
Does that capture the reason that this PR is blocked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand, the issue is determining the source of Release notes
and Changelog
so we do not sanitize links pointing natively to Gitlab as Github links (I have not encountered such a case myself though).
If I understand correctly, source.provider
is not that, it is basically a pointer where the dependabot-core runs and not where changelog comes from.
👋 Sorry for the delay here. Both @xlgmokha and @feelepxyz have transitioned off the core team, so this got dropped for a bit.
I haven't dug deeply into that code, but if we don't support querying non-GitHub sources today, I'm sure we'll support it in the future. Having correct/useful metadata all in one place in the PR description is one of the value-adds of Dependabot--I absolutely loved that about Dependabot when using it at my previous employer. So if we query a registry like PyPI for that data and it points us at GitLab, then we should be grabbing this data from GitLab. Again, not sure if we do this today, but given the importance we'll do it in the future so I'm a bit hesitant to extend the current "hack" of assuming that if we're running on GitHub that all links point to GitHub.
Exactly.
Yeah, that's a much more sustainable / better solution IMO. I opened #5729 to track that. This probably isn't something we can get to anytime soon, but if you want to take a crack at it I'll try to be responsive on the PR reviews to keep it moving. Given all this, I'm going to close this PR as "the right problem but wrong solution". And again, my apologies for the delay here! |
Actually, I think I jumped the gun on closing this. The reality is that most open source projects are hosted on GitHub, and most dependency bumps are open-source so extending the hack to include doing the sanitization on GitLab is probably fine in the short term. Long term we'd still want to address this through looking at the metadata source (#5729), but in the meantime this PR probably improves the user experience for anyone running on GitLab. Alternatively we could remove the check altogether and improve the user experience for all the other platforms. The one user experience that would be negatively impacted is a team working at GitLab across a bunch of internal repos... suddenly their internal links will start pointing at GitHub rather than their internal repos. But given that @andrcuns already applies this as a monkey patch, I doubt many folks will be affected. I'm going to get a second opinion internally before merging, just to be sure I'm not missing something. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After noodling on it and getting a second opinon, going to merge this!
I'm going to also open a follow-up PR to remove the guard altogether, as we probably should sanitize on all platforms. And then address the underlying bug in #5729.
As pointed out in #5729, we should be using the "source" of the metadata for deciding how to sanitize, not the "place where Dependabot is hosted". This hack was annoying the folks who use GitLab to the point that [they monkey-patched this code to also run the sanitizer on GitLab](#3437 (comment)), and then contributed that back in #3437. If it's true on GitLab, it's even more true on other platforms as well for the reasons explained in #3437 (comment). So remove this guard altogether.
As pointed out in #5729, we should be using the "source" of the metadata for deciding how to sanitize, not the "place where Dependabot is hosted". This hack was annoying the folks who use GitLab to the point that [they monkey-patched this code to also run the sanitizer on GitLab](#3437 (comment)), and then contributed that back in #3437. If it's true on GitLab, it's even more true on other platforms as well for the reasons explained in #3437 (comment). So remove this guard altogether.
I have encountered an issue that quite often handles in gitlab will be the same as on github. This creates an issue where PR descriptions in various projects can send notifications to people mentioned in release notes.
I'm not entirely sure if same issue could exist on bitbucket as well and maybe this check should be removed all together. For now it seemed safer to explicitly just add gitlab to conditional.