Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External Authentication Providers Support for Cody #6526

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

pkukielka
Copy link
Contributor

@pkukielka pkukielka commented Jan 5, 2025

Fixes https://linear.app/sourcegraph/issue/CODY-4642

External Authentication Provider Support for Cody

This PR introduces support for external authentication providers in Cody, allowing users to integrate with custom authentication proxies and handle complex authentication scenarios.

Feature Overview

This feature requires clients to have reverse proxy and custom sourcegraph instance configured to use HTTP authentication.

The external authentication provider feature allows clients to generate, for a specified endpoint, custom auth headers. Those headers will be attached to every authenticated http request instead of the normal "Authorization": "token sgp_SOME_TOKEN" auth header.

To generate those custom headers client need to specify command that generates authentication headers for specific endpoints. The command must output a JSON object containing header key-value pairs on stdout. Those endpoints URLs needs to point to proxies configured by client which redirects requests to the custom sourcegraph instance.

Whole flow looks like this:

  1. When Cody attempts to connect to a endpoint which has defined external provider it executes the specified command
  2. The command outputs a JSON object containing header key-value pairs on stdout
  3. These headers are attached to subsequent authorised requests to the endpoint
  4. The proxy server processes these headers and converts them to appropriate X-Forwarded-User and/or X-Forwarded-Email headers as specified in the documentation
  5. The Sourcegraph instance authenticates the user based on these forwarded headers

Configuration

Users can configure custom authentication providers in their vscode settings.json using the following structure:

"cody.auth.externalProviders": [
    {
        "endpoint": "http://localhost:5555",
        "executable": {
            "commandLine": ["echo '{ \"headers\": { \"Authorization\": \"Bearer SomeUser\" } }'"],
            "shell": "/bin/bash",       // Optional: Shell to execute the command with. Default: '/bin/sh' on Unix, process.env.ComSpec on Windows.
            "environment": {            // Optional: Additional environment variables
                "SOME_ENV": "VALUE"
            },
            "timeout": 5000,            // Optional: Timeout in milliseconds
            "windowsHide": true         // Optional: Hide the window on Windows
        }
    }
]

It can also be configured in IntelliJ using settings editor:
image

User can define as many external providers as needed.

If only one provider is needed and login using this provider should be forced, it will be possible to accomplish using overrideServerEndpoint.

Configuration Options

  • endpoint: The URL of the proxy server that will handle the authentication
  • executable: Configuration for the command that generates authentication headers
    • commandLine: Array of command and arguments to execute
    • shell: (Optional) Specific shell to use for command execution
    • environment: (Optional) Additional environment variables for the command
    • workingDir: (Optional) Working directory for command execution
    • timeout: (Optional) Command execution timeout
    • windowsHide: (Optional) Hide window when executing on Windows

Expected Output

Script or executable specified in the configuration have to return valid JSON object which adheres to the schema:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["headers"],
  "properties": {
    "headers": {
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    },
    "refreshTime": {
      "type": "string"
    }
  }
}

Where:

  • headers (Required) [Map with string keys and values] - headers which will be attached to authenticated requests to the http proxy
  • expiration (Optional) [a number] - Epoch Unix Timestamp (UTC) of the headers expiration date; after expiration date headers will be re-generated automatically using configured command

Testing Locally

  1. Start the provided reverse proxy:
    python agent/scripts/reverse-proxy.py https://your-sourcegraph-instance.com 5555
    You can choose different port or start a few different proxies for different endpoints.

  2. Add the proxy configuration to your settings:

"cody.auth.externalProviders": [
    {
        "endpoint": "http://localhost:5555",
        "executable": {
            "commandLine": ["echo '{ \"Authorization\": \"Bearer TestUser\" }'"],
            "shell": "/bin/bash"
        }
    }
]
  1. In Cody sign in to http://localhost:5555 endpoint
  2. Verify that you're authenticated as TestUser.

Security Considerations

  1. Ensure that the proxy server properly validates and sanitizes authentication headers
  2. The executable should be secured and have appropriate permissions
  3. Consider using HTTPS for the proxy endpoint in production environments

Missing features

  1. Fastpath users custom tokens for authentication, we need to check if and how we can support it with custom auth providers.
  2. Cli is currently not supported, but should be trivial to add support for it.

Test plan

  1. Setup local testing environment as described in the Testing Locally section.
  2. Run a full QA.

Changelog

@pkukielka pkukielka force-pushed the pkukielka/external-auth-provider-impl branch from 068277d to 5bf3cd4 Compare January 5, 2025 16:56
@pkukielka pkukielka changed the title Initial implementation Custom Authentication Providers Support for Cody Jan 6, 2025
@pkukielka pkukielka force-pushed the pkukielka/external-auth-provider-impl branch 2 times, most recently from 9fc4bcf to 129682b Compare January 7, 2025 16:17
@pkukielka pkukielka changed the title Custom Authentication Providers Support for Cody External Authentication Providers Support for Cody Jan 9, 2025
@dominiccooney
Copy link
Contributor

@MaedahBatool FYI this PR description and test plan are gold for the documentation we'll need to write for this feature.

Copy link
Contributor

@dominiccooney dominiccooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will send more feedback but here's a first packet so you don't have to wait any longer. Sorry for the slow reply on this! Looks great but I am keen to read every detail.

@@ -1500,7 +1499,7 @@ export class Agent extends MessageHandler implements ExtensionClient {
},
auth: {
serverEndpoint: config.serverEndpoint,
accessToken: config.accessToken ?? null,
accessTokenOrHeaders: config.accessToken || null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This treats empty strings differently, is that relevant? (Why did we ?? in the first place?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I does not really matter, I'm not sure why I changed it, reverted now.

tokenSource?: TokenSource | undefined
accessTokenOrHeaders: string | AuthHeaders | null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can structure this better.

Does having headers versus a token imply something about the tokenSource?

If we only have token sources for tokens, let's make a sub object with { accessToken: string, tokenSource: TokenSource } which gets rid of the subtle duplication of accessToken's | null and tokenSource's | undefined (...do we ever have a token without a source?)

If we can have TokenSource for headers, then let's consider renaming tokenSource/TokenSource, maybe something like source: CredentialSource.

This is AuthCredentials, so can we simply call the important field credentials?

So maybe something like:

export interface AuthCredentials {
  serverEndpoint: string
  credentials: HeaderCredential | TokenCredential | undefined
}

export interface HeaderCredential {
  headers: Record<string, string>,
}

export interface TokenCredential {
  token: string,
  source: TokenSource,
}

Then the call sites will read something like:

if (auth.credentials) {
   ... ok we have some creds of some kind ...
}

and when we get down to the business, then auth.credentials.token and so on...

We could even add type: 'headers', type: 'token' if there's a lot of dispatch on the credential type so you can use switch if that makes things readable.

Do header credentials need an expiry? IIRC the script was going to give us a time to use the credentials for...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea regarding interface! It will be cleaner indeed.

As for credentials expiry... that is a good question.

I can add required expiration field to the JSON which should be returned by a command, but I'm not sure what would be best way to invalidate it. Just spawn some worker which will wait until exp time and then force auth refresh? That could work, although does not sound too elegant.

Alternative would be to automatically refresh auth on 403 error from the server, but it would also require some exponential backoff and it would be not up to the spec anyway.

If you don't mind I created https://linear.app/sourcegraph/issue/CODY-4663/add-support-for-the-token-expiration-date to address it in a followup to not dump everything in one huge PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored interface to the one you suggested.

@@ -166,6 +182,9 @@ interface RawClientConfiguration {
*/
overrideServerEndpoint?: string | undefined
overrideAuthToken?: string | undefined

// External auth providers
authExternalProviders?: ExternalAuthProvider[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be optional if we could just make it [] by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to be just []

@@ -166,6 +182,9 @@ interface RawClientConfiguration {
*/
overrideServerEndpoint?: string | undefined
overrideAuthToken?: string | undefined

// External auth providers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this comment is adding anything that isn't in the field and type name, so remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

env: cmd.environment ? { ...process.env, ...cmd.environment } : process.env,
}

const { stdout, stderr } = await execAsync(command, options)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test plan talks about timeouts, where is that happening?

Copy link
Contributor Author

@pkukielka pkukielka Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here:

const options = {
        ...cmd,

It works because my exec is matching node exec API, including option names. But it is hard to spot, if you think it makes sense I can just reassign all options directly so it will be clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it explicit.

},
"timeout": {
"type": "number",
"description": "Timeout for executing the command in miliseconds."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"description": "Timeout for executing the command in miliseconds."
"description": "Timeout for executing the command in milliseconds."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@dominiccooney dominiccooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've read the whole thing:

  • We should not store these creds on disk, and it would be great if the types, and runtime made it really clear that won't happen.
  • We can make that AuthCredentials type much cleaner although you don't need to hew to my exact suggestion of course. There's probably something more elegant you can come up with.
  • Let's think about future directions. Like what if the script wants to return instructions to us about credential handling? Could we put the headers in a field, headers, so there's room to dump other things in the JSON output in future?
  • What about credential expiries?
  • In practice, if the proxy kicks you out, how good are we at echoing error message status text from the proxy? Fine if we look at that in a separate PR but I'm curious what you found so far.

}

const { stdout, stderr } = await execAsync(command, options)
if (stderr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use exit codes, not error output, to measure success.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

clientConfiguration.overrideServerEndpoint || endpoint
)

// We must not throw here, because that would result in the `resolvedConfig` observable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're just moving this, but the situation is a bit 😱

The comment says we must not throw, but we will throw if we end up on line 136 via line 150. No need to address it here but I'm interested in what you think about this.

accessTokenOrHeaders = authHeaders
tokenSource = 'custom-auth-provider'
} else {
accessTokenOrHeaders = (await loadTokenFn()) || null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client does not want us to ever persist these tokens, by the way. Anything we can do at the type level to prove that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but I will add some tests to prove it works.

@@ -39,10 +40,9 @@ export class SourcegraphBrowserCompletionsClient extends SourcegraphCompletionsC
...requestParams.customHeaders,
} as HeadersInit)
addCodyClientIdentificationHeaders(headersInstance)
addAuthHeaders(config.auth, headersInstance, url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 💯

await secretStorage.storeToken(
serverEndpoint,
credentials.accessToken,
credentials.accessTokenOrHeaders,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The particular customer for the first iteration of this feature wants us to never store these creds on disk.

It would be great to use types somehow to make a credible claim about that.

Maybe we could put something that isn't JSON serializable, like a Symbol, as the field value and store a WeakMap from Symbol -> actual creds which we look up when we're writing headers. That would be a pretty compelling runtime proof that we don't store the credential in the access point history, because the credential simply isn't in this structure and even its placeholder can't be written to disk. (...although it wouldn't say anything about whether the network stack is being logged to disk or whatever...)

@pkukielka pkukielka force-pushed the pkukielka/external-auth-provider-impl branch 2 times, most recently from 68dd847 to 00022d1 Compare January 10, 2025 16:27
@pkukielka pkukielka force-pushed the pkukielka/external-auth-provider-impl branch from 00022d1 to 1c4e54a Compare January 10, 2025 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants