feat(chat): filter namespace messages from history if it exists in metadata VSCODE-611 #866

Merged
merged 60 commits into from
Nov 12, 2024


Contributor

@gagik gagik commented Nov 5, 2024

Scrub the namespace messages from the history once we have identified the namespace and are building the messages to send to the AI. We already include the namespace in the query assistant message. We might want to move these into additions to the user's message instead (in both the query and schema prompts).
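A minimal sketch of the filtering idea, assuming a simplified, hypothetical `HistoryTurn` shape with an assumed `metadata.isNamespaceMessage` flag. The real extension works with VS Code chat history types and its own metadata, which are not reproduced here. `Namespace` and `filterNamespaceMessages` are likewise illustrative names only.

```typescript
// Hypothetical, simplified stand-in for a chat history turn (not the VS Code type).
interface HistoryTurn {
  role: 'user' | 'assistant';
  content: string;
  // Assumed metadata flag marking turns that asked for or supplied a namespace.
  metadata?: { isNamespaceMessage?: boolean };
}

interface Namespace {
  databaseName: string;
  collectionName: string;
}

// Once a namespace is identified, drop the turns flagged as namespace messages:
// the namespace is already part of the query prompt, so the back-and-forth
// about it only adds noise to the history.
function filterNamespaceMessages(
  history: readonly HistoryTurn[],
  namespace: Namespace | undefined,
): HistoryTurn[] {
  if (namespace === undefined) {
    // No namespace yet: keep every turn so the model can still ask for it.
    return [...history];
  }
  return history.filter((turn) => turn.metadata?.isNamespaceMessage !== true);
}

const sampleHistory: HistoryTurn[] = [
  { role: 'user', content: 'find all users older than 30' },
  {
    role: 'assistant',
    content: 'Which database and collection should I use?',
    metadata: { isNamespaceMessage: true },
  },
  { role: 'user', content: 'app.users', metadata: { isNamespaceMessage: true } },
];

const filtered = filterNamespaceMessages(sampleHistory, {
  databaseName: 'app',
  collectionName: 'users',
});
```

Keeping the history untouched when no namespace is known matters because, in that case, the namespace exchange may still be in progress.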

Description

Checklist

Motivation and Context

  • Bugfix
  • New feature
  • Dependency update
  • Misc

Open Questions

Dependents

Types of changes

  • Backport Needed
  • Patch (non-breaking change which fixes an issue)
  • Minor (non-breaking change which adds functionality)
  • Major (fix or feature that would cause existing functionality to change)

@gagik gagik changed the base branch from main to gagik/no-database-or-collection-error November 7, 2024 21:08
@gagik gagik changed the base branch from gagik/no-database-or-collection-error to gagik/one-no-collection-handling November 7, 2024 21:09
@gagik gagik changed the base branch from gagik/one-no-collection-handling to main November 7, 2024 21:34
@gagik gagik marked this pull request as ready for review November 8, 2024 08:51
@gagik gagik requested review from Anemy and alenakhineika November 8, 2024 08:53
@gagik gagik changed the base branch from main to gagik/one-no-collection-handling November 8, 2024 12:16
@gagik gagik changed the base branch from gagik/one-no-collection-handling to main November 8, 2024 12:16
@gagik gagik changed the base branch from main to gagik/no-database-or-collection-error November 8, 2024 13:53
@gagik gagik force-pushed the gagik/filter-namespace branch from 6ed4488 to 4ad4c5f November 8, 2024 15:53
Base automatically changed from gagik/no-database-or-collection-error to main November 8, 2024 17:21
Member

@Anemy Anemy left a comment

Nice, left a couple code quality suggestions.

@@ -163,16 +165,27 @@ export abstract class PromptBase<TArgs extends PromptArgsBase> {
protected getHistoryMessages({
Member

This function is getting long, and it's hard to follow everything that happens in it.
It already has the // eslint-disable-next-line complexity comment, which is usually a sign that we should break it into a few functions, even if that comes with a slight performance hit (like running through the messages multiple times).
Should we do that now, breaking this function into multiple parts that each do one thing? That would also make it easier to unit test.
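One way the suggested split could look, as a sketch only: each step is a small function over a hypothetical `HistoryTurn` shape, and the steps run in sequence. `dropNamespaceTurns`, `dropEmptyTurns`, and `buildHistory` are illustrative names, not functions from the extension, and the metadata flag is assumed.

```typescript
// Hypothetical history shape; the metadata flag is an assumption for illustration.
interface HistoryTurn {
  role: 'user' | 'assistant';
  content: string;
  metadata?: { isNamespaceMessage?: boolean };
}

// Each step does one thing and can be unit tested on its own.
type HistoryStep = (turns: readonly HistoryTurn[]) => HistoryTurn[];

const dropNamespaceTurns: HistoryStep = (turns) =>
  turns.filter((turn) => turn.metadata?.isNamespaceMessage !== true);

const dropEmptyTurns: HistoryStep = (turns) =>
  turns.filter((turn) => turn.content.trim().length > 0);

// Runs the steps in order. Each step is a separate pass over the turns,
// which is the small performance cost of splitting the function up.
function buildHistory(
  turns: readonly HistoryTurn[],
  steps: readonly HistoryStep[],
): HistoryTurn[] {
  return steps.reduce<HistoryTurn[]>((acc, step) => step(acc), [...turns]);
}

const result = buildHistory(
  [
    { role: 'user', content: 'count documents' },
    { role: 'assistant', content: '   ' },
    {
      role: 'assistant',
      content: 'Which collection?',
      metadata: { isNamespaceMessage: true },
    },
  ],
  [dropNamespaceTurns, dropEmptyTurns],
);
```

Because every step has the same signature, a new filter can be added to the list without touching the others.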

Contributor

This is more of a question about this area of the code, and maybe a bit of a request :) If you refactor this code, could we split the history into something like getUserHistoryMessages and getAssistantHistoryMessages? When testing this functionality, I found it difficult to print the history content because it is already wrapped in the vscode.LanguageModelChatMessage format when it is returned here: https://github.com/mongodb-js/vscode/blob/main/src/participant/prompts/promptBase.ts#L130

Maybe it could look something like this:

const messages = [
  vscode.LanguageModelChatMessage.Assistant(this.getAssistantPrompt(args)),
  vscode.LanguageModelChatMessage.Assistant(this.getAssistantHistoryMessages()),
  vscode.LanguageModelChatMessage.User(this.getUserHistoryMessages()),
  vscode.LanguageModelChatMessage.User(prompt),
];

Or something along those lines. The idea is to have an unformatted string value that we can print to see exactly what we send to the model.
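A rough sketch of the two suggested helpers, returning plain strings before anything is wrapped in vscode.LanguageModelChatMessage. Unlike the methods in the suggestion above, these take the history as a parameter, and the `HistoryTurn` shape is hypothetical.

```typescript
// Hypothetical plain-data history turn, kept unwrapped so it can be printed.
interface HistoryTurn {
  role: 'user' | 'assistant';
  content: string;
}

// Returns the raw text of every user turn, in order.
function getUserHistoryMessages(history: readonly HistoryTurn[]): string[] {
  return history
    .filter((turn) => turn.role === 'user')
    .map((turn) => turn.content);
}

// Returns the raw text of every assistant turn, in order.
function getAssistantHistoryMessages(history: readonly HistoryTurn[]): string[] {
  return history
    .filter((turn) => turn.role === 'assistant')
    .map((turn) => turn.content);
}

const sampleHistory: HistoryTurn[] = [
  { role: 'user', content: 'show me the schema' },
  { role: 'assistant', content: 'Here is the schema...' },
  { role: 'user', content: 'now write a query' },
];

const userMessages = getUserHistoryMessages(sampleHistory);
const assistantMessages = getAssistantHistoryMessages(sampleHistory);
```

Note that grouping by role drops the relative order between user and assistant turns, which is the concern raised in the reply that follows.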

Member

@Anemy Anemy Nov 11, 2024

The ordering of the messages here is important: although the first message is always the assistant prompt and the last is always the user prompt, the user and assistant history messages in between are interleaved, so they can't be sent as two separate sequential groups.
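To illustrate the ordering constraint, here is a sketch that keeps the history interleaved between the fixed first and last messages and renders the result as plain text for debugging. `ChatMessage`, `buildMessages`, and `renderMessages` are hypothetical; the extension itself builds vscode.LanguageModelChatMessage objects.

```typescript
// Hypothetical plain-data message used instead of vscode.LanguageModelChatMessage.
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

// The assistant prompt is always first and the user prompt always last;
// the history in between is copied in its original, interleaved order.
function buildMessages(
  assistantPrompt: string,
  history: readonly ChatMessage[],
  userPrompt: string,
): ChatMessage[] {
  return [
    { role: 'assistant', content: assistantPrompt },
    ...history,
    { role: 'user', content: userPrompt },
  ];
}

// Plain-text rendering for debugging that keeps the same order.
function renderMessages(messages: readonly ChatMessage[]): string {
  return messages
    .map((message) => `${message.role}: ${message.content}`)
    .join('\n');
}

const builtMessages = buildMessages(
  'You are a MongoDB expert.',
  [
    { role: 'user', content: 'what fields does users have?' },
    { role: 'assistant', content: 'name, age, email' },
  ],
  'write a query for users older than 30',
);
```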
We do currently log information about the messages we are sending to the model:

messages: modelInput.messages.map(

@alenakhineika we could add something there that logs the messages in their entirety, not just the metadata, when an environment variable is set. How does that sound?
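A minimal sketch of environment-gated logging. The variable name `MDB_LOG_FULL_CHAT_MESSAGES` is hypothetical, and because the existing log call is truncated in this thread, role and content length are only assumed stand-ins for "the metadata". The environment is passed in as a parameter so the function does not depend on Node typings.

```typescript
// Hypothetical plain-data message shape.
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

interface LoggedMessage {
  role: string;
  contentLength: number;
  // Only present when full logging is enabled.
  content?: string;
}

// Hypothetical variable name for illustration; not an existing extension setting.
const FULL_LOG_ENV_VAR = 'MDB_LOG_FULL_CHAT_MESSAGES';

// Metadata (role and content length) is always included; the full content is
// added only when the environment variable is set to "true".
function describeMessagesForLog(
  messages: readonly ChatMessage[],
  env: Readonly<Record<string, string | undefined>>,
): LoggedMessage[] {
  const logFullContent = env[FULL_LOG_ENV_VAR] === 'true';
  return messages.map((message): LoggedMessage => {
    const entry: LoggedMessage = {
      role: message.role,
      contentLength: message.content.length,
    };
    if (logFullContent) {
      entry.content = message.content;
    }
    return entry;
  });
}
```

Inside the extension, `process.env` could be passed as `env`; tests can pass a plain object instead.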

Contributor Author

@gagik gagik Nov 12, 2024

I am also changing getHistoryMessages in VSCODE-632, so I think it's best to handle the refactoring in that PR.

src/test/suite/participant/participant.test.ts (outdated, resolved)
Contributor Author

@gagik gagik commented Nov 12, 2024

To keep this simple to review, I'm going to merge now and do any larger refactoring in the PR for VSCODE-632.

@gagik gagik merged commit dd80613 into main Nov 12, 2024
6 checks passed
@gagik gagik deleted the gagik/filter-namespace branch November 12, 2024 10:47
Labels
None yet
Projects
None yet

3 participants