Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(chat): Refactor prompt hierarchy #834

Merged
merged 6 commits into from
Sep 26, 2024

Conversation

nirinchev
Copy link
Contributor

@nirinchev nirinchev commented Sep 24, 2024

Description

This is a bit controversial, so putting it as a draft to get some feedback on the direction. It's preparation work for VSCODE-606 - it unifies all prompts under a common base class, which I'm planning to then extend with capability for counting user/participant message lengths as well as a flag for whether sample docs have been included in the prompt (per TD. While there are less disruptive ways to achieve this, this one felt the cleanest in terms of separation of concerns and extensibility.

It should also set us up for VSCODE-614 as abstracting the base functionality in a common place would allow us to count the tokens and drop history messages that push us past the model token limit.

Checklist

Motivation and Context

  • Bugfix
  • New feature
  • Dependency update
  • Misc

Open Questions

Dependents

Types of changes

  • Backport Needed
  • Patch (non-breaking change which fixes an issue)
  • Minor (non-breaking change which adds functionality)
  • Major (fix or feature that would cause existing functionality to change)

Comment on lines +29 to +33
interface DocsRequestMetadata {
intent: 'docs';
chatId: string;
docsChatbotMessageId?: string;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a drive-by - I wanted to more strongly confine the metadata types to ensure we don't have docsChatbotMessageId unless the intent is docs.

}

static buildMessages({
async buildMessages({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is the only one I couldn't map super cleanly and had to override the base class implementation. I have some ideas for how to restructure the abstraction to support these more complex history/prompt rewrites, but those would be more complicated than the current implementation and didn't feel it's justified to pursue until we have more than one use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll revisit this function and also how the other prompt message building functions handle namespace and connect messages/prompts in:
https://jira.mongodb.org/browse/VSCODE-611
I think the other prompts would benefit from a similar prompt change when the last message was selecting a namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - I was following up on this PR with telemetry stuff and noticed history scraping could be applied to all messages, not just namespace.

@nirinchev nirinchev marked this pull request as ready for review September 25, 2024 17:13
Copy link
Member

@Anemy Anemy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, I like the refactor to get the types more aligned. two some suggestions, I'd like if we can avoid the argument mutation one, not blockers.

src/participant/prompts/namespace.ts Outdated Show resolved Hide resolved
src/participant/prompts/promptBase.ts Outdated Show resolved Hide resolved
}

static buildMessages({
async buildMessages({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll revisit this function and also how the other prompt message building functions handle namespace and connect messages/prompts in:
https://jira.mongodb.org/browse/VSCODE-611
I think the other prompts would benefit from a similar prompt change when the last message was selecting a namespace.

The schema is generated from a sample of documents in the user's collection.
You must follows these rules.
You must follow these rules.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@@ -59,11 +58,14 @@ export async function getStringifiedSampleDocuments({
}

const stringifiedDocuments = toJSString(additionToPrompt);
promptInputTokens = await model.countTokens(prompt + stringifiedDocuments);

// TODO: model.countTokens will sometimes return undefined - at least in tests. We should investigate why.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was something Alena ran into, The model returned by vscode.lm.selectChatModels is always undefined in tests. So there's no model to count the tokens.

Copy link
Member

@Anemy Anemy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Nice update to share the connect message prompt from the rest of the prompts.

@nirinchev nirinchev merged commit 08fac17 into VSCODE-528-mongodb-copilot Sep 26, 2024
3 checks passed
@nirinchev nirinchev deleted the ni/prompt-refactor branch September 26, 2024 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants