Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add categoriser to fixtures in tests, fix streaming bugs (WIP) #323

Draft
wants to merge 28 commits into
base: fix/dropdown_sections
Choose a base branch
from

Conversation

stefl
Copy link
Contributor

@stefl stefl commented Oct 30, 2024

Description

  • Fixes a bug where we were not initialising the Aila instance correctly
  • Adds the Categoriser results to the test fixtures in E2E tests
  • Fixes some streaming logic bugs where it would take 30 seconds to stream the response from fixtures in E2E tests
  • Makes it so that Aila must be initialised just once
  • Fixes the issue where we are seeing repeated categorisations for each step of the lesson plan process (this should speed things up!)
  • Adds a new generateObject method on the LLMService which allows us to do non-streaming generations

Waiting to merge the streaming UX PR before continuing on this

Copy link

vercel bot commented Oct 30, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
oak-ai-lesson-assistant ❌ Failed (Inspect) Jan 7, 2025 0:44am

moderationAiClient,
ragService: (aila: AilaServices) => new AilaRag({ aila }),
chatCategoriser: (aila: AilaServices) =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass in the categoriser at initialisation, in the same way we do with the other features

@@ -26,5 +26,7 @@ export const defaultConfig: Config = {
userId: undefined,
},
});
await createdAila.initialise();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure that the Aila instance is initialised

topic: "Basics and Advanced Techniques",
},
}),
chatCategoriser: () =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slight change to how we initialise the categoriser

@@ -123,15 +122,38 @@ export class Aila implements AilaServices {
this._plugins = options.plugins;
}

private checkInitialised() {
if (!this._initialised) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our own sanity, throw an error if we try to use methods that rely on initialisation

if (
!sections.includes("title") ||
!sections.includes("subject") ||
!sections.includes("keyStage")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't care about topic because it's optional

!sections.includes("subject") ||
!sections.includes("keyStage")
) {
await this._lesson.setUpInitialLessonPlan(this._chat.messages);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the bug that was causing repeated categorisations

await this.moderate();
await this.persistChat();
await this.persistGeneration("SUCCESS");
await Promise.all([
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can parallelise these

try {
await this._chat.setupGeneration();
this.logStreamingStep("Setup generation complete");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the absence of tracing, these help understand what's happening during streaming and how long each step is taking

@@ -87,4 +94,31 @@ export class OpenAIService implements LLMService {

return newStream.getReader();
}

async generateObject<T>({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new method that we can use for categorisation etc. that uses structured outputs

@@ -0,0 +1,10 @@
import type { LessonPlanKeys, LooseLessonPlan } from "../../protocol/schema";

export const completedSections = (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helper to list the keys that have values on the lesson plan

content: "Respond to the user and help them edit the lesson plan",
};
const nextStep = allSteps[0] ?? finalStep;
const step = `${nextStep.title}\n\n${nextStep.content}`;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduces noisy logging during streaming

.nullable()
.describe("The reasoning behind the categorisation"),
keyStage: z
.enum([...keyStages] as [string, ...string[]])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More tightly specify the valid options for categorisation

Copy link

@stefl stefl changed the title feat: add categoriser to fixtures in tests, fix streaming bugs feat: add categoriser to fixtures in tests, fix streaming bugs (WIP) Oct 30, 2024
@codeincontext codeincontext marked this pull request as draft November 5, 2024 11:06
@stefl stefl changed the base branch from main to fix/dropdown_sections November 7, 2024 14:11
Copy link

github-actions bot commented Jan 7, 2025

Playwright test results

failed  2 failed
passed  16 passed
skipped  2 skipped

Details

report  Open report ↗︎
stats  20 tests across 17 suites
duration  2 minutes, 43 seconds
commit  0f60bbe

Failed tests

No persona › tests/aila-chat/rate-limiting.test.ts › User is restricted after message rate limit is reached
Common persona - mobile › tests/aila-chat/full-romans.mobile.test.ts › Full aila flow with Romans fixture

Skipped tests

No persona › tests/auth.test.ts › authenticate through Clerk UI
No persona › tests/chat-performance.test.ts › Component renders during lesson chat › There are no unnecessary rerenders across left and right side of chat

Copy link

sonarqubecloud bot commented Jan 7, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants