feat(js): Vitest/Jest evals #1343

Merged on Jan 20, 2025 (70 commits). Showing changes from 66 commits.

Commits
24fda8a
Initial unit test form factor exploration
jacoblee93 Dec 18, 2024
5dd083a
Merge
jacoblee93 Dec 20, 2024
8ab9e15
Refactor and add matchers
jacoblee93 Dec 20, 2024
39b5ed8
Fix bug
jacoblee93 Dec 20, 2024
0d424e2
Improve error message
jacoblee93 Dec 20, 2024
5903486
Fix ALS issues
jacoblee93 Dec 20, 2024
b5c54af
Fix lint
jacoblee93 Dec 20, 2024
c644b4d
Fix ALS issue
jacoblee93 Dec 20, 2024
70e94f5
Final fix for async local storage issue (hopefully)
jacoblee93 Dec 20, 2024
8a7947c
Adds support for .each
jacoblee93 Dec 21, 2024
e702a07
Avoid default export
jacoblee93 Dec 22, 2024
96b0c55
Fix for Bun
jacoblee93 Dec 22, 2024
8ec7fab
Lint
jacoblee93 Dec 22, 2024
0a91635
Use passed expect
jacoblee93 Dec 22, 2024
3fbfdd5
Merge branch 'main' of github.com:langchain-ai/langsmith-sdk into jac…
jacoblee93 Dec 28, 2024
4a62565
Add n parameter, fix deduping
jacoblee93 Dec 28, 2024
06a7d44
Allow more config for .each
jacoblee93 Dec 31, 2024
bbc628f
Add test for custom matchers, typing tweaks
jacoblee93 Jan 3, 2025
05f17ae
Update signature, typing, naming
jacoblee93 Jan 3, 2025
385910f
Allow passing experiment creation information into describe
jacoblee93 Jan 3, 2025
407470b
Add configurable test tracking
jacoblee93 Jan 3, 2025
cb7cb1a
Allow running test over a LangSmith dataset
jacoblee93 Jan 4, 2025
6b52759
Fix for bun
jacoblee93 Jan 6, 2025
23c7466
Speed up latency, make tracking opt-out instead of opt-in
jacoblee93 Jan 6, 2025
364ffad
Rename outputs to expected, refactor feedback logging, nicer errors
jacoblee93 Jan 7, 2025
6abd444
Fix lint
jacoblee93 Jan 7, 2025
f14b31c
Add methods to manually log feedback/outputs
jacoblee93 Jan 8, 2025
775cfbe
Fix eval run attribution
jacoblee93 Jan 8, 2025
d620051
Fix local mode
jacoblee93 Jan 8, 2025
31420aa
Refactor, add Vitest support
jacoblee93 Jan 8, 2025
c74573e
Fix lint
jacoblee93 Jan 8, 2025
039e7de
Fix type
jacoblee93 Jan 8, 2025
314da52
Merge
jacoblee93 Jan 8, 2025
8ae1c80
Remove unused dep
jacoblee93 Jan 8, 2025
a77d3d5
Adds Jest reporter
jacoblee93 Jan 13, 2025
2d93811
Merge branch 'main' of github.com:langchain-ai/langsmith-sdk into jac…
jacoblee93 Jan 13, 2025
a810c8a
Revert
jacoblee93 Jan 13, 2025
6bb23d5
Adds Vitest reporter, refactor
jacoblee93 Jan 13, 2025
574dcbb
Fix
jacoblee93 Jan 13, 2025
e1824bb
Allow export of default entrypoints
jacoblee93 Jan 13, 2025
54d3825
Add experiment URL, polish
jacoblee93 Jan 13, 2025
73712fa
Test name
jacoblee93 Jan 13, 2025
6920a90
Update test
jacoblee93 Jan 13, 2025
0937427
Fix display when test fails
jacoblee93 Jan 13, 2025
42a87dd
Naming fixes
jacoblee93 Jan 13, 2025
47d0b7e
Add feedback column collapse
jacoblee93 Jan 13, 2025
d07f42e
Merge branch 'main' of github.com:langchain-ai/langsmith-sdk into jac…
jacoblee93 Jan 15, 2025
644aca5
Refactor for simplicity, fix experiment creation
jacoblee93 Jan 15, 2025
2a45e00
Fix typo
jacoblee93 Jan 15, 2025
539e5f2
Adds docstrings
jacoblee93 Jan 15, 2025
49e1580
Add docs link in thrown errors
jacoblee93 Jan 15, 2025
c0323ab
Merge branch 'main' of github.com:langchain-ai/langsmith-sdk into jac…
jacoblee93 Jan 15, 2025
2485055
Fix typo
jacoblee93 Jan 15, 2025
c781c9e
Merge
jacoblee93 Jan 15, 2025
1056a67
Merge branch 'main' of github.com:langchain-ai/langsmith-sdk into jac…
jacoblee93 Jan 16, 2025
c3c6b6d
Adds tracking
jacoblee93 Jan 16, 2025
b41d129
Allow other parameterization of test inputs
jacoblee93 Jan 16, 2025
f6a1c28
Rename logOutput to logOutputs
jacoblee93 Jan 17, 2025
b615035
Fix concurrency issue
jacoblee93 Jan 17, 2025
ff412f6
Rename expected -> referenceOutputs and actual -> outputs
jacoblee93 Jan 17, 2025
922d20c
Docstring
jacoblee93 Jan 17, 2025
503f128
Fix lint
jacoblee93 Jan 17, 2025
89313dd
Bump default test timeout
jacoblee93 Jan 18, 2025
d12e1ce
Add git commit tagging
jacoblee93 Jan 18, 2025
d22f759
Fix tests
jacoblee93 Jan 18, 2025
d02dc31
Update
jacoblee93 Jan 18, 2025
bd8ecd5
Remove log
jacoblee93 Jan 20, 2025
7a982bd
Adds CI
jacoblee93 Jan 20, 2025
0046863
Bump version
jacoblee93 Jan 20, 2025
98feca9
Fix CI
jacoblee93 Jan 20, 2025
16 changes: 16 additions & 0 deletions js/.gitignore
@@ -59,10 +59,26 @@ Chinook_Sqlite.sql
/langchain.js
/langchain.d.ts
/langchain.d.cts
/jest.cjs
/jest.js
/jest.d.ts
/jest.d.cts
/jest/reporter.cjs
/jest/reporter.js
/jest/reporter.d.ts
/jest/reporter.d.cts
/vercel.cjs
/vercel.js
/vercel.d.ts
/vercel.d.cts
/vitest.cjs
/vitest.js
/vitest.d.ts
/vitest.d.cts
/vitest/reporter.cjs
/vitest/reporter.js
/vitest/reporter.d.ts
/vitest/reporter.d.cts
/wrappers.cjs
/wrappers.js
/wrappers.d.ts
9 changes: 9 additions & 0 deletions js/ls.vitest.config.ts
@@ -0,0 +1,9 @@
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    include: ["**/*.vitesteval.?(c|m)[jt]s"],
    reporters: ["./src/vitest/reporter.ts"],
    setupFiles: ["dotenv/config"],
  },
});
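
The `include` pattern in this config limits the run to eval files. As a rough illustration of which file names it is meant to cover, here is a minimal sketch that approximates the glob with a regular expression. The helper name `isVitestEvalFile` and the sample paths are hypothetical, and Vitest's own glob matching is what actually applies, so treat this as an approximation only.

```typescript
// Hypothetical helper: approximates the include glob
// "**/*.vitesteval.?(c|m)[jt]s" with a regular expression.
// Assumption: "?(c|m)" means "zero or one of c or m" (extglob syntax),
// so the accepted extensions are js, ts, cjs, cts, mjs, and mts.
const EVAL_FILE_PATTERN = /[^/]+\.vitesteval\.[cm]?[jt]s$/;

function isVitestEvalFile(path: string): boolean {
  return EVAL_FILE_PATTERN.test(path);
}

// Sample paths are made up for illustration.
const candidates: string[] = [
  "src/tests/chain.vitesteval.ts",
  "src/tests/chain.vitesteval.mjs",
  "src/tests/chain.vitesteval.cts",
  "src/tests/chain.test.ts",
  "src/tests/chain.vitesteval.tsx",
];

for (const file of candidates) {
  console.log(`${file}: ${isVitestEvalFile(file)}`);
}
```

Under this approximation, the first three paths match and the last two do not.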
57 changes: 56 additions & 1 deletion js/package.json
@@ -33,10 +33,26 @@
"langchain.js",
"langchain.d.ts",
"langchain.d.cts",
"jest.cjs",
"jest.js",
"jest.d.ts",
"jest.d.cts",
"jest/reporter.cjs",
"jest/reporter.js",
"jest/reporter.d.ts",
"jest/reporter.d.cts",
"vercel.cjs",
"vercel.js",
"vercel.d.ts",
"vercel.d.cts",
"vitest.cjs",
"vitest.js",
"vitest.d.ts",
"vitest.d.cts",
"vitest/reporter.cjs",
"vitest/reporter.js",
"vitest/reporter.d.ts",
"vitest/reporter.d.cts",
"wrappers.cjs",
"wrappers.js",
"wrappers.d.ts",
@@ -103,7 +119,8 @@
"homepage": "https://github.com/langchain-ai/langsmith-sdk#readme",
"dependencies": {
"@types/uuid": "^10.0.0",
"commander": "^10.0.1",
"chalk": "^4.1.2",
[Review comment from the PR author: These new deps are just for test reporting. If desired, we could split reporters into a separate package.]

"console-table-printer": "^2.12.1",
"p-queue": "^6.6.2",
"p-retry": "4",
"semver": "^7.6.3",
@@ -114,6 +131,7 @@
"@babel/preset-env": "^7.22.4",
"@faker-js/faker": "^8.4.1",
"@jest/globals": "^29.5.0",
"@jest/reporters": "^29.7.0",
"@langchain/core": "^0.3.14",
"@langchain/langgraph": "^0.2.20",
"@langchain/openai": "^0.3.11",
@@ -143,6 +161,7 @@
"typedoc": "^0.27.6",
"typedoc-plugin-expand-object-like-types": "^0.1.2",
"typescript": "^5.4.5",
"vitest": "^2.1.8",
"zod": "^3.23.8"
},
"peerDependencies": {
@@ -232,6 +251,24 @@
"import": "./langchain.js",
"require": "./langchain.cjs"
},
"./jest": {
"types": {
"import": "./jest.d.ts",
"require": "./jest.d.cts",
"default": "./jest.d.ts"
},
"import": "./jest.js",
"require": "./jest.cjs"
},
"./jest/reporter": {
"types": {
"import": "./jest/reporter.d.ts",
"require": "./jest/reporter.d.cts",
"default": "./jest/reporter.d.ts"
},
"import": "./jest/reporter.js",
"require": "./jest/reporter.cjs"
},
"./vercel": {
"types": {
"import": "./vercel.d.ts",
@@ -241,6 +278,24 @@
"import": "./vercel.js",
"require": "./vercel.cjs"
},
"./vitest": {
"types": {
"import": "./vitest.d.ts",
"require": "./vitest.d.cts",
"default": "./vitest.d.ts"
},
"import": "./vitest.js",
"require": "./vitest.cjs"
},
"./vitest/reporter": {
"types": {
"import": "./vitest/reporter.d.ts",
"require": "./vitest/reporter.d.cts",
"default": "./vitest/reporter.d.ts"
},
"import": "./vitest/reporter.js",
"require": "./vitest/reporter.cjs"
},
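
Each new `exports` entry pairs an ESM build with a CommonJS build and gives each its own type declarations. The sketch below shows how the `"./vitest"` entry maps a load style to files. The `resolveEntry` function is a hypothetical simplification, not the resolution algorithm that Node.js or TypeScript actually implement.

```typescript
// Shape of one conditional export entry, as it appears in the diff.
type ExportEntry = {
  types: { import: string; require: string; default: string };
  import: string;
  require: string;
};

// Copied from the "./vitest" entry added to package.json in this PR.
const vitestExport: ExportEntry = {
  types: {
    import: "./vitest.d.ts",
    require: "./vitest.d.cts",
    default: "./vitest.d.ts",
  },
  import: "./vitest.js",
  require: "./vitest.cjs",
};

// Hypothetical, simplified lookup: pick the runtime file and the type
// declaration file for a given load style.
function resolveEntry(
  entry: ExportEntry,
  condition: "import" | "require"
): { runtime: string; types: string } {
  return { runtime: entry[condition], types: entry.types[condition] };
}

console.log(resolveEntry(vitestExport, "import"));
console.log(resolveEntry(vitestExport, "require"));
```

With this layout, an ESM `import` resolves to `./vitest.js` with `./vitest.d.ts`, and a CommonJS `require` resolves to `./vitest.cjs` with `./vitest.d.cts`.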
"./wrappers": {
"types": {
"import": "./wrappers.d.ts",
19 changes: 19 additions & 0 deletions js/scripts/create-entrypoints.js
@@ -14,14 +14,22 @@ const entrypoints = {
"evaluation/langchain": "evaluation/langchain",
schemas: "schemas",
langchain: "langchain",
jest: "jest/index",
"jest/reporter": "jest/reporter",
vercel: "vercel",
vitest: "vitest/index",
"vitest/reporter": "vitest/reporter",
wrappers: "wrappers/index",
anonymizer: "anonymizer/index",
"wrappers/openai": "wrappers/openai",
"wrappers/vercel": "wrappers/vercel",
"singletons/traceable": "singletons/traceable",
};

const defaultEntrypoints = [
  "vitest/reporter"
];

const updateJsonFile = (relativePath, updateFunction) => {
const contents = fs.readFileSync(relativePath).toString();
const res = updateFunction(JSON.parse(contents));
@@ -34,6 +42,17 @@ const generateFiles = () => {
    const nrOfDots = key.split("/").length - 1;
    const relativePath = "../".repeat(nrOfDots) || "./";
    const compiledPath = `${relativePath}dist/${value}.js`;
    if (defaultEntrypoints.includes(key)) {
      return [
        [
          `${key}.cjs`,
          `module.exports = require('${relativePath}dist/${value}.cjs').default;`,
        ],
        [`${key}.js`, `export { default } from '${compiledPath}'`],
        [`${key}.d.ts`, `export { default } from '${compiledPath}'`],
        [`${key}.d.cts`, `export { default } from '${compiledPath}'`],
      ];
    }
    return [
      [
        `${key}.cjs`,
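
To make the new default-entrypoint branch concrete, the sketch below reproduces its string templates in a standalone function and prints the shim files it would produce for `vitest/reporter`. The function name `defaultEntrypointShims` is hypothetical. The path arithmetic is copied from the hunk above, but nothing here writes files.

```typescript
// Hypothetical standalone version of the default-entrypoint branch in
// generateFiles. Returns [fileName, fileContents] pairs.
function defaultEntrypointShims(
  key: string,
  value: string
): Array<[string, string]> {
  // One "../" per slash in the key, so nested shims can reach dist/.
  const nrOfDots = key.split("/").length - 1;
  const relativePath = "../".repeat(nrOfDots) || "./";
  const compiledPath = `${relativePath}dist/${value}.js`;
  return [
    [
      `${key}.cjs`,
      `module.exports = require('${relativePath}dist/${value}.cjs').default;`,
    ],
    [`${key}.js`, `export { default } from '${compiledPath}'`],
    [`${key}.d.ts`, `export { default } from '${compiledPath}'`],
    [`${key}.d.cts`, `export { default } from '${compiledPath}'`],
  ];
}

for (const [file, contents] of defaultEntrypointShims(
  "vitest/reporter",
  "vitest/reporter"
)) {
  console.log(`${file}: ${contents}`);
}
```

The CommonJS shim unwraps `.default` so that requiring `vitest/reporter` returns the reporter itself and not a module namespace object. This appears to be the point of the "Allow export of default entrypoints" commit.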
132 changes: 124 additions & 8 deletions js/src/client.ts
@@ -42,6 +42,7 @@
RawExample,
AttachmentInfo,
AttachmentData,
DatasetVersion,
} from "./schemas.js";
import {
convertLangChainMessageToExample,
@@ -284,6 +285,15 @@
sourceRunId?: string;
};

export type CreateProjectParams = {
  projectName: string;
  description?: string | null;
  metadata?: RecordStringAny | null;
  upsert?: boolean;
  projectExtra?: RecordStringAny | null;
  referenceDatasetId?: string | null;
};

type AutoBatchQueueItem = {
action: "create" | "update";
item: RunCreate | RunUpdate;
@@ -429,7 +439,7 @@
// If there is an item on the queue we were unable to pop,
// just return it as a single batch.
if (popped.length === 0 && this.items.length > 0) {
const item = this.items.shift()!;
[Check linting warning, js/src/client.ts line 442: Forbidden non-null assertion]
popped.push(item);
poppedSizeBytes += item.size;
this.sizeBytes -= item.size;
@@ -862,7 +872,7 @@
if (this._serverInfo === undefined) {
try {
this._serverInfo = await this._getServerInfo();
} catch (e) {
[Check linting warning, js/src/client.ts line 875: 'e' is defined but never used. Allowed unused args must match /^_/u]
console.warn(
`[WARNING]: LangSmith failed to fetch info on supported operations. Falling back to batch operations and default limits.`
);
@@ -1597,7 +1607,7 @@
treeFilter?: string;
isRoot?: boolean;
dataSourceType?: string;
}): Promise<any> {
[Check linting warning, js/src/client.ts line 1610: Unexpected any. Specify a different type]
let projectIds_ = projectIds || [];
if (projectNames) {
projectIds_ = [
@@ -1885,7 +1895,7 @@
`Failed to list shared examples: ${response.status} ${response.statusText}`
);
}
return result.map((example: any) => ({
[Check linting warning, js/src/client.ts line 1898: Unexpected any. Specify a different type]
...example,
_hostUrl: this.getHostUrl(),
}));
@@ -1898,14 +1908,7 @@
upsert = false,
projectExtra = null,
referenceDatasetId = null,
}: {
projectName: string;
description?: string | null;
metadata?: RecordStringAny | null;
upsert?: boolean;
projectExtra?: RecordStringAny | null;
referenceDatasetId?: string | null;
}): Promise<TracerSession> {
}: CreateProjectParams): Promise<TracerSession> {
const upsert_ = upsert ? `?upsert=true` : "";
const endpoint = `${this.apiUrl}/sessions${upsert_}`;
const extra: RecordStringAny = projectExtra || {};
@@ -2022,7 +2025,7 @@
}
// projectId querying
return true;
} catch (e) {
[Check linting warning, js/src/client.ts line 2028: 'e' is defined but never used. Allowed unused args must match /^_/u]
return false;
}
}
@@ -2480,6 +2483,53 @@
return (await response.json()) as Dataset;
}

  /**
   * Updates a tag on a dataset.
   *
   * If the tag is already assigned to a different version of this dataset,
   * the tag will be moved to the new version. The as_of parameter is used to
   * determine which version of the dataset to apply the new tags to.
   *
   * It must be an exact version of the dataset to succeed. You can
   * use the "readDatasetVersion" method to find the exact version
   * to apply the tags to.
   * @param params.datasetId The ID of the dataset to update. Must be provided if "datasetName" is not provided.
   * @param params.datasetName The name of the dataset to update. Must be provided if "datasetId" is not provided.
   * @param params.asOf The timestamp of the dataset to apply the new tags to.
   * @param params.tag The new tag to apply to the dataset.
   */
  public async updateDatasetTag(props: {
    datasetId?: string;
    datasetName?: string;
    asOf: string | Date;
    tag: string;
  }): Promise<void> {
    const { datasetId, datasetName, asOf, tag } = props;

    if (!datasetId && !datasetName) {
      throw new Error("Must provide either datasetName or datasetId");
    }
    const _datasetId =
      datasetId ?? (await this.readDataset({ datasetName })).id;
    assertUuid(_datasetId);

    const response = await this.caller.call(
      _getFetchImplementation(),
      `${this.apiUrl}/datasets/${_datasetId}/tags`,
      {
        method: "PUT",
        headers: { ...this.headers, "Content-Type": "application/json" },
        body: JSON.stringify({
          as_of: typeof asOf === "string" ? asOf : asOf.toISOString(),
          tag,
        }),
        signal: AbortSignal.timeout(this.timeout_ms),
        ...this.fetchOptions,
      }
    );
    await raiseForStatus(response, "update dataset tags");
  }
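
As a small illustration of the request body that `updateDatasetTag` sends, the sketch below isolates the `as_of` normalization. The helper name `buildTagUpdateBody` and the tag value `"prod"` are hypothetical. Only the serialization logic is taken from the method above, and no network call is made.

```typescript
// Hypothetical helper mirroring the body serialization in updateDatasetTag:
// a Date is converted to an ISO 8601 string, a string is passed through as-is.
function buildTagUpdateBody(asOf: string | Date, tag: string): string {
  return JSON.stringify({
    as_of: typeof asOf === "string" ? asOf : asOf.toISOString(),
    tag,
  });
}

// Example values are made up for illustration.
const fromDate = buildTagUpdateBody(new Date("2025-01-20T00:00:00.000Z"), "prod");
const fromString = buildTagUpdateBody("2025-01-20T00:00:00.000Z", "prod");

console.log(fromDate);
console.log(fromDate === fromString); // both inputs serialize identically
```

Because the docstring says `asOf` must name an exact dataset version, a caller would likely obtain the timestamp from `readDatasetVersion` first and not construct it by hand.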

public async deleteDataset({
datasetId,
datasetName,
@@ -2939,6 +2989,72 @@
return result;
}

  /**
   * Get dataset version by closest date or exact tag.
   *
   * Use this to resolve the nearest version to a given timestamp or for a given tag.
   *
   * @param options The options for getting the dataset version
   * @param options.datasetId The ID of the dataset
   * @param options.datasetName The name of the dataset
   * @param options.asOf The timestamp of the dataset to retrieve
   * @param options.tag The tag of the dataset to retrieve
   * @returns The dataset version
   */
  public async readDatasetVersion({
    datasetId,
    datasetName,
    asOf,
    tag,
  }: {
    datasetId?: string;
    datasetName?: string;
    asOf?: string | Date;
    tag?: string;
  }): Promise<DatasetVersion> {
    let resolvedDatasetId: string;
    if (!datasetId) {
      const dataset = await this.readDataset({ datasetName });
      resolvedDatasetId = dataset.id;
    } else {
      resolvedDatasetId = datasetId;
    }

    assertUuid(resolvedDatasetId);

    if ((asOf && tag) || (!asOf && !tag)) {
      throw new Error("Exactly one of asOf and tag must be specified.");
    }

    const params = new URLSearchParams();
    if (asOf !== undefined) {
      params.append(
        "as_of",
        typeof asOf === "string" ? asOf : asOf.toISOString()
      );
    }
    if (tag !== undefined) {
      params.append("tag", tag);
    }

    const response = await this.caller.call(
      _getFetchImplementation(),
      `${
        this.apiUrl
      }/datasets/${resolvedDatasetId}/version?${params.toString()}`,
      {
        method: "GET",
        headers: { ...this.headers },
        signal: AbortSignal.timeout(this.timeout_ms),
        ...this.fetchOptions,
      }
    );

    await raiseForStatus(response, "read dataset version");

    return await response.json();
  }
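
The "exactly one of `asOf` and `tag`" rule and the resulting query string can be exercised in isolation. The sketch below copies the validation and parameter-building steps from `readDatasetVersion` into a hypothetical helper, `buildVersionQuery`. It assumes a runtime where `URLSearchParams` is a global (Node.js or a browser) and makes no request.

```typescript
// Hypothetical helper mirroring the validation and query construction in
// readDatasetVersion. Returns the query string without the leading "?".
function buildVersionQuery(options: {
  asOf?: string | Date;
  tag?: string;
}): string {
  const { asOf, tag } = options;
  // Reject both-set and neither-set, as the method above does.
  if ((asOf && tag) || (!asOf && !tag)) {
    throw new Error("Exactly one of asOf and tag must be specified.");
  }
  const params = new URLSearchParams();
  if (asOf !== undefined) {
    params.append(
      "as_of",
      typeof asOf === "string" ? asOf : asOf.toISOString()
    );
  }
  if (tag !== undefined) {
    params.append("tag", tag);
  }
  return params.toString();
}

// Example values are made up for illustration.
console.log(buildVersionQuery({ tag: "prod" }));
console.log(buildVersionQuery({ asOf: new Date("2025-01-20T00:00:00.000Z") }));

try {
  buildVersionQuery({ asOf: "2025-01-20T00:00:00.000Z", tag: "prod" });
} catch (e) {
  console.log((e as Error).message);
}
```

Note that `URLSearchParams` percent-encodes the colons in the timestamp, so the `as_of` value appears as `2025-01-20T00%3A00%3A00.000Z` in the URL.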

public async listDatasetSplits({
datasetId,
datasetName,
@@ -3399,7 +3515,7 @@
async _logEvaluationFeedback(
evaluatorResponse: EvaluationResult | EvaluationResults,
run?: Run,
sourceInfo?: { [key: string]: any }
[Check linting warning, js/src/client.ts line 3518: Unexpected any. Specify a different type]
): Promise<[results: EvaluationResult[], feedbacks: Feedback[]]> {
const evalResults: Array<EvaluationResult> =
this._selectEvalResults(evaluatorResponse);
@@ -3438,7 +3554,7 @@
public async logEvaluationFeedback(
evaluatorResponse: EvaluationResult | EvaluationResults,
run?: Run,
sourceInfo?: { [key: string]: any }
[Check linting warning, js/src/client.ts line 3557: Unexpected any. Specify a different type]
): Promise<EvaluationResult[]> {
const [results] = await this._logEvaluationFeedback(
evaluatorResponse,
@@ -3934,7 +4050,7 @@

public async createCommit(
promptIdentifier: string,
object: any,
[Check linting warning, js/src/client.ts line 4053: Unexpected any. Specify a different type]
options?: {
parentCommitHash?: string;
}
@@ -4166,7 +4282,7 @@
isPublic?: boolean;
isArchived?: boolean;
}
): Promise<Record<string, any>> {
[Check linting warning, js/src/client.ts line 4285: Unexpected any. Specify a different type]
if (!(await this.promptExists(promptIdentifier))) {
throw new Error("Prompt does not exist, you must create it first.");
}
@@ -4177,7 +4293,7 @@
throw await this._ownerConflictError("update a prompt", owner);
}

const payload: Record<string, any> = {};
[Check linting warning, js/src/client.ts line 4296: Unexpected any. Specify a different type]

if (options?.description !== undefined)
payload.description = options.description;