.cursorrules

You are working with Scrapybara, a TypeScript SDK for deploying and managing remote desktop instances for AI agents. Use this guide to properly interact with the SDK.

CORE SDK USAGE:

- Initialize client: import { ScrapybaraClient } from "scrapybara"; const client = new ScrapybaraClient({ apiKey: "KEY" });
- Instance lifecycle:
    const instance = await client.startUbuntu({ timeoutHours: 1 });
    await instance.pause(); // Pause to save resources
    await instance.resume({ timeoutHours: 1 }); // Resume work
    await instance.stop(); // Terminate and clean up
- Instance types:
    const ubuntuInstance = await client.startUbuntu(); // supports bash, computer, edit, browser
    const browserInstance = await client.startBrowser(); // supports computer, browser
    const windowsInstance = await client.startWindows(); // supports computer

CORE INSTANCE OPERATIONS:

- Screenshots: const base64Image = (await instance.screenshot()).base64Image;
- Bash commands: await instance.bash({ command: "ls -la" });
- Mouse control: await instance.computer({ action: "mouse_move", coordinate: [x, y] });
- Click actions: await instance.computer({ action: "left_click" });
- File operations: await instance.file.read({ path: "/path/file" }), await instance.file.write({ path: "/path/file", content: "data" });

ACT SDK (Primary Focus):

- Purpose: Enables building computer use agents with unified tools and model interfaces
- Core components:

1. Model: Handles LLM integration (currently Anthropic)
    import { anthropic } from "scrapybara/anthropic";
    const model = anthropic(); // Or anthropic({ apiKey: "KEY" }) to use your own key

2. Tools: Interface for computer interactions

    - bashTool: Run shell commands
    - computerTool: Mouse/keyboard control
    - editTool: File operations
    const tools = [
        bashTool(instance),
        computerTool(instance),
        editTool(instance),
    ];

3. Prompt:
    - system: system prompt; the built-in UBUNTU_SYSTEM_PROMPT, BROWSER_SYSTEM_PROMPT, or WINDOWS_SYSTEM_PROMPT is recommended
    - prompt: simple user prompt
    - messages: list of messages
    - Only include either prompt or messages, not both

const { messages, steps, text, output, usage } = await client.act({
    model: anthropic(),
    tools,
    system: UBUNTU_SYSTEM_PROMPT,
    prompt: "Task",
    onStep: handleStep
});
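
The rule that only one of prompt or messages should be supplied can be checked before calling client.act. A minimal sketch, assuming a hypothetical ActInput shape and assertSingleInput helper (neither is part of the SDK); the message type is reduced to a role and untyped content for illustration:

```typescript
// Hypothetical, simplified shapes for illustration; not the SDK's own types.
type SketchMessage = { role: "user" | "assistant" | "tool"; content: unknown };
type ActInput = { prompt?: string; messages?: SketchMessage[] };

// Throws unless exactly one of `prompt` or `messages` is provided.
function assertSingleInput(input: ActInput): void {
    const hasPrompt = input.prompt !== undefined;
    const hasMessages = input.messages !== undefined;
    if (hasPrompt === hasMessages) {
        throw new Error("Provide exactly one of `prompt` or `messages`");
    }
}
```

Calling the check first turns a confusing request failure into an immediate, local error.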

MESSAGE HANDLING:

- Response Structure: Messages are structured with roles (user/assistant/tool) and typed content
- Content Types:
    - TextPart: Simple text content
        { type: "text", text: "content" }
    - ImagePart: Base64 or URL images
        { type: "image", image: "base64...", mimeType: "image/png" }
    - ToolCallPart: Tool invocations
        {
            type: "tool-call",
            toolCallId: "id",
            toolName: "bash",
            args: { command: "ls" }
        }
    - ToolResultPart: Tool execution results
        {
            type: "tool-result",
            toolCallId: "id",
            toolName: "bash",
            result: "output",
            isError: false
        }
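
Because every part carries a type field, message content can be handled with a discriminated union. A minimal sketch, assuming local type definitions that mirror the shapes listed above (the SDK's exported types may differ in name or detail); describePart is a hypothetical helper:

```typescript
// Local sketches of the part shapes listed above; the SDK's exported types may differ.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; image: string; mimeType?: string };
type ToolCallPart = {
    type: "tool-call";
    toolCallId: string;
    toolName: string;
    args: Record<string, unknown>;
};
type ToolResultPart = {
    type: "tool-result";
    toolCallId: string;
    toolName: string;
    result: unknown;
    isError?: boolean;
};
type Part = TextPart | ImagePart | ToolCallPart | ToolResultPart;

// Renders one part as a single log line, switching on the `type` discriminator.
function describePart(part: Part): string {
    switch (part.type) {
        case "text":
            return part.text;
        case "image":
            return `[image ${part.mimeType ?? "unknown"}]`;
        case "tool-call":
            return `[call ${part.toolName}] ${JSON.stringify(part.args)}`;
        case "tool-result":
            return `[${part.isError ? "error" : "result"} ${part.toolName}] ${String(part.result)}`;
    }
}
```

Switching on type lets the compiler narrow each branch, so a newly added part kind surfaces as a type error instead of being silently ignored.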

STEP HANDLING:

// Access step information in callbacks
const handleStep = (step: Step) => {
    console.log(`Text: ${step.text}`);
    if (step.toolCalls) {
        for (const call of step.toolCalls) {
            console.log(`Tool: ${call.toolName}`);
        }
    }
    if (step.toolResults) {
        for (const result of step.toolResults) {
            console.log(`Result: ${result.result}`);
        }
    }
    console.log(`Tokens: ${step.usage?.totalTokens ?? 'N/A'}`);
};
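
Tool calls and results can be paired through toolCallId, for example to flag failed calls inside an onStep callback. A sketch, assuming step.toolCalls and step.toolResults carry the ToolCallPart and ToolResultPart fields shown under MESSAGE HANDLING; the StepLike type and failedToolNames helper are hypothetical stand-ins, not SDK exports:

```typescript
// Hypothetical minimal shapes; only the fields this helper reads are declared.
type CallLike = { toolCallId: string; toolName: string };
type ResultLike = { toolCallId: string; isError?: boolean };
type StepLike = { toolCalls?: CallLike[]; toolResults?: ResultLike[] };

// Returns the names of tools whose result in this step was flagged as an error.
function failedToolNames(step: StepLike): string[] {
    const failedIds = new Set<string>();
    for (const result of step.toolResults ?? []) {
        if (result.isError) failedIds.add(result.toolCallId);
    }
    const names: string[] = [];
    for (const call of step.toolCalls ?? []) {
        if (failedIds.has(call.toolCallId)) names.push(call.toolName);
    }
    return names;
}
```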

STRUCTURED OUTPUT:
Pass a Zod schema as the schema parameter to define the structured output you want. The response's output field then contains the validated, typed data returned by the model.

import { z } from "zod";

const schema = z.object({
    posts: z.array(z.object({
        title: z.string(),
        url: z.string(),
        points: z.number(),
    })),
});

const { output } = await client.act({
    model: anthropic(),
    tools,
    schema,
    system: UBUNTU_SYSTEM_PROMPT,
    prompt: "Get the top 10 posts on Hacker News",
});

const posts = output.posts;

TOKEN USAGE:

- Track token usage through TokenUsage objects
- Fields: promptTokens, completionTokens, totalTokens
- Available in both Step and ActResponse objects
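
Per-step usage can be totalled to track the cost of a run as it progresses. A minimal sketch, assuming each step exposes an optional usage object with the three fields listed above; UsageLike and sumUsage are hypothetical local names:

```typescript
// Field names follow the list above; the type name is a local stand-in.
type UsageLike = { promptTokens: number; completionTokens: number; totalTokens: number };

// Sums usage across steps, skipping steps that report no usage.
function sumUsage(steps: { usage?: UsageLike }[]): UsageLike {
    const total: UsageLike = { promptTokens: 0, completionTokens: 0, totalTokens: 0 };
    for (const step of steps) {
        if (!step.usage) continue;
        total.promptTokens += step.usage.promptTokens;
        total.completionTokens += step.usage.completionTokens;
        total.totalTokens += step.usage.totalTokens;
    }
    return total;
}
```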

Here's a brief example of how to use the Scrapybara SDK:

import { ScrapybaraClient } from "scrapybara";
import { anthropic } from "scrapybara/anthropic";
import { UBUNTU_SYSTEM_PROMPT } from "scrapybara/prompts";
import { bashTool, computerTool, editTool } from "scrapybara/tools";

const client = new ScrapybaraClient();
const instance = await client.startUbuntu();
await instance.browser.start();

const { messages, steps, text, output, usage } = await client.act({
    model: anthropic(),
    tools: [
        bashTool(instance),
        computerTool(instance),
        editTool(instance),
    ],
    system: UBUNTU_SYSTEM_PROMPT,
    prompt: "Go to the YC website and fetch the HTML",
    onStep: (step) => console.log(step.text), // Interpolating the step object itself would print "[object Object]"
});

await instance.browser.stop();
await instance.stop();

EXECUTION PATTERNS:

1. Basic agent execution:

const { messages, steps, text, output, usage } = await client.act({
    model: anthropic(),
    tools,
    system: "System context here",
    prompt: "Task description"
});

2. Browser automation:

const cdpUrl = (await instance.browser.start()).cdpUrl;
const authStateId = (await instance.browser.saveAuth({ name: "default" })).authStateId;  // Save auth
await instance.browser.authenticate({ authStateId });  // Reuse auth

3. File management:

await instance.file.write({ path: "/tmp/data.txt", content: "content" });
const content = (await instance.file.read({ path: "/tmp/data.txt" })).content;

IMPORTANT GUIDELINES:

- Always stop instances after use to prevent unnecessary billing
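- Use async/await for all operations, since they are asynchronous; parenthesize the awaited call before reading a field, e.g. (await instance.screenshot()).base64Image
- Handle API errors with try/catch blocks
- Default request timeout is 60s; customize it with the timeout parameter or requestOptions
- Instances auto-terminate after 1 hour by default; pass timeoutHours to change this
- For browser operations, always start the browser (instance.browser.start()) before using browserTool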
- Prefer bash commands over GUI interactions for launching applications
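
The first guideline, always stopping instances, can be enforced with try/finally so that stop() runs even when the work throws. A minimal sketch; withInstance and the Stoppable interface are hypothetical helpers, not part of the SDK, and rely only on the stop() method shown in the lifecycle section:

```typescript
// Minimal interface for anything that must be stopped after use.
interface Stoppable {
    stop(): Promise<unknown>;
}

// Starts an instance, runs `work`, and always calls stop(), even if `work` throws.
async function withInstance<I extends Stoppable, R>(
    start: () => Promise<I>,
    work: (instance: I) => Promise<R>,
): Promise<R> {
    const instance = await start();
    try {
        return await work(instance);
    } finally {
        await instance.stop();
    }
}
```

With a real client this would be called as withInstance(() => client.startUbuntu(), async (instance) => { ... }), keeping cleanup in one place instead of repeating it at every exit path.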

ERROR HANDLING:

import { ApiError } from "scrapybara/core";
try {
    await client.startUbuntu();
} catch (e) {
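    if (e instanceof ApiError) {
        // body may be an object, so serialize it rather than interpolating directly
        console.error(`Error ${e.statusCode}: ${JSON.stringify(e.body)}`);
    } else {
        throw e; // Not an API error; let it propagate instead of swallowing it
    }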
}

BROWSER TOOL OPERATIONS:

- Required setup:
const cdpUrl = (await instance.browser.start()).cdpUrl;
const tools = [browserTool(instance)];
- Commands: goTo, getHtml, evaluate, click, type, screenshot, getText, getAttribute
- Always handle browser authentication states appropriately

ENV VARIABLES & CONFIGURATION:

- Set env vars: await instance.env.set({ API_KEY: "value" });
- Get env vars: const vars = (await instance.env.get()).variables;
- Delete env vars: await instance.env.delete(["VAR_NAME"]);

Remember to handle resources properly and implement appropriate error handling in your code. This SDK is primarily designed for AI agent automation tasks, so structure your code accordingly.