Skip to content

Commit

Permalink
Miscellaneous changes (#94)
Browse files Browse the repository at this point in the history
* drop fastify-sentry in favour of plain sentry/node
* add sourcemaps for sentry
* add missing type annotations to routes
* remove action context
* add setting to configure concurrency
* add memoization for config
* import secret directly from config
* adjust yarn scripts
  • Loading branch information
Miłosz Skaza authored Apr 3, 2023
1 parent 1980145 commit 8c0280b
Show file tree
Hide file tree
Showing 24 changed files with 465 additions and 680 deletions.
1 change: 1 addition & 0 deletions .env.example
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ DEBUG=false
SECRET=keyboard-cat
NODE_ENV=production
REDIS_URL=redis://127.0.0.1:6379
CONCURRENCY=0
HOST=127.0.0.1
PORT=3000
ENABLE_LEGACY_API=false
Expand Down
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -23,4 +23,4 @@ ENV NODE_ENV=production \

RUN yarn --production
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["yarn", "start:serve"]
CMD ["yarn", "start:docker"]
2 changes: 1 addition & 1 deletion docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,4 +10,4 @@ services:
REDIS_URL: redis://redis:6379

redis:
image: redis:alpine
image: redis:7.0-alpine
9 changes: 8 additions & 1 deletion docs/01-installing-tourist.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,11 +79,18 @@ like [pm2](https://pm2.keymetrics.io/) or with a systemd service.
### Configuration Reference

| Option | Default | Description |
| --------------------- | ----------------------- |----------------------------------------------------------------|
|-----------------------|-------------------------|----------------------------------------------------------------|
| SECRET | (dynamically generated) | Secret value for token authentication purposes. |
| NODE_ENV | production | Environment Tourist is running in. |
| REDIS_URL | redis://127.0.0.1:6379 | URL of the redis server. |
| CONCURRENCY | (number of CPU threads) | Maximum number of jobs processed concurrently. |
| HOST | 127.0.0.1 | Host address that Tourist will listen on. |
| PORT | 3000 | Port on the host address that tourist will listen on. |
| ENABLE_LEGACY_API | false | Whether to enable legacy portion of the API (not recommended). |
| ENABLE_AUTHENTICATION | true | Whether to enable authentication with tokens (recommended). |

#### Note on concurrency

Concurrency value defaults to the number of threads present on the system. It is not recommend to go above this value,
as a headless browser can consume a full thread even for simple operations. You should also account for the RAM
available on your system, as each additional browser can consume somewhere around 100/200MB of RAM.
79 changes: 12 additions & 67 deletions docs/04-using-actions.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,70 +20,17 @@ page.locator('#my-button').click();
```

It gets a little more complicated when trying to click all links, as we need to use the `.all()` method on the locator -
which itself returns a promise. We have two approaches to tackle that:

1. Use an IIFE statement to be able to await promises:

```js
(async () => {
const buttons = await page.locator('button').all();
for (button of buttons) {
await button.click();
}
})()
```

2. By using output from other actions - described in the section below:

### Utilizing the action context

Outside the `page` and `context` variables - there's one more object available in the sandbox - `actions`. It is an
array of action outputs. After each step's finish - the returned value is assigned to that array, at the same index as
the action that created it. With that in mind we can execute the `locator().all()` chain in the first action, and use
the returned value in the next, image an array of actions like this:

```js
[
"page.locator('button').all()",
// the locators are now available under context[0]
`(async () => {
for (const link of actions[0]) {
const pagePromise = context.waitForEvent('page');
await link.click({ button: 'middle' });
const newPage = await pagePromise;
await newPage.waitForLoadState();
await newPage.close();
}
})()`
]
```

While this might not look useful with a simple action like that, we believe that it will be for more complex workloads -
and we want to make Tourist as flexible as possible.

#### Action output is frozen inside the sandbox

Keep in mind that it's not possible to attach values to the context on your own on arbitrary indexes:
which itself returns a promise. To accomplish this, use an IIFE statement to be able to await promises:

```js
actions[100] = "test"
```
Action like this will have a different result than you might expect. The array itself is frozen and can't be modified
inside the sandbox - however the assignment operation returns the assigned value - so the value `"test"` will be
available on whatever index the action had.

This is by design - we don't want to prevent from assigning arbitrary values you might consider useful to the output,
we just want to ensure that they are placed on appropriate indexes and don't override each-other.

Security is provided by disallowing code generation inside the sandbox:

```js
new Function('return (1+2)')()
(async () => {
const buttons = await page.locator('button').all();
for (button of buttons) {
await button.click();
}
})()
```

Defining such action will raise an exception, and prevent further execution.

### Internals of the PlaywrightRunner

Actions are executed by `PlaywrightRunner` instances, created for each job. Let's quickly go over the livecycle of the
Expand All @@ -93,17 +40,15 @@ playwright runner:
2. Cookies are attached to the playwright context, page and context are created.
3. Actions are split into `preOpen` and `postOpen`. That's because some actions need to be executed before the page is
visited - like the `page.on` event handler.
4. A new VM is created, and the playwright page, context as well as the `actions` output array are frozen inside.
4. A new VM is created, and the playwright page and context are frozen inside.
5. `preOpen` actions are executed before navigation.
6. Navigation happens, the runner waits for the page to load, and executes `postOpen` actions inside the vm. **It's
important to understand that the `VM.run()` method is synchronous, that's why wrapping async code with iife expressions
is necessary.** It may of course, return a Promise, which is still a synchronous operation, and the runner will await
that Promise in its own async context, or just move forward if a concrete value has been returned.
7. Once the Promise is fulfilled, its value is assigned to the action output, at the index of the action that returned
it.
8. After the action is fully completed, the runner waits for the page to be in the load state again, and repeats the
7. After the action is fully completed, the runner waits for the page to be in the load state again, and repeats the
process for all the actions.
9. Finally, the runner moves onto finish and teardown. It gathers requested files: recording / screenshot / pdf, and
8. Finally, the runner moves onto finish and teardown. It gathers requested files: recording / screenshot / pdf, and
closes the browser, the context and the page.

#### A word about .map, .forEach and .reduce
Expand All @@ -112,7 +57,7 @@ This is not an issue specific to Tourist, however it's important to understand h
mentioned methods. It might be tempting to create an action like this:

```js
actions[0].map(async link => {
links.map(async link => {
await link.click({ button: "middle" });
})
```
Expand All @@ -121,7 +66,7 @@ However, **this will not work**, because playwright will attempt to click all th
As the async function returns a Promise when it's executed - it will not block further execution because there's nothing
awaiting it. This code will evaluate to an array of ready, but not fulfilled Promises. That's because Tourist will
await the full context - but not individual promises. Wrapping this code inside `Promise.all()` will only solve one of
the issues - the Promises will not be fulfilled, however playwright will still fail to click all the links.
the issues - the Promises will be fulfilled, however playwright will still fail to click all the links.

The correct way to approach this is to use a for loop inside an iife expression, as well as waiting for the page to load
by using the playwright context:
Expand Down
17 changes: 8 additions & 9 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,11 @@
"license": "Apache-2.0",
"scripts": {
"cmd:get-issuer-token": "ts-node ./src/cli.ts get-issuer-token",
"build": "rm -rf ./out && tsc -p tsconfig.json",
"start:serve": "node ./src/index.js",
"start:build": "yarn build && node out/index.js",
"start:app": "nodemon ./src/index.ts | pino-pretty",
"build": "rm -rf ./out && tsc -p tsconfig.production.json",
"start:docker": "node ./src/index.js | pino-pretty",
"start:serve": "node ./out/index.js | pino-pretty",
"start:build": "yarn build && yarn start:serve",
"start:dev": "nodemon ./src/index.ts | pino-pretty",
"start:test-app": "nodemon ./tests/utils/_server.ts",
"format:code": "prettier '{src,tests}/**/*.ts' --write",
"lint:code": "prettier --check '{src,tests}/**/*.ts'",
Expand All @@ -28,13 +29,15 @@
"@fastify/swagger": "^8.2.1",
"@fastify/swagger-ui": "^1.3.0",
"@fastify/type-provider-typebox": "^2.4.0",
"@immobiliarelabs/fastify-sentry": "^5.0.1",
"@sentry/integrations": "^7.46.0",
"@sentry/node": "^7.46.0",
"@sinclair/typebox": "^0.25.24",
"bull": "^4.10.2",
"dotenv": "^16.0.3",
"fastify": "^4.13.0",
"jsonwebtoken": "^9.0.0",
"lodash": "^4.17.21",
"pino-pretty": "^9.1.1",
"playwright": "^1.31.1",
"vm2": "^3.9.13"
},
Expand All @@ -49,7 +52,6 @@
"c8": "^7.12.0",
"markdownlint-cli": "^0.33.0",
"nodemon": "^2.0.20",
"pino-pretty": "^9.1.1",
"prettier": "^2.8.4",
"ts-node": "^10.9.1",
"typescript": "^4.9.4",
Expand All @@ -66,9 +68,6 @@
"nodeArguments": [
"--loader=ts-node/esm",
"--no-warnings"
],
"ignoredByWatcher": [
"static/openapi.json"
]
}
}
8 changes: 3 additions & 5 deletions src/app.ts
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
import Fastify, { FastifyServerOptions } from "fastify";
import { TypeBoxTypeProvider } from "@fastify/type-provider-typebox";

import fastifySentry from "@immobiliarelabs/fastify-sentry";
import fastifySwagger from "@fastify/swagger";
import fastifySwaggerUI from "@fastify/swagger-ui";

import initSentry from "./sentry";
import { TouristConfig } from "./config";
import { createRouter } from "./router";
import { SwaggerConfig } from "./swagger";
Expand All @@ -18,10 +18,8 @@ export const createApp = async (
app.decorate("config", config);

if (app.config.SENTRY_DSN) {
app.register(fastifySentry, {
dsn: app.config.SENTRY_DSN,
environment: app.config.ENV,
});
app.log.info("Sentry Reporting Enabled");
initSentry(app);
}

app.register(fastifySwagger, SwaggerConfig);
Expand Down
40 changes: 26 additions & 14 deletions src/config.ts
Original file line number Diff line number Diff line change
@@ -1,11 +1,14 @@
import os from "os";
import path from "path";
import dotenv from "dotenv";
import crypto from "crypto";

import _ from "lodash";
import { parseBool } from "./utils/config";

export declare type TouristConfig = {
DEBUG: boolean;
CONCURRENCY: number;
SECRET: string;
ENV: string;
REDIS_URL: string;
Expand All @@ -20,17 +23,26 @@ if (process.env.NODE_ENV !== "test") {
dotenv.config({ path: path.resolve(path.join(__dirname, "..", ".env")) });
}

export default {
DEBUG: parseBool(process.env.DEBUG, false),
SECRET: process.env.SECRET || crypto.randomBytes(48).toString("hex"),
ENV: process.env.NODE_ENV || "production",
REDIS_URL: process.env.REDIS_URL || "redis://127.0.0.1:6379",
HOST: process.env.HOST || "127.0.0.1",
PORT: parseInt(process.env.PORT ? process.env.PORT : "3000"),
ENABLE_LEGACY_API: parseBool(process.env.ENABLE_LEGACY_API, false),
ENABLE_AUTHENTICATION: parseBool(process.env.ENABLE_AUTHENTICATION, true),
SENTRY_DSN:
process.env.SENTRY_DSN && process.env.SENTRY_DSN !== ""
? process.env.SENTRY_DSN
: false,
} as TouristConfig;
const getConfig = () =>
({
DEBUG: parseBool(process.env.DEBUG, false),
SECRET: process.env.SECRET || crypto.randomBytes(48).toString("hex"),
ENV: process.env.NODE_ENV || "production",
REDIS_URL: process.env.REDIS_URL || "redis://127.0.0.1:6379",
CONCURRENCY:
process.env.CONCURRENCY &&
process.env.CONCURRENCY !== "" &&
process.env.CONCURRENCY !== "0"
? parseInt(process.env.CONCURRENCY)
: os.cpus().length,
HOST: process.env.HOST || "127.0.0.1",
PORT: parseInt(process.env.PORT ? process.env.PORT : "3000"),
ENABLE_LEGACY_API: parseBool(process.env.ENABLE_LEGACY_API, false),
ENABLE_AUTHENTICATION: parseBool(process.env.ENABLE_AUTHENTICATION, true),
SENTRY_DSN:
process.env.SENTRY_DSN && process.env.SENTRY_DSN !== ""
? process.env.SENTRY_DSN
: false,
} as TouristConfig);

export default _.memoize(getConfig)();
6 changes: 6 additions & 0 deletions src/jobs/api.ts
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import { Job } from "bull";
import * as Sentry from "@sentry/node";

import config from "../config";
import { JobBrowser, JobCookieType, JobOptions, JobStepType } from "../schemas/api";
Expand All @@ -13,12 +14,17 @@ export declare type VisitJobData = {

export const asyncVisitJob = async (job: Job<VisitJobData>) => {
const { data } = job;

const runner = new PlaywrightRunner(data, config.DEBUG);

try {
await runner.init();
await runner.exec();
} catch (e: any) {
if (config.SENTRY_DSN) {
Sentry.captureException(e);
}

// change the job status to failed with the error message
await job.moveToFailed({ message: e.message });
}
Expand Down
6 changes: 4 additions & 2 deletions src/queue.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@ import Queue from "bull";
import { legacySimpleVisitJob, LegacySimpleVisitJobData } from "./jobs/legacy";
import { asyncVisitJob, VisitJobData } from "./jobs/api";

import config from "./config";

const LegacySimpleVisitQueue = new Queue<LegacySimpleVisitJobData>(
"Legacy Simple Visit Job",
process.env.REDIS_URL || "redis://127.0.0.1:6379",
Expand All @@ -13,7 +15,7 @@ const AsyncVisitQueue = new Queue<VisitJobData>(
);

// Assign worker functions to queues
LegacySimpleVisitQueue.process(legacySimpleVisitJob);
AsyncVisitQueue.process(asyncVisitJob);
LegacySimpleVisitQueue.process(config.CONCURRENCY, legacySimpleVisitJob);
AsyncVisitQueue.process(config.CONCURRENCY, asyncVisitJob);

export { LegacySimpleVisitQueue, AsyncVisitQueue };
Loading

0 comments on commit 8c0280b

Please sign in to comment.