This document contains the official definition of the WAW (web automation workflow) format.
Are you looking for a way to run files in this format? Check out the wbr project on npm or in its GitHub repository.
The WAW format is a declarative format for specifying web-related workflows. It lets the user control the automation flow with conditional expressions, so that decisions can be made based on the website's content. It is also easy to parse (it is based on JSON), which greatly simplifies validation, visualization and third-party adoption.
The .waw (not to be confused with .wav) file is a textual format used for quick, safe and declarative definition of web automation workflows.
Syntactically, a .waw file should always be a valid .json file. If you are unsure what .json is, refer to the official documentation.
Note: From now on, the .waw file will be considered a valid JSON file and all the terminology (object, array) will be used in this context.
On the top level, the workflow file contains an object with two properties: "meta", an object with the workflow's metadata, and "workflow", a single array of so-called "where-what pairs". These pairs contain three properties with the keys id, where, and what.
The id property is used only for pair referencing (more in State memory) and can be omitted.
Here follows a top-level view of the Workflow file:
{
"meta" : {
...
},
"workflow": [
{
"id": "login",
"where": {...},
"what": [...]
},
{
"id": "signup",
"where": {...},
"what": [...]
},
...
]
}
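The layout above can be checked mechanically. The following is a minimal Python sketch, not the official validator: the function name check_top_level is hypothetical, and treating "meta", "where" and "what" as required keys is an assumption made for this sketch.

```python
import json


def check_top_level(text: str) -> list:
    """Return a list of layout problems found in a .waw document (empty if none)."""
    doc = json.loads(text)  # a .waw file is expected to be valid JSON
    if not isinstance(doc, dict):
        return ["top level must be an object"]
    problems = []
    if not isinstance(doc.get("meta"), dict):
        problems.append('"meta" must be an object')
    pairs = doc.get("workflow")
    if not isinstance(pairs, list):
        problems.append('"workflow" must be an array')
        return problems
    for i, pair in enumerate(pairs):
        if not isinstance(pair, dict):
            problems.append(f"pair {i}: must be an object")
            continue
        # "id" may be omitted; "where" and "what" are required in this sketch
        for key in ("where", "what"):
            if key not in pair:
                problems.append(f'pair {i}: missing "{key}"')
    return problems


sample = """
{
  "meta": {"name": "demo"},
  "workflow": [
    {"id": "login", "where": {}, "what": []},
    {"where": {}}
  ]
}
"""
print(check_top_level(sample))  # ['pair 1: missing "what"']
```

The second pair is reported because it has no "what" array; the first pair passes, with or without its "id".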
The meta header of the file can contain two fields:
- "name" - string - optional, name of the workflow (for easier management)
- "desc" - string - optional, text description of the workflow.
Even though all the metadata is optional, developers are strongly advised to use them for clarity and easier management of the workflows.
{
"name": "Google Maps Scraper",
"desc": "A blazing fast scraper for Google Maps search results."
}
The "workflow" part of the file is a single array consisting of the where-what pairs - objects describing desired behavior in different situations.
For example, let's say we want to click on a button with the label "hello" every time we get on the page "https://example.com/". This behavior is described with the following snippet:
{
"where": { "url": "https://example.com" },
"what": [
{
"action": "click",
"args": ["button:text('hello')"]
}
]
}
Now, let's say we want to type "Hello world!" into an input field, whenever we see an input field on the "https://example.com" website:
{
"where": {
"url": "https://example.com",
"selectors": ["input"]
},
"what": [
{
"action": "type",
"args": [
"input",
"Hello world!"
]
}
]
}
This should be enough to give you a basic understanding of the WAW Smart Workflow format. The following sections describe the format and its individual features in more detail.
The Where clause describes a condition required for the respective What clause to be executed.
In the basic version without the state memory (more later), we can rely on the Markov assumption, i.e. the Where clause always depends only on the current browser state and its "applicability" can be evaluated statically, knowing only the browser's state at the given point.
For this reason, the workflow can be executed on different tabs in parallel (any popup window opened from the first passed page is processed as well).
The where clause is an object with various keys.
The specific "basic" keys (like url, cookies etc.) are implementation-dependent and are not a part of the format specification. Keys shown here correspond to the wbr-interpret implementation.
As of now, three keys are recognized:
- url (string or RegEx)
- cookies (object with string keys and string/RegEx values)
- selectors (array of CSS/Playwright selectors - all of the targeted elements must be present in the page to match this clause)
An example of a full (simple, flat) Where clause:
"where": {
"url": "https://jindrich.bar/",
"cookies": {
"uid": "123456"
},
"selectors": [
":text('My Profile')",
"button.logout"
]
}
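To make the matching rule concrete, here is a minimal Python sketch of how a flat Where clause could be checked. It is not the wbr-interpret implementation: the function matches_flat and the shape of the state snapshot are assumptions, and only plain string values (no RegEx) are handled.

```python
def matches_flat(where: dict, state: dict) -> bool:
    """Check a flat Where clause against a snapshot of the browser state.

    `state` is a hypothetical snapshot of the form
    {"url": str, "cookies": dict, "selectors": set of selectors that
    are known to target an element on the current page}.
    """
    if "url" in where and where["url"] != state["url"]:
        return False
    # every listed cookie must be set to the listed value
    for name, value in where.get("cookies", {}).items():
        if state["cookies"].get(name) != value:
            return False
    # every listed selector must target an element present on the page
    return all(sel in state["selectors"] for sel in where.get("selectors", []))


where = {
    "url": "https://jindrich.bar/",
    "cookies": {"uid": "123456"},
    "selectors": [":text('My Profile')", "button.logout"],
}
state = {
    "url": "https://jindrich.bar/",
    "cookies": {"uid": "123456", "theme": "dark"},
    "selectors": {":text('My Profile')", "button.logout", "h1"},
}
print(matches_flat(where, state))  # True
```

Extra cookies or elements on the page do not prevent a match; a missing or different one does.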
For a system operating with conditions, it is crucial to have a simple way to work with formal logic. The WAW format takes inspiration from the MongoDB query operators, as shown in the example below:
"where": {
"$and": [
{
"url": "https://jindrich.bar/"
},
{
"$or": [
{
"cookies": {
"uid": "123456"
}
},
{
"selectors": [
":text('My Profile')",
"button.logout"
]
}
]
}
]
}
This notation describes a condition where the URL is https://jindrich.bar/ and there is either the uid cookie set with the specified value, or there are the selectors present. Please note that the top-level $and condition is redundant, as the conjunction of the conditions is the implicit operation.
As of now, the format supports the following boolean operators: $and, $or and $not.
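The operators can be evaluated recursively. The sketch below is one possible reading in Python, not the reference implementation: evaluate and match_key are hypothetical names, and it assumes that $and and $or take an array of clauses while $not takes a single clause.

```python
def evaluate(where: dict, state: dict, match_key) -> bool:
    """Evaluate a Where clause containing $and / $or / $not.

    `match_key(key, expected, state)` is a caller-supplied (hypothetical)
    function deciding whether one basic key such as "url" matches.
    """
    for key, value in where.items():
        if key == "$and":
            ok = all(evaluate(sub, state, match_key) for sub in value)
        elif key == "$or":
            ok = any(evaluate(sub, state, match_key) for sub in value)
        elif key == "$not":
            ok = not evaluate(value, state, match_key)
        else:
            ok = match_key(key, value, state)
        if not ok:  # keys of one object are implicitly AND-ed
            return False
    return True


def equals(key, expected, state):
    # simplest possible basic-key check: plain equality
    return state.get(key) == expected


state = {"url": "https://jindrich.bar/", "cookies": {"uid": "123456"}, "selectors": []}
where = {
    "$and": [
        {"url": "https://jindrich.bar/"},
        {"$or": [
            {"cookies": {"uid": "123456"}},
            {"selectors": [":text('My Profile')", "button.logout"]},
        ]},
    ]
}
print(evaluate(where, state, equals))            # True
print(evaluate({"$not": where}, state, equals))  # False
```

Because the keys of a single object are combined with an implicit conjunction, dropping the outer $and and writing the url key next to $or gives the same result.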
Note that the ordering of the rules in the file is crucial. Consider the following example:
{
"where": { "url": "https://jindrich.bar" },
"what": [{ "action A" }]
},
{
"where": { "url": "https://jindrich.bar" },
"what": [{ "action B" }]
},
The where conditions in the displayed pairs are the same, i.e. when the interpreter gets to the webpage https://jindrich.bar, it has two possible action sequences to carry out. This situation makes little sense, as the workflow definition needs to be as strict as possible and cannot allow non-deterministic behaviour of the interpreter.
For this reason, the definition of the workflow file says that only the first matching action gets executed.
Even though the colliding conditions were easy to spot in the example above, this problem can get a little more nuanced, for example:
{
"where": {
"selectors": ["h1", "ul"]
},
"what": [{ "action A" }]
},
{
"where": {
"selectors": [".large-heading","#list"]
},
"what": [{ "action B" }]
},
While there is no visible collision in the described conditions, the interpreter behavior might be surprising on the following page:
...
<h1 class="large-heading">Heading</h1>
<ul id="list">
<li>a</li>
...
</ul>
...
Again, the interpreter will execute only action A, even though both conditions apply.
Another way to think of this is "put more specific conditions closer to the top".
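The first-match rule is easy to express in code. The following Python sketch is illustrative only: first_match and selectors_present are hypothetical helpers, and the set of present selectors stands in for a real page.

```python
from typing import Callable, Optional


def first_match(workflow: list, matches: Callable[[dict], bool]) -> Optional[dict]:
    """Return the first where-what pair whose Where clause matches, if any."""
    for pair in workflow:  # the order in the file decides the priority
        if matches(pair["where"]):
            return pair
    return None


# selectors that target some element on the example page shown above
present = {"h1", "ul", ".large-heading", "#list"}


def selectors_present(where: dict) -> bool:
    return all(sel in present for sel in where.get("selectors", []))


workflow = [
    {"id": "A", "where": {"selectors": ["h1", "ul"]}, "what": []},
    {"id": "B", "where": {"selectors": [".large-heading", "#list"]}, "what": []},
]
print(first_match(workflow, selectors_present)["id"])        # A
print(first_match(workflow[::-1], selectors_present)["id"])  # B
```

Both pairs match the page, so swapping their order is the only thing that changes which one runs.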
As mentioned earlier, the interpreter also has an internal memory, which allows for more specific conditions, for example:
"where": {
"$after": "login" // login being an "id" of another where-what pair
}
"where": {
"$before": "signup"
}
As of now, the metatags $before and $after are supported. They allow an action to be run only after (or before) another action has been executed.
The memory of used actions is tab-scoped, i.e. every new tab has its own memory of used actions (the tabs run the workflow independently of each other).
[Hacker's Tip]: The $before condition specifically can be used to run an action only once ("id": "self", ..., "$before": "self").
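The run-once trick can be traced with a small Python sketch. It assumes that $after requires the referenced pair to have already run in the tab and that $before requires it not to have run yet; memory_allows and the used set are hypothetical names for the tab-scoped memory.

```python
def memory_allows(where: dict, used_ids: set) -> bool:
    """Check the $after / $before metatags against a tab's memory of used pairs."""
    if "$after" in where and where["$after"] not in used_ids:
        return False  # the referenced pair has not run yet
    if "$before" in where and where["$before"] in used_ids:
        return False  # the referenced pair has already run
    return True


used = set()  # one such set per tab
pair = {"id": "self", "where": {"$before": "self"}, "what": []}

runs = 0
for _ in range(3):  # the same page state is seen three times
    if memory_allows(pair["where"], used):
        runs += 1
        used.add(pair["id"])
print(runs)  # 1
```

After the first run the pair's own id is in the memory, so its $before condition never holds again in that tab.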
In the most basic version, the What clause should contain a sequence of actions, which should be carried out in case the respective Where condition is satisfied.
Note: While the interpreter wbr-interpret uses Playwright for its backend, the WAW format is suitable for use with any other backend. Just like with the Where clause basic keys, the actions' names and parameters are not a part of the format specification.
The what clause is an array of "function" objects. These objects consist of the action field, describing the function called, and args, an optional array property providing parameters for the specified function.
"what":[
{
"action": "functionAcceptingString",
"args": ["theFirstParameter"]
},
{
"action":"voidFunction"
},
{
"action":"moreParameters",
"args": [
1000,
"string parameter",
{
"option": true
}
]
}
]
In wbr-interpret, these actions correspond to Playwright's Page class methods (goto, fill, click...). On top of this, users can use dot notation to access the Page's properties and call their methods (e.g. page.keyboard.press etc.). All parameters passed must be JSON's native types, i.e. scalars, arrays, or objects (no functions etc.).
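Dot-notation dispatch can be sketched in a few lines of Python. This is not how wbr-interpret is written: run_action is a hypothetical helper, fake_page is a stand-in object rather than a Playwright Page, the calls are synchronous, and the sketch assumes that action names are resolved relative to the page object (i.e. "keyboard.press").

```python
from functools import reduce
from types import SimpleNamespace


def run_action(page, step: dict):
    """Resolve step["action"] by dot notation on `page` and call it with step["args"]."""
    target = reduce(getattr, step["action"].split("."), page)
    return target(*step.get("args", []))  # "args" is optional


calls = []  # records what the stand-in page was asked to do
fake_page = SimpleNamespace(
    click=lambda selector: calls.append(("click", selector)),
    keyboard=SimpleNamespace(press=lambda key: calls.append(("press", key))),
)

what = [
    {"action": "click", "args": ["button.logout"]},
    {"action": "keyboard.press", "args": ["Enter"]},
]
for step in what:
    run_action(fake_page, step)
print(calls)  # [('click', 'button.logout'), ('press', 'Enter')]
```

Since the arguments come straight from JSON, only scalars, arrays and objects can ever reach the called method.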
On top of the Playwright's native methods/functions, the user can also use some interpreter-specific functions.
As of now, these are:
- screenshot - overrides Playwright's page.screenshot method and saves the screenshot using the interpreter's binary output callback.
- scrape - using a heuristic algorithm, the interpreter tries to find the most important items on the webpage, parses those into a table and pushes the table into the serializable callback.
  - The user can also specify the item from the webpage to be scraped (using a Playwright-style selector).
- scrapeSchema - given a "row schema definition" with column names and selectors, the interpreter scrapes the data from a webpage into a "curated" table.
  - Example:
    { "action": "scrapeSchema", "args": [{ "name": ".c-item-title", "price": ".c-a-basic-info__price", "vin": ".c-vin-info__vin", "desc": ".c-car-properties__text" }] }
- scroll - scrolls down the webpage a given number of times (default = 1).
- script - allows the user to run an arbitrary asynchronous function in the interpreter. The function's body is read as a string from the params field and evaluated on the server side (as opposed to in the browser). The function accepts one parameter named page, being the current Playwright Page instance.
  - Example:
    {
    "action": "script",
    "args": ["\
    const links = await page.evaluate(() => \
    {\
    return Array.from(\
    document.querySelectorAll('a.c-item__link.sds-surface--clickable')\
    ).map(a => a.href);\
    });\
    \
    for(let link of links){\
    await new Promise(res => setTimeout(res, 100));\
    await page.context().newPage().then(page => page.goto(link))\
    }\
    "]
    },
    The example runs a server-side script opening all links on the current page in new tabs with a 100 ms delay (Note: if you only want to open links on a page, see enqueueLinks below).
  - Even though it is possible to write the whole workflow using one script field, we do not endorse this. The WAW format should allow the developers to write comprehensible, easy-to-maintain workflow definitions.
- enqueueLinks (new in 0.4.0)
  - Accepts a selector parameter. Reads elements targeted by the specified selector (Playwright selectors) and stores their links in a queue.
  - Those pages are then processed using the same workflow as the initial page (in parallel if the maxConcurrency interpreter parameter is greater than 1).
Apart from the mentioned syntax available for direct workflow specification, the WAW format contains more constructs for even better flexibility of the format.
The format supports usage of regular expressions, both in the conditions and the action parameters. The syntax is inspired by the MongoDB regex syntax and looks as follows:
"url": {"$regex": "^https"}
Such a rule matches every URL on a secured website, i.e. starting with https.
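A value that may be either a plain string or a $regex object could be compared as in the Python sketch below. The helper matches_value is hypothetical, and Python's re dialect is used here, which may differ in details from the regex engine of a real interpreter.

```python
import re


def matches_value(expected, actual: str) -> bool:
    """Compare `actual` with a plain string or with a {"$regex": ...} object."""
    if isinstance(expected, dict) and "$regex" in expected:
        # search() finds the pattern anywhere; anchors like ^ restrict it
        return re.search(expected["$regex"], actual) is not None
    return expected == actual


rule = {"$regex": "^https"}
print(matches_value(rule, "https://example.com"))  # True
print(matches_value(rule, "http://example.com"))   # False
```

Plain strings keep their exact-match meaning, so existing conditions are unaffected by the extra syntax.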
The WAW format also allows the developer to parametrize the workflow - this can be particularly useful, e.g. for letting the user insert their login information, URL to be scraped etc.
{
"action": "goto",
"args": [
{"$param": "startURL"}
]
}
An interpreter of the format should let the user supply a value for each parameter; the entire parameter structure is then replaced with the user-supplied value.
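One way to perform this replacement is a recursive walk over the parsed workflow. The Python sketch below is an illustration under assumptions: fill_params is a hypothetical helper, a parameter structure is taken to be an object whose only key is $param, and a missing value simply raises KeyError.

```python
def fill_params(node, params: dict):
    """Return a copy of `node` with every {"$param": name} replaced by params[name]."""
    if isinstance(node, dict):
        if set(node) == {"$param"}:
            return params[node["$param"]]  # replace the whole structure
        return {key: fill_params(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_params(item, params) for item in node]
    return node  # scalars are left untouched


step = {"action": "goto", "args": [{"$param": "startURL"}]}
print(fill_params(step, {"startURL": "https://example.com"}))
# {'action': 'goto', 'args': ['https://example.com']}
```

The walk returns a new structure, so the same workflow definition can be reused with different user-supplied values.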
In case you want to automatically check a workflow definition file for syntax correctness, use the official JSON Schema.
Note that this JSON schema validates the files only against the base WAW definition.
To validate the files against the wbr-interpret implementation of the WAW format, please use the validateWorkflow method of the Preprocessor class.
Want to see a real-world example of a workflow? Visit the examples folder with numerous example workflows.
Ready to automate? Read how to write your first workflow step-by-step.