Robbie Chipka edited this page Dec 23, 2015 · 177 revisions

## Command reference

These are all of the "commands" that are available for chaining in an Osmosis instance.


##### ( selector )

Click on nodes found by `selector`.


##### ( string )

Discard any nodes whose contents do not match `string`.


##### ( opts )
##### ( key, val )

Set HTTP options and configure Osmosis.


##### ( callback( data ) )

Calls `callback` with the current data object.

##### ( null )

Empty the data object.

##### ( object )

Add or replace each key in the data object with the corresponding value from `object`.
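The three data-object forms above can be pictured with a plain JavaScript model. This is a minimal sketch, not Osmosis's implementation: `applyDataOp` is a hypothetical helper, and it assumes the object form merges keys shallowly, the way `Object.assign` does.

```javascript
// Hypothetical model of the three data-object operations (not Osmosis internals).
function applyDataOp(data, arg) {
  if (arg === null) {
    return {}; // ( null ): empty the data object
  }
  if (typeof arg === 'function') {
    arg(data); // ( callback( data ) ): pass the current data to the callback
    return data;
  }
  // ( object ): add or replace each key; a shallow merge is assumed here
  return Object.assign({}, data, arg);
}

let data = { title: 'Home', page: 1 };
data = applyDataOp(data, { page: 2, lang: 'en' });
applyDataOp(data, function (current) {
  console.log(current); // { title: 'Home', page: 2, lang: 'en' }
});
data = applyDataOp(data, null); // data is now {}
```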


##### ( callback( msg ) )

Call `callback` when any debug messages are received.


##### ( seconds )

Delay starting the next promise by `seconds` (float or int).


##### ( osmosis..., osmosis... )

Call each Osmosis instance with the current context. This will always continue, even if an instance fails.
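The "always continue" behavior can be sketched as calling each handler inside its own `try`/`catch`. In this sketch each Osmosis instance is stood in for by a plain function and `callEach` is a hypothetical helper; it runs handlers one after another and synchronously, which is a simplification of how Osmosis actually schedules instances.

```javascript
// Hypothetical stand-in: each "instance" is a function taking the context.
function callEach(instances, context) {
  const failures = [];
  for (const instance of instances) {
    try {
      instance(context);
    } catch (err) {
      failures.push(err.message); // record the failure, then keep going
    }
  }
  return failures; // the caller continues regardless of failures
}

const visited = [];
const failures = callEach([
  (ctx) => visited.push('first:' + ctx),
  () => { throw new Error('second failed'); },
  (ctx) => visited.push('third:' + ctx),
], 'page');
console.log(visited);  // [ 'first:page', 'third:page' ]
console.log(failures); // [ 'second failed' ]
```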


Reset the current context to the Document


##### ( callback )

Create a DOM object from the current context.

The callback will be called with 3 arguments (`window`, `data`, and `next`). The `next([context], [data])` function must be called at least once.


##### ( callback )

Calls `callback` when parsing has completely finished.


##### ( callback( msg ) )

Call `callback` when any error messages are received.


##### ( selector )

Discard any nodes that match `selector`.


##### ( selector )

Discard any nodes that do not match `selector`.


##### ( selector )

Find elements based on `selector` anywhere within the current document.


##### ( [selector] )

Follow URLs found via `selector`. If `selector` isn't provided, `follow` will search the current element's text or common URL attributes (`href`, `src`, etc.).


```js
.follow()
.follow('@href')
.follow('a')
.follow('a@href')
.follow('span.outlink')
.follow('input.cloneURL@value')
.follow('link[type="application/rss+xml"]@href')
```
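The examples above combine a CSS selector with an optional `@attribute` suffix. One way to read that notation is to split on the last `@`, and to fall back to common URL attributes and then the element's text when no attribute is named. The sketch below does this with two hypothetical helpers, `splitSelector` and `urlFrom`, and a simplified `{ text, attrs }` element shape; it illustrates the notation under those assumptions and is not Osmosis's own parser. The fallback list holds only the two attributes the text names.

```javascript
// Hypothetical helper: split 'selector@attr' into its two parts.
function splitSelector(spec) {
  const at = spec.lastIndexOf('@');
  if (at === -1) return { selector: spec.trim(), attr: null };
  return {
    selector: spec.slice(0, at).trim(), // may be '' (the current element)
    attr: spec.slice(at + 1).trim(),
  };
}

// Hypothetical element shape: { text, attrs }. When no attribute is named,
// try common URL attributes first, then the element's text.
function urlFrom(element, attr) {
  if (attr) return element.attrs[attr];
  for (const name of ['href', 'src']) {
    if (element.attrs[name]) return element.attrs[name];
  }
  return element.text.trim();
}

console.log(splitSelector('input.cloneURL@value'));
// { selector: 'input.cloneURL', attr: 'value' }
console.log(urlFrom({ text: '', attrs: { href: '/next' } }, null)); // /next
```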

## get / post

##### ( url , [data] , [opts] )

Make an HTTP request.

url - A string containing a URL, which can be relative to the current context.

data (optional) - An object containing GET query parameters or POST request data.

opts (optional) - An object containing HTTP request options.

Note: Query parameter values are URL-encoded by needle, so make sure your parameter values are not already URL-encoded.
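Both points above, relative URLs and pre-encoded query values, can be illustrated with Node's built-in `URL` class. This is an analogy rather than what Osmosis does internally: Osmosis leaves encoding to needle, whose exact output (for example `+` versus `%20` for a space) may differ. `buildGetUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: resolve a (possibly relative) URL against the current
// page and append GET parameters. URLSearchParams encodes values itself.
function buildGetUrl(url, currentPageUrl, data = {}) {
  const resolved = new URL(url, currentPageUrl);
  for (const [key, value] of Object.entries(data)) {
    resolved.searchParams.set(key, String(value));
  }
  return resolved.toString();
}

const page = 'https://example.com/catalog/index.html';
console.log(buildGetUrl('search', page, { q: 'red shoes' }));
// https://example.com/catalog/search?q=red+shoes

// Passing an already-encoded value encodes it twice ('%' becomes '%25'):
console.log(buildGetUrl('search', page, { q: 'red%20shoes' }));
// https://example.com/catalog/search?q=red%2520shoes
```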


##### ( callback( msg ) )

Call `callback` when any log messages are received.


##### ( user , pass , [success] , [fail] )

Submit a login form.

##### Arguments

`user` - A string containing a username, email address, etc.

pass - A password string

success (optional) - A selector string determining if the login attempt succeeded

fail (optional) - A selector string determining if the login attempt failed

###### How it works

`login` finds the first form containing `input[type="password"]` and uses that input as the password field. It uses the preceding `<input>` element as the user field.
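The field-detection rule above can be modeled on a simplified form representation. In this sketch each form is an array of `{ name, type }` input descriptors in document order, and `findLoginFields` is a hypothetical helper; Osmosis itself works on the parsed document, so treat this only as an illustration of the rule.

```javascript
// Hypothetical model: forms is an array of forms; each form is an array of
// input descriptors { name, type } in document order.
function findLoginFields(forms) {
  for (let f = 0; f < forms.length; f++) {
    const inputs = forms[f];
    const p = inputs.findIndex((input) => input.type === 'password');
    if (p === -1) continue; // no password input; try the next form
    return {
      form: f,
      pass: inputs[p].name,
      // the input immediately before the password field, if any
      user: p > 0 ? inputs[p - 1].name : null,
    };
  }
  return null; // no form contains a password input
}

const forms = [
  [{ name: 'q', type: 'text' }],            // search form
  [{ name: 'email', type: 'email' },
   { name: 'pw', type: 'password' },
   { name: 'remember', type: 'checkbox' }], // login form
];
console.log(findLoginFields(forms)); // { form: 1, pass: 'pw', user: 'email' }
```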


##### ( [selector], RegExp )

Discard any nodes whose contents do not match `RegExp`.
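The difference between this RegExp form and the plain-string form described earlier can be seen on an array of node texts. `keepContaining` and `keepMatching` are hypothetical helpers, and the sketch assumes that "match" means a substring test for a string and `RegExp.prototype.test` for a regular expression; the optional `selector` argument is not modeled.

```javascript
// Hypothetical helpers over plain text, standing in for node contents.
function keepContaining(texts, str) {
  return texts.filter((t) => t.includes(str)); // substring test
}

// Avoid the g flag here: it makes test() stateful between calls.
function keepMatching(texts, re) {
  return texts.filter((t) => re.test(t)); // regular-expression test
}

const prices = ['$10.00', 'Call for price', '$7.50', 'Sold out'];
console.log(keepContaining(prices, '$'));            // [ '$10.00', '$7.50' ]
console.log(keepMatching(prices, /^\$\d+\.\d{2}$/)); // [ '$10.00', '$7.50' ]
```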

## page / paginate

##### ( selector , [limit] )

Paginate the previous request `limit` times based on `selector`.

#### selector

`selector` (String) - A selector string for either:

- an element with the next page URL in its inner text or in an attribute that commonly contains a URL (`href`, `src`, etc.)
- an element whose `name` and `value` attributes will respectively be added or replaced in the next page query.

selector (Object) - An object where each key is a query parameter name and each value is either a selector string or an increment amount (+1, -1, etc.).

#### limit

`limit` (Number) - Total number of "next page" requests to make.

limit (String) - A selector string for an element containing the total number of requests to make.

```js
.paginate('a.nextPage')              // go to a.nextPage @href
.paginate('link[rel="next"]@href')  // go to link @href
.paginate('input[name="page"]')     // update the page parameter of the next query

// adds 20 to the startIndex query parameter
// sets the page query parameter to the a.nextPage content
// stops after 15 requests are made
.paginate({ startIndex: +20, page: 'a.nextPage' }, 15)
```
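The object form can be pictured as a function that computes the next page's query from the current one. In this sketch `nextQuery` is a hypothetical helper: numbers are treated as increments and strings as selectors, with a `lookup` callback standing in for evaluating a selector against the current page. It models only the query arithmetic and the `limit` loop, not the HTTP requests.

```javascript
// Hypothetical model of the object form of paginate.
// query:  current query parameters, e.g. { startIndex: '0', page: '1' }
// spec:   { param: number (increment) | string (selector) }
// lookup: stand-in for reading a selector's value from the current page
function nextQuery(query, spec, lookup) {
  const next = { ...query };
  for (const [param, rule] of Object.entries(spec)) {
    if (typeof rule === 'number') {
      next[param] = String(Number(query[param] || 0) + rule); // increment
    } else {
      next[param] = lookup(rule); // value found by the selector
    }
  }
  return next;
}

// Simulate { startIndex: +20, page: 'a.nextPage' } with a limit of 3.
let query = { startIndex: '0', page: '1' };
const seen = [];
for (let i = 0; i < 3; i++) {
  const current = query;
  // pretend the a.nextPage element on the current page holds page + 1
  const lookup = () => String(Number(current.page) + 1);
  query = nextQuery(current, { startIndex: +20, page: 'a.nextPage' }, lookup);
  seen.push(query);
}
console.log(seen.map((q) => q.startIndex + '/' + q.page)); // [ '20/2', '40/3', '60/4' ]
```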

## pause / resume / stop

Pause, resume, or stop an Osmosis instance.


##### ( string )

Parse an HTML or XML string.

##### Arguments

`string` - A string or buffer containing the HTML/XML data.


##### ( name , selector )

Set `name` to the value of `selector`.

##### ( object )

Set each key to the value found by its corresponding selector.

```js
.set('title')            // set 'title' to current element text
.set('title', 'a.title') // set 'title' to text of 'a.title'
.set({
    title:       'a.title',
    description: 'p.description',
    url:         'a.permalink @href',
    images:      ['img @src'],
    comments: [
        osmosis
        .follow('a.comments')
        .find('div.comment')
        .set({
            'author':  '.author',
            'content': 'p.content',
            'date':    '.date'
        })
    ]
});
```
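The difference between a plain selector (`'a.title'`) and a selector wrapped in an array (`['img @src']`) can be modeled on a fake page. Here `page` is a hypothetical map from each selector to all the values it matches, and `applySet` is a hypothetical helper; the sketch assumes a plain selector keeps the first match and an array keeps every match. Nested Osmosis instances, like `comments` above, are not modeled.

```javascript
// Hypothetical page: each selector maps to every value it matches, in order.
const page = {
  'a.title': ['First post', 'Second post'],
  'img @src': ['/a.png', '/b.png'],
};

// Hypothetical model of the object form: a string selector keeps the first
// match, a one-element array keeps every match (both are assumptions).
function applySet(doc, spec, data = {}) {
  const out = { ...data };
  for (const [key, selector] of Object.entries(spec)) {
    if (Array.isArray(selector)) {
      out[key] = (doc[selector[0]] || []).slice(); // array form: all matches
    } else {
      out[key] = (doc[selector] || [])[0];         // string form: first match
    }
  }
  return out;
}

console.log(applySet(page, { title: 'a.title', images: ['img @src'] }));
// { title: 'First post', images: [ '/a.png', '/b.png' ] }
```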


##### ( selector , [data] )

Submit a form.

##### Arguments

`selector` - A selector for the `<form>` element

`data` (optional) - An object where each key and value represents a form input name and value
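One way to picture how `data` combines with a form is to start from the form's current input values and override them by input name. This is a sketch under that assumption: `buildFormBody` is a hypothetical helper, keys in `data` that do not name an input are simply ignored here, and the body is encoded with `URLSearchParams` (`application/x-www-form-urlencoded`) rather than with whatever Osmosis uses internally.

```javascript
// Hypothetical helper: `inputs` are the form's { name, value } pairs in
// document order; `data` overrides values by input name. Keys in `data` that
// do not name an input are ignored in this sketch.
function buildFormBody(inputs, data = {}) {
  const body = new URLSearchParams();
  for (const { name, value } of inputs) {
    const override = Object.prototype.hasOwnProperty.call(data, name);
    body.append(name, override ? String(data[name]) : value);
  }
  return body.toString(); // application/x-www-form-urlencoded
}

const inputs = [
  { name: 'q', value: '' },
  { name: 'lang', value: 'en' },
];
console.log(buildFormBody(inputs, { q: 'web scraping' }));
// q=web+scraping&lang=en
```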


##### ( callback( context, data, [next], [done] ) )

Calls `callback` with the context of the current element.

#### context

The `context` argument is the current context at that point in the command chain. If the previous command was `get`, `post`, `follow`, or `parse`, the context will be a Document. If the previous command was `find`, the context will be one of the Elements that was found.

#### data

The `data` argument contains values set via `osmosis.set`. This object can be modified in any way.

#### next

The `next` argument is a function that calls the next command. It takes two arguments: `context` and `data`.

#### done

The `done` argument is a function to call once the callback will no longer call `next`. It is only required if the callback calls `next` asynchronously, any number of times.

Note: If the callback accepts done as an argument, it must always call done, even if next was never called.

#### Functions

The callback has these functions bound to its `this` value:

- `this.request(method, url, [data], callback([err], context), [opts])`
- `this.log(msg)`
- `this.debug(msg)`
- `this.error(msg)`


**Example 1:** find every `ul > li` and pass it to the next command

```js
osmosis
    // ...
    .then(function(context, data, next) {
        var items = context.find('ul > li');
        items.forEach(function(item) {
            next(item, data);
        });
    })
```

**Example 2:** set `data.url` to the current page URL

```js
osmosis
    // ...
    .then(function(context, data, next) {
        data.url = context.doc().request.url;
        next(context, data);
    })
```

**Example 3:** only continue if `lastname != undefined`

```js
osmosis
    // ...
    .then(function(context, data, next) {
        if (data.lastname != undefined)
            next(context, data);
    })
```

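**Example 4:** using the `done` function (`saveItem` is a hypothetical asynchronous function standing in for your own code, such as a database write)

```js
osmosis
    // ...
    .then(function(context, data, next, done) {
        if (db.connected == false) {
            this.error('database disconnected');
            done();
            return;
        }
        var pending = data.someArray.length;
        if (pending === 0) {
            done();
            return;
        }
        data.someArray.forEach(function(obj) {
            // saveItem is hypothetical: saveItem(obj, callback)
            saveItem(obj, function() {
                next(context, data);
                // call done() only after every async callback has run
                if (--pending === 0) done();
            });
        });
    })
```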
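The `next`/`done` contract described above can be modeled in plain JavaScript. In this sketch `runThen` is a hypothetical driver, not Osmosis's scheduler: it binds only `log` and `error` to `this` (the other bound functions are omitted), forwards every `next` call, and uses the callback's declared parameter count (`callback.length`) to decide whether to wait for `done`. That arity check is an assumption about how such a driver could tell the two styles apart.

```javascript
// Hypothetical driver for a then-style callback (not Osmosis internals).
function runThen(callback, context, data, onNext, onFinish) {
  const self = {
    log: (msg) => console.log('[log]', msg),
    error: (msg) => console.error('[error]', msg),
  };
  let finished = false;
  const done = () => {
    if (!finished) { finished = true; onFinish(); } // finish at most once
  };
  const next = (ctx, d) => onNext(ctx, d);
  callback.call(self, context, data, next, done);
  // A callback that does not declare `done` is treated as finished on return.
  if (callback.length < 4) done();
}

const results = [];
runThen(function (context, data, next, done) {
  this.log('splitting ' + data.items.length + ' items');
  setTimeout(() => {
    data.items.forEach((item) => next(context, { item }));
    done(); // every next() has been called, so signal completion
  }, 10);
}, 'doc', { items: ['a', 'b'] },
(ctx, d) => results.push(d.item),
() => console.log('finished:', results)); // finished: [ 'a', 'b' ]
```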
