##Command reference
These are all of the "commands" that are available for chaining in an Osmosis instance.
##click
#####( selector )
Click on nodes found by `selector`.
##contains
#####( string )
Discard any nodes whose contents do not match `string`.
##config
#####( opts )
#####( key, val )
Set HTTP options and configure Osmosis.
##data
#####( callback( data ) )
Calls `callback` with the current data object.
#####( null )
Empty the data object.
#####( object )
Add or replace each `key` in the data object with a new `val`.
##debug
#####( callback( msg ) )
Call `callback` when any debug messages are received.
##delay
#####( seconds )
Delay starting the next promise for `seconds` (float or int).
##do
#####( osmosis..., osmosis... )
Call each Osmosis instance with the current context.
##doc
Reset the current context to the `Document`.
##dom
#####( callback )
Create a DOM object from the current context.

The `callback` will be called with 3 arguments (`window`, `data`, and `next`). The `next([context], [data])` function must be called at least once.
##done
#####( callback )
Calls `callback` when parsing has completely finished.
##error
#####( callback( msg ) )
Call `callback` when any error messages are received.
##failure/fail
#####( selector )
Discard any nodes that match `selector`.
##filter/success
#####filter( selector )
Discard any nodes that do not match `selector`.
##find
#####( selector )
Find elements based on `selector` anywhere within the current document.
##follow
#####( [selector] )
Follow URLs found via `selector`. If `selector` isn't provided, `follow` will search the current element text or common URL attributes (href, src, etc).
####Examples:

    .follow()
    .follow('@href')
    .follow('a')
    .follow('a@href')
    .follow('span.outlink')
    .follow('input.cloneURL@value')
    .follow('link[type="application/rss+xml"]@href')
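
The fallback described above (no selector given) can be pictured with a small sketch. This is not Osmosis code: `findUrl`, the `URL_ATTRIBUTES` list, the lookup order, and the plain-object element shape are all hypothetical, and resolving against the current page URL is an assumption.

```javascript
// Hypothetical list and order; the text above only names href and src ("etc").
const URL_ATTRIBUTES = ['href', 'src'];

// `element` is a plain stand-in object: { text: string, attrs: object }.
function findUrl(element, baseUrl) {
    let candidate = null;
    for (const name of URL_ATTRIBUTES) {
        if (element.attrs[name]) {
            candidate = element.attrs[name];
            break;
        }
    }
    // Assumption: fall back to the element text when no URL attribute exists.
    if (candidate === null && element.text) {
        candidate = element.text.trim();
    }
    if (!candidate) return null;
    // Resolve relative URLs against the page the element came from.
    return new URL(candidate, baseUrl).toString();
}

const fromAttr = findUrl({ text: 'RSS', attrs: { href: '/feed.xml' } },
                         'http://example.com/blog/post');
const fromText = findUrl({ text: ' http://other.example/page ', attrs: {} },
                         'http://example.com/blog/post');
```

Here `fromAttr` resolves to a URL on `example.com`, while `fromText` is already absolute and is left as it is.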
##get / post
#####( url , [data] , [opts] , [callback] )
Make an HTTP request.
- url - A string containing a URL, which can be relative to the current context.
- data (optional) - An object containing GET query parameters or POST request data.
- opts (optional) - An object containing HTTP request options.
- callback (optional) - A function called after making the request. Just like `.then`, the callback function should have 3 arguments (`context`, `data`, `next`) and must call `next` at least once.

Note: Query parameter values will be URL-encoded by needle, so make sure that your parameter values are not already URL-encoded.
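
To see why pre-encoded values are a problem, here is a minimal sketch of double encoding. It uses Node's built-in `URLSearchParams` as a stand-in for the encoding step; needle's exact output may differ, and `buildQuery` is a hypothetical helper.

```javascript
// Stand-in for the encoding that happens when the request is made.
function buildQuery(params) {
    return new URLSearchParams(params).toString();
}

// Raw value: encoded exactly once.
const once = buildQuery({ q: 'a b&c' });

// Value that was already encoded by the caller: the "%" signs get encoded again.
const twice = buildQuery({ q: encodeURIComponent('a b&c') });

// What a server would see after decoding each query string one time.
const seenOnce = new URLSearchParams(once).get('q');
const seenTwice = new URLSearchParams(twice).get('q');
```

After one round of decoding, `seenOnce` is the original string, while `seenTwice` still contains percent escapes.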
##log
#####( callback( msg ) )
Call `callback` when any log messages are received.
##login
#####( user , pass , [success] , [fail] )
Submit a login form.
#####Arguments:
- user - A string containing a username, email address, etc.
- pass - A password string
- success (optional) - A selector string determining if the login attempt succeeded
- fail (optional) - A selector string determining if the login attempt failed

######How it works
`login` finds the first form containing `input[type="password"]` and uses that input as the password field. It will use the preceding `<input>` element as the user field.
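
The field-selection rule above can be sketched over plain objects. This only illustrates the rule as described; `pickLoginFields` and the `{ name, inputs }` form shape are hypothetical and not part of Osmosis.

```javascript
// Each form is a stand-in object: { name, inputs: [{ name, type }] }.
function pickLoginFields(forms) {
    for (const form of forms) {
        const passIndex = form.inputs.findIndex(function (input) {
            return input.type === 'password';
        });
        if (passIndex === -1) continue; // no password input, try the next form

        // The input just before the password input is treated as the user field.
        const userInput = passIndex > 0 ? form.inputs[passIndex - 1] : null;
        return {
            form: form.name,
            user: userInput ? userInput.name : null,
            pass: form.inputs[passIndex].name
        };
    }
    return null; // no form with a password input was found
}

const picked = pickLoginFields([
    { name: 'search', inputs: [{ name: 'q', type: 'text' }] },
    { name: 'signin', inputs: [
        { name: 'email', type: 'text' },
        { name: 'secret', type: 'password' }
    ] }
]);
```

The search form is skipped because it has no password input, so the fields come from the second form.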
##match
#####( [selector], RegExp )
Discard any nodes whose contents do not match `RegExp`.
##page / paginate
#####( selector , [limit] )
Paginate the previous request `limit` times based on `selector`.
####selector:
selector (String) - A selector string for either:
- an element with the next page URL in its inner text or in an attribute that commonly contains a URL (href, src, etc.)
- an element whose `name` and `value` attributes will respectively be added or replaced in the next page query.

selector (Object) - An object where each `key` is a query parameter name and each `value` is either a selector string or an increment amount (+1, -1, etc.).
####limit:
limit (Number) - Total number of "next page" requests to make.

limit (String) - A selector string for an element containing the total number of requests to make.

    .paginate('a.nextPage') // go to `a.nextPage` `@href`
    .paginate('link[rel="next"]@href') // go to `link` `@href`
    .paginate('input[name="page"]') // update `page` parameter of the next query

    // adds 20 to the `startIndex` query parameter
    // sets `page` query parameter to `a.nextPage` content
    // stops after 15 requests are made
    .paginate({ startIndex: +20, page: 'a.nextPage' }, 15)
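
As a rough illustration of the increment form of the object selector, the sketch below applies numeric steps to a URL's query string. `nextPageUrl` is a hypothetical helper, not an Osmosis function, and treating a missing parameter as 0 is an assumption.

```javascript
// Apply numeric increments to query parameters of the current page URL.
function nextPageUrl(currentUrl, increments) {
    const url = new URL(currentUrl);
    for (const [key, step] of Object.entries(increments)) {
        // Assumption: a parameter that is not present starts from 0.
        const current = Number(url.searchParams.get(key) || 0);
        url.searchParams.set(key, String(current + step));
    }
    return url.toString();
}

const page2 = nextPageUrl('http://example.com/list?startIndex=40&q=x', { startIndex: +20 });
const first = nextPageUrl('http://example.com/list?q=x', { startIndex: +20 });
```

Other query parameters, such as `q` here, are carried over unchanged.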
##pause / resume / stop
Pause, resume or stop an osmosis instance.
##parse
#####( string )
Parse an HTML or XML string.
#####Arguments:
- string - A string or buffer containing the HTML/XML data
##set
#####( name , selector )
Set `name` to the value of `selector`.
#####( object )
Set each `key` to the value of each `val` selector.

    .set('title') // set 'title' to current element text
    .set('title', 'a.title') // set 'title' to text of 'a.title'
    .set({
        title: 'a.title',
        description: 'p.description',
        url: 'a.permalink @href',
        images: ['img @src'],
        comments: [
            osmosis
            .follow('a.comments')
            .find('div.comment')
            .set({
                'author': '.author',
                'content': 'p.content',
                'date': '.date'
            })
        ]
    });
##submit
#####( selector , [data] )
Submit a form.
#####Arguments:
- selector - A selector for the `<form>` element
- data (optional) - An object where each `key` and `value` represents a form input name and value
##then
#####( callback( context, data, [next], [done] ) )
Calls `callback` with the context of the current element.
####context:
The `context` argument is the current context at that point in the command chain. If the previous command was `get`, `post`, `follow`, or `parse` then the context will be a Document. If the previous command was `find` then the current context will be one of the Elements that was found.
####data:
The `data` argument contains values set via `osmosis.set`. This object can be modified in any way.
####next:
The `next` argument is a function that will call the next command. It takes two arguments: context and data.
####done:
The `done` argument is a function to call when `then` will no longer call `next`. This is only required if `then` calls `next` asynchronously any number of times.

Note: If the `then` callback accepts `done` as an argument, it must always call `done`, even if `next` was never called.
####Functions
The callback will have these functions bound to its `this` value:
- this.request(method, url, [data], callback([err], context), [opts])
- this.log(msg)
- this.debug(msg)
- this.error(msg)
####Properties
The callback will have these properties:
- this.instance - The Osmosis instance number that this command is in.
- this.depth - The current position in the command chain.
- this.next - The next command in the chain. Note that the `next()` argument is really just a shortcut for `this.next.start()`
- this.prev - The previous command in the command chain.
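
The `next`/`done` contract described above can be modelled with a toy runner. This is only a sketch under stated assumptions: `runThen` is hypothetical, it runs synchronously, and it assumes the engine decides whether to wait for `done` by checking how many parameters the callback declares. Osmosis itself may work differently.

```javascript
// Toy model of a `then` step: records every next() call and whether the step finished.
function runThen(callback, context, data) {
    const emitted = [];
    let finished = false;

    const next = function (ctx, d) { emitted.push({ context: ctx, data: d }); };
    const done = function () { finished = true; };

    callback.call({}, context, data, next, done);

    // Assumption: a callback that does not declare `done` is finished once it returns.
    if (callback.length < 4) finished = true;

    return { emitted: emitted, finished: finished };
}

// Declares 3 parameters: may call next() several times, no done() needed.
const fanOut = runThen(function (context, data, next) {
    context.items.forEach(function (item) { next(item, data); });
}, { items: ['a', 'b'] }, {});

// Declares `done` and calls it without ever calling next().
const stopped = runThen(function (context, data, next, done) {
    done();
}, {}, {});

// Declares `done` but never calls it: the step never finishes.
const stuck = runThen(function (context, data, next, done) {}, {}, {});
```

The last case is the one the note about `done` warns against: once `done` is declared, the step is only considered complete after it has been called.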
####Examples:
**Example 1:** find every `ul > li` and pass it to the next command
```javascript
osmosis
...
.then(function(context, data, next) {
    var items = context.find('ul > li');
    items.forEach(function(item) {
        next(item, data);
    })
})
```
**Example 2:** set `data.url` to the current page URL
```javascript
osmosis
...
.then(function(context, data, next) {
    data.url = context.doc().request.url;
    next(context, data);
})
```
**Example 3:** only continue if `lastname != undefined`
```javascript
osmosis
...
.then(function(context, data, next) {
    if (data.lastname != undefined)
        next(context, data)
})
```
**Example 4:** using the `done` function
```javascript
osmosis
...
.then(function(context, data, next, done) {
    if (db.connected == false) {
        this.error('database disconnected');
        done();
        return;
    }

    data.someArray.forEach(function(obj, index) {
        db.save(obj, function() {
            next(context, data);
            // call done() once the last item has been saved
            if (index == data.someArray.length-1)
                done();
        })
    })
})
```