Home
Osmosis works by passing a context object and a data object down a command chain.
#####[context]
The context object is an XML/HTML Document or [Element](https://developer.mozilla.org/en-US/docs/Web/API/Element).
#####[data]
The data object is just a regular object that starts out empty.
Example:
```javascript
osmosis
.command1() // passes a context to command2
.command2() // receives context and inherits all data values
...
osmosis
.command3() // new instance doesn't receive context or data
```
####Contexts that might be passed
#####New context
Some commands select new elements or request new documents. These commands pass a new context down the chain. For example, `osmosis.find` passes the elements it finds to the next command.
#####Multiple contexts
A command can have more than one context (i.e. element) for the next command to process. Rather than pass an array of elements to the next command, it simply calls the next command once for each element.
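The "one call per element" behaviour can be pictured with a toy model. This is only a sketch in plain JavaScript, not Osmosis internals: `runChain`, `findItems`, `setName`, and the `page` object are hypothetical names invented for illustration.

```javascript
// Toy model of a command chain (hypothetical; not how Osmosis is implemented).
// Each command receives (context, data, next) and may call `next` any number of times.
function runChain(commands, context, data) {
  const results = [];
  function step(index, ctx, dat) {
    if (index === commands.length) {
      // end of the chain: record what arrived here
      results.push({ context: ctx, data: dat });
      return;
    }
    commands[index](ctx, dat, (nextCtx, nextData) => step(index + 1, nextCtx, nextData));
  }
  step(0, context, data);
  return results;
}

// Behaves like a "new context" command: calls `next` once per element found.
const findItems = (ctx, data, next) => ctx.items.forEach((item) => next(item, data));

// Behaves like a "same context" command: forwards the context, adds to the data.
const setName = (ctx, data, next) => next(ctx, { ...data, name: ctx.text });

const page = { items: [{ text: 'first' }, { text: 'second' }] };
const results = runChain([findItems, setName], page, {});
const names = results.map((r) => r.data.name);
console.log(names); // [ 'first', 'second' ]
```

Because `findItems` calls `next` twice, `setName` runs twice, once for each element, and the chain produces two results.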
#####Same context
Other commands, such as `log` and `set`, perform passive operations. They simply forward the current context.
##Command reference
These are all of the "commands" that are available for chaining in an Osmosis instance.
- click
- config
- contains
- data
- delay
- do
- doc
- dom
- failure
- filter
- find
- follow
- get/post
- login
- match
- paginate
- parse
- set
- submit
- then
##click
#####( selector )
Click on nodes found by `selector`.
##contains
#####( string )
Discard any nodes whose contents do not match `string`.
##config
#####( opts )
#####( key, val )
Set HTTP options and configure Osmosis.
##data
#####( callback( data ) )
Calls `callback` with the current data object.
#####( null )
Empty the data object.
#####( object )
Add or replace each `key` in the data object with a new `val`.
##debug
#####( callback( msg ) )
Call `callback` when any debug messages are received.
##delay
#####( seconds )
Delay starting the next promise for `seconds` (float or int).
##do
#####( osmosis..., osmosis... )
Call each Osmosis instance with the current context. The chain always continues, even if an instance fails.
##doc
Reset the current context to the `Document`.
##dom
#####( callback )
Create a DOM object from the current context.
The `callback` will be called with 3 arguments (`window`, `data`, and `next`). The `next([context], [data])` function must be called at least once.
##done
#####( callback )
Calls `callback` when parsing has completely finished.
##error
#####( callback( msg ) )
Call `callback` when any error messages are received.
##failure/fail
#####( selector )
Discard any nodes that match `selector`.
##filter/success
#####( selector )
Discard any nodes that do not match `selector`.
##find
#####( selector )
Find elements based on `selector` anywhere within the current document.
##follow
#####( [selector] )
Follow URLs found via `selector`. If `selector` isn't provided, `follow` will search the current element text or common URL attributes (href, src, etc.).
####Examples:
```javascript
.follow()
.follow('@href')
.follow('a')
.follow('a@href')
.follow('span.outlink')
.follow('input.cloneURL@value')
.follow('link[type="application/rss+xml"]@href')
```
##get / post
#####( url , [data] , [opts] )
Make an HTTP request.
url - A string containing a URL, which can be relative to the current context.
data (optional) - An object containing GET query parameters or POST request data.
opts (optional) - An object containing HTTP request options.
Note: Query parameter values are URL-encoded by needle, so pass them unencoded. Values that are already URL-encoded would be encoded a second time.
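The effect of encoding a value twice can be seen in plain JavaScript. This is a sketch only: it uses the built-in `encodeURIComponent` as a stand-in for whatever encoding needle applies, and the sample value is made up.

```javascript
// A raw parameter value, as it should be handed to .get() / .post()
const raw = 'red shoes & socks';

// What a single round of encoding produces
const encodedOnce = encodeURIComponent(raw);

// What happens if an already-encoded value is encoded again:
// every "%" becomes "%25"
const encodedTwice = encodeURIComponent(encodedOnce);

console.log(encodedOnce);  // red%20shoes%20%26%20socks
console.log(encodedTwice); // red%2520shoes%2520%2526%2520socks

// A server that decodes once no longer sees the original value
console.log(decodeURIComponent(encodedTwice) === raw); // false
```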
##log
#####( callback( msg ) )
Call `callback` when any log messages are received.
##login
#####( user , pass , [success] , [fail] )
Submit a login form.
#####Arguments:
user - A string containing a username, email address, etc.
pass - A password string
success (optional) - A selector string determining if the login attempt succeeded
fail (optional) - A selector string determining if the login attempt failed
######How it works
`login` finds the first form containing `input[type="password"]` and uses that input as the password field. It will use the preceding `<input>` element as the user field.
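The field-selection rule can be sketched without a real DOM. The model below is hypothetical: forms are plain arrays of input descriptors, `pickLoginFields` is an invented helper, and "preceding" is assumed to mean the input immediately before the password field.

```javascript
// Toy model of the heuristic (not Osmosis code): each form is an array of
// { type, name } descriptors listed in document order.
function pickLoginFields(forms) {
  for (const inputs of forms) {
    const passIndex = inputs.findIndex((input) => input.type === 'password');
    if (passIndex === -1) continue; // this form has no password input, skip it
    return {
      // assumption: the user field is the input right before the password field
      user: passIndex > 0 ? inputs[passIndex - 1] : null,
      pass: inputs[passIndex],
    };
  }
  return null; // no form contains a password input
}

const forms = [
  [{ type: 'search', name: 'q' }],
  [
    { type: 'hidden', name: 'csrf' },
    { type: 'email', name: 'email' },
    { type: 'password', name: 'pw' },
  ],
];

const fields = pickLoginFields(forms);
console.log(fields.user.name, fields.pass.name); // email pw
```

One consequence of this rule is that a form with an unrelated input placed directly before the password field would yield the wrong user field.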
##match
#####( [selector], RegExp )
Discard any nodes whose contents do not match `RegExp`.
##page / paginate
#####( selector , [limit] )
Paginate the previous request `limit` times based on `selector`.
####selector:
selector (String) - A selector string for either:
- an element with the next page URL in its inner text or in an attribute that commonly contains a URL (href, src, etc.)
- an element whose `name` and `value` attributes will respectively be added or replaced in the next page query.

selector (Object) - An object where each `key` is a query parameter name and each `value` is either a selector string or an increment amount (+1, -1, etc.).
####limit:
limit (Number) - Total number of "next page" requests to make.
limit (String) - A selector string for an element containing the total number of requests to make.
```javascript
.paginate('a.nextPage') // go to `a.nextPage` `@href`
.paginate('link[rel="next"]@href') // go to `link` `@href`
.paginate('input[name="page"]') // update `page` parameter of the next query

// adds 20 to the `startIndex` query parameter
// sets `page` query parameter to `a.nextPage` content
// stops after 15 requests are made
.paginate({ startIndex: +20, page: 'a.nextPage' }, 15)
```
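The increment form can be pictured as rewriting one query parameter between requests. The following is a minimal sketch of that idea, not Osmosis code: `applyIncrements` and the example URL are hypothetical, and treating a missing parameter as 0 is an assumption.

```javascript
// Hypothetical helper (not part of Osmosis): returns the URL of the next page
// by adding each increment to the matching query parameter.
function applyIncrements(url, increments) {
  const next = new URL(url);
  for (const [name, amount] of Object.entries(increments)) {
    // assumption: a missing or non-numeric parameter counts as 0
    const current = Number(next.searchParams.get(name)) || 0;
    next.searchParams.set(name, String(current + amount));
  }
  return next.toString();
}

// Simulate a limit of 3 "next page" requests
let url = 'http://example.com/search?q=cats&startIndex=0';
const visited = [];
for (let i = 0; i < 3; i++) {
  url = applyIncrements(url, { startIndex: +20 });
  visited.push(url);
}

console.log(visited[0]); // http://example.com/search?q=cats&startIndex=20
console.log(visited[2]); // http://example.com/search?q=cats&startIndex=60
```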
##pause / resume / stop
Pause, resume or stop an osmosis instance.
##parse
#####( string )
Parse an HTML or XML string.
#####Arguments:
string - A string or buffer containing the HTML/XML data
##set
#####( name , selector )
Set `name` to the value of `selector`.
#####( object )
Set each `key` to the value of each `val` selector.
```javascript
.set('title') // set 'title' to current element text
.set('title', 'a.title') // set 'title' to text of 'a.title'
.set({
    title: 'a.title',
    description: 'p.description',
    url: 'a.permalink @href',
    images: ['img @src'],
    comments: [
        osmosis
        .follow('a.comments')
        .find('div.comment')
        .set({
            'author': '.author',
            'content': 'p.content',
            'date': '.date'
        })
    ]
});
```
##submit
#####( selector , [data] )
Submit a form.
#####Arguments:
selector - A selector for the `<form>` element
data (optional) - An object where each `key` and `value` represents a form input name and value
##then
#####( callback( context, data, [next], [done] ) )
Calls `callback` with the context of the current element.
####context:
The `context` argument is the current context at that point in the command chain. If the previous command was `get`, `post`, `follow`, or `parse`, then the context will be a Document. If the previous command was `find`, then the current context will be one of the Elements that was found.
####data:
The `data` argument contains values set via `osmosis.set`. This object can be modified in any way.
####next:
The `next` argument is a function that will call the next command. It takes two arguments: context and data.
####done:
The `done` argument is a function to call when `then` will no longer call `next`. This is only required if `then` calls `next` asynchronously any number of times.

Note: If the callback accepts `done` as an argument, it must always call `done`, even if `next` was never called.
####Functions
The callback will have these functions bound to its `this` value:
- this.request(method, url, [data], callback([err], context), [opts])
- this.log(msg)
- this.debug(msg)
- this.error(msg)
####Examples:
**Example 1:** find every `ul > li` and pass it to the next command
```javascript
osmosis
...
.then(function(context, data, next) {
    var items = context.find('ul > li');
    items.forEach(function(item) {
        next(item, data);
    })
})
```
**Example 2:** set `data.url` to the current page URL
```javascript
osmosis
...
.then(function(context, data, next) {
    data.url = context.doc().request.url;
    next(context, data);
})
```
**Example 3:** only continue if `lastname != undefined`
```javascript
osmosis
...
.then(function(context, data, next) {
    if (data.lastname != undefined)
        next(context, data)
})
```
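**Example 4:** using the `done` function
```javascript
osmosis
...
.then(function(context, data, next, done) {
    if (db.connected == false) {
        this.error('database disconnected');
        done();
        return;
    }

    // number of saves that have not finished yet
    var pending = data.someArray.length;

    // nothing to save: `done` must still be called
    if (pending === 0) {
        done();
        return;
    }

    data.someArray.forEach(function(obj) {
        db.save(obj, function() {
            next(context, data);
            // call `done` after the last save finishes,
            // whatever order the saves complete in
            if (--pending === 0)
                done();
        })
    })
})
```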
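The `next`/`done` contract described above can be illustrated with a toy runner. This is a sketch under stated assumptions, not Osmosis internals: `runStep` is a hypothetical function, and detecting `done` through the callback's declared parameter count (`callback.length`) is an assumption made for this model only.

```javascript
// Toy runner (hypothetical): decides when a `then`-style step is finished.
function runStep(callback, context, data, onNext, onFinished) {
  // assumption: a callback "accepts done" if it declares four parameters
  const wantsDone = callback.length >= 4;
  let finished = false;
  const done = () => {
    if (!finished) {
      finished = true;
      onFinished();
    }
  };
  callback(context, data, onNext, done);
  // a callback without `done` is treated as finished once it returns
  if (!wantsDone) done();
}

// Synchronous callback: no `done` parameter needed.
const syncEvents = [];
runStep(
  function (context, data, next) { next(context, data); },
  'doc', {},
  () => syncEvents.push('next'),
  () => syncEvents.push('finished')
);

// Asynchronous callback: must call `done` itself, after its last `next`.
const asyncEvents = [];
runStep(
  function (context, data, next, done) {
    setTimeout(() => { next(context, data); done(); }, 10);
  },
  'doc', {},
  () => asyncEvents.push('next'),
  () => asyncEvents.push('finished')
);

console.log(syncEvents);         // [ 'next', 'finished' ]
console.log(asyncEvents.length); // 0 (the timer has not fired yet)
```

In this model, an asynchronous callback that never called `done` would leave the step unfinished forever, which is why the note above requires `done` to be called whenever it is accepted.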