
Can we promisify js-crawler #27

Open
shekarls opened this issue Aug 9, 2016 · 2 comments

Comments

shekarls commented Aug 9, 2016

Can you show me how to promisify the js-crawler code below?
Is there a better way to return the response from each crawler stage?

var Crawler = require('js-crawler');

function runCrawler(url) {
  var crawler = new Crawler().configure({ ignoreRelative: false, depth: 2 });
  crawler.crawl({
    url: url,
    success: function (page) {
      console.log(page.url + ' --- ' + page.status);
    },
    failure: function (page) {
      console.log(page.url + ' --- ' + page.status);
    },
    finished: function (crawledUrls) {
      console.log('COMPLETED***********');
    }
  });
}

amoilanen (Owner) commented Aug 10, 2016

Hi,

Actually, the result of invoking crawl is not a Promise but a natural Observable (http://reactivex.io/documentation/observable.html); it even has a very similar API. Maybe we can think about making it an actual Observable in a later release.

A Promise resolves to a single value, while the crawler produces a series of values whose length we do not know upfront, so we cannot simply return a Promise from the crawl method.

I agree that it would be nice to think about some alternatives to callback-based API.
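
To illustrate the idea, here is a minimal sketch of wrapping crawl in an Observable. It assumes RxJS 5 (the rxjs package) is installed, and crawlAsObservable is just a hypothetical name, not part of js-crawler:

var Rx = require('rxjs/Rx');
var Crawler = require('js-crawler');

// Hypothetical wrapper: emits each crawled page as it arrives and
// completes when the crawl finishes.
function crawlAsObservable(url) {
  return Rx.Observable.create(function (observer) {
    new Crawler().configure({ ignoreRelative: false, depth: 2 }).crawl({
      url: url,
      success: function (page) { observer.next(page); },
      // observer.error terminates the stream; to keep observing past
      // failures, emit them with observer.next instead.
      failure: function (page) { observer.error(page); },
      finished: function (crawledUrls) { observer.complete(); }
    });
  });
}

crawlAsObservable('https://example.com').subscribe(
  function (page) { console.log(page.url + ' --- ' + page.status); },
  function (failedPage) { console.error('Failed: ' + failedPage.url); },
  function () { console.log('COMPLETED'); }
);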

jankcat (Contributor) commented Feb 15, 2017

I don't think promisifying the crawler API itself really makes sense, given that it is a many-result rather than a single-result system; it is better suited to the event-listener style currently in place. (Re: antivanov's comment above.)

What I did to adapt it to my promise-based system, for those interested:

var Crawler = require('js-crawler');

function startCrawl(url) {
  return new Promise(function(resolve) {
    // Create a new results object
    let results = {};
    // Create and configure the crawler (configuration as in the question above)
    let crawler = new Crawler().configure({ depth: 2 });

    crawler.crawl({
      url: url,
      success: function(page) {
        // Do any actions you want with the page: log to console if you
        // don't care about output order, etc.
        // Add whatever you want to keep to your results object
      },
      failure: function(page) {
        // Same as above: inspect the failed page and add whatever you
        // want to keep to your results object
      },
      finished: function(crawledUrls) {
        // Do any actions you want with the list of crawled urls, then
        // hand the accumulated results to the promise
        resolve(results);
      }
    });
  });
}

Basically, the "results" object holds anything you want to pass back when the promise resolves, and the promise resolves once crawling completes. You could also call reject() to make the promise fail in certain cases; see the sketch after the usage example below.

Using it is as simple as calling any other promise/thenable function:

startCrawl("https://example.com").then(function(results) {
  // Do something with results object here...
});
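
For completeness, a rejecting variant might look like the sketch below (startCrawlStrict is a hypothetical name; when to reject is a policy choice, illustrated here as failing on the first page error):

function startCrawlStrict(url) {
  return new Promise(function(resolve, reject) {
    let results = {};
    let crawler = new Crawler().configure({ depth: 2 });

    crawler.crawl({
      url: url,
      success: function(page) {
        // Accumulate whatever you need in results
      },
      failure: function(page) {
        // Fail fast: reject on the first failed page. Later resolve/reject
        // calls are no-ops once the promise has settled.
        reject(new Error('Crawl failed for ' + page.url));
      },
      finished: function(crawledUrls) {
        resolve(results);
      }
    });
  });
}

startCrawlStrict("https://example.com")
  .then(function(results) { /* use results */ })
  .catch(function(err) { console.error(err.message); });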
