
Can we promisify js-crawler #27

Open
shekarls opened this issue Aug 9, 2016 · 2 comments

Comments

shekarls commented Aug 9, 2016

Can you show me how to promisify the js-crawler code below?
Is there a better way to return the response from each crawler stage?

var Crawler = require('js-crawler');

function runCrawler(url) {
  var crawler = new Crawler().configure({ ignoreRelative: false, depth: 2 });
  crawler.crawl({
    url: url,
    success: function (page) {
      console.log(page.url + ' --- ' + page.status);
    },
    failure: function (page) {
      console.log(page.url + ' --- ' + page.status);
    },
    finished: function (crawledUrls) {
      console.log('COMPLETED***********');
    }
  });
}

amoilanen (Owner) commented Aug 10, 2016

Hi,

Actually, the result of invoking crawl is not a Promise but a natural Observable (http://reactivex.io/documentation/observable.html); it even has a very similar API. Maybe we can think about making it an actual Observable in a later release.

A Promise resolves to a single value, while the crawler produces a series of values whose length we do not know upfront, so we cannot simply return a Promise from the crawl method.

I agree that it would be nice to think about some alternatives to callback-based API.
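
To illustrate the idea, here is a minimal sketch of wrapping crawl in an Observable. It assumes RxJS 5 (the rxjs package) is installed, and crawlAsObservable is just a hypothetical name, not part of js-crawler:

var Rx = require('rxjs/Rx');
var Crawler = require('js-crawler');

// Hypothetical wrapper: emits each crawled page as it arrives and
// completes when the crawl finishes.
function crawlAsObservable(url) {
  return Rx.Observable.create(function (observer) {
    new Crawler().configure({ ignoreRelative: false, depth: 2 }).crawl({
      url: url,
      success: function (page) { observer.next(page); },
      // observer.error terminates the stream; to keep observing past
      // failures, emit them with observer.next instead.
      failure: function (page) { observer.error(page); },
      finished: function (crawledUrls) { observer.complete(); }
    });
  });
}

crawlAsObservable('https://example.com').subscribe(
  function (page) { console.log(page.url + ' --- ' + page.status); },
  function (failedPage) { console.error('Failed: ' + failedPage.url); },
  function () { console.log('COMPLETED'); }
);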

jankcat (Contributor) commented Feb 15, 2017

I don't think promisifying the crawler API itself really makes sense, given that it is a many-result rather than a single-result system; it is better suited to the event-listener style currently in place. (Re: antivanov's comment above.)

What I did to adapt it to my promise-based system, for those interested:

var Crawler = require('js-crawler');

function startCrawl(url) {
  return new Promise(function(resolve) {
    // Create a new results object
    let results = {};
    // Create and configure the crawler (configuration as in the question above)
    let crawler = new Crawler().configure({ depth: 2 });

    crawler.crawl({
      url: url,
      success: function(page) {
        // Do any actions you want with the page: log to console if you
        // don't care about output order, etc.
        // Add whatever you want to keep to your results object
      },
      failure: function(page) {
        // Same as above: inspect the failed page and add whatever you
        // want to keep to your results object
      },
      finished: function(crawledUrls) {
        // Do any actions you want with the list of crawled urls, then
        // hand the accumulated results to the promise
        resolve(results);
      }
    });
  });
}

Basically, the "results" object holds anything you want to pass back when the promise resolves, and the promise resolves once crawling completes. You could also call reject() to make the promise fail in certain cases; see the sketch after the usage example below.

Using it is as simple as calling any other promise/thenable function:

startCrawl("https://example.com").then(function(results) {
  // Do something with results object here...
});
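
For completeness, a rejecting variant might look like the sketch below (startCrawlStrict is a hypothetical name; when to reject is a policy choice, illustrated here as failing on the first page error):

function startCrawlStrict(url) {
  return new Promise(function(resolve, reject) {
    let results = {};
    let crawler = new Crawler().configure({ depth: 2 });

    crawler.crawl({
      url: url,
      success: function(page) {
        // Accumulate whatever you need in results
      },
      failure: function(page) {
        // Fail fast: reject on the first failed page. Later resolve/reject
        // calls are no-ops once the promise has settled.
        reject(new Error('Crawl failed for ' + page.url));
      },
      finished: function(crawledUrls) {
        resolve(results);
      }
    });
  });
}

startCrawlStrict("https://example.com")
  .then(function(results) { /* use results */ })
  .catch(function(err) { console.error(err.message); });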
