Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logic for returning results from a function that get XHR calls response body #102

Open
bakaburg1 opened this issue Jul 21, 2020 · 3 comments

Comments

@bakaburg1
Copy link

Hello,
first of all compliments for your library, it's a great evolution over RSelenium (which I didn't like).

I created a function that allows to get the response body of resources requested by a page. The algorithm I made works well in interactive mode since I can wait for the requests to b fulfilled to collect the results, but I have no idea how to do this in an automatic function.

get_website_resources <- function(url, url_filter = '*', type_filter = '*') {

  chrome <- Chrome$new()

  out <- new.env()

  out$l <- list()

  client <- chrome$connect(callback = ~ NULL)

  Fetch <- client$Fetch
  Page <- client$Page

  Fetch$enable(patterns = list(list(urlPattern="*", requestStage="Response"))) %...>% {
    Fetch$requestPaused(callback = function(params) {

      if (str_detect(params$request$url, url_filter) & str_detect(params$resourceType, type_filter)) {

        Fetch$getResponseBody(requestId = params$requestId) %...>% {
          resp <- .

          if (resp$body != '') {
            if (resp$base64Encoded) resp$body = base64_dec(resp$body) %>% rawToChar()

            body <- list(list(
              url = params$request$url,
              response = resp
            )) %>% set_names(params$requestId)

            str(body)

            out$l <- append(out$l, body)
          }

        }
      }

      Fetch$continueRequest(requestId = params$requestId)
    })
  } %...>% {
    Page$navigate(url)
  }


 ## the following two lines should be called only after the resources have been collected
 #chrome$close() 
 #out$l
}

I have no idea if there is a smart way to intercept some relevant event. In alternative I thought of a timer or a counter that stops when a chosen number of resources is collected, but I wouldn't know even how to implement the last two (new to Promises).

Heres a use case:
resources <- get_website_resources('https://app.powerbi.com/view?r=eyJrIjoiM2MxY2RkMTQtOTA3Mi00MDIxLWE1NDktZjlmYTdlNDg0MTdkIiwidCI6IjhkZDFlNmI0LThkYWMtNDA4ZS04ZDhkLTY3NTNlOTgwMDUzMCIsImMiOjl9', url_filter = 'querydata', type_filter = 'XHR')

Thank you!
Angelo

@bakaburg1
Copy link
Author

Hello, I may have found a solution:

get_website_resources <- function(url, url_filter = '*', type_filter = '*', wait_for = 20) {

  out <- new.env()

  out$l <- list()

  perform_with_chrome( function(client) {
    Fetch <- client$Fetch
    Page <- client$Page

    client$inspect()

    Fetch$enable(patterns = list(list(urlPattern="*", requestStage="Response"))) %...>% {
      Fetch$requestPaused(callback = function(params) {

        if (str_detect(params$request$url, url_filter) & str_detect(params$resourceType, type_filter)) {

          Fetch$getResponseBody(requestId = params$requestId) %...>% {
            resp <- .

            if (resp$body != '') {
              if (resp$base64Encoded) resp$body = base64_dec(resp$body) %>% rawToChar()

              body <- list(list(
                url = params$request$url,
                response = resp
              )) %>% set_names(params$requestId)

              out$l <- append(out$l, body)
            }

          }
        }

        Fetch$continueRequest(requestId = params$requestId)
      })
    } %...>% {
      Page$navigate(url)
    } %>% wait(wait_for)
  })

  out$l
}

This does the trick but is not very efficient (20s is a lot but you need a good margin). Is there a way to resolve the promise instead based on a condition?
In the javascript world, I would trigger an event if the correct number of resources is catcher, but I have no idea how to do it with R promises.

Thank you

@RLesur
Copy link
Owner

RLesur commented Jul 21, 2020

Hi @bakaburg1,

We're glad you like crrri! I haven't had time to dive in your example but here is what I would do to create an async counter with R promises:

# we can use a closure to define an async counter:
async_counter <- function(resolve_at = 20) {
  # initialize objects
  count <- 0
  resolve_function <- NULL
  
  # build a promise and store the resolve function
  pr <- promises::promise(function(resolve, reject) {
    resolve_function <<- resolve
  })
  
  # build a function to increment the counter
  increment_counter <- function() {
    count <<- count + 1
    if (count >= resolve_at) {
      resolve_function(count)
    }
  }
  
  # return both the counter and the promise:
  list(increment_counter = increment_counter, promise = pr)
}

# demo #1 -----------------------------------------------------------------
my_counter <- async_counter(resolve_at = 1)
# check the promise:
my_counter$promise
# increment the counter
my_counter$increment_counter()
# check the promise
my_counter$promise

# demo #2 -----------------------------------------------------------------
my_counter <- async_counter(resolve_at = 1)
my_counter$promise$then(function(value) {
  cat("promise resolved, value:", value)
})
my_counter$increment_counter()

I hope it will help.

@bakaburg1
Copy link
Author

bakaburg1 commented Jul 22, 2020

Thank you! it took me a while but I was able to more or less understand the logic of what you did and integrate it in my function. My problem was to stop a promise started by the crrri client.
I'm not sure if what I did was the most efficient solution: I enclosed all the async calls in a promise which also exports a resolve function in the parent environment. Once the conditions are met, the resolve function is called, closing the open async calls.

get_website_resources <- function(url, url_filter = '*', type_filter = '*', wait_for = 15, n_of_resources = NULL) {

  crrri::perform_with_chrome(function(client) {
    Fetch <- client$Fetch
    Page <- client$Page

    client$inspect() # not needed, just to check what's going on

    out <- new.env()

    out$results <- list()
    out$resolve_function <- NULL

    # wrap everything into a promise which produces a resolution function
    out$pr <- promises::promise(function(resolve, reject) {
      out$resolve_function <- resolve

      Fetch$enable(patterns = list(list(urlPattern="*", requestStage="Response"))) %...>% {
        Fetch$requestPaused(callback = function(params) {

          if (str_detect(params$request$url, url_filter) & str_detect(params$resourceType, type_filter)) {

            Fetch$getResponseBody(requestId = params$requestId) %...>% {
              resp <- .

              if (resp$body != '') {
                if (resp$base64Encoded) resp$body = base64_dec(resp$body) %>% rawToChar()

                body <- list(list(
                  url = params$request$url,
                  response = resp
                )) %>% set_names(params$requestId)

                #str(body)

                out$results <- append(out$results, body)

                if (!is.null(n_of_resources) & length(out$results) >= n_of_resources) out$resolve_function(out$results)
              }

            }
          }

          Fetch$continueRequest(requestId = params$requestId)
        })
      } %...>% {
        Page$navigate(url)
      } %>% crrri::wait(wait_for) %>%
        then(~ out$resolve_function(out$results))

    })

    out$pr$then(function(x) x)
  }, timeouts = max(wait_for + 3, 30), cleaning_timeout = max(wait_for + 3, 30))
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants