BART predictions return an array of 0.5 when using parallelization #71

bakaburg1 · 2025-02-04T16:05:35Z

Hi!

When using `mirai_map` for parallel prediction with BART models, I get back an array with the expected dimensions, but filled only with 0.5 (sometimes numeric, sometimes even character!) instead of the expected posterior draws, e.g.:

    chr [1:1000, 1:1000] "0.5" "0.5" "0.5" "0.5" "0.5" "0.5" "0.5" "0.5" "0.5" "0.5" "0.5" "0.5" "0.5" ...

The same code works correctly when run sequentially, and it fails in the same way with other parallelization backends, such as future + `furrr::future_map`.

Reprex:

    set.seed(123)
    n <- 1000
    X <- matrix(rnorm(n * 5), ncol = 5)
    colnames(X) <- paste0("X", 1:5)
    y <- (X[,1] > 0) * 1  # Binary outcome
    
    # Fit BART model, keeping the trees so that predict() can be used later
    model <- dbarts::bart(X, y, verbose = FALSE, keeptrees = TRUE)
    
    predict_parallel <- function(model, newdata, batch_size = 500) {
        n_batches <- ceiling(nrow(newdata) / batch_size)
        batches <- round(quantile(1:nrow(newdata), seq(0, 1, length.out = n_batches + 1)))
        sequence <- seq_along(batches)[-length(batches)]
        
        with(mirai::daemons(2), {
            draws <- mirai::mirai_map(sequence, \(i) {
                start <- batches[i]
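                # note: the "- 1" also drops the last row of newdata,
                # which is why 10 input rows give 9 columns below
                stop <- batches[i + 1] - 1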
                idx <- start:stop
                
                predict(model, newdata = newdata[idx, ], type = "ev")
            },
            model = model,
            newdata = newdata
            )[.progress] |>
                do.call(what = "cbind")
        })
        return(draws)
    }
    
    predict_sequential <- function(model, newdata, batch_size = 500) {
        n_batches <- ceiling(nrow(newdata) / batch_size)
        batches <- round(quantile(1:nrow(newdata), seq(0, 1, length.out = n_batches + 1)))
        sequence <- seq_along(batches)[-length(batches)]
        
        draws <- purrr::map(sequence, \(i) {
            start <- batches[i]
            # note: same indexing as above, so the last row is dropped here too
            stop <- batches[i + 1] - 1
            idx <- start:stop
            
            predict(model, newdata = newdata[idx, ], type = "ev")
        }, .progress = TRUE) |>
            do.call(what = "cbind")
        
        return(draws)
    }
    
    str(predict_parallel(model, X[1:10,]))
    str(predict_sequential(model, X[1:10,]))

Which returns

    num [1:1000, 1:9] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

when run in parallel, and

    num [1:1000, 1:9] 0.00104 0.0018 0.00163 0.00252 0.00509 ...

when run sequentially.
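One possible explanation, which is an assumption and not confirmed here: with `keeptrees = TRUE` the fitted trees may live only inside dbarts' underlying C++ sampler and not be included when the model object is serialized to send it to a worker. A binary (probit) fit with no trees would presumably predict pnorm(0) = 0.5 everywhere, which would match the output above. The sketch below checks this in a single session, with no parallel backend, by round-tripping the model through `serialize()`/`unserialize()`. The `invisible(model$fit$state)` step follows dbarts' documented advice for `save()`/`load()` of fits with `keeptrees = TRUE`; that it also covers serialization to workers is an assumption. Variable names like `p_plain` and `p_touched` are illustrative only.

```r
# Sketch: mimic what happens when the model is sent to a mirai daemon or a
# future worker by round-tripping it through serialize()/unserialize().
set.seed(123)
n <- 1000
X <- matrix(rnorm(n * 5), ncol = 5)
colnames(X) <- paste0("X", 1:5)
y <- (X[, 1] > 0) * 1  # binary outcome

model <- dbarts::bart(X, y, verbose = FALSE, keeptrees = TRUE)
p_orig <- predict(model, newdata = X[1:10, ], type = "ev")

# Plain copy: if the trees only live in the C++ sampler, they are lost here.
# try() because the result of predicting from this copy is not verified.
model_plain <- unserialize(serialize(model, NULL))
p_plain <- try(predict(model_plain, newdata = X[1:10, ], type = "ev"), silent = TRUE)

# Assumption: "touching" the sampler state, as dbarts documents for
# save()/load(), stores the trees in the R object before serialization
invisible(model$fit$state)
model_touched <- unserialize(serialize(model, NULL))
p_touched <- predict(model_touched, newdata = X[1:10, ], type = "ev")

# If the hypothesis holds, p_plain is constant and p_touched matches p_orig
str(p_plain)
all.equal(p_orig, p_touched)
```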

Environment Information

- R version: 4.4.2
- dbarts version: 0.9-30
- mirai version: 2.0.1
- Platform: macOS 15.3

By the way, is there a way to use parallel computing when predicting, or any other way to speed up prediction in general?
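A minimal sketch of parallel prediction with mirai, assuming the same state fix as above makes the trees survive serialization (not confirmed by dbarts for this use). It splits the rows into one contiguous chunk per daemon so that every row is covered, and uses `[.stop]` so that a worker error stops collection instead of being bound into the result. `predict_parallel_fixed` and `n_workers` are hypothetical names; this assumes mirai 2.0 or later and dbarts installed where the daemons run. Whether it is faster than one `predict()` call depends on model size, since every daemon receives a full copy of the model.

```r
library(mirai)

# Hypothetical helper: parallel prediction over row chunks
predict_parallel_fixed <- function(model, newdata, n_workers = 2) {
  # Assumption: forces the sampler state (including trees) into the R object
  invisible(model$fit$state)

  # One contiguous chunk of row indices per worker; together they cover all rows
  rows <- seq_len(nrow(newdata))
  chunks <- unname(split(rows, cut(rows, n_workers, labels = FALSE)))

  daemons(n_workers)
  on.exit(daemons(0))

  draws <- mirai_map(chunks, function(idx) {
    # Make sure dbarts' predict() method is registered on the worker
    requireNamespace("dbarts", quietly = TRUE)
    predict(model, newdata = newdata[idx, , drop = FALSE], type = "ev")
  }, model = model, newdata = newdata)[.stop]

  # Each element is a draws-by-rows matrix; bind them back in row order
  do.call(cbind, draws)
}

# Usage with the same simulated data as the reprex
set.seed(123)
n <- 1000
X <- matrix(rnorm(n * 5), ncol = 5)
colnames(X) <- paste0("X", 1:5)
y <- (X[, 1] > 0) * 1
model <- dbarts::bart(X, y, verbose = FALSE, keeptrees = TRUE)

p_par <- predict_parallel_fixed(model, X[1:10, ])
p_seq <- predict(model, newdata = X[1:10, ], type = "ev")
str(p_par)
```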
