-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add sampling options of iterative and binary search #831
base: main
Are you sure you want to change the base?
Conversation
3f9c6c3
to
8952b07
Compare
da633b0
to
2f5d8f3
Compare
Here's a toy demo based on the conditional binomial method as implemented in numpy, etc. // single-threaded
#include <iostream>
#include <random>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <chrono>
using namespace std::chrono;
int main()
{
const int nq = 25, nshot = 800000;
// const int nq=8, nshot=1000;
std::default_random_engine generator;
std::uniform_real_distribution<double> uniform(0.0, 1.0);
double *probs = new double[1 << nq], totalprob = 0.0;
int samples[nshot];
// Generate (unnormalized) nq-qubit probability distr. Don't include this in timing
for (int i = 0; i < 1 << nq; i++)
totalprob += (probs[i] = uniform(generator));
std::binomial_distribution<int> binom;
gsl_rng *gslgen = gsl_rng_alloc(gsl_rng_taus);
int s, offset = 0, r = nshot;
auto start = high_resolution_clock::now();
// Take nshot of samples from the above distribution, by conditional-binomial method:
for (int j = 0; j < (1 << nq) - 1; j++)
{
// s = binom(generator, std::binomial_distribution<int>::param_type(r, probs[j]/totalprob));
s = gsl_ran_binomial(gslgen, probs[j] / totalprob, r);
r -= s;
for (int k = 0; k < s; k++)
samples[offset++] = j;
if (!r)
break;
totalprob -= probs[j];
}
for (int k = 0; k < r; k++)
samples[offset++] = (1 << nq) - 1;
auto stop = high_resolution_clock::now();
auto duration = duration_cast<milliseconds>(stop - start);
std::cout << duration.count() << std::endl;
return 0;
} For a parallel version you can divide the wavefn into roughly equal bins and then do a single-threaded multinomial sample to distribute samples between the bins before using OMP to sample within the bins. On my laptop the timing seems to scale well with the number of threads and is limited by the random number generation: // multi-threaded
#include <iostream>
#include <random>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <omp.h>
#include <chrono>
using namespace std::chrono;
int main()
{
const int nq = 25, nshot = 800000, nthread = 8;
//const int nq = 8, nshot = 10000, nthread = 6;
std::default_random_engine generator;
std::uniform_real_distribution<double> uniform(0.0, 1.0);
double *probs = new double[1 << nq], totalprob = 0.0, partialtotal[nthread];
int samples[nshot];
// Generate (unnormalized) nq-qubit probability distr, summing partial accumulated probability into nthread ~equal bins
// Don't include this in timing (I assume that the accumulating partial probability was performed during last sweep over the wavefn before the measurement)
int js[nthread + 1];
js[0] = 0;
for (int i = 0, t = 0; t < nthread; t++)
{
partialtotal[t] = 0.0;
js[t + 1] = (t + 1) * (1 << nq) / nthread;
for (; i < js[t + 1]; i++)
partialtotal[t] += (probs[i] = uniform(generator));
totalprob += partialtotal[t];
}
gsl_rng *gslgen = gsl_rng_alloc(gsl_rng_taus);
// std::binomial_distribution<int> binom;
auto start = high_resolution_clock::now();
// Do a single-threaded multinomial to divide the shots among the bins
int r = nshot, offsets[nthread + 1];
offsets[0] = 0;
for (int t = 0; t < nthread - 1; t++)
{
int nsucc = gsl_ran_binomial(gslgen, partialtotal[t] / totalprob, r);
offsets[t + 1] = offsets[t] + nsucc;
r -= nsucc;
totalprob -= partialtotal[t];
}
offsets[nthread] = nshot;
// Do a parallel set of multinomials to sample from each of the bins in parallel. This should be the rate-limiting step
#pragma omp parallel num_threads(nthread)
{
int t = omp_get_thread_num();
int off = offsets[t], r = offsets[t + 1] - offsets[t], jn = js[t + 1], s;
gsl_rng *gslgen1 = gsl_rng_alloc(gsl_rng_taus);
double partial = partialtotal[t];
for (int j = js[t]; j < jn - 1; j++)
{
// s = binom(generator, std::binomial_distribution<int>::param_type(r, probs[j]/totalprob));
s = gsl_ran_binomial(gslgen1, probs[j] / partial, r);
r -= s;
for (int k = 0; k < s; k++)
samples[off++] = j;
if (!r)
break;
partial -= probs[j];
}
for (int k = 0; k < r; k++)
samples[off++] = jn - 1;
}
auto stop = high_resolution_clock::now();
auto duration = duration_cast<milliseconds>(stop - start);
std::cout << duration.count() << std::endl;
/* for(int i=0; i<nshot; i++){
std::cout << samples[i] << std::endl;
} */
return 0;
} |
044019b
to
986a37f
Compare
Summary
Optimization of Sampling in QubitVector
Details and comments
In original implementation, a loop for sampling is iterated based on sample count.
In this optimized implementation, a loop for sampling is iterated based on indices.
In OpenMP, threads execute the same iterations if loop conditions are the same in most case.
By using the same conditions with index construction and sampling, memory access is optimized if sampled values are randomly allocated in a qubitvector.