Add new lightweight backend with performance comparisons #57

Merged · 12 commits · Feb 25, 2021

Conversation

Contributor

@ThomasLoke ThomasLoke commented Feb 4, 2021

This PR adds a new backend and some rudimentary performance comparisons between the new and old backends. If we elect to adopt the new backend, then we may just want to replace the old one altogether.

The old backend represents state vectors as a multi-dimensional tensor in Eigen and applies gates in the form of tensor contractions, which really limits the optimisations that can be performed. The long-term goal with the new backend is to write things in such a way that offers more flexibility when it comes to optimisation choices. For now, it implements gate operations as tensor contractions, so it does much the same work as the Eigen backend, but in future, we can write gate operations in ways that don't require the full matrix representation of a gate.

There are four commits in this PR:

  1. Tone down warning levels: Compiling with -Wall under MSVC floods the compilation with warning messages from the Eigen backend, so I toned down the warning levels for my own sanity.
  2. Black reformatting: I integrated PyCharm with the black formatter, and it automatically reformats these files every single time I touch them, so I separated these out.
  3. New backend + benchmark notebook: The main change, which adds in a separate backend and a notebook that plots the average execution time of both.
  4. Add compilation for Eigen backend back in again: I turned off compilation of the Eigen backend during development because it literally takes minutes to compile; the new backend has no external dependencies (yet), so compilation is almost instantaneous.

Performance comparisons on my machine show roughly a 20-30x speedup relative to the old backend. For full disclosure, I doubt it's from the backend itself--during development of the new backend, I found that the pybind11 bindings were the largest factor responsible for slowness, because they looked like they were returning a plain Python list instead of a numpy array, which then had to be converted into a numpy array. It's possible the old backend has a similar problem, although I'm not that familiar with how pybind11 handles conversion between numpy arrays and Eigen vectors; if so, then perhaps a similar speedup could be achieved with the old backend just by reworking the pybind11 bindings. Even if that's the case, I still don't think it's of much long-term value, simply because the representation using Eigen tensors inhibits future optimisation choices (there's a lot of low-hanging optimisation fruit for the new backend that remains to be picked).

On a much more minor note, the new backend also doesn't need template metaprogramming to support arbitrary-rank tensors (although in theory it's hard-limited to 64 qubits, since indices only go up to unsigned 64-bit integers), which is a plus in my book :)

One caveat is that I haven't tested the results of the new backend extensively (beyond timing and inspecting a few state vectors by eye), so help validating the results would be appreciated.


codecov bot commented Feb 4, 2021

Codecov Report

Merging #57 (92cb98c) into master (2ceb93b) will decrease coverage by 37.19%.
The diff coverage is 30.76%.


@@             Coverage Diff              @@
##            master      #57       +/-   ##
============================================
- Coverage   100.00%   62.80%   -37.20%     
============================================
  Files            3        5        +2     
  Lines           56      121       +65     
============================================
+ Hits            56       76       +20     
- Misses           0       45       +45     
Impacted Files                                Coverage   Δ
pennylane_lightning/lightning_benchmark.py      0.00%    <0.00%>    (ø)
pennylane_lightning/lightning_qubit_new.py     44.18%    <44.18%>   (ø)
pennylane_lightning/__init__.py               100.00%    <100.00%>  (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

co9olguy commented Feb 4, 2021

Thanks for the PR @ThomasLoke!

@chaserileyroberts chaserileyroberts self-requested a review February 16, 2021 22:25
)

if operations:
# XXX: Do we need a copy here? Not sure of the required copy semantics


you do not need a copy here.


Actually, I take that back; I'm not sure. @antalszava do we need to use np.copy() here?

Contributor

@antalszava antalszava Feb 20, 2021

Yes, checked on a toy example, we'll need copying here. This will also be double-checked when we have tests in place (tests will catch these sorts of cases).

Edit: on second thought, although self._state is indeed mutated by self.apply_lightning, it will be updated anyway when considering rotations in the next if. So we could get away with not copying here. Best to do some testing to consider some edge cases. 🙂

pennylane_lightning/src/rework/StateVector.cpp (outdated, resolved)
pennylane_lightning/src/rework/StateVector.hpp (outdated, resolved)
Comment on lines +78 to +79
"msvc": ["-EHsc", "-O2", "-W1", "-std:c++11"],
"unix": ["-O3", "-W", "-fPIC", "-shared", "-fopenmp"],


Why do you reduce the warning level?

Contributor Author

See remark above:

Tone down warning levels: Compiling with -Wall under MSVC floods the compilation with warning messages from the Eigen backend, so I toned down the warning levels for my own sanity.

if isinstance(operation, (QubitStateVector, BasisState)):
if i == 0:
self._apply_operation(operation)
del operations[0]


why the del here?

Contributor Author

This is just copy-pasta from the original lightning_qubit.py--well, at least, it was before this commit. My branch isn't that fresh anymore.

Contributor

That commit will be required here too because the API changed in PennyLane (_apply_operation changed).

pennylane_lightning/src/rework/Gates.cpp (outdated, resolved)
Comment on lines +49 to +56
private:
    static const std::vector<CplxType> matrix;
public:
    static const std::string label;
    static XGate create(const std::vector<double>& parameters);
    inline const std::vector<CplxType>& asMatrix() {
        return matrix;
    }


A lot of this code is repeated. Can we move matrix, label and asMatrix() to the AbstractGate interface?

Contributor Author

No, they're static class variables, so they need to be defined in the subclass.

Contributor Author

Note also that asMatrix() already exists in the superclass as a virtual method, so it is already declared there.
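For readers following along, here is a minimal sketch of the pattern being discussed. The AbstractGate declaration is not shown in this diff, so its shape below (and the "PauliX" label value) is assumed; only the subclass members mirror the snippet above.

#include <complex>
#include <string>
#include <vector>

using CplxType = std::complex<double>;

class AbstractGate {
public:
    virtual ~AbstractGate() = default;
    // The accessor is declared once in the interface...
    virtual const std::vector<CplxType>& asMatrix() = 0;
};

class XGate : public AbstractGate {
private:
    // ...but each concrete gate still needs its own static storage, which is
    // why matrix and label are declared (and defined) per subclass.
    static const std::vector<CplxType> matrix;
public:
    static const std::string label;
    const std::vector<CplxType>& asMatrix() override { return matrix; }
};

// Out-of-line definitions of the static members:
const std::vector<CplxType> XGate::matrix{0, 1, 1, 0};
const std::string XGate::label = "PauliX";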

pennylane_lightning/lightning_qubit_new.py (resolved)
pennylane_lightning/src/rework/Apply.cpp (outdated, resolved)
Comment on lines 25 to 31
if (Pennylane::XGate::label == label) {
    gate = std::make_unique<Pennylane::XGate>(Pennylane::XGate::create(parameters));
}
else if (Pennylane::YGate::label == label) {
    gate = std::make_unique<Pennylane::YGate>(Pennylane::YGate::create(parameters));
}
else if (Pennylane::ZGate::label == label) {


Should we use a switch case instead?


Also you could just do return ... instead of gate = ....

Contributor Author

Switch cases don't work for strings.

Member

Switch cases don't work for strings.

Huh. I didn't know that.

@@ -16,6 +16,7 @@
"""


It looks like this entire file is just formatting changes. Can we revert them to make this PR smaller?

Contributor Author

A lot of this PR does involve formatting changes--see remark 2 above:

Black reformatting: I integrated PyCharm with the black formatter, and it automatically reformats these files every single time I touch them, so I separated these out.

Member

@ThomasLoke would you be able to git checkout master <filename> for the files that have only formatting changes, and commit+push? It will help out with the code review side.

At the same time, we should make a separate PR to ensure all tests etc. are blacked, to reduce large formatting diffs in future

Contributor Author

I'd have hoped that separating it out into a separate commit would have helped with the code review, given that you can review commit-by-commit?

Contributor Author

To be clear, I expect PRs to consist of a sequence of well-defined commits that group changes into logical chunks. If you're trying to review the PR based on the overall diff, then yes, I'd expect you'd run into problems.

Member

Yes, our team usually groups changes into logical chunks by PR rather than by commit, especially as some comments/suggestions/changes that come up during review might affect subsequent updates.

Contributor

I'm good to keep these changes; since the only changes were formatting, we don't need to review them.

Contributor

@antalszava antalszava left a comment

Hi @ThomasLoke, this is a really great approach, thank you so much for this! 🥇 💯

I've left some comments, mostly for my own understanding. I haven't delved into all the details; I mostly started zooming in on some parts from a higher level.

Also checked the benchmarks and they look really promising indeed when we have >=10 qubit systems! Below that, the new backend is usually comparable or slightly better than the old one.


I'd have hoped that separating it out into a separate commit would have helped with the code review, given that you can review commit-by-commit?

True, with the heads-up it's for sure easier reviewing now, thanks for the note! 🙂

Separating the formatting changes would, however, also help when looking at the PR later in the future. When tests are included, this addition would grow even more (it might easily be that it will be worth splitting it into smaller pieces at that time). Retrospectively looking into this PR might become cumbersome.


return StateVector((CplxType*)numpyArrayInfo.ptr, numpyArrayInfo.shape[0]);
}

Pennylane::StateVector::StateVector(CplxType* const arr, const size_t length)
Contributor

Just for my own understanding: any reason we need arr to be a constant pointer?

Contributor Author

Though the class field is declared as CplxType* const arr intentionally, it's certainly not required for the function parameter. I'll drop it.


if (numpyArrayInfo.ndim != 1)
throw std::invalid_argument("NumPy array must be a 1-dimensional array");
if (numpyArrayInfo.itemsize != sizeof(CplxType))
Contributor

Just wondering: isn't the output of sizeof(CplxType) platform-dependent?

Contributor Author

AFAIK, double "should" be an IEEE-754 64-bit floating point type (though it's not explicitly guaranteed by the C++ standard). And struct padding shouldn't make a difference, whether it's a 32-bit or 64-bit system. So all-in-all, I think it's unlikely to be platform-dependent.

Contributor

double "should" be an IEEE-754 64-bit floating point type (though it's not explicitly guaranteed by the C++ standard)

Yeah, was curious because of that. 🤔 Agreed, it would more so be implementation dependent. Would it be worth hardcoding the value here?

Contributor Author

Hardcoding the value would be inadvisable--at the end of the day, itemsize must conform to the size of whatever complex type is in use, otherwise the internal numpy array will be misinterpreted when casting to CplxType*.

AFAIK, there aren't any standard fixed-width floating-point representations (unlike integers). However, there does exist a way of checking that the floating-point representation is IEEE-754 compliant (e.g. this).

At the end of the day, however, I'm not sure it's wise to complicate the implementation by trying to support systems in which double isn't IEEE-754 64-bit floating-point compliant. But I'll leave that up to you :)
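For reference, a minimal sketch of the kind of compile-time check alluded to above, assuming CplxType is std::complex<double> (illustrative only, not code from the PR):

#include <climits>
#include <complex>
#include <limits>

using CplxType = std::complex<double>;

// Fail the build on platforms where double is not IEEE-754 binary64, or where
// the complex type is not laid out as two adjacent doubles.
static_assert(std::numeric_limits<double>::is_iec559,
              "double must be an IEEE-754 (IEC 559) floating-point type");
static_assert(sizeof(double) * CHAR_BIT == 64,
              "double must be 64 bits wide");
static_assert(sizeof(CplxType) == 2 * sizeof(double),
              "CplxType must be exactly two doubles wide");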

) {
unique_ptr<AbstractGate> gate = constructGate(opLabel, opParams);
const vector<CplxType>& matrix = gate->asMatrix();
assert(matrix.size() == exp2(opWires.size()) * exp2(opWires.size()));
Contributor

How come we have such assert statements? These would be unnecessary to have once the code is tested, correct?

Contributor Author

@ThomasLoke ThomasLoke Feb 22, 2021

To me, I don't see the two (asserts and unit tests) as being dichotomous (i.e. why not both?). Asserts are helpful when it comes to making internal consistency checks. Unit tests are helpful when you want to validate the functionality of the application.

In this particular case, we should at least throw if the number of qubits on which the gate is defined doesn't match opWires.size(); I'm somewhat ambivalent about removing the assert.

Contributor

In this particular case, we should at least throw if the number of qubits on which the gate is defined doesn't match opWires.size()

Makes sense! The thing with assert statements is that if they fail, then the user sees an assert error without a clear error message of what was wrong with their inputs. So in that sense throwing might be preferred here.
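To make that concrete, a sketch of the kind of check and message being suggested; the numQubits accessor and the exact wording are assumptions, not code from this PR:

// Inside the existing apply routine, after constructing the gate:
unique_ptr<AbstractGate> gate = constructGate(opLabel, opParams);
if (gate->numQubits != static_cast<int>(opWires.size())) {
    throw std::invalid_argument(opLabel + " expects " + std::to_string(gate->numQubits) +
                                " wires, but " + std::to_string(opWires.size()) + " were supplied");
}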

}
],
"source": [
"from cpuinfo import get_cpu_info\n",
Contributor

Requires pip install py-cpuinfo

Contributor

Maybe worth adding a note that executing this cell requires an external library to be installed.

op_param = [o.parameters for o in operations]

state_vector = np.ravel(state)
assert state_vector.flags["C_CONTIGUOUS"]
Contributor

This should rather be unit tested.

Suggested change
assert state_vector.flags["C_CONTIGUOUS"]

Contributor Author

Two things:

  1. For this in particular, we probably don't even need to test it since it should be guaranteed by the semantics of numpy.ravel(). We can just remove the assert without losing sleep over it.
  2. On tests in general, I think it's rather moot as long as the current unit test suites only run for the old backend. If we replace the old one altogether, then that simplifies the situation, and we can just add new ones to the current suites. If we don't, then we'll need to figure something else out.

Contributor

I'm up for keeping this assert, since before we were doing some strange order="F", and it would be good to catch this in case there are any remaining edge cases (which I doubt).


using std::unique_ptr;
using std::vector;

vector<unsigned int> Pennylane::getIndicesExcluding(vector<unsigned int>& excludedIndices, const unsigned int qubits) {
Contributor

Out of curiosity: is it worth representing indices with unsigned int? My understanding is that the general take is to avoid using them even if it's almost certain that no arithmetic operations are being done using them. Though haven't really come across any weird cases.

Contributor Author

@ThomasLoke ThomasLoke Feb 22, 2021

Probably not. It's true, unsigned indices can be error-prone. I like unsigned indices because of the conceptual correspondence (i.e. valid indices are always unsigned) and because STL containers' size() methods typically return unsigned types (so using unsigned types avoids the compiler warnings about mixing signed and unsigned types). But I can be easily persuaded to ditch it.

for (unsigned int i = 0; i < qubits; i++) {
    indices.insert(indices.end(), i);
}
for (const unsigned int& excludedIndex : excludedIndices) {
Contributor

Any reason for using a const ref for excludedIndex here?

Contributor Author

const to prevent modification of the vector elements (the intent is read-only), and a ref to avoid the implicit copy (which compilers may auto-optimise away anyway).

using std::unique_ptr;
using std::vector;

// FIXME: This should be reworked to use a function dispatch table
Contributor

function dispatch table

Would that be something like std::map<std::string, func> where func is the function that returns the gate of the operator? (Something like what is there for the current ops).

Contributor Author

Yeah, that would be the idea. I just never got around to polishing it to do that.
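For illustration, the kind of table being floated here might look like the sketch below; names are illustrative, and the version eventually added to the PR (visible in the later diff hunks) wraps the entries in an addToDispatchTable helper instead.

#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Assumes Pennylane::AbstractGate, XGate, etc. from the PR sources.
using GateFactory =
    std::function<std::unique_ptr<Pennylane::AbstractGate>(const std::vector<double>&)>;

static const std::map<std::string, GateFactory> dispatchTable = {
    {Pennylane::XGate::label,
     [](const std::vector<double>& p) {
         return std::make_unique<Pennylane::XGate>(Pennylane::XGate::create(p));
     }},
    // ... one entry per supported gate label ...
};

// constructGate then reduces to a lookup plus a call:
//     auto it = dispatchTable.find(label);
//     if (it == dispatchTable.end())
//         throw std::invalid_argument(label + " is not a supported gate type");
//     return it->second(parameters);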

Contributor Author

@ThomasLoke ThomasLoke left a comment

Thanks for the reviews!

Separating the formatting changes would, however, also help when looking at the PR later in the future. When tests are included, this addition would grow even more (it might easily be that it will be worth splitting it into smaller pieces at that time). Retrospectively looking into this PR might become cumbersome.

I guess I'm not familiar enough with your workflow to have a strong opinion on it; I typically don't find it any more cumbersome if refactoring changes are landed as a separate commit (rather than squashing every single commit altogether), and if I just view the PR commit-by-commit (as it was intended to be reviewed). But a simple git revert of the second commit is trivial enough to add if that's what we need.

result = method()
endTime = time.time()
durations.append(endTime - startTime)
gc.collect()
Contributor Author

I'm less familiar with the garbage collector for Python than I am with Java, but I added this in on the off-chance that reclamation of resources triggered a stop-the-world GC. Probably won't happen unless it was really starved for memory though.


return vector<unsigned int>(indices.begin(), indices.end());
}

vector<size_t> Pennylane::generateBitPatterns(vector<unsigned int>& qubitIndices, const unsigned int qubits) {
Contributor

Suggested change
vector<size_t> Pennylane::generateBitPatterns(vector<unsigned int>& qubitIndices, const unsigned int qubits) {
vector<size_t> Pennylane::generateBitPatterns(const vector<unsigned int>& qubitIndices, const unsigned int qubits) {

* @param qubits number of qubits
* @return decimal value corresponding to all possible bit patterns for the given indices
*/
std::vector<size_t> generateBitPatterns(std::vector<unsigned int>& qubitIndices, const unsigned int qubits);
Contributor

Suggested change
std::vector<size_t> generateBitPatterns(std::vector<unsigned int>& qubitIndices, const unsigned int qubits);
std::vector<size_t> generateBitPatterns(const std::vector<unsigned int>& qubitIndices, const unsigned int qubits);

Contributor

@antalszava antalszava left a comment

@ThomasLoke went for another round, it's looking great! Left a couple of comments & questions, but no major blockers for me.

Going forward, we were thinking of the following approach:

  1. In this PR the new lightweight backend would be added as a stand-alone device (hence the suggestions for a new short_name and the change to the setup.py). Once we've concluded the discussion, the addition will be merged in.
  2. In a separate PR, our team would add tests for the new backend (and slightly adjust logic if needed). That will also be a good time for us to run some further tests & benchmarks that we already have in place for Python-based PennyLane devices.
  3. In a third PR, the existing backend would be swapped out for the new one.

What are your thoughts about these steps? 🙂

"""

name = "Lightning Qubit PennyLane plugin"
short_name = "lightning.qubit"
Contributor

Suggested change
short_name = "lightning.qubit"
short_name = "lightning.qubit.new"

@@ -184,7 +207,9 @@ def build_extensions(self):
"packages": find_packages(where="."),
"package_data": {"pennylane_lightning": ["src/*"]},
"entry_points": {
"pennylane.plugins": ["lightning.qubit = pennylane_lightning:LightningQubit",],
"pennylane.plugins": [
"lightning.qubit = pennylane_lightning:LightningQubit",
Contributor

Suggested change
"lightning.qubit = pennylane_lightning:LightningQubit",
"lightning.qubit = pennylane_lightning:LightningQubit",
"lightning.qubit.new = pennylane_lightning:LightningQubitNew",

* @return decimal value for the qubit at specified index
*/
inline size_t decimalValueForQubit(const unsigned int qubitIndex, const unsigned int qubits) {
assert(qubitIndex < qubits);
Contributor

Suggested change
assert(qubitIndex < qubits);

Contributor

I'm potentially fine with the assert here. Looking here, isn't the idea of an assert to catch cases that should never happen?

Contributor Author

@ThomasLoke ThomasLoke Feb 25, 2021

Yeah, I generally use asserts for internal consistency checks (which signifies programming errors) rather than for validation. But I can be convinced either way.

}

/**
* Calculates the decimal value for a qubit, assuming a big-endian convention.
Contributor

decimal value for a qubit

This seems a bit confusing (perhaps just to me); it may be worth rephrasing? I'm not quite sure what the decimal value for a qubit would be.

Contributor Author

@ThomasLoke ThomasLoke Feb 23, 2021

I had given this some thought in the past, but could never arrive at a more satisfactory term. It's really just trying to compute something like 0100 -> 2^2 = 4, i.e. the decimal value you'd add if the bit was 1. Happy to take suggestions though.
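In code, the computation being described looks something like the following; this is a minimal sketch matching the signature shown earlier, with an illustrative body rather than the PR's exact implementation.

#include <cassert>
#include <cstddef>

// Decimal weight contributed by the qubit at qubitIndex under a big-endian
// convention (qubit 0 is the most significant bit). Example from above:
// qubitIndex = 1, qubits = 4  ->  bit pattern 0100  ->  2^2 = 4.
inline std::size_t decimalValueForQubit(const unsigned int qubitIndex, const unsigned int qubits) {
    assert(qubitIndex < qubits);
    return std::size_t{1} << (qubits - qubitIndex - 1);
}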

Contributor

Makes sense! How about maxDecimalForQubit or basisStateToDecimal? It seems that we need both qubits and qubitIndex to get the result in the big-endian convention (and for assert(qubitIndex < qubits)). Otherwise, if we had assumed a little-endian convention, we could go with exp2(qubitIndex). Also, qubitIndex determines the index in the qubit long system where we have a 1 (on the other registers we have 0). So basically qubitIndex specifies a computational basis state that we are converting to the decimal representation.

Contributor Author

Otherwise, if we had assumed a little-endian convention, we could go with exp2(qubitIndex)

I assumed that wasn't an option, short of having different conventions between the Python side of things (which I assumed interprets qubits in a big-endian fashion) and the C++ backend.

Contributor

I assumed that wasn't an option, short of having different conventions between the Python side of things (which I assumed interprets qubits in a big-endian fashion) and the C++ backend.

Absolutely, having it big-endian is a great approach.

Otherwise, if we had assumed a little-endian convention, we could go with exp2(qubitIndex)

This comment was just further considering what is being done within the function to get closer to a descriptive name, the current implementation is great as is.

result = method()
endTime = time.time()
durations.append(endTime - startTime)
gc.collect()
Contributor

Got it. In that case, maybe it's worth going without it. Python will systematically call the garbage collector by default. Since we don't change the garbage collection for Python (and memory should be deallocated from within C++) we can get away with no calls here. Having it could incur some time overhead.

Suggested change
gc.collect()

Comment on lines 202 to 203
std::cos(theta / 2) * std::pow(M_E, CplxType(0, (-phi - omega) / 2)), -std::sin(theta / 2) * std::pow(M_E, CplxType(0, (phi - omega) / 2)),
std::sin(theta / 2) * std::pow(M_E, CplxType(0, (-phi + omega) / 2)), std::cos(theta / 2) * std::pow(M_E, CplxType(0, (phi + omega) / 2)) }
Contributor

This initialization is a bit challenging to read, could we introduce helper members e.g., for std::cos(theta / 2) and std::sin(theta / 2)?
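A sketch of the kind of helper variables being suggested (variable names are illustrative, not the code that was merged):

// Hoist the repeated sub-expressions out of the matrix initializer.
const double c = std::cos(theta / 2);
const double s = std::sin(theta / 2);
const CplxType e1 = std::pow(M_E, CplxType(0, (-phi - omega) / 2));
const CplxType e2 = std::pow(M_E, CplxType(0, (phi - omega) / 2));
const CplxType e3 = std::pow(M_E, CplxType(0, (-phi + omega) / 2));
const CplxType e4 = std::pow(M_E, CplxType(0, (phi + omega) / 2));

// The two rows quoted above then read:
//   c * e1, -s * e2,
//   s * e3,  c * e4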

Comment on lines 327 to 328
0, 0, std::cos(theta / 2) * std::pow(M_E, CplxType(0, (-phi - omega) / 2)), -std::sin(theta / 2) * std::pow(M_E, CplxType(0, (phi - omega) / 2)),
0, 0, std::sin(theta / 2) * std::pow(M_E, CplxType(0, (-phi + omega) / 2)), std::cos(theta / 2) * std::pow(M_E, CplxType(0, (phi + omega) / 2)) }
Contributor

Same comment for introducing helper members as above.

static const std::vector<CplxType> matrix;
public:
static const std::string label;
static XGate create(const std::vector<double>& parameters);
Contributor

Should the create method be kept, it would be worth providing a comment on what its purpose is.



// limitations under the License.
#include "StateVector.hpp"

Pennylane::StateVector Pennylane::StateVector::create(const pybind11::array_t<CplxType>* numpyArray) {
Contributor

Curious about merging create into the constructor here too 🤔

Contributor Author

This was a somewhat intentional choice--we don't really need a pybind11::array_t<CplxType> in order to initialise what's really just a wrapper around a CplxType* array (especially, say, if we wanted to unit test this). Farming it out into a create method allows the validation checks to be done prior to class instantiation.
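For context, the separation being described is roughly the following; the validation checks and signatures are the ones visible in this diff, while the constructor body, member names, and the second error message are assumptions.

// Static factory: validates the incoming pybind11 array, then hands the raw
// pointer to a thin constructor that merely wraps it.
Pennylane::StateVector Pennylane::StateVector::create(const pybind11::array_t<CplxType>* numpyArray) {
    pybind11::buffer_info numpyArrayInfo = numpyArray->request();
    if (numpyArrayInfo.ndim != 1)
        throw std::invalid_argument("NumPy array must be a 1-dimensional array");
    if (numpyArrayInfo.itemsize != sizeof(CplxType))
        throw std::invalid_argument("NumPy array element size must match CplxType");
    return StateVector((CplxType*)numpyArrayInfo.ptr, numpyArrayInfo.shape[0]);
}

// Constructor: no pybind11 dependency, so it can be unit tested directly.
Pennylane::StateVector::StateVector(CplxType* const arr, const size_t length)
    : arr(arr), length(length) {}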

@ThomasLoke
Contributor Author

Going forward, we were thinking of the following approach:

Sounds good to me!

Contributor

@trbromley trbromley left a comment

Thanks @ThomasLoke, really nice addition and I'm a big fan of getting rid of Eigen and using our own approach. We'd like to merge this in ASAP, hopefully over the next few days, and we just need to decide practical details like whether we replace the old backend immediately or keep both for a time. Will keep you updated. Thanks!

durations.append(endTime - startTime)
gc.collect()

return [statistics.mean(durations), statistics.stdev(durations)]
Contributor

Nice! I'm excited by this new backend!!

Going forward, it would be interesting to use the benchmarking suite we've been working on recently:
https://github.com/PennyLaneAI/benchmark

Do we have any intuition about speedups in the low qubit setting? E.g., do we expect new and old backends to be about the same speed for <10 qubits?

Contributor Author

Do we have any intuition about speedups in the low qubit setting? E.g., do we expect new and old backends to be about the same speed for <10 qubits?

It's hard to say, given that anything less than 10 qubits was difficult to even time in isolation with a millisecond resolution, at least for single-gate operations. I think the worst-case scenario for the new backend is using 3-qubit gates, as its implementation of matrix multiplication is wholly unoptimised, whereas Eigen probably uses a specialised kernel for it. That said, we'll want to move away from using this approach anyway in favour of more optimised kernels on a per-gate basis, so I don't think this is a big deal.

"""PennyLane Lightning device.

An extension of PennyLane's built-in ``default.qubit`` device that interfaces with C++ to
perform fast linear algebra calculations
Contributor

Suggested change
perform fast linear algebra calculations
perform fast linear algebra calculations.


Use of this device requires pre-built binaries or compilation from source. Check out the
:doc:`/installation` guide for more details.

Contributor

In the other version of this file, we have a 50+ qubit warning - does this no longer hold here? On the other hand, it's unlikely we'll need to go that high anyway 😆

Contributor Author

Well, it doesn't need to generate templates at compile-time for each possible number of qubits, so it's not restricted by that. The issue is that the address space of a 64-bit machine only goes up to, well, 64 bits, so in theory 64 qubits is as high as it goes. Probably more like 62 or 63 qubits given the need for additional allocations.

supports_inverse_operations=False,
supports_analytic_computation=True,
returns_state=True,
)
Contributor

Suggested change
)
)
capabilities.pop("passthru_devices", None)

Would probably get spotted when merging, but just adding here in case


}

l_opts = {
"msvc": [],
"unix": ["-O3", "-Wall", "-fPIC", "-shared", "-fopenmp"],
"unix": ["-O3", "-W", "-fPIC", "-shared", "-fopenmp"],
}

if platform.system() == "Darwin":
Contributor

(Not needed for this PR) We can eventually remove the Eigen lines from line 124 and also the mentions in the docs 🎉



// -------------------------------------------------------------------------------------------------------------

Pennylane::AbstractGate::AbstractGate(int numQubits)
Contributor

One general comment here: it looks great as is, but there is a risk of divergence between supported operations in PL and supported operations in lightning.qubit.

Currently, we pass a list of operation names, a list of parameters, and a list of wires from the Python side to the C++ side of lightning.qubit. What about passing just a list of matrices and a list of wires? I think matrix construction is fairly efficient even on the Python side.

In this way, we don't have to worry about defining operations and their matrices on the C++ side. But I'm not sure if that will result in a performance hit?

Contributor Author

In this way, we don't have to worry about defining operations and their matrices on the C++ side. But I'm not sure if that will result in a performance hit?

The idea is that we'll want to move away from defining gates as matrices altogether in favour of more specialised kernels per-gate, so implementation-wise this all should be kept in C++.

Shouldn't the risk of divergence be minimised by having a test suite that tests the full range of supported operations?

@ThomasLoke
Contributor Author

We'd like to merge this in ASAP, hopefully over the next few days, and we just need to decide practical details like whether we replace the old backend immediately or keep both for a time.

Cool! I'll look to push the remaining review changes tonight once I get the time.

@ThomasLoke
Contributor Author

I think I've addressed the reviews--let me know if I've missed anything.

Contributor

@antalszava antalszava left a comment

@ThomasLoke thank you so much again for the amazing work, looks good to me! 🏆 💯 🙂

Had a minor suggestion. Also, thanks for the discussions, they were really valuable. :)

return dispatchTable;
}

static const map<string, function<unique_ptr<Pennylane::AbstractGate>(const vector<double>&)>> dispatchTable = createDispatchTable();
Contributor

🥳

Comment on lines 59 to 63
auto it = dispatchTable.find(label);
if (it == dispatchTable.end())
throw std::invalid_argument(label + " is not a supported gate type");

return it->second(parameters);
Contributor

(minor)

Suggested change
auto it = dispatchTable.find(label);
if (it == dispatchTable.end())
throw std::invalid_argument(label + " is not a supported gate type");
return it->second(parameters);
auto dispatchTableIterator = dispatchTable.find(label);
if (dispatchTableIterator == dispatchTable.end())
throw std::invalid_argument(label + " is not a supported gate type");
return dispatchTableIterator->second(parameters);

Comment on lines +26 to +31
template<class GateType>
static void addToDispatchTable(map<string, function<unique_ptr<Pennylane::AbstractGate>(const vector<double>&)>>& dispatchTable) {
dispatchTable.emplace(GateType::label, [](const vector<double>& parameters) { return std::make_unique<GateType>(GateType::create(parameters)); });
}

static map<string, function<unique_ptr<Pennylane::AbstractGate>(const vector<double>&)>> createDispatchTable() {
Contributor

Thanks for adding this! Looking 🔥

@ThomasLoke
Contributor Author

No worries--pleasure is all mine :)

Contributor

@trbromley trbromley left a comment

Thanks @ThomasLoke! This is a very exciting PR, approved! 💯

One remaining thing is that my understanding of the underlying approach is still not yet solid. Would you mind giving a quick summary of how it works? It'll help me to see the underlying objective outside of the C++ implementation.

Thanks again for this!

@antalszava antalszava changed the base branch from master to new_backend February 25, 2021 22:40
@antalszava
Contributor

Hi @ThomasLoke, the wheels check has started failing with a segmentation fault, although none of the changes here seem to have caused it. This might be an error with pytest.

Will merge this PR into a separate branch for further investigation before merging into master. The authorship of this contribution will be kept in the process.

@antalszava antalszava merged commit 7348978 into PennyLaneAI:new_backend Feb 25, 2021
@ThomasLoke
Contributor Author

One remaining thing is that my understanding of the underlying approach is still not yet solid. Would you mind giving a quick summary of how it works? It'll help me to see the underlying objective outside of the C++ implementation.

TBH there's not that much to it. It accomplishes the same thing as the old backend, that is, it computes the tensor contraction of a specified gate sequence with a state vector. It does this by computing the bit patterns associated with each application of the gate's matrix, e.g. if applying some two-qubit gate G to qubits 2-3 (out of 4), then the bit patterns to which we apply the matrix are 0XY0, 0XY1, 1XY0, 1XY1 (each denoting a sub-vector of length 4). Most of the apply logic just goes into computing the associated indices for the relevant bit patterns, gathering the sub-vector based on those indices, applying the matrix, and scattering the results back to the state vector.
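A conceptual sketch of that gather/apply/scatter step is below. It is illustrative only: the function and parameter names are not the PR's, and the real code computes the index sets via generateBitPatterns and leaves room for the per-gate optimisations mentioned earlier.

#include <complex>
#include <cstddef>
#include <vector>

using CplxType = std::complex<double>;

// state:           full state vector of length 2^n
// matrix:          row-major 2^k x 2^k gate matrix
// externalIndices: decimal value of each bit pattern over the non-target qubits
//                  (the fixed bits, with zeros in the target positions)
// internalOffsets: decimal value of each bit pattern over the target qubits
void applyGateSketch(std::vector<CplxType>& state,
                     const std::vector<CplxType>& matrix,
                     const std::vector<std::size_t>& externalIndices,
                     const std::vector<std::size_t>& internalOffsets) {
    const std::size_t dim = internalOffsets.size(); // 2^k
    std::vector<CplxType> gathered(dim), result(dim);
    for (std::size_t base : externalIndices) {
        // Gather the sub-vector selected by this bit pattern.
        for (std::size_t i = 0; i < dim; i++)
            gathered[i] = state[base + internalOffsets[i]];
        // Apply the gate matrix to the sub-vector.
        for (std::size_t i = 0; i < dim; i++) {
            result[i] = CplxType(0, 0);
            for (std::size_t j = 0; j < dim; j++)
                result[i] += matrix[i * dim + j] * gathered[j];
        }
        // Scatter the result back into the state vector.
        for (std::size_t i = 0; i < dim; i++)
            state[base + internalOffsets[i]] = result[i];
    }
}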

antalszava added a commit that referenced this pull request Feb 26, 2021
…ranch (#66)

* Add new lightweight backend with performance comparisons (#57)

* Tone down warning levels

* Black reformatting

* New backend + benchmark notebook

* Add compilation for Eigen backend back in again

* Fix compilation with gcc

* Refactor construction of StateVector from numpy array into a static method

* Move method into utility header

* Review changes

* More review changes + add function dispatch table

* Remove outdated comment

* Rename variable

Co-authored-by: antalszava <[email protected]>

* use old GateFactory.cpp from commit cff579f

* revert 6124db8

* use old GateFactory.cpp from commit cff579f

Co-authored-by: Thomas Loke <[email protected]>