Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mul!() of LazyProduct result in high alloc in integration #333

Open
Lightup1 opened this issue Apr 30, 2022 · 10 comments
Open

mul!() of LazyProduct result in high alloc in integration #333

Lightup1 opened this issue Apr 30, 2022 · 10 comments

Comments

@Lightup1
Copy link

function mul!(result::Ket{B1},a::LazyProduct{B1,B2},b::Ket{B2},alpha,beta) where {B1,B2}
    tmp1 = Ket(a.operators[end].basis_l)
    mul!(tmp1,a.operators[end],b,a.factor,0)
    for i=length(a.operators)-1:-1:2
        tmp2 = Ket(a.operators[i].basis_l)
        mul!(tmp2,a.operators[i],tmp1)
        tmp1 = tmp2
    end
    mul!(result,a.operators[1],tmp1,alpha,beta)
    return result
end

tmp1 and tmp2 can cause high alloc when the integration tspan is large.

@Lightup1 Lightup1 changed the title mul! of LazyProduct result high alloc in integration mul! of LazyProduct result in high alloc in integration Apr 30, 2022
@Lightup1
Copy link
Author

Speculate that adding fields of Ket, Bra, and combinedoperater may resolve the use of tmp, but it may make LazyProduct a little bit fat.

@Lightup1 Lightup1 changed the title mul! of LazyProduct result in high alloc in integration mul!() of LazyProduct result in high alloc in integration Apr 30, 2022
@Lightup1 Lightup1 changed the title mul!() of LazyProduct result in high alloc in integration mul!() of LazyProduct result in high alloc in integration Apr 30, 2022
@Lightup1
Copy link
Author

As a test add KTL and BTL

mutable struct LazyProduct{BL,BR,F,T,KTL,BTR} <: AbstractOperator{BL,BR}
    basis_l::BL
    basis_r::BR
    factor::F
    operators::T
    ket_l::KTL
    bra_r::BTR
    function LazyProduct{BL,BR,F,T,KTL,BTR}(operators::T, factor::F=1) where {BL,BR,F,T,KTL,BTR}
        for i = 2:length(operators)
            check_multiplicable(operators[i-1], operators[i])
        end
        ket_l=[Ket(operator.basis_l) for operator in operators]
        bra_r=[Bra(operator.basis_r) for operator in operators]
        new(operators[1].basis_l, operators[end].basis_r, factor, operators,ket_l,bra_r)
    end
end
function LazyProduct(operators::T, factor::F=1) where {T,F}
    BL = typeof(operators[1].basis_l)
    BR = typeof(operators[end].basis_r)
    KTL = typeof([Ket(operator.basis_l) for operator in operators])
    BTR = typeof([Bra(operator.basis_r) for operator in operators])
    LazyProduct{BL,BR,F,T,KTL,BTR}(operators, factor)
end
function mul!(result::Ket{B1},a::LazyProduct{B1,B2},b::Ket{B2},alpha,beta) where {B1,B2}
    mul!(a.ket_l[end],a.operators[end],b,a.factor,0)
    for i=length(a.operators)-1:-1:2
        mul!(a.ket_l[i],a.operators[i],a.ket_l[i+1])
        # a.ket_l[i+1]=Ket(a.operators[i].basis_l)
    end
    mul!(result,a.operators[1],a.ket_l[2],alpha,beta)
    # a.ket_l[2]=Ket(a.operators[2].basis_l)
    return result
end

@benchmark QuantumOptics.mul!(dpsi,H_kin,Ψ0)

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.740 μs …  4.260 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.780 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.795 μs ± 91.743 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▂ ▆█▆ ▃▂ ▂                                               ▂
  ▇██▁███▁██▁██▅▁▅▅▅▁▃▄▁▃▅▄▁▅▄▃▁▃▅▁▆▇█▁█▇▁▇▆▆▁▆▅▅▁▅▄▁▄▄▅▁▄▄▄ █
  1.74 μs      Histogram: log(frequency) by time     2.16 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Now the alloc is irrelevant with the integration time
@benchmark tout, Ψt = timeevolution.schroedinger($T, $Ψ0, $H)

BenchmarkTools.Trial: 272 samples with 1 evaluation.
 Range (min … max):  17.615 ms … 37.316 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.155 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   18.400 ms ±  1.428 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▃▅▃▅▇██▅▂ 
  ▆██████████▆▆▆▁▇▇▁▆▆▄▁▁▇█▄▁▄▁▆▁▁▁▁▁▁▁▁▁▆▄▄▁▁▁▁▁▁▁▁▁▄▁▁▁▁▄▄▄ ▆
  17.6 ms      Histogram: log(frequency) by time      21.9 ms <

 Memory estimate: 38.00 KiB, allocs estimate: 70.

@david-pl
Copy link
Member

Yes, that mul! can be optimized. Caching the Ket and Bra like you do is a good idea in principle.

One issue here is that the data type of the Ket you multiply with is not conserved. What I mean is, that e.g. Ket(operator.basis_l) in your ket_l does not specifiy the data type of the data of the Ket, so Vector{ComplexF64} is assumed. This means, however, if you try to work with, e.g. Vector{ComplexF32}, the data type will not be conserved when multiplying such a Ket with a LazyProduct.

This issue is actually a bit tricky as it means you need to take into account the type of result from the mul! method for the cache of LazyProduct. So you cannot fully know this when constructing the LazyProduct. You could fill the cache in mul! directly, since you'd have all the info. Then you just need a fast way of checking if the cache has already been filled for the current data types involved in the mul! operation and don't reconstruct it if it's already valid.

That said, the current version of the code (on master) doesn't respect the data type either. So what you're doing is definitely an improvement, and we could leave the data type issue for another time. Would you like to open a PR with the changes you suggest above?

@Lightup1
Copy link
Author

OK, I’ll open a PR. But I’m new in programming, may take some time to know how to use GitHub PR.

@Lightup1
Copy link
Author

> git push -u origin EffientLazyProduct_mul
remote: Permission to qojulia/QuantumOptics.jl.git denied to Lightup1.
fatal: unable to access 'https://github.com/qojulia/QuantumOptics.jl.git/': The requested URL returned error: 403

After setup for a whole night, I find out that I have no permission.😂😂

@david-pl
Copy link
Member

@Lightup1 that's not how you submit a PR... you need to fork the repository. See for example this guide: https://code.tutsplus.com/tutorials/how-to-collaborate-on-github--net-34267

@Lightup1
Copy link
Author

Oh, thank you. Got it!😅

@amilsted
Copy link
Collaborator

amilsted commented May 3, 2022

An alternative is to use caching to handle the tmp vectors, like in this PR. We could use the same cache in LazyProduct and LazyTensor (which is also a kind of lazy product operator).

I thought about using the approach described here, but it does seem a bit much to put all possible intermediate types (those for kets, bras, and operators!) in the operator struct.

A third way is to put a cache in the operator struct. That way only needed arrays get created and the memory is freed when the operator is garbage-collected.

@Lightup1
Copy link
Author

Lightup1 commented May 5, 2022

Thanks for your suggestions.

A third way is to put a cache in the operator struct. That way only needed arrays get created and the memory is freed when the operator is garbage-collected.

I wonder how we define the type of cache since it can be Ket or Operater with different basis.

@amilsted
Copy link
Collaborator

amilsted commented May 5, 2022

Yes, the typing issue is a little tricky, also because element types can vary, as @david-pl points out above. It may not be too terrible to use abstract types, such as an appropriate union, or even Any, as is used in the global cache of the PR I linked. There doesn't seem to be a significant performance hit in cases I have tested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants