Skip to content
This repository has been archived by the owner on Jun 14, 2023. It is now read-only.

Multiple sequence with different length #16

Open
sosuts opened this issue Feb 26, 2020 · 17 comments
Open

Multiple sequence with different length #16

sosuts opened this issue Feb 26, 2020 · 17 comments
Labels
discussion enhancement New feature or request

Comments

@sosuts
Copy link

sosuts commented Feb 26, 2020

Hi!
I'm planning to use HMMBase.jl for biological usage.
Specifically, I want to estimate a parameters (e.g. speed) of an object which has multiple states.
My data have multiple coordinate data with different length.

I am new to julia so I'm not sure if HMMBase.jl or MS_HMMBase.jl supports this kind of analysis.
If not, are there any future plans?

Thank you in advance!
I'm sorry for a very silly question.

@maxmouchet
Copy link
Owner

Hi,

Parameter estimation with multiple sequences (of potentially different lengths), is not currently supported in HMMBase. It's something that I would like to implement (in mle.jl), but I don't know when.

I'm not sure if variable length sequences are supported by MS_HMMBase.

I'll keep this issue open as a reminder to try to implement this :-)

@sosuts
Copy link
Author

sosuts commented Feb 26, 2020

Thank you for your reply :)

At least I can implement the algorithm with a specific distribution, but I don’t know how to make it work with an arbitrary distributions...

@maxmouchet
Copy link
Owner

maxmouchet commented Feb 26, 2020

In HMMBase I make use of the Distributions.jl package for the pdf (likelihood) and fit_mle (parameter estimation) methods. This way I can handle any observations distributions that implement these methods.
I only need to compute the "responsibilities" (assignments probabilities) for each observations and each states, and re-estimate the transition matrix.
See https://github.com/maxmouchet/HMMBase.jl/blob/master/src/mle.jl#L58-L68 where I delegate to fit_mle.

@sosuts
Copy link
Author

sosuts commented Apr 6, 2020

Hi again,
I'm now trying to integrate my code to this package.

BTW, I'm not sure what you mean by responsibilities of each observations.

@maxmouchet
Copy link
Owner

By "responsibilities" I mean P(Z_t = i | Y, θ), where Z_t is the hidden state at time t, Y the observations, and θ the model parameters.

@sosuts
Copy link
Author

sosuts commented Apr 10, 2020

Thank you.
I have another question.
I was thinking that we scale β with c calculated from α.

@inbounds for t in T-1:-1:1
for j in OneTo(K)
for i in OneTo(K)
β[t,j] += β[t+1,i] * A[j,i] * L[t+1,i]
end
c[t+1] += β[t,j]
end
for j in OneTo(K)
β[t,j] /= c[t+1]
end
end

Do I need to calculate another c in β?

@maxmouchet
Copy link
Owner

You're right, it's possible to use the same scaling vector c for α and β.
It's not done, for now, to keep the code simple :-)

@sosuts
Copy link
Author

sosuts commented May 1, 2020

Sorry for asking many questions!
Why do we have to subtract m from loglikelihood?

HMMBase.jl/src/mle.jl

Lines 39 to 43 in 2ebf644

m = vec_maximum(view(LL, t + 1, :))
c = 0.0
for i in OneTo(K), j in OneTo(K)
ξ[t, i, j] = α[t, i] * A[i, j] * exp(LL[t+1, j] - m) * β[t+1, j]

HMMBase.jl/src/messages.jl

Lines 103 to 123 in 2ebf644

m = vec_maximum(view(LL, 1, :))
for j in OneTo(K)
α[1, j] = a[j] * exp(LL[1, j] - m)
c[1] += α[1, j]
end
for j in OneTo(K)
α[1, j] /= c[1]
end
c[1] *= exp(m) + eps()
@inbounds for t = 2:T
m = vec_maximum(view(LL, t, :))
for j in OneTo(K)
for i in OneTo(K)
α[t, j] += α[t-1, i] * A[i, j]
end
α[t, j] *= exp(LL[t, j] - m)

@maxmouchet
Copy link
Owner

No worries!

This is the log-sum-exp trick : https://en.wikipedia.org/wiki/LogSumExp
It prevents exp(LL[t,j]) from overflowing.

@sosuts
Copy link
Author

sosuts commented Jun 13, 2020

I restarted to write a code for multiple observations.
I think I can send PR soon.

  • rand
  • likelihoods
  • forward
  • backward
  • posterior
  • update_a!
  • update_A!
  • update_B!
  • fit_mle
  • viteribi

@maxmouchet
Copy link
Owner

Nice!

I recently cleaned-up the code by removing the logl keyword and the methods that do not use the log-likelihoods.
Basically everything is done using the log-likelihoods now.

Feel free to open a PR, and I'll help you if there are merge conflicts.

@sosuts
Copy link
Author

sosuts commented Jul 13, 2020

Hi.
I changed my codes to adapt to your new api.
I implemented some new functions assuming 2 situations;

  1. multiple observations with same length
  2. multiple observations with different(random) length

I haven't finished writing codes for multivariate model in situation 2.
This notebook is an example.

Is it ok to open a pr?
To be honest, this is my first time using github so I'm not sure when to open it...

@maxmouchet
Copy link
Owner

I think the correct URL is https://nbviewer.jupyter.org/github/SosUts/HMMBase.jl/blob/multiple_sequences/notebooks/multiple%20sequences.ipynb :)

This looks very nice!
Thank you for your work :)

You can open a PR now, and I'll review the code.
It is still possible to push new commits to your branch after the PR is opened, so there is no problem to make further modifications.

@sosuts
Copy link
Author

sosuts commented Jul 15, 2020

I opened it. Thank you as always!

Edit:
I forgot to consider about the tests.
Should I change the tests, or should I close the PR and change the codes?

@maxmouchet
Copy link
Owner

No worries, you can keep the PR open!
Every commit that you add to your branch will be added to the PR automatically.

I'm a bit busy this week, so I'll try to have a look at the PR this week-end, or the next one.
In any case your code looks clean :)

@maxmouchet maxmouchet added the enhancement New feature or request label Jul 16, 2020
@maxmouchet maxmouchet linked a pull request Jul 16, 2020 that will close this issue
9 tasks
@sosuts
Copy link
Author

sosuts commented Jul 17, 2020

Thanks to your original code!
I'll try fixing things step by step.

@cossio
Copy link

cossio commented Jan 6, 2023

Hello! I need this feature for some data I have, where also I have multiple time-series of different lengths.

What's the current status? I saw the linked PR got closed, but I'm not sure why?

Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
discussion enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants