
Should event likelihood be computed using current or last hidden state? #10

mistycheney opened this issue Nov 11, 2021 · 2 comments


mistycheney commented Nov 11, 2021

Suppose the transformer hidden state at event i is h_i. Should the likelihood of this event be computed using h_i or h_{i-1}?

Using h_{i-1} makes more sense to me, because it encourages the model to assign high intensity to the true next event and therefore to learn to forecast.

But the implementation and the paper seem to use h_i. The problem is that, since the transformer is given the true event i as part of its input, it can simply learn to output an arbitrarily high intensity for the correct event type in order to maximize the likelihood, even though the learned model will have no predictive power.
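Concretely, here is a toy sketch of the two alternatives (not this repository's code; the linear head and all sizes below are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Toy setup (all sizes hypothetical): BATCH=2, SEQ_LEN=5, D_MODEL=8, NUM_TYPES=3.
torch.manual_seed(0)
h = torch.randn(2, 5, 8)                    # transformer hidden states h_1..h_L
types = torch.randint(0, 3, (2, 5))         # ground-truth event types k_1..k_L
head = torch.nn.Linear(8, 3)                # per-type intensity head (assumed)

# Option 1 (what the code appears to do): score event i with h_i.
# Since h_i already encodes event i, the head can push lambda_{k_i}(t_i) arbitrarily high.
lam_cur = F.softplus(head(h))               # [2, 5, 3]
ll_cur = torch.log(lam_cur.gather(-1, types.unsqueeze(-1)).squeeze(-1) + 1e-9)

# Option 2 (what I am suggesting): score event i with h_{i-1},
# i.e. shift the hidden states right by one so the model has to forecast event i.
h_prev = torch.cat([torch.zeros_like(h[:, :1]), h[:, :-1]], dim=1)
lam_prev = F.softplus(head(h_prev))
ll_prev = torch.log(lam_prev.gather(-1, types.unsqueeze(-1)).squeeze(-1) + 1e-9)
```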

I feel I must have missed something. Any clarification is appreciated. Thanks.

@AnthonyChouGit

I have the same question as @mistycheney. In this piece of code, the likelihood is calculated using h_i, which has already encoded the i-th event. This leads the model to maximize the likelihood of the i-th event's type, and to minimize the likelihood of all other event types, at that position. Does this explain the dramatic decrease in negative log-likelihood presented in the paper (Table 4)? I suspect this part of the code is not written correctly.


waystogetthere commented Nov 21, 2022

Exactly, I think this is an error. And there are many other details in the code that differ from the paper.

This is the function to calculate the log-likelihood:

def log_likelihood(model, data, time, types):

There are several inputs:

model: the Transformer
data: the raw output of the model, which needs to go through a linear layer to get the hidden state
time: the occurrence time of each event, shape: [BATCH, SEQ_LEN]
types: the type of each event, shape: [BATCH, SEQ_LEN]

Preliminary: Two Masks

Please refer to lines 61~65, where two masks are built.

# non_pad_mask.shape = [BATCH, SEQ_LEN]
This mask marks the padded positions in the batch. Training is done in batches, and sequences of different lengths within one batch are quite common.

# type_mask.shape = [BATCH, SEQ_LEN, NUM_TYPES]
This mask is a one-hot encoding indicating which event type occurs at each position.
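Roughly, the two masks could be built like this (a sketch in my own words, not the repository's code; the PAD index and function name are assumptions):

```python
import torch

PAD = 0  # assumed padding index for event types

def build_masks(types: torch.Tensor, num_types: int):
    """types: [BATCH, SEQ_LEN], with values 1..num_types for real events and PAD at padded positions."""
    # 1.0 at real-event positions, 0.0 at padded positions.
    non_pad_mask = types.ne(PAD).float()                  # [BATCH, SEQ_LEN]

    # One-hot encoding of the event type at each position (all zeros at padded positions).
    type_mask = torch.zeros(*types.shape, num_types)
    for k in range(num_types):
        type_mask[:, :, k] = (types == k + 1).float()     # [BATCH, SEQ_LEN, NUM_TYPES]
    return non_pad_mask, type_mask
```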

Event-likelihood

Get the hidden state, calculate the intensity of every event type at every position, and extract only the type that actually occurred. Please refer to lines 67~69:

all_lambda.shape = [BATCH, SEQ_LEN, NUM_TYPES]  # each type has its own intensity
type_lambda.shape = [BATCH, SEQ_LEN]  # only the ground-truth type is kept

Then apply the log function and sum over the sequence. Please refer to lines 72~73:

event_ll.shape=[BATCH]
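Condensed, the event term as I read it looks roughly like this (a paraphrase, not a copy of the repository's code; model.linear and the masks are assumed to behave as described above):

```python
import torch
import torch.nn.functional as F

def event_log_likelihood(model, data, non_pad_mask, type_mask):
    """Sketch of the event term: log-intensity of the ground-truth type at each real event, summed."""
    all_hid = model.linear(data)                        # [BATCH, SEQ_LEN, NUM_TYPES]
    all_lambda = F.softplus(all_hid)                    # intensity of every type at every position
    type_lambda = (all_lambda * type_mask).sum(dim=-1)  # keep only the ground-truth type -> [BATCH, SEQ_LEN]

    # Log-intensity at real events only; padded positions are set to 1 so they contribute log(1) = 0.
    type_lambda = type_lambda + 1e-9
    type_lambda = type_lambda.masked_fill(~non_pad_mask.bool(), 1.0)
    return torch.log(type_lambda).sum(dim=-1)           # event_ll: [BATCH]
```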

HERE COMES THE FIRST ERROR. Note that in the code the $i$-th event's intensity is $f_k(\mathbf{h}_i)$, i.e. the softplus function $f_k$ applied to a linear projection of the current hidden state. This is different from the paper, which defines the conditional intensity as

$\lambda_k(t \mid \mathcal{H}_t) = f_k\left(\alpha_k \frac{t - t_j}{t_j} + \mathbf{w}_k^\top \mathbf{h}_j + b_k\right), \quad t \in [t_j, t_{j+1}),$

so for the event at $t_i$ its intensity should be

$\lambda(t_i) = f_k\left(\alpha \frac{t_i - t_{i-1}}{t_{i-1}} + \mathbf{w}^\top \mathbf{h}_{i-1} + b\right).$

The code does not include the 'current' (temporal) term, and it uses the current hidden state $\mathbf{h}_i$ instead of the last hidden state $\mathbf{h}_{i-1}$.
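If I read the paper's formula correctly, the fix would look roughly like this (only a sketch, assuming model.linear and a scalar model.alpha exist as used above; handling of the very first event is a separate choice):

```python
import torch
import torch.nn.functional as F

def event_intensity_from_previous(model, data, time, type_mask):
    """Sketch: lambda(t_i) = softplus(alpha * (t_i - t_{i-1}) / t_{i-1} + w^T h_{i-1} + b)."""
    # Shift hidden states right by one position so event i is scored with h_{i-1}
    # (h_0 is taken to be zeros here; the first event needs special handling anyway).
    h_prev = torch.cat([torch.zeros_like(data[:, :1]), data[:, :-1]], dim=1)

    # Temporal ("current") term alpha * (t_i - t_{i-1}) / t_{i-1}; t_0 := t_1 so the first gap is 0,
    # and the +1 in the denominator guards against division by zero.
    t_prev = torch.cat([time[:, :1], time[:, :-1]], dim=1)
    temporal = (time - t_prev) / (t_prev + 1)                         # [BATCH, SEQ_LEN]

    all_lambda = F.softplus(model.linear(h_prev) + model.alpha * temporal.unsqueeze(-1))
    return (all_lambda * type_mask).sum(dim=-1)                       # lambda of the true type: [BATCH, SEQ_LEN]
```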

Non-Event Likelihood

The code uses the Monte-Carlo method by default to calculate the integral of the intensity function (the non-event term of the log-likelihood), i.e. an estimate of the form

$\hat{\Lambda} = \sum_{j} (t_j - t_{j-1}) \cdot \frac{1}{N}\sum_{n=1}^{N} \lambda(u_n), \quad u_n \sim \mathrm{Uniform}(t_{j-1}, t_j).$

The essential idea is that, within every inter-event interval $[t_{j-1}, t_j]$, $N$ points are sampled uniformly and their intensities are computed; their mean then serves as the representative intensity over $[t_{j-1}, t_j]$.
However, when calculating these intensities, the code still uses the current hidden state $\mathbf{h}_j$ instead of the last hidden state $\mathbf{h}_{j-1}$.
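For completeness, here is a sketch of the Monte-Carlo non-event term using the last hidden state $\mathbf{h}_{j-1}$ (again only my reading: model.linear and model.alpha are assumed as above, and the intensity is summed over all types as the paper's integral requires, so this is not a drop-in replacement for the repository's function):

```python
import torch
import torch.nn.functional as F

def non_event_mc(model, data, time, non_pad_mask, num_samples=100):
    """Sketch: sum_j (t_j - t_{j-1}) * mean_n sum_k lambda_k(u_n), with u_n ~ Uniform(t_{j-1}, t_j)."""
    diff_time = (time[:, 1:] - time[:, :-1]) * non_pad_mask[:, 1:]            # [BATCH, SEQ_LEN-1]

    # num_samples uniform offsets inside each inter-event interval, normalised by t_{j-1} (+1 for safety).
    temp_time = diff_time.unsqueeze(2) * torch.rand(*diff_time.shape, num_samples, device=data.device)
    temp_time = temp_time / (time[:, :-1] + 1).unsqueeze(2)                   # [BATCH, SEQ_LEN-1, N]

    # Use the *last* hidden state h_{j-1} for the interval (t_{j-1}, t_j], i.e. data[:, :-1, :].
    hid = model.linear(data[:, :-1, :])                                       # [BATCH, SEQ_LEN-1, NUM_TYPES]

    # Intensity of every type at every sampled point, summed over types, averaged over samples.
    all_lambda = F.softplus(hid.unsqueeze(2) + model.alpha * temp_time.unsqueeze(-1))
    all_lambda = all_lambda.sum(dim=-1).mean(dim=2)                           # [BATCH, SEQ_LEN-1]
    return (all_lambda * diff_time).sum(dim=-1)                               # non_event_ll: [BATCH]
```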
