Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge Dan's PR of determinize #41

Merged
merged 20 commits into from
May 15, 2020
Merged

Conversation

qindazhu
Copy link
Collaborator

@qindazhu qindazhu commented May 14, 2020

Note: this PR is totally from @danpovey https://github.com/danpovey/k2/pull/37, I'm just fixing some compiler errors.

Nearly fixed with 3 group errors left. Please see comments below.

int32_t next_state_id;
bool is_new_state = state_map->GetOutputState(state);
arcs_out->push_back({this->state_id, next_state_id, label});
arc_weights_out->push_back(arc_weight);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this point to? and what's the next_state_id , seems we need to change the signature of GetOutputState to return a pair of <bool, next_state_id>.

And what's the label

if (det_state->forward_backward_prob >= prune_cutoff) {
bool is_new_state = state_map->GetOutputState(det_state);
arcs_out->push_back({this->state_id, next_state_id, label});
arc_weights_out->push_back(arc_weight);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, next_state_id, label and arc_weight

arc_derivs_out->clear();

bool ans = map.GetOutputState(start_state.get());
CHECK(ans && ans->state_id == 0);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, seems we should change the signature of GetOutputState?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think so (add wfsa_in.fsa)

int32_t seq_len = d.seq_len;
const auto &arcs = fsa.arcs;
for (int32_t i = 0; i < seq_len; ++i) {
int32_t symbol = arcs[elem->arc_id].label;
Copy link
Collaborator Author

@qindazhu qindazhu May 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as MaxTracebackState and LogSumTracebackState has differnt structure, we may need to define differnt functions for them, For LogSum, how can we compute the hash pair as there are a vector of prev_elements vector<LogSumTracebackLink> prev_elements, iterate over all elements in it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you can just pick an arbitrary one, the symbol-sequence will be the same no matter which you pick.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I just picked the first one.

}
}

int32_t state_id;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danpovey , I added a property of state_id as it seems it is used in many places (but do definition)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK but it needs to be documented. E.g.

  // State-id in the output FSA.  Only defined after DetStateMap::GetOutputState()
  //  is called.  A DetState that is in the queue should have this defined.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

int32_t seq_len = d.seq_len;
const auto &arcs = fsa.arcs;
for (int32_t i = 0; i < seq_len; ++i) {
int32_t symbol = arcs[elem->prev_elements[0].arc_index].label;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just pick the first element to get the hash

@qindazhu
Copy link
Collaborator Author

qindazhu commented May 15, 2020

OK, fixed all compiler errors and style issues, ready to merge (so there will be no style issues for following PR from other guys). Definitely there will be bugs (and memory leaks as I noticed there are usages of new without unique_ptr). We'll need to write some tests to debug left issues.

@qindazhu qindazhu changed the title [WIP] merge Dan's PR of determinize merge Dan's PR of determinize May 15, 2020
@qindazhu qindazhu force-pushed the haowen-determinize branch from db2ed3b to cfd3c06 Compare May 15, 2020 06:48
@danpovey
Copy link
Collaborator

Great. Would you mind making sure the code runs? E.g. making some test cases?

@qindazhu
Copy link
Collaborator Author

I think we can merge it first to make other PRs pass the style check. I'll make another PR to debug with test cases this weekend, is that OK?

@danpovey danpovey merged commit 71b2691 into k2-fsa:master May 15, 2020
@danpovey
Copy link
Collaborator

Sure, thanks!

@qindazhu qindazhu deleted the haowen-determinize branch May 16, 2020 04:17
@csukuangfj csukuangfj mentioned this pull request May 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants