-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement topological sort. #18
Conversation
Great. Please call it TopSort though, not TopoSort. |
ad89b57
to
d23ac96
Compare
Hm, I see.
I guess you can require that it be connected then.
But can we make sure that Connect() is able to produce top-sorted output as
long as the input is free of cycles (which are not self-loops)?
This should be possible just by using the right queue topology, i.e. a fifo
queue, to do breadth first search.
…On Wed, Apr 29, 2020 at 10:08 PM Fangjun Kuang ***@***.***> wrote:
It happens that not all un-connected graphs can be sorted topologically.
For the following fsa:
[image: trim]
<https://user-images.githubusercontent.com/5284924/80284508-7bf55d80-8751-11ea-86ac-c7101e3cbd23.png>
What should state 5 be numbered in a TopSorted FSA?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/18#issuecomment-621234470>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO6L6RLFXZVXRFQ4Y4TRPAYFTANCNFSM4MS6ERKQ>
.
|
If that's hard, it's OK, we can make them separate calls for now.
…On Wed, Apr 29, 2020 at 10:14 PM Daniel Povey ***@***.***> wrote:
Hm, I see.
I guess you can require that it be connected then.
But can we make sure that Connect() is able to produce top-sorted output
as long as the input is free of cycles (which are not self-loops)?
This should be possible just by using the right queue topology, i.e. a
fifo queue, to do breadth first search.
On Wed, Apr 29, 2020 at 10:08 PM Fangjun Kuang ***@***.***>
wrote:
> It happens that not all un-connected graphs can be sorted topologically.
>
> For the following fsa:
> [image: trim]
> <https://user-images.githubusercontent.com/5284924/80284508-7bf55d80-8751-11ea-86ac-c7101e3cbd23.png>
>
> What should state 5 be numbered in a TopSorted FSA?
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <https://github.com/danpovey/k2/pull/18#issuecomment-621234470>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAZFLO6L6RLFXZVXRFQ4Y4TRPAYFTANCNFSM4MS6ERKQ>
> .
>
|
I am going to implement it in the current |
Lets leave the filenames as is for now. Maybe fsa_algo could just be
algorithms. We will already access via k2 prefix.
…On Thursday, April 30, 2020, Haowen Qiu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In k2/csrc/fsa_algo.cc
<https://github.com/danpovey/k2/pull/18#discussion_r417702156>:
> @@ -194,4 +193,111 @@ void ArcSort(const Fsa &a, Fsa *b,
}
if (arc_map != nullptr) arc_map->swap(indexes);
}
+
+void TopSort(const Fsa &a, Fsa *b,
I think make boolean as the return value may be better, as the caller will
know what it should do next according to the return value:
- if the input fsa is empty, we return true, as sorted fsa on empty
fsa is still empty.
- if the input fsa is non-connected or cyclic, we return false. Then
the caller knows that the input is invalid, the caller may check the input
fsa.
For now, as we always clear the output fsa at the beginning, the caller
can not tell the difference when the function returns from different
branches.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/18#pullrequestreview-403143780>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO32GVMB6DQKIAJKUL3RPDGRBANCNFSM4MS6ERKQ>
.
|
Maybe remove fsa prefix on filenames.
…On Thursday, April 30, 2020, Daniel Povey ***@***.***> wrote:
Lets leave the filenames as is for now. Maybe fsa_algo could just be
algorithms. We will already access via k2 prefix.
On Thursday, April 30, 2020, Haowen Qiu ***@***.***> wrote:
> ***@***.**** commented on this pull request.
> ------------------------------
>
> In k2/csrc/fsa_algo.cc
> <https://github.com/danpovey/k2/pull/18#discussion_r417702156>:
>
> > @@ -194,4 +193,111 @@ void ArcSort(const Fsa &a, Fsa *b,
> }
> if (arc_map != nullptr) arc_map->swap(indexes);
> }
> +
> +void TopSort(const Fsa &a, Fsa *b,
>
> I think make boolean as the return value may be better, as the caller
> will know what it should do next according to the return value:
>
> - if the input fsa is empty, we return true, as sorted fsa on empty
> fsa is still empty.
> - if the input fsa is non-connected or cyclic, we return false. Then
> the caller knows that the input is invalid, the caller may check the input
> fsa.
>
> For now, as we always clear the output fsa at the beginning, the caller
> can not tell the difference when the function returns from different
> branches.
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <https://github.com/danpovey/k2/pull/18#pullrequestreview-403143780>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAZFLO32GVMB6DQKIAJKUL3RPDGRBANCNFSM4MS6ERKQ>
> .
>
|
Have to talk with cuda experts re how to structure that part.
…On Thursday, April 30, 2020, Daniel Povey ***@***.***> wrote:
Maybe remove fsa prefix on filenames.
On Thursday, April 30, 2020, Daniel Povey ***@***.***> wrote:
> Lets leave the filenames as is for now. Maybe fsa_algo could just be
> algorithms. We will already access via k2 prefix.
>
> On Thursday, April 30, 2020, Haowen Qiu ***@***.***> wrote:
>
>> ***@***.**** commented on this pull request.
>> ------------------------------
>>
>> In k2/csrc/fsa_algo.cc
>> <https://github.com/danpovey/k2/pull/18#discussion_r417702156>:
>>
>> > @@ -194,4 +193,111 @@ void ArcSort(const Fsa &a, Fsa *b,
>> }
>> if (arc_map != nullptr) arc_map->swap(indexes);
>> }
>> +
>> +void TopSort(const Fsa &a, Fsa *b,
>>
>> I think make boolean as the return value may be better, as the caller
>> will know what it should do next according to the return value:
>>
>> - if the input fsa is empty, we return true, as sorted fsa on empty
>> fsa is still empty.
>> - if the input fsa is non-connected or cyclic, we return false. Then
>> the caller knows that the input is invalid, the caller may check the input
>> fsa.
>>
>> For now, as we always clear the output fsa at the beginning, the caller
>> can not tell the difference when the function returns from different
>> branches.
>>
>> —
>> You are receiving this because you commented.
>> Reply to this email directly, view it on GitHub
>> <https://github.com/danpovey/k2/pull/18#pullrequestreview-403143780>,
>> or unsubscribe
>> <https://github.com/notifications/unsubscribe-auth/AAZFLO32GVMB6DQKIAJKUL3RPDGRBANCNFSM4MS6ERKQ>
>> .
>>
>
|
The transition table is represented by a string which has the same format with OpenFST.
c57db93
to
45dbb67
Compare
I've added support to build an FSA from a transition table, like OpenFST, simplifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
28f3c61
to
a0c09d7
Compare
I agree on !sub.empty().
You guys can merge if you think itit's OK.
…On Sat, May 2, 2020 at 9:47 AM Haowen Qiu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In k2/csrc/fsa_util.cc
<https://github.com/danpovey/k2/pull/18#discussion_r418822701>:
> +void SplitStringToVector(const std::string &in, const char *delim,
+ std::vector<std::string> *out) {
+ CHECK_NOTNULL(delim);
+ CHECK_NOTNULL(out);
+ out->clear();
+ size_t start = 0;
+ size_t end = in.size();
+ while (true) {
+ auto pos = in.find_first_of(delim, start);
+ if (pos == in.npos) break;
+
+ auto sub = in.substr(start, pos - start);
+ start = pos + 1;
+
+ TrimString(&sub);
+ if (sub.empty() == false) out->emplace_back(std::move(sub));
really suggest if(!sub.empty()) It's more readable
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/18#pullrequestreview-404489782>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO5S7NSKRPR6CQCMZXTRPN3SFANCNFSM4MS6ERKQ>
.
|
please use squash and merge. there are too many commits in this pullrequest. |
is successful; false otherwise | ||
@return The converted integer. | ||
*/ | ||
int32_t StringToInt(const std::string &s, bool *is_ok = nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just thinking that maybe we sould not put those helper functions in fsa_utils
as all functions are not related with fsa except stringToFsa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
When such helper functions reach a certain number, we can place them separately.
For now they are in a .cc file so it doesn't matter.
We don't have to be too focused on style issues at the moment.
I'm more concerned about the overall structure.
…On Sat, May 2, 2020 at 12:25 PM Fangjun Kuang ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In k2/csrc/fsa_util.cc
<https://github.com/danpovey/k2/pull/18#discussion_r418884777>:
> #include <utility>
#include <vector>
+#include "glog/logging.h"
+
+namespace {
+/** Convert a string to an integer.
+
+ @param [in] s The input string.
+ @param [out] is If non-null, it is set to true when the conversion
+ is successful; false otherwise
+ @return The converted integer.
+ */
+int32_t StringToInt(const std::string &s, bool *is_ok = nullptr) {
I agree.
When such helper functions reach a certain number, we can place them
separately.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/18#discussion_r418884777>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO5EWXYTWSSCIP7K3STRPOOEZANCNFSM4MS6ERKQ>
.
|
I have a question about this. it is acyclic and topsorted. I've changed the Notice that the state 2 in the output fsa maps to the state 3 in the input fsa. My question is that the topological sort of an acyclic fsa is not unique, does the |
Shouldn't the topological sort respect the arcs ordering? E.g. if the FSA is label sorted then follow the same order in these cases (and if it's not sorted then whatever arc comes first in the underlying data structure)? |
I dont want to impose any additional requirements on it. Leave more
freedom for optimization. We can sort the arcs separately.
…On Sunday, May 3, 2020, Piotr Żelasko ***@***.***> wrote:
Shouldn't the topological sort respect the arcs ordering? E.g. if the FSA
is label sorted then follow the same order in these cases (and if it's not
sorted then whatever arc comes first in the underlying data structure)?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/danpovey/k2/pull/18#issuecomment-623040893>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZFLO4BFVC25CEXNJMOFM3RPTEPXANCNFSM4MS6ERKQ>
.
|
For the following fsa
after
TopSort
, it produces the following fsawith the state map (mapping state from the sorted fsa to the original fsa)
More unit test cases will be added later (possibly tomorrow).