Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 627 #642

Merged
merged 11 commits into from
Feb 8, 2018
Merged

Fix 627 #642

merged 11 commits into from
Feb 8, 2018

Conversation

zhxfl
Copy link
Member

@zhxfl zhxfl commented Feb 6, 2018

split long sentence, drop sentence close by -1

@zhxfl zhxfl requested review from pkuyym and kuke February 6, 2018 09:29
@zhxfl zhxfl added the DeepASR label Feb 6, 2018

assert feature_frame_num == label_frame_num

if self._split_sentence_threshold == -1 or self._split_perturb == -1 or self._split_sub_sentence_len == -1 or self._split_sentence_threshold >= feature_frame_num:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep one line under the 80-character limit, which is also suitable for comments

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

@kuke
Copy link
Collaborator

kuke commented Feb 7, 2018

LGTM

@@ -61,10 +61,21 @@ class SampleInfoBucket(object):
label_bin_paths (list|tuple): Files containing the binary label data.
label_desc_paths (list|tuple): Files containing the description of
samples' label data.
split_perturb(int): split long sentence' perturb sub-sentence length value.
split_sentence_threshold(int): sentence length large than
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sentence --> Sentence
large --> larger
operator --> operation

sentence length large than split_sentence_threshold trigger split operator. -->
Sentence whose length larger than the value will trigger split operation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@@ -61,10 +61,21 @@ class SampleInfoBucket(object):
label_bin_paths (list|tuple): Files containing the binary label data.
label_desc_paths (list|tuple): Files containing the description of
samples' label data.
split_perturb(int): split long sentence' perturb sub-sentence length value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split_perturb(int) --> split_perturb (int)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split long sentence' perturb sub-sentence length value. Please refine this comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random perturb sub-sentence length when split long sentence

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about Maximum perturbation value for length of sub-sentence when splitting long sentence.?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

remain_frame_num = feature_frame_num
while True:
if remain_frame_num > self._split_sentence_threshold:
cur_frame_len = self._split_sub_sentence_len + random.randint(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems exceed 80 columns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use self._rng instead using random.randint directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._rng.randint(0, self._split_perturb)

@@ -244,11 +291,20 @@ def read_bytes(fpath, start, size):
sample_info.feature_start,
sample_info.feature_size)

assert sample_info.feature_frame_num * sample_info.feature_dim * 4 == len(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems exceed 80 columns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

@@ -273,7 +329,8 @@ def read_bytes(fpath, start, size):
time.sleep(0.001)

# drop long sentence
if self._drop_frame_len >= sample_data[0].shape[0]:
if self._drop_frame_len == -1 or self._drop_frame_len >= sample_data[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

Copy link
Contributor

@pkuyym pkuyym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhxfl zhxfl merged commit 1184109 into PaddlePaddle:develop Feb 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants