-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create Audio Feature in SDK #344
Create Audio Feature in SDK #344
Conversation
nickyfantasy
commented
Mar 28, 2018
- Add apis to record audio in SDK
- Add corresponding apis in pybind, storage.py, sdk.h
- Implement reservoir sampling when collecting audio samples
* Add apis to record audio in SDK * Add corresponding apis in pybind, storage.py, sdk.h * Implement reservoir sampling when collecting audio samples
@nickyfantasy Please run pre-commit to format the code. The Travis CI will check the format. |
visualdl/logic/sdk.cc
Outdated
void Audio::SetSample(int index, | ||
int sample_rate, | ||
const std::vector<value_t>& data) { | ||
CHECK_GT(sample_rate, 0) << "sample rate should be something like 6000, 8000 or 44100"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be a positive number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
visualdl/logic/sdk.cc
Outdated
int sample_rate, | ||
const std::vector<value_t>& data) { | ||
CHECK_GT(sample_rate, 0) << "sample rate should be something like 6000, 8000 or 44100"; | ||
CHECK_LT(index, num_samples_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add error messages for these 2 as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
visualdl/logic/sdk.h
Outdated
struct AudioRecord { | ||
int step_id; | ||
int sample_rate; | ||
std::vector<int> data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here the data is 'int' instead of 'float'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, when we write data in, we convert float to string, when we read the data out, we convert binary to int
visualdl/logic/sdk.cc
Outdated
<< "g_log_dir should be set in LogReader construction"; | ||
BinaryRecordReader brcd(GenBinaryRecordDir(g_log_dir), filename); | ||
|
||
std::transform(brcd.data.begin(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is brcd.data the same as res.data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
brcd.data is the data in string format when we saved in file, when we read the data we convert to integer that becomes res.data
visualdl/logic/sdk.h
Outdated
|
||
/* | ||
* A combined interface for IsSampleTaken and SetSample, simpler but might be | ||
* low effience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
efficiency
visualdl/python/storage.py
Outdated
@@ -119,6 +119,16 @@ def text(self, tag): | |||
check_tag_name_valid(tag) | |||
return self.reader.get_text(tag) | |||
|
|||
def audio(self, tag): | |||
""" | |||
Get a audio reader with tag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get 'an' audio
visualdl/logic/pybind.cc
Outdated
auto tablet = self.tablet(tag); | ||
return vs::components::ImageReader(self.mode(), tablet); | ||
}) | ||
.def("get_image", [](vs::LogReader& self, const std::string& tag) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these seem just some code style formatting, and we each other have a different clang-format config that results in some diff, maybe we need a unified version of config that makes we have the same code style?
In paddle, the clang-format 3.8 and google c++ style is used, different config and version may lead to some diff.
We can reference paddle's .clang-format configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I just updated clang-format
visualdl/logic/sdk.cc
Outdated
CHECK_LE(index, num_records_); | ||
|
||
//convert float vector to char vector | ||
std::vector<char> data_str(data.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that data_str can directly be a string
and no need to tranform from vector
to string
again.
std::string data_str(data.size());
...
BinaryRecord brcd(xxdir, std::move(data_str));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I end up just use std::string(data.begin(),data.end())
to directly convert the data vector to string
visualdl/logic/sdk.h
Outdated
struct AudioRecord { | ||
int step_id; | ||
int sample_rate; | ||
std::vector<int> data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To meet the audio value interval, is short
or unsigned char
or char
is enough?
https://cn.mathworks.com/help/matlab/ref/audiorecorder.getaudiodata.html?s_tid=gn_loc_drop
here just int16
, int8
, uint8
are used, not int32
.
Just a suggestion, not that important, int
works good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, when we were doing speech recognition app before, we just use byte / int8
@@ -219,6 +233,61 @@ PYBIND11_MODULE(core, m) { | |||
.def("total_records", &cp::TextReader::total_records) | |||
.def("size", &cp::TextReader::size); | |||
|
|||
py::class_<cp::Audio>(m, "AudioWriter", R"pbdoc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be weird to have documentations published on the website but not the code is not in the release pip? I am not sure what's the best approach here.
visualdl/logic/pybind.cc
Outdated
)pbdoc"); | ||
|
||
py::class_<cp::AudioReader::AudioRecord>(m, "AudioRecord") | ||
// TODO(ChunweiYan) make these copyless. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either remove the TODO or update it to yours
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya
visualdl/logic/sdk.cc
Outdated
num_records_ = 0; | ||
} | ||
|
||
int Audio::IsSampleTaken() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor stuff, the function name is implying that the function will return a BOOL, but the function returns an index.
Maybe rename the function to NextRandSampleIndex
or provide a comment here to explain the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true