Added csv directory reading #18853
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Quoted from the diff:

> `labels='inferred')` will return a `tf.data.Dataset` that yields batches of csv files from the subdirectories `class_a` and `class_b`, together with labels 0 and 1 (0 corresponding to `class_a` and 1 corresponding to `class_b`).
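The quoted docstring implies a directory-name-to-label convention (alphabetically sorted subdirectory names mapped to integer labels, as in Keras's `image_dataset_from_directory`). As a rough illustration of that convention only — `infer_labeled_paths` is a hypothetical helper, not the PR's actual implementation — the label inference might look like this in plain Python:

```python
from pathlib import Path

def infer_labeled_paths(directory):
    """Pair each CSV file under `directory` with an integer label taken
    from the alphabetical index of its immediate subdirectory name.

    Mirrors the convention in the quoted docstring:
    class_a -> 0, class_b -> 1, and so on.
    (Hypothetical sketch; not the code under review.)
    """
    root = Path(directory)
    # Sorted subdirectory names define the label order deterministically.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: i for i, name in enumerate(class_names)}
    pairs = []
    for name in class_names:
        for csv_path in sorted((root / name).glob("*.csv")):
            pairs.append((str(csv_path), label_of[name]))
    return pairs
```

A real `tf.data` pipeline would then map a CSV-parsing function over these (path, label) pairs.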
So 1 csv file = 1 sample? Is this a common case? Usually you have 1 row = 1 sample.
Quoted from the diff:

```python
def getReadings(path, stride: int = 0, head: bool = True):
    return tf.strings.to_number(
        tf.strings.split(tf.strings.split(tf.io.read_file(path)), sep=","),
        out_type=tf.float32)[1::stride]
```
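The one-liner above is dense; a rough pure-Python paraphrase of its apparent semantics may make the reviewer's point clearer. `get_readings` below is a hypothetical illustration operating on the file's text rather than a path, and the row/field interpretation is an assumption, not confirmed by the PR. Note also that the original signature defaults `stride` to 0, which makes the slice step zero and raises an error in Python slicing (TensorFlow strided slicing likewise rejects zero strides), so 1 is used here:

```python
def get_readings(text, stride=1):
    """Pure-Python paraphrase (assumed semantics) of the quoted helper:
    split the file contents on whitespace into rows, split each row on
    commas, convert every field to float, then keep every `stride`-th
    row starting at index 1 (presumably skipping a header row)."""
    rows = text.split()  # whitespace-separated rows
    table = [[float(f) for f in row.split(",")] for row in rows]
    return table[1::stride]  # drop row 0, then take every stride-th row
```

This makes the hardcoded assumptions visible: whitespace-delimited rows, comma-delimited numeric fields, and a first row that must still parse as numbers even though it is discarded.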
This line hardcodes a lot of assumptions about the data.
Thanks for the PR!
My primary questions here are:
- Would this generalize to use cases encountered by a lot of people? Or is it closer to being a one-off for your use case?
- Would those people find it intuitive to learn to use the utility for their use case?
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #18853      +/-   ##
==========================================
+ Coverage   75.57%   79.25%    +3.67%
==========================================
  Files         352      337       -15
  Lines       37066    34909     -2157
  Branches     7225     6875      -350
==========================================
- Hits        28014    27667      -347
+ Misses       7357     5656     -1701
+ Partials     1695     1586      -109
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
This PR is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
I'm an aerospace engineering student, currently looking for ways to measure sensor influence within a wing with the help of ML. Since a lot of research is based on gathering sensor data stored in .csv or binary files, we thought it would be a nice feature to have a CSV loader, or support for even more data-file formats from sensors and research, rather than mainly Images, Audio, or Text.
So we implemented a `csv_dataset_from_directory()` method, for now just as an example, to load a directory containing multiple classes of CSV files; it produced good results even with a tiny portion of our dataset.
Please feel free to give any comments or suggestions about this implementation; we only had about 12 hours to think about the project because we're at a hackathon right now. Let us know if you have had issues feeding in this kind of data; for our use in aerospace engineering this would be exactly the feature we're looking for!