feat(cmd/influx/write): enhance the ability to import CSV files #17599
Conversation
Force-pushed from 55d7b3d to 9a71054
we need a review from @influxdata/storage-team here as well
```diff
@@ -74,41 +90,102 @@ func cmdWrite(f *globalFlags, opt genericCLIOpts) *cobra.Command {
	return cmd
}

func (writeFlags *writeFlagsType) dump(args []string) {
```
what is the motivation behind this? can we remove it?
It helps with troubleshooting the CSV -> LP conversion. It is easier to find problems in user input when you dump the actual input arguments together with their default values. This is printed only in --debug mode, together with metadata about the columns that are used to extract CSV data into protocol lines.

Sure, I can remove it, but it will then be harder to resolve user input errors. I'm opinionated here, but I do not insist on keeping it.
```go
if len(args) > 0 && args[0][0] == '@' {
func (writeFlags *writeFlagsType) createLineReader(cmd *cobra.Command, args []string) (io.Reader, io.Closer, error) {
	readers := make([]io.Reader, 0, 2*len(writeFlags.Headers)+2*len(writeFlags.Files)+1)
	closers := make([]io.Closer, 0, len(writeFlags.Files))
```
why are readers and closers separate instead of being ReadClosers?
closers are a subset of readers here; only opened files will be closed, nothing more ... stdin or string readers cannot/must not be closed.
I hear you, but it seems better to do it as ReadClosers with a no-op closer on those that don't need one, a typical pattern in Go:

```go
type noopCloser struct {
	r io.Reader
}

func (n *noopCloser) Read(p []byte) (int, error) {
	return n.r.Read(p)
}

func (n *noopCloser) Close() error {
	return nil
}
```

My big concern here is that it makes it easy to lose track of closers/readers when they are disparate, uncoupled return types.
Last thing, please don't resolve conversations, I'll happily resolve them when I feel the conversation has come to fruition 👍
Thank you for the clarification around "conversation resolving"; as a newbie here, I need such advice 👍

I see two concerns here:

- Why are readers and closers separate instead of being ReadClosers?
  - Files can be provided in a non-UTF-8 encoding; such a file (a ReadCloser) is then transformed into a UTF-8 Reader (not a ReadCloser). See line 141 in b786fc1: `readers = append(readers, decode(f), strings.NewReader("\n"))`
  - The last file or stdin can then be changed so that the first N lines are skipped when the --skipHeader option is present. This transformation also works with Readers, but not with ReadClosers. See line 168 in b786fc1: `readers[i] = write.SkipHeaderLinesReader(writeFlags.SkipHeader, readers[i])`
  - Changing the above intermediate Readers to ReadClosers just to have one ReadCloser array would introduce extra complexity here, IMO.
- Why are io.Reader and io.Closer disparate return values?
  - I understand that a composite io.ReadCloser looks better. It would require wrapping multiple Readers and Closers (or ReadClosers) in a new type that only delegates calls. I wanted to prefer builtins whenever possible, but there is no multi-reader-closer available for reuse.
  - At the same time, it is not a big deal (from my point of view) for a non-public function to avoid this wrapping. "A returned Closer must be closed" is an expected contract, IMO.
cmd/influx/write.go (outdated)
```go
// backward compatibility: @ in arg denotes a file
writeFlags.File = args[0][1:]
writeFlags.Files = append(writeFlags.Files, args[0][1:])
```
I'm concerned the flags are being used as a bucket for a bunch of side effects here
yes, for the sake of backward compatibility @file means a file ... somebody introduced this and I decided to keep the functionality because I don't know where and how it is currently used. If starting from scratch, I would not allow specifying data in an argument at all.
I don't think my earlier comment was clear. I'd rather see you maintain the files in a separate variable instead of making a destructive call on the flags. Flags should only be used (IMO) for the API contract between the CLI and the user. It makes things coupled in weird global ways at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
understood, repaired
```go
)

// IsCharacterDevice returns true if the supplied reader is a character device (a terminal)
func IsCharacterDevice(reader io.Reader) bool {
```
what is the motivation to move this to the internal pkg?
My motivation: this function is independent of the existing CLI commands. I did not convince myself that attaching it to write.go or main.go of the main package is better. I will accept any suggestion here that follows the mindset of this repo.
```go
if err != nil {
	return false
}
if (info.Mode() & os.ModeCharDevice) == os.ModeCharDevice {
```
just realized this can be simplified too:

```go
return (info.Mode() & os.ModeCharDevice) == os.ModeCharDevice
```

the if isn't necessary here
ok, simplified
cmd/influx/write_test.go (outdated)
```go
require.Contains(t, fmt.Sprintf("%s", err), "bucket") // failed to retrieve buckets

// validation: no such bucket found
lineData = lineData[:0]
```
are these sub-tests? It's easier to read, but it feels like a lot of coupling between assertions. Can this be split up into sub-tests?
```go
t.Run("validates no bucket found", func(t *testing.T) {
	lineData := lineData[:0]
	command := cmdWrite(&globalFlags{}, genericCLIOpts{w: ioutil.Discard})
	// note: my-empty-org parameter causes the test server to return no buckets
	command.SetArgs([]string{"--format", "csv", "--org", "my-empty-org", "--bucket", "my-bucket"})
	err := command.Execute()
	require.Contains(t, fmt.Sprintf("%s", err), "bucket") // no such bucket found
})
```
The same goes for a lot of these assertions; they look like sub-tests to me but aren't being treated as such. Please correct me if that assumption is wrong 🤔
Yes, it looks better when changed to sub-tests
I have some concerns about this PR. We like the user experience it creates but we aren't comfortable accepting this PR as is because of concerns around complexity and maintainability.
I'd like to see this PR broken into at least two pieces:

1. A standalone CSV to line-protocol conversion library. This should be a reusable library that we can use uniformly in other areas where we want to convert CSV to line protocol.
2. A PR that then uses that library's public API from the CLI command tool to accomplish this CLI workflow.
Separately, I'm curious whether there are existing parsers that accomplish the parsing work done here. I don't really want to take on the long-term maintenance costs of CSV parsing features. I'm not familiar with what's available in the Go ecosystem, but I'm curious whether there are third-party CSV processors that would accomplish more with less custom code than is used here.
Force-pushed from ed05c38 to 00e31e8
@rbetts I've seen only a few Go CSV parser extensions that can do (un)marshaling into structs of a pre-defined layout. We have a different situation herein:
@russorat Can you please comment on the requested creation of a CSV -> LP conversion library? I mean, there are no technical problems in doing so, once you agree and provide a new influxdata git repo.
@sranka you should be able to put the parser under the
- … header rows without changing the data
- …ctive for dateTime
- …err before reading data rows
- …s to sterr and continue with CSV processing
- …alue in column name
Force-pushed from 7c1157d to 764552e
@jsteenb2 ok, "csv to line protocol conversion" is now in pkg/csv2lp
@sranka, we'd like to see this PR broken up and submitted incrementally: a first PR for the CSV converter and another for the CLI. We'll have different stakeholders reviewing the CSV PR.
@jsteenb2 thanks for letting me know what is required.
Closes #17004, #17008

There are a number of enhancements to the way CSV data is processed by `influx write`, so that external CSV data can be written to InfluxDB. They are described in detail in #17004 and #17008. Examples of the new capabilities are provided in https://github.com/bonitoo-io/influxdb-csv-import/blob/master/README.md#step-2

New flags in `influx write`: