feat(pkg/csv2lp): add csv to line protocol conversion library #17764

Merged (13 commits), May 12, 2020


sranka
Contributor

@sranka sranka commented Apr 16, 2020

#17004 requests CSV support in the influx write command, and #17599 requires a separate library for CSV-to-line-protocol conversion that will then be used by influx write. This PR introduces that library.

README.md describes this library and contains examples.
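To make the idea concrete, here is a much-simplified sketch of CSV-to-line-protocol conversion in Go. It is not the csv2lp API: the function name `toLineProtocol` is hypothetical, and it assumes a header row naming `_measurement`, `_field`, `_value`, and `_time` columns, with `_time` already in epoch nanoseconds. It skips the escaping, data type annotations, and validation that a real converter needs.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// toLineProtocol is a hypothetical, much-simplified converter. It assumes a
// header row naming _measurement, _field, _value and _time columns, copies
// the values verbatim, and does no escaping or type handling.
func toLineProtocol(input string) ([]string, error) {
	records, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, nil
	}

	// Map column names from the header row to their positions.
	idx := make(map[string]int, len(records[0]))
	for i, name := range records[0] {
		idx[name] = i
	}
	for _, col := range []string{"_measurement", "_field", "_value", "_time"} {
		if _, ok := idx[col]; !ok {
			return nil, fmt.Errorf("missing column %q", col)
		}
	}

	// csv.Reader rejects rows whose field count differs from the header,
	// so every index looked up above is valid for each data row.
	lines := make([]string, 0, len(records)-1)
	for _, row := range records[1:] {
		lines = append(lines, fmt.Sprintf("%s %s=%s %s",
			row[idx["_measurement"]], row[idx["_field"]],
			row[idx["_value"]], row[idx["_time"]]))
	}
	return lines, nil
}

func main() {
	input := "_measurement,_field,_value,_time\n" +
		"cpu,usage,0.5,1589241600000000000\n"
	lines, err := toLineProtocol(input)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Copying `_value` verbatim works in this sketch because line protocol reads an unsuffixed number such as `0.5` as a float; integer, string, and boolean values would need type-aware formatting.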

@sranka sranka requested a review from jsteenb2 April 16, 2020 13:21
@sranka
Contributor Author

sranka commented Apr 16, 2020

@jsteenb2 can you please help find stakeholders to review this PR?

@jsteenb2 jsteenb2 requested review from a team and sebito91 and removed request for a team April 16, 2020 17:00
@jsteenb2
Contributor

@sranka I'd like to see storage folks, or someone who is more line-protocol savvy, give this a read. I pulled someone in from the auto-assigner for the storage team.

@sebito91
Contributor

Hey folks, I'm looking into this today!

Contributor

@sebito91 sebito91 left a comment


This is an awesome addition to the library, thanks @sranka! There are a few things to clear up here in terms of composition and clarification. Also, I would really like to see you include a docs update similar to what you've done here. Add that as a README into this package as it would make things a lot clearer!

Also, apart from the other naming, please go through and rename your files from camelCase to snake_case, so csvTable.go becomes csv_table.go, etc.

Lastly, all test names should follow the format Test_ThingToTestInCamelCase. A few tests are named after the unexported function they exercise under the covers, such as Test_escapeMeasurement, which should become Test_EscapeMeasurement.
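Under that convention, a test for the `escapeMeasurement` helper is named `Test_EscapeMeasurement`. As a sketch of what such a helper might look like (an assumption, not the PR's code), line protocol requires commas and spaces in measurement names to be backslash-escaped. The table of cases in `main` has the same shape a table-driven `Test_EscapeMeasurement` would loop over.

```go
package main

import (
	"fmt"
	"strings"
)

// measurementEscaper backslash-escapes the characters that line protocol
// requires escaping in measurement names: commas and spaces.
var measurementEscaper = strings.NewReplacer(",", `\,`, " ", `\ `)

// escapeMeasurement is a hypothetical sketch; the PR's helper may differ,
// for example in its return type.
func escapeMeasurement(measurement string) string {
	return measurementEscaper.Replace(measurement)
}

func main() {
	// Table-driven cases, as a Test_EscapeMeasurement test would use.
	cases := []struct{ in, want string }{
		{"cpu", "cpu"},
		{"cpu load", `cpu\ load`},
		{"a,b", `a\,b`},
	}
	for _, c := range cases {
		got := escapeMeasurement(c.in)
		fmt.Printf("%q -> %q ok=%v\n", c.in, got, got == c.want)
	}
}
```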

Other than that, thank you again so very much for your contribution! If you have any questions I'm lurking both on the gophers.slack.com and influxcommunity.slack.com under sebito91.

Review threads (outdated, resolved): pkg/csv2lp/csvAnnotations.go (5 threads), pkg/csv2lp/csvTable.go, pkg/csv2lp/csvToProtocolLines.go, pkg/csv2lp/dataConversion.go (3 threads)
Contributor Author

@sranka sranka left a comment


Thank you @sebito91 for a detailed code review. I fixed the naming conventions of files and tests, added documentation, and improved the code overall to address your feedback. I replied to all your comments and left it to you to mark them as resolved. Some conversations are still open; I would appreciate your feedback there.

```go
	setupTable func(table *CsvTable, row []string) error
}

func (a *annotationComment) isTableAnnotation() bool {
```
Contributor Author


done

```go
	return strings.HasPrefix(strings.ToLower(comment), a.prefix)
}

var supportedAnnotations = []annotationComment{
```
Contributor Author


annotationComment is never updated, so IMHO there is no need for pointers here; by design, there are only a few annotation comments and they never change.
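A minimal sketch of the value-based design described above, assuming a small struct with a lowercase `prefix` and a hypothetical `flag` field marking per-table annotations (the field set and flag values are illustrative, not copied from the PR). Methods use value receivers and the package-level slice holds values, since nothing mutates it after initialization.

```go
package main

import (
	"fmt"
	"strings"
)

// annotationComment mirrors the shape discussed in the review; the fields
// and flag values here are assumptions for illustration only.
type annotationComment struct {
	prefix string // lowercase annotation prefix, e.g. "#datatype"
	flag   uint8  // hypothetical: 1 marks a per-table annotation
}

// matches uses a value receiver: the struct is small and never mutated.
func (a annotationComment) matches(comment string) bool {
	return strings.HasPrefix(strings.ToLower(comment), a.prefix)
}

func (a annotationComment) isTableAnnotation() bool { return a.flag == 1 }

// A package-level slice of values; nothing writes to it after init, so
// pointers would add indirection without enabling any needed mutation.
var supportedAnnotations = []annotationComment{
	{prefix: "#group", flag: 1},
	{prefix: "#datatype", flag: 1},
	{prefix: "#default", flag: 1},
}

// findAnnotation returns the first supported annotation matching comment.
func findAnnotation(comment string) (annotationComment, bool) {
	for _, a := range supportedAnnotations {
		if a.matches(comment) {
			return a, true
		}
	}
	return annotationComment{}, false
}

func main() {
	a, ok := findAnnotation("#DataType,string,long")
	fmt.Println(a.prefix, ok, a.isTableAnnotation())
}
```

Ranging over the slice copies each element, which is cheap for a struct this small; pointers would mainly pay off if the entries were large or had to be modified in place.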

Review threads (outdated, resolved): pkg/csv2lp/csvAnnotations.go (3 threads), pkg/csv2lp/dataConversion.go (3 threads)
```go
			errors.New("no measurement supplied"),
		}
	}
	buffer = append(buffer, escapeMeasurement(measurement)...)
```
Contributor Author


I also use the appendConverted fn to append field values of various data types; appendConverted internally uses the strconv.Append* functions, which accept and return []byte. Switching to bytes.Buffer would, on the other hand, require creating temporary objects for every float, int, and uint value, which I don't need now. I am not sure the situation gets better after this change, so it is a trade-off. I would rather keep it the way it is.

I wish append were optimized by the compiler, or that bytes.Buffer accepted primitive types for writing.
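The trade-off can be illustrated with a sketch of an appendConverted-style helper. The name `appendFieldValue` and its `interface{}` parameter are assumptions for illustration; the point is that `strconv.AppendFloat`, `AppendInt`, `AppendUint`, and `AppendBool` write digits straight into the caller's `[]byte`, with no intermediate string per value.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendFieldValue is a hypothetical stand-in for appendConverted: it
// appends a line protocol field value to buf using strconv.Append*.
func appendFieldValue(buf []byte, v interface{}) ([]byte, error) {
	switch x := v.(type) {
	case float64:
		return strconv.AppendFloat(buf, x, 'g', -1, 64), nil
	case int64:
		return append(strconv.AppendInt(buf, x, 10), 'i'), nil // integer suffix
	case uint64:
		return append(strconv.AppendUint(buf, x, 10), 'u'), nil // unsigned suffix
	case bool:
		return strconv.AppendBool(buf, x), nil
	case string:
		// String values are double-quoted; quotes and backslashes are escaped.
		buf = append(buf, '"')
		for i := 0; i < len(x); i++ {
			if x[i] == '"' || x[i] == '\\' {
				buf = append(buf, '\\')
			}
			buf = append(buf, x[i])
		}
		return append(buf, '"'), nil
	default:
		return buf, fmt.Errorf("unsupported field type %T", v)
	}
}

func main() {
	fields := []struct {
		key string
		val interface{}
	}{
		{"usage", 0.5},
		{"count", int64(3)},
		{"ok", true},
	}
	line := []byte("cpu ")
	var err error
	for i, f := range fields {
		if i > 0 {
			line = append(line, ',')
		}
		line = append(line, f.key...)
		line = append(line, '=')
		if line, err = appendFieldValue(line, f.val); err != nil {
			panic(err)
		}
	}
	fmt.Println(string(line))
}
```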

```go
			}
		}
	}
	return buffer, nil
```
Contributor Author


Yes, or I could simplify the function signature to take a *bytes.Buffer.

@sranka sranka requested a review from sebito91 April 28, 2020 20:52
@sebito91
Contributor

Excellent work @sranka, will review this and get back to you today!

Contributor

@sebito91 sebito91 left a comment


Some small changes this time but we're getting closer, thanks so much for your patience @sranka !!

Review threads (outdated, resolved): pkg/csv2lp/README.md (5 threads), pkg/csv2lp/csv2lp.go, pkg/csv2lp/csv_annotations.go
```diff
@@ -0,0 +1,33 @@
package csv2lp
```
Contributor


Is this file actually used anywhere? If not we should remove...

Contributor Author

@sranka sranka May 12, 2020


It is not used internally by the library. It is referenced in README.md (section "Support Existing CSV files") because it helps compose a single input reader out of multiple Readers and ReadClosers.

csv2lp.MultiCloser helps close multiple io.Closers (files) on input, because such a helper is not available out of the box.

Contributor Author

@sranka sranka left a comment


Thank you @sebito91 for the review; it is now ready for the next round.


@sranka sranka requested a review from sebito91 May 12, 2020 19:17
@sebito91
Contributor

I think this build requires a go mod tidy before it'll pass integration tests. If you can make those changes I'll ack the PR and we're good to merge!

@sranka
Contributor Author

sranka commented May 12, 2020

I would appreciate any insight or explanation regarding the failing tests that (from time to time) cause unintended interference and test failures. @sebito91, I rebased this branch onto the latest master; that usually helps.

@sebito91
Contributor

That'll usually do it, but if you're importing new packages, you'll need to run go mod vendor and go mod tidy to ensure the module files are updated accordingly.

For example, after pulling your branch into my local repo, it looks like go.mod is missing a reference to testify:

```
(base)  [sborza@icebox]:~/src/github.com/influxdata/influxdb (master-1.x *%):$ git diff go.mod
diff --git a/go.mod b/go.mod
index cd2e3407f4..c702f309d2 100644
--- a/go.mod
+++ b/go.mod
@@ -40,6 +40,7 @@ require (
        github.com/segmentio/kafka-go v0.2.0 // indirect
        github.com/smartystreets/goconvey v1.6.4 // indirect
        github.com/spf13/cast v1.3.0
+       github.com/stretchr/testify v1.4.0
        github.com/tinylib/msgp v1.0.2
        github.com/willf/bitset v1.1.3 // indirect
        github.com/xlab/treeprint v0.0.0-20180616005107-d6fb6747feb6
```

Contributor

@sebito91 sebito91 left a comment


LGTM, well done!

@sebito91 sebito91 merged commit 01b00da into master May 12, 2020
sebito91 added a commit that referenced this pull request May 12, 2020
chore(CHANGELOG): update to include recently merged PR #17764
@sranka sranka deleted the 17004/csv2lp branch May 13, 2020 05:36
@sranka sranka linked an issue May 14, 2020 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Add the ability to import a simple csv file via the influx cli
3 participants