Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Externalized Kafka Datasource #3504

Merged
merged 191 commits into from
Jun 29, 2020
Merged

Conversation

jdye64
Copy link
Contributor

@jdye64 jdye64 commented Dec 3, 2019

resolves: #3405

This PR introduces an Apache Kafka Datasource for libcudf.

Update 1/23/2020: It has been decided that this PR should include ways to externalize third party dependencies. I wanted to add the list of things I would be changing here with checkboxes to help with visibility into what I am doing.

  • Remove all librdkafka dependencies from conda environment
  • Create build script to enable building code needed for external datasources on the libcudf side
  • create a new interface, call it ExternalDatasource.h that extends the existing datasource.h interface.
  • Modify existing CMake build to not include librdkafka, but modify to build the supporting externalized kafka bits
  • Allow anyone who wants to make a cuDF compatible datasource implement that interface. So this would work for more than just Kafka this time. At this time the Kafka one will be added to this codebase.
  • library (.so source code) uses "extern C" to expose only the functions needed by datasource.h overcoming our current c++ symbol mangling issues with dlsym
  • A minor change is made to libcudf++ to scan a pre-determined directory for .so files and considers those "ExternalDatasources". Uses dlopen to open them and ONLY interact with functions defined in datasource.h ensuring any additional implementation logic doesn't interfere with cuDF

@jdye64 jdye64 requested review from a team as code owners December 3, 2019 02:39
@jdye64 jdye64 requested a review from a team December 3, 2019 02:39
@jdye64 jdye64 requested a review from a team as a code owner December 3, 2019 02:39
@jdye64 jdye64 changed the title [WIP] 3405: Base Kafka DataSource for CSV. Still need some integration test… [WIP] 3405: Base Kafka DataSource for CSV Dec 3, 2019
@harrism
Copy link
Member

harrism commented Dec 3, 2019

@jdye64 this should be 0.12 -- retargeting. Make sure you merge branch-0.12 into your branch.

@harrism harrism changed the base branch from branch-0.11 to branch-0.12 December 3, 2019 03:00
…ing framework, feedback around external queue consumption, and general feedback
build.sh Outdated Show resolved Hide resolved
@harrism harrism changed the title [WIP] 3405: Base Kafka DataSource for CSV [WIP] Base Kafka DataSource for CSV Dec 4, 2019
@harrism harrism added 2 - In Progress Currently a work in progress cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. labels Dec 4, 2019
raydouglass
raydouglass previously approved these changes Dec 5, 2019
@jdye64 jdye64 requested a review from a team as a code owner December 17, 2019 17:31
@harrism
Copy link
Member

harrism commented Jun 23, 2020

I've approved, but I would like an issue filed to track improved testing. I think we should use google mock to mock a kafka cluster in gtests so that we don't have to stand up Kafka in CI.

@jdye64
Copy link
Contributor Author

jdye64 commented Jun 24, 2020

@harrism I think that makes a lot of sense. I made another commit since there were some upstream changes needed but everything is passing now.

I also opened an issue for adding google mock #5571

@kkraus14
Copy link
Collaborator

cc @vuule @dillon-cullinan could you review again?

@vuule
Copy link
Contributor

vuule commented Jun 24, 2020

There's just one outstanding suggestion (unresolved the conversation, see above) left to integrate for my 👍

@jdye64
Copy link
Contributor Author

jdye64 commented Jun 25, 2020

@vuule change made. Thanks for the review.

@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs cuIO Reviewer labels Jun 25, 2020
cpp/doxygen/Doxyfile Show resolved Hide resolved
@kkraus14 kkraus14 added 3 - Ready for Review Ready for review by team and removed 5 - Ready to Merge Testing and reviews complete, ready to merge labels Jun 25, 2020
@kkraus14
Copy link
Collaborator

@dillon-cullinan could you review again when you get a chance? Thanks!

Copy link
Contributor

@dillon-cullinan dillon-cullinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Some small nitpicks, but I won't let my review block the merge. Please do fix these :)

build.sh Outdated Show resolved Hide resolved
build.sh Outdated Show resolved Hide resolved
build.sh Outdated Show resolved Hide resolved
jdye64 added 2 commits June 28, 2020 22:04
…w doxygen 'io_datasources' group, change kafka_consumer namespace to 'datasource' from 'external'
@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Jun 29, 2020
@kkraus14 kkraus14 merged commit 0732f53 into rapidsai:branch-0.15 Jun 29, 2020
@jdye64 jdye64 deleted the 3405 branch June 29, 2020 16:36
ajschmidt8 added a commit to ajschmidt8/cudf that referenced this pull request Feb 16, 2022
The `-l` flag was introduced in rapidsai#3504, but I don't see it being used anywhere. Seems safe to remove unless anyone else thinks otherwise.
rapids-bot bot pushed a commit that referenced this pull request Feb 17, 2022
The `-l` flag was introduced in #3504, but I don't see it being used anywhere. Seems safe to remove unless anyone else thinks otherwise.

Authors:
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - Jordan Jacobelli (https://github.com/Ethyling)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #10313
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Extend Kafka Datasource to include CSV
8 participants