[HUDI-271] Create QuickstartUtils class towards simplifying quickstar… #929
3c8bdb5 to bca2214
@bvaradar I removed the KeyPartition static class and replaced it with HoodieKey. I renamed all occurrences of commitTime to randomString or something along those lines. I was also able to reuse the OverwriteWithLatestAvroPayload class. Please review at your convenience.
}
}

public static List<String> convertToStringList(List<HoodieRecord> records) {
Is this method to show the generated records in the demo ?
@bvaradar yes. The idea is to present like below for insert:
val dataGen = new DataGenerator
val inserts = convertToStringList(dataGen.generateInserts(50))
val ds = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
ds.write.format("org.apache.hudi")
  .options(getQuickstartWriteConfigs)
  .option(PRECOMBINE_FIELD_OPT_KEY, "ts")
  .option(RECORDKEY_FIELD_OPT_KEY, "uuid")
  .option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath")
  .option(TABLE_NAME, tableName)
  .mode(SaveMode.Append)
  .save(basepath)
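For readers following the thread, the idea behind convertToStringList can be sketched roughly as follows: each generated record is rendered as a JSON string so that spark.read.json can parse the resulting list. This is only an illustration, not the code in this PR. The Trip type and the class name are hypothetical stand-ins for HoodieRecord and QuickstartUtils, and the only fields shown are the three referenced by the write options above (uuid, ts, partitionpath).

```java
import java.util.ArrayList;
import java.util.List;

public class ConvertToStringListSketch {

  // Hypothetical stand-in for a generated record; the PR itself works with HoodieRecord.
  static final class Trip {
    final String uuid;
    final long ts;
    final String partitionpath;

    Trip(String uuid, long ts, String partitionpath) {
      this.uuid = uuid;
      this.ts = ts;
      this.partitionpath = partitionpath;
    }

    // Renders the record as one JSON object. No string escaping is done,
    // which is only acceptable because the sample values contain no quotes.
    String toJson() {
      return "{\"uuid\": \"" + uuid + "\", \"ts\": " + ts
          + ", \"partitionpath\": \"" + partitionpath + "\"}";
    }
  }

  // Converts each record to its JSON string, preserving input order.
  static List<String> convertToStringList(List<Trip> records) {
    List<String> out = new ArrayList<>(records.size());
    for (Trip trip : records) {
      out.add(trip.toJson());
    }
    return out;
  }

  public static void main(String[] args) {
    List<Trip> trips = new ArrayList<>();
    trips.add(new Trip("id-1", 1L, "2019/09/15"));
    trips.add(new Trip("id-2", 2L, "2019/09/15"));
    for (String json : convertToStringList(trips)) {
      System.out.println(json);
    }
  }
}
```

A list of strings in this shape is what the snippet above hands to spark.sparkContext.parallelize before reading it back as JSON.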
Looks good. Will merge it soon
Thanks @bhasudha for getting it done on short notice.
@bhasudha can you please change the commit message to use bullets, like the standard format?
public class QuickstartUtils {

  public static class DataGenerator {
    private static final String DEFAULT_FIRST_PARTITION_PATH = "2019/09/15";
Any reason this is 2019 and the others are 2018? It may be confusing.
No specific reason. I'll change it to the latest.
Actually, can we name the partitions based on region/country/cities?
americas/united_states/san_francisco
americas/brazil/sao_paulo
asia/india/chennai
This way it's relevant even a year later?
Yeah, makes sense. I'll change the sample schema and update the PR.
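As a rough illustration of the suggestion in this thread, a generator could draw each record's partition path from a fixed list of region/country/city values instead of dates. The sketch below is an assumption about how that might look, not the code that was eventually merged. The class name, the assignPartitions helper, and the seed parameter are all hypothetical; only the three partition paths come from the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RegionPartitionSketch {

  // Partition paths taken from the review comment; they do not go stale the way dates do.
  static final String[] PARTITION_PATHS = {
      "americas/united_states/san_francisco",
      "americas/brazil/sao_paulo",
      "asia/india/chennai"
  };

  // Picks a partition path for each of n records.
  // A fixed seed makes the assignment repeatable across runs (hypothetical design choice).
  static List<String> assignPartitions(int n, long seed) {
    Random random = new Random(seed);
    List<String> paths = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      paths.add(PARTITION_PATHS[random.nextInt(PARTITION_PATHS.length)]);
    }
    return paths;
  }

  public static void main(String[] args) {
    for (String path : assignPartitions(5, 42L)) {
      System.out.println(path);
    }
  }
}
```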
/**
 * Class to be used in quickstart guide for generating inserts and updates against a corpus.
 * <p>
 * Test data uses a toy Uber trips, data model.
close ?
bca2214 to c472ede
Done!
- This will be used in the Quickstart guide (doc changes to follow in a separate PR). The intention is to simplify the quickstart and showcase Hudi APIs by writing and reading using Spark datasources.
- This is located in the hudi-spark module intentionally, so that all the necessary classes end up in hudi-spark-bundle.
c472ede to 8fb740b
…t guide
This class is a thin version of HoodieTestDataGenerator and will be used in the Quickstart guide (doc changes to follow in a separate PR). The intention is to simplify the quickstart and showcase Hudi APIs by writing and reading using Spark datasources.
This is located in the hudi-spark module intentionally, so that all the necessary classes end up in hudi-spark-bundle.