[HUDI-271] Create QuickstartUtils class towards simplifying quickstar… #929

Merged
merged 1 commit into from
Sep 30, 2019

Conversation

bhasudha
Contributor

@bhasudha bhasudha commented Sep 28, 2019

…t guide

This class is a thin version of HoodieTestDataGenerator and will be used in the Quickstart guide (doc changes to follow in a separate PR). The intention is to simplify the quickstart so that it showcases Hudi APIs by writing and reading through Spark datasources.
It is intentionally located in the hudi-spark module so that all the necessary classes end up in hudi-spark-bundle.

@bhasudha
Contributor Author

@bvaradar I removed the KeyPartition static class and replaced it with HoodieKey. I also renamed all occurrences of commitTime to randomString or something along those lines, and was able to reuse the OverwriteWithLatestAvroPayload class. Please review at your convenience.
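
To illustrate the change described above, here is a minimal, dependency-free sketch of how a single key type that carries both the record key and the partition path can replace a separate key/partition holder. `SketchKey` and `KeySketch` are hypothetical names used only for illustration; this is not Hudi's `HoodieKey` and not the code in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.UUID;

// Hypothetical stand-in for a key that pairs a record key with its partition path.
final class SketchKey {
    final String recordKey;
    final String partitionPath;

    SketchKey(String recordKey, String partitionPath) {
        this.recordKey = Objects.requireNonNull(recordKey);
        this.partitionPath = Objects.requireNonNull(partitionPath);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SketchKey)) {
            return false;
        }
        SketchKey other = (SketchKey) o;
        return recordKey.equals(other.recordKey) && partitionPath.equals(other.partitionPath);
    }

    @Override
    public int hashCode() {
        return Objects.hash(recordKey, partitionPath);
    }
}

class KeySketch {
    // Remembers the keys handed out for inserts so that later updates can target them.
    private final List<SketchKey> existingKeys = new ArrayList<>();

    // Creates a fresh key with a random record key in the given partition.
    SketchKey newInsertKey(String partitionPath) {
        SketchKey key = new SketchKey(UUID.randomUUID().toString(), partitionPath);
        existingKeys.add(key);
        return key;
    }

    // An update reuses a previously issued key, so it lands in the same partition.
    SketchKey keyForUpdate(int index) {
        return existingKeys.get(index);
    }

    int numExistingKeys() {
        return existingKeys.size();
    }
}
```

Because the partition path travels with the record key, an update generated from a stored key needs no second lookup structure.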

}
}

public static List<String> convertToStringList(List<HoodieRecord> records) {
Contributor

Is this method meant to show the generated records in the demo?

Contributor Author

@bvaradar Yes. The idea is to present an insert like this:

val dataGen = new DataGenerator
val inserts = convertToStringList(dataGen.generateInserts(50))
val ds = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
ds.write.format("org.apache.hudi").options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").option(TABLE_NAME, tableName).
  mode(SaveMode.Append).save(basepath)
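
The snippet above relies on each generated record being available as a JSON string that `spark.read.json` can parse. As a rough, dependency-free sketch of that idea, the following turns flat records into single-line JSON objects. It assumes records are plain `Map<String, Object>` values holding only strings and finite numbers; `JsonLinesSketch` is a hypothetical class, and its `convertToStringList` only mirrors the name discussed here, not the PR's actual implementation (which operates on `HoodieRecord`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class JsonLinesSketch {
    // Renders one flat record as a single-line JSON object, in map iteration order.
    static String toJson(Map<String, Object> record) {
        return record.entrySet().stream()
            .map(e -> quote(e.getKey()) + ": " + render(e.getValue()))
            .collect(Collectors.joining(", ", "{", "}"));
    }

    // Numbers are emitted bare (assumed finite); everything else is treated as a string.
    private static String render(Object value) {
        return (value instanceof Number) ? value.toString() : quote(String.valueOf(value));
    }

    // Minimal escaping: backslashes and double quotes only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One JSON string per record, suitable for a line-oriented JSON reader.
    static List<String> convertToStringList(List<Map<String, Object>> records) {
        List<String> out = new ArrayList<>();
        for (Map<String, Object> r : records) {
            out.add(toJson(r));
        }
        return out;
    }
}
```

Using an insertion-ordered map such as `LinkedHashMap` keeps the field order stable, which makes the generated strings easier to read in a demo.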

Contributor

@bvaradar bvaradar left a comment

Looks good. Will merge it soon.

@bvaradar
Contributor

Thanks @bhasudha for getting it done on short notice.

Member

@vinothchandar vinothchandar left a comment

@bhasudha can you please change the commit message to use bullets, like the standard format?

public class QuickstartUtils {

public static class DataGenerator {
private static final String DEFAULT_FIRST_PARTITION_PATH = "2019/09/15";
Member

Any reason this is 2019 and the others are 2018? It may be confusing.

Contributor Author

No specific reason. I'll change it to the latest.

Member

@vinothchandar vinothchandar Sep 30, 2019

Actually, can we name the partitions based on region/country/cities?

americas/united_states/san_francisco
americas/brazil/sao_paulo
asia/india/chennai

This way it stays relevant even a year later.

Contributor Author

Yeah, makes sense. I'll change the sample schema and update the PR.
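
As a rough sketch of the suggestion in this thread, a generator could draw each record's partition path from a fixed list of region/country/city values instead of dates. The three paths below are the ones proposed in the comment; `PartitionPathSketch`, `assignPartitions`, and the seeded `Random` are hypothetical choices for illustration and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class PartitionPathSketch {
    // Partition paths from the review suggestion: region/country/city.
    static final String[] PARTITION_PATHS = {
        "americas/united_states/san_francisco",
        "americas/brazil/sao_paulo",
        "asia/india/chennai"
    };

    // Picks a partition path for each of n generated records.
    // A seeded Random makes the assignment reproducible across runs.
    static List<String> assignPartitions(int n, long seed) {
        Random rand = new Random(seed);
        List<String> assigned = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            assigned.add(PARTITION_PATHS[rand.nextInt(PARTITION_PATHS.length)]);
        }
        return assigned;
    }
}
```

Unlike date-based paths, these values do not look stale as the guide ages, which is the point raised above.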

/**
* Class to be used in quickstart guide for generating inserts and updates against a corpus.
* <p>
* Test data uses a toy Uber trips, data model.
Member

close

?

@bhasudha
Contributor Author

@bhasudha can you please change the commit message to be bullets like the standard format

Done!

- This will be used in Quickstart guide (Doc changes to follow in a separate PR). The intention is to simplify quickstart to showcase hudi APIs by writing and reading using spark datasources.
- This is located in hudi-spark module intentionally to bring all the necessary classes in hudi-spark-bundle finally.
@bvaradar bvaradar merged commit 50a073f into apache:master Sep 30, 2019
kroushan-nit pushed a commit to kroushan-nit/hudi-oss-fork that referenced this pull request Nov 10, 2024
Davis-Zhang-Onehouse added a commit to Davis-Zhang-Onehouse/hudi-oss that referenced this pull request Feb 5, 2025
Davis-Zhang-Onehouse added a commit to Davis-Zhang-Onehouse/hudi-oss that referenced this pull request Feb 6, 2025
Davis-Zhang-Onehouse added a commit to Davis-Zhang-Onehouse/hudi-oss that referenced this pull request Feb 7, 2025
Davis-Zhang-Onehouse added a commit to Davis-Zhang-Onehouse/hudi-oss that referenced this pull request Feb 14, 2025