Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-4209] Avoid using HDFS in HoodieClientTestHarness #7305

Merged
merged 7 commits into from
Nov 28, 2022

Conversation

xushiyan
Copy link
Member

@xushiyan xushiyan commented Nov 25, 2022

Change Logs

Remove HDFS usage from HoodieClientTestHarness and subclasses.

Impact

Improve CI test stability and reduce run time.

Risk level

None.

Documentation Update

NA

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@xushiyan xushiyan added the priority:major degraded perf; unable to move forward; potential bugs label Nov 25, 2022
@@ -50,16 +48,7 @@
@SuppressWarnings("unchecked")
public class TestHoodieReadClient extends HoodieClientTestBase {

@Override
protected void initPath() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to use the one in super class so remove this method in subclass

*/

/**
* @Test public void testLeaseRecovery() throws IOException, URISyntaxException, InterruptedException { Writer writer
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the comment mentions, this testcase is not as the name suggests, also the logic is covered in other test case, hence removing it

/**
* A utility class about mini cluster.
*/
public class MiniClusterUtil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discard this as it bundles hdfs with zookeeper, which should be discouraged for UT/FT. testcase which needs hdfs will use HdfsTestService where applicable

@xushiyan xushiyan requested a review from nsivabalan November 26, 2022 16:34
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job on the patch man. guess we should be careful in such (set ups) test base classes. If not, devs will start using these base classes blindly and might end up like this. I am sure these will bring down out test run time a lot.

couple of minor comments.

@@ -102,28 +127,28 @@ public void readLocalWriteHDFS() throws Exception {
// Write generated data to hdfs (only inserts)
String readCommitTime = hdfsWriteClient.startCommit();
LOG.info("Starting commit " + readCommitTime);
List<HoodieRecord> records = dataGen.generateInserts(readCommitTime, 100);
JavaRDD<HoodieRecord> writeRecords = jsc.parallelize(records, 1);
List<HoodieRecord> records = dataGen.generateInserts(readCommitTime, 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of the test assumes that records will span across all 3 partition paths (HoodieTestDataGenerator). if we reduce the num of records from 100 to 10, chances we might see some flakiness. something to keep in mind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test case does not depend on all partitions having records. so i've only changed this one

@@ -89,7 +88,7 @@ public void tearDown() throws Exception {
}

@ParameterizedTest
@ValueSource(booleans = {true})
@ValueSource(booleans = {false, true})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know why this was not having "false" before. may be we had some reason to remove it explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

false case was removed in #6313

not sure exactly why. but it's passing now.

throw new HoodieIOException(ioe.getMessage(), ioe);
}
}
private static final int PARALLELISM = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was wondering if this can be paramterized as a maven option? Would be helpful when we run tests on bigger cluster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codope i've filed this https://issues.apache.org/jira/browse/HUDI-5287

in principle we should make UT/FT only run against small number of records and runnable even on cheap machines. So we should be able to keep parallelism low and fixed. We only make num records and parallelism configurable when running ITs with varying number of records.

@xushiyan xushiyan force-pushed the HUDI-4209-remove-hdfs branch from 69abf45 to 10fb959 Compare November 28, 2022 11:42
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@xushiyan xushiyan merged commit 62b5989 into apache:master Nov 28, 2022
@xushiyan xushiyan deleted the HUDI-4209-remove-hdfs branch November 28, 2022 16:38
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:major degraded perf; unable to move forward; potential bugs
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants