-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-4209] Avoid using HDFS in HoodieClientTestHarness #7305
Conversation
@@ -50,16 +48,7 @@ | |||
@SuppressWarnings("unchecked") | |||
public class TestHoodieReadClient extends HoodieClientTestBase { | |||
|
|||
@Override | |||
protected void initPath() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer to use the one in super class so remove this method in subclass
...client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestUpdateSchemaEvolution.java
Show resolved
Hide resolved
*/ | ||
|
||
/** | ||
* @Test public void testLeaseRecovery() throws IOException, URISyntaxException, InterruptedException { Writer writer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as the comment mentions, this testcase is not as the name suggests, also the logic is covered in other test case, hence removing it
/** | ||
* A utility class about mini cluster. | ||
*/ | ||
public class MiniClusterUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discard this as it bundles hdfs with zookeeper, which should be discouraged for UT/FT. testcase which needs hdfs will use HdfsTestService
where applicable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job on the patch man. guess we should be careful in such (set ups) test base classes. If not, devs will start using these base classes blindly and might end up like this. I am sure these will bring down out test run time a lot.
couple of minor comments.
@@ -102,28 +127,28 @@ public void readLocalWriteHDFS() throws Exception { | |||
// Write generated data to hdfs (only inserts) | |||
String readCommitTime = hdfsWriteClient.startCommit(); | |||
LOG.info("Starting commit " + readCommitTime); | |||
List<HoodieRecord> records = dataGen.generateInserts(readCommitTime, 100); | |||
JavaRDD<HoodieRecord> writeRecords = jsc.parallelize(records, 1); | |||
List<HoodieRecord> records = dataGen.generateInserts(readCommitTime, 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some of the test assumes that records will span across all 3 partition paths (HoodieTestDataGenerator). if we reduce the num of records from 100 to 10, chances we might see some flakiness. something to keep in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test case does not depend on all partitions having records. so i've only changed this one
@@ -89,7 +88,7 @@ public void tearDown() throws Exception { | |||
} | |||
|
|||
@ParameterizedTest | |||
@ValueSource(booleans = {true}) | |||
@ValueSource(booleans = {false, true}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we know why this was not having "false" before. may be we had some reason to remove it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
false
case was removed in #6313
not sure exactly why. but it's passing now.
throw new HoodieIOException(ioe.getMessage(), ioe); | ||
} | ||
} | ||
private static final int PARALLELISM = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was wondering if this can be paramterized as a maven option? Would be helpful when we run tests on bigger cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codope i've filed this https://issues.apache.org/jira/browse/HUDI-5287
in principle we should make UT/FT only run against small number of records and runnable even on cheap machines. So we should be able to keep parallelism low and fixed. We only make num records and parallelism configurable when running ITs with varying number of records.
69abf45
to
10fb959
Compare
Change Logs
Remove HDFS usage from
HoodieClientTestHarness
and subclasses.Impact
Improve CI test stability and reduce run time.
Risk level
None.
Documentation Update
NA
Contributor's checklist