MAPREDUCE-6096. SummarizedJob Class Improvement #1
base: trunk
Conversation
New file of AltFileInputStream.java to replace FileInputStream.java in apache/hadoop/HDFS
move tensorflow on yarn to the proper module
merge TensorFlow-YARN from zhankun's branch
New, improved Python script
This adds a new type of namenode: the observer. An observer is like a standby NN (in fact they share most of the code), except that it participates in neither NN failover (i.e., it is not part of the HA pair) nor checkpointing.

An observer is specified through configuration. First, it needs to be added to the config dfs.ha.namenodes, just like a normal namenode, together with other configs such as dfs.namenode.rpc-address, dfs.namenode.http-address, etc. Second, it needs to be listed in a new config, dfs.ha.observer.namenodes, which differentiates it from the ordinary active/standby namenodes.

An observer can be used to serve read-only requests from HDFS clients when the following two conditions are satisfied (see the configuration sketch below):
1. the config dfs.client.failover.proxy.provider.<nameservice> is set to org.apache.hadoop.hdfs.server.namenode.ha.StaleReadProxyProvider, and
2. the config dfs.client.enable.stale-read is set to true.

This also changes the way edit logs are loaded on the standby/observer NNs. Instead of loading them all at once, the new implementation loads them one batch at a time (default batch size is 10K edits) over multiple iterations, waiting a short amount of time between iterations (default waiting time is 100ms). This ensures the global lock is not held too long while loading edits; otherwise, RPC processing time would suffer.

This patch does not include a mechanism for clients to bound staleness using the journal transaction ID: excluding this allows us to deploy the observer more easily. More specifically, the deployment involves:
1. restarting all datanodes with the updated configs; no binary change on datanodes is required.
2. bootstrapping and starting the observer namenode with the updated configs; existing namenodes do not need to change.

Future tasks:
1. Allow clients to set a bound on observer staleness in terms of time (e.g., 2 min). If for some reason the lag in edit tailing exceeds the bound, the client-side proxy provider will fail all RPCs over to the active namenode.
2. Use the journal transaction ID to enforce a bound on staleness. This can be embedded in the RPC header.
3. Allow a new standby/observer to be deployed without a datanode restart.
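A minimal sketch of these settings using Hadoop's Configuration API; the nameservice name ("mycluster"), the namenode IDs ("nn1", "nn2", "obs1"), and the host names are hypothetical placeholders, while the config keys and the proxy provider class are the ones named above:

```java
import org.apache.hadoop.conf.Configuration;

public class ObserverConfigSketch {
  public static Configuration observerConf() {
    Configuration conf = new Configuration();
    // The observer is listed alongside the normal namenodes...
    conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2,obs1");
    conf.set("dfs.namenode.rpc-address.mycluster.obs1", "observer-host:8020");
    conf.set("dfs.namenode.http-address.mycluster.obs1", "observer-host:9870");
    // ...and additionally marked as an observer by the new config key.
    conf.set("dfs.ha.observer.namenodes.mycluster", "obs1");
    // Client-side settings that enable stale (read-only) reads from the observer.
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.StaleReadProxyProvider");
    conf.setBoolean("dfs.client.enable.stale-read", true);
    return conf;
  }
}
```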
Update image names/tags in scripts
💔 -1 overall
This message was automatically generated.
adjust bpid format for HW Pacific storage
NULL tuples cause an NPE when writing
…ng/binary column trunk are null

In case of all nulls in a binary column, the statistics object read from the file metadata is empty, and it should return true for the all-nulls check on the column. Even if a column has no values, it can be ignored. The other way is to fix this behaviour in the writer, but is that what we want?

Author: Yash Datta <[email protected]>
Author: Alex Levenson <[email protected]>
Author: Yash Datta <[email protected]>

Closes apache#99 from saucam/npe and squashes the following commits:

5138e44 [Yash Datta] PARQUET-136: Remove unreachable block
b17cd38 [Yash Datta] Revert "PARQUET-161: Trigger tests"
82209e6 [Yash Datta] PARQUET-161: Trigger tests
aab2f81 [Yash Datta] PARQUET-161: Review comments for the test case
2217ee2 [Yash Datta] PARQUET-161: Add a test case for checking the correct statistics info is recorded in case of all nulls in a column
c2f8d6f [Yash Datta] PARQUET-161: Fix the write path to write statistics object in case of only nulls in the column
97bb517 [Yash Datta] Revert "revert TestStatisticsFilter.java"
a06f0d0 [Yash Datta] Merge pull request apache#1 from isnotinvain/alexlevenson/PARQUET-161-136
b1001eb [Alex Levenson] Fix statistics isEmpty, handle more edge cases in statistics filter
0c88be0 [Alex Levenson] revert TestStatisticsFilter.java
1ac9192 [Yash Datta] PARQUET-136: It's better to not filter chunks for which an empty statistics object is returned. Empty statistics can be read in case of 1. pre-statistics files, 2. files written from a current writer that has a bug, as it does not write the statistics if the column has all nulls
e5e924e [Yash Datta] Revert "PARQUET-136: In case of all nulls in a binary column, statistics object read from file metadata is empty, and should return true for all nulls check for the column"
8cc5106 [Yash Datta] Revert "PARQUET-136: fix hasNulls to cater to the case where all values are nulls"
c7c126f [Yash Datta] PARQUET-136: fix hasNulls to cater to the case where all values are nulls
974a22b [Yash Datta] PARQUET-136: In case of all nulls in a binary column, statistics object read from file metadata is empty, and should return true for all nulls check for the column
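As a rough illustration of the reader-side behavior these commits converge on (a hypothetical helper, not the actual parquet-mr StatisticsFilter code; `Statistics.isEmpty()` and `getNumNulls()` are real parquet-mr accessors, here under the current package name):

```java
import org.apache.parquet.column.statistics.Statistics;

public class StatsFilterSketch {
  /**
   * Decide whether a row-group chunk can safely be dropped for an
   * "is not null" predicate. Returns false ("might match") whenever the
   * statistics are empty, e.g. for pre-statistics files or files written
   * by a writer that skipped statistics for all-null binary columns.
   */
  public static boolean canDropForIsNotNull(Statistics<?> stats, long valueCount) {
    if (stats.isEmpty()) {
      // No reliable information: never filter the chunk.
      return false;
    }
    // All values are null: nothing in this chunk can match "is not null".
    return stats.getNumNulls() == valueCount;
  }
}
```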
…thod

Author: Alex Levenson <[email protected]>
Author: Konstantin Shaposhnikov <[email protected]>
Author: kostya-sh <[email protected]>

Closes apache#171 from kostya-sh/PARQUET-246 and squashes the following commits:

75950c5 [kostya-sh] Merge pull request apache#1 from isnotinvain/PR-171
a718309 [Konstantin Shaposhnikov] Merge remote-tracking branch 'refs/remotes/origin/master' into PARQUET-246
0367588 [Alex Levenson] Add regression test for PR-171
94e8fda [Alex Levenson] Merge branch 'master' into PR-171
0a9ac9f [Konstantin Shaposhnikov] [PARQUET-246] bugfix: reset all DeltaByteArrayWriter state in reset() method
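The essence of the fix, sketched with a hypothetical simplified writer rather than the real DeltaByteArrayWriter: reset() must clear every piece of state, including the previously seen value used to compute prefix lengths, not just the collected output:

```java
// Hypothetical, simplified delta-byte-array writer illustrating the
// PARQUET-246 fix: if reset() forgets to clear the remembered previous
// value, the next page's prefix lengths are computed against stale data.
public class SimpleDeltaByteArrayWriter {
  private final java.util.List<Integer> prefixLengths = new java.util.ArrayList<>();
  private final java.util.List<byte[]> suffixes = new java.util.ArrayList<>();
  private byte[] previous = new byte[0];

  public void write(byte[] value) {
    int prefix = commonPrefixLength(previous, value);
    prefixLengths.add(prefix);
    suffixes.add(java.util.Arrays.copyOfRange(value, prefix, value.length));
    previous = value;
  }

  public void reset() {
    prefixLengths.clear();
    suffixes.clear();
    previous = new byte[0]; // the crucial line: forget the last value too
  }

  private static int commonPrefixLength(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) i++;
    return i;
  }
}
```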
In response to PARQUET-251, this creates an integration test that generates random values and compares the statistics against the values read back from a Parquet file. There are two utility classes, `DataGenerationContext` and `RandomValueGenerators`, which are located in the same package as the unit test. I'm sure there is a better place to put these, but I leave that to your discretion. Thanks, Reuben

Author: Reuben Kuhnert <[email protected]>
Author: Ryan Blue <[email protected]>

Closes apache#255 from sircodesalotOfTheRound/stats-validation and squashes the following commits:

680e96a [Reuben Kuhnert] Merge pull request apache#1 from rdblue/PARQUET-355-stats-validation-tests
9f0033f [Ryan Blue] PARQUET-355: Use ColumnReaderImpl.
7d0b4fe [Reuben Kuhnert] PARQUET-355: Add Statistics Validation Test
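A hypothetical outline of the round-trip idea (the real test writes an actual Parquet file via the helpers above and reads the statistics back from the footer; every name here is a stand-in):

```java
// Sketch: generate random values, track the expected min/max/null-count by
// hand, and assert that a statistics accumulator (standing in for the
// statistics parquet-mr records in the file footer) agrees.
import java.util.Random;

public class StatsValidationSketch {
  static final class LongStats { // stand-in for parquet's statistics object
    long min = Long.MAX_VALUE, max = Long.MIN_VALUE, numNulls = 0;
    void update(Long v) {
      if (v == null) { numNulls++; return; }
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
  }

  public static void main(String[] args) {
    Random rnd = new Random(42);
    LongStats recorded = new LongStats(); // what the "writer" tracks
    long expMin = Long.MAX_VALUE, expMax = Long.MIN_VALUE, expNulls = 0;
    for (int i = 0; i < 10_000; i++) {
      Long v = rnd.nextInt(10) == 0 ? null : (long) rnd.nextInt(1_000_000);
      recorded.update(v);
      if (v == null) expNulls++;
      else { expMin = Math.min(expMin, v); expMax = Math.max(expMax, v); }
    }
    if (recorded.min != expMin || recorded.max != expMax || recorded.numNulls != expNulls) {
      throw new AssertionError("statistics do not match generated data");
    }
    System.out.println("statistics match generated data");
  }
}
```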
https://issues.apache.org/jira/browse/MAPREDUCE-6096
The SummarizedJob class should be improved.
When I parse a job history file (e.g. job_1408862281971_489761-1410883171851_XXX.jhist), I use the org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser class from Hadoop's map-reduce-client-core module together with HistoryViewer$SummarizedJob,
and it throws an exception like:

```
Exception in thread "pool-1-thread-1" java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer$SummarizedJob.<init>(HistoryViewer.java:626)
    at com.jd.hadoop.log.parse.ParseLogService.getJobDetail(ParseLogService.java:70)
```

After looking at the SummarizedJob class, I found that attempt.getTaskStatus() can be NULL,
so I changed the order of attempt.getTaskStatus().equals(TaskStatus.State.FAILED.toString()) to
TaskStatus.State.FAILED.toString().equals(attempt.getTaskStatus()),
and it works well.
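The fix in miniature: invoking equals() on the constant side is null-safe. A minimal sketch; the class and method names are illustrative, with `taskStatus` standing in for attempt.getTaskStatus():

```java
import org.apache.hadoop.mapred.TaskStatus;

public class NullSafeStatusCheck {
  static boolean isFailed(String taskStatus) {
    // Before the fix: taskStatus.equals(TaskStatus.State.FAILED.toString())
    // throws a NullPointerException when taskStatus is null.
    // After the fix: calling equals() on the non-null constant simply
    // returns false for a null status.
    return TaskStatus.State.FAILED.toString().equals(taskStatus);
  }
}
```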