HBASE-22877 WebHDFS based export snapshot will fail if hfile is in archive directory #509
…chive directory. FileLink.tryOpen() depends on fs.open(path, bufferSize) throwing FileNotFoundException in order to try the next file location. But when we use WebHDFS, no exception is thrown even when the file doesn't exist. We should add additional code to handle the WebHDFS case.
Does the patch try to fail early, or does it try to support WebHDFS-based snapshot export?
@jojochuang
💔 -1 overall
This message was automatically generated.
@VicoWu Could you elaborate on why you think it doesn't throw FileNotFoundException? In the stack trace you pasted, it did throw FileNotFoundException, but it was wrapped in a RemoteException. You just need to unwrap the RemoteException to see the underlying exception. Maybe I am missing something; please correct me if I am wrong.
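The unwrapping step described above can be sketched without a Hadoop dependency. The class below is a hypothetical stand-in for org.apache.hadoop.ipc.RemoteException (whose real unwrapRemoteException(Class<?>...) method serves this purpose): the server-side exception arrives only as a class name plus a message, so a catch (FileNotFoundException e) clause never fires until the caller rebuilds the typed exception. WireRemoteException, unwrap and UnwrapDemo are illustrative names, not HBase or Hadoop code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.lang.reflect.Constructor;

/**
 * Hypothetical stand-in for a remote-exception wrapper: the server-side
 * exception is carried only as a class name and a message.
 */
class WireRemoteException extends IOException {
    private final String className;

    WireRemoteException(String className, String message) {
        super(message);
        this.className = className;
    }

    String getClassName() {
        return className;
    }

    /**
     * Rebuilds the wrapped exception if its class name matches one of the
     * lookup types; otherwise returns this wrapper unchanged.
     */
    @SafeVarargs
    final IOException unwrap(Class<? extends IOException>... lookupTypes) {
        for (Class<? extends IOException> type : lookupTypes) {
            if (!type.getName().equals(className)) {
                continue;
            }
            try {
                // Assumes the target type has a public (String message) constructor.
                Constructor<? extends IOException> ctor = type.getConstructor(String.class);
                IOException unwrapped = ctor.newInstance(getMessage());
                unwrapped.initCause(this); // keep the wrapper for debugging
                return unwrapped;
            } catch (ReflectiveOperationException e) {
                return this; // could not rebuild it; fall back to the wrapper
            }
        }
        return this;
    }
}

class UnwrapDemo {
    /** True if e is, or wraps, a FileNotFoundException. */
    static boolean isMissingFile(IOException e) {
        IOException actual = e;
        if (e instanceof WireRemoteException) {
            actual = ((WireRemoteException) e).unwrap(FileNotFoundException.class);
        }
        return actual instanceof FileNotFoundException;
    }
}
```

A caller that only checks e instanceof FileNotFoundException on the raw wrapper would conclude the file exists elsewhere or that some other I/O error occurred; unwrapping first restores the type-based fallback.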
Interesting to learn about this WebHDFS behavior.
@jojochuang
🎊 +1 overall
This message was automatically generated.
I am confused about whether you are talking about WebHdfsFileSystem or HttpFSFileSystem. These two are different; see the class org.apache.hadoop.fs.http.client.HttpFSFileSystem for the latter. The rest of this comment assumes you mean WebHdfsFileSystem, since you mentioned it multiple times.
This is not correct. When you call WebHdfsFileSystem#open(), it does create an HTTP connection to the NameNode and gets the list of DataNodes where the file's blocks reside. When you call read on the input stream, it goes directly to the DataNode.
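Whether open() contacts the server or defers that round trip to the first read decides where a missing file is reported, which is exactly what this exchange turns on. The sketch below models both behaviours against a hypothetical in-memory store; FakeRemoteStore, openEager and openLazy are illustrative names, not the WebHdfsFileSystem API, and the sketch makes no claim about which behaviour a particular Hadoop or CDH release has.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

/** Hypothetical in-memory store standing in for a remote namespace. */
class FakeRemoteStore {
    private final Map<String, byte[]> files;

    FakeRemoteStore(Map<String, byte[]> files) {
        this.files = files;
    }

    /** Simulates the metadata round trip (e.g. asking for block locations). */
    InputStream fetch(String path) throws FileNotFoundException {
        byte[] data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return new ByteArrayInputStream(data);
    }

    /** Eager open: contacts the store immediately, so a missing file fails here. */
    InputStream openEager(String path) throws IOException {
        return fetch(path);
    }

    /** Lazy open: returns at once and defers the round trip to the first read. */
    InputStream openLazy(String path) {
        return new InputStream() {
            private InputStream delegate;

            private InputStream delegate() throws IOException {
                if (delegate == null) {
                    delegate = fetch(path); // a missing file is only reported here
                }
                return delegate;
            }

            @Override
            public int read() throws IOException {
                return delegate().read();
            }

            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return delegate().read(b, off, len);
            }

            @Override
            public void close() throws IOException {
                if (delegate != null) {
                    delegate.close();
                }
            }
        };
    }
}
```

With the lazy variant, code that relies on open() throwing FileNotFoundException never sees the exception at open time; it only surfaces later, when reading.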
@shahrs87
Yes, I previously made a mistake about the differences between
To put it simply, in my code base
And this patch has been applied only to
So that's why your experiment cannot reproduce this problem and why you think the
So that's my investigation of this problem; I think everything is much clearer now.
This bug does not exist in branch-2.6.0 either.
@shahrs87
And I found the first occurrence of the
OK. Really good back and forth here, with good info in the dialog above. Seems like a CDH5 issue. Closing. Shout if I have it wrong.
The corresponding issue: HBASE-22877

The problem happens in the method tryOpen: FileLink.tryOpen() depends on fs.open(path, bufferSize) throwing FileNotFoundException in order to try the next file location. With traditional HDFS, fs is an instance of DistributedFileSystem or ViewFileSystem, but with WebHDFS, fs is an instance of WebHdfsFileSystem. In that case, fs.open(path, bufferSize) throws no exception even when the file doesn't exist. As a result, ExportMapper assumes the HFile exists in the tmp directory, and the FileLink mistakenly uses that directory as the HFile location. Finally, when the mapper task tries to read the HFile, it finds that the file doesn't actually exist, and the mapper fails.

We should add additional code to handle the WebHDFS case.
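The fallback logic described above, plus one possible guard for a client whose open() does not throw, can be sketched as follows. SimpleFs, LinkOpener and the verifyExists flag are hypothetical; this is not HBase's FileLink implementation, nor necessarily the change made in this PR. The guard asks exists() before opening each candidate, at the cost of one extra metadata call per location.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

/** Hypothetical minimal filesystem abstraction (not Hadoop's FileSystem). */
interface SimpleFs {
    /** May or may not contact the server; a lazy client never throws here. */
    InputStream open(String path) throws IOException;

    boolean exists(String path) throws IOException;
}

class LinkOpener {
    /**
     * Returns the first location that opens successfully, moving on to the
     * next one whenever a FileNotFoundException is raised. With verifyExists
     * set, a location is also skipped when exists() reports it missing, which
     * guards against clients whose open() never throws.
     */
    static String tryOpen(SimpleFs fs, List<String> locations, boolean verifyExists)
            throws IOException {
        for (String location : locations) {
            try {
                if (verifyExists && !fs.exists(location)) {
                    continue; // missing here; try the next candidate
                }
                // The sketch only reports which location won, so close the stream.
                fs.open(location).close();
                return location;
            } catch (FileNotFoundException e) {
                // The file is not at this location (e.g. it moved to the archive); try the next.
            }
        }
        throw new FileNotFoundException("No existing location among " + locations);
    }
}
```

Without the guard, a lazily opening client makes tryOpen return the first candidate even when the file lives only in the archive location. An alternative guard would be to force the first read inside the try block, so that a deferred FileNotFoundException surfaces while the next location can still be tried.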