weird result on Trajectory for Variable Frame-Rate dataset #51

Open

ShuangLiu1992 opened this issue Oct 2, 2016 · 13 comments

@ShuangLiu1992

[image: screenshot from 2016-10-02 15-58-26]

the test data is downloaded from https://www.doc.ic.ac.uk/~ahanda/VaFRIC/test_datasets.html.

It seems the program is producing a weird result on the computer monitor, because it is all black and the cost/correspondence there is ambiguous. Is there some quick fix to penalise large depth discrepancies when there isn't enough confidence to support them?

@ShuangLiu1992
Author

Also, could somebody please shed some light on how to perform the Newton step with the data structures of openDTAM to get subpixel-level results?

@anuranbaka
Owner

Regarding your first question, we already penalize depth discrepancy when
there is no other information: that is the AGd term in Eq. 11. The problem
is that specular shine provides bad information; this is one of the
fundamental difficulties of DTAM and similar pixel-centric approaches. Even
when Newcombe demonstrated it in person, computer monitors would blow out.
It is hard to fix the problem, because the virtual image of the reflected
light in the monitor is higher magnitude than many real features.
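
For reference, here is a minimal sketch of the gradient-based weighting idea that term relies on, assuming the usual DTAM form g(u) = exp(-alpha * ||grad I(u)||^beta); the alpha/beta values and the helper name are illustrative, not what OpenDTAM actually uses:

#include <opencv2/opencv.hpp>

// Per-pixel regularizer weight: ~1 in flat regions, -> 0 at strong image edges.
// alpha and beta are illustrative defaults, not OpenDTAM's actual settings.
cv::Mat gradientWeights(const cv::Mat& gray32f, float alpha = 3.5f, float beta = 1.0f)
{
    cv::Mat gx, gy, mag, g;
    cv::Sobel(gray32f, gx, CV_32F, 1, 0, 3, 1.0 / 8.0);  // dI/dx
    cv::Sobel(gray32f, gy, CV_32F, 0, 1, 3, 1.0 / 8.0);  // dI/dy
    cv::magnitude(gx, gy, mag);                          // ||grad I||
    cv::pow(mag, beta, mag);                             // ||grad I||^beta
    cv::exp(-alpha * mag, g);                            // g = exp(-alpha * ||grad I||^beta)
    return g;
}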

For the second question: the Newton step should already be included in the
A-step update; that is why it keeps track of the values around the minimum
and then produces a non-integer step. It is solving for the minimum of the
best-fit parabola. There is still some quantization-related error, though. I
have found this is actually smaller on real video, I assume because the focus
of most real cameras is worse (more Gaussian) than the VaFRIC data.

AFAIK the only part of DTAM not implemented is the accelerated exhaustive
search. You could do this with some fairly simple math during the A step
update, but I just didn't get around to it.
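
To make the parabola idea concrete, here is a minimal sketch of the subpixel refinement around a per-pixel cost minimum; this is illustrative host-side code, not the actual A-step CUDA kernel:

#include <vector>

// cost: sampled cost values over the inverse-depth layers for one pixel.
// zMin: integer layer index with the lowest cost.
// Returns the vertex of the parabola fit through layers (zMin-1, zMin, zMin+1).
float subpixelLayer(const std::vector<float>& cost, int zMin)
{
    if (zMin <= 0 || zMin + 1 >= (int)cost.size())
        return (float)zMin;                    // no neighbor on both sides: keep integer result
    float cm = cost[zMin - 1], c0 = cost[zMin], cp = cost[zMin + 1];
    float denom = cm - 2.0f * c0 + cp;         // curvature of the fitted parabola
    if (denom <= 0.0f)
        return (float)zMin;                    // degenerate or non-convex fit
    return zMin + 0.5f * (cm - cp) / denom;    // offset lies in [-0.5, 0.5]
}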


@ShuangLiu1992
Author

ShuangLiu1992 commented Oct 2, 2016

Thank you for your reply!
I also noticed the artefacts on the monitor in the video demo of the DTAM paper, but there doesn't seem to be any specular shine on the monitor in these particular synthetic images provided at the link?

@anuranbaka
Owner

That's odd; I remember the top of the printer specifically having a problem
in that dataset, but I don't remember for sure about the monitor. You could
check it in a 16-bit image editor. If there's actually no specular data
there, then something is seriously wrong in the optimizer.

Also, your solution looks mirrored to me; is there a reason for that?

-Paul


@ShuangLiu1992
Author

ShuangLiu1992 commented Oct 2, 2016

Right now the code in the repo doesn't support OpenCV 3.0, so I rewrote some parts of it to make it compatible with OpenCV 3.0. The reason the solution looks mirrored might be that I flipped it when saving it as .ply or .obj files.

It turns out the weird result might be because I dropped the third parameter when loading the image (by the way, imread(path, -1) no longer does what it is supposed to do in 3.0), since I didn't understand (still don't understand) what it was doing. After adding the third parameter back, the result looks more acceptable now, but the monitor still isn't a very flat surface.

imread(png[imageNumber].string(), cv::IMREAD_UNCHANGED).convertTo(image, CV_32FC3, 1.0 / range, 1.0 / 255);

[image: screenshot from 2016-10-02 21-28-48] https://cloud.githubusercontent.com/assets/11735658/19023497/57c1c8fc-88e7-11e6-820b-fd990a87ffde.png
[image: screenshot from 2016-10-02 21-29-09] https://cloud.githubusercontent.com/assets/11735658/19023498/57c1cdf2-88e7-11e6-8003-87b89a1cf613.png

Also, what's the projection formula in openDTAM? I want to be able to convert other pipelines' camera matrix, rotation matrix, and translation vector to the openDTAM format for testing, e.g. openMVG, openMVS, or SLAM trackers such as ORB-SLAM, LSD-SLAM, etc.

@anuranbaka
Owner

I suspect the 1.0/255 is there to keep totally black regions from matching
the out-of-bounds border fill that the GPU uses when making the cost volume,
i.e. to avoid this specific type of problem. But it has been a long time
since I wrote that, so I'm not sure.

I tried to make the external interfaces for OpenDTAM follow the conventions
for the opencv calib module, described here
http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html.
Internally, it uses the x, y from the keyframe, and inverse depth. The
internal inverse depth is scaled so that the center of the first layer of
voxels corresponds to the far plane (since far<near in inverse depth) and
the center of the last layer corresponds to the near plane. This is all
calculated for you. In the end, you can get the world [x;y;z;w] from
CostVolume.projection.inv()*[col;row;layer;1.0], with the usual perspective
divide.
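
A minimal sketch of that back-projection, assuming projection is stored as a 4x4 CV_64F cv::Mat as in CostVolume (adapt the element type if the actual code differs):

#include <opencv2/opencv.hpp>

// Recover a world point from cost-volume coordinates (col, row, layer):
// world = projection^-1 * [col; row; layer; 1], followed by the perspective divide.
cv::Point3d voxelToWorld(const cv::Mat& projection, double col, double row, double layer)
{
    cv::Mat v = (cv::Mat_<double>(4, 1) << col, row, layer, 1.0);
    cv::Mat w = projection.inv() * v;          // homogeneous world point [x; y; z; w]
    double s = w.at<double>(3);
    return cv::Point3d(w.at<double>(0) / s,
                       w.at<double>(1) / s,
                       w.at<double>(2) / s);
}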


@ShuangLiu1992
Author

ShuangLiu1992 commented Oct 3, 2016

I'm getting weird results from the converted camera matrix. Is the camera matrix applied to the translation vector before it is added to the vertices?

Isn't the projection formula supposed to be:
projection = camera * (rotation * vertex + translation) / z?

It seems that in openDTAM the formula is:
projection = (camera * rotation * vertex + translation) / z?

Say I have a bunch of images and their corresponding camera positions and rotations in the openMVG format, but I don't know near, far, or the depth step; how can I convert the camera parameters to the openDTAM format?

In theory, I can just replace the following code:

// Project the keyframe pixel (xf, yf) through the 4x4 matrix p, without the layer term yet:
float wi = p.data[8] * xf + p.data[9] * yf + p.data[11];
float xi = (p.data[0] * xf + p.data[1] * yf + p.data[3]);
float yi = (p.data[4] * xf + p.data[5] * yf + p.data[7]);
float minv = 1000.0, maxv = 0.0;
float mini = 0;
for (unsigned int z = 0; z < layers; z++) {
    float c0 = cdata[offset + z * layerStep];    // current accumulated cost for this voxel
    float w = hdata[offset + z * layerStep];     // current weight (hit count) for this voxel
    // Add the per-layer (inverse-depth) contribution, then sample the other image:
    float wiz = wi + p.data[10] * z;
    float xiz = xi + p.data[2] * z;
    float yiz = yi + p.data[6] * z;
    float4 c = tex2D(tex, xiz / wiz, yiz / wiz);

in CostVolume.cu with my own projection formula, and the denoiser and optimizer would still work. Is that right?

@anuranbaka
Owner

OpenDTAM does use:
camera * (rotation * vertex + translation) / z <-- this is real z
but only for x and y.

The trick is that the third coordinate of OpenDTAM internally is not real z. It is given by:
OpenDTAM_z = (1/real_z - 1/far) / (1/near - 1/far) * (num_layers - 1). I used a bit of a math trick to get all that to work in the OpenDTAM projection matrix without having to do extra divides for each pixel.

The reason for all this work is that for stereo the real z depth is not a natural measure. The natural measure is 1/z (i.e. the ideal estimator for stereo is heteroskedastic in z, but homoskedastic in 1/z). I then just do a linear transformation on 1/z to make it range over [0, number_of_layers_in_cost_volume - 1].

You can use real z for the projection if you like, but the results are worse if the range of depths being solved for is an appreciable fraction of the distance to the nearest depth (e.g. if you reconstruct things between 9 and 10 m from the camera, then real z will probably work fine, but if you try to reconstruct things from 5 to 20 m, it will be hard to get the whole cost volume to denoise properly and the Newton step will be biased; worse, if you try to use 5 m to infinity, then real z doesn't even make sense).

In the stereo literature they say that disparity, not depth, is the natural measure. Disparity is proportional to 1/depth, so it is basically the same argument as above. OpenMVG and other feature-based approaches don't have to worry about this because they solve for minimal residuals in the image plane rather than in 3-D, which automatically removes the heteroskedasticity issues.

The denoiser is unaffected by all of this. The optimizer will still work with real z, but will produce biased results.
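
A small sketch of that parameterization, mapping between real depth and the internal layer coordinate; the names are illustrative and the near/far/layer-count values come from your own cost volume setup:

// Linear map on inverse depth: layer 0 at the far plane, layer (numLayers - 1) at the near plane.
float realZToLayer(float z, float nearZ, float farZ, int numLayers)
{
    float invRange = 1.0f / nearZ - 1.0f / farZ;                  // > 0 because near < far
    return (1.0f / z - 1.0f / farZ) / invRange * (numLayers - 1);
}

float layerToRealZ(float layer, float nearZ, float farZ, int numLayers)
{
    float invRange = 1.0f / nearZ - 1.0f / farZ;
    float invZ = 1.0f / farZ + layer / (numLayers - 1) * invRange;
    return 1.0f / invZ;                                           // back to metric depth
}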

@ShuangLiu1992
Author

ShuangLiu1992 commented Oct 4, 2016

Thank you so much for your thorough explanation, I will try it. I already have the vertices, rotation, and translation of an object in the scene; is there some way to validate that the combination of my rotation, translation, and camera input is projecting the vertices to the right screen coordinates in openDTAM?

Also, since I only need a not-so-accurate depth map to work with, do you think DTAM will definitely be faster or more suitable than other multi-view stereo algorithms, for example PatchMatch stereo?

Can I email you privately about my idea/use case for openDTAM and discuss some technical details with you? I don't know if you are still working closely with academia, but I'm trying to write a paper on facial SLAM; maybe we could work together?

@anuranbaka
Owner

That sounds interesting. I don't know how much time I have for actual work
on it, but I can certainly give advice.

As for writing a paper, I'm good at editing but very bad at writing
papers (my thoughts don't really go in order), which is a lot of why I'm
not a PhD anymore.

Anyway, you could email me at [email protected] <-words reversed to
avoid bots
-Paul


@ShuangLiu1992
Author

Just emailed you; please let me know if you have received it. Thank you!
-Shuang

@melights

Hi @ShuangLiu1992 ,

Your reconstruction looks amazing. Would it be possible to share with me how you did the 3D reconstruction?

Many thanks,
Melights

@nonlinear1

@ShuangLiu1992
Have you solved your problem?
