
Difference between experimental and backport APIs #36

Open
BigRedT opened this issue Aug 17, 2015 · 15 comments


BigRedT commented Aug 17, 2015

Hi,

Thanks for the contribution!

It would be great if you (or anybody in the community) could provide a summary of the major differences between the APIs in the 2 branches. For instance, some of the commits in the experimental branch have been marked as unstable, and while backport looks much cleaner, no one has touched it for a year. Also, do you have a version that does both tracking and mapping? Since testprog does not use Tracking, it's hard to know the status of the Tracking part.

I also have a blog post on DTAM which might be of interest to this community:
http://ahumaninmachinesworld.blogspot.com/2015/07/dtam-dense-tracking-and-mapping.html

Thanks,
Tanmay


BigRedT commented Aug 28, 2015

Hi,

I wrote code for tracking and mapping using your experimental branch APIs, and the code works perfectly for the synthetic data that you read using loadAhanda. However, when I try to run it on my own video I get black screens. I obtain the camera matrix for my camera with OpenCV's camera calibration. Since I did not have camera poses for my video, I pre-compute them for the first 20 frames using bundle adjustment implemented with Google's Ceres Solver (the point cloud visualization looks good, so the camera pose estimates and camera matrix are most likely correct). This gives me a 3x3 rotation matrix R and a 3x1 translation vector T such that the mapping from a world coordinate x to the camera coordinate x_cam is given by:
x_cam = R*x + T
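
(In code, the convention above looks like the following; just a minimal sketch with illustrative names, written with OpenCV, in case a camera-to-world pose is needed somewhere. It is not taken from OpenDTAM.)

#include <opencv2/core.hpp>

// World-to-camera convention: x_cam = R*x + T.
// The camera-to-world rotation and the camera center in world coordinates
// follow from R being orthonormal.
void invertWorldToCameraPose(const cv::Mat& R, const cv::Mat& T,
                             cv::Mat& R_cw, cv::Mat& C)
{
    R_cw = R.t();          // inverse of a rotation is its transpose
    C    = -R.t() * T;     // camera center: setting x_cam = 0 gives x = -R^T * T
}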

Are you making any other assumptions about the input? My videos are not of entire rooms but of objects placed on a table, with the camera about 1-2 feet from the object. For this reason I set 1/near = 0.03 and 1/far = 0.015, but the output inverse depthmap only contains 0.015. I have tried smaller values for 1/far, but the output is always a matrix of 1/far values.

I also saw that you are rescaling the image pixel values to lie between 0+1/255 and 1+1/255. I did not understand this, but I tried it nevertheless. However, I still get black screens.
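
(For the record, the rescaling I tried is simply the following; a sketch assuming an 8-bit input, written with OpenCV.)

#include <opencv2/core.hpp>

// Map 8-bit pixel values [0, 255] to value/255 + 1/255, i.e. the range
// [0 + 1/255, 1 + 1/255] mentioned above.
cv::Mat rescaleTo1Over255Range(const cv::Mat& img8u)
{
    cv::Mat f;
    img8u.convertTo(f, CV_32F, 1.0 / 255.0, 1.0 / 255.0);
    return f;
}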

I can share my main .cpp file if you would be willing to take a look. It would be great if you could help me debug.

Thanks,
Tanmay

@XiaoshuiHuang

Hi Tanmay,
I ran into the same problem as you. I also wonder how the tracking part works.
I have done the following tests:
(1) I deleted all the other calibration txt files except for the first 20; it works well with tracking.
(2) When I use my own video or camera, I get black screens too. I calibrated my camera's intrinsic and extrinsic matrices using PTAM. Because the computed parameters are almost zero, I initialized the camera rotation matrix as R=[1,0,0;0,1,0;0,0,1] and T=[0,0,0]. Since the paper says 'Given a dense model consisting of one or more keyframes, we can synthesise realistic novel views over wide baselines by projecting the entire model into a virtual camera', I initialized both camera poses to the same values. So I got the black results.

Hi Anuranbaka,
@anuranbaka My questions are: (1) How do you calibrate the camera to get so many parameters, such as the upvector? (2) Does the tracking part work as well as presented in the original paper?
Thank you.

BR
Xiaoshui


BigRedT commented Aug 31, 2015

Hi Xiaoshui,

My best hypothesis so far is that the synthetic image sequence on which the code works fine has significant camera motion between consecutive frames. In our videos, however, due to the high frame rate, 10-20 consecutive frames are probably not that different. So to DTAM it probably looks like there is just one frame, and it fails to produce a depthmap. All subsequent failures/black screens are probably because this initial depthmap wasn't created. Over this week I am going to try to verify this hypothesis. I will post updates here, and I request you to do the same so that we can all benefit from it.

If this indeed turns out to be the case, then one possible solution would be to compute camera poses for 50-100 frames using SfM and use all of those to initialize DTAM, instead of using just the first 10 or 20.
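
(A quick sanity check along these lines, assuming the same x_cam = R*x + T convention as before; just an illustrative sketch: the baseline between two frames should not be negligible compared to the scene depth.)

#include <opencv2/core.hpp>

// Distance between the camera centers of two frames, given their
// world-to-camera poses (R1, T1) and (R2, T2).
double baseline(const cv::Mat& R1, const cv::Mat& T1,
                const cv::Mat& R2, const cv::Mat& T2)
{
    cv::Mat C1 = -R1.t() * T1;   // camera centers in world coordinates
    cv::Mat C2 = -R2.t() * T2;
    return cv::norm(C1 - C2);    // baseline length
}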

By the way, what implementation of PTAM are you using? Does it work straight out of the box, or did you have to tweak the code to get it to work?

To answer your first question: the images are most likely synthetically generated using OpenGL or something similar. The camera parameters were therefore specified first and the images rendered using those camera parameters, rather than the other way round.
http://www.doc.ic.ac.uk/~ahanda/VaFRIC/

-Tanmay

@XiaoshuiHuang

Hi Tanmay,
Yes, I agree with you. The initial depth map is not computed correctly. I will verify it by feeding in accurate camera poses instead of R=[1,0,0;0,1,0;0,0,1] and T=[0,0,0]. Also, I have printed out the R and T of the example sequence; they are not like my initial R=[1,0,0;0,1,0;0,0,1] and T=[0,0,0]. So maybe by digging into the first depth map we can fix our problem.

For PTAM, the newest one is a GPL version and the build scripts have been deleted. I built it using a previous version: http://www.robots.ox.ac.uk/~gk/PTAM/download.html . Following the readme, it is not difficult to compile.

By the way, SVO is a recent work that is more robust than PTAM: https://github.com/uzh-rpg/rpg_svo . If you are interested, you can try it instead. I have compiled SVO too, but I don't know how to obtain the R and T. In the test_pipeline example it can obtain the camera pose, but I don't know what it actually means. Anyway, this is another choice apart from PTAM. I will work on the first depth map computation in the following days.

Thanks for your information and wish us success.

BR
Xiaoshui


BigRedT commented Sep 1, 2015

I noticed that in the convertAhandaPovRayToStandard function the rotation matrix is constructed as:
R.row(0)=Mat(direction.cross(upvector)).t();
R.row(1)=Mat(-upvector).t();
R.row(2)=Mat(direction).t();

While R.row(0) and R.row(2) look correct, I don't understand why R.row(1) is not Mat(R.row(0).cross(R.row(2))).t(). It also seems wrong because R.row(1) and R.row(2) may not be orthogonal the way they are being set.
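
(As a quick sanity check for any rotation matrix built this way; just a sketch, not part of OpenDTAM: R should satisfy R*R^T = I and det(R) = +1.)

#include <opencv2/core.hpp>
#include <cmath>

// Returns true if R is (numerically) a proper rotation matrix.
bool isRotation(const cv::Mat& R, double tol = 1e-6)
{
    cv::Mat RRt = R * R.t();
    cv::Mat I   = cv::Mat::eye(3, 3, R.type());
    return cv::norm(RRt, I) < tol                      // orthonormal rows/columns
        && std::abs(cv::determinant(R) - 1.0) < tol;   // right-handed, no reflection
}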

@XiaoshuiHuang

Hi Tanmay and Paul,
I have done many experiments on my own video these days. I find that the camera pose directly influences the cost volume before optimization. Sometimes the initial cost volume is very bad; do you think this bad cost volume comes from the cost volume computation?
https://drive.google.com/file/d/0B4tp_cfWpm9QNnVxQzBEV1RfRms/view?usp=sharing

Thanks
Xiaoshui


BigRedT commented Sep 11, 2015

I actually abandoned OpenDTAM and am now using LSD-SLAM to get the camera poses. I then wrote my own dense depthmap generation code along the lines of DTAM, but one that is easier to optimize and produces good results. I plan to release my code once we have a publication ready for submission.

Thanks,
Tanmay


@anuranbaka
Owner

Hi all, been busy pushing for the ICRA deadline, so been a bit out of it. I'll try to briefly answer these questions.
1."It would be great if you (or anybody in the community) could provide a summary of the major differences between the APIs in the 2 branches. "
I no longer remember the differences that well, but experimental has less bugs(because they've been patched) and more cruft(because I kept trying new things). Most dev is stalled, because the rest of my life is in the way, and I think it is only a matter of time until someone significantly outperforms the original dtam algo, so not worth much for research.
2. @BigRedT The problems you're talking with needing enough motion are correct. It cannot optimize correctly without enough baseline. The convertAhandaPovRayToStandard function is correct, the problem being that POV-Ray's internal representation of things like the up vector is not what you would think. I spent a long time puzzling that one out. Main point is only use that input method for actual POV Ray data dumps.
3. @XiaoshuiHuang You are correct. DTAM is fundamentally an optimization algorithm. Garbage in -> Garbage out.
4. As a general rule, LSD-SLAM is much better at tracking than DTAM. They use the same method, and DTAM is more robust to things like focus change, but LSD-SLAM has much better choices of data to work with in the usual case. I would like to see LSD-SLAM tracking replace the DTAM tracker, but I don't have time right now for that.

@anuranbaka
Owner

As a side note, I think in general the best way for people to use OpenDTAM is probably to take the weighted Huber denoising and possibly the cost volume projection parts (which are really the core of the mapping part of DTAM, and highly optimized in this code) and combine them with someone else's tracking. DTAM tracking is robust, but inaccurate and hard to start.
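
(For reference, and going from memory of the original DTAM paper, so double-check the paper itself: the mapping energy that this weighted Huber denoising helps minimize is roughly

E(\xi) = \int_{\Omega} \left\{ g(u)\,\lVert \nabla \xi(u) \rVert_{\epsilon} + \lambda\, C\big(u, \xi(u)\big) \right\} du

where \xi(u) is the inverse depth at pixel u, C is the photometric cost volume, \lVert \cdot \rVert_{\epsilon} is the Huber norm, and g(u) is a per-pixel weight that is small at strong image gradients. The denoiser handles the first term; the cost volume supplies the second.)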


johnny871227 commented May 18, 2016

Hi Tanmay @BigRedT ,
Just wondering if you got your work published? If yes, could you please share a bit about the dense depthmap generation?

Thanks,
Johnny


BigRedT commented Jun 1, 2016

Hi Johnny,

@johnny871227 Unfortunately the paper (of which dense depth estimation was a part) wasn't accepted at Siggraph, and I started working on a different project in the meantime. Shoot me an email with details of what exactly you want to know, and I might be able to provide you with more information.

Apologies for the late reply.
-Tanmay

@johnny871227

Hi Tanmay,

@BigRedT I sent an email to your Gmail account. In case you have not received it, I'll just copy it here:
Just wondering, have you got any dense depth maps using your own data? I know the given example (Trajectory_30_seconds) looks good, but is there anything I should pay attention to if I'm going to process my own data?

Thanks,
Johnny


BigRedT commented Jun 3, 2016

Hi @johnny871227,

I replied to you. But since it's a very general question, I will also answer here for the benefit of others who might be interested and to facilitate further discussion. I have worked with my own implementation, not the one provided here; it is a variant of the DTAM approach with some simplifications.

My observations are the following:

  • It works well only for objects close to the camera (due to the nonlinear depth discretization involved in creating the cost volume; see the sketch below)
  • Make sure your camera pose estimates are sufficiently accurate and that you have enough neighboring frames (~30) to create an accurate cost volume
  • If you have a sparse point cloud you can incorporate that into the optimization as well to make it better behaved
  • It has trouble in low texture regions or areas with high specular reflection

My particular implementation also had other parameters that controlled the granularity and range of the depth discretization, the coupling weight, and spatially varying weights for the cost volume and smoothness terms. This required a significant amount of parameter tuning.
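
(To illustrate the depth discretization point from the first bullet; a simplified sketch, not my actual implementation: sampling the cost volume uniformly in inverse depth packs the depth planes densely near the camera and sparsely far away, which is why close-up objects work best.)

#include <vector>

// Uniform samples in inverse depth between 1/far and 1/near.
// The corresponding depths 1/sample are spaced nonuniformly:
// fine near the camera, coarse far away.
std::vector<double> inverseDepthSamples(double invFar, double invNear, int layers)
{
    std::vector<double> samples(layers);
    double step = (invNear - invFar) / (layers - 1);
    for (int i = 0; i < layers; ++i)
        samples[i] = invFar + i * step;   // inverse depth of layer i
    return samples;
}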

I hope this was useful!


BigRedT commented Jun 18, 2016

Hi @XiaoshuiHuang, @johnny871227, @anuranbaka ,

I have uploaded my paper on arxiv in case anybody is interested in the modifications that I made to DTAM:
https://arxiv.org/abs/1606.05002

A video with results can be found here:
https://youtu.be/qP_uLYYBi70

Feedback is welcome!

-Tanmay

@johnny871227

Hi @BigRedT , thanks for the paper!
I'll look into it and let you know.

Cheers,
Johnny
