
To gain some confidence, I want to implement the camera tracking (an optimization problem) discussed in Semi-Dense Visual Odometry for a Monocular Camera by J. Engel, J. Sturm, and D. Cremers.

$$E(\xi) = \sum_i \frac{\alpha(r_i(\xi))}{\sigma_{d_i}^2}\, r_i(\xi)^2$$

$$r_i(\xi) = I_2(w(x_i, d_i, \xi)) - I_1(x_i)$$
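For concreteness, here is a minimal sketch of how the residual $r_i(\xi)$ might be evaluated with NumPy. It rests on assumptions that are not stated above: a pinhole intrinsic matrix `K`, $\xi$ stored as a 6-vector `(tx, ty, tz, wx, wy, wz)` with an axis-angle rotation (a simplification of the twist coordinates used in the paper), `d` treated as plain depth (use `1/d` if the map stores inverse depth), and warped points that leave the image being clamped to the border rather than dropped. All helper names are hypothetical.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix for an axis-angle vector w (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    S = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * S + (1 - np.cos(theta)) * (S @ S)

def warp(x, d, xi, K):
    """Map pixels x (N,2) with depths d (N,) from image 1 into image 2."""
    R, t = rodrigues(xi[3:]), xi[:3]
    ones = np.ones((x.shape[0], 1))
    rays = np.linalg.solve(K, np.hstack([x, ones]).T)  # (3,N) back-projected rays
    p = R @ (rays * d) + t[:, None]                    # 3D points in frame 2
    q = K @ p                                          # project with the same K
    return (q[:2] / q[2]).T

def bilinear(img, pts):
    """Sample img at sub-pixel (u, v) locations, clamping to the border."""
    h, w = img.shape
    u = np.clip(pts[:, 0], 0, w - 1.001)
    v = np.clip(pts[:, 1], 0, h - 1.001)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    a, b = u - u0, v - v0
    return ((1 - a) * (1 - b) * img[v0, u0] + a * (1 - b) * img[v0, u0 + 1]
            + (1 - a) * b * img[v0 + 1, u0] + a * b * img[v0 + 1, u0 + 1])

def residuals(xi, I1, I2, x, d, K):
    """r_i(xi) = I2(w(x_i, d_i, xi)) - I1(x_i); x holds integer pixel coordinates."""
    ref = I1[x[:, 1].astype(int), x[:, 0].astype(int)]
    return bilinear(I2, warp(x, d, xi, K)) - ref
```

With $\xi = 0$ the warp is the identity, so `residuals(np.zeros(6), I1, I1, x, d, K)` should be (numerically) zero, which is a cheap sanity check before running any optimizer.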

Using the Gauss-Newton method, as discussed in the same paper, a (locally optimal) $\xi$ between two monocular images can be found. Here is how I think the process goes:

Given

  • Start with an initial guess $\xi_0$ (~ 0), images $I_1, I_2$, and depth for some points in image $I_1$.

Steps

  • Start at the coarsest pyramid level $L_n$ (is a pyramid image the same as a shrunk image?) with the corresponding depth map (for the points in the shrunk image?).

  • Run Gauss-Newton iterations with some library (I plan to use Python for ease).

    • Set up the residual function that takes $I_2$, $I_1$, {$x_i$}, {$d_i$} and $\xi$ as input and produces {$r_i$}. The Jacobian involved will be calculated numerically.
    • I can skip $\alpha(.)$ and $\sigma_{d_i}^2$ for now, so that $E(\xi) = \sum_i r_i^2$.
    • At the end I expect to get the result $\xi_L$ for that level.
  • Use the solution $\xi_L$ as the initial guess and repeat for the next pyramid level ($n-1$).
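
The steps above can be sketched as follows. This is a plain, unweighted Gauss-Newton loop with a central-difference Jacobian, not the paper's exact solver: the update is additive (`xi + delta`) rather than a composition on SE(3), and `residual_fns` is a hypothetical list holding one residual function per pyramid level, coarsest first. The toy exponential fit at the end only stands in for the image residuals so that the snippet runs on its own.

```python
import numpy as np

def numeric_jacobian(residual_fn, xi, eps=1e-6):
    """Central-difference Jacobian of residual_fn at xi, shape (N, len(xi))."""
    n_res = residual_fn(xi).size
    J = np.empty((n_res, xi.size))
    for k in range(xi.size):
        step = np.zeros_like(xi)
        step[k] = eps
        J[:, k] = (residual_fn(xi + step) - residual_fn(xi - step)) / (2 * eps)
    return J

def gauss_newton(residual_fn, xi0, iters=20, tol=1e-10):
    """Minimize sum(r_i^2) starting from xi0."""
    xi = np.asarray(xi0, dtype=float).copy()
    for _ in range(iters):
        r = residual_fn(xi)
        J = numeric_jacobian(residual_fn, xi)
        # Least-squares solve of J @ delta = -r (same solution as the normal equations)
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        xi = xi + delta
        if np.linalg.norm(delta) < tol:
            break
    return xi

def coarse_to_fine(residual_fns, xi0):
    """Run Gauss-Newton per level, feeding each solution to the next finer level."""
    xi = np.asarray(xi0, dtype=float)
    for residual_fn in residual_fns:  # coarsest level first
        xi = gauss_newton(residual_fn, xi)
    return xi

# Toy stand-in for the image residuals: fit y = a * exp(b * t).
t = np.linspace(0.0, 1.0, 20)
y = 1.5 * np.exp(-0.7 * t)
coarse = lambda xi: xi[0] * np.exp(xi[1] * t[::4]) - y[::4]  # fewer samples
fine = lambda xi: xi[0] * np.exp(xi[1] * t) - y              # all samples
print(coarse_to_fine([coarse, fine], [1.0, 0.0]))
```

For the real problem, each entry of `residual_fns` would close over that level's images, points, depths and intrinsics, so only $\xi$ is carried from one level to the next.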

Questions:

  • Am I missing any step in the above process? Please let me know.

  • Is there a library function in OpenCV that takes an image (and its depth image) as input and returns the image at a requested pyramid level $n$ (the depth image will also need to be shrunk)?

PS: can someone with higher reputation add the tag "Image-alignment" to this question?

vyi
  • Wow blows me away that this question was asked over a year ago and you're putting a bounty on it now. Good luck! I tried to add an image-alignment tag but no such tag exists. – Chuck Dec 07 '20 at 15:06
  • @Chuck Yeah :) I was trying to do this a year back and then left it midway. I think I can implement a crude solution as outlined above but then I feel like there is a better way of doing this. I hope the bounty helps. About the tag, I was referring to creating a new tag image-alignment. – vyi Dec 07 '20 at 15:18
  • For tag creation, please start a "New tag request: image-alignment" question at the meta site. – Chuck Dec 07 '20 at 15:33

1 Answer


I'm going to try to answer this question, but please don't flame me if I get something wrong. Those were two heavy papers and I didn't have as much time as I would have liked to go through them.


Pyramid images are indeed the same as shrunk images: an image pyramid is a stack of copies of the image at successively lower resolutions. As a general concept, reducing the resolution changes the amount of detail represented, so the tracking algorithm can focus on different types of features at each level and become more generic. See pyrUp/pyrDown in OpenCV.
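
As a rough illustration of what one pyramid step involves, here is a NumPy-only sketch. It is not OpenCV's implementation: `cv2.pyrDown` smooths with a Gaussian kernel before dropping rows and columns, whereas this sketch simply averages 2x2 blocks. The function names are hypothetical, and two things are my own assumptions: a depth of 0 marks "no depth available" (so only valid depths are averaged), and integer pixel coordinates refer to pixel centers (which gives the 0.25 px shift in the intrinsics).

```python
import numpy as np

def downsample_image(img):
    """Halve resolution by averaging 2x2 blocks (an odd last row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(float)
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def downsample_depth(depth):
    """Halve resolution, averaging only the valid (> 0) depths in each 2x2 block."""
    h, w = depth.shape[0] // 2 * 2, depth.shape[1] // 2 * 2
    blocks = depth[:h, :w].astype(float).reshape(h // 2, 2, w // 2, 2)
    valid = blocks > 0
    count = valid.sum(axis=(1, 3))
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    return np.where(count > 0, total / np.maximum(count, 1), 0.0)

def downsample_intrinsics(K):
    """Intrinsics for the half-resolution image (pixel-center convention)."""
    K2 = K.astype(float).copy()
    K2[:2, :] *= 0.5   # focal lengths and principal point scale with the image
    K2[:2, 2] -= 0.25  # the center of a 2x2 block sits half a pixel off the grid
    return K2

def build_pyramid(img, depth, K, levels):
    """Return [(img, depth, K), ...] from finest (level 0) to coarsest."""
    pyramid = [(img.astype(float), depth.astype(float), K.astype(float))]
    for _ in range(levels - 1):
        img, depth, K = pyramid[-1]
        pyramid.append((downsample_image(img), downsample_depth(depth),
                        downsample_intrinsics(K)))
    return pyramid
```

Averaging depths across an object boundary produces values that belong to neither surface, so picking a single valid depth per block may be the safer choice for a semi-dense map; the camera intrinsics have to be rescaled per level either way.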


It is a little hard to answer because the granularity of your question is not clear.

I feel like the general steps of the algorithm are there, but if I have time later I'll go through it again. However, Gauss-Newton optimization is rarely plug and play for me, so I would be cautious about the time needed to get it running.

Since you'll be ignoring $\alpha$ and $\sigma^2_{d_{i}}$, I'm wondering whether you aren't in fact implementing the method of Steinbrücker et al. in Real-Time Visual Odometry from Dense RGB-D Images (full disclaimer: I didn't have time to read this one in detail, I just skimmed it).
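
To make the effect of dropping the weights concrete: if the weights $w_i = \alpha(r_i)/\sigma_{d_i}^2$ are treated as fixed within one iteration, a reweighted Gauss-Newton step has the standard form below, where $r$ is the stacked residual vector and $J$ its Jacobian with respect to $\xi$. This is the textbook weighted least-squares step, written down as my assumption about how the weights enter rather than copied from either paper.

$$\delta\xi = -(J^\top W J)^{-1} J^\top W\, r, \qquad W = \mathrm{diag}(w_1, \dots, w_N)$$

With $W = I$ this collapses to $\delta\xi = -(J^\top J)^{-1} J^\top r$, the unweighted step that minimizes $\sum_i r_i^2$.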

Malcolm
  • Hi. Thanks for adding this answer. It is really helpful (I'm looking into the pyrUP/DOWN). I see that the other part of the Question (about the correctness of given implementation) might seem unclear, essentially I wanted to implement the IA algorithm correctly. There are many projects that do this (IA) and I thought someone with experience might validate the outlined steps for performing IA. – vyi Jan 08 '21 at 11:15