
The generation of MPI #15

Closed
mwsunshine opened this issue Aug 14, 2019 · 8 comments

Comments

@mwsunshine

Dear authors,

Many thanks for your excellent work and open-source code!

However, I ran into some problems when I used my own dataset and tried to generate the MPI images.

  1. I did get 32 MPI images, but as far as I know an MPI contains 32 RGB layers and 32 alpha layers (I am not sure whether that is right). Are these 32 MPI images generated by multiplying each RGB layer by its corresponding alpha layer?
  2. I also noticed that when inference runs, the indices of the closest neighbors are printed out, like [0, 6, 10, 4, 13, 0]. I suppose the first 0 is the reference image and [0, 6, 10, 4, 13] are its closest neighbors. What does the last zero mean? Is it the index of the target image?
@bmild (Collaborator) commented Aug 15, 2019

  1. The actual MPI data is saved as binary 8-bit data in mpi.b inside each numbered folder. This corresponds to the numpy array of stacked-up RGBA images. You can see how to reload this in Python using the function here if you'd like to inspect the layers yourself (see also the small sketch after this list).
  2. Yes, the first five indices are the closest neighbors. The last index is only useful during real fine-tuning, where it indicates which pose should be rendered after the MPI is created -- in this case, we just put a copy of the first index there as a placeholder.
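Regarding point 1, here is a minimal sketch of how such a stacked RGBA volume could be read back into numpy. It assumes mpi.b is raw 8-bit data and that the (h, w, d) dimensions are known from the saved metadata; the repository's own loading function (linked above) is the authoritative version, since the actual file layout may differ.

```python
import numpy as np

def load_mpi_raw(path, h, w, d):
    """Illustrative only: read a raw uint8 RGBA MPI volume of shape [h, w, d, 4]."""
    data = np.fromfile(path, dtype=np.uint8)   # flat byte stream
    mpi = data.reshape(h, w, d, 4)             # stack of d RGBA planes
    return mpi.astype(np.float32) / 255.0      # scale to [0, 1]
```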

@mwsunshine (Author)

Thank you so much for your detailed and quick response!

I checked the code you linked above; may I ask one more question?

I suppose the data with shape [h, w, d, 4] is the MPI data, where d is 32 in your case. The first three channels are RGB and the last one is alpha; is my guess right?

@bmild (Collaborator) commented Aug 17, 2019

That's correct.
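In array terms, splitting such an MPI into its color and alpha layers is just slicing the last axis (a small sketch, assuming `mpi` is the [h, w, d, 4] array discussed above):

```python
# mpi has shape [h, w, d, 4]: d depth planes, each an RGBA image.
rgb   = mpi[..., :3]   # [h, w, d, 3] color layers
alpha = mpi[..., 3:]   # [h, w, d, 1] alpha layers
```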

@mwsunshine (Author)

Dear author,

Sorry to bother you again. As I looked back at your work in detail over the past few days, several questions came up that I had not noticed before.

  1. I used your sample (stored in the testscene folder, which contains a plant) and imgs2mpis.py to generate 20 MPIs, one for each image. Is each MPI formed from the image itself and its 4 nearest neighbours, as described in your paper?

  2. Then I checked the code in mpis2video.py: for each new pose, you use 5 MPIs (from the 5 poses nearest to the new pose) to render 5 pictures using the function mpi.render. The beautiful final output is an alpha blending of these 5 rendered images. Am I right?

  3. In section 5.2 of your paper, you blend views from two different MPIs, while in question 2 above, 5 MPIs are used. Is there a reason for this?

  4. I checked one of the rendered images produced from the MPI with the function render_mpi_homogs; the RGB image looks like this:
    [screenshot of a rendering with many holes]
    I am not sure whether I am doing this right, since there are many holes in it, and I am afraid that blending two MPIs will not converge to a result as good as yours.

  5. In your training procedure, a single MPI is trained first, before training on blending two MPIs. Is the training loss always image reconstruction during the whole process?

@bmild (Collaborator) commented Oct 12, 2019

  1. That's correct. We sort the images by distance and select the 4 nearest neighbors to generate each MPI here.
  2. Yes.
  3. At test time we are free to generate each MPI in a separate Tensorflow call, so we only need the amount of GPU memory required for outputting one MPI at a time. However during training we must generate all the MPIs we want to blend within a single computation graph, so we need the GPU memory for all N MPIs that we are blending. As such, we limited training to blending only 2 MPIs (we basically need 1 GPU for each MPI generated, so this already required training on 2 GPUs). (See more details below)
  4. After training through blending, the MPI output needs to be normalized by its alpha channel (since that's what it expects from the loss). To look at a single rendering, you have to "unpremultiply" the alpha like this to view it:
    img[...,:3] / (img[...,3:] + 1e-6)
    (I recommend adding the small number to avoid dividing by zero; see the snippet after this list.)
  5. Yes, training loss is always image reconstruction.
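Putting point 4 into a small snippet (a sketch only, where `img` is assumed to be one premultiplied RGBA rendering, e.g. the output of render_mpi_homogs):

```python
import numpy as np

def unpremultiply(img, eps=1e-6):
    """Recover straight RGB from a premultiplied-alpha RGBA rendering.

    The small eps avoids division by zero where alpha is (near) zero.
    """
    rgb = img[..., :3] / (img[..., 3:] + eps)
    return np.clip(rgb, 0.0, 1.0)
```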

Note on training on blending 2 but testing on blending ~5:
An interesting detail here is that the behavior learned from blending is that "some other" MPI might fill in missing holes, so the network can leave low alpha values there. So there's a big difference in the training signal between rendering 1 MPI vs. blending 2 MPIs, but maybe not so big a difference between blending 2 and blending N>2 MPIs. At test time, blending from more MPIs provides more chances to view occluded content, so we use more (5). You can change this number to see the result: fewer MPIs will give maybe a bit crisper results but with more popping when the input MPI set changes, and more MPIs will give temporally smoother results but a longer render time and probably somewhat blurrier output.
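To make the blending idea concrete, here is a rough sketch of combining several premultiplied renderings, where each rendering's accumulated alpha acts as its per-pixel weight. This is not the repository's exact blending code, just an illustration of the principle that regions where one MPI has low alpha (holes) get filled in by the others:

```python
import numpy as np

def blend_renders(renders, eps=1e-6):
    """Alpha-weighted blend of a list of premultiplied RGBA renderings."""
    color_sum = np.zeros_like(renders[0][..., :3])
    alpha_sum = np.zeros_like(renders[0][..., 3:])
    for r in renders:
        color_sum += r[..., :3]  # premultiplied color already carries the alpha weight
        alpha_sum += r[..., 3:]
    return color_sum / (alpha_sum + eps)  # weighted average; holes filled by other MPIs
```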

@mwsunshine (Author)

Thank you for your clear explanation!
For question 3:
In conclusion, there are several reasons for training with 2 MPIs but testing with 5 MPIs:

  1. The difference between using one MPI and two MPIs is large (since they are essentially different methods), while for N >= 2 the main difference is the chance of filling in missing holes. When N is larger, the result tends to be smoother, but the render time is also longer.
  2. GPU memory limitation: one GPU is needed for each MPI generated, and training with more GPUs (to blend more MPIs) is unnecessary.
  3. I believe that training with more blended MPIs could also be harder to converge.

For question 4:
So do you use premultiplied-alpha RGB for the data stored in mpi.b?
[screenshot of a new rendering]
I used your function and the result is like this; is it correct this time? A lot of the holes have disappeared.
For question 5:
By the way, do the networks placed on the two GPUs during training share the same trainable variables?

@mwsunshine (Author)

[screenshot of the corrected rendering]
For question 4: I am sorry, I made a mistake by mismatching the npy file and the corresponding images. Now the output looks good.

@bmild (Collaborator) commented Oct 21, 2019

Glad you figured it out.

And yes, when we use two GPUs, the trainable variables are the same.
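For reference, the usual TF1 pattern for this (a generic sketch, not the repository's exact code) is to build one tower per GPU inside a shared variable scope so that every tower reuses the same weights:

```python
import tensorflow as tf  # TF1-style graph code

def build_towers(make_network, inputs_per_gpu):
    """Build one copy of the network per GPU, all sharing one set of trainable variables."""
    outputs = []
    for i, inp in enumerate(inputs_per_gpu):
        with tf.device('/gpu:%d' % i):
            with tf.variable_scope('mpi_net', reuse=tf.AUTO_REUSE):
                outputs.append(make_network(inp))  # same variables reused on each GPU
    return outputs
```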
