A “Quick” review of Differentiable Renderers

More precisely, Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning (paper here). The commented code is available here.


General overview of the model

The model goes through 3 steps. The last one is the soft rasterizer (the one we are interested in). The order of the steps is

  1. Lighting: apply the light effects to the faces/vertices depending on the light model. Here we modify the color of each vertex/face according to the light model.
  2. Transformation: transform the objects into the camera's frame of reference and project them onto the image plane.
    1. Rotation happens in look_at.py.
    2. Projection happens in perspective.py or in orthogonal.py, depending on the camera.
  3. Soft rasterization:
    1. Detailed explanation below.

These are some of the results:

(figure: softras example renderings)

Soft rasterization

Rasterization is the process of figuring out, for each pixel, which faces are not occluded, and rendering that color for the pixel. In this paper, even an occluded face has an effect on the pixel (that is why it is called soft rasterization). In general terms the rasterization algorithm (forward_soft_rasterize) has two parts:

The function forward_soft_rasterize launches:

  • First: forward_soft_rasterize_inv_cuda: this runs ((batch_size * num_faces - 1) / threads + 1) blocks of 512 threads. It calculates the barycentric coordinates, \(FF^T\), and whether the face has any obtuse angle.
  • Second: it launches forward_soft_rasterize_cuda_kernel: this runs in parallel for every pixel of every image in the batch, with 512 threads per block.
    • For each pixel we loop over all faces.

One of the outcomes of forward_soft_rasterize_inv_cuda is populating faces_info, which contains the face inverse, the face symmetric matrix and which angle of the face is obtuse.

  • Each face has 3 points and each point has 3 coordinates: a total of 9
  • Face inverse has 9 spaces (squared matrix)
  • Face sym has 9 spaces (squared matrix)
  • Face obt has 3 spaces (one for each angle of the face (triangle))
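As a small numpy mock-up of that per-face buffer (the packing order here is my assumption; only the 9 + 9 + 3 counts come from the description above):

```python
import numpy as np

num_faces = 1000
# Per face: 9 floats for the inverse of the face matrix, 9 for the
# symmetric matrix F F^T, and 3 flags for obtuse angles: 9 + 9 + 3 = 21.
faces_info = np.zeros((num_faces, 21), dtype=np.float32)
face_inv = faces_info[:, 0:9].reshape(-1, 3, 3)   # U^{-1} per face
face_sym = faces_info[:, 9:18].reshape(-1, 3, 3)  # F F^T per face
face_obt = faces_info[:, 18:21]                   # obtuse-angle flags
```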

The whole rasterization problem translates into finding the color of each pixel based on the depth of each face.


How to generate pixel colors

The algorithm here is a bit harder to understand. It does not follow exactly the same equations as the paper, but the result is the same.

The main equation to aggregate the colors from all (mesh) faces into each individual pixel is

\[I^i = \sum_jw_j^iC_j^i + w_b^iC_b \hspace{1cm} \text{ with } \hspace{1cm} w_j^i = \frac{\mathcal D_j^i\exp(z_j^i/\gamma)}{\sum_k\mathcal D_k^i\exp (z_k^i/\gamma) + \exp(\epsilon/\gamma)}\]

Where

\[\mathcal D_j^i = \text{sigmoid}\left( \delta_j^i\frac{d^2(i,j)}{\sigma}\right) \hspace{1cm} \text{with } \hspace{1cm} \delta_j^i = \delta(i,j) \text{ and } \delta \text{ as described below.}\]

for the \(i\)-th pixel and the \(j\)-th face.

The \(w_j^i\) weight the color of each face for this pixel by how far away (in \(x\) and \(y\)) the pixel is from the face (through \(\mathcal D_j^i\), built from \(d^2(i,j)\); this can be improved, read the notes) and by how far away (depth in \(z\)) the face is with respect to the other faces covering this pixel (through \(\exp(z_j^i / \gamma)\); closer faces get more relative importance than far away ones).
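As a minimal numpy sketch of this closed form (inputs are made up; note that the naive \(\exp(z/\gamma)\) overflows for the very small \(\gamma\) used in practice, which is exactly what the algorithm below avoids):

```python
import numpy as np

def aggregate_pixel(D, z, C, C_bg, gamma=0.1, eps=1e-3):
    """Naive closed-form aggregation for one pixel.
    D: (F,) probability map D_j^i; z: (F,) inverse normalized depths;
    C: (F, 3) face colors; C_bg: (3,) background color."""
    num = D * np.exp(z / gamma)              # D_j^i exp(z_j^i / gamma)
    denom = num.sum() + np.exp(eps / gamma)  # + exp(eps / gamma)
    w = num / denom                          # face weights w_j^i
    w_b = np.exp(eps / gamma) / denom        # background weight
    return (w[:, None] * C).sum(axis=0) + w_b * C_bg

I = aggregate_pixel(np.array([0.9, 0.4]), np.array([0.8, 0.3]),
                    np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
                    np.zeros(3))
```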

The exact algorithm they use is described below; a minimal Python transcription follows the list. This algorithm only describes the default approach, that is, using the “barycentric” distance and soft assignment for colors and opacity.

  1. For each pixel \(i\) do:
    1. Calculate \(z\): \(z\) is the inverse normalized depth.
    2. Initialize softmax_sum as \(\text{sm}_\text{sum} = \exp (\epsilon / \gamma)\)
    3. Initialize soft_color as \(c=\text{bg} \times \text{sm}_\text{sum}\) with \(\text{bg}\) being the background color with zero opacity.
    4. Initialize softmax_max as \(\text{sm}_\text{max} = \epsilon\)
    5. For each face \(j\) do:
      1. Initialize soft_fragment, called \(D_j^i\) in the paper, as \(D_j^i = 1/\left[1+\exp\left(-d^2(i,j)/\sigma\right)\right]\).
      2. Initialize exp_delta_zp equal to 1, that is \(z_\Delta = 1\)
      3. if \(z > \text{sm}_\text{max}\) then:
        1. set \(z_\Delta = \exp\left( (\text{sm}_\text{max} - z) / \gamma \right)\)
        2. set \(\text{sm}_\max = z\).
      4. Set \(\text{sm}_\text{sum} = z_\Delta \cdot \text{sm}_\text{sum} + \exp\left( \frac{z - \text{sm}_\max}{\gamma} \right) \cdot D_j^i\)
      5. Get the color for this face \(c_k\).
      6. Set \(c = z_\Delta \cdot c + \exp\left( \frac{z-\text{sm}_\max} {\gamma} \right) D_j^i c_k\)
    6. Normalize the color by \(\text{sm}_\text{sum}\), that is \(c = c/\text{sm}_\text{sum}\).
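Here is a minimal Python transcription of that loop (mirroring the kernel's logic; D, z and C are made-up per-face values for a single pixel):

```python
import math

def aggregate_pixel_streaming(D, z, C, C_bg, gamma=1e-4, eps=1e-3):
    """One-pass aggregation for one pixel, following the list above."""
    sm_sum = math.exp(eps / gamma)               # initialize softmax_sum
    color = [c * sm_sum for c in C_bg]           # initialize soft_color
    sm_max = eps                                 # initialize softmax_max
    for D_j, z_j, C_j in zip(D, z, C):           # loop over faces
        exp_delta = 1.0
        if z_j > sm_max:                         # new running max: rescale
            exp_delta = math.exp((sm_max - z_j) / gamma)
            sm_max = z_j
        coef = math.exp((z_j - sm_max) / gamma) * D_j
        sm_sum = exp_delta * sm_sum + coef       # update softmax_sum
        color = [exp_delta * c + coef * ck for c, ck in zip(color, C_j)]
    return [c / sm_sum for c in color]           # normalize by softmax_sum

I = aggregate_pixel_streaming([0.9, 0.4], [0.8, 0.3],
                              [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
                              (0.0, 0.0, 0.0))
```

Every exponent argument stays non-positive here, which is the point of the max-subtraction discussed next.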

This change in the equations makes the final sum of all the components (i.e. the denominator of \(w_i\) or \(\text{sm}_\text{sum}\)) become

\[\text{sm}_\text{sum} = \exp\left(\frac {\epsilon - \text{sm}_\max} \gamma \right) + \sum_j\exp\left( \frac{z_j^i - \text{sm}_\max }{\gamma}\right) D_j^i\]

In this way the argument of each exponential in the denominator is clipped to the interval \(\left[(\epsilon - \text{sm}_\max)/\gamma,\, 0\right]\). This does not change anything, because we can factor \(\exp(-\text{sm}_\max/\gamma)\) out of the denominator and out of the numerator, so both cancel each other out and the result is the same as in the paper.

The only difference is that during execution the exponents stay non-positive, which helps to avoid overflow. Working through a few iterations by hand is very useful to understand the final form of the denominator and why we can just factor out the max term.


Complementary functions for the rasterization process

The following sections are useful to understand specific functions of the whole rasterization process. Keep in mind that for simplicity we sometimes drop the superscript or subscript \(i\) for pixel \(i\) or \(j\) for face \(j\).

How to represent barycentric coordinates

The simplest way to understand barycentric coordinates is with masses: the barycentric coordinates are the corresponding weights (masses) placed at each vertex of the face (a triangle, in this case) that make the center of gravity coincide with the Euclidean point we are trying to represent.

\[\pmb p_i = U_j \pmb \lambda_i, \hspace{1cm} \text{with } \pmb \lambda_i = \left[ \begin{matrix} \lambda_1\\ \lambda_2 \\ \lambda_3\end{matrix} \right], U_j = \left[ \begin{matrix}x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ 1 & 1 & 1 \end{matrix} \right], \pmb p_i = \left[ \begin{matrix}x\\y\\1\end{matrix}\right]\]

The last row of \(U_j\) ensures that \(\lambda_1+\lambda_2+\lambda_3 = 1\).

Therefore, if we want to convert from Euclidean coordinates to barycentric coordinates, the equation becomes

\[\pmb \lambda_i = U_j^{-1}\pmb p_i\]
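A quick numeric check (numpy; triangle and point chosen arbitrarily):

```python
import numpy as np

# Triangle vertices (x_k, y_k) and a query point p, in homogeneous form.
U = np.array([[0.0, 1.0, 0.0],   # x_1, x_2, x_3
              [0.0, 0.0, 1.0],   # y_1, y_2, y_3
              [1.0, 1.0, 1.0]])  # enforces sum(lambda) = 1
p = np.array([0.25, 0.25, 1.0])

lam = np.linalg.inv(U) @ p       # barycentric coordinates
print(lam, lam.sum())            # [0.5, 0.25, 0.25], sums to 1
print(U @ lam)                   # recovers p
```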


Definition of “Barycentric Distance”

The definition of the barycentric distance for the \(i\)-th pixel and \(j\)-th face is

\[D_\text{bary}(i,j) = \delta \min(\lambda_1, \lambda_2, \lambda_3)^2 \hspace{1cm} \text{with } \hspace{1cm} \delta = \begin{cases} +1 & \pmb p \text{ is inside the triangle }\\ -1 & \pmb p \text{ is outside the triangle} \end{cases}\]

where the \(\lambda_k\) are the barycentric coordinates.
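Continuing the small numeric example from the previous section (repeated here so it is self-contained), the barycentric distance of that point is:

```python
import numpy as np

U = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
p = np.array([0.25, 0.25, 1.0])
lam = np.linalg.inv(U) @ p                 # [0.5, 0.25, 0.25]

delta = 1.0 if (lam > 0).all() else -1.0   # +1 inside, -1 outside
D_bary = delta * lam.min() ** 2            # +1 * 0.25**2 = 0.0625
```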


How to find the Euclidean distance using barycentric coordinates

Defining \(\pmb p = (x,y)\) as any point (inside or outside the triangle), \(\pmb b = (u,v,w)\) as the barycentric coordinates of \(\pmb p\), and \(\pmb t\) as the projection of \(\pmb b\) onto the closest edge of the triangle, we can define the distance as

\[d(\pmb p, T) = \delta||U(\pmb t- \pmb b)|| \hspace{1cm} \text{with } \hspace{1cm} \delta = \begin{cases} +1 & \pmb p \text{ is inside the triangle }\\ -1 & \pmb p \text{ is outside the triangle} \end{cases}\]

We can simplify looking for the closest edge of the triangle by looping through the 3 edges and taking the minimum distance. The only unknown is how to calculate \(\pmb t\) using barycentric coordinates. Questions have been asked on Math Stack Exchange (here) and in the original repo (here); so far there is no answer.



Back-propagation

Color Image

Regarding \(I_c\) (keep in mind that we have dropped the subscript \(_c\)) the main equation is

\[I^i = \sum_jw_j^iC_j^i + w_b^iC_b \hspace{1cm} \text{ with } \hspace{1cm} w_j^i = \frac{\mathcal D_j^i\exp(z_j^i/\gamma)}{\sum_k\mathcal D_k^i\exp (z_k^i/\gamma) + \exp(\epsilon/\gamma)} \hspace{5mm} \text{and } \hspace{5mm} \mathcal D_j^i = \frac 1 {1 + \exp\left( \frac{-D(i,j)}{\sigma} \right)}\]

Where \(D(i,j)\) is the distance metric we choose; in our case the main distance metric is the barycentric signed squared distance. We are estimating the coordinates \(x_j^l, y_j^l, z_j^l\) of each \(l\)-th vertex of each \(j\)-th face and the color of each face \(C_j\) (keep in mind that \(x_j^l\) and \(y_j^l\) are normalized and projected while \(z_j^l\) is not). We know that \(\mathcal D_j^i = \mathcal D_j^i(x_j^l,y_j^l)_{l\in\{1,2,3\}}\). Therefore we can differentiate \(I^i\) in terms of \(\mathcal D_k^i, z_j^i\) and \(C_j\). Defining \(M\) as the mesh containing each face and color we have that

\[\frac{\partial I}{\partial M} = \frac{\partial I}{\partial w}\frac{\partial w}{\partial \mathcal D}\frac{\partial \mathcal D}{\partial M} + \frac{\partial I}{\partial w}\frac{\partial w}{\partial z}\frac{\partial z}{\partial M} + \frac{\partial I}{\partial C}\frac{\partial C}{\partial M} \hspace{1cm}\text{with } \hspace{1cm} U_j = \{u_{k,l}\}_{3\times 3} \text{ as defined above}\]

Keep in mind that the mesh first goes through the lighting shader and then through the translation/rotation shader to finally be projected to 2D. Luckily for us, these operations can be easily implemented in any ML framework (pytorch), and the framework will take care of the automatic differentiation and corresponding backpropagation. Therefore we are only concerned with the differentiation and backpropagation of the part of the model implemented as custom GPU kernels. That is, only the equation shown above.

Finding \(\partial I^i/\partial C\) is direct, while for \(\partial I/\partial x,y\) and \(\partial I /\partial z\) the equations are longer:

\[\begin{align} \text{Color derivative:} & & \frac{\partial I^i}{\partial C_j^i} &= w_j^i dC_j^i \\ \\ \text{Probability map derivative:} && \frac{\partial I^i}{\partial \mathcal D_j^i} &= \sum_{k\not = j} \left( C_k^i \frac{\partial w_k^i}{\partial \mathcal D_j^i} \right) + C_j^i\frac{\partial w_j^i}{\partial \mathcal D_j^i} + C_b\frac{\partial w_b^i}{\partial \mathcal D_j^i} \\ \\ \text{Depth derivative:} && \frac{\partial I^i}{\partial z_j^i} &= \sum_{k\not = j} \left( C_k^i \frac{\partial w_k^i}{\partial z_j^i} \right) + C_j^i\frac{\partial w_j^i}{\partial z_j^i} + C_b\frac{\partial w_b^i}{\partial z_j^i} \end{align}\]


Probability map derivatives 1

We can simplify the Probability map derivative as follows

\[\begin{align} \frac{\partial w_{k\neq j}^i}{\partial \mathcal D_j^i} &= \frac{-\mathcal D_k^i\exp(z_k^i/\gamma)}{\left( \sum_t\mathcal D_t^i\exp (z_t^i/\gamma) + \exp(\epsilon/\gamma) \right)^2} \left( \exp(z_j^i/\gamma) d\mathcal D_j^i \right) \\\\ &= -\frac{w_k^iw_j^i}{\mathcal D_j^i}d\mathcal D_j^i \\\\ \frac{\partial w_j^i}{\partial \mathcal D_j^i} &=\frac{w_j^i}{\mathcal D_j^i}d\mathcal D_j^i - \frac{w_j^iw_j^i}{\mathcal D_j^i}d\mathcal D_j^i \\\\ \frac{\partial w_b^i}{\partial \mathcal D_j^i} &= -\frac{w_b^iw_j^i}{\mathcal D_j^i}d\mathcal D_j^i \end{align}\]

Putting it all together in \(\partial I / \partial \mathcal D\) simplifies the equations to

\[\begin{align} \frac{\partial I^i}{\partial \mathcal D_j^i} &= \left( -\sum_{k\not = j} \left[ C_k^i \frac{w_k^iw_j^i}{\mathcal D_j^i}\right] + C_j^i\left[\frac{w_j^i}{\mathcal D_j^i} - \frac{w_j^iw_j^i}{\mathcal D_j^i}\right]- C_b\frac{w_b^iw_j^i}{\mathcal D_j^i} \right)d\mathcal D_j^i\\\\ &= \frac{w_j^i}{\mathcal D_j^i}\left( C_j^i - \left[ \sum_{k\neq j}\left[ C_kw_k^i \right] + C_j^iw_j^i + C_bw_b^i \right]\right)d\mathcal D_j^i \\\\ &= \frac {w_j^i}{\mathcal D_j^i} \left( C_j^i - I^i \right)d\mathcal D_j^i \end{align}\]
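We can sanity-check this compact form with autograd on the closed-form aggregation (a minimal torch sketch, not the repo's code; inputs are made up and \(\gamma\) is large enough for the naive formula):

```python
import math
import torch

torch.manual_seed(0)
gamma, eps = 0.1, 1e-3
D = torch.rand(5, requires_grad=True)     # probability map values D_j^i
z = torch.rand(5)                         # inverse normalized depths
C = torch.rand(5, 3)                      # face colors
C_b = torch.rand(3)                       # background color

num = D * torch.exp(z / gamma)
denom = num.sum() + math.exp(eps / gamma)
w = num / denom
I = (w[:, None] * C).sum(0) + (math.exp(eps / gamma) / denom) * C_b

I[0].backward()                           # autograd: dI_red / dD_j
analytic = (w / D) * (C[:, 0] - I[0])     # (w_j / D_j)(C_j - I)
print(torch.allclose(D.grad, analytic.detach(), atol=1e-4))
```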

Continuing with the full probability map derivative, solving the inner partial derivatives we get (the model uses \(D(i,j) = D_\text{bary}(i,j)\))

\[\begin{align} \frac{\partial \mathcal D_j^i}{\partial u_{k,l}} &= \mathcal D_j^i(1-\mathcal D_j^i) \left(\frac{1}{\sigma} \frac{\partial D(i,j)}{\partial u_{k,l}} \right) \\\\ \frac{\partial D(i,j)}{\partial u_{k,l}} &= 2 \pmb \lambda_s d\pmb\lambda_s \hspace{1cm} \text{where} \hspace{1cm} s = \arg \min_t(\pmb \lambda_t)\\ &= 2 (U^{-1}\pmb p_i)_s\,d(U^{-1}\pmb p_i)_s \\ &= 2 \pmb \lambda_s \left( -U^{-1} \frac{\partial U}{\partial u_{k,l}} U^{-1} \pmb p_i \right)_s\\\\ \frac{\partial U_j}{\partial u_{k,l}} &= \left(\begin{matrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{matrix}\right) \hspace{1cm} \text{ if } \hspace{5mm} k,l = 1,1 \end{align}\]

Doing the corresponding math with \(\partial U_j/\partial u_{k,l}\), the final result is

\[\frac{\partial \pmb \lambda_s}{\partial u_{k,l}} = -v_{s,k}\sum_tv_{l,t}p_t \hspace{1cm} \text{with} \hspace{1cm} V=U^{-1}=\{v_{m,n}\}\]
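A quick finite-difference check of that expression (numpy; an arbitrary invertible triangle):

```python
import numpy as np

U = np.array([[0.0, 2.0, 0.5],
              [0.0, 0.0, 2.0],
              [1.0, 1.0, 1.0]])
p = np.array([0.7, 0.4, 1.0])
V = np.linalg.inv(U)
lam = V @ p

k, l, s = 0, 1, int(np.argmin(lam))     # perturb u_{k,l}; track lambda_s
analytic = -V[s, k] * (V[l] @ p)        # -v_{s,k} * sum_t v_{l,t} p_t

h = 1e-6
U2 = U.copy(); U2[k, l] += h
numeric = (np.linalg.inv(U2) @ p - lam)[s] / h
print(analytic, numeric)                # should agree to ~1e-6
```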

So the final probability map derivative is

\[\begin{align} \frac{\partial I^i}{\partial \mathcal D_j^i} &= \frac {w_j^i}{\mathcal D_j^i} \left( C_j^i - I^i \right)d\mathcal D_j^i \\\\ \frac{\partial I^i}{\partial \mathcal D_j^i}\frac{\partial \mathcal D_j^i}{\partial u_{k,l}}&= \frac {w_j^i}{\mathcal D_j^i} \left( C_j^i - I^i \right)\left( \frac{\mathcal D_j^i(1-\mathcal D_j^i) }{\sigma} \frac{\partial D(i,j)}{\partial u_{k,l}} \right) \\\\ \frac{\partial I^i}{\partial \mathcal D_j^i}\frac{\partial \mathcal D_j^i}{\partial u_{k,l}}&= \frac {w_j^i}{\mathcal D_j^i} \left( I^i - C_j^i \right)\left( \frac{\mathcal D_j^i(1-\mathcal D_j^i) }{\sigma} \left(2 \pmb \lambda_s v_{s,k}\sum_tv_{l,t}p_t \right) \right) \\\\ \end{align}\]


Depth derivatives 2

The result for \(\partial w_j^i/\partial z_j^i\) follows a very similar line to the one for \(\partial w_j^i/\partial \mathcal D_j^i\):

\[\begin{align} \frac{\partial w_j^i}{\partial z_j^i} &= \frac{w_j^i}{\gamma}dz_j^i - \frac{w_j^iw_j^i}{\gamma}d z_j^i \\\\ \frac{\partial w_k^i}{\partial z_j^i} &= - \frac{w_k^iw_j^i}{\gamma}d z_j^i\\\\ \frac{\partial w_b^i}{\partial z_j^i} &= -\frac{w_b^iw_j^i}{\gamma}d z_j^i \end{align}\]

As before, we can simplify it further; the derivative becomes

\[\frac{\partial I^i}{\partial z_j^i} = \frac {w_j^i}{\gamma} \left( C_j^i - I^i \right)d z_j^i\]
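The same autograd sanity check as before, now for the depth (a torch sketch with made-up inputs):

```python
import math
import torch

torch.manual_seed(0)
gamma, eps = 0.1, 1e-3
D, z = torch.rand(5), torch.rand(5, requires_grad=True)
C, C_b = torch.rand(5, 3), torch.rand(3)

num = D * torch.exp(z / gamma)
denom = num.sum() + math.exp(eps / gamma)
w = num / denom
I = (w[:, None] * C).sum(0) + (math.exp(eps / gamma) / denom) * C_b

I[0].backward()                            # autograd: dI_red / dz_j
analytic = (w / gamma) * (C[:, 0] - I[0])  # (w_j / gamma)(C_j - I)
print(torch.allclose(z.grad, analytic.detach(), atol=1e-4))
```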

Depth is calculated using the barycentric weights and the estimated \(z\) coordinate of each vertex of the \(j\)-th face; moreover, \(z_j^i\) is the normalized depth. In other words

\[z_j^i = \frac{z_\text{far} - \left(\sum_t \lambda_j^{(t)}/z_j^{(t)}\right)^{-1}}{z_\text{far} - z_\text{near}}\]

where \(\pmb z_j^T = (z_j^{(1)}, z_j^{(2)}, z_j^{(3)})\), with \(z_j^{(t)}\) estimated at the same time as \(x_j^{(t)}, y_j^{(t)}\) for each vertex \(t\) of face \(j\), and \(z_\text{far},z_\text{near}\) are the maximum and minimum depths that will be taken into account. Therefore

\[dz_j^i = \frac{\partial z_j^i}{\partial u_{k,l}} +\frac{\partial z_j^i}{\partial z_j^{(m)}}\]

Therefore

\[\begin{align} \frac{\partial z_j^i}{\partial z_{j}^{(m)}} &= \frac{1}{z_\text{far}-z_\text{near}}\left( \frac{-\lambda_j^{(m)}}{\left(z_j^{(m)}\sum_t \lambda_j^{(t)}/z_j^{(t)}\right)^2} \right)\\\\ \frac{\partial z_j^i}{\partial u_{k,l}} &= \frac{1}{z_\text{far}-z_\text{near}} \left( \frac{\displaystyle\sum_w\left[\left(z_j^{(w)}\right)^{-1}\partial \lambda_j^{(w)}/\partial u_{k,l}\right]}{\left( \sum_w\lambda_j^{(w)}/z_j^{(w)} \right)^2} \right) \\\\ &= \frac{-1}{z_\text{far}-z_\text{near}} \left( \frac{\displaystyle\sum_w\left[\frac{v_{w,k}}{z_j^{(w)}}\sum_tv_{l,t}p_t^i\right]}{\left( \sum_w\lambda_j^{(w)}/z_j^{(w)} \right)^2} \right) \end{align}\]

With \(v_{i,j}\) defined above. Keep in mind the difference between \(z_j^{(m)}\) and \(z_j^i\): the former refers to the depth of the vertex \(m\) of face \(j\), while the latter is the (inverse normalized) depth from the pixel \(i\) to the face \(j\). Also, keep in mind that the original estimated values come as a list of \(V\) vertices and not as a list of \(V'\) vertices per \(F\) faces. This is a subtle difference but it simplifies the understanding. Because of this, in principle we should find the derivative w.r.t. each vertex and not w.r.t. each vertex in each face (because two faces can share up to two vertices), but because we pass a matrix with the vertices per face (\(F\times 3 \times 3\) or \(F \times 9\), because each vertex has 3 coordinates) to CUDA, we can treat it as vertices per face, given that the matrix is already allocated in memory. The ML framework will take care of going back to the \(V \times 3\) notation. This means extra memory consumption for the vertices. Moreover, with less structured meshes in which vertices can be shared by an arbitrary number of faces, the memory usage can grow significantly.
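A small numeric sketch of the normalized depth and a finite-difference check of \(\partial z_j^i/\partial z_j^{(m)}\) (numpy, made-up values):

```python
import numpy as np

z_near, z_far = 1.0, 100.0
lam = np.array([0.5, 0.3, 0.2])   # barycentric weights of the pixel
zv = np.array([2.0, 5.0, 10.0])   # vertex depths z_j^{(t)}

def z_norm(zv):
    S = (lam / zv).sum()          # 1/S is the perspective-correct depth
    return (z_far - 1.0 / S) / (z_far - z_near)

m = 1
S = (lam / zv).sum()
analytic = -lam[m] / ((z_far - z_near) * (zv[m] * S) ** 2)

h = 1e-6
zv2 = zv.copy(); zv2[m] += h
print(analytic, (z_norm(zv2) - z_norm(zv)) / h)   # should match
```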

The final form of the depth derivative is

\[\begin{align} \frac{\partial I^i}{\partial z_j^i} &= \frac{w_j^i}{\gamma}(C_j^i- I^i)dz_j^i\\\\ \frac{\partial I^i}{\partial z_j^i}\frac{\partial z_j^i}{\partial z_j^{(m)}} &= \frac {w_j^i}{\gamma} \left( I^i - C_j^i \right)\left( \frac{\lambda_j^{(m)}}{\left(z_\text{far}-z_\text{near} \right)\left(z_j^{(m)}\sum_t \lambda_j^{(t)}/z_j^{(t)}\right)^2}\right) && \text{with } U_j = \{u_{m,n}\}\\\\ \frac{\partial I^i}{\partial z_j^i}\frac{\partial z_j^i}{\partial u_{k,l}} &= \frac {w_j^i}{\gamma} \left( I^i - C_j^i \right) \left( \frac{\displaystyle\sum_w\left[\frac{v_{w,k}}{z_j^{(w)}}\sum_tv_{l,t}p_t^i\right]}{\left(z_\text{far}-z_\text{near} \right)\left( \sum_w\lambda_j^{(w)}/z_j^{(w)} \right)^2} \right) &&\text{with } V_j = U_j^{-1} = \{ v_{m,n}\} \end{align}\]



Color derivatives 3

The color \(C_j^i\) depends on the pixel coordinates and on the face's vertices, but it is a piece-wise function over the \(R^2\) sub-faces of a mesh with (face) resolution \(R\). Therefore the derivative w.r.t. the selected sub-face will be the same as specified above, while for the other sub-faces it will be zero, so no change is needed apart from considering that. Thus

\[\frac{\partial I^i}{\partial C_j^i} = w_j^i\]


Silhouette Image

Regarding the partial derivatives of \(\hat I_s\): given that the silhouette does not use color, the derivative only involves one partial derivative, w.r.t. the probability map \(\mathcal D\).

\[I_s^i = 1 - \prod_j\left( 1-\mathcal D_j^i \right) \hspace{10mm} \text{with } \hspace{5mm} \mathcal D_j^i = \frac 1 {1 + \exp\left( \frac{-D(i,j)}{\sigma} \right)}\]

for the \(i\)-th pixel and with \(D\) as described above, thus

\[\frac{\partial I^i_s}{\partial \mathcal D_j^i} = \frac{1-I_s^i}{1-\mathcal D_j^i}d\mathcal D_j^i\]

In other words we know that \(D\) depends only on \(\pmb \lambda\) which only depends on \(u_{k,l}\). Therefore

\[\frac{\partial I^i_s}{\partial u_{k,l}} = \frac{1-I_s^i}{1-\mathcal D_j^i}\frac{\partial \mathcal D_j^i}{\partial u_{k,l}}\]

Where all partial derivatives are already defined.
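A minimal torch sketch of the silhouette aggregation, checking that derivative with autograd (made-up inputs):

```python
import torch

torch.manual_seed(0)
D = torch.rand(5, requires_grad=True)   # probability map values D_j^i
I_s = 1 - torch.prod(1 - D)             # soft silhouette for one pixel

I_s.backward()                          # autograd: dI_s / dD_j
analytic = (1 - I_s) / (1 - D)          # (1 - I_s) / (1 - D_j)
print(torch.allclose(D.grad, analytic.detach()))
```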


Losses

Everything originates in the loss \(\mathcal L = \mathcal L_s + \mu_c \mathcal L_c + \mu_g \mathcal L_g\), composed of a weighted sum of the silhouette loss \(\mathcal L_s\), the color loss \(\mathcal L_c\) and the geometry loss \(\mathcal L_g\).

\[\begin{align} \mathcal L_s &= 1-\frac{||\hat I_s \otimes I_s||_1}{||\hat I_s \oplus I_s - \hat I_s \otimes I_s||_1} \\\\ \mathcal L_c &= ||\hat I_c - I_c ||_1 \end{align}\]

So far we have dropped the hat (\(\hat \cdot\)); here we need to re-introduce it, with \(\hat I\) being the predicted image, \(I_s\) the silhouette and \(I_c\) the color image. In other words, \(\mathcal L_s\) works as an Intersection over Union (IoU) loss, and \(\mathcal L_c\) is a simple L1 loss. On the other hand, the geometry loss is entirely implemented in the ML framework, so it needs no special treatment here4.
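A minimal torch sketch of both losses, with \(\otimes\) as the element-wise product and \(\oplus\) as the element-wise sum (the reduction choices are my assumption):

```python
import torch

def silhouette_loss(I_s_hat, I_s):
    """Soft IoU loss: 1 - |I_hat * I|_1 / |I_hat + I - I_hat * I|_1."""
    inter = (I_s_hat * I_s).sum()
    union = (I_s_hat + I_s - I_s_hat * I_s).sum()
    return 1 - inter / union

def color_loss(I_c_hat, I_c):
    return (I_c_hat - I_c).abs().mean()   # (mean-reduced) L1 loss

# Usage, with the geometry loss living entirely in the framework:
# loss = silhouette_loss(s_hat, s) + mu_c * color_loss(c_hat, c) + mu_g * L_g
```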

We need to calculate the derivatives of each loss w.r.t. all the estimated values (the final predictions of the neural network), that means with respect to \(u_{k,l}\) (the \(x\) or \(y\) projected coordinates of each vertex of each face), the unprojected depth \(z_j^{(m)}\) of each vertex of each face, and the color \(C_j^i\) for each face \(j\) and sub-face. Luckily for us, several operations inside the losses are made with the ML framework, meaning we do not need to define the gradient for those operations manually (i.e. automatic differentiation in action). Nonetheless, there are a few operations not done within the framework but in the custom GPU kernel. Those operations produce \(\hat I_s\) and \(\hat I_c\), thus

\[\begin{align} \frac{\partial \mathcal L}{\partial \hat I_c}\frac{\partial \hat I_c}{\partial u_{k,l}} &= \mu_c\frac{\partial \mathcal L_c}{\partial \hat I_c} \frac{\partial \hat I_c}{\partial u_{k,l}}\\\\ \frac{\partial \mathcal L}{\partial \hat I_c}\frac{\partial \hat I_c}{\partial z_k^{(m)}} &= \mu_c\frac{\partial \mathcal L_c}{\partial \hat I_c} \frac{\partial \hat I_c}{\partial z_{k}^{(m)}} \\\\ \frac{\partial \mathcal L}{\partial \hat I_s}\frac{\partial \hat I_s}{\partial u_{k,l}} &= \frac{\partial \mathcal L_s}{\partial \hat I_s}\frac{\partial \hat I_s}{\partial u_{k,l}}\\\\ \frac{\partial \mathcal L}{\partial \hat I_c}\frac{\partial \hat I_c}{\partial C_j^i} &= \mu_c \frac{\partial \mathcal L_c}{\partial \hat I_c}\frac{\partial \hat I_c}{\partial C_j^i} \end{align}\]

\(\partial \mathcal L_c/\partial \hat I_c\) and \(\partial \mathcal L_s /\partial \hat I_s\) are already calculated for us, thus we only care about calculating the partial derivatives of \(\hat I_c\) and \(\hat I_s\).


All derivatives together

The full loss gradient has the following form

\[\begin{align} \frac{\partial \mathcal L}{\partial u_{k,l}} &= \frac{\partial \mathcal L_s}{\partial \hat I_s}\frac{\partial \hat I_s}{\partial u_{k,l}} + \mu_c\frac{\partial \mathcal L_c}{\partial \hat I_c} \frac{\partial \hat I_c}{\partial u_{k,l}} \\\\ \frac{\partial \mathcal L}{\partial z_k^{(m)}} &= \mu_c\frac{\partial \mathcal L_c}{\partial \hat I_c} \frac{\partial \hat I_c}{\partial z_{k}^{(m)}}\\\\ \frac{\partial \mathcal L}{\partial C_j^i} &= \mu_c \frac{\partial \mathcal L_c}{\partial \hat I_c}\frac{\partial \hat I_c}{\partial C_j^i} \end{align}\]

The final gradient for the silhouette image \(I_s\) will only depend on \(u_{k,l}\), as follows

\[\begin{align} \frac{\partial I^i_s}{\partial u_{k,l}} &= \frac{1-I_s^i}{1-\mathcal D_j^i}\frac{\partial \mathcal D_j^i}{\partial u_{k,l}}\\\\ &= \frac{1-I_s^i}{1-\mathcal D_j^i}\left( - \frac{\mathcal D_j^i(1-\mathcal D_j^i) }{\sigma} \left(2 \pmb \lambda_s v_{s,k}\sum_tv_{l,t}p_t \right) \right) \end{align}\]

The final gradient of the color image \(I^i\) w.r.t. each \(u_{k,l}\) and \(z_j^{(m)}\) will be, first w.r.t. \(u_{k,l}\):

\[\begin{align} \frac{\partial I^i}{\partial u_{k,l}} &= \frac{\partial I^i}{\partial \mathcal D_j^i}\frac{\partial \mathcal D_j^i}{\partial u_{k,l}}+ \frac{\partial I^i}{\partial z_j^i}\frac{\partial z_j^i}{\partial u_{k,l}}\\\\ \frac{\partial I^i}{\partial \mathcal D_j^i}\frac{\partial \mathcal D_j^i}{\partial u_{k,l}} &= \frac {w_j^i}{\mathcal D_j^i} \left( I^i - C_j^i \right)\left( \frac{\mathcal D_j^i(1-\mathcal D_j^i) }{\sigma} \left(2 \pmb \lambda_s v_{s,k}\sum_tv_{l,t}p_t \right) \right)\\\\ \frac{\partial I^i}{\partial z_j^i}\frac{\partial z_j^i}{\partial u_{k,l}} &= \frac {w_j^i}{\gamma} \left( I^i - C_j^i \right) \frac{\displaystyle\sum_w\left[\frac{v_{w,k}}{z_j^{(w)}}\sum_tv_{l,t}p_t^i\right]}{\left(z_\text{far}-z_\text{near} \right)\left( \sum_w\lambda_j^{(w)}/z_j^{(w)} \right)^2} \\\\ \end{align}\]

And the gradient of \(I^i\) w.r.t. each \(z_j^{(m)}\) is

\[\begin{align} \frac{\partial I^i}{\partial z_j^{(m)}} &= \frac{\partial I^i}{\partial z_j^i}\frac{\partial z_j^i}{\partial z_j^{(m)}} \\\\ &= \frac {w_j^i}{\gamma} \left( I^i - C_j^i \right)\left( \frac{1}{z_\text{far}-z_\text{near}}\ \frac{\lambda_j^{(m)}}{\left(z_j^{(m)}\sum_t \lambda_j^{(t)}/z_j^{(t)}\right)^2} \right) \end{align}\]

And the gradient of \(I^i\) w.r.t. each color \(C_j^i\) is

\[\begin{align} \frac{\partial I^i}{\partial C_j^i} &= w_j^i \end{align}\]


Questions

Some questions that may arise while understanding the code

  1. Why do we only work with 2D coordinates during the rasterization process? Because the points are already projected.
  2. Why do we calculate \(F F^T\)? Because it gives us useful information; for example, the element (1,1) is the squared length of the first vector. It is also used to check how far away a point is from a triangle.
  3. Why is the barycentric distance the min of the barycentric weights (coordinates)? Empirically it works, but isn't it discontinuous? Yes; the gradient will be zero for the coordinates that are not the \(\arg\min\), and the \(\arg\min\) component of this barycentric distance is in fact differentiable.
  4. Why \(x+y+z = 0\)? It cannot happen (as stated in the code) that w[0] + w[1] + w[2] = 0 (the code has a typo), but what can happen is \(x+y+z=0\), because if \((x,y,z) = \vec{PQ}\) (i.e. \(\vec{PQ} = Q-P\)) with \(P\) and \(Q\) inside the triangle and normalized, then \((x,y,z)\) is no longer a set of normalized barycentric coordinates.
  5. What is the structure of texture? It is \([\text{batch, face_number, resolution, 3}]\)
  6. Should we have \(C_j^i\) or just \(C_j\), given that the color only depends on the \(j\)-th face? False: faces can have a higher definition (i.e. each face subdivided into \(R^2\) sub-faces for a definition \(R\)), so the resulting color depends on the coordinates of the \(i\)-th pixel because they define which sub-face's color will be used.


Some useful resources

Footnotes

  1. To follow the derivative in the code, follow backwards grad_v from line 726 (backward_barycentric_ ...) on the commented branch. 

  2. To follow the derivative in the code, follow backwards grad_v[][2], for example from line 708 on the commented branch. 

  3. To follow the derivative in the code, check line 696 on the commented branch. 

  4. The geometry loss works only with the corresponding coordinates of the predicted vertices in a simple set of operations, therefore no custom GPU implementation is needed.