Add LU solver documentation #890
base: main
Conversation
mgovers commented on Feb 7, 2025 (edited)
- Add LU solver documentation
- Rationale
- Implementation
Signed-off-by: Martijn Govers <Martijn.Govers@Alliander.com>
Force-pushed 78c6fe8 to 794e15e
@TonyXiang8787 this is ready for a first review round. I will spend some time tomorrow going through all the documentation from start to finish myself as well, to check whether the structure still makes sense, but some feedback would be nice at this stage too.
> Because the backward error is calculated on the $x$ and $r$ from the previous iteration, the iterative refinement loop will always be executed twice.
This is a potential follow-up improvement to the LU solver implementation.
I do not understand the logic here. What do you mean by twice? The first time is just the solution of the original formula.
We always do one actual iterative refinement, but in the limit where
I still do not understand your point here. So we now do the following initialization:
- berr <- inf
- r = b

The loop is (ignoring the max iteration limit):

while (berr > epsilon_converge)
- solve A*dx = r
- update berr, and update x
- update r

So:
- In the first iteration, we are just solving Ax = b, which is the original equation. The berr will be very big. This should not be seen as iterative refinement.
- In the second iteration, we do one iterative refinement, and we get the new berr.
- If the new berr is now within the threshold, the loop terminates.

We always do ONE iterative refinement, but you mentioned in the doc that "iterative refinement will always be executed twice".
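A minimal sketch of this loop in C++, using Eigen's dense `PartialPivLU` as a stand-in for the actual sparse block LU solver; the matrix, right-hand side, tolerance, and the component-wise backward-error formula are illustrative assumptions, not the power-grid-model implementation:

```cpp
#include <Eigen/Dense>
#include <iostream>
#include <limits>

int main() {
    // made-up well-conditioned system; the real solver works on sparse block matrices
    Eigen::MatrixXd a(2, 2);
    a << 4.0, 1.0,
         1.0, 3.0;
    Eigen::VectorXd const b = (Eigen::VectorXd(2) << 1.0, 2.0).finished();

    Eigen::PartialPivLU<Eigen::MatrixXd> const lu(a);

    double constexpr epsilon_converge = 1.0e-12;
    double berr = std::numeric_limits<double>::infinity();
    Eigen::VectorXd x = Eigen::VectorXd::Zero(2);
    Eigen::VectorXd r = b;  // initialization: berr <- inf, r <- b

    int iterations = 0;
    while (berr > epsilon_converge) {
        Eigen::VectorXd const dx = lu.solve(r);  // solve A * dx = r
        // the backward error uses the x and r from the previous pass, so the
        // first pass always reports a large berr: it is the plain solve of Ax = b
        berr = (r.cwiseAbs().array() /
                ((a.cwiseAbs() * x.cwiseAbs() + b.cwiseAbs()).array() + 1.0e-300))
                   .maxCoeff();
        x += dx;        // update x
        r = b - a * x;  // update r
        ++iterations;
    }
    // prints 2: one plain solve plus one actual refinement pass
    std::cout << "x = " << x.transpose() << ", loop ran " << iterations << " times\n";
}
```

This reproduces the behavior under discussion: the loop body executes twice, of which only the second pass is a true refinement.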
> Instead of calculating $\underline{m}_p^{-1}$ to obtain $\underline{m}_p^{-1} a_p$, the power grid model first solves the matrix equation $l_p y_p = p_p a_p$, followed by solving $u_p z_p = y_p$. The end result is then $\underline{m}_p^{-1} a_p = q_p z_p$.
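For illustration, here is a dense analogue of this two-step solve using Eigen's `FullPivLU`, which factorizes $p \, m \, q = l \, u$ and thus matches the $p_p$/$q_p$ convention above; the matrix and right-hand side are made up:

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
    int const n = 3;
    Eigen::MatrixXd m(n, n);
    m <<  2.0, -1.0,  0.0,
         -1.0,  2.0, -1.0,
          0.0, -1.0,  2.0;
    Eigen::VectorXd a(n);
    a << 1.0, 0.0, 1.0;

    // full-pivoting LU: p * m * q = l * u, with l unit lower triangular
    Eigen::FullPivLU<Eigen::MatrixXd> const lu(m);
    Eigen::MatrixXd l = Eigen::MatrixXd::Identity(n, n);
    l.triangularView<Eigen::StrictlyLower>() = lu.matrixLU();
    Eigen::MatrixXd const u = lu.matrixLU().triangularView<Eigen::Upper>();

    // step 1: solve l * y = p * a by forward substitution
    Eigen::VectorXd const y = l.triangularView<Eigen::Lower>().solve(lu.permutationP() * a);
    // step 2: solve u * z = y by backward substitution
    Eigen::VectorXd const z = u.triangularView<Eigen::Upper>().solve(y);
    // the result m^{-1} * a = q * z is obtained without ever forming m^{-1}
    Eigen::VectorXd const result = lu.permutationQ() * z;

    std::cout << (result - lu.solve(a)).norm() << "\n";  // ~1e-16: matches Eigen's own solve
}
```

Avoiding the explicit inverse is the point of the review thread below: the two triangular solves keep the full precision of the factorization.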
This is another potential follow-up improvement to the LU solver implementation:
Because matrix inversion is equivalent to solving this set of equations on a full matrix once, it may actually be faster to pre-calculate $\underline{m}_p^{-1}$ and then left-multiply each block in the above by it.
In addition, it may make the code simpler. (TODO: check whether that solution would be numerically stable)
This was the very first prototype of the LU solver. It quickly proved to be extremely numerically unstable. By calculating the inverse explicitly, you already lose half of the precision. Applying that inverse to the other off-diagonal blocks makes the result much less accurate and leads to divergence of the outer loop (NRPF).
Have a look at the original PR, especially this commit.
You can now explicitly state in the docs that applying the inverse of the diagonal blocks will result in numerical instability.
Another thing I have tried is to use Eigen's built-in function to do the triangular solves instead of writing them manually. However, the benchmark was not so convincing: for some reason, the Eigen triangular solver is much slower than the manually written code. Maybe this can be an investigation to understand why.
> This was the very first prototype of the LU solver. It quickly proved to be extremely numerically unstable. By calculating the inverse explicitly, you already lose half of the precision. Applying that inverse to the other off-diagonal blocks makes the result much less accurate and leads to divergence of the outer loop (NRPF).
> Have a look at the original PR, especially this commit.

> You can now explicitly state in the docs that applying the inverse of the diagonal blocks will result in numerical instability.
Right. Thanks, that's useful input! Done.
> Another thing I have tried is to use Eigen's built-in function to do the triangular solves instead of writing them manually. However, the benchmark was not so convincing: for some reason, the Eigen triangular solver is much slower than the manually written code. Maybe this can be an investigation to understand why.
Do you want me to document that we did a quick investigation and ran into a dead end? I don't think it's worth investigating at this time.
We don't need to mention anything in the docs; this is a minor implementation choice. We can further benchmark in the future to see why Eigen is slower.
> #### Dense LU-factorization process
>
> The Gaussian elimination process itself is as usual. Let
> $M_p \equiv \begin{bmatrix} m_p & \vec{r}_p^T \\ \vec{q}_p & \hat{M}_p \end{bmatrix}$, where $p$ is
This is for me an unusual way of describing the problem. Usually, we talk about 9 blocks in the process:

[
[a11, a12, a13],
[a21, a22, a23],
[a31, a32, a33],
]

where

- a11: block containing the handled L/U combination, where the pivots have already been processed
- a12: already-handled U part just above the current pivot
- a13: already-handled U part above the remaining pivots to be handled
- a21: already-handled L part just to the left of the current pivot
- a22: current pivot
- a23: current U part
- a31: already-handled L part to the left of the remaining pivots to be handled
- a32: current L part
- a33: rest of the matrix, which will be adjusted by the current elimination process
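Rendered as a block matrix (my rendering, following the same naming as the list above):

$$
M = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix},
$$

where the middle block row/column corresponds to the current pivot, the first block row/column to the already-factorized $L$/$U$ parts, and the last to the part still to be updated by the elimination.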
What do you mean, I mentioned 4 blocks? I described the process for an arbitrary block size. Sym/asym is one thing, but the specific solvers can also add more dimensions to the block size. E.g. for NRPF, we treat the P and Q components separately, so it's not 3x3 but 9x9, right?
Since the process is iterative, you can step in at any pivot element.
I mean, we should not omit the part which has already been factorized. We have to show the entire matrix in place, including the upper and left parts of the matrix, which already contain the factorized L/U values.
When you are at pivot
> The Gaussian elimination process itself is as usual. Completely analogously to and following the same conventions as [before](#dense-lu-factorization-process), let
> $\underline{M}_p \equiv \begin{bmatrix} \underline{m}_p & \vec{\underline{r}}_p^T \\ \vec{\underline{q}}_p & \hat{\underline{M}}_p \end{bmatrix}$
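For context (my restatement in the quoted notation, not text from the docs): one pivot step of this block Gaussian elimination is the standard Schur-complement update

$$
\hat{\underline{M}}_p \leftarrow \hat{\underline{M}}_p - \vec{\underline{q}}_p \, \underline{m}_p^{-1} \, \vec{\underline{r}}_p^T \,,
$$

where, per the discussion elsewhere in this review, $\underline{m}_p^{-1}$ is applied via the two triangular solves of the factorized pivot block rather than by explicit inversion.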
- Why do you use the underline? In EE, an underline specifically means a complex number.
- Same comment as above: using 4 blocks instead of 9 blocks is confusing for the readers. A reader might think this is only applicable to the first pivot.
I think I now understand what you mean. I'm describing the process at some point in the iteration, during the treatment of the pivot element
I can indeed extend it, but the
The
Also, for the reader, including the previous rows/columns shows that the factorization is in the middle of the process instead of just at the beginning.
> Here, $\overrightarrow{\underline{m}_p^{-1}\underline{q}_p}$ is symbolic notation for the block-vector of solutions of the equation $\underline{m}_p x_{k;p} = \underline{q}_{k;p}$, where $k = 0..(p-1)$. Similarly, $\widehat{\underline{m}_p^{-1}\underline{q}_p \underline{r}_p^T}$ is symbolic notation for the block-matrix of solutions of the equation $\underline{m}_p x_{k,l;p} = \underline{q}_{k;p} \underline{r}_{l;p}^T$, where $k,l = 0..(p-1)$. That is:
This is not correct, and it was actually one of the attempts during prototyping. It did not work and produced the same numerical instability as doing the explicit inverse.
The real implementation now is that we do not fully solve
We apply a lower-triangular solve to the U part to the right of the current pivot:
Lines 313 to 329 in 8d410ae:

```cpp
// for block matrix
// calculate U blocks in the right of the pivot, in-place
// L_pivot * U_pivot,k = P_pivot * A_pivot,k   k > pivot
if constexpr (is_block) {
    for (Idx u_idx = pivot_idx + 1; u_idx < row_indptr[pivot_row_col + 1]; ++u_idx) {
        Tensor& u = lu_matrix[u_idx];
        // permutation
        u = (block_perm.p * u.matrix()).array();
        // forward substitution, per row in u
        for (Idx block_row = 0; block_row < block_size; ++block_row) {
            for (Idx block_col = 0; block_col < block_row; ++block_col) {
                // forward subtract
                u.row(block_row) -= pivot(block_row, block_col) * u.row(block_col);
            }
        }
    }
}
```
We then apply an upper-triangular solve to the L part below the current pivot:
Lines 342 to 366 in 8d410ae:

```cpp
if constexpr (is_block) {
    // for block matrix
    // calculate L blocks below the pivot, in-place
    // L_k,pivot * U_pivot = A_k_pivot * Q_pivot   k > pivot
    Tensor& l = lu_matrix[l_idx];
    // permutation
    l = (l.matrix() * block_perm.q).array();
    // forward substitution, per column in l
    // l0 = [l00, l10]^T
    // l1 = [l01, l11]^T
    // l = [l0, l1]
    // a = [a0, a1]
    // u = [[u00, u01]
    //      [0  , u11]]
    // l * u = a
    // l0 * u00 = a0
    // l0 * u01 + l1 * u11 = a1
    for (Idx block_col = 0; block_col < block_size; ++block_col) {
        for (Idx block_row = 0; block_row < block_col; ++block_row) {
            l.col(block_col) -= pivot(block_row, block_col) * l.col(block_row);
        }
        // divide diagonal
        l.col(block_col) = l.col(block_col) / pivot(block_col, block_col);
    }
} else {
```
This is happening in the whole LU factorization part.
In the solve part, we do the remaining lower/upper triangular solves on the actual `b` and `y`.
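In equation form (my summary of the above, assuming the block-level factorization $P M Q = L U$ with the notation of the docs): the solve phase computes

$$
L \vec{y} = P \vec{b} \quad \text{(block forward substitution)}, \qquad
U \vec{z} = \vec{y} \quad \text{(block backward substitution)}, \qquad
\vec{x} = Q \vec{z},
$$

where each diagonal block is again applied through its own permuted triangular factors instead of an explicit inverse.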
Force-pushed 53051da to f2551a8