Generically sized dual numbers #58
Merged
+3,401
−2,136
This PR uses more generic underlying data structures so that the derivative parts of vector dual (including second order dual and hyper dual) numbers can use either constant or dynamically sized vectors. This enables the calculation of gradients with respect to an arbitrary number of variables.
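As a rough illustration of a dynamically sized derivative part, the sketch below uses a plain `Vec<f64>` in place of a linear algebra vector type. The names `DualVec`, `variable` and `gradient` are hypothetical and chosen for this example only; they are not the crate's API.

```rust
use std::ops::{Add, Mul};

/// Hypothetical dual number whose gradient length is chosen at run time.
#[derive(Clone, Debug, PartialEq)]
struct DualVec {
    re: f64,
    eps: Vec<f64>,
}

impl DualVec {
    /// Seed the `i`-th of `n` independent variables with a unit derivative.
    fn variable(value: f64, i: usize, n: usize) -> Self {
        let mut eps = vec![0.0; n];
        eps[i] = 1.0;
        DualVec { re: value, eps }
    }
}

impl Add for DualVec {
    type Output = DualVec;
    fn add(self, rhs: DualVec) -> DualVec {
        // Sum rule: derivatives add component-wise.
        let eps: Vec<f64> = self.eps.iter().zip(&rhs.eps).map(|(a, b)| a + b).collect();
        DualVec { re: self.re + rhs.re, eps }
    }
}

impl Mul for DualVec {
    type Output = DualVec;
    fn mul(self, rhs: DualVec) -> DualVec {
        // Product rule: (a * b)' = a' * b + a * b'.
        let (a, b) = (self.re, rhs.re);
        let eps: Vec<f64> = self
            .eps
            .iter()
            .zip(&rhs.eps)
            .map(|(da, db)| da * b + a * db)
            .collect();
        DualVec { re: a * b, eps }
    }
}

/// Evaluate `f` and its gradient at `x` for any number of variables.
fn gradient<F>(f: F, x: &[f64]) -> (f64, Vec<f64>)
where
    F: Fn(&[DualVec]) -> DualVec,
{
    let n = x.len();
    let vars: Vec<DualVec> = x
        .iter()
        .enumerate()
        .map(|(i, &v)| DualVec::variable(v, i, n))
        .collect();
    let out = f(&vars);
    (out.re, out.eps)
}
```

Calling `gradient(|v| v[0].clone() * v[1].clone() + v[0].clone(), &[2.0, 3.0])` evaluates the function and both partial derivatives in a single pass. Every operation touches all `n` derivative entries, which is why forward mode becomes expensive for many variables.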
In theory, the performance of dual numbers with constant sizes should not change. In practice, benchmarks showed minor performance regressions. Note, however, that while the benchmarks in this crate are reproducible, they are somewhat unreliable and opaque with respect to compiler settings. A benchmark in the feos crate showed no significant loss in performance.
Even though this allows the automatic differentiation of functions involving many variables, at some point dual numbers (i.e. forward mode AD) become less performant than reverse mode AD (backpropagation).
To accommodate the changes, the `DualNum` trait can no longer have `Copy`, `Send` and `Sync` as supertraits. For statically allocated dual numbers these marker traits are still implemented and, if needed, additional trait bounds can be used for specific applications.
Finally, for dynamically sized dual numbers the number of derivatives is only known at run time. Therefore it is impossible for functions like `from` or `zero` to allocate the appropriate amount of memory. To avoid unexpected behavior arising from possibly empty vectors, all derivative parts are wrapped in a `Derivative` struct that contains the actual vector within an `Option`. This has the added benefit that algebraic operations can be skipped if one or more of the operands is not a function of the independent variables.
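The idea of wrapping the derivative part in an `Option` can be sketched as follows. This is a simplified stand-in, not the crate's implementation: it assumes a `Vec<f64>` as the vector type, and the `Dual` struct and the `none`/`some` constructors are hypothetical names for this example.

```rust
use std::ops::{Add, Mul};

/// Simplified stand-in for a derivative part: `None` means "identically zero".
#[derive(Clone, Debug, PartialEq)]
struct Derivative(Option<Vec<f64>>);

impl Derivative {
    fn none() -> Self {
        Derivative(None)
    }
    fn some(v: Vec<f64>) -> Self {
        Derivative(Some(v))
    }
}

impl Add for Derivative {
    type Output = Derivative;
    fn add(self, rhs: Derivative) -> Derivative {
        Derivative(match (self.0, rhs.0) {
            // Only this case performs element-wise arithmetic.
            (Some(a), Some(b)) => {
                Some(a.iter().zip(&b).map(|(x, y)| x + y).collect::<Vec<f64>>())
            }
            // If one side carries no derivative, the other is passed through.
            (Some(a), None) | (None, Some(a)) => Some(a),
            (None, None) => None,
        })
    }
}

impl Mul<f64> for Derivative {
    type Output = Derivative;
    fn mul(self, rhs: f64) -> Derivative {
        // Scaling a missing derivative stays `None`, so no work is done.
        Derivative(self.0.map(|v| v.into_iter().map(|x| x * rhs).collect::<Vec<f64>>()))
    }
}

/// Hypothetical dual number built on top of `Derivative`.
#[derive(Clone, Debug, PartialEq)]
struct Dual {
    re: f64,
    eps: Derivative,
}

impl From<f64> for Dual {
    /// A constant needs no knowledge of the gradient length.
    fn from(re: f64) -> Self {
        Dual { re, eps: Derivative::none() }
    }
}

impl Mul for Dual {
    type Output = Dual;
    fn mul(self, rhs: Dual) -> Dual {
        // Product rule: (a * b)' = a' * b + a * b'.
        let (a, b) = (self.re, rhs.re);
        Dual { re: a * b, eps: self.eps * b + rhs.eps * a }
    }
}
```

With this layout, `Dual::from(3.0)` never has to guess a vector length, and multiplying two constants propagates `None` without allocating or iterating over any derivative entries.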