## Context

`lowtime` was initially fully implemented in Python. The main bottleneck computations have been moved to Rust, implemented in:

- a PR that uses `pyo3` and moves max flow to Rust
- Full `find_min_cut` Rust Integration #3, which fully implements `find_min_cut` in Rust

This has resulted in an end-to-end average speedup of 3.61x and a max speedup of 4.98x across 9 workloads. Raw profiling data is available on this spreadsheet (access required). The profiling infrastructure is implemented in a separate repo (access required), and the approach is explained in PR #1.
Based on the profiling data of the original Python version of `lowtime`, the `find_min_cut` call made up ~90% of the total run time, which is why it was the first to be moved to Rust. However, there remains an opportunity to fully reimplement `lowtime` in Rust for performance gains in the remaining 10% of computations.
To concretely motivate this, the longest profiling job took around 2870 minutes; 10% of this is 287 minutes, and speeding that up by 3.61x would save 287 - (287/3.61) ≈ 287 - 80 = 207 minutes.
In general, if we assume Rust gives the same 3.61x speedup on the remaining 10%, we can expect a further end-to-end speedup of ~1.08x by calculating 1 / (0.90 + 0.10/3.61) ≈ 1.08.
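This estimate can be checked numerically (assuming, as above, that the remaining ~10% of run time gets the same 3.61x speedup):

```python
# Amdahl's-law estimate of the further speedup from moving the
# remaining ~10% of computation to Rust, assuming the same 3.61x factor.
python_fraction = 0.10  # share of current run time still in Python
rust_speedup = 3.61

further_speedup = 1 / ((1 - python_fraction) + python_fraction / rust_speedup)
print(f"{further_speedup:.2f}x")  # ~1.08x
```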
## Next Steps

The `PhillipsDessouky` class in Python contains a `run` function. `run` contains a loop that fundamentally repeats 4 things:
1. Call `annotate_capacities`, which updates the `lb` (lower bound) and `ub` (upper bound) values of every `Operation`. These updates depend on the `duration` update in Bullet 3 below.
2. Call `find_min_cut`, which finds `s_set`, `t_set` for the minimum cut of the critical DAG, based on `capacity` values computed from the updated `lb`, `ub` in Bullet 1 above. This function is fully implemented in Rust in Full `find_min_cut` Rust Integration #3.
3. Call `reduce_durations`, which updates the `duration` values of `Operation`s according to the minimum cut found in Bullet 2 above. These updated `duration` values are needed by Bullet 1 of the next iteration.
4. Call `aoa_to_critical_dag`, which constructs the new critical DAG based on the updated `duration` values from Bullet 3 above. The updated critical DAG is needed for the next iteration to work correctly.
The key observation from the above is that `find_min_cut` is the only function that does not need to access or modify `Operation`s: all the data it needs is contained in the `critical_dag` edges' `capacity` values. The remaining 3 functions require access to `Operation` member values, namely `duration`.
Ideally, we would reimplement the remaining 3 functions one at a time, but because of the reliance on `Operation`, the 3 functions must be reimplemented together. In particular, the remaining functions require:

- `Operation` in Rust.
- `CostModel` in Rust.
## Reimplementing `Operation`

We do not need to reimplement the entire `Operation` class; we just need a simple `struct` that stores the member variables needed by `PhillipsDessouky::run`: `min_duration`, `max_duration`, `duration`. Each `Operation` also contains a `cost_model`, which is queried when `get_cost` is called; we discuss this further below.
A sample `operation.rs` can be found in commit `ddacc3f` of PR #3. Note that `get_cost` and `cost_model` are commented out; the difficulty of reimplementing `CostModel` is explained in the section below.
## Reimplementing `CostModel`

`CostModel` in the Python version of `lowtime` is an abstract base class; in Rust, this can be implemented as a trait. To have a truly generic `CostModel` (not just exponential models), `Operation`s should be able to store a trait-bound `cost_model`. A more detailed explanation of this problem is given in this comment from PR #3, along with this longer summary and relevant links (including the official pyo3 trait-bounds guide). In short, it is non-trivial to have a `pyo3` function that receives trait-bound objects as input.
A sample `cost_model.rs` can be found in commit `ddacc3f` of PR #3. Note that this example did not compile at the time, but it conveys the data structures and traits that we want.
A "good" solution would implement the trait and "make it work" with `pyo3` somehow. If this is infeasible, though, a viable fallback is to hard-code the cost model as an exponential model and make the Rust implementation compatible only with exponential models. For example:
- The Python-side `PhillipsDessouky` has 2 implementations of `run`: one for exponential models, another for a custom model.
- If an exponential model is used, use the "Full Rust version" of `run`, which computes everything in Rust.
- If a custom model is used, use the "Partial Rust version" of `run`, which only computes `find_min_cut` in Rust.
To support the "Full Rust version", the Rust-side codebase would include a hard-coded exponential model, which simply stores 3 constants `a`, `b`, `c` for every `Operation` (`get_cost` would return `a * e^(b * duration) + c`). A concrete example is shown below:
```rust
// cost_model.rs
struct ExponentialModel {
    a: f64,
    b: f64,
    c: f64,
}

impl ExponentialModel {
    fn new(a: f64, b: f64, c: f64) -> Self {
        ExponentialModel { a, b, c }
    }

    fn get_cost(&self, duration: u32) -> f64 {
        self.a * f64::exp(self.b * duration as f64) + self.c
    }
}

// operation.rs
struct Operation {
    // other member variables
    cost_model: ExponentialModel,
}

impl Operation {
    pub fn new(/* other member variables, */ a: f64, b: f64, c: f64) -> Self {
        Operation {
            // other member variables
            cost_model: ExponentialModel::new(a, b, c),
        }
    }

    fn get_cost(&mut self, duration: u32) -> f64 {
        self.cost_model.get_cost(duration)
    }
}
```
## Required Changes in `solver.py`

Currently, `solver.py` instantiates a `rust_runner` (a Rust-side `PhillipsDessouky` object), which requires information about the `critical_dag` to reconstruct it in Rust. This conversion is aided by `format_rust_inputs`, which converts an `nx.DiGraph` object into a format compatible with the Rust-side `LowtimeGraph` constructor. Since the current Rust implementation does not include `Operation` and `CostModel`, this information is not processed in `format_rust_inputs`.
A sample `format_rust_inputs` extended with additional `Operation` information is included below. Note that the sample does not include `CostModel` parameters, which should be included as well. Finally, note that `op_details` can take on a `None` value (interpreted Rust-side as an `Option`), because some edges may not correspond to an `Operation` (read Appendix E.1 of the Perseus paper for a detailed description of how the "Capacity DAG" is constructed).
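A sketch of such an extension is below. The tuple layout and attribute names are assumptions for illustration, not the actual `lowtime` code; `edges` is an iterable of `(u, v, attrs)` triples, i.e. the shape produced by `nx.DiGraph.edges(data=True)`:

```python
# Hypothetical sketch of format_rust_inputs extended with Operation details.
# Field names ("op", "capacity", etc.) are assumptions, not the real lowtime code.

def format_rust_inputs(edges):
    nodes = set()
    rust_edges = []
    for u, v, attrs in edges:
        nodes.update((u, v))
        op = attrs.get("op")  # not every edge carries an Operation
        if op is None:
            # Interpreted Rust-side as Option::None.
            op_details = None
        else:
            op_details = (
                op["duration"],
                op["min_duration"],
                op["max_duration"],
            )
            # CostModel parameters (e.g. a, b, c for an exponential model)
            # would be appended here as well.
        rust_edges.append(((u, v), attrs["capacity"], op_details))
    return sorted(nodes), rust_edges
```

The Rust-side `LowtimeGraph` constructor would then receive `op_details` as an `Option` per edge.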
## Iterative Development for Correctness

The most important thing as we reimplement functions in Rust is that `lowtime` remains correct. The best way to ensure correctness is to compare the modified version's outputs with the original version's outputs.
The infrastructure for this is contained in a separate repo: `lowtime-exp` (access required). `lowtime-exp` contains infrastructure for:

- Profiling a specific version of `lowtime` (`profile_single.sh` or `profile_all.sh`)
- Checking correctness by comparing the outputs of one version to the original (`diff_lowtime_outputs.py`)
`diff_lowtime_outputs.py` takes `--expected_dir` and `--actual_dir` command-line args and compares the output energy schedules of every iteration. Note that the script allows some intermediate iteration outputs to differ (it will tell you which iterations differed) as long as the final energy schedule is equivalent; the reason some intermediate iterations may differ is that `nx.max_flow` and `pathfinding::edmonds_karp` have different implementations with different tie-breaking behavior. As long as the final output is equivalent, this is fine.
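The comparison logic amounts to something like the following sketch (simplified; the real `diff_lowtime_outputs.py` reads files from the two directories):

```python
# Simplified sketch of the diff logic: intermediate iterations may differ
# (different max-flow tie-breaking), but the final schedule must match.

def compare_schedules(expected, actual):
    """expected/actual: lists of per-iteration energy schedules."""
    differing = [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
    final_ok = expected[-1] == actual[-1]
    return final_ok, differing

ok, diffs = compare_schedules(
    [[10, 5], [8, 5], [7, 4]],
    [[10, 5], [8, 6], [7, 4]],  # iteration 1 differs; final matches
)
# ok is True, diffs == [1]
```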
Zooming out, we wish to reimplement the following functions: `annotate_capacities`, `reduce_durations`, `aoa_to_critical_dag`. To reimplement them, we need a Rust implementation of `Operation` and `CostModel`. This is a big project. Each of these components is non-trivial, so there will be bugs. A good way to approach this is to build the components one by one and check correctness before building the next one.
However, all of these components rely on the `Operation` values being correctly updated. For example, suppose we begin by reimplementing `aoa_to_critical_dag` in Rust:

1. Create `graph_utils.rs`.
2. Implement the `aoa_to_critical_dag` function.
3. Create a Rust-side `PhillipsDessouky` member function called `aoa_to_critical_dag`, which calls the Rust-side `aoa_to_critical_dag` on its member `self.dag`.
4. In `solver.py`, replace the call to `self.aoa_to_critical_dag` with `rust_runner.aoa_to_critical_dag`.
This is great, but the problem is that we have no way of testing the correctness of the Rust version of `aoa_to_critical_dag`. Furthermore, since the `critical_dag` on the Python side was not updated, future iterations of `run` will be incorrect (e.g. `annotate_capacities` in the next iteration is given an incorrect `critical_dag`, since it was updated Rust-side but not Python-side).
There are 2 potential solutions to this problem:
1. Only start testing once we have reimplemented everything in Rust. We can simply check that the produced output is equivalent to the original version's. If the output is incorrect, then oh well... the bug is somewhere in the entire reimplementation... we'll find it...
2. While reimplementing everything, check things one by one. In this `aoa_to_critical_dag` example, once we have a Rust implementation, instead of replacing the Python-side call to `aoa_to_critical_dag`, we can run both the Python and Rust versions and be left with 2 DAGs. We can add some Python-side code that iterates over the 2 DAGs to ensure they are identical. This ensures that the Python-side `Operation`s have the updated values, and we can check the correctness of `aoa_to_critical_dag` without having reimplemented everything else.
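The side-by-side check could look like the sketch below. For illustration, the DAGs are represented as `{(u, v): attrs}` dicts; the real check would iterate over the `nx.DiGraph` and the graph returned from Rust:

```python
# Sketch of the side-by-side check: run both the Python and Rust versions of
# aoa_to_critical_dag, then assert the two resulting DAGs are identical.
# DAGs are represented here as {(u, v): attrs} dicts for illustration.

def assert_dags_identical(py_dag, rust_dag):
    assert set(py_dag) == set(rust_dag), "edge sets differ"
    for edge, attrs in py_dag.items():
        assert attrs == rust_dag[edge], f"attributes differ on edge {edge}"

py_dag = {(0, 1): {"capacity": 3.0}, (1, 2): {"capacity": 5.0}}
rust_dag = {(0, 1): {"capacity": 3.0}, (1, 2): {"capacity": 5.0}}
assert_dags_identical(py_dag, rust_dag)  # passes silently
```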