Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full Rust Integration #4

Open
ohjuny opened this issue Dec 10, 2024 · 0 comments
Open

Full Rust Integration #4

ohjuny opened this issue Dec 10, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@ohjuny
Copy link
Collaborator

ohjuny commented Dec 10, 2024

Context

lowtime was initially fully implemented in Python. The main bottleneck computations have been moved to Rust, implemented in:

This has resulted in end-to-end average speed up of 3.61x and max speedup of 4.98x across 9 workloads. Raw profiling data is available on this spreadsheet (access required). The profiling infrastructure is implemented in a separate repo (access required), and the approach is explained in PR #1.

Based on the profiling data of the original Python version of lowtime, the find_min_cut call made up ~90% of the total run time, which is why it was the first to be moved to Rust. However, there remains an opportunity to fully reimplement lowtime in Rust for performance gains in the remaining %10 of computations.

To concretely motivate this, the longest profiling job took around 2870 minutes; 10% of this is 287 minutes; speeding this up by 3.61x would save 287 - (287/3.61) = 287 - 80 = 207 minutes.

In general, if we assume Rust gives us 3.61x speedup, we can expect a further speedup of ~1.08x by calculating

$\frac{1}{0.9 + \frac{0.1}{3.61}}$

Next Steps

Contents

The PhillipsDessouky class in Python contains a run function. run contains a loop which fundamentally repeats 4 things:

  1. Call annotate_capacities, which updates lb (lower bound) and ub (upper bound) values of every Operation. These updates are dependent on the duration update in Bullet 3 below.
  2. Call find_min_cut, which finds s_set, t_set for the minimum cut of the critical DAG, based on capacity values computed from the updated lb, ub in Bullet 1 above. This function is fully implemented in Rust in Rust Integration: Full find_min_cut Rust Integration #3.
  3. Call reduce_durations, which updates the duration values of Operations according to the minimum cut found in Bullet 2 above. These updated duration values are needed by Bullet 1 of the next iteration.
  4. Call aoa_to_critical_dag, which constructs the new critical DAG based on updated duration values from Bullet 3 above. The updated critical DAG is needed for the next iteration to work correctly.

The key observation from above is that find_min_cut is the only function that does not need to access or modify Operations - all the data it needs is contained in the critical_dag edges' capacity values. The remaining 3 functions require access to Operation member values, namely duration.

Ideally, we would want to reimplement the remaining 3 functions one-at-a-time, but because of the reliance on Operation, the 3 functions must be reimplemented together. In particular, the remaining functions require:

  • Reimplementing Operation in Rust.
  • Reimplementing CostModel in Rust.

Reimplementing Operation

(back to contents)

We do not need to reimplement the entire Operation class, we just need a simple struct that stores the member variables needed by PhillipsDessouky::run: min_duration, max_duration, duration. Each Operation also contains a cost_model which is queried when get_cost is called; we discuss this further below.

A sample operation.rs can be found in commit ddacc3f of PR #3. Note that get_cost and cost_model are commented out - the difficulty of reimplementing CostModel is explained in the section below.


Reimplementing CostModel

(back to contents)

CostModel in the Python version of lowtime is an abstract base class; in Rust, this can be implemented as a trait. To have a truly generic CostModel (not just exponential models), Operations should be able to store a trait-bound cost_model. A more detailed explanation of this problem is given in this comment from PR #3, along with this longer summary and relevant links (including this official pyo3 trait-bounds guide). In short, it is non-trivial to have a pyo3 function that receives trait-bound objects as input.

A sample cost_model.rs can be found in commit ddacc3f of PR #3. Note that this example did not compile at the time, but it gets across the idea of the data structures and traits that we want.

A "good" solution would implement the trait and "make it work" with pyo3 somehow. If this is infeasible, though, a viable option is to hard-code the cost model as an exponential model, and make the Rust implementation only compatible with exponential models. For example:

  • The Python-side PhillipsDessouky has 2 implementations of run. One for exponential models, another for a custom model.
  • If an exponential model is used, use the "Full Rust version" of run that computes everything in Rust.
  • If a custom model is used, use the "Partial Rust version" of run that only computes find_min_cut in Rust.

To support the "Full Rust version", the Rust-side codebase would include a "hard-coded" exponential model, which simply hardcodes 3 constants a, b, c for every Operation (get_cost would return a * (e^duration)) + c). A concrete example is shown below:

// cost_model.rs
struct ExponentialModel {
    a: f64,
    b: f64,
    c: f64,
}

impl ExponentialModel {
    fn new(a: f64, b: f64, c: f64) -> Self {
        ExponentialModel {a, b, c}
    }

    fn get_cost(&self, duration: u32) -> f64 {
        self.a * f64::exp(self.b * duration as f64) + self.c
    }
}

// operation.rs
struct Operation {
    // other member variables
    cost_model: ExponentialModel,
}

impl Operation {
    pub fn new(/* other member variables */, a: f64, b: f64, c:f64) -> Self {
        Operation {
            // other member variables
            cost_model: ExponentialModel::new(a, b, c),
        }
    }

    fn get_cost(&mut self, duration: u32) -> f64 {
        self.cost_model.get_cost(duration)
    }
}

Required Changes in solver.py

(back to contents)

Currently, solver.py instantiates a rust_runner (Rust-side PhillipsDessouky object), which requires information about the critical_dag to reconstruct it in Rust. This conversion is aided by format_rust_inputs, which converts an nx.DiGraph object into a format compatible with the Rust-side LowtimeGraph constructor. Since the current Rust implementation does not include Operation and CostModel, this information is not processed in format_rust_inputs.

A sample format_rust_inputs extended for additional Operation information is included below. Note that the sample does not include CostModel parameters, which should be included as well. Finally, note that op_details can take on a None value (which can be interpreted Rust-side as an Option), because some edges may not correspond to an Operation (read Appendix E.1 of the Perseus paper for a detailed description of how the "Capacity DAG" is constructed).

    def format_rust_inputs(
        dag: nx.DiGraph,
    ) -> tuple[
        nx.classes.reportviews.NodeView,
        list[
            tuple[
                tuple[int, int],
                tuple[float, float, float, float],
                tuple[bool, int, int, int, int, int, int, int] | None,
            ]
        ],
    ]:
        nodes = dag.nodes
        edges = []
        for from_, to_, edge_attrs in dag.edges(data=True):
            op_details = (
                None
                if self.attr_name not in edge_attrs
                else (
                    edge_attrs[self.attr_name].is_dummy,
                    edge_attrs[self.attr_name].duration,
                    edge_attrs[self.attr_name].max_duration,
                    edge_attrs[self.attr_name].min_duration,
                    edge_attrs[self.attr_name].earliest_start,
                    edge_attrs[self.attr_name].latest_start,
                    edge_attrs[self.attr_name].earliest_finish,
                    edge_attrs[self.attr_name].latest_finish,
                )
            )
            edges.append(
                (
                    (from_, to_),
                    (
                        edge_attrs.get("capacity", 0),
                        edge_attrs.get("flow", 0),
                        edge_attrs.get("ub", 0),
                        edge_attrs.get("lb", 0),
                    ),
                    op_details,
                )
            )
        return nodes, edges

Iterative Development for Correctness

(back to contents)

The most important thing as we reimplement functions in Rust is that lowtime remains correct. The best way to ensure correctness is to compare the modfied version's outputs with the original version's outputs.

The infrastructure for this is contained in a separate repo: lowtime-exp (access required). lowtime-exp contains infrastructure for:

  • Profiling a specific version of lowtime (profile_single.sh or profile_all.sh)
  • A way of checking correctness by comparing outputs of one version to the original (diff_lowtime_outputs.py)

diff_lowtime_outputs.py takes in --expected_dir and --actual_dir command-line args, and compares the outputted energy schedules of every iteration. Note that the script allows for some intermediate iteration outputs to differ (it will let you know which iterations differed) as long as the final energy schedule is equivalent; the reason some intermediate iterations may differ is that nx.max_flow and pathfinding::edmonds_karp have different implementations with different tie-breaking mechanisms. As long as the final output is equivalent, this is fine.

Zooming out, we wish to reimplement the following functions: annotate_capacity, reduce_durations, aoa_to_critical_dag. To reimplement them, we need a Rust implementation of Operation and CostModel. This is a big project. Each of these components are non-trivial, so there will be bugs. A good way to approach building these is to build them one-by-one and check correctness before building the next component.

However, all of these components rely on the Operations values being correctly updated. For example, suppose we begin by reimplenting aoa_to_critical_dag in Rust:

  • Create graph_utils.rs
  • Implement aoa_to_critical_dag function
  • Create Rust-side PhillipsDessouky member function called aoa_to_critical_dag, which calls Rust-side aoa_to_critical_dag on its member self.dag
  • In solver.py, replace the call to self.aoa_to_critical_dag with rust_runner.aoa_to_critical_dag

This is great, but the problem is that we have no way of testing if the correctness of the Rust version of aoa_to_critical_dag. Furthermore, since the critical_dag on the Python-side was not updated, future iterations of run will be incorrect (e.g. annotate_capacity of the next iteration is given an incorrect critical_dag since it was updated Rust-side, but not Python-side).

There are 2 potential solutions to this problem:

  1. Only start testing once we have reimplemented everything in Rust. We can simply check that the produced output is equivalent to the original version. If the output is incorrect, then oh well... the bug is somewhere in the entire reimplementation... we'll find it...
  2. In the process of reimplementing everything, we reimplement things one-by-one. In this aoa_to_critical_dag example, once we have a Rust implemention, instead of replacing the Python-side call to aoa_to_critical_dag, we can run both Python and Rust versions and be left with 2 DAGs. We can add some code Python-side to iterate over the 2 DAGs to ensure that they are identical. This ensures that the Python-side Operations have the updated values and we can check the correctness of aoa_to_critical_dag without having reimplemented everything else.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant