Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python BFS and DFS search documentation improvements #1361

Open
barakatzir opened this issue Jan 13, 2025 · 7 comments · May be fixed by #1375
Open

python BFS and DFS search documentation improvements #1361

barakatzir opened this issue Jan 13, 2025 · 7 comments · May be fixed by #1375

Comments

@barakatzir
Copy link

barakatzir commented Jan 13, 2025

What is the expected enhancement?

I find the documentations for the python visit functions (dfs_search etc.) confusing, and I think all users can benefit from some additions to them.
It might be that my sparse prior knowledge in graph algorithms, but I found the documentation confusing with all the colors (WHITE, GRAY, BLACK) mentioned without context to the BFSVisitor events, or the term "forward or cross edge" which I didn't know.
In the end, I read the rust implementation to understand the algorithm. The code is clearly well thought out and it would be an injustice to keep not let it shine in the documentation as well.

I suggest the following changes:

  1. In docstring, mentioning the visitor events in the pseudo-code in the docs, so it's clear which event occurs when. See suggestions below.
  2. Fix rustworkx.dfs_search docstring: mention that the graph can be a PyDiGraph as well.
  3. Changing the docstring's reported type-annotation for source from List[int] to Sequence[int] | None with default None.
  4. Changing the type annotations (in rustworkx.pyi) for the different visit functions so that visitor=None is not accepted by static type checkers. This can be achieved with overloads:
    @overload
    def digraph_bfs_search(
        graph: PyDiGraph,
        *,
        visitor: _BFSVisitor,
    ) -> None: ...
    @overload
    def digraph_bfs_search(
        graph: PyDiGraph,
        source: Sequence[int] | None,
        visitor: _BFSVisitor,
    ) -> None: ...
  5. In docstring, replace the "Graph can not be mutated while traversing." to "Trying to mutate the graph while traversing raises a RuntimeError.". This way the user can know which exception type he should catch in such cases.
  6. Documenting that raising PruneSearch in a finish_vertex method raises an exception. I checked and this behavior is documented in rustworkx-core where the function call panics in such cases.
  7. (This is the only actual suggested change to runtime behavior) I suggest not raising a PanicException if finish_vertex raises PrunceSearch. This exception isn't caught by except Exception and user cannot easily narrow it with except PanicException since PanicException is in pyo3_runtime module, which is not site-packages usually and cannot be easily imported.

I only looked at the bfs_search and dfs_search which I use, but I suspect dijkstra_search also could benefit from similar changes.
If you agree, I can open a PR with the agreed upon changes and also look into the Dijkstra case.

Regrading the pseudo-code elaborations. Here is an option: replacing the current pseudo-code with python alternative (python is already kinda pseudo-codish). I wrote python alternatives while trying to understand the rust code. Here they are after removing the control flow for the PruneSearch and StopSearch (I can reproduce the control flow if desired):

from typing import Literal
from collections import deque

def bfs_search(graph, source=None, visitor=None):
    color: dict[int, Literal['WHITE', 'GRAY', 'BLACK']] = {}
    for u in graph.node_indices():
        color[u] = 'WHITE'  # set all vertices as unvisited
    for s in source:
        _bfs_visit(graph, s, visitor, color)
        
def _bfs_visit(graph, start, visitor, color):
    if color[start] != 'WHITE':
        return
    color[start] = 'GRAY'
    visitor.discover_vertex(start)  # discover vertex start
    stack = deque((start,))
    while True:
        try:
            u = stack.pop()
        except IndexError:
            break  # break if stack is empty
        for _, v, weight in graph.incident_edge_index_map(u).values():  # examine edge (u,v)
            if color[v] == 'WHITE':
                visitor.tree_edge((u, v, weight))
                color[v] = 'GRAY'
                visitor.discover_vertex(v)
                stack.appendleft(v)
            else:
                visitor.non_tree_edge((u, v, weight))
                if color[v] == 'GRAY':
                    visitor.gray_target_edge((u, v, weight))
                else:  # color[v] == 'BLACK'
                    visitor.black_target_edge((u, v, weight))
        color[u] = 'BLACK'
        visitor.finish_vertex(u)

and

from typing import Literal

def dfs_search(graph, source=None, visitor=None):
    color: dict[int, Literal['WHITE', 'GRAY', 'BLACK']] = {}
    for u in graph.node_indices():
        color[u] = 'WHITE'  # set all vertices as unvisited
    time = -1  # time counter
    for s in source:
        time = _dfs_visit(graph, s, visitor, color, time)
        

def _dfs_visit(graph, start, visitor, color, time):
    if color[start] != 'WHITE':
        return time
    color[start] = 'GRAY'
    time += 1
    visitor.discover_vertex(start, time)  # discover vertex start
    stack = [(start, graph.incident_edge_index_map(start).values())]
    while True:
        try:
            u, edges = stack[-1]
        except IndexError:
            break  # break if stack is empty
        next_ = None
        for _, v, weight in edges:  # examine edge (u, v)
            if color[v] == 'WHITE':
                visitor.tree_edge((u, v, weight))
                color[v] = 'GRAY'
                time += 1
                visitor.discover_vertex(v, time)
                next_ = v
                break
            elif color[v] == 'GRAY':
                visitor.back_edge((u, v, weight))
            else:  # color[v] == 'BLACK'
                visitor.forward_or_cross_edge((u, v, weight))
        if next_ is not None:
            stack.append((v, graph.incident_edge_index_map(v).values()))
            continue
        color[u] = 'BLACK'
        time += 1
        visitor.finish_vertex(u, time)
        stack.pop()
    return time
@IvanIsCoding
Copy link
Collaborator

In general I think documentation improvements are always welcome. I did suggest the idea of having the pseudocode in Python in #489 but the consensus was that the existing pseudocode was good enough.

I will start by saying that the terminology used is the same as the one in CLRS. The book did not invent it but it popularized it for sure. We should probably cite Chapter 20 "Elementary Graph Algorithms" from the 4th edition.

I will group the answer by:

Panicking

  1. Answering n.7. I agree in not panicking, I'd rather throw an error. I will merge a PR for improving that aspect of the code
  2. Documenting n.6 is fine. It can come in the same PR

Modifying the graph

Answering about modifying the graph while iterating. Modifying the input while calling DFSVisitor or BFSVisitor is a terrible idea, the implementation does not handle that case. In the best scenario you will get a RuntimeError as mentioned but if the program managed to panic I would not be surprised.

I am ok with documenting RuntimeError with the emphasis that modifying the graph is not supported.

Type annotations

I can take that one myself and clean it up after #1353 is merged. There are some improvements from #1352 as well. Once I get Matthew to review my pending PRs it will be easier to work on that.

Documentation improvements

You can send the improvements. I don't mind changing the pseudocode to Python as long as it is roughly the same. It shouldn't be a full-fledged implementation but something good enough to give a guide on how to use the code. We should also document that the universal function dfs_search and bfs_search works with both graph types

Another idea is to add more examples to the documentation, maybe with a tutorial on how to use the visitors.

@barakatzir
Copy link
Author

Thanks for the informative response! I'll get to work on the PR on the documentations in a couple of days probably.
Adding more examples is also a great idea!

Do you have a plan on what to replace the panic with? If you have a concrete exception type in mind I can put it in the documentation.

@IvanIsCoding
Copy link
Collaborator

Thanks for the informative response! I'll get to work on the PR on the documentations in a couple of days probably. Adding more examples is also a great idea!

Do you have a plan on what to replace the panic with? If you have a concrete exception type in mind I can put it in the documentation.

I think RuntimeError is fine in the place of a panic. I will think more carefully about it but that one is a good umbrella error for things we don’t want to handle

@IvanIsCoding
Copy link
Collaborator

So I tried to add the annotations in #1362 and the results was that mypy complained that we did not cover the default arguments case.

I will probably need to add an overload with https://docs.python.org/3/library/typing.html#typing.Never for the visitor=None case. Something along the lines of def dfs_search(graph: PyGraph | PyDiGraph, *, visitor: None=...) -> Never.

I never thought argument ordering would bite us back so much!

@barakatzir
Copy link
Author

barakatzir commented Jan 15, 2025

Overloading can be tricky. There are a lot of sharp corners for complex signatures. I'll take a look at your commit with the overloading.

It is considered bad practice to use Never or NoReturn to annotate a raised exception.
The following code is valid by mypy:

From typing import Never

def neverreturns(): -> Never:
    assert False

neverreturns()  # no errors by mypy
1 + 'hello'  # this isn't type checked

The idea is that Never can be used by functions that run a server or another never-ending event loop. All the code after a Never is returned is skipped by type checkers since it is deemed unreachable.

Python simply does not have a good alternative to rust's Result...

@IvanIsCoding
Copy link
Collaborator

IvanIsCoding commented Jan 15, 2025

I agree it is bad practice but I need to cover the all None case. I’d rather not tell users the all None case is available at all. But our tests catch that “mistake”

@barakatzir
Copy link
Author

barakatzir commented Jan 15, 2025

I did some trials and I believe the following should work

@overload
def digraph_bfs_search(
    graph: PyDiGraph, source: Sequence[int] | None=..., *, visitor: _BFSVisitor,
) -> None: ...
@overload
def digraph_bfs_search(
    graph: PyDiGraph, source: Sequence[int] | None, visitor: _BFSVisitor,
) -> None: ...
@overload
def digraph_bfs_search(
    graph: PyDiGraph, source: Never, visitor: None = ...,
) -> None: ...

This passes both mypy and stubtest (on minimal examples I recreated). Notice that I put the Never in the argument for source, so type checkers are instructed to forbid this overload signature.

@barakatzir barakatzir linked a pull request Jan 29, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants