Definitely yes. This PR is more than welcome. We're not using any advanced graph operations in networkx; it's basically just adding and removing nodes and edges, and the like. As long as rustworkx provides the same basic APIs, performance is a good enough reason to proceed. Also, if possible, we can consider adding a graph interface to hide the rustworkx API, much like what we did with an analyzer interface to hide sqlparse+sqlfluff. But completely getting rid of networkx without that interface is also acceptable.
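To make the interface idea concrete, here is a minimal sketch of what such a graph abstraction could look like. Everything in it is an assumption for illustration: `LineageGraph`, `DictGraph`, `build_demo`, and the method names are hypothetical, not the project's actual API, and the backend is a plain dict stand-in rather than networkx or rustworkx.

```python
from typing import Dict, Hashable, Iterator, Protocol, Set


class LineageGraph(Protocol):
    """Hypothetical minimal interface covering only the basic operations."""

    def add_node(self, node: Hashable) -> None: ...
    def add_edge(self, source: Hashable, target: Hashable) -> None: ...
    def remove_node(self, node: Hashable) -> None: ...
    def remove_edge(self, source: Hashable, target: Hashable) -> None: ...
    def successors(self, node: Hashable) -> Iterator[Hashable]: ...


class DictGraph:
    """Pure-Python stand-in backend (illustrative only).

    A networkx-backed or rustworkx-backed class would expose the same
    methods, so calling code never touches the library directly.
    """

    def __init__(self) -> None:
        # Maps each node to the set of nodes it points to.
        self._succ: Dict[Hashable, Set[Hashable]] = {}

    def add_node(self, node: Hashable) -> None:
        self._succ.setdefault(node, set())

    def add_edge(self, source: Hashable, target: Hashable) -> None:
        self.add_node(source)
        self.add_node(target)
        self._succ[source].add(target)

    def remove_node(self, node: Hashable) -> None:
        # Drop the node and every edge pointing at it.
        self._succ.pop(node, None)
        for targets in self._succ.values():
            targets.discard(node)

    def remove_edge(self, source: Hashable, target: Hashable) -> None:
        self._succ.get(source, set()).discard(target)

    def successors(self, node: Hashable) -> Iterator[Hashable]:
        return iter(self._succ.get(node, ()))


def build_demo(graph: LineageGraph) -> LineageGraph:
    """Calling code depends only on the interface, not on the backend."""
    graph.add_edge("src.a", "mid.b")
    graph.add_edge("mid.b", "tgt.c")
    return graph
```

With something along these lines, swapping the graph library would mean writing one new backend class, and the rest of the code would stay untouched.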
-
I noticed that when parsing and analysing large amounts of SQL, the graph operations in `ColumnLineageMixin.get_column_lineage` get slow.

For context, the graph I operate with has approx. 200k tables and 10k column nodes coming from a combination of SQLs spanning 800k characters.

While `networkx` is really handy and pythonic, it's slow when it comes to searching for all simple paths over and over again. With 800 source columns and 3.3k target columns, we have to search for paths 2.6m times.

I found a package called rustworkx, which is a partial reimplementation of `networkx` in Rust with Python API/bindings. It's "almost" a drop-in replacement. It would only require minor code changes and an extra dependency but bring significant performance improvement for large graphs. We should be able to keep the API stable too and keep the changes to `ColumnLineageMixin.get_column_lineage`.

Would you accept a well-crafted PR?
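To illustrate where the 2.6m figure comes from, here is a rough pure-Python sketch of the workload as I understand it: one all-simple-paths search per (source, target) pair, so 800 × 3,300 = 2,640,000 searches. The helper names `all_simple_paths` and `pairwise_paths` are made up for this example; this is neither the networkx implementation nor the actual code in `get_column_lineage`, just the shape of the loop.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def all_simple_paths(
    graph: Graph, source: Hashable, target: Hashable
) -> Iterator[List[Hashable]]:
    """Yield every path from source to target that visits no node twice."""
    path: List[Hashable] = [source]
    on_path: Set[Hashable] = {source}

    def walk(node: Hashable) -> Iterator[List[Hashable]]:
        if node == target:
            yield list(path)
            return
        for nxt in graph.get(node, ()):
            if nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                yield from walk(nxt)
                # Backtrack before trying the next neighbour.
                path.pop()
                on_path.remove(nxt)

    yield from walk(source)


def pairwise_paths(
    graph: Graph, sources: Iterable[Hashable], targets: Iterable[Hashable]
) -> Tuple[List[List[Hashable]], int]:
    """Run one search per (source, target) pair; return paths and search count."""
    targets = list(targets)
    searches = 0
    found: List[List[Hashable]] = []
    for s in sources:
        for t in targets:
            searches += 1
            found.extend(all_simple_paths(graph, s, t))
    return found, searches


# Tiny demo graph: two source columns, two target columns.
demo: Graph = {"a": {"c"}, "b": {"c"}, "c": {"d", "e"}}
paths, searches = pairwise_paths(demo, ["a", "b"], ["d", "e"])
```

The number of searches grows with the product of sources and targets, which is why the per-search cost of the graph library matters so much at this scale.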