-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ligra: Add comments and unit test regarding bug with vertex intersection #21
base: master
Are you sure you want to change the base?
Conversation
This is a good point. We can add an extra command-line flag that sorts the neighbor lists if the input CSR graph is not sorted. I think the default should be to assume that the neighbor lists are sorted, and to assume that an input graph to the graph constructor has all of the incident adjacency lists sorted. We can have an alternate constructor that sorts. I think this is the right approach because others who are benchmarking/using our library will probably only use the defaults, and so it's best to make sure the defaults are the fastest/best optimized codepaths. Ideally we would have some assertions that would fail if non-sorted lists are passed into the intersection primitives, which would make this approach both fast and safe. On a similar note, I think we should make the default graph ingestion method to use binary CSR, read through mmap, and make other graph ingestion methods use explicit flags. Right now the default is to assume a text-based format (which is slow) and to read the file into a buffer instead of mmap'ing, which is slow and wastes memory. Maybe there is a middle-ground, which is to check whether the neighbor lists are sorted which is dirt-cheap, and sort them if necessary. WDYT? |
I think the flag would be helpful, though I'm worried that user errors could still happen from someone not paying attention to the flags. That brings up another question for me though: what is the common case? as in, how are these AdjacencyGraph files usually generated? What I've been doing is downloading SNAP graphs and then using utils/SNAPtoAdj in jshun/ligra. Running One way we could assert for sortedness in I also do agree that it would probably be nice to have a more compact input format that we can read more quickly. That could also help reduce the chance of hitting this bug if we always save that input format with neighbor lists in sorted order. |
[a]symmetric_vertex::intersect
assumes that the neighbor lists of each vertex are in sorted order. However, this is not necessarily the case. In particular,sym_graph_from_edges
inligra/graph.h
does not sort neighbor lists.graph_test::MakeUnweightedSymmetricGraph
, which callssym_graph_from_edges
.This PR is a temporary patch to
graph_test::MakeUnweightedSymmetricGraph
to make unit test pass. It also adds to comments tosymmetric_vertex::intersect
documenting this assumption.However, this issue should really be addressed in a more robust way in some future PR. This is a sneaky assumption in
symmetric_vertex::intersect
that doesn't cause any seg faults but instead will lead to slightly wrong answers, and a user has no convenient way to validate the assumption that a graph's neighbor lists are all in sorted order. This affects:Some possible solutions:
graph_io.h
when we read in graphs or assert that they're sorted. Add a function comment to[a]symmetric_graph
's constructor saying that neighbor lists must be sorted. (this feels error-prone)intersect
calls (also error-prone)symmetric_vertex::intersect
to not require sorted neighbor lists (expensive, I'd guess this will slow down TriangleCounting by a lot)Feedback on what solution we should move forward with would be appreciated.