Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[visual-dataflow] DOT/CSV Parsing #40

Merged
merged 55 commits into from
Nov 3, 2023
Merged

[visual-dataflow] DOT/CSV Parsing #40

merged 55 commits into from
Nov 3, 2023

Conversation

albertfares
Copy link
Collaborator

@albertfares albertfares commented Oct 13, 2023

Description

This pull request introduces the initial version of parsing tools that will help us represent a specific circuit graphically in Godot and track its state evolution for each cycle.

The key components of this PR include:

  • Parser: This class serves as the main parsing interface, allowing developers to process both .dot and .csv files.

  • DotParser: The primary purpose of this component is to parse .dot files in order to create a Graph object representing a specific circuit. It also stylizes the resulting graph in a Graphviz style (position of nodes and edges).

  • CSVParser: Designed to handle .csv files, this parser identifies the state of each edge for each cycle.

  • ReformatDot: A helper class that streamlines the formatting of .dot files, making them easier to parse and ensuring consistency.

  • GraphComponents: This module defines the data structures used to create a Graph object.

This PR successfully parses and processes data samples provided in the current implementation. However, it's important to note that further adaptation may be necessary when integrating this functionality with GDExtension for the graphical user interface. ReformatDot can be further optimized to reduce its line count.

What’s new

  • Parser, DotParser, CSVParser, ReformatDot, and GraphComponents classes.
  • Defined the main parsing and graph construction logic.
  • Ensured proper styling of the graph in Graphviz style (theoretically).

Testing

  • Verified the functionality and accuracy of parsed data by testing with sample .dot and .csv files.

@lucas-rami lucas-rami changed the title Visual dataflow [visual-dataflow] DOT/CSV/MLIR Parsing Oct 14, 2023
@albertfares albertfares changed the title [visual-dataflow] DOT/CSV/MLIR Parsing [visual-dataflow] DOT/CSV Parsing Oct 16, 2023
This commit does two things:
- add missing empty lines at the end of a couple files
- change the default VSCode config so that the default C++ formatter
  becomes the one provided by the official LLVM/clang extension
Copy link
Member

@lucas-rami lucas-rami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished my first pass on everything. Don't be scared by the amount of remarks on their scope, it's normal to have inconsistent styling and/or be less familiar with C++/MLIR capabilities when doing a first contribution. I made a very small push on the branch with super cosmetic changes that had to do with my starter code, so don't forget to pull.

Many of my comments apply in multiple places, even if I write them down only once (e.g., exceptions, namespaces, etc.), so you need to address them in all places where it makes sense. Consequently please look at all comments on the PR even if they are not on "your part" of the code, as they may apply to the code you've written too and I usually try to explain why we do things one way instead of another, so you may learn something in the process.

Finally, you need to run clang-format (formatter) and clang-tidy (static analyzer) on your code, and fix all warnings of the latter. There are extensions for these tools on major IDEs. Usually, you want to setup your IDE to run clang-format automatically whenever you save a file and to have clang-tidy give you warnings continuously while you code. In VSCode, these tools are provided in extension llvm-vs-code-extensions.vscode-clangd (type this in the extension research bar, you may need to install packages clang-format/clang-tidy on the side) and should pick up our configuration at the project's top-level automatically. To know if clang-tidy works, you should for example see multiple warnings in CsvParser.cpp (in other files too).

} else if (state_string == "valid_ready") {
return valid_ready;
} else {
throw std::runtime_error("Error unknown state for the edge.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not use exceptions throughout the project (it's actually disallowed by our compiler configuration) for design/performance reasons. If you would like to report an error, you have to somehow return it (with a mlir::LogicalResult for example). When the error is about an internal code invariant that should never be broken, you can also use

assert(condition_that_must_be_true && "short message as to why this would fail")

Note that asserts are removed from the code when compiling in Release mode, so they should not be used for error reporting that is relevant in any build of the project. Here the error is not an internal code invariant but potentially an incorrectly-formatted CSV file, so we want the error to be reported in all cases.

For example, this function signature could become

LogicalResult findState(const std::string &state_string, State &state);

The return value indicates whether decoding the string succeeded (return success(); or return failure();) and, on success, the state variable is filled with the appropriate enum value. The caller can retrieve this state because the variable was passed by reference.

@@ -0,0 +1,124 @@
//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Authorship is indicated on a line-by-line basis by git (that's why we use it), so we shouldn't have authorship annotations in comments. I see that you committed everything from the same account so it is somewhat lost for this contribution but that is not the right way to do things. You should just push your own lines to keep their authorship on the repository.

#include "graphComponents.h"
#include <iostream>

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proper documentation like this is very nice (so don't delete the one you've already done) but unnecessary. We usually just have a small paragraph (or just a couple lines for simple functions) describing whatever is happening.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that it's a bit unclear with Godot having different file/variable naming conventions, but all our source file names (except those that contain a main that ends up as an executable) are written in upper camel case, so here it should be GraphComponents.cpp.

I know I didn't even respect this myself with register_types.* and visual_dataflow.* (because I was following the Godot tutorial which uses those names) but it is one of those "do as I say not as I do" situation ;) Feel free to change these last two to follow upper camel case too.

using NodeId = int;
using CycleNb = int;

enum State {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All you public classes/structs/functions need to be namespaced to diminish the probability of name collisions and group things logically together. Everything experimental in the repo is under the dynamatic::experimental namespace, and you can even nest everything you do under another namespace, for example visual. So this would be:

namespace dynamatic {
namespace experimental {
namespace visual {
enum State {...};
struct Pos {...};
...
} // visual
} // experimental
} // dynamatic

I guess some stuff you'll have to leave in the godot namespace because the engine expects them there, but the rest (like everything in this file should be in ::dynamatic::experimental::visual)

&& line.find('[') != std::string::npos
&& line.find("node") == std::string::npos
&& line.find("graph") == std::string::npos) {
currentNode = new Node;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new doesn't seem necessary to me (just make currentNode a Node) and here (or after graph->nodes[currentNode->label] = currentNode;) do currentNode = Node(); instead. This avoids any memory management. graph.nodes can just be std::map<std::string, Node> as a consequence (now the map "owns" its values instead of just storing pointers to them, and the Edge can still store a pointer to its two endpoints).

*/
EdgeId findEdgeInGraph(const Graph *graph, const std::vector<std::string> &edge_info) {

for (Edge* edge : graph->edges) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making graph->edges a

class Graph {
  std::map<std::pair<std::string, std::string>, Edge> edges; // (src_name, dst_name) -> edge
  // other members
}

instead to allow for constant time lookup and make it more uniform with how nodes are stored

}

else if (line.find("->") != std::string::npos){
currentEdge = new Edge;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for the nodes: graph.edges can just be std::vector<Edge> and own its data (also see my comment in CsvParser.cpp about changing how edges are stored entirely).

Copy link
Member

@lucas-rami lucas-rami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did another round of review. The code is significantly better, thanks for the updates! My comments are about relatively cosmetic things for the most part.

Copy link
Member

@lucas-rami lucas-rami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified a couple things myself for the sake of time before merging this (take a look at my commits to see what). Couple comments:

  • Whenever something returns a mlir::LogicalResult you actually have to check whether it's a success or failure, otherwise what's the point of producing errors? This type is even marked no_discard so the compiler/linter even flags you if you don't check it (storing it in a variable shifts the problem to a unused variable warning, which isn't better).
  • Please make sure clang-tidy is running on your IDE, there were a non-small number of warnings in the code, and they should very rarely (unless you have a very good reason) be ignored.
  • Take care of consistently adding the standard comment header as I did on all files. Not only it is useful to given an overview of what the file contains (see most other source files on the repository) but it contains our license notice, which (I believe) we should actually have in all source files.

I also changed something important related to the build process. At this point I don't want all Dynamatic users to have to build the Godot API with the project, so I made building your part of the project (visual-dataflow) an "opt-in" option from the build script. Concretely, I added a flag -v/--build-visualizer that one must provide to the script to enable this part of the build. Due to the way the script works, the underlying variable that controls this behavior is not refreshed unless you force it too. So the first time you re-build, you'll have to do

$ ./build.sh -v -f # -f for "force"

to force CMake to build visual-dataflow. For the next builds you can just do

$ ./build.sh

@lucas-rami lucas-rami merged commit a8ae593 into main Nov 3, 2023
@Tikipiou Tikipiou mentioned this pull request Nov 9, 2023
lucas-rami added a commit that referenced this pull request Nov 17, 2023
This introduces the first version of the tools for mapping MLIR
components to Graph components, as introduced in #40.

The primary components included in this pull request are:
- `MLIRMapper.cpp`: This class maps MLIR components to Graph components.
- `GraphParser.cpp`: This updates the parser to handle .mlir files,
converting them into MLIR components and then invoking the MLIR mapper.

The functionality and accuracy of parsed data was tested on
sample .mlir files.

---------

Co-authored-by: Lucas Ramirez <lucas.rami@proton.me>
@lucas-rami lucas-rami deleted the visual-dataflow branch July 23, 2024 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants