-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SIP-92] Proposal for restructuring the Python code base #20630
Comments
I like this! Changing I also like that the core SQL engine gets its own folder. It's long overdue. One thing I'm not sure about is can we make it faster to find all models that correspond to a db table? I assume they will all be moved to Another thing that has been bothering me is---there seems to be too many boilerplates and parameters passing around between the API, the commands and the DAO. Also, what is the boundary between a command and a DAO method anyway? Currently it seems a very thick validation layer can be added to either the DAO or a command. So I wonder if we can further simplify things down to something like this:
A couple of things to note:
|
@ktmud thanks the the comments. I believe views—which are part of the MVC model; alongside models—are likely deemed legacy, i.e., should be migrated to RESTful API endpoints, and thus likely shouldn't be explicitly called out. Regarding models, I'm not overly versed with the DAO pattern, though per this post DAOs and models could/should coexist and thus having a top-level |
The reason I wanted to keep a
Yes, I was thinking of the same thing. |
@ktmud I spoke with @hughhhh and @rusackas briefly about this and they were in agreement with you about keeping the concept of views and thus I've updated the directory schematic. As it currently stands views contain both API endpoints as well as non-API endpoints, i.e., those which render HTML templates, and thus the non-API endpoints would be housed under the |
Cool. Thanks for the update! |
Thanks for the SIP @john-bodley. I went through a similar process when writing SIP-61 - Improve the organization of front-end folders. When researching best practices for organizing projects, the concept of a feature-based organization appeared in many articles that described how large codebases were organized. You'll find many of the reasoning behind this model in SIP-61 and its references. The basic idea is that files related to a feature should belong together in a structured way. This allows you to easily switch between feature implementations, facilitates the use of feature flags, and also promotes better-defined dependencies. Maybe we can apply some of the concepts here too 😉 |
Thanks for this, @john-bodley! I +1 to @ktmud 's structure. I have noticed recently that models are mixed all around the code and should be considered top-level. Ideally they would be somewhat isolated and should not depend directly on layers above, so this change would be great. I'd be glad to help out with the refactor :) My view of what each layer "does" (open to discussion):
MVC Views / API
Commands
DAOs
Models
|
Closing the issue since the vote has passed. @john-bodley please move the issue to the "Implemented/Done" column when you think that's appropriate :) |
@john-bodley is there a way to provide linting rules around this so we can (a) enforce it, and (b) track the tech debt here? |
@rusackas I'm not sure, though I have started working with custom Pylint checkers. I suspect the "enforcement" likely happens implicitly with developers following the new patterns. That said the state of our current backend code is rather troubling as we have things strewn all over the place. |
Any updates from anyone following this on implementation status/plans? |
[SIP-92] Proposal for restructuring the Python code base
Motivation
Superset has evolved somewhat organically over time which is reflected—somewhat apparently—in the how the Python code—which resides solely in the top level
superset
folder—is organized. Initially Superset used a Model View Controlled (MVC) pattern combined with the notion of database connectors whereas now we've adopted the Data Access Object (DAO) pattern (SIP-35), which when coupled with commands and the API, helps to decouple the business layer from the persistence layer.Due to partial refactors and years of creep the code organization is fragmented. This has negatively impacted both the code quality and developer experience.
Proposed Change
The TL;DR is the Superset application is primarily composed on a few major functional components:
The proposed change would be to refactor the code into more functional rather than business top level folders, which has become somewhat bloated and results in a fragmented core—something which has plagued Superset since its inception with its feature driven mindset at the cost of architectural integrity.
Below is the before/after enumeration of the current (as of 10/28/2022) top-level folders and files, where "N/A" denotes that the folder/files will no longer exist in its current form. The additional sub-sections outline more specifics.
APIs, Commands, DAOs, and Models
APIs historically were mostly defined in an ad-hoc manner, i.e., in a non-RESTful way, as views which mostly reside in within the
superset/views/
folder. These "legacy" APIs now coexist alongside RESTful APIs which leverage the DAO model which reside in the component specific folder, i.e.,superset/datasets/
. Furthermore commands are either defined within thesuperset/commands/
folder or component specific folder, i.e.,superset/datasets/commands
.As a developer it isn't overly apparent where an API endpoint resides. The proposed solution is move to a directory structure which more clearly illustrates that the APIs, DAOs, and commands are decoupled (as illustrated below for the datasets components). The API—which leverages blueprints—is comprised both of the v1 RESTful API (current) and the legacy API. This demarcation also helps developers identify which API endpoints need to be migrated to v1.
Note currently views are a combination of legacy API endpoints and non-API endpoints. The concept of views will remain but should only contain non-API endpoints, i.e., rendering HTML templates.
SQL Engine
Though not as well flushed out as APIs, commands, and DAOs, the actual SQL engine—responsible for preparing, executing, fetching result sets—would be colocated within the broad
superset/engine/
subfolder. This would comprise of the engine specifications, templating, SQL parsing, query objects, etc.New or Changed Public Interfaces
N/A.
New dependencies
N/A.
Migration Plan and Compatibility
The code restructure can be piecemeal. The general steps are:
Rejected Alternatives
Regarding the engine directory structure I'm unsure whether the current proposal makes the most sense. An alternative could be based more on the flow/path of a query from pre-processing (including construction and SQL parsing), to execution, then fetching, and finally post-processing of the result set.
I presume that calling out the engine as a first class entity rather than treating it as a command likely makes sense. I think this is open for debate/discussion.
The text was updated successfully, but these errors were encountered: