-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SemanticModel Node Type #7769
Changes from 5 commits
17030ea
4b8d8a8
2fea3d6
9203b17
a79ad1f
27e13b8
84a47e1
85c4369
6bfc766
41ab90a
2406884
870402e
c410a65
a2de21a
6e70a2e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,20 +25,21 @@ | |
from dbt.contracts.publication import ProjectDependencies, PublicationConfig, PublicModel | ||
|
||
from dbt.contracts.graph.nodes import ( | ||
Macro, | ||
BaseNode, | ||
Documentation, | ||
SourceDefinition, | ||
GenericTestNode, | ||
Exposure, | ||
Metric, | ||
GenericTestNode, | ||
GraphMemberNode, | ||
Group, | ||
UnpatchedSourceDefinition, | ||
Macro, | ||
ManifestNode, | ||
GraphMemberNode, | ||
ResultNode, | ||
BaseNode, | ||
ManifestOrPublicNode, | ||
Metric, | ||
ModelNode, | ||
ResultNode, | ||
SemanticModel, | ||
SourceDefinition, | ||
UnpatchedSourceDefinition, | ||
) | ||
from dbt.contracts.graph.unparsed import SourcePatch, NodeVersion, UnparsedVersion | ||
from dbt.contracts.graph.manifest_upgrade import upgrade_manifest_json | ||
|
@@ -689,6 +690,7 @@ class Manifest(MacroMethods, DataClassMessagePackMixin, dbtClassMixin): | |
public_nodes: MutableMapping[str, PublicModel] = field(default_factory=dict) | ||
project_dependencies: Optional[ProjectDependencies] = None | ||
publications: MutableMapping[str, PublicationConfig] = field(default_factory=dict) | ||
semantic_models: MutableMapping[str, SemanticModel] = field(default_factory=dict) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The existing pattern is the node objects use "Model" and the dictionaries use "nodes", that is Model nodes are found in "nodes" dictionary, "PublicModel" nodes are found in public_nodes. Could this be the "semantic_nodes" to preserve that pattern? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In that case, do you think we just rely on the existing nodes dictionary, and not have a special case for SemanticModel objects? Would it matter that the SemanticModels are not yet linked into the compiled graph? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The original "metrics" went in a "metrics" dictionary... This has been a bit ad-hoc and not closely thought through, but it feels like the existing pattern is that SQL things go in nodes (plus seeds...) and yaml-generated things go in their own dictionaries. There are some assumptions in the rest of the code that match, such as the links in the file objects, etc. So I think it still makes sense to put semantic models in their own dictionary. If anything I'd be tempted to separate out some of the existing things that are in the nodes dictionary into their own dictionaries and maybe make some combined "indexes" (for cases where we don't want to loop over "nodes" all the time...) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just wanted to mention that one of the reasons for going in the direction of more individual dictionaries rather than less, is that currently jsonschema and deserializers can't always correctly guess the classes of the objects in the "nodes" dictionary, leading to other hacky things like the big deserialization if statement in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is all really good to know. It seems like the right direction to me. |
||
|
||
_doc_lookup: Optional[DocLookup] = field( | ||
default=None, metadata={"serialize": lambda x: None, "deserialize": lambda x: None} | ||
|
@@ -1212,6 +1214,11 @@ def add_doc(self, source_file: SourceFile, doc: Documentation): | |
self.docs[doc.unique_id] = doc | ||
source_file.docs.append(doc.unique_id) | ||
|
||
def add_semantic_model(self, source_file: SchemaSourceFile, semantic_model: SemanticModel): | ||
peterallenwebb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
_check_duplicates(semantic_model, self.semantic_models) | ||
self.semantic_models[semantic_model.unique_id] = semantic_model | ||
source_file.semantic_models.append(semantic_model.unique_id) | ||
|
||
# end of methods formerly in ParseResult | ||
|
||
# Provide support for copy.deepcopy() - we just need to avoid the lock! | ||
|
@@ -1311,6 +1318,9 @@ class WritableManifest(ArtifactMixin): | |
public_nodes: Mapping[UniqueID, PublicModel] = field( | ||
metadata=dict(description=("The public models used in the dbt project")) | ||
) | ||
semantic_models: Mapping[UniqueID, SemanticModel] = field( | ||
metadata=dict(description=("The semantic models defined in the dbt project")) | ||
) | ||
metadata: ManifestMetadata = field( | ||
metadata=dict( | ||
description="Metadata about the manifest", | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,29 +6,23 @@ | |
import hashlib | ||
|
||
from mashumaro.types import SerializableType | ||
from typing import ( | ||
Optional, | ||
Union, | ||
List, | ||
Dict, | ||
Any, | ||
Sequence, | ||
Tuple, | ||
Iterator, | ||
) | ||
from typing import Optional, Union, List, Dict, Any, Sequence, Tuple, Iterator, Protocol | ||
|
||
from dbt.dataclass_schema import dbtClassMixin, ExtensibleDbtClassMixin | ||
|
||
from dbt.clients.system import write_file | ||
from dbt.contracts.files import FileHash | ||
from dbt.contracts.graph.unparsed import ( | ||
Dimension, | ||
Docs, | ||
Entity, | ||
ExposureType, | ||
ExternalTable, | ||
FreshnessThreshold, | ||
HasYamlMetadata, | ||
MacroArgument, | ||
MaturityType, | ||
Measure, | ||
MetricFilter, | ||
MetricTime, | ||
Owner, | ||
|
@@ -62,12 +56,6 @@ | |
EmptySnapshotConfig, | ||
SnapshotConfig, | ||
) | ||
import sys | ||
|
||
if sys.version_info >= (3, 8): | ||
from typing import Protocol | ||
else: | ||
from typing_extensions import Protocol | ||
|
||
|
||
# ===================================================================== | ||
|
@@ -552,6 +540,30 @@ def depends_on_macros(self): | |
return self.depends_on.macros | ||
|
||
|
||
@dataclass | ||
class FileSlice(dbtClassMixin, Replaceable): | ||
peterallenwebb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"""Provides file slice level context about what something was created from. | ||
|
||
Implementation of the dbt-semantic-interfaces `FileSlice` protocol | ||
""" | ||
|
||
filename: str | ||
content: str | ||
start_line_number: int | ||
end_line_number: int | ||
|
||
|
||
@dataclass | ||
class Metadata(dbtClassMixin, Replaceable): | ||
peterallenwebb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"""Provides file context about what something was created from. | ||
|
||
Implementation of the dbt-semantic-interfaces `Metadata` protocol | ||
""" | ||
|
||
repo_file_path: str | ||
file_slice: FileSlice | ||
|
||
|
||
# ==================================== | ||
# CompiledNode subclasses | ||
# ==================================== | ||
|
@@ -1399,6 +1411,28 @@ class Group(BaseNode): | |
resource_type: NodeType = field(metadata={"restrict": [NodeType.Group]}) | ||
|
||
|
||
# ==================================== | ||
# SemanticModel and related classes | ||
# ==================================== | ||
|
||
|
||
@dataclass | ||
class NodeRelation(dbtClassMixin, Replaceable): | ||
peterallenwebb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
alias: str | ||
schema_name: str | ||
relation_name: str | ||
database: Optional[str] = None | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From this comment -- does this still need to include a unified relation_name that respects quoting + include policies? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if we do -- we'll probably need the adapter.Relation to do something like this during compilation: adapter = get_adapter(self.config)
relation_cls = adapter.Relation
relation_name = str(relation_cls.create_from(self.config, node)) (from https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/compilation.py#L486-L488) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think the SemanticModels actually get compiled, since they're yaml-only. There is a question of whether they need the individual pieces (identifier/schema/database) or just the relation_name... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @QMalcolm I'm not sure of the answer to this, but I suspect the answer may be yes. What makes sense from the perspective of MetricFlow integration? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @MichelleArk Yep! We need to include the unified relation_name, and it's desirable that it respects quoting + include policies. As for whether we need both the individual pieces as well as the |
||
|
||
|
||
@dataclass | ||
class SemanticModel(GraphNode): | ||
description: Optional[str] | ||
node_relation: NodeRelation | ||
entities: Sequence[Entity] | ||
measures: Sequence[Measure] | ||
dimensions: Sequence[Dimension] | ||
|
||
|
||
# ==================================== | ||
# Patches | ||
# ==================================== | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
semantic_nodes? (see other comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gshank To clarify, do you think we should change the name in the yml file, or just for our internal property names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for for the internal property names. I think "semantic_models" in the yaml files is fine, it matches the "models" section we already have. Naming is so hard.