-
Notifications
You must be signed in to change notification settings - Fork 384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
End-to-end tagging: Rust #8304
End-to-end tagging: Rust #8304
Conversation
Web viewer failed to build.
Note: This comment is updated whenever you push a commit. |
83a1662
to
c515636
Compare
// NOTE: Uncomment this to expose fully-qualified names in the Dataframe APIs! | ||
// I'm not doing that right now, to avoid breaking changes (and we need to talk about | ||
// what the syntax for these fully-qualified paths need to look like first). | ||
format!("{}:{}", entity_path, descriptor.component_name.short_name()), | ||
// format!("{entity_path}@{}", descriptor.short_name()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, I kept the existing column-path syntax around for now, to avoid breaking anything downstream. TBD.
let is_tombstone = re_types_core::archetypes::Clear::all_components() | ||
.iter() | ||
.any(|descr| descr.component_name == *component_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of many many many examples where we still have component-name based logic that needs to be migrated to component-descriptor based logic.
The goal of this PR is to get tags in and out, not migrate everything that exists in one go (that PR is big enough as is!).
#[cfg(debug_assertions)] | ||
for (component_name, per_desc) in chunk.components().iter() { | ||
assert!( | ||
per_desc.len() <= 1, | ||
"Insert Chunk with multiple values for component named `{component_name}`: this is currently UB", | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One day this will be legal, but first we need to remove every single bit of component-name based logic across the entire app.
c515636
to
05f5cb0
Compare
@rerun-bot full-check |
Started a full build: https://github.com/rerun-io/rerun/actions/runs/12145398537 |
/// | ||
/// Used by implementers of [`crate::AsComponents`] to both efficiently expose their component data | ||
/// and assign the right tags given the surrounding context. | ||
pub struct MaybeOwnedComponentBatch<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe at this point this deserves a new name? I don't think it matters all that much, all those APIs will break in the most extraordinary ways once we move to eager serialization anyhow...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, DescribedComponentBatch
could be made to match what we use on the Python side, and maybe even the C++ side if we get lucky...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. Maybe simpler ComponentBatchCowWithDesc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By having one opinion more than me, you win the opinion bidding war: ComponentBatchCowWithDescriptor
it is
|
I had to give a tour to @Wumpf so that he could help me with the C++ port, so he's probably the best equipped person to review this at this point. |
a82a5bb
to
82a1dae
Compare
82a1dae
to
deb1615
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oof, that was a lot :)
ship it!
// TODO(cmc): disgusting, but good enough for now. | ||
(field.name == RowId::descriptor().component_name.as_str()).then_some(column) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would the clean alternative look like? why not extract the full descriptor from the field's metadata and compare that? I suppose for RowId that's a bit overzealous given that a rowid would never be tagged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clean alternative is that all these things should be working at the new protobuf metadata layer, with proper Descriptor types etc.
/// | ||
/// See also: | ||
/// * [`ChunkStore::new`] | ||
pub fn from_log_msgs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only difference between this and the thing above is just the error context? couldn't that be done with another with_context
call on the result of from_log_msgs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then from_log_msgs
needs to take an iterator of results instead and things get annoying for other people... I'd rather have a bit of duplication and make everyone happy
// * Users might still want to register Components with specific tags. | ||
// * In the future, `ComponentDescriptor`s will very likely cover than Archetype-related tags | ||
// (e.g. generics, metric units, etc). | ||
fn descriptor() -> ComponentDescriptor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
despite above line of argumentation it does irk me a bit that you can set field names and archetype on the descriptor here. It's almost like a stricter subset of the descriptor would be better here: do we really want to be able to set field names at the component type level?
I do however concurr that there may be more stuff that then is very relevant to the component. And a separate type to split those out is likely overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can definitely be useful yes, I've seen it when implementing actual examples.
If you're implementing a custom component and you know you always want it to be part of a specific archetype, you'd rather just do it once in the your component definition, rather than having to specify overrides at every log site.
components?.into_iter().collect(); | ||
let components: ChunkComponents = components?.into_iter().collect(); | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense but how did this validation end up in here? isn't that more general than send_columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oO I did not write this code, I'm not sure what's going on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just remove it, I'm not sure what that's about 🤷
&rerun::Points3D::new([(1.0, 2.0, 3.0)]).with_radii([0.3, 0.2, 0.1]), | ||
)?; | ||
|
||
// When this snippet runs through the snippet comparison machinery, this environment variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a really weird hack to do the checking. Nothing about this is /docs/
anymore :/
wouldn't the older "roundtrip tests" do the same job just as well?
if not can we maybe at least add a short readme to this folder to point out that these are not really snippets for documenting but rather tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if i can move it to roundtrips/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh no wait, i remember why I picked snippets/ -- these are docs, they show how to override tags in different scenarios! but yeah the presence of the testing code in the rust version kinda suck, hmm...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if i can "hide" the testing code in a separate function at the very least
Co-authored-by: Andreas Reich <andreas@rerun.io>
94ef347
to
d8abb0b
Compare
d8abb0b
to
772fb23
Compare
Implemented with the help of @Wumpf. This semantically mimics very closely the way things are done in Rust, minus all technical differences due to the differences between both the languages and the SDKs. For that reason, everything stated in #8304 (comment) basically applies as-is. Pretty happy about it, I must say. * DNM: requires #8304 * Part of #7948 --------- Co-authored-by: Andreas Reich <andreas@rerun.io>
This semantically mimics very closely the way things are done in Rust, minus all technical differences due to the differences between both the languages and the SDKs. For that reason, everything stated in #8304 (comment) basically applies as-is. Pretty happy about it, I must say. * DNM: requires #8316 * Part of #7948
I had to give up on the idea of splitting this thing into neat little PRs -- the enormous amount of extra work needed in this case is just not worth it, it's not even close (turns out changing the definition of
Component
has cascading consequences 😶).I'll add a thorough description of what's going on to compensate, and can walk someone through this if needed.
Goals and non-goals
The goal of this PR is to get component tags in, store them, and then get them out.
The goal of this PR is not to port every single bit of component-name based logic to component-descriptor based logic (including but certainly not limited to datastore queries).
That will be the next step: #8293.
Types and traits
First and foremost, this ofc introduces the new
ComponentDescriptor
type:Note that this is a Rerun type, not a Sorbet type: i.e. it uses Rerun terminology (archetypes, fields, etc), not Sorbet terminology.
As is now tradition, this terminology gets translated into its Sorbet equivalent when leaving the land of internal Chunks for the land of external RecordBatches and Dataframes.
Component
s are now uniquely identified by aComponentDescriptor
rather than aComponentName
:Component::name
still exists for now, as a convenience during the interim (that is, until we propagateComponentDescriptor
to every last corner of the app).MaybeOwnedComponentBatch
now has the possibility to augment and/or fully-override theComponentDescriptor
of the data within:This is a crucial part of the story, as this is how e.g. archetypes inject their own tags when component data gets logged on their behalf.
Override model
The override model is simple:
Component
has an associatedComponentDescriptor
.ComponentBatch
inherits from its underlyingComponent
'sComponentDescriptor
.AsComponents
has an opportunity to override eachComponentBatch
'sComponentDescriptor
(by means ofMaybeOwnedComponentBatch
.The goal is to try and carry those semantics over the two other SDKs (Python, C++), while somehow keeping changes to a minimum.
Undefined behavior
Logging the same component multiple times on a single entity (e.g. by logging different archetypes that share parts of their definitions) has always been, for all intents and purposes, UB.
This PR propagates descriptors just enough to get things up and running, no more no less. By which I mean that it is possible to get component tags in and out of the system, but many things still assume that
Component
s are uniquely identified by their names.This means that some part of the codebase are still indexing things by name, while others index by descriptor. Where these parts meet, what was UB before is even more UB now, as we generally just pick one random component among the ones available.
You'll see a lot of
get_first_component
in the code: every single one of those is UB if there are multiple components under the same name (for now!).Debug builds assert for duplicated components, until we properly use descriptors everywhere (remember: nothing should ever be indexed by
ComponentName
in the future).Fully-qualified component names & column paths
ComponentDescriptor
defines its fully-qualified name as such:which yields e.g.
rerun.archetypes.Points3D:rerun.components.Position3D#positions
, which is generally shortened toPoints3D:Position3D#positions
when there is no ambiguity.In the dataframe API, a fully-qualified column path now becomes
{entity_path}@{archetype_name}:{component_name}#{archetype_field_name}
, e.g./my/points@rerun.archetypes.Points3D:rerun.components.Position3D#positions
or/my/points@Points3D:Position3D#positions
.This syntax needs to be debated. I have intentionally disabled the syntax in the dataframe APIs so as not to break anything external-facing.
Transport and metadata
ArchetypeName
andArchetypeFieldName
are now exposed asrerun.archetype_name
andrerun.archetype_field_name
inTransportChunk
's arrow metadata.I really cannot wait for a better metadata system.
Performance
ComponentDescriptor
s add an extra layer of mappings everywhere: we used to haveIntMap<ComponentName, T>
all over the place, now we haveIntMap<ComponentName, IntMap<ComponentDescriptor, T>>
.The extra
ComponentName
layer is needed because it is very common to want to look for anything matching aComponentName
, without any further tags specified.Like before, these are NoHash maps, so performance impact should be minimal (
ComponentDescriptor
implements NoHash by xor'ing everything).Examples / testing / roundtrips
See:
These snippets play all roles at once, as usual. In particular they make sure that all languages (well, only Rust for now, Python and C++ coming soon) carry all the right tags in all the right situations.