-
-
Notifications
You must be signed in to change notification settings - Fork 505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support mapping time series collections #2687
Conversation
80c47e3
to
d2a5c5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM.
The functional test would ensure we pass the correct options to create the collection and this options works with all supported server versions (past and future).
); | ||
|
||
if ($class->timeSeriesOptions->expireAfterSeconds) { | ||
$options['expireAfterSeconds'] = $class->timeSeriesOptions->expireAfterSeconds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted this option is specific to TimeSeries collections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit odd that create receives this as a top-level option instead of within timeseries
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The option is also used for clustered collections, which can leverage a TTL to remove expired documents. I'm guessing the team decided to reuse the option since it was already there instead of adding an extra option in the timeseries options document.
* @ODM\TimeSeries(timeField="time", metadataField="metadata", granularity=Granularity::Seconds, expireAfterSeconds=86400) | ||
*/ | ||
#[ODM\Document] | ||
#[ODM\TimeSeries('time', 'metadata', Granularity::Seconds, 86400)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Named arguments are frequently used for attributes, you should use them to ensure there is not regression with argument name change.
<xs:complexType name="time-series"> | ||
<xs:attribute name="time-field" type="xs:NMTOKEN" use="required" /> | ||
<xs:attribute name="meta-field" type="xs:NMTOKEN" /> | ||
<xs:attribute name="granularity" type="odm:time-series-granularity" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MongoDB 6.3 added bucketMaxSpanSeconds
and bucketRoundingSeconds
as an alternative to granularity
. Both options are expected to have the same value (between 1
and 31536000
), although we can rely on the server to enforce that. See:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed those. I decided against adding validation, except for limiting the range in the XSD as we're able to type this better. I'm not sure why there are two different options that require having the same value, but something tells me that this requirement may change in the future (a great example why we have the Defy Augury mantra in the driver specifications).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why there are two different options that require having the same value
From the minds that brought us a required sparsity
option for encrypt()
methods, before it was recently made optional. I jest, but that's what this reminded me of.
// Mark as time series only after mapping all fields | ||
if (isset($classAttributes[TimeSeries::class])) { | ||
assert($classAttributes[TimeSeries::class] instanceof TimeSeries); | ||
$metadata->markAsTimeSeries($classAttributes[TimeSeries::class]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the first markAs
method in ClassMetadata? Just curious why you went this route instead of a setTimeSeries
method as we have for ShardKey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the first time I used this naming. I could change it to setTimeSeriesOptions
to be consistent with others, but I figured that technically we're marking this as a time series document. Other methods could be changed as well, e.g. setShardKey
should be markAsSharded
as it doesn't allow to unshard metadata at a later stage anyways.
I have started another branch with a more comprehensive metadata change (e.g. using objects instead of arrays for field mappings, along with other changes), so I can defer the naming change to then if you'd prefer consistency here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you intend to revise things anyway, I have no preference. Probably easier to adopt the new format earlier and migrate older stuff as needed at some later point.
} | ||
|
||
foreach ($xmlRoot->{'also-load-methods'}->{'also-load-method'} as $alsoLoadMethod) { | ||
$metadata->registerAlsoLoadMethod((string) $alsoLoadMethod['method'], (string) $alsoLoadMethod['field']); | ||
if (! isset($xmlRoot->{'time-series'})) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method seems long enough to do without this guard statement. If another attribute is added down the line, you'll likely need to wrap this in a condition as you just did for also-load-methods
, so why not just do that now and avoid ever having to do this again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is our coding standard requiring an early exit, which seems pointless at this point. For now I've disabled the rule around this method, but I may want to refactor this into multiple methods at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the rule makes little sense for this kind of method. These aren't traditional guard statements of returning early before some logical operation is performed, because the method is handling many different kinds of operations. If anything, it just makes the final operation inconsistent from those handled earlier.
); | ||
|
||
if ($class->timeSeriesOptions->expireAfterSeconds) { | ||
$options['expireAfterSeconds'] = $class->timeSeriesOptions->expireAfterSeconds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit odd that create receives this as a top-level option instead of within timeseries
.
4ee5142
to
0e1a58e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API looks stable enough to add docs to this PR.
tests/Doctrine/ODM/MongoDB/Tests/Mapping/AbstractMappingDriverTestCase.php
Outdated
Show resolved
Hide resolved
@@ -806,6 +807,9 @@ | |||
*/ | |||
public $isReadOnly; | |||
|
|||
/** READ ONLY: stores metadata about the time series collection */ | |||
public ?TimeSeries $timeSeriesOptions = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use readonly
here? AFAIK, that would just mean that markAsTimeSeries()
, which sets this, can only be called once. Perhaps that's something to defer until later, as I expect it could be done across the board for all "READ ONLY" metadata properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. What we'd want here is asymetric visibility, where changing the property is only done through accessors (such as markAsTimeSeries
) that can incorporate checks, while reading is always allowed. readonly
in PHP actually means write-once
, meaning that one wouldn't be able to override the time series configuration (e.g. to change granularity) when using inheritance mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readonly
= you can change the assigned object properties. You cannot change the reference to the object. And I think - I'm not sure though - that you can only make assignment in the constructor 🤔 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think - I'm not sure though - that you can only make assignment in the constructor
I thought something like that as well, but then it seems we are wrong, and the property can be initialised after the constructor, but only from within the class scope: https://3v4l.org/cV5QL
For our case, this still wouldn't work however, as we can't assign a default value to a readonly property but have to leave it uninitialised, which means that accessing the property always needs to be guarded by an isset
check :(
@jmikola good point about the docs. I added |
#[ODM\Field(type: 'float')] | ||
public float $temperature, | ||
) { | ||
$this->id = (string) new ObjectId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does ODM require a string cast here, or is it only done to satisfy the string
type hint on the property? IIRC, ODM would have allowed an ObjectId on the initially created entity, and left is as-is for insertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default id
type converts an ObjectId
instance to a string in the entity. It's bothered me for a while as working with the ObjectId
instance directly would make more sense, but I've shied away from making the big BC break. We could use ObjectId|string
as type for the identifier, but this would create ambiguity in the API as users would always have to check types when using the identifier, hence the string cast.
|
||
- | ||
``metaField`` - The name of the field which contains metadata in each time | ||
series document. The field can be of any data type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't immediately clear to me from reading the Time Series docs, but are documents in a time series collection restricted to only having _id
, timeField
, and metaField
fields? The docs suggest that everything else gets stuffed into the metaField
.
If so, this may be something to note in the cookbook article when we demonstrate model creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I've added the following language to the cookbook entry:
A time series document always contains a time value, and one or more measurement
fields. Metadata is optional, but cannot be added to a time series collection
after creating it. When using an embedded document for metadata, fields can be
added to this document after creating the collection.
I've also extended the example to store temperature and humidity, showing that time series documents can contain multiple measurements per entry.
{ | ||
$metadata = $this->dm->getClassMetadata(TimeSeriesTestDocument::class); | ||
|
||
self::assertNull($metadata->timeSeriesOptions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a meaningful test? I'm also a bit confused that these tests reference a TimeSeriesDocument class defined in the same file, while you also have a separate TimeSeriesDocument class defined in its own file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it isn't. This was left over from refactoring out the isTimeSeries
field, where I replaced all occurrences of checking the field with the corresponding check on the timeSeriesOptions
property. I removed the test since it's redundant.
As for the classes being used, there are lots of classes defined in this test file, I believe to avoid issues where changing the mapping of one class to test new functionality affects related tests. I've followed this for consistency, but we'd probably do well to refactor this in the near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aye. Ideally, I suppose these tests would be split into their own files or define anonymous classes within the test itself. I understand there's so much prior art that it'd be a chore to clean everything up at once, though.
d4189d2
to
db758a3
Compare
Co-authored-by: Jeremy Mikola <jmikola@gmail.com>
@@ -20,6 +21,7 @@ | |||
use Documents\Sharded\ShardedOne; | |||
use Documents\Sharded\ShardedOneWithDifferentKey; | |||
use Documents\SimpleReferenceUser; | |||
use Documents\TimeSeries\TimeSeriesDocument; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted that this is where the class is referenced. Missed that earlier.
@alcaeus This had been a pain point in some of our automated testing for the past 18 months as creating timeseries collections required custom scripts. Thank you for this contribution! |
Thank you for taking the time to write this comment, I really appreciate it! If you have a chance to try out the feature please do give us feedback if anything does not work as expected. I understand this only affects schema creation, but any real-world scenarios you can test help us greatly! |
Summary
Supersedes #2597. I was unhappy with the way we were mapping time series collections, so instead of adding more options to the
Document
attribute I decided to introduce a new attribute and XML mapping element to handle time series data. To make data handling easier, theClassMetadataInstance
will store the attribute instance entirely, as it serves as a value object for time series options. As a result, the XML driver is able to create the attribute instance from the data in the XML, needing no additional validation.The only other change needed was to add support for time series options in the schema manager. Since time series collections need to be created as such, no handling is needed to update time series collections at all. Dropping time series collections works the same as dropping other collections, again requiring no other changes.
The tests in this PR cover the basic mapping and schema manager changes. No functional testing for time series collections was added, but I can add those tests if necessary.