Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spike: robust handling of ObjectID type for MongoDB #3456

Closed
2 tasks
bajtos opened this issue Jul 26, 2019 · 25 comments
Closed
2 tasks

Spike: robust handling of ObjectID type for MongoDB #3456

bajtos opened this issue Jul 26, 2019 · 25 comments
Assignees
Labels
Relations Model relations (has many, etc.) Repository Issues related to @loopback/repository package spike
Milestone

Comments

@bajtos
Copy link
Member

bajtos commented Jul 26, 2019

MongoDB is tricky.

  • It uses a custom ObjectID type for primary keys.
  • ObjectID is represented as a string when converted to JSON
  • In queries, string values must be cast to ObjectID, otherwise they are not considered as the same value: 'some-id' !== ObjectID('some-id').

As a result, both PK and FK properties must use ObjectID as the type, and coercion must be applied where necessary.

Ideally, I'd like LB4 to define MongoDB PK and FKs as follows:

  • {type: 'string', mongodb: {dataType: 'ObjectID'}}

Even better, dataType: 'ObjectID' should be automatically applied by the connector for PK and FKs referencing ObjectID PKs.

For example:

@model()
class Product {
  @property({
    type: 'string',
    generated: true,
    // ^^ implies dataType: 'ObjectID'
  })
  id: string;

  @property({
    type: 'string',
    references: {
      model: () => Category,
      property: 'id',
    },
    // ^^ implies dataType: 'ObjectID' when Category is attached to MongoDB
  })
  categoryId: string;
}

For v1, I suppose we can ask developers to provide dataType manually.

@model()
class Product {
  @property({
    type: 'string',
    generated: true,
    mongodb: {dataType: 'ObjectID'},
  })
  id: string;

  @property({
    type: 'string',
    mongodb: {dataType: 'ObjectID'},
  })
  categoryId: string;
}

With this setup in place, id and categoryId properties should be always returned as strings from DAO and connector methods.

This is tricky to achieve for the PK property, because juggler overrides user-supplied type with connector's default type when the PK is generated by the database. See
DataSource.prototype.setupDataAccess()

Can we rework MongoDB connector to hide ObjectID complexity as an internal implementation detail and always use string values in public APIs? Accept ids as strings and internally convert them to ObjectID. Convert values returned by the database from ObjectID to strings.

Downside: this is not going to work for free-form properties that don't have any type definition and where the connector does not know that they should be converted from string to ObjectID. But then keep in mind that JSON cannot carry type information, therefore REST API clients are not able to set free-form properties to ObjectID values even today.

See also #3387

Acceptance criteria

For every property defined as {type: 'string', mongodb: {dataType: 'ObjectID'}}, including properties defined in nested/embedded models:

  • When the MongoDB connector returns data from database, it converts ObjectID values to strings.
  • When the MongoDB connector writes data to database, it converts string values to ObjectID
  • When the MongoDB connector queries database (think of filter.where, but also findById and replaceById), it converts string values to ObjectID. The conversion is applied to non-trivial conditions too, e.g. {where: {id: { inq: ['my-objectid-1', 'my-objectid-2'] }}}

Tasks & deliverables

  • A PoC to verify feasibility of the design where ObjectID type is hidden as an implementation detail of the MongoDB connector and all other LB layers (juggler, Repository, REST APIs) use string type instead.

  • A list of well-defined follow-up tasks to implement the PoC. (This will be a semver-major release of the MongoDB connector.)

@dhmlau
Copy link
Member

dhmlau commented Jul 26, 2019

@bajtos, just wanna make sure.. do we need to have this spike done first before working on the ObjectID support for properties, i.e. #1875? Or this spike is mostly for relations?

@bajtos
Copy link
Member Author

bajtos commented Nov 15, 2019

This is tricky to achieve for the PK property, because juggler overrides user-supplied type with connector's default type when the PK is generated by the database. See
DataSource.prototype.setupDataAccess()

There is a pull request in progress to improve this part, see loopbackio/loopback-datasource-juggler#1783 and also the discussion in #3602

@agnes512
Copy link
Contributor

agnes512 commented Dec 3, 2019

When working on the task #4148, the replaceById method was expected to take JSON obj as the target data. But even we set useDefaultIdType and dataType: ObjectID, the replaceById still throws error because with replaceById(id, { an JSON obj}, it tries to replace the objectId id property with pass in data.id, which is a string.

the problem is spot in dao.js file. Maybe we want to handle it inside of mongo.js or dao.js instead of having any hacky workaround in juggler bridge.

Once this is being fixed:

  • remove/edit the comment in ensurePersistable method in juggler bride.
  • modify the test for replaceById() in has-many-inclusion-resolver.relation.acceptance.ts file with JSON obj as the target data.

@hacksparrow
Copy link
Contributor

Ok, I am done with the spike. Here is the spike report.

POC is in the form of test cases. Make sure to checkout the new-objectid on loopback-datasource-juggler and npm link it in the loopback-connector-mongodb repo.

@jannyHou
Copy link
Contributor

@hacksparrow I like the spike report summary 👍 When I try to understand the problems related to ObjectID, I am always lost at a good description of the expected behaviour, is https://github.com/strongloop/loopback-connector-mongodb#handling-objectid still a valid description for the new proposal? If not could you maybe write a new one?

@mikeevstropov
Copy link
Contributor

How to describe relations with ObjectID? For example

@belongsTo(() => Customer)
customerId: string;

Thank you.

@hacksparrow
Copy link
Contributor

@mikeevstropov we haven't gotten to the implementation part yet, but I'd like customerId type to be automatically discovered without users having to add anything more than what you have shown.

@agnes512
Copy link
Contributor

@mikeevstropov currently you can do

  // foreign key
  @belongsTo(() => Customer,
   {}, //relation metadata goes in here
   {// property definition goes in here
    mongodb: {dataType: 'ObjectId'}
  })
  customerId: string;

@agnes512
Copy link
Contributor

@hacksparrow I like the idea of getting rid of ( or set aside for now) strictObjectIDCoercion. It is really confusing and not fully satisfied the requirements that "Use string values for ObjectID in public APIs and manage the coercion processes for the user."

If I remember correctly, now mongodb: {dataType: 'ObjectID'} does:

  1. create instances with objectId or string-like objectId
  2. can find/update with objectId or string-like objectId
  3. 🚫 however the returned instances has id in objectId ( as mentioned in this spike description)

From your summary, 3 will be handled in DAO, 👍 can't wait to see it.
As for 1 and 2, I know the goal is to just have string value objectId in public APIs, I am wondering would they still be allowed in repository level?

@hacksparrow
Copy link
Contributor

I am wondering would they still be allowed in repository level?

The only place where we will be working with "real" ObjectID will be within the MongoDB connector. Even DAO need not know anything about ObjectIDs, it just applies the coercing function, which will be provided by the respective connectors. Users will never have to think in ObjectID once they have set mongodb: {dataType: 'ObjectID'}, so repository level will certainly be strings.

@jannyHou
Copy link
Contributor

Even DAO need not know anything about ObjectIDs, it just applies the coercing function, which will be provided by the respective connectors.

I like removing juggler layer's coercion a lot. It's much better to give connector the full control. It would be easier for having unified behavior in lb3/lb4/other ORM.

I have a question for the returned data's type. IIUC, according to test case, the returned data will always be in string format. Is it worth to allow choose between type objectId/string?

E.g. connector's coercion function returns either objectId/string based on config, and juggler returns the data as is?

Other than ^ the solution looks reasonable to me.

@hacksparrow
Copy link
Contributor

Is it worth to allow choose between type objectId/string?

Is there a condition where someone would want to use ObjectID instead of string?

@jannyHou
Copy link
Contributor

@hacksparrow I think this test case is an example.

@hacksparrow
Copy link
Contributor

That was written for the existing behavior, we will get rid of the behavior and the test.

I can't think of any situation where one would want to work with ObjectIDs than string.

@jannyHou
Copy link
Contributor

I see fair enough, not sure how to approve the proposal since it's not a PR :p I like the new proposal it's very straightforward 💯

@agnes512
Copy link
Contributor

The only place where we will be working with "real" ObjectID will be within the MongoDB connector.

I see 👍
I have one more question. What's the behavior if the props don't have mongodb: {dataType: 'ObjectID'}? I guess it then would return ObjectId? Will that be the case that Janny mentioned above?

@hacksparrow
Copy link
Contributor

... if the props don't have mongodb: {dataType: 'ObjectID'}

Can you show an example of what you have on mind? A real property definition.

@dhmlau
Copy link
Member

dhmlau commented Aug 18, 2020

@bajtos
Copy link
Member Author

bajtos commented Aug 20, 2020

@hacksparrow @agnes512 @jannyHou I have few more suggestions to consider, PTAL at my comments in loopbackio/loopback-connector-mongodb#591 and loopbackio/loopback-datasource-juggler#1861.

@jannyHou
Copy link
Contributor

Thanks @bajtos I commented here loopbackio/loopback-datasource-juggler#1861 (comment)

@hacksparrow
Copy link
Contributor

OK, here is the new spike report - https://github.com/strongloop/loopback-connector-mongodb/blob/6732fe7589d061632df8990ef4f11b8d48d138d9/SPIKE.md.

This approach is simpler and does not require changes to Juggler. Thanks @bajtos for the discussion.

The PR: loopbackio/loopback-connector-mongodb#596

@hacksparrow
Copy link
Contributor

While discussing the first SPIKE, I highlighted to @bajtos that Juggler also requires refactoring. The points are captured in #889 (comment).

From my work on #3456, I noted that the mechanism for creating the model object that's returned from the DAO methods are different for most methods. This was quite unexpected. Implementing new features or bug fixes that center around the model object will require changes for each DAO method.

Ideally, the model object should be generated by a single pipeline, which would be invoked by all the DAO methods that deal with data retrieval.

Take a look at the differences in how the model data is created in the callback function of these DAO methods:

create: https://github.com/strongloop/loopback-datasource-juggler/blob/master/lib/dao.js#L383
save: https://github.com/strongloop/loopback-datasource-juggler/blob/master/lib/dao.js#L2137
replaceById: https://github.com/strongloop/loopback-datasource-juggler/blob/master/lib/dao.js#L2700
find: https://github.com/strongloop/loopback-datasource-juggler/blob/master/lib/dao.js#L1622

So, while we will not be using the first approach, it was a good exercise in pointing out areas of refactoring in Juggler, which was missed in the original Juggler Refactor Epic.

@hacksparrow
Copy link
Contributor

Created follow up tasks.

  1. Remove MongoDB.prototype.getDefaultIdType
  2. ObjectID string interface
  3. SPIKE: Connector level model validation
  4. Primary key property not validated in .create()
  5. Refactor DAO methods

4 and 5 are somewhat related but not a requirement, they were discovered and created out of the first spike.

@bajtos are we good to close this now?

@bajtos
Copy link
Member Author

bajtos commented Oct 1, 2020

are we good to close this now?

Sure 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Relations Model relations (has many, etc.) Repository Issues related to @loopback/repository package spike
Projects
None yet
Development

No branches or pull requests

6 participants