Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.0] Add GridFS implementation on top of mongodb/mongodb #1790

Merged
merged 21 commits into from
Aug 3, 2018

Conversation

alcaeus
Copy link
Member

@alcaeus alcaeus commented May 13, 2018

Q A
Type feature
BC Break yes
Fixed issues #1697

Summary

This PR brings back support for GridFS. The idea is to provide mapped documents for GridFS files, while not re-implementing upload/download functionality currently in mongodb/mongodb. The general idea is:

  • Classes used for GridFS files are no longer mapped as documents, but rather as files
  • For classes mapped as files, only the fields specified in the GridFS specification (and not marked as deprecated) are allowed: _id, length, chunkSize, filename, uploadDate and metadata
  • Of the above fields, only the metadata field is checked for changes (if mapped)
  • Calling persist on a new file instance is not supported - the file has to be uploaded using the methods in the new GridFSRepository. This is because the specification suggests using streams, which isn't entirely feasible at this time

Mapping

Annotation mapping is done using special annotations (e.g. @File, @File\Filename, @File\UploadDate) that have predefined types and database field names and are marked as notSaved as a precaution.

XML mapping is done using special field mappings:

<gridfs-file name="Doctrine\ODM\MongoDB\Tests\Mapping\AbstractMappingDriverFile">
    <field fieldName="id" id="true" />
    <length fieldName="size" />
    <chunk-size />
    <upload-date />
    <filename fieldName="name" />

    <metadata target-document="Doctrine\ODM\MongoDB\Tests\Mapping\AbstractMappingDriverFileMetadata" />
</gridfs-file>

Note: the metadata field is treated like any other embed-one relationship; the repository does not check the given metadata object for its type when uploading a new file (see below).

Files and objects

Files can be read and hydrated into objects like all other documents. File contents are no longer available in the object itself but has to be read using the repository methods (downloadToStream at this time).

GridFS repository

This PR introduces a new interface for GridFS repositories along with a default implementation. The repository provides additional methods to upload and download files from and to streams. The repository is the only way to upload files, as mentioned above persisting a new file is not allowed and will cause an exception in UnitOfWork when calling persist($newFile). Once a file has been read from the database, it can be updated like all other documents.

BC breaks

  • Doctrine\ODM\MongoDB\Configuration: setDefaultRepositoryClassName and getDefaultRepositoryClassName have been replaced by setDefaultDocumentRepositoryClassName and getDefaultDocumentRepositoryClassName, respectively to accomodate the new GridFS repositories.
  • The @File annotation has been changed and is no longer a property-level annotation but a class-level annotation
  • The GridFS implementation no longer allows mapping arbitrary fields on the file level. Any fields not indicated in the GridFS specification need to be saved in the metadata document
  • The Doctrine\ODM\MongoDB\DocumentRepository class was moved to Doctrine\ODM\MongoDB\Repository\DocumentRepository.

Todo

  • Finalize tests
  • Document BC breaks in UPGRADE document
  • Write docs for GridFS feature, including migration

@alcaeus alcaeus added this to the 2.0.0 milestone May 13, 2018
@alcaeus alcaeus self-assigned this May 13, 2018
@alcaeus alcaeus force-pushed the gridfs-2.0 branch 2 times, most recently from 2efc50b to d852a68 Compare May 13, 2018 18:53
Copy link
Member

@malarzm malarzm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work so far! Thanks @alcaeus

*/
public function setDefaultRepositoryClassName($className)
public function setDefaultDocumentRepositoryClassName($className)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note for future: this requires deprecation notice and maybe a forward compatibility in 1.3

}

/**
* Get default repository class for documents.

This comment was marked as resolved.

This comment was marked as resolved.

}

/**
* Get default repository class.
* Get default repository class for GridFS files.

This comment was marked as outdated.

@@ -2,15 +2,22 @@

declare(strict_types=1);

namespace Doctrine\ODM\MongoDB;
namespace Doctrine\ODM\MongoDB\Repository;

This comment was marked as resolved.

*/
public function setDefaultGridFSRepositoryClassName($className)
{
$reflectionClass = new \ReflectionClass($className);

This comment was marked as resolved.

This comment was marked as resolved.


private function isAllowedGridFSField(string $name): bool
{
return in_array($name, ['_id', 'chunkSize', 'filename', 'length', 'metadata', 'uploadDate'], true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this array to a constant?

@@ -451,6 +461,15 @@ public function update($document, array $options = [])
*/
public function delete($document, array $options = [])
{
if ($this->bucket) {
$id = $this->uow->getDocumentIdentifier($document);
$id = $this->class->getDatabaseIdentifierValue($id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming these variables might make things a little more clear.

$documentIdentifier = $this->uow->getDocumentIdentifier($document);
$databaseIdentifier = $this->class->getDatabaseIdentifierValue($documentIdentifier);

*/
public function setDefaultRepositoryClassName($className)
public function setDefaultDocumentRepositoryClassName($className)

This comment was marked as resolved.

This comment was marked as resolved.

$metadata = $this->metadataFactory->getMetadataFor($className);
if ($metadata->isFile) {
return $this->getDocumentBucket($className)->getFilesCollection();

This comment was marked as resolved.

This comment was marked as resolved.

@@ -758,6 +765,10 @@ public function setDiscriminatorField($discriminatorField)
return;
}

if ($this->isFile) {
throw MappingException::discriminatorNotAllowedForGridFS($this->name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for this to go above the previous conditional that allows null assignment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial thought was to allow an extending class to remove the discriminator. On second thought, that kind of object hierarchy doesn't really make sense - I'll move the check.

@@ -820,6 +835,10 @@ public function setDefaultDiscriminatorValue($defaultDiscriminatorValue)
return;
}

if ($this->isFile) {
throw MappingException::discriminatorNotAllowedForGridFS($this->name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise a question if it makes sense to do this before the null assignment condition above.

@@ -81,6 +85,9 @@ public function loadMetadataForClass($className, \Doctrine\Common\Persistence\Ma
$class->setCollection((string) $xmlRoot['collection']);
}
}
if (isset($xmlRoot['bucket'])) {
$class->setCollection((string) $xmlRoot['bucket']);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, are you setting the GridFS bucket prefix (e.g. "fs") as the collection name? If so, does that mean that interpretation of the collection property depends on the value of $class->isFile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, yes. The alternative would be to bring in a bucket property and use that. That would mean consuming bucket if $class->isFile is true and collection if it isn't. It doesn't change the logic, but it removes the dual-use of the collection property.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that may be clearer, but I'll defer to @jwage and @malarzm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should enhance this to make it more clear and remove the dual-use of the collection property.


class DefaultGridFSRepository extends DocumentRepository implements GridFSRepository
{
public function downloadToStream($id, $destination): void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have placeholder doc blocks here, even if they only report @see MongoDB\GridFS\Bucket::method()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the style for return types with a space before : also used in ODM?
public function downloadToStream($id, $destination) : void

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently exclude the rule - this will be adjusted in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the original comment, I'll add a docblock with a reference to the original method.


$id = $this->getDocumentBucket()->uploadFromStream($filename, $source, $options);

// TODO: apply primary read preference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to ensure that the subsequent find() sees the same document that was just uploaded? Is there a particular reason you don't want to just return the document's ID as the library does?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct - it's to avoid issues like doctrine/mongodb#324 from the start. I haven't considered only returning the ID, it would've been nice to get a file you can use directly. The other option would be to return a proxy to avoid a read cycle that may be useless if the user doesn't want to use the newly persisted file.

public function uploadFromFile(string $source, ?string $filename = null, $metadata = null)
{
$resource = fopen($source, 'r');
if (! $resource) {
Copy link
Member

@jmikola jmikola May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a strict check for false? I'm not sure if a valid file pointer in PHP can possibly evaluate to zero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know a successful created resource is never true with a check like this $resource == false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a type-safe check never hurts - will update this.

throw MongoDBException::cannotReadGridFSSourceFile($source);
}

if (! $filename) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to explicitly check for null here, in case someone wanted to use an empty string or "0" as a filename?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the check should be stricter.

@@ -225,6 +234,10 @@ public function ensureDocumentIndexes($documentName, $timeout = null)
throw new \InvalidArgumentException('Cannot create document indexes for mapped super classes, embedded documents or query result documents.');
}

if ($class->isFile) {
$this->ensureGridFSIndexes($class);

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

{
$chunksCollection = $this->dm->getDocumentBucket($class->getName())->getChunksCollection();
foreach ($chunksCollection->listIndexes() as $index) {
if ($index->isUnique() && $index->getKey() === ['files_id' => 1, 'n' => 1]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this duplicated array be in a constant or variable? It's also used in createIndex() a few lines later.

{
$filesCollection = $this->dm->getDocumentCollection($class->getName());
foreach ($filesCollection->listIndexes() as $index) {
if ($index->getKey() === ['filename' => 1, 'uploadDate' => 1]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with the array above.

try {
$gridFSMetadata = $class->getFieldMappingByDbFieldName('metadata');
$gridFSMetadataProperty = $gridFSMetadata['fieldName'];
} catch (MappingException $e) {

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

@alcaeus

This comment has been minimized.

@malarzm

This comment has been minimized.

/**
* @Annotation
*/
final class Metadata extends AbstractField

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

@jmikola

This comment has been minimized.

@alcaeus

This comment has been minimized.

@alcaeus
Copy link
Member Author

alcaeus commented Jun 4, 2018

XML mappings have been finalized, see original comment for example. I've also created #1806 to track changing the ID mapping to the same kind of mapping as with GridFS. This is mainly done to avoid allowing generic field mappings in GridFS files, with the added benefit of clearer, more concise ID mappings for all documents.

Optional attributes:

-
targetDocument - A |FQCN| of the target document.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap targetDocument, discriminatorField, etc. in ``? We do it in other parts of the docs and it is done in storing-files-with-mongogridfs.rst in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-pasted it from the original documentation, I'll fix it throughout the file for consistency 👍

@alcaeus alcaeus changed the title [WIP] [2.0] Add GridFS implementation on top of mongodb/mongodb [2.0] Add GridFS implementation on top of mongodb/mongodb Jun 15, 2018
@alcaeus alcaeus requested review from jwage, jmikola and malarzm June 15, 2018 06:15
@alcaeus alcaeus force-pushed the gridfs-2.0 branch 2 times, most recently from 37243a5 to 11ba674 Compare July 31, 2018 19:02
$image = $profile->getImage();
$stream = fopen('tmp/path/to/copy', 'w+');
try {
$repository->downloadToStream($file->getid(), $stream);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking, but you have lowercase "i" in getId() ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, will fix before merging.

Copy link
Member

@malarzm malarzm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to disregard my comments and let's have this merged 👍

@@ -342,20 +345,39 @@ public function getFilterParameters(string $name): ?array
/**
* @throws MongoDBException If not is a ObjectRepository.
*/
public function setDefaultRepositoryClassName(string $className): void
public function setDefaultDocumentRepositoryClassName(string $className): void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think compat layer for this could actually should land in 1.3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely.

@@ -183,6 +185,12 @@ class ClassMetadata implements BaseClassMetadata
*/
public $collection;

/**
* READ-ONLY: The name of the GridFS bucket the document is mapped to.
* @var string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be string|null I think

*
* @var int|null
*/
public $chunkSizeBytes = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= null is redundant


public static function cannotPersistGridFSFile(string $className): self
{
return new self(sprintf('Cannot persist GridFS file for class "%s" through UnitOfWork', $className));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing "." by the end


public static function cannotReadGridFSSourceFile(string $filename): self
{
return new self(sprintf('Cannot open file "%s" for uploading to GridFS', $filename));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing "." by the end


.. _file_chunksize:

@File\ChunkSize

This comment was marked as resolved.

This comment was marked as resolved.

ObjectId, although you can override this in the mapping.
-
``chunkSize`` stores the size of a single chunk in bytes. By default, chunks
are 255 kb in size.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"kb" may be ambiguous. I'd suggest quoting the PHPLIB docs here: "261120 (i.e. 255 KiB)"

https://github.com/mongodb/mongo-php-library/blob/bd148eab0493e38354e45e2cd7db59b90fdcad79/src/GridFS/Bucket.php#L69-L70

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks for the suggestion.

echo $image->getFile()->getBytes();
With XML mappings, the fields are automatically mapped to camel-cased properties.
To change property names, simply override the ``fieldName`` attribute for each
field. You can not override any other options for GridFS fields.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "cannot"? I'm not sure what is most commonly used in the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick search favours "cannot" in our docs - updated.

You can of course make references to this Image document from
another document. Imagine you had a Profile document and you wanted
every Profile to have a profile image:
The ``ImageMetadata`` class can be any embedded document:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to say that the metadata document can be any embedded document class, but it seems odd to say that about ImageMetadata. ImageMetadata either is an embedded document class (can be used) or it isn't (cannot be used).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded to "must be an embedded document".

$image->setName('Test image');
$image->setFile('/path/to/image.png');
$repository = $documentManager->getRepository(Documents\Image::class);
$file = $repository->uploadFromFile('image.jpg', '/tmp/path/to/image', new Documents\ImageMetadata('image/jpeg'));

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

$dm->flush();
When reading GridFS files, they behave like all other documents. You can query
for them using the ``find*`` methods in the repository, create query or
aggregation pipeline builders and also use them as ``targetDocument`` in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oxford comma: "builders, and"

*/
public function setDefaultGridFSRepositoryClassName(string $className): void
{
$reflectionClass = new \ReflectionClass($className);

This comment was marked as resolved.

This comment was marked as resolved.

public $db;

/** @var string */
public $bucketName = 'fs';

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

/**
* @see Bucket::downloadToStream
*/
public function downloadToStream($id, $destination): void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you still intend to create an openDownloadStream() method? If not in this PR, should there be a separate ticket to do so before 2.0?

See: https://github.com/doctrine/mongodb-odm/pull/1790/files#r203132136

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That must have slipped through, I added it now.

@alcaeus alcaeus force-pushed the gridfs-2.0 branch 2 times, most recently from 3b04df8 to d75dccc Compare August 2, 2018 06:24
private $name;
If you want to pass options, such as a metadata object to the uploaded file, you
If you want to add metadata to the uploaded file, you can pass it as the last
can pass an ``UploadOptions`` object as the last argument to the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"can pass it as the last" on the line above should be deleted.


/** @Field */
private $name;
If you want to pass options, such as a metadata object to the uploaded file, you

This comment was marked as resolved.

If you want to pass options, such as a metadata object to the uploaded file, you
If you want to add metadata to the uploaded file, you can pass it as the last
can pass an ``UploadOptions`` object as the last argument to the
``uploadFromFile``, ``uploadFromStream`` or ``openUploadStream`` method call:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oxford comma after uploadFromStream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants