-
-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Image analysis job #307
Image analysis job #307
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You said this PR is a first draft towards #181, so I didn't comment on some things I assume you'll fix in followup commits, e.g. unwrap
s and TODO
s. I had a few low priority comments and a couple of nit picks, but overall I think this is looking great and I appreciate the time you've put into this so far!
I'll mark this PR as a draft just so that when you're ready for me to take another look you can just change the status and I'll know to look.
I chose a slightly broader approach to this job, making an AnalyzeMediaJob which includes an AnalyzeImage task. This is intended to be more extensible, as additional stages to media analysis will probably be wanted later on.
I think this is definitely the right approach. I imagine we'll be adding various types of media analysis in the future, so not having to define and maintain separate jobs for each type of analysis is a good idea.
On that note, I did this by adding an "Analyze Media" button to the "Manage" page for individual media items, it seemed like the right place to put it.
I think so, it satisfies your Individual
variant for the job. The other variants would need to triggers in different places, as you already mentioned, and I think this is a good starting point.
Finally, the job currently updates the number of pages on a media item always as opposed to just validating the number of pages. It sounds, from the issue, like this isn't the goal, so I can change that alongside other changes.
I think this is the eventual goal IMO. I think one thing to consider that might be good to gather input on is whether we want always want to auto-update metadata diffs when a media item is analyzed. An alternative that might be appealing is some sort of reconcilation process, which would require a user to manually review and accept the changes. What do you think?
As for the other functionalities outlined in that feature, things like image dimensions could just be an automatic update to whatever relation winds up holding that information on a media record.
pub struct AnalyzeMediaOutput { | ||
/// The number of images analyzed | ||
images_analyzed: u64, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could likely add more state to this, e.g. media_updated
, but this is a good start and I'm fine leaving as-is if that is what you decide
Okay, these last several commits get things into a generally workable state. There are probably a few things to do still. A few that come to mind:
|
I think what you have is good for now wrt to what analysis tasks are performed, and you've built it already to be able to add more tasks as needed which is 🔥
I'm not 100% yet. I think if it flat out doesn't exist, it's fine to set it to the value you read. The exception might be EPUB files. Otherwise, I think some sort of reconciliation would be neat, e.g. the analysis generates a list of actions to be taken on the frontend. In the case of mismatch, it would tell you what the current is and what the analyzed is and you can decide the action. However, that in itself is a huge feature. I'll err on the side of caution and say just persist |
So I was working on this pull request today and I realized that a media item's page is non-null. Investigating further, to see if perhaps an invalid value might be assigned under some conditions that should be overwritten (versus properly assigned page counts where no update is necessary, per your prior comment), I found something I hadn't considered: the pages for a file are always counted during initial creation during a scan (see the That initially made the job here seem a bit unnecessary - it would never have a new value. However, thinking further, it might actually makes the most sense to always update it. This way, should the page counting methodology be updated in the future to correct an error, people can easily correct their databases by running the analyze job. In interest of this, I am thinking of unifying the Let me know your thoughts here, if you agree, then I just need to get authentication working before this is finalized on the server side. If the random button I threw on each manage page is good enough for you, then it'll be ready to merge at that point as well. |
I'm not sure I understand, did you mean to refer to the
Yeah good catch, there is a I think the big goal for the analysis is more for ensuring the metadata matches the actual count, not necessarily the
I'm good with this 👍 I do like my idea for some sort of reconciliation flow (I'm starting to overuse this word now lol) but I don't think it is necessarily needed for this small of an operation.
Yeah I think that makes sense, I agree 👍 I also think the random buttons are good for now, especially since this is based into experimental |
This comment has been minimized.
This comment has been minimized.
This last commit does two things. First, it fixes an error in the way I had the job for rars counting pages (needed to check if it was an image). Second, it addresses a todo in Once you've confirmed that this is what you want this should be good to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a few nit-picks, but the only one I think care about is the conditional update during analysis. If you have time to tackle any, great, if not I'm not worried. I think this looks great otherwise so I'm approving it. Thanks again!!
Let me know if you'd prefer me to handle the conflicts with the base branch 👍
// Update media item in database | ||
let _ = ctx | ||
.db | ||
.media() | ||
.update(media::id::equals(id), vec![media::pages::set(page_count)]) | ||
.exec() | ||
.await?; | ||
output.media_updated += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This update should probably only hit if the page counts are actually different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and followed this rule in my latest commit - I check if there's metadata, and if there is and page_count
doesn't match the one we calculated, update it. If there isn't, create new metadata with the new page_count
.
I'd appreciate if you could check the logic there, it may not have been the most idiomatic prisma usage.
This pull request creates a first draft of an image analysis job (#181), which will need to be updated further before merging.
I chose a slightly broader approach to this job, making an
AnalyzeMediaJob
which includes anAnalyzeImage
task. This is intended to be more extensible, as additional stages to media analysis will probably be wanted later on.The job isn't fully integrated. Although it is written to have four variants (Individual, Library, Series, and MediaGroup) I've only implemented a client-side way to access the Individual analysis command. On that note, I did this by adding an "Analyze Media" button to the "Manage" page for individual media items, it seemed like the right place to put it. Implementing the job fully would involve adding the same button to other menus and linking them up on the axum side. The axum side also needs authentication boilerplate added.
Finally, the job currently updates the number of pages on a media item always as opposed to just validating the number of pages. It sounds, from the issue, like this isn't the goal, so I can change that alongside other changes.
In the meantime, I think this is a good place to discuss how this could be improved and made ready for integration to develop.