Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clipper: Add capability of limiting downloads #9788

Merged
merged 17 commits into from
Mar 9, 2024
Merged

Clipper: Add capability of limiting downloads #9788

merged 17 commits into from
Mar 9, 2024

Conversation

pedr
Copy link
Collaborator

@pedr pedr commented Jan 26, 2024

Summary

Creating a new abstraction DownloadLimiter that can be used to limit the amount of data that extractNoteFromHTML can download. This works by counting how many images and the number of bytes downloaded by the shim.fetchBlob calls.

Changes

  • Limit how much data can be downloaded by the extractNoteFromHTML.
  • Change the "promise-pool" library.
  • Add a log when the limit is reached
  • Add a header to a note body if the limit is reached.

Why change the promise-pool library

Initially I planned to change the library that we were using to handle the image downloads, after some discussing we agreed that Promise.all should be enough for our use case.

Log pattern:

2024-01-20 00:29:43: downloadController: Owner id: qcc9jPy9UZ5xSuBhdBkSTZ - Total bytes stored: 841811. Maximum: 52428800 - Images initiated for download: 12. Maximum: 11. Expected: 57

2024-01-20 00:29:43: downloadController: Owner id: {ownerId} - Total bytes stored: {how much was downloaded in the request}. Maximum: { value set in the LimitedDownloadController } - Images initiated for download: { how many images started to be downloaded}. Maximum: { value set in the LimitedDownloadController }. Expected: { how many image URLs we had in the email body}

Implementation

There are "two behaviors".

When the image count limit is reached.
We can stop new downloads from even starting, there is only one warn log message since we can control it:

When the total number of bytes is exceeded.
Since the only place we can check this is inside the fetchBlob function and the error is being controlled by a try-catch there we can are potential logging a lot of error messages:

Testing

This change should not affect anything since the current code only uses a DummyDownloadController class as the fallback.

Any unexpected behavior from the Joplin Web Clipper or any code that uses shim.fetchBlob is a potential bug.

@pedr pedr requested a review from laurent22 January 26, 2024 19:14
@pedr pedr changed the title Clipper: Add capability of setting download limit Clipper: Add capability of limiting downloads Jan 26, 2024
Comment on lines 93 to 103
private bytesToHuman(bytes: number) {
const units = ['Bytes', 'KB', 'MB', 'GB'];
let unitIndex = 0;

while (bytes >= 1024 && unitIndex < units.length - 1) {
bytes /= 1024;
unitIndex++;
}

return `${bytes.toFixed(1)} ${units[unitIndex]}`;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to @joplin/utils/bytes

Comment on lines 105 to 113
public limitMessage() {
if (this.imagesCount_ > this.maxImagesCount) {
return `Your email has ${this.imageCountExpected} images, exceeding the limit of ${this.maxImagesCount}.`;
}
if (this.totalBytes >= this.maxTotalBytes) {
return `The size of your email (${this.bytesToHuman(this.totalBytes)}) was larger than the allowed limit (${this.bytesToHuman(this.maxTotalBytes)}).`;
}
return '';
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a generic controller that could apply to anything, not just emails, so we should remove references to emails in the error messages.

@@ -223,7 +226,7 @@ const isValidUrl = (url: string, isDataUrl: boolean, urlProtocol?: string, allow
return isAllowedProtocol;
};

export async function downloadMediaFile(url: string, fetchOptions?: FetchOptions, allowedProtocols?: string[]) {
export async function downloadMediaFile(url: string, downloadController: DownloadController, fetchOptions?: FetchOptions, allowedProtocols?: string[]) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the downloadController optional.

I know we have this rule to avoid optional parameters, but when it means having to create a dummy object on each call it's better to make it optional

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why do we need this dummy controller? Isn't it possible to make the code work with a simple null controller? I guess there will be a few null checks here and there.

Especially since fetchBlob is heavily used for sync it's better to avoid creating unnecessary objects on each call.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if it's optional, make it part of the "fetchOptions" object

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I move the downloadController to be a property of the other options object and instead of creating a dummy class I'm checking before using downloadController like you suggested.

@@ -471,9 +472,10 @@ function shimInit(options: ShimInitOptions = null) {
}, options);
};

shim.fetchBlob = async function(url: any, options) {
shim.fetchBlob = async function(url: any, options: any, downloadController?: DownloadController) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The options parameter by convention should be the last one, so it means adding downloadController as an option

packages/lib/downloadController.ts Show resolved Hide resolved
return this.imageCountExpected_;
}

public handleNewChunk(request: any) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpick but "handleNewChunk" is not necessary it's just handleChunk

@laurent22 laurent22 added the v2.15 label Feb 6, 2024
@pedr
Copy link
Collaborator Author

pedr commented Feb 22, 2024

This is a case where using the promising-pool lib is better than Promise.all. If we were using Promise.all we wouldn't be able to stop the execution midway (for example, if the content that is being downloaded exceeded a size limit that we choose).

@pedr
Copy link
Collaborator Author

pedr commented Feb 28, 2024

We talked about having each operation throw an error when the limit is reached instead of the class having a state that identifies if the limit has been reached. The problem with this approach is that if we have 100 image links inside the content and the size limit is reached at the 10th image we will be logging 90 error messages.

I'm not even sure how I could avoid this kind of error logging flood because it is a behavior that happens inside fetchBlob function, so the only way I can think would be to use a try catch to suppress the errors that are related to this feature (I have an error code to identify it).

let urlIndex = 0;
const promiseProducer = () => {
if (urlIndex >= urls.length) return null;
const maximumImageDownloadsAllowed = downloadController ? downloadController.maxImagesCount : urls.length;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about setting it to Number.POSITIVE_INFINITY if downloadController is not defined? It seems more future proof since it will work even if somehow the number of urls change in the future (if something's appended to the array)

packages/lib/services/rest/routes/notes.ts Show resolved Hide resolved
if (!options || !options.path) throw new Error('fetchBlob: target file path is missing');
if (!options.method) options.method = 'GET';
if (!options.downloadController) options.downloadController = null;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not necessary

@@ -571,6 +583,11 @@ function shimInit(options: ShimInitOptions = null) {
});

const request = http.request(requestOptions, (response: any) => {

if (downloadController !== null) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for strict null checking here. What you want to check is if it's defined or not and for that just if (downloadController) is enough. There are places maybe where we need to make the difference between undefined and null or other falsy values, but not when we just need to know if something exists or not.

Comment on lines 1 to 12
// eslint-disable-next-line import/prefer-default-export
export const bytesToHuman = (bytes: number) => {
const units = ['Bytes', 'KB', 'MB', 'GB'];
let unitIndex = 0;

while (bytes >= 1024 && unitIndex < units.length - 1) {
bytes /= 1024;
unitIndex++;
}

return `${bytes.toFixed(1)} ${units[unitIndex]}`;
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but I've suggested moving it to @joplin/utils so why is it in @joplin/lib now? /lib should be mostly for business logic, specific to Joplin. /utils is for generic utilities with no dependencies that can be used anywhere. This is so other packages don't need to import the whole lib, renderer, editor and more when all they need is some simple utilities.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, my bad, I'm fixing this.

@laurent22 laurent22 merged commit 17a8ce5 into laurent22:dev Mar 9, 2024
10 checks passed
@pedr pedr deleted the download-controller branch March 10, 2024 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants