-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clipper: Add capability of limiting downloads #9788
Conversation
packages/lib/downloadController.ts
Outdated
private bytesToHuman(bytes: number) { | ||
const units = ['Bytes', 'KB', 'MB', 'GB']; | ||
let unitIndex = 0; | ||
|
||
while (bytes >= 1024 && unitIndex < units.length - 1) { | ||
bytes /= 1024; | ||
unitIndex++; | ||
} | ||
|
||
return `${bytes.toFixed(1)} ${units[unitIndex]}`; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this to @joplin/utils/bytes
packages/lib/downloadController.ts
Outdated
public limitMessage() { | ||
if (this.imagesCount_ > this.maxImagesCount) { | ||
return `Your email has ${this.imageCountExpected} images, exceeding the limit of ${this.maxImagesCount}.`; | ||
} | ||
if (this.totalBytes >= this.maxTotalBytes) { | ||
return `The size of your email (${this.bytesToHuman(this.totalBytes)}) was larger than the allowed limit (${this.bytesToHuman(this.maxTotalBytes)}).`; | ||
} | ||
return ''; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a generic controller that could apply to anything, not just emails, so we should remove references to emails in the error messages.
@@ -223,7 +226,7 @@ const isValidUrl = (url: string, isDataUrl: boolean, urlProtocol?: string, allow | |||
return isAllowedProtocol; | |||
}; | |||
|
|||
export async function downloadMediaFile(url: string, fetchOptions?: FetchOptions, allowedProtocols?: string[]) { | |||
export async function downloadMediaFile(url: string, downloadController: DownloadController, fetchOptions?: FetchOptions, allowedProtocols?: string[]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the downloadController optional.
I know we have this rule to avoid optional parameters, but when it means having to create a dummy object on each call it's better to make it optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also why do we need this dummy controller? Isn't it possible to make the code work with a simple null
controller? I guess there will be a few null checks here and there.
Especially since fetchBlob
is heavily used for sync it's better to avoid creating unnecessary objects on each call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if it's optional, make it part of the "fetchOptions" object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I move the downloadController to be a property of the other options object and instead of creating a dummy class I'm checking before using downloadController like you suggested.
packages/lib/shim-init-node.ts
Outdated
@@ -471,9 +472,10 @@ function shimInit(options: ShimInitOptions = null) { | |||
}, options); | |||
}; | |||
|
|||
shim.fetchBlob = async function(url: any, options) { | |||
shim.fetchBlob = async function(url: any, options: any, downloadController?: DownloadController) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The options
parameter by convention should be the last one, so it means adding downloadController
as an option
packages/lib/downloadController.ts
Outdated
return this.imageCountExpected_; | ||
} | ||
|
||
public handleNewChunk(request: any) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nitpick but "handleNewChunk" is not necessary it's just handleChunk
|
We talked about having each operation throw an error when the limit is reached instead of the class having a state that identifies if the limit has been reached. The problem with this approach is that if we have 100 image links inside the content and the size limit is reached at the 10th image we will be logging 90 error messages. I'm not even sure how I could avoid this kind of error logging flood because it is a behavior that happens inside |
let urlIndex = 0; | ||
const promiseProducer = () => { | ||
if (urlIndex >= urls.length) return null; | ||
const maximumImageDownloadsAllowed = downloadController ? downloadController.maxImagesCount : urls.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about setting it to Number.POSITIVE_INFINITY if downloadController is not defined? It seems more future proof since it will work even if somehow the number of urls change in the future (if something's appended to the array)
packages/lib/shim-init-node.ts
Outdated
if (!options || !options.path) throw new Error('fetchBlob: target file path is missing'); | ||
if (!options.method) options.method = 'GET'; | ||
if (!options.downloadController) options.downloadController = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not necessary
packages/lib/shim-init-node.ts
Outdated
@@ -571,6 +583,11 @@ function shimInit(options: ShimInitOptions = null) { | |||
}); | |||
|
|||
const request = http.request(requestOptions, (response: any) => { | |||
|
|||
if (downloadController !== null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for strict null checking here. What you want to check is if it's defined or not and for that just if (downloadController)
is enough. There are places maybe where we need to make the difference between undefined and null or other falsy values, but not when we just need to know if something exists or not.
packages/lib/utils/bytes.ts
Outdated
// eslint-disable-next-line import/prefer-default-export | ||
export const bytesToHuman = (bytes: number) => { | ||
const units = ['Bytes', 'KB', 'MB', 'GB']; | ||
let unitIndex = 0; | ||
|
||
while (bytes >= 1024 && unitIndex < units.length - 1) { | ||
bytes /= 1024; | ||
unitIndex++; | ||
} | ||
|
||
return `${bytes.toFixed(1)} ${units[unitIndex]}`; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, but I've suggested moving it to @joplin/utils
so why is it in @joplin/lib
now? /lib should be mostly for business logic, specific to Joplin. /utils is for generic utilities with no dependencies that can be used anywhere. This is so other packages don't need to import the whole lib, renderer, editor and more when all they need is some simple utilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, my bad, I'm fixing this.
Summary
Creating a new abstraction
DownloadLimiter
that can be used to limit the amount of data thatextractNoteFromHTML
can download. This works by counting how many images and the number of bytes downloaded by theshim.fetchBlob
calls.Changes
extractNoteFromHTML
.Change the "promise-pool" library.Why change the promise-pool library
Initially I planned to change the library that we were using to handle the image downloads, after some discussing we agreed that
Promise.all
should be enough for our use case.Log pattern:
Implementation
There are "two behaviors".
When the image count limit is reached.
We can stop new downloads from even starting, there is only one warn log message since we can control it:
When the total number of bytes is exceeded.
Since the only place we can check this is inside the fetchBlob function and the error is being controlled by a try-catch there we can are potential logging a lot of error messages:
Testing
This change should not affect anything since the current code only uses a
DummyDownloadController
class as the fallback.Any unexpected behavior from the Joplin Web Clipper or any code that uses
shim.fetchBlob
is a potential bug.