Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clipper: Add capability of limiting downloads #9788

Merged
merged 17 commits into from
Mar 9, 2024
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .eslintignore
Original file line number Diff line number Diff line change
Expand Up @@ -688,6 +688,7 @@ packages/lib/database.js
packages/lib/debug/DebugService.js
packages/lib/determineBaseAppDirs.js
packages/lib/dom.js
packages/lib/downloadController.js
packages/lib/errorUtils.js
packages/lib/errors.js
packages/lib/eventManager.js
Expand Down Expand Up @@ -1010,6 +1011,7 @@ packages/lib/themes/solarizedLight.js
packages/lib/themes/type.js
packages/lib/time.js
packages/lib/types.js
packages/lib/utils/bytes.js
packages/lib/utils/credentialFiles.js
packages/lib/utils/joplinCloud.js
packages/lib/utils/processStartFlags.js
Expand Down
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -668,6 +668,7 @@ packages/lib/database.js
packages/lib/debug/DebugService.js
packages/lib/determineBaseAppDirs.js
packages/lib/dom.js
packages/lib/downloadController.js
packages/lib/errorUtils.js
packages/lib/errors.js
packages/lib/eventManager.js
Expand Down Expand Up @@ -990,6 +991,7 @@ packages/lib/themes/solarizedLight.js
packages/lib/themes/type.js
packages/lib/time.js
packages/lib/types.js
packages/lib/utils/bytes.js
packages/lib/utils/credentialFiles.js
packages/lib/utils/joplinCloud.js
packages/lib/utils/processStartFlags.js
Expand Down
95 changes: 95 additions & 0 deletions packages/lib/downloadController.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
import Logger from '@joplin/utils/Logger';
import JoplinError from './JoplinError';
import { ErrorCode } from './errors';
import { bytesToHuman } from './utils/bytes';

const logger = Logger.create('downloadController');

export interface DownloadController {
totalBytes: number;
imagesCount: number;
maxImagesCount: number;
imageCountExpected: number;
printStats(imagesCountExpected: number): void;
handleChunk(request: any): (chunk: any)=> void;
limitMessage(): string;
}

export class LimitedDownloadController implements DownloadController {
private totalBytes_ = 0;
// counts before the downloaded has finished, so at the end if the totalBytes > maxTotalBytesAllowed
// it means that imageCount will be higher than the total downloaded during the process
private imagesCount_ = 0;
// how many images links the content has
private imageCountExpected_ = 0;
private isLimitExceeded_ = false;

private maxTotalBytes = 0;
public readonly maxImagesCount: number;
private ownerId = '';

public constructor(ownerId: string, maxTotalBytes: number, maxImagesCount: number) {
this.ownerId = ownerId;
this.maxTotalBytes = maxTotalBytes;
this.maxImagesCount = maxImagesCount;
}

public set totalBytes(value: number) {
if (this.totalBytes_ >= this.maxTotalBytes) {
throw new JoplinError(`Total bytes stored (${this.totalBytes_}) has exceeded the amount established (${this.maxTotalBytes})`, ErrorCode.DownloadLimiter);
}
this.totalBytes_ = value;
}

public get totalBytes() {
return this.totalBytes_;
}

public set imagesCount(value: number) {
if (this.imagesCount_ > this.maxImagesCount) {
throw new JoplinError(`Total images to be stored (${this.imagesCount_}) has exceeded the amount established (${this.maxImagesCount})`, ErrorCode.DownloadLimiter);
}
this.imagesCount_ = value;
}

public get imagesCount() {
return this.imagesCount_;
}

public set imageCountExpected(value: number) {
this.imageCountExpected_ = value;
}

public get imageCountExpected() {
return this.imageCountExpected_;
}

public handleChunk(request: any) {
return (chunk: any) => {
try {
this.totalBytes += chunk.length;
} catch (error) {
request.destroy(error);
laurent22 marked this conversation as resolved.
Show resolved Hide resolved
}
};
}

public printStats() {
if (!this.isLimitExceeded_) return;

const owner = `Owner id: ${this.ownerId}`;
const totalBytes = `Total bytes stored: ${this.totalBytes}. Maximum: ${this.maxTotalBytes}`;
const totalImages = `Images initiated for download: ${this.imagesCount_}. Maximum: ${this.maxImagesCount}. Expected: ${this.imageCountExpected}`;
logger.info(`${owner} - ${totalBytes} - ${totalImages}`);
}

public limitMessage() {
if (this.imagesCount_ > this.maxImagesCount) {
return `The maximum image count of ${this.maxImagesCount} has been exceeded. Image count in your content: ${this.imageCountExpected}`;
}
if (this.totalBytes >= this.maxTotalBytes) {
return `The maximum content size ${bytesToHuman(this.maxTotalBytes)} has been exceeded. Content size: (${bytesToHuman(this.totalBytes)})`;
}
return '';
}
}
1 change: 1 addition & 0 deletions packages/lib/errors.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,5 @@ export enum ErrorCode {
NotFound = 'notFound',
UnsupportedMimeType = 'unsupportedMimeType',
MustUpgradeApp = 'mustUpgradeApp',
DownloadLimiter = 'downloadLimiter',
}
1 change: 0 additions & 1 deletion packages/lib/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,6 @@
"color": "3.2.1",
"compare-versions": "6.1.0",
"diff-match-patch": "1.0.5",
"es6-promise-pool": "2.5.0",
"fast-deep-equal": "3.1.3",
"fast-xml-parser": "3.21.1",
"follow-redirects": "1.15.5",
Expand Down
39 changes: 26 additions & 13 deletions packages/lib/services/rest/routes/notes.ts
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ const { MarkupToHtml } = require('@joplin/renderer');
const { ErrorNotFound } = require('../utils/errors');
import { fileUriToPath } from '@joplin/utils/url';
import { NoteEntity } from '../../database/types';
import { DownloadController } from '../../../downloadController';

const logger = Logger.create('routes/notes');

Expand Down Expand Up @@ -66,6 +67,7 @@ type RequestNote = {
type FetchOptions = {
timeout?: number;
maxRedirects?: number;
downloadController?: DownloadController;
};

async function requestNoteToNote(requestNote: RequestNote): Promise<NoteEntity> {
Expand Down Expand Up @@ -262,27 +264,32 @@ export async function downloadMediaFile(url: string, fetchOptions?: FetchOptions
}
}

async function downloadMediaFiles(urls: string[], fetchOptions?: FetchOptions, allowedProtocols?: string[]) {
const PromisePool = require('es6-promise-pool');

async function downloadMediaFiles(urls: string[], fetchOptions: FetchOptions, allowedProtocols?: string[]) {
const output: any = {};

const downloadController = fetchOptions?.downloadController ?? null;
laurent22 marked this conversation as resolved.
Show resolved Hide resolved

const downloadOne = async (url: string) => {
if (downloadController) downloadController.imagesCount += 1;
const mediaPath = await downloadMediaFile(url, fetchOptions, allowedProtocols);
if (mediaPath) output[url] = { path: mediaPath, originalUrl: url };
};

let urlIndex = 0;
const promiseProducer = () => {
if (urlIndex >= urls.length) return null;
const maximumImageDownloadsAllowed = downloadController ? downloadController.maxImagesCount : urls.length;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about setting it to Number.POSITIVE_INFINITY if downloadController is not defined? It seems more future proof since it will work even if somehow the number of urls change in the future (if something's appended to the array)

const urlsAllowedByController = urls.slice(0, maximumImageDownloadsAllowed);
logger.info(`Media files allowed to be downloaded: ${maximumImageDownloadsAllowed}`);

const url = urls[urlIndex++];
return downloadOne(url);
};
const promises = [];
for (const url of urlsAllowedByController) {
promises.push(downloadOne(url));
}

await Promise.all(promises);

const concurrency = 10;
const pool = new PromisePool(promiseProducer, concurrency);
await pool.start();
if (downloadController) {
downloadController.imageCountExpected = urls.length;
downloadController.printStats(urls.length);
}

return output;
}
Expand Down Expand Up @@ -459,7 +466,13 @@ export default async function(request: Request, id: string = null, link: string
logger.info('Images:', imageSizes);

const allowedProtocolsForDownloadMediaFiles = ['http:', 'https:', 'file:', 'data:'];
const extracted = await extractNoteFromHTML(requestNote, requestId, imageSizes, undefined, allowedProtocolsForDownloadMediaFiles);
const extracted = await extractNoteFromHTML(
requestNote,
requestId,
imageSizes,
undefined,
allowedProtocolsForDownloadMediaFiles,
);

let note = await Note.save(extracted.note, extracted.saveOptions);

Expand Down
19 changes: 18 additions & 1 deletion packages/lib/shim-init-node.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import * as fs from 'fs-extra';
import * as pdfJsNamespace from 'pdfjs-dist';
import { writeFile } from 'fs/promises';
import { ResourceEntity } from './services/database/types';
import { DownloadController } from './downloadController';
import { TextItem } from 'pdfjs-dist/types/src/display/api';
import replaceUnsupportedCharacters from './utils/replaceUnsupportedCharacters';

Expand All @@ -25,6 +26,15 @@ const dgram = require('dgram');

const proxySettings: any = {};

type FetchBlobOptions = {
path?: string;
method?: string;
maxRedirects?: number;
timeout?: number;
headers?: any;
downloadController?: DownloadController;
};

function fileExists(filePath: string) {
try {
return fs.statSync(filePath).isFile();
Expand Down Expand Up @@ -493,9 +503,10 @@ function shimInit(options: ShimInitOptions = null) {
}, options);
};

shim.fetchBlob = async function(url: any, options) {
shim.fetchBlob = async function(url: any, options: FetchBlobOptions) {
if (!options || !options.path) throw new Error('fetchBlob: target file path is missing');
if (!options.method) options.method = 'GET';
if (!options.downloadController) options.downloadController = null;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not necessary

// if (!('maxRetry' in options)) options.maxRetry = 5;

// 21 maxRedirects is the default amount from follow-redirects library
Expand All @@ -510,6 +521,7 @@ function shimInit(options: ShimInitOptions = null) {
const http = url.protocol.toLowerCase() === 'http:' ? require('follow-redirects').http : require('follow-redirects').https;
const headers = options.headers ? options.headers : {};
const filePath = options.path;
const downloadController = options.downloadController;

function makeResponse(response: any) {
return {
Expand Down Expand Up @@ -571,6 +583,11 @@ function shimInit(options: ShimInitOptions = null) {
});

const request = http.request(requestOptions, (response: any) => {

if (downloadController !== null) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for strict null checking here. What you want to check is if it's defined or not and for that just if (downloadController) is enough. There are places maybe where we need to make the difference between undefined and null or other falsy values, but not when we just need to know if something exists or not.

response.on('data', downloadController.handleChunk(request));
}

response.pipe(file);

const isGzipped = response.headers['content-encoding'] === 'gzip';
Expand Down
12 changes: 12 additions & 0 deletions packages/lib/utils/bytes.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
// eslint-disable-next-line import/prefer-default-export
export const bytesToHuman = (bytes: number) => {
const units = ['Bytes', 'KB', 'MB', 'GB'];
let unitIndex = 0;

while (bytes >= 1024 && unitIndex < units.length - 1) {
bytes /= 1024;
unitIndex++;
}

return `${bytes.toFixed(1)} ${units[unitIndex]}`;
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but I've suggested moving it to @joplin/utils so why is it in @joplin/lib now? /lib should be mostly for business logic, specific to Joplin. /utils is for generic utilities with no dependencies that can be used anywhere. This is so other packages don't need to import the whole lib, renderer, editor and more when all they need is some simple utilities.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, my bad, I'm fixing this.

8 changes: 0 additions & 8 deletions yarn.lock
Original file line number Diff line number Diff line change
Expand Up @@ -6859,7 +6859,6 @@ __metadata:
color: 3.2.1
compare-versions: 6.1.0
diff-match-patch: 1.0.5
es6-promise-pool: 2.5.0
fast-deep-equal: 3.1.3
fast-xml-parser: 3.21.1
follow-redirects: 1.15.5
Expand Down Expand Up @@ -19991,13 +19990,6 @@ __metadata:
languageName: node
linkType: hard

"es6-promise-pool@npm:2.5.0":
version: 2.5.0
resolution: "es6-promise-pool@npm:2.5.0"
checksum: e472ec5959b022b28e678446674c78dd2d198dd50c537ef59916d32d2423fe4518c43f132d81f2e98249b8b8450c95f77b8d9aecc1fb15e8dcd224c5b98f0cce
languageName: node
linkType: hard

"es6-promise@npm:^4.0.3, es6-promise@npm:^4.1.1":
version: 4.2.8
resolution: "es6-promise@npm:4.2.8"
Expand Down
Loading