Interface: CrawleeOneIO<TEnv, TReport, TMetadata>

Interface for storing and retrieving:

- Scraped data
- Requests (URLs) to scrape
- Cache data

This interface is based on Crawlee/Apify, but defined separately to allow drop-in replacement with other integrations.

Type parameters

Name	Type
`TEnv`	extends `object` = `object`
`TReport`	extends `object` = `object`
`TMetadata`	extends `object` = `object`

Properties

createDefaultProxyConfiguration

• createDefaultProxyConfiguration: <T>(input: undefined | T | Readonly<T>) => MaybePromise<undefined | ProxyConfiguration>

Type declaration

▸ <T>(input): MaybePromise<undefined | ProxyConfiguration>

Creates a proxy configuration and returns an already-initialized ProxyConfiguration instance (or `undefined`), either directly or as a promise.

Configures the connection to a proxy server with the provided options. Proxy servers are used to prevent target websites from blocking your crawlers based on IP address rate limits or blacklists. Setting a proxy configuration in your crawlers automatically configures them to use the selected proxies for all connections.

For more details and code examples, see ProxyConfiguration.

Type parameters

Name	Type
`T`	extends `object`

Parameters

Name	Type
`input`	`undefined` \| `T` \| `Readonly`<`T`>

Returns

MaybePromise<undefined | ProxyConfiguration>

Defined in

src/lib/integrations/types.ts:128
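A minimal sketch of how an integration might implement `createDefaultProxyConfiguration`, illustrating the `undefined | ProxyConfiguration` return type. It assumes a hypothetical input with a `proxyUrls` array and uses a hypothetical `ProxyConfigurationStub` class in place of Crawlee's real `ProxyConfiguration`; `MaybePromise` is redefined locally as `T | Promise<T>`, which is an assumption about its definition.

```typescript
// Assumed definition: a value that may or may not be wrapped in a promise.
type MaybePromise<T> = T | Promise<T>;

// Hypothetical input shape carrying a list of proxy URLs.
interface ProxyInput {
  proxyUrls?: string[];
}

// Hypothetical stand-in for Crawlee's ProxyConfiguration (not its real API):
// it hands out the configured proxy URLs in round-robin order.
class ProxyConfigurationStub {
  readonly proxyUrls: string[];
  private index = 0;

  constructor(proxyUrls: string[]) {
    this.proxyUrls = proxyUrls;
  }

  nextUrl(): string {
    const url = this.proxyUrls[this.index % this.proxyUrls.length];
    this.index += 1;
    return url;
  }
}

// Returns undefined when no proxies are configured, so the crawler connects directly.
const createDefaultProxyConfiguration = <T extends ProxyInput>(
  input: undefined | T | Readonly<T>,
): MaybePromise<undefined | ProxyConfigurationStub> => {
  const urls: readonly string[] = input?.proxyUrls ?? [];
  if (urls.length === 0) return undefined;
  // Copy the URLs so later changes to the input do not affect this configuration.
  return new ProxyConfigurationStub([...urls]);
};
```

Because the return type is a `MaybePromise`, callers should `await` the result even when a particular implementation, like this one, returns synchronously.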

generateEntryMetadata

• generateEntryMetadata: <Ctx>(ctx: Ctx) => MaybePromise<TMetadata>

Type declaration

▸ <Ctx>(ctx): MaybePromise<TMetadata>

Generates an object with information about the current context, which will be appended to the scraped entry.

Type parameters

Name	Type
`Ctx`	extends `CrawlingContext`<`unknown`, `Dictionary`, `Ctx`>

Parameters

Name	Type
`ctx`	`Ctx`

Returns

MaybePromise<TMetadata>

Defined in

src/lib/integrations/types.ts:138
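A minimal sketch of a `generateEntryMetadata` implementation. The `MinimalContext` type is a simplified, hypothetical stand-in for Crawlee's `CrawlingContext` (only `request.url` and `request.loadedUrl` are modelled), and the metadata fields and injectable clock are illustrative assumptions, not what any particular integration records.

```typescript
type MaybePromise<T> = T | Promise<T>;

// Simplified, hypothetical stand-in for Crawlee's CrawlingContext.
interface MinimalContext {
  request: { url: string; loadedUrl?: string };
}

// Hypothetical metadata shape (plays the role of TMetadata).
interface EntryMetadata {
  requestUrl: string;
  loadedUrl: string | null;
  dateHandled: string;
}

// The clock is injectable so that the output is deterministic in tests.
const makeGenerateEntryMetadata =
  (now: () => Date = () => new Date()) =>
  <Ctx extends MinimalContext>(ctx: Ctx): MaybePromise<EntryMetadata> => ({
    requestUrl: ctx.request.url,
    // loadedUrl can differ from url, e.g. after a redirect; null if it is not set.
    loadedUrl: ctx.request.loadedUrl ?? null,
    dateHandled: now().toISOString(),
  });

// Usage: attach the metadata to a scraped entry before it is stored.
const generateEntryMetadata = makeGenerateEntryMetadata(() => new Date("2024-01-01T00:00:00Z"));
const entry = {
  title: "Example Domain",
  // This sketch returns synchronously, so the cast is safe here; callers of the
  // interface itself should await the result.
  metadata: generateEntryMetadata({
    request: { url: "https://example.com", loadedUrl: "https://example.com/" },
  }) as EntryMetadata,
};
```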

generateErrorReport

• generateErrorReport: (input: CrawleeOneErrorHandlerInput, options: PickRequired<CrawleeOneErrorHandlerOptions<CrawleeOneIO<TEnv, TReport, object>>, "io">) => MaybePromise<TReport>

Type declaration

▸ (input, options): MaybePromise<TReport>

Generates an object with information about the current context, which will be sent to the error dataset.

Parameters

Name	Type
`input`	`CrawleeOneErrorHandlerInput`
`options`	`PickRequired`<`CrawleeOneErrorHandlerOptions`<`CrawleeOneIO`<`TEnv`, `TReport`, `object`>>, `"io"`>

Returns

MaybePromise<TReport>

Defined in

src/lib/integrations/types.ts:133
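A minimal sketch of `generateErrorReport`. The fields of `CrawleeOneErrorHandlerInput` are not documented on this page, so the `ErrorHandlerInput` type below (an `error` plus an optional `url`) and the report fields are hypothetical. The second parameter only mirrors the required `io` property from the real signature and is not used.

```typescript
type MaybePromise<T> = T | Promise<T>;

// Hypothetical stand-in for CrawleeOneErrorHandlerInput.
interface ErrorHandlerInput {
  error: Error;
  url?: string | null;
}

// Hypothetical report shape (plays the role of TReport).
interface ErrorReport {
  errorName: string;
  errorMessage: string;
  errorStack: string | null;
  url: string | null;
  reportedAt: string;
}

// `_options` stands in for PickRequired<CrawleeOneErrorHandlerOptions<...>, "io">.
const generateErrorReport = (
  input: ErrorHandlerInput,
  _options: { io: unknown },
): MaybePromise<ErrorReport> => ({
  errorName: input.error.name,
  errorMessage: input.error.message,
  errorStack: input.error.stack ?? null,
  url: input.url ?? null,
  reportedAt: new Date().toISOString(),
});
```

The returned object is what would be written to the error dataset, so it should contain only JSON-friendly values such as strings and `null`.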

getInput

• getInput: <Input>() => Promise<null | Input>

Type declaration

▸ <Input>(): Promise<null | Input>

Returns a promise resolving to the crawler input object, or to `null` if there is no input. For example, in Apify, it retrieves the actor input value from the default KeyValueStore associated with the current actor run.

Type parameters

Name	Type
`Input`	extends `object`

Returns

Promise<null | Input>

Defined in

src/lib/integrations/types.ts:57
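A minimal sketch of `getInput` for a non-Apify integration, assuming (hypothetically) that the input arrives as a JSON string in an environment-like record under the key `CRAWLER_INPUT`. It resolves to `null` when no input is present, matching the `Promise<null | Input>` return type.

```typescript
// Parses raw JSON input; treats missing or blank input as "no input" (null).
const parseInput = <Input extends object>(raw: string | undefined): Input | null => {
  if (raw === undefined || raw.trim() === "") return null;
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new TypeError("Crawler input must be a JSON object");
  }
  return parsed as Input;
};

// Hypothetical: reads the input from an env-like record under CRAWLER_INPUT.
const makeGetInput =
  (env: Record<string, string | undefined>) =>
  async <Input extends object>(): Promise<null | Input> =>
    parseInput<Input>(env["CRAWLER_INPUT"]);
```

In a Node.js integration, `makeGetInput(process.env)` would produce a `getInput` function with the documented signature.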

isTelemetryEnabled

• isTelemetryEnabled: () => MaybePromise<boolean>

Type declaration

▸ (): MaybePromise<boolean>

Returns

MaybePromise<boolean>

Defined in

src/lib/integrations/types.ts:131

openDataset

• openDataset: (id?: null | string) => MaybePromise<CrawleeOneDataset<object>>

Type declaration

▸ (id?): MaybePromise<CrawleeOneDataset<object>>

Opens a dataset and returns an instance of CrawleeOneDataset, either directly or as a promise.

Datasets are used to store structured data where each object stored has the same attributes, such as online store products or real estate offers. The actual data is stored either on the local filesystem or in the cloud.

Parameters

Name	Type
`id?`	`null` \| `string`

Returns

MaybePromise<CrawleeOneDataset<object>>

Defined in

src/lib/integrations/types.ts:35
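In Crawlee, opening a dataset without an ID (or with `null`) returns the default dataset of the run. A minimal in-memory sketch of an `openDataset` that mirrors this behaviour follows; the `InMemoryDataset` class, its `pushData`/`getItems` methods and the `"default"` key are assumptions, not the documented `CrawleeOneDataset` API.

```typescript
type MaybePromise<T> = T | Promise<T>;

// Hypothetical in-memory dataset; method names are assumptions.
class InMemoryDataset {
  private readonly items: object[] = [];

  // Accepts a single item or a batch of items.
  pushData(data: object | object[]): void {
    if (Array.isArray(data)) this.items.push(...data);
    else this.items.push(data);
  }

  // Returns a copy so callers cannot mutate the stored items array.
  getItems(): object[] {
    return [...this.items];
  }
}

const datasets = new Map<string, InMemoryDataset>();

// null or undefined selects the default dataset; each ID maps to one shared instance.
const openDataset = (id?: null | string): MaybePromise<InMemoryDataset> => {
  const key = id ?? "default";
  let dataset = datasets.get(key);
  if (!dataset) {
    dataset = new InMemoryDataset();
    datasets.set(key, dataset);
  }
  return dataset;
};
```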

openKeyValueStore

• openKeyValueStore: (id?: null | string) => MaybePromise<CrawleeOneKeyValueStore>

Type declaration

▸ (id?): MaybePromise<CrawleeOneKeyValueStore>

Opens a key-value store and returns an instance of CrawleeOneKeyValueStore, either directly or as a promise.

Key-value stores are used to store records or files, along with their MIME content type. The records are stored and retrieved using a unique key. The actual data is stored either on a local filesystem or in the cloud.

Parameters

Name	Type
`id?`	`null` \| `string`

Returns

MaybePromise<CrawleeOneKeyValueStore>

Defined in

src/lib/integrations/types.ts:52
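Each key-value record is stored under a unique key together with its MIME content type. A minimal in-memory sketch of that model follows; the `setValue`/`getRecord` method names and the `application/json` default content type are assumptions loosely modelled on Crawlee's `KeyValueStore`, not the documented `CrawleeOneKeyValueStore` API.

```typescript
type MaybePromise<T> = T | Promise<T>;

// Hypothetical record shape: a value stored together with its MIME type.
interface KvRecord {
  value: unknown;
  contentType: string;
}

// Hypothetical in-memory key-value store.
class InMemoryKeyValueStore {
  private readonly records = new Map<string, KvRecord>();

  setValue(key: string, value: unknown, options: { contentType?: string } = {}): void {
    // Assumption: JSON is the default content type when none is given.
    this.records.set(key, { value, contentType: options.contentType ?? "application/json" });
  }

  getRecord(key: string): KvRecord | null {
    return this.records.get(key) ?? null;
  }
}

const stores = new Map<string, InMemoryKeyValueStore>();

// null or undefined selects the default store; the same ID always returns the same instance.
const openKeyValueStore = (id?: null | string): MaybePromise<InMemoryKeyValueStore> => {
  const key = id ?? "default";
  const existing = stores.get(key);
  if (existing) return existing;
  const store = new InMemoryKeyValueStore();
  stores.set(key, store);
  return store;
};
```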

openRequestQueue

• openRequestQueue: (id?: null | string) => MaybePromise<CrawleeOneRequestQueue>

Type declaration

▸ (id?): MaybePromise<CrawleeOneRequestQueue>

Opens a request queue and returns an instance of CrawleeOneRequestQueue, either directly or as a promise.

RequestQueue represents a queue of URLs to crawl, which is stored either on the local filesystem or in the cloud. The queue is used for deep crawling of websites, where you start with several URLs and then recursively follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders.

Parameters

Name	Type
`id?`	`null` \| `string`

Returns

MaybePromise<CrawleeOneRequestQueue>

Defined in

src/lib/integrations/types.ts:44
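Whether a queue crawls breadth-first or depth-first comes down to where newly discovered URLs are inserted. Crawlee's `RequestQueue.addRequest` controls this with a `forefront` option; the `InMemoryRequestQueue` below is a hypothetical sketch that borrows that option name to illustrate the idea on a toy link graph, and does not reproduce `CrawleeOneRequestQueue`.

```typescript
// Hypothetical in-memory request queue illustrating crawl order.
class InMemoryRequestQueue {
  private readonly pending: string[] = [];
  private readonly seen = new Set<string>();

  // forefront=false appends (FIFO, breadth-first); forefront=true prepends (LIFO, depth-first).
  addRequest(url: string, options: { forefront?: boolean } = {}): void {
    if (this.seen.has(url)) return; // Skip URLs that were already enqueued.
    this.seen.add(url);
    if (options.forefront) this.pending.unshift(url);
    else this.pending.push(url);
  }

  fetchNextRequest(): string | null {
    return this.pending.shift() ?? null;
  }
}

// Toy link graph: each page links to two child pages.
const links: Record<string, string[]> = {
  "/": ["/a", "/b"],
  "/a": ["/a1", "/a2"],
  "/b": ["/b1", "/b2"],
};

// Crawls the toy graph and returns the order in which pages were visited.
const crawlOrder = (forefront: boolean): string[] => {
  const queue = new InMemoryRequestQueue();
  const visited: string[] = [];
  queue.addRequest("/");
  for (let url = queue.fetchNextRequest(); url !== null; url = queue.fetchNextRequest()) {
    visited.push(url);
    const children = links[url] ?? [];
    // When prepending, add children in reverse so they are still visited in link order.
    for (const child of forefront ? [...children].reverse() : children) {
      queue.addRequest(child, { forefront });
    }
  }
  return visited;
};
```

`crawlOrder(false)` visits `/, /a, /b, /a1, /a2, /b1, /b2` (level by level), while `crawlOrder(true)` visits `/, /a, /a1, /a2, /b, /b1, /b2` (one branch at a time).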

runInContext

• runInContext: (userFunc: () => unknown, options?: ExitOptions) => Promise<void>

Type declaration

▸ (userFunc, options?): Promise<void>

Equivalent of Actor.main.

Runs the main user function that performs the job of the actor and terminates the process when the user function finishes.

The Actor.main() function is optional and is provided merely for your convenience. It is mainly useful when you're running your code as an actor on the Apify platform. However, if you want to use Apify SDK tools directly inside your existing projects, e.g. running in an Express server, on Google Cloud functions or AWS Lambda, it's better to avoid it since the function terminates the main process when it finishes!

The Actor.main() function performs the following actions:

- When running on the Apify platform (i.e. the APIFY_IS_AT_HOME environment variable is set), it sets up a connection to listen for platform events, for example to get a notification about an imminent migration to another server. See Actor.events for details.
- It checks that either the APIFY_TOKEN or the APIFY_LOCAL_STORAGE_DIR environment variable is defined. If not, the function sets APIFY_LOCAL_STORAGE_DIR to ./apify_storage inside the current working directory. This is to simplify running code examples.
- It invokes the user function passed as the userFunc parameter.
- If the user function returned a promise, it waits for the promise to resolve.
- If the user function throws an exception or some other error is encountered, it prints the error details to the console so that they are stored in the log.
- It exits the Node.js process with a zero exit code on success and a non-zero exit code on errors.

Parameters

Name	Type
`userFunc`	() => `unknown`
`options?`	`ExitOptions`

Returns

Promise<void>

Defined in

src/lib/integrations/types.ts:116
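The steps listed above can be sketched as follows. This is a simplified, hypothetical implementation: it skips the platform-event and storage-directory setup, and it takes injectable `exit` and `log` callbacks so the behaviour can be observed without terminating the process (a real integration would pass something like `(code) => process.exit(code)`). The `RunOptions` type is an assumption and is not Apify's `ExitOptions`.

```typescript
// Hypothetical options; not Apify's ExitOptions.
interface RunOptions {
  exit: (code: number) => void;
  log: (message: string) => void;
}

// Runs userFunc, waits for it if it returns a promise, reports errors, then exits.
const runInContext = async (userFunc: () => unknown, options: RunOptions): Promise<void> => {
  try {
    // `await` handles both plain return values and promises.
    await userFunc();
  } catch (error) {
    // Print the error details so that they end up in the run log.
    options.log(error instanceof Error ? (error.stack ?? error.message) : String(error));
    options.exit(1);
    return;
  }
  options.exit(0);
};
```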

triggerDownstreamCrawler

• triggerDownstreamCrawler: <TInput>(targetActorId: string, input?: TInput, options?: { build?: string }) => Promise<void>

Type declaration

▸ <TInput>(targetActorId, input?, options?): Promise<void>

Equivalent of Actor.metamorph.

This function should:

- Start a crawler/actor by its ID.
- Pass the given input to the downstream crawler.
- Make the same storage available to the downstream crawler. In other words, the downstream crawler should use the same "default" storage as the current crawler.

Type parameters

Name	Type
`TInput`	extends `object`

Parameters

Name	Type	Description
`targetActorId`	`string`	ID of the crawler/actor that should be triggered.
`input?`	`TInput`	Input for the crawler/actor. Must be JSON-serializable (it will be stringified to JSON).
`options?`	`Object`	-
`options.build?`	`string`	Tag or number of the target build to metamorph into (e.g. `beta` or `1.2.345`). If not provided, the run uses build tag or number from the default actor run configuration (typically `latest`).

Returns

Promise<void>

Defined in

src/lib/integrations/types.ts:74
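A hypothetical sketch of the three responsibilities listed above, for an integration that starts downstream crawlers through a launcher function. The `launch` callback, the `DownstreamRun` shape and the storage IDs are illustrative assumptions; the sketch only shows that the input is serialized to JSON and that the current default storage IDs are passed on unchanged.

```typescript
// Hypothetical description of a downstream run.
interface DownstreamRun {
  actorId: string;
  inputJson: string;
  build?: string | undefined;
  // The downstream crawler reuses the current run's default storages.
  defaultStorage: { datasetId: string; keyValueStoreId: string; requestQueueId: string };
}

type Launcher = (run: DownstreamRun) => Promise<void>;

const makeTriggerDownstreamCrawler =
  (launch: Launcher, currentStorage: DownstreamRun["defaultStorage"]) =>
  async <TInput extends object>(
    targetActorId: string,
    input?: TInput,
    options?: { build?: string },
  ): Promise<void> => {
    // JSON.stringify throws (and the promise rejects) for non-serializable
    // input such as circular references or BigInt values.
    const inputJson = JSON.stringify(input ?? {});
    await launch({
      actorId: targetActorId,
      inputJson,
      build: options?.build,
      defaultStorage: { ...currentStorage },
    });
  };
```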
