Cannot disable caching #586

UnDesSix · 2025-01-29T14:14:25Z

Hi guys,

I am desperately trying to disable caching in order to run some tests. The first time I run the script it takes a while, but every subsequent run uses the cache.

Am I doing something wrong? I tried both the old and the new approach from the docs.

Also, I am curious to know where the cache files are stored.

Is this reproducible?

Yes

crawl4ai version

crawl4ai-0.4.247

Steps to Reproduce

Set-up

python3 -m venv venv
source venv/bin/activate
pip install crawl4ai

First time it scrapes, then it uses the cache

python3 main.py

Code snippets

import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig, BrowserConfig
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator


async def main():
    # Initialize the AsyncWebCrawler
    browser_config = BrowserConfig(
        browser_type="chromium", headless=True, text_mode=True
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        # List of URLs to crawl
        urls = [
            "https://example1.com",
            "https://example2.com",
            "https://example3.com",
            "https://example4.com",
            "https://example5.com",
        ]

        # Configure the crawler; the cache mode is set on the run config
        run_config = CrawlerRunConfig(
            cache_mode=CacheMode.DISABLED,
            markdown_generator=DefaultMarkdownGenerator(
                options={
                    "ignore_links": True,
                    "protect_links": False,
                }
            ),
        )

        # Run the crawling process for multiple URLs,
        # passing the run config so its options actually apply
        results = await crawler.arun_many(
            urls=urls,
            config=run_config,
        )

        # Process the results
        for result in results:
            if result.success:
                file_name = (
                    result.url.replace("https://", "")
                    .replace("http://", "")
                    .replace("www.", "")
                    .replace("/", "_")
                    .replace(".fr", "_")
                    + ".md"
                )

                # Save the extracted content to a file
                with open(f"data/md/test/{file_name}", "w") as file:
                    file.write(result.markdown)
            else:
                print(f"Failed to crawl: {result.url}")
                print(f"Error: {result.error_message}")
                print("---")


if __name__ == "__main__":
    import time

    start = time.time()
    asyncio.run(main())
    end = time.time()
    print(f"Time: {end - start} seconds")

OS

Ubuntu 24.04

Python version

3.12.3

Thanks 😄

Answered by UnDesSix

UnDesSix · 2025-01-29T15:04:45Z

I will answer my own question.

I updated crawl4ai and used version 0.4.3b.

Working properly now.
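
The question about where the cache files are stored was not answered in the thread. The sketch below is one way to look for them, under an assumption that is not confirmed anywhere in this discussion: that crawl4ai keeps its data in a `.crawl4ai` folder inside the home directory, or inside the directory named by the `CRAWL4_AI_BASE_DIRECTORY` environment variable when that is set. The helper names `guess_cache_dir` and `list_cache_files` are hypothetical and use only the standard library. If nothing shows up, the location is probably different in your version.

```python
import os
from pathlib import Path


def guess_cache_dir() -> Path:
    """Return the directory where crawl4ai is ASSUMED to keep its data.

    Assumption (not confirmed in this thread): the base directory is the
    CRAWL4_AI_BASE_DIRECTORY environment variable if set, otherwise the
    home directory, with a ".crawl4ai" folder inside it.
    """
    base = os.environ.get("CRAWL4_AI_BASE_DIRECTORY") or str(Path.home())
    return Path(base) / ".crawl4ai"


def list_cache_files(cache_dir: Path) -> list[Path]:
    """List every regular file under cache_dir (empty list if it is missing)."""
    if not cache_dir.is_dir():
        return []
    return sorted(p for p in cache_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    cache_dir = guess_cache_dir()
    print(f"Looking in: {cache_dir}")
    for path in list_cache_files(cache_dir):
        # Print the size next to each file to spot a growing cache database
        print(f"{path.stat().st_size:>12} bytes  {path}")
```

Running this once before and once after a crawl, and comparing the file sizes, should show whether anything is being written there.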
