Cannot disable caching #586
UnDesSix asked this question in Forums - Q&A (answered by UnDesSix)
Hi guys, I am desperately trying to disable the cache in order to run some tests. The first time I run the script, it takes a while, but every subsequent run uses the cache. Am I doing something wrong? I tried both the old and the new approach from the docs. I am also curious to know where the cache files are stored.

Is this reproducible?
Yes

crawl4ai version
crawl4ai-0.4.247

Steps to Reproduce
Set-up:
python3 -m venv venv
source venv/bin/activate
pip install crawl4ai

The first run scrapes; every later run uses the cache:
python3 main.py

Code snippets
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig, BrowserConfig
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator


async def main():
    # Initialize the AsyncWebCrawler
    browser_config = BrowserConfig(
        browser_type="chromium", headless=True, text_mode=True
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        # List of URLs to crawl
        urls = [
            "https://example1.com",
            "https://example2.com",
            "https://example3.com",
            "https://example4.com",
            "https://example5.com",
        ]
        # Configure the crawler
        run_config = CrawlerRunConfig(
            markdown_generator=DefaultMarkdownGenerator(
                options={
                    "ignore_links": True,
                    "protect_links": False,
                }
            )
        )
        # Run the crawling process for multiple URLs
        results = await crawler.arun_many(
            cache_mode=CacheMode.DISABLED,
            urls=urls,
        )
        # Process the results
        for result in results:
            if result.success:
                file_name = (
                    result.url.replace("https://", "")
                    .replace("http://", "")
                    .replace("www.", "")
                    .replace("/", "_")
                    .replace(".fr", "_")
                    + ".md"
                )
                # Save the extracted content to a file
                with open(f"data/md/test/{file_name}", "w") as file:
                    file.write(result.markdown)
            else:
                print(f"Failed to crawl: {result.url}")
                print(f"Error: {result.error_message}")
                print("---")


if __name__ == "__main__":
    import time

    start = time.time()
    asyncio.run(main())
    end = time.time()
    print(f"Time: {end - start} seconds")

OS
Ubuntu 24.04

Python version
3.12.3

Error logs & Screenshots (if applicable)
Thanks 😄
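
A side note on the snippet above: `run_config` is built but never handed to `arun_many`, while `cache_mode` is passed as a bare keyword argument. Whether that contributed to the behaviour on 0.4.247 is not established in this thread. The sketch below shows the config-object style instead, assuming that `CrawlerRunConfig` accepts a `cache_mode` argument and that `arun_many` receives it through `config=`. `crawl_without_cache` is a hypothetical helper name, not part of crawl4ai.

```python
import asyncio


async def crawl_without_cache(urls):
    """Crawl every URL in `urls`, asking crawl4ai not to use its cache."""
    # Imported lazily so this module still loads where crawl4ai is absent.
    from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig

    # Assumption: the cache mode is set on the run config object.
    run_config = CrawlerRunConfig(cache_mode=CacheMode.DISABLED)

    async with AsyncWebCrawler() as crawler:
        # Building the config is not enough; it has to be passed in here.
        return await crawler.arun_many(urls=urls, config=run_config)
```

It could be driven with something like `asyncio.run(crawl_without_cache(["https://example1.com"]))`, after which each result's `success` flag can be checked as in the original loop.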
Answer by UnDesSix, Jan 29, 2025:
I will answer my own question: I upgraded crawl4ai to version 0.4.3b, and it is working properly now.
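
Since the fix here came down to the installed release, it can help to confirm which version the virtual environment actually picked up before re-running the timing test. A minimal sketch using only the standard library follows; `installed_version` is a hypothetical helper name introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version seen by the active interpreter, or None if not installed.
print("crawl4ai:", installed_version("crawl4ai"))
```

Running this inside the activated `venv` shows whether the upgrade took effect in the environment that `main.py` uses.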