Fix raw:// URL parsing logic #687

Open
wants to merge 1 commit into
base: 2025-feb-alpha-1

jl-martins

Summary

Fixes #686

List of files changed and why

  • crawl4ai/async_crawler_strategy.py - To fix the logic that converts raw URLs to raw HTML
  • tests/20241401/test_async_crawler_strategy.py - To test the fix

How Has This Been Tested?

By adding a parametrized test that calls AsyncPlaywrightCrawlerStrategy.crawl with both a "raw:" and a "raw://" URL and checks that the resulting HTML is correct in each case.
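For illustration only, here is a minimal sketch of the kind of prefix handling the fix is about, with a table-driven check in the spirit of the parametrized test. The helper name `strip_raw_prefix` and the sample HTML are hypothetical and do not appear in the PR; this is not the actual code in crawl4ai/async_crawler_strategy.py. The point it shows is that "raw://" has to be checked before "raw:", because every "raw://" URL also starts with "raw:".

```python
def strip_raw_prefix(url: str) -> str:
    """Hypothetical helper: return the HTML carried by a raw: or raw:// URL."""
    # Check the longer prefix first: "raw://..." also starts with "raw:",
    # so testing "raw:" first would leave the two slashes in the HTML.
    for prefix in ("raw://", "raw:"):
        if url.startswith(prefix):
            return url[len(prefix):]
    raise ValueError(f"not a raw URL: {url!r}")


# Sample inputs (assumed for illustration): both forms should yield the same HTML.
HTML = "<html><body><p>hello</p></body></html>"
CASES = [
    ("raw:" + HTML, HTML),
    ("raw://" + HTML, HTML),
]

for url, expected in CASES:
    assert strip_raw_prefix(url) == expected
```

In a pytest suite the `CASES` list would typically be passed to `pytest.mark.parametrize` instead of being looped over by hand.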

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@jl-martins
Author

jl-martins commented Feb 15, 2025

Some tests are failing locally, but they were already failing before the fix, for reasons unrelated to this PR.

@RipFacuu

good

@aravindkarnam aravindkarnam changed the base branch from main to 2025-feb-alpha-1 February 18, 2025 06:06

Successfully merging this pull request may close these issues.

[Bug]: Forward slashes of raw:// are not removed when converting raw URLs to HTML
2 participants