Implement stealth mode #142

Open
route opened this issue Jan 26, 2021 · 20 comments
Labels
enhancement New feature or request

Comments

@route
Member

route commented Jan 26, 2021

https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth

@balt5r

balt5r commented Jan 31, 2021

Would be great if it could pass those tests

@alexanderadam
Contributor

alexanderadam commented Feb 20, 2021

undetected_chromedriver might also be a good reference.

It would probably also make sense to add Intoli's checks to the specs.
They are also on GitHub (here and here).

@route route added the enhancement New feature or request label Feb 26, 2021
@brettallred

@route Any thoughts on adding this in? We've been using ferrum for a while now and started getting blocked on one of the sites.

I'm happy to take a cut at implementing this if you want to outline some of your thoughts on how you envision doing it. I studied the source code for about an hour tonight just thinking through some options here.

@alexanderadam
Contributor

alexanderadam commented Apr 29, 2021

Hi @brettallred,

I'm happy to take a cut at implementing

This would be so wonderful! 🙏

I'm not a maintainer here but I would like to see Stealth mode as an integrated extension.

My idea would be:

Specs

  • the specs could get a new directory for extensions (i.e. spec/extensions/stealth)
  • For the specs themselves, it would probably make sense to add a static page (see spec/support/views for some examples) that shows various states (it could be visually simpler than this, since we would only need to check the text output in the specs). There are nice reference pages out there with checks that could be integrated into this page:

Implementation of the extension itself

there are good references out there:

Outside of the specs, you could also check the reCAPTCHA score to see how well the scripts work.

Summary of a possible solution — TL;DR;

  1. Create an HTML file in spec/support/views containing the checks mentioned above, so that a reliable check is available within the specs. It could also include a simple HTML table with a summary (i.e. you are [not] a bot)
  2. Write the spec so that it intentionally fails at first (the extension is not used or ready yet, so a failure makes it obvious that the spec works), i.e. expect(browser.body).to include("you are not a bot")
  3. Write a rake task (i.e. rake update:stealth_extension) to fetch/build the minimized/compiled puppeteer-extra-plugin-stealth extension and put it in a nice extensions directory within the ferrum repository
  4. Hopefully the spec will be green now if the extension was properly loaded (remember to add Ferrum::Browser.new(extensions: %w(path/to/stealth/ext.js)) or even a shortcut like stealth_mode: true to that) 😉
  5. optional: document how to integrate Privacy Pass

Again, this is just an idea and I'm not the maintainer here. So please take it with a grain of salt.
But I think this could work in a very maintainable manner.

PS: Updating the stealth extension could even be a GitHub action later on.
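
For readers following along, step 4 of the summary above can be sketched roughly as follows. This is only an illustration under stated assumptions: the file location extensions/stealth.min.js and the helper names stealth_browser_options and new_stealth_browser are hypothetical, and only the extensions: option itself comes from the comment above.

```ruby
# Sketch only: the file location and both helper names are hypothetical.
STEALTH_SCRIPT = File.join("extensions", "stealth.min.js")

# Build the options hash for Ferrum::Browser.new. `extensions:` takes a list
# of JavaScript files that Ferrum preloads into the browser.
def stealth_browser_options(script_path = STEALTH_SCRIPT, **overrides)
  { extensions: [script_path] }.merge(overrides)
end

# Launch a browser with the stealth script preloaded. This needs the ferrum
# gem and a local Chrome/Chromium, so it is defined here but not called.
def new_stealth_browser(script_path = STEALTH_SCRIPT, **overrides)
  require "ferrum"
  Ferrum::Browser.new(**stealth_browser_options(script_path, **overrides))
end

# Possible usage (not executed here):
#   browser = new_stealth_browser(headless: true)
#   browser.go_to("https://bot.sannysoft.com")
#   browser.quit
```

Keeping the option building separate from the browser launch means a spec can check the options without starting Chrome.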

@ttilberg
Contributor

ttilberg commented Oct 1, 2021

I just wanted to pass a small note that the move @alexanderadam proposed is absolutely feasible. Absurdly so. I've always been a bit intimidated wrangling the js/extension side of things so I kind of brushed that last comment off a bit, assuming additional wiring would need to happen. Tonight I stumbled back into it and noted in particular extract-stealth-evasions, and thought I'd just see where I could get with it. Woah.

First off, thank you @alexanderadam for your detailed note. I saw it this spring, but like I said... I didn't understand its proposed simplicity. Second, I wanted to report these findings just in case they inspire someone else.

@sebthemonster

sebthemonster commented Oct 27, 2022

According to these webpages:

The tests at bot.sannysoft.com and www.nowsecure.nl pass with this browser configuration:

browser = Ferrum::Browser.new(browser_path: BROWSER_PATH, headless: false, browser_options: { "disable-blink-features": "AutomationControlled" })

I haven't yet found a way to pass them in headless mode.
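
A quick local check that does not depend on third-party test pages is to read navigator.webdriver, the property this Blink feature controls. The sketch below is only an illustration: the helper name webdriver_flag_hidden? is hypothetical, and it assumes an object that responds to #evaluate the way Ferrum::Browser does.

```ruby
# Hypothetical helper: true when the page does not report automation.
# `browser` only needs to respond to #evaluate(expression), as
# Ferrum::Browser does.
def webdriver_flag_hidden?(browser)
  # navigator.webdriver is true in an automated browser; false or
  # undefined (nil in Ruby) counts as hidden here.
  browser.evaluate("navigator.webdriver") != true
end

# Possible usage with a real browser (not executed here):
#   browser = Ferrum::Browser.new(
#     headless: false,
#     browser_options: { "disable-blink-features": "AutomationControlled" }
#   )
#   puts webdriver_flag_hidden?(browser)
```

This covers a single signal only; the test pages mentioned above check many more.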

@sandstrom
Contributor

Isn't this a problem better solved at the Chromium level?

I read this article recently, seems like there are improvements in an upcoming version of Chrome:

https://antoinevastel.com/bot%20detection/2023/02/19/new-headless-chrome.html

I'd close this issue, out of scope for Ferrum.

@route
Member Author

route commented Feb 28, 2023

It is, but Ferrum itself can still provide some guidance and scripts to make automation harder to detect from the start.

@wflanagan

Is there documentation on how to get the new headless mode in Ferrum?

@akavitaliy

Have you found a way to pass them in headless mode?

@maeve

maeve commented Aug 14, 2023

You can enable the new headless mode in chromium by modifying the browser options:

Ferrum::Browser.new(browser_options: { "headless": "new" })

@route
Member Author

route commented Aug 17, 2023

You can enable the new headless mode in chromium by modifying the browser options:

Ferrum::Browser.new(browser_options: { "headless": "new" })

It doesn't work, because there's a lot more work to be done: #379

@j-mcnally

Sick, this works great, got all tests to pass, and CF unblocked, thanks again.

@harrison-broadbent

For anyone interested, I wrote up the tips from this thread + many others into an article:

Stealthly Browsing and Scraping with Ferrum

It covers the tips from @ttilberg on integrating stealth.min.js with Ferrum, plus rotating user agents, proxies, saving bandwidth with browser.network.intercept and more.
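
As a rough illustration of the bandwidth-saving idea, request interception in Ferrum can abort requests for heavy assets. This is a sketch under assumptions, not the article's code: the extension list in BLOCKED_URL and the helper names skip_request? and block_heavy_assets are hypothetical.

```ruby
# Assumed list of asset types to skip; adjust for the target site.
BLOCKED_URL = /\.(png|jpe?g|gif|webp|svg|woff2?|ttf|mp4)(\?|\z)/i

# Pure decision function, kept separate so it can be tested without a browser.
def skip_request?(url)
  BLOCKED_URL.match?(url)
end

# Wire the decision into Ferrum's request interception. `browser` is expected
# to be a Ferrum::Browser; this method is defined but not called here.
def block_heavy_assets(browser)
  browser.network.intercept
  browser.on(:request) do |request|
    skip_request?(request.url) ? request.abort : request.continue
  end
end

# Possible usage (not executed here):
#   browser = Ferrum::Browser.new
#   block_heavy_assets(browser)
#   browser.go_to("https://example.com")
```

Every intercepted request must be either aborted or continued, otherwise the page load stalls.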

@joshfester

@harrison-broadbent Excellent article, I wasn't aware of ferrum before reading it. It looks like the puppeteer-extra-plugin-stealth has not been updated in over two years. Is that a potential issue, or does it simply not require updates often?

@ttilberg
Contributor

ttilberg commented Jan 23, 2025

@harrison-broadbent Is that a potential issue, or does it simply not require updates often?

It’s the case that the maintainer took his valuable work private. It’s a lot of work to maintain this tool, and it’s profitable to leverage in consulting engagements, so I can’t blame him. It also gives the cat a big advantage in the age-old cat-and-mouse game when the best evasions are public.

That said, the work that is public is still quite valuable and also helps me get around certain challenges more frequently.

@joshfester

Thanks @ttilberg that is great to know 🙏

@akavitaliy

For anyone interested, I wrote up the tips from this thread + many others into an article:

Stealthly Browsing and Scraping with Ferrum

It covers the tips from @ttilberg on integrating stealth.min.js with Ferrum, plus rotating user agents, proxies, saving bandwidth with browser.network.intercept and more.

The article is very interesting, thank you!

@Nakilon
Contributor

Nakilon commented Jan 23, 2025

I hate botting.

@harrison-broadbent

@joshfester @akavitaliy thank you both!

And I agree with what @ttilberg said: despite being out of date (compared to the state of the art), I believe the evasions are still valuable, particularly when scraping average or lightly protected websites.
