Bypass AntiSelenium Protect

Tr0jan_Horse

When scraping data, you sometimes need to access a site through the Selenium driver, because scripts running on the page add data to the markup that plain HTTP requests cannot retrieve. Things can get worse, though. The page may sit behind a CDN such as Cloudflare with browser checking enabled. In that case the regular Selenium driver cannot reach the page, because the check detects that automated testing software is in use. Even so, there is a way out. Let's see how to bypass this check by disabling certain options in the browser, and also look at a modified version of the Chrome driver in which these options are already disabled out of the box. But first things first.
1746390060706.png
First, to confirm that a Chrome browser driven by Selenium really cannot reach the site, let's write a simple script that loads the driver and opens a protected page.

What do you need?

This code requires selenium to be installed. To install it, type the following command in the terminal:

Bash:
pip install selenium

Loading a site with automation options enabled

Once the library is installed, import the required modules into the script. We need the os module to build the path to the driver, time to set a short pause before closing, and platform to determine the operating system so that the matching driver binary is loaded.

From the selenium library we import webdriver, and from selenium.webdriver.chrome.service we import Service, which passes the required parameters to the driver. In this code it receives the path to the webdriver and the log_path parameter, which sets where the logs are saved. Here the logs are sent to os.devnull, that is, discarded.

Python:
import os
import time
from platform import system

from selenium import webdriver
from selenium.webdriver.chrome.service import Service


exec_path = os.path.join(os.getcwd(), 'driver', 'chromedriver.exe') if system() == "Windows" else \
    os.path.join(os.getcwd(), 'driver', 'chromedriver')

driver = webdriver.Chrome(service=Service(log_path=os.devnull, executable_path=exec_path))
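
The path selection above can also be wrapped in a small helper that fails early with a readable message when the binary is missing. This is only a sketch: the function names `chromedriver_path` and `require_driver` are made up for illustration and are not part of selenium, and the `driver` subfolder is the same layout the script above assumes.

```python
import os
from platform import system


def chromedriver_path(base_dir=None, os_name=None):
    """Return the expected ChromeDriver path inside <base_dir>/driver."""
    base_dir = base_dir or os.getcwd()
    os_name = os_name or system()
    # Windows binaries carry the .exe suffix, Linux and macOS ones do not.
    binary = "chromedriver.exe" if os_name == "Windows" else "chromedriver"
    return os.path.join(base_dir, "driver", binary)


def require_driver(path):
    """Return path unchanged, or raise a clear error if the file is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"ChromeDriver not found at {path}")
    return path
```

With this in place, `exec_path = require_driver(chromedriver_path())` could replace the conditional expression, and a missing binary is reported before the browser is even started.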
Now let's open the desired page, wait 10 seconds and close the browser.
Python:
driver.get('https://nowsecure.nl')
time.sleep(10)
driver.close()
driver.quit()
The page the browser opens is protected by Cloudflare, and we were unable to access the site. Instead, we were asked to confirm that we are human. In other words, the browser failed the check and was identified as automated software.
1746390215367.png
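
A side note on closing the browser: `driver.quit()` ends the whole session, so calling `driver.close()` right before it is not strictly needed, and if `driver.get()` raises, neither call runs. A minimal sketch of a context manager that always quits is shown below. The name `managed` is hypothetical, and it works with any object that has a `quit()` method.

```python
from contextlib import contextmanager


@contextmanager
def managed(driver):
    """Yield the driver and always call quit() afterwards, even on errors."""
    try:
        yield driver
    finally:
        driver.quit()


# Intended usage (not executed here, it needs a real browser):
#
# with managed(webdriver.Chrome(service=Service(executable_path=exec_path))) as driver:
#     driver.get('https://nowsecure.nl')
#     time.sleep(10)
```

This keeps stray chromedriver processes from piling up when a script crashes halfway through.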

Python:
import os
import time
from platform import system

from selenium import webdriver
from selenium.webdriver.chrome.service import Service


exec_path = os.path.join(os.getcwd(), 'driver', 'chromedriver.exe') if system() == "Windows" else \
    os.path.join(os.getcwd(), 'driver', 'chromedriver')

driver = webdriver.Chrome(service=Service(log_path=os.devnull, executable_path=exec_path))

driver.get('https://nowsecure.nl')
time.sleep(10)
driver.close()
driver.quit()

Using selenium-stealth to try to bypass protection

This project has helped in many cases in the past, but here things are not so rosy. Let's try to access the site with its help.

What do you need?

Install selenium together with the selenium-stealth add-on, which is designed to hide traces of automation. To install them, run this command in the terminal:

Bash:
pip install selenium selenium-stealth

Now we repeat the steps we performed with the regular driver, with one small difference. Here we also import the installed selenium-stealth package, as well as Options from selenium.webdriver.chrome.options, so that we can pass options such as headless mode to the browser. In our case, we need to pass the options that disable the automation flags. But let's take it in order. Import the libraries into the script.

Python:
import os
import time
from platform import system

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium_stealth import stealth

We create an instance of the Options class and pass the necessary parameters to it.


Python:
options = Options()
# options.add_argument("--headless")
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)

After this, we create an object of the browser class, where we pass the path to the driver and the options we set.

Python:
exec_path = os.path.join(os.getcwd(), 'driver', 'chromedriver.exe') if system() == "Windows" else \
    os.path.join(os.getcwd(), 'driver', 'chromedriver')

driver = webdriver.Chrome(options=options, service=Service(log_path=os.devnull, executable_path=exec_path))

Now we call the stealth function from the selenium_stealth module, which sets the user agent, the display languages, the vendor and other parameters.

Python:
stealth(driver=driver,
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/83.0.4103.53 Safari/537.36',
        languages=["ru-RU", "ru"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        run_on_insecure_origins=True,
        )
Then we try to access the page behind Cloudflare again.
Python:
driver.get('https://nowsecure.nl')
time.sleep(20)
driver.close()
driver.quit()
And again, failure. Unfortunately, the stealth module did not cope with this task, as the screenshot below shows. The browser is still detected as being under the control of automated software.
1746390480575.png
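
To see what a page script can actually observe after such patching, you can read a few values back from the browser. The sketch below is a diagnostic only. It assumes nothing about which signals Cloudflare really uses (that is not documented here), and the function name `automation_signals` is made up for this example. `navigator.webdriver` is the flag that the WebDriver specification defines for automated sessions.

```python
# JavaScript expressions whose values are visible to any script on the page.
SIGNALS = ("navigator.webdriver", "navigator.languages", "navigator.userAgent")


def automation_signals(driver):
    """Return a dict mapping each JavaScript expression to its current value."""
    return {name: driver.execute_script(f"return {name}") for name in SIGNALS}
```

Calling `automation_signals(driver)` before and after `stealth(...)` shows which of these values the add-on changed in your setup.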

Python:
import os
import time
from platform import system

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium_stealth import stealth

options = Options()
# options.add_argument("--headless")
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)

exec_path = os.path.join(os.getcwd(), 'driver', 'chromedriver.exe') if system() == "Windows" else \
    os.path.join(os.getcwd(), 'driver', 'chromedriver')

driver = webdriver.Chrome(options=options, service=Service(log_path=os.devnull, executable_path=exec_path))

stealth(driver=driver,
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/83.0.4103.53 Safari/537.36',
        languages=["ru-RU", "ru"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        run_on_insecure_origins=True,
        )

driver.get('https://nowsecure.nl')
time.sleep(20)
driver.close()
driver.quit()

Disabling options by running a script in the driver

This method was presented on the Python Today channel, for which we thank them very much. To get past the check for automated software, you need to remove some properties that exist in a browser launched by Selenium but are absent from a regular browser.

These properties are quite easy to view. Launch the browser through the webdriver and press "F12" to open the developer tools. Then go to the console and type: window.cdc. The autocomplete will show the properties that the protection against automated software reads.

JavaScript:
window.cdc_adoQpoasnfa76pfcZLmcfl_Array
window.cdc_adoQpoasnfa76pfcZLmcfl_Promise
window.cdc_adoQpoasnfa76pfcZLmcfl_Symbol
1746390633234.png
For comparison, if you launch a regular browser and look for the same properties there, nothing is found.
1746390698045.png
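
The same check can be done from Python instead of the console. This is a rough sketch: `find_cdc_keys` is a hypothetical helper, and whether it returns anything depends on the ChromeDriver version, since the names of these properties may differ between releases.

```python
# Collect every own property of window whose name contains "cdc_".
FIND_CDC_JS = (
    "return Object.getOwnPropertyNames(window)"
    ".filter(function (k) { return k.indexOf('cdc_') !== -1; });"
)


def find_cdc_keys(driver):
    """Return a sorted list of window property names that contain 'cdc_'."""
    keys = driver.execute_script(FIND_CDC_JS)
    return sorted(keys or [])
```

An empty list from `find_cdc_keys(driver)` means the properties are not visible in the current page, either because they were removed or because this driver build names them differently.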

Now it's time to move on to practice.

What do you need?

This code needs no third-party libraries apart from selenium. To install it, run in the terminal:

Bash:
pip install selenium

Now import the necessary libraries into the script.
Python:
import os
import time
from platform import system

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

As you can see, the Options class is imported here as well, which means we will be passing parameters to the browser. In this case, we pass a parameter that disables the AutomationControlled feature. In addition, just in case, the channel provides two more options, commented out here, that can be enabled if necessary.

Python:
options = Options()
# options.add_argument("--headless")
# options.add_experimental_option("excludeSwitches", ["enable-automation"])
# options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--disable-blink-features=AutomationControlled")

We specify the path to the driver and pass the options and path to the browser object.

Python:
exec_path = os.path.join(os.getcwd(), 'driver', 'chromedriver.exe') if system() == "Windows" else \
    os.path.join(os.getcwd(), 'driver', 'chromedriver')

driver = webdriver.Chrome(options=options, service=Service(log_path=os.devnull, executable_path=exec_path))

Now we need to remove the properties discussed above from every page opened in the current session. To do this, we register a script that runs on each new document before the page's own scripts:

Python:
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    'source': '''
        delete window.cdc_adoQpoasnfa76pfcZLmcfl_Array;
        delete window.cdc_adoQpoasnfa76pfcZLmcfl_Promise;
        delete window.cdc_adoQpoasnfa76pfcZLmcfl_Symbol;
  '''
})
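
If the list of properties changes, the JavaScript source can be generated from a Python list instead of being typed by hand. The sketch below assumes the three names shown earlier. `build_cleanup_script` and `install_cleanup` are hypothetical helper names, while `execute_cdp_cmd` and `Page.addScriptToEvaluateOnNewDocument` are the same calls used in the code above.

```python
import re

# Property names taken from the console output earlier in the post.
CDC_PROPERTIES = [
    "cdc_adoQpoasnfa76pfcZLmcfl_Array",
    "cdc_adoQpoasnfa76pfcZLmcfl_Promise",
    "cdc_adoQpoasnfa76pfcZLmcfl_Symbol",
]

_IDENTIFIER = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def build_cleanup_script(names):
    """Build JavaScript that deletes each named property from window."""
    for name in names:
        # Refuse anything that is not a plain identifier.
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a valid property name: {name!r}")
    return "\n".join(f"delete window.{name};" for name in names)


def install_cleanup(driver, names=CDC_PROPERTIES):
    """Register the cleanup script so it runs on every new document."""
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": build_cleanup_script(names)},
    )
```

`install_cleanup(driver)` then does the same job as the call above, and a different list of names can be passed as the second argument.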

After this, we try to enter the protected site.

Python:
driver.get('https://nowsecure.nl')
time.sleep(10)
driver.close()
driver.quit()

And everything works. Disabling the option and deleting the window properties helped in this case, and we got access to the page.

1746390932086.png

As we can see, the method works and can be used when parsing protected sites like these.
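
The fixed `time.sleep(10)` is fine for a demo, but in a real scraper it is usually better to wait only as long as needed. Below is a minimal polling sketch built on the standard library. The name `wait_until` is made up here, and Selenium ships its own `WebDriverWait` class in `selenium.webdriver.support.ui` that serves the same purpose.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Intended usage with a live driver (not executed here):
#
# wait_until(
#     lambda: driver.execute_script("return document.readyState") == "complete",
#     timeout=20,
# )
```

The predicate can check anything you like, for example that a particular element has appeared on the page.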

Python:
import os
import time
from platform import system

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
# options.add_argument("--headless")
# options.add_experimental_option("excludeSwitches", ["enable-automation"])
# options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--disable-blink-features=AutomationControlled")

exec_path = os.path.join(os.getcwd(), 'driver', 'chromedriver.exe') if system() == "Windows" else \
    os.path.join(os.getcwd(), 'driver', 'chromedriver')

driver = webdriver.Chrome(options=options, service=Service(log_path=os.devnull, executable_path=exec_path))

driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    'source': '''
        delete window.cdc_adoQpoasnfa76pfcZLmcfl_Array;
        delete window.cdc_adoQpoasnfa76pfcZLmcfl_Promise;
        delete window.cdc_adoQpoasnfa76pfcZLmcfl_Symbol;
  '''
})

driver.get('https://nowsecure.nl')
time.sleep(10)
driver.close()
driver.quit()
Gaining access using undetected-chromedriver

Here's another method that uses much less code. This time we will try to access the protected site using the undetected-chromedriver library. Its developer went further and created a library that downloads the required driver itself and patches it right away, installing certain profiles. Moreover, if you already have selenium installed, you can specify the version of the webdriver, and that driver will be patched by undetected-chromedriver as well. To install the library, run in the terminal:

Bash:
pip install undetected-chromedriver

And that's it. You don't need to install anything else. The driver, as mentioned above, will be pulled in automatically.
Now that the library is installed, import everything it needs into the script.

Python:
import time

import undetected_chromedriver as uc

That's it. As you can see, we import only the library itself and time, which is needed just to pause before the browser is closed.
With this library there is no need to specify the path to the webdriver, because it is found automatically. All that is required is to create a webdriver object.

Python:
driver = uc.Chrome()
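
If the automatically chosen driver does not match your browser, the major version can be passed explicitly through the `version_main` parameter of `uc.Chrome`. The sketch below is an assumption about how you might wire this up: `chrome_major_version` and `make_driver` are hypothetical helpers, and the version string is whatever your browser reports on its "About" page.

```python
import re


def chrome_major_version(version_text):
    """Extract the major version from text like 'Google Chrome 120.0.6099.109'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no Chrome version found in {version_text!r}")
    return int(match.group(1))


def make_driver(version_text=None):
    """Create an undetected-chromedriver instance, optionally pinned to a version."""
    # Imported lazily so the parsing helper works without the library installed.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    options.add_argument("--start-maximized")
    if version_text is None:
        return uc.Chrome(options=options)
    return uc.Chrome(options=options, version_main=chrome_major_version(version_text))
```

For example, `make_driver("Google Chrome 120.0.6099.109")` would ask the library for a driver matching major version 120.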

After that, we open the protected site. By the way, the site we have been using throughout this article was made by the library's developer specifically to test the protection. As he writes, everything that can be enabled is enabled on it.

Python:
driver.get('https://nowsecure.nl')
time.sleep(10)
driver.close()
driver.quit()

As shown in the screenshot below, access to the site was successful.

1746391162049.png

Since this is still a Selenium driver underneath, you can import the usual Selenium helpers into the script and perform all the search and click operations that you did in regular Selenium.
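
As a small illustration of that, here is a sketch that collects the visible text of all links on the current page. The helper name `link_texts` is made up for this example. The locator string "css selector" is the value behind Selenium's `By.CSS_SELECTOR` constant, so in a real script you would normally import `By` and use the constant instead.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def link_texts(driver):
    """Return the non-empty, stripped text of every <a> element on the page."""
    texts = []
    for element in driver.find_elements(CSS_SELECTOR, "a"):
        text = element.text.strip()
        if text:
            texts.append(text)
    return texts
```

After `driver.get(...)`, calling `link_texts(driver)` works the same way with `uc.Chrome()` as with a regular `webdriver.Chrome()` object.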

Python:
import time

import undetected_chromedriver as uc

driver = uc.Chrome()

driver.get('https://nowsecure.nl')
time.sleep(10)
driver.close()
driver.quit()

That was a short overview. As you can see, with one method or another you can access even a site that is protected against automated software, and at least two of the methods shown here work. Which one to use is up to you and is a matter of preference. I use undetected-chromedriver, which needs less code to get the same result as the Python Today code.

And that's probably all.

Thank you for your attention. I hope this information will be useful to you.
 