Until recently, I was able to log in with code like this:
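from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# placeholders: replace with your own login page link and credentials
link = "insert login page link here"
myuser = "your username"
mypass = "your password"

driver = webdriver.Chrome()
# log in
driver.get(link)
# find_element(By.ID, ...) replaces find_element_by_id, which newer Selenium releases removed
userbox = driver.find_element(By.ID, "Username")
userbox.send_keys(myuser)
passbox = driver.find_element(By.ID, "Password")
passbox.send_keys(mypass)
# pressing Return in the password box submits the form
passbox.send_keys(Keys.RETURN)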
However, the site recently added a CAPTCHA to its login process, so the above code no longer works.
Firefox Profiles to the rescue!
In the course of my research, I learned that Selenium pros tend to prefer using custom profiles for faster page loads anyway, so maybe this was a blessing in disguise. Plus, I learned something new!
How to Bypass a CAPTCHA/Log-in Page With Selenium WebDriver
First, create a Firefox profile.
What's a Firefox profile, you ask? Mozilla says: "Firefox saves your personal information such as bookmarks, passwords, and user preferences in a set of files called your profile, which is stored in a separate location from the Firefox program files."
You have a default profile already, but let's create one just in case you want to test Selenium with different settings than you normally use.
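In Terminal (on a Mac), run /Applications/Firefox.app/Contents/MacOS/firefox-bin -P to open the Firefox Profile Manager, then click "Create Profile..." and give the new profile a name.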
Reminder: Make sure to log in to the site in Firefox using this special profile* (launch it with the Terminal command mentioned above). Cookies expire, so Selenium won't be able to get past the login page if you haven't logged in with this profile recently enough.
*You can also check the box that says "use the selected profile without asking at startup" if you want to just use this profile all the time, not just for Selenium stuff.
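Since cookies expire, it can help to have your script notice when the saved session has gone stale. Here is a minimal sketch, assuming the site only shows the login form (with the same "Username" field id used earlier) when you are not logged in. needs_manual_login is a hypothetical helper name, not part of Selenium, and it works with any Selenium driver object.

```python
def needs_manual_login(driver, login_field_id="Username"):
    """Return True if the login form is still showing on the current page.

    Assumption: the site hides the login form from logged-in users.
    """
    # "id" is the locator strategy string behind selenium's By.ID.
    # find_elements returns an empty list (instead of raising) when nothing matches.
    return len(driver.find_elements("id", login_field_id)) > 0


# Usage, after the driver has loaded the login link:
# driver.get(link)
# if needs_manual_login(driver):
#     print("Session expired: log in by hand with this profile, then rerun.")
```

If this returns True, open Firefox with the profile, log in by hand, and run the script again.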
Update Your Code
It's time to update your code to use this Firefox profile. Update the path according to where your Python file is located. The path below assumes that your code is in a folder two levels below your home folder (the folder that contains Library). Don't worry, the space between "Application" and "Support" is not a problem. The part you need to personalize is yourprofilenamehere: replace it with the name of your profile folder.
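# replace yourprofilenamehere with the name of your profile folder
profile = webdriver.FirefoxProfile('../../Library/Application Support/Firefox/Profiles/yourprofilenamehere')
# newer Selenium releases take the profile through an options object instead of firefox_profile=
options = webdriver.FirefoxOptions()
options.profile = profile
driver = webdriver.Firefox(options=options)
driver.get(link)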
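If you'd rather not count folder levels, you can build an absolute path instead of the relative one above. This is a rough sketch, assuming the default Mac profile location and that your profile folder name ends in a dot plus the name you picked (Firefox puts a random prefix in front). find_profile_dir and the profile name "selenium" are made up for illustration.

```python
from pathlib import Path


def find_profile_dir(profiles_root, profile_name):
    """Return the absolute path of the profile folder named '<random>.<profile_name>'."""
    root = Path(profiles_root).expanduser()
    # Firefox prefixes each profile folder with a random string, e.g. "ab12cd34.selenium"
    matches = sorted(p for p in root.glob(f"*.{profile_name}") if p.is_dir())
    if not matches:
        raise FileNotFoundError(f"no profile named {profile_name!r} under {root}")
    return str(matches[0].resolve())


# Default Mac location of Firefox profiles (same folder as the relative path above)
PROFILES_ROOT = "~/Library/Application Support/Firefox/Profiles"

# Usage (hypothetical profile name "selenium"):
# profile_path = find_profile_dir(PROFILES_ROOT, "selenium")
# profile = webdriver.FirefoxProfile(profile_path)
```

This way the script keeps working no matter which folder you run it from.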
Happy scraping!