Welcome to the Octoparse tutorial. Octoparse is a web scraping tool designed specifically for collecting large volumes of data of many different types. If you don't have an account yet, please sign up at Octoparse.

In this tutorial, I'm going to show you how to scrape a website that requires you to log in first. Some websites ask you to log in with a username and password before you can scrape their data. Octoparse supports scraping data from websites that require authentication. When using Octoparse to scrape such a website, follow the steps below.

1. Enter the task name.
2. Open the login page URL in the built-in browser.
3. Click the first textbox on the web page (the username input) and choose "Enter text value".
4. Input the username into the textbox and click "Confirm". The username you entered is automatically filled into the username textbox on the web page.
5. Click "Continue" and select "Click button" from the Tips panel.
6. Follow the same steps to enter the password.
7. Click the "Sign In" button on the page and select "Click button" from the Tips panel.

Octoparse has now logged into the website successfully!

Websites handle cookies differently. To make sure the task workflow works consistently, you may want it to start with the login steps every time the task is executed. To do this, clear any saved cookies before the login page is loaded. The target website will then always "forget" you and take you to the login page, where all the login information can be entered.

1. Click "Go to Web Page" and select "Options".
2. Select "Clear cache before opening the web page".

Most of the time, you can optimize the workflow by saving the cookies in the task after login. Octoparse will then send the saved cookies to the website while the page loads. There's a good chance the website will remember "you" and skip the login steps.

1. Switch to Browser mode using the control at the top right.
2. Log in to the website just as you would in a regular browser.
3. After logging in, go to the "Options" settings of the "Go to Web Page" action, tick "Use Cookie", and click "Use cookie from the current page".

The web page should now "remember" the login and skip the login steps the next time the crawler runs.

A few things to keep in mind:

1. A saved cookie only works until it expires. Cookies come in many different forms: some have a specific expiration time, while others expire as soon as the browser is closed. In Octoparse, a saved cookie stops working once it has expired. To resolve this, go through the login steps again in Browser mode to obtain and save an updated cookie.
2. In Octoparse, the password you enter is accessible only from your own account. When a task is exported, the password saved in the task is removed automatically. Any login information saved in a task is permanently removed from your account as soon as the task is deleted.
3. Entering a captcha manually while running local extraction: if you encounter a captcha, you can enter it manually when running the task locally. Cloud Extraction does not support handling captchas.

If you need any help with task configuration or data collection, submit a ticket to our support team! We'll get back to you within 24 hours.
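Octoparse handles all of this through its interface, so no code is needed to follow the tutorial. For readers curious about what the cookie options above do conceptually, the sketch below reproduces the same ideas with Python's standard-library `http.cookiejar` module. It covers:

- saving cookies after login;
- loading them on the next run;
- clearing them so the site "forgets" you;
- dropping cookies that have expired or that only last for the browser session.

This is an illustration under stated assumptions, not Octoparse's implementation. The domain `example.com`, the cookie names and values, and the helper functions `make_cookie`, `save_login_cookies`, `load_saved_cookies` and `start_fresh` are all hypothetical.

```python
"""Cookie concepts from the login tutorial, sketched with Python's http.cookiejar.

Illustration only: this is NOT how Octoparse works internally. The domain
"example.com", the cookie names and the helper functions are hypothetical.
"""
import os
import tempfile
import time
from http.cookiejar import Cookie, MozillaCookieJar
from typing import Optional


def make_cookie(name: str, value: str, expires: Optional[int]) -> Cookie:
    """Build a cookie for the hypothetical domain example.com.

    expires=None makes it a session cookie (discard=True), i.e. one that
    would disappear as soon as the browser is closed.
    """
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


def save_login_cookies(jar: MozillaCookieJar, path: str) -> None:
    # Analogous to "Use cookie from the current page": persist cookies after login.
    # By default, session cookies and already-expired cookies are NOT written.
    jar.save(path)


def load_saved_cookies(path: str) -> MozillaCookieJar:
    # Analogous to loading a page with "Use Cookie" ticked: reuse saved cookies.
    jar = MozillaCookieJar()
    jar.load(path)  # cookies that expired since saving are skipped while loading
    return jar


def start_fresh(jar: MozillaCookieJar) -> None:
    # Analogous to "Clear cache before opening the web page": the site "forgets" you.
    jar.clear()


if __name__ == "__main__":
    now = int(time.time())
    jar = MozillaCookieJar()
    jar.set_cookie(make_cookie("login_token", "abc123", now + 3600))  # valid for 1 hour
    jar.set_cookie(make_cookie("session_id", "xyz", None))            # session cookie
    jar.set_cookie(make_cookie("old_token", "old", now - 60))         # already expired

    path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
    save_login_cookies(jar, path)
    restored = load_saved_cookies(path)
    print(sorted(c.name for c in restored))  # ['login_token']

    start_fresh(restored)
    print(len(restored))  # 0
```

Only `login_token` survives the save-and-load round trip. The session cookie is never written to disk and the expired cookie is skipped. This mirrors the tutorial's point that a saved cookie is only useful until it expires, after which you need to log in again and save a fresh one.

`MozillaCookieJar` is used here because it stores cookies in the plain-text Netscape `cookies.txt` format, which is easy to inspect by hand.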