I would like to develop an automatic scraper for a password-protected ASP web page. I have a login and password for this page.
First of all, I looked at the Firebug log during authorization via Firefox. Here is what I found:
- When I open the login page (http://mysite), I get a cookie named "__RequestVerificationToken".
- When I press the Login button, Firefox makes a POST request to http://mysite/Account/Login with the parameters UserName, Password and __RequestVerificationToken. It also sends the cookie saved in the first step.
- If authorisation succeeds, I get another cookie, .ASPXAUTH, and am redirected to http://mysite/Account/Index (the page I want to scrape).
My code:
//1. Get __RequestVerificationToken cookie
$urlLogin = "http://mysite";
$cookieFile = "cookie.txt";
$regs=array();
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $urlLogin);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
curl_setopt($ch, CURLOPT_STDERR,$f = fopen("answer.txt", "w+"));
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:18.0) Gecko/20100101 Firefox/18.0' );
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
$data=curl_exec($ch);
//2. Parse token value for the post request
$hash=file_get_contents("answer.txt");
preg_match_all('/=(.*); p/i',$hash, $regs);
//3. Make a post request
$postData = '__RequestVerificationToken='.$regs[1][0].'&UserName=someLogin'.'&Password=somePassword';
$urlSecuredPage = "http://mysite/Account/Login";
curl_setopt($ch, CURLOPT_URL, $urlSecuredPage);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
$data = curl_exec($ch);
curl_close($ch);
At step 3, the cookie saved in step 1 is overwritten with a new value of __RequestVerificationToken. I don't understand why this happens. As a result, I cannot log in because of the wrong __RequestVerificationToken, and I get an HTTP 500 error.
Where am I going wrong?
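One thing I have not ruled out yet: ASP.NET MVC anti-forgery protection usually pairs the cookie with a separate hidden form field of the same name, and the two values are different. Assuming the login page contains such a hidden input (my Firebug log does not confirm this), the form token could be read from the HTML in $data rather than from the cookie in answer.txt. Below is a minimal sketch of that idea. The function name extractFormToken and the $sampleHtml markup are hypothetical and only stand in for the real page.

```php
<?php
// Hypothetical helper: read the hidden anti-forgery field from the login form HTML.
// Assumes the page has <input name="__RequestVerificationToken" value="...">.
function extractFormToken(string $html, string $fieldName = '__RequestVerificationToken'): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//input[@name="' . $fieldName . '"]/@value');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no such field: the assumption does not hold for this page
    }
    return $nodes->item(0)->nodeValue;
}

// Made-up markup standing in for the $data returned by the first curl_exec() call.
$sampleHtml = '<form action="/Account/Login" method="post">'
    . '<input name="__RequestVerificationToken" type="hidden" value="abc+123/xyz==" />'
    . '<input name="UserName" type="text" /></form>';

$token = extractFormToken($sampleHtml);

// http_build_query() URL-encodes every value, so "+", "/" and "=" in the token
// are not mangled when the string is passed to CURLOPT_POSTFIELDS.
$postData = http_build_query([
    '__RequestVerificationToken' => $token,
    'UserName' => 'someLogin',
    'Password' => 'somePassword',
]);
```

In my script this would mean calling extractFormToken($data) after step 1 and passing the resulting $postData to CURLOPT_POSTFIELDS in step 3, while leaving the cookie itself to the cookie jar.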