I would like to develop an automatic scraper for a password-protected ASP.NET web page. I have a login and password for this page.

First of all, I looked at the Firebug log during authorization in Firefox. Here is what I found:

  1. When I open the login page (http://mysite), I get a cookie named "__RequestVerificationToken".
  2. When I press the Login button, Firefox sends a POST request to http://mysite/Account/Login with the parameters UserName, Password and __RequestVerificationToken; it also sends the cookie saved in step 1.
  3. On successful authorization I get another cookie, .ASPXAUTH, and am redirected to http://mysite/Account/Index (the page I want to scrape).

My code:

//1. Get __RequestVerificationToken cookie

    $urlLogin = "http://mysite";
    $cookieFile = "cookie.txt";
    $regs=array();
    
    $ch = curl_init();
    
    curl_setopt($ch, CURLOPT_URL, $urlLogin);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
    curl_setopt($ch, CURLOPT_STDERR,$f = fopen("answer.txt", "w+"));
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:18.0) Gecko/20100101 Firefox/18.0' );
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile); 
    
    $data=curl_exec($ch);

//2. Parse token value for the post request

    $hash = file_get_contents("answer.txt");
    preg_match_all('/=(.*); p/i', $hash, $regs);

//3. Make a post request

    $postData = '__RequestVerificationToken='.$regs[1][0].'&UserName=someLogin'.'&Password=somePassword';
    $urlSecuredPage = "http://mysite/Account/Login";
    curl_setopt($ch, CURLOPT_URL, $urlSecuredPage); 
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile); 
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); 

    $data = curl_exec($ch);
    curl_close($ch);

At step 3, the cookie saved in step 1 is overwritten with a new value of __RequestVerificationToken. I don't understand why this happens. As a result, I cannot log in because the __RequestVerificationToken is wrong, and I get an HTTP 500 error.

Where am I going wrong?

Alex Molskiy
  • Your `__RequestVerificationToken` is session-dependent. As far as I understand, as soon as the user is authenticated, the value of `__RequestVerificationToken` changes and remains valid for that user only; once the user logs out, the value changes again. So base your logic on this, I think. – abhij89 Feb 15 '14 at 08:58
  • saveATcode - you are right that __RequestVerificationToken is session-dependent, but in FF it stays the same from step 1 to step 3. It changes only if I close the browser and visit the site again. – Alex Molskiy Feb 15 '14 at 09:05
  • You wrote in your question that at step 3 the cookie saved in step 1 is overwritten with a new value of __RequestVerificationToken, which means the error is somewhere else. – abhij89 Feb 15 '14 at 09:07
  • Firstly, the CURLOPT_COOKIEFILE/CURLOPT_COOKIEJAR options must be set to a FULL path; "cookie.txt" is a relative path. Secondly, `$postData` must be URL-encoded. However, you don't need to do that manually: you can use `http_build_query()` to build the encoded POST data, or pass an associative array directly to the `CURLOPT_POSTFIELDS` option. – hindmost Feb 15 '14 at 09:08
  • hindmost - I do use a full path for the cookies; I just removed it in the post. And thanks for mentioning URL-encoding, I forgot about it. – Alex Molskiy Feb 15 '14 at 09:19
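
The URL-encoding point from the comments can be illustrated with a minimal sketch. The token and credentials below are made-up values chosen to contain characters that break a hand-concatenated POST body; `__DIR__` is just one possible way to build an absolute cookie path, not necessarily what the asker used.

```php
<?php
// Hypothetical values for illustration only; a real token comes from the login page.
$token = 'abc+def/ghi==';

$fields = array(
    '__RequestVerificationToken' => $token,
    'UserName' => 'someLogin',
    'Password' => 'p&ss word',
);

// Hand-built body: "+" would be decoded as a space and "&" would split the password.
$naive = '__RequestVerificationToken=' . $token . '&UserName=someLogin&Password=p&ss word';

// http_build_query() percent-encodes every key and value.
$encoded = http_build_query($fields);

// Absolute path for the cookie jar, next to the current script.
$cookieFile = __DIR__ . DIRECTORY_SEPARATOR . 'cookie.txt';

echo $encoded, "\n";
```

Decoding `$encoded` with `parse_str()` gives back the original values, which is not the case for `$naive`.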

2 Answers

There should be two __RequestVerificationToken values: one in a hidden input, the other in the cookie. The hidden input value is sent with each request, it has a new value for each request, and it depends on the cookie value.

So you need to save both the input value and the cookie, and send them back together. If you don't send the value from the hidden input, ASP.NET MVC treats the request as an attack and generates a new cookie. A new cookie is generated only if validation failed or the cookie itself doesn't exist. If you have that cookie and always send the __RequestVerificationToken input value with the POST request, it shouldn't generate a new cookie.

If a new cookie is still generated, then you are sending an incorrect __RequestVerificationToken value from the hidden input. Try the same request from Fiddler or Charles and check whether it succeeds.

These tokens are used to prevent CSRF attacks.
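
Reading the hidden input value out of the login page can be sketched as follows. This is one possible approach, assuming PHP's DOM extension is available and that the field is an `<input>` whose `name` is `__RequestVerificationToken`; the function name `extractVerificationToken` and the sample markup are hypothetical.

```php
<?php
// Returns the value of the hidden __RequestVerificationToken input, or null if absent.
function extractVerificationToken($html)
{
    if ($html === '') {
        return null; // nothing to parse
    }

    $doc = new DOMDocument();
    // Collect parser warnings from imperfect real-world markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//input[@name="__RequestVerificationToken"]/@value');
    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

// Hypothetical login form markup, used only to demonstrate the function.
$sampleHtml = '<html><body><form action="/Account/Login" method="post">'
    . '<input name="__RequestVerificationToken" type="hidden" value="tok123" />'
    . '<input name="UserName" type="text" /></form></body></html>';

var_dump(extractVerificationToken($sampleHtml));
```

In a scraper, the argument would be the body returned by the GET request for the login page, and the result would go into the POST fields next to UserName and Password.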

Sergey Litvinov

Big thanks to Sergey Litvinov and hindmost.

The corrected code is below:

    $urlLogin = "http://mysite";
    $cookieFile = "/Volumes/Media/WebServer/aszh/cookie.txt";
    $regs = array();

    $ch = curl_init();

    // Make a GET request and receive the __RequestVerificationToken cookie
    curl_setopt($ch, CURLOPT_URL, $urlLogin);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
    curl_setopt($ch, CURLOPT_STDERR, $f = fopen("/Volumes/Media/WebServer/aszh/answer.txt", "w+"));
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:18.0) Gecko/20100101 Firefox/18.0');
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);

    $data = curl_exec($ch);

    // Parse the response and get the __RequestVerificationToken hidden input value.
    // [^"]* stops at the closing quote, so the match cannot run past the attribute.
    preg_match_all('/type="hidden" value="([^"]*)"/i', $data, $regs);
    $token = $regs[1][0];

    $postData = array('__RequestVerificationToken' => $token,
         'UserName' => 'userName',
         'Password' => 'password');

    // Make a POST request and receive the .ASPXAUTH cookie
    $urlSecuredPage = "http://mysite/Account/Login";
    curl_setopt($ch, CURLOPT_URL, $urlSecuredPage);
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postData));
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);

    $data = curl_exec($ch);
    curl_close($ch);
    fclose($f); // close the verbose log file
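
To check whether the login actually produced the .ASPXAUTH cookie, the cookie jar file can be inspected once the cURL handle has been released. The helper below is a sketch, not part of the original solution: the name `cookieJarHasCookie` and the sample jar contents are hypothetical, and it assumes the Netscape cookie file format that libcurl writes (tab-separated fields, with HttpOnly cookies prefixed by `#HttpOnly_`).

```php
<?php
// Returns true if Netscape-format cookie jar text contains a cookie with the given name.
function cookieJarHasCookie($jarText, $name)
{
    foreach (preg_split('/\r?\n/', $jarText) as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // libcurl marks HttpOnly cookies with this prefix; strip it before parsing.
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value.
        $fields = explode("\t", $line);
        if (count($fields) >= 6 && $fields[5] === $name) {
            return true;
        }
    }
    return false;
}

// Hypothetical jar contents for illustration.
$sampleJar = "# Netscape HTTP Cookie File\n"
    . "#HttpOnly_mysite\tFALSE\t/\tFALSE\t0\t.ASPXAUTH\tABCDEF\n"
    . "mysite\tFALSE\t/\tFALSE\t0\t__RequestVerificationToken\txyz\n";

var_dump(cookieJarHasCookie($sampleJar, '.ASPXAUTH'));
```

With the code above, the call would look like `cookieJarHasCookie(file_get_contents($cookieFile), '.ASPXAUTH')`.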
Alex Molskiy