
I need to get the HTML source of pinnaclesports.com. The problem is that it detects whether cookies and JS are enabled, and if they are not, it just returns a page saying:

This site requires JavaScript and Cookies to be enabled. Please change your browser settings or upgrade your browser.

Is there any way to spoof JS support when using cURL?

EDIT: I can use a headless browser that runs either as a Perl/Ruby module or is written in PHP.

user965748

2 Answers


Another suggestion is to set the user agent. This solution works for me when parsing Google Groups:

curl -L -v "https://groups.google.com/d/forum/<GROUP-NAME>" -A "Mozilla/5.0 (compatible;  MSIE 7.01; Windows NT 5.0)"
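For a scripted client rather than the curl command line, the same idea can be sketched in Python with only the standard library. This is a minimal sketch, assuming a browser-like User-Agent header is all the site checks; `build_request` and `fetch` are hypothetical helper names, and this will not help if the site really needs JavaScript to run.

```python
import urllib.request

# User-Agent string taken from the curl command above; any browser-like string may work.
UA = "Mozilla/5.0 (compatible; MSIE 7.01; Windows NT 5.0)"


def build_request(url: str, user_agent: str = UA) -> urllib.request.Request:
    """Build a GET request that identifies itself as a browser (like curl -A)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str) -> str:
    """Fetch the page; urlopen follows redirects by default, similar to curl -L."""
    with urllib.request.urlopen(build_request(url)) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (network access required; <GROUP-NAME> is a placeholder):
# html = fetch("https://groups.google.com/d/forum/<GROUP-NAME>")
```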
João Paulo Cercal

I figured out that if you make a cookie-less request, the page that is returned (the one you are getting with curl) uses JavaScript to set a cookie.

Make another curl call with that cookie set, like this:

curl https://www.pinnaclesports.com/ --cookie "YPF8827340282Jdskjhfiw_928937459182JAX666=122.167.231.139"

In other words, you have to make two calls: 1) make a cookie-less request, then read the response and use a regex to find the cookie name; 2) make a second request with that cookie set. That will solve your problem.
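The two-call approach can be sketched in Python as follows. This is a hedged sketch, not the answerer's code: it assumes the cookie-less response contains a JavaScript assignment of the form `document.cookie = "NAME=VALUE..."`, which the answer does not show and which the site may have changed. `extract_cookie`, `fetch` and `fetch_with_js_cookie` are hypothetical names. If the script computes the cookie instead of writing it literally, the regex will need adjusting.

```python
from __future__ import annotations

import re
import urllib.request

# ASSUMPTION: the cookie-setting page contains something like
#   document.cookie = "NAME=VALUE; path=/";
# The real markup is not shown in the answer and may differ.
COOKIE_RE = re.compile(r"""document\.cookie\s*=\s*["']([^=;"'\s]+)=([^;"']*)""")


def extract_cookie(html: str) -> tuple[str, str] | None:
    """Return (name, value) of the first cookie the page's JS sets, or None."""
    m = COOKIE_RE.search(html)
    return (m.group(1), m.group(2)) if m else None


def fetch(url: str, cookie: str | None = None) -> str:
    """GET the URL, optionally sending a Cookie header (like curl --cookie)."""
    headers = {"User-Agent": "Mozilla/5.0"}
    if cookie:
        headers["Cookie"] = cookie
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_js_cookie(url: str) -> str:
    # Step 1: cookie-less request returns the page whose JS would set the cookie.
    first = fetch(url)
    found = extract_cookie(first)
    if found is None:
        return first  # no cookie script found; assume this is already the real page
    name, value = found
    # Step 2: repeat the request with the cookie set, as a browser would after running the JS.
    return fetch(url, cookie=f"{name}={value}")


# Usage (network access required):
# html = fetch_with_js_cookie("https://www.pinnaclesports.com/")
```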

Or just use YQL:

select * from html where url="https://www.pinnaclesports.com/" 

and point your curl request at the YQL endpoint with that query.

Markandey Singh
  • Thank you! The method you described works. The YQL solution might be useful as well, but I need to work further with the source to make a login request, so it's probably better to use the former approach. – user965748 Sep 06 '12 at 21:21
  • I am in the same kind of dilemma. I read your solution above but don't know how to find the cookie name or how to use it in the second curl request. Any assistance in this regard would be highly appreciated. – Saad Bashir Aug 14 '13 at 05:18