
I've been interested in web crawlers recently and decided to try jsoup, but I'm not sure how to log into a website with it. I saw another SO post about it but couldn't piece together how to do it.

I've been trying to crawl www.tickld.com, whose login page is "https://www.tickld.com/signin".

I'm not sure whether I'm using jsoup incorrectly (I'm certain this is the main reason), whether the problem is the .jks keystore, or whether I'm entering the wrong information, and I don't see how to test which part of the code is failing.

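        import org.jsoup.Connection;
        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;

        // Trust store containing the site's certificate (path is a placeholder)
        System.setProperty("javax.net.ssl.trustStore", "filePath\\keystore.jks");

        // 1. GET the sign-in page to collect its cookies
        Connection.Response loginForm = Jsoup.connect("https://www.tickld.com/signin")
                .method(Connection.Method.GET).execute();

        // 2. POST the credentials along with those cookies
        Document document = Jsoup.connect("https://www.tickld.com/signin")
                .data("l_username", "myUsername")
                .data("l_password", "myPassword")
                .cookies(loginForm.cookies())
                .post();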

But whatever I try, it doesn't log me in; it just returns the sign-in page.

mattias
themiDdlest

1 Answer


The sign-in is handled by AJAX. I'm using Chrome, so this is what I did: log in via the form in the browser, then press F12 and open the Console tab. You will see something like XHR finished loading: POST "https://www.tickld.com/ajax/login.php". Normally a POST request goes to the URL in the action attribute of the form tag. In this case no such URL exists, because the submission is handled by JavaScript.
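To make the action-attribute rule concrete: under HTML's form-submission rules, a browser posts to the form's action URL resolved against the page URL, and a missing or empty action submits back to the page's own URL. The sketch below is plain Java (no jsoup); FormTarget and resolve are hypothetical names, and the /ajax/login.php action in the example is only illustrative, since the real form has no action.

```java
import java.net.URI;

// Hypothetical helper: works out where a browser would submit a form,
// given the page URL and the form's action attribute (null if absent).
public class FormTarget {

    public static URI resolve(String pageUrl, String action) {
        URI page = URI.create(pageUrl);
        // HTML rule: a missing or empty action submits to the page's own URL.
        if (action == null || action.trim().isEmpty()) {
            return page;
        }
        // Relative actions such as "/ajax/login.php" resolve against the page URL.
        return page.resolve(action.trim());
    }

    public static void main(String[] args) {
        String page = "https://www.tickld.com/signin";
        System.out.println(resolve(page, null));              // form without an action
        System.out.println(resolve(page, "/ajax/login.php")); // illustrative action
    }
}
```

Comparing the resolved URL with the one you actually post to is a quick check that you are hitting the same endpoint the browser uses.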

Try this and see if it works.

        // loginForm is the Response from the GET in the question
        Document document = Jsoup.connect("https://www.tickld.com/ajax/login.php")
                .data("l_username", "myUsername")
                .data("l_password", "myPassword")
                .cookies(loginForm.cookies())
                .post();

If it doesn't, you might need a headless browser that can execute JavaScript, driven by something like Selenium WebDriver.
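One way to see which step is failing is to check what the login endpoint itself answers before worrying about cookies. According to the comments below, it replies with {"result":"success"} or {"result":"failed"}. A minimal sketch of that check in plain Java, assuming the reply stays that small, flat JSON; LoginResult, result and succeeded are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the "result" field from a small JSON reply such as
// {"result":"success"}. A regex is enough for one flat field; use a real JSON
// library for anything more complex.
public class LoginResult {

    private static final Pattern RESULT =
            Pattern.compile("\"result\"\\s*:\\s*\"([^\"]*)\"");

    // Value of "result", or null if the body has none
    // (e.g. an HTML page came back instead of the AJAX reply).
    public static String result(String body) {
        Matcher m = RESULT.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static boolean succeeded(String body) {
        return "success".equals(result(body));
    }

    public static void main(String[] args) {
        System.out.println(succeeded("{\"result\":\"success\"}")); // true
        System.out.println(succeeded("{\"result\":\"failed\"}"));  // false
        System.out.println(result("<html>sign in</html>"));       // null
    }
}
```

With jsoup you would read the raw text with execute().body() rather than parsing it as a Document. If jsoup rejects the reply because of a JSON content type, adding .ignoreContentType(true) to the connection tells it to accept it.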

Update

        // POST the credentials and keep the cookies the server sets
        Connection.Response login = Jsoup.connect("https://www.tickld.com/signin")
                .data("l_username", "myUsername")
                .data("l_password", "myPassword")
                .method(Connection.Method.POST)
                .execute();

        // Send those cookies with the request for a page that needs a login
        Document document = Jsoup.connect("http://www.tickld.com/user/chosimbaaaa")
                .cookies(login.cookies())
                .get();
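The update passes the login response's cookies into the next request by hand. If you chain more requests, note that a response's cookies() map holds, as far as I can tell, only the cookies the server set in that response, not everything you sent, so it can be smaller than the request's set. A minimal plain-Java sketch of keeping one running map and merging each response into it; CookieJar is a hypothetical class and the cookie names in main are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: one running cookie map shared across several requests.
// Each response's cookies are merged in; a newer value replaces an older one.
public class CookieJar {

    private final Map<String, String> cookies = new HashMap<>();

    // Merge the cookies set by the latest response.
    public void update(Map<String, String> fromResponse) {
        cookies.putAll(fromResponse);
    }

    // Copy of everything collected so far, to pass to the next request.
    public Map<String, String> all() {
        return new HashMap<>(cookies);
    }

    public static void main(String[] args) {
        CookieJar jar = new CookieJar();
        jar.update(Map.of("PHPSESSID", "abc", "visitor", "1")); // e.g. from the GET
        jar.update(Map.of("PHPSESSID", "def", "auth", "xyz"));  // e.g. from the login POST
        System.out.println(jar.all().size());           // 3
        System.out.println(jar.all().get("PHPSESSID")); // def
    }
}
```

You would call jar.update(login.cookies()) after each execute() and pass jar.all() to the next .cookies(...) call. It ignores domain, path and expiry, which is fine for one site but is not a real cookie store. Newer jsoup releases also provide Jsoup.newSession(), which keeps cookies between requests for you, if your version has it.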
Alkis Kalogeris
  • I was out of town for a while. This worked perfectly! I LITERALLY cannot tell you how happy you just made me. I would upvote this like 1000 times if I could. You are the best. THANK YOU THANK YOU. – themiDdlest May 28 '15 at 08:28
  • One more quick question. I am able to log in; I get {"result":"success"} when I try this. I believe I need new cookies to stay logged in, and the only way I see to get cookies is from a jsoup Response. When I try a similar technique to get new cookies: response = Jsoup.connect("https://www.tickld.com/ajax/login.php").data("username", "myUsername").data("password", "myPW").cookies(response.cookies()).execute(); I get {"result":"failed"} and errors telling me I didn't include a username and a password. – themiDdlest May 28 '15 at 09:01
  • continued: Am I correct in believing I need new cookies to stay logged in? And do you know if the response would use the same login format as a document? Hopefully these comments made sense. – themiDdlest May 28 '15 at 09:05
  • Yes you do. Give a url that you want to parse after you've logged in, and I'll try to make an example and see if it works. – Alkis Kalogeris May 28 '15 at 09:07
  • https://www.tickld.com/user/chosimbaaaa He has certain things set to private, such as bookmarks. If I log in from the browser, I am able to see them. – themiDdlest May 28 '15 at 09:12
  • Yes, with that I was able to successfully log in, but using the new cookies all the info was still private. Does this mean I should just use something else like selenium? – themiDdlest May 28 '15 at 09:27
  • Well, if the data you are trying to scrape is generated by JavaScript, then yes, Selenium is the way to go. To check whether JavaScript generates it, open the page in Chrome and press Ctrl+U. The new window shows the source that jsoup will get. If the data you need is not in there, JavaScript is generating it. – Alkis Kalogeris May 28 '15 at 09:32
  • I feel like I'm not doing it exactly correctly, so I'm going to go read some more. I was able to log in, and you've been incredibly helpful. Thank you so much. – themiDdlest May 28 '15 at 10:01
  • I think the problem I'm running into is the cookies. I'm a little confused about when to use Method.GET vs Method.POST. W3Schools explained it very simply ("when you want to get info" vs "when you want to submit data"), but it was still a little confusing. I downloaded the Fiddler web debugger, and it looks like each connection involves both a POST and a GET, and each might have different cookies. For instance, my request had 7 different cookies, but the response only had 2. – themiDdlest Jun 05 '15 at 06:19
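On the GET vs POST question: both can carry the same key=value&... form encoding. A GET appends it to the URL as a query string and has no body; a POST keeps the URL as-is and sends the string as the request body (Content-Type application/x-www-form-urlencoded). As far as I know, jsoup's .data(...) follows the same split depending on the method. A plain-Java sketch of the two shapes using java.net.URLEncoder; RequestShape and its methods are hypothetical, and the field names are the ones from the question:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper showing where form data ends up for GET vs POST.
public class RequestShape {

    // key=value pairs joined with '&', URL-encoded as UTF-8 (spaces become '+').
    public static String encode(Map<String, String> data) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : data.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    // GET: the data rides in the URL's query string; there is no body.
    public static String getUrl(String url, Map<String, String> data) {
        return url + (url.contains("?") ? "&" : "?") + encode(data);
    }

    // POST: the URL is unchanged and the same string is sent as the body.
    public static String postBody(Map<String, String> data) {
        return encode(data);
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("l_username", "my user");
        data.put("l_password", "p&ss");
        System.out.println(getUrl("https://www.tickld.com/signin", data));
        System.out.println(postBody(data));
    }
}
```

Because a GET puts the password in the URL (and therefore in logs and browser history), login forms normally use POST.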