Unshortening bulk URLs in Python

Question

I'm currently working on a project about deleted Stack Overflow questions. By querying the Stack Exchange Data Explorer, I got a list of deleted questions, including post ID, creation date, and deletion date. For data consistency, I'm trying to retrieve those questions from the Internet Archive. To search the Wayback Machine, I need the full URL of each question; later I can download the snapshots with waybackpack. I have the URLs as below:

https://stackoverflow.com/q/2557502
https://stackoverflow.com/q/2557505
https://stackoverflow.com/q/2557507

But when opened, these links redirect to URLs that include the question title, like below:

https://stackoverflow.com/questions/2557502/need-help-with-dropdowns-in-gridview-pager-that-are-slow-to-respond-the-first-ti
https://stackoverflow.com/questions/2557505/netbeans-profiler-maven-jetty-plugin

To download these questions from the Internet Archive, I need the links in the long format shown above.

Right now I have a list of more than 40000 shortened URLs that need to be converted to long URLs.

I tried the solutions from How can I unshorten a URL?, but from my observation they are not stable: after a couple of iterations they start returning the shortened URLs again. Is there a session limit on retrieving the redirected URL from the headers? I have also used https://github.com/skevas/unshorten and other APIs from GitHub, but everything either crashes midway or returns the shortened URLs.

from unshortenit import UnshortenIt
unshortener = UnshortenIt()
uri = unshortener.unshorten('https://stackoverflow.com/q/2638047')
print(uri)
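
Since the failure shows up as a shortened URL coming back unchanged, it may help to check each result against the long format before storing it. The following is only a minimal sketch using the standard library instead of unshortenit; the names `LONG_FORM`, `is_long_form`, `follow_redirects`, and `unshorten`, as well as the `User-Agent` value, are assumptions made for illustration.

```python
import re
import urllib.request

# Long format: https://stackoverflow.com/questions/<id>/<title-slug>
LONG_FORM = re.compile(r"^https://stackoverflow\.com/questions/\d+/[^/?#]+")


def is_long_form(url):
    """Return True if the URL already contains the question title slug."""
    return LONG_FORM.match(url) is not None


def follow_redirects(url, timeout=10):
    """Return the final URL; urlopen follows HTTP redirects by default."""
    # The User-Agent value is an arbitrary placeholder, not a requirement.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def unshorten(url, fetch=follow_redirects):
    """Resolve a URL and return None rather than a still-shortened result."""
    try:
        final_url = fetch(url)
    except OSError:  # URLError, HTTPError, and socket timeouts derive from OSError
        return None
    return final_url if is_long_form(final_url) else None
```

With this shape, a `None` result marks a URL that has to be retried later, so shortened URLs never end up mixed into the output list.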

I think `session` has nothing to do with it. If you send many requests in a short time, the server may simply decide you are a hacker and block your requests. You may have to use random delays (like a real human) or use `proxy servers` (or the `tor network`) to send requests from different IPs. You may have the same problem when you start downloading from `wayback`: it may also block you if you send many requests in a short time. - furas, Mar 12 '22 at 01:17
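
The random-delay idea from this comment could be sketched roughly as follows. It is an illustration under assumptions, not a tested fix for the blocking: `unshorten_all` is a hypothetical helper, `resolve` stands for any callable that returns the long URL or `None` on failure, and the delay bounds and attempt count are arbitrary guesses rather than known server limits.

```python
import random
import time


def unshorten_all(urls, resolve, max_attempts=3,
                  min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Resolve URLs one by one with a random pause after every request.

    Returns (resolved, failed): a dict mapping short URL to long URL, and
    the list of short URLs still unresolved after max_attempts passes.
    """
    resolved = {}
    pending = list(urls)
    for attempt in range(max_attempts):
        still_failing = []
        for url in pending:
            long_url = resolve(url)
            if long_url is None:
                still_failing.append(url)  # try again on the next pass
            else:
                resolved[url] = long_url
            # Random pause, scaled up on later passes to back off further.
            sleep(random.uniform(min_delay, max_delay) * (attempt + 1))
        pending = still_failing
        if not pending:
            break
    return resolved, pending
```

With a pause of one to three seconds per request, a single pass over 40000 URLs would take roughly 11 to 33 hours, so it is probably worth writing `resolved` to disk periodically instead of keeping everything in memory until the end.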

