I'm currently working on a project about deleted Stack Overflow questions. By querying the Stack Exchange Data Explorer I obtained a list of deleted questions, including post ID, creation date, and deletion date. For data consistency I'm trying to retrieve those questions from the Internet Archive. To search the Wayback Machine I need the full URL of every question, after which I can download the snapshots with waybackpack. I have URLs like the ones below:
https://stackoverflow.com/q/2557502
https://stackoverflow.com/q/2557505
https://stackoverflow.com/q/2557507
But when opened, these links redirect to a URL that includes the question title, like below:
https://stackoverflow.com/questions/2557502/need-help-with-dropdowns-in-gridview-pager-that-are-slow-to-respond-the-first-ti
https://stackoverflow.com/questions/2557505/netbeans-profiler-maven-jetty-plugin
To download these questions from the Internet Archive I need the links in the long format shown above.
Right now I have a list of more than 40000 shortened URLs that need to be converted to long URLs.
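To make the conversion concrete, here is a minimal sketch of what I understand "unshortening" to mean: send a HEAD request, do not follow redirects automatically, and read the Location header by hand. It uses only the standard library. The names head_once and expand, the User-Agent string, and the hop limit are my own choices for illustration, and I am assuming the short link answers with an ordinary HTTP redirect.

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_once(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    try:
        # The User-Agent value is an arbitrary placeholder.
        conn.request("HEAD", path, headers={"User-Agent": "url-expander-sketch"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def expand(short_url, head=head_once, max_hops=5):
    """Follow redirects by hand and return (final_url, last_status).

    If no redirect is seen, final_url equals short_url, so the caller can
    tell a real expansion from a failure by checking last_status.
    """
    url = short_url
    status = None
    for _ in range(max_hops):
        status, location = head(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
        else:
            break
    return url, status
```

Returning the status together with the URL makes it visible when a short URL comes back unchanged because the server answered with something other than a redirect.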
I tried the solutions from "How can I unshorten a URL?", but from what I observe they do not work reliably: after a couple of iterations they start returning the shortened URLs again. Is there some session limit on requests that resolve the redirect URL from the response headers? I have also used https://github.com/skevas/unshorten and other APIs from GitHub, but they all crash partway through or return the shortened URLs. For example:
from unshortenit import UnshortenIt
unshortener = UnshortenIt()
uri = unshortener.unshorten('https://stackoverflow.com/q/2638047')
print(uri)
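To find out whether the failures come from sending requests too quickly, I am considering a batch loop along these lines. It is only a sketch: expand_all and its parameters are hypothetical names, resolve stands for any function that returns (final_url, status) for one URL, and treating HTTP 429 (Too Many Requests) as the throttling signal is an assumption on my part, since I do not know how the site actually responds when a limit is hit.

```python
import time


def expand_all(short_urls, resolve, delay=1.0, max_retries=3, sleep=time.sleep):
    """Resolve each URL with a pause between requests.

    resolve(url) must return (final_url, status).
    Returns two dicts: resolved (short -> long) and failed (short -> last status).
    """
    resolved, failed = {}, {}
    for short in short_urls:
        status = None
        for attempt in range(max_retries):
            final_url, status = resolve(short)
            if status == 200 and final_url != short:
                resolved[short] = final_url
                break
            if status != 429:
                break  # e.g. 404: retrying is unlikely to help
            # Assumed throttling response: wait longer before each retry.
            sleep(delay * 2 ** (attempt + 1))
        if short not in resolved:
            failed[short] = status  # keep the status instead of the short URL
        sleep(delay)  # steady pause between different URLs
    return resolved, failed
```

Keeping the failures in a separate dict with their status codes means a short URL is never silently written out as if it were the expanded one, and the failed entries can be retried later.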