
I have the following site, http://www.asd.com.tr, and I want to download all of its PDF files into one directory. I've tried a couple of commands, but I'm not having much luck.

$ wget --random-wait -r -l inf -nd -A pdf http://www.asd.com.tr/

With this command only four PDF files were downloaded, even though there are several thousand PDFs available on the site. Check this link:

For instance, hundreds of files are in the following folder:

But I can't figure out how to reach and download them all. There are several folders under this subdirectory, http://www.asd.com.tr/Folders/, and thousands of PDFs inside them.

I've also tried to mirror the site with the -m option, but that failed too.
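
For reference, the mirror attempt was roughly along these lines (the exact flags may have differed):

$ wget -m -nd -A pdf http://www.asd.com.tr/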

Any more suggestions?

slm
eddie skywalker

1 Answer


First, verify that the site's terms of service permit crawling it. Then one solution is:

mech-dump --links 'http://domain.com' |  # list every link found on the page
    grep -i '\.pdf$' |                   # keep only the links ending in .pdf
    sed 's/ /%20/g' |                    # percent-encode spaces in the names
    xargs -I% wget 'http://domain.com/%' # fetch each PDF
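
Note what the pipeline assumes: the hrefs printed by mech-dump must be relative paths that become valid URLs once the base URL in the wget step is prepended, and spaces must be the only characters that need percent-encoding. Absolute links or other special characters would require adjusting the sed and xargs steps.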

The mech-dump command ships with Perl's WWW::Mechanize module (the libwww-mechanize-perl package on Debian and Debian-like distributions).
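
Applied to one of the folders from the question, that might look like the following. This is only a sketch: SomeFolder is a placeholder for a real folder name, and it assumes the PDF hrefs inside that folder's page are relative to the folder.

mech-dump --links 'http://www.asd.com.tr/Folders/SomeFolder/' |  # links in one folder
    grep -i '\.pdf$' |                                           # keep only the PDFs
    sed 's/ /%20/g' |                                            # encode spaces
    xargs -I% wget 'http://www.asd.com.tr/Folders/SomeFolder/%'  # fetch each one

Repeating the pipeline for each subfolder listed under http://www.asd.com.tr/Folders/ should then pick up the rest.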

Gilles Quenot
  • Can you take a look here? https://stackoverflow.com/questions/68287730/wget-unable-to-download-all-pdfs – x89 Jul 07 '21 at 14:14