wget: problem with URL containing query string [SOLVED]

Mysp
Posts: 47
Joined: Mon 08 Jun 2009, 10:39
Location: Czech Republic

#1 Post by Mysp »

I have already solved this problem, and experienced wget users probably know about it. But it is a little tricky, so it may be useful for users with limited knowledge of wget and other CLI tools (like me). Problem: I need to download quite a lot of files that share the same base URL but differ in the query string (the part of the URL after "?"), like:

Code:

http://www.somedomain.com/?q=abc
http://www.somedomain.com/?q=abd
http://www.somedomain.com/?q=baa
etc.
There can be more than one parameter after "?"; they are then separated with "&" (http://www.somedomain.com/?q=abd&sort=1). This does not matter for the problem described here.
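
For many URLs of this shape, the list can be written to a text file, one URL per line, which is the form wget's -i option reads. This is only a minimal sketch: the base URL and the three query values are the placeholders from the example above, and the variable names are my own.

```shell
#!/bin/sh
# Placeholder base URL and query values, taken from the example above.
base='http://www.somedomain.com/'
queries='abc abd baa'

# Start with an empty list.txt, then append one URL per line.
: > list.txt
for q in $queries; do
    printf '%s?q=%s\n' "$base" "$q" >> list.txt
done

# Inside the file the URLs need no quoting. On the command line, a URL
# containing "&" should be quoted, because the shell treats "&" specially.
```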

Default wget behavior: if you try just

Code:

wget http://www.somedomain.com/?q=abd
in Puppy (Quirky), you will get "HTTP request sent... 200 OK", but you will end up with error messages like "invalid argument..." ... "Cannot write to 'fileNameWith?' (Invalid argument)".

Solution: From the very beginning I suspected that the cause of the error was the "?" in the local file name. But wget has many parameters, so I did not find the right one immediately. I would recommend the wget manual at http://www.gnu.org/software/wget/manual/wget.html. It is not only nicely formatted but also more detailed than wget --help (the wget in Puppy is a slightly older version, but the parameters are the same).

The switch ‘--output-document=file’ (can be shortened to -O) is NOT the solution. The right parameter is ‘--restrict-file-names’. But surprisingly enough, ‘--restrict-file-names=unix’ does NOT work either. I almost gave up, but then I made one last attempt: ‘--restrict-file-names=windows’ does help, even in Puppy (Quirky). The "?" in the file name is changed to "@". To sum up, my current set of wget parameters is:

Code:

wget --restrict-file-names=windows -i list.txt -o logfile.txt --wait=3 --random-wait -U 'some WWW user string'
where:
--restrict-file-names=windows is the key to the solution (see above). Addendum: whether it is needed depends on the file system (Ext2 vs. FAT32); see the next two posts.
-i list.txt the URLs of the files to be downloaded are read from the specified file
-o logfile.txt specifies the name of the log file
--wait=3 makes wget wait 3 seconds between downloads (to avoid overloading the server when downloading many files)
--random-wait makes wget "randomly" vary the wait time (from 0.5 to 1.5 * wait)
-U 'someString' makes wget identify itself as some version of your favorite WWW browser (the so-called User-Agent string). You can visit (for example) http://whatsmyuseragent.com/ to get the appropriate string for your version of Firefox, Opera, SeaMonkey etc.
Note: the downloaded files will have very "ugly" and impractical names. You can batch rename them with Puppy's built-in renamer (look at Menu, Filesystem).
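
For anyone who prefers the command line to the renamer, the renaming can also be sketched as a small shell function. This assumes the downloaded files are named like index.html@q=abd (what I would expect for the example URLs with ‘--restrict-file-names=windows’; check your own download directory first). The function name and the .html suffix are my own choices.

```shell
#!/bin/sh
# Rename files such as "index.html@q=abd" to "abd.html".
# Assumption: every file to rename contains "@q=" followed by the value.
rename_downloads() {
    dir=$1
    for f in "$dir"/*@q=*; do
        [ -e "$f" ] || continue      # nothing matched: the pattern stays literal
        name=${f##*/}                # strip the directory part
        value=${name##*@q=}          # keep what follows the last "@q="
        mv -- "$f" "$dir/$value.html"
    done
}
```

Calling rename_downloads . renames the matching files in the current directory. If a URL has more parameters (like &sort=1), they remain part of the new name.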

I hope this post will be useful to some Puppy users. And while I already have a sufficient solution, I would still appreciate a supplementary explanation from a more knowledgeable user: why does the parameter ‘--restrict-file-names=unix’ not solve the problem as I would expect? The explanation is in the next two posts.
Last edited by Mysp on Mon 09 Jan 2012, 20:13, edited 1 time in total.
disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

#2 Post by disciple »

Are you downloading to a filesystem in a Windows format (FAT, NTFS)?
Mysp
Posts: 47
Joined: Mon 08 Jun 2009, 10:39
Location: Czech Republic

Yes, you are right (OK on Ext2, problem on FAT32)

#3 Post by Mysp »

Yes, you are right. I did not realize that it is a question of the file system rather than the operating system. I have Ext2 on sda1 (with a frugal install of Puppy and Quirky) and FAT32 on sda2 (for compatibility reasons).
I made my first tests on sda2 (which is my DATA disk: I keep the save file as small as possible and save all data outside it, on a separate partition).
I retested it today and it is exactly as you predicted:
a) wget on Ext2 works even without ‘--restrict-file-names’
(but I would avoid "?" in file names anyway)
b) on FAT32 it is necessary to use ‘--restrict-file-names=windows’
Thank you for your answer; now it is 100 percent clear.
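
Since the result depends on the file system of the target directory, one could test the directory before downloading. This is only a rough sketch: the function name and the probe file name are made up, and it simply tries to create a file with "?" in its name.

```shell
#!/bin/sh
# Print "unix" if the directory accepts "?" in file names, else "windows".
# Note: a missing or read-only directory is also reported as "windows".
pick_restrict_mode() {
    dir=$1
    probe="$dir/wget-probe?$$"       # made-up probe name; $$ is the shell's PID
    # Run the redirection in a subshell so a failure cannot abort the script.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f -- "$probe"
        echo unix
    else
        echo windows
    fi
}
```

It could then be used as wget --restrict-file-names="$(pick_restrict_mode .)" -i list.txt when run from the directory where the files will be saved. On Ext2 I would expect it to print "unix" and on FAT32 "windows", in line with the tests above.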