The reason I chose this project is that I sometimes download lecture slides for courses I'm interested in. For example, since MIT has opened up most of its courses, I want to download the lecture files to see how they teach the material and what the homework looks like. Downloading all of those files by clicking every link was exhausting. This is where my project lends a hand to make things easier.
Actually, you can already download files from a website using the wget command. To download specific types of files, such as PDF files, you can give the file extension with the -A parameter. Let's see an example of this.
wget -r -A.pdf http://www.cs.ozan.edu/~yildiz/prog101/
This command will download all PDF files from the given website, which is supposed to contain lecture slides and homework.
Since the aim of the project is to use grep, sed, awk, and cut, and to provide a GUI to the user, I had to take a different approach. And I came up with this:
How does the script work?
- After the user enters comma-separated URLs of web pages, the script builds a list of URLs using the awk command.
- The script has a loop to iterate over those URLs.
- Then, the source of each web page is downloaded with the wget -k -O command. The -k parameter converts relative links into absolute links.
- Then, grep extracts these absolute links by looking at the file extensions and writes them into a file. (The extensions of the files to be downloaded are predefined.)
- Another loop downloads all the files one by one. A progress bar advances as each file is downloaded.
- The script also keeps a history of the downloaded files. (A rough sketch of these steps is shown below.)
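To make these steps concrete, here is a minimal sketch of how such a script could be put together. It assumes zenity is available for the input dialog and the progress bar; the variable names, the extension list, and the history file path are illustrative choices, not necessarily the ones used in my actual script.

#!/bin/bash
# Minimal sketch of the steps above (assumes zenity for the GUI parts).

extensions="pdf|ppt|pptx|doc|docx|zip"      # file types to download (illustrative)
history_file="$HOME/.downloader_history"    # where download history is kept

# Ask the user for comma-separated URLs and split them with awk.
input=$(zenity --entry --title="Downloader" --text="Enter comma-separated URLs:")
urls=$(echo "$input" | awk -F',' '{ for (i = 1; i <= NF; i++) print $i }')

for url in $urls; do
    # Download the page source; -k rewrites links to files we did not
    # download so that they become absolute URLs.
    wget -q -k -O page.html "$url"

    # Extract absolute links whose extension matches the predefined list.
    grep -Eo "https?://[^\"' ]+\.($extensions)" page.html > links.txt

    total=$(wc -l < links.txt)
    [ "$total" -gt 0 ] || continue

    count=0
    # Download the files one by one, feeding a zenity progress bar.
    while read -r link; do
        wget -q "$link"
        echo "$link" >> "$history_file"       # keep download history
        count=$((count + 1))
        echo $(( count * 100 / total ))       # progress percentage
        echo "# Downloading: $link"           # label shown on the bar
    done < links.txt | zenity --progress --title="Downloading files" --auto-close
done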
And of course, some screenshots of the project:
This is the part where the user enters the links.
The downloading screen. We can see the file that is currently being downloaded.
I put the code in a gist. Click here to see it. You can add new file extensions to be downloaded by modifying the extensions variable.
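In the sketch above, for example, adding new file types would only require extending the extension list (the exact variable in the gist may differ):

# Hypothetical: also download mp4 and rar files.
extensions="pdf|ppt|pptx|doc|docx|zip|mp4|rar"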