How to download all files (but not HTML) from a website using wget?


wget can download all files from a website except HTML files if you combine its recursive mode with its accept/reject filters. A common method is described below.

wget is a command-line tool for downloading files over HTTP, HTTPS, and FTP. To skip HTML files, we can use its file-name filters: an accept list (-A) and a reject list (-R).

The specific command is as follows:

```bash
wget -r -l inf -A pdf,jpg,png,mp3 -nd -np -R html,htm http://example.com
```

Here are the parameters used:

  • -r: Enables recursive downloading: wget starts at the given URL and follows the links it finds.
  • -l inf: Removes the recursion depth limit (by default wget stops at depth 5).
  • -A: Sets the accept list, a comma-separated list of file name suffixes or patterns; pdf,jpg,png,mp3 means only files with these extensions are kept.
  • -nd: Does not recreate the site's directory hierarchy; all files are saved in the current directory, and if a name occurs more than once, later copies get numeric suffixes such as .1.
  • -np: Never ascends to the parent directory during recursion, so only content at or below the starting path is retrieved.
  • -R: Sets the reject list; html,htm tells wget not to keep HTML files.
  • http://example.com: The target website URL.
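The same command can be written with wget's long option names, which makes each flag self-documenting. The sketch below wraps it in a bash function; the function name download_types and the DRY_RUN switch (which prints the command instead of running it) are hypothetical additions for illustration, not wget features.

```bash
#!/usr/bin/env bash
# Long-option form of: wget -r -l inf -A <types> -nd -np -R html,htm <url>
# download_types and DRY_RUN are hypothetical names used for this sketch.

download_types() {
  local url=$1 types=$2      # types: comma-separated, e.g. "pdf,mp3"
  local -a cmd
  cmd=(
    wget
    --recursive              # -r: follow links from the start URL
    --level=inf              # -l inf: no depth limit (default is 5)
    --accept="$types"        # -A: keep only files with these suffixes
    --no-directories         # -nd: save everything into the current directory
    --no-parent              # -np: never ascend above the start path
    --reject=html,htm        # -R: do not keep .html/.htm files
    "$url"
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"         # print the command instead of running it
  else
    "${cmd[@]}"              # run wget with the assembled arguments
  fi
}
```

For example, `DRY_RUN=1 download_types http://example.com pdf,jpg,png,mp3` prints the full command, and dropping `DRY_RUN=1` actually runs wget. Keeping the arguments in an array avoids word-splitting problems if the URL or type list contains unusual characters.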

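With this configuration, wget recursively follows links from the start URL and keeps only files with the listed extensions. It still has to download HTML pages to find those links, but it deletes each page after parsing it, so no HTML files remain on disk when it finishes. Note that -A acts as a whitelist: files of any unlisted type (for example .zip) are skipped as well, even though they are not HTML.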
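To keep every non-HTML file rather than a fixed list of types, one possible sketch drops -A, relies on -R html,htm alone, and then counts any HTML files left on disk. The function names fetch_non_html and count_html_files are hypothetical, and the optional destination directory (passed to wget's --directory-prefix) is an assumption added for convenience.

```bash
#!/usr/bin/env bash
# Sketch: download every non-HTML file, then verify no HTML was kept.
# fetch_non_html and count_html_files are hypothetical helper names.

# Recursively fetch everything linked below URL, saving into DEST
# (default: current directory). wget still downloads HTML pages to
# discover links, then deletes them because they match --reject.
fetch_non_html() {
  local url=$1 dest=${2:-.}
  wget --recursive --level=inf --no-parent --no-directories \
       --reject=html,htm --directory-prefix="$dest" "$url"
}

# Print the number of .html/.htm files (any letter case) under DIR.
# After a successful fetch_non_html run this should print 0.
count_html_files() {
  find "${1:-.}" -type f \( -iname '*.html' -o -iname '*.htm' \) | wc -l
}
```

For example, `fetch_non_html http://example.com downloads && count_html_files downloads` fetches into a hypothetical `downloads` directory and then reports leftovers. The `-iname` test is supported by GNU and BSD find, though it is not part of POSIX.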

For example, to download all lecture materials and audio files from a music school's website, which are mostly PDF and MP3 files, use the same command with that site's URL and change the type list to -A pdf,mp3 so that only the formats you need are kept. In most cases, only the URL and the -A list need to change.

July 30, 2024 00:20
