Hathi Download Helper
Download books from the HathiTrust website quickly and easily.
Quickstart
Simple View Mode
Advanced View Mode
Features: brief overview:
Download resources
Source code and installer are available at:
WWW: https://sourceforge.net/projects/hathidownloadhelper/
WWW: www.facebook.com/hathidownloadhelperTool
WWW: http://www.softpedia.com/get/Internet/Download-Managers/Hathi-Download-Helper.shtml
Comments, feedback, bug reports and questions are welcome: hathidownloadhelper@hotmail.com
You can also use the built-in contact form; no email address is required (see Help → Contact & Bug report).
User interface elements
The main window of Hathi Download Helper is separated into three group boxes. Each group box corresponds to a certain processing step:
Menu Bar
The menu bar provides the following options:
Group Boxes
The 'book information' group box holds the URL input field as well as the retrieved book information: title, number of pages, book ID, publisher and author. After you enter the book URL (or, alternatively, the book ID) and press the 'get Book info' button, Hathi Download Helper reads the HTML document. If desired, a proxy server can be used by selecting the corresponding checkbox. When the book is blocked, e.g. due to copyright restrictions, the message "Received empty document..." is displayed next to the progress bar.
In the 'PDF merge & conversion' group box the user can choose between the following options:
Features
This section describes the manual proxy, AutoProxy and WebProxy features as well as the file naming and folder structure used by Hathi Download Helper. It also explains how to use Hathi Download Helper as a PDF merger and as an image-to-PDF converter.
Manual Proxy
Hathi Download Helper provides an option to use a network proxy. The proxy specification (IP address and port number, proxy type, user name and password) has to be defined by the user. Hathi Download Helper uses this proxy connection as long as the 'use proxy server' checkbox is selected. The implementation is based on the QNetworkProxy class of Qt 4.7.4. The following types are supported:
When the proxy connection is enabled, Hathi Download Helper checks whether a connection to the proxy server can be established and whether hathitrust.org is reachable.
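Hathi Download Helper implements this with Qt's QNetworkProxy class. Purely to illustrate the same idea outside Qt, the Python sketch below builds an HTTP proxy URL from the user-defined specification and runs a simple reachability check through it. The function names and the test URL are hypothetical, only HTTP proxies are covered (Python's urllib has no SOCKS support), and this is not HDH's own code.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(host: str, port: int, user: str = "", password: str = "") -> str:
    """Build an HTTP proxy URL. Credentials are percent-encoded so that
    characters such as '@' or ':' in a password do not break the URL."""
    credentials = ""
    if user:
        credentials = quote(user, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"http://{credentials}{host}:{port}"


def make_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Route both http and https requests through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def is_reachable(opener: urllib.request.OpenerDirector,
                 url: str = "https://www.hathitrust.org/",  # assumed test URL
                 timeout: float = 10.0) -> bool:
    """Return True if `url` answers through the proxy without an error."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False
```

Usage, e.g.: `is_reachable(make_opener(build_proxy_url("10.0.0.1", 8080, "me", "secret")))`. urllib's ProxyHandler reads the user name and password from the proxy URL and sends them as proxy credentials.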
AutoProxy feature
Hathi Download Helper provides an option to connect to network proxies automatically. For this purpose the downloader uses proxy lists that are freely available on the internet and checks whether the given hathitrust.org book URL is reachable. Depending on the selected option (see Tools → Proxy), the AutoProxy feature uses proxy servers from either 'US only' or 'all countries'. To use the AutoProxy feature for retrieving book information, select the 'use proxy server' checkbox within the book information group box. To enable the AutoProxy feature for downloads, select the 'enable AutoProxy' checkbox within the download settings group box.
NOTE: Some books are only available when viewed in the US. For those books the 'US only' option has to be selected. Please note that you are only allowed to view these books when you are in the US.
Further information:
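The selection logic described above (filter by 'US only' or 'all countries', then keep the first proxy through which the book URL is reachable) can be sketched in Python as follows. The ProxyEntry record, its country field and the injected reachability check are assumptions for illustration; the format of the public proxy lists HDH actually reads is not documented here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ProxyEntry:
    host: str
    port: int
    country: str  # assumed two-letter country code taken from the proxy list


def pick_proxy(candidates: Iterable[ProxyEntry],
               us_only: bool,
               is_reachable: Callable[[ProxyEntry], bool]) -> Optional[ProxyEntry]:
    """Return the first candidate that passes the country filter and the
    reachability check, or None if no candidate qualifies."""
    for entry in candidates:
        if us_only and entry.country.upper() != "US":
            continue  # 'US only' mode: skip proxies located elsewhere
        if is_reachable(entry):
            return entry
    return None
```

Passing the check in as a function keeps the selection logic testable without network access; in practice it would request the hathitrust.org book URL through the candidate proxy.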
WebProxy - disabled in HDH 1.1.1
Hathi Download Helper provides an option that uses a large number of random web proxies to download data from hathitrust.org. This feature redirects all download requests to free web proxy services so that the download can continue while the server-side download limit for the user is active. Please note the following:
Restrictions:
• Works only for non-restricted books that are also public domain when viewed outside the US.
• Download speed varies strongly.
Important advice:
• Since this feature accesses a large number of random web pages, an up-to-date virus scanner is recommended.
• There is no guarantee that this feature works properly.
WebProxy safety measures
To minimize the risk of unwanted behaviour, the following safety measures are implemented:
File and folder structure
Hathi Download Helper creates the following sub-folder structure for downloaded data inside the target directory:
Note: When a download is restarted (same book ID, same destination folder), all files downloaded in the previous session will be overwritten unless the 'resume book download' option is selected. In that case the downloader checks whether a file with the same name already exists and does not download that file again.
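The resume check amounts to skipping every file whose name already exists in the destination folder. A minimal Python sketch under that reading (the function name and the list of expected file names are hypothetical; HDH's own check may differ in detail, e.g. in how it treats partially written files):

```python
from pathlib import Path
from typing import Iterable, List


def files_to_download(target_dir: str, file_names: Iterable[str], resume: bool) -> List[str]:
    """Return the file names that still need to be downloaded.

    Without resume, every file is fetched again (existing copies get
    overwritten). With resume, files that already exist under the same
    name in `target_dir` are skipped.
    """
    names = list(file_names)
    if not resume:
        return names
    target = Path(target_dir)
    return [name for name in names if not (target / name).exists()]
```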
Hathi Download Helper creates the following sub-folder structure for converted data inside the source directory:
Note: Since the download target folder is also the source folder for conversion, all existing PDF files within the 'pdfs' folder will be overwritten when 'single pdf' conversion is selected as the output option!
Namespace
Hathi Download Helper uses a fixed naming scheme for downloaded data, starting with the document ID (with reserved characters removed). This naming scheme is used for PDF files, images and OCR text files (HTML files). Example for document ID hvd.32044038439063:
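The ID clean-up can be sketched as a single regular-expression substitution. Which characters HDH treats as reserved is not stated here; the Python sketch below assumes the characters Windows forbids in file names (< > : " / \ | ? *), which matters for document IDs containing ':' or '/'. The function name is hypothetical.

```python
import re

# Assumption: "reserved" = characters that are not allowed in Windows file names.
_RESERVED = re.compile(r'[<>:"/\\|?*]')


def sanitize_document_id(doc_id: str) -> str:
    """Strip reserved characters from a document ID so the result can be
    used as the common prefix of the downloaded file names."""
    return _RESERVED.sub("", doc_id)
```

Under this assumption hvd.32044038439063 stays unchanged (the dot is not reserved), while a hypothetical ID such as abc.ark:/13960/t0x becomes abc.ark13960t0x.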
Hathi Download Helper as PDF merger
Hathi Download Helper can merge arbitrary PDF files using the 'pdftk' (PDF toolkit) application. To do so, select the "merge pdfs" radio button. If you select a folder that does not contain files or folders downloaded by Hathi Download Helper, a file dialog for selecting the files will appear. This dialog is also available from the menu bar (Tools → Merge PDFs). On Linux or Mac OS systems you have to install the 'pdftk' tool (http://www.pdflabs.com). On Windows, Hathi Download Helper ships with a copy of 'pdftk'.
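For reference, pdftk merges files with its cat operation: `pdftk in1.pdf in2.pdf cat output merged.pdf`. The Python sketch below builds and runs that command line; the function names and the timeout value are assumptions, and HDH's own invocation may differ.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence, Union

PathLike = Union[str, Path]


def pdftk_merge_command(inputs: Sequence[PathLike], output: PathLike,
                        pdftk: str = "pdftk") -> List[str]:
    """Build the argument list for: pdftk <inputs...> cat output <output>."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return [pdftk, *(str(p) for p in inputs), "cat", "output", str(output)]


def merge_pdfs(inputs: Sequence[PathLike], output: PathLike, timeout: float = 600.0) -> None:
    """Run pdftk; raises if pdftk is missing, fails, or exceeds the timeout."""
    exe = shutil.which("pdftk")
    if exe is None:
        raise FileNotFoundError("pdftk was not found on the PATH")
    subprocess.run(pdftk_merge_command(inputs, output, exe), check=True, timeout=timeout)
```

Usage, e.g.: `merge_pdfs(sorted(Path("pdfs").glob("*.pdf")), "book.pdf")`. Note that sorting by name only yields page order if the page numbers in the file names are zero-padded.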
Hathi Download Helper as Image-to-PDF converter
Hathi Download Helper can convert a number of different image formats into PDF files. To do so, select the "convert & merge images to pdf book" or the "convert images to single pdf files" radio button. If you select a folder that does not contain files or folders downloaded by Hathi Download Helper, a file dialog for selecting the files will appear.
Note: Since the download target folder is also the source folder for conversion, all existing PDF files within the 'pdfs' folder will be overwritten when 'single pdf' conversion is selected as the output option!
Installing pdftk
Hathi Download Helper uses the 'pdftk' application to merge existing PDF files. To install pdftk, carry out the following steps for your operating system:
FAQ
Hathi (pronounced hah-tee) is the Hindi word for elephant, an animal highly regarded for its ability to suck a huge amount of water into its trunk and blow the water into its mouth. In computer networks, to download means to receive data to a local system from a remote system, or to initiate such a data transfer. Helper refers to a device that helps. In combination, the words convey the key benefit users can expect from this application: downloading pages or complete books in an easy way.
The Hathi Download Helper logo was originally created by 'lemming' and titled 'Cartoon elephant'. It is in the public domain. For further information visit: https://openclipart.org/detail/17810/cartoon-elephant
Hathitrust.org limits the download of all file types. If you download too many files in a short period of time, you will be forced to wait for some time. For PDF files the limit is about 15 files per 5 minutes, after which you have to wait for approx. 5 minutes. You may activate the WebProxy feature to download data via several web proxies during this waiting period.
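A client can avoid running into this limit by throttling itself. The following Python sketch is a sliding-window limiter that allows at most 15 requests within any 300-second window, matching the approximate PDF limit stated above. The class is hypothetical and not part of HDH, and the actual server-side rules may differ.

```python
import collections
import time
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` requests within any `window` seconds."""

    def __init__(self, max_requests: int = 15, window: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._sent: Deque[float] = collections.deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()  # this request has left the window
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window - (now - self._sent[0])

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a request is allowed, then record it."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self._sent.append(self.clock())
```

Calling `limiter.acquire()` before each file request spaces the downloads out instead of triggering the server-side wait; the clock and sleep functions are injectable so the behaviour can be tested without real waiting.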
This behaviour may occur after excessive download requests. In this case Hathitrust.org may block the user's IP address for approx. 5 minutes.
Hathi Download Helper uses a PDF printer (Qt::QPrinter) that 'prints' the images into the PDF file. Since QPrinter only supports the JPG image format, all pages are stored as JPG images inside the PDF file. Therefore even text-only pages are stored the same way as full-resolution images.
Hathi Download Helper does not have any OCR functionality. Instead, it uses the OCR files generated by Hathitrust.org. The downloaded OCR files are stored as HTML files on your hard disk. During PDF creation, the OCR text is printed on each page and overlaid by the corresponding image.
Hathi Download Helper uses free proxy servers whose IPs are published and updated online. Therefore the quality of service varies strongly. Normally HDH requests up to 20 proxy IPs from one source. Sometimes a source only provides outdated IPs. HDH requests a new IP list after it has checked every IP of the previously received list. You can force an update of the IP list by refreshing (unchecking and re-checking) the 'use Proxy Server' checkbox. If the proxy verification still fails, check whether HDH is blocked by your firewall. An easy test is to run a manual update check: Menu bar → Help → check for update.
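The refresh behaviour described here (take up to 20 proxy IPs from one source and fetch a new list only once every entry of the previous one has been checked) maps naturally onto a generator. In this Python sketch the fetch function, which would download a published proxy list, is a hypothetical placeholder.

```python
from typing import Callable, Iterator, List


def proxy_candidates(fetch_list: Callable[[], List[str]], limit: int = 20) -> Iterator[str]:
    """Yield proxy addresses batch by batch.

    Each batch holds at most `limit` entries from one fetch; a new list is
    requested only after every entry of the previous batch has been yielded.
    The generator stops when a fetch returns an empty list, so the caller
    should stop iterating as soon as a proxy passes its verification check.
    """
    while True:
        batch = fetch_list()[:limit]
        if not batch:
            return
        yield from batch
```

A caller would loop over `proxy_candidates(fetch)` and break at the first proxy that passes verification; outdated entries simply cost one failed check each before the next list is requested.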
ERROR FIXING
Hathi Download Helper uses the 'pdftk' application to merge existing PDF files. This error may occur if the pdftk files lack the required permissions. To fix this error, see 'Installing pdftk'.
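On Linux and Mac OS this error typically means that the pdftk binary is not on the PATH or lacks the execute permission. The Python sketch below diagnoses both cases; the function name and messages are illustrative only, and the suggested `chmod +x` fix applies to Unix-like systems.

```python
import os
import shutil
from typing import Optional


def diagnose_pdftk(path: Optional[str] = None) -> str:
    """Return "ok" if pdftk can be executed, otherwise a short diagnosis.

    `path` is an explicit location of the pdftk binary; if omitted, the
    PATH is searched (shutil.which only returns executable files).
    """
    exe = path or shutil.which("pdftk")
    if exe is None:
        return "pdftk not found on PATH"
    if not os.path.isfile(exe):
        return f"{exe} does not exist"
    if not os.access(exe, os.X_OK):
        return f"{exe} is not executable (on Linux/Mac OS try: chmod +x {exe})"
    return "ok"
```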
Hathi Download Helper uses the 'pdftk' application to merge existing PDF files. This error may occur due to corrupted PDF files; a warning dialog may name the corrupted files. To fix this error, do the following:
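To find suspect files before merging, a rough heuristic is to check that each file starts with the `%PDF-` header and contains the `%%EOF` marker near its end; truncated downloads or HTML pages saved under a .pdf name fail this test. The Python sketch below is only such a heuristic (function names hypothetical): it does not validate the PDF structure and is not HDH's own corruption check.

```python
import os
from pathlib import Path
from typing import List


def looks_like_complete_pdf(path: Path, tail_size: int = 1024) -> bool:
    """Heuristic: '%PDF-' header at the start and '%%EOF' within the last bytes."""
    with open(path, "rb") as f:
        head = f.read(5)
        size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        f.seek(max(0, size - tail_size))
        tail = f.read()
    return head == b"%PDF-" and b"%%EOF" in tail


def find_suspect_pdfs(folder: str) -> List[str]:
    """Names of the *.pdf files in `folder` that fail the heuristic."""
    return sorted(p.name for p in Path(folder).glob("*.pdf")
                  if not looks_like_complete_pdf(p))
```

Files reported this way are candidates for being deleted and downloaded again.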
There are some problems with the Qt framework on Mac OS X. Updating/changing the application font should fix this problem: select 'Options' → 'Gui setup' → 'Font setup' from the menu bar.
Hathi Download Helper uses the 'pdftk' application to merge existing PDF files. This error may occur due to corrupted PDF files; a warning dialog may name the corrupted files. To fix this error, do the following:
Change log
2013.05.18 | initial version 1.0.0
2013.05.19 | version 1.0.1 released:
fixed bug in image resolution setting after 'page setup' dialog, renamed image files in Qt resources, copied image files into application directory
2013.05.24 | version 1.0.2 released:
changed development environment to 4.7.4, added compiler switch for Qt 5.x, tested on Linux and Windows systems, added options for GUI style and fonts, updated GUI, bug fix for missing OCR files, reduced freezing effect of GUI during pdf creation, added 'pdftk' binary for linux/OS, added selection for proxy type.
2013.06.03 | version 1.0.3 released:
bug fix for proxy type selection. Moved pdf merge & conversion into QThread worker to eliminate freezing effect of GUI during processing. Changed usage from QPixmap to QImage for pdf creation. Changed OCR text extraction method to reduce memory usage (QWebkit is really greedy). Improved text font size adjustment method. Added Author and Publisher information. Changed Windows installer creation from QT framework installer to Inno Setup compiler to fix kernel32.dll error on Win XP.
2013.07.02 | version 1.0.4 released:
improved download performance by using parallel download requests (it is really much faster now :-D ), added encryption for proxy password, added 'check for update' feature, added batch job feature for downloading several books at once, added link export function
2013.08.18 | version 1.0.5 released:
re-implementation of all GUI elements and dialogs, fixed text clipping of GUI elements, fixed page shrinking on pdf creation due to long ocr text, improved download speed, re-designed help file
2013.10.27 | version 1.0.6 released:
bug fixes: lost destination path for single pdf-file creation, application crash on manual file selection. Added new features for batch job dialog: 'edit book', 'load job', 'save job', added gimmicks for Halloween and Christmas, minor changes.
2014.03.30 | version 1.0.7 released:
added new download options: webproxies, resume of book downloads, added user settings dialog, added auto-update option, coding: separated GUI from file downloader.
2014.05.06 | version 1.0.8 released:
adjustments due to changes in hathitrust.org link structure.
2014.10.26 | version 1.0.9 released:
Updated GUI, added link collector feature, added history feature, added automatic proxy feature (including US proxies): 'AutoProxy', added verification check for proxy connections, improved pdf merging process, added field for copyright information, added check for corrupted pdf and image files, added automatic download resume in case of corrupted pdf files, minor bug fixes, changed development environment to 4.8.0
2014.11.30 | version 1.1.0 released:
bug fixes: fixed possible application crash on proxy activation, fixed PDFTK problems with too long file paths. Changes: disabled change-over from WebProxy to AutoProxy feature and vice versa during download, revised behaviour of various GUI controls to improve usability
2015.03.08 | version 1.1.1 alpha (unreleased):
Changes: adjusted timing of AutoProxy feature, added option to preserve existing pdf books with identical name in same folder, adjustments for Mac OS compatibility.
2016.05.19 | version 1.1.1 released:
Changes: adjustments to obtain SSL/TLS compatibility for https requests. (AutoProxy / WebProxy disabled)
2016.06.07 | version 1.1.2 released:
Bug fixes: fixed and improved autoproxy feature. Changes: enabled resize of GUI, added message / bug report feature.
2016.09.05 | version 1.1.3 released:
Bug fixes: pdf merging failed when downloading books with more than 1300 pages, fixed automatic update check feature. New features: Download whole books as 1 pdf when whole book download is available, added Pdf merging dialog to merge arbitrary pdf files.
2017.07.23 | version 1.1.4 released:
Bug fixes: Fixed connection problems to hathitrust webpage. Fixed broken feedback form.
2017.12.20 | version 1.1.5 released:
Bug fixes: Removed obsolete proxy sources. Improved auto proxy feature. Added SQL database. Added option to remove downloaded page data automatically.
2018.05.15 | version 1.1.6 released:
Bug fixes: Increased timeout for pdf merging process. Fixed crash on manual pdf merging process. Updated restriction check for image download. Adapted default font settings for Mac OS. New features: Added simple view mode, added 1-click-download feature, added forward connection check for proxy server
2018.07.06 | version 1.1.7 released:
Bug fixes: Fixed (auto) update check feature, Fixed saving user settings problem on linux