rsync is the preferred method for cloning and syncing with the archive.
This site has no javascript, data bases or any of that Web 2.0 stuff
It was designed to be cloned as-is to be an exact copy of bitsavers.org
As of Oct, 2023, the entire archive is around 1.4tb
There was a big bump in archive size because of the preservation of NetBSD iso archival releases
If you are syncing, be warned that file names, dates and
their location in the hierarchy change (these aren't permalinks)
The --delete in the rsync is important
Archive Indexing
An index file is maintained at the top level of each category heirarchy IndexByDate.txt is updated each time an indexed document is added to the archive.
These files are what drives the rss feeds
Documents here are kept in a minimal subset of PDF format, just using it as a
container for lossless Group 4 fax compression (ITU-T recommendation T.6) images.
Contributions are normally post-processed by tools to put them in exactly this format.
Documents were scanned using a Ricoh IS520 400dpi 30ppm B&W duplex production scanner
from the late 90's through 2007.
Conversion to higher performance Kodak DS 2500D scanning occured in July, 2007.
The 2500D is an OEM version of the Panasonic KV-S2055 scanner.
In 2008, the Kodak was replaced by a Panasonic KV-S3065W, which
is capable of duplex color 600dpi scanning, and has the capability to scan
sheets 100 inches long.
Post-processing is done using Lemkesoft's
TIFF to PDF conversion is done using Eric Smith's tumble
A final OCR step is done with Acrobat Pro.
I've continued to use tumble since it is MUCH faster than Acrobat for tif to pdf conversion.
The preferred form for any contributed text scan is as a collection of lossless
Group 4 fax compression (ITU-T recommendation T.6) images saved as TIFF
files with a minium scan resolution of 400 dpi.
Lower scan resolutions produce noticable artifacts if a page needs to be
straightened in post-processing.
Lossy compression formats, such as JPEG, should NEVER be used to save pages
of text, since the compression format destroys edge resolution and contrast
OCR
OCR has been part of the post-processing of scans for many years now
and is slowly being applied to older pdf files. It is a slow process and
it will take many years to complete.
Document Scanning Station
Tape processing over the years
These were taken in rooms that no longer exist at CHM, ca. 2006.
The rooms were demolished when the Revolution exhibit was built.
They were roughly where the gift shop and orientation theatre are now.
You can see four XServe RAIDs which are still in use in 2021 with 2.5" 1tb Toshiba SATA drives and PATA/SATA adapters.
Where does the source material come from?
Most of the documents are from my personal collection that I have either bought
or been given over the course of many decades in the computer industry, or have been
loaned to me for scanning.
I have a VERY
large backlog of material to scan and don't actively solicit material to work on.
If I do decide to scan something from a donor
I will return it if requested.
Unless it is a very rare document I probably
won't accept something that requires manual scanning, since scanning time in my
day is limited.
I do not personally archive any paper that has been scanned.
The scanning process I use is destructive. Bindings are removed and paper is recycled.
Original documents that are still in good condition may be donated to the Computer History Museum for archiving, depending on
if they are within CHM's collecting scope.
The CHM running lot number for my donated documents is
X6512.2012
This project was started to downsize my collection of paper in the early 90's and continues
to be its primary purpose.
and.. the site looks this way for a reason, to leave it static and easy to mirror, so don't remind me that it looks like it's from 1995