bitsavers.org

Bitsavers'
Software Archive
Computing Archive
Communications Archive
Components Archive
Magazine Archive
Test Equipment Archive

2025-03-27

People are downloading the ENTIRE site through the web interface, which is INCREDIBLY INEFFICIENT!

USE ANONYMOUS RSYNC.. That's what it's there for!

If you're rsyncing and are willing to offer anonymous rsync service, please let me know at the address below.

As of March 2025, there are over 179,000 files, including over 8.4 million text pages, in the 1.68 TB archive.

Bitsavers Updates RSS

RSS feeds for bitsavers updates are available
bits
communications
components
magazines
pdf
test_equipment

Mastodon

bitsavers' social media presence

Active Mirrors

Web

bitsavers.computerhistory.org
bitsavers.informatik.uni-stuttgart.de
bitsavers.trailing-edge.com
University of Kent
ftpmirror.your.org
www.bighole.nl
decromancer.ca

FTP

bitsavers.informatik.uni-stuttgart.de
University of Kent
ftpmirror.your.org

RSYNC

ftpmirror.your.org
ftpmirror.infania.net

rsync is the preferred method for cloning and syncing with the archive.
This site has no JavaScript, databases, or any of that Web 2.0 stuff.
It was designed to be cloned as-is, as an exact copy of bitsavers.org.

You can clone the entire archive with

rsync -av --delete rsync://bitsavers.org:/bitsavers/ bitsavers/

As of October 2023, the entire archive was around 1.4 TB; by March 2025 it had grown to 1.68 TB.

There was a big bump in archive size because of the preservation of NetBSD ISO archival releases.
If you are syncing, be warned that file names, dates, and
their location in the hierarchy change (these aren't permalinks).
The --delete in the rsync command is important: it removes local copies of files that have been renamed, moved, or removed upstream.
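
Since --delete removes local files that no longer exist upstream, it can be worth previewing a sync before running it for real. The following is a minimal sketch, assuming Python 3 and an rsync binary on the local machine. The helper names are hypothetical; the source URL, flags, and destination directory are copied from the command above, and --dry-run is a standard rsync option.

```python
import subprocess

# Source URL copied verbatim from the clone command above.
SOURCE = "rsync://bitsavers.org:/bitsavers/"


def build_rsync_args(dest="bitsavers/", dry_run=False):
    """Return the argument list for mirroring the archive into dest."""
    args = ["rsync", "-av", "--delete"]
    if dry_run:
        # --dry-run reports what would be copied or deleted without changing anything.
        args.append("--dry-run")
    args += [SOURCE, dest]
    return args


def sync(dest="bitsavers/", dry_run=False):
    """Run rsync and return its exit status (0 means success)."""
    return subprocess.run(build_rsync_args(dest, dry_run)).returncode


# Show the preview command without contacting the server.
print(" ".join(build_rsync_args(dry_run=True)))
```

Calling sync(dry_run=True) first and reading the list of deletions, then calling sync(), is one cautious way to keep a mirror current.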

Archive Indexing

An index file is maintained at the top level of each category hierarchy.
IndexByDate.txt is updated each time an indexed document is added to the archive.
These files drive the RSS feeds.
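
A mirror operator could use the same index files to list recent additions without an RSS reader. The sketch below is only an illustration: the line layout of IndexByDate.txt is not described on this page, so the code ASSUMES each line looks like "YYYY-MM-DD HH:MM:SS relative/path" and skips anything that does not fit. The function names and the sample rows are made up.

```python
from datetime import datetime


def parse_index_line(line):
    """Parse one line in the ASSUMED 'YYYY-MM-DD HH:MM:SS path' layout.

    Returns (datetime, path), or None if the line does not fit.
    """
    parts = line.strip().split(maxsplit=2)
    if len(parts) != 3:
        return None
    try:
        stamp = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return stamp, parts[2]


def added_since(lines, cutoff):
    """Return paths stamped at or after cutoff, newest first."""
    entries = [e for e in map(parse_index_line, lines) if e is not None]
    entries = [e for e in entries if e[0] >= cutoff]
    entries.sort(key=lambda e: e[0], reverse=True)
    return [path for _, path in entries]


# Made-up sample rows, not real archive contents.
sample = [
    "2025-03-27 09:15:00 example/manual_one.pdf",
    "2025-03-01 18:02:41 example/manual two.pdf",
    "not an index line",
]
print(added_since(sample, datetime(2025, 3, 15)))
```

If the real files use a different layout, only parse_index_line would need to change.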

Snapshots/Mirrors

Jul 2004 snapshot of pdp-11.trailing-edge.com
Jan 2005 snapshot of simh.trailing-edge.com
Jun 2012 snapshot of simh.trailing-edge.com
scans from the University of Queensland
NetBSD release archive 1.0 to the present

The PDF Document Format

Documents here are kept in a minimal subset of PDF format, just using it as a
container for lossless Group 4 fax compression (ITU-T recommendation T.6) images.
Contributions are normally post-processed by tools to put them in exactly this format.

Documents were scanned using a Ricoh IS520 400dpi 30ppm B&W duplex production scanner
from the late 90's through 2007.

Conversion to a higher-performance Kodak DS 2500D scanner occurred in July 2007.
The 2500D is an OEM version of the Panasonic KV-S2055 scanner.

In 2008, the Kodak was replaced by a Panasonic KV-S3065W, which
is capable of duplex color 600dpi scanning, and has the capability to scan
sheets 100 inches long.

Post-processing is done using Lemkesoft's Graphic Converter,
mostly just to clean page edges.
TIFF to PDF conversion is done using Eric Smith's tumble,
which has been enhanced over the years to support formats other than TIFF.
A final OCR step is done with Acrobat Pro.
I've continued to use tumble since it is MUCH faster than Acrobat for TIFF to PDF conversion.

The preferred form for any contributed text scan is a collection of lossless
Group 4 fax compression (ITU-T recommendation T.6) images saved as TIFF
files with a minimum scan resolution of 400 dpi.
As the cost of storage declined, I moved to 600 dpi for all of my bitonal scanning in the 2010s.

Lower scan resolutions produce noticeable artifacts if a page needs to be
straightened in post-processing.

Lossy compression formats, such as JPEG, should NEVER be used to save pages
of text, since lossy compression destroys edge resolution and contrast.
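
Contributors who want to check a scan against the guidelines above before sending it could inspect the TIFF tags directly. This is a rough sketch using only the Python standard library, not a tool used for this archive. It reads the first page of a classic (non-BigTIFF) file and looks at the baseline Compression, XResolution, and ResolutionUnit tags; the function names and the tiny synthetic test file are hypothetical.

```python
import struct

COMPRESSION, X_RESOLUTION, RESOLUTION_UNIT = 259, 282, 296  # baseline TIFF tag numbers
GROUP4 = 4  # Compression value for ITU-T T.6 (Group 4 fax)


def read_first_ifd(data):
    """Return {tag: value} for simple SHORT and RATIONAL tags in the first image directory."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a classic TIFF file")
    (ifd,) = struct.unpack(order + "I", data[4:8])
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    tags = {}
    for i in range(count):
        entry = data[ifd + 2 + 12 * i: ifd + 14 + 12 * i]
        tag, ftype, n = struct.unpack(order + "HHI", entry[:8])
        if ftype == 3 and n == 1:  # SHORT, stored inline in the entry
            tags[tag] = struct.unpack(order + "H", entry[8:10])[0]
        elif ftype == 5 and n == 1:  # RATIONAL, stored at an offset
            (off,) = struct.unpack(order + "I", entry[8:12])
            num, den = struct.unpack(order + "II", data[off:off + 8])
            tags[tag] = num / den if den else 0.0
    return tags


def meets_guidelines(data, min_dpi=400):
    """True if the first page is Group 4 compressed at min_dpi or more (horizontal)."""
    tags = read_first_ifd(data)
    dpi = tags.get(X_RESOLUTION, 0.0)
    if tags.get(RESOLUTION_UNIT, 2) == 3:  # 3 = centimeters; 2 (the default) = inches
        dpi *= 2.54
    return tags.get(COMPRESSION, 1) == GROUP4 and dpi >= min_dpi


def sample_tiff(compression, dpi):
    """Build a tiny tag-only TIFF (no image data) purely for trying the checker."""
    header = b"II" + struct.pack("<HI", 42, 8)           # little-endian, IFD at byte 8
    ifd = struct.pack("<H", 3)                            # three directory entries
    ifd += struct.pack("<HHIHH", COMPRESSION, 3, 1, compression, 0)
    ifd += struct.pack("<HHII", X_RESOLUTION, 5, 1, 50)   # rational lives at byte 50
    ifd += struct.pack("<HHIHH", RESOLUTION_UNIT, 3, 1, 2, 0)
    ifd += struct.pack("<I", 0)                           # no further directories
    return header + ifd + struct.pack("<II", dpi, 1)


print(meets_guidelines(sample_tiff(GROUP4, 600)))
```

For a real file, the bytes would come from something like open("page001.tif", "rb").read(), where the file name is just a placeholder. Only the first page of a multi-page TIFF is examined.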

OCR

OCR has been part of the post-processing of scans for many years now
and is slowly being applied to older pdf files. It is a slow process and
it will take many years to complete.

Pictures and Magazines

Pages with pictures are generally processed as 300dpi JPEGs
and are inserted where appropriate, either in color or grayscale.
A separate workflow for magazines, started in the
2020s, fully processes the pages as JPEG2000 using
ViewScan and a Fujitsu fi-5530C2.

Document Scanning Station

Tape processing over the years

These photos were taken in rooms that no longer exist at CHM, ca. 2006.
The rooms were demolished when the Revolution exhibit was built.
They were roughly where the gift shop and orientation theatre are now.
You can see four XServe RAIDs which are still in use in 2021 with 2.5" 1tb Toshiba SATA drives and PATA/SATA adapters.

Where does the source material come from?

Most of the documents are from my personal collection that I have either bought or been given over the course of many decades in the computer industry, or have been loaned to me for scanning.

I have a VERY large backlog of material to scan and don't actively solicit material to work on.

If I do decide to scan something from a donor I will return it if requested.

Unless it is a very rare document, I probably won't accept something that requires manual scanning, since my scanning time each day is limited.

I do not personally archive any paper that has been scanned.

The scanning process I use is destructive. Bindings are removed and paper is recycled.

Original documents that are still in good condition may be donated to the Computer History Museum for archiving, depending on whether they are within CHM's collecting scope.

The CHM running lot number for my donated documents is X6512.2012

This project was started in the early 90's to downsize my collection of paper, and that continues to be its primary purpose.

and.. the site looks this way for a reason, to leave it static and easy to mirror, so don't remind me that it looks like it's from 1995

at bitsavers dot org