Atari classic tape recovery program.  Version September 2003.

Preface.

I have always had a strong resentment towards saving data onto cassette
tape.  This is probably due to the fact that it is sometimes a write only
medium.  Data that is stored on a cassette tape is sometimes impossible to
ever load again.  I have had this problem ever since our first computer,
which was a system based on the S-100 bus.  It had an 8080 processor, and a
cassette interface.  When I finally went out and bought a real nice
computer, it was time to buy the Atari 800 system.  I knew right away that
I wanted an 810 diskette drive.  I thought it would be nice to be able to
load games from cassette too, so I did buy a 410 cassette recorder as well.
When I got home from the store, it turned out that my 810 was not
operational, and thus I was forced to use the 410 for a couple of days.  I
soon found out that the cassette system on the Atari was not much better
than that on our old S-100 bus system.  I rushed back to the store to get
the 810 replaced.  Ever since that first day, I have never trusted a
cassette unit again.  Various models were introduced to replace the old
410.  The 1010 appeared, followed by the XC11 and the XC12.  Some units
worked better than others.  Some had severe problems with noise and hum.  I
cannot begin to count the number of load errors that I have experienced.
And I have had a floppy drive almost from the beginning.  I started to hold
a grudge against cassette units.  Numerous times, I have dreamt about
throwing one from the World Trade Center, or some similar building, just to
see it be smashed into more pieces than are listed in the service
documentation.  This is what I advised people to do when I bought a box of
1010 cassette units.  I sold them strictly for the SIO-cable, telling
people that the 1010 unit came with it for free, and that it could be
thrown off a tall building.  Actually, since it is illegal and dangerous to
throw things off of a tall building, I had to come up with another way to
pay these nasty cassette units back for all the load errors they have
inflicted on me.  This project is the result of my efforts.  It is payback
time, so all you 410's, 1010's, XC11's, XC12's, beware, you are about to
become obsolete.

Purpose of this project.

The design goal of this project is to retrieve the data that is stored on
cassette tapes.  This sounds like a sensible thing to do with a storage
device.  That is, unless you design cassette interfaces.  The data can be
stored on a mass storage device on some host system.  It should be possible
to load the data from the host system into the Classic Atari.  The data
could also be re-saved onto cassette tape, thus cleaning up faulty tapes
that contain dropouts, spikes, noise, or other problems.  Data that is
damaged can be restored, or repaired.  The data on the new tape could be
saved at a higher baudrate, thus reducing the time required to load the
tape.  The quality of the cassette tape can be greatly improved in this
way.  The various emulators that currently exist might add support for the
file format used to store the cassette data.  Booting a cassette tape on
these emulators could then be implemented, and it could load a tape
instantly.  Since the cassette data is retrieved using only a sound card
and some software, there is no need for a cassette unit.  Data can be
retrieved from the cassette tapes even if your cassette unit is broken,
unreliable, or otherwise not available.  Data that is retrieved from a tape
can be processed, and converted to a disk format.

Theory of operation.

In order to be able to retrieve information, we must first know how data is
encoded on Atari tapes.  There is some documentation on this subject in
various books and documents.  I could refer you to these documents, but
since I have gathered all the information for this project, I will simply
list the information here.  Data is encoded on the cassette tape as a
frequency shift keying audio signal, usually referred to as FSK.  For the
non-technical persons, this means that an audio tone is recorded on the
tape that has either one frequency, referred to as the mark tone, or
another frequency, called the space tone.  Normally, the FSK-decoder within
the cassette unit will determine what tone is on the tape.  Based on the
frequency of the tone, the cassette unit will convert this tone to either a
"1" value, or a "0" value.  This value is placed on the serial bus DATA IN
line.  So that is all the cassette unit has to do.  The POKEY chip within
the Atari will be programmed by the Operating System to decode the data.
The data is stored on the tape as a serial stream of bits.  Each byte
contains 8 bits.  Since the data is saved as asynchronous data, there are
also startbits and stopbits.  There is one startbit, and one stopbit.  So
in order to store one byte on the cassette tape, ten bits are encoded.
POKEY will notice the startbit, determine the value of the 8 bits that
follow the startbit, and it will use the stopbit to synchronize again, so
that it will not get confused.  The value of the 8 databits will be
combined to form the byte that was encoded.  The byte is stored in a
buffer, and the next byte will then be decoded.  When a complete record is
decoded, the O.S. will return the status of the operation.  Sometimes it
will tell you that decoding was successful.  Most of the time, a boot-error
or load-error is reported.  Well, actually, sometimes it works most of the
time.
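
To make the framing concrete, here is a minimal sketch in Python.  It is
not the code of this project, and the name decode_frame is made up for the
example.  It assumes the startbit arrives as "0", the stopbit as "1", and
the databits least significant bit first, as listed in the facts section
further on.

```python
def decode_frame(bits):
    """Turn one ten-bit asynchronous frame into a byte value.

    bits holds ten 0/1 values: startbit, eight databits, stopbit.
    Returns the byte, or None when the startbit or stopbit is wrong.
    """
    if len(bits) != 10:
        return None
    if bits[0] != 0 or bits[9] != 1:
        return None  # framing error
    value = 0
    for position, bit in enumerate(bits[1:9]):
        value |= (bit & 1) << position  # first databit is bit 0
    return value


# hex-55 is 01010101, so sent lsb first the databits are 1,0,1,0,1,0,1,0.
frame = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(hex(decode_frame(frame)))
```

A frame with a bad startbit or stopbit simply yields None here.  What a
real decoder should do in that case is a separate question.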

All this sounds fairly simple, and actually it is.  But then again, I have
left out a lot.  For one thing, since this is asynchronous data, POKEY
must determine how long a bit actually is.  If two consecutive bits have a
value of "1", how will POKEY know whether this is one bit, or two bits, or
more?  This is determined by the bit-rate.  This tells POKEY how many bits
are encoded within one second.  The bit-rate for Atari tapes is usually 600
bits per second.  Since a simple FSK encoding scheme is used, this happens
to be the same as the baudrate.  So in the manuals, it says that the tapes
are written at 600 baud.  You probably should not care about the
difference, since who cares anyway?  Now that we know that the speed is 600
bps, we know that the duration of a single bit is 1/600th of a second.
POKEY can determine the level of the serial DATA IN line for that period of
time, and it will know that after this period of time, the next bit will
follow.  Now a tape can be stretched a little, due to wear and tear.  Also,
the speed of the motor inside the cassette unit is not always the same.  So
the engineers at Atari thought it would be a nice idea to compensate for
this in the O.S.  A scheme for dealing with variations in tape speeds and
other problems is designed into the software.  The O.S. will measure the
bit-rate at the start of each block of data.  In order to be able to do
this, two bytes with a fixed value are placed at the beginning of each tape
block.  When the O.S. reads the beginning of a data block, it will know
that these two bytes will be the first bytes to be found.  These bytes have
a known value, and thus they can be used to measure the speed of the tape.
Therefore, they are called the marker bytes.  The value chosen for these
marker bytes is the hexadecimal number 55.  If you translate this into
bits, you will find that this means the value is 01010101.  The "1" and "0"
bits alternate.  This means that there is no way that two consecutive bits
can have the same value.  This makes this value ideal for measuring the
length of a single bit.  Once this length has been measured, it is assumed
that all the other bits in this data block have the same length, since they
must have been written at the same speed.  Or actually, we hope that the
speed of the motor will not vary too much from the value it was when the
first two bytes were read.  The engineers at Atari allowed for quite a
deviation from the standard speed.  Some sources claim that the nominal
speed is 600 baud, but that the actual speed can range from 318 baud to
1407 baud.  If you dive into the Operating System source listing, there is
indeed a mention of these values.  There is a table that tells the O.S.
what value to store in a certain POKEY register.  What people apparently
failed to notice, is that part of this table has been commented out.
Either there was not sufficient space in the ROM's, or some engineer
thought this range was outrageous.  The table actually ranges from 895 down
to 447 baud.  My experiments showed that the highest obtainable speed is
around 820 bps, which is the second value in the table, which begins with
895.  So maybe the logic for accessing this table always skips beyond the
beginning.  I have not looked at that piece of code though.  The actual
speed is allowed to deviate slightly from the programmed speed, so we can
get away with a speed of 875 baud.  Still, 820 bps is quite a difference
from 600 bps.  When you think about it, the variation in tape speed must be
enormous.  You would think this tolerance guarantees that a tape can always
be loaded.  But just stop and think about this.  If the tape speed were
really more than 35 percent higher than normal, not only would the
baudrate be higher, but also the tone of the FSK signals would be some 35
percent higher.  The space tone is supposed to be 3995 Hz.  Add 35 percent
to that, and you get about 5393 Hz, which is even beyond the value of the
mark tone, which is supposed to be 5327 Hz.  This clearly shows that a
speed of 1407 bps would be ridiculous.  The FSK-decoder circuit in the
cassette unit can never ever cope with that much difference.  Too bad some
smart engineer at Atari figured this out, and decided to limit the table.
Make no mistake though, even though the cassette unit hardware cannot cope
with this, POKEY would have no problem at all processing these speeds, if
programmed at the right baudrate.  Remember, POKEY does all serial bus I/O
at 19,200 baud, and even then it can go beyond that.  So, the cassette unit
can probably not go much beyond 600 bps you would say.  Well, we cannot
simply increase the speed of the tape.  But we can increase the FSK
encoding bit-rate, as long as we keep the frequencies of the tones within
the specifications.  More about this later.
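
The arithmetic behind this argument is easy to check.  The sketch below
is only an illustration.  The helper names are mine, and the 35 percent
figure is the one used in the text above.

```python
MARK_HZ = 5327.0     # mark tone as written by POKEY
SPACE_HZ = 3995.0    # space tone as written by POKEY
NOMINAL_BPS = 600.0  # nominal bit-rate


def played_back(value, speed_factor):
    """A tone or bit-rate as it appears when the tape runs faster or slower."""
    return value * speed_factor


def bit_duration_ms(bps):
    """Length of one bit in milliseconds."""
    return 1000.0 / bps


factor = 1.35  # tape running 35 percent too fast
print("space tone becomes", played_back(SPACE_HZ, factor), "Hz")
print("bit-rate becomes", played_back(NOMINAL_BPS, factor), "bps")
print("space now above mark:", played_back(SPACE_HZ, factor) > MARK_HZ)
print("one bit at 600 bps lasts", bit_duration_ms(NOMINAL_BPS), "ms")
```

At a factor of 1.35 the shifted space tone lands above the nominal mark
tone, which is the reason the decoder circuit cannot follow such a tape.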

Okay, so now we know that each block starts out with two marker bytes of
hex-55.  The O.S. will assume that a data block consists of 128 data bytes.
This is the standard length of a cassette block of data, as supported by
the O.S.  Since the block must be 128 bytes long, regardless of how many
bytes are actually stored, the O.S. must have a way of knowing how many
bytes of data are stored within the block.  The third byte found in the
cassette block tells the O.S. how many bytes of data are valid.  It is
called the control byte.  If the block is totally filled with bytes, the
control byte will have a value of hex-FC.  If the block is partially
filled, this byte will be hex-FA, and the 128th data byte will indicate how
many of the 128 data bytes are to be considered valid data.  Since this is
by definition less than 128 bytes, the 128th byte is always available for
this purpose.  Most boot-tapes will always have completely filled blocks.
Files saved by BASIC usually do not have their last block totally filled.
There is also a special code for the control byte to indicate that the end
of the tape file has been reached.  If the byte has the value hex-FE, this
is the end of file indicator.  The data bytes will all contain hex-00 if
the tape is according to the specifications from Atari.  Some boot-tapes do
not have this end of file record though.
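
The rules for the control byte can be written down as a small function.
This is a sketch under the rules stated above, not the O.S. routine.  The
names valid_data, FULL_RECORD, PARTIAL_RECORD and END_OF_FILE are invented
for the example.

```python
FULL_RECORD = 0xFC     # all 128 data bytes are valid
PARTIAL_RECORD = 0xFA  # 128th data byte holds the number of valid bytes
END_OF_FILE = 0xFE     # end of the tape file


def valid_data(control, data):
    """Return the valid data bytes of one record, or None at end of file."""
    if len(data) != 128:
        raise ValueError("a standard record carries 128 data bytes")
    if control == FULL_RECORD:
        return bytes(data)
    if control == PARTIAL_RECORD:
        count = data[127]
        if count >= 128:
            raise ValueError("a partial record must hold fewer than 128 bytes")
        return bytes(data[:count])
    if control == END_OF_FILE:
        return None
    raise ValueError("unknown control byte")


partial = bytes([10, 20, 30]) + bytes(124) + bytes([3])
print(valid_data(PARTIAL_RECORD, partial))
```

Returning None for the end of file record keeps it apart from a partial
record that happens to hold zero bytes.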

After the marker bytes, and the control byte, we will find 128 data bytes.
These are very boring, because they only contain the data or the game or
whatever is saved on the tape.  After these 128 data bytes, we will find
one more byte.  This byte contains the checksum.  If you add all bytes,
from the marker bytes, the control byte, up to the very last data byte, you
will get a checksum.  This checksum must be equal to the value found in
this last byte.  If the value does not match, well, we all know what
happens.  You will get a boot-error, a load-error, a message stating to try
the other side of the tape, or similar problems.  Note that the carry is
always added too.  So if addition of a byte causes a carry, the carry is
immediately added to the checksum.  Since this checksum is merely a simple
sum of all bytes, which is forced not to go beyond 255, this method of
detecting errors does not offer the highest reliability, but somehow, it
manages to detect load-errors every once in a while.  Well, if you paid
attention, you will now know that a physical tape record as found on the
tape contains 132 bytes.  The logical tape record is only 128 bytes.  When
you request a read from a tape, you will only get the 128 data bytes.  The
extra bytes on the tape are there solely for the purpose of retrieving and
checking the data.  There are some boot-tapes that do not adhere to this
standard.  Usually, they have a small boot-loader program that is booted
first.  Then, they take control of the cassette unit, and bypass the
Operating System.  Anything goes from that point on, like making a tape
block with a record length of 1024 bytes or so.  More about this later.
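
The checksum rule, including the carry that is added back in, fits in a
few lines.  The sketch below follows the description above.  The function
names are mine, and verify_record assumes a standard 132 byte physical
record.

```python
def tape_checksum(values):
    """Add all bytes; whenever the sum passes 255, add the carry back in."""
    total = 0
    for value in values:
        total += value
        if total > 0xFF:
            total = (total & 0xFF) + 1  # drop bit 8, add it back as 1
    return total


def verify_record(record):
    """Check a 132 byte record: the last byte must match the other 131."""
    if len(record) != 132:
        raise ValueError("expected 2 markers, control, 128 data, checksum")
    return tape_checksum(record[:131]) == record[131]


print(hex(tape_checksum([0x55, 0x55])))  # the two marker bytes
```

The two marker bytes on their own sum to hex-AA, so a decoder that never
sees the markers as bytes can start its running sum at that value.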

We now know that a tape block usually contains 132 bytes.  Since each byte
has eight bits, and is encoded with a startbit and a stopbit, there are a
total of 1320 bits in one tape block.  If this is encoded at 600 bps, you
will need 2.2 seconds to encode one block of data.  To allow the computer
some time to process the bytes once they have been received, there is a
little pause before the next data block begins.  This pause is called the
Inter Record Gap, or just IRG.  The O.S. has two flavours of IRG.  One is
called the short IRG, and the other is the normal IRG.  They differ mainly
in duration.  A normal IRG is longer than the short IRG, but I suppose
everybody already figured that one out.  Which flavour of IRG is used, is
selected when the cassette file is opened, with an option in the open
command.  The short IRG is also called the continuous open mode, while the
normal IRG is also called the start/stop open mode.  The open mode selects
whether or not the motor is stopped after the record has been written to
the tape.  Since the motor has to be started again when it has been
stopped, the motor must be allowed some time to get up to speed again.
This is why the IRG is longer when the start/stop mode is selected.  The
Inter Record Gap is the gap between one record and the next.  What do we
find between the end of the previous record and the start of the next?
That depends on what type of tape you come across.  Some tapes are written
by companies that mass produced cassette tapes for software companies.  The
way these tapes were written differs a lot from the way a regular Atari
cassette recorder will record data.  One of the differences is found in the
way the IRG is created.  On an Atari cassette recorder, with the start/stop
open mode, the O.S. will stop the tape motor once the record has been
written.  So this means that after the end of a record, there is a fraction
of a second in which the tape will come to a halt.  Whatever is written on
the tape at that time is hard to say.  It is a sure bet that it will be
some noise, that will not look like data at all.  However, if the short IRG
mode is selected, the tape motor will not be stopped.  This means that
there will be no garbage after the end of the record.  Tapes that are mass
produced obviously are never stopped, because they usually use the short
IRG, but also since the tape was usually produced by some sort of
duplication machine, not a real Atari.  The master tape was probably
generated with a professional FSK modulator.  More about this later.  The
stuff that we find on the tape after the end of the record, is called the
Post Record Gap, or just PRG.  At the start of each record, a Pre Record
Write Tone is written so that it will be easy to recognize the beginning of
the record.  This is the PRWT.  If the short IRG is selected, the PRWT will
last 0.25 seconds.  For a normal IRG, this PRWT will last 3 seconds.  With
a normal IRG, the PRG can last up to one second.  With a short IRG, the
length of the PRG can be anything.  It depends on how much time was needed
for the program to start writing the next record.  If you used a simple
save command to write the program or data to the tape, the PRG will be
close to zero seconds, but if it is a data record written by a program, it
might take several seconds.  Then again, what would be the point in using
the short IRG then?  Anyway, what gets written to tape when the short IRG
is in effect is usually a mark tone.  The PRG and PRWT will form the IRG
together, and it is hard to tell where one ends and the other starts.
Since they are the same tone, we only have to worry about what happens when
the tape stops and starts again.  We will have to treat that portion of the
tape as noise.
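
The timing figures can be combined into a rough estimate of how long a
tape file plays.  This is a back-of-the-envelope sketch with made up
function names.  It assumes a 20 second leader, ignores the PRG and the
end of file record, and uses the PRWT lengths given above.

```python
RECORD_BYTES = 132       # 2 markers + control byte + 128 data + checksum
BITS_PER_TAPE_BYTE = 10  # startbit + 8 databits + stopbit


def record_seconds(bps=600):
    """Time needed to play the bits of one physical record."""
    return RECORD_BYTES * BITS_PER_TAPE_BYTE / bps


def file_seconds(data_bytes, prwt_seconds, bps=600, leader_seconds=20.0):
    """Rough playing time of a file: leader plus PRWT and bits per record."""
    records = -(-data_bytes // 128)  # number of records, rounded up
    return leader_seconds + records * (prwt_seconds + record_seconds(bps))


print("one record:", record_seconds(), "seconds")
print("16K, short IRG:", file_seconds(16 * 1024, 0.25) / 60, "minutes")
print("16K, normal IRG:", file_seconds(16 * 1024, 3.0) / 60, "minutes")
```

The difference between the two open modes shows up directly: every
record pays for its own PRWT.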

Since we are on the subject of gaps, there is a special gap at the
beginning of the tape.  Well, since there is nothing before the gap, I
suppose you cannot really call it a gap.  That is probably the reason why
it is called the leader.  Each tape file starts out with a leader of 20
seconds.  This leader will be written as a mark tone.  The leader allows us
to recognize the beginning of the tape file.  Most cassette tapes have a
special leader tape portion to attach the tape to the spindles.  This
portion is usually made of a different material, so no data can be recorded
on it.  Since data is often saved at the beginning of a tape, the leader
will make sure that no data is written before the tape leader has passed
by.  Since the O.S. knows that this leader must be present, it will wait
some time to make sure that the leader has been reached.  If you do not
rewind the tape completely, you will get a load error, since the O.S.
insists on waiting for 9 seconds before even looking at the tape.  The same
is true for the inter record gaps.  The O.S. will wait a small period of
time because it wants to wait for a large portion of the IRG to pass by.
Since these timing values are built into the Operating System, it is
impossible to change these timing values without bypassing the O.S.
completely.

Now that we know what is on the tape, let us see what happens when we try
to read a tape.  When we start reading, we will first encounter the leader,
which is a mark tone.  The O.S. will wait 9 seconds, and after that, it
will start to load a block of data.  The mark tone is converted by the
FSK-decoder in the cassette unit into a "1" level on the DATA IN line on
the SIO bus.  The O.S. uses the POKEY chip to monitor the DATA IN line.
The state of the DATA IN line can be monitored at bit 4 of the SKSTAT
register.  The O.S. will wait until the DATA IN line drops to a "0" value.
When this happens, we will have reached the startbit of the first byte in
the block.  Since the first two bytes are the marker bytes, the O.S. will
use these bytes to measure the bit-rate.  At this time, a counter value is
saved, so that it can be used to compute the bit-rate, based on the
difference in the counter values.  The O.S. will now watch the DATA IN line
to see when the first ten bits have passed.  Then, it will look for another
ten bits for the second marker byte.  When all these bits have passed, the
counter value is taken again, and the difference is computed.  This value
is used to access a table, which holds the value for the bit-rate register
in the POKEY chip.  Once programmed with the proper bit-rate, POKEY will be
able to convert the serial stream of bits into bytes, that can easily be
obtained from the POKEY chip.  We start out by obtaining and storing the
control byte.  After that, we have to retrieve the 128 data bytes.  Each
byte is added to the checksum, and when the checksum byte is finally
obtained from the POKEY chip, the computed checksum is compared to the
checksum byte.  When things work as they should, these match, and the O.S.
will report that the record has been read.  The marker bytes have not
actually been read by the POKEY chip, but they are included in the
checksum.  This is why the O.S. starts the checksum at hex-AA, which is the
sum of two hex-55 bytes.  The O.S. will not need to convert the bits into
bytes itself.  Neither will it have any part in detecting the tone on the
tape.  The FSK-decoder inside the cassette unit will convert the FSK tones
to serial data that is put on the SIO bus.  POKEY will convert this to
bytes, once properly programmed with the proper bit-rate.  To the POKEY
chip, this data looks very similar to the data that comes from a diskette
drive.  Once again, when reading a cassette tape, all the audio processing
is done strictly by the FSK decoder circuit within the cassette unit
itself.
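
The speed measurement boils down to dividing twenty bits by the time
they took.  The sketch below does this for sampled audio rather than for
the counter the O.S. uses, so it is an illustration and not the O.S.
routine.  The function names are invented, and the 425 to 875 bps window
is taken from the facts section below.

```python
def measured_bps(start_sample, end_sample, sample_rate=44100, bits=20):
    """Bit-rate from the sample positions that enclose both marker bytes.

    start_sample is where the first startbit begins, end_sample is where
    the stopbit of the second marker byte ends.
    """
    elapsed = end_sample - start_sample
    if elapsed <= 0:
        raise ValueError("end must come after start")
    return bits * sample_rate / elapsed


def speed_acceptable(bps, low=425, high=875):
    """True when the measured speed lies inside the assumed window."""
    return low <= bps <= high


# At 600 bps, twenty bits take 1/30 second, which is 1470 samples.
print(measured_bps(0, 1470))
print(speed_acceptable(measured_bps(0, 1470)))
```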

When data is saved to a cassette tape, POKEY will be used to generate the
tones.  The audio signals generated by POKEY are simply recorded by the
cassette unit.  When the cassette device is opened for output, the O.S.
will put POKEY into the two-tone mode.  POKEY will also be programmed for a
bit-rate of 600 bps.  Two beeps prompt the user to press return.  When
return is pressed, the motor is started.  At this time, the leader is
created, by simply waiting for 20 seconds.  Then the O.S. considers the
open complete.  When the O.S. is called to save a record to the tape, the
pre-record write tone is generated by waiting for either 0.25 seconds, or 3
seconds if the motor has to be started again, as described earlier.  Then,
the two marker bytes are put into the POKEY register.  When these have been
written, the control byte and the 128 data bytes are also put into the
POKEY register one by one.  While doing this, the checksum is computed, and
when all bytes have been written to tape, the checksum is written to the
tape.  Then, either the motor is turned off, or left on, depending on the
open mode.  Bytes are coded on the tape in ten bits.  First a startbit,
which is a space tone.  Then, the eight databits are coded, starting with
the least significant bit.  A bit that has a "1" value is encoded as a mark
tone, a bit that has a "0" value is encoded as a space tone.  After the
databits, a stopbit is encoded, which is a mark tone.  POKEY will use a
tone of 5327 Hz for the mark tone, and a tone of 3995 Hz for the space
tone.  Most of this sounds familiar, since we already discussed this when
we were looking at how data is read.
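
The write side can be sketched the same way.  The code below builds the
132 bytes of one physical record and lists the tone for every bit of a
byte.  It is an illustration, not what the O.S. does internally.  All
names are made up, and padding a partial record with zeros is my
assumption, since the text does not say what the unused bytes contain.

```python
MARK_HZ = 5327   # tone for a "1" bit and for the stopbit
SPACE_HZ = 3995  # tone for a "0" bit and for the startbit


def checksum(values):
    """Sum of all bytes, with each carry added straight back in."""
    total = 0
    for value in values:
        total += value
        if total > 0xFF:
            total = (total & 0xFF) + 1
    return total


def build_record(data):
    """Return one physical record for up to 128 data bytes."""
    if len(data) > 128:
        raise ValueError("a standard record holds at most 128 data bytes")
    if len(data) == 128:
        control, body = 0xFC, bytes(data)
    else:
        # Partial record: zero padding (assumed), count in the 128th byte.
        control = 0xFA
        body = bytes(data) + bytes(127 - len(data)) + bytes([len(data)])
    record = bytes([0x55, 0x55, control]) + body
    return record + bytes([checksum(record)])


def byte_tones(value):
    """Tones for one byte: startbit, databits lsb first, stopbit."""
    bits = [0] + [(value >> position) & 1 for position in range(8)] + [1]
    return [MARK_HZ if bit else SPACE_HZ for bit in bits]


record = build_record(b"HELLO")
print(len(record), hex(record[2]), record[130])
print(byte_tones(0x55))
```

Each tone in the list lasts one bit time, which is 1/600th of a second
at the nominal speed.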

Just the facts.

Stereo cassette tape, one data channel, one audio channel.
Leader is 20 seconds mark tone.
Normal IRG PRWT is 3 seconds mark.
Short IRG PRWT is 0.25 seconds mark.
Normal IRG PRG is up to 1 second garbage.
Short IRG PRG is 0 to n seconds of garbage.
Startbit is space.
8 data-bits "0" = space, "1" = mark, lsb first.
Stopbit is mark.
Mark is 5327 Hz.  Space is 3995 Hz.
Speed is 600 bps nominal, may run from 425 to 875 bps.
Record starts with two marker bytes of hex-55.
Control byte has a value of hex-FC for a full record.
Control byte has a value of hex-FA for a partial record, the 128th data
byte will indicate the number of bytes actually used.
Control byte has a value of hex-FE when the record is an end-of-file
marker, all data bytes will be zero.
A tape record usually has 128 data bytes.
A checksum is appended to the end of the tape record, it is computed by
adding all bytes, adding the carry back in.

Reading data from a cassette tape.

Now that we know what we will find on a cassette tape, we can try to think
of ways to retrieve the data that is stored on the tape.  An obvious method
would be using an Atari cassette unit.  But that would be too easy.
Besides, I do not like using those machines, just collecting them.
Actually, it should be possible, but more about this later.  I was more
thinking about ways to retrieve the data by means of software.  Since the
data is encoded on the tape in an audio format, we can use regular audio
processing tools to process the data.  What we need is a PC equipped with a
sound card, some software to digitize the audio, and a lot of free hard
disk space.  We have to convert the FSK tones into a wave file.  A regular
audio cassette player should be connected to the sound card.  The tape is
recorded in a stereo format, and we are only interested in the data
channel.  We have to connect the sound card to the proper channel.  We can
then start the wave recording program.  For reasons that I will explain
later, we must set the sampling rate to 44,100.  We only want to record one
channel, so we set it to mono, 8-bits, just to keep it in style.  This
means that the audio will be sampled 44,100 times per second.  The level of
the audio signal is represented by a value that ranges from 0 to 255, since
that is the range that is available within 8 bits.  Therefore, each sample
will be represented by one byte.  When the recording level is not adjusted
properly, the value may range from about 30 to 220 or so, or similar
values.  This does not really matter much, as long as the audio signal has
a reasonable level.  Adjust it as you would with any other sound.  For my
cassette player, this means I have to set the recording level control at
the maximum.  When there is no audio signal, the level will be around 128.
We can now start to record the audio signal to the wave file.
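
Before any decoding is attempted, it pays to check that the recording
really has this format.  The sketch below uses the wave module from the
Python standard library.  It is not the reader used by this project, and
the name read_tape_wave is invented for the example.

```python
import wave


def read_tape_wave(path_or_file):
    """Return the samples of a mono, 8-bit, 44,100 Hz wave file as bytes.

    Each byte is one unsigned sample from 0 to 255; silence sits near 128.
    """
    reader = wave.open(path_or_file, "rb")
    try:
        found = (reader.getnchannels(), reader.getsampwidth(),
                 reader.getframerate())
        if found != (1, 1, 44100):
            raise ValueError("expected mono, 8-bit, 44100 Hz, got %r"
                             % (found,))
        return reader.readframes(reader.getnframes())
    finally:
        reader.close()
```

A call such as read_tape_wave("tape.wav") would then hand back the raw
samples, where "tape.wav" stands for whatever file you recorded.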

Tapes differ in length, depending on how much data is stored on them.  Some
tapes are 16K or less, other tapes are 48K or more.  Most tapes require
roughly 2.5 seconds per block of 128 bytes, which is 20 seconds per
kilobyte.  A 16K tape will take about 5 minutes, a 48K tape could easily
take 15 minutes or more.  At a sampling rate of 44,100 samples per second,
we will have to store 44,100 bytes each second, or about 2.5 Megabytes per
minute.  A fifteen minute tape will easily consume over 30 Megabytes of
disk space.  I told you you would need a lot of free hard disk space.  If
you do not have that much hard disk space available, you should backup some
data and make space available, or simply buy a bigger hard disk.  Another
option is to record only a portion of the tape, process that portion, and
then process the next portion, and so on, until the entire tape has been
processed.  Me, I am lazy, so I throw hardware at the problem.  Hard disks
are cheap nowadays.  When the entire tape has been digitized, we can stop
recording to the wave file.  Make sure you rewind the tape before you start
recording.  Most of the time, it is best to wait until you hear the leader
before starting to record to the wave file.
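
The disk space estimate is simple multiplication.  The helper below is a
sketch with an invented name.  It counts only the sample data and leaves
out the small wave file header.

```python
def wave_bytes(minutes, sample_rate=44100, bytes_per_sample=1, channels=1):
    """Approximate size of the sample data for a recording of this length."""
    return round(minutes * 60 * sample_rate * bytes_per_sample * channels)


for minutes in (1, 5, 15):
    print(minutes, "minutes:", wave_bytes(minutes), "bytes")
```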

Now that we have the cassette tape on our hard disk as data, we can start
to analyze and interpret the wave data.  A wave file starts with a header
that specifies the sampling rate and other technical stuff.  A wave file is
a file in the standard RIFF file format, and documentation on this subject
is available.  I did not invent this weird format, so if you want to know
the details, get yourself a copy of that piece of documentation, or try to
read the code.  (Hah, if it was hard to code, it should be hard to
understand!).  This header is read and processed.  If the wave file is
acceptable, processing continues.  After the header, we will find the
sample data.  This sample data is a very huge list of PCM values ranging
from 0 to 255.  From these samples, we have to form the bytes that are
encoded in this audio pattern.  We know that we should only find the FSK
tones in the audio.  The first thing we should do is to determine which one
of the two tones is on the tape, the mark tone, or the space tone.  The
only way to distinguish between these two is their difference in frequency.
Therefore, we must think of an algorithm that determines the frequency of
the tone.  Or rather, one that tells us whether it is one tone, or the
other.  A tone has the form of a wave, hence the name wave file.  The
audio level rises and falls continuously.  The number of times it rises and
falls within one second is called the frequency.  The mark tone has a
frequency of 5327 Hz, which means that within each second, it will rise
from the minimum value to the maximum value and back to the minimum value
5327 times.  In other words, there are 5327 periods within one second.
Since we are sampling at a sampling rate of 44,100 samples per second, this
means that within each period, we will be taking about 8 samples.  If we
are sampling a space tone, which has a frequency of 3995 Hz, we will be
taking about 11 samples each period.  The way we can determine the
frequency of the tone is now very simple.  We look at the sampled values to
determine when the period is complete.  The number of samples within that
period tells us what the frequency of the tone is.  If we counted 11
samples, it was a space tone.  If we counted 8 samples, it was a mark tone.
If it was anything else, well, we have a problem.  Of course, sampling is
not begun exactly at the start of the period.  So we have to allow for
slight differences.  Therefore, 10 samples will also be recognized as a
space tone, and 9 samples will be recognized as mark.  From this, it is now
obvious why a sampling rate of 22,050 or less will not work well.  We would
have 4 samples for mark, and about 5 samples for space.  This is not enough
of a difference, since the timing differences we mentioned just now could
cause a difference of a single sample, causing a space tone to be
recognized as a mark tone, or a mark tone as a space tone.  With the
sampling rate at 44,100, we have enough of a difference to allow for these
slight timing problems.
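
The sample counts quoted above follow directly from the sampling rate
and the tone frequencies.  The sketch below shows the arithmetic and the
classification rule as described in this paragraph.  The function names
are mine.

```python
MARK_HZ = 5327
SPACE_HZ = 3995


def samples_per_period(tone_hz, sample_rate):
    """Average number of samples taken during one period of the tone."""
    return sample_rate / tone_hz


def classify_period(sample_count):
    """8 or 9 samples is mark, 10 or 11 is space, anything else unknown."""
    if sample_count in (8, 9):
        return "mark"
    if sample_count in (10, 11):
        return "space"
    return None


for rate in (44100, 22050):
    print(rate, "mark:", samples_per_period(MARK_HZ, rate),
          "space:", samples_per_period(SPACE_HZ, rate))
```

At 22,050 samples per second the two results lie little more than one
sample apart, which leaves no room for a timing error of one sample.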

So how do we determine when a period is complete?  In the beginning, I
tried to find the sine wave form we all know, starting at the zero level,
going up to the top, going back down passing the zero level again, going
down to the bottom, and then back to the zero level again.  A classic
sine.  Well, in real life, you will not only find the two FSK tones on a
tape.  For one thing, you can find noise.  There might be a hum
superimposed on the signal.  An FSK-decoder will not be disturbed by this,
since it has filters to get rid of these disturbing noises.  We have to
remove these unwanted signals in software now.  A hum has a frequency that
is around 50 or 60 Hz.  If the amplitude is fairly large, it will cause our
zero level to shift up or down from the normal value of 128.  To compensate
for this, we have to take the average value of the samples, and compute the
zero level value based on this.  This was a big problem, since when the
zero level is around 128, do we consider the value 127 to be positive or
negative?  If the zero level is not determined exactly right, this might
cause us to misinterpret a sample.  To reduce the problem, I boosted the
sampling rate internally to 88,200 by computing the average value of two
consecutive samples, and inserting that between the two.  This gave me more
of a difference between mark and space, and it reduced the deviation that
was caused by the samples with a value near the zero level.  Another
problem is the fact that on some tapes the volume level of the mark tone
differs from that of the space tone.  When the tone shifts from mark to
space, it is hard to compute the average.  You would need to take the
average of a couple of periods.  But then the hum is no longer removed.
When there is a dropout on the tape, the audio level is reduced, or the
audio is even totally gone.  I spent a lot of time trying to solve these
problems, and the solution is simple.  Since it is hard to determine the
zero level, it is best to come up with an algorithm that does not care
about what the zero level is.
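
For the record, the doubling of the sampling rate used in that earlier
attempt amounts to inserting the average of every two neighbouring
samples.  The sketch below shows that step only.  The name double_rate
is invented, and this is not necessarily how the original code did it.

```python
def double_rate(samples):
    """Insert the average of each pair of neighbours between the two."""
    doubled = []
    for current, following in zip(samples, samples[1:]):
        doubled.append(current)
        doubled.append((current + following) / 2)
    if samples:
        doubled.append(samples[-1])  # the last sample has no neighbour
    return doubled


print(double_rate([100, 120, 110]))
```

A list of n samples becomes a list of 2n - 1 values, so 44,100 samples
per second turn into roughly 88,200.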

The current version is only interested in the count of samples within a one
period time frame.  It does not matter where this period begins.  So let us
not start at the zero level, since that is so hard to determine.  Let us
start at the top level.  That is easy to recognize.  We only have to
compare two consecutive samples.  If the second sample has a higher value
than the first one, the sine wave is going up.  If it has a lesser value,
it is going down.  That is very easy to determine.  When it is no longer
going up, we know that the previous value was the top value.  From there we
go down to the bottom value, passing the zero level somewhere, but who
cares.  Then we go back up again, and the period is complete when the
sample level reaches the next top, which is the moment it starts to fall
again.  This is true even if there is a hum or other noise
superimposed on the signal.  The value of the top and bottom levels may
vary, but we are not interested in the actual value.  We just want to see
the level change from rising to falling or from falling to rising.  This
algorithm does not care about hum or zero level.  Now all we do is count the
number of samples between two top levels, and since we know the sample
rate, we can compute the frequency of the tone.  Or actually, we just want
to distinguish between the mark and the space tone, so if the sample count
is above a certain value, we will assume the tone is a space tone, and
otherwise it must have been a mark tone.
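
The top-to-top counting described above could be sketched as follows.  This
is a minimal illustration, not the code from the program.  The names
find_periods and is_space are hypothetical, and the threshold is left as a
parameter because the right value depends on the sample rate.  Equal
neighbouring samples are simply skipped, so a flat top is taken to end at
its last sample, which is my own choice for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Scans n unsigned 8 bit samples and stores the distance, in samples,
 * between each pair of consecutive tops in periods[].
 * Returns the number of periods stored (at most max_periods). */
size_t find_periods(const unsigned char *s, size_t n,
                    unsigned *periods, size_t max_periods)
{
    size_t i, found = 0, last_peak = 0;
    int rising = 0, have_peak = 0;

    assert(s != NULL && periods != NULL);
    for (i = 1; i < n && found < max_periods; i++) {
        if (s[i] > s[i - 1]) {
            rising = 1;                     /* wave is going up */
        } else if (s[i] < s[i - 1] && rising) {
            size_t peak = i - 1;            /* previous sample was the top */
            if (have_peak)
                periods[found++] = (unsigned)(peak - last_peak);
            last_peak = peak;
            have_peak = 1;
            rising = 0;                     /* wave is going down now */
        }
    }
    return found;
}

/* A long period means a low frequency, which is the space tone. */
int is_space(unsigned period, unsigned threshold)
{
    return period >= threshold;
}
```

Note that the zero level does not appear anywhere in this code.  Only the
direction of the wave is looked at.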

Now that we know whether the current tone is mark or space, we can take the
next step.  We want to know the duration of the mark and the space tones.
Since one tone stops when the other starts, we can measure the duration of
the mark and the space tones in pairs.  Each mark tone is followed by a
space tone, which is followed by another pair of mark and space tones.  The
duration of a tone is determined by summing up all sample counts that
turned out to be of the same tone.  When we find the sample count indicates
a mark tone, we add it to the sum of the duration of the mark tone.  When
the next period is a mark tone too, we keep adding.  If it is a space tone,
we start the space count of the pair, and continue counting space tone
values until we encounter a mark tone again.  At this time, we move to the
next pair and start summing the mark counts again.  In this way, we are
building a table with count pair values, that holds the total duration of
each mark and space tone on the tape as a sample count.  The duration of a
tone is related to how many bits are encoded within that tone, so we can
use this table to recover the bits from the tones.  The bits can then be
assembled into bytes again.  For a whole tape, this table would grow very
long.  It is also easier to decode the bytes one record at a time, since
each record has a checksum that can help in detecting decoding problems.
So when we hit the end of a record, we want to start decoding the data.  We
know that after
the end of a record, we might find any old noise.  This makes it hard to
find the end of the record.  We do know however that after this, we will
find the PRWT of the next record.  This is easy to recognize, since it is a
mark tone of a considerable duration.  When the mark tone changes over to a
space tone, we can look at the duration of the mark tone.  If the sum is
above a fairly large number, we know that this mark tone was a PRWT tone.
At this time, we can start to process the record, and then start our table
all over again for the next record.
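
Building the table of count pairs might look like the sketch below.  It
assumes the periods have already been classified as mark or space.  The
struct and function names are made up for this example, and the minimum
length that qualifies a mark tone as a PRWT is a parameter, since the text
only says it must be "fairly large".

```c
#include <assert.h>
#include <stddef.h>

struct pair {
    unsigned long mark;     /* total samples of the mark tone */
    unsigned long space;    /* total samples of the space tone after it */
};

/* space[i] is nonzero if period i was a space tone, count[i] is its
 * length in samples.  Consecutive periods of the same tone are summed.
 * Returns the number of pairs written to out[] (at most max_pairs). */
size_t build_pairs(const int *space, const unsigned *count, size_t n,
                   struct pair *out, size_t max_pairs)
{
    size_t i, used = 0;
    int in_space = 0;

    assert(space != NULL && count != NULL && out != NULL);
    if (n == 0 || max_pairs == 0)
        return 0;
    out[0].mark = out[0].space = 0;
    for (i = 0; i < n; i++) {
        if (!space[i] && in_space) {
            /* space changed back to mark: move on to the next pair */
            if (used + 1 == max_pairs)
                break;
            used++;
            out[used].mark = out[used].space = 0;
            in_space = 0;
        }
        if (space[i]) {
            out[used].space += count[i];
            in_space = 1;
        } else {
            out[used].mark += count[i];
        }
    }
    return used + 1;
}

/* A mark tone of a considerable duration is taken to be a PRWT. */
int is_prwt(const struct pair *p, unsigned long min_samples)
{
    assert(p != NULL);
    return p->mark >= min_samples;
}
```

A caller would test is_prwt each time a mark tone ends.  When it fires, the
pairs collected so far form one record, and the table can start over with
the PRWT as its first mark count.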

We now have this table of sample counts.  How do we convert this into
bytes?  First we have to know how many samples there are within one bit.
This depends on the baudrate.  If we find 600 bits per second on a tape,
and we are taking 44,100 samples each second, a bit will last about 44,100
samples divided by 600 bits, which is 73.5 samples.  Half a sample does not
exist, so we know we have about 73 to 74 samples per bit.  We can now
simply divide the sample counts in the table by 73, and then we will know
how many bits are represented by each tone.  We know how the bits are
written on the tape, so we can figure out the startbit, the eight databits,
and the stopbit.  The table starts out with the PRWT, which can be found in
the count for the first mark tone.  The count for the first space tone in
the table must be the first startbit, so this should be a value of around
73.  If this is a standard tape record, we will find the two marker bytes
at the beginning of the record.  The first databit we find will be the
least significant bit.  It has a value of "1", so it is encoded as a mark
tone.  Again the value should be around 73.  In this way we will be able to
decode all bits of the first marker byte.  At the end, we will find the
stopbit, which is a mark tone.  The next marker byte is exactly the same.
Then we will find the control byte, and the 128 data bytes, and the
checksum.  It is simply a matter of dividing the count values by the length
of one bit.  During this process, we have to keep track of when to expect a
startbit and when to expect a stopbit, since we have to throw these away.
We only want to store the databits.  Since the marker bytes are meant for
measuring the baudrate, this is what we will use them for.  We know that
their bit pattern alternates, so we know that from the first space tone, we
can find 20 bits alternating.  The count values of these 20 bits are added
and divided by 20.  We then know the actual length of one bit, which
usually turns out to be somewhere around 75 or so.
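
The two ways of getting at the bit length can be put into a few lines of
'C'.  This is a sketch under the assumptions stated in the text: the table
starts with the PRWT in the first mark count, and the two marker bytes give
20 alternating bits starting at the first space count.  The names are
hypothetical, and rounding the average to the nearest whole sample is my
own choice.

```c
#include <assert.h>
#include <stddef.h>

struct pair {
    unsigned long mark;     /* samples of mark tone */
    unsigned long space;    /* samples of the space tone that follows */
};

/* Bit length in samples as computed from the sample rate and baudrate. */
double nominal_bit_length(unsigned long sample_rate, unsigned long baud)
{
    assert(baud > 0);
    return (double)sample_rate / (double)baud;
}

/* Measured bit length from the marker bytes.  t[0].mark is the PRWT and
 * is skipped.  The 20 alternating bits are t[0].space, t[1].mark,
 * t[1].space, ... up to and including t[10].mark.
 * Returns 0 if the table is too short to hold both marker bytes. */
unsigned long marker_bit_length(const struct pair *t, size_t n)
{
    unsigned long sum = 0;
    size_t i;

    assert(t != NULL);
    if (n < 11)
        return 0;
    for (i = 0; i < 10; i++)
        sum += t[i].space + t[i + 1].mark;
    return (sum + 10) / 20;     /* average, rounded to nearest */
}
```

For 44,100 samples per second and 600 baud, nominal_bit_length gives 73.5.
The measured value is the one to use for the rest of the record.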

Well, this algorithm sounds easy enough, but what if a tone lasts only a
few periods?  If the sample count of a tone is around 40 or so, it is too
short to be a bit.  Well, there can be noise on the tape that causes our
counting algorithm to fail.  In this case, you might get count values that
are not a multiple of the bitlength.  This is a big problem.  The program
can try to clean up these disturbances, by looking at the count values of
the surrounding tones.  For instance, if a mark count of 40 is followed by
a space count of 10, and then we find a mark count of 25, we can assume
that the space count of 10 must have been a mistake, since a space tone of
10 samples is just one period.  We can then add all the counts, and we will
ignore the fact that we recognized the space tone.  The result is a mark
tone of 75 samples, which looks like a normal bit length.  This algorithm
cleans up little mistakes.  But what if the mark count is 320 and the space
count is 280?  This looks like 4 bits of mark tone, but we have a remainder
of 20 samples then.  Well, we can simply throw away the 20 excess samples,
but the 280 count of the space tone would be 3 bits with a remainder of 55
samples.  This looks more like 4 bits.  This is why we cannot simply divide
the count by the bitlength.  The program uses a repetitive subtraction.
For each time we can subtract 75, one bit is recognized.  If the remainder
is above 60 percent of the bitlength value, it is recognized as a bit, so
55 is still recognized as a bit.  This turns out to be the hardest part,
where most of the errors occur.  If the sample counts are not a fairly
reliable multiple of the bitlength, we must guess what the value is.  In
some cases we get lucky.  Sometimes, we make the wrong decision.  When
this happens, the decoding of bits fails to work.  Sometimes, we cannot
tell where the startbit begins, and several bytes get lost, until we find
some byte that causes the program to get synchronized again.  Of course, at
the end of the record, we might realize that we only decoded 126 bytes
instead of the 132 we were supposed to find.  Or we will find that the
checksum does not match the computed checksum.  We might add powerful data
restoration routines in this section of the decoding process.  However,
sometimes it is just impossible to correct the values in the table.  If the
tape had a dropout, there is no way you can properly decode the bits, since
there is no audio on that portion of the tape.  You can try to guess the
values though.  Some tapes have glitches and disturbances that you would
not believe until you look at the wave file with a wave editor.  Most of
the time, as long as the FSK signal is still present, the program will not
be disturbed by this.  Sometimes it will though.  Currently, there is not
much recovery logic in the program, since most tapes I have tested could be
processed by simply trying to read the reverse side of the tape.
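
The two clean-up steps from this paragraph can be sketched like this.  The
60 percent rule and the numbers in the comments come from the text above.
The function names are made up, and the rule for what counts as a glitch
(a space count no longer than max_glitch) is a simplification of "looking
at the surrounding tones", so take it as an illustration only.

```c
#include <assert.h>
#include <stddef.h>

struct pair {
    unsigned long mark;
    unsigned long space;
};

/* Repetitive subtraction: each time bitlen fits, one bit is recognized.
 * A remainder above 60 percent of bitlen counts as one more bit.
 * With bitlen 75: a count of 320 gives 4 bits, 280 gives 4 bits too. */
unsigned count_bits(unsigned long count, unsigned long bitlen)
{
    unsigned bits = 0;

    assert(bitlen > 0);
    while (count >= bitlen) {
        count -= bitlen;
        bits++;
    }
    if (count * 100 > bitlen * 60)
        bits++;
    return bits;
}

/* Removes very short space tones by adding them, and the mark tone that
 * follows, to the preceding mark count.  Mark 40, space 10, mark 25
 * becomes a single mark of 75.  Works in place, returns the new number
 * of pairs. */
size_t merge_short_spaces(struct pair *t, size_t n, unsigned long max_glitch)
{
    size_t r, w = 0;

    assert(t != NULL);
    if (n == 0)
        return 0;
    for (r = 1; r < n; r++) {
        if (t[w].space <= max_glitch) {
            t[w].mark += t[w].space + t[r].mark;
            t[w].space = t[r].space;
        } else {
            t[++w] = t[r];
        }
    }
    return w + 1;
}
```

The same merge could be written for short mark tones between two space
tones.  Neither of them helps when the audio is gone altogether.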

The output.

Well, now that we know how to retrieve the bytes from the audio, what will
we do with them?  The bytes are stored in a cassette image file.  This file
has a small header, so that it can be recognized as a tape image file for a
Classic Atari computer.  This header starts with the word FUJI.  This
header record must always be the first record in the file, so that we can
always identify the file as a cassette image file.  A descriptive text is
also stored in this header.  When processing the wave file, this
description can be entered.  This description might be displayed when the
cassette image file is processed, or when the data is sent to the Atari.
The length of the description is stored, so that we know how long the
header record is.  All records in the cassette image file have the same
format, starting with a type identifier of four bytes.  The next two bytes
indicate the length of the record, and they are always in the 6502 format,
so first the low order byte, then the high order byte.  The next two bytes
are called the aux-bytes, and they contain information that varies based on
the record type.  If they contain a number, it is stored in the 6502 format.
These eight bytes are always present.  These bytes are not included in the
length specification.  After these eight bytes, some record types contain
data, and some do not.  The length bytes specify how much data to expect.
If there is no data, the length bytes are zero.  There is no alignment, so
the numbers cannot be treated as words or anything like that.  The header
we discussed is simply a record type, like any other record type.  Because
the first record must be a record of the header type, we can use it to
identify the file.  The header records currently have no value in the
aux-bytes, so they are zero.  After the header record, we might find a
record telling us what the baudrate is at which the file should be
processed.  The record type contains the word baud.  The baudrate is stored
in the aux bytes.  There is no data for this record type.  After this, we
will find a lot of data records.  A data block will contain 128 data bytes
most of the time, but since the marker, the control, and the checksum bytes
are also stored in the cassette image, the record will usually be 132 bytes
long.  The record type contains the word data.  The length of the record is
stored as usual.  If the record size is different, well, this will be
indicated in the length bytes.  The duration of the PRWT is stored in the
aux bytes measured in milli-seconds.  This is the current content of the
.cas file.  Note that this leaves room for expansion.  Since each record
has a record type identifier and a length specification, we can ignore
records that are not supported by skipping them.  New record types might be
added in the future.  For instance, we could scan the picture that is on
the cassette inlay, and store the picture data in the .cas file with a
special record type.  We could then show the picture of the cassette box
while loading the program.  But we could do a lot of other things too.  For
instance, we could create multiple header records, each with a description,
in order to tell the user at what stage of the boot we are currently at.
Creativity and time are the limiting factors here.
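
To make the record layout concrete, here is a minimal sketch of a routine
that walks through a cassette image held in memory.  It follows the layout
described above: four type bytes, two length bytes, two aux-bytes, then the
data.  The names cas_record and cas_next are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct cas_record {
    char type[5];               /* "FUJI", "baud", "data", ... */
    unsigned length;            /* number of data bytes after the header */
    unsigned aux;               /* meaning depends on the record type */
    const unsigned char *data;  /* points into the image buffer */
};

/* Reads the record at offset *pos and moves *pos to the next record.
 * Returns 1 on success, 0 at the end of the image or if it is cut off. */
int cas_next(const unsigned char *buf, size_t size, size_t *pos,
             struct cas_record *r)
{
    const unsigned char *h;

    assert(buf != NULL && pos != NULL && r != NULL);
    if (*pos > size || size - *pos < 8)
        return 0;
    h = buf + *pos;
    memcpy(r->type, h, 4);
    r->type[4] = '\0';
    /* 6502 format: low order byte first.  Read byte by byte, since
     * there is no alignment in the file. */
    r->length = (unsigned)h[4] | ((unsigned)h[5] << 8);
    r->aux    = (unsigned)h[6] | ((unsigned)h[7] << 8);
    if (size - *pos - 8 < r->length)
        return 0;
    r->data = h + 8;
    *pos += 8 + (size_t)r->length;
    return 1;
}
```

A reader that does not know a record type just calls cas_next again.  The
length bytes have already carried it past the data it does not understand.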

If the decoding process does not result in a clean cassette image file, we
are forced to investigate what the problem is.  But how do we know how the
program did?  Apart from the cassette image file, the program will also
output a file containing the data in a hexadecimal notation.  It has an
extension of .hex, and it is a simple text file.  It can be viewed with any
decent text viewer.  It will contain a line of text for each data record.
It contains the length of the PRWT, the length of the data record, and all
the bytes.  The marker bytes, the control byte and the checksum byte are
also simply treated as data, so all bytes are visible.  The checksum is
computed, and it is compared to the checksum byte.  The result of the
comparison is indicated at the end of the text line, by either putting "ok"
or "bad" at the end of the line.  The computed checksum is stored next to
the checksum byte, so we can see what the current value is, and what the
expected value is.  Lines that are marked bad mean trouble.  If the record
length has a weird value, you will know that there is a problem with
recognizing the bits, maybe because of erroneous mark or space counts.  If
we could simply correct these problems within this text file, we would be
able to recover bad tapes.  This would only be possible if we know what
bytes are missing.  The missing or incorrect bytes can then be corrected.
But then we would need a program that will take a .hex file and convert
that to a .cas file.  This should be fairly easy, but I have been too busy
to write such a program.
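
The "ok" or "bad" verdict comes down to one small routine.  The sketch
below assumes the usual checksum of the Atari O.S. cassette records: an
8 bit sum with the carry added back in, taken over every byte of the
record except the checksum byte itself.  The function names are made up,
and the exact layout of a line in the .hex file is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* 8 bit sum with end-around carry over n bytes. */
unsigned char atari_checksum(const unsigned char *p, size_t n)
{
    unsigned sum = 0;
    size_t i;

    assert(p != NULL);
    for (i = 0; i < n; i++) {
        sum += p[i];
        if (sum > 0xFF)
            sum = (sum & 0xFF) + 1;     /* add the carry back in */
    }
    return (unsigned char)sum;
}

/* rec[] holds a complete record as stored in the cassette image:
 * marker bytes, control byte, data bytes, and the checksum byte last.
 * Returns nonzero if the stored checksum matches the computed one. */
int record_ok(const unsigned char *rec, size_t len)
{
    assert(rec != NULL);
    return len >= 2 && atari_checksum(rec, len - 1) == rec[len - 1];
}
```

For a standard record, len is 132 and the sum runs over the first 131
bytes.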

If you want to investigate a little deeper what caused the problems, you
can look at the .fsk file.  It will contain all mark and space count pairs.
This is also a simple text file, that can optionally be generated.  Each
line starts with the byte offset of the PCM data.  We can take a look at
the wave file with a wave editor, and this offset value can be entered in
the wave editor to position it at the problem area.  We usually find
something here that caused the disturbance.  It might seem complicated to
interpret the mark and space counts, and it probably is.  But you should
not be trying to understand what is encoded.  If you want to use the FSK
file to correct problems, you should only try to correct the things where
the wrong tone was detected.  After changing the counts, the file should be
processed in a similar way as we would normally process a count table.  The
mark and space counts could then be converted to bits and bytes again.
Again, we would need a program to do this, and this could apply the same
logic as discussed.

During the development stage, I had the program print a lot of information
to the screen.  However, printing several megabytes to the screen takes a
lot of time, and you have to be reading as fast as Mr Data to keep up with
the computer, so I redirected the output to a file.  All the relevant data
can then be examined at your own pace.  Well, as you might guess, all these
files take up a lot of hard disk space too.  I told you you needed a big
hard drive.  Since these text files are very large, you need a program that
can cope with large text files.  If you also want to change or edit the
text files, you need an editor that allows you to edit files of a few
megabytes.  Don't even think about using the regular DOS editor, which is
no good for anything except for editing config files.  Well, if you are
only interested in the .hex file, it might be able to handle that.  But the
diagnostic data is a couple of megabytes, when redirected to a file.  One
of the things I looked at to discover problems is the total count of the
number of samples per byte.  Since each byte is encoded in 10 bits, you
should have about 735 samples per byte.  The actual value does not matter,
as long as all bytes are about the same size.  If, at some point in the
record, you see a large deviation from this average length, you know that
this is a problem area.  Come to think of it, this would be a far better
algorithm for cleaning up the table automatically.  Since each byte ends
with a stopbit, which is a mark tone, and the next byte starts with a
startbit, which is a space tone, we know that when the space tone changes
to a mark tone, this mark tone might include the stopbit.  We only have to
look at the sum of the sample counts to determine this.  Better yet, we
could go look for the stopbit first, and then decide how to clean up the
bits within the byte, if the sample counts cannot be divided evenly by the
bit-length.  But this version seems to work for now.
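
Spotting the problem area from the samples-per-byte totals could be done
with something like the following.  This is only a sketch of the idea, not
something the program does.  The name first_suspect_byte and the tolerance
given as a percentage are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* samples[i] is the total sample count of byte i, including the startbit
 * and the stopbit.  expected is the normal total, for instance 735.
 * Returns the index of the first byte that deviates from expected by
 * more than tolerance_pct percent, or n if all bytes look fine. */
size_t first_suspect_byte(const unsigned long *samples, size_t n,
                          unsigned long expected, unsigned tolerance_pct)
{
    unsigned long slack = expected * tolerance_pct / 100;
    size_t i;

    assert(samples != NULL);
    for (i = 0; i < n; i++) {
        unsigned long diff = samples[i] > expected
                           ? samples[i] - expected
                           : expected - samples[i];
        if (diff > slack)
            return i;
    }
    return n;
}
```

Instead of a fixed expected value, the average over the whole record could
be used, since the actual value does not matter as long as all bytes are
about the same size.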

Tape quality and such.

A while back, we discussed lengths of bits.  When you have a lot of tapes
from different manufacturers, you can see that these tapes differ a lot in
audio quality.  I am not talking about how these tapes perform on an Atari
cassette unit here, although that might be affected too.  I am strictly
speaking about the technical quality of how the data is written on the
tape.  The tones that are written on the tape should be pure, so that no
distortion will be able to confuse the FSK-decoder.  Well, since POKEY can
generate pure tones, that is not a problem.  But when the data calls for a
shift from mark to space, or the other way round, what we see at the spot
where the tone changes varies a lot.  Some tapes show a brief disturbance.
Some tapes simply change the tone of the wave without disturbing it.  What
we can notice here is that there are two tones that are recorded on the
tape.  When we need to change the tone, some tapes immediately change the
tone right then and there.  However, the period usually is not complete at
that time.  Changing a tone in the middle of a period always causes
distortion.  Clean tapes will wait until the tone reaches the zero level
before changing the tone.  Technically, this is a better quality tape.
Sometimes, tapes are written by professional FSK generators.  These have
two tone generators, since it is almost impossible to have a generator
produce a pure tone the moment it is switched on.  In fact, it may take a
while before it can generate any tone at all.  That is why these tone
generators are on all the time.  If we simply use some sort of switch to
choose between these two tones, switching from one tone to the other will
cause distortion.  When we wait until the tone reaches the zero level
before switching, we cannot be sure that the other tone will be at the zero
level at that exact moment.  As a matter of fact, since the frequency of
the two tones differs, it will probably never be exactly at the zero level
then.  We should make sure that we switch over to the other tone when that
tone is at the zero level too.  This means the tones have to be somewhat in
phase.  With the frequencies specified by Atari, this is impossible.  These
frequencies were chosen because POKEY can generate them.  If we want the
tones to be in phase at the time a change can occur, we must change the
frequency of one of the tones slightly.  The FSK-decoder can cope with some
deviation from the standard frequency.  We might even change both tones a
little.  On top of that, we can adjust the baudrate a little, so that we
can make the change of the tone real smooth.  This might mean that the bit
is slightly longer or shorter than it should be, and the "1" bit might have
a different length than the "0" bit, but as long as we do not deviate too
much, the improved sound quality is worth it, since the FSK-decoder will
have less noise to deal with.  Well, that is nice for the FSK-decoder, but
how about our program?  If the bit is longer, we will have to allow for
these tolerances.  Well, trust me, especially this program does not like to
deal with distortion caused by the change in the tone.  As a matter of
fact, POKEY does not care when it changes the tone.  It just does it the
moment the time for the previous bit is up.  Tapes saved on a real Atari
thus are not very clean, and we will have more trouble reading them.  Just
take a look with the wave editor, and you will see what I mean.  Allowing
the bit length to vary is easy though.

Since we are returning to the baudrate topic again, can we increase the
baudrate on the tape?  Well, now that we know what is supposed to be on the
tape, we could generate a tape with a higher baudrate, as long as we still
use the specified frequencies for the mark and space tones.  All we have to
do is make the bits shorter.  We talked about cleaning up tapes earlier.
If we want to generate an FSK audio signal again, based on the contents of
the .cas file, we could write a new and improved tape.  What we need is a
program that converts the data to a wave file again.  This wave file can
then be played through the sound card, and we could record it using an
audio cassette recorder.  Since I have been too busy decoding data,
writing a program to encode data would be a nice change, if I only had
time.  The technical quality of the tones would not be limited by a tone
generator, since we can digitally compute the values for the tones.  This
could really clean up the quality of a tape.  We can also choose the
baudrate we want to use, so we could increase it to 875 baud.
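
Two small pieces of such an encoder are easy to sketch: turning a byte into
the ten bits that go on the tape, and deciding at which sample each bit
ends.  The names are made up for this example.  Computing the end of every
bit from the start of the record, instead of adding a rounded bit length
each time, is my own choice here.  It keeps the half sample in 73.5 from
adding up over a whole record.

```c
#include <assert.h>

/* Frames one byte as startbit, eight databits with the least significant
 * bit first, and stopbit.  A 1 is a mark tone, a 0 is a space tone. */
void frame_byte(unsigned char b, int bits[10])
{
    int i;

    assert(bits != 0);
    bits[0] = 0;                        /* startbit: space */
    for (i = 0; i < 8; i++)
        bits[1 + i] = (b >> i) & 1;     /* databits, LSB first */
    bits[9] = 1;                        /* stopbit: mark */
}

/* Sample number at which bit number bit_index (counting from 0) ends,
 * for the given sample rate and baudrate. */
unsigned long bit_end_sample(unsigned long bit_index,
                             unsigned long sample_rate, unsigned long baud)
{
    assert(baud > 0);
    return (unsigned long)
        (((unsigned long long)(bit_index + 1) * sample_rate) / baud);
}
```

At 600 baud the first bit ends at sample 73, the second at 147, and a whole
byte of ten bits ends at sample 735.  Making the bits shorter is just a
matter of passing a higher baudrate.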

As we discussed, some tapes have a special tape format.  They start by
loading their own tape handler, and then it takes control and bypasses the
O.S.  Then, tape records of 1024 bytes or more are used.  Since our program
is unaware of the fact that the rules imposed by the O.S. are no longer in
effect, we do not know how to handle some tapes.  If the length is the only
difference, this is usually not a problem, since the program was designed
to cope with that.  As a matter of fact, if we did not have to worry
about this, we would know that a record must always be 132 bytes, and
ignore any bytes beyond that.  Sometimes, the program interprets the PRG as
data, and thus some records are assumed to be 133 bytes or more.  The
checksum will appear to be incorrect, although when trying to load a file
like this, the Atari will treat the excess bytes as PRG and ignore them.
Since a number of tapes have a non-standard format, we do worry about this.
I have tried some tapes with a weird format, and this does work, so, if you
really want to get rid of these excess bytes, you should either modify the
program, or change the .hex file, and write a processing utility for it.
If you do, upload it!  Some files with this weird format have a funny way
of storing the checksum.  They have a checksum of two bytes or more.
Sometimes, this checksum appears to be a separate very short record.
Sometimes, the records do not have marker bytes.  However, we still apply
the marker bytes algorithm to compute the bit-rate, since we do not know
that the O.S. rules are no longer obeyed.  If the result is some weird
bit-rate, we are either processing noise, or one of these non-standard
records.  If there are only a couple of sample count pairs, we treat it as
noise.  Otherwise, we assume the standard bit-rate of 600 bps, and hope for
the best.  I do not know if the program can convert all tapes that exist.
If you run into very weird tapes, the program might have to be modified.
The various text files that are output might have problems with the very
long line length.  The .hex file for instance would still have all the
bytes of a record on one line, and thus it might be a couple of thousand
bytes long.  Some editors crash if the line is too long.  Again, get
yourself an editor that can cope with this, or modify the program.

Just a few words about sampling audio are in order.  I have tried a couple
of programs to record audio to a .wav file.  One of the things to watch is
the fact that you will be recording a lot of data.  If your hard drive is
fragmented, the system will need more time to save the data.  Since the
sampling rate is high, this could cause your sample recording program to
lose a few samples while it is busy saving data to disk.  I did not program
that stuff.  I would assume the program continues to store samples in
memory, but somehow it looks like the O.S. disables the interrupts while
doing I/O, so it will simply miss a few samples sometimes.  This might also
be caused by the fact that there are other tasks in the system that are
allowed to use the system resources for a while.  I have seen this happen
with an Operating System that was released in 1995.  If you look at the
wave file with a wave editor, you can sometimes see the spot where the
system could not keep up, especially if this is one of those very clean FSK
tapes.  Since this occurs randomly, it is hard to do anything about it.
You could do it like I did, simply create a 32 meg ramdisk and run DOS, but
not everybody is so fortunate to have memory in abundance.  A fast hard
drive is nice too, and it will do most of the time.  If you cannot avoid
this, simply record the file again, and you will probably experience this
problem at another spot on the tape.  You will need to cut and paste then
to get a clean tape by replacing the portion that was distorted.  All you
have to do is find the IRG, cut away the bad record, and insert the good
copy.  You might even record only that record.

Sending a cassette file to the Classic Atari.

Now that we have a cassette image file, we would like to be able to load it
into the Classic Atari.  The program CAS2SIO will allow you to use your PC
as a replacement for the cassette unit.  All you need is the PC, the
cassette image file, and a SIO2PC interface.  No sound card is needed.  The
program will read the cassette data and send it to the Classic Atari on the
SERIAL IN line, just like the cassette unit does.  There is a slight
difference though.  There is no motor control line in the SIO2PC interface.
Well, that is not a big problem, since we do not actually need it most of
the time.  For instance, if we are loading a boot-tape, we know that we
want the data to be sent, so we do not need to tell the PC to start the
motor, which does not exist anyway.  The PC will simply send the data as
instructed on the PC keyboard.  We only need to enter the command for the
PC, then we can
switch on the Classic Atari, telling it to boot from tape, and if we are
quick enough, the PC will still be sending the leader.  Booting will then
simply proceed as usual.  If the Atari tells the motor to stop, we will
have a problem.  I have not seen this yet though.  The motor is supposed to
stop when the boot is complete.  Some multi-stage bootloaders turn the
motor on again, after waiting for some introduction screen, or after
setting up some data.  This has not been a problem so far, since the PRWT
will usually be long enough.  The PC will be sending the PRWT of the next
record, and most of the time, the Atari will want the next data record
before the PRWT runs out.  If the Atari needs more time, this will be an
inconvenience.  All we have to do is go and change the length of the PRWT
of that record, so that the PC will wait a little longer.  You will need a
file-editor for this, or the utility that processes .hex files.

I tried to keep the CAS2SIO program very simple.  After all, all it has to
do is read the cassette records and send them over the serial port of the
PC.  The BIOS allows us to set the speed of the serial port to 600 baud, so
that is all we need.  Unfortunately, the hardware in the PC did not agree
with me on that.  When I tried to send a byte, it insisted on a couple of
handshaking signals to be present on the serial port of the PC before
sending anything.  With a breakout-box and a couple of wires, I was able to
provide these signals, but that would require all of you, and me, to modify
the SIO2PC cable.  That is why the program is a little more complicated,
since it goes directly to the serial chip, the UART inside the PC.  It does
this by addressing the port at which the chip is connected.  This way, we
can bypass the BIOS within the PC, and thus simply ignore the fact that the
handshaking signals are not present.  Since we are already bypassing the
BIOS, we can now also program this UART to use non-standard baudrates.  We
can program it for any value we like, up to the limit of our various
hardware components.  For cassette usage, this means we can go up to 875
baud.  The most important thing is that we do not have to modify the
interface.
We can use our standard SIO2PC cable now.
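
What CAS2SIO actually writes to the chip is in its source.  The sketch
below only shows the arithmetic behind a non-standard baudrate, assuming
the standard PC serial chip (8250, 16450 or 16550) with its usual base rate
of 115,200.  The function names are made up for this example.

```c
#include <assert.h>

/* Base rate of the standard PC UART: 1.8432 MHz crystal divided by 16. */
#define UART_BASE_RATE 115200UL

/* Divisor latch value that comes closest to the requested baudrate.
 * On these chips the divisor is loaded by setting the DLAB bit (0x80) in
 * the line control register at base+3, then writing the low byte to
 * base+0 and the high byte to base+1. */
unsigned uart_divisor(unsigned long baud)
{
    assert(baud > 0);
    return (unsigned)((UART_BASE_RATE + baud / 2) / baud);
}

/* The baudrate we really get for a divisor, in tenths of a baud. */
unsigned long uart_actual_baud_x10(unsigned divisor)
{
    assert(divisor > 0);
    return UART_BASE_RATE * 10UL / divisor;
}
```

For 600 baud the divisor is 192, which is exact.  For 875 baud the closest
divisor is 132, which gives about 872.7 baud.  That is well within what a
serial receiver will accept.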

Other programs for cassettes.

There are a number of other tools and utilities that might be useful.  I
have been working on this project for several months, so I did not have
time to work on the utilities I would like to see.  At the moment, I doubt
that I will work on those.  It is not even clear to me how much more time I
want to invest in this program.  It seems to work most of the time, so as
long as I can process our own tapes, I have no need for improvements.  For
people that are interested in cassette tapes, and like to program something
themselves, here are some of the things that might help in processing
tapes.  To begin with, some of the programs were already mentioned.  It
would be nice if there were some programs that could process the .hex
files, or the .fsk files.  The need for these is obvious.  In
fact, if I had to design this project from scratch again, I would make an
interactive version, that would present each record in the form of the FSK
table to the user for manual correction.  This would eliminate the need for
separate programs.  On the other hand, working with text files allows for
processing at your leisure, without the need to process the file in one
session.  Besides, an interactive version would require a lot of screen and
file handling, and I feel that sort of thing always distracts from the
problem at hand.  This program currently only uses standard 'C' input and
output routines.  This means that it is very portable.  As a matter of
fact, it compiles without modification on my Atari ST with Turbo-C.  Well,
the CAS2SIO program will not compile, since it goes directly to the PC
hardware, so if you want to use that on another system, you will have to
write your own version.  But what you find in the 'C' source code is
related to the processing of the data only.  If someone would want to write
a version with a graphical interface, maybe even in another language, only
relevant information is found in the source.  No screen or file editing
stuff is obscuring the algorithms.  Besides, this project started out like
all of my projects.  I had an idea, came up with an algorithm, and just
started hacking at it.  This program is just one big hack.  When the
algorithm did not quite work, I adapted it.  Now that it works most of the
time, I started documenting it, and cleaning up the code a bit.  I try not
to spend time on the graphical stuff, since my time is limited, and the
results count.

Another useful tool would be a program to convert a .cas file into a clean
.wav file.  This was also mentioned already.  We could create an improved
tape this way, cleaning up noise, or making the technical quality of the
FSK tones a lot better.  We could increase the baudrate.  I wrote a program
for this and it is called cas2wav.  You can find it on my website.

We could also create a tape in a stereo format, which would allow us to
have full control of the audio channel.  If you like tapes, this opens
possibilities that allow limited authoring of tapes.  We could add a sound
track to the tape, maybe even from digitized sound produced by the game
itself.  All we have to do is to sample the audio once the game is loaded.
We could
then merge that with the data, and create a stereo .wav file, that we could
record using a stereo cassette tape recorder.

Another program we might write could eliminate the need for lots of hard
disk space.  We are now using the PC to emulate the cassette unit.  We are
sending data on the SIO bus.  We could reverse things a little.  We could
use a real cassette unit, and then simply retrieve the bytes from the SIO
bus, using a SIO2PC interface and the serial port of a PC.  We would have
to connect the cassette unit to a real Atari, otherwise the motor will not
start.  Some cassette units draw their power from the SIO bus, which is
another reason why the Atari computer should be connected.  As a matter of
fact, so does the SIO2PC interface.  Now some cassette units do not allow
you to
daisy chain another device to the cassette unit.  If your SIO2PC interface
does not allow this either, you will not be able to connect both at the
same time, so then forget this.  Unless you get yourself another cassette
unit.  If we start the cassette unit, it will put the data on the DATA IN
line.  If we program the serial port of the PC at 600 baud, we could listen
in on the data that is on the bus.  We would not have to do any decoding at
all, since the serial port chip inside the PC would do all the work for us.
One problem you will need to solve though.  The SIO2PC interface listens on
the DATA OUT line, and it speaks on the DATA IN line of the SIO bus.
However, this time we want to listen on the DATA IN line, since this time
we are sort of acting the same as the Atari computer.  You would have to
change your interface.  You would have to disconnect the DATA OUT line, and
connect the DATA IN line to the interface at that point.  Another problem
would be that you would have to measure the length of the PRWT using some
sort of timer, unless you assume a standard length of 0.25 seconds.
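
To make the PRWT measurement a bit more concrete, here is a minimal sketch
in Python.  It assumes that some serial library has already delivered each
received byte together with its arrival time in seconds.  The function name
split_records, the (time, byte) pairs and the 0.1 second threshold are all
made up for this illustration, they are not part of any existing program.
Note that the measured gap includes roughly one byte time, so it is only an
approximation of the real PRWT.

```python
def split_records(events, gap_threshold=0.1):
    """Group (time_in_seconds, byte) pairs into cassette records.

    A new record starts whenever the silence in front of a byte is longer
    than gap_threshold.  Returns a list of (prwt_ms, data) tuples, where
    prwt_ms is the gap that was measured in front of that record.
    """
    records = []
    previous_time = None
    for time, value in events:
        # The gap in front of the very first byte is unknown, so use 0.
        gap = 0.0 if previous_time is None else time - previous_time
        if previous_time is None or gap > gap_threshold:
            records.append([round(gap * 1000), bytearray()])
        records[-1][1].append(value)
        previous_time = time
    return [(prwt, bytes(data)) for prwt, data in records]
```

At 600 baud a byte takes about 17 milliseconds, so any silence that is much
longer than that can be treated as the gap between two records.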

There is a lot more we can do with cassettes.  We can convert them to disk.
With some cassettes, this is hard, because they contain a special boot
loader, or some weird tape format.  Now that we have the data in a file,
converting this data to some disk format is a lot easier.  It might not be
simple, but at least it is easier.  I have converted some tapes before, and
the hardest part was getting the data from the tape.  Writing a simple disk
boot loader is relatively easy.  But we only want to copy the stuff to disk
for speed.  So if we could increase the baudrate, that would be good
enough.  To do this, we could write a small cassette handler that is booted
from the tape at regular speed.  If we let this bypass the O.S., we could
increase the baudrate beyond the 875 baud limit.  We could increase it to
19,200 or more, and then load the tape very fast.  Bypassing the O.S. is
not that simple though, since replacing the C: handler vector will not work
all the time.  For the boot stuff, the O.S. calls the cassette block
functions directly, by calling the SIO block routines for the cassette
pseudo device.  This looks like an interesting challenge.  We could make
this format compatible with a real tape, so that we could actually create a
cassette tape that will load fast all by itself.  On the tape we are
limited by the resolution provided by the frequencies of the mark and space
tones.  Depending on the quality of the FSK-decoder circuit, the hardware
will need some time to detect the frequency of the tone.  I can imagine
that it would at least need one period for this.  If we were using a
baudrate of 19,200, we would have roughly a quarter of a period for each
bit.  I do not believe the FSK-decoder would decode the frequency of the
tone fast enough.  I have not tried this though, so, this sounds like
another area where no one has gone before.  Still, we might be able to
double the speed, or more.
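
The arithmetic behind this estimate can be sketched in a few lines of
Python.  This assumes the usual Atari tone frequencies of 5327 Hz for mark
and 3995 Hz for space.  The function names are made up for this
illustration, and the "one full period per bit" requirement is just the
guess made above, not a measured property of the decoder.

```python
MARK_HZ = 5327.0   # Atari mark tone (assumed standard value)
SPACE_HZ = 3995.0  # Atari space tone (assumed standard value)

def periods_per_bit(tone_hz, baud):
    """Number of tone periods that fit into one bit time."""
    return tone_hz / baud

def highest_baud(tone_hz, periods_needed=1.0):
    """Highest baudrate that still leaves periods_needed periods per bit."""
    return tone_hz / periods_needed

if __name__ == "__main__":
    for baud in (600, 1200, 19200):
        print(baud,
              round(periods_per_bit(MARK_HZ, baud), 2),
              round(periods_per_bit(SPACE_HZ, baud), 2))
```

At 600 baud this gives about 8.9 periods of mark tone and 6.7 periods of
space tone per bit.  At 19,200 baud it drops to about 0.28 and 0.21.  If
the decoder really needs one full period of the slower tone, the ceiling
would be somewhere just below 4000 baud.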

Command line options.

Running the WAV2CAS program is easy.  You have two options.  You can run it
interactively, by simply starting it.  It will ask for the filename.  Once
that is entered, it will ask you to enter whether or not you want the FSK
file to be created.  You are then asked to enter the description for the
cassette.  This will be placed in the cassette header.  The diagnostic
option cannot be selected in the interactive mode.  Diagnostic data is
printed to the screen, and this output can be redirected to a file using
the standard DOS redirection.  In interactive mode, the redirection would
send all the prompts to that file too, which is why the option is only
available from the command line.  You can enter all this on the command
line if you prefer.  The printing of diagnostic data is an option switch,
which can be selected by adding /d to the command.  Adding /f as an option
switch will cause the FSK file to be generated.  The file to process is the
first argument on the command line.  The description is optional, and if it
is present, it is the second argument.  Since the description can contain
spaces, you must enter the description as a quoted string.  So if we want
to process a file named demo.wav in the \cassette directory, we would
enter:
wav2cas \cassette\demo.wav "A demonstration program" > demo.txt /d /f
This would select the printing of diagnostic data, redirecting it to the
file demo.txt in the current directory.  The .fsk, the .hex and the .cas
files are all output to the directory where the .wav file resides.

The command line options for the CAS2SIO program are very similar.  This
program only requires the .cas file, so this is the only command line
argument.  The option switch allows selecting the COM port to be used,
either COM1, 2, 3 or 4.  The default port is COM2.  Loading the file
created above using COM1 could be accomplished by entering:
cas2sio \cassette\demo.cas /1
If no command line arguments are given, the program will prompt for the
file and the COM port to use.  If you would like to use another COM port,
you should modify the program, or write your own.

Command line options can be placed anywhere on the command line, and
options can be combined, so entering /df has the same effect as entering /d
/f.  Both programs allow the user to exit the interactive request for data
by entering control-Z.  On most PCs, pressing control-BREAK will also
terminate the program.  I used the Symantec C++ compiler on the PC.  If you
press control-BREAK, it will terminate the program as soon as it tries to
write to the screen.  If the CAS2SIO program happens to be in the middle of
the leader, this may take a few seconds though.
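
The rules for the WAV2CAS command line described above can be sketched as
follows.  This is only an illustration in Python, not the code of the
actual program.  The function name parse_command_line is made up, and
treating the switches as case-insensitive is an assumption.  The shell has
already removed the quotes and the redirection by the time the program
sees its arguments.

```python
def parse_command_line(args):
    """Split arguments into option letters, filename and description.

    Anything that starts with a slash is a switch, and every letter after
    the slash counts as a separate option, so /df equals /d /f.  The first
    remaining argument is the file, the optional second the description.
    """
    options = set()
    positional = []
    for arg in args:
        if arg.startswith("/") and len(arg) > 1:
            options.update(arg[1:].lower())
        else:
            positional.append(arg)
    filename = positional[0] if positional else None
    description = positional[1] if len(positional) > 1 else None
    return options, filename, description
```

Because switches and other arguments are collected separately, the
position of a switch on the command line does not matter.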

Troubleshooting.

Well, what to do if the program is unable to create a correct .cas file?
This means you have a big problem.  There are a number of things you can
try.  For one thing, check to see that the tape was sampled at the correct
sampling rate.  A sampling rate of 44,100 Hz is required.  The program will
accept tapes sampled at 22,050 Hz, but that is just for experimental purposes.
I wanted to see if it could process tapes that were sampled at that rate.
So far, results are disappointing.

Also check to see if the recording level can be adjusted.  If the recording
level is too low, noise will be more likely to disturb the process.  If the
level is much too high, the tones will be distorted.  If there is some hum
or other noise superimposed on the signal, make sure the total amplitude
stays within the limits.  It is better to be able to see the signal with
interference, than it is to see a level of 255 for a number of samples in a
row.  If the recording level cannot be adjusted to a satisfactory level,
check to see that you are recording the proper channel.  Simply try the
other channel to see what happens.
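
A quick way to check a recording for this kind of clipping is to look for
the longest run of samples that sit at the limit.  The sketch below is in
Python and assumes 8 bit unsigned samples, where 255 is the upper limit.
Treating 0 as the lower limit is an assumption, and the function name
longest_clipped_run is made up for this illustration.

```python
def longest_clipped_run(samples, low=0, high=255):
    """Return the length of the longest run of samples at either limit."""
    longest = 0
    current = 0
    for sample in samples:
        if sample <= low or sample >= high:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

For an 8 bit mono .wav file, the bytes returned by readframes() of the
standard wave module can be passed in directly.  A result of more than a
few samples suggests that the recording level is too high.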

Sometimes, the recording of data can be suspended due to various O.S.
related issues.  If you feel that a data block has been hit by this, try
recording the tape again.  Most tapes are recorded on both sides.  You
could try recording the other side.  If a certain portion of a tape is
damaged, or suffers from a dropout, you could try to sample both sides, and
use a wave editor to replace bad pieces from one recording by pieces from
the second recording.  This can be a tough job, since to us, all blocks
sound alike.  But it should be possible.  Simply counting the blocks might
help.  Most cassette decks have a tape counter.  I suggest you use it.  If
there is a very small glitch that confuses the program, you could try to
see what happens if you simply cut away the bad spot with the wave editor.
Be careful though, since the length of one bit is about 75 samples.  If you
cut away more than 20 samples, the bit will be too short.  You could try
replacing it with a piece of mark tone, or a piece of space tone, that you
copy from another piece of the tape.  This would be a tape transplant so to
speak.  I have not tried this yet, but it should be possible.  Since we are
dealing with very large .wav files here, editing them might take a while.
I do not even know whether or not this can be done at all.  I would
recommend this only if there is no easier way to correct the problem.
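
The numbers used here follow from the sampling rate and the baudrate.  The
small Python sketch below shows the arithmetic.  The function names are
made up for this illustration.  Seeing a shortened bit as a bit sent at a
higher baudrate is my own way of looking at it, not something the program
does.

```python
def samples_per_bit(sample_rate=44100, baud=600):
    """Number of samples that make up one bit."""
    return sample_rate / baud

def apparent_baud(samples_removed, sample_rate=44100, baud=600):
    """Baudrate that one bit appears to have after cutting samples from it."""
    remaining = samples_per_bit(sample_rate, baud) - samples_removed
    if remaining <= 0:
        raise ValueError("more samples removed than one bit contains")
    return sample_rate / remaining
```

At 44,100 Hz and 600 baud a bit is 73.5 samples long.  After cutting 20
samples, the remaining 53.5 samples look like a bit sent at about 824
baud.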

A much simpler way of editing the data would be to simply sample both
sides, and process both sides.  The resulting .cas files or .hex files
could then simply be merged.  For instance, if the first half of the A-side
of the tape is fine, you could only sample that first half and process it.
You could then sample the second half from the B-side, and after processing
that, you could merge the files with the COPY command.  You can either
merge the .cas files, or you can merge the .hex files.  You could clean up
the new .hex file, and generate a new .cas file from this.  The only
problem with this would be getting the length of the PRWT to be acceptable
at the point where the two files are merged.  This is why it would be best
to use the .hex file, if we had that converter for it.

The method just described can also be used if you just do not have the hard
disk space available to sample an entire tape at once.  You could sample
say 25 records, then process these, and then delete the .wav file so you
can process the next 25 records.  You would have to repeat this process
until all records have been processed.  This is a tedious job, and it is
easy to make a mistake.  But if you are careful, you might be able to do
it.  You would have to do a lot of work to merge the files in the correct
order too.  But if you just do not have the hard disk space available,
there is not much else you can do, except buying a bigger hard disk maybe.

If you have trouble booting the cassette tape image with the CAS2SIO
program, there are some things you could try.  You would think that once
you have the data on a reliable medium, like a hard disk, you are forever
freed of the dreaded boot errors.  Think again.  During the tests in my
lab, the living room, I have found that some of the load errors are not
caused by the cassette unit at all.  As a matter of fact, some of the boot
errors are caused by the Operating System.  Or, if you prefer, by
programmers that did not take into account how the Operating System works.
I will try to explain.  Like we mentioned before, if the leader is too
short, the O.S. will still be waiting for the leader to pass by, while the
first cassette data block is already being transmitted over the DATA IN
line.  By the time the O.S. feels it is time to start looking at the data
on the tape, the data has already passed, so a boot error will occur.  This
can happen if you start the CAS2SIO program on the PC and spend more than
10 seconds trying to switch on the Atari.  If you sampled only a portion of
the leader, the length of the leader in the first data record of the .cas
file might be too short.  You might try and make it a little longer with
some of the editing tools.  The leader is the PRWT of the first data
record.  Well, that is, the leader is added to the length of this PRWT, but
in effect, this value becomes the length value of the PRWT as specified in
the aux bytes.  By increasing the length, you will have some more time to
get the Atari booted.  This sort of timing problem may also cause boot
errors in a multi-stage boot.  During my experiments, I had a clean .cas
file.  All records had the proper checksum, and yet, every other time I
tried loading the program, it would produce a boot-error at the third stage
of the boot.  Apparently, the program did some setup processing, and by
the time it got around to reading the next cassette record, the remaining
PRWT tone was too short to be a proper PRWT.  I then increased the PRWT for
that record, and since then I have had no boot errors.  So, this shows that
the boot errors are sometimes caused by timing problems, due to low
tolerance levels.  Writing a new tape with improved timing might make the
real tape more reliable too.
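
Increasing the PRWT of one record could be done with a few lines of Python.
This sketch assumes the chunk layout of the .cas format: a 4 byte type
identifier, a 2 byte length and 2 aux bytes, both low byte first, followed
by the data.  For a "data" chunk the aux bytes hold the PRWT in
milliseconds.  The function names chunks and set_prwt are made up for this
illustration, they are not one of the editing tools mentioned.

```python
import struct

def chunks(image):
    """Yield (offset, type, length, aux) for each chunk in a .cas image."""
    offset = 0
    while offset + 8 <= len(image):
        kind, length, aux = struct.unpack_from("<4sHH", image, offset)
        yield offset, kind, length, aux
        offset += 8 + length

def set_prwt(image, record_number, prwt_ms):
    """Return a copy with the PRWT of one data record changed.

    record_number counts only the "data" chunks, starting at 1.
    """
    patched = bytearray(image)
    seen = 0
    for offset, kind, _length, _aux in chunks(image):
        if kind == b"data":
            seen += 1
            if seen == record_number:
                # The aux bytes follow the type and the length.
                struct.pack_into("<H", patched, offset + 6, prwt_ms)
                return bytes(patched)
    raise IndexError("no such data record")
```

Reading the file in binary mode, calling set_prwt(image, 1, 20000) and
writing the result back would give the first record a leader of 20
seconds.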

Some tapes may appear to have been converted without problems.  If you look
at the .hex file, and all records show "ok", you might think that there is
some other problem why the tape does not properly boot.  Well, it might be
caused by a record that was skipped as noise.  This is highly improbable,
since the current version will allow even very short records to exist on
the tape.  In my first versions, the program would discard any data as
noise if it was not able to detect the marker bytes.  Since some tapes have
a strange checksum record in their custom format, these records got lost,
so I changed the program such that it will only ignore records that are
very short.  However, if this caused a checksum record to be lost, you will
have a problem.  It will not show up as a bad record.  The only thing you
could do is listen to the tape and count the records.  If you find a
different record count than the number of records in the .hex file, you are
in serious trouble.  The only thing you can do is change the program, and
remove the garbage filter.  But what if you have more garbage than you
need?  If the program tried to recognize some noise as data, it will be
stored in the cassette image file.  Sending this short data block might
cause the Atari to get confused, and respond with a boot error.  You would
have to remove this erroneous data block with the tools already mentioned.
The problem is, that these tools do not yet exist.  Well, there is one
trick you can use.  The format specification tells you that any data that
is not recognized or not supported, should simply be ignored.  If you
change the type identifier from "data" into "bad " or something like that,
the record would be ignored.  A simple file editor will allow easy
disabling of this record, without the need for deleting bytes.
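
If you do not want to hunt for the record with a file editor, the same
trick can be sketched in Python.  This assumes the chunk layout of the .cas
format: a 4 byte type identifier, a 2 byte length with the low byte first,
2 aux bytes, and then the data.  The function name disable_record is made
up for this illustration.

```python
import struct

def disable_record(image, record_number, new_type=b"bad "):
    """Return a copy in which one data record has another type identifier.

    record_number counts only the "data" chunks, starting at 1.  Only the
    4 type bytes change, so the file keeps its length.
    """
    if len(new_type) != 4:
        raise ValueError("a type identifier must be 4 bytes long")
    patched = bytearray(image)
    offset = 0
    seen = 0
    while offset + 8 <= len(patched):
        kind, length = struct.unpack_from("<4sH", patched, offset)
        if kind == b"data":
            seen += 1
            if seen == record_number:
                patched[offset:offset + 4] = new_type
                return bytes(patched)
        offset += 8 + length
    raise IndexError("no such data record")
```

Since no bytes are inserted or deleted, changing the identifier back to
"data" restores the record.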

During my tests, I had a few tapes that just did not seem to work.  After
booting them, the system just appeared to be hung.  Well, before you try to
load any of your old tapes, make sure you know what procedure to follow in
order to boot these tapes.  The tape might not be compatible with the XL/XE
O.S., or you might need a certain amount of memory.  Another thing that
amazed me sometimes is that some of the commercial games are written in
BASIC.  Trying to boot a tape with a program that was CSAVE'd will yield
interesting results.  Or rather, they are boring.  A BASIC save-file will
usually have a couple of zero bytes at the beginning.  If you try to use
this as a boot cassette, it will attempt to boot 256 records.  Most BASIC
files are shorter, so after the tape runs out, and after some pause, you
will get a timeout, and thus a boot error.  So, use CLOAD instead on these
tapes.  Anyway, make sure you know the proper procedure for the tape.
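
Why the zero bytes lead to 256 records can be shown with a tiny Python
sketch.  It assumes the usual layout of a boot record, where the second
byte holds the number of records to load.  The function name
boot_record_count is made up for this illustration.

```python
def boot_record_count(first_record):
    """Number of records the boot will try to load.

    The count is taken from the second byte of the first record.  A count
    of zero is treated as 256.
    """
    count = first_record[1]
    return count if count != 0 else 256
```

A BASIC save-file that starts with two zero bytes therefore asks for 256
records, which is more than most BASIC programs have.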

It happened to me a couple of times that the CAS2SIO program just did not
seem to work at all.  It turned out that I forgot to tell the program that
my SIO2PC interface is attached to COM1, so if you are not using COM2, make
sure you specify what COM port to use.  If your PC has a COM3 or COM4, you
should be able to use these without any problems, since the interrupts are
not used.  The program tries to get the port address from the BIOS data,
but if that is not available, the default values are used.  If you are
using other values, the program must be modified.
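
The lookup of the port address can be modelled as follows.  This Python
sketch assumes the standard PC layout, where the BIOS data area at
0040:0000 starts with four 16 bit port addresses for COM1 to COM4, low byte
first.  The default values shown are the customary PC addresses.  Whether
the program uses exactly these defaults is an assumption, and the function
name com_port_address is made up for this illustration.

```python
import struct

# Customary PC base addresses, used when the BIOS has no entry.
DEFAULT_PORTS = {1: 0x3F8, 2: 0x2F8, 3: 0x3E8, 4: 0x2E8}

def com_port_address(bios_data, port):
    """Return the I/O address for COM1..COM4.

    bios_data holds the first 8 bytes of the BIOS data area.  A zero
    entry means the BIOS did not find the port, so the default is used.
    """
    if port not in DEFAULT_PORTS:
        raise ValueError("port must be 1, 2, 3 or 4")
    (address,) = struct.unpack_from("<H", bios_data, (port - 1) * 2)
    return address if address != 0 else DEFAULT_PORTS[port]
```

This only models the decision.  Python on a modern system cannot read the
BIOS data area itself.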

Epilog.

Well, we have finally come to the end of this document, for now.  It is
bigger than the source code I'm sure.  But anything worth writing is worth
documenting I suppose.  Since I have been spending a lot of time on
cleaning up this stuff and documenting it, I have not had time to actually
process tapes a lot.  As you might understand, sampling about 350 tapes
might take a while.  Processing them takes time too, and you can only store
a few .wav files on a gigabyte drive.  But you just volunteered as a
tester.  Call this alpha-testing, beta-testing or whatever.  If I were to
charge you a lot of money, and ask you to pay again for the update, over
and over again, I might even call it a product.  But since I think that
nobody is prepared to send me a million dollars for freeing them of the
menace of the cassette units, I suppose I will not be able to quit my
daytime job.  Too bad, since that means I might not work on this project
for some time.  Well, since I am including the source code, anybody that
was about ready to start complaining about the program is very welcome to
improve on my stuff.  So, use this stuff at your own risk.  I am not
responsible for any damage that might occur whatsoever.  If you want to
take portions of my program and improve on it, go ahead, provided that you
comply with a few rules.  You have to include in your documentation that
your program was based on this work.  Yes, that means that you have to write
documentation too!  Hah, that would stop anybody.  On top of that, if you
charge money for your product, you will have to inform people that my stuff
is available free of charge.  This includes shareware fees and stuff like
that.  If you do write a slick program, I would like to wish you good luck.
If it is a major improvement over my version, I would welcome that a lot.
If people agree that it is a major improvement compared to the free version,
they might want to spend some money on it.  I just want people to know that
this thing is available for free.  This means that I myself do not expect
people to send me any money.  So, you get what you pay for with my stuff.
Well, maybe it works for you.  If you do want to get rid of your money,
send some of the regular shareware authors some.  After all, where would
we be without the people that wrote programs like SIO2PC?  There would not be
a SIO2PC cable then, and I would not have thought of this application for it.
There are numerous other authors out there, still working hard to improve their
projects.  Some encouragement always helps.  Look on your hard disk or in
the box of popular diskettes, and you will know who I mean.  If you really
want to drop me a note, there are various options.  For one thing, you can
send me E-mail over the Internet.  You can also send me a regular letter.
If you happen to have some 8-bit program disks for me, let me know, or send
me a note.  We are still collecting disks.
The first Pooldisk CD was a big success, and I would like to take this
opportunity to thank all the people that ordered one.  The second Pooldisk
named Pooldisk Too was also a big success.  Again thanks to all who ordered
one and everyone that helped us on the project.  Just as with the first CD, it
holds only shareware, freeware and Public Domain stuff.  We have no intention of
violating copyrights of anyone, even if these rights are old, and the amount
of damage to business is negligible.  At this point, I am wondering whether
there are shareware cassettes at all.  If there are, this would be a great
moment for converting them to a .cas file, and sharing them.  Let me
know if you know of any.  Any librarians out there?  Anyway, if you have
questions, send them to me.  Please do not send me .wav files to look at.
It would be crazy to E-mail a file of several megabytes.  You are welcome
to send any comments.  If you think some of my data or research is in
error, let me know.  The address is below.

Ernest R. Schreurs.
Kempenlandstraat 8
5211 VN  Den Bosch
The Netherlands
E-mail: ernest@wxs.nl

Release 1.00 May 1997
Release 1.01 somewhere in between, mailed upon requests, skips unsupported
             and unrecognized chunks in wav files, which some programs create.
Release 1.02 September 2003 added support for 16 bit samples, which some
             programs that do not allow setting sample size use.
