The Year 2000 Crisis: One software engineer's perspective

Some background

   In the Real World, calculations and decisions are made on the basis of elapsed time.  If these calculations are not done correctly, then the computers that use them will make wrong decisions.  These calculations are ubiquitous.  It is hard to find erroneous code by testing - you have to fool the program into thinking that the time is something other than what it is, and that in turn may break other programs running in the system.

Some deeper background - this problem was anticipated

    This is not a new problem.  There is a paradox, called "Goodman's Grue-Bleen paradox", which describes what we're facing.  See the Fall 1983 issue of the Bulletin of the Santayana Society, "on Grue and Bleen", by Angus Kerr-Lawson.  See also Goodman's Paradox.  In the paradox, an object is "grue" if it is green before a certain time, T, and blue after T.  An object is "bleen" if it is blue before T, and green after T.  The paradox is how to tell the difference between something which is green and something which is grue.  At the moment, it is impossible to know.  After T, it will be easy.  Similarly, it is easy now to tell the difference between something which is green and something which is bleen.  But after T, it might not be.  Further, any system built on distinguishing between green and blue will work properly now with green, blue, and bleen objects, but will fail after time T.
    A computer program can be viewed as a black box.  You can test the program to your heart's content, and it will work fine.  Now.  But at some arbitrary point in the future, in this case at time T=31-Dec-1999 11:59:59 PM, it will stop working.  And any system built on this program will also stop working.  I will return to this point in a moment.

The problem

    For forty years, computers have been storing dates as text, in the form yymmdd where yy is the year (which can vary from 00 to 99), mm is the month (which can vary from 01 to 12) and dd is the day of the month (which can vary from 01 to 31).  Yes, this is a suboptimal coding: one could store the date as the number of days from a certain date, which is what the Julian date is, for example.  The problem is that the programmers were torn between conflicting demands from management to code quickly yet compactly.  This coding is quick and compact.
    With this coding, the difference between two days can be calculated as follows:

DifferenceBetweenDate ( StartDate, EndDate)
{
   StartYear <- Integer(StartDate(1:2))                -- pick the year from the first two digits and convert from a string to an integer
   EndYear <- Integer(EndDate(1:2))                    -- pick the year from the first two digits and convert from a string to an integer
   StartMonth <- Integer(StartDate(3:4))               -- pick the month from the third and fourth digits and convert from a string to an integer
   EndMonth <- Integer(EndDate(3:4))                   -- pick the month from the third and fourth digits and convert from a string to an integer
   StartDay <- Integer(StartDate(5:6))                 -- pick the day from the fifth and sixth digits and convert from a string to an integer
   EndDay <- Integer(EndDate(5:6))                     -- pick the day from the fifth and sixth digits and convert from a string to an integer

   ElapsedDays <- EndDay - StartDay + CorrectionForDaysOfMonth
   IF EndMonth > StartMonth
    THEN
        ElapsedMonths <- EndMonth - StartMonth + CorrectionForMonths
    ELSE
        ElapsedMonths <- EndMonth - StartMonth + 12 + CorrectionForMonths
   ElapsedYears <- EndYear - StartYear

   TotalElapsedDays <- ElapsedYears * 365 + CorrectionLeapYears + ElapsedMonths * 30 + ElapsedDays

   RETURN TotalElapsedDays
}

A database that this program might read might look like this:

580119Silverman     Jeffrey    535348980MM3098112
841122Silverman     Daniel     535231231MS0198112
821028Silverman     Sarah      535191923FS0198112

This program and this database are not year 2000 compliant.  When the year 2000 rolls around, it will pick up 00 from the year field, subtract the StartYear, and come up with a negative number.  It will then multiply that by 365 and that error will swamp the rest of the calculation.

Suppose now that I go and fix the program.  That's easy to do:

DifferenceBetweenDate ( StartDate, EndDate)
{
   StartYear <- Integer(StartDate(1:4))                -- pick the year in the first two digits and convert from a string to an integer
   EndYear <- Integer(EndDate(1:4))                    --pick the year in the first two digits and convert from a string to an integer
   StartMonth <- Integer(StartDate(5:6))                -- pick the year in the first two digits and convert from a string to an integer
   EndMonth <- Integer(EndDate(5:6))                    --pick the year in the first two digits and convert from a string to an integer
   StartDay <- Integer(StartDate(7:8))                -- pick the year in the first two digits and convert from a string to an integer
   EndDay <- Integer(EndDay(7:8))                    --pick the year in the first two digits and convert from a string to an integer

   ElapsedDays <- EndDay - StartDay + CorrectionForDaysOfMonth
   IF EndMonth > StartMonth
    THEN
         ElapsedMonths <- EndMonth - StartMonth + CorrectionForMonths
    ELSE
        ElapasedMonths <- EndMonth - StartMonth + 12 + CorrectionForMonths
   ElapsedYears <- EndYear - StartYear

   TotalElapsedDays <- ElapsedYears * 365 + CorrectionLeapYears + ElapsedMonths * 30 + ElapsedDays

  RETURN TotalElapsedDays
}

There are a couple of problems with this new program.  Can you find them?  The revised program is just as correct as the original program (there is a lot of software in those correction functions, which need not concern us here).  Go ahead and think for a moment - I can wait.
The first problem is that if I run this corrected program against uncorrected data, then the program will give bad results and might even crash.  The second problem is that, while I fixed the code, I didn't fix the comments!  If another programmer comes through in a few years to check what I've done, and finds this, he or she will be very confused.  There is no software tool made that will detect this second problem; in fact, I think it is theoretically impossible to build such a tool.
Well, suppose I write a program to correct the data.  The corrected data looks like this:

19580119Silverman     Jeffrey    535348980MM3098112
19841122Silverman     Daniel     535231231MS0198112
19821028Silverman     Sarah      535191923FS0198112

Am I done?  Well, no.  The problem is that this database is probably read by many programs.  If some of the programs are corrected, and some are not, then there will be problems.  If an uncorrected program reads a corrected database, it will think that the year is always 19 and the month is 58, or 84, or something.  So now you have to have two databases.  Keeping the data straight and correct between the databases will be an ongoing challenge.  Part of the value of a database is having it stay up to date all the time; having two databases means that there is a delay from database to database.

A plot complication: the Gregorian Calendar has a wrinkle

Consider this flowchart:
For more information on what this all means, consider this Software Problem Report response which was written by Digital Equipment Corporation (DEC) in 1983!

How did we get into this mess?

The programmers who wrote this code in the first place were fairly confident that the programs would have a relatively short life.  They knew that the programs would be re-written and that the data formats would change.  They also knew that storage space was expensive and the development time was expensive so they came up with quick and cheap solutions.  And it worked.

Warning! Warning! Warning!  Extreme cynicism approaching!  They also knew that they would go on to other jobs and when the problem became generally known, they would be elsewhere.  Many of them have retired.  But management also gets some blame for being short sighted, for emphasizing rapid delivery instead of taking the time to test properly, and for building single big systems instead of lots of little systems.

Human Resources (HR) gets in the way.  Have you ever turned in a resume to an HR person, knowing full well that this person hasn't a clue what you do or how you do it?  Have you worked a contract job, where the job was to get this thing done as quickly as possible and then good bye?  The Federal Government, for example, has some talented people, but more frequently has to contract out to get the DP services it needs.  Congress doesn't help: it mandates overly complicated solutions to simple problems (consider, for example, the Internal Revenue code).

Some of the problem can be traced to planned obsolescence.  This is really cynical, but here is an article on the subject in the health care field.  End of cynicism.

Ordinary people (non programmers and non DP managers) trust us (programmers, software engineers, sysadmins, DP managers) to build computer systems which do certain tasks.  Ideally, quicker, cheaper, and more reliably than existing human systems.  Most of the time, we succeed.  But sometimes we fail.  In general, I would like to know if non-computer people are generally happy with the proliferation of computer systems.  Please E-mail me with your comments on this question.

For a discussion of an operating system which is Y2K compliant and has been since 1980 (sic!), see this article on the VAX/VMS boot process.  VMS rocks!

Other resources on the Y2K problem

The astute reader will note that these articles disagree with me.  Actually, these articles are talking about the consequences of failure.  I want to point out that not everything is going to fail.
The telecommunications industry

The Federal Communications Commission (FCC)

General Services Administration (GSA)

Y2K for Women.  This is an interesting site.  I am somewhat offended at the idea that simply because I am not a woman, and perhaps especially because I am a software engineer, I don't care about the health, safety, and well-being of my family.  If women really want men to become more sensitive and feeling, then it seems incumbent on the women to begin the process by treating the men as if we had feelings.  However, once you get beyond that insult, there is a very important idea here, one that's actually fairly old: Be Prepared!  The thesis is that disasters happen, so get ready for them.  That means having supplies of food, water, batteries, tents, etc. so that if you are cut off from civilization, you can survive.

http://www.ieee.org/organizations/tab/Y2k3/tsld001.htm  (A text version) and http://www.ieee.org/organizations/tab/Y2k3/sld001.htm
is a nice presentation of a moderate view about what is going to happen.  In particular, slide http://www.ieee.org/organizations/tab/Y2k3/sld019.htm shows graphically that some systems will fail, some will not; some will recover quickly, some will recover better than they were, some will recover as good as they were, some will not recover.
 
 

The solution

There is no "the" solution.  Software is written in different languages.  In order to fix a program written in a given language, you have to understand that language.  Some programs are written in ancient languages, such as FORTRAN, COBOL, and Assembler.  Programmers able (and willing - the old languages are hard to use) to work in them are few and not cheap.  Sometimes, programs are written in obscure languages (ALGOL-68) or strange dialects of common languages (Instrumentation BASIC).  The operating system itself is a computer program, and some operating systems are not year 2000 compliant and are not going to be made year 2000 compliant (e.g. MS-DOS).  In some cases, fixing the problem will involve going to each computer, opening it, removing cards that are in the way, and replacing one chip with another.  You think software guys are expensive?  The hardware people are even more expensive.  And those of us who do both are incredibly dear.

In the case of the program above, I came up with a fix in about 10 minutes.  However, I haven't compiled it or tested it, and I made a mistake in the code, and I have another problem which I have to solve in a meaningful way.
 

There are other solutions.

The VMS operating system defines a Time data type, which is a signed, 64 bit number which counts 100 nanosecond ticks from November, 1858 (sic).  VMS will not have its "Y2K" crisis until the year 31086.  VMS is a great operating system - powerful, fast, efficient, reliable.  Unfortunately, none of those things seem to count in the marketplace.  For more information on VMS, contact Digital Equipment Corporation.

The UNIX operating system, including Linux, uses a 32 bit counter which counts seconds from 1-Jan-1970.  This is good until sometime in the year 2038.  By that time, most UNIX software will be running on 64 bit platforms (I hope), and UNIX will use a 64 bit counter.  Programs using the time_t typedef will work without changing the source code, although they might need recompilation.

Earlier, I mentioned that the coding was not optimal.  A more optimal coding would be to calculate a number based on the year, month, and day.  Microsoft Excel does this, and Excel is Y2K compliant.  For example, a more efficient coding of the year field is two hexadecimal digits instead of two decimal digits.  Then, the year field would be okay until the year 2156, and this solution is backwards compatible with all existing software.  "Backwards compatible" in this context means that if an unconverted program or a converted program writes to the database, a converted program can read the data correctly.  Further, if a converted program writes to the database and an unconverted program attempts to read the database, then the program can probably be expected to signal the error.

JavaScript also uses a counter, though it counts milliseconds rather than seconds, starting from 1-Jan-1970.  Since it is already far past 4.3 billion, it cannot fit in 32 bits; it is held in a 64 bit floating point number.  The Java Date package is year 2000 compliant and has been from day 1.

Testing

    With a small computer, such as a PC, it is easy to change the clock and see if everything works.  For big computers, it is harder.  Unless you can change the clock on your computer, you can't test the software completely, and you have to take it on faith that you've fixed it correctly.  This is the "Grue-Bleen paradox" in the real world.  We assume that objects which are green today will be green tomorrow - in other words, we never observe Grue objects.  But the world might be full of Grue objects; it's just that the time T hasn't come and gone yet.
    The difference between the Y2K problem and the Grue-Bleen paradox is that we have the ability to peer into the black box and determine using deduction, not induction, that the system will work, or will not work.  Proving programs correct is a topic which is covered in most undergraduate computer science programs - unfortunately, the pressure to get something out the door precludes proving programs correct.

Some helpful information for software testers

Among others, we have found the following transition dates interesting to test:

September 9, 1999 to September 10, 1999 (to confirm correct translation of 9/9/99, which is sometimes used as an End of File marker.  This is a technique taught in introduction to programming classes.  Why?  Because it is easy for people who barely understand BASIC to understand.  The problem is that these people then go out and write production code, but they write it in Intro to Programming style.)
December 31, 1998 to January 1, 1999 (to check whether 99 is used to mean "no expiration date", again an intro to programming trick)
December 31, 1999 to January 1, 2000 (to check century transition; January 1 should be a Saturday)
February 28, 2000 to February 29, 2000 (to verify leap year calculation)
February 29, 2000 to March 1, 2000 (to verify leap year calculation; March 1 should be a Wednesday)
Any time after Mon Jan 18 19:14:06 2038.

I used this "C" program to test the time function:

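#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[])
{
  if (argc < 2) {
    fprintf(stderr, "usage: %s seconds-since-1970\n", argv[0]);
    return 1;
  }

  /* atoi() yields an int, so it cannot represent input past 2147483647 */
  time_t t = atoi(argv[1]);

  /* ctime() ends its string with a newline, so this prints a blank line too */
  printf ( "%s\n", ctime( &t ) );
  return 0;
}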
Here are some runs:
[jeffs@angel jeff]$ ./timetest 2147483646
Mon Jan 18 19:14:06 2038

[jeffs@angel jeff]$ ./timetest 2147483647
Mon Jan 18 19:14:07 2038

[jeffs@angel jeff]$ ./timetest 2147483645
Mon Jan 18 19:14:05 2038

[jeffs@angel jeff]$ ./timetest 2147483646
Mon Jan 18 19:14:06 2038

[jeffs@angel jeff]$ ./timetest 2147483647
Mon Jan 18 19:14:07 2038

[jeffs@angel jeff]$ ./timetest 2147483648
Mon Jan 18 19:14:07 2038

[jeffs@angel jeff]$ ./timetest 2147483649
Mon Jan 18 19:14:07 2038

2147483647 = 2 * 1024 * 1024 * 1024 - 1

Now, here is where the story turns personally bizarre: I was born at 6:30 AM on January 19th, 1958; the UNIX time crisis will occur about 11 hours before my 80th birthday.
 

A solution for computers on networks which have noncompliant BIOSes

When your computer starts, it gets the time from the Time Of Year (TOY) clock which is part of the BIOS.  On many older PCs, this chip is not Y2K compliant.  The fix is to replace the chip.  Very expensive in terms of time and labor and the Risk of Screwing up.  A better solution, if your PC is on the network, is to designate a machine as the time server and then give the MS-DOS command

net time /set /yes \\timeserver

Put this command in a .BAT file and reference the .BAT file from the \windows\startup folder.

If you are running Linux, then you can get the time across the network with the rdate command, as in:

 rdate -p -s clock.llnl.gov

There are lots of lists of timeservers on the 'web; this particular timeserver is at the Lawrence Livermore National Lab in California.

Is the world about to end?

    Probably not.   There are some things that will break.  Some things won't.  I think that the major banks are aware of the problem and mobilizing resources to deal with the problem.  Visa, for example, has a serious compliance testing program in progress, with severe penalties for failure.  The Federal government always has had problems getting high quality talent (consider, for example, the U.S. Congress... sorry) and that is more scary.  But for example, will airplanes crash?  I don't think so.  The human pilots, for example, are year 2000 compliant and they still remember how to look out the window and fly.  They might even enjoy it.  I am told that the microprocessor in the engine of a car made in Japan uses a 00 in the year field to indicate that there is a problem in the engine - whether this is true or not is a matter of conjecture (it might, for example, be a rumor started by a rival).
    Remember, bad news sells newspapers and magazines.  Would you bother reading a headline "Over 20 747s landed safely at Sea Tac airport: thousands unhurt"?  I thought not.
   Again, see the IEEE page for a balanced discussion.
    What would happen if...  The example at www.y2kwomen.com is overly simplistic.  Suppose your bank wasn't Y2K compliant and all of a sudden, you couldn't deposit checks (because the bank's computer thinks the checks are 100 years old).  The deposit would be rejected and you or the person who gave you the check would get a (computer generated) letter stating that there was a problem.  At that point, you would call a human being who would immediately understand the problem and Do Something to fix it.  Would it be a problem?  Well, yes, because computers can process checks in fractions of a second, whereas humans take minutes.  It will be a bloody pain - and the banks know this.  But, for example, if you bought a 10 year zero coupon bond in 1997, it already had to be Y2K compliant!  If you have paper records that show that you had assets stored in a computer, then the onus falls on the people who own the computer to show where the assets have gone.
    On the other hand.... there is allegedly a shortage of software people capable of dealing with the problem.  Yet, for example, as Compaq purchases Digital Equipment Corporation, it is laying off 15,000 people.  Intel is laying off people in Du Pont, Washington.  Boeing laid off a bunch of engineers in 1995 (including me).  If there really is a shortage, why are these people being shown the door?  No, the problem is not that there aren't enough people; the problem is that management is too focused on the short term to think about something happening hundreds of days into the future.  Again, pressure to get something out the door is a major part of the source of the problem.
    You can help here.  Before entering into a business arrangement with a company, such as taking out a loan or opening a savings or checking account, ask if they are year 2000 compliant.  You might even modify the contract so that the institution will warrant that it is working on year 2000 compliance, and that it will hold you harmless from any damages or claims resulting from a failure of their systems.  Managers pay attention to that sort of thing.  If you have a PC or a Macintosh, test your computer for year 2000 compliance (Do you use UNIX or Linux?  You can update the time across the 'web, but be wary of the year 2038, which is about 2.1 billion seconds after Jan 1 1970).
    So was the excitement justified?   Unfortunately, yes, it was.  Program maintenance is low on the radar screens of most managers, both IT and non-IT managers.  It's important, and there are some estimates that 50-60% of software cost is actually post release maintenance.  So unless the techies grabbed the managers by the lapels and told them "Dammit, this is important - the world is going to end unless you do something", nothing was going to happen.  Of course, managers can't code and they can't engineer so they did the only thing they knew how to do - call meetings.  And at those meetings, the techies told them that we might go bankrupt unless we fix this problem, and it's going to take these kinds of resources.  The managers would then stagger, dazed, down to the club and over a beer or, I guess, since these are managers, over champagne, they would say to their fellow suits "You know, my techies tell me that when the millennium comes, all our software is going to go kaboom".  To which the fellow suits would reply, "That's strange, my techies tell me the same thing.  Maybe there's something to it?"