Monday, February 11, 2013

Saying Nasty Things About Spreadsheets

James Kwak at The Baseline Scenario and Lisa Pollack at FTAlphaville both point to the task force report (PDF) on how JPMorgan's Chief Investment Office lost billions of dollars.  Among the factors identified in the report are a number of impressive spreadsheet errors.  The spreadsheet program, in this case, was Excel.  Excel and its spreadsheet brethren are very likely the most commonly used business software in the world.  And not just in the business world; in some academic fields, Excel is the platform of choice for numerical work simply because it is almost certainly safe to assume that the people with whom you want to share particular calculations will have Excel available [1].

Spreadsheet errors of disastrous proportions are relatively common.  So much so that there is a European Spreadsheet Risks Interest Group (EuSpRIG) that holds an annual conference, tries to identify best spreadsheet practices, and collects horror stories from around the world.  Researchers at the University of Hawaii have been studying the issue of spreadsheet errors for many years.

Contemporary spreadsheets even "help" you introduce errors.  For example, when you insert a new row or column within a range used in a formula, the software will typically "correct" the formula to accommodate the change.  All of the spreadsheet programs that I have worked with will, particularly for boundary cases, sometimes make the wrong change.  When I worked for the Colorado General Assembly's Joint Budget Committee, all of the analysts feared inserting new rows in a certain spreadsheet because we knew that the software (not Excel in this case) was going to break an unknown number of formulas in seemingly random places when we did the insertion.

Many years ago I had the unfortunate experience of being the representative from the programming side of a project sent to the project leader's staff meetings.  It was a moderately large project that was going to cost a hundred million dollars or so to deploy.  The decision to go forward was being made on the basis of a very large spreadsheet that incorporated market penetration data, financial calculations, detailed operational cost estimates, and so forth.  At some point I asked to see the test cases for the spreadsheet that verified the accuracy of the calculations.  When I was told that there was no formal testing, but the people who had built the spreadsheet had been "very careful," my mouth got away from me.  "If we did the real-time software that way," I blurted, "you'd fire the whole lot of us."  That was the episode that really hammered home for me that building a spreadsheet is a programming exercise, too often done by people who have never been trained in programming [2].

Certainly my own personal coding fails to follow best practices, leading to occasional mistakes I would prefer not to have made.  There's really no excuse for some of the bad practices; I plead personal laziness and ancient habits.  I do a much better job when someone else is paying for my services.  Still, I have long believed that spreadsheet software in general makes it difficult to follow good practices.  Test cases for code are hard to do because of the embedded data.  Flow control and the order of calculations can be difficult to determine, even with visual displays that show cell dependencies in different ways.  Version control can be difficult.  It's too easy for developers to "peek" at data in other parts of the sheet that they're not supposed to be using [3].   At least I know what the risks for my sloppy practices are, which is more than can be said for most spreadsheet users.

Still, it's kind of surprising that a firm the size of JPMorgan, making decisions involving billions of dollars of the firm's own money, would allow such a bad set of spreadsheet mistakes be made.

[1] It's also common to assume that they will have the Microsoft Windows version of Excel, which implies more than simple spreadsheet calculations.  That version of Excel includes, for example, the nonlinear optimization Solver and VBA (Visual Basic for Applications).

[2] Yes, there are good self-taught programmers.  But that's not the way to bet.

[3] Object-oriented programming languages, that enforce concealing the data structures and algorithms inside a class, have been widely adopted for a reason.  I'm old enough to remember that good programmers, in the days before OOP languages were common, very often used OOP philosophy.

No comments:

Post a Comment