Monday, July 21, 2014


A while back at Ordinary Times, there was an interesting comment thread on the subject of defining the Midwest region of the US.  One of the thoughts that occurred to me while reading it was whether regions could be defined based on inter-state migration patterns.  The idea grew, I suppose, out of my own experience.  I lived and worked in New Jersey for ten years, but never really felt like I fit in there.  Eventually my wife and I moved to Colorado, to the suburbs of Denver, where we immediately felt right at home.  Most people, I suspect, are brighter than we were and don't move someplace so "different" in the first place.

I've also encountered a variety of nifty data visualization tools that look at inter-state migration in the US, like this one and this one from Forbes.  State-level data for recent years turns out to be readily available from the Census Bureau.  We can define a simple distance measure: two states are close if a relatively large fraction of the population of each moves between them each year.  "Relatively" because states with large populations have large absolute migration numbers in both directions.  For example, large numbers of people move between California and Texas -- in both directions -- because those states have lots of people who could move.  From Wyoming, not so many.  Given a distance measure, the question becomes a statistical problem in cluster analysis: partition the states into groups so that states within a group are close to each other.  Since there's only a distance measure, hierarchical clustering seems like a reasonable choice.
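For the curious, the clustering step can be sketched in a few lines of Python.  The numbers below are invented for illustration (the real inputs would be the Census Bureau's state-to-state migration tables), and the similarity formula is just one plausible reading of the measure described above, not necessarily the one I used:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy data: moves[i][j] = annual migrants from state i to state j,
# pop[i] = population of state i.  All numbers are made up.
states = ["CO", "WY", "NJ", "NY"]
pop = np.array([5.0e6, 0.6e6, 8.9e6, 19.5e6])
moves = np.array([
    [0,     8000,  3000,  5000],
    [7000,  0,      300,   400],
    [2000,  200,   0,    60000],
    [4000,  300,  55000, 0],
], dtype=float)

# Similarity: sum of the two directional rates, each scaled by the
# origin state's population ("a relatively large fraction of each").
rate = moves / pop[:, None]   # fraction of state i moving to state j
sim = rate + rate.T           # symmetric similarity between pairs

# Convert similarity to a distance with zero self-distance.
dist = sim.max() - sim
np.fill_diagonal(dist, 0.0)

# Agglomerative (bottom-up) clustering on the condensed distance matrix.
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
print(dict(zip(states, labels)))
```

With these toy numbers, Colorado/Wyoming and New Jersey/New York fall into separate clusters, matching intuition about who actually moves between whom.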

The map to the left shows the results of partitioning the 48 contiguous states into seven clusters.  The first thing I noticed about the partition is that states are grouped into contiguous blocks, without exception.  While that might be expected as a tendency [1], I thought there would be at least a couple of exceptions.  The resulting regions are more than a little familiar: there's the Northeast, the Mid-Atlantic, the Southeast, the Midwest (in two parts), the West, and "Greater Texas".  There are a couple of other surprises after reading the discussion at Ordinary Times: Kentucky is grouped with the Midwest, and Missouri and Kansas with Greater Texas.  New Mexico clustered with Texas isn't surprising, but New Mexico with Louisiana and Arkansas?  Hierarchical clustering is subject to a chaining effect: New Mexico may be very close to Texas, and Louisiana also close to Texas, and they get put into the same cluster even though New Mexico and Louisiana aren't very close at all.

One way to test that possibility is to remove Texas from the set of states.  The result of doing that is shown to the left. As expected, New Mexico is now clustered with the other Rocky Mountain states and Louisiana with the Southeast.  Perhaps less expected is that the other four states -- Arkansas, Kansas, Missouri, and Oklahoma -- remain grouped together.  None of them is split off to go to other regions; the four are close to one another on the basis of the measure I'm using here.

Answers to random anticipated questions... I used seven clusters because that was the largest number possible before there was some cluster with only a single state in it [2].  The Northeast region has the greatest distance between it and any of the other regions.  If the country is split into two regions, the dividing line runs down the Mississippi River.  If into three, the Northeast gets split off from the rest of the East.  There are undoubtedly states that should be split, e.g., western Missouri (dominated by Kansas City) and eastern Missouri (dominated by St. Louis); a future project might be to work with county-level data.

[1]  My implementation of hierarchical clustering works from the bottom up, starting with each state being its own cluster and merging clusters that are close.  Using the particular measure I defined, close pairs of states include Minnesota/North Dakota, California/Nevada, Massachusetts/New Hampshire, and Kansas/Missouri.  These agree with my perception of population flows.

[2]  The singleton when eight clusters are used is New Mexico.  When ten clusters are used, Michigan also becomes a singleton, and Ohio/Kentucky a stand-alone pair.

Sunday, July 6, 2014

An Update on the War on Coal

[A longer version of this post appeared at Ordinary Times.]

It's been a tough year for coal in the United States. I generally dislike the use of war-on-this and war-on-that. But if the intended meaning is "make it much more difficult and/or expensive to continue burning large quantities of coal to produce electricity," then the phrase is accurate. Where most people who use the phrase go wrong, though, is in identifying who is fighting the war. It's the federal courts, and to a lesser degree some of the individual states. The EPA is just the tool through which the courts are acting. Well, also the ghosts of Congresses past, who left us with various environmental protection statutes in their current form. Since the SCOTUS hammered the coal side of the fight twice in the just-concluded term, it seems like a good time to write a little status report.

Not all the constituents of coal are combustible. Anywhere from 3% on up is not, and is left behind as ash; even 3% of a billion tons is a lot of ash. A bit more than 40% of coal ash is typically reused in various ways: some of it can replace Portland cement in the right circumstances, some of it can be used as fill for roadbeds, etc. The remainder winds up in landfills or ash ponds. Ash ponds contain an ash/water slurry; the wet ash stays where it's put rather than being blown away by the wind. Ash pond spills are becoming more common. The federal EPA has not regulated ash ponds in the past; in January this year the DC District Court accepted a consent decree between the EPA and several plaintiffs that requires the EPA to issue final findings on ash pond problems by December. The expectation is that the findings will lead to significant new regulation, and to increased spending on both existing and future ash ponds. Things are also happening at the state level. The North Carolina Senate unanimously approved a bill last week that would require the closure of all coal ash ponds in the state over the next 15 years. NC is not exactly one of your liberal Northeastern or Pacific Coast states.

Most of the visible pollutants that go up the flue at coal-fired plants have been eliminated. The picture to the left is the Intermountain generating station near Delta, Utah. The visible white stuff escaping from the stack is steam. Not visible are things like mercury compounds, sulfur and nitrogen oxides, and extremely small particles of soot. Those are all precursors to haze, smog, low-level ozone and acid rain, as well as being direct eye, nose, throat and lung irritants. Some of these pollutants can travel significant distances in the open air. In April this year, a three-judge panel of the DC Circuit upheld a tougher rule for emissions of this type (the MATS rule). Also in April, the SCOTUS approved the EPA's Cross-State Air Pollution Rule, which will result in tighter controls on this type of emission. Approval of the cross-state rule has been a long time coming, as EPA rules that would regulate cross-state sources made multiple trips up and down the court system. The courts have always held that the EPA should regulate cross-state pollutants; the problem has been finding a technical approach that would satisfy the courts. In EPA v. EME Homer in April, the SCOTUS reversed the DC Circuit, and the CSAPR will now go into effect.

Finally, last week the Supreme Court issued its opinion in the case of Utility Air Regulatory Group v. EPA. This opinion confirmed the Court's 2007 opinion in Massachusetts v. EPA that the EPA must regulate greenhouse gases. Massachusetts was a suit brought by several states against the Bush EPA, which had decided that carbon dioxide was not harmful. I think Utility is an odd opinion, cobbled together out of three different factions on the court (more about that in a moment). The opinion has three conclusions: (a) the EPA can and must regulate greenhouse gas emissions from stationary sources, (b) the EPA can only regulate greenhouse gas emissions from stationary sources if those sources would have been regulated for non-greenhouse emissions anyway, and (c) the somewhat controversial approach the EPA is taking to the regulation is acceptable. The last one seems to me to have been sort of an afterthought. OTOH, it's likely that we'll see a number of cases about it later when the states make the details of their individual plans known.

The results of the various court decisions are going to have very different effects on different states. Compare California and North Carolina, to pick two (not exactly at random). North Carolina has 43 coal ash ponds; California has none. North Carolina, despite being a much smaller state, generates more than 30 times as much electricity from coal as California; the MATS rule will require much more effort to meet in North Carolina. The CSAPR does not apply to California; but North Carolina power plants will be required to make reductions to improve air quality in downwind states. North Carolina has to reduce the CO2 intensity of its generating plants by more than the national average; California's required reduction is much less than the average, and decisions that California has already made at the state level will probably be sufficient to meet the EPA requirements. North Carolina's electricity rates are likely, it seems to me, to be noticeably higher in the future; California's rates will remain high and perhaps go higher, but aren't going to be driven by these decisions.

Monday, June 23, 2014

Infrastructure Needs - A Cartogram

From time to time you find articles that talk about how far behind the United States is in infrastructure spending.  The American Society of Civil Engineers maintains an entire web site dedicated to the topic.  I have often wondered whether there are geographic patterns to the infrastructure shortfalls.  One of the things that raises that question for me is the stuff I read about how the electric grid is falling apart.  In the Denver suburb where I have lived for the last 26 years, and the Front Range generally, there's been an enormous amount invested in the electric grid and service seems to be noticeably improved compared to what it was when I moved here.

Bloomberg maintains an interesting collection of state-by-state numbers, including infrastructure needs.  They only consider a limited number of things: roads, drinking water, and airports.  I'd like to have figures that included more factors — ah, how pleasant it would be to have minions (er, graduate students) to do the grunt work — but Bloomberg is an easy-to-use starting point.  The cartogram at left — double-click in most browsers for a larger version — shows US states sized to reflect Bloomberg's figure for annual per-capita infrastructure spending needs for the period 2013-2017.  West Virginia has the largest value at $1,035; New Jersey has the lowest at $78.  While there are lots of things that would be interesting to regress against the numbers, in this essay I'm just thinking about geography.
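For readers unfamiliar with cartograms, the sizing step reduces to a couple of lines: each state's target area is proportional to the value being displayed.  A toy sketch using only the two endpoint values quoted above (a real cartogram algorithm then warps the map geometry toward these targets, which is the hard part):

```python
# Target area shares for a value-by-area cartogram.  Only the two
# endpoint values from the Bloomberg data are used here, for illustration.
need = {"WV": 1035, "NJ": 78}  # annual per-capita need, dollars

total = sum(need.values())
target_share = {state: value / total for state, value in need.items()}

# West Virginia's target area dwarfs New Jersey's (~93% of the combined area).
print(target_share)
```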

One of the obvious things that jumps out is that high-population states do better.  California, New York, Florida, and Illinois all fall into that group.  The most likely reason would seem to be that there are economies of scale involved in the things Bloomberg measures.  An airport can serve more people in a high-population state; highway lane-miles are used by more people; doubling the capacity of a water- or sewage-treatment plant doesn't mean that the cost of the plant will be doubled.

Another factor appears to be that states with high population growth over the last 20 years do better.  California, Colorado, Texas, Georgia and Florida are examples.  In this case, the likely reason would be that rapidly growing populations have made infrastructure spending a critical need.  To use Colorado as an anecdotal case, since I live here and pay some attention: Denver built a major new airport, my suburb greatly expanded its water-treatment plant, and I-25 along the Front Range has been subject to an entire series of improvements (if you drive its length, the question is not whether part of it is under construction, but how much of it).

Finally, some simple regional observations. Here's a standard map of the 48 contiguous states using an equal-area projection for comparison.  The 11 contiguous western states appear to be in much better shape than the country as a whole.   As might be expected, Wyoming and Montana, the two western states with the smallest populations, do the worst in that region by a wide margin.  With a caveat that this is state-level data, the upper Great Plains, New England, and Appalachia all do very poorly.  With a small number of exceptions, the East Coast looks particularly bad.

I think I'll just sum this up with the obvious statement: "There are new parts of the country, and old parts of the country, and the new parts tend to have more and shinier stuff per person."

Sunday, June 22, 2014

Western Secession 7 -- The Age of Electricity

Lots of people who believe that Civilization is Doomed because of energy constraints talk about this being the Age of Petroleum. As far as transportation goes, that's absolutely true. But it's not really the most critical aspect of our current high-tech society. This is really the Age of Electricity.

If petroleum were to slowly go away, eventually reaching zero, there are alternatives. Existing land transportation can become much more efficient: smaller vehicles, trains instead of trucks, etc. Alternate power sources are available, at least for some applications: smaller electric vehicles, electric trains instead of diesel, etc. Goods can be produced closer to where they are consumed in order to reduce the amount of transportation required to deliver them. Slower transportation and delivery of objects helps — delivering the small package by electric train and electric vehicle uses much less energy than flying the package overnight. More expensive synthetic alternatives to petroleum-based liquid fuels exist for situations where alternatives are impractical.

OTOH, we have reached a point where there is no substitute for electricity. This short essay was written on a computer; it was uploaded to Blogspot and copied into some number of their computers; the copy you're reading was downloaded from one of those servers. The non-electric alternative is paper and ink (since radio and television also require electricity in large quantities).  If that were the only medium available, chances are that you would never see this. Paper and ink distribution imposes serious limits on how many people's writing gets distributed widely.  I'm using "widely" in the sense of making it possible for many people in many locations to read it. Having the NY Times publish it would count, as the NY Times is popular enough to have nation-wide physical distribution (although without electricity, that may mean being a day or two behind).  Putting it up on a wall in a public place in Arvada, CO doesn't count.

Recall that the original "wire services" that distributed stories to local newspapers were called that because it was a description of what they did. Stories were collected and distributed by telegraph and later teletype, both of which require electricity. David Weber's popular Safehold series of science fiction novels, set on a world where the use of electricity is strictly forbidden, envisions a semaphore network instead. It's slow, it's even slower at night, it fails temporarily when the weather is bad enough, and it fails completely when the message has to cross a large enough body of water (discounting transcribing the content, moving it physically across the water, then putting it back on the semaphore network).  High-speed communication means electricity.

Electricity is a key consideration in developing countries as well, with China as the most interesting case. Their population is extremely large. As recently as three decades ago, that population was desperately poor. The government is working — at a pretty hectic pace — to urbanize and find non-farming work for what was an enormous peasant population. In order to do that, electricity is vital. As a result, if it will generate electricity, China is deploying lots of it. Coal, natural gas, nuclear, hydro, wind, solar... China's rate of growth in the use of all of those is among the very highest in the world.  India has been less aggressive about expanding its grid, leading the head of one of that country's software development firms to say, "Job one is acquiring the diesel fuel to power our private generators; job two is writing software, and doesn't happen if we fail at job one."

Dependency on electricity has been greatly increased by the integrated circuit revolution. The common design approach for an enormous range of things is now a processor, a batch of sensors (some as simple as push buttons), and a handful of actuators. All of the difficult parts are implemented using software. Television is now digital, and depends on billion-transistor integrated circuits for every step from source to final viewing by the consumer. Film has disappeared. Music is (at least the vast majority is) delivered in digital formats dependent on those same integrated circuits. The banking system depends on computers to run the check clearing house, the stock markets are all electronic, the Post Office depends on computers to read addresses and route mail...  I told my bosses at Bell Labs that it was a software world back in the late 1970s; it has only become more so.

All of this may seem trivially obvious, but any plan to ensure that modern technology continues on into the future depends on maintaining robust reliable supplies of electricity.  The next post in this series will look at where the US gets its electricity today.

Monday, June 16, 2014

A Thought on the EPA's New CO2 Rule

Recently, the US EPA announced its proposed regulation of CO2 emitted by existing power plants.  The proposed rule follows as a consequence of (a) the Supreme Court's finding that greenhouse gases are an air pollutant under the language of the Clean Air Act, and that the EPA is therefore required to regulate it if it is harmful, and (b) the DC Circuit Court's subsequent finding that greenhouse gases are harmful.  Everyone knows that the matter will wind up back in the courts.  Ben Adler at Grist provides a nice summary of the potential legal vulnerabilities of the proposed rule.

As described in an earlier Grist piece, each state will have its own target for reduction of CO2 emissions, and each state will be allowed to develop its own plan for achieving the necessary reduction.  Washington will have to reduce its emissions by about 70%; North Dakota will only have to reduce its emissions by about 10%.  The EPA formula(s) (PDF) for calculating the required emissions targets are complicated and consider a number of factors.

One of the factors that is not included is where the electricity is consumed.  Some states produce more electricity than they consume, others produce less.  The graph to the left shows the approximate net exports for each state, in megawatt-hours [1].  California is at the top, with a negative value indicating they are a large importer of electricity.  Pennsylvania is at the other end of the chart and is the largest exporter.

Tracking exports and imports in more detail can be difficult.  Some cases are relatively straightforward.  Xcel Energy owns the coal-fired Comanche power plant in Pueblo, CO and sells the electricity generated there to consumers up and down the Front Range.  The 1.9 GW coal-fired Intermountain power plant in Utah is owned by utilities in California and Utah.  75% of the plant's output goes by HVDC transmission directly to San Bernardino County, CA; the remainder goes to utilities and electricity cooperatives in Utah.  The coal-fired Jim Bridger power plant in Wyoming is owned by Berkshire Hathaway and sells its output to two utilities operating across six states.  Oregon is one of those states.  Oregon is a net exporter of electricity, primarily hydro electricity sold to utilities in California.  Oregon generates a modest amount of in-state power from coal and imports coal-fired electricity from Wyoming and Utah.

Reducing CO2 emissions will require that money be spent on coal-fired power plants -- on sequestration technology, or on efficiency improvements [2], or on fuel conversions.  That money will eventually be collected from the pocketbooks of electricity consumers.  The fact that there are states that are exporters and importers of electricity would seem, at least to me, to create an opportunity for a certain amount of mischief.  That is, a state's plan for reducing CO2 emissions might be structured so that, as far as is possible, out-of-state consumers pay for the necessary changes.  From the examples in the preceding paragraph, Wyoming and Utah have an interest in getting California and Oregon to foot as much of the bills as possible.  In addition to Ben Adler's list of reasons that the EPA's final rule will end up in court, look for the distinct possibility of some states (and interstate companies) suing other states over their plans.

Myself, I'm in the camp that says, "A carbon tax would have been enormously simpler."  Reality, though, forces me to acknowledge that politics is the art of the possible, that such a tax would be DOA in Congress, and that not allowing Congress to delegate taxes and tax rates to the EPA is a good thing.

[1] Data from the EIA's state electricity profiles for calendar year 2012, total net generation minus total retail sales.  For the US as a whole, net generation exceeds retail sales by about 10%.  Each state's generation figure is scaled down by the US ratio so that the US Total exports comes out zero.
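The footnote's adjustment is simple enough to sketch in Python.  The numbers below are made up for illustration, not the actual EIA figures:

```python
# Scaled net exports, per footnote [1]: scale each state's generation down
# by the US-wide sales/generation ratio so that total exports sum to zero.
# All figures are invented placeholders, in MWh.
generation = {"PA": 220e6, "CA": 200e6, "CO": 53e6}
retail_sales = {"PA": 145e6, "CA": 260e6, "CO": 54e6}

# Nationally, net generation exceeds retail sales (line losses, plant use),
# so the US-wide ratio is a bit less than 1.
scale = sum(retail_sales.values()) / sum(generation.values())

net_exports = {s: generation[s] * scale - retail_sales[s] for s in generation}

# Sorted largest exporter first; the shares sum to zero by construction.
print(sorted(net_exports.items(), key=lambda kv: -kv[1]))
```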

[2] An older conventional coal-fired plant may have 30% thermal efficiency.  That is, 30% of the heat energy released by burning the coal is converted to electricity.  New technology may achieve 45% thermal efficiency.  Such technology would lower CO2 emissions by 33% for the same amount of electricity.
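That arithmetic, spelled out: for a fixed fuel, CO2 emitted per unit of electricity scales inversely with thermal efficiency.

```python
# CO2 reduction from an efficiency upgrade, per footnote [2].
old_eff, new_eff = 0.30, 0.45

# Same electricity out means emissions scale as old_eff / new_eff.
reduction = 1 - old_eff / new_eff
print(f"{reduction:.0%}")  # → 33%
```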

Sunday, May 25, 2014

What Is It With Economists and Spreadsheets?

The hot book in economics this year is Thomas Piketty's Capital in the Twenty-First Century.  The book addresses inequality, arguing that inappropriate levels of it are a natural outgrowth of capitalism.  It makes sweeping policy proposals, such as a global tax on wealth and much higher income tax rates on the upper income brackets.  It has been favorably reviewed by liberal economists like Paul Krugman, and criticized by conservative economists.  Recently we learned one more thing about it, as reported by the Financial Times: calculations critical to the argument relied on error-plagued spreadsheets.

Last year we went through the episode of the Reinhart-Rogoff paper that asserted that national public debts in excess of 90% of GDP killed economic growth.  Conservative politicians jumped on the paper as a justification for proposing drastic changes in US public policy.  As it turned out, critical calculations were done by spreadsheet, the spreadsheet had errors, and when the errors were corrected the "cliff" in economic growth at 90% disappeared.  The damage had already been done, though.  The 90% debt-to-GDP cliff was quoted in a variety of government reports, and those secondary sources continue to be cited in policy debates today.

What is it with economists and spreadsheets?  Spreadsheet software is a programming system.  There is a large literature on the frequency and nature of spreadsheet errors (the European Spreadsheet Risks Interest Group's annual conference on the subject will be held in July this year).  Even using best practices, complex spreadsheets contain errors at a rate that would be completely unacceptable in any other programming environment.  Nor is there any evidence on offer that the economists mentioned above were following those best practices.  For example, I have yet to read that Reinhart and Rogoff conducted formal code reviews.

One part of this is particularly puzzling to me.  Some years back I spent two semesters in a PhD economics program.  The econometrics classes used R and Gauss.  There was never even a hint that Excel was an acceptable method for doing research calculations.  Certainly none of the graduate students I met who were working on their dissertations were using a spreadsheet to do the analysis.  So to find highly-respected academic economists using spreadsheets is surprising.  They have to know that if the spreadsheet is even moderately complex, errors are creeping in.  Quite possibly embarrassing errors [1].
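To make the "real tools" point concrete, here is a toy version of a debt-bucket calculation done as a script rather than a spreadsheet.  The data are invented; the payoff is the coverage assertion, which is exactly the kind of check a spreadsheet makes hard (the Reinhart-Rogoff error silently dropped rows):

```python
# Group average growth by debt bucket, with a self-check that every
# observation was used.  All numbers are made up for illustration.
data = [  # (country, debt_to_gdp, growth)
    ("A", 45.0, 3.1), ("B", 95.0, 2.2), ("C", 120.0, 1.8),
    ("D", 60.0, 2.9), ("E", 92.0, 2.5),
]

buckets = {"under_90": [], "over_90": []}
for country, debt, growth in data:
    buckets["over_90" if debt >= 90.0 else "under_90"].append(growth)

# Coverage check: no row silently left out of the analysis.
assert sum(len(v) for v in buckets.values()) == len(data)

means = {k: sum(v) / len(v) for k, v in buckets.items()}
print(means)
```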

For some decades, economists have been accused of suffering from "physics envy."  That is, they want their field to be considered a hard science like physics, not one of the so-called soft sciences.  I'll offer economists a piece of free advice on that: hard sciences don't do data analysis with spreadsheets.  Clean up your act.  Require authors to certify that they used real tools for numerical work, reject out-of-hand papers that don't, and harshly punish people who lie about it.

[1]  Using better tools is no guarantee that errors won't creep in.  But they offer a better chance of catching them.

Wednesday, March 26, 2014

Mike the Pythoneer...

A few weeks back my old Mac Mini got to the point where it was giving me the "gray screen of death" every 18 to 36 hours [1].  Replacing the RAM -- failing RAM being the most frequent cause of kernel panics -- didn't fix the problem.  I decided that six-and-a-half years was a good run, that I had outgrown the RAM size limit, that having a graphics chip too dumb to support acceleration for OpenGL wasn't good, and that having no OS upgrade path for the old machine was a bad thing.  So I got a new Mini.  It has not been an entirely painless process, largely because there have been a bunch of changes in the Apple development tools.

I have a couple of pieces of software that I use regularly that I wrote in Perl.  I'm entirely dependent on one of them, a note-taking application that also acts as the front end for (sort of) organizing a whole collection of files of various types -- PDFs, images, old source code -- that is pretty static these days [2].  Another, part of a package for drawing cartograms, is under off-and-on development.  Both were broken under the default Perl/Tcl/Tk that comes as a standard part of the new OS X, and required some effort to get running again.  To get the note-taking app running I ended up downloading and using ActiveState's Perl instead of the Apple default.  At one point in the past I had toyed with the idea of rewriting the note-taker in Python (for other reasons) and had some code that tested all of the necessary GUI functionality; that old Python code ran on the new Mini with no problems using the Apple default installation.

Reading a variety of Internet things led me to (perhaps erroneously) conclude that: Apple is moving away from Perl and towards Python for scripting; Tkinter is a required part of the Python core, but Perl has no required GUI module; so Apple is more likely to keep the chain of Python/Tcl/Tk working in the future.  Suddenly, switching to Python seemed much more compelling than before.  I also came across Eric Raymond's old "Why Python?" article from the Linux Journal.  His experience seemed to match my own recollections of writing that little Python program related to the note-taker: I got over the "eeewww" thing about the use of white space to identify block structure fairly quickly, and I seemed to be writing useful non-toy code fairly quickly.

One of my favorite (and I use that word with some ambivalence) time-wasters on my computer is FreeCell.  Since Windows 95, Microsoft has always provided a simple version of the game as part of the standard distribution.  When I switched to a Mac at home several years ago, the lack of a simple, free, reasonably attractive version of the game grated on me [3].  I ended up using a Javascript version in my browser, but recently that one began to act flaky, putting up placeholders instead of images for some of the cards.  "Two birds with one stone," I thought.  "Practice Python and get a version of FreeCell that looks and behaves like I want."

The good news is that with a couple of manuals open in browser tabs, writing the game went quickly.  Call it 15 hours total over three days to go from nothing to something up and running that my fingers are almost comfortable with.  And by nothing, I mean just that: no cardface images, no thoughts on data structure.  A blank slate.  Some of that time went to small rewrites, when I would find an example that showed a better Python idiom.  The bad news comes in several parts:
  • There's a version of FreeCell that my fingers know just sitting there on the desktop now, begging to be "tested".
  • Feature creep is going to be an issue.  It should have undo.  It should have the ability to remember the current layout and return to it.  It should have a built-in solver that works from any position.
  • It should run on my old Linux laptop.  It should run on my Android tablet.  It should run on my wife's iPhone (well, maybe that's a stretch).
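For the curious, a minimal sketch of one way a FreeCell board might be represented.  The names and structure here are my own illustration, not the actual code described above:

```python
from dataclasses import dataclass, field

RANKS = "A23456789TJQK"
SUITS = "CDHS"  # clubs, diamonds, hearts, spades

@dataclass
class Board:
    # Eight cascades, four free cells, four foundations (top rank per suit).
    cascades: list = field(default_factory=lambda: [[] for _ in range(8)])
    free_cells: list = field(default_factory=lambda: [None] * 4)
    foundations: dict = field(default_factory=lambda: {s: 0 for s in SUITS})

    def deal(self, deck):
        # FreeCell deals 52 cards round-robin into the 8 cascades.
        for i, card in enumerate(deck):
            self.cascades[i % 8].append(card)

    def won(self):
        # Won when every foundation has been built up to the king.
        return all(top == 13 for top in self.foundations.values())

deck = [r + s for s in SUITS for r in RANKS]  # unshuffled, for illustration
b = Board()
b.deal(deck)
print([len(c) for c in b.cascades])  # → [7, 7, 7, 7, 6, 6, 6, 6]
```

From a base like this, undo is a stack of board snapshots and a solver is a search over legal moves, which is roughly where the feature-creep list above leads.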
So I guess I'm going to be a Pythoneer (I actually think "Pythonista" sounds better, but that's been preempted as the name of a commercial application).  Expect an occasional update on how things are going...

[1]  In the event of certain kernel panics, Mac OS X puts up a translucent overlay to block the display, along with a dialog box that says "You have to restart your machine" in several languages.  Until recently, I had no idea that such a thing even existed.  For the record, since it no longer has to support nearly as many processes, the old Mini has been up without a problem for more than three weeks.

[2]  At last count, a few hundred pages of notes and more than 100M of images, PDFs, etc.  As to why write my own when there are dozens of note-taking applications out there, let's just say that I'm an old geek and paranoid and don't like to have critical data stored in a proprietary file format.

[3]  I'm sure that all of the authors of the solitaire packages out there think their games are laid out attractively.  I just happen to disagree.