interesting comment thread on the subject of defining the Midwest region of the US. One of the thoughts that occurred to me while reading that was whether it was possible to define regions based on inter-state migration patterns. The idea grew, I suppose, out of my own experience. I lived and worked in New Jersey for ten years, but never really felt like I fit in there. Eventually my wife and I moved to Colorado, to the suburbs of Denver, where we immediately felt right at home. Most people, I thought, might have been brighter than we were and not moved to someplace so "different."
I've also encountered a variety of nifty data visualization tools that look at inter-state migration in the US, like this one and this one from Forbes. State-level data for recent years turns out to be readily available from the Census Bureau. We can define a simple distance measure: two states are close if a relatively large fraction of the population of each moves between them each year. "Relatively" because states with large populations have large absolute migration numbers in both directions. For example, large numbers of people move between California and Texas -- in both directions -- because those states have lots of people who could move. From Wyoming, not so many. Given a distance measure, this becomes a statistical problem in cluster analysis: partition the states into groups so that states within a group are close to each other. Since there's only a distance measure, and no coordinates for the states, hierarchical clustering seems like a reasonable choice.
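To make the idea of a population-relative distance concrete, here is a minimal sketch in Python. The exact formula is my illustration, not necessarily the one used for the map: it assumes closeness is the sum of the two migration shares (movers divided by the origin state's population) and that distance is the reciprocal of that sum. The function name, the state labels, and all of the numbers are made up for the example.

```python
from itertools import combinations


def migration_distances(flows, population):
    """Turn yearly state-to-state flows into a symmetric distance table.

    flows[(a, b)] is the number of people moving from a to b in a year;
    population[a] is the population of a. Both inputs are hypothetical here.
    """
    dist = {}
    for a, b in combinations(sorted(population), 2):
        # Share of each state's population that moves to the other state.
        share_ab = flows.get((a, b), 0) / population[a]
        share_ba = flows.get((b, a), 0) / population[b]
        closeness = share_ab + share_ba
        # Assumed formula: distance is the reciprocal of the combined share,
        # so a pair with no recorded exchange is infinitely far apart.
        dist[(a, b)] = 1.0 / closeness if closeness > 0 else float("inf")
    return dist


# Made-up numbers, not Census data.
population = {"Big1": 30_000_000, "Big2": 25_000_000,
              "Small1": 600_000, "Small2": 700_000}
flows = {("Big1", "Big2"): 60_000, ("Big2", "Big1"): 40_000,
         ("Small1", "Small2"): 3_000, ("Small2", "Small1"): 3_500}

dist = migration_distances(flows, population)
```

With these invented figures the two small states come out closer to each other than the two big ones, even though far more people move between the big pair in absolute terms. Other ways of combining the two shares (the smaller of the two, or their product) would be just as defensible.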
Answers to random anticipated questions... I used seven clusters because that was the largest number possible before there was some cluster with only a single state in it. The Northeast region has the greatest distance between it and any of the other regions. If the country is split into two regions, the dividing line runs down the Mississippi River. If into three, the Northeast gets split off from the rest of the East. There are undoubtedly states that should be split, e.g., western Missouri (dominated by Kansas City) and eastern Missouri (dominated by St. Louis); a future project might be to work with county-level data.
 My implementation of hierarchical clustering works from the bottom up, starting with each state being its own cluster and merging clusters that are close. Using the particular measure I defined, close pairs of states include Minnesota/North Dakota, California/Nevada, Massachusetts/New Hampshire, and Kansas/Missouri. These agree with my perception of population flows.
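For readers who want to see the bottom-up procedure spelled out, here is a small sketch, not my actual implementation. It assumes average linkage (the distance between two clusters is the mean of all pairwise distances between their members); single or complete linkage would slot in the same way. The labels P, Q, R, S and their distances are invented.

```python
from itertools import combinations


def agglomerate(names, dist, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""

    def pair_distance(a, b):
        # The table stores each pair once, in either order.
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    def average_linkage(c1, c2):
        # Assumed linkage rule: mean distance over all cross-cluster pairs.
        total = sum(pair_distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Start with every item in its own cluster.
    clusters = [frozenset([name]) for name in names]
    while len(clusters) > n_clusters:
        # Find the indices of the two closest clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Invented distances between four hypothetical states.
toy_dist = {("P", "Q"): 1.0, ("R", "S"): 2.0,
            ("P", "R"): 10.0, ("P", "S"): 11.0,
            ("Q", "R"): 9.0, ("Q", "S"): 12.0}

two_groups = agglomerate(["P", "Q", "R", "S"], toy_dist, n_clusters=2)
```

On the toy data, P and Q merge first, then R and S, which leaves the two groups {P, Q} and {R, S}.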
 The singleton when eight clusters are used is New Mexico. When ten clusters are used, Michigan also becomes a singleton, and Ohio/Kentucky becomes a stand-alone pair.
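The rule for settling on seven clusters can be written down in a few lines. This is a sketch under the assumption that the partition for each cluster count is already available as a dictionary; the function name and the five-item example are hypothetical.

```python
def largest_k_without_singletons(partitions):
    """Return the largest cluster count reached before any singleton appears.

    partitions[k] is the list of clusters obtained when the hierarchy is cut
    into k clusters. Returns None if even the smallest k has a singleton.
    """
    best = None
    for k in sorted(partitions):
        if any(len(cluster) == 1 for cluster in partitions[k]):
            break  # first cut that isolates a single item: stop here
        best = k
    return best


# Hypothetical cuts of a five-item hierarchy.
toy_partitions = {
    1: [{"a", "b", "c", "d", "e"}],
    2: [{"a", "b"}, {"c", "d", "e"}],
    3: [{"a", "b"}, {"c", "d"}, {"e"}],
}

chosen_k = largest_k_without_singletons(toy_partitions)
```

In the toy example the three-way cut isolates "e" on its own, so the rule picks two clusters, in the same way the eight-way cut of the states isolates New Mexico and the rule picks seven.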