# Kilometers and the golden ratio

Regular readers know I enjoy reading John D. Cook’s blog and often comment on it here. But I was a little creeped out by it a couple of days ago. It started off with something I’ve been thinking about a lot over the past several weeks, and it was as if he’d been reading my mind.

The post is about how the conversion factor between miles and kilometers, $1.609\dots$, is close to the golden ratio, $\varphi =1.618\dots$. To convert miles to kilometers, you can make a good estimate by multiplying by $\varphi$, which means that you can convert in the other direction by multiplying by $1/\varphi =0.618\dots$.

You may think multiplying by an irrational number is a pain in the ass, and you’d be right. Cook gets around this by approximating $\varphi$ by the ratio of consecutive Fibonacci numbers, so you can, for example, convert from miles to kilometers by multiplying by 21 and dividing by 13. Similarly, you can use consecutive Lucas numbers in the same fashion, multiplying, say, by 29 and dividing by 18.

The problem with these calculations is that I’m not John D. Cook, and I can’t do multiplications and divisions like this in my head without losing digits along the way. So my conversion method is much cruder: to go from miles to kilometers, I multiply by 16; and to go in the opposite direction, I multiply by 6. In both cases, I finish by shifting the decimal point to divide by 10. If I need more precision than this, I pull out my phone, launch PCalc, and use its built-in conversions.
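For the record, here's a quick sketch (plain Python; the function names are mine, not from either post) checking how these shortcuts stack up against the exact factor of 1.609344 km per mile:

```python
# Compare the crude x16 / x6 shortcuts against the exact factor
# and the Fibonacci approximation. One mile is exactly 1.609344 km.
MI_TO_KM = 1.609344

def mi_to_km(miles):
    # Multiply by 16, then shift the decimal point left.
    return miles * 16 / 10

def km_to_mi(km):
    # Multiply by 6, then shift the decimal point left.
    return km * 6 / 10

print(mi_to_km(3))    # 4.8; the exact answer is about 4.83
print(km_to_mi(5))    # 3.0; the exact answer is about 3.11
print(21 / 13)        # Fibonacci estimate: about 1.615
```

The crude factors are off by less than 1%, which is plenty for mid-walk arithmetic.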

By the way, the main reason I’ve been converting between miles and kilometers lately is that I recently switched the units I use in the Activity/Fitness app¹ from miles to kilometers. I now often find myself in the middle of one of my walking routes wondering how long it is in kilometers. I know the lengths of all my routes in miles but haven’t memorized their lengths in kilometers yet. It was while doing conversions like this that I noticed that the conversion factor was close to $\varphi$ and started doing a lot of multiplication by 16.

If you’re wondering why I bothered switching units, it’s because I enter 5k races a few times a year and I like my time to be under 45 minutes. To keep myself in shape for this, every couple of weeks I push myself to do my first 5k in that time. It’s much easier to look at my watch and know that my pace should be about 9:00 per kilometer than 14:29 per mile. Also, it’s easier to know when I’m done—marking my time at 3.11 miles is as much a pain in the ass as multiplying by $\varphi$.
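The pace arithmetic checks out; a minimal sketch using the exact 1.609344 km/mile factor:

```python
# A 45-minute 5k works out to 9:00 per kilometer, or about
# 14:29 per mile (5 km is roughly 3.11 miles).
race_km = 5
target_min = 45

per_km = target_min / race_km                # minutes per kilometer
per_mi = target_min / (race_km / 1.609344)   # minutes per mile

mins, secs = divmod(round(per_mi * 60), 60)
print(f'{per_km:.0f}:00 per km, {mins}:{secs:02d} per mile')
# 9:00 per km, 14:29 per mile
```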

1. Does it really have to have different names on the watch and the phone? I don’t get confused when I’m using them because the icons are the same, but I never know which name to use when talking about them.

# The Electoral College again, this time with aggregation

After the last post, I had a conversation with ondaiwai on Mastodon about the to_markdown and agg functions in Pandas. After seeing his example code here and here, I rewrote my script, and I think it’s much better.

In a nutshell, the agg function aggregates the data pulled together by groupby and creates a new dataframe that can be formatted as a MultiMarkdown table via to_markdown. I had used agg several years ago but forgot about it. And I’d never seen to_markdown before. Let’s see how they work to make a nicely formatted Electoral College table.

This time, instead of showing you the script in pieces, I’ll show the whole thing at once, and we’ll consider each section in turn.

python:
1:  #!/usr/bin/env python
2:
3:  import pandas as pd
4:
5:  # Import the raw data
6:  df = pd.read_csv('states.csv')
7:
8:  # Generate new fields for percentages
9:  popSum = df.Population.sum()
10:  ecSum = df.Electors.sum()
11:  df['PopPct'] = df.Population/popSum
12:  df['ECPct'] = df.Electors/ecSum
13:
14:  # Collect states with same numbers of electors
15:  basicAgg = {'State': 'count', 'Population': 'sum', 'PopPct': 'sum',\
16:              'Electors': 'sum', 'ECPct': 'sum'}
17:  dfgBasic = df.groupby(by='Electors').agg(basicAgg)
18:
19:  # Print out the summary table
20:  print(dfgBasic)
21:
22:  print()
23:  print()
24:
25:  # Collect as above but for summary table in blog post
26:  tableAgg = {'Abbrev': lambda s: ', '.join(s), 'PopPct': 'sum', 'ECPct': 'sum'}
27:  dfgTable = df.groupby(by='Electors').agg(tableAgg)
28:
29:  # Print as Markdown table
30:  print(dfgTable.to_markdown(floatfmt='.2%',\
31:          headers=['Electors', 'States', 'Pop Pct', 'EC Pct'],\
32:          colalign=['center', 'center', 'right', 'right'] ))


Lines 1–12 are the same as before; they import the state population and Electoral College information into a dataframe, df, and then add a couple of columns for percentages of the totals. After Line 12, the first five rows of df are

State       Abbrev  Population  Electors    PopPct     ECPct
Alabama         AL     5108468         9  0.015253  0.016729
Alaska          AK      733406         3  0.002190  0.005576
Arizona         AZ     7431344        11  0.022189  0.020446
Arkansas        AR     3067732         6  0.009160  0.011152
California      CA    38965193        54  0.116344  0.100372

The code in Lines 15–20 groups the data according to the number of Electoral College votes per state and prints out a summary table intended for my use. “Intended for my use” means the output is clear but not the sort of thing I’d want to present to anyone else. “Quick and dirty” would be another way to describe the output, which is this:

          State  Population    PopPct  Electors     ECPct
Electors
3             7     5379033  0.016061        21  0.039033
4             7    10196485  0.030445        28  0.052045
5             2     4092750  0.012220        10  0.018587
6             6    18766882  0.056035        36  0.066914
7             2     7671000  0.022904        14  0.026022
8             3    13333261  0.039811        24  0.044610
9             2    10482023  0.031298        18  0.033457
10            5    29902889  0.089285        50  0.092937
11            4    28421431  0.084862        44  0.081784
12            1     7812880  0.023328        12  0.022305
13            1     8715698  0.026024        13  0.024164
14            1     9290841  0.027741        14  0.026022
15            1    10037261  0.029970        15  0.027881
16            2    21864718  0.065284        32  0.059480
17            1    11785935  0.035191        17  0.031599
19            2    25511372  0.076173        38  0.070632
28            1    19571216  0.058436        28  0.052045
30            1    22610726  0.067512        30  0.055762
40            1    30503301  0.091078        40  0.074349
54            1    38965193  0.116344        54  0.100372


The agg function has many options for passing arguments, one of which is a dictionary in which the column names are the keys and the aggregation functions to be applied to those columns are the values. We create that dictionary, basicAgg, in Lines 15–16, where the State column is counted and the other columns are summed. Notice that the values of basicAgg are strings with the names of the functions to be applied.¹

Line 17 then groups the dataframe by the Electors column and runs the aggregation functions specified in basicAgg on the appropriate columns. The output is a new dataframe, dfgBasic, which is then printed out in Line 20. Nothing fancy in the print function because this is meant to be quick and dirty.
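In miniature, with a made-up dataframe standing in for the states data (the numbers below are invented for illustration), the groupby/agg pattern looks like this:

```python
import pandas as pd

# Toy stand-in for the states dataframe; numbers are invented.
df = pd.DataFrame({'State': ['A', 'B', 'C', 'D'],
                   'Electors': [3, 3, 5, 5],
                   'Population': [700_000, 900_000, 2_000_000, 2_100_000]})

# Keys are column names; values name the function applied to that column.
aggSpec = {'State': 'count', 'Population': 'sum'}
summary = df.groupby(by='Electors').agg(aggSpec)
print(summary)
# Two rows, indexed by Electors: the 3-elector group has a count of 2
# and a population sum of 1,600,000; the 5-elector group has 2 and
# 4,100,000.
```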

After a couple of print()s on Lines 22–23 to give us some whitespace, the rest of the code creates a Markdown table that looks like this:

|  Electors  |           States           |   Pop Pct |   EC Pct |
|:----------:|:--------------------------:|----------:|---------:|
|     3      | AK, DE, DC, ND, SD, VT, WY |     1.61% |    3.90% |
|     4      | HI, ID, ME, MT, NH, RI, WV |     3.04% |    5.20% |
|     5      |           NE, NM           |     1.22% |    1.86% |
|     6      |   AR, IA, KS, MS, NV, UT   |     5.60% |    6.69% |
|     7      |           CT, OK           |     2.29% |    2.60% |
|     8      |         KY, LA, OR         |     3.98% |    4.46% |
|     9      |           AL, SC           |     3.13% |    3.35% |
|     10     |     CO, MD, MN, MO, WI     |     8.93% |    9.29% |
|     11     |       AZ, IN, MA, TN       |     8.49% |    8.18% |
|     12     |             WA             |     2.33% |    2.23% |
|     13     |             VA             |     2.60% |    2.42% |
|     14     |             NJ             |     2.77% |    2.60% |
|     15     |             MI             |     3.00% |    2.79% |
|     16     |           GA, NC           |     6.53% |    5.95% |
|     17     |             OH             |     3.52% |    3.16% |
|     19     |           IL, PA           |     7.62% |    7.06% |
|     28     |             NY             |     5.84% |    5.20% |
|     30     |             FL             |     6.75% |    5.58% |
|     40     |             TX             |     9.11% |    7.43% |
|     54     |             CA             |    11.63% |   10.04% |


which presents just the data I want to show and formats it nicely for pasting into the blog post.

Line 26 creates a new aggregation dictionary, tableAgg. It’s similar to basicAgg except for the function applied to the Abbrev column. Because there’s no built-in function (as far as I know) for turning a column into a comma-separated string of its contents, I made one. Because this function is very short and I’m using it only once, I added it to tableAgg as a lambda function:

python:
lambda s: ', '.join(s)


This illustrates the overall structure of an aggregation function. It takes a Pandas series (the column) as input and produces a scalar value as output. Because a series acts like a list, it can be passed directly to the join function, producing the comma-separated string I wanted in the table.
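A tiny demonstration of that Series-in, scalar-out contract (toy series, mine):

```python
import pandas as pd

# An aggregation function maps a whole column (Series) to one scalar.
abbrevs = pd.Series(['NE', 'NM'])

agg_fn = lambda s: ', '.join(s)
joined = agg_fn(abbrevs)   # str.join accepts any iterable of strings
print(joined)              # NE, NM
```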

The last few lines print out the table in Markdown format, using the to_markdown function to generate the header line, the format line, and the pipe characters between the columns. While I didn’t know about to_markdown until ondaiwai told me about it, it does its work via the tabulate module, which I have used (badly) in the past. The keyword arguments given to to_markdown in Lines 30–32 are passed on to tabulate to control the formatting:

1. floatfmt='.2%' sets the formatting of all the floating point numbers, which in this case are the PopPct and ECPct columns. Note that in the quick-and-dirty table, these columns are printed as decimal values with more digits than necessary.
2. headers=['Electors', 'States', 'Pop Pct', 'EC Pct'] sets the header names to something nicer than the default, which would be the names in df.
3. colalign=['center', 'center', 'right', 'right'] sets the alignment of the four columns. Without this, the Electors column would be right-aligned (the default for numbers) and the States column would be left-aligned (the default for strings).

Why do I think this code is better than what I showed in the last post? Mainly because this code operates on dataframes and series all at once instead of looping through items as my previous code did. This is how Pandas is meant to be used. Also, the functions being applied to the columns are more obvious because of how the basicAgg and tableAgg dictionaries are built. In my previous code, these functions are inside braces in f-strings, where they’re harder to see. Similarly, the formatting of the Markdown table is easier to see when it’s given as arguments to a function instead of buried in print commands.

My thanks again to ondaiwai for introducing me to to_markdown and getting me to think in a more Pandas-like way. He rewrote and extended the code snippets he posted on Mastodon and made a gist of it. After seeing his code, I felt pretty good about my script; it’s different from his, but it shows that I understood what he was saying in his posts.

Update 7 Sep 2024 12:33 PM
Kieran Healy followed up with a post on doing this summary analysis in R, but there’s more to his post than a simple language translation. He does a better job than I did at explaining why we should use higher-level abstractions in our analysis code. While I just kind of waved my hands and said “this is how Pandas is meant to be used,” Kieran explains that this is a general principle, not just a feature of Pandas. He also does a good job of relating the Pandas agg function to the split-apply-combine strategy, which has been formalized in Pandas, R, and Julia (and, I assume, other data analysis systems).

Using abstractions like this in programming is analogous to how we advance in mathematics. If you’re doing a problem that requires matrix multiplication, you don’t write out every step of row-column multiplication and addition; you just say

$C = A\,B$

This is how you get to think about the purpose of the operation, not the nitty-gritty of how it’s done.

1. While you can, in theory, use the functions themselves as the values (e.g., sum instead of 'sum'), I found that doesn’t necessarily work for all aggregation functions. The count function, for example, seems to work only if it’s given as a string. I’m sure there’s a good reason for this, but I haven’t figured it out yet.

# Pandas and the Electoral College

A couple of weeks ago, I used the Pandas groupby function in some data analysis for work, so when I started writing my previous post on the Electoral College, groupby came immediately to mind when I realized I wanted to add this table to the post:

| Electors |           States           | Pop Pct | EC Pct |
|:--------:|:--------------------------:|--------:|-------:|
|    3     | AK, DE, DC, ND, SD, VT, WY |   1.61% |  3.90% |
|    4     | HI, ID, ME, MT, NH, RI, WV |   3.04% |  5.20% |
|    5     |           NE, NM           |   1.22% |  1.86% |
|    6     |   AR, IA, KS, MS, NV, UT   |   5.60% |  6.69% |
|    7     |           CT, OK           |   2.29% |  2.60% |
|    8     |         KY, LA, OR         |   3.98% |  4.46% |
|    9     |           AL, SC           |   3.13% |  3.35% |
|    10    |     CO, MD, MN, MO, WI     |   8.93% |  9.29% |
|    11    |       AZ, IN, MA, TN       |   8.49% |  8.18% |
|    12    |             WA             |   2.33% |  2.23% |
|    13    |             VA             |   2.60% |  2.42% |
|    14    |             NJ             |   2.77% |  2.60% |
|    15    |             MI             |   3.00% |  2.79% |
|    16    |           GA, NC           |   6.53% |  5.95% |
|    17    |             OH             |   3.52% |  3.16% |
|    19    |           IL, PA           |   7.62% |  7.06% |
|    28    |             NY             |   5.84% |  5.20% |
|    30    |             FL             |   6.75% |  5.58% |
|    40    |             TX             |   9.11% |  7.43% |
|    54    |             CA             |  11.63% | 10.04% |

I got the population and elector information from the Census Bureau and the National Archives, respectively, and put them in a CSV file named states.csv (which you can download). The header and first several rows are

State                 Abbrev  Population  Electors
Alabama                   AL     5108468         9
Alaska                    AK      733406         3
Arizona                   AZ     7431344        11
Arkansas                  AR     3067732         6
California                CA    38965193        54
Connecticut               CT     3617176         7
Delaware                  DE     1031890         3
District of Columbia      DC      678972         3
Florida                   FL    22610726        30

(Because the District of Columbia has a spot in the Electoral College, I’m lumping it in with the 50 states and referring to all of them as “states” in the remainder of this post. That makes it easier for both you and me.)

Here’s the initial code I used to summarize the data:

python:
1:  #!/usr/bin/env python
2:
3:  import pandas as pd
4:
5:  # Import the raw data
6:  df = pd.read_csv('states.csv')
7:
8:  # Generate new fields for percentages
9:  popSum = df.Population.sum()
10:  ecSum = df.Electors.sum()
11:  df['PopPct'] = df.Population/popSum
12:  df['ECPct'] = df.Electors/ecSum
13:
14:  # Collect states with same numbers of electors
15:  dfg = df.groupby(by='Electors')
16:
17:  # Print out the summary table
18:  print('State EC   States   Population   Pop Pct   Electors   EC Pct')
19:  for k, v in dfg:
20:    print(f'    {k:2d}       {v.State.count():2d}    {v.Population.sum():11,d}\
21:      {v.PopPct.sum():6.2%}     {v.Electors.sum():3d}     {v.ECPct.sum():6.2%}')


The variable df contains the dataframe of all the states. It’s read in from the CSV in Line 6, and then extended in Lines 9–12. After Line 12, the first five rows of df are

State       Abbrev  Population  Electors    PopPct     ECPct
Alabama         AL     5108468         9  0.015253  0.016729
Alaska          AK      733406         3  0.002190  0.005576
Arizona         AZ     7431344        11  0.022189  0.020446
Arkansas        AR     3067732         6  0.009160  0.011152
California      CA    38965193        54  0.116344  0.100372

The summarizing is done first by using the groupby function in Line 15, which groups the states according to the number of electors they have. The resulting variable, dfg, is of type

pandas.core.groupby.generic.DataFrameGroupBy


which works like a dictionary, where the keys are the numbers of electors per state and the values are dataframes of the subset of states with the number of electors given by that key. Lines 19–21 loop through the dictionary and print out summary information for each subset of states. The output is this:
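The iteration pattern in miniature (a toy dataframe with invented numbers):

```python
import pandas as pd

# Each pass through the loop gets a key (the Electors value) and a
# sub-dataframe of the rows sharing that key.
df = pd.DataFrame({'State': ['A', 'B', 'C'],
                   'Electors': [3, 3, 5],
                   'Population': [700_000, 900_000, 2_000_000]})

for k, v in df.groupby(by='Electors'):
    print(k, v.State.count(), v.Population.sum())
# One line per group: 3 2 1600000, then 5 1 2000000
```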

State EC   States   Population   Pop Pct   Electors   EC Pct
3        7      5,379,033     1.61%      21      3.90%
4        7     10,196,485     3.04%      28      5.20%
5        2      4,092,750     1.22%      10      1.86%
6        6     18,766,882     5.60%      36      6.69%
7        2      7,671,000     2.29%      14      2.60%
8        3     13,333,261     3.98%      24      4.46%
9        2     10,482,023     3.13%      18      3.35%
10        5     29,902,889     8.93%      50      9.29%
11        4     28,421,431     8.49%      44      8.18%
12        1      7,812,880     2.33%      12      2.23%
13        1      8,715,698     2.60%      13      2.42%
14        1      9,290,841     2.77%      14      2.60%
15        1     10,037,261     3.00%      15      2.79%
16        2     21,864,718     6.53%      32      5.95%
17        1     11,785,935     3.52%      17      3.16%
19        2     25,511,372     7.62%      38      7.06%
28        1     19,571,216     5.84%      28      5.20%
30        1     22,610,726     6.75%      30      5.58%
40        1     30,503,301     9.11%      40      7.43%
54        1     38,965,193    11.63%      54     10.04%


This is what I was looking at when I wrote the last few paragraphs of the post, but I didn’t want to present the data to you in this way—I wanted a nice table with only the necessary columns. I added a couple of print() statements to the code above to add a little whitespace and then this code to create a MultiMarkdown table:

python:
25:  # Print out the Markdown summary table
26:  print('| Electors | States | Pop Pct | EC Pct |')
27:  print('|:--:|:--:|--:|--:|')
28:  for k, v in dfg:
29:     print(f'| {k:2d} | {", ".join(v.Abbrev)} | {v.PopPct.sum():6.2%} \
30:  | {v.ECPct.sum():6.2%} |')


This is very much like the previous output code. Apart from adding the pipe characters and the formatting line, there’s the

python:
{", ".join(v.Abbrev)}


piece in the f-string of Line 29. The join part concatenates all the state abbreviations, putting a comma-space between them. I decided a list of states with the given number of electors would be better than just a count of how many such states there are.

The output from this additional code was

| Electors | States | Pop Pct | EC Pct |
|:--:|:--:|--:|--:|
|  3 | AK, DE, DC, ND, SD, VT, WY |  1.61% |  3.90% |
|  4 | HI, ID, ME, MT, NH, RI, WV |  3.04% |  5.20% |
|  5 | NE, NM |  1.22% |  1.86% |
|  6 | AR, IA, KS, MS, NV, UT |  5.60% |  6.69% |
|  7 | CT, OK |  2.29% |  2.60% |
|  8 | KY, LA, OR |  3.98% |  4.46% |
|  9 | AL, SC |  3.13% |  3.35% |
| 10 | CO, MD, MN, MO, WI |  8.93% |  9.29% |
| 11 | AZ, IN, MA, TN |  8.49% |  8.18% |
| 12 | WA |  2.33% |  2.23% |
| 13 | VA |  2.60% |  2.42% |
| 14 | NJ |  2.77% |  2.60% |
| 15 | MI |  3.00% |  2.79% |
| 16 | GA, NC |  6.53% |  5.95% |
| 17 | OH |  3.52% |  3.16% |
| 19 | IL, PA |  7.62% |  7.06% |
| 28 | NY |  5.84% |  5.20% |
| 30 | FL |  6.75% |  5.58% |
| 40 | TX |  9.11% |  7.43% |
| 54 | CA | 11.63% | 10.04% |


which I then ran through my filter in BBEdit to produce the nicer looking

| Electors |           States           | Pop Pct | EC Pct |
|:--------:|:--------------------------:|--------:|-------:|
|    3     | AK, DE, DC, ND, SD, VT, WY |   1.61% |  3.90% |
|    4     | HI, ID, ME, MT, NH, RI, WV |   3.04% |  5.20% |
|    5     |           NE, NM           |   1.22% |  1.86% |
|    6     |   AR, IA, KS, MS, NV, UT   |   5.60% |  6.69% |
|    7     |           CT, OK           |   2.29% |  2.60% |
|    8     |         KY, LA, OR         |   3.98% |  4.46% |
|    9     |           AL, SC           |   3.13% |  3.35% |
|    10    |     CO, MD, MN, MO, WI     |   8.93% |  9.29% |
|    11    |       AZ, IN, MA, TN       |   8.49% |  8.18% |
|    12    |             WA             |   2.33% |  2.23% |
|    13    |             VA             |   2.60% |  2.42% |
|    14    |             NJ             |   2.77% |  2.60% |
|    15    |             MI             |   3.00% |  2.79% |
|    16    |           GA, NC           |   6.53% |  5.95% |
|    17    |             OH             |   3.52% |  3.16% |
|    19    |           IL, PA           |   7.62% |  7.06% |
|    28    |             NY             |   5.84% |  5.20% |
|    30    |             FL             |   6.75% |  5.58% |
|    40    |             TX             |   9.11% |  7.43% |
|    54    |             CA             |  11.63% | 10.04% |


This is the Markdown I added to the post.

Update 1 Sep 2024 7:11 AM
There’s a better version of the script here.

# What I didn't learn about the Electoral College

People my age¹ like to talk about how our education was superior to that of these kids today. I can say with absolute certainty that one aspect of my education was terribly deficient: how the Electoral College works.

I’m not talking about the existence of the Electoral College or how it works on a superficial level—I definitely learned that. But there were two aspects of the Electoral College system that were never mentioned.

First was the likelihood of the popular vote and the electoral vote being at odds with one another. My classmates and I were taught that it could happen, but it was treated as a sort of theoretical thing, not something of any practical significance. Maybe that sort of thing had happened long ago—I say “maybe” because the Hayes-Tilden election of 1876 was never a big focus in history class—but we modern Americans would never have to worry about that.

I suspect this misguided optimism came from the extremely close elections that had recently occurred: Kennedy-Nixon in 1960 and Nixon-Humphrey in 1968. In both cases, the winner of the popular vote won the electoral vote with room to spare, and that may have given people the feeling that the Electoral College would always reflect the popular vote.²

The second problem was how we were taught that the Electoral College could—theoretically, remember—lead to an inversion of results. The example we were given was two candidates, call them A and B, and two states, a large one and a small one. If A wins the large state by a very small margin and B wins the small state by a large margin, A would get more electoral votes with fewer popular votes.

While this is true, it’s misleading in that it leads you to think that it’s the large states that have sort of an unfair advantage, when the truth is the exact opposite. Because every state gets two senators, they also get two “free” electoral votes, votes that are unrelated to their population.

Here’s a summary of the states, ranked by their number of electors and presenting their percentages of the population and the Electoral College. I took the 2023 population estimates from the Census Bureau and the current Electoral College vote counts from the National Archives.

| Electors |           States           | Pop Pct | EC Pct |
|:--------:|:--------------------------:|--------:|-------:|
|    3     | AK, DE, DC, ND, SD, VT, WY |   1.61% |  3.90% |
|    4     | HI, ID, ME, MT, NH, RI, WV |   3.04% |  5.20% |
|    5     |           NE, NM           |   1.22% |  1.86% |
|    6     |   AR, IA, KS, MS, NV, UT   |   5.60% |  6.69% |
|    7     |           CT, OK           |   2.29% |  2.60% |
|    8     |         KY, LA, OR         |   3.98% |  4.46% |
|    9     |           AL, SC           |   3.13% |  3.35% |
|    10    |     CO, MD, MN, MO, WI     |   8.93% |  9.29% |
|    11    |       AZ, IN, MA, TN       |   8.49% |  8.18% |
|    12    |             WA             |   2.33% |  2.23% |
|    13    |             VA             |   2.60% |  2.42% |
|    14    |             NJ             |   2.77% |  2.60% |
|    15    |             MI             |   3.00% |  2.79% |
|    16    |           GA, NC           |   6.53% |  5.95% |
|    17    |             OH             |   3.52% |  3.16% |
|    19    |           IL, PA           |   7.62% |  7.06% |
|    28    |             NY             |   5.84% |  5.20% |
|    30    |             FL             |   6.75% |  5.58% |
|    40    |             TX             |   9.11% |  7.43% |
|    54    |             CA             |  11.63% | 10.04% |

There are lots of ways to slice this data, but it always looks bad for the large states. For example, the 14 states that have either 3 or 4 Electors combine to have 9.11% of the total Electoral College vote³ but only 4.65% of the population. Compare this to Texas, which has nearly twice the population (9.11% of the US total) but gets only 7.43% of the Electoral College vote. You could make a similar comparison between the 16 states with 5 or fewer Electors and California.
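One way to make the imbalance concrete: divide each group’s Electoral College share by its population share, using numbers from the table above. A ratio above 1 means over-representation.

```python
# EC share / population share, from the summary table: the seven
# 3-elector states versus Texas.
small_states = 0.0390 / 0.0161   # about 2.42: heavily over-represented
texas = 0.0743 / 0.0911          # about 0.82: under-represented
print(f'{small_states:.2f} {texas:.2f}')
```

By this measure, a voter in one of the 3-elector states carries roughly three times the Electoral College weight of a Texan.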

Starting in 2000, every election except the two Obama wins has either inverted the popular and electoral winner or come close to doing so. I’d like to think that’s changed the way civics is taught.

1. I turned 64 last Friday, which will be the last time my age will be a power of 2 or of 4. Still hoping to make it to the next power of 3.

2. The Carter-Ford election of 1976 probably confirmed that feeling, but I was past civics classes by then.

3. The table suggests it should be 9.10% (3.90% + 5.20%), but that’s because of rounding. To three digits, it’s 9.11% (49/538).