NBA Finals and Pandas
June 23, 2025 at 6:23 PM by Dr. Drang
The NBA season ended last night as the Thunder beat the Pacers 103–91 in the seventh game of the Finals. I read this morning that this was the 20th Finals to go seven games in the 79-year history of the NBA, and I wondered what the distribution of game counts was.
I found all the Finals results on this Wikipedia page in a table that starts out this way:
I figured I’d use what I learned about how Pandas can read HTML tables to extract the information I wanted. So I started an interactive Python session that went like this:
python:
>>> import pandas as pd
>>> dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_NBA_champions')
>>> df = pd.DataFrame({'Result': dfs[2]['Result']})
>>> df[['West', 'East']] = df.Result.str.split('–', expand=True).astype(int)
>>> df
    Result  West  East
0      1–4     1     4
1      4–2     4     2
2      4–2     4     2
3      4–2     4     2
4      4–3     4     3
..     ...   ...   ...
74     2–4     2     4
75     4–2     4     2
76     4–1     4     1
77     1–4     1     4
78     4–3     4     3

[79 rows x 3 columns]
>>> df['Games'] = df.West + df.East
>>> for n in range(4, 8):
...     print(f'{n} {len(df[df.Games == n])}')
...
4 9
5 20
6 30
7 20
OK, you’re right, this is an edited version of the session. The command that created the West and East fields took a few tries to get right. Also, I’m pretty sure there’s a better way to put the Result column from the original table into its own dataframe, but creating an intermediate dictionary was the first thing that came to mind. Overall, I was pleased that I didn’t need to do much thrashing about; I’ve used Pandas long enough now to get most things right or nearly right on the first try.
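For what it's worth, here is one possible tidier version, offered as a sketch rather than the definitive way to do it. It assumes the same en-dash-separated strings as the Result column above. The `sample` frame is just the ten rows visible in the session output, standing in for `dfs[2]`, and `games_distribution` is a name made up for this example. Selecting with double brackets gives a one-column dataframe without the intermediate dictionary, and `value_counts` replaces the counting loop.

```python
import pandas as pd


def games_distribution(results: pd.Series) -> pd.Series:
    """Count series by total games played, given 'W–L' result strings."""
    # Split on the en dash used in the table and convert both halves to integers.
    wins = results.str.split('–', expand=True).astype(int)
    # Total games in a series is the sum of the two teams' wins.
    games = wins.sum(axis=1)
    # Tally each distinct game count and list the counts in ascending game order.
    return games.value_counts().sort_index()


# Stand-in for dfs[2]: only the ten rows shown in the session output, not the full table.
sample = pd.DataFrame(
    {'Result': ['1–4', '4–2', '4–2', '4–2', '4–3', '2–4', '4–2', '4–1', '1–4', '4–3']}
)

# Double brackets select a one-column dataframe directly, no dictionary needed.
df = sample[['Result']].copy()
dist = games_distribution(df['Result'])
print(dist)
```

On this ten-row sample the tally is three five-game series, five six-game series, and two seven-game series. The real counts need the full 79-row table.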
The upshot is this:
Games | Count |
---|---|
4 | 9 |
5 | 20 |
6 | 30 |
7 | 20 |
I expected the 4–0 series to be the least common. As a Bulls fan, I suppose I should have guessed that 4–2 series are the most common—Michael Jordan was responsible for five of them.
If next year’s Finals is a sweep, I will definitely do this again. What a neat and tidy table that will be.
Graphing without empty spaces
June 23, 2025 at 12:27 PM by Dr. Drang
I’ve mentioned here before that I was on a dietary program to keep my Type II diabetes under control. The program and its associated app also kept track of my blood pressure with a Bluetooth-connected cuff that I used once a week. I left the program at the beginning of the year (insurance wouldn’t cover it anymore), but I’ve continued the diet and kept tracking my blood pressure. I don’t know if it’s possible to connect to the cuff’s Bluetooth signal, and even if it is, I don’t have the programming chops to do it. But I still take a reading once a week and enter it into the Health app on my phone.
Unfortunately, the Health app’s way of plotting blood pressure is kind of crappy. Here’s what the past six months looks like:
Lots of wasted space in there, and because the range is so broad, I can’t see at a glance where I stand. Since seeing at a glance is sort of the whole point of plotting data, I would say this graph is basically useless.
Plotting the data using a tighter range would be better, as it would give me a better chance to figure out the various values to within a couple of mm Hg. Here’s an example:
This still seems like it could be improved. We’ve traded a bunch of wasted space above the systolic readings for a bunch of wasted space between the systolic and diastolic values. In some situations, it’s good to see how a gap between two sets of values compares to the variation within a set, but I don’t see much use in it for this type of data.
This data could use a scale break. In effect, this means making two plots but arranging them into a single figure to take advantage of their common parts. Here’s one way to do it:
The scales of the two parts are the same, so the larger variation in systolic values is properly represented. We’ve just cut out the empty space between the systolic and diastolic and pushed them together.
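A full scale break like this can be sketched in Matplotlib as two stacked panels that share an x-axis. This is only one way to do it, and not necessarily how the figure above was made. The readings and axis limits below are made up for illustration, and `plot_with_scale_break` is a hypothetical helper name. The key step is setting the panel heights proportional to their y ranges, so a millimeter of mercury takes up the same height in both panels.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line for interactive use
import matplotlib.pyplot as plt

# Made-up weekly readings in mm Hg, purely for illustration.
weeks = list(range(1, 9))
systolic = [122, 118, 125, 120, 117, 123, 119, 121]
diastolic = [78, 76, 80, 77, 75, 79, 76, 78]

# Hypothetical axis limits chosen to hug each set of readings.
SYS_LIMITS = (110, 130)
DIA_LIMITS = (70, 85)


def plot_with_scale_break(weeks, systolic, diastolic, sys_limits, dia_limits):
    """Stack two panels whose heights are proportional to their y ranges."""
    sys_span = sys_limits[1] - sys_limits[0]
    dia_span = dia_limits[1] - dia_limits[0]
    fig, (ax_sys, ax_dia) = plt.subplots(
        2, 1, sharex=True, figsize=(6, 4),
        # Heights in proportion to the spans keep the vertical scale identical.
        gridspec_kw={"height_ratios": [sys_span, dia_span], "hspace": 0.1},
    )
    ax_sys.plot(weeks, systolic, "o-", color="C3")
    ax_dia.plot(weeks, diastolic, "o-", color="C0")
    ax_sys.set_ylim(*sys_limits)
    ax_dia.set_ylim(*dia_limits)
    # Full break: hide the facing spines and the upper panel's x ticks,
    # and draw nothing that connects the two panels.
    ax_sys.spines["bottom"].set_visible(False)
    ax_dia.spines["top"].set_visible(False)
    ax_sys.tick_params(axis="x", bottom=False)
    ax_sys.set_ylabel("Systolic")
    ax_dia.set_ylabel("Diastolic")
    ax_dia.set_xlabel("Week")
    return fig, ax_sys, ax_dia


fig, ax_sys, ax_dia = plot_with_scale_break(
    weeks, systolic, diastolic, SYS_LIMITS, DIA_LIMITS
)
```

Calling `fig.savefig` with a filename writes the figure out. Changing the limits automatically changes the height ratio, so the two scales stay matched.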
William Cleveland, author of The Elements of Graphing Data, is not a fan of scale breaking:
Use scale breaking only when necessary. If a break cannot be avoided, use a full scale break. Do not connect numerical values on two sides of a break. Taking logs can cure the need for a break.
In this case, taking logs would make the cure worse than the disease, as it would make reading the values harder. If you’re wondering about the difference between full and partial breaks, a partial break is when an axis is broken by a wavy or zigzag line, like this:
Cleveland thinks partial breaks are too easy for readers to overlook. A full break is what we’ve used here, so he might forgive us.
Despite Cleveland’s admonitions, scale breaks can be effective and attractive. Here’s an example from Modern Timber Engineering by Scofield and O’Brien:
This gives the capacities of a type of wood connector under a variety of conditions. The graph is broken into three groups according to wood species, and what makes scale breaks useful here is that the plots would overlap and be impossible to read without the breaks. The authors could have made this three separate graphs, of course, but putting them together into a single figure emphasizes the interrelatedness of the data. And the curving gaps between the sections look really cool.
I’m not saying Apple should try curvy scale breaks in the Health app, but it wouldn’t take much to make the blood pressure graphs a lot more useful.
Technical editor needed
June 20, 2025 at 9:39 AM by Dr. Drang
I read this article from Scientific American about the GBU-57/B, the “bunker buster” bomb that Donald Trump will… or won’t… or will… or won’t allow Israel to use on Iran’s Fordo nuclear facility. The facility is buried deep within a mountain, and the GBU-57/B is the only non-nuclear bomb that may be able to destroy it. The article is worth reading, but if you do, you’ll probably notice some obvious errors.
The first error is related to concrete, which is why I picked up on it. Here’s the passage:
According to a 2012 Congressional Research Service briefing, the GBU-57/B has been reported to burrow through 200 feet of concrete or bedrock with a density of 5,000 pounds per square inch (comparable to the strength of bridge decks or parking-garage slabs).
The 5,000 psi figure refers to the compressive strength of concrete, not its density. Back when I was a student, 5,000 psi was kind of on the strong side for commercially available concrete; now it’s a garden-variety strength, as suggested by the parenthetical comment. The compressive strength of intact rock is often much higher than this, but natural rock formations tend to have joints and other defects that reduce their strength. By the way, even if you don’t have much experience with concrete or rock, you should know that something’s fishy with this passage. Density is weight or mass per unit volume—it can’t be measured in pounds per square inch.
Later, we see this:
About one fifth of the warhead’s 5,342-pound total weight is made up of two explosives: 4,590 pounds of AFX-757 plus 752 pounds of PBXN-114.
Since the sum of the two explosive weights—4,590 lbs and 752 lbs—is equal to 5,342 lbs, it’s hard to see how their sum could be one-fifth of that total. I’m guessing the intention here is to say that the combined explosive weight is about one-fifth of the missile’s total weight, which is given earlier in the article as about 30,000 lbs.
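The arithmetic is quick to check. The snippet below uses only the figures quoted above, and it treats the article's "about 30,000 lbs" as exactly 30,000, which is an assumption made for the sake of the calculation.

```python
# Explosive weights quoted in the article, in pounds.
afx_757_lb = 4590
pbxn_114_lb = 752
explosive_lb = afx_757_lb + pbxn_114_lb

# Total weight, taking the article's "about 30,000 lbs" as exact (an assumption).
total_lb = 30_000

fraction = explosive_lb / total_lb
print(f"Explosives: {explosive_lb} lb, or {fraction:.1%} of {total_lb} lb")
```

The explosives come to 5,342 lbs, a bit under 18% of 30,000 lbs, which is reasonably described as about one-fifth.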
There’s also a discussion of how the ogive shape of the missile’s nose gives it both good aerodynamic and good penetrating properties. There’s nothing wrong with this, but it suggests the shape is something special. It isn’t. The ogive shape is common in rockets, missiles, and bullets. Maybe the GBU-57/B’s ogive is unusual in some way, but if it is, the article doesn’t say so.
I should say that this article isn’t in the Scientific American magazine proper; it’s just on the web, and maybe web articles aren’t given the same scrutiny as print articles. It does seem odd, though, that a piece coming out under the SciAm name is edited at the same level as a blog post.
A polygon puzzle that really isn’t
June 17, 2025 at 9:13 PM by Dr. Drang
This is another post about a puzzle in Scientific American. I confess that this and my previous post have just been placeholders, things that I’m putting here because the post I really want to write is giving me trouble. It started when I read this article in Ars Technica about dropping eggs. The more I thought about it (and the paper it’s based on), the more I felt I should say, and now I have a couple of weeks’ worth of notes and calculations that I’m struggling to organize. Posts like this are much easier to write, so here we are.
The puzzle is this one. There are eight regular polygons with increasing numbers of sides, triangle through decagon. The first seven have numbers in them, and you’re supposed to find the number that goes in the last one.
Because of the house of cards puzzle I discussed several months ago, I decided to set up a difference table, like this:
The numbers in the difference column are obviously a series of prime numbers, so I figured the next difference would be the next prime, 29, and therefore the missing number would be 129. But I had no clue as to what that had to do with polygons.
It turns out that the number for the triangle is the sum of the first three prime numbers, the number for the square is the sum of the first four prime numbers, and so on. The (slight) geometric aspect of the problem is the number of sides being the number of prime numbers you should add. This is what makes the difference table work out the way it does and how I got the right answer without really solving the problem.
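Here's a short sketch that checks both views of the puzzle, the sums of primes and the differences between them. The trial-division generator is just a convenient choice for a handful of primes, and the names `primes` and `polygon_numbers` are mine, not anything from the puzzle.

```python
from itertools import accumulate, count, islice


def primes():
    """Yield primes in order by trial division (fine for a handful of values)."""
    found = []
    for n in count(2):
        # n is prime if no earlier prime up to sqrt(n) divides it.
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n


# Running sums of the first ten primes: 2, 2+3, 2+3+5, ...
prime_sums = list(accumulate(islice(primes(), 10)))

# A polygon with k sides gets the sum of the first k primes, triangle through decagon.
polygon_numbers = {sides: prime_sums[sides - 1] for sides in range(3, 11)}

# The difference column: each entry minus the one before it.
values = list(polygon_numbers.values())
differences = [b - a for a, b in zip(values, values[1:])]

print(polygon_numbers)
print(differences)
```

The differences come out as the primes 7 through 29, and the decagon's number is 129, matching the answer from the difference table.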
If you’re wondering, yes, this sequence is in the On-Line Encyclopedia of Integer Sequences: A007504. Mathematicians really like their prime numbers.