The sunrise plot

Since I almost never make a graph without showing the code for it, here’s how the sunrise/sunset plots in yesterday’s post were made.

Chicago sunrise and sunset

I started with the US Naval Observatory’s 2018 sunrise/sunset data for Chicago, which is a plain text table (i.e., one set in a monospaced font, with space characters used to align the columns) that looks like this:

USNO data for Chicago

I copied the table, pasted it into BBEdit, and did some editing to get it into this form:

2018-01-01  0718 1631
2018-01-02  0718 1632
2018-01-03  0718 1632
2018-01-04  0718 1633
2018-01-05  0718 1634
2018-01-06  0718 1635
2018-01-07  0718 1636
2018-01-08  0718 1637
2018-01-09  0718 1638
2018-01-10  0718 1639
2018-01-11  0717 1640
2018-01-12  0717 1642
2018-01-13  0717 1643
2018-01-14  0716 1644
2018-01-15  0716 1645
2018-01-16  0715 1646
2018-01-17  0715 1647
2018-01-18  0714 1649
2018-01-19  0714 1650
2018-01-20  0713 1651

Most of the editing consisted of selecting columns for February through December and pasting them under the January data. Then I prepended the year and month (with hyphens) to the days. That left me with a file, called chicago-riseset.txt, with 365 lines and three columns. If I were going to do this sort of thing on a regular basis, I’d write a script to handle this editing, but for a one-off I just did it “by hand.”
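For the record, here is a minimal sketch of what such a script might look like. It assumes the USNO table is strictly fixed-width, with a 4-character day field followed by an 11-character field for each month's rise/set pair. Those widths are guesses (the hypothetical `day_width` and `month_width` parameters), so measure your own copy of the table before trusting them; the function name `reshape` is made up, too.

```python
from datetime import date


def reshape(rows, year, day_width=4, month_width=11):
    """Turn USNO-style rows (one row per day of the month, one rise/set
    column pair per month) into 'YYYY-MM-DD  rise set' lines in date order.

    day_width and month_width are ASSUMED fixed column widths.
    """
    records = []
    for row in rows:
        # Data rows start with a two-digit day; skip headers and blank lines.
        if not row[:2].isdigit():
            continue
        day = int(row[:2])
        for month in range(12):
            start = day_width + month * month_width
            field = row[start:start + month_width].split()
            # A blank field is a day the month doesn't have (Feb 30, Apr 31, ...).
            if len(field) != 2:
                continue
            rise, sset = field
            records.append((date(year, month + 1, day), rise, sset))
    records.sort()
    return [f"{d.isoformat()}  {r} {s}" for d, r, s in records]
```

Feeding it the pasted table's lines, e.g. `reshape(open("usno.txt").read().splitlines(), 2018)` with a hypothetical file name, would give lines in the same format as the excerpt above.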

The script that parsed the data and created the graphs is this:

 1:  #!/usr/bin/env python3
 2:
 3:  from fileinput import input
 4:  from dateutil.parser import parse
 5:  from datetime import datetime
 6:  import numpy as np
 7:  from matplotlib import pyplot as plt
 8:  import matplotlib.dates as mdates
 9:  from matplotlib.ticker import MultipleLocator, FormatStrFormatter
10:
11:  # Read in the sunrise and sunset data in CST
12:  # and convert to floating point hours
13:  days = []
14:  rises = []
15:  sets = []
16:  for line in input():
17:    d, r, s = line.split()
18:    days.append(parse(d))
19:    hr, mins = int(r[:2]), int(r[-2:])
20:    rises.append(hr + mins/60)
21:    hr, mins = int(s[:2]), int(s[-2:])
22:    sets.append(hr + mins/60)
23:
24:  # Daylight lengths
25:  lengths = np.array(sets) - np.array(rises)
26:
27:  # Get the portion of the year that uses CDT
28:  cdtStart = days.index(datetime(2018, 3, 11))
29:  cstStart = days.index(datetime(2018, 11, 4))
30:  cdtdays = days[cdtStart:cstStart]
31:  cstrises = rises[cdtStart:cstStart]
32:  cdtrises = [ x + 1 for x in cstrises ]
33:  cstsets = sets[cdtStart:cstStart]
34:  cdtsets = [ x + 1 for x in cstsets ]
35:
36:  # Plot the data
37:  fig, ax = plt.subplots(figsize=(10,6))
38:  plt.fill_between(days, rises, sets, facecolor='yellow', alpha=.5)
39:  plt.fill_between(days, 0, rises, facecolor='black', alpha=.25)
40:  plt.fill_between(days, sets, 24, facecolor='black', alpha=.25)
41:  plt.fill_between(cdtdays, cstsets, cdtsets, facecolor='yellow', alpha=.5)
42:  plt.fill_between(cdtdays, cdtrises, cstrises, facecolor='black', alpha=.1)
43:  plt.plot(days, rises, color='k')
44:  plt.plot(days, sets, color='k')
45:  plt.plot(cdtdays, cdtrises, color='k')
46:  plt.plot(cdtdays, cdtsets, color='k')
47:  plt.plot(days, lengths, color='#aa00aa', linestyle='--', lw=2)
48:
49:  # Add annotations
50:  ax.text(datetime(2018,8,16), 4.25, 'Sunrise', fontsize=12, color='black', ha='center', rotation=9)
51:  ax.text(datetime(2018,8,16), 18, 'Sunset', fontsize=12, color='black', ha='center', rotation=-10)
52:  ax.text(datetime(2018,3,16), 13, 'Daylight', fontsize=12, color='#aa00aa', ha='center', rotation=22)
53:
54:  # Background grids
55:  ax.grid(which='major', color='#cccccc', linestyle='-', linewidth=.5)
56:  ax.grid(which='minor', color='#cccccc', linestyle=':', linewidth=.5)
57:
58:  # Horizontal axis
59:  ax.tick_params(axis='both', which='major', labelsize=12)
60:  plt.xlim(datetime(2018, 1, 1), datetime(2018, 12, 31))
61:  m = mdates.MonthLocator(bymonthday=1)
62:  mfmt = mdates.DateFormatter('              %b')
63:  ax.xaxis.set_major_locator(m)
64:  ax.xaxis.set_major_formatter(mfmt)
65:
66:  # Vertical axis
67:  plt.ylim(0, 24)
68:  ymajor = MultipleLocator(4)
69:  yminor = MultipleLocator(1)
70:  tfmt = FormatStrFormatter('%d:00')
71:  ax.yaxis.set_major_locator(ymajor)
72:  ax.yaxis.set_minor_locator(yminor)
73:  ax.yaxis.set_major_formatter(tfmt)
74:
75:  # Tighten up the white border and save
76:  fig.tight_layout(pad=1.5)
77:  plt.savefig('riseset.png', format='png', dpi=150)

After all the imports in Lines 3–9, the script begins by using the fileinput module to read and parse the lines of the data file, one by one. Each line is split into date, rise time, and set time in Line 17. Line 18 parses the date using the dateutil library, returning a datetime object. Lines 19–20 then split the sunrise time into the hour and minute parts and convert them into a single floating point number. Lines 21–22 do the same thing to the sunset time. The dates and times are accumulated into the days, rises, and sets lists.
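The same parsing can be done with only the standard library. This sketch swaps `datetime.strptime` in for dateutil's `parse` (a substitution, not what the script above uses) and splits each HHMM string with `divmod`; `hhmm_to_hours` and `parse_line` are hypothetical helper names.

```python
from datetime import datetime


def hhmm_to_hours(hhmm):
    """Convert an 'HHMM' string like '0718' to floating point hours."""
    hours, minutes = divmod(int(hhmm), 100)
    return hours + minutes / 60


def parse_line(line):
    """Split one 'YYYY-MM-DD  HHMM HHMM' line into (date, rise, set)."""
    d, r, s = line.split()
    return datetime.strptime(d, "%Y-%m-%d"), hhmm_to_hours(r), hhmm_to_hours(s)


day, rise, sset = parse_line("2018-01-01  0718 1631")
print(day.date(), round(rise, 3), round(sset, 3))   # 2018-01-01 7.3 16.517
```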

The duration of daylight is calculated by subtracting the rise time from the set time in Line 25. I could have done this within the loop of Lines 16–22, but chose to do it through NumPy array arithmetic instead.
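As a quick check of the array arithmetic, here is the subtraction for the first two days of the excerpt above, along with a hypothetical `hours_to_hmm` helper that turns a floating point duration back into hours and minutes. Rounding to the nearest minute hides the small floating point error left over from dividing by 60.

```python
import numpy as np

# Rise and set times for Jan 1 and Jan 2 from the excerpt, in hours (CST).
rises = [7 + 18/60, 7 + 18/60]
sets = [16 + 31/60, 16 + 32/60]

# Element-wise subtraction, as in Line 25 of the script.
lengths = np.array(sets) - np.array(rises)


def hours_to_hmm(hours):
    """Format a duration in floating point hours as 'H:MM'."""
    h, m = divmod(round(float(hours) * 60), 60)
    return f"{h}:{m:02d}"


print([hours_to_hmm(x) for x in lengths])   # ['9:13', '9:14']
```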

Lines 28–34 handle the daylight saving time stuff. The USNO data are given in standard time. To convert to DST, I needed to create new date and time lists that extend only over the duration of DST—from March 11 through November 3—and add an hour to the sunrise and sunset times. Lines 28–29 get the indices necessary to slice the lists, Line 30 slices the list of dates, Lines 31 and 33 slice the lists of rise and set times, and Lines 32 and 34 add the DST hour to the rise and set times. It sounds more complicated than it is.
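An alternative to slicing is a boolean mask that marks the DST days. This sketch uses NumPy's `datetime64` dates instead of the script's list of `datetime` objects, so it's a swapped-in technique rather than what the script does. The March 11 and November 4 boundaries come from the text; `cst_rises` is a placeholder array of zeros, not real data, so only the shape of the result matters.

```python
import numpy as np

# Every day of 2018 (365 of them) as NumPy dates.
days = np.arange("2018-01-01", "2019-01-01", dtype="datetime64[D]")

# True from March 11 up to, but not including, November 4.
in_dst = (days >= np.datetime64("2018-03-11")) & (days < np.datetime64("2018-11-04"))

# Placeholder standard-time values; substitute the real rise (or set) times.
cst_rises = np.zeros(len(days))

# Add an hour only where DST is in effect.
local_rises = np.where(in_dst, cst_rises + 1, cst_rises)

print(in_dst.sum(), days[in_dst][0], days[in_dst][-1])   # 238 2018-03-11 2018-11-03
```

Matplotlib's `fill_between` also accepts a `where` argument, so a mask like this could be handed to it directly instead of building separate sliced lists.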

Then we start using Matplotlib to make our graph. Lines 37–47 create the plot and add all of the various lines and areas. The fill_between function creates the areas, and the plot function draws the lines. I use the alpha parameter in the fill_between calls to get the shading I wanted and to allow the gridlines to show through the filled-in areas. There was a bit of trial and error to get alphas that made the two DST zones look about the same.
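Some of that trial and error can be replaced by arithmetic. Assuming plain source-over blending on a white background (my understanding of how Matplotlib's default Agg renderer composites filled areas) and ignoring the gridlines, each fill mixes its color with whatever lies beneath it in proportion to its alpha. The morning DST band (Line 42) sits on top of the yellow daylight fill from Line 38, and the evening band (Line 41) sits on top of the gray nighttime fill from Line 40. The helper `over` is a hypothetical name for that blending rule.

```python
def over(dst, src, alpha):
    """Composite color src with opacity alpha over opaque color dst (RGB, 0-1)."""
    return tuple(alpha * s + (1 - alpha) * d for s, d in zip(src, dst))


WHITE, YELLOW, BLACK = (1, 1, 1), (1, 1, 0), (0, 0, 0)

# Morning band: yellow daylight (alpha .5), then the black DST fill (alpha .1).
morning = over(over(WHITE, YELLOW, 0.5), BLACK, 0.1)

# Evening band: black night (alpha .25), then the yellow DST fill (alpha .5).
evening = over(over(WHITE, BLACK, 0.25), YELLOW, 0.5)

print([round(c, 3) for c in morning])   # [0.9, 0.9, 0.45]
print([round(c, 3) for c in evening])   # [0.875, 0.875, 0.375]
```

The two bands differ by only a few percent in each channel, which is consistent with them looking about the same on screen.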

The parts of the plot were labeled in Lines 50–52. The text function puts the given text (third parameter) at the given x- and y-coordinates (first and second parameters). The neat thing about this function is that the coordinates are given in the same units as the data, which is why you see the x-coordinate given as a datetime. The ha parameter is short for “horizontal alignment,” which allowed me to specify the center of the text so it would fall between the vertical gridlines. The rotation values were chosen through trial and error to tilt each label to match (by eye) its curve.
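The rotation could also be computed by measuring a curve's slope in display coordinates, where the aspect ratio of the axes has already been applied. This is a sketch of that approach; `label_angle` is a hypothetical helper, and it expects plain numbers, so dates would first need to go through `matplotlib.dates.date2num`. The angle is only right once the figure size, axis limits, and layout are final.

```python
import math

import matplotlib
matplotlib.use("Agg")  # no display needed
import matplotlib.pyplot as plt


def label_angle(ax, x0, y0, x1, y1):
    """On-screen angle in degrees of the segment (x0, y0)-(x1, y1),
    which is given in data coordinates."""
    (px0, py0), (px1, py1) = ax.transData.transform([(x0, y0), (x1, y1)])
    return math.degrees(math.atan2(py1 - py0, px1 - px0))


fig = plt.figure(figsize=(4, 4))
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])  # square axes for the demo
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
print(round(label_angle(ax, 0, 0, 1, 1), 6))   # 45.0
```

The result would be passed as the `rotation` argument of `ax.text`, with two points taken on either side of the label's position along the curve.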

Lines 55–56 display the gridlines. I set the linewidth to .5 points, although lines that thin tend to look better in PDF output than in PNG.
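The reason is pixel arithmetic. A point is 1/72 inch, so at the 150 dpi used for this PNG a half-point gridline comes out only about one pixel wide, and a line that narrow can get smeared across two pixels by antialiasing:

```latex
w_{\text{px}} = w_{\text{pt}} \times \frac{\text{dpi}}{72}
             = 0.5 \times \frac{150}{72} \approx 1.04\ \text{px}
```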

After Line 59 sets the font size for all the tick labels, Lines 60–64 format the horizontal axis. The horizontal limits were set to run the length of the year, and the tick marks delineate the month boundaries. The date format in Line 62 uses the standard strftime system and has a bunch of space characters at the beginning to get the month labels (more or less) centered within each month instead of centered under the tick mark at the beginning of the month. There should be a better way to do this, but I haven’t found it.
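One possibility, offered as a sketch rather than a tested replacement for Lines 61–64, is to keep the major ticks on the first of each month but blank their labels, and put the month names on minor ticks at mid-month whose tick marks have zero length. The calls are standard Matplotlib; treating the 15th as "mid-month" is my assumption. The script's minor grid (Line 56) would then also draw gridlines at the 15th, so that call would need `axis='y'`.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import NullFormatter

fig, ax = plt.subplots(figsize=(10, 6))
ax.set_xlim(datetime(2018, 1, 1), datetime(2018, 12, 31))

# Major ticks mark the month boundaries but carry no labels.
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonthday=1))
ax.xaxis.set_major_formatter(NullFormatter())

# Minor ticks at mid-month carry the month names; hide the marks themselves.
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonthday=15))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%b"))
ax.tick_params(axis="x", which="minor", length=0)
```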

The vertical axis is formatted in Lines 67–73. Lines 68 and 69 set the major and minor tick marks to be 4 hours and 1 hour apart, respectively. Recall that the rise and set times are just numbers, not datetimes, so we can’t use the strftime system here. Instead, Line 70 uses an ordinary format string to make the numeric tick labels look like hours.
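If the vertical axis ever needed ticks that weren't on the hour, a `'%d:00'` string wouldn't do. This sketch uses Matplotlib's `FuncFormatter` to handle fractional hours; `hour_label` is a hypothetical name, and rounding to whole minutes first keeps a value like 7.9999 from showing up as 7:60.

```python
from matplotlib.ticker import FormatStrFormatter, FuncFormatter


def hour_label(y, pos=None):
    """Format floating point hours (e.g., 7.5) as a clock time ('7:30')."""
    h, m = divmod(int(round(y * 60)), 60)
    return f"{h}:{m:02d}"


on_the_hour = FormatStrFormatter("%d:00")   # what Line 70 uses
any_time = FuncFormatter(hour_label)        # could go in its place

print(on_the_hour(16.0), any_time(16.0), any_time(7.5), any_time(7.9999))
# 16:00 16:00 7:30 8:00
```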

Finally, Line 76 gets rid of a lot of the whitespace border that would otherwise surround the plot, and Line 77 saves it to a PNG file. The 10″ by 6″ plot size set back in Line 37, combined with the dpi=150 setting in Line 77, gives us an image that’s 1500×900 pixels.
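The pixel dimensions are just inches times dots per inch. This sketch confirms it by saving an empty 10×6 figure at 150 dpi to memory and reading the width and height from the PNG's IHDR chunk (bytes 16 through 23 of any PNG file). It assumes, as the script does, that `savefig` isn't given `bbox_inches='tight'`, which would crop the image.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
fig.tight_layout(pad=1.5)     # moves the axes around; the figure size is unchanged

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150)

# A PNG starts with an 8-byte signature, then the IHDR chunk, whose data
# begins with the width and height as big-endian 4-byte integers.
width, height = struct.unpack(">II", buf.getvalue()[16:24])
print(width, height)   # 1500 900
```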

I did make one small change to the script. As I mentioned in yesterday’s post, I thought the curve and tick labels were too small for the size of the graph as it appears in the blog. I bumped the font size up from 10 to 12 to make the text more legible. Not a huge difference, but a definite improvement.

One table following another

Justin Grieser of the Washington Post had a story about Daylight Saving Time a couple of days ago, and unlike most such stories, it was generally favorable. I’ve said what I think about DST and don’t intend to revisit the topic here, but I do want to talk about how the WaPo article presented the sunrise and sunset data.

The gist of the article was similar to any argument in favor of DST: without it, we have lots of summer sunlight in the early morning, when it’s wasted for most of us. The argument was encapsulated in this table, which presents sunrise and sunset times in Washington, DC, using standard time throughout the year.

Rise and set table from WaPo

There are a few odd things about this table. Let’s start at the top. The headings for sunrise and sunset are given in emoji rather than words. I’m sure this seemed like a cute idea at the time, but it wasn’t very helpful to those of us who read the article on our phones, where the sun emoji, less than 2 mm across, looked more like a burnt orange smear than a sun. (Also, the emoji is officially named “Sunrise,” so it probably shouldn’t be used for sunset, even with the down arrow.) More distinct glyphs would have been the regular sun, ☀️, or even the sun with a face, 🌞, although the latter might be confused with one of the regular face emojis.

The next oddity is the distribution of dates. There’s no rule that says the dates have to be uniformly distributed throughout the year, but there’s no good reason for the haphazard scattering in this table. It starts out on the first of every month, then jumps to the 21st of June (to get the summer solstice), skips July entirely (because July 1 is too close to June 21?), jumps from October 1 to November 15 (to get back into standard time, I guess), and finishes up with December 21 (the winter solstice, but awfully close to January 1, where we started). A better choice would have been to use the 15th of every month. That would give a better sense of how sunrises and sunsets change throughout the year and would be close enough to the solstices. Hitting the solstices isn’t that important, anyway, as the sunrise and sunset extremes don’t occur on the solstices.
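Pulling the 15th of each month out of a file like chicago-riseset.txt is a one-line filter, and finding the actual extremes is no harder. This sketch works on lines in that file's format; `mid_month` and `extremes` are hypothetical helpers, and the sample is four lines from the January excerpt in the previous post, not a full year, so its results describe only those days.

```python
def mid_month(lines, day=15):
    """Keep only the lines for the given day of each month."""
    return [ln for ln in lines if ln.split()[0].endswith(f"-{day:02d}")]


def extremes(lines):
    """Dates of the earliest/latest sunrise and sunset. Zero-padded HHMM
    strings sort the same way as the times they represent; ties go to the
    first date in the list."""
    rows = [ln.split() for ln in lines]
    return {
        "earliest sunrise": min(rows, key=lambda r: r[1])[0],
        "latest sunrise": max(rows, key=lambda r: r[1])[0],
        "earliest sunset": min(rows, key=lambda r: r[2])[0],
        "latest sunset": max(rows, key=lambda r: r[2])[0],
    }


sample = [  # four lines from the January excerpt
    "2018-01-01  0718 1631",
    "2018-01-14  0716 1644",
    "2018-01-15  0716 1645",
    "2018-01-20  0713 1651",
]
print(mid_month(sample))   # ['2018-01-15  0716 1645']
print(extremes(sample))
```

Run on the whole year's file, `extremes` would show directly that the earliest and latest times don't land on the solstices.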

The weirdest thing, though, is why the article used a table at all. You can show a whole year’s worth of data in a graph, and it would give readers a better overall view of when the sunrise is too damned early. Graphs usually aren’t as good as tables for providing exact figures, but exact figures aren’t important here.

I took the US Naval Observatory’s sunrise and sunset data for Chicago in standard time and plotted them. This would be the Chicago equivalent of WaPo’s DC table.

Chicago rise and set in standard time

I didn’t label the axes because I thought they were obvious. The USNO uses a 24-hour clock in its data set,¹ so I stuck with that for the vertical axis.

The graph is better than any table at showing the sinusoidal flow of rise and set times over the course of a year. The labels are a little smaller than I intended, but unlike with the WaPo app, you can zoom in to see details.

Another way to present the data would be to include the DST shift:

Chicago rise and set with DST

Here you can see how DST saves us from the earliest sunrise times, but at the expense of fairly late sunrise times at the two ends of DST. We didn’t get such late sunrises back when DST covered a shorter fraction of the year.²
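How much shorter? A rough count for 2018 compares the current US rule (second Sunday in March to first Sunday in November, in effect since 2007) with the 1987–2006 rule (first Sunday in April to last Sunday in October), counting days from the start date up to the end date. `nth_sunday` and `dst_days` are hypothetical helpers, and applying the old rule to 2018 is purely for comparison.

```python
from datetime import date, timedelta


def nth_sunday(year, month, n):
    """Date of the nth Sunday of a month; n=-1 means the last one."""
    if n > 0:
        first = date(year, month, 1)
        # weekday() is 0 for Monday, 6 for Sunday
        return first + timedelta(days=(6 - first.weekday()) % 7 + 7 * (n - 1))
    # Last Sunday: step back from the last day of the month.
    next_month = date(year + month // 12, month % 12 + 1, 1)
    last = next_month - timedelta(days=1)
    return last - timedelta(days=(last.weekday() + 1) % 7)


def dst_days(start, end):
    """Number of days from the start date up to, not including, the end date."""
    return (end - start).days


year = 2018
current = dst_days(nth_sunday(year, 3, 2), nth_sunday(year, 11, 1))
old_rule = dst_days(nth_sunday(year, 4, 1), nth_sunday(year, 10, -1))
print(current, old_rule)                                     # 238 210
print(round(current / 365, 2), round(old_rule / 365, 2))     # 0.65 0.58
```

That leaves 127 days of standard time under the current rule, a little over four months.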

I’ve also added the duration of daylight, the time between sunrise and sunset, as the dashed purple line. This is cheating, as I’m using the vertical axis for both time and duration, which are not the same; but the units work out, and I don’t think it’s confusing.

I’m not sure why Justin Grieser used tables instead of graphs in his article, but I have a guess. Most graphing programs have standard facilities for handling times and dates along the horizontal axis because timelines are so common. Plotting time along the vertical axis isn’t as common, and I suspect the software WaPo uses doesn’t make it easy to build the plots I did. If that’s the case, it’s another example of something I’ve complained about in the past: graphs being made—or in this case, not made—to accommodate the limitations of the software rather than the needs of the data.

  1. There is no 24:00 time—it’s 0:00 of the following day—but I included the 24:00 label because I thought the graph looked stupid without it. 

  2. I’ve often wondered how we can call something “standard time” when it’s in effect for only about four months. 

Travel, devices, and cables

I was on a business trip in Philadelphia this week. I arrived Monday evening and while I was taking out my contact lenses for the night, I tore one of them. No problem. I always have a spare set of contacts in my shaving kit because I’m a careful planner and forward thinker.

Then I started to remove my Apple Watch, and my opinion of myself took a sudden turn. I realized I hadn’t brought the watch’s charger with me because I’m a terrible planner and worthless thinker.

In my defense, this was my first trip after buying the watch a couple of weeks ago, and I haven’t yet established good habits for traveling with it. I have, however, ordered a spare charging cable, so this won’t happen again.

Over the years, I’ve developed a set of charging accessories for travel that tries to strike a balance between simplicity and comprehensiveness. One compartment of my backpack contains:

  1. A 5W Apple charger (the little cube that came with one of my iPhones).
  2. A 10W Apple charger (the bigger one with the flip-out prongs that came with my iPad Pro).
  3. Lightning cables in 4-inch, 3-foot, and 6-foot lengths.
  4. A Jackery 6000 mAh battery.
  5. A 21-inch micro-USB cable that came with some device I don’t remember.

All of the cables are USB A at the other end. I’m in the lucky position of not having to support both USB A and USB C.

I bought the Jackery 3–4 years ago, and although it doesn’t have as much oomph as it used to, it still gets me through long days stuck in an airport when my iPhone use—either directly or as my iPad’s tethering connection—goes way up. The 4-inch Lightning cable was bought mainly to use with the Jackery; it’s a convenient length when the phone is charging in my jacket pocket or backpack.

I bought a 10-foot Lightning cable a couple of years ago, thinking it would be handy to reach those out-of-the-way hotel outlets. But it wasn’t one of my better purchases. It takes up a lot of space and is so thick—both the cable itself and the Lightning end fitting—that it’s too stiff and clumsy to use comfortably with the phone. This surprised me, as the 3- and 6-foot cables (top and middle in the photo below) are much closer in thickness.

Amazon 3-ft, 6-ft, and 10-ft lightning cables

Maybe a braided cable would be a better choice. In any event, the 10-footer has been relegated to occasional use at home to charge the iPad when it’s being used as a laptop proxy.

The micro-USB cable is there to recharge the Jackery and my Kindle, although I can’t remember ever needing to charge the Kindle during a trip.

I buy only white Lightning cables to make them easy to distinguish from the micro-USB, which is black. I decided to go this way a few years ago after buying a black Lightning cable and getting it confused with the micro-USB almost every time I tried to pull it out of the pouch. I suppose it would be even smarter to color code the different Lightning cable lengths, but I haven’t gone that far.

Adding an Apple Watch to the mix means it’s time to switch from individual wall warts to a multiport charger. That, too, was part of the order when I returned from Philadelphia.

By the way, my watch made it through the trip without dying, from about 6:30 AM on Monday to 9:30 PM on Wednesday without a charge and without switching to Power Reserve. I can’t say it got heavy use during this trip, as I was under the weather and did no formal exercising those three days (breaking my streak of completed rings). Still, that’s 63 hours, which is much better than I expected.

Chart consistency

This is another in my irregular series on graphs that could be improved with a little work. It’s me being a grammar Nazi but with charts.

The article of interest is “How much better are today’s Winter Olympians than the first? The answer, in eight charts.” Words and graphics by Jacob Borage, published in the Washington Post a couple of days ago. The title tells you what to expect: a comparison over several events of this year’s results (or sometimes the 2014 results if the 2018 results weren’t available yet) to the first Winter Olympics results of 1924. The comparison is limited to sports in which the scoring is determined by a tape measure or a stopwatch.

The article starts with ski jumping:

Ski jumping

I like how Borage uses light and dark versions of the same basic color to do his comparison. We’ll see shortly that he uses a different color for each sport, which is also helpful. What I don’t understand about this graph is putting the two bars end-to-end. It suggests there’s some total amount being divvied up between 1924 and 2018, which certainly isn’t the case. A better chart would be a standard bar chart, with each bar starting at the same spot and the 2018 bar extending over twice as far. Such a chart is not only standard for this type of comparison, it would evoke the event itself—starting at the same point and heading off for whatever distance is achieved.

The 500-meter speed skating chart, which comes next in the article, is just what the doctor ordered:

500 meter speed skating

Note the change in color for a new event, but still with 1924 being the lighter version and 2018 the darker. Strictly speaking, this isn’t necessary because the bars are labeled, but it’s the kind of thing that helps the reader and shows some actual design thought. I do think that putting the city names in the chart is unnecessary clutter—the emphasis in the article is improvement with time, not venue—but otherwise I like this chart a lot.

The next chart is still in speed skating, but has shifted to the 1500-meter race:

1500 meter speed skating

What the hell is this? Why the switch to a donut chart within the same category? And why a donut chart at all? This, even more than the ski jumping chart, suggests a whole amount that’s being shared between two items. That’s not an apt comparison. I get that a donut chart looks sort of like a clock, and we are comparing times, but that’s not sufficient reason to use one here.

There are a couple of other speed skating comparisons, both of which use donut charts and continue the blue theme. I want to show the 10,000-meter chart:

10 kilometer speed skating

Do you see what’s wrong with this? The light and dark have been flipped, breaking the pleasing consistency established in the preceding graphs.

The next set of graphs cover cross-country skiing. Here’s a chart that covers the 18 km (from 1924) and 15 km (from 2018) races. Because the distances have changed, a direct comparison of the event times can’t be made, so Borage has plotted the average time per kilometer.

Cross country skiing average

We’re back to a standard bar chart, which I like, but we have the light and dark colors flipped, which I don’t. Then comes the 50 km event,

50 kilometer cross country skiing

where the colors have been flipped back to what they should be¹ (and the modern time is from 2014 because the 50 km race hadn’t been skied when the article was published).

Finally, we have the four-man bobsled time:

Four man bobsled

Sigh. We’re back to the donut and we’ve switched the light/dark signalling again.

In the end, all of these graphs make the comparisons they’re supposed to, but the lack of consistency and the poor choice of graph type undercut the meaning. The decision to use a different color theme for each category of event was a great idea made less great by flipping the light/dark representation. And using three different chart types for essentially the same type of comparison is just weird.
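One way to get that consistency is to make it impossible to forget: decide the color rule once, in code, and send every chart through the same function. This sketch does that for horizontal bar charts that share a zero baseline. The palette, the `lighten` amount, and the helper names are hypothetical, and the values in the usage line are placeholders, not Olympic results.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

# Hypothetical base color per sport; 1924 always gets the light version.
SPORT_COLORS = {"ski jumping": "#1b7837", "speed skating": "#2166ac"}


def lighten(color, amount=0.55):
    """Blend a color toward white by the given fraction."""
    return tuple(c + (1 - c) * amount for c in to_rgb(color))


def compare(ax, sport, old_value, new_value, unit):
    """Bars for 1924 and the modern result, both starting at zero and
    colored by the same light/dark rule every time."""
    base = SPORT_COLORS[sport]
    bars = ax.barh(["1924", "Modern"], [old_value, new_value],
                   color=[lighten(base), to_rgb(base)])
    ax.set_xlabel(unit)
    ax.set_title(sport.capitalize())
    return bars


fig, ax = plt.subplots()
bars = compare(ax, "ski jumping", 1.0, 2.0, "meters")  # placeholder values
```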

Consistency of design in graphing helps the reader as much as consistency of tense (to choose something I struggle with) in writing. Yes, the reader can figure it out, but you’ve lost some of the power and elegance of your message.

  1. By “should be,” I mean consistent with most of the other charts. There’s no particular reason the old events should be light and the new events dark, but once you choose a system, stick with it.