Matplotlib and the Dark Sky API

Last night, Than Tibbetts responded to my Check the Weather review with this:

@drdrang Why not hook your own script up to the Dark Sky API?
  — Than Tibbetts (@thanland) Thu Oct 18 2012 9:35 PM CDT

Which, of course, led me to spend a couple of hours playing around with the Dark Sky API. Nothing of any significance came of it, but it gave me more practice with matplotlib.

The people who make the hyperlocal weather app, Dark Sky, have opened up their API so regular mortals can access the app’s short-term rainfall forecast. As it happens, there’s more information in the API than is presented in the Dark Sky app itselft. Take a look at the documentation for the forecast call and you’ll see that the call returns this piece of info:

error—The 3-sigma error of the intensity value, conditional on precipitation. In other words, if there is precipitation, then 99.7% of the time the intensity value will be between intensity ± error.

The rain prediction plot in Dark Sky shows only the expected rainfall intensity; one could take this error value and also plot the uncertainty in the prediction. So that’s what I did. Here’s one plot, with the bright yellow, representing the expected rainfall intensity, plotted more or less the way Dark Sky does; and with the duller, see-through yellow, representing the 95th percentile value (assuming a normal distribution), plotted above it.

Dark Sky rainfall plot

Here’s another, in which the expected rainfall is plotted as a line, and the 95% range is plotted as a band surrounding it.

Dark Sky rainfall plot

This one is very much like the interquartile range plots I did in my last matplotlib post, but this one shows the middle 95% instead of the middle 50%.

The script that produced both of these plots is fairly short and easy to understand.

python:
 1:  #!/usr/bin/python
 2:  
 3:  from __future__ import division
 4:  import json
 5:  import urllib
 6:  from os import environ
 7:  from sys import exit, argv
 8:  import scipy.stats as stats
 9:  import matplotlib.pyplot as plt
10:  
11:  # The probability and number of standard deviations
12:  # associated with the upper and lower bounds.
13:  pUpper = .95
14:  pLower = .025
15:  nUpper = stats.norm.ppf(pUpper)
16:  nLower = stats.norm.ppf(pLower)
17:  
18:  # Get the latitude and longitude from the command line
19:  # or use default values from downtown Naperville.
20:  try:
21:    lat = argv[1]
22:    lon = argv[2]
23:  except IndexError:
24:    lat = 41.772903
25:    lon = -88.150392
26:  
27:  # Get my API key and construct the URL
28:  try:
29:    with open(environ['HOME'] + '/.darksky') as rcfile:
30:      for line in rcfile:
31:        k, v = line.split(':')
32:        if k.strip() == 'APIkey':
33:          APIkey = v.strip()
34:      dsURL = 'https://api.darkskyapp.com/v1/forecast/%s/%s,%s' % (APIkey, lat, lon)
35:  except (IOError, NameError):
36:    print "Failed to get API key"
37:    exit()
38:  
39:  # Get the data from Dark Sky.
40:  try:
41:    jsonString = urllib.urlopen(dsURL).read()
42:    weather = json.loads(jsonString)
43:  except (IOError, ValueError):
44:    print "Connection failure to %s" % dsURL
45:    exit()
46:    
47:  # Pluck out the hourly rain forecast information.
48:  startTime = weather['hourPrecipitation'][0]['time']
49:  intensity = [ x['intensity'] for x in weather['hourPrecipitation'] ]
50:  upper = [ min(x['intensity'] + x['error']/3*nUpper, 75) for x in weather['hourPrecipitation'] ]
51:  lower = [ max(x['intensity'] + x['error']/3*nLower, 0) for x in weather['hourPrecipitation'] ]
52:  time = [ (x['time'] - startTime)/60 for x in weather['hourPrecipitation'] ]
53:  
54:  # Plot the intensity ranges.
55:  plt.fill_between([0, 59], [15, 15], [0, 0], color='#0000ff', alpha=.02, linewidth=0)
56:  plt.fill_between([0, 59], [30, 30], [15, 15], color='#0000ff', alpha=.04, linewidth=0)
57:  plt.fill_between([0, 59], [45, 45], [30, 30], color='#0000ff', alpha=.08, linewidth=0)
58:  plt.fill_between([0, 59], [75, 75], [45, 45], color='#0000ff', alpha=.16, linewidth=0)
59:  
60:  # Plot the values.
61:  plt.fill_between(time, intensity, color='#ffdd00', linewidth=2)
62:  plt.fill_between(time, upper, intensity, color='#ffdd00', alpha=.25, linewidth=0)
63:  # plt.plot(time, intensity, color='#ff0000', linewidth=3)
64:  # plt.fill_between(time, upper, lower, color='#ff0000', alpha=.25, linewidth=0)
65:  plt.axis([0, 59, 0, 65])
66:  plt.xticks([10, 20, 30, 40, 50])
67:  plt.yticks([7.5, 22.5, 37.5, 55], ['sporadic', 'light', 'moderate', 'heavy'], rotation=90)
68:  plt.tick_params('y', length=0)
69:  plt.xlabel('Minutes from now')
70:  plt.title('Rainfall intensity')
71:  
72:  plt.savefig('ds-rain-1.png', dpi=160)
73:  # plt.savefig('ds-rain-2.png', dpi=160)

Lines 11-16 use the scipy.stats module to determine how wide, in standard deviations, the band will be. They’re currently set to generate the first plot, where only the upper bound, at the 95th percentile, was plotted. For the second plot, Line 13 was changed to

pUpper = 0.975

Lines 18-25 set the location for which the prediction is made. The latitude and longitude are expected to be the first and second arguments on the command line. If either or both are missing, a default location (the Apple Store in downtown Naperville, Illinois) is used.

Lines 27-37 gather my Dark Sky API key from a dotfile in my home directory (~/.darksky) and use it to construct the URL that makes the call to Dark Sky’s forecast method. The contents of the dotfile look like this:

APIkey: 123456789abcdefg

If the dotfile isn’t present or it doesn’t contain data in the right form, the script quits with an error message.

It’s easy to get a Dark Sky API key. Just give them your email address and a password and you’ll get a key. No need to tell them what you’re going to do with it. They do limit you to 10,000 calls per day before they either cut you off or start charging you (they can only charge you if they have your credit card on file). That’s a huge number for personal use, and I can’t imagine anyone blowing through it (unless you write a script with a loop that goes rogue).

Lines 39-45 collect the data from Dark Sky and decode it into a data structure. Like most web services nowadays, Dark Sky uses JSON to pass information around, and like most scripting languages, Python has a library for handling JSON. If something goes wrong in the download or the decoding, the script quits with an error message.

I should mention here that there is a Python module for Dark Sky on GitHub that I could have used. If I were going to do a lot with the Dark Sky API, I probably would use it; but because it depends on a handful of nonstandard modules I’d have to install, and I was only making one call to the API, it didn’t seem worth the effort for this script.

Lines 47-52 run through the data structure and pull out the time, intensity, and error values of the hourPrecipitation list. We use the intensity value directly. The error value is used, along with intensity and the upper and lower limit numbers calculated in Lines 11-16, to set the upper and lower band values. The time, which is the number of seconds since the start of the Unix epoch, is converted into the number of minutes from now associated with each predicted intensity value.

There’s a brief discussion in the forecast documentation on the meaning of the intensity value. For our purposes, it’s sufficient to say that it’s a number between 0 and 75 and is interpreted by Dark Sky this way:

Numerical Descriptive
< 15 sporadic
15-30 light
30-45 moderate
> 45 heavy

Lines 54-58 use these ranges to create a shaded background for the plot. The intensity (ha ha) of the blue was set for each band by altering the alpha value.

Lines 61-64 plot the data. The uncommented lines generate the first plot, the commented lines (when uncommented) generate the second plot.

Lines 65-70 decorate the axes. I put the descriptive labels along the y-axis at the center of each range and avoiding having tick marks appear by setting their length to zero. Note also that although the rainfall intensity can go up to 75, I set the upper limit of the plot to 65. From what I’ve read, a value above 65 means the kind of storm where you should be in a bunker with your head between your knees, not dicking around with Dark Sky.

Lines 72-73 save the plot to a PNG file that’s 1280×960 pixels. The standard plot size is 8″×6″, regardless of the format, and the bitmapped formats use the dpi setting to set the size in pixels (1280 = 8 × 160). As before, these lines are currently set to produce the first plot; switching the comments produces the second.

As in my baseball posts, my main interest here was to get more experience with matplotlib. What did I learn?

Update 10/25/12
Jay Hickey adapted some of this code and made a GeekTool geeklet to display the DarkSky short term precipitation chart on his Desktop, a great idea that I’ll probably steal to add to my NerdTool weather layout.


4 Responses to “Matplotlib and the Dark Sky API”

  1. Modernscientist says:

    This is a cool way to play with Matplotlib. I switched from gnuplot when it became too difficult to code subroutines to generate multiplots that I like. And though gnuplot is an excellent plotting package, I’ve never looked back. All the figures for my latest paper, including contour plots of NMR spectra, were made using Matplotlib.

  2. Jack Turner says:

    Cool stuff. We actually wrote some very similar graphing code in an example app we wrote and threw on GitHub. In this we use Node.js to request data from the API and do the drawing manually with a canvas element. We arrived at roughly the same place you did, with the error informing upper and lower bounds that are displayed faintly around the intensity line. Here is a link to an image of a few interesting storms so you don’t have to bother running it: http://s.turner.io/KJp3

    Perhaps it’s worth noting that this same error value that we’re is actually conveyed to the user in the Dark Sky iOS app. That error value is what determines how wobbly the graph is at any given point. The line in the Dark Sky iOS app bobs up and down between the same upper and lower bounds that we displayed in these non-animated plots.

    Forecasts with a large amount of error actually feel nervous. I think users get this even if they don’t consciously what’s causing the wobble. We do actually explain it in the app [ http://s.turner.io/KIjq ] but no one actually reads instructions for apps. I know I don’t.

    Jack Turner of The Dark Sky Company, LLC

  3. Dr. Drang says:

    Jack,
    When I first saw the error term in the API, I wondered if that determined the wiggle/wobble. Thanks for confirming.

    In the code, it looks like you’re further weighting the intensity by the probability value before plotting, which makes sense. There are a couple of other scaling factors in your code that I don’t understand, though (Lines 70-71 of web.js). Am I missing something obvious?

  4. Jack Turner says:

    Honestly, we just fiddled with the weighting of the error and probability until we liked the way it felt. In our defense, we’re making a weather visualization that needs to be intuitively useful, not statistically rigorous. Or as Kubrick said…

    “Sometimes the truth of a thing is not so much in the think of it, but in the feel of it.”