Redundant URLs

I often visit Pinboard’s popular bookmarks page to see what the rest of the world thinks is interesting. Today there are nine entries for Mona Simpson’s moving eulogy of Steve Jobs, reprinted on Sunday in the New York Times. The interest in it isn’t surprising, but it does seem a little weird that there are so many entries.

The idea behind the popular page is that it counts up repeated bookmarking of the same page. Pages that have been recently bookmarked by many users show up on the popular page, presented as a single entry with a count. The eulogy shows up repeatedly because Pinboard collects links according to their URLs, and the Times doesn’t have a single canonical URL for that page. When you’re on a Times page, the URL in your browser’s toolbar depends on how you got there.
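
I don’t know how Pinboard actually does its counting, but the effect is easy to reproduce. Here’s a minimal Python sketch that assumes the tally is keyed on the exact URL string. The `example.com` addresses and their query strings are made up for illustration, and `popular` is a hypothetical helper, not anything from Pinboard.

```python
from collections import Counter

# Hypothetical bookmarks: four point at the same story, three at another page.
# The URLs and query strings are invented for illustration.
bookmarks = [
    "http://example.com/story.html",
    "http://example.com/story.html?ref=home",
    "http://example.com/story.html?ref=home",
    "http://example.com/story.html?pagewanted=all",
    "http://example.com/other.html",
    "http://example.com/other.html",
    "http://example.com/other.html",
]

def popular(urls):
    """Count bookmarks keyed on the exact URL string, most bookmarked first."""
    return Counter(urls).most_common()

for url, count in popular(bookmarks):
    print(count, url)
```

The story has four bookmarks in total, but they’re split across three entries, so the page with three bookmarks under a single URL comes out on top.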

Here are the nine slightly different URLs. Two of them go to the first page—the Times divided the eulogy over three pages to maximize pageviews—and the others go to a “single page” view.

I have a sense that there’s a reason for so many URLs (and I’m sure there are more than nine), but I don’t know what it is. I would think that the recommended SEO practice would be to use as few URLs as possible to avoid diluting your PageRank. Maybe that’s not a concern because Google collapses all those URLs into a single entry.

In any event, the repetition as I scrolled down the popular page was striking, and I thought it worth a mention. Put together, the nine URLs account for over 450 bookmarks, almost three times as many as the current “most popular” page.


5 Responses to “Redundant URLs”

  1. Joshua says:

    Regarding Google, the Times is using a canonical link to prevent dilution:

    <link rel="canonical" href="http://www.nytimes.com/2011/10/30/opinion/mona-simpsons-eulogy-for-steve-jobs.html?pagewanted=all">

  2. Dr. Drang says:

    Thanks, Joshua! Seems like Pinboard could do a better job of tracking popular bookmarks if it pulled out the canonical link.

  3. Maciej Ceglowski says:

    Grabbing the canonical link is a very good idea - I’ll throw it on the todo list. Thanks for pointing this out!

  4. Les Orchard says:

    We looked at something similar to this at Delicious, once upon a time. Canonical links for aggregate displays is what I think we eventually landed on.

    Where rel="canonical" was missing from source pages, we also tried some arbitrary normalization schemes, like chopping off all query params or just the ones from a list of common noise params. The main thing, though, was never to do that to the user’s own data, only to the aggregations.

  5. Dr. Drang says:

    Whatever you did, Les, it seemed to work. I don’t remember ever seeing repeats on the Delicious popular page.
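
The approach discussed in the comments can be sketched roughly as follows: use the page’s canonical link as the aggregation key when one is present, and fall back to stripping noise parameters from the query string when it isn’t. This is only an illustration, not Pinboard’s or Delicious’s actual code. The names `CanonicalFinder`, `strip_noise`, and `aggregation_key` are hypothetical, and the contents of `NOISE_PARAMS` are an assumed list.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of "noise" query parameters; a real list would need tuning.
NOISE_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            attributes = dict(attrs)
            rels = (attributes.get("rel") or "").lower().split()
            if "canonical" in rels and attributes.get("href"):
                self.canonical = attributes["href"]

def strip_noise(url):
    """Drop noise query parameters and the fragment; keep everything else."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in NOISE_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

def aggregation_key(bookmarked_url, page_html):
    """Key for popularity counts only; the user's saved URL is left untouched."""
    finder = CanonicalFinder()
    finder.feed(page_html)
    return finder.canonical or strip_noise(bookmarked_url)
```

Following Les’s point, the key would be used only for grouping on the popular page. Each user’s bookmark would keep the URL exactly as it was saved.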