## 26 February 2013

### How you can be fooled by a "heat map"

The image above shows "each day of the year with a ranking for how many babies were born in the United States on each date from 1973 to 1999." Obviously, more people were born in Jul/Aug/Sep than in Jan/Feb/Mar. But after the chart was published in The Daily Viz, the creator found it necessary to publish a clarification:
> While I’m excited about the traffic, I’m also worried that the graphic may have misled some readers. Some people read the map assuming that darker shades represented higher numbers of actual births, even though I tried to explain in the post that the colors were shaded by birthday rank, from 1 to 366, in popularity. Or I thought I did. Because of that, Sept. 16 — the most popular birthday — seems wildly more common than January 1, among the least popular. Both may be relatively close in the raw number of births, even though their ranks are far apart.
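The gap between rank shading and count shading can be made concrete with a few made-up numbers. This is a minimal sketch, not the creator's code: the five daily totals below are hypothetical and sit within about 2% of one another, and the helper names (`rank_by_popularity`, `shade_by_rank`, `shade_by_count`) are illustrative. Shading by rank stretches those tiny differences across the full light-to-dark range, while shading by raw count barely varies.

```python
def rank_by_popularity(totals):
    """Map each day to its popularity rank (1 = most births). Ties are not handled."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {day: position for position, day in enumerate(ordered, start=1)}

def shade_by_rank(ranks):
    """Darkness in [0, 1]: 1.0 for rank 1, 0.0 for the last rank."""
    n = len(ranks)
    return {day: (n - r) / (n - 1) for day, r in ranks.items()}

def shade_by_count(totals):
    """Darkness proportional to the raw count, relative to the busiest day."""
    peak = max(totals.values())
    return {day: count / peak for day, count in totals.items()}

# Hypothetical totals (NOT the real 1973-1999 data): all within about 2%.
totals = {"day A": 9_800, "day B": 9_850, "day C": 9_900, "day D": 9_950, "day E": 10_000}

ranks = rank_by_popularity(totals)
rank_shade = shade_by_rank(ranks)
count_shade = shade_by_count(totals)
for day in totals:
    print(f"{day}: rank {ranks[day]}, rank shade {rank_shade[day]:.2f}, "
          f"count shade {count_shade[day]:.2f}")
```

Here the least popular day gets a rank shade of 0.00 but a count shade of 0.98. With 366 days, the real chart's shading likewise spans ranks 1 to 366 no matter how close the underlying counts are.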
Here's a follow-up graph showing births per month (normalized for the number of days in each month):
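Normalizing for month length means dividing each month's total by the number of days that month actually contributes over the whole period, leap days included. A minimal sketch using Python's standard `calendar.monthrange`; the monthly totals are hypothetical, not the data behind the graph, and the function names are illustrative.

```python
import calendar

def days_in_month_over_period(month, first_year, last_year):
    """Total days a calendar month contributes across first_year..last_year inclusive."""
    return sum(calendar.monthrange(year, month)[1]
               for year in range(first_year, last_year + 1))

def births_per_day(monthly_births, first_year, last_year):
    """Divide each month's total births by the number of days that month covers."""
    return {month: total / days_in_month_over_period(month, first_year, last_year)
            for month, total in monthly_births.items()}

# Hypothetical totals for Jan-Mar over 1973-1999 (NOT the real data).
monthly_births = {1: 837_000, 2: 762_000, 3: 837_000}
per_day = births_per_day(monthly_births, 1973, 1999)
print(per_day)
```

In this made-up example February's raw total is about 9% below January's, yet its per-day rate is identical (1000.0 for all three months), because 1973 to 1999 contains 762 February days (six of them leap days) against 837 January days.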

What I actually find most interesting about the first heat map is the extent to which modern medical technology allows birth dates to be manipulated: note the obvious gaps at the Fourth of July and on the days closest to Christmas, with darker shades just before and after those holidays.

1. I like the pale row running right across the image on the 13th.

1. That's interesting too. I wonder if he could break down his data by which 13ths fell on a Friday.

2. Or a Fourth of July that falls on a Friday the 13th.

2. "Some people read the map assuming that darker shades represented higher numbers of actual births, even though I tried to explain in the post that the colors were shaded by birthday rank, from 1 to 366, in popularity."

I'm confused. How is the number of actual births not the same as popularity? Are people choosing their birthdays now? Or was this chart mapping how people feel about different birth dates?

1. I agree that his explanation is infelicitous; presumably he's better with numbers than with words. What I think he meant to say is that the DEGREE of darkness does not correlate with the number of births (i.e. something twice as dark is not twice as many births, and something pale does not imply a near-absence).

2. So then what information is actually being delivered??? Something that seems obvious is actually confusing.

3. The most common birthdates look to be roughly 9 months after Valentine's Day. Just saying.

1. They look to me to be about 8.5 months (40 weeks, more or less) after New Year's Day.

2. Look also at the blip on Valentine's Day (14 February) itself and a similar excess on St. Patrick's Day. My grandmother was born on 14 February 1892 and called Lily Valentine, so it looks as if loyal Irish Americans are scheduling their C-sections for Paddy's Day so that they won't forget when Patrick Jr.'s birthday is.
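The date arithmetic in the Valentine's Day versus New Year's Day exchange can be checked with Python's standard `datetime` module. A minimal sketch, assuming a non-leap year (2013 is used arbitrarily) and taking September 16, named as the most popular birthday in the creator's clarification, as the peak.

```python
from datetime import date, timedelta

# Assumptions: a non-leap year, and Sep 16 as the peak birthday.
YEAR = 2013
peak = date(YEAR, 9, 16)
new_years = date(YEAR, 1, 1)
valentines = date(YEAR, 2, 14)

def weeks_between(start, end):
    """Elapsed time from start to end, in weeks."""
    return (end - start).days / 7

print(f"New Year's Day -> Sep 16: {(peak - new_years).days} days, "
      f"{weeks_between(new_years, peak):.1f} weeks")
print(f"Valentine's Day -> Sep 16: {(peak - valentines).days} days, "
      f"{weeks_between(valentines, peak):.1f} weeks")
print(f"40 weeks after New Year's Day: {new_years + timedelta(weeks=40):%b %d}")
```

Under these assumptions, September 16 is 258 days (about 36.9 weeks) after New Year's Day and 214 days (about 30.6 weeks, or roughly 7 months) after Valentine's Day, while exactly 40 weeks after January 1 lands on October 8.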

4. A similar chart/graph:

...In 1969, we had our first Military Draft Lottery.

Wikipedia says this about it: "People soon noticed that the lottery numbers were not distributed uniformly over the year. In particular, November and December births, or dates 306 to 366, were assigned mainly to lower draft numbers representing earlier calls to serve (see figure). This led to complaints that the lottery was not random as the legislation required. Analysis of the procedure suggested that mixing 366 capsules in the shoe box did not mix them sufficiently before dumping them into the jar. ('The capsules were put in a box month by month, January through December, and subsequent mixing efforts were insufficient to overcome this sequencing.') However, the non-uniform lottery was allowed to stand. Only five days in December—Dec. 2, 12, 15, 17 and 19—were higher than the last call number of 195."

At the time I saw no public outcry about this non-randomness, but I myself DID notice it. I was just getting interested in astrology at the time. (I later became a mechanical engineer, and I still vouch for astrology to this day, though I never broached the subject in engineering offices. I don't see a conflict: I DO think there is a physical basis for astrology, as I did then, even if no one in science wants to consider it.)

It was THROUGH the consideration of astrology that I found the non-randomness of the draft lottery. Basically, I took each day of the year, with its lottery number, and put the days into bins according to their astrological signs. The pattern that came out was AMAZING. Averaging the lottery numbers for each astrological sign gave a nice curve. It was like a sine curve (a "sign curve," maybe?).

I long ago lost my work on it, but as I recall, Aries (late March through most of April) was the most screwed. Its average was lower (unluckier) than that of Sagittarius, which falls basically in the MIDDLE of the 60-day period mentioned in the Wikipedia article. The others, however, looked at it only by month, whereas I looked at it by astrological sign. I DID look at it by month, too, and the pattern was not quite as severe, but it was still there.

My hypothesis at the time was that the horoscope for the start of the Draft Lottery procedure (drawing the numbers) affected the outcome. I would have repeated the analysis, but I never saw the numbers for the later draft lotteries. I did NOT cast a horoscope for that first lottery to see what it would indicate, so I dropped the ball there. But I am pretty sure that no matter WHEN the lottery was held, some sort of similar curve would have come out. [In any event, it is a BIG mistake to assume that random means random results!]

The assertion in Wikipedia that the mixing was inadequate is only speculation, and it does not explain why ARIES (let's say April) was the unluckiest or why Sagittarius (December) was the luckiest. Aries and Sagittarius are not opposite, nor are April and December.

In addition, their speculation assumes that they did not spin the cage between drawing balls, but they DID. Thus, the mixing/randomization would have continued throughout the ball-drawing operation, and it would have randomized, and DID randomize, subsequent selections. It works for BINGO. The original mixing does not affect the subsequent randomization process. They were simply looking for some "scientific" explanation for the pattern of numbers. They would never have allowed themselves to publicly consider astrology as a possible influence. They pretend that each month should have been represented more or less equally. In statistics, such even representation does not happen often; it is actually rare.

But I did consider astrology, simply as an exercise in number-crunching and testing astrology, on my part.

The curve was quite convincing. Aries was unluckiest. Pisces and Taurus were also highly unlucky. While Sagittarius was luckiest, Capricorn and Scorpio were next luckiest.

I also projected at the time that 195 would be the last number taken, and in that I was correct, it seems.
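The binning the draft-lottery commenter describes (assign each birthday to a bin, then average the lottery numbers in each bin) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the sign start dates are the conventional tropical-zodiac cusps, which vary by a day or so between sources; the five draft numbers are hypothetical, not the 1969 results; and the helper names (`zodiac_sign`, `by_month`, `average_by`) are illustrative. The same `average_by` function handles the month-by-month comparison by swapping the key.

```python
from collections import defaultdict
from statistics import mean

# Conventional tropical-zodiac start dates (month, day). Assumption: cusp dates
# differ by a day or so between sources.
SIGN_STARTS = [
    (1, 20, "Aquarius"), (2, 19, "Pisces"), (3, 21, "Aries"),
    (4, 20, "Taurus"), (5, 21, "Gemini"), (6, 21, "Cancer"),
    (7, 23, "Leo"), (8, 23, "Virgo"), (9, 23, "Libra"),
    (10, 23, "Scorpio"), (11, 22, "Sagittarius"), (12, 22, "Capricorn"),
]

def zodiac_sign(month, day):
    """Return the sign whose start date most recently precedes (month, day)."""
    sign = "Capricorn"  # Jan 1-19 belong to the sign that began on Dec 22
    for start_month, start_day, name in SIGN_STARTS:
        if (month, day) >= (start_month, start_day):
            sign = name
    return sign

def by_month(month, day):
    """Bin key for the month-by-month comparison."""
    return month

def average_by(lottery, key):
    """Group {(month, day): draft_number} by key(month, day); average each bin."""
    bins = defaultdict(list)
    for (month, day), number in lottery.items():
        bins[key(month, day)].append(number)
    return {label: mean(numbers) for label, numbers in bins.items()}

# Hypothetical draft numbers for five birthdays (NOT the 1969 results).
sample = {(3, 25): 100, (4, 10): 200, (12, 1): 50, (12, 10): 150, (12, 25): 300}

by_sign = average_by(sample, zodiac_sign)
monthly = average_by(sample, by_month)
print(by_sign)
print(monthly)
```

Note that the two keys split the calendar differently: in this sample, December 25 lands in the Capricorn bin but stays in the December bin, so the sign averages and month averages need not agree even on the same data.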