What do Rachael, Ulysses, and two Houston DJs have in common?

Search privacy?

Question: What does my sister Rachael have in common with Ulysses and a pair of Houston DJs that I, alas, do not?

Answer: AOL Search.

I assume most readers will have heard about AOL releasing what 650,000 subscribers searched for over a three-month period. Although the data was ostensibly "anonymized," just being able to correlate a user's searches can lead to that user being identified, or at the very least to a pretty clear picture of who and where they are.

Although AOL eventually pulled the data from their research.aol.com website, it lives on, mirrored on many sites around the world.

I downloaded the data and this evening finally got around to looking through it (briefly). I decided to see how many people had searched for anything that led them here or to planet.cleverly.com.

AOL broke the data up into ten individual files, each somewhere between 212 and 228 megabytes (uncompressed). Using standard Unix utilities I executed a grep -li cleverly.com * and was surprised to see that 9 out of 10 of the files matched!

Upon closer examination, however, people were searching for either rachael.cleverly.com (my sister's old website) or stevensandcleverly.com (Stevens & Cleverley, Houston DJs on the (now defunct?) KRTS 97.5 FM). Nobody was searching for blog.cleverly.com, planet.cleverly.com or even my old michael.cleverly.com website.
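For readers without grep handy, the same check can be sketched in Python. This is only an illustration, not what I actually ran: the function name files_mentioning is made up for the example, and unlike grep (where the dot in cleverly.com matches any character) it does a plain case-insensitive substring test.

```python
from pathlib import Path


def files_mentioning(directory, needle):
    """Return the sorted names of files in `directory` that contain `needle`.

    Rough stand-in for `grep -li needle *`: the comparison is
    case-insensitive and stops at the first matching line in each file.
    `files_mentioning` is a hypothetical helper, not part of any library.
    """
    needle = needle.lower()
    matches = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # Read line by line so a 200+ megabyte file is never held in memory.
        with path.open(encoding="utf-8", errors="replace") as handle:
            if any(needle in line.lower() for line in handle):
                matches.append(path.name)
    return matches
```

Calling files_mentioning(".", "cleverly.com") from the directory holding the ten data files should list the same files that grep reports.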

So Rachael's more popular among the Internet masses than I am. Or at least people want to read her old college essays more than mine. Either way I'm OK with that. :-)

Come to think of it I've never actually read Ulysses and I don't think I ever wrote an essay analysing a poem in college. So no wonder nobody is looking for me...

So what kind of profile can we glean from Rachael's anonymous homework stalkers? Let's see...

#10,536,410

Our first mystery user, let's call her Alice, looks to be a student from The University of Alabama in Huntsville. Alice went looking for Rachael's website on March 6, 2006 at 5pm. She appears to regularly use a search engine to find websites that she could just go to directly if she knew how to use her browser's address bar. Oh, and type better.

#8,516,760

Our second user, let's call him Bob, appears to be a student at Texas State. He searched for Rachael's old website on March 20th at 11:43 pm. Some of his other search highlights:

#12,569,041

Our third user, let's call her Carol, was the busiest little searcher of the bunch. She performed 119 separate searches. Her prime interests seem to revolve around:

#2,799,138

Let's pretend our final user is Dave. In addition to searching for Rachael's essays on poems, he mainly seemed to want information on different colleges. Perhaps a high school junior getting ready to start applying to colleges?


As for who these aspiring fans of Rachael really were, they probably weren't really Alice, Bob, Carol and Dave. Those names are just the prototypical example characters in discussions on cryptography.

— Michael A. Cleverly


Paying more for the pleasure of riding lots of trains

Train tracks

Last year I wrote about paying more for the pleasure of lots of layovers—how for an extra $111 I could visit airports in Missouri & Ohio on my way from Salt Lake to Portland, Oregon for the 12th Annual Tcl/Tk conference.

Well, it is time to start thinking about the 13th Annual Tcl/Tk conference... this year the conference is being held in Naperville, Illinois (near Chicago). Unlike last year, when I had to pay my own way, this year my employer is paying for it (though I'd have gone anyway if they hadn't).

I looked up what it would cost to travel by train this afternoon. (Salt Lake is a major stop on the California Zephyr route between Chicago and San Francisco.)

Much to my surprise, coach tickets were actually $10 less each way than what I could find for flights into Chicago's Midway airport: $120 vs. $130.

Amtrak's website, just like Delta's last year, seems to be programmed to really go the extra mile and give you every last possible itinerary option. For an extra $148 ($268 total) I could return home from Naperville to Salt Lake the roundabout way:

Travelling this way would only take the better part of five days (longer than the conference itself) to get home!

I'm attending the conference with two co-workers who would apparently rather face the indignities of airport security and several hours cramped with no leg room than view the scenic beauty of the American Midwest, if it means cutting some twenty-nine hours off the trip.

As for me, I have only vague memories of traveling by train from Salt Lake to Los Angeles as a young child. The Zephyr has a certain romantic appeal to it. If I don't "seize the day," so to speak, will I ever get around to it otherwise?

Something worth thinking about for a few days before booking airfare, I think...

— Michael A. Cleverly


Crosswalk placebos

Push button to cross

I'm reading Henry Petroski's Success through Failure: The Paradox of Design (and quite enjoying it).

While illustrating that "the connection between intention and result, between cause and effect, is not always what it seems" Petroski sheds light on a great mystery I've wondered about before: does pushing the crosswalk button actually accomplish anything?

Blaming an unfortunate occurrence on bad design may make for a convincing damage claim (or even a successful lawsuit), but the connection between intention and result, between cause and effect, is not always what it seems. Over three thousand intersections in New York City have signs instructing pedestrians, "To Cross Street / Push Button / Wait for Walk Signal." A good deal of time often elapses between pushing the button and getting the go-ahead, but conscientious citizens obediently wait. They presume, one presumes, that a delay is part of the system's design. It may be a "bad design," but the light does change, eventually.

New York intersections began to be fitted with these "semi-articulated signals" around 1964. They were the "brainstorm of the legendary traffic commissioner, Henry Barnes, the inventor of the 'Barnes Dance,' the traffic system that stops all vehicles in the intersection and allows pedestrians to cross in every direction at the same time." Walk buttons were installed mostly where a minor street intersected a major one, along which traffic would be stopped only if a pavement sensor detected a vehicle waiting to enter from the minor street or if someone pushed the button, causing the light to change ninety seconds hence. With increased traffic (by 1975, about 750,000 vehicles were entering Manhattan daily), the signals were being tripped frequently by minor-street traffic. The walk button hardly seemed necessary, and pushing them interfered with the coordination of newly installed computer-controlled traffic lights among many thoroughfares. Consequently, most of the devices were deactivated by the late 1980s, but the buttons themselves and the signs bearing the instructions for their use remained in place. Evidently there was never any official announcement about the status of the "mechanical placebos."

Which doesn't necessarily mean the buttons are placebos anywhere other than New York, but it does make one wonder...

— Michael A. Cleverly


Survey says!: 8 out of 10 AOL searchers ...

Screenshot showing the AOL user's dilemma of where to enter a URL

Paul Boutin at Slate took AOL's published search data and used a commercial software package to analyze what people searched for. His conclusion: AOL's data leak reveals the seven ways people search the web.

Briefly, his seven classifications are:

  1. The Pornhound
  2. The Manhunter
  3. The Shopper
  4. The Obsessive
  5. The Omnivore
  6. The Newbie
  7. The Basket Case

Unfortunately the article doesn't give us percentage breakdowns for the relative population size of each of these seven groups. (For the record I believe I'd be an Omnivore, though I'd never used AOL's search prior to their releasing this data.)

Nor does the article indicate whether each person is strictly limited to being placed in a single group, or whether one person might be classified as both a Newbie and a Basket Case at the same time.

I suspect people can belong to multiple classifications since an illustrative characteristic of being a Newbie is one "who confused AOL's search box with its browser address window."

Writing a short Tcl script to count the number of unique users who had at least one search that matches the following regular expression:

{^[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,6}$}

I found that over 78.6% of AOL users had searched for what appears to be a domain name instead of using their browser's address bar directly. (516,882 out of 657,426 to be precise.)
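For anyone who wants to try the count themselves, here is a rough Python rendition of the idea. It is a sketch, not my original Tcl script. The names domain_searcher_share and read_rows are invented for the example, and read_rows assumes each line is tab-separated with the user ID first and the query second, after a single header line; check that against the files before trusting the numbers.

```python
import re

# The same pattern as above, minus Tcl's brace quoting.
DOMAIN_LIKE = re.compile(r"^[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,6}$")


def domain_searcher_share(rows):
    """Count users with at least one query that looks like a domain name.

    `rows` is any iterable of (user_id, query) pairs.
    Returns (matching_users, total_users, percent_matching).
    """
    all_users = set()
    domain_users = set()
    for user_id, query in rows:
        all_users.add(user_id)
        if DOMAIN_LIKE.match(query):
            domain_users.add(user_id)
    percent = 100.0 * len(domain_users) / len(all_users) if all_users else 0.0
    return len(domain_users), len(all_users), percent


def read_rows(path):
    """Yield (user_id, query) pairs from one data file.

    Assumed layout (not verified here): one header line, then
    tab-separated fields with the user ID first and the query second.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        next(handle, None)  # skip the assumed header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                yield fields[0], fields[1]
```

Sets make the "unique users" part automatic: a user who typed fifty domain names still counts once. The pattern only accepts lowercase letters, so the sketch also assumes the queries are already lowercased.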

Maybe 21.4% of AOL's customers really have taken the training wheels off?

— Michael A. Cleverly


What would you choose?: Is/Is, Is/Are, Are/Is, or Are/Are?

Via the Language Log comes a question worthy of Sunday dinner conversation: how would you complete each of the following sentences?

  1. The poll shows that a majority of people ____ against the war.
    1. The poll shows that a majority of people is against the war.
    2. The poll shows that a majority of people are against the war.
  2. The poll shows that a minority of people ____ against the war.
    1. The poll shows that a minority of people is against the war.
    2. The poll shows that a minority of people are against the war.

Readers were invited to participate in an online poll. This week the results are in and I'm happy to report that I is not in the minority... ;-)

How did I answer?

I chose are to complete the first sentence without hesitation. Both readings, "(a majority of) people are" and "a majority (of people) are" work for me.

Initially I wavered slightly in my commitment to are for the second sentence, but in the end decided that I liked it better as "(a minority of) people are" instead of "a minority (of people) is" (tolerable but rough sounding).

If you are curious about other people's justifications, be sure to check out both comment threads. Microsoft Word would apparently tell me I'm wrong (in both cases).

— Michael A. Cleverly

