On March 14th Google announced plans to improve their privacy practices by "anonymizing" their logs after 18-24 months. As usual, Google is getting bashed for making only a weak effort, despite the fact that no other search engine is making any effort at privacy at all. I am going to join in, not to pick on Google, but because this affords us a chance to discuss these issues and debate what the policy should be.
First, let’s understand what they are saying. Past policy has been to keep a record of absolutely all searches, to keep them with all the identifying information Google has, and to keep them forever. This is the current default in the industry. Because a large fraction of Google search users log in to various Google services, Google already “owns” a great deal of information about your identity. The plan is to continue capturing all this information, but then to make it "anonymous" some time between 1.5 and 2 years later.
There is a poor history of "anonymized" data. It turns out that you really are unique and individual. While hearing that made us all feel good in school, it also means that it is easy to identify you given enough individual facts.
Just last year AOL released "anonymized" search logs for 650,000 users. There was no "identifying information" attached to individual searches, but all searches done by the same person could be grouped together under some "anonymous" identifier. A reporter at the New York Times was able to identify one of those people in just a few days based on her searches. A well-designed data analysis system could do this on a massive scale. Anonymization of this kind is better than nothing, but not by much.
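To make the grouping problem concrete, here is a minimal sketch in Python. The log rows, ID numbers, and queries are all invented for illustration, and this is not the format of any real search log. It only shows that once every query carries the same stand-in identifier, rebuilding a per-person profile takes a few lines.

```python
from collections import defaultdict

# Hypothetical "anonymized" log rows: (pseudonymous_id, query).
# Every value here is made up for illustration.
LOG = [
    (1001, "landscapers in springfield"),
    (2002, "cheap flights to denver"),
    (1001, "homes sold in maple ridge subdivision"),
    (1001, "people named jane doe"),
    (2002, "denver weather"),
]


def group_by_pseudonym(rows):
    """Collect every query issued under the same pseudonymous ID."""
    profiles = defaultdict(list)
    for pseudo_id, query in rows:
        profiles[pseudo_id].append(query)
    return dict(profiles)


profiles = group_by_pseudonym(LOG)
for pseudo_id, queries in sorted(profiles.items()):
    print(pseudo_id, queries)
```

No single query under ID 1001 names anyone, but a town, a subdivision, and a surname together may narrow the field to a handful of people.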
I suspect that few people realize, even after all the media attention, that search engines store all of their searches in a way that is directly attributable to them. People expose themselves in all kinds of ways they would not if they realized how vulnerable they are.
Therefore, I applaud Google for taking a step—even if a small one—towards privacy. Marc Rotenberg of EPIC opined that this would set a new standard for the industry. I hope he is wrong but suspect he is right. My hope is that the debate around this will enable a broader discussion and lead to better policy.
At the end of the day, however, any server you access can capture all kinds of information about you and your activities, and it may very well do whatever it wants with it. This applies far beyond search engines, and to sites outside the legal or ethical reach of this debate. People need to actively protect themselves. Viruses are illegal, yet you need anti-virus software. Hacking is illegal, yet you need firewalls. If you want to be anonymous, you need to take responsibility for that as well.