There are three approaches to thinking about online data collection: a consumerist model, a fundamental rights model, and a “market” model. A consumerist model focuses on giving a consumer adequate notice that online profiling is occurring, and requires consent before information about the surfer’s online behavior can be collected. Debates within the consumerist model take place over how much notice is required (general notice vs. detailed notice on every page [hypertext link to dashing signals]) and whether “consent” may be general (“if you use our web page you impliedly agree that we may collect data about you”) or specific (consumers must give explicit consent each time they log on). Regardless of whether a consumerist model requires strong or weak notice, however, in the end a website operating under a consumerist model can deny a consumer access unless the consumer agrees to allow the website to collect data.
Under a fundamental rights approach to online profiling, an individual’s browsing habits could not be negotiated away in such a fashion. By analogy, Congress has forbidden video stores from releasing information about what videos one rents. 18 U.S.C. § 2710 (1999). The Cable Communications Policy Act of 1984 forbids cable operators and third parties from monitoring the viewing habits of subscribers. See 47 U.S.C. § 551 (1999). Many libraries forbid the release of borrowing records. These rules, exemplifying a fundamental rights approach to informational privacy, protect a viewer or borrower regardless of the lender’s desire to accumulate and sell such information.
In contrast to these approaches, a marketplace model (sometimes referred to as “self-regulation”) would defer to “the market” to work out the resolution of the tensions. The assumption behind a marketplace approach is that consumers have the power to negotiate what information they wish to disclose to websites. The model takes as its focus the idea of exchange: a user obtains information from a website, and in “exchange” the website collects data about the user. The market approach assumes that whatever data is disclosed—even unwittingly by consumers—is within the power of the collector to share, sell, or retain. The current system defaults to a market model unless a governmental agency (e.g., the Federal Trade Commission) intervenes.
As we explore this section, consider whether a consumerist model (either strong or weak) adequately protects an online user’s privacy, or whether a fundamental rights approach is required. Alternatively, consider whether any change is necessary: perhaps online profiling of your surfing behavior is a good thing.
Studies have shown that Americans are increasingly concerned about the protection of their privacy on the Internet. Many of these concerns are well-founded (see Module I). The nature of the Internet causes information to pass through dozens of networks and computer systems, each with its own manager capable of capturing and storing online activities. Additionally, user activities can be monitored by individual websites and Internet Service Providers (ISPs) (Privacy in Cyberspace, http://www.privacyrights.org/fs/fs18-cyb.htm), vastly increasing the availability of one’s personal information to strangers. This Introduction will deal only with privacy concerns surrounding the practices of data profiling and mining.
Even without the benefit of high-tech equipment, it is possible for website administrators to glean information from a user’s clickstream – the “aggregation of electronic information generated as a web user communicates with other computers and networks over the Internet.” Adam White Scoville, Clear Signatures, Obscure Signs, 17 Cardozo Arts & Ent. L.J. 345, 364 (1999). Often, cookies – “data files created on [users’] own computer hard drives when [they] visit a web site [that] contain unique tracking numbers that can be read by the web site” – are used to facilitate data profiling and mining. Ann Bartow, Our Data, Ourselves: Privacy, Propertization, and Gender, 34 U.S.F. L. Rev. 633, 678 (2000). Another device often used to track a user’s behavior is a “web beacon” (sometimes called a web bug), a minuscule, pixel-sized identifier buried in the software on a page a user views. According to the Privacy Foundation,
A Web bug is a graphic on a Web page or in an e-mail message designed to monitor who is reading the page or message. Web bugs are often invisible because they are typically only 1-by-1 pixels in size. In many cases, Web bugs are placed on Web pages by third parties interested in collecting data about visitors to those pages.
http://www.bugnosis.org/faq.html#web bug basics
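The mechanics of a web bug are simple enough to sketch in a few lines. The Python fragment below builds the kind of URL a hypothetical 1x1 tracking image might point at; the server name, path, and parameter names here are invented for illustration, not taken from any real ad network:

```python
from urllib.parse import urlencode

def web_bug_url(ad_server, page, cookie_id):
    """Build the URL a hypothetical 1x1 tracking image might load.

    The image itself is invisible; the query string smuggles
    identifying data to the third-party server that serves it.
    (All names here are illustrative assumptions.)
    """
    params = {
        "page": page,        # which page the user is currently reading
        "cid": cookie_id,    # tracking number from the third party's cookie
    }
    return f"https://{ad_server}/pixel.gif?{urlencode(params)}"

url = web_bug_url("ads.example.net", "/travel/cancun", "a1b2c3")
print(url)
```

Every time the browser renders the page, it fetches this tiny image, and the third party’s server logs the visit even though the user never knowingly contacted that third party.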
In short, various devices can be utilized to assist a website in monitoring or collecting data about a user.
III. The Players
Entities using data profiling practices can be categorized roughly into two groups: private corporations and the government. In turn, private companies that profile online use are typically either the website itself (e.g., CNN.com) or a third-party advertiser (e.g., DoubleClick). Access to data mines is of great use to the government in that it “gains powerful investigative tools allowing it to plot the movements, actions, and financial activities of suspects.” Froomkin, 52 Stan. L. Rev. 1461. Furthermore, the government gains the capability to document criminal activities for use in the prosecution of such suspects. One positive example of governmental use of data profiling is in the fight against credit card fraud in purchases over the Internet. Scoville, 17 Cardozo Arts & Ent. L.J. at 364. However, significant privacy concerns surround the question of when the government ought to be granted access to data mines, both those generated by governmental bodies and those created by private entities. We will focus more on governmental access to data in cyberspace when we turn to Modules IV and V.
Private corporations, such as individual websites and Internet Service Providers, collect data for their own benefit. Due to technological advances, mining of data has seen a significant reduction in cost, leading to an entire industry dedicated to selling consumer data, particularly to interested marketers. Because private corporations often contract with outside firms to handle their data profiling, companies such as Acxiom (http://www.acxiom.com) have come to “[hold] personal and financial information about almost every United States, United Kingdom, and Australian consumer.” Froomkin, 52 Stan. L. Rev. at 1474.
One of the key issues concerning cyberspace data profiling is the extent to which it is different from offline data profiling. For years, catalog companies and others have “mined” data contained in consumer responses to surveys, direct marketing, or purchases. Is online data profiling categorically different from the offline profiling that consumers have accepted for years? If so, those who argue that cyberspace profiling is unique need to explain why it is problematic in that context even if acceptable in the offline context.
IV. Current data profiling practices
A. Practices of private companies
Currently, data profiling practices allow companies to obtain useful data through both voluntary disclosure of information and involuntary extraction of information through clickstream data. Voluntary information is disclosed by consumers through registration pages, user surveys, online contests, application forms, and transaction documents. For instance, the use of credit cards in online purchasing allows collection of data about a person’s finances, buying habits, etc. Off-line, cash purchasing provides for a certain sense of anonymity. But even off-line, the establishment of loyalty and rewards programs allows for important data to be collected about consumers.
Involuntary extraction of clickstream data occurs when technology such as cookies, web bugs, and other means track a user’s e-mail address, the type of browser being used, the type of computer from which the site is being viewed, and the site from which the user arrived. Additional information that can be extracted includes the geographical location of a user and a user’s recent history of page views. These practices allow a company to gain significant information about specific consumers who have not indicated any desire to establish a relationship with the company, thus introducing crucial privacy concerns.
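The involuntary side of this extraction can be illustrated with a short sketch. On every page view, a web server passively receives facts about the visitor with no cooperation required; the function below is a toy illustration (the field names are assumptions, and real servers log considerably more):

```python
def profile_from_request(ip, headers):
    """Collect the facts a web server passively sees on every page view.

    No cooperation from the user is required: these values arrive
    with the HTTP request itself. (Illustrative sketch only.)
    """
    return {
        "ip_address": ip,                      # may be traceable to a user
        "browser": headers.get("User-Agent"),  # browser, version, and OS
        "came_from": headers.get("Referer"),   # the page the user arrived from
    }

hit = profile_from_request(
    "203.0.113.7",
    {"User-Agent": "Mozilla/4.0 (Windows 98)",
     "Referer": "https://search.example/?q=cheap+flights"},
)
print(hit["came_from"])
```

Note how even the `Referer` header alone can reveal what the user was searching for on a previous site.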
As one can see, therefore, some websites collect personally identifiable information, and other websites collect data about a user which may or may not be personally identifiable. Those that collect personally identifiable information can obtain this information in one of several ways:
(1) The user signs up or otherwise identifies herself to the website (say, through a purchase or by signing up for a sweepstakes);
(2) The user hasn’t identified herself to Website X, but has identified herself to Website Y, and Website Y maintains a data-sharing relationship with the first Website;
(3) The user has downloaded software that automatically “reports” back to a website information about the user’s online or offline clickstream behavior;
(4) The user has a unique IP address [the numerical address to which information is sent on your computer] that can be traced to the particular user; or
(5) The user doesn’t identify himself to Website Z for several years, but Website Z has dropped a cookie [for definitions, see Module I] onto the user’s computer. Years later, the user purchases something from Website Z. Website Z can then backlink to the user’s prior surfing behavior.
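The “backlinking” in the last scenario can be sketched in a few lines. In this toy illustration (the names and structure are assumptions, not any real site’s implementation), years of anonymous browsing accumulate under a cookie ID, and a single purchase attaches a name to all of it:

```python
# Toy store of clickstream history, keyed by anonymous cookie ID.
history_by_cookie = {}

def record_visit(cookie_id, page):
    """Log a page view under the visitor's cookie -- no name needed."""
    history_by_cookie.setdefault(cookie_id, []).append(page)

def link_identity(cookie_id, name):
    """At purchase time, the site learns a name and can 'backlink'
    the entire anonymous history to the newly identified user."""
    return {"name": name, "history": history_by_cookie.get(cookie_id, [])}

record_visit("a1b2c3", "/fares/chicago-cancun")
record_visit("a1b2c3", "/deals/winter-break")
profile = link_identity("a1b2c3", "Jane Doe")  # identity learned only at purchase
```

The point of the sketch is that the user never consented to having the earlier, ostensibly anonymous visits attached to her name; the cookie makes the linkage automatic.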
Take for example the fictional website travel.com, which hypothetically sells discount airline tickets. Before a prospective buyer is even allowed to browse the offerings on the site, the consumer is required to register. This voluntary disclosure of information might include such seemingly benign pieces of data as name, email address, and possibly the willingness of the consumer to accept emails from the company. After registration, the user may run a few searches looking for discounted airfare to specific cities on specific dates. The user may eventually decide to purchase an airline ticket from Chicago to Cancun, Mexico during the month of January. The consumer may be asked to enter her age before being allowed to purchase the ticket and then may purchase the ticket using a credit card.
On the back-end of these consumer transactions, the website acquires and stores a significant amount of information about the consumer. After registration, the site is likely to drop a cookie to be stored on the user’s hard drive. Information is then gleaned from a user’s clickstream and the eventual transaction.
When similar information is collected from a number of consumers, the company can “mine” its databases, looking for consumers in a certain age range, for example 17-23, who either purchased tickets to fly to warm climates during typical “winter break” periods or completed searches for fares that met the criteria. If these consumers re-visit the site, then perhaps banner ads for certain products, such as beachwear, will be shown to these repeat visitors, who are recognized by looking for the cookie dropped on their hard drives during the first visit. For consumers who asked to be sent emails, perhaps emails will be sent with certain specials that the particular type of consumer may be likely to find interesting.
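The mining step just described amounts to a filter over stored records. A minimal sketch (the record fields are invented for illustration; real systems query far larger databases):

```python
# Toy consumer records of the kind travel.com might warehouse.
consumers = [
    {"name": "A", "age": 19, "searched_warm_winter_fare": True,  "wants_email": True},
    {"name": "B", "age": 45, "searched_warm_winter_fare": True,  "wants_email": False},
    {"name": "C", "age": 21, "searched_warm_winter_fare": False, "wants_email": True},
]

# "Mine" the database for the 17-23 winter-break segment described above.
segment = [c for c in consumers
           if 17 <= c["age"] <= 23 and c["searched_warm_winter_fare"]]

# Email specials only to segment members who opted in to email.
email_targets = [c["name"] for c in segment if c["wants_email"]]
```

Here only consumer A falls in the targeted segment, so only A receives the beachwear promotion.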
Therefore, through both voluntary disclosure of information and information “warehoused,” or stored, by tracking the user’s clickstream and transactions, travel.com’s data profiling capabilities can provide advertising and marketing that is more highly targeted than anything available before the Internet. This example is quite typical of data profiling practices used every day on the Internet to better target advertising and market individually to consumers.
A major concern of internet users is what happens to their transactional data (or websurfing data) once they have left a website. Many websites are vague on this point (perhaps with reason), and the fear of many users is that their data will be sold to marketeers to be combined with many other data sources. Marketeers certainly have an interest in developing comprehensive user profiles about internet surfers. Such profiles might include detailed information about their interests, tastes, preferences, purchases, work and employment history, salary, and so on. Because such data has no inherent expiration, there’s no reason to think it might not be retained for decades.
B. Governmental use of data profiling
“According to a report prepared for the European Parliament, the United States and its allies maintain a massive worldwide spying apparatus capable of capturing all forms of electronic communications. Known as ‘Echelon,’ the network can ‘access, intercept and process every important modern form of communications, with few exceptions’.” Froomkin, 52 Stan. L. Rev. 1461. The report does not, however, give any insight into current uses of such information.
V. How “mined” data will be used in the future
With respect to the potential for monitoring one’s activity on the Internet, targeted advertising is only the beginning. Data profiling is only in its infancy, and given the rapidly accelerating growth of developments in the field, data mining could soon “transform modern life in all industrialized countries.” Froomkin, 52 Stan. L. Rev. 1461. For instance, one controversial technology yet to be fully exploited is the unique numerical identifier embedded in each Intel Pentium III chip. If exploited, unique identifiers such as the one on that processor, and others found in certain Microsoft software, would allow a computer to be tracked anywhere in the world, regardless of changes in application usage.
VI. Pros and Cons of Data Profiling for Consumers
It has often been suggested that practices such as data profiling can infringe on users’ privacy. Recently, news organizations have reported on employers’ use of data profiles, extracted through use of an internal network, in monitoring, or “spying” on, their employees. Data profiling has also gotten a negative rap because it can contribute to spam, or unwanted email usually sent to bulk lists of people.
Certainly, the anonymity that a user may have taken for granted in surfing the Internet five years ago should no longer be assumed. On the contrary, for some, just knowing that their activities are being recorded may have a chilling effect on their use of the Internet. One 1999 Forrester report found that “[t]o this day, Web users -- regardless of age, gender, or income -- worry that the information they share online will produce unsolicited spam or telemarketing calls. Even worse, they worry that information they share could come back to haunt them, ultimately harming their relationships, employment, or insurance eligibility.”
Data profiling may also raise consumer privacy concerns to the extent that consumers lose consent privileges over their personal information. Internet users ordinarily cannot control who has access to data mines, or to whom information about them is sold. Additionally, there is no assurance that information collected about consumers is accurate or even kept up to date.
VII. How consumers can protect themselves
Consumers can take several concrete steps to protect themselves. Most browsers can now be configured not to accept cookies, which helps to guard privacy; however, some websites will not work if the browser has been set to reject cookies. Alternatively, consumers can take advantage of encryption software, which may help to some extent, and researching the reputation of sites and programs is always a good idea. Regardless, the basic assumption should always be that the Internet is not a private or anonymous environment, and, unless convinced otherwise by reliable sources, consumers should act with this in mind.
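As one illustration of cookie-rejecting configuration, Python’s standard library exposes the same idea programmatically. The sketch below builds an HTTP client whose cookie policy has an empty allow-list, so every cookie a site tries to set is discarded:

```python
import urllib.request
from http.cookiejar import CookieJar, DefaultCookiePolicy

# A policy that rejects every cookie: an empty allowed-domains
# sequence means no domain may set cookies on this client.
policy = DefaultCookiePolicy(allowed_domains=[])
jar = CookieJar(policy=policy)

opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar))
# Any Set-Cookie headers received through `opener` are now discarded,
# so sites cannot drop tracking numbers onto this client.
```

The trade-off noted above applies here too: a client configured this way will break sites that require cookies to function.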
VIII. Regulation and Legislation
Early attempts at enforcing the privacy of Internet users were framed in terms of the tort of invasion of privacy. When it became clear that the traditional tort, for a number of reasons, would not suffice to protect the privacy of consumers in terms of information that was gathered about them without their knowledge or consent, piecemeal attempts at legislation were born. Today, a handful of federal statutes cover such specific areas as video rental records, student loan information, drivers’ license information, and credit reporting (in the Fair Credit Reporting Act, 15 U.S.C. §§ 1681-1681s (1999)).
While the Clinton administration succeeded in defining Fair Information Handling Practices in terms of nine privacy principles, and in passing the Children’s Online Privacy Protection Act, intense debates linger about how privacy related to data profiling should be protected in the future.
A. Self-Regulation

There have been and continue to be other attempts at self-regulation to protect data, in particular. For instance, many transactional websites purchase secure “keys” to ensure that information passed to and from their website is encrypted and protected from interception by others. “Firewalls” are also used to protect data.
However, the problem with data profiling is not necessarily that of data getting into the wrong hands, although that is one problem that these measures do help to alleviate. Instead, the problem is the perfectly legal practice of obtaining information about consumers and using it in ways to which they do not consent. To this end, under the current default rules, self-regulation will likely help protect consumers’ privacy only if it is worth it to consumers to pay more for privacy than it costs businesses to respect it. With no data, this would be difficult to assess, but as long as eCommerce continues to grow and consumers do not stop making purchases due to privacy concerns, it would seem that the only incentive for the industry to self-regulate would be to stop the government from stepping in. Interestingly, the same 1999 Forrester Report mentioned above found that half of consumers were ready to call on the government to regulate online privacy.
B. Market Approach
Some would argue that regulating cyberspace privacy is not necessary: if consumers really cared about privacy, they would force the industry to provide it. Alternatively, if their personal data were treated as their property, consumers could promote the development of markets in personal data.
The difficulty with a market approach lies in the fact that it assumes the answer to the question being posed. We are asking whether cyberspace privacy should be protected under one or another legal regime, but the market approach assumes that personal information is freely collectible. That is, it assumes that the current default rule regarding personal information (i.e., that it is freely collectible and alienable by a website to others) is the correct result.
Naturally, if the current default rule is to your advantage you might seek a “market” approach. But there is nothing inherent or “natural” in such an approach any more than it is “natural” that those in possession of your healthcare records are forbidden from releasing them generally to the public. We have rules preventing the release of your medical records (e.g., tort rules holding providers liable for their release; statutes requiring careful treatment of medical records), and these collection and disclosure rules developed over time and with due consideration for the privacy and efficiency interests at stake. There is therefore no reason why one should not review from the ground up reasons for and against calling clickstream data “private” and protectable.
Finally, though proponents of a market approach suggest that consumers can renegotiate the terms of their cyberspace dealings, in actuality users are disorganized and individually have exactly no power to affect the collection of data about themselves. Under the “market approach,” a great deal of personal data has already been commodified and is already part of the public domain, subject to purchase, sale, and barter by the collectors of that data. Particularly with information that is extracted involuntarily through a user’s clickstream, a user has little control over who has access to it. It seems bizarre to suggest that cyberspace users have the power to affect (or should organize in this way to affect) the sale and distribution of data already collected.
A “market” approach, therefore, inevitably defaults to whatever data collection and distribution practice the collector believes will maximize the collector’s revenues. Under the market approach favored by the data collectors, the argument over “privacy” is over before it begins. The collector, in short, retains the power “to control the acquisition, disclosure, and use of [your] personal information.” Jerry Kang, Information Privacy in Cyberspace Transactions, 50 Stan. L. Rev. 1193, 1203 (1998).
C. Mandatory State Regulation
Mandatory regulation might be advocated because consumers fear that profit-maximizing marketeers will control their personal information. Mandatory regulation could take any number of forms. The strongest form, advocated by Professor Jerry Kang, would outright ban retention of cyberspace transactional information after the transaction is complete without an individual’s consent. Jerry Kang, Information Privacy in Cyberspace Transactions, 50 Stan. L. Rev. 1193, 1291-93 (1998) (section 5(b) of his proposed statute). While Kang’s proposed statute does not explicitly address online profiling of nontransactional behavior, the rationale for banning such profiling would seem to be even stronger than the case for banning retention of consented-to transactional data. Kang’s perspective, supported by many online privacy organizations and privacy experts, reflects a fundamental rights approach to online profiling.
1. Reforming “Notice”
Some privacy advocates see reviewing each website’s privacy policy as an unrealistic burden: surfers who are concerned about privacy are unlikely to review and analyze the policy of every website they visit. One possible solution to this dilemma would be to standardize, say, five different levels of online privacy protection. Standardized privacy policies, ranging from highly protective (“we collect and retain nothing about you beyond the particular transaction”) to least protective (“whatever you do on our site is monitored and added to our database and sold to third-party marketeers who amass as much information about you as possible”), would at a minimum assist consumers in making judgments. Consider, for example, how Gap.com opens its privacy policy:
Gap.com values its customers and respects their privacy. We collect customer information in an effort to improve your shopping experience and to communicate with you about our products, services, contests, and promotions. Gap.com recognizes that it must maintain and use customer information responsibly.
http://www.gap.com/asp/cs_security.asp#3 (last visited February 25, 2002)
Gap.com goes on to describe in a vague way that it collects information based on a customer’s online behavior:
We collect information (such as your name, email address, mailing address, and phone and credit card numbers) that you provide when you place an order, save your information with us or participate in a contest, promotion or survey. We maintain a record of your product interests and your purchases online and in our stores. We may acquire customer names, email addresses and mailing addresses for select mailings from third parties. http://www.gap.com/asp/cs_security.asp#3 (last visited February 25, 2002)
From a casual reading of the paragraph, one might conclude that whatever information Gap.com collects is connected to a particular purchase. However, the giveaway in the above paragraph is the tagline that “we maintain a record of your product interests.” This language might have been designed by Gap.com to encompass online profiling without purchases, and if so perhaps Gap.com assumes that it has satisfied its duty to notify customers that it is profiling their online behavior. Since the language is vague, the careful reader can only guess. However, in this context vagueness in a particular notice about privacy can only serve to obscure what a website is actually doing. If Gap.com is not collecting information about a user’s non-purchasing surfing behavior, its lawyers certainly know how to say that clearly.
In admirable contrast to Gap, Amazon.com provides those who read far enough a straightforward list of the types of information it “automatically” collects from those who visit its website:
Examples of the information we collect and analyze include the Internet protocol (IP) address used to connect your computer to the Internet; login; e-mail address; password; computer and connection information such as browser type and version, operating system, and platform; purchase history; the full Uniform Resource Locators (URL) clickstream to, through, and from our Web site, including date and time; cookie number; products you viewed or searched for; zShops you visited; your Auction history, and phone number used to call our 800 number.
http://www.amazon.com/exec/obidos/tg/browse/-/468496/102-1997690-1291327#auto (last visited February 25, 2002)
In contrast to Gap.com, which, so far as the reader can tell, may also collect such data, Amazon at least particularizes for readers what sorts of information are being gathered, recorded, and stored. To Amazon’s credit, it candidly tells the user that every mouseclick generated while on its site is being recorded and stored for future purposes.
In this context, does the fact that a website is collecting data about users with every mouseclick require more prominent notice than that typically contained in privacy policies buried in a website? For example, if one truly wanted to provide notice, one could place a blinking icon at the bottom of every page that explicitly informed users that data was being collected:

We are collecting your mouseclicks!
In sum, even the consumerist reformers of online profiling must address how much notice must be given, when it must be given, at what level of particularity it must be given, and in how much detail the site must describe who will have access to the data.
2. Reforming “Consent”
Many privacy advocates continue to be frustrated that consumers realistically have no choice when confronting a website that collects personal data. They argue that more explicit consent should be sought. Should an explicit approach to notification of users be required? Like the blinking icon (above), a website might be required to obtain explicit consent from a user before it retains and stores any personal information.
A strong consumerist model might require websites to obtain your explicit consent before collecting data. An ironclad consumerist model would require your explicit consent to collect your data, with no consequences if you refuse the request. At present, most websites that collect data for their own purposes assert that they use it to benefit their customers. Should websites that collect information that they will sell to other businesses be required to ask consumers’ specific permission on every occasion before doing so? For example:
We would like to collect and sell to other businesses data about your use of our site.
May we? Yes/No
3. Changing Privacy Policies
The debate over data profiling hinges on whether one sees the problem as one of fundamental rights in privacy, or merely a matter of adjusting the terms of a consumer transaction. Alternatively, a “market” approach would let the default lie wherever those in control of the technologies determined their economic interests to lie.
As for protecting cyberspace privacy of personal data, the fundamental rights approach would suggest restricting in important ways the capacity of websites to retain even transactional data. In contrast, the consumerist model rests on an imagined balance between the interests in privacy and anonymity on the one hand, and the commercial interest in commodifying the personal information of users on the other. The power for the moment rests with the commercial interests: commerce has driven the technology of collection and use, and it is unlikely that commercial interests enjoying significant economic benefits in extracting and mining online users’ data will willingly forego those revenues. Consumers might like to choose how much access marketeers have to their personal information, but in the current environment, that choice has been defaulted to the companies in control of the technology.
 This Introduction is based on a research paper prepared by Wendy Netter (HLS ’02).
 According to the Privacy Foundation, web bugs (or web beacons), tiny identifiers hidden in a webpage, are used for many reasons, and can discover data about a user:
Companies use Web bugs to:
--Count the number of times a particular Web page has been viewed.
--Track the Web pages a visitor views within a Web site.
--Track what Web pages an individual visits across many different Web sites.
--Count the number of times a banner ad has appeared.
--Measure the effectiveness of a banner ad campaign by matching visits to a Web site to where banner ads for the site were originally viewed.
--Match a purchase to a banner ad that a person viewed before making the purchase. The Web site that displayed the banner ad is typically given a percentage of the sale.
--Allow a third party to provide server logging to a Web site that cannot perform this function.
--Record and report the type and configuration of the Internet browser used by a visitor to a Web site. This information is typically used in aggregate form to determine what kind of content can be put on a Web site to be viewed by most visitors.
--Record and report search strings from a search engine to an Internet marketing company. The search strings are typically used to profile users.
--Transfer previously input demographic data (gender, age, zip code, etc.) about visitors of a Web site to an Internet marketing company. This information is typically used for online profiling purposes.
--Transfer previously input personally identifiable information (name, address, phone number, e-mail address, etc.) about visitors of a Web site to an Internet marketing company. This information is typically used for online profiling purposes. It also can be combined with other offline demographic data such as household income, number of family members, type(s) of car(s) owned, mortgage.
--Cookie sync, which allows two companies to exchange data in the background about Web site visitors. This data can be demographics or personally identifiable data, typically used for online profiling purposes.