File Caching on the Internet: Technical Infringement or Safeguard for Efficient Network Operation?
Richard S. Vermut
4 J. Intell. Prop. L. 273 (1997)
II. A FIRST LOOK AT CACHING
III. THE INTERNET AND RELATED TOPICS
B. THE INTERNET
C. TCP/IP, THE PROTOCOL OF THE INTERNET
D. THE WORLD WIDE WEB
E. THE HYPERTEXT TRANSFER PROTOCOL
F. THE FILE TRANSFER PROTOCOL
G. THE IMPACT OF THE INTERNET ON INTELLECTUAL PROPERTY RIGHTS
IV. CACHING ON THE INTERNET
A. WHY CACHING IS IMPORTANT TO THE INTERNET AND NETWORK EFFICIENCY
B. CACHING INFORMATION ON THE INTERNET
C. DECIDING WHAT INFORMATION TO CACHE
D. DECIDING HOW LONG INFORMATION REMAINS IN THE CACHE
V. THE TECHNIQUE OF MIRRORING
VI. COPYRIGHTABLE SUBJECT MATTER OF CACHED MATERIALS--THE FIXATION REQUIREMENT
VII. THE EXCLUSIVE RIGHTS INFRINGED BY CACHING
A. THE RIGHT OF REPRODUCTION
B. THE RIGHT OF DISTRIBUTION
C. THE RIGHT OF PUBLIC DISPLAY
D. THE RIGHT OF PUBLIC PERFORMANCE
VIII. THE CASE OF COPYRIGHT INFRINGEMENT
A. A PRIMA FACIE CASE OF INFRINGEMENT
IX. AFFIRMATIVE DEFENSES
A. FAIR USE
B. SECTION 117
C. IMPLIED LICENSE
Quote overheard during the German occupation of Czechoslovakia after its annexation in 1938:
"Excuse me sir, but could you cache a Czech?"
The future of technology and communications is in cyberspace. The internet is the fastest growing form of computer telecommunications and will be tomorrow's marketplace. In 1981 fewer than 300 computers were linked to the internet. Today there are over 9,400,000 host computers. A recent study concluded that there are 21.3 million users worldwide and 6 million computers attached to the internet. Approximately sixty percent of these are located within the United States. The internet reaches 100 countries, and electronic mail is accessible from 154. The internet has doubled in size every year from 1988 through 1994. It is expected that by the year 1999, there will be 200 million internet users. As with most developing technologies and new legal frontiers, cyberspace brings with it novel and contemporary issues of law to be settled by outmoded legislative enactments and the judiciary.
The internet is headed for a transmission overload dilemma as more users convey more information over the same existing network and telephone lines. As it expands in size and number of connected computers, network engineers seek fresh solutions to ease overall internet congestion. One response that has been implemented for several years is file caching. Information originating from one point on the internet is temporarily stored at an intermediate location that it happens to pass through on its way to its intended destination. Consequently, future requests for that same information may be served from this intermediary, reducing travel time significantly. Several other caching schemes have also been used.
In order for a cache to operate successfully, it must make identical copies of all cacheable information that travels across its path. Much of the information transmitted across the internet receives copyright protection because it consists of files containing graphics, sounds, and text. When protected works are copied and stored by computer caches, the rights of authors and the public interest of maintaining a workable internet collide. This Article addresses the legal consequences of caching as copyright infringement.
The first part of this Article is an in-depth focus on file caching as it occurs on the internet. It also presents a basic understanding of the internet, its internal network operations, and protocols. This discussion is followed by a detailed description of how file caching works and its many flavors. Included in this discussion are file caching's benefits and significance to the internet's successful performance. The first half of this Article should be useful to attorneys and others who desire a precise understanding of the internet and the world wide web beyond the fundamentals.
The second part of this Article applies current copyright law to file caching on the internet. As a preliminary matter, cached information is analyzed to determine whether it meets the statutory requirements of copyrightable subject matter. Discussed next are the exclusive rights of copyright holders intruded upon by file caching. The exclusive rights presented are the rights of reproduction, distribution, public display, and public performance. Following the exclusive rights examination, a brief section speculates upon the easy case of infringement and damages that an author can claim against a file cache operator. The last part presents three affirmative defenses that may be raised by file cache operators. These include fair use, 17 U.S.C. § 117's computer program exception, and the theory of implied licensing. The conclusion reasons that, notwithstanding the technical infringements incidental to the operation of a successful cache, equitable considerations and public policy should immunize file caching from copyright liability.
II. A FIRST LOOK AT CACHING
Caching is a common technique used to reduce the time it takes for a computer to retrieve information. The term cache is derived from the French word cacher, meaning "to hide." Ideally, recently accessed information is stored in a cache so that a subsequent repeat access to that same information can be handled locally without additional access time or burdens on network traffic. When a request for information is made, the system's caching software takes the request, looks in the cache to see if it is available and, if so, retrieves it directly from the cache. If it is not present in the cache, the file is retrieved directly from its source, returned to the user, and a copy is placed in cache storage. Caching has been applied to the retrieval of data from numerous secondary devices such as hard and floppy disks, computer RAM, and network servers.
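The lookup procedure just described can be illustrated with a short sketch in Python. It is offered only as a minimal illustration under stated assumptions, not as the design of any particular caching product: the class name ReadThroughCache, the function slow_source, and the file name report.txt are all hypothetical.

```python
from typing import Callable, Dict, List


class ReadThroughCache:
    """Toy cache: answer locally when possible, otherwise fetch and keep a copy."""

    def __init__(self, fetch_from_source: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_source      # hypothetical slow retrieval function
        self._store: Dict[str, bytes] = {}   # the cache storage itself

    def get(self, name: str) -> bytes:
        if name in self._store:              # present in the cache: return it directly
            return self._store[name]
        data = self._fetch(name)             # absent: retrieve from the original source
        self._store[name] = data             # place a copy in cache storage
        return data


source_requests: List[str] = []              # records every trip to the source


def slow_source(name: str) -> bytes:
    """Stand-in for a disk or a remote server (an assumption for this sketch)."""
    source_requests.append(name)
    return ("contents of " + name).encode()


cache = ReadThroughCache(slow_source)
first = cache.get("report.txt")    # retrieved from the source, then copied
second = cache.get("report.txt")   # answered from the cache
```

After both calls, source_requests holds a single entry, because only the first request had to travel to the source.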
A simple example of caching is a commonly known disk caching scheme used by MS-DOS. When information is retrieved from a file located on a floppy disk, the computer performs the following steps: it must engage the floppy drive's motor to begin rotating the floppy disk, wait until the drive is up to speed at roughly 300 rotations per minute, read the file information from the floppy, move the head to the corresponding track and sector, and then, finally, read the file. Relative to other computer operations, this process is rather lengthy. However, following the belief that information accessed once is likely to be accessed again, the retrieved floppy file is copied into a cache in the computer's RAM and then given to the user. With a copy of the file in the cache, a second request for this file will be returned from RAM and appear to be instantaneous.
Caching is used on computer networks and the internet to reduce network traffic. When multiple networked computers seek access to the same file, the workload of servers and the number of requests they receive can be burdensome. When too many requests to one particular computer on the network slow down overall network performance, the result is network congestion. A strategically placed cache on a network can cut the traffic to one location in half, improving the overall network performance.
There is a distinct vernacular which accompanies caching. When requested information can be returned directly from the cache, there is what is known as a "hit." When the information is not in the cache and must be retrieved, there is a "miss." A high hit-miss cache ratio is desirable because the cache is frequently being hit and computer time is preserved. When there are more misses than hits, the cache is not successful and is underutilized. When the information in the cache is too outdated to be used, there is a cache consistency problem. The file in the cache is called "stale" or "dirty," meaning the cached file is no longer useful because it has been updated at its original source location since it was cached.
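These terms can be made concrete with a small Python sketch. It is an illustration only. The class CountingCache, the use of a version number to detect an updated source file, and the choice to report hits as a fraction of all requests are assumptions made for this example and are not drawn from any particular caching system.

```python
class CountingCache:
    """Toy cache that counts hits and misses and can detect a stale entry."""

    def __init__(self):
        self.entries = {}   # name -> (version, data) as copied from the source
        self.hits = 0
        self.misses = 0

    def get(self, name, source):
        if name in self.entries:            # "hit": answered from the cache
            self.hits += 1
            return self.entries[name][1]
        self.misses += 1                    # "miss": must go to the source
        version, data = source[name]
        self.entries[name] = (version, data)
        return data

    def hit_ratio(self):
        """Hits as a fraction of all requests (0.0 if nothing was requested)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def is_stale(self, name, source):
        """True when the source was updated after the cached copy was made."""
        return name in self.entries and self.entries[name][0] != source[name][0]


# Hypothetical source location holding one file at version 1.
source = {"page.html": (1, "old text")}
cache = CountingCache()
for _ in range(3):
    cache.get("page.html", source)          # one miss followed by two hits

source["page.html"] = (2, "new text")       # the original is updated
served = cache.get("page.html", source)     # the cache still returns its old copy
stale = cache.is_stale("page.html", source)
```

The final request is counted as a hit even though the copy returned is out of date, which is precisely the cache consistency problem.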
Caching issues are not only important to software engineers and programmers, but to users as well. It is apparent that caching can frequently save users from spending more time waiting for information than actually computing. Programs that implement caches have the ability to determine what information should be cached, how many files, and for how long. Essentially, a cached file is an additional copy of the file located on the disk drive. Caching can often occur without the file owner's knowledge or consent. On a network, this may result in widespread distribution of files that were never intended to be made so publicly available. Successful compromises on these kinds of unintended distribution issues may be achieved depending upon the type of caching implemented and cooperation between users and cache operators.
III. THE INTERNET AND RELATED TOPICS
While caching is used in many areas of computing, the focus of this Article is on its application to computer networking. The success of file caching on the internet depends upon the various responsibilities and operations of networked computers. The location of the caches, the internet protocols used for communication between network locations, and the types of information transferred are all integral components of how successful caches operate. This section discusses some of the simpler terms and provides a basic understanding of computer networks, interconnected networks, and what has become known as the internet.
When the word "computer" is used, one usually thinks of a personal computer such as a laptop or an Apple or Windows-compatible machine. As more users discover the need to share information such as customer information in databases and word processing files, it has become necessary for computers to be connected together. Through these connections, common information can be shared between several computers simultaneously. This arrangement, commonly used in business office settings, is known as a network. A computer that is connected to the network can access shared files and share resources such as printers. Also, messages may be sent from one computer to another connected to the network.
A collection of computers connected in this fashion that are relatively close in geographic proximity is known as a local area network (LAN). This close location is typically an entire office floor. A wide area network (WAN) is a network using these types of connections between computers that are in different office buildings or parts of the country. Many companies have several office buildings with computers on several floors of each building. All these LANs are often connected together to form one much larger network. This way, any employee in the company wherever located may communicate with any other connected employee. As applied in this manner, the term "network" may also be used to describe a collection of networks that are connected together.
The word "internet" is a relative term that is best described as a collection of interconnected networks. While one may think of the internet as the illustrious "information superhighway" or "cyberspace," it may simply be referring to a network of LANs and/or WANs in an entirely different setting. On the other hand, what has become known world wide as the "internet" is a global connection of LANs, WANs, and other computing devices, privately and publicly owned. For the remainder of this Article, the term "internet" will be used to refer to this global, more contemporary meaning. Presented in the next section is a brief history of the origins of the internet to give a better understanding of the effects and implications of these global connections.
B. THE INTERNET
The internet owes its origins to the United States government. In the early 1970's the Department of Defense's (DOD) Defense Advanced Research Projects Agency (DARPA) developed a plan to connect computers and computer networks between military contractors and universities conducting defense research. The resulting network was originally called ARPANET. One of the novel concepts of this network of networks was its decentralized design. A "site" is a computer or network connected to the ARPANET. Rather than every site having a single connection to another site, it was decided that each site's connections would be to a multitude of other sites. All the sites were networked in this topology. A message sent from one site had more than one path it could take to another site. Despite this complexity, message routing was handled automatically by the network and was transparent to the users.
This multiplicity was essential at the time the internet was created. The eventual goal, aside from inter-network communication, was the certainty that no single site was so indispensable that the ARPANET depended upon it. For the government, this meant that a nuclear attack eliminating one site would not shut down the network because other computers would automatically use other routes to send their messages.
As this concept of overlapping routes progressed, more information was becoming available to the ARPANET participants. As more universities and even corporations wanted connections to the ARPANET, it soon became known as DARPA's internet. Today it is just called the internet. One of the most important concepts created by this topology is that no single computer or individual network has any greater responsibility for the success or functional existence of the internet. Whether any device is connected at any time has no effect upon the overall structure or operation of the internet. The information and hardware of the internet reside at no one particular site; rather, they are distributed. Consequently, no one entity or individual owns the internet.
C. TCP/IP, THE PROTOCOL OF THE INTERNET
One contribution to the success and wide acceptance of the internet is the ability of every computer to communicate with every other computer regardless of its vendor, size, or operating system. This has been accomplished by the implementation of recognized standards of communication. On the internet, the standard is the Transmission Control Protocol/Internet Protocol (TCP/IP). TCP/IP's acceptance is a result of its development and funding by the United States government and its implementation within UNIX. The DOD needed a network communications standard so all computers connected to the ARPANET could easily communicate with one another. In 1983 the University of California at Berkeley released its version of UNIX known as UNIX 4.2BSD. Embedded within its version of UNIX were the TCP/IP protocols. The use of UNIX as an operating system was widespread, and the availability of the built-in TCP/IP helped achieve its acceptance.
TCP/IP is not a computer language. It is a standard for providing "all the facilities for two computer systems to exchange information, interpret it properly, and present it in a format which can be understood by the local machine and its users." This format is divided between four layers of the TCP/IP protocol: the data link, network, transport, and application layers.
When information is sent across the internet, it is first categorized by the application for which it is being used. Each application has a unique protocol for completing its task. When file transfers are being performed, TCP/IP's file transfer protocol application (FTP) is used. When logging on to a remote computer, TCP/IP's Telnet application is used. For electronic mail (e-mail), the application is the simple mail transfer protocol (SMTP).
To send a mail message, the first determination made is the type of information being sent. TCP/IP must first take the message and place it in a format that is standardized for the sending and receiving of e-mail. By analogy, postal conventions require a letter to be placed in an envelope, marked with a destination address and a return address, and affixed with the proper postage. TCP/IP handles this at the top layer, known as the application layer. This layer is responsible for providing all the information and formatting so that the message can be interpreted by the application layer at the receiving end. The format includes addressing the mail with a header providing information about the sender and receiver. Next, the e-mail message is converted into a format which can be sent over the network. This is handled by the transport layer. This layer takes the message and divides it into smaller chunks called packets. Messages are sent over a network in the form of several packets. The transport layer also orders the packets so that the receiving end will know how to reassemble them.
Once the packets are created, each one must be addressed to the destination. The network layer handles this responsibility. The packets are inserted with the necessary internet protocol address to reach their destination. The lowest layer, the data-link layer, handles all the hardware details and the transmission of the packets across the cabling. The data-link layer converts the packets into electrical impulses that travel along the physical cables.
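The splitting, ordering, and addressing steps described above may be easier to follow in a short Python sketch. It is a simplified illustration, not an implementation of TCP/IP: the Packet structure, the sixteen-byte packet size, the sample message, and the address 192.0.2.10 (drawn from a range reserved for documentation) are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    destination: str   # internet protocol address of the receiving computer
    sequence: int      # position of this chunk within the original message
    payload: bytes     # the chunk of the message carried by this packet


def split_into_packets(message: bytes, destination: str, size: int) -> List[Packet]:
    """Divide a message into numbered, addressed chunks of at most `size` bytes."""
    return [
        Packet(destination, sequence, message[start:start + size])
        for sequence, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Restore the message by putting the packets back in sequence order."""
    ordered = sorted(packets, key=lambda packet: packet.sequence)
    return b"".join(packet.payload for packet in ordered)


message = b"To: reader@example.org\r\n\r\nCaching reduces network traffic."
packets = split_into_packets(message, "192.0.2.10", size=16)

# Packets may arrive out of order; shuffle them to imitate that possibility.
random.Random(0).shuffle(packets)
restored = reassemble(packets)
```

Because each packet carries its own sequence number, the receiving end can rebuild the message regardless of the order in which the packets arrive.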
D. THE WORLD WIDE WEB
The world wide web (WWW or web) is one application of the internet and TCP/IP. Users at one end of a network connection are able to retrieve specially formatted documents from another connection, also called a site, and view them on their screen. The user browses the web; the computer with the document on-line is known as a web site. The WWW consists of two user applications that work together to communicate over the internet. The term "world wide web" originates from a project at CERN, the European Laboratory for Particle Physics in Geneva, Switzerland. Most computers and networks on the internet are connected to more than one other site. Visually, the many overlapping and crossing paths have the appearance of a spider web. The original purpose was to place all the data gathered by the CERN project on-line. The project involved physicists researching the operation and construction of large particle accelerators. An organized system was necessary since there were over one thousand people sharing enormous amounts of information in at least nineteen countries.
As previously discussed, nothing on the internet is centralized. In internet-like fashion, the WWW functions in the same decentralized manner. If a computer is connected to the internet, anything stored on that computer may be made available to any other computer connected to the internet. Anyone with a computer connected to the internet can place information on the web and tell other people using web software that the information is available on-line. In this manner, information placed on the internet is accessible worldwide.
1. The Client-Server Model. Client and server computing arrangements are important because of the legal consequences which can attach to their responsibilities. The client-server model is a means for distinguishing the responsibilities of two connected computers. The client asks, and the server provides. Typically, individuals interact with client computers. Clients depend upon servers to provide them with resources their computers do not contain. Servers store and process information for clients. On a typical LAN, the server computer provides resources such as hard disk space or printers for use by all client computers on the network.
In contrast, stand-alone equipment such as a single computer provides its user with many necessary resources. When connected to a network, those resources are distributed between at least two computers--the user's computer and the network computer. The user's computer still performs necessary tasks, but it relies upon the network computer for additional resources such as printing or file storage. In this arrangement, the server computer serves the needs of the client when requested to do so. This model provides for the sharing of resources by clients since most resources are not in continuous demand. The benefit of the client-server model is that many clients can share the same device, such as a printer, because a demand bottleneck is unlikely.
From a more practical perspective, a server can serve clients and simultaneously be a client of another server. Any client may be a server, and any server may be a client. The terminology must be used carefully, because whether a computer is classified as a client or a server depends upon the operation being described. This may be illustrated by the process of retrieving a web document. When a user on the web wants to browse a document located at a web site, the browser is the client, and the web site is the server. The client is requesting that the server provide it with a resource that it could not provide alone--the contents of a document. The web site services this request by sending a copy of the document over the internet to the requesting client. Likewise, the client computer that just received the document may be a web site as well. While at first it was a client asking for a document, it is now a server fulfilling other clients' requests for documents it has available on-line. If a request were to come from the same site that just previously fulfilled a request for a document, the roles would be reversed.
2. Hypertext and Hypermedia. Two distinct types of transferred information will be considered for purposes of our caching and infringement analysis. Communications and message transferring are accomplished in numerous ways on the internet. The focus here will be on hypertext documents and FTP files because of their contents. Hypertext documents are important to recognize because they are capable of containing not only text messages, but also sounds, graphics, motion pictures, and live broadcasts. FTP files are located at FTP sites on the internet and are used for the sole purpose of transferring copies of computer files to requesting clients. These files contain information such as data, computer programs, or even entire software packages.
Users seeking information on the world wide web are actually requesting and receiving single files of information. When a user views one of these web documents with a browser, pages of text and links are displayed on the screen. A link is a pointer to information located either at another location within the same document or in a different file. Links are displayed on the user's screen in a unique format in order to make it apparent that they point to other information. When a link is selected by the user, either another portion of that same document or a different one will be displayed. These individual files are called hypertext documents. The term "hypertext" means "text with links." Hypertext documents are text documents containing links to other information. This is beneficial because the information does not have to be sequentially listed. Hypertext documents may contain links that reference different parts of the same document or other documents entirely. This fundamental concept of document organization has been around since 1965. Hypertext documents coupled with information in the form of images, sounds, and moving pictures are called hypermedia, or multimedia documents.
Standards have been set forth for the organization and encoding of hypertext and hypermedia documents. The Hypertext Mark-up Language (HTML) defines the formatting of documents that are accessed on the world wide web. An HTML file includes all of the text to be displayed on the viewer's screen and special codes that determine what should be displayed as a link and where the linked information may be found. The locations pointed to by links are called anchors. An anchor is information within a document to which a link may be attached. Anchors may consist of words, sentences, or paragraphs. In addition, HTML also provides other formatting codes for routine tasks such as centering text, italicizing, bolding, and ordering of headers and paragraphs.
When HTML is used to create multimedia documents, another linking device is needed in addition to links and anchors. Web documents containing text and graphic images on the same page are common. If links and anchors are used with an image represented by a link, the page appears to be all text with a text link pointing to another file containing the image. The two do not appear simultaneously. When this is not the desired result, HTML provides a format for a further description of anchors. Anchors are categorized as regular and inline. Regular anchors are displayed on the screen as text, indicating to the user that they may be selected for viewing the anchor. Inline anchors are automatically loaded and displayed along with the rest of the page. When the HTML document is browsed by a user, the text is displayed along with all inline links giving the appearance of both graphics and text. This same principle of inline imaging is used for presenting sounds and motion pictures.
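The distinction between references that wait for the user and references that are loaded automatically can be illustrated with Python's standard html.parser module. The sketch below is an illustration under assumptions: it treats an HTML a element with an href attribute as the kind of link a user must select, and an img element with a src attribute as the kind of inline reference loaded with the page. The class name LinkCollector and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sort the references found in an HTML page into two groups."""

    def __init__(self):
        super().__init__()
        self.regular_links = []   # shown as selectable text; fetched only if chosen
        self.inline_files = []    # fetched automatically and shown with the page

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.regular_links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.inline_files.append(attributes["src"])


# A hypothetical page containing one selectable link and one inline image.
page = """<html><body>
<h1>Caching</h1>
<p>See the <a href="history.html">history page</a>.</p>
<img src="diagram.gif">
</body></html>"""

collector = LinkCollector()
collector.feed(page)
```

For this page, history.html would be retrieved only if the user selected the link, while diagram.gif would be requested as soon as the page was interpreted.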
E. THE HYPERTEXT TRANSFER PROTOCOL
Once an HTML document is written, it is ready to be placed on-line. It is important to appreciate how documents are placed on-line and the procedures followed by client computers in requesting and receiving them from servers. Each participant has unique responsibilities which determine what is being copied, at whose direction, and how. The following paragraphs present an explanation of these standardized computer operations of the hypertext transfer protocol.
1. The Web Server. Once an HTML document is written, it may be placed on-line. This is accomplished by placing a copy of the HTML document on a web server. A web server is a computer running appropriate software enabling it to receive client requests for web documents, retrieve them from its hard disk, and then transmit them back to the client. There are no unique requirements to run a web server. Practically any computer with a permanent connection to the internet and running the appropriate software is suitable. The number of web pages that a web server may provide is only limited by hard drive space. The web server is simply an information retriever, providing a copy of any HTML document that it has on its hard disk to anyone requesting it.
Web servers are distinguished from other computers connected to the internet by their software. For the server to function properly, web server software must be running continuously. This program is often known as an httpd, or hypertext transfer protocol daemon. The daemon waits for a user's request to arrive from over the internet. The request is then deciphered to determine who sent it and which file has been requested. The server must then determine if it has that file on its hard disk, locate it, and send it back over the network to the requesting client. If the server is unable to fulfill the request, for example, if the file could not be found, then the web server indicates this to the client with an error message.
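The daemon's decision process can be sketched in a few lines of Python. The sketch deliberately omits the network connection and keeps the documents in memory rather than on a hard disk, so it is a simplification and not a working web server. The function name handle_request and the sample documents are hypothetical; the numeric codes 200, 400, and 404 are standard HTTP status codes.

```python
from typing import Dict, Tuple

# Hypothetical collection of documents the server has available.
documents: Dict[str, str] = {
    "/index.html": "<html>Welcome</html>",
    "/caching.html": "<html>About caching</html>",
}


def handle_request(request_line: str, available: Dict[str, str]) -> Tuple[int, str]:
    """Decipher one request and return a status code with the reply body."""
    parts = request_line.split()
    if len(parts) != 3 or parts[0] != "GET":
        return 400, "Bad Request"        # the request could not be deciphered
    path = parts[1]                      # which file has been requested
    if path not in available:
        return 404, "Not Found"          # error message: no such file here
    return 200, available[path]          # success: send back a copy of the file


found = handle_request("GET /index.html HTTP/1.0", documents)
missing = handle_request("GET /missing.html HTTP/1.0", documents)
```

Note that on success the server returns a copy of the document; the original remains on the server for the next request.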
Web servers receive requests for web pages from thousands of people each day. With so many demands arriving so frequently, the server can quickly become overburdened. Several solutions have been implemented to combat delays to users waiting while other web page requests are being processed by the daemon. The simplest solution is to have the server computer simultaneously run multiple copies of the daemon. Another way, which provides better management of memory and processing time, is for a single daemon to run and create smaller versions of itself that handle each client request, and then terminate upon completion. In programming terms, this is known as the parent process (the daemon) spawning child processes for each request. Another efficient way to ease the load of servers is to have the request for the HTML file handled at an entirely different location. This is one of the greatest benefits accomplished by caching. Presented later is a discussion on how caching is able to achieve this goal.
2. The Browser. On the other side of this information exchange is the client computer running a web browser. A browser is a client program that allows its user to view hypertext and hypermedia documents and to use the links provided in those documents to move from one document to another. The process of moving through the links from one hyperdocument to another is called navigating. All that is needed for a user to access a web site is the document's location, known as an Internet Protocol address (IP address). The user provides the browser with the address by manually typing it in or selecting a link to an address from a web document that has already been retrieved and displayed. Once the address has been provided by the user to the browser, the browser sends a request to the web server at that location for a copy of the HTML file. When the file is received, the browser interprets the HTML and displays the results on the screen.
Standards have been provided for the addressing scheme of the world wide web. In order for any network to run successfully, messages from one computer need to be properly addressed to the destination computer. Every computer needs to have some unique identifier. One central addressing scheme is necessary to avoid confusion because the internet is a network of millions of computers. Every internet site that directly receives messages from another internet site is assigned an IP address. The address itself is a 32-bit number, consisting of four 8-bit numbers each separated by a period.
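The relationship between the dotted notation and the underlying 32-bit number can be shown with a brief Python sketch. The function names are hypothetical, and the address 192.0.2.1 is taken from a range reserved for documentation rather than from any real site.

```python
def dotted_quad_to_int(address: str) -> int:
    """Combine four 8-bit numbers, written with periods, into one 32-bit number."""
    parts = [int(part) for part in address.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError("expected four numbers between 0 and 255: " + address)
    value = 0
    for part in parts:
        value = (value << 8) | part      # shift left 8 bits, then append the next number
    return value


def int_to_dotted_quad(value: int) -> str:
    """Split a 32-bit number back into its four 8-bit numbers."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


as_number = dotted_quad_to_int("192.0.2.1")   # 3221225985
as_text = int_to_dotted_quad(as_number)       # "192.0.2.1"
```

The dotted form is therefore only a convenient way of writing a single number; each of the four positions can range from 0 through 255.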
Since computers operate in numerics more easily than humans, a second addressing scheme was created for the latter constituency. Sites on the internet can apply for and receive domain names. A domain name is a unique name, registered to the internet, that is used to identify a computer connected to the internet. Every domain name has a unique corresponding IP address. For example, where a site uses 22.214.171.124 as the IP address of one of its particular computers on the internet, that same site may also be known as www.whitehouse.gov. Domain names are a series of characters or words, separated by periods. The word at the right end of the full domain name is known as the top level domain name.
Either the IP address or the domain name can be used as the address placed into the browser by the user. A conversion system has been created so clients having only the domain name may easily arrange for its conversion into the corresponding IP address. The reverse translation is also possible. Translations are handled by the client, running a resolver application, and by domain name servers (DNS). The client's resolver sends a request to a domain name server on the internet. The domain name server takes the request and determines if it is able to resolve the domain name to the appropriate IP address. If the information is available, the domain name server returns the corresponding IP address. Otherwise, it either returns an error message or forwards the request to another DNS that may have the information. Like the internet itself, domain name services are distributed across the network. Each unique DNS is responsible for a different top level domain. If one DNS receives a request involving a domain name whose translation table is assigned to another server, the request is forwarded.
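The answer-or-forward behavior described above can be modeled with a short Python sketch. It is a toy model and does not reproduce the actual domain name system protocol: the class NameServer, its translation tables, the example.org names, and the 192.0.2.x addresses (a range reserved for documentation) are all assumptions made for illustration.

```python
class NameServer:
    """Toy name server: answer from its own table or forward the request."""

    def __init__(self, table, forward_to=None):
        self.table = table             # domain name -> IP address
        self.forward_to = forward_to   # another server that may have the answer

    def resolve(self, name):
        if name in self.table:                     # information is available here
            return self.table[name]
        if self.forward_to is not None:            # otherwise forward the request
            return self.forward_to.resolve(name)
        raise LookupError("cannot resolve " + name)   # or report an error


# Two hypothetical servers, each responsible for different names.
remote_server = NameServer({"www.example.org": "192.0.2.20"})
local_server = NameServer({"mail.example.org": "192.0.2.30"}, forward_to=remote_server)

direct = local_server.resolve("mail.example.org")     # answered locally
forwarded = local_server.resolve("www.example.org")   # answered after forwarding
```

The client's resolver needs to know only the first server; any forwarding happens without the client's involvement.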
DNS servers are often duplicated, with several of the same servers running concurrently, because of their importance. These duplicates are called secondary servers. DNS caching servers have also been created since most internet communications begin with a domain name and not an IP address. As will be discussed later, many web sites are duplicated and available on several different computers located at various IP addresses. Users who only have a single domain name do not know about the other sites. However, the DNS may have a list of numerous IP addresses for one domain name. When resolvers send inquiries to a DNS for a site having multiple cache sites, the IP address returned may be selected from one of the listed cached sites on a rotation basis known as DNS round-robin. The browser never knows it has reached an alternate destination.
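The rotation can be illustrated with a few lines of Python using the standard itertools module. This is a sketch of the rotation idea only, not of any actual name server software; the class RoundRobinRecord and the three addresses (from a range reserved for documentation) are hypothetical.

```python
from itertools import cycle


class RoundRobinRecord:
    """Hand out the addresses listed for one domain name in rotation."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)   # repeats the list endlessly, in order

    def next_address(self):
        return next(self._rotation)


# Three hypothetical computers holding copies of the same web site.
record = RoundRobinRecord(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
answers = [record.next_address() for _ in range(4)]
```

Four successive inquiries receive the first, second, and third addresses and then the first again, so the requests are spread across the duplicate sites.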
Once the browser has received the domain name from the user, it uses the available resolver to convert the name into the related IP address. The browser then connects to that web server. The browser sends a message to the web server requesting the HTML document. Once the document is received, the browser must parse through the HTML code and display the corresponding output on the user's screen. Links to other documents are handled in one of two ways. If the link is inline, then the browser must immediately start this process over again and retrieve the file for that link. This is how images and sounds are displayed simultaneously with an HTML file. All inline links are requested immediately from web servers when the HTML is parsed. Accessing a single HTML document may initiate several immediate requests by the browser for other HTML documents, graphic images, and sounds. If the link is regular, requiring no further information, it is just displayed as such. Users know that by selecting a link, their browser will open it, take them to a new web page, and begin this process over again.
3. HTTP Header Information. Communications between browsers and web servers are more detailed than previously described. In addition to HTML files and other hypermedia documents, the server supplies further information with each response. This information is used to determine whether hypertext and hypermedia documents are to be cached and for how long. This section presents a more in-depth look at the hypertext transfer protocol so that its caching directives may be understood.
There are three requests a browser can make to a web server. These browser requests are also called methods. They are GET, HEAD, and POST. A command must be properly formatted by the browser before it may be sent to the server. The information provided in the request is as follows: the method; the uniform resource identifier (URI), which states where the information is located on the server; the version of HTTP being used; and a carriage return and line feed (CRLF), which indicates that the end of the request has been reached. Additional information may also be sent by the browser following the version number in the field known as the header. Header values can tell the server information such as what file formats the browser can display and the type of browser software that is being used. Headers are valuable because they can also include information for caching purposes.
When a browser sends the GET method to the server, the information identified by the URI is retrieved and sent to the browser. This information includes header information and the entire contents of the file. The HEAD method is the same as the GET method, but only the header information for the file is returned. The HEAD method is useful when the need to retrieve the entire file by a GET depends upon particular values in the file's header. It may be more efficient to use the HEAD method prior to a GET since a file's full contents are typically larger than its header. POST is used by the browser to send information to the web server. For example, users commonly use browsers to deliver their mail messages to web servers. Unlike GET and HEAD, POST sends the pertinent information in the other direction.
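The request format just described may be sketched in a few lines of Python. The sketch assumes the HTTP/1.0 layout of a request line, optional header lines, and a closing blank line; the function name build_request and the browser name ExampleBrowser/1.0 are hypothetical.

```python
ALLOWED_METHODS = ("GET", "HEAD", "POST")
CRLF = "\r\n"


def build_request(method, uri, headers=None, version="HTTP/1.0"):
    """Return the text of a request: request line, optional headers, blank line."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    lines = [f"{method} {uri} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF; an empty line marks the end of the request.
    return CRLF.join(lines) + CRLF + CRLF


request = build_request(
    "GET",
    "/index.html",
    {"Accept": "text/html", "User-Agent": "ExampleBrowser/1.0"},
)
```

The two header lines in the example tell the server which file format the browser can display and which browser software is making the request.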
The server returns a similar header for every request it receives. The header is in addition to the file itself. The server's response includes the version of the HTTP it is using, a result code, a message, and a header. The result code informs the client whether the request was successful or not. A commonly returned result code is "404 Not found," meaning that the file to be retrieved could not be found by the server. The message field is a text-based string that relates to the result code. The header is optional and contains important information about the file to be sent. Current header field definitions include the HTTP version, the server's software information, the file format, the time and date the file was last modified, and the file's length. The last-modified field is important for internet caching because it often determines how long the web related file will remain in the cache.
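A minimal Python sketch of how a client or cache might separate the parts of such a response follows. The function name parse_response, the server name ExampleServer/1.0, and the sample values are hypothetical; the sketch assumes a well-formed response and performs no error handling.

```python
def parse_response(raw):
    """Split a raw response into version, result code, message, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, message = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only; date values contain colons of their own.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": version,
        "code": int(code),
        "message": message,
        "headers": headers,
        "body": body,
    }


sample = (
    "HTTP/1.0 200 OK\r\n"
    "Server: ExampleServer/1.0\r\n"
    "Content-Type: text/html\r\n"
    "Last-Modified: Wed, 01 Jan 1997 00:00:00 GMT\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<html></html>"
)
parsed = parse_response(sample)
```

A cache reading this response would look to the last-modified entry in the parsed headers when deciding how long to keep the file.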
F. THE FILE TRANSFER PROTOCOL
Another popular internet application, other than the browsing of HTML documents, is the transfer of computer files. TCP/IP's File Transfer Protocol (FTP) is responsible for this operation. In contrast to HTML files, users may want to exchange ordinary computer files containing programs, database information, or word processing documents. By using an FTP program, a client can view the contents of a server's file directory, select the files to be transferred, and automatically receive them. This file transfer arrangement in which clients request and FTP servers send files is called downloading. Users desiring to make computer files available to others on the internet need to place them on-line for FTP retrieval.
Like a web server, the only requirement for a successful FTP server is a computer permanently connected to the internet and running FTP software. A file transfer protocol daemon (ftpd) is loaded onto the server and continuously runs waiting for FTP requests from clients. In short, FTP differs from HTTP because a connection is established between the client and server for the duration of the FTP session. In other words, when a client accesses an FTP site, he is connected to it while he views, receives, and sends files. During an FTP connection, the client and server communicate by using three- and four-character codes. When HTTP is used, connections are brief. A connection is opened for every individual browser request and immediately closed as soon as the server replies.
HTML files are created by users knowing they will be placed on-line, and often contain information in a piece-meal form. Users browsing HTML documents see only portions at a time on their screen. Frequently, they must select links and jump from document to document. When a client requests a file from an FTP server, the server reads the file off its hard disk and then sends an identical copy of it over the internet to the client. Unlike HTML documents, computer files are frequently not intended to be made publicly available. Computer software can be placed on an FTP server and duplicated by thousands of users world wide in a matter of seconds.
G. THE IMPACT OF THE INTERNET ON INTELLECTUAL PROPERTY RIGHTS
The ability of computer users to "browse around the web" is unlike any other technology that has ever been available. The internet is unlike radio and television. Users of the latter two forms of communication can only listen and view the information selected by broadcasters. Someone other than the audience makes those decisions. An internet user has the ability to sift through unlimited amounts of information and refine inquiries until precisely tailored. The retrieved information, often far more than may be reviewed in a short amount of time, is privately saved to the user's hard drive.
Computers store information digitally, in terms of ones and zeros. These ones and zeros, which correspond to electrical volts, are organized into bits and bytes. Previous recording devices have stored information in analog form. Copied information that is stored digitally is identical to the original. Every bit in the original is duplicated in the copy. There is no deterioration of quality as often seen in photocopies. Copies are identical to the original. Further reproductions will also be identical and may be made without the need for the original. Copies may be made in a matter of seconds by the computer holding the information. Duplicates are made effortlessly and without delay.
To realize the impact of this efficiency on copyright laws, consider a web page that displays a graphic image of a famous work of art. When a user wishes to view that web page, the image must be transferred from the host computer, across the internet, to the computer making the request. The graphic image is not being viewed from the original source but from a copy in the memory of the user's computer. With the copy of the image, it is possible for the user to permanently write the image to his hard drive for later retrieval. In essence, the internet is a library; but in order for others to borrow a book, the library must make for them an identical copy which they need never return. The "library" may easily have thousands of borrowers asking for copies daily.
IV. CACHING ON THE INTERNET
Information retrieved from the internet is cached in several ways and at different levels. Client browsers cache data to avoid repetitive delivery of information over slow conventional telephone modems. Web servers cache information to ease their demand loads. Other caches are strategically placed on the internet to ease overall network traffic in selected geographic regions. This section examines the importance of caching to the internet and how it is implemented. The benefits of caching need to be recognized in contrast with the consequences of its potential for copyright infringement. The following section is an in-depth look at how caches are managed and operated on the internet. The importance of caching is presented to accentuate how catastrophic it would be for a court to issue an injunction shutting down internet caching.
A. WHY CACHING IS IMPORTANT TO THE INTERNET AND NETWORK EFFICIENCY
Caching schemes are absolutely necessary to handle internet traffic. A recent example of network overload occurred during the impact of comet Shoemaker-Levy 9 with the planet Jupiter. NASA placed telescopic images of the event on the internet for public viewing within hours of their creation. Not only did NASA have to bring additional computers on-line to handle the overflow of requests, but some servers logged more than 880,000 accesses and others about 420,000, delivering more than six terabytes of data. The demand eventually so overwhelmed the servers that none of the requests could be serviced at all. Caching techniques have the ability to minimize network traffic and enhance overall network performance. The following sections address the types of traffic that diminish the internet's performance.
1. Congestion. One of the desired results of caching on the internet is to reduce congestion, known as network traffic. In 1993 an internet study concluded that if FTP files were subject to a caching scheme, half of all file transfers over the internet could be eliminated. For clients on the internet, congestion commonly occurs in two places. For users, the most common place of congestion is at a server's site. A single user's request reaches the server along with those of others on the internet. With millions of users on the internet, this spot can easily receive thousands of requests at nearly the same time. The second area of congestion is at the connection point between a client's LAN and the internet. Internet users typically connect through a LAN. Every request from each user on the LAN must pass through the same portal to the outside connection to the internet. These commonly shared access points may easily become overwhelmed.
2. Bandwidth. The speed with which data travels between two networked computers is determined by their physical connections. The physical connections between computers range from various forms of metallic wires and how well they are shielded from interference, to the use of light waves and fiber optics. The speed of a network is measured by the number of bits per second that may be transferred from one location to another. The rate that information can be transferred through one of the physical mediums is called the bandwidth and is usually expressed in kilobits per second or megabits per second. Fiber optic cables can transfer information at 100 megabits per second while telephone modems transfer at 28.8 kilobits per second. The type of cabling used between two computers determines the maximum throughput of data. On the internet, different types of cabling are often used to connect computers. Overall transfer speed is limited by the cabling with the lowest bandwidth.
Determining the minimum bandwidth of a connection is important in deciding where to position a cache. A user accessing the internet via a modem will not enjoy the benefits of caching if the cache is located at the other end of the phone line, a point beyond the lowest bandwidth. However, a cache at the user's end, such as within the browser software, helps prevent the painful delay experienced when receiving information at 28,800 bits per second.
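The arithmetic behind the lowest-bandwidth rule can be shown with a brief Python sketch. It uses the two speeds given in the text, together with an assumed file size of 90,000 bytes chosen only for round numbers; it ignores latency and protocol overhead, and the function names are hypothetical.

```python
def bottleneck_bps(link_speeds_bps):
    """The slowest link on the path sets the overall transfer speed."""
    return min(link_speeds_bps)


def transfer_seconds(size_bytes, link_speeds_bps):
    """Idealized transfer time: file size in bits divided by the bottleneck rate."""
    return size_bytes * 8 / bottleneck_bps(link_speeds_bps)


# A path made of a 100 megabit fiber link followed by a 28.8 kilobit modem link.
path = [100_000_000, 28_800]
seconds_over_modem = transfer_seconds(90_000, path)        # 720,000 bits / 28,800 bps
seconds_over_fiber = transfer_seconds(90_000, [100_000_000])
```

Under these assumptions the file needs 25 seconds to cross the modem link and well under a hundredth of a second on fiber alone, which is why only a cache on the user's side of the telephone line spares the user the delay.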
3. Latency. The last efficiency concern is latency, the delay between the request for information and the moment it is received. Because the internet is a series of interconnected computers, it is rarely the case that two computers have a direct connection. In order for a packet to reach another computer, it may have to stop along the way at several intermediary sites. Each time the packet makes a stop along the way, this is known as a "hop." The ideal goal is for packets to reach their destinations by traveling via the shortest routes, making the fewest number of hops. At each stop along the way, the packet is handled by hardware known as routers. These routers determine the next place to send each packet and attempt to send it by the shortest route. When network traffic is high, packets are often routed to alternate pathways. This results in longer waiting time between a client's request and a response from the server.
4. How Caching Enhances Network Performance. Caches are often placed within a few hops of many internet clients. For a company with a LAN or WAN, every user's requests may be required to pass through a cache server, known as a proxy, before being sent out over the internet. Without the proxy, each user may be asking for the same information. As one person requests information, it is received and stored in the proxy's cache. Every other user on the LAN who then seeks that same information will receive the cached copy. This application of proxy caching may reduce the number of requests for internet information from possibly hundreds to just one. The return of this benefit is tremendous; it reduces both the internet traffic at the LAN's point of internet access, as many outside requests can be handled by the proxy, and the load on the web server needing to accept, interpret, and fulfill each request.
Well-placed caches at particular internet locations can reduce the problems of latency and bandwidth. As more and more connections are upgraded to fiber optic cable, caches placed at some of the slower connections can ease their load. While proxy caches reduce the traffic created by a single LAN, a cache placed at an intermediate point may have connections to several LANs. This second caching server, located farther out on the network, may have the information needed should the LAN's proxy not have the data in its cache. This decreases the number of hops required because the only distance traveled is between the proxy and the cache server.
B. CACHING INFORMATION ON THE INTERNET
As previously noted, caching occurs in numerous places on the internet. Different benefits are derived depending upon the location of the cache. The various types of caches used have unique decisional controls that may be exercised over their operation. These factors are affected by whether the cache is controlled by a user, a server, or part of a greater caching scheme. Presented next is a closer look at the caching implementations of these methods.
1. Client Caching. The simplest and most frequently used type of caching occurs on the user's own computer. Browsing software often comes with its own internal caching mechanisms. These are known as browser or client caches. The browser's cache is triggered each time the user instructs the browser to retrieve a web page. The browser checks the cache, which may be a portion of the computer's RAM or a reserved area of disk space on the hard drive. If the document is in the cache, it is read from the cache and displayed on the computer monitor. Otherwise, the appropriate HTTP commands are sent over the internet to retrieve the document.
Web browsers usually provide one of two types of client caching, persistent and nonpersistent. Persistent caches store cached data permanently so it is not lost even if the browser is exited. Nonpersistent client caches deallocate memory each time the browser is exited. While disk space is preserved, this type of caching is not as efficient as the persistent type because the cache is empty each time the browser is loaded. The cache starts over each time the browser software is reloaded.
Client caching is very helpful given the relatively slow data rate of telephone modems. Cache size is limited by the available RAM and disk space. Typically, browser caches only store HTML documents and image files but not FTP or gopher information. Nevertheless, this method of caching is very helpful to single users who frequent the same web sites.
2. Proxy Servers Used For Caching. Proxy caching is another service that is commonly found all over the internet. The term proxy originates from the server's function; the proxy server accepts requests from clients and then carries them out on the clients' behalf. The most widely used proxy server is the CERN HTTPd. CERN's acceptance has been attributed partly to its creation and availability at a time when no other caches were available. Most alternative caching proposals make their comparisons to CERN because of its wide acceptance. The following explanation of proxy caching is based on the CERN model.
For a proxy server to work, it must intercept all outgoing internet requests from a client's browser. This may be accomplished in at least one of two ways. If the proxy is placed between the internet and the user's network connection, such as a LAN's outgoing internet connection, interception is automatic. Otherwise, the browser software must be configured to send all outgoing directives to a proxy.
When a proxy receives a client's request, it first determines whether the request is a hit or a miss. If it is a hit, the proxy returns the file to the client. When the request is a miss, the proxy sends out a request of its own to the destination site for the document. When the document is returned, it is copied into the cache and then sent to the original client. Note that in this arrangement the proxy is acting as both server and client. The proxy is a server, fulfilling the requests of the client browser, and then a client, making its own request to another web server for a copy of an HTML document.
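The hit-or-miss procedure may be sketched in Python as follows. This is a simplified illustration and not the CERN implementation: the class name CachingProxy, the stand-in function fetch_from_origin, and the example URL are hypothetical, and the cache is an in-memory table with no expiration.

```python
class CachingProxy:
    """Serve a document from the cache on a hit; fetch and store it on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # the proxy acting as a client
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._cache:
            self.hits += 1
            return self._cache[url]
        self.misses += 1
        document = self._fetch(url)
        self._cache[url] = document  # copy into the cache, then hand to the client
        return document


origin_requests = []


def fetch_from_origin(url):
    # Stand-in for a real request to the destination web server.
    origin_requests.append(url)
    return f"<html>contents of {url}</html>"


proxy = CachingProxy(fetch_from_origin)
# One hundred users on the same LAN ask for the same page.
for _ in range(100):
    proxy.get("http://www.example.com/index.html")
```

In this run only the first request leaves the LAN; the remaining ninety-nine are answered from the cached copy, which is the reduction "from possibly hundreds to just one" described earlier.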
3. Hierarchical Caching and the Harvest Cache. The Harvest Cache was created by a collaboration between the University of Colorado and the University of Southern California. Its name was selected to denote its focus on "reaping the growing collection of Internet information." Unlike the previously discussed caching systems, Harvest is a hierarchical cache; in addition to the local cache, there are regional caching servers holding information for clusters of networks that share information. In contrast, flat caching schemes are a simple proxy server arrangement where information is either on the proxy or is requested from the source. Harvest has been in use for almost two and one-half years by a growing collection of approximately 100 sites across the internet.
Hierarchical caching is superior to flat caching due to the collaborative efforts of several servers. The server initially receiving the request returns the information if it is in the cache. Otherwise, neighboring cache servers are queried for the relevant files. Neighboring caches have the ability to query one another and communicate using a caching protocol. When in operation, the client is serviced by the cache server that the request is directed to, and then by several caches that collectively share their stored files. When these Harvest cache servers are strategically placed on the internet, it is believed that some internet traffic is reduced by over forty percent.
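The cooperation among neighboring caches can be illustrated with a Python sketch. It does not reproduce Harvest's actual caching protocol; the class name CacheNode, the method names, and the direct method call standing in for a network query are all assumptions made for illustration.

```python
class CacheNode:
    """A cache that asks its neighbors before going to the source."""

    def __init__(self, name, fetch_from_origin):
        self.name = name
        self.store = {}
        self.neighbors = []
        self._fetch = fetch_from_origin

    def peek(self, url):
        """Answer a neighbor's query from local storage only."""
        return self.store.get(url)

    def get(self, url):
        document, source = self.store.get(url), "local"
        if document is None:
            for neighbor in self.neighbors:
                document = neighbor.peek(url)
                if document is not None:
                    source = neighbor.name
                    break
        if document is None:
            document, source = self._fetch(url), "origin"
        self.store[url] = document  # keep a local copy for later requests
        return document, source


origin_requests = []


def fetch_from_origin(url):
    origin_requests.append(url)
    return f"contents of {url}"


a = CacheNode("cache-a", fetch_from_origin)
b = CacheNode("cache-b", fetch_from_origin)
a.neighbors.append(b)
b.neighbors.append(a)

first = a.get("http://www.example.com/")   # no cache has it yet
second = b.get("http://www.example.com/")  # found at the neighboring cache
third = b.get("http://www.example.com/")   # now held locally
```

Only the first request travels to the source; the second is satisfied by a neighbor and the third locally, which is the advantage claimed for hierarchical over flat caching.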
The Harvest cache has been implemented nationally by the National Laboratory for Applied Network Research (NLANR). Currently, there are six root sites located throughout the United States. Expansion is presently occurring in Europe. Each server keeps track of the hits and misses to the cache and other statistical data for efficiency and research analysis.
C. DECIDING WHAT INFORMATION TO CACHE
While only HTTP and FTP have been discussed so far, information in many other forms is transmitted over the internet. Some sites on the internet can return specifically requested database information, conduct electronic transactions, or deliver personal or private information in a secure format using encryption. A decision must be made whether any of these messages need to be cached since many of them will pass through some form of caching service as they travel across the internet. Many of these decisions are limited by the server's hardware. The following section discusses the types of information that are cached and the decisions for doing so.
It is necessary to decide whether an item requested by a user from the server should be stored on that server at all, and if so, for how many hours, days, or months. One of the server's first considerations in deciding whether to cache a document is whether it is static or synthesized. A static document is a stand-alone file that a server sends each time it is requested. Every request to the server for that file produces the same output. Web pages, FTP files, motion pictures, and sound files are usually static documents. An example is computer images of photographs that are placed on-line for people to see. Whether the request is made today or in a few months, the photographs will not change. Static files are good candidates for caching. It is for this reason that most cache servers cache HTTP, FTP, and gopher files. IP addresses and domain names are another type of information cached by special servers because of their static nature. Domain name servers also have caching servers available to assist with requests for DNS entries and IP addresses. They are essential to traffic reduction and the internet's operation.
Files stored on a web server are often updated or changed. It is not efficient to cache documents that are different each time a request is made. Files that are different each time they are requested are known as synthesized documents. They are stale from the moment they are placed in the cache. Cached documents that change each time they are accessed from their true server take up disk space in the cache but are never reusable. Operators of cache servers must therefore limit the document types that are placed in the cache. There are several types of synthesized documents which are not cached.
Other efficiency considerations taken into account by web servers involve file size and sources of origin. Files containing motion pictures, sounds, and images can be very complex, taking up a relatively great amount of disk space on the cache server. Large files of this type are not the most suitable candidates for caching because their place in the cache could be used to hold several smaller files that are more likely to have a higher hit-miss ratio. This is determined by the operators of the cache server through the use of cache limit directives. Cache servers can often make these decisions based upon information placed in the HTTP header.
Another efficiency consideration is the location of a file's source. Files located at a web site fairly close to the caching server, such as files that are located on the client's LAN or only one hop away, should not be cached to save space on the cache server. This is because the time required to fetch information so close is unlikely to have any significant effect on latency or network traffic. Caching servers can be directed not to cache particular files or files that come from a particular domain.
Another category of information is not cached for policy reasons. Information that is part of a secured transaction and payment, information requiring authorization, or sensitive or encrypted data may not be cached by the server. Not only is this information typically stale immediately following its placement in the cache, but the utility of caching the information may be outweighed by the financial or other privacy interests presented. Many cache servers respect the sending server's request that the information sent not be cached. Caching information can be placed in the HTTP header. A directive such as "please do not cache me" is provided in the protocol; however, there is nothing that assures a web server's compliance.
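The considerations discussed in this section can be gathered into a single decision rule, sketched here in Python. The field names, the one-megabyte size limit, and the host name intranet.example are hypothetical; an actual cache server would draw these values from its operator's configuration and from the HTTP header.

```python
from dataclasses import dataclass


@dataclass
class FetchedFile:
    host: str
    size_bytes: int
    synthesized: bool = False             # different on every request
    secured: bool = False                 # payment, authorization, or encrypted data
    origin_forbids_caching: bool = False  # the sending server asked not to be cached


def should_cache(item, max_bytes=1_000_000, nearby_hosts=("intranet.example",)):
    """Apply the efficiency and policy tests in turn; any failure means no caching."""
    if item.synthesized:
        return False  # stale from the moment it is stored
    if item.secured or item.origin_forbids_caching:
        return False  # policy reasons
    if item.size_bytes > max_bytes:
        return False  # the space is better spent on several smaller files
    if item.host in nearby_hosts:
        return False  # too close for caching to save meaningful time
    return True


photo = FetchedFile(host="www.example.com", size_bytes=40_000)
movie = FetchedFile(host="www.example.com", size_bytes=25_000_000)
payment = FetchedFile(host="shop.example.com", size_bytes=2_000, secured=True)
local = FetchedFile(host="intranet.example", size_bytes=2_000)
```

Of the four examples only the photograph would be stored. Honoring the sending server's wishes is a choice made inside the cache server's own rule, which is why nothing in the protocol itself can assure compliance.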
D. DECIDING HOW LONG INFORMATION REMAINS IN THE CACHE
Another issue for cache operators is whether the files are cached momentarily or for months. The optimum amount of time information should remain cached is for as long as it is not stale and receives enough hits to result in a decrease in network traffic justifying its storage. When data is placed in the cache, the issue for the cache server becomes how long the file should remain accessible to clients from the cache before being requested again from its source to avoid staleness. The time between a file's placement in the cache and the cache server's inquiry as to its freshness at the document's source site is known as the "time to live." The following paragraphs present the various methods used by cache servers in determining the time to live value.
In some instances the cache server is required to retrieve a newer version of a file or remove a file from the cache regardless of the time to live value. Disk space on the cache server is limited. Cache servers need to make room for newly fetched files that must be cached when the hard disk becomes full. When this occurs, depending upon the algorithm of the cache server, older cached files are removed in order to make room. Two types of time to live bypasses can be demanded by a requesting client. The first is a directive, "pragma:no-cache," placed in the header request. When a caching server finds this in the header of a client's request, the cache is bypassed and a new copy is automatically retrieved from the source. Note that this directive only bypasses reading the file from the cache. If the file retrieved from the source site is newer, the cache server places it in the cache in place of the previous stale version. Additionally, a client can provide its own "get if-modified-since" request to the cache server. This may force a cache server to check a document's freshness sooner than its internal default.
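The first of these bypasses may be sketched in Python. The sketch is a simplification under stated assumptions: the function name serve is hypothetical, header names are assumed to be stored in lower case, the freshly retrieved copy overwrites the cached one without any comparison of dates, and the if-modified-since request is not modeled.

```python
def serve(cache, url, request_headers, fetch_from_origin):
    """Return (document, source), where source is 'cache' or 'origin'."""
    bypass = request_headers.get("pragma", "").lower() == "no-cache"
    if not bypass and url in cache:
        return cache[url], "cache"
    document = fetch_from_origin(url)
    # Only the read is bypassed; the new copy is still written into the cache.
    cache[url] = document
    return document, "origin"


def fetch_from_origin(url):
    # Stand-in for a request to the source site, which holds a newer version.
    return "new version"


url = "http://www.example.com/index.html"
cache = {url: "old version"}

ordinary = serve(cache, url, {}, fetch_from_origin)
forced = serve(cache, url, {"pragma": "no-cache"}, fetch_from_origin)
afterward = serve(cache, url, {}, fetch_from_origin)
```

The ordinary request is answered with the old cached copy, the forced request goes to the source, and every later ordinary request then receives the newer copy from the cache.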
The cache server often sets the time to live value based upon information in the cached file's header. Many web servers provide pertinent information for cache servers in the headers of the files they send. The most useful is the "expires:date" header. Just like an expiration date on a milk carton, this directive tells the cache server when the file becomes stale. Files that have an expiration date before being received are treated as stale and are never cached. This may be used as a tool for web servers that do not wish to have their documents cached. By placing an already expired expiration date in each header, they will never be cached.
Another header directive is the "Last-Modified" header. This informs cache servers of the last time the file was modified by the source site. While not as helpful as the expiration header, it does inform the cache server of the file's age. From this header the cache can extrapolate the time to live by either using a default factor or by comparing the header date with the previous date the file was last modified, if that is available.
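The two header-based calculations may be sketched in Python as follows. The function name time_to_live and the divisor of ten applied to a file's age are assumptions made for illustration and are not values taken from any particular cache server; a result of zero means the file is stale on arrival, and a result of None means the headers supplied nothing useful.

```python
from datetime import datetime, timedelta, timezone


def time_to_live(received_at, expires=None, last_modified=None, age_divisor=10):
    """Derive a time to live from the expiration or last-modified header dates."""
    if expires is not None:
        remaining = expires - received_at
        # An expiration date already in the past means the file is never cached.
        return max(remaining, timedelta(0))
    if last_modified is not None:
        # A file unchanged for a long time is presumed likely to stay unchanged.
        age = received_at - last_modified
        return age / age_divisor
    return None


received = datetime(1997, 1, 11, tzinfo=timezone.utc)
with_expiry = time_to_live(received, expires=datetime(1997, 1, 12, tzinfo=timezone.utc))
already_expired = time_to_live(received, expires=datetime(1997, 1, 1, tzinfo=timezone.utc))
from_age = time_to_live(received, last_modified=datetime(1997, 1, 1, tzinfo=timezone.utc))
no_information = time_to_live(received)
```

In the example a file last modified ten days before its receipt is given a time to live of one day, while a file bearing an expiration date in the past is given none at all.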
Some web servers do not provide any useful header information as a part of the delivered files. In such cases the cache server may determine the time to live based upon a default number. This is often twenty-four to forty-eight hours. Once a day the cache server runs an internal program to sort through all the cached files and determine which ones to discard. This process is known as "garbage collecting."
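A daily sweep of this kind might look like the following Python sketch. The text does not specify how the program chooses files to discard, so the sketch assumes the simplest rule: an entry is removed once its time to live has elapsed, with twenty-four hours used when no other value was recorded. The dictionary layout and the function name collect_garbage are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(hours=24)  # low end of the range mentioned in the text


def collect_garbage(cache, now):
    """Remove every entry whose time to live has elapsed; return the removed URLs."""
    stale = [
        url
        for url, entry in cache.items()
        if now - entry["stored_at"] >= entry.get("ttl", DEFAULT_TTL)
    ]
    for url in stale:
        del cache[url]
    return stale


now = datetime(1997, 1, 3, tzinfo=timezone.utc)
cache = {
    "http://www.example.com/old.html": {
        "stored_at": now - timedelta(hours=30),  # no ttl recorded, default applies
    },
    "http://www.example.com/fresh.html": {
        "stored_at": now - timedelta(hours=2),
    },
    "http://www.example.com/long-lived.gif": {
        "stored_at": now - timedelta(hours=30),
        "ttl": timedelta(days=7),
    },
}
removed = collect_garbage(cache, now)
```

Only the first file is discarded: it has been held longer than the default period, whereas the second is recent and the third carries a longer time to live of its own.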
V. THE TECHNIQUE OF MIRRORING
A frequently used alternative to caching is mirroring. Also called shadowing or tracking, it is a technique in which a server makes complete copies of files or even whole servers and places them on-line. Caches are user-driven, retrieving specific files when clients request them. The decision to mirror is made by the operator of the shadowing server. Snapshots of the mirrored data are usually taken during scheduled low network traffic periods and may even be taken daily.
Mirror servers function in the same manner as the original server holding the information. TCP/IP's HTTP, FTP, and other protocols still govern. Note that mirror servers, when implemented, are identical to the source sites. Anything on the original server, whether a file, a photograph, motion picture, or sound, will eventually be copied onto the mirror. The operator of the mirror server decides which sites and files will be duplicated.
Mirroring is used to diminish the load of web servers overloaded with requests. However, it is important to note that servers are sometimes mirrored without their consent. Sites in Australia have mirrored many United States and European sites because of the long access time needed to send every document between such long distances. Users often do not know that the information is coming from a mirrored server because requests to the original site may be forwarded to the mirror via round-robin DNS table entries.
VI. COPYRIGHTABLE SUBJECT MATTER OF CACHED MATERIALS--THE FIXATION REQUIREMENT
In order for caching activities to create copyright liability, the materials being copied into the cache must fall within the subject matter granted protection by the Copyright Act. One of the requirements for copyright protection is that the work be fixed in a tangible medium of expression. As previously discussed, cached materials from web pages consist of images, sounds, moving pictures, and text. Historically, all of these works have received the protection of the copyright laws when created without the use of a computer. The fulfillment of the fixation requirement is not as apparent when these works are created originally or as derivative works on a computer. The following section presents the issues that arise when rights under the Act are asserted to protect works that exist in computer form on web pages.
The primary issue in determining whether computer and web page works receive copyright protection rests with the fixation element. Under the 1976 Copyright Act, copyright protection is given to all "original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device." Section 101 of the Act provides, "[a] work is fixed in a tangible medium of expression when its embodiment in a copy ... is sufficiently permanent or stable to permit it to be perceived, reproduced, or otherwise communicated for a period of more than transitory duration." In determining the reach of both the fixation requirement and section 101, it is important to consider the congressional intent. The Act was intended to protect works of authorship existing at the time of its passage, yet be broad enough to cover future technologies.
Before data can be cached on the internet, it must be stored at its point of origin. In order for files to be cached, they must pass through the cache as an intermediary point between the source computer and the destination computer. As discussed previously, web servers retrieve files from their local hard disks when requests for their transfer are made by clients. The local hard disks of web servers are the sites where copyrightable material is originally stored. In other words, cacheable information can only be transmitted across the internet if it is first stored on a hard drive at the source.
Cached files are copyrightable subject matter only if their storage on their servers' hard disks satisfies the fixation requirement. While the Act does not expressly provide that works stored on a computer are fixed in a tangible medium of expression, it is broad enough to include this type of storage. Section 101 further states that the work may be fixed in a tangible medium "now known or later developed." Files stored on computer hard disks are sufficiently permanent to meet this definition because the information may be repeatedly retrieved for an indefinite amount of time. Such a duration is clearly not transitory. Each time the file is loaded from the hard disk, it is reproduced in memory to be used or viewed by the user. Even though the computer is used to assist in the display of the information, the section 102 requirement that the work be "perceived, reproduced, or otherwise communicated ... with the aid of a machine or device" is satisfied. A contrary position, that files stored on hard disks are not sufficiently fixed, would resemble the unsuccessful arguments presented against fixation for movies on video tapes or sound recordings on compact discs.
Federal case law follows congressional policy, and it is clear that files stored on computer disks are fixed for purposes of the Act. While never expressly stated by a court, computer storage has been an underlying assumption made in many cases. In 1983 Apple Computer, Inc. v. Franklin Computer Corp. addressed the issue of whether a computer program stored in computer ROM was fixed for purposes of the Act. The Third Circuit, citing to an earlier case, held that " 'fixation' ... is satisfied through the embodiment of the expression in the ROM devices." Furthermore, in NLFC, Inc. v. Devcom Mid-America, Inc., the Seventh Circuit found it undisputed that the loading of software into a computer constitutes the making of a copy. While the issue in that case was an evidentiary matter of whether the software had been installed, the court presupposed without objection that the software, as stored on a computer disk, was fixed in a tangible medium of expression. The same approach was taken in ProCD, Inc. v. Zeidenberg. In a dispute over a shrink-wrap license, the court conceded that installing the copyrighted software onto a hard disk without authorization would constitute infringement. District courts have also held that computer storage is a sufficiently fixed medium. Sounds fixed in computer chips have also been held to satisfy the fixation requirement for sound recordings.
Since computer software is a proper subject matter of copyright protection, it only follows that disk storage is a contemplated means of fixation under the Act. In 1980 the Act was amended to expressly include copyright protection for computer programs. Section 101 defines "computer program," and section 117 provides exceptions to infringement of computer programs in limited situations. Even before this amendment, computer programs were always intended to be covered under the Act as passed in 1976. Both source code and object code are protectable under the Act. This also holds true when the copying is of non-literal elements. Tapes and hard disks are sufficiently fixed to be deemed copies under the Act as a matter of law. In order for software to be transferable among users, it must be stored on a transportable medium. Currently, software is sold on floppy disks and is installed for permanent use on hard drives. Disk storage has become to computer software what pencils and paper have always been to written literary works.
VII. THE EXCLUSIVE RIGHTS INFRINGED BY CACHING
Section 106 of the Act grants to copyright owners an enumerated list of exclusive rights. These exclusive rights include: the right to reproduce the work in copies, to prepare derivative works, to distribute copies to the public, to perform the work publicly, and to display the work publicly. As these rights are exclusive, it is improper to engage in any of the above listed activities without an author's permission. While some authors may place their works on-line in the form of a web page, this is not always the case. Many sounds, digital photographs and excerpts of written literary works may be placed on the internet without the author's permission, or even without his or her knowledge. If the section 106 rights of an author are violated when placed without authorization on-line at a web site, then it follows that these same rights are subsequently offended again when placed in a cache. The various section 106 rights are affected depending upon the type of cache that is implemented. The following section discusses how caching interferes with section 106 rights.
A. THE RIGHT OF REPRODUCTION
Under section 106(1), an author's exclusive right of reproduction is violated when someone reproduces the work in the form of a copy without the author's consent. A "copy" is defined in the Act as a "material object ... in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device." The right of reproduction can be divided into two elements, reproduction of the protected work and a copy that is sufficiently fixed. Caches duplicate information they retrieve from a web or FTP server. The purpose of caches is to maintain an identical reproduction so that the original source site does not need to be burdened with later requests for that same information. This duplication satisfies the reproduction requirement. The fulfillment of the second requirement of fixation is not always as obvious. The narrower issue is whether the cached data is sufficiently fixed under the definition of a copy for purposes of the exclusive right of reproduction under section 106.
1. Hard Disk Storage of Cached Materials. Information is stored in a cache in one of two ways, either on the cache's hard disk or in RAM. Hard disk caches are the easier case. When sections 101 and 106 are read together with the congressional intent, it is clear that physical computer disk storage was anticipated as one method of fulfilling the fixation requirement under the Act. In 1974 the National Commission on New Technological Uses of Copyrighted Works (CONTU) was created by Congress to revise the copyright laws. In developing national policies for protecting copyrighted works and ensuring public access to them, CONTU addressed issues affecting these interests when they are used in computer and machine duplication systems. In addition to proposing amendments to the Act, CONTU took a position on computer copies. It was the commission's belief that the definitions of "copies" and "fixed" should be read together to produce the result that, "because works in computer storage may be repeatedly produced, they are fixed and, therefore, are copies." CONTU also provided examples to demonstrate distinctions of how copies of computer software may be made. Put simply by CONTU, "the placement of a copyrighted work into a computer ... is the preparation of a copy."
Case law on the issue of storage onto computer hard disks has been addressed above in the discussion of the proper subject matter of copyright protection. It only follows that if the original storage of files on a hard disk is sufficiently fixed to give the work copyright protection, then duplication and subsequent storage on the same type of medium is likewise fixed for purposes of a copy. Fixation is only defined once in the Act, and it applies in both contexts.
One last challenge to the "copies" and "fixation" requirements is whether cached material is stored for more than a transitory period. From one perspective, if the medium of storage is considered in the abstract, it only matters how long the medium is capable of holding the work. Fixation is satisfied since magnetically stored materials can be held for virtually infinite periods. The other view is that because the cache automatically removes the data at predetermined time intervals based on staleness, it is for a transitory duration. However, it seems unconventional to make the pivotal issue how long the data is actually kept in the medium, rather than how long it could last there undisturbed. This is important to caches because some files may be cached for only a matter of minutes and others for weeks. A California district court has specifically addressed this issue; the court held that a computer disk drive is a sufficiently fixed medium even if the copy is only in existence for eleven days. The court's reasoning was that the data was fixed from the moment data was placed on the drive. Removal was a subsequent act and therefore, not dispositive. No distinction is placed on the amount of time that protected information remains in a medium that is capable of fixation the moment it is stored.
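The difference between how long a medium is capable of holding a file and how long a cache actually keeps it can be illustrated with a minimal sketch. The class name, the staleness interval, and the caller-supplied clock values below are illustrative assumptions and do not describe any particular caching product.

```python
class StalenessCache:
    """Toy cache that discards an entry once it is older than max_age seconds."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self._entries = {}  # url -> (time stored, file contents)

    def store(self, url, data, now):
        # The copy exists in the storage medium from this moment on.
        self._entries[url] = (now, data)

    def fetch(self, url, now):
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, data = entry
        if now - stored_at > self.max_age:
            # Removal is a separate, later act triggered by staleness.
            del self._entries[url]
            return None
        return data


cache = StalenessCache(max_age_seconds=600)  # hypothetical ten-minute interval
cache.store("http://example.com/a.html", b"page contents", now=0)
```

In this sketch the file is held from the moment store is called, and it is discarded only when a later request finds it stale. That ordering parallels the reasoning that removal is a subsequent act.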
The fixation requirement is satisfied since computer disks are a form of magnetic storage. Mirror sites also use hard disks to make duplicates of other hard drives of other web servers. These two types of caches violate the section 106 exclusive right of reproduction each time a file is stored without the permission of the author.
2. The Storage of Cached Materials in RAM. The determination of the point of fixation when cached information is stored solely in RAM has only recently been clarified. Unlike computer hard disks, information is stored in RAM in the form of electrical impulses representing zeros and ones. Rather than being magnetic, the movement and existence of electrons represent the cached data. Because of RAM's volatility, the computer must always be on and providing power to the RAM chips. The electrons remain in RAM only as long as the computer's other circuitry provides support to the RAM chips in a procedure known as refreshing. However, while RAM is relatively sensitive compared to other forms of storage, information in RAM remains intact for as long as the computer is left running.
The most significant holding on this issue is from the Ninth Circuit which opined that computer RAM is sufficiently permanent or stable to satisfy the definition of a "copy." In MAI Systems Corp. v. Peak Computer, Inc., MAI manufactured computers and wrote software to run on its systems. The defendant, Peak, performed routine computer maintenance for MAI's customers. MAI filed suit against Peak raising various claims including copyright infringement. Particularly, MAI's infringement argument rested on an agreement between MAI and its customers. The agreement provided that only the contracting customers were authorized to use the computer software. Any third party use of the software was prohibited by the agreement. When Peak's employees serviced the computers, the mere act of turning on the computer, which caused the computer software operating system to be loaded into RAM, constituted a copy in violation of MAI's copyright protections. Peak's defense rested on the position that copying software into RAM does not constitute a "copy" under the Act.
In addressing this case of first impression, the court made its conclusions based on the characteristics of RAM, that it "can be 'perceived, reproduced, or otherwise communicated.' " The court based its holdings on three sources. First was the holding in Vault Corp. v. Quaid Software Ltd., in which the Fifth Circuit held that loading a program into a computer's memory creates a copy of the program. The court also looked to Nimmer on Copyright and CONTU's report. It was the court's belief that RAM was a natural extension of the Act as it had pertained to ROM and hard disk storage in the past. This holding has been followed by several courts despite the criticism it has received.
Another view taken by courts is that the work is fixed if stored somewhere in the computer. In Stern Electronics, Inc. v. Kaufman, the issue presented was whether works that appeared solely on a computer screen in the context of a video game were sufficiently fixed in a tangible medium of expression. While it was not an issue that the computer code for the game, which was stored in ROM, was sufficiently fixed, the defendant attempted to distinguish the code from the screen images. The defendant argued that since video games produce uniquely different screens and sequences for each game player, the images existed only on the game's screen and were not fixed in the ROM. The court gave fixation a broad definition, holding that all portions of the program, stored in memory or any other devices within the game, were fixed for purposes of the Act. Under this interpretation, the information is sufficiently fixed as long as it is contained within any physical component of the computer. This case has been followed by several courts which have held that works which may be perceived from memory devices with the aid of a computer's components are sufficiently fixed. Under such a liberal interpretation of fixation, RAM, as a physical component of every computer, satisfies the section 101 definition of fixation.
While a look at the legislative history is not as helpful as one would hope, the passage of section 117, which places limitations on the exclusive rights of computer program copyrights, must have contemplated that loading a computer program into RAM is a "copy." CONTU's report addressed the issue of memory broadly and with less clarity than the issue of magnetic storage. In a contrary position, the legislative House Report states that "the definition of 'fixation' would exclude from the concept purely evanescent or transient reproductions such as those projected briefly on a screen, shown electronically on a television or other cathode ray tube, or captured momentarily in 'memory' of a computer." Notwithstanding these two positions, section 117 exempts individuals from liability for making copies of computer programs if the copying is an essential step in the utilization of the programs in conjunction with the computer running them. The purpose behind section 117 was to permit users with lawfully obtained computer programs to load them onto their computers without any concerns that they would be violating the author's exclusive rights. In order to use a computer program, it must be loaded from the computer disk into RAM. Computers use RAM as intermediary storage to run programs. As recognized by the Fifth Circuit in Vault Corp. v. Quaid Software Ltd.:
[b]ecause the act of loading a program from a medium of storage into a computer's memory creates a copy of the program, the CONTU reasoned that, "[o]ne who rightfully possesses a copy of a program ... should be provided with a legal right to copy it to that extent which will permit its use by [that] possessor."
Under this legislative interpretation, information stored in RAM meets the requirements of a "copy" for purposes of section 106.
B. THE RIGHT OF DISTRIBUTION
Section 106 provides for the exclusive right to "distribute copies ... of the copyrighted work to the public by sale or other transfer of ownership." This is considered the right to control a work's publication. Publication can occur when only one member of the public receives a copy of the copyrighted work. More objectively, the work must be offered to the public in general and not directed at any particular person.
The issue was first addressed in the context of individuals placing works on-line for other users to download. In Playboy Enterprises, Inc. v. Frena, Playboy sued the operator of a computer bulletin board, or BBS. The defendant placed protected works on-line in the form of photographs belonging to the plaintiff. The plaintiff alleged violations of its exclusive rights, including the exclusive right of public distribution. Following section 106, the court found that this right had been violated.
Playboy is significant to the issue of file caching because cache servers are very similar in their operation to computer bulletin boards. They store a large number of files which are copied and then sent to any user that requests them. By placing cache servers on the internet, they are accessible to any of the millions of users connected to the internet. As a consequence, a public distribution is made the first time a cached file is hit.
The Playboy court focused on another point of law that is controlling on file caching. The court found that "[i]t does not matter that the defendant Frena claims he did not make the copies itself." The fact that the computer was controlled by users and followed the instructions given by them is not relevant. While caches copy the information to their hard disks, they are only distributed upon the request of a user. Under this rationale, the cache server operator is liable even though the copies are made automatically and done solely at the request of a user.
Another court has rejected this argument. In Religious Technology Center v. Netcom On-line Communication Services, Inc., the district court was not convinced that "the mere possession of a digital copy on a BBS that is accessible to some members of the public constitutes infringement." Under this view, only the user who requests that the file be sent should be liable for causing the distribution. The distinction drawn was that BBSs merely store and pass along information to others and do not cause the work to be distributed. In Netcom, one of the defendants was an internet service provider (ISP). Like ISPs, cache servers are conduits between two points on the internet. Regardless of its content, a message passes through the cache and is stored if storage will aid network efficiency. This issue of control was dispositive in distinguishing the case from Playboy. According to the Netcom court, a different construction of section 106 would be broad, unreasonable, and serve no purpose.
C. THE RIGHT OF PUBLIC DISPLAY
The exclusive right of public display granted in section 106 means to "show a copy of [the work], either directly or by means of a ... television image, or any other device or process...." Legislative history includes: "the projection of an image on a screen or other surface by any method, the transmission of an image by electronic or other means, and the showing of an image on a cathode ray tube or similar viewing apparatus connected with any sort of information storage and retrieval system." The Playboy court also addressed whether the defendant's infringing conduct violated Playboy's exclusive right of public display. The court focused on the definition of "public display." Section 101 defines "public display" as "to ... display it at a place open to the public or at any place where a substantial number of persons outside of a normal circle of a family and its social acquaintances is gathered."
The court found the files on the BBS to be a public display because the persons who had access to those files were not just the defendant's family. Using an analysis similar to that employed in rejecting a violation of the right of distribution, the Netcom court rejected any violation of the public display rights and refused to follow this part of the Playboy holding as well. However, both courts failed to address the other issues of the public display rights. While cache servers are available on-line to the general public, it is questionable whether they are displaying any works.
If section 101 is interpreted literally, cache servers are not displaying any information. They are simply sending stored information to a user. It is sent in a non-human readable form. In order for a user to view the file, it must be loaded into a viewing program. No images or works are viewable from the cache server. This same argument applies to mirror servers as well.
Unlike cache servers, browser caches meet the section 101 definition. Browsers with built-in caches not only store files but also have the responsibility of displaying them when requested to do so by a user. Browser caches make a copy of the retrieved information and display it on the screen at a later time. Fortunately, while the "display" definition is satisfied, the "public display" requirement fails. Computers with browser software are usually utilized by individuals either at home or at work. If used at home, the browser is displaying the work to only a few persons that are either family members or social acquaintances. Viewings of this nature are not public for purposes of section 106.
Although an unusual distinction, browser caches used at home do not violate the section 106 right of public display. Browsers used elsewhere do. Computers connected to the internet running browser caching software may be used at offices, coffee houses, or other places providing public access. A computer at the office may be available to a substantial number of persons outside of family and social acquaintances. While it is arguable that only one or two individuals ever use a computer at the same time, the public display requirement is not based on this meaning. In order for a place to be open to the public, "[i]t does not require that the public place be actually crowded with people. A telephone booth, a taxi cab, and even a pay toilet are commonly regarded as 'open to the public,' even though they are usually occupied only by one party at a time." Accordingly, browser caches operating on publicly accessible computers are displaying protected works publicly.
D. THE RIGHT OF PUBLIC PERFORMANCE
Section 106 also gives authors the exclusive right to perform their copyrighted works publicly. Performance is defined in section 101 as "to recite, render, play, dance, or act it, either directly or by means of any device or process or, in the case of a motion picture or other audiovisual work, to show its images in any sequence or to make the accompanying sounds audible." This is distinguished from the right of public display which requires a still picture or other single image. Presently, motion pictures and other such works are not likely to be cached because of the amount of disk space they require. The most likely file type to create an issue of the violation of this exclusive right is that of musical works cached in the form of sound files. The impact of file caching on this exclusive right is given the same analysis as that given to the right of public display in such cases.
VIII. THE CASE OF COPYRIGHT INFRINGEMENT
A. A PRIMA FACIE CASE OF INFRINGEMENT
One of the easier elements of a case against caching is proving infringement. To establish infringement the plaintiff must prove two things: ownership of a valid copyright and copying by the defendant. Ownership is a matter of registration for the plaintiff, which may be undertaken at any time prior to filing suit. Copying by the defendant can be proven by direct evidence or inferred when the defendant had access to the copyrighted work and the accused work is substantially similar to the copyrighted work. The nature of file caching requires that the issue of copying be stipulated. In order for caches to work successfully, the files must be copied identically. Unlike the issues of copyrightable subject matter and violations of the exclusive rights, infringement is one facet of the case that should go uncontested.
A defendant may be held liable for damages regardless of whether the file caching is a benefit to the operation of the internet. The same is also true whether the author of the protected work suffered any actual injury by the infringement. Aside from the actual damages available to a prevailing plaintiff, an election to receive statutory damages can be made any time before final judgment. An author is entitled to not less than $500 and up to $20,000, as the court may deem just, for each work that is copied. Under current law, there is no innocent infringement; a defendant is liable to the plaintiff for damages even if the infringement was harmless. Statutory damages are calculated per work and not per copy. The court may also, in its discretion, award costs to the prevailing party as well as attorney's fees in certain cases.
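Because statutory damages are calculated per work and not per copy, the number of times a cached file is served does not change the range. The arithmetic can be sketched as follows, using the $500 and $20,000 figures stated above. The file names and request counts are hypothetical, and the sketch does not predict what a court would actually award.

```python
MIN_PER_WORK = 500      # statutory minimum per work, as stated in the text
MAX_PER_WORK = 20_000   # statutory maximum per work, as stated in the text


def statutory_damages_range(works_infringed):
    """Return (lowest, highest) total statutory damages for distinct works."""
    if works_infringed < 0:
        raise ValueError("number of works cannot be negative")
    return works_infringed * MIN_PER_WORK, works_infringed * MAX_PER_WORK


# Hypothetical cache log: three distinct works, each served many times.
copies_served = {"photo.gif": 1000, "essay.html": 40, "song.au": 7}

# Only the number of distinct works matters, not the copies served.
low, high = statutory_damages_range(len(copies_served))
print(low, high)  # 1500 60000
```

Under these assumptions the range is $1,500 to $60,000, whether the three files were served seven times or a thousand times.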
Statutory damages may be increased to up to $100,000 for willful infringement. The willfulness requirement of section 504 is interpreted as meaning that the infringer either knew the conduct was an infringement or acted with a reckless disregard of the copyright owner's rights. The acts do not need to be committed maliciously. One must consider whether a cache operator is liable for willful infringement. The cache runs automatically and retrieves enormous quantities of documents. There is no opportunity for any file's copyright notice or registration to be confirmed. Consequently, unless the cache operator is personally informed by an author, there truly is no reasonable way of making a determination on a file-by-file basis.
Under the principle of a reckless disregard for the copyright owner's rights, information in the header can provide the cache with notice that information should not be cached. As discussed earlier, a "do not cache me" directive placed in the header of a file informs the cache not to store the data. This technology demonstrates an ability of copyright holders to express their wishes; disregard could be deemed reckless.
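How a cache might honor such a directive can be sketched in a few lines. The sketch assumes that the "do not cache me" directive corresponds to the HTTP Pragma and Cache-Control response headers, and it treats several distinct directives as a single request not to keep a copy, which is a simplification. The function name is hypothetical.

```python
def may_store(headers):
    """Return False if the response headers ask that the file not be cached."""
    normalized = {name.lower(): value.lower() for name, value in headers.items()}

    # Older style: "Pragma: no-cache".
    if "no-cache" in normalized.get("pragma", ""):
        return False

    # Newer style: "Cache-Control" with a comma-separated list of directives.
    # A directive may carry an argument (name=value); compare the name only.
    directives = normalized.get("cache-control", "").split(",")
    names = {d.split("=", 1)[0].strip() for d in directives if d.strip()}

    # Simplifying assumption: any of these is read as "do not keep a copy".
    return not ({"no-cache", "no-store", "private"} & names)
```

A cache that runs a check of this kind before storing each file has, in effect, received notice of the author's wishes for that file.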
Another copyright remedy provision permits an author to seek injunctive relief in addition to monetary damages. This relief is granted either in the form of a preliminary injunction or as part of the final judgment. If preliminary injunctive relief is sought, it does not necessarily follow that the cache must be completely shut down. The complaining party must establish four requirements: first, that it has a substantial likelihood of prevailing on the merits; second, that it will suffer irreparable injury if it is denied the injunction; third, that its threatened injury outweighs the injury that the opposing party will suffer under the injunction; and fourth, that an injunction would not be adverse to the public interest. The plaintiff can easily show that copying has occurred with injury to his rights by the public distribution. However, the latter two elements lean in favor of the cache operator. Essential to the cache server's defense is that the operator has the ability to determine what information is cached. The cache software could be reprogrammed or configured to disregard a particular file or even all data originating from an entire domain. This voluntary cooperation by the cache operator diminishes the harm to the plaintiff because even though the cache is still operating, the author's files are no longer being copied. Additionally, the harm to the public is severe. This remedy could have grave consequences to the operation of the internet. With file caching reducing network traffic by at least forty percent, such a court order could be far reaching. It should be argued that granting such a remedy not only harms the cache operator, but consequently, every user of the internet.
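The reconfiguration described above, in which a cache disregards a particular file or an entire domain while continuing to operate, can be sketched as a simple exclusion list. The class name, method name, and example addresses are hypothetical and are not drawn from any actual cache software.

```python
from urllib.parse import urlsplit


class ExclusionList:
    """Files and domains that a cache operator has agreed not to store."""

    def __init__(self, excluded_urls=(), excluded_domains=()):
        self.urls = set(excluded_urls)
        self.domains = {d.lower().lstrip(".") for d in excluded_domains}

    def is_excluded(self, url):
        # A single file named by its full address.
        if url in self.urls:
            return True
        # Any file whose host is an excluded domain or one of its subdomains.
        host = (urlsplit(url).hostname or "").lower()
        return any(host == d or host.endswith("." + d) for d in self.domains)


rules = ExclusionList(
    excluded_urls=["http://news.example.org/story.html"],
    excluded_domains=["example.com"],
)
```

A cache that consults such a list before storing a file keeps serving every other site, which is the point of the argument that a narrow order can protect the plaintiff without shutting down the cache.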
A permanent injunction may be attained by a prevailing plaintiff as a part of the court's final judgment if it is reasonable to prevent or restrain infringement. The end result of infringement litigation would be ineffective if future infringement could not be proscribed. Courts should issue permanent injunctions when liability has been established and there is a threat of continuing violations. However, such an injunction can achieve its goal of preventing future infringement of the plaintiff's works without shutting down the entire cache. Shutting down an entire internet caching scheme is not necessary as discussed above under the preliminary injunction analysis.
IX. AFFIRMATIVE DEFENSES
The conclusion reached so far is that information stored in a cache is the proper subject matter of the Act under section 102, but it also interferes with the exclusive rights of the author provided in section 106. Accordingly, there is liability unless the activity is saved by an affirmative defense. Three affirmative defenses are potential candidates for cache operators: fair use, the section 117 computer program exception, and a theory of implied license. A distinction regarding who commits the first instance of infringement is worth considering before these defenses are presented.
Works are placed on the internet either by the author or a potential infringer. In the first case, it is the author who personally creates the web pages and places his work on-line. This may also be accomplished by another acting at the author's direction. The work is placed on-line lawfully, and the cache is committing the first act of infringement. In the second instance, a work is placed on-line without the author's permission. An example of this conduct is the copying of an article from a local newspaper and placing it on a web page without the permission of the owner. Infringement has occurred by the author of the web page, and the cache is committing a subsequent infringing act. As will be demonstrated, different defenses are more likely to be successful depending upon the categorization of the caching activity.
A. FAIR USE
The doctrine of fair use, which has been around for 200 years, is an affirmative defense to copyright infringement. Its application by the courts has often created unpredictable results. Within a two year period, the Supreme Court ruled that the use of a video cassette recorder to tape live television shows for later viewing was a fair use, while the copying of 300 words from a 200,000 word unpublished manuscript was not. Aside from being an equitable doctrine, fair use has been codified into the Act and permits infringement for purposes including criticism, comment, news reporting, teaching, scholarship, or research. The application of section 107 requires a case-by-case determination whether a use is "fair use." Four factors are used to determine whether an otherwise infringing work is entitled to fair use protection: the purpose and character of the work, the nature of the copyrighted work, the amount and substantiality of the portion used in relation to the entire copyrighted work, and the effect upon the potential market or value of the original work. Several cases have interpreted each of the four factors, and have left open the opportunity for courts to apply equitable considerations in unusual cases.
1. The Four Factors. The first factor, the purpose and character of the use, pertains to whether the infringer's use is for noncommercial purposes. Traditionally, a presumption arose that a work made for a commercial purpose weighs against a finding of fair use. There was concern that this factor could swallow the entire fair use doctrine exception since many fair uses can be deemed commercial. Two fair uses believed to be in jeopardy were television news reportings and criticisms. Recognizing this concern, in 1994 the Supreme Court clarified this issue and opined that the commercial nature of a work will not bar a finding of fair use. However, the commercial nature of the infringer's use leans in favor of the copyright holder if the duplication is for financial exploitation.
The second factor looks at the nature of the work to determine the amount of protection it is entitled to receive. The copyright laws are intended to protect an author's expression without creating a monopoly over the ideas being expressed. The use of facts deserves greater dissemination without the concern of liability than works that are truly original, such as fantasy and fiction. Consequently, there is a greater likelihood of a finding of fair use when a work contains only slight creativity.
The third factor rests on the amount of the protected work taken by the infringer. The portion used is measured in terms of both qualitative and quantitative amounts. A finding of fair use may be denied if too much of the protected work is actually copied and used by the infringer. However, even a slight taking has been found to be sufficient to support a denial of the fair use defense. One court has held that "a small degree of taking is sufficient to transgress fair use if the copying is the essential part of the copyrighted work."
The last factor, addressing the commercial impact of the use on the copyright holder, has been considered the most important factor. This factor is given heavy consideration to assure that the application of fair use does not "impair materially the marketability of the copied work." When balancing this fourth factor, the courts must determine whether widespread availability of the infringer's use will displace the market for the original protected work. This concern includes harm to the market not only for the protected work, but for any derivative works. The copyright owner may demonstrate market harm with evidence that he would have had significantly higher revenues from the work but for the defendant's copying. Proof of present lost profits is not required.
2. Case Law and Equitable Considerations. Equitable considerations remain an essential part of the fair use doctrine since the defense's codification as section 107 of the Act. The fair use defense should be applicable "where the purpose of the use is beneficial to society, complete copying is necessary given the type of use, the purpose of the use is completely different than the purpose of the original, and there is no evidence that the use will significantly harm the market for the original." The doctrine allows courts to bypass an otherwise inflexible application of copyright law when it would impede the creative activity that the Act was originally intended to stimulate.
In Sega Enterprises Ltd. v. Accolade, Inc., the copyright owner was the manufacturer of a computer video game system as well as several game cartridges. The defendant, Accolade, decided to compete in the video cartridge market by also providing game cartridges to be played on Sega's game system. However, in order for Accolade to write sufficient code to enable its game cartridges to work on Sega's system, Accolade had to look at the code contained in Sega's copyrighted pre-existing game cartridges. Sega brought suit claiming the duplication of its game cartridges for the purposes of viewing its code infringed its copyrights. Notwithstanding the copying of Sega's software, the court found this to be a fair use. Looking at the first factor, the court disagreed with Sega's position that the defendant's copying was a commercial use since Accolade planned to compete against them in the market. Rather, the copying was done for the purpose of discovering the functional requirements for compatibility with Sega's game system.
Additionally, the court noted that it was free to balance the public benefits derived from the fair use despite the potential commercial gain. In validating the defendant's fair use, the court held that the "[p]ublic benefit need not be direct or tangible, but may arise because the challenged use serves a public interest." The court found dispositive the benefit to the public of competition in the video game market which would bring about creativity and expression in the form of more video games.
An analogous case to caching and the fair use defense arose in Religious Technology Ctr. v. Netcom On-Line Communication Servs., Inc. The plaintiffs owned the copyrights to works of the Church of Scientology that were made available on the internet. In addition to bringing their action against the individual who placed these works on-line, claims were also made against his internet service provider. There was no issue that placing the works on-line constituted infringement of the plaintiffs' copyright. After holding that the fair use defense was available and created a genuine issue of material fact, the court found that the internet service provider could not be held directly liable. While not based on fair use principles, the court found that it did not make sense to "adopt a rule that could lead to the liability of countless parties whose role in the infringement is nothing more than setting up and operating a system that is necessary for the functioning of the Internet." Rather than hold the entire internet liable, the court believed that liability rested with the party who caused the original infringing copies to be made. There was no way for Netcom to prevent the infringement because billions of bits of information pass across the internet and are necessarily stored on servers. Further, their services were essential for public access to the internet. The only remaining issue on remand for Netcom was contributory liability.
The remaining case on fair use that is relevant to file caching is Sony Corp. of America v. Universal City Studios, Inc. An action was brought against Sony, as the manufacturer of video tape recorders (VTR), for providing a device that was used by consumers to tape record protected television shows without authorization. The theory was based on Sony's marketing of the VTRs for such uses. In holding for Sony, the Supreme Court found that the use of VTRs to record protected television shows for later viewing was a fair use. The Court called this activity "time shifting." While it only looked at a few of the fair use factors, the Court made a strong factual determination that favored the use of VTRs for this purpose. The Court believed that the plaintiffs suffered no harm. Programs were not stolen, but rather recorded for later viewing. There was evidence that many users reuse the same video tapes and tape over previously recorded shows. In weighing the fourth factor of section 107 in favor of the defendants, the Court stressed the plaintiffs' admission that there was no actual harm to date. In making this landmark decision, the Court followed the traditional principle that "the doctrine is an equitable rule of reason, no generally applicable definition is possible, and each case raising the question must be decided on its own facts."
3. Caching and Fair Use. The result awarded to the defendants in Sony should be granted to file caches when the equitable principles of section 107 are applied. Additionally, two of the four factors of section 107 favor file caching's use. Under the first factor, the purpose and character of file caching is not commercial. Not all file caching servers or mirrored sites derive a pecuniary benefit from the information they provide. Many of them are maintained by grants from the federal government or scientists and universities for research studies. Most users are not even aware that the information they receive has been cached before arriving on their screens. The purpose of caching is to assist the internet to operate efficiently. As discussed, caching reduces network load and minimizes the burden placed on popular web sites. The purpose and character factor weighs heavily in favor of caching mechanisms.
The second and third factors favor the copyright owner. The internet could be considered a universe of expression and creativity. With millions of users around the world, web pages are created for unlimited purposes. This originality and creativity has been attracting most noncommercial users to the internet. Undoubtedly, the nature of the copyrighted work factor favors the copyright owner. The third factor is unfavorable to the defense since caches make identical duplicates of the information being passed across the internet. The entire work is duplicated each time a file is stored in a cache.
The fourth factor is more likely to favor caching the work since there is no displacement in the market for the original. Most caches are not placed on-line to compete with authors of protected works. Caches are utilized to assist authors in distribution as often as requested. Users do not have the ability to ask a cache for a copy of someone else's work. A cache only retrieves a copy of a work from an author's site if doing so is network efficient. When the fourth factor of section 107 favors a copyright owner, it is because the use results in the creation of a second work, not belonging to the author, that competes in the marketplace against the original. A cache is used to deliver an author's work and not to create a second work. The author is placed in no worse a position than if a cache was not involved. File caches should enjoy the fair use defense as a consequence of this fourth factor having the most weight.
While the four factors tip the scale in favor of file caching, the equitable principles are equally persuasive. The public benefits of caching are far greater than the potential harm to owners of protected works. The Netcom court believed it would be absurd to hold the entire internet liable for infringement. It makes similar sense to not hold caches liable because they decrease network traffic by at least forty percent, reducing the overload of some of the more popular web sites.
File caches on the internet are analogous in many ways to the application of VTRs to public broadcasts. The Sony court believed that time-shifting devices did not infringe protected works and provided a valuable service for their owners to view programs at more convenient times. File caches are "source shifters." The same information is still received from the same source, but it is traveling a greatly reduced distance across the internet. Unlike the facts of the Sony case, web sites have the ability to bypass caching. The cache may be overridden by using a "do not cache me" header or by setting an already expired expiration time. Caching is also helpful in many other aspects. Not all information on the internet that passes through a cache is copyrighted. Further, the entire purpose of placing information on-line is for others to view it. Should such viewings be unauthorized solely because they are accomplished by a more efficient means? Finally, cached information is only stored temporarily and is often erased from the cache within forty-eight hours. There is no injury to authors of protected works on the web that reasonably justifies the denial of this defense to file caches.
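The opt-out mechanisms described above can be illustrated with a short sketch of how a cache might decide whether to keep a file and for how long. This is a simplified illustration under stated assumptions, not a description of any particular cache: the function name cache_lifetime is hypothetical, the "do not cache me" header is assumed to take the form of the HTTP Pragma or Cache-Control header, the expiration time is assumed to arrive in the HTTP Expires header, and the forty-eight hour ceiling is taken from the figure mentioned in the text. A production cache would typically revalidate a "no-cache" response with the origin server rather than simply decline to store it.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: the "forty-eight hours" figure from the text is used as a ceiling.
MAX_CACHE_AGE = timedelta(hours=48)


def cache_lifetime(headers, now):
    """Return how long a cache may keep a response, or None if the
    author's site has opted out of caching (hypothetical helper)."""
    headers = {name.lower(): value for name, value in headers.items()}

    # A "do not cache me" header: honor it and store nothing.
    pragma = headers.get("pragma", "").lower()
    directives = {d.strip() for d in headers.get("cache-control", "").lower().split(",")}
    if "no-cache" in pragma or directives & {"no-cache", "no-store"}:
        return None

    # An expiration time set by the author's site.
    expires = headers.get("expires")
    if expires is not None:
        try:
            expiry = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return None  # an unreadable date is treated as already expired
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)
        remaining = expiry - now
        if remaining <= timedelta(0):
            return None  # already expired: the author has bypassed the cache
        return min(remaining, MAX_CACHE_AGE)

    # No instruction from the author: keep the file only temporarily.
    return MAX_CACHE_AGE
```

Under these assumptions, a site that sends "Pragma: no-cache" or an Expires date in the past is never stored, while a site that gives no instruction is cached for at most forty-eight hours. The point for the fair use analysis is that the default can be overridden by the author at any time.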
B. SECTION 117
Section 117 of the Act provides an exception for uses of computer programs which would otherwise violate the exclusive rights afforded in section 106. Added by the legislature to the Act in 1980, this section permits the owner of a copy to make or authorize the making of another copy or adaptation of the computer program. However, in doing so it must be an essential step in the utilization of the program on a computer or for archival purposes only. Three portions of section 117 have repeatedly been at the heart of disputes: ownership of the copy, the essential step, and adaptations of the program. Addressed next are the first two issues.
Courts have taken different views of the ownership requirement. Under a literal interpretation given by several courts, section 117 is only available to owners of copies and not to licensees or possessors, whether authorized or otherwise. In a case involving an individual who knew that the copy being used was without permission, the court denied the section 117 defense without further inquiry. A broader interpretation of section 117 has been handed down by two courts. In ProCD, Inc. v. Zeidenberg, a district court took a contrary position, providing a thoughtful analysis in its ruling that a licensee can invoke the defense granted by section 117. While the case was subsequently reversed on the grounds that shrink-wrap licenses are enforceable, the appellate court did not address the district court's section 117 analysis. In ruling that a legitimate holder of a computer program is entitled to the section 117 protections, the lower court cited to CONTU's report as cited by another district court. In Foresight Resources Corp. v. Pfortmiller, a district court in Kansas reached a similar conclusion, finding that a defendant who was a licensee under a contract with the plaintiff was the lawful owner of a copy of the infringing program. Despite these holdings, Congress enacted CONTU's proposed section 117 with only a single change. Congress selected "owner" over CONTU's choice of using "rightful possessor" as the term to describe who the section would protect.
For purposes of section 117, cache operators may be owners of copies of the files stored on their servers depending upon the author's intent. An author who places a work on-line via a web site must contemplate that his work will be received by browsing users. An author should know that the works displayed at his site will be cached prior to viewing since most browsing software contains internal caching. The issue presented is whether an author, who has knowledge that his on-line work will be cached either in transit to its destination site or by the browser, intended to convey an ownership interest to the cache operator. It appears unusual that such an interpretation of section 117 would be based upon one's subjective intent. In situations where the infringer is the web site that placed another's works on-line without permission, a cache server would not receive immunity under 117 because the web site sending the copy has no ownership interest to convey.
A license is defined as "[p]ermission to do a particular thing, to exercise a certain privilege or to carry on a particular business ...." It would not be unreasonable for a court following the broader interpretation to find the cache operator a licensee of the author. Licenses do not need to be in writing and may be presumed from the circumstances. The author who places his work on-line, knowing it will be cached before reaching its intended audience, must intend to grant some limited right. Any other conclusion creates the inequitable state where an author may unilaterally contrive situations in which his works will be infringed by a cache at any instant he so desires.
The strongest argument against the application of section 117 to internet file caching is the limiting definition provided to computer programs. Section 117 only protects "computer programs" from infringement, which are defined under the Act as "a set of statements or instructions to be used directly or indirectly in a computer in order to bring about a certain result." The plain meaning provided to computer programs does not encompass other types of files that are more frequently cached, such as graphical images, sound files, motion pictures, and textual works. None of these file types provide the computer with any set of instructions to be executed. Rather, they are data files which are secondary sources of information used by computer programs. This takes most of the files stored in a cache out of section 117's immunity. However, at least one amendment has been suggested to broaden this protection.
Whether file caching is an essential step of a computer program in conjunction with a computer under 117 depends upon the interpretation of "essential step." Section 117 was enacted to "provide a legitimate holder of a computer program with permission to do that copying of the program which is necessary for him to be able to use it in his computer without running afoul of possible infringement actions." The narrow view is that the copy must be required for the actual use of the program itself. In Micro-Sparc, Inc. v. Amtype Corp., a magazine provided sample program listings which could be typed in by readers. The defendant was a company providing the service of typing in each month's listings and delivering them to subscribers on a floppy disk. The court rejected the defendant's section 117 defense because it believed that only the user could input the copy into the computer. The defendant made a copy to spare subscribers from having to type it in themselves. This was a step of convenience rather than an essential one. The court noted that it would not be unlawful for subscribers to input the programs into their computers themselves.
Contrary to this view, the ProCD court gave the "essential step" requirement a broader interpretation. In holding that placement of a copy of a computer program onto a hard disk would be immune under section 117, the court noted that any other finding would be "rather ridiculous." The court believed that if section 117 was intended to apply only to RAM, hard disk use would become very limited, section 117's purposes would be undermined, and new programs would lose sales appeal. The court interpreted the essential step requirement broadly to include "a[n] essential [step] for the effective use of today's computer software."
For file caching to receive immunity under section 117, it must be determined that the acts of caching are essential to the use of the software. The first difficulty arises from the fact that the program is being copied by the third party cache and not the copy owner. This is analogous to the issue presented in Micro-Sparc, Inc. The copy is not even being placed in RAM or on the hard disk of the user's computer because it is being made by a third party. Therefore, the question is just how essential caching is to the internet's operation to justify any extension of section 117's requirement that the copy be an essential step in the program's utilization. Only one court has defined "essential," determining that it means "indispensable and necessary." In conformity with this definition, cases have held that the "essential step" requirement is limited to the acts of inputting the program into the computer, loading the program into the computer's memory in order to permit the computer to execute the program, and making changes to a program in order to maintain the original software. Following these interpretations, caching is not essential at all to the program's operation. Caching exists entirely to assist with the delivery of the program to the user from a web site. A program's operation is independent of whether file caching is utilized. The only effect of using a program without file caching is that its delivery to the user may be prolonged. Given that file caching is only essential to the operation of the internet and packet delivery, protection under section 117 would be a very generous interpretation that was never contemplated by Congress or CONTU.
C. IMPLIED LICENSE
The last affirmative defense that may protect file caches from copyright infringement implies a license between the author of the protected work located at a web site and the cache operator. The creation of an implied license is drawn from contract law and is not a strict matter of copyright law. Under contract law, "a contract implied in fact arises under circumstances which, according to the ordinary course of dealing and common understanding of men, show a mutual intention to contract." Implied licenses have been used to protect other forms of intellectual property as well.
The concept of an implied license is not explicitly provided for in the Act, but is consistent with the reading of three sections. Section 106 provides a series of exclusive rights that belong to copyright owners. Section 204 addresses how those rights may be transferred. Under this section, all transfers of copyright ownership must be in writing. Section 101 defines "transfer of copyright ownership" to include assignments, exclusive licenses, and any other conveyance that transfers exclusive rights. Nonexclusive licenses are expressly excepted from the definition. These sections are interpreted as meaning that nonexclusive licenses are not transfers of ownership; therefore, they do not need to be in writing. Nonexclusive licenses may be granted orally or implied from the conduct of the parties since there is no writing requirement.
There is no bright line rule for the creation of implied licenses since they may arise out of many different types of understandings. A license by implication may arise where there was no contract between the parties. In order to determine whether such a license was granted and to what extent, a court must consider all the circumstances surrounding the negotiations made between the parties. It is necessary to examine the intent of the parties in making a determination and whether there was a meeting of the minds. To assist with its findings, a court may also consider trade usage.
Courts have found implied licenses in at least two situations. In Effects Associates, Inc. v. Cohen, the author of a protected work hired Effects to enhance a film. While the parties agreed that Effects would be compensated for its work, the agreement was never placed in writing. Effects performed its part of the agreement and handed the finished product over to the author. Dissatisfied with the results, the author refused to pay Effects. Effects then brought an action for copyright infringement. After finding that Effects was the copyright holder, the Ninth Circuit found there was no infringement because an implied nonexclusive license was created between the parties. Ostensibly, the court created the following rule to be applied in future cases: when an individual creates a work at the request of another, hands it over, and intends for that recipient to copy and distribute it, an implied license for the recipient is created.
In another case, the United States Court of Federal Claims found that when an individual hands his work over to another with no expectation of remuneration and with the anticipation that it will be copied and published, an implied license to publish without royalties is created. The court asked whether a reasonable person in the licensee's position would believe that the owner consented to and even encouraged publication. It was dispositive to the court that the owner voluntarily submitted his work to a committee whose purpose was to publish a book on that material. In making its ruling, the court cited to another case for the position that when a person voluntarily submits a work for publication, an implied license is created.
Whether an implied license exists between cache operators and authors of on-line works depends upon how a court views the sequence of events. Authors who place works on the internet know in advance that the works are available to millions of users around the world. No royalties may reasonably be expected in return by the author since the mere placing of a work on-line does not necessarily entail any compensatory scheme. From an objective point of view, someone placing a work on-line has no expectation of payment or limited distribution and intends that it be viewed by others. Authors who place their works on-line are expected to realize that other users may only view their works through use of the network hardware. Users do not go to an author's computer to view the works. The works are transferred over the internet to users. Given this constant of network and computer technology, the author must concede to this type of operation if anyone is ever to view his works on-line. Caching is a necessary mechanical part of network operations. In order for an author to know about the internet, he or she most likely has used it before through browser software. Authors must anticipate that others access the internet in the same manner and have also been introduced to one caching scheme or another, particularly, a cache that is contained within their browser software.
Caching has been a part of the internet for several years and is considered a common practice. While this factor weighs in favor of an implied license, it is arguable that two parties may not make such an agreement when they have only met via the TCP/IP protocol. In other words, how can a license be implied from the conduct of two parties that have never met and have only interacted by automated computer network operations? Quite simply, the focus is on the conduct of the author placing his work on the internet. Such conduct amounts to a unilateral offer made by an author to the world that "if you can find my IP address, I have files to send you." Past conduct supports this theory since all internet users globally adhere to this standard and have the same expectations. By putting an IP address into a web browser, files are requested from a web server, voluntarily delivered, and then viewed on the user's screen. While there may be no specific prior dealings between the parties on the internet, all dealings are conducted in an identical manner. It should never be a surprise to an author that his works were viewed by others and cached somewhere along the way subsequent to his placing them on-line.
The other party to the implied license, the user, has a reasonable expectation that the information he is receiving may be viewed without liability. Users purchase browser software and connect to the internet to conduct research, communicate with friends, and explore the collective information that is available on-line. When browsers connect to an IP address, they are either sent HTML files from web servers or the connection is denied. It is only after the connection is made and files are transferred to the browser that their contents are viewed. Many of these files are cached along the way, often by the browser. Prior to the receipt of these files, their contents are unknown to users. Only subsequent to caching and viewing files may their contents be understood. It would be senseless to hold that once the user views a work on a web page that says, "this information is not to be cached or viewed by a user without permission of the author," he is an infringer because he received the notice too late. It seems implicit that permission must be provided to grant a user rights to view the work or an author's initial statement. An author who does not desire that his works be cached may either use an HTTP header directive or place a disclaimer on his home page. The disclaimer should require the user to agree to the author's terms before the delivery of other files containing protected works ensues. Based on these facts and circumstances, cache operators have a strong argument for the implication of a license from authors who place their protected works on the internet.
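The HTTP header directive available to an author can be sketched from the web site's side. The example below is illustrative only: the function name opt_out_headers is hypothetical, and the particular headers chosen (Pragma, Cache-Control, and an Expires date equal to the moment of the response) are assumptions about one common way a server could signal that a page should not be cached.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def opt_out_headers(now):
    """Build response headers by which an author's server could ask
    caches not to keep a page (hypothetical helper)."""
    return {
        "Pragma": "no-cache",         # older "do not cache" signal
        "Cache-Control": "no-store",  # newer equivalent directive
        # An expiration time equal to the response time is already expired.
        "Expires": format_datetime(now, usegmt=True),
    }
```

An author who sends such headers with every page has given notice before, rather than after, the work is delivered, which avoids the problem of a disclaimer that the user can read only once the file has already been cached and viewed.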
As technology evolves at a faster rate than statutory law, courts are frequently presented with contemporary issues that have a significant impact on the future. It is clear that file caching was never anticipated by either Congress or CONTU. The only certainty under the current copyright act is that cached materials fall within the subject matter of copyright law. The responsibility of safeguarding the public interest and benefits of file caching shall remain with the judiciary as long as the matter of infringement creates no genuine factual issues and statutory damages are assured.
Three affirmative defenses were presented to provide caching with immunity from harmless and beneficial infringement. While section 117 is not a feasible protection, fair use and implied licensing may produce the more equitable result. Under the statutory factors of fair use, two of the four factors balance in favor of file caching. The purpose of caching is for network efficiency and public benefit rather than for commercial exploitation, and there is no harm to the market for the author's originals. In light of the Sony decision, file caching is analogous to the tape recording of television shows for the purpose of "time-shifting." File caches copy web page files solely to assist authors in reaching their intended audiences. The author's file is still sent to its destination, but it is copied as an incidental part of its transmission over the internet. The end result is the relocation of the source of the file to a location closer to the requesting user. Fair use has always been an equitable doctrine, and its application to protect the operators of file caches is not unprecedented.
Notwithstanding the fair use defense, courts have the power to imply a nonexclusive license between file cache operators and authors. From an objective perspective, authors who voluntarily place their works on the internet for public dissemination may not understand the internet's protocol intricacies, yet they realize that files must pass through a series of network hardware and telephone lines in order to reach their intended audiences. Caching is a part of the internet's functions and is becoming more essential each year. Quite simply, authors must at least consent to the use of internet technology to enable users to receive works which they intended to be delivered in the first place. It would not be an exaggeration of the law for a court to make this finding.
The future of commerce, telephony, and global interaction is on the internet. As the net's uses and popularity grow, so must the computers and network technology behind its operation. Advances in science and technology are the result of public demand and the quest for scholarship. Many problems and limitations of expanding technology are addressed every day by engineers, scientists, and researchers. The public's ever-increasing demand for technological solutions should not be constrained by too slowly evolving law.