Crowdsourcing: Difference between revisions
BerkmanSysop (talk | contribs) (UTurn to 1297036800)
(220 intermediate revisions by 11 users not shown)
=== Definitions ===
At present there is no generally accepted definition of crowdsourcing, and commentators have used many different meanings. Therefore, we believe an overview of the various definitions offered is helpful for further discussion of different types of crowdsourcing. For the purposes of our discussion, we will be using the terms "crowdsourcing", "crowdwork", and "cloudwork" interchangeably.
The most widely accepted definition of crowdsourcing comes from Jeff P. Howe, who recognized it as "the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call."[http://crowdsourcing.typepad.com/cs/2006/06/crowdsourcing_a.html]
He further clarified that the form of crowdsourcing could be either peer production (when co-workers interact and collaborate on projects) or sole individuals (when co-workers, if any, are isolated from one another). Under Howe's definition, the employer must be an organization (in most cases, a corporation), because he was considering crowdsourcing as a new type of corporate business model, by which corporations could raise current productivity or establish new businesses that were not possible before. Nevertheless, we do not think the employer using crowdsourcing, as a matter of definition, must be an organization; individuals can certainly outsource a task to an online crowd.
Kleemann and Voß (2008) argue that "central to the concept of crowdsourcing is the idea that a crowd of people, collaboratively (or at least simultaneously) contribute to an aspect of the production process or to the solution of a design issue or other problems."[http://www.sti-studies.de/fileadmin/articles/kleemannetalstivol4no1.pdf]
Although we agree that simultaneous or collaborative work is a significant type of crowdsourcing, it is not the only one. The Best Practices entry for crowdwork, developed last year and reposted on [[Class 3]], divides crowdwork into three categories: "First, a large group of workers may do microtasks to complete a whole project; the best-known platform in this arena is Amazon Mechanical Turk. Second, companies may use cloudwork platforms to connect with individual workers, or a small group of workers, who then complete larger jobs (e.g., Elance[http://www.elance.com/p/landing/buyerA5.html] and oDesk[http://www.odesk.com/#reloaded]). Finally, a company may run 'contests,' where numerous workers complete a task and only the speediest or best worker is paid (e.g., InnoCentive[https://www2.innocentive.com/] and Worth1000[http://www.worth1000.com/]). In some contests, the company commits to picking at least one winner; in others, there is no such guarantee." It is clear that when the crowdsourcing takes the form of competitive bidding, not every participant works on a single aspect of the task; each of them works on the whole task, and they do not have to work at the same time. Only the final winner gets compensated. It is possible that only one individual or organization joins the bidding process and no competing parties are involved.
Another concept that Reichwald and Piller (2006) used to describe crowdsourcing is "interactive value creation". They further differentiate two types of crowdsourcing: mass customization and open innovation.[http://www.sti-studies.de/fileadmin/articles/kleemannetalstivol4no1.pdf] We do not think Reichwald and Piller's approach is convincing. First, the general term "interactive value creation" can be used to cover many types of online collaborative activities traditionally not recognized as crowdsourcing, e.g., open source development. Second, mass customization refers to an isolated customer's activity to tailor one particular product, rather than contributions to a general product. Their second type, "open innovation", correctly points out that crowdsourcing should be outsourced through an open call, but we do not believe a strong degree of "innovation" is necessary, especially for highly divided microtasks.
Based on the above discussion, we believe that there are two core elements of crowdsourcing, both of which may be facilitated by an online platform (such as Amazon Mechanical Turk[https://www.mturk.com/mturk/welcome]):
1. The task is outsourced through an open call from the employer;
2. The recipients of the call, whether or not they elect to participate, comprise a large, amorphous crowd.
The following discussion of crowdsourcing reflects our understanding of three of the most significant types of crowdsourcing: microtasks, "professional" tasks, and "game" tasks. Microtasks facilitated by platforms such as Amazon Mechanical Turk [https://www.mturk.com/mturk/welcome] might be the typical type of crowdsourcing in the literature.[http://www.clickadvisor.com/downloads/Howe_The_Rise_of_Crowdsourcing.pdf] Not only is there an open call by an employer to the crowd, the crowd also collaborates on the whole task by each member addressing a small piece of the pre-divided work. Professional tasks, in contrast, might have multiple people working on them, but there is no collaboration among them, and their work usually involves a higher degree of innovation. Game tasks are again a different animal, since they copy many attributes of the other two, but add another layer by aiming to be entertaining as well as purposeful.
=== General Information on Crowdsourcing ===
*''General Information''
**For a quick overview by Jeff Howe, author of ''Crowdsourcing'',[http://books.google.com/books?id=ge_0LBOcwWsC&printsec=frontcover&dq=crowdsourcing&hl=en&ei=QinGTKS9AcGAlAeHtLX-AQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CC8Q6AEwAA#v=onepage&q&f=false] take a look at this YouTube clip.[http://www.youtube.com/watch?v=F0-UtNg3ots]
**Northwestern University Professor Kris Hammond also explains crowdsourcing, but argues that its downsides are worker rewards and quality.[http://www.youtube.com/watch?v=eX7RiV-wa_s&feature=related]
**Our very own Jonathan Zittrain discusses crowdsourcing in his talk, ''Minds for Sale''.[http://www.youtube.com/watch?v=Dw3h-rae3uo]
**Several individuals gathered to discuss crowdsourcing in a panel moderated by New York Times correspondent Brad Stone.[http://www.youtube.com/watch?v=lxyUaWSblaA]
*''In the News''
**The New York Times recently ran an article on crowdsourcing featuring two crowdsourcing companies:[http://www.nytimes.com/2010/10/31/business/31digi.html?_r=1&ref=technology] Microtask[http://www.microtask.com/] and CloudCrowd.[http://www.cloudcrowd.com/]
**It's interesting to note that these companies are attempting to monetize crowdsourcing in exactly the way in which Howe says it cannot be monetized successfully.
*''Examples of Crowdsourcing''
**Take a look at Wikipedia's compilation.[http://en.wikipedia.org/wiki/List_of_crowdsourcing_projects]
== Crowdsourcing Literature ==
=== General Overview ===
Although the idea of crowdsourcing--if not the word itself--has been around for many years, the Internet has made it much easier, cheaper, and more efficient to harness the power of crowds. The power of crowds was popularized in 2004 when James Surowiecki published a book entitled ''The Wisdom of Crowds.''[http://books.google.com/booksid=hHUsHOHqVzEC&printsec=frontcover&dq=the+wisdom+of+crowds&hl=en&src=bmrr&ei=T0DtTI2FGYP88AbA78GaAw&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCsQ6AEwAA#v=onepage&q&f=false] This book purported to show how large groups of people can, in many cases, be more effective at solving problems than experts. According to Surowiecki (2004: xiii), "under the right circumstances, groups are remarkably intelligent, and are often smarter than the smartest people in them." Two years later, journalist Jeff Howe coined the term "crowdsourcing" to refer to work that was performed by the "masses" online.[http://www.wired.com/wired/archive/14.06/crowds.html] Since Howe's article was published in 2006, numerous authors have written books on crowdsourcing, each choosing to focus on different aspects of the topic.
Howe himself took up the topic in 2008, proclaiming crowdsourcing to be a panacea--a place where a perfect meritocracy could thrive.[http://books.google.com/books?id=ge_0LBOcwWsC&printsec=frontcover&dq=crowdsourcing&hl=en&ei=QinGTKS9AcGAlAeHtLX-AQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CC8Q6AEwAA#v=onepage&q&f=false] Howe examined crowdsourcing from a variety of perspectives: what benefits it can provide, what kinds of tasks it can accomplish, and the potential changes it may bring about. Howe's prognosis for crowdsourcing was positive--in it he saw many potential solutions and few potential problems. Others have followed Howe's lead in describing the benefits of crowdsourced work. Clay Shirky has published two books--''Here Comes Everybody'' (2008)[http://books.google.com/books?id=UNxU-2s2sQYC&printsec=frontcover&dq=clay+shirky+here+comes+everybody&hl=en&src=bmrr&ei=TEHtTMD6FIT58Aa54JVs&sa=X&oi=book_result&ct=book-thumbnail&resnum=1&ved=0CCQQ6wEwAA#v=onepage&q&f=false] and ''Cognitive Surplus'' (2010)[http://books.google.com/books?id=_U1nQgAACAAJ&dq=cognitive+surplus&hl=en&src=bmrr&ei=f0XtTMnsFcOC8gaD4MGRAw&sa=X&oi=book_result&ct=book-thumbnail&resnum=1&ved=0CCsQ6wEwAA]--in which he describes how technology does more than enable new tools; it also enables consumers to become collaborators and producers. Although Shirky's books are not expressly about crowdsourcing, they mirror the optimism Howe expresses, both in terms of collaborative enterprises and the Internet's power to enable them.
These books have provoked an academic interest in finding out ''who'' is the crowd, or why the crowd moves the way it does. Some have looked at scientific crowdsourcing, asking what characteristics make someone a successful crowdworker/problem-solver.[http://www.hbs.edu/research/pdf/07-050.pdf] Part of answering that question, it turns out, requires asking why people attempt to be part of the innovating crowd in the first place. The authors of this study found that the crowd was highly educated. They also found heterogeneity in scientific interests, as well as monetary and intrinsic motivations, to be important drivers of "good" problem-solvers. Others have examined non-scientific endeavors and asked similar questions. One report, for example, examined workers on iStockphoto.[http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/2159/1969] This report found that most iStock crowdworkers developing photographs were highly educated and motivated primarily by money. Jeff Howe, however, takes a different perspective as to crowdworkers' motivation: "There are . . . two shared attributes among almost all crowdsourcing projects: participants are not primarily motivated by money, and they’re donating their leisure hours to the cause. That is, they’re contributing their excess capacity, or 'spare cycles,' to indulge in something they love to do." (''Crowdsourcing'', pp. 28-29.)
While some focused on the potential consumer revolution or the composition of the crowd, others examined the business-related aspects of crowdsourcing. Identifying attributes of successful crowd innovators also has a business dimension. One researcher suggests that having experience spanning a variety of communities or disciplines makes one likely to be considered 'innovative.'[http://74.125.155.132/scholar?q=cache:Ro0Ij3Y41wMJ:scholar.google.com/&hl=en&as_sdt=40000000&sciodt=40000000] Others focus more broadly on how to use the crowd to maintain or bolster a business or brand. In ''Groundswell'' (2008),[http://books.google.com/books?id=YOVuQFXNcP4C&printsec=frontcover&dq=groundswell&hl=en&ei=ZF_tTKjtLIL58Ab-tu26AQ&sa=X&oi=book_result&ct=book-thumbnail&resnum=1&ved=0CCsQ6wEwAA#v=onepage&q&f=false] Charlene Li and Josh Bernoff focus on how businesses can use crowdsourcing most effectively. The authors highlight how user bases of products can undermine a product or brand.[http://books.google.com/books?id=YOVuQFXNcP4C&printsec=frontcover&dq=groundswell&hl=en&src=bmrr&ei=tUPtTPOeAYP-8AbcxemdAQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCgQ6AEwAA#v=onepage&q&f=false] As a result, the authors propose that businesses use the "groundswell" to their advantage, fostering communities that can provide valuable feedback and economic payoffs. Marion K. Poetz and Martin Schreier have also taken a business perspective on crowdsourcing,[http://ssrn.com/abstract=1566903] arguing that the crowd is capable of producing valuable (but not always viable) business ideas at a low cost.
Other researchers have found that young entrepreneurs who were attempting to start businesses frequently belonged to these kinds of communities.[http://ssrn.com/abstract=1022358] For a related discussion on user innovation and user communities, see Eric Von Hippel's books[http://web.mit.edu/evhippel/www/books.htm] and William Fisher's article in the Minnesota Law Review.[http://www.minnesotalawreview.org/content/implications-law-user-innovation]
Other authors have pointed out some of the problems with crowdsourcing. Dr. Mathieu O'Neil has argued that, despite its benefits, crowdsourcing can have inconsistent quality, can lack the diversity needed to draw on the "wisdom of the crowd", and can contain many irresponsible actors.[http://www.paris-sorbonne.fr/fr/IMG/pdf/oneil-2.pdf] Miriam Cherry has argued that some crowdwork can be exploitative, sometimes forcing people to work for absurdly low wages.[http://ssrn.com/abstract=1499823] She argues that we need a legal framework for addressing low wages, proposing we apply the Fair Labor Standards Act (FLSA) to crowdsourced work like that found on Mechanical Turk. In a forthcoming article, she takes a more systematic (but still legal) approach to suggesting solutions for the problems faced by different kinds of virtual work.[http://ssrn.com/abstract=1649055] Cherry seems to be the only law professor to have written on addressing crowdsourcing from a doctrinal perspective.
Much of the other literature on the subject concerns the problem of quality. Soylent--which is essentially a crowdsourced editing program--has been a prime example of how lack of quality can limit the commercialization of an innovative and useful crowdsourcing product.[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4409122] Cheat detection--the ability to filter out individuals who complete tasks without actually reading them in the hopes of receiving money without doing the work--has also recently drawn attention. For instance, a possible crowdsourced solution to cheaters has been proposed for sentence translations, relying on principles such as crowd consensus and logical parallelism in sentence structure and word choice.[http://domino.research.ibm.com/library/cyberdig.nsf/papers/A08798F3F7A3476A8525777E005C6AD2] Others have attempted to increase the quality of the traditionally-automated mechanism used to translate words by crowdsourcing translation tasks.[http://domino.research.ibm.com/library/cyberdig.nsf/papers/A08798F3F7A3476A8525777E005C6AD2] In addition to simple crowdsourcing, one set of authors suggests combining human crowdwork with machine work. With this process, according to the authors, the system can specify a particular "speed-cost-quality tradeoff," which is based on an allocation of tasks among computers and humans.[http://alexquinn.org/papers/CrowdFlow,%20Integrating%20Machine%20Learning%20with%20Mechanical%20Turk%20for%20Speed-Cost-Quality%20Flexibility%20(Quinn,%20Bederson,%20Yeh,%20Lin).pdf] John J. Horton, David Rand, and Richard Zeckhauser have addressed using the online crowd for quality experimental research.[http://ssrn.com/abstract=1591202]
=== Other Problems ===
The literature on crowdsourcing often discusses either broad or specific issues. Books tend to have an overall argument about the value of crowdsourcing, its core attributes, and how it needs to be structured. Articles, conversely, tend to describe specific studies or problems within a particular community. There is little room for systematically addressing common crowdsourcing problems. Instead, the platforms offering crowdsourcing, such as Mechanical Turk, address these problems internally. 99designs--a website that allows people to solicit creative logo designs--has several policies regulating the behavior of those who request[http://99designs.com/help/contestholder/guidelines] and perform[http://99designs.com/help/designer/codeofconduct] work. Most crowdsourcing services have similar policies or recommendations. In January 2010, a small group of students from Harvard Law School and Stanford Law School gathered in Palo Alto for three weeks to talk about these more general problems. They produced a document of Best Practices ([[Class 3]]), which sought to identify and propose a framework to address problems endemic to crowdsourcing. That document identified six major issues that needed to be addressed in cloudwork:
1. Disclosure: Workers want to know the identity of the employer, so disclosure should be the default preference.
6. Privacy Protection: Workers are concerned with employers sharing their (potentially sensitive) information, so platforms should protect information and not release it.
The Best Practices document provides a good starting point because it identifies several major issues common to all crowdsourcing. It does not, however, capture all potential problems. Additionally, it tends to focus only on the concerns of workers, but platforms and companies also face similar problems. Moreover, because the document is meant as a general framework, it is hard to get a sense of whether it could be effectively implemented across the board. There is room, then, to explore problems that are broad enough to have implications for a variety of actors, but specific enough to merit a context-specific solution.
== Our Addition: Identifying Areas, Exploring the Problems ==
Given the body of literature and the Best Practices document, we found the idea of addressing systemic problems both attractive and difficult. Instead of replicating the Best Practices, or simply writing an overview of crowdsourcing, we decided to take a different angle. Unlike the Best Practices document, which classified problems generally and then worked downward to devise specific solutions by applying them to different types of crowdwork, we worked from the bottom up. We identified three types of crowdwork that suggested a variety of important, but (context-)specific problems. At the beginning stages, we had only our intuition to guide our "sense" of the problems. As we delved further into them, however, they crystallized. From our discussions we identified three types of crowdwork in which specific problems arise, some of which are systemic problems with crowdsourcing that the Best Practices does not address. Nevertheless, we wanted to draw on the Best Practices document to determine whether some of its strategies seemed workable or needed to be expanded, refined, or discarded. To accomplish this goal, we attempted to integrate the Best Practices approaches into our framing of both the problems and the solutions we discussed.
=== An Introduction to Our Approach ===
Our discussion of various crowdsourcing environments suggested a variety of ways to slice the pie. In the end, we settled on three areas of crowdsourcing, reaching a rough classification based on the type of work performed. In that sense, our division followed the Best Practices division of work into microtasks, connective tasks, and contest tasks. But there was an important difference: our classification of work depended also upon the purpose to which the work was being put, focusing on a specific case study for each. In other words, it mattered to us that one task was framed as a "game" versus a "survey." We cared not just about the framing, but also the motives of the employer and the worker. We asked questions like, "For what purpose is the employer requesting this task?" and "Why does the worker choose to perform the task?" Our aim was not to analyze every kind of crowdwork using motive and purpose; rather, these questions provided a general framing for dividing crowdwork into analytical categories--places where we could identify specific problems that may differ depending on the answers to these questions. After significant discussion, we settled on three types of tasks, choosing a case study to explore each one:
1. Microtasks: ''Amazon's Mechanical Turk'', ''Microtask.com'', and ''Soylent''[https://www.mturk.com/mturk/welcome];
2. Tasks requiring "professional" skills: ''99designs''[http://99designs.com/] and ''InnoCentive''[http://www2.innocentive.com/]; and
3. "Game" tasks: ''Gwap''[http://www.gwap.com/gwap/].
=== The 3 Crowdsourcing Environments and Problems ===
==== Microtasks: Amazon Mechanical Turk; Microtask.com; Soylent ====
Microtasking is a type of crowdsourcing in which an employer divides a task into smaller subtasks that require human intelligence (Human Intelligence Tasks, or HITs [https://www.mturk.com/mturk/help?helpPage=overview]) and then assigns the microtasks to workers in the crowd. Each microtask can be completed independently of any information about the other microtasks. Although some tasks might require the worker to have certain qualifications to complete them, such as knowledge of a language, the human intelligence required to complete a HIT is minimal--i.e., an average (or even below-average) worker can do the job. Workers earn income according to the number of tasks that they complete and that the employer approves.[https://www.mturk.com/mturk/help?helpPage=worker#how_paid] By definition, workers do not always earn monetary benefits: in some cases, they gain a sense of achievement by completing a task or win virtual points in the form of games (as discussed in the final section, e.g., Gwap [http://www.gwap.com/gwap/]), or earn no benefits at all (e.g., ReCaptcha [http://www.google.com/recaptcha]). In this section, we limit our discussion to microtasks with some kind of monetary reward.
An introductory video of Microtask.com [http://www.youtube.com/watch?v=SteMdhKSS18] is a good illustration of the nature of the microtask-type of crowdsourcing. Microtask.com also offers two typical examples that are particularly suitable for microtasks: form processing, which helps clients such as banks and insurance companies transfer hand-filled forms into digital forms compatible with databases[http://www.microtask.com/solutions/]; and digitization, which helps clients such as national archives and libraries proofread their scanned and OCR'ed files and digitize them into a format that is searchable and machine-readable [http://www.microtask.com/solutions/].
===== Quality Control =====
''Problem''
One concern is that the quality of crowdsourced projects may not always be satisfactory. Below is a sample of text revised by Soylent [http://projects.csail.mit.edu/soylent/] from the second paragraph of Prof. Zittrain's book [http://yupnet.org/zittrain/archives/6].
"This was not the first time Steve Jobs had launched a revolution. Thirty years earlier, at the First West Coast Computer Faire in nearly the same spot, the twenty-one-year-old Jobs, wearing his first suit, exhibited the Apple II personal computer to great buzz amidst “10,000 walking, talking computer freaks.” The Apple II was, a machine for hobbyists who did not want to fuss with soldering irons:, had all the ingredients for a functioning PC were provided in a convenient molded plastic case. It lookedWhile clunky, yet it could be at home on someone’sfit on a home desk. Instead of puzzling over bits of hardware or typing up punch cards to feed into someone else’s mainframe, Apple owners faced only the hurdle of a cryptic blinking cursor in the upper left corner of the screen: the PC awaited instructions. But the hurdle was not high. Some owners were inspired to program the machines themselves, but true beginners simply could load up software written and then shared or sold by their more skilled or inspired counterparts. The Apple II was a blank slate, a bold departure from previous technology that had been developed and marketed to perform specific tasks from the first day of its sale to the last day of its use."
A comparison of the above text with the original suggests that the revision did not improve the quality of the original; in fact, it added typos and errors that the original does not contain. Another example would be a worker who does not actually use human intelligence, but just clicks the mouse randomly. For instance, in a task where a worker is supposed to tell the sex of the person in a photo, or tell the major color of a picture, one could randomly choose the result in order to complete as many tasks as possible. Although workers of a task only earn income when the requester approves their work,[https://www.mturk.com/mturk/help?helpPage=worker#how_paid] the approval of requesters is generally procedural rather than substantive. The dilemma is that employers cannot check each microtask (since it would mean that the employer would have to do the work all over again), nor can they use machines to do so (otherwise they would not have outsourced the task to the crowd in the first place).
''Solutions''
Improving the quality of crowdsourcing means not only that current tasks can be completed with better quality, but also that the crowd will develop a greater capacity to assume more complex responsibilities. There are two critical approaches that can be considered to improve the quality: 1) result verification/evaluation, and 2) worker grouping. While neither is a perfect solution by itself, a superior system might be a hybrid of the following methods.
*1. Verifying microtasks. One approach to maintain the quality of microtasks is to have the results checked. The dilemma here is that, on one hand, employers are unable to verify each microtask in substance (because if they could, crowdsourcing would be unnecessary); on the other, machines cannot verify the result because the microtasks usually require human intelligence that is difficult for machines to process. Therefore, possible solutions lie back with the crowd.
**a. Repetition. As a mechanism that some platforms, e.g., Soylent, [http://groups.csail.mit.edu/uid/other-pubs/soylent.pdf] have already adopted, the employer can have each microtask repeated by multiple workers. A task is only accepted as valid when the majority (e.g., 2/3 or 3/4) of workers produce the same results. An assumption must be made that the majority of workers in the task act in good faith, i.e., they genuinely exercise their human intelligence. Obviously, if most workers click randomly, even such a mechanism will not guarantee high-quality products. Another issue is cost. From the efficiency point of view, each microtask can be solved by one competent worker. In order to verify the result by repetition, the cost of labor is multiplied by the number of repetitions that the employer designates.
**b. Gold standard. Another mechanism that some employers use, if the nature of the task allows, is to mix in test questions where the result is already known to check the quality of the worker's performance. (''See, e.g.'', Qin Gao & Stephan Vogel, "Consensus versus Expertise: A Case Study of Word Alignment with Mechanical Turk", Language Technologies Institute, Carnegie Mellon University, pg. 31.) For instance, if legal journals wanted to crowdsource subciting[https://secure.wikimedia.org/wikipedia/en/wiki/Bluebook], they could include an incorrect citation where they already knew what the correct version was, and use that as a standard against which to measure the worker's overall performance of the task. | |||
*2. Differentiating the crowd. An employer may need to differentiate a group of workers from the rest of the crowd for a specific crowdsourcing task (e.g., a task that requires workers who are more experienced or more competent in a particular area). One way to do so is to impose qualifications based on their prior experience, which can be either their experience in the off-line world or their online performance regardless of their actual off-line background. Another is to give potential workers a qualification test before they start. | |||
**a. As in the off-line employment market, crowdsourcing platforms can group workers based on their education, professional qualifications, etc., and an employer may specify such requirements to admit workers for its task. The difficulty here is that there is no reliable way of verifying the off-line information. If relying on self-reporting, the employer may receive fake information from workers who want to have more opportunities to work. While the employer (or the platform) can have workers upload evidence for the backgrounds (e.g., an electronic version of degree certificate, an .edu email address), workers can still fake evidence [http://www.nd-center.com/2006/05/fake-degree-certificate.html]. Moreover, there are acute concerns about privacy: people may not want to upload their real copies of certificates, even to large online service providers like Amazon or Google, who usually have better privacy protection mechanisms, so they might upload fake ones instead or not upload at all. | |||
**b. Another approach is to completely ignore people's off-line identities and focus only on their online experience. A rating system is commonly established for this purpose. A requester may choose how many points a worker can earn for each microtask. A platform may choose to adopt a more specialized rating system (categorizing experience into different types of tasks), or a uniform rating system (experience accumulates by the number of tasks completed, as in Mechanical Turk's system[http://aws.amazon.com/mturk/]). A uniform rating system might be unfair for newcomers because they lack the required experience points to start, especially since workers who have sufficient points might have earned them from tasks that are irrelevant to the current task. A categorized rating system seems to be more reasonable. (Rating systems are discussed in more detail in the next section's discussion of reputation for professional work.) | |||
**c. Some requesters do not consider either off-line background or online experience. They set up a training session and final quiz for candidates, and only those who can pass the quiz qualify to work [http://aws.amazon.com/mturk/]. The limitation of the qualification test is that it only ensures the competence of the worker; it does not guarantee that he or she will genuinely do the work as expected. | |||
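The repetition and gold-standard checks described above can be illustrated with a minimal Python sketch. It is not taken from Soylent, Mechanical Turk, or any other platform; the function names, the 2/3 default threshold (borrowed from the example in the text), and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def majority_answer(answers, threshold=2 / 3):
    """Repetition check: accept an answer only if at least `threshold`
    of the workers who repeated the microtask gave that same answer.
    Returns None when no answer reaches the threshold."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) >= threshold else None


def gold_accuracy(worker_answers, gold):
    """Gold-standard check: fraction of known-answer test questions
    that a worker answered correctly (0.0 if the worker saw none)."""
    checked = [q for q in gold if q in worker_answers]
    if not checked:
        return 0.0
    correct = sum(worker_answers[q] == gold[q] for q in checked)
    return correct / len(checked)


def labor_cost(price_per_task, repetitions):
    """Cost of one microtask when it is repeated by several workers."""
    return price_per_task * repetitions


# Hypothetical data: three workers label the same photo.
accepted = majority_answer(["male", "male", "female"])
# Hypothetical data: two test questions were mixed into the worker's batch.
accuracy = gold_accuracy(
    {"q1": "red", "q2": "blue", "q3": "green"},
    {"q1": "red", "q2": "green"},
)
```

The sketch also makes the cost trade-off concrete: under these assumptions, three repetitions of a microtask cost three times the single-worker price, whereas gold-standard questions add only the cost of the test items themselves.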
===== Protection of Workers ===== | |||
''Problem'' | |||
The Best Practice document composed by [[Class 3]] of the last winter course discusses many issues that involve the protection of workers' interests. We agree that crowdsourcing (and the microtask-type of crowdsourcing in particular) creates a unique virtual environment where co-workers are isolated, and therefore reveals new issues in the employer-worker relationship. It is also important to recognize that crowdworkers are a heterogeneous group, not merely in terms of skills or other work-related differences, but in terms of backgrounds as well. For instance, recent studies on demographics of Mechanical Turk [http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1585030][http://www.ics.uci.edu/~jwross/pubs/RossEtAl-WhoAreTheCrowdworkers-altCHI2010.pdf] show that there is an increasing number of Turkers from India. In March 2008, the platform was dominated by U.S. participants (76%) and only 8% of Turkers were from India. As of November 2009, although U.S. participants still accounted for more than half (56%), the Indian portion had skyrocketed to over one-third (36%). Mechanical Turk is becoming more and more international, and the diverse workforce complicates the issues concerning the protection of their interests. Therefore, we will attempt, when appropriate, to treat workers with different backgrounds and purposes separately when discussing the protection of their interests in crowdsourcing work. | |||
''Solutions'' | |||
*1. Unionize. Felstiner (2010) argued that the current statutory framework based on the National Labor Relations Act (NLRA) is insufficient to address the legal issues on crowdsourcing; he further suggested that workers do not need to wait for legislative action, but rather can use their collective power to protect their interests now.[http://works.bepress.com/context/alek_felstiner/article/1000/type/native/viewcontent] This suggestion raises a question: what kind of collective action is appropriate for crowdsourcing? One side of the spectrum is for organizations, such as trade unions in the off-line world. We believe this approach is problematic for several reasons. First, only the platform possesses complete information of all the workers, and due to the wide geographical distribution, establishing a union entails disclosure of employer's contact information. Second, employers of commercial tasks rely on the isolation of workers to take advantage of the benefits of crowdsourcing without compromising trade secrecy and confidentiality. However, communications among workers might enable them to reassemble microtasks and reproduce the entire task, which may not be an acceptable risk for the employer. | |||
An online forum is another possibility. Compared to a union, online forums are loose, and satisfy workers' need for anonymous communication. Existing forums include Turker Nation [http://www.turkernation.com] and mTurk Forum [http://www.mturkforum.com], which seem to be functioning well. On those online forums, workers can discuss any issues of mutual interest and try to bargain with the employers or the platform. Meanwhile, the identities of workers and details of the tasks they are working on stay confidential unless they reveal them deliberately. The risk of breach of confidentiality can be addressed by rules of forum discussion enforced by the administrator of the online forum. | |||
*2. Minimum wage. Cherry (2009) argued that the minimum wage requirements should be extended to "virtual work", including crowdsourcing in cyberspace.[http://www.law.ua.edu/lawreview/articles/Volume%2060/Issue%205/cherry.pdf] Given that a large portion of Indians are working on Mechanical Turk and that the average household income is relatively low in India, as seen in the demographic studies noted above, some Indians (less than 10%) have started to rely on crowdsourcing as their primary source of income. For the other workers in India, although crowdsourcing is not their sole employment, it is a relatively significant portion of their incomes.[http://www.ics.uci.edu/~jwross/pubs/RossEtAl-WhoAreTheCrowdworkers-altCHI2010.pdf] Even in the U.S., more than half of Turkers consider monetary reward as their primary reason to participate in crowdsourcing. | |||
However, given the global nature of workforces on Mechanical Turk, the details of setting a minimum wage for crowdsourcing are extremely unclear and difficult because of jurisdictional differences in labor law, taxation, etc. For example, should the minimum wage law in the U.S. apply merely because the platform (Amazon) is registered in the U.S., even though both the requester and workers are from outside the U.S.? If so, the trend appears that more and more workers from developing countries would join crowdsourcing because the economic incentives are more attractive to them. | |||
On the other hand, we can see that the need for a minimum wage is less compelling for those who do not rely on crowdsourcing as their major employment. Most microtasks require a low level of attention, and people can easily multi-task while watching television or chatting online. Crowdsourcing enables them to take advantage of their "spare cycles", and meanwhile enables employers to utilize human intelligence at very low cost. In these cases, imposing a minimum wage may diminish some of the advantages of crowdsourcing. | |||
==== "Professional-Grade" Tasks: 99designs & InnoCentive ==== | ==== "Professional-Grade" Tasks: 99designs & InnoCentive ==== | ||
99designs is a website that allows individuals or companies that need a design to ask for it by crowdsourcing.[http://99designs.com/help/what-is-99designs] InnoCentive is a company that allows individuals and entities to post scientific problems that anyone can attempt to solve.[http://www2.innocentive.com/innovation-solutions/corporate-innovation] These companies are a particularly interesting form of crowdsourcing because they enable the crowd to perform work typically performed by "professionals." Although services like Mechanical Turk also "deprofessionalize" work, 99designs and InnoCentive do so much more directly. Typically, designs are created (at 99designs) or problems are solved (at InnoCentive) by professional companies, the employees of which typically have some formal training. This platform raises a variety of concerns. | 99designs is a website that allows individuals or companies that need a design to ask for it by crowdsourcing.[http://99designs.com/help/what-is-99designs] InnoCentive is a company that allows individuals and entities to post scientific problems that anyone can attempt to solve.[http://www2.innocentive.com/innovation-solutions/corporate-innovation] These companies are a particularly interesting form of crowdsourcing because they enable the crowd to perform work typically performed by "professionals." Although services like Mechanical Turk also "deprofessionalize" work, 99designs and InnoCentive do so much more directly. Typically, designs are created (at 99designs) or problems are solved (at InnoCentive) by professional companies, the employees of which typically have some formal training. This platform raises a variety of concerns. We focus here on two: deprofessionalization and reputation. | ||
===== "Deprofessionalization" ===== | ===== "Deprofessionalization" ===== | ||
In some sense, graphic design and other industries such as science are "professionalized" | In some sense, graphic design and other industries such as science are "professionalized." What, exactly, "professional" means is open for debate. One definition might focus on how the "professional" occupation or industry regulates or doesn't regulate itself. Lawyers and doctors, for example, are self-policed professions with codes of ethics and oaths. Additionally, though, we often think of many other areas as professional because the individuals who occupy them have formal training (and many times formal education). Under this latter definitional component, many types of work qualify as "professional." The traditional occupations like lawyer, doctor, and clergy certainly fall within it; but so too do other kinds of work, such as graphic design. A more restrictive definition--one that includes both definitional components--might exclude a wide variety of occupations. But let's assume a common sense distinction between professional and amateur or non-professional works. For the past several years, crowdsourcing has crept into these "professionalized" areas without much fanfare. In science, for example, InnoCentive has provided a platform for corporate employers to crowdsource complex science problems that require formal training. In the "creative" space, 99designs performs a similar function: it enables companies to crowdsource graphic design work. | ||
When thinking about both of these "professionalized" environments, it helps to understand professionalism from two perspectives. The first perspective views professionalization as a beneficial gatekeeping mechanism--it vets people before they can perform certain work. The second perspective views it as merely reinforcing existing structures that disadvantage certain individuals; i.e., those without connections or formal training. There is probably some truth to both claims--but the focus here is on how crowdsourcing affects either perspective. From the first perspective, professional crowdsourcing platforms reduce the role of industry or profession as gatekeeper--and have the potential to eliminate it entirely. From the second perspective, crowdsourcing opens opportunities to those who otherwise are shut out of the industry by the gatekeeping mechanisms of formal education or training. Given that both of these perspectives probably have merit, there are a variety of problems we could address. We chose instead to identify two "problems," while remaining open to the possibility that others, with more time and resources, might be able to do further work on them. That work might be analysis proclaiming these "problems" to be non-issues, or it might elaborate and expand on the issues identified here. | |||
''Specific Problems'' | ''Specific Problems'' | ||
*1. | *1. Cannibalization/Wage Reduction. If 99designs or InnoCentive lowers entry barriers and costs, it could stimulate a race to the bottom. 99designs' Business Development and Marketing Manager, Jason Aiken, has already acknowledged that the company has created "a tension" with the traditional market for graphic design--and it may be driving down wages.[http://www.youtube.com/watch?v=lxyUaWSblaA] (Clip at 34:16.) In this world, professionals could not earn a living because "amateurs" or other professionals that do not have jobs will drive down prices for crowdsourced design work. With low prices and an abundance of crowdworkers, companies may shed their traditional means of acquiring designs. So crowdsourcing, which started off as a way to lower specific business costs or solve thorny problems, becomes the sole means of (research &) design work. In this environment, which designers worry about,[http://www.xemion.com/blog/99designscom-a-warning-to-freelancers-67.html] the professional industry collapses because wages are too low. Alternatively, a new web-based professionalization occurs. That, of course, depends on a variety of factors, including the ability of crowdworkers to maintain a coherent identity/reputation online (see Reputation below). | ||
*2. Devaluing of Education. If deprofessionalization occurs and an industry cannibalizes itself, there will be a concomitant and precipitous drop in the market value of education. In scientific areas, many crowdworkers have advanced degrees, or | *2. Devaluing of Education. If deprofessionalization occurs and an industry cannibalizes itself, there will be a concomitant and precipitous drop in the market value of education. In scientific areas, many crowdworkers tend to have advanced degrees, or are highly educated. Their ability to perform sophisticated or professional crowdwork is therefore partly dependent upon their education. But the crowdwork market undervalues the educational experience of the worker because, at present, most highly-educated workers are engaged in part-time, non-sustenance activity. As crowdwork becomes more popular for professionals, wages fall and crowdwork replaces traditional professional work. The cost of education, however, remains constant. This means that sophisticated crowdworkers will pay the same amount for school but will be full-time, instead of part-time, crowdworkers. Given the low wages, people will not be making an adequate return on their educational investment. Several possibilities then result--but we focus on the most dire here. Knowing that they cannot generate an adequate return on their investment, individuals could be deterred from higher education by low wages. This, in turn, will cause a brain drain, where fewer and fewer people obtain advanced degrees. As a result, the market for professional crowdworkers shrinks. Given the technological growth rate, the demand for crowdworkers will continue to grow. The short supply and high prices will mean the end of crowdsourced professional work for two reasons. First, costs will rise to the point where crowdsourcing is no longer more economical than traditional professional services. 
Second, and more importantly, the lack of workers means that, given the growing technological and cultural demands our society makes, tasks simply cannot be crowdsourced effectively. Although one might argue that markets will correct for the supply/demand problems, they may not do so at fast enough rates. The legal education market, for example, has continually grown despite decreasing demand.[http://articles.chicagotribune.com/2010-04-27/business/ct-biz-0427-chicago-law-students--20100427_1_law-school-law-firms-national-law-journal] The same thing is happening with for-profit colleges.[http://www.pbs.org/wgbh/pages/frontline/collegeinc/][http://chronicle.com/article/For-Profit-Colleges-Mount-a/66027/] The market has not yet corrected itself because the government is so willing to issue loans. When the problem finally comes to the fore, there will be serious costs to society, to students, to graduates, and to the industry. The mere fact that the market will correct the problem eventually doesn't mean we should wait for it to do so, especially when the correction may be painful and enduring. | ||
''Solutions.'' | ''Solutions.'' | ||
One might wonder whether concerns over deprofessionalization are overstated. The criticism is that we are worried simply about "amateurs" displacing "professionals." We | One might wonder whether concerns over deprofessionalization are overstated. The criticism--the one Howe addresses in the first chapter ("The Rise of the Amateur") of his book[http://books.google.com/books?id=LRbsMBxR9ykC&printsec=frontcover&dq=howe+crowdsourcing&hl=en&ei=xj_4TMLDIcGB8ga3w5mLAg&sa=X&oi=book_result&ct=book-thumbnail&resnum=1&ved=0CCsQ6wEwAA#v=onepage&q&f=false]--is that we are worried simply about "amateurs" displacing "professionals." Howe certainly has a point: if *other* individuals can do the work of professionals for lower wages and through crowdsourcing, then what are we worried about? We do, of course, like things to be efficient. We are concerned, however, that some industries could, over the long run, suffer damage if efficiency is our only concern. That doesn't mean that industries will necessarily suffer damage, but they may. And we feel that it's worth considering what could happen if we place all our bets on markets and efficiency, and the dealer comes up 21. One place where we are starting to see problems of relying on the market is with traditional media outlets. We may think that "amateurs" can report just as well as many professionals--and their doing so may be more efficient. But we have other concerns, such as whether the reporting is reliable and whether the facts check out. Similarly, in design work, efficiency and opportunity are good things. But what happens when the market for professionals has disappeared, and only low-wage crowdworkers remain? Alternatively, what if all legal work were crowdsourced? The quality of work may decrease, relationships between employers and workers may fray, and the industry may have significant difficulties. As noted, though, this is not the only scenario. The industry may adapt to these changes and do just fine. 
The exercise here, though, is to think about deprofessionalization and education devaluation as problems, and pose potential solutions while recognizing they may not materialize. Here are a few solutions we might consider: | ||
*1. Wage Scale. Industries and professionals could collaborate to set wages they think are reasonable. The Best Practices document recommends "fair" wages, but | *1. Wage Scale. Industries and professionals could collaborate to set wages they think are reasonable. The Best Practices document recommends "fair" wages, but seems to presume that only employers will be the ones deciding what wages to set. In this professional-crowdwork context, it might be wise to allow collaboration of interested parties, rather than rely only on employers to set a fair wage. One could see such a wage scale being set by various stakeholders, and perhaps some "non-stakeholders." | ||
*2. Wage Determinants. Wages could be set according to some or a series credential- | *2. Wage Determinants. Wages could be set according to one or a series of credential measures. This could take several forms, some of which could be combined. | ||
**2a. Workers with a degree or work experience in | **2a. Workers with a degree or work experience in a related professional field may be entitled to a higher wage than a worker with no higher education. The problem with this approach is that it reduces the "meritocracy" aspect of crowdsourcing. It also seems to devalue the diversity that crowdsourcing thrives on. | ||
**2b. If the platform implements an effective reputation or rating system (detailed or simple), workers could use their reputations to generate more work. This solution faces some technical and privacy problems, but seems like at least one plausible way of ensuring favorable wages for better performers. This also has the benefit of showing which workers are repeatedly good at performing tasks. This is important because many | **2b. If the platform implements an effective reputation or rating system (detailed or simple), workers could use their reputations to generate more work. This solution faces some technical and privacy problems, but seems like at least one plausible way of ensuring favorable wages for better performers. This also has the benefit of showing which workers are repeatedly good at performing tasks. This is important because many InnoCentive solvers, for example, never solve more than one problem.[http://www.hbs.edu/research/pdf/07-050.pdf] One drawback of this method is that it decreases the "perfect meritocracy" that some seem to think can persist forever (if it exists at all). | ||
**2c. Workers could be paid according to the contributions they make. For this system | **2c. Workers could be paid according to the contributions they make. For this system to work, a platform and employer would have to work together. They would have to create a framework that allowed an initial screening for those who held the requisite qualifications or ratings, and then pay them according to the amount of time worked and contributions made. (Here some kind of algorithm might be useful.) | ||
*3. Educational Reform. Change educational components to provide skills that crowdsourcing cannot cover. This could include non- | *3. Educational Reform. Change educational components to provide skills that crowdsourcing cannot cover. This could include non-compartmentalizable tasks or exposure to a broader range of subjects. There are many problems with reforming the educational system. Aside from the many practical difficulties, there are two specific problems. First, it may be difficult to identify in advance what problems are crowdsource-able, as the capacity to crowdsource work is likely to change in the future. Second, because we don't yet know enough about crowdsourcing, it's difficult to say what skills or exposure to ideas one needs to perform certain tasks well--or outperform crowdsourcing. | ||
*4. Discourage wage reductions/crowdsourcing by having platforms require disclosure when crowdsourcing. This, however, may discourage crowdsourcing generally. The goal should be to reap crowdsourcing's benefits while minimizing its potential downside. Still, this solution could be used to less extreme degrees to pressure companies to offer competitive wages for crowdsourced tasks. | *4. Discourage wage reductions/crowdsourcing by having platforms require disclosure when crowdsourcing. This, however, may discourage crowdsourcing generally. The goal should be to reap crowdsourcing's benefits while minimizing its potential downside. Still, this solution could be used to less extreme degrees to pressure companies to offer competitive wages for crowdsourced tasks. | ||
*5. | *5. Incentivize pay-friendly behavior by using a reputation system (see below). The various approaches here could be combined with or pushed into a reputation system. We explain various problems with reputation systems below. Here, we focus on the potential benefits of a three-way reputation system. "Three-way" means that the system involves all parties: workers, employers, and platforms. A reputation system that engages all three parties can incentivize each party to engage in wage-friendly behavior. A reputation system, for example, that allows workers or platforms to assign reputational rankings to employers can cause wages to increase. If workers know that Corporation A generally pays 2x as much as Corporation B, workers can identify "fairer" wages. Similarly, companies may be willing to offer better wages if they can screen out "low reputation" workers. | ||
*6. Regulation. One of the benefits of crowdsourcing is the lack of regulation. That is sure to change, and it might be that regulating crowdsourcing, either as to wages or work, could benefit everyone. | |||
===== Reputation ===== | ===== Reputation ===== | ||
Reputation is an issue for all parties involved in professional crowdsourcing: the worker, the employer, and the platform. The Best Practices document focused on some of these issues in the microtasking environment. Concerns there were focused on speed, efficiency, and work quality. The Best Practices also focused solely on the relationship between individual workers and employers. These concerns also exist in the professional environment. There are, however, other problems to confront. Specifically, worker quality may become more important for professional work because the work done is more time, labor, and skill intensive. Additionally, because tasks are fewer in number and take longer, each contest or task completed also has greater influence on, and importance for, worker reputation. Finally, if professional work pays more than microtasks, the reputational stakes are higher. | Reputation is an issue for all parties involved in professional crowdsourcing: the worker, the employer, and the platform. The Best Practices document focused on some of these issues in the microtasking environment. Many of these reputation mechanisms could be imported from user-based reputation systems--and there is literature on how to design such reputation systems.[http://oreilly.com/catalog/9780596159801] Concerns there were focused on speed, efficiency, and work quality. The Best Practices also focused solely on the relationship between individual workers and employers. These concerns also exist in the professional environment. There are, however, other problems to confront. Specifically, worker quality may become more important for professional work because the work done is more time-, labor-, and skill-intensive. Additionally, because tasks are fewer in number and take longer, each contest or task completed also has greater influence on, and importance for, worker reputation. Finally, if professional work pays more than microtasks, the reputational stakes are higher. | ||
''Reputation Concerns for Workers, Employers, and Platforms'' | ''Reputation Concerns for Workers, Employers, and Platforms'' | ||
* Workers. As crowdsourcing increases and professionalized crowdworkers have success, | * Workers. As crowdsourcing increases and professionalized crowdworkers have success, crowdworkers want to signal to potential employers that they have done good work in the past. Workers may also want to list their professional experience, education, or training (as one can do on Gerson Lehrman Group[http://www.glgroup.com/]). Workers also will seek to ensure that antisocial behavior among themselves is counted against a worker's reputation. The literature on crowdsourcing emphasizes the community that exists in crowdwork environments, and ensuring the community functions well is important. 99designs has numerous instances where a crowdworker accuses another of "stealing" someone's design. Workers also will want to know the reputation of an employer. This could include factors like the amount paid, the quickness of payment, and the type of work offered. | ||
* Employers. Reputation also | * Employers. Reputation would also be valuable for employers--they'd prefer work from people with high reputations, not only to ensure good work, but also to ensure the work was not copied from somewhere else. While copying from the worker's perspective is important as a social matter, from an employer's perspective copying is important as a legal matter. Employers want to avoid copyright infringement or any claims of copyright infringement. Employers also want to broadcast their "good" reputations. | ||
* Platforms. Platforms have an incentive to ensure that workers' and employers' desire for a reputation system is met. It will, in many cases, be up to the platform to institute a system that can deal effectively with reputation, antisocial behavior, and legal issues. 99designs has implemented a system that seeks to deal with the latter two, but does not address reputation. | |||
''Problems.'' | ''Problems.'' | ||
*1. Portability. As the Best Practices document notes, workers and employers may want have a coherent identities across a variety of platforms. In doing this, they may want to keep their identities secret but their reputations public. If a worker moves from 99designs to iStockphoto, for example, they may have to create a totally new profile and build up a reputation, even though some aspects of their previous reputation may be relevant. | *1. Portability. As the Best Practices document notes, workers and employers may want to have coherent identities across a variety of platforms. In doing this, they may want to keep their identities secret but their reputations public. If a worker moves from 99designs to iStockphoto, for example, they may have to create a totally new profile and build up a reputation, even though some aspects of their previous reputation may be relevant.[http://blog.reppify.com/tag/crowdsourcing/] | ||
*2. Reliability. Any feedback-reputation system needs to be reliable. That is, it needs to accurately reflect the quality and quantity of work performed. One way to ensure diversity in feedback, and therefore reliability, is to have a mediated reputation system, where workers vote on both each other (where applicable) and employers, and employers do the same (vote on workers and each other). The system could then be mediated by the platform. | *2. Reliability. Any feedback-reputation system needs to be reliable. That is, it needs to accurately reflect the quality and quantity of work performed. One way to ensure diversity in feedback, and therefore reliability, is to have a (platform-mediated) reputation system, where workers vote on both each other (where applicable) and employers, and employers do the same (vote on workers and each other). The system could then be mediated by the platform. | ||
''Solutions'' | ''Solutions'' | ||
*1. Platform specific solution. The Best Practices document suggests that each crowdsourcing platform implement its own measures to track reputation. This is a reasonable solution. 99designs and InnoCentive do not yet have such a system. This kind of solution would allow each platform to identify the reputational characteristics necessary for the kind of work offered. 99designs, for example, could track worker reputation based on cooperation among workers, number of contests won, and portfolio ratings from other users. It could also track employer reputation based on payment amount and speed, and copyright licensing terms. InnoCentive, by contrast, could focus on credentials characteristics, such as degrees obtained or relevant work experience. It could track employers based on potential for future projects with the same company. The Best Practices document also suggests that workers have access to their data, and presumably envisions a way to use reputation information at one provider or another. Here, the solution could be for each platform to issue a "portable reputation," which would explain the reputation of the worker on that site, the work performed, and the average/median rating of a worker on the site (with similar work performed). In this scenario, platforms would have to have some interoperability. Each platform, for example, could agree to host the reputational information provided by another. In such a situation, the worker could 'import' his or her data from, say, 99designs to iStockphoto. When that worker completes a task or uploads a photo, an employer could click on the worker's profile to view various bits of reputational information. (Perhaps they could all agree to a specific file (type) that included certain information?) In one iteration, this could include "Reputation Homepage" that displayed the logos of various platforms for which the worker had completed tasks. 
Scrolling the mouse over the logo might reveal some general reputational information,such as a reputation score (with the appropriate scale for the platform). | *1. Platform-specific solution. The Best Practices document suggests that each crowdsourcing platform implement its own measures to track reputation. This is a reasonable solution. 99designs and InnoCentive do not yet have such a system. This kind of solution would allow each platform to identify the reputational characteristics necessary for the kind of work offered. 99designs, for example, could track worker reputation based on cooperation among workers, number of contests won, and portfolio ratings from other users. It could also track employer reputation based on payment amount and speed, and copyright licensing terms. InnoCentive, by contrast, could focus on credentials characteristics, such as degrees obtained or relevant work experience. It could track employers based on potential for future projects with the same company. The Best Practices document also suggests that workers have access to their data, and presumably envisions a way to use reputation information at one provider or another. Here, the solution could be for each platform to issue a "portable reputation," which would explain the reputation of the worker on that site, the work performed, and the average/median rating of a worker on the site (with similar work performed). In this scenario, platforms would have to have some interoperability, a problem on which some already are working.[http://portal.acm.org/citation.cfm?id=1839707.1839723] Each platform, for example, could agree to host the reputational information provided by another. In such a situation, the worker could 'import' his or her data from, say, 99designs to iStockphoto. When that worker completes a task or uploads a photo, an employer could click on the worker's profile to view various bits of reputational information. 
(Perhaps they could all agree to a specific file (type) that included certain information?) In one iteration, this could include a "Reputation Homepage" that displayed the logos of various platforms for which the worker had completed tasks. Scrolling the mouse over the logo might reveal some general reputational information, such as a reputation score (with the appropriate scale for the platform). This idea, though, requires cooperation among platforms--something that has been difficult in the past. One reason for this difficulty is that large, established platforms have a perverse incentive to keep a reputation system closed. These platforms already have a user base that depends upon the platform. Portability threatens what they perceive as theirs: the reputations earned through their platforms, which make the platform a desirable place to conduct business. eBay, for example, objected when Amazon allowed users to import their eBay reputations, claiming a proprietary interest.[http://www.unpcdc.org/media/6406/reputation%20mechanisms%20and%20electronic%20markets%20-%20economic%20issues%20and%20proposals%20for%20public%20procurement.pdf] (p. 237.) This confirms Randal Picker's contention that, as far as portability on the Web goes, "law matters."[http://www.law.northwestern.edu/lawreview/colloquy/2008/25/LRColl2008n25Picker.pdf] (p. 8.) [http://sandeepkumar.org/my/papers/2009_CAT_PortableReputation.pdf][http://www.igi-global.com/Bookstore/Chapter.aspx?TitleId=30454] | ||
*2. Uniform Reputation. Another approach to reputation would use a centralized entity to | *2. Uniform Reputation. Another approach to reputation would use a centralized entity to manage worker/employer reputation across all platforms. One can imagine a universal reputation platform that draws information from various sites and aggregates it--maybe it has agreements with all the platforms to provide it with reputation-related information. (There are already sites that purport to provide reputational rankings of websites--see, e.g., Webutation.[http://www.webutation.net/#]) In other words, this entity would maintain a database of all crowdsourced employers/workers reputations. Reputational information would be aggregated automatically from all participating crowdsourcing platforms, which would automatically submit data to this entity. Theoretically, employers and workers could consult this platform to determine worker reputations. To enable this "reputation finding," the reputation database could contain various "categories" of workers and employees, and allow an individual to sort based on various reputational scores, which would be determined based on a variety of metrics and the information provided to the centralized entity by the crowdsourcing platforms. So, for example, a company could find someone with a high "creative reputation" (e.g., logo design) or someone with a high "engineering" rating. It may also allow users to disaggregate reputational information across platforms to see how the user has performed/behaved in each. Additionally, a central reputation-authority would allow people to move "with" their reputations and stay "anonymous"--they won't disclose their real identity. As noted, this all would require platforms to (automatically) submit information to this reputation database/entity. 
Because platforms (in the hypothetical world) would automatically submit worker reputation info to the site, there are essentially two ways the site could work at the level of individual reputations. Both methods of operation work from the premise that individuals and employers are searchable by reputation. Search results might display an aggregate reputation or a simple ranking with no reputational score accompanying it. The first way resembles the solution described above: once a user is selected, a visitor is taken to a reputation homepage that displays a worker/employer reputation from different platforms, and provides information about each. This provides at least one benefit for platforms: they don't have to generate interoperability--they can delegate reputation aggregation to one entity, which could probably collect and display the information more effectively. Two authors have already sketched a design for a similar system.[http://sandeepkumar.org/my/papers/2009_CAT_PortableReputation.pdf] A second method would be for the reputation site to create its own "general" reputation for each worker based on the information it receives from platforms. We can imagine a situation in which workers or employers can ask the reputation site to rank workers based on specific, general, or a combination of characteristics. This would allow employers to sort workers by reputational characteristics they deem important to a particular type of work. Once sorted, employers could provide an open-call project to the narrowed crowd of workers. This system also would require the centralized agency to weight different sites or reputational attributes. This is a problem of reliability, discussed below. One other advantage of a unified reputational system would be allowing some user control over reputation. We can imagine a place in a user profile where the user can fill in information, provide explanations, or otherwise comment on past work experience and the like. 
(There may also be room for a centralized-decentralized approach, where the user claims authority over her reputation and distributes it how she wants. Jon Udell outlined such a concept in his recent talk at the Berkman Center.[http://cyber.law.harvard.edu/events/luncheon/2010/12/udell]) | ||
*3. Reliability for Platform-Specific Versus Uniform. Each system of reputation (platform specific and uniform) will have issues of reliability. By reliability we mean a platform can accurately report to a user how well or poorly a specific individual has performed in the past, where performance is based on specific criteria. | *3. Reliability for Platform-Specific Versus Uniform. Each system of reputation (platform specific and uniform) will have issues of reliability. By reliability, we mean a platform can accurately report to a user how well or poorly a specific individual has performed in the past, where performance is based on specific criteria. Because reputation systems rely on information, what information is used and who discloses it can influence the reliability of a reputation system.[http://www.unpcdc.org/media/6406/reputation%20mechanisms%20and%20electronic%20markets%20-%20economic%20issues%20and%20proposals%20for%20public%20procurement.pdf] | ||
**Platform-specific reputational systems would likely achieve greater reliability because they would be localized. As self-contained systems, each platform could respond to user demands, as well as tinker with existing formulae that garner ratings. Still, it's not clear how reliable these local reputational systems could be across platforms. Assuming for the moment that each platform can in some way support the reputational rating of another platform, we still might have the problem of "overload": individuals will have ''too many'' reputational scores from ''too many'' different platforms. Employers may then look only to the lowest rating of any given platform on a user profile. Thus, platform specific rating systems could actually work to the detriment of the users. | **Platform-specific reputational systems would likely achieve greater reliability because they would be localized. As self-contained systems, each platform could respond to user demands, as well as tinker with existing formulae that garner ratings. Still, it's not clear how reliable these local reputational systems could be across platforms. Assuming for the moment that each platform can in some way support the reputational rating of another platform, we still might have the problem of "overload": individuals will have ''too many'' reputational scores from ''too many'' different platforms. Employers may then look or sort only to the lowest rating of any given platform on a user profile. Thus, platform specific rating systems could actually work to the detriment of the users. | ||
**A uniform reputational system may face more reliability issues than an interoperable platform-specific reputational system. For one thing, a unified system will be drawing reputation data from hundreds of different platforms automatically. There is a risk for gaming the system on two levels here. First, a user at a relatively unsophisticated platform may try to manipulate the system and boost their own reputational rating. Second, the platform itself may have an incentive to give its users higher reputational scores. Why? Because assuming the unified system computers some aggregate reputational score, any platform can boost their users' scores and give them an advantage in the market place. Another problem arises as to weighting reputational scores. Should a reputation score from InnoCentive be "worth" more than one from Gerson Lehrman? Do reputation scores from Threadless even translate into anything meaningful? If so, should they be treated the same as scores from 99designs. All of these questions illustrate that there is no easy way to decide how scores should be weighted, though we can think of a variety of factors, including number of users, length of existence, customer satisfaction, etc. Part of the problem also is that different platforms may use different rating systems--some may use only one general rating, while others may parse ratings into various categories. How these issues are dealt with impacts the "reliability" of reputation. It also, therefore, impacts the trust that crowdworkers have in a reputational system. | **A uniform reputational system may face more reliability issues than an interoperable platform-specific reputational system. For one thing, a unified system will be drawing reputation data from hundreds of different platforms automatically. There is a risk for gaming the system on two levels here. First, a user at a relatively unsophisticated platform may try to manipulate the system and boost their own reputational rating. 
Second, the platform itself may have an incentive to give its users higher reputational scores. Why? Because assuming the unified system computes some aggregate reputational score, any platform can boost its users' scores and give them an advantage in the marketplace. Another problem arises as to weighting reputational scores. Should a reputation score from InnoCentive be "worth" more than one from Gerson Lehrman? Do reputation scores from Threadless even translate into anything meaningful? If so, should they be treated the same as scores from 99designs? All of these questions illustrate that there is no easy way to decide how scores should be weighted, though we can think of a variety of factors, including number of users, length of existence, customer satisfaction, etc. Part of the problem also is that different platforms may use different rating systems--some may use only one general rating, while others may parse ratings into various categories. How these issues are dealt with impacts the "reliability" of reputation. It also, therefore, impacts the trust that crowdworkers have in a reputational system. | ||
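The weighting problem described above can be made concrete with a small sketch. It assumes each platform reports a score together with the scale it uses, rescales every score to a common 0-to-1 range, and then takes a weighted average. The platform names, the scales, and the weights are all hypothetical; the text above does not say how weights should be chosen, and choosing them is exactly the open question.

```python
from dataclasses import dataclass


@dataclass
class PlatformScore:
    """One reputation score as reported by a single platform (hypothetical record)."""
    platform: str
    score: float
    scale_min: float
    scale_max: float


def normalize(entry):
    """Rescale a platform's score to the range 0..1 using that platform's own scale."""
    span = entry.scale_max - entry.scale_min
    if span <= 0:
        raise ValueError("scale_max must be greater than scale_min")
    return (entry.score - entry.scale_min) / span


def aggregate(entries, weights):
    """Weighted average of normalized scores; platforms without a weight count as 0."""
    weighted_sum = 0.0
    total_weight = 0.0
    for entry in entries:
        weight = weights.get(entry.platform, 0.0)
        weighted_sum += weight * normalize(entry)
        total_weight += weight
    if total_weight == 0:
        return None  # nothing to aggregate
    return weighted_sum / total_weight


# Hypothetical worker rated 4 on a 1-5 scale at one site and 50 on a 0-100 scale at another.
scores = [
    PlatformScore("platform_a", 4, 1, 5),
    PlatformScore("platform_b", 50, 0, 100),
]
equal = aggregate(scores, {"platform_a": 1, "platform_b": 1})
skewed = aggregate(scores, {"platform_a": 3, "platform_b": 1})
```

The same worker receives a different aggregate score depending only on the weights, which illustrates why the choice of weights bears directly on reliability and on the gaming incentives discussed above.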
==== "Game" Tasks: Gwap ==== | |||
Gwap is a website comprised of '''g'''ames '''w'''ith '''a''' '''p'''urpose (i.e., gwap).[http://www.gwap.com] That is, when its users play the games, they are simultaneously doing tasks that, in the aggregate, perform some function that improves our state of knowledge and/or advances our technology. For instance, the most well-known game on Gwap is called the ESP Game, where two players are shown a photo, each enters a list of suggested tags that the other player cannot see, and then they win points when they have a matching tag. The purpose behind the game is to create tags for images online, so that search engines can sift through them more easily when a user enters a query.[http://www.gwap.com/gwap/about/] As Gwap's tagline states, "When you play a game at Gwap, you aren't just having fun." While such platforms offer an innovative way to draw on the 'wisdom of the crowd,' they also raise concerns about the potential exploitation of users and its concurrent effects. | |||
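The matching rule described above can be sketched in a few lines. This is only an illustration of the idea that two players score when their independently entered tags agree; it is not Gwap's actual implementation, and the function names, the case-insensitive comparison, and the points value are assumptions.

```python
def matching_tags(tags_a, tags_b):
    """Return the tags both players entered, ignoring case and surrounding spaces."""
    cleaned_a = {t.strip().lower() for t in tags_a if t.strip()}
    cleaned_b = {t.strip().lower() for t in tags_b if t.strip()}
    return cleaned_a & cleaned_b


def score_round(tags_a, tags_b, points_per_match=10):
    """Award a fixed (hypothetical) number of points for each matching tag."""
    return points_per_match * len(matching_tags(tags_a, tags_b))


# Neither player sees the other's list; only the overlap is kept as an image label.
labels = matching_tags(["Dog", "grass "], ["dog", "ball"])
points = score_round(["Dog", "grass "], ["dog", "ball"])
```

Tags that two strangers agree on independently are more likely to describe the image, which is what makes the overlap useful as a label.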
=====Addiction===== | |||
''Problem.'' | |||
The question of addiction has arisen both in the context of crowdsourcing generally [http://www.informaworld.com/smpp/content~db=all?content=10.1080/13691181003624090] and in the context of so-called "social games", [http://www.gamespot.com/news/6284524.html][https://secure.wikimedia.org/wikipedia/en/wiki/Social_gaming] such as FarmVille. [http://www.farmville.com/] In a paper entitled ''Moving the Crowd at Threadless'',[http://www.informaworld.com/smpp/content~db=all?content=10.1080/13691181003624090] Daren C. Brabham offers some insight into the various motivations of the crowd at Threadless,[http://www.threadless.com/] a website that sells t-shirts and applies crowdsourcing through its solicitation of the crowd for designs and slogans.[http://www.threadless.com/submit] In particular, Brabham discusses how members of Threadless use the language of "addiction" when talking about their participation in Threadless. In his view (2010: 3, 17), the language of addiction illuminates the significance of building a community in order for crowdsourcing to be truly effective, such that organizations that use crowdsourcing "need to allow the crowd to truly support the problem-solving mission of a crowdsourcing venture for the public good, to generate in the crowd a sense of duty and love – and even addiction – to such a project", although he ultimately concludes that members of Threadless are most likely not actually addicted in the pathological sense. | |||
On the social gaming side, Gamespot--the self-styled "go-to source for video game news, reviews, and entertainment"[http://www.gamespot.com/]--recently ran a story on the ethics of the social games market.[http://games.slashdot.org/story/10/11/25/0634243/The-Ethics-of-Social-Games?from=rss] Here, the concern is that social games have crossed the line from being fun to being addictive. As Edmund McMillen [https://secure.wikimedia.org/wikipedia/en/wiki/Edmund_McMillen] put it, "There's a difference between addicting and compelling . . . . Crack is addicting, but it's not a fun game." | |||
So how does this affect crowdsourcing games? Since games like those at Gwap involve both crowdsourcing and social gaming, the concern is that the risk for addiction is more acute. Although undoubtedly hyperbole, players of the ESP Game, for example, have said that it is "strangely addicting" and that "it's like crack!" (See video [http://video.google.com/videoplay?docid=-8246463980976635143#] at 22:30.) | |||
''Solutions.'' | |||
There are a number of safeguards, detailed below, that can be built directly into the platforms. Ideally, such changes would be made voluntarily, but legal regulation can be used as a tool to accomplish the necessary changes as well. | |||
*1. Platforms can kick people off after a specified amount of time. For instance, Gwap will kick users off if they have played for 15 hours straight, or 10 hours if they are from an .edu domain. (See video [http://video.google.com/videoplay?docid=-8246463980976635143#] at 14:54-15:02.) | |||
*2. Platforms can notify the player after a specified period of time has elapsed. (5 hours) 'Just to let you know, you've been playing for quite a while.' (10 hours) 'Are you ''sure'' you want to keep playing?' (12 hours) 'Really, you should go outside.' (15 hours) 'All right, that's it. I'm kicking you off for your own good.' The principle behind both this solution and #1 mirrors the one that led to the development of applications designed to prevent people from sending drunken e-mails that they rue when sober, such as Gmail's Goggles[http://gmailblog.blogspot.com/2008/10/new-in-labs-stop-sending-mail-you-later.html]; namely, that sometimes a little check on people can keep them from making decisions that they might later regret. | |||
*3. Platforms can have built-in termination of the games. In other words, games should have an established end to them. For instance, with games like Life[http://www.amazon.com/Hasbro-4000-Game-of-Life/dp/B00000IWD7/ref=sr_1_1?s=toys-and-games&ie=UTF8&qid=1291326858&sr=1-1] and Candyland[http://www.amazon.com/Hasbro-4700-S5-Candyland/dp/B00000DMF5/ref=sr_1_1?ie=UTF8&qid=1291326875&sr=8-1], the game ends when one or more players reach the end of the trail. In contrast, games like Taboo[http://www.amazon.com/Parker-Brothers-14677-Taboo/dp/B001RN88DK/ref=sr_1_1?ie=UTF8&qid=1291326894&sr=8-1] and Apples to Apples[http://www.amazon.com/Apples-Party-Box-Hilarious-Comparisons/dp/B00112CHCK/ref=sr_1_1?ie=UTF8&qid=1291326919&sr=8-1] continue until the players themselves call an end to the game. Built-in termination of the game would also be an effective tool to address addiction concerns for social games in general--imagine if FarmVille actually ended at some point. | |||
*4. Is it really a problem? It should be noted that there is little, if any, support for these assertions by way of rigorous studies. Moreover, it is hard to see how addiction is more of a problem for crowdsourcing games than for television or other kinds of entertainment that, despite the substantial amount of time people spend doing them, we do not regulate with regard to their potential addictiveness. | |||
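Solutions 1 and 2 above amount to a simple threshold check on session length. The sketch below uses the hour marks and messages given in the list; the function name, the return format, and the idea of passing a shorter cutoff for some users (such as the 10-hour limit mentioned for .edu domains) are assumptions for illustration, not a description of how Gwap works internally.

```python
# (hours played, message) pairs taken from the escalating notices listed above.
NOTICES = [
    (5, "Just to let you know, you've been playing for quite a while."),
    (10, "Are you sure you want to keep playing?"),
    (12, "Really, you should go outside."),
]
KICK_MESSAGE = "All right, that's it. I'm kicking you off for your own good."


def session_action(hours_played, cutoff_hours=15):
    """Return (action, message) for a session of the given length.

    The action is "kick" at or past the cutoff, "notify" once any notice
    threshold has been reached, and "continue" otherwise. The function
    reports the most recent applicable notice on every call; a real
    platform would also need to remember which notices it already showed.
    """
    if hours_played >= cutoff_hours:
        return ("kick", KICK_MESSAGE)
    message = None
    for threshold, text in NOTICES:
        if hours_played >= threshold:
            message = text
    if message is None:
        return ("continue", None)
    return ("notify", message)


after_four_hours = session_action(4)
after_eleven_hours = session_action(11)
shorter_limit = session_action(10, cutoff_hours=10)
```

Keeping the cutoff as a parameter lets a platform apply a stricter limit to some groups of users without changing the notices themselves.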
=====Disclosure===== | |||
''Problem.'' | |||
Another concern that arises with these games is that the players may not know the underlying purpose. For instance, on Gwap, there is an 'About' section that describes the purpose behind the ESP game as an example of what is meant by "games with a purpose",[http://www.gwap.com/gwap/about/] yet there is no place on the website that describes the purpose of any of the other six games. Although Luis von Ahn, the main creator of Gwap and hailed as the "father of human computing", [http://www.academicproductivity.com/2009/luis-von-ahn-on-doing-research-vs-writing-papers/] describes the purpose and features of Gwap's games in academic papers (see his CV [http://www.cs.cmu.edu/~biglou/CV.pdf] for a list of his papers), chances are that few users have the access, know-how, or interest needed to hunt them down and read them. But that does not mean that they would not want to know if given the opportunity. | |||
''Solutions.'' | |||
*1. Objective Standard. As noted in the Best Practices, there should always be as much disclosure as possible. However, it is not always easy to see how this general principle should be applied in different situations. Perhaps one useful tool would be to employ an objective standard in deciding whether and to what extent certain information should be disclosed. In securities law, the materiality of information is measured by what a reasonable investor would want to know [https://secure.wikimedia.org/wikipedia/en/wiki/SEC_Rule_10b-5#Language_of_the_rule], so perhaps a good standard for crowdsourcing games would be what a reasonable person would want to know in deciding whether to play the game. | |||
==== " | One may argue that potential players could not care less what the underlying purpose of a given game is, since they are only playing it for enjoyment. The obvious response to such an argument is that it depends on the purpose of the game. For instance, most people are probably either neutral or positive towards the ESP Game's purpose of labeling images to make them easier to sort, but they would most likely have a negative reaction towards a game whose purpose was to spam people with porn ads. | ||
There is also the possibility that full disclosure may not always be compatible with the game working as intended. For instance, a psychology lab might formulate a game to test x, but if Player A is aware that they are testing x, it might inhibit or otherwise change how she plays the game, making the results useless for the purposes of study. So perhaps the caveat should be added that companies should disclose any information that a reasonable person would want to know in deciding whether to play the game, "except for information that, if disclosed, would have an adverse effect on the underlying task, aside from making the player unwilling to play." So "employers" would still be required to disclose as much as possible, but not to the point of compromising the underlying purpose of the game. | |||
One question that has arisen is what to do when the nature of the task/game relies on nondisclosure, such as making edits to a document discussing the employer's trade secrets. One possible solution is to require workers to sign a confidentiality agreement. Another would be for workers to have some sort of profile where they indicate what employers they are or are not willing to work or play games for (e.g., Google-philes could indicate that they do not want to play any games sponsored by Microsoft). A third possibility is that employers could do something like a privilege log in discovery, where they indicate why they are not disclosing, and a designated third party (the equivalent of the judge in discovery) could decide whether the nondisclosure is okay. | |||
But an alternate answer is that employers should simply not use crowdsourcing for particularly sensitive tasks. If it is that important to make sure something stays confidential, keep it in-house. When employers exponentially increase the number of people to whom the information is exposed, they are exponentially increasing the risk that the information will be leaked -- they should either accept that risk or go elsewhere. | |||
*2. Timing. A further option would be to require additional disclosure after the player has finished the game, since the task will have been completed at that point, and then give her the option of voiding her results. However, this may lead to people voiding their results for reasons other than taking issue with some aspect of the company or task, such as embarrassment or discomfort with how they acted during the game. | |||
*3. Review Process. If additional oversight is needed to ensure that companies are in compliance with this standard, companies could set up internal review boards. Alternatively, an outside party, such as the government, a non-profit, or a coalition of companies, could set up an independent review board that is responsible for reviewing and approving (or not) the proposed games and accompanying information, much like researchers are required to submit their proposals to an institutional board when the study involves human subjects.[https://secure.wikimedia.org/wikipedia/en/wiki/Institutional_review_board] Approval by such a board could be required before the game is crowdsourced, and/or the board could be given the authority--either by the government or by agreement among the companies--to do random audits or investigate complaints, just like the SEC can investigate and determine whether a company has failed to disclose material information with regard to its securities.[http://www.sec.gov/complaint.shtml] | |||
=====Compensation===== | |||
''Problem.'' | |||
One question that has come up with regard to crowdsourcing games is whether players should be compensated for doing tasks through the guise of a game.[http://vonahn.blogspot.com/2010/07/work-and-internet.html] Although the task is structured to be fun, it is still work that employers need done and would ordinarily pay for, were it not set up as a game. So are they exploiting the crowd by somehow 'tricking' them into doing work for free when the companies would have to pay for it otherwise? Does it make a difference whether people are aware of the underlying purpose? In the preceding section, we argue that people should know the task behind the game so that they can make an informed decision about whether to play it based on their personal values, but would the lack of compensation also play a role in their decision? Does it depend on whether they perceive it as work? More importantly, if people will do it anyway, why should companies pay for it? | |||
''Solution.'' | |||
Players should remain uncompensated for a number of reasons. First of all, if people don't find the games fun, they won't play them. For instance, one game on Gwap is called Squigl, in which two players are shown an image and a word, and have to trace the object described by the word.[http://www.gwap.com/gwap/gamesPreview/squigl/] If their traces match, they earn points. One of the authors of this page, doing laborious, intensive research for this section, found the game excruciatingly dull. So she stopped playing. As von Ahn himself wrote, "The key property of games is that people want to play them." (Luis von Ahn & Laura Dabbish, ''Designing Games with a Purpose'', Communications of the ACM, August 2008, vol. 51, no. 8.) | |||
In other words, crowdsourcing games aren't really "free" -- companies have to spend time and money figuring out a way to convert a task into a game that is sufficiently fun that people will play it voluntarily. While the total sum might be significantly less than what it would cost to pay each worker/player, it lessens the sense that the crowd is being exploited if creating such games is not without effort and cost to the employer. Moreover, offering compensation might bring players closer to the brink of addiction: if players were earning money, however trivial an amount, they might feel compelled to keep playing, more so than if they were only playing it for enjoyment, although further research is needed on this point. Alternately, players might actually play ''less'' if offered a token amount to play rather than an amount that reflects the value of their work, because they would feel exploited, even though they would be getting more than if they were paid nothing, similar to the psychology behind the ultimatum game.[http://money.howstuffworks.com/ultimatum-game.htm] Again, however, more research is needed. | |||
According to von Ahn, people played over '''9 billion hours''' of Solitaire in 2003. (See video [http://video.google.com/videoplay?docid=-8246463980976635143#] at 7:00-7:04.) Solitaire is a game without a purpose, other than to entertain.[https://secure.wikimedia.org/wikipedia/en/wiki/Solitaire] So if von Ahn's statistic is accurate, people are perfectly willing to devote a significant amount of time to entertaining themselves. If the games on Gwap serve that function, why should they be penalized for simultaneously being useful? | |||
One might argue that the issue is not that companies would be penalized were they forced to compensate the players for playing games with a purpose, but rather that the workers are being currently penalized through the lack of compensation because they are unwittingly contributing free labor when they are entitled to a fair wage for their work. The question then becomes whether playing such games is indeed "work." We would argue that it is not, because people are playing the game primarily for its entertainment value: if it stops being entertaining, then they will stop playing it. Alternatively, if they are playing it because they support the underlying purpose, then that purpose is clearly important enough to them that they are doing it for free, so they already have sufficient incentive to do the work. After all, do we want to encourage a value system where everything must be set at a price? (If the answer to that question is yes, Wikipedia is in trouble.) | |||
Moreover, it should also be noted that not all workers get paid; some are volunteers. For instance, in ''Crowdsourcing'', Jeff Howe talks about the contribution of the crowd to tasks like NASA’s "Clickworkers" project (2006: 62)[https://secure.wikimedia.org/wikipedia/en/wiki/Clickworkers] and the Cornell Lab of Ornithology’s eBird project (2006: 31)[http://ebird.org/content/ebird/about]. In both examples, the crowd knows that what they are doing is work, but they still do it, even though they are contributing information for free, because they support the purpose for which they are laboring. If anything, this emphasizes the importance of disclosure rather than compensation. | |||
== Summary == | == Summary == | ||
Each of the three types of crowdsourcing discussed above presents interesting challenges based on the unique characteristics that each possesses. While there is some overlap among them -- for example, the ESP Game is essentially a microtask designed as a game -- each manifests its "difficult problems" in a different way. For instance, while compensation is a potential issue for all three, there is a concern present with microtasks that workers are not sufficiently compensated to make a living wage; unease that higher education is being devalued through lower wages for "professional" tasks; and a question whether workers are somehow being duped into doing work for free with crowdsourcing games. While the Best Practices document outlines great general solutions to problems, we felt that the different types of crowdsourcing required more tailored responses. Thus, we have attempted here to offer specific answers to some of the problems within each kind of crowdsourcing, while also providing a look at crowdsourcing generally and the disputes currently taking place in the literature. The next step from here might be to do a series of demonstration projects, utilizing and testing some of the suggestions here. |
Latest revision as of 05:46, 13 August 2020
Crowdsourcing: Background and Working Definitions
Definitions
At present there is no generally accepted definition of crowdsourcing, and commentators have used many different meanings. Therefore, we believe an overview of the various definitions offered is helpful for further discussion of different types of crowdsourcing. For the purposes of our discussion, we will be using the terms "crowdsourcing", "crowdwork", and "cloudwork" interchangeably.
The most widely accepted definition of crowdsourcing comes from Jeff P. Howe, who recognized it as "the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call."[1]
He further clarified that the form of crowdsourcing could be either peer production (when co-workers interact and collaborate on projects) or sole individuals (when co-workers, if any, are isolated from one another). Under Howe's definition, the employer must be an organization (in most cases, a corporation), because he was considering crowdsourcing as a new type of corporate business model, by which corporations could raise current productivity or establish new businesses that were not possible before. Nevertheless, we do not think the employer using crowdsourcing, as a matter of definition, must be an organization; individuals can certainly outsource a task to an online crowd.
Kleemann and Voß (2008) argue that "central to the concept of crowdsourcing is the idea that a crowd of people, collaboratively (or at least simultaneously) contribute to an aspect of the production process or to the solution of a design issue or other problems."[2]
Although we agree that simultaneous or collaborative work is a significant type of crowdsourcing, it is not the only one. The Best Practices entry for crowdwork, developed last year and reposted on Class 3, divides crowdwork into three categories: "First, a large group of workers may do microtasks to complete a whole project; the best-known platform in this arena is Amazon Mechanical Turk. Second, companies may use cloudwork platforms to connect with individual workers, or a small group of workers, who then complete larger jobs (e.g., Elance[3] and oDesk[4]). Finally, a company may run 'contests,' where numerous workers complete a task and only the speediest or best worker is paid (e.g., InnoCentive[5] and Worth1000[6]). In some contests, the company commits to picking at least one winner; in others, there is no such guarantee." It is clear that when the crowdsourcing takes the form of competitive bidding, not every participant works on a single aspect of the task; each of them works on the whole task, and they do not have to work at the same time. Only the final winner gets compensated. It is possible that only one individual or organization joins the bidding process and no competing parties are involved.
Another concept that Reichwald and Piller (2006) used to describe crowdsourcing is "interactive value creation". They further differentiate two types of crowdsourcing: mass customization and open innovation.[7] We do not find Reichwald and Piller's approach convincing. First, the general term "interactive value creation" can be used to cover many types of online collaborative activities traditionally not recognized as crowdsourcing, e.g., open source development. Second, mass customization refers to an isolated customer's activity to tailor one particular product, rather than contributions to a general product. Their second type, "open innovation", correctly points out that crowdsourced work should be outsourced through an open call, but we do not believe a strong degree of "innovation" is necessary, especially for highly divided microtasks.
Based on the above discussion, we believe that there are two core elements of crowdsourcing, both of which may be facilitated by an online platform (such as Amazon Mechanical Turk[8]):
1. The task is outsourced through an open call from the employer;
2. The recipients of the call, whether or not they elect to participate, comprise a large, amorphous crowd.
The following discussion of crowdsourcing reflects our understanding of three of the most significant types of crowdsourcing: microtasks, "professional" tasks, and "game" tasks. Microtasks facilitated by platforms such as Amazon Mechanical Turk [9] might be the typical type of crowdsourcing in the literature.[10] Not only is there an open call by an employer to the crowd, but the crowd also collaborates on the whole task, with each member addressing a small piece of the pre-divided work. Professional tasks, in contrast, might have multiple people working on them, but there is no collaboration among them, and their work usually involves a higher degree of innovation. Game tasks are again a different animal, since they copy many attributes of the other two, but add another layer by aiming to be entertaining as well as purposeful.
General Information on Crowdsourcing
- General Information
- For a quick overview by Jeff Howe, author of Crowdsourcing,[11] take a look at this YouTube clip.[12]
- Northwestern University Professor Kris Hammond also explains crowdsourcing, but argues that its downsides are worker rewards and quality.[13]
- Our very own Jonathan Zittrain discusses crowdsourcing in his talk, Minds for Sale.[14]
- Several individuals gathered to discuss crowdsourcing in a panel moderated by New York Times correspondent Brad Stone.[15]
- In the News
- Examples of Crowdsourcing
- Take a look at Wikipedia's compilation.[19]
Crowdsourcing Literature
General Overview
Although the idea of crowdsourcing--if not the word itself--has been around for many years, the Internet has made it much easier, cheaper, and more efficient to harness the power of crowds. The power of crowds was popularized in 2004 when James Surowiecki published a book entitled The Wisdom of Crowds.[20] This book purported to show how large groups of people can, in many cases, be more effective at solving problems than experts. According to Surowiecki (2004: xiii), "under the right circumstances, groups are remarkably intelligent, and are often smarter than the smartest people in them." Two years later, journalist Jeff Howe coined the term "crowdsourcing" to refer to work that was performed by the "masses" online.[21] Since Howe's article was published in 2006, numerous authors have written books on crowdsourcing, each choosing to focus on different aspects of the topic. Howe himself took up the topic in 2008, proclaiming crowdsourcing to be a panacea--a place where a perfect meritocracy could thrive.[22] Howe examined crowdsourcing from a variety of perspectives: what benefits it can provide, what kinds of tasks it can accomplish, and the potential changes it may bring about. Howe's prognosis for crowdsourcing was positive--in it he saw many potential solutions and few potential problems. Others have followed Howe's lead in describing the benefits of crowdsourced work. Clay Shirky has published two books--Here Comes Everybody (2008)[23] and Cognitive Surplus (2010)[24]--in which he describes how technology does more than enable new tools; it also enables consumers to become collaborators and producers. Although Shirky's books are not expressly about crowdsourcing, they mirror the optimism Howe expresses, both in terms of collaborative enterprises and the Internet's power to enable them.
These books have provoked an academic interest in finding out who is the crowd, or why the crowd moves the way it does. Some have looked at scientific crowdsourcing, asking what characteristics make someone a successful crowdworker/problem-solver.[25] Part of answering that question, it turns out, requires asking why people attempt to be part of the innovating crowd in the first place. The authors of this study found that the crowd was highly educated. They also found heterogeneity in scientific interests, as well as monetary and intrinsic motivations, to be important drivers of "good" problem-solvers. Others have examined non-scientific endeavors and asked similar questions. One report, for example, examined workers on iStockphoto.[26] This report found that most iStock crowdworkers developing photographs were highly educated and motivated primarily by money. Jeff Howe, however, takes a different perspective as to crowdworkers' motivation: "There are . . . two shared attributes among almost all crowdsourcing projects: participants are not primarily motivated by money, and they’re donating their leisure hours to the cause. That is, they’re contributing their excess capacity, or 'spare cycles,' to indulge in something they love to do." (Crowdsourcing, pgs. 28-29.)
While some focused on the potential consumer revolution or the composition of the crowd, others examined the business-related aspects of crowdsourcing. Identifying attributes of successful crowd innovators also has a business dimension. One researcher suggests that having experience spanning across a variety of communities or disciplines makes one likely to be considered 'innovative.'[27] Others focus more broadly on how to use the crowd to maintain or bolster business or brand. In Groundswell (2008),[28] Charlene Li and Josh Bernoff focus on how to most effectively use crowdsourcing to advantage businesses. The authors highlight how user bases of products can undermine a product or brand.[29] As a result, the authors propose that businesses use the "groundswell" to their advantage, fostering communities that can provide valuable feedback and economic payoffs. Marion K. Poetz and Martin Schreier have also taken a business perspective on crowdsourcing,[30] arguing that the crowd is capable of producing valuable (but not always viable) business ideas at a low cost. Other researchers have found that young entrepreneurs who were attempting to start businesses frequently belonged to these kinds of communities.[31] For a related discussion on user innovation and user communities, see Eric Von Hippel's books[32] and William Fisher's article in the Minnesota Law Review.[33]
Other authors have pointed out some of the problems with crowdsourcing. Dr. Mathieu O'Neil has argued that, despite its benefits, crowdsourcing can have inconsistent quality, can lack the diversity needed to draw on the "wisdom of the crowd", and can contain many irresponsible actors.[34] Miriam Cherry has argued that some crowdwork can be exploitative, sometimes forcing people to work for absurdly low wages.[35] She argues that we need a legal framework for addressing low wages, proposing we apply the Fair Labor Standards Act (FLSA) to crowdsourced work like that found on Mechanical Turk. In a forthcoming article, she takes a more systematic (but still legal) approach to suggesting solutions for the problems faced by different kinds of virtual work.[36] Cherry seems to be the only law professor to have written on addressing crowdsourcing from a doctrinal perspective.
Much of the other literature on the subject concerns the problem of quality. Soylent--which is essentially a crowdsourced editing program--has been a prime example of how lack of quality can limit the commercialization of an innovative and useful crowdsourcing product.[37] Cheat detection--the ability to filter out individuals who complete tasks without actually reading them in the hopes of receiving money without doing the work--has also recently drawn attention. For instance, a possible crowdsourced solution to cheaters has been proposed for sentence translations, relying on principles such as crowd consensus and logical parallelism in sentence structure and word choice.[38] Others have attempted to increase the quality of the traditionally-automated mechanism used to translate words by crowdsourcing translation tasks.[39] In addition to simple crowdsourcing, one set of authors suggests combining human crowdwork with machine work. Under this process, according to the authors, the system can specify a particular "speed-cost-quality tradeoff," which is based on an allocation of tasks among computers and humans.[40] John J. Horton, David Rand, and Richard Zeckhauser have addressed using the online crowd for quality experimental research.[41]
Other Problems
The literature on crowdsourcing often discusses either broad or specific issues. Books tend to have an overall argument about the value of crowdsourcing, its core attributes, and how it needs to be structured. Articles, conversely, tend to describe specific studies or problems within a particular community. There is little room for systematically addressing common crowdsourcing problems. Instead, the platforms offering crowdsourcing, such as Mechanical Turk, address these problems internally. 99designs--a website that allows people to solicit creative logo designs--has several policies regulating the behavior of those who request[42] and perform[43] work. Most crowdsourcing services have similar policies or recommendations. In January 2010, a small group of students from Harvard Law School and Stanford Law School gathered in Palo Alto for three weeks to talk about these more general problems. They produced a document of Best Practices (Class 3), which sought to identify and propose a framework to address problems endemic to crowdsourcing. That document identified six major issues that needed to be addressed in cloudwork:
1. Disclosure: Workers want to know the identity of the employer, so disclosure should be the default preference.
2. Fairness: Employers sometimes underpay, pay late, or don't pay at all, so employers should pay fair and just wages on time.
3. Feedback and Monitoring: Judging the worker, task, or company is difficult for each player, so platforms should work to enable better feedback and monitoring systems.
4. Healthy Work Environment: Workers face the risks of stress from repetition, alienation and isolation, and addiction, so platforms should explain risks and companies should implement strategies to reduce risks.
5. Reputation and Portability: Workers who do good (or bad) work cannot capitalize on (and employers cannot avoid) their work, so platforms and companies should work to keep records of worker information and use it to track performance and confirm identities.
6. Privacy Protection: Workers are concerned with employers sharing their (potentially sensitive) information, so platforms should protect information and not release it.
The Best Practices document provides a good starting point because it identifies several major issues common to all crowdsourcing. It does not, however, capture all potential problems. Additionally, it tends to focus concerns only on the workers, but platforms and companies also face similar problems. Moreover, because the document is meant as a general framework, it is hard to get a sense of whether it could be effectively implemented across the board. There is room, then, to explore problems that are both broad enough to have implications for a variety of actors, but specific enough to merit a context-specific solution.
Our Addition: Identifying Areas, Exploring the Problems
Given the body of literature and the Best Practices document, we found the idea of addressing systemic problems both attractive and difficult. Instead of replicating the Best Practices, or simply writing an overview of crowdsourcing, we decided to take a different angle. Unlike the Best Practices document, which classified problems generally and then worked downward to devise specific solutions by applying them to different types of crowdwork, we worked from the bottom up. We identified three types of crowdwork that suggested a variety of important, but (context-)specific problems. At the beginning stages, we had only our intuition to guide our "sense" of the problems. As we delved further into them, however, they crystallized. From our discussions we identified three types of crowdwork in which specific problems arise, some of which are systemic problems with crowdsourcing that the Best Practices does not address. Nevertheless, we wanted to draw on the Best Practices document to determine whether some of its strategies seemed workable or needed to be expanded, refined, or discarded. To accomplish this goal, we attempted to integrate the Best Practices approaches into our framing of both the problems and the solutions we discussed.
An Introduction to Our Approach
Our discussion of various crowdsourcing environments suggested a variety of ways to slice the pie. In the end, we settled on three areas of crowdsourcing, reaching a rough classification based on the type of work performed. In that sense, our division followed the Best Practices division of work into microtasks, connective tasks, and contest tasks. But there was an important difference: our classification of work depended also upon the purpose to which the work was being put, focusing on a specific case study for each. In other words, it mattered to us that one task was framed as a "game" versus a "survey." We cared not just about the framing, but also about the motives of the employer and the worker. We asked questions like, "For what purpose is the employer requesting this task?" and "Why does the worker choose to perform the task?" Our aim was not to analyze every kind of crowdwork using motive and purpose; rather, these questions provided a general framing for dividing crowdwork into analytical categories--places where we could identify specific problems that may differ depending on the answers to these questions. After significant discussion, we settled on three types of tasks, choosing a case study to explore each one:
1. Microtasks: Amazon's Mechanical Turk, Microtask.com, and Soylent[44];
2. Tasks requiring "professional" skills: 99designs[45] and InnoCentive[46]; and
3. "Game" tasks: Gwap[47].
For each of these tasks, we attempted to identify salient "problems": issues that cause concern for workers, employers, platforms, businesses, or society generally. In identifying problems, we had two goals. The first was to provide a set of new issues for others to build upon in future work. The second was to explore a small number of issues and propose our own context-specific solutions. In this sense, it was an exercise in both applying the Best Practices and inventing new solutions that either context or framing prevented the Best Practices from solving. In what follows, we explain each topic, the problems it presents, and specific solutions to selected problems. Although we think the solutions we propose have some teeth, they are not meant to be final. Indeed, our goal in presenting these solutions and problems is to provide a base from which others can build.
The 3 Crowdsourcing Environments and Problems
Microtasks: Amazon Mechanical Turk; Microtask.com; Soylent
Microtasking is a type of crowdsourcing in which an employer divides a task into smaller subtasks that require human intelligence (Human Intelligence Tasks, or HITs [48]), and then assigns those microtasks to workers in the crowd. Each microtask can be completed independently of any information about the other microtasks. Although some tasks might require the worker to have certain qualifications to complete them, such as knowledge of a language, the human intelligence required to complete a HIT is minimal--i.e., an average (or even below-average) worker can do the job. Workers earn income based on the number of tasks that they complete and that are approved by the employer.[49] Workers do not always earn monetary benefits, however: in some cases, they feel a sense of achievement from completing a task, win virtual points in the form of games (as discussed in the final section, e.g., Gwap [50]), or earn no benefits at all (e.g., ReCaptcha [51]). In this section, we limit our discussion to microtasks with some kind of monetary reward.
An introductory video of Microtask.com [52] is a good illustration of the nature of the microtask-type of crowdsourcing. Microtask.com also offers two typical examples that are particularly suitable for microtasks: form processing, which helps clients such as banks and insurance companies transfer hand-filled forms into digital forms compatible with databases[53]; and digitization, which helps clients such as national archives and libraries proofread their scanned and OCR'ed files and digitize them into a format that is searchable and machine-readable [54].
Quality Control
Problem
One concern is that the quality of crowdsourced projects may not always be satisfactory. Below is a sample of text from the second paragraph of Prof. Zittrain's book [56], as revised by Soylent [55].
"This was not the first time Steve Jobs had launched a revolution. Thirty years earlier, at the First West Coast Computer Faire in nearly the same spot, the twenty-one-year-old Jobs, wearing his first suit, exhibited the Apple II personal computer to great buzz amidst “10,000 walking, talking computer freaks.” The Apple II was, a machine for hobbyists who did not want to fuss with soldering irons:, had all the ingredients for a functioning PC were provided in a convenient molded plastic case. It lookedWhile clunky, yet it could be at home on someone’sfit on a home desk. Instead of puzzling over bits of hardware or typing up punch cards to feed into someone else’s mainframe, Apple owners faced only the hurdle of a cryptic blinking cursor in the upper left corner of the screen: the PC awaited instructions. But the hurdle was not high. Some owners were inspired to program the machines themselves, but true beginners simply could load up software written and then shared or sold by their more skilled or inspired counterparts. The Apple II was a blank slate, a bold departure from previous technology that had been developed and marketed to perform specific tasks from the first day of its sale to the last day of its use."
A comparison of the above text with the original suggests that the revision does not improve on the original; in fact, it adds typos and errors that the original does not contain. Another example would be a worker who does not actually use human intelligence, but just clicks the mouse randomly. For instance, in a task where a worker is supposed to tell the sex of the person in a photo, or tell the major color of a picture, one could randomly choose the result in order to complete as many tasks as possible. Although workers on a task only earn income when the requester approves their work,[57] the approval of requesters is generally procedural rather than substantive. The dilemma is that employers cannot check each microtask (since it would mean that the employer would have to do the work all over again), nor can they use machines to do so (otherwise they would not have outsourced the task to the crowd in the first place).
Solutions
Improving the quality of crowdsourcing means not only that current tasks can be completed with better quality, but also that the crowd will develop a greater capacity to assume more complex responsibilities. Two approaches can be considered to improve quality: 1) result verification/evaluation, and 2) worker grouping. While neither is a perfect solution by itself, a superior system might be a hybrid of the following methods.
- 1. Verifying microtasks. One approach to maintain the quality of microtasks is to have the results checked. The dilemma here is that, on one hand, employers are unable to verify each microtask in substance (because if they could, crowdsourcing would be unnecessary); on the other, machines cannot verify the result because the microtasks usually require human intelligence that is difficult for machines to process. Therefore, possible solutions lie back with the crowd.
- a. Repetition. As some platforms, e.g., Soylent, [58] have already done, the employer can have each microtask repeated by multiple workers. A result is accepted as valid only when a sufficient majority (e.g., 2/3 or 3/4) of workers produce the same result. This assumes that the majority of workers on the task act in good faith, i.e., that they genuinely exercise their human intelligence. Obviously, if most workers click randomly, even such a mechanism will not guarantee high-quality products. Another issue is cost. From an efficiency point of view, each microtask can be solved by one competent worker. Verifying the result by repetition multiplies the cost of labor by the number of repetitions that the employer designates.
- b. Gold standard. Another mechanism that some employers use, if the nature of the task allows, is to mix in test questions where the result is already known to check the quality of the worker's performance. (See, e.g., Qin Gao & Stephan Vogel, "Consensus versus Expertise: A Case Study of Word Alignment with Mechanical Turk", Language Technologies Institute, Carnegie Mellon University, pg. 31.) For instance, if legal journals wanted to crowdsource subciting[59], they could include an incorrect citation where they already knew what the correct version was, and use that as a standard against which to measure the worker's overall performance of the task.
- 2. Differentiating the crowd. A crowdsourcing task may call for workers who are more experienced or more competent in a particular area. One way of differentiating such a group of workers from the rest of the crowd is to impose qualifications based on their prior experience, which can be either their experience in the off-line world or their online performance regardless of their actual off-line background. Another is to require a qualification test that potential workers must pass before they start.
- a. As in the off-line employment market, crowdsourcing platforms can group workers based on their education, professional qualifications, etc., and an employer may specify such requirements to admit workers for its task. The difficulty here is that there is no reliable way of verifying the off-line information. If relying on self-reporting, the employer may receive fake information from workers who want to have more opportunities to work. While the employer (or the platform) can have workers upload evidence for the backgrounds (e.g., an electronic version of degree certificate, an .edu email address), workers can still fake evidence [60]. Moreover, there are acute concerns about privacy: people may not want to upload their real copies of certificates, even to large online service providers like Amazon or Google, who usually have better privacy protection mechanisms, so they might upload fake ones instead or not upload at all.
- b. Another approach is to completely ignore people's off-line identities and focus only on their online experience. A rating system is commonly established for this purpose. A requester may choose how many points a worker can earn for each microtask. A platform may choose to adopt a more specialized rating system (categorizing experience into different types of tasks), or a uniform rating system (experience accumulates by the number of tasks completed, as in Mechanical Turk's system[61]). A uniform rating system might be unfair to newcomers because they lack the required experience points to start, especially since workers who have sufficient points might have earned them from tasks that are irrelevant to the current task. A categorized rating system seems to be more reasonable. (Rating systems are discussed in more detail in the next section's discussion of reputation problems in professional work.)
- c. Some requesters do not consider either off-line background or online experience. They set up a training session and final quiz for candidates, and only those who can pass the quiz qualify to work [62]. The limitation of the qualification test is that it only ensures the competence of the worker; it does not guarantee that he or she will genuinely do the work as expected.
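As a rough illustration of the repetition and gold-standard mechanisms described above, the following Python sketch shows one way a requester might accept a result only when a sufficient majority of workers agree, score a worker against known-answer test questions, and estimate how repetition multiplies labor cost. The function names, the default 2/3 threshold, and the data layout are illustrative assumptions; they are not taken from Mechanical Turk, Soylent, or any other platform's actual interface.

```python
from collections import Counter
from fractions import Fraction


def accept_by_repetition(answers, threshold=Fraction(2, 3)):
    """Return the most common answer if its share of all answers
    reaches the threshold; otherwise return None (no accepted result)."""
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    if Fraction(votes, len(answers)) >= threshold:
        return answer
    return None


def gold_accuracy(worker_answers, gold):
    """Share of gold (known-answer) questions the worker answered correctly.

    worker_answers and gold both map a task id to an answer.
    Returns None if the worker saw no gold questions."""
    scored = [task for task in gold if task in worker_answers]
    if not scored:
        return None
    correct = sum(1 for task in scored if worker_answers[task] == gold[task])
    return Fraction(correct, len(scored))


def labor_cost(num_tasks, price_per_task, repetitions):
    """Total labor cost when every microtask is repeated by several workers."""
    return num_tasks * price_per_task * repetitions


if __name__ == "__main__":
    # Three workers label the same photo; two of the three agree.
    print(accept_by_repetition(["female", "female", "male"]))
    # No answer reaches the 2/3 share, so nothing is accepted.
    print(accept_by_repetition(["red", "blue", "green"]))
    # One gold question ("g1") was mixed into this worker's tasks.
    print(gold_accuracy({"g1": "x", "t1": "a"}, {"g1": "x", "g2": "y"}))
    # 1,000 microtasks at 2 cents each, repeated by 3 workers (in cents).
    print(labor_cost(1000, 2, 3))
```

Exact fractions are used so that a 2-of-3 vote compares cleanly against a 2/3 threshold. A hybrid system of the kind suggested above might apply the majority check only to workers whose gold accuracy exceeds some cutoff, though the appropriate cutoff would depend on the task.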
Protection of Workers
Problem
The Best Practices document composed by Class 3 of the last winter course discusses many issues that involve the protection of workers' interests. We agree that crowdsourcing (and the microtask-type of crowdsourcing in particular) creates a unique virtual environment where co-workers are isolated, and therefore reveals new issues in the employer-worker relationship. It is also important to recognize that crowdworkers are a heterogeneous group, not merely in terms of skills or other work-related differences, but in terms of backgrounds as well. For instance, recent studies on the demographics of Mechanical Turk [63][64] show that there is an increasing number of Turkers from India. In March 2008, the platform was dominated by U.S. participants (76%) and only 8% of Turkers were from India. As of November 2009, although U.S. participants still accounted for more than half (56%), the Indian portion had skyrocketed to over one-third (36%). Mechanical Turk is becoming more and more international, and the diverse workforce complicates the issues concerning the protection of workers' interests. Therefore, we will attempt, when appropriate, to treat workers with different backgrounds and purposes separately when discussing the protection of their interests in crowdsourcing work.
Solutions
- 1. Unionize. Felstiner (2010) argued that the current statutory framework based on the National Labor Relations Act (NLRA) is insufficient to address the legal issues on crowdsourcing; he further suggested that workers do not need to wait for legislative action, but rather can use their collective power to protect their interests now.[65] This suggestion raises a question: what kind of collective action is appropriate for crowdsourcing? One side of the spectrum is for organizations, such as trade unions in the off-line world. We believe this approach is problematic for several reasons. First, only the platform possesses complete information of all the workers, and due to the wide geographical distribution, establishing a union entails disclosure of employer's contact information. Second, employers of commercial tasks rely on the isolation of workers to take advantage of the benefits of crowdsourcing without compromising trade secrecy and confidentiality. However, communications among workers might enable them to reassemble microtasks and reproduce the entire task, which may not be an acceptable risk for the employer.
An online forum is another possibility. Compared to a union, online forums are loosely organized, and they satisfy workers' need for anonymous communication. Existing forums include Turker Nation [66] and mTurk Forum [67], which seem to be functioning well. On those online forums, workers can discuss any issues of mutual interest and try to bargain with the employers or the platform. Meanwhile, the identities of workers and details of the tasks they are working on stay confidential unless they reveal them deliberately. The risk of a breach of confidentiality can be addressed by rules of forum discussion enforced by the administrator of the online forum.
- 2. Minimum wage. Cherry (2009) argued that the minimum wage requirements should be extended to "virtual work", including crowdsourcing in cyberspace.[68] Given that a large portion of Turkers are from India and that the average household income is relatively low in India, as seen in the demographic studies noted above, some Indian Turkers (less than 10%) have started to rely on crowdsourcing as their primary source of income. For the other workers in India, although crowdsourcing is not their sole employment, it accounts for a relatively significant portion of their income.[69] Even in the U.S., more than half of Turkers consider monetary reward their primary reason to participate in crowdsourcing.
However, given the global nature of the workforce on Mechanical Turk, the details of setting a minimum wage for crowdsourcing are extremely unclear and difficult because of jurisdictional differences in labor law, taxation, etc. For example, should the minimum wage law in the U.S. apply merely because the platform (Amazon) is registered in the U.S., even though both the requester and workers are from outside the U.S.? If so, more and more workers from developing countries would likely join crowdsourcing platforms, because the economic incentives are more attractive to them.
On the other hand, the need for a minimum wage is less compelling for those who do not rely on crowdsourcing as their major employment. Most microtasks require a low level of attention, and people can easily multi-task while watching television or chatting online. Crowdsourcing enables them to take advantage of their "spare cycles", and meanwhile enables employers to utilize human intelligence at very low cost. In these cases, imposing a minimum wage may diminish some of the advantages of crowdsourcing.
"Professional-Grade" Tasks: 99designs & InnoCentive
99designs is a website that allows individuals or companies that need a design to ask for it by crowdsourcing.[70] InnoCentive is a company that allows individuals and entities to post scientific problems that anyone can attempt to solve.[71] These companies are a particularly interesting form of crowdsourcing because they enable the crowd to perform work typically performed by "professionals." Although services like Mechanical Turk also "deprofessionalize" work, 99designs and InnoCentive do so much more directly. Typically, designs are created (at 99designs) or problems are solved (at InnoCentive) by professional companies, the employees of which typically have some formal training. These platforms raise a variety of concerns. We focus here on two: deprofessionalization and reputation.
"Deprofessionalization"
In some sense, graphic design and other industries such as science are "professionalized." What, exactly, "professional" means is open for debate. One definition might focus on how the "professional" occupation or industry regulates or doesn't regulate itself. Lawyers and doctors, for example, are self-policed professions with codes of ethics and oaths. Additionally, though, we often think of many other areas as professional because the individuals who occupy them have formal training (and many times formal education). Under this latter definitional component, many types of work qualify as "professional." The traditional occupations like lawyer, doctor, and clergy certainly fall within it; but so too do other kinds of work, such as graphic design. A more restrictive definition--one that includes both definitional components--might exclude a wide variety of occupations. But let's assume a common sense distinction between professional and amateur or non-professional works. For the past several years, crowdsourcing has crept into these "professionalized" areas without much fanfare. In science, for example, InnoCentive has provided a platform for corporate employers to crowdsource complex science problems that require formal training. In the "creative" space, 99designs performs a similar function: it enables companies to crowdsource graphic design work.
When thinking about both of these "professionalized" environments, it helps to understand professionalism from two perspectives. The first perspective views professionalization as a beneficial gatekeeping mechanism--it vets people before they can perform certain work. The second perspective views it as merely reinforcing existing structures that disadvantage certain individuals; i.e., those without connections or formal training. There is probably some truth to both claims--but the focus here is on how crowdsourcing affects either perspective. From the first perspective, professional crowdsourcing platforms reduce the role of industry or profession as gatekeeper--and have the potential to eliminate it entirely. From the second perspective, crowdsourcing opens opportunities to those who otherwise are shut out of the industry by the gatekeeping mechanisms of formal education or training. Given that both of these perspectives probably have merit, there are a variety of problems we could address. We chose instead to identify two "problems," while remaining open to the possibility that others, with more time and resources, might be able to do further work on them. That work might show these "problems" to be non-issues, or it might elaborate and expand on the issues identified here.
Specific Problems
- 1. Cannibalization/Wage Reduction. If 99designs or InnoCentive lowers entry barriers and costs, it could stimulate a race to the bottom. 99designs' Business Development and Marketing Manager, Jason Aiken, has already acknowledged that the company has created "a tension" with the traditional market for graphic design--and it may be driving down wages.[72] (Clip at 34:16.) In this world, professionals could not earn a living because "amateurs" or other professionals that do not have jobs will drive down prices for crowdsourced design work. With low prices and an abundance of crowdworkers, companies may shed their traditional means of acquiring designs. So crowdsourcing, which started off as a way to lower specific business costs or solve thorny problems, becomes the sole means of (research &) design work. In this environment, which designers worry about,[73] the professional industry collapses because wages are too low. Alternatively, a new web-based professionalization occurs. That, of course, depends on a variety of factors, including the ability of crowdworkers to maintain a coherent identity/reputation online (see Reputation below).
- 2. Devaluing of Education. If deprofessionalization occurs and an industry cannibalizes itself, there will be a concomitant and precipitous drop in the market value of education. In scientific areas, many crowdworkers tend to have advanced degrees, or are highly educated. Their ability to perform sophisticated or professional crowdwork is therefore partly dependent upon their education. But the crowdwork market undervalues the educational experience of the worker because, at present, most highly-educated workers are engaged in part-time, non-sustenance activity. As crowdwork becomes more popular for professionals, wages fall and crowdwork replaces traditional professional work. The cost of education, however, remains constant. This means that sophisticated crowdworkers will pay the same amount for school but will be full-time, instead of part-time, crowdworkers. Given the low wages, people will not be making an adequate return on their educational investment. Several possibilities then result--but we focus on the most dire here. Individuals who know they cannot generate an adequate return on their investment could be deterred from higher education. This, in turn, will cause a brain drain, where fewer and fewer people obtain advanced degrees. As a result, the market for professional crowdworkers shrinks. Given the technological growth rate, the demand for crowdworkers will continue to grow. The short supply and high prices will mean the end of crowdsourced professional work for two reasons. First, costs will rise to the point where crowdsourcing is no longer more economical than traditional professional services. Second, and more importantly, the lack of workers means that, given the growing technological and cultural demands our society makes, tasks simply cannot be crowdsourced effectively. Although one might argue that markets will correct for the supply/demand problems, they may not do so at fast enough rates.
The legal education market, for example, has continually grown despite decreasing demand.[74] The same thing is happening with for-profit colleges.[75][76] The market has not yet corrected itself because the government is so willing to issue loans. When the problem finally comes to the fore, there will be serious costs to society, to students, to graduates, and to the industry. The mere fact that the market will correct the problem eventually doesn't mean we should wait for it to do so, especially when the correction may be painful and enduring.
Solutions.
One might wonder whether concerns over deprofessionalization are overstated. The criticism--the one Howe addresses in the first chapter ("The Rise of the Amateur") of his book[77]--is that we are worried simply about "amateurs" displacing "professionals." Howe certainly has a point: if *other* individuals can do the work of professionals for lower wages and through crowdsourcing, then what are we worried about? We do, of course, like things to be efficient. We are concerned, however, that some industries could, over the long run, suffer damage if efficiency is our only concern. That doesn't mean that industries will necessarily suffer damage, but they may. And we feel that it's worth considering what could happen if we place all our bets on markets and efficiency, and the dealer comes up 21. One place where we are starting to see the problems of relying on the market is with traditional media outlets. We may think that "amateurs" can report just as well as many professionals--and their doing so may be more efficient. But we have other concerns, such as whether the reporting is reliable and whether the facts check out. Similarly, in design work, efficiency and opportunity are good things. But what happens when the market for professionals has disappeared, and only low-wage crowdworkers remain? Alternatively, what if all legal work were crowdsourced? The quality of work may decrease, relationships between employers and workers may fray, and the industry may have significant difficulties. As noted, though, this is not the only scenario. The industry may adapt to these changes and do just fine. The exercise here, though, is to think about deprofessionalization and education devaluation as problems, and pose potential solutions while recognizing that the problems may not materialize. Here are a few solutions we might consider:
- 1. Wage Scale. Industries and professionals could collaborate to set wages they think are reasonable. The Best Practices document recommends "fair" wages, but seems to presume that only employers will be the ones deciding what wages to set. In this professional-crowdwork context, it might be wise to allow collaboration of interested parties, rather than rely only on employers to set a fair wage. One could see such a wage scale being set by various stakeholders, and perhaps some "non-stakeholders."
- 2. Wage Determinants. Wages could be set according to one or a series of credential measures. This could take several forms, some of which could be combined.
- 2a. Workers with a degree or work experience in a related professional field may be entitled to a higher wage than a worker with no higher education. The problem with this approach is that it reduces the "meritocracy" aspect of crowdsourcing. It also seems to devalue the diversity that crowdsourcing thrives on.
- 2b. If the platform implements an effective reputation or rating system (detailed or simple), workers could use their reputations to generate more work. This solution faces some technical and privacy problems, but seems like at least one plausible way of ensuring favorable wages for better performers. This also has the benefit of showing which workers are repeatedly good at performing tasks. This is important because many InnoCentive solvers, for example, never solve more than one problem.[78] One drawback of this method is that it decreases the "perfect meritocracy" that some seem to think can persist forever (if it exists at all).
- 2c. Workers could be paid according to the contributions they make. For this system to work, a platform and employer would have to work together. They would have to create a framework that allowed an initial screening for those who held the requisite qualifications or ratings, and then pay them according to the amount of time worked and contributions made. (Here some kind of algorithm might be useful.)
- 3. Educational Reform. Change educational components to provide skills that crowdsourcing cannot cover. This could include non-compartmentalizable tasks or exposure to a broader range of subjects. There are many problems with reforming the educational system. Aside from the many practical difficulties, there are two specific problems. First, it may be difficult to identify in advance what problems are crowdsource-able, as the capacity to crowdsource work is likely to change in the future. Second, because we don't yet know enough about crowdsourcing, it's difficult to say what skills or exposure to ideas one needs to perform certain tasks well--or outperform crowdsourcing.
- 4. Discourage wage reductions/crowdsourcing by having platforms require disclosure when crowdsourcing. This, however, may discourage crowdsourcing generally. The goal should be to reap crowdsourcing's benefits while minimizing its potential downside. Still, this solution could be used to less extreme degrees to pressure companies to offer competitive wages for crowdsourced tasks.
- 5. Incentivize pay-friendly behavior by using a reputation system (see below). The various approaches here could be combined with or pushed into a reputation system. We explain various problems with reputation systems below. Here, we focus on the potential benefits of a three-way reputation system. "Three-way" means that the system involves all parties: workers, employers, and platforms. A reputation system that engages all three parties can incentivize each party to engage in wage-friendly behavior. A reputation system, for example, that allows workers or platforms to assign reputational rankings to employers can cause wages to increase. If workers know that Corporation A generally pays 2x as much as Corporation B, workers can identify "fairer" wages. Similarly, companies may be willing to offer better wages if they can screen out "low reputation" workers.
- 6. Regulation. One of the benefits of crowdsourcing is the lack of regulation. That is sure to change, and it might be that regulating crowdsourcing, either as to wages or work, could benefit everyone.
Reputation
Reputation is an issue for all parties involved in professional crowdsourcing: the worker, the employer, and the platform. The Best Practices document focused on some of these issues in the microtasking environment. Many of these reputation mechanisms could be imported from user-based reputation systems--and there is literature on how to design such reputation systems.[79] Concerns in the microtasking environment focused on speed, efficiency, and work quality. The Best Practices also focused solely on the relationship between individual workers and employers. These concerns also exist in the professional environment. There are, however, other problems to confront. Specifically, worker quality may become more important for professional work because the work done is more time-, labor-, and skill-intensive. Additionally, because tasks are fewer in number and take longer, each contest or task completed also has greater influence on, and importance for, worker reputation. Finally, if professional work pays more than microtasks, the reputational stakes are higher.
Reputation Concerns for Workers, Employers, and Platforms
- Workers. As crowdsourcing increases and professionalized crowdworkers have success, crowdworkers want to signal to potential employers that they have done good work in the past. Workers may also want to list their professional experience, education, or training (as one can do on Gerson Lehrman Group[80]). Workers also will seek to ensure that antisocial behavior among themselves is counted against a worker's reputation. The literature on crowdsourcing emphasizes the community that exists in the crowdwork environment, and ensuring the community functions well is important. 99designs has numerous instances where a crowdworker accuses another of "stealing" someone's design. Workers also will want to know the reputation of an employer. This could include factors like the amount paid, the quickness of payment, and the type of work offered.
- Employers. Reputation would also be valuable for employers--they'd prefer work from people with high reputations, not only to ensure good work, but also to ensure the work was not copied from somewhere else. While copying from the worker's perspective is important as a social matter, from an employer's perspective copying is important as a legal matter. Employers want to avoid copyright infringement or any claims of copyright infringement. Employers also want to broadcast their "good" reputations.
- Platforms. Platforms have an incentive to ensure that workers' and employers' desire for a reputation system is met. It will, in many cases, be up to the platform to institute a system that can deal effectively with reputation, antisocial behavior, and legal issues. 99designs has implemented a system that seeks to deal with the latter two, but does not address reputation.
Problems.
- 1. Portability. As the Best Practices document notes, workers and employers may want to have coherent identities across a variety of platforms. In doing this, they may want to keep their identities secret but their reputations public. If a worker moves from 99designs to iStockphoto, for example, they may have to create a totally new profile and build up a reputation, even though some aspects of their previous reputation may be relevant.[81]
- 2. Reliability. Any feedback-reputation system needs to be reliable. That is, it needs to accurately reflect the quality and quantity of work performed. One way to ensure diversity in feedback, and therefore reliability, is to have a (platform-mediated) reputation system, where workers vote on both each other (where applicable) and employers, and employers do the same (vote on workers and each other). The system could then be mediated by the platform.
Solutions
- 1. Platform-specific solution. The Best Practices document suggests that each crowdsourcing platform implement its own measures to track reputation. This is a reasonable solution. 99designs and InnoCentive do not yet have such a system. This kind of solution would allow each platform to identify the reputational characteristics necessary for the kind of work offered. 99designs, for example, could track worker reputation based on cooperation among workers, number of contests won, and portfolio ratings from other users. It could also track employer reputation based on payment amount and speed, and copyright licensing terms. InnoCentive, by contrast, could focus on credential characteristics, such as degrees obtained or relevant work experience. It could track employers based on potential for future projects with the same company. The Best Practices document also suggests that workers have access to their data, and presumably envisions a way to use reputation information from one provider at another. Here, the solution could be for each platform to issue a "portable reputation," which would explain the reputation of the worker on that site, the work performed, and the average/median rating of a worker on the site (with similar work performed). In this scenario, platforms would have to have some interoperability, a problem on which some already are working.[82] Each platform, for example, could agree to host the reputational information provided by another. In such a situation, the worker could 'import' his or her data from, say, 99designs to iStockphoto. When that worker completes a task or uploads a photo, an employer could click on the worker's profile to view various bits of reputational information. (Perhaps they could all agree to a specific file (type) that included certain information?) In one iteration, this could include a "Reputation Homepage" that displayed the logos of various platforms for which the worker had completed tasks.
Scrolling the mouse over the logo might reveal some general reputational information, such as a reputation score (with the appropriate scale for the platform). This idea, though, requires cooperation among platforms--something that has been difficult in the past. One reason for this difficulty is that large, established platforms have a perverse incentive to keep a reputation system closed. These platforms already have a user base that depends upon the platform. Portability threatens what they perceive as theirs: the reputations earned through their platforms, which make the platform a desirable place to conduct business. eBay, for example, objected when Amazon allowed users to import their eBay reputations, claiming proprietary interest.[83] (p. 237.) This confirms Randal Picker's contention that, as far as portability on the Web goes, "law matters."[84] (p. 8.) [85][86]
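As a rough illustration of the "specific file (type)" idea, the sketch below shows one way a platform could export a portable reputation record and another platform could import it. Every field name, the JSON format, and the helper functions are assumptions made for illustration; no platform discussed here is known to use such a format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PortableReputation:
    """Hypothetical portable reputation record issued by one platform."""
    platform: str           # issuing platform, e.g. "99designs"
    pseudonym: str          # worker handle; no real identity is stored
    work_type: str          # kind of work the rating refers to
    tasks_completed: int
    score: float            # worker's rating on the issuing platform
    scale_min: float        # lowest possible rating on that platform
    scale_max: float        # highest possible rating on that platform
    platform_median: float  # median rating for similar work on that platform

    def normalized(self) -> float:
        """Map the score onto 0..1 so different scales can be compared."""
        return (self.score - self.scale_min) / (self.scale_max - self.scale_min)


def export_record(record: PortableReputation) -> str:
    """Serialize the record to JSON for transfer to another platform."""
    return json.dumps(asdict(record), sort_keys=True)


def import_record(text: str) -> PortableReputation:
    """Rebuild a record from the JSON produced by export_record."""
    return PortableReputation(**json.loads(text))


record = PortableReputation(
    platform="99designs", pseudonym="designer_42", work_type="logo design",
    tasks_completed=12, score=4.0, scale_min=1.0, scale_max=5.0,
    platform_median=3.5,
)
imported = import_record(export_record(record))
```

Carrying the scale and the platform median alongside the score is one way to let the receiving platform show the rating "with the appropriate scale for the platform," as described above.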
- 2. Uniform Reputation. Another approach to reputation would use a centralized entity to manage worker/employer reputation across all platforms. One can imagine a universal reputation platform that draws information from various sites and aggregates it--maybe it has agreements with all the platforms to provide it reputation-related information. (There are already sites that purport to provide reputational rankings of websites--see, e.g., Webutation.[87]) In other words, this entity would maintain a database of the reputations of all crowdsourcing employers and workers. Reputational information would be aggregated automatically from all participating crowdsourcing platforms, which would automatically submit data to this entity. Theoretically, employers and workers could consult this platform to determine worker reputations. To enable this "reputation finding," the reputation database could contain various "categories" of workers and employers, and allow an individual to sort based on various reputational scores, which would be determined based on a variety of metrics and the information provided to the centralized entity by the crowdsourcing platforms. So, for example, a company could find someone with a high "creative reputation" (e.g., logo design) or someone with a high "engineering" rating. It may also allow users to disaggregate reputational information across platforms to see how the user has performed/behaved in each. Additionally, a central reputation-authority would allow people to move "with" their reputations and stay "anonymous"--they won't disclose their real identity. As noted, this all would require platforms to (automatically) submit information to this reputation database/entity. Because platforms (in the hypothetical world) would automatically submit worker reputation info to the site, there are essentially two ways the site could work at the level of individual reputations.
Both methods of operation work from the premise that individuals and employers are searchable by reputation. Search results might display an aggregate reputation or a simple ranking with no reputational score accompanying it. The first way resembles the solution described above: once a user is selected, a visitor is taken to a reputation homepage that displays a worker/employer reputation from different platforms, and provides information about each. This provides at least one benefit for platforms: they don't have to generate interoperability--they can delegate reputation aggregation to one entity, which could probably collect and display the information more effectively. Two authors have already sketched a design for a similar system.[88] A second method would be for the reputation site to create its own "general" reputation for each worker based on the information it receives from platforms. We can imagine a situation in which workers or employers can ask the reputation site to rank workers based on specific, general, or a combination of characteristics. This would allow employers to sort workers by reputational characteristics they deem important to a particular type of work. Once sorted, employers could provide an open-call project to the narrowed crowd of workers. This system also would require the centralized agency to weight different sites or reputational attributes. This is a problem of reliability, discussed below. One other advantage of a unified reputational system would be allowing some user control over reputation. We can imagine a place in a user profile where the user can fill in information, provide explanations, or otherwise comment on past work experience and the like. (There may also be room for a centralized-decentralized approach, where the user claims authority over her reputation and distributes it how she wants. Jon Udell outlined such a concept in his recent talk at the Berkman Center.[89])
- 3. Reliability for Platform-Specific Versus Uniform. Each system of reputation (platform-specific and uniform) will have issues of reliability. By reliability, we mean a platform can accurately report to a user how well or poorly a specific individual has performed in the past, where performance is based on specific criteria. Because reputation systems rely on information, what information is used and who discloses it can influence the reliability of a reputation system.[90]
- Platform-specific reputational systems would likely achieve greater reliability because they would be localized. As self-contained systems, each platform could respond to user demands, as well as tinker with existing formulae that garner ratings. Still, it's not clear how reliable these local reputational systems could be across platforms. Assuming for the moment that each platform can in some way support the reputational rating of another platform, we still might have the problem of "overload": individuals will have too many reputational scores from too many different platforms. Employers may then look at, or sort by, only the lowest rating from any given platform on a user profile. Thus, platform-specific rating systems could actually work to the detriment of the users.
- A uniform reputational system may face more reliability issues than an interoperable platform-specific reputational system. For one thing, a unified system will be drawing reputation data from hundreds of different platforms automatically. There is a risk of gaming the system on two levels here. First, a user at a relatively unsophisticated platform may try to manipulate the system and boost their own reputational rating. Second, the platform itself may have an incentive to give its users higher reputational scores. Why? Because assuming the unified system computes some aggregate reputational score, any platform can boost its users' scores and give them an advantage in the marketplace. Another problem arises as to weighting reputational scores. Should a reputation score from InnoCentive be "worth" more than one from Gerson Lehrman? Do reputation scores from Threadless even translate into anything meaningful? If so, should they be treated the same as scores from 99designs? All of these questions illustrate that there is no easy way to decide how scores should be weighted, though we can think of a variety of factors, including number of users, length of existence, customer satisfaction, etc. Part of the problem also is that different platforms may use different rating systems--some may use only one general rating, while others may parse ratings into various categories. How these issues are dealt with impacts the "reliability" of reputation. It also, therefore, impacts the trust that crowdworkers have in a reputational system.
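To make the weighting question concrete, here is a minimal sketch of how a centralized reputation entity might combine scores reported on different scales. The weights, the scales, and the function names are all hypothetical; choosing the weights is exactly the unresolved problem described above, and the sketch assumes (rather than solves) honest reporting by platforms.

```python
from dataclasses import dataclass


@dataclass
class PlatformScore:
    """One reputation score as reported by a platform (illustrative only)."""
    platform: str
    score: float
    scale_min: float
    scale_max: float


def normalize(ps: PlatformScore) -> float:
    """Rescale a platform's score onto a common 0..1 range."""
    return (ps.score - ps.scale_min) / (ps.scale_max - ps.scale_min)


def aggregate(scores: list[PlatformScore], weights: dict[str, float]) -> float:
    """Weighted average of normalized scores.

    Platforms missing from `weights` get weight 0, i.e. they are ignored.
    """
    total_weight = sum(weights.get(s.platform, 0.0) for s in scores)
    if total_weight == 0:
        raise ValueError("no weighted platform scores to aggregate")
    weighted_sum = sum(weights.get(s.platform, 0.0) * normalize(s) for s in scores)
    return weighted_sum / total_weight


# Assumed weights: here InnoCentive counts twice as much as 99designs.
weights = {"InnoCentive": 2.0, "99designs": 1.0}
worker_scores = [
    PlatformScore("InnoCentive", score=8.0, scale_min=0.0, scale_max=10.0),
    PlatformScore("99designs", score=3.0, scale_min=1.0, scale_max=5.0),
]
overall = aggregate(worker_scores, weights)  # (2*0.8 + 1*0.5) / 3 = 0.7
```

Because a platform that inflates its users' scores raises their aggregate directly, a real system would presumably also need some check on the reported data; nothing in this sketch provides one.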
"Game" Tasks: Gwap
Gwap is a website comprised of games with a purpose (i.e., gwap).[91] That is, when its users play the games, they are simultaneously doing tasks that, in the aggregate, perform some function that improves our state of knowledge and/or advances our technology. For instance, the most well-known game on Gwap is called the ESP Game, where two players are shown a photo, each enters a list of suggested tags that the other player cannot see, and then they win points when they have a matching tag. The purpose behind the game is to create tags for images online, so that search engines can sift through them more easily when a user enters a query.[92] As Gwap's tagline states, "When you play a game at Gwap, you aren't just having fun." While such platforms offer an innovative way to draw on the 'wisdom of the crowd,' they also raise concerns about the potential exploitation of users and its concurrent effects.
Addiction
Problem.
The question of addiction has arisen both in the context of crowdsourcing generally [93] and in the context of so-called "social games", [94][95] such as FarmVille. [96] In a paper entitled Moving the Crowd at Threadless,[97] Daren C. Brabham offers some insight into the various motivations of the crowd at Threadless,[98] a website that sells t-shirts and applies crowdsourcing through its solicitation of the crowd for designs and slogans.[99] In particular, Brabham discusses how members of Threadless use the language of "addiction" when talking about their participation in Threadless. In his view (2010: 3, 17), the language of addiction illuminates the significance of building a community in order for crowdsourcing to be truly effective, such that organizations that use crowdsourcing "need to allow the crowd to truly support the problem-solving mission of a crowdsourcing venture for the public good, to generate in the crowd a sense of duty and love – and even addiction – to such a project", although he ultimately concludes that members of Threadless are most likely not actually addicted in the pathological sense.
On the social gaming side, Gamespot--the self-styled "go-to source for video game news, reviews, and entertainment"[100]--recently ran a story on the ethics of the social games market.[101] Here, the concern is that social games have crossed the line from being fun to being addictive. As Edmund McMillen [102] put it, "There's a difference between addicting and compelling . . . . Crack is addicting, but it's not a fun game."
So how does this affect crowdsourcing games? Since games like those at Gwap involve both crowdsourcing and social gaming, the concern is that the risk for addiction is more acute. Although undoubtedly hyperbole, players of the ESP Game, for example, have said that it is "strangely addicting" and that "it's like crack!" (See video [103] at 22:30.)
Solutions.
There are a number of safeguards, detailed below, that can be built directly into the platforms. Ideally, such changes would be made voluntarily, but legal regulation can be used as a tool to accomplish the necessary changes as well.
- 1. Platforms can kick people off after a specified amount of time. For instance, Gwap will kick users off if they have played for 15 hours straight, or 10 hours if they are from an .edu domain. (See video [104] at 14:54-15:02.)
- 2. Platforms can notify the player after a specified period of time has elapsed. (5 hours) 'Just to let you know, you've been playing for quite a while.' (10 hours) 'Are you sure you want to keep playing?' (12 hours) 'Really, you should go outside.' (15 hours) 'All right, that's it. I'm kicking you off for your own good.' The principle behind both this solution and #1 mirrors the one that led to the development of applications designed to prevent people from sending drunken e-mails that they rue when sober, such as Gmail's Goggles[105]; namely, that sometimes a little check on people can keep them from making decisions that they might later regret.
- 3. Platforms can have built-in termination of the games. In other words, games should have an established end to them. For instance, with games like Life[106] and Candyland[107], the game ends when one or more players reach the end of the trail. In contrast, games like Taboo[108] and Apples to Apples[109] continue until the players themselves call an end to the game. Built-in termination of the game would also be an effective tool to address addiction concerns for social games in general--imagine if FarmVille actually ended at some point.
- 4. Is it really a problem? It should be noted that there is little, if any, support for these assertions by way of rigorous studies. Moreover, it is hard to see how addiction is more of a problem for crowdsourcing games than for television or other kinds of entertainment that, despite the substantial amount of time people spend doing them, we do not regulate with regard to their potential addictiveness.
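The time-based safeguards in items 1 and 2 could be combined along the following lines. This is only a sketch: the thresholds and messages are the illustrative ones given above, the `from_edu` flag and the function name are assumptions, and nothing here reflects how Gwap actually implements its limit.

```python
# Escalating notices, as (hours played, message), taken from item 2 above.
NOTICES = [
    (5, "Just to let you know, you've been playing for quite a while."),
    (10, "Are you sure you want to keep playing?"),
    (12, "Really, you should go outside."),
]
KICK_MESSAGE = "All right, that's it. I'm kicking you off for your own good."


def check_session(hours_played: float, from_edu: bool = False):
    """Return (message, kicked) for a continuous play session.

    The session limit is 15 hours, or 10 hours for .edu users (item 1).
    Below the limit, the most recent notice whose threshold has passed
    is returned, or None if no threshold has been reached yet.
    """
    limit = 10 if from_edu else 15
    if hours_played >= limit:
        return KICK_MESSAGE, True
    message = None
    for threshold, text in NOTICES:
        if hours_played >= threshold:
            message = text
    return message, False
```

For example, `check_session(12)` returns the "go outside" notice without ending the session, while `check_session(10, from_edu=True)` ends it.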
Disclosure
Problem.
Another concern that arises with these games is that the players may not know the underlying purpose. For instance, on Gwap, there is an 'About' section that describes the purpose behind the ESP game as an example of what is meant by "games with a purpose",[110] yet there is no place on the website that describes the purpose of any of the other six games. Although Luis von Ahn, the main creator of Gwap and hailed as the "father of human computing", [111] describes the purpose and features of Gwap's games in academic papers (see his CV [112] for a list of his papers), chances are that few users have the access, know-how, or interest needed to hunt them down and read them. But that does not mean that they would not want to know if given the opportunity.
Solutions.
- 1. Objective Standard. As noted in the Best Practices, there should always be as much disclosure as possible. However, it is not always easy to see how this general principle should be applied in different situations. Perhaps one useful tool would be to employ an objective standard in deciding whether and to what extent certain information should be disclosed. In securities law, the materiality of information is measured by what a reasonable investor would want to know [113], so perhaps a good standard for crowdsourcing games would be what a reasonable person would want to know in deciding whether to play the game.
One may argue that potential players could not care less what the underlying purpose of a given game is, since they are only playing it for enjoyment. The obvious response to such an argument is that it depends on the purpose of the game. For instance, most people are probably either neutral or positive towards the ESP Game's purpose of labeling images to make them easier to sort, but they would most likely have a negative reaction towards a game whose purpose was to spam people with porn ads.
There is also the possibility that disclosure may not always be feasible for the game to work as intended. For instance, a psychology lab might formulate a game to test x, but if Player A is aware that the lab is testing x, it might inhibit or otherwise change how she plays the game, making the results useless for the purposes of study. So perhaps the caveat should be added that companies should disclose any information that a reasonable person would want to know in deciding whether to play the game, "except for information that, if disclosed, would have an adverse effect on the underlying task, aside from making the player unwilling to play." So "employers" would still be required to disclose as much as possible, but not to the point of compromising the underlying purpose of the game.
One question that has arisen is what to do when the nature of the task/game relies on nondisclosure, such as making edits to a document discussing the employer's trade secrets. One possible solution is to require workers to sign a confidentiality agreement. Another would be for workers to have some sort of profile where they indicate what employers they are or are not willing to work or play games for (e.g., Google-philes could indicate that they do not want to play any games sponsored by Microsoft). A third possibility is that employers could do something like a privilege log in discovery, where they indicate why they are not disclosing, and a designated third party (the equivalent of the judge in discovery) could decide whether the nondisclosure is okay.
But an alternative answer is that employers should simply not use crowdsourcing for particularly sensitive tasks. If it is that important to make sure something stays confidential, keep it in-house. When employers exponentially increase the number of people to whom the information is exposed, they are exponentially increasing the risk that the information will be leaked -- they should either accept that risk or go elsewhere.
- 2. Timing. A further option would be to require additional disclosure after the player has finished the game, since the task will have been completed at that point, and then to give her the option of voiding her results. However, this may lead to people voiding their results for reasons other than taking issue with some aspect of the company or task, such as embarrassment or discomfort with how they acted during the game.
- 3. Review Process. If additional oversight is needed to ensure that companies are in compliance with this standard, companies could set up internal review boards. Alternatively, an outside party, such as the government, a non-profit, or a coalition of companies, could set up an independent review board that is responsible for reviewing and approving (or not) the proposed games and accompanying information, much as researchers are required to submit their proposals to an institutional review board when the study involves human subjects.[114] Approval by such a board could be required before the game is crowdsourced, and/or the board could be given the authority--either by the government or by agreement among the companies--to do random audits or investigate complaints, just as the SEC can investigate and determine whether a company has failed to disclose material information with regard to its securities.[115]
Compensation
Problem.
One question that has come up with regard to crowdsourcing games is whether players should be compensated for doing tasks through the guise of a game.[116] Although the task is structured to be fun, it is still work that employers need done and would ordinarily pay for, were it not set up as a game. So are they exploiting the crowd by somehow 'tricking' them into doing work for free when the companies would have to pay for it otherwise? Does it make a difference whether people are aware of the underlying purpose? In the preceding section, we argue that people should know the task behind the game so that they can make an informed decision about whether to play it based on their personal values, but would the lack of compensation also play a role in their decision? Does it depend on whether they perceive it as work? More importantly, if people will do it anyway, why should companies pay for it?
Solution.
Players should remain uncompensated for a number of reasons. First of all, if people don't find the games fun, they won't play them. For instance, one game on Gwap is called Squigl, in which two players are shown an image and a word, and have to trace the object described by the word.[117] If their traces match, they earn points. One of the authors of this page, doing laborious, intensive research for this section, found the game excruciatingly dull. So she stopped playing. As von Ahn himself wrote, "The key property of games is that people want to play them." (Luis von Ahn & Laura Dabbish, Designing Games with a Purpose, Communications of the ACM, August 2008, vol. 51, no. 8.)
In other words, crowdsourcing games aren't really "free" -- companies have to spend time and money figuring out a way to convert a task into a game that is sufficiently fun that people will play it voluntarily. While the total sum might be significantly less than what it would cost to pay each worker/player, the fact that creating such games takes effort and money on the employer's part lessens the sense that the crowd is being exploited. Moreover, offering compensation might bring players closer to the brink of addiction: if players were earning money, however trivial an amount, they might feel compelled to keep playing, more so than if they were only playing it for enjoyment, although further research is needed on this point. Alternatively, players might actually play less if offered a token amount to play rather than an amount that reflects the value of their work, because they would feel exploited, even though they would be getting more than if they were paid nothing, similar to the psychology behind the ultimatum game.[118] Again, however, more research is needed.
According to von Ahn, people played over 9 billion hours of Solitaire in 2003. (See video [119] at 7:00-7:04.) Solitaire is a game without a purpose, other than to entertain.[120] So if von Ahn's statistic is accurate, people are perfectly willing to devote a significant amount of time to entertaining themselves. If the games on Gwap serve that function, why should they be penalized for simultaneously being useful?
One might argue that the issue is not that companies would be penalized were they forced to compensate the players for playing games with a purpose, but rather that the workers are currently being penalized through the lack of compensation because they are unwittingly contributing free labor when they are entitled to a fair wage for their work. The question then becomes whether playing such games is indeed "work." We would argue that it is not, because people are playing the game primarily for its entertainment value: if it stops being entertaining, then they will stop playing it. Alternatively, if they are playing it because they support the underlying purpose, then that purpose is clearly important enough to them that they are doing it for free, so they already have sufficient incentive to do the work. After all, do we want to encourage a value system where everything must have a price? (If the answer to that question is yes, Wikipedia is in trouble.)
Moreover, it should also be noted that not all workers get paid; some are volunteers. For instance, in Crowdsourcing, Jeff Howe talks about the contribution of the crowd to tasks like NASA’s "Clickworkers" project (2006: 62)[121] and the Cornell Lab of Ornithology’s eBird project (2006: 31)[122]. In both examples, the crowd knows that what they are doing is work, but they still do it, even though they are contributing information for free, because they support the purpose for which they are laboring. If anything, this emphasizes the importance of disclosure rather than compensation.
Summary
Each of the three types of crowdsourcing discussed above presents interesting challenges based on the unique characteristics that each possesses. While there is some overlap among them -- for example, the ESP Game is essentially a microtask designed as a game -- each manifests its "difficult problems" in a different way. For instance, while compensation is a potential issue for all three, there is a concern present with microtasks that workers are not sufficiently compensated to make a living wage; unease that higher education is being devalued through lower wages for "professional" tasks; and a question whether workers are somehow being duped into doing work for free with crowdsourcing games. While the Best Practices document outlines great general solutions to problems, we felt that the different types of crowdsourcing required more tailored responses. Thus, we have attempted here to offer specific answers to some of the problems within each kind of crowdsourcing, while also providing a look at crowdsourcing generally and the disputes currently taking place in the literature. The next step from here might be to do a series of demonstration projects, utilizing and testing some of the suggestions here.