
Revision as of 18:00, 6 December 2010

Crowdsourcing: Background and Working Definitions

Definitions

At present there is no generally agreed definition of crowdsourcing, and commentators have used many different meanings. Therefore, we believe an overview of definitions is helpful for further discussion of different types of crowdsourcing.

The most widely accepted definition of crowdsourcing comes from Jeff P. Howe, who defined it as

 "the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people
   in the form of an open call."[1]

He further clarified that the form of crowdsourcing could be either peer production (when co-workers have interactions among themselves) or sole individuals (when co-workers, if any, are isolated from one another). Under Howe's definition, the employer must be an organization (in most cases, a corporation), because he was considering crowdsourcing as a new type of corporate business model, by which corporations may raise current productivity or establish new businesses that were not possible before. Nevertheless, we do not think the employer in the process of crowdsourcing, as a matter of definition, must be an organization; individuals can certainly outsource a task to an online crowd.

Kleemann and Voß (2008) argue that

 "central to the concept of crowdsourcing is the idea that a crowd of people, collaboratively (or at least simultaneously) contribute to an aspect of the production
   process or to the solution of a design issue or other problems."[2]

Although we agree that simultaneous or collaborative work is a significant type of crowdsourcing, it is not the only one. The Best Practices entry for crowdwork, developed last year and reposted on Class 3, classifies crowdwork three ways: "First, a large group of workers may do microtasks to complete a whole project; the best-known platform in this arena is Amazon Mechanical Turk. Second, companies may use cloudwork platforms to connect with individual workers, or a small group of workers, who then complete larger jobs (e.g., Elance[3] and oDesk[4]). Finally, a company may run 'contests,' where numerous workers complete a task and only the speediest or best worker is paid (e.g., InnoCentive[5] and Worth1000[6]). In some contests, the company commits to picking at least one winner; in others, there is no such guarantee." It is clear that when crowdsourcing takes the form of competitive bidding, not every participant works on a single aspect of the task; each of them works on the whole task, and they do not have to work at the same time. Only the final winner gets compensated. It is possible that only one individual or organization joins the bidding process and no competing parties are involved.

Reichwald and Piller (2006) describe crowdsourcing using another concept, "interactive value creation". They further differentiate two types of crowdsourcing: mass customization and open innovation.[7] We do not find Reichwald and Piller's approach convincing. First, the general term "interactive value creation" can cover many types of online collaborative activity not traditionally recognized as crowdsourcing, e.g. open source development. Second, mass customization refers to an isolated customer's activity to tailor one particular product, rather than to contribute to a general product type. Their second type, "open innovation", correctly points out that crowdsourcing should be outsourced through an open call, but we do not believe a strong degree of "innovation" is necessary, especially for highly divided microtasks.

Based on the above discussion, we believe that there are two core elements of crowdsourcing, both of which are facilitated by an online platform (such as Amazon Mechanical Turk[8]):

 1. The task is outsourced through an open call from the employer;
 2. The recipients of the call, whether or not they elect to participate, comprise a large, undefined crowd.

The following discussion of crowdsourcing reflects our understanding of the three most significant types of crowdsourcing. Microtasks facilitated by platforms such as Amazon Mechanical Turk [9] might be the typical type of crowdsourcing in the literature.[10] Not only does a crowd receive the call from an employer, but its members also collaborate on the whole task, each addressing a small piece of the pre-divided work. Professional tasks might lead multiple bidders to work on the same task with no collaboration between them (and their work usually involves a higher degree of innovation). Game tasks are in a sense also microtasks, but they raise different issues because people usually do not earn monetary compensation when playing the game.

General Information on Crowdsourcing

General Information
- For a quick overview by Jeff Howe, author of Crowdsourcing,[11] take a look at this YouTube clip.[12]
- Northwestern University Professor Kris Hammond also explains crowdsourcing, but argues that its downsides are worker rewards and quality.[13]
- Our very own Jonathan Zittrain discusses crowdsourcing in his talk, Minds for Sale.[14]
- Several individuals gathered to discuss crowdsourcing in a panel moderated by New York Times correspondent Brad Stone.[15]
In the News
- The New York Times recently ran an article on crowdsourcing featuring two crowdsourcing companies:[16] Microtask[17] and CloudCrowd.[18]
- It's interesting to note that these companies are attempting to monetize crowdsourcing in exactly the way in which Howe says it cannot be monetized successfully.
Examples of Crowdsourcing
- Take a look at Wikipedia's compilation.[19]

Crowdsourcing Literature

General Overview

Although the idea of crowdsourcing--if not the word itself--has been around for many years, the Internet has made it much easier, cheaper, and more efficient to harness the power of crowds. The power of crowds was popularized in 2004 when James Surowiecki published a book entitled The Wisdom of Crowds.[20] This book purported to show how large groups of people can, in many cases, be more effective at solving problems than experts. According to Surowiecki (2004: xiii), "under the right circumstances, groups are remarkably intelligent, and are often smarter than the smartest people in them." Two years later, journalist Jeff Howe coined the term "crowdsourcing" to refer to work that was performed by the "masses" online.[21] Since Howe's article was published in 2006, numerous authors have written books on crowdsourcing, each choosing to focus on different aspects of the topic. Howe himself took up the topic in 2008, proclaiming crowdsourcing to be a panacea--a place where a perfect meritocracy could thrive.[22] Howe examined crowdsourcing from a variety of perspectives: what benefits it can provide, what kinds of tasks it can accomplish, and the potential changes it may bring about. Howe's prognosis for crowdsourcing was positive--in it he saw many potential solutions and few potential problems. Others have followed Howe's lead in describing the benefits of crowdsourced work. Clay Shirky has published two books--Here Comes Everybody (2008)[23] and Cognitive Surplus (2010)[24]--in which he describes how technology does more than enable new tools: it also enables consumers to become collaborators and producers. Although Shirky's books are not expressly about crowdsourcing, they mirror the optimism Howe expresses, both in terms of collaborative enterprises and the Internet's power to enable them.

These books have provoked an academic interest in finding out who the crowd is and why it moves the way it does. Some have looked at scientific crowdsourcing, asking what characteristics make someone a successful crowdworker/problem-solver.[25] Part of answering that question, it turns out, requires asking why people attempt to be part of the innovating crowd in the first place. The authors of this study found that the crowd was highly educated. They also found heterogeneity in scientific interests, as well as monetary and intrinsic motivations, to be important drivers of "good" problem-solvers. Others have examined non-scientific endeavors and asked similar questions.[26] That report also found that most iStock crowdworkers developing photographs were highly educated and motivated primarily by money. Jeff Howe, however, takes a different perspective as to crowdworkers' motivation: "There are...two shared attributes among almost all crowdsourcing projects: participants are not primarily motivated by money, and they’re donating their leisure hours to the cause. That is, they’re contributing their excess capacity, or 'spare cycles,' to indulge in something they love to do." (Crowdsourcing, pgs. 28-29.)

While some focused on the potential consumer revolution or the composition of the crowd, others examined the business-related aspects of crowdsourcing. Identifying attributes of successful crowd innovators also has a business dimension. One researcher suggests that having experience spanning a variety of communities or disciplines makes one likely to be considered 'innovative.'[27] Others focus more broadly on how to use the crowd to maintain or bolster a business or brand. In Groundswell (2008),[28] Charlene Li and Josh Bernoff focus on how businesses can use crowdsourcing most effectively to their benefit. The authors highlight how user bases of products can undermine a product or brand.[29] As a result, the authors propose that businesses use the "groundswell" to their advantage, fostering communities that can provide valuable feedback and economic payoffs. Marion K. Poetz and Martin Schreier have also taken a business perspective on crowdsourcing,[30] arguing that the crowd is capable of producing valuable (but not always viable) business ideas at a low cost. Other researchers have found that young entrepreneurs who were attempting to start businesses frequently belonged to these kinds of communities.[31] For a related discussion on user innovation and user communities, see Eric von Hippel's books[32] and William Fisher's article in the Minnesota Law Review.[33]

Other authors have pointed out some of the problems with crowdsourcing. Dr. Mathieu O'Neil has argued that, despite its benefits, crowdsourcing can have inconsistent quality, can lack the diversity needed to draw on the "wisdom of the crowd", and can contain many irresponsible actors.[34] Miriam Cherry has argued that some crowdwork can be exploitative, sometimes forcing people to work for absurdly low wages.[35] She argues that we need a legal framework for addressing low wages, proposing we apply the Fair Labor Standards Act (FLSA) to crowdsourced work like that found on Mechanical Turk. In a forthcoming article, she takes a more systematic (but still legal) approach to suggesting solutions for the problems faced by different kinds of virtual work.[36] Cherry seems to be the only law professor to have addressed crowdsourcing from a doctrinal perspective.

Much of the other literature on the subject concerns the problem of quality. Soylent--which is essentially a crowdsourced editing program--has been a prime example of how lack of quality can limit the commercialization of an innovative and useful crowdsourcing product.[37] Cheat detection--the ability to filter out individuals who complete tasks without actually reading them in the hopes of receiving money without doing the work--also has recently drawn attention. Indeed, a possible crowdsourced solution to cheaters has been proposed for sentence translations, relying on principles such as crowd consensus and logical parallelism in sentence structure and word choice.[38] Others have attempted to increase the quality of the traditionally-automated mechanism used to translate words by crowdsourcing translation tasks.[39] In addition to simple crowdsourcing, one set of authors suggests combining human crowdwork with machine work. In this process, according to the authors, the system can specify a particular "speed-cost-quality tradeoff," which is based on an allocation of tasks among computers and humans.[40] John J. Horton, David Rand, and Richard Zeckhauser have addressed using the online crowd for quality experimental research.[41]

Our Addition: Identifying Areas, Exploring the Problems

Given the body of literature and the Best Practices document, we found the idea of addressing systemic problems both attractive and difficult. Instead of replicating the Best Practices, or simply writing an overview of crowdsourcing, we decided to take a different angle. Unlike the Best Practices document, which classified problems generally and then worked downward to devise specific solutions by applying them to different types of crowdwork, we worked from the bottom up. We identified three types of crowdwork that suggested a variety of important, but (context-)specific problems. At the beginning stages, we had only our intuition to guide our "sense" of the problems. As we delved further into them, however, they crystallized. From our discussions we identified three types of crowdwork in which specific problems arise, some of which are systemic problems with crowdsourcing that the Best Practices does not address. Nevertheless, we wanted to draw on the Best Practices document to determine whether some of its strategies seemed workable or needed to be expanded, refined, or discarded. To accomplish this goal, we attempted to integrate the Best Practices approaches into our framing of both the problems and the solutions we discussed.

An Introduction to Our Approach

Our discussion of various crowdsourcing environments suggested a variety of ways to slice the pie. In the end, we settled on three areas of crowdsourcing, reaching a rough classification based on the type of work performed. In that sense, our division followed the Best Practices division of work into microtasks, connective tasks, and contest tasks. But there was an important difference: our classification of work depended also upon the purpose to which the work was being put, focusing on a specific case study for each. In other words, it mattered to us that one task was framed as a "game" versus a "survey." We cared not just about the framing, but about the motives of the employer and the worker. We asked questions like, "For what purpose is the employer requesting this task?" and "Why does the worker choose to perform the task?" Our aim was not to analyze every kind of crowdwork using motive and purpose; rather, these questions provided a general framing for dividing crowdwork into analytical categories--places where we could identify specific problems that may differ depending on the answers to these questions. After significant discussion, we settled on three types of tasks, choosing a case study to explore each one:

1. Microtasks: Amazon's Mechanical Turk[44];

2. Tasks requiring "professional" skills: 99designs[45] and InnoCentive[46]; and

3. "Game" tasks: Gwap[47].

For each of these tasks, we attempted to identify salient "problems": issues that cause concern for workers, employers, platforms, businesses, or society generally. In identifying problems, we had two goals. The first was to provide a set of new issues for others to build upon in future work. The second was to explore a small number of issues and propose our own context-specific solutions. In this sense, it was an exercise in both applying the Best Practices and inventing new solutions that either context or framing prevented the Best Practices from solving. In what follows, we explain each topic, the problems it presents, and specific solutions to selected problems. Although we think the solutions we propose have some teeth, they are not meant to be final. Indeed, our goal in presenting these solutions and problems is to provide a base from which others can build.

The 3 Crowdsourcing Environments and Problems

Microtasks: Amazon Mechanical Turk; Microtask.com; Soylent

A microtask is a type of crowdsourcing in which an employer divides a task into small subtasks that require human intelligence (Human Intelligence Tasks, or HITs [48]) and assigns them to a crowd. Each microtask can be completed independently of any information from the other microtasks. Although some tasks might need certain qualifications to complete, such as knowledge of a language, the human intelligence required to complete a HIT is minimal, and average (or even below average) workers can do the job. Workers earn income according to the number of tasks they complete that are approved by the employer.[49] By definition, workers do not always earn monetary benefits: in some cases they feel a sense of achievement from completing a task, win virtual points in a competitive game (as discussed in the final section, e.g. GWAP [50]), or earn no benefits at all (e.g. ReCaptcha [51]). In this section, we limit our discussion to microtasks with monetary rewards.

An introductory video from Microtask.com [52] is a good illustration of the nature of microtask-type crowdsourcing. Microtask.com also offers two typical examples that are particularly suitable for microtasks: form processing, which helps clients such as banks and insurance companies convert hand-filled forms into digital forms compatible with databases[53]; and archive digitization, which helps clients such as national archives and libraries proofread their scanned and OCRed files and convert them into a machine-readable and searchable format[54].

Quality Control

Problem

One concern is that the quality of crowdsourced products is not always satisfactory. Below is a sample of text revised by Soylent [55] from the second paragraph of Prof. Zittrain's book [56].

 This was not the first time Steve Jobs had launched a revolution. Thirty years earlier, at the First West Coast Computer Faire in nearly the same spot, the twenty-one-year-old
 Jobs, wearing his first suit, exhibited the Apple II personal computer to great buzz amidst “10,000 walking, talking computer freaks.” The Apple II was, a machine for hobbyists 
 who did not want to fuss with soldering irons:, had all the ingredients for a functioning PC were provided in a convenient molded plastic case. It lookedWhile clunky, yet it 
 could be at home on someone’sfit on a home desk. Instead of puzzling over bits of hardware or typing up punch cards to feed into someone else’s mainframe, Apple owners faced 
 only the hurdle of a cryptic blinking cursor in the upper left corner of the screen: the PC awaited instructions. But the hurdle was not high. Some owners were inspired to 
 program the machines themselves, but true beginners simply could load up software written and then shared or sold by their more skilled or inspired counterparts. The Apple II 
 was a blank slate, a bold departure from previous technology that had been developed and marketed to perform specific tasks from the first day of its sale to the last day of
 its use.

A comparison of the above text with the original suggests that the revision did not improve the quality of the original; in fact, it added typos and errors that the original does not contain. A worse case is a worker who does not actually use human intelligence but just clicks the mouse randomly. For example, in a task where a worker is supposed to tell the sex of the person in a photo, or tell the major color of a picture, one could choose the result at random in order to complete as many tasks as possible. Although workers earn income for a task only once the requester approves their work,[57] requesters' approval is generally procedural rather than substantive. The dilemma is that employers can neither check each microtask (which would mean doing the work again themselves) nor use machines to do so (otherwise they would not have outsourced the task to a crowd).

Solutions

Improving the quality of crowdsourcing means not only that current tasks can be completed to a higher standard, but also that the crowd will gain the capacity to assume more complex responsibilities. We believe two critical approaches can improve quality: result verification/evaluation and worker grouping.

1. Verifying microtasks.

a. repetition: >75%. problem: cost

2. Grouping the crowd.

a. personal background: education, expertise. problem: hard to verify; privacy
b. prior performance: an experience-based rating system. problem: unfair for newcomers. further divisions of types of tasks?
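
The repetition approach in item 1a can be sketched as follows. This is only a minimal illustration, assuming the ">75%" figure denotes an agreement threshold among several workers' answers to the same microtask; the function name `verify_by_repetition` and the sample labels are hypothetical and not taken from any platform's API.

```python
from collections import Counter


def verify_by_repetition(answers, threshold=0.75):
    """Return the consensus answer if its share of votes exceeds the threshold.

    `answers` holds the labels that different workers gave for one microtask.
    The 0.75 default is an assumed reading of the ">75%" note above.
    Returns None when no answer reaches the required level of agreement.
    """
    if not answers:
        return None
    label, count = Counter(answers).most_common(1)[0]
    share = count / len(answers)
    return label if share > threshold else None


# Four of five workers agree (80%), so the answer is accepted.
accepted = verify_by_repetition(["female", "female", "female", "female", "male"])

# Only two of four workers agree (50%), so the task would need more answers.
rejected = verify_by_repetition(["red", "red", "blue", "green"])
```

The sketch also makes the cost problem visible: every accepted result requires paying for the same microtask several times.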

Communication of Workers

Problem

isolated workers; less bargaining power

Solutions

union: necessary?

anonymity of workers

confidentiality of the task

Compensation & Sustainability

Problem

Demographic evidence for workers on Mechanical Turk

Average wage

Solutions

Minimum wage?

"Professional-Grade" Tasks: 99designs & InnoCentive

99designs is a website that allows individuals or companies that need a design to ask for it by crowdsourcing.[58] InnoCentive is a company that allows individuals and entities to post scientific problems that anyone can attempt to solve.[59] These companies are a particularly interesting form of crowdsourcing because they enable the crowd to perform work typically performed by "professionals." Although services like Mechanical Turk also "deprofessionalize" work, 99designs and InnoCentive do so much more directly. Typically, designs are created (at 99designs) or problems are solved (at InnoCentive) by professional companies, the employees of which typically have some formal training. These platforms raise a variety of concerns. We focus here on two: deprofessionalization and reputation.

"Deprofessionalization"

In some sense, graphic design and other industries such as science are "professionalized": they are businesses occupied by individuals with formal training (and many times formal education). Many types of work qualify as "professions" under this definition. The traditional occupations like lawyer, doctor, and clergy certainly fall within it; but so too do other kinds of work, such as graphic design. For the past several years, crowdsourcing has crept into these "professionalized" areas without much fanfare. In science, for example, InnoCentive has provided a platform for corporate employers to crowdsource complex science problems. In the "creative" space, 99designs performs a similar function: it enables companies to crowdsource graphic design work. In some sense, professionalization is a gatekeeping mechanism--it vets people before they can perform certain work. In other cases, some argue, it merely reinforces existing structures that disadvantage certain individuals. Professional crowdsourcing platforms reduce the role of industry or profession as gatekeeper--and have the potential to eliminate it entirely. If that's the risk, then there are several resulting problems.

Specific Problems

1. Cannibalization/Wage Reduction. If 99designs or InnoCentive lowers entry barriers and costs, it could stimulate a race to the bottom. 99designs' Business Development and Marketing Manager, Jason Aiken, has already acknowledged that the company has created "a tension" with the traditional market for graphic design--and it may be driving down wages.[60] (Clip at 34:16.) In this world, professionals could not earn a living because "amateurs" or other professionals who do not have jobs will drive down prices for crowdsourced design work. With low prices and an abundance of crowdworkers, companies may shed their traditional means of acquiring designs. So crowdsourcing, which started off as a way to lower specific business costs or solve thorny problems, becomes the sole means of (research &) design work. In this environment, which designers worry about,[61] the professional industry collapses because wages are too low. Alternatively, a new web-based professionalization occurs. That, of course, depends on a variety of factors, including the ability of crowdworkers to maintain a coherent identity/reputation online.

2. Devaluing of Education. If deprofessionalization occurs and an industry cannibalizes itself, there will be a concomitant and precipitous drop in the market value of education. In scientific areas, many crowdworkers tend to have advanced degrees, or are highly educated. Their ability to perform sophisticated or professional crowdwork is therefore partly dependent upon their education. But the crowdwork market undervalues the educational experience of the worker because, at present, most highly-educated workers are engaged in part-time, non-sustenance activity. As crowdwork becomes more popular for professionals, wages fall and crowdwork replaces traditional professional work. The cost of education, however, remains constant. This means that sophisticated crowdworkers will pay the same amount for school but will be full-time, instead of part-time, crowdworkers. Given the low wages, people will not be making an adequate return on their educational investment. Several possibilities then result--but we focus on the most dire here. Low wages could deter individuals who know they cannot generate an adequate return on their investment from pursuing higher education. This, in turn, will cause a brain drain, where fewer and fewer people obtain advanced degrees. As a result, the pool of professional crowdworkers shrinks. Given the technological growth rate, the demand for crowdworkers will continue to grow. The short supply and high prices will mean the end of crowdsourced professional work for two reasons. First, costs will rise to the point where crowdsourcing is no longer more economical than traditional professional services. Second, and more importantly, the lack of workers means that, given the growing technological and cultural demands our society makes, tasks simply cannot be crowdsourced effectively.

Solutions.

One might wonder whether concerns over deprofessionalization are overstated. The criticism--the one Howe addresses in the first chapter ("The Rise of the Amateur") of his book[62]--is that we are worried simply about "amateurs" displacing "professionals." We think the two issues just outlined illustrate that the problem is greater than professionals losing their privileged status. There are doubtless many potential solutions to these problems. Here are a few.

1. Wage Scale. Industries and professionals could collaborate to set wages they think are reasonable. The Best Practices document recommends "fair" wages, but seems to presume only employers will be the ones deciding what wages to set. In this professional-crowdwork context, it might be wise to allow collaboration of interested parties, rather than rely only on employers to set a fair wage. One could see such a wage scale being set by various stakeholders, and perhaps some "non-stakeholders."
2. Wage Determinants. Wages could be set according to one or a series of credential measures. This could take several different forms, some of which could be combined.
- 2a. Workers with a degree or work experience in a related professional field may be entitled to a higher wage than a worker with no higher education. The problem with this approach is that it reduces the "meritocracy" aspect of crowdsourcing. It also seems to devalue the diversity that crowdsourcing thrives on.
- 2b. If the platform implements an effective reputation or rating system (detailed or simple), workers could use their reputations to generate more work. This solution faces some technical and privacy problems, but seems like at least one plausible way of ensuring favorable wages for better performers. This also has the benefit of showing which workers are repeatedly good at performing tasks. This is important because many InnoCentive solvers, for example, never solve more than one problem.[63] One drawback of this method is that it decreases the "perfect meritocracy" that some seem to think can persist forever (if it exists at all).
- 2c. Workers could be paid according to the contributions they make. For this system to work, a platform and employer would have to work together. They would have to create a framework that allowed an initial screening for those who held the requisite qualifications or ratings, and then pay them according to the amount of time worked and contributions made. (Here some kind of algorithm might be useful.)
3. Educational Reform. Change educational components to provide skills that crowdsourcing cannot cover. This could include non-compartmentalizable tasks or exposure to a broader range of subjects. There are many problems with reforming the educational system. Aside from the many practical difficulties, there are two specific problems. First, it may be difficult to identify in advance what problems are crowdsource-able, as the capacity to crowdsource work is likely to change in the future. Second, because we don't yet know enough about crowdsourcing, it's difficult to say what skills or exposure to ideas one needs to perform certain tasks well--or outperform crowdsourcing.
4. Discourage wage reductions/crowdsourcing by having platforms require disclosure when crowdsourcing. This, however, may discourage crowdsourcing generally. The goal should be to reap crowdsourcing's benefits while minimizing its potential downside. Still, this solution could be used to less extreme degrees to pressure companies to offer competitive wages for crowdsourced tasks.
5. Incentivize pay-friendly behavior by using a reputation system (see below). The various approaches here could be combined with or pushed into a reputation system. We explain various problems with reputation systems below. Here, we focus on the potential benefits of a three-way reputation system. "Three-way" means that the system involves all parties: workers, employers, and platforms. A reputation system that engages all three parties can incentivize each party to engage in wage-friendly behavior. A reputation system, for example, that allows workers or platforms to assign reputational rankings to employers can cause wages to increase. If workers know that Corporation A generally pays 2x as much as Corporation B, workers can identify "fairer" wages. Similarly, companies may be willing to offer better wages if they can screen out "low reputation" workers.
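
Item 2c above suggests that "some kind of algorithm" might help pay workers according to time worked and contributions made. One way such an algorithm might look is sketched below. Everything in it is an assumption made for illustration: the function name `allocate_pay`, the rating cutoff, the equal weighting of hours and contributions, and the sample figures are hypothetical, and neither 99designs nor InnoCentive is known to pay this way.

```python
def allocate_pay(budget, workers, min_rating=4.0, time_weight=0.5):
    """Split a fixed budget among workers who pass an initial rating screen.

    Each eligible worker's share blends their fraction of the total hours
    with their fraction of the total contributions. `time_weight` controls
    the blend (0.5 weighs the two equally). All parameters are assumptions.
    """
    eligible = [w for w in workers if w["rating"] >= min_rating]
    total_hours = sum(w["hours"] for w in eligible)
    total_contributions = sum(w["contributions"] for w in eligible)
    if not eligible or total_hours == 0 or total_contributions == 0:
        return {}

    payouts = {}
    for w in eligible:
        share = (time_weight * w["hours"] / total_hours
                 + (1 - time_weight) * w["contributions"] / total_contributions)
        payouts[w["name"]] = round(budget * share, 2)
    return payouts


# Hypothetical workers: C falls below the rating screen and is excluded.
workers = [
    {"name": "A", "rating": 4.5, "hours": 10, "contributions": 4},
    {"name": "B", "rating": 4.2, "hours": 30, "contributions": 1},
    {"name": "C", "rating": 3.0, "hours": 20, "contributions": 5},
]
payouts = allocate_pay(1000, workers)
```

In this example worker A logs fewer hours than B but makes more contributions, so the blended share pays A slightly more. How "contributions" would be counted is left open here, as it is in the proposal itself.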

Reputation

Reputation is an issue for all parties involved in professional crowdsourcing: the worker, the employer, and the platform. The Best Practices document focused on some of these issues in the microtasking environment. Some of these reputation mechanisms might be imported from user-based reputation systems--and there is literature on how to design such reputation systems.[64] Concerns there were focused on speed, efficiency, and work quality. The Best Practices also focused solely on the relationship between individual workers and employers. These concerns also exist in the professional environment. There are, however, other problems to confront. Specifically, worker quality may become more important for professional work because the work done is more time-, labor-, and skill-intensive. Additionally, because tasks are fewer in number and take longer, each contest or task completed also has greater influence on, and importance for, worker reputation. Finally, if professional work pays more than microtasks, the reputational stakes are higher.

Reputation Concerns for Workers, Employers, and Platforms

Workers. As crowdsourcing increases and professionalized crowdworkers have success, they want to signal to potential employers that they have done good work in the past. Workers may also want to list their professional experience, education, or training (as one can do on Gerson Lehrman Group[65]). Workers also will seek to ensure that antisocial behavior among themselves is counted against a worker's reputation. The literature on crowdsourcing emphasizes the community that exists in crowdwork environments, and ensuring the community functions well is important. 99designs has numerous instances where one crowdworker accuses another of "stealing" a design. Workers also will want to know the reputation of an employer. This could include factors like the amount paid, the quickness of payment, and the type of work offered.

Employers. Reputation also would be valuable for employers--they'd prefer work from people with high reputations, not only to ensure good work, but also to ensure the work was not copied from somewhere else. While copying from the worker's perspective is important as a social matter, from an employer's perspective copying is important as a legal matter. Employers want to avoid copyright infringement or any claims of copyright infringement. Employers also want to broadcast their "good" reputations.

Platforms. Platforms have an incentive to ensure that workers' and employers' desire for a reputation system is met. It will, in many cases, be up to the platform to institute a system that can deal effectively with reputation, antisocial behavior, and legal issues. 99designs has implemented a system that seeks to deal with the latter two, but does not address reputation.

Problems.

1. Portability. As the Best Practices document notes, workers and employers may want to have coherent identities across a variety of platforms. In doing this, they may want to keep their identities secret but their reputations public. If a worker moves from 99designs to iStockphoto, for example, they may have to create a totally new profile and build up a reputation from scratch, even though some aspects of their previous reputation may be relevant.[66]

2. Reliability. Any feedback-reputation system needs to be reliable. That is, it needs to accurately reflect the quality and quantity of work performed. One way to ensure diversity in feedback, and therefore reliability, is to have a (platform-mediated) reputation system, where workers vote on both each other (where applicable) and employers, and employers do the same (vote on workers and each other). The system could then be mediated by the platform.

Solutions

1. Platform-specific solution. The Best Practices document suggests that each crowdsourcing platform implement its own measures to track reputation. This is a reasonable solution. 99designs and InnoCentive do not yet have such a system. This kind of solution would allow each platform to identify the reputational characteristics necessary for the kind of work offered. 99designs, for example, could track worker reputation based on cooperation among workers, number of contests won, and portfolio ratings from other users. It could also track employer reputation based on payment amount and speed, and copyright licensing terms. InnoCentive, by contrast, could focus on credential characteristics, such as degrees obtained or relevant work experience. It could track employers based on potential for future projects with the same company. The Best Practices document also suggests that workers have access to their data, and presumably envisions a way to use reputation information at one provider or another. Here, the solution could be for each platform to issue a "portable reputation," which would explain the reputation of the worker on that site, the work performed, and the average/median rating of a worker on the site (with similar work performed). In this scenario, platforms would have to have some interoperability, a problem on which some already are working.[67] Each platform, for example, could agree to host the reputational information provided by another. In such a situation, the worker could 'import' his or her data from, say, 99designs to iStockphoto. When that worker completes a task or uploads a photo, an employer could click on the worker's profile to view various bits of reputational information. (Perhaps they could all agree to a specific file (type) that included certain information?) In one iteration, this could include a "Reputation Homepage" that displayed the logos of various platforms for which the worker had completed tasks. Scrolling the mouse over the logo might reveal some general reputational information, such as a reputation score (with the appropriate scale for the platform). There are a variety of options, and most require some kind of cooperation among platforms--something that has been difficult in the past. One reason for this difficulty is that large, established platforms have a perverse incentive to keep a reputation system closed. These platforms already have a user base. Portability threatens what they perceive as theirs: the reputations earned through their platforms, which make the platform a desirable place to conduct business. eBay, for example, objected when Amazon allowed users to import their eBay reputations, claiming proprietary interest.[68] (p. 237.) This confirms Randal Picker's contention that, as far as portability on the Web goes, "law matters."[69] (p. 8.) [70][71]
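To make the "specific file (type)" idea above more concrete, here is a minimal sketch in Python of what a shared portable-reputation record might look like. Nothing here reflects a format that any platform actually uses: the class name, every field name, and the JSON encoding are assumptions chosen only to mirror the items listed above (the issuing site, the work performed, the worker's rating, and the site's median rating for similar work).

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PortableReputation:
    """Hypothetical export record; every field name is an assumption."""
    platform: str         # issuing platform, e.g. "99designs"
    pseudonym: str        # identity stays secret, reputation stays public
    work_type: str        # kind of work performed on the issuing site
    tasks_completed: int  # quantity of work performed
    score: float          # worker's rating on the issuing platform
    scale_max: float      # top of that platform's rating scale
    site_median: float    # median rating for similar work on the site


def export_reputation(record: PortableReputation) -> str:
    """Serialize a record so that another platform could host it."""
    return json.dumps(asdict(record), sort_keys=True)


def import_reputation(payload: str) -> PortableReputation:
    """Rebuild a record from an exported payload, rejecting bad scores."""
    record = PortableReputation(**json.loads(payload))
    if not 0 <= record.score <= record.scale_max:
        raise ValueError("score falls outside the issuing platform's scale")
    return record
```

Carrying the scale and the site median alongside the score is one way to let an importing platform interpret a rating that was earned under a different rating system. The sketch does not address who would vouch for an exported record, which is part of the reliability problem discussed on this page.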

2. Uniform Reputation. Another approach to reputation would use a centralized entity to manage worker/employer reputation across all platforms. One can imagine a universal reputation platform that draws information from various sites and aggregates it--maybe it has agreements with all the platforms to provide it reputation-related information. (There are already sites that purport to provide reputational rankings of websites--see, e.g., Webutation.[72]) In other words, this entity would maintain a database of the reputations of all crowdsourcing employers and workers. Reputational information would be aggregated automatically from all participating crowdsourcing platforms, which would automatically submit data to this entity. Theoretically, employers and workers could consult this platform to determine one another's reputations. To enable this "reputation finding," the reputation database could contain various "categories" of workers and employers, and allow an individual to sort based on various reputational scores, which would be determined based on a variety of metrics and the information provided to the centralized entity by the crowdsourcing platforms. So, for example, a company could find someone with a high "creative reputation" (e.g., logo design) or someone with a high "engineering" rating. It may also allow users to disaggregate reputational information across platforms to see how a given user has performed/behaved in each. Additionally, a central reputation authority would allow people to move "with" their reputations and stay "anonymous"--they won't disclose their real identity. As noted, this all would require platforms to (automatically) submit information to this reputation database/entity. Because platforms (in the hypothetical world) would automatically submit worker reputation info to the site, there are essentially two ways the site could work at the level of individual reputations. Both methods of operation work from the premise that individuals and employers are searchable by reputation. Search results might display an aggregate reputation or a simple ranking with no reputational score accompanying it. The first way resembles the solution described above: once a user is selected, a visitor is taken to a reputation homepage that displays a worker/employer reputation from different platforms, and provides information about each. This provides at least one benefit for platforms: they don't have to generate interoperability--they can delegate reputation aggregation to one entity, which could probably collect and display the information more effectively. Two authors have already sketched a design for a similar system.[73] A second method would be for the reputation site to create its own "general" reputation for each worker based on the information it receives from platforms. We can imagine a situation in which workers or employers can ask the reputation site to rank workers based on specific characteristics, general characteristics, or a combination of the two. This would allow employers to sort workers by reputational characteristics they deem important to a particular type of work. Once sorted, employers could provide an open-call project to the narrowed crowd of workers. This system also would require the centralized agency to weight different sites or reputational attributes. This is a problem of reliability, discussed below. One other advantage of a unified reputational system would be allowing some user control over reputation. We can imagine a place in a user profile where the user can fill in information, provide explanations, or otherwise comment on past work experience and the like.

3. Reliability for Platform-Specific Versus Uniform. Each system of reputation (platform-specific and uniform) will have issues of reliability. By reliability we mean a platform can accurately report to a user how well or poorly a specific individual has performed in the past, where performance is based on specific criteria. Because reputation systems rely on information, what information is used and who discloses it can influence the reliability of a reputation system.[74]
- Platform-specific reputational systems would likely achieve greater reliability because they would be localized. As self-contained systems, each platform could respond to user demands, as well as tinker with existing formulae that garner ratings. Still, it's not clear how reliable these local reputational systems could be across platforms. Assuming for the moment that each platform can in some way support the reputational rating of another platform, we still might have the problem of "overload": individuals will have too many reputational scores from too many different platforms. Employers may then look at, or sort by, only the lowest rating from any given platform on a user profile. Thus, platform-specific rating systems could actually work to the detriment of the users.
- A uniform reputational system may face more reliability issues than an interoperable platform-specific reputational system. For one thing, a unified system will be drawing reputation data from hundreds of different platforms automatically. There is a risk of gaming the system on two levels here. First, a user at a relatively unsophisticated platform may try to manipulate the system and boost their own reputational rating. Second, the platform itself may have an incentive to give its users higher reputational scores. Why? Because, assuming the unified system computes some aggregate reputational score, any platform can boost its users' scores and give them an advantage in the marketplace. Another problem arises as to weighting reputational scores. Should a reputation score from InnoCentive be "worth" more than one from Gerson Lehrman? Do reputation scores from Threadless even translate into anything meaningful? If so, should they be treated the same as scores from 99designs? All of these questions illustrate that there is no easy way to decide how scores should be weighted, though we can think of a variety of factors, including number of users, length of existence, customer satisfaction, etc. Part of the problem also is that different platforms may use different rating systems--some may use only one general rating, while others may parse ratings into various categories. How these issues are dealt with impacts the "reliability" of reputation. It also, therefore, impacts the trust that crowdworkers have in a reputational system.
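The weighting question can be illustrated with a minimal Python sketch of how a centralized entity might combine scores that arrive on different scales. This is only one possible approach: the weighted mean, the 0-to-1 normalization, and the per-platform `weight` value (standing in for factors such as number of users or length of existence) are all assumptions rather than anything the platforms named here are known to do.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlatformScore:
    """One platform's report on a worker; all field names are hypothetical."""
    platform: str
    score: float      # rating as reported by the platform
    scale_max: float  # top of that platform's rating scale
    weight: float     # trust the aggregator places in the platform (assumed)


def normalize(entry: PlatformScore) -> float:
    """Map a platform's rating onto a common 0-to-1 scale."""
    if entry.scale_max <= 0:
        raise ValueError("scale_max must be positive")
    # Clamp so that an out-of-range report cannot exceed the common scale.
    return min(max(entry.score / entry.scale_max, 0.0), 1.0)


def aggregate(entries: list[PlatformScore]) -> float:
    """Weighted mean of the normalized scores."""
    total_weight = sum(e.weight for e in entries)
    if total_weight <= 0:
        raise ValueError("at least one platform needs a positive weight")
    return sum(normalize(e) * e.weight for e in entries) / total_weight
```

Even in this simple form, the sketch shows where the reliability concerns enter. Whoever sets `weight` decides how much each platform counts, and a platform that inflates its users' scores raises their aggregate unless its weight is reduced.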

"Game" Tasks: Gwap

Gwap is a website comprised of games with a purpose (i.e., gwap).[75] That is, when its users play the games, they are simultaneously doing tasks that, in the aggregate, perform some function that improves our state of knowledge and/or advances our technology. For instance, the most well-known game on Gwap is called the ESP Game, where two players are shown a photo, each enters a list of suggested tags that the other player cannot see, and then they win points when they have a matching tag. The purpose behind the game is to create tags for images online, so that search engines can sift through them more easily when a user enters a query.[76] As Gwap's tagline states, "When you play a game at Gwap, you aren't just having fun." While such platforms offer an innovative way to draw on the 'wisdom of the crowd,' they also raise concerns about the potential exploitation of users and its concurrent effects.
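The matching mechanic described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not Gwap's actual implementation: the function names, the case-insensitive comparison, and the point value per match are all assumptions.

```python
def matching_tags(tags_a, tags_b):
    """Return the tags both players entered, compared case-insensitively."""
    normalized_a = {t.strip().lower() for t in tags_a if t.strip()}
    normalized_b = {t.strip().lower() for t in tags_b if t.strip()}
    return normalized_a & normalized_b


def score_round(tags_a, tags_b, points_per_match=100):
    """Return (agreed tags, points); the point value is hypothetical."""
    agreed = matching_tags(tags_a, tags_b)
    return sorted(agreed), points_per_match * len(agreed)
```

Because neither player can see the other's list, a tag that both players enter independently is more likely to describe the image, which is what makes the agreed tags useful as labels.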

Addiction

Problem.

The question of addiction has arisen both in the context of crowdsourcing generally [77] and in the context of so-called "social games", [78][79] such as FarmVille. [80] In a paper entitled Moving the Crowd at Threadless,[81] Daren C. Brabham offers some insight into the various motivations of the crowd at Threadless,[82] a website that sells t-shirts and applies crowdsourcing through its solicitation of the crowd for designs and slogans.[83] In particular, Brabham discusses how members of Threadless use the language of "addiction" when talking about their participation in Threadless. In his view (2010: 3, 17), the language of addiction illuminates the significance of building a community in order for crowdsourcing to be truly effective, such that organizations that use crowdsourcing "need to allow the crowd to truly support the problem-solving mission of a crowdsourcing venture for the public good, to generate in the crowd a sense of duty and love – and even addiction – to such a project", although he ultimately concludes that members of Threadless are most likely not actually addicted in the pathological sense.

On the social gaming side, Gamespot--the self-styled "go-to source for video game news, reviews, and entertainment"[84]--recently ran a story on the ethics of the social games market.[85] Here, the concern is that social games have crossed the line from being fun to being addictive. As Edmund McMillen [86] put it, "There's a difference between addicting and compelling... Crack is addicting, but it's not a fun game."

So how does this affect crowdsourcing games? Since games like those at Gwap involve both crowdsourcing and social gaming, the concern is that the risk of addiction is more acute. Although they were undoubtedly speaking in hyperbole, players of the ESP Game, for example, have said that it is "strangely addicting" and that "it's like crack!" (See video [87] at 22:30.)

Solutions.

There are a number of safeguards, detailed below, that can be built directly into the platforms. Ideally, such changes would be made voluntarily, but legal regulation can be used as a tool to accomplish the necessary changes as well.

1. Platforms can kick people off after a specified amount of time. For instance, Gwap will kick users off if they have played for 15 hours straight, or 10 hours if they are from an .edu domain. (See video [88] at 14:54-15:02.)

2. Platforms can notify the player after a specified period of time has elapsed. (5 hours) 'Just to let you know, you've been playing for quite a while.' (10 hours) 'Are you sure you want to keep playing?' (12 hours) 'Really, you should go outside.' (15 hours) 'All right, that's it. I'm kicking you off for your own good.' The principle behind both this solution and #1 mirrors the one that led to the development of applications designed to prevent people from sending drunken e-mails that they rue when sober, such as Gmail's Goggles[89]; namely, that sometimes a little check on people can keep them from making decisions that they might later regret.

3. Platforms can have built-in termination of the games. In other words, games should have an established end to them. For instance, with games like Life[90] and Candyland[91], the game ends when one or more players reaches the end of the trail. In contrast, games like Taboo[92] and Apples to Apples[93] continue until the players themselves call an end to the game. Built-in termination of the game would also be an effective tool to address addiction concerns for social games in general--imagine if FarmVille actually ended at some point.

4. Is it really a problem? It should be noted that there is little, if any, support for these assertions by way of rigorous studies. Moreover, it is hard to see how addiction is more of a problem for crowdsourcing games than for television or other kinds of entertainment that, despite the substantial amount of time people spend doing them, we do not regulate with regard to their potential addictiveness.
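The time-based safeguards in items 1 and 2 above could be built into a platform along the following lines. This is a minimal Python sketch under stated assumptions: the function and constant names are hypothetical, the messages and hour thresholds are taken from the example schedule in item 2, and the separate 10-hour limit for .edu users mentioned in item 1 is represented only by the optional `limit` parameter.

```python
from typing import Optional

# (hours played, message) pairs from the example schedule in item 2.
NOTICES = [
    (5, "Just to let you know, you've been playing for quite a while."),
    (10, "Are you sure you want to keep playing?"),
    (12, "Really, you should go outside."),
]
KICK_AFTER_HOURS = 15  # cutoff used in items 1 and 2


def should_kick(hours_played: float, limit: float = KICK_AFTER_HOURS) -> bool:
    """True once the player has reached the session limit."""
    return hours_played >= limit


def current_notice(hours_played: float) -> Optional[str]:
    """Return the most recent notice the player has earned, if any."""
    message = None
    for threshold, text in NOTICES:
        if hours_played >= threshold:
            message = text
    return message
```

A platform would call these checks periodically during a session. Keeping the thresholds in a single table makes it easy to adjust them if later research suggests different limits.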

Disclosure

Problem.

Another concern that arises with these games is that the players may not know the underlying purpose. For instance, on Gwap, there is an 'About' section that describes the purpose behind the ESP Game as an example of what is meant by "games with a purpose",[94] yet there is no place on the website that describes the purpose of any of the other six games. Although Luis von Ahn, the main creator of Gwap and hailed as the "father of human computing", [95] describes the purpose and features of Gwap's games in academic papers (see his CV [96] for a list of his papers), chances are that few users have the access, know-how, or interest to hunt down and read them. But that does not mean that they would not want to know if given the opportunity.

Solutions.

1. Objective Standard. As noted in the Best Practices, there should always be as much disclosure as possible. However, it is not always easy to see how this general principle should be applied in different situations. Perhaps one useful tool would be to employ an objective standard in deciding whether and to what extent certain information should be disclosed. In securities law, the materiality of information is measured by what a reasonable investor would want to know [97], so perhaps a good standard for crowdsourcing games would be what a reasonable person would want to know in deciding whether to play the game.

One may argue that potential players could not care less what the underlying purpose of a given game is, since they are only playing it for enjoyment. The obvious response to such an argument is that it depends on the purpose of the game. For instance, most people are probably either neutral or positive towards the ESP Game's purpose of labeling images to make them easier to sort, but they would most likely have a negative reaction towards a game whose purpose was to spam people with porn ads.

There is also the possibility that full disclosure may not always be compatible with the game working as intended. For instance, a psychology lab might formulate a game to test x, but if Player A is aware that they are testing x, it might inhibit or otherwise change how she plays the game, making the results useless for the purposes of study. So perhaps the caveat should be added that companies should disclose any information that a reasonable person would want to know in deciding whether to play the game, "except for information that, if disclosed, would have an adverse effect on the underlying task, aside from making the player unwilling to play." So "employers" would still be required to disclose as much as possible, but not to the point of compromising the underlying purpose of the game.

2. Timing. A further option would be to require additional disclosure after the player has finished the game, since the task will have been completed at that point, and then give her the option of voiding her results. However, this may lead to people voiding their results for reasons other than taking issue with some aspect of the company or task, such as embarrassment or discomfort with how they acted during the game.

3. Review Process. If additional oversight is needed to ensure that companies are in compliance with this standard, companies could set up internal review boards. Alternatively, an outside party, such as the government, a non-profit, or a coalition of companies, could set up an independent review board that is responsible for reviewing and approving (or not) the proposed games and accompanying information, much like researchers are required to submit their proposals to an institutional board when the study involves human subjects.[98] Approval by such a board could be required before the game is crowdsourced, and/or the board could be given the authority--either by the government or by agreement among the companies--to do random audits or investigate complaints, just like the SEC can investigate and determine whether a company has failed to disclose material information with regard to its securities.[99]

Compensation

Problem.

One question that has come up with regard to crowdsourcing games is whether players should be compensated for doing tasks through the guise of a game.[100] Although the task is structured to be fun, it is still work that employers need done and would ordinarily pay for, were it not set up as a game. So are they exploiting the crowd by somehow 'tricking' them into doing work for free when the companies would have to pay for it otherwise? Does it make a difference whether people are aware of the underlying purpose? In the preceding section, we argue that people should know the task behind the game so that they can make an informed decision about whether to play it based on their personal values, but would the lack of compensation also play a role in their decision? Does it depend on whether they perceive it as work? More importantly, if people will do it anyway, why should companies pay for it?

Solution.

Players should remain uncompensated for a number of reasons. First of all, if people don't find the games fun, they won't play them. For instance, one game on Gwap is called Squigl, in which two players are shown an image and a word, and have to trace the object described by the word.[101] If their traces match, they earn points. One of the authors of this page, doing laborious, intensive research for this section, found the game excruciatingly dull. So she stopped playing.

In other words, crowdsourcing games aren't really "free" -- companies have to spend time and money figuring out a way to convert a task into a game that is sufficiently fun that people will play it voluntarily. While the total sum might be significantly less than what it would cost to pay each worker/player, it lessens the sense that the crowd is being exploited if creating such games is not without effort and cost to the employer. Moreover, offering compensation might bring players closer to the brink of addiction: if players were earning money, however trivial an amount, they might feel compelled to keep playing, more so than if they were only playing it for enjoyment, although further research is needed on this point. Alternatively, players might actually play less if offered a token amount to play rather than an amount that reflects the value of their work, because they would feel exploited, even though they would be getting more than if they were paid nothing, similar to the psychology behind the ultimatum game.[102] Again, however, more research is needed.

According to von Ahn, people played over 9 billion hours of Solitaire in 2003. (See video [103] at 7:00-7:04.) Solitaire is a game without a purpose, other than to entertain.[104] So if von Ahn's statistic is accurate, people are perfectly willing to devote a significant amount of time to entertaining themselves. If the games on Gwap serve that function, why should they be penalized for simultaneously being useful?

One might argue that the issue is not that companies would be penalized were they forced to compensate the players for playing games with a purpose, but rather that the workers are being currently penalized through the lack of compensation because they are unwittingly contributing free labor when they are entitled to a fair wage for their work. The question then becomes whether playing such games is indeed "work." We would argue that it is not, because people are playing the game primarily for its entertainment value: if it stops being entertaining, then they will stop playing it. Alternatively, if they are playing it because they support the underlying purpose, then that purpose is clearly important enough to them that they are doing it for free, so they already have sufficient incentive to do the work. After all, do we want to encourage a value system where everything must be set at a price? (If the answer to that question is yes, Wikipedia is in trouble.)

Moreover, it should also be noted that not all workers get paid; some are volunteers. For instance, in Crowdsourcing, Jeff Howe talks about the contribution of the crowd to tasks like NASA’s "Clickworkers" project (2006:62)[105] and the Cornell Lab of Ornithology’s eBird project (2006:31)[106]. In both examples, the crowd knows that what they are doing is work, but they still do it, even though they are contributing information for free, because they support the purpose for which they are laboring. If anything, this emphasizes the importance of disclosure rather than compensation.

Contents

Crowdsourcing: Background and Working Definitions

Definitions

General Information on Crowdsourcing

Crowdsourcing Literature

General Overview

Other Problems

Our Addition: Identifying Areas, Exploring the Problems

An Introduction to Our Approach

The 3 Crowdsourcing Environments and Problems

Microtasks: Amazon Mechanical Turk; Microtask.com; Soylent

Quality Control

Communication of Workers

Compensation & Sustainability

"Professional-Grade" Tasks: 99designs & InnoCentive

"Deprofessionalization"

Reputation

"Game" Tasks: Gwap

Addiction

Disclosure

Compensation

Summary
