Day 6 Predictions

From Cyberlaw: Difficult Issues Winter 2010

Daniel: Our guests will probably discuss at length the challenges that Dispute Finder and most web-based cooperative tools bump into while attempting to harness input from virtual crowds. I guess they will talk about Dispute Finder’s design difficulties, such as costs and trade-offs (between precision and recall, between user-friendliness and number / quality of features, etc). They’ll most likely also summon stories from the interviews discussed in the document we received, perhaps to illustrate content-layer problems with measurement of "information sources reliability"; users’ misunderstandings / trouble with logic operations; and group biases. I would love to hear their views on the proposed use of Turks to improve the database of disputed claims and arguments, as well as on the current biases of the disputed facts / arguments presently listed by the software.
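As a rough illustration of the precision/recall trade-off mentioned above, here is a minimal sketch. It assumes we had a hand-checked set of claims that really are disputed to compare against; the function name and the sample claim IDs are hypothetical and are not taken from Dispute Finder itself.

```python
def precision_recall(flagged, truly_disputed):
    """Compare the claims a tool flagged against a hand-checked reference set.

    Both arguments are collections of claim identifiers (hypothetical here).
    """
    flagged, truly_disputed = set(flagged), set(truly_disputed)
    true_positives = len(flagged & truly_disputed)
    # Precision: of everything flagged, how much was really disputed?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: of everything really disputed, how much was flagged?
    recall = true_positives / len(truly_disputed) if truly_disputed else 0.0
    return precision, recall


# Hypothetical example: the tool flags four claims, two of them correctly,
# and misses one claim that is genuinely disputed.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
```

Flagging more aggressively would tend to raise recall (fewer missed disputes) while lowering precision (more false alarms), which is presumably the design tension at issue.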

Jason: I predict that there will be a good deal of discussion of what Daniel calls the "user-friendliness" aspect of these tools - and I hope there is, because it's critical. Specifically, what is the necessary ratio between DisputeFinder or Herdict "passive users" and "active reporters" to make a project successful? I say this because both Herdict and DisputeFinder look too sparsely populated to be maximally useful right now. For example, Herdict is reporting that 2 Chinese users have reported YouTube as inaccessible. How do I interpret that? What percent of people who might know about and like Herdict in China are reporting back to Herdict? We know that Wikipedia is successful in spite of the fact that only a very small portion of readers become really regular editors - but Wikipedia is also one of the most visited sites in the world. I hope we discuss what strategies these organizations are employing to build participation for these more niche offerings. Jharrow 18:20, 11 January 2010 (UTC)

Reuben: When Daniel talks about the challenges of web-based cooperative tools, my first thought is about the challenge of achieving critical mass. I poked around with Dispute Finder for just over an hour this morning and during the entirety of my browsing the New York Times, Washington Post, Miami Herald, and Slate I only came across one disputed claim. No offense to America's news media, but my guess is that what I read is more disputed than that, but that there just aren't enough people trolling the news sites and adding claims to the dispute finder database for the service to actually be that helpful yet. Jason's point about passive users versus active reporters is important. I too would like to hear about how to reach a critical mass and how many active users are needed in order to have a useful service. I'd also like to hear about the potential for users to participate in a more passive manner - notwithstanding the privacy issues, if Herdict could just monitor my browsing and automatically send a report whenever I come across an inaccessible website, something akin to a for my click stream, the data would seem to be much more complete than simply recording whatever I choose to report. Never underestimate the laziness of the average person. My prediction is that our guests acknowledge the shortcomings in their current offerings while remaining optimistic about the possibilities of community-based technology. ReubRodriguez 18:49, 11 January 2010 (UTC)
Amanda: Totally agree with Jason and Reuben here on hoping to hear Daniel's definition of what it means to have "critical mass." This is a huge challenge in much of the UGC world - how many people you need to really establish credibility, and at what point you start to become relevant. Attracting users is probably the largest hurdle I can think of for any site that relies on user content - so how they do it and what strategies have worked would be very helpful.
Ramesh: I agree with Reuben on the usefulness of Dispute Finder and Herdict. While Wikipedia (and Yelp, and a few other sites) show that sometimes, you can get useful content for free, that's not always the case. DisputeFinder didn't find many disputes when I did my regular scan of news websites, even when reading articles on topics like medical marijuana and same-sex marriage. It seems like applications like DisputeFinder and Herdict would be better if they were more automated -- if DisputeFinder automatically attached itself to controversial terms, and especially, as Reuben suggested, if Herdict was not based on self-reporting.
Franny: I agree that a threshold level of dedicated trusted users (see Vicki's comments below) resolves many problems. I hope our guests will discuss the strategies used by DisputeFinder, or any other website initiatives dependent on a large and broad variety of user input: (1) to attract that user base; and (2) to cope during the interim period while they continue to try to attract that user base. For example, I wonder to what extent DisputeFinder has considered building in redundancy as a means of increasing the accuracy of its results (to borrow a strategy from CrowdFlower), or perhaps some combination of automation and redundancy.
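To make the redundancy idea above concrete, here is a minimal sketch of majority voting over several independent judgments of the same claim. This is an assumption about how such a scheme might look, not a description of how CrowdFlower or DisputeFinder actually works; the function name, labels, and thresholds are all hypothetical.

```python
from collections import Counter


def aggregate_judgments(judgments, min_votes=3, min_agreement=0.6):
    """Return the majority label for one claim, or None if it is unreliable.

    judgments: list of labels from independent contributors (hypothetical).
    min_votes / min_agreement: illustrative thresholds, not real settings.
    """
    if len(judgments) < min_votes:
        return None  # not enough redundancy yet to trust any answer
    label, count = Counter(judgments).most_common(1)[0]
    if count / len(judgments) >= min_agreement:
        return label
    return None  # contributors are too split to call it


# Hypothetical example: three contributors judge the same claim.
result = aggregate_judgments(["disputed", "disputed", "not_disputed"])
```

The interim-period problem shows up directly in `min_votes`: with few active users, most claims would never collect enough judgments to clear the threshold.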
Andrew: I hope the discussion of how to achieve critical mass focuses as much on instilling/spreading an ethos as it does on ways to automate the DF system. Tools don't work without the accompanying human motivation: the wiki architecture is awesome, but (see our Predictions pages) it doesn't necessitate anything about a page's structure, and that's where the Wikipedian ethos steps in. No matter how passive DF/Herdict eventually allow their users to be, those users will (probably) still have to take the first step of registering, installing the plug-in etc. To do that, they have to be persuaded of the importance of the problem the tools are meant to address. Until the supposed echo chamber, "daily me" effect of Internet discourse becomes a mainstream concern, DF will not be a mainstream tool.
Sheel: I agree that critical mass is a huge issue, but I think validity is more of an issue; Reuben's description of his searches on the NY Times implies that many statements that most of us would claim to be disputed are simply not being caught by the program. I think DisputeFinder, while trying to make its technology stronger, should focus on instituting some sort of citations for all the information it crawls. In other words, whether a statement is 'disputed' or not, the plugin should (perhaps in the form of clickable footnotes) find citations for it; these can be links to 'valid' websites, scientific papers, press releases, etc. I'd be interested in asking whether DisputeFinder thinks it could move into the citation space and expand its scope. Style 02:35, 12 January 2010 (UTC)
Tyler: I completely agree with above thoughts about Herdict and DisputeFinder needing to collect a critical mass of users before becoming useful. This seems to echo the idea that wikipedia was not useful for its first several years because it did not possess a critical mass of articles. However, I think there are some differences because individual pieces of wikipedia could become useful before wikipedia as a whole in that individual articles could become independently useful before wikipedia became as comprehensive as it is today. I don't see that Herdict or DisputeFinder have the same capability to be useful while scaling because they require users to explicitly decide to install plugins and begin using their services before any benefit can be gained by that user. Wikipedia was able to gradually grow in prominence as users occasionally found information on wikipedia that they wanted through web searches. I am wondering if Herdict or DisputeFinder can take advantage of automated solutions to increase their seemingly as-yet sparsely populated databases? For example, could web crawling robots be used to identify at least some inaccessible sites with the expectation that this list could then be pruned by users rather than expecting it to materialize entirely by user submissions? Could DisputeFinder use a web crawling robot, in conjunction with sophisticated text parsers to begin identifying at least some topics that clearly involve dispute? I expect and hope that the guests will discuss some strategies for increasing the datasets of their projects to the point that they can obtain their critical mass of users and data more quickly. TylerLacey 19:49, 11 January 2010 (UTC)
Michael: In terms of achieving critical mass for utility, herdict seems to face an additional challenge beyond those facing wikipedia and DisputeFinder. When users make contributions to wikipedia or DisputeFinder, the information they provide remains useful indefinitely (for the most part). Herdict, on the other hand, requires constant updating. This is entirely possible (as twitter and facebook demonstrate), but it reflects an additional challenge. I would be curious to hear if there is any data to determine what the different requirements of such different sites would be. Mfeld 23:19, 11 January 2010 (UTC)
I'm also wondering whether DisputeFinder might face the same problem as Wikipedia regarding the composition of the DisputeFinder community (if there is one?). Do the guests have any clue who the people flagging certain statements are? Do these people reflect the varied composition of society?
Daniel: Tyler has a great point here, so maybe we should ask if the Dispute Finder team has thought of an interesting dispute to explore well enough in terms of paraphrases / arguments, so that users could experience the full potential of the software and then be lured into becoming frequent contributors. darbix 01:22, 12 January 2010 (UTC)
Yosuke: One of the biggest problems on the internet is the trustworthiness of articles. DisputeFinder is attacking this problem directly, so I'm very excited to hear what is going on there. As you pointed out here, there are few users and it is hard to say it's really useful right now, but I guess the critical mass is just around the corner. For example, social bookmark webservice users (like diggers) could begin using DisputeFinder, because they like to comment on the issue in the meta-pages. Personally, when a Japanese version of DisputeFinder launches, I'd love to install it. Yosuke 03:16, 12 January 2010 (UTC)

Emily: Dispute Finder bears an inherent flaw: individuals, not algorithms, decide whom and what to trust for information. Consider the watch on your wrist. If your watch starts to get the time wrong, you might try to fix the watch. You hope and pray your watch starts giving you accurate, dependable information because you like your watch. You might even love your watch. But, if it continues to betray your trust, and the people in your trusted circle insist your watch is wrong, you give up. You decide to trust a new watch, but your new watch will probably be reminiscent of your old watch with respect to personal taste, experience, and preferences. Most people are intuitive enough (though they don’t necessarily convert insights into complex conclusions about source x versus source y) to know that 120 seconds of live, relatively unedited sound on Fox News Live or MSNBC Dayside is less likely to contain factually accurate information – even if relatively unimportant, like the location of a fire, or the total number of casualties in a mass shooting— than a compulsively edited, fact-checked tome in the Sunday NY Times magazine, the Economist, or the New Yorker.

Article 3.5 of the Dispute Finder document, “Determining Trustworthy Sources,” seems a bit absurd. It actually acknowledges the marketability challenges of its own software: “Unfortunately…the sites people actually trust are often those that share the person’s own point of view.” So, again, what is this software and what, really, is the point? Segue into ‘Cross-cutting themes.’ Save the world. How? Is Dispute Finder intended to help people sue other people for libel? Richard Jewell (now deceased) had a reasonably compelling case. That’s probably why he successfully sued (for libel) every organization, from CNN, to NBC, to the NY Post. All settled. He collected from each of them. But Richard Jewell didn’t need help from Dispute Finder. Richard Jewell had a case.

Cross-cutting themes: “Change the technology, save the world.” Okay, why not? Isn’t there something else smart people at Intel and UC Berkeley could be doing to make the world better? Last November, the New York Times produced an alarming story [1] about the food stamp program in America (“now expanding at a pace of about 20,000 people a day.”) There is also no shortage of children in custody. Last December, the New York Times obtained, and reported on [2], a “confidential draft report” prepared by a task force appointed by NY gov David Paterson: “New York State’s current approach fails the young people who are drawn into the system, the public whose safety it is intended to protect, and the principles of good governance that demand effective use of scarce state resources.” The story also says the situation was so bad that the DOJ, at one point, was threatening to “take over.”

So, if Intel is interested in contributing, how about addressing real problems, and helping real people, in ways that could effect real, collective societal change and improvement? Children and education seem like obvious places to start. Basics like hardware and mentors could go a long way. Children in poverty struggle with a range of issues, including asthma, low self-esteem, obesity, and depression. Consider children in places like the South Bronx (Jonathan Kozol’s children [3]): allocation of resources in places like this (and/or lower-middle class communities), especially from companies like Intel, could change lives and give voices to people from whom we do not often hear.

Interested to hear thoughts on Internet privacy, though I'm not sure adults have an expectation of privacy anywhere [4] on the Internet. If you want privacy, don't put yourself on the Internet. Finally, on the subject of online harassment, if we accept that the Internet is a public place, to what extent is it acceptable to regulate online communication, including but not limited to comments deemed 'offensive' on blogs?

Lien: Although this is not today's subject, I do believe we can still have expectations about privacy on the net. Just because this is a difficult issue, and the level of privacy goes down as more and more people post personal information about others on the net, does not mean that we cannot have expectations anymore. The same is true for the regulation of online communication: just because the Internet is a public space does not mean that public speech cannot be regulated in one way or another.

Predictions. Guests will be nice. Class will be nice. Hope to hear more about Dispute Finder's business model.

Tyler: I would like an explanation for why the contributor of a disputed claim on DisputeFinder needs to provide a link to an article that illustrates the opposing point of view. If there is no article, isn't it still valuable to identify a claim as disputed, especially since this could break DisputeFinder's dataset building-process into two parts? I could enter a disputed claim without a link to another source and then another user, once alerted to the potential dispute, could track down and enter the article. I see the argument that an issue is not actually in dispute if there are no contradictory reports of it, but I wonder if an entry into DisputeFinder should be enough to create a "dispute", rather than requiring a link. I agree that even a blog post outlining the opposing point of view would be more helpful than a "dispute" without any link, but I'm not sure that it should be a requirement. Today I entered a disputed claim as "Works prepared by amazon mechanical turkers are considered works for hire under the United States Copyright Act" to see if DisputeFinder would highlight portions of our wiki (which does not currently have any disputed claims, according to DisputeFinder) but I was stalled when it asked for a link to a web location outlining this dispute. Should I have entered the page on this wiki where we discuss the issue? I hope that the guests discuss this aspect of the DisputeFinder process. 20:11, 11 January 2010 (UTC)

Tyler: Another question that came up during my lunchtime discussion of DisputeFinder with some of our classmates: is there a practical way that DisputeFinder could leverage the existing collection of topics that wikipedia has flagged as a "point of view" or "non-neutral" to boost DisputeFinder's database of disputes?
Elisabeth: Actually, I tend to think the need to cite an article is a useful safeguard. It mirrors Wikipedia's rule that contributors can't do original research, and I think it exists for the same reason: to keep spammer-activists from simply flinging their views into these trusted systems. It's just too easy to go around labeling hundreds of things as disputed. Now, you could just go write a wiki on Amazon Mechanical Turk and then cite to it on dispute finder, much as anyone can publish an article on SSRN and then write a Wikipedia page, but it does provide some measure of protection (it stopped Tyler). One alternative solution is for DisputeFinder to flag in a lighter color, or a different color, claims that are marked disputed but have no article support.

Victoria: I completely agree with the former point that DisputeFinder's success is dependent on gaining a critical mass of end users. In addition to the need for more users, I think the platform is very trusting of the end users themselves. DisputeFinder allows a lot of users to subjectively claim anything is disputed even when it begins to reach the absurd. Posted in June 2009, "the 2009 Iran Presidential election was rigged," "Global warming does not exist" and "Recycling is good for the environment" were disputed. Although theoretically the idea of a marketplace of ideas works - without the appropriate robust marketplace DisputeFinder becomes a caricature of the truth-seeking function of free speech.

Michael: I am interested to hear the speakers from DisputeFinder describe why they believe the simple knowledge of a dispute is socially valuable. The extent to which a statement is disputed seems like it would be more valuable. I'd be interested to hear if our speakers expect to include a scale of dispute to their software, such as that used in Herdict (the different colored sheep).
I predict that DisputeFinder will view one of its most difficult challenges to be determining the trustworthiness of sources. Unlike CrowdFlower, DisputeFinder may not be able to simply use agreement as an accurate rubric for which user-supplied links are trustworthy and which are spam. Since the nature of the site is to highlight disagreement, it doesn't seem possible to use user agreement as the benchmark of useful data for their service. I would be interested to hear what DisputeFinder uses as its criteria for determining which data is reliable (meaning non-spam). Mfeld 23:39, 11 January 2010 (UTC)
Elisabeth: I'm not sure why a crowd-voting system isn't appropriate--after all, people on both sides of a given debate can vote up sources they consider most trustworthy--but I'd be interested to hear how our speakers think it has worked.

Juan: I'm interested in hearing their thoughts on the collection of disputed claims. As the material mentioned, most people who were interviewed are interested in applying Dispute Finder to particular areas that affect them. Thus, how will they collect data for areas that are not that popular or useful to most people, especially since there are no other incentives to encourage claim creation? I guess building up a community as wikipedia or herdict did might be one solution. The question is how this community can be built up. Also, I kind of feel Dispute Finder overlaps with the search services provided by Google and other search engines. If people are interested in one topic, they can always use Google or other search services to find the corroborations or objections on it. By determining trustworthy resources, can I say Dispute Finder sort of limits the resources available to people? To me, "trustworthy resources" is more of a subjective concept; everyone can have his/her own trustworthy resources. Is there a need to have a website telling us which resource is trustworthy, especially when the site itself said it is a difficult tradeoff to determine the trustworthy resources?

Sharona: I totally agree with everyone's comments about the lack of a critical mass, and I would be curious to hear how they think they could theoretically gain one. Would people actually be drawn to this the way they are to edit wikipedia pages? Is it simply a matter of a marketing strategy? Another question I had while reading the website and the other document was regarding the phrase "trusted source." Who defines that? The users? What if people start claiming that what DisputeFinder may deem "unreliable" is a trusted source in their view? Who will stop them, and will that be antithetical to DisputeFinder's ethos?

Hector: Indeed, I'd be interested to see the demographics of DF's users (considering a perception of skeptical/activist internet users based out of Berkeley)

Emily: Sharona, not sure anyone needs to "stop them." Even a cursory search for articles on the Internet citing "Dispute Finder" suggests this thing hasn't picked up much steam: 0 results in google news; 1 at (but the story was produced out of the recently launched Bay Area blog!). Curious to hear if others, including guests, have responses to your "marketing strategy" question.

Elisabeth: looking at all of our readings together--on Dispute Finder,, and Herdict--makes me think about how we're creating an increasingly atomized web, where users have remarkably different experiences in the Internet space. It's not just that people are accessing different content. Some people are passively consuming stories from CNN, others are carefully sculpting an information diet from RSS feeders and DisputeFinder trusted sources, and still others are contributing to all different kinds of tasks, from tracking filtering around the world to working on SETI. Even "contribution" work can be active or passively happening in the background (Wikipedia is active; SETI is background, and DisputeFinder and Herdict are active now but could be passive if some of the tech suggestions above are implemented). Extensions like Readability and the popularity of mobile phone browsers make the web look totally different for different users, and if takes off, it won't even be interoperable between ISPs! The question of "how we interact with the internet" thus will become even more complex as the user experience splinters even further. And so (dragging this meditation back to class), I would like to hear how Herdict and DisputeFinder see their audiences. Is the vision global acceptance? Or is it a critical mass large enough to get the job done--to identify most blocked sites and most disputed claims--and specialized use only by those who are particularly interested in the topic?

Elisabeth: and one theme that connects all of these services is the need for active users. How do we foster a culture in which people think they have a responsibility to contribute to the web (beyond even contributing pure content), instead of just using it?

Lien: Although there aren't any guests from HerdictWeb, I think it would be interesting to see on the website why a certain site is blocked. Sometimes, a government might have legitimate reasons to block a certain site. In Belgium, e.g., a site that listed the exact addresses of people convicted of paedophilia was blocked by the government for privacy reasons.

Bruno: actually there is someone from Herdict, although he is not exactly a guest ;)

Bruno: Achieving a critical mass of users is really essential for this kind of project. In order to scale up participation, a project manager should be concerned with what motivates users to participate and should focus the initial marketing strategies on the groups to which the project is most appealing. As the experience of our wiki has shown, and as Wikipedia's experience also seems to corroborate, initial participation tends to become an example for future participation. In this sense, I would be curious to learn whether an analysis of user participation so far would reveal an identifiable group (or groups) on which outreach efforts should be focused, and whether there has been any activity directed at community building.