GSoC FAQ

From Google Summer of Code 2009

This page answers some of the questions frequently asked by prospective participants in the Google Summer of Code 2009.

General Questions

Am I required to be local?

Q: If someone is selected as a student coder through the Summer of Code, will they need to be in the Boston area over the summer?

A: No, we are not asking anyone to move to Boston for the summer. You are most welcome to come and work at the Berkman Center if you are selected for an internship, but we will not force anyone accepting an internship to move.

Non-Berkman geeks

Q: Is this limited to Harvard and/or Berkman coders?

A: No, this is open to anyone who would like to apply.

Cohort CRM

Mass mailing?

We'd like a collection of tags and/or saved searches to define the membership of mailing lists. We'd then use this list of Cohort-managed users to integrate with Sympa or other open-source mailing list software.
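As a rough illustration of the idea, list membership could be derived as the union of contacts matching any chosen tag or any saved search. This is only a sketch in Python, not Cohort's actual code: the `Contact` class, the `list_members` function, and the notion of a saved search as a stored predicate are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Contact:
    # Hypothetical contact record; Cohort's real schema may differ.
    email: str
    tags: frozenset = field(default_factory=frozenset)


def list_members(contacts, tags=(), saved_searches=()):
    """Return sorted, de-duplicated addresses for a mailing list.

    A contact joins the list if it carries any of `tags` or satisfies
    any predicate in `saved_searches` (each a callable Contact -> bool).
    """
    wanted = set(tags)
    members = set()
    for contact in contacts:
        if wanted & contact.tags or any(s(contact) for s in saved_searches):
            # Normalize case so the same address is not listed twice.
            members.add(contact.email.strip().lower())
    return sorted(members)


contacts = [
    Contact("ana@example.org", frozenset({"fellows"})),
    Contact("Bo@example.org", frozenset({"staff"})),
    Contact("cy@example.com", frozenset()),
]


def org_addresses(contact):
    # A "saved search" here is just a stored predicate (an assumption).
    return contact.email.lower().endswith("@example.org")


print(list_members(contacts, tags=["fellows"]))
print(list_members(contacts, saved_searches=[org_addresses]))
```

The resulting address list is what would then be handed to Sympa or similar software; how that hand-off works is left open here.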

We'd also like a simple built-in bulk-mailing system for when you want to contact a dozen or so people. Nothing complicated or heavy-duty: it would just send email as the logged-in user via the application.

Deduplication?

We have conceived of a way (maybe via rake tasks run as cron jobs) to come up with a "scale" that uses progressively less precise matches to indicate which records are most likely to be duplicates. We'd then have to build a system that puts records side by side and lets the user merge them. Newer records will not necessarily have better information, so the UI would need to let a user pick and choose which fields to use from each duplicate record.
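The field-by-field merge described above might reduce to something like the following Python sketch. The `merge_records` function, the dictionary-based records, and the "left"/"right" choice labels are hypothetical and chosen only for illustration. The fallback for fields the user did not choose (keep the left value unless it is empty) is likewise an assumption.

```python
def merge_records(left, right, choices):
    """Merge two duplicate records field by field.

    `choices` maps a field name to "left" or "right", mirroring what a
    user would pick in a side-by-side UI. Fields without an explicit
    choice keep the left value unless it is empty.
    """
    merged = {}
    for name in sorted(set(left) | set(right)):
        pick = choices.get(name)
        if pick == "left":
            merged[name] = left.get(name)
        elif pick == "right":
            merged[name] = right.get(name)
        else:
            # No explicit choice: fall back to whichever side has a value.
            merged[name] = left.get(name) or right.get(name)
    return merged


older = {"name": "Ana Perez", "email": "ana@example.org", "phone": ""}
newer = {"name": "A. Perez", "email": "ana@old.example", "phone": "555-0100"}

# The user keeps the older name and email; the phone comes from the newer record.
merged = merge_records(older, newer, {"name": "left", "email": "left"})
print(merged)
```

Note that the older record wins on two fields here, which is the case the UI needs to support: recency alone does not decide which value is kept.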

So we'd look for exact matches on multiple fields and assign them a weight. Then we'd look for fuzzy matches, then even fuzzier matches, and so on, until we have a set of potential duplicates with assigned weights. This would be a very CPU-intensive operation.
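One possible reading of this tiered weighting, sketched in Python with the standard library's `difflib.SequenceMatcher` standing in for the fuzzy comparison. The weights, the similarity threshold, the compared fields, and the function names are all assumptions for the example. A real implementation would need tuning and might use a different similarity measure entirely.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Illustrative weights and threshold; real values would need tuning.
EXACT_WEIGHT = 1.0
FUZZY_WEIGHT = 0.5
FUZZY_THRESHOLD = 0.8
FIELDS = ("name", "email")  # hypothetical fields to compare


def normalize(value):
    # Lowercase and collapse whitespace so trivial differences match exactly.
    return " ".join(str(value or "").lower().split())


def field_score(a, b):
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    if a == b:
        return EXACT_WEIGHT
    ratio = SequenceMatcher(None, a, b).ratio()
    # Fuzzy matches count for less, scaled by how similar the values are.
    return FUZZY_WEIGHT * ratio if ratio >= FUZZY_THRESHOLD else 0.0


def duplicate_candidates(records, min_score=1.0):
    """Return (score, i, j) for likely duplicate pairs, best first."""
    scored = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        score = sum(field_score(left.get(f), right.get(f)) for f in FIELDS)
        if score >= min_score:
            scored.append((score, i, j))
    return sorted(scored, reverse=True)


records = [
    {"name": "Ana Perez", "email": "ana@example.org"},
    {"name": "Ana  Perez", "email": "ANA@example.org"},
    {"name": "Anna Perez", "email": "ana@example.org"},
    {"name": "Bo Lindqvist", "email": "bo@example.com"},
]

for score, i, j in duplicate_candidates(records):
    print(f"{score:.2f}  record {i} <-> record {j}")
```

This sketch compares every pair of records, so the work grows quadratically with the number of records, which is consistent with the CPU cost noted above and with running it as a scheduled batch job.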

Other algorithms for the de-duplication of records are welcome as well, of course.