GSoC FAQ

From Google Summer of Code 2009

Latest revision as of 15:30, 1 April 2009

This page answers some of the questions frequently asked by prospective participants in Google Summer of Code 2009.

General Questions

Am I required to be local?

Q: If someone is selected as a student coder through the Summer of Code, will they need to be in the Boston area over the summer?

A: No, we are not asking anyone to move to Boston for the summer. You are most welcome to come and work at the Berkman Center if you are selected for an internship, but we will not require anyone accepting an internship to move.

Non Berkman geeks

Q: Is this limited to Harvard and/or Berkman coders?

A: No, this is open to anyone who would like to apply.

Cohort CRM

Mass mailing?

We'd like a collection of tags and/or saved searches to serve as the source for list membership. We'd then use this list of Cohort-managed users to integrate with Sympa or other open-source mailing list software.

We'd also like a simple bulk-mailing system built in for when you want to contact a dozen or so people. Nothing complicated or heavy duty: it would just send email as the logged-in user via the application.

Deduplication?

We have conceived of a way (maybe via rake tasks run as cron jobs) to come up with a "scale" that uses progressively less precise matches to say which records are most likely to be duplicates. We'd then have to build a system that puts records side by side and allows the user to merge them. Newer records are not necessarily going to have better information, so the UI would need to let a user pick and choose which fields to use from each duplicate record.

So we'd look for exact matches on multiple fields and assign them a weight. Then we'd look for fuzzy matches, then we'd look for even fuzzier matches and so on, until there's a set of potential duplicates with assigned weights. This would be a very CPU-intensive operation.
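The tiered weighting described above could be sketched roughly as follows. This is only an illustration, not Cohort code: the `Contact` struct, the field list, the three match tiers, and the weight values are all assumptions chosen for the example, and a real implementation would likely use better fuzzy-matching techniques than a shared prefix.

```ruby
# Hypothetical sketch: Contact, FIELDS, and the weights are illustrative
# assumptions, not part of Cohort.
Contact = Struct.new(:id, :first_name, :last_name, :email)

# One weight per match tier: the less precise the match, the lower the weight.
EXACT_WEIGHT      = 3
NORMALIZED_WEIGHT = 2
PREFIX_WEIGHT     = 1

FIELDS = [:first_name, :last_name, :email]

# Lowercase and strip everything except letters and digits.
def normalize(value)
  value.to_s.downcase.gsub(/[^a-z0-9]/, "")
end

# Score one pair of field values, trying the most precise tier first.
def field_score(a, b)
  return 0 if a.to_s.empty? || b.to_s.empty?
  return EXACT_WEIGHT if a == b
  na = normalize(a)
  nb = normalize(b)
  return 0 if na.empty? || nb.empty?
  return NORMALIZED_WEIGHT if na == nb
  return PREFIX_WEIGHT if na[0, 3] == nb[0, 3]
  0
end

# Sum the per-field scores for a pair of records.
def duplicate_score(x, y)
  FIELDS.sum { |field| field_score(x[field], y[field]) }
end

# Compare every pair of records (quadratic, hence CPU-intensive) and
# return [id, id, score] triples at or above the threshold, best first.
def potential_duplicates(contacts, threshold)
  contacts.combination(2)
          .map { |x, y| [x.id, y.id, duplicate_score(x, y)] }
          .select { |_, _, score| score >= threshold }
          .sort_by { |_, _, score| -score }
end
```

Comparing every pair of records is what makes this expensive, which is one reason to run it from a scheduled job rather than during a request.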

Other algorithms for the de-duplication of records are welcome as well, of course.

Database-level validation?

We want the database to enforce constraints, not the application. The idea would be to replicate, as much as possible, the Postgres-level constraints and checks in MySQL. Where that isn't possible, we'll document the areas where MySQL is lacking (of which there are many). We'll still validate at the application level, of course, and some of this is already handled via the Redhill plugins. To the greatest extent possible, though, we want to centralize validation logic in the database.

We can inspect the connection type during the application of a migration and run different "execute"-ed statements to create constraints - trivial really, it just hasn't been done yet.
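A minimal sketch of that idea, assuming a hypothetical `contacts` table with an `email` column: the helper name, the constraint name, and the specific SQL are illustrative only. `ActiveRecord::Base.connection.adapter_name` and a migration's `execute` are the standard Rails calls; everything else here is an assumption.

```ruby
# Hypothetical helper: the table, column, and constraint names are
# illustrative assumptions, not taken from the Cohort schema.
# Returns the SQL statements to run for the given adapter name.
def email_constraint_statements(adapter_name)
  case adapter_name.to_s.downcase
  when /postgres/
    # Postgres enforces CHECK constraints.
    ["ALTER TABLE contacts ADD CONSTRAINT contacts_email_present " \
     "CHECK (char_length(email) > 0)"]
  when /mysql/
    # MySQL releases of this era parse CHECK clauses but do not enforce
    # them, so fall back to the closest constraint that is enforced.
    ["ALTER TABLE contacts MODIFY email VARCHAR(255) NOT NULL"]
  else
    # Unknown adapter: add nothing and rely on application-level validation.
    []
  end
end

# Inside a migration (assumes ActiveRecord is loaded), usage might look like:
#
#   def self.up
#     adapter = ActiveRecord::Base.connection.adapter_name
#     email_constraint_statements(adapter).each { |sql| execute(sql) }
#   end
```

Keeping the adapter switch in a plain method makes the per-database differences easy to review and document in one place.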

We feel the Rails community has it wrong by centralizing validation logic at the ActiveRecord level - it makes it impossible to use your well-designed database schema consistently from other languages. The database is the primary repository of the data that makes up your application, and it makes sense to treat it as an equal player - especially if you use a database like Postgres that is actually ACID and provides you with a rich set of tools. MySQL has crippled most Rails developers' understanding of what a real database can do for your application.

Additionally, in "the real world," you cannot dictate that all projects must be Rails. It does not make sense to re-implement application-level validation logic for PHP, for Perl, for Rails, for Java, etc., every time you want to access your database directly. When you implement validation logic at the database level, you only need to do it once. Rails, fortunately, provides more than adequate tools through core features or plugins (check out the Redhill stuff) to make this possible - and easy.

A good web services API can help work around the application/database schism, but some "legacy apps" may want a direct database connection, and there's no reason we can't provide a rock-solid database schema that can outlive Rails as the application framework.