A few of the obstacles, and the ultimate standard (English language*, Arabic numerals, unambiguous context): a receipt that lawyers can agree is intelligible to both parties.
*(Mandarin is perhaps less efficient, since context requires tonal parsing and much larger character sets.)
Receipt issuers have little interest in conforming to a data interchange standard.
The receipt is considered a record of a private contract between two parties, not an information source to be used for other purposes.
Further, the information on digital receipts is wildly variable. The only common data are the date (with a finite set of format variations) and the total amount charged, which lumps together individual items, tax, etc., and these appear differently for every distributor and state/province. There are services like shoeboxed.com that can perform this elementary level of parsing, even for paper receipts. To go to the next level, parsing needs to be done case by case. You could create rules for Amazon, Apple, eBay, etc. and update them as their formats change from time to time, but this work needs to be pooled for efficiency. Infrastructure services could do it, but they need to sell it to someone.
At least a digital receipt can be converted to ASCII if it is not already in that fundamental form. Paper receipts add the problem of converting printed characters to ASCII. Then you need to interpret the meaning of the ASCII strings and normalize them into defined data fields that can be used for other purposes. Data cleansing may require manual intervention.
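The case-by-case rule approach described above can be sketched roughly as follows, assuming the receipt is already plain text. The vendor names (ExampleShop, OtherStore), the regular expressions, and the date formats are hypothetical placeholders, not real retailers' layouts; real formats vary and drift, which is why such rules need ongoing maintenance. Anything that fails to match returns None, standing in for the manual cleansing step.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import Optional


@dataclass
class Receipt:
    vendor: str
    purchase_date: date
    total: Decimal


# Hypothetical per-vendor rules: (date regex, date format, total regex).
# These are illustrative placeholders, not any real retailer's receipt format.
RULES = {
    "ExampleShop": (r"Order date:\s*(\d{4}-\d{2}-\d{2})", "%Y-%m-%d",
                    r"Grand total:\s*\$([\d,]+\.\d{2})"),
    "OtherStore": (r"Date\s+(\d{2}/\d{2}/\d{4})", "%m/%d/%Y",
                   r"TOTAL\s+USD\s+([\d,]+\.\d{2})"),
}


def parse_receipt(vendor: str, text: str) -> Optional[Receipt]:
    """Normalize a plain-text receipt into date and total fields, or return None."""
    rule = RULES.get(vendor)
    if rule is None:
        return None  # no rule for this vendor yet: route to manual review
    date_re, date_fmt, total_re = rule
    date_match = re.search(date_re, text)
    total_match = re.search(total_re, text)
    if not (date_match and total_match):
        return None  # format changed or unexpected layout: flag for manual cleansing
    return Receipt(
        vendor=vendor,
        purchase_date=datetime.strptime(date_match.group(1), date_fmt).date(),
        # Strip thousands separators before converting to an exact decimal amount.
        total=Decimal(total_match.group(1).replace(",", "")),
    )
```

For example, `parse_receipt("ExampleShop", "Order date: 2011-05-26\nGrand total: $21.58\n")` yields a `Receipt` with the date and total as typed fields; only the two "common data" fields are extracted, since line items and tax differ too much from vendor to vendor.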
Suppose a standard is proposed. It then needs to be adopted by a significant number of retailers to be considered useful for any data-gathering system. This is difficult to make happen without either a federal mandate or a significant business advantage that persists after the leveling effect of universal adoption. Before that, the advantage has to be sold to each distributor so they budget the funds to adopt and support the standard.
While we wait, parsing algorithms get smarter, but this is a tough problem to solve.
So there you have it: the cowboys (sellers) herding cattle (buyers) are not interested in the divinity of bovines (re: Craig Burton) or cow equality. They like the system the way it is. But as the herd acquires smarter, affordable gadgets, they will cause de facto open standards to emerge that can free the data to be used in multiple ways to benefit the cows (buyers).
Viva la Vaca!
- The Mad Cow
P.S. As dairy farmers well know, the fence holding the cows in is puny, but the herd respects it most of the time (zap). Now and then, one of the herd tramples down the fence and the cows escape for a while, usually in the spring. The cost-benefit ratio of flimsy fence to number of break-outs is within acceptable bounds. Warning: more break-outs inevitably lead to stronger, more expensive fences and higher milk prices. YMMV ;-)
On May 27, 2011, at 3:15 AM, Iain Henderson wrote:
Yes, and let's not forget the offline world. A colleague of mine covered digitising receipts in her MSc thesis and came up with fascinating numbers, like the miles of paper that supermarkets churn out each week that go straight in the bin. Much of this is just a legacy way of doing things crying out for a standardised fix.
Count me in for this project work if we get it going.
Iain
On 27 May 2011, at 00:58, Doc Searls < > wrote:
On May 26, 2011, at 5:32 PM, Brian Behlendorf wrote:
Why not a standard for emailed purchase receipts?
Great idea!
Could it be there's already one we don't know?
For no good reason, today I get email messages confirming purchases in all sorts of different formats. One could pretty easily imagine a standard that preserved the ability to view the message in an HTML-ish mail client, but which also embedded enough metadata (microformats-style, or as a small attachment à la VCF) to allow a message to be fed to a client-side application that did interesting things with the data: one that helps me keep track of purchasing habits, a personal-property inventory system for insurance purposes, a warranty tracking system so that when X breaks I quickly know what number to call and whether it's still in service, etc. It could thus also feed a VRM system entirely client side, or on a personal cloud of one's choice, in a manner much more semantically meaningful than free-text association.
Then, let a thousand All My Purchases apps bloom.
Brian
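Brian's proposal can be sketched under stated assumptions: the human-readable HTML body stays as-is, and the structured data travels as a small JSON attachment (the "à la VCF" option rather than microformats). The filename `receipt.json` and the field names are hypothetical; no existing receipt standard is implied. Only Python's standard `email` package is used, for both the merchant side and the client side.

```python
import email
import json
from email.message import EmailMessage
from email.policy import default
from typing import Optional

# Hypothetical convention for this sketch; not an existing standard.
RECEIPT_FILENAME = "receipt.json"


def build_receipt_email(receipt: dict, html_body: str) -> bytes:
    """Merchant side: a normal HTML-viewable email plus a machine-readable attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Your order confirmation"
    msg["From"] = "orders@shop.example"
    msg["To"] = "buyer@example.org"
    msg.set_content("Your order confirmation (plain-text fallback).")
    msg.add_alternative(html_body, subtype="html")  # what the mail client displays
    msg.add_attachment(                             # what a receipt app consumes
        json.dumps(receipt).encode("utf-8"),
        maintype="application",
        subtype="json",
        filename=RECEIPT_FILENAME,
    )
    return msg.as_bytes()


def extract_receipt(raw: bytes) -> Optional[dict]:
    """Client side: return the embedded receipt data, or None for ordinary email."""
    msg = email.message_from_bytes(raw, policy=default)
    for part in msg.walk():
        if (part.get_content_type() == "application/json"
                and part.get_filename() == RECEIPT_FILENAME):
            # For application/* parts, get_content() returns the decoded bytes.
            return json.loads(part.get_content())
    return None
```

A client-side "All My Purchases" app could call `extract_receipt` on each incoming message and, for example, index `items` with a hypothetical `warranty_months` field for warranty tracking; messages without the attachment are simply skipped, so existing mail keeps working.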
Yay! Let's do it.
Doc