Four questions.
1. Does this data leakage happen with online banking, credit card sites, and when downloading one's own data into personal analysis programs (e.g., Quicken) on one's personal computer?
2. Are there standards used by these technology providers which application developers could observe?
3. Until standards are set, as Mary rightly proposes below, could one be confident that an app built on top of the banking and personal financial analysis framework respects the standards used by these websites?
4. If that application offered individuals a way to aggregate contextually relevant financial data with other data on one's personal computer, recognize a pattern, and share it anonymously with a website that reveals where that individual falls on a cluster map (identifiable to the individual but anonymous to everyone else, taking advantage of point 4 below), what would the risks to the individual be?
Thanks,
On Jun 6, 2011, at 1:25 AM, Joe Andrieu wrote:
--
Then what is the next option for keeping PII data safe from leakage?
--
Well... there are some things you can do. But PII is such a challenging
framing. It's not unreasonable to super-encrypt any "hard" PII, such as
name, CC, address, etc., and only use anonymous identifiers to link
non-encrypted data to the PII. In some ways this is an extension of
the NIH standards for testing on human subjects. The ethical practice
is to separate as much identifiable information from the research data
as possible. In practice, it is generally possible to reverse-track
from the research data to the individual, but that's a special case.
The bulk of the work is done by people who have no idea who patient
#4335256 is.
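The separation Joe describes can be sketched in a few lines. This is a minimal illustration, not a real system: the store and field names are invented, and in practice the identity vault would be super-encrypted and access-controlled rather than sitting in memory.

```python
import secrets

# Two separate stores, per the NIH-style separation described above:
# the identity vault holds the "hard" PII; the research table holds
# only measurements keyed by a random, opaque ID.
identity_vault = {}   # patient_id -> PII (locked down in practice)
research_data = {}    # patient_id -> measurements (working set)

def enroll(name, address, measurements):
    """Assign a random opaque ID; analysts only ever see the ID."""
    patient_id = secrets.token_hex(8)
    identity_vault[patient_id] = {"name": name, "address": address}
    research_data[patient_id] = dict(measurements)
    return patient_id

pid = enroll("Jane Doe", "123 Main St", {"systolic_bp": 120})
# The bulk of the work happens against research_data alone: nothing
# in that table says who the patient behind `pid` is.
```

Reverse-tracking remains possible for whoever holds the vault, which is exactly the special case noted above.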
But as we saw with the AOL search leakage, that isn't always enough,
especially in the face of significant quantities of data.
This is a fundamental problem. PII frames the issue as if there is a
containable set of data, which, if magically protected, would allow
safe interactions online. But that just isn't true. Most of the
behavioral tracking data is 100% anonymous. Often they don't care that
you're Mary Hodder. They care that you've been doing a lot of
automobile queries recently.
Perhaps a better framing is the Peeping Tom problem. Schlage locks can
keep amateur criminals out; that's like https and encryption for PII.
But it doesn't do anything to keep people from looking in my windows
and spying on me. Sure, securing PII might protect my PII from all but
the most determined hackers, but companies can still watch me in ways
that most people feel creeped out by, that violate privacy. And the
more you extend PII, to IP addresses and zip codes and such, the more
your digital home starts to look like Osama's hideout. Big bulky
walls, with no windows, no phone service and no Internet. As a privacy
solution, locking down PII turns privacy into a security problem and
starts growing the notion of PII until everyone's home looks like Fort
Knox.
IMO, there are four things you can and should do:
1. Lock down as much as you can. There's no reason credit card
and password information can't be encrypted. That's fairly trivial
with most systems, and it's just sloppy engineering not to do so. A standard
one-way hash for passwords has been built into Unix for decades. Save
this for information that is rarely touched and has the most damaging
consequences.
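A minimal sketch of the one-way hashing point 1 calls for, using Python's standard library as a stand-in for the classic Unix crypt(3) (PBKDF2 with a random salt is the modern equivalent; the iteration count here is illustrative, not a recommendation):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest); only these are stored, never the password."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, digest):
    """Re-hash the attempt and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt)[1], digest)

# At signup, store only the salt and digest:
salt, digest = hash_password("correct horse battery staple")
```

A leaked database then yields salted digests, not passwords.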
2. Don't use PII as a lever for other services. Some of the
most significant problems with identity theft stem from other
services' reliance on PII to provision and authenticate. For
example, if I have your social security card and birth certificate, I
can get a driver's license in many states, which itself can be used to
bootstrap other services, like a bank account, credit cards, etc. Well,
if we can't really control the leakage of my SSN and birth date, etc.,
then stop using those data for giving and managing my credit! It was
only in the last 20 years that companies were restricted from using my
SSN as an identifier. In fact, when I was a freshman, my student ID #
was my SSN. How crazy was that? Fortunately, credit card companies
long ago denied merchants the right to correlate customer data by CC
number. This needs to be promoted and adopted more broadly. Using PII
as a key for services is ridiculous in a world where PII is compromised
all the time.
3. Regulate the use of information, no matter where it comes from,
requiring permission to use any personal information. Ultimately, I
think this is our best, fastest solution to most of the problems we are
dealing with, especially with contract-based Trust Frameworks. It is
how Do Not Call works. It's how defamation law works--it doesn't matter
how they got the defamatory material. It's how harassment laws
work--it doesn't matter how they got my address or phone #. It's how
rules of evidence exclude unlawfully obtained evidence from the
courtroom. The problem isn't that people know stuff about us. It's that
they do things based on what they know, things we find incompatible
with a free society. I think we can get forward-thinking companies to
play by new rules, rules that simultaneously respect individuals'
authority and enable deeper, more meaningful relationships. The latter
will pay for the former.
4. Use statistical aggregation wherever possible. This is the
only mathematically sound way to anonymize any data. You are only as
anonymous as the number of people you can be confused with. Statistical
aggregation would allow data analysis where companies can look at
statistical drivers of profitability without needing to link to any
particular individual. For example, knowing that 10% of visitors to
your website abandon their shopping carts when they see the shipping
charges is far more useful than a list of names of those same
"abandoners". This is actually one of the places where traditional
database marketing techniques can improve online businesses. I don't
believe in demographics and psychographics, but using statistical
aggregation is essentially what those approaches are. Thing is, we no
longer need to pivot the data on demographic data... which is often PII
and often only a minor indicator compared to, say, the last ten
keywords someone searched for or time on page. Reinventing analytics to
preserve anonymity in business intelligence is a good idea.
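The shipping-cart example in point 4 might look like this as a query. The event schema and field names are invented for illustration; the point is that only the aggregate ever leaves the function, never a list of "abandoners":

```python
# Toy event log keyed by anonymous session IDs (no names anywhere).
events = [
    {"session": f"s{i}", "saw_shipping": True, "abandoned": (i == 0)}
    for i in range(10)
]

def abandonment_rate(events):
    """Release only the aggregate; per-visitor rows stay inside."""
    shown = [e for e in events if e["saw_shipping"]]
    return sum(e["abandoned"] for e in shown) / len(shown)
```

With one abandoner in ten sessions, the business learns "10% abandon at shipping," which is the actionable number; you are as anonymous as the nine other people you can be confused with.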
I'm not sure that is the answer you were hoping for, but it's all I
got. =)
-j
Joe Andrieu
+1 (805) 705-8651
On 6/5/2011 8:31 PM, Mary Hodder wrote:
Hi Joe,
So we've just been discussing all this.. the ways encryption can go
wrong.. and i got the same info in the room.
So the idea is.. use HTTPS for anything in transit.. but the data at
databases will be wide open.. and even the user's password will most
likely be stored in plain text in the database because it's too much
work to encrypt that.
If the databases are where the most vulnerabilities are for PII.. and
encryption isn't an option there...
Then what is the next option for keeping PII data safe from leakage?
mary
On Jun 5, 2011, at 7:56 PM, Joe Andrieu wrote:
Mary,
Your analysis is fairly on the nose, but there is a difference
between https by default and storing the data encrypted.
HTTPS by default is good, though it comes at a small cost. It also introduces extra
debugging overhead, but in theory, most of us would be using standard
libraries so that we aren't debugging the crypto anyway.
But all https does is deal with the data in transit.
It's a separate issue for data on the server or in an archive. And
your analysis there is spot on, certain types of transactions simply
become either too costly to be reasonable or essentially non-viable,
such as search.
I think the biggest problem here is that most of the analytic stuff
that's of modern interest needs to operate on unencrypted data. Even
simple web analytics to try to find out why people are buying one
product versus another requires access to the underlying factors. Any
web business worth its salt uses A/B testing to try out new features.
Doing that on top of some encrypted data set would be untenable, which
means that companies would need to decrypt their entire analytic set
when they want to run any tests like that. And the larger companies
are doing this constantly. So, the more sophisticated the vendor, the
less and less tenable it is to "encrypt everything", even if you adopt
an "on-demand" decryption strategy. On top of that, the sheer
computational complexity of current analytics is already pushing
server-side machines to the limit. Adding crypto simply makes it
worse. When you're busting your chops trying to fit your algorithm into
a scalable hadoop cluster, you consider any additional compute costs
very very carefully.
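To make the cost concrete: even the most trivial A/B readout scans every row's fields in the clear, so an encrypted store would force a bulk decrypt before a loop like this could run. The field names here are hypothetical:

```python
from collections import defaultdict

def ab_conversion(events):
    """Per-variant conversion rate; needs 'variant' and 'converted'
    readable in plaintext for every row scanned."""
    totals = defaultdict(lambda: [0, 0])  # variant -> [conversions, visitors]
    for e in events:
        totals[e["variant"]][0] += e["converted"]
        totals[e["variant"]][1] += 1
    return {v: c / n for v, (c, n) in totals.items()}

events = [
    {"variant": "A", "converted": 1},
    {"variant": "A", "converted": 0},
    {"variant": "B", "converted": 1},
    {"variant": "B", "converted": 1},
]
```

Multiply that full-table scan by every test a large vendor runs, and the decrypt-on-demand overhead Joe describes adds up quickly.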
That said, https by default is, imo, a no brainer.
-j
Joe Andrieu
+1 (805) 705-8651
On 6/5/2011 7:17 PM, Mary Hodder wrote:
Hi,
So i checked with a friend who used to write the Linux OS for google
servers and knows databases fairly well and he said that to encrypt
everything would mean the following:
* encryption would take no more space in the database.. because while
the data would no longer be in plain text.. encryption would likely
compress it to about the same size
* compression would not happen at the same time as encryption
* that extra compression/encryption step for each piece of data would
take additional CPU time and power.. times trillions of bits of data..
that would mean a big hit to the CPUs (many more servers would be
required)
* you could route all larger media (videos and photos) through
unencrypted channels to save on encryption and compression power
* encryption would mean packets would not be inspectable by your ISP
etc.. (an added benefit for privacy!!)
* because data would be encrypted you wouldn't be able to reuse it
(ie, if twitter encrypted everything.. you would not be able to take a
link and have all those who pointed to it.. share it in the database..
i don't know whether twitter does this.. but if they wanted to.. an
encrypted system wouldn't allow it though you could assign unique
items a unique number, match the item in an additional step on the way
into the database.. then grab the number, encrypt it, and make the
user's pointer to it unique and encrypted)
* you could not do searches across the database.. but you could build
an index that would show data.. but not totally compromise it.. and
that database could be put somewhere else (offline perhaps?) to keep
it safer from crackers
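One standard way to get an "index that shows data but doesn't totally compromise it" is a blind index: store a keyed hash of each searchable value alongside the ciphertext, and match exact lookups against the hash. A sketch, with the key and schema invented for illustration:

```python
import hashlib
import hmac

# The index key lives outside the database (e.g., in the app server's
# key store), so the stored tokens alone reveal nothing useful.
INDEX_KEY = b"server-side secret kept separate from the DB"

def blind_index(value):
    """Deterministic keyed hash: equal inputs yield equal tokens."""
    return hmac.new(INDEX_KEY, value.lower().encode(),
                    hashlib.sha256).hexdigest()

# The DB stores ciphertext plus the token; lookups match on the token.
index = {blind_index("alice@example.com"): "row-42"}

def lookup(email):
    return index.get(blind_index(email))
```

This only supports exact-match lookups, not full-text search, which is consistent with the friend's point that general search across an encrypted database is off the table.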
So it sounds to me like encrypting a system with HTTPS for a lot of
users and data would slow it down.. and require more CPU power.. but
it would work. And be far more secure than what we have now. Probably
not practical for search.. but if it was say, a site that sells
things.. a repeat user logs in.. and it's all an encrypted session,
and the entire user's profile would have to be unencrypted.. then
another sale could happen matching the older profile data.. then the
re-encryption would occur as the new data was stored with the old data.
It would be expensive in terms of CPU time.. but the user data would
be pretty secure in that scenario.
Other than that, can anyone think of why we wouldn't recommend HTTPS
encryption to services to protect user data?
mary
On Jun 5, 2011, at 3:49 PM, Mary Hodder wrote:
maybe..
i don't know if we've evaluated what it would mean to encrypt
everything..
let's take this use case:
a user has a personal data store.. they put as much as they want to
into it..
they add some apps.. maybe a VRM app like buyosphere for sharing
shopping data..
maybe they get an app for allowing news sites to know their
advertising preferences and serve back ads (not a VRM app.. because
it's about ads and marketing)
maybe they get an app for archiving their bank statements.. and there
is no sharing here.. except with their accountant who is an email
away..
maybe they get an app for aggregating their search history..
and maybe they get an app for book recommendations where they share
what books they've bought..
If all the sites the user touches.. their PDS.. their buyosphere
account.. their advertising preferences that go to news sites.. the
bank, all their browsers and histories, and the recommender for
books.. and the book purchase sites..
if they all encrypted.. what would that do to the system?
Does every site require an HTTPS login?
I think this is a good question to evaluate..
what does it do to app makers of any type that want to hook into a PDS?
mary
On Jun 5, 2011, at 3:16 PM, Michael O'Connor Clarke wrote:
So I've just read the original article linked and the entire resulting
thread here, and I have what may be a completely dumb question. The
point I kept expecting didn't show up, so I'm driven to ask it, even if
it simply serves to highlight my own naivete.
Shouldn't part of the answer here be, simply, encrypt everything?
It doesn't stop data "leaking" or prevent breaches, but it sure makes
things harder for your workaday cyber-baddy to do naughty things.
The locks - all locks - can be picked. So just make the stuff behind
the locks useless to all but the really determined bad guys.
Doesn't seem like this should be all too hard to implement.
Am I completely missing something obvious? Would appreciate a little
education here.
Thanks,
/m
Michael O'Connor Clarke
+1.416.893.4941
@michaelocc
From: Mary Hodder
Date: Sun, 5 Jun 2011 13:48:26 -0700
Subject: Re: [projectvrm] Can you really close Pandora's box
Mark,
The idea in the personal data ecosystem model is that users do control
their own data.
But where? Most likely their personal data store will be hosted..
and point to or store personal data.. so those hosting services should
likely have some level of security we expect from the hosting.
And.. there will be services out in the world we send our data to..
from those PDS.
Those services should also have security standards for keeping
personal data.
I don't think a personal data ecosystem model means we will all be
hosting our own boxes..
(kind of like today... most of us don't host our own email on our home
servers.. if we even have one..
those of us who have our own domains -- like hodder.org -- likely
still have someone else manage that.)
I'm curious.. of the VRM and PDE companies in the space.. have any of
them announced a level of security for their servers that is better..
kind of like adhering to a Trust Framework.. but for data security?
So that leaks aren't happening?
I don't know of any company or service that talks about it.
mary
On Jun 5, 2011, at 1:40 PM, Mark Lizar wrote:
if the Customer had authentic and official control of their data then
all the other data would be second class. The customer gets to manage
access.
Does the box need to be closed? Could we all one day have our own
box? :-)
- M
On 5 Jun 2011, at 17:05, Mary Hodder wrote:
Joe..
I think it scares people to talk about personal data security.. yes..
but i think it's healthy to talk about things that scare us..
So.. regarding locks. I think you and i are agreeing here.. in a weird
way.
I want to define the locks and what the "highest standard" is.. but
that doesn't mean the extreme standards.. like super cryptography.
It means that.. like the schlage locks example.. we ask for schlage
locks. Note that recently we changed all my house locks from Kwikset
($13 locks) to Schlage ($52 each) and we feel it's good.. safe.. and
frankly the new locks don't need to be completely tightened up every 6
months like the old crap ones did. The new locks do the trick.. they
have a longer thicker deadbolt, with more pins inside, better
structure and screws, and I think given our neighborhood, we can
consider them "highest standard" for the circumstances.
.. so maybe it's asking that when a site collects personal data.. they
partition it across multiple databases.. in ways that make it hard to
steal and put back together.. unless you know how to do it.. as
opposed to keeping the whole of user data in one DB. Or maybe we say..
attach bits of user data to other non-PII data in a data structure..
and make an obscure way to connect it back to the user.. unless you
know the way to do it. In other words don't store name, address and CC
all together. Or whatever.
Maybe the whole of user data at a service is only stored in a single
DB when it's not attached to the public internet.
Maybe the API data available is fully examined publicly.. or maybe
APIs with any PII access require special oversight..
Those are a couple of suggestions.. we can talk about asking for the
Schlage standard without telling IT people exactly how to do it.. they
can figure that out on their own and in relative secrecy which means
the details don't so easily get to the bad guys.
I agree we need laws against breaking and entering.. i think we have a
lot of those now.. but how do you enforce that in Uzbekistan?
We don't have international laws and enforcement at the same levels as
we do at the nation state level (nation states in my view are an
anachronism.. i think they are passe but we don't have a lot to take
their place.. the real power is in global markets.. for good and for
bad.. and it's too scary to talk about the fact that they are kind of
passe.)
I don't know that you get the whole world on board with a culture of
not breaking and entering.. we have uneasy peace (and wars) across the
world as it is. We can ask for it.. but many won't respect it at all.
What I'm asking for is to create a "highest" standard for services..
put it in writing.. and then show up and ask those guys: "hey.. are
you following the standard?" because we'd really like you to....
Give the IT guys something practical to implement instead of just
lamenting the fact that our data is leaking all over.
So I'm asking for that.. what does that look like?
mary
ps.. did you see the thing last week that 30M Google users' data
leaked out of Google? I don't think any service is immune here..
On Jun 4, 2011, at 11:03 PM, Joe Andrieu wrote:
Absolutely, I think we can, but it's hard. And it scares people.
Which makes both regular folks and experts avoid it. Same reason
locksmiths don't talk about locks. Most locks are crap and subject to
trivial attacks.
Most people don't want to hear about that and
most experts don't want the techniques leaked to a wider criminal
audience. Plus, there's the unfortunate tendency to enjoy being one of
the few wizards who understand the secret magic. But in the end, most
people are fine with the $30 Schlage lock, even though it's pretty
much useless for anyone with even moderate training or industry. For
most people, it provides the security they care about and, in fact it
keeps out enough potential criminals that people are mostly happy.
Which is to say that what I'm talking about is figuring out the digital
equivalent of (1) simple locks, (2) laws and rules against breaking and
entering and constitutional protection against unreasonable search and
seizure, and (3) a cultural shift that locks are to be expected and
respected. I think just getting /that/ in place will do more for our
society than the (also important) more detail-oriented work of outright
security.
I think of it this way. For most of my information,
I don't need the equivalent of Fort Knox. Locks on my doors are just
fine. Today, we not only don't have digital locks on the doors, but
it's common practice to grab the pies cooling in my window sill. And
too much of the data security conversation ends up sounding like Fort
Knox!
To track this back to the FTC paper, it doesn't even
address what minimal business practices should be followed, that is,
that there should be locks on the doors. The main reason I push back
against too much fixation on data security is because
(1) I
think doing that 100% is literally impossible (see Wikileaks) and
ultimately is distracting. The data is out there. It will continue to
be out there. It will continue to be created and put out there by
people you know, simply because they tweet or blog or check in and
mention you. I don't believe we can contain the data. I do believe we
can penalize inappropriate use of that data. To point again at the
Do-Not-Call registry, it solved a significant annoyance not through data
security--the fact that my phone number is available was never seen as
the problem--but by penalizing inappropriate use of that data.
And (2)
because I believe the world will be a better place with more intimate,
more trusting, more valuable relationships, especially compared to the
minor cost of the risk of criminal use of my data. To me, security is
almost entirely about independence, not engagement. In fact, the
approaches I know preclude more engagement by their very nature. But, I
want Google to know what I'm looking for. I want facebook to know the
statuses I want to share with my friends. I want FourSquare to know
where I am. I want WordPress to know what I write. Information sharing
is the essence of digital relationships... and the bane of data
security. And, as you know, I've spent a lot of time working through
these issues from an information sharing perspective; that's my lens,
rose colored or otherwise.
So, yes, I think we should be able
to have conversations about data security--even as I explain why that's
not my focus. From our previous conversations, I think you and I are
aligned on most of these issues. I just think the biggest bang for our
buck is figuring out how individuals can contribute (data) to our
digital experience without fear of exploitation. Right now, the vast
majority of exploitation is legal and accepted business practice. THAT
I think we can change much more rapidly than we can control data
through rigorous security.
-j
Joe Andrieu
+1 (805) 705-8651
On 6/4/2011 4:42 PM, Mary Hodder wrote:
Joe..
i agree we should collect less data and have more honest businesses.
We don't have as many problems talking about that stuff.. and we can
keep doing it and that part will get dramatically better soon, i think.
but some data will be collected..
and i know criminals will do their thing.. but more.. or less is the
question...
I'd like less and i'd like to know when we get real about having a way
to measure security around data?
Most institutions hide/run from that kind of discussion and i don't
think we solve this until we talk about it.
We have ways to talk about problems with airplanes and safety..
food safety
even clean air and water..
we have measures and standards for serious things like that..
why can't we have similar talks about personal data security?
On Jun 4, 2011, at 3:56 PM, Joe Andrieu wrote:
I think our biggest problem isn't with those who will break the law
and steal identifiers. That's a security issue and one that deserves
appropriate secrecy on behalf of those trying to solve it...
What is most broken is that it is *common business practice* to capture
and exploit information about and from individuals, without permission.
If there were appropriate boundaries for what is and isn't acceptable,
companies like Groupon--and those who aspire to IPOs or acquisitions
valued in the billions--would be forced to play by the rules. Public
markets won't tolerate wholesale illegal behavior. Not indefinitely.
This is the essence of privacy enforcement. Good people and companies
respect privacy. Bad ones don't. Or as the aphorism puts it: "Locks
don't keep criminals from stealing. They keep honest people honest."
What we are trying to figure out is how to tell the difference in a new
environment where the boundaries are unclear.
Although many researchers and authors argue that privacy defies
definition because it is so complex, I disagree. Privacy is context
management. Information released or created in one context is expected
to be dealt with under that context. When it leaks in ways that are
inconsistent with the expectations of the originating context, privacy
is violated. What we are dealing with is both new online contexts and
context collapse due to online interactions. That's the problem: new
contextual realities we don't have a social framework for, whether
enforced by law, regulation, or etiquette.
To restate my initial premise: criminals will always find ways to violate
context. We can legislate consequences and we can build technical
barriers, but all laws can be broken and all techno-solutions can be
hacked. What we /can/ do is figure out how the mainstream of
well-intentioned companies and individuals can handle context
management in a mutually satisfactory way. Once we figure that out, we
can deal with the technical and legal barriers to violations.
-j
Joe Andrieu
+1 (805) 705-8651
On 6/4/2011 3:15 PM, Mary Hodder wrote:
I think there is an interesting comparison here to the banking
industry.
Obviously they have big security concerns and address them with things
like using .NET and double logins to check your account and making
everyone come into the bank to open an account or get signing rights.
The FCRA and congress tell financial institutions they *must* give the
highest security to our data.. and yet they don't. Instead, they give
some security.. but have held back on making credit cards with chips
(like in the rest of the world) because it was cheaper to pay out for
fraud on the mag stripe data on the backs of CCs than it is to get the
chips.
And it's cheaper to not have restaurants get wireless swipers than
wired ones so the servers walk away with your card (statistically the
place you are most likely to get IDENTIFIER theft around a commercial
transaction).
And they don't protect your data all that well. Just enough to not get
called on by regulators.. but not so much that they can't offer you
$40 a month to protect you from IDENTITY theft
(i love how they use "identity" which scares people into paying the
$40.. great marketing.. if they said "identifier theft" i don't think
they would sell a lot of that..)
If we mandated (let's say.. with the Kerry McCain Rights and
Responsibilities legislation.. which currently leaves out a "highest"
standard on data security) that data collectors of any sort maintain a
highest level of security.. would we have a standard to give sites..
would we be able to hold the sites to it?
How do we know when a site collecting data is being negligent?
The bar is always moving due to the script kiddies, Anonymous, and
credit card / spammers from obscure parts of the world, not to mention
your average cracker.
I think if we want a standard.. we have to make a standard.. and
codify it.
It doesn't have to be codified into law.. but the problem is the
cryptographers and RSA types don't want to tell people out loud and in
public how to be secure because the baddies will get the info.
Or at least the ones I know at Stanford and Berkeley... and they work
for the US Govt and have lead lined offices.. seriously.
So how do you make a secure standard for our data when the security
people don't want to talk publicly about it.. Bruce Schneier
notwithstanding?
On Jun 4, 2011, at 2:39 PM, j. clark wrote:
Thanks Dan.
Of note from the end of the article:
"A key failure of the FTC report is that it largely ignores the
responsibility of websites in safeguarding the privacy of their
users," says Wills.
"These sites should play a custodial role in protecting their users
and preventing the leakage of their sensitive or identifiable
information. Third-party sites have a powerful economic incentive to
continue to collect and aggregate user information, so relying on them
to protect user privacy will continue to be a losing battle."
Ah, there's a toxic leak in our
ecosystem. I'm shocked! Sony's many sites, practices and breaches are
but one example now dangling in the media's hooks. Alas, our attention
span is short, our needs continuous, and the practice is SO
widespread... what's a person to do? Is this even a valid crisis?
Where's the righteous indignation?
j.