ListenLog Meeting Notes

From Project VRM


Below are meeting notes and ongoing issues and action item lists. For a detailed description of ListenLog, visit the main ListenLog page.

1/12

Parking Lot

  • Where is data stored (by default)? Does it sync to a local store or just stream out?
  • Is there any concern from collaboration stations and partners that there's no exclusive access to data / analytics?
  • What data do we capture? Application behavior data in addition to basic listen data? What about rating data? Location data?
    • Should the standard be minimal for extensibility vs. maximal for enhanced value/functionality?
    • Privacy concerns (EFF Chair Brad Templeton)
  • Security / encryption on the stored data? Which bits?
  • How do we best communicate and promote the ListenLog concept, and what are its key benefits and differentiators? How do we address naysayers and differentiate from alternative approaches, e.g. APML?
  • How do we do identity? How do we make it swappable? Do we use i-cards, OpenID, OAuth?
  • How do we start the service without identity? (e.g. access to device ID?)
  • Do users have control over what's stored?
  • What's the absolute minimum of device-side functionality?
    • Opt-out(?)
    • Change repository
    • Assign identity
  • Where does legal and TOS come in? (see rights and contracts below)
  • Does anyone enforce standards compliance?
  • Who does the work / coding?
    • PRX + who?
  • Do we think about revenue / sustainability? PRX has two roles here - one to build codebase and standards for storage, the other to think about services and how we'd use the data.
  • Do we do opt out for data capture? (probably yes)
  • Do we provide "public by default," e.g. ubiquitous, anonymous access to the data out of the box? (probably no)
  • Can we open source the iPhone bit? Publicly available libraries?
  • What's in the first release?
    • Should we provide ability for users to release data? How and to whom? What capacity for sharing? What terms? Anon vs. nonanon?
  • Does there need to be database legal protection underlying data rights access to drive user terms?

1/13

  • How do we make the data inherently more anonymous?
    • Match account data between logs
    • Make timestamp and LAT-LONG fuzzy?
  • What data rights can a user authorize for third parties?
    • propagation rights (the grantee can't extend to someone else)
    • public rights vs. directed/granted rights (e.g. for anyone to use vs. for specific entity to use)
      • anon/non-anonymous
      • Most rights issues/rats nests are associated with granted rights
    • How long can you use the data? Can you keep the data?
    • Rights to cease use, remove data(?) + confirmation(?)
    • Can't use this to try and find/identify someone - reverse-engineering rights
    • commercial/non-commercial?
    • Contact me (e.g. DNC)
    • Compare to IRB
  • Contract rights
    • Investigate proactively - what is it that Pandora might want to do? What is reasonable?
      • give me audio recommendations
      • use for product development
    • Don't cross-correlate/aggregate (e.g. social network correlation, Ben Laurie) - piercing identity/privacy data
    • Endorsement / assignment to my identity
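
One way to picture the "make timestamp and LAT-LONG fuzzy" idea raised above is to coarsen both values before they are logged. This is only a sketch: the function names, the one-hour bucket, and the one-decimal rounding are assumptions for illustration, not decisions recorded in these notes, and coarsening alone does not guarantee anonymity.

```python
from datetime import datetime, timezone
from typing import Tuple


def fuzz_timestamp(ts: datetime, bucket_minutes: int = 60) -> datetime:
    """Round a timezone-aware timestamp down to the start of its bucket."""
    bucket_seconds = bucket_minutes * 60
    epoch = int(ts.timestamp())
    # Drop the remainder so every event in the bucket shares one timestamp.
    return datetime.fromtimestamp(epoch - epoch % bucket_seconds, tz=timezone.utc)


def fuzz_lat_long(lat: float, lon: float, decimals: int = 1) -> Tuple[float, float]:
    """Round coordinates; one decimal place of latitude is roughly 11 km."""
    return (round(lat, decimals), round(lon, decimals))


if __name__ == "__main__":
    listened_at = datetime(2009, 1, 13, 15, 47, 12, tzinfo=timezone.utc)
    print(fuzz_timestamp(listened_at).isoformat())
    print(fuzz_lat_long(42.3736, -71.1097))
```

Larger buckets and fewer decimals trade analytic value for privacy, which is the same minimal vs. maximal tension noted in the parking lot.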

Core Requirements

  • what data is going to be captured
  • where is it stored and in what format
  • how does one identify oneself / assign identity
  • what's the minimum functionality that needs to live on the device
  • what's the minimum functionality that needs to live remotely
  • additional / core functionality to prove value necessary?
  • Determine protections for communication and storage between client app and repository (authenticated, encrypted, etc.)

Action Items

  • Draft functional requirements
  • Doc to talk to Berkman legal re: user rights / terms

1/20

  • There is a core set of things to figure out to proceed with this project. We'll focus on those:
    • What data will we capture?
    • What format will we send and store the data in?
    • How is this data being transmitted and stored?
    • How do we maintain integrity, privacy, and provide the required minimum of user control (e.g. "delete my data")?
    • How do we assign or associate identity?
  • In discussing data capture, Keith argued for being minimal and conservative and providing a mechanism for extending
  • Agreement that our proposed data, format, and services will deal with listening attention data only; both for on-demand (file) audio as well as streaming audio
  • Open question about whether SoundExchange / RIAA requires specific formats; if so, it might be nice to comply
  • How will the XRI be resolved / discovered?
  • XDI can be used to transmit and/or store; X3 looks promising
  • XRI = identifier - resolvable to an XDI endpoint
  • XDI dictionary - like an XML schema
  • XRI authority resolution server (open XRI)
  • community iname registry
  • inumber (this is how you handle reassignment)
  • At what level will we register?
  • How will we maintain this registry?
  • How will other applications that write to LL handle XRI? Will they resolve to their own server?
  • What about dupes?
    • Are XRI synonyms the answer?
  • Use iname registration for digital identity?
    • might not need an i-card selector on the iPhone
    • Simpler way?
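
To make the "minimal and conservative, with a mechanism for extending" position concrete, here is a rough sketch of what a listen record could look like. Everything in it is hypothetical: the field names, the `extensions` dictionary, and the use of JSON are assumptions for illustration only. The notes above lean toward XDI for transport and storage, and no format was agreed at this meeting.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class ListenEvent:
    """Hypothetical minimal listen record covering file and streaming audio."""
    media_uri: str         # what was listened to
    source_type: str       # "file" (on-demand) or "stream"
    started_at: str        # ISO 8601 timestamp, UTC
    duration_seconds: int  # how long the listen lasted
    # Optional, application-specific data lives here so the core stays minimal.
    extensions: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.source_type not in ("file", "stream"):
            raise ValueError("source_type must be 'file' or 'stream'")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    event = ListenEvent(
        media_uri="http://example.org/show.mp3",
        source_type="file",
        started_at="2009-01-20T15:00:00Z",
        duration_seconds=1800,
        extensions={"rating": 4},  # an extension, not part of the core record
    )
    print(event.to_json())
```

Keeping ratings, location, and application behavior out of the core fields means a consumer that only understands the minimal record can ignore `extensions` entirely.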

Action Items

  • Diagram high-level architecture
  • Prep for discussing identity options next week
  • Look into SoundExchange reporting formats and PBCore formats

2/26

Action Items

  • Make some design decisions to get us started with POC development
  • Create a boxes and arrows diagram to help represent the data sets and functional entities and where they live
  • Have a convo with Berkman re: servers and hosting the user data
  • Identity + claimant - how best to handle user identity (focus on near-term)?
  • XDI dictionary - designing what info is sent to log, registrants, etc.
  • Revisit user functionality on wiki - what do we really need to do? What role should XRI/XDI (and existing OpenXRI code) play for us here?

3/3

Notes on Identity Requirements

When does identity happen? There are at least three contexts where we handle identity:

  1. Capture data implicitly
    1. No registration
    2. Linked to phone
  2. Access to data provisioned through phone
  3. Sending listen activity to other data store providers instead of the default store

Perhaps there are others, but these three seem clear so far.

By way of thinking about identity, it is worth noting that there are four distinct functional aspects of identity, something I call the Identity Quartet (blog post imminent!):

  1. For authentication, logging into services (username:jandrieu)
  2. For presentation, e.g., as handle on MySpace or Facebook or World of Warcraft (name: Thor the Destroyer)
  3. Internally for database level handling of the attributes & privileges associated with a user (users.primaryKey=1023304)
  4. As a service endpoint, e.g., joe@andrieu.net

Flexible identity systems separate these four elements. Lazy ones combine them, such as using my email address as my username, or displaying my email address when I comment on a bulletin board system. A good system has distinct identifiers for each of these roles. In fact, sophisticated ones could/should allow multiple different identifiers of the same functional class for the same user, such as allowing an individual to have multiple characters on WoW. Iain Henderson calls this aspect of identity "personas".
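
As a small illustration of keeping the four roles separate, the sketch below gives each role its own field and lets a user hold several presentation identifiers. The class and field names are hypothetical; the example values are the ones used in the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Identity:
    """Hypothetical record that keeps the four identity roles distinct."""
    login_name: str    # 1. authentication
    primary_key: int   # 3. internal database handle
    personas: List[str] = field(default_factory=list)   # 2. presentation
    endpoints: List[str] = field(default_factory=list)  # 4. service endpoints

    def add_persona(self, name: str) -> None:
        # A new display name touches none of the other three roles.
        if name not in self.personas:
            self.personas.append(name)


if __name__ == "__main__":
    user = Identity(
        login_name="jandrieu",
        primary_key=1023304,
        personas=["Thor the Destroyer"],
        endpoints=["joe@andrieu.net"],
    )
    user.add_persona("Another Character")
    print(user)
```

Because no field doubles as another, changing the email endpoint or adding a persona does not force a change to the login name or the internal key.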

So, between the first set of contexts and the four functional types, we should be able to map out what we need for ListenLog.

-j

Identity Workflow

How identity in the LL app(s) might work. This is a proposed workflow based on my limited understanding of how identity systems work:

Khopper 15:19, 10 March 2009 (UTC)

  1. ID the data: iPhone app by default will create a unique ID for each install (likely created by combining application ID + device ID) in order to appropriately key the log data to a unique instance (individual)
  2. Claim the data: In order to get access to this data beyond the phone, the individual must associate the existing unique ID in #1 with a more friendly and portable identity (i.e. user ID). This could be through a unique external or internal identification process. External might be something like OpenID, internal might be created and assigned through an integrated user registration process. This would have to happen on the phone.
  3. To complete the association, the user must authenticate (internally or externally)
  4. The LL datastore must associate the UniqueID in #1 with the userID in #2. Ideally, this will be obfuscated in some way to protect the identity of the user (is this possible?).
  5. To retrieve the data remotely (e.g. on a website that lets you browse your LL data), you must provide identity credentials and authenticate. This will locate the data and validate your access to it.
  6. To write to the data remotely through another application or device (e.g. Pandora), you must follow steps 1-4 above. This should be standardized as part of the LL specification. It is conceivable that there are use cases where data needs to be merged or split by application ID, by user ID, or by Unique ID.
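
Steps 1 through 5 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the ListenLog design: the names `make_install_id` and `ClaimStore` are hypothetical, the store is an in-memory dictionary, authentication is reduced to a boolean flag, and a keyed hash (HMAC) is just one possible answer to the "is this possible?" question in step 4.

```python
import hashlib
import hmac
from typing import Dict


def make_install_id(app_id: str, device_id: str) -> str:
    """Step 1: derive a per-install ID from application ID + device ID."""
    return hashlib.sha256(f"{app_id}:{device_id}".encode("utf-8")).hexdigest()


class ClaimStore:
    """Steps 2-5: associate an install ID with an obfuscated user ID."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret  # assumed to be held only by the datastore
        self._owner_by_install: Dict[str, str] = {}

    def _obfuscate(self, user_id: str) -> str:
        # Store a keyed hash so the raw user ID never sits next to the log key.
        return hmac.new(self._secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

    def claim(self, install_id: str, user_id: str, authenticated: bool) -> None:
        # Step 3: the association only completes after authentication.
        if not authenticated:
            raise PermissionError("user must authenticate before claiming data")
        self._owner_by_install[install_id] = self._obfuscate(user_id)

    def can_access(self, install_id: str, user_id: str, authenticated: bool) -> bool:
        # Step 5: locate the data and validate access to it.
        if not authenticated:
            return False
        stored = self._owner_by_install.get(install_id)
        return stored is not None and hmac.compare_digest(stored, self._obfuscate(user_id))


if __name__ == "__main__":
    install_id = make_install_id("org.example.listenlog", "device-123")
    store = ClaimStore(secret=b"demo-secret")
    store.claim(install_id, "user@example.org", authenticated=True)
    print(store.can_access(install_id, "user@example.org", authenticated=True))
```

A second application writing to the same log (step 6) would generate its own install ID and run the same claim, which is where the merge/split questions by application ID, user ID, or unique ID come in.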

10/31

Current Project Status

10/31/2009

  • Due to funding status and application performance issues, the Public Radio Player (PRP) might not be the ideal (or at least the initial) platform for LL deployment
  • Investigations into The Mine! Project suggest that this platform may work for prototyping media logging
  • An initial proof of concept on The Mine! has been identified: Using the Last.fm API to log listening data in a proposed LL format into The Mine!
    • Note: The Mine!, by design, is broad in its applicability (i.e. supports any personal data), but fairly shallow in its capability (e.g. embedding in devices, strong encryption, native support for specific logging formats, etc.). LL needs much of this deep yet narrow functionality to be deployable in the market
  • Several months of discussions have led to the conclusion that getting market acceptance of an opt-in logging functionality will require the following:
  1. Data stored fully encrypted
  2. Data stored fully distributed (i.e. no vendor silos, regardless of portability and substitutability)
  3. Early functionality focused on attracting an early adopter developer community, e.g. data access APIs on day one