[p2pu-dev] DB Dump for Datamining p2pu
Niels Sprong
nielssprong at gmail.com
Sat Jan 29 10:24:39 UTC 2011
Good to know we're cool :-) All these answers clear it up somewhat, thanks!
I was just thinking a bit "argh why do we even have a law, we're all cool
people right?" yesterday.
On an ideological level in my view if you're sharing data with P2PU, you're
sharing data with a community. So technically, P2PU is not sharing data with
any third party when it makes data available to researchers, all people
using that data are part of p2pu. Why would you be comfortable sharing data
with 'p2pu the people who can now access the data' and not with 'p2pu'?
(well, indeed, because it's annoying when people use it for commercial
purposes, but any cool person (a.k.a. a p2puer) knows that).
But I'm probably talking beside the point; I liked the 'long term' and
'short term' implications of your talk with 'the lawyers'. :-)
Also, privacy issues are part of a research code of conduct. I will spend a
part of today drafting up a research landing page (hey, it's still
January!), and have a go at a 'research code of conduct' (we started in BCN,
need to find those notes). I will share this stuff with the research list
somewhere during the day if you're interested.
N
On 28 January 2011 23:06, Emmanuel Baako <ebaako at gmail.com> wrote:
> Data mining has always raised ethical concerns. Not everyone likes to have
> their personal data included in studies. The purpose of collecting any form
> of user data should be clearly stated. Businesses use it for transactions
> and have to state and ensure its use for that purpose only.
> Without specifically stating anywhere that "data collected from its members
> will be included in ....... ", we would be in violation of the [purpose and
> use limitation principle] of the FAIR INFORMATION PRACTICE PRINCIPLES.
> Care and caution are always advised... but we're all definitely cool, right?
>
> EB
>
>
> On Fri, Jan 28, 2011 at 5:53 PM, Niels Sprong <nielssprong at gmail.com> wrote:
>
>> I understand the need for a sort of code of conduct to deal with data.
>>
>> I don't really understand the legal issues. ('sensitive' user data?)
>>
>> we're all cool, right?
>>
>>
>> On 28 January 2011 22:38, Philipp Schmidt <philipp at p2pu.org> wrote:
>>
>>> i met with our lawyers last week and privacy was one subject.
>>> long-term, P2PU (the project/organization) will need agreements with
>>> all individuals who have access to our technology, and data. we will
>>> ask people to enter these retroactively.
>>>
>>> short term:
>>>
>>> as long as we are sure we are not exposing any sensitive user data, it
>>> would be great to get people to do interesting things with our data
>>> (and hopefully help us improve what we do).
>>>
>>> the dev instances run on cleaned databases. george has written a
>>> script that cleans them automatically. it would be easy to share
>>> those. if we go forward with this - i would ask george to do one more
>>> check that they really don't contain any personal data before we hand
>>> them over.
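
A final check like the one described above could be sketched roughly as follows. This is only a minimal illustration, not George's actual cleaning script: the function name `find_email_like`, the sample rows, and the loose email pattern are all assumptions made for the example, and a real check would also need to cover names, IP addresses, and other identifying fields.

```python
import re

# Deliberately loose pattern for email-like strings (an assumption for this
# sketch); it will produce false positives and is not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_like(lines):
    """Return (line_number, match) pairs for every email-like string found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in EMAIL_RE.findall(line):
            hits.append((number, match))
    return hits


# Hypothetical rows from a "cleaned" dump; the first one still leaks an address.
sample_dump = [
    "INSERT INTO users VALUES (1, 'user1', 'user1@example.invalid');",
    "INSERT INTO users VALUES (2, 'user2', 'REDACTED');",
]

leaks = find_email_like(sample_dump)
for line_number, value in leaks:
    print(f"line {line_number}: possible personal data: {value}")
```

An empty result from a scan like this would not prove the dump is clean; it only rules out one obvious kind of leftover.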
>>>
>>> P
>>>
>>> On 28 January 2011 16:51, Stian Håklev <shaklev at gmail.com> wrote:
>>> > Charles would have to do this, I don't know how. I also suspect that a mysql
>>> > dump would be less than useful, given the way data is structured in drupal,
>>> > although I don't know this for a fact. If we wanted to do an export through
>>> > Drupal, it would need some time to define how to export it.
>>> > Stian
>>> >
>>> > On Fri, Jan 28, 2011 at 11:38 AM, Jessy Kate <jessy.cowansharp at gmail.com> wrote:
>>> >>
>>> >> hey all/charles and stian,
>>> >> would you be comfortable giving vid and emmanuel an anonymized database
>>> >> dump to do data analysis with? they have some neat ideas and it would be
>>> >> nice to let them run with it.
>>> >> if so, perhaps you could post a zip of it on the server for them to
>>> >> download?
>>> >> jessy
>>> >>
>>> >> ---------- Forwarded message ----------
>>> >> From: स्वक्ष <svaksha at gmail.com>
>>> >> Date: Thu, Jan 20, 2011 at 11:55 PM
>>> >> Subject: [p2pu-research] Datamining p2pu
>>> >> To: P2PU-research <p2pu-researches at googlegroups.com>
>>> >>
>>> >>
>>> >> Hello Folks,
>>> >>
>>> >> This discussion started out on p2pu-dev[0] but Stian requested it be
>>> >> moved out of *-dev.
>>> >>
>>> >> [0] http://groups.google.com/group/p2pu-dev/browse_thread/thread/f8ca8965961d13da#
>>> >>
>>> >> On Thu, Jan 20, 2011 at 05:09, Stian Håklev <shaklev at gmail.com> wrote:
>>> >> > I think we should probably move this to p2pu-researches for further
>>> >>
>>> >> Done, as per request.
>>> >>
>>> >>
>>> >> > discussion to avoid clogging p2pu-dev. I think the easiest initially might
>>> >> > be data-dumps, but we might provide API access too - we'll see.
>>> >>
>>> >> It's easier to work with data-dumps so I'd prefer to choose that
>>> >> option, if available :) I have installed MDP[1] on my laptop and am
>>> >> reading up on it but if you have alternate suggestions, please say so.
>>> >> Also, would it be possible to create a testing machine with ssh access
>>> >> (do advise to whom I should send my public ssh key) so I can start
>>> >> playing with it: install MDP and test it with a db dump from p2pu.
>>> >>
>>> >> [1] http://mdp-toolkit.sourceforge.net/ {I chose MDP because it's
>>> >> python based (as is the lernanta/django website) and thought it would be
>>> >> less of a system administration nightmare to maintain packages from
>>> >> the same language.}
>>> >>
>>> >>
>>> >> > Part of the problem has been figuring out the privacy questions... What data
>>> >> > do we have a right to expose publicly, do we need to anonymize etc. It's
>>> >> > different at P2PU than in a university class, since all learning happens
>>> >> > publicly, and there is no expectation of privacy, but there might still be
>>> >> > concerns we need to think through.
>>> >>
>>> >> Sure, privacy is important. To avoid individual identification, I'd
>>> >> suggest mining for "patterns" instead of the individuals. That means,
>>> >> for example, we'd look for patterns between languages ("do we have
>>> >> more python and ruby courses as compared to R-lang?") instead of "How
>>> >> many python courses did $person study?". To start out, does that sound
>>> >> like a reasonable alternative? Suggestions are welcome.
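
The difference between mining for patterns and mining for individuals can be illustrated with a small sketch. The records, the function names, and the `minimum` threshold below are hypothetical and do not reflect the real P2PU schema; dropping very small groups is an extra precaution added here for illustration, not something proposed in the thread.

```python
from collections import Counter

# Hypothetical enrollment records: (user_id, course_language).
enrollments = [
    (101, "python"),
    (102, "python"),
    (103, "ruby"),
    (101, "ruby"),
    (104, "r"),
]


def enrollments_per_language(records):
    """Count enrollments per language, discarding the user identifiers."""
    return Counter(language for _user_id, language in records)


def suppress_small_groups(counts, minimum=2):
    """Drop groups below `minimum`, since tiny groups can point to one person."""
    return {key: value for key, value in counts.items() if value >= minimum}


counts = enrollments_per_language(enrollments)
published = suppress_small_groups(counts)
print(published)
```

The aggregate answers "which languages are most popular?" without ever exposing which courses a given person took.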
>>> >>
>>> >> Generally speaking, I'm technically inclined but it's also important to
>>> >> know how the data is going to be used and by whom: who the stakeholders
>>> >> are, what they hope to gain from it, etc. I'd assume that is
>>> >> the reason Stian requested the discussion be moved here.
>>> >> Of course, as the system is being built we can always discuss and
>>> >> decide what data should be released publicly and what is for internal
>>> >> p2pu consumption. Maybe these discussions can be iterative in distinct
>>> >> multiple threads :)
>>> >>
>>> >> I'm sure I've missed many things so please feel free to keep the
>>> >> discussion going...and thanks for reading,
>>> >>
>>> >> PS: Although I've joined the group, I was unable to post to this list
>>> >> earlier (my mail was returned) so can the admin please check if my
>>> >> membership was approved earlier? I'm using the googlegroups web
>>> >> interface to mail this message.
>>> >>
>>> >> --
>>> >> Regards,
>>> >> vid ॥ http://svaksha.com
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Jessy Cowan-Sharp
>>> >> http://jessykate.com
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > http://reganmian.net/blog -- Random Stuff that Matters
>>> >
>>> >
>>> > _______________________________________________
>>> > p2pu-dev mailing list
>>> > p2pu-dev at lists.p2pu.org
>>> > http://lists.p2pu.org/mailman/listinfo/p2pu-dev
>>> >
>>> >
>>>
>>
>>
>>
>>
>
>
> --
> http://about.me/ebaako
>
>
>