[p2pu-dev] DB Dump for Datamining p2pu

Philipp Schmidt phi.schmidt at gmail.com
Fri Jan 28 23:46:51 UTC 2011


On 28 January 2011 22:53, Niels Sprong <nielssprong at gmail.com> wrote:
> I understand the need for a sort of code of conduct to deal with data.
>
> I don't really understand the legal issues. ('sensitive' user data?)

the legal issues are relatively minor compared to the danger of our
community *perceiving* our sharing of any data as a violation of their
privacy. the legal issues mostly concern us not sharing user data
with third parties for commercial reasons. and p2pu users already
share much information on our site that would be considered
personal, so the database would just offer a different lens on that
information.

>
> we're all cool, right?

obviously!

P

> On 28 January 2011 22:38, Philipp Schmidt <philipp at p2pu.org> wrote:
>>
>> i met with our lawyers last week and privacy was one subject.
>> long-term, P2PU (the project/organization) will need agreements with
>> all individuals who have access to our technology, and data. we will
>> ask people to enter into these retroactively.
>>
>> short term:
>>
>> as long as we are sure we are not exposing any sensitive user data, it
>> would be great to get people to do interesting things with our data
>> (and hopefully help us improve what we do).
>>
>> the dev instances run on cleaned databases. george has written a
>> script that cleans them automatically. it would be easy to share
>> those. if we go forward with this, i would ask george to do one more
>> check that they really don't contain any personal data before we hand
>> them over.
>>
>> P
>>
>> On 28 January 2011 16:51, Stian Håklev <shaklev at gmail.com> wrote:
>> > Charles would have to do this; I don't know how. I also suspect that a
>> > MySQL dump would be less than useful, given the way data is structured
>> > in Drupal, although I don't know this for a fact. If we wanted to do an
>> > export through Drupal, it would take some time to define how to export it.
>> > Stian
>> >
>> > On Fri, Jan 28, 2011 at 11:38 AM, Jessy Kate <jessy.cowansharp at gmail.com>
>> > wrote:
>> >>
>> >> hey all/charles and stian,
>> >> would you be comfortable giving vid and emmanuel an anonymized database
>> >> dump to do data analysis with? they have some neat ideas and it would
>> >> be nice to let them run with it.
>> >> if so, perhaps you could post a zip of it on the server for them to
>> >> download?
>> >> jessy
>> >>
>> >> ---------- Forwarded message ----------
>> >> From: स्वक्ष <svaksha at gmail.com>
>> >> Date: Thu, Jan 20, 2011 at 11:55 PM
>> >> Subject: [p2pu-research] Datamining p2pu
>> >> To: P2PU-research <p2pu-researches at googlegroups.com>
>> >>
>> >>
>> >> Hello Folks,
>> >>
>> >> This discussion started out on p2pu-dev[0] but Stian requested it be
>> >> moved out of *-dev.
>> >>
>> >> [0] http://groups.google.com/group/p2pu-dev/browse_thread/thread/f8ca8965961d13da#
>> >>
>> >> On Thu, Jan 20, 2011 at 05:09, Stian Håklev <shaklev at gmail.com> wrote:
>> >> > I think we should probably move this to p2pu-researches for further
>> >>
>> >> Done, as per request.
>> >>
>> >>
>> >> > discussion to avoid clogging p2pu-dev. I think the easiest initially
>> >> > might be data dumps, but we might provide API access too; we'll see.
>> >>
>> >> It's easier to work with data dumps, so I'd prefer to choose that
>> >> option, if available :)  I have installed MDP[1] on my laptop and am
>> >> reading up on it, but if you have alternative suggestions, please say so.
>> >> Also, would it be possible to create a testing machine with ssh access
>> >> (please advise to whom I should send my public ssh key) so I can start
>> >> playing with it: install MDP and test it with a db dump from p2pu.
>> >>
>> >> [1] http://mdp-toolkit.sourceforge.net/ {I chose MDP because it's
>> >> Python-based (as is the lernanta/Django website) and thought it would
>> >> be less of a system administration nightmare to maintain packages
>> >> from the same language.}
>> >>
>> >>
>> >> > Part of the problem has been figuring out the privacy questions...
>> >> > What data do we have a right to expose publicly, do we need to
>> >> > anonymize, etc. It's different at P2PU than in a university class,
>> >> > since all learning happens publicly, and there is no expectation of
>> >> > privacy, but there might still be concerns we need to think through.
>> >>
>> >> Sure, privacy is important. To avoid identifying individuals, I'd
>> >> suggest mining for "patterns" instead of individuals. That means,
>> >> for example, we'd look for patterns across languages ("do we have
>> >> more Python and Ruby courses compared to R-lang?") instead of "How
>> >> many Python courses has $person studied?". To start out, does that
>> >> sound like a reasonable approach? Suggestions are welcome.
>> >>
>> >> Generally speaking, I'm technically inclined, but it's also important
>> >> to know how the data is going to be used and by whom. Who are the
>> >> stakeholders, and what do they hope to gain from it? I'd assume that
>> >> is the reason Stian requested the discussion be moved here.
>> >> Of course, as the system is being built we can always discuss and
>> >> decide what data should be released publicly and what is for internal
>> >> p2pu consumption. Maybe these discussions can happen iteratively in
>> >> several distinct threads :)
>> >>
>> >> I'm sure I've missed many things so please feel free to keep the
>> >> discussion going...and thanks for reading,
>> >>
>> >> PS: Although I've joined the group, I was unable to post to this list
>> >> earlier (my mail was returned), so could the admin please check
>> >> whether my membership was approved? I'm using the googlegroups web
>> >> interface to mail this message.
>> >>
>> >> --
>> >> Regards,
>> >> vid ॥ http://svaksha.com
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Jessy Cowan-Sharp
>> >> http://jessykate.com
>> >>
>> >
>> >
>> >
>> > --
>> > http://reganmian.net/blog -- Random Stuff that Matters
>> >
>> >
>> > _______________________________________________
>> > p2pu-dev mailing list
>> > p2pu-dev at lists.p2pu.org
>> > http://lists.p2pu.org/mailman/listinfo/p2pu-dev
>> >
>> >
>
>

