[p2pu-dev] DB Dump for Datamining p2pu

Philipp Schmidt philipp at p2pu.org
Sat Jan 29 18:39:54 UTC 2011


Hey Emmanuel:

Nice to meet you (virtually) and thanks for offering to do some
interesting analysis of our data. Could you share some of your ideas /
questions / goals for what you intend to do?

P

On 28 January 2011 22:49, Emmanuel Baako <ebaako at gmail.com> wrote:
> Certainly, Philipp. The last thing we need is a lawsuit overshadowing the
> good P2PU is doing. An NDA of sorts should be mandated.
> One other question usually posed is whether to ask permission from all the
> users or not? Different groups tackle that differently.
> In the short term, I don't see any targeted studies (demographics, etc.) that
> require personal data. So we should be able to get the ball rolling once
> the DB is checked.
> EB
> On Fri, Jan 28, 2011 at 5:38 PM, Philipp Schmidt <philipp at p2pu.org> wrote:
>>
>> i met with our lawyers last week and privacy was one subject.
>> long-term, P2PU (the project/organization) will need agreements with
>> all individuals who have access to our technology, and data. we will
>> ask people to enter these retroactively.
>>
>> short term:
>>
>> as long as we are sure we are not exposing any sensitive user data, it
>> would be great to get people to do interesting things with our data
>> (and hopefully help us improve what we do).
>>
>> the dev instances run on cleaned databases. george has written a
>> script that cleans them automatically. it would be easy to share
>> those. if we go forward with this - i would ask george to do one more
>> check that they really don't contain any personal data before we hand
>> them over.
>>
>> P
>>
>> On 28 January 2011 16:51, Stian Håklev <shaklev at gmail.com> wrote:
>> > Charles would have to do this, I don't know how. I also suspect that a mysql
>> > dump would be less than useful, given the way data is structured in drupal,
>> > although I don't know this for a fact. If we wanted to do an export through
>> > Drupal, it would need some time to define how to export it.
>> > Stian
>> >
>> > On Fri, Jan 28, 2011 at 11:38 AM, Jessy Kate <jessy.cowansharp at gmail.com> wrote:
>> >>
>> >> hey all/charles and stian,
>> >> would you be comfortable giving vid and emmanuel an anonymized database
>> >> dump to do data analysis with? they have some neat ideas and it would be
>> >> nice to let them run with it.
>> >> if so, perhaps you could post a zip of it on the server for them to download?
>> >> jessy
>> >>
>> >> ---------- Forwarded message ----------
>> >> From: स्वक्ष <svaksha at gmail.com>
>> >> Date: Thu, Jan 20, 2011 at 11:55 PM
>> >> Subject: [p2pu-research] Datamining p2pu
>> >> To: P2PU-research <p2pu-researches at googlegroups.com>
>> >>
>> >>
>> >> Hello Folks,
>> >>
>> >> This discussion started out on p2pu-dev[0] but Stian requested it be
>> >> moved out of *-dev.
>> >>
>> >> [0] http://groups.google.com/group/p2pu-dev/browse_thread/thread/f8ca8965961d13da#
>> >>
>> >> On Thu, Jan 20, 2011 at 05:09, Stian Håklev <shaklev at gmail.com> wrote:
>> >> > I think we should probably move this to p2pu-researches for further
>> >>
>> >> Done, as per request.
>> >>
>> >>
>> >> > discussion to avoid clogging p2pu-dev. I think the easiest initially might
>> >> > be data-dumps, but we might provide API access too - we'll see.
>> >>
>> >> It's easier to work with data-dumps so I'd prefer to choose that
>> >> option, if available :)  I have installed MDP[1] on my laptop and am
>> >> reading up on it but if you have alternate suggestions, please say so.
>> >> Also, would it be possible to create a testing machine with ssh access
>> >> (do advise to whom I should send my public ssh key) so I can start
>> >> playing with it: install MDP and test it with a db dump from p2pu.
>> >>
>> >> [1] http://mdp-toolkit.sourceforge.net/ {I chose MDP because it's
>> >> python based (as is the lernanta/django website) and thought it would be
>> >> less of a system administration nightmare to maintain packages from
>> >> the same language.}
>> >>
>> >>
>> >> > Part of the problem has been figuring out the privacy questions... What data
>> >> > do we have a right to expose publicly, do we need to anonymize etc. It's
>> >> > different at P2PU than in a university class, since all learning happens
>> >> > publicly, and there is no expectation of privacy, but there might still be
>> >> > concerns we need to think through.
>> >>
>> >> Sure, privacy is important. To avoid individual identification, I'd
>> >> suggest mining for "patterns" instead of the individuals. That means,
>> >> for example, we'd look for patterns between languages ("do we have
>> >> more python and ruby courses as compared to R-lang?") instead of "How
>> >> many python courses did $person study?". To start out, does that sound
>> >> like a reasonable alternative? Suggestions are welcome.
>> >>
>> >> Generally speaking, I'm technically inclined, but it's also important to
>> >> know how the data is going to be used and by whom: who the stakeholders
>> >> are and what they hope to gain from it. I'd assume that is
>> >> the reason Stian requested the discussion be moved here.
>> >> Of course, as the system is being built we can always discuss and
>> >> decide what data should be released publicly and what is for internal
>> >> p2pu consumption. Maybe these discussions can be iterative, in separate
>> >> threads :)
>> >>
>> >> I'm sure I've missed many things so please feel free to keep the
>> >> discussion going...and thanks for reading,
>> >>
>> >> PS: Although I've joined the group, I was unable to post to this list
>> >> earlier (my mail was returned) so can the admin please check if my
>> >> membership was approved earlier? I'm using the googlegroups web
>> >> interface to mail this message.
>> >>
>> >> --
>> >> Regards,
>> >> vid ॥ http://svaksha.com
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Jessy Cowan-Sharp
>> >> http://jessykate.com
>> >>
>> >
>> >
>> >
>> > --
>> > http://reganmian.net/blog -- Random Stuff that Matters
>> >
>> >
>> > _______________________________________________
>> > p2pu-dev mailing list
>> > p2pu-dev at lists.p2pu.org
>> > http://lists.p2pu.org/mailman/listinfo/p2pu-dev
>> >
>> >
>
>
>
> --
> http://about.me/ebaako
>
>
>

