[p2pu-dev] DB Dump for Datamining p2pu
Niels Sprong
nielssprong at gmail.com
Fri Jan 28 22:53:21 UTC 2011
I understand the need for a sort of code of conduct to deal with data.
I don't really understand the legal issues. ('sensitive' user data?)
we're all cool, right?
On 28 January 2011 22:38, Philipp Schmidt <philipp at p2pu.org> wrote:
> i met with our lawyers last week and privacy was one subject.
> long-term, P2PU (the project/organization) will need agreements with
> all individuals who have access to our technology, and data. we will
> ask people to enter into these retroactively.
>
> short term:
>
> as long as we are sure we are not exposing any sensitive user data, it
> would be great to get people to do interesting things with our data
> (and hopefully help us improve what we do).
>
> the dev instances run on cleaned databases. george has written a
> script that cleans them automatically. it would be easy to share
> those. if we go forward with this - i would ask george to do one more
> check that they really don't contain any personal data before we hand
> them over.
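A minimal sketch of the kind of final check Philipp describes, written in Python because the thread already leans on Python tooling. It is not George's cleaning script, whose details are not given in this thread. It assumes the cleaned dump is a plain-text SQL file and looks only for e-mail-like strings, which are one obvious kind of personal data. The function name `find_emails` and the regular expression are illustrative assumptions.

```python
import re
from typing import Iterable, List, Tuple

# Rough heuristic for e-mail addresses (an assumption, not a complete
# definition). It misses obfuscated forms such as "name at example.org"
# and may flag some false positives.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, match) pairs for every e-mail-like string found."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for match in EMAIL_RE.findall(line):
            hits.append((lineno, match))
    return hits
```

Passing an open file handle for the dump (for example `find_emails(open("cleaned_dump.sql", encoding="utf-8"))`, with a hypothetical file name) lists any remaining addresses with their line numbers. An empty result shows only that this one pattern found nothing. It does not prove the dump is free of personal data.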
>
> P
>
> On 28 January 2011 16:51, Stian Håklev <shaklev at gmail.com> wrote:
> > Charles would have to do this; I don't know how. I also suspect that a
> > MySQL dump would be less than useful, given the way data is structured
> > in Drupal, although I don't know this for a fact. If we wanted to do an
> > export through Drupal, it would take some time to define how to export
> > it.
> > Stian
> >
> > On Fri, Jan 28, 2011 at 11:38 AM, Jessy Kate <jessy.cowansharp at gmail.com> wrote:
> >>
> >> hey all/charles and stian,
> >> would you be comfortable giving vid and emmanuel an anonymized database
> >> dump to do data analysis with? they have some neat ideas and it would be
> >> nice to let them run with it.
> >> if so, perhaps you could post a zip of it on the server for them to
> >> download?
> >> jessy
> >>
> >> ---------- Forwarded message ----------
> >> From: स्वक्ष <svaksha at gmail.com>
> >> Date: Thu, Jan 20, 2011 at 11:55 PM
> >> Subject: [p2pu-research] Datamining p2pu
> >> To: P2PU-research <p2pu-researches at googlegroups.com>
> >>
> >>
> >> Hello Folks,
> >>
> >> This discussion started out on p2pu-dev[0] but Stian requested it be
> >> moved out of *-dev.
> >>
> >> [0] http://groups.google.com/group/p2pu-dev/browse_thread/thread/f8ca8965961d13da#
> >>
> >> On Thu, Jan 20, 2011 at 05:09, Stian Håklev <shaklev at gmail.com> wrote:
> >> > I think we should probably move this to p2pu-researches for further
> >>
> >> Done, as per request.
> >>
> >>
> >> > discussion to avoid clogging p2pu-dev. I think the easiest initially
> >> > might be data dumps, but we might provide API access too; we'll see.
> >>
> >> It's easier to work with data dumps, so I'd prefer that option, if
> >> available :) I have installed MDP[1] on my laptop and am reading up on
> >> it, but if you have alternative suggestions, please say so. Also, would
> >> it be possible to set up a testing machine with SSH access (please
> >> advise to whom I should send my public SSH key) so I can start playing
> >> with it: install MDP and test it with a DB dump from P2PU?
> >>
> >> [1] http://mdp-toolkit.sourceforge.net/ {I chose MDP because it's
> >> Python-based (as is the lernata/Django website) and thought it would be
> >> less of a system administration nightmare to maintain packages in the
> >> same language.}
> >>
> >>
> >> > Part of the problem has been figuring out the privacy questions. What
> >> > data do we have a right to expose publicly? Do we need to anonymize
> >> > it? It's different at P2PU than in a university class, since all
> >> > learning happens publicly and there is no expectation of privacy, but
> >> > there might still be concerns we need to think through.
> >>
> >> Sure, privacy is important. To avoid identifying individuals, I'd
> >> suggest mining for "patterns" instead of individuals. For example,
> >> we'd look for patterns across languages ("Do we have more Python and
> >> Ruby courses compared to R?") instead of "How many Python courses did
> >> $person study?". To start out, does that sound like a reasonable
> >> approach? Suggestions are welcome.
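To illustrate the difference between a pattern-level question and a per-person one, here is a minimal Python sketch. It assumes, hypothetically, that course records come out of the dump as dictionaries with a "language" field. The function `courses_per_language` is an illustrative name. It counts courses per language and never reads who created or took them.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def courses_per_language(courses: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count courses per programming language.

    Only the hypothetical "language" field is read; any user-related
    fields in the records are ignored, so the result describes the
    catalogue rather than any individual.
    """
    return dict(Counter(course["language"] for course in courses))


# Hypothetical sample records; the "owner" field is never used.
sample = [
    {"language": "python", "owner": "alice"},
    {"language": "ruby", "owner": "bob"},
    {"language": "python", "owner": "carol"},
    {"language": "R", "owner": "dave"},
]
```

Calling `courses_per_language(sample)` yields `{"python": 2, "ruby": 1, "R": 1}`, which answers "do we have more Python and Ruby courses than R?" without answering anything about a particular person.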
> >>
> >> Generally speaking, I'm technically inclined, but it's also important to
> >> know how the data is going to be used and by whom. Who are the
> >> stakeholders, and what do they hope to gain from it? I'd assume that is
> >> the reason Stian requested the discussion be moved here.
> >> Of course, as the system is being built we can always discuss and
> >> decide what data should be released publicly and what is for internal
> >> P2PU consumption. Maybe these discussions can happen iteratively, in
> >> separate threads :)
> >>
> >> I'm sure I've missed many things so please feel free to keep the
> >> discussion going...and thanks for reading,
> >>
> >> PS: Although I've joined the group, I was unable to post to this list
> >> earlier (my mail was returned), so could the admin please check whether
> >> my membership was approved? I'm using the Google Groups web
> >> interface to mail this message.
> >>
> >> --
> >> Regards,
> >> vid ॥ http://svaksha.com
> >>
> >>
> >> --
> >> Jessy Cowan-Sharp
> >> http://jessykate.com
> >>
> >
> >
> >
> > --
> > http://reganmian.net/blog -- Random Stuff that Matters
> >
> >
> > _______________________________________________
> > p2pu-dev mailing list
> > p2pu-dev at lists.p2pu.org
> > http://lists.p2pu.org/mailman/listinfo/p2pu-dev
> >
> >