[p2pu-dev] DB Dump for Datamining p2pu
Emmanuel Baako
ebaako at gmail.com
Fri Jan 28 23:06:39 UTC 2011
Data mining has always raised ethical concerns. Not everyone likes to have
their personal data included in studies. The purpose of collecting any form
of user data should be clearly stated. Businesses that collect it for
transactions have to state that purpose and ensure the data is used for it only.
Unless we state somewhere that "data collected from its members
will be included in ....... ", we would be in violation of the [purpose and
use limitation principle] of the Fair Information Practice Principles.
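
One way to lower the risk while that statement is missing is the pattern-level
mining vid suggests further down the thread: count things per topic and never
carry user identifiers into the result. Below is a minimal sketch in Python,
assuming a made-up list of enrollment rows. The field names (user_id,
course_topic), the helper topic_counts, and its min_group_size parameter are
all hypothetical and do not reflect the real p2pu schema or george's cleaning
script.

```python
from collections import Counter

# Hypothetical enrollment rows; the field names are illustrative only and
# are NOT taken from the real p2pu database.
enrollments = [
    {"user_id": 101, "course_topic": "python"},
    {"user_id": 102, "course_topic": "python"},
    {"user_id": 101, "course_topic": "ruby"},
    {"user_id": 103, "course_topic": "r-lang"},
]


def topic_counts(rows, min_group_size=1):
    """Count enrollments per topic and drop user identifiers entirely.

    Topics with fewer than min_group_size enrollments are left out, so a
    very small group cannot point back to a single person.
    """
    counts = Counter(row["course_topic"] for row in rows)
    return {topic: n for topic, n in counts.items() if n >= min_group_size}


# Answers "do we have more python than R-lang enrollments?" without ever
# answering "how many python courses did $person study?".
print(topic_counts(enrollments))
print(topic_counts(enrollments, min_group_size=2))
```

The min_group_size cut-off is only one possible safeguard; what threshold is
acceptable would be something for the list (and the lawyers) to decide.
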
Care and caution are always advised... but we're all definitely cool, right?
EB
On Fri, Jan 28, 2011 at 5:53 PM, Niels Sprong <nielssprong at gmail.com> wrote:
> I understand the need for a sort of code of conduct to deal with data.
>
> I don't really understand the legal issues. ('sensitive' user data?)
>
> we're all cool, right?
>
>
> On 28 January 2011 22:38, Philipp Schmidt <philipp at p2pu.org> wrote:
>
>> i met with our lawyers last week and privacy was one subject.
>> long-term, P2PU (the project/organization) will need agreements with
>> all individuals who have access to our technology, and data. we will
>> ask people to enter these retroactively.
>>
>> short term:
>>
>> as long as we are sure we are not exposing any sensitive user data, it
>> would be great to get people to do interesting things with our data
>> (and hopefully help us improve what we do).
>>
>> the dev instances run on cleaned databases. george has written a
>> script that cleans them automatically. it would be easy to share
>> those. if we go forward with this - i would ask george to do one more
>> check that they really don't contain any personal data before we hand
>> them over.
>>
>> P
>>
>> On 28 January 2011 16:51, Stian Håklev <shaklev at gmail.com> wrote:
>> > Charles would have to do this, I don't know how. I also suspect that a
>> > mysql dump would be less than useful, given the way data is structured in
>> > drupal, although I don't know this for a fact. If we wanted to do an export
>> > through Drupal, it would need some time to define how to export it.
>> > Stian
>> >
>> > On Fri, Jan 28, 2011 at 11:38 AM, Jessy Kate <jessy.cowansharp at gmail.com> wrote:
>> >>
>> >> hey all/charles and stian,
>> >> would you be comfortable giving vid and emmanuel an anonymized database
>> >> dump to do data analysis with? they have some neat ideas and it would be
>> >> nice to let them run with it.
>> >> if so, perhaps you could post a zip of it on the server for them to
>> >> download?
>> >> jessy
>> >>
>> >> ---------- Forwarded message ----------
>> >> From: स्वक्ष <svaksha at gmail.com>
>> >> Date: Thu, Jan 20, 2011 at 11:55 PM
>> >> Subject: [p2pu-research] Datamining p2pu
>> >> To: P2PU-research <p2pu-researches at googlegroups.com>
>> >>
>> >>
>> >> Hello Folks,
>> >>
>> >> This discussion started out on p2pu-dev[0] but Stian requested it be
>> >> moved out of *-dev.
>> >>
>> >> [0] http://groups.google.com/group/p2pu-dev/browse_thread/thread/f8ca8965961d13da#
>> >>
>> >> On Thu, Jan 20, 2011 at 05:09, Stian Håklev <shaklev at gmail.com> wrote:
>> >> > I think we should probably move this to p2pu-researches for further
>> >>
>> >> Done, as per request.
>> >>
>> >>
>> >> > discussion to avoid clogging p2pu-dev. I think the easiest initially
>> >> > might be data-dumps, but we might provide API access too - we'll see.
>> >>
>> >> It's easier to work with data-dumps so I'd prefer to choose that
>> >> option, if available :) I have installed MDP[1] on my laptop and am
>> >> reading up on it but if you have alternate suggestions, please say so.
>> >> Also, would it be possible to create a testing machine with ssh access
>> >> (do advise to whom I should send my public ssh key) so I can start
>> >> playing with it: install MDP and test it with a db dump from p2pu.
>> >> [1] http://mdp-toolkit.sourceforge.net/ {I chose MDP because it's
>> >> python based (as is the lernanta/django website) and thought it would be
>> >> less of a system administration nightmare to maintain packages from
>> >> the same language.}
>> >>
>> >>
>> >> > Part of the problem has been figuring out the privacy questions... What
>> >> > data do we have a right to expose publicly, do we need to anonymize etc.
>> >> > It's different at P2PU than in a university class, since all learning
>> >> > happens publicly, and there is no expectation of privacy, but there might
>> >> > still be concerns we need to think through.
>> >>
>> >> Sure, privacy is important. To avoid individual identification, I'd
>> >> suggest mining for "patterns" instead of the individuals. That means,
>> >> for example, we'd look for patterns between languages ("do we have
>> >> more python and ruby courses as compared to R-lang?") instead of "How
>> >> many python courses $person studied?". To start out, does that sound
>> >> like a reasonable alternative? Suggestions are welcome.
>> >>
>> >> Generally speaking, I'm technically inclined but it's also important to
>> >> know how the data is going to be used and by whom: who the stakeholders
>> >> are, what they hope to gain from it, etc. I'd assume that is
>> >> the reason Stian requested the discussion be moved here.
>> >> Of course, as the system is being built we can always discuss and
>> >> decide what data should be released publicly and what is for internal
>> >> p2pu consumption. Maybe these discussions can be iterative in distinct
>> >> multiple threads :)
>> >>
>> >> I'm sure I've missed many things so please feel free to keep the
>> >> discussion going...and thanks for reading,
>> >>
>> >> PS: Although I've joined the group, I was unable to post to this list
>> >> earlier (my mail was returned) so can the admin please check if my
>> >> membership was approved earlier? I'm using the googlegroups web
>> >> interface to mail this message.
>> >>
>> >> --
>> >> Regards,
>> >> vid ॥ http://svaksha.com
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Jessy Cowan-Sharp
>> >> http://jessykate.com
>> >>
>> >
>> >
>> >
>> > --
>> > http://reganmian.net/blog -- Random Stuff that Matters
>> >
>> >
>> > _______________________________________________
>> > p2pu-dev mailing list
>> > p2pu-dev at lists.p2pu.org
>> > http://lists.p2pu.org/mailman/listinfo/p2pu-dev
--
http://about.me/ebaako