[p2pu-dev] DB Dump for Datamining p2pu

Jessy Kate jessy.cowansharp at gmail.com
Fri Jan 28 16:38:31 UTC 2011


hey all/charles and stian,

would you be comfortable giving vid and emmanuel an anonymized database dump
to do data analysis with? they have some neat ideas and it would be nice to
let them run with it.

if so, perhaps you could post a zip of it on the server for them to
download?
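
for what it's worth, here is a rough sketch of what "anonymized" could mean
for the dump. it assumes each row has a user id, an email and a course, which
is a guess on my part and not the real schema; the key, field names and sample
row are all made up. note that a keyed hash only pseudonymizes the id, so we
would still need to check the remaining columns for anything identifying.

```python
import hashlib
import hmac

# Hypothetical secret; it stays on the server and is never shipped with the dump.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonym(user_id):
    """Map a user id to a stable token that cannot be recomputed without the key."""
    digest = hmac.new(SECRET_KEY, str(user_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize_row(row):
    """Keep only non-identifying fields and swap the id for a pseudonym."""
    return {"user": pseudonym(row["user_id"]), "course": row["course"]}


# Made-up sample row, for illustration only.
sample = {"user_id": 42, "email": "someone@example.org", "course": "Intro to Python"}
clean = anonymize_row(sample)
print(clean)
```

the same user always maps to the same token, so per-user patterns survive
without names or emails being in the file.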

jessy

---------- Forwarded message ----------
From: स्वक्ष <svaksha at gmail.com>
Date: Thu, Jan 20, 2011 at 11:55 PM
Subject: [p2pu-research] Datamining p2pu
To: P2PU-research <p2pu-researches at googlegroups.com>


Hello Folks,

This discussion started out on p2pu-dev[0] but Stian requested it be
moved out of *-dev.

[0]
http://groups.google.com/group/p2pu-dev/browse_thread/thread/f8ca8965961d13da#

On Thu, Jan 20, 2011 at 05:09, Stian Håklev <shaklev at gmail.com> wrote:
> I think we should probably move this to p2pu-researches for further

Done, as per request.


> discussion to avoid clogging p2pu-dev. I think the easiest initially might
> be data-dumps, but we might provide API access too - we'll see.

It's easier to work with data-dumps so I'd prefer to choose that
option, if available :)  I have installed MDP[1] on my laptop and am
reading up on it but if you have alternate suggestions, please say so.
Also, would it be possible to create a testing machine with ssh access
(please advise to whom I should send my public SSH key) so I can start
playing with it: install MDP and test it with a db dump from p2pu.

[1] http://mdp-toolkit.sourceforge.net/ {I chose MDP because it's
Python-based (as is the Lernanta/Django website) and thought it would be
less of a system administration nightmare to maintain packages from
the same language.}


> Part of the problem has been figuring out the privacy questions... What
> data do we have a right to expose publicly, do we need to anonymize etc. It's
> different at P2PU than in a university class, since all learning happens
> publicly, and there is no expectation of privacy, but there might still be
> concerns we need to think through.

Sure, privacy is important. To avoid individual identification, I'd
suggest mining for "patterns" instead of individuals. That means,
for example, we'd look for patterns between languages ("do we have
more Python and Ruby courses as compared to R-lang?") instead of "How
many Python courses did $person study?". To start out, does that sound
like a reasonable alternative? Suggestions are welcome.
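
As a rough sketch of what pattern-level mining could look like (the table,
column names and rows below are made up for illustration; I have not seen
the actual p2pu schema), a query can count courses per topic without ever
reading per-person rows:

```python
import sqlite3

# Hypothetical schema and rows; the real p2pu tables are not shown in this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT, topic TEXT)")
conn.executemany(
    "INSERT INTO courses (title, topic) VALUES (?, ?)",
    [
        ("Intro to Python", "python"),
        ("Web apps with Django", "python"),
        ("Ruby basics", "ruby"),
        ("Statistics with R", "r-lang"),
    ],
)


def courses_per_topic(conn):
    """Count courses per topic; only aggregate numbers are returned."""
    rows = conn.execute(
        "SELECT topic, COUNT(*) FROM courses GROUP BY topic ORDER BY topic"
    )
    return dict(rows.fetchall())


counts = courses_per_topic(conn)
print(counts)
```

Questions of the "$person" kind would need a join against user data, which
this sketch deliberately leaves out.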

Generally speaking, I'm technically inclined but it's also important to
know how the data is going to be used and by whom. Who are the
stakeholders and what do they hope to gain from it, etc.? I'd assume that is
the reason Stian requested the discussion be moved here.
Of course, as the system is being built we can always discuss and
decide what data should be released publicly and what is for internal
p2pu consumption. Maybe these discussions can be iterative, in
multiple distinct threads :)

I'm sure I've missed many things so please feel free to keep the
discussion going...and thanks for reading,

PS: Although I've joined the group, I was unable to post to this list
earlier (my mail was returned) so can the admin please check if my
membership was approved earlier? I'm using the googlegroups web
interface to mail this message.

--
Regards,
vid ॥ http://svaksha.com





-- 
Jessy Cowan-Sharp
http://jessykate.com