[p2pu-dev] DB Dump for Datamining p2pu
Emmanuel Baako
ebaako at gmail.com
Sat Jan 29 21:39:34 UTC 2011
Yeah, as I mentioned on IRC, I'm new to data mining and have just started
taking a course, so my main goal would be to assist Svaksha, and to treat it
as a learning experience as well.
My main interest would be "retention": how many students are still active
at the end of their courses, and so on. My last research paper was on
virtual vs. traditional offices (the line between them gets blurred a bit
more with time). I'm the geek who has always argued for the effectiveness
of online education, so seeing those numbers and trends would be key. It
could even shed light on how to be an effective facilitator, if the study
shows that certain facilitators or courses have significantly higher
retention rates than the average. The P2PU model could then receive some
serious validation.
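A minimal sketch of how such a retention rate might be computed per course and compared against the overall figure. The record layout, course names, and values below are hypothetical assumptions for illustration only; they are not the P2PU schema or real data.

```python
from collections import defaultdict

# Hypothetical, anonymized records: (course, participant_id, active_at_end).
# Field names and values are illustrative assumptions, not the P2PU schema.
records = [
    ("course_a", 1, True),
    ("course_a", 2, False),
    ("course_a", 3, True),
    ("course_b", 4, False),
    ("course_b", 5, False),
    ("course_b", 6, True),
    ("course_b", 7, False),
]


def retention_by_course(rows):
    """Return {course: fraction of enrolled participants still active at the end}."""
    enrolled = defaultdict(int)
    retained = defaultdict(int)
    for course, _participant, active_at_end in rows:
        enrolled[course] += 1
        if active_at_end:
            retained[course] += 1
    return {course: retained[course] / enrolled[course] for course in enrolled}


rates = retention_by_course(records)

# Overall retention across all courses, as the baseline to compare against.
overall = sum(1 for _c, _p, active in records if active) / len(records)

for course, rate in sorted(rates.items()):
    print(f"{course}: {rate:.2f} (overall {overall:.2f})")
```

The same grouping could be keyed on a facilitator identifier instead of the course, if the cleaned dump exposes one.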
That's just my $.02 ... admittedly a noob on deck!
EB
On Jan 29, 2011 1:40 PM, "Philipp Schmidt" <philipp at p2pu.org> wrote:
> Hey Emmanuel:
>
> Nice to meet you (virtually) and thanks for offering to do some
> interesting analysis of our data. Could you share some of your ideas /
> questions / goals for what you intend to do?
>
> P
>
> On 28 January 2011 22:49, Emmanuel Baako <ebaako at gmail.com> wrote:
>> Certainly, Philipp. The last thing needed is a lawsuit to overshadow the
>> intended good P2PU is serving. An NDA of sorts should be mandated.
>> One other question usually posed is whether to ask permission from all
>> the users or not. Different groups tackle that differently.
>> In the short term, I don't see any targeted studies [demographics, etc.]
>> that require personal data. So we should be able to get the ball rolling
>> once the DB is checked.
>> EB
>> On Fri, Jan 28, 2011 at 5:38 PM, Philipp Schmidt <philipp at p2pu.org> wrote:
>>>
>>> i met with our lawyers last week and privacy was one subject.
>>> long-term, P2PU (the project/organization) will need agreements with
>>> all individuals who have access to our technology, and data. we will
>>> ask people to enter these retroactively.
>>>
>>> short term:
>>>
>>> as long as we are sure we are not exposing any sensitive user data, it
>>> would be great to get people to do interesting things with our data
>>> (and hopefully help us improve what we do).
>>>
>>> the dev instances run on cleaned databases. george has written a
>>> script that cleans them automatically. it would be easy to share
>>> those. if we go forward with this - i would ask george to do one more
>>> check that they really don't contain any personal data before we hand
>>> them over.
>>>
>>> P
>>>
>>> On 28 January 2011 16:51, Stian Håklev <shaklev at gmail.com> wrote:
>>> > Charles would have to do this, I don't know how. I also suspect that a
>>> > mysql dump would be less than useful, given the way data is structured
>>> > in drupal, although I don't know this for a fact. If we wanted to do an
>>> > export through Drupal, it would need some time to define how to export
>>> > it.
>>> > Stian
>>> >
>>> > On Fri, Jan 28, 2011 at 11:38 AM, Jessy Kate <jessy.cowansharp at gmail.com> wrote:
>>> >>
>>> >> hey all/charles and stian,
>>> >> would you be comfortable giving vid and emmanuel an anonymized
>>> >> database dump to do data analysis with? they have some neat ideas and
>>> >> it would be nice to let them run with it.
>>> >> if so, perhaps you could post a zip of it on the server for them to
>>> >> download?
>>> >> jessy
>>> >>
>>> >> ---------- Forwarded message ----------
>>> >> From: स्वक्ष <svaksha at gmail.com>
>>> >> Date: Thu, Jan 20, 2011 at 11:55 PM
>>> >> Subject: [p2pu-research] Datamining p2pu
>>> >> To: P2PU-research <p2pu-researches at googlegroups.com>
>>> >>
>>> >>
>>> >> Hello Folks,
>>> >>
>>> >> This discussion started out on p2pu-dev[0] but Stian requested it be
>>> >> moved out of *-dev.
>>> >>
>>> >> [0] http://groups.google.com/group/p2pu-dev/browse_thread/thread/f8ca8965961d13da#
>>> >>
>>> >> On Thu, Jan 20, 2011 at 05:09, Stian Håklev <shaklev at gmail.com> wrote:
>>> >> > I think we should probably move this to p2pu-researches for further
>>> >>
>>> >> Done, as per request.
>>> >>
>>> >>
>>> >> > discussion to avoid clogging p2pu-dev. I think the easiest initially
>>> >> > might be data-dumps, but we might provide API access too - we'll see.
>>> >>
>>> >> It's easier to work with data-dumps so I'd prefer to choose that
>>> >> option, if available :) I have installed MDP[1] on my laptop and am
>>> >> reading up on it but if you have alternate suggestions, please say so.
>>> >> Also, would it be possible to create a testing machine with ssh access
>>> >> (do advise to whom I should send my public ssh key) so I can start
>>> >> playing with it: install MDP and test it with a db dump from p2pu?
>>> >>
>>> >> [1] http://mdp-toolkit.sourceforge.net/ {I chose MDP because it's
>>> >> Python based (as is the lernanta/django website) and thought it would
>>> >> be less of a system administration nightmare to maintain packages from
>>> >> the same language.}
>>> >>
>>> >>
>>> >> > Part of the problem has been figuring out the privacy questions...
>>> >> > What data do we have a right to expose publicly, do we need to
>>> >> > anonymize etc. It's different at P2PU than in a university class,
>>> >> > since all learning happens publicly, and there is no expectation of
>>> >> > privacy, but there might still be concerns we need to think through.
>>> >>
>>> >> Sure, privacy is important. To avoid individual identification, I'd
>>> >> suggest mining for "patterns" instead of individuals. That means,
>>> >> for example, we'd look for patterns between languages ("do we have
>>> >> more python and ruby courses as compared to R-lang?") instead of "How
>>> >> many python courses did $person study?". To start out, does that sound
>>> >> like a reasonable alternative? Suggestions are welcome.
>>> >>
>>> >> Generally speaking, I'm technically inclined but it's also important
>>> >> to know how the data is going to be used and by whom. Who are the
>>> >> stakeholders and what do they hope to gain from it, etc.? I'd assume
>>> >> that is the reason Stian requested the discussion be moved here.
>>> >> Of course, as the system is being built we can always discuss and
>>> >> decide what data should be released publicly and what is for internal
>>> >> p2pu consumption. Maybe these discussions can be iterative in
>>> >> multiple distinct threads :)
>>> >>
>>> >> I'm sure I've missed many things so please feel free to keep the
>>> >> discussion going...and thanks for reading,
>>> >>
>>> >> PS: Although I've joined the group, I was unable to post to this list
>>> >> earlier (my mail was returned) so can the admin please check if my
>>> >> membership was approved earlier? I'm using the googlegroups web
>>> >> interface to mail this message.
>>> >>
>>> >> --
>>> >> Regards,
>>> >> vid ॥ http://svaksha.com
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Jessy Cowan-Sharp
>>> >> http://jessykate.com
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > http://reganmian.net/blog -- Random Stuff that Matters
>>> >
>>> >
>>> > _______________________________________________
>>> > p2pu-dev mailing list
>>> > p2pu-dev at lists.p2pu.org
>>> > http://lists.p2pu.org/mailman/listinfo/p2pu-dev
>>> >
>>> >
>>
>>
>>
>> --
>> http://about.me/ebaako
>>