[p2pu-dev] Course metrics

Jessy Kate Schingler jessy at jessykate.com
Tue Jun 19 18:44:46 UTC 2012


i like the idea of running a cron job at regular intervals (whatever is
manageable - daily, hourly) and just being transparent about the currency
of the data ("updated at xx interval", "last update at hh:mm", etc.). then
the data is ready and can be served back on demand.
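a minimal sketch of what i mean, assuming a scheduled job that rebuilds the
metrics and stamps when it ran, so the page just reads precomputed numbers
and shows their freshness (the names and in-memory store here are
illustrative, not lernanta's actual schema):

```python
# Sketch of the cron-driven approach: a scheduled job rebuilds the
# metrics and records when it ran; the metrics page only reads the
# precomputed result and displays its freshness.
from datetime import datetime, timezone

# In-memory stand-in for the metrics table; a real job would write to the DB.
METRICS_CACHE = {"data": None, "updated_at": None}

def rebuild_metrics(page_views):
    """Aggregate raw page views into per-course totals (hypothetical shape)."""
    totals = {}
    for view in page_views:
        totals[view["course_id"]] = totals.get(view["course_id"], 0) + 1
    METRICS_CACHE["data"] = totals
    METRICS_CACHE["updated_at"] = datetime.now(timezone.utc)
    return METRICS_CACHE

def freshness_label():
    """The note the metrics page would show next to the numbers."""
    ts = METRICS_CACHE["updated_at"]
    if ts is None:
        return "never updated"
    return ts.strftime("last update at %H:%M UTC")
```

the cron entry would just call rebuild_metrics on the full tracker table,
so no request ever triggers the expensive work.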

wrt arbitrary filtering requests, how about handling that issue by, well,
not handling it :) - ie, support only a handful of predefined views on the
data. but also, as jos was saying, let the organizer download a csv if they
want to do arbitrary data mining on their own (*cough* stian *cough*
:)).
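the csv part is tiny - something like this, where the rows and column names
are just placeholders for whatever precomputed metrics we end up storing:

```python
# Sketch of the "just give organizers a CSV" idea: serialize the
# precomputed metrics to a CSV string for download. Column names are
# illustrative, not lernanta's actual schema.
import csv
import io

def metrics_to_csv(rows):
    """rows: list of dicts, e.g. [{"course_id": 1, "page_views": 42}, ...]"""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```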

jessy



On Tue, Jun 19, 2012 at 7:46 PM, Jos Flores <josmasflores at gmail.com> wrote:

> Hey Dirk,
>
> I don't think getting data only up to 'yesterday' is an issue. Adding
> overhead to every page view sounds really ugly.
>
> Would it be very difficult to put all the graphics in a PDF and send
> both the PDF and CSV to the organiser? This, plus a lock so that the
> process does not duplicate data, would make me happy, because I don't
> have to wait: I just say I want them, and when they are ready, they
> will be in my inbox.
>
> Is the process CPU intensive, or is the DB slowing things down? Could
> this be part of the API and run on a different machine? This is a bit
> out there, but it sounds like a perfect job for a nodejs stream and a
> document DB, if it's not CPU bound.
>
> cheers,
> José
>
>
> On 19 June 2012 17:07, Dirk Uys <dirk at p2pu.org> wrote:
> > Hi everyone
> >
> > During the last release on 18 May I enabled course metrics for all course
> > organizers, believing that the metrics were working perfectly and that
> > it was simply a permission update. You know what they say about
> > assumptions...
> >
> > The problem is that when a user goes to the metrics page for a course,
> > the metrics get generated from the recorded page views
> > (https://github.com/p2pu/lernanta/blob/master/lernanta/apps/tracker/models.py#L100).
> > If the user refreshes the page (because it's taking so long), the
> > process is started again and the metric updating procedure runs
> > concurrently. This doesn't play nice with the intended use of the db,
> > and duplicate data is generated :(
> >
> > Now, this problem has multiple possible solutions, each with pros and
> > cons.
> >
> > 1. Enforce a locking mechanism to ensure the operation only happens once
> > + the process doesn't run concurrently
> > - the user waits
> > - lots of db work tied to specific requests
> >
> > 2. Queue a celery task that runs the operation
> > + the user doesn't need to wait for results
> > - still need to implement a locking mechanism to prevent celery tasks
> > from running concurrently
> > - lots of db work tied to specific requests
> >
> > 3. Keep the table updated from the get-go
> > + metrics are always up to date
> > - introduces a small overhead to every page view
> > - generates metrics that are never used
> >
> > 4. Fix the data duplication issue itself
> > + it doesn't matter if the process runs concurrently
> > - the update still takes a long time
> > - lots of db work tied to specific requests
> >
> > 5. Don't trigger the update process from user actions, but run it at
> > a predetermined time
> > + the user doesn't wait
> > - generates metrics that are never used
> >
> > 6. ?
> >
> > Does anyone have any thoughts on this?
> >
> > Cheers
> > d
> >
> > _______________________________________________
> > p2pu-dev mailing list
> > p2pu-dev at lists.p2pu.org
> > http://lists.p2pu.org/mailman/listinfo/p2pu-dev
> >



-- 
Jessy
http://jessykate.com

