[p2pu-dev] Course metrics

Dirk Uys dirk at p2pu.org
Wed Jun 20 06:49:06 UTC 2012


Hi Jose

On Tue, Jun 19, 2012 at 7:46 PM, José Flores <josmasflores at gmail.com> wrote:

> Hey Dirk,
>
> I don't think getting data up to 'yesterday' is an issue. Overhead on
> page views sounds really ugly.
>
> Would it be very difficult to put all the graphics in a pdf, and send
> both pdf and csv to the organiser? This, and a lock so that it does
> not duplicate data, would make me happy because I don't have to wait,
> just say I want them and when they are ready, they will be in my
> inbox.
>

It wouldn't be that difficult, but it's a departure from the current
implementation that would take some time to implement. I like the idea!
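
A rough sketch of what the lock part could look like, just to make the idea
concrete. It is not the lernanta code: the metrics_lock table, the function
names and the build_report callback are all made up for illustration, and
sqlite3 stands in for the real database. The idea is that a unique key makes
the second request fail fast instead of starting a second update.

```python
import sqlite3

# Hypothetical lock table: one row per course whose metrics are being rebuilt.
SCHEMA = "CREATE TABLE IF NOT EXISTS metrics_lock (course_id INTEGER PRIMARY KEY)"


def try_acquire(conn, course_id):
    """Return True if we got the lock, False if an update is already running."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO metrics_lock (course_id) VALUES (?)", (course_id,)
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists, so someone else holds the lock.
        return False


def release(conn, course_id):
    """Drop the lock row so the next request can start an update."""
    with conn:
        conn.execute("DELETE FROM metrics_lock WHERE course_id = ?", (course_id,))


def request_metrics(conn, course_id, build_report):
    """Run build_report once per course; overlapping requests are skipped."""
    if not try_acquire(conn, course_id):
        return "already running"
    try:
        build_report(course_id)
        return "done"
    finally:
        release(conn, course_id)
```

With something like this, a page refresh while the report is being built
would get "already running" back instead of kicking off a second update.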

> Is the process CPU intensive or is the DB slowing things down? Could
> this be part of the API and run on a different machine? This is a bit
> out now, but sounds like a perfect job for a nodejs stream and a
> document DB, if it's not CPU bound.
>

Both and neither. I think the Django ORM isn't the perfect fit for tasks
like this. The process that updates the data can probably be implemented a
few orders of magnitude quicker in pure SQL.
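
To illustrate what "pure SQL" could mean here, a minimal sketch that does the
aggregation in a single statement. The page_view and course_metrics tables
and their columns are assumptions for the example, not the actual tracker
schema, and sqlite3 is used only so the snippet runs on its own. Keying the
summary table on (course_id, day) and using INSERT OR REPLACE (SQLite syntax)
means a repeated run overwrites rows rather than duplicating them.

```python
import sqlite3

# Hypothetical schema: raw page views plus a per-course, per-day summary table.
SCHEMA = """
CREATE TABLE page_view (
    course_id INTEGER NOT NULL,
    user_id   INTEGER NOT NULL,
    day       TEXT    NOT NULL
);
CREATE TABLE course_metrics (
    course_id INTEGER NOT NULL,
    day       TEXT    NOT NULL,
    views     INTEGER NOT NULL,
    visitors  INTEGER NOT NULL,
    PRIMARY KEY (course_id, day)
);
"""


def rebuild_metrics(conn):
    """Recompute per-course, per-day totals in one SQL statement."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(
            """
            INSERT OR REPLACE INTO course_metrics (course_id, day, views, visitors)
            SELECT course_id, day, COUNT(*), COUNT(DISTINCT user_id)
            FROM page_view
            GROUP BY course_id, day
            """
        )
```

The database does the grouping and counting itself, so no rows have to be
pulled into Python objects and written back one at a time.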

Pushing the tracker to a separate module is something that we need to look
at in the near future - the data being stored is already dominating the
size of the whole p2pu database! By a factor of two!


> cheers,
> José
>
>
> On 19 June 2012 17:07, Dirk Uys <dirk at p2pu.org> wrote:
> > Hi everyone
> >
> > During the last release on 18 May I enabled course metrics for all course
> > organizers, believing that the metrics were working perfectly and that it
> > was simply a permission update. You know what they say about assumptions...
> >
> > The problem is that when a user goes to the metrics page for a course, the
> > metrics get generated from the recorded page views
> > (https://github.com/p2pu/lernanta/blob/master/lernanta/apps/tracker/models.py#L100).
> > If the user refreshes the page (because it's taking so long), the process
> > is started again and the metric updating procedure runs concurrently. This
> > doesn't play nicely with the intended use of the db, and duplicated data is
> > generated :(
> >
> > Now, there are multiple ways to solve this problem, each with pros and
> > cons.
> >
> > 1. Enforce some locking mechanism to ensure the operation only happens once
> > + process doesn't run concurrently
> > - user waits
> > - lots of db work tied to specific requests
> >
> > 2. Queue a celery task that runs the operation
> > + user doesn't need to wait for results
> > - still need to implement some locking mechanism to prevent celery tasks
> > from running concurrently
> > - lots of db work tied to specific requests
> >
> > 3. Keep the table updated from the get go
> > + metrics are always up to date
> > - introduces small overhead to every page view
> > - generates metrics that are never used
> >
> > 4. Fix the data duplication issue that presents itself
> > + doesn't matter if process runs concurrently
> > - update still takes a long time
> > - lots of db work tied to specific requests
> >
> > 5. Don't trigger the update process based on user actions, but rather at a
> > predetermined time
> > + user doesn't wait
> > - generates metrics that are never used
> >
> > 6. ?
> >
> > Does anyone have any thoughts on this?
> >
> > Cheers
> > d
> >
> > _______________________________________________
> > p2pu-dev mailing list
> > p2pu-dev at lists.p2pu.org
> > http://lists.p2pu.org/mailman/listinfo/p2pu-dev
> >