[p2pu-dev] Course metrics

José Flores josmasflores at gmail.com
Wed Jun 20 08:54:59 UTC 2012


Hey Dirk,

Just wondering: the fact that the data is cacheable does not mean it
has to be cached, or that it has to be cached forever.
Think of a course organiser that wants a dump of the data after
closing the course. They will never use that data again after having
downloaded it.
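
A minimal sketch of "cached, but not forever", assuming a plain in-memory
store. The class name, the method names and the injectable clock are all
made up for illustration; none of this is taken from lernanta.

```python
import time


class ExpiringCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, recomputing it once it has expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self.ttl_seconds, value)
        return value

    def discard(self, key):
        """Drop an entry early, e.g. once the organiser has downloaded the dump."""
        self._entries.pop(key, None)
```

An organiser who downloads the dump once and never comes back would cost
one computation and no long-lived storage.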

Also, this is a service that not everybody uses, so making it
offline-ish wouldn't be such a problem. What about allowing the
organiser to subscribe to metrics data? They pick a schedule such as
weekly or monthly, we generate the data through a stored procedure,
and it is sent to their email automatically. I suppose a way of
getting all of it at once would also be needed.
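
The scheduling half of that idea could look roughly like the sketch below.
It only decides which subscriptions are due; generating and emailing the
data is left out. The dictionary fields (`email`, `frequency`, `last_sent`)
are hypothetical, and "monthly" is approximated as 30 days to keep it short.

```python
from datetime import datetime, timedelta

# Assumed schedule lengths; "monthly" is a 30-day approximation.
PERIODS = {"weekly": timedelta(days=7), "monthly": timedelta(days=30)}


def due_subscriptions(subscriptions, now):
    """Return the subscriptions whose next report is due at or before now."""
    due = []
    for sub in subscriptions:
        last_sent = sub["last_sent"]
        # A subscription that has never been sent is always due.
        if last_sent is None or now - last_sent >= PERIODS[sub["frequency"]]:
            due.append(sub)
    return due
```

A periodic job would call this with the current time, send the reports, and
then update `last_sent` for each one it handled.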

Another option is to use an additional DB as a cache. Run a cron job
that dumps all the data (for organisers that want it) into mongodb and
work from that data (on the web or offline).
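
A rough sketch of such a dump, written so that running it twice cannot
duplicate data. A plain dict stands in for the document collection, and the
`course_id` and `day` fields are assumptions, not the real tracker schema.

```python
from collections import Counter


def dump_page_views(page_views, store, wanted_courses):
    """Aggregate page views per (course, day) and upsert them into store."""
    counts = Counter(
        (view["course_id"], view["day"])
        for view in page_views
        if view["course_id"] in wanted_courses
    )
    for (course_id, day), total in counts.items():
        # Overwrite by key instead of appending, so a re-run replaces
        # the earlier record rather than adding a second one.
        store[(course_id, day)] = {
            "course_id": course_id,
            "day": day,
            "views": total,
        }
    return store
```

With a real document DB the same effect would come from an upsert keyed on
course and day.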

cheers,
José


On 20 June 2012 07:49, Dirk Uys <dirk at p2pu.org> wrote:
> Hi Jose
>
> On Tue, Jun 19, 2012 at 7:46 PM, José Flores <josmasflores at gmail.com> wrote:
>>
>> Hey Dirk,
>>
>> I don't think getting data up to 'yesterday' is an issue. Overhead on
>> page views sounds really ugly.
>>
>> Would it be very difficult to put all the graphics in a pdf, and send
>> both pdf and csv to the organiser? This, and a lock so that it does
>> not duplicate data, would make me happy because I don't have to wait,
>> just say I want them and when they are ready, they will be in my
>> inbox.
>
>
> It wouldn't be that difficult, but it's a departure from the current
> implementation that would take some time to implement. I like the idea!
>
>> Is the process CPU intensive or is the DB slowing things down? Could
>> this be part of the API and run on a different machine? This is a bit
>> out now, but sounds like a perfect job for a nodejs stream and a
>> document DB, if it's not CPU bound.
>
>
> Both and neither. I think the Django ORM isn't the perfect fit for tasks
> like this. The process that updates the data would probably run a few
> orders of magnitude quicker if it were implemented in pure SQL.
>
> Pushing the tracker to a separate module is something that we need to look
> at in the near future - the data being stored is already dominating the size
> of the whole p2pu database! By a factor of two!
>
>>
>> cheers,
>> José
>>
>>
>> On 19 June 2012 17:07, Dirk Uys <dirk at p2pu.org> wrote:
>> > Hi everyone
>> >
>> > During the last release on 18 May I enabled course metrics for all course
>> > organizers, believing that the metrics were working perfectly and that it
>> > was simply a permission update. You know what they say about assumptions...
>> >
>> > The problem is that when a user goes to the metrics page for a course, the
>> > metrics get generated from the recorded page views
>> > (https://github.com/p2pu/lernanta/blob/master/lernanta/apps/tracker/models.py#L100).
>> > If the user refreshes the page (because it's taking so long), the process is
>> > started again and the metric updating procedure happens concurrently. This
>> > doesn't play nice with the intended use of the db and duplicated data is
>> > generated :(
>> >
>> > Now, there are multiple ways to solve this problem, each with pros and
>> > cons.
>> >
>> > 1. Enforce some locking mechanism to ensure the operation only happens once
>> > + process doesn't run concurrently
>> > - user waits
>> > - lots of db work tied to specific requests
>> >
>> > 2. Queue a celery task that runs the operation
>> > + user doesn't need to wait for results
>> > - still need to implement some locking mechanism to prevent celery tasks
>> > from running concurrently
>> > - lots of db work tied to specific requests
>> >
>> > 3. Keep the table updated from the get go
>> > + metrics are always up to date
>> > - introduces small overhead to every page view
>> > - generates metrics that are never used
>> >
>> > 4. Fix the data duplication issue that presents itself
>> > + doesn't matter if process runs concurrently
>> > - update still takes a long time
>> > - lots of db work tied to specific requests
>> >
>> > 5. Don't trigger the update process based on user actions, but rather at a
>> > predetermined time
>> > + user doesn't wait
>> > - generates metrics that are never used
>> >
>> > 6. ?
>> >
>> > Does anyone have any thoughts on this?
>> >
>> > Cheers
>> > d
>> >
>> > _______________________________________________
>> > p2pu-dev mailing list
>> > p2pu-dev at lists.p2pu.org
>> > http://lists.p2pu.org/mailman/listinfo/p2pu-dev
