
Logging Access Stats

Bryan Fink edited this page Feb 1, 2012 · 13 revisions

Logging User Access Statistics

Unreleased Feature

Access stats are tracked on a per-user basis, as rollups for slices of time. They are stored in the same Riak cluster as other MOSS data, in the moss.access bucket.

For information about querying access stats, please read Querying Access Stats.

High Level

The 10k ft. view of MOSS access logging is:

  1. MOSS Webmachine resources add notes to their wrq structures.
  2. A Webmachine logging module written for MOSS pulls these notes off the final wrq and passes them to an aggregating gen_server.
  3. The aggregating gen_server periodically sends its accumulated log to an archival process.
  4. The archival process sums all recorded accesses for each user, and stores a record for each user for the time slice.

Log retrieval is then simply requesting from Riak all slice objects for a user in a time period.

Logging in a Resource

Setting the User

No access data will be logged unless the user for the access is known. A resource should record the user for the access by calling riak_moss_access_logger:set_user/2 with the #moss_user{} record for the user, and the webmachine request data:

forbidden(RD, Ctx) ->
    ...
    {ok, User} = Auth:authenticate(RD, Args),
    AccessRD = riak_moss_access_logger:set_user(User, RD),
    ...
    {false, AccessRD, Ctx}.

Note that a new request data structure is returned with the note added for the logger to pick up when the request finishes.

Adding a Stat

Each access stat should be added to the request data by calling riak_moss_access_logger:set_stat/3 with the stat name and value, and the webmachine request data:

accept_body(RD, Ctx) ->
    ...
    Length = byte_size(ReceivedBody),
    AccessRD = riak_moss_access_logger:set_stat(bytes_in, Length, RD),
    ...
    {ok, AccessRD, Ctx}.

As with setting the user, a new request data structure is returned with another note added for the logger to pick up.

Stat names must be atoms or binaries containing valid UTF-8 characters (because they are stored as JSON object keys). Atoms will be converted to binaries during archival.

Stat values must be numbers. See below for how they are handled during archival.
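
The naming and value rules above can be sketched as a small check. This is only an illustration: the module name stat_check_sketch and the function normalize/1 are hypothetical and are not part of riak_moss, and the real archiver may perform the conversion differently.

```erlang
-module(stat_check_sketch).
-export([normalize/1]).

%% Hypothetical helper, not part of riak_moss.
%% Atom names become UTF-8 binaries; values must be numbers.
normalize({Name, Value}) when is_atom(Name), is_number(Value) ->
    {atom_to_binary(Name, utf8), Value};
normalize({Name, Value}) when is_binary(Name), is_number(Value) ->
    %% A valid UTF-8 binary round-trips unchanged; anything else
    %% comes back as an {error, ...} or {incomplete, ...} tuple.
    case unicode:characters_to_binary(Name, utf8, utf8) of
        Name -> {Name, Value};
        _Invalid -> erlang:error(badarg)
    end.
```

Under these assumptions, normalize({bytes_in, 10}) returns {<<"bytes_in">>, 10}, while a non-numeric value fails with a function_clause error.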

Log Accumulation

As resources finish their processing, the riak_moss_access_logger module is called by Webmachine to log the access. This module implements a gen_server that finds all of the access notes in the request's log data and stores them until the current timeslice ends.

Storage is currently a simple ETS duplicate_bag table. The keys for the table are user key IDs, such that all accesses for a user are stored under the same key. Values for the table are proplists of the stats logged for each access.
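
A minimal sketch of that table layout follows. The table name, user key IDs, and stat names are made up for illustration; only the duplicate_bag type and the key/value shape come from the description above.

```erlang
-module(access_table_sketch).
-export([demo/0]).

%% Illustrative only: table name and key IDs are hypothetical.
demo() ->
    Table = ets:new(access_sketch, [duplicate_bag, private]),
    %% Two accesses by the same user are kept as two separate rows.
    true = ets:insert(Table, {<<"USERKEY1">>, [{bytes_in, 5}]}),
    true = ets:insert(Table, {<<"USERKEY1">>, [{bytes_in, 7}, {bytes_out, 3}]}),
    true = ets:insert(Table, {<<"USERKEY2">>, [{bytes_out, 1}]}),
    %% A lookup returns every row stored under the key.
    Accesses = [Stats || {_Key, Stats} <- ets:lookup(Table, <<"USERKEY1">>)],
    true = ets:delete(Table),
    Accesses.
```

Because the table is a duplicate_bag, nothing is overwritten or merged at insert time; summing is deferred to archival.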

This should be fairly easy to convert to a simple append-only disk log if ETS proves to behave poorly under load, or if more durable storage for the current time period is desired.

When the current timeslice ends, riak_moss_access_logger transfers ownership of its accumulated ETS table to riak_moss_access_archiver. The logger module then opens a fresh ETS table for logging the next slice's accesses.
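
One way to hand a table to another process is ets:give_away/3. Whether riak_moss uses exactly this call is an assumption; the sketch below uses a throwaway process in place of the archiver and a placeholder atom in place of the real slice information.

```erlang
-module(table_handoff_sketch).
-export([demo/0]).

demo() ->
    Parent = self(),
    %% Stand-in for the archiver: waits for a table, counts its rows,
    %% destroys it, and reports back.
    Archiver = spawn(fun() ->
        receive
            {'ETS-TRANSFER', Table, _From, Slice} ->
                Rows = ets:info(Table, size),
                true = ets:delete(Table),
                Parent ! {archived, Slice, Rows}
        end
    end),
    Old = ets:new(access_sketch, [duplicate_bag, private]),
    true = ets:insert(Old, {<<"USERKEY1">>, [{bytes_in, 5}]}),
    %% Ownership moves to the archiver; the third argument is
    %% delivered to it in the 'ETS-TRANSFER' message.
    true = ets:give_away(Old, Archiver, slice_placeholder),
    %% The logger side immediately starts a fresh table.
    Fresh = ets:new(access_sketch, [duplicate_bag, private]),
    receive
        {archived, slice_placeholder, N} ->
            true = ets:delete(Fresh),
            {ok, N}
    after 1000 ->
        timeout
    end.
```

Handing over the whole table avoids copying the accumulated rows between processes, and the logger never blocks on archival.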

Timeslice Duration

The size of a timeslice is configured by the riak_moss application environment variable access_archive_period. The value is expressed as an integer number of seconds. The number given must evenly divide one day (86400 seconds), in order to make uniform key generation for the archive objects simple.

The default value for access_archive_period, specified in riak_moss.app, is 3600 (one hour).
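
The divisibility rule and the resulting slice boundaries can be sketched as follows. The module and function names are hypothetical, and treating an instant that falls exactly on a boundary as the start of the next slice is an assumption, not something the source specifies.

```erlang
-module(slice_sketch).
-export([valid_period/1, slice_end/2]).

%% A period is acceptable when it is a positive integer number of
%% seconds that evenly divides one day.
valid_period(Period) when is_integer(Period), Period > 0 ->
    86400 rem Period =:= 0;
valid_period(_) ->
    false.

%% End of the slice containing NowSecs (seconds since midnight UTC
%% of some day, or Unix epoch seconds). Because Period divides
%% 86400, the boundaries line up with midnight every day.
slice_end(NowSecs, Period) ->
    ((NowSecs div Period) + 1) * Period.
```

For example, valid_period(3600) is true and valid_period(7000) is false, and slice_end(3599, 3600) is 3600.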

Log Archival

When riak_moss_access_archiver receives an ETS table, it iterates through the user keys stored there. For each key, the values for each property in its associated proplists are summed. That is, if two accesses were recorded for a key, with properties like [{a, 5},{b,1}] and [{a,7},{c,3}], the archiver will produce a summed proplist of [{a,12},{b,1},{c,3}].
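
The summing step can be sketched with orddict:update_counter/3. This is an illustration under the assumption that each access is a proplist of {Stat, Number} pairs; the module name is hypothetical and the real archiver may be written differently.

```erlang
-module(sum_sketch).
-export([sum_accesses/1]).

%% Sum a list of access proplists into one proplist, sorted by stat name.
sum_accesses(Accesses) ->
    lists:foldl(fun sum_access/2, orddict:new(), Accesses).

%% Add every stat of one access to the running totals. A stat that
%% has not been seen yet starts at the value being added.
sum_access(Access, Totals) ->
    lists:foldl(
      fun({Stat, Value}, Acc) ->
              orddict:update_counter(Stat, Value, Acc)
      end,
      Totals,
      Access).
```

Calling sum_accesses([[{a, 5}, {b, 1}], [{a, 7}, {c, 3}]]) returns [{a, 12}, {b, 1}, {c, 3}], matching the example above.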

The Erlang node name of the MOSS node is added to this summed proplist, for reference during retrieval.

Each user's summed proplist is then stored in a Riak object for the time slice. The key for this object is currently of the form UserKeyID.EndingTimestampISO8601. Proplists are converted to JSON objects for storage, for maximum portability.
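
A sketch of building such a key is shown below. The exact ISO 8601 variant is not stated in this page, so the basic format with a trailing Z (UTC) used here is an assumption, as are the module and function names.

```erlang
-module(slice_key_sketch).
-export([slice_key/2]).

%% Build "UserKeyID.EndingTimestamp" from a user key ID (binary or
%% string) and the slice's ending time as a calendar datetime tuple.
%% Assumption: basic ISO 8601 format in UTC, e.g. 20120201T130000Z.
slice_key(UserKeyId, {{Y, Mo, D}, {H, Mi, S}}) ->
    Stamp = io_lib:format("~4..0b~2..0b~2..0bT~2..0b~2..0b~2..0bZ",
                          [Y, Mo, D, H, Mi, S]),
    iolist_to_binary([UserKeyId, $., Stamp]).
```

With these assumptions, slice_key(<<"USERKEY1">>, {{2012, 2, 1}, {13, 0, 0}}) yields <<"USERKEY1.20120201T130000Z">>. Because the period evenly divides a day, every node computes the same ending timestamp for a given slice, and therefore the same key.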

When the archiver has finished iterating through all users logged, it destroys the table.

Archive Retrieval

When a request is received for a user's access stats over some time period, the objects for all slices in that time period must be retrieved.

It is important to note that the archival process does not attempt a read/modify/write cycle when writing a slice record. It is assumed that the moss.access bucket has the allow_mult=true flag set, and so multiple MOSS nodes writing the same slice record for the same user will create siblings.

Siblings should be handled at read time. Unless multiple slices have been written for the same node during the same period, sibling resolution should be nothing more than a set union of all records. The HTTP resource serving the statistics expects to provide them on a node-accumulated basis. This also means that it is important to set a unique Erlang node name for each MOSS node.
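
Read-time resolution by set union can be sketched as below. The sketch assumes each sibling has already been decoded into a list of per-node records (summed proplists that include the node name); the module name, the record shape, and the <<"node">> key are hypothetical.

```erlang
-module(sibling_sketch).
-export([resolve/1]).

%% Set union of the records found in all siblings. lists:usort/1
%% sorts the combined list and drops exact duplicates, so a record
%% that appears in several siblings is kept once.
resolve(Siblings) ->
    lists:usort(lists:append(Siblings)).
```

For example, if one sibling holds a record from moss1@host and another holds that same record plus one from moss2@host, resolve/1 returns two records, one per node. Records from different nodes never compare equal because they carry different node names, which is why unique node names matter.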