Logging Access Stats
Unreleased Feature
Access stats are tracked on a per-user basis, as rollups for slices of
time. They are stored in the same Riak cluster as other MOSS data, in
the moss.access bucket.
For information about querying access stats, please read Querying Access Stats.
The 10k ft. view of MOSS access logging is:
- MOSS Webmachine resources add notes to their wrq structures.
- A Webmachine logging module has been written for moss to pull these notes off the final wrq and toss them to an aggregating gen_server.
- The aggregating gen_server periodically sends its accumulated log to an archival process.
- The archival process sums all recorded accesses for each user, and stores a record for each user for the time slice.
Log retrieval is then simply requesting from Riak all slice objects for a user in a time period.
No access data will be logged unless the user for the access is known.
A resource should record the user for the access by calling
riak_moss_access_logger:set_user/2 with the #moss_user{} record for the
user, and the webmachine request data:
    forbidden(RD, Ctx) ->
        ...
        {ok, User} = Auth:authenticate(RD, Args),
        AccessRD = riak_moss_access_logger:set_user(User, RD),
        ...
        {false, AccessRD, Ctx}.
Note that a new request data structure is returned with the note added for the logger to pick up when the request finishes.
Each access stat should be added to the request data by calling
riak_moss_access_logger:set_stat/3 with the stat name and value, and
the webmachine request data:
    accept_body(RD, Ctx) ->
        ...
        Length = size(ReceivedBody),
        AccessRD = riak_moss_access_logger:set_stat(bytes_in, Length, RD),
        ...
        {ok, AccessRD, Ctx}.
As with setting the user, a new request data structure is returned with another note added for the logger to pick up.
Stat names must be atoms or binaries containing valid UTF-8 characters (because they are stored as JSON object keys). Atoms will be converted to binaries during archival.
Stat values must be numbers. See below for how they are handled during archival.
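
The naming and value rules above can be checked before a stat is set. The following is a minimal sketch only: the module stat_check_sketch and the function valid_stat/2 are hypothetical names for illustration and are not part of riak_moss.

```erlang
-module(stat_check_sketch).
-export([valid_stat/2]).

%% Hypothetical helper (not part of riak_moss): returns true when a
%% stat name/value pair follows the rules described above.
valid_stat(Name, Value) when is_atom(Name), is_number(Value) ->
    %% Atoms are accepted as-is; they become binaries during archival.
    true;
valid_stat(Name, Value) when is_binary(Name), is_number(Value) ->
    %% unicode:characters_to_binary/1 returns a binary only when the
    %% input is valid UTF-8; otherwise it returns an error or
    %% incomplete tuple.
    is_binary(unicode:characters_to_binary(Name));
valid_stat(_Name, _Value) ->
    %% Anything else (non-numeric value, other name types) is rejected.
    false.
```

For instance, valid_stat(bytes_in, 1024) passes, while a name such as <<255>> (not valid UTF-8) or a non-numeric value is rejected.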
As resources finish their processing, the riak_moss_access_logger
module is called by Webmachine to log the access. This module
implements a gen_server that finds all of the access notes in the
request's log data and stores them until the current timeslice ends.
Storage is currently a simple ETS duplicate_bag table. The keys for
the table are user key IDs, such that all accesses for a user are
stored under the same key. Values for the table are proplists of the
stats logged for each access.
This should be fairly easy to convert to a simple append-only disk log if ETS proves to behave poorly under load, or if more durable storage for the current time period is desired.
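
To illustrate the storage layout described above, here is a small sketch of a duplicate_bag table keyed by user key ID. The module name, table name, and key values are made up for the example; the real table is owned and managed by riak_moss_access_logger.

```erlang
-module(access_table_sketch).
-export([demo/0]).

%% Illustration only: shows how a duplicate_bag keeps one object per
%% access, all under the same user key.
demo() ->
    T = ets:new(access_sketch, [duplicate_bag, private]),
    %% Two identical accesses by the same user are both kept, because
    %% a duplicate_bag does not collapse equal objects.
    true = ets:insert(T, {<<"USERKEY1">>, [{bytes_in, 100}]}),
    true = ets:insert(T, {<<"USERKEY1">>, [{bytes_in, 100}]}),
    true = ets:insert(T, {<<"USERKEY2">>, [{bytes_out, 7}]}),
    %% All accesses for one user come back from a single lookup.
    Accesses = ets:lookup(T, <<"USERKEY1">>),
    true = ets:delete(T),
    Accesses.
```

A set or bag table would drop the second identical access, which is why duplicate_bag suits a log where repeated identical requests must each be counted.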
When the current timeslice ends, riak_moss_access_logger transfers
ownership of its accumulated ETS table to riak_moss_access_archiver.
The logger module then opens a fresh ETS table for logging the next
slice's accesses.
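
The handoff described above can be sketched with ets:give_away/3, which is the standard ETS call for transferring table ownership. Whether the logger uses exactly this call is an assumption; the spawned process below is only a stand-in for the archiver, and all names are hypothetical.

```erlang
-module(handoff_sketch).
-export([demo/0]).

demo() ->
    Parent = self(),
    %% Stand-in for the archiver: waits for a table, reports how many
    %% access entries it holds, then destroys it.
    Archiver = spawn(fun() ->
        receive
            {'ETS-TRANSFER', Table, _FromPid, Slice} ->
                Parent ! {archived, Slice, ets:info(Table, size)},
                ets:delete(Table)
        end
    end),
    Old = ets:new(slice_sketch, [duplicate_bag, private]),
    true = ets:insert(Old, {<<"USERKEY1">>, [{bytes_in, 100}]}),
    %% Hand the filled table to the archiver...
    true = ets:give_away(Old, Archiver, slice_1),
    %% ...and open a fresh, empty table for the next slice.
    New = ets:new(slice_sketch, [duplicate_bag, private]),
    NewSize = ets:info(New, size),
    true = ets:delete(New),
    receive
        {archived, slice_1, Size} -> {Size, NewSize}
    after 1000 ->
        timeout
    end.
```

Giving the table away avoids copying the accumulated entries between processes; the new owner receives an 'ETS-TRANSFER' message and can read the private table from then on.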
The size of a timeslice is configured by the riak_moss application
environment variable access_archive_period. The value is expressed
as an integer number of seconds. The number given must evenly divide
one day (86400 seconds), in order to make uniform key generation for
the archive objects simple.
The default value for access_archive_period, specified in
riak_moss.app, is 3600 (one hour).
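
The divisibility rule can be made concrete with a short sketch. The module and function names below are hypothetical, and the slice boundary calculation assumes times are expressed as seconds counted from a midnight-aligned origin (for example, the values returned by calendar:datetime_to_gregorian_seconds/1).

```erlang
-module(slice_sketch).
-export([valid_period/1, slice_end/2]).

-define(DAY_SECONDS, 86400).

%% True when Period is a positive integer that evenly divides one day.
valid_period(Period) when is_integer(Period), Period > 0 ->
    ?DAY_SECONDS rem Period =:= 0;
valid_period(_) ->
    false.

%% End (exclusive) of the slice that contains NowSecs. Because Period
%% divides 86400, every midnight is also a slice boundary.
slice_end(NowSecs, Period) ->
    (NowSecs div Period + 1) * Period.
```

With the default of 3600, an access at second 7250 falls in the slice ending at 10800. A period such as 7000 fails the check, since slices would drift relative to midnight from one day to the next.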
When riak_moss_access_archiver receives an ETS table, it iterates
through the user keys stored there. For each key, the values for each
property in its associated proplists are summed. That is, if two
accesses were recorded for a key, with properties like
[{a,5},{b,1}] and [{a,7},{c,3}], the archiver will produce a
summed proplist of [{a,12},{b,1},{c,3}].
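
The summing step can be sketched as a fold over the proplists recorded for one user. This is an illustration using orddict from the standard library, not the archiver's actual code; the module and function names are hypothetical.

```erlang
-module(sum_sketch).
-export([sum_accesses/1]).

%% Takes the list of proplists recorded for one user key and returns
%% one proplist with the values for each stat name added together.
sum_accesses(Accesses) ->
    Sums = lists:foldl(
             fun(Props, Acc0) ->
                     lists:foldl(
                       fun({Stat, Value}, Acc) ->
                               %% Adds Value to the running total,
                               %% starting from Value if Stat is new.
                               orddict:update_counter(Stat, Value, Acc)
                       end, Acc0, Props)
             end, orddict:new(), Accesses),
    orddict:to_list(Sums).
```

Calling sum_sketch:sum_accesses([[{a,5},{b,1}], [{a,7},{c,3}]]) returns [{a,12},{b,1},{c,3}], matching the example in the text.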
The Erlang node name of the MOSS node is added to this summed proplist, for reference during retrieval.
Each user's summed proplist is then stored in a Riak object for the
time slice. The key for this object is currently of the form
UserKeyID.EndingTimestampISO8601. Proplists are converted to JSON
objects for storage, for maximum portability.
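
As a rough sketch of the key form above: the text does not say which ISO 8601 variant is used, so the basic format (YYYYMMDDTHHMMSSZ) below is an assumption, as are the module and function names.

```erlang
-module(slice_key_sketch).
-export([slice_key/2]).

%% Builds a key of the form UserKeyID.EndingTimestamp from a binary
%% user key ID and the slice's ending datetime (assumed to be UTC).
%% The timestamp layout is an assumption, not confirmed by the source.
slice_key(UserKeyId, {{Y, Mo, D}, {H, Mi, S}}) ->
    %% ~N..0b pads each integer with zeros to a fixed width of N.
    Ts = io_lib:format("~4..0b~2..0b~2..0bT~2..0b~2..0b~2..0bZ",
                       [Y, Mo, D, H, Mi, S]),
    iolist_to_binary([UserKeyId, $., Ts]).
```

Because the slice length evenly divides a day, every node computes the same ending timestamp for a given slice, so all nodes write to the same key for a user.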
When the archiver has finished iterating through all users logged, it destroys the table.
When a request is received for a user's access stats over some time period, the objects for all slices in that time period must be retrieved.
It is important to note that the archival process does not attempt a
read/modify/write cycle when writing a slice record. It is assumed
that the moss.access bucket has the allow_mult=true flag set, and
so multiple moss nodes writing the same slice record for the same user
will create siblings.
Siblings should be handled at read time. Unless multiple slices have been written for the same node during the same period, sibling resolution should be nothing more than a set union of all records. The HTTP resource serving the statistics expects to provide them on a node-accumulated basis. This also means that it is important to set a unique Erlang node name for each MOSS node.
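
Read-time resolution as a set union can be sketched as follows. The sketch assumes each sibling has already been decoded from JSON into a proplist and that the node name is stored under a key called node; both the representation and that key name are assumptions, and the module is hypothetical.

```erlang
-module(sibling_sketch).
-export([resolve/1, by_node/1]).

%% Set union of sibling records: each record is sorted so that
%% property order does not matter, then exact duplicates are dropped.
resolve(Siblings) ->
    lists:usort([lists:sort(Record) || Record <- Siblings]).

%% Groups the resolved records by node name for a per-node view.
%% The 'node' key is an assumed field name.
by_node(Siblings) ->
    [{proplists:get_value(node, Record), proplists:delete(node, Record)}
     || Record <- resolve(Siblings)].
```

If two MOSS nodes shared a node name, their records for a slice could not be told apart in the per-node view, which is why unique node names matter here.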