Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: Update User Activity Query #35

Merged
merged 2 commits into from
Apr 22, 2024
Merged

Conversation

FahadKhalid210
Copy link
Contributor

@FahadKhalid210 FahadKhalid210 commented Apr 5, 2024

update user activity query to improve average time spent in course

Description:

In our ongoing analysis of PT data, we've identified discrepancies in the time spent by users in courses. Additionally, relying solely on "browser" events for these calculations has proven insufficient. After thorough investigation, we've determined that extending the time span from 10 seconds to 120 seconds and selecting all events where the course ID is not null provides a more accurate representation of the time users spend in courses.

Therefore, this pull request updates the query to enhance the accuracy of the average time spent in courses.

Before:
image

After:
image

@DawoudSheraz DawoudSheraz self-requested a review April 16, 2024 09:17
@@ -1909,7 +1909,7 @@
"offset": 0,
"params": "{\"remote_id\": 49, \"database_name\": \"admin\"}",
"schema": "openedx",
"sql": "SELECT user_id,\n course_id,\n time,\n arrayJoin(range(floor(toUInt64(toUnixTimestamp(time) - 5)), ceil(toUInt64(toUnixTimestamp(time) + 5)))) AS timestamp\nFROM events\nWHERE event_source = 'browser'",
"sql": "SELECT user_id,\n course_id,\n time,\n arrayJoin(range(floor(toUInt64(toUnixTimestamp(time))), ceil(toUInt64(toUnixTimestamp(time) + 120)))) AS timestamp\nFROM events\nWHERE course_id <> '' and user_id > 0",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are ignoring browser entirely in this new query? Why is that? The timestamp change is understandable.
Also, if we can add some visualizations (before/after), it would make it easier to understand the impact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are now picking both events(browser and server) where we have course_id and user_id because some events have source "server"(e.g edx.grades.problem.submitted) which we missed previously.

@@ -0,0 +1 @@
- [Improvement] Update User Activity dataset query to improve average time spent in course. (by @Fahadkhalid210)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can add context here on why this change provides a better data then before.

@FahadKhalid210 FahadKhalid210 merged commit 3190fea into master Apr 22, 2024
2 checks passed
@FahadKhalid210 FahadKhalid210 deleted the fahad-updateUAquery branch April 22, 2024 05:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

2 participants