[Data Product] Content Rating Updater
Sowmya N Dixit edited this page Jun 29, 2022 · 1 revision
- Type - Content rating updater: updates the content model with the average rating
- Computation Level - Level 1
- Frequency - Runs Daily
- Generates a Graph Update Event with the average rating of each content, computed from consumption data, and pushes it to ES, where it is used to create dashboards.
- Raw Telemetry - FEEDBACK Event
- Previous content rating summary from DB
1. Update content_rating_summary table in content_db
#Schema of data model
{
"period": String, // Data sync date in YYYY-MM-DD format. For ex: 2019-05-04
"content_id": String, // content id
"content_type": String, // content type
"total_rating": Double, // Sum of ratings for a content for the period
"total_count": Long, // Number of times the content has been rated.
"avg_rating": Double // Average rating on content
}
#Schema of table
CREATE TABLE content_rating_summary (
period text,
content_id text,
content_type text,
total_rating double,
total_count bigint,
avg_rating double,
PRIMARY KEY (content_id, period)
);
2. Generate Graph Update Event and push to Kafka topic learning.graph.events.
#Schema of Graph Update Event
{
"ets" : Long, // Event generation time in epoch
"nodeUniqueId" : String, // content id
"operationType": String, // default to UPDATE
"nodeType": String, // default to DATA_NODE
"graphId": String, // default to domain
"objectType": String, // object type - Resolve object type from `content_type` field
"nodeGraphId": Int, // default to 0
"transactionData" : {
"properties" : {
"me_averageRating" : {
"ov" : Double,
"nv" : Double
}
}
}
}
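As an illustration of the schema above, here is a minimal Python sketch that fills in the defaults listed in the comments. The function name `build_graph_update_event` and its parameters are hypothetical. The sketch assumes `ets` is in epoch milliseconds, and that `ov` and `nv` carry the previous and the new average rating; the source does not state either. How `objectType` is resolved from `content_type` is not specified here, so the caller passes it in.

```python
import time
from typing import Any, Dict, Optional


def build_graph_update_event(
    content_id: str,
    object_type: str,
    old_avg: Optional[float],
    new_avg: float,
    ets_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a dict shaped like the Graph Update Event schema (hypothetical helper)."""
    return {
        # Assumption: epoch time in milliseconds.
        "ets": ets_ms if ets_ms is not None else int(time.time() * 1000),
        "nodeUniqueId": content_id,
        "operationType": "UPDATE",   # default per schema
        "nodeType": "DATA_NODE",     # default per schema
        "graphId": "domain",         # default per schema
        "objectType": object_type,   # resolved from content_type by the caller
        "nodeGraphId": 0,            # default per schema
        "transactionData": {
            "properties": {
                # Assumption: ov = previous value, nv = new value.
                "me_averageRating": {"ov": old_avg, "nv": new_avg}
            }
        },
    }
```

The resulting dict can be serialized with `json.dumps` before it is pushed to the topic.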
1. Update content_rating_summary table in content_db
Computation Table:
- Filter FEEDBACK events and group by content_id
Field | Computation | Remark |
---|---|---|
content_id | object.id value | |
content_type | object.type value | |
period | Get the sync date in YYYY-MM-DD format | Period is added to avoid replay complexity. If a replay is run for the last 2 days, only those 2 records for each content are updated, and the final average for the graph event is recomputed from all records |
total_rating | Sum of edata.rating | |
total_count | Count of FEEDBACK events | |
avg_rating | total_rating/total_count | |
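The computation table can be sketched in plain Python as below. This is an illustration only, not the job's actual implementation: the function name `summarise_feedback` is hypothetical, events are assumed to be dicts with an `eid` field identifying the event type, and FEEDBACK events without a rating or content id are skipped (an assumption the table does not spell out).

```python
from typing import Any, Dict, Iterable, List


def summarise_feedback(events: Iterable[Dict[str, Any]], period: str) -> List[Dict[str, Any]]:
    """Group FEEDBACK events by content_id and compute one summary row per content."""
    rows: Dict[str, Dict[str, Any]] = {}
    for ev in events:
        # Assumption: the event type is carried in the "eid" field.
        if ev.get("eid") != "FEEDBACK":
            continue
        obj = ev.get("object") or {}
        rating = (ev.get("edata") or {}).get("rating")
        content_id = obj.get("id")
        # Assumption: events with no content id or no rating are ignored.
        if content_id is None or rating is None:
            continue
        row = rows.setdefault(content_id, {
            "period": period,                  # sync date, YYYY-MM-DD
            "content_id": content_id,          # object.id
            "content_type": obj.get("type"),   # object.type
            "total_rating": 0.0,
            "total_count": 0,
        })
        row["total_rating"] += float(rating)   # sum of edata.rating
        row["total_count"] += 1                # count of FEEDBACK events
    for row in rows.values():
        row["avg_rating"] = row["total_rating"] / row["total_count"]
    return list(rows.values())
```

Because each row is keyed by `(content_id, period)`, writing the rows back as upserts means a replay for a given date overwrites only that date's rows.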
2. Generate Graph Update Event
- Get the list of unique content_ids from FEEDBACK events.
- Get all the entries in the Cassandra table for each of those contents.
- Compute Sum(total_rating), Sum(total_count) from Cassandra data.
- Compute average_rating as Sum(total_rating)/Sum(total_count) and generate a Graph Update Event for each content.
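The aggregation in the steps above divides the summed totals rather than averaging the per-period averages, so each period is weighted by its number of ratings. A minimal Python sketch, assuming the per-period rows have already been read from the table into dicts (the function name `average_rating_by_content` is hypothetical):

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def average_rating_by_content(rows: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Return Sum(total_rating) / Sum(total_count) per content_id across all periods."""
    sums = defaultdict(lambda: [0.0, 0])  # content_id -> [rating sum, rating count]
    for row in rows:
        acc = sums[row["content_id"]]
        acc[0] += row["total_rating"]
        acc[1] += row["total_count"]
    # Contents with a zero count are left out to avoid division by zero.
    return {cid: total / count for cid, (total, count) in sums.items() if count > 0}


rows = [
    {"content_id": "c1", "period": "2019-05-03", "total_rating": 9.0, "total_count": 2},
    {"content_id": "c1", "period": "2019-05-04", "total_rating": 3.0, "total_count": 3},
]
averages = average_rating_by_content(rows)
# c1: (9.0 + 3.0) / (2 + 3) = 2.4, whereas the mean of the daily averages
# (4.5 and 1.0) would be 2.75.
```

Each resulting value would then be placed in the `me_averageRating` property of that content's Graph Update Event.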