1341_movie_rating.md

Table: Movies

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| title         | varchar |
+---------------+---------+
movie_id is the primary key (column with unique values) for this table.
title is the name of the movie.

Table: Users

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
+---------------+---------+
user_id is the primary key (column with unique values) for this table.

Table: MovieRating

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| user_id       | int     |
| rating        | int     |
| created_at    | date    |
+---------------+---------+
(movie_id, user_id) is the primary key (combination of columns with unique values) for this table.
This table contains the rating of a movie by a user in their review.
created_at is the user's review date. 
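Because (movie_id, user_id) is the primary key, a user can rate each movie at most once, so counting a user's rows gives the same number as counting the distinct movies that user rated. A minimal pandas sketch, using the MovieRating sample rows from Example 1, that checks the uniqueness with `DataFrame.duplicated` and compares the per-user `size()` against `nunique()`:

```python
import pandas as pd

# MovieRating sample rows from Example 1: (movie_id, user_id, rating, created_at).
movie_rating = pd.DataFrame(
    [[1, 1, 3, '2020-01-12'], [1, 2, 4, '2020-02-11'], [1, 3, 2, '2020-02-12'], [1, 4, 1, '2020-01-01'],
     [2, 1, 5, '2020-02-17'], [2, 2, 2, '2020-02-01'], [2, 3, 2, '2020-03-01'], [3, 1, 3, '2020-02-22'],
     [3, 2, 4, '2020-02-25']],
    columns=['movie_id', 'user_id', 'rating', 'created_at'],
)

# (movie_id, user_id) is the primary key, so no pair may appear twice.
assert not movie_rating.duplicated(subset=['movie_id', 'user_id']).any()

# Consequence: rows per user equal distinct movies per user,
# so a plain row count measures "number of movies rated".
rows_per_user = movie_rating.groupby('user_id').size()
movies_per_user = movie_rating.groupby('user_id')['movie_id'].nunique()
print((rows_per_user == movies_per_user).all())  # True
```

If the key ever allowed repeated (movie_id, user_id) pairs, `nunique()` would be the safer measure.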

Write a solution to:

  • Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.

  • Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.

  • The result format is shown in the following example.
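Both tasks use the same selection rule: the highest score wins, and equal scores fall back to the smaller string. A minimal sketch in plain Python, assuming "lexicographically smaller" means ordinary string comparison (Python compares strings by Unicode code point, so uppercase letters sort before lowercase ones); `pick_winner` is a hypothetical helper, not part of the original solution, and the sample values are taken from Example 1:

```python
def pick_winner(scores: dict[str, float]) -> str:
    """Return the key with the highest score; ties go to the lexicographically smaller key."""
    # min() over (-score, key): a higher score yields a smaller -score, and for
    # equal scores the tuple comparison falls through to the string itself.
    return min(scores, key=lambda k: (-scores[k], k))


# Hypothetical usage with values from Example 1.
print(pick_winner({'Daniel': 3, 'Monica': 3, 'Maria': 2, 'James': 1}))  # Daniel
print(pick_winner({'Avengers': 3.0, 'Frozen 2': 3.5, 'Joker': 3.5}))  # Frozen 2
```

Sorting by the same (descending score, ascending name) key and taking the first row, as the pandas solution does with `sort_values`, expresses the identical rule.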

Example 1:

Input: 
Movies table:
+-------------+--------------+
| movie_id    |  title       |
+-------------+--------------+
| 1           | Avengers     |
| 2           | Frozen 2     |
| 3           | Joker        |
+-------------+--------------+
Users table:
+-------------+--------------+
| user_id     |  name        |
+-------------+--------------+
| 1           | Daniel       |
| 2           | Monica       |
| 3           | Maria        |
| 4           | James        |
+-------------+--------------+
MovieRating table:
+-------------+--------------+--------------+-------------+
| movie_id    | user_id      | rating       | created_at  |
+-------------+--------------+--------------+-------------+
| 1           | 1            | 3            | 2020-01-12  |
| 1           | 2            | 4            | 2020-02-11  |
| 1           | 3            | 2            | 2020-02-12  |
| 1           | 4            | 1            | 2020-01-01  |
| 2           | 1            | 5            | 2020-02-17  | 
| 2           | 2            | 2            | 2020-02-01  | 
| 2           | 3            | 2            | 2020-03-01  |
| 3           | 1            | 3            | 2020-02-22  | 
| 3           | 2            | 4            | 2020-02-25  | 
+-------------+--------------+--------------+-------------+
Output: 
+--------------+
| results      |
+--------------+
| Daniel       |
| Frozen 2     |
+--------------+
Explanation: 
Daniel and Monica have each rated 3 movies ("Avengers", "Frozen 2" and "Joker"), but Daniel is lexicographically smaller.
Frozen 2 and Joker both have an average rating of 3.5 in February, but Frozen 2 is lexicographically smaller.
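To make the February arithmetic concrete: Frozen 2 keeps (5 + 2) / 2 = 3.5, Joker keeps (3 + 4) / 2 = 3.5, and Avengers keeps (4 + 2) / 2 = 3.0, while the ratings dated 2020-01-12, 2020-01-01 and 2020-03-01 are excluded. A small pandas sketch that restates the example data and recomputes those averages:

```python
import pandas as pd

# Example 1 data, restated so the snippet runs on its own.
titles = {1: 'Avengers', 2: 'Frozen 2', 3: 'Joker'}
ratings = pd.DataFrame(
    [[1, 1, 3, '2020-01-12'], [1, 2, 4, '2020-02-11'], [1, 3, 2, '2020-02-12'], [1, 4, 1, '2020-01-01'],
     [2, 1, 5, '2020-02-17'], [2, 2, 2, '2020-02-01'], [2, 3, 2, '2020-03-01'], [3, 1, 3, '2020-02-22'],
     [3, 2, 4, '2020-02-25']],
    columns=['movie_id', 'user_id', 'rating', 'created_at'],
)
ratings['created_at'] = pd.to_datetime(ratings['created_at'])

# Keep February 2020 only: this drops the 2020-01-12, 2020-01-01 and 2020-03-01 rows.
feb = ratings[(ratings['created_at'].dt.year == 2020) & (ratings['created_at'].dt.month == 2)]

# Average per movie_id, then relabel the index with movie titles.
feb_avg = feb.groupby('movie_id')['rating'].mean().rename(index=titles)
print(feb_avg)
```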

Solution

```python
import pandas as pd


def movie_rating(movies: pd.DataFrame, users: pd.DataFrame, movie_rating: pd.DataFrame) -> pd.DataFrame:
    # Part 1: count ratings per user_id (not per name, so two users who share a
    # name are not merged), then sort by count descending and name ascending.
    counts = movie_rating.groupby('user_id').size().reset_index(name='cnt')
    counts = counts.merge(users, on='user_id', how='inner')
    most_ratings = counts.sort_values(by=['cnt', 'name'], ascending=[False, True])['name'].iloc[0]

    # Part 2: keep only ratings created in February 2020, average them per movie_id,
    # then sort by average descending and title ascending.
    created = pd.to_datetime(movie_rating['created_at'])
    feb = movie_rating[(created.dt.year == 2020) & (created.dt.month == 2)]
    averages = feb.groupby('movie_id')['rating'].mean().reset_index(name='avg')
    averages = averages.merge(movies, on='movie_id', how='inner')
    highest_rating = averages.sort_values(by=['avg', 'title'], ascending=[False, True])['title'].iloc[0]

    # Single "results" column: the user name first, then the movie title.
    return pd.DataFrame({'results': [most_ratings, highest_rating]})


if __name__ == '__main__':
    # Example 1 input tables.
    data = [[1, 'Avengers'], [2, 'Frozen 2'], [3, 'Joker']]
    movies = pd.DataFrame(data, columns=['movie_id', 'title']).astype({'movie_id': 'Int64', 'title': 'object'})
    data = [[1, 'Daniel'], [2, 'Monica'], [3, 'Maria'], [4, 'James']]
    users = pd.DataFrame(data, columns=['user_id', 'name']).astype({'user_id': 'Int64', 'name': 'object'})
    data = [[1, 1, 3, '2020-01-12'], [1, 2, 4, '2020-02-11'], [1, 3, 2, '2020-02-12'], [1, 4, 1, '2020-01-01'],
            [2, 1, 5, '2020-02-17'], [2, 2, 2, '2020-02-01'], [2, 3, 2, '2020-03-01'], [3, 1, 3, '2020-02-22'],
            [3, 2, 4, '2020-02-25']]
    movie_ratings = pd.DataFrame(data, columns=['movie_id', 'user_id', 'rating', 'created_at']).astype(
        {'movie_id': 'Int64', 'user_id': 'Int64', 'rating': 'Int64', 'created_at': 'datetime64[ns]'})
    print(movie_rating(movies, users, movie_ratings))
```
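The solution selects February 2020 with separate year and month tests. An equivalent filter, sketched here as a hypothetical helper `ratings_in_month`, compares against the half-open range [first day of the month, first day of the next month). Unlike an inclusive upper bound such as `2020-02-29`, the half-open form still keeps rows later on the last day if `created_at` ever carries a time of day; the schema declares it a `date`, so that case is a defensive assumption rather than something the problem requires.

```python
import pandas as pd


def ratings_in_month(ratings: pd.DataFrame, year: int, month: int) -> pd.DataFrame:
    """Return the rows whose created_at falls in the given calendar month (hypothetical helper)."""
    created = pd.to_datetime(ratings['created_at'])
    start = pd.Timestamp(year=year, month=month, day=1)
    end = start + pd.DateOffset(months=1)  # first day of the following month
    # Half-open range: include start, exclude end.
    return ratings[(created >= start) & (created < end)]


sample = pd.DataFrame({'created_at': ['2020-01-31 12:00', '2020-02-01 00:00', '2020-02-29 23:30', '2020-03-01 00:00']})
print(ratings_in_month(sample, 2020, 2)['created_at'].tolist())  # ['2020-02-01 00:00', '2020-02-29 23:30']
```

`pd.DateOffset(months=1)` also rolls December over to January 1 of the next year, so the same helper works for any month.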