Rockbuster Stealth LLC Data Analysis

Hello! this project is in Data Analytic Immersion program. It is Database and SQL analyst. The data set is from the video rental company or Rockbuster Stealth LLC. It is very cool project. I used pgAdmin, DBVisualize, Lucid Chart and I learned a lot about queries. I learned filtering, cleaning, summarizing, joining tables, subquery and CTE. I also did data dictionary and ERD. So far joining tables, sub query and CTE are challenging to me. Finally, I created a presentation to the stakeholders.

Project Overview

  • The Rockbuster Stealth management team is planing to use its existing movie license to launch an online video rental service in order to stay competitive
  • The Rockbuster Stealth Board has asked a series of business questions and they expect data-driven answers that they can use for their 2020 company strategy
    Here are the main questions:
    – Which movies contributed the most/least to revenue gain?
    – What was the average rental duration for all videos?
    – Which are customers with a high lifetime value based?
    – Do sales figures vary between geographic regions?

Data Set

  • A data set that contains information about Rockbuster’s film inventory, customers, and payments, among other things.

Data Quality

  • Found no missing values and no duplicated values

Data Limitations

  • Limited time

Analysis Data Process
Data Storage & Structure
Summarizing & Cleaning Data in SQL
Joining Tables of Data
Performing Subqueries
Common Table Expressions

Data Visualizaton

3 key questions before start

  • What Type of Data am i working with? Geospatial (Region) and Categorical (Genres, Rating, Movies, Rental Duration)
  • What do i want to communicate? Comparison (Column Chart, Bar Chart) , Composition (Tree Map, Bar Chart), Distribution (Map)
  • Who is the end user and what do they need? Stakeholders (Simple charts, Minimal Details). Provide the data-driven so they can use for their 2020 company strategy
Choose the Right Metrics
  • Total Sales in each Region
  • Total Revenues
  • Customer Lifetime Values
  • Average Rental Durations
Layout
Context (* Context is a key)
  • Context gives numbers meaning, and helps interpret them accurately so i used annotate points to show the meanings in my visualizations as well as the labels.

Tableau Public

 For better experience, please view it on a desktop and full screen. 

Github

There are some queries’ examples in this project. 

Data Storage & Structure

Entity Relationship Diagram (ERD)

Download ER Diagram here

Data Dictionary

Rockbuster database has a snowflake schema 

Link to see Rockbuster’s Data Dictionary

Summarizing & Cleaning Data in SQL

Find out if the film table and the customer table contain any dirty data, specifically non-uniform or duplicate data, or missing values.  

Identify Duplicates

Film Table

The result shows below and no duplicate values
sql_code

*Github’s link is coming soon

Customer Table

The result shows below and no duplicate values.

From both tables above, If there is duplicate data, there are two ways to fix them: 1. Create a virtual table, knows as a view, where you select only unique records 2. Delete the duplicate record from the table or view. And to analyze the data, there will be need to write a query that returns only unique records–by using group by and by using Distinct

None-Uniform Data

Film Table

The query below is for checking all columns but you can look closer by SELECT DISTINCT of each column and observe the result, as seen below

Customer Table

The query below is for checking all columns but you can look closer by SELECT DISTINCT of each column and observe the result, as seen below

Summarize your data: Use SQL to calculate descriptive statistics for both the film table and the customer table. For numerical columns, this means finding the minimum, maximum, and average values. For non-numerical columns, calculate the mode value. Copy-paste your SQL queries and their outputs into your answers document.

Film Table


For numerical columns and none-numerical columns:

The result is in Excel file

Customer Table

For numerical columns and none-numerical columns:

Joining Tables

Find the top 10 countries for Rockbuster in terms of customer numbers.

Find the top 10 cities within the top 10 countries identified in step 1

Find the top 5 customers in the top 10 cities who have paid the highest total amounts to Rockbuster. The customer team would like to reward them for their loyalty!

Performing Subquries

Find the average amount paid by the top 5 customers.

Find out how many of the top 5 customers are based within each country.

  • Outer query
  • Inner query

    Subquery

Common Table Expressions (CTE)

Rewrite queries from previous questions by using CTE

1.1 Find the average amount paid by the top 5 customers.

1.2 Find out how many of the top 5 customers are based within each country.