Instacart Data Analysis

Hello! This project was completed as part of a Data Analytics Immersion program and is written in Python. Instacart is an online grocery store that operates through an app. The company wants to improve the targeting of its advertising strategy by analyzing its customer and sales data to generate insights into what that strategy should look like. The dataset is very large. My tasks included importing libraries and then filtering, cleaning, combining, deriving new variables, grouping, and visualizing the data. Finally, I created a presentation for the stakeholders.

Project Overview

 Objective

  • Uncover more information about Instacart's sales patterns
  • Derive insights and suggest strategies for better segmentation based on the provided criteria

Key Questions

  • The sales team needs to know what the busiest days of the week and hours of the day are (i.e., the days and times with the most orders) in order to schedule ads at times when there are fewer orders.

  • They also want to know whether there are particular times of the day when people spend the most money, as this might inform the type of products they advertise at these times.

  • Instacart has a lot of products with different price tags. Marketing and sales want to use simpler price range groupings to help direct their efforts.

  • Are there certain types of products that are more popular than others? The marketing and sales teams want to know which departments have the highest frequency of product orders.

  • The marketing and sales teams are particularly interested in the different types of customers in their system and how their ordering behaviors differ. For example:

  • What’s the distribution among users in regards to their brand loyalty (i.e., how often do they return to Instacart)?

  • Are there differences in ordering habits based on a customer’s loyalty status?
  • Are there differences in ordering habits based on a customer’s region?
  • Is there a connection between age and family status in terms of ordering habits?

Note: Instacart is a real company that’s made their data available online. However, the contents of this project brief have been fabricated for the purpose of this Achievement.

  • What different classifications does the demographic information suggest? Age? Income? Certain types of goods? Family status?

  • What differences can you find in ordering habits of different customer profiles? Consider the price of orders, the frequency of orders, the products customers are ordering, and anything else you can think of.
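
The price-range and loyalty questions above both come down to deriving a new categorical column from a numeric one. The following is a minimal sketch of that idea, not the project's actual code: the column names (prices, order_number, price_range, loyalty_flag), the toy values, and the cut-off points (5 and 15 for prices, 10 and 40 for orders) are all assumptions chosen for illustration.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "order_number": [1, 2, 45, 1, 12, 3],
    "prices": [2.5, 9.0, 18.0, 4.0, 15.0, 30.0],
})

# Simplify prices into three ranges. With the default right=True the bins
# are (0, 5], (5, 15] and (15, inf). The thresholds are assumed.
df["price_range"] = pd.cut(
    df["prices"],
    bins=[0, 5, 15, float("inf")],
    labels=["Low-range product", "Mid-range product", "High-range product"],
)

# Highest order number per user, repeated on every row of that user.
df["max_order"] = df.groupby("user_id")["order_number"].transform("max")

# Loyalty flag based on how many orders a user has placed in total.
# The thresholds (10 and 40) are assumed.
df["loyalty_flag"] = pd.cut(
    df["max_order"],
    bins=[0, 10, 40, float("inf")],
    labels=["New customer", "Regular customer", "Loyal customer"],
)

print(df[["user_id", "prices", "price_range", "max_order", "loyalty_flag"]])
```

Using transform("max") rather than a plain aggregation keeps one row per product line, so the flag can be used directly in later grouping and charts.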

Data Set

  • A number of open-source data sets from Instacart, plus a customer data set created and included for the purpose of this project. Each data set contains a different kind of information, but they all include some kind of common identifier.
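
Because the data sets share common identifiers, they can be combined with key-based merges. The sketch below shows the general pattern on toy tables; the table names, column names (order_id, product_id, prices, and so on), and values are assumptions for illustration and not the real files.

```python
import pandas as pd

# Toy stand-ins for the real files; all names and values are assumptions.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "user_id": [10, 10, 20],
    "order_number": [1, 2, 1],
})
order_products = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "product_id": [100, 101, 100, 102],
})
products = pd.DataFrame({
    "product_id": [100, 101, 102],
    "product_name": ["Bananas", "Milk", "Bread"],
    "prices": [1.5, 3.2, 2.8],
})

# Join order lines to their orders on the shared order_id key.
# indicator=True adds a "_merge" column showing where each row matched.
orders_products = orders.merge(
    order_products, on="order_id", how="outer", indicator=True
)
print(orders_products["_merge"].value_counts())

# Keep only rows found in both tables, then drop the helper column.
orders_products = orders_products[orders_products["_merge"] == "both"]
orders_products = orders_products.drop(columns="_merge")

# Attach product details on the shared product_id key.
combined = orders_products.merge(products, on="product_id", how="left")
print(combined)
```

Checking the "_merge" counts before filtering is a cheap way to spot keys that exist in only one table.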

Data Quality

  • Removed columns that are not needed in the analysis, changed variables that should not be treated as numeric to a more suitable format, and renamed columns for consistency.

  • Found missing values in days_since_prior_order for rows where order_number is 1. This is because the customer is ordering for the first time, so there is no value showing how long it has been since a prior order. I decided not to make any changes, since a value of 1 in the order_number column can act as a flag for a new customer. Found no duplicate rows in the order data.
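
The checks described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's own code: the toy values, the original column name order_dow, the new name orders_day_of_week, and the choice to store order_id as a string are all assumed.

```python
import numpy as np
import pandas as pd

# Toy orders table; values are made up and column names are assumptions.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "user_id": [10, 10, 20, 20],
    "order_number": [1, 2, 1, 2],
    "order_dow": [0, 3, 5, 6],
    "days_since_prior_order": [np.nan, 7.0, np.nan, 14.0],
})

# Rename a column for consistency (the new name is an illustrative choice).
orders = orders.rename(columns={"order_dow": "orders_day_of_week"})

# Identifiers are labels, not quantities, so store them as strings.
orders["order_id"] = orders["order_id"].astype(str)

# Count missing values per column.
missing_counts = orders.isnull().sum()
print(missing_counts)

# Confirm that every missing days_since_prior_order belongs to a first order.
missing_rows = orders[orders["days_since_prior_order"].isnull()]
only_first_orders = bool((missing_rows["order_number"] == 1).all())
print("Missing values only on first orders:", only_first_orders)

# Count fully duplicated rows.
duplicate_count = int(orders.duplicated().sum())
print("Duplicated rows:", duplicate_count)
```

If the second check holds, the missing values carry information (a first order) and can be left in place, which matches the decision described above.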

Data Analysis Process

Data preparation and analysis process:

Data Workflow

Data Consistency Checks

Data Wrangling & Subsetting

Combining & Exporting Data

Deriving New Variables

Grouping Data & Aggregating Variables

Visualization with Python

Data Visualization

Three key questions before starting

  • What type of data am I working with? Geospatial (region) and categorical (product items, product price range, departments, product orders, ordering habits, frequency of orders, customer information)
  • What do I want to communicate? Comparison (column chart, bar chart) and composition (tree map, bar chart, column chart)
  • Who is the end user and what do they need? Stakeholders (simple charts, minimal details). Derive insights and suggest strategies for better segmentation based on the provided criteria
Choose the Right Metrics
  • Times: the days and times with the most orders (busiest days of the week and hours of the day), so that ads can be scheduled at times when there are fewer orders, and the times when people spend the most money
  • Product Types (for example: certain types of products that are more popular)
  • Customer Behaviors
  • Customer Information
  • Orders (for example: number of orders, frequency of orders, etc.)
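
The time-based metrics above can be computed with simple counts and group-wise averages. The sketch below uses a toy table with one row per product line; the column names (orders_day_of_week, order_hour_of_day, prices) and all values are assumptions for illustration.

```python
import pandas as pd

# Toy table with one row per product in an order; names and values are assumed.
orders_products = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 4],
    "orders_day_of_week": [0, 0, 0, 1, 1, 6],
    "order_hour_of_day": [10, 10, 14, 10, 10, 20],
    "prices": [3.0, 7.0, 12.0, 5.0, 5.0, 9.0],
})

# Keep one row per order so that orders, not product lines, are counted.
orders = orders_products.drop_duplicates(subset="order_id")

# Number of orders per day of week and per hour of day.
orders_per_day = orders["orders_day_of_week"].value_counts().sort_index()
orders_per_hour = orders["order_hour_of_day"].value_counts().sort_index()

# Busiest day and hour (the labels with the highest counts).
busiest_day = orders_per_day.idxmax()
busiest_hour = orders_per_hour.idxmax()

# Average product price by hour, as a rough view of when people spend more.
avg_price_by_hour = orders_products.groupby("order_hour_of_day")["prices"].mean()

print(orders_per_day)
print(orders_per_hour)
print("Busiest day:", busiest_day, "| busiest hour:", busiest_hour)
print(avg_price_by_hour)
```

Days or hours with no orders at all do not appear in these counts, so the quietest slots may need to be read against the full list of 7 days and 24 hours.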
Layout
Context (context is key)
  • Context gives numbers meaning and helps the reader interpret them accurately, so I used annotations and labels in my visualizations to explain what the numbers mean.
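
As an illustration of annotating a chart for context, here is a minimal matplotlib sketch. The hourly order counts are made up, the output file name is hypothetical, and the styling is only one possible choice. It is not the chart from the project.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

# Made-up order counts for each hour of the day (illustrative only).
hours = list(range(24))
order_counts = [2, 1, 1, 1, 1, 2, 4, 8, 14, 20, 24, 23,
                22, 22, 23, 22, 21, 17, 13, 10, 7, 5, 4, 3]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(hours, order_counts)

# Labels and a title tell the reader what the numbers are.
ax.set_xlabel("Hour of day")
ax.set_ylabel("Number of orders")
ax.set_title("Orders by hour of day (illustrative data)")

# Find the peak hour and point at it with an annotation.
peak_hour = max(hours, key=lambda h: order_counts[h])
peak_value = order_counts[peak_hour]
ax.set_ylim(0, peak_value * 1.25)  # leave room above the bars for the text
note = ax.annotate(
    f"Peak: {peak_hour}:00",
    xy=(peak_hour, peak_value),               # the point being described
    xytext=(peak_hour + 4, peak_value * 1.15),  # where the text is placed
    arrowprops={"arrowstyle": "->"},
)

fig.tight_layout()
fig.savefig("orders_by_hour.png")  # hypothetical output file name
```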

Tableau Public

For a better experience, please view it on a desktop in full screen.

GitHub

My Python code for this project is available on GitHub (please click the link at the GitHub logo above).

Data Workflow

From top to bottom and left to right, the original data was manipulated and merged until it was ready to analyze.

Data Consistency Checks

Data Wrangling & Subsetting

Combining and Exporting Data

Deriving New Variables

Grouping Data & Aggregating Variables

Visualization with Python