Bab 6 Persentasi Tugas Akhir

Pada pertemuan ini akan dilakukan evaluasi dan peninjauan ulang materi yang sudah dipelajara selama pelatihan Data Analytics with Python & SQL. Pertama-tama akan direview materi sebelumnya dengan simulasi studi kasus.

6.1 Studi Kasus jne_db

Database jne_db adalah database fiktif yang dikembangkan oleh dscienclabs. Kumpulan data tersebut berisi informasi tentang pelanggan, produk, pesanan, logistik, kegiatan, dan kampanye pemasaran digital. Database ini digunakan dalam pelatihan kepada praktisi data di industri yang bertujuan untuk pemahaman analisis data, penemuan ide, pengembangan pengetahuan. Berikut ini adalah struktur database yang digunakan:

6.2 Create Database

Pada bagian ini diperlihatkan koding Python dan sintaks PostgreSQL untuk membentuk database baru yang diberi nama jne_db.

# load modul yg diperlukan
import psycopg2

# buat koneksi python ke PostgreSQL (127.0.0.1 atau localhost)
konek = psycopg2.connect(user='postgres', 
                        password='123', 
                        host='localhost', 
                        port= '5432'
)
konek.autocommit = True

# Eembuat kursor eksekutor 
kursor = konek.cursor()

# Kueri
sql = '''CREATE DATABASE JNE_db''';

#Creating a database
kursor.execute(sql)

6.3 Import Data CSV

Karena kita ingin membambahkan kumpulan data tabel ke database jne_db dari file CSV maka terlebih dahulu, dilakukan import data sebagai berikut:

import os
os.chdir('C:/Users/user/Desktop/jne_db')

6.4 Konversi Struktur Tabel

Pada bagian ini, kita melakukan penyimpanan tabel yang telah diimport dari file CSV ke database PostgreSQL.

# import pandas lib as pd
import pandas as pd

# read by default 1st sheet of an excel file
df1 = pd.read_csv('distribution_centers.csv',sep=';')

df2 = pd.read_csv('events.csv',sep=',')
df2['created_at'] = df2['created_at'].str.replace('UTC', '')
df2['created_at'] = pd.to_datetime(df2['created_at'])

df3 = pd.read_csv('inventory_items.csv',sep=',')
cols = df3.columns[2:4]
df3['created_at'] = df3['created_at'].str.replace('UTC', '')
df3['sold_at'] = df3['sold_at'].str.replace('UTC', '')
df3[cols] = df3[cols].apply(pd.to_datetime)

df4 = pd.read_csv('order_items.csv',sep=',')
df4['created_at'] = df4['created_at'].str.replace('UTC', '')
df4['shipped_at'] = df4['shipped_at'].str.replace('UTC', '')
df4['delivered_at'] = df4['delivered_at'].str.replace('UTC', '')
df4['returned_at'] = df4['returned_at'].str.replace('UTC', '')
cols = df4.columns[6:10]
df4[cols] = df4[cols].apply(pd.to_datetime)

df5 = pd.read_csv('orders.csv',sep=',')
df5['created_at'] = df5['created_at'].str.replace('UTC', '')
df5['shipped_at'] = df5['shipped_at'].str.replace('UTC', '')
df5['delivered_at'] = df5['delivered_at'].str.replace('UTC', '')
df5['returned_at'] = df5['returned_at'].str.replace('UTC', '')
cols = df5.columns[4:8]
df5[cols] = df5[cols].apply(pd.to_datetime)

df6 = pd.read_csv('products.csv',sep=',')

df7 = pd.read_csv('users.csv',sep=',')
df7['created_at'] = df7['created_at'].str.replace('UTC', '')
df7['created_at'] = pd.to_datetime(df7['created_at'])

6.5 Simpan Tabel ke Database

Pada bagian ini, kita melakukan penyimpanan tabel yang telah diimport dari file CSV ke database PostgreSQL.

import pandas as pd
from sqlalchemy import create_engine
import psycopg2
from sqlalchemy import text
engine = create_engine('postgresql://postgres:123@localhost:5432/jne_db')

df1.to_sql("distribution_centers", engine)
df2.to_sql("events", engine)
df3.to_sql("inventory_items", engine)
df4.to_sql("order_items", engine)
df5.to_sql("orders", engine)
df6.to_sql("products", engine)
df7.to_sql("users", engine)

Catatan: Sebaiknya lakukan Import Data, Konversi, dan Penyimpanan database menggunakan Jupterlab.

6.6 Contoh Soal 1

Visualisasikan total order berdasarkan status!

query = '''SELECT date_trunc('MONTH', created_at) AS Month_Year, 
           status, 
           COUNT(DISTINCT user_id) as total_unique_users,
           COUNT(DISTINCT order_id) as total_orders,
           SUM(sale_price) as total_sale_price
            FROM order_items
              WHERE 
              DATE(created_at) BETWEEN '2020-01-01' AND '2023-07-31'
              GROUP BY 1,2
              ORDER BY 1,2
        '''

df = pd.read_sql(query, engine)
df 
crosstb=pd.crosstab(df['month_year'], df['status'], df['total_orders'], aggfunc='sum')
crosstb.head(5)
import matplotlib.pyplot as plt
import plotly.express as px
fig1 = px.histogram(df, x="month_year", color="status",
             y='total_orders',
             title="Total Order Berdasarkan Status",
             barmode='group',
             height=500
            )

fig1.show()

Jika ditampilkan dengan Line-Chart, sebagai berikut:

import matplotlib.pyplot as plt
import plotly.express as px
fig2 = px.line(df, x="month_year", color="status",
             y='total_orders',
             title="Total Order Berdasarkan Status",
             height=200
            )

fig2.show()

6.7 Contoh Soal 2

Hitung dan Visualiasikan AOV order_item yang closed setiap bulannya!

Catatan: Average Order Value (AOV) adalah sebuah indikator yang menghitung besar pengeluaran pelanggan setiap kali mereka memesan atau membeli barang. Jadi, secara matematis, rumusnya adalah:

\[AOV = \text{Total penghasilan dalam periode A} / \text{Total transaksi dalam periode A}\]

Periode A yang dimaksud di sini dapat berupa 1 tahun, 3 bulan, 2 minggu, atau bahkan sehari. Misalkan, AOV brand tersebut adalah Rp30.000, yang berarti secara rata-rata pembeli mengeluarkan Rp30.000 setiap transaksi.

with engine.begin() as conn:
    query = text('''SELECT date_trunc('month', created_at) AS Month_Year,
    ROUND((COUNT(DISTINCT order_id)/COUNT(DISTINCT user_id)),2) AS frequencies,
    ((SUM(sale_price)/COUNT(DISTINCT order_id))) AS AOV,
    COUNT(DISTINCT user_id) AS total_unique_users
FROM order_items
WHERE 
  status = 'Complete'
  AND DATE(created_at) BETWEEN '2019-01-01' AND '2022-08-31'
GROUP BY 1
ORDER BY 1''')

    df = pd.read_sql_query(query, conn)
df

6.8 Contoh Soal 3

Carilah Produk yang paling Menguntungkan!

with engine.begin() as conn:
    query = text('''WITH
main AS (SELECT name AS product_name, 
    products.id AS products_id,
    ROUND(retail_price) AS retail_price,
    ROUND(cost) AS cost,
    SUM(sale_price-cost) AS profit
  FROM products 
  JOIN order_items 
    ON products.id = order_items.product_id
  WHERE status = 'Complete'
  GROUP BY 1,2,3,4), 
  top_least AS (SELECT  *, RANK() OVER (ORDER BY profit ASC) AS top_rank FROM main), 
  top_most AS(SELECT  *, RANK() OVER (ORDER BY profit DESC) AS top_rank FROM main)
  SELECT * , 'least profit' AS rank_values FROM top_least 
UNION ALL
SELECT *, 'most profit' AS rank_values FROM top_most
ORDER BY top_rank, rank_values DESC
LIMIT 10''')
    df = pd.read_sql_query(query, conn)

df

6.9 Contoh Soal 4

Dalam laporan ini kamu akan melihat beberapa detail penjualan seperti: Total Transactions: Total nominal transaksi seluruh outlet(Dalam Rupiah) Total Transaction Count: Total jumlah transaksi si seluruh outlet. Total Outlet: Total jumlah outlet yang bertransaksi di akun tersebut.

with engine.begin() as conn:
    query = text("""WITH
total_profit AS (SELECT 
    DATE(orders.shipped_at) AS order_date, 
    products.category AS product_categories, 
    SUM(sale_price-cost) AS profit
  FROM products 
  INNER JOIN order_items orders 
  ON products.id = orders.product_id
  WHERE status = 'Complete'
    AND orders.created_at BETWEEN '2022-06-01'AND '2022-08-15'
  GROUP BY 1,2
  ORDER BY 2,1
),
MTD_table AS (SELECT order_date,
   product_categories, 
   profit,
   SUM(profit) OVER(PARTITION BY product_categories, EXTRACT(MONTH FROM order_date) ORDER BY 2,1) 
   AS MTD
  FROM total_profit
  ORDER BY 2,1
)
SELECT 
  order_date, product_categories, round(MTD) AS MTD
FROM MTD_table
WHERE order_date BETWEEN '2022-06-01'AND '2022-08-15'
  AND EXTRACT(DAY FROM order_date) = 15""")
    df = pd.read_sql_query(query, conn)

df

Nb: Total Penjualan Month to Date (MTD)

6.10 Contoh Soal 5

Persentasi perbandingan total inventori sebelum dan seudah

with engine.begin() as conn:
    query = text("""WITH inventory AS
(  SELECT 
    date_trunc('month', created_at) AS Month_Year
    , product_category AS categories
    , COUNT(id) AS total_inventory
  FROM inventory_items
  WHERE created_at BETWEEN '2019-12-01' AND '2022-04-30'
  GROUP BY 1,2 ) ,
previous_inventory as 
( SELECT *
  , LAG (total_inventory) over (partition by inventory.categories order by date_trunc('month', Month_year)DESC,inventory.categories) AS total_prev_inv
  FROM inventory )

SELECT *, ROUND((total_inventory-total_prev_inv)/total_prev_inv*100) AS Growth
FROM previous_inventory
ORDER BY 1,2""")
    df = pd.read_sql_query(query, conn)

df

6.11 Contoh Soal 6

Menghitung jumlah revenue dan order item!

with engine.begin() as conn:
    query = text("""SELECT 
  DATE_TRUNC('month',oi.created_at) AS order_date,
  SUM(oi.sale_price*o.num_of_item) AS revenue,
  COUNT(DISTINCT oi.order_id) AS order_count,
  COUNT(DISTINCT oi.user_id) AS customers_purchased
FROM order_items AS oi
LEFT JOIN orders AS o 
ON oi.order_id = o.order_id
WHERE oi.status NOT IN ('Cancelled','Returned')
GROUP BY 1
ORDER BY 1 DESC""")
    df = pd.read_sql_query(query, conn)

df

6.12 Contoh Soal 7

Jumlah Pelanggan berdasarkan Negara, Pisahkan Berdasarkan Gender!

with engine.begin() as conn:
    query = text("""WITH
cust AS (
  SELECT 
    DISTINCT oi.user_id,
    SUM(CASE WHEN u.gender = 'M' THEN 1 ELSE null END) AS male,
    SUM(CASE WHEN u.gender = 'F' THEN 1 ELSE null END) AS female,
    u.country AS country
  FROM order_items AS oi
  INNER JOIN users AS u  
  ON oi.user_id = u.id
  WHERE oi.status NOT IN ('Cancelled','Returned')
  GROUP BY 1, 4
)

SELECT
  c.country,
  COUNT(DISTINCT c.user_id) AS customers_count,
  COUNT(c.female) AS female,
  COUNT(c.male) AS male
FROM cust AS c
GROUP BY 1
ORDER BY 2 DESC""")
    df = pd.read_sql_query(query, conn)

df

6.13 Contoh Soal 8

Buatlah Visualisasi Jumlah Pelanggan berdasarkan Kategori

with engine.begin() as conn:
    query = text("""SELECT
  CASE 
    WHEN u.age <15 THEN 'Kids'
    WHEN u.age BETWEEN 15 AND 24 THEN 'Teenager'
    WHEN u.age BETWEEN 25 AND 50 THEN 'Adult'
    WHEN u.age >50 THEN 'Eldery' END AS age_group,
  COUNT(DISTINCT oi.user_id) total_customer
FROM order_items oi
LEFT JOIN users u
ON oi.user_id = u.id
WHERE oi.status NOT IN ('Cancelled','Returned')
GROUP BY 1
ORDER BY 2 DESC""")
    df = pd.read_sql_query(query, conn)

df

6.14 Ujian

6.14.1 Onsite

  • Hitunglah persentasi Status Pemesanan Produk menggunakan (Pie-chart)!
  • Total jumlah barang Terjual dengan Status Complete Setiap Tahunnya (Pie-chart)!
  • Total jumlah barang Terjual berdasarkan Kategori Produk (Bar-chart)!
  • Total Revenue Barang Terjual berdasarkan Kategori Produk (Bar-chart)!
  • Total Sale Price, Cost, dan Laba Kotor bulanan Multi Line-Chart!

6.14.2 Take Home

  • Buatlah Visualisasi Data Customer PEREMPUAN berdasarkan Negara dengan memperhatikan status “RETURNED” saja.

  • Visualisasi total orders complete pada tahun 2022

  • Bandingkan total order yang Complete dan Returned perbulan dalam satu grafik garis?

  • Produk category apa yang memiliki penjualan tertinggi (revenue) pada setiap tahun (dibandingakan)? (2020-2023) dan Buatkan Visualisasi datanya!

    note: status nya harus Complete,Shipped,Processing. Jangan masukan status yang Cancelled dan Returned

    Hint : sale_price pada order_items dikali dengan num_of_time pada orders untuk menentukan revenue jadi ada 3 tabel yang digunakan yaitu products, orders,order_items

  • Visualisasi Data Customers dengan berdasarkan Country menggunakan Pie Chart

  • Melihat Penjualan Tahunan berdasarkan status yang ’COMPLETE” berdasarkan profit dan reveneunya dengan menerapkannya pada visualisasi data.