visit
We only have one table for analysis. The table has one million rows (events) and five columns originally (excluding the column name
that we created as the primary key). Here is a brief description of our table:
· id
is the primary key column for this table. Since the original dataset doesn’t have a primary key, so we have to generate an auto-increment primary key for the table.
· user_id
is the id for each user.
· product_id
is the id for each product.
· product_category_id
is the id for each product category
· type_of_behavior
describes users’ actions, which include “pv”, “buy”, “cart”, and “fav”
· timestamp
describes the timestamp of each event
CREATE TABLE 'ALI_USER_STAGE" (
"user_id' numeric,
"product_id" numeric,
"product_category_id" numeric,
"type_of_behavior" varchar(255)
"timestamp" numeric)
CREATE TABLE 'ALI_USER' (
"id" int NOT NULL AUTO INCREMENT,
"user_id" numeric, "product_id" numeric,
"product_category_id" numeric,
"type_of_behavior" varchar(255),
"timestamp" datetime,
PRIMARY KEY ("id"))
INSERT INTO ALI_USER(user_id, product_id, product_category_id, type_of_behavior, timestamp)
SELECT user_id, product_id, product_category_id, type_of_behavior, FROM_UNIXTIME(timestamp, ‘Y-%m-%d %H:%m:%S’) FROM_ALI_USER_STAGE
SELECT HOUR(timestamp) AS hour, COUNT (user_id) AS hourly_total_customer FROM ALI_USER GROUP BY hour ORDER BY hour
SELECT DATE_FORMAT(timestamp, "%m-%d-%Y %W") AS date, COUNT(*) AS visit_count FROM ALI_USER WHERE timestamp BETWEEN '2017-11-24' AND '2017-12-03' GROUP BY date ORDER BY visit_count DESC
From the Tableau area above, there is an increasing trend in visit counts starting from December 1st, 2017. This is because the coming Alibaba sitewide discount “12/12” (December 12th, 2017) activate many silent customers. Before December 2nd, 2017, the trend of the whole graph presents flatly, we can conclude that customers are not spending a lot more time on the platform during the weekend regularly. Thus, our hypothesis is not accepted.
SELECT product_category_id, COUNT(user_id) AS category_total_customer FROM ALI_USER WHERE type_of_behavior = "buy" GROUP BY product_category_id ORDER BY category_total_customer DESC
product_category_id = 4801426 has the most purchases while 181182, 194104, 245030, and 344221 have the least purchases.
SELECT user_id, CASE WHEN customer_count = 1 OR customer_count = 0 THEN "Bronze" WHEN customer_count = 2 THEN "Silver" WHEN customer_count = 3 THEN "Gold" ELSE "Platinum" END AS customer_buying_power FROM( SELECT user_id, COUNT(user_id) AS customer_count FROM ALI_USER WHERE type_of_behavior = "buy" GROUP BY user_id ORDER BY customer_count DESC) t
WITH cte AS ( SELECT user_id FROM ALI_USER WHERE type_of_behavior = "buy")
SELECT COUNT(t.user_id) / COUNT(cte.user_id) * 100 AS repurchase_rate FROM( SELECT user_id FROM ALI_USER WHERE type_of_behavior = "buy" GROUP BY user_id HAVING COUNT(user_id) >= 2) t RIGHT JOIN cte USING(user_id)
SELECT product_category_id, ctvisit, ctbuy,concat(round(ctbuy100/ctvisit,3),'%') AS conversion_rate, product_number FROM (SELECT product_category_id,count()as ctvisit, SUM(CASE WHEN type_of_behavior='buy' then 1 else 0 end) as ctbuy, COUNT(DISTINCT product_id) as product_number FROM ALI_USER GROUP BY product_category_id) t ORDER BY conversion_rate
There are a couple of product categories that have high conversion rates which show in the graph above. In order to investigate the reason behind the fact, we drilled down to the detail by adding the number of products in each category in our graph. In the graph, you can see that the higher the customer conversion rate, the lower number of products in the category. This is because when customers have too many options or substitutions, they might need to spend more time comparing products, which leads to a lower customer conversion rate. On the other hand, for the product category that has fewer options or substitutions, customers tend to spend less time choosing the product which leads to a higher customer conversion rate. To sum up, the product categories that have the most purchases are ranked by the combination of customer conversion rates and page visits.