docs/content/product/data-modeling/recipes/funnels.mdx
This functionality only works with data models written in JavaScript, not YAML. For more information, check out the Data Modeling Syntax page.
</InfoBox>Funnels represent a series of events that lead users towards a defined goal. Funnel analysis is an approach commonly used in product, marketing and sales analytics.
Regardless of the domain, every funnel has the following traits:
Since funnels have a pretty standard structure, they are good candidates for being extracted into reusable packages. Cube comes pre-packaged with a standard funnel package.
// First step is to require the Funnel package
const Funnels = require(`Funnels`)
cube(`PurchaseFunnel`, {
extends: Funnels.eventFunnel({
userId: {
sql: `user_id`
},
time: {
sql: `timestamp`
},
steps: [
{
name: `view_product`,
eventsView: {
sql: `select * from events where event = 'view_product'`
}
},
{
name: `purchase_product`,
eventsView: {
sql: `select * from events where event = 'purchase_product'`
},
timeToConvert: "1 day"
}
]
})
})
Cube will generate an SQL query for this funnel. Since funnel analysis in SQL is not straight forward, the SQL code itself is quite complicated, even for such a small funnel.
<a href="#" class="accordion-trigger" id="show-sql-accordion"> {" "} Show Funnel's SQL </a> <div class="accordion" id="show-sql-accordion-body">SELECT
purchase_funnel.step "purchase_funnel.step",
count(purchase_funnel.user_id) "purchase_funnel.conversions"
FROM
(
WITH joined_events AS (
select
view_product_events.user_id view_product_user_id,
purchase_product_events.user_id purchase_product_user_id,
view_product_events.t
FROM
(
select
user_id user_id,
timestamp t
from
(
select
*
from
events
where
event = 'view_product'
) e
) view_product_events
LEFT JOIN (
select
user_id user_id,
timestamp t
from
(
select
*
from
events
where
event = 'purchase_product'
) e
) purchase_product_events ON view_product_events.user_id = purchase_product_events.user_id
AND purchase_product_events.t >= view_product_events.t
AND (
purchase_product_events.t :: timestamptz AT TIME ZONE 'America/Los_Angeles'
) <= (
view_product_events.t :: timestamptz AT TIME ZONE 'America/Los_Angeles'
) + interval '1 day'
)
select
user_id,
first_step_user_id,
step,
max(t) t
from
(
SELECT
view_product_user_id user_id,
view_product_user_id first_step_user_id,
t,
'View Product' step
FROM
joined_events
UNION ALL
SELECT
purchase_product_user_id user_id,
view_product_user_id first_step_user_id,
t,
'Purchase Product' step
FROM
joined_events
) as event_steps
GROUP BY
1,
2,
3
) AS purchase_funnel
WHERE
(
purchase_funnel.t >= '2018-07-01T07:00:00Z' :: timestamptz
AND purchase_funnel.t <= '2018-07-31T06:59:59Z' :: timestamptz
)
GROUP BY
1
ORDER BY
2 DESC
LIMIT
5000
A unique key to identify the users moving through the funnel.
userId: {
sql: `user_id`
}
In the situation where user_id changes between steps, you can pass a unique
key to join two adjacent steps. For example, if a user signs in after having
been tracked anonymously until that point in the funnel, you could use
nextStepUserId to define a funnel where users are tracked by anonymous ID on
the first step and then by an identified user ID on subsequent steps.
const Funnels = require(`Funnels`)
cube(`OnboardingFunnel`, {
extends: Funnels.eventFunnel({
userId: {
sql: `id`
},
time: {
sql: `timestamp`
},
steps: [
{
name: `View Page`,
eventsView: {
sql: `select anonymous_id as id, timestamp from pages`
}
},
{
name: `Sign Up`,
eventsView: {
sql: `select anonymous_id as id, user_id, timestamp from sign_ups`
},
nextStepUserId: {
sql: `user_id`
},
timeToConvert: "1 day"
},
{
name: `Action`,
eventsView: {
sql: `select user_id as id from actions`
},
timeToConvert: "1 day"
}
]
})
})
A timestamp of the event.
time: {
sql: `timestamp`
}
An array of steps. Each step has 2 required and 1 optional parameters:
userId and time fields. For example, if we have defined the userId as
user_id and time as timestamp, we need to have these fields in the table
we're selecting from.1 day, for
instance, it means the funnel will include only users who made a purchase
within 1 day of visiting the product page.steps: [
{
name: `purchase_product`,
eventsView: {
sql: `select * from events where event = 'purchase_product'`
},
timeToConvert: "1 day"
}
]
In order to provide additional dimensions, funnels can be joined with other
cubes using user_id at the first step of a funnel. This will always use a
many_to_one relationship, hence you should always join with the corresponding
user cube. Here, by 'user' we understand this to be any entity that can go
through a sequence of steps within funnel. It could be a real web user with an
auto assigned ID or a specific email sent by an email automation that goes
through a typical flow of events like 'sent', 'opened', 'clicked', and so on.
For example, for our PurchaseFunnel we can add a join to another funnel as
following:
cube(`PurchaseFunnel`, {
joins: {
Users: {
relationship: `many_to_one`,
sql: `${CUBE}.first_step_user_id = ${Users.id}`
}
},
extends: Funnels.eventFunnel({
// ...
})
})
Cube is based on multidimensional analysis
Funnel-based cubes have the following structure:
Funnel joins are extremely heavy for most modern databases and complexity grows
in a non-linear way with the addition of steps. However, if the cardinality of
the first event isn't too high, very simple optimization can be applied:
originalSql pre-aggregation.
It is best to use partitioned rollups to cache the
steps instead. Add one to the PurchaseFunnel cube as follows:
cube(`PurchaseFunnel`, {
extends: Funnels.eventFunnel({
// ...
}),
preAggregations: {
main: {
type: `originalSql`
}
}
})