" Email from the CEO, Subject: Performance Targets
Chapter 1 from the original Vexpower prototype.
Performance Targets

Hey,

I know the numbers are up since the TV spot went out (🔥) but do we know exactly how much? How many sales is it driving? What's the ROI compared to Facebook?

Trying to get a handle on performance ahead of the management retreat – we're setting performance targets for the year. If we can tell the right story we'll be set up nicely for a hefty Series A round.

Thanks,
As the VP of Growth at GoolyBib, a fast-growing DTC startup selling smart bibs, you started running a TV campaign 2 months ago. You also run Facebook ads, and have affiliate partners, which will be important to report on too.

The wearable device for babies market is ultra-competitive, and potential customers need to be educated on your innovative new product: a smart bib that alerts you if your baby is choking on their food.

After some success with affiliates and Facebook ads, you bet big on TV as the next major growth channel for GoolyBib. Now the CEO needs to see the numbers – this is critical for the future success of the company, and also your career!

So let's get started...
First let's look at what our analytics is telling us. This Google Sheet has a report from analytics that tells us our revenue by channel:
Take a look at the data (make a copy) and run some basic analysis to answer the questions below (skip ahead if you get stuck)
How much revenue total did we generate from all channels?
What channel drove the most revenue?
The TV campaign began on the 23rd of August, 2020. How much did revenue change (in %) in the period after TV started running versus before it started running?
The TV campaign spent approximately $500k. What was the return on ad spend (ROAS) of the TV campaign based on the increase in revenue? ROAS = Revenue / Cost
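ROAS is simple arithmetic, and a quick sketch in Python (with made-up numbers, not the answer to the question above) shows how it's computed:

```python
def roas(revenue, cost):
    """Return on ad spend: revenue generated per dollar spent."""
    return revenue / cost

# Illustrative numbers only: a campaign that spends $500,000 and drives
# $600,000 in incremental revenue returns $1.20 per $1 spent.
print(roas(600_000, 500_000))  # → 1.2
```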
Take a look at the change in revenue by channel for the periods before and after TV (which started 23rd of August). Are there any noticeable trends over the time period?
Which of these factors could explain an increase in revenue in Facebook Ads?
Based on the data so far, do you think the TV campaign was effective? What other information would you need in order to increase your confidence?
RE: Performance Targets

Hey where did you get the data for FB? It looked a little low so I checked... Your numbers look to be about half what FB is reporting?

For example on the 18th:
- facebook purchase value: $20,666.25
- facebook / cpc (your analysis): $9,100.60

But then the totals in your analysis do add up to what I have from accounting, so I'm a little lost. Which are correct??
Why might the numbers in your analysis not match what Facebook is reporting? Choose the most likely reason.
Ok,

Let's figure this out. Facebook is overcounting and maybe Google Analytics is undercounting, but we need to know by how much. It's great that numbers are up by 22% from TV, but investors will need more proof.

I ran into Sam, the founder of Right Media, at the tennis court and mentioned this issue. Apparently what we need is a Marketing Mix Model. Look out for an intro from Sam for someone they recommend who can help us out.

Thanks,
Getting Ready: Data Collection, Cleaning & Engineering
Chapter 2 from the original Vexpower prototype.
Vexnomics <> GoolyBib

Hi Charlie,

Sounds like Sam already filled you in, so let's get right to it.

Looping in Mike, who will work with you on the analysis and get you the data you need.

Give me a call directly on 202-555-0119 – I just have a couple of questions on the contract.

Looking forward to seeing what you find.

Thanks,
G

------------------------------------
On Fri, 30 Oct 2020, Charlie wrote:

Hi G,

Sam already filled me in on the situation and the urgency of the project, so I took the liberty of putting together a scope of work so we can move quickly.

If all is good let me know and I'll follow up with a data requests list. Is it yourself or someone on your team that I'd be working with to get the data?

Charlie

------------------------------------
On Fri, 30 Oct 2020, Sam Smirnov wrote:

Hey G,

Looping in Charlie at Vexnomics, who did a great job on our Marketing Mix Models.

I'll let you catch up.

Sam

Chapter 1 Recap
As the VP of Growth for GoolyBib, you bet big on TV and it seems to be working – revenue is up 22%! But simply looking at the trend before and after TV isn't enough. Because all channels increased over the time period, we need to rule out other factors and prove the impact of TV. There are also questions on the ROI of Facebook ads that need to be answered as part of a Marketing Mix Model.
In your own words, what is Marketing Mix Modeling? Feel free to Google the term first if needed.
What are the benefits of Marketing Mix Modeling?
What are the limitations of Marketing Mix Modeling?
RE: Vexnomics <> GoolyBib

Hi Mike,

It was great to talk on the phone, and I appreciate you pulling the data so quickly.

I understand you're keen to learn how MMM works, so I'm more than happy to accommodate your request to include you in every step along the way.

The first step now that we have the data is to get it into the right shape for modeling. It's not glamorous, but data cleaning is 80% of the job!

Best,
You have the data necessary for modeling, but it's all in different places, and not in the right format for modeling.
Take a look at your data room and make copies to follow along with the data cleaning process.
Take a look at the Google Analytics data in the folder, which you already encountered in Chapter 1. Right now it's in 'stacked' or 'long' format, and we need to 'unstack' it for modeling. Each Source/Medium should become its own column, like in the image above.
How would you accomplish this (without just copy/pasting)?
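One way to do it without copy/pasting, sketched here in pandas with assumed column names (`date`, `source_medium`, `revenue` are placeholders, not the actual export headers); in Sheets, a pivot table achieves the same unstacking:

```python
import pandas as pd

# Assumed long-format analytics export: one row per date × source/medium.
long_df = pd.DataFrame({
    "date": ["2020-08-01", "2020-08-01", "2020-08-02", "2020-08-02"],
    "source_medium": ["facebook / cpc", "affiliate", "facebook / cpc", "affiliate"],
    "revenue": [9100.60, 4200.00, 8750.25, 3900.00],
})

# Pivot so each source/medium becomes its own column, one row per date.
wide_df = long_df.pivot(index="date", columns="source_medium", values="revenue")
print(wide_df)
```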
Check out the affiliates data. Is this already in the right (unstacked) format?
The TV data is horizontal, and is weekly rather than daily. Week 36 starts on Sunday August 23rd, 2020. How would you get this data into vertical format and divided out (interpolated) to make it daily rather than weekly to use in our model?
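One possible approach, sketched in pandas under the assumption that weekly spend is split evenly across the 7 days (column names like `week_36` and the spend figures are placeholders, not the actual sheet values):

```python
import pandas as pd

# Assumed horizontal weekly TV spend: one column per week number.
tv_weekly = pd.DataFrame({"week_36": [70000], "week_37": [56000]})

# Transpose to vertical: one row per week.
tv_long = tv_weekly.T.rename(columns={0: "weekly_spend"})

# Expand each week to 7 days starting Sunday 2020-08-23, splitting
# the weekly spend evenly across days (a simple interpolation).
start = pd.Timestamp("2020-08-23")
daily = pd.DataFrame({
    "date": pd.date_range(start, periods=7 * len(tv_long), freq="D"),
    "tv_cost": tv_long["weekly_spend"].repeat(7).values / 7,
})
print(daily.head())
```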
In Chapter 1, we worked out that Facebook Ads had generated $1,474,000.84 in revenue according to Google Analytics. With a spend of $1,165,185.77, this is a ROAS (Return On Ad Spend) of 1.27.

Look at the Facebook Ads data. Using the Purchase Value column as Revenue, work out the ROAS as reported by Facebook.
RE: RE: Vexnomics <> GoolyBib

Hey,

Thanks for your help getting the data into the right format. I've pulled everything together here for modeling.

You can see I also added dummy variables (1 or 0) for each day of the week, and for the 15th and 30th when most people get paid. This will help us capture the impact of the weekend on sales and see if there's a 'payday' effect.

I'll get to work on the model and then we can run through it together.

Best,
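Mike's dummy-variable step could be sketched like this in pandas (the dates and column names are illustrative, not the actual worksheet):

```python
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2020-08-23", periods=10, freq="D")})

# One 0/1 column per day of the week, e.g. sunday, monday, ...
day_dummies = pd.get_dummies(df["date"].dt.day_name().str.lower()).astype(int)
df = pd.concat([df, day_dummies], axis=1)

# Payday flag: 1 on the 15th and 30th of each month, else 0.
df["payday"] = df["date"].dt.day.isin([15, 30]).astype(int)
print(df.head())
```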
Everything but the Kitchen Sink: How to Choose Inputs
Chapter 3 from the original Vexpower prototype.
Hey, how are you doing?

Thanks for taking the time.

I thought it made sense if we just walk through the modeling process together so you can see how it works...
Chapter 2 Recap
You're building a marketing mix model to determine the ROI for each marketing channel at GoolyBib. In the last chapter you gathered all the data you needed and got it into the right format – that's already 80% of the work done! Now for the fun part, actually modeling the data.
Time to brush up on Linear Regression. Do some Googling and tell me in your own words how it works. Don't worry if you don't get it, skip ahead for answers.
What assumptions are associated with Linear Regression?
Let's pull up the final cleaned data on my screen.

Marketing mix modeling mostly uses linear regression to see how each of these columns relates to sales.

Normally we'd use a statistical package to do this – for example I use a Python library, but there are also packages in R, and tools like EViews, Orange or SPSS.

For simple cases like this though, it's probably easier to get started just using the LINEST function in Google Sheets.
Using the data we compiled, you'll use the LINEST function to run a series of linear regressions.
- The variable we're trying to predict is 'revenue' in column B, so this would be 'known_data_y' in the LINEST function.
- The variables we're using to predict revenue would go into 'known_data_x'.
- We always want 'calculate_b' to be true (this gives us the intercept).
- We will use 'verbose' as true when we need to see a detailed description of the model.

Make a copy of the data so you can follow along.
Make a column called 'total_cost' that sums facebook_cost, affiliate_cost, and tv_cost. How much did we spend on marketing total for the whole time period?
Use the LINEST function with total_cost as known_data_x, revenue as known_data_y, and both calculate_b and verbose set as true. The top left value is the coefficient or slope of the line. What value did you get?
The R-squared value for your model is the 3rd row down on the left, and should be a decimal between 0 and 1. How strong an effect did total_cost have on revenue?
The coefficient for total_cost is the first row on the left. The intercept is the first row on the right. The formula for calculating your model's prediction of revenue (y_pred) is:

coefficient * total_cost + intercept = y_pred
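If you'd rather check the same numbers outside of Sheets, `np.polyfit` with degree 1 returns the same slope and intercept that LINEST reports (the data here is toy data, not the sheet's values):

```python
import numpy as np

# Toy data standing in for total_cost (x) and revenue (y);
# the real values come from the sheet.
total_cost = np.array([1000.0, 2000.0, 3000.0, 4000.0])
revenue = np.array([15000.0, 17500.0, 21000.0, 23500.0])

# polyfit(deg=1) returns [slope, intercept] – the same numbers
# LINEST reports as the coefficient and intercept.
coefficient, intercept = np.polyfit(total_cost, revenue, 1)

# Model prediction, matching the formula above:
y_pred = coefficient * total_cost + intercept
print(coefficient, intercept)
```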
Calculate y_pred in column O and make a line chart of revenue and y_pred over time.

Does it look like this model has captured some of the impact on revenue from TV?
Create a scatter plot chart with revenue on the y axis (left) and total_cost on the x axis (bottom). Add a trendline in Customize > Series, with the label set to 'Use equation' and 'Show R²' selected.

Do the numbers in the equation (coefficient, intercept, R-squared) match the output from LINEST?
The F statistic is the fourth number down on the left. The degrees of freedom is fourth down on the right. The formula for calculating the statistical significance (p-value) of total_cost is:

FDIST(F statistic, 1, degrees of freedom)
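The Sheets formula FDIST(F, 1, dof) is the upper tail of the F distribution, which in Python is scipy's survival function (the F statistic and degrees of freedom below are illustrative, not your model's actual output):

```python
from scipy.stats import f

# FDIST(F, 1, dof) in Sheets is the upper-tail probability of the
# F distribution; scipy's equivalent is the survival function sf().
f_statistic = 25.0        # illustrative value
degrees_of_freedom = 120  # illustrative value

p_value = f.sf(f_statistic, 1, degrees_of_freedom)
print(p_value)  # well below 0.05 → significant
```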
Is the P-Value for the F Statistic significant (<0.05)?
Kitchen Sink Model
This is the output of a linear regression run with Python's statsmodels library. It's called a 'Kitchen Sink' model, which means we put 'everything but the kitchen sink', i.e. all the variables, in the regression to see what looks significant.
Which variables are statistically significant (P>|t| is less than 0.05)?
According to this model, facebook is returning $4.18 (coef) for every $1 invested (give or take $0.52, the standard error). Google Analytics reported Facebook revenue at $1,474,000.84. Is the Return On Ad Spend (ROAS, Revenue / Ad Spend) as reported by the linear regression model higher than what Google Analytics reported?
Great, it looks like we have what we need with the model. It can obviously get a lot more complicated than that, but this is a good stopping point for our first one.

Next steps for me are to drop those insignificant variables, then pull together a skeleton of the slide deck we'll share with G for the leadership retreat.

I hope all of that was clear to you. Don't worry if you didn't get all of the stats – I'll explain more as we work on the presentation!
From 'What Happened' to 'What If': Presenting to Leadership
Chapter 4 from the original Vexpower prototype.
MMM Update

Hey Mike and Charlie,

Can I get an update on how the project is going? Hoping to have the slides ready this week ahead of the management retreat.

Thanks,
Chapter 3 Recap
You built your model and now it's time to present the results. You found that Facebook was better than expected and TV was worse, as well as ruling out the impact of several other variables. However, Marketing Mix Modeling isn't an exact science, so how do we communicate these results to our CEO in a way that helps guide decision-making? Below is the final model from Charlie after dropping the insignificant variables (where P>|t| was greater than 0.05).
The R-squared value of the model is 0.75. We can interpret this as...
Is R-squared the only measure we need to look at to determine if a model is reliable?
Take a look at the above post on how to interpret a regression analysis. Looking at the tests at the bottom of our model, which of these residual tests have we failed on?
What do the failing tests tell us about the model?
The coefficient for the constant is shown in scientific notation as 1.11E+04. What would this number be without scientific notation? (write it out to 2 decimal places with no commas, for example 5000.00)
The coefficient is a measure of how much Y (revenue) you get for every unit of X you put in. If the coefficient of TV is 0.39, what does that mean the ROAS of TV is? (revenue / cost)
The coefficient for Facebook is 4.23, with a standard error of 0.52. What does this mean?
Sunday has a negative coefficient of -2353.91. Does this mean...
Affiliates cost was excluded from the model because it was not 'significant'. How should this be interpreted?
Affiliates not being included in the model does not mean the model is wrong – it's perfectly common that some marketing channels prove to have no significant impact on revenue, even if they look like they do on a last click basis. For example if your affiliates are paying to run Google Ads on your brand term, they'd capture traffic that was likely to convert anyway. We also shouldn't overreact and cut all affiliate budget immediately – we should investigate with the hypothesis that affiliates don't increase revenue incrementally, and design an experiment to prove it.
RE: MMM Update

Hey,

We're right on track – we have some interesting findings and now just preparing the deck. Should be with you tomorrow.

Best,
Now we'll put together a series of data visualizations that will go in the final presentation for the management retreat. Make a copy of the data below to follow along.

To get historical contributions of each variable to sales, we need to multiply the coefficient by the value for each variable on each day. For example, Facebook's coefficient was 4.23, so on a day where we spent $100 we'd expect 4.23 * 100 = $423 in revenue from Facebook. Do this in the worksheet for each variable.
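The contribution calculation can be sketched in pandas like this (the coefficients mirror those quoted in the text; the daily spend values are made up for illustration):

```python
import pandas as pd

# Coefficients from the regression (4.23 for Facebook and 0.39 for TV
# are the figures quoted in the text; spends below are illustrative).
coefficients = {"facebook_cost": 4.23, "tv_cost": 0.39}

df = pd.DataFrame({
    "facebook_cost": [100.0, 200.0],
    "tv_cost": [500.0, 0.0],
})

# Contribution = coefficient × spend, per variable per day.
for col, coef in coefficients.items():
    df[f"{col}_contribution"] = df[col] * coef

print(df)
```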
Work out Facebook's total contribution to sales, by multiplying the coefficient for each day by facebook_cost and adding up the resulting column. Do you get $4,937,125?
Calculate the contribution of each variable in the regression (ignore any variables we didn't keep). What variable contributed the most to revenue?
Calculate the absolute error (the absolute value of actual revenue minus predicted revenue) for each day, and find the average of that column. What was the Mean Absolute Error (MAE) as a percentage of average revenue?
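A sketch of the MAE calculation in Python, with made-up actuals and predictions (the real columns come from your worksheet):

```python
import numpy as np

# Illustrative actuals and predictions; real values come from the sheet.
actual = np.array([20000.0, 21000.0, 19000.0, 22000.0])
predicted = np.array([19000.0, 22500.0, 18500.0, 21000.0])

# Mean Absolute Error: average of |actual - predicted|.
mae = np.mean(np.abs(actual - predicted))

# Expressed as a percentage of average revenue.
mae_pct_of_avg_revenue = mae / actual.mean() * 100
print(mae, mae_pct_of_avg_revenue)
```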
What potential reasons would explain why TV has such a low contribution to revenue? Choose the most likely reason.
Do you think it's realistic that Facebook is returning $4.23 per $1 spent on the channel, contributing so much to revenue? What's the likely explanation?
It's not likely that Facebook is driving so much revenue. More likely, Facebook was collinear with another variable, either in the model or missing from the model. Remember we had a high collinearity issue. This is probably because Facebook spend didn't vary by much – if we want to test the value of Facebook more explicitly, we should increase spend in step increments and test splitting geo regions to prove incrementality and model the diminishing returns of the channel.
Use your coefficients to forecast what our average daily revenue would be, if we spent $10,000 per day on Facebook and $5,000 per day on TV for 7 consecutive days. Did you get $18,947.26?
RE: RE: MMM Update

Hey G,

As promised, attached are the slides for the presentation. Top level summary:
- The model could explain 75% of the movements in revenue, and was only out by 5% on a daily average across the period
- TV returned $0.40 for every dollar invested, but this doesn't capture long term effects on the brand (likely to be substantial)
- Affiliates don't look to be driving incremental revenue. You should investigate their behavior further
- Facebook looks to be driving a high return at $4.20 ROAS, but we should validate this with an incrementality test
- There are improvements to be made to the model – for example, the numbers for Facebook are likely inflated
- According to this model, with $10,000 spent on Facebook and $5,000 on TV per day, we'd average $19k revenue

Please let me know if you have any questions – let's hop on a call to discuss.

Best,
Where Do We Go From Here? Next Steps After The Simulator
Chapter 5 from the original Vexpower prototype.
It sounds like G really liked our presentation, and I'm glad you found marketing mix modeling so useful!

This project was pretty short and rushed, but as you can imagine there's a lot more we can do.

I can run you through a few of the high level concepts if you like? So you know where to take this next.
Chapter 4 Recap
You presented your model to the CEO, and it was a useful contribution to the management retreat. However, the model isn't quite right – the Facebook Ads coefficient looks too high and there is evidence of multicollinearity. Although the project is done for now, you're curious what else can be done to build a more valid model.
What types of businesses are likely to benefit from a pandemic?
One major benefit of Marketing Mix Modeling over other attribution methods is its flexibility in incorporating factors outside of marketing, or even outside of your company. Of course it's obvious that global pandemics, macro-economic conditions or major market changes would have an impact on sales, but digital attribution models don't take this into account.

With Marketing Mix Models you can include these factors as dummy variables, or better yet as indexes indicating the magnitude of effect. For example, instead of '1' for COVID and '0' for no COVID, use Google Maps' free mobility data to determine how much your sales increase or decrease for every 1% increase in people staying at home.
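Both encodings can be sketched in pandas; the lockdown start date and index values below are made up purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2020-03-01", periods=5, freq="D")})

# Crude dummy: 1 once lockdowns begin, else 0 (assumed start date).
df["covid_dummy"] = (df["date"] >= "2020-03-03").astype(int)

# Better: a magnitude index, e.g. the % change in time spent at home
# from a mobility dataset (these values are invented for illustration).
df["stay_at_home_index"] = [0.0, 1.0, 12.0, 18.0, 22.0]
print(df)
```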
What search keywords are likely to be declining over the past 5 years?
Are you doing a good job despite declining demand, or is a rising tide lifting your boat? Most performance marketers claim credit for periods of increased performance, and try to avoid blame when market conditions are against them... but isn't it better to actually model the impact of the market?

Google Trends is just one data source that you can include in your model as a proxy for general market conditions. It's possible with Marketing Mix Modeling to include search volume over time, and it often leads to more accurate models when you account for changes in consumer intent that you don't control.
What's the biggest day for ecommerce sales in the United States?
Most businesses have some seasonality, whether it's higher sales during sunny weather, more revenue near payday or lower sales after the weekend. Major holidays and public holidays also tend to have a big impact on sales, boosting them for most B2C businesses and hurting them for B2B.

The impact of seasonality is typically accounted for with dummy variables (1 or 0) for major holidays, or for specific days of the week or month. It's also possible to incorporate weather data to see if there's an effect. These variables are commonly entangled with others, however, causing collinearity, because marketers typically spend more and run promotions around holidays.
Who is Netflix's biggest competitor?
It may not always be possible to know what your competitors are doing, but when available this data can be incredibly useful in marketing mix models. Third-party data sources like SimilarWeb can estimate your competitors' web traffic, tools like SEMrush and Ahrefs can tell you what moves they're making on SEO, and for TV advertising, companies like Kantar can provide competitor TV spend.

Incorporating this data can be really informative in your model, and it's often highly significant when explaining dips in your sales. The fact that this data is only directionally correct rather than 100% accurate is OK for Marketing Mix Modeling. You usually need less precision than you think, as even a proxy for competitor spend can show up as significant.
Which one of these channels is likely to have the highest long term effects on sales?
The effect of advertising usually lasts for longer than a day, so how do you incorporate longer term effects in your model? This is actually a solved problem in marketing mix modeling, using a variable transformation called 'adstocks'. To calculate it, you take yesterday's adstocked value, multiply it by an adstock rate, and add today's spend to it.

Testing different adstock rates is one key part of modeling that lets you understand what the long term effect of a specific channel is on your sales. You can run a grid search process to test lots of different adstock rates until you find the one that most improves the accuracy of your model.
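The adstock transformation described above is only a few lines of Python:

```python
def adstock(spend, rate):
    """Geometric adstock: today's effect = today's spend plus
    yesterday's adstocked value decayed by `rate`."""
    out, carry = [], 0.0
    for s in spend:
        carry = s + rate * carry
        out.append(carry)
    return out

# $100 spent on day 1 only, with a 0.5 adstock rate, keeps
# contributing at half strength each following day:
print(adstock([100, 0, 0, 0], 0.5))  # → [100.0, 50.0, 25.0, 12.5]
```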
If you spend $1,000 per day on Facebook ads with a $2 cost per install, you are likely to get 1,000 installs per day when you spend $2,000 per day.
Usually it's naive to assume you can just keep increasing your spend in a channel and get the same efficiency. Even if a channel looks linear initially, you typically run into problems with scaling as you double down on your investment.

By transforming the variable and guessing the saturation point (B), you can determine what curve best fits your data. Once you have the right value for B, which should be where the channel is two-thirds saturated, you can predict the cost per conversion or ROI for each spend level. This also allows you to run an optimization function and determine the right level of spend for each channel.
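One common way to model diminishing returns is a negative-exponential saturation transform; the exact curve the text has in mind may differ, so treat this as an illustrative sketch where `b` plays the role of the saturation point:

```python
import math

def saturate(spend, b):
    """Diminishing-returns transform: response approaches 1 as spend
    grows, with `b` controlling how quickly the channel saturates.
    (One common choice of curve; others, like s-curves, also work.)"""
    return 1 - math.exp(-spend / b)

# Doubling spend less than doubles the response:
print(saturate(1000, 2000))  # ≈ 0.393
print(saturate(2000, 2000))  # ≈ 0.632
```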
R-squared and Mean Absolute Error are the only two reliable ways to check a model's accuracy.
Econometrics, the discipline that marketing mix modeling is part of, has been around for decades, so most of the problems you can run into were solved long ago. One important part of determining whether you have a valid model is running the series of statistical tests available to you.

It's useful to know statistics, but often if you just understand typical benchmarks or can visually interpret the data, it's straightforward to know what's gone wrong, what that means for the accuracy of your model, and how to potentially fix it. Condition number, Durbin-Watson, Jarque-Bera (JB), Gaussian residuals and Breusch-Pagan are just some of the important statistical tests to run.
What does a negative value for coefficient mean in a linear regression or marketing mix model?
The killer visualization for marketing mix modeling is the contribution chart. This is where you use the coefficients to play back what share of revenue each variable was responsible for.

This is particularly powerful because it lets you predict forward as well: if you know how much you plan to spend, when major holidays occur, and other factors in your model, you should be able to forecast future revenue levels with a similar accuracy to your model.
MMM Training?

Hey,

I really enjoyed our talk, and it seemed like you really wanted to go deeper on MMM. I actually have more training material on the topic if that would be of interest. Potentially we can talk to G about expensing it for professional development, so you can make future updates to the model?

I'll follow up in the next few days with more information.

Best,