" Is that a valid assumption?
Your client is upset about an assumption they didn't know their model was making: that ads don't perform differently during holidays. Do they have a point?
Vexnomics Office – Early Morning
You get forwarded an angry client email about interaction effects: your model made the (default) assumption that terms don't interact. That might not be true in this case, however...
Fwd: Interactions\nHey, I got an email from the client about interaction effects.\nThey didn't realize that linear models add up the terms, so the default assumption is no interaction. They see very different performance on holidays so that might not hold.\nLet's take a look at a Log Linear and Log Log model for them.\nThanks,
Normal Linear Regression models are additive. This means you're assuming that the effect of each variable doesn't depend on the values of the others. We know that isn't really the case in real life, but it's a useful enough abstraction and makes the model easier to interpret. However, there are times when you want the model's terms to interact, and for that you can use a log model.\nLog models are multiplicative, meaning the terms are multiplied together: they interact. So if previously you had a formula of y = (B0 * spend) + (B1 * holiday) + constant, a Log-Linear model fits ln(y) = (B0 * spend) + (B1 * holiday) + constant, which is the same as y = exp(B0 * spend) * exp(B1 * holiday) * exp(constant). This will capture the effect of your ad spend performing differently during holidays, and can also help with non-linear effects, like diminishing returns (performance gets worse as a channel gets saturated).\nIn normal linear models, the coefficients (B) can be interpreted as the amount y changes for each unit of X. So for example B=0.2 for Facebook ads means 0.2 conversions per dollar, or a cost per conversion of $5. For log models the interpretation is different. For Log-Linear (where you just log the y value) the coefficient is approximately the proportional change in y per unit of x, so B=0.001 means you get about 0.1% more conversions for each dollar spent. For Log-Log models, it's the % change in y you get for a 1% change in x, so if B=0.02, you get about 0.02% more conversions for each 1% increase in spend.
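As a rough check on those three interpretations, here is a minimal Python sketch. The coefficient values are the illustrative ones from the text and the variable names are made up for this example; the percentages are approximations that hold for small coefficients.

```python
import math

# Linear model: B is conversions per dollar, so cost per conversion is 1 / B.
b_linear = 0.2
cost_per_conversion = 1 / b_linear

# Log-Linear model: each extra dollar multiplies y by exp(B).
b_log_linear = 0.001
pct_per_dollar = (math.exp(b_log_linear) - 1) * 100

# Log-Log model: a 1% increase in x multiplies y by 1.01 ** B.
b_log_log = 0.02
pct_per_one_pct = (1.01 ** b_log_log - 1) * 100

print(f"Cost per conversion: ${cost_per_conversion:.2f}")
print(f"Log-Linear: {pct_per_dollar:.3f}% more conversions per extra dollar")
print(f"Log-Log: {pct_per_one_pct:.3f}% more conversions per 1% more spend")
```

The exact figures differ slightly from the rule-of-thumb readings (B * 100% and B%), and the gap grows as the coefficient gets larger.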
When building multiplicative models, we don't actually multiply everything together: we use logs. If you take the logs of two numbers, then add them together and find the exponential of that, you get the same result as if you multiplied them together. For example 5 * 10 = 50, but so does exp(ln(5) + ln(10)). This is how log models work.
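The identity is easy to confirm for yourself. A small Python sketch, where math.log is the natural log (the equivalent of ln() in a spreadsheet):

```python
import math

product = 5 * 10
via_logs = math.exp(math.log(5) + math.log(10))

# The two agree up to floating point rounding.
print(product, round(via_logs, 6))
```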
...the linear regression method does not perform well on large amounts of data as it is sensitive to outliers, multicollinearity, and cross-correlation.
Log Linear and Log Log Models
Logarithms to the rescue: by taking the log you're going from adding terms together to multiplying them. This should satisfy the client's concern about interactions.
Vexnomics Offices – Later that day
You get to work on the log model to see how well it performs. Hopefully it helps you capture the interaction and non-linear effects.
Log Linear Model
Building a Log Linear model is relatively straightforward. You use the ln() function to find the natural log of the y variable (in our case conversions), fit the regression on that logged column, and then apply the exp() function to each prediction to get back to conversions. This means updating the LINEST function to use the new ln() column instead of the untransformed conversions column. This is all that's needed to create a Log Linear model. To compare its accuracy against the original model, we'll use the same metric as before: the Mean Absolute Percentage Error (MAPE), calculated on the exp() predictions against the actual conversions.
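For readers who prefer code to spreadsheets, the same steps can be sketched in plain Python. Everything here is an assumption for illustration: the data is synthetic (generated from a known formula with a little noise so the fit can be checked), and fit_ols is a hypothetical helper standing in for LINEST, not a library function.

```python
import math

def fit_ols(rows, y):
    """Least-squares fit of y on the columns in rows; the intercept is returned last."""
    X = [list(r) + [1.0] for r in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic data: conversions = exp(0.002 * spend + 0.5 * holiday + 3) * noise
spend = [100, 200, 300, 400, 500, 600]
holiday = [0, 0, 1, 0, 1, 0]
noise = [1.02, 0.97, 1.01, 0.99, 1.03, 0.98]
conversions = [math.exp(0.002 * s + 0.5 * h + 3) * n
               for s, h, n in zip(spend, holiday, noise)]

# Step 1: log the y variable (the new ln() column).
ln_conversions = [math.log(c) for c in conversions]

# Step 2: fit the regression on the logged column.
b_spend, b_holiday, intercept = fit_ols(list(zip(spend, holiday)), ln_conversions)

# Step 3: exp() each prediction to get back to conversions.
predicted = [math.exp(b_spend * s + b_holiday * h + intercept)
             for s, h in zip(spend, holiday)]

# Step 4: MAPE on the original scale.
mape = 100 * sum(abs(c - p) / c for c, p in zip(conversions, predicted)) / len(conversions)
print(f"b_spend={b_spend:.4f}, b_holiday={b_holiday:.3f}, MAPE={mape:.2f}%")
```

The MAPE is computed after the exp() step so that it stays comparable with the MAPE of the original, untransformed model.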
Make sure you use the right base when dealing with logs. For example, the log() function in Google Sheets / Excel uses base 10 by default, so to reverse it you would need to raise 10 to the power of the prediction (your coefficients multiplied by your x values, plus the constant). The ln() function, which we'll use, has a base of e (approximately 2.718281828459): the 'natural' log. This base is used because it makes the coefficients easier to interpret: a natural log coefficient of 0.06 is approximately equal to a 6% change. To reverse it we use the exp() function in GSheets / Excel. Don't worry too much about why this is: we can simply benefit from it by using the ln() and exp() functions.
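A short Python sketch of the two round trips and of the "0.06 is roughly 6%" shortcut. The value 250 is an arbitrary example; math.log10 and math.log play the roles of the spreadsheet log() and ln() functions.

```python
import math

y = 250.0

# Base 10: reverse log() by raising 10 to the power of the result.
back_from_log10 = 10 ** math.log10(y)

# Base e: reverse ln() with exp().
back_from_ln = math.exp(math.log(y))

# A natural log coefficient of 0.06 is close to, but not exactly, 6%.
coef = 0.06
exact_pct = (math.exp(coef) - 1) * 100

print(round(back_from_log10, 6), round(back_from_ln, 6), round(exact_pct, 2))
```

Mixing the two (for example logging with base 10 and reversing with exp()) gives the wrong answer without raising any error, which is why the base matters.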
The LINEST function which is used for linear regression in GSheets / Excel takes the known y values as the first parameter. You'll need to update this from column B to the new column where you store your logged conversion values.
Update the spreadsheet to use ln(y) instead of y. Describe your process.
Log Log Model
The Log Log model works the same as the Log Linear model, except this time we log both the y variable (conversions) and the x variables (spend). Note that we don't log the holiday dummy variable: its values are 0 or 1, and ln(0) returns an error while ln(1) is just 0. The rest of the model should work the same if you start from a copy of the Log Linear model.
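To make the shape of a Log Log prediction concrete, here is a minimal Python sketch. The coefficients (0.5, 0.3, 2.0) and the function name predict_log_log are hypothetical, chosen only to illustrate the mechanics, and are not values fitted from the lesson's data.

```python
import math

# Logging the 0/1 dummy is not useful: ln(1) is 0 and ln(0) is undefined.
try:
    math.log(0)
    log_of_zero_works = True
except ValueError:
    log_of_zero_works = False

def predict_log_log(spend, holiday, b_spend, b_holiday, intercept):
    """Predict conversions: spend is logged, the holiday dummy is left as 0 or 1."""
    ln_y = b_spend * math.log(spend) + b_holiday * holiday + intercept
    return math.exp(ln_y)

# Hypothetical coefficients, for illustration only.
base = predict_log_log(1000, 0, 0.5, 0.3, 2.0)
more_spend = predict_log_log(1010, 0, 0.5, 0.3, 2.0)
on_holiday = predict_log_log(1000, 1, 0.5, 0.3, 2.0)

pct_change = (more_spend / base - 1) * 100  # effect of 1% more spend
holiday_lift = on_holiday / base            # multiplier applied on holidays

print(f"1% more spend: {pct_change:.2f}% more conversions")
print(f"Holiday multiplier: {holiday_lift:.2f}x")
```

Because the holiday term ends up as a multiplier on the whole prediction, the same spend produces a different number of conversions on a holiday, which is the interaction the client was asking about.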
Which model is more accurate?
Good work on the multiplicative model\nReally glad to see that the log log model seemed to capture the effects well and give us a big boost in accuracy.\nThat should keep them happy for a while!"