Using Artificial Intelligence to avoid Natural Stupidity
Your startup is expanding, and you need to build a model to predict performance in a new market you don't know much about. Can you let the model take the driving seat?
Bank of Yu Offices — Monday
Marketing Mix Modeling is now a well-established way to predict marketing performance, and recent growth has led to expansion into new markets. Your Head of Performance is keen to model these new countries, but you're not so sure.
I hear you, with the US model you had a lot of domain expertise to rely on.
I completely get that you wouldn't understand the Peruvian market! Neither do I, to be honest.
However, I wonder if there's a way we can rely more on the data than on human domain expertise.
If we could crack that, we'd probably also be able to remove some of the bias from our existing models.
Can you take a look into AI solutions and see if that could work for us?
Machine Bias vs Human Bias
In building a successful model, most data scientists measure bias as the difference between the model's predictions and the actual data. However, there is a more dangerous form of bias: ourselves! There are thousands of potential models that achieve similar accuracy scores, and because marketing mix modeling is still a very manual process, that leaves us open to making wrong decisions.

Whenever you're measuring performance, it becomes a political activity: the results of your analysis will be used to set budgets, so there will be winners and losers. All it takes is a manager putting pressure on the analyst to find the result they wanted to see: it's easy to fudge the numbers, and models are never made public for scientific replication, so you're not likely to get caught! This is particularly bad if you're using an external vendor: they're aiming to please, so if their client's favorite channel doesn't look impactful in the model, they're likely to rework it.

Therefore many marketing mix modeling experts are trying to automate and standardize the process to eliminate the potential for human bias. This has the benefit of making it quicker and easier to build a model, and allows us to build models for markets we don't know much about: AI can 'learn' the right patterns from the data for us.
It's impossible to fully automate marketing mix modeling, but improvements in the field of deep learning have shown promise in taking a lot of the work off human modelers. These neural networks 'learn' the patterns in the data without as much human input, by creating their own features: much the way our own brains work. This has the potential to remove more human bias from the process, because there are fewer ways to manipulate the model to get the outcome you expected to see.
Predicting the future isn't magic, it's artificial intelligence.
Getting the data just the way they like it
Deep Learning algorithms are notoriously particular about the data they'll ingest. We need to get everything into the right format for modeling.
Bank of Yu Offices — later that day
You're skeptical that AI can do as well as a human, but you're willing to give it a try: you know nothing about the Peruvian market, so maybe the machine can outperform the human in this scenario! You have no idea how to write this code, so you hire an expert.
RE: Deep Learning Model
Hi,
I've finished building your model and shared the notebook with you separately.
The first section deals with data cleaning, then I did a grid search to fit the model.
I also plotted the partial dependence, which shows you how each channel performs at different spend levels.
I love doing these sorts of projects on the side, so if you need help with anything else don't hesitate to ask. I'll share my invoice in a few days assuming you're happy with everything.
Best,
Data Cleaning for ML
In order to build machine learning models, you typically need to clean the data in a very particular way. Models can't just take in raw data, or they'll accidentally 'learn' that variables with higher values had a higher impact when they actually didn't. So you have to use a scaler: a transformation that puts all the data on the same magnitude (e.g. between 0 and 1) while still preserving the same pattern. You also need to remove any NaN or Null values, and deal with categorical variables (e.g. by converting labels into numbers). In this exercise the code is already written, so we'll be testing your ability to run the code and understand what's going on.
Take a look at the code supplied - ffill was used to clean NaN values. Search for the documentation on ffill: what does it do?
What features were added to the dataset for seasonality?
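The cleaning steps above can be sketched with pandas. This is a minimal illustration with made-up column names (the actual notebook's dataset and features will differ): `ffill` propagates the last observed value forward into NaN gaps, and seasonality features can be derived from the date column.

```python
import pandas as pd

# Hypothetical weekly marketing dataset with gaps in one channel's spend
df = pd.DataFrame({
    "date": pd.date_range("2023-01-02", periods=6, freq="W-MON"),
    "facebook_spend": [100.0, None, 120.0, None, None, 150.0],
})

# ffill ("forward fill") replaces each NaN with the last observed value
df["facebook_spend"] = df["facebook_spend"].ffill()

# Example seasonality features derived from the date column
df["month"] = df["date"].dt.month
df["week_of_year"] = df["date"].dt.isocalendar().week.astype(int)

print(df)
```

Note that forward fill assumes a missing week looks like the previous one, which is reasonable for slowly changing spend but can hide real gaps in the data.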
Deep Learning algorithms are sensitive to the scale of data, so we need to transform the data before we feed it in. This is done in a data preprocessing pipeline, where each feature is transformed based on its type. Categorical variables are turned into dummy variables (1 or 0) with one hot encoding, and for numerical features best practice is to use a Quantile Transformer. Then all features are normalized, which ensures the features within each row are consistent with each other (unlike a standard scaler, which operates on columns).
Look at the SKLearn documentation for One Hot Encoder. What does 'handle_unknown=ignore' do?
What is the difference between standard scaler and normalizer? Search Google for more information and explain it in your own words. Why would we use Normalizer for Deep Learning?
Is the Quantile Transformer a linear or non-linear transformation?
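A pipeline like the one described can be assembled with scikit-learn's `ColumnTransformer`. The column names here are placeholders, not the ones from the course notebook; the structure (one hot encoding for categoricals, quantile transform for numerics, then row-wise normalization) is the point.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, QuantileTransformer, Normalizer

# Hypothetical dataset -- substitute your own spend columns and categoricals
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "facebook_spend": rng.random(50) * 1_000,
    "tv_spend": rng.random(50) * 50_000,
    "month": rng.integers(1, 13, 50).astype(str),  # categorical feature
})

preprocess = Pipeline([
    ("columns", ColumnTransformer([
        # categorical -> dummy variables; unseen categories become all zeros
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["month"]),
        # numeric -> non-linear quantile transform onto a uniform [0, 1] scale
        ("num", QuantileTransformer(n_quantiles=50), ["facebook_spend", "tv_spend"]),
    ])),
    # Normalizer rescales each ROW to unit norm (StandardScaler works on columns)
    ("norm", Normalizer()),
])

X = preprocess.fit_transform(df)
print(X.shape)
```

Because the whole transformation lives in one `Pipeline` object, the exact same steps fitted on training data can later be applied to new data with `preprocess.transform(...)`.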
Surveys have shown that 60% of the work in Data Science is simply collecting and cleaning data, getting it into the right format for modeling. So if you've made it this far, you're most of the way done (even if it feels like you haven't seen much progress yet).
Training your model’s brain
The magic moment is here: time to train the model. Will deep learning be able to learn the right parameters for your model?
Bank of Yu Offices — the next day
The data is clean, your pipeline is set up and you're ready to model. Despite what you might think, this is actually the easy part!
The Multi-layer Perceptron regressor is a relatively simple, accessible deep learning algorithm. It doesn't require a lot of custom configuration, so it's a good entry-level model for anyone looking to explore Neural Networks. It optimizes to reduce the squared error, so it is penalized more by big misses than by smaller errors. There are multiple 'solvers' available, which is what it uses to calculate the weights of the neurons in the network. To actually build the model you pair this algorithm with a GridSearch: that will run the model multiple times and find the right hyperparameters. In this case it's trying different hidden layer sizes (the layers of neurons in our artificial brain!). Once the GridSearch is run, we can retrieve the best model with the highest test score out of the hundreds it tries, via the best_estimator_ attribute.
There are no real best practices when it comes to Deep Learning, because it's a relatively new field. Practitioners act more like chefs than scientists, mixing ingredients until they find a recipe that works! So don't worry too much if you don't understand how many hidden layers to pick, or what configuration to use: try a few options and see how they impact your model, so you can learn what works for you.
'Deep' typically refers to the number of layers in the network; it's the popular term that's been adopted by the press.
Take a look at the MLPRegressor documentation on scikit-learn. When should you use the 'adam' solver instead of 'lbfgs'?
What was the score of the best model?
What’s inside the black box?
One problem with deep learning is that it can be kind of a black box: we don't know how it makes predictions. There are ways to get an understanding of what's going on, however...
Bank of Yu Offices — later that day
You've trained your model and selected your best estimator. Now we need to know how to use it to make decisions, and see how it does.
The main thing we want to learn from our model is what to spend our budget on. We can't access this directly: we don't know what all the different layers of neurons do in our model, so this technique is a lot more 'black box' than traditional linear regression. However, we can feed our neurons inputs and get outputs: if we give the model the different spend levels we want to predict, it will return the predictions. By controlling for all other variables and only changing the spend for one channel, we can show the partial dependence of that channel. This means we can see what the model predicts at different spend levels, and use those predictions to decide how much to spend on each channel.
The curve for Facebook Spend flattens at higher spend levels. What does this mean?
What's the benefit of using deep learning to estimate our model?
Man does not see reality as it is, but only as he perceives it, and his perception might be mistaken or biased.
Cassandra by Hybrida
Deep Learning is a complex topic, and while it's straightforward to get a simple version set up, it can be very difficult to use one of these models consistently in production. This is why I usually recommend you use an existing solution and see if it works with your data. Up until recently I didn't have a good recommendation for a tool that uses deep learning in your marketing mix modeling: then the team at Hybrida released Cassandra. It uses the basics you've learned in this course (the founder of Hybrida helped create this course!) but takes away all of the complexity: you just upload your data and the tool does the rest for you. I actually hired the team at Hybrida to build a few marketing mix models for me in the past, and I'm delighted they've put their knowledge into a tool that's more widely accessible. I don't have any personal affiliation other than co-creating this course and thinking their modeling methodology is worth learning about.
The only way to avoid human bias in marketing attribution is to take away our ability to corrupt the model. Cassandra is like having a second brain that's immune to internal politics and tells you honestly what's working (or not).