December 20, 2021

Befriend Your Data with Aravind Murthy, Anshul Jain, and Vinith Kumar

Aravind Murthy, Anshul Jain, Vinith Kumar

Data Science

An enthralling discussion on data science algorithms and machine learning in marketing analytics with Factors' in-house data science team

What Makes Dealing with Data Difficult?

A common pain point when dealing with data comes from the inherent characteristics of big data: volume, the sheer size of the unprocessed data; velocity, the rate at which data is generated and moves; and veracity, the truthfulness of the data. From a marketing perspective, these hurdles look like this:

1. Volume: Millions of customer touchpoints or interactions with your web page, and hundreds to thousands of metrics and dimensions.

2. Velocity: Changes in trends that occur at frequencies ranging from hours and days to weeks and months.

3. Veracity: Mining the top 10 segments of information that you can derive from your data daily.

Then we have the data science pipeline, which is a sequence of steps for converting raw data into useful solutions for business problems. Here are the typical stages of a data science pipeline:

1. Data Pre-processing: This step applies a set of standard operations to the data, such as filtering, aggregating, and ranking, along with handling missing values and outliers.

2. Feature Processing/Engineering: This is the back and forth between humans and machines in processing the data, with the goal of understanding its features, using analyses such as exploratory data analysis and dimensionality reduction.

3. Model Building/Training/Learning: Model building involves creating models, which are sets of parameters that describe how features interact with each other, in order to forecast a metric that is valuable to us. Your choice of algorithm will depend on whether your data is structured or unstructured: for example, random forests for structured data and deep neural networks for unstructured data. Common tasks at this stage include hyperparameter tuning, backpropagation, and gradient descent.

4. Model Inference and Explanation: The ultimate goal of building a model is to use it to predict valuable information on live data. If you were to create a model that predicts CVR (conversion rate), you would be interested in the “what”, or “what would the CVR be” (inference), and the “why”, or “why would the CVR be so” (explanation).
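
To make the four stages above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the field names (`pages_viewed`, `minutes_on_site`, `converted`), the 60-minute outlier cap, the synthetic data, and the choice of logistic regression trained with gradient descent. It is not the Factors pipeline, only one small instance of the general shape.

```python
import math
import random
from statistics import median

FEATURES = ["pages_viewed", "minutes_on_site"]  # hypothetical feature names
MAX_MINUTES = 60.0  # hypothetical cap used to tame outliers


def preprocess(rows):
    """Stage 1: drop unlabelled rows, impute missing values, clip outliers."""
    labelled = [r for r in rows if r["converted"] is not None]
    known = [r["minutes_on_site"] for r in labelled if r["minutes_on_site"] is not None]
    fill = median(known)
    cleaned = []
    for r in labelled:
        minutes = fill if r["minutes_on_site"] is None else r["minutes_on_site"]
        cleaned.append({**r, "minutes_on_site": min(minutes, MAX_MINUTES)})
    return cleaned


def build_features(rows):
    """Stage 2: divide each feature by its column maximum (values assumed non-negative)."""
    maxima = {name: max(r[name] for r in rows) or 1.0 for name in FEATURES}
    X = [[r[name] / maxima[name] for name in FEATURES] for r in rows]
    y = [r["converted"] for r in rows]
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Stage 4, inference: the predicted conversion probability (the "what")."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train(X, y, learning_rate=0.5, epochs=500):
    """Stage 3: fit logistic regression with batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict(weights, bias, xi) - yi
            for j, v in enumerate(xi):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def explain(weights):
    """Stage 4, explanation: features ranked by absolute weight (the "why")."""
    return sorted(zip(FEATURES, weights), key=lambda pair: abs(pair[1]), reverse=True)


# Synthetic sessions (invented for this sketch): more pages viewed means
# a higher chance of converting; time on site is unrelated noise.
random.seed(42)
raw = []
for _ in range(300):
    pages = random.randint(1, 10)
    minutes = None if random.random() < 0.1 else random.uniform(0.5, 20.0)
    converted = 1 if pages + random.gauss(0, 1.5) > 5.5 else 0
    raw.append({"pages_viewed": pages, "minutes_on_site": minutes, "converted": converted})

X, y = build_features(preprocess(raw))
weights, bias = train(X, y)
print("Predicted CVR for an already-scaled visit [0.9, 0.25]:",
      round(predict(weights, bias, [0.9, 0.25]), 3))
print("Features by influence:", explain(weights))
```

A linear model is used here only because its weights are easy to read as an explanation. With a random forest or a neural network, the explanation stage would need a different technique.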

What about Predictive Models to facilitate lead forecasting (Forecasting CPM)?

In general, predicting the leads you obtain through Google and Facebook is a complex problem, because metrics like CPM depend on volatile market conditions and on seasonality in how users bid on terms. That said, you can still work with your historical data and create a budget allocation scheme based on a past period whose performance resembles your present situation, which helps forecast the number of leads you can expect to generate. Existing techniques that apply here include reinforcement learning for budget allocation. This would involve starting with a semi-supervised setup in which you set the budget manually, after which the machine takes over setting the parameters. You can also use models that adjust your parameters over time; one example is the multi-armed bandit (MAB). Beyond that, identifying your data’s features, setting your goals, and managing your audience scalability are good practices for your model. If needed, a tool like Prophet by Facebook can be very useful for forecasting.
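
As an illustration of the multi-armed bandit idea, the sketch below uses Thompson sampling, which is one common bandit strategy and not necessarily the one the team has in mind. The channel names, lead rates, and number of rounds are all invented for the simulation. Each round spends one unit of budget on the channel whose sampled lead rate is highest, then records whether that unit produced a lead.

```python
import random


class ThompsonBandit:
    """One Beta(leads + 1, misses + 1) posterior per channel."""

    def __init__(self, channels, seed=None):
        self.rng = random.Random(seed)
        self.leads = {c: 0 for c in channels}
        self.misses = {c: 0 for c in channels}

    def choose(self):
        # Sample a plausible lead rate for every channel and pick the best draw.
        draws = {
            c: self.rng.betavariate(self.leads[c] + 1, self.misses[c] + 1)
            for c in self.leads
        }
        return max(draws, key=draws.get)

    def update(self, channel, got_lead):
        if got_lead:
            self.leads[channel] += 1
        else:
            self.misses[channel] += 1

    def allocation(self):
        """Share of budget units spent on each channel so far."""
        spent = {c: self.leads[c] + self.misses[c] for c in self.leads}
        total = sum(spent.values()) or 1
        return {c: spent[c] / total for c in spent}


def simulate(true_lead_rates, rounds, seed=0):
    """Run the bandit against a simulated market with hidden lead rates."""
    bandit = ThompsonBandit(list(true_lead_rates), seed=seed)
    market = random.Random(seed + 1)
    for _ in range(rounds):
        channel = bandit.choose()
        bandit.update(channel, market.random() < true_lead_rates[channel])
    return bandit


# Hypothetical channels and lead rates, used only to drive the simulation.
bandit = simulate({"channel_a": 0.04, "channel_b": 0.12}, rounds=3000)
print(bandit.allocation())
```

The manual starting phase described above could be approximated by initialising `leads` and `misses` with counts from the period when budgets were set by hand, so the bandit starts from what you already know instead of from a uniform prior.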

How do you decide on what data science technique/algorithm to use for a given task?

Several factors determine which technique to use, including:

1. The goal of your task

2. The orientation of your data

3. The relationships you expect your data to have

In a machine learning context, this means choosing between supervised learning (which uses labelled data to learn to predict outcomes), unsupervised learning (which finds hidden patterns by analysing and clustering unlabelled data), and reinforcement learning (which uses a training method that rewards desired outcomes and punishes undesired ones). You will need to identify your problem so that you can map it to one of these methods, which will in turn determine other choices, such as classification versus regression, or clustering.
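
The mapping from problem to method can be written down as a small decision helper. This is a deliberately simplified sketch: the function name, its three yes/no questions, and the order in which they are checked are assumptions, and real projects often blur these boundaries.

```python
from enum import Enum


class Family(Enum):
    CLASSIFICATION = "supervised learning: classification"
    REGRESSION = "supervised learning: regression"
    UNSUPERVISED = "unsupervised learning: clustering or dimensionality reduction"
    REINFORCEMENT = "reinforcement learning"


def suggest_family(has_labelled_goal, goal_is_numeric=False, learns_from_rewards=False):
    """Map three questions about the task to a family of techniques."""
    if learns_from_rewards:
        # The system acts, observes a reward, and adjusts over time.
        return Family.REINFORCEMENT
    if not has_labelled_goal:
        # Data but no target: look for structure instead of predicting.
        return Family.UNSUPERVISED
    return Family.REGRESSION if goal_is_numeric else Family.CLASSIFICATION


# Convert vs no convert is a labelled, categorical goal.
print(suggest_family(has_labelled_goal=True).value)
# Next month's percentage increase in sales is a labelled, numeric goal.
print(suggest_family(has_labelled_goal=True, goal_is_numeric=True).value)
# Segmenting a market with no predefined target.
print(suggest_family(has_labelled_goal=False).value)
```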

How do we use these Algorithms for Marketing Analytics?

If you have specific goals that can be derived from your historical data, and each goal can be framed as either categorical or numeric, the task falls under supervised learning. Categorical goals in a marketing context include click vs no click or convert vs no convert, while the percentage increase in sales for the upcoming month is an example of a numeric goal. These can then be treated as a classification or a regression problem, respectively.

When you have data but no specific goal, unsupervised learning can help you analyse it, using techniques such as clustering, topic modelling, and dimensionality reduction. One use case for unsupervised learning is market segmentation: if you want to segment your market based on a multitude of factors, clustering can be used.
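
For the segmentation use case, here is a minimal k-means sketch in plain Python. K-means is only one of several clustering algorithms, and the two customer features used (visits per week and pages per visit) as well as the data points are invented for illustration. With real data, features on different scales should be normalised first so that no single one dominates the distance.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            # Mean of the members; keep the old centroid if a cluster is empty.
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical customers as (visits_per_week, pages_per_visit).
customers = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

The segment numbers themselves carry no meaning. Interpreting what each segment represents, for example casual versus highly engaged visitors, remains a human step.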


Reinforcement learning, on the other hand, relies on training your algorithm by assigning rewards. This can be done by building a model and leaving a few parameters flexible, so that you keep gaining insights from those parameters every day, week, or month.

Can Automatic Analytics match the level of Manual Insights one can obtain?

If we consider the insights generated by humans to be the best we can obtain, then a machine could generate the same level of insights. Machines can identify far more combinations than a human can, and they can learn from insights that humans have already produced. This leads to a cycle in which machines generate many combinations, humans pick the best ones, and the machine optimizes based on those human choices. However, machines cannot yet explain the causality of outcomes as well as humans can. From a marketing point of view, it boils down to choosing the best correlations that are also causal in nature. The insights obtained from marketing data tend to follow templates, which makes it easier for a human than for a machine to identify the causal influence behind them.
