This is a blog series that I originally wrote as a book. I’m sharing the full text online for free. Each blog post is a chapter. Please send your feedback: If you’d like, get on the email list.

All chapters:

  1. Intro
  2. Marketing Strategy and Management
  3. Business Growth
  4. Research and Analytics (we’re here)
  5. Campaigns and Tactics

In this chapter:

  1. Intro
  2. Marketing Research Steps
  3. Types of Marketing Research
  4. Primary Qualitative Research: Focus Groups and 1:1 Interviews
  5. Technical Skills for Market Research and Analytics
  6. Primary Quantitative Research
  7. Statistics 101
    1. Importance of the Random Sample
    2. Arranging a Survey
  8. Statistical Analysis
    1. Statistical Tools
    2. Descriptive Statistics
    3. Regression Analysis
    4. Cluster Analysis
    5. Conjoint Analysis
    6. Causal Research
  9. Data Visualization
  10. Secondary Research
    1. Google Trends
    2. Google Correlate
    3. Google AdWords: Keyword Planner
    4. Public Datasets
    5. Third-party Analytical Reports
  11. Forecasting Sales for a New Launch
  12. Marketing Analytics
    1. Web Analytics
    2. Databases and SQL
    3. A/B Testing
  13. Attribution Models
  14. Working with Research Agencies
  15. Marketing Research/Analytics Checklist
  16. A Quick Recap
  17. More on Data Analysis and Statistics


“If I had asked people what they wanted, they would have said faster horses.”

Henry Ford

It is an odd quote to start a chapter about marketing research with, isn’t it? Henry Ford did not seem to put a lot of faith in marketing research. Nor did Steve Jobs, who said, “It’s really hard to design products by focus groups. A lot of times, people don’t know what they want until you show it to them.”

The real takeaway here is not that marketing research is worthless. The real takeaway is that misapplication of market research is worse than no marketing research.

In other words, Henry Ford should not have asked what people wanted. Instead, he could have asked why they wanted faster horses. He might have learned that they wanted to get to certain places quicker, relatively safely, and in a socially acceptable manner. This could have informed many design and marketing choices.

Steve Jobs was absolutely right about focus groups. If you know even a little bit about qualitative marketing research, you know that product innovation is not what focus groups are for. If anything, you can use focus groups to test user experience or understand what conclusions customers draw about your brand when they see a product, a prototype, or at least a mockup. But, of course, you cannot just ask random people what your company’s innovation or marketing strategy should be. This would be just silly, right?

Marketing Research Steps

Before starting any research project, no matter what type, it is very important to go through these steps:

  1. Set the research objective
    Be very clear about the question you are trying to answer. If there are multiple teams participating in the project, make sure that everyone is aligned on the objective and the final deliverable.
  2. Decide how the results will be used
    What particular business decisions will be made? If possible, pre-define the outcomes before even starting the research project.
    For example: “If, according to our research, the total addressable market turns out to be smaller than $3M, we are not going to develop the product in question. Otherwise, we are going to invest in development.”
  3. Choose the right type of marketing research that will allow you to answer the question
    We will discuss the major types of marketing research and explain when to use them. As you will learn, some are better for generating ideas or hypotheses and others are better for verifying hypotheses with quantitative data.

Types of Marketing Research

There are a million ways you can conduct marketing research. And there are hundreds of books written on each of these methods. To save time, we will review only the main ones and discuss some useful tools, both free and paid.

Here is our big picture map of marketing research:

[Diagram: big-picture map of marketing research]


We discussed how data can be utilized when we talked about acquisition, retention, virality, marketing funnel and image indicators in the “Business Growth” chapter. We showed how applying frameworks to data can help us set goals.

But how do you get the data? Well, there are several options.

Primary Qualitative Research: Focus Groups and 1:1 Interviews

Primary qualitative research is most commonly known as “focus groups.”

This is a very simple concept. We invite several customers – whether they buy our products or competing products – to talk about their experience so that we can understand their needs better. Then we develop products and marketing campaigns that address those needs. It sounds like something that anyone without any experience in marketing research can do. The devil is in the details, though.

When to use:

Focus groups are very helpful for generating initial ideas about the question at hand.

Sometimes we might not have enough information to design a well-thought-out quantitative survey and need a starting place. We can use focus groups in this situation because the conversation is free-flowing, which is conducive to brainstorming and the creation of new ideas.

So the main advantages of focus groups are:

  1. The depth of data: we can get a lot of insights to test later with a larger sample.
  2. Good for storytelling: we can expect to get a lot of anecdotes that can be leveraged later within the organization.
  3. Flexibility: we can direct the discussion in the way we want and discover new topics that are important to our customers.

And these are the main disadvantages or risks:

  1. The sample is very small:
    In other words, we are only working with a few people and their views might not be representative of a larger population of our customers. This is why focus groups are used to generate hypotheses, and then quantitative studies are used to validate these hypotheses.
  2. Conformity and groupthink can be a real issue:
    The facilitator of a focus group needs to work hard to manage naturally emerging leaders who skew the discussion and prevent other people from expressing their views.
  3. The importance of facilitator and proper preparation:
    If a facilitator is not experienced, he or she might not deal with group dynamics effectively or might ask leading questions. This, in turn, can undermine the validity of results. Hence, setting goals for a focus group and getting an experienced moderator is crucial.

These are the main applications:

  1. Understanding customers: their behaviors, values, goals, opinions, and attitudes towards products and services.
  2. Testing marketing materials:
    Sometimes before launching an ad campaign, it is a good idea to get initial feedback and understand what conclusions customers arrive at when they see the ads. How do they react to copy or visual ads? What does this ad make them think about our brand? Do they think this is a product for them? What reservations might they have based on what they see?
  3. Product feedback: learning about initial reactions to prototypes and mock-ups or even final products. This type of focus group is sometimes called a user group. Even simply watching customers use your product can provide so many insights to product managers, engineers, and designers that they will completely reprioritize the product roadmap. Watching our existing customers is good, but what is even better is to watch a group of customers who don’t use our products but buy competing products instead. Ask them to think out loud while they try to use the product for the first time. Product teams will thank you later.

How to use:

  1. Recruiting
    1. Arrange at least two separate focus groups of about 6-12 people each. For example, a group with the company’s current customers and a group with customers who buy competing products.
    2. Try to keep the composition of each group relatively homogeneous. There are exceptions to this rule, but in most cases it’s better to divide groups by age, gender, and other demographic characteristics.
    3. Screen out participants who try to earn money by professionally participating in focus groups.

Emmett Shear, who founded the startup called Twitch and sold it to Amazon for $970 million, highlights the importance of talking to the right audience in his “How to Run a User Interview” lecture in Y Combinator’s “How to Start a Startup” class. Emmett says that the team had a real breakthrough when they started talking to customers of competing products instead of focusing on their own customers.

  2. Logistics
    1. Incentivize participation:
      Participants should be paid enough to be willing to spend time talking to you, but not so much that the payment skews the results.
    2. Typical duration: one to two hours.
    3. Location: pick a relevant facility.
      Professional settings usually work best, but there are exceptions. For instance, if you conduct a focus group with children, consider a more natural setting because an office conference room might be intimidating and not very conducive to an open conversation.
  3. Preparation
    1. Outline a discussion plan with a list of questions, make sure it meets the research objective, and approve it with all stakeholders.
    2. Follow the plan to make sure that all topics are covered. If you feel that you don’t have the skills to conduct a successful session, hire a professional marketing research agency or an experienced facilitator who can create a relaxed positive atmosphere and keep the discussion productive.
    3. If needed, record the session on video. But don’t forget to get the participants’ consent beforehand!

Even though focus groups are typically conducted with 6-12 people, most of the rules apply to 1:1 interviews as well. A 1:1 interview lets us explore an individual’s preferences in more depth.

Technical Skills for Market Research and Analytics

These days, marketers with technical skills have the edge. Many people in Silicon Valley would even argue that it is crucial for a marketer to be technical. The truth is, almost everyone can benefit from learning at least some technical skills.

Often you can see a good strategist and high-level manager who cannot do anything hands-on. But equally often you might find a domain expert who is very good at crunching numbers or optimizing content for a specific channel but has little understanding of the big picture. So a combination of fundamental business acumen and technical skills is truly unique.

We will cover statistical analysis, forecasting, marketing analytics, and other topics in this chapter. You will learn what tools you can use to solve specific problems you might encounter, such as segmenting customers or evaluating the effectiveness of marketing campaigns. Of course, you can always hire a professional research agency or a statistician to crunch the numbers for you. But being able to do those things independently, or at least understand the general logic, is highly helpful.

In the next chapter on “Campaigns and Tactics,” we will also talk about technical skills but in a different context. There we will cover web design and development, SQL, UX and wireframing skills.

Primary Quantitative Research

As discussed in the section on focus groups, the end result of a focus group is often a list of ideas or hypotheses. We might learn about certain preferences or needs of our participants.

The next logical step is to check those ideas with a larger group of people and quantify our findings – understand how big the effect is. There are two main steps:

  1. Get the data
    Where do we get the data? Usually, it is obtained through customer surveys or through internal analytics tools, such as Google Analytics or SQL databases. We will talk more about these later.
  2. Make sense of the data
    There are different ways of making sense of the data. If we are dealing with a large dataset, sometimes it’s enough to visualize it to see patterns emerge. Another way is to conduct statistical analysis to derive conclusions. We will cover both below.

Arranging a Survey

When to use:

An online survey is a free or inexpensive way to get customer feedback or research an idea. Offline surveys might be necessary if you cannot get a representative sample of your customers online.

How to use:

If you already have a product with a user base, you can ask your customers questions to better understand their needs, evaluate product satisfaction, or achieve other research objectives.

If you don’t have a product with a user base just yet, there are still options for you. You might want to test your idea by getting feedback from the relevant target audience. Or you might want to learn more about the needs of your target audience and the products they currently use to fulfill these needs. In either of these situations, you might have to pay for an online panel to get your questions answered. SurveyMonkey, among others, offers such services.

Here is a quick refresher on the steps to take before any marketing research project:

  1. Set the research objective
  2. Decide how the results will be used
  3. Choose the right type of marketing research that will allow you to answer the question

The next step for surveys specifically is to write down a list of questions you want to ask. Once you have the list ready, make sure that getting the answers will allow you to achieve the research objective.

The next step is to ensure that the questionnaire will give you the responses you are looking for. Even though it’s nearly impossible to guarantee this outcome, a careful review will help you avoid some of the most common mistakes. First, go through this checklist and make sure that none of your questions fall into these categories:

1. Ambiguous question

2. Leading question
Example: “What do you like about our CoolBrand car that has been recently recognized as #1 car of the year by XX Magazine?”
Problem: you lead customers to only talk about positive attributes.

3. Non-exhaustive question
Question: “How do you commute to work?”
Options: “Bus | Car | Bicycle?”
Problem: What if you take a train or walk?

4. Non-mutually exclusive questions
Question: “What is your age?”
Options: “Under 18 | 18 – 25 | 25 – 30 | Over 30”
Problem: What should a 25-year-old person choose?

5. Two questions in one
Question: “Do you use a car audio system and phone charger?”
Options: “Yes | No”.
Problem: What if you only use a phone charger?

6. Unanswerable questions
Example: “How old were you when you first saw a CoolBrand car?”
Problem: Customers might not remember some things and might not care about some things that a company cares about.

Sometimes it’s harder to avoid leading questions than you might think. Even a word choice might influence the results. Consider this example:

(CBS/NY Times, February 10, 2010)


And this is an example showing how important it is to pick the right scale:

Question: “How many hours of television do you watch on an average weekday? Please choose the category from the list below that best describes your behavior.”

(Marketing Research class by Leif Nelson at BerkeleyHaas)

What conclusion should a marketing manager make based on the results above? According to the survey that used the scale on the left, only 16% watch more than 2.5 hours of TV on an average weekday. But according to the survey that used the scale on the right, the number is 38%!

Here is another example:

(Marketing Research class by Leif Nelson at BerkeleyHaas)


Both questions asked about the same thing. But notice how drastically different the results are: only 5% agree that the U.S. should forbid these speeches, but 19% agree that the U.S. “should not allow them.”

All of these are examples of different frames. In practice, it is very difficult to ask questions in a completely neutral tone, but this is something we should strive for.

Another important consideration is the order in which you ask questions. Consider this example:

(Marketing Research class by Leif Nelson at BerkeleyHaas)

Of the people who were asked about their interest in politics first, only 28% said that they are interested in politics. Of the people who were asked two preceding questions, 42% said that they are interested in politics.

Here are some good practices to follow when designing a survey questionnaire:

  1. Use simple words
  2. Use clear words
  3. Avoid leading questions
  4. Avoid implicit alternatives
  5. Avoid implicit assumptions
  6. Avoid double-barreled questions
  7. Consider frame of reference

Online survey tools:

For offline surveys, you might need to hire a research agency.

It is one thing to get data by conducting a survey. It is another thing to analyze the data and draw actionable conclusions. Marketing research agencies can help with this task. But in the spirit of learning how to do the hard work ourselves, we will discuss statistical analysis and data visualization here.

Statistics 101

Some basic understanding of statistics is critical to analyze data, whether you are looking at a customer survey or A/B test results.

Data can be analyzed with a variety of tools. We can use Google Spreadsheets, Microsoft Excel, or similar products for small datasets and for basic operations. But if we want to evaluate the statistical significance of a certain conclusion or create a regression model, these tools become less practical. It is still possible to use them, but the effort becomes highly cumbersome and unnecessarily complicated. There are better tools for these purposes. We will assume that you have a basic knowledge of Excel and will review some specialized tools for statistical analysis instead.

It might have been a while since you studied stats at school, so here is a quick refresher on key terms. If you are a statistician, please cut me some slack for potentially oversimplifying many terms.

Importance of the Random Sample

More often than not, surveying everyone in your database or every potential customer will not be practical. This means that most of the time, we will only survey a subset of existing or potential customers. And this is when we need to worry about inadvertently making our sample biased. In statistics lingo, our sample should be representative of the larger population.

Here is a simple example. Imagine you ask all your users to answer several quick questions on an opt-in basis. Let’s say 5% of them agree. Do you think you can take the results of this survey and extrapolate them to the remaining 95% of your users? You may be tempted to say yes. After all, you asked everyone to participate, and some users randomly agreed. Well, the bad news is that this is not what statisticians mean when they say random.

What if the product satisfaction among the 5% of your users who responded to the survey is 87%, but the average product satisfaction among the remaining 95% of users is only 36%? You’ll never know about the 36%, but you will be tempted to conclude that product satisfaction is 87% among all users while the real satisfaction is 0.05 * 0.87 + 0.95 * 0.36 = 38.55%.

In this particular example, the 5% of users who responded are not representative of the larger population. So we cannot draw any conclusions about all users. Instead, we could have assigned the survey to a randomly selected 5% of users and incentivized them to participate. Had we done that and achieved a high participation rate among these randomly selected people, we could have discovered that the real satisfaction is actually much lower.
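To make the difference between an opt-in sample and a random sample more tangible, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the user base is simulated, the 87% and 36% satisfaction rates are taken from the example above, and we assume that every randomly selected user actually responds.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical user base: the first 5% are eager responders (87% satisfied),
# the remaining 95% are quieter users (36% satisfied).
N_USERS = 100_000
users = []
for i in range(N_USERS):
    eager = i < N_USERS * 0.05
    p_satisfied = 0.87 if eager else 0.36
    users.append({"eager": eager, "satisfied": random.random() < p_satisfied})


def satisfaction(group):
    """Share of satisfied users in a group."""
    return sum(u["satisfied"] for u in group) / len(group)


# Opt-in survey: only the eager 5% answer.
opt_in = [u for u in users if u["eager"]]

# Random sample of the same size, assuming everyone selected responds.
random_sample = random.sample(users, len(opt_in))

true_rate = satisfaction(users)
print(f"Opt-in estimate:   {satisfaction(opt_in):.1%}")
print(f"Random estimate:   {satisfaction(random_sample):.1%}")
print(f"True satisfaction: {true_rate:.1%}")
```

The opt-in estimate should land near 87%, while the random sample should land close to the true figure of roughly 38.55%, even though both samples contain the same number of people.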

Now let’s review some other basic statistical terms.

Statistical Terms


Mean is what people usually think of when they hear “average.” Imagine there are five people in the room. Four of them make $45K a year, but the fifth person is Bill Gates. Let’s assume for the purpose of this example that he makes $1M a year. What is the mean income in the room?

($45,000*4 + $1,000,000) / 5 = $236K

As you can see from this example, mean can be easily distorted by outliers, such as Bill Gates. Intuitively, you might feel that concluding that the average income is $236K, when four out of five people make $45K, is not exactly right. This is why statisticians often use the median to get rid of outliers. Here is how it works:


Median is the number that is halfway through the set when the values are sorted. Median is often used when we want to exclude the impact of outliers like Bill Gates from our analysis. Continuing with the same example, here is the set of income levels of these five people in ascending order:

$45K | $45K | $45K | $45K | $1,000K

Which number is halfway? It is the third number: $45K. Median income in this room is $45K. As you can see, both the median and the mean are useful numbers and can be used to answer different questions.
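If you want to check these two numbers yourself, here is a small sketch using Python’s built-in statistics module. The list simply repeats the five incomes from the example above.

```python
from statistics import mean, median

# Five people in the room: four earn $45K, one earns $1M (from the example above).
incomes = [45_000, 45_000, 45_000, 45_000, 1_000_000]

mean_income = mean(incomes)      # pulled up by the single outlier
median_income = median(incomes)  # the middle value of the sorted list

print(f"Mean income:   ${mean_income:,.0f}")
print(f"Median income: ${median_income:,.0f}")
```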

Null hypothesis

The null hypothesis is just a fancy term for a default situation that you start with. For example, the null hypothesis can be “users who see version B of the website are no more or less likely to buy our product than users who see version A.” Another example might be “users who received engagement emails from us are no more or less likely to be retained than users who did not.”

It sounds odd, but the idea is simple: as we haven’t proven anything with our analysis yet, we are simply stating the current state of affairs. The current state of affairs is not knowing the result. This is why we say that “users who received engagement emails from us are no more or less likely to be retained than users who did not.” We simply don’t know, so we start with a blank page, assuming these emails make no difference. I found this concept very counterintuitive when I first learned about it too. Please bear with me.

Alternative hypothesis

The alternative hypothesis is an idea that we want to test by conducting statistical analysis. For example, “users who see version B of the website are more likely to buy our product than users who see version A” or “users who received engagement emails from us are more likely to be retained than users who did not.” We don’t know if our alternative hypothesis is true yet, but it is important to state it clearly to be able to test the idea.

You might also hear statisticians say “reject the null hypothesis” or “fail to reject the null hypothesis” (the latter is often loosely phrased as “accept the null”). It took me a bit to wrap my head around those phrases. But they are actually very simple.

Imagine we conduct an analysis and find strong evidence for our alternative hypothesis: for example, users who see version B of the website are more likely to buy than users who see version A. In this case, we can say we “reject the null.” What it means is that the data are hard to square with the null hypothesis, so we side with the alternative hypothesis.

Now imagine we conduct an analysis but it does not say anything definitive about user behavior. Maybe there are differences between version A and version B, but they are too small to be statistically significant. In this case, we “fail to reject the null.” It simply means that we failed to prove anything at all.

P-value

A simple way to think about the p-value is as the probability of seeing a result at least as strong as ours purely by chance, assuming the null hypothesis is true. This is important because no matter how elaborate our analysis is, it’s still possible that we got our results due to chance. So it’s important to estimate that probability so that we do not fool ourselves.

In other words, when you see “p-value = 0.05,” you can interpret it as “if there were really no difference, randomness in the data alone would produce a result like this only about 5% of the time.”

The p-value is automatically calculated for us by all the apps we describe here, so we don’t need to worry about formulas. We simply need to interpret the p-value we get: is it low enough? The lower it is, the harder it is to explain what we see by chance alone.

α (alpha) or a significance level

Alpha simply means the highest p-value that you are comfortable with. You can set any α that you want. Weird, I know. It is usually set at 5%. It means we accept a 5% risk of declaring a finding real when it is actually due to chance.

So imagine we set our alpha at the 5% level before conducting the analysis. Then imagine that our app shows us a p-value of 0.04. It means that we will need to conclude that our results are statistically significant (we will reject the null hypothesis) because 0.04 < 0.05.

But if the p-value turns out to be 0.06, we will need to conclude that we have not proven anything because the p-value is above the threshold we set (0.06 > 0.05).
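To see where a p-value might come from and how it is compared with alpha, here is a rough sketch in Python. It uses a two-proportion z-test, which is one common way to compare conversion rates in an A/B test and is our choice for this illustration, not something your statistical app is guaranteed to use. The visitor and purchase counts are made up, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the null hypothesis that A and B convert equally."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both versions share one pooled conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a difference at least this large in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


ALPHA = 0.05  # significance level chosen before looking at the data

# Made-up numbers: 5,000 visitors per version, 200 vs. 250 purchases.
p_value = two_proportion_p_value(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
significant = p_value < ALPHA

print(f"p-value = {p_value:.4f}")
print("Reject the null" if significant else "Fail to reject the null")
```

With these made-up counts the p-value comes out below 0.05, so we would reject the null. Try smaller samples with the same conversion rates and watch the p-value climb above alpha.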

Confidence interval

Statistical findings are often presented with just one number. For example: “users are 8% more likely to buy when they see version B vs version A.” In reality, though, we never know the specific number – we only know the interval. A more accurate way to present the same conclusion would be “users are 6-10% more likely to buy when they see version B vs. version A.”

We don’t have to worry about calculating a confidence interval either because it’s also automatically calculated for us by apps. Actually, by default, a very specific type of confidence interval is calculated: “95% confidence interval.” Here is what the 95% confidence interval means in human language: “we are 95% sure that users are 6-10% more likely to buy from us when they see version B vs. version A.”

Why 95%, you might ask? It’s just a convention. We can pick a different number. For example, if we want to be 99% confident in our conclusion, we can adjust app settings to calculate a 99% confidence interval. Assuming the same data, do you want to guess what will happen to the conclusion?

That’s right, the interval will get wider! So the conclusion will likely sound more like this: “we are 99% sure that users are 1-15% more likely to buy from us when they see version B vs. version A.” Or even worse, it might look something like this: “we are 99% sure that users are -18 – 34% more likely to buy from us when they see version B vs. version A.” The latter would mean that we cannot even conclude that version B performs better with 99% confidence. What can we do in this situation? Two options:

  1. Set a more modest confidence level. For example, 95% or 90%.
  2. Get more data. The bigger the sample is, the easier it will be to achieve a high confidence level.
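Here is a minimal sketch in Python of how the confidence level and the sample size change the width of the interval. It uses the normal approximation for the difference between two conversion rates, which is an assumption on our part and only one of several methods an app might use. The counts are made up, the function name is hypothetical, and the interval is expressed in absolute percentage points rather than as a relative “X% more likely.”

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Normal-approximation interval for (conversion rate B - conversion rate A)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    std_err = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    # Critical value: about 1.96 for 95% confidence, about 2.58 for 99%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = rate_b - rate_a
    return diff - z * std_err, diff + z * std_err


# Made-up A/B test: 5,000 visitors per version, 200 vs. 250 purchases.
ci_95 = diff_confidence_interval(200, 5000, 250, 5000, confidence=0.95)
ci_99 = diff_confidence_interval(200, 5000, 250, 5000, confidence=0.99)

# Same conversion rates, ten times more data.
ci_99_big = diff_confidence_interval(2000, 50000, 2500, 50000, confidence=0.99)

for label, (low, high) in [("95%", ci_95), ("99%", ci_99), ("99%, 10x data", ci_99_big)]:
    print(f"{label}: {low:+.2%} to {high:+.2%}")
```

In this sketch the 95% interval stays above zero, the 99% interval stretches below zero, and the 99% interval with ten times more data is back above zero. That is the two options above in action.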

Statistical Analysis

Statistical Tools

Below are some apps commonly used for statistical analysis:

  • R (free, harder to learn)
  • SPSS (paid, easier to use)
  • Stata (paid, easier to use)

Pick the one you like. We used SPSS and Stata when learning Marketing Research and Marketing Analytics at UC Berkeley. R is open-source and very popular. It also has a big community. But R is also harder to learn.

There are books written about each application. So we will only cover some basics of what you can use these apps for in the context of marketing. The easiest way to learn more is to simply google the name of an app + a type of analysis you are trying to conduct. For example, “SPSS regression analysis” or “R cluster analysis.”

Now that we have reviewed some basic statistics concepts and discussed popular applications, let’s see what we can do with these applications.

Descriptive Statistics

We’ll start with descriptive statistics because this is the easiest type of data analysis.

When to use:

When you want to summarize a large dataset in a small, neat table.

Imagine you have a huge table with 100,000 rows and 100 columns that describes our customers and looks something like the one below. This is what a typical customer relations database might look like. Even though it’s full of useful information, it’s hard to find insights by simply looking at it.

This table would be too big to fit on this page, so we will skip rows or customers from 4 to 99,999 and characteristics or columns from 4 to 99.

                       | Age, years | Product satisfaction, 1-5 | Money spent, $ | …columns 4 – 99… | Registration date
Customer 1             | 35         | 4                         | $1,384         | …                | 02/11/2007
Customer 2             | 45         | 2                         | $234           | …                | 05/01/2013
Customer 3             | 22         | 2                         | $121           | …                | 8/04/2014
…customers 4 – 99,999… | …          | …                         | …              | …                | 08/08/2008
Customer 100,000       | 34         | 3                         | $284           | …                | 09/03/11

What descriptive statistics can help you with is creating a small, neat table that would answer practical questions.

For example:

How does product satisfaction influence amount of money spent?

Product satisfaction, 1-5 | Money spent, average $
1                         | $4.56
2                         | $89.45
3                         | $385.67
4                         | $1,543.94
5                         | $2,359.09

As you can see, we grouped all customers by their level of product satisfaction and calculated the average money spent by each group. Of course, this can be done with Excel pivot tables. But it takes only a few clicks or a few commands with Stata or SPSS. Here is how.

How to use:

If you are using SPSS, go to Analyze -> Descriptive Statistics.

If you are using another app, google “name_of_your_app descriptive statistics”
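If you prefer a general-purpose programming language, the same kind of summary can be sketched in a few lines of Python. Python is not one of the apps discussed in this chapter, so treat this as an optional illustration. The first four rows below reuse the sample customers shown earlier; the last two rows are made up so that each group has more than one value to average.

```python
from collections import defaultdict
from statistics import mean

# Each row: (product satisfaction 1-5, money spent in $).
customers = [
    (4, 1384),  # Customer 1
    (2, 234),   # Customer 2
    (2, 121),   # Customer 3
    (3, 284),   # Customer 100,000
    (4, 1500),  # made-up row
    (3, 400),   # made-up row
]

# Group the spend values by satisfaction level.
spend_by_satisfaction = defaultdict(list)
for satisfaction, spent in customers:
    spend_by_satisfaction[satisfaction].append(spent)

# Average spend per satisfaction level, sorted by level.
summary = {level: mean(values) for level, values in sorted(spend_by_satisfaction.items())}

print("Product satisfaction | Money spent, average $")
for level, average in summary.items():
    print(f"{level} | ${average:,.2f}")
```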

Regression Analysis

Regression analysis is a popular type of statistical modeling. It is used widely in business and science because of its power and versatility.

When to use:

Regression analysis, in the most general sense, is used to estimate relationships between variables. For instance, once a regression model is built, we might be able to answer questions such as “how will variable Y change if variable X is increased by 10%?”

Regression analysis can be applied to millions of problems, from understanding whether customer satisfaction increases revenue per customer to understanding whether eating a healthy diet increases life expectancy.

These are just a few marketing use cases:

  • Understand which marketing campaigns are most effective in driving sales
    -> redirect investment and increase effectiveness
  • Understand which customer characteristics are associated with increased spend
    -> prioritize product and marketing decisions
  • Understand which factors are associated with high or low user satisfaction
    -> address the most pressing issues
  • Understand which factors are associated with user churn
    -> target the most impactful factors to retain users

How to use:

This example is intended for illustration purposes to show when and how you can use regressions to improve your marketing.

Let’s continue with the same example we discussed in the previous section. Let’s look at our dataset which describes customers:

                       | Age, years | Product satisfaction, 1-5 | Money spent, $ | …columns 4 – 99… | Registration date
Variable name          | age        | prodsat                   | monsp          | …                | regdate
Customer 1             | 35         | 4                         | $1,384         | …                | 02/11/2007
Customer 2             | 45         | 2                         | $234           | …                | 05/01/2013
Customer 3             | 22         | 2                         | $121           | …                | 8/04/2014
…customers 4 – 99,999… | …          | …                         | …              | …                | …
Customer 100,000       | 34         | 3                         | $284           | …                | 09/03/11

You might have noticed the new row named “variable name.” It simply adds a short name to each column. We will use these short names in the future.

We will follow a simple, three-step process:

  1. Identify a question or hypothesis to test
  2. Run a regression
  3. Interpret the results and make practical conclusions

Step #1: Identify a question or hypothesis to test

Now we need to come up with a question that we want to answer. Let’s say we want to understand if age and product satisfaction increase money spent by an average customer.

A statistician might say that money spent in dollars is our dependent variable. In turn, product satisfaction and age are our independent variables or predictor variables. This is because we use these two variables to predict the third variable, which depends on them.

Step #2: Run a regression

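Now we need to ask our statistical application of choice to build a regression model. In this example, we will be talking about a linear model, which looks something like this:

Y = β0 + β1·x1 + β2·x2 + … + βk·xk

Here β0 is the intercept: the predicted value of Y when every x equals zero.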


  • x1, x2, x3, x4 – all the way to xk are independent variables. In our example, we will only have two of them, x1 and x2, because we are only using two independent variables: age and product satisfaction.
  • Y is the dependent variable. In our example, it’s the amount of money an average customer is predicted to spend.
  • β (beta) simply shows by how much an increase in each x increases or decreases Y.

It will all make more sense in a moment.

To run a linear regression in Stata we simply need to type:

regress monsp age prodsat

Here is what it means:

  • “regress” is a command prompting Stata to build a regression model
  • “monsp” is our dependent variable (this is why it comes first)
  • “age” and “prodsat” are our independent variables (which is why they come after “monsp”)

Thankfully, Stata builds the model for us and returns the results in a fraction of a second. All we have to do is interpret the results.
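If you work in Python rather than Stata, here is a minimal sketch of the calculation behind a linear regression: ordinary least squares. The helper name fit_ols and the five customers are made up for illustration; they are not taken from the dataset above. The spending numbers are constructed so that each satisfaction point adds exactly $50 and age has no effect, which makes the fitted coefficients easy to check. Unlike Stata, this sketch does not report p-values or confidence intervals.

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of predictor tuples, e.g. (age, prodsat)
    y:    list of outcomes, e.g. monsp
    Returns [constant, coef_1, coef_2, ...].
    """
    # Add a leading 1 to every row so the model includes a constant (_cons).
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical customers: (age, prodsat) pairs and money spent (monsp).
predictors = [(35, 4), (45, 2), (22, 2), (34, 3), (51, 5)]
monsp = [100 + 50 * sat for _, sat in predictors]  # age plays no role here

cons, age_coef, prodsat_coef = fit_ols(predictors, monsp)
print(round(cons, 2), round(age_coef, 2), round(prodsat_coef, 2))
```

In practice, a statistics package is the better choice because it also tells us how much to trust each coefficient.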

Step #3: Interpret the results and make practical conclusions

What Stata or another application will give back to us might look scary at first. But there are really only a few numbers that we care about. And they are easy to understand. Let’s highlight these numbers in green and take a look:


Let’s go one by one:

  • “Prob > F” is the p-value.
    Please re-read “Statistics 101” if you don’t remember what a p-value is. Here is how to interpret it: if the p-value is lower than the significance level (alpha) we are striving for, we can conclude that we found some meaningful relationship. Luckily for us, the p-value is super low: 0.000. This is very good; as you might remember, the lower it is, the better. Typically, p-values lower than 0.05 are considered “good enough.”
  • “R-squared” is not critical in this case. The way it’s usually interpreted is “the share of the variation in the dependent variable that is explained by the model.”
    In other words, product satisfaction and age explain only about 0.67% of what determines the amount of money customers spend. This is not necessarily bad. We will never be able to explain 100%. There are always other factors at play, such as the customer’s disposable income, for example. Other things being equal, the more independent variables we use in our analysis, the higher R-squared should usually be.
  • Now let’s turn to the table. First of all, let’s look at the “P>|t|” column.
    This column simply shows variable-specific p-values. This is a way to tell if there is a meaningful relationship between each independent variable and the dependent variable or if there is none.

    • As you can see, the p-value for “age” is very high – much higher than 0.05. It means that we can completely ignore age as a factor. In other words, based on our dataset and based on this regression, we cannot reliably say that age predicts money spent by an average customer in any meaningful way. So we won’t even look at other numbers on this line.
    • How about product satisfaction (“prodsat”)? P-value here is amazing; it’s 0.000. This is definitely lower than 0.05. It means there is something here. So let’s look at other columns to understand what’s going on with product satisfaction.
  • “Coef.” column shows how strong the effect of a given variable is. Here is how to interpret it: 386.429 means that on average, with each one-point increase in product satisfaction, a customer is expected to spend $386.43 more.
    This makes a lot of intuitive sense, but this analysis allows us to quantify the effect: An average customer who has a product satisfaction of “4” is expected to spend $386.43 more than a customer with a product satisfaction of “3.” An average customer with product satisfaction of “4” is also expected to spend $386.43 * 2 = $772.86 more than a customer with product satisfaction of “2.”
  • “95% Conf. Interval” is exactly that: a 95% confidence interval.
    Please re-read “Statistics 101” if you don’t remember what a confidence interval is. Here is how we can interpret our numbers. We are 95% confident that with each one-point increase in product satisfaction, an average customer is expected to spend from $378.90 to $393.96 more.
    You might also notice that we cannot say much about age with 95% confidence: the lower bound of the confidence interval is negative and the upper bound is positive. This is reflective of the fact that the p-value is high. We don’t even know if customers who are older spend more or less on average.
  • “_cons” captures some constant number that an average customer is expected to spend before any independent variables are taken into account.

This interpretation might feel cumbersome at first, but it will become much more natural with practice.

To see the logic, let’s plug some numbers into our original formula:


This is what the formula looks like with the names used in our analysis:

Y (money spent) = _cons + (prodsat coef) * x

Let’s imagine we want to predict an average amount that a certain customer is expected to spend, based on his or her product satisfaction. Let’s say that his or her product satisfaction is “3.”

Y (money spent) = _cons + (prodsat coef) * x = 135.7517 + 3 * 386.429 = 1,295.0387

So an average customer with a product satisfaction of “3” is expected to spend $1,295.04.
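The same arithmetic can be sketched in a few lines of Python. The two coefficients come from the regression output discussed above; the function name predict_spend is just an illustrative choice.

```python
# Coefficients taken from the regression output discussed above.
CONS = 135.7517          # _cons
PRODSAT_COEF = 386.429   # coefficient on prodsat


def predict_spend(prodsat):
    """Predicted money spent for a given product satisfaction score (1-5)."""
    return CONS + PRODSAT_COEF * prodsat


# Print the predicted spend for every possible satisfaction score.
for score in range(1, 6):
    print(score, round(predict_spend(score), 2))
```

Moving from one score to the next always changes the prediction by the same amount, the prodsat coefficient, which is exactly what “linear” means here.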

Instead of doing the math ourselves, we can also ask Stata to do all the work for us and predict “money spent” numbers for each customer in our database. All we need is to type:

predict variablename

We can use anything for variable name. Stata will simply add another column to our dataset and populate it with predicted values, making calculations in exactly the same way we just did for each row. For example, we can use:

predict predictedsales or predict howmuchwilltheyspend

Here is how the dataset will look after Stata adds this new column:


Remember, monsp = “Money spent” and predictedsales = “Predicted Sales.”

Our model predicts values for some customers very well, but not quite so well for others. This is partly because we only used one variable in our model. Well, we started with two, but age turned out to be irrelevant. We can use more variables to improve the accuracy, but going through, let’s say a hundred, would make this example unnecessarily complicated.

Real-world regression analysis is usually more sophisticated. Often you would be able to get predicted numbers very close to actual numbers. Here are some considerations that can complicate real-world regression analysis:

  • Often we would use dozens or even hundreds of variables, not just one or two, to render more accurate predictions.
  • Depending on the research objective, we might use different types of regression models: linear regression, logistic regression, polynomial regressions, and others.
  • There are possible complications, such as interactions between variables.

There are many books written on the ways to use regression analysis – we won’t be able to cover it here. Now that you know how this type of analysis can be used, you can choose to learn more about it by following some of the tutorials below or by reading books recommended at the end of this chapter.

If on the other hand, you choose to outsource this work, you will know what questions to ask and how to interpret conclusions that you are given by a research agency or a data scientist.

Tutorials on regression analysis and Stata/SPSS/R:

Cluster Analysis


When to use:

In a marketing context, cluster analysis is typically used to segment customers. Once we have identified distinct segments, we can customize our products and marketing to each particular segment. We can identify a small niche and provide a better product for a subset of customers.

For example, our shoe brand might not be able to compete with Nike in the category of running shoes. But if our cluster analysis reveals that there are a lot of customers interested in environmentally-friendly running shoes, this gives us a chance to win a small part of the market by developing an eco-friendly product, positioning it accordingly and going after this niche target audience.

Or we can go one level deeper and identify a subset of customers who prefer environmentally-friendly running shoes that are optimized for sprints, not marathons. Or we can target a subset of customers who prefer environmentally-friendly running shoes that are optimized for sprints and have a retro look. Or we can market to a subset of customers who prefer environmentally-friendly running shoes that are optimized for sprints, have a retro look, and come in wide sizes. You get the point.

Are those segments big enough and distinct enough from other segments? Are there a lot of people who are similar enough to be put into one segment? Cluster analysis can help us answer questions like these two.

Applying cluster analysis to customer segmentation can be helpful in at least two kinds of situations:

  1. We are a startup and there is a large established company with a generic product that we cannot compete with, so we want to identify a smaller profitable niche.
  2. We are a large established company with a portfolio of products and we want to target each product at a specific segment to better serve customer needs and avoid losing market share to the smart startups described in the previous point.

How to use:

From here on, we will not show all the necessary commands because it would make this chapter too long. Instead, we will show you when to use a certain type of analysis and walk you through the logic so that you have a conceptual understanding. If you want to learn the mechanics, please follow the links below or google tutorials for your preferred app. Now, to cluster analysis.

We can apply cluster analysis to a dataset of customer preferences that looks something like this:

| | How important is price? | How important is that shoes are created for sprinting specifically? | How important is retro look? | Other… |
| Customer 1 | 4 | 6 | 7 | … |
| Customer 2 | 8 | 5 | 9 | … |
| Customer 100,000 | 3 | 8 | 4 | … |

Just as with regression analysis, the actual data crunching is done by the app. Follow the links after this section to learn the commands to conduct cluster analysis in your preferred app.

The goal is to get a description of several distinct clusters or segments. We can specify the number of segments in the settings. What we get is a result which should look similar to this table:

| | How important is price? | How important is that shoes are created for sprinting specifically? | How important is retro look? |
| Segment 1 | 3.4 | 7.1 | 0.2 |
| Segment 2 | 2.2 | 0.4 | 8.3 |
| Segment 3 | 8.1 | 1.2 | 1.3 |

By looking at the summary, we can see patterns emerge. The next logical step is to give our segments names – this is the part that’s the most fun!

  • Segment 1 is straightforward; these people want shoes made for sprinting and they don’t seem to care about price too much. So let’s call them “upscale sprinters.”
  • Segment 2 people seem to mostly care about a retro look and worry about prices even less. Let’s call them “wealthy old-schoolers.”
  • Segment 3 people, on the other hand, seem to mostly care about the price. So to simplify matters, we call them “thrifty runners.”

As you can see above, we generated segment descriptions. The next step is to assign each customer to the segment he or she fits best:

| Customer | Segment |
| Customer 1 | 3 |
| Customer 2 | 3 |
| Customer 100,000 | 1 |
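To make the assignment step concrete, here is a minimal Python sketch that puts each customer into the segment with the closest profile. The segment profiles are the three columns shown in the summary table above; the three customers and the function name closest_segment are made up for illustration. A real clustering app works with all survey columns at once and also finds the segment profiles itself, so treat this as an illustration of the logic only.

```python
# Segment profiles from the summary table above: (price, sprinting, retro look).
SEGMENTS = {
    1: (3.4, 7.1, 0.2),   # "upscale sprinters"
    2: (2.2, 0.4, 8.3),   # "wealthy old-schoolers"
    3: (8.1, 1.2, 1.3),   # "thrifty runners"
}


def closest_segment(answers):
    """Return the segment whose profile is nearest (squared Euclidean distance)."""
    def distance(profile):
        return sum((a - p) ** 2 for a, p in zip(answers, profile))
    return min(SEGMENTS, key=lambda seg: distance(SEGMENTS[seg]))


# Hypothetical survey answers, made up for this sketch.
customers = {"A": (9, 2, 1), "B": (3, 8, 1), "C": (2, 1, 9)}
assignments = {name: closest_segment(ans) for name, ans in customers.items()}
print(assignments)
```

Customer A cares mostly about price and lands with the “thrifty runners,” B with the “upscale sprinters,” and C with the “wealthy old-schoolers.”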

So what are the marketing and business implications? Now that we know the segments, we can:

  1. Evaluate the business opportunity by estimating the relative sizes of these segments.
    If our sample of 100,000 is representative of the total market, we can simply count the number of customers in each segment, divide by 100,000, and then multiply by the total market. For example, let’s say we want to estimate the size of customer segment 1. Let’s say the total market is 10,000,000 customers and 17,000 of our 100,000 customers got classified into segment 1. Then the size of segment 1 is 17,000 / 100,000 * 10,000,000 = 1,700,000
  2. Identify the most promising segments: those whose needs we are most likely to meet with our products and where we are most likely to be competitive.
  3. Make conscious decisions on what segments to target and what segments not to target.
    For example, if we can design shoes with a great retro look, we can go after “wealthy old-schoolers.” If, on the other hand, our company is the most cost-efficient on the market but does not really have a lot of designer talent, we might go after “thrifty runners.”
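The segment-size estimate from the first point is simple enough to sketch in Python. The function name segment_size is an illustrative choice, and the numbers are the ones used in the example above.

```python
def segment_size(customers_in_segment, sample_size, total_market):
    """Scale a segment's share of the sample up to the total market."""
    return customers_in_segment * total_market / sample_size


# 17,000 of our 100,000 sampled customers, in a market of 10,000,000.
print(segment_size(17_000, 100_000, 10_000_000))
```

The estimate is only as good as the sample: if the 100,000 customers are not representative of the market, the scaled-up number will be off as well.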

Tutorials on cluster analysis and SPSS/Stata/R:

Conjoint Analysis

When to use:

Learn which product attributes are most valuable to customers to achieve product-market fit and refine our value proposition.

For example, would our customers prefer a fast smartphone that costs $500 or a slower one for $350? Would they prefer a faster processor or more memory? Would they prefer a product from a known brand for $89 or a product from an unknown company with the same specifications for $59? This type of quantified data can help us make product or marketing decisions and achieve product-market fit.

How to use:

We need to develop a specific type of survey where customers would repeatedly choose between products with different sets of characteristics. This would allow us to subsequently rank individual features. Imagine we have four characteristics:

  1. Fast processor: 1,000MHz vs 500MHz
  2. Low price: $350 instead of $500
  3. Sleek design: 1 or 0
  4. Well-known brand: 1 or 0

It means that we will first need to create a list of all possible combinations. In this example, we have four attributes with two options each. So the list will be 2 * 2 * 2 * 2 = 16 rows long:

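| Combination # | Processor, MHz | Price, $ | Sleek design | Well-known brand |
| 1 | 1,000 | 350 | 1 | 1 |
| 2 | 1,000 | 350 | 1 | 0 |
| 3 | 1,000 | 350 | 0 | 1 |
| 4 | 1,000 | 350 | 0 | 0 |
| 5 | 1,000 | 500 | 1 | 1 |
| 6 | 1,000 | 500 | 1 | 0 |
| 7 | 1,000 | 500 | 0 | 1 |
| 8 | 1,000 | 500 | 0 | 0 |
| 9 | 500 | 350 | 1 | 1 |
| 10 | 500 | 350 | 1 | 0 |
| 11 | 500 | 350 | 0 | 1 |
| 12 | 500 | 350 | 0 | 0 |
| 13 | 500 | 500 | 1 | 1 |
| 14 | 500 | 500 | 1 | 0 |
| 15 | 500 | 500 | 0 | 1 |
| 16 | 500 | 500 | 0 | 0 |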
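With more attributes, writing this list by hand gets tedious. Here is a minimal Python sketch that generates the same 16 combinations in the same order; the variable names are illustrative.

```python
from itertools import product

# Two levels per attribute, as in the list of characteristics above.
processor_mhz = [1000, 500]
price_usd = [350, 500]
sleek_design = [1, 0]
known_brand = [1, 0]

# Every possible pairing of levels: 2 * 2 * 2 * 2 = 16 rows.
combinations = list(product(processor_mhz, price_usd, sleek_design, known_brand))

for number, combo in enumerate(combinations, start=1):
    print(number, *combo)
```

Adding a fifth two-level attribute would double the list to 32 rows, which is one reason real conjoint surveys usually show each respondent only a subset of the comparisons.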

To find out real preferences, we will ask customers to choose between pairs of these combinations. Here is an example of just one of those questions:

“What do you prefer: a smartphone with a fast processor, low price and sleek design from an unknown brand or a smartphone with a fast processor, high price and a sleek design from a well-known brand?”

Conjoint analysis solves a very important problem – customers cannot estimate how much they value a certain product attribute. In other words, you cannot simply ask “how important is sleek design for you, in dollars?” and get an accurate, confident response. However, you can get a proxy of this estimate from a customer survey – it’s easy for them to choose between two clearly defined options and for you to analyze the results.

As with regression and cluster analyses, all calculations will be done by the app of your choice. For the actual command lines to use, please see the links at the end of this section.

Here is the general logic. When we have all customer preferences captured, we can derive a relative weight of each characteristic using one of the apps. We will skip this step (so as to not focus too much on calculations) and will describe the end result and marketing implications instead.

The end result of the analysis should look similar to this:

| | Fast processor | Cheap | Sleek design | Well-known brand |
| Customer 1 | 0.6 | 0.2 | 0.1 | 0.1 |
| Customer 2 | 0.3 | 0.4 | 0.1 | 0.2 |
| Customer 100,000 | 0.2 | 0.2 | 0.1 | 0.2 |
| Average for 100,000 customers | 0.6 | 0.2 | 0.1 | 0.1 |

Here is how to read this table. Once we know how all our customers or a certain customer segment values certain aspects of the product, we can calculate a so-called total utility of a product by summing up all relevant attributes. Total utility is just a fancy term for how much customers are expected to love a certain product. Let’s take a simple example.

Let’s try to estimate how much our customers on average will like a smartphone which is fast, cheap and has a sleek design but is not produced by a well-known brand. To do this, we simply add all the relevant attributes:

Total utility1 = 0.6 * 1 (fast) + 0.2 * 1 (cheap) + 0.1 * 1 (sleek design) + 0.1 * 0 (not from a well-known brand) = 0.9

Now imagine that we want to compare this option with a slightly different product: the one that is not cheap but is produced by a well-known brand. We simply change two numbers in our equation to calculate the total utility again:

Total utility2 = 0.6 * 1 (fast) + 0.2 * 0 (not cheap) + 0.1 * 1 (sleek design) + 0.1 * 1 (from a well-known brand) = 0.8

The total utility in the second case turned out to be lower: 0.9 > 0.8. This means that our customers, on average, would prefer a cheaper phone with the specs above even if it’s not produced by a well-known brand to a more expensive phone produced by a well-known brand.
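Here is the same comparison as a minimal Python sketch. It uses the average weights from the results table (0.6, 0.2, 0.1, and 0.1); the dictionary keys and the function name total_utility are illustrative choices.

```python
# Average weights from the results table above.
WEIGHTS = {"fast": 0.6, "cheap": 0.2, "sleek": 0.1, "brand": 0.1}


def total_utility(product_features):
    """Sum the weights of the attributes a product has (1) or lacks (0)."""
    return sum(WEIGHTS[name] * has_it for name, has_it in product_features.items())


phone_1 = {"fast": 1, "cheap": 1, "sleek": 1, "brand": 0}  # cheap, unknown brand
phone_2 = {"fast": 1, "cheap": 0, "sleek": 1, "brand": 1}  # pricier, known brand

print(round(total_utility(phone_1), 2), round(total_utility(phone_2), 2))
```

Swapping in the weights of a single customer or a single segment instead of the overall average shows which product that particular group would prefer.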

This knowledge will allow us to make better product and marketing decisions, achieve product-market fit, and make our customers happy.

Conjoint analysis and SPSS/Stata/R tutorials:

Causal Research

When to use:

Causal research aims to establish a causal relationship, not just an association.

Consider this example. Imagine we have two types of customers who buy monthly subscriptions. The first type pays for our service manually each month. The second type sets up an automatic recurring payment. Now imagine we conduct a regression analysis to understand what variables influence customer retention. In other words, what factors make customers stay with us longer? Now suppose this analysis reveals that on average, customers who pay automatically stay with us twice as long and have twice the customer lifetime value. We have just identified an association or correlation.

At this point, it’s very tempting to jump to the conclusion that we should move all our customers to the automatic recurring payments system. Since moving a customer from manual payments to automatic payments should increase expected customer lifetime value twofold, we might also incentivize customers to switch by giving them bonuses as long as we are breaking even. Right? Not necessarily. Why? Well, correlation does not imply causation.


Let’s take a moment to think about what else might be going on in our example. Another explanation might be that the most loyal customers who are happy with our service are simply more likely to sign up for automatic payments than customers who are less happy with our service. If this is true, making customers switch won’t make much difference because it won’t change their underlying satisfaction with our product. So they will still cancel subscriptions at the same rate. This is because there was a third variable – product satisfaction – which influenced both retention rate and probability of signing up for automatic payments.

Another potential problem with using associations or correlations to make important decisions is reverse causation. Imagine you read a headline that says, “If you want to be happy, get married,” and an article that explains that according to some study, married people are more likely to be happy, so you should get married if you want to be happier. But you might ask how the researchers made sure that they did not fall into a reverse causation trap. In other words, how do they know that marriage makes people happier and it’s not that people who are happier to begin with are simply more likely to get married? Maybe because they are more likeable? Or maybe there is a third variable. For example, health. Healthy people might be statistically more likely to be happy and also more likely to get married.

If this is the case, marriage might not necessarily increase the happiness level at all. To conclude so would be a mistake. After learning about reverse causation, you can never look at popularized studies the same way. Unfortunately, it’s hard to address the issue of reverse causation when studying such things as happiness and marriage, because setting up large-scale social experiments is usually impossible. But it’s easier to address this issue in marketing. Here is how to use causal research to identify if a causal relationship does indeed exist between the variables that are correlated.

How to use:

How do we actually take it one step further and establish causation, not only correlation? We conduct an experiment.

Let’s go back to our example with manual payments vs. recurring payments. We can establish a causal relationship by arranging an experiment. We can incentivize a randomly selected subset of customers who currently pay manually to switch to automatic payments and analyze their cancellation rates. Did the retention rates remain the same? Did they improve? If so, did they improve to the levels of customers who opted in by themselves or to a smaller degree? Once we know the result, we can decide whether incentivizing customers to move to automatic payments improves retention rates significantly enough to offset money spent on incentives. If we are at least breaking even, we can roll out this program from the subset of customers we initially tested it on to a broader audience of all our customers.
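One common way to judge the result of such an experiment is a two-proportion z-test. The text above does not prescribe a specific test, so take this as one reasonable option. The sketch below compares the cancellation rate of customers who were given the incentive with that of a randomly chosen group who were left alone. All numbers and the function name are made up for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(cancel_a, n_a, cancel_b, n_b):
    """Two-sided z-test for a difference between two cancellation rates."""
    rate_a, rate_b = cancel_a / n_a, cancel_b / n_b
    pooled = (cancel_a + cancel_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return rate_a, rate_b, z, p_value


# Hypothetical results: 1,000 randomly chosen manual payers were left alone
# (control) and 1,000 were given an incentive to switch to automatic payments.
control_rate, treated_rate, z, p_value = two_proportion_z_test(300, 1000, 240, 1000)
print(control_rate, treated_rate, round(z, 2), round(p_value, 4))
```

A p-value below 0.05 would suggest that the drop in cancellations is unlikely to be a fluke. Whether the improvement pays for the incentives is a separate business calculation.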

What other experiments are there? Many so-called growth hacks are experiments.

A/B testing is a technique used when we randomly show different versions of a website to different groups of people to conduct an experiment. In this case, a group of users seeing version A can also be called a control group. This is because we compare the performance of another group with these folks. We will discuss A/B testing in much more detail in the “Campaigns and Tactics” chapter.

Experiments can be arranged offline as well. Imagine we have two coffee shops in similar locations and want to evaluate the effect of a new interior design on the number of sales. So we redesign one coffee shop but keep the old design in the other one (visitors of this coffee shop will be our control group). Having a control group will allow us to control for other variables, such as holidays, for example. If we only have one coffee shop and change the interior design there, we won’t be able to attribute the increase in sales to the change in design because some other factors, such as the holiday season, might be behind increased sales.

It’s impossible to control for all the variables, though. A new apartment building might open next to one of our two coffee shops during the experiment or a competing place next to one of them might close down. These will skew the results. But those are unlikely events, and having a control group will take care of the most likely confounding factors.
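The coffee shop logic can be written down as a tiny calculation that analysts often call difference-in-differences. That name is not used in this chapter, but the idea is the one described above: subtract the change in the control shop from the change in the redesigned shop. The weekly sales figures below are made up for illustration.

```python
def difference_in_differences(test_before, test_after, control_before, control_after):
    """Change in the redesigned shop minus the change in the control shop."""
    return (test_after - test_before) - (control_after - control_before)


# Hypothetical weekly sales (cups sold) before and after the redesign.
effect = difference_in_differences(
    test_before=1000, test_after=1300,        # redesigned coffee shop
    control_before=1000, control_after=1200,  # unchanged coffee shop
)
print(effect)
```

In this made-up case, sales rose by 300 cups in the redesigned shop, but 200 of those would probably have happened anyway (the control shop grew by 200), so only about 100 cups can be credited to the new design.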

If this correlation vs. causation discussion bored you, take a look at this list of “spurious correlations”; it might make you smile.

Data Visualization

When to use:

Sometimes we don’t need fancy statistics to get a point across or to understand the data. Sometimes looking at the data is enough. Everyone knows how to visualize small pieces of data in Microsoft Excel. But it gets a little bit trickier when a dataset grows large.

Almost everyone is familiar with simple visualizations in Excel, such as column charts, bar charts, pie charts, and line charts. Excel has its limitations though. One of the best visualization tools is Tableau. It has an intuitive interface and can be learned fairly quickly. Probably the best way to learn it is to simply install it and start playing around. Alternatively, you can follow the official tutorials on its website.

To show the full power of data visualization, let’s look at a fascinating example. Unfortunately, this is not a type of visualization one can create with Tableau, but it can serve as a great example of showing patterns that otherwise would be hidden.

Do you want to take a guess what this picture is showing?


This is a grid plot of the distribution of PIN codes that people use.

In this heat-map, the x-axis depicts the left two digits from [00] to [99] and the y-axis depicts the right two digits from [00] to [99]. The bottom left is 0000 and the top right is 9999. Yes, it takes a minute to wrap one’s head around it.


Yellow lines and dots represent the most popular PIN codes. So this diagonal line shows repeated couplets that people use: 0101, 0202, …, 5656, …, 9999.

Do you want to guess what that vertical line is all about? Take a few seconds…

It represents birth years. People are likely to use their year of birth as a PIN code: 1972, 1984, etc. As you can see, the line also gets denser at the top because there are more people alive who were born in the second half of the 20th century.
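To see how such a grid is built, here is a minimal Python sketch that turns a list of PIN codes into a 100 x 100 table of counts, which is the data behind a heat map like this one. The nine PINs are made up for illustration; drawing the actual picture would require a plotting tool, which is left out here.

```python
from collections import Counter

# A handful of hypothetical PINs; a real analysis would use thousands or millions.
pins = ["1234", "1984", "1972", "1984", "5656", "0000", "1991", "1234", "9999"]

# Count how often each (left two digits, right two digits) pair occurs.
counts = Counter((int(pin[:2]), int(pin[2:])) for pin in pins)

# grid[y][x]: y = right two digits, x = left two digits, as in the heat map.
grid = [[counts[(x, y)] for x in range(100)] for y in range(100)]

diagonal = sum(grid[i][i] for i in range(100))      # repeated couplets such as 5656
column_19 = sum(grid[y][19] for y in range(100))    # PINs that start with 19
print(diagonal, column_19)
```

Even in this tiny sample, the two patterns from the picture show up: the diagonal of repeated couplets and the vertical line at 19.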

Now imagine how much harder it would be to find any patterns if we simply looked at a very long list of thousands of PIN codes. Read more about this example here.

Also, take a look at 2015 awards for the best data visualizations on Information is Beautiful Blog.

Data Visualization Tools:

Secondary Research

Secondary research involves summary, collation, and/or synthesis of existing research. It’s different from primary research, which involves collecting new data directly from original sources, such as research subjects or experiments.

What is important is that when we conduct secondary research, we do not talk to customers directly, but instead utilize public or paid datasets and reports.

An example? Let us say your company competes in the Consumer Internet space and you want to know the market share of different browsers. You might, of course, conduct a national or even a global survey of Internet users. But you might also google “browser market shares” and find a lot of data. Voila, you have just conducted secondary research!

So is secondary research just a fancy term for googling around? Almost. If you are reading this book, you probably expect more than a recommendation to use Google, so here are some other tools for secondary research:

Google Trends

When to use:

This tool allows you to see how Google search patterns change over time. You might use it to estimate customer interest – which is a proxy for demand – in certain topics at certain times in certain regions.

How to use:

Let’s say we sell sports equipment and want to know if we should shift our focus from baseballs to soccer balls or from running shoes to weightlifting shoes. We might find it useful to learn how interest in these sports changes in our region. Let’s assume our store is located in Massachusetts. Here are some trends in this state.


There are a few things you might notice. First of all, not many people seem to be interested in weightlifting in Massachusetts, so we probably should not make it our major category. Unless there are no other stores selling weightlifting equipment, of course.

Second, there is a clear seasonality. Especially for soccer: the interest peaks in June. What are the business implications? One of the most obvious ones is to adjust our merchandising or promo content and focus on soccer equipment at times when people are most interested in this sport.

Interest in running, on the other hand, looks more consistent, which is a little counter-intuitive. If asked to guess, I would probably hypothesize that there is a greater interest in running during summer months than during winter months, but the data disproves this hypothesis. This is why it is important to look at data as much as possible, instead of relying on intuition. Another insight is that the baseball and soccer peaks are somewhat out of sync. The potential implication here is to focus on baseball first (around June) and then (around September) on soccer.

Another way to leverage the power of Google Trends is to estimate the market share of your competition for free. This works best if your market is relatively big and companies are mature, since the websites people search for correlate well with the websites they visit. Here is an example showing how the landscape of online dating changed over time:


Google Correlate

When to use:

This tool is a little more interesting. It answers the question: “What are these searches correlated with?” We might use it to find underlying reasons for patterns that we identified with Google Trends. For example, we know when soccer searches peak but we don’t know why. Google Correlate might help us with this “why” question.

How to use:

Let’s try to answer this question: “What are these searches correlated with?” In other words, at times when people are more likely to search for soccer, what other terms are they more likely to search for and vice versa?


This is the result. These correlations make a lot of sense. Believe me, this is not always the case – experiment with other keywords.

As we can see here, interest in soccer is correlated with major soccer events. This might add another level of granularity to our marketing calendar. Instead of simply advertising our soccer equipment in June, we can probably time these promos around major events. Bonus points for figuring out which ones are most likely to have the strongest impact. Hint: simply use Google Trends again but instead of analyzing “soccer,” analyze “world cup Chicago,” “world cup tournament,” and other keywords.
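If you are curious what “correlated” means in numbers, here is a minimal Python sketch that computes the Pearson correlation between monthly search-interest series. This only illustrates the concept; it is not a description of how Google Correlate works internally. All three series are made up for this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly search-interest indexes (0-100), January to December.
soccer = [20, 22, 25, 30, 45, 100, 80, 35, 28, 24, 21, 20]
world_cup = [5, 6, 8, 12, 30, 100, 70, 15, 9, 7, 6, 5]
snow_shovels = [90, 70, 30, 5, 2, 1, 1, 1, 3, 10, 40, 85]

print(round(pearson(soccer, world_cup), 2), round(pearson(soccer, snow_shovels), 2))
```

A value close to 1 means the two searches rise and fall together, while a negative value means one tends to rise when the other falls. Neither tells us what causes what.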

Google AdWords: Keyword Planner

When to use:

This is the #1 tool for search marketing, but it can be used in a much broader context of identifying promising market niches, estimating competition, and refining marketing messaging. Search marketing is only one of the marketing channels and it will be discussed in the chapter on “Campaigns and Tactics.” But Google Keyword Planner is particularly helpful for exploring new business ideas.

How to use:

Google Keyword Planner lets you see what people all over the world are searching for. And this is a very good proxy for what they are interested in and for what they need.

Here are some ideas related to a soccer business. If the link does not work, simply type “soccer ball” and click on “Get ideas.”


Not only can you generate ideas this way, but you can also do a quick check of how competitive a niche is and how much it would cost to advertise a certain product.

We can also filter results by location, language, and device type (desktop vs. mobile).

Public Datasets

When to use:

A more advanced way to conduct secondary research is to use public datasets. This can be helpful in the absence of your internal customer data. For marketing purposes, we might use census data to understand potential customers better.

How to use:

Public datasets are simply huge tables of data. So just as with other datasets, we can use visualization tools, such as Tableau, or conduct a statistical analysis using an application of your choice, such as Stata, SPSS or R.

Third-party Analytical Reports

When and how to use:

Sometimes you will not be able to find the information you are looking for in the public domain. In such situations, it makes sense to buy a report prepared by one of the research companies, such as Gartner, IDC or Forrester.

Forecasting Sales for a New Launch

How can we forecast sales of a product before launching it using marketing research tools? Imagine Apple is launching a new iPhone and wants to know how many to manufacture. Here is one of the possible ways to do it.

First, conduct several waves of surveys (see above) asking “How likely are you to buy product X?”

  • Very Unlikely
  • Somewhat Unlikely
  • Somewhat Likely
  • Very Likely

Then, add a separate column with the actual sales in millions that we had. Here is how our dataset with historic data might look:

| | Very unlikely | Somewhat unlikely | Somewhat likely | Very likely | Actual Sales, M |
| iPhone 16 | 12% | 28% | 30% | 30% | 9 |
| iPhone 17 | 20% | 25% | 30% | 25% | 7 |
| iPhone 18 | 30% | 25% | 30% | 15% | 1 |

The first four columns reflect surveys we did in the past before each launch. The last column reflects actual sales in millions.

Second, run a regression analysis with “actual sales” as a dependent variable. This will allow us to build a model predicting the value of actual sales based on customer sentiment. If you need a refresher on regression models, please re-read the “Regression Analysis” section.

To build a model in Stata, we need to create short variable names, so verun = “Very Unlikely,” somun = “Somewhat Unlikely,” somli = “Somewhat Likely,” velik = “Very Likely,” and actual = “Actual Sales.” Since we have only three rows in our database, we are not really striving for statistical significance here.

Here is the command:

regress actual verun somun somli velik

And here is the result we get in Stata:


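Now we need to interpret it. If the table makes no sense whatsoever, please re-read the “Regression Analysis” section. Notice that Stata leaves “Somewhat Unlikely” and “Somewhat Likely” out of the model. That does not mean these responses are irrelevant: “Somewhat Likely” is 30% in every row, the four shares always add up to 100%, and three observations can support at most three estimated parameters, so Stata omits the redundant variables.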

When expressed algebraically, the model looks like this:

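Y (actual sales) = _cons + (verun coef) * x1 + (velik coef) * x2 = -9.67 + 5.56 * x1 + 60 * x2

Here x1 is the share of “Very Unlikely” responses and x2 is the share of “Very Likely” responses.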

Now suppose we are launching the product: iPhone 19. We can use the model we built to forecast sales for this product. Suppose we surveyed customers again and here are the responses:

| Product | Very unlikely | Somewhat unlikely | Somewhat likely | Very likely | Actual sales, M |
| --- | --- | --- | --- | --- | --- |
| iPhone 19 | 28% | 25% | 30% | 17% | . |

A manual way to calculate expected sales would be to plug the numbers into the model and do the math ourselves:

Y (actual sales) = -9.67 + 5.56 * 0.28 + 60 * 0.17 = -9.67 + 1.5568 + 10.2 = 2.0868

So, according to our model, we can expect sales of about 2.09 million units (2,087K).
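
As a cross-check on the manual calculation, here is a minimal Python sketch that plugs the iPhone 19 survey shares into the same model. Python stands in for Stata purely for illustration, and the names `forecast_sales`, `CONS`, `COEF_VERUN`, and `COEF_VELIK` are my own labels rather than anything Stata produces.

```python
# Coefficients copied from the manual calculation above (rounded values).
CONS = -9.67        # _cons
COEF_VERUN = 5.56   # coefficient on the "Very Unlikely" share
COEF_VELIK = 60.0   # coefficient on the "Very Likely" share


def forecast_sales(very_unlikely: float, very_likely: float) -> float:
    """Return predicted sales in millions for survey shares given as fractions."""
    return CONS + COEF_VERUN * very_unlikely + COEF_VELIK * very_likely


# iPhone 19 survey: 28% "Very Unlikely", 17% "Very Likely".
iphone_19 = forecast_sales(very_unlikely=0.28, very_likely=0.17)
print(f"Forecast: {iphone_19:.4f} million")
```

Keeping the shares as fractions (0.28 rather than 28) matters here: the coefficients above only make sense on that scale.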

The easier way to calculate predicted sales is to simply type this command in Stata:

predict prediction

Here, predict is the actual command and “prediction” is just the name of the new variable; any valid variable name will do. When we run this command, Stata creates a new column and fills it with the predicted values. Here is how our dataset looks after this command:


For more information about building regression models, see the “Regression Analysis” section of this chapter.

More on forecasting:

  • Book: The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t by Nate Silver
  • Book: Superforecasting: The Art and Science of Prediction by Philip E. Tetlock and Dan Gardner
  • Book: Data Smart: Using Data Science to Transform Information Into Insight by John Foreman

Marketing Analytics

“What gets measured gets managed.”

Peter F. Drucker, management consultant and author of The Effective Executive

Marketing analytics is simply measuring, managing, and analyzing marketing performance.

When we are talking about marketing analytics, we are referring to analysis of data that we gather internally, within the company: for example, sales, product usage, website visitors, or newsletter sign-ups. Marketing analytics can be distinguished from marketing research. As you have learned, when we embark on marketing research, we usually get data externally (for example, by surveying prospective customers).

Some methods that we discuss in this chapter, such as regression analysis, can be applied both to data obtained through marketing research, such as customer surveys, and to data from marketing analytics, such as product usage data. But true power is unlocked when we integrate different sources of data into one coherent database. For example, a customer relations database might include product usage data, demographic data from surveys, and purchase history.

So the data analysis methods will largely be the same or similar to those previously discussed. The source of data is what’s different.

When to use:

Marketing analytics can be used to derive conclusions about the performance of our marketing and become more effective in the future.

How to use:

Data for marketing analytics can be extracted from multiple sources. We might use Google Analytics for web analytics to analyze how users interact with our website.

If we use databases to store our product data as well, such as user accounts and behavior, we can use this data. If our product is a mobile app, we can use services such as Mobile Analytics by Google or Mixpanel.

As we discussed in other chapters, the marketing metrics we focus on are a combination of marketing strategy and business goals. This is why we should develop a marketing strategy before setting up marketing analytics. We would not know what numbers to focus on otherwise.

Web Analytics

Web analytics is, just as the name suggests, the measurement, collection, analysis, and reporting of web data.

When to use:

When we want to learn which users are coming to our website and what they are doing there.

How to use:

  1. Implement a tool of your choice on your website. Here is a tutorial on how to implement Google Analytics, the free and most popular web analytics tool.
  2. Clearly identify questions to answer or metrics to track.
  3. Review reports and refine questions. Rinse and repeat.

We will talk more about specific applications of web analytics, such as A/B testing, below.

Another important thing about web analytics is that web data can be analyzed in ways described above. For example, we might conduct cluster analysis on web data as well, not only survey data. What is even better is that many types of analyses will be done for us automatically by web analytics tools to save us hours and hours of fun with stats apps.


Databases and SQL

When to use:

Sometimes as a marketing person, you might need to pull your own data from a database, especially if you work for a tech startup. Even more so if it is a small tech startup and there is nobody to help with this.

SQL stands for Structured Query Language, a programming language used for pulling data from relational databases. If you search for product marketing roles at companies such as Airbnb, Dropbox, Uber, or Upwork, you might see SQL listed among the requirements quite frequently.

Using SQL, you can get data from a product database where all user and product usage information is stored. Then, you can analyze this data to learn about your users and answer questions like “what types of users do we have?” or “how are they using the product?”
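
To make this concrete, here is a small sketch of the kind of query a marketer might run. It uses Python's built-in sqlite3 module so that it is self-contained; the `users` table, its `plan` and `sessions` columns, and the sample rows are all hypothetical, and a real product database will look different.

```python
import sqlite3

# Hypothetical product database: one row per user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, plan TEXT, sessions INTEGER)")
conn.executemany(
    "INSERT INTO users (plan, sessions) VALUES (?, ?)",
    [("free", 3), ("free", 5), ("pro", 20), ("pro", 30), ("free", 1)],
)

# "What types of users do we have, and how are they using the product?"
query = """
    SELECT plan, COUNT(*) AS users, AVG(sessions) AS avg_sessions
    FROM users
    GROUP BY plan
    ORDER BY plan
"""
rows = conn.execute(query).fetchall()
for plan, users, avg_sessions in rows:
    print(plan, users, avg_sessions)
conn.close()
```

The SQL itself is the part that carries over to other databases: SELECT picks the columns, GROUP BY splits users into segments, and COUNT and AVG summarize each segment.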

How to use:

Learning SQL might take quite a bit of time, but the basics are not that hard.

A/B Testing

When to use:

We already touched on A/B testing, which is an experiment in which we test the performance of two versions, A and B, against each other on a certain dimension.

Let’s take a product page as an example. Version A might use customer references as the primary content, while version B might present a list of the technical specifications of our product. Which one will perform better? Instead of guessing, we can identify a goal, such as click-through rate, purchase, user registration, or newsletter subscription, and optimize towards that goal.

Google ran its first A/B test in February 2000. It wanted to understand whether the number of search results per page was optimal from a user perspective, so it ran an experiment on 0.1% of users, testing 20, 25, and 30 results per page. These days, most tech companies constantly run multiple A/B tests on multiple pages.

For example, at Mozilla, we ran A/B tests to optimize Firefox product pages to increase downloads. Among many other things, we learned, for example, that minimalist pages tend to perform much better. The more cluttered a page is and the less clear the call to action is, the more confused users will be.

Even Barack Obama’s team used A/B testing in 2007 to maximize newsletter subscriptions and donations. The best variation had a sign-up rate of 11.6% vs. 8.26% on the original version. This 40.6% improvement translated into a difference of 2,880,000 email sign-ups. On average, 10% of people who received emails signed up as volunteers, so this A/B test brought an additional 288,000 volunteers. The average donation per sign-up was $21, so the additional 2,880,000 email sign-ups translated into an additional $60M in donations.

This is performance marketing at its best. Instead of guessing, we are letting users decide and vote with their clicks, subscriptions and wallets.
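
The chain of arithmetic in the Obama example is easy to verify. The sketch below simply multiplies through the figures quoted above; the variable names are mine, and integer math is used so the totals come out exact.

```python
# Figures quoted in the case study above.
extra_signups = 2_880_000      # additional email sign-ups from the winning variation
volunteer_rate_pct = 10        # share of email recipients who volunteered, in percent
avg_donation_usd = 21          # average donation per sign-up, in dollars

extra_volunteers = extra_signups * volunteer_rate_pct // 100
extra_donations_usd = extra_signups * avg_donation_usd

print(f"Extra volunteers: {extra_volunteers:,}")
print(f"Extra donations: ${extra_donations_usd:,}")
```

The donation total works out to $60,480,000, which is where the rounded “$60M” comes from.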

How to use:

  1. Identify a metric to optimize. It can be sales, click-throughs, newsletter subscriptions, user registrations, social media shares, video views, free product downloads or anything else. Business goals and marketing strategy should inform this choice.
  2. Develop a hypothesis. For example, we might test different copy, images, or designs. Previously conducted marketing research, especially qualitative studies, might help here. For example, we might know from focus groups that our customers care mostly about product features and not so much about design or price. How about putting this statement to the test by designing a product page that emphasizes product features above anything else?
  3. Decide how you are going to use results before starting the experiment.
  4. Set up the experiment. Choose an A/B testing tool and follow its instructions to set up the experiment.
  5. Analyze the results and make appropriate business decisions. Data without the context is useless. Once we have data, we need to interpret it, not just blindly follow whatever it says. A well-formed marketing strategy grounded in marketing research can help make the right decisions.
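
For the analysis step, one common check is whether the gap between the two versions is bigger than chance alone would explain. The sketch below uses a two-proportion z-test, which is my choice of technique and not something prescribed in this chapter; the visitor and conversion counts are hypothetical. Most A/B testing tools run a comparable calculation for us.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical results: 5,000 visitors per version, 4% vs. 5% conversion.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests that version B's lead is unlikely to be noise, but as step 5 says, the number still has to be interpreted in its business context.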

So far we have talked about A/B testing the performance of websites, but there are other applications as well. For example, we can also A/B test our emails. We can even A/B test regular mail by sending different letters to randomly selected samples of customers. We can A/B test bar signs too, even though it will be a bit harder to hold all other variables constant.

A/B testing is not a new idea either. David Ogilvy, founder of Ogilvy & Mather, sometimes called the “Father of Advertising,” said: “Never stop testing, and your advertising will never stop improving.”

The most important thing to understand about A/B testing, though, is that it’s not a panacea, but just one tool in a toolbox. It can be very effective at answering specific tactical questions, but it cannot replace a proper marketing strategy. A/B testing might be good for tiny marginal improvements, but not for the big leaps of a truly innovative, or shall I say disruptive, nature. A deeper understanding of the market, customers, competition and trends will be needed to make leaps like this possible. That’s why we started with the chapters on “Strategy and Management” and “Business Growth.”


Attribution Models

When to use:

Attribution models are used when you acquire customers through various media channels, such as search engine ads, display ads and emails, and want to know how many sales you generated through each of these channels.

The simplest form of attribution in existence is called “last touch attribution.” Here is how it works. If a customer bought our product after getting to our website by clicking on a search ad, we conclude that this sale was generated by a certain search ad campaign. This is a simple and straightforward way to analyze our sales, right? Well, yes. Except it’s wrong.

Think about it. What if the same customer had been exposed to our display advertising eight times during the three weeks preceding the click on the search ad and the purchase? How do we account for the fact that a given customer can be touched by various marketing channels, such as social media, email, display ads and others, before finally making a decision to purchase? Does this repeated exposure increase the effectiveness of the search ads or not?
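
A toy example shows how much last-touch attribution can hide. The sketch below counts credit two ways over a handful of hypothetical customer journeys (the `journeys` list and channel names are made up for illustration): once by the last channel touched, and once by every channel a buyer saw.

```python
from collections import Counter

# Hypothetical customer journeys: channels touched before a purchase, in order.
journeys = [
    ["display", "display", "search"],
    ["email", "search"],
    ["display", "email"],
    ["search"],
]

# Last-touch attribution: the final channel gets all the credit.
last_touch = Counter(journey[-1] for journey in journeys)

# How many buyers saw each channel at least once, regardless of position.
touched = Counter(channel for journey in journeys for channel in set(journey))

print("Last-touch credit:", dict(last_touch))
print("Buyers touched:", dict(touched))
```

In this made-up data, display ads get no last-touch credit at all, even though half of the buyers saw one before purchasing.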

A multi-channel, also called cross-channel or split-channel, attribution model can answer questions like “How many customers who bought our product saw an ad before purchase?” and “How many of our sales would not have happened had we not launched a certain marketing campaign?”

Here is an example of the type of information an attribution model can provide. Imagine we are trying to analyze our display advertising and decide whether we are breaking even. By creating an attribution model, we might learn that 60% of our customers were “touched” by display ads (60% of the people who bought saw a display ad), but that only 30% of all purchases were actually driven by those ads.

In other words, if we sold 1,000 products, we might conclude that 300 of these sales or customer acquisitions were generated by display ads and would not have happened otherwise. Dividing the total budget for the channel by 300, we can derive CAC (Customer Acquisition Cost). Finally, if we know our CLV (Customer Lifetime Value), we can answer the question of whether we are breaking even by subtracting CAC from CLV.
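
Here is the same break-even logic as a short Python sketch. The 1,000 sales and the 30% share come from the example above; the display budget and the CLV are hypothetical numbers picked only to make the arithmetic visible.

```python
units_sold = 1_000
incremental_share_pct = 30     # share of sales driven by display ads (from the example)
display_budget_usd = 30_000    # hypothetical channel budget
clv_usd = 150                  # hypothetical customer lifetime value

incremental_sales = units_sold * incremental_share_pct // 100
cac_usd = display_budget_usd / incremental_sales
margin_per_customer_usd = clv_usd - cac_usd

print(f"Incremental sales: {incremental_sales}")
print(f"CAC: ${cac_usd:.2f}")
print(f"CLV - CAC: ${margin_per_customer_usd:.2f}")
```

With these assumed inputs, each acquired customer costs $100 and is worth $150, so the channel is more than breaking even. Had we divided the budget by all 600 “touched” customers instead, CAC would look half as large and the channel twice as efficient as it really is.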

How to use:

Building a fully functioning attribution model is definitely outside the scope of this book. Sometimes teams of people can work on creating attribution models for months, and then refining and improving them over years.

However, one can get started and create a simple model by using the regression analysis explained in this chapter. In this case, we would work with a dataset of customers that would include purchasing history as well as prior exposure to various marketing campaigns.

We would have a variable for the total number of purchases or the total amount spent. This would be our dependent variable. Exposure to various media ads would serve as the independent variables. For example, we might have a variable indicating how many times a given customer was exposed to display ads within a certain period of time.

Then we would conduct regression analysis to identify relationships between variables and estimate by how much we expect to increase the probability of purchase, number of purchases, or dollar amount spent by exposing a customer to a certain type of ad: search vs. display vs. email.
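
As a rough illustration of that idea, the sketch below fits an ordinary least squares regression of spend on ad exposures. It uses plain Python in place of Stata so that it runs anywhere. Everything in it is an assumption for demonstration: the six customers are synthetic, the spend figures are constructed from a known formula so we can see the regression recover it, and the helper names `solve` and `ols` are mine. Real customer data would be noisy and far larger.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(exposures, outcome):
    """Ordinary least squares with an intercept, via the normal equations."""
    x = [[1.0] + list(row) for row in exposures]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, outcome)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic customers: (display, search, email) ad exposures.
exposures = [(0, 0, 0), (8, 1, 0), (2, 0, 3), (5, 2, 1), (0, 3, 2), (4, 1, 4)]
# Synthetic spend built as 20 + 5*display + 12*search + 8*email, so the fit is exact.
spend = [20, 72, 54, 77, 72, 84]

intercept, per_display, per_search, per_email = ols(exposures, spend)
print(f"baseline={intercept:.2f} display={per_display:.2f} "
      f"search={per_search:.2f} email={per_email:.2f}")
```

Each coefficient reads as the expected change in spend from one more exposure to that channel, holding the others fixed. Keep in mind that such coefficients describe associations in the data, not proof that the ads caused the purchases.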

The model can get super complicated very quickly though, and might require applying much more sophisticated algorithms or machine learning techniques.

Working with Research Agencies

In some situations, online research will suffice. In others, you might consider buying a research report. And in still others, there will be no report and you will need to roll up your sleeves and do the research yourself. But sometimes a research project might be so complex that you will need to work with a market research agency. In fact, if you work for a large organization, this is likely to be your default option. It also makes perfect sense because nobody knows market research as well as market research agencies. I have briefed and managed quite a few agencies in the past and can share some advice in this regard.

The biggest market research agencies include Nielsen, Kantar, IMS Health Holdings, Ipsos, and GfK.

What is important to understand about choosing market research agencies is that they specialize in different things. Some might be spectacular at focus groups, but have little expertise in quantitative research and vice versa. Another important consideration is that leaders in your particular region might differ from global leaders. So if you are looking for an agency that is strong in offline surveys in Chile, for example, you might be choosing from a different list of market leaders.

Once you have chosen your partner, there are several things to keep in mind to ensure a productive collaboration.

The shortest way to summarize relations with agencies is “garbage in – garbage out.” In other words, when you brief an agency, you need to provide all the necessary business and industry background that this agency needs, omitting all the irrelevant details. What exactly is necessary and what is not? Your agency of choice will be in the best position to guide you in terms of inputs that are most helpful.

Marketing Research/Analytics Checklist

  • We set research objectives that support our marketing strategy
  • We decided how results will be used
  • We chose the right marketing research/analytics method to achieve the objective

A Quick Recap

  • Rule #1: understand the “why” behind customer needs
  • Rule #2: identify company goals and capabilities
  • Rule #3: achieve product-market fit
  • Rule #4: think strategically to create a sustainable competitive advantage
  • Rule #5: foster a collaborative culture between engineering and marketing
  • Rule #6: use data to set goals and inform marketing strategy
  • Rule #7: select research and analytics techniques appropriate to the purpose

More on Data Analysis and Statistics