Five Essential Data Science Terms Marketers Need to Know

Once considered the land of “creative-types with intuition”, marketing is now a quantifiable, data-driven function. Today’s marketers proclaim customer data as their new “oil”. The most important evolution in the history of marketing is the ability to understand what data you have, what data you can get, how to organize it and, ultimately, how to activate it. If data is the new “oil”, then data engineering and machine learning are the refinery. A new edict rules: no marketing decision shall be made without closely consulting the data-analytics crystal ball…ahem, dashboard. Thus, today’s marketers must understand data acquisition, curation and analysis. Here are five essential data science terms today’s data-driven marketers need to know.

1. Machine Learning

Machine learning is a set of methods and tools for finding and learning patterns in data and making predictions based on them. Unsupervised machine learning allows computers to find insights that help humans better explore complex data sets. Natural language processing (NLP) can be applied to unstructured data (see below) such as tweets, blog posts and comments to classify the topics, sentiment, entities and author attributes contained within each piece of content. Examples of topics could be people, places, products, model numbers, etc. Extracting these topics from content and giving them structure makes it easier to understand the key themes at a glance, without relying on humans to extract them, and it scales even if you are talking about billions of documents.
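To make the idea of surfacing key themes concrete, here is a minimal sketch. It is a toy word-frequency count, far simpler than real NLP or machine learning, and the sample posts, the stopword list and the `top_terms` function are all hypothetical illustrations.

```python
import re
from collections import Counter

# Hypothetical sample posts; a real system would process far more content.
POSTS = [
    "Loving the new camera on my phone, the camera is so sharp",
    "Battery life on this phone is terrible",
    "Great camera, weak battery",
]

# A tiny hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "on", "my", "is", "so", "this", "new", "a", "of"}


def top_terms(posts, n=3):
    """Return the n most frequent non-stopword terms across all posts."""
    counts = Counter()
    for post in posts:
        # Lowercase the text and keep only alphabetic tokens.
        words = re.findall(r"[a-z']+", post.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(n)]


themes = top_terms(POSTS)
print(themes)
```

Even this crude count shows how free text can be turned into a short, structured list of themes. Production systems would replace the frequency count with trained models for topics, sentiment and entities.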

2. Unstructured Data

Unstructured data is data without predefined fields. It fills the vast ocean of social media, including the endless flow of tweets, Facebook wall comments, blog posts and customer reviews. Other common unstructured data includes:

  • Text files like Word documents
  • PowerPoint presentations or PDF files
  • Audio, including voicemail messages, customer service recordings or 911 emergency calls
  • Images like photos, illustrations or infographics
  • Videos, including YouTube posts, instructional videos, police dash cam recordings or personal video
  • Messaging, like text messages or instant messages

Unstructured data accounts for the vast majority of social media content. According to IDG, unstructured data is growing at the rate of 62% per year. To put that in context: most companies focus on typical transaction and account information, yet unstructured data typically outweighs structured data by five to one in marketing, sales and service environments.
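To get a feel for what 62% yearly growth means, the short calculation below compounds that rate. It assumes the rate stays constant from year to year, which the source figure does not promise; the function names are illustrative.

```python
import math

ANNUAL_GROWTH = 0.62  # the 62% yearly growth rate cited above


def growth_multiple(years, rate=ANNUAL_GROWTH):
    """How many times larger the data volume is after `years` of compounding."""
    return (1 + rate) ** years


def doubling_time(rate=ANNUAL_GROWTH):
    """Years needed for the volume to double at a constant growth rate."""
    return math.log(2) / math.log(1 + rate)


print(f"After 3 years: {growth_multiple(3):.2f}x")
print(f"Doubling time: {doubling_time():.2f} years")
```

Under that assumption, the volume roughly doubles every year and a half and grows more than fourfold in three years.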

3. Structured Data

Structured data comes with predefined fields and attributes. Conventional CRM does a very good job of recording a customer’s interactions with an organization in a highly structured and regimented way. The data recorded about each customer in such systems naturally has structure because CRM tools record information such as prospect or customer name, account number, company, purchase amounts or time of purchase. To be sure, structured data like this is hugely important for understanding customers. However, consider all of the insight relating to your customers and prospects that is not captured in your CRM database.

Many structured data sets also include fields, such as comments from a salesperson, that hold unstructured text. In fact, many data sets are semi-structured in this way. For example, a tweet carries structured fields such as the author, dates and metadata; an email has a sender, recipient and subject line; and a Facebook page has a count of “likes”. Some fields are quasi-structured, such as locations in Twitter profiles, which range from clear locations like “Austin, TX, USA” to “Justin Bieber’s Heart” to “behind you” to nothing at all.
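The mix of field types described above can be sketched in a few lines. The record layout and field names here are hypothetical, not a real platform schema, and the location pattern is a deliberately narrow assumption that only accepts “City, ST, Country” strings.

```python
import re

# Hypothetical profile record; field names are illustrative only.
profile = {
    "author": "example_user",          # structured
    "created_at": "2016-03-01",        # structured
    "likes": 42,                       # structured
    "location": "Austin, TX, USA",     # quasi-structured free text
    "text": "Loving the tacos here!",  # unstructured
}

# Accepts only "City, ST, Country" style strings; anything else is rejected.
LOCATION_PATTERN = re.compile(r"^([A-Za-z .'-]+),\s*([A-Z]{2}),\s*([A-Za-z .]+)$")


def parse_location(raw):
    """Return city/region/country, or None if the text is not a clear location."""
    if not raw:
        return None
    match = LOCATION_PATTERN.match(raw.strip())
    if match is None:
        return None
    city, region, country = (part.strip() for part in match.groups())
    return {"city": city, "region": region, "country": country}


print(parse_location(profile["location"]))
print(parse_location("behind you"))
```

A rule this strict misses many valid locations. It is only meant to show why quasi-structured fields need cleaning before they can be treated like structured ones.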

All these fields, whether structured, semi-structured or unstructured, can be enormously helpful for answering different questions. The real gems tend to surface by using machine learning to extract inferred attributes from the unstructured data. 

4. People-based Marketing

People-based marketing uses data-driven insights to identify and target high-value individuals. This strategy focuses on individuals who have a higher likelihood of conversion and better lifetime value. Effective people-based marketing drives clear business results because marketers measure their efforts against a smaller, more targeted audience segment and deliver highly personalized messaging and content that address the specific needs of these high-value segments. Without the data granularity of a people-based dataset, it is impossible to stitch (see the next term) individuals across multiple data sources.

5. Stitching (Data Unification)

Stitching is the ability to identify and unify duplicate records across multiple data sets. For customer analytics, this falls into the general task known as identity resolution, in which an algorithm must decide whether two (or more) personal profile records refer to the same or different people. Such algorithms typically use a mix of rule-based and machine learning components, and they pay attention to fuzzy string matches on names, geographical proximity, and known demographic attributes.
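As a rough illustration of identity resolution, the sketch below compares two profile records using a fuzzy name match plus simple rules. It covers only the rule-based side, with no machine learning component, and the record fields, the rules and the 0.85 threshold are assumptions chosen for the example rather than tuned values.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Fuzzy string similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def same_person(rec_a, rec_b, name_threshold=0.85):
    """Rule-based guess at whether two profile records describe one person.

    The threshold and the rules are illustrative assumptions.
    """
    # Rule 1: an exact email match is treated as decisive.
    if rec_a.get("email") and rec_a.get("email") == rec_b.get("email"):
        return True
    # Rule 2: otherwise require a close name match AND the same known city.
    names_close = name_similarity(rec_a["name"], rec_b["name"]) >= name_threshold
    city_a = (rec_a.get("city") or "").lower()
    city_b = (rec_b.get("city") or "").lower()
    same_city = bool(city_a) and city_a == city_b
    return names_close and same_city


# Hypothetical records from a CRM and from social data.
crm = {"name": "Jonathan Smith", "email": None, "city": "Austin"}
social = {"name": "Jonathon Smith", "city": "austin"}
other = {"name": "Joan Smythe", "city": "Austin"}

print(same_person(crm, social))  # spelling variant in the same city
print(same_person(crm, other))   # different name in the same city
```

A real system would weigh more signals, such as geographical proximity and demographic attributes, and would typically learn the weights from labeled examples instead of hard-coding them.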

This capability brings together the structured data already inside the CRM or marketing automation environment with structured transactional data and/or unstructured social data. As a result, you get a more complete view of the customer (or potential customer). For a sales team looking to generate leads, it’s helpful to have a structured record of which pages a customer has viewed on the company’s website. It’s even more powerful if the topics discussed on those pages are made clear, so that you can understand what specifically was of interest to your lead. With that insight, you can immediately determine the interests and intent of your prospect and engage in a more relevant sales dialogue.

Data-driven Marketing Operates on a Different Plane

Brands are struggling to generate effective, actionable audience insights from the vast ocean of social media, including the endless flow of tweets, Facebook wall comments, blog posts and customer reviews. It’s impossible to mine meaningful information from these huge, seemingly unrelated data sets without a strategy. The days of “shiny” vanity metrics posing as analytics are over. By using machine learning, brands can gain actionable insights from unstructured public data for a strong competitive edge.

If you want to learn more about these data science terms and how they apply to audience insights, please feel free to email me at ken@peoplepattern.com or tweet me at @chonuff.