Six More Essential Data Science Terms Marketers Need to Know

For better or worse, the Don Draper era of marketing is over. Gone are the days of sitting alone in a gigantic wood-paneled office, nursing a 10 a.m. glass of scotch, and generating audience insights by pensively staring out of the window. Today, data science rules, and marketers aim to take a data-driven approach to identifying and engaging with their audiences. Understanding the science behind this approach is the best way to take full advantage of the insights data-driven marketing can offer.

So, without further ado, here are six more essential data science terms that marketers need to know:

Artificial Intelligence (AI)

While the phrase “artificial intelligence” may conjure up images of Star Trek, 2001: A Space Odyssey, and other fictional works set in the distant future, the reality is that AI exists today. In a broad sense, artificial intelligence refers to computer systems that can perform complex tasks, like language translation, that would normally require human intelligence.

One subfield of artificial intelligence that is particularly relevant to data science is machine learning, or the ability of a computer to learn from data without being explicitly programmed. Machine learning can be supervised, meaning that the system is “trained” on a set of data labeled by humans before it analyzes new information. It can also be unsupervised, meaning that no labeled training data is provided and the machine identifies patterns within a data set on its own. Marketers use artificial intelligence to analyze huge amounts of data, determine audience demographics, identify popular interests, and discover new customer personas.

Natural Language Processing (NLP)

Natural language processing is a discipline that incorporates artificial intelligence, linguistics, and computer science with the ultimate goal of teaching computers to understand human language with a high degree of reliability.

As a rule, computers are fantastic at carrying out straightforward operations, like solving algebra problems or following a pre-defined set of instructions. What computers aren’t very good at is dealing with ambiguity, which is a defining characteristic of human language. Consider the following sentences:

“Federer was cool as a cucumber during the semi-final.”

“I can’t wait to go to the DMV!”

“Do you think Han has a thing for Leia?”

By default, a computer would read these sentences literally, causing it to wildly misinterpret each sentence’s true meaning. This is where natural language processing comes in: NLP techniques help the computer identify and interpret complex linguistic features like idioms, sarcasm, and slang. Audience intelligence platforms like People Pattern use NLP to analyze millions of social media posts and extract useful audience insights like expressed interests and brand sentiment. Natural language processing is an essential component of the audience intelligence “refinery” that converts massive quantities of unstructured social data into actionable insights.
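To see why a literal reading fails, here is a minimal Python sketch of a naive phrase-counting sentiment scorer. The phrase lists and the `naive_sentiment` function are hypothetical and do not reflect how People Pattern or any particular NLP system works; the point is only that a scorer with no notion of sarcasm rates the DMV sentence as positive.

```python
# Hypothetical phrase lists for illustration only; real NLP systems
# rely on far richer linguistic and contextual signals.
POSITIVE_PHRASES = ("can't wait", "love", "great")
NEGATIVE_PHRASES = ("hate", "awful", "terrible")


def naive_sentiment(text: str) -> int:
    """Return positive hits minus negative hits, reading every phrase literally."""
    lowered = text.lower()
    positive = sum(lowered.count(phrase) for phrase in POSITIVE_PHRASES)
    negative = sum(lowered.count(phrase) for phrase in NEGATIVE_PHRASES)
    return positive - negative


if __name__ == "__main__":
    # Sarcasm fools the literal reading: this sentence scores as positive.
    print(naive_sentiment("I can't wait to go to the DMV!"))  # prints 1
```

An NLP system has to recognize that “can't wait” paired with the DMV is almost certainly sarcastic, which is exactly the kind of context this keyword approach cannot see.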

Classification

Classification is a supervised machine learning technique used to sort new items into predefined categories. The platform is given a set of training data: for example, a large number of Twitter profiles whose demographic information has already been determined and labeled by a human. It analyzes this data set, looking for patterns. Then, when the platform ingests new, unlabeled profiles, it uses what it learned from the training set to determine which demographic categories each new profile falls into. Audience intelligence platforms like People Pattern use classification to quickly determine audience information like core demographics, brand sentiment, and audience locations.
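The train-then-predict workflow can be sketched in a few lines of Python. This minimal example swaps in k-nearest-neighbors, one simple classification algorithm, and assumes each profile has been reduced to a set of keywords; the training profiles, age-bracket labels, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical labeled training data: (profile keywords, human-assigned label).
TRAINING = [
    ({"college", "exams", "campus"}, "18-24"),
    ({"dorm", "college", "gaming"}, "18-24"),
    ({"mortgage", "kids", "school"}, "35-44"),
    ({"kids", "minivan", "soccer"}, "35-44"),
]


def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(profile: set, k: int = 3) -> str:
    """Label a new profile by majority vote among its k most similar training profiles."""
    neighbors = sorted(TRAINING, key=lambda item: jaccard(profile, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A new, unlabeled profile is assigned the label of the profiles it most resembles.
    print(classify({"college", "gaming", "pizza"}))  # prints 18-24
```

A production platform would use far more training data and richer features, but the shape is the same: labeled examples in, a predicted category for each new profile out.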

Clustering

Clustering, or cluster analysis, is an unsupervised machine learning technique that measures similarities and differences between objects in a data set and places similar objects into groups. After identifying audience interests (see above), an audience intelligence platform uses clustering to find which of these expressed interests tend to occur together. For example, cluster analysis might reveal that outdoorsy people in a specific audience also tend to be interested in politics and media. The people who have expressed these interests are then placed into groups, enabling marketers to discover new personas and test their existing assumptions about customer personas against real data.
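A minimal sketch of cluster analysis, using k-means (one common clustering algorithm) in plain Python. It assumes each person's expressed interests have been encoded as a 0/1 vector; the interest columns, the six people, and the choice of two clusters are hypothetical. Note that no labels are supplied: the groups emerge from the data alone.

```python
# Hypothetical people; columns are: outdoors, politics, media, gaming, anime
# (1 = the person expressed that interest).
PEOPLE = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns one cluster index per point."""
    # Simple deterministic initialization: k points spread evenly through the list.
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return assignments


if __name__ == "__main__":
    print(kmeans(PEOPLE, k=2))  # prints [0, 0, 0, 1, 1, 1]
```

The first cluster gathers the outdoors/politics/media fans and the second the gaming/anime fans; a marketer would then inspect each group's shared interests to sketch a persona.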

Label Propagation

Label propagation is a semi-supervised machine learning technique: it starts from a small set of labeled examples and spreads those labels to related, unlabeled data. In audience intelligence, it can be used to discover keywords (labels) related to a certain topic. As an example, imagine that you want to identify labels associated with vacationing at the beach. You begin with a large sample of social media posts and a few obvious seed terms like “beach,” “vacation,” and “surf.” When you run your label propagation algorithm on the sample, it surfaces more associated terms, like “waves,” “tanning,” and “palms.” Since these are all relevant to beach vacations, you add them to your seed terms and repeat the process until you have as many labels as needed for your analysis. Marketers can use these labels to search for potential new customers by identifying people outside of a particular audience who have expressed interests popular with that audience.
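The seed-and-expand process can be sketched as graph-based label propagation in Python. This minimal version assumes terms are linked when they appear in the same post, fixes hypothetical seed terms at 1.0 (on-topic) or 0.0 (off-topic), and repeatedly sets every other term's score to the weighted average of its neighbors' scores. The posts, seeds, and 0.5 cutoff are all hypothetical, and a production system will differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample of social media posts.
POSTS = [
    "beach vacation surf waves",
    "surf waves tanning",
    "beach palms tanning",
    "meeting report deadline",
    "report deadline spreadsheet",
]

# Hypothetical seed labels: 1.0 = about beach vacations, 0.0 = not.
SEEDS = {"beach": 1.0, "vacation": 1.0, "surf": 1.0, "meeting": 0.0, "report": 0.0}


def build_graph(posts):
    """Link terms that share a post; edge weight = number of shared posts."""
    graph = defaultdict(lambda: defaultdict(int))
    for post in posts:
        for a, b in combinations(sorted(set(post.lower().split())), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def propagate(graph, seeds, iterations=20):
    """Spread seed scores across the graph; seed scores stay fixed."""
    scores = {term: seeds.get(term, 0.0) for term in graph}
    for _ in range(iterations):
        updated = {}
        for term, neighbors in graph.items():
            if term in seeds:
                updated[term] = seeds[term]  # clamp known labels
            else:
                total = sum(neighbors.values())
                updated[term] = sum(w * scores[n] for n, w in neighbors.items()) / total
        scores = updated
    return scores


def new_labels(posts, seeds, threshold=0.5):
    """Return non-seed terms whose propagated score exceeds the threshold."""
    scores = propagate(build_graph(posts), seeds)
    return sorted(t for t, s in scores.items() if t not in seeds and s > threshold)


if __name__ == "__main__":
    print(new_labels(POSTS, SEEDS))  # prints ['palms', 'tanning', 'waves']
```

To repeat the process as described, the discovered terms would be added to `SEEDS` (ideally after a human check) and the propagation run again on a fresh sample.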

Deterministic vs. Probabilistic Data

Deterministic data is information that people provide about themselves. Examples of this include customer surveys, login information, and transaction data. Marketers like deterministic data because it’s generally accurate. It is often used for cross-device tracking: for example, if a user logs into their Instagram profile from their phone, tablet, and laptop, a marketer could reasonably assume that the user owns all three devices, then target that user on all three devices to provide a more complete user experience. Deterministic data isn’t perfect, though: it’s rarely updated, and it offers little insight into audience segments that aren’t currently being engaged.
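Deterministic cross-device matching boils down to grouping devices by a shared, self-reported identifier such as a login. A minimal Python sketch, assuming a hypothetical list of (account ID, device ID) login events:

```python
from collections import defaultdict

# Hypothetical login events: (account_id, device_id).
LOGINS = [
    ("user_42", "phone-a1"),
    ("user_42", "tablet-b7"),
    ("user_42", "laptop-c3"),
    ("user_99", "phone-d4"),
    ("user_42", "phone-a1"),  # a repeat login from the same phone
]


def devices_by_account(logins):
    """Map each account to the set of devices it has logged in from."""
    devices = defaultdict(set)
    for account_id, device_id in logins:
        devices[account_id].add(device_id)
    return dict(devices)


if __name__ == "__main__":
    for account, devices in sorted(devices_by_account(LOGINS).items()):
        print(account, sorted(devices))
```

No guessing is involved: a device is linked to an account only if that account actually logged in from it.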

Probabilistic data is generated by analyzing large data sets, identifying patterns, and using the results to make inferences about the general population. This is useful for “filling in the gaps” of incomplete user information. As an example, after running an analysis, you could be reasonably confident that a white person who lives in Florida and expresses an interest in shuffleboard is a male older than 60. Although probabilistic data is inherently less accurate than deterministic data, it can provide audience insights on a much larger scale than deterministic data alone.
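Probabilistic inference can be sketched as estimating a missing attribute from profiles that share the attributes you do know. This minimal Python example uses simple frequency counts over a hypothetical set of profiles with known age brackets; the data, field names, and `infer` function are illustrative assumptions, and the returned probability is only as good as the sample behind it.

```python
from collections import Counter

# Hypothetical profiles whose age bracket is already known.
KNOWN_PROFILES = [
    ({"state": "FL", "interest": "shuffleboard"}, "60+"),
    ({"state": "FL", "interest": "shuffleboard"}, "60+"),
    ({"state": "FL", "interest": "shuffleboard"}, "60+"),
    ({"state": "FL", "interest": "shuffleboard"}, "45-59"),
    ({"state": "FL", "interest": "surfing"}, "18-24"),
    ({"state": "TX", "interest": "shuffleboard"}, "60+"),
]


def infer(attributes, known):
    """Return (most likely value, estimated probability) among matching profiles."""
    counts = Counter(
        label
        for profile, label in known
        if all(profile.get(key) == value for key, value in attributes.items())
    )
    if not counts:
        return None, 0.0  # no similar profiles, so no inference
    label, count = counts.most_common(1)[0]
    return label, count / sum(counts.values())


if __name__ == "__main__":
    print(infer({"state": "FL", "interest": "shuffleboard"}, KNOWN_PROFILES))  # prints ('60+', 0.75)
```

The 0.75 is the point of the exercise: unlike deterministic data, a probabilistic inference comes with uncertainty attached, and one in four matching profiles in this sample would be mislabeled.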

New challenges, new opportunities

The information revolution has proved to be a full-scale extinction event for old-school marketers. An exponential rise in social platform use, increasingly complex customer journeys, and a proliferation of media outlets have rendered traditional marketing strategies obsolete, but thanks to modern data science, today’s marketers are able to identify and engage with their audiences more efficiently than ever before.

Want to see data science in action? Click here to request a demo of the People Pattern platform.
