CDP – AI Models with Significant Training Errors 😨

Training models on 5% of traffic produces many overlapping layers of systematic error. This already has a decisive impact on the quality of recommendations, both in emails (yes, even there!) and on the site for recognized users.

This is a key machine learning problem that CDP platform providers stay silent about, and it directly undermines their claims about product recommendations and purchase forecasting. A model trained on “known customers” learns (very sloooowly) to optimize for people like your best current customers, not for the broader market you are trying to capture.

We want to tell you why the quality of data for training AI models matters.

Or schedule a demo right away – we’ll talk about it at the meeting.

Request demo

Training on a Non-Representative Sample

CDP/MA systems claim to use machine learning / AI for product recommendations for e-commerce. However, the 5% of traffic (see: Reality of CDP/MA – so much effort for 5% of traffic) on which models could be trained is not representative of the entire audience. The segment of current customers almost never constitutes a random sample. It is systematically distorted:

First, these 5% are people who have already converted or shown enough engagement to reveal themselves, not potential customers at earlier stages of the sales funnel. They are usually more active, more willing to share data, and often more attached to your brand.

Second, their purchasing patterns, browsing habits, and product preferences differ from those of ordinary visitors or price-sensitive shoppers (behavioral bias). A model trained on “known customers” learns to optimize for people like your best current customers, not for the broader market you are trying to capture. These are completely different behavior patterns.
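
The effect of this kind of bias can be illustrated with a few lines of arithmetic. This is a minimal sketch, not measured data: the audience size, the three product tiers, and the preference shares for each group are all assumptions chosen only to show how the picture can flip when the anonymous majority is left out.

```python
from collections import Counter

# Hypothetical audience: 10,000 visitors, 5% identified (assumed numbers).
IDENTIFIED = 500
ANONYMOUS = 9_500

# Assumed share of interest per product tier in each group (illustrative only).
identified_prefs = {"premium": 0.60, "mid_range": 0.30, "budget": 0.10}
anonymous_prefs = {"premium": 0.10, "mid_range": 0.30, "budget": 0.60}


def interest_counts(group_size, prefs):
    """Number of visitors in a group interested in each tier."""
    return Counter({tier: round(group_size * share) for tier, share in prefs.items()})


def top_tier(counts):
    """Tier with the most interested visitors."""
    return counts.most_common(1)[0][0]


known_only = interest_counts(IDENTIFIED, identified_prefs)
full_audience = known_only + interest_counts(ANONYMOUS, anonymous_prefs)

print("Top tier seen by a model trained on known customers:", top_tier(known_only))
print("Top tier across the whole audience:", top_tier(full_audience))
```

Under these assumed numbers, the known-customer sample points at the premium tier while the audience as a whole leans toward budget products.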

Additionally, there is the issue of the time needed to train models.

If you already feel something is wrong, schedule a demo right away.

Request demo

Time Needed to Train Models on Customer Data

Put simply, if only 5% of traffic is usable, collecting the same amount of training data takes about 20 times longer than with the entire population (1 / 0.05 = 20). Something that is normally achieved in a month takes 20 months, almost 2 years. But even “more time” does not solve the problem: patience adds volume, not representativeness.
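
The 20x figure is simple proportionality. The sketch below assumes, as a simplification, that a model needs a fixed number of events and that traffic arrives at a steady rate; the function name and parameters are illustrative.

```python
def months_to_collect(baseline_months: float, traffic_share: float) -> float:
    """Time to gather the same number of events when only a share of traffic is usable."""
    if not 0 < traffic_share <= 1:
        raise ValueError("traffic_share must be in (0, 1]")
    return baseline_months / traffic_share


months = months_to_collect(baseline_months=1, traffic_share=0.05)
print(f"{months:.0f} months, or about {months / 12:.1f} years")
```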

Over time, you do not get more representative data. The pool of known customers, constituting 5%, remains the same pool, and the model becomes fundamentally skewed. Six or twenty months of error-laden data are still error-laden data.
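
A small simulation makes the point that a larger sample reduces noise but not bias. It is a sketch under stated assumptions: the two shares below are invented for illustration, and the seeded random generator only stands in for "collecting more data from the same pool of identified customers".

```python
import random

# Assumed for illustration: 10% of identified customers are price-sensitive,
# while the share across the whole audience is 57.5%.
SHARE_AMONG_IDENTIFIED = 0.10
SHARE_IN_FULL_AUDIENCE = 0.575


def estimate_from_identified(sample_size: int, seed: int = 0) -> float:
    """Estimate the price-sensitive share by sampling identified customers only."""
    rng = random.Random(seed)
    hits = sum(rng.random() < SHARE_AMONG_IDENTIFIED for _ in range(sample_size))
    return hits / sample_size


for n in (1_000, 20_000, 200_000):
    estimate = estimate_from_identified(n)
    gap = SHARE_IN_FULL_AUDIENCE - estimate
    print(f"n={n:>7}: estimate={estimate:.3f}, gap to full audience={gap:.3f}")
```

The estimate settles near the share that holds among identified customers, so the gap to the full audience stays roughly the same however long you wait.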

“Cold start” problems compound this. If your product recommendation model was trained solely on purchases by current customers, it will perform poorly for new visitors browsing unfamiliar categories, for price-conscious buyers (who may not be in your known database), and for different geographic or demographic segments.

The 95% of traffic that is anonymous, and therefore excluded from training, contains key patterns that you ignore:

What products do regular users actually browse (not just buy)?
What content appeals to first-time visitors?
Where do browsers leave the site before completing a transaction and identifying themselves?
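
The last question can be answered from anonymous session events alone. The sketch below is hypothetical: the funnel stage names, the session keys, and the sample sessions are assumptions for illustration, not the data model of any particular platform.

```python
from collections import Counter

# Hypothetical funnel stages and anonymous session logs (illustrative data only).
FUNNEL = ["view_product", "add_to_cart", "checkout", "purchase"]

sessions = {
    "anon-1": ["view_product"],
    "anon-2": ["view_product", "add_to_cart"],
    "anon-3": ["view_product", "add_to_cart", "checkout"],
    "anon-4": ["view_product", "add_to_cart", "checkout", "purchase"],
    "anon-5": ["view_product"],
}


def last_stage_reached(events):
    """Deepest funnel stage present in a session, or None if no stage was hit."""
    reached = [stage for stage in FUNNEL if stage in events]
    return reached[-1] if reached else None


def exit_points(all_sessions):
    """Count sessions by the last stage reached, excluding completed purchases."""
    exits = Counter()
    for events in all_sessions.values():
        stage = last_stage_reached(events)
        if stage is not None and stage != FUNNEL[-1]:
            exits[stage] += 1
    return exits


print(exit_points(sessions))
```

If visitors are identified only at purchase, a system limited to known users would see just one of these five sessions and none of the exit points.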

With product recommendations based on the purchase history of current customers, you are training a model on an outcome-biased sample and ignoring behavioral signals from the rest of the funnel. Such a model is weakest at suggesting products to new visitors or other segments: its recommendations seem generic or miss emerging trends that browsers discover first.

Purchase forecasting and purchase propensity scoring based only on people who have already made a purchase overlook anonymous visitors who show strong interest but have not yet logged in (they may be customers too).

Such models also fail in churn prediction. They are trained on your current customer base but cannot extract the reasons why people left the site after browsing (and never became “known”). You know when current customers leave, but you miss opportunities to convert browsers into customers before they leave the site.

Bad Data = Bad Model

The claim by a CDP platform provider that its models will mature over time, even though they were trained solely on 5% of identified traffic, is at best optimistic.

You are building on foundations that fundamentally do not reflect your actual audience. Models may achieve some higher level of statistical confidence over time, but they are certainly learning the wrong patterns.

To overcome this limitation, you need your own behavioral data on 95%+5%=100% of users (event tracking that does not require identification). This provides the most representative signal and shows how anonymous guests become known customers.

However, many CDP platforms do not emphasize this because it does not attract as much attention as the slogan “track everything about your customers”.
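
As a rough illustration of event tracking without identification, events can be keyed by an anonymous session identifier, with a user identifier attached only once the visitor reveals themselves. This is a hypothetical sketch: the `Event` fields and the `usable_events` helper are assumptions made for this example and do not describe the API of Quarticon or any other platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    session_id: str                 # anonymous first-party session key (assumed)
    name: str                       # e.g. "view_product", "purchase"
    product_id: Optional[str] = None
    user_id: Optional[str] = None   # filled in only once the visitor is identified


def usable_events(events, require_identity: bool):
    """Events available for training, with or without an identity requirement."""
    if require_identity:
        return [e for e in events if e.user_id is not None]
    return list(events)


events = [
    Event("s1", "view_product", "p10"),
    Event("s1", "view_product", "p11"),
    Event("s2", "view_product", "p10"),
    Event("s2", "add_to_cart", "p10"),
    Event("s3", "view_product", "p12", user_id="u7"),
    Event("s3", "purchase", "p12", user_id="u7"),
]

print(len(usable_events(events, require_identity=True)), "events with identity required")
print(len(usable_events(events, require_identity=False)), "events keyed by session only")
```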

How to Properly Train AI Models?

CDP/MA platforms cannot train AI models properly because of a “structural” flaw. They are designed for communication with known users, which in e-commerce means customers who have already purchased, and even then only while the customer remains recognized on the site (has not used incognito mode or deleted cookies, not to mention ad blockers).

Many companies, for which anonymous traffic is a daily occurrence, choose a decision engine based on anonymous data first, rather than a CDP. Why? Because a decision/recommendation engine trains models on 100% of traffic. This includes Quarticon, but also systems like Prefixbox, Nosto, Algolia, or Bloomreach. The return on investment is clear in this case.

If you feel like you’re wasting your budget on a costly CDP/MA – don’t hesitate. Schedule a demo of the Quarticon decision engine for e-commerce and save tens of thousands of PLN!

Request demo

How to Start Using Quarticon?

See how to start using Quarticon tools. Integration depends on the type of e-commerce platform. Quarticon works with all e-commerce platforms. The following instructions are universal for all, although in many cases integration will be even simpler.

1. Create an account in Quarticon

Create a free account in Quarticon and subscribe to the products of your choice.

2. Add Quarticon’s Universal Tag

Add a simple JS snippet to your site.

3. The last step

Depending on your cart platform, we will fetch your product catalog either via API or from the product feed.
