From ‘Good’ to ‘Excellent’: Engineering a Trustworthy Data Foundation Across 1.3M Transactions

Overview: Data Foundation for Pricing Analytics

This case study shows how a manufacturer's data foundation was lifted from ‘Good’ to ‘Excellent’ across 1.3M transactions — correcting 27K+ negative-cost rows and orphaned costs that would otherwise have poisoned every downstream pricing and RGM conclusion. A trustworthy data foundation is the prerequisite for price elasticity, market basket, RFM, and product segmentation analytics; without it, even the most sophisticated models produce outputs leadership refuses to commit to. The data foundation engineering work is therefore the on-ramp that makes every subsequent analytics engagement defensible and repeatable. See the full case study below, or read our related case study on Building the AI Analytical Foundation.

Client Situation

The manufacturer’s raw data was sprawling: multiple historical yearly sales files, supplementary customer and pricing-agreement files, and a long tail of systemic data issues (misaligned sale/cost dates, orphaned costs, return-transaction mismatches, and over 27,000 transactions with negative costs on positive-quantity sales).

Data foundation quality scoring for 1.3M manufacturer transactions

Without remediation, these issues would have compounded through every downstream calculation — distorting gross margin at the SKU, customer, and category levels and undermining any pricing or RGM conclusion built on top of them.

An initial Data Health Check scored the raw data at 83.69 — the ‘Good’ tier but below the ‘Excellent’ standard required for Revify onboarding.
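The case study does not disclose how the Data Health Check score is composed, but the results section names three dimensions (Completeness, Consistency, Problem Transactions). A minimal sketch of such a composite score, with purely hypothetical weights that will not reproduce the published 83.69:

```python
# Composite data-health score as a weighted average of per-dimension
# scores (each 0-100). Dimension names come from the case study;
# the weights are illustrative assumptions only.
DIMENSION_WEIGHTS = {
    "completeness": 0.35,
    "consistency": 0.35,
    "problem_transactions": 0.30,
}

def health_score(dimension_scores: dict) -> float:
    """Weighted average of per-dimension scores, rounded to 2 decimals."""
    total = sum(
        DIMENSION_WEIGHTS[name] * score
        for name, score in dimension_scores.items()
    )
    return round(total, 2)

# Example: pre-remediation dimension scores quoted in the results table
raw = {"completeness": 95.21, "consistency": 79.17, "problem_transactions": 0.0}
print(health_score(raw))
```

In practice the real scorer would include more dimensions (validity, uniqueness, timeliness) and calibrated weights; the sketch only shows the weighted-average shape such a score typically takes.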

The Revify Approach

Phase 1 — Unified Historical Dataset

  • Consolidated multiple yearly sales files into a single master dataset of over 1.3M transactions with structurally consistent fields across periods.
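The consolidation step above can be sketched in pandas. File paths and column names here are illustrative assumptions, not the client's actual schema:

```python
from pathlib import Path

import pandas as pd

# Columns the master dataset enforces across all yearly extracts
# (illustrative assumption of the standardized schema).
MASTER_COLUMNS = ["TransactionID", "Date", "CustomerID", "SKU",
                  "Quantity", "Raw_Sales", "Raw_Discounts", "Cost"]

def build_master(files: list) -> pd.DataFrame:
    """Stack yearly sales extracts into one structurally consistent dataset."""
    frames = []
    for path in files:
        df = pd.read_csv(path)
        # reindex() enforces the shared column set: missing columns become
        # NaN (flagged later), extra client-specific columns are dropped.
        df = df.reindex(columns=MASTER_COLUMNS)
        df["SourceFile"] = Path(path).name  # keep lineage for auditing
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

Keeping a `SourceFile` lineage column makes every later cleansing decision auditable back to the original yearly extract.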

Phase 2 — Standardization & Enrichment

  • Mapped client-specific fields (e.g., Raw_Sales, Raw_Discounts) to standardized fields (e.g., GrossSales, InvoicedSales).

  • Enriched every transaction with customer-headquarters data and pricing-agreement context, linking every sale to its customer attributes, pricing tier, and product-specific discount.
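The mapping and enrichment steps above can be sketched as a rename followed by two left joins. The field map, join keys, and the InvoicedSales derivation are illustrative assumptions:

```python
import pandas as pd

# Client-specific field names mapped to the standardized schema
# (assumed mapping, for illustration).
FIELD_MAP = {
    "Raw_Sales": "GrossSales",
    "Raw_Discounts": "Discounts",
}

def standardize_and_enrich(tx, customers, agreements):
    """Rename to standard fields, then join customer and pricing context."""
    tx = tx.rename(columns=FIELD_MAP)
    # Assumed derivation: invoiced sales = gross sales net of discounts.
    tx["InvoicedSales"] = tx["GrossSales"] - tx["Discounts"]
    # Left joins preserve every transaction even when reference data is missing.
    tx = tx.merge(customers, on="CustomerID", how="left")          # HQ attributes
    tx = tx.merge(agreements, on=["CustomerID", "SKU"], how="left")  # pricing tier
    return tx
```

Left joins are the safe choice here: a transaction with no pricing agreement surfaces as NaN for review rather than silently disappearing from the dataset.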

Phase 3 — Advanced Cleansing for Financial Accuracy

  • Realigned sales and costs recorded on mismatched dates to produce time-accurate margin calculations.
  • Enforced hierarchy consistency: products held a consistent category across their full history, eliminating mis-classification-driven trend noise.
  • Deployed an advanced returns-matching algorithm that identified and removed over 5,000 transaction lines representing $538,907 in orphaned costs — costs that were not tied to any sale and were inflating COGS.
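The orphaned-cost check in the last bullet can be sketched with an anti-join: a cost line is flagged when no sale line shares its matching keys. The keys used here are illustrative assumptions, not the client's actual matching algorithm:

```python
import pandas as pd

def find_orphaned_costs(costs: pd.DataFrame, sales: pd.DataFrame) -> pd.DataFrame:
    """Return cost lines with no corresponding sale (assumed keys: CustomerID, SKU)."""
    merged = costs.merge(
        sales[["CustomerID", "SKU"]].drop_duplicates(),
        on=["CustomerID", "SKU"],
        how="left",
        indicator=True,  # adds a _merge column marking match status
    )
    # "left_only" rows exist in costs but never in sales -> orphaned
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```

A production returns-matching algorithm would also tolerate date windows and partial-quantity matches; the anti-join above only shows the core idea of isolating costs with no sale to absorb them.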

Phase 4 — Final Preparation & KPI Derivation

  • Derived critical KPIs that were not explicit in the source data: GrossMargin, DiscountRate, InvoicePrice.
  • Assigned each customer to a strategic segment (Strategic Account, Core Customer, etc.) based on sales contribution.
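The KPI derivations and segmentation above can be sketched as follows. The formulas for DiscountRate and InvoicePrice, the share thresholds, and the "Tail" label are illustrative assumptions; only GrossMargin's definition and the first two segment names come from the case study:

```python
import pandas as pd

def derive_kpis(tx: pd.DataFrame) -> pd.DataFrame:
    """Add the derived KPIs named in the case study (formulas assumed)."""
    tx = tx.copy()
    tx["GrossMargin"] = tx["InvoicedSales"] - tx["Cost"]
    tx["DiscountRate"] = 1 - tx["InvoicedSales"] / tx["GrossSales"]
    tx["InvoicePrice"] = tx["InvoicedSales"] / tx["Quantity"]
    return tx

def assign_segments(tx: pd.DataFrame) -> pd.Series:
    """Map each customer to a segment by share of total invoiced sales
    (thresholds are hypothetical)."""
    share = tx.groupby("CustomerID")["InvoicedSales"].sum()
    share = share / share.sum()
    return share.map(
        lambda s: "Strategic Account" if s >= 0.10
        else "Core Customer" if s >= 0.02
        else "Tail"
    )
```

Deriving KPIs as explicit columns, rather than recomputing them ad hoc in each analysis, is what keeps elasticity, RFM, and segmentation work mutually consistent downstream.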

Key Findings & Results

The pipeline moved the dataset from an 83.69 ‘Good’ rating to a 93.50 ‘Excellent’ rating — with the most dramatic gains in Problem Transactions (0% → 100%, driven by the resolution of 27,000+ negative-cost records) and Consistency (79.17% → 93.75%, from hierarchy backfilling).


Equally important, the $538,907 of orphaned costs that had been quietly distorting profitability reporting were fully netted out and captured in a separate bucket for analysis — materially changing the reliability of every downstream margin analysis at a granular level.

Impact dimension                    Quantified benefit
Overall Data Health Score           83.69 → 93.50 (Good → Excellent)
Transactions processed              1.3M+
Orphaned cost removed               $538,907 (5,000+ lines)
Problem Transactions score          0% → 100%
Consistency score                   79.17% → 93.75%
Completeness score                  95.21% → 99.26%
Negative-cost records remediated    27,000+

Why This Matters

You cannot negotiate with a flawed margin number. Fixing $538,907 in orphaned costs was not a back-office tidy-up — it was the difference between a defensible profitability view and a misleading one.

Conclusion

The data engineering work was not a technical exercise; it was a targeted remediation of the specific issues that would have compromised every downstream pricing and RGM conclusion.

With an ‘Excellent’ data foundation in place, the manufacturer’s subsequent analytics — price elasticity, market basket, RFM, product segmentation — produced results leadership could actually commit to.

Further reading

For broader industry perspective on revenue growth management and pricing analytics, see McKinsey’s Growth, Marketing & Sales insights.
