Data Engineering with Pathum

Posts

- December 28, 2025

When Strong Correlation Leads to Wrong Decisions

I’ve seen scatter plots with beautiful trends: clean lines, strong correlation, high confidence. Yet despite the apparent relationship, decisions based on them sometimes fail in production systems.

The Correlation Trap

Correlation feels convincing, but in real systems, variables rarely behave in isolation. Even a strong ρ(X, Y) does not imply causation. Production decisions require context, timing, and system awareness.

Common pitfalls when interpreting correlation:

  - Correlation is conditional: Relationships change when the environment changes.
  - Hidden variables distort signals: Unobserved factors may drive both X and Y.
  - Aggregated data lies quietly: Trends can vanish when you zoom into individual time windows.
  - Time order matters: Cause must come before effect, not after.
  - Decisions amplify errors: Small errors in assumptions can scale into large impact.

From Analytics to Action

G...
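The hidden-variable pitfall named in this excerpt can be illustrated with a minimal sketch. It is not taken from the post: the synthetic data, the driver `z`, and the helper names `pearson` and `residuals` are all assumptions made for illustration. Two series that share an unobserved driver correlate strongly, and the correlation largely disappears once the driver is regressed out.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, zs):
    """Residuals of ys after a least-squares linear fit on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical hidden driver z; x and y depend on z but not on each other.
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + rng.gauss(0, 0.3) for zi in z]
y = [zi + rng.gauss(0, 0.3) for zi in z]

raw_corr = pearson(x, y)  # strong, although x never influences y
# Partial correlation: correlate what remains of x and y after removing z.
partial_corr = pearson(residuals(x, z), residuals(y, z))

print(f"raw correlation:     {raw_corr:.3f}")
print(f"partial correlation: {partial_corr:.3f}")
```

In a real system the driver is usually not observed, so this check is only possible once a candidate confounder (load, time of day, a release) has been identified and logged.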

- December 21, 2025

When the Average Looks Fine but the System Isn’t

Most dashboards report the mean. On the surface, everything appears stable. But most real-world failures do not live in the average. They live in the distribution.

This pattern appears everywhere, from business metrics to machine learning systems. The headline number stays flat, while the underlying behavior quietly shifts. By the time the average moves, the system has often already degraded.

The Illusion of Stability

In real data and AI systems, stability is often assumed because the mean remains constant. However, variance can increase, distributions can widen, and tails can grow heavier, all while the average looks perfectly normal.

This is exactly how many production issues begin. Risk increases silently, but dashboards continue to report healthy numbers.

Why Averages Fail in Real Systems

Several critical signals are consistently overlooked when system...
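A minimal sketch of the "stable mean, shifting distribution" pattern described in this excerpt. The two windows of values and the `summarize` helper are invented for illustration and are not from the post; the point is only that the mean can be identical while the spread and the upper tail differ sharply.

```python
from statistics import mean, pstdev, quantiles

# Two hypothetical windows of the same metric (for example, latency in ms).
# Both are built symmetrically around 100, so their means are identical.
baseline = [100 + d for d in (-2, -1, 0, 1, 2)] * 20
degraded = [100 + d for d in (-40, -10, 0, 10, 40)] * 20


def summarize(values):
    """Report the mean alongside spread and an upper-tail percentile."""
    cuts = quantiles(values, n=100)  # 99 cut points; cuts[94] is the 95th percentile
    return {
        "mean": mean(values),
        "stdev": pstdev(values),
        "p95": cuts[94],
    }


for name, window in (("baseline", baseline), ("degraded", degraded)):
    s = summarize(window)
    print(f"{name}: mean={s['mean']}, stdev={s['stdev']:.2f}, p95={s['p95']}")
```

A dashboard that tracked only `mean` would show no change between the two windows; adding a spread measure and a tail percentile next to it is one way to surface the shift.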

- December 16, 2025

Anomaly Detection in Daily KPI Monitoring: A Data Engineering Perspective

Most KPI dashboards look simple at first glance. However, real production systems are rarely that straightforward. After detecting anomalies, the bigger challenge is understanding why the signal changed. This is where data engineering provides critical insight.

Understanding the KPI Signal

KPIs without context can be misleading. Time-series data is shaped not only by the system's behavior but also by pipelines, aggregation logic, and feature engineering.

Common pitfalls in KPI monitoring:

  - Missing or delayed data – Out-of-order events distort baselines.
  - Aggregation without meaning – Averages hide variance and extreme events.
  - Single-point KPIs – Ignoring rolling windows and trends hides important patterns.
  - Lack of correlation analysis – Latency, throughput, and error rates are interdependent.
  - No system context ...
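The "single-point KPIs" pitfall in this excerpt can be made concrete with a rolling baseline. The sketch below is an assumption, not the method used in the post: the function name `rolling_zscore_flags`, the 7-point window, the threshold of 3, and the sample values are all hypothetical. Each point is compared against the mean and standard deviation of the window that precedes it.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_zscore_flags(values, window=7, threshold=3.0):
    """Flag points that deviate strongly from the trailing-window baseline."""
    history = deque(maxlen=window)  # keeps only the most recent `window` values
    flags = []
    for v in values:
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # A perfectly flat window has sigma == 0 and no defined z-score;
            # this sketch treats that case as "not anomalous".
            z = (v - mu) / sigma if sigma > 0 else 0.0
            flags.append(abs(z) > threshold)
        else:
            flags.append(False)  # not enough history to form a baseline yet
        history.append(v)
    return flags


# Hypothetical daily KPI values with one spike.
daily_kpi = [100, 101, 99, 100, 102, 98, 100, 101, 99, 130, 100, 101]
flags = rolling_zscore_flags(daily_kpi)
anomalous_days = [i for i, flagged in enumerate(flags) if flagged]
print(anomalous_days)
```

Note that the spike enters the window after it is flagged and inflates the baseline spread for the following days, which is one reason production detectors often use robust statistics (median, MAD) or exclude flagged points from the baseline.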

- December 14, 2025

Anomaly Detection in Daily KPI Monitoring: A Data Engineering View from Real Systems

Daily KPI monitoring is often misunderstood as watching dashboards and reacting to alarms. In real production environments, especially large-scale networks and data platforms, the real challenge lies in detecting subtle changes early, before users feel the impact.

In practice, anomalies rarely appear as clean failures. They emerge quietly, as small statistical deviations buried inside time-series data. By the time a KPI crosses a hard threshold, the system has often already been degraded for some time.

Effective anomaly detection sits at the intersection of data engineering foundations, mathematical modeling, and domain understanding. Ignoring any one of these leads to noisy alerts, missed signals, or delayed responses.

The Data Engineering Foundation Comes First

Before any detection logic i...

- December 12, 2025

Data Engineering with Pathum – Beginner-Friendly Guide

Welcome to Data Engineering with Pathum

Hello and welcome! My name is Pathum Dilshan, and I created this blog to make data engineering accessible for everyone, whether you are just starting out, a student exploring new skills, or a professional looking to grow. I know how overwhelming data concepts can feel at first, and that’s why my goal is to break everything down in a way that is simple, practical, and even enjoyable to learn.

Why This Blog Exists

When I started learning about data, I realized that most tutorials assume you already know a lot of things, and that can make beginners feel lost. This blog is my way of helping people navigate the world of data engineering without feeling intimidated. I want to show that with curiosity, logical thinking, and consistent effort, anyone can learn to work with data effectively.

Throughout this blog, I’ll share ex...