Azure Data Explorer succeeds because it indexes aggressively at ingest so it can ignore aggressively at query time. When you "read online" in ADX, you aren't reading the data. You are reading the index of the index.
Most systems "read online" by brute force. They spin up 50 nodes, shuffle terabytes across the network, and pray the optimizer doesn't choke. ADX does it differently. It leverages a proprietary indexing technology that is closer to a search engine (think Elasticsearch) than a traditional database (think Postgres), but with the aggregation power of a column-store.
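To make the "index of the index" idea concrete, here is a toy sketch of ingest-time summarization and query-time block skipping. It is a generic zone-map analogy (the names `ingest`, `query`, and the min/max-per-block scheme are illustrative assumptions), not ADX's actual extent format or internals:

```python
# Toy "zone map": at ingest, each block (loosely analogous to an ADX
# extent) records the min/max of a column. At query time we consult
# only this tiny summary and skip blocks that provably cannot match.
# Conceptual sketch only -- not how ADX is actually implemented.

BLOCK_SIZE = 1000

def ingest(rows, block_size=BLOCK_SIZE):
    """Split rows into blocks and build a min/max summary per block."""
    blocks, index = [], []
    for i in range(0, len(rows), block_size):
        block = rows[i:i + block_size]
        blocks.append(block)
        index.append((min(block), max(block)))  # paid once, at ingest
    return blocks, index

def query(blocks, index, lo, hi):
    """Return rows in [lo, hi], scanning only blocks whose range overlaps."""
    out, scanned = [], 0
    for block, (bmin, bmax) in zip(blocks, index):
        if bmax < lo or bmin > hi:
            continue  # pruned: the summary proves no row here can match
        scanned += 1
        out.extend(v for v in block if lo <= v <= hi)
    return out, scanned

# Mostly-ordered data (e.g. timestamps) makes this kind of index
# extremely selective: the query touches 1 block out of 100.
rows = list(range(100_000))
blocks, index = ingest(rows)
hits, scanned = query(blocks, index, 42_000, 42_500)
print(f"scanned {scanned} of {len(blocks)} blocks, {len(hits)} rows matched")
```

The point of the sketch is the asymmetry: the work of summarizing is done once at write time, so the read path can reject 99% of the data without ever touching it.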
We've been sold a comforting lie for the last decade. The lie is this: "You can use your data lake for everything. Just add a little Spark, maybe a dash of Presto, and voilà: real-time analytics."

But anyone who has tried to run a high-cardinality GROUP BY over a petabyte of semi-structured JSON in a data lake knows the truth: you compromise. You compromise on latency (waiting 30 seconds for a dashboard to load). You compromise on concurrency (the fifth user crashes the cluster). Or you compromise on data freshness (welcome to the world of hourly micro-batches).
Scalability is not about how much data you can store. It's about how much data you can forget while still answering the question.
Your future petabyte-scale self will thank you.