Updates: Snowflake & Databricks - Controlling The MDS And GenAI PaaS Future (Pt.4)
Summary
- Snowflake's future hinges on governance tools like Polaris and Horizon, compute engine cost-effectiveness, and a rich ecosystem, including AI integration.
- Despite supporting Iceberg, Snowflake and Databricks still lock customers into their ecosystems, with governance and performance trade-offs determining the best platform.
- Databricks' shift to a serverless model could boost revenue, but not by as much as Wall Street anticipates.
- Investor sentiment around Snowflake is currently negative, but its product leadership and innovation in core areas suggest a strong long-term position.
Specific Thoughts on Snowflake
End game of MDS >>> Governance & Compute Engine Cost Effectiveness & Ecosystem
The end game for the modern data stack (MDS) involves governance, compute engine cost-effectiveness, and ecosystem expansion (potentially including AI). A key question remains: can Snowflake maintain control over customer data and compute processing when customers are free to choose any compute engine? Databricks, for instance, highlighted its support for Iceberg, promoting a "best engine wins" approach.
Indeed, the performance and cost-effectiveness of the compute engine is a key factor in the competition between Snowflake and Databricks. However, because the gap between the two is rather narrow, governance is likely to emerge as the main differentiator. This makes Snowflake's Polaris and Horizon very important for the cloud data warehouse pioneer's future.
Polaris is a solution that manages the metadata associated with stored data, enabling compute engines to execute queries more efficiently, as well as providing the data required for governance controls. Snowflake has recently open-sourced Polaris so that it can be used as the metadata management layer for external (i.e., Iceberg) tables and third-party compute engines. Horizon works at a higher level, by using the metadata provided by Polaris to set and enforce data governance policies. By working together, Polaris and Horizon can federate privacy and security policies across both Snowflake and Iceberg data estates, enabling unified governance across the entire Snowflake environment.
When data resides within the Snowflake estate, including both Snowflake-managed and Iceberg-managed formats, third parties can read but not write. Where Polaris adds significant value is that it governs external tables and makes them open for both read and write access by third parties while maintaining unified governance, ensuring Snowflake still controls access. As a result, customers remain dependent on Snowflake.
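The access model described above can be sketched as a toy Python model. This is only an illustration of the rules as stated in this article, not Snowflake's or Polaris's actual API: the `Storage` categories, the `Table` class, the `allowed_actions` and `can_access` functions, and the `grants` mapping are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Storage(Enum):
    """Hypothetical categories for where a table lives (per the article)."""
    SNOWFLAKE_NATIVE = "snowflake_native"
    SNOWFLAKE_MANAGED_ICEBERG = "snowflake_managed_iceberg"
    EXTERNAL_ICEBERG = "external_iceberg"  # external table cataloged via Polaris


@dataclass(frozen=True)
class Table:
    name: str
    storage: Storage


def allowed_actions(table: Table, engine: str) -> set[str]:
    """Actions the storage layer permits, before any governance policy."""
    if engine == "snowflake":
        return {"read", "write"}
    # Third-party engines: read/write only on external Iceberg tables,
    # read-only on anything inside the Snowflake estate.
    if table.storage is Storage.EXTERNAL_ICEBERG:
        return {"read", "write"}
    return {"read"}


def can_access(table: Table, engine: str, action: str,
               grants: dict[str, set[str]]) -> bool:
    """Assumed governance layer: a grant must also exist for the engine."""
    granted = grants.get(engine, set())
    return action in allowed_actions(table, engine) and action in granted


# Illustrative grants only; real policies would be far more granular.
grants = {"snowflake": {"read", "write"}, "spark": {"read", "write"}}
internal = Table("orders", Storage.SNOWFLAKE_MANAGED_ICEBERG)
external = Table("clickstream", Storage.EXTERNAL_ICEBERG)

print(can_access(internal, "spark", "write", grants))
print(can_access(external, "spark", "write", grants))
```

In this sketch the final decision is always the intersection of what the storage layer allows and what the governance layer grants, which is the sense in which Snowflake keeps control of access even when a third-party engine does the reading or writing.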
Snowflake is also exploring making Horizon an independent product. This may make migration easier for customers, especially if a catalog comparable to Snowflake’s (including data exchange and marketplace capabilities) emerges. Even so, Snowflake’s leadership in governance and ecosystem integration could ensure customers continue to default to Horizon, keeping their data within Snowflake’s control.
Both Snowflake and Databricks now emphasize their support for Iceberg, allowing users to choose whichever compute engine suits their needs. This flexibility introduces challenges for less cost-effective engines, which may struggle to compete.
For Snowflake, there’s concern among investors that its premium status relies on locking customers into its proprietary format, then charging high premiums for compute. However, based on our analysis and industry checks, Snowflake remains the most cost-effective data warehouse engine when considering speed, price, and availability. By supporting Iceberg, Snowflake users can now query against an even broader range of data sources, driving increased compute consumption and, ultimately, revenue growth for Snowflake.
Databricks, on the other hand, positions Spark as the leading, cost-effective engine for ETL and writing data. With the introduction of Photon, Databricks aims to extend its efficiency to reading and query serving. The real test, however, is whether Photon can surpass Snowflake in terms of performance and cost, not just in marketing claims but in independent third-party benchmarks. Photon also plays a critical role in enforcing governance policies, which creates a lock-in dilemma for customers.
If a company chooses Databricks' Unity Catalog for governance (the equivalent of Snowflake's Polaris + Horizon stack) and Delta Lake with UniForm for Iceberg, it may face limitations, such as suboptimal Iceberg reads and the inability to write in Iceberg, unless it relies on Photon for access policy enforcement. Ultimately, the choice between Snowflake and Databricks still results in vendor lock-in, despite the narrative of flexibility.
In conclusion, while Snowflake and Databricks both support Iceberg, data mobility and compatibility between these platforms are not as seamless as one might expect, and customers still face vendor lock-in.
For Snowflake users, if data is stored in Snowflake-managed Iceberg, Polaris manages the metadata, Horizon handles governance, and the data is optimized for Snowflake’s engine. Although Databricks can read this data, its performance and cost efficiency will be lower than in its own native Delta Lake environment, because the data layout is tuned for Snowflake’s engine rather than for Databricks’.