Data+AI Summit 2024 - Kubrick


JUNE 25 - Databricks’ flagship Data+AI Summit landed in San Francisco earlier this month. With four full days of content, training, and events, the conference attracted thousands of data and AI practitioners and leaders to learn, connect, and grow. As one of the many in attendance, Kubrick Databricks Champion Ben Roche shares his top takeaways and developments to watch.

The flagship event is always full of big announcements – but this year was exceptional. What were the major headlines anyone in the data and AI industry should know?

The announcement that Unity Catalog is going open source was one of the biggest focuses of the Summit – and rightly so. This means that Unity Catalog is the industry’s first open source catalog, and Databricks’ vision is to make it the most interoperable. More than anything, it speaks to how fundamental data governance has become in the current data and AI space. It is no longer a ‘nice to have’ layer on top of data warehouses or lakes, pipelines, and analytics instances – it should be interwoven into all data architecture and systems. And this demand is something we’ve seen from our clients at Kubrick. Kubrick is a Databricks specialist, and a growing volume of our work has been assisting organisations with their Unity Catalog migrations, so this announcement is a highly relevant – and exciting – step forward.

I’ve seen firsthand how important Unity Catalog is for empowering data governance and democratization – and how it will ultimately enable AI. The open source version of Unity Catalog will come with some limitations. I’ll be taking a deep dive into the features and what this means for organizations as we see the features roll out, so stay tuned!

The showstopper at the summit was the unveiling of Mosaic AI’s new suite of capabilities. The developments include impressive technical capabilities, such as enabling easier, faster fine-tuning with a managed API through Mosaic AI Model Training, as well as the new Agent Framework, which enables faster development of Retrieval-Augmented Generation (RAG) applications with enhanced governance in tandem with Unity Catalog.

But the developments mean more than just increasing the speed of AI application development. Mosaic AI Agent Evaluation allows users to create ‘golden’ examples of interactions against which to measure and assess the quality of their AI’s outputs. Users can explore how changing tuning or adding tools alters that quality, with diagnostics for low-quality responses. It also allows them to invite Subject Matter Experts (SMEs) from around the organization to review the outputs for live quality assessment. This kind of tooling is essential to improve trust and credibility in AI, especially when inviting stakeholders from around the organization to give their input.

A whole host of other features were announced, but these developments are especially notable because they change, at a fundamental level, how organisations can embrace AI.

The developments of Mosaic AI are certainly comprehensive. But are businesses ready for this level of Generative AI capability?

When Databricks acquired MosaicML for $1.3 billion, it was a bold move towards making generative AI accessible for enterprise. Perhaps it’s too soon to say whether the bet has paid off, but in just 11 months they have created an end-to-end model building system which is fully integrated across their platform.

Source: Databricks

Many organizations won’t be ready to make use of the full suite of capabilities, but many of the features within the building, deployment, and evaluation stages create a safe environment to experiment and find value. The ease of use in these features will also help break through barriers of upskilling data and AI teams to deploy AI products, with increased collaboration between data engineers and ML engineers/MLOps specialists. For the organizations who are ready to embrace AI, there is incredible opportunity to scale design and deployment quickly, which lends itself to driving a strong competitive edge.

Databricks are finding more ways to showcase the power of AI to business users, as well. They announced a partnership with Shutterstock on ImageAI, a text-to-image generation tool for enterprise underpinned by Mosaic AI. Perhaps one of the most attention-grabbing developments for business users, however, was the announcement of Databricks AI/BI, their new business intelligence platform which leverages AI for the rapid creation of dashboards with a low-code interface. It includes their Genie feature, a conversational interface that allows users to engage with the tool in natural language, finally giving non-technical end users the ability to develop and interrogate their BI sources like never before.

The product represents a strategic move into the highly competitive BI and analytics space, which is not an area of technology that Databricks are known for. On a personal level, I’m curious to see how the product is received; so many organizations will already have their BI tools of choice integrated into their Databricks instance, but the compound AI system powering AI/BI means it will continuously learn and develop from usage across an organization’s entire data stack, without the setup or modelling that other AI-powered BI tools need. For organizations investing in their Databricks instance, this could be a game-changer to realize even more direct business value.

The big announcements have created some splashy headlines, but there were other developments that might have flown under the radar. What were some of the biggest takeaways for you, as a Databricks Champion working with organisations to leverage Databricks on a day-to-day level?

The AI capability announcements were vast, but we all know robust, trustworthy AI is not possible without strong data foundations – and ultimately many of the organizations I’ve supported in my time with Kubrick are still building or reinforcing these foundations before they can consider developing AI systems. I was glad to see some impressive improvements to Databricks’ data warehouse performance, with query speeds shown to be 73% faster than they were just two years ago.

One of the most interesting announcements that missed the major headlines was that the entire Databricks platform will be available as serverless as of July 1st. Until now, only their SQL Warehouses were serverless. This move will help users have more control over their compute and spend, which is particularly beneficial to smaller organizations that are just getting started with Databricks. The move to serverless may be less appealing to larger enterprises that are already well established with their Databricks deployments – and may even cause some concern – but in the long term, it will enable instant rollout of updates and even help improve security.

The biggest gamechanger for data engineers is their new Lakeflow capability, an all-in-one ETL solution. Lakeflow comprises three components: Lakeflow Connect, Lakeflow Pipelines, and Lakeflow Jobs. Together, they promise to automate the deployment, operations, and monitoring of pipelines at scale, adding more sophisticated workflow capabilities such as triggering, branching, and conditional execution, with in-built data quality monitoring. Most critically, their Lakeflow Connect feature, which stems from their acquisition of Arcion last year, enables integration with major enterprise systems like Salesforce, Workday, and SAP, meaning organizations can drive even greater cohesion and accessibility across their data assets. This feels like a final frontier in the race to become truly data-driven.

On a final note, for my fellow Unity Catalog enthusiasts, the Summit had some great sessions to dive deeper into the many features which make it such a comprehensive governance tool. In particular, I have to mention the new Attribute-Based Access Control, which enables better scaling of an organization’s access management framework, as well as the ability to ‘Bring your own Lineage’, whereby users can add sources that aren’t captured by Databricks to build a more holistic view of their data lineage. These are the kinds of updates that make a big difference to the overall effectiveness and reliability of data assets, which can really change outcomes for businesses.

Ben Roche is a Databricks Champion and Squad Lead at Kubrick, supporting organizations of all sizes and at all stages of their Databricks journey to implement and advance their data and AI capabilities leveraging the Databricks platform. To learn more about Kubrick’s partnership with Databricks and our delivery and resourcing capabilities, get in touch: speaktous@kubrickgroup.com


Databricks Data+AI Summit 2024: The headlines – and what you might have missed
