What’s the Delta with Microsoft Fabric Lakehouse?

May 30, 2024

What is Microsoft Fabric Lakehouse?

That is a great question that can only be answered if we get some definitions straight:

Delta Lake – a low-cost data storage framework that stores data in open-source, generic file formats such as Parquet and their metadata, typically in the cloud.

Parquet – a data storage file format that organizes and compresses data as columns within groups of rows to optimize query time.

Delta – a data storage format combining Parquet files with corresponding JSON metadata to enhance data use and traceability.

Lakehouse architecture – a pattern of organizing data that prioritizes data management and optimization features such as ACID transactions (Atomicity, Consistency, Isolation, and Durability), versioning, auditing, and query optimization.

In short, parquet files with their transactional metadata (AKA delta tables) can create a delta lake within lakehouse architecture to improve accessibility, use, and performance of data within your organization while reducing data storage costs.

So, how is this different than what I am using now?

Currently, many organizations use relational database systems like SQL Server or Azure SQL to store their data warehouse. A data warehouse is a highly organized, multi-dimensional source of data for an entire organization or a specific purpose. Data from various sources or raw files throughout the organization are transformed, cleaned, and loaded into a row-based database such as an Azure SQL Server for use. This architecture doesn’t easily support unstructured data such as images and can limit analytical workloads that require non-SQL code. Plus, if the data warehouse has a lengthy load process, it can increase data lag into user-facing programs like Power BI.  Additionally, the warehouse model data uses schema-on-write, meaning the data needs to be strictly defined before landing in the warehouse. This can increase maintenance and reduce flexibility.

Microsoft Fabric Lakehouse
Traditional Data Warehouse Architecture with Power BI Dataset using Import Mode. The minimum time before new data from the source systems is visible in Power BI is about 45 minutes in this example.

A lakehouse that is built on top of a delta-lake has the following advantages:

  • Uses schema-on-read to increase flexibility of use throughout the organization from business analytics to machine learning
  • Increases performance of the data through use of parquet files
  • Improves data reliability through the Delta metadata
  • Reduces storage costs due to the compressed storage format
  • Reduction of continual processing efforts required to load data warehouses

The Microsoft Fabric Lakehouse

Direct Lake

Power BI typically uses Import Mode to load data into its own memory-based semantic model. Large Power BI semantic models using Import Mode can take a long time to refresh which can cause lags in decision-making and the display of stale data. Direct Lake is a semantic model format like Import mode of Direct Query that aims to eliminate this problem by connecting directly to the live Lakehouse model. In Direct Lake, rather than loading an imported semantic model or querying through it, the Lakehouse model is the Power BI semantic model.

Microsoft Fabric Lakehouse
Lakehouse architecture with Fabric Semantic Model, Direct Lake mode.

Note that once the Lakehouse is updated by the ETL process, the Fabric Semantic Model immediately has the newest data – there is no refresh time required. Now the new source data is visible in Power BI in 15 minutes at most – as the ETL process is running and writing to Delta Lake, new data becomes available as soon as it is written.

One caveat with using the Direct Lake in Power BI is the constraint of column naming convention within the Lakehouse. Parquet files cannot contain special characters or spaces as column names. This can pose issues for creating descriptive, business-facing column names within a Lakehouse model. 

For example, it is preferable to name a column “Customer Address” but the closest that the parquet format supports would be “CustomerAddress.” If you use Power Query to rename the parquet-supported columns (from “CustomerAddress” to “Customer Address”) within Power BI, Direct Lake is no longer supported; the model reverts to using Direct Query instead.

Direct Query mode still avoids the refresh time that Import Mode has, because it issues queries directly against the Delta Lake SQL endpoint. The performance of Direct Lake and Direct Query is comparable, but most often Direct Lake will outperform Direct Query slightly.

Microsoft Fabric Licensing

Microsoft Fabric is currently included with Power BI Premium capacity licenses, which start at approximately $5000 per month. (Power BI Premium Per User licenses do not include any Fabric capabilities). You can purchase Fabric capacities to add to your Power BI tenant, through your Azure subscription, starting at $262.80 per month for the smallest capacity (running 24/7). Fabric capacities can be turned off when not in use to reduce costs if you don’t need it to run 24 hours a day.

If you already have a Power BI Premium capacity, are currently spending more than $200/month for Azure SQL databases, or you need to shorten your data load time to get the most recent data with much smaller processing delays, then you should consider using Fabric and DirectLake.

If you are looking for a data storage solution or want to start using Microsoft Fabric Lakehouse, our team can help. We can provide you with a solution that will enable you to maximize the potential of your organization’s data. Make sure to subscribe to our blog for more data tips, tricks, and updates.  

Want to hear the latest from out team of experts? Sign up to receive the latest news right to your inbox. You may unsubscribe at anytime.

QR Code Phishing Attacks

Discover More

PMO Practices

PMO Practices and Microsoft 365  

Janine JeansonJun 6, 20245 min read

PMO Practices and Microsoft 365 May 30, 2024 What is a PMO (Project Management Office) The Project Management Office (PMO) sets the standards for work and processes among project managers.…

software projects

Avoiding the Cliff of Success: When to Back Out of Software Projects   

Darren KuikMay 23, 20245 min read

Avoiding the Cliff of Success: When to Back Out of Software Projects   May 23, 2024 Imagine you’re in a car on a road trip. Everything starts off smoothly and you…

Configuring Power Platform Pipelines

Creating and Configuring Power Platform Pipelines: An Overview  

Stephan AlexanderMay 16, 20246 min read

Creating and Configuring Power Platform Pipelines: An Overview   May 16, 2024 Pipelines in Power Platform aim to automate and democratize the execution of the Application Life Cycle Management (ALM) process…

Let’s build something amazing together

From concept to handoff, we’d love to learn more about what you are working on.
Send us a message below or call us at 1-800-989-6022.