What’s the Delta with Microsoft Fabric Lakehouse?
May 30, 2024
What is Microsoft Fabric Lakehouse?
That is a great question that can only be answered if we get some definitions straight:
Delta Lake – a low-cost data storage framework that stores data in open-source, generic file formats such as Parquet and their metadata, typically in the cloud.
Parquet – a data storage file format that organizes and compresses data as columns within groups of rows to optimize query time.
Delta – a data storage format combining Parquet files with corresponding JSON metadata to enhance data use and traceability.
Lakehouse architecture – a pattern of organizing data that prioritizes data management and optimization features such as ACID transactions (Atomicity, Consistency, Isolation, and Durability), versioning, auditing, and query optimization.
In short, parquet files with their transactional metadata (AKA delta tables) can create a delta lake within lakehouse architecture to improve accessibility, use, and performance of data within your organization while reducing data storage costs.
So, how is this different than what I am using now?
Currently, many organizations use relational database systems like SQL Server or Azure SQL to store their data warehouse. A data warehouse is a highly organized, multi-dimensional source of data for an entire organization or a specific purpose. Data from various sources or raw files throughout the organization are transformed, cleaned, and loaded into a row-based database such as an Azure SQL Server for use. This architecture doesn’t easily support unstructured data such as images and can limit analytical workloads that require non-SQL code. Plus, if the data warehouse has a lengthy load process, it can increase data lag into user-facing programs like Power BI. Additionally, the warehouse model data uses schema-on-write, meaning the data needs to be strictly defined before landing in the warehouse. This can increase maintenance and reduce flexibility.
A lakehouse that is built on top of a delta-lake has the following advantages:
- Uses schema-on-read to increase flexibility of use throughout the organization from business analytics to machine learning
- Increases performance of the data through use of parquet files
- Improves data reliability through the Delta metadata
- Reduces storage costs due to the compressed storage format
- Reduction of continual processing efforts required to load data warehouses
The Microsoft Fabric Lakehouse
Direct Lake
Power BI typically uses Import Mode to load data into its own memory-based semantic model. Large Power BI semantic models using Import Mode can take a long time to refresh which can cause lags in decision-making and the display of stale data. Direct Lake is a semantic model format like Import mode of Direct Query that aims to eliminate this problem by connecting directly to the live Lakehouse model. In Direct Lake, rather than loading an imported semantic model or querying through it, the Lakehouse model is the Power BI semantic model.
Note that once the Lakehouse is updated by the ETL process, the Fabric Semantic Model immediately has the newest data – there is no refresh time required. Now the new source data is visible in Power BI in 15 minutes at most – as the ETL process is running and writing to Delta Lake, new data becomes available as soon as it is written.
One caveat with using the Direct Lake in Power BI is the constraint of column naming convention within the Lakehouse. Parquet files cannot contain special characters or spaces as column names. This can pose issues for creating descriptive, business-facing column names within a Lakehouse model.
For example, it is preferable to name a column “Customer Address” but the closest that the parquet format supports would be “CustomerAddress.” If you use Power Query to rename the parquet-supported columns (from “CustomerAddress” to “Customer Address”) within Power BI, Direct Lake is no longer supported; the model reverts to using Direct Query instead.
Direct Query mode still avoids the refresh time that Import Mode has, because it issues queries directly against the Delta Lake SQL endpoint. The performance of Direct Lake and Direct Query is comparable, but most often Direct Lake will outperform Direct Query slightly.
Microsoft Fabric Licensing
Microsoft Fabric is currently included with Power BI Premium capacity licenses, which start at approximately $5000 per month. (Power BI Premium Per User licenses do not include any Fabric capabilities). You can purchase Fabric capacities to add to your Power BI tenant, through your Azure subscription, starting at $262.80 per month for the smallest capacity (running 24/7). Fabric capacities can be turned off when not in use to reduce costs if you don’t need it to run 24 hours a day.
If you already have a Power BI Premium capacity, are currently spending more than $200/month for Azure SQL databases, or you need to shorten your data load time to get the most recent data with much smaller processing delays, then you should consider using Fabric and DirectLake.
If you are looking for a data storage solution or want to start using Microsoft Fabric Lakehouse, our team can help. We can provide you with a solution that will enable you to maximize the potential of your organization’s data. Make sure to subscribe to our blog for more data tips, tricks, and updates.
Want to hear the latest from out team of experts? Sign up to receive the latest news right to your inbox. You may unsubscribe at anytime.
Discover More
Secret Manager to Protect Sensitive Data in App Development
Secret Manager to Protect Sensitive Data in App Development September 5, 2024 Secret Manager to Protect Sensitive Data in App Development How you deal with sensitive data is critical to…
Protecting Sensitive Data: Ensuring Privacy from a Data Engineering Perspective
Protecting Sensitive Data: Ensuring Privacy from a Data Engineering Perspective August 29, 2024 Protecting Sensitive Data: Ensuring Privacy from a Data Engineering Perspective In a bank call center, staff members…
Custom Script Setting in SharePoint & OneDrive Removed
Custom Script Setting in SharePoint & OneDrive Removed August 22, 2024 Last week, we published a blog discussing Microsoft retiring SharePoint Add-Ins. In that blog, we briefly discussed how the…
Let’s build something amazing together
From concept to handoff, we’d love to learn more about what you are working on.
Send us a message below or call us at 1-800-989-6022.