Data warehouse: A digital storage space where structured data can be archived for specific business intelligence purposes and reporting. The data is predefined and stored in a specific format, ready to use in a data set.
Data lake: A digital storage space where unstructured information is stored in raw format, ready for whatever uses we may be able to find for it, now or in the future. A more flexible approach.
Data lakehouse: A combination of both of the above, bringing together the features and tools of a data warehouse and the unstructured, raw data of a data lake.
Structured data vs unstructured data
If a data warehouse stores structured data and a data lake stores unstructured data, you need to grasp what types of data those could be, so you can assess your business needs.
Structured data: data points that have already been organised and ‘structured’ in some way, e.g. into tables, so they are searchable and clearly defined. This type of data includes numbers, words, and strings. For example, anything that could be formatted into your standard subscriber list with fields and values, or a relational database like Marketing Cloud Data Extensions.
Unstructured data: basically the opposite. This type of data is more qualitative and made up of images, emails, word processing files, video and audio, for example. It can’t be easily shown in a table of rows and columns and is notoriously more difficult to search and analyse.
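To make the difference concrete, here is a minimal Python sketch. The subscriber records and the email text are invented for illustration and do not come from any real system.

```python
# Structured data: every record has the same named fields, so it fits a table.
subscribers = [
    {"email": "ana@example.com", "first_name": "Ana", "country": "ES"},
    {"email": "ben@example.com", "first_name": "Ben", "country": "UK"},
]

# A precise question is a simple filter on a known field.
uk_subscribers = [s["email"] for s in subscribers if s["country"] == "UK"]

# Unstructured data: free text with no fields to filter on.
support_email = "Hi, I moved to the UK last month and my orders still ship to Spain."

# Without extra tooling, the best we can do is a crude keyword match,
# which cannot tell us where the customer actually lives now.
mentions_uk = "UK" in support_email
mentions_spain = "Spain" in support_email
```

The structured list answers "who is in the UK?" exactly, while the email matches both countries and needs further interpretation before it is useful.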
According to Gartner, a whopping 80% of enterprise data is unstructured. As you can guess, this means data warehouses do not always meet the full needs of a business. Before a data warehouse can receive data from another system, that data must first be extracted, transformed and loaded (ETL), which can be time-consuming.
Yet data lakes have their drawbacks too. They don’t take into account the other 20% of enterprise data that is structured, and it is much more difficult to extract useful insights from them without significant data science resources and budgets.
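As a rough illustration of the three ETL steps, here is a small Python sketch using only the standard library. The export text, the table name and the function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system (normally a file or an API response).
RAW_EXPORT = """email,signup_date,country
ANA@Example.com ,2024-01-05,es
ben@example.com,2024-02-11,uk
"""

def extract(raw_text):
    """Extract: read the raw rows out of the source format."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: clean each row so it matches the warehouse's predefined format."""
    return [
        (row["email"].strip().lower(), row["signup_date"], row["country"].strip().upper())
        for row in rows
    ]

def load(rows, connection):
    """Load: write the cleaned rows into a warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS subscribers (email TEXT, signup_date TEXT, country TEXT)"
    )
    connection.executemany("INSERT INTO subscribers VALUES (?, ?, ?)", rows)
    connection.commit()

warehouse = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse
load(transform(extract(RAW_EXPORT)), warehouse)
loaded = warehouse.execute(
    "SELECT email, country FROM subscribers ORDER BY email"
).fetchall()
```

Even in this toy version, the transform step has to know the target format in advance, which is where much of the time in a real ETL pipeline tends to go.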
The Salesforce Data Lakehouse
Combining the data management and security of a data warehouse with the flexibility of a data lake, a data lakehouse allows businesses to store and organise all types of data. In turn, this means better insights, improved machine learning for AI tools like Einstein, and the ability to act quickly on both first- and third-party data.
Salesforce Data Cloud comes with a built-in data lakehouse. It also has a ready-made data model known as ‘Star Schema’ that helps you to organise your data, and a simpler ‘Extract, Transform, Load’ (ETL) process that can largely be configured in the UI.
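For readers unfamiliar with the term, a star schema in general puts one central "fact" table of events in the middle, surrounded by "dimension" tables that describe them. The sketch below is a generic Python and SQLite illustration with invented table and column names. It is not Salesforce's actual data model.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables describe the 'who' and the 'what'.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- The fact table sits in the centre and points at each dimension.
CREATE TABLE fact_orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    amount      REAL
);

INSERT INTO dim_customer VALUES (1, 'Ana', 'ES'), (2, 'Ben', 'UK');
INSERT INTO dim_product  VALUES (10, 'Trainers', 'Footwear'), (11, 'Cap', 'Accessories');
INSERT INTO fact_orders  VALUES (100, 1, 10, 60.0), (101, 2, 10, 60.0), (102, 2, 11, 15.0);
""")

# A typical question joins the fact table to one dimension and aggregates.
revenue_by_country = db.execute("""
    SELECT c.country, SUM(f.amount)
    FROM fact_orders AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
```

Because every question follows the same fact-to-dimension join pattern, this layout tends to be easy to report on, which is the usual argument for organising data this way.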