Your data is the most valuable commodity your company has, and at this very moment, thieves are plotting to steal it. The rise of cloud storage means that businesses can access files and documents from nearly anywhere at any time. But there are several different types of cloud-based data storage solutions, and choosing the right one could mean the difference between data safety and total catastrophe.
There are three main types of cloud-based data repositories: data lakes, data warehouses and data marts. Each has its own strengths and weaknesses, so it’s vitally important that you choose a solution that fits the specific needs of your business. Here’s a look at the big three types of data storage solutions and which one is right for you.
Data lakes
A data lake is generally considered to be the most basic type of cloud storage available. This type of repository can store massive amounts of raw, unstructured data, but that data isn’t particularly organized.
Like water flowing from a river into an actual lake, data flows from one or more sources into the data lake, which can hold quite a bit within its depths. The data might not be sorted, but it can all be contained within the lake.
According to IBM, data lakes are useful for organizations that need to ingest and store massive amounts of information from multiple sources. Data lakes are currently being utilized to house the vast amounts of raw data used to train machine learning models like ChatGPT.
Other industries that make use of data lakes include the energy industry, in which huge amounts of data are analyzed to optimize energy output, and the healthcare industry, in which patient and medication data is analyzed to predict healthcare costs and diagnoses.
Organizations also store data in a lake if they haven’t decided how to best utilize that data and simply need somewhere to store it. It’s difficult to run analytics on data stored in a data lake, so you’ll want to make sure that your data lake provider is able to seamlessly move information between the repository and a dedicated analytics platform.
Data warehouses
Unlike a data lake, a data warehouse is specifically designed for reporting and analysis of structured data. This is accomplished through a process called ETL, which stands for extract, transform and load.
First, the data is extracted from its original source. It is then automatically transformed to fit within the parameters of the data warehouse. This typically involves cleaning the data, combining data from different sources and converting that data into standardized formats.
Finally, that transformed data is loaded into the warehouse and organized into its assigned location. Data warehouses have a wide variety of business use cases across industries that are dependent on data-driven decisions.
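To make the three ETL steps concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular warehouse product works: the sample records, the `sales` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw records from two sources with inconsistent formats.
store_sales = [
    {"sku": "A1", "amount": "19.99", "date": "2023-03-01"},
    {"sku": "b2", "amount": " 5.00", "date": "2023-03-01"},
]
web_sales = [
    {"sku": "A1", "amount": "19.99", "date": "03/02/2023"},
]


def extract():
    """Extract: pull raw rows from each source and tag where they came from."""
    for row in store_sales:
        yield {**row, "channel": "store"}
    for row in web_sales:
        yield {**row, "channel": "web"}


def transform(row):
    """Transform: clean the values and convert them to one standard format."""
    date = row["date"]
    if "/" in date:  # convert MM/DD/YYYY to ISO YYYY-MM-DD
        month, day, year = date.split("/")
        date = f"{year}-{month}-{day}"
    amount = float(row["amount"].strip())
    return (row["sku"].upper(), amount, date, row["channel"])


def load(conn, rows):
    """Load: write the cleaned rows into a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(sku TEXT, amount REAL, sale_date TEXT, channel TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")
load(conn, [transform(r) for r in extract()])

# Once the data is structured, reporting queries become straightforward.
totals = conn.execute(
    "SELECT sku, ROUND(SUM(amount), 2) FROM sales GROUP BY sku ORDER BY sku"
).fetchall()
print(totals)
```

The point of the sketch is the ordering: the cleanup happens before the data lands in the table, so every later query can rely on consistent SKUs, numeric amounts and a single date format.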
Retail stores use data warehouses to store and analyze sales, inventory and customer data. Through this analysis, stores can make better decisions about item pricing and inventory management.
Other businesses that utilize data warehouses include financial institutions, which store and analyze customer data and financial transactions to identify patterns that can inform better risk management strategies, and manufacturing companies, which analyze production and supply chain data to optimize the production process and improve quality control.
Data marts
Technically, a data mart is actually contained within a larger data warehouse and is intended to serve very specific business functions. While a data warehouse or lake will typically contain all of an enterprise’s data, a data mart only contains the data relevant to its specific function.
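One simple way to picture a data mart is as a filtered slice of a warehouse table. The sketch below assumes a hypothetical `warehouse_events` table and models the mart as a SQL view named `marketing_mart`; real data marts are often separate physical databases, so treat this as an illustration of the idea rather than a recommended design.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_events "
    "(department TEXT, campaign TEXT, spend REAL, revenue REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_events VALUES (?, ?, ?, ?)",
    [
        ("marketing", "spring_sale", 1000.0, 2500.0),
        ("marketing", "newsletter", 200.0, 300.0),
        ("finance", None, 0.0, 0.0),
    ],
)

# The "mart" exposes only the rows and columns one team needs.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT campaign, spend, revenue FROM warehouse_events "
    "WHERE department = 'marketing'"
)

# Campaign ROI computed against the focused dataset only.
roi = conn.execute(
    "SELECT campaign, (revenue - spend) / spend "
    "FROM marketing_mart ORDER BY campaign"
).fetchall()
print(roi)
```

Because the view contains only marketing rows, queries against it scan less data and never touch other departments' records.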
Businesses utilizing data marts are typically looking to analyze a very focused dataset in a short amount of time. According to IBM, data marts are often used by marketing departments within larger companies to track and analyze data related to the performance of their campaigns, including conversion rates and ROI, in order to gain a better understanding of what can be improved for future campaigns.
In addition to being faster and more focused, data marts also tend to be less expensive to maintain, mainly due to their reduced size compared to data lakes and warehouses. Data marts can also be more secure than lakes and warehouses, as access can be restricted to just the people at the company working on that specific data.
Introducing the data harbor
While data lakes, warehouses and marts are the most notable cloud storage solutions, there are several alternative storage types that either provide similar services or enhance the capabilities of existing data repositories. One company, Calamu, is billing itself as the first provider of a new type of storage solution, called a data harbor.
According to Paul Lewis, founder and CEO of Calamu, a data harbor functions like an additional layer of security to protect your most sensitive information. Data repositories are a prime target for internet hackers, since they’re essentially the high-security bank vaults of the internet, filled with precious data. By using a data harbor, the information stored within is rendered valueless to thieves.
Data stored within the data harbor is fragmented into multiple pieces, scattered across several repositories and then re-encrypted. If an unauthorized intruder tries to access the data, they’re left with nothing but a meaningless collection of numbers.
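The article does not describe Calamu's actual algorithm, so the sketch below swaps in a textbook technique, XOR-based secret splitting, to show how a single fragment can be meaningless on its own. The function names and the repository labels are hypothetical, and this is not production-grade security code.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(data: bytes, n: int) -> list[bytes]:
    """Split data into n fragments; all n are required to rebuild it."""
    if n < 2:
        raise ValueError("need at least two fragments")
    # The first n - 1 fragments are pure random noise.
    fragments = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    # The last fragment is the data XORed with every random fragment.
    fragments.append(reduce(xor_bytes, fragments, data))
    return fragments


def combine(fragments: list[bytes]) -> bytes:
    """XOR every fragment together to recover the original data."""
    return reduce(xor_bytes, fragments)


secret = b"SSN: 000-00-0000"  # placeholder value, not real data
fragments = split(secret, 3)

# Hypothetical repositories, each holding exactly one fragment.
repositories = dict(zip(["repo_a", "repo_b", "repo_c"], fragments))

restored = combine(list(repositories.values()))
print(restored == secret)
```

In this scheme each fragment is either random bytes or data masked by random bytes, so someone who breaches one repository learns nothing about the original. The trade-off is that losing any one fragment makes the data unrecoverable, which is why real systems typically add redundancy.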
Lewis says that the term “data harbor” is derived from the legal term “safe harbor” and is meant to signal that data stored through the tech is protected from both hackers and legal liability, such as liability under GDPR. Imagine that a piece of data is like a piece of paper with your social security number on it, says Lewis.
“You could put it in a shredder, which is how basic encryption works. It would be difficult to put that piece of paper back together, but it’s not impossible. What Calamu can do, is put data through the shredder and then it takes a handful of the scraps and puts it in one repository and then puts another handful in a different repository. That way, if someone hacks into one repository, they only get a piece of the data, which is meaningless on its own.”
But when an authorized user needs to access that data, Calamu can seamlessly bring those pieces back together. Because no individual storage location has a complete set of data and the data is essentially in a destroyed state when it’s being stored, providers aren’t able to turn over information to legal authorities, since jurisdiction over digital data depends on where it is “at rest.”
“This is where the world is going,” says Lewis. “I fully support regulatory compliance where the data is in use, but where the data is at rest is trivial and when storing data in the cloud this becomes vitally important.”
This article was originally published in the original United States edition of Inc. or on inc.com and is the copyright property of Mansueto Ventures LLC, which reserves all rights. Copyright © Mansueto Ventures LLC.
By: Ben Sherry