This post initially appeared on .
For a few years there, the term “private cloud” had a negative connotation. But as we know, technology is more of a wheel than an arrow, and right on cue, the private cloud is getting a ton of attention, all of it positive. The statistics are clear: In Forrester’s 2023 Infrastructure Cloud Survey, 79% of the 1,300 enterprise decision-makers who responded said they are implementing private clouds. According to a
The primary reason companies repatriate is cost. Savings of up to 70% have been demonstrated publicly by companies as diverse as
That operating model defines a certain architecture, and time and time again, that architecture makes the modern data lake possible. There are other architectures, to be sure, but building your modern data lake on the private cloud allows organizations to pay for only what they need. When the business grows, scaling is as simple as adding more resources to a cluster; a redesign is not needed.
A modern data lake is one-half data warehouse and one-half data lake, and it uses object storage for everything. The object storage layer is software-defined, scalable, cloud native and performant. Performance is tunable through the selection of the
Using object storage with the data lake is standard; using it with the data warehouse is newer, made possible by Open Table Formats (OTFs) like Apache Iceberg, Apache Hudi and Delta Lake. There is considerable detail on this architecture that is beyond the scope of this article; for that, I recommend reading Keith Pijanowski’s full
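To make the OTF idea concrete, here is a minimal sketch using PySpark with the Iceberg runtime. The endpoint (https://minio.example.net), the bucket named warehouse and the table names are hypothetical, and it assumes the Iceberg Spark runtime and S3A connector jars are already on the classpath; treat it as an illustration of the pattern rather than a reference configuration.

```python
from pyspark.sql import SparkSession

# Register an Iceberg catalog whose warehouse lives in S3-compatible
# object storage on the private cloud (names are placeholders).
spark = (
    SparkSession.builder.appName("iceberg-on-object-storage")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://warehouse/")
    # Point the S3A connector at the private cloud's S3-compatible endpoint.
    .config("spark.hadoop.fs.s3a.endpoint", "https://minio.example.net")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Table data and metadata are written as objects; any engine that speaks
# Iceberg can query the same table, not just the engine that wrote it.
spark.sql("CREATE TABLE IF NOT EXISTS lake.db.orders (id BIGINT, amount DOUBLE) USING iceberg")
spark.sql("INSERT INTO lake.db.orders VALUES (1, 19.99)")
spark.sql("SELECT * FROM lake.db.orders").show()
```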
High performance: While a private cloud can be designed purely for capacity, the modern private cloud looks to deliver performance at scale. This architecture prioritizes tools that emphasize speed and efficiency. As Jeff Bezos has asked, who wants to pay more and wait longer to get it? The same principle applies here: no one wants it slower.
Decoupled compute and storage: Unlinking these components offers increased flexibility and scalability, enabling your chosen infrastructure, services and tools to excel in their respective areas of expertise.
Open standards: Open standards not only encourage interoperability but also future-proof your investments. This encompasses not just open source solutions but also open table formats, as we will explore. For these reasons (and because they will never be cloud native), don’t build a private cloud around a storage appliance.
Compatibility with RESTful APIs: Interconnectivity is a must. Your tools should share a common language, with S3 serving as the lingua franca for cloud storage (see the sketch after this list). For this reason, don’t build your private cloud with a POSIX-centric solution, even if it claims to support S3. Go with the real deal.
Software driven/Infrastructure as Code: Automate, and let Kubernetes orchestrate your infrastructure; it abstracts away the complexities of manual management and allows for rapid, efficient scalability.
Enhanced security and compliance: Because private clouds provide dedicated infrastructure, they offer greater control over data and enhanced security measures. This is particularly beneficial for industries that handle sensitive information, such as finance and health care.
Regulatory compliance: This architecture can support regulatory compliance by providing customizable security settings and audit controls to meet specific industry standards.
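As a quick illustration of the S3-as-lingua-franca point above, the sketch below uses boto3 against a hypothetical private cloud endpoint and bucket. The same calls work unchanged against AWS S3, MinIO or any other S3-compatible store; only the endpoint and credentials differ.

```python
import boto3

# Hypothetical endpoint and credentials; these are the only values that
# change when the same code is pointed at a public cloud region instead.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.private.example.net",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Standard S3 calls: write an object, then read it back.
s3.put_object(Bucket="analytics", Key="events/2024/04/01.json", Body=b'{"ok": true}')
obj = s3.get_object(Bucket="analytics", Key="events/2024/04/01.json")
print(obj["Body"].read())
```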
Putting Your Private Cloud in Play
We have seen a number of approaches to lighting up the private cloud. All of them can work; it really depends on the enterprise and the use case.
Time-Limited Hybrid Approach: The time-limited hybrid approach essentially turns the public cloud into cold storage while you build out your private cloud footprint over a defined period (months or quarters, not years). This involves buying and configuring your infrastructure and software stack on the private cloud, then pointing your data pipeline at the private cloud rather than the public cloud. There may be a period where you do both. The goal, however, is to use the public cloud as tiered cold storage and the private cloud as hot storage. Over time, the public cloud goes from cold to frozen while the private cloud becomes the primary and dominant storage tier.
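One way to express that hot/cold split is with standard S3 lifecycle rules. The sketch below (boto3, with a hypothetical endpoint, bucket and tier name) transitions objects older than 90 days to a remote tier; on an S3-compatible private store that tier would be configured separately to point at public cloud cold storage, while on AWS the storage class would be one of the built-in cold classes.

```python
import boto3

# Hypothetical private cloud endpoint; credentials resolved from the environment.
s3 = boto3.client("s3", endpoint_url="https://s3.private.example.net")

# Age objects out of hot private-cloud storage after 90 days. "COLD-TIER" is a
# placeholder for a remote tier configured on the object store to point at the
# public cloud.
s3.put_bucket_lifecycle_configuration(
    Bucket="analytics",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "age-out-to-public-cloud",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [{"Days": 90, "StorageClass": "COLD-TIER"}],
            }
        ]
    },
)
```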
Complete Repatriation: There are times when keeping the applications and data on both the public and private cloud is not an option. In these cases, you need to break up with your cloud provider. It is hard, and even with the elimination of exit fees, providers make it painful (the fine print basically says everything has to go to qualify for any exit-fee relief). It is very doable; it just takes a little more planning and creates a little more business friction. In this case, provision your colo or private cloud and application stack, then back up the data truck or lease the network capacity to firehose the data out to your private cloud data infrastructure. At that point you are free, but count on paying double for a month or two if you are the belt-and-suspenders type. One of the leading streaming companies took this approach when it left the public cloud. It forklifted half an exabyte into the new private cloud, including all of its movies, shows and documentaries. The process took about three quarters. The payoff was massive, however, and the complexity was greatly reduced for the team managing the service. They also enjoyed the side benefit of a nice pop in “
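For the firehose step itself, here is a minimal, hypothetical sketch that copies objects from a public cloud bucket to the private cloud’s S3-compatible endpoint. Real migrations at this scale use parallel transfer tooling or physical bulk-transfer appliances, but the shape of the operation is the same.

```python
import boto3

public = boto3.client("s3")  # public cloud credentials resolved from the environment
private = boto3.client(
    "s3",
    endpoint_url="https://s3.private.example.net",  # hypothetical private endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Walk the source bucket and copy every object across. In practice, large
# objects would use multipart uploads and many parallel workers.
paginator = public.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="prod-data"):
    for obj in page.get("Contents", []):
        body = public.get_object(Bucket="prod-data", Key=obj["Key"])["Body"].read()
        private.put_object(Bucket="prod-data", Key=obj["Key"], Body=body)
```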
Greenfield Private Cloud:
This is a fairly straightforward proposition, and it generally involves new everything. The project is new, and the data will be new (or newish) or generated from a source that is just coming online (like a giant fabrication plant or a new cloud video-on-demand service). Here you size the workload (you might even test it on the public cloud), but the idea is that it will run on the private cloud from inception. We are seeing this quite frequently with AI data infrastructure: The early experiments happen in the public cloud, where the data volumes are not yet significant and GPU availability is fairly good. Nonetheless, the enterprise knows that the production workload needs to run on the private cloud, for scale as well as for security, privacy and control. One of the leading automotive companies in the world recently pivoted its full self-driving initiative from a rules-based system to one based on the behavior of actual drivers.
Brownfield Private Cloud:
We will be honest here: We see this, but we don’t love it. This includes trying to run high-performance workloads on hard disk drives or to layer MinIO on
The Others:
There are two other scenarios that are less frequent but should be in the consideration mix. One is the hybrid burst approach and the other is the external tables approach. Both are related to the hybrid option, but may not be time-bound. In the hybrid burst approach, you maintain a private cloud while designing it to seamlessly expand, or "burst," into the public cloud for added flexibility. This strategy is often adopted to leverage extra GPU capacity or to use specific cloud services. In this model, certain tasks are temporarily transferred to the public cloud for processing. Once the analysis is complete, the results are sent back to the private cloud, and the public cloud resources are then decommissioned. We have a major financial services customer doing this with credit risk and market risk calculations. It uses the public cloud for some compute operations and combines it with a private cloud data lake that uses MinIO and Dremio. The beauty of the cloud operating model is that the architecture should support operations in both places. It is, effectively, a two-way street.
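Because both sides speak S3, a burst job can read its inputs straight from the private cloud data lake and write its results back before the public cloud resources are decommissioned. A minimal sketch of that round trip, with hypothetical endpoint, bucket and key names and a placeholder calculation:

```python
import json
import boto3

# Private cloud data lake endpoint (hypothetical); the burst compute runs in
# the public cloud but reads from and writes to this endpoint.
lake = boto3.client(
    "s3",
    endpoint_url="https://s3.private.example.net",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Read positions from the lake, run the (placeholder) risk calculation on
# burst capacity, then write the results back before the nodes are torn down.
positions = json.loads(lake.get_object(Bucket="risk", Key="inputs/positions.json")["Body"].read())
exposure = sum(p["notional"] for p in positions)  # stand-in for the real model
lake.put_object(
    Bucket="risk",
    Key="outputs/exposure.json",
    Body=json.dumps({"exposure": exposure}).encode(),
)
```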
Final Thoughts and Counsel
We have been party to a lot of these private cloud repatriations and new builds over the years. One thing that comes as a surprise to teams is managing hardware again. In the public cloud, hardware is transparent: DevOps and site reliability engineers interact with infrastructure only at the API level. If a VM is acting up, they terminate it and launch a new one in its place. Unfortunately, in the new private cloud, rather than simply scrapping hardware and buying new, you have to make the existing hardware work.
Colocation provides a middle ground between fully on-premises infrastructure and the public cloud, offering the benefits of both worlds. With access to top-tier networking and proximity to the public cloud providers, colos facilitate low-latency connections and hybrid cloud setups, enabling efficient data transfer and processing. This flexibility, and the potential for successful hybrid cloud deployments, is crucial for businesses aiming to optimize their operations and maintain a competitive edge. To learn more about how this works, check out our