In an age when on-premises architectures and relational databases dominated, delivering data management via an agent-based approach made sense: compute infrastructure, network bandwidth, and database/virtualization topologies changed slowly. Today's enterprise environment, however, is increasingly dominated by hybrid and cloud-based architectures paired with extremely large, non-relational data sets deployed on commodity nodes in a horizontal scale-out fashion. In such environments, only an agentless data management architecture will suffice. Let's discuss the reasons why.
In a cloud architecture, compute is separate from storage, and the former is often brought online on demand. With on-demand compute, there is no persistent host for agents to run on. Requiring compute to be constantly available just to keep agents deployed makes the underlying infrastructure that much more expensive to operate and maintain.
Let's take this argument one step further. One of the major use cases for data management in the cloud is disaster recovery, or data replication. In this scenario, both the production and DR sites would need compute nodes running at all times for an agent-based approach to work, again increasing cost and overhead.
A recent trend in cloud computing is the advent of serverless computing. The growth of CosmosDB, Azure Functions, Azure Data Lake Store and Azure Data Lake Analytics (and their parallels in AWS and Google Cloud) has shown how event-driven, serverless applications can become the new norm. Customers pay for analytics and queries along with storage; in other words, there is no "compute" infrastructure managed by anyone, and therefore an agent-based architecture won't work.
With these trends in mind, our approach with Imanis Data was to build a completely agentless data management architecture. No additional overhead when you add or decommission nodes in your production database. No additional cost or management overhead if you have a multi-tier replication strategy. The ability to easily handle everything from on-prem to hybrid to pure cloud architectures. We integrate at the data platform API level and back up both data (tables, partitions, etc.) and metadata (schema) on an incremental-forever basis, resulting in lower operating and capital costs.
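To make the idea concrete, here is a minimal sketch of what agentless, incremental-forever backup looks like in principle: rather than installing agents on database nodes, a backup service calls the platform's own API to list objects and copies only partitions whose version has changed since the last run. This is purely illustrative; the `PlatformAPI` and `AgentlessBackup` classes, their method names, and the version-per-partition model are all assumptions for the sketch, not Imanis Data's actual implementation or any real database's API.

```python
class PlatformAPI:
    """Stand-in for a database's native listing/read API.
    All names here are hypothetical, for illustration only."""

    def __init__(self):
        self.tables = {}  # table -> {partition: (version, data)}

    def write(self, table, partition, data):
        # Each write bumps the partition's version number.
        ver = self.tables.setdefault(table, {}).get(partition, (0, None))[0] + 1
        self.tables[table][partition] = (ver, data)

    def list_partitions(self, table):
        # Cheap metadata call: partition -> current version.
        return {p: v for p, (v, _) in self.tables.get(table, {}).items()}

    def read(self, table, partition):
        return self.tables[table][partition]


class AgentlessBackup:
    """Incremental-forever: one full pass, then only changed partitions.
    Runs entirely outside the database nodes -- no agent installed."""

    def __init__(self, api):
        self.api = api
        self.catalog = {}  # (table, partition) -> last backed-up version
        self.store = {}    # backup target: (table, partition) -> (version, data)

    def run(self, table):
        copied = []
        for part, ver in self.api.list_partitions(table).items():
            if self.catalog.get((table, part)) != ver:  # new or changed
                self.store[(table, part)] = self.api.read(table, part)
                self.catalog[(table, part)] = ver
                copied.append(part)
        return copied  # partitions actually transferred this run


api = PlatformAPI()
api.write("users", "p0", "alice")
api.write("users", "p1", "bob")

backup = AgentlessBackup(api)
first = backup.run("users")        # full pass: copies both partitions
api.write("users", "p1", "bob-v2")
second = backup.run("users")       # incremental: copies only p1
```

After the first full pass, every subsequent run transfers only what changed, which is what keeps both network traffic and target storage costs low in this model.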