One of the complaints leveled at Databricks over the years, that it is difficult to set up and sometimes difficult to use, will need to be re-examined now that the company is shifting its entire data platform to a serverless architecture.
Databricks currently offers a serverless option for some functions, which relieves users of the hassle of spinning clusters up and down as needed. The bulk of the platform, though, depends on underlying compute clusters, which customers pay for whether or not they use them.
That is changing. During his keynote speech at the company’s Data + AI Summit on Wednesday, Databricks’ CEO and co-founder Ali Ghodsi disclosed that starting on July 1, the whole Databricks platform will be available as serverless.
“With serverless, you just pay for what you use,” said Ghodsi. “There is no cluster to configure, idle or not. So we will take care of everything under the hood on your behalf.”
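The difference between the two billing models comes down to simple arithmetic. The following is a minimal sketch, not Databricks' pricing: the `Workload` class, both cost functions, and every rate and hour count are hypothetical values chosen only to illustrate paying for uptime versus paying for use.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """One day of activity on a cluster (all numbers are invented)."""
    hours_provisioned: float  # how long the cluster was up
    hours_busy: float         # how long it was actually running work


def provisioned_cost(w: Workload, rate_per_hour: float) -> float:
    """Cluster model: every hour the cluster is up is billed, idle or not."""
    return w.hours_provisioned * rate_per_hour


def serverless_cost(w: Workload, rate_per_hour: float) -> float:
    """Usage-based model: only the hours spent doing work are billed."""
    return w.hours_busy * rate_per_hour


def idle_spend(w: Workload, rate_per_hour: float) -> float:
    """Portion of the cluster bill that paid for idle time."""
    return (w.hours_provisioned - w.hours_busy) * rate_per_hour


# Hypothetical day: cluster up for 24 hours, busy for 6 of them.
day = Workload(hours_provisioned=24.0, hours_busy=6.0)
RATE = 2.0  # hypothetical currency units per hour, same in both models

print(provisioned_cost(day, RATE))  # cost under the cluster model
print(serverless_cost(day, RATE))   # cost under the usage-based model
print(idle_spend(day, RATE))        # amount spent on idle hours
```

The sketch assumes the same hourly rate in both models, which keeps the comparison easy to follow; real rates may differ between the two.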
Databricks relies on three primary cloud platforms for networking, compute, and storage: AWS, Azure, and Google Cloud. According to Databricks, customers are expected to keep their data in their own cloud object storage accounts, such as S3 (Simple Storage Service) on AWS, ADLS (Azure Data Lake Storage) on Azure, or GCS (Google Cloud Storage) on GCP. That makes the storage side comparatively straightforward.
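In practice, the object store behind a dataset is usually visible in the scheme of its path. The sketch below is illustrative only: the `object_store_for` helper and the example paths are hypothetical, while `s3`, `abfss`, and `gs` are the URI schemes commonly used for the three stores.

```python
from urllib.parse import urlparse

# URI schemes commonly used to address each cloud object store.
SCHEME_TO_STORE = {
    "s3": "Amazon S3 (AWS)",
    "abfss": "Azure Data Lake Storage (Azure)",
    "gs": "Google Cloud Storage (GCP)",
}


def object_store_for(uri: str) -> str:
    """Return the object store that a storage URI points at (hypothetical helper)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in SCHEME_TO_STORE:
        raise ValueError(f"unrecognized storage scheme: {scheme!r}")
    return SCHEME_TO_STORE[scheme]


# Example paths are invented for illustration.
print(object_store_for("s3://example-bucket/sales/"))
print(object_store_for("abfss://data@exampleaccount.dfs.core.windows.net/sales/"))
print(object_store_for("gs://example-bucket/sales/"))
```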
The compute configuration is more challenging, though. Customers can use Databricks to provision compute for tasks like ETL, streaming data, SQL analytics, or ML/AI training, but they are billed for that compute through their cloud platform account. A serverless approach changes the compute equation.
“All of these knobs that we had before are gone,” said Ghodsi. “Cluster tuning: people set up clusters. What kind of machines should they use? Spot instances? Should we autoscale here? That is all gone. It just disappeared. None of those pages exist. You cannot do it.”
Ghodsi said that going serverless helps customers because it removes the need to study past usage and turn it into capacity planning. Although it does not currently charge for network costs incurred by serverless workloads, Databricks reserves the right to do so in the future, according to its serverless documentation.
Serverless computing also has benefits from the perspectives of security and data architecture, Ghodsi said.
“We can also handle security in a new way because we own every system and can shut it down in an efficient manner. That is not practical when it is not serverless,” he said. “The data layout: how exactly are you going to structure your data sets? How are you going to optimize your data sets? That is no longer there either. We are just running optimizations in the background. Because it is serverless, we use machine learning to improve your data sets in the background, which makes it very fast and effective. That, too, is very wonderful.”
Databricks itself will also benefit, because it no longer has to version its software releases. With serverless, versioning goes away: Databricks will update the software automatically, giving all users access to the same fixes and features at the same time.
Ghodsi said that Databricks engineers have been developing the serverless version of the platform for the past three years. It took so long because the approach was a matter of debate within the company, and because the engineers essentially had to redesign every product.
“Matei [Zaharia, the CTO of Databricks] and I told the company two or three years ago that we needed to build a serverless platform as a simple lift and shift,” Ghodsi said. “Our engineers did, in fact, respond with, ‘Hey, you people are mistaken. For the serverless era, it needs to be entirely redone.’ And we rejected that. We make the decisions as a company. And it turned out we were wrong. The technical leads were right. And they have worked really hard over the last two years to effectively recreate a lot of the products, jobs, notebooks, everything, as if we had started a whole new company.”
The shift to serverless computing will not happen overnight, which is just as well, since June 30 is a Sunday. It will take time to move all 12,000 Databricks customers to the serverless versions of its products, including Spark clusters, Structured Streaming, notebooks, and Mosaic AI.
Databricks is investing globally to ensure that serverless versions of its products are available in every cloud data center it manages. The company will strongly urge its customers to make the transition to serverless as soon as possible.
“Please start using serverless,” said Ghodsi. “It is highly likely that any new products we offer in the future will only work in serverless installations. Therefore, please implement serverless if your organization does not already.”