Polybase is a pretty cool technology that debuted in SQL Server 2016 as part of the Big Data (Clusters) initiative. It got a bit of a facelift in SQL Server 2019 and now is really cool. I’ve been testing and breaking it for the past few months just for fun, but it got kind of exciting when I applied the SQL Server 2019 CU5 patch to my test cluster. I’ll post about these findings as I am able to resolve them. Right now it’s working, but I personally feel it’s impaired.
Go check out Kevin Feasel’s great blog and his post on the same CU, which he wrote for a different reason. He published a book on Polybase as well, and you can find the link to buy it here. He knows a ton more about Polybase than I do, but I am certainly having fun working with it from an admin point of view.
What does it do? It’s meant to be a bit of a replacement for Linked Servers, something that most DBAs have known about for far too long. I still have nightmares about OLEDB waits from linked servers being used because they’re just “easier”. Sure they are, but they wreak havoc on your enterprise. Then again, so do huge Power BI reports, but we don’t talk about them… 🙂
Polybase got cool in SQL Server 2019 because it connects to SO MANY disparate data sources. I’ve been working on seeing how many I can connect to at one time. It’s pretty neat stuff. I’ll be writing about my findings from an HA and admin point of view here on this blog, and I hope to hear from you.
I’ll be speaking about Polybase next Wednesday, the 8th of July, at both the Salem SQL UG and the Oregon Data Community UG, at noon and 7:00 PM Pacific time respectively.
Have you used Polybase? What do you think? Let me know!