Our CEO, Cristina Flaschen, was interviewed on Google Cloud's Stack Chat. This post is a write-up of what she discussed, with some additional detail.
For those of you new to our platform, the idea for Pandium’s core product came about when Cristina and I realized first-hand that existing integration tools were ill-suited to meet the growing demand for in-app integration marketplaces.
Meeting that demand is hard. From a technical perspective, companies face real challenges in building these external-facing integrations and the infrastructure they need to run at scale. Their developers must securely build, maintain, and host the integrations while also providing a front end that business users can easily navigate.
This is incredibly complex to build. As a result, many SaaS companies end up offering their customers only a few working integrations, and even those carry high maintenance and customer support costs.
We decided to build a platform that provides all of the in-app marketplace infrastructure (authentication, security, front-end UI, account provisioning, business-user logging, and hosting) so SaaS developers can focus solely on writing the specific integration configurations their customers need.
Traditional integration platforms handle some simple use cases without code, but for more complex configurations they require developers to write code within and around visual elements (bundled code) in a fairly rigid system. That means engineers first have to learn an esoteric, platform-specific tool.
We wanted Pandium to be developer-friendly, so we designed it to let developers securely push simple command-line scripts, written in whatever language they already use, from their repo to the Pandium platform. This gives engineers maximum flexibility and speed when iterating on their integration configurations, without having to learn a new system.
To make this work at scale, our engineering team chose a containerized microservices architecture so the platform runs as efficiently and securely as possible.
Because we run third-party code on our platform, we face unique security concerns: we needed to guarantee that no client's errors or performance issues could affect any other client's account. With this structure, an issue in one client's integrations does not affect other clients or their customers.
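One common Kubernetes pattern for this kind of per-client isolation is to give each client its own namespace with a ResourceQuota, so a runaway integration can exhaust only its own budget. The sketch below builds those two manifests as plain Python dicts. The `client-<id>` naming scheme, the `tenant` label, and the default limits are hypothetical, and the post does not say this is exactly how Pandium isolates clients. Quotas bound resource totals but do not by themselves isolate untrusted code at the node level; that calls for separate node pools.

```python
from typing import Any


def tenant_manifests(client_id: str, cpu: str = "2", memory: str = "4Gi",
                     max_pods: int = 20) -> list[dict[str, Any]]:
    """Return a Namespace and a ResourceQuota for a single client.

    Hypothetical: the 'client-<id>' naming scheme, the 'tenant' label,
    and the default limits are illustrative values only.
    """
    namespace = f"client-{client_id}"
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": namespace, "labels": {"tenant": client_id}},
        },
        {
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "tenant-quota", "namespace": namespace},
            "spec": {
                # Caps the combined requests/limits of all pods in this
                # namespace, so one client's jobs cannot starve the others.
                "hard": {
                    "requests.cpu": cpu,
                    "requests.memory": memory,
                    "limits.cpu": cpu,
                    "limits.memory": memory,
                    "pods": str(max_pods),
                }
            },
        },
    ]


if __name__ == "__main__":
    import json
    print(json.dumps(tenant_manifests("acme"), indent=2))
```

The dicts can be serialized to YAML or JSON and applied with `kubectl apply -f`, one pair per client.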
In addition, because our clients' in-app marketplaces can gain or lose users quickly, we, as their host, needed an architecture that could respond efficiently to rapidly changing utilization without compromising availability or causing large cost spikes.
We initially used a different cloud provider, where it took hours to spin up clusters. We also regularly received night-time pages related to the Kubernetes control plane, such as high memory consumption in etcd3, the database on the master node that backs the control plane.
We decided to switch to Google Kubernetes Engine (GKE) so we could leave the work of running cluster subsystems to Google.
With a few simple gcloud commands, we could spin up clusters in minutes. And because we no longer had to worry about master node health, our team could sleep better at night… literally.
GKE node pools can be configured to use preemptible VMs, which cost up to 80 percent less than standard VMs. We run many jobs at punctuated intervals that do not need to run for long periods, so we can place these ephemeral workloads on preemptible nodes without compromising our clients' experience.
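As an illustration of how a short-lived job can be steered onto preemptible capacity, here is a minimal sketch that builds a Kubernetes Job manifest. It relies on the `cloud.google.com/gke-preemptible: "true"` label that GKE adds to preemptible nodes. Preemptible VMs can be stopped at any time and run for at most 24 hours, so the Job allows a few retries. The job name, image, and retry count are hypothetical; this is not Pandium's actual job definition.

```python
from typing import Any

# GKE adds this label (with value "true") to nodes in preemptible node pools.
PREEMPTIBLE_LABEL = "cloud.google.com/gke-preemptible"


def preemptible_job(name: str, image: str, command: list[str],
                    retries: int = 4) -> dict[str, Any]:
    """Build a Kubernetes Job whose pods are scheduled only on preemptible nodes.

    Hypothetical: the retry count and any names passed in are illustrative.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # If a node is preempted mid-run, the pod fails; the Job
            # controller creates replacement pods up to this many times.
            "backoffLimit": retries,
            "template": {
                "spec": {
                    "nodeSelector": {PREEMPTIBLE_LABEL: "true"},
                    # With "Never", a failed pod is replaced by a new pod,
                    # which may land on a different (surviving) node.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    import json
    job = preemptible_job("sync-orders", "example/sync:latest", ["python", "sync.py"])
    print(json.dumps(job, indent=2))
```

Work that must not be interrupted can simply omit the `nodeSelector` and run on standard nodes instead.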
GKE node pools also add an extra layer of security and segregation that our clients need. For example, Pandium's own worker nodes can be separated from the nodes that run client workloads, and we can apply different security levels to first-party nodes, which we fully control, than to client nodes.
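One way to get this two-way separation in Kubernetes is to combine a node selector with a taint. The `nodeSelector` on GKE's `cloud.google.com/gke-nodepool` label pulls client pods onto a dedicated pool. A taint on that pool (assumed here to be `workload=client:NoSchedule`, set when the pool is created) keeps every pod without the matching toleration, including first-party services, off it. The sketch below also applies a stricter container security context to client code. The pool name, taint key, and security settings are hypothetical choices for illustration, not a description of Pandium's configuration.

```python
from typing import Any

# GKE labels every node with the name of the node pool it belongs to.
NODEPOOL_LABEL = "cloud.google.com/gke-nodepool"


def client_pod(name: str, image: str, pool: str = "client-workers") -> dict[str, Any]:
    """Pod spec for untrusted client code, pinned to a dedicated node pool.

    Assumes (hypothetically) that the pool carries the taint
    workload=client:NoSchedule, so only pods with the matching
    toleration below can be scheduled onto it.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"workload": "client"}},
        "spec": {
            # Client pods may only run on nodes in the client pool.
            "nodeSelector": {NODEPOOL_LABEL: pool},
            # Lets this pod past the pool's taint; pods without it stay off.
            "tolerations": [{
                "key": "workload",
                "operator": "Equal",
                "value": "client",
                "effect": "NoSchedule",
            }],
            # Client code gets no Kubernetes API credentials by default.
            "automountServiceAccountToken": False,
            "containers": [{
                "name": name,
                "image": image,
                # Stricter settings than first-party pods might need.
                "securityContext": {
                    "runAsNonRoot": True,
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                    "capabilities": {"drop": ["ALL"]},
                },
            }],
        },
    }
```

First-party pods would use a different pool name in their own `nodeSelector` and carry no toleration, so the scheduler keeps the two groups apart in both directions.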
With automatic scaling, automatic upgrades, and logging through Stackdriver, GKE lets us take full advantage of Kubernetes without devoting significant engineering resources to designing, managing, and tuning it.
Over time, however, configuration drift between our environments (production, staging, and dev) became too great to manage without a more structured approach, and cluster deployment time crept back up from minutes to hours.
To fix this, we turned to Terraform, and in particular Google's prebuilt Terraform modules, to manage our cloud infrastructure as code.
Using the HashiCorp Configuration Language (HCL), we could define the desired state of our infrastructure in clear, concise code, without having to work out the steps needed to reach that state.
This is similar to writing a SQL query against a database: you declare the result you want, not how the database should compute it. Working this way brought our environment creation time back down to minutes.
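To make the declarative idea concrete, here is a toy model of what a tool like Terraform does when it plans. It compares the declared state with the current state and derives the create, update, and delete steps on its own. This is a deliberately simplified illustration, not Terraform's actual implementation: real plans also distinguish in-place updates from replacements and order actions by dependency. The resource addresses and attributes below are made up.

```python
from typing import Any

State = dict[str, dict[str, Any]]  # resource address -> attributes


def plan(desired: State, current: State) -> list[tuple[str, str]]:
    """Derive the actions needed to move `current` to `desired`."""
    actions: list[tuple[str, str]] = []
    for address in sorted(desired):
        if address not in current:
            actions.append(("create", address))      # declared but missing
        elif current[address] != desired[address]:
            actions.append(("update", address))      # exists but drifted
    for address in sorted(current):
        if address not in desired:
            actions.append(("delete", address))      # exists but no longer declared
    return actions


# Hypothetical resources for illustration only.
desired: State = {
    "google_container_cluster.staging": {"location": "us-central1", "node_count": 3},
    "google_service_account.ci": {"display_name": "CI"},
}
current: State = {
    "google_container_cluster.staging": {"location": "us-central1", "node_count": 1},
    "google_compute_network.legacy": {"auto_create_subnetworks": True},
}

print(plan(desired, current))
# [('update', 'google_container_cluster.staging'), ('create', 'google_service_account.ci'),
#  ('delete', 'google_compute_network.legacy')]
```

Because the plan is computed from the difference between the two states, running it again once the environment matches the declaration yields no actions. That same property is what keeps production, staging, and dev from drifting apart.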
With Google's modules, we jumped months ahead on our infrastructure roadmap and could create and configure everything from Cloud projects and IAM service accounts to node and network policies in GKE. Running this way helps us maintain high availability even during utilization spikes, because our systems can autoscale up and down.
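In Kubernetes, one piece of that up-and-down scaling is the Horizontal Pod Autoscaler (the cluster autoscaler, which adds and removes nodes, is another). Its documented core rule is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change when the ratio falls within a tolerance of 1.0 (0.1 by default). The sketch below is a simplified reimplementation for illustration only. The real controller also accounts for missing metrics, pod readiness, and stabilization windows, and the min/max defaults here are hypothetical. The post does not say which autoscaler settings Pandium uses.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count from the Kubernetes HPA scaling rule (simplified)."""
    ratio = current_metric / target_metric
    # Close enough to the target: skip scaling to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    # Clamp to the autoscaler's configured bounds.
    return max(min_replicas, min(max_replicas, proposed))


print(desired_replicas(4, current_metric=90, target_metric=60))   # 6: scale up
print(desired_replicas(4, current_metric=63, target_metric=60))   # 4: within tolerance
print(desired_replicas(8, current_metric=15, target_metric=60))   # 2: scale down
print(desired_replicas(8, current_metric=300, target_metric=60))  # 10: capped at max
```

Scaling proportionally to the metric ratio lets a spike of any size be absorbed in one step, up to the configured maximum, rather than adding one replica at a time.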
By building on Google's infrastructure and robust support, we have been able to give SaaS companies the foundation they need to build scalable in-app marketplaces. To hear our CEO Cristina Flaschen discuss how we leveraged GKE, watch the Google Cloud interview here.