5 Years of Building A Cloud: Interview With LivePerson Openstack Cloud Leader

Three months ago I started this LinkedIn discussion, and I keep getting comments about it. People might say that it is just a provocative question for marketing purposes. I say that this question raises many thoughts and opinions that help shape the strategy of an IT organization. I invite you to read the following comments, which may lead you to think a bit more about your current on-demand strategy and approach.

Answer #1: Just a Buzzword

It’s a buzzword. It is an evolution of 1970s and 1980s technologies. Remember mainframes, VM/370, and paying for machine time. It is just another evolutionary loop, a development of already existing technologies. In my opinion cloud computing is an evolution. It started with the revolution of grid computing, then utility computing, then SaaS, and now it finds its preliminary conclusion in cloud computing. So no, it is not a revolution; it is a revolutionary step in the evolution of what is now called cloud computing. It is just a good name for a number of technologies that were ready years before customers were ready for them and before useful software was written. Many companies added “Cloud” to the titles of their solutions. Any site can be marked “Cloud ready” or “SaaS solution” :) That means it’s only marketing. This is all possible because people don’t know what cloud is in detail; sellers often talk about it as if it were magic. You could say “magic” instead of “cloud” and the meaning would stay the same: marketing.

Answer #2: Depends! From the technology perspective: Evolution; from the business perspective: Revolution

“From a technology point of view I am fairly confident in categorizing it as evolution, and I am not even sure it represents a significant step; from a business standpoint, however, I think there is much more value in the concept. I believe that cloud computing introduces a capability to rapidly map dynamic changes in business models, and that is kind of revolutionary.”

“My observation is that “cloud” is a description of how IT is supposed to work from a business perspective: flexible, available, efficient (lower cost), secure, dynamic, responsive, etc. If you are an IT specialist, the technology is evolutionary, but the thinking may be revolutionary.”

“I tend to think that the cloud computing revolution will transform the way all businesses, including enterprises, interface with technology and communications, and that it marks the next wave of the fundamental changes that the evolution of the internet has already brought about.”

Answer #3: IaaS Is Just an Evolution. Massive Scaling, Supported by PaaS and SaaS, Is the Revolution

“I’ve seen global trading and risk systems (30,000-node compute grids, nanosecond trading platforms), some truly cutting-edge platforms, and this is really a complete transformation of IT. If you’re thinking just IaaS, then it’s just evolutionary. True SaaS and PaaS are a revolution. Salesforce (and the force.com platform) can deliver millions of users and 97,500 customers on a single multi-tenant platform with three major upgrades per year. That’s the power of the cloud: giving a small 10-user non-profit the same reach and scale as a multibillion-dollar organisation. No admin or maintenance, just pure development and software business process IP. What other technology can scale from 1 to 100,000 users? Building a SaaS app can take much less than 10% of the development effort needed on traditional platforms. Cheaper, faster AND better. That’s a revolution.”

“The prior comment reflects a deep misunderstanding of what timeshared (outsourced) mainframe computing was all about. Cloud is just another swing of the pendulum. The business owners in the 1960s were right: why should we buy and maintain our own computers when we can better spend the money by renting computing resources from somebody who knows how to take care of all that ‘stuff’? It’s not new. We’re just coming around to the fact that PC computing set the industry back 40 years, and we’re now where we would have been if PCs had not taken 25 years to ‘grow up’.”

As amorphous as this question might be, the analogy to mainframes is highly misplaced and not very useful. Among other defining characteristics, cloud services allow software developers to control infrastructure resources programmatically. This means that applications designed specifically for cloud environments can bypass the historically slow layer of IT administrators who maintain computing resources through largely manual, error-prone processes. Companies that use such functionality to enable auto-scaling, such as Netflix, do so without investing capital in stranded computing capacity that may or may not be fully subscribed. I’ll leave the ever so important question of evolution vs. revolution to you, but explain to me how the Netflix development team could have replicated their…

“Revolution. Cloud is a disruption of everything internet and application as we know them. The very large infrastructure and service vendors are racing to rework their offers and slow things down to keep their competitive advantage. Revolutions are messy, like a massive earthquake or a coup d’état. Evolution is what you study afterwards, when learning which creatures adapted and which went extinct.”

By Wikipedia’s Definition, a Revolution Takes Place in a Short Period of Time

“According to the Wikipedia definition: A revolution (from the Latin revolutio, ‘a turnaround’) is a fundamental change in power or organizational structures that takes place in a relatively short period of time. So how does this align with the ‘cloud computing revolution’, which doesn’t seem to have come about in a short period of time? I remind you that Amazon started AWS 11 years ago…” I asked.

“Ofir, most revolutions have a long lead-up time where the angst ferments underground and then bursts out in a moment, when the underlying ability to organize action is catalyzed by some event: Egypt, for example (mobile devices and Facebook). Think of the internet revolution in 1995-1997. The internet was slowly building out (ARPANET, etc.), organized by the scientific and defense communities, and was catalyzed by Tim Berners-Lee’s public gift of HTTP/HTML. The corporate world was seeking a way to collaborate beyond the bounds of any one company’s offering, such as IBM’s or Microsoft’s. Within two years the internet exploded into the corporate world, literally revolutionizing the ways companies marketed themselves. The coup was over when Bill Gates announced that Microsoft was an internet company and Netscape dropped $25 in a day!”

This month Christian Verstraete, HP’s Chief Technologist, also raised this question in CIO magazine. In his post he writes:

“One of the questions that came up was whether cloud computing is a revolution, a paradigm shift, or not. I’d like to answer: it’s both.”

I say that cloud computing is rapidly evolving into a revolution.

What do you think? Join the discussion.

5 Years of Building A Cloud: Interview With LivePerson Openstack Cloud Leader

Koby Holzer has 17 years of experience working with large infrastructure environments, the last 4.5 of them at LivePerson as director of cloud engineering, focusing specifically on OpenStack. His past experience includes working on technological infrastructure at prominent Israeli telecom companies. I have personally known Koby for the past few years through our discussions, lectures and shared enjoyment of the great cloud and DevOps community in Israel.

Though I unfortunately didn’t have the pleasure of attending the last OpenStack Summit in April, I was thrilled to see Holzer taking part in the keynote session at one of the most important global cloud events. What follows is another great interview with a true cloud leader.

Ofir: Let’s start with a simple question. How was the OpenStack Summit?

Koby: The summit was amazing, and it’s getting better every year. It was my fourth one, and the most exciting one for me, as I had the opportunity to take part in the keynote session, talking about LivePerson, OpenStack, containers and all the new and exciting stuff we are doing with the LivePerson cloud.

It was the biggest summit yet, with over 7,500 people in attendance, and it continues to get bigger every year in every aspect: logistics, keynote sessions, educational tracks.

On the educational side I particularly like the practical, real-life cases. It’s very interesting to see how other companies are tackling OpenStack. My focus was on two specific tracks: the case studies, including AT&T and Comcast, and the containers track, which was very popular at the conference. One session I remember as particularly meaningful was a panel that included experts from the most popular container technologies on the market (i.e., Kubernetes, Docker, Mesos).

Ofir: How would you summarize the evolution of LivePerson over the past five years?

Koby: In early 2012 we started learning about and evaluating OpenStack, which was at the Diablo release at that time. We started to play with it, building our cloud labs, and in the middle of that year we decided to go to production with a small portion of LivePerson (LP) services. By the time we reached production we were already using Essex. Towards the end of 2012 we decided to rewrite LivePerson’s service from one big monolithic service into many microservices.

The next step was adopting Puppet, which accelerated the consumption of our private cloud as R&D moved from consuming physical servers to virtual OpenStack instances. By the end of 2013 we had already created a large cluster with more than 2,000 instances across 4 data centers, and from then on it just continued to evolve.

In 2013-2014 we were dealing with the OpenStack upgrade challenge and managed to move to Folsom, then Havana and Icehouse. We try to upgrade as often as possible; however, the bigger our cloud gets, the more difficult that becomes.

In 2015 we reached the point where we had finished rewriting the service, and our new “LiveEngage” service was ready for production and real users. Today we have something like 8,000 instances across 7 data centers, running on more than 20,000 physical cores. 2016 is the year for us to migrate to containers and Kubernetes, a move we expect to span well into 2017.

Ofir: If you were to look over these last 4.5 years, what would you say have been, or are still, your main challenges?

Koby: I manage cloud engineering, and there is a rather large team of software developers here. We were very lucky that the R&D organization decided to move to microservices at the same time that we introduced OpenStack and Puppet. Looking back, I am not sure if it was planned, but the timing was just perfect. While development built a modern microservices-based service, ops adopted and implemented cloud and automation.

In terms of management challenges, I will narrow it down to the challenges that I see for 2016. Migrating 150 services to containers is something my team cannot accomplish alone. We are making a continuous effort to maintain our partnership with R&D and to work jointly on educating ourselves so that we can make optimal use of the new technologies. That includes moving from continuous integration to continuous delivery and building a strong delivery pipeline.

The operations goal is to build an environment that enables R&D to own the service end-to-end, not only to develop it but also to be able to support a quality and robust production environment.

Ofir: Can you point to any specific challenge that you faced and overcame throughout your cloud journey?

Koby: One big challenge was deploying Puppet and getting it adopted across the organization. If only the cloud production and operations team had been using it, that wouldn’t have been enough. We needed our software developers to adopt Puppet as well and use it as a standard delivery method. And, as you can imagine, making 200 developers use a new technology doesn’t happen overnight.

I learned that you can’t easily convince that many smart people to do something just by saying “guys, this is great technology and it’s the only way we can deal with delivery”. We learned our lesson from that, and now we work much more closely with R&D, making decisions together from the start.

Remember that this was almost 4 years ago. It took a management decision from the very top, LivePerson’s General Manager, for everyone to understand that this was the way forward. Our entire R&D was instructed that all new updates would go to production using Puppet. A small team of DevOps experts was brought in to support and train the R&D teams and make sure Puppet was being used on a daily basis. This team ran workshops and was the go-to group for any questions. It took around a year to bring everyone up to speed, and today Puppet is our main delivery tool.

Another challenge, a common one with OpenStack, at least with older versions, is upgrades. After 4 years of practice, an upgrade still takes one engineer up to three months; that has been the story for every upgrade until now. The most recent upgrade has been the biggest so far, mainly because our cloud has grown significantly and we also needed to upgrade the hosts in tandem.

Upgrading thousands of physical servers while maintaining the service uptime is no simple task. In order to do this we need to take a group of servers, run a live migration of workloads to the other servers, then upgrade and ensure nothing was harmed before bringing the group back into the pool. There are lots of considerations and activities behind this, including understanding and segmenting the sensitive workloads.
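The drain, migrate, upgrade, verify and re-admit loop described above can be made concrete with a small in-memory simulation. This is a minimal sketch under stated assumptions, not LivePerson’s tooling and not the OpenStack API: the `Host` class, `live_migrate`, `rolling_upgrade`, `batch_size`, the “least-loaded host wins” placement rule and the trivial health check are all hypothetical. A real implementation would disable each compute host in the scheduler, call the compute service’s live-migration operation per instance, and handle sensitive workloads separately.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical in-memory model of a hypervisor; not an OpenStack object.
    name: str
    capacity: int                              # max number of VMs the host can run
    version: str = "old"
    vms: list[str] = field(default_factory=list)
    in_pool: bool = True                       # False while drained for maintenance


def live_migrate(vm: str, src: Host, hosts: list[Host]) -> None:
    """Move one VM from src to the in-pool host with the most free slots."""
    targets = [h for h in hosts if h.in_pool and h is not src and len(h.vms) < h.capacity]
    if not targets:
        raise RuntimeError(f"no capacity left to evacuate {vm} from {src.name}")
    dest = max(targets, key=lambda h: h.capacity - len(h.vms))
    src.vms.remove(vm)
    dest.vms.append(vm)


def rolling_upgrade(hosts: list[Host], batch_size: int, new_version: str) -> None:
    """Upgrade hosts one batch at a time so the rest keep serving workloads."""
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        # 1. Take the batch out of the pool so nothing new lands on it.
        for host in batch:
            host.in_pool = False
        # 2. Evacuate every VM on the batch to the remaining in-pool hosts.
        for host in batch:
            for vm in list(host.vms):
                live_migrate(vm, host, hosts)
        # 3. Upgrade the now-empty hosts.
        for host in batch:
            host.version = new_version
        # 4. Verify (placeholder check) before bringing the batch back.
        for host in batch:
            if host.version != new_version or host.vms:
                raise RuntimeError(f"{host.name} failed its post-upgrade check")
            host.in_pool = True


if __name__ == "__main__":
    hosts = [Host(f"compute-{i}", capacity=4, vms=[f"vm-{i}-{j}" for j in range(2)])
             for i in range(6)]
    rolling_upgrade(hosts, batch_size=2, new_version="new")
    print(all(h.version == "new" for h in hosts), sum(len(h.vms) for h in hosts))
    # True 12
```

The batch size is the main trade-off: larger batches finish sooner but need more spare capacity on the remaining hosts, which is why the sketch raises an error instead of overloading a host when the pool runs out of room.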

The LivePerson team at OpenStack Day Israel. Photo: Lauren Sell.

Ofir: How do you manage to keep transparency throughout the upgrade process?

Koby: We built a smart cloud discovery solution which updates automatically. Transparency is key and we have complete control over each individual VM and service. The system records all activities and can be accessed using an API and UI.

Ofir: What 3 takeaway tips can you give from your experience?

Koby: 1. As the operations manager, you should be able to build an efficient and professional team, whose size obviously depends on the size of your OpenStack cloud. For a cloud that consists of thousands of hosts, you need at least 2 network professionals, 3 talented operations/engineering guys who are responsible for automating everything, and one storage guy. This team does not include the teams that handle the daily tasks and use the OpenStack resources for the LivePerson service.

In addition, you need to think about every management aspect. Security is not part of our team, although ideally it should be; we are supported by our R&D team’s security experts. When building your private cloud team, remember that your R&D people don’t much care whether it’s OpenStack, physical servers, VMware, whatever. They just need the resources and the flexibility that the cloud and DevOps promise.

2. Learn from the past. With the Puppet challenge, it was as if we were telling R&D, “we demand that you deliver with Puppet”, but as an IT leader you need to understand and market the value of the new technology. It never ends, but once you have done it, the next time is easier, as I see with our current move to Docker and Kubernetes. Eventually, we want to work together as equals, with everyone adopting the technology together, learning together and coping with all challenges together.

To accomplish that, you need to create a “feature team” that includes representatives of all the parties involved, including the architects, lead developers, operations, network and security.

Although it might be challenging, I strongly suggest educating the other parties, not only on the touch points between dev and ops but also on what goes on behind the scenes of your cloud, so that they have the knowledge they need to use the OpenStack/Kubernetes APIs in particular. This is something we are still working on with our R&D team. Together with containers, this will give our developers real independence in provisioning and consuming resources. Connecting the software and the infrastructure, and letting developers decide what they need and when, is the flexibility that IT operations is responsible for.

3. Everyone should adopt the DevOps approach. R&D and Production are both developers, each with their own place in the delivery system. Although I am proud that LP is a cloud pioneer, we still have a way to go in that respect, and that’s exciting. Becoming Netflix or Google doesn’t happen overnight. The good news is that this road never ends and there is always something new to learn, adopt and do better.

Ofir: What are your future thoughts about the private cloud/cloud market landscape for the next 5 years?

Koby: I’m not sure about the next 5, so let’s start with the next 2 and move on. I think that in 2 years we will see hybrid clouds big time; this is also what we are aiming for. By using Kubernetes we will be able to use all the public clouds, including our own private one, in the same way, with the same teams and tools. What I want to see in LP is a very dynamic multi-cloud environment. For example, say Amazon just changed their prices and I know in real time that I can get a better price with Google: I will want all workloads and traffic to move seamlessly to GCP, and if prices change again the day after, I will want them to move automatically again, for instance to a third public cloud. The workload migration will be based on a price/performance equation that takes into consideration the SLA of each workload.
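The price/performance placement Koby describes is not spelled out in the interview, so the following is only a minimal sketch under assumptions: it treats each workload’s SLA as a hard filter and then picks the eligible cloud with the highest performance per unit price. The `CloudOffer` and `Workload` types, the `best_cloud` and `placement` functions, and every price, score and SLA figure are hypothetical; a real system would also weigh the cost and risk of the migration itself before moving anything.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CloudOffer:
    # Hypothetical quote for one cloud; all figures are illustrative inputs.
    name: str
    price_per_hour: float   # cost of one reference instance
    perf_score: float       # relative benchmark score (higher is better)
    sla: float              # advertised availability, e.g. 0.999


@dataclass(frozen=True)
class Workload:
    name: str
    min_sla: float          # lowest availability this workload tolerates


def best_cloud(workload: Workload, offers: list[CloudOffer]) -> CloudOffer:
    """Among offers meeting the workload's SLA, pick the best performance per price."""
    eligible = [o for o in offers if o.sla >= workload.min_sla]
    if not eligible:
        raise ValueError(f"no cloud satisfies the SLA for {workload.name}")
    return max(eligible, key=lambda o: o.perf_score / o.price_per_hour)


def placement(workloads: list[Workload], offers: list[CloudOffer]) -> dict[str, str]:
    """Map each workload to its preferred cloud for the current price list."""
    return {w.name: best_cloud(w, offers).name for w in workloads}


if __name__ == "__main__":
    offers = [
        CloudOffer("private", 0.10, 90, 0.995),
        CloudOffer("cloud-a", 0.12, 100, 0.9995),
        CloudOffer("cloud-b", 0.11, 100, 0.999),
    ]
    workloads = [Workload("chat-api", 0.999), Workload("batch-reports", 0.99)]
    print(placement(workloads, offers))
    # {'chat-api': 'cloud-b', 'batch-reports': 'cloud-b'}

    # cloud-b raises its price: the placement is recomputed and workloads move.
    repriced = [replace(o, price_per_hour=0.15) if o.name == "cloud-b" else o
                for o in offers]
    print(placement(workloads, repriced))
    # {'chat-api': 'cloud-a', 'batch-reports': 'private'}
```

Because the SLA acts as a filter rather than a weight, a strict workload such as `chat-api` can never land on a cheaper cloud with weaker availability, while a tolerant batch workload is free to follow the best price.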

As for OpenStack, there is no doubt that today compute, network and storage are much better than 4 years ago, even enterprise-ready. I think those core components will keep improving and support larger scales, so it will become easier and easier to upgrade seamlessly. The second priority is for OpenStack to integrate better with public clouds, with bursting, DR and backup projects supporting us everywhere: on our OpenStack private cloud, in EC2, Google and Azure. For example, Trove working the same way in our private cloud, on EC2 instances, on Google Cloud, etc. Since the future is hybrid, it just makes sense to have those extra cool projects work for me everywhere I choose. I think it will make OpenStack much stronger.

IOD is a content creation and research company working with some of the top names in IT.

Our philosophy is experts are not writers and writers are not experts, so we pair tech experts with experienced editors to produce high quality, deeply technical content.

The author of this post is one of our top experts.