Qualities of a Highly Available Cloud

MVISION UCE, our data-centric, cloud-native Security Service Edge (SSE) security platform, derives its capabilities from McAfee Enterprise’s industry leading Secure Web Gateway and Enterprise Data Protection solutions. However, this is not a lift and shift of capabilities to the cloud, which would have made it prone to service outages and impossible to have the flexibility that is needed to meet the demands of SSE. Instead, the best of breed functionality was purposefully reconstructed for SSE, using a microservices architecture that can scale elastically, and built on a platform-neutral stack that can run on bare metal and public cloud, equally effectively. A hallmark of the architecture is that the cloud is a single global fabric where service instances are spread throughout the globe. Users automatically access the best instance of any service through policy configuration.

With the widespread adoption of hybrid work models across enterprises for promoting flexible work culture in a post pandemic world, ensuring critical services are highly available in the cloud is no longer an option, but a necessity. McAfee Enterprise’s MVISION Unified Cloud Edge (UCE) is designed to maximize performance, minimize latency, and deliver 99.999% SLA guaranteed resiliency, offering blazing fast connectivity to cloud applications from any location and causing no service degradation, even when the usage of cloud services spiked 600% during the COVID-19 pandemic, as reported in our Cloud Adoption and Risk Report (Work From Home Edition). This blog shares details on how MVISION UCE is architected to enable uninterrupted access to corporate resources to meet the demands of the hybrid workforce.

What other alternatives are out there? We have seen some cloud services replicated in each region of their presence. While this makes controlling resources and data simple, and keeps everything within a boundary, such an approach loses out on the flexibility needed to scale on demand and reduced latency on access. With UCE, each point of presence (POP) is part of the global fabric, yet at the same time, fully featured with all services housed within the POP. This avoids the need to send traffic back and forth between various services located at different locations, a phenomenon known as traffic hairpin.

By default, user traffic gets processed at the POP closest to their physical location, regardless of where the user may be. A user may work at their office in New York 90% of the time and travel to UK occasionally. When the user connects to MVISON UCE, they are connected to New York POP when they are at office, and the POP in London if they are in a UK hotel while traveling. This is a big advantage if you think about it. User’s traffic does not need to trombone from the hotel in UK, to the POP in New York and back to a server in London. MVISION UCE’s out-of-the-box traffic routing scheme favors low latency. This does not mean that the customer cannot override this policy and force the traffic to be processed at the New York POP. They might do so if there is a compliance need to process all traffic at a certain location. Many customers have a need to store logs in a certain geography even though traffic processing may occur anywhere on the globe. MVISION UCE architecture decouples log storage from traffic processing and lets the customer choose their log storage geography based on criteria that customers define.

With thousands of peering partners growing every day, over 70% of traffic served by MVISION UCE uses peering links in some geographies. The whitepaper, How Peering POPs Make Negative Latency Possible, shares details about a study conducted by McAfee Enterprise to measure the efficacy of these peering relationships. This paper is proof that UCE customers experience faster response times going through our POPs than they would usually get by going directly through their Internet Service Providers. UCE follows a living partnership model when it comes to peering, with thousands of peering relationships in production. We are committed to keeping the latency to a minimum.

One of the key considerations while choosing a SSE vendor would be how much latency the service adds to user’s requests. Significant latency can negatively affect user experience and could be a deterrent to product adoption. With 85 POPs strategically placed around the globe providing low latency access to customers, UCE POPs have direct peering with the biggest SaaS vendors like Microsoft, Google, Akamai, and Salesforce to further reduce latency. In addition, MVISION UCE POPs peer with many ISPs around the globe, enabling high bandwidth and low latency connectivity end to end, from the customer’s network to UCE and from UCE to the destination server.

You may be wondering what the secret sauce is for achieving a reliability of five 9s or higher in MVISION UCE. Several items play a crucial role in preventing unplanned service degradation.

At McAfee Enterprise, security is not an afterthought. From the start, the architecture was designed with zero trust in mind. Services are segmented from one another and follow the least privileged principle when resources need to be shared between services. Industry standard protocols and methodologies are used to enforce user and identity access management (UAM/IAM). Strong role-based access controls (RBAC) across the platform keep customer’s data separate and provide self-defense when a service is compromised. None of these features matter if the software is vulnerable. McAfee Enterprise follows one of the strictest Software Development Life Cycle (SDLC) processes in the industry to eliminate known vulnerabilities and threats in our software as it is written.

Redundantly provisioned components that allow for one or more instances to pick up the work when one of them goes down. Unexpected system failures and interruptions do occur in the real world and having a good architecture that detects failures early and reroutes the traffic to another suitable instance is paramount to maintaining availability. A combination of client redirection, server-side redirection, along with deep application state tracking, is used to seamlessly bypass a failed spot. The global nature of the fabric allows for multiple simultaneous failures without causing a local outage.
State of the art automation and deployment infrastructure is key to localize issues, maintain redundancy, and react automatically when issues are found. Containerized workloads over Kubernetes are the foundation of the cloud infrastructure in MVISION UCE, which facilitates fast recovery, canary rollouts of software, and elastic scaling of the infrastructure in case of peak demand. This is combined with an extensive automation and monitoring framework that monitors the customer’s experience and alerts the operations team of any localized or global service degradation.
Ability to scale up on demand at a global scale. We are not talking about scale out within a POP here. Many times, physical data centers have a hard limit on resources and sometimes it takes several months to add new servers and resources at a physical site. We are talking about bursting out to newly provisioned POPs when the traffic demands, in a matter of hours. Through extensive automation and intelligent traffic routing, a new MVISION UCE POP can be deployed in public cloud quickly and start absorbing load, providing the needed cushion to avoid traffic peaks that could otherwise cause service degradation when usage patterns change. This capability allowed MVISION UCE to successfully handle increasing demand when customer VPNs could not handle the load created by dramatically increased remote work due to the pandemic last year.

Last but not the least, a chain is only as strong as its weakest link, and physical data center security must also be considered. Global partners are selected only after careful evaluation of their facilities and infrastructure that will host our data centers, while other vendors in this space are working with a larger set of less rigorously qualified regional partners to increase their presence. The McAfee Enterprise approach provides the necessary guard rails against supply chain attacks that our customers demand.

Another aspect of security that is gaining momentum these days is data privacy. This is at the forefront of all feature designs in MVISION UCE. Usually, data privacy means tokenization or anonymization of customer private data stored in MVISION UCE, be it logs or other metadata. At McAfee Enterprise, we strive to take this a step further. We do not want to retrieve private data from the customer environment if it can be avoided. For example, to evaluate a policy that involves customer premise data, UCE can offload the evaluation to a component on the customer premise. Case in point, McAfee Client Proxy (MCP) that is installed on user’s machine can perform a policy evaluation and avoid sending private data to the cloud. The McAfee Enterprise cloud leverages the results of the evaluation to complete the policy execution. Where this is not possible, private data is anonymized at the earliest entry point in the cloud to minimize data leaks.

There are other architectural gems hidden within UCE and thus failing to mention them would make this article incomplete. First, the policy engine is exposed in the form of code with which the customer can construct complex policies without being constrained by what UI provides. If you are a user of MVISON UCE, you can see this in action by enabling “Code View” in the Web Policy tree. If you do not like the way policy nodes are ordered in the tree or the evaluations made by default, you can take complete control and process the traffic in any manner you wish. By the way, the policy is so flexible that one can write a policy to process traffic in one region and store logs in another region.

Lastly, all services are automated and require no manual intervention to provision a customer unlike other vendors that require a support ticket to provision some features. Independent of where your account has been provisioned and where your preferred UI console resides, polices that you author are stored in a global policy system that is synchronized to all POPs around the world, giving you the flexibility to process traffic anywhere in the world.

Second, policy evaluation can be distributed across various components which allows its evaluation at the earliest point in the network. This avoids hauling all traffic to the cloud to apply policy. For example, if a sensitive document needs to be blocked due to data protection rules, the DLP agent running on the user’s machine can block it instead of hauling the traffic to cloud for classification and blocking. This strategy reduces load on the cloud and consequently increases the scale at which we can process requests.

To learn more about how MVISION UCE can help ensure your critical services are highly available in the cloud, watch this short video or visit our MVISION UCE page to get started.

To conclude, all clouds are not built equally. Architecture of a cloud is a matter of choice and tradeoffs. MVISON UCE implements a global cloud and puts customers in the driver’s seat through programmatic policies, that are secure, scalable, and highly available.

The post Qualities of a Highly Available Cloud appeared first on McAfee Blogs.

This post was first first published on McAfee Enterprise – McAfee Blogs’s website by Mani Kenyan. You can view it by clicking here