NSX Security Reference Design Guide

1 Introduction

1.1 VMware Security

 

 

VMware has a broad offering of security products and features which are integrated across the entire VMware product portfolio.  VMware sees security as an adjective, not a noun.  Security is built-in; not bolted on.  Here is a listing of VMware Security Products and features across the heterogeneous infrastructure which is common today. Infrastructure today extends along a continuum from physical servers on prem to VMs in hypervisors (sometimes a variety of hypervisors like ESXi and KVM) to containers, on prem and in the cloud, to Software as a Service (SaaS) offerings like Office365 (O365) and SalesForce (SFDC). VMware offers the tools to secure this heterogeneous environment in a consistent manner, while allowing the qualities of each solution to shine.

 

Figure 1-1 VMware security offering

 

 

1 – VMware Carbon Black – CB secures endpoints by detecting malware, from physical PCs on prem to virtual desktop infrastructure (in the cloud or on prem) to servers, both physical and virtual.

2 – VMware Horizon View – Horizon allows for the secure delivery of virtual desktop infrastructure, both for Carbon Black and for applications.

3 – WorkspaceONE – WorkspaceONE is a Unified Endpoint Manager (UEM) which provides a single point of definition and control of the intersection of application/user/device/location.

4 – VMware SD-WAN by VeloCloud – Traffic to remote locations can be secured (and optimized through DMPO – Dynamic MultiPath Optimization) using SD-WAN by VeloCloud.  Now, with Secure Access Service Edge (SASE) functionality, the admin can also define secure connectivity policy.

5 – vRealize Network Insight – vRNI provides visibility into the underlying physical infrastructure of switches and routers as well as the virtual infrastructure through NetFlow, or into the legacy firewall infrastructure through integration with a variety of firewall managers.  This visibility is complemented by a cross-sectional view of the virtual infrastructure, from native Amazon Web Services (AWS) and Microsoft Azure environments to branches to ESXi VMs and Kubernetes (K8s) containers.  This ubiquitous view combines into a complete, searchable picture of the environment. In addition, an admin can get firewall policy suggestions or simply determine the path, with applicable security policies, along every step from point A to point B.

6 – NSX-T Data Center – This document will focus on the security features of NSX.  To provide context in the greater picture, items 7 through 14 list the security components of NSX.

7 – NSX Gateway Firewall – NSX Gateway Firewall secures the data center boundary.  It also provides security at the physical-to-virtual boundary as well as at tenant boundaries in multi-tenant environments.

8 – NSX Distributed Firewall – For East-West security, the admin can centrally define policy from the NSX Manager. NSX leverages a distributed local control plane to implement policy definition using local constructs (be they firewall rules on every virtual NIC (vNIC) of a VM or agents running on physical servers).  NSX Distributed Firewall runs on any ESXi hypervisor, on prem or in several clouds. It also runs on KVM hypervisors, Windows and Linux physical servers, and as part of the NSX Container Plug-in (NCP), which supports K8s, RedHat OpenShift, and Tanzu container platforms.

9 – NSX Identity Firewall – NSX IDFW uses Active Directory user SIDs to provide user context for single-user Horizon/Citrix VDI and server OS use cases, as well as multiuser RDSH use cases such as Horizon Apps and Citrix Published Applications/Virtual Apps.

10 – NSX URL Filtering – NSX also provides URL filtering capabilities, whether to ensure that malicious websites are not being accessed (such as by ransomware for Command and Control) or to protect against a user's misguided sense of where to download software.

11 – NSX Intelligence – NSX Intelligence is a native distributed analytics platform that leverages workload and network context from NSX to deliver converged security policy management, analytics, and compliance.

12 – NSX IDS/IPS – For intrusion detection, NSX brings an industry-first distributed IDS/IPS (Intrusion Detection and Prevention System). This not only provides distributed, scalable IDS/IPS but also prevents misfires by loading only relevant IDS signatures.

13 – NSX Cloud – For AWS and Azure native workloads, NSX Cloud offers a single point of policy control across VPCs and VNETs to ensure policy consistency. For AWS and Azure native environments, security can be implemented either via agents on workloads or natively via cloud controls.

14 – IPSec VPN – To access cloud environments (such as for direct connect) or anywhere else, NSX ensures that in-flight traffic is encrypted using IPSec VPN.

15 – vSAN Disk Encryption – For data at rest, vSAN disk encryption ensures data is safe.

16 – Web Application Firewall – NSX provides integrated load balancing.  With the Advanced LB comes iWAF: an intelligent WAF that uses analytics and machine learning to tune policy and provide insights into attack traffic.

17 – Tanzu Service Mesh – For the security of microservice applications across K8s clusters and clouds, VMware provides Tanzu Service Mesh.

18 – Secure State – Finally, VMware Secure State correlates risk across this dynamic cloud infrastructure, reporting on risk such as “any any allow” configuration changes.

 

This vast offering of products and features allows for pervasive and granular security policy definition from endpoints to servers to containers to microservices.  It also allows for encrypting data both in flight and at rest. Finally, this also allows for the detection of suspicious behaviors on endpoints or in the network across a heterogeneous environment.

 

This document will focus on the security offerings of the NSX product portfolio and how to optimally design and use those offerings.

 

1.2  Service Defined Firewall

The Service Defined Firewall is the term used for the combination of NSX firewalling and NSX IDS/IPS.  As we will see in subsequent chapters, this powerful combination brings firewalling and IPS capabilities to every corner of the data center, without the need for network redesign. It also allows for tuned security policy based on granular east-west requirements.

 

Intrinsic security is a fundamentally different approach to securing your business. It is not a product, or tool, or bundle for your organization. Rather, it is a strategy for leveraging your infrastructure and control points in new ways—in real time—across any app, cloud, or device so that you can shift from a reactive security posture to a position of strength.

 

1.3  Modern Security Journey

 

One of the most common questions customers ask is: “How do I move to a modern security architecture from where I am now?”  This is like asking what you should do to upgrade your house – the answer depends on your lifestyle and where your house is lacking.  This section will briefly provide an overview of a few customers who have undertaken the journey to a modern security infrastructure.  As you will see, the important thing is that they started, not how they started.

 

1.3.1  Segmentation

The first sample customer is a large company of over 50,000 employees with over 1800 hosts running 30,000 VMs.  This is an old, solid, well-established company that has been in business for over 175 years.  Their infrastructure includes pretty much every technology across the course of computer history, from mainframes to modern containers and microservices.  For this company, the first step in adopting a modern security strategy was as simple as separating prod from nonprod.  This effort took them 18 months due to the complex nature of their environment.  There were applications that had been in use for decades and whose architecture and even ownership were poorly understood.  Sorting through those details took a long time.  But, at the end of the effort, every application was inventoried along with its use and ownership.  One executive noted that the inventory effort alone improved their security posture.  The segmentation was a bonus.  They are now poised to further segment their prod environment by business unit, continuing in an iterative manner.

 

1.3.2  Security Growing Up

 

The next journey comes from another large company, with close to 16,500 employees.  This company took the approach of starting at their branches and securing those first, because the physical security at those branch locations varied wildly.  They started with their smaller branches and secured those, allowing them to get comfortable with the technology and its operational nuances.  Should one of those branches go down due to operational unfamiliarity, the impact to the company as a whole was minimal.  By the time they had secured all the smaller locations, there was a degree of comfort that gave them confidence to take on their medium branches, and with that they grew the confidence to take on their large branches and, eventually, their corporate data center environment. Because of the pure software architecture of NSX, they were easily able to revise earlier implementations based on lessons from later stages as the project progressed.

 

1.3.3  Application Focused Security

 

The next example looks at a large hospital.  This hospital has close to 1 million outpatient visits a year, over 500 beds, and 6,000 employees.  This hospital chose to secure their most precious asset first: their Electronic Health Records (EHR) application.  This application is a multi-tiered, complex application which interfaces with every other application in the hospital: timeclock, billing, etc.  Due to the complexity of the application, this customer chose to take this on as part of a 6-week professional services engagement with VMware.  The environment was identified and tagged, with rules written, within 2 weeks.  The rest of the engagement was about scheduling maintenance windows to enable the deny rule at the end of each section in the policy, watching the logs, and updating anything that may have been missed.

 

This engagement took place almost 4 years ago.  Since then, the customer has maintained the policy and updated code.  This is the value of NSX: when an effective tagging model is chosen, the maintenance of the infrastructure is minimal, and new features of later releases are easily added in.

 

1.3.4  Security Through Migration

 

On occasion, a golden opportunity presents itself in which to adopt a new security model, such as an infrastructure migration.  This last customer use case took advantage of a hardware refresh to build a new environment with security built in.  This customer is a SaaS software supplier subject to compliance requirements.  They are a $6B company with 5500 employees.  Because a hardware refresh requires mapping out applications as they get migrated over, it presents an opportunity to build the new environment with the appropriate policies in place and settle the applications into the new environment, with security built in from day zero.  In this instance, they used vRNI to map out their environment in order to size the new hardware environment.  As part of that same vRNI assessment, they were able to map out their applications and their flows.  With the policy suggestion exported from vRNI, they were able to preload the policy prior to migration.  So, the applications migrated into a secure, modern infrastructure from the start.

 

1.4  Closing thoughts on security adoption

 

As the above examples show, there are many ways to embark on the modern security journey.  There really is no right or wrong way to start.  The important thing is to start.

 

The following document provides an in-depth view, from a security perspective, of NSX architecture, features, and functions. The following chapters are organized to first provide a basic view of the NSX architecture and components so that the discussion of the features and functions can be better contextualized.  The topic of firewalling is divided among three chapters: one discussing gateway (or perimeter) firewalling, one discussing distributed firewalling, and the last discussing public cloud firewalling, which leverages the aforementioned models to provide NSX security in the public cloud. There is a chapter dedicated to containers, as they are a new architectural component, which makes them an ideal place to implement security from the start. There is a chapter dedicated to the firewall features which are most commonly used to integrate modern NSX security technology into existing security architectures.  The next chapter covers Intrusion Detection and Prevention. Finally, there is a chapter on management and operations.

2 NSX-T Architecture Components 

NSX-T reproduces the complete set of networking services (e.g., switching, routing, firewalling, load balancing, QoS) in software. These services can be programmatically assembled in arbitrary combinations to produce unique, isolated virtual networks with complete security in a matter of seconds. Although NSX does not require overlay networking, there is an added security assurance when overlay is used in that it is less likely that external networking mechanisms bypass NSX security controls.

 

NSX-T works by implementing three separate but integrated planes: management, control, and data. The three planes are implemented as sets of processes, modules, and agents residing on two types of nodes: manager appliances and transport nodes.

 

Figure 2-1: NSX-T Architecture and Components

2.1  Management Plane and Control Plane

 2.1.1  Management Plane

The management plane provides an entry point to the system via the API as well as the NSX-T graphical user interface. It is responsible for maintaining user configuration, handling user queries, and performing operational tasks on all management, control, and data plane nodes.

 

The NSX-T Manager implements the management plane for the NSX-T ecosystem. It provides an aggregated system view and is the centralized network management component of NSX-T. NSX-T Manager provides the following functionality: 

       Serves as a unique entry point for user configuration via RESTful API (CMP, automation, including third-party security managers) or the NSX-T user interface.

       Responsible for storing desired configuration, such as security policy, in its database. The NSX-T Manager stores the final configuration requested by the user for the system. This configuration will be pushed by the NSX-T Manager to the control plane to become a realized configuration (i.e., a configuration effective in the data plane).

       Expands rules, converts objects to IP addresses, and pushes rules to the data plane.

       Maintains the object-to-IP database, updated via the IP discovery mechanism.

       Retrieves the desired configuration in addition to system information (e.g., statistics).

       Provides ubiquitous connectivity, consistent enforcement of security, and operational visibility via object management and inventory collection for multiple compute domains – up to 16 vCenters, container orchestrators (PKS, OpenShift & Tanzu), and clouds (AWS and Azure).

 

 

Data plane components or transport nodes run a management plane agent (MPA) that connects them to the NSX-T Manager.

 2.1.2  Control Plane

The control plane computes the runtime state of the system based on configuration from the management plane. It is also responsible for disseminating topology information reported by the data plane elements and pushing stateless configuration to forwarding engines.

 

NSX-T splits the control plane into two parts:

       Central Control Plane (CCP) – The CCP is implemented as a cluster of virtual machines called CCP nodes. The cluster form factor provides both redundancy and scalability of resources. The CCP is logically separated from all data plane traffic, meaning any failure in the control plane does not affect existing data plane operations. User traffic does not pass through the CCP cluster.

       Local Control Plane (LCP) – The LCP runs on transport nodes. It is adjacent to the data plane it controls and is connected to the CCP. The LCP is responsible for programing the forwarding entries and firewall rules of the data plane.

 2.1.3  NSX Manager Appliance

Instances of the NSX Manager and NSX Controller are bundled in a virtual machine called the NSX Manager Appliance. NSX-T relies on a cluster of three such NSX Manager Appliances for availability, scale-out, and redundancy. Because the NSX-T Manager stores all its information in a database immediately synchronized across the cluster, configuration or read operations can be performed on any appliance.

 

Each NSX Manager appliance has a dedicated IP address and its manager process can be accessed directly or through a load balancer. Optionally, the three appliances can be configured to maintain a virtual IP address which will be serviced by one appliance selected among the three. The design consideration of NSX-T Manager appliance is further discussed in the NSX Design Document.

2.2  Data Plane

The data plane performs stateless forwarding or transformation of packets based on tables populated by the control plane. It reports topology information to the control plane and maintains packet-level statistics.  The NSX-T data plane encompasses all packet-handling software within the NSX-T scope.  This includes physical servers, hypervisors, NCPs, cloud enforcement mechanisms (be they agents or gateways), and edge nodes handling traffic, whether in bare metal or VM form factors.

 

Hosts running the local control plane daemons and forwarding engines implementing the NSX-T data plane are called transport nodes. Prior to NSX-T 3.0, transport nodes could only run an instance of the NSX-T virtual switch called the NSX Virtual Distributed Switch, or N-VDS. The N-VDS is so close to the ESXi virtual distributed switch (VDS) that NSX-T 3.0 introduced the capability of installing NSX-T directly on top of a VDS on ESXi transport hosts. For all other kinds of transport nodes and for all edge nodes, the N-VDS is required. The N-VDS is based on the platform-independent Open vSwitch (OVS) and serves as the foundation for the implementation of NSX-T in other environments (e.g., cloud, containers, etc.).  The NSX data plane supports both IPv4 and IPv6.  In cases when only one protocol is used, the other one can be disabled to free up system resources.

 

As represented in Figure 2-1, there are two main types of transport nodes in NSX-T:

         Hypervisor Transport Nodes: Hypervisor transport nodes are hypervisors prepared and configured for NSX-T. NSX-T provides network services to the virtual machines running on those hypervisors. NSX-T currently supports VMware ESX and KVM hypervisors.

         Edge Nodes: VMware NSX-T Edge™ nodes are service appliances dedicated to running centralized network services that cannot be distributed to the hypervisors (Gateway firewall, NAT, DHCP, VPN, and Load Balancing). They can be instantiated as a bare metal appliance or in virtual machine form factor. They are grouped in one or several clusters, representing a pool of capacity.

 

2.3  NSX-T Consumption Model

A user can interact with the NSX-T platform through the Graphical User Interface or the REST API framework.
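As a quick orientation to the REST API framework, the following is a minimal Python sketch of an authenticated read of the declarative policy tree. The manager address and credentials are hypothetical placeholders, and certificate verification is disabled only for lab use; verify the endpoint against the API documentation for the release in use.

  import requests

  NSX_MANAGER = "https://nsx-mgr.example.com"   # hypothetical manager address
  AUTH = ("admin", "VMware1!VMware1!")          # hypothetical credentials

  # Read the top of the declarative (Policy) configuration tree.
  # verify=False is acceptable only in a lab; use the manager's CA certificate in production.
  resp = requests.get(f"{NSX_MANAGER}/policy/api/v1/infra", auth=AUTH, verify=False)
  resp.raise_for_status()
  print(resp.json().get("resource_type"))       # expected to be "Infra"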

 2.3.1  NSX-T Role Based Access Control

NSX-T offers Role Based Access Control (RBAC).  Roles can be assigned through integration with direct LDAP identity sources such as Microsoft Active Directory (AD) and OpenLDAP, using LDAP, LDAPS, and StartTLS.  Multiple domains or identity sources are supported to accommodate large enterprise configurations.  Either users or groups can be assigned to roles.  NSX provides four basic permissions: full access, execute, read, and none.  Full access gives a user all the permissions.  The execute permission includes the read permission.  There are eleven predefined roles in NSX-T:

 

         Enterprise Administrator

         Auditor

         Network Engineer

         Network Operations

         Security Engineer

         Security Operations

         Load Balancer Administrator

         Load Balancer Auditor

         VPN Administrator

         Guest Introspection Administrator

         Network Introspection Administrator

 

Figure 2-2: NSX-T RBAC with LDAP Integration

Note that when integrated with Active Directory, if the username is changed on the AD server, the NSX role will need to be reassigned to the new username.

 2.3.2  NSX-T Declarative API Framework

The NSX-T declarative API framework provides an outcome-driven configuration option.  This allows a single API call to configure multiple NSX networking and security objects for an application deployment.  This is most applicable for customers using automation and for CMP plugins. Some of the main benefits of the declarative API framework are:

         Outcome driven: Reduces the number of configuration steps by allowing a user to describe the desired end goal (the “what”) and letting the system figure out “how” to achieve it. This allows users to utilize user-specified names, not system-generated IDs.

         Order Independent: create/update/delete in any order and always arrive at the same consistent result

         Prescriptive: reduces potential for user error with built-in dependency checks

         Policy Life Cycle Management:  Simpler with single API call. Toggle marked-to-delete flag in the JSON request body to manage life cycle of entire application topology.

 

The NSX-T API documentation is accessible directly from the NSX Manager UI, under the Policy section of the API documentation, or from code.vmware.com.

 

The following examples walk through the declarative API for two customer scenarios:

2.3.3  API Usage Example 1- Templatize and deploy 3-Tier Application Topology

This example shows how the declarative API helps a user create a reusable template for deploying the 3-tier application shown in Figure 2-3, including the networking, security, and services needed for the application.

Figure 2-3: 3 Tier App

 

The desired outcome for deploying the application, as shown in the figure above, can be defined using JSON. Once the JSON request body is defined to reflect the desired outcome, the API and JSON request body can be leveraged to automate the following operational workflows (a minimal sketch of such a call appears after the list below):

         Deploy the entire topology with a single API call and JSON request body.

         The same API/JSON can be templatized and reused to deploy the same application in different environments (PROD, TEST, and DEV).

         Handle lifecycle management of the entire application topology by toggling the "marked_for_delete" flag in the JSON body to true or false.
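A minimal Python sketch of such a single-call deployment is shown below, assuming a hypothetical manager address and credentials. The JSON body is deliberately simplified to one Tier-1 gateway and one web segment; the exact child wrapper names and fields should be verified against the Policy API documentation for the release in use.

  import json
  import requests

  NSX = "https://nsx-mgr.example.com"           # hypothetical manager address
  AUTH = ("admin", "VMware1!VMware1!")          # hypothetical credentials

  # Simplified, illustrative request body: one Tier-1 gateway plus one web segment.
  body = {
      "resource_type": "Infra",
      "children": [
          {"resource_type": "ChildTier1",
           "Tier1": {"resource_type": "Tier1", "id": "3tier-t1",
                     "display_name": "3tier-t1"}},
          {"resource_type": "ChildSegment",
           "Segment": {"resource_type": "Segment", "id": "3tier-web",
                       "display_name": "3tier-web",
                       "connectivity_path": "/infra/tier-1s/3tier-t1",
                       "subnets": [{"gateway_address": "10.1.1.1/24"}]}},
      ],
  }

  # One PATCH against the declarative endpoint realizes the whole (simplified) topology.
  resp = requests.patch(f"{NSX}/policy/api/v1/infra", auth=AUTH, verify=False,
                        headers={"Content-Type": "application/json"},
                        data=json.dumps(body))
  resp.raise_for_status()

Because the body is just data, the same template can be parameterized per environment (PROD, TEST, DEV) and kept in version control.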

 

2.3.4  API Usage Example 2- Application Security Policy Lifecycle Management

This example demonstrates how a security admin can leverage the declarative API to manage the lifecycle of security configuration, grouping, and micro-segmentation policy for a given 3-tier application. The following figure depicts the entire application topology and the desired outcome to provide a zero-trust security model for the application.

Figure 2-4: JSON Declarative statement

 

The desired outcome for grouping and micro-segmentation policies is defined using JSON, and a single API call with that JSON request body automates the following operational workflows (a minimal sketch appears after the list below):

         Deploy an allow-list security policy with a single API call and JSON request body.

         The same API/JSON can be further templatized and reused to secure the same application in different environments (PROD, TEST, and DEV).

         Handle lifecycle management of the entire application topology by toggling the "marked_for_delete" flag in the JSON body to true or false.
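Below is a hedged Python sketch of the same idea for security objects: one tag-based group and one allow rule are defined as children of the default domain, and the same body with "marked_for_delete" set to true removes them. The group expression syntax, service paths, and the placement of the marked_for_delete flag are assumptions to be confirmed against the Policy API guide; the manager address and credentials are hypothetical.

  import json
  import requests

  NSX = "https://nsx-mgr.example.com"           # hypothetical manager address
  AUTH = ("admin", "VMware1!VMware1!")          # hypothetical credentials

  def three_tier_policy(marked_for_delete=False):
      # Illustrative allow-list policy: a group of VMs tagged 3tier|web and a
      # rule permitting HTTP to that group. Setting marked_for_delete=True on
      # the child wrappers tears the same objects down with the same call.
      return {
          "resource_type": "Infra",
          "children": [{
              "resource_type": "ChildDomain",
              "Domain": {
                  "resource_type": "Domain", "id": "default",
                  "children": [
                      {"resource_type": "ChildGroup",
                       "marked_for_delete": marked_for_delete,
                       "Group": {"resource_type": "Group", "id": "3tier-web",
                                 "expression": [{"resource_type": "Condition",
                                                 "member_type": "VirtualMachine",
                                                 "key": "Tag",
                                                 "operator": "EQUALS",
                                                 "value": "3tier|web"}]}},
                      {"resource_type": "ChildSecurityPolicy",
                       "marked_for_delete": marked_for_delete,
                       "SecurityPolicy": {"resource_type": "SecurityPolicy",
                                          "id": "3tier-policy",
                                          "category": "Application",
                                          "rules": [{"id": "any-to-web-http",
                                                     "action": "ALLOW",
                                                     "source_groups": ["ANY"],
                                                     "destination_groups": ["/infra/domains/default/groups/3tier-web"],
                                                     "services": ["/infra/services/HTTP"]}]}},
                  ],
              },
          }],
      }

  # Deploy; calling again with three_tier_policy(marked_for_delete=True) removes everything.
  requests.patch(f"{NSX}/policy/api/v1/infra", auth=AUTH, verify=False,
                 headers={"Content-Type": "application/json"},
                 data=json.dumps(three_tier_policy())).raise_for_status()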

 

Both examples are described in full in the NSX-T Data Center Reference Design Guide.

 

3 Virtual Firewalling

The practice of firewalling goes back to the early days of the Internet, when there was a leased line connecting the “Inside” to the “Outside”.  At first, the router which provided that connection was configured with an access list or filter which defined which traffic types were allowed in which direction.  As time went on, there was a recognition that simple router access lists did not suffice to secure these connections; a greater level of intelligence was needed, and firewalls were born.  As corporate networks grew and changed, more firewalls were added at the access points: remote access entries, partner connections, etc.  As traffic needed firewall servicing, it would be directed to these central appliances. With the proliferation of these firewalls, the concept of a firewall manager was born.  This manager would provide a central point for the administrator to configure firewalls.  This configuration would later be pushed to the firewalls themselves.  This architecture of a manager controlling a network of appliances has remained unchanged for decades.

 

NSX-T brings a new paradigm to the firewall strategy with Distributed Firewalls.  As described in the previous chapter, NSX-T provides a central management and control plane for a distributed data plane.  From a security perspective, this means centralized control and policy with ubiquitous distributed enforcement.  Whereas legacy firewalls needed to have traffic directed to them (and were thus easily bypassed), NSX-T Distributed Firewalls (DFWs) operate on every virtual NIC (VNIC) of every VM, seeing every single packet entering or exiting the VM without the need to reroute those packets.  When a vMotion takes place and a VM is moved from one host to another, the legacy firewalls which were designed for static infrastructure put a greater burden on the infrastructure to direct traffic to them.  Because the DFW is tied to the vNIC of the VM, it is impossible to bypass the DFW as it moves with the VM as part of the vMotion event.  This also means that only the firewall administrator can disable its functionality to allow traffic to bypass the DFW.  

 

There are three types of firewalls in the NSX-T architecture: the Gateway Firewall, the Distributed Firewall, and the Bridge Firewall.  The Distributed Firewall is an element of firewalling attached to the data plane source/destination (be it a pod in a container, a VM on prem or in a public cloud, or a physical server).  Gateway firewalls are designed to run at the periphery or boundaries; they are North-South firewalls.  Two examples of these peripheries or boundaries are the physical-to-virtual boundary and tenant boundaries.  The Distributed Firewall runs in every hypervisor, container, or physical server and is an East-West firewall.  A key characteristic of the DFW is that it is network agnostic and pervasive. The Bridge Firewall is used only in NSX bridges (which adjoin two L2 domains at layer 2, without routing, such as a VLAN and a Geneve overlay segment).  The Bridge Firewall is a layer 2 firewall and is beyond the scope of this document.  For more details on the Bridge Firewall, see the NSX documentation.

 

Figure 3-1 NSX-T Firewalls

 

Figure 3-1 shows the Gateway firewalls running on NSX Edge nodes. NSX Edge nodes are virtual appliances or physical servers managed by NSX. They provide networking and security services for north-south traffic, interfacing with your top-of-rack switches.  One NSX Edge node can host multiple Gateways or Gateway firewalls. These Gateway firewalls have their own firewall or security rule tables; they operate individually while being centrally managed by NSX. Gateway Firewalls make it easy to create security zones according to network boundaries and to manage firewall rules per zone or organization.  Figure 3-1 also shows the NSX Distributed Firewall.  The DFW is built into the hypervisor, as if each VM has its own firewall. The Distributed Firewall is managed as one universal firewall. It is agnostic to network topology and enables micro-segmentation without requiring network segmentation. Combined, the NSX Gateway Firewall and Distributed Firewall can secure both the north-south and east-west traffic of the data center.

 

These two firewall types can be combined in many different configurations, some of which are shown in figure 3-2 below:

 

Figure 3-2 NSX-T Firewall Deployment Options

 

Figure 3-2 shows four common deployment modes.  Option 1 uses the DFW on a mixture of physical and virtual servers.  In this mode, there is no network virtualization, and all network connectivity above L2 is provided by physical routers.  This option is a common deployment for those who are starting their NSX journey with security only.  Option 2 uses the Gateway Firewall and DFW on workloads which are leveraging NSX overlay networking.  This option is common in greenfield environments, or when there is a new extension to an existing environment due to IP address exhaustion, or simply a new application being rolled out with software-defined networking and security.  Option 3 shows VLANs routed and protected by a Gateway Firewall.  Since firewalling and networking are functions assigned to different roles (as was discussed in the RBAC section of chapter 2), this model ensures that the VLANs in question will not be able to communicate with the other ones, even if the routing is configured by a network administrator.  Finally, Option 4 shows the use of an NSX Bridge Firewall.  This is frequently used as a migration mode while infrastructure is being modernized to a software-defined network and security architecture.  It is important to note that all (or some) of these modes can coexist in one environment, depending on the needs of each area.

 

 

3.1  Gateway Firewall

As mentioned above, the Gateway Firewall provides firewalling services at boundaries or perimeters.

The Gateway Firewall is supported on both Tier 0 and Tier 1 routers (for more information about Tier 0 and Tier 1 routers, see the NSX Design Document). Note that although the Gateway Firewall is instantiated in the same software as the Tier 0 and Tier 1 routers, its functionality IS NOT equivalent to an access list in traditional routers.  Even if routing is performed elsewhere (i.e., disabled on the T1 or T0), the Gateway Firewall will still function.  The Gateway Firewall provides firewalling services alongside services that cannot be distributed, such as NAT, DHCP, VPN, and Load Balancing, and as such needs the Services Router component of the router.  This means that the Gateway Firewall is implemented in the NSX Edge transport nodes, which are dedicated DPDK appliances.  Further, the Gateway Firewall provides functionality such as Service Insertion, which will be described in Chapter 6.

3.1.1 Zone Firewalling with the Gateway Firewall

 

As the Gateway Firewall is designed to work at boundaries, it is ideal for designing zones.

 

Figure 3-3 NSX-T Gateway Firewalls implementing Zones

As figure 3-3 shows, the Gateway Firewall is applied to both the Gateway Uplinks and Services Interfaces.  In this figure, the Prod and Non-Prod zones can have policy defined for each zone independently. Although this figure does not depict it, the two zones could even have overlapping or duplicate address space, with NAT at the T0, or each T1.  In either case, security policy is implemented at the gateway level for all traffic entering or exiting the respective zones.  The T0 gateway is where policy securing the NSX environment is applied.  This policy is applied on the northbound interface. As is shown in this scenario, the Tier 0 gateway is also an inter-tenant connector.  For firewalling between tenants, the policy is applied on the northbound interface of the T1s.  At this level, for example, one may define the policy that the Prod can talk to the Non-Prod, but not vice versa.  The T1 gateway firewalls are ideal for implementing zone or tenant specific policy. The T1 would be the ideal place to define which services are available within that zone – say web services can go to the 10.1.1.0/24 segment only.  This hierarchical definition of policy provides a means to minimize policy clutter.
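To make the hierarchy concrete, the following is a hedged Python sketch of a zone policy on a hypothetical Prod Tier-1 gateway: HTTP is allowed into the 10.1.1.0/24 segment and everything else entering the zone is dropped. The gateway path, policy ID, manager address, and credentials are illustrative, and the field names should be checked against the Policy API documentation.

  import json
  import requests

  NSX = "https://nsx-mgr.example.com"           # hypothetical manager address
  AUTH = ("admin", "VMware1!VMware1!")          # hypothetical credentials

  # Illustrative zone policy for a Prod Tier-1 gateway. The "scope" field binds
  # each rule to that gateway, so the T0 policy stays free of zone specifics.
  policy = {
      "resource_type": "GatewayPolicy",
      "category": "LocalGatewayRules",
      "rules": [
          {"id": "allow-web-into-zone", "action": "ALLOW",
           "source_groups": ["ANY"],
           "destination_groups": ["10.1.1.0/24"],
           "services": ["/infra/services/HTTP"],
           "scope": ["/infra/tier-1s/prod-t1"]},
          {"id": "zone-default-deny", "action": "DROP",
           "source_groups": ["ANY"], "destination_groups": ["ANY"],
           "services": ["ANY"],
           "scope": ["/infra/tier-1s/prod-t1"]},
      ],
  }

  url = f"{NSX}/policy/api/v1/infra/domains/default/gateway-policies/prod-zone-policy"
  requests.patch(url, auth=AUTH, verify=False,
                 headers={"Content-Type": "application/json"},
                 data=json.dumps(policy)).raise_for_status()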

 

 

 

Figure 3-4 NSX-T Gateway Firewalls, External to Internal Traffic

Figure 3.4 depicts the path of a packet through the above-described zone configuration.  In this scenario, a packet originates outside and is destined for the right VM on the 10.1.1.0/24 segment.  The left half of Figure 3-4 shows the logical representation of this flow.  The right half shows the physical representation of the flow.  In the physical manifestation of this environment, the Edge Node hosts the Gateway Firewall itself (as indicated above).  Given that the Edge Node is also where the routing connecting the virtual world to the outside world happens, this places the security at the outermost boundary.

 

Figure 3-5 T0 Gateway Firewall Rule

In this case (as is seen in figure 3.5), there is a gateway policy on the T0 Gateway Firewall that allows all HTTP traffic to any VM with *web* in the name.  (Note that this is where following a naming convention pays off!)  This will allow the packet through the perimeter.  However, just because the traffic is allowed through the gateway does not mean it is allowed into the zone.

 


Figure 3-6 T1 Gateway Firewall Rule

 

In this case (as is seen in Figure 3.6), there is a policy on the T1 Gateway Firewall that allows all HTTP traffic to that application VM.  This is a layered gateway firewall security approach.  The T0 gateway firewall has a general policy (what is allowed in, which tenant can talk to which other tenant), and the T1 gateway firewall has a more specific policy regarding its own tenancy.  This distributed, hierarchical model allows for optimal efficiency, where the T0 Gateway Firewall is not cluttered with the specifics of each zone.

 

In the physical representation, both the T0 and the T1 firewalls are on the Edge transport node.  Thus, the packet does not leave the Edge host until it has passed through the T1 Gateway Firewall.  At this point, the packet is sent to the host with the destination VM, encapsulated in any overlay headers that may be required.  (The network details of this are included in the NSX Design document.)  Upon arriving at the destination host, the packet will then be examined by the Distributed Firewall for that VM, as described in the following section.

 

Next, consider inter-tenant traffic, or traffic between tenants.

 


Figure 3-7 NSX-T Gateway Firewalls, Inter-tenant Traffic

In figure 3.7, inter-tenant traffic originating in the Prod zone is going to the Non-Prod zone. Again, this is depicted both logically and physically.  The traffic originates at the VM on the 10.1.1.0/24 segment of the Prod (blue) zone and is destined for the VM on the 10.2.2.0/24 segment in the NonProd (green) zone.  Assuming the packet is allowed out by the DFW on the VM, it then goes to the Prod T1 gateway, which resides on the Edge Node.  At the Prod T1 Gateway Firewall, it hits a rule that allows web_prod to talk to the Non_Prod Dev_Test segment.  From there, the packet goes to the T0 Gateway Firewall, which allows Prod to talk to NonProd. Finally, it hits the NonProd T1 gateway firewall, which allows the traffic in with a rule that says web_servers can talk to the Dev_Test segment.  Once again, each Gateway firewall has rules relevant to its scope.

 

 3.1.2  Gateway Firewall Functions

The Gateway Firewall is where state-sensitive services such as NAT, DHCP, VPN, and LB are implemented.  One of the differentiating services available with NSX security is the full suite of security functionality available from the Advanced Load Balancer.

 

Figure 3-8 Advanced Load Balancer Security Service Suite

WAF (Web Application Firewalling) is one part of the security stack within the Advanced Load Balancer (ALB). The ALB provides load balancing services, including global load balancing. On top of that, though, there is a security stack that can be applied to applications, ranging from basic layer three/four firewalling all the way up to SSL termination. The Advanced LB also offers authentication and authorization via integration with SAML. Next there is layer seven firewalling: the ability to have firewall rules on HTTP headers, URLs, and so on.  There is also DDoS protection at layer seven for application attacks like Slow Loris, built into the platform. To complement the L7 security, there is comprehensive rate limiting. This provides the ability to rate limit both connections and requests in a fairly granular way, all the way down (if you need to) to individual clients or per URL.  Finally, on top of all of that security, there is the web application firewall, which is part of the LB Service Engine. It is not a separate component and not a separate feature or license. It is literally a policy that you assign to an application when you deploy it, and that application is then protected by the WAF.  As you change the LB URLs for that application, the WAF automatically learns those changes.

 

Figure 3-9 WAF Security Pipeline

Figure 3-9 shows the full WAF security pipeline, which has been designed with optimal security and efficiency in mind.  WAF checks include HTTP checks (enforcing the HTTP standard), encoding bypass checks (multiple encoding attempts), and even restricted files or extensions (such as forgotten .bak files, for example).  Walking through this pipeline, the first pass is an allow list of things which are known good.  The next step is Positive Security, with its learning input, which checks a high percentage of all parameters, thereby reducing the impact of the last step: signature checking. Each step is designed to cull traffic for the following, more computationally expensive step.  All traffic learned and enforced by the positive security engine reduces the traffic for the signature checks, which are the most expensive.  Since generic signature checks are the most common source of false positives, reducing the traffic on which they operate also reduces the false positive rate.  The result of this inspection waterfall is that zero-day attacks are blocked, false positives are reduced, and WAF performance is optimized.

3.2  Distributed Firewall

 

As was mentioned above, the Distributed Firewall is an East-West Firewall.  The DFW inspects every packet going in and out of a VM, as a function of a packet travelling along the virtual NIC (VNIC) of the VM.  The DFW exists in the kernel of the hypervisor, which means it delivers line rate performance.  Moreover, since it exists in the hypervisor, the DFW scales linearly with added compute.  Most importantly, the DFW rules move with the VM during vMotion events.  This means that traffic flow state is preserved, regardless of which host a VM moves to. (vMotion events will typically disrupt legacy firewalls deployed in a VM form factor.)  Another key aspect of the distributed firewall is that it provides a central policy, enforced in a distributed manner.  Chapter 4 will dive into the details of Distributed Firewall policy design.   

 

One consideration for DFW optimization is IPv6.  By default, the DFW has both IPv4 and IPv6 enabled for every firewall rule.  For resource optimization, it is recommended to only enable IPv4 in firewall rules where IPv4 is the only protocol in use. This is done by clicking on the gear icon to the right of the rule, which brings up the configuration screen shown in figure 3.10.  Note that this screen also allows enabling logging and defining rule directionality.  When IPv6 is selected, it is important to note that NSX-T IPv6 resolution is enabled by default and IPv6 learning is disabled.  (The opposite was true with NSX-V.)  The IPv6 settings are adjusted in the Networking section.

 

 

Figure 3-10 Rule Settings for IPv6
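The same per-rule setting shown in the gear-icon dialog can also be driven through the Policy API. The sketch below, with hypothetical manager address, policy, and rule IDs, restricts an existing rule to IPv4 and enables logging; the ip_protocol values are believed to be IPV4, IPV6, and IPV4_IPV6, but should be confirmed in the API documentation.

  import json
  import requests

  NSX = "https://nsx-mgr.example.com"           # hypothetical manager address
  AUTH = ("admin", "VMware1!VMware1!")          # hypothetical credentials

  # Restrict a (hypothetical) rule to IPv4 only and turn on logging, mirroring
  # the options exposed by the gear-icon dialog in the UI.
  rule = {
      "action": "ALLOW",
      "source_groups": ["ANY"],
      "destination_groups": ["/infra/domains/default/groups/3tier-web"],
      "services": ["/infra/services/HTTP"],
      "ip_protocol": "IPV4",      # default is IPV4_IPV6
      "logged": True,
  }

  url = (f"{NSX}/policy/api/v1/infra/domains/default/"
         "security-policies/3tier-policy/rules/any-to-web-http")
  requests.patch(url, auth=AUTH, verify=False,
                 headers={"Content-Type": "application/json"},
                 data=json.dumps(rule)).raise_for_status()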

 

In figure 3.11, the DFW is being used to create zones. The red VMs and containers are the DMZ zone; the green VMs and containers are the Prod zone; and the blue VMs are the NonProd zone.  Note that all of these VMs can comingle on the same segments in the same host and be secured with DFW policy, without the need to change the underlying network infrastructure.  The gray services zone happens to be all on the same segment (because luck occasionally shines).  This design allows the creation of what is called a “DMZ Anywhere” design.  A DMZ no longer means stranded compute capacity, nor does it require the backhaul of DMZ traffic across the DC for security treatment.

 

Figure 3-11 NSX-T Distributed Firewall

 

3.2.1  Zone Firewalling with the Distributed Firewall

 

 

Revisit the interzone communication above, where the VM in the blue zone communicates with the VM in the green zone, as shown in figure 3.11.  This time, examine that flow without any Gateway firewall rules in place.  In this example, the T0 and T1 routers exist only for the purposes of routing.

 

Figure 3-12 NSX-T Distributed Firewall

As explained earlier, the packet leaving the VM must traverse the DFW on the VM.  This means that the DFW must allow that protocol out.  In this case, there is a rule allowing that. Because of the magic of distributed routing (described in detail in the NSX Design document), the packet never leaves the host but appears at the destination VM, which coincidentally lives on the same host.  The packet now arrives at the destination VM’s vNIC, but it must go through the destination VM’s DFW.  Here, again, there is a rule that allows the packet in.

 

Figure 3-13 NSX-T Distributed Firewall

 

Figure 3-13 shows a sample policy that defines a blue zone and then adds a rule for exceptions allowed out of the zone.

3.3  Data Plane Implementation

 

In order to understand how to optimize the policy design, one needs to understand the NSX data plane implementation.  In this section, the data plane implementation will be examined in physical servers, ESXi and KVM hypervisors, and containers.  The NSX Container Plug-In will be introduced here only to cover the processing of firewall policy.  The NCP will be discussed in detail in its own chapter.

 

 3.3.1  Physical Servers

 

NSX can provide security for physical servers as well as virtual servers by installing an NSX agent on the server.  These servers can connect to the NSX environment either on overlay or VLAN-backed networks.  The servers can be integrated into NSX-T using Ansible scripts; after NSX-T 3.1 this is no longer the recommended method, but it is still supported.  To support NSX, the server must support third-party packages and be running a supported OS per the Bare Metal Server System Requirements.  The following terms are relevant to physical server security:

 

Application: This represents the actual application running on the server (e.g., a web server or database server).

Application Interface: This represents the network interface card (NIC) which the application uses to send and receive traffic. One application interface per server is supported.

Management Interface: This represents the NIC which manages the server.

VIF: This is the peer of the application interface which is attached to the logical switch (This is similar to the VM vNIC).

 

To add physical servers to the NSX data plane, perform the following steps:

 

  1. Install Third Party Packages on the Server
  2. Create an Application Interface for the workload.
  3. Set up Ansible and download and extract the integration from GitHub
  4. Establish connectivity to the NSX Manager
  5. Secure Workloads with DFW

 

Once configured, the physical servers will be secured with DFW rules pushed from the NSX Manager.

 3.3.2  ESXi Data Plane

 

NSX-T provides network virtualization and security services in a heterogeneous hypervisor environment, with ESXi and KVM hosts as part of the same NSX-T cluster. The NSX-T DFW management and control plane components are identical on both ESXi and KVM hosts. Functionally, the NSX-T distributed firewall (DFW) is identical on both flavors of hypervisor. However, the DFW architecture and implementation have some differences between the ESXi and KVM environments. The data plane implementation differs because each uses a different type of virtual switch for packet handling. NSX-T uses the VDS 7 or N-VDS (the “NSX-T vSwitch”) on ESXi hosts, along with the VSIP kernel module for firewalling. On KVM, on the other hand, NSX uses Open vSwitch (OVS) and its utilities. The following section highlights the implementation details and differences between the ESXi and KVM environments from a data plane perspective.

 

 

NSX uses the N-VDS on NSX Edge transport nodes and on older (ESXi 6.5 and 6.7) host transport nodes. The N-VDS is a variant of the vCenter VDS, used prior to vSphere 7.0, in which the NSX-T Manager fully manages the NSX-T vSwitch. The NSX-T DFW kernel-space implementation for ESXi is the same as the implementation for NSX for vSphere (NSX-V); it uses the VSIP kernel module and kernel IO chain filters.  NSX-T does not require vCenter to be present.  For installations in vSphere 7.0 environments and onward, NSX can use the VDS 7.0 for host transport nodes.  With the VDS 7, you can:

         Manage NSX transport nodes using a VDS switch

         Realize a segment created in NSX as an NSX Virtual Distributed Port Group in vCenter

         Migrate VMs between vSphere Distributed Virtual Port groups and NSX Distributed Virtual port groups.

         Send traffic to and from VMs running on both types of port groups

 

Which virtual switch is in use can have significant implications for vMotion events and other feature support.  Of note, SR-IOV was not supported on the N-VDS but is supported on the VDS 7.0.  For full details of impacted features, see the NSX Documentation.

 

Figure 3-14 ESXi Data Plane

 

Regardless of which virtual switch is used in ESXi hosts, the DFW uses the VSIP kernel module and kernel IO chain filters.  The LCP intelligently programs the FW rules table for every vNIC based on the “Applied To” field in the policy.

 

 3.3.3  KVM Data Plane

 

As mentioned earlier, the NSX-T distributed firewall (DFW) is functionally identical on both hypervisors. This section will examine the details of the KVM data plane.

 

On KVM, the NSX Agent is the primary LCP component. It receives the DFW configuration from the central control plane. The NSX agent includes a DFW wiring module, which generates OpenFlow flows based on the firewall rules pushed from the CCP. The actual DFW is implemented through the OVS.KO fast-path module. Stateless filtering is implemented through the OVS daemon, which is part of the Open vSwitch distribution; it implements the wiring it received from the LCP in the form of OpenFlow flows. The Linux conntrack utility is used to keep track of the state of connections allowed by stateful firewall rules.  Any new packet is first looked up in conntrack to see if there is an existing connection.  Statistics are exported through the Management Plane Agent directly to the NSX Manager. Figure 3.15 details the KVM data plane.

 

Figure 3-15 NSX-T KVM Data Plane

 

 3.3.4  Distributed Firewall For Containers

The DFW is implemented in containers using the NSX Container Plug-in.  The NCP is detailed in its own chapter.  The discussion in this section will be limited to the DFW implementation in container environments, as depicted in figure 3.16 below.

Figure 3-16 DFW for Containers

In containers, every Pod/Container has rules applied to its interface.  This allows security policy to be implemented from container to container AND from container to/from physical or virtual servers. This allows for a uniform security policy application, regardless of the implementation details of the environment.  For example, if there is a corporate policy that prohibits FTP and SSH to servers which source SQL, that policy can be implemented uniformly across physical servers, virtual servers and even any pods inside containers.

 

As is shown in figure 3.16, containers are hosted on a Node VM for K8s/VMware Tanzu. Each container connects to the OVS in the Node VM. The OVS has a VLAN trunk to the N-VDS or VDS in the hypervisor on which it is hosted and connects each of the containers to the virtual switch on a CIF (Container Interface).  The OVS within the node does not switch traffic locally, but always sends it to the virtual switch in the hypervisor. Traffic between the CIF and the OVS is carried over a locally significant, unique VLAN tag per container.  This allows each CIF to have a DFW providing segmentation for each of the container pods.

 

3.4  NSX Firewalling For Public Clouds

NSX Cloud integrates NSX core components (the NSX Management cluster) with your public cloud to enable consistent networking and security across your entire infrastructure. Currently, NSX Cloud supports only AWS and Azure.  NSX Cloud brings the agility needed for dev and test environments AND the structural integrity needed for production.  When combined with VMware Secure State for auditing purposes, VMware security makes the public cloud enterprise-ready.

 

In the public cloud implementation, NSX adds a few extra components:

         Cloud Service Manager: The CSM integrates with the NSX Manager to provide cloud-specific information to the management plane.  Think of it as an interpreter which is bilingual in both NSX and public cloud.

         NSX Public Cloud Gateway: The PCG provides connectivity to the NSX management and control planes, the NSX Edge gateway services, and for API-based communications with the public cloud entities.  The PCG is not in the datapath.

         NSX Agent: This provides the NSX-managed datapath for workload VMs.

 

There are two modes for enforcing NSX policy in public clouds: Cloud Enforce mode, which leverages native means such as AWS Security Groups and Azure Network Security Groups, and NSX Enforce mode, which leverages NSX Tools for enforcement.  Dynamic policy enforcement is based on instance attributes.  This policy is fully configurable per VPC, with exclusion lists.

 

 3.4.1  Cloud Enforce Mode 

In Cloud Enforce mode, NSX manages the security policy and provides security using Azure/AWS (network) security groups. In this implementation, there are no NSX tools inside the cloud instance, although the PCG is still required.  Management is at the VNET/VPC level. This provides a common policy framework by translating NSX policies to native cloud security policies. In this mode, default quarantine policies do not apply.

 

The installation steps for Cloud enforce mode are:

 

  1. Install the Cloud Service Manager (CSM) on prem and register the CSM with the NSX Manager and the cloud provider (Azure/AWS) using the right credentials
  2. Install the NSX Cloud Gateway in your cloud account
  3. Push the micro-segmentation security policy to the NSX Cloud Gateway, which in turn pushes policy to the VPC/VNET

 

Unfortunately, NSX cannot overcome the limitations set by current cloud providers such as the number of security groups or the number of rules, nor the scope of NSG in Azure (regional) or SG in AWS (VPC).

Figure 3-17 DFW on Public Clouds, Native Enforce Mode

 

 3.4.2  NSX Enforce Mode 

NSX Enforce Mode leverages NSX Tools inside the VMs to enforce a consistent security policy framework.  Enforce Mode allows for control at the individual VM level and a default quarantine. The currently supported operating systems for NSX Tools are listed in the NSX Documentation.  The installation steps are:

 

  1. Install the Cloud Service Manager (CSM) on prem and register it with the NSX Manager and the cloud provider (Azure/AWS) using the right credentials
  2. Install the NSX Cloud Gateway in your cloud account
  3. Install NSX Tools on the cloud VM instances (Note: On Azure VNets, NSX Tools can be installed automatically if Auto-Install NSX Tools is enabled.)
  4. Push the micro-segmentation security policy to the NSX Cloud Gateway, which in turn pushes policy to the NSX-managed instances

 

Figure 3-18 DFW on Public Clouds, NSX Enforce Mode

 

  

4 NSX Firewall Policy Building

While one CAN build NSX policy in the same manner that legacy firewall policy has been built for years, the history of VMware support cases shows that not to be the best idea as one gets to large-scale environments.  One of the most common problems seen by support is temporary measures which last far beyond their intended period, only to cause massive problems down the road.  Moving to an NSX firewall model is an opportunity to start fresh, with all the lessons of the past, to build a better policy.  Porting legacy firewall policies to NSX is advised against.  Can it be done? Sure. It can.  And the policy will work.  SHOULD it be done? Not if a solid, long-term solution is the goal. VMware professional services have worked with many customers to migrate their policy, but the key to the success of those engagements has been the translations and optimizations that took place to make the resulting policy optimized for NSX. Importing a legacy firewall config into NSX without translation is like putting a gas engine into a Tesla.  It can be done, and it will work for transportation, but the differentiating value is lost.

 

The NSX-T transport nodes make up a distributed data plane with DFW enforcement done at the hypervisor’s kernel level. Each of the transport nodes, at any given time, connects to only one of the Central Control Plane (CCP) controllers based on mastership for that node. Once the Local Control Plane (LCP) has the policy configuration from the CCP, the LCP pushes the firewall policy and rules to the data plane filters (in kernel) for each of the virtual NICs. The LCP programs rules only on virtual NICs based on the contents of the Applied To field, instead of every rule everywhere (optimizing the use of hypervisor resources).  In policies ported from legacy firewalls, the Applied To field is a concept that does not exist.  Thus, the ported policy is substandard right off the bat.

 

Although some of the same concepts in building legacy firewall policy apply, there are new constructs available in building NSX firewall policy which can make the resulting implementation run more efficiently.  This chapter examines the new constructs of building virtual firewall policy.

 

 

4.1  Rule Lookup

NSX firewalls implement a top-down rule search order.  When a packet matches a rule, it exits the search and is processed as indicated in the matched rule.

 

By default, the DFW implements the rule table and flow table model that most firewalls use.  However, this behavior can be overridden as described later.

 

In figure 4.1, the processing of a packet takes place as follows:

Consider an IP packet, identified as pkt1, that matches rule number 2. The order of operations is the following:

  1. A lookup is performed in the connection tracker table to determine if an entry for the flow already exists.
  2. As flow 3 is not present in the connection tracker table, a lookup is performed in the rule table to identify which rule is applicable to flow 3. The first rule that matches the flow will be enforced.
  3. Rule 2 matches for flow 3. The action is set to ‘Allow’.
  4. Because the action is set to ‘Allow’ for flow 3, a new entry will be created inside the connection tracker table. The packet is then transmitted out of DFW.

 

Subsequent packets of the flow are processed in this order (a conceptual sketch of the full lookup follows Figure 4-1):

  1. A lookup is performed in the connection tracker table to check if an entry for the flow already exists.
  2. An entry for flow 3 exists in the connection tracker table, so the packet is transmitted out of the DFW without a rule table lookup.

 

 


 Figure 4--1 NSX-T Rule Processing
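The following is a conceptual sketch in Python (illustrative only, not NSX code) of the lookup order just described: the connection tracker is checked first, and only on a miss is the rule table searched top down, with allowed flows cached so that subsequent packets skip the rule search.

    # Conceptual sketch of the DFW lookup order: flow table first, then a top-down,
    # first-match rule table search; allowed flows are cached in the flow table.
    from collections import namedtuple

    Rule = namedtuple("Rule", "rule_id match action")   # match: predicate over a flow key

    def process_packet(flow_key, flow_table, rule_table):
        """Return the action applied to a packet belonging to flow_key."""
        if flow_key in flow_table:                # 1. connection tracker lookup
            return flow_table[flow_key]
        for rule in rule_table:                   # 2. rule table, top down, first match wins
            if rule.match(flow_key):
                if rule.action == "ALLOW":        # 3./4. allowed flows are cached
                    flow_table[flow_key] = rule.action
                return rule.action
        return "DROP"                             # illustrative default if nothing matches

    rules = [Rule(1, lambda f: f == "flow1", "DROP"),
             Rule(2, lambda f: f == "flow3", "ALLOW")]
    tracker = {}
    print(process_packet("flow3", tracker, rules))   # first packet: rule 2 -> ALLOW, cached
    print(process_packet("flow3", tracker, rules))   # next packets: flow-table hit -> ALLOW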

 

4.2  Applied To Field

THE most important best practice, the one that addresses the majority of calls into VMware support due to policy suboptimization, is the use of the Applied To field.  To reiterate: USE THE APPLIED TO FIELD!  So, what is this magical Applied To field and how can it help?  Applied To is the field that indicates which vnics will receive the rule in question; it limits the scope of a given rule. By default (DFW in the Applied To field), every rule is applied to every vnic. This means that if there are 1,000 VMs in the environment and a rule allows VM A to talk to VM B, all 1,000 VMs will receive that rule instead of just A and B, or 998 VMs too many.

 


 Figure 4--2 The Applied To Field

Figure 4-2 shows a three-tier application, "3-tier", which has web, app, and DB tiers.  The first rule is applied only to the web servers.  The second rule is applied to both the web and app servers. The third rule is applied to both the app and DB servers.  The following rules of thumb clarify what to put in the Applied To field:

         If the source is any (external to the NSX environment), apply the rule to the destinations only. (This is what we saw above.)

         If the source is any, and can include sources within the NSX environment, apply the rule to everything (DFW).

         If the source and destinations are clearly defined in the rule, apply the rule to BOTH the source and the destination.

 

 That simple step will set your NSX policy off on the right foot. 
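For teams that automate policy, the hedged sketch below expresses the same idea through the NSX-T Policy REST API, where the Applied To field surfaces as the rule's "scope" attribute. The manager address, credentials, group paths, and policy/rule IDs are placeholders, and the exact payload should be validated against the API guide for your NSX version.

    # Hedged sketch: a DFW rule whose Applied To ("scope") is limited to the groups
    # in the rule instead of the DFW default. All names below are placeholders.
    import requests

    NSX = "https://nsx-mgr.example.com"            # placeholder manager address
    AUTH = ("admin", "password")                   # lab credentials only

    rule = {
        "action": "ALLOW",
        "source_groups": ["/infra/domains/default/groups/3tier-web"],
        "destination_groups": ["/infra/domains/default/groups/3tier-app"],
        "services": ["/infra/services/HTTPS"],
        # Applied To: program this rule only on the vnics of the web and app members,
        # not on every vnic in the environment.
        "scope": ["/infra/domains/default/groups/3tier-web",
                  "/infra/domains/default/groups/3tier-app"],
    }

    resp = requests.patch(
        f"{NSX}/policy/api/v1/infra/domains/default/security-policies/3tier-policy/rules/web-to-app",
        json=rule, auth=AUTH, verify=False)
    resp.raise_for_status()

Left at its default (DFW), the same rule would be programmed on every vnic in the environment.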

 

It is important to note that when there is a multitenant environment (especially with overlapping IP addresses), the use of the Applied To field is critical.  In this case, typically assets are tagged with their tenancy.

 

Figure 4--3 Applied To Field in Action

Figure 4-3 shows the reward of using the Applied To field.  The policy allows the green VMs to talk to each other and the blue VMs to talk to each other.  The Applied To field is used in both rules.  This full policy is built in the NSX management appliance and sent to the LCP on the hosts.  The VSIP module will instantiate only the green policies on the green vnics and the blue policies on the blue vnics, based on the contents of the Applied To field.  Note that the default behavior of the Applied To field, DFW, means that the rule will be implemented on every vnic.  As policies grow to thousands of entries, the Applied To field becomes critical for scale.  However, retrofitting the Applied To field is extremely challenging, so its use is critical from the outset.

 

One last note about using the Applied To field with groups that contain IP addresses. NSX allows for overlapping IP addressing in different tenants (say, tenant A uses IP address 10.1.1.1 and tenant B uses that same IP address 10.1.1.1, but referring to a different endpoint).  This means that when the IP address is entered into the Applied To field, it is impossible for the NSX LCP to know which instance is referenced.  So, to use the Applied To field in this case, it is necessary to create a group with the relevant segment(s) for use in the Applied To field.

Granted, that may be larger than the anticipated scope if there are only one or two relevant IP addresses in the segment in question, but it is still a smaller scope than the entire environment.

 

4.3  Grouping

 

Groups are a very useful tool for defining the source or destination in a rule.  While the grouping concept is trivial (one term used to describe many objects), the use of groups can be made optimal if best practices are known at the outset.

 

The following concepts apply in deciding the proper grouping construct:

         IP Block / CIDR / infrastructure constructs per environment are typically static.

  • Most organizations have different CIDR blocks for their prod and non-prod environments, for example. When that is the case, it is optimal to use the CIDR block as a grouping construct.
  • When adding IP addresses to a group, you can import a txt file. (This allows for non-contiguous IP address ranges.)

         Define broader groups like Environment/Zone more statically, using IP subnets or segments.

         Application/application-tier level grouping should use dynamic grouping with VM tags, VM names, or a combination of the two.

         Nested groups should be limited to three levels of nesting for manageability and resource optimization.

         When using dynamic grouping with multiple AND/OR criteria, limit the complexity of the criteria for the same reasons, as well as to limit the number of unexpected members.

         When possible, use tag/name "Equals-to" to limit the number of unexpected members.

 


 Figure 4-4 NSX-T Groups

With security, there is a balance between agility (dynamic membership) and control.  Many new installations like to use regular expressions to create groups. Although this is supported, it is highly recommended (from a security perspective) that regex be used only to create initial groupings which can be reviewed for accuracy, with static groups then created for at least the sensitive groupings.  When automated security is the goal, tags are a much better approach than groups with complex membership criteria.
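As an illustration of tag-driven membership, the hedged sketch below creates a dynamic group through the Policy API whose only criterion is a tag "Equals-to" match. The manager address, credentials, group ID, and the env|prod tag are placeholders.

    # Hedged sketch: a dynamic group of VMs carrying tag 'prod' under scope 'env'.
    import requests

    NSX = "https://nsx-mgr.example.com"
    AUTH = ("admin", "password")

    group = {
        "display_name": "Prod-VMs",
        "expression": [{
            "resource_type": "Condition",
            "member_type": "VirtualMachine",
            "key": "Tag",
            "operator": "EQUALS",          # "Equals-to", as recommended above
            "value": "env|prod",           # scope|tag notation
        }],
    }

    requests.put(f"{NSX}/policy/api/v1/infra/domains/default/groups/prod-vms",
                 json=group, auth=AUTH, verify=False).raise_for_status()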

 

Tags

Tags are what all the cool kids are doing in security.  Why? Well, because tags accelerate automation, apply policy when workloads are provisioned, allow for policy definition apart from the application, AND they prevent rule sprawl (when used properly). What more could you ask from a nifty software construct?

 

(It is important to call out here that this document refers to the NSX-T security architecture. The tagging approach described below is an example of the architectural differences between NSX-T and NSX-V; should this approach be applied to an NSX-V implementation, serious performance penalties may be experienced.)

 


 

 Figure 4-5 NSX-T Tags

Tags are a security wonder because security is automated! This means that if one service finds something, then another service can do something about it.  Tagging also provides the security posture of a workload or VM; this can be either an intended posture or a runtime posture.  You can create custom tags to tag VMs, and third-party services can apply tags on specific events, which helps to create automated workflows.  For example, an antivirus service can tag a VM when it is found to be infected.  Having predefined rules based on this tag then allows for automated remediation.

 

Security Tags are applied to Virtual Machines, Logical Ports, and Logical Segments and can be used for dynamic Security Group membership.  They can apply differentiated policy based on OS, Environment, or a myriad of other attributes. Tags are used to automate policy definition for new applications being provisioned.
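The hedged sketch below applies such tags to a VM through the manager API's update_tags action, so the VM immediately joins any tag-based groups. The VM external ID, tag scopes, and values are placeholders; confirm the endpoint against the API guide for your NSX version.

    # Hedged sketch: tagging a VM with Environment/Application/Tier tags.
    import requests

    NSX = "https://nsx-mgr.example.com"
    AUTH = ("admin", "password")

    payload = {
        "external_id": "5029a7e1-0000-0000-0000-000000000000",   # placeholder VM instance UUID
        "tags": [
            {"scope": "Environment", "tag": "Prod"},
            {"scope": "Application", "tag": "App-1"},
            {"scope": "Tier",        "tag": "Web"},
        ],
    }

    requests.post(f"{NSX}/api/v1/fabric/virtual-machines?action=update_tags",
                  json=payload, auth=AUTH, verify=False).raise_for_status()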

 

In vRealize Automation, upon a blueprint deployment, all VMs that are part of an application are placed into a new Security Group.  Also, every VM is tagged with multiple tags identifying Function, Zone, OS, Environment, and Tenant.

 

Tanzu also uses tags to define policy.

 

 


 Figure 4--6 Compound or combined tag

 

One of the challenges that may arise with tagging is following the logic in troubleshooting.  For this reason, some advocate the use of compound or combined tags.  Figure 4-6 shows an example of this, where three tags - Prod, App-1, and Web - are combined to make one tag, Prod_App-1_Web.  The tradeoff with this model is that there will be more rules, as one must write a rule for each permutation instead of letting the manager calculate things.  It must be noted that concatenated tags are not exclusive of traditional tagging; typically, a successful implementation will use a combination of tagging strategies.

 

In large scale implementations, it is advisable to use a combination of tagging methods to achieve security.  For example, take a company that has 3,000 three-tiered applications in three environments (Dev, Test, and Prod).  Most commonly, this would be spread across at least two installations so that code upgrades can happen asynchronously; but say they are all in one installation, so that all groups are combined.  If this customer takes a compound tagging approach only (defining a tag for each combination), they will have 3,000 (applications) x 3 (environments) x 3 (tiers) = 27,000 tags, requiring 27,000 or more rules.  However, if they use separate tags (one per application, one per environment, and one per tier), they would end up with 3,006 tags and roughly 3,000 rules.  Thus, the grouping approach has a significant influence on the number of rules, which in turn affects processing efficiency on the vnic.  An example of the use of environment categories is found in the Distributed Firewall Policy Categories section.
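A quick back-of-the-envelope check of those numbers:

    # Sketch: tag counts for 3,000 three-tier applications across three environments.
    apps, envs, tiers = 3000, 3, 3

    compound_tags = apps * envs * tiers   # one concatenated tag per app/env/tier combination
    separate_tags = apps + envs + tiers   # one tag per application, environment, and tier

    print(compound_tags)   # 27000
    print(separate_tags)   # 3006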

4.4  NSX-T Policy Structure

The NSX-T Manager UI has two different areas, one for the Distributed Firewall and one for the Gateway Firewall, as shown in figure 4-7. The top portion, shown below, is for the Distributed Firewall, in the East West section.  The Gateway Firewall section is just below that, in the North South section. This layout reflects the finding that most customers spend the majority of their time in the East West section, as opposed to the North South section.

 

Figure 4-7 NSX-T Policy UI

Within each of these sections, there are categories which provide a means for organizing your security policy.  The categories of the Gateway and Distributed Firewalls will be examined below.

 

 4.4.1  Gateway Firewall Policy Categories

NSX Firewall simplifies policy definition by providing pre-defined categories that match common security policy best practices, which helps in organizing rules better.  Rules are evaluated top down within a category and left to right across categories.  Category names can be changed using the API.

 

First look at the NSX gateway firewall and its predefined categories.

 

Figure 4--8 NSX-T Gateway Firewalls, Policy Structure

 

Emergency – This is used for Quarantine.  It can also be used for Allow rules.

 

System – These rules are automatically generated by NSX and are specific to the internal control plane traffic (such as BFD rules, VPN rules, etc.).  DO NOT EDIT SYSTEM RULES.

 

Shared Pre Rules – These rules are applied globally across all of the gateways.

 

Local Gateway – These rules are specific to a particular gateway.

 

Auto Service Rules – These are auto-plumbed rules applied to the data plane.  These rules can be edited as required.

 

Default – These rules define the default gateway firewall behavior.

 

Most Gateway Firewall configuration will be done in the Shared Pre Rules and Local Gateway categories.  A good rule of thumb for the two categories would be that corporate policy lives in the Shared Pre Rules while tenant/application policy lives in the Local Gateway rules.
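As a hedged illustration of where such rules land, the sketch below creates a tenant-specific gateway policy via the Policy API. The manager address, gateway path, group, and the category string ("LocalGatewayRules") are assumptions to validate against the API guide for your NSX version.

    # Hedged sketch: a Local Gateway category policy scoped to one Tier-1 gateway.
    import requests

    NSX = "https://nsx-mgr.example.com"
    AUTH = ("admin", "password")

    policy = {
        "category": "LocalGatewayRules",          # assumed enum for the Local Gateway category
        "rules": [{
            "display_name": "tenant1-inbound-https",
            "action": "ALLOW",
            "source_groups": ["ANY"],
            "destination_groups": ["/infra/domains/default/groups/tenant1-web"],
            "services": ["/infra/services/HTTPS"],
            "scope": ["/infra/tier-1s/tenant1-t1"],   # enforce on this gateway only
        }],
    }

    requests.put(f"{NSX}/policy/api/v1/infra/domains/default/gateway-policies/tenant1-local",
                 json=policy, auth=AUTH, verify=False).raise_for_status()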

 

 

 Figure 4-9 NSX-T Gateway Firewalls UI 

 

Figure 4.9 shows the UI for the Gateway Firewall.

 

 

 4.4.2  Distributed Firewall Policy Categories

 

As with the Gateway Firewall rules, the rules in the Distributed Firewall are processed top down and left to right. Again, the category names can be changed via the API.  As figure 4-10 shows, the categories are quite different from those of the Gateway Firewall.  They are examined in detail below.

 


Figure 4-10 NSX-T Distributed Firewall Policy Structure

Ethernet – These are layer 2 rules based on MAC addresses

 

Emergency – This is the ideal place to put quarantine and allow rules for troubleshooting.

 

Infrastructure – These rules define access to shared services.  Examples of rules in this category would be to allow AD, DNS, NTP, DHCP, Backup, Management access.

 

Environment – These are rules between zones.  For example, allowing Prod to talk to Non Prod, or inter business unit rules.  This is also a means to define zones.

 

Application – These are rules between applications, application tiers, or defining micro services.

 

Ideally, the top categories are less dynamic than the bottom categories.

 

In using the DFW for zoning, the Environment category can be used to create ring-fencing policies.  These are policies that create a ring around an environment.   For example, the following policy creates rings around the Prod, Dev, and Test environments such that nothing is allowed out of those environments:

 

 

 Figure 4-11 NSX-T Zone Policy

 

 

To create the rules, the group negation has been leveraged as shown below:

 

 Figure 4-12 NSX-T Group

 

The only traffic to leave the Environment section will be Prod traffic traveling within Prod, Test within Test, or Dev within Dev.  Thus, the zones have been established.  As indicated above, the Infrastructure section has already caught DNS, LDAP, and other common traffic that would cross the zone boundary.  If there are zone exceptions, it is common to see a zone exception section before the zone policy, as shown below.

 

 

 Figure 4-13 NSX-T Gateway Zone Policy Exception
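Pulling these pieces together, a hedged sketch of one ring-fence rule pushed through the Policy API might look like the following. The group path, policy ID, and manager address are placeholders, and the negation is expressed with the rule's destinations_excluded flag (validate the field names against your NSX version).

    # Hedged sketch: Environment-category ring fence for Prod, using group negation.
    import requests

    NSX = "https://nsx-mgr.example.com"
    AUTH = ("admin", "password")
    PROD = "/infra/domains/default/groups/prod"

    policy = {
        "category": "Environment",
        "rules": [{
            "display_name": "prod-ring-fence",
            "action": "DROP",
            "source_groups": [PROD],
            "destination_groups": [PROD],
            "destinations_excluded": True,     # destination = anything that is NOT Prod
            "services": ["ANY"],
            "scope": [PROD],                   # Applied To: only Prod vnics
        }],
    }

    requests.put(f"{NSX}/policy/api/v1/infra/domains/default/security-policies/env-prod-ring",
                 json=policy, auth=AUTH, verify=False).raise_for_status()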

 

The DFW allows for firewall drafts.  Firewall drafts are complete firewall configurations, with policy sections and rules, which can be published immediately or saved for publishing at a later time.  Auto drafts (enabled by default) mean that any configuration change results in a system-generated draft; a maximum of 100 auto drafts can be saved.  These auto drafts are useful for reverting to a previously known good configuration. Manual firewall drafts (of which there can be up to 10) can be useful for keeping, for example, predefined policies at different security levels ready for easy implementation.  It is worth noting that when updates are made to the active policy (for example, a new application is added), that change is not updated on previously saved drafts.

 

The Distributed Firewall provides an exclusion list, which allows entities to be excluded from DFW enforcement. From a security practitioner's perspective, this is a useful tool to be used very rarely, if at all.  Placing a logical port, logical switch, or group in the exclusion list means that the DFW will not be applied to those entities at all.  Even if a VM is referred to in the rules or the Applied To field, it will not receive any policy if it is in the exclusion list.  Upon installation, NSX places the NSX Manager and NSX Edge node VMs into this list.  This prevents novice users from locking themselves out of those entities.  For a secure installation, it is recommended that a policy allowing the communication ports defined at ports.vmware.com be added and that those entities then be removed from the exclusion list.  Figure 4-14 shows how to access the exclusion list for the DFW:

 


Figure 4--14 NSX-T Distributed Firewall Exclusion List

 

The exclusion list is handy in troubleshooting: removing an entity from DFW enforcement helps determine whether DFW policy is causing connectivity issues.  Other than as a troubleshooting tool, its use is not recommended in secure environments.

 

4.5  Security Profiles

One of the very useful tools within NSX for defining security policies is Profiles. Security Profiles are used to tune Firewall Operations features such as Session Timers, Flood Protection, and DNS Security.  Each of those will be examined in this section.

 4.5.1  Session Timers

 

Session Timers define how long a session is kept open after inactivity.  When this timer expires, the session closes.  On the Tier-0 and Tier-1 gateway firewalls, several timers for TCP, UDP, and ICMP "sessions" can be specified to apply to a user-defined group.  In other words, default session values can be defined depending on your network or server needs. While setting the value too low can cause frequent timeouts, setting it too high will consume resources needlessly.  Ideally, these timers are set in coordination with the timers on the servers to which traffic is destined. The figure below provides the default values for the Session Timers:

 

Figure 4-15 Default Session Timers

 4.5.2  Flood Protection

 

Flood Protection helps protect against Distributed Denial of Service (DDoS) attacks.   DDoS attacks aim to make a server unavailable to legitimate traffic by consuming all the available server resources through flooding the server with requests. Creating a flood protection profile imposes active session limits for ICMP, UDP, and half-open TCP flows. The distributed firewall can cache flow entries which are in SYN_SENT and SYN_RECEIVED state and promote each entry to an established TCP state after an ACK is received from the initiator, completing the three-way handshake.  Note that due to its distributed nature, the DFW is far better able to protect against DDoS attacks than a legacy centralized firewall which may need to protect many servers at once.

 

The following table provides details around the Flood Protection parameters, their limits, and their suggested use:

 

Figure 4-16 Flood Protection Parameters

 4.5.3  DNS Security

DNS Security guards against DNS-related attacks.  DNS security controls include the ability to snoop on DNS responses for a VM or group of VMs to associate FQDNs with IP addresses, and the ability to add global and default DNS server information for select VMs.  Only one DNS server profile can be applied to any given VM.  Tags are supported so that profiles can be applied to a given group.

 

 Figure 4-17 DNS Security UI

 

 

   

  

5 Container Security

The programmable nature of NSX makes it the ideal networking and security infrastructure for containers.  With NSX, the developer can deploy apps with security built in from the get-go. While security is traditionally seen by developers as an impediment, the visibility which security requires can be leveraged by developers to ease their troubleshooting.  Moreover, NCP security can be quite extensive, providing firewalling, load balancing (including WAF), and IDS. This section dives deeply into the NSX Container Plug-in, a software component provided by VMware in the form of a container image meant to be run as a Kubernetes pod.

NSX Container Plug-in (NCP) provides integration between NSX-T Data Center and container orchestrators such as Kubernetes, as well as integration between NSX-T Data Center and container-based PaaS (platform as a service) products such as OpenShift and Pivotal Cloud Foundry or CaaS (Container as a Service) platforms such as EKS (Amazon Elastic Kubernetes Service), AKS (Azure Kubernetes Service), and GKE (Google Kubernetes Engine). The NCP has a modular design, allowing for additional platform support in the future. 

 

The main component of NCP runs in a container and communicates with NSX Manager and with the Kubernetes control plane. The NCP monitors changes to containers and other resources and manages networking resources such as logical ports, switches, routers, and security groups for the containers by calling the NSX API.

 

As noted above, the NCP is delivered as a container image and runs as a Kubernetes (K8s) or OpenShift (OCP) pod.

There are four key design goals of the NSX OCP/K8S integration:

 

  1. Don’t stand in the way of the developer!!
  2. Provide solutions to map the Kubernetes constructs to enterprise networking constructs
  3. Secure Containers, VMs and any other endpoints with overarching Firewall Policies and IDS
  4. Provide visibility & troubleshooting tools to ease the container adoption in the enterprise

 

The NSX CNI plug-in runs on each Kubernetes node. It monitors container life cycle events, connects a container interface to the guest vSwitch, and programs the guest vSwitch to tag and forward container traffic between the container interfaces and the VNIC.

 

The NCP automatically creates an NSX-T Data Center logical topology for a Kubernetes cluster and creates a separate logical network for each Kubernetes namespace.  It also connects Kubernetes pods to the logical network and allocates IP and MAC addresses.  Finally, the NCP supports network address translation (NAT) and allocates a separate SNAT IP for each Kubernetes namespace.  These separate SNAT IP addresses allow each Kubernetes namespace to be uniquely addressable.
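As a minimal sketch (using the kubernetes Python client; the namespace name is a placeholder), simply creating a namespace is enough to trigger this per-namespace topology: NCP observes the new namespace and carves out a subnet, Tier-1 router, segment, and SNAT IP for it.

    # Minimal sketch: creating a Kubernetes namespace with the kubernetes Python
    # client. In an NCP-managed cluster, this event alone causes NCP to allocate a
    # subnet from the pre-configured IP block and build the namespace's NSX topology.
    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    core.create_namespace(
        client.V1Namespace(metadata=client.V1ObjectMeta(name="foo")))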

 

The NCP implements the following in Kubernetes:

  • Security policies with the NSX-T Data Center distributed firewall (see the sample network policy below):
    o Support for ingress and egress network policies.
    o Support for IPBlock selector in network policies.
    o Support for matchLabels and matchExpression when specifying label selectors for network policies.
    o Support for selecting pods in another namespace.
  • ClusterIP and LoadBalancer service types.
  • Ingress with the NSX-T layer 7 load balancer:
    o Support for HTTP Ingress and HTTPS Ingress with TLS edge termination.
    o Support for Ingress default backend configuration.
    o Support for redirect to HTTPS, path rewrite, and path pattern matching.
  • Creates tags on the NSX-T Data Center logical switch port for the namespace, pod name, and labels of a pod, and allows the administrator to define NSX-T security groups and policies based on the tags.

       

 

NCP 3.0.1 supports a single Kubernetes cluster. You can have multiple Kubernetes clusters, each with its distinct NCP instance, using the same NSX-T Data Center deployment.
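To make the network policy support listed above concrete, the following is a minimal sketch (kubernetes Python client; namespace, labels, CIDR, and port are illustrative) of a standard Kubernetes NetworkPolicy of the kind NCP realizes as DFW rules on the pods' logical ports:

    # Sketch: ingress to app=web pods in namespace "foo", allowed only from
    # app=frontend pods and from the 10.0.0.0/24 ipBlock, on TCP/8080.
    from kubernetes import client, config

    config.load_kube_config()                  # or load_incluster_config() inside a pod
    api = client.NetworkingV1Api()

    policy = client.V1NetworkPolicy(
        metadata=client.V1ObjectMeta(name="web-allow-frontend", namespace="foo"),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(match_labels={"app": "web"}),
            policy_types=["Ingress"],
            ingress=[client.V1NetworkPolicyIngressRule(
                _from=[
                    client.V1NetworkPolicyPeer(
                        pod_selector=client.V1LabelSelector(match_labels={"app": "frontend"})),
                    client.V1NetworkPolicyPeer(
                        ip_block=client.V1IPBlock(cidr="10.0.0.0/24")),
                ],
                ports=[client.V1NetworkPolicyPort(protocol="TCP", port=8080)],
            )],
        ),
    )

    api.create_namespaced_network_policy(namespace="foo", body=policy)

NCP watches for such NetworkPolicy objects via the Kubernetes API and realizes them as distributed firewall rules on the corresponding logical ports.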

 


Figure 5--1 NSX-T Broad platform support

 

5.1  NCP Components

  NCP is built in a modular way, so that individual adapters can be added for different CaaS and PaaS systems. The current NCP supports K8S, Tanzu, and OpenShift, but more can be easily added.  Figure 5.2 shows this modular architecture.

  


Figure 5--2 NCP Components 

The heart of the NCP is the NCP Infra component.  It talks to both the NSX Manager, via the NSX Manager API client, and the container environments, via a container-specific adapter, as shown above. In a K8s environment, the NCP communicates with the K8s control plane and monitors changes to containers and other resources.  It monitors container life cycle events and connects the container interface to the vSwitch. In doing so, the NCP programs the vSwitch to tag and forward container traffic between the container interfaces and the vnic.  The NCP also manages resources such as logical ports, switches, and security groups by calling the NSX API.  This allows the NCP to extend all NSX services, even Distributed IDS (discussed in chapter 7), to the container, as seen in figure 5-3 below:

 


 Figure 5--3 Distributed IDS for Containers 

 

Because NSX infrastructure exists solely in software, it is entirely programmable.  The next section will look at how the NCP calls the NSX Manager when instantiating K8s clusters.

 

As described above, the NCP provides per namespace topology upon creation.  This is shown in figure 5.4 below in which two namespaces are created: foo and bar, each with its own topology.

 


 Figure 5--4 NSX-T Per Namespace Topology

Walking through the four commands above provides an understanding of this environment's instantiation.  The first thing the NCP does is request a subnet for each namespace from the block which is pre-configured in NSX. (This block is defined when the NCP is set up in NSX.) Next, the NCP will create a logical switch and T1 router (which it will attach to the pre-configured T0 router). Finally, the NCP will create a router port on the T1 which it will attach to the logical switch (to which it has assigned the subnet it received).  This is how the commands result in the topology on the right.  Note that smaller environments may wish to have a shared T1 for all namespaces.  This is also supported.

On the other end of the spectrum, where there may be a requirement for massive throughput, Equal Cost Multi Path (ECMP) routing may be enabled on the T0s above the T1s, providing up to 8 parallel paths in and out of each environment. (For more details on NSX network design, please see the NSX Design document.)

 


 Figure 5--5 NSX-T Namespace Scalability

 

One of the critical pieces of a secure infrastructure design is the reliability of IP addressing.  This is necessary for forensic purposes.  It is critical that the IP address assigned to an endpoint stay with that endpoint throughout its life; a changing address makes it harder to track that endpoint's history. This leads to the requirement for persistent SNAT in the world of containers. NSX allows for a persistent SNAT IP per K8s service. With this feature, a set of Kubernetes workloads (pods) can be assigned a specific IP or group of SNAT IPs from which to source their traffic.  Persistent SNAT also allows the creation of rules in legacy firewalls and other IP-address-based infrastructure.

 


Figure 5--6 NSX-T Persistent SNAT IP per K8S Service

 

To further help with security, metadata within Kubernetes (like namespace, pod names, and labels) all get copied to the NSX Logical Port as Port Tags, as shown in figure 5.7 below.  

 


Figure 5--7 K8S Metadata mapping

 

Although this may seem like merely an administrative convenience, it has significant security implications as well. NSX can be configured to collect ports and switches in dynamic security groups based on tags (derived from Kubernetes metadata).  Those same groups can be referenced in firewall rules, as figures 5-8 and 5-9 show.

Figure 5--8 K8S Pre-Created Firewall Rules with Pre-Created Groups

 


Figure 5--9 NSX-T DFW Category Support

 

 

 

5.2  Tanzu Application Service

NCP functionality in Tanzu environments is similar to that described in the K8s section above.  The NCP Infra component lies between the Cloud Foundry Adapter and the NSX API Client to orchestrate the two environments.

 

In Tanzu Application Service environments, CF orgs (typically a company, department, or application suite) are assigned a separate network topology in NSX, so that each CF org gets its own Tier-1 router (as seen in the K8s section above).  For each CF space, NSX creates one or more logical switches, which are then attached to the org's T1 router.  Each Tanzu AI (container) has its own logical port on an NSX logical switch (so NAT is not needed).  Every cell can have AIs from different orgs and spaces.  Every AI has DFW rules applied on its interface, with policies defined in the new cf-networking policy server.  ASGs (Application Security Groups) are also mapped to the DFW.  For North/South routing, NSX infrastructure (T0s) provides connectivity to the outside world.  During installation, one can select direct Gorouter-to-container networking (with or without NAT).  NSX also provides IP Address Management (IPAM) by supplying subnets (from the IP block provided at install) to namespaces.  NSX also provides the individual IP addresses and MACs to the AIs (containers).

5.3  OpenShift

 

The NSX Container Plugin for OpenShift is designed for OpenShift 4 (and for OpenShift 3 in the case of NCP 2.5). As described above, the main component of the NCP runs in a container, communicating with the NSX Manager via the API client.  It also communicates with the OpenShift control plane via the OpenShift Adapter.  Through this interaction, the NCP creates an NSX-T logical topology for each OpenShift cluster, creating a separate logical network for each OpenShift namespace.  The NCP connects the OpenShift pods to the logical network, allocating IP and MAC addresses.  As the NCP creates each logical switch port, it assigns tags for the namespace, pod name, and labels of the pod, which can be referenced in firewall policies.  Each OpenShift namespace is also allocated an SNAT IP.  Through the DFW, the NCP also supports ingress and egress network policies with the IPBlock selector, as well as matchLabels and matchExpressions when specifying label selectors for policies.  Using the NSX LB, the NCP can implement the OpenShift route, including support for HTTP route and HTTPS route with TLS edge termination, as well as routes with alternate backends and wildcard subdomains.  The Advanced LB available with NSX allows a whole security suite to be applied to the HTTP traffic, including rate limiting and WAF.  For more details on the ALB security suite, see Chapter XX.

 

 


Figure 5--10 OpenShift NCP

To trigger an NCP deployment, the networkType field in the CRD in the RedHat UBI (Universal Base Image) must be “ncp”.  Both the NCP and the Network Cluster Operator are packaged with the Red Hat UBI.  Operators apply the equivalent of the K8s controller model at the level of the application.

 

5.4  NCP Features

 

The previous sections discussed the NCP architecture and functionality in support of K8S, OpenShift, and Tanzu.  This section will look at the additional functionality the NCP brings to these environments that makes them more secure and easier to operate.

 5.4.1  IDS

Traditionally, customers had to make a binary choice between the efficiency of containers and security.  The NCP allows customers to have the best of both worlds: agile developers AND fully functioning security infrastructure.  The NCP plug-in brings full security infrastructure to the container world: firewalling, IDS, and even WAF.  The below figure shows the IDS functionality being extended into a Kubernetes environment.

 


Figure 5--11 NCP extending IDS into K8S

 

 

 5.4.2  Visibility

NSX ends the black hole that is the container environment.  The NSX Topology mapper provides a dynamic topology map of the environment.

 


Figure 5-12 NCP Topology and Policy Visibility

Tools such as traceflow not only extend visibility, but they also aid in troubleshooting connectivity across the entire flow, from VM to container, or even between pods.


 

 5.4.3  IPv6

IPv6 is supported on the NSX Container Plug-in, for both single and two-tier topologies.  IPv6 and IPv4 IP blocks cannot be mixed in the NCP configuration. Dual stacks are not supported, so if a container has an IPv6 address, it cannot have IPv4 addressing. For north-south traffic to work properly, the Tier-0 gateway must have an IPv6 address and spoofguard must be disabled. The Kubernetes cluster must be created with an IPv6 service cluster CIDR with a maximum 16-bit subnet mask.  All namespaces will be in no_SNAT mode.  Kubernetes nodes must have an IPv6 address for connectivity between the nodes and pods, and for TCP and HTTP liveness and readiness probes to work. Either SLAAC or static IPs can be used.  The Kubernetes nodes can also be in dual-stack mode, in which case you must register the node with an IPv6 address by specifying the IPv6 address with the node-ip option as one of the kubelet's startup parameters.

 

5.5  Project Antrea

No discussion of container networking would be complete without mention of Project Antrea. Project Antrea is an open source Container Network Interface (CNI) plug-in providing pod connectivity and network policy enforcement with Open vSwitch in K8s.  It is available at https://antrea.io.  Being an open source project, Antrea is extensible and scalable.  Antrea simplifies networking across different clouds and operating systems. Its installation is quite simple, requiring only one yaml file. An Antrea CNI is installed per K8s cluster, allowing for better scale in environments with many K8s clusters.  In the future, these CNIs will be able to be managed by the NSX Manager for global policy distribution.  This document will be updated with details when that functionality becomes available.

 

  

6 Firewall features

The NSX Firewall provides many features which are useful for securing the environment.  Although there are a myriad of firewall features, including time-of-day rules and so on, this chapter highlights only a few of the most commonly used ones: URL Analysis, Service Insertion, and Endpoint Protection (also known as Guest Introspection).  These features are highlighted due to the impact they have on system architecture and design.  For an exhaustive look at firewall features, see the NSX product documentation.

6.1  URL Analysis

URL Analysis allows administrators to gain insight into the type of external websites accessed from within the organization and to understand the reputation and risk of the accessed websites.  URL Analysis is available on the gateway firewall and is enabled on a per-cluster basis. After it is enabled, you can add a context profile with a URL category attribute.  URL Analysis profiles specify the categories of traffic to be analyzed; if no profiles are created, all traffic is analyzed.  To analyze domain information, you must configure a Layer 7 gateway firewall rule on all Tier-1 gateways backing the NSX Edge cluster for which you want to analyze traffic.  The DNS traffic is analyzed to extract the hostname and IP information.  The extracted information is then used to categorize and score traffic.

 

To download the category and reputation database, the management interface of the edge nodes on which URL Analysis is enabled must have internet access.

 

This is depicted in figure 6-1 below.


Figure 6-1 -NSX-T URL Analysis

 

URL categories are used to classify websites into different types.  There are more than 80 predefined categories in the system.  Currently, categories cannot be customized. A website or domain can belong to multiple categories. Based on their reputation score, URLs are classified into the following severities:

         High Risk (1-20)

         Suspicious (21-40)

         Moderate Risk (41-60)

         Low Risk (61-80)

         Trustworthy (81-100)
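The reputation bands above can be summarized in a small helper; this is only a sketch restating the listed ranges, not NSX code.

    # Sketch: map a reputation score (1-100) to the severity bands listed above.
    def url_severity(score: int) -> str:
        if not 1 <= score <= 100:
            raise ValueError("reputation score must be between 1 and 100")
        if score <= 20:
            return "High Risk"
        if score <= 40:
            return "Suspicious"
        if score <= 60:
            return "Moderate Risk"
        if score <= 80:
            return "Low Risk"
        return "Trustworthy"

    print(url_severity(35))   # Suspicious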

 

The Webroot BrightCloud® Web Classification and Web Reputation Services provide the most effective way to block access to unwanted content and protect users against web-based threats. For these services, Webroot:

         Uses patented machine learning that enables single classifiers to work at a rate of 20K classifications per second; with 500+ classifiers running in parallel, site classification is extremely fast and accurate

         Categorizes the largest URL database of its kind across 82 categories

         Observes and protects users in real time from the risks of connecting to any URL, regardless of reputation

         Provides details as to why a site classification was made, empowering admins to make better-informed security decisions

 

 

6.2  Service Insertion and Service Chaining

 

The value of NSX security extends beyond NSX to your pre-existing security infrastructure; NSX is the mortar that ties your security bricks together to build a stronger wall. Legacy security strategies were intolerant of pre-existing security infrastructure.  Anyone who had a Checkpoint firewall and wanted to move to a Palo Alto Networks firewall would run the two managers side by side until the transition was complete. Troubleshooting during this transition period required a lot of chair swiveling. NSX brings a new model, complementing pre-existing infrastructure. Service Insertion is the feature which allows NSX firewalls (both gateway and DFW) to send traffic to legacy firewall infrastructure for processing.  This can be done as granularly as at the port level, without any modification to the existing network architecture.

 

Service Insertion not only sends traffic to other services for processing, it also offers a deep integration which allows NSX Manager objects to be exchanged with SI service managers.  For example, a group in NSX which is comprised of VMs whose names contain the substring "web" would be shared with the SI service manager.  Thus, when a new VM is spun up and becomes a member of that group, the NSX Manager will send that update to the SI Service Manager so that policy can be applied consistently across platforms.

 

This section examines Service Insertion, which provides the functionality to insert third-party services at the Tier-0 or Tier-1 gateways as well as at the distributed firewall.

 

Figure 6.2 shows Service Insertion at the gateway firewall (north south service insertion) and at the distributed firewall (east west service insertion).  Notice that east west service insertion means it can be applied to traffic destined to physical servers, VMs, or containers.  In other words: if you decide that you want your sql traffic to be directed to a Fortinet firewall (a viable security policy), that policy will apply to all sql traffic destined to physical servers, VMs, or containers as the actual instantiation of the server is an implementation detail which should not dilute the security policy.

Figure 6-2 -NSX-T Service Insertion

For a complete list of the currently supported vendors for Service Insertion, see the VMware Compatibility Guide.  

 

 6.2.1  North South Service Insertion

 

The first step in integrating NSX with your existing firewall vendor is to determine which deployments are supported. In the case of North South service insertion, this is fairly straightforward, as the gateway firewalls are central data planes which are very much in line with legacy firewalling models. North South Service Insertion is available at both the Tier-0 and Tier-1 routers. Figure 6-3 depicts the typical supported deployment model for North South insertion.  In this figure, the Service Insertion rule is applied at the Tier-0 gateway.

 

Figure 6-3 -NSX-T North South Service Insertion

 

This model suggests deploying the VM form factor of the legacy firewall alongside the gateway firewalls on the Edge Nodes, minimizing the need for traffic to exit the host for processing by the virtualized legacy firewall.  Note that when the partner firewall VM and the NSX gateway firewall are coresident, the additional delay in traffic processing introduced by the extra security element is a matter of microseconds, as nothing traverses wires or contends with network traffic.  Traffic sent from the NSX gateway firewall to the VM firewall arrives in a matter of microseconds, dependent solely on CPU load.  Upon successful processing by the VM, traffic returns to the NSX gateway to be routed on its path. Again, this processing requires no modification to routing or any network infrastructure.

 

Once the supported deployment is verified, the configuration of service insertion involves just three simple steps:

  1. Register the Service with the NSX Manager.
  2. Deploy a Service for North South Introspection.
  3. Add Redirection Rules for North South Traffic.

 

Figure 6-4 -NSX-T North South Service Redirection Rule

 

Figure 6-4 shows a service redirection policy. You will notice that this policy has sections defined by which SVM the traffic is redirected to.  It is entirely possible to have more than one entity or vendor to which traffic is redirected. Under each section, rules are defined for the traffic that will be redirected or NOT redirected.  Note that if your Edges are running in HA mode, you need to create a redirection rule for each Edge Node.  NSX does not automatically apply the redirection rule to the standby node in the event of a failover as not all vendors support failing over the service VM.

 

As part of the SI integration, the NSX Manager will update the partner manager with changes to group membership.  In other words, the state is automatically synchronized to ensure consistent processing.  For some customers, this provides a great way to start NSX and legacy firewall integration, extending the inventory and dynamic grouping constructs into their legacy firewall environment.  The next step of the adoption would be to use North-South insertion, where the Gateway Firewall becomes a means to reduce the processing burden on their legacy firewalls.

 

 

6.2.2  East West Service Insertion and Service Chaining

East West Service Insertion gets a bit more complicated as the DFW is a distributed firewall.  Legacy firewalls have no equivalent model.  Because of this, understanding the supported deployment models for your firewall vendor is especially important.  Here are a few concepts which are important to keep in mind:

 

 

Service: Partners register services with the NSX Manager. A service represents the security functionality offered by the partner and includes service deployment details such as the OVF URL of the service VMs, the point at which to attach the service, and the state of the service.

 
 

Vendor Template: It defines the functionality that a service can perform on network traffic. Partners define vendor templates. For example, a vendor template can provide a network operation service such as tunneling with the IPSec service.

 
 

Service Profile: It is an instance of a vendor template. An NSX-T Data Center administrator can create a service profile to be consumed by service VMs.

 
 

Guest VM: It is a source or destination of traffic in the network – where the packets originate or are destined. The incoming or outgoing traffic is introspected by a service chain defined for a rule running east-west network services.

 
 

Service VM: A VM that runs the OVA or OVF appliance specified by a service. It is connected over the service plane to receive redirected traffic.

 
 

Service Instance: It is created when a service is deployed on a host.  Each service instance has a corresponding service VM.

 
 

Service Segment: A segment (overlay or vlan backed) of a service plane that is associated to a transport zone. Each service attachment is segregated from other service attachments and from the regular L2 or L3 network segments provided by NSX-T. The service plane manages service attachments.

 
 

Service Manager: It is the partner manager that points to a set of services.

 
 

Service Chain: It is a logical sequence of service profiles defined by an administrator. Service profiles introspect network traffic in the order defined in the service chain. For example, the first service profile is a firewall, the second service profile is a monitor, and so on. Service chains can specify different sequences of service profiles for different directions of traffic (egress/ingress).

 
 

Redirection Policy: It ensures that traffic classified for a specific service chain is redirected to that service chain. It is based on traffic patterns that match an NSX-T Data Center security group and a service chain. All traffic matching the pattern is redirected along the service chain.

 
 

Service Path: It is a sequence of service VMs that implement the service profiles of a service chain.  An administrator defines the service chain, which consists of a pre-defined order of service profiles.  NSX generates multiple service paths from a service chain based on the number of locations of the guest VMs and service VMs. It selects the optimum path for the traffic flow to be introspected. Each service path is identified by a Service Path Index (SPI) and each hop along the path has a unique Service Index (SI).

 

 

       

For east-west service insertion, one typically has two options: a Service Cluster model or a Host-Based model.  These two options are shown in figures 6-5 and 6-6 below, both depicting the same flow between tenants in the DFW that was examined in chapter 3.

 

In a per-host deployment (as shown in figure 6-5), an instance of the SVM is installed on each host in the ESXi cluster.  Traffic between guest VMs on the same host is inspected without ever having to leave the host.  This clearly offers a significant processing advantage over the clustered model, at a greater licensing cost.

 

Figure 6.6 shows a Service Cluster model. In a clustered deployment, the service VMs are installed on one single cluster.  Traffic between the VMs is redirected to the service cluster for policy inspection and enforcement before reaching its final destination. When configuring a cluster deployment, you can specify which particular host within the cluster the traffic should be redirected to (if there is a desire to segregate traffic while undergoing security policies), or you can select any and NSX will select the optimal host.

 

It is important to note that the two models may coexist in different clusters of the same installation.  For example, one may have a cluster of DB VMs where every VM will require processing, and may go with a host-based model for that cluster.  Another cluster may have a mixture of general-population VMs where only a small portion of traffic is inspected, or the inspected traffic is not very delay sensitive; in that cluster, the service cluster model may be the preferred architecture.

 

Figure 6-5 -East West Service Insertion Per Host Model

 

 


Figure 6-6 -East West Service Insertion Service Cluster Model

 

In order to support East West Service Insertion, at least one overlay transport zone with overlay logical switches must exist.  All transport nodes must be of the type overlay because the service sends traffic on overlay-backed logical switches. (This is how the magic happens: NSX internally creates an infrastructure which allows sending the traffic around without the need to modify the existing infrastructure.)  The overlay-backed logical switch is provisioned internally to NSX and is not visible in the user interface.  Even if you plan on using only VLAN-backed logical switches for the guest VMs, the service insertion plumbing passes the traffic being processed through the overlay.  Without this overlay infrastructure, a guest VM which is subject to east-west service insertion cannot be vMotioned to another host and would go into a disconnected state.

 

Deploying East West Service Insertion is slightly more involved than deploying North South. The following steps are required to set up east-west service insertion:

 

  1. Register the Service, Vendor Template, and Service Manager
  2. Deploy a Service for East West Introspection
  3. Add a Service Profile
  4. Add a Service Chain
  5. Add Redirection Rules

 

With east-west service insertion, it is possible to string multiple services together to provide service chaining. Service Chaining provides standards-based delivery and flexible deployment options. Figure 6-7 below shows a service node with NGFW, IPS, and Network Monitoring services for service chaining.  A flow may leverage one, two, or all three services, as defined by the rules in the service insertion policy.  Although service chaining is defined in the East West Security section, under the DFW, the dynamic service chain is attached to the T0/T1 Services Router (where the Tier-1 gateway firewall lives). Classification and redirection of traffic to the services plane happens at the T0/T1 uplink, which means service chaining is applied at the gateway. Note that Service Chaining provides support for north-south traffic coming to and from VMs and Kubernetes containers.

 


Figure 6-7 -NSX-T Service Chaining

  This case is similar to regular N/S SI, but instead of redirecting traffic to a bump-in-the-wire N/S service, a service chain is used. SI classification and redirection happen in the same location as regular N/S SI in the packet processing pipeline.

  Given that the SI lookup happens on the uplink, processing will use IN/OUT directions as appropriate for the uplink itself. IN means the packet is being received from the internet; OUT means the packet is being sent to the internet through the uplink. This is the same as regular N/S SI.

 

 

Service Chaining Compared to Service Insertion:

         Support for additional use-cases/vendors

         Chaining of multiple services versus a single service

         Leverage the same service chain and service instances (SVMs) for multiple Logical Routers and E-W Service Insertion

         Support for Liveness detection

         No HA (Active/Standby), but load distribution with flow pinning

 

 


Figure 6-8 -NSX-T Network Introspection and Service Chaining Deployment Options

 

 

 

6.3  Endpoint Protection – Guest Introspection

 

NSX-T provides the Endpoint Protection (EPP) platform to allow third-party partners to run agentless Anti-Virus/Anti-Malware (AV/AM) capabilities for virtualized workloads on ESXi.  Traditional AV/AM services require agents to be run inside the guest operating system of a virtual workload.  These agents consume resources for each workload on an ESXi host.  The Endpoint Protection platform allows the AV/AM partner to remove their agent from the virtual workload and provide the same services using a Service Virtual Machine (SVM) that is installed on each host.  These SVMs consume much less virtual CPU and memory overall than the many agents running on every workload on the ESXi host.  This chapter focuses on NSX-T Endpoint Protection capabilities:

         Platform for Partner integration for Agentless AV/AM deployments

         Use cases covered for EPP

         Architecture details

         Windows EPP vs Linux EPP

         Workflows – Registration, Deployment, Consumption

         Designing consistent EPP Policies across vCenter Server inventories

         Designing granular, cluster-based Policy and Partner SVM deployment

6.3.1  Endpoint Protection – Architecture and Components

 

The high-level Endpoint Protection Architecture consists of the following components which are mandatory for NSX-T Endpoint Protection deployment.  These components represent the items which an NSX-T administrator would configure or interact with the most for using the Endpoint Protection platform.  

 


 

         NSX-T Manager Cluster
  o The cluster of NSX-T Managers which the Partner Console interacts with via REST API commands.
  o Provides the user interface for configuring Groups, Service Deployments, and Endpoint Protection Policies for virtual machine workloads.

         Partner Console
  o Registers via REST API with the NSX-T Manager cluster.

         VMware Tools with Thin Agent
  o Two drivers for file and network inspection, deployed as part of the VMware Tools installation, needed to send events and information to the Partner SVM.

         Partner SVM
  o The partner-provided virtual machine appliance that contains the partner's anti-malware engine.

         ESXi Cluster
  o Endpoint Protection currently supports only ESXi-based workloads, and the hosts must be in a vSphere cluster, even if only one host resides in it.  A Partner SVM is deployed to ALL hosts within that vSphere cluster.

         vCenter Server
  o vCenter Server provides the management plane for ESXi hosts and clusters.
  o vCenter Server assists with the deployment and protection of the Partner SVMs using the ESX Agent Manager.

         VSS/VDS Portgroup or N-VDS Segment (Refer to Figure 6-11) – Management Plane Interface
  o The VSS/VDS portgroup can be used for connecting the management network interface of the Partner SVM for communication to the Partner Console.
  o The NSX-prepped portgroup in the VDS, or an N-VDS segment (overlay or VLAN), can be used for connecting the management network interface of the Partner SVM for communication to the Partner Console.

         vmservice-vswitch (Refer to Figure 6-11) – Control/Data Plane switch
  o A standard vSphere switch that provides a vmkernel port for the context multiplexer to communicate with the Partner SVM.  Must run on a vSphere Standard Switch.  Not configurable.

         vmservice-vshield-pg (Refer to Figure 6-11) – Control/Data Plane portgroup
  o A standard vSphere switch port group, located on the vmservice-vswitch, to which the Partner SVM connects its Control/Data Plane interface.  Must run on a vSphere Standard Switch.  Not configurable.

         NSX-T Transport Node Profile (Not Pictured) – An NSX-T Transport Node Profile provides a consistent configuration of the Transport Nodes (ESXi hosts prepared with NSX-T) in the vSphere cluster.  This profile ensures any new host that joins the vSphere cluster automatically has a Partner SVM deployed to it for protecting workloads.

         IP-Addressing Mechanism (Not Pictured) – IP addresses for the Partner SVM are necessary for the SVM to communicate with the Partner Console.  These can be provided by NSX-T via an IP Pool, or through a customer DHCP server.

Breaking each of these components down further and dividing them into their planes of operation, one can take a closer look at the internal components.  

 


Figure 6.10 - NSX-T Endpoint Protection Architecture - Low-level

 


Figure 6.11 - NSX-T Endpoint Protection Architecture - Including Networking

Figure 6.11 shows additional components of the NSX-T Endpoint Protection Architecture, specifically the ESXi host network configuration.

 6.3.2  User Interface/REST API

 

The Endpoint Protection Platform User Interface is accessed through NSX-T Policy and REST API calls are made to the NSX-T Policy API.  A dashboard is supplied under the Security tab for Endpoint Protection that supplies information around the deployments, components having issues, and configured VMs.  

 


 6.3.3  Management Plane Components

 

The GI Provider is the component responsible for composing Endpoint Protection Policies and interacting with the Management Plane GI Vertical.  It resides inside the NSX-T Manager(s) that constitute the NSX-T Management Cluster.  Endpoint Protection leverages Service insertion for inserting partner services onto the NSX-T Transport Nodes.  Each host has a vSphere ESX Agent Manager installed and configured to manage the Partner SVM lifecycle and protect the virtual machine.  Finally, the GI vertical configures policies on NSX-T Groups of VMs and sends this configuration to the CCP Span Calculator.

 6.3.4  Control Plane Components

 

The NSX-T Control Plane components consist of the Centralized Control Plane (CCP), that resides in the NSX-T Manager(s) and the Local Control Plane (LCP) that resides in each ESXi host.  For NSX-T Endpoint Protection, the CCP pushes the VM Group configuration and subsequently the Endpoint Protection Policy, to the LCP of the hosts where the VMs reside.  The CCP calculates the span for the NSX-T Group(s) of VMs and sends this information to the LCP on appropriate hosts.

 6.3.5  Data Plane 

 

The Data Plane of the NSX-T Endpoint Protection platform resides in several components.  These components represent the plane in which the files, events, and information actually ‘flow’ for processing by the Endpoint Protection Platform and the Partner Service associated.  

 6.3.5.1  Thin Agent

 

The Thin Agent is a set of two drivers that are installed as part of the VMware Tools ‘Complete’ installation or by selectively installing them using the ‘Custom’ installation.  For Windows machines, this is done through the VMware Tools installer.

 


 

For Linux-based workloads, VMware Tools is not required; only the Endpoint Protection thin agent package is needed.  The package is available from https://packages.vmware.com/packages/nsxgi/latest/index.html for the appropriate Linux operating system.  The service created on the Linux system is located in /etc/init.d/vsepd.

For a list of supported Operating Systems, please refer to the NSX-T Administrator Guide, Endpoint Protection section. 

 

 6.3.5.2  GI Library

 

This is the library that is linked with the Partner SVM and acts as an interface between the Partner SVM and the Thin Agent for their communications.  

 6.3.5.3  Context Multiplexer (Mux)

 

This component is responsible for forwarding Thin Agent events to the configured Partner SVMs.  It also forwards Partner SVM requests to the Thin Agent.  

 6.3.5.4  Context Engine GI Client

 

This component is responsible for sending Thin Agent and Mux health status to the GI Vertical.

 6.3.5.5  Muxconfig.xml file

 

This file is used to track the Partner Service(s) that are deployed as well as the virtual machines configured for each service on the ESXi host.  As machines are powered on and off, they are added and removed from the muxconfig.xml to enable and disable protection from the Partner Service.  

 6.3.6  Partner Components

 

NSX-T Endpoint Protection provides the platform for VMware certified partners to integrate their partner services.  The following section goes into detail about the necessary components from the VMware partner that communicate with the NSX-T Endpoint Protection Platform.  

 6.3.6.1  Partner Management Plane 

 

The Management Plane for the Partner Service is the Partner Console. The Partner Console is typically deployed as an OVA virtual machine and can be placed in a compute cluster, but is generally placed into the management cluster for protection, similar to other management plane appliances such as NSX-T Manager.  

 6.3.6.2  Partner Control/Data Plane 

 

The Control/Data Plane for the Partner Service consists of the Partner Service VM (SVM). The Partner SVM is deployed on each ESXi host in a cluster.  This SVM contains the partner’s anti-malware engine.  

 6.3.7  Workflow Object Definitions

 

Before discussing NSX-T Endpoint Protection deployment, enforcement, and workflows, the objects that are configured need to be defined.  

         Deployment Template – Partner Template that tells the Partner SVM deployment where to connect to the Partner Console and over which ports

         Deployment Specification – Partner SVM metadata and sizing characteristics

         Service Deployment – Represents the configuration details of all the necessary objects to perform a deployment of the Partner SVM.  Contains the Compute Manager where the cluster, network, and data store reside for Partner SVM deployment.  Also contains the Deployment Specification and Deployment Template of the Partner SVM.  

         Service Instance – Represents the Partner SVM deployments and the associated data about their host location, deployment mode, deployment status, and health status.  

         Catalog – Lists the Partner registrations that have been configured with NSX-T Manager

         Service Profile – Defines the vendor template that will be used in the Endpoint Protection Policy

         Vendor Template – Defines the template created from the Partner Console that contains the protection policy that the Partner will be enforcing on the workloads.  This template is passed to the NSX-T Manager for use in the Endpoint Protection Service Profile.

         Endpoint Protection Policy – NSX-T Policy that uses the Group and the Service Profile to define the ‘HOW’ and ‘WHAT’ for endpoint protection.  

Group – Defines the workloads that will be used in the Endpoint Protection Policy and protected.  

6.3.8  NSX-T Endpoint Protection Deployment and Enforcement

 

NSX-T Endpoint Protection provides a robust set of capabilities that provide significant flexibility of deployment options and enforcement.  

         Multiple vCenter Server Support – NSX-T Endpoint Protection supports a consistent Endpoint Protection Policy across multiple vCenter Server Compute Managers connected to NSX-T.  The certified Partner must support multiple vCenter Server connectivity.

         Cluster-based Endpoint Protection Policy Granularity – NSX-T Endpoint Protection supports granular, per-cluster, policy deployment and enforcement.  A different policy can be applied based on cluster workload needs.  Example:  VDI Desktop Cluster versus Server Workload Cluster policies.   

         Scalable Partner SVM deployment – NSX-T Endpoint Protection supports deploying different Partner SVM sizes to different clusters based on cluster workload needs.  Partner SVM sizing can reduce the number of resources necessary to perform agentless offload.  Example:  VDI Desktop Cluster versus Server Workload Cluster deployments where consolidation ratios are higher on VDI and may require a larger SVM with more resources to accommodate file-scanning needs.   


Figure 6.14 - NSX-T Endpoint Protection Policy - Multi-vCenter Server Consistent

 


Figure 6.15 - NSX-T Endpoint Protection - Cluster Granular and Partner Scalable

         Workload/Partner SVM Networking Agnostic – NSX-T Endpoint Protection supports workloads that reside on either VSS/VDS or N-VDS networking and supports deploying the Partner SVM on these same networking constructs.   

6.3.9  NSX-T Endpoint Protection Design Considerations

 

The flexibility options in deployment and enforcement of NSX-T Endpoint Protection bring up specific design considerations prior to deployment.  Before going into the design considerations in detail, it makes sense to call out a configuration detail, specific to Endpoint Protection.  

There are very specific ESXi host configurations that can impact a design of the NSX-T Endpoint Protection deployment.  ESXi hosts have local settings where Agent VMs, specifically Endpoint Protection Partner SVMs, can be placed on a specific datastore and network that is locally significant to the ESXi host, or on shared networks and datastores presented to other hosts.  Generally, these settings are not needed, and Service Deployment from NSX-T Manager will overwrite any locally controlled settings on the ESXi host.  While these options are supported, they do not represent the majority of deployments or the recommended options, as they do not scale and are error-prone due to the manual nature of the configuration and the need to touch every ESXi host.  The following sub-section describes these options and how to use them, but the rest of the section is based on the recommended deployment option of configuration through the NSX-T Manager.  

 

 6.3.9.1  Agent VM Settings in ESXi/vCenter Server

 

It is possible, although not recommended as a primary use case, to deploy the Partner SVMs from NSX-T to locally specified networks and data stores on the ESXi host.  These settings are configured on each ESXi host individually in the Agent VM Settings Configure options.  These options can also be configured from vCenter Server for each host.  

 


 6.3.9.2  Agent VM Settings in NSX-T

 

If local ESXi Agent VM Settings are used, the NSX-T Endpoint Protection Service Deployment needs to be configured appropriately and the ‘Specified on Host’ option used for the data store and management network.  

 


Figure 6.17 - NSX-T Endpoint Protection - Service Deployment Specified on Host Data Store


Figure 6.18 - NSX-T Endpoint Protection - Service Deployment Specified on Host Network

 6.3.9.3  Workload and Partner SVM Networking

NSX-T Endpoint Protection enforcement can be achieved for workloads on VLAN Backed and Overlay NSX-T Segment types and is also unique in that it does not require either of these segment types.  NSX-T Endpoint Protection can provide protection to workloads residing on VSS/VDS Portgroups as well.  

 

The Partner SVM that is deployed requires two network connections:

 

         Management vNIC – Connects to either a VSS/VDS Portgroup or an N-VDS VLAN or Overlay Segment.  

         Control vNIC – Connects the Partner SVM to the Mux inside the ESXi host.  Not configurable and automatically created on Service Deployment to the host

Regardless of networking construct used, the Management vNIC of the Partner SVM must be able to communicate with the Partner Console.  

 

The Partner SVM requires an IP Address mechanism to provide the IP for the Management vNIC.  This can be achieved by:

 

         DHCP Server – Customer hosted DHCP appliance/IPAM

         NSX-T IPAM (IP Pool) – NSX-T can provide an IP Pool and requisite configuration options for the Partner SVM to pull from.  (This is done from the IP Management selection in the Networking option of the UI).  

Regardless of the IP addressing mechanism used, the number of IP addresses in either the DHCP Scope or the NSX-T IPAM IP Pool should be sufficient to cover the number of Partner SVMs deployed.
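
If the NSX-T IP Pool option is chosen, it can also be created programmatically.  The sketch below uses the Policy API ip-pools tree; the pool and subnet names, address ranges, and credentials are placeholders, and the payload fields should be verified against the API guide for your NSX-T version.

# Minimal sketch: creating an NSX-T IP Pool for Partner SVM management addressing
# via the Policy API. IDs, ranges, and credentials are illustrative placeholders.
import requests

NSX_MANAGER = "https://nsx.example.com"
AUTH = ("admin", "REPLACE_ME")

def policy_patch(path, body):
    url = f"{NSX_MANAGER}/policy/api/v1{path}"
    resp = requests.patch(url, json=body, auth=AUTH, verify=False)
    resp.raise_for_status()

# 1. Create (or update) the IP pool object itself.
policy_patch("/infra/ip-pools/partner-svm-pool",
             {"display_name": "partner-svm-pool"})

# 2. Add a static subnet with the allocation range the Partner SVMs will draw from.
policy_patch("/infra/ip-pools/partner-svm-pool/ip-subnets/mgmt-subnet",
             {"resource_type": "IpAddressPoolStaticSubnet",
              "cidr": "192.168.50.0/24",
              "gateway_ip": "192.168.50.1",
              "allocation_ranges": [{"start": "192.168.50.10",
                                     "end": "192.168.50.50"}]})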

 6.3.9.4  Partner SVM Data Store and Compute 

 

The data store on which the Partner SVM will be placed should be shared across the entire cluster being deployed to, and should provide enough disk space to host the SVM size multiplied by the number of hosts in the cluster.  The size of the disk that each Partner SVM requires differs per partner.  Consult the partner documentation to understand the disk requirements.  

 

Partner SVMs are deployed to all hosts in a vSphere cluster.  If a new host is added to the cluster, EAM triggers a deployment of a new Partner SVM to reside on the host and provide the same Endpoint Protection as assigned to all other hosts in the vSphere cluster.  

 


Figure 6.19 - NSX-T Endpoint Protection - Partner SVM Cluster Deployment

 

 6.3.9.5  Partner Console 

 

The Partner Console is recommended to reside on a management cluster with vSphere HA configured to provide redundancy.  Please consult the specific partner documentation on recommended high-availability configurations.  

 

 6.3.9.6  Service Deployment Restriction and Support

 

NSX-T Endpoint Protection Service Deployments are on a per-cluster, per-vCenter Server basis.  One Service Deployment is required for each cluster.  If a Partner provides more than one Deployment Specification (i.e., SVM size), selection of the appropriate size is recommended based on the cluster workloads that are hosted.  

 

Once a Service Deployment is deployed, the NSX-T/vCenter Server specific options cannot be changed.  Only the Partner Deployment Specification and Deployment Template can be changed.  If either of these options is changed, a redeployment of the Partner SVMs will occur, and protection will be lost while the redeployment is taking place.  Changing networks of the Partner SVMs is not supported; the recommendation is to remove the Service Deployment and recreate it with the new network.  Storage vMotion of the Partner SVMs is supported; however, any redeployment will result in the Partner SVMs attempting to be put back on the configured Service Deployment data store.  The recommendation is to remove the Service Deployment and recreate it on the new data store.  

 6.3.9.7  Groups

 

NSX-T Groups define the workloads that will be protected by the Endpoint Protection Policy.  The size of Groups follows the configuration maximums documented here.  Considering that Groups can contain VMs that reside on hosts outside of Endpoint Protection, and VMs can be part of multiple Groups, it is recommended to create new Groups that align to the VMs on protected clusters.  Multiple Groups can be associated with the same Endpoint Protection Rule.  

 6.3.9.8   Service Profile

 

A partner can specify multiple templates that can be used based on the workload type being protected.  It is required to create at least one Service Profile that will be used in an Endpoint Protection Policy.  Multiple Service Profiles should be used when it is necessary to have more than one Endpoint Protection Policy.  Example: a VDI Service Profile and Endpoint Protection Policy, and a Server Service Profile and Endpoint Protection Policy.  Only one Service Profile can be specified in an Endpoint Protection Rule.  

 6.3.9.9  Endpoint Protection Policy

 

NSX-T Endpoint Protection Policy provides the Endpoint Protection Rules that govern the protection over the Groups of workloads contained and apply a specific Service Profile.  An Endpoint Protection Policy can have more than one Endpoint Protection Rule and, in each rule, the same or a different Service Profile.  The recommended configuration of an Endpoint Protection Policy would be to group like policies with the same Service Profile into one Endpoint Protection Policy.  This helps with troubleshooting and consistent deployment models.  

 

NSX-T Endpoint Protection Rules are defined within an Endpoint Protection Policy and include one or more NSX-T Groups and exactly one Service Profile.  The recommended configuration is to add all of the Groups that are part of the same Service Profile to the same Endpoint Protection Rule.  Check the maximum number of VMs that Endpoint Protection can support per NSX-T deployment, along with the Group maximums documented here.  
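
To make the relationship between these objects concrete, the sketch below models a policy with one rule as plain Python data.  It is a conceptual illustration only, not the NSX-T API schema, and all names are hypothetical.

# Conceptual sketch (not the NSX-T API schema) of how an Endpoint Protection
# Policy, its Rules, Groups, and Service Profile relate. All names are hypothetical.
endpoint_protection_policy = {
    "name": "EPP-Server-Workloads",
    "rules": [
        {
            "name": "Protect-Prod-Servers",
            # One or more Groups can be attached to the same rule...
            "groups": ["Grp-Prod-Web-VMs", "Grp-Prod-DB-VMs"],
            # ...but exactly one Service Profile (referencing the partner's
            # Vendor Template) per rule.
            "service_profile": "Partner-Server-Profile",
        }
    ],
}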

 

6.3.10  Endpoint Protection Workflow: Registration, Deployment, and Consumption

 

The Registration step of the Endpoint Protection Workflow is performed from the Partner Console.  The Partner Console needs to register with the NSX-T Managers and the vCenter Servers where workload protection will be applied.  Please consult the Partner documentation for the process of registering NSX-T and vCenter Server.  

 

  1. Connect Partner Console to vCenter Server(s)
  2. Connect Partner Console to NSX-T Manager
  3. Verify Service Definition Registration in Catalog

 


Figure 6.20 - NSX-T Endpoint Protection Workflow - Partner Registration in Catalog

The Deployment step of the Endpoint Protection Workflow is performed in the Service Deployments > Deployment section of the NSX-T Manager.  

 

  1. A Compute Manager vCenter Server is selected
  2. A Cluster is selected
  3. A Data Store is selected
  4. A Management Network is selected
  5. An IP Addressing Mechanism is selected
  6. A Deployment Specification is selected
  7. A Deployment Template is selected
  8. Deployment is executed per the configurations selected

 

 


 

The Consumption step of the Endpoint Protection Workflow is performed in both the Partner Console and Security > Endpoint Protection Rules section of the NSX-T Manager.  

 


Figure 6.22 - NSX-T Endpoint Protection Workflow - Service Profile Creation

 

  1. The Partner Console will push default and newly created Vendor Template policies that are marked for NSX-T synchronization to the NSX-T Manager.  
  2. A Service Profile is created and Vendor Template selected
  3. Endpoint Protection Policy created
  4. Endpoint Protection Rule created
  5. Endpoint Protection Rule Group(s) assigned/created
  6. Endpoint Protection Rule Service Profile assigned
  7. Publish

 

6.3.11 Partner Supportability 

 

All partners that are currently certified and supported for the Endpoint Protection Platform are listed on the VMware Compatibility Guide.  This is the definitive source for joint VMware and Partner certified integrations.  

https://www.vmware.com/resources/compatibility/search.php?deviceCategory=nsxt&details=1&solutioncategories=28&api=5&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc 

 

 

  

7 Intrusion Detection and Prevention

Much like distributed firewalling changed the game on firewalling by providing a distributed, ubiquitous enforcement plane, NSX distributed IDS/IPS changes the game on IPS by providing a distributed, ubiquitous enforcement plane.  There are additional benefits that the NSX distributed IPS model brings beyond ubiquity (which in itself is a game changer).  NSX IPS is IPS distributed across all the hosts.  Much like with the DFW, the distributed nature allows the IPS capacity to grow linearly with compute capacity.  Beyond that, however, there is an added benefit to distributing IPS: added context.

Legacy network Intrusion Detection and Prevention systems are deployed centrally in the network and rely either on traffic being hairpinned through them or on a copy of the traffic being sent to them via techniques like SPAN or TAPs. These sensors typically match all traffic against all or a broad set of signatures and have very little context about the assets they are protecting. Applying all signatures to all traffic is very inefficient, as IDS/IPS, unlike firewalling, needs to look at the packet payload, not just the network headers. Each signature that needs to be matched against the traffic adds inspection overhead and potential latency. Also, because legacy network IDS/IPS appliances just see packets without having context about the protected workloads, it is very difficult for security teams to determine the appropriate priority for each incident. Obviously, a successful intrusion against a vulnerable database server in production which holds mission-critical data needs more attention than someone on the IT staff triggering an IDS event by running a vulnerability scan.

Because the NSX distributed IDS/IPS is applied to the vNIC of every workload, traffic does not need to be hairpinned to a centralized appliance, and one can be very selective as to which signatures are applied. Signatures related to a Windows vulnerability do not need to be applied to Linux workloads, and servers running Apache do not need signatures that detect an exploit of a database service. Through the Guest Introspection Framework and in-guest drivers, NSX has access to context about each guest, including the operating system version, users logged in, and any running processes. This context can be leveraged to selectively apply only the relevant signatures, not only reducing the processing impact but, more importantly, reducing the noise and quantity of false positives compared to what would be seen if all signatures were applied to all traffic with a traditional appliance.

 

NSX distributed IPS brings five main benefits:

 

         Elastic throughput

         Simplified Network architecture

         Operational Efficiency

         Higher Trigger Fidelity

         Utilize Stranded Compute

 

As with the NSX DFW, NSX IPS is network independent and can be used to monitor intrusions for both workloads on traditional VLANs and workloads on Overlay segments. Thanks to the NCP, it can even monitor Pods in container environments.

 

 

Figure 7-1 NSX-T IPS Configuration and Workflow

 

Configuring NSX IPS involves four steps, as shown in Figure 7-1 above: Download Signatures, Enable IDS, Create Profiles and Rules, and Monitor Events.  After describing the IPS components, each step will be examined in detail.

7.1  NSX IPS Components

 

The NSX IPS components are the same as those described above for DFW as IPS functionality is collocated with DFW.  In the Management plane, the Manager downloads IPS signature updates from the cloud service and users configure IPS profiles and rules.  As with the DFW, the configuration is passed to the CCP after being stored in the Manager.  Again, as with DFW, the CCP pushes the information to the LCP on the hosts.  At the host, the signature information is stored in a database on the host and configured in the datapath.  The ESXi host also collects traffic data and events to pass up to the NSX manager.

 

Figure 7.2 below shows the detail of the IPS components inside the host.

 


Figure 7-2 NSX-T IPS Components – LCP and host

When the configuration arrives at the host, the following takes place.

 

  1. NSX-Proxy obtains configuration changes from CCP and writes data into NestDB.
  2. NestDB stores signatures and IPS rules locally.
  3. nsx-cfgAgent obtains the configuration from NestDB and writes signatures to IPS and IPS rules to VSIP.
  4. vSIP evaluates traffic against IPS “interesting traffic” rules. If a match is found, the packet is punted to the vDPI (Mux) 
  5. vDPI copies the packet and sends the copy through the IPS engine. The packet is released on the vSIP dataplane when IPS finishes inspection
  6. IPS Event is generated by the IPS engine
  7. Event Engine collects flow metadata and generates alerts.

 

The event engine is a multi-threaded engine (one thread per host core) deployed on every ESXi TN as part of host-prep, and it runs in user space.  This engine runs on all ESXi hosts regardless of the enabled state of IPS. (When NSX-T is installed on a host, everything that is required for distributed IDS/IPS to function is installed at that time. No additional software needs to be pushed to the host.) 

The event engine evaluates traffic against IPS signatures only when IPS is enabled on the TN and IPS rules are configured.  The IPS signatures are configured in profiles and programmed on each IPS engine. Traffic is mapped to profiles to limit signature evaluation. Note that IPS performance is impacted more by the volume of inspected traffic than by the number of signatures which are evaluated.  The default set of signatures is programmed on each IPS engine, even when IPS is disabled. For highly secure air-gapped environments, there is support for offline signature update download, which involves registration, authentication, and signature downloads in a zip file which can then be manually uploaded via the UI. 

 

 

7.2  IPS Signatures

 

NSX-T IPS ships with over 11,000 curated signatures.  These signatures are currently provided by one of the most well-known Threat Intelligence providers, Trustwave, and are curated based on the Emerging Threats and Trustwave SpiderLabs signature sets. Because of the pluggable framework, additional signature providers can be added in the future.  


 

   Description and ID – These are unique to each signature 

   Simple Strings or Regular Expressions – These are used to match traffic patterns

   Modifiers - Are used to eliminate packets (packet payload size, ports, etc.)

 Meta-data – Used to selectively enable signatures that are relevant to the workload being protected using the following fields for context:

         Affected Product - Broad category of workloads vulnerable to the exploit

         Attack Target – Specific service vulnerable to this exploit (Drupal Server or Joomla, for example)

         Deployment

   Performance impact – Is an optional field.

   Severity – Information included in most signatures

  

Signatures are classified into over 50 self-explanatory categories/types, including Attempted DOS, Successful User Privilege Gain, and shellcode-detect. Each classification type has a Type Rating (1-9) based on the risk and fidelity associated with the type of event/attack. Type ratings are mapped to NSX IPS Severity Ratings (4 - Critical, 3 - High, 2 - Medium, and 1 - Low).  Signature Severity helps security teams prioritize incidents. A higher score indicates a higher risk associated with the intrusion event. Severity is determined based on the following:

 

  1. Severity specified in the signature itself
  2. CVSS score specified in the signature (as per CVSS 3.0 specs)
  3. Type rating associated with the classification type

 

7.3  Profiles

 

Signatures are applied to IPS rules via Profiles.

 

Figure 7-4 NSX-T IPS Signature Profile

 

A single profile is applied to matching traffic. The default signature-set enables all critical signatures.  The IPS engine supports “tenants” to apply specific profiles to traffic per vNIC. This limits the number of false positives and reduces the performance impact.  Profiles are used in different strategies such as a single or few broad profiles for all traffic or many granular/workload-specific profiles.  The tradeoff is yours to make between administrative complexity and workload signature fidelity. 

 

Profiles group signatures based on the following criteria: 

 

         Classification Type

         Severity (Critical | High | Medium | Low)

         Deployment (Gateway | DC)

         Attack Target (Client | Server)

         Affected Product (Web_Browsers | Apache | …)

         Signatures can be excluded from a profile

 

For each profile, exclusions can be set to disable individual signatures that cause false positives, are noisy, or are just irrelevant for the protected workloads. Exclusions are set per severity level and can be filtered by Signature ID or Meta-data.  The benefits of excluding signatures are reduced noise and improved performance.  Excluding too many signatures comes with a risk of not detecting important threats. 

7.4  IPS Rules

 

Rules are used to map an IPS profile to workloads and traffic. In other words: IPS rules define what is “interesting traffic” to be inspected by the IPS engine.  By default, no rules are configured.

 


 

As one can see in figure 7.5, IPS rules are similar to regular DFW rules or Service Insertion Rules.  You can specify one IPS profile per rule.  IPS rules are stateful and provide support for any type of group in the source and destination fields, just like DFW rules.  However, the use of L7 APP-ID services inside IPS rules is not supported.  As was addressed earlier with the DFW, the use of the Applied-To field to limit the scope of the rule is highly recommended.
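
As a conceptual illustration of the rule fields just described (this is not the actual NSX-T API schema, and all names are hypothetical), an IPS rule can be thought of as follows:

# Conceptual sketch (not the NSX-T API schema): the fields that make up a
# distributed IDS/IPS rule as described above. All names are hypothetical.
ids_rule = {
    "name": "IDS-Prod-Web",
    "sources": ["ANY"],
    "destinations": ["Grp-Prod-Web-VMs"],   # any group type is supported
    "services": ["HTTPS"],                  # L7 APP-ID services are not supported here
    "ids_profile": "Profile-Web-Servers",   # exactly one IPS profile per rule
    "applied_to": ["Grp-Prod-Web-VMs"],     # limit rule scope, as recommended for DFW
}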


The NSX Security Overview screen provides several key insights to help security teams.  This screen provides three main dashboards: IPS Summary (for East West Traffic), URL Analysis (for North South Traffic), and DFW Rule Utilization.

 

The IPS dashboard (shown above in figure 7.6) provides the following information:

         Enabled state for standalone hosts and for clusters 

        Above shows that the standalone hosts are enabled and 1 out of 3 clusters is enabled.

         Date range for the data being displayed

        Above, the date range shown is January 10, 2020 through January 29, 2020

         Total number of intrusion attempts, organized by severity 

        Above shows 254,303 Critical, 19,161 High, 83 Medium, and 7 Low.  (If you ever see this in a live environment, brew a strong pot of coffee.  It is going to be a long night!)

         Trending by Severity

        In the figure above, it shows there was a peak on January 11th

         Top VMs by Intrusion Attempts or Top VMs by Vulnerability Severity

        Above displays Top VMs by Intrusion Attempts

 

All of this information is intended to give a sense of the state of affairs in general and provide an indication of where to focus attention. If you click on the Total Intrusion Attempts, you are brought to the Events screen, shown below.

 

 

Figure 7-7 NSX-T IPS Centralized Events

 

The UI contains the last 14 days of data or 2 million records.  There is a configurable timeframe on the far right for 24 hours, 48 hours, 7 days, or 14 days.  The clickable colored dots above the timeline indicate unique types of intrusion attempts.  The timeline below that can be used to zoom in or out.  Finally, the event details are shown below in tabular form.  On every severity level, there are check boxes to enable filtering.  Event filtering can be based on:

         Attack-target (Server|Client|…)

         Attack-type (Trojan|Dos|web-attack|…) 

         CVSS 

         Product Affected

         VM Name

 

Figure 7.8 below shows the details of an event.

 

Figure 7-8 NSX-T IPS Event Details

Each event contains the following details:

         Severity

         Description/Details

         Attack Type

         Attack Target (if available)

         Signature Revision

         Product Affected (if available)

         Vulnerability Details

        CVSS (Common Vulnerability Scoring System) 

        CVE ID (Common Vulnerabilities and Exposures)

 

Events can be stored on the host via a CLI command for troubleshooting. By default, local event storage is disabled.  When it is enabled, the events are stored in the /var/log/nsx-idps/fast.log file.
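
When local event storage has been enabled, a quick summary of the stored events can be pulled straight from that file.  The sketch below assumes a Suricata-style fast.log line format (with a "[Classification: ...]" field); verify the format on your NSX-T version before relying on it.

# Minimal sketch: summarizing locally stored IDS events from /var/log/nsx-idps/fast.log.
# Assumes Suricata-style fast.log lines containing a "[Classification: ...]" field.
import re
from collections import Counter

CLASSIFICATION = re.compile(r"\[Classification: ([^\]]+)\]")

def summarize_events(path="/var/log/nsx-idps/fast.log"):
    counts = Counter()
    with open(path) as log:
        for line in log:
            match = CLASSIFICATION.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts

for classification, total in summarize_events().most_common():
    print(f"{total:6d}  {classification}")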

 

As was defined earlier, the NSX IPS configuration high level workflow is essentially four steps: Signature download, Enabling IPS, Profile/Rule Definition, and Monitoring. Most of the time will be spent iterating between the last 2 steps after NSX distributed IPS is configured.  New downloads may trigger a need to update profiles and rules, but most of the time will be spent monitoring. One important point to note with respect to IPS: Regular DFW, Layer-7 APP-ID Rules and IPS Rules can be applied to the same traffic, but traffic needs to be allowed by the DFW to be passed through IPS.  In other words, IPS does not apply to dropped traffic.

 

7.5  IPS Use Cases

Although NSX IPS can be used in a wide variety of use cases, four common use cases are examined in the following section: Compliance, Zones, Appliance Replacement, and Lateral Threat Containment. Although they are highlighted as four individual use cases, it is entirely possible that they coexist.

 7.5.1  IPS Use Case: Compliance

 

NSX IPS is typically used to enable software-based IDS/IPS for critical applications to easily achieve compliance requirements such as PCI-DSS, HIPAA, and SOX. Many customers need to meet regulatory compliance, such as HIPAA or PCI-DSS, for their sensitive applications that deal with (for instance) healthcare or financial data. These compliance requirements often specify the need for IDS/IPS to prevent data theft. NSX enables customers to easily achieve regulatory compliance by enabling micro-segmentation to reduce the audit scope and by enabling IDS/IPS selectively on the workloads that need to meet compliance. Certain regulatory requirements specify the need for Intrusion Detection to be enabled for all applications subject to those regulations. Without NSX IPS, that would require all traffic to be funneled through a group of appliances, which could have an impact on data center architecture.  With the combination of NSX DFW and NSX IPS, traffic can be micro-segmented and tagged for IPS as shown in Figure 7-10 below.

 

 


Figure 7--10 NSX-T IPS Compliance

 

In the example above, the PCI application is tagged so that it is firewalled off from the other applications which are co-resident on the server hardware.  IPS can be applied to only that application to meet compliance requirements, without requiring dedicated hardware.  If desired, IPS with a reduced signature set may be applied to only the database portion of the other applications, for example. 

 

This use case highlights the following aspects of NSX IPS:

         Reduced compliance scope

         Selective enablement of IPS throughout the environment

         Apply signatures relevant to compliance zone

         Reducing performance impact and alert noise

 

NSX IPS allows customers to ensure and prove compliance regardless of where the workloads reside, which enables further consolidation of workloads with different compliance requirements on x86.

 

 7.5.2  IPS Use Case: Creating Zones

 

NSX IPS allows customers to create zones in software without the cost and complexity of air-gapped networks or physical separation. Some customers provide centralized infrastructure services to different lines of business or need to provide external partners with access to some applications and data. All customers need to provide proper segmentation between DMZ workloads that are exposed to the outside world or guest Wi-Fi and the internal applications and data. Traditionally, this segmentation between tenants or between the DMZ and the rest of the environment was done by physically separating the infrastructure, meaning workloads and data for different tenants or different zones were hosted on different servers, each with their own dedicated firewalls. This leads to sub-optimal use of hardware resources. The NSX Distributed Firewall and Distributed IDS/IPS allow customers to run workloads that belong to different tenants and different zones on the same hypervisor clusters and provide the same level of segmentation they would get with physical firewalls and IPS appliances, while allowing much higher consolidation ratios. 

 

 7.5.3  IPS Use Case: Appliance Replacement

 

 

With the added functionality of NSX distributed IPS, many customers evolve from legacy appliance-based IPS architectures to NSX distributed IPS. As customers virtualize their data center infrastructure and networking, NSX enables them to replace physical security appliances with intrinsic security that is built into the hypervisor. Doing this for both firewalling with the Distributed Firewall and for IPS with the Distributed IDS/IPS provides a single security policy for both across the whole SDDC.  Further, there is a real savings in terms of rack space, electricity, and cooling with the intrinsic approach. Each data center grade security appliance draws on the order of 10 kW of power, which is almost 90,000 kWh per year – per appliance!  When those appliances are replaced by an intrinsic security architecture which uses the spare cycles of each CPU in the data center, the savings add up quickly, and they come with an enhanced security posture.
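
As a quick sanity check on the energy figure above, assuming a constant 10 kW draw and continuous operation:

# Back-of-the-envelope check of the appliance energy figure quoted above
# (assumes continuous operation at the stated 10 kW draw).
appliance_draw_kw = 10          # assumed draw of one data center grade appliance
hours_per_year = 24 * 365       # 8,760 hours

energy_kwh_per_year = appliance_draw_kw * hours_per_year
print(f"{energy_kwh_per_year:,} kWh per year per appliance")   # 87,600 kWh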

 


Figure 7--11 NSX-T NSX distributed IPS appliance replacement

Beyond the security appliance savings, there is a significant savings in networking infrastructure required to connect things up as shown in figure 7-11.  This use case alone can fund the change to an intrinsic security architecture.

 7.5.4  IPS Use Case: Detecting Lateral Threats

 

NSX IPS allows customers to combine signature-based detection, anomaly detection, and protocol conformance checks. Almost invariably, the actual objective of an attack is not the same as where the attacker initially gained access. This means that an attacker will try to move through the environment in order to steal the valuable data they are after. Hence, being able to defend not just against the initial attack vector, but also against lateral movement, is critical. Micro-segmentation using the distributed firewall is key in reducing the attack surface and makes lateral movement a lot more difficult. Now, for the first time, it becomes operationally feasible to front-end each workload with an Intrusion Detection and Prevention service to detect and block attempts at exploiting vulnerabilities wherever they may exist. This protection exists regardless of whether the attacker is trying to gain initial access to the environment, or has already compromised a workload on the same VLAN and is now trying to move laterally to their target database on that same VLAN.

 


Figure 7-12 NSX-T Lateral Threat Movement

Distributed IPS front-ending every workload enables exploit detection regardless of whether it is the initial attack vector, lateral spread, or exfiltration.

 

 

 

 

  

8 Federation

Most enterprise environments have multiple data centers for scale and/or disaster recovery, each with its own compute resources as well as its own network and security resources. 

 

For simpler solutions, NSX-T offers multi-site. The NSX-T Multisite solution is based on one NSX-T Manager Cluster managing Transport Nodes (hypervisors and Edge Nodes) physically located in multiple sites. In case of high scale needs, a second NSX-T Manager Cluster has to be installed. With multi-site, however, each NSX-T Manager Cluster is independent, and Network and Security objects are not shared between them. The sweet spot for multi-site is 2 locations in a metro region for DR.  For truly diverse data centers, multi-site does not suffice; Federation is designed to address this use case.


 

Figure 8--1 NSX-T MultiSite

 

With NSX-T Federation, Network and Security services are offered by NSX Local Managers (LMs). Local Managers are the very same NSX-T Managers you know; here they are called Local Managers to differentiate them from the new NSX element introduced with NSX-T Federation: the Global Manager (GM). The Global Manager offers operational simplicity, with Network and Security configuration done centrally on the GM and then transparently pushed to all LMs. It also offers consistent policy configuration and enforcement, with network and security objects that are shared across LMs. So, you can create global rules like “all my DMZ Web Servers can talk to my Active Directory Servers” and this single rule will be pushed to and enforced at all the data centers. (Note: All manager connectivity – GM to LM and LM to LM – must not be NATed. Connectivity to the Edge Nodes must also not be NATed.)

 


Figure 8--2 NSX-T Federation

 

8.1  Managers in Federation

 

Each manager logically depicted above represents a manager cluster of three appliances.  Although the GM is represented as one object, it is functionally one active GM cluster with a standby GM cluster in another location as shown in figure 8.3 below.

 

Figure 8-3 NSX-T Federation clusters

The active GM cluster stores the configuration, syncs it to the standby GM, and pushes it to the relevant LM(s).  If, for example, a segment is stretched between Locations 1 and 2, but not 3, the config would only be pushed to the LMs at Locations 1 and 2, not to the LM at Location 3.  The control plane state is synced between the peer LMs.  So, for example, group membership which spans sites is synced between the 2 LMs directly using the “async_replicator” process, whose status is available via the CLI.

 

The UI provides a means for selecting the Location for configuration as shown in figure 8.4 below.

 


 

Figure 8-3 NSX-T Federation UI

 

As mentioned above, when interacting with the GM the configuration is pushed to the LM(s).  However, the LM configuration remains local.  It is not pushed up to the GM. This interaction is shown in figure 8.4

 


Figure 8-4 NSX-T Federation Config push

 

 

8.2  Groups

As explained earlier, groups are a very efficient tool for policy configuration on NSX-T firewalling. 

Groups are also available with Federation, but now there are 3 different types of groups: Global, Regional, and Local.  Global groups are relevant at all locations.  Regional groups are relevant at more than one location, but not all locations.  Finally, local groups are relevant at only one location.  It is important to note that groups can mix spans as shown below in figure 8.5.

 


Figure 8-5 NSX-T Federation Groups

 

Figure 8.5 shows a global group that contains global services such as DNS and NTP.  There is also a regional group which contains AD and proxy services. Finally, there are local App groups.  Note that the Apps in Location 3 consume the Regional services as well as the global services and thus require firewall rules allowing this.

 

As mentioned above, group membership is updated directly from LM to LM.  Groups can be defined by tags so membership may be quite dynamic. 
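
As a sketch of what such a tag-driven group definition looks like through the Policy API (the Local Manager tree is shown here; a Global Manager exposes its own equivalent tree), with the manager address, credentials, group name, and tag value all being placeholders:

# Minimal sketch: a tag-driven Group definition via the Policy API. Membership is
# evaluated dynamically as VMs gain or lose the tag. Verify paths and fields against
# the API guide for your NSX-T version.
import requests

NSX_MANAGER = "https://nsx.example.com"
AUTH = ("admin", "REPLACE_ME")

group = {
    "display_name": "Grp-App-Frontend",
    "expression": [
        {
            "resource_type": "Condition",
            "member_type": "VirtualMachine",
            "key": "Tag",
            "operator": "EQUALS",
            "value": "app|frontend",   # scope|tag value assigned to the VMs
        }
    ],
}

resp = requests.patch(f"{NSX_MANAGER}/policy/api/v1/infra/domains/default/groups/Grp-App-Frontend",
                      json=group, auth=AUTH, verify=False)
resp.raise_for_status()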

 

The following figure shows a topology with stretched T1 services.  

 

Figure 8-6 Stretched T1 services

Here, there are 4 groups, of which three (G1, G2, and G3) are used in NAT rules.  G1 is a local group to Location 1.  G2 is a regional group spanned across Locations 2 and 3.  G3 and G4 are global groups, meaning they span all three locations.  The span of a T1 is (by definition) equal to or a subset of the T0 to which it is connected.

 

For a complete discussion of Federation use cases and configuration, please see the Federation Design Document.

 

 

9 Management and Operations

 

One of the challenges of using legacy firewalling tools for securing modern infrastructure is that (due to their architectural nature) they lack the tools needed to effectively secure and manage a datacenter infrastructure.  As mentioned in the introduction, these legacy firewalls are designed to be at a perimeter with an inside and an outside – a safe side and a suspicious side.  East West Firewalling has no such bearings.  East West Firewalling is about securing everything.  One of the greatest challenges that customers face in implementing East West Firewalling is in defining policy for an infrastructure which has been around for years or even decades. How do you secure an environment which you don’t know or understand?  This is where modern policy management tools come in.  VMware offers 2 such tools: vRNI and NSX Intelligence. Each tool has its own use cases and sweet spots.  This chapter examines those.

 

This chapter closes with a look at operations.  A detailed list of the tasks required for a successful NSX implementation is provided.

  

 

 

9.1  vRNI

 

vRNI is the perfect tool to understand an environment where NSX does not exist.  vRNI uses netflow/IPFIX to understand traffic patterns.  It has visibility to the virtual and physical world by tapping the switches and routers in both worlds. vRNI provides an understanding not only of what is talking to what on which ports, but also a sense of the volume of that traffic flow.  

 


Figure 9--1 vRNI Microsegmentation Planning window

 

 

vRNI is a great tool to assess the rough order of magnitude of the undertaking in question.  vRNI has been used by customers to:

         Determine application interdependencies

         Determine application volume and use

         Obtain NSX compliance and policy suggestions

         Troubleshoot day two issues

 

For NSX compliance and policy suggestions, vRNI can identify flows that are not protected by NSX policy.  One of the views in the Security Planning section is unprotected flows.  When those flows are displayed, the entirety of the suggested security policy can be exported in YAML or XML.  Or, one can click on one wedge to see a suggested security policy (exportable to XML or CSV) for only that wedge.  Figure 9.2 shows both of those screens.

 


Figure 9-2 vRNI Unprotected flows and Recommended Firewall Rules

 

Troubleshooting day two issues is where vRNI excels the most. vRNI makes the entire infrastructure searchable.  If one does not know where two endpoints are, a vRNI query for the path between them will plot them out, with intermediate switches, routers, firewalls, and load balancers all depicted, even if one or both endpoints are containers or reside in public clouds (vRNI integrates into AWS VPCs and Azure VNETs when provided credentials). Figure 9.3 shows a flow which traverses two data centers and an AWS VPC.  What is important to note is that a user can query the path without knowing where the two endpoints are, with the same simplicity as if the endpoints were two VMs sitting next to each other on a host.

 


Figure 9-3 vRNI Path Tool 2 data centers and AWS VPC

 

 

vRNI also provides native integration with every major firewall vendor's management platform.  This allows vRNI to provide a visual end-to-end path which includes the firewalls along the way and the relevant security policy.  This is shown in figure 9.4 below.

 


Figure 9-4 vRNI Path Tool with Palo Alto Networks physical firewall

As can be seen from the figure, not only is the firewall noted in the path, but one can click on the object and view the portion of the firewall policy relevant to these endpoints through vRNI integration with Panorama.  When NSX is deployed, vRNI can help with compliance by pointing out unprotected flows.  It can also help alert on factors that may affect the health of the NSX infrastructure components such as VM storage issues.

 

 

9.2  NSX Intelligence

When NSX is installed, NSX Intelligence is the optimal tool for visualization and policy planning, closing the speed and action gap with network and host informed analytics.  NSX Intelligence is a lightweight central appliance with distributed processing engines inline within the hypervisors which take a single pass approach to provide intelligent policy formulation, as well as security and network analytics. Because NSX Intelligence processing engines lie within the hypervisors, they can increase in processing capacity linearly with the increase in compute.

 

NSX Intelligence has the luxury of complete L7 inspection and endpoint context for every workload.  This is combined with bi-directional intelligence feeds from external sources.  When NSX is installed, NSX Intelligence is the optimal tool to:

         Automate Micro-segmentation/Firewalling at Scale

         Demonstrate and Maintain Policy Compliance

         Simplify Security Incident Troubleshooting

 

 When used to automate firewalling at scale, NSX Intelligence provides a repository for policy management and enforcement.  NSX Intelligence will generate new recommendations upon detecting changes to policy. This allows you to create a baseline recommendation, then let NSX Intelligence learn the desired DFW policy.  As part of optimal policy design, NSX Intelligence can discover groups of up to 250 members based on VM membership changes. In providing automated firewall policy recommendation which can be pushed directly to the NSX firewall, NSX Intelligence speeds up the securing of complex, unknown east west environments.  NSX Intelligence also provides an iterative workflow with continuous updates to topology visualization.  

 


Figure 9-5 NSX Intelligence Policy Recommendation, Viewed in DFW table

 

 

For compliance, NSX Intelligence provides a complete record of every flow, from every workload.  NSX Intelligence also provides correlated flows and policies to highlight misconfigurations, policy exemptions, and non-compliant flows between workloads or security scopes.  Most importantly, NSX Intelligence provides continuous analysis so that the above information is always accurate and current.

 


Figure 9-6 NSX Intelligence New Recommendation Upon Detected Changes

 

For troubleshooting, NSX Intelligence provides comprehensive visibility of the NSX environment for security teams. Notably, NSX Intelligence provides Layer 7 analysis of every flow, without sampling, for optimal fidelity. The Drill-down topology visualization combines application maps and complete workload inventory. By default, community grouping is activated when the UI detects more than 1,000 nodes. Multi level visualization allows NSX Intelligence to scale to enterprise environments.  This is shown in figure 9.7 below.

 

Figure 9-7 NSX Intelligence

 

 

To summarize, vRNI and NSX Intelligence are two complementary tools which coordinate for a complete security management solution. vRNI is the perfect tool for understanding the scope of an environment without NSX.  Once NSX is installed, the simplified rule recommendations and deployment mean one-click firewalling.  For day two operations, vRNI assists in the micro-seg planning by app modeling and grouping, leveraging information from sources such as Service Now.  For end-to-end infrastructure visibility across both the physical and virtual environments, nothing beats vRNI.

 

Figure 9-8 NSX Intelligence vs vRNI

 

 

9.3  SIEM

Security Information and Event Management tools (aka SIEM or syslog tools) are an important part of any security approach for early detection of attacks and breaches. SIEM tools collect and aggregate data from a variety of sources (devices, endpoints, applications, and even services).  In addition to writing RFC 5424 compliant syslog messages to a local file (in the /var/log/ directory), NSX can be configured to send logs to a remote syslog server via the CLI (with the set logging-server command). Syslog is supported on the NSX Manager, the NSX Edges, and the hypervisors. On hypervisors, the tac, tail, grep, and more commands can be used to navigate the logs.  The audit log is part of syslog.
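
For environments that prefer to automate this instead of using the CLI, the sketch below adds a syslog exporter through the NSX-T Manager node REST API.  The manager address, collector address, and credentials are placeholders, and the exact payload fields should be verified against the NSX-T API guide for your version.

# Minimal sketch: adding a remote syslog exporter on an NSX-T Manager node through
# the node REST API (the API-level counterpart of the set logging-server CLI command).
# Addresses, names, and credentials are placeholders; verify fields against the API guide.
import requests

NSX_MANAGER = "https://nsx.example.com"
AUTH = ("admin", "REPLACE_ME")

exporter = {
    "exporter_name": "siem-exporter",   # hypothetical name
    "server": "10.1.1.100",             # remote syslog/SIEM collector
    "port": 514,
    "protocol": "UDP",
    "level": "INFO",
}

resp = requests.post(f"{NSX_MANAGER}/api/v1/node/services/syslog/exporters",
                     json=exporter, auth=AUTH, verify=False)
resp.raise_for_status()
print("Syslog exporter configured, HTTP status:", resp.status_code)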

 

NSX includes a license for vRealize Log Insight.  Log Insight (as it’s lovingly known) is commonly used to front-end larger SIEM installations such as Splunk to reduce the cost burden of the latter.  In doing so, many customers also find the NSX-T content pack for Log Insight to provide significant value. Instructions for pointing NSX-T audit and syslogs to Log Insight can be found in the VVD here. 

 

 

Figure 9-9 NSX Content Pack for vRealize Log Insight

For those who prefer to ingest the NSX syslog data directly into Splunk, there is an NSX-T App for Splunk.

 


Figure 9-10 NSX Content Pack for Splunk

 

 

9.4  NSX Operations

 

One of the most frequently asked questions from customers is “How do I operationalize this?” This question is similar to the question of “How do I run a household?” In both cases, while there is no strict list of exactly how to do things, there is a list of tasks which need to be addressed for success. Much like who takes the garbage out will vary house to house, a household cannot succeed without that task being done. Similarly, which person assumes a given NSX task will vary from company to company, based on the culture at each company. But for a successful implementation, the listed tasks need to be addressed. The following table describes some of the high-level tasks that need to be done to operationalize NSX:

 

 

Task | Responsible Role | Comments | Role in NSX

Architecture

Design and publish detailed NSX designs and drawings | Cloud Security Architect | In collaboration with Network Architect. | Planning activity

Application assessment and migration strategy | Cloud Security Architect | Migration of applications from existing physical firewall to logical firewall. | Depends on approach

Define common services for applications | Cloud Security Architect | Services for modern and traditional applications. IT services for applications, e.g., DNS, NTP, DHCP, AD. The admin needs to understand where the services reside and how to track changes. | Security Admin

Define Security Group model for DFW | Cloud Security Architect | Working with Engineers, in collaboration with Application teams. | Security Admin

Obtain overall approval from Security on architecture | Cloud Security Architect | | Planning activity

Define support staff authorization policy and list (who/what) for NSX Manager (Admin / User Roles) | Cloud Security Architect | Working with Engineering and Cloud Operations leadership. | Enterprise Admin

Define, sign off, and publish NSX Security object naming and tagging conventions | Cloud Security Architect | | Security Admin

Design blueprint security tagging policy | Cloud Security Architect | | Security Admin

Define policy and ports for application access (e.g., Web, App, DB) | Cloud Security Architect | | Security Admin

Security approval process for new services (e.g., FW policies) | Cloud Security Architect | Working with Engineers. For example, health monitoring. | Security Admin

Specify posture for Security Zones | Cloud Security Architect, Cloud Network Architect | Build the security framework for the Test and Development zone, Production zone, DMZ, etc. | Security Admin

Guide engineering and operations teams with implementation and onboarding | Cloud Architect, Cloud Admin | | Not Applicable to a specific role in NSX

Identify, evaluate, and recommend automation and operations tools | Cloud Architect | | Not Applicable to a specific role in NSX

Define alerting and notification model | Cloud Architect | | NSX Admin

Auditing and reporting processes for compliance | Cloud Architect | Define the processes for audit and reporting for compliance. | Not Applicable to a specific role in NSX

Engage in Tier 3 support as needed | Cloud Security Architect | Advanced troubleshooting and architectural changes. | Not Applicable to a specific role in NSX

Planning for security in the physical network | Cloud Security Architect | In collaboration with the Infrastructure Security Team. For example, inter-rack connectivity and communication with NSX appliances. | Not Applicable to a specific role in NSX

Engineering

Building the automation and orchestration model | Cloud Tooling Engineer | Development of blueprints and templates for automation, e.g., vRealize suite, OpenStack, Puppet, Chef, etc. | Not Applicable to a specific role in NSX

Deploy and test defined blueprints | Cloud Security Engineer | | Not Applicable to a specific role in NSX

Deploy Operations Tools for monitoring and troubleshooting | Cloud Tooling Engineer | vRNI dashboards, NSX dashboards, runbooks. The VI admin needs to configure syslog for hosts. | Security Admin

Build, manage, and maintain NSX infrastructure | Cloud Infrastructure Engineer | Deploy, test, validate, and certify the infrastructure: capabilities, configurations, integrations, and interoperability. Ensure fulfillment of requirements (capacity, availability, security, and compliance), and ensure backup and restore of NSX Manager data. Upgrade and patch infrastructure and tools. | NSX Admin

Build, manage, maintain, and customize provisioning, monitoring, and troubleshooting tools | Cloud Tooling Engineer | | NSX Admin

Build all common services for applications | Cloud Security Engineer | Working with Architects, build the services for applications such as firewall services. | Security Admin

Modify policy/ports on an ongoing basis | Cloud Security Engineer | Check to see if this can be handed over to the Operations team. Implement routine, approved, and exception changes. | Security Admin

Implement micro-segmentation security model | Cloud Security Engineer | Security groups, tags, policies, service insertion. Using NSX Intelligence. | Security Admin

Implement logging for security events based on architecture | Cloud Security Engineer | The Cloud Security Architect will also be involved. Via vRNI, Log Insight, Splunk. | NSX Admin

Implement alerts and notifications | Cloud Tooling Engineer | Implement alerts and notifications for events in monitoring systems. | NSX Admin

Implement access control to NSX infrastructure components | Cloud Security Engineer | Working with Architects. | Enterprise Admin

Build security for the physical network (e.g., physical firewall rules) | Cloud Security Architect | In collaboration with the Infrastructure Security Team. For example, inter-rack connectivity and communication with NSX appliances. | Not Applicable to a specific role in NSX

Tier 2 support | Cloud Security Engineer | Diagnose and analyze root cause of issues. Apply patches and fixes as needed. | n/a

Operations

NOC and SOC staff manage NSX operations | Operations Director | Working with Engineers. | Auditor

Deploy application topologies based on blueprints/templates | Operations Engineer | vRA catalog to deploy network topologies and instances. | Automated already by engineering

Tier 1 support for infrastructure and security | Operations Engineer | Document tickets, respond to alerts and alarms, basic break-fix tasks, document alert/alarm messages, track tickets to closure, and escalate to Tier 2 as needed. | Not Applicable to a specific role in NSX

Respond to exception/failure issues on build/run automation | Operations Engineer | Working with Engineering. | Not Applicable to a specific role in NSX

Monitoring, alerting, and troubleshooting NSX and physical security infrastructure | Operations Engineer | Infrastructure, applications, and security. Responding to alerts and notifications. vRNI. Follow the runbooks from Engineering to perform the tasks. | No specific NSX role required

 

   

 

 

 

 

 

 

  
