Welcome to the DEMO for vRealize Network Insight (vRNI) Integration with VMware SD-WAN. In this Demo we will show how vRNI can monitor traffic flows across many domains such as SD-WAN, NSX Datacenters, AWS cloud and VMC and provide a unified view.
This is the topology that we will be using for this Demo. In this topology, we have a few SD-WAN branches at Pune, Amsterdam, Detroit and the Washington Data Center (WDC). We have an SD-WAN Hub site at OVH. The scenario involves a Client in the Detroit branch who is trying to access the HIVE Video training application front-end, which resides in WDC. The backend for this application resides in AWS. The WDC site is an NSX datacenter with DELL switches and servers.
We have an end user at the Detroit branch who is trying to access the HIVE Video training application to watch a training video on how to install vRealize Network Insight.
The problem is that the application video is buffering and will not play smoothly for the end user at the Detroit branch.
We will start by showing the video at the Detroit branch to see that there is an actual issue.
Start the Troubleshooting on vRNI
Let us start the troubleshooting in Network Insight.
Because the Detroit branch end user reported a problem with the HIVE application, we are going to start from our application view.
From here we can see our applications that we are monitoring within our data center.
- Click on Application.
- Click on HIVE training application.
- Once the HIVE training application is loaded, we can see an overview of our application.
- We can now see the Video tier, the Storage tier, the Detroit branch, the Rotterdam branch and other entities that take part in this application.
- We can see that the video tier, which is the video front end hosted in our Washington data center, shows an indicated problem.
- We can also see two red lines indicating flow issues: one between the video tier and the Detroit branch, and one between the video tier and the storage tier, which is hosted in AWS EC2 and is the backend of the application.
- Click on the Flows and Degradation analysis between the Detroit branch and the video tier in our Washington data center.
- We will now see all of the flows occurring between the Washington data center HIVE server and our Detroit branch.
- Let us scroll down to the active flow with traffic, which we can see here is originating from 184.108.40.206 at the Detroit branch and connecting to the HIVE video server at our Washington data center.
- Click on the actual flow.
- Here we get a full end-to-end topology view from our Detroit branch end user, over our Velocloud SD-WAN, into our Washington data center, through NSX, and on to the video server running in our Washington data center.
- Click on the VM underlay to switch to the Hive video server.
- We can see that there are alerts on our VM and our Physical NIC in our underlay.
- Click on the physical NIC where we have an alert.
- Here we can see that our Physical NIC has an issue with packets being dropped on port Ethernet1/1/28:1, which is on our underlay physical switch.
- Click on Ethernet1/1/28:1, which is the peer port to our Physical NIC on our underlay switch.
- Now we can see the information for Ethernet1/1/28:1 on our Physical underlay switch and in the events, we can see that there are packet drops occurring at different points, such as the VTEP and switch port, including the associated peer PNIC port.
- Let us scroll down and look at the actual switch port metrics.
- Click on the 6h view to Zoom in.
- Click on Maximize to get a better view.
- From this view we can see that traffic on the physical Ethernet port has reached 10 Gbps, and the interface speed for the switch port is also 10 Gbps, so the port is saturated. We can also see dropped packets, and the interface peak buffer utilization streamed from the underlay switch to Network Insight shows that the buffer is full. This is a clear indication of network saturation on this physical port.
- Click Next
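The reasoning in the last step (throughput equal to interface speed, plus drops, plus a full buffer) can be written down as a small check. This is only a sketch of that reasoning and not how Network Insight evaluates a port: the `PortSample` type, the `is_saturated` function, both thresholds and the drop count are assumptions made up for illustration. Only the 10 Gbps figures come from the demo.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """One reading of switch-port metrics (hypothetical structure)."""
    throughput_gbps: float
    dropped_packets: int
    buffer_utilization_pct: float


def is_saturated(sample: PortSample,
                 interface_speed_gbps: float,
                 utilization_threshold: float = 0.95,
                 buffer_threshold_pct: float = 90.0) -> bool:
    """Return True when all three saturation signals are present.

    The thresholds are illustrative assumptions, not product defaults.
    """
    utilization = sample.throughput_gbps / interface_speed_gbps
    return (utilization >= utilization_threshold
            and sample.dropped_packets > 0
            and sample.buffer_utilization_pct >= buffer_threshold_pct)


# Values mirroring the demo: 10 Gbps of traffic on a 10 Gbps port.
# The drop count is a placeholder; the demo does not state one.
sample = PortSample(throughput_gbps=10.0,
                    dropped_packets=1200,
                    buffer_utilization_pct=100.0)
print(is_saturated(sample, interface_speed_gbps=10.0))
```

Requiring all three signals together avoids flagging a port that is briefly busy but is not dropping traffic.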
To quickly resolve the issue for the end user so that they can complete their training, we are going to vMotion this VM to a different host that is on a different physical switch port in the underlay.
- We will vMotion our HIVE Video server from Host 16 to Host 15.
- Click Next
- Our vMotion has completed and now the HIVE server is located on Host 15.
- Click Next
- We will now go back and take a look at the video at the Detroit branch to ensure that it is functioning as expected.
- This is one way to troubleshoot an individual application within our data center. If we want a view of our entire data center to see whether any application flows are performing poorly, we can simply go to Network Insight - Analytics.
- Click on Flow insights
- Click on Analyze for the last 24 hours. This will load up all of our top talkers inside our data center.
- Click on Network Performance. In the network performance chart, we can now see all of the flows within our data center. We can see that we have 10 abnormal flows occurring in our data center, all responding at different rates. This is based on the deviation of the flows from the baseline.
- Click on the Abnormal Flows. vRNI does automatic baselining and sets thresholds based on that baseline. Abnormal flows are the ones exceeding the baseline. We can see all of the details around TCP round trip time, packet drop and flow traffic for our abnormal flows and can begin troubleshooting from there. This is another option for troubleshooting abnormal flows from an entire data center view.
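To make the idea of "exceeding the baseline" concrete, here is a minimal sketch of one common way to baseline a metric: flag a flow when its latest value is above the historical mean plus a few standard deviations. The demo does not describe the algorithm vRNI actually uses, so the method, the function names, the factor `k` and the sample round trip times below are all assumptions for illustration.

```python
from statistics import mean, pstdev


def baseline_threshold(history, k=3.0):
    """Threshold = mean of past samples plus k standard deviations (assumed rule)."""
    return mean(history) + k * pstdev(history)


def abnormal_flows(history_by_flow, latest_by_flow, k=3.0):
    """Return the names of flows whose latest sample exceeds their own baseline."""
    flagged = []
    for flow, history in history_by_flow.items():
        if latest_by_flow[flow] > baseline_threshold(history, k):
            flagged.append(flow)
    return flagged


# Hypothetical TCP round trip times in milliseconds for two flows.
history = {
    "flow-a": [10.0, 12.0, 11.0, 9.0, 10.0],
    "flow-b": [20.0, 21.0, 19.0, 20.0, 20.0],
}
latest = {"flow-a": 11.0, "flow-b": 45.0}

print(abnormal_flows(history, latest))  # ['flow-b']
```

Each flow is compared against its own history, which is why flows with very different normal round trip times can be judged on the same chart.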
vRNI and VMware SD-WAN
Let us now explore the integration between vRNI and VMware SD-WAN. Let us go to the Network Insight portal.
- Click Next.
- Click on Velocloud Enterprise Dashboard.
- We now see that we have one application that needs attention.
- Click on Applications (1 Needs Attention).
- We can see that it’s the HIVE Media Application that needs attention.
- Click on HIVE_Media.
- This shows packet drops on the Edge.
- Click on Packet drop Event on Edge.
- We can see that Packet Loss value is 6.7% which exceeds the threshold value of 3% for the application.
- Click on the Detroit Branch to see the details.
- Click on Application packet.
- Here are the details on the Application packet loss.
- Notice the 1 Unhealthy application in the topology.
- Click Next.
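The "needs attention" state in the steps above comes down to a comparison between measured packet loss and a per-application threshold. A minimal sketch of that comparison follows. The 6.7% and 3% values are from the demo, while the function names and the packet counts are hypothetical.

```python
def loss_percent(sent: int, lost: int) -> float:
    """Packet loss as a percentage of packets sent."""
    if sent == 0:
        return 0.0
    return 100.0 * lost / sent


def needs_attention(loss_pct: float, threshold_pct: float = 3.0) -> bool:
    """True when measured loss exceeds the application's threshold."""
    return loss_pct > threshold_pct


# Hypothetical counts that work out to the 6.7% loss shown in the demo.
measured = loss_percent(sent=1000, lost=67)
print(needs_attention(measured))  # True
```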
Troubleshoot on the VCO
- Navigate to the Configure-Profiles Section.
- Click on the Media Application category.
- Notice that the HIVE Media Application is set to Low Priority and is being rate limited to 1% of the Link Bandwidth. It is also set to Direct, which means it is not using DMPO. We will change it to Multi-Path.
- Click on High to change the Priority. Multi-Path will be selected automatically.
- Click to uncheck the Rate Limit Check box.
- Click OK.
- Click on Save Changes.
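The effect of the policy change can be illustrated with a small model of the settings before and after. This is a sketch only: `BusinessPolicy` and `bandwidth_cap_mbps` are hypothetical names, not VCO objects or API calls, and the 100 Mbps link speed is an assumed figure chosen to make the arithmetic easy to follow.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BusinessPolicy:
    """Simplified stand-in for the settings changed in the steps above."""
    application: str
    priority: str                     # e.g. "Low" or "High"
    network_service: str              # "Direct" or "Multi-Path"
    rate_limit_pct: Optional[float]   # None means the rate limit is unchecked


def bandwidth_cap_mbps(policy: BusinessPolicy, link_mbps: float) -> float:
    """Bandwidth available to the application on one link (simplified)."""
    if policy.rate_limit_pct is None:
        return link_mbps
    return link_mbps * policy.rate_limit_pct / 100.0


before = BusinessPolicy("HIVE_Media", "Low", "Direct", 1.0)
after = replace(before, priority="High",
                network_service="Multi-Path", rate_limit_pct=None)

# On an assumed 100 Mbps link, a 1% rate limit leaves 1 Mbps for video.
print(bandwidth_cap_mbps(before, 100.0))  # 1.0
print(bandwidth_cap_mbps(after, 100.0))   # 100.0
```

Under this simplified model, the 1% rate limit alone leaves very little bandwidth for a video stream.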
Back to Network Insight
- We notice that it now states All Healthy.
- Let us narrow down to the HIVE application now.
- Click on HIVE_Media.
- Click on the + Button to expand the view.
- We notice that the packet drops have stopped.
In this DEMO, we have explored two different troubleshooting scenarios:
- The first scenario was a problem outside of the SD-WAN Domain, within the NSX-T Datacenter on the DELL switches. We were able to vMotion the VM instance to another host and resolve the issue.
- The second scenario was a misconfiguration of a Business Policy for the HIVE application by the administrator. We were able to isolate the problem, fix it on the VCO, and then confirm the results in Network Insight.