NSX Troubleshooting tips

ESXi Host Level Troubleshooting

1. How to verify that VIBs are successfully installed on the ESXi host:

Verify that the NSX VIBs are installed and at the correct version on the ESXi host:

esxcli software vib list

(This lists every VIB installed on the host; grep for the vxlan and vsip VIBs.)

esxcli software vib get --vibname esx-vxlan

esxcli software vib get --vibname esx-vsip

Verify that the VXLAN kernel module vdl2 is loaded on the ESXi host:

vmkload_mod -l | grep vdl2

Find the VDS name associated with this host’s VTEP:

esxcli network vswitch dvs vmware vxlan list

If any of these commands does not return the expected output, that indicates a problem, and the logs should be checked.
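The checks in this step can be wrapped into a small script so they run in one pass. This is a minimal sketch, assuming a POSIX shell on the host; the function names (check_vib, check_module, check_nsx_host) are hypothetical, and the matching is a plain grep on the command output rather than anything NSX-specific.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of NSX or ESXi.

# Print OK/MISSING for one VIB name, based on `esxcli software vib list`.
check_vib() {
    if esxcli software vib list | grep -q -- "$1"; then
        echo "OK      vib $1"
    else
        echo "MISSING vib $1"
        return 1
    fi
}

# Print OK/MISSING for one kernel module, based on `vmkload_mod -l`.
check_module() {
    if vmkload_mod -l | grep -q -- "$1"; then
        echo "OK      module $1"
    else
        echo "MISSING module $1"
        return 1
    fi
}

# Run all checks; the return status is non-zero if anything is missing.
check_nsx_host() {
    rc=0
    check_vib esx-vxlan || rc=1
    check_vib esx-vsip || rc=1
    check_module vdl2 || rc=1
    return $rc
}
```

Calling check_nsx_host on the host prints one line per check; any MISSING line is the cue to look at the logs.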

Relevant logs to be checked are:

/var/log/esxupdate.log

/var/log/vmkernel.log
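To skim these logs quickly, a small helper like the one below can pull out recent lines that mention errors or failures. It is only a sketch: the function name recent_errors, the search pattern, and the 20-line limit are my own assumptions, not values from NSX documentation.

```shell
#!/bin/sh
# Show the most recent lines mentioning errors or failures in a log file.
# The pattern and the 20-line limit are arbitrary illustrative choices.
recent_errors() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -i -E 'error|fail' "$log" | tail -n 20
}

# Usage on the host:
#   recent_errors /var/log/esxupdate.log
#   recent_errors /var/log/vmkernel.log
```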

Syslog collectors such as Log Insight can be configured to raise alerts when certain messages are detected in the logs.

2. How to verify that the control plane is up between the host and the controller, per logical switch:

Verify the logical network information and the control-plane connection per logical switch:

esxcli network vswitch dvs vmware vxlan network list --vds-name <VDS_Name>

Verify the message bus TCP connection (vsfwd):

esxcli network ip connection list | grep 5671

Verify the controller TCP connection (netcpad):

esxcli network ip connection list | grep 1234

Verify the controller connection service on the host:

/etc/init.d/netcpad <status/start/stop/restart>

Verify the firewall process running on the host:

/etc/init.d/vShield-stateful-firewall <status/start/stop/restart>

If VMs attached to a logical switch are running on this host, the output of the vxlan network list command should show controller connections (one connection for each logical switch that has an attached VM running on this host).

Check whether each controller connection shows “up” or “down”. Any connection that is down warrants further debugging: check the logs on the host and, if needed, log in to the controllers.
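The connection checks in this step can be combined into one helper. This is a rough sketch under a few assumptions: the function names are hypothetical, the TCP check simply counts lines in ESTABLISHED state for the given remote port, and the logical-switch check just greps for "down" in the command output rather than parsing specific columns.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of NSX or ESXi.

# Count ESTABLISHED TCP connections to a given remote port
# (5671 = message bus / vsfwd, 1234 = controller / netcpad).
established_count() {
    esxcli network ip connection list | grep ":$1 " | grep -c ESTABLISHED || true
}

# Print logical-switch rows whose controller connection is reported as down.
down_connections() {
    esxcli network vswitch dvs vmware vxlan network list --vds-name "$1" | grep -i 'down'
}

# Summarize both checks for one VDS (name passed as the first argument).
check_control_plane() {
    vds="$1"
    for port in 5671 1234; do
        n=$(established_count "$port")
        echo "port $port: $n established connection(s)"
    done
    if down_connections "$vds"; then
        echo "at least one controller connection is down; check netcpad.log"
        return 1
    fi
    echo "no controller connection reported as down"
}
```

For example, check_control_plane <VDS_Name> prints the connection counts first and then any rows reported as down.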

Relevant logs to be checked are the netcpa and vsfwd communication channel logs:

/var/log/netcpad.log

/var/log/vsfwd.log

VCM installation issues – Troubleshooting

Hi All,

This was the first time I worked on a VCM (vRealize Configuration Manager) single-tier deployment for a customer. I ran into an issue along the way, and I am sharing it and its resolution in this post.

  • VCM version – 5.8.3
  • Single Tier – SQL server database instance was deployed and configured
  • SQLXML and the SSRS service were also configured.
  • SQL was running with a service account

When I ran the VCM installer, it passed all the prerequisite checks and completed the VCM installation in about 30 minutes. After the installation I rebooted the server as a routine step, and after that reboot the SQL service would not start at all.

After multiple reboots, I reviewed the logs and found the error message below:

Event Viewer > Application logs

SQL logs

C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\error.log

The server was unable to initialize encryption because of a problem with a security library. The security library may be missing. Verify that security.dll exists on the system.


Next, I searched around. Because VCM uses TLS 1.0, I opened the registry path below to check the settings on the server:

\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\

I changed the settings as below under both the Client and Server subkeys:

DisabledByDefault – 0

Enabled – 1
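For reference, the same two values can be written as a registry file. This is a sketch of what I would expect the change to look like, assuming both values are DWORDs and that the key name is "TLS 1.0" with a space, as SCHANNEL normally uses; back up the registry before importing anything like this.

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Client]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Server]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001
```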

I restarted the machine after that, and wow, the SQL service started automatically.

However, I then ran into the next problem: I was unable to access the VCM console or login page, and got the error below:

“Your id is not allowed to access, please contact your administrator”

This was very strange, as I was using the service account to log in to the VCM portal. After some searching I came across VMware KB https://kb.vmware.com/kb/2000958 and then checked the ecm_sysdat_logins table in the VCM database. To my surprise, the login active value was 0 for all the login accounts. I used the steps in the KB to allow login for a few accounts, after which the VCM login page was accessible and I was able to configure VCM.

 

VMware Named to Great Place to Work® and Fortune 2017 “Best Companies to Work For” List [VMware Radius]

Today, VMware received a prestigious and public acknowledgement of our high-impact workplace as one of Fortune’s 100 Best Companies to Work For.


VMware Social Media Advocacy

Hi All,

I just wanted to share another issue I experienced, this time related to the license service. I ran into it after rebooting the PSC and vCenter machines following a maintenance activity.

A little background on the setup: the PSC is external and Windows-based, and vCenter is also Windows-based. After the reboot, clicking the “Licensing” tab in the Web Client or vSphere Client gave the error: Assigning VC license failed with class Vmacore::Soap::InvalidResponseException(Invalid response code: 503 Service Unavailable).

I checked the license service log on the PSC machine, and also found the error below:

Vpxd::License::LicenseClientFaultTolerance::ProcessLicenseChanges threw class Vmacore::Exception(License client start has failed.)

I rebooted the PSC machine once again, but the issue remained. After a bit of digging through the logs, it became clear that the license client was timing out during its wait period because other PSC services were taking a long time to start.

So I stopped the VMware Directory Service on the PSC (which in turn stops all the dependent services), then started the license service first, followed by the VMware Directory Service.

As I guessed, this fixed the issue and I was able to view the licenses under the Licenses tab.

I hope this helps if any of you run into a similar issue.