VMware NSX-T Data Center 3.x – Load Balancer on an NSX-T environment stops working

If you cannot access the backend servers through the Load Balancer, or the backend server pages take several seconds to load, then your Load Balancer process is not handling the server keep-alive in the HTTP header.

Have a look at the NSX-T Edge syslog (/var/log/syslog) for LOAD-BALANCER worker processes exiting on signal 11 (core dumped):

NOTE: The NGINX core dump can be found on the NSX-T Edge in /var/log/core (example: core.nginx.1606488022.24628.134.11.gz)

2020-12-24T12:02:38.474876+00:00 edge01 NSX 5074 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="INFO"] [c3db5e55-4ac6-48e4-b154-a2e6705dba25] signal 17 (SIGCHLD) received from 18485
2020-12-24T12:02:38.475010+00:00 edge01 NSX 5074 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="FATAL"] [c3db5e55-4ac6-48e4-b154-a2e6705dba25] worker process 18485 exited on signal 11 (core dumped)
2020-12-24T12:02:38.475091+00:00 edge01 NSX 5074 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="INFO"] [c3db5e55-4ac6-48e4-b154-a2e6705dba25] start child: worker process, pid: 20067, gen_id: 15, worker_counter: 369
2020-12-24T12:02:38.475195+00:00 edge01 NSX 5074 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="INFO"] [c3db5e55-4ac6-48e4-b154-a2e6705dba25] signal 29 (SIGIO) received
2020-12-24T12:02:38.534849+00:00 edge01 NSX 20067 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="INFO"] [c3db5e55-4ac6-48e4-b154-a2e6705dba25] signal 17 (SIGCHLD) received from 18485
2020-12-24T12:02:38.534932+00:00 edge01 NSX 20067 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="FATAL"] [c3db5e55-4ac6-48e4-b154-a2e6705dba25] worker process 18485 exited on signal 11 (core dumped)
2020-12-24T12:02:38.534967+00:00 edge01 NSX 20067 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="INFO"] [c3db5e55-4ac6-48e4-b154-a2e6705dba25] L4LB cp 0 successfully started (sock:123
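On a busy Edge the syslog can be long. Below is a minimal Python sketch for pulling out only the crash lines, assuming the log format shown in the excerpt above; the function name find_lb_worker_crashes is hypothetical and not part of NSX-T.

```python
import re

# Assumes the FATAL line format from the excerpt above, e.g.
# "... worker process 18485 exited on signal 11 (core dumped)"
CRASH_RE = re.compile(r"worker process (\d+) exited on signal 11")


def find_lb_worker_crashes(lines):
    """Return (timestamp, pid) for each load balancer worker that died on signal 11 (SIGSEGV)."""
    crashes = []
    for line in lines:
        # Only look at load balancer entries.
        if "LOAD-BALANCER" not in line:
            continue
        match = CRASH_RE.search(line)
        if match:
            # The timestamp is the first space-separated field of each syslog line.
            timestamp = line.split(" ", 1)[0]
            crashes.append((timestamp, int(match.group(1))))
    return crashes
```

Usage, on the Edge or on a copy of the file: `with open("/var/log/syslog", errors="replace") as log: print(find_lb_worker_crashes(log))`. The same crash can be reported more than once, as in the excerpt above, where PID 18485 is reported by both process 5074 and process 20067.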

This issue is resolved in VMware NSX-T version 3.1.1.

If you are not ready to upgrade, here is a workaround:

Disable the NTLM Auth and Server Keep-Alive parameters on the HTTP application profile.

To disable Server Keep-Alive from the UI:

Select Networking > Load Balancing > Profiles > Application and edit the HTTP application profile
Set ‘Server Keep-Alive’ to disabled
Save

To disable NTLM Auth and Server Keep-Alive via the API:

List the application profiles and find the ID of your HTTP application profile in the output:

GET https://<nsx-mgr>/api/v1/loadbalancer/application-profiles
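The list can also contain TCP and UDP profiles. A small sketch that keeps only the HTTP ones, assuming the usual NSX-T list format (a "results" array whose entries carry "id", "display_name" and "resource_type", with "LbHttpProfile" for HTTP profiles); http_profile_ids is a hypothetical helper name:

```python
import json


def http_profile_ids(list_response_body):
    """Map display_name -> id for HTTP application profiles.

    Assumes the body of GET /api/v1/loadbalancer/application-profiles has a
    "results" array and that HTTP profiles have resource_type "LbHttpProfile".
    """
    data = json.loads(list_response_body)
    return {
        # Fall back to the id if a profile has no display name.
        profile.get("display_name", profile["id"]): profile["id"]
        for profile in data.get("results", [])
        if profile.get("resource_type") == "LbHttpProfile"
    }
```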

Retrieve the configuration of the application profile, using the ID collected with the previous command:

GET https://<nsx-mgr>/api/v1/loadbalancer/application-profiles/<application-profile-id>
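If you prefer to script this step instead of using a REST client, here is a sketch using only the Python standard library. It assumes basic authentication with an NSX Manager user; build_get_profile_request is a hypothetical helper and only builds the request, it does not send it.

```python
import base64
import urllib.request


def build_get_profile_request(nsx_mgr, profile_id, user, password):
    """Build (but do not send) the GET request for one application profile.

    Assumes HTTP basic authentication against the NSX Manager.
    """
    url = f"https://{nsx_mgr}/api/v1/loadbalancer/application-profiles/{profile_id}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
```

Send it with `urllib.request.urlopen(request, context=ssl_context)`. If the NSX Manager uses a self-signed certificate, the SSL context has to trust that certificate.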

Copy the entire body returned by the previous command into the body of the PUT request below, and change two values in it:

"ntlm": true -> set this to false

"server_keep_alive": true -> set this to false

PUT https://<nsx-mgr>/api/v1/loadbalancer/application-profiles/<application-profile-id>
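For the scripted route, the PUT request carries the modified body as JSON. Another standard-library sketch; build_put_profile_request and its pre-built auth_header argument (for example "Basic " plus base64 of user:password) are assumptions, not part of the NSX-T documentation.

```python
import json
import urllib.request


def build_put_profile_request(nsx_mgr, profile_id, body, auth_header):
    """Build (but do not send) the PUT request that writes the modified profile back.

    `body` is the full profile dict from the GET, with ntlm and
    server_keep_alive already set to False.
    """
    url = f"https://{nsx_mgr}/api/v1/loadbalancer/application-profiles/{profile_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Authorization": auth_header, "Content-Type": "application/json"},
    )
```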

If the PUT request returns this error:

"httpStatus": "BAD_REQUEST",
"error_code": 289,
"module_name": "common-services",
"error_message": "Principal 'admin' with role '[enterprise_admin]' attempts to delete or modify an object of type LoadBalancerHttpProfile it doesn't own. (createUser=nsx_policy, allowOverwrite=null)"

Add the following request header (your REST client may not auto-fill it, but it still works):

Key = X-Allow-Overwrite

Value = true
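In a script, this fallback can be automated: check the error body for error_code 289 (the ownership error above, where the profile was created by nsx_policy) and resend the request with the extra header. A sketch with hypothetical helper names; urllib normalises the header name to X-allow-overwrite, which is fine because HTTP header names are case-insensitive.

```python
import json
import urllib.request

# error_code taken from the BAD_REQUEST response quoted above.
OWNERSHIP_ERROR_CODE = 289


def needs_overwrite_header(error_body):
    """True if a failed PUT returned the 'object it doesn't own' error."""
    try:
        error = json.loads(error_body)
    except json.JSONDecodeError:
        return False
    return isinstance(error, dict) and error.get("error_code") == OWNERSHIP_ERROR_CODE


def with_allow_overwrite(request):
    """Add the X-Allow-Overwrite: true header to an existing urllib Request."""
    request.add_header("X-Allow-Overwrite", "true")
    return request
```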


Visit my FB page: https://www.facebook.com/AngrySysOps

Subscribe to my YouTube channel: https://www.youtube.com/channel/UCRTcKGl0neismSRpDMK_M4A
