Once admin is getting notifications form (ab)users about issue with login to vCenter, there are few possibilities:
- PEBCAK – one of many terms used in Internet chatting, stands for “problem exists between chair and keyboard,” meaning it’s a problem in user understanding or behavior rather than something wrong with hardware or software. The term is sometimes used by help desk people since it’s a common class of problems.
- Password expiration
- Certificate exparation
In this article we will focus on special kind of certificates – STS.
The vCenter Single Sign-On Security Token Service (STS) is a Web service that issues, validates, and renews security tokens. Users present their primary credentials to the STS interface to acquire SAML tokens. The primary credential depends on the type of user.
STS authenticates the user based on the primary credentials, and constructs a SAML token that contains user attributes. STS signs the SAML token with its STS signing certificate, and assigns the token to the user. By default, the STS signing certificate is generated by VMCA. You can replace the default STS signing certificate from the vSphere Web Client. Do not replace the STS signing certificate unless your company’s security policy requires replacing all certificates.
After a user has a SAML token, the SAML token is sent as part of that user’s HTTP requests, possibly through various proxies. Only the intended recipient (service provider) can use the information in the SAML token.
The Symptoms:
RUNLIST:
- Download repair script from VMware : fixsts
NOTE: This script interacts with the VMDIR’s database. Take an offline snapshot concurrently for all vCenter Servers and Platform Service Controllers in the SSO domain before running the script. Failing to do so may result in an unrecoverable error and require redeploying vCenter Server
- SSH to you vCenter
- Navigate to the /tmp directory
cd /tmp
- Place the script in /tmp folder
- Make the script executable
chmod +x fixsts.sh
- Run the script
./fixsts.sh.
- Restart services
service-control --stop --all
service-control --start --all
NOTE: Restart of services will fail if there are other expired certificates like Machine SSL or Solution User. Proceed with the next step to identify and replace expired certificates.
To determinate if any other certificates are expired, run this one-liner:
for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; sudo /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done
NOTE: If you replaced Machine SSL or VMCA Root certificates, you will need to re-register 2nd party solutions such as NSX, SRM, and vSphere Replication.
If HLM (Hybrid Linked Mode) is in use without a gateway, you would need to re-sync the certs from Cloud to On-Prem after following this procedure.
The script will ask for the SSO administrator password and then proceed to regenerate and replace STS certificate.
If you receive an error when you try to run the script:
/bin/bash^M: bad interpreter: No such file or directory
This error is caused by DOS carriage returns added to the script when copying from a Windows-based text editor. To resolve this problem, run this command and rerun the script:
sed -i -e 's/\r$//' fixsts.sh
Here is a video of my runlist:
The script :
#!/bin/bash
# Run this from the affected PSC/VC
# Luciano Delorenzi [ldelorenzi@vmware.com] - 1/2/2020
# NOTE: This works on external and embedded PSCs
# This script will do the following
# 1: Regenerate STS certificate
#
#
# What is needed?
# 1: Offline snapshots of VCs/PSCs
# 2: SSO Admin Password
NODETYPE=$(cat /etc/vmware/deployment.node.type)
if [ "$NODETYPE" = "management" ]; then
echo "Detected this node is a vCenter server with external PSC."
echo "Please run this script from a vCenter with embedded PSC, or an external PSC"
exit 0
fi
echo "NOTE: This works on external and embedded PSCs"
echo "This script will do the following"
echo "1: Regenerate STS certificate"
echo "What is needed?"
echo "1: Offline snapshots of VCs/PSCs"
echo "2: SSO Admin Password"
echo "IMPORTANT: This script should only be run on a single PSC per SSO domain"
mkdir -p /tmp/vmware-fixsts
SCRIPTPATH="/tmp/vmware-fixsts"
LOGFILE="$SCRIPTPATH/fix_sts_cert.log"
echo "==================================" | tee -a $LOGFILE
echo "Resetting STS certificate for $HOSTNAME started on $(date)" | tee -a $LOGFILE
echo ""| tee -a $LOGFILE
echo ""
DN=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmdir]' | grep dcAccountDN | awk '{$1=$2=$3="";print $0}'|tr -d '"'|sed -e 's/^[ \t]*//')
echo "Detected DN: $DN" | tee -a $LOGFILE
PNID=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmafd\Parameters]' | grep PNID | awk '{print $4}'|tr -d '"')
echo "Detected PNID: $PNID" | tee -a $LOGFILE
PSC=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmafd\Parameters]' | grep DCName | awk '{print $4}'|tr -d '"')
echo "Detected PSC: $PSC" | tee -a $LOGFILE
DOMAIN=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmafd\Parameters]' | grep DomainName | awk '{print $4}'|tr -d '"')
echo "Detected SSO domain name: $DOMAIN" | tee -a $LOGFILE
SITE=$(/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\vmafd\Parameters]' | grep SiteName | awk '{print $4}'|tr -d '"')
MACHINEID=$(/usr/lib/vmware-vmafd/bin/vmafd-cli get-machine-id --server-name localhost)
echo "Detected Machine ID: $MACHINEID" | tee -a $LOGFILE
IPADDRESS=$(ifconfig | grep eth0 -A1 | grep "inet addr" | awk -F ':' '{print $2}' | awk -F ' ' '{print $1}')
echo "Detected IP Address: $IPADDRESS" | tee -a $LOGFILE
DOMAINCN="dc=$(echo "$DOMAIN" | sed 's/\./,dc=/g')"
echo "Domain CN: $DOMAINCN"
ADMIN="cn=administrator,cn=users,$DOMAINCN"
USERNAME="administrator@${DOMAIN^^}"
ROOTCERTDATE=$(openssl x509 -in /var/lib/vmware/vmca/root.cer -text | grep "Not After" | awk -F ' ' '{print $7,$4,$5}')
TODAYSDATE=$(date | awk -F ' ' '{print $6,$2,$3}')
echo "#" > $SCRIPTPATH/certool.cfg
echo "# Template file for a CSR request" >> $SCRIPTPATH/certool.cfg
echo "#" >> certool.cfg
echo "# Country is needed and has to be 2 characters" >> $SCRIPTPATH/certool.cfg
echo "Country = DS" >> $SCRIPTPATH/certool.cfg
echo "Name = $PNID" >> $SCRIPTPATH/certool.cfg
echo "Organization = VMware" >> $SCRIPTPATH/certool.cfg
echo "OrgUnit = VMware" >> $SCRIPTPATH/certool.cfg
echo "State = VMware" >> $SCRIPTPATH/certool.cfg
echo "Locality = VMware" >> $SCRIPTPATH/certool.cfg
echo "IPAddress = $IPADDRESS" >> $SCRIPTPATH/certool.cfg
echo "Email = email@acme.com" >> $SCRIPTPATH/certool.cfg
echo "Hostname = $PNID" >> $SCRIPTPATH/certool.cfg
echo "==================================" | tee -a $LOGFILE
echo "==================================" | tee -a $LOGFILE
echo ""
echo "Detected Root's certificate expiration date: $ROOTCERTDATE" | tee -a $LOGFILE
echo "Detected today's date: $TODAYSDATE" | tee -a $LOGFILE
echo "==================================" | tee -a $LOGFILE
flag=0
if [[ $TODAYSDATE > $ROOTCERTDATE ]];
then
echo "IMPORTANT: Root certificate is expired, so it will be replaced" | tee -a $LOGFILE
flag=1
mkdir /certs && cd /certs
cp $SCRIPTPATH/certool.cfg /certs/vmca.cfg
/usr/lib/vmware-vmca/bin/certool --genselfcacert --outprivkey /certs/vmcacert.key --outcert /certs/vmcacert.crt --config /certs/vmca.cfg
/usr/lib/vmware-vmca/bin/certool --rootca --cert /certs/vmcacert.crt --privkey /certs/vmcacert.key
fi
echo "#" > $SCRIPTPATH/certool.cfg
echo "# Template file for a CSR request" >> $SCRIPTPATH/certool.cfg
echo "#" >> $SCRIPTPATH/certool.cfg
echo "# Country is needed and has to be 2 characters" >> $SCRIPTPATH/certool.cfg
echo "Country = DS" >> $SCRIPTPATH/certool.cfg
echo "Name = STS" >> $SCRIPTPATH/certool.cfg
echo "Organization = VMware" >> $SCRIPTPATH/certool.cfg
echo "OrgUnit = VMware" >> $SCRIPTPATH/certool.cfg
echo "State = VMware" >> $SCRIPTPATH/certool.cfg
echo "Locality = VMware" >> $SCRIPTPATH/certool.cfg
echo "IPAddress = $IPADDRESS" >> $SCRIPTPATH/certool.cfg
echo "Email = email@acme.com" >> $SCRIPTPATH/certool.cfg
echo "Hostname = $PNID" >> $SCRIPTPATH/certool.cfg
echo ""
echo "Exporting and generating STS certificate" | tee -a $LOGFILE
echo ""
cd $SCRIPTPATH
/usr/lib/vmware-vmca/bin/certool --server localhost --genkey --privkey=sts.key --pubkey=sts.pub
/usr/lib/vmware-vmca/bin/certool --gencert --cert=sts.cer --privkey=sts.key --config=$SCRIPTPATH/certool.cfg
openssl x509 -outform der -in sts.cer -out sts.der
CERTS=$(csplit -f root /var/lib/vmware/vmca/root.cer '/-----BEGIN CERTIFICATE-----/' '{*}' | wc -l)
openssl pkcs8 -topk8 -inform pem -outform der -in sts.key -out sts.key.der -nocrypt
i=1
until [ $i -eq $CERTS ]
do
openssl x509 -outform der -in root0$i -out vmca0$i.der
((i++))
done
echo ""
echo ""
read -s -p "Enter password for administrator@$DOMAIN: " DOMAINPASSWORD
TENANTS=$(/opt/likewise/bin/ldapsearch -h localhost -p 389 -b "cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" "(objectclass=vmwSTSTenantCredential)" | grep numEntries | awk '{print $3}')
echo ""
TRUSTEDCERTCHAINS=$(/opt/likewise/bin/ldapsearch -h localhost -p 389 -b "cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" "(objectclass=vmwSTSTenantTrustedCertificateChain)" | grep numEntries | awk '{print $3}')
echo "Amount of tenant credentials: $TENANTS" | tee -a $LOGFILE
i=1
if [ ! -z $TENANTS ]
then
until [ $i -gt $TENANTS ]
do
echo "Exporting tenant $i to $SCRIPTPATH" | tee -a $LOGFILE
echo ""
ldapsearch -h localhost -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" -b "cn=TenantCredential-$i,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" > $SCRIPTPATH/tenantcredential-$i.ldif
echo "Deleting tenant $i" | tee -a $LOGFILE
ldapdelete -h localhost -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" "cn=TenantCredential-$i,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" | tee -a $LOGFILE
((i++))
done
fi
echo ""
echo "Amount of trustedcertchains: $TRUSTEDCERTCHAINS" | tee -a $LOGFILE
i=1
if [ ! -z $TRUSTEDCERTCHAINS ]
then
until [ $i -gt $TRUSTEDCERTCHAINS ]
do
echo "Exporting trustedcertchain $i to $SCRIPTPATH" | tee -a $LOGFILE
ldapsearch -h localhost -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" -b "cn=TrustedCertChain-$i,cn=TrustedCertificateChains,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" > $SCRIPTPATH/trustedcertchain-$i.ldif
echo ""
echo "Deleting trustedcertchain $i" | tee -a $LOGFILE
ldapdelete -h localhost -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" "cn=TrustedCertChain-$i,cn=TrustedCertificateChains,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" | tee -a $LOGFILE
((i++))
done
fi
echo ""
i=1
echo "dn: cn=TenantCredential-1,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" > sso-sts.ldif
echo "changetype: add" >> sso-sts.ldif
echo "objectClass: vmwSTSTenantCredential" >> sso-sts.ldif
echo "objectClass: top" >> sso-sts.ldif
echo "cn: TenantCredential-1" >> sso-sts.ldif
echo "userCertificate:< file:sts.der" >> sso-sts.ldif
until [ $i -eq $CERTS ]
do
echo "userCertificate:< file:vmca0$i.der" >> sso-sts.ldif
((i++))
done
echo "vmwSTSPrivateKey:< file:sts.key.der" >> sso-sts.ldif
echo "" >> sso-sts.ldif
echo "dn: cn=TrustedCertChain-1,cn=TrustedCertificateChains,cn=$DOMAIN,cn=Tenants,cn=IdentityManager,cn=Services,$DOMAINCN" >> sso-sts.ldif
echo "changetype: add" >> sso-sts.ldif
echo "objectClass: vmwSTSTenantTrustedCertificateChain" >> sso-sts.ldif
echo "objectClass: top" >> sso-sts.ldif
echo "cn: TrustedCertChain-1" >> sso-sts.ldif
echo "userCertificate:< file:sts.der" >> sso-sts.ldif
i=1
until [ $i -eq $CERTS ]
do
echo "userCertificate:< file:vmca0$i.der" >> sso-sts.ldif
((i++))
done
echo ""
echo "Applying newly generated STS certificate to SSO domain" | tee -a $LOGFILE
/opt/likewise/bin/ldapmodify -x -h localhost -p 389 -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" -f sso-sts.ldif | tee -a $LOGFILE
echo ""
echo "Replacement finished - Please restart services on all vCenters and PSCs in your SSO domain" | tee -a $LOGFILE
echo "==================================" | tee -a $LOGFILE
echo "IMPORTANT: In case you're using HLM (Hybrid Linked Mode) without a gateway, you would need to re-sync the certs from Cloud to On-Prem after following this procedure" | tee -a $LOGFILE
echo "==================================" | tee -a $LOGFILE
echo "==================================" | tee -a $LOGFILE
if [ $flag == 1 ]
then
echo "Since your Root certificate was expired and was replaced, you will need to replace your MachineSSL and Solution User certificates" | tee -a $LOGFILE
echo "You can do so following this KB: https://kb.vmware.com/s/article/2097936" | tee -a $LOGFILE
fi