Data Retriever is not initialized yet. Please wait. – How to replace expired internal certificate in vRealize Operations Manager

You may not know it, but vROps has an internal certificate that can expire. If you don't catch it in time, you will be in trouble: you may find that you cannot log in to the Admin UI.

The cluster is offline and you cannot bring it online, while the error message states:

Data Retriever is not initialized yet. Please wait

So where does this internal certificate come from?

The internal certificate in vRealize Operations is generated upon initial deployment.

I am about to upgrade vROps, would that fix the issue?

At the time of writing, the latest version is 8.5, and upgrading to it WILL NOT renew the internal certificate.

Why should I bother now? Can I just renew the certificate later?

There is no recovery option for vRealize Operations Manager 6.2.x and earlier.

OK, how do I check whether my internal certificate is still valid?

  • Command line:
    • SSH to primary node
    • Run the command:
/bin/grep -E --color=always -B1 'java.security.cert.CertPathValidatorException: validity check failed|java.security.cert.CertificateExpiredException' $ALIVE_BASE/user/log/*.log | /usr/bin/tail -20

If the command returns nothing, the certificate is still valid and renewal is not required.

If the command returns output containing validity check failed, the certificate must be renewed immediately.
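The check above can also be wrapped in a small helper that reports the result in words. This is only a sketch: the function name cert_log_status is made up for illustration, and it assumes the same exception patterns and log location as the grep command above.

```shell
# Hypothetical helper (not part of vROps): scan a log directory for the
# certificate-expiry exceptions used in the grep command above.
cert_log_status() {
    local log_dir="$1"
    local pattern='java.security.cert.CertPathValidatorException: validity check failed|java.security.cert.CertificateExpiredException'
    # -q: quiet, -s: ignore missing or unreadable files, -E: extended regex
    if grep -qsE "$pattern" "$log_dir"/*.log; then
        echo "renewal required"
    else
        echo "no expiry errors found"
    fi
}

# Example usage on the primary node:
#   cert_log_status "$ALIVE_BASE/user/log"
```

Note that "no expiry errors found" only means the logs contain no such exception; checking the certificate dates directly is still worthwhile.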

  • Web browsers:
    • Go to https://Primary_node_FQDN_or_IP:6061
    • The page will display a warning about an insecure connection. The exact message varies by browser. THIS IS EXPECTED!
    • The Gemfire service must be running for a certificate to be presented.
    • Open the certificate details and check the validity period.
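If you prefer the command line to a browser, the same certificate can probably be inspected with openssl. The sketch below assumes the openssl command-line tool is installed where you run it and that port 6061 presents the certificate as described above. The function names are made up for illustration.

```shell
# Hypothetical helper: print the validity dates of the certificate presented
# on port 6061 (the port from the browser steps above).
# Assumes the openssl command-line tool is installed.
show_internal_cert_dates() {
    local host="$1"
    # s_client fetches the presented certificate; x509 prints its dates.
    openssl s_client -connect "${host}:6061" </dev/null 2>/dev/null \
        | openssl x509 -noout -dates
}

# Extract the date from a "notAfter=..." line as printed by openssl.
not_after_value() {
    sed -n 's/^notAfter=//p'
}

# Example usage (replace the placeholder with your primary node):
#   show_internal_cert_dates Primary_node_FQDN_or_IP | not_after_value
```

The notAfter value is the expiry date. As with the browser method, nothing is returned if the Gemfire service is not running.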

What to do?

If the internal certificate has not expired

  • Take a snapshot of all nodes, following the proper snapshot procedure for vRealize Operations.
  • Download the Certificate Renewal PAK file from the VMware Patch Portal.
  • Take the cluster offline.
  • Go to Software Update in the left panel.
  • Click Install a Software Update.
  • Install the Certificate Renewal PAK file.
  • After installation, bring the cluster online.

IMPORTANT:

  • Run the following commands on all nodes in the vRealize Operations cluster:
chown admin:admin -R /storage/vcops/user/conf/ssl/ /storage/vcops/user/conf/ssl_bak/ /storage/db/casa/webapp/hsqldb/

chown -h root:root /storage/vcops/user/conf/ssl/web_cert.pem /storage/vcops/user/conf/ssl/web_chain.pem /storage/vcops/user/conf/ssl/web_key.pem


chmod guo+r -R /storage/vcops/user/conf/ssl/

chmod 444 /storage/vcops/user/conf/ssl/cacert.pem /storage/vcops/user/conf/ssl/slice_*_cert.pem


chmod 400 /storage/vcops/user/conf/ssl/cakey.pem /storage/vcops/user/conf/ssl/slice_*_cert.pfx /storage/vcops/user/conf/ssl/slice_*_key.pem

chmod 640 /storage/vcops/user/conf/ssl/tcserver.keystore

NOTE: If the UI does not show that the installation has completed, clear the browser cookies and cache, then restart the browser. If the installation still shows as in progress, SSH to the Primary node and run this command:

sed -i -e 's/\"initialization_state\"\:\"INITIALIZING\"/\"initialization_state\"\:\"NONE\"/g' /data/db/casa/webapp/hsqldb/casa.db.script

Repeat this procedure on the Replica node.

Restart CASA service:

service vmware-casa restart
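Because the sed command above edits casa.db.script in place, it may be worth keeping a copy first. The wrapper below is a sketch under that assumption. The function name and the .bak suffix are my own choices, not something from the VMware KB.

```shell
# Hypothetical wrapper around the sed command above: keep a backup copy,
# then switch initialization_state from INITIALIZING to NONE.
reset_initialization_state() {
    local script_file="$1"
    # Preserve the original (with its permissions) before editing in place.
    cp -p "$script_file" "${script_file}.bak" || return 1
    sed -i -e 's/"initialization_state":"INITIALIZING"/"initialization_state":"NONE"/g' "$script_file"
}

# Example usage on the Primary node:
#   reset_initialization_state /data/db/casa/webapp/hsqldb/casa.db.script
```

If something goes wrong, the .bak copy can be moved back over the edited file before restarting CASA.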

If the internal certificate has expired

In this case you need to install the PAK file manually. Refer to the VMware KB for compatible versions.

  • SSH to all nodes!
  • Copy the Certificate Renewal PAK file to /tmp/ on all nodes
  • Prepare environment by running this command:
mkdir -p /data/db/pakRepoLocal/vRealize_Operations_Manager_Enterprise_Certificate_Renewal/extracted

  • Unzip PAK:
unzip /tmp/vRealize_Operations_Manager_Enterprise_Certificate_Renewal-build.pak -d /data/db/pakRepoLocal/vRealize_Operations_Manager_Enterprise_Certificate_Renewal/extracted

NOTE: Replace build in the file name with the build number of the PAK file you downloaded.

  • Stop all the services:
service vmware-vcops-watchdog stop
service vmware-vcops stop
  • Make sure all services are stopped
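One possible way to confirm the services are down is to ask each one for its status. This sketch assumes that service <name> status exits non-zero for a stopped service, which is the usual init-script convention but is not something I have verified on every vROps version. SERVICE_CMD and the function names are made up for illustration.

```shell
# Hypothetical check: report whether the services stopped above are down.
# Assumption: "service <name> status" exits non-zero for a stopped service.
# A missing "service" command also counts as non-zero, so only trust the
# result on an actual vROps node.
SERVICE_CMD="${SERVICE_CMD:-service}"

is_stopped() {
    # Succeeds (returns 0) when the status query fails, i.e. service is down.
    ! "$SERVICE_CMD" "$1" status >/dev/null 2>&1
}

report_stopped() {
    local name
    for name in vmware-vcops-watchdog vmware-vcops; do
        if is_stopped "$name"; then
            echo "$name: stopped"
        else
            echo "$name: still running"
        fi
    done
}

# Example usage on each node:
#   report_stopped
```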

Disclaimer:

I copied this part directly from the VMware KB to avoid typos, as it is important to get this right.

The following command needs to be run in a particular order.  Follow each sub-step carefully.

Command:

$VMWARE_PYTHON_BIN /data/db/pakRepoLocal/vRealize_Operations_Manager_Enterprise_Certificate_Renewal/extracted/updateCoordinator.py EXPIRED

  1. First, run the command on all Remote Collector nodes (if present) in the cluster, and wait for the task to complete.  Continue to step 2.
  2. Next, run the command on all Data nodes, the Witness node (if present), and the Primary Replica node (if present) in the cluster; do not wait for each node to complete, just start the command on all nodes.  Once Waiting for certificate generation to complete appears on the last node, wait roughly 60 seconds, and continue to step 3.
  3. Finally, run the command on the Primary node.
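Since the order matters, a tiny lookup like the one below can serve as a checklist when you have many nodes. It is only a sketch: the role labels (remote-collector, data, witness, replica, primary) and the function name are made up for illustration and map to the three steps above.

```shell
# Hypothetical lookup: which step of the ordered procedure a node role is in.
renewal_phase() {
    case "$1" in
        remote-collector)     echo 1 ;;  # run first, wait for completion
        data|witness|replica) echo 2 ;;  # start on all, do not wait
        primary)              echo 3 ;;  # run last
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Example usage: print roles in the order they should be handled.
#   for role in primary data remote-collector; do
#       echo "$(renewal_phase "$role") $role"
#   done | sort -n
```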

The expected behavior is for the command to finish, then shortly afterwards the pending tasks on the Data nodes and Primary Replica node (if present) will complete.
Note: To confirm that the command completed successfully, check that the /var/vmware/_cert_generation_completed file exists on the Primary node.
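Instead of checking for the marker file by hand, you could poll for it. The helper below is a minimal sketch. The function name and the timeout values are arbitrary choices of mine, and only the marker file path comes from the note above.

```shell
# Hypothetical helper: wait until a marker file exists or a timeout expires.
wait_for_file() {
    local path="$1" timeout="${2:-300}" waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "found $path"
}

# Example usage on the Primary node (600-second limit is an arbitrary choice):
#   wait_for_file /var/vmware/_cert_generation_completed 600
```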

Change the permissions of the newly generated certificates on all nodes in the vRealize Operations cluster by running the following commands:

chown admin:admin -R /storage/vcops/user/conf/ssl/ /storage/vcops/user/conf/ssl_bak/ /storage/db/casa/webapp/hsqldb/

chown -h root:root /storage/vcops/user/conf/ssl/web_cert.pem /storage/vcops/user/conf/ssl/web_chain.pem /storage/vcops/user/conf/ssl/web_key.pem

chmod guo+r -R /storage/vcops/user/conf/ssl/

chmod 444 /storage/vcops/user/conf/ssl/cacert.pem /storage/vcops/user/conf/ssl/slice_*_cert.pem

chmod 400 /storage/vcops/user/conf/ssl/cakey.pem /storage/vcops/user/conf/ssl/slice_*_cert.pfx /storage/vcops/user/conf/ssl/slice_*_key.pem

chmod 640 /storage/vcops/user/conf/ssl/tcserver.keystore

For version 8.4 and later, also run the following commands on the Primary node and Primary Replica node (if present) and all data nodes:

chown postgres:root /storage/vcops/user/conf/ssl/postgres_vcopsrepl_*

chmod 600 /storage/vcops/user/conf/ssl/postgres_vcops_key.pk8 /storage/vcops/user/conf/ssl/postgres_vcopsrepl_key.pem

chmod 640 /storage/vcops/user/conf/ssl/postgres_vcops_cert.pem /storage/vcops/user/conf/ssl/postgres_vcopsrepl_cert.pem
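After running the chown and chmod commands, it can help to print what the files actually look like. This is a sketch that assumes GNU stat (with the -c option) is available on the node. The function name is made up.

```shell
# Hypothetical check: print owner, group and octal mode for each file so the
# result of the chown/chmod commands above can be compared by eye.
# Assumes GNU stat (coreutils), which provides the -c format option.
show_perms() {
    local f
    for f in "$@"; do
        if [ -e "$f" ]; then
            stat -c '%U:%G %a %n' "$f"
        else
            echo "missing: $f"
        fi
    done
}

# Example usage on a node:
#   show_perms /storage/vcops/user/conf/ssl/cakey.pem \
#              /storage/vcops/user/conf/ssl/tcserver.keystore
```

For those two files you would expect modes 400 and 640 respectively, matching the chmod commands above.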

Run the following commands on all nodes in the vRealize Operations cluster:

service vmware-vcops-watchdog start

service vmware-casa restart

sed -i 's/sliceonline\ \=\ true/sliceonline\ \=\ false/g' /usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/data/roleState.properties

Run the following commands on the Primary node, and Primary Replica node (if present):

service vmware-casa stop

sed -i -e 's/\"onlineState\"\:\"GOING\_OFFLINE\"/\"onlineState\"\:\"OFFLINE\"/g' -e 's/\"online\_state\"\:\"GOING\_OFFLINE\"/\"online\_state\"\:\"OFFLINE\"/g' -e 's/\"onlineState\"\:\"GOING\_ONLINE\"/\"onlineState\"\:\"OFFLINE\"/g' -e 's/\"online\_state\"\:\"GOING\_ONLINE\"/\"online\_state\"\:\"OFFLINE\"/g' -e 's/\"onlineState\"\:\"ONLINE\"/\"onlineState\"\:\"OFFLINE\"/g' -e 's/\"online\_state\"\:\"ONLINE\"/\"online\_state\"\:\"OFFLINE\"/g' -e 's/\"onlineState\"\:\"FAILURE\"/\"onlineState\"\:\"OFFLINE\"/g' -e 's/\"online\_state\"\:\"FAILURE\"/\"online\_state\"\:\"OFFLINE\"/g' /data/db/casa/webapp/hsqldb/casa.db.script

service vmware-casa start

service vmware-vcops-web restart

/etc/init.d/apache2 restart

If vRealize Operations is version 8.x, run this instead:

systemctl restart httpd

Bring the cluster online.
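The long sed command in the Primary and Primary Replica step is hard to read. As far as I can tell, it sets both onlineState and online_state to OFFLINE whenever they hold GOING_OFFLINE, GOING_ONLINE, ONLINE or FAILURE. The sketch below expresses that with a single extended regular expression. It is my own condensed form (the function name is made up), so treat the KB command as authoritative and try this on a copy of the file first.

```shell
# Hypothetical condensed form of the long sed command above, using one
# extended regular expression instead of eight separate expressions.
force_offline_state() {
    local script_file="$1"
    sed -i -E 's/"(onlineState|online_state)":"(GOING_OFFLINE|GOING_ONLINE|ONLINE|FAILURE)"/"\1":"OFFLINE"/g' "$script_file"
}

# Example usage on a copy of the CASA database script:
#   cp /data/db/casa/webapp/hsqldb/casa.db.script /tmp/casa.db.script.test
#   force_offline_state /tmp/casa.db.script.test
```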
