23ai, Zero-Downtime Oracle Grid Infrastructure Patching – GOLD IMAGE

As you know, 23ai was released first for Cloud and Engineered Systems (Exadata and ExaCC), and I already explored these in previous posts. Since patches have started to be released, now with the patch for 23.6, we can test the feature Zero-Downtime Oracle Grid Infrastructure Patching (ZDOGIP). The steps here are not specific to the Exadata version and can be used for any 23ai version.

I already demonstrated how to use it for 21c (in both graphical and silent mode), and the same can be done for 19c as well.

But now, I will show how to do it for 23ai, and this post includes:

  • Install the Grid Infrastructure 23.6.0.24.10, using the Gold Image
  • Upgrade the GI from 23.5.0.24.07 to 23.6.0.24.10 using the Zero-Downtime Oracle Grid Infrastructure Patching

This will be done while the databases are running, to show that we can patch the GI without downtime.

Current Environment

The running system is:

  • OEL 8.9 Kernel 5.4.17-2136.324.5.3.el8uek.x86_64.
  • Oracle GI 23ai, version 23.5.0.24.07, with no one-off patches installed.
  • Oracle Database 23ai (23.5.0.24.07) and 19c (19.23.0.0.0).
  • Nodes are not using Transparent HugePages.
  • It is a RAC installation, with two nodes.

You can see this information in detail here in this file. Since I am running Oracle 19c, the compatibility of the ASM disk groups was set to 19.0.0.0.0.
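If you want to double-check this yourself, here is a minimal sketch using the standard V$ASM_DISKGROUP view (run it from the ASM instance):

-- Sketch: check disk group compatibility attributes from the ASM instance
SELECT name, compatibility, database_compatibility
  FROM v$asm_diskgroup;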

Gold Image

One detail about this patch is that I am using the gold image provided for 23ai. Starting with 23ai, Oracle provides (for some HW/OS combinations) a version called Gold Image. It is aimed at simplifying the installation and patch process because the image is already packed, resulting in a small size and fast installation.

One detail about the 23ai patches: the Engineered System version started with 23.4.0.24.05 and continued later with release 23.5.0.24.07. Unfortunately, the upgrade from 23.4 to 23.5 could not use the ZDOGIP process because 23.5 was considered a new release, and when I tried to apply the patch I received an error message telling me that the current patch level was the same. And if we compare 23.4 and 23.5, they appear at the same patch level in CRS:

[grid@o23c1n1s1 ~]$ /u01/app/23.4.0.0/grid/bin/crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [23.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [0].
[grid@o23c1n1s1 ~]$

[grid@o23c1n1s1 ~]$ /u01/app/23.5.0.0/grid/bin/crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [23.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [0].
[grid@o23c1n1s1 ~]$

But using the provided gold image, we can use ZDOGIP.

ACFS and AFD drivers

As you know, some installations use the ASM Filter Driver (AFD) and ACFS drivers compiled and attached as modules directly to the Linux kernel. So, there is a well-known compatibility matrix between these drivers and the Linux kernel that you can check at MOS (ACFS Support On OS Platforms (Certification Matrix) – Doc ID 1369107.1).

This verification is important because most (99%) of the GI patches include upgrades for the OS drivers. So, you need to validate whether your current kernel is compatible with the new version of the drivers.
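Besides the MOS matrix, the driver utilities themselves can probe the running kernel. A quick sketch, run as the grid user on each node:

# Quick probe: each command reports whether the drivers support the running kernel
acfsdriverstate supported
afddriverstate supported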

Besides that, it is important that, BEFORE starting the patch, you execute the commands to check the current version of these drivers (execute on both nodes):

######################################
#
#Checking Node 01
#
######################################
[grid@o23c1n1s2 ~]$ acfsdriverstate version
ACFS-9325:     Driver OS kernel version = 5.4.17-2011.0.7.el8uek.x86_64.
ACFS-9326:     Driver build number = 240702.1.
ACFS-9231:     Driver build version = 23.0.0.0.0 (23.5.0.24.07).
ACFS-9547:     Driver available build number = 240702.1.
ACFS-9232:     Driver available build version = 23.0.0.0.0 (23.5.0.24.07).
[grid@o23c1n1s2 ~]$
[grid@o23c1n1s2 ~]$ afddriverstate version
AFD-9325:     Driver OS kernel version = 5.4.17-2011.0.7.el8uek.x86_64.
AFD-9326:     Driver build number = 240702.1.
AFD-9231:     Driver build version = 23.0.0.0.0 (23.5.0.24.07).
AFD-9547:     Driver available build number = 240702.1.
AFD-9232:     Driver available build version = 23.0.0.0.0 (23.5.0.24.07).
[grid@o23c1n1s2 ~]$
[grid@o23c1n1s2 ~]$

######################################
#
#Checking Node 02
#
######################################
[grid@o23c1n2s2 ~]$ acfsdriverstate version
ACFS-9325:     Driver OS kernel version = 5.4.17-2011.0.7.el8uek.x86_64.
ACFS-9326:     Driver build number = 240702.1.
ACFS-9231:     Driver build version = 23.0.0.0.0 (23.5.0.24.07).
ACFS-9547:     Driver available build number = 240702.1.
ACFS-9232:     Driver available build version = 23.0.0.0.0 (23.5.0.24.07).
[grid@o23c1n2s2 ~]$
[grid@o23c1n2s2 ~]$ afddriverstate version
AFD-9325:     Driver OS kernel version = 5.4.17-2011.0.7.el8uek.x86_64.
AFD-9326:     Driver build number = 240702.1.
AFD-9231:     Driver build version = 23.0.0.0.0 (23.5.0.24.07).
AFD-9547:     Driver available build number = 240702.1.
AFD-9232:     Driver available build version = 23.0.0.0.0 (23.5.0.24.07).
[grid@o23c1n2s2 ~]$

And also query CRS to check which drivers are currently active:

######################################
#
#Checking Node 01
#
######################################
[grid@o23c1n1s2 ~]$ crsctl query driver activeversion -all
Node Name : o23c1n1s2
Driver Name : ACFS
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

Node Name : o23c1n1s2
Driver Name : AFD
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

Node Name : o23c1n2s2
Driver Name : ACFS
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

Node Name : o23c1n2s2
Driver Name : AFD
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

[grid@o23c1n1s2 ~]$ crsctl query driver softwareversion -all
Node Name : o23c1n1s2
Driver Name : ACFS
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

Node Name : o23c1n1s2
Driver Name : AFD
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

Node Name : o23c1n2s2
Driver Name : ACFS
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

Node Name : o23c1n2s2
Driver Name : AFD
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

[grid@o23c1n1s2 ~]$

A little detail about ZDOGIP: if you need to upgrade the OS drivers, unfortunately, you will have downtime because the process stops all databases. But there is a way to do ZDOGIP without applying the ACFS and AFD drivers, and this is what I will use later.

Preparing for Patching

The files that we need are simple:

  • Grid Infrastructure Gold Image, patch number 37037934.
  • OPatch, always the latest version available. Download it here in this link.

Creating the folders for the files

The patch will use the ZDOGIP process. So, it is out-of-place patching, with a switch of the current Oracle Home folder for the GI. After downloading the patches and uploading them to the first node of your RAC cluster, we need to create the new folders on all nodes:

######################################
#
# Node 01
#
######################################
[root@o23c1n1s2 ~]# mkdir -p /u01/app/23.6.0.0/grid
[root@o23c1n1s2 ~]# chown grid:oinstall /u01/app/23.6.0.0/grid
[root@o23c1n1s2 ~]#

######################################
#
# Node 02
#
######################################
[root@o23c1n2s2 ~]# mkdir -p /u01/app/23.6.0.0/grid
[root@o23c1n2s2 ~]# chown grid:oinstall /u01/app/23.6.0.0/grid
[root@o23c1n2s2 ~]#

Unzipping the patches

The next step is executed only on the first node. We just need to unzip the GI gold image and the OPatch as the grid user:

[root@o23c1n1s2 ~]# su - grid
[grid@o23c1n1s2 ~]$
[grid@o23c1n1s2 ~]$ cd /u01/app/23.6.0.0/grid/
[grid@o23c1n1s2 grid]$
[grid@o23c1n1s2 grid]$ unzip -q /u01/install/Grid/p37037934_230000_Linux-x86-64.zip
[grid@o23c1n1s2 grid]$
[grid@o23c1n1s2 grid]$
[grid@o23c1n1s2 grid]$ unzip -q /u01/install/p6880880_230000_Linux-x86-64.zip
replace OPatch/opatchauto? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
[grid@o23c1n1s2 grid]$

Running systems  

At this point in my example scenario, what is running in the system is:

  • 23.5 GI installed and running at /u01/app/23.5.0.0
  • Oracle RAC Database 23.5 called o23ne
  • Oracle RAC Database 19.23 called o19c

You can see this below (on both nodes):

######################################
#
#Checking Node 01
#
######################################
[root@o23c1n1s2 ~]# ps -ef |grep smon
root        5770       1  0 15:14 ?        00:00:53 /u01/app/23.5.0.0/grid/bin/osysmond.bin
grid        6480       1  0 15:15 ?        00:00:00 asm_smon_+ASM1
oracle      7170       1  0 15:15 ?        00:00:00 ora_smon_o23ne1
oracle      7514       1  0 15:15 ?        00:00:00 ora_smon_o19c1
root       66017    3896  0 16:46 pts/0    00:00:00 grep --color=auto smon
[root@o23c1n1s2 ~]#
[root@o23c1n1s2 ~]# ps -ef |grep lsnr
root        5872    5810  0 15:14 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/crfelsnr -n o23c1n1s2
grid        6209       1  0 15:15 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid        6259       1  0 15:15 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid        9078       1  0 15:16 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit
grid        9084       1  0 15:16 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit
root       66025    3896  0 16:46 pts/0    00:00:00 grep --color=auto lsnr
[root@o23c1n1s2 ~]#

######################################
#
#Checking Node 02
#
######################################
[root@o23c1n2s2 ~]#
[root@o23c1n2s2 ~]# ps -ef |grep smon
root        4795       1  0 15:19 ?        00:00:48 /u01/app/23.5.0.0/grid/bin/osysmond.bin
grid        5727       1  0 15:19 ?        00:00:00 asm_smon_+ASM2
oracle      6300       1  0 15:20 ?        00:00:00 ora_smon_o23ne2
oracle      6679       1  0 15:20 ?        00:00:00 ora_smon_o19c2
root       57432    9393  0 16:46 pts/0    00:00:00 grep --color=auto smon
[root@o23c1n2s2 ~]#
[root@o23c1n2s2 ~]# ps -ef |grep lsnr
root        4896    4838  0 15:19 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/crfelsnr -n o23c1n2s2
grid        5298       1  0 15:19 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid        5329       1  0 15:19 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid        5350       1  0 15:19 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit
root       57436    9393  0 16:46 pts/0    00:00:00 grep --color=auto lsnr
[root@o23c1n2s2 ~]#
[root@o23c1n2s2 ~]#

And for the 19c database, we have one PDB called PDB19C1:

[oracle@o23c1n1s2 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Oct 20 16:51:48 2024
Version 19.23.0.0.0

Copyright (c) 1982, 2023, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.23.0.0.0

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB19C1                        READ WRITE NO
SQL> show parameter list

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
forward_listener                     string
listener_networks                    string
local_listener                       string       (ADDRESS=(PROTOCOL=TCP)(HOST=
                                                 o23c1n1s2-vip.oralocal)(PORT=1
                                                 521))
remote_listener                      string       o23c1s2-scan:1521
SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.23.0.0.0
[oracle@o23c1n1s2 ~]$

Testing the ZDOGIP

As you know, I like to test in detail what I post here. So, I created a simple table (T1) in the PDB19C1 and set up two loops that continuously insert into it. In case of downtime or a shutdown of the database, we will notice the error.
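For reference, here is a minimal sketch of the test table and the monitoring query whose results appear in the "SQL> /" listings later. The column types and the exact query are my assumptions, reconstructed from the insert statements and the output columns:

-- Sketch: test table (column types assumed from the inserts below)
CREATE TABLE simon.t1 (c1 VARCHAR2(10), c2 VARCHAR2(30), c3 DATE);

-- Sketch: a monitoring query along these lines produces the "SQL> /" outputs
SELECT COUNT(*), c1, c2,
       TO_CHAR(MAX(c3), 'DD/MM/YYYY HH24:MI:SS') AS last_ins,
       TO_CHAR(MIN(c3), 'DD/MM/YYYY HH24:MI:SS') AS first_ins
  FROM simon.t1
 GROUP BY c1, c2
 ORDER BY c2, c1;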

The first loop uses a PL/SQL block connected from node 01. This represents a connected session (that never disconnects) to your database:

[oracle@o23c1n1s2 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Oct 20 16:52:34 2024
Version 19.23.0.0.0

Copyright (c) 1982, 2023, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.23.0.0.0

SQL> alter session set container = PDB19C1;

Session altered.

SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  2      lDatMax DATE := (sysdate + 240/1440);
  3  BEGIN
  4      WHILE (sysdate <= (lDatMax)) LOOP
  5          insert into simon.t1(c1, c2, c3) values (SYS_CONTEXT ('USERENV', 'INSTANCE'), 'Loop - Sqlplus', sysdate);
  6          commit;
  7          dbms_session.sleep(0.5);
  8      END LOOP;
  9  END;
 10  /

The second loop uses EZCONNECT from a third machine that connects to the database through the SCAN. So, sometimes the connection goes to instance 1, and other times to instance 2:

[oracle@o8rpn1-19c ~]$ date
Sun Oct 20 16:51:03 CEST 2024
[oracle@o8rpn1-19c ~]$ for i in {1..1000000}
> do
> echo "Insert Data $i "`date +%d-%m-%Y-%H%M%S`
> sqlplus -s simon/simon23ai@o23c1s2-scan.oralocal/PDB19C1<<EOF
> set heading on feedback on;
> insert into t1(c1, c2, c3) values (SYS_CONTEXT ('USERENV', 'INSTANCE'), 'Loop - EZconnect', sysdate);
> commit;
> EOF
> done
Insert Data 1 20-10-2024-165359

1 row created.


Commit complete.

Insert Data 2 20-10-2024-165359

1 row created.
…

ZDOGIP using GOLD IMAGE

Differently from ZDOGIP for 21c and 19c, where we call the installation and the patch at the same time, for 23ai using the gold image we need two steps. This is because the gold image already provides a complete and patched 23.6 image that can be installed directly.

Step 01 – Installing the software

So, the first step is to call the gridSetup.sh script and select the option to install just the CRS software:

[grid@o23c1n1s2 ~]$ export ORACLE_HOME=/u01/app/23.6.0.0/grid
[grid@o23c1n1s2 ~]$ /u01/app/23.6.0.0/grid/gridSetup.sh
ERROR: Unable to verify the graphical display setup. This application requires X display. Make sure that xdpyinfo exist under PATH variable.
Launching Oracle Grid Infrastructure Setup Wizard...

The response file for this session can be found at:
 /u01/app/23.6.0.0/grid/install/response/grid_2024-10-20_05-00-04PM.rsp

You can find the log of this install session at:
 /u01/app/oraInventory/logs/GridSetupActions2024-10-20_05-00-04PM/gridSetupActions2024-10-20_05-00-04PM.log
[grid@o23c1n1s2 ~]$

The gallery below shows all the steps that you need to do using the GUI. It is a simple next, next, finish process.
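If you prefer to avoid the GUI, the same software-only installation can be done in silent mode by reusing the response file that the wizard saved (the path is shown in the output above). A minimal sketch, assuming you review the generated file first:

# Sketch: silent software-only install reusing the saved response file
/u01/app/23.6.0.0/grid/gridSetup.sh -silent \
  -responseFile /u01/app/23.6.0.0/grid/install/response/grid_2024-10-20_05-00-04PM.rsp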


At the end, you need to call root.sh on both nodes:

######################################
#
# Node 01
#
######################################
[root@o23c1n1s2 ~]# /u01/app/23.6.0.0/grid/root.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/23.6.0.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

To configure Grid Infrastructure for a Cluster execute the following command as grid user:
/u01/app/23.6.0.0/grid/gridSetup.sh
This command launches the Grid Infrastructure Setup Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.

[root@o23c1n1s2 ~]#

######################################
#
# Node 02
#
######################################
[root@o23c1n2s2 ~]# /u01/app/23.6.0.0/grid/root.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/23.6.0.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

To configure Grid Infrastructure for a Cluster execute the following command as grid user:
/u01/app/23.6.0.0/grid/gridSetup.sh
This command launches the Grid Infrastructure Setup Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.

[root@o23c1n2s2 ~]#

An interesting detail here is why we can't call the installation and the switch at the same time. This happens because the unzipped gold image does not appear in the oraInventory as installed: it does not exist on both nodes and is not registered in the inventory. If you try to call everything together, you will receive an error saying that your software is not installed.

Looking at the running inserts, we can see that everything was OK while the installation was happening:

SQL> /

  COUNT(*) C1         C2                   LAST_INS            FIRST_INS
---------- ---------- -------------------- ------------------- -------------------
      4846 1          Loop - EZconnect     20/10/2024 17:11:15 20/10/2024 16:53:59
       677 2          Loop - EZconnect     20/10/2024 17:11:12 20/10/2024 16:53:59
      2078 1          Loop - Sqlplus       20/10/2024 17:11:15 20/10/2024 16:53:53

SQL>

Step 02 – Doing the Home Switch

After step 01 succeeds, we can call step 02. Here we will switch between the GI homes, and we will do this online. We will use the following parameters when calling gridSetup.sh:

  • switchGridHome: Tells the installer to switch between the old and new GI homes.
  • zeroDowntimeGIPatching: Tells the installer to do the switch online, without stopping any databases running on the nodes.
  • skipDriverUpdate: Skips the update of the ACFS and AFD kernel drivers.

So, we just call this (as the grid user) on the first node of the cluster:

[grid@o23c1n1s2 ~]$ /u01/app/23.6.0.0/grid/gridSetup.sh -switchGridHome -zeroDowntimeGIPatching -skipDriverUpdate
ERROR: Unable to verify the graphical display setup. This application requires X display. Make sure that xdpyinfo exist under PATH variable.
Launching Oracle Grid Infrastructure Setup Wizard...

[grid@o23c1n1s2 ~]$

The gallery below shows the GUI steps that are needed:


Near the end of the GUI installation, you need to run the root.sh script on all nodes. It is here that the switch officially occurs, and it is done online. So, running root.sh on the first node as the root user:

[root@o23c1n1s2 ~]# /u01/app/23.6.0.0/grid/root.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/23.6.0.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
RAC option enabled on: Linux
Executing command '/u01/app/23.6.0.0/grid/perl/bin/perl -I/u01/app/23.6.0.0/grid/perl/lib -I/u01/app/23.6.0.0/grid/crs/install /u01/app/23.6.0.0/grid/crs/install/rootcrs.pl  -dstcrshome /u01/app/23.6.0.0/grid -transparent -nodriverupdate -prepatch'
Using configuration parameter file: /u01/app/23.6.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/o23c1n1s2/crsconfig/crs_prepatch_apply_oop_o23c1n1s2_2024-10-20_05-36-08PM.log

 Performing following verification checks ...

   cluster upgrade state ...PASSED
   OLR Integrity ...PASSED
   Hosts File ...PASSED
   Free Space: o23c1n1s2:/ ...PASSED
   Software home: /u01/app/23.5.0.0/grid ...PASSED

 Pre-check for Patch Application was successful.

 CVU operation performed:      stage -pre patch
 Date:                         Oct 20, 2024, 5:36:38 PM
 CVU version:                  23.5.0.24.7 (070324x8664)
 Clusterware version:          23.0.0.0.0
 CVU home:                     /u01/app/23.5.0.0/grid
 Grid home:                    /u01/app/23.5.0.0/grid
 User:                         grid
 Operating system:             Linux5.4.17-2136.324.5.3.el8uek.x86_64

2024/10/20 17:37:03 CLSRSC-671: Pre-patch steps for patching GI home successfully completed.
Executing command '/u01/app/23.6.0.0/grid/perl/bin/perl -I/u01/app/23.6.0.0/grid/perl/lib -I/u01/app/23.6.0.0/grid/crs/install /u01/app/23.6.0.0/grid/crs/install/rootcrs.pl  -dstcrshome /u01/app/23.6.0.0/grid -transparent -nodriverupdate -postpatch'
Using configuration parameter file: /u01/app/23.6.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/o23c1n1s2/crsconfig/crs_postpatch_apply_oop_o23c1n1s2_2024-10-20_05-37-03PM.log
2024/10/20 17:37:57 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd_dummy.service'
2024/10/20 17:38:51 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
2024/10/20 17:40:04 CLSRSC-4015: Performing install or upgrade action for Oracle Autonomous Health Framework (AHF).
2024/10/20 17:40:04 CLSRSC-4012: Shutting down Oracle Autonomous Health Framework (AHF).
2024/10/20 17:41:27 CLSRSC-4013: Successfully shut down Oracle Autonomous Health Framework (AHF).
2024/10/20 17:41:39 CLSRSC-672: Post-patch steps for patching GI home successfully completed.
[root@o23c1n1s2 ~]#

We can see that the switch occurred between 2024/10/20 17:37:57 and 2024/10/20 17:41:39. Looking at the log, we can notice the stop and start of CRS (and some interesting messages):

2024-10-20 17:37:58:
2024-10-20 17:37:58: Successfully removed file: /etc/systemd/system/oracle-ohasd.service.d/00_oracle-ohasd.conf
2024-10-20 17:37:58: Executing cmd: /usr/bin/systemctl daemon-reload
2024-10-20 17:38:00: zip: stopping Oracle Clusterware stack ...
2024-10-20 17:38:00: Executing cmd: /u01/app/23.5.0.0/grid/bin/crsctl stop crs -tgip -f
2024-10-20 17:38:51: Command output:
>  CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'o23c1n1s2'
>  CRS-2673: Attempting to stop 'ora.crsd' on 'o23c1n1s2'
>  CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'o23c1n1s2'
…
…
>  CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'o23c1n1s2' has completed
>  CRS-4133: Oracle High Availability Services has been stopped.
>End Command output
2024-10-20 17:38:51: The return value of stop of CRS: 0
2024-10-20 17:38:51: Executing cmd: /u01/app/23.5.0.0/grid/bin/crsctl check crs
…
…
2024-10-20 17:38:51: no change to oracle ohasd service, skip installing it in systemd
2024-10-20 17:38:51: zip: the oracle ohasd service, ohasd service, and slice files have not changed, skip daemon-reload but restart the service
2024-10-20 17:38:51: zip: restart ohasd service by killing init ohasd pid=2595
2024-10-20 17:38:51: Transparent GI patching, skip patching USM drivers
2024-10-20 17:38:51: Updating OLR with the new Patch Level
2024-10-20 17:38:51: Executing cmd: /u01/app/23.6.0.0/grid/bin/clscfg -localpatch
2024-10-20 17:38:55: Command output:
>  clscfg: EXISTING configuration version 0 detected.
>  Creating OCR keys for user 'root', privgrp 'root'..
>  Operation successful.
>End Command output
2024-10-20 17:38:55: Starting GI stack to set patch level in OCR and stop rolling patch
2024-10-20 17:38:55: Executing cmd: /u01/app/23.6.0.0/grid/bin/crsctl start crs -wait -tgip
2024-10-20 17:39:51: Command output:
>  CRS-4123: Starting Oracle High Availability Services-managed resources
>  CRS-2672: Attempting to start 'ora.cssd' on 'o23c1n1s2'
…
…

If we look at what was running during the patch, we can notice that the databases did not stop, but ASM and the listeners were restarted:

[root@o23c1n1s2 ~]# ps -ef |grep smon
root        5770       1  1 15:14 ?        00:01:27 /u01/app/23.5.0.0/grid/bin/osysmond.bin
grid        6480       1  0 15:15 ?        00:00:00 asm_smon_+ASM1
oracle      7170       1  0 15:15 ?        00:00:00 ora_smon_o23ne1
oracle      7514       1  0 15:15 ?        00:00:00 ora_smon_o19c1
root      136141  134781  0 17:38 pts/6    00:00:00 grep --color=auto smon
[root@o23c1n1s2 ~]# ps -ef |grep lsnr
root        5872    5810  0 15:14 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/crfelsnr -n o23c1n1s2
grid        6209       1  0 15:15 ?        00:00:12 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid        6259       1  0 15:15 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid        9078       1  0 15:16 ?        00:00:03 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit
grid        9084       1  0 15:16 ?        00:00:03 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit
root      136164  134781  0 17:38 pts/6    00:00:00 grep --color=auto lsnr
[root@o23c1n1s2 ~]#
[root@o23c1n1s2 ~]# ps -ef |grep smon
oracle      7170       1  0 15:15 ?        00:00:00 ora_smon_o23ne1
oracle      7514       1  0 15:15 ?        00:00:00 ora_smon_o19c1
root      140246  134781  0 17:38 pts/6    00:00:00 grep --color=auto smon
[root@o23c1n1s2 ~]# ps -ef |grep lsnr
grid        6209       1  0 15:15 ?        00:00:13 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
root      140438  134781  0 17:38 pts/6    00:00:00 grep --color=auto lsnr
[root@o23c1n1s2 ~]#
[root@o23c1n1s2 ~]# ps -ef |grep smon
oracle      7170       1  0 15:15 ?        00:00:00 ora_smon_o23ne1
oracle      7514       1  0 15:15 ?        00:00:00 ora_smon_o19c1
root      142699       1  1 17:39 ?        00:00:03 /u01/app/23.6.0.0/grid/bin/osysmond.bin
grid      146818       1  0 17:39 ?        00:00:00 asm_smon_+ASM1
root      167971  134781  0 17:43 pts/6    00:00:00 grep --color=auto smon
[root@o23c1n1s2 ~]# ps -ef |grep lsnr
root      143127  142839  0 17:39 ?        00:00:00 /u01/app/23.6.0.0/grid/bin/crfelsnr -n o23c1n1s2
grid      144540       1  0 17:39 ?        00:00:01 /u01/app/23.6.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid      144582       1  0 17:39 ?        00:00:00 /u01/app/23.6.0.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit
grid      144700       1  0 17:39 ?        00:00:00 /u01/app/23.6.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
root      167992  134781  0 17:43 pts/6    00:00:00 grep --color=auto lsnr
[root@o23c1n1s2 ~]#

And our inserts continued running the whole time (compare the date/time columns with the patch log – the stop and start of CRS):

SQL> /

  COUNT(*) C1         C2                   LAST_INS            FIRST_INS
---------- ---------- -------------------- ------------------- -------------------
     11131 1          Loop - EZconnect     20/10/2024 17:38:06 20/10/2024 16:53:59
      2248 2          Loop - EZconnect     20/10/2024 17:36:46 20/10/2024 16:53:59
      5283 1          Loop - Sqlplus       20/10/2024 17:38:06 20/10/2024 16:53:53

SQL> /

  COUNT(*) C1         C2                   LAST_INS            FIRST_INS
---------- ---------- -------------------- ------------------- -------------------
     11222 1          Loop - EZconnect     20/10/2024 17:38:36 20/10/2024 16:53:59
      2270 2          Loop - EZconnect     20/10/2024 17:38:32 20/10/2024 16:53:59
      5343 1          Loop - Sqlplus       20/10/2024 17:38:36 20/10/2024 16:53:53

SQL> /

  COUNT(*) C1         C2                   LAST_INS            FIRST_INS
---------- ---------- -------------------- ------------------- -------------------
     11287 1          Loop - EZconnect     20/10/2024 17:39:14 20/10/2024 16:53:59
      2370 2          Loop - EZconnect     20/10/2024 17:39:15 20/10/2024 16:53:59
      5420 1          Loop - Sqlplus       20/10/2024 17:39:14 20/10/2024 16:53:53

SQL> /

  COUNT(*) C1         C2                   LAST_INS            FIRST_INS
---------- ---------- -------------------- ------------------- -------------------
     11299 1          Loop - EZconnect     20/10/2024 17:39:21 20/10/2024 16:53:59
      2375 2          Loop - EZconnect     20/10/2024 17:39:21 20/10/2024 16:53:59
      5433 1          Loop - Sqlplus       20/10/2024 17:39:21 20/10/2024 16:53:53

SQL>

And by the time root.sh finished on the first node, we can see that the databases were not restarted, but the listener and ASM were.

When root.sh finishes on node 01, it is possible to execute it on the remaining nodes (since this is a two-node cluster, this is the last one, and you can see that the post-patch checks were executed). Below, I run ps to check the running services and capture the start time of the database processes; after the patch finishes, I run the same commands to demonstrate that only the CRS services were restarted:

[root@o23c1n2s2 ~]# ps -ef |grep smon
root        4795       1  0 15:19 ?        00:01:20 /u01/app/23.5.0.0/grid/bin/osysmond.bin
grid        5727       1  0 15:19 ?        00:00:00 asm_smon_+ASM2
oracle      6300       1  0 15:20 ?        00:00:00 ora_smon_o23ne2
oracle      6679       1  0 15:20 ?        00:00:00 ora_smon_o19c2
root      121768  120925  0 17:45 pts/3    00:00:00 grep --color=auto smon
[root@o23c1n2s2 ~]# ps -ef |grep lsnr
root        4896    4838  0 15:19 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/crfelsnr -n o23c1n2s2
grid        5298       1  0 15:19 ?        00:00:03 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid        5329       1  0 15:19 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid      106916       1  0 17:38 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit
grid      106934       1  0 17:38 ?        00:00:00 /u01/app/23.5.0.0/grid/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit
root      121881  120925  0 17:45 pts/3    00:00:00 grep --color=auto lsnr
[root@o23c1n2s2 ~]#
[root@o23c1n2s2 ~]# /u01/app/23.6.0.0/grid/root.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/23.6.0.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
RAC option enabled on: Linux
Executing command '/u01/app/23.6.0.0/grid/perl/bin/perl -I/u01/app/23.6.0.0/grid/perl/lib -I/u01/app/23.6.0.0/grid/crs/install /u01/app/23.6.0.0/grid/crs/install/rootcrs.pl  -dstcrshome /u01/app/23.6.0.0/grid -transparent -nodriverupdate -prepatch'
Using configuration parameter file: /u01/app/23.6.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/o23c1n2s2/crsconfig/crs_prepatch_apply_oop_o23c1n2s2_2024-10-20_05-44-11PM.log

 Initializing ...

 Performing following verification checks ...

   cluster upgrade state ...PASSED
   OLR Integrity ...PASSED
   Hosts File ...PASSED
   Free Space: o23c1n2s2:/ ...PASSED
   Software home: /u01/app/23.5.0.0/grid ...PASSED

 Pre-check for Patch Application was successful.

 CVU operation performed:      stage -pre patch
 Date:                         Oct 20, 2024, 5:44:42 PM
 CVU version:                  23.5.0.24.7 (070324x8664)
 Clusterware version:          23.0.0.0.0
 CVU home:                     /u01/app/23.5.0.0/grid
 Grid home:                    /u01/app/23.5.0.0/grid
 User:                         grid
 Operating system:             Linux5.4.17-2136.324.5.3.el8uek.x86_64

2024/10/20 17:45:32 CLSRSC-671: Pre-patch steps for patching GI home successfully completed.
Executing command '/u01/app/23.6.0.0/grid/perl/bin/perl -I/u01/app/23.6.0.0/grid/perl/lib -I/u01/app/23.6.0.0/grid/crs/install /u01/app/23.6.0.0/grid/crs/install/rootcrs.pl  -dstcrshome /u01/app/23.6.0.0/grid -transparent -nodriverupdate -postpatch'
Using configuration parameter file: /u01/app/23.6.0.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/grid/crsdata/o23c1n2s2/crsconfig/crs_postpatch_apply_oop_o23c1n2s2_2024-10-20_05-45-34PM.log
2024/10/20 17:46:24 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd_dummy.service'
2024/10/20 17:46:57 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
2024/10/20 17:48:03 CLSRSC-4015: Performing install or upgrade action for Oracle Autonomous Health Framework (AHF).
2024/10/20 17:48:03 CLSRSC-4012: Shutting down Oracle Autonomous Health Framework (AHF).
2024/10/20 17:49:24 CLSRSC-4013: Successfully shut down Oracle Autonomous Health Framework (AHF).

 Initializing ...

 Performing following verification checks ...

   cluster upgrade state ...PASSED

 Post-check for Patch Application was successful.

 CVU operation performed:      stage -post patch
 Date:                         Oct 20, 2024, 5:49:55 PM
 CVU version:                  23.6.0.24.10 (100824x8664)
 Clusterware version:          23.0.0.0.0
 CVU home:                     /u01/app/23.6.0.0/grid
 Grid home:                    /u01/app/23.6.0.0/grid
 User:                         grid
 Operating system:             Linux5.4.17-2136.324.5.3.el8uek.x86_64

2024/10/20 17:51:53 CLSRSC-672: Post-patch steps for patching GI home successfully completed.
[root@o23c1n2s2 ~]# 2024/10/20 17:53:47 CLSRSC-4003: Successfully patched Oracle Autonomous Health Framework (AHF).
[root@o23c1n2s2 ~]# ps -ef |grep smon
oracle      6300       1  0 15:20 ?        00:00:00 ora_smon_o23ne2
oracle      6679       1  0 15:20 ?        00:00:00 ora_smon_o19c2
root      129794       1  1 17:47 ?        00:00:04 /u01/app/23.6.0.0/grid/bin/osysmond.bin
grid      133450       1  0 17:47 ?        00:00:00 asm_smon_+ASM2
root      155777  120925  0 17:53 pts/3    00:00:00 grep --color=auto smon
[root@o23c1n2s2 ~]# ps -ef |grep lsnr
root      130215  129930  0 17:47 ?        00:00:00 /u01/app/23.6.0.0/grid/bin/crfelsnr -n o23c1n2s2
grid      131655       1  0 17:47 ?        00:00:01 /u01/app/23.6.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid      131683       1  0 17:47 ?        00:00:00 /u01/app/23.6.0.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit
grid      131854       1  0 17:47 ?        00:00:00 /u01/app/23.6.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
root      155852  120925  0 17:53 pts/3    00:00:00 grep --color=auto lsnr
[root@o23c1n2s2 ~]#

So, it is possible to see that the databases were up and running the whole time. Connections continued to insert data on both nodes:

SQL> /

  COUNT(*) C1         C2                   LAST_INS            FIRST_INS
---------- ---------- -------------------- ------------------- -------------------
     13117 1          Loop - EZconnect     20/10/2024 17:55:12 20/10/2024 16:53:59
      4722 2          Loop - EZconnect     20/10/2024 17:54:35 20/10/2024 16:53:59
      7328 1          Loop - Sqlplus       20/10/2024 17:55:12 20/10/2024 16:53:53

SQL>

This file contains the full output from the SQL*Plus and PL/SQL execution. You can notice that no downtime message was shown. Even the continuously connected session never stopped inserting.

If you would like to understand why, it is basically because the database detected the missing ASM instance and re-registered as a Flex client with the remaining ASM instance. This is Flex ASM:

2024-10-20T17:38:04.899942+02:00
ALTER SYSTEM SET _asm_asmb_rcvto=60 SCOPE=MEMORY;
2024-10-20T17:38:04.924714+02:00
ALTER SYSTEM SET _asm_asmb_max_wait_timeout=60 SCOPE=MEMORY;
2024-10-20T17:38:05.056174+02:00
ALTER SYSTEM RELOCATE CLIENT TO '+ASM2'
2024-10-20T17:38:15.061808+02:00
NOTE: ASMB (7543) relocating from ASM instance +ASM1 to +ASM2 (User initiated)
2024-10-20T17:38:16.999014+02:00
NOTE: ASMB (index:0) registering with ASM instance as Flex client 0xcde9b1707cdd5761 (reg:5540864) (startid:1182870941) (reconnect)
NOTE: ASMB (index:0) (7543) connected to ASM instance +ASM2, osid: 106767 (Flex mode; client id 0xcde9b1707cdd5761)
NOTE: ASMB (7543) rebuilding ASM server state for all pending groups
NOTE: ASMB (7543) rebuilding ASM server state for group 1 (DATA)
2024-10-20T17:38:18.189032+02:00
NOTE: ASMB (7543) rebuilt 1 (of 1) groups
NOTE: ASMB (7543) rebuilt 21 (of 26) allocated files
NOTE: (7543) fetching new locked extents from server
NOTE: ASMB (7543) 0 locks established; 0 pending writes sent to server
SUCCESS: ASMB (7543) reconnected & completed ASM server state for disk group 1
NOTE: ASMB (7543) rebuilding ASM server state for group 2 (RECO)
NOTE: ASMB (7543) rebuilt 1 (of 1) groups
NOTE: ASMB (7543) rebuilt 5 (of 26) allocated files
NOTE: (7543) fetching new locked extents from server
NOTE: ASMB (7543) 0 locks established; 0 pending writes sent to server
SUCCESS: ASMB (7543) reconnected & completed ASM server state for disk group 2
2024-10-20T17:39:21.216560+02:00
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=10.160.23.14)(PORT=1521))' SCOPE=MEMORY SID='o19c1';
2024-10-20T17:39:21.434298+02:00
ALTER SYSTEM SET remote_listener=' o23c1s2-scan:1521' SCOPE=MEMORY SID='o19c1';
2024-10-20T17:39:21.438054+02:00
ALTER SYSTEM SET listener_networks='' SCOPE=MEMORY SID='o19c1';
2024-10-20T17:39:22.740276+02:00
ALTER SYSTEM SET _asm_asmb_rcvto=10 SCOPE=MEMORY;
2024-10-20T17:39:22.744926+02:00
ALTER SYSTEM SET _asm_asmb_max_wait_timeout=6 SCOPE=MEMORY;
2024-10-20T17:39:28.603041+02:00
ALTER SYSTEM SET remote_listener=' o23c1s2-scan:1521' SCOPE=MEMORY SID='o19c1';
2024-10-20T17:46:34.397059+02:00
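As a side note, you can check which ASM instance each database is registered with by querying the standard V$ASM_CLIENT view from the ASM instance. A quick sketch:

-- Sketch: list the clients registered with this ASM instance
SELECT instance_name, db_name, status
  FROM v$asm_client;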

State of ACFS and AFD drivers

After the installation, we can see that the drivers were not updated. That was the option requested during the installation; the new versions of the drivers are available but not active:

[root@o23c1n1s2 ~]# su - grid
[grid@o23c1n1s2 ~]$
[grid@o23c1n1s2 ~]$ echo $ORACLE_HOME
/u01/app/23.6.0.0/grid
[grid@o23c1n1s2 ~]$
[grid@o23c1n1s2 ~]$ $ORACLE_HOME/bin/crsctl query driver activeversion -all
Node Name : o23c1n1s2
Driver Name : ACFS
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

Node Name : o23c1n1s2
Driver Name : AFD
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

Node Name : o23c1n2s2
Driver Name : ACFS
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

Node Name : o23c1n2s2
Driver Name : AFD
BuildNumber : 240702.1
BuildVersion : 23.0.0.0.0 (23.5.0.24.07)

[grid@o23c1n1s2 ~]$ $ORACLE_HOME/bin/crsctl query driver softwareversion -all
Node Name : o23c1n1s2
Driver Name : ACFS
BuildNumber : 241006
BuildVersion : 23.0.0.0.0 (23.6.0.24.10)

Node Name : o23c1n1s2
Driver Name : AFD
BuildNumber : 241006
BuildVersion : 23.0.0.0.0 (23.6.0.24.10)

Node Name : o23c1n2s2
Driver Name : ACFS
BuildNumber : 241006
BuildVersion : 23.0.0.0.0 (23.6.0.24.10)

Node Name : o23c1n2s2
Driver Name : AFD
BuildNumber : 241006
BuildVersion : 23.0.0.0.0 (23.6.0.24.10)

[grid@o23c1n1s2 ~]$ 
[grid@o23c1n1s2 ~]$ $ORACLE_HOME/bin/acfsdriverstate version
ACFS-9325:     Driver OS kernel version = 5.4.17-2011.0.7.el8uek.x86_64.
ACFS-9326:     Driver build number = 240702.1.
ACFS-9231:     Driver build version = 23.0.0.0.0 (23.5.0.24.07).
ACFS-9547:     Driver available build number = 241006.
ACFS-9232:     Driver available build version = 23.0.0.0.0 (23.6.0.24.10).
[grid@o23c1n1s2 ~]$ $ORACLE_HOME/bin/afddriverstate version
AFD-9325:     Driver OS kernel version = 5.4.17-2011.0.7.el8uek.x86_64.
AFD-9326:     Driver build number = 240702.1.
AFD-9231:     Driver build version = 23.0.0.0.0 (23.5.0.24.07).
AFD-9547:     Driver available build number = 241006.
AFD-9232:     Driver available build version = 23.0.0.0.0 (23.6.0.24.10).
[grid@o23c1n1s2 ~]$

In a dedicated post, I will show how to update the drivers. It is only one step executed on each node, but it requires downtime since it stops all resources running on the node. I even recommend restarting the machine to have a clean kernel start with the new drivers.
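As a preview of that post, the approach is roughly the sketch below. Treat it as an assumption to validate against the dedicated post and the MOS note before running it, since it takes all resources on the node down:

######################################
#
# Run as root, one node at a time
#
######################################
# stop the whole stack on this node (this is the downtime)
/u01/app/23.6.0.0/grid/bin/crsctl stop crs
# update the ACFS/AFD OS driver files from the new home
/u01/app/23.6.0.0/grid/crs/install/rootcrs.sh -updateosfiles
# a node reboot here gives a clean kernel start with the new drivers
/u01/app/23.6.0.0/grid/bin/crsctl start crs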

Conclusion

Patching with Zero-Downtime Oracle Grid Infrastructure Patching (ZDOGIP) and the Gold Image still allows us to upgrade the GI without interrupting the databases. This is important for environments where the remaining nodes can't absorb the whole load of the database instances running on the node being patched.

The important detail is linked to the AFD and ACFS kernel drivers. If skipDriverUpdate is not specified, root.sh will shut down all the databases, causing downtime. You also need to be aware of the compatibility matrix between the new GI version and the OS kernel.

Lastly, it is important to notice that when using the Gold Image we need two steps: the first is the installation, and the second is the switch. But this is good because it allows us to prepare the installation without any impact on the running databases, and later do the switch in a maintenance window. This can reduce even further the possibility of impact on the environment.

 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer’s positions, strategies, or opinions. The information here was edited to be useful for general purposes; specific data and identifications were removed to make it useful to a wider audience and the community. Post protected by copyright.”
