Tag Archives: Engineered Systems

TFA error after GI upgrade to 19c

Recently I upgraded/updated an Exadata stack to the latest 19.2 version (19.2.7.0.0.191012) and upgraded the GI from 18c to 19c (the latest 19c version, 19.5.0.0.191015), and after that, TFA stopped working.

Since I did not want to execute a complete TFA cleanup and reinstallation, I tried to find the cause of the error and a solution. Here I want to share the workaround (since there is no official solution yet) that I discovered and used to fix the error.

The environment

The current environment is:

  • Old Grid Infrastructure: Version 18.6.0.0.190416
  • New Grid Infrastructure: Version 19.5.0.0.191015
  • Exadata domU: Version 19.2.7.0.0.191012 running kernel 4.1.12-124.30.1.el7uek.x86_64

TFA error

After upgrading the GI from 18c to 19c, TFA does not work. If you try to start it or collect logs with it, you receive errors. In the environment described here, TFA was running fine with the 18c version, and the rootupgrade script from 18c to 19c did not report any error.

To be more precise, the TFA upgrade from 18c to 19c called by rootupgrade finished OK (according to the log, which I will show later). But even after that, the error occurs.
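Just to illustrate where the failure shows up (a minimal sketch using the standard tfactl CLI; the hostname is simply the one from this environment, and the real output will vary), these are the kinds of calls that were returning errors after the upgrade:

[root@exsite1c1 ~]# tfactl print status
[root@exsite1c1 ~]# tfactl start
[root@exsite1c1 ~]# tfactl diagcollect -last 1h

The first command reports the TFA daemon status on every node, and the last one attempts a small collection; after the GI upgrade, they failed instead of producing the normal output.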

The solution provided by MOS support was the usual one: download the latest TFA and reinstall it over the current one. Unfortunately, I do not like this approach because it can lead to errors during GI upgrades to the next releases (like 20) and updates (19.6, as an example).

Click here to read more…

Exadata, workaround for oracka.ko error

Recently I upgraded/updated an Exadata stack to the latest 19.2 version (19.2.7.0.0.191012), released in October 2019, and updated the GI to the latest 19c version (19.5.0.0.191015). After that, I had some issues creating 11G databases.

So, when I tried to create an 11G RAC database, the error “File -oracka.ko- was not found” appeared and the creation failed. Here I want to share the workaround (since there is no official solution yet) that I discovered and used to bypass the error.

The environment

The current environment is:

  • Grid Infrastructure: Version 19.5.0.0.191015
  • Exadata domU: Version 19.2.7.0.0.191012 running kernel 4.1.12-124.30.1.el7uek.x86_64
  • 11G Database: Version 11.2.0.4.180717
  • ACFS: Used to store some files

oracka.ko

So, calling dbca:

[DEV-oracle@exsite1c1-]$ /u01/app/oracle/product/11.2.0.4/dbhome_1/bin/dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName D11TST19 -adminManaged -sid D11TST19 -sysPassword oracle11 -systemPassword oracle11 -characterSet WE8ISO8859P15 -emConfiguration NONE -storageType ASM -diskGroupName DATAC8 -recoveryGroupName RECOC8 -nodelist exsite1c1,exsite1c2 -sampleSchema false
Copying database files
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/D11TST19/D11TST19.log" for further details.
[DEV-oracle@exsite1c1-]$
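Before touching anything, it helps to confirm whether the kernel module file is really missing and whether it is loaded. A minimal check (the GI home path below is an assumption for illustration; adjust it to your environment):

[root@exsite1c1 ~]# find /u01/app/19.0.0.0/grid -name "oracka.ko*" 2>/dev/null
[root@exsite1c1 ~]# lsmod | grep -i oracka

If find returns nothing for the running kernel and lsmod shows no oracka module loaded, that matches the “File -oracka.ko- was not found” error reported in the dbca log.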

Click here to read more…

Exadata, Using metrics to help you

It is well known that Exadata delivers a lot of power for databases and, besides that, has a lot of features that can be combined to reach the desired goals. But you need to understand how to use Exadata; it is not just knowing the internal hardware pieces, putting in some SQL hints, or using Smart Scan that makes a better DBA (or DMA).

Think about the “traditional” environment (DB + storage) and how you check for performance problems there. Basically, you just have/receive the number of IOPS from the LUNs, the throughput in MB/s, and the latency from the storage side. But Exadata provides a lot of metrics that go beyond that and can be used to really understand what is happening between the database and the data block access.

For me, one of the most underrated (and not even well explained on the web) features of Exadata is the metrics, because they can help you really understand Exadata deeply. As an example, from the metrics you can check the MB/s read from flash cache, from disks (per type), flash log writes, reads that bypassed flash cache and went to disk, and Smart I/O per database, PDB, or consumer group. Explaining all the metrics is not part of this post (that will be another one), but you can read more in Chapter 6 of the Exadata User Guide.
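Just to give a taste of how simple it is to pull them (a sketch only; FC_IO_BY_R is one of the flash cache read metrics, but the exact metric names available depend on your Exadata software version), you can query current values straight from CellCLI on a storage server:

CellCLI> LIST METRICCURRENT WHERE objectType = 'FLASHCACHE' AND name LIKE 'FC_IO_BY_R.*'

The same LIST METRICCURRENT / LIST METRICHISTORY syntax works for the other object types (CELLDISK, FLASHLOG, IORM_DATABASE, and so on).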

In this post, I will show you one example of how to use the metrics to identify and solve database problems. Sometimes it can be a hide-and-seek game, but I will try to show you how to use metrics and how they can help you on a daily basis.

Click here to read more…

ZDLRA, Multi-site protection – ZERO RPO for Primary and Standby

ZDLRA can be used in everything from a small single-database environment to big environments where you need protection at more than one site at the same time. At every level, you can use different ZDLRA features to provide the desired protection. Here I will show how to reach zero RPO for both primary and standby databases. All the steps, docs, and technical parts are covered.

You can check the examples and the reference for every scenario in these two papers from the Oracle MAA team: MAA Overview On-Premises and Oracle MAA Reference Architectures. They provide good information on how to prepare to reduce RPO and improve RTO. In summary, the focus is the same: reduce downtime and data loss in case of a catastrophe (zero RPO and zero RTO).

Multi-site protection

If you look at both papers, you will see that to provide good protection it is desirable to have an additional site to, at the very least, send the backups to. And if you go higher, to GOLD and PLATINUM environments, you start to have multiple sites synced with Data Guard. These critical/mission-critical environments need to be protected against every kind of catastrophic failure, from a disk failure up to a complete site outage (some need to follow specific legal requirements, banks as an example).

And the focus of this post is these big environments. I will show you how to use ZDLRA to protect both sites, reaching zero RPO even for standby databases. Doing that, you can survive a catastrophic outage (like an entire datacenter failure) and still have zero RPO. Going further, when using real-time redo for ZDLRA you can even keep zero RPO if you completely lose one site, and this is not written in the docs, by the way.

Click here to read more…

ZDLRA, Real-Time Redo and Zero RPO

The idea of Real-Time Redo is to reach zero RPO for every kind of database, and this includes databases with and without DG. As you can see in my last post, where I showed how to configure Real-Time Redo for one database, a few small steps need to be executed, and they are pretty similar to configuring a remote archivelog destination for DG.

But if you noticed, the configuration for the remote destination was defined as ASYNC, and it is hinted like that in the ZDLRA docs (“Protection of Ongoing Transactions” or “How Real-Time Redo Transport Works”). In the same post, I flagged this as “controversial” because ASYNC does not guarantee zero RPO.

You can see more in the Data Guard docs (Oracle Data Guard Protection Modes and Oracle Data Guard Concepts and Administration), but in summary:

  • ASYNC: The primary database does not wait for a response from the remote destination.
  • SYNC/NOAFFIRM: The primary database holds the commit until the remote destination reports that it received the redo data. It does not wait until the remote site reports that it wrote the data to disk.
  • SYNC/AFFIRM: The primary database holds the commit until the remote destination reports that it received the redo data and wrote it to disk.

You can read about the differences in more detail here: Best Practices for Synchronous Redo Transport and Best Practices for Asynchronous Redo Transport.

The idea is simple: if you use ASYNC, there is no guarantee of zero data loss between the primary database and the remote destination.
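In practice, the difference between the behaviors is just the attributes of the remote destination. As a hedged sketch (the service string and DB_UNIQUE_NAME below are placeholders, not values from this environment), moving a ZDLRA destination from ASYNC to SYNC/AFFIRM is a normal log_archive_dest_n change:

SQL> ALTER SYSTEM SET log_archive_dest_3='SERVICE="zdlra-scan:1521/zdlra" SYNC AFFIRM DB_UNIQUE_NAME=zdlra VALID_FOR=(ALL_LOGFILES,ALL_ROLES)' SCOPE=BOTH SID='*';

With SYNC AFFIRM, the commit on the primary waits until ZDLRA acknowledges that the redo was received and written, which is exactly the behavior described in the third bullet above.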

Click here to read more…

ZDLRA, Real-Time Redo

Real-time redo transport is the feature that allows you to reduce the RPO (Recovery Point Objective) of your database to zero. Check how to configure real-time redo: the steps, parameters, and other details that need to be modified to enable it.

The idea behind real-time redo transport is easy: basically, the ZDLRA is a remote destination for your database's redo log buffers/archivelogs. It is really, really similar to what occurs in Data Guard configurations (but here you don't need to set up all the datafiles, as an example). It is not the same either, because ZDLRA can detect if the database stops/crashes and will generate the archivelog (on the ZDLRA side) with all the received redo, and this can be used to restore with, at worst, zero/sub-second data loss.

Using real-time redo is the only way to reach zero RPO. With other ZDLRA features, like incremental backups, you can have a better backup window time (but just that). Only by using real-time redo do you reach zero RPO, and this directly impacts how you configure for MAA compliance. There are a lot of options and protection levels for MAA that you can check in “Maximum Availability Architecture (MAA) – On-Premises HA Reference Architectures 2019”, “Maximum Availability Architecture Best Practices for Oracle Cloud”, “Oracle MAA Reference Architectures”, and “Maximum Availability Architecture – Best Practices for Oracle Database 19c”.

This post starts from an environment where the database is already enrolled at ZDLRA. I already wrote about how to do that; you can check my previous post. This is the first post about real-time redo; here you will see how to configure it and verify that it is working.
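As a small preview of the verification side (a minimal sketch; dest_id 3 is just an assumption for the slot holding the ZDLRA destination, use whichever log_archive_dest_n you configured), the destination state can be checked from the database itself:

SQL> SELECT dest_id, status, error FROM v$archive_dest WHERE dest_id = 3;

A VALID status with an empty error column is the expected result once real-time redo is shipping correctly.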

Click here to read more…

ZDLRA Internals, Virtual Full Backup

Virtual full backup is probably the best-known feature of the Oracle Zero Data Loss Recovery Appliance (ZDLRA), and you can check here how it works. In this post I will show how virtual full backup works internally and how the INDEX_BACKUP task integrates with tables like PLANS, PLAN_DETAILS, CHUNKS, and BLOCKS.

About the internal tables, you can check my previous post “ZDLRA Internals, Tables and Storage”, where I explained the details. To understand the INDEX_BACKUP task, check my post “ZDLRA Internals, INDEX_BACKUP task in details”. And if you know nothing yet and want to start reading about ZDLRA, you can check my post “Understanding ZDLRA” for all the features and details.

The base for this article is the virtual full backup and incremental forever strategy. I explained both in “ZDLRA, Virtual Full Backup and Incremental Forever”, where I also included how it works integrated with RMAN backup. Basically, after an initial level 0 backup, you execute just level 1 backups and ZDLRA generates a virtual level 0 backup. But here, in this post, I will show you how it works in some internal detail.
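To make “internal details” a bit more concrete before the walkthrough (a sketch only; RA_TASK is the Recovery Appliance catalog view used in the related posts, and the columns available can differ between ZDLRA releases), you can watch the tasks created after a backup is ingested, connected to the ZDLRA database:

SQL> SELECT task_id, task_type, state FROM ra_task WHERE task_type = 'INDEX_BACKUP' ORDER BY task_id;

Since every ingested backup generates an INDEX_BACKUP task, following its state is the easiest way to see the virtual full backup being built.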

Click here to read more…

Exadata X8M

The Exadata X8M model was released at Oracle Open World 2019, and the new feature, “M”, aims to reduce latency and increase IOPS. The Exadata X8M uses Remote Direct Memory Access (RDMA) to allow the database to access the storage server memory directly. And the memory, in this case, is special: X8M uses Intel Optane DC Persistent Memory modules (DIMM/NVDIMM, Non-Volatile DIMM, providing PMEM, Persistent Memory) attached directly to the storage server, and these can be accessed directly from the database using RDMA through the RoCE network. Let's check the details to see what it is.

Click here to read more…

ZDLRA, 19.2

Last week, on 27/August/2019, the ZDLRA team released the new major version 19.2 (19.2.1.1.1) of the Recovery Appliance Software. As always, there are some new features, like newly supported databases, but some downsides too.

The main features on the software side:

  • Support for 19c databases.
  • An improved way to apply the patch (the rpm installation is no longer manual).
  • An improved way to roll back the patch in some cases.

Other changes that deserve a mention:

  • Runs over version 19.4 of the GI and RDBMS.
  • Runs over Exadata System Software 19.2.3.0.0 (this means OEL 7) for DB and Storage.
  • Updated versions of TFA, Exachk, and OEDA; all at 19.x.

Click here to read more…

ZDLRA Internals, INDEX_BACKUP task in details

For ZDLRA, the INDEX_BACKUP task type is important (if not the most important one) because it is responsible for creating the virtual full backup. This task runs for every backup that you ingest into ZDLRA, and here I will show in more detail what occurs inside ZDLRA: the internal steps, phases, and tables involved.

I recommend that you check my previous posts about ZDLRA: ZDLRA Internals, Tables and Storage; ZDLRA, Virtual Full Backup and Incremental Forever; and Understanding ZDLRA. They provide a good base for understanding some aspects of the ZDLRA architecture and features.

Backup

As you saw in my previous post, ZDLRA opens every backup that you send and reads every block of it to generate one new virtual full backup. And this backup is validated block by block (physically and logically) against corruption. It differs from a snapshot because it is content-aware (in this case, proprietary Oracle datafile blocks inside another proprietary Oracle RMAN block), and Oracle is the only one that can do this while guaranteeing that the result is valid.

Click here to read more…