Author: Emiliano Fusaglia

Solution Architect at Oracle. I am an IT specialist and I have been managing Oracle Databases since 2000. After some years dedicated to programming I have been specializing as an Oracle DBA covering all the aspects: data modeling, installation, migration, procedures for backup/recovery, PL/SQL, tuning and support to developers. I’m skilled in the design and implementation of High Availability architectures with RAC, ASM, Data Guard on commodity hardware and Oracle Engineered Systems, mainly Exadata Machine. I have been working on engineering and on operation with complex environments, running business critical databases. After a few years working for third party companies I was a contractor for five years in IBM; then I moved to Zurich Switzerland where I worked almost four years at Credit Suisse AG and ten years at Trivadis AG. Today I’m still based in Switzerland and I’m an Oracle employee. Specialties: OCI - DRCC - DBaaS - Oracle Exadata Machine – Oracle RAC – Grid infrastructure (CRS, ASM) – Data Guard – Tuning

Introduction to Oracle Dedicated Region Cloud@Customer (DRCC)

Over the past few years, the public cloud adoption has become mainstream and the enterprises take advantage of the cloud in term of cost reduction, agility, flexibility, automation and self-service activities.

But it exists a large portion of critical workloads that due to regulatory and performance-related contraints, cannot currently be moved to the public cloud. To solve those restrictions, Oracle has engineered a unique solution named Oracle Dedicated Region Cloud@Customer (DRCC).

Designed as alternative to the OCI public cloud, it offers the same services at the same conditions available on the OCI public regions, inside the customer’s data center.

The main use cases for the DRCC adoption are associated to the following constraints:

Data Sovereignty
Security and Control
Network Latency

Data Sovereignty

Data sovereignty generally refers to government efforts to prevent their citizens’ data from falling into the wrong hands via measures that restrict how businesses can transfer personal information beyond their country’s borders. Those measures can be in the form of regulations—think General Data Protection Regulation (GDPR) in the European Union, which regulates data privacy in the European Union and the European Economic Area as well as the transfer of personal data from those regions, or the California Consumer Privacy Act (CCPA), which gives citizens the right to know what personal information companies collect about them and how it is used and shared.

source: https://www.oracle.com/security/saas-security/data-sovereignty/

Security and Control

Restricting the physical access to the data center to special granted employees, and keeping the data behind the company’s firewall.

Network Latency

Specific applications (like trading or Voice over IP) are very sensitive to the increase of the network latency, therefore cannot be easily relocated to the public cloud.

DRCC Overview

As previously mentioned, the DRCC implementation offers the same services at the same conditions available on the OCI public region, for a full list of OCI services and associated SLA please check this link.

The main differences between the DRCC and the OCI region are:

It is built inside the customer’s premises
Data never leaves the customer’s premises
The infrastructure is remotely managed by Oracle
All resources are dedicated to one tenant

High level DRCC architecture

List of currently available major DRCC services

DRCC Benefits

Oracle Dedicated Region Cloud@Customer brings all public cloud capabilities on-premises, so the enterprises can reduce infrastructure and operational costs, upgrade legacy applications on modern cloud platform, and meet the most demanding regulatory, data residency and latency requirements:

Bring all existing OCI services (at the time of this blog post around 80) including the Autonomous Database, eliminating any technical debt.
Leading cloud performance
Security-first architecture that reduces the risk and attack surfaces
Keep full control of all data to meet the most demanding data privacy and latency requirements.
Pay only for the services consumed and during the utilization period.
Deploy seamlessly between on-premises and public cloud, using the exact same tools, APIs, and SLA available in OCI and DRCC.
Single-vendor cloud accountability and management for all cloud platform, database, and infrastructure

Conclusion

Oracle DRCC unique characteristics offers cloud-scale security, resiliency and superior performance, allowing to meet the most stringents requirements in terms of data residency and latency.

DRCC expands the public OCI offers, enabling customers to build and operate modern applications inside their data center at the same price offered in Oracle’s public cloud.

More details about DRCC are available at the follwing Oracle corporate link.

Exadata as Code – PDB Snapshot Cloning

As follow up to the Exadata as Code post, today I’m going to focus on one of the latest features added to our automation: PDB Snapshot Cloning.

PDB snapshot cloning is one of the best development options to offers in a CI/CD project. In an Exadata environment there are special requirements to implement before start using this technology: Sparse Grid Disks and Sparse ASM Disk Group ( description and step-by-step example is available here).

On Exadata, the PDB Snapshot beneficial of all Smart features including offload capabilities, with in addition space- and time-efficient provisioning.

After this brief introduction let’s see how the PDB Snapshot Cloning has been implemented on our Exadata as Code automation.

Exadata Sparse Storage Automation

Select the Oracle Container Database (CDB) where the PDB Snapshot Cloning should be activated, and with One-click provisioning the Sparse Grid disks and Sparse ASM disk group are created.

After the initial provisioning, the automation starts monitoring the space usage, automatically resizing it (increasing/decreasing) when necessary.

PDB Snapshot Cloning Automation

Same principle applies to the different PDB actions: One-click Provisioning/Decommissioning of the PDB Test Master and PDB Snapshot.

Those features are exposed via UI or API to the application developers, making them autonomous on the management of such space efficient database environment.

Exadata as Code

It is very exciting for me to share this post, because what I’m going to describe here is not the final, but the current and intermediate result achieved by the Trivadis team on the development and implementation of what I call “Exadata as Code” project.

In the Cloud and automation era, this is the Trivadis answer to increase efficiency, time-to-market and quality on the most challenging Exadata projects. Trivadis has been working hard to achieve this level of automation, covering most of the recurrent activities on the platform, and this is not all, as any CI/CD development, it is getting every day better, enriched by new features and fixes, which simplify the lifecycle management of such platforms.

Few months ago, I posted a blog entitled Bulk Exadata Patching, where it is shown how to improve the Exadata patching automation, but despite the improvements, it was still having a number of manual interactions. Now we have reached a much better level of automation, with one-click action to perform cumbersome tasks like: Infrastructure and Database Provisioning/Decommissioning, Patching, and many other operational tasks.

How Exadata as Code works

The concept is quite simple, all developments are made available in the form of Ansible playbooks and incapsulated inside Jenkins; this brings the following advantages:

User friendly interface
Orchestration Pipelines
Enhanced Security, recording auditing information and logging job executions
Job Scheduling

Exadata Administration Workflow

Oracle Database Administration Workflow

Wrap-up

One takeaway from this experience, automation is the only option to stay competitive and deliver high quality service.

Special thanks go to all Trivadis’ colleagues working everyday so passionately on the project #BetterTogether.

Ksplice for Exadata

Here is detailed how installing online Uptrack fixes on the Exadata Linux Kernel using Oracle Kesplice.

Quick licensing information this option requires “Oracle Linux Premier Support Customers” for all details please visit this link

Exadata Online Linux Kernel Patching

Downloading the Uptrack from linux.oracle.com

If still present remove the exadata-sun.*computenode-exact RPM

yum list installed | grep 'exadata-sun.*computenode-exact'

[root@tvdexa07db01 Ksplice]# yum list installed | grep 'exadata-sun.*computenode-exact' exadata-sun-vm-computenode-exact.noarch [root@tvdexa07db01 Ksplice]#

[root@tvdexa07db01 Ksplice]# yum erase exadata-sun-vm-computenode-exact.noarch Resolving Dependencies --> Running transaction check ---> Package exadata-sun-vm-computenode-exact.noarch 0:19.3.4.0.0.200130-1 will be erased --> Finished Dependency Resolution

Dependencies Resolved

…

Install the Uptrack update

[root@tvdexa07db01 Ksplice]# yum install uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64-20200417-0.noarch.rpm Examining uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64-20200417-0.noarch.rpm: uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64-20200417-0.noarch Marking uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64-20200417-0.noarch.rpm as an update to uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64-20200128-0.noarch Resolving Dependencies --> Running transaction check ---> Package uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64.noarch 0:20200128-0 will be updated ---> Package uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64.noarch 0:20200417-0 will be an update --> Finished Dependency Resolution

Dependencies Resolved

Upgrade 1 Package

Total size: 18 M Is this ok [y/d/N]: y Downloading packages: Running transaction check Running transaction test Transaction test succeeded Running transaction Updating : uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64-20200417-0.noarch 1/2 The following steps will be taken: Install [mnnc6fcu] CVE-2019-3016: Privilege escalation in KVM guest paravirtualized TLB flushes. Install [tvgkcphj] Improved fix to CVE-2019-3016: Privilege escalation in KVM guest paravirtualized TLB flushes. Install [bss17slx] CVE-2019-19332: Denial-of-service in KVM cpuid emulation reporting. Install [sl587t78] Missing Non Maskable Interrupts on AMD KVM guests. Install [98393wix] CVE-2016-5244: Information leak in the RDS network protocol. Install [alabfukc] CVE-2019-17666: Remote code execution in Realtek peer-to-peer Wifi. Install [sc77umtr] CVE-2019-0155: Privilege escalation in Intel i915 graphics driver. Install [o8kefoq8] CVE-2019-0154: Denial-of-service in Intel i915 graphics driver. Install [cy9t2fe7] Improved fix to CVE-2019-11135: Side-channel information leak in Intel TSX on late microcode update. Install [3h40qtja] IO hang on block multiqueue device waits. Install [ll4csbbo] CVE-2019-20054: Denial-of-service in procfs sysctl removal. Install [r7vboahq] Deadlock in Reliable Datagram socket connection. Install [pojd20br] Denial-of-service in Reliable Datagram Socket send cancellation. Install [ayrz7nql] CVE-2020-2732: Privilege escalation in Intel KVM nested emulation. Install [e6szjj4l] CVE-2019-14901: Denial-of-service when parsing TDLS action frame in Marvell WiFi-Ex driver. Install [9253ftyr] CVE-2019-15291: Denial-of-service in B2C2 FlexCop driver probing. Install [h6qm8tqh] IO hang in Reliable Datagram Socket remote DMA socket closing. Install [jmo8wvme] Denial-of-service in CIFS POSIX file locks on close. Install [c92mbxy5] CVE-2019-15538: Denial-of-service in XFS filesystem with Quota support enabled. Install [6z2m3403] CVE-2020-7053: Use-after-free when destroying i915 GEM context. Install [c5t3d5fa] NULL dereference while loading userspace I/O driver. Install [8dgt3zi6] CVE-2019-14895: Denial-of-service when receiving Country WLAN element in Marvell WiFi-Ex driver. Install [a4scbw6j] Kernel panic in Infiniband RDMA QP send queues. Install [9m542zj0] Denial-of-service in IPMI device registration. Install [jq0r6r0v] Denial-of-service in Pressure Stall Information cgroup destruction. Install [m5jqkbu3] Denial-of-service in Infiniband pool destruction. Install [miag3lwz] CVE-2019-19338: Missing Intel TAA mitigation in KVM guests. Install [28o9c5jg] CVE-2019-14615: Information leak in Intel i915 generation 9 devices. Install [e9ghelsq] Race condition in crypto initialization causes denial-of-service. Install [8kg2ltrj] Denial-of-service in XFS whiteout renames. Install [djt21tl9] NULL pointer dereference when mounting a GFS2 filesystem. Install [fmlfqa0h] NULL-pointer dereference when driver logs in/out of system. Install [r0hywzwk] Soft lockup in low memory situation due to misused allocator in LPFC. Install [o6ivsuer] Use-after-free due to race condition in LPFC NVME write handling. Install [muzxn90m] Hard lock up when handling aborts for LPFC. Install [297glmum] Soft lockup when multiple threads read /proc/stat simultaneously. Install [p2wsrgbk] Denial-of-service in the batman-adv subsystem. Install [tg92ldhm] Spurious signals during TTY reopen. Install [e6k3gzxq] CVE-2019-18809: Memory leak when identifying state in Afatech AF9005 DVB-T USB1.1 driver. Install [9ccfw3yo] CVE-2019-18806: Memory leak when allocating large buffers in QLogic QLA3XXX Network driver. Install [73fkk23h] Use-after-free when using NFS with page cache. Install [d4thcifo] NULL pointer dereference during UBIFS mount. Install [jdwrgfic] CVE-2020-10942: Out-of-bounds memory access in the Virtual host driver. Install [iizw2yj1] Use-after-free when constructing ERSPAN packet header. Install [221vs9ih] Deadlock when deleting NVMe namespace fails. Install [1outnjhr] Out-of-bounds read when transmitting packet using XFRM.
Install [i6sjtrm9] Point-to-Point Protocol IOCDETACH ioctl causes use-after-free. Install [8grqdp2d] Flawed logic in read/write semaphore implementation causes crash. Install [jsndggsn] System fails to generate vmcore dump after panic. Install [p6dow9c8] CVE-2018-5953: Information leak in software IO TLB driver. Installing [mnnc6fcu] CVE-2019-3016: Privilege escalation in KVM guest paravirtualized TLB flushes. Installing [tvgkcphj] Improved fix to CVE-2019-3016: Privilege escalation in KVM guest paravirtualized TLB flushes. Installing [bss17slx] CVE-2019-19332: Denial-of-service in KVM cpuid emulation reporting. Installing [sl587t78] Missing Non Maskable Interrupts on AMD KVM guests. Installing [98393wix] CVE-2016-5244: Information leak in the RDS network protocol. Installing [alabfukc] CVE-2019-17666: Remote code execution in Realtek peer-to-peer Wifi. Installing [sc77umtr] CVE-2019-0155: Privilege escalation in Intel i915 graphics driver. Installing [o8kefoq8] CVE-2019-0154: Denial-of-service in Intel i915 graphics driver. Installing [cy9t2fe7] Improved fix to CVE-2019-11135: Side-channel information leak in Intel TSX on late microcode update. Installing [3h40qtja] IO hang on block multiqueue device waits. Installing [ll4csbbo] CVE-2019-20054: Denial-of-service in procfs sysctl removal. Installing [r7vboahq] Deadlock in Reliable Datagram socket connection. Installing [pojd20br] Denial-of-service in Reliable Datagram Socket send cancellation. Installing [ayrz7nql] CVE-2020-2732: Privilege escalation in Intel KVM nested emulation. Installing [e6szjj4l] CVE-2019-14901: Denial-of-service when parsing TDLS action frame in Marvell WiFi-Ex driver. Installing [9253ftyr] CVE-2019-15291: Denial-of-service in B2C2 FlexCop driver probing. Installing [h6qm8tqh] IO hang in Reliable Datagram Socket remote DMA socket closing. Installing [jmo8wvme] Denial-of-service in CIFS POSIX file locks on close. Installing [c92mbxy5] CVE-2019-15538: Denial-of-service in XFS filesystem with Quota support enabled. Installing [6z2m3403] CVE-2020-7053: Use-after-free when destroying i915 GEM context. Installing [c5t3d5fa] NULL dereference while loading userspace I/O driver. Installing [8dgt3zi6] CVE-2019-14895: Denial-of-service when receiving Country WLAN element in Marvell WiFi-Ex driver. Installing [a4scbw6j] Kernel panic in Infiniband RDMA QP send queues. Installing [9m542zj0] Denial-of-service in IPMI device registration. Installing [jq0r6r0v] Denial-of-service in Pressure Stall Information cgroup destruction. Installing [m5jqkbu3] Denial-of-service in Infiniband pool destruction. Installing [miag3lwz] CVE-2019-19338: Missing Intel TAA mitigation in KVM guests. Installing [28o9c5jg] CVE-2019-14615: Information leak in Intel i915 generation 9 devices. Installing [e9ghelsq] Race condition in crypto initialization causes denial-of-service. Installing [8kg2ltrj] Denial-of-service in XFS whiteout renames. Installing [djt21tl9] NULL pointer dereference when mounting a GFS2 filesystem. Installing [fmlfqa0h] NULL-pointer dereference when driver logs in/out of system. Installing [r0hywzwk] Soft lockup in low memory situation due to misused allocator in LPFC. Installing [o6ivsuer] Use-after-free due to race condition in LPFC NVME write handling. Installing [muzxn90m] Hard lock up when handling aborts for LPFC. Installing [297glmum] Soft lockup when multiple threads read /proc/stat simultaneously. Installing [p2wsrgbk] Denial-of-service in the batman-adv subsystem. Installing [tg92ldhm] Spurious signals during TTY reopen. Installing [e6k3gzxq] CVE-2019-18809: Memory leak when identifying state in Afatech AF9005 DVB-T USB1.1 driver. Installing [9ccfw3yo] CVE-2019-18806: Memory leak when allocating large buffers in QLogic QLA3XXX Network driver. Installing [73fkk23h] Use-after-free when using NFS with page cache. Installing [d4thcifo] NULL pointer dereference during UBIFS mount. Installing [jdwrgfic] CVE-2020-10942: Out-of-bounds memory access in the Virtual host driver. Installing [iizw2yj1] Use-after-free when constructing ERSPAN packet header. Installing [221vs9ih] Deadlock when deleting NVMe namespace fails. Installing [1outnjhr] Out-of-bounds read when transmitting packet using XFRM. Installing [i6sjtrm9] Point-to-Point Protocol IOCDETACH ioctl causes use-after-free. Installing [8grqdp2d] Flawed logic in read/write semaphore implementation causes crash. Installing [jsndggsn] System fails to generate vmcore dump after panic. Installing [p6dow9c8] CVE-2018-5953: Information leak in software IO TLB driver. Your kernel is fully up to date. Effective kernel version is 4.14.35-1902.301.1.el7uek Cleanup : uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64-20200128-0.noarch 2/2 Verifying : uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64-20200417-0.noarch 1/2 Verifying : uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64-20200128-0.noarch 2/2

Updated: uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64.noarch 0:20200417-0

Complete! [root@tvdexa07db01 Ksplice]#

Check the Newly Installed Uptrack Version

[root@tvdexa07db01 Ksplice]# yum list installed | grep uptrack-updates uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64.noarch 20200417-0 @/uptrack-updates-4.14.35-1902.9.2.el7uek.x86_64-20200417-0.noarch

[root@tvdexa07db01 Ksplice]# uname -r 4.14.35-1902.9.2.el7uek.x86_64

[root@tvdexa07db01 Ksplice]# uptrack-uname -r 4.14.35-1902.301.1.el7uek.x86_64

[root@tvdexa07db01 Ksplice]# uptrack-show Installed updates: [qb92vyqf] Known exploit detection. [inu8wb4r] Known exploit detection for CVE-2017-7308. [cd8sbgv7] Known exploit detection for CVE-2018-14634. [1kzvx01a] KPTI enablement for Ksplice. [6vwn8ut1] Known exploit detection for CVE-2018-18445. [4zccrjds] KSPLICE enablement for patching KVM Intel module. [gzezyqb5] Memory leak in Mellanox ConnextX HCA Infiniband CX-3 virtual functions. [bk9sc6f2] NULL pointer dereference when mounting a CIFS filesystem with invalid mount option. [tiomcx4f] CVE-2019-15917: Use-after-free when registering Bluetooth HCI uart device. [1oztqfiv] Memory corruption in Reliable Datagram Socket send completion. [13vvcp2l] Oracle ASM device hang during offline. [d1r03k7l] Network stall during RDMA failover. [mnnc6fcu] CVE-2019-3016: Privilege escalation in KVM guest paravirtualized TLB flushes. [tvgkcphj] Improved fix to CVE-2019-3016: Privilege escalation in KVM guest paravirtualized TLB flushes. [bss17slx] CVE-2019-19332: Denial-of-service in KVM cpuid emulation reporting. [sl587t78] Missing Non Maskable Interrupts on AMD KVM guests. [98393wix] CVE-2016-5244: Information leak in the RDS network protocol. [alabfukc] CVE-2019-17666: Remote code execution in Realtek peer-to-peer Wifi. [sc77umtr] CVE-2019-0155: Privilege escalation in Intel i915 graphics driver. [o8kefoq8] CVE-2019-0154: Denial-of-service in Intel i915 graphics driver. [cy9t2fe7] Improved fix to CVE-2019-11135: Side-channel information leak in Intel TSX on late microcode update. [3h40qtja] IO hang on block multiqueue device waits. [ll4csbbo] CVE-2019-20054: Denial-of-service in procfs sysctl removal. [r7vboahq] Deadlock in Reliable Datagram socket connection. [pojd20br] Denial-of-service in Reliable Datagram Socket send cancellation. [ayrz7nql] CVE-2020-2732: Privilege escalation in Intel KVM nested emulation. [e6szjj4l] CVE-2019-14901: Denial-of-service when parsing TDLS action frame in Marvell WiFi-Ex driver. [9253ftyr] CVE-2019-15291: Denial-of-service in B2C2 FlexCop driver probing. [h6qm8tqh] IO hang in Reliable Datagram Socket remote DMA socket closing. [jmo8wvme] Denial-of-service in CIFS POSIX file locks on close. [c92mbxy5] CVE-2019-15538: Denial-of-service in XFS filesystem with Quota support enabled. [6z2m3403] CVE-2020-7053: Use-after-free when destroying i915 GEM context. [c5t3d5fa] NULL dereference while loading userspace I/O driver. [8dgt3zi6] CVE-2019-14895: Denial-of-service when receiving Country WLAN element in Marvell WiFi-Ex driver. [a4scbw6j] Kernel panic in Infiniband RDMA QP send queues. [9m542zj0] Denial-of-service in IPMI device registration. [jq0r6r0v] Denial-of-service in Pressure Stall Information cgroup destruction. [m5jqkbu3] Denial-of-service in Infiniband pool destruction. [miag3lwz] CVE-2019-19338: Missing Intel TAA mitigation in KVM guests. [28o9c5jg] CVE-2019-14615: Information leak in Intel i915 generation 9 devices. [e9ghelsq] Race condition in crypto initialization causes denial-of-service. [8kg2ltrj] Denial-of-service in XFS whiteout renames. [djt21tl9] NULL pointer dereference when mounting a GFS2 filesystem. [fmlfqa0h] NULL-pointer dereference when driver logs in/out of system. [r0hywzwk] Soft lockup in low memory situation due to misused allocator in LPFC. [o6ivsuer] Use-after-free due to race condition in LPFC NVME write handling. [muzxn90m] Hard lock up when handling aborts for LPFC. [297glmum] Soft lockup when multiple threads read /proc/stat simultaneously. [p2wsrgbk] Denial-of-service in the batman-adv subsystem. [tg92ldhm] Spurious signals during TTY reopen. [e6k3gzxq] CVE-2019-18809: Memory leak when identifying state in Afatech AF9005 DVB-T USB1.1 driver. [9ccfw3yo] CVE-2019-18806: Memory leak when allocating large buffers in QLogic QLA3XXX Network driver. [73fkk23h] Use-after-free when using NFS with page cache. [d4thcifo] NULL pointer dereference during UBIFS mount. [jdwrgfic] CVE-2020-10942: Out-of-bounds memory access in the Virtual host driver. [iizw2yj1] Use-after-free when constructing ERSPAN packet header. [221vs9ih] Deadlock when deleting NVMe namespace fails. [1outnjhr] Out-of-bounds read when transmitting packet using XFRM. [i6sjtrm9] Point-to-Point Protocol IOCDETACH ioctl causes use-after-free. [8grqdp2d] Flawed logic in read/write semaphore implementation causes crash. [jsndggsn] System fails to generate vmcore dump after panic. [p6dow9c8] CVE-2018-5953: Information leak in software IO TLB driver.

Effective kernel version is 4.14.35-1902.301.1.el7uek [root@tvdexa07db01 Ksplice]#

One last personal feedback about Oracle Kesplice, this tool is really cool! Few times I used to quickly and effectively fix Kernel Bug online, avoiding unplanned service interruption.

Bulk Exadata Patching

After more than 11 year from the launch Oracle Exadata Machine has become popular on many companies across industries, making administrators, developers and final users almost unanimously satisfied about performance and availability.

But also, on Exadata there are cumbersome maintenance activities like patching.

Most of my Exadata customers have acquired 2 non-full RACKs, which makes the patching effort quite reasonable; but recently I started working on a project with multiple full RACKs, with tens of Storage Servers, Compute Nodes and hundreds of Virtual Machines…

A very challenging environment, especially when it came to patching…

Patching all the systems using the standard patchmgr utiliy was not acceptable, therefore I had to replace my standard patching procedure with a new one offering automation and scalability.

At this subject Oracle provides few handy options:

Patching Exadata Infrastructure

Storage Server Patching via http/https server: starting with Oracle Exadata System Software release 18.1.0.0.0, it is possible to patch the Storage Servers using an external http server hosting the new software image. The activitiy can be scheduled up to one week before the installation, allowing on each Cell the Management Server (MS) to start downloading and run pre-checks in advance. MS interrupts the software upgrade and generates an alert if the Cell does not comply with all pre-requisites.

Unbreakable Linux Network: ULN offers software patches, updates, and fixes for Oracle Linux and Oracle VM. The implementation of a local YUM repository leverages the patch automation of the bare metal OS or dom0/domU.

InfiniBand Switch: standard rolling upgrade patching procedure using patchmgr.

Patching Grid Infrastructure & RDBMS

GI & RDBMS: those components are patched using the standard Oracle tools common to all platforms, but the entire process has been parallalized using OS tools like dcli commands.

Overview Bulk Exadata Patching

Main Patching Commands

Storage Server – Scheduling Automated Storage Server Update via HTTP/HTTPS

On the Storage Cell set the local Apache location hosting the cell software

[root@efucndb01-a ~]# dcli -l root -g ~/cells cellcli -e 'alter softwareUpdate store=\"http://uln-yum.emilianofusaglia.net/cellsw\"'
efucncel01-a: Software Update successfully altered.
efucncel02-a: Software Update successfully altered.
efucncel03-a: Software Update successfully altered.
efucncel04-a: Software Update successfully altered.
efucncel05-a: Software Update successfully altered.
efucncel06-a: Software Update successfully altered.
efucncel07-a: Software Update successfully altered.
efucncel08-a: Software Update successfully altered.
efucncel09-a: Software Update successfully altered.
efucncel10-a: Software Update successfully altered.
efucncel11-a: Software Update successfully altered.
efucncel12-a: Software Update successfully altered.
efucncel13-a: Software Update successfully altered.
efucncel14-a: Software Update successfully altered.
[root@efucndb01-a ~]#

Schedule the update

[root@efucndb01-a ~]# dcli -l root -g ~/cells cellcli -e 'alter softwareUpdate time=\"03:20 AM WEDNESDAY\"'
efucncel01-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel02-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel03-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel04-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel05-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel06-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel07-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel08-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel09-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel10-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel11-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel12-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel13-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
efucncel14-a: Software update is scheduled to begin at: 2020-02-05T03:20:00+01:00.
[root@efucndb01-a ~]#

Verify the scheduled upgrade

[root@efucndb01-a ~]# dcli -l root -g ~/cells cellcli -e 'list softwareupdate detail'
efucncel01-a: name: 19.3.4.0.0.200130
efucncel01-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel01-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel01-a: time: 2020-02-05T03:20:00+01:00
efucncel02-a: name: 19.3.4.0.0.200130
efucncel02-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel02-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel02-a: time: 2020-02-05T03:20:00+01:00
efucncel03-a: name: 19.3.4.0.0.200130
efucncel03-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel03-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel03-a: time: 2020-02-05T03:20:00+01:00
efucncel04-a: name: 19.3.4.0.0.200130
efucncel04-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel04-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel04-a: time: 2020-02-05T03:20:00+01:00
efucncel05-a: name: 19.3.4.0.0.200130
efucncel05-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel05-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel05-a: time: 2020-02-05T03:20:00+01:00
efucncel06-a: name: 19.3.4.0.0.200130
efucncel06-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel06-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel06-a: time: 2020-02-05T03:20:00+01:00
efucncel07-a: name: 19.3.4.0.0.200130
efucncel07-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel07-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel07-a: time: 2020-02-05T03:20:00+01:00
efucncel08-a: name: 19.3.4.0.0.200130
efucncel08-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel08-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel08-a: time: 2020-02-05T03:20:00+01:00
efucncel09-a: name: 19.3.4.0.0.200130
efucncel09-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel09-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel09-a: time: 2020-02-05T03:20:00+01:00
efucncel10-a: name: 19.3.4.0.0.200130
efucncel10-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel10-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel10-a: time: 2020-02-05T03:20:00+01:00
efucncel11-a: name: 19.3.4.0.0.200130
efucncel11-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel11-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel11-a: time: 2020-02-05T03:20:00+01:00
efucncel12-a: name: 19.3.4.0.0.200130
efucncel12-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel12-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel12-a: time: 2020-02-05T03:20:00+01:00
efucncel13-a: name: 19.3.4.0.0.200130
efucncel13-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel13-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel13-a: time: 2020-02-05T03:20:00+01:00
efucncel14-a: name: 19.3.4.0.0.200130
efucncel14-a: status: PreReq OK. Ready to update at: 2020-02-05T03:20:00+01:00
efucncel14-a: store: http://uln-yum.emilianofusaglia.net/cellsw
efucncel14-a: time: 2020-02-05T03:20:00+01:00
[root@efucndb01-a ~]#

Unbreakable Linux Network

dom0 checks

[root@efuconsole dbserver_patch_19.200120]# ./patchmgr -dbnodes ~/dom0 -precheck -yum_repo http://uln-yum.emilianofusaglia.net/yum/EngineeredSystems/exadata/dbserver/dom0/19.3.4.0.0/base/x86_64 -target_version 19.3.4.0.0.200130

NOTE patchmgr release: 19.200120 (always check MOS 1553103.1 for the latest release of dbserver.patch.zip)
NOTE
WARNING Do not interrupt the patchmgr session.
WARNING Do not resize the screen. It may disturb the screen layout.
WARNING Do not reboot database nodes during update or rollback.
WARNING Do not open logfiles in write mode and do not try to alter them.

2020-02-06 14:06:17 +0100 :Working: Verify SSH equivalence for the root user to node(s)
2020-02-06 14:06:19 +0100 :SUCCESS: Verify SSH equivalence for the root user to node(s)
2020-02-06 14:06:22 +0100 :Working: Initiate precheck on 8 node(s)
2020-02-06 14:07:36 +0100 :Working: Check free space on node(s)
2020-02-06 14:07:42 +0100 :SUCCESS: Check free space on node(s)
2020-02-06 14:08:07 +0100 :Working: dbnodeupdate.sh running a precheck on node(s).
2020-02-06 14:09:43 +0100 :SUCCESS: Initiate precheck on node(s).
2020-02-06 14:09:45 +0100 :SUCCESS: Completed run of command: ./patchmgr -dbnodes /root/dom0 -precheck -yum_repo http://uln-yum.emilianofusaglia.net/yum/EngineeredSystems/exadata/dbserver/dom0/19.3.4.0.0/base/x86_64 -target_version 19.3.4.0.0.200130
2020-02-06 14:09:45 +0100 :INFO : Precheck attempted on nodes in file /root/dom0: [efucndb01-a efucndb02-a efucndb03-a efucndb04-a efucndb05-a efucndb06-a efucndb07-a efucndb08-a]
2020-02-06 14:09:45 +0100 :INFO : Current image version on dbnode(s) is:
2020-02-06 14:09:45 +0100 :INFO : efucndb01-a: 19.2.6.0.0.190911.1
2020-02-06 14:09:45 +0100 :INFO : efucndb02-a: 19.2.6.0.0.190911.1
2020-02-06 14:09:45 +0100 :INFO : efucndb03-a: 19.2.6.0.0.190911.1
2020-02-06 14:09:45 +0100 :INFO : efucndb04-a: 19.2.6.0.0.190911.1
2020-02-06 14:09:45 +0100 :INFO : efucndb05-a: 19.2.6.0.0.190911.1
2020-02-06 14:09:45 +0100 :INFO : efucndb06-a: 19.2.6.0.0.190911.1
2020-02-06 14:09:45 +0100 :INFO : efucndb07-a: 19.2.6.0.0.190911.1
2020-02-06 14:09:45 +0100 :INFO : efucndb08-a: 19.2.6.0.0.190911.1
2020-02-06 14:09:45 +0100 :INFO : For details, check the following files in /EXAVMIMAGES/Patch/patchmgr_DBSERVER/dbserver_patch_19.200120:
2020-02-06 14:09:45 +0100 :INFO : - _dbnodeupdate.log
2020-02-06 14:09:45 +0100 :INFO : - patchmgr.log
2020-02-06 14:09:45 +0100 :INFO : - patchmgr.trc
2020-02-06 14:09:45 +0100 :INFO : Exit status:0
2020-02-06 14:09:45 +0100 :INFO : Exiting.
[root@efucndb01-a dbserver_patch_19.200120]#

dom0 upgrade

[root@efuconsole dbserver_patch_19.200120]# ./patchmgr -dbnodes ~/dom0 -upgrade -yum_repo http://uln-yum.emilianofusaglia.net/yum/EngineeredSystems/exadata/dbserver/dom0/19.3.4.0.0/base/x86_64 -target_version 19.3.4.0.0.200130

NOTE patchmgr release: 19.200120 (always check MOS 1553103.1 for the latest release of dbserver.patch.zip)
NOTE
NOTE Database nodes will reboot during the update process.
NOTE
WARNING Do not interrupt the patchmgr session.
WARNING Do not resize the screen. It may disturb the screen layout.
WARNING Do not reboot database nodes during update or rollback.
WARNING Do not open logfiles in write mode and do not try to alter them.

2020-02-06 14:29:11 +0100 :Working: Verify SSH equivalence for the root user to node(s)
2020-02-06 14:29:13 +0100 :SUCCESS: Verify SSH equivalence for the root user to node(s)
2020-02-06 14:29:15 +0100 :Working: Initiate prepare steps on node(s).
2020-02-06 14:29:17 +0100 :Working: Check free space on node(s)
2020-02-06 14:29:23 +0100 :SUCCESS: Check free space on node(s)
2020-02-06 14:29:59 +0100 :SUCCESS: Initiate prepare steps on node(s).
2020-02-06 14:29:59 +0100 :Working: Initiate update on 8 node(s).
2020-02-06 14:29:59 +0100 :Working: dbnodeupdate.sh running a backup on 8 node(s).
2020-02-06 14:36:16 +0100 :SUCCESS: dbnodeupdate.sh running a backup on 8 node(s).
2020-02-06 14:36:16 +0100 :Working: Initiate update on node(s)
2020-02-06 14:36:16 +0100 :Working: Get information about any required OS upgrades from node(s).
2020-02-06 14:36:28 +0100 :SUCCESS: Get information about any required OS upgrades from node(s).
2020-02-06 14:36:28 +0100 :Working: dbnodeupdate.sh running an update step on all nodes.
2020-02-06 14:56:38 +0100 :INFO : efucndb01-a is ready to reboot.
2020-02-06 14:56:38 +0100 :INFO : efucndb02-a is ready to reboot.
2020-02-06 14:56:38 +0100 :INFO : efucndb03-a is ready to reboot.
2020-02-06 14:56:38 +0100 :INFO : efucndb04-a is ready to reboot.
2020-02-06 14:56:38 +0100 :INFO : efucndb05-a is ready to reboot.
2020-02-06 14:56:38 +0100 :INFO : efucndb06-a is ready to reboot.
2020-02-06 14:56:38 +0100 :INFO : efucndb07-a is ready to reboot.
2020-02-06 14:56:38 +0100 :INFO : efucndb08-a is ready to reboot.
2020-02-06 14:56:39 +0100 :SUCCESS: dbnodeupdate.sh running an update step on all nodes.
2020-02-06 14:56:51 +0100 :Working: Initiate reboot on node(s)
2020-02-06 14:56:55 +0100 :SUCCESS: Initiate reboot on node(s)
2020-02-06 14:56:55 +0100 :Working: Waiting to ensure node(s) is down before reboot.
2020-02-06 14:58:20 +0100 :SUCCESS: Waiting to ensure node(s) is down before reboot.
2020-02-06 14:58:20 +0100 :Working: Waiting to ensure node(s) is up after reboot.
2020-02-06 15:04:23 +0100 :SUCCESS: Waiting to ensure node(s) is up after reboot.
2020-02-06 15:04:23 +0100 :Working: Waiting to connect to node(s) with SSH. During Linux upgrades this can take some time.
2020-02-06 15:27:46 +0100 :SUCCESS: Waiting to connect to node(s) with SSH. During Linux upgrades this can take some time.
2020-02-06 15:27:46 +0100 :Working: Wait for node(s) is ready for the completion step of update.
2020-02-06 15:31:29 +0100 :SUCCESS: Wait for node(s) is ready for the completion step of update.
2020-02-06 15:31:30 +0100 :Working: Initiate completion step from dbnodeupdate.sh on node(s)
2020-02-06 15:48:10 +0100 :SUCCESS: Initiate completion step from dbnodeupdate.sh on efucndb01-a
2020-02-06 15:48:14 +0100 :SUCCESS: Initiate completion step from dbnodeupdate.sh on efucndb02-a
2020-02-06 15:48:19 +0100 :SUCCESS: Initiate completion step from dbnodeupdate.sh on efucndb03-a
2020-02-06 15:48:30 +0100 :SUCCESS: Initiate completion step from dbnodeupdate.sh on efucndb04-a
2020-02-06 15:48:35 +0100 :SUCCESS: Initiate completion step from dbnodeupdate.sh on efucndb05-a
2020-02-06 15:48:46 +0100 :SUCCESS: Initiate completion step from dbnodeupdate.sh on efucndb06-a
2020-02-06 15:48:50 +0100 :SUCCESS: Initiate completion step from dbnodeupdate.sh on efucndb07-a
2020-02-06 15:49:02 +0100 :SUCCESS: Initiate completion step from dbnodeupdate.sh on efucndb08-a
2020-02-06 15:49:18 +0100 :SUCCESS: Initiate update on node(s).
2020-02-06 15:49:18 +0100 :SUCCESS: Initiate update on 0 node(s).
[INFO ] Collected dbnodeupdate diag in file: Diag_patchmgr_dbnode_upgrade_060220142909.tbz
-rw-r--r-- 1 root root 6381043 Feb 6 15:49 Diag_patchmgr_dbnode_upgrade_060220142909.tbz
2020-02-06 15:49:22 +0100 :SUCCESS: Completed run of command: ./patchmgr -dbnodes /root/dom0 -upgrade -yum_repo http://uln-yum.emilianofusaglia.net/yum/EngineeredSystems/exadata/dbserver/dom0/19.3.4.0.0/base/x86_64 -target_version 19.3.4.0.0.200130
2020-02-06 15:49:22 +0100 :INFO : Upgrade attempted on nodes in file /root/dom0: [efucndb01-a efucndb02-a efucndb03-a efucndb04-a efucndb05-a efucndb06-a efucndb07-a efucndb08-a]
2020-02-06 15:49:22 +0100 :INFO : Current image version on dbnode(s) is:
2020-02-06 15:49:22 +0100 :INFO : efucndb01-a: 19.3.4.0.0.200130
2020-02-06 15:49:22 +0100 :INFO : efucndb02-a: 19.3.4.0.0.200130
2020-02-06 15:49:22 +0100 :INFO : efucndb03-a: 19.3.4.0.0.200130
2020-02-06 15:49:22 +0100 :INFO : efucndb04-a: 19.3.4.0.0.200130
2020-02-06 15:49:22 +0100 :INFO : efucndb05-a: 19.3.4.0.0.200130
2020-02-06 15:49:22 +0100 :INFO : efucndb06-a: 19.3.4.0.0.200130
2020-02-06 15:49:22 +0100 :INFO : efucndb07-a: 19.3.4.0.0.200130
2020-02-06 15:49:22 +0100 :INFO : efucndb08-a: 19.3.4.0.0.200130
2020-02-06 15:49:22 +0100 :INFO : For details, check the following files in /EXAVMIMAGES/Patch/patchmgr_DBSERVER/dbserver_patch_19.200120:
2020-02-06 15:49:22 +0100 :INFO : - _dbnodeupdate.log
2020-02-06 15:49:22 +0100 :INFO : - patchmgr.log
2020-02-06 15:49:22 +0100 :INFO : - patchmgr.trc
2020-02-06 15:49:22 +0100 :INFO : Exit status:0
2020-02-06 15:49:22 +0100 :INFO : Exiting.

InfiniBand Switch

IB switch checks

[root@efucndb01-a patch_switch_19.3.4.0.0.200130]# ./patchmgr -ibswitches ~/ibs -upgrade -ibswitch_precheck
2020-02-10 07:57:44 +0100 :Working: Verify SSH equivalence for the root user to node(s)
2020-02-10 07:57:46 +0100 :SUCCESS: Verify SSH equivalence for the root user to node(s)
2020-02-10 07:57:47 +0100 1 of 1 :Working: Initiate pre-upgrade validation check on InfiniBand switch(es).
----- InfiniBand switch update process started 2020-02-10 07:57:48 +0100 -----
[NOTE ] Log file at /EXAVMIMAGES/Patch/IBs_19.3.4.0.0/patch_switch_19.3.4.0.0.200130/upgradeIBSwitch.log
[INFO ] List of InfiniBand switches for upgrade: ( efucnsw-iba01-a efucnsw-ibb01-a )
[SUCCESS ] Verifying Network connectivity to efucnsw-iba01-a
[SUCCESS ] Verifying Network connectivity to efucnsw-ibb01-a
[SUCCESS ] Validating verify-topology output
[INFO ] Master Subnet Manager is set to "efucnsw-iba01-a" in all Switches
[INFO ] ---------- Starting with InfiniBand Switch efucnsw-iba01-a
[WARNING ] Infiniband switch meets minimal version requirements, but downgrade is only available to 2.2.13-2 with the current package.
To downgrade to other versions:
Manually download the InfiniBand switch firmware package to the patch directory
Set export variable "EXADATA_IMAGE_IBSWITCH_DOWNGRADE_VERSION" to the appropriate version
Run patchmgr command to initiate downgrade.
[SUCCESS ] Verify SSH access to the patchmgr host efucndb01-a.emilianofusaglia.net from the InfiniBand Switch efucnsw-iba01-a.
[INFO ] Starting pre-update validation on efucnsw-iba01-a
[SUCCESS ] Verifying that /tmp has 150M in efucnsw-iba01-a, found 492M
[SUCCESS ] Verifying that / has 20M in efucnsw-iba01-a, found 28M
[SUCCESS ] NTP daemon is running on efucnsw-iba01-a.
[INFO ] Manually validate the following entries Date:(YYYY-aM-DD) 2020-02-10 Time:(HH:MM:SS) 07:58:05
[INFO ] Validating the current firmware on the InfiniBand Switch
[SUCCESS ] Firmware verification on InfiniBand switch efucnsw-iba01-a
[SUCCESS ] Verifying that the patchmgr host efucndb01-a.emilianofusaglia.net is recognized on the InfiniBand Switch efucnsw-iba01-a through getHostByName
[SUCCESS ] Execute plugin check for Patch Check Prereq on efucnsw-iba01-a
[INFO ] Finished pre-update validation on efucnsw-iba01-a
[SUCCESS ] Pre-update validation on efucnsw-iba01-a
[SUCCESS ] Prereq check on efucnsw-iba01-a
[INFO ] ---------- Starting with InfiniBand Switch efucnsw-ibb01-a
[WARNING ] Infiniband switch meets minimal version requirements, but downgrade is only available to 2.2.13-2 with the current package.
To downgrade to other versions:
Manually download the InfiniBand switch firmware package to the patch directory
Set export variable "EXADATA_IMAGE_IBSWITCH_DOWNGRADE_VERSION" to the appropriate version
Run patchmgr command to initiate downgrade.
[SUCCESS ] Verify SSH access to the patchmgr host efucndb01-a.emilianofusaglia.net from the InfiniBand Switch efucnsw-ibb01-a.
[INFO ] Starting pre-update validation on efucnsw-ibb01-a
[SUCCESS ] Verifying that /tmp has 150M in efucnsw-ibb01-a, found 492M
[SUCCESS ] Verifying that / has 20M in efucnsw-ibb01-a, found 28M
[SUCCESS ] NTP daemon is running on efucnsw-ibb01-a.
[INFO ] Manually validate the following entries Date:(YYYY-aM-DD) 2020-02-10 Time:(HH:MM:SS) 07:58:25
[INFO ] Validating the current firmware on the InfiniBand Switch
[SUCCESS ] Firmware verification on InfiniBand switch efucnsw-ibb01-a
[SUCCESS ] Verifying that the patchmgr host efucndb01-a.emilianofusaglia.net is recognized on the InfiniBand Switch efucnsw-ibb01-a through getHostByName
[SUCCESS ] Execute plugin check for Patch Check Prereq on efucnsw-ibb01-a
[INFO ] Finished pre-update validation on efucnsw-ibb01-a
[SUCCESS ] Pre-update validation on efucnsw-ibb01-a
[SUCCESS ] Prereq check on efucnsw-ibb01-a
[SUCCESS ] Overall status
----- InfiniBand switch update process ended 2020-02-10 07:58:42 +0100 -----
2020-02-10 07:58:42 +0100 1 of 1 :SUCCESS: Initiate pre-upgrade validation check on InfiniBand switch(es).
2020-02-10 07:58:42 +0100 :SUCCESS: Completed run of command: ./patchmgr -ibswitches /root/ibs -upgrade -ibswitch_precheck
2020-02-10 07:58:42 +0100 :INFO : upgrade attempted on nodes in file /root/ibs: [efucnsw-iba01-a efucnsw-ibb01-a]
2020-02-10 07:58:42 +0100 :INFO : For details, check the following files in /EXAVMIMAGES/Patch/IBs_19.3.4.0.0/patch_switch_19.3.4.0.0.200130:
2020-02-10 07:58:42 +0100 :INFO : - upgradeIBSwitch.log
2020-02-10 07:58:42 +0100 :INFO : - upgradeIBSwitch.trc
2020-02-10 07:58:42 +0100 :INFO : - patchmgr.stdout
2020-02-10 07:58:42 +0100 :INFO : - patchmgr.stderr
2020-02-10 07:58:42 +0100 :INFO : - patchmgr.log
2020-02-10 07:58:42 +0100 :INFO : - patchmgr.trc
2020-02-10 07:58:42 +0100 :INFO : Exit status:0
2020-02-10 07:58:42 +0100 :INFO : Exiting.
[root@efucndb01-a patch_switch_19.3.4.0.0.200130]#

IB switch upgrade

[root@efucndb01-a patch_switch_19.3.4.0.0.200130]# ./patchmgr -ibswitches ~/ibs -upgrade
2020-02-10 07:59:22 +0100 :Working: Verify SSH equivalence for the root user to node(s)
2020-02-10 07:59:24 +0100 :SUCCESS: Verify SSH equivalence for the root user to node(s)
2020-02-10 07:59:25 +0100 1 of 1 :Working: Initiate upgrade of InfiniBand switches to 2.2.14-1. Expect up to 40 minutes for each switch
----- InfiniBand switch update process started 2020-02-10 07:59:25 +0100 -----
[NOTE ] Log file at /EXAVMIMAGES/Patch/IBs_19.3.4.0.0/patch_switch_19.3.4.0.0.200130/upgradeIBSwitch.log
[INFO ] List of InfiniBand switches for upgrade: ( efucnsw-iba01-a efucnsw-ibb01-a )
[SUCCESS ] Verifying Network connectivity to efucnsw-iba01-a
[SUCCESS ] Verifying Network connectivity to efucnsw-ibb01-a
[SUCCESS ] Validating verify-topology output
[INFO ] Proceeding with upgrade of InfiniBand switches to version 2.2.14_1
[INFO ] Master Subnet Manager is set to "efucnsw-iba01-a" in all Switches
[INFO ] ---------- Starting with InfiniBand Switch efucnsw-iba01-a
[WARNING ] Infiniband switch meets minimal version requirements, but downgrade is only available to 2.2.13-2 with the current package.
To downgrade to other versions:
Manually download the InfiniBand switch firmware package to the patch directory
Set export variable "EXADATA_IMAGE_IBSWITCH_DOWNGRADE_VERSION" to the appropriate version
Run patchmgr command to initiate downgrade.
[SUCCESS ] Verify SSH access to the patchmgr host efucndb01-a.emilianofusaglia.net from the InfiniBand Switch efucnsw-iba01-a.
[INFO ] Starting pre-update validation on efucnsw-iba01-a
[SUCCESS ] Verifying that /tmp has 150M in efucnsw-iba01-a, found 492M
[SUCCESS ] Verifying that / has 20M in efucnsw-iba01-a, found 26M
[SUCCESS ] Service opensmd is running on InfiniBand Switch efucnsw-iba01-a
[SUCCESS ] NTP daemon is running on efucnsw-iba01-a.
[INFO ] Manually validate the following entries Date:(YYYY-aM-DD) 2020-02-10 Time:(HH:MM:SS) 07:59:41
[INFO ] Validating the current firmware on the InfiniBand Switch
[SUCCESS ] Firmware verification on InfiniBand switch efucnsw-iba01-a
[SUCCESS ] Verifying that the patchmgr host efucndb01-a.emilianofusaglia.net is recognized on the InfiniBand Switch efucnsw-iba01-a through getHostByName
[SUCCESS ] Execute plugin check for Patch Check Prereq on efucnsw-iba01-a
[INFO ] Finished pre-update validation on efucnsw-iba01-a
[SUCCESS ] Pre-update validation on efucnsw-iba01-a
[INFO ] Package will be downloaded at firmware update time via scp
[SUCCESS ] Execute plugin check for Patching on efucnsw-iba01-a
[INFO ] Starting upgrade on efucnsw-iba01-a to 2.2.14_1. Please give upto 15 mins for the process to complete. DO NOT INTERRUPT or HIT CTRL+C during the upgrade
[INFO ] Rebooting efucnsw-iba01-a to complete the firmware update. Wait for 15 minutes before continuing. DO NOT MANUALLY REBOOT THE INFINIBAND SWITCH
Connection to efucndb01-a closed by remote host.
Connection to efucndb01-a closed.
2020-02-10 08:27:49 +0100 :Working: Verify SSH equivalence for the root user to node(s)
2020-02-10 08:27:51 +0100 :SUCCESS: Verify SSH equivalence for the root user to node(s)
2020-02-10 08:27:52 +0100 1 of 1 :Working: Initiate upgrade of InfiniBand switches to 2.2.14-1. Expect up to 40 minutes for each switch
----- InfiniBand switch update process started 2020-02-10 08:27:52 +0100 -----
[NOTE ] Log file at /EXAVMIMAGES/Patch/IBs_19.3.4.0.0/patch_switch_19.3.4.0.0.200130/upgradeIBSwitch.log
[INFO ] List of InfiniBand switches for upgrade: ( efucnsw-iba01-a efucnsw-ibb01-a )
[SUCCESS ] Verifying Network connectivity to efucnsw-iba01-a
[SUCCESS ] Verifying Network connectivity to efucnsw-ibb01-a
[INFO ] InfiniBand switch efucnsw-iba01-a is already at target version.
[SUCCESS ] Validating verify-topology output
[INFO ] Proceeding with upgrade of InfiniBand switches to version 2.2.14_1
[INFO ] Master Subnet Manager is set to "efucnsw-ibb01-a" in all Switches
[INFO ] ---------- Starting with InfiniBand Switch efucnsw-ibb01-a
[WARNING ] Infiniband switch meets minimal version requirements, but downgrade is only available to 2.2.13-2 with the current package.
To downgrade to other versions:
Manually download the InfiniBand switch firmware package to the patch directory
Set export variable "EXADATA_IMAGE_IBSWITCH_DOWNGRADE_VERSION" to the appropriate version
Run patchmgr command to initiate downgrade.
[SUCCESS ] Verify SSH access to the patchmgr host efucndb01-a.emilianofusaglia.net from the InfiniBand Switch efucnsw-ibb01-a.
[INFO ] Starting pre-update validation on efucnsw-ibb01-a
[SUCCESS ] Verifying that /tmp has 150M in efucnsw-ibb01-a, found 492M
[SUCCESS ] Verifying that / has 20M in efucnsw-ibb01-a, found 26M
[SUCCESS ] Service opensmd is running on InfiniBand Switch efucnsw-ibb01-a
[SUCCESS ] NTP daemon is running on efucnsw-ibb01-a.
[INFO ] Manually validate the following entries Date:(YYYY-aM-DD) 2020-02-10 Time:(HH:MM:SS) 08:28:07
[INFO ] Validating the current firmware on the InfiniBand Switch
[SUCCESS ] Firmware verification on InfiniBand switch efucnsw-ibb01-a
[SUCCESS ] Verifying that the patchmgr host efucndb01-a.emilianofusaglia.net is recognized on the InfiniBand Switch efucnsw-ibb01-a through getHostByName
[SUCCESS ] Execute plugin check for Patch Check Prereq on efucnsw-ibb01-a
[INFO ] Finished pre-update validation on efucnsw-ibb01-a
[SUCCESS ] Pre-update validation on efucnsw-ibb01-a
[INFO ] Package will be downloaded at firmware update time via scp
[SUCCESS ] Execute plugin check for Patching on efucnsw-ibb01-a
[INFO ] Starting upgrade on efucnsw-ibb01-a to 2.2.14_1. Please give upto 15 mins for the process to complete. DO NOT INTERRUPT or HIT CTRL+C during the upgrade
[INFO ] Rebooting efucnsw-ibb01-a to complete the firmware update. Wait for 15 minutes before continuing. DO NOT MANUALLY REBOOT THE INFINIBAND SWITCH
Connection to efucndb01-a closed by remote host.
Connection to efucndb01-a closed.

Exadata X8M introduces BIG Architectural Changes

Launched at the 2019 Oracle Open Word Conference the Exadata X8M introduces few big architectural changes which make the leading Oracle Database Machine even more attractive.

Among the most relevant changes there are:

InfiniBand Network replacement with an Ethernet network fabric
Intel Optane DC Persistent Memory inside the Storage Cell
New Remote Direct Memory Access (RDMA) functionalities
Replacement of the XEN Hypervisor with KVM

InfiniBand Network replacement with an Ethernet network fabric

The characteristic 40Gbit/second InfiniBand network used for all private network communications among database nodes and storage cells has been replaced by a new 100Gbit/second RDMA over Converged Ethernet Fabric (RoCE) based on the Cisco Switch 9336c RoCE .

The new network not only increase 2.5x the throughput but also reduces the communication latancy.

The schema below highlight the network architecture change for all private communications

Intel Optane DC Persistent Memory inside the Storage Cell

Oracle has introduced 1.5TB of Intel Optane DC Persistent Memory as additional storage device inside all Exadata X8M Storage Cell, (no matter if equipped with HC and EF devices), and it is used as accelerator in front of the Flash Memory Cards. In term of speed this new type of ultra fast storage device is located between the DRAM and the Flash Memory, bringing to three the number of multi-tiered storage devices present inside the Storage Cell.

The Exadata unique software is than capable to extract the maximum performance from this HW configuration, automatically detecting and placing the hottest data on the Persistent Memory, reducing the I/O latency of the most critical tasks.

Below is described the list of Storage Cell’s devices ordered by speed.

New Remote Direct Memory Access (RDMA) functionalities

Until now the RDMA was used among database nodes for exchanging Exafusion messages or for Smart Fusion Block Transfer. Starting with Exadata X8M, the RDMA technology is also used to perform direct I/O access to the Persistent Memory of the Storage Cells, bypassing the network and I/O stack and eliminating expensive CPU interrupts and context switches. This optimization reduces the latency by 10x, from 200μs to less than 19μs.

The picture below highlight the “Database Node to Database Node” and the new “Database Node to Storage Cell” communication using RDMA functionalities.

Replacement of the XEN Hypervisor with KVM

Oracle virtualization technology is called Oracle Virtual Machine (OVM) and in a productive invironment it can be implemented with one of this two different products:

Starting with Exadata X8M-2 the virtualization technology in use is KVM instead of Xen. Oracle started replacing Xen with KVM few years ago, for example on the smaller engineered system ODA X7-2M & X7-2S, but for the Exadata took longer, and I think the root cause was the InfiniBand network. Infact KVM is not fully integrated with InfiniBand, and it does not support bridging.

Exadata and IORM by Examples

The Exadata Machine is frequently used to consolidate the database infrastructure, and such kind of environments must guarantee performance stability and governance. On Exadata the IO Resource Manager extends the capabilities available also on the other platforms to allocate, cap and prioritize the resources among databases and categories.

Available since the the first version of the Storage Cell software, IORM has been recently enhanced to cope with the new Multitenant and Cloud requirements. The IORM Plan can optimize the workload with one of the following objectives: basic, auto, low_latency, balanced or high_throughput.

I/O Resource Manager Overview

IORM allows to execute I/O Requests based on their priority, this is achieved handling separated queues which manage High and Low priority requests as shown on the image below.

IORM_Overview

Default IORM status

Automatically enabled it cannot be completely disabled. The default mode, protects critical operations like flash cache and flash log I/Os

CellCLI> list iormplan detail
name: tvdceladm06_IORMPLAN
catPlan:
dbPlan:
objective: basic
status: active

CellCLI>

Per Database IORM definition

This configuration is suitable on environments with a small number of databases, where the I/O resources are individually defined for each database.

alter iormplan objective=auto

ALTER IORMPLAN -
dbplan=((name=ERP01, level=1, allocation=75, limit=95, role=primary), -
(name=ERP01, level=1, allocation=5, limit=25, role=standby),          -
(name=TREP, level=1, allocation=2, limit=5, flashCacheSize=1G),       -
(name=EPA01, level=2, allocation=40, limit=80),                       -
(name=DHJ01, level=3, allocation=50, flashCacheSize=20G),             -
(name=other, level=3, allocation=30))

The above plan regulates: the database level, allocation (%), soft and hard limits (%), the amount of flash cache and the role (primary or standby).

DBaaS and IORM

This configuration is suitable for Cloud like environments, where a large number of databases are consolidated on the same infrastructure. The database services are standardized in few categories (for example Gold, Silver and Bronze) and the I/O resource plan regulates the same service categories.

CellCLI> ALTER IORMPLAN
dbplan=((name=gold, share=20,limit=100, type=profile), 
        (name=silver, share=10, limit=60, type=profile),
        (name=bronze, share=5, limit=20, type=profile))

The datase parameter db_performance_profile allows to associate the corresponding IORM service category to the instance:

SQL> alter system set db_performance_profile=silver scope=spfile;

Exadata Deployment with Elastic Configuration

Recently, for one of my customers, I had the chance to install a couples of Exadata X7-2 using the new Elastic Configuration. The major benefits of using Elastic Configuration consists in the possibility to acquire the Exadata Machine with almost any possible combination of Database Nodes and Storage Cells.

In the past we were used to standard Oracle pre-defined Exadata Machine configurations: Eighth Rack, Quarter Rack, Half Rack and Full Rack, which is still possible, but not flexible enough.

The pictures below highlight the differences between the two configurations:

Edadats_Classiv_vs_Elastic

source: Oracle Data Sheet Exadata Database Machine X7-2

Deployment Exadata Elactic Configuration

The elastic configuration process automates the initial IP address allocations to databasenodes and storage cells, regardless the ordered configuration. The Exadata Machine is connected to the InfiniBand switches using a standard cabling methodology which allows to determinate the node’s location in the rack. This information is therefore used when the nodes are powered up for the first time in order to assign the initial default IPs.

[root@exatest-iba0 ~]# ibhosts
Ca : 0x579b0123796ba0 ports 2 "node10 elasticNode 192.168.10.17,192.168.10.18 ETH0"
Ca : 0x579b01237966e0 ports 2 "node8 elasticNode 192.168.10.15,192.168.10.16 ETH0"
Ca : 0x579b0123844ab0 ports 2 "node6 elasticNode 192.168.10.11,192.168.10.12 ETH0"
Ca : 0x579b0123845e50 ports 2 "node5 elasticNode 192.168.10.7,192.168.10.8 ETH0"
Ca : 0x579b0123845fe0 ports 2 "node4 elasticNode 192.168.10.40,172.16.2.40 ETH0"
Ca : 0x579b0123845ea0 ports 2 "node3 elasticNode 192.168.10.9,192.168.10.10 ETH0"
Ca : 0x579b0123812b90 ports 2 "node2 elasticNode 192.168.10.1,192.168.10.2 ETH0"
Ca : 0x579b0123812970 ports 2 "node1 elasticNode 192.168.10.3,192.168.10.4 ETH0"
[root@exatest-iba0 ~]#

Because the Virtualization option was required, it has to be activated at this stage:

[root@node8 ~]# /opt/oracle.SupportTools/switch_to_ovm.sh
2019-03-07 01:05:22 -0800 [INFO] Switch to DOM0 system partition /dev/VGExaDb/LVDbSys3 (/dev/mapper/VGExaDb-LVDbSys3)
2019-03-07 01:05:22 -0800 [INFO] Active system device: /dev/mapper/VGExaDb-LVDbSys1
2019-03-07 01:05:22 -0800 [INFO] Active system device in boot area: /dev/mapper/VGExaDb-LVDbSys1
2019-03-07 01:05:23 -0800 [INFO] Set active system device to /dev/VGExaDb/LVDbSys3 in /boot/I_am_hd_boot
2019-03-07 01:05:23 -0800 [INFO] Creating /.elasticConfig on DOM0 boot partition /boot
2019-03-07 01:05:34 -0800 [INFO] Reboot has been initiated to switch to the DOM0 system partition
Connection to 192.168.1.8 closed by remote host.
Connection to 192.168.1.8 closed.
✘

After the switch to OVM command it is time to reclaim the space initially used by the Linux bare metal Logical Volumes:

[root@node8 ~]# /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim
Model is ORACLE SERVER X7-2
Number of LSI controllers: 1
Physical disks found: 4 (252:0 252:1 252:2 252:3)
Logical drives found: 1
Linux logical drive: 0
RAID Level for the Linux logical drive: 5
Physical disks in the Linux logical drive: 4 (252:0 252:1 252:2 252:3)
Dedicated Hot Spares for the Linux logical drive: 0
Global Hot Spares: 0
[INFO ] Check for DOM0 with inactive Linux system disk
[INFO ] Valid DOM0 with inactive Linux system disk is detected
[INFO ] Number of partitions on the system device /dev/sda: 3
[INFO ] Higher partition number on the system device /dev/sda: 3
[INFO ] Last sector on the system device /dev/sda: 3509760000
[INFO ] End sector of the last partition on the system device /dev/sda: 3509759966
[INFO ] Remove inactive system logical volume /dev/VGExaDb/LVDbSys1
[INFO ] Remove logical volume /dev/VGExaDb/LVDbOra1
[INFO ] Extend logical volume /dev/VGExaDb/LVDbExaVMImages
[INFO ] Resize ocfs2 on logical volume /dev/VGExaDb/LVDbExaVMImages
[INFO ] XEN boot version and rpm versions are in sync
[INFO ] XEN EFI files will not be updated
[INFO ] Force setup grub
[root@node8 ~]#

Check the success of the reclaim disks procedure:

[root@node8 ~]# /opt/oracle.SupportTools/reclaimdisks.sh -check
Model is ORACLE SERVER X7-2
Number of LSI controllers: 1
Physical disks found: 4 (252:0 252:1 252:2 252:3)
Logical drives found: 1
Linux logical drive: 0
RAID Level for the Linux logical drive: 5
Physical disks in the Linux logical drive: 4 (252:0 252:1 252:2 252:3)
Dedicated Hot Spares for the Linux logical drive: 0
Global Hot Spares: 0
Valid. Disks configuration: RAID5 from 4 disks with no global and dedicated hot spare disks.
Valid. Booted: DOM0. Layout: DOM0.
[root@node8 ~]#

Upload the Oracle Exadata Database Machine Deployment Assistant configuration files to the database server, together with all software images, and run the One command procedure.

List of all Steps

[root@exatestdbadm01 linux-x64]# ./install.sh -cf TVD-exatest.xml -l
Initializing

1. Validate Configuration File
2. Update Nodes for Eighth Rack
3. Create Virtual Machine
4. Create Users
5. Setup Cell Connectivity
6. Calibrate Cells
7. Create Cell Disks
8. Create Grid Disks
9. Install Cluster Software
10. Initialize Cluster Software
11. Install Database Software
12. Relink Database with RDS
13. Create ASM Diskgroups
14. Create Databases
15. Apply Security Fixes
16. Install Exachk
17. Create Installation Summary
18. Resecure Machine
[root@exatestdbadm01 linux-x64]#

Run Step One to validate the setup

This example includes the creation of three different Clusters.

[root@exatestdbadm01 linux-x64]# ./install.sh -cf TVD-exatest.xml -s 1
Initializing
Executing Validate Configuration File
Validating cluster: Cluster-EFU
Locating machines...
Verifying operating systems...
Validating cluster networks...
Validating network connectivity...
Validating private ips on virtual cluster
Validating NTP setup...
Validating physical disks on storage cells...
Validating users...
Validating cluster: Cluster-PR1
Locating machines...
Verifying operating systems...
Validating cluster networks...
Validating network connectivity...
Validating private ips on virtual cluster
Validating NTP setup...
Validating physical disks on storage cells...
Validating users...
Validating cluster: Cluster-VAL
Locating machines...
Verifying operating systems...
Validating cluster networks...
Validating network connectivity...
Validating private ips on virtual cluster
Validating NTP setup...
Validating physical disks on storage cells...
Validating users...
Validating platinum...
Validating switches...
Checking disk reclaim status...
Checking Disk Tests Status....
Completed validation...

SUCCESS: Ip address: 10.x8.xx.40 is configured correctly
SUCCESS: Ip address: 10.x9.xx.55 is configured correctly
SUCCESS: Ip address: 10.x8.xx.41 is configured correctly
SUCCESS: Ip address: 10.x9.xx.56 is configured correctly
SUCCESS: Ip address: 10.x8.xx.45 is configured correctly
SUCCESS: Ip address: 10.x8.xx.46 is configured correctly
SUCCESS: Ip address: 10.x8.xx.44 is configured correctly
SUCCESS: Ip address: 10.x8.xx.43 is configured correctly
SUCCESS: Ip address: 10.x8.xx.42 is configured correctly
SUCCESS: 10.x8.xx.40 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x9.xx.55 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.41 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x9.xx.56 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.45 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.46 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.44 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.43 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.42 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.40 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x9.xx.55 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.41 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x9.xx.56 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.45 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.46 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.44 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.43 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.42 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.40 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x9.xx.55 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.41 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x9.xx.56 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.45 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.46 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.44 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.43 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.42 configured correctly on exatestceladm03.my.domain.com
SUCCESS: Ip address: 10.x8.xx.47 is configured correctly
SUCCESS: Ip address: 10.x9.xx.57 is configured correctly
SUCCESS: Ip address: 10.x8.xx.48 is configured correctly
SUCCESS: Ip address: 10.x9.xx.58 is configured correctly
SUCCESS: Ip address: 10.x8.xx.52 is configured correctly
SUCCESS: Ip address: 10.x8.xx.51 is configured correctly
SUCCESS: Ip address: 10.x8.xx.53 is configured correctly
SUCCESS: Ip address: 10.x8.xx.50 is configured correctly
SUCCESS: Ip address: 10.x8.xx.49 is configured correctly
SUCCESS: 10.x8.xx.47 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x9.xx.57 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.48 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x9.xx.58 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.52 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.51 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.53 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.50 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.49 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.47 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x9.xx.57 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.48 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x9.xx.58 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.52 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.51 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.53 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.50 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.49 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.47 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x9.xx.57 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.48 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x9.xx.58 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.52 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.51 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.53 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.50 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.49 configured correctly on exatestceladm03.my.domain.com
SUCCESS: Ip address: 10.x8.xx.54 is configured correctly
SUCCESS: Ip address: 10.x9.xx.59 is configured correctly
SUCCESS: Ip address: 10.x8.xx.55 is configured correctly
SUCCESS: Ip address: 10.x9.xx.60 is configured correctly
SUCCESS: Ip address: 10.x8.xx.58 is configured correctly
SUCCESS: Ip address: 10.x8.xx.60 is configured correctly
SUCCESS: Ip address: 10.x8.xx.59 is configured correctly
SUCCESS: Ip address: 10.x8.xx.57 is configured correctly
SUCCESS: Ip address: 10.x8.xx.56 is configured correctly
SUCCESS: 10.x8.xx.54 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x9.xx.59 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.55 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x9.xx.60 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.58 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.60 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.59 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.57 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.56 configured correctly on exatestceladm01.my.domain.com
SUCCESS: 10.x8.xx.54 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x9.xx.59 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.55 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x9.xx.60 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.58 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.60 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.59 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.57 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.56 configured correctly on exatestceladm02.my.domain.com
SUCCESS: 10.x8.xx.54 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x9.xx.59 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.55 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x9.xx.60 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.58 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.60 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.59 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.57 configured correctly on exatestceladm03.my.domain.com
SUCCESS: 10.x8.xx.56 configured correctly on exatestceladm03.my.domain.com
SUCCESS: Validated NTP server 10.x3.xx.xx0
SUCCESS: Validated NTP server 10.x3.xx.xx1
SUCCESS: Required file /EXAVMIMAGES/onecommand/linux-x64/WorkDir/p28514222_122118_Linux-x86-64.zip exists...
SUCCESS: Required file /EXAVMIMAGES/onecommand/linux-x64/WorkDir/p28762988_12201181016GIOCT2018RU_Linux-x86-64.zip exists...
SUCCESS: Required file /EXAVMIMAGES/onecommand/linux-x64/WorkDir/p28762989_12201181016DBOCT2018RU_Linux-x86-64.zip exists...
SUCCESS: Required file config/exachk.zip exists...
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine exatestceladm03.my.domain.com, machine type: storage
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine exatestceladm02.my.domain.com, machine type: storage
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine exatestceladm01.my.domain.com, machine type: storage
SUCCESS: Expected machine exatestdbadm01.my.domain.com to have OS Type of Linux Dom0, and found OsType LinuxDom0
SUCCESS: Expected machine exatestdbadm02.my.domain.com to have OS Type of Linux Dom0, and found OsType LinuxDom0
SUCCESS: NTP servers on machine exatestceladm02.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestceladm01.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestceladm03.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestdbadm01.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestdbadm02.my.domain.com verified successfully
SUCCESS: Sufficient memory for all the guests on database node exatestdbadm02.my.domain.com
SUCCESS: Sufficient memory for all the guests on database node exatestdbadm01.my.domain.com
SUCCESS: Expected machine exatestdbadm02.my.domain.com to have OS Type of Linux Dom0, and found OsType LinuxDom0
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine exatestceladm01.my.domain.com, machine type: storage
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine exatestceladm02.my.domain.com, machine type: storage
SUCCESS: Expected machine exatestdbadm01.my.domain.com to have OS Type of Linux Dom0, and found OsType LinuxDom0
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine exatestceladm03.my.domain.com, machine type: storage
SUCCESS: NTP servers on machine exatestceladm03.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestceladm01.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestceladm02.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestdbadm02.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestdbadm01.my.domain.com verified successfully
SUCCESS: Sufficient memory for all the guests on database node exatestdbadm02.my.domain.com
SUCCESS: Sufficient memory for all the guests on database node exatestdbadm01.my.domain.com
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine exatestceladm03.my.domain.com, machine type: storage
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine exatestceladm02.my.domain.com, machine type: storage
SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine exatestceladm01.my.domain.com, machine type: storage
SUCCESS: Expected machine exatestdbadm02.my.domain.com to have OS Type of Linux Dom0, and found OsType LinuxDom0
SUCCESS: Expected machine exatestdbadm01.my.domain.com to have OS Type of Linux Dom0, and found OsType LinuxDom0
SUCCESS: NTP servers on machine exatestceladm03.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestceladm02.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestceladm01.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestdbadm01.my.domain.com verified successfully
SUCCESS: NTP servers on machine exatestdbadm02.my.domain.com verified successfully
SUCCESS: Sufficient memory for all the guests on database node exatestdbadm02.my.domain.com
SUCCESS: Sufficient memory for all the guests on database node exatestdbadm01.my.domain.com
SUCCESS: Switch IP 10.x9.xx.51 resolves successfully to host exatest-iba0.my.domain.com on node exatestceladm03.my.domain.com
SUCCESS:
SUCCESS: Switch IP 10.x9.xx.51 resolves successfully to host exatest-iba0.my.domain.com on node exatestceladm02.my.domain.com
SUCCESS: Switch IP 10.x9.xx.52 resolves successfully to host exatest-ibb0.my.domain.com on node exatestceladm03.my.domain.com
SUCCESS:
SUCCESS:
SUCCESS:
SUCCESS: Switch IP 10.x9.xx.52 resolves successfully to host exatest-ibb0.my.domain.com on node exatestceladm02.my.domain.com
SUCCESS:
SUCCESS: Switch IP 10.x9.xx.51 resolves successfully to host exatest-iba0.my.domain.com on node exatestceladm01.my.domain.com
SUCCESS: Switch IP 10.x9.xx.52 resolves successfully to host exatest-ibb0.my.domain.com on node exatestceladm01.my.domain.com
SUCCESS:
SUCCESS: X7 compute node exatestdbadm01.my.domain.com has updated Broadcom firmware
SUCCESS: X7 compute node exatestdbadm02.my.domain.com has updated Broadcom firmware
SUCCESS: Disk Tests are not running/active on any of the Storage Servers.
SUCCESS: Cluster Version 12.2.0.1.181016 is compatible with OL7 on exatestdbadm01
SUCCESS: Cluster Version 12.2.0.1.181016 is compatible with OL7 on exatestdbadm02
SUCCESS: Cluster Version 12.2.0.1.181016 is compatible with OL7 on exatestdbadm01
SUCCESS: Cluster Version 12.2.0.1.181016 is compatible with OL7 on exatestdbadm02
SUCCESS: Cluster Version 12.2.0.1.181016 is compatible with OL7 on exatestdbadm01
SUCCESS: Cluster Version 12.2.0.1.181016 is compatible with OL7 on exatestdbadm02
SUCCESS: Disk size 10000GB on cell exatestceladm01.my.domain.com matches the value specified in the OEDA configuration file
SUCCESS: Disk size 10000GB on cell exatestceladm02.my.domain.com matches the value specified in the OEDA configuration file
SUCCESS: Disk size 10000GB on cell exatestceladm03.my.domain.com matches the value specified in the OEDA configuration file
SUCCESS: Disk size 10000GB on cell exatestceladm04.my.domain.com matches the value specified in the OEDA configuration file
SUCCESS: Disk size 10000GB on cell exatestceladm05.my.domain.com matches the value specified in the OEDA configuration file
SUCCESS: Disk size 10000GB on cell exatestceladm06.my.domain.com matches the value specified in the OEDA configuration file
Successfully completed execution of step Validate Configuration File [elapsed Time [Elapsed = 250301 mS [4.0 minutes] Thu Mar 07 12:35:31 CET 2019]]
[root@exatestdbadm01 linux-x64]#

Execution of all remaining steps

Than, because we felt confident, we decide to invoke all remaining steps together:

root@exatestdbadm01 linux-x64]# ./install.sh -cf TVD-exatest.xml -r 1-18
...
..

The final result is the Exadata Machine installed with six Oracle VMs, and three Grid Infrastructure clusters each one running a test RAC database.

Oracle VM Server 3.4.5 – Kernel Memory Leak

Oracle VM Server instability on version 3.4.5 due to a Oracle Unbreakable Enterprise Kernel (UEK) bug.

The kernel version 4.1.12-124.14.5.el6uek.x86_64 has introduced a memory leak of the network module i40e.

Here is reported the backtraces collected after the problem from the /var/log/messages:

Dec 12 07:06:30 efuovs02 kernel: [1508192.885203] ntpd invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Dec 12 07:06:30 efuovs02 kernel: [1508192.885208] ntpd cpuset=/ mems_allowed=0
Dec 12 07:06:30 efuovs02 kernel: [1508192.885217] CPU: 3 PID: 4751 Comm: ntpd Not tainted 4.1.12-124.14.5.el6uek.x86_64 #2
Dec 12 07:06:30 efuovs02 kernel: [1508192.885221] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 02/14/2018
Dec 12 07:06:30 efuovs02 kernel: [1508192.885224]  0000000000000000 ffff8804484cf678 ffffffff816e4bdb ffff88044d44aa00
Dec 12 07:06:30 efuovs02 kernel: [1508192.885230]  0000000000000000 ffff8804484cf708 ffffffff816e32d1 01ff8804484cf688
Dec 12 07:06:30 efuovs02 kernel: [1508192.885235]  ffff8804484cf718 ffff8804484cf6c8 ffffffff811fc561 ffff8804484cf800
Dec 12 07:06:30 efuovs02 kernel: [1508192.885241] Call Trace:
Dec 12 07:06:30 efuovs02 kernel: [1508192.885251]  [<ffffffff816e4bdb>] dump_stack+0x63/0x81
Dec 12 07:06:30 efuovs02 kernel: [1508192.885256]  [<ffffffff816e32d1>] dump_header+0x7f/0x1f3
Dec 12 07:06:30 efuovs02 kernel: [1508192.885264]  [<ffffffff811fc561>] ? vmpressure+0x21/0x90
Dec 12 07:06:30 efuovs02 kernel: [1508192.885272]  [<ffffffff8118e53c>] oom_kill_process+0x1cc/0x3c0
Dec 12 07:06:30 efuovs02 kernel: [1508192.885283]  [<ffffffff8108de0e>] ? has_capability_noaudit+0x1e/0x30
Dec 12 07:06:31 efuovs02 kernel: [1508192.885288]  [<ffffffff8118eaab>] __out_of_memory+0x31b/0x530
Dec 12 07:06:31 efuovs02 kernel: [1508192.885294]  [<ffffffff8118ee5b>] out_of_memory+0x5b/0x80
Dec 12 07:06:31 efuovs02 kernel: [1508192.885300]  [<ffffffff81194d42>] __alloc_pages_nodemask+0x952/0xab0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885307]  [<ffffffff811de28d>] alloc_pages_vma+0xbd/0x260
Dec 12 07:06:31 efuovs02 kernel: [1508192.885311]  [<ffffffff8118a59e>] ? find_get_entry+0x1e/0xc0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885317]  [<ffffffff811ce6bd>] read_swap_cache_async+0xed/0x170
Dec 12 07:06:31 efuovs02 kernel: [1508192.885322]  [<ffffffff811ce82d>] swapin_readahead+0xed/0x190
Dec 12 07:06:31 efuovs02 kernel: [1508192.885328]  [<ffffffff811bbfe0>] handle_mm_fault+0x12d0/0x1770
Dec 12 07:06:31 efuovs02 kernel: [1508192.885335]  [<ffffffff8121d910>] ? poll_select_copy_remaining+0x130/0x130
Dec 12 07:06:31 efuovs02 kernel: [1508192.885340]  [<ffffffff8106d57f>] __do_page_fault+0x1af/0x480
Dec 12 07:06:31 efuovs02 kernel: [1508192.885346]  [<ffffffff816f2c1c>] ? page_fault+0xcc/0x120
Dec 12 07:06:31 efuovs02 kernel: [1508192.885350]  [<ffffffff8106d87f>] do_page_fault+0x2f/0x80
Dec 12 07:06:31 efuovs02 kernel: [1508192.885354]  [<ffffffff816f2be4>] ? page_fault+0x94/0x120
Dec 12 07:06:31 efuovs02 kernel: [1508192.885359]  [<ffffffff816f2bdd>] ? page_fault+0x8d/0x120
Dec 12 07:06:31 efuovs02 kernel: [1508192.885363]  [<ffffffff816f2bd6>] ? page_fault+0x86/0x120
Dec 12 07:06:31 efuovs02 kernel: [1508192.885367]  [<ffffffff816f2c5f>] page_fault+0x10f/0x120
Dec 12 07:06:31 efuovs02 kernel: [1508192.885375]  [<ffffffff813316c5>] ? copy_user_enhanced_fast_string+0x5/0x10
Dec 12 07:06:31 efuovs02 kernel: [1508192.885379]  [<ffffffff8121d7d1>] ? set_fd_set+0x21/0x30
Dec 12 07:06:31 efuovs02 kernel: [1508192.885384]  [<ffffffff8121e5aa>] core_sys_select+0x1fa/0x2f0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885392]  [<ffffffff810f8fc3>] ? ntp_notify_cmos_timer+0x23/0x30
Dec 12 07:06:31 efuovs02 kernel: [1508192.885396]  [<ffffffff810f8a1d>] ? do_adjtimex+0xed/0x100
Dec 12 07:06:31 efuovs02 kernel: [1508192.885402]  [<ffffffff810ed3ac>] ? SYSC_adjtimex+0x4c/0x80
Dec 12 07:06:31 efuovs02 kernel: [1508192.885410]  [<ffffffff810209e9>] ? read_tsc+0x9/0x10
Dec 12 07:06:31 efuovs02 kernel: [1508192.885414]  [<ffffffff810f68cb>] ? ktime_get_ts64+0x4b/0x110
Dec 12 07:06:31 efuovs02 kernel: [1508192.885419]  [<ffffffff8121e74b>] SyS_select+0xab/0x100
Dec 12 07:06:31 efuovs02 kernel: [1508192.885424]  [<ffffffff816ed451>] ? system_call_after_swapgs+0xdb/0x18c
Dec 12 07:06:31 efuovs02 kernel: [1508192.885428]  [<ffffffff816ed51a>] system_call_fastpath+0x18/0xd4
Dec 12 07:06:31 efuovs02 kernel: [1508192.885457] Mem-Info:
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469] active_anon:1452 inactive_anon:1426 isolated_anon:65
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469]  active_file:4559 inactive_file:873 isolated_file:0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469]  unevictable:1547 dirty:20 writeback:31 unstable:0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469]  slab_reclaimable:6776 slab_unreclaimable:8649
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469]  mapped:3007 shmem:0 pagetables:1705 bounce:0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469]  free:33536 free_pcp:918 free_cma:0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885483] Node 0 DMA free:15740kB min:60kB low:72kB high:84kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 12 07:06:31 efuovs02 kernel: [1508192.885499] lowmem_reserve[]: 0 2661 15921 15921
Dec 12 07:06:31 efuovs02 kernel: [1508192.885508] Node 0 DMA32 free:64064kB min:11076kB low:13844kB high:16612kB active_anon:5912kB inactive_anon:5876kB active_file:112kB inactive_file:52kB unevictable:836kB isolated(anon):256kB isolated(file):0kB present:2781336kB managed:2751088kB mlocked:836kB dirty:0kB writeback:0kB mapped:668kB shmem:0kB slab_reclaimable:4792kB slab_unreclaimable:6452kB kernel_stack:912kB pagetables:1692kB unstable:0kB bounce:0kB free_pcp:1156kB local_pcp:248kB free_cma:0kB writeback_tmp:0kB pages_scanned:619296 all_unreclaimable? yes
Dec 12 07:06:31 efuovs02 kernel: [1508192.885524] lowmem_reserve[]: 0 0 13260 13260
Dec 12 07:06:31 efuovs02 kernel: [1508192.885532] Node 0 Normal free:54340kB min:54392kB low:67988kB high:81584kB active_anon:0kB inactive_anon:0kB active_file:18124kB inactive_file:3440kB unevictable:5352kB isolated(anon):4kB isolated(file):0kB present:13979888kB managed:13534768kB mlocked:5352kB dirty:80kB writeback:124kB mapped:11360kB shmem:0kB slab_reclaimable:22312kB slab_unreclaimable:28144kB kernel_stack:2880kB pagetables:5128kB unstable:0kB bounce:0kB free_pcp:2516kB local_pcp:572kB free_cma:0kB writeback_tmp:0kB pages_scanned:129384 all_unreclaimable? yes
Dec 12 07:06:31 efuovs02 kernel: [1508192.885546] lowmem_reserve[]: 0 0 0 0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885551] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15740kB
Dec 12 07:06:31 efuovs02 kernel: [1508192.885573] Node 0 DMA32: 143*4kB (UE) 111*8kB (UEM) 209*16kB (UE) 140*32kB (UE) 94*64kB (UEM) 57*128kB (UEM) 28*256kB (UEM) 7*512kB (UEM) 4*1024kB (EM) 9*2048kB (MR) 2*4096kB (MR) = 64068kB
Dec 12 07:06:31 efuovs02 kernel: [1508192.885596] Node 0 Normal: 8736*4kB (UEM) 1360*8kB (UEM) 208*16kB (UEM) 32*32kB (UE) 2*64kB (UE) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 54400kB
Dec 12 07:06:31 efuovs02 kernel: [1508192.885615] 8002 total pagecache pages
Dec 12 07:06:31 efuovs02 kernel: [1508192.885618] 1494 pages in swap cache
Dec 12 07:06:31 efuovs02 kernel: [1508192.885621] Swap cache stats: add 3717250313, delete 3717248819, find 2895172777/5168362256
Dec 12 07:06:31 efuovs02 kernel: [1508192.885624] Free swap  = 4129656kB
Dec 12 07:06:31 efuovs02 kernel: [1508192.885626] Total swap = 4194300kB
Dec 12 07:06:31 efuovs02 kernel: [1508192.885628] 4194303 pages RAM
Dec 12 07:06:31 efuovs02 kernel: [1508192.885630] 0 pages HighMem/MovableOnly
Dec 12 07:06:31 efuovs02 kernel: [1508192.885632] 118864 pages reserved
Dec 12 07:06:31 efuovs02 kernel: [1508192.885634] 0 pages cma reserved
Dec 12 07:06:31 efuovs02 kernel: [1508192.885636] 0 pages hwpoisoned
Dec 12 07:06:31 efuovs02 kernel: [1508192.885638] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 12 07:06:31 efuovs02 kernel: [1508192.885650] [  983]     0   983     2677      266      11       3      111         -1000 udevd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885657] [ 3783]     0  3783   125771     1414      48       5        0         -1000 multipathd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885662] [ 4334]     0  4334     6944      399      15       3      108         -1000 auditd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885667] [ 4368]     0  4368    61281      438      23       3      377             0 rsyslogd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885671] [ 4383]     0  4383     2832      352      11       3      164             0 irqbalance
Dec 12 07:06:31 efuovs02 kernel: [1508192.885675] [ 4412]    32  4412     4760      397      16       3       74             0 rpcbind
Dec 12 07:06:31 efuovs02 kernel: [1508192.885680] [ 4436]    29  4436     5853      354      17       3      112             0 rpc.statd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885684] [ 4481]     0  4481     5790        0      15       3       50             0 rpc.idmapd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885689] [ 4522]     0  4522     2106      268      12       5       28             0 fcoemon
Dec 12 07:06:31 efuovs02 kernel: [1508192.885694] [ 4537]    81  4537     5373        0      15       3       62             0 dbus-daemon
Dec 12 07:06:31 efuovs02 kernel: [1508192.885698] [ 4612]     0  4612     1030      310       9       5       41             0 o2hbmonitor
Dec 12 07:06:31 efuovs02 kernel: [1508192.885702] [ 4632]     0  4632    47286      410      51       3      221             0 cupsd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885706] [ 4692]     0  4692     1039      323       9       5       30             0 acpid
Dec 12 07:06:31 efuovs02 kernel: [1508192.885711] [ 4718]     0  4718     1580      211       8       5       27             0 mcelog
Dec 12 07:06:31 efuovs02 kernel: [1508192.885715] [ 4738]     0  4738    16579      304      34       3      185         -1000 sshd
Dec 12 07:06:32 efuovs02 kernel: [1508192.885720] [ 4751]    38  4751     6644      567      18       3      162             0 ntpd
Dec 12 07:06:32 efuovs02 kernel: [1508192.885724] [ 4796]     0  4796     3235      331      15       6      143             0 xenstored
Dec 12 07:06:32 efuovs02 kernel: [1508192.885729] [ 4803]     0  4803    21126      307      21       6       69             0 xenconsoled
Dec 12 07:06:32 efuovs02 kernel: [1508192.885733] [ 4807]     0  4807    53362      393      65       3      525             0 qemu-system-i38
Dec 12 07:06:32 efuovs02 kernel: [1508192.885737] [ 4910]     0  4910    20252      478      45       3      239             0 master
Dec 12 07:06:32 efuovs02 kernel: [1508192.885742] [ 4922]    89  4922    20315      486      46       3      238             0 qmgr
Dec 12 07:06:32 efuovs02 kernel: [1508192.885746] [ 4930]     0  4930    29223      395      16       3      171             0 crond
Dec 12 07:06:32 efuovs02 kernel: [1508192.885750] [ 5036]     0  5036     5291      283      15       3       67             0 atd
Dec 12 07:06:32 efuovs02 kernel: [1508192.885755] [ 5345]     0  5345    38468      249      14       5       30             0 osmdaemon
Dec 12 07:06:32 efuovs02 kernel: [1508192.885759] [ 5366]     0  5366    85597      650      61       7     1514             0 python
Dec 12 07:06:32 efuovs02 kernel: [1508192.885764] [ 5378]     0  5378    24079      467      27       6      113             0 ovmport
Dec 12 07:06:32 efuovs02 kernel: [1508192.885768] [ 5390]     0  5390    60521      410      65       6      920             0 ovmwatch
Dec 12 07:06:32 efuovs02 kernel: [1508192.885772] [ 5405]     0  5405   208969      656      87       7     1558             0 python
Dec 12 07:06:32 efuovs02 kernel: [1508192.885777] [ 5772]     0  5772   177327     1015      89       6     1775             0 python
Dec 12 07:06:32 efuovs02 kernel: [1508192.885782] [ 5789]     0  5789    49154      741      71       7     1366             0 python
Dec 12 07:06:32 efuovs02 kernel: [1508192.885786] [ 5831]     0  5831    82559      555      70       6     1491             0 devmon
Dec 12 07:06:32 efuovs02 kernel: [1508192.885790] [ 5901]     0  5901     1031      292       9       5       18             0 mingetty
Dec 12 07:06:32 efuovs02 kernel: [1508192.885794] [ 5903]     0  5903     1031      292       8       5       19             0 mingetty
Dec 12 07:06:32 efuovs02 kernel: [1508192.885798] [ 5905]     0  5905     1031      292       9       5       19             0 mingetty
Dec 12 07:06:32 efuovs02 kernel: [1508192.885802] [ 5907]     0  5907     1031      292       9       5       19             0 mingetty
Dec 12 07:06:32 efuovs02 kernel: [1508192.885806] [ 5909]     0  5909     1031      292       9       5       19             0 mingetty
Dec 12 07:06:32 efuovs02 kernel: [1508192.885812] [26455]     0 26455    11091      458      44       5      108             0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885816] [27087]     0 27087    11091      458      45       5      108             0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885820] [27845]     0 27845    11091      458      44       5      109             0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885825] [27996]     0 27996    11091      458      44       5      107             0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885829] [14189]     0 14189    11091      458      44       5      109             0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885833] [16371]     0 16371    11091      458      44       5      109             0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885838] [14238]     0 14238     2676      256      11       3      129         -1000 udevd
Dec 12 07:06:32 efuovs02 kernel: [1508192.885842] [14374]     0 14374     2676      240      11       3      119         -1000 udevd
Dec 12 07:06:32 efuovs02 kernel: [1508192.885846] [15869]     0 15869    22957      931      62       6     1730             0 python
Dec 12 07:06:32 efuovs02 kernel: [1508192.885851] [16935]     0 16935    28695     2029      16       5       64             0 OSWatcher
Dec 12 07:06:32 efuovs02 kernel: [1508192.885855] [ 5867]    89  5867    20272     1250      45       3      229             0 pickup
Dec 12 07:06:32 efuovs02 kernel: [1508192.885860] [ 8948]     0  8948    27070      682      17       5       71             0 vmsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885864] [ 8951]     0  8951    27070      675      17       5       77             0 mpsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885868] [ 8953]     0  8953     1581      328      11       5       42             0 vmstat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885872] [ 8958]     0  8958    27070      659      17       6       41             0 iosub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885876] [ 8959]     0  8959    25258      441      13       5       46             0 mpstat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885880] [ 8966]     0  8966    25261      420      11       5       19             0 iostat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885884] [ 8971]     0  8971    27070      679      17       5        0             0 xtop
Dec 12 07:06:32 efuovs02 kernel: [1508192.885888] [ 8976]     0  8976    27070      695      17       5        0             0 psmemsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885892] [ 8977]     0  8977     3771      483      20       5        3             0 top
Dec 12 07:06:32 efuovs02 kernel: [1508192.885896] [ 8980]     0  8980    27070      680      17       5        0             0 oswsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885901] [ 8985]     0  8985    28695     1794      15       5      131             0 OSWatcher
Dec 12 07:06:32 efuovs02 kernel: [1508192.885905] [ 8986]     0  8986    27564      523      19       5        8             0 ps
Dec 12 07:06:32 efuovs02 kernel: [1508192.885909] [ 8987]     0  8987    27070       54      12       5        0             0 psmemsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885913] [ 8988]     0  8988    27070       52      11       5        0             0 oswsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885917] Out of memory: Kill process 5772 (python) score 0 or sacrifice child
Dec 12 07:06:32 efuovs02 kernel: [1508192.886216] Killed process 5772 (python) total-vm:709308kB, anon-rss:0kB, file-rss:4060kB

How to fix the OVS Kernel Memory Leak

Download the following kernel version which includes the memoy leak fix for the i40e module: link to Oracle RPM repository

kernel-uek-4.1.12-124.21.1.el6uek.x86_64.rpm
kernel-uek-firmware-4.1.12-124.21.1.el6uek.noarch.rpm


[root@efuovs02 new_Kernel]# rpm -qp --changelog kernel-uek-4.1.12-124.21.1.el6uek.x86_64.rpm | grep -B 3 28228724
warning: kernel-uek-4.1.12-124.21.1.el6uek.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
* Tue Oct 30 2018 Brian Maly <brian.maly@oracle.com> [4.1.12-124.20.8.el6uek] 
- scsi: lpfc: devloss timeout race condition caused null pointer reference (James Smart) [Orabug: 27994179] 
- scsi: qla2xxx: Fix race condition between iocb timeout and initialisation (Ben Hutchings) [Orabug: 28013813] 
- i40e: Add programming descriptors to cleaned_count (Alexander Duyck) [Orabug: 28228724] 
- i40e: Fix memory leak related filter programming status (Alexander Duyck) [Orabug: 28228724]

Install the new OVS Kernel

Using the steps reported below, the new kernel has been installed on all OVS servers of the farm.

[root@efuovs02 new_Kernel]# rpm -ivh kernel*
warning: kernel-uek-4.1.12-124.21.1.el6uek.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing... ########################################### [100%]
1:kernel-uek-firmware ########################################### [ 50%]
2:kernel-uek ########################################### [100%]
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.1.12-124.21.1.el6uek.x86_64
Found linux image: /boot/vmlinuz-4.1.12-124.14.5.el6uek.x86_64
Found initrd image: /boot/initramfs-4.1.12-124.14.5.el6uek.x86_64.img
done
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.1.12-124.21.1.el6uek.x86_64
Found initrd image: /boot/initramfs-4.1.12-124.21.1.el6uek.x86_64.img
Found linux image: /boot/vmlinuz-4.1.12-124.14.5.el6uek.x86_64
Found initrd image: /boot/initramfs-4.1.12-124.14.5.el6uek.x86_64.img
done
[root@efuovs02 new_Kernel]#

....

[root@efuovs02 new_Kernel]# reboot

[root@efuovs02 ~]# uname -a
Linux efuovs02 4.1.12-124.21.1.el6uek.x86_64 #2 SMP Tue Nov 6 13:31:13 PST 2018 x86_64 x86_64 x86_64 GNU/Linux

My OOW18 Summary

For those who are interested here my major takeaways from the OOW18

As we all know, since few years the HOTTEST topic advertized at the OOW is “Cloud Computing”, but this time Oracle Cloud was no longer alone!

In fact the focus was divided between the new Oracle OCI Cloud, also named by Larry as Second Generation of Cloud and the Autonomous Database.

OCI Second Gen of Cloud

Here a summary of the major advantages compared to the previous version:

– Security, guaranteed by robots which scan the network for any malicious attack.

– The cutting edge virtual network, which brings up to 50GB speed and extreme flexibility.

– Bare Metal Infrastructure based on Exadata Machines.

– Aggressive pricing, compared to the competitors.

Autonomous Database.

The Autonomous Database option is now available for OLTP and DWH databases and includes new capabilities like automatic index creation and column stored table conversion. In version 19 it will manage online memory increase and additional tuning options.

As announces during Larry’s keynote, the Autonomous database will be also available with the Cloud @ Customer option (on Exadata only), ant it will no longer require human labor (DBA and Sys Admin intervention), because Self Provisioning, Self Driving, Self Tuning and Self Repairing.

For non-technical people it looks magic, but it is few steps from what we already use in a standard Oracle 12c Database. In fact Autonomous Database leverages a bunch of database advisors and tuning options, now orchestrated by an Artificial Intelligence and Machine Learning software, in order to provide data-driven predictions and decisions.

Over the next few years, Autonomous Database will be enriched with several new options, improving the quality of live of many DBAs, which will be relieved of the majority of the tedious and recurring tasks, leaving the most added value tasks under their own responsibility.

Last but not least, the Autonomous Database runs in a very high end configurations (Oracle guarantees 99,995% of availability), which is quite expensive to acquire due to the list of mandatory requirements: Exadata, RAC, Active DG, Multitenant, Tuning Pack, Diagnostic Pack etc..

Exadata Machine

Several interesting features are coming next year with the introduction of the INTEL Optane DC Persistent Memory for even faster OLTP.

This new type of memory will be installed on the Storage Cell and used as accelerator in front of Flash memory.

The database node will access to the Persistent Memory via RDMA with a gain up to 20 x faster access latency.

Oracle is developing the more and more Remote Direct Memory Access (RDMA) instructions for Cache Fusion and Storage Cell operations in order to offload the database nodes and increase the overall performance.

Stay tuned on Exadata Machine because the next generation will also include BIG architectural change…

Oracle Virtual Machine (OVM)

One curiosity directly collected at Linux Virtualization booth is that even though the next generation of hypervisor will be based on KVM, Oracle will keep calling it OVM and of course the current OVM product based on XEN (OVS, OVM) will still be in use by many companies.

How possibly the customers can get confused ?!?

With this I finished, although there would be much more to write.

RMAN on Multitenant DB – Awareness of the Backup Optimization Behavior

Recovery Manager (RMAN) is one of the most popular Oracle databases components with unique Backup/Recovery features. It is fully integrated with the Multitenant Architecture allowing to implement “Manage Many-Databases-as-One“ strategy.

RMAN permits to customize and save several database parameters used during the backup and recovery operations. Such parameters define for example the backup retention policy, the default device type, how many archivelog copy should be stored, if the backup-sets should be compressed and/or encrypted and so on…

Below an example of RMAN setup with the highlight of the parameter CONFIGURE BACKUP OPTIMIZATION ON discussed on the next sections.

RMAN> show all;

RMAN configuration parameters for database with db_unique_name CEFUPRD are:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 8 DAYS;
CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/BACKUP/Databases/CEFUPRD/%F';
CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO BACKUPSET;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1;
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1;
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '/BACKUP/Databases/CEFUPRD/%d_%T_%U';
CONFIGURE MAXSETSIZE TO UNLIMITED;
CONFIGURE ENCRYPTION FOR DATABASE OFF;
CONFIGURE ENCRYPTION ALGORITHM 'AES128';
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE;
CONFIGURE RMAN OUTPUT TO KEEP FOR 10 DAYS;
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK;
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/fast_recovery_area/rcocefuprd/cefuprd/snapcf_cefuprd.f';

RMAN>

Effects of RMAN Backup Optimization ON/OFF

In a Multitenant environment is more important than ever to understand the effects of the parameter CONFIGURE BACKUP OPTIMIZATION which can be set to ON or OFF.

Behavion when set ON

If RMAN determines that a file is identical and it has been backed up, then it is a candidate to be skipped. RMAN must do further checking to determine whether to skip the file, however, because both the retention policy and the backup duplexing feature are factors in the algorithm that determines whether RMAN has sufficient backups on the specified device type. (Definition from Oracle Backup Recovery User’s Guide).

Behavion when set OFF

The RMAN backup always includes all files no matter if they are identical and already backed up within the backup retention window.

What happens by migrating from Non-CDB to PDB?

Assuming that we have just migrated a non-CDB database to PDB and our pluggable database has 4 tablespaces all open read/write. The container uses the same RMAN setup included on the top of this page, with CONFIGURE BACKUP OPTIMIZATION ON.

Dispite having a FULL database backup every night, only 1 backup every 8 days will be complete and consistent, because the RMAN backup optimization algorithm will detect the SEED PDB datafiles unchanged and it will skip those files. Therefore if we restore the CDB using the backup-sets generated by one FULL database backup, with no access to the rest of backup-sets inside the retention window, there are great probabilities that the restore will fail.

Extract of the CDB backup log which shows that the PDB$SEED datafiles have been skipped because already backed up 1 time during the last 8 days.

RMAN> BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL = 0 DATABASE PLUS ARCHIVELOG NOT BACKED UP 1 TIMES; 

Starting backup at May 15 2018 00:35:07
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=9 instance=clgbprd1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=168 instance=clgbprd1 device type=DISK
skipping archived logs of thread 1 from sequence 39516 to 39931; already backed up
skipping archived logs of thread 2 from sequence 34457 to 34749; already backed up
channel ORA_DISK_1: starting compressed archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=2 sequence=34774 RECID=148645 STAMP=976088413
input archived log thread=2 sequence=34775 RECID=148649 STAMP=976088467
input archived log thread=1 sequence=39944 RECID=148655 STAMP=976088552
input archived log thread=2 sequence=34776 RECID=148651 STAMP=976088509
input archived log thread=2 sequence=34777 RECID=148653 STAMP=976088551
input archived log thread=2 sequence=34778 RECID=148657 STAMP=976088700
input archived log thread=1 sequence=39945 RECID=148662 STAMP=976088937
input archived log thread=2 sequence=34779 RECID=148659 STAMP=976088838

...
Starting backup at May 15 2018 00:50:02
using channel ORA_DISK_1
using channel ORA_DISK_2
skipping datafile 2; already backed up 1 time(s)
skipping datafile 4; already backed up 1 time(s)
channel ORA_DISK_1: starting compressed incremental level 0 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
...

Using only the backup-sets above to restore the CDB means that Oracle has to recreate the two skipped datafiles (number 2 and 4) applying the archived logs generated during the initial CDB provisioning.

To note that the full backup starts including archived log from the following sequence:

For the Thread 1 – Sequence 39944
For the Thread 2 – Sequence 34774

But when Oracle initiates the Media Recovery, it complains because the archived log Thread 1 – Sequence 1 is unavailable:

RMAN> run {
allocate auxiliary channel dsk1 type disk ;
2> allocate auxiliary channel dsk2 type disk ;
allocate auxiliary channel dsk3 type disk ;
allocate auxiliary channel dsk4 type disk ;
3> allocate auxiliary channel dsk5 type disk ;
4> allocate auxiliary channel dsk6 type disk ;
5> duplicate database to 'CEFUAUX' noopen backup location '/BACKUP/Databases/CEFUPRD/backup_20180515_only' nofilenamecheck;
}6> 7> 8> 9>

allocated channel: dsk1
channel dsk1: SID=322 device type=DISK

allocated channel: dsk2
channel dsk2: SID=471 device type=DISK

allocated channel: dsk3
channel dsk3: SID=9 device type=DISK

allocated channel: dsk4
channel dsk4: SID=166 device type=DISK

allocated channel: dsk5
channel dsk5: SID=323 device type=DISK

allocated channel: dsk6
channel dsk6: SID=478 device type=DISK

Starting Duplicate Db at May 15 2018 09:29:15

....

contents of Memory Script:
{
 set until scn 2372623043;
 recover
 clone database
 delete archivelog
 ;
}
executing Memory Script

executing command: SET until clause

Starting recover at May 15 2018 11:24:39

starting media recovery

unable to find archived log
archived log thread=1 sequence=1
Oracle instance started

I hope this example helped to understand that while migrating from non-CDB to Multitenant, many Administration tasks should be carefully reviewed due to major architecture changes.

Grid Management DB filling up ASM disk space

Recently I discovered on Oracle Grid Infrastructure 12cR2 that the ASM disk group hosting the Management DB (-MGMTDB) was filling up the disk space very quickly.

This is due to a bug on the oclumon data purge procedure.

To fix the problem, two possibilities are available:

Recreate the Management DB
Manually truncate the tables not purged and shrinking the Tablespace Size

Below are described the two options.

Option 1 – Recreate the Management DB

As root user on each cluster node:

# /u01/app/12.2.0.1/grid/bin/crsctl stop res ora.crf -init
# /u01/app/12.2.0.1/grid/bin/crsctl modify res ora.crf -attr ENABLED=0 -init

As Grid from the local node hosting the Management Database Instance run the commands:

$ /u01/app/12.2.0.1/grid/bin/srvctl status mgmtdb
$ /u01/app/12.2.0.1/grid/bin/dbca -silent -deleteDatabase -sourceDB -MGMTDB
Connecting to database
4% complete
9% complete
14% complete
19% complete
23% complete
28% complete
47% complete
Updating network configuration files
48% complete
52% complete
Deleting instance and datafiles
76% complete
100% complete

How to recreate the MGMTDB:

$ /u01/app/12.2.0.1/grid/bin/dbca -silent -createDatabase -createAsContainerDatabase true -templateName MGMTSeed_Database.dbc
-sid -MGMTDB
-gdbName _mgmtdb
-storageType ASM
-diskGroupName GIMR
-datafileJarLocation <GI HOME>/assistants/dbca/templates
-characterset AL32UTF8
-autoGeneratePassword           
-skipUserTemplateCheck

Create the pluggable GIMR database

$ /u01/app/12.2.0.1/grid/bin/mgmtca -local

Option 2 – Manually truncate the tables

As root user stop and disable ora.crf resource on each cluster node:

# /u01/app/12.2.0.1/grid/bin/crsctl stop res ora.crf -init
# /u01/app/12.2.0.1/grid/bin/crsctl modify res ora.crf -attr ENABLED=0 -init

Connect to MGMTDB and identify the segments to truncate:

export ORACLE_SID=-MGMTDB
$ORACLE_HOME/bin/sqlplus / as sysdba
SQL> select pdb_name from dba_pdbs where pdb_name!='PDB$SEED';

SQL> alter session set container=GIMR_DSCREP_10;

Session altered.

SQL> col obj format a50
SQL> select owner||'.'||SEGMENT_NAME obj, BYTES from dba_segments where owner='CHM' order by 2 asc;

Likely those two tables are much bigger than the rest :

CHM.CHMOS_PROCESS_INT_TBL
CHM.CHMOS_DEVICE_INT_TBL

Truncate the tables:

SQL> truncate table CHM.CHMOS_PROCESS_INT_TBL;
SQL> truncate table CHM.CHMOS_DEVICE_INT_TBL;

Then if needed shrink the tablespace and job done!

Exadata How Safely Erase All Data

When the time arrives to decommission an environment with sesitive data, we are frequently confronted to the problem how to certify to our customer or management the erase of all data and logs.

On Exadata Machine starting from the software release 12.2.1.1.0, this problem has been elegantly solved by Oracle introducing a new utility called Secure Eraser; which securely erases data on hard drives, flash devices, internal USBs, and resets ILOM to factory default.

In earlier software versions, the Exadata Storage Software includes CellCli commands to securely erase the user data:

CellCLI> DROP GRIDDISK ALL FLASHDISK PREFIX=DATA, ERASE=7pass
CellCLI> DROP GRIDDISK ALL PREFIX=DATA, ERASE=3pass

and

CellCLI> DROP CELLDISK ALL FLASHDISK ERASE=7pass 
CellCLI> DROP CELL ERASE=3pass

Unfortunatly those commands only cover the user data stored on the Storage Cell, and none of them produces an official certificate with the summary of the actions taken to guarantee the wipe of the data. While all this is done by Secure Eraser on all Compute and Storage nodes, sanitizing on all type of devices: user data, OS logs and network configurations.

Depending from the Exadata model, a subset of all of available options to execute Secure Eraser is possible:

Automatic Secure Eraser Ethrough PXE Boot
Interactive Secure Eraser through PXE Boot
Interactive Secure Eraser through Network Boot
Interactive Secure Eraser through External USB

Recently I used Secure Eraser through External USB on one Exadata X7-2 Machine and here are reported the different steps.

Copy the Secure Eraser Diagnostic image from MOS 2180963.1 to a USB stick.

 # dd if=image_diagnostics_18.1.4.0.0_LINUX.X64_180125.3-1.x86_64.usb of=/dev/sdb

Boot the server using the USB device with the Secure Eraser Diagnostic image

After login, start the Secure Erase process

/usr/sbin/secureeraser --erase --all --flash_erasure_method=7pass --hdd_erasure_method=3pass --technician=Emiliano_Fusaglia --witness=Mario_Bros --output=/mnt/iso

At the end of the erase process a Data Erasure Certificate similar to the one on the example below will be available in TXT, HTML and PDF format.

Exa_SecureErase_Report

Exadata Storage Snapshots

This post describes how to implement Oracle Database Snapshot Technology on Exadata Machine.

Because Exadata Storage Cell Smart Features, Storage Indexes, IORM and Network Resource Manager work at level of ASM Volume Manager only, (and they don’t work on top of ACFS Cluster File System), the implementation of the snapshot technology is different compared to any other non-Exadata environment.

At this purpuse Oracle has developed a new type of ASM Disk Group called SPARSE Disk Group. It uses ASM SPARSE Grid Disk based on Thin Provisioning to save the database snapshot copies and the associated metadata, and it supports non-CDB and PDB snapshot copy.

The implementation requires the following minimal software versions :

Exadata Storage Software version 12.1.2.1.0.
Oracle Database version 12.1.0.2 with bundle patch 5.

One major restriction applies to Exadata Storage Sanpshot compared to ACFS;

the source database must be a shared copy open on read only and called Test Master. The Test Master Database can not be modified or deleted as long the latest child snapshot is in use.

This restriction exists because Exadata Snapshot technology uses “allocate on first write”, and not “copy on write” (like for ACFS), and the snapshot is per-database-datafile.

When a child snapshot issue a write, the write goes to a private copy of that block inside the snapshot, preserving the original block value which can be accessed by other child snapshots of the same Test Master.

How to Implement Exadata Storage Snapshots in a PDB Environment

Check the celldisks for available free space to allocate to a new SPARSE Disk Group

[root@strgceladm01 ~]# cellcli -e list celldisk attributes name,freespace
 CD_00_strgceladm01 853.34375G
 CD_01_strgceladm01 853.34375G
 CD_02_strgceladm01 853.34375G
 CD_03_strgceladm01 853.34375G
 CD_04_strgceladm01 853.34375G
 CD_05_strgceladm01 853.34375G
 CD_06_strgceladm01 853.34375G
 CD_07_strgceladm01 853.34375G
 CD_08_strgceladm01 853.34375G
 CD_09_strgceladm01 853.34375G
 CD_10_strgceladm01 853.34375G
 CD_11_strgceladm01 853.34375G
 FD_00_strgceladm01 0
 FD_01_strgceladm01 0
 FD_02_strgceladm01 0
 FD_03_strgceladm01 0
[root@strgceladm01 ~]#


[root@strgceladm02 ~]# cellcli -e list celldisk attributes name,freespace
 CD_00_strgceladm02 853.34375G
 CD_01_strgceladm02 853.34375G
 CD_02_strgceladm02 853.34375G
 CD_03_strgceladm02 853.34375G
 CD_04_strgceladm02 853.34375G
 CD_05_strgceladm02 853.34375G
 CD_06_strgceladm02 853.34375G
 CD_07_strgceladm02 853.34375G
 CD_08_strgceladm02 853.34375G
 CD_09_strgceladm02 853.34375G
 CD_10_strgceladm02 853.34375G
 CD_11_strgceladm02 853.34375G
 FD_00_strgceladm02 0
 FD_01_strgceladm02 0
 FD_02_strgceladm02 0
 FD_03_strgceladm02 0
[root@strgceladm02 ~]#


[root@strgceladm03 ~]# cellcli -e list celldisk attributes name,freespace
 CD_00_strgceladm03 853.34375G
 CD_01_strgceladm03 853.34375G
 CD_02_strgceladm03 853.34375G
 CD_03_strgceladm03 853.34375G
 CD_04_strgceladm03 853.34375G
 CD_05_strgceladm03 853.34375G
 CD_06_strgceladm03 853.34375G
 CD_07_strgceladm03 853.34375G
 CD_08_strgceladm03 853.34375G
 CD_09_strgceladm03 853.34375G
 CD_10_strgceladm03 853.34375G
 CD_11_strgceladm03 853.34375G
 FD_00_strgceladm03 0
 FD_01_strgceladm03 0
 FD_02_strgceladm03 0
 FD_03_strgceladm03 0
[root@strgceladm03 ~]#

For each Storage Cell Create a SPARSE Grid Disks as described below

[root@strgceladm01 ~]# cellcli -e CREATE GRIDDISK ALL PREFIX=SPARSE, sparse=true, SIZE=853.34375G
Cell disks were skipped because they had no freespace for grid disks: FD_00_strgceladm01, FD_01_strgceladm01, FD_02_strgceladm01, FD_03_strgceladm01.
GridDisk SPARSE_CD_00_strgceladm01 successfully created
GridDisk SPARSE_CD_01_strgceladm01 successfully created
GridDisk SPARSE_CD_02_strgceladm01 successfully created
GridDisk SPARSE_CD_03_strgceladm01 successfully created
GridDisk SPARSE_CD_04_strgceladm01 successfully created
GridDisk SPARSE_CD_05_strgceladm01 successfully created
GridDisk SPARSE_CD_06_strgceladm01 successfully created
GridDisk SPARSE_CD_07_strgceladm01 successfully created
GridDisk SPARSE_CD_08_strgceladm01 successfully created
GridDisk SPARSE_CD_09_strgceladm01 successfully created
GridDisk SPARSE_CD_10_strgceladm01 successfully created
GridDisk SPARSE_CD_11_strgceladm01 successfully created
[root@strgceladm01 ~]#

For each Storage Cell List all Grid Disks

[root@strgceladm01 ~]# cellcli -e list griddisk attributes name,size
 DATAC1_CD_00_strgceladm01 6.294586181640625T
 DATAC1_CD_01_strgceladm01 6.294586181640625T
 DATAC1_CD_02_strgceladm01 6.294586181640625T
 DATAC1_CD_03_strgceladm01 6.294586181640625T
 DATAC1_CD_04_strgceladm01 6.294586181640625T
 DATAC1_CD_05_strgceladm01 6.294586181640625T
 DATAC1_CD_06_strgceladm01 6.294586181640625T
 DATAC1_CD_07_strgceladm01 6.294586181640625T
 DATAC1_CD_08_strgceladm01 6.294586181640625T
 DATAC1_CD_09_strgceladm01 6.294586181640625T
 DATAC1_CD_10_strgceladm01 6.294586181640625T
 DATAC1_CD_11_strgceladm01 6.294586181640625T
 FGRID_FD_00_strgceladm01 2.0717315673828125T
 FGRID_FD_01_strgceladm01 2.0717315673828125T
 FGRID_FD_02_strgceladm01 2.0717315673828125T
 FGRID_FD_03_strgceladm01 2.0717315673828125T
 RECOC1_CD_00_strgceladm01 1.78143310546875T
 RECOC1_CD_01_strgceladm01 1.78143310546875T
 RECOC1_CD_02_strgceladm01 1.78143310546875T
 RECOC1_CD_03_strgceladm01 1.78143310546875T
 RECOC1_CD_04_strgceladm01 1.78143310546875T
 RECOC1_CD_05_strgceladm01 1.78143310546875T
 RECOC1_CD_06_strgceladm01 1.78143310546875T
 RECOC1_CD_07_strgceladm01 1.78143310546875T
 RECOC1_CD_08_strgceladm01 1.78143310546875T
 RECOC1_CD_09_strgceladm01 1.78143310546875T
 RECOC1_CD_10_strgceladm01 1.78143310546875T
 RECOC1_CD_11_strgceladm01 1.78143310546875T
 SPARSE_CD_00_strgceladm01 853.34375G
 SPARSE_CD_01_strgceladm01 853.34375G
 SPARSE_CD_02_strgceladm01 853.34375G
 SPARSE_CD_03_strgceladm01 853.34375G
 SPARSE_CD_04_strgceladm01 853.34375G
 SPARSE_CD_05_strgceladm01 853.34375G
 SPARSE_CD_06_strgceladm01 853.34375G
 SPARSE_CD_07_strgceladm01 853.34375G
 SPARSE_CD_08_strgceladm01 853.34375G
 SPARSE_CD_09_strgceladm01 853.34375G
 SPARSE_CD_10_strgceladm01 853.34375G
 SPARSE_CD_11_strgceladm01 853.34375G
[root@strgceladm01 ~]#

From ASM Instance Create a SPARSE Disk Group

SQL> CREATE DISKGROUP SPARSEC1 EXTERNAL REDUNDANCY DISK 'o/*/SPARSE_CD_*'
ATTRIBUTE
'compatible.asm' = '12.2.0.1',
'compatible.rdbms' = '12.2.0.1',
'cell.smart_scan_capable'='TRUE',
'cell.sparse_dg' = 'allsparse',
'AU_SIZE' = '4M';

Diskgroup created.

Set the following ASM attributes on the Disk Group hosting the Test Master Database

ALTER DISKGROUP DATAC1 SET ATTRIBUTE 'access_control.enabled' = 'true';

Grant access to the OS RDBMS user used to access to the Disk Group

ALTER DISKGROUP DATAC1 ADD USER 'oracle';

From an ASM Instance Set ownership permissions for every file that belongs solely to the PDB being snapped cloned as per example below

alter diskgroup DATAC1 set ownership owner='oracle' for file '+DATAC1/CDBT/<xxxxxxxxxxxxxxxxxxx>/DATAFILE/system.xxx.xxxxxxx';
alter diskgroup DATAC1 set ownership owner='oracle' for file '+DATAC1/CDBT/<xxxxxxxxxxxxxxxxxxx>/DATAFILE/sysaux.xxx.xxxxxxx';
alter diskgroup DATAC1 set ownership owner='oracle' for file '+DATAC1/CDBT/<xxxxxxxxxxxxxxxxxxx>/DATAFILE/users.xxx.xxxxxxx';
...
..

Restart the Master Test PDB in Read Only

alter pluggable database PDBTESTMASTER close immediate instances=all;
alter pluggable database PDBTESTMASTER open read only;

Create the first PDB Snapshot Copy on Exadata SPARSE Disk Group

Create pluggable database PDBDEV01 from PDBTESTMASTER tempfile reuse create_file_dest='+SPARSEC1' snapshot copy;

Feedback of the Exadata Storage Snapshots

The ability to create storage efficient database copies in a few seconds, independently from the size of the Test Master is very useful for today IT departments; but such extreme velocity and flexibility is not entirely free. In fact performance tests on a I/O bound workload have highlighted important performance degradation. This reminds us that as defined by Oracle Corporation, the Snapshot Technology, included on Exadata Machine remains a non-production option.

Feedback of Modern Consolidated Database Environment

Since the launch of Oracle 12c R1 Beta Program (August 2012) at Trivadis, we have been intensively testing, engineering and implementing Multitenant architectures for our customers.

Today, we can provide our feedbacks and those of our customers!

The overall feedback related to Oracle Multitenant is very positive, customers have been able to increase flexibility and automation, improving the efficiency of the software development life cycles.

Even the Single-tenant configuration (free of charge) brings few advantages compared to the non-CDB architecture. Therefore, from a technology point of view I recommend adopting the Container Database (CDB) architecture for all Oracle databases.

Examples of Multitenant architectures implemented

Having defined Oracle Multitenant a technological revolution on the space of relational databases, when combined with others 12c features it becomes a game changer for flexibility, automation and velocity.

Here are listed few examples of successful architectures implemented with our customers, using Oracle Container Database (CDB):

Oracle Multitenant combined with Oracle Grid Infrastructure (Clustering and ASM, with no additional cost), brings several advantages: Hight Availability, Horizontal Scalability, Elasticity, Cluster and Storage Foundation with advanced options like:
- Oracle Real Application Cluster.
- Quality of Service Management.
- Oracle Autonomous Framework.
- File System and Database Snapshot.
- Foundation of Oracle Autonomous Database 18c

Database consolidation without performance and stability compromise here.

Multitenant and DevOps here.

Operating Database Disaster Recovery in Multitenant environment here.