Oracle Real Application Clusters 19c: Best Practices and Secret Internals

Anil Nair
Sr Principal Product Manager,
Oracle Real Application Clusters (RAC)
@RACMasterPM
http://www.linkedin.com/in/anil-nair-01960b6
http://www.slideshare.net/AnilNair27/
Copyright © 2019 Oracle and/or its affiliates.
Guest speaker:
Paresh Patel
Senior Member of Technical Staff,
PayPal
Safe Harbor
The preceding is intended to outline our general product direction. It is intended for information purposes
only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code,
or functionality, and should not be relied upon in making purchasing decisions. The development,
release, timing, and pricing of any features or functionality described for Oracle’s products may change
and remains at the sole discretion of Oracle Corporation.
Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and
prospects are “forward-looking statements” and are subject to material risks and uncertainties. A detailed
discussion of these factors and other risks that affect our business is contained in Oracle’s Securities and
Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and Form 10-Q
under the heading “Risk Factors.” These filings are available on the SEC’s website or on Oracle’s website
at http://www.oracle.com/investor. All information in this presentation is current as of September
2019 and Oracle undertakes no duty to update any statement in light of new information or future events.
Session Survey
Help us make the content even better. Please complete the session survey in the Mobile App.
Session ID is TRN 4851
Agenda
1. Best Practices to Upgrade to Oracle 19c
2. Oracle 19c Grid Infrastructure (GI) New Features
3. What’s new with Cache Fusion?
Best Practices to Upgrade to Oracle 19c
Oracle 19c Upgrade requires Linux 7
• Execution of ./gridSetup.sh on old OS releases may fail
• Failure is reported as a perl error message
  • perl has a hard dependency on glibc
• Similar message reported by DB installer
• Additional details in URL below
https://www.linkedin.com/pulse/high-levelsteps-upgrade-oracle-19c-rac-anil-nair/
Upgrade to Linux 7 with least downtime
For each node, perform the following until the last node:
• Drain services from the Linux 6 node: $srvctl relocate service -drain_timeout
• Remove the node from the cluster (delNode): $./delNode ….
• Upgrade or reinstall the node with Linux 7
• Add the Linux 7 node back to the cluster (addNode): $./addNode ….
*Inline Upgrade depends on initial configuration
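The node-by-node sequence on this slide can be sketched as a small planning helper. This is only an illustration: the function name and node names are hypothetical, the step labels are taken from the slide, and nothing is executed against a cluster.

```python
from typing import List


def rolling_os_upgrade_plan(nodes: List[str]) -> List[str]:
    """Return the ordered steps for moving a cluster from Linux 6 to Linux 7.

    One node is processed completely before the next one is touched, so the
    remaining nodes keep serving the drained services. Steps are descriptive
    strings only; no command is run.
    """
    plan = []
    for node in nodes:
        plan.append(f"{node}: drain services (srvctl relocate service -drain_timeout)")
        plan.append(f"{node}: remove node from cluster (delNode)")
        plan.append(f"{node}: upgrade or reinstall OS with Linux 7")
        plan.append(f"{node}: add node back to cluster (addNode)")
    return plan


if __name__ == "__main__":
    # Hypothetical three-node cluster.
    for step in rolling_os_upgrade_plan(["node1", "node2", "node3"]):
        print(step)
```

Keeping the plan as data makes it easy to review the order before any real command is typed.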
ORAchk = ORAchk + cluvfy + Autoupgrade.jar*
• Download latest orachk and benefit from the latest checks
  • No need to individually download autoupgrade.jar or cluvfy
• Single report with results from autoupgrade.jar, orachk and cluvfy checks
• *orachk also includes other components like Application Continuity and Security related checks
[Diagram labels: ORAchk, Cluvfy, PreUpgrade]
ORAchk autoupgrade includes autoupgrade.jar checks and cluvfy pre-upgrade checks
One command for all Autoupgrade checks
Report includes results from all components with appropriate options
• orachk -preupgrade -targetversion 19.3.0.0.0 -showpass
• cluvfy -stage pre/post
• orachk includes preupgrade.jar and cluvfy pre-upgrade checks
State of the GIMR
• Grid Infrastructure Management Repository (GIMR), aka mgmtDB, is no longer mandatory starting with Oracle 19c
• Limited AHF functionality by utilizing filesystem without GIMR
  • No support for CHA GUI chactl
  • Trace File Analyzer (TFA) will provide limited graphical view
Choose to install GIMR: Thanks to Your Feedback
Only for New Installations
• Upgrades depend on initial GIMR configuration
• Choose to install Grid Infrastructure Management Repository (GIMR)
• Eventual goal is to move GIMR into its own new separate home
-dryRunForUpgrade: Thanks to Your Feedback
$./gridSetup.sh -dryRunForUpgrade
gridSetup can now be used with the -dryRunForUpgrade option for dry-run testing of Oracle Grid Infrastructure upgrades
GIMR state during upgrade
From Version | To Version | GIMR state in source OH | GIMR state in dest OH | Comments
Standalone Cluster Upgrade (Not Cluster Domain)
Pre-12.2 | 12.2 Jan 2019 RU | No | Yes/No | Choice to select Yes/No to configure GIMR during upgrade
Pre-18c | 18.5 | Yes | Yes | No choice to change state of GIMR during upgrade
Pre-19c | 19.3 | Yes | Yes | No choice to change state of GIMR during upgrade
Pre-19c | 19.3 | No | No | No choice to change state of GIMR during upgrade
Fresh Install
- | 12.2 Jan 2019 RU | - | Yes/No | Choice to select Yes/No
- | 18.5 | - | Yes/No | Choice to select Yes/No
- | 19.3 | - | Yes/No | To add GIMR post installation use mgmtca
Read Only Oracle HOME (ROOH)
• ROOH enabled Oracle Database homes store configuration files outside of the Oracle Home
• Faster cloning of Oracle software home with ROOH as environment specific configuration files are stored outside of Oracle home
• Improves security as running processes cannot create new files under Oracle Home
• Oracle RAC DB Home is ROOH
• $roohctl can be used in versions 18c and 19c for manual conversion
  • Only pertinent to Oracle Database Home (Not GI home)
  • Plan to remove configuration files used by application before converting to ROOH such as
    • tnsnames.ora
Patch faster with -SwitchHome
/u01/app/19.0/grid
/u01/app/19.3/grid
• Apply patch to a new grid home while stack continues to run from current home
• Reduces downtime as stack is up and running during the copy process
• Reduces errors caused by common issues such as “Out of space”
• Easy fallback in case of issues
Summary of Best Practices for Upgrade
• Always download the latest version of orachk/exachk from
  • https://support.oracle.com/epmos/faces/DocContentDisplay?id=1268927.2
  • https://support.oracle.com/epmos/faces/DocContentDisplay?id=1070954.1
• Consider storage requirements of GIMR
• Apply latest OS patches
• orachk includes the DBSAT (Oracle Database Security Assessment Tool)
• Add user defined checks to benefit from a single report
• Find environment specific files in ORACLE_HOME (such as password, tnsnames.ora, pfile) and other files that may affect using ROOH
Oracle 19c Grid Infrastructure (GI) New Features
New GI Resource Modeling for PDBs
• Optimize management of resources such as database instance, listener on nodes
• Include the ability to startup, stop, prioritize, relocate resources
  • define pdb2 as more critical and therefore start pdb2 before other pdbs
Resource Modeling Today
Services
• Utilizes Service(s) to drive workload placement
• Services implicitly open PDB instance(s)
• Order of PDB open based on service definition
  • Defined using Preferred, Available attributes
• Default modeling after upgrades
Services trigger PDB open
Oracle Clusterware start Diagnostics
[Diagram: Oracle Clusterware startup stack showing init, ohasd, cssdAgent, cssdmonitor, oraRootAgent, oraAgent, cssd, crsd, ctssd, evmd, mdnsd, gipcd, HAIP, ACFS, ASM]
*NOT all daemons are shown in illustration above
• Environment changes, incorrect permissions of binaries can prevent stack startup
• Oracle 19c Clusterware stack attempts to auto-diagnose unsuccessful startup issues
  • Provides detailed logging in case of failures
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
PRVG-2031 : Owner of file "…gipcd.bin" did not match the [Expected="grid(54320)" Found="oracle(54325)"]
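The kind of ownership check behind a message like PRVG-2031 can be illustrated with a few lines of Python. This is a minimal sketch on a POSIX system, not how Clusterware implements its check; the function name and the message wording are assumptions made for the example.

```python
import os
import tempfile
from typing import Optional


def owner_mismatch(path: str, expected_uid: int) -> Optional[str]:
    """Return None when the file owner matches, else a short description.

    Illustrative only: compares numeric user IDs from os.stat and does not
    resolve user names or check group and permission bits.
    """
    found_uid = os.stat(path).st_uid
    if found_uid == expected_uid:
        return None
    return (
        f'Owner of file "{path}" did not match '
        f"[Expected uid={expected_uid} Found uid={found_uid}]"
    )


if __name__ == "__main__":
    # Demonstrate on a temporary file owned by the current user.
    with tempfile.NamedTemporaryFile() as tmp:
        actual = os.stat(tmp.name).st_uid
        print(owner_mismatch(tmp.name, actual))      # matching owner
        print(owner_mismatch(tmp.name, actual + 1))  # deliberately wrong owner
```

Running the same comparison over the binaries of a home before a restart is one way to catch the permission problems the slide mentions.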
Clusterware runtime Diagnostics
• Oracle 19c Clusterware processes maintain histograms and statistics such as trace file rotation frequency and time taken for rotation
• Severity tagging provides human-readable criticality of messages
  • Preserves critical information on very busy systems
2019-08-20 08:36:13.142 : CSSD:1871161088: [ ERROR] clssgmclienteventhndlr: (SENDCOMPLETE) No proc found for ClientID
2019-08-20 08:36:13.188 : CSSD:1871161088: [ INFO] clssgmDeadProc: Removing clientID 2:43454:0 (0x7fda802df820), with GIPC
• New diagnostics monitor thread ensures in-memory logs (UTS) are periodically written so that diagnostics are available in case of a process crash
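Severity tags make trace lines easy to filter mechanically. The sketch below parses lines shaped like the two CSSD samples on this slide; the regular expression is inferred from those samples only and is an assumption, not a documented trace format.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the sample lines on the slide (an assumption):
# "<date> <time> : <component>:<thread id>: [ <SEVERITY>] <message>"
TRACE_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) : "
    r"(?P<component>\w+):(?P<thread>\d+): "
    r"\[\s*(?P<severity>[A-Z]+)\s*\]\s*(?P<message>.*)$"
)


def parse_trace_line(line: str) -> Optional[Dict[str, str]]:
    """Split one trace line into its fields, or return None if it does not match."""
    match = TRACE_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "2019-08-20 08:36:13.142 : CSSD:1871161088: [ ERROR] "
        "clssgmclienteventhndlr: (SENDCOMPLETE) No proc found for ClientID"
    )
    fields = parse_trace_line(sample)
    if fields and fields["severity"] == "ERROR":
        print(fields["timestamp"], fields["component"], fields["message"])
```

Filtering on the severity field is enough to pull the critical messages out of a busy trace file.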
Private Network Interface Check
• Oracle 19c recommends using bonding mode 0 (Balance-RR) or 1 (Active-Backup) when HAIP is not used for network redundancy
• Recommendation is based on greater tolerance to network jitter with different combinations of interface, switch, OS
• Check also ensures every node of the cluster has the same bonding mode
• The check is a warning
• It is possible but not recommended to use other modes
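The two conditions described above (recommended mode, same mode on every node) can be expressed as a small check. This is a sketch under assumptions: the function name, the input dictionary and the warning text are hypothetical, and the bonding modes are supplied by the caller rather than read from the operating system.

```python
from typing import Dict, List

# Linux bonding modes named on the slide as recommended.
RECOMMENDED_MODES = {0: "balance-rr", 1: "active-backup"}


def check_bonding_modes(modes_by_node: Dict[str, int]) -> List[str]:
    """Return warnings for a cluster's private-interconnect bonding modes.

    Warnings only, mirroring the slide: other modes are possible but not
    recommended, so nothing here raises an error.
    """
    warnings = []
    if len(set(modes_by_node.values())) > 1:
        warnings.append("WARNING: bonding mode is not the same on every node")
    for node, mode in sorted(modes_by_node.items()):
        if mode not in RECOMMENDED_MODES:
            warnings.append(
                f"WARNING: {node} uses bonding mode {mode}; mode 0 or 1 is recommended"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical two-node cluster with mismatched modes.
    for message in check_bonding_modes({"node1": 1, "node2": 4}):
        print(message)
```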
Oracle Clusterware Ciphers
$crsctl get cluster security tls
ON
$crsctl get cluster tlsciphersuite
enabled
• Clusterware processes communicate with each other using gIPC
• gIPC today utilizes TLS (Transport Layer Security)
• Easy configuration of any future secure communication protocol
Summary of Clusterware features
• Plan to utilize the new resource modelling capabilities
• Clusterware start failure(s) will trigger environment checks with detailed logging to help pinpoint probable mis-configurations
  • Permission
  • Network/Storage
• Human readable messages in trace files during runtime failures for faster issue resolution
• Additional checks to ensure cluster interconnect is configured correctly
What’s new with Cache Fusion?
Cache Fusion: A long Journey
[Diagram: buffer (B) transfers between instances over the private network, comparing "Before Cache Fusion" with "Cache Fusion", from Oracle 8i to Oracle 19c]
• Multiple LMSs
• Higher Priority
• Auto Tune # of LMS
• BOC Synchronization
• Integration with DRF
• Dynamic GRD resizing
Optimize Resource Master placement
Global Resource Directory
[Diagram: grid of resource masters and buffers across nodes; M = Master, B = Buffer]
• During Startup
  • Resources are distributed across nodes
  • GRD maintains information on these resources
  • Resource Master may or may not be on same node as the resource
• Steady State
  • DRM (Dynamic Re-Mastering) helps move the Resource Master to the same node as the Resource
Goal is to reduce 3-way communication, providing performance equal to Single Instance
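The effect of re-mastering on remote lookups can be illustrated with a toy model. This is not Oracle's DRM policy: the "move the master to the node with the most accesses" rule, the function names and the sample data are all assumptions chosen only to show why co-locating master and resource reduces cross-node traffic.

```python
from collections import Counter
from typing import Dict, List, Tuple

Access = Tuple[str, str]  # (resource, node that accessed it)


def remote_master_lookups(masters: Dict[str, str], accesses: List[Access]) -> int:
    """Count accesses whose resource master lives on a different node."""
    return sum(1 for resource, node in accesses if masters[resource] != node)


def remaster(masters: Dict[str, str], accesses: List[Access]) -> Dict[str, str]:
    """Toy rule: move each master to the node that accessed the resource most."""
    per_resource: Dict[str, Counter] = {}
    for resource, node in accesses:
        per_resource.setdefault(resource, Counter())[node] += 1
    new_masters = dict(masters)
    for resource, counts in per_resource.items():
        new_masters[resource] = counts.most_common(1)[0][0]
    return new_masters


if __name__ == "__main__":
    # Hypothetical workload: blk1 is mastered on node2 but used mostly on node1.
    masters = {"blk1": "node2"}
    accesses = [("blk1", "node1")] * 3 + [("blk1", "node2")]
    print(remote_master_lookups(masters, accesses))
    print(remote_master_lookups(remaster(masters, accesses), accesses))
```

In this toy workload the remote lookup count drops once the master follows the dominant accessor, which is the intuition behind the steady-state bullet.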
Oracle RAC Performance Automation
• Note 1619155.1 Best Practices and Recommendations for RAC databases with SGA size over 100GB
• Automatic configuration in 19c
  • Dynamic CR slaves to deal with changing workload
  • LMS CR slaves (_max_cr_rollbacks deprecated) (1630755.1)
  • Reduce “LMS process busy” event in AWR report
  • LMS CR Slaves
  • Dynamic DLM ticket adjustment to prevent hangs
  • remove _lm_tickets
Oracle RAC Exadata optimizations
Exafusion
• Utilize low latency RDMA
  • Read/Write to remote memory without CPU
Fast Node Death Detection
• Subnet Manager for Fast Node Death detection
  • Network (Subnet Manager)
  • Disk (Diskmon)
[Chart: Exadata versus Generic Systems; axis 0 to 40, data labels 30 and 0.8]
Smart Fusion Block Transfer
• More details available at
  • https://www.slideshare.net/AnilNair27/oracle-rac-features-on-exadata
Oracle RAC at PayPal
At PayPal, we put people at the center of everything we do.
Characteristics of Database platform
Across Databases
• 150+ Oracle RAC Clusters
• 5M+ Execs/Sec
• 25% Y-o-Y DB Storage Growth
• 50+ PB Total DB Storage
Extremely Busy OLTP RAC Cluster
• 4 x Oracle X7-8 (768 Cores, 24 TB memory)
• 18M+ Logical Reads/Sec
• 200k Execs/Sec
• 300k GC/GE messages/Sec
• < 0.250 µs Avg GC message latency
• 2M+ IC packets sent/received
• < 4ms Avg SQL call latency
• 75K Execs/Sec on a table
Why PayPal adopted Oracle RAC
Single instance databases
Scalability limits
Unpredictable availability
Active/Passive configuration
Capacity wastage
Does not meet business goals
How did PayPal achieve HA and scalability?
o Introduction of new technology and concepts
  • Oracle Real Application Clusters (RAC) on Oracle x86 server with IB for interconnect
  • Smart routing of read/write calls to instances on primary database
  • Shrinking buffer cache to reduce reconfiguration duration during maintenance
  • Oracle RAC based ADG/GG reader farms for read only and latency tolerant applications
  • Fail fast Read Only cluster to failover traffic to secondary cluster in < 10 seconds
o Benefits
  • Elastic scalability
  • Availability improved by 10x
  • Eliminated Single Point Of Failure (SPOF)
  • Primary database Instance failover improved by 10x
  • Leverage all allocated capacity means reduced CAPEX
Oracle 19c RAC features addressing our needs/problems?
o Availability improvement by 4x by introducing,
  • more LMS processes where enough compute capacity available
  • optimized algorithms reducing Oracle RAC reconfiguration duration
  • DBMS_CACHEUTIL helping with grab and dissolve resource affinity
  • Instance pairing for crash redo recovery
o Improvements in scalability and Performance,
  • Oracle RAC scalable sequences
  • improved cache locality reducing interconnect traffic
  • Smart fusion block transfer eliminating redo log write latency on Exadata
  • commit cache for recent transactions reducing block transfers between nodes
o Exadata Adoption
  • Benefit from Oracle RAC features on Exadata
Summary
• Oracle RAC is the proven choice for Scalability and Availability without any application changes