A production database is full of data that makes sense for its purpose, whether the database is 10GB or 10TB. Take that database and clone it for QA or development, however, and suddenly the data becomes cumbersome and unnecessary. Terabytes of disk are provisioned simply to hold a replica for the purpose of testing a small subset of it. An entire architecture, with its myriad support structures, historical data, indexes, and large objects, is cloned and ready just to be partially used, trashed, and rebuilt. This is a waste of both time and storage resources. To the business, having a duplicate environment makes absolute sense; from an IT point of view, however, the repeated duplication of storage space and the time drain it causes make no sense at all.
When database copies are made from the same master database (the source environment being used for cloning), typically 95% or more of the blocks are duplicated across all copies. In QA, development, and reporting environments there will almost always be some changes exclusive to the cloned system; however, the amount is usually extremely small compared to the size of the source database. The unchanged blocks are redundant and consume massive amounts of disk space that could be saved if the blocks could somehow be shared.
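To put the redundancy in concrete terms, the savings can be sketched with some simple arithmetic. The numbers below are purely illustrative (a hypothetical 10TB source, five clones, 95% of blocks unchanged), not measurements from any particular deployment:

```python
# Illustrative storage math for thin cloning (hypothetical numbers).
source_tb = 10.0          # size of the source database
clones = 5                # QA, dev, reporting, etc.
shared_fraction = 0.95    # blocks identical to the source

# Full physical copies: every clone stores the entire database.
full_copies_tb = clones * source_tb

# Thin clones: each clone stores only its private (changed) blocks.
thin_clones_tb = clones * source_tb * (1 - shared_fraction)

print(f"Full copies: {full_copies_tb:.1f} TB")   # 50.0 TB
print(f"Thin clones: {thin_clones_tb:.1f} TB")   # 2.5 TB
```

Even at a conservative 95% overlap, five thin clones need a small fraction of the storage that five full copies would consume.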
Yet sharing duplicate blocks is not an easy feat in most database environments. It requires a technology that can coordinate and orchestrate access to the duplicate blocks. At the same time, it requires the database copies to be writable, with their own private modifications that are hidden from the source and from other clones made from the same source.
There are several technologies available in the industry that can accomplish block sharing across database clones. The primary technology involves filesystem snapshots that can be deployed across multiple clones. This concept is known as thin cloning, and it allows filesystems to share original, unmodified blocks across multiple snapshots while keeping changes on the target clones private to the clone that made the changes.
But as with many new technologies, there are multiple methods available that all accomplish the same task. Thin cloning is no exception: several vendors and methods provide this technology.
Thin Cloning Technologies
The main requirement of database cloning is that the database files and logs must be in a consistent state on the copied system. This can be achieved either by having datafiles in a consistent state (via a cold backup) or with change logs that can bring the datafiles to a consistent state. The cloning process must also perform prerequisite tasks such as producing startup parameter files, database definition files, or other pre-creation tasks required for the target database to function. For example, Oracle requires control files, password files, pfiles/spfiles, and other pre-created components before the target can be opened for use. Controlfiles, logfiles, and datafiles together constitute a database that can be opened and read by the appropriate DBMS version.
In order to share data between two distinct, non-clustered instances of a database, the two instances must believe they have sole access to the datafiles. Modifications to the datafiles in one instance (a production database, for example) cannot be seen by the other instance (a clone) as it would result in corruption. For thin clones, datafiles are virtual constructs that are comprised of the shared common data and a private area specific to the clone.
There are a number of technologies that can be leveraged to support thin cloning. These technologies can be broken down into three categories:
Application software made to manage access to shared datafile copies
Copy-on-write filesystem snapshots
Allocate-on-write filesystem snapshots
The differences between these technologies are substantial and will determine how flexible and manageable the final thin cloning solution will be.
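The snapshot approaches differ in where new data lands. The toy sketch below (hypothetical code, not any vendor's implementation; real filesystems track extents, not Python lists) contrasts the two: copy-on-write preserves the old block by copying it aside before overwriting in place, while allocate-on-write writes the new version to a freshly allocated block and redirects the live block map:

```python
# Toy contrast between copy-on-write and allocate-on-write snapshots.
class CopyOnWrite:
    """First write after a snapshot copies the old block aside,
    then overwrites the original location in place."""
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshot = {}                 # block index -> preserved old data

    def write(self, i, data):
        if i not in self.snapshot:
            self.snapshot[i] = self.blocks[i]   # extra copy I/O happens here
        self.blocks[i] = data

    def read_snapshot(self, i):
        return self.snapshot.get(i, self.blocks[i])

class AllocateOnWrite:
    """New data goes to a freshly allocated block; the live block map
    is redirected while the snapshot map keeps pointing at old blocks."""
    def __init__(self, blocks):
        self.store = list(blocks)                      # append-only block pool
        self.live = {i: i for i in range(len(blocks))}
        self.snap = dict(self.live)

    def write(self, i, data):
        self.store.append(data)                        # no copy of old data
        self.live[i] = len(self.store) - 1

    def read_snapshot(self, i):
        return self.store[self.snap[i]]

cow = CopyOnWrite(["a", "b"]); cow.write(0, "A")
aow = AllocateOnWrite(["a", "b"]); aow.write(0, "A")
print(cow.blocks[0], cow.read_snapshot(0))            # A a
print(aow.store[aow.live[0]], aow.read_snapshot(0))   # A a
```

Both reach the same end state, but copy-on-write pays an extra copy on the first modification of each block, while allocate-on-write trades that for a level of indirection on every read.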
Software Managed Thin Cloning
One approach to sharing database blocks is to use the database software itself as the thin cloning technology, since it is already part of the existing stack. In this scenario the DBMS software orchestrates access to a shared set of database files while modifications are written to an area that is private to each database clone. In this way the clones are managed by the DBMS software itself. This is the approach Oracle has taken with Clonedb, a feature introduced in the 11.2.0.2 patchset of Oracle 11gR2.
Clonedb
Clonedb manages the combination of shared and private data within an Oracle environment by maintaining a central set of read-only datafiles and a private area for each clone. The private area is visible only to its clone, guaranteeing that cross-clone corruption cannot occur. The Oracle RDBMS software orchestrates access to the read-only datafiles and the private areas maintained by the clones.
Oracle’s Clonedb is available in Oracle version 11.2.0.2 and higher. Clonedb takes a set of read-only datafiles from an RMAN backup as the basis for the clone and maps a set of ‘sparse’ files to the actual datafiles. These sparse files represent the private area for each clone where all changes are stored. When the clone database needs a datablock, it first looks in the sparse file; if the block is not found there, the cloned database looks in the underlying read-only backup datafile. Through this approach, many clones can be created from a single set of RMAN datafile backups.
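The lookup order can be sketched as a simple overlay. This is hypothetical Python, not Oracle's implementation: a clone consults its private sparse area first, and only on a miss does the read fall through to the shared, read-only backup:

```python
# Sketch of Clonedb-style block lookup (illustrative only).
class ThinClone:
    def __init__(self, backup):
        self.backup = backup     # shared, read-only block map
        self.sparse = {}         # this clone's private changes

    def read_block(self, block_no):
        # Private changes win; otherwise fall through to the backup block.
        return self.sparse.get(block_no, self.backup[block_no])

    def write_block(self, block_no, data):
        # All writes land in the clone's private sparse area;
        # the shared backup is never modified.
        self.sparse[block_no] = data

backup = {1: "orders", 2: "customers"}   # shared backup blocks
clone_a, clone_b = ThinClone(backup), ThinClone(backup)
clone_a.write_block(1, "orders-modified")
print(clone_a.read_block(1))  # orders-modified (private to clone A)
print(clone_b.read_block(1))  # orders (clone B still sees the backup)
```

Because writes never touch the shared backup, any number of clones can layer their own changes over the same set of backup datafiles.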
The greatest benefit of this technology is that by sharing the underlying set of read-only datafiles, the common data between clones does not have to be replicated. Clones can be created easily and with minimal storage requirements. With time and space no longer constraints, cloning operations become far more efficient and require far fewer resources.
Many of the business and operational issues summarized in chapters 1 and 2 of this book can be alleviated with this technology. For instance, DBAs can use Clonedb to provision multiple developer copies of a database instead of forcing developers to share the same data set. With multiple developer copies, many delays and potential data contamination issues can be avoided, which improves speed and efficiency during application development. Developers can perform their test work, validate it against their own private datasets, and then commit their code and merge it into a shared development database. The cloned environment can then be trashed, refreshed, or handed over to another developer.
On the other hand, Clonedb requires that all clones be made from the source database at the same point in time. For example, if Clonedb is used to provision databases from an RMAN backup taken a day ago and developers want clones of the source database as it is today, then an entirely new RMAN backup must be made. This dilutes the storage and time savings that Clonedb originally brought. While it is possible to use redo and archive logs to bring the previous day’s RMAN backup up to date (all changes from the last 24 hours would be applied to yesterday’s RMAN datafile copies), the strategy works efficiently in only some cases. The farther the clone is from the original RMAN datafile copies, the longer and more arduous the catch-up process becomes, resulting in wasted time and resources.
Clonedb functionality is effective and powerful in some situations, but it is limited in its ability to be a standard in an enterprise-wide thin cloning strategy.
Figure 1. Sparse files are mapped to actual datafiles behind the scenes. The datafile backup copy is kept in a read only state. The cloned instance (using Clonedb) first looks for datablocks in the sparse file. If the datablock is not found, it will then read from the RMAN backup. If the Clonedb instance modifies any data, it will write the changes to the sparse file.
In order to implement Clonedb, Oracle 11.2.0.2 or higher is required. Additionally, Direct NFS (dNFS) must be used for the sparse files: the sparse files are implemented on a central NFS-mounted directory whose files are accessed via Oracle’s Direct NFS implementation.
To create this configuration, the following high-level steps must be taken:
Recompile the Oracle binaries with Oracle dNFS code
Run the clonedb.pl script, available through Metalink Document 1210656.1
Start up the cloned database with the startup script created by clonedb.pl
The syntax for clonedb.pl is relatively simple:
clonedb.pl initSOURCE.ora create_clone.sql
Three environment variables must be set for the configuration:
MASTER_COPY_DIR="/rman_backup"
CLONE_FILE_CREATE_DEST="/nfs_mount"
CLONEDB_NAME="clone"
Once clonedb.pl has been run, executing the output file generated by the script will create the database clone.
sqlplus / as sysdba @create_clone.sql
The create clone script does the work in four basic steps:
The database is started up in nomount mode with a generated pfile (initclone.ora in this case).
A custom create controlfile command is run that points to the datafiles in the RMAN backup location.
The sparse files on a dNFS mount are mapped to the datafiles in the RMAN backup location. For instance: dbms_dnfs.clonedb_renamefile('/backup/file.dbf', '/clone/file.dbf');
The database is brought online in resetlogs mode: alter database open resetlogs;
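For a database with many datafiles, step 3 amounts to one dbms_dnfs.clonedb_renamefile call per datafile. The small generator below is a hypothetical helper (clonedb.pl produces equivalent statements for you) that just makes the per-file pattern explicit; the directory paths reuse the example environment variables above:

```python
# Hypothetical helper: emit one clonedb_renamefile call per datafile.
# clonedb.pl generates equivalent statements; this only illustrates the pattern.
def renamefile_calls(datafiles, backup_dir, clone_dir):
    calls = []
    for name in datafiles:
        backup_path = f"{backup_dir}/{name}"   # read-only RMAN backup copy
        sparse_path = f"{clone_dir}/{name}"    # clone's private sparse file
        calls.append(
            f"exec dbms_dnfs.clonedb_renamefile("
            f"'{backup_path}', '{sparse_path}');"
        )
    return calls

for line in renamefile_calls(["system01.dbf", "users01.dbf"],
                             "/rman_backup", "/nfs_mount"):
    print(line)
```

Each generated call ties one sparse file on the dNFS mount back to its read-only backup datafile, which is exactly the mapping Figure 1 depicts.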
Figure 2. This image shows that multiple Clonedb instances can share the same underlying RMAN backup. Each Clonedb instance writes its changes to its own private sparse files.
Figure 3. A graphical outline of the process. An RMAN backup is taken of the source database and placed in a location where the Clonedb instances can access them (in this case, an NFS mount). A Clonedb instance can be set up on any host that has access to the NFS filer via dNFS. The Clonedb instances will create sparse files on the NFS filer. The sparse files map to the datafiles in the RMAN backup.
NOTE: If Clonedb instances are going to be created from two different points in time, then a new RMAN backup has to be taken and copied to the NFS server before it can be used as the source for new clones as shown in Figure 3.
Because Clonedb adds an extra layer of code that requires reads of both the sparse files over dNFS and the RMAN datafile backups, there is a performance penalty for using Clonedb. The biggest drawback is the requirement for multiple copies of the source database in order to create clones from different points in time, which diminishes the storage savings. Clonedb is a powerful option that should be in the back pocket of any Oracle DBA, but it has limited use in an automated thin cloning strategy.
Reference
The best write-up on Clonedb is Tim Hall’s article at http://www.oracle-base.com/articles/11g/clonedb-11gr2.php