and 105GB. Bear in mind that these initial sizes are collective overall figures, so
we need to divide them among the six files. Set the growth increment to 20 to 25 percent
of the initial size to minimize the frequency of SQL Server grow operations. Remember that we
need to maintain the size of the data and log files manually (say every six months, with
monthly monitoring); we should use autogrowth only for emergencies.
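As a sketch, the DDS layout described above might look like the following, with the 105GB collective initial size divided across six data files of roughly 17.5GB each and a growth increment of about 20 to 25 percent per file. The database name, logical file names, and drive paths are illustrative assumptions, not prescriptions:

```sql
-- Hypothetical example: 105GB divided across six data files
-- (about 17.5GB each), with a roughly 20 to 25 percent growth
-- increment (4GB) per file. Paths and names are illustrative only.
CREATE DATABASE DDS
ON PRIMARY
  ( NAME = dds_data1, FILENAME = 'H:\data1\dds_data1.mdf', SIZE = 17500MB, FILEGROWTH = 4GB ),
  ( NAME = dds_data2, FILENAME = 'I:\data2\dds_data2.ndf', SIZE = 17500MB, FILEGROWTH = 4GB ),
  ( NAME = dds_data3, FILENAME = 'J:\data3\dds_data3.ndf', SIZE = 17500MB, FILEGROWTH = 4GB ),
  ( NAME = dds_data4, FILENAME = 'K:\data4\dds_data4.ndf', SIZE = 17500MB, FILEGROWTH = 4GB ),
  ( NAME = dds_data5, FILENAME = 'L:\data5\dds_data5.ndf', SIZE = 17500MB, FILEGROWTH = 4GB ),
  ( NAME = dds_data6, FILENAME = 'M:\data6\dds_data6.ndf', SIZE = 17500MB, FILEGROWTH = 4GB )
LOG ON
  ( NAME = dds_log,   FILENAME = 'N:\log1\dds_log.ldf',    SIZE = 1GB,     FILEGROWTH = 512MB );
```

Spreading the data files across separate physical drives, as suggested by the different drive letters, also spreads the I/O load.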
• If the expected daily load is 5GB to 10GB and we keep five days of data on the stage, set
the stage database’s initial size to 50GB with a growth increment of 10GB. For the
metadata database, we expect it would be about 10GB in a year and to be 20GB in two
years, so let’s set the initial size to 10GB with a 5GB increment. The principle for setting
the increment is that it is only for emergencies; we should never have to use autogrowth
because we increase the file sizes manually. The increment for the stage
database above was set to 20 to 25 percent, as it was for the DDS and the NDS. This large
percentage is chosen to minimize fragmentation if a database file does become full.
The increment for the metadata database is set to 50 percent (5GB) because the metadata
database contains audit and usage metadata that could fluctuate by significant
amounts depending on ETL processes.
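The stage and metadata settings just described can be sketched as follows; database names and file paths are illustrative assumptions (the log settings anticipate the figures given in the next bullet):

```sql
-- Hypothetical example: stage database at a 50GB initial size with a
-- 10GB increment; metadata database at 10GB with a 5GB (50 percent)
-- increment. Names and paths are illustrative only.
CREATE DATABASE Stage
ON PRIMARY
  ( NAME = stage_data, FILENAME = 'H:\data1\stage_data.mdf', SIZE = 50GB,  FILEGROWTH = 10GB )
LOG ON
  ( NAME = stage_log,  FILENAME = 'N:\log1\stage_log.ldf',   SIZE = 2GB,   FILEGROWTH = 512MB );

CREATE DATABASE Meta
ON PRIMARY
  ( NAME = meta_data,  FILENAME = 'H:\data1\meta_data.mdf',  SIZE = 10GB,  FILEGROWTH = 5GB )
LOG ON
  ( NAME = meta_log,   FILENAME = 'N:\log1\meta_log.ldf',    SIZE = 100MB, FILEGROWTH = 25MB );
```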
• The log file size depends on the size of the daily load, recovery model, and the loading
method (ETL or ELT, stage or not stage; I’ll discuss this in the next chapter) as well as
index operations. Let’s set it to a 1GB initial size with a 512MB increment for both the
DDS and the NDS. For the stage, set a 2GB initial size with a 512MB increment. For
metadata, set it to 100MB with a 25MB increment. The transaction log records database
changes, so the log space required depends on how much data we load
into the database. One way to estimate the log space required is to use the
ETL processes to load one day’s data into the stage and then into the NDS and then into
the DDS. If we set the initial size and the increment of these three databases to a small
amount (say 1MB to 2MB), during this process the log file will grow so that after the ETL
processes are completed, the transaction log sizes of these three databases will indicate
the required log sizes.
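After that trial load, we can read the grown log sizes from SQL Server itself. The following sketch shows two ways to do this; the Stage database name is an illustrative assumption:

```sql
-- Report the log size (MB) and percentage used for every database
-- on the instance; run this after the one-day trial ETL load.
DBCC SQLPERF(LOGSPACE);

-- Alternatively, read the current log file size directly.
-- sys.database_files reports size in 8KB pages.
SELECT name, size * 8 / 1024 AS size_mb
FROM Stage.sys.database_files
WHERE type_desc = 'LOG';
```

The reported sizes, plus a safety margin, become the initial log sizes we set permanently.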
• For the recovery model, choose simple rather than full or bulk-logged. All the changes in the data
warehouse come from the ETL processes. When recovering from a failure, we can roll forward
using ETL by reapplying the extracted source system data for the particular day.
We don’t require the full or bulk-logged recovery model to roll forward to a certain point in time using
differential backups or transaction log backups. We fully control the data load in the
ETL, and the ETL process is the only process that updates the data warehouse. For the
stage, the NDS, and the DDS, the full recovery model is not necessary and causes overhead.
The full recovery model requires log backup, whereas the simple recovery model
doesn’t. The simple recovery model reclaims the log space automatically. The full recovery
model is suitable for OLTP systems where inserts and updates happen frequently all
day from many users and we want to be able to recover the database to a certain point
in time. In data warehousing, we can recover the data store by restoring the last full
backup followed by applying differential backups and then applying the daily ETL
loads.
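Setting the recovery model is a one-line statement per database; the database names below are illustrative assumptions:

```sql
-- Switch each data store to the simple recovery model. Under this
-- model, SQL Server reclaims log space automatically at checkpoints,
-- so no transaction log backups are required.
ALTER DATABASE Stage SET RECOVERY SIMPLE;
ALTER DATABASE NDS   SET RECOVERY SIMPLE;
ALTER DATABASE DDS   SET RECOVERY SIMPLE;
ALTER DATABASE Meta  SET RECOVERY SIMPLE;
```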
