rows are physically stored on disk. Because of this, we can have only one clustered index on each table.

Dimension tables contain attribute columns, typically having a character data type. Attribute columns that are often used in the where clause of queries should be given a nonclustered index, but only if the selectivity is high. If the attribute column has many duplicate values, it may not be worth indexing.

For the fact table in SQL Server data warehousing, we have two approaches to determining the primary key and the clustered index. Note that this is specific to SQL Server; it does not apply to other database engines such as Oracle or Teradata.

• The first approach is to create a fact table surrogate key column. This is an identity (1,1) column that functions as a single-column unique identifier of the fact table row. We set this column as the primary key and the clustered index of the fact table.
• The second approach is not to have a fact table surrogate key column. Instead, we select the minimum combination of columns that make a row unique as the primary key. In some cases, the combination of all dimensional surrogate key columns makes a row unique. In other cases it doesn't, and we have to identify other columns that make a row unique, such as the degenerate dimension columns.

If you want to implement the first approach, create a new column in every fact table. Call this column fact_key. It is an identity (1,1) column. The data type is bigint. It is not worth taking the risk of using the int data type to save 4 bytes, unless you are certain you are not going to hit 2 billion (the int maximum is 2,147,483,647). Remember that the max(key) can be higher than the number of rows, as I explained earlier. The bigint maximum is about 9 quintillion (a 9 followed by 18 zeros), so it should be enough. Create a clustered primary key on this column.
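The int-versus-bigint trade-off can be checked with some quick arithmetic. The following Python sketch is only an illustration: the daily key consumption (KEYS_PER_DAY) is a hypothetical figure, not one from the text, and it counts every identity value used up, including gaps, since max(key) can run ahead of the row count.

```python
# Upper bounds of the SQL Server integer types discussed above.
INT_MAX = 2**31 - 1      # 2,147,483,647 (4 bytes)
BIGINT_MAX = 2**63 - 1   # 9,223,372,036,854,775,807 (8 bytes)


def days_until_exhausted(max_key, keys_per_day):
    """Whole days of loading before an identity (1,1) column reaches max_key."""
    return max_key // keys_per_day


# Hypothetical load rate: identity values consumed per day, gaps included.
KEYS_PER_DAY = 5_000_000

int_days = days_until_exhausted(INT_MAX, KEYS_PER_DAY)
bigint_years = days_until_exhausted(BIGINT_MAX, KEYS_PER_DAY) // 365

print(f"int lasts {int_days} days")          # 429 days at this rate
print(f"bigint lasts {bigint_years:,} years")
```

At this assumed rate an int key would run out in a little over a year, whereas a bigint key would outlast any realistic warehouse, which is why the extra 4 bytes per row are usually worth paying.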
The advantages of this approach are that the loading can be twice as fast (because the clustered index key is an identity column, so there is no need to reorganize the fact table rows when loading new data), and the nonclustered indexes can be four to five times smaller than in the second approach (because the clustered index key is only 8 bytes).

If you want to implement the second approach, find out what makes the fact table row unique. For example, in our Subscription Sales fact table, the grain is one row for each customer subscription per day. So, date_key and customer_key must be part of the primary key. What if the customer subscribes to two packages? We need to include subscription_id in the primary key to make it unique, so the primary key is date_key, customer_key, and subscription_id. We cluster the table on these primary key columns, so the table will be physically organized/sorted by date, customer, and then subscription ID.

This will make the query fast if the where clause contains the date and customer, because the table is physically sorted by the date and then the customer. This could be 10 times faster than if the date and customer are not indexed. That's the advantage of the second approach. Loading with the second approach could take twice as long as with the first approach (as discussed previously), but we have fewer indexes to maintain.

Overall, on the SQL Server platform, the fact table surrogate key approach is preferred, for two reasons. The first is performance: there is no need to reorganize the clustered index when loading, which results in better loading performance, while the query performance can be supported by a nonclustered index. The second reason is functionality: we can uniquely identify a fact table row using a single column, which is useful when we need to refer to a fact table row either from another fact table (drilling across) or from the same fact table itself (self-join).
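For the second approach, the step of finding out what makes a fact table row unique can be tried on a sample of the data before committing to a primary key. The Python sketch below is one possible way to do that check; the column names come from the Subscription Sales example, but the sample rows and the helper functions duplicate_keys and is_unique are made up for illustration.

```python
from collections import Counter

# Hypothetical sample at the grain "one row per customer subscription per day".
rows = [
    {"date_key": 20070101, "customer_key": 1, "subscription_id": 1},
    {"date_key": 20070101, "customer_key": 1, "subscription_id": 2},  # second package
    {"date_key": 20070101, "customer_key": 2, "subscription_id": 1},
    {"date_key": 20070102, "customer_key": 1, "subscription_id": 1},
]


def duplicate_keys(rows, columns):
    """Return the key tuples that occur more than once for the given columns."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def is_unique(rows, columns):
    """True if the column combination identifies every row uniquely."""
    return not duplicate_keys(rows, columns)


# date_key + customer_key collides for the customer with two packages.
print(is_unique(rows, ("date_key", "customer_key")))                     # False
# Adding subscription_id resolves the collision.
print(is_unique(rows, ("date_key", "customer_key", "subscription_id")))  # True
```

A check like this only shows that a combination is unique in the sample. Whether it stays unique is a question about the grain of the fact table, which still has to be answered from the business definition.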
