rows are physically stored on disk. Because of this, we can have only one clustered index on
each table.
Dimension tables contain attribute columns, typically with a character data type.
Attribute columns that are often used in the where clause of queries should be given a
nonclustered index, but only if the selectivity is high. If the attribute column has many
duplicate values, it may not be worth indexing.
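As a rough sketch of the selectivity check described above, the following T-SQL estimates how selective an attribute column is before indexing it. The table dbo.dim_customer, the column city, and the index name are hypothetical and are not taken from the text.

```sql
-- Hypothetical dimension table and attribute column, for illustration only.
-- Selectivity = distinct values / total rows; a value near 1 means few duplicates.
SELECT COUNT(DISTINCT city) * 1.0 / NULLIF(COUNT(*), 0) AS selectivity
FROM dbo.dim_customer;

-- If the ratio is high enough for your workload, index the attribute.
CREATE NONCLUSTERED INDEX ix_dim_customer_city
    ON dbo.dim_customer (city);
```

What counts as "high enough" is a judgment call that depends on the table size and the queries; the text gives no threshold, so none is assumed here.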
For the fact table in SQL Server data warehousing, we have two approaches to
determining the primary key and the clustered index. Note that this applies specifically to SQL
Server; it does not apply to other database engines such as Oracle or Teradata.
• The first approach is to create a fact table surrogate key column. This is an identity (1,1)
column that functions as a single-column unique identifier of the fact table row. We set
this column as the primary key and the clustered index of the fact table.
• The second approach is not to have a fact table surrogate key column. Instead, we
select the minimum combination of columns that makes a row unique as the primary
key. In some cases, the combination of all dimensional surrogate key columns makes
a row unique. In other cases, it doesn't, and we have to identify other columns that
make a row unique, such as the degenerate dimension columns.
If you want to implement the first approach, create a new column in every fact
table. Call this column fact_key. It is an identity (1,1) column. The data type is bigint. It is not
worth taking the risk of using the int data type to save 4 bytes, unless you are certain you are
not going to exceed about 2 billion (the int maximum is 2,147,483,647). Remember that the
max(key) can be higher than the number of rows, as I explained earlier. The bigint data type
goes up to about 9 quintillion (9 followed by 18 zeros), so it should be
enough. Create a clustered primary key on this column. The advantages of this approach are
that the loading can be twice as fast (because the clustered index key is an identity column,
so there is no need to reorganize the fact table rows when loading new data), and the nonclustered
indexes can be four to five times smaller than in the second approach (because the
clustered index key, which every nonclustered index row carries, is only 8 bytes).
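A minimal sketch of the first approach might look like the following. Only fact_key, date_key, customer_key, and subscription_id come from the text; the table name, the data types of the non-key columns, the measure column, and the constraint name are assumptions made for illustration.

```sql
-- First approach: a bigint identity surrogate key as the clustered primary key.
CREATE TABLE dbo.fact_subscription_sales
(
    fact_key            bigint IDENTITY(1,1) NOT NULL,  -- fact table surrogate key
    date_key            int    NOT NULL,
    customer_key        int    NOT NULL,
    subscription_id     int    NOT NULL,                -- degenerate dimension (assumed type)
    subscription_amount money  NOT NULL,                -- hypothetical measure
    CONSTRAINT pk_fact_subscription_sales
        PRIMARY KEY CLUSTERED (fact_key)
);
```

Because new rows always receive a higher fact_key than existing ones, they are appended at the end of the clustered index rather than inserted in the middle of it.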
If you want to implement the second approach, find out what makes the fact table row
unique. For example, in our Subscription Sales fact table, the grain is one row for each customer
subscription per day. So, date_key and customer_key must be part of the primary key.
What if the customer subscribes to two packages? We need to include the subscription_id
in the primary key to make it unique, so the primary key is date_key, customer_key, and
subscription_id. We cluster the table on these primary key columns, so the table will physically
be organized/sorted according to date, customer, and then subscription ID.
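The second approach could be sketched as follows, using the three key columns named above. The table name, the column data types, the measure column, and the constraint name are assumptions; a different table name is used here only so the sketch does not clash with any existing fact table.

```sql
-- Second approach: no surrogate key; the clustered primary key is the
-- minimum combination of columns that makes a row unique.
CREATE TABLE dbo.fact_subscription_sales_composite
(
    date_key            int   NOT NULL,
    customer_key        int   NOT NULL,
    subscription_id     int   NOT NULL,   -- degenerate dimension (assumed type)
    subscription_amount money NOT NULL,   -- hypothetical measure
    CONSTRAINT pk_fact_subscription_sales_composite
        PRIMARY KEY CLUSTERED (date_key, customer_key, subscription_id)
);
```

The column order in the key matters: rows are sorted by date_key first, then customer_key, then subscription_id.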
Clustering the table this way makes queries fast when the where clause contains the date and
customer, because the table is physically sorted by the date and then the customer. This could
be 10 times faster than if the date and customer are not indexed. That's the advantage of the
second approach. The loading speed of the second approach could be twice as slow as the first
approach (as discussed previously), but we have fewer indexes to maintain. Overall, on the SQL
Server platform, the fact table surrogate key approach is preferred for two reasons. The first
is loading performance: there is no need to reorganize the clustered index when loading, while
the query performance can be supported by a nonclustered index. The second is functionality:
we can uniquely identify a fact table row using a single column, which is useful when
we need to refer to a fact table row either from another fact table (drilling across) or from the
same fact table itself (self-join).
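To illustrate how a nonclustered index can support queries under the surrogate key approach, here is a hedged sketch. It assumes a fact table named dbo.fact_subscription_sales that is clustered on fact_key and has date_key, customer_key, and a measure column subscription_amount; the table name, the measure, the index name, and the literal key values are all hypothetical.

```sql
-- Assumes dbo.fact_subscription_sales already exists with a clustered
-- primary key on fact_key (hypothetical names throughout).
CREATE NONCLUSTERED INDEX ix_fact_subscription_sales_date_customer
    ON dbo.fact_subscription_sales (date_key, customer_key)
    INCLUDE (subscription_amount);   -- lets the query below be answered from the index alone

-- A typical query filtering on date and customer; the key values are made up.
SELECT customer_key, SUM(subscription_amount) AS total_amount
FROM dbo.fact_subscription_sales
WHERE date_key = 20240131
  AND customer_key = 1234
GROUP BY customer_key;
```

The INCLUDE clause is optional; it trades extra index space for avoiding lookups into the clustered index.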
