Archive for September, 2007

Data Warehouse Modeling Step by Step

Sunday, September 30th, 2007

Data Warehouse Modeling Step by Step

The primary objective of a data warehouse (or any database) is to service end-users. The end-users are the people who read reports produced by data warehouse SQL queries. End-users utilize a data warehouse to search for patterns and to attempt to forecast trends from masses of historical information. From that perspective, there is a sequence of steps in approaching data warehouse database design, beginning with the end-user perspective. The end-user looks at a company from a business process, or operational, perspective:

1. Business processes. Establish the subject areas of a business. How can a business be divided up? The result is the fact tables. Fact tables contain records of historical transactions.

2. Granularity. Granularity is the level of detail required. In other words, should a data warehouse store every single transaction? Should it summarize transactions as a single record for each day, month, year, and so on? The finer the granularity, the bigger the fact tables become, because they contain more records. The safest option is to include all historical data down to the lowest level of granularity. This ensures that any possible future requirement for detailed analysis can always be met. Missing data might make your executive managers a little irate in the future. They will be irate with you, and that's usually best avoided. Specialized objects such as materialized views can create summaries at a later stage. When you do not know the precise requirements for future use of your data warehouse, it is best to store all levels of detail (assuming hardware storage capacity allows it). If you miss a level of detail in any specific area and it is later requested, you won't be able to comply. In other words, store every transaction, if you have the physical disk space and general hardware-processing capacity.

3. Identify and build dimensions. Dimensions contain static information. Dimensions describe facts by storing static details about transactions in fact tables. Dimensions must be built before facts because facts contain foreign key references to dimension tables.

4. Build facts. As previously mentioned, facts are transactional records, sometimes going back many years. Fact tables are built after all dimensions are decided upon because, as you already know, facts are dependent on dimensions.

How Long to Keep Data in a Data Warehouse?

The amount of time you keep data in a data warehouse depends on end-user requirements. Typically, when designing a data warehouse, at the point of creating table ERD diagrams, it is impossible to tell how much detail is required. The best option is to retain every single transaction without summarizing anything; however, that can chew up a humongous amount of disk space. If you have the space, why not use it? If you run out of space later, you can always begin summarizing and destroying detail-level records at that stage. Be warned! Summarizing data warehouse records into aggregated records and deleting detail records can be a seriously time-consuming effort if done when the data warehouse has grown extremely large.

Data warehouses sometimes retain all data forever. When a data warehouse becomes too difficult to manage, there will have to be some deletion of older data, or summarizing (or both). It all depends on hardware storage capacity, the power of computers, and how much you can spend on continually expanding the capacity of existing hardware. Upgrading a large data warehouse to new hardware and software can also be very time-consuming.

Understanding Data Warehouse Database Modeling
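The build order and the granularity advice above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the cut-down CUSTOMER and SALE tables, the sample rows, and the choice of SQLite are not from the original text. It shows a dimension created before the fact table that references it, a fact row with no matching dimension row being rejected, and a daily summary derived afterwards from transaction-level rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Dimensions are built first, because the fact table references them.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer TEXT)")

# The fact table holds one row per sale: the lowest level of granularity.
conn.execute("""
    CREATE TABLE sale (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        sale_price  REAL NOT NULL,
        sale_date   TEXT NOT NULL
    )
""")

# Hypothetical sample data.
conn.execute("INSERT INTO customer VALUES (1, 'Example Customer')")
conn.executemany(
    "INSERT INTO sale (customer_id, sale_price, sale_date) VALUES (?, ?, ?)",
    [(1, 10.0, "2007-09-29"), (1, 15.0, "2007-09-29"), (1, 20.0, "2007-09-30")],
)

# A fact row that points at a missing dimension row is rejected.
try:
    conn.execute(
        "INSERT INTO sale (customer_id, sale_price, sale_date) VALUES (99, 5.0, '2007-09-30')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Coarser granularity can be derived later from detail rows, but not the reverse.
daily_summary = conn.execute(
    "SELECT sale_date, COUNT(*), SUM(sale_price) FROM sale "
    "GROUP BY sale_date ORDER BY sale_date"
).fetchall()

print(orphan_rejected)
print(daily_summary)
```

Keeping the detail rows leaves every summary level available; once only the daily totals are stored, the individual sales cannot be reconstructed.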

A Cartesian product is a worst-case scenario.

Sunday, September 30th, 2007

A Cartesian product is a worst-case scenario.

Now look at the next query.

    SELECT * FROM SALE SAL JOIN AUTHOR AUT JOIN CUSTOMER CUS
      JOIN SHIPPER SHP JOIN SUBJECT SUB JOIN BOOK BOO
    WHERE … GROUP BY … ORDER BY … ;

Using the star schema from Figure 7-7, assuming the same number of records, a join occurs between one fact table and six dimensional tables. That is a Cartesian product of 10^6 multiplied by 10^6, resulting in 10^12 records returned. The difference between 10^12 and 10^15 is three decimal places. Three decimal places does not mean just three zeroes and thus 1,000 records. The difference is actually 1,000,000,000,000,000 - 1,000,000,000,000 = 999,000,000,000,000. That is effectively just a little less than 10^15. The difference between six dimensions and nine dimensions is enormous: a factor of 1,000 in the number of records. Fewer dimensions make for faster queries. That's why it is so essential to denormalize snowflake schemas into star schemas.

Take another quick glance at the snowflake schema in Figure 7-4 and Figure 7-5. Then examine the equivalent denormalized star schema in Figure 7-6 and Figure 7-7. Now put yourself into the shoes of a hustled, harried, and very busy executive trying to get a quick report. Think as an end-user, one only interested in results. Which diagram is easier to decipher as to content and meaning? The diagram in Figure 7-5 is more complex than the diagram in Figure 7-7. After all, being an end-user, you are probably not too interested in understanding the complexities of how to build SQL join queries. You have bigger fish to fry. The point is this: the less complex the table structure, the easier it will be to use. This is because a star schema is more representative of the real world than a snowflake schema. Look at it this way. A snowflake schema is more deeply normalized than a star schema and, therefore, by definition more mathematical. Something more mathematical is generally of more use to a mathematician than it is to an executive manager. The executive is trying to get a quick overall impression of whether his company will sell more cans of lima beans, or more cans of string beans, over the course of the next ten years. If you are a computer programmer, you will quite probably not agree with this analogy.

That tells us the very basics of data warehouse database modeling. How can a data warehouse database model be constructed?

How to Build a Data Warehouse Database Model

Now you know how to build star schemas for data warehouse database models. As you can see, a star schema is quite different from a standard relational database model (Figure 7-1). The next step is to examine the process, or the steps, by which a data warehouse database model can be built.

Chapter 7
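The Cartesian product arithmetic earlier in this post can be checked with a few lines of Python. The helper name cartesian_rows is hypothetical, and the sketch simply takes the text's figures at face value: 1 million fact records, 10 records per dimension, and six versus nine ten-record tables. If the fact table is itself counted among the six and nine tables listed in the two queries, both exponents drop by one, but the thousandfold ratio stays the same.

```python
def cartesian_rows(fact_rows: int, dimension_rows: int, dimension_count: int) -> int:
    """Rows produced by an unconstrained join of one fact table with
    dimension_count dimensions of dimension_rows rows each."""
    return fact_rows * dimension_rows ** dimension_count


FACT_ROWS = 10**6      # 1 million SALE records, as stated in the text
DIMENSION_ROWS = 10    # 10 records per dimension, as stated in the text

star = cartesian_rows(FACT_ROWS, DIMENSION_ROWS, 6)       # star schema figure from the text
snowflake = cartesian_rows(FACT_ROWS, DIMENSION_ROWS, 9)  # snowflake schema figure from the text

print(star)               # 1000000000000
print(snowflake)          # 1000000000000000
print(snowflake - star)   # 999000000000000
print(snowflake // star)  # 1000
```

Each additional ten-record table multiplies the worst-case result by ten, so removing three tables from the join shrinks it a thousandfold.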

Figure 7-7: The SALE fact-dimensional structure denormalized into

Saturday, September 29th, 2007

Figure 7-7: The SALE fact-dimensional structure denormalized into a star schema.

[Figure 7-7 diagram: the SALE fact table at the center, linked by one-to-many relationships to the BOOK, CUSTOMER, SHIPPER, AUTHOR, and SUBJECT dimensions.]

What does all this prove? Not much, you might say. On the contrary, two things are achieved by using fact-dimensional structures and star schemas:

Figure 7-1 shows a highly normalized table structure, useful for high-concurrency, precision record-searching databases (an OLTP database). Replacing this structure with a fact-dimensional structure (as shown in Figure 7-2, Figure 7-4, and Figure 7-6) reduces the number of tables. As you already know, reducing the number of tables is critical to SQL query performance. Data warehouses consist of large quantities of data, batch updates, and incredibly complex queries. The fewer tables, the better, especially because there is so much data. The following code is a SQL join query for the snowflake schema, joining all nine tables in the snowflake schema shown in Figure 7-5.

    SELECT * FROM SALE SAL JOIN AUTHOR AUT JOIN CUSTOMER CUS
      JOIN SHIPPER SHP JOIN SUBJECT SUB JOIN CATEGORY CAT
      JOIN BOOK BOO JOIN PUBLISHER PBS JOIN PUBLICATION PBL
    WHERE … GROUP BY … ORDER BY … ;

If the SALE fact table has 1 million records, and all dimensions contain 10 records each, a Cartesian product would return 10^6 multiplied by 10^9 records. That makes for 10^15 records. That is a lot of records for any CPU to process.

The solution is an obvious one. Convert (denormalize)

Saturday, September 29th, 2007

The solution is an obvious one. Convert (denormalize) a normalized snowflake schema into a star schema, as shown in Figure 7-6. In Figure 7-6 the PUBLISHER and PUBLICATION tables have been denormalized into the BOOK table, and the CATEGORY table has been denormalized into the SUBJECT table.

Figure 7-6: A denormalized SALE table fact-dimensional structure.

  Sale: sale_id, ISBN (FK), author_id (FK), shipper_id (FK), customer_id (FK), subject_id (FK), sale_price, sale_date
  Customer: customer_id, customer, address, phone, email, credit_card_type, credit_card#, credit_card_expiry
  Shipper: shipper_id, shipper, address, phone, email
  Author: author_id, author
  Book: ISBN, publisher, title, edition#, print_date, pages, list_price, format, rank, ingram_units
  Subject: subject_id, category, subject

A simpler equivalent diagram to that of Figure 7-6 is shown by the star schema in Figure 7-7.
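As a rough sketch of the denormalization described above, the following Python snippet uses the built-in sqlite3 module to fold PUBLISHER and PUBLICATION into a single BOOK dimension. The table name book_snowflake, the reduced column set, and the sample row are assumptions made for illustration; the original text does not say how the conversion is carried out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflake layer: BOOK points at PUBLISHER and PUBLICATION.
    CREATE TABLE publisher   (publisher_id INTEGER PRIMARY KEY, publisher TEXT);
    CREATE TABLE publication (publication_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE book_snowflake (
        ISBN           TEXT PRIMARY KEY,
        publication_id INTEGER REFERENCES publication (publication_id),
        publisher_id   INTEGER REFERENCES publisher (publisher_id),
        list_price     REAL
    );

    -- Hypothetical sample data.
    INSERT INTO publisher   VALUES (1, 'Example Press');
    INSERT INTO publication VALUES (1, 'Example Title');
    INSERT INTO book_snowflake VALUES ('ISBN-0001', 1, 1, 39.99);

    -- Star dimension: publisher and title are copied into BOOK itself,
    -- so queries no longer need the two extra joins.
    CREATE TABLE book AS
        SELECT b.ISBN       AS ISBN,
               p.publisher  AS publisher,
               n.title      AS title,
               b.list_price AS list_price
        FROM book_snowflake b
        JOIN publisher   p ON p.publisher_id   = b.publisher_id
        JOIN publication n ON n.publication_id = b.publication_id;
""")

# Column names of the new dimension (index 1 of each table_info row is the name).
book_columns = [row[1] for row in conn.execute("PRAGMA table_info(book)")]
book_rows = conn.execute("SELECT * FROM book").fetchall()

print(book_columns)
print(book_rows)
```

The join cost is paid once, when the dimension is built, rather than in every report query that needs a publisher name or a title.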

A more simplistic equivalent diagram to that of Figure

Friday, September 28th, 2007

A simpler equivalent diagram to that of Figure 7-4 is shown by the snowflake schema in Figure 7-5.

Figure 7-5: The SALE fact-dimensional structure is a snowflake schema.

[Figure 7-5 diagram: the SALE fact table linked by one-to-many relationships to the BOOK, CUSTOMER, SHIPPER, AUTHOR, and SUBJECT dimensions, with PUBLICATION, PUBLISHER, and CATEGORY in an outer layer.]

The problem with snowflake schemas isn't too many tables but too many layers. Data warehouse fact tables can become incredibly large, running to millions, billions, or even trillions of records. The critical factor in creating star and snowflake schemas, instead of using standard nth Normal Form layers, is decreasing the number of tables in SQL query joins. The more tables in a join, the more complex the query and the slower it will execute. When fact tables contain enormous record counts, reports can take hours or days, not minutes. Adding just one more table to a fact-dimensional query join at that level of database size could make the query run for weeks. That's no good!

What Is a Snowflake Schema?

Friday, September 28th, 2007

What Is a Snowflake Schema?

A snowflake schema is shown in Figure 7-4. A snowflake schema is a normalized star schema, such that dimension entities are normalized (dimensions are separated into multiple tables). Normalized dimensions have all duplication removed from each dimension, such that the result is a single fact table connected directly to some of the dimensions. Not all of the dimensions are directly connected to the fact table. In Figure 7-4, the dimensions are grayed out in two shades of gray. The lighter shade of gray represents dimensions connected directly to the fact table (BOOK, AUTHOR, SUBJECT, SHIPPER, and CUSTOMER). The darker-shaded dimensional tables are normalized subset dimensional tables, not connected to the fact table directly (PUBLISHER, PUBLICATION, and CATEGORY).

Figure 7-4: The SALE table fact-dimensional structure.

  Sale: sale_id, ISBN (FK), author_id (FK), shipper_id (FK), customer_id (FK), subject_id (FK), sale_price, sale_date
  Customer: customer_id, customer, address, phone, email, credit_card_type, credit_card#, credit_card_expiry
  Shipper: shipper_id, shipper, address, phone, email
  Author: author_id, author
  Book: ISBN, publication_id (FK), publisher_id (FK), edition#, print_date, pages, list_price, format, rank, ingram_units
  Publisher: publisher_id, publisher
  Publication: publication_id, title
  Subject: subject_id, category_id (FK), subject
  Category: category_id, category
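To make the extra layer concrete, here is a small sqlite3 sketch of the SALE, SUBJECT, and CATEGORY chain from Figure 7-4. Only a subset of the columns is used, and the sample rows and category names are invented for illustration. Reporting sales by category has to pass through SUBJECT to reach CATEGORY, which is the second join layer a snowflake schema introduces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE subject (
        subject_id  INTEGER PRIMARY KEY,
        category_id INTEGER REFERENCES category (category_id),
        subject     TEXT
    );
    CREATE TABLE sale (
        sale_id    INTEGER PRIMARY KEY,
        subject_id INTEGER REFERENCES subject (subject_id),
        sale_price REAL
    );

    -- Hypothetical sample data.
    INSERT INTO category VALUES (1, 'Fiction'), (2, 'Non-Fiction');
    INSERT INTO subject  VALUES (1, 1, 'Science Fiction'), (2, 2, 'History');
    INSERT INTO sale     VALUES (1, 1, 10.0), (2, 1, 12.0), (3, 2, 30.0);
""")

sales_by_category = conn.execute("""
    SELECT cat.category, SUM(sal.sale_price)
    FROM sale sal
    JOIN subject  sub ON sub.subject_id  = sal.subject_id   -- first layer
    JOIN category cat ON cat.category_id = sub.category_id  -- second layer
    GROUP BY cat.category
    ORDER BY cat.category
""").fetchall()

print(sales_by_category)
```

In the equivalent star schema, category would be a column of SUBJECT and the second join would disappear.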

Figure 7-3: The REVIEW fact-dimensional structure is a

Thursday, September 27th, 2007

Figure 7-3: The REVIEW fact-dimensional structure is a star schema.

[Figure 7-3 diagram: the REVIEW fact table at the center, linked by one-to-many relationships to the AUTHOR, BOOK, PUBLISHER, and CUSTOMER dimensions.]

A star schema contains a single fact table plus a number of small dimensional tables. If there is more than one fact table, effectively there is more than one star schema. Fact tables contain transactional records, which over a period of time can come to contain very large numbers of records. Dimension tables, on the other hand, remain relatively constant in record numbers. The objective is to enhance SQL query join performance, where joins are executed between a single fact table and multiple dimensions, all on a single hierarchical level. So, a star schema is a single, very large, very changeable fact table, connected directly to a single layer of multiple, static-sized dimensional tables.

What Is a Star Schema?

Thursday, September 27th, 2007

What Is a Star Schema?

The most effective approach for a data warehouse database model (using dimensions and facts) is called a star schema. Figure 7-2 shows a simple star schema for the REVIEW fact table shown in Figure 7-1.

Figure 7-2: The REVIEW table fact-dimensional structure.

  Review: review_id, customer_id (FK), publication_id (FK), author_id (FK), publisher_id (FK), review_date, text
  Customer: customer_id, customer, address, phone, email, credit_card_type, credit_card#, credit_card_expiry
  Author: author_id, author
  Publisher: publisher_id, publisher
  Publication: publication_id, title

A simpler equivalent diagram to that of Figure 7-2 is shown by the star schema structure in Figure 7-3.
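The REVIEW star schema of Figure 7-2 might be declared and queried roughly as follows, again using Python's sqlite3 module. The column names follow the figure, with CUSTOMER trimmed to two columns; the data types and the sample rows are assumptions. Every join in the query goes directly from the fact table to a dimension, so there is only one layer of joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: small, relatively static tables.
    CREATE TABLE customer    (customer_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE author      (author_id INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE publisher   (publisher_id INTEGER PRIMARY KEY, publisher TEXT);
    CREATE TABLE publication (publication_id INTEGER PRIMARY KEY, title TEXT);

    -- Fact: one row per review, with a foreign key to every dimension.
    CREATE TABLE review (
        review_id      INTEGER PRIMARY KEY,
        customer_id    INTEGER REFERENCES customer (customer_id),
        publication_id INTEGER REFERENCES publication (publication_id),
        author_id      INTEGER REFERENCES author (author_id),
        publisher_id   INTEGER REFERENCES publisher (publisher_id),
        review_date    TEXT,
        "text"         TEXT
    );

    -- Hypothetical sample data.
    INSERT INTO customer    VALUES (1, 'Example Customer');
    INSERT INTO author      VALUES (1, 'Example Author');
    INSERT INTO publisher   VALUES (1, 'Example Press');
    INSERT INTO publication VALUES (1, 'Example Title');
    INSERT INTO review      VALUES (1, 1, 1, 1, 1, '2007-09-27', 'A fine read.');
""")

reviews = conn.execute("""
    SELECT cus.customer, aut.author, pbl.title, pbs.publisher, rev.review_date
    FROM review rev
    JOIN customer    cus ON cus.customer_id    = rev.customer_id
    JOIN author      aut ON aut.author_id      = rev.author_id
    JOIN publication pbl ON pbl.publication_id = rev.publication_id
    JOIN publisher   pbs ON pbs.publisher_id   = rev.publisher_id
""").fetchall()

print(reviews)
```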

The Dimensional Database Model

Wednesday, September 26th, 2007

The Dimensional Database Model

A standard, normalized, relational database model is completely inappropriate to the requirements of a data warehouse. Even a denormalized relational database model doesn't make the cut. An entirely different modeling technique, called a dimensional database model, is needed for data warehouses. A dimensional model contains what are called facts and dimensions. A fact table contains historical transactions, such as all invoices issued to all customers for the last five years. That could be a lot of records. Dimensions describe facts.

The easiest way to describe the dimensional model is to demonstrate by example. Figure 7-1 shows a relational table structure for both static book data and dynamic (transactional) book data. The grayed-out tables in Figure 7-1 are static data tables, and the others are tables containing data that is in a constant state of change. Static tables are the equivalent of dimensions, describing facts (equivalent to transactions). So, in Figure 7-1, the dimensions are grayed out and the facts are not.

Figure 7-1: The OLTP relational database model for books.

  Customer: customer_id, customer, address, phone, email, credit_card_type, credit_card#, credit_card_expiry
  Shipper: shipper_id, shipper, address, phone, email
  Sale: sale_id, ISBN (FK), shipper_id (FK), customer_id (FK), sale_price, sale_date
  Edition: ISBN, publisher_id (FK), publication_id (FK), print_date, pages, list_price, format
  Publisher: publisher_id, name
  Publication: publication_id, subject_id (FK), author_id (FK), title
  Author: author_id, name
  Review: review_id, publication_id (FK), review_date, text
  Subject: subject_id, parent_id, name
  CoAuthor: coauthor_id (FK), publication_id (FK)
  Rank: ISBN (FK), rank, ingram_units

Surrogate Keys in a Data Warehouse

Wednesday, September 26th, 2007

Surrogate Keys in a Data Warehouse

Surrogate keys, as you already know, are replacement key values. A surrogate key usually makes database access more efficient. In data warehouse databases, surrogate keys are possibly even more important in terms of gluing together different data, even from different databases, perhaps even different database engines. Sometimes different databases could be keyed on different values, or even contain different key values, which in the non-computerized world are actually identical.

For example, a customer in a department of a company could be uniquely identified by the customer's name. In a second department, within the same company, the same customer could be identified by the name of a contact or even perhaps the phone number of that customer. A third department could identify the same customer by a fixed-length character coding system. All three definitions identify exactly the same customer. If this single company is to have meaningful data across all departments, it must recognize the three separate formats as representing the same customer in the data warehouse. A surrogate key is the perfect solution, using the same surrogate key value for each repetition of the same customer, across all departments. Surrogate key use is prominent in data warehouse database modeling.

Referential Integrity in a Data Warehouse

Data warehouse data modeling is essentially a form of relational database modeling, albeit a simplistic form. Referential integrity still applies to data warehouse databases; however, even though referential integrity applies, it is not essential to create primary keys, foreign keys, and their inter-table referential links (referential integrity). It is important to understand that a data warehouse database generally has two distinct activities. The first activity is updating, with large numbers of records added at once, sometimes also with large numbers of records changed. It is always best to only add or remove data in a data warehouse. Changing existing data warehouse table records can be extremely inefficient simply because of the sheer size of data warehouses. Referential integrity is best implemented and enforced when updating tables.

The second activity of a data warehouse is the reading of data. When data is read, referential integrity does not need to be verified because no changes are occurring to records in tables. Even so, because referential integrity implies creation of primary and foreign keys, and because the best database model designs make profligate use of primary and foreign key fields in SQL code, leave referential integrity intact for a data warehouse.

So, now we know the origin of data warehouses and why they were devised. What is the data warehouse dimensional database model?
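The three-department example from the surrogate key discussion can be sketched as a lookup table that maps each department's own identifier onto one shared surrogate key. Everything concrete here is hypothetical: the customer_source_key table, the department names, the identifier values, and the surrogate_key helper are illustrations, not part of the original text, and a real warehouse load process would be considerably more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One warehouse row per real-world customer, keyed by a surrogate key.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer TEXT);

    -- Each department's own identifier, mapped to the shared surrogate key.
    CREATE TABLE customer_source_key (
        department  TEXT NOT NULL,
        source_key  TEXT NOT NULL,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        PRIMARY KEY (department, source_key)
    );

    INSERT INTO customer VALUES (1, 'Acme Books');
    INSERT INTO customer_source_key VALUES
        ('sales',   'Acme Books', 1),   -- identified by customer name
        ('support', '555-0100',   1),   -- identified by phone number
        ('billing', 'ACME000001', 1);   -- identified by fixed-length code
""")


def surrogate_key(department, source_key):
    """Return the warehouse surrogate key for a department-specific identifier,
    or None if the identifier is unknown."""
    row = conn.execute(
        "SELECT customer_id FROM customer_source_key "
        "WHERE department = ? AND source_key = ?",
        (department, source_key),
    ).fetchone()
    return row[0] if row else None


resolved = {
    surrogate_key("sales", "Acme Books"),
    surrogate_key("support", "555-0100"),
    surrogate_key("billing", "ACME000001"),
}
print(resolved)  # {1}
```

All three identifiers resolve to the same key, so fact rows loaded from any department point at the same customer dimension row.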