Tuesday, December 16, 2014

The Date Dimension – A MUST HAVE in all Reporting Environments



Bryan Townsend
Business Analytics Specialist - Data Warehousing, Modeling

As a report developer, how many times have you been stuck creating reports in a Business Intelligence environment without a date dimension? If your answer is only once, I can tell you from experience that even once is too many times. If you are consistently in that situation, then you may want to befriend the DBA and plead your case to have them implement a date dimension.

Date dimensions are common in BI implementations that have a data warehouse. They are not as common in BI environments where reporting occurs directly off of the transactional system(s). Regardless of whether you’re reporting off of a data warehouse, or a transactional system, I would argue that date dimensions are a MUST HAVE in all reporting environments for the following two reasons:

  • Enhanced query performance - All fact tables in a reporting environment contain one or more dates that can be linked to date dimensions. By tying these fact tables to a date dimension, report developers can utilize time period attributes which results in more efficient queries sent to the database.
  • Increased efficiency for report development - Most if not all reports reference a time period such as Month, Year, or Prior Year. Using attributes from a date dimension make reporting on time periods easier, which results in faster developing rather than having to code logic for these time periods.

Those two points alone make date dimensions the most commonly used dimensions in Business Intelligence implementations. The following will discuss the structure, the implementation process, and benefits of having a date dimension in your reporting environment.

Structure

The granularity of the date dimension is day. It consists of a record for each day for a timeframe defined by the business that will be sufficient to handle their reporting needs. Date dimensions have dates that go well into the future. For instance, a date dimension can start on 1/1/2000 and end 12/31/2030. This range will vary from business to business.

Date dimensions consist of a key and many attributes. The key is usually an integer value that is unique for each record, and is used to join the dimension to fact tables. For example, the key for 12/01/2014 would be 20141201. The many attributes of a date dimension are what makes them so powerful. Many companies use their fiscal calendar for reporting. Because of that, most date dimensions will contain attributes for both calendar year and fiscal year. Below is a list of common attributes found in a date dimension.

Common Attributes
Additional Attributes
Date
Calendar_Weekday
Calendar_Year
Calendar_Day_Of_Week
Calendar_Qtr
Calendar_DOM
Calendar_Month
Week_Ending_Date
Calendar_Week_Of_Year
Working_Day_Of_Month
Calendar_Day_Of_Year
Working_Days_In_Month
Fiscal_Year

Fiscal_Qtr

Fiscal_Period

Fiscal_Week_Of_Year

Fiscal_Day_Of_Year


Populating

There are several methods for loading data into date dimensions. Since most date dimensions are typically loaded once to contain all the attributes, past dates, and future dates, any of the options discussed below are valid. They all may take an afternoon to put together, but they all accomplish the same task. These options are:

  • Build the date dimension manually in a spreadsheet, and have that loaded into a database table. This method would be for those that are less technical, but have a great understanding of what attributes need to be in the date dimension. If you’re loading 30 years of dates, you’ll have over 10,000 records to manage in this spreadsheet. Dealing with that volume manually may make this method slightly more cumbersome than the next two methods. It may be tougher to maintain as well if there is a need to add more years. Once all the keys and attributes are in the spreadsheet, have your Database Administrator import that data into a database table where it can be used for report development.
  • Create a spreadsheet with formulas that calculate the attributes, and have that loaded into the table. Reference the spreadsheet to see this method. This option can be used by those savvy excel users that have the ability to write complex formulas. Once the correct formula is created for a cell, that formula can be copied across all date cells. For example, if your Fiscal Year starts on Oct 1st, you can use the formula below to calculate the period number. Many of the Calendar Year attributes can be achieved by referencing the date column, and adding some formatting. Another benefit of this option is that a quick search on the internet for creating any of the attributes in a spreadsheet will most likely return several ways to create the formula. When the spreadsheet is complete, your DBA will have to import that into a table.
          =IF((MONTH(A2)+3)>12,(MONTH(A2)+3)-12,(MONTH(A2)+3)) 
          Note: A2 is the date column in this example
  • Write a SQL statement that will populate the date dimension. This method will require knowledge of writing SQL statements, and for you to have access to your database environment. This option is the most complex of the three methods, but it is also the most effective. It is also the easiest to maintain. Because this task is accomplished in the database environment there is no need for an import step as there is with spreadsheets. Once the SQL statement is completed, it’s as easy as executing the statement to populate the date dimension. Similar to the previous method, a simple search on the internet can reward you with many SQL statements that can act as a great starting point.

Specialty Attributes

Unlike the other attributes that are originally loaded with the date dimension and remain static, these specialty attributes need to be updated daily. Specialty attributes can consist of Yes/No Flags, and Relative Time Periods or Lag Periods.

  • Yes/No Flags can be created to identify specific Period to Date records. For example, if we are two days into a Period, the Period_To_Date_Flag would be “Y” for the first 2 days of the current period, and “N” for all other records for that column.
  • Relative Time Periods or Lag Periods are another common specialty attribute. Using Relative Time Periods allow report developers to easily identify current periods, and prior periods. Using Relative_Calendar_Day as an example, 0 would represent current day, -1 represents prior day, -2 identifies two days ago, +1 identifies tomorrow. All the Relative Time Periods have the same representation so for Relative_Fiscal_Year, 0 is current fiscal year, -1 is prior fiscal year, and +1 is next year. Below is a list of several common specialty attributes.
Specialty Attributes
Year_To_Date_Flag
Prior_Year_To_Date_Flag
Period_To_Date_Flag
Prior_Year_Period_To_Date_Flag
Relative_Calendar_Day
Relative_Calendar_Week
Relative_Calendar_Month
Relative_Fiscal_Period
Relative_Calendar_Year
Relative_Fiscal_Year

By updating these specialty attributes daily, report developers can easily make their reports dynamic. If there is a requirement for a report that contains sales for current period, the report developer can easily identify that, and make the report dynamic by using the filter Relative_Fiscal_Period = 0.

Reporting Benefits

The main benefit of having a date dimension is that it greatly simplifies the report development process. Report developers know that the majority, if not all, of report requirements are based on a specific timeframe, and more often than not the report needs to update dynamically when that timeframe changes. Accomplishing reports with specific timeframes is certainly possible without a date dimension, but the logic to do this is much more complex. Most reporting environments have beginner, intermediate, and advanced report developers. With the added complexity advanced report writers would be forced to develop all reports with these timeframe requirements. Having a date dimension allows beginner, and intermediate report writers to handle these requirements as well. Let’s look at the report requirement below, and discuss how this would be achieved with and without a date dimension.

Report Requirement:

Compare the current period to date Sales Dollars to the Sales Dollars for the same period to date last year. The criteria specifies that when the period changes, the report should dynamically update with the current period. 

With a Date Dimension

Logic for Current Period to Date Sales Dollar would look something like this:

CASE
   WHEN Relative_Fiscal_Period = 0
   THEN Sales_Dollars
   ELSE 0
END

Logic for Last Year Same Period to Date Sales Dollar would look something like this:

CASE
   WHEN Relative_Fiscal_Period = -12 AND Prior_Year_Period_To_Date_Flag = ‘Y’
   THEN Sales_Dollars
   ELSE 0
END

Without a Date Dimension

Good Luck with that!

ALL reporting environments should have a Date Dimension. There are too many benefits that they bring to BI implementations, and those benefits cannot be ignored.

Thursday, December 4, 2014

4 Ways to Optimize Your QlikView Application Performance



Patrick McCaffrey

Business Analytics Specialist

QlikView’s appeal and power is behind its ease of use and the speed at which users can perform data discovery without the need to bother IT with requests. Once an application is validated and deployed, business users have the ability to perform their own custom reporting needs quickly.

But what if it’s not quick? Then some optimization and fine tuning must be done!

There are plenty of ways to fine tune a QlikView application, but I want to focus on 4 that should be performed immediately.

1.      How large are your charts?

If you have a detailed straight table that contains hundreds of thousands or millions of lines, you may want to consider applying a calculation condition that will require users to make selections that limit the line count. The way I normally try to accomplish this is by identifying one field that I want to limit, normally the Invoice Number, Account Number, etc. I then create a condition, for example, Count(InvoiceNo) < 50000. When there are more than 50,000 invoice numbers in the current filter, my chart won’t load.

This is also helpful in limiting how many records a user can export to Excel.

Reminder: Make sure to set an error message to accompany any Calculation Condition. You want to specify WHY the chart isn’t being loaded, or risk confusing the users.

2.      Are there any synthetic keys in your model?

While QlikView can handle some synthetic keys, it has a tendency to slow the application down. QlikView can handle the links better if you code it in yourself, using functions like:

·        AutoNumber(Field1,Field2)
·        AutoNumberHash128(Field1,Field2)
·        AutoNumberHash256(Field1,Field2)
·        Concat(Field1,Field2)

All of these functions will allow you to create the key in the table, rather than letting QlikView try to identify and create a key synthetically.

3.      How many IF Statements are in your data?

QlikView is quick because it loads all of the data straight into memory, the way it appears from the data source (preferably a QVD). What happens if you start changing the way the data appears from the way it’s loaded in? It slows down. One of the reasons it slows down is because every time a new filter is applied, QlikView has to recalculate all of those changes for every line. Imagine 5 changes per line, and 5 million lines… that’s quite a bit of calculating that the application must do every time a new filter is applied.

To speed up your application, you want to avoid doing these sorts of changes in the application, and instead make the changes in the script.

But what if you can’t put these changes in the script? Then try to utilize Set Analysis as much as possible. This is QlikView’s answer for the times that it can’t be avoided. Create flags or buckets in your script then refer to them in Set Analysis. The application will run much faster.

What if you want to change the calculation based on selections? You can still avoid the IF statement by using ‘Conditional’ shows. You can put each expression in with a ‘Conditional’ show using the function GetFieldSelections(Field1) = ‘Value’. When Field1 is on the specified value, that expression will show while the other expressions will not. This technique can also be used for the entire chart or sheet.

4.      How large is the application?

If your application size, in conjunction with your user adoption rate, exceeds the formula that Susie Bann explained in her blog post, it is recommended that you begin looking to upgrade your server, or breaking the application into multiple, specialized QVW documents.

I often look to break the application into main categories that each business department finds useful, such as Finance, Inventory, Sales, etc. This allows for only the needed amount of data to load into the cache per user, and thus limits the required amount of memory needed for successful deployments.

These are a few suggestions I recommend you take immediately to optimize your QlikView application experience. Your users will thank you!