Stairway to Wisdom

There is a tendency now, especially for those of us in the more affluent classes, to want to use education to make life more predictable, to seek control as the essential good, to emphasize data that masks the remorseless unpredictability of individual lives. But people engaged in direct contact with problems like teenage pregnancy are cured of those linear illusions.

The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing

A lot of shops are still using 20-year-old B-tree technology to index their databases because B-tree indexes are a compromise between querying and updating. However, we don’t need to compromise when we decide to break our database world into two pieces.

.. I recommend starting with these kinds of first-level data marts because I believe this minimizes the risk of signing up to an implementation that is too ambitious. Most of the risk comes from biting off too big an extract programming job. Also, in many cases an efficiently implemented first-level data mart will provide users with enough interesting data to keep them happy and quiet while the data mart teams keep working on harder issues. (pp 44) .. As early as 1980, Nielsen figured out that the way to tie separate data sources together was by using “conformed dimensions.” (pp 54)
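To make the idea of tying separate sources together through a conformed dimension concrete, here is a minimal Python sketch. It is not from the book: the product dimension, the two fact sources, and the function names are all hypothetical. The point it illustrates is that two independently built data marts can be combined on one report only because both use the same dimension keys and labels.

```python
# Hypothetical conformed product dimension shared by both data marts.
product_dim = {
    1: {"product_name": "Cola 12oz", "brand": "Acme"},
    2: {"product_name": "Cola 2L", "brand": "Acme"},
    3: {"product_name": "Root Beer 12oz", "brand": "Zenith"},
}

# Two separate fact sources (illustrative rows): (product_key, measure).
sales_facts = [(1, 100.0), (2, 250.0), (3, 80.0), (1, 50.0)]
shipment_facts = [(1, 40), (2, 60), (3, 30)]


def total_by_brand(facts, dim):
    """Summarize one fact source by the conformed 'brand' attribute."""
    totals = {}
    for product_key, value in facts:
        brand = dim[product_key]["brand"]
        totals[brand] = totals.get(brand, 0) + value
    return totals


def drill_across(sales, shipments, dim):
    """Query each source separately, then merge on the shared brand label."""
    sales_by_brand = total_by_brand(sales, dim)
    units_by_brand = total_by_brand(shipments, dim)
    brands = sorted(set(sales_by_brand) | set(units_by_brand))
    return {b: (sales_by_brand.get(b, 0), units_by_brand.get(b, 0)) for b in brands}


report = drill_across(sales_facts, shipment_facts, product_dim)
```

If each source kept its own private product codes and brand names, the merge step would have nothing reliable to join on.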

.. Perhaps more than any other job in IT, the data warehouse design task combines computer technology, cognitive psychology, business content, and politics. (pp 58)

.. The most serious mistake with hardware and software is choosing a proprietary and closed hardware solution that emphasizes raw computing power on complex production schemas rather than careful software and data design. While such solutions may reduce isolated back room costs, they drive up the costs of application development and increase the chances that users will encounter complexity. (pp 59)

.. A centrally planned data warehouse is as likely to be successful as a centrally planned economy. It sounds great on paper, and it appeals to the controlling instincts of IT, but a centrally planned data warehouse makes the assumptions of perfect information and control. .. Data warehouses are built to materially improve the organization’s decision-making capabilities.

.. I’ve often said that the best application support person is permanently conflicted as to whether the business or the technology is more appealing. These people (of which I am one) spend their entire careers moving back and forth across the business/IT boundary. The cost to address this problem is an explicit program of tours of duty for IT people to spend a year or longer working directly in the business user department. Business credibility for IT personnel is the “gold coin.”

.. The tongue in cheek definition of real time is “any data delivery that is too fast for the current extract, transform, and load system.” .. It’s fashionable to measure return-on-investment with highly analytic-sounding techniques, such as payback period, net present value, or internal rate of return. In my opinion, these miss the main point of evaluating the costs and eventually the value of a data warehouse. A data warehouse supports decisions. After a decision is made, give the data warehouse a portion of the credit, and then compare that retrospectively to the costs of the warehouse. My rule of thumb is to take 20 percent of the monetary value of a decision made and book that to the benefit of the data warehouse.
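The rule of thumb above is simple arithmetic, sketched here in Python. The function names, the decision values, and the warehouse cost are made up for illustration; only the 20 percent share comes from the quoted text.

```python
def warehouse_credit(decision_values, credit_share=0.20):
    """Credit the warehouse with a share of each decision's monetary value."""
    return credit_share * sum(decision_values)


def benefit_to_cost(decision_values, warehouse_cost, credit_share=0.20):
    """Compare the credited benefit retrospectively to what the warehouse cost."""
    if warehouse_cost <= 0:
        raise ValueError("warehouse_cost must be positive")
    return warehouse_credit(decision_values, credit_share) / warehouse_cost


# Illustrative numbers only: two decisions supported by the warehouse.
decisions = [1_000_000, 500_000]
credit = warehouse_credit(decisions)
ratio = benefit_to_cost(decisions, warehouse_cost=200_000)
```

With these assumed figures the warehouse is credited with roughly 300,000 against a cost of 200,000, a ratio of about 1.5.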

.. “Agile is an iterative and incremental (evolutionary) approach to software development which is performed in a highly collaborative manner with ‘just enough’ ceremony.” .. The comatose user. These business users respond to your classic, open-ended questions with monosyllabic, one-word responses. Fortunately, this is a relatively rare syndrome. Most often, their apathetic responses are due to external distractions totally unrelated to the DW/BI project. It’s sometimes effective to ask these people questions from a more negative perspective. For example, rather than trying to get them to envision life outside the box, these users sometimes find it easier to tell you what’s wrong inside the box.

.. The dimension tables represent the biggest departure from the usual normalization techniques. It is important that the dimension tables remain as flat tables without being further normalized. This is the hardest design step for relational data modelers to accept. (pp 135)

.. Conformed dimensions are agreed to by an interdisciplinary team representing all the interests of the enterprise. This is a hard job. It’s expected that the team will get stuck from time to time trying to align the incompatible original vocabularies of different groups. This is why the high level business executive sponsor is necessary. The executive must periodically approve these tough vocabulary compromises or even force them to be made. (pp 150)

.. Users understand hierarchies, and your job is to present them in the most natural and efficient way. A hierarchy belongs in a single, physical flat dimension table. Resist the urge to “snowflake” a hierarchy by generating a set of progressively smaller (pp 199)
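A small Python sketch of what a hierarchy held in a single flat dimension table can look like. The product rows, column names, and the `roll_up` helper are hypothetical, not taken from the book. Every level of the hierarchy is just another column on the same row, so summarizing at any level needs no extra lookups.

```python
from collections import defaultdict

# Hypothetical flat product dimension: SKU, brand, category, and department
# all live on one row instead of in separate normalized tables.
product_dim = {
    101: {"sku": "Cola 12oz", "brand": "Acme", "category": "Soft Drinks", "department": "Beverages"},
    102: {"sku": "Cola 2L", "brand": "Acme", "category": "Soft Drinks", "department": "Beverages"},
    103: {"sku": "Green Tea", "brand": "Leaf", "category": "Tea", "department": "Beverages"},
    201: {"sku": "Potato Chips", "brand": "Crisp", "category": "Snacks", "department": "Grocery"},
}

# Illustrative fact rows: (product_key, sales_amount).
sales_facts = [(101, 10.0), (102, 20.0), (103, 5.0), (201, 7.5), (101, 2.5)]


def roll_up(facts, dim, level):
    """Total the facts at any hierarchy level with a single dimension lookup."""
    totals = defaultdict(float)
    for product_key, amount in facts:
        totals[dim[product_key][level]] += amount
    return dict(totals)


by_department = roll_up(sales_facts, product_dim, "department")
by_category = roll_up(sales_facts, product_dim, "category")
```

The repeated department and category values are the price of the flat layout; in exchange, the same one-step lookup serves every level.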

.. A great place to start in most enterprises is to model the process consisting of customer invoices or monthly statements, as shown in Figure 7-1. This data source is probably fairly accessible and of fairly high quality. One of Kimball’s laws is that the best data source in any enterprise is the record of “how much money they owe us.” (pp 211)

.. Details Always Come First Because consolidated fact tables deliver ease of use and query performance, perhaps you’re thinking that you’ll start there. It’s especially tempting if you’re chartered with creating a flashy scorecard or dashboard for the executive team. However, don’t be lured onto this supposedly easy street. You need to focus on the atomic details before pursuing either aggregated or consolidated dimensional models. If you start at the more macro level without the detailed foundation, there’s nothing to drill into when a business user wants to probe into an exceptional condition or anomaly in the consolidated data. Remember, you can always roll up, but you can only drill down if the lower level details are available. (pp 242)
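The asymmetry between rolling up and drilling down can be shown with a short Python sketch. The rows, field names, and helper functions are invented for illustration. The monthly summary is derived from the atomic rows, but the reverse is impossible: investigating an odd monthly figure only works because the line-item detail was kept.

```python
from collections import defaultdict

# Hypothetical atomic fact rows: one row per sales line item.
atomic_sales = [
    {"month": "2024-01", "store": "S1", "product": "Cola", "amount": 120.0},
    {"month": "2024-01", "store": "S1", "product": "Chips", "amount": 30.0},
    {"month": "2024-01", "store": "S2", "product": "Cola", "amount": 15.0},
    {"month": "2024-02", "store": "S1", "product": "Cola", "amount": 110.0},
    {"month": "2024-02", "store": "S2", "product": "Cola", "amount": 95.0},
]


def roll_up(rows, *levels):
    """Aggregate atomic rows to the requested grain, e.g. month and store."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["amount"]
    return dict(totals)


def drill_down(rows, **filters):
    """Return the atomic rows behind a summarized cell."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


monthly_by_store = roll_up(atomic_sales, "month", "store")

# Store S2 looks unusually low in January; the detail rows explain why.
anomaly_detail = drill_down(atomic_sales, month="2024-01", store="S2")
```

Had only `monthly_by_store` been loaded, there would be no rows for `drill_down` to return.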

.. If an enterprise data model is a model of real data, then I am its biggest fan. In that case, this article probably describes a specific episode in building that very useful enterprise data model. But if the enterprise data model describes a kind of abstract, ideal data world, describing how data should be if only it were designed correctly, then I have very little patience. Idealized enterprise data models are of only marginal use when we try to take real data and deliver it to business users on a tight budget and time frame. Idealized enterprise data models aren’t populated with data. (pp 305)

.. Using GIS tools, we can clean up and effectively use the millions of addresses we already store for all our customers. (pp 385)

“As we often remark, a clever person who is a good programmer can do anything. You can make SCEs work in both the ETL back room and the BI front room if you are determined, but… we respectfully suggest that you win the Nobel prize on a different topic.”

But what do you do when your executives are clamoring for a sexy dashboard, but there’s no existing foundation that can be reasonably leveraged? Facing a similar predicament, some of you have bootstrapped the dashboard development effort. And it may have been initially perceived as a success. But then middle managers start calling because their bosses are monitoring performance via the dashboard, yet there’s no ability for them to drill into the details where the true causal factors of a performance problem are lurking. Or management starts to question the validity of the dashboard data because it doesn’t tie to other reports due to inconsistent transformation/business rules. Or the users determine they need the dashboard updated more frequently. Or your counterpart who supports another area of the business launches a separate, similar but different dashboard initiative. The bootstrapped dashboard will be seriously, potentially fatally, stressed from the consequences of bypassing the development of an appropriate infrastructure. Eventually you’ll need to pay the price and rework the initiative. (pp 613)

Performance Management in the Retail Industry

If telephone expenses are less than 1% of store expenses, they should not be reported as a KPI for the stores. Surprisingly, reports like this are often found and prove distracting to managers, keeping them from focusing on more important factors. In marketing, if only a small percentage of cost is driven by postage, and there is little action that can be taken to impact postage once a catalog delivery decision is made, then postage should not be a marketing KPI.
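The materiality test described above can be sketched in a few lines of Python. The expense figures and the `kpi_candidates` name are hypothetical; the 1% cutoff is taken from the telephone example, and a real screen would also need to consider whether managers can act on the measure, as the postage example points out.

```python
def kpi_candidates(expenses, min_share=0.01):
    """Keep only expense lines that make up at least min_share of the total."""
    total = sum(expenses.values())
    if total <= 0:
        return {}
    return {
        name: amount / total
        for name, amount in expenses.items()
        if amount / total >= min_share
    }


# Illustrative store expenses; telephone is 0.5% of the total.
store_expenses = {
    "payroll": 60_000,
    "rent": 30_000,
    "utilities": 9_500,
    "telephone": 500,
}

candidates = kpi_candidates(store_expenses)
```

Here telephone falls below the threshold and drops out, leaving payroll, rent, and utilities as candidates.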