Data Warehousing: Then & Now, and What to Do with It

Originally published in DataArt blog.

Background

Data warehousing is not a new thing today. The concept was first introduced in the 1970s and its key terms “dimension” and “fact” appeared even earlier – in the 1960s. Since then, many businesses have successfully implemented and adopted various data warehouse solutions. Though they were using a great variety of technologies, processes, and ways of thinking, their goals were alike – consolidating data from scattered operational systems, making data clean and trustworthy, extracting the information, and unlocking hidden knowledge. All this was necessary to improve business decisions, to make them knowledgeable, rather than based on blind-guesses.

Many organizations from various industries – from finance to hospitality, from healthcare to gambling – leverage the benefits provided by this several decades old concept. But technologies evolve and brings new methods of data processing, new algorithms and implementations, new features and new possibilities. The amount of data available for analysis grows dramatically. The speed of communication increases. Thus businesses face new challenges – they need to cope with a highly competitive environment which is much faster than before, they need to evaluate the situation in a much more accurate manner, they cannot wait.

In recent years, a new trend in data warehousing has emerged, many companies are looking for ways to improve their existing solutions, which currently are:

  • Hard to maintain. Some base technologies are outdated and will be not supported in a matter of months, some key persons may have already left the company and (in worst cases) some custom source code was lost;
  • Slow. Well, maybe not slow, but not fast enough. Business users complain that they spend too much time on waiting for that “key report required by regulations”;
  • Not functional enough. The business community cannot proceed with “this simple kind of analyses” because “the data warehouse is not designed for that”.

Though all these reasons sound valuable and business-justified, it is needless to say that in most cases there are many people who are afraid of any changes in their data warehouse and show significant resistance. There is no surprise here, DW is considered to be the informational heart in many businesses and (we think) most people are afraid of heart surgery. (Only cyborgs without heart, we believe, do not).

It becomes very important for IT departments to show and prove that changes to the corporate data warehouse will be not surgery, but therapy; that it will be done in a qualified and controlled manner; that all actions are planned and risks are mitigated.

Solution

Unfortunately, due to the diversity of the universe, there is no silver bullet to kill the user’s fears and guarantee success for any DW project. When you step into your second DW project, it is really easy to get into the “been there, done that” trap. In reality, even if your second DW project is for the same organization as the first one, you shortly realize – as soon as business folks get into your masterpiece, things change. And there is no other way but to go and choose the best tools, practices, and approaches once again.

On the other side, seasoned IT Pros know that even though technologies, concepts and process description tend to change at least twice a year (same story as in high-fashion business), there are some things, that will stay in their toolbox forever (like a classic collar cut on a consultant’s shirt). And here are our top three.

Project hack #1 – Get out of your office

Just as ideas for a successful startup can hardly be found in an entrepreneur’s building, business needs and requirements of a firm can rarely live in its CTO’s office. DBAs know database structure, developers know ETL, IT Pros know infrastructure. But it may turn out that all of them together have no idea about latest business trends and a CTO cannot do anything about that. On the other hand, there is fat chance that members of the business user community know what can be accomplished with recent technology breakthroughs.

This is the moment when someone should step out from their office and start building a truly cross-functional project team. A team in which knowledge sharing is encouraged throughout the entire project lifecycle. A team which will be dedicated to success. A team, which will be eager to perform the “heart surgery”.

Various books suggest to us a bunch of related “stylish” words – stakeholder management, relationship management, management of requirements and expectations… Indeed, these activities do exist. But working with people is the key to success here, and a meeting room is the best setting in this case, not your personal office.

Why this is especially important for DW projects?

We cannot state that this is important only for DW projects. Other types of projects may also benefit from this approach. (And not only in the software development industry.) But we do believe that the success of DW project is nearly impossible without strong collaboration in a cross-functional team because of fragmentation of knowledge, amount of data and information, as well as solution complexity, which is usually rather high.

Project hack #2 – One thing to find them all

Working with people often means wading through jungles of personal opinions and subjective judgements. It is very likely that even the architects do not know how those guys from the open space on another floor use the decade-old system. It could be even worse, sometimes you cannot say who exactly uses certain features. Of course, you can guess, but a “blind guess” approach is a direct road to project catastrophe. The cone of uncertainty expands, even though we do not see it yet.

This is the moment when technology comes to the fore and helps to obtain facts, so judgements can be proven. Fortunately, user activities are not player’s bids in Las Vegas, so “what happens here stays here” is not applicable. Most enterprise-level solutions allow a certain level of auditing and logging. Cognos or Microsoft SQL Server Reporting Services, IIS or Apache… It really doesn’t matter which product, platform, or solution is used. All of them have logs, all of them can give you facts. Another portion of facts is buried in the source code, so there is no reason to use opinions about algorithms or formulas. Even in cases when the source code is not available, technology arms you with various reverse engineering tools and techniques.

Why this is especially important for DW projects?

In the data warehousing industry, user interfaces cannot be called just a “tip of the iceberg”. They are a small fraction of the iceberg’s tip, so personal opinions are not just subjective. They are extremely subjective and, thus, unreliable. On one of the recently completed projects we faced a DW accompanied by more than five hundreds reports hosted on two different platforms. Only through investigating logs and reverse engineering we were able to discover all valid use cases and ensure that nothing will be left behind. Interviewing end users from business departments gave us less than one third of what was discovered.

Project hack #3 – Embrace the unembraceable

An old data warehouse is like a gem in a forest. Dozens of data sources and data consumers, hundreds of users and gigabytes, terabytes, petabytes of data… Sometimes you get the feeling that you are in an alien galaxy when you start the project. Nobody wants to destroy the gem (or turn the galaxy into trash), nobody wants a solution that will not be trustworthy and reliable. Everyone, especially the business owners, wants guarantees. Well, maybe not 100%, but at least some… In the IT world the guarantee means testing. How much will it cost to test a galaxy?

Everything has its own price, so let’s try to estimate the hypothetical case of testing 100 reports for a small project:

  • 100 reports, 2-4 ways to change data, 3-4 parameter sets. This gives us 1600 test cases.
  • Minimum 15 minutes per report (open interface, set parameter values, run report, check result).
  • 1600 test cases multiplied by 15 minutes gives 400 man-hours or more than 2 man-months for just one iteration.

100 reports is not a large scope (not small, but not large), so it is unlikely that there will be a budget for 6 man-months to proceed with a modest three iterations of testing. Saying nothing about validating each and every change introduced by the development team. This means that project will have no full coverage in terms of quality assurance, i.e. no guarantees for the business. Of course, we can try to use reasons like “any financial calculation allows errors up to 5%”, but good luck with presenting this to your CFO.

The only way to cut the knot is automation. Indeed, why should we need to perform 1600 repetitive actions manually? The solution is to identify how the changed parts of the product affect other areas, and then decide which areas are most important from both business and technical perspectives. This will give an idea of what should be tested and the technology experts will be able to decide how to do it.

Why this is especially important for DW projects?

Near-real time processing, increased amounts of data, self-service analytics, all these words are among modern DW trends. From the development perspective all of them mean the same: more and faster. More use cases and requirements, faster development (preferably in the ”Agile” manner) and delivery, and better performance. But IT budgets do not grow proportionally, actually they tend to shrink, so that the value-to-money ratio is maximized. Thus there is no other way, but to maximize the amount of work per human “resource”. It means automation.

Conclusion

In recent decades businesses have learned how to evolve at a much faster pace than ever before, and technology has followed the same path. The concept of Data Warehousing has earned the reputation of “informational heart” in many industries, so many enterprises choose to invest into further improvements. Many projects related to data warehousing do succeed, but many others fail. What is the key to success and how to avoid failure? The answer is simple, there is no simple answer. Thinking individually, thinking broad and thinking deep, thinking about both people and technology and, of course, choosing the right tools from both management and technology toolboxes, this is what drives a project’s success.

Leave a Comment