Meredith Griffith


Are You Putting the Analytical Cart Before the Data Horse?

Three best practices for prepping data for analysis

Published: Tuesday, March 22, 2016 - 17:43

Sponsored Content

Most of us have heard of a backward way of completing a task, or doing something in the conventionally wrong order, described as “putting the cart before the horse.” That’s because a horse pulling a cart is much more efficient than a horse pushing a cart. This saying may be especially true in the world of statistics.

Focusing on a statistical tool or analysis before checking out the condition of your data is one way you may be putting the cart before the horse. You might find yourself trying to force your data to fit an analysis, particularly when the data have not been set up properly. It’s far more efficient to first make sure your data are reliable and then allow your questions of interest to guide you to the right analysis.

Spending a little quality time with your data up front can save you from wasting a lot of time on an analysis that either can’t work or can’t be trusted.

As a quality practitioner, you’re likely to be involved in many activities: establishing quality requirements for external suppliers, monitoring product quality, reviewing product specifications and ensuring they are met, improving process efficiency, and many more.

All of these tasks will involve data collection and statistical analysis with software such as Minitab. For example, suppose you need to perform a Gage R&R study to verify that your measurement systems are valid, or you need to understand how machine failures affect downtime.

Rather than jumping right into the analysis, you will be at an advantage if you take time to look at your data. Ask yourself questions such as:
• What problem am I trying to solve?
• Are my data set up in a way that will be useful for answering my question?
• Did I make any mistakes while recording my data?

Using process knowledge can also help you answer questions about your data and identify data entry errors. Preparing and exploring your data before an analysis will not only save you time in the long run but also help you obtain reliable results.

Here are three tips for preparing, exploring, and drawing insights from your data to keep you from putting the analytical cart before the data horse.

Clean your data before you analyze

Cleaning your data before you begin an analysis can save time by preventing rework, such as reformatting data or correcting data entry errors, after you’ve already begun the analysis. Data cleaning is also essential to ensure that your analyses and results—and the decisions you make—are reliable. 

Be sure to identify and correct case mismatches, fix improperly formatted columns, represent missing data accurately and in a manner that is recognized by the software, and remove blank rows and extra spaces.

Figure 1: Minitab offers a data-import dialog that helps you quickly clean your data before importing into the software, ensuring your data are trustworthy and allowing you to get to your analysis sooner.
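
For readers who prepare data outside Minitab, the cleanup steps listed above can be sketched in a few lines of Python. This is only an illustration, not a description of Minitab's import dialog. The sample rows, the column names, and the list of missing-value tokens are all assumptions made for the example.

```python
# Hypothetical raw records, such as rows read from a spreadsheet export.
raw_rows = [
    {"machine": " Press A ", "reason": "jam", "downtime_min": "12.5"},
    {"machine": "press a", "reason": "Jam ", "downtime_min": "N/A"},
    {"machine": "", "reason": "", "downtime_min": ""},
    {"machine": "Press B", "reason": "JAM", "downtime_min": " 7 "},
]

# Assumed spellings of "missing"; adjust to match your own data source.
MISSING_TOKENS = {"", "na", "n/a", "null", "-"}


def clean_text(value):
    """Trim and collapse extra spaces, and normalize case."""
    return " ".join(value.split()).lower()


def parse_number(value):
    """Return a float, or None when the cell holds a missing-value token."""
    text = value.strip()
    if text.lower() in MISSING_TOKENS:
        return None
    return float(text)


def clean_rows(rows):
    """Drop blank rows and return cleaned copies of the remaining rows."""
    cleaned = []
    for row in rows:
        if all(not cell.strip() for cell in row.values()):
            continue  # skip rows in which every cell is empty
        cleaned.append({
            "machine": clean_text(row["machine"]),
            "reason": clean_text(row["reason"]),
            "downtime_min": parse_number(row["downtime_min"]),
        })
    return cleaned


cleaned = clean_rows(raw_rows)
for row in cleaned:
    print(row)
```

Representing a missing value as None, rather than as zero or leftover text, keeps it from being silently averaged into later calculations.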

Use formatting and highlighting tools to explore and visualize data

Use conditional formatting to identify frequently occurring values or points that are out of spec or out of control. For example, use formatting rules to identify values that are not within spec and might indicate either a data entry error or valid cause for investigation.

Figure 2: With a simple right-click directly in the Minitab worksheet, you can identify out-of-spec values you may wish to investigate before you begin your analysis.
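
The same idea can be expressed as a simple rule in code. The sketch below uses Python, with hypothetical spec limits and made-up measurements, to list values that fall outside the limits so they can be reviewed. It does not reproduce Minitab's conditional formatting feature.

```python
# Hypothetical specification limits and measurements.
LOWER_SPEC = 9.5
UPPER_SPEC = 10.5
measurements = [10.1, 9.8, 10.4, 101.0, 9.2, 10.0]


def flag_out_of_spec(values, lower, upper):
    """Return (index, value) pairs for values outside [lower, upper]."""
    return [(i, v) for i, v in enumerate(values) if not lower <= v <= upper]


flagged = flag_out_of_spec(measurements, LOWER_SPEC, UPPER_SPEC)
for index, value in flagged:
    print(f"row {index}: {value} is outside [{LOWER_SPEC}, {UPPER_SPEC}]")
```

In this made-up data, 101.0 looks like a misplaced decimal point, while 9.2 could be a genuine out-of-spec part. The rule only flags both values; deciding which is which still takes process knowledge.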

Highlight individual cells or rows, and add cell comments to draw attention to data that need further investigation, such as out-of-control points, unusual observations, or other data of interest. Rather than removing questionable data right away, take note of the data, perhaps by commenting on the cell as a reminder to follow up. Doing this can prevent you from committing the statistically unsound practice of cherry-picking data.

Figure 3: In the Minitab worksheet, you can highlight an entire row to easily visualize all variables associated with particular data, or add a cell comment to an out-of-control point for future reference.
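
Outside a worksheet, the "annotate, don't delete" habit might look like the following Python sketch. The records, the `note` field, and the `add_note` helper are hypothetical and serve only to show the principle that questionable rows stay in the data set.

```python
# Hypothetical observations; row 2 looks unusual.
observations = [
    {"id": 1, "value": 10.1},
    {"id": 2, "value": 14.7},
    {"id": 3, "value": 9.9},
]


def add_note(rows, row_id, note):
    """Return copies of the rows, with a note attached to the matching row."""
    return [
        {**row, "note": note} if row["id"] == row_id else dict(row)
        for row in rows
    ]


annotated = add_note(observations, 2, "Possible out-of-control point; check with operator")

# Nothing is removed: the flagged row stays in the data for follow-up.
needs_follow_up = [row for row in annotated if "note" in row]
print(len(annotated), needs_follow_up)
```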

Use subsets to uncover insights prior to your analysis

Creating subsets from formatted columns helps you focus only on the data relevant to answering your questions. For example, suppose you want to understand why machines are experiencing downtime so you can address productivity problems. Use conditional formatting to identify the most frequent reason for a machine’s downtime, then subset your data on that reason to see how it relates to other variables.

Figure 4: It’s easy to subset your data in Minitab by right-clicking directly within the worksheet.
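
As a rough illustration of the downtime example, the Python sketch below finds the most frequent downtime reason in a made-up log, subsets the log to that reason, and then tallies the subset by another variable. The log entries and field names are assumptions, and this is not how Minitab implements subsetting.

```python
from collections import Counter

# Hypothetical downtime log.
downtime_log = [
    {"machine": "A", "reason": "jam", "shift": "day", "minutes": 12},
    {"machine": "B", "reason": "changeover", "shift": "night", "minutes": 30},
    {"machine": "A", "reason": "jam", "shift": "night", "minutes": 20},
    {"machine": "C", "reason": "jam", "shift": "night", "minutes": 15},
    {"machine": "B", "reason": "sensor fault", "shift": "day", "minutes": 8},
]

# Step 1: find the most frequent downtime reason.
reason_counts = Counter(row["reason"] for row in downtime_log)
top_reason, top_count = reason_counts.most_common(1)[0]

# Step 2: subset the log to rows with that reason.
subset = [row for row in downtime_log if row["reason"] == top_reason]

# Step 3: look at how the subset relates to another variable, here shift.
by_shift = Counter(row["shift"] for row in subset)

print(top_reason, top_count, dict(by_shift))
```

A tally like this is exploratory only. It suggests where to look next and does not by itself establish that one variable causes the downtime.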

Taking the time to clean and explore your data before you begin an analysis is worth the investment. Doing so will help you answer key questions about your process, lead to a more efficient analysis, and yield results you can trust.

If it’s been a while since your last round of data collection and analysis, check out the free trial of Minitab 17 Statistical Software. New features include improved workflows for cleaning and preparing your data for analysis, as well as tools for exploring and manipulating your data directly in the worksheet.


About The Author


Meredith Griffith

Meredith Griffith is an associate product marketing manager at Minitab. She has a bachelor’s degree in statistics from Virginia Polytechnic Institute. Griffith’s professional experience is in statistical data analysis, software testing, and software development.