Meredith Griffith


Are You Putting the Analytical Cart Before the Data Horse?

Three best practices for prepping data for analysis

Published: Tuesday, March 22, 2016 - 17:43

Sponsored Content

Most of us have heard a backward way of completing a task, or doing something in the wrong order, described as “putting the cart before the horse.” That’s because a horse pulling a cart is much more efficient than a horse pushing one. This saying may be especially true in the world of statistics.

Focusing on a statistical tool or analysis before checking out the condition of your data is one way you may be putting the cart before the horse. You might find yourself trying to force your data to fit an analysis, particularly when the data have not been set up properly. It’s far more efficient to first make sure your data are reliable and then allow your questions of interest to guide you to the right analysis.

Spending a little quality time with your data up front can save you from wasting a lot of time on an analysis that either can’t work or can’t be trusted.

As a quality practitioner, you’re likely to be involved in many activities: establishing quality requirements for external suppliers, monitoring product quality, reviewing product specifications and ensuring they are met, improving process efficiency, and many more.

All of these tasks will involve data collection and statistical analysis with software such as Minitab. For example, suppose you need to perform a Gage R&R study to verify that your measurement systems are valid, or you need to understand how machine failures affect downtime.

Rather than jumping right into the analysis, you will be at an advantage if you take time to look at your data. Ask yourself questions such as:
• What problem am I trying to solve?
• Are my data set up in a way that will be useful for answering my question?
• Did I make any mistakes while recording my data?

Using process knowledge can also help you answer questions about your data and identify data entry errors. Preparing and exploring your data before an analysis will not only save you time in the long run but also help you obtain reliable results.

Here are three tips for preparing, exploring, and drawing insights from your data to keep you from putting the analytical cart before the data horse.

Clean your data before you analyze

Cleaning your data before you begin an analysis can save time by preventing rework, such as reformatting data or correcting data entry errors, after you’ve already begun the analysis. Data cleaning is also essential to ensure that your analyses and results—and the decisions you make—are reliable. 

Be sure to identify and correct case mismatches, fix improperly formatted columns, represent missing data accurately and in a manner that is recognized by the software, and remove blank rows and extra spaces.

Figure 1: Minitab offers a data-import dialog that helps you quickly clean your data before importing into the software, ensuring your data are trustworthy and allowing you to get to your analysis sooner.
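For readers who prepare data outside Minitab, the cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's import logic: the function name `clean_rows`, the column names, and the list of missing-value markers are all assumptions made for the example.

```python
# Hypothetical missing-value markers; adjust to whatever your data source uses.
MISSING_MARKERS = {"", "na", "n/a", "null", "-"}


def clean_rows(rows, text_columns=(), numeric_columns=()):
    """Return cleaned copies of rows (a list of dicts holding raw strings)."""
    cleaned = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            # Remove extra leading and trailing spaces.
            text = "" if value is None else str(value).strip()
            if text.lower() in MISSING_MARKERS:
                # Represent every kind of missing entry the same way.
                new_row[key] = None
            elif key in numeric_columns:
                # Fix numbers stored as text; raises ValueError on bad entries.
                new_row[key] = float(text)
            elif key in text_columns:
                # Resolve case mismatches such as "press a" vs. "PRESS A".
                new_row[key] = text.title()
            else:
                new_row[key] = text
        # Drop blank rows (rows where every field is missing).
        if any(v is not None for v in new_row.values()):
            cleaned.append(new_row)
    return cleaned


# Illustrative data only.
raw = [
    {"machine": " press a ", "downtime_min": "12.5"},
    {"machine": "PRESS A", "downtime_min": "N/A"},
    {"machine": "", "downtime_min": ""},
]
cleaned = clean_rows(raw, text_columns={"machine"}, numeric_columns={"downtime_min"})
```

Letting the numeric conversion raise an error on unparseable text is a deliberate choice in this sketch: a value such as "12,5" or "abc" is more likely a data entry error worth reviewing than something to coerce silently.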

Use formatting and highlighting tools to explore and visualize data

Use conditional formatting to identify frequently occurring values or points that are out of spec or out of control. For example, use formatting rules to identify values that are not within spec and might indicate either a data entry error or valid cause for investigation.

Figure 2: With a simple right-click directly in the Minitab worksheet, you can identify out-of-spec values you may wish to investigate before you begin your analysis.

Highlight individual cells or rows, and add cell comments to draw attention to data that need further investigation, such as out-of-control points, unusual observations, or other data of interest. Rather than removing questionable data right away, take note of the data, perhaps by commenting on the cell as a reminder to follow up. Doing this can prevent you from committing the statistically unsound practice of cherry-picking data.

Figure 3: In the Minitab worksheet, you can highlight an entire row to easily visualize all variables associated with particular data, or add a cell comment to an out-of-control point for future reference.
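The same idea, flagging questionable values and annotating them instead of deleting them, can be sketched in Python. The function name `flag_out_of_spec`, the spec limits, and the measurements below are hypothetical and stand in for the worksheet formatting rules and cell comments described above.

```python
def flag_out_of_spec(values, lower, upper):
    """Return (index, value, note) for each value outside [lower, upper].

    The input list is left untouched, so nothing is removed from the data.
    """
    flags = []
    for i, v in enumerate(values):
        if v is None:
            continue  # missing values are a separate cleaning concern
        if v < lower or v > upper:
            note = "out of spec: check for entry error or valid cause"
            flags.append((i, v, note))
    return flags


# Illustrative measurements with hypothetical spec limits of 9.90 to 10.10.
diameters = [10.02, 9.98, 100.1, 10.05, None, 9.70]
flags = flag_out_of_spec(diameters, lower=9.90, upper=10.10)

for index, value, note in flags:
    print(f"row {index}: {value} ({note})")
```

Here the value 100.1 looks like a misplaced decimal point, while 9.70 may be a genuine out-of-spec part. Keeping both in the data with a note attached leaves that judgment to someone with process knowledge.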

Use subsets to uncover insights prior to your analysis

Creating subsets from formatted columns helps you focus only on the data relevant to answering your questions. For example, suppose you want to understand why machines are experiencing downtimes so you can address productivity problems. Use conditional formatting to identify the most frequent reason for a machine’s downtime, then subset your data on that reason to see how it relates to other variables.

Figure 4: It’s easy to subset your data in Minitab by right-clicking directly within the worksheet.
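As a rough Python equivalent of the downtime example, the sketch below finds the most frequent downtime reason, subsets the records on it, and totals the downtime by machine within that subset. The record layout, column names, and helper functions are assumptions made for illustration.

```python
from collections import Counter


def most_frequent(records, column):
    """Return (value, count) for the most common non-missing value in column."""
    counts = Counter(r[column] for r in records if r[column] is not None)
    return counts.most_common(1)[0]


def subset(records, column, value):
    """Return only the records whose column equals value."""
    return [r for r in records if r[column] == value]


# Illustrative downtime log.
downtime = [
    {"machine": "A", "reason": "Jam", "minutes": 12},
    {"machine": "B", "reason": "Changeover", "minutes": 30},
    {"machine": "A", "reason": "Jam", "minutes": 8},
    {"machine": "C", "reason": "Jam", "minutes": 15},
    {"machine": "B", "reason": "Maintenance", "minutes": 45},
]

top_reason, count = most_frequent(downtime, "reason")
jam_rows = subset(downtime, "reason", top_reason)

# Relate the top reason to another variable: total minutes lost per machine.
minutes_by_machine = Counter()
for row in jam_rows:
    minutes_by_machine[row["machine"]] += row["minutes"]
```

In this made-up log, the most frequent reason is not the one with the longest single stoppage, which is the kind of distinction worth noticing before choosing an analysis.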

Taking the time to clean and explore your data before you begin an analysis is worth the investment. Doing so will help you answer key questions about your process, lead to a more efficient analysis, and yield results you can trust.

If it’s been a while since your last round of data collection and analysis, check out the free trial of Minitab 17 Statistical Software. New features include improved workflows for cleaning and preparing your data for analysis, as well as tools for exploring and manipulating your data directly in the worksheet.


About The Author


Meredith Griffith

Meredith Griffith is an associate product marketing manager at Minitab. She has a bachelor’s degree in statistics from Virginia Polytechnic Institute. Griffith’s professional experience is in statistical data analysis, software testing, and software development.