Stack the deck in your favor! Why understanding datasets could radically speed up your project delivery
Market research professionals are busy people – there is constant pressure to gather ever more accurate, insightful data to support business decisions and stay relevant in a competitive marketplace. This pressure to reduce the time between question and action, while gathering information from the largest, most representative survey group possible, leaves little time to consider how the data is gathered, stored, and shared. Neglecting that structure could be an extremely expensive mistake. In this article we look at the structure of data itself – and how a proper understanding of it can radically speed up the production of actionable reports. This article will probably take you less than five minutes to read… it could save you a lot more time than that.
To stack or not to stack? Why should I care?
Surveys are becoming ever larger in their scope, with massive amounts of data generated. The more ambitious the research, the bigger the data set, with significant implications for storage and analysis. Typically, brand and campaign studies have used ‘looped’ questionnaires, where the respondents answer the same or similar questions for a number of brands or campaigns. This type of questionnaire usually generates a large number of variables. For example, in a global brand study of 200 brands with a loop of 50 questions for each, there will be 10,000 columns/variables (200 × 50) in the data set.
Looped questionnaires have traditionally generated a flat, ‘unstacked’ data format, where there’s a column/variable per brand/campaign and ‘loop’ question. This is a great format for many types of market research, including customer experience and employee experience studies.
It does, however, present certain problems. The time it takes to produce research reports is strongly correlated with the number of columns/variables in the data set, and a higher column count also increases the risk of errors. Since large brand studies often have a high number of respondents as well, the data sets become huge, which can make reporting even more difficult.
By making the data response-driven (rather than respondent-driven) and stacking it, we can significantly reduce the column/variable count. In a stacked file each row holds one respondent’s answers for one brand, so the loop questions are stored once rather than once per brand. This has a number of advantages: fewer variables reduce the computational workload, and maintenance hours are reduced – new brands can be added to the questionnaire with minimal structural changes, cutting back on unnecessary costs.
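To make the idea concrete, here is a minimal sketch of stacking in plain Python. The column naming scheme (`brand_question`), the field names, and the `stack` helper are all assumptions made for illustration; real survey exports will differ, and this is not a description of any particular product's implementation.

```python
# Hypothetical unstacked ("wide") data: one row per respondent,
# one column per brand/question pair.
wide_rows = [
    {"respondent_id": 1, "brandA_awareness": 5, "brandA_liking": 4,
     "brandB_awareness": 3, "brandB_liking": 2},
    {"respondent_id": 2, "brandA_awareness": 2, "brandA_liking": 1,
     "brandB_awareness": 4, "brandB_liking": 5},
]


def stack(rows, id_field="respondent_id"):
    """Return one row per respondent and brand instead of one per respondent.

    Assumes every non-ID column is named "<brand>_<question>".
    """
    stacked = []
    for row in rows:
        per_brand = {}
        for column, value in row.items():
            if column == id_field:
                continue
            brand, question = column.split("_", 1)
            per_brand.setdefault(brand, {})[question] = value
        for brand, answers in per_brand.items():
            stacked.append({id_field: row[id_field], "brand": brand, **answers})
    return stacked


stacked_rows = stack(wide_rows)

# Column counts before and after stacking (ID column included).
wide_columns = len(wide_rows[0])        # ID + brands x questions
stacked_columns = len(stacked_rows[0])  # ID + brand + questions
```

In this toy case the saving is small (five columns become four, while two rows become four), but the same reshaping applied to 200 brands and 50 loop questions would leave roughly 50 question columns plus an ID and a brand column, at the cost of more rows.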
The benefits stack up
Stacked data is much easier to apply filters to – a huge advantage when it comes to generating meaningful reports. For surveys such as brand trackers, where the information needs to be divided along ‘response’ lines (i.e. where multiple brands are compared on similar attributes), a stacked format allows software solutions to apply filters that quickly generate actionable feedback for stakeholders.
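As a rough illustration of why filtering is simpler on stacked data, the sketch below compares brands on one attribute with a single filter on a `brand` field. The sample rows, field names, and the `mean_score` helper are hypothetical.

```python
# Hypothetical stacked data: one row per respondent and brand.
stacked_rows = [
    {"respondent_id": 1, "brand": "brandA", "awareness": 5, "liking": 4},
    {"respondent_id": 1, "brand": "brandB", "awareness": 3, "liking": 2},
    {"respondent_id": 2, "brand": "brandA", "awareness": 2, "liking": 1},
    {"respondent_id": 2, "brand": "brandB", "awareness": 4, "liking": 5},
]


def mean_score(rows, brand, question):
    """Average one question's score over all rows for a single brand."""
    scores = [row[question] for row in rows if row["brand"] == brand]
    return sum(scores) / len(scores) if scores else None


# The same filter works for every brand; no per-brand column names needed.
brands = sorted({row["brand"] for row in stacked_rows})
liking_by_brand = {brand: mean_score(stacked_rows, brand, "liking") for brand in brands}
```

Adding a new brand to the study only adds rows, so a comparison like this keeps working without any change to the code.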
The only downside of stacking the data in this way is that a respondent (identified by a unique ID number) can appear multiple times – unhelpful if we’re looking at demographic data focused upon the type of respondent, rather than their response. The solution? An understanding of where to use each structure and – ideally – a software solution that can draw information from both, depending on the type of report needed.
Because each respondent has a unique ID number, filters can be applied based upon this identifier, correlating information from both stacked and unstacked data sets, depending on the needs of the user. Want to narrow data down by age, gender, or income? Draw the data from the unstacked file. Want to know what this group thinks about a given brand? Cross-reference their IDs with the stacked data, giving a precise, actionable insight into a given group.
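The cross-referencing described above might look something like the following sketch: select respondents from the unstacked file by a demographic filter, then use their IDs to pull the matching rows from the stacked file. The data, field names, and the `responses_for_group` helper are assumptions for illustration only.

```python
# Hypothetical unstacked file: one row per respondent, demographics only.
respondents = [
    {"respondent_id": 1, "age": 34, "gender": "F"},
    {"respondent_id": 2, "age": 52, "gender": "M"},
    {"respondent_id": 3, "age": 29, "gender": "F"},
]

# Hypothetical stacked file: one row per respondent and brand.
stacked_rows = [
    {"respondent_id": 1, "brand": "brandA", "liking": 4},
    {"respondent_id": 1, "brand": "brandB", "liking": 2},
    {"respondent_id": 2, "brand": "brandA", "liking": 1},
    {"respondent_id": 2, "brand": "brandB", "liking": 5},
    {"respondent_id": 3, "brand": "brandA", "liking": 5},
    {"respondent_id": 3, "brand": "brandB", "liking": 3},
]


def responses_for_group(respondents, stacked_rows, in_group, brand):
    """Return the stacked rows for one brand, limited to a demographic group."""
    # Step 1: the demographic filter runs on the unstacked file.
    group_ids = {r["respondent_id"] for r in respondents if in_group(r)}
    # Step 2: the shared ID links that group to its stacked responses.
    return [
        row for row in stacked_rows
        if row["respondent_id"] in group_ids and row["brand"] == brand
    ]


under_40_brand_a = responses_for_group(
    respondents, stacked_rows, lambda r: r["age"] < 40, "brandA"
)
```

Because the demographic filter runs on the unstacked file, each respondent is counted once there, while the brand-level detail still comes from the stacked file.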
Readers of previous articles will know that we focus heavily upon the bottom line: how can a technological solution best add value to a business, either through efficiency/cost savings or by giving a competitive edge in a crowded marketplace? By understanding the implications of stacked versus unstacked data, we can achieve both these objectives.
The bottom line
We need to acknowledge the disconnect between those gathering and processing data and those reliant upon it for business decisions. The first group is primarily concerned with the scope, accuracy and scale of the data collected – the second is looking for data to be presented in an easily consumable yet infinitely flexible format, often for onward dissemination through the stakeholder chain.
What is needed is the ability to bring these groups (and their disparate interests) closer together, aligning process with result. In a perfect world, the gap between respondent and end-user would be as small as possible, with no latency between a question being asked and a perfectly informed business decision being taken.
Understanding the way that data is stored (and the implications) should not, therefore, just be of interest to technical departments, but should form part of the essential professional toolkit of everyone involved in market research. A basic comprehension of the speed and efficiency implications of incorrectly presented data can be the difference between quick, actionable reports and an obsolete pile of numbers.
A driver of a car may not need to know how the internal combustion engine works – but they do need to know the role it plays in reaching their destination on time…
Curious about how this works in the real world? Find out how Dapresy helped one customer transform their project delivery speed…
Want to know how you can use Dapresy Pro to transform your project delivery speed? Take a more detailed look at our new features.