File Compatibility

KnowledgeHound natively supports respondent-by-respondent data in SPSS’s .SAV format.

File Types and Compatible Tools

We guarantee compatibility with SAV+MDD file pairs produced by exporting data from SPSS Data Collection Survey Reporter (see “What is an SAV file?  An MDD?” below).  Not all organizations have data in this format or the tools to produce it efficiently.  In lieu of files produced by SPSS Data Collection Survey Reporter, we can accept:

  1. An SAV+MDD file pair produced by another tool

  2. Just the SAV file produced by another tool

  3. An SPSS/Quanvert PKD file for each study

  4. Specially formatted CSV files

  5. Other file formats (on an experimental basis)

We strive for compatibility with a range of files and tools, but cannot guarantee compatibility with each of the hundreds of combinations of tools and file formats.  Some tools don’t gracefully export to competing file formats and some export processes truncate or mangle important data elements.

We would be happy to evaluate your files and, if we can’t import them into KnowledgeHound, recommend a process you can use to create compatible files or discuss additional data curation services.

File Size

We support user uploads of files up to 50MB (up to 100MB on an experimental basis).  If your data files are larger than 100MB, you can break them into several smaller files or exclude unnecessary variables.
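If you want to check files before uploading, the sketch below classifies a file size against the limits above.  It is a minimal Python illustration, not part of KnowledgeHound.  It assumes “MB” means 1,000,000 bytes, so adjust `BYTES_PER_MB` if your upload tool counts 1,048,576-byte megabytes.

```python
import os

# Limits taken from the text above; BYTES_PER_MB is an assumption (decimal megabytes).
SUPPORTED_LIMIT_MB = 50
EXPERIMENTAL_LIMIT_MB = 100
BYTES_PER_MB = 1_000_000


def upload_plan(size_bytes: int) -> str:
    """Classify a file size against the upload limits."""
    size_mb = size_bytes / BYTES_PER_MB
    if size_mb <= SUPPORTED_LIMIT_MB:
        return "supported"
    if size_mb <= EXPERIMENTAL_LIMIT_MB:
        return "experimental"
    return "split or trim"


def upload_plan_for_path(path: str) -> str:
    """Convenience wrapper that reads the size of a file on disk."""
    return upload_plan(os.path.getsize(path))
```

For example, `upload_plan(80_000_000)` returns `"experimental"`, meaning the file falls in the 50MB to 100MB range.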

Excluding unnecessary variables

You don’t need to upload data your users won’t see.  For example, you might have collected “system variables” that are not of interest to analysts.  Since this data will be unavailable to users, you can eliminate it from your files before upload to shrink file size.  To do so, simply export only the desired variables from your current file.

Breaking Files into Several Smaller Files

If all data in a file is important and files are still too large, we recommend breaking a file into smaller files by exporting subsets of variables into each of the smaller files.

Tip: When exporting variable subsets into many files, make sure each and every file contains the important demographic and segmentation variables your users will want to compare results across.
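Planning the split can be sketched in a few lines of Python.  This is a hypothetical helper, not a KnowledgeHound or SPSS feature.  It takes your variable names, the key variables you want repeated in every file (the names below are made up for illustration), and a maximum number of variables per file, which is only a rough stand-in for file size.

```python
from typing import List


def split_variables(all_vars: List[str], key_vars: List[str], max_per_file: int) -> List[List[str]]:
    """Partition variables into export subsets, repeating key_vars in every subset.

    max_per_file counts every variable in a subset, including the key variables.
    """
    keys = set(key_vars)
    others = [v for v in all_vars if v not in keys]
    room = max_per_file - len(key_vars)
    if room < 1:
        raise ValueError("max_per_file must leave room beyond the key variables")
    # Each subset gets its own copy of the key variables followed by a slice of the rest.
    subsets = [list(key_vars) + others[i:i + room] for i in range(0, len(others), room)]
    return subsets or [list(key_vars)]


# Hypothetical variable names: respondent ID, demographics, then survey questions.
plan = split_variables(
    ["resp_id", "age_group", "region", "q1", "q2", "q3", "q4", "q5"],
    key_vars=["resp_id", "age_group", "region"],
    max_per_file=5,
)
for i, subset in enumerate(plan, start=1):
    print(f"file {i}: {subset}")
```

Each resulting list is one set of variables to export, and every set begins with the same key variables.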

Why SAV Files, not Tables?

Because respondent-level data gives your users the most flexibility in determining their own cuts, and because highly structured files like SPSS’s SAV are more reliable than exported tables, we prefer respondent-level SPSS files.

What is an SAV file?  An MDD?

SAV and MDD are two of the file types you can create with SPSS.  You can also create them with other programs, though in most cases, other programs only produce SAVs.

An SAV is a basic data file.  It contains the response of every respondent to every question (if they answered, of course).  It also contains some of the study’s metadata, such as the prompt of each question and the possible responses to categorical questions.

MDDs contain extra metadata about the survey’s structure (MDD stands for MetaData Document), but they don’t contain any response data at all.  So an MDD complements the SAV; it doesn’t replace it.  Having an MDD is helpful for two reasons.  First, KnowledgeHound is built around SAV+MDD pairs: a stand-alone SAV may import, but compatibility is only guaranteed for the pair.  Second, your MDD might have something the SAV doesn’t.  For example, SAVs can only store question prompts up to 120 characters, but MDDs can handle very long prompts.

How to get an MDD if you’ve only got an SAV

Using SPSS (we recommend SPSS Data Collection Survey Reporter), open the SAV file.  Then select File > Export > Data.

Then, under the “General” tab, click Browse to choose where to save the SAV+MDD pair you will export.  Do not export these new files into the same directory where you keep your original SAV.  When you’ve chosen your new file name, select “Save”.  Confusingly, this does not yet save the new files.

Then, under the “Variables” tab, make sure “All Variables” is selected.  Finally, click “Export” at the bottom right.  In the folder you selected, you’ll see 4 new files, including new_file_name.sav and new_file_name.mdd.  Provide these two files to your KnowledgeHound client manager, and you’re all set.

Data Quality

Data quality is a big topic, much too big to cover completely in a few pages.  You and your research supplier likely pay attention to many critical topics like appropriate sampling, bias, instrument design, ‘speeders’, and more.  

Here, we’ll discuss some additional facets of data quality you may not be familiar with, especially if your data has historically been accessed and analyzed mostly by specialists.  Paying close attention to these facets will help you get the most out of your data.

Native Variable Types

Supported Variable Types

KnowledgeHound supports categorical variables (Single Response and Multiple Response) and numerical variables.

For categorical variables, each response label should be the full text of the response option shown to respondents.  Response labels should not restate the question prompt.

Good example:  

Q:  Where do you shop most often?

○ Grocery Stores

○ Club Stores

○ Convenience Stores

Bad example 1:  

Q:  Where do you shop most often?

○ 1

○ 2

○ 3

Bad example 2:  

Q:  Where do you shop most often?

○ Where do you shop most often - Grocery Stores

○ Where do you shop most often - Club Stores

○ Where do you shop most often - Convenience Stores

Even if they are properly formatted, responses that were never shown to respondents should not be included in the MDD file, since KnowledgeHound users will not be able to tell responses that respondents never saw apart from responses they simply did not select.
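A quick automated pass can catch the two problems in the bad examples above.  The sketch below is a hypothetical Python helper, assuming you have already pulled each question’s prompt and its code-to-label mapping out of your file with whatever tool you use.  It flags labels that are bare numbers or that begin by repeating the prompt.  It cannot tell whether a response was actually shown to respondents, so that check stays manual.

```python
import re
from typing import Dict, List


def check_response_labels(prompt: str, labels: Dict[int, str]) -> List[str]:
    """Return warnings for response labels that are bare numbers or restate the prompt."""
    warnings = []
    stem = prompt.rstrip("?").strip().lower()  # prompt text without its question mark
    for code, label in labels.items():
        text = label.strip()
        if re.fullmatch(r"\d+", text):
            warnings.append(f"{code}: label is only a number ({text!r})")
        elif stem and text.lower().startswith(stem):
            warnings.append(f"{code}: label restates the prompt ({text!r})")
    return warnings


warnings = check_response_labels(
    "Where do you shop most often?",
    {1: "Where do you shop most often - Grocery Stores", 2: "Club Stores", 3: "3"},
)
for w in warnings:
    print(w)
```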

Variable Type Conventions

KnowledgeHound works best when SPSS files use the proper variable type for the questions they describe.  Almost all suppliers do this by default.  In the rare cases that improper variable types are used, incompatible tools are usually to blame.

When properly formatted, a data file should have exactly one variable for every question asked of respondents.  Each variable should match the type of question given to respondents.  For example, if respondents are allowed to select more than one possible answer, the file should use the SPSS variable type “Multiple Response”.

Using the right variable type:  

Variable 1:  What colors do you like (check all that apply)?

▢ Red

▢ Yellow

▢ Blue

Using the wrong variable type:  

Variable 1:  What colors do you like (check all that apply) - Red

○ True

○ False

Variable 2:  What colors do you like (check all that apply) - Yellow

○ True

○ False

Variable 3:  What colors do you like (check all that apply) - Blue

○ True

○ False
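If you suspect a file uses the layout above, a heuristic check like the following can list candidate groups for review.  It is a sketch under stated assumptions: each variable is represented in memory as a description plus its value labels, the option follows the shared prompt after “ - ” as in the example, and the values are labeled True/False.  Tools that use other separators or labels such as Yes/No would need those added.  It only flags groups; fixing the variable type is still best done at the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (description, {code: value label}) for one SPSS variable; a hypothetical in-memory form.
Variable = Tuple[str, Dict[int, str]]

DICHOTOMOUS_LABELS = {"true", "false"}


def find_split_multi_response(variables: Dict[str, Variable], sep: str = " - ") -> Dict[str, List[str]]:
    """Group True/False variables whose descriptions share a prompt before `sep`.

    Returns {shared prompt: [variable names]} for groups of two or more variables.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, (description, value_labels) in variables.items():
        labels = {label.strip().lower() for label in value_labels.values()}
        if sep in description and labels == DICHOTOMOUS_LABELS:
            prompt, _option = description.rsplit(sep, 1)
            groups[prompt.strip()].append(name)
    return {prompt: names for prompt, names in groups.items() if len(names) > 1}


tf = {1: "True", 0: "False"}
file_vars = {
    "V1": ("What colors do you like (check all that apply) - Red", tf),
    "V2": ("What colors do you like (check all that apply) - Yellow", tf),
    "V3": ("What colors do you like (check all that apply) - Blue", tf),
    "V4": ("What is your age?", {1: "18-34", 2: "35+"}),
}
print(find_split_multi_response(file_vars))
```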

There will be cases where it makes sense to include two variables for a single question.  For example, if you asked respondents to report their age as a number, your supplier might have helpfully delivered a file with two age variables, one numeric (the raw responses) and one categorical variable describing respondents’ age brackets.  KnowledgeHound gladly accommodates these convenience variables.

If you’re unsure whether or not your variables are properly formatted, we’d be happy to examine a sample file and its related questionnaire and let you know. 


Skip Patterns

Some surveys use skip patterns, which ask particular questions only of subsets of respondents.  When you use a skip pattern, be sure that your data differentiates between respondents who (a) never saw the question, (b) declined to answer the question, or (c) simply didn’t select an answer because none of the responses applied to them.

These differences are integral to drawing inferences from your data, as the sample question below shows, so be sure that responses are coded appropriately.

Sample Question:  

Q:  When was the last time you shopped at a Country Store?

○ In the past 7 days

○ In the past 30 days

○ In the past 365 days

For example, if a respondent saw this question but did not answer it, they were probably indicating they had not shopped at a country store in the last 365 days.

Tip: If applicable, provide a “None of the above” option on categorical questions and require a response to all questions presented to respondents.
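One way to keep the three cases apart is to store a distinct code for each.  The sketch below is only an illustration: the negative codes and the 1 to 3 answer codes for the Country Store question are hypothetical (choose values that don’t collide with your real answer codes), and it assumes your survey platform can tell you whether each respondent was shown the question and whether they explicitly declined.

```python
from enum import Enum
from typing import Optional


class ResponseStatus(Enum):
    """Hypothetical codes for missing answers; pick values unused by real responses."""
    NOT_ASKED = -1   # skip logic never showed the question
    DECLINED = -2    # respondent saw the question and declined to answer
    NONE_APPLY = -3  # respondent saw the question but no listed response applied


def code_response(was_asked: bool, answer: Optional[int], declined: bool = False) -> int:
    """Return the code to store for one respondent's answer to one question."""
    if not was_asked:
        return ResponseStatus.NOT_ASKED.value
    if declined:
        return ResponseStatus.DECLINED.value
    if answer is None:
        # Assumes a shown, unanswered question means nothing applied, as in the example.
        return ResponseStatus.NONE_APPLY.value
    return answer


# Country Store question (hypothetical codes): 1 = past 7 days, 2 = past 30, 3 = past 365.
print(code_response(was_asked=False, answer=None))  # skipped by skip logic
print(code_response(was_asked=True, answer=None))   # saw it, nothing applied
print(code_response(was_asked=True, answer=2))      # past 30 days
```

In SPSS, codes like these are typically declared as user-missing values, which excludes them from most calculations while still recording why an answer is missing.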

Grid Questions

KnowledgeHound can import grid questions; when it does, it splits each grid into multiple stand-alone questions.  For example, your questionnaire might ask a question like:

Q:  How strongly do the following statements reflect your views?

Scale (columns):  Strongly Disagree | Somewhat Disagree | Neither Agree nor Disagree | Somewhat Agree | Strongly Agree

I like music  ○ ○ ○ ○ ○

I like exercise  ○ ○ ○ ○ ○

I like peace & quiet  ○ ○ ○ ○ ○

When you import a grid question like this, KnowledgeHound breaks it up into 3 individual questions:

Q:  How strongly do the following statements reflect your views: I like music?

○  Strongly Disagree

○  Somewhat Disagree

○  Neither Agree nor Disagree

○  Somewhat Agree

○  Strongly Agree

Q:  How strongly do the following statements reflect your views: I like exercise?

○  Strongly Disagree

○  Somewhat Disagree

○  Neither Agree nor Disagree

… and so on

It’s important to check that your grid prompt (“How strongly do…”) and your slice prompt (e.g. “I like music”), when combined, are readable and searchable (see below).  If you import a grid question without a descriptive grid prompt, the result may be confusing.
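To preview how your grid and slice prompts will read once combined, you can reproduce the pattern from the example above.  This Python sketch only mimics the “grid prompt: slice prompt?” format shown here; it is not KnowledgeHound’s import code, and the real import may handle punctuation differently.

```python
from typing import Dict, List


def explode_grid(grid_prompt: str, slices: List[str], scale: List[str]) -> List[Dict[str, object]]:
    """Split one grid question into one stand-alone question per row (slice)."""
    stem = grid_prompt.rstrip().rstrip("?")  # drop the grid prompt's trailing question mark
    return [{"prompt": f"{stem}: {row}?", "responses": list(scale)} for row in slices]


scale = ["Strongly Disagree", "Somewhat Disagree", "Neither Agree nor Disagree",
         "Somewhat Agree", "Strongly Agree"]
questions = explode_grid(
    "How strongly do the following statements reflect your views?",
    ["I like music", "I like exercise", "I like peace & quiet"],
    scale,
)
for q in questions:
    print(q["prompt"])
```

Reading the printed prompts is a quick readability check: a grid prompt such as “Q7” would produce “Q7: I like music?”, which tells a searcher very little.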

Searchability and Readability

To get the most from KnowledgeHound, the question prompts and listed responses for each question in your data set should be both readable and searchable.  

Readable means that any of your colleagues who find the question in KnowledgeHound will know immediately from the question’s prompt what respondents were asked.

Readable example:  

Q:  Where have you shopped the most often in the Past 6 Months (select the most accurate response)?

A1: Grocery Stores

A2: Club Stores

A3: Convenience Stores

Bad example:  

Q:  Whr Shp Mst P6M?

A1: Grcry Strs

A2: Clb Strs

A3: Cnvnc Strs

Searchable means that the words in the question prompt capture the essence of the question.

Searchable example:  

Q:  Where do you shop most often (select the most accurate response)?

A1: Grocery Stores

A2: Club Stores

A3: Convenience Stores

Bad example:  

Q:  Which is most true of you?

A1: I most often shop at grocery stores

A2: I most often shop at club stores

A3: I most often shop at convenience stores

Each SPSS variable (generally representing one question in your questionnaire) comes with, among other things, a name and a description (shown as the variable label in SPSS Statistics).  Our interest is in the description.  Some tools and users abbreviate the question prompt in the description, sometimes inadvertently making the question unsearchable and unreadable.

Tip: The description field should contain the full question prompt without abbreviations.
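If your file has hundreds of variables, a rough automated screen can point you to descriptions worth reading first.  The Python sketch below is an illustrative heuristic, not an SPSS or KnowledgeHound feature: it flags a description when more than 30% of its words (a threshold you should tune) contain no vowels, which catches vowel-stripped abbreviations like the bad example above.  It will miss other unreadable prompts, such as “Which is most true of you?”, so it supplements a human read rather than replacing it.

```python
import re

VOWELS = set("aeiouy")  # "y" counts as a vowel so words like "by" or "why" aren't flagged


def looks_abbreviated(description: str, max_ratio: float = 0.3) -> bool:
    """Flag descriptions in which too many words have no vowels."""
    words = re.findall(r"[A-Za-z]+", description)
    if not words:
        return True  # an empty or all-numeric description is unreadable too
    vowelless = [w for w in words if len(w) > 1 and not VOWELS & set(w.lower())]
    return len(vowelless) / len(words) > max_ratio


for text in ["Whr Shp Mst P6M?",
             "Where have you shopped the most often in the Past 6 Months?"]:
    print(looks_abbreviated(text), text)
```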

Categorical SPSS variables (questions that ask respondents to select their response from a list of predefined responses) also carry that list of responses embedded in the SPSS file.  So that your colleagues can find text that appears in those responses, for example when they search “How often do people shop at grocery stores?”, the response labels need to be readable and searchable too.

Tip: The response selections to categorical questions should be as searchable and readable as question prompts.

Tip: If you decide to edit the response selections to categorical questions, make sure to check afterwards that the file has saved your data properly by looking at the summary for that variable.  In some cases, SPSS may corrupt the data for a question if the response names are changed.
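The check in the tip above can be partly automated.  Assuming you can read one variable’s codes into Python both before and after editing its response labels (how you export them depends on your tooling), the hypothetical helper below confirms that the number of respondents per code did not change and that every code present still has a label.  It does not verify that the new label text is correct.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def check_relabel(before: Sequence[Optional[int]], after: Sequence[Optional[int]],
                  new_labels: Dict[int, str]) -> List[str]:
    """Compare one variable's codes before and after its response labels were edited.

    Editing labels should change only the text attached to each code, so the
    number of respondents per code should be identical. None marks a missing answer.
    """
    problems = []
    if Counter(before) != Counter(after):
        problems.append("respondent counts per code changed")
    unlabeled = sorted({code for code in after if code is not None and code not in new_labels})
    if unlabeled:
        problems.append(f"codes without a label: {unlabeled}")
    return problems


before = [1, 2, 2, 3, None]
after = [1, 2, 2, 3, None]
labels = {1: "Grocery Stores", 2: "Club Stores", 3: "Convenience Stores"}
print(check_relabel(before, after, labels) or "no problems found")
```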

Screener Data

To determine whether screener questions should be included, first check the base sizes of those screener questions. If the base of a screener question is higher than the base of the main survey questions, the data file probably includes terminate data, that is, data from respondents who did not qualify for the study. In these cases, you can do one of two things: delete the terminate data from the data file or omit the screener questions. We do not want to include this kind of data because it is not representative of the study’s sample. If somebody pulls a data point from one of these questions without realizing that its base differs from that of the main portion of the study, they could use that data point in the wrong context.
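The base-size comparison above is easy to script once respondent-level data is in memory.  The sketch below assumes a hypothetical layout (one dictionary per respondent, with None for questions a respondent did not answer) and hypothetical variable names.  Compare the screener against a main-survey question that every qualified respondent was asked, since a skip pattern would shrink the base of other questions.  The optional filter assumes your file has a status variable marking completes, which not every file does.

```python
from typing import Dict, List, Optional

Respondent = Dict[str, Optional[int]]  # variable name -> answer code; None = no answer


def base_size(respondents: List[Respondent], var: str) -> int:
    """Count respondents with a non-missing answer to `var`."""
    return sum(1 for r in respondents if r.get(var) is not None)


def likely_has_terminates(respondents: List[Respondent], screener_var: str, main_var: str) -> bool:
    """True when the screener's base exceeds the base of a main-survey question."""
    return base_size(respondents, screener_var) > base_size(respondents, main_var)


def drop_terminates(respondents: List[Respondent], status_var: str, complete_code: int) -> List[Respondent]:
    """Keep only respondents whose (hypothetical) status variable marks them as completes."""
    return [r for r in respondents if r.get(status_var) == complete_code]


data = [
    {"S1": 1, "Q1": 2, "status": 1},
    {"S1": 2, "Q1": None, "status": 0},  # screened out: answered S1 only
    {"S1": 1, "Q1": 3, "status": 1},
]
print(base_size(data, "S1"), base_size(data, "Q1"))
print(likely_has_terminates(data, "S1", "Q1"))
kept = drop_terminates(data, "status", complete_code=1)
print(likely_has_terminates(kept, "S1", "Q1"))
```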

References to Information not Included in the SPSS File

You may find that you have some data you can’t possibly include inside your SPSS file.  For example, you may have respondents examine a set of concept images before offering their impressions.  When this happens, be sure that you include enough information in your SPSS file that your colleagues searching KnowledgeHound will be able to make sense of the questions you’ve asked.

Tip: When referring to concept tests, give a one- to three-word description that differentiates each concept.  Ideally, use the benefit statement or the key offering of the stimulus.  Instead of calling it “Concept #1”, call it “Freshness Concept” or, better yet, “100% Freshness Guaranteed”.  This also applies to claims testing.  Instead of calling it “Claim 1”, call it by the actual claim tested, such as “5 Times Fresher”.

Tip: When referring to outside assets, such as images from concept tests, make sure that you or your supplier also upload those assets to KnowledgeHound as “Associated Documents”.

Working with Suppliers

Issues with data quality can be handled anywhere in the data pipeline, but the later they’re addressed the more costly they can become.

The least expensive and most effective place to address data quality issues is at the source.  Whenever possible, it’s better to have suppliers deliver files that meet these standards.  Better yet is for suppliers to use tools that require little or no additional effort from them to meet these standards.
