This document describes how SPSS files should be formatted for delivery to provide the best KnowledgeHound user experience.

All data files must be delivered in SPSS format (SAV, SAV/MDD, and PKD are accepted).

Every question has a name, description (in the variable column) and responses (in the values column). 

  • Name: Needs to be unique within the data set, but formatting is not necessary as it will not be viewed in KnowledgeHound.

  • Variable: Formatting is extremely important as this is what the user will see in the application. There cannot be question numbers or abbreviations, and every variable needs enough context to stand alone. 

Unacceptable: “Q4. How mny chl in hme” 

Acceptable: “How many children currently live in your home?” 

  • Values: Properly coded per the questionnaire. 
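The labeling rules above can be partly checked before delivery. The following is a minimal sketch, assuming the variable names and labels have already been read into a plain dictionary. The function name `find_label_problems` and the question-number pattern are illustrative assumptions, not part of any KnowledgeHound tool, and abbreviations such as "mny chl" still need a human review.

```python
import re

# Assumed pattern for a leading question number such as "Q4." or "B3m1:".
QUESTION_NUMBER = re.compile(r"^\s*[A-Za-z]{1,3}\d+\w*\s*[.:)\-]")


def find_label_problems(labels):
    """Return {variable name: [problems]} for a dict of name -> label."""
    problems = {}
    for name, label in labels.items():
        issues = []
        if not label or not label.strip():
            issues.append("missing label")
        elif QUESTION_NUMBER.match(label):
            issues.append("label starts with a question number")
        if issues:
            problems[name] = issues
    return problems


labels = {
    "Q4": "Q4. How mny chl in hme",
    "Q5": "How many children currently live in your home?",
}
print(find_label_problems(labels))
```

Only the first label is flagged here; the second one stands alone and carries no question number.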

Multiple Response Sets

  • Any question that allows more than one response from a respondent should be combined into a Multiple Response Set. This allows us to see each question once, as it was intended in the questionnaire. For example, suppose one question has been broken into 11 separate variables. The first variable, B3m1, shows only the first response option. The second, B3m2, shows only the second option, and so forth. This makes it very difficult to compare responses for the question, because you would have to manually tally all 11 variables.

  • When these 11 variables are combined into a multiple response set, they become 1 variable that contains all responses, which makes it easy to compare responses against one another.
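To illustrate what combining the separate variables achieves, here is a minimal sketch that tallies several 0/1 variables into a single set of counts. It assumes each split variable is coded 1 when the option was selected, and it uses three variables instead of 11 for brevity. The function name and the option labels are hypothetical, and this is not how SPSS or KnowledgeHound defines the set internally.

```python
def tally_multiple_response(rows, variables, option_labels):
    """Count how many respondents selected each option of one question."""
    counts = {label: 0 for label in option_labels}
    for row in rows:
        for var, label in zip(variables, option_labels):
            # Assumption: 1 means "selected"; 0 or None means not selected.
            if row.get(var) == 1:
                counts[label] += 1
    return counts


rows = [
    {"B3m1": 1, "B3m2": 0, "B3m3": 1},
    {"B3m1": 0, "B3m2": 0, "B3m3": 1},
    {"B3m1": None, "B3m2": None, "B3m3": None},  # did not see the question
]
variables = ["B3m1", "B3m2", "B3m3"]
option_labels = ["Option 1", "Option 2", "Option 3"]  # hypothetical labels
print(tally_multiple_response(rows, variables, option_labels))
```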

Single Response Formatting

  • Each respondent should have all their responses in a single row. The file should include a unique respondent ID (typically the first column) for each respondent, so that each row represents all the responses for a single person.

Respondents who did not see a question or did not respond with a valid choice should be given a null value, not zero. 

  • This is important because not answering, or never seeing, a question is very different from responding “no”. For example, if respondents 1-16 answered a question and respondents 17-20 did not see it, respondents 17-20 should have a null value for that question.
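The effect of coding unseen questions as zero can be shown with a small calculation. This is a minimal sketch using made-up data for 20 respondents, where `None` stands for a null value. The function name `share_selecting` is an assumption for illustration only.

```python
def share_selecting(values, target=1):
    """Share of respondents who gave `target`, ignoring null (None) values."""
    answered = [v for v in values if v is not None]
    if not answered:
        return None
    return sum(1 for v in answered if v == target) / len(answered)


# Respondents 1-16 answered (8 "yes" = 1, 8 "no" = 0); 17-20 never saw it.
coded_with_nulls = [1] * 8 + [0] * 8 + [None] * 4
coded_with_zeros = [1] * 8 + [0] * 8 + [0] * 4  # incorrect: unseen coded as "no"

print(share_selecting(coded_with_nulls))  # base is the 16 who answered
print(share_selecting(coded_with_zeros))  # base is wrongly inflated to 20
```

With nulls, half of those who answered said "yes". With zeros, the four respondents who never saw the question are counted as "no" and the share drops.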

Personally identifiable information, or PII, needs to be removed before KnowledgeHound can accept the file.

  • This includes first and last name, email address, phone number, street address, etc.

  • We do not consider the following to be PII, and including them gives you the best experience: ZIP code, age, income, marital status, homeownership, household size, and presence of children.
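A quick scan of variable names can help catch PII before delivery. The following is a rough sketch, assuming PII columns have recognizable names. The hint list and the function name `flag_pii_variables` are illustrative assumptions; a name-based check will miss oddly named columns, so it does not replace a manual review.

```python
# Assumed substrings that suggest a PII column; extend as needed.
PII_HINTS = ("first_name", "last_name", "email", "phone", "street", "address")


def flag_pii_variables(names):
    """Return the variable names that look like they hold PII."""
    return [n for n in names if any(hint in n.lower() for hint in PII_HINTS)]


names = ["resp_id", "Email_Address", "zip_code", "age", "Phone"]
print(flag_pii_variables(names))
```

Demographic variables such as ZIP code and age are deliberately not in the hint list, since they should stay in the file.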

Piping language (when a question inserts the answer to another question into its prompt) needs to be accurately portrayed to provide context for the user.

  • Let's imagine we have 2 questions asked back to back. Frequently those are coded in the following format:

Question number 1: What cell phone service do you have?

Question number 2: How satisfied are you with your (Q1) service?

The respondent will see question 2 with “(Q1)” replaced by whatever their answer was to question 1. For example, if they responded “BRAND A” to question 1, then question 2 for the respondent would read “How satisfied are you with your BRAND A service?”

Therefore, instead of showing “(Q1)” in the question 2 prompt within the data file, KnowledgeHound requires the full title of the question being piped in the following format…

Question number 2: How satisfied are you with your {Breakout by “What cell phone service do you have?”} service?
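The substitution described above can be sketched as a simple text replacement. This example assumes piping placeholders look like “(Q1)” and that the full text of each question is available in a dictionary. The function name `expand_piping` is hypothetical, and real files may use other placeholder styles.

```python
import re


def expand_piping(label, question_text_by_number):
    """Replace placeholders like (Q1) with {Breakout by “question text”}."""
    def replace(match):
        text = question_text_by_number.get(match.group(1))
        if text is None:
            return match.group(0)  # unknown question: leave placeholder as-is
        return "{Breakout by “" + text + "”}"

    return re.sub(r"\((Q\d+)\)", replace, label)


questions = {"Q1": "What cell phone service do you have?"}
label = "How satisfied are you with your (Q1) service?"
print(expand_piping(label, questions))
```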

Tracking Studies 

  • Tracking studies should be delivered as a cumulative representation of the study. This means that as new respondents are surveyed, they should be appended to the existing file containing all prior respondents. The final deliverable should be a single data file that contains all respondents who have ever been part of the study.

  • Ideally, the file will also have a year or wave variable so that the user can crosstab by it and use KnowledgeHound’s trending graphs.
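As an illustration of a cumulative file, here is a minimal sketch that appends a new wave of respondents to the existing rows and tags them with a wave variable. The variable name `wave` and the function name `append_wave` are assumptions; in practice this step is usually done in SPSS itself.

```python
def append_wave(cumulative, new_rows, wave):
    """Return a new list: all prior respondents plus the new wave, tagged."""
    combined = [dict(row) for row in cumulative]
    for row in new_rows:
        tagged = dict(row)
        tagged["wave"] = wave  # assumed name for the wave/year variable
        combined.append(tagged)
    return combined


wave_1 = [{"resp_id": 1, "wave": 1, "Q1": 3}, {"resp_id": 2, "wave": 1, "Q1": 5}]
wave_2_new = [{"resp_id": 3, "Q1": 4}]

cumulative = append_wave(wave_1, wave_2_new, wave=2)
print(len(cumulative), [row["wave"] for row in cumulative])
```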

Weighted Data Files

  • Please provide the KnowledgeHound team the following information when a data file has been weighted:

Which variable to apply as the weight (when there is more than one)

What the weighting represents, for example: “This data has been weighted to represent _____”
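To show why the choice of weight variable matters, here is a minimal sketch of a weighted share next to its unweighted counterpart. The data and the function name `weighted_share` are made up for illustration, and this is not a description of how KnowledgeHound applies weights.

```python
def weighted_share(values, weights, target=1):
    """Weighted share of respondents who gave `target`, ignoring nulls."""
    total = 0.0
    selected = 0.0
    for value, weight in zip(values, weights):
        if value is None:
            continue  # null responses do not count toward the base
        total += weight
        if value == target:
            selected += weight
    return selected / total if total else None


values = [1, 0, 1, None]
weights = [1.0, 2.0, 1.0, 3.0]  # hypothetical weight variable

print(weighted_share(values, weights))            # weighted
print(weighted_share(values, [1.0] * 4))          # unweighted, for comparison
```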

Grid Variable Formatting

  • When the questionnaire breaks out Columns and Rows, the row options will be displayed in the “Response” column, and the columns need to have a separate line item per choice, with the full question before it to provide user context. 

  • For example: 

Q16. Please tell us how much you agree or disagree with the following statements.

Columns: Statement A, Statement B, Statement C, Statement D

Rows (response options): Strongly Agree, Somewhat Agree, Somewhat Disagree, Strongly Disagree

The properly formatted data file will display the following for the 4 statement variables:

(Please note the importance of the “space dash space” format between the question stem and the different statements)

Please tell us how much you agree or disagree with the following statements. – Statement A

Please tell us how much you agree or disagree with the following statements. – Statement B

Please tell us how much you agree or disagree with the following statements. – Statement C

Please tell us how much you agree or disagree with the following statements. – Statement D
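The labels above follow a fixed pattern, so they can be generated rather than typed by hand. This is a minimal sketch; the function name `grid_labels` is an assumption, and the separator is copied from the example labels above (a dash with one space on each side).

```python
def grid_labels(stem, statements):
    """Build one label per grid statement: '<stem> – <statement>'."""
    return [f"{stem} – {statement}" for statement in statements]


stem = "Please tell us how much you agree or disagree with the following statements."
statements = ["Statement A", "Statement B", "Statement C", "Statement D"]

for label in grid_labels(stem, statements):
    print(label)
```

Note that the stem is passed in without the “Q16.” prefix, since question numbers should not appear in labels.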

Scale Questions

  • For questions asked on a scale (e.g., "On a scale from 1 to 10, where 1 is 'Terrible' and 10 is 'Excellent'"), the endpoints of the scale must be labeled in the value labels to provide the necessary context to users. This also allows the KnowledgeHound platform to calculate mean scores on these categorical variables.
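A mean score on a labeled scale can be sketched as follows. The example assumes a 1 to 10 scale whose endpoint labels are stored in a dictionary of value labels, and it treats `None` as a null response. The function name `scale_mean` is hypothetical, and this is not KnowledgeHound's actual calculation.

```python
def scale_mean(values, value_labels, low=1, high=10):
    """Mean of the numeric codes, after checking both endpoints are labeled."""
    missing = [code for code in (low, high) if not value_labels.get(code)]
    if missing:
        raise ValueError(f"Scale endpoints without labels: {missing}")
    answered = [v for v in values if v is not None]
    return sum(answered) / len(answered) if answered else None


value_labels = {1: "Terrible", 10: "Excellent"}
responses = [1, 10, 7, None]  # None = respondent did not answer

print(scale_mean(responses, value_labels))
```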
