Data Reduction Resources

For our class, I want to balance quantitative and qualitative methods. Most traditional courses like EPSY 5601 would focus only on quantitative methods for the remainder of the course. For example, Chapter 14 in your TopHat text presents what you would normally get in an intro Statistics class for social research, and Chapter 15 takes it further into inferential statistics. That’s waaaay too much information for our purposes, and it omits any discussion of qualitative methods.

Once any data set is collected, the very first step an educational researcher takes is to “clean” the data: deciding what to do about missing data (there are many options) and checking for outliers (there’s always that 1 student who puts all 5’s or all 1’s on the survey, right?). In a qualitative study, that looks a little different: “cleaning” might mean transcribing to get a clean copy of what was said, and parsing that into units of analysis (an utterance vs. each sentence vs. each idea unit vs. a dialogical back-and-forth such as a question and its answer). The second step is to characterize the data, and if the data are headed toward a more advanced quantitative analysis, that step involves what are called “descriptive statistics” (see TopHat Chapter 14 if you want more details).
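
If you are curious what that cleaning step can look like in practice, here is a minimal sketch in Python using the pandas library; the survey columns, the fill-in rule, and the “flat responder” check are all invented for illustration, not a prescribed procedure.

import pandas as pd

# Made-up survey data: 1-5 Likert ratings, with one missing answer
# and one student who answered all 5's (a possible outlier).
scores = pd.DataFrame({
    "q1": [3, 4, None, 5, 2],
    "q2": [4, 3, 2, 5, 3],
    "q3": [2, 4, 3, 5, 1],
})

# Option 1 for missing data: drop any student with a missing answer.
dropped = scores.dropna()

# Option 2: fill the missing answer with that question's mean.
filled = scores.fillna(scores.mean())

# A quick outlier check: flag students whose ratings never vary
# (all 5's or all 1's), which can signal careless responding.
flat_responders = scores[scores.nunique(axis=1) == 1]

print(dropped, filled, flat_responders, sep="\n\n")

The Swalin (2018) reading under “Other Readings” below walks through more of the options for handling missing data.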

Quantitative Data Reduction – Mean (average), Median or Mode

Every study has issues with data reduction: how to summarize either the quantitative scores from a study or the self-reported survey responses, interview transcripts, or observations that were collected. The typical way to reduce a bunch of numbers (like achievement test scores) from a class or group is to compute a measure of central tendency, most commonly the Mean (average). The average is then used to characterize how the whole class performed. Now… is something lost when we do this? Of course! But we have to reduce the data somehow. Our Neag future teachers in Fall 2020 created this TikTok intro to measures of central tendency (let me know what you think and I’ll pass it on to them).
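
If you want to see all 3 measures of central tendency on the same set of numbers, here is a tiny sketch in Python; the quiz scores are invented just for illustration.

from statistics import mean, median, mode

# Invented quiz scores for one class (not real data).
scores = [72, 85, 85, 90, 67, 85, 78, 95, 60, 85]

print("Mean (average):", mean(scores))      # 80.2
print("Median (middle):", median(scores))   # 85.0
print("Mode (most common):", mode(scores))  # 85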

To look at this, let’s anchor our discussion by reviewing a study about the value of Kahoot! for test preparation, a classic case of teaching to the test (Iwamoto et al., 2017).

As in this Kahoot! study, the most common way to reduce all the class scores to 1 number is the Mean. With 2 classes, an “Experimental” class and a “Control” class, there is an easy way to compare the 2 Means: a t-test (a small code sketch appears after the tool links below).

Excel has a built-in t-test function that you can use, if you have Excel.

For comparing 2 different classes, there is an online t-test tool (independent samples).

If the 2 sets of scores are from the same class (like a pre-test and a post-test), here is an online repeated-measures t-test tool for 2 “treatment times” (dependent samples).
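
If you would rather run the 2 kinds of t-tests yourself instead of using Excel or the online calculators, here is a minimal sketch using Python’s scipy library; the class scores below are invented, not the Iwamoto et al. (2017) data.

from scipy import stats

# Invented post-test scores for two DIFFERENT classes (independent samples).
experimental = [88, 92, 79, 85, 91, 83, 90]
control      = [80, 75, 82, 78, 85, 77, 81]
t_ind, p_ind = stats.ttest_ind(experimental, control)
print(f"Independent samples: t = {t_ind:.2f}, p = {p_ind:.3f}")

# Invented pre-test and post-test scores for the SAME class (dependent samples).
pre  = [70, 65, 80, 72, 68, 75]
post = [78, 70, 85, 74, 72, 83]
t_rel, p_rel = stats.ttest_rel(pre, post)
print(f"Dependent (paired) samples: t = {t_rel:.2f}, p = {p_rel:.3f}")

Either way, the logic is the same as the online tools: an independent-samples test for 2 different classes, and a paired (dependent) test when the same students are measured twice.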

 

Reference

Iwamoto, D. H., Hargus, J., Jon, E., & Vuong, K. (2017). Analyzing the efficacy of the testing effect using Kahoot™ on student performance. Turkish Online Journal of Distance Education, 18(2).

Qualitative Data Reduction – Emergent Themes (data visualization)

The same problem of data reduction exists for qualitative survey-style data. Who wants to read all that text? Not me. So again, the researcher is faced with finding a reliable and valid approach to summarizing the data (call it “analyzing the data” if you wish). Perhaps the quickest way to reduce a long transcript of text to its central themes is to create a Wordle or word cloud. This is called data visualization, and it is a good first step in looking for themes and patterns in complex data sets.
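
If you want to see what a word cloud is actually doing under the hood, here is a small sketch in Python that simply counts word frequencies across a batch of open-ended responses; the responses and the tiny stop-word list are invented for illustration. A word cloud is essentially a picture of these counts, with more frequent words drawn larger.

from collections import Counter
import re

# Invented open-ended survey responses (not real data).
responses = [
    "The Kahoot games made review fun and helped me remember terms.",
    "Review games were fun but I wanted more time on hard questions.",
    "Kahoot helped me prepare for the test and made class fun.",
]

# Very small stop-word list so common filler words don't dominate.
stop_words = {"the", "and", "but", "for", "on", "me", "i", "a", "to", "more", "were", "made"}

words = re.findall(r"[a-z']+", " ".join(responses).lower())
counts = Counter(w for w in words if w not in stop_words)

# The most common words are the candidates for "emergent themes."
print(counts.most_common(5))

The online tools linked below do the same counting for you and handle the drawing.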

Using a Word Cloud to summarize survey results is the simplest thing to do to look for “emergent themes,” just like computing an average (mean) score for a group of grades is the easiest thing to do. You have now seen a few of the additional things one can do with quantitative scores (t-tests, correlations, descriptive stats, etc.). So here is a summary of what qualitative data analysis entails: instead of just constructing a word cloud, a qualitative researcher might do open coding of the responses, then apply those results using axial coding. This would be part of a grounded theory process.
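
The coding itself is a human judgment call, but the bookkeeping side of open and axial coding is just counting, which is why it still counts as data reduction. Here is a small sketch of that bookkeeping, assuming a researcher has already assigned open codes to each response; every code and category below is made up for illustration.

from collections import Counter

# Open codes the researcher (hypothetically) assigned to each response.
open_codes = [
    "enjoyment", "competition", "recall of terms",
    "enjoyment", "time pressure", "recall of terms",
    "competition", "enjoyment",
]

# Axial coding: group related open codes under broader categories (invented).
axial_map = {
    "enjoyment": "engagement",
    "competition": "engagement",
    "time pressure": "stress",
    "recall of terms": "learning",
}

# Count how often each broader category appears across the data set.
category_counts = Counter(axial_map[code] for code in open_codes)
print(category_counts)  # Counter({'engagement': 5, 'learning': 2, 'stress': 1})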

Here are a couple of sample tools, one of which we’ll use in the “to do” section of this module.

Online Text Analyzer

Word Cloud online tool

Other Readings:

Swalin, A. (2018). How to handle missing data.

Essential Understanding:

In any study there is a need to reduce the raw data and summarize the results of the research. That process of data reduction is usually called the Analysis, and it involves some method for making what was observed simpler to understand for readers who don’t want to read, view, or listen to everything themselves. Any form of data reduction, whether Quant (averages/correlations) or Qual (Word Clouds/Emergent Themes), is the same process in this respect: it necessarily loses some of the meaning. Only the full data set retains all the information (and with digital publishing, the data file(s) can be there too). So data reduction is a trade-off to produce a summary, and it should not be judged against doing a full analysis of the raw data set on your own, but it is clearly to your advantage not to have to read and make sense of all the raw data yourself.
