I presented this lecture on Questionnaire Development to the graduate students of Psychology at the University of São Francisco, Campinas, Brazil on Monday, 19 August 2019. I take you through the process I followed to develop questionnaires for Self-efficacy for Learning and for Performing in Music. Unfortunately I didn’t video this one. The slides and my full notes (nearly a transcript) are below.
1. Intro – I started as Laura the cellist and have always been interested in how people think.
2. Albert Bandura, the renowned Canadian-American psychologist, first introduced the construct of self-efficacy in 1977. This construct is responsible for so much; I believe it underlies everything that we do. It is not the global construct of self-image, but a very specific belief in one’s capabilities to carry out a task.
Bandura’s initial investigations were striking and not something researchers could directly replicate. He used psychotherapy patients and their fear of boa constrictors to test the accuracy of their reported self-beliefs. The task he gave them was to get progressively closer to the snakes. He asked: How confident are you that you can be in a room with a boa constrictor? How confident are you that you can hold a boa constrictor?
Each task had separate criteria, and through this initial investigation we see a clear demonstration of the power of these beliefs.
Bandura confirmed that self-efficacy beliefs have four main influences:
- Mastery experiences
- Vicarious experiences
- Verbal persuasion
- Physiological symptoms
These are presented in order of strength. Firstly, we are influenced by what we have accomplished: If I have done it, I can believe in my capabilities to do it again. Second most influential are vicarious experiences. These are observed experiences. If I have no mastery experience, then watching someone else complete the task will be the most influential for me. If I do have mastery of something, and have accomplished the task before, then even if one of the lesser influences is telling me I might not be successful, for example if someone tells me I cannot do it, that ‘verbal persuasion’ will be overridden by my own positive ‘mastery’ experience. Lastly are physical signs, and those are least influential. Today I am sure that my hands are clammy with nerves, but that will not stop me speaking with confidence to all of you.
Judging self-efficacy is not, however, as simple as following this list and asking, ‘Have you done it before? Yes? Ah, you’ll be fine.’ There is a complex set of factors that will influence any one of us as we carry out tasks at different times and in different places.
3. Although other researchers could not directly replicate Bandura’s initial study, there was interest in exploring the construct. However, not all researchers understood the construct as Bandura defined it, and at that point there were no established questionnaires or tools to measure self-efficacy. This produced confusion and gaps in understanding of the construct.
In the years following Bandura’s initial investigations self-efficacy was investigated through a variety of different means. Researchers invented their own questionnaires, or simply asked a single, blanket question to measure self-efficacy. Theorists began to question whether these early studies were investigating self-efficacy or a wider, global construct such as self-concept or self-esteem.
Bong, M., & Clark, R. (1999). Comparison between self-concept and self-efficacy in academic motivation research. Educational Psychologist, 34, 139–153.
Bong, M., & Skaalvik, E. (2003). Academic self-concept and self-efficacy: How different are they really? Educational Psychology Review, 15, 1–40.
Schwarzer & Jerusalem created a general scale in 1979, originally in German; however, there was no construct validation study. Their scale is specifically designed to measure ‘general’ self-efficacy, which is already at odds with the construct’s definition, but this was not clearly understood at the time.
We now know that self-efficacy has to be specific. It is not something general.
The Schwarzer and Jerusalem scale was made readily available and has been translated into over 33 languages. This has meant that many people have used it. Internal reliability reports are good, but it is general and I see this as a problem.
http://userpage.fu-berlin.de/health/engscal.htm and https://cyfar.org/sites/default/files/PsychometricsFiles/General%20Self-Efficacy%20Scale%20(Adolescents,%20Adults)%20Schwarzer.pdf
Sherer and his colleagues also created a general self-efficacy scale in 1982. This scale was created for use in academic settings. Like the Schwarzer and Jerusalem scale, this was also a general scale, but it had a strong validation study. Because of this, I chose to use this scale as the basis for my research.
4. In 1996, Dale Schunk proposed an important theoretical shift at the annual meeting of the American Educational Research Association. Although this was not a formal, empirical study, he presented a theoretical argument in line with Bandura’s definition of self-efficacy: because self-efficacy is specific, there should be different types of self-efficacy. I believe this is very important to the way we investigate self-efficacy.
5. In music there were no specific studies before 2003, and then McCormick and McPherson produced an influential study where they showed the link between self-efficacy and performance. They also said: “However, we still do not understand properly the mechanisms whereby students come to believe in their own capabilities to perform well.”
The issue with this study was how self-efficacy was assessed. They asked one question, and it is very difficult to measure a complex construct with a single question.
6. The same researchers carried out another empirical study in 2006. This time they improved the questioning by breaking down the overall task of ‘exam’ into its component parts, but this still did not assess various aspects of the construct: it was not a questionnaire, but a single question.
Explain ABRSM exams and how they work, and why this is a series of separate questions. Lack of detail, lack of specificity. But they were doing the best they knew how at the time, and it was still very important research.
Sherer, M., Maddux, J. E., Mercandante, B., Prentice-Dunn, S., Jacobs, B., & Rogers, R. W. (1982). The self-efficacy scale: Construction and validation. Psychological Reports, 51(2), 663–671.
7. At the same time, another very influential book was published. This offers advice on questionnaire construction, considering:
b. Questions asked?
c. Reliability?
The chapter by Mimi Bong is very good, and there is a chapter by Bandura himself reinforcing the need for correspondence with a task and for specificity of questions within a questionnaire that probes aspects of self-efficacy. There are examples of questionnaires from various domains, and the strengths and weaknesses are discussed.
Mimi Bong highlights problems with asking the wrong questions. She lists:
- Confusion with other constructs that relate to the self (often self-concept and self-esteem, both of which are global and not task-specific)
- Lack of accurate understanding of the context-specific nature of self-efficacy
- Failure to ensure correspondence between self-efficacy and the task-target
Bandura reminds us that self-“…efficacy beliefs are multifaceted and contextual, but the level of generality of the efficacy items within a given domain of functioning varies depending on
a. the degree of situational resemblance and
b. the foreseeability of task demands.
But regardless of the level of generality, in no case are the efficacy items dissociated from context and level of task demands”
(Bandura, 1997, p.50). (lettering a/b is mine)
How did I develop my questionnaire? 250 music students completed questionnaires for the validation study.
8. The first step was to adapt the Sherer and Maddux scale to music. I translated as directly as possible from their original ‘general’ scale and then it was separated into Self-efficacy for Musical Learning and Self-efficacy for Musical Performing questionnaires.
9. Illustrating how individual items were made specific for Learning and for Performing.
10. An introduction was added to make sure the respondents were thinking about the task – remembering Mimi Bong’s advice to be careful of problem areas. This was to make sure the self-efficacy judgements corresponded with a specific task.
11. When developing a new scale it was important to test it and make sure it was reliable and robust.
- Explain how the items were deleted from the performing scale: Cronbach’s alpha shows the overall reliability of the scale, and then for each item it shows how the reliability would change if that item were removed. You can see that there were two items that improved this scale when removed.
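To make the ‘alpha if item deleted’ idea concrete, here is a minimal Python sketch. It is not the output from the slides: the six respondents and four items are invented for illustration, and the function names `cronbach_alpha` and `alpha_if_item_deleted` are hypothetical. It uses the standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of the total scores).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items in the scale
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows):
    """Recompute alpha once per item, each time leaving that item out."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in rows])
        for i in range(k)
    ]


# Invented responses (assumption): 6 respondents x 4 items on a 1-7 scale.
responses = [
    [2, 3, 2, 5],
    [3, 3, 4, 2],
    [4, 5, 4, 6],
    [5, 5, 6, 3],
    [6, 7, 6, 4],
    [7, 6, 7, 1],
]

overall = cronbach_alpha(responses)
print(f"alpha for the full scale: {overall:.3f}")
for item, value in enumerate(alpha_if_item_deleted(responses), start=1):
    flag = "  <- removing this item improves the scale" if value > overall else ""
    print(f"alpha if item {item} deleted: {value:.3f}{flag}")
```

With these invented numbers the fourth item runs against the other three, so removing it raises alpha. That is the pattern to look for when deciding whether an item should be dropped.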
14. Explain the factor analysis and the different methods, and then how the 2-factor solution here represents the reverse-coded items, which really reinforces that there is one underlying factor being assessed by each scale.
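Because reverse-coded (negatively worded) items can show up as a separate factor, it helps to see what reverse scoring does. This is a minimal sketch, assuming a 1 to 7 response scale and an invented set of item positions; neither is taken from the questionnaires described here, and the function names are hypothetical.

```python
def reverse_score(score, low=1, high=7):
    """Flip a response so that a high number always means high self-efficacy."""
    return low + high - score


def total_score(responses, reversed_items, low=1, high=7):
    """Sum one respondent's answers, flipping the negatively worded items.

    `reversed_items` holds the 0-based positions of those items (hypothetical).
    """
    return sum(
        reverse_score(r, low, high) if i in reversed_items else r
        for i, r in enumerate(responses)
    )


# One invented respondent: the items at positions 1 and 3 are negatively worded.
answers = [6, 2, 7, 1]
print(total_score(answers, reversed_items={1, 3}))
```

After flipping, a low answer to a negatively worded item counts the same as a high answer to a positively worded one, so all items point the same way before reliability or factor analysis is run.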
Data reduction tests the internal structure of a questionnaire and can confirm external links between a measure and other variables explored (Anderson & Gerbing, 1988). These tests can be used when searching for underlying factors, testing or confirming a hypothesised model that involves various factors and influences, or when confirming a single underlying component where a larger external model of constructs is not involved. Two methods for analysis, Principal Component Analysis (PCA; Pearson 1901; Hotelling 1933; Kelley 1935) and Factor Analysis (FA; Harman 1976; Anderson & Gerbing 1988; Jolliffe 2002), both identify the component factors within a measure. However, the methods make different assumptions about the treatment of error within the resulting factors, and these have an impact on the interpretability of the results in different situations; an informed choice must therefore be made in order to employ the appropriate procedure for the data gathered (Suhr, 2005).
In studies where there is no hypothesized relationship within a fixed model, there is a tendency to use PCA because it is the default method in the commonly used analysis software SPSS. PCA aims to extract components that represent the maximum amount of variance within the model, including both discrete variance as represented specifically by that component and error variance. PCA is acceptable if identification of components is the goal, but, because of the inclusion of error in the measurement of components, they should not be interpreted as having theoretical significance (Pedhazur & Schmelkin, 1991) and PCA is not an appropriate method if the components are being extracted to explain the understanding of a construct. Following Pedhazur (1982), when PCA is employed only the initial, un-rotated solution should be considered in analysis; methods of rotating the factors around the axis are known to enhance the clarity of solutions in FA, but as PCA includes error within the components, any rotation will create distorted results.
FA can be used in two forms: either as an exploratory tool or for confirmation within a structure. Exploratory Factor Analysis (EFA; Spearman 1904, 1927; Thurstone 1931; Tucker 1955) extracts factors that are discretely responsible for the variance, allowing the researcher to make interpretative judgements about the resulting factors. EFA is preferable to PCA when exploring the structure of cognitive abilities (Carroll, 1993, p.vi) because, with discrete variance represented by the extracted components, theoretical meaning can be attributed to the resulting factors. When performing EFA, the resulting factor loadings can be rotated to make the interpretation of the relationship of the factors clearer. However, the method of rotation for factors needs to be considered; there are orthogonal and oblique rotation methods, which rotate the factors by different angles, depending on whether there are hypothesised correlations between factors. Both methods of rotation are suggested, for example in the validation of the well-established State-Trait Anxiety Inventory (Gaudry et al., 1975). The Varimax rotation method (Kaiser, 1958) is the default in SPSS and is the most commonly used orthogonal rotation because of the simplicity of its output, but it is not appropriate when a single underlying factor is hypothesized (Gorsuch, 1983), and in this situation the Quartimax method has been suggested as most appropriate (Kaiser, 1958).
15. With the responses collected, I then looked for distinct correlations of learning and performing with various skills (Ritchie & Williamon, 2008). There were specific correlations… these were used to illustrate the fundamental differences between SEL and SEP.
16. Full list of skills tested against the three iterations of the questionnaire:
a. The direct translation of the ‘General’ scale by Sherer et al.,
b. My adaptation for Learning, and
c. My adaptation for Performing
You can see that there are separate correlations with different skills for the performing and the learning, and that these show distinct patterns that are also separate from the ‘general’ scale. (point out some skills)
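As a rough illustration of how one skill can relate differently to the two scales, here is a small Python sketch. The choice of Pearson’s r is an assumption (the notes above only say ‘correlations’), every number is invented, and the names `sel_scores`, `sep_scores`, and `skill_rating` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented data (assumption): five students' totals on each questionnaire
# and their self-rating on one skill.
sel_scores = [30, 42, 35, 50, 46]   # Self-efficacy for Musical Learning
sep_scores = [44, 31, 47, 36, 40]   # Self-efficacy for Musical Performing
skill_rating = [3, 5, 4, 7, 6]

print(f"skill vs learning scale:   r = {pearson_r(sel_scores, skill_rating):.2f}")
print(f"skill vs performing scale: r = {pearson_r(sep_scores, skill_rating):.2f}")
```

In this invented data the skill rating rises with the Learning scores and falls with the Performing scores. A real analysis would also report significance and use the full sample.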
It was at this point that I was convinced that a ‘general’ scale was not at all good enough.
17. Also demonstrated that these beliefs could be different across different samples. The conservatoire sample was more skilled than the university sample.
18. Adapting and validating an established questionnaire to test a task in another domain allows less-researched domains to draw upon experience from disciplines with an established self-efficacy research history. This does not imply that results transfer across domains, but that methods and procedures can be drawn upon in order to further the research within the domains to which the questionnaire adapts successfully.
19. This research followed the initial validation of the adult version of the scale to confirm a single underlying factor in the scale adapted for use with primary school children. Exploratory Factor Analysis (EFA) was undertaken using the Maximum Likelihood method, with the Quartimax method of orthogonal rotation employed (as is appropriate when a single underlying factor is hypothesized; Gorsuch 1983). Both the Kaiser rule, where factors with eigenvalues of 1 or greater are considered for retention (Kaiser, 1960), and the examination of the scree plot (Cattell, 1966) were used to test the initial hypothesis of a single underlying factor for the adapted self-efficacy scale.
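The Kaiser rule can be sketched in a few lines of Python. This only illustrates the eigenvalue check, not the Maximum Likelihood extraction or the Quartimax rotation, which would normally be run in a statistics package. The correlation matrix is invented (four items that all correlate 0.6 with one another), the function names are hypothetical, and the eigenvalues are found with a basic Jacobi rotation routine so that the example needs nothing beyond the standard library.

```python
from math import copysign, sqrt


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that sets a[p][q] to zero.
                tau = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = copysign(1.0, tau) / (abs(tau) + sqrt(tau * tau + 1))
                c = 1 / sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_retained(eigenvalues, threshold=1.0):
    """Keep the factors whose eigenvalue is 1 or greater (Kaiser rule)."""
    return [ev for ev in eigenvalues if ev >= threshold]


# Invented correlation matrix (assumption): every pair of items correlates 0.6.
corr = [[1.0 if i == j else 0.6 for j in range(4)] for i in range(4)]

eigenvalues = symmetric_eigenvalues(corr)
print("eigenvalues:", [round(ev, 3) for ev in eigenvalues])
print("factors retained by the Kaiser rule:", len(kaiser_retained(eigenvalues)))
```

For this matrix only one eigenvalue is above 1, which is the pattern a single underlying factor would produce. Plotting the sorted eigenvalues against their rank gives the scree plot, where a sharp drop after the first point tells the same story.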
20. This study with children reveals links between self-efficacy for learning in music and other pursuits in music, as well as with extra-musical activities and other psychological measures. The time spent listening to music correlated positively with Self-efficacy for Musical Learning scores. Listening to music can influence instrumental learning (cf. the group of “best” violinists reported by Ericsson et al. 1993), as students can gain an understanding of an entire piece and hear polished musical interpretations. Listening also engages high-level skills involving musical analytic understanding and the interrelationship of music components. It is unlikely, however, that children who have such limited experiences with making music have yet developed or refined the skills to listen to music in the same way as more experienced musicians (Nielsen, 1999a; North et al., 2000; Ericsson, 2006). The impact of listening on the actual processes of learning an instrument may not be clearly shown at these early stages of learning in music.
The correlations to extra-musical activities found in this study demonstrate the wider relevance of self-efficacy in children’s lives. There were positive correlations with the physical activities of dancing and participating in individual sports. Both of these structured physical activities rely on teacher or coach input for learning, and the pattern of learning takes place through scheduled rehearsal sessions, building to performance goals. In this sense, the processes of learning and performing in dance and individual sports are similar to those involved in music. When students learn a musical instrument, they typically have weekly lessons with a teacher and give a concert or a recital when pieces have been learned.
The link between reading and self-efficacy beliefs may be less immediately obvious. However, there are again similarities with fundamental, underlying processes (Gardner, 1983). Reading skills involve processes of verbal decoding and comprehension of both detail and of a complete story. Music learning involves skills to decode musical notation, comprehend phrases, and attribute meaning to the music in preparation for a performance. The correlation of reading for pleasure and children’s perceived self-efficacy could be explained in the similarity of processes between these tasks (Hansen & Bernstorf, 2002). According to Bandura (1986), vicarious experiences are the second strongest influence on people’s perceived self-efficacy beliefs, and the skills students have already mastered in literary interpretation could inform and reinforce their belief in their abilities to learn music.
21. The children’s Self-efficacy for Musical Performing questionnaire tested here was shown to have a robust Cronbach’s alpha, with EFA showing one overriding internal factor. The students showed significant differences in beliefs about their performing capabilities depending on their level of engagement with the subject being questioned. All of the students in the study had experience performing in music as part of the National Curriculum in music, which requires that students at Key Stage 2 (i.e. this level) will all perform, compose, and appraise music in school lessons (QCA & DEA, 1999). However, those who engaged with specialist musical tuition had noticeably higher self-efficacy for performing scores, which correlated with different aspects within their everyday lives.
22. Can self-efficacy beliefs be measured in a similar manner when related tasks share core skills?
23. Above: Self-efficacy for Musical Performing and the minimal adaptations to Self-efficacy for Performing in Sport. A sample of 98 university sport science students completed adaptations of each of these to determine whether it was possible to transfer to another context.
24. Same process of testing…
25. Performance, which is often a public event, is a clearly observable task, making comparison more transparent across domains. Learning is less visible. It is not unique to a specific setting or bound by a set methodology. The learning process is more private, potentially occurs in isolation, and is somewhat more abstract as compared with a scheduled performance. As performance has obvious similarities across the domains of music and sport, one might assume that the learning processes also exhibit similarities. In these domains, learning requires the acquisition and subsequent development of skill into a polished high-level state. This does not, however, necessitate that the processes or means of achieving the learning are the same across domains. It is possible, therefore, that the task of learning may not be comparable, because of hidden differences.
27. Knowing the construct so an instrument can be accurately created and used to assess people’s views and beliefs.
28. What to think about – many things to consider. Ultimately, when designing an instrument it is essential to know your sample. This image is about knowing what the ingredients are, knowing how they work together, and then you can hypothesise (and actually test) how to create something… my analogy/metaphor is about ingredients and making a cake. 🙂
ps I had so much fun!! Thank you Rodolfo for the invitation.