Establishing best practices in the use of an upgraded airborne teaching laboratory

Since the 1980s, the National Flying Laboratory Centre (NFLC) has used the Jetstream family of aircraft as a flying classroom, providing university students and developing professionals with real-world exposure to theoretical concepts through practical flight test instruction. Recently, the Jetstream was replaced with a newer Saab-340B. This paper presents an experimental analysis of instruction using the Jetstream, compared against known best practices, to inform the replacement process. Flight activities were observed, and participating students (n = 60) were surveyed at four set intervals to establish their mood and interest towards the module. A pen-and-paper test, comparing what participants learned with a control group, was also administered. While the module was still able to excite, motivate and re-contextualise previously taught information for students, upgrades to the aging technology suite, specifically to support data analysis and briefing, were among the greatest needs for the newer aircraft.

In 2019, NFLC began the process of replacing its Jetstream-3102 aircraft with a newer and larger Saab-340B, with a view to enabling higher quality teaching and research. This paper concerns lessons on the now retired Jetstream.
The perception that practical work is integral to effective science and technology education in general is widely shared amongst teachers and students, as well as by the RAeS, particularly in the United Kingdom, where NFLC mainly operates. Research into flying classroom use also draws conclusions that support this view; however, a much larger body of surrounding evidence published over the last 40 years differs, describing practical work as a strategy that is difficult to implement effectively and is rarely implemented well [2][3][4].

Pedagogy of effective flying classrooms
Rather than the airborne environment offsetting the need for good pedagogy, the positive outcomes reported in flying classroom literature are likely due to practices built into professional aviation regulation and culture [5]. The aviation system of work, originally designed for pilot instruction, also constitutes good pedagogy for practical work of this type.
The main trait of the aviation training system, and of many effective practical work lessons, is adherence to a rehearsed format composed of: 1) theoretical knowledge instruction and synthesis; 2) pre-flight briefing; 3) exercise; 4) immediate debrief of key events; and 5) deeper analysis and synthesis. Lessons learned from the literature, both from the flying classroom and the wider pedagogical community, illustrate its value. Chiefly, practical work does not produce consistent learning on its own, but reinforces it [6]. Teachers and educators who employ practical work as an alternative to classroom instruction, rather than as a supplement, often find that students may do what is expected, but not learn what is expected [4]. This is especially notable for more experiential use-cases, where enabling a student to feel, or directly observe, theoretical phenomena first-hand is the express aim of the exercise [7,8]. Lewis et al. [9] and Stickland and Scanlon [10], for example, report on flying classroom activities that use experiential learning to demonstrate previously learned flight dynamics principles, where the timing, depth and relevance of the theoretical instruction provided ahead of the exercise were major considerations. This is reflected in where in the curriculum the activities were used: in the second academic year, building on fundamental material learned in the first. While practical work can promote more in-depth analysis when introducing new concepts and phenomena [11][12], consistently effective practical work tends to build on knowledge that has already been explored and taught.
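The five-phase format above can be read as an ordered checklist. The Python sketch below is purely illustrative: the `Phase` enum and the `check_lesson` helper are hypothetical names, not part of any NFLC tooling. It flags a lesson plan that omits a phase or runs phases out of sequence, such as a plan that stops once the exercise has finished.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Hypothetical encoding of the five-phase lesson format, in prescribed order."""
    THEORY = 1    # theoretical knowledge instruction and synthesis
    BRIEFING = 2  # pre-flight briefing
    EXERCISE = 3  # the airborne exercise
    DEBRIEF = 4   # immediate debrief of key events
    ANALYSIS = 5  # deeper analysis and synthesis


def check_lesson(plan: list[Phase]) -> list[str]:
    """Return the problems found in a lesson plan; an empty list means none."""
    problems = [f"missing phase: {phase.name}" for phase in Phase if phase not in plan]
    # Whatever phases are present must follow the prescribed order.
    if plan != sorted(plan):
        problems.append("phases out of order")
    return problems


# A plan that stops once the exercise has finished.
print(check_lesson([Phase.THEORY, Phase.BRIEFING, Phase.EXERCISE]))
```

Here the check reports the missing debrief and analysis phases, the post-exercise work the literature identifies as most often dropped.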
Evidence from general science studies, such as those by Jackman et al. [13], Watson et al. [14], Hofstein and Lunetta [15] and Harrison [6], shows a pattern of practical work being a more difficult setting in which to learn, owing to environmental factors such as temperature changes, accelerations, motion and sound [16,17]. Many aviation tasks are also sufficiently complex, on top of the environment, that careful management and mitigation are essential facets of safe training practice. Both the intrinsic load brought on by task complexity and the external load imposed by the environment [18,19] can be controlled through familiarisation and rehearsal, or even simulation in extreme cases [20,21]. This is the purpose of the pre-flight brief: to allow the student to mentally rehearse the exercise. Crucially, this is not the same as theoretical knowledge instruction. The complexity of flying classroom exercises has, in general, been shown to be minimal compared with that faced by flight crew, not outpacing the level of training required by Part-SPO's broad requirements. However, the nuances of how students are briefed for specific exercises remain major considerations [9,22].
It may be resource constraints driving the removal of post-exercise work [23], or the opinion that the exercise alone is sufficient for learning [3], but it is common practice for pedagogy in science education to stop once the exercise has finished [24]. Effective practical work does not stop there; instead, it makes use of further academic time that repeatedly requires students to recall the exercise over a longer period than the module itself. Committing temporary information gathered in flight from working memory into long-term memory requires a student to recall it multiple times. The approximate half-life of memory is 24 h [30], and other activity that draws a student's attention away (such are the demands of academia) will shorten it [31]. Hence, a common practice in professional flight test, mirrored in flying classroom lessons, is to compile a report detailing what happened, typically within 24 h [33], before future reference and analysis. Direct comparisons between this practice combined with further work and no other intervention show the former to be far more effective at improving recall and developing knowledge [33]. Yet this activity will likely take far longer than the exercise, and Slingerland et al. [22] find students far less inclined to perform 'the boring bits' themselves. In many ways, this portion of a practical work lesson requires as many resources as the exercise itself.
Summary of flying classroom literature:
Stickland and Scanlon [10]: Detailed the development of a flying classroom syllabus in unmodified gliders. The authors presented an anecdotal assessment that the intervention was generally effective, as well as favourable student opinions.
Trainelli et al. [25]: Discussed the design and implementation of an ultralight flying classroom and accompanying cost-effective flight test software. The authors state that the educational outcomes are more reflective of real-world practice than comparable classroom-based teaching.
Orio et al. [26]: Presented the preliminary design for onboard instrumentation in a light aircraft, intended for test pilot and flight test engineer education; the authors could not yet comment on its application.
Padfield [27]: Designed simulated flying handling qualities trials using Simulink and FlightGear, intended to familiarise students with testing terminology, requirements and test programme design. Commented on the positive and negative factors that appeared to impact the effectiveness of the module.
Muratore et al. [28]: Detailed the modernisation of an existing flying classroom's flight data display hardware, and discussed the educational merits of comparing simulation and real flight data.
Lewis et al. [9]: Presented student feedback on the National Flying Laboratory Centre's Jetstream (the subject of this paper), discussing its high affective value and perceived importance amongst students.
Bromfield and Belberov [29]: Designed a flying classroom module using off-the-shelf software in unmodified aircraft. Measured improvements to student learning qualitatively and quantitatively, comparing grades with previous iterations of the module, substantiated by student feedback.
Slingerland et al. [22]: Offered best practices for flying classroom design using a long-running flying classroom as the case study, and discussed the practical merits and disadvantages of flying classroom operations.
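To make the 24 h half-life of memory [30] concrete, the sketch below assumes the simplest possible model: unrehearsed material decays exponentially, so the fraction retained after t hours is 0.5^(t / 24). This is an illustrative assumption, not a claim from the cited work; real forgetting curves are not purely exponential, and each act of recall slows subsequent decay. The function name `retention` is hypothetical.

```python
def retention(hours: float, half_life_h: float = 24.0) -> float:
    """Fraction of unrehearsed material retained after `hours`, assuming
    simple exponential decay with the given half-life (an illustrative model)."""
    return 0.5 ** (hours / half_life_h)


# Compare debrief delays: same day, next day, and one week later.
for delay_h in (2, 24, 168):
    print(f"{delay_h:>4} h: {retention(delay_h):.1%} retained")
```

Under this deliberately crude model, half of the unrehearsed material is gone after a day and less than 1% remains after a week (0.5^7 ≈ 0.0078), which illustrates why reporting within 24 h front-loads recall.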
Overt discussion of an exercise's 'affective value' (the educationally beneficial moods and feelings arising from a lesson) is common in other forms of scientific practical work [34][35][36], but not in most aviation instruction. Practical work tends to be enjoyable [37], confidence-boosting [38], interesting [39,40], and motivating [41,42]. There is an evidence base suggesting that each construct aids student achievement (e.g. [39]), and this has made its way into official policy in some schools, or has even been used for behaviour control [43]. However, none holds much pedagogical value on its own. These facts have been referenced in flying classroom literature, where affect is treated as an explicit advantage [22,29]; however, these works refrain from explicitly referring to affect as a policy decision.

Motivation for study
Community acceptance of how aviation is done, rather than regulatory or systemic needs, is what will drive flying classroom operators to continue following this pedagogy. Compared with pilot instruction, there are fewer regulatory or systemic requirements for it. Most operators, including NFLC, operate under a separate annex of the operational regulations named CS-Part-SPO¹, under which there are legally only broad requirements to brief an occupant for their role in the aircraft [44], a role that, operationally, is neither safety-critical nor requires the use of unfamiliar equipment. This means other practical or pedagogical factors can still erode best practice. The Jetstream-3102 was initially sized for the demands of teaching aeroengineering between the 1980s and early 2000s [9]; these demands have since evolved along with the scale of operation, prompting NFLC to begin the process of replacing it. An assessment was therefore needed of which strengths its replacement, a larger Saab-340B, must reinforce, and which vulnerabilities it must address, to enable best accepted teaching practice for the next 20 years.

Study objectives
The objective of the work in this paper was to perform an experimentally derived analysis of a representative flying classroom module in practice, to compare it with what is known to be effective aviation and practical work pedagogy, and, from this analysis, to draw systemic recommendations that would serve to strengthen its use. This was done in 2019 using NFLC's previous Jetstream-3102 aircraft, shown in Fig. 1, as part of the learning process leading up to its Saab-340B replacement.

Course selected for study
The course selected for study is one of a series of standard exercises delivered by NFLC, in this case delivered internally to Postgraduate Aerospace Design Engineering students. Sixty students, all studying an Aerospace Design MSc, agreed to supply data for this work. Of these students, 32 contributed towards the academic trial, 40 contributed survey data, and 10 took part in a post-flight interview. Participation did not affect their academic progress. A pre-study survey indicated a professionally young, global cohort: the average age was 26 (SD = 4.1), and 19 different nationalities were represented. Eight students reported prior aerospace design or engineering experience, with a mean of 5.8 years (SD = 5.0). The flying experience of the cohort was, on average, low. Six students had previously been on board a single Jetstream flight; very few had any other form of active flying experience, although two held private pilot qualifications.
The objectives of the module were a mixture of procedural knowledge and skills (e.g. report results using collected flight test data) and declarative knowledge (e.g. explain the practical functions of the primary flying controls). This study mainly analysed changes to higher-order skills. As with Trainelli et al. [25], the demonstrators emphasised that flight test itself was not the aim, but an incidental means to explore "real" engineering: "[. . .] from a teaching perspective there's nothing like seeing it in practice. It puts theory into context; if you deal purely in theory, then everything is very logical, clean, follows the trend perfectly. One of the things flight test shows (or any experiment, with flight test as an example) is that the real world is not like that. The students feel the real conditions in the aircraft, and they get the variation of that in their data as well." Following a week of classroom lectures, students, in groups of 15, acted as flight test observers on board two flights: (1) measuring parameters related to aircraft performance; and (2) measuring parameters related to aircraft stability and handling. Students aggregated, analysed, then presented data across all flights. The purpose of these learning objectives was for students to remember set aerodynamic and flight dynamics knowledge, as well as to demonstrate the robustness of underlying theoretical principles by using real-world data in analysis work.
The Jetstream 3102 had a maximum take-off mass of 7,059 kg and a capacity of 15 students, two pilots, and one demonstrator who supervised and operated the data collection and display system. The aircraft was fitted with instrumentation allowing measurement of control surface positions, applied control forces, aircraft attitude in three dimensions, angle of attack and angle of sideslip, aircraft body rotation rates and accelerations, static and differential pressures for airspeed and altitude readings, position using the Global Positioning System (GPS) and an Inertial Reference System (IRS), and avionics including Distance Measuring Equipment (DME) and an Instrument Landing System (ILS). Data were reduced to meaningful units, which the demonstrator could transmit to seat-mounted displays inside the aircraft for the students to view, or record using LabVIEW.
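The paper does not describe how the onboard system reduced raw pressures to engineering units. As a hedged illustration only, the Python sketch below applies two textbook relations with International Standard Atmosphere (ISA) sea-level constants: calibrated airspeed from impact (differential) pressure using the subsonic compressible-flow relation, and pressure altitude from static pressure using the ISA troposphere model. Instrument and position errors, which a real flight test installation must calibrate out, are ignored, and the function names and example readings are hypothetical.

```python
import math

# ISA sea-level constants
P0 = 101325.0      # static pressure, Pa
RHO0 = 1.225       # air density, kg/m^3
T0 = 288.15        # temperature, K
LAPSE = 0.0065     # tropospheric lapse rate, K/m
R_AIR = 287.05287  # specific gas constant for air, J/(kg K)
G0 = 9.80665       # standard gravity, m/s^2
GAMMA = 1.4        # ratio of specific heats for air
A0 = math.sqrt(GAMMA * P0 / RHO0)  # sea-level speed of sound, about 340.3 m/s
MS_TO_KT = 1.0 / 0.514444          # converts m/s to knots


def calibrated_airspeed(qc_pa: float) -> float:
    """Calibrated airspeed (m/s) from impact pressure qc = p_total - p_static (Pa),
    using the subsonic compressible relation referenced to sea-level conditions."""
    return A0 * math.sqrt(5.0 * ((qc_pa / P0 + 1.0) ** (2.0 / 7.0) - 1.0))


def pressure_altitude(p_static_pa: float) -> float:
    """Pressure altitude (m) from static pressure; valid in the ISA troposphere (below 11 km)."""
    exponent = R_AIR * LAPSE / G0  # about 0.1903
    return (T0 / LAPSE) * (1.0 - (p_static_pa / P0) ** exponent)


# Hypothetical readings, not Jetstream data.
qc, p_s = 4000.0, 79500.0
print(f"{calibrated_airspeed(qc) * MS_TO_KT:.0f} kt CAS, "
      f"{pressure_altitude(p_s):.0f} m pressure altitude")
```

At low Mach numbers the airspeed relation reduces to the familiar incompressible form V = sqrt(2 qc / ρ0), which is why the two agree closely at the modest speeds flown in these exercises.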

Study design
A mixed-methods approach was adopted to capture both the qualitative and quantitative characteristics of the course in practice [45]. A controlled pen-and-paper academic trial was administered before and after the course to measure the change in student knowledge. The trial required students (n = 17) to complete three questions: questions 1 and 2 asked them to compile short flight test reports on, respectively, performance parameters and stability and handling parameters, using fictional test data; question 3 asked them to identify some stability characteristics from printed data traces. Students were briefed that they had 90 min and could complete the questions in any order. The trial featured a fictional aircraft of a different type from the Jetstream. The results were scored by two independent subject matter experts (SMEs) with flight dynamics and academic grading experience. The marking scheme for each question was a rubric based upon the seven-part paragraph common in flight test reporting [32], and assessed the strength of students' analysis, discussion, conclusions, recommendations, relation to real operations, and presentation. To reduce the chance that familiarity with the test would inflate scores, the surface details of the trial questions administered post-course were changed. A 'do-nothing' control group (n = 13) studying the same degree, who had not been exposed to the Jetstream course, took part in the trial to determine retest reliability [46]². The trial was first administered to a pilot study group to check for errors and readability. This trial considered the total learning response across the course as a whole without controlling for extraneous variables, an approach which Bracht and Glass [47] suggest has more external validity than testing isolated elements, particularly in the context of an educational package.
To determine practical task effectiveness and describe its difficulty, students were timed completing each task and were further asked to complete the NASA TLX self-report scale [48] post-flight, covering six subscales: mental load, physical load, effort, success, frustration and pace. TLX is normally used per task; here it was used to explore the effect of each flight as a whole. A total workload rating can be calculated as an average of the six raw scores, weighted according to each student's own judgement of the subscales' relative importance. However, the weighted total has not consistently been shown to be more, or less, valid than raw scores alone [49]. As a result, total load was not calculated, to avoid placing unnecessary demands on students.
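For readers unfamiliar with the instrument, the sketch below contrasts the two ways of combining the six NASA TLX ratings. It assumes the conventional weighting procedure, in which a respondent makes 15 pairwise choices of which subscale contributed more to workload, and a subscale's weight is the number of times it was chosen. The subscale keys map onto the paper's terms ('temporal' for pace, 'performance' for success); the function names and example student are hypothetical, and this study did not compute the weighted total.

```python
from itertools import combinations

# Standard NASA TLX subscales; 'temporal' corresponds to the paper's "pace"
# and 'performance' to its "success".
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings: dict[str, float]) -> float:
    """Unweighted ('raw') TLX: the plain mean of the six subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weights_from_pairs(choices: dict[tuple[str, str], str]) -> dict[str, int]:
    """Count how often each subscale was chosen across the 15 pairwise comparisons."""
    weights = {s: 0 for s in SUBSCALES}
    for pair in combinations(SUBSCALES, 2):
        winner = choices[pair]
        if winner not in pair:
            raise ValueError(f"choice {winner!r} is not one of {pair}")
        weights[winner] += 1
    return weights


def weighted_tlx(ratings: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted TLX: ratings weighted by the respondent's pairwise-comparison counts."""
    total_weight = sum(weights.values())  # 15 when every pair has been answered
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / total_weight


# Hypothetical student who always picks the first subscale of each pair.
choices = {pair: pair[0] for pair in combinations(SUBSCALES, 2)}
ratings = {"mental": 60, "physical": 30, "temporal": 30,
           "performance": 30, "effort": 30, "frustration": 30}
w = weights_from_pairs(choices)
print(raw_tlx(ratings), weighted_tlx(ratings, w))  # 35.0 40.0
```

The weighted total moves towards whichever subscales the respondent considers most important, which is the extra step (and extra student effort) the study chose to omit.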
A randomised subset of 12 students agreed to take part in a semi-structured debrief to discuss what they remembered from the flight and how they felt; their comments are quoted throughout this work using representative pseudonyms. The sampling strategy was not based on reaching content saturation, but on a practical maximum, following recommendations given by Onwuegbuzie and Collins [50]. The affective value of the course was also measured at four points (n = 48): before the course started, after flight 1, after flight 2, and at module completion. The survey measured four dimensions identified in other practical education research: Motivation, Interest, Enjoyment and Self-Confidence.

Results
The first flight took place shortly after a 2-h lecture covering the test plan, an explanation of the aircraft systems, and a safety brief; flight time was 26 min, of which students were active for 8 min. Comments made regarding each section of the flight are presented in Table 2. Just before the test, students seated towards the rear of the aircraft were instructed to move to the front row, moving the centre of gravity to the desired position. The demonstrator changed the seat displays to show the data console and repeated what information the students needed to record manually, then froze the screen to allow the students to write down the information displayed. No student appeared to have missed any data or remarked that they had struggled with the task; one elected to record extraneous information. Upon completion of data collection, students returned to their original positions for landing. As the instructor needed to fly other groups of students, the debrief was limited.
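Moving students forward shifts the centre of gravity because the CG is the mass-weighted mean of every item's longitudinal position (its arm). The minimal Python sketch below uses invented masses and arms, not Jetstream loading data, and shows that the CG shift equals the moved mass times the distance moved, divided by the total mass.

```python
def centre_of_gravity(items: list[tuple[float, float]]) -> float:
    """Longitudinal CG (m aft of an arbitrary datum) from (mass_kg, arm_m) pairs."""
    total_mass = sum(mass for mass, _ in items)
    total_moment = sum(mass * arm for mass, arm in items)
    return total_moment / total_mass


# Hypothetical loading: the aircraft (including everyone who stays seated),
# plus three 80 kg students who move from rear seats to the front row.
aircraft = (5000.0, 7.0)
rear = [(80.0, 10.0)] * 3
front = [(80.0, 5.0)] * 3

cg_before = centre_of_gravity([aircraft, *rear])
cg_after = centre_of_gravity([aircraft, *front])
print(f"CG moves forward by {cg_before - cg_after:.3f} m")  # = 240 kg * 5 m / 5240 kg
```

Because only the moved mass and the total mass enter the shift, a handful of students changing seats is enough to place the CG at a chosen test condition.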
The second flight, the following day, lasted 44 min, during which students actively recorded data for 5 min and were required to observe for a further 17 min. Students appeared far more familiar and comfortable with the environment, and were visibly less focused on distractions.
Following this, students were prompted to watch the flight data displays. The demonstrator also prompted all students to refer to their workbooks, which contained information about the manoeuvres to be flown. Students, having experienced a rapid change in conditions (the "Short Period" mode), freely recalled during debriefing the facts mentioned in the demonstrator's commentary.
The second demonstration was the phugoid mode, a longer-period oscillation in which the aircraft exchanges speed and altitude. The demonstrator commented: "[. . .] the pilot has released the elevator, feel the aircraft start to pitch nose down", then, "as the airspeed increases, we generate more lift, once we get beyond 160 kts that extra lift will feel as an increase in G, [in the] meantime the aircraft starts to pitch nose up." An example of what the students saw is shown in Fig. 2. Recall of facts, such as the relationship between G and airspeed and the instability of the mode, was better during debrief than for the other modes. Student feedback tended to concentrate on the experiential rather than the technical aspects of this mode.
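The long period of the phugoid can be estimated with Lanchester's classical approximation, which assumes constant angle of attack and neglects the net effect of thrust and drag: T ≈ π√2 V/g, where V is true airspeed and g is gravitational acceleration. The Python sketch below applies it, treating the 160 kt quoted by the demonstrator as a true airspeed for illustration only. The real Jetstream phugoid would differ because of drag and the difference between true and indicated airspeed at altitude.

```python
import math

KT_TO_MS = 0.514444  # metres per second in one knot
G0 = 9.80665         # standard gravity, m/s^2


def phugoid_period(tas_ms: float) -> float:
    """Lanchester approximation to the phugoid period (s): T = pi * sqrt(2) * V / g."""
    return math.pi * math.sqrt(2.0) * tas_ms / G0


v = 160 * KT_TO_MS  # 160 kt, treated here as true airspeed for illustration
print(f"V = {v:.1f} m/s, approximate phugoid period = {phugoid_period(v):.0f} s")
```

At about 82 m/s this gives a period of roughly 37 s, long enough for students to feel each phase of the speed and height exchange, in contrast with the rapid change of the short-period mode.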
During the Dutch roll (Fig. 3), the pilot made alternate left and right rudder inputs to yaw the aircraft back and forth, which also prompts a secondary rolling motion. This is visible to the students out of the window as the wingtip traces a circle around the horizon, and through the flight deck camera as the nose traces a figure 8. During debrief, all students recalled feeling only the lateral motion, not the shapes traced by the nose or wingtip.
The roll subsidence mode, the damping effect that limits roll rate, was the mode students least freely recalled, and it prompted little discussion. The data trace onboard the aircraft offered students the best explanation of what was happening, but this was not accessible to them afterwards.
The spiral mode demonstration (accelerating and descending with increasing bank angle and G) was the mode students discussed second most freely. The demonstrator prompted: Student comments on these again concentrated upon the experiential aspects, with emotive terms more often used than engineering ones.
Debriefing for both flights was given by the instructor the following week, when students were also asked to report whether they had completed the related coursework; two had. Figure 4 shows self-reported task load index (TLX) scores for each flight. Load and effort increased during the second flight, particularly physical load, but not unreasonably so. Students reported that they had sufficient spare capacity and desire to do more in each flight, and, crucially, were able to retain capacity to absorb information.

Educational effectiveness
Students were administered the pre-test 2 h prior to the start of the module; mean scores were 4.18 (SD = 3.57) and 6.38 (SD = 6.05) for the control (n = 13) and intervention (n = 17) groups, respectively. The post-test was administered 2 h after module completion, with mean scores of 5.90 (SD = 5.63) and 13.25 (SD = 9.80) for the control and intervention groups, respectively. An initial one-way ANOVA was performed, comparing the post-test scores of the intervention and control groups while controlling for pre-test scores. No statistically significant difference was observed (F = 3.874, p = 0.061, 1-β = 0.472); taken alone, this result would suggest the exercises were not effective.
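As a rough, unadjusted complement to the analysis above, a standardised effect size can be computed directly from the reported group means and standard deviations. The Python sketch below computes Cohen's d using a pooled standard deviation. It is not the authors' analysis: it ignores the pre-test covariate and cannot substitute for an adjusted analysis of the raw scores.

```python
import math


def cohens_d(mean_a: float, sd_a: float, n_a: int,
             mean_b: float, sd_b: float, n_b: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Summary statistics as reported (intervention n = 17, control n = 13).
pre = cohens_d(6.38, 6.05, 17, 4.18, 3.57, 13)
post = cohens_d(13.25, 9.80, 17, 5.90, 5.63, 13)
gain_intervention = 13.25 - 6.38
gain_control = 5.90 - 4.18
print(f"pre-test d = {pre:.2f}, post-test d = {post:.2f}")
print(f"mean gain: intervention {gain_intervention:.2f}, control {gain_control:.2f}")
```

On these figures the groups already differ by roughly 0.4 pooled standard deviations before the course and by roughly 0.9 afterwards, with mean gains of about 6.9 and 1.7 points respectively. The pre-test imbalance is one reason the comparison needs to control for pre-test scores.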
However, further analysis of individual questions shows a greater change than the combined score represents. Firstly, the control group showed statistically significant practice effects, but for Q1 only (Wilcoxon signed-rank test [51], z = 2.036, p = 0.042). Repeat analysis excluding the results of Q1 shows that the intervention group made a small but significant improvement compared with the control group (Wilcoxon signed-rank test, z = 2.661, p = 0.008). These results are shown in Table 3. Students made the largest improvement in initial identification of the problem they were presented with, but were not yet able to carry it through to reasonable conclusions.³
Some moods and feelings generated by the module are educationally beneficial. Figure 5 shows the change in different dimensions of positive affect throughout the module. All measures peak after the second flight and decrease sharply once the practical elements are finished, which suggests affect is a short-term advantage rather than a long-term outcome. One student commented that the experience reinforced a desire to enter the flight test professions; evidence from other authors suggests, however, that this may be temporary [43]. The value of affect in practical education like this lies in its role as a mechanism to hold a student's attention long enough to complete the supporting task or assignment.

Experiential value
When left to freely recall specific facts about the flights, students mentioned the more physically demanding experiences. They valued being able to feel aircraft motion during dynamic mode demonstrations, which they felt helped them to contextualise and visualise theoretical knowledge they already held. These experiences, coincidentally, related to the controlled trial questions that were less sensitive to practice effects, suggesting rote learning would be a less effective method. For example, one student suggested that the numerical data they recorded was not the main benefit compared with feeling what those numbers mean in action. A second had mixed feelings about the flight, describing it simultaneously as "amazing" and "so uncomfortable", but felt they could now attach judgement to numbers associated with dynamic modes. A third said that being on board the aircraft was important, because "my imagination would not have been sufficient". Prior work by Lewis et al. [9] indicated that students feel direct experience of aircraft operation, particularly the role of the pilot, would also be beneficial, and this view was repeated here. The avionics suite onboard the Jetstream-3102 does not allow for much integration of this; students noted that they lacked situational awareness of the flights as a whole, in a way that could be aided by modernisation in the cabin (e.g. moving maps). By comparison, the commentary and guidance given during dynamic demonstrations were considered to greatly enhance the experiential value of the flight, and, when coupled with the visual aids on the data screens, show the potential of such an upgrade.

Limitations
There are two primary limitations to this study. First, only a modestly powered test (1-β = 0.46) was achieved, which falls below the commonly accepted threshold for statistical confidence (1-β ≥ 0.8) but is reasonable for this type of exploratory research [45]. Because of this, the tests performed may not have been sensitive enough to detect a statistically significant effect; however, the supporting qualitative data suggest the improvement could be pedagogically meaningful. Second, only the acute affective and experiential value of the module was recorded; its impact on students in later professional life was not explored. However, based on prior evidence from related studies, a reasonable hypothesis is that affect, and recall of the related experiences, will tend to diminish over time [33,43].

Discussion and conclusions
Broadly, these findings resonate with other flying classroom literature, counter to the wider body of practical work knowledge. Chiefly, they indicate that well-implemented flying classroom instruction continues to confer a narrow but indispensable benefit in aero-engineering education. Students ended the module with a measurable improvement in their ability to identify different aircraft dynamic modes, and were beginning to calculate the effects that longitudinal centre of gravity (CG) position has on stability using real, non-idealised experimental data, a chief aim of the course. The flight test exercises themselves promoted interest, self-confidence, motivation and enjoyment among students, and first-hand experience of relevant aircraft manoeuvres gave students the impression that their understanding of aero-engineering knowledge had been deepened, consistent with the aspects in which students improved most during the controlled trial.
The results here, however, reinforce how sensitive practical work of this nature is to disturbances in procedure, emphasising the need to keep the facility up to date to cope with modern teaching demands. The two major vulnerabilities discovered were largely indicative of problems related to the aging Jetstream-3102 and its equipment, which had become increasingly challenging for instructors to compensate for.
Firstly, a passenger capacity of 15 is now too small to handle the volume of students on modern aeroengineering courses, requiring more flights than is optimal, causing aircraft maintenance intervals to come round sooner, and pressuring staff time that would otherwise be available to provide further teaching support. The pedagogical effect was borne out here, where scheduling limits imposed a longer than ideal gap between theoretical knowledge instruction and the exercise, and again between the exercise and the debrief. This affected recall of theoretical knowledge and of key facts regarding the flights once they were finished. The instructor also could not maximise the positive affective and experiential value the exercise confers, coinciding with lower coursework completion rates after a week. Secondly, even though task load indices showed students felt they had spare capacity, an initial flight was essential in developing capacity and situational awareness for the following exercises. Students would likely develop this capacity naturally if exposed to more flights, and NFLC offers courses in which students fly up to five times. Many clients, however, as in the example studied here, select two flights, or even one, and the capabilities of the data suites onboard the Jetstream-3102 were no longer sufficient to extract more performance from students in that timespan. Upgrades were relatively cost-prohibitive to install in that airframe.
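The capacity argument is simple arithmetic: every group of students must fly every exercise, so the number of flights scales with the number of groups as well as with the number of flights per student. The Python sketch below makes this explicit. `sorties_needed` is a hypothetical helper, the cohort of 60 is illustrative, and apart from the 15-seat Jetstream figure given above, the seat count in the example is invented rather than an NFLC figure for either aircraft.

```python
import math


def sorties_needed(students: int, seats: int, flights_per_student: int) -> int:
    """Flights required when every student flies every exercise, in groups of at most `seats`."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    groups = math.ceil(students / seats)
    return groups * flights_per_student


print(sorties_needed(60, 15, 2))  # -> 8   (15-seat cabin, two-flight course)
print(sorties_needed(60, 15, 5))  # -> 20  (15-seat cabin, five-flight course)
print(sorties_needed(60, 30, 2))  # -> 4   (hypothetical 30-seat cabin)
```

Because the group count is rounded up, even one student beyond a multiple of the cabin size adds a full set of sorties.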
Targeted use of technology may bridge this gap. Specific reference is made to students' situational awareness during flights, a problem that technology fitted to other training aircraft has greatly improved. Students' recall and debriefing would likely have benefited from being able to download and review data traces from their flights immediately.

Recommendations
The recommendations stemming from this study that relate to designing and implementing flying classroom activities in general are:
• Well-implemented flying classroom exercises can be highly effective. However, as in other branches of aviation, well-crafted initial 'air experience flights' permit familiarisation with the airborne environment and improve students' capacity to learn effectively.
• Airborne effectiveness is contingent on good ground-based pedagogy. Flying classroom activity will provide the best learning benefits where the remembering and reflection process is strengthened by debriefing students as soon as practicable.
• Affect generated by flight activities may be best exploited by using it to encourage students to engage with the necessary, but less interesting, academic work, understanding that it is a short-lived effect.
• Repeating this research once operations and teaching using the Saab-340B have matured, to directly compare and contrast the changes made, will further strengthen flying classroom literature.
Specifically, the hypothesised points for optimising teaching and learning on the Saab-340B or other similar airborne teaching environments are:
• The ability to fly more students allows for greater operational flexibility in larger courses (legal limits notwithstanding), and its effect on good pedagogy should be noticeable.
• Prioritise student situational awareness throughout the course; upgraded instrumentation should assist with this.
• Maximise the ability of any new instrumentation or avionics suites to let students use more examples of real-world data that they have personally experienced, or will experience, as an aid to theoretical knowledge instruction, briefing and debriefing.