Practical Data Analysis with JMP, Third Edition. Robert Carver
ISBN: 9781642956122
Publisher: Ingram
This data table also has a large amount of missing data. Although the FAA attempts to record a full set of variables for each strike, some values are not known to the individual reporting the incident. In a JMP data table, a missing categorical value appears as a blank cell, and a missing continuous value appears as a dot.
In a bivariate analysis, we think in terms of a pair of observations for each bird strike. JMP analyzes only those incidents that have a complete pair of observations for whichever two columns we select.
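The idea of a "complete pair" can be illustrated outside JMP with a minimal Python sketch. This is not how JMP is implemented; it only mirrors the rule described above. The airport labels "A", "B", and "C", the row values, and the helper name `complete_pairs` are all hypothetical, and `None` stands in for a missing cell.

```python
# Hypothetical rows: airport labels "A"/"B"/"C" and all values are made up.
# None stands in for a missing cell (a blank or a dot in the JMP data table).
strikes = [
    {"AIRPORT": "A", "PHASE_OF_FLT": "Approach"},
    {"AIRPORT": "B", "PHASE_OF_FLT": None},
    {"AIRPORT": "C", "PHASE_OF_FLT": "Climb"},
    {"AIRPORT": "A", "PHASE_OF_FLT": None},
    {"AIRPORT": "B", "PHASE_OF_FLT": "Approach"},
]


def complete_pairs(rows, col_x, col_y):
    """Return only the rows in which both selected columns have a value."""
    return [
        r for r in rows
        if r.get(col_x) is not None and r.get(col_y) is not None
    ]


pairs = complete_pairs(strikes, "AIRPORT", "PHASE_OF_FLT")
print(len(strikes), "rows in the table;", len(pairs), "usable in the bivariate analysis")
```

In this toy table, two of the five rows lack a flight phase, so only three rows would contribute to an analysis of airport versus phase.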
Describing Covariation: Two Categorical Variables
At what point in a flight do bird strikes most often occur? Is it the same at all three airports? In our data table, we have two variables that identify the airport (the Airport code and City) and an ordinal variable identifying the Phase of the Flight at which the strike happened. We can use JMP to investigate the covariation of these categorical variables using a few different approaches.
1. Select Analyze ► Distribution. You should be quite familiar with this dialog box by now. Select the AIRPORT and PHASE_OF_FLT (phase of flight) columns, cast them into the Y role, and click OK.
JMP produces two univariate graphs with accompanying frequency tables for each of the two columns. In the default state, they reveal no information about patterns of covariation, but there are a few important things to notice. Look at the phase of flight data and see that several of the levels of this column are repeated, but with different capitalization. Our analysis will be simplified if we can treat APPROACH and Approach as the same thing.
A Digression: Recoding a Variable and Changing Value Order
When data are recorded by many individuals over a long time period, sometimes variations in spelling, punctuation, or capitalization creep into a set of data. Prior to analysis, we typically want to clean up and standardize such variation when we find it.
1. Make the data table the active tab of your project and select the column PHASE_OF_FLT.
2. Select Cols ► Recode. The Recode dialog box (Figure 4.1) lets us replace current values with new ones.
Figure 4.1: Recode Dialog Box
We have the option to preserve the original data and create a new column, and then individually type in text to replace values that we want to standardize. In this case, we see that 1,417 incidents happened on Approach and 1 happened on APPROACH.
3. In the New Values (16) column, replace APPROACH with Approach. Notice the changes in the dialog box that indicate that two old values that were previously treated as different are now grouped together.
4. Continue by recoding all the Landing Roll and Take-off run levels. When finished, your dialog box should look like Figure 4.2. Then click Recode.
Figure 4.2: Recoded Values
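For readers who want to see the logic of the recoding steps above in code form, here is a minimal Python sketch. It is an illustration under assumptions, not JMP's Recode feature: the function name `recode_phase` is hypothetical, and apart from APPROACH the raw spelling variants in the sample list are made up. As in the dialog box, the original values are preserved and the cleaned values go into a new list.

```python
from collections import Counter

# Target spellings named in the text above.
CANONICAL = ["Approach", "Landing Roll", "Take-off run"]
_lookup = {name.casefold(): name for name in CANONICAL}


def recode_phase(value):
    """Map capitalization variants to one canonical label; leave other values alone."""
    if value is None:
        return None  # missing stays missing
    return _lookup.get(value.strip().casefold(), value)


# Hypothetical raw values; the variants other than APPROACH are made up.
raw = ["Approach", "APPROACH", "Landing Roll", "LANDING ROLL",
       "Take-off Run", "Climb", None]
recoded = [recode_phase(v) for v in raw]  # plays the role of the new column

print(Counter(raw))      # variants are tallied as separate levels
print(Counter(recoded))  # variants are grouped under one label
```

Matching on a case-folded key is one design choice among several; an explicit old-to-new mapping, closer to typing replacements in the dialog box, would work equally well.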
Also notice that the flight phase values are in alphabetical order. It might be more useful to change the order to correspond to the chronology of a flight, and fortunately this is easy to do. The basic chronology of the phases in this data table is as follows:
● Departure
● Taxi (the FAA distinguishes between Taxi-out and Taxi-in, but here we just have “Taxi”, which occurs 4 times)
● Take-off run
● Climb
● Descent
● Approach
● Landing Roll
● Arrival
● Local (rare; apparently refers to strikes noted at an airport)
● Unknown
5. In the Columns pane of the data window, right-click the column PHASE_OF_FLT 2 and select Column Info…
6. Click Column Properties and select Value Order.
7. As shown in Figure 4.3, highlight Departure in the Custom Order list of values, and use the black arrows to move it to precede Approach. Note the blank line above Approach, representing missing values. We can leave that as the first item in the list.
Figure 4.3: Defining a Custom Value Order
8. Continue using the up and down arrows to place the value labels in the desired order. When you finish, click OK.
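The effect of a custom value order can be sketched in Python as a sort key. This is only an analogy to the Value Order column property, not its implementation; the names `PHASE_ORDER` and `order_key` are hypothetical, and the sample list of levels is made up. Missing values (`None` here) are kept first, as in the dialog box.

```python
# Chronological order of flight phases, taken from the list in the text.
PHASE_ORDER = [
    "Departure", "Taxi", "Take-off run", "Climb", "Descent",
    "Approach", "Landing Roll", "Arrival", "Local", "Unknown",
]
_rank = {phase: i for i, phase in enumerate(PHASE_ORDER, start=1)}


def order_key(phase):
    """Sort key: missing first, then flight chronology, then any unlisted level."""
    if phase is None:
        return 0
    return _rank.get(phase, len(PHASE_ORDER) + 1)


# Hypothetical set of levels, initially in no particular order.
levels = ["Approach", "Climb", None, "Landing Roll", "Departure", "Unknown"]
ordered = sorted(levels, key=order_key)
print(ordered)
```

Sorting a frequency table with this key lists the phases in the order a flight passes through them rather than alphabetically.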
Back to the Bivariate Analysis
1. Again, select Analyze ► Distribution and cast AIRPORT and PHASE_OF_FLT 2 columns into the Y role, and click OK.
Wildlife strikes are rare during the taxi phase (in either direction between the terminal and runway) and Arrival phase. We also should note that, although the FAA database does contain strike reports while flights are en route, these three airports did not report any such strikes. Consequently, the en route phase of flight does not even appear in the right-hand panel of this graph.
In contrast, more than 50% of the strikes occurred during the approach phase of the flight. Clicking the Approach bar highlights the bar in the right-hand graph and also highlights all approach-related observations in the left-hand graph. If the relevant dynamics are similar at the three airports, then we would expect approximately half of each of the three bars in the left chart to be darkened. However, this is not the case. We will explore this curious pattern shortly.
2. Click the bar representing Approach (the approach to the airport just prior to landing). We see something interesting, as shown in Figure 4.4.
Before going further, look at the frequency tables under each graph. Each table tallies the number of observations in each column category, the number of missing observations, and the number of distinct values (levels) within the column. The data table identifies the airport city for every bird strike, but has data about the flight phase for only 2,818 incidents. The other 593 are missing; they are unknown to history.
Figure 4.4: Two Linked Univariate Distributions
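The comparison made by clicking the Approach bar, namely whether about the same share of each airport's strikes occurred on approach, can be expressed as a small calculation. The Python sketch below is illustrative only: the airport labels, the rows, and the function name `approach_share_by_airport` are hypothetical, and the numbers are not taken from the FAA data. Rows with a missing value in either column are skipped, as in the bivariate analysis.

```python
from collections import Counter


def approach_share_by_airport(rows):
    """For each airport, return the fraction of complete rows in the Approach phase."""
    totals = Counter()
    approach = Counter()
    for airport, phase in rows:
        if airport is None or phase is None:
            continue  # incomplete pair: excluded from the comparison
        totals[airport] += 1
        if phase == "Approach":
            approach[airport] += 1
    return {a: approach[a] / totals[a] for a in totals}


# Hypothetical (airport, phase) pairs; None marks a missing phase.
rows = [
    ("A", "Approach"), ("A", "Approach"), ("A", "Climb"), ("A", "Descent"),
    ("B", "Approach"), ("B", "Climb"), ("B", "Climb"), ("B", "Take-off run"),
    ("B", None),
    ("C", "Approach"), ("C", "Approach"), ("C", "Approach"), ("C", "Landing Roll"),
]
shares = approach_share_by_airport(rows)
print(shares)
```

If the dynamics were similar at every airport, the shares would be close to one another; unequal shares, as in this made-up example, correspond to unevenly darkened bars in the linked graphs.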
The issue of missing data is quite common in observational and survey data, though introductory courses often bypass it as a topic. In a bivariate analysis, we will need two values for each observation. If one is present, but the other is missing, then that observation will be omitted from the analysis.
It behooves the analyst to think about why a value is missing, and whether missing observations share a common cause. The very fact that observations are missing could be informative in its own right. For the current discussion, it is only important to realize that univariate analyses might include observations that are excluded in a related bivariate analysis.
3. We can generate a different view of the same data by using the By feature of the Distribution platform. Click the red triangle next to the word Distributions in the current window and select Redo ► Relaunch Analysis. This reopens the dialog box that we used earlier. The two columns still appear in the Y, Columns box.
4. Drag the column name AIRPORT from its current position in the Y, Columns box into the By box and click OK.
The