We’re doing an important upgrade to our website. It will not take long, but Cambridge.org will be unavailable on Monday 21st October for 15 minutes only from 04.30 BST/11.30 EDT

# Code for chapter 19

#### For instructors:

Below are listed all the scripts as shown in chapter 19 of the book. These are free to copy and paste into your code editor.

## Script 19.1

 script_19_1.awk, file.dat ```# Script 19.1 # Processing of the ASCII-file file.tab with tab-separated values awk ' BEGIN {FS="\t"} (NR>1){ if (\$1>2) {print \$1,\$2-\$4 } else {print \$1,\$3} n=n+1 } END {print "This is the End after processing "n" lines."} ' file.dat ``` script_19_1.out ```[ah@hobbes Documents]\$ ./script_19_1.awk 1 S1 Male 66.3 20 TRUE 2 S2 Male 80.2 21 FALSE 0 3 S3 Female 53.1 29 TRUE 0 4 S4 Female 21.3 18 TRUE 0 5 S5 Female 54.6 NA FALSE 0 6 S6 Female 78.1 23 TRUE 0 This is the End after processing 6 lines. ```

## Script 19.2

 script_19_2.py ```# Script 19.2 # Calculate the sum of squares of data in an array s = 0 data = [3, -1, 2, -5, None, 9, -2] for d in data: if not d is None: s += d * d print('The sum of squares is ',s) ``` script_19_2.out ```[ah@hobbes Documents]\$ python script_19_2.py ('The sum of squares is ', 124) ```

## Script 19.3

 script_19_3.R ```# Script 19.3 # Plotting a normal distribution x<-seq(-4,4,length=100)dx<-dnorm(x,mean=0,sd=1)plot(x,dx,type="l",main="Normal Distribution")p1<-seq(-4,-1.5,length=100)dp1<-dnorm(p1)p2<-seq(1.5,4,length=100)dp2<-dnorm(p2)polygon(c(-4,p1,-1.5),c(0,dp1,0),col="gray") ``` script_19_3.png

## Script 19.4

 script_19_4.R ```# Script 19.4 library() # List packages available, not necessarily active data(agriculture) # Attempt to load the Agriculture dataset bannerplot(agnes(agriculture), main = "Bannerplot") # Run bannerplot function library(cluster) # Load the cluster library and try again data(agriculture) bannerplot(agnes(agriculture), main = "Bannerplot") ``` script_19_4.png ` `

## Script 19.7

 script_19_7.R ```# Script 19.7 x<-3 # Set x to hold the value 1 print(x) # Its value can be displayed simply with the print() command x # Or even simply by stating its name x+x # Simple operations are straight forward x*x sqrt(x) # even the square root, using the built-in function sqrt() y<- -3.4 y x-y # Calculate the difference between numbers x and y t<-"Text" # Set t to hold the String "Text" t x-t # Attempt to calculate the difference between x and t which fails typeof(x) # x is a number with double precision typeof(t) # t is a string with character type z<- TRUE # Set a BOOLEAN value TRUE to z z x-z # Calculate the difference between x and z provides an answer typeof(z) typeof(x-z) # The BOOLEAN logical type TRUE is transformed into value 1 in double type 3-1=2``` script_19_7.out ```[ah@hobbes Documents]\$ R R version 3.3.1 (2016-06-21) -- "Bug in Your Hair" Copyright (C) 2016 The R Foundation for Statistical Computing Platform: x86_64-redhat-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. > # Script 19.7 > x print(x) # Its value can be displayed simply with the print() command [1] 3 > > x # Or even simply by stating its name [1] 3 > > x+x # Simple operations are straight forward [1] 6 > > x*x [1] 9 > > sqrt(x) # even the square root, using the built-in function sqrt() [1] 1.732051 > > y y [1] -3.4 > > x-y # Calculate the difference between numbers x and y [1] 6.4 > > t t [1] "Text" > > x-t # Attempt to calculate the difference between x and t which fails Error in x - t : non-numeric argument to binary operator > > typeof(x) # x is a number with double precision [1] "double" > > typeof(t) # t is a string with character type [1] "character" > > z z [1] TRUE > > x-z # Calculate the difference between x and z provides an answer [1] 2 > > typeof(z) [1] "logical" > > typeof(x-z) # The BOOLEAN logical type TRUE is transformed into value 1 in double type 3-1=2 [1] "double" > ```

## Script 19.8

 script_19_8.R ```# Script 19.8 v <- c(10, 1.6, 1, 2.6, 1.7) # Create a vector of length 5 with set values v w<-1:10 # Create a vector of length 10 w v+1 # One can operate on all numbers v*w # Even between vectors sqrt(v) # Or more complex operation like square root mean(v) # Average across the values of the vector sd(v) # Standard Deviation sum(v) # Sum length(v) # Length of the vector``` script_19_8.out ```[ah@hobbes Documents]\$ R R version 3.3.1 (2016-06-21) -- "Bug in Your Hair" Copyright (C) 2016 The R Foundation for Statistical Computing Platform: x86_64-redhat-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. > # Script 19.8 > v v [1] 10.0 1.6 1.0 2.6 1.7 > w w [1] 1 2 3 4 5 6 7 8 9 10 > v+1 # One can operate on all numbers [1] 11.0 2.6 2.0 3.6 2.7 > v*w # Even between vectors [1] 10.0 3.2 3.0 10.4 8.5 60.0 11.2 8.0 23.4 17.0 > sqrt(v) # Or more complex operation like square root [1] 3.162278 1.264911 1.000000 1.612452 1.303840 > mean(v) # Average across the values of the vector [1] 3.38 > sd(v) # Standard Deviation [1] 3.744596 > sum(v) # Sum [1] 16.9 > length(v) # Length of the vector [1] 5 > ```

## Script 19.9

 script_19_9.R ```# Script 19.9m<-array(2:7,dim=c(2,3))   # Create a 2x3 array containing values from 2 to 7mm[1,3]                     # Access a single cell by indexm[1,]                      # Access a single rowm[,c(1,3)]                 # Access a pair of columns ``` script_19_9.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.9> m<-array(2:7,dim=c(2,3))   # Create a 2x3 array containing values from 2 to 7> m     [,1] [,2] [,3][1,]    2    4    6[2,]    3    5    7> m[1,3]                     # Access a single cell by index[1] 6> m[1,]                      # Access a single row[1] 2 4 6> m[,c(1,3)]                 # Access a pair of columns     [,1] [,2][1,]    2    6[2,]    3    7> ```

## Script 19.10

 script_19_10.R ```# Script 19.10hair<-c("black","fair","black","black","brown","fair","white","brown","white","bald")f_hair<-factor(hair)    # Make a factor from this vectorf_hairlevels(f_hair)          # The categories are the levelslevels(hair)            # There are no levels in the original vector ``` script_19_10.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.10> hair<-c("black","fair","black","black","brown","fair","white","brown","white","bald")> f_hair<-factor(hair)    # Make a factor from this vector> f_hair [1] black fair  black black brown fair  white brown white bald Levels: bald black brown fair white> levels(f_hair)          # The categories are the levels[1] "bald"  "black" "brown" "fair"  "white"> levels(hair)            # There are no levels in the original vectorNULL> ```

## Script 19.11

 script_19_11.R ```# Script 19.11res <- list(test_name=c("t.test","Fisher_exact"), pval=0.001, df=3, err=c(0.4,0.7,0.9))     # List of 4 componentsresres\$pval             # Access the full element pvalres\$err              # Access the full element errres\$test_name[1]     # Access an element in an element ``` script_19_11.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.11> res <- list(test_name=c("t.test","Fisher_exact"), pval=0.001, df=3, err=c(0.4,0.7,0.9))     # List of 4 components> res\$test_name[1] "t.test"       "Fisher_exact"\$pval[1] 0.001\$df[1] 3\$err[1] 0.4 0.7 0.9> res\$pval             # Access the full element pval[1] 0.001> res\$err              # Access the full element err[1] 0.4 0.7 0.9> res\$test_name[1]     # Access an element in an element[1] "t.test"> ```

## Script 19.12

 script_19_12.R ```# Script 19.12# Read the table considering the first row as dataread.table("table.dat") ``` script_19_12.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.12> # Read the table considering the first row as data> read.table("table.dat")        V1     V2     V3  V4       V51 SampleID Gender Weight Age Affected2       S1   Male   66.3  20     TRUE3       S2   Male   80.2  21    FALSE4       S3 Female   53.1  29     TRUE5       S4 Female   21.3  18     TRUE6       S5 Female   54.6  12    FALSE7       S6 Female   78.1  23     TRUE> ```

## Script 19.13

 script_19_13.R ```# Script 19.13# Read the table and use the first row to name the columnsread.table("table.dat",header=T) ``` script_19_13.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.13> # Read the table and use the first row to name the columns> read.table("table.dat",header=T)  SampleID Gender Weight Age Affected1       S1   Male   66.3  20     TRUE2       S2   Male   80.2  21    FALSE3       S3 Female   53.1  29     TRUE4       S4 Female   21.3  18     TRUE5       S5 Female   54.6  12    FALSE6       S6 Female   78.1  23     TRUE> ```

## Script 19.14

 script_19_14.R ```# Script 19.14# Read the table with missing entriesread.table("table2.dat")    t<-read.table("table2.dat",header=T,sep="\t")print(t)typeof(t)   # The data is interpreted as a list of various subtypestypeof(t\$Age)typeof(t\$Affected)typeof(t\$Gender)summary(t)  # A summary of the values can be obtained according to the type ``` script_19_14.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.14> # Read the table with missing entries> read.table("table2.dat")Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :   line 6 did not have 5 elements> t<-read.table("table2.dat",header=T,sep="\t")> print(t)  SampleID Gender Weight Age Affected1       S1   Male   66.3  20     TRUE2       S2   Male   80.2  21    FALSE3       S3 Female   53.1  29     TRUE4       S4 Female   21.3  18     TRUE5       S5 Female   54.6  NA    FALSE6       S6 Female   78.1  23     TRUE> typeof(t)   # The data is interpreted as a list of various subtypes[1] "list"> typeof(t\$Age)[1] "integer"> typeof(t\$Affected)[1] "logical"> typeof(t\$Gender)[1] "integer"> summary(t)  # A summary of the values can be obtained according to the type SampleID    Gender      Weight           Age        Affected       S1:1     Female:4   Min.   :21.30   Min.   :18.0   Mode :logical   S2:1     Male  :2   1st Qu.:53.48   1st Qu.:20.0   FALSE:2         S3:1                Median :60.45   Median :21.0   TRUE :4         S4:1                Mean   :58.93   Mean   :22.2   NA's :0         S5:1                3rd Qu.:75.15   3rd Qu.:23.0                   S6:1                Max.   :80.20   Max.   :29.0                                                       NA's   :1                     > ```

## Script 19.15

 script_19_15.R ```# Script 19.15data(euro)                      # Load the built-in Euro conversion valueseurodata(package="cluster")         # List of datasets is accessible from packagesdata(flower, package="cluster") # Built-in data can be loaded from a packagedata(cars)dim(cars)cars[1:10,]                     # A subset of the dataset can be usedsummary(cars)                   # and relevant content displayed ``` script_19_15.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.15> data(euro)                      # Load the built-in Euro conversion values> euro        ATS         BEF         DEM         ESP         FIM         FRF   13.760300   40.339900    1.955830  166.386000    5.945730    6.559570         IEP         ITL         LUF         NLG         PTE    0.787564 1936.270000   40.339900    2.203710  200.482000 > data(package="cluster")         # List of datasets is accessible from packagesData sets in package 'cluster':agriculture              European Union Agricultural Workforcesanimals                  Attributes of AnimalschorSub                  Subset of C-horizon of Kola Dataflower                   Flower CharacteristicsplantTraits              Plant Species Traits Datapluton                   Isotopic Composition Plutonium Batchesruspini                  Ruspini Datavotes.repub              Votes for Republican Candidate in Presidential                         Electionsxclara                   Bivariate Data Set with 3 Clusters> data(flower, package="cluster") # Built-in data can be loaded from a package> data(cars)> dim(cars)[1] 50  2> cars[1:10,]                     # A subset of the dataset can be used   speed dist1      4    22      4   103      7    44      7   225      8   166      9   107     10   188     10   269     10   3410    11   17> summary(cars)                   # and relevant content displayed     speed           dist        Min.   : 4.0   Min.   :  2.00   1st Qu.:12.0   1st Qu.: 26.00   Median :15.0   Median : 36.00   Mean   :15.4   Mean   : 42.98   3rd Qu.:19.0   3rd Qu.: 56.00   Max.   :25.0   Max.   :120.00  > ```

## Script 19.16

 script_19_16.R ```# Script 19.16# Cars dataset visualised as scatter plotdata(cars)dim(cars)names(cars)par(mfrow=c(2,1)) # Split the graphic in 2 rows and 1 columnplot(cars)  plot(cars,pch="+",xlab="Speed",ylab="Distance",col="red") # Or use several parameters ``` script_19_16.png script_19_16.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.16> # Cars dataset visualised as scatter plot> data(cars)> dim(cars)[1] 50  2> names(cars)[1] "speed" "dist" > par(mfrow=c(2,1)) # Split the graphic in 2 rows and 1 column> plot(cars)  > plot(cars,pch="+",xlab="Speed",ylab="Distance",col="red") # Or use several parameters> ```

## Script 19.17

 script_19_17.R ```# Script 19.17# Ruspini dataset visualised as scatter plotlibrary(cluster)head(ruspini) # Another set of x vs y dataplot(ruspini, main="Ruspini cluster set") # Highlights a clustering pattern ``` script_19_17.png script_19_17.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.17> # Ruspini dataset visualised as scatter plot> library(cluster)> head(ruspini) # Another set of x vs y data   x  y1  4 532  5 633 10 594  9 775 13 496 13 69> plot(ruspini, main="Ruspini cluster set") # Highlights a clustering pattern> ```

## Script 19.18

 script_19_18.R ```# Script 19.18# Titanic dataset visualised as scatter plotplot(Titanic,main="Titanic") ``` script_19_18.png script_19_18.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.18> # Titanic dataset visualised as scatter plot> plot(Titanic,main="Titanic")> ```

## Script 19.19

 script_19_19.R ```# Script 19.19# The Cars dataset visualised as histogrampar(mfrow=c(2,1))hist(cars\$speed)hist(cars\$speed,breaks=15,col="gray")abline(v=21,col="red") ``` script_19_19.png script_19_19.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.19> # The Cars dataset visualised as histogram> par(mfrow=c(2,1))> hist(cars\$speed)> hist(cars\$speed,breaks=15,col="gray")> abline(v=21,col="red")> ```

## Script 19.20

 script_19_20.R `# Script 19.20# A normal distribution with a mean of 0 and standard deviation of 1par(mfrow=c(2,1))        # Split the display in 2 rows and 1 columnx<-seq(-5,10,length=100)  # Interval from 5 to 10dx<-dnorm(x)             # Default normal distributionplot(x,dx,type="l",main="Normal Distribution") # Plot distribution` script_19_20.png script_19_20.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.20> # A normal distribution with a mean of 0 and standard deviation of 1> par(mfrow=c(2,1))        # Split the display in 2 rows and 1 column> x<-seq(-5,10,length=100)  # Interval from 5 to 10> dx<-dnorm(x)             # Default normal distribution> plot(x,dx,type="l",main="Normal Distribution") # Plot distribution> ```

## Script 19.21

 script_19_21.R ```# Script 19.21# A normal distribution with a mean of 0 and standard deviation of 1par(mfrow=c(2,1))        # Split the display in 2 rows and 1 columnx<-seq(-5,10,length=100)  # Interval from 5 to 10dx<-dnorm(x)             # Default normal distributionplot(x,dx,type="l",main="Normal Distribution") # Plot distribution# Another distribution with mean of 4 and standard deviation of 1.5x2<-xdx<-dnorm(x2,mean=4,sd=1.5)plot(x2,dx2,type="l",main="Other Distribution") # Plot distributionabline(v=4,col="red")        # Draw vertical line in red at meanabline(v=0,col="blue")       # Draw vertical line in blue at 0mtext(text="mean=4, sd=1.5") # Include extra text on top ``` script_19_21.png script_19_21.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.21> # A normal distribution with a mean of 0 and standard deviation of 1> par(mfrow=c(2,1))        # Split the display in 2 rows and 1 column> x<-seq(-5,10,length=100)  # Interval from 5 to 10> dx<-dnorm(x)             # Default normal distribution> plot(x,dx,type="l",main="Normal Distribution") # Plot distribution> # Another distribution with mean of 4 and standard deviation of 1.5> x2<-x> dx<-dnorm(x2,mean=4,sd=1.5)> plot(x2,dx2,type="l",main="Other Distribution") # Plot distributionError in xy.coords(x, y, xlabel, ylabel, log) : object 'dx2' not found> abline(v=4,col="red")        # Draw vertical line in red at mean> abline(v=0,col="blue")       # Draw vertical line in blue at 0> mtext(text="mean=4, sd=1.5") # Include extra text on top> ```

## Script 19.22

 script_19_22.R ```# Script 19.22# Z-transformation of the Cars datasetpar(mfrow=c(2,1))s<-cars\$speedzs<-(s-mean(s))/sd(s)hist(cars\$speed,breaks=10,col="gray",xlim=c(-5,25),main="Speed Distribution",xlab="Speed")hist(zs,breaks=10,xlim=c(-5,25),main="Z Speed Distribution",xlab="Z transform of Speed") ``` script_19_22.png script_19_22.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.22> # Z-transformation of the Cars dataset> par(mfrow=c(2,1))> s<-cars\$speed> zs<-(s-mean(s))/sd(s)> hist(cars\$speed,breaks=10,col="gray",xlim=c(-5,25),main="Speed Distribution",xlab="Speed")> hist(zs,breaks=10,xlim=c(-5,25),main="Z Speed Distribution",xlab="Z transform of Speed")> ```

## Script 19.23

 script_19_23.R ```# Script 19.23# Box plot of a dataset of smoking-related deaths# breslow dataset available from# https://vincentarelbundock.github.io/Rdatasets/datasets.htmlbreslow<-read.table("breslow.dat",header=T)boxplot(y~smoke,data=breslow,main="Smoking-related deaths",xlab="Smoking",ylab="Number of related deaths") ``` script_19_23.png script_19_23.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.23> # Box plot of a dataset of smoking-related deaths> # breslow dataset available from> # https://vincentarelbundock.github.io/Rdatasets/datasets.html> breslow<-read.table("breslow.dat",header=T)> boxplot(y~smoke,data=breslow,main="Smoking-related deaths",xlab="Smoking",ylab="Number of related deaths")> ```

## Script 19.24

 script_19_24.R ```# Script 19.24# Heatmap representation of the data with automatic construction of # dendrogram to reflect the organisation of the datapar(mfrow=c(2,1))x<-as.matrix(swiss) # Investigate the fertility in Swiss cantonshead(x)heatmap(as.matrix(swiss)) ``` script_19_24.png script_19_24.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.24> # Heatmap representation of the data with automatic construction of > # dendrogram to reflect the organisation of the data> par(mfrow=c(2,1))> x<-as.matrix(swiss) # Investigate the fertility in Swiss cantons> head(x)             Fertility Agriculture Examination Education CatholicCourtelary        80.2        17.0          15        12     9.96Delemont          83.1        45.1           6         9    84.84Franches-Mnt      92.5        39.7           5         5    93.40Moutier           85.8        36.5          12         7    33.77Neuveville        76.9        43.5          17        15     5.16Porrentruy        76.1        35.3           9         7    90.57             Infant.MortalityCourtelary               22.2Delemont                 22.2Franches-Mnt             20.2Moutier                  20.3Neuveville               20.6Porrentruy               26.6> heatmap(as.matrix(swiss))> ```

## Script 19.25

 script_19_25.R `# Script 19.25# Same plot with different colour scheme and added colour on the# sides from a Rainbow palettepar(mfrow=c(2,1))x<-as.matrix(swiss) # Investigate the fertility in Swiss cantonsrc<-rainbow(nrow(x),start=0, end =.3)cc<-rainbow(ncol(x),start=0, end =.3)heatmap(x,col=cm.colors(256),scale="column",RowSideColors=rc, ColSideColors=cc, xlab="Specification variables", ylab="Cantons",main="Heatmap")` script_19_25.png script_19_25.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.25> # Same plot with different colour scheme and added colour on the> # sides from a Rainbow palette> par(mfrow=c(2,1))> x<-as.matrix(swiss) # Investigate the fertility in Swiss cantons> rc<-rainbow(nrow(x),start=0, end =.3)> cc<-rainbow(ncol(x),start=0, end =.3)> heatmap(x,col=cm.colors(256),scale="column",RowSideColors=rc, ColSideColors=cc, xlab="Specification variables", ylab="Cantons",main="Heatmap")> ```

## Script 19.28

 script_19_28.R ```# Script 19.28# Create factors with value labels library(ggplot2)mtcars\$gear <- factor(mtcars\$gear,levels=c(3,4,5),labels=c("3gears","4gears","5gears")) mtcars\$am <- factor(mtcars\$am,levels=c(0,1),labels=c("Automatic","Manual")) mtcars\$cyl <- factor(mtcars\$cyl,levels=c(4,6,8),labels=c("4cyl","6cyl","8cyl")) qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),main="Distribution of Gas Milage", xlab="Miles Per Gallon",ylab="Density") ``` script_19_28.png script_19_28.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.28> # Create factors with value labels > library(ggplot2)Need help? Try the ggplot2 mailing list:http://groups.google.com/group/ggplot2.> mtcars\$gear <- factor(mtcars\$gear,levels=c(3,4,5),labels=c("3gears","4gears","5gears")) > mtcars\$am <- factor(mtcars\$am,levels=c(0,1),labels=c("Automatic","Manual")) > mtcars\$cyl <- factor(mtcars\$cyl,levels=c(4,6,8),labels=c("4cyl","6cyl","8cyl")) > qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),main="Distribution of Gas Milage", xlab="Miles Per Gallon",ylab="Density")> ```

## Script 19.29

 script_19_29.R ```# Script 19.29# Scatter plots of mpg vs. hp # for each combination of gears and cylinders# in each facet, transmission type is represented by shape and colourlibrary(ggplot2)mtcars\$gear <- factor(mtcars\$gear,levels=c(3,4,5),labels=c("3gears","4gears","5gears")) mtcars\$am <- factor(mtcars\$am,levels=c(0,1),labels=c("Automatic","Manual")) mtcars\$cyl <- factor(mtcars\$cyl,levels=c(4,6,8),labels=c("4cyl","6cyl","8cyl"))qplot(hp, mpg, data=mtcars, shape=am, color=am,facets=gear~cyl, size=I(3),xlab="Horsepower", ylab="Miles per Gallon") ``` script_19_29.png script_19_29.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.29> # Scatter plots of mpg vs. hp > # for each combination of gears and cylinders> # in each facet, transmission type is represented by shape and colour> library(ggplot2)> mtcars\$gear <- factor(mtcars\$gear,levels=c(3,4,5),labels=c("3gears","4gears","5gears")) > mtcars\$am <- factor(mtcars\$am,levels=c(0,1),labels=c("Automatic","Manual")) > mtcars\$cyl <- factor(mtcars\$cyl,levels=c(4,6,8),labels=c("4cyl","6cyl","8cyl"))> qplot(hp, mpg, data=mtcars, shape=am, color=am,facets=gear~cyl, size=I(3),xlab="Horsepower", ylab="Miles per Gallon") > ```

## Script 19.30

 script_19_30.R `# Script 19.30# Separate regressions of mpg on weight for each number of cylinderslibrary(ggplot2)mtcars\$gear <- factor(mtcars\$gear,levels=c(3,4,5),labels=c("3gears","4gears","5gears")) mtcars\$am <- factor(mtcars\$am,levels=c(0,1),labels=c("Automatic","Manual")) mtcars\$cyl <- factor(mtcars\$cyl,levels=c(4,6,8),labels=c("4cyl","6cyl","8cyl"))qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"),method="lm", formula=y~x, color=cyl, main="Regression of MPG on Weight",xlab="Weight", ylab="Miles per Gallon")` script_19_30.png script_19_30.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.30> # Separate regressions of mpg on weight for each number of cylinders> library(ggplot2)> mtcars\$gear <- factor(mtcars\$gear,levels=c(3,4,5),labels=c("3gears","4gears","5gears")) > mtcars\$am <- factor(mtcars\$am,levels=c(0,1),labels=c("Automatic","Manual")) > mtcars\$cyl <- factor(mtcars\$cyl,levels=c(4,6,8),labels=c("4cyl","6cyl","8cyl"))> qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"),method="lm", formula=y~x, color=cyl, main="Regression of MPG on Weight",xlab="Weight", ylab="Miles per Gallon")Warning: Ignoring unknown parameters: method, formula> ```

## Script 19.31

 script_19_31.R ```# Script 19.31# Box plots of mpg by number of gears # observations (points) are overlayed and jitteredlibrary(ggplot2)mtcars\$gear <- factor(mtcars\$gear,levels=c(3,4,5),labels=c("3gears","4gears","5gears")) mtcars\$am <- factor(mtcars\$am,levels=c(0,1),labels=c("Automatic","Manual")) mtcars\$cyl <- factor(mtcars\$cyl,levels=c(4,6,8),labels=c("4cyl","6cyl","8cyl"))qplot(gear, mpg, data=mtcars, geom=c("boxplot", "jitter"),fill=gear, main="Mileage by Gear Number",xlab="", ylab="Miles per Gallon") ``` script_19_31.png script_19_31.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.31> # Box plots of mpg by number of gears > # observations (points) are overlayed and jittered> library(ggplot2)> mtcars\$gear <- factor(mtcars\$gear,levels=c(3,4,5),labels=c("3gears","4gears","5gears")) > mtcars\$am <- factor(mtcars\$am,levels=c(0,1),labels=c("Automatic","Manual")) > mtcars\$cyl <- factor(mtcars\$cyl,levels=c(4,6,8),labels=c("4cyl","6cyl","8cyl"))> qplot(gear, mpg, data=mtcars, geom=c("boxplot", "jitter"),fill=gear, main="Mileage by Gear Number",xlab="", ylab="Miles per Gallon")> ```

## Script 19.32

 script_19_32.R ```# Script 19.32g<-c(1.1,2.0,3.2,3.7,5.1,8.6,10.4,15.2,18.7,25.3)h<-c(0.9,2.1,2.9,3.5,4.8,8.7,10.6,14.9,18.7,25.0)d<-g-hdmean(d)sum((d-mean(d))**2)sd(d)t.test(g,h,paired=TRUE) ``` script_19_32.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.32> g<-c(1.1,2.0,3.2,3.7,5.1,8.6,10.4,15.2,18.7,25.3)> h<-c(0.9,2.1,2.9,3.5,4.8,8.7,10.6,14.9,18.7,25.0)> d<-g-h> d [1]  0.2 -0.1  0.3  0.2  0.3 -0.1 -0.2  0.3  0.0  0.3> mean(d)[1] 0.12> sum((d-mean(d))**2)[1] 0.356> sd(d)[1] 0.1988858> t.test(g,h,paired=TRUE)    Paired t-testdata:  g and ht = 1.908, df = 9, p-value = 0.08875alternative hypothesis: true difference in means is not equal to 095 percent confidence interval: -0.02227432  0.26227432sample estimates:mean of the differences                    0.12 > ```

## Script 19.33

 script_19_33.R ```# Script 19.33# p-value in the normal probability density distributionx<-seq(-4,4,length=100)dx<-dnorm(x,mean=0,sd=1)plot(x,dx,type="l",main="Normal Distribution")p1<-seq(-4,-1.5,length=100)dp1<-dnorm(p1)p2<-seq(1.5,4,length=100)dp2<-dnorm(p2)polygon(c(-4,p1,-1.5),c(0,dp1,0),col="gray")polygon(c(1.5,p2,4),c(0,dp2,0),col="gray")abline(h=0) ``` script_19_33.png script_19_33.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.33> # p-value in the normal probability density distribution> x<-seq(-4,4,length=100)> dx<-dnorm(x,mean=0,sd=1)> plot(x,dx,type="l",main="Normal Distribution")> p1<-seq(-4,-1.5,length=100)> dp1<-dnorm(p1)> p2<-seq(1.5,4,length=100)> dp2<-dnorm(p2)> polygon(c(-4,p1,-1.5),c(0,dp1,0),col="gray")> polygon(c(1.5,p2,4),c(0,dp2,0),col="gray")> abline(h=0)> ```

## Script 19.34

 script_19_34.R ```# Script 19.34F<-matrix(c(24,26,35,15),nrow=2);Ffisher.test(F)chisq.test(F) ``` script_19_34.out ```[ah@hobbes Documents]\$ RR version 3.3.1 (2016-06-21) -- "Bug in Your Hair"Copyright (C) 2016 The R Foundation for Statistical ComputingPlatform: x86_64-redhat-linux-gnu (64-bit)R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type 'license()' or 'licence()' for distribution details.  Natural language support but running in an English localeR is a collaborative project with many contributors.Type 'contributors()' for more information and'citation()' on how to cite R or R packages in publications.Type 'demo()' for some demos, 'help()' for on-line help, or'help.start()' for an HTML browser interface to help.Type 'q()' to quit R.> # Script 19.34> F<-matrix(c(24,26,35,15),nrow=2);> F     [,1] [,2][1,]   24   35[2,]   26   15> fisher.test(F)    Fisher's Exact Test for Count Datadata:  Fp-value = 0.04143alternative hypothesis: true odds ratio is not equal to 195 percent confidence interval: 0.1599061 0.9685788sample estimates:odds ratio  0.3994095 > chisq.test(F)    Pearson's Chi-squared test with Yates' continuity correctiondata:  FX-squared = 4.1339, df = 1, p-value = 0.04203> ```

Not already registered? Create an account now. ×