Recommendation

International travel contributes substantially to the Australian economy. However, travel is often seasonal, and visitors mostly go to a few popular cities. To attract more potential travellers, we recommend that the minister responsible for Tourism Australia promote more of Australia's most breathtaking scenery online, including places in lesser-known cities that are rarely mentioned but worth visiting.


Evidence

Here is a glimpse of the large international flight dataset obtained from Australia's Bureau of Infrastructure and Transport Research Department. It records the number of flights and the maximum seat capacity offered by scheduled international airlines operating to and from Australia:

# Read the collected data: international airlines operating to and from Australia (flights in and out)
flights <- read.csv("flights.csv")  # base R; returns a data.frame
View(flights)  # opens RStudio's data viewer (interactive use only)
str(flights)
## 'data.frame':    89312 obs. of  15 variables:
##  $ Month             : int  37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 ...
##  $ In_Out            : chr  "I" "I" "I" "I" ...
##  $ Australian_City   : chr  "Adelaide" "Adelaide" "Adelaide" "Adelaide" ...
##  $ International_City: chr  "Denpasar" "Hong Kong" "Kuala Lumpur" "Singapore" ...
##  $ Airline           : chr  "Garuda Indonesia" "Cathay Pacific Airways" "Malaysia Airlines" "Qantas Airways" ...
##  $ Route             : chr  "DPS-ADL-MEL" "HKG-ADL-MEL" "KUL-ADL" "SIN-DRW-ADL-MEL" ...
##  $ Port_Country      : chr  "Indonesia" "Hong Kong (SAR)" "Malaysia" "Singapore" ...
##  $ Port_Region       : chr  "SE Asia" "NE Asia" "SE Asia" "SE Asia" ...
##  $ Service_Country   : chr  "Indonesia" "Hong Kong (SAR)" "Malaysia" "Singapore" ...
##  $ Service_Region    : chr  "SE Asia" "NE Asia" "SE Asia" "SE Asia" ...
##  $ Stops             : int  0 0 0 1 1 0 0 0 0 0 ...
##  $ All_Flights       : int  13 8 17 4 9 12 36 18 8 14 ...
##  $ Max_Seats         : int  3809 2008 4726 908 2038 3876 12624 2556 2296 5404 ...
##  $ Year              : int  2003 2003 2003 2003 2003 2003 2003 2003 2003 2003 ...
##  $ Month_num         : int  9 9 9 9 9 9 9 9 9 9 ...
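library(tidyverse)     # attaches dplyr, ggplot2 and readr, among others
library(RColorBrewer)  # colour palettes for the region bar plot
# Convert every character column (e.g. In_Out, Australian_City, Airline) to a factor
flights <- mutate(flights, across(where(is.character), as.factor))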
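The Month column appears to store spreadsheet date serial numbers. Under the 1899-12-30 origin used for Excel-style serials, 37865 corresponds to 1 September 2003, which matches Year = 2003 and Month_num = 9 in the first rows. The following is a minimal base-R sketch of the conversion, assuming that origin. The value 37895 is an illustrative second value and is not taken from the data.

```r
# 37865 is the first Month value in the str() output; 37895 is an illustrative extra value
month_serial <- c(37865, 37895)

# Assumption: Month counts days from the spreadsheet origin 1899-12-30
month_date <- as.Date(month_serial, origin = "1899-12-30")
print(month_date)

# Derive year and month number, which should agree with the Year and Month_num columns
year_from_serial  <- as.integer(format(month_date, "%Y"))
month_from_serial <- as.integer(format(month_date, "%m"))
```

In the full dataset the equivalent line would be `flights$Month_date <- as.Date(flights$Month, origin = "1899-12-30")`. The column name Month_date is hypothetical.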

# weight = All_Flights makes each bar sum the flights in its rows rather than count rows
flights_travelling <- ggplot(flights, aes(Australian_City, weight = All_Flights)) + geom_bar(aes(fill = In_Out), position = position_dodge2(preserve = "single")) + coord_flip() + theme_bw() + labs(title = "Total flights heading to and from Australia's main cities", x = "Australian cities", y = "Total accumulated flights", fill = "In (I) / Out (O)")
flights_travelling

Accumulated flights from 2003 to 2018 show Sydney, Melbourne and Brisbane leading, with the most flights in and out. Even combined, Perth, Adelaide, Cairns and the Gold Coast fall about six times short of them, and Canberra has far fewer still.
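Without a weight, geom_bar() counts rows. In this dataset, each row is a city, airline, route and month record with its own All_Flights count, so row counts and flight totals can differ.

The sketch below uses made-up toy numbers, not the real data. It contrasts the two measures and computes the kind of ratio described above: total flights of the top three cities divided by the combined total for Perth, Adelaide, Cairns and the Gold Coast.

```r
# Hypothetical toy rows shaped like the flights data:
# each row is one city/airline/route/month record, and All_Flights counts its flights
toy <- data.frame(
  Australian_City = c("Sydney", "Sydney", "Melbourne", "Brisbane",
                      "Perth", "Adelaide", "Cairns", "Gold Coast"),
  In_Out          = c("I", "O", "I", "O", "I", "O", "I", "I"),
  All_Flights     = c(120, 110, 90, 60, 20, 10, 8, 4)
)

# Counting rows (geom_bar() without a weight) versus summing flights per city
rows_per_city    <- table(toy$Australian_City)
flights_per_city <- tapply(toy$All_Flights, toy$Australian_City, sum)

# Ratio of the top three cities' flights to four smaller cities combined
top3   <- sum(flights_per_city[c("Sydney", "Melbourne", "Brisbane")])
others <- sum(flights_per_city[c("Perth", "Adelaide", "Cairns", "Gold Coast")])
ratio  <- top3 / others
ratio
```

With the real data, the same two lines of tapply() and sum() would run on flights$All_Flights and flights$Australian_City.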

Flights may be fewer in these areas because some are small with low populations, as in Cairns. Places close to popular cities, such as the Gold Coast near Brisbane, may also lead travellers to choose other transport, such as driving, instead of flying. Furthermore, lesser-known cities such as Adelaide in South Australia, or Canberra, may have smaller airports and fewer tourist attractions, making tourists less eager to visit.

# Flights in and out of Melbourne, Sydney and Brisbane between 2010 and 2017
Top_3_Aus_Cities <- flights %>%
  filter(Australian_City %in% c("Melbourne", "Sydney", "Brisbane"), Year >= 2010, Year <= 2017)
# weight = All_Flights sums flights per year instead of counting rows
ggplot(Top_3_Aus_Cities, aes(x = factor(Year), fill = Australian_City, weight = All_Flights)) + geom_bar(position = "dodge") + labs(title = "Total flights in and out of Australia's top 3 cities, 2010 - 2017", x = "Year", y = "Total flights (in and out)", fill = "Australian city")

Yearly totals for 2010 - 2017 flights in and out of Australia's top 3 cities are relatively stable, with slight fluctuations between 2012 and 2015. These may reflect shifts in travel demand. However, yearly totals alone cannot reveal seasonal patterns; a monthly breakdown is needed for that.
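The year-on-year percentage change of each city's total flights would put a number on "relatively stable" and "slight fluctuations". The following is a sketch with invented totals, not figures from the dataset.

```r
# Hypothetical yearly flight totals for one city (illustrative numbers only)
yearly <- c("2010" = 1000, "2011" = 1040, "2012" = 1010, "2013" = 1080, "2014" = 1050)

# Percentage change from each year to the next
pct_change <- 100 * diff(yearly) / head(yearly, -1)
round(pct_change, 1)
```

With the real data, the yearly totals could come from summing All_Flights by Year for each Australian_City, for example with tapply().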

# Major airlines serving Australia (Cathay Pacific Airways is Hong Kong-based)
Australian_renown_airlines <- flights %>%
  filter(Airline %in% c("Virgin Australia", "Australian Airlines", "Qantas Airways", "Cathay Pacific Airways", "Jetstar", "Tigerair Australia"), Month_num >= 1, Month_num <= 12)
# factor(Month_num) gives one box per month; Max_Seats appears to already total the seats over a row's flights
ggplot(Australian_renown_airlines, aes(x = factor(Month_num), y = log(Max_Seats), fill = Airline)) + geom_boxplot() + labs(title = "Seat capacity of major airlines serving Australia by month", x = "Month", y = "log(maximum seats per record)")

Major airlines serving Australia, such as Qantas, Virgin Australia, Tigerair Australia and the Hong Kong-based Cathay Pacific Airways, tend to offer more seat capacity during peak holiday seasons, e.g. around Easter and the summer and winter breaks, perhaps to meet demand.
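Max_Seats appears to total the seats across all flights in a row: the first rows work out to between about 140 and 390 seats per flight, which are typical aircraft sizes. On that reading, "larger flights" is better measured as seats per flight, Max_Seats / All_Flights. The following is a sketch using the first ten rows shown in the str() output.

```r
# First ten rows of All_Flights and Max_Seats, copied from the str() output above
all_flights <- c(13, 8, 17, 4, 9, 12, 36, 18, 8, 14)
max_seats   <- c(3809, 2008, 4726, 908, 2038, 3876, 12624, 2556, 2296, 5404)

# If Max_Seats totals seats over all of a row's flights, dividing gives seats per flight
seats_per_flight <- max_seats / all_flights
round(seats_per_flight)
range(seats_per_flight)
```

For the monthly comparison, this ratio could be plotted against factor(Month_num) in place of the log-capacity axis. That is a suggestion, not what the plot above shows.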

# Total flights per service region (sum of All_Flights rather than a count of rows)
counts1 <- tapply(flights$All_Flights, flights$Service_Region, sum)
barplot(counts1, las = 3, density = 20, angle = 36, col = brewer.pal(5, "Set2"), xlab = "Flight regions", ylab = "Accumulated international flights", cex.names = 0.52, cex.lab = 0.75)
# Red reference line at the average number of flights per region
abline(h = mean(counts1), col = "red")

A clear trend in the graph is that more flights serve regions close to Australia with well-developed economies, such as New Zealand, North East Asia and South East Asia. North America is a partial exception. It is geographically distant yet still has many flights, perhaps reflecting its wealth and strong travel demand.

The red line marks the average number of flights per region, so regions whose bars rise above it have more international flights than average. Its value is given by:

mean(counts1)
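Regions that exceed the average across regions can be listed directly rather than read off the plot. The following is a sketch with invented totals. Region labels other than "SE Asia" and "NE Asia" are assumed and have not been checked against the data.

```r
# Hypothetical flight totals per service region (illustrative numbers only)
region_flights <- c("NE Asia" = 500, "SE Asia" = 700, "New Zealand" = 900,
                    "North America" = 300, "Europe" = 100)

avg <- mean(region_flights)
# Regions whose totals are strictly above the average
above_average <- names(region_flights)[region_flights > avg]
above_average
```

On the real data, `names(counts1)[counts1 > mean(counts1)]` would apply the same test.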

Regression test: Is a larger total number of flights associated with a higher maximum seat capacity?

Hypotheses

H0: There is no linear association between total flights and maximum seat capacity (the regression slope is 0).
H1: There is a linear association between total flights and maximum seat capacity (the regression slope is not 0).

Assumptions

-> There is a linear relationship between the 2 variables
-> Residuals are independent and homoscedastic (constant variance)

   p4 = ggplot(flights, aes(All_Flights, Max_Seats)) + geom_point(aes(colour = All_Flights)) + geom_smooth(method = "lm", se = FALSE, col = "darkblue", formula = y~x) + scale_colour_gradient (low = "red", high = "violet") + labs(title = "Exploring correlation between seat capacity and total flights of all airlines", x= "Total flights", y = "Maximum seat capacity")
   p4

# First assumption (linearity) is supported below

The strong positive correlation coefficient below (r ≈ 0.90) indicates a strong linear relationship:

   cor(flights$All_Flights, flights$Max_Seats)
## [1] 0.8988735
fit <- lm(Max_Seats ~ All_Flights, data = flights)
p5 <- ggplot(data.frame(fitted = fitted(fit), resid = resid(fit)), aes(x = fitted, y = resid)) + geom_point(col = "darkorange1") + geom_hline(yintercept = 0, linetype = "dashed", colour = "royalblue3") + labs(title = "Residual plot: maximum seats regressed on total flights", x = "Fitted maximum seats", y = "Residuals")
p5

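Unfortunately, the fanning pattern in the residual plot indicates heteroscedasticity. The spread of the residuals grows as the fitted values increase, so the constant-variance assumption is violated. Other factors, such as differences in aircraft size between airlines and routes, may contribute. As a result, predictions from this simple model are not appropriate.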
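A rough numeric companion to the residual plot is to compare the residual spread for low and high fitted values. A ratio well above 1 matches the fanning described above. This sketch uses simulated data rather than the flights dataset. The median split is an informal check, not a formal test such as Breusch-Pagan.

```r
set.seed(1)
# Simulated data (not the flights data): the noise standard deviation grows with x,
# which produces a fan-shaped residual plot
x <- 1:200
y <- 250 * x + rnorm(200, sd = 20 * x)
fit <- lm(y ~ x)

# Split residuals at the median fitted value and compare their standard deviations
low_half <- fitted(fit) <= median(fitted(fit))
sd_low   <- sd(resid(fit)[low_half])
sd_high  <- sd(resid(fit)[!low_half])
sd_high / sd_low
```

Applied to the report's model, the same last four lines would follow `fit <- lm(Max_Seats ~ All_Flights, data = flights)`.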

  summary(lm(Max_Seats~All_Flights, data = flights))
## 
## Call:
## lm(formula = Max_Seats ~ All_Flights, data = flights)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13943.6  -1233.0   -158.9    984.2  15093.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 240.0336    13.6831   17.54   <2e-16 ***
## All_Flights 256.3843     0.4182  613.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2673 on 89310 degrees of freedom
## Multiple R-squared:  0.808,  Adjusted R-squared:  0.808 
## F-statistic: 3.758e+05 on 1 and 89310 DF,  p-value: < 2.2e-16
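The slope means that each additional flight adds about 256 seats of maximum capacity on average, roughly one typical aircraft. The following is a worked evaluation of the fitted line; 10 flights is an arbitrary illustrative value. Given the heteroscedasticity noted above, treat the result as a description of the average relationship rather than a reliable prediction.

```r
# Coefficients copied from the summary() output above
b0 <- 240.0336   # intercept
b1 <- 256.3843   # extra maximum seats per additional flight

# Fitted maximum seat capacity for a hypothetical record with 10 flights
flights_example <- 10
fitted_seats <- b0 + b1 * flights_example
fitted_seats
```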
# Pearson's correlation test (a t-test on r)

  cor.test(flights$Max_Seats, flights$All_Flights)
## 
##  Pearson's product-moment correlation
## 
## data:  flights$Max_Seats and flights$All_Flights
## t = 613.01, df = 89310, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8976066 0.9001254
## sample estimates:
##       cor 
## 0.8988735

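As t = 613.01 and the corresponding p-value (< 2.2 × 10^-16) is smaller than 0.05, we reject the null hypothesis. We conclude that there is a statistically significant positive linear association between total flights and maximum seat capacity (r = 0.90, R² = 0.808). Given the heteroscedasticity noted above, the p-value should be interpreted with some caution.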
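In simple linear regression, the t value from cor.test() and the slope's t value in summary() test the same hypothesis, which is why both equal 613.01. The sketch below recovers t from r and the degrees of freedom using the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2). It also checks that R-squared equals r squared.

```r
# Values copied from the cor.test() and summary() output above
r  <- 0.8988735
df <- 89310       # n - 2

# t statistic for testing a Pearson correlation
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# In simple linear regression, R-squared equals r squared
r_squared <- r^2
round(c(t = t_stat, R2 = r_squared), 3)
```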


Other Evidence

Tourism and education have long been major contributors to Australia's economic growth. Between 2016 and 2017, international education brought in $28.1 billion, rising to $32.4 billion a year later (Aird, 2017). Tourism, meanwhile, often brings an annual turnover exceeding 2.4 trillion (Ferguson & Sherrell, 2019). However, the number of tourists visiting Australia fluctuates greatly each year. Some visitors extend their stay, whilst others take short excursions before returning home or travelling elsewhere (Behrans, 2018). Annual tourism revenue therefore tends to vary. Observing flight trends can give better predictions of consumer demand and identify the regions that are most welcoming and attractive to visiting tourists.

References

Aird, G. (2017, November 7). How important are tourism and education exports? MacroBusiness. Retrieved November 16, 2020, from https://www.macrobusiness.com.au/2017/11/important-tourism-education-exports/

Behrans, A. (2018, November 14). Positive and negative impact of tourism on the economy. Positive Negative Impact. Retrieved from https://positivenegativeimpact.com/tourism-on-the-economy

Ferguson, H., & Sherrell, H. (2019, June 20). Overseas students in Australian higher education: a quick guide. Parliament of Australia. Retrieved November 16, 2020, from https://www.aph.gov.au/About_Parliament/Parliamentary_Departments/Parliamentary_Library/pubs/rp/rp1819/Quick_Guides/OverseasStudents

Data Science Reflection

The statistical skills and concepts learnt in DATA1001 improved my ability to visualise data and to think critically about large datasets, as applied in this report. These skills should also help me interpret data-heavy research articles in nutrition, immunology and pathology when I undertake the interdisciplinary projects required for my majors.

Acknowledgements

I was a little stuck on how to make a bar chart of yearly trends, so I modified some of the code from this post, with assistance from Helen Wu (tutor):

Dinka, E. Ed Digital Learning Platform. Retrieved 13 November 2020, from https://edstem.org/courses/4447/discussion/350475

I was initially stuck on how to add a regression line in ggplot, so I modified a little of the code from this post:

Trinh, E. Ed Digital Learning Platform. Retrieved 17 November 2020, from https://edstem.org/courses/4447/discussion/351413

I was stuck on producing a residual plot with ggplot and found help in this thread:

atiretoo. RStudio Community. Retrieved 15 November 2020, from https://community.rstudio.com/t/ggplot-makes-residual-plots/738