A Statistical and Machine Learning Analysis
Although almost a decade has passed since the centenary of the RMS Titanic disaster, researchers are still searching for reasons why passengers with certain characteristics were more likely to survive than others. Using a privately available dataset, this study analysed passenger features to determine how important each was to the mortality of those onboard during her fateful maiden voyage. The characteristics studied were sex, ticket class, age, nationality, allocated cabin, port of boarding, number of companions travelled with, whether a spouse boarded with them and the purpose of their voyage.
This study was performed through a manual statistical analysis of the dataset and the use of two machine learning models: a Random Forest and a Linear Discriminant Analysis model. During training, each model weighted the importance of each passenger feature, which verified the patterns found in the initial statistical analysis.
In addition to those two models, a Support Vector Machine and a Neural Network were produced and compared to observe which classification method performed best. The Random Forest achieved the highest score and was subsequently integrated into a Graphical User Interface, which allowed the user to input theoretical passenger characteristics into the model to predict whether that passenger would have survived the disaster.
For those who suffered as a result of the 1912 RMS Titanic disaster, in memoriam.
Eternal Father! strong to save,
Whose arm hath bound the restless wave,
Who bid'st the mighty ocean deep
Its own appointed limits keep:
O hear us when we cry to Thee
For those in peril on the sea.
|Amidships||Centreline of a ship|
|Aport||Towards the left of a ship|
|Astarboard||Towards the starboard-side of a ship|
|Astern||Towards the rear of a ship|
|Forward||Towards the front of a ship|
|GUI||Graphical User Interface|
|Inboard||Towards the centre of a ship|
|LDA||Linear Discriminant Analysis|
|MLM||Machine Learning Models|
|Outboard||Towards out-most sides of a ship|
|Port||Left-side of a ship|
|RMS||Royal Mail Ship|
|Starboard||Right-side of a ship|
|SVM||Support Vector Machine|
1.1 Project Background
RMS (Royal Mail Ship) Titanic was a four-funnelled steamship built in 1912 for transatlantic crossings, designed with customer comfort in mind alongside the latest innovations in safety technology. Four days after leaving Southampton for New York, she struck an iceberg in the North Atlantic Ocean and sank, tragically taking 1496 of the 2208 souls onboard with her (Encyclopedia Titanica, 2018).
While conclusions regarding the causes of the catastrophe are already well-established (Garzke, Foecke, Matthias, & Wood, 2000), individuals from both professional and amateur backgrounds still analyse passenger datasets to determine the extent to which each passenger characteristic contributed to a passenger’s chance of survival. This project continues that research and confirms the widely recognised key determinants of survivability: sex, class and age. In addition, several new passenger attributes of varying importance were derived, prepared and analysed: nationality, number of travel companions and whether or not the passenger boarded with a spouse.
The common approach to RMS Titanic passenger analysis was to employ the publicly available dataset from Kaggle, who distributed it as the basis of their introductory machine learning competition (Kaggle, 2019). However, according to Vanderbilt University, who distribute a more complete version of this dataset, it had not been updated since 1999 (Cason, 2012). In light of this, far more up-to-date data was obtained for this project from the leading RMS Titanic online archive, Encyclopedia Titanica. The survivability patterns found from this extended dataset are not definitive, but rather exist to inspire others to pursue further research and to search archived primary sources to create a more complete explanation of mortality rates across different passenger characteristics.
1.2 Aims and Objectives
The primary aim of this project was to continue the study of how RMS Titanic passenger characteristics affected survival rates, employing the most up-to-date archive data available and using manual data analysis and machine learning techniques to analyse features that had not yet been studied.
The secondary aim was to provide the reader with the capability to program an MLM (Machine Learning Model) to predict survivability for theoretical passenger data.
This project had eight objectives:
1. Obtain the most up-to-date RMS Titanic passenger archive data.
2. Format the data from (1) into datasets useful for analytical purposes.
3. Perform a statistical analysis on the passenger population demographics using the dataset from (2).
4. Perform a historic study on how passenger characteristics affected their likelihood of surviving the disaster using the dataset from (2).
5. Employ several machine learning methods, training MLMs using the datasets from (2).
6. Compare the accuracy of MLMs from (5) in a variety of ways.
7. Numerically establish the importance of each passenger feature in survivability.
8. Create a GUI (Graphical User Interface) that employs the best-performing MLM from (6) to predict the outcome of a theoretical passenger, whose features are input by the user.
The raw dataset was gathered through a query run by the Encyclopedia Titanica Facebook team on their most up-to-date database at the time of data generation, November 2018.
An additional query request was made to Encyclopedia Titanica, who kindly sent data for pairs of passengers who were recorded as having travelled together. They cautioned that these queries were works-in-progress and so may not be fully representative of the actual passengers, meaning that any statistical patterns found may not be fully reliable.
Three different versions of the dataset were subsequently formatted appropriately for the purposes of statistical analysis and to train machine learning models that use different learning techniques.
Using Microsoft Excel, statistical analyses were manually carried out on the first dataset to determine the demographic makeup of the ship’s population and how each passenger characteristic affected survival rates. These analyses are located in APPENDIX A and APPENDIX B respectively; it is highly recommended that the reader study these.
Utilising the MATLAB 2018b Statistics and Machine Learning Toolbox and Deep Learning Toolbox, several MLMs were optimised and trained: an SVM (Support Vector Machine), an RF (Random Forest), an LDA (Linear Discriminant Analysis) and an NN (Neural Network).
The RF and LDA properties were investigated to observe how each algorithm had judged the importance of each passenger feature in survival probability. These values were subsequently used to verify the findings of the manual statistical analysis.
The generated MLMs were compared in a variety of ways to determine which performed best; the RF ranked highest.
A GUI was produced using the MATLAB 2018b App Designer that gave users the capability to enter custom passenger parameters into the RF and make it predict whether they would have been lost or saved in the disaster.
2 SUBJECT REVIEW
2.1 Data Gathering
The vast majority of RMS Titanic data analyses in the last few years have utilised the Kaggle dataset from 1999 (Carter, 2016) (Donges, 2018). Older analyses from the 1980s and 1990s, however, used a dataset that was released over a century ago (Hall, 1986). Overall, it was clear that no one had undertaken an analysis of a dataset that had been updated since 1999. What set this project apart from other RMS Titanic data analyses was that it employed a brand-new dataset that contained the most recent findings surrounding the passengers.
2.1 Use of Passenger Features
2.1.1 Travel Companions
The Kaggle dataset (Kaggle, 2019) represented the number of passenger travel companions by creating two combined features: the number of siblings/spouses and the number of parents/children on the same ticket number. This project differed from this approach by:
- Utilising a dataset that incorporated all passengers who travelled together in groups as companions, regardless of whether they were on the same ticket number.
- Separating the travel companion data into four distinct attributes: the number of adults, young adults, children and spouses that each passenger travelled with.
2.1.2 Age Categorisation
Previous analyses categorised passengers into age groups (i.e. children, adolescents and adults) without any historical justification (Cicoria, Sherlock, Muniswamaiah, & Clarke, 2014) (Carter, 2016). In this project, age grouping is instead performed by analysing historical sources to determine the ages at which people were judged to have transitioned through different stages of development.
2.1.3 Cabin Allocation
Because 77% of passengers had no allocated cabin data recorded in the Kaggle dataset, previous studies either removed the cabin attribute (Carter, 2016), replaced the blank data with ‘unknowns’ (Donges, 2018) or over-generalised the data by not specifying which part of the ship the cabin was located in (Kelley, 2014). This project instead makes use of the small volume of cabin data available by performing an in-depth analysis of it.
2.2 Importance of Passenger Features
The correlation coefficients (values of predictor performance) of RFs produced by Donges (2018) and Durojayne, et al. (2014) depicted in Table 1 show that passenger sex and ticket class were ranked within the three most important characteristics. Further confirming the importance of passenger sex in survivability, the report by Cicoria, Sherlock, & Muniswamaiah (2014) stated, ‘Sex clearly had the most significant relationship demonstrated within the dataset for survival rate’.
The number of siblings, spouses, parents and children a passenger boarded with, however, does not appear to have been particularly decisive in determining their likelihood of survival.
This project repeated this correlation coefficient analysis, but with new alternative features and an LDA as well as an RF.
Table 1—Feature Importance
|Feature||Correlation Coefficient (Donges, 2018)||Correlation Coefficient (Durojayne, et al., 2014)||Order of Importance (Donges, 2018)||Order of Importance (Durojayne, et al., 2014)|
|No. Siblings/Spouses Onboard||0.036||0.035||5||6|
|No. Parents/Children Onboard||0.022||0.082||6||4|
|Port of Embarkation||0.051||0.102||4||3|
2.3 Machine Learning Method
RF was the most widely used machine learning technique in previous data analyses (Kelley, 2014) (Durojayne, et al., 2014) and was the most accurate when compared to other methods (Donges, 2018). In light of this, it was expected that an RF would be the best-performing MLM in this project.
3 PREPARING THE DATASET FOR HISTORIC STUDIES
3.1 Articulating the Problem
The problem at hand was that of binary classification, as the MLM was to be trained using known data to predict a categorical response; whether a passenger was saved or lost in the sinking.
3.2 Gathering Data
The raw dataset was obtained through a query run by the Encyclopedia Titanica social media team on their most up-to-date database at the time of data generation.
An additional query request was made to Encyclopedia Titanica, who kindly sent data showing pairs of passengers who were recorded as having travelled together. They cautiously advised that it was an incomplete work-in-progress, so may not be fully representative of the actual passengers and patterns found may be incorrect.
3.3 Data Sampling
The raw data included everyone who boarded the Titanic from the time of the initial crossing from Belfast to Southampton onwards. In order to be purely representative of the passengers who were onboard at the time of the disaster, the ship’s crew and those who disembarked prior to the maiden voyage to New York were filtered out.
The passenger attributes in Table 2 were removed from the raw dataset as they did not affect a passenger’s probability of survival.
Table 2—Deleted Attributes
|Pseudo name||Name of spouse||Maiden name||Name suffix||Destination||Present for Delivery Trip|
|Maiden Voyage||Port of disembarkation||Servant to… [passenger]||Ship of body recovery||Street of last address||City of last address|
|Place of death||Place of burial||Date of burial||City of birth||County/state of birth||Date of Birth|
|Date of Death||Cause of Death||Encyclopedia Titanica URL||Occupation||Recovered body ID||Lifeboat|
3.4 Formatting Data
3.4.1 Standardising Titles
Titles were standardised for consistency (e.g. instances of ordained titles were changed to ‘Rev.’) and, where applicable, foreign titles were translated to English. Only the Mexican honorific title ‘Don’ wasn’t translated, as there was no equivalent English title.
In the raw data, the titles ‘Lady’ and ‘Countess’ were embedded as forenames. These were moved to the ‘Title’ attribute for standardisation.
3.4.2 Renaming Attributes
The following passenger attributes were renamed to make them more representative of the data they held:
- ‘Forename’ was renamed to ‘Forename(s)’.
- ‘Class/Dept’ was changed to ‘Class’.
- ‘Subgroup’ was altered to ‘Purpose of Voyage’.
- ‘nationality’ was renamed to ‘Nationality’.
- ‘ET URL.html’ was changed to ‘ET URL’.
- The country columns within ‘Last Address’ and ‘Birthplace’ were altered to ‘Last Country of Residence’ and ‘Country of Birth’ respectively.
3.4.3 Renaming Records
The following passenger records were adjusted to make them more user-friendly:
- ‘Y’ and ‘N’ in Survivor were relabelled ‘Saved’ and ‘Lost’ respectively.
- ‘M’ and ‘F’ in Sex were changed to ‘Male’ and ‘Female’ correspondingly.
- ‘1st Class’, ‘2nd Class’ and ‘3rd Class’ were altered to ‘First’, ‘Second’ and ‘Third’ accordingly.
- Blank records in Purpose of Voyage were labelled ‘Passenger’.
3.5 Discretising Age Records by Range
When age records were plotted individually, they did not show any pattern in relation to passenger mortality, as shown in Figure 1. In order for this attribute to show a correlation with survival rates, the data was discretised into age ranges spanning five years each (0-4 years, 5-9 years etc.), as shown in Figure 29 in APPENDIX B. The discretisation of age records supported the training of machine learning models, as each ‘Age Range’ category had an adequate number of samples, whereas the individual raw age records did not.
Figure 1—Passenger Survivability by Raw Age Records
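The five-year binning described in this section can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used in the project; the function name `age_range` and the label format are assumptions.

```python
import math


def age_range(age_years, width=5):
    """Return the label of the fixed-width age range containing age_years.

    Hypothetical helper: the 'lower-upper' label format is an assumption.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    lower = int(math.floor(age_years / width)) * width
    return f"{lower}-{lower + width - 1}"


# Illustrative ages only; these are not records from the dataset.
ages = [0.5, 4, 5, 28, 29.9, 71]
labels = [age_range(a) for a in ages]
print(labels)
```

Fractional ages (for example infants recorded in months) fall into the range containing them, so 29.9 is still labelled ‘25-29’.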
3.6 Discretising Age Records by Group
Age groups were aggregated within the historical context of the maritime industry in early 20th century Europe, around the time of the RMS Titanic catastrophe. The sources referenced in this section are from the Genoa International Labour Conference in 1920 and the British Wreck Commissioner’s Inquiry in 1912. Aggregation is performed in this specific context to replicate how White Star Line deck crewmen may have judged young passengers as children or adults while allocating places in lifeboats.
3.6.1 Adults
Throughout the Genoa International Labour Conference, people’s perceptions of the age at which adulthood began varied between 15 and 24 years, as described below:
- The Commission discussed how young men who prematurely began work in the stoke-hold (boiler and furnace compartment) were bound to get serious diseases. Additionally, they discussed how boys under 17 should not be left with the care of men. (Dahlén, 2007, p. 149)
- In light of their discussion, the Commission unanimously agreed to add two articles to the draft Convention (part of their report). The first of these fixed a minimum age of 18 years for working as stoke-hold firemen and coal trimmers and a minimum age of 17 years for working on night watch. (Dahlén, 2007, p. 149)
- The British shipowners’ delegate, Sir Cuthbert Laws, opposed the Commission’s higher minimum age for stoke-hold workers. He criticised the age-oriented view of when adulthood begins and consequently argued in favour of a view oriented around physical and mental development: “The principle test is … the state of … development … at which the young man … has arrived. We know that there are many youths of 17 who are much more developed than men of 24, and there are men of 24 who have less physical development than youths of 18 or 19.” (Dahlén, 2007, p. 152)
Based on the research embedded in this section, passengers who were 18 years and above were categorised as adults.
3.6.2 Children
Throughout the Genoa International Labour Conference, people’s perceptions of the age at which childhood ended varied between 6 and 12 years, as described below:
- The majority of the Commission agreed that children who went to sea at 12 years were deprived of an education, as the complex engine maintenance of post-industrialisation era steamships did not leave any time for study. This source indicates that it was widely known that children were being taken to work at sea on steamships at 12 years (this is relevant as Titanic was a steamship). (Dahlén, 2007, p. 148)
- During the plenary session of the conference, a Greek seaman stated that he took boys at 12 years to train on steamships. (Dahlén, 2007, p. 150)
- During the same session, the British seaman Henson stated that in the mind of the maritime employer, working ‘boys’ become adults nominally at 8-12 years, but in some circumstances as young as 6 years. (Dahlén, 2007, p. 152)
Based on the research contained in this section, passengers who were below 12 were categorised as children. Confirming this as a sensible grouping, a few months after the disaster the British Wreck Commissioner’s Inquiry classed passengers below the age of 13 as children in section 4 of their ‘Report on the Loss of the Titanic’ (British Wreck Commissioner's Inquiry, 1912). This was determined by comparing their number of ‘children’ onboard (109) with the number of passengers below 13 years of age in the Encyclopedia Titanica dataset (109).
3.6.3 Young Adults
Young adults were categorised as the intermediate range between adults and children, whereby they began to hold some adult responsibilities, such as caring for their younger siblings.
Based on the categorisation boundaries described in section 3.6.1 and section 3.6.2, the transitional age range is defined as passengers who were between 12 and 17.9 years.
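The three age groups defined in section 3.6 can be expressed as a short Python function. This is a sketch only; the function name `age_group` and the group labels are illustrative assumptions, while the boundaries (below 12, 12 to 17.9, 18 and above) are those stated in the text.

```python
def age_group(age_years):
    """Classify an age in years using the boundaries given in section 3.6.

    Hypothetical helper: the label strings are assumptions.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years < 12:
        return "Child"
    if age_years < 18:
        return "Young Adult"
    return "Adult"


# Boundary checks on illustrative ages.
for age in (11.9, 12, 17.9, 18):
    print(age, age_group(age))
```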
3.7 Aggregating Nationality Records
Given the sheer number of nationalities on the ship, there were too few data records per nationality to analyse survival patterns from. In light of this, nationality data was aggregated into regions of nationality to assist in MLM training, as shown in Table 3. For those who held joint nationality, their ‘region’ was categorised by studying passenger pages on Encyclopedia Titanica for information such as which country they resided in for most of their lives and the nationality of their parents.
Table 3—Region of Nationality
|Africa||Egyptian, South African or Uruguayan|
|Asia-Pacific||Asian, Australian or Russian|
|America||American or Canadian|
|Northern Europe||Danish, Finnish, Norwegian or Swedish|
|Southern Europe||Italian, Portuguese or Spanish|
|Eastern Europe||Croatian, Hungarian, Latvian, Lithuanian, Polish or Slovakian|
|Western Europe||Belgian, French, German, Prussian or Swiss|
|United Kingdom||English, Scottish, Irish or Welsh|
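The aggregation in Table 3 amounts to a lookup from nationality to region, which could be sketched in Python as below. The names `REGIONS`, `NATIONALITY_TO_REGION` and `region_of` are assumptions made for illustration; the groupings themselves are copied from Table 3. Passengers with joint nationality were assigned manually in the project, so they are not handled here.

```python
# Region groupings exactly as listed in Table 3.
REGIONS = {
    "Africa": ["Egyptian", "South African", "Uruguayan"],
    "Asia-Pacific": ["Asian", "Australian", "Russian"],
    "America": ["American", "Canadian"],
    "Northern Europe": ["Danish", "Finnish", "Norwegian", "Swedish"],
    "Southern Europe": ["Italian", "Portuguese", "Spanish"],
    "Eastern Europe": ["Croatian", "Hungarian", "Latvian", "Lithuanian",
                       "Polish", "Slovakian"],
    "Western Europe": ["Belgian", "French", "German", "Prussian", "Swiss"],
    "United Kingdom": ["English", "Scottish", "Irish", "Welsh"],
}

# Invert the table so each nationality maps to a single region.
NATIONALITY_TO_REGION = {
    nationality: region
    for region, nationalities in REGIONS.items()
    for nationality in nationalities
}


def region_of(nationality):
    """Return the Table 3 region for a nationality, or None if it is not listed."""
    return NATIONALITY_TO_REGION.get(nationality.strip())


print(region_of("Swedish"))
print(region_of("Irish"))
```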
3.8 Decomposing Passenger Pairs Data
Three new attributes relating to groups of passengers that travelled together were created for this project; the number of children, young adults and adults that each passenger voyaged with. These were produced by utilising both manual and automated data management techniques on different parts of the second dataset provided by Encyclopedia Titanica:
- Microsoft Excel was used to manually sort through pairs of passengers who travelled together but booked on different ticket numbers to calculate how many companions from each age group they were with.
- A Python script was used to automatically calculate the number of people on the same ticket number in each age group per passenger. (Tunstall, 2018)
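The automated step can be illustrated with the following Python sketch. It is not the script cited above, only an assumption-laden reconstruction of the idea: the record layout (`id`, `ticket`, `age_group` keys), the function name `companions_by_ticket` and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict

AGE_GROUPS = ("Child", "Young Adult", "Adult")


def companions_by_ticket(passengers):
    """Count, per passenger, the other people on the same ticket in each age group."""
    per_ticket = defaultdict(Counter)
    for p in passengers:
        per_ticket[p["ticket"]][p["age_group"]] += 1

    result = {}
    for p in passengers:
        counts = Counter(per_ticket[p["ticket"]])
        counts[p["age_group"]] -= 1  # a passenger is not their own companion
        result[p["id"]] = {group: counts.get(group, 0) for group in AGE_GROUPS}
    return result


# Hypothetical records, not taken from the dataset.
sample = [
    {"id": 1, "ticket": "T100", "age_group": "Adult"},
    {"id": 2, "ticket": "T100", "age_group": "Adult"},
    {"id": 3, "ticket": "T100", "age_group": "Child"},
    {"id": 4, "ticket": "T200", "age_group": "Young Adult"},
]
counts = companions_by_ticket(sample)
print(counts[1])
```

Companions who booked on different ticket numbers would not be found this way, which is why the project handled those pairs manually.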
3.9 Creating Spouse Attribute
‘Embarked with Spouse’ was a new attribute that indicated whether or not a passenger boarded with their significant other (only applicable to those who were married). It was created by utilising the ‘Title’, ‘Sex’ and ‘Surname’ features to manually match up the married couples onboard.
3.10 Decomposing Cabin Attribute
Utilising the ‘Cabin ID’ attribute from the dataset and compartmentalising RMS Titanic deck plans from Encyclopedia Titanica, as seen in Figure 2 (Beveridge, 2008), the following features were created to analyse passenger survival rates according to cabin locations:
- Cabin Number
- Cabin Deck
- Forward/Astern (towards the front or rear) Cabin (separated by black line)
- Aport/Astarboard (towards the left or right) Cabin (separated by green line)
- Inner/Outer Cabin (separated by orange outline)
Figure 2—Cabin Categorisation
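As an illustration of the first two derived features, the sketch below splits a cabin identifier into its deck letter and cabin number. The identifier format (a single deck letter followed by a number, such as ‘C85’) is an assumption, as is the function name `split_cabin_id`. The forward/astern, aport/astarboard and inner/outer features depend on the deck plans and are not attempted here.

```python
import re

# Assumed format: one deck letter, an optional hyphen, then the cabin number.
_CABIN_PATTERN = re.compile(r"^\s*([A-Za-z])\s*-?\s*(\d+)\s*$")


def split_cabin_id(cabin_id):
    """Return (deck, number) for a cabin ID, or None if it does not fit the assumed format."""
    match = _CABIN_PATTERN.match(cabin_id or "")
    if match is None:
        return None
    deck, number = match.groups()
    return deck.upper(), int(number)


print(split_cabin_id("C85"))
print(split_cabin_id(""))
```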
4 STATISTICAL ANALYSES
4.1 Statistical Study of Passenger Population
A statistical analysis was conducted to produce an overview of who was on RMS Titanic at the time of the disaster; located in APPENDIX A. From this statistical analysis, the largest passenger variable in each characteristic was recorded, as seen in Table 4. It is thoroughly recommended that the reader studies APPENDIX A in order to understand the historical context of the seaborne community.
Table 4—Overall Passenger Demographics
|Passenger Feature||Largest Demographic|
|Ticket Class||Third (Steerage)|
|Purpose of Voyage||Passengers|
|Role of Commercial Services||Female First-Class Servants|
|Married and Boarded with Their Spouse?||Yes|
|No. People Travelled with||At least one companion|
|No. Adults Travelled with||0|
|No. Young Adults Travelled with||0|
|No. Children Travelled with||0|
|Port Embarked From||Southampton|
|Nationality – First Class||American|
|Nationality – Second Class||English|
|Nationality – Third Class||Irish|
|Region of Nationality - Overall||British|
4.2 Historic Study of Passenger Survivability
A historical survivability study was performed as part of this project to find patterns in the mortality rates of passengers with respect to each characteristic; it is found in APPENDIX B. It is highly recommended that the reader studies this in order to develop an understanding of exactly how each characteristic affected passenger survivability. From this study, the standard deviation (σ) of each feature’s survival rate was calculated, as shown in Table 5. Here, the variation in survival rates differed not only per characteristic, but per age-sex group too:
- For adults overall, sex had the highest σ, unlike children who had a 50% survival rate regardless of their sex.
- The features with the largest σ for male adults were ticket class, the number of children they travelled with and region of nationality (excluding sex and age).
- Similar to male adults, the survival rate of female adults was most significantly varied by ticket class and the number of children and young adults they travelled with (excluding sex and age).
- Children had comparatively higher σ per feature than adults, whereby the features with the highest σ were the port they embarked in and the number of children and young adults they boarded with (excluding age).
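The σ values discussed above measure how widely the survival rate varies across the categories of a feature. A minimal Python sketch of that calculation is given below. The counts are invented purely for illustration, the function name `survival_rate_sigma` is an assumption, and the population standard deviation is used here although the source does not state whether the population or sample form was applied.

```python
from statistics import pstdev


def survival_rate_sigma(saved_by_category, total_by_category):
    """Standard deviation of the percentage survival rates across a feature's categories."""
    rates = [
        100.0 * saved_by_category.get(category, 0) / total
        for category, total in total_by_category.items()
        if total > 0
    ]
    return pstdev(rates)


# Invented counts for a three-category feature (not taken from the dataset).
saved = {"First": 60, "Second": 40, "Third": 20}
total = {"First": 100, "Second": 100, "Third": 100}
sigma = survival_rate_sigma(saved, total)
print(round(sigma, 2))
```

A feature whose categories all share the same survival rate gives σ = 0, so a larger σ indicates a feature that separates survivors from victims more strongly.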
Table 5—Standard Deviation of Survival Rate vs Passenger Feature
|Feature||Male Adults σ (Including Young Adults)||Female Adults σ (Including Young Adults)||Children σ (Male and Female)||Average σ||Ranking of Average σ|
|Sex||30||0.01||15||1 (excluding children)|
|Age||17 (across all ages of males)||15 (across all ages of females)||-||16||8|
|No. Adult Travel Companions||9||14||29||17||6 [joint]|
|No. Young Adult Travel Companions||7||34||28||23||3|
|No. Child Travel Companions||14||33||30||26||2|
|Port of Boarding||9||12||33||18||5|
|Region of Nationality||13||18||20||17||6 [joint]|
5 PREPARING THE DATASET FOR MACHINE LEARNING
Further modifications were made to the dataset described in section 3 to make two new datasets that could be used to train a variety of MLMs with.
The SVM and RF MLMs were trained using a dataset that employed the alterations described in section 5.1 and section 5.2. The NN and LDA employed this same dataset, but including the adjustments described in section 5.3.
5.1 Filling Missing Records
5.1.1 Age
The age records missing from three third-class males with the title ‘Mr’ were filled by using the average age of passengers with that ticket class, sex and title: 28. These age records were then converted into the age range of ‘25-29’ in order to be categorised in line with the rest of the ‘Age Range’ attribute.
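Filling a missing age with the average for passengers of the same ticket class, sex and title can be sketched in Python as follows. The record layout, the function name `fill_missing_ages` and the rounding to a whole year are assumptions, and the sample records are invented.

```python
from collections import defaultdict


def fill_missing_ages(passengers, keys=("class", "sex", "title")):
    """Return copies of the records with missing ages set to the mean age of their group."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of ages, count]
    for p in passengers:
        if p["age"] is not None:
            entry = totals[tuple(p[k] for k in keys)]
            entry[0] += p["age"]
            entry[1] += 1

    filled = []
    for p in passengers:
        record = dict(p)
        if record["age"] is None:
            age_sum, count = totals[tuple(record[k] for k in keys)]
            if count:  # leave the gap if the group has no known ages
                record["age"] = round(age_sum / count)
        filled.append(record)
    return filled


# Invented records for illustration.
sample = [
    {"class": "Third", "sex": "Male", "title": "Mr", "age": 26},
    {"class": "Third", "sex": "Male", "title": "Mr", "age": 28},
    {"class": "Third", "sex": "Male", "title": "Mr", "age": 30},
    {"class": "Third", "sex": "Male", "title": "Mr", "age": None},
    {"class": "First", "sex": "Female", "title": "Mrs", "age": None},
]
filled = fill_missing_ages(sample)
print(filled[3]["age"], filled[4]["age"])
```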
5.1.2 Region of Nationality
The 383 missing nationality records were filled by making educated assumptions based on passengers’ surnames and the last country they resided in. These nationality records were then converted into the region they belonged to, in order to be grouped in accordance with the ‘Region of Nationality’ feature.
5.2 Attribute Sampling
The ‘Survivor’ attribute was moved to a separate table, in order to serve as separate output data during MLM training. Each passenger feature in Table 6 was deleted from the dataset for one of the following reasons:
- It did not comprise useful numerical or categorical data that an MLM could be trained with.
- The data had been aggregated into a different attribute.
- Too much data was missing in order to create a dataset that was representative of the passengers onboard.
Table 6—Deleted Attributes for MLMs
|Passenger ID||Title||Forename(s)||Surname||Age Category||Marital Status|
|Fare||Ticket ID||Occupation||Nationality||Country of Birth|
|Cabin Deck||Fwd/Aft Cabin||Port/Stbd Cabin||Outer/Inner Cabin||Last Country of Residence|
5.3 Converting Records
In order to be able to train a NN and LDA, all categorical attributes (i.e. everything but ‘travel companions’ features) were encoded, as seen in APPENDIX C. For example, ‘Male’ and ‘Female’ were converted to integers ‘1’ and ‘0’ respectively. To avoid some characteristics overweighting others, all records remained within the integer range of 0 – 14.
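The integer encoding of a categorical attribute can be sketched in Python as below. The actual codes used in the project are listed in APPENDIX C and are not reproduced here; this sketch simply assigns codes in alphabetical order, which happens to agree with the ‘Female’ = 0, ‘Male’ = 1 example in the text. The function name `build_encoder` is an assumption.

```python
def build_encoder(values):
    """Map each distinct category to an integer code, in alphabetical order."""
    categories = sorted(set(values))
    return {category: code for code, category in enumerate(categories)}


# Illustrative column of categorical values.
sex_column = ["Male", "Female", "Female", "Male"]
sex_encoder = build_encoder(sex_column)
encoded = [sex_encoder[value] for value in sex_column]
print(sex_encoder)
print(encoded)
```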
6 PRODUCING MACHINE LEARNING MODELS
6.1 Building & Training
The RF, SVM and LDA were built in similar ways using functions within the MATLAB 2018b Statistics and Machine Learning Toolbox, as described in section 6.1.1. The NN on the other hand was constructed in a dissimilar way, utilising the Deep Learning Toolbox, as described in section 6.1.2.
6.1.1 Statistics & Machine Learning Toolbox
The RF, SVM and LDA were built using the method described below:
- Input data was imported.
- Output data was loaded and reformatted into a cell array.
- The random seed was set for repeatability.
- The Classification Learner app was employed to find which number of k-folds validated the model to show the lowest misclassification rate.
- The output data was randomly partitioned for a stratified k-fold cross-validation (using the value of k found in the previous step) so that each fold preserved approximately the same proportion of ‘saved’ and ‘lost’ passengers as the full dataset. This was important, as only 38% of passengers in the dataset were classified as ‘saved’.
- The MLM was trained using the appropriate technique (RF, SVM or LDA) and the highest-performing parameters found through the automatically run ‘OptimizeHyperparameters’ function.
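A stratified partition, in the usual sense of keeping each fold’s class proportions close to those of the whole dataset, can be sketched in plain Python as follows. This illustrates the idea only and is not MATLAB’s implementation; the function name `stratified_k_fold`, the seed and the 38/62 example labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)  # fixed seed for repeatability
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out across the folds in turn.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Illustrative labels with a 38% 'Saved' share.
labels = ["Saved"] * 38 + ["Lost"] * 62
folds = stratified_k_fold(labels, k=5, seed=1)
for fold in folds:
    saved = sum(1 for i in fold if labels[i] == "Saved")
    print(len(fold), saved)
```

Each fold then serves once as the validation set while the remaining folds are used for training.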
6.1.2 Deep Learning Toolbox
Using the Neural Net Pattern Recognition app, a neural network (represented in Figure 3) was produced through the following steps:
- Input and output data were selected and loaded.
- The dataset was randomly divided into training, validation, and testing samples; 70%, 15% and 15% of the whole dataset respectively.
- Through trial and error, the optimal number of hidden neurons was selected as 1000.
- The NN was trained and retrained enough times to achieve the lowest misclassification rate on testing data possible.
Figure 3—Neural Network Block Diagram
6.2.1 Statistics & Machine Learning Toolbox
The RF, SVM and LDA predicted the outputs (lost or saved) for the entire passenger dataset using the ‘predict’ function. Utilising the ‘confusionchart’ function, these results were compared with the true outputs to produce confusion matrices, as seen in Figure 4, Figure 5 and Figure 6.
Figure 4—RF Confusion Matrix Figure 5—SVM Confusion Matrix Figure 6—LDA Confusion Matrix
6.2.2 Deep Learning Toolbox
Using the ‘net’ function, the NN predicted outputs for the whole passenger dataset. Employing these results, the true outputs and the ‘perfcurve’ function, the optimal operating point of the ROC (Receiver Operating Characteristic) curve was found, as shown in Figure 7. This value (0.5115) was applied to the NN’s predictions as the label threshold (i.e. results ≥ 0.5115 were labelled ‘saved’). Utilising the ‘plotconfusion’ function, these results were compared with the true outputs to produce a confusion matrix, as seen in Figure 8.
Figure 7—Neural Network ROC Curve Figure 8—NN Confusion Matrix
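The idea of choosing a label threshold from the ROC curve can be sketched in Python as below. Note that this sketch selects the threshold maximising Youden’s J statistic (true positive rate minus false positive rate), which is a substitute criterion and may differ from the rule applied by MATLAB’s ‘perfcurve’. The function names, the convention that scores at or above the threshold are labelled ‘saved’, and the scores themselves are assumptions.

```python
def roc_points(scores, labels):
    """Return (threshold, false positive rate, true positive rate) per distinct score.

    labels: 1 for 'saved', 0 for 'lost'.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def best_threshold(scores, labels):
    """Threshold maximising Youden's J = TPR - FPR (assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])[0]


# Invented network outputs and true outcomes.
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25]
labels = [1, 1, 1, 0, 0, 1, 0, 0]
threshold = best_threshold(scores, labels)
predicted = ["saved" if s >= threshold else "lost" for s in scores]
print(threshold, predicted)
```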
6.3 Performance Analysis
The TPs (True Positives), TNs (True Negatives), FPs (False Positives) and FNs (False Negatives) from the MLM testing described in section 6.2.1 and section 6.2.2 were inputted into formulas to measure different aspects of performance (Joseph, 2016), as seen in Table 7, in which the best MLM per calculation was highlighted in green.
Classifier Accuracy calculated the overall ‘correctness’ of each MLM. This method provided a misleadingly high measure of MLM performance, as it didn’t consider the misclassification rates proportionately. That is, the accuracy of predicted lost and saved passengers was over- and under-represented respectively, due to the imbalance of data records described in section 6.1.1.
Sensitivity (True Positive Rate) worked out the fraction of saved passengers that the MLM predicted correctly out of the total number of actual saved passengers. Specificity (True Negative Rate), on the other hand, calculated the fraction of correctly predicted lost passengers out of the total number of genuinely lost passengers. Therefore, in this application, Sensitivity and Specificity showed how complete the predicted casualty list was with real survivors and victims respectively.
Precision (Positive Predictive Value) calculated the percentage of truly saved passengers in the pool of predicted saved passengers. The Negative Predictive Value however worked out the percentage of actual lost passengers in the pool of predicted lost passengers. Thus, Precision and the Negative Predictive Value represented the probability that the predicted survival of a passenger (saved or lost respectively) was correct. Incorporating both Sensitivity and Precision, the F1 Score gave a more balanced representation of MLM performance compared to the metrics in this section’s preceding paragraphs. TNs however were not included in this calculation, meaning it did not provide a fully representative measure of MLM accuracy.
Combining all predictions within the confusion matrices, the Matthews Correlation Coefficient (MCC) provided accuracy values that were easy to interpret and were representative of overall MLM performance, whereby ‘-1’ represented a worst-case model and ‘+1’ indicated a perfect model.
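The metrics described in this section follow from the four confusion-matrix counts using their standard definitions, as the Python sketch below shows. The counts in the example are invented and do not correspond to any of the MLMs in this project; the function name `classification_metrics` is an assumption, and the sketch assumes every denominator is non-zero.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics, with 'saved' as the positive class."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": f1,
        "mcc": mcc,
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=30, tn=50, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Only the MCC uses all four counts in both its numerator and denominator, which is why it is less affected by the class imbalance than accuracy or the F1 Score.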
Not only did the RF show itself to be the most accurate MLM overall by scoring the highest MCC; it also achieved the highest values in four of the six other tests, agreeing with the findings of Donges (2018) (described in section 2.3) that an RF provided the best-performing MLM. The NN, on the other hand, narrowly won on Specificity and Precision but scored poorly on Sensitivity. In light of this, the NN would have been the most suitable MLM for producing a complete casualty list of victims. It would not, however, have been appropriate for producing a complete list of survivors, as its list of actual saved passengers would only have been 67% complete.
Table 7—MLM Accuracy Calculations
|Performance Testing Metric||Formula (Joseph, 2016)||SVM||RF||NN||LDA|
|Sensitivity/Recall/True Positive Rate||TP / (TP + FN)|| || || || |
|Specificity/True Negative Rate||TN / (TN + FP)||0.85||0.88||0.91||0.82|
|Precision/Positive Predictive Value||TP / (TP + FP)||0.74||0.78||0.82||0.70|
|Negative Predictive Value||TN / (TN + FN)||0.91||0.96||0.82||0.86|
6.3.2 Passenger Feature Importance
Whilst training the RF and LDA, MATLAB weighted the significance of each feature in affecting the survival outcome of passengers, in the form of predictor importance estimates for the RF and correlation coefficients for the LDA, as shown in Table 8.
Conforming with the findings from Donges (2018) and Durojayne et al. (2014) (analysed in section 2.2), Sex and Ticket Class were measured as being within the top three most important passenger features in not only the RF, but the LDA too. An additional observation was that the three Travel Companions attributes, when compared with Donges’ equivalent attributes, were weighted as being slightly more important in predicting survivability.
Interestingly, the RF weighted features drastically differently from the LDA. For instance, the RF judged Voyage Purpose as having no importance, but the LDA calculated it as being the fourth most important characteristic. These differences could have been a factor in why the LDA had such a low MCC performance rating compared with the other MLMs (shown in Table 7).
The findings from the statistical analyses in section 4 concurred with both MLMs in Table 8 in showing Sex to be the most decisive factor overall, but indicated that when passenger data was analysed by standard deviation within age-sex groups, Travel Companions features had more of an impact on passenger mortality than age or ticket class.
Table 8—Passenger Feature Significance Comparison
|Feature||RF Predictor Importance||LDA Correlation Coefficient||Order of Importance (RF)||Order of Importance (LDA)|
|Ticket Class||0.0003||0.5485||3 [joint]||2|
|No. Young Adult Travel Companions||0.0001||0.1030||6 [joint]||7|
|No. Child Travel Companions||0.0001||0.1781||6 [joint]||5|
|No. Adult Travel Companions||0.0002||0.0936||5||8|
|Region of Nationality||0.0003||0.0899||3 [joint]||9|
|Port of Boarding||0.0001||0.1568||6 [joint]||6|
|Boarded with Spouse||0||0.0158||-||10|
7 GRAPHICAL USER INTERFACE
Using the MATLAB App Designer, a graphical user interface was built to provide the capability of employing the best-performing MLM (i.e. the RF) to predict the survival of a bespoke theoretical passenger.
7.1 Input Validation
As seen in Figure 9, all ten passenger features were available for the user to customise for inputting into the RF. In order for the features to be logically and historically feasible, they were programmed to be validated prior to MLM prediction in the following ways:
- Males and females could not board with a spouse if they were below the age of 14 and 12 respectively; under common law in the early 20th century, these were the lowest ages at which a person could legally marry (Lowe & Bromley, 1992).
- If a passenger boarded with a spouse, they must be inputted as having travelled with at least one adult or young adult.
- Age was limited to integers in a range of 0-100 years.
- The number of travel companions was limited to 10 per age category.
- Categorical data could only be entered through pre-set drop-down menus, to ensure the validity of values entered.
Figure 9—Graphical User Interface
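The validation rules listed above can be sketched as a single checking function. This is an illustrative Python sketch, not the App Designer callback used in the project; the function name, argument names and error messages are hypothetical, and categorical inputs are assumed to arrive already restricted by the drop-down menus.

```python
# Lowest legal marriage ages under common law (Lowe & Bromley, 1992).
MIN_MARRIAGE_AGE = {"Male": 14, "Female": 12}
MAX_AGE = 100
MAX_COMPANIONS = 10  # limit per age category

def validate_passenger(sex, age, boarded_with_spouse,
                       adults, young_adults, children):
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if sex not in MIN_MARRIAGE_AGE:
        errors.append("Sex must be 'Male' or 'Female'.")
    if not isinstance(age, int) or not 0 <= age <= MAX_AGE:
        errors.append("Age must be an integer from 0 to 100.")
    companions = (("adult", adults), ("young adult", young_adults),
                  ("child", children))
    for label, count in companions:
        if not isinstance(count, int) or not 0 <= count <= MAX_COMPANIONS:
            errors.append(f"Number of {label} companions must be 0 to 10.")
    if errors:
        return errors  # skip cross-field checks on malformed input
    if boarded_with_spouse:
        if age < MIN_MARRIAGE_AGE[sex]:
            errors.append("Passenger is too young to board with a spouse.")
        if adults + young_adults < 1:
            errors.append("A spouse implies at least one adult or "
                          "young adult companion.")
    return errors

# Example: a 13-year-old male cannot have boarded with a spouse.
example_errors = validate_passenger("Male", 13, True, 1, 0, 0)
```

Returning a list of messages rather than raising on the first failure lets an interface report every problem with the inputs at once.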
Upon pressing the Predict Button, the GUI performed the following steps to predict the outcome of a theoretical passenger:
1. The Predict Button was disabled to stop the user erroneously initiating another prediction while the system was already predicting.
2. The Prediction Text Box was cleared to remove any previous results.
3. A Wait Bar popped up (seen in Figure 10) to show the user how close the GUI was to completion.
4. GUI features were read in as raw data.
5. Age integer data was discretised into the appropriate ‘Age Range’ category.
6. Input data was formatted into a table.
7. The RF was loaded and predicted a response from the input data formatted in step 6.
8. The RF response was displayed in the Prediction Text Box.
9. The Wait Bar pop-up was closed.
10. The Predict Button was re-enabled to allow a new prediction to take place.
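The data-handling part of the steps above (discretising age, formatting the input and requesting a prediction) can be sketched as follows. This is a hedged Python illustration, not the MATLAB implementation: the five-year bands with a top band of '65+' are an assumption about the 'Age Range' categories, the field names are hypothetical, and `stand_in_model` is a placeholder rule that is not the trained RF.

```python
def age_range(age):
    """Map an integer age to a five-year band label such as '25-29'.

    The band width and the open-ended '65+' band are assumptions.
    """
    if age >= 65:
        return "65+"
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"

def predict_passenger(raw_inputs, model):
    """Discretise age, build a one-row record and ask the model for a label."""
    record = dict(raw_inputs)               # copy so the caller's data is untouched
    record["Age Range"] = age_range(record.pop("Age"))
    return model(record)

def stand_in_model(record):
    # Placeholder rule for demonstration only; NOT the trained Random Forest.
    return "Saved" if record["Sex"] == "Female" else "Lost"

prediction = predict_passenger(
    {"Sex": "Female", "Age": 29, "Ticket Class": 2},
    stand_in_model,
)
```

Passing the model in as an argument keeps the formatting logic independent of how the classifier is stored or loaded.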
Figure 10—Wait Bar
8.1 Outline of Work - Fulfilment of Project Objectives
The most up-to-date datasets describing passengers onboard the RMS Titanic during the infamous disaster in 1912 were gathered from the leading online RMS Titanic archivists, Encyclopedia Titanica (Objective 1). These datasets were subsequently formatted appropriately for the purposes of statistical analysis and to train machine learning models (Objective 2).
Manual statistical analyses were carried out to determine the demographic makeup of the ship’s population (Objective 3) and how each passenger characteristic affected survival rates (Objective 4).
Several machine learning models were trained (Objective 5) and their performance compared through a variety of metrics to determine the best method (Objective 6). Two of these models were further examined to see how they measured the importance of each passenger feature in predicting survivability (Objective 7). These values were subsequently compared with the findings from the manual statistical analysis.
A GUI was produced that gave users the capability to enter a theoretical passenger into the best-performing machine learning model and make it predict whether they would have been lost or saved in the disaster (Objective 8).
8.2 Major Findings
8.2.1 Importance of Passenger Features in Survivability
In this project, the significance of each passenger feature in determining the fate of passengers onboard the RMS Titanic was measured in two ways: training two machine learning models to automatically determine the overall importance (correlation coefficients) of each characteristic (Objective 7), and performing a manual statistical analysis of survival rates per feature in the dataset (Objective 4). The statistical analysis differed greatly from the MLMs in that it measured feature importance within passenger sex and age categories, as opposed to looking at the overall significance of the characteristics.
The MLMs created were a Random Forest and a Linear Discriminant Analysis model. Despite having drastically different performance ratings, both reached the same conclusion as previous studies: overall, Sex and Ticket Class were within the top three most important features in predicting survival, with Sex being the most substantial.
There were, however, some notable differences in feature importance between MLMs, both among previous studies and among the models trained in this project. The two most accurate models, the Random Forests created in this project and by Donges (2018), largely agreed on the level of importance measured for each feature. In light of this, the Random Forest made in this project was employed to make the final conclusions.
Thus, the most influential passenger characteristics in determining mortality overall were, in numerical order:
- Ticket class and the region of their nationality
- Number of adults travelled with
- Port of embarkation and number of young adults and children voyaged with
The following additional conclusions were reached regarding the importance of passenger features when the data was split by age-sex group (i.e. male adults, female adults and children):
- For male adults, the most decisive factors in determining their fate were their region of nationality and the number of children they journeyed with.
- Similarly, female adults’ survival rates were largely determined by the number of young adults and children they travelled alongside.
- Factors that were most significant in child mortality were the port in which they boarded and the number of fellow children they were with.
8.2.2 Optimal Machine Learning Method
As expected, and in line with the study by Donges (2018), the highest-performing machine learning method for predicting the survival outcome of passengers onboard the RMS Titanic was a Random Forest, which beat the next-best model’s Matthews Correlation Coefficient (MCC) by 0.09.
8.3 Further Development
8.3.1 Repeat Model Training
With the intention of creating a model that predicts passenger mortality in a manner more representative of the actual casualties of the RMS Titanic tragedy, it is recommended that an RF be retrained with the following attributes removed:
- ‘Port of Boarding’ as it is at least partially dependent on ticket class, as described in the report by Cicoria, Sherlock, Muniswamaiah, & Clarke (2014).
- ‘Boarded with Spouse’ and ‘Purpose of Voyage’, as they had so little significance that the RF weighted them with an importance of 0, i.e. no significance for passenger mortality. Removing them will likely create a more accurate representation of the factors that affected passenger survival rates.
8.3.2 Graphical User Interface Export
To provide RMS Titanic enthusiasts with the capability to predict the survival outcome of theoretical passengers, the GUI (described in section 7) should be generated into C or C++ code using the MATLAB Coder tool to produce an application which doesn’t require MATLAB to run.
8.3.3 Additional Statistical Study
Further historic study should be undertaken to understand the following survivability patterns found in APPENDIX B:
- Survivability dramatically dropped for passengers within the age range of 40-44.
- Male adults who held Western European nationalities had an anomalously high survival rate, compared to other male adults.
- The higher the number of young adults and children that adults travelled with, the less likely they were to survive. The exception was adults who journeyed with only one child, whose likelihood of survival increased.
Beveridge, B. (2008). Titanic the Ship Magnificent (Vol. I). Stroud, Gloucestershire, United Kingdom: The History Press. Retrieved March 19, 2019, from https://www.encyclopedia-titanica.org/titanic-deckplans/e-deck.html
Bracken, R. L. (2004, June 7). The Mystery of Rhoda Abbott Revealed. Retrieved March 19, 2019, from Encyclopedia Titanica: https://www.encyclopedia-titanica.org/rhoda-abbott.html
Bride, H. (1912, April 19). Statement by Harold Bride. New York Times. Retrieved March 19, 2019, from Encyclopedia Titanica: https://www.encyclopedia-titanica.org/statement-harold-bride.html
British Wreck Commissioner's Inquiry. (1912). Report: Account of the Saving and Rescue of those who Survived. London: Titanic Inquiry Project. Retrieved March 3, 2019, from https://www.titanicinquiry.org/BOTInq/BOTReport/botRepSaved.php
Cameron, J. (Director). (1997). Titanic [Motion Picture].
Caprinomics. (2018). Titanic Survivalship. Retrieved from Caprinomics: http://www.caprinomics.com/projects/titanic/
Carter, J. I. (2016, April 16). Looking for Survivors with Titanic Data Analysis. Retrieved March 17, 2019, from That's Deep: https://jasonicarter.github.io/survival-analysis-titanic-data/
Cason, T. E. (2012). Titanic Datasets. Lake Forest: Lake Forest College. Retrieved March 9, 2019, from Lake Forest College: http://campus.lakeforest.edu/frank/FILES/MLFfiles/Bio150/Titanic/TitanicMETA.pdf
Cicoria, S., Sherlock, J., Muniswamaiah, M., & Clarke, L. (2014). Classification of Titanic Passenger Data and Chances of Surviving the Disaster. Seidenberg School of CSIS. New York: Pace University. Retrieved March 17, 2019, from http://csis.pace.edu/~ctappert/srd2014/d3.pdf
Dahlén, M. (2007). The Negotiable Child: The ILO Child Labour Campaign 1919-1973. Uppsala: Uppsala University. Retrieved March 3, 2019, from http://uu.diva-portal.org/smash/get/diva2:169702/FULLTEXT01.pdf
Donges, N. (2018, May 14). Predicting the Survival of Titanic Passengers. Retrieved March 17, 2019, from Towards Data Science: https://towardsdatascience.com/predicting-the-survival-of-titanic-passengers-30870ccc7e8
Durojayne, M., Rakotonirainy, R., Shabalala, S., Akinyelu, A., Raphulu, D., & Simelane, S. (2014, January 11). Predicting Survival on the Titanic. Retrieved March 17, 2019, from University of the Witwatersrand Johannesburg: https://www.wits.ac.za/media/migration/files/cs-38933-fix/migrated-pdf/pdfs-2/Titanic.pdf
Encyclopedia Titanica. (2005, October 12). Nearer My God to Thee. Retrieved March 19, 2019, from Encyclopedia Titanica: https://www.encyclopedia-titanica.org/nearer-god.html
Encyclopedia Titanica. (2018). RMS Titanic. Retrieved from Encyclopedia Titanica: https://www.encyclopedia-titanica.org/titanic/
Encyclopedia Titanica. (2019). Mr John Law Hume. Retrieved March 19, 2019, from Encyclopedia Titanica: https://www.encyclopedia-titanica.org/titanic-victim/jock-hume.html
Garzke, W. H., Foecke, T., Matthias, P., & Wood, D. (2000). A Marine Forensic Analysis of the RMS Titanic. Oceans 2000 MTS/IEEE Conference Proceedings (pp. 673-690). Providence: MTS/IEEE. Retrieved March 14, 2019, from https://ieeeexplore.ieee.org/stamp.jsp?tp=&arnumber=881331
Hall, W. (1986). Social Class and Survival on the S.S Titanic. Soc. Sci. Med., 22(6), 687-690. Retrieved March 3, 2019, from http://www.med.mcgill.ca/epidemiology/courses/EPIB591/Fall%202010/mid-term%20presentations/Paper6.pdf
Joseph, J. (2016). The Best Metric to Measure Accuracy of Classification Models. Retrieved March 21, 2019, from KD Nuggets: https://www.kdnuggets.com/2016/12/best-metric-measure-accuracy-classification-models.html
Kaggle. (2019). Titanic: Machine Learning from Disaster. Retrieved March 23, 2019, from Kaggle: https://www.kaggle.com/c/titanic
Kelley, T. (2014). Exploratory Analysis - Cabin. Retrieved March 17, 2019, from Kaggle: https://www.kaggle.com/c/deloitte-tackles-titanic/discussion/9804
Lowe, N. V., & Bromley, P. M. (1992). Bromley's Family Law (8th ed.). Oxford: Butterworths.
Moughal, M. J. (2018, March 10). Exploratory Data Analysis of Titanic Dataset with Python. Retrieved March 17, 2019, from Medium: https://medium.com/@mjamilmoughal786/exploratory-data-analysis-of-titanic-dataset-with-python-94b0c84cd108
National Geographic. (2012, March 21). SAVE THE TITANIC WITH BOB BALLARD: FACTS. Retrieved March 19, 2019, from National Geographic: https://www.nationalgeographic.com.au/history/save-the-titanic-with-bob-ballard-facts.aspx
Titanic Facts. (2019). Building the Titanic. Retrieved March 19, 2019, from Titanic Facts: https://titanicfacts.net/building-the-titanic/
Tunstall, L. M. (2018, November 23). Titanic Pandas Python Script. Luton, Bedfordshire, United Kingdom.
Turner, S. (2011). The Band That Played On. Nashville: Thomas Nelson.
WilliamMurdoch.net. (2016). Starboard Evacuation. Retrieved March 19, 2019, from WilliamMurdoch.net: http://www.williammurdoch.net/man-08_starboard_evacuation.html
Lord, W. (1955). A Night to Remember. New York City: R & W Holt.
MathWorks. (2019). Machine Learning. Retrieved March 23, 2019, from MathWorks: https://uk.mathworks.com/discovery/machine-learning.html
STATISTICAL STUDY OF PASSENGER POPULATION
This appendix analyses the demographics of the passenger population who were onboard for the New York-bound maiden voyage.
Passengers boarded with first-, second- and third-class (steerage) tickets, which generally reflected their social class.
First-class passengers were the wealthiest to board. Their class comprised highly regarded members of society including businessmen, stockbrokers, socialites, high-ranking military officers, doctors and writers.
Those who travelled in the second-class were mostly blue-collar workers with careers such as bakers, engineers, teachers, carpenters and clergymen.
The passengers voyaging in the third-class were largely emigrants on their way to the United States for a better life (Hall, 1986, p. 687). While much of the media at the time focused on the celebrities from the first-class, it was the second- and third-class who drove the economic success of White Star Line (Encyclopedia Titanica, 2018). As shown in Figure 11, there were approximately as many third-class passengers as first- and second-class passengers combined. This ticket class also had a higher male-female ratio (~5:2) than the other two classes (~5:3.5).
Figure 11—Passenger Ticket Class
The age of passengers spanned three generations, with both new-borns and elderly people onboard; most were between the ages of 15 and 34 years, as shown in Figure 12.
Additionally, the following observations can be made:
- Passenger population by age generally decreased after 20 – 24 years.
- Elderly passengers (≥ 65 years) were the smallest age group (1% of whole population).
- There was a comparatively large population gap (123) between passengers aged 10-14 and 15-19.
Figure 12—Passenger Age Range
As observed in Figure 13, there were around twice as many male adults as female adults and only a small minority of passengers were young adults or children.
Figure 13—Passenger Age Category
The vast majority of people boarded RMS Titanic for personal reasons, such as returning home to the United States. A few however (5%) travelled to perform commercial services. As shown in Figure 14, these were the Ship’s Orchestra, Harland & Wolff (H&W) Guarantee Group and servants of first- and second-class passengers. Further information on these passengers is located in section ‘Role of Commercial Services’.
Figure 14—Passenger Purpose of Voyage
Role of Commercial Services
Most passengers performing commercial services were servants who served first-class passengers (63%), most of whom were female, as shown in Figure 15. Only five servants, mostly male, served second-class passengers; the presence of these servants indicates that these second-class passengers were wealthier than others in the same ticket class.
Figure 15—Passengers of Commercial Services
Servants were employed serving some of the wealthier first- and second-class passengers, whereby the class of passenger they served reflected the ticket class they held. They are recorded in the dataset as having performed the roles of secretary, chauffeur, maid, nurse, cook and clerk.
The eight-strong Ship’s Orchestra were employed by the Liverpool firm ‘C.W. and F.N. Black’, fulfilling a contract that provided all the steamship companies with musicians for the purpose of passenger entertainment. They all boarded with second-class tickets, resulting in them having a more luxurious voyage than other, less fortunate workers on the Titanic (Encyclopedia Titanica, 2019).
These musicians went down in history as ‘The Band That Played On’ (Turner, 2011), performing pieces of music to comfort those still on deck until the ship’s final plunge (Bride, 1912). It has often been suggested that the last piece played by the band was ‘Nearer My God to Thee’, as dramatised in James Cameron’s movie adaptation of the disaster (Encyclopedia Titanica, 2005; Cameron, 1997). Tragically, none of the Ship’s Orchestra survived the disaster. Of the bodies that were recovered, only three were identified as theirs.
Harland & Wolff Guarantee Group
Harland & Wolff (H&W) is the name of both the shipyard in which Titanic was constructed and the shipbuilding company that designed and built her. There were around 14,000 employees working in the H&W shipyard in the year of the disaster (1912). Of these, approximately 3,000 men and boys were engaged in the building of the Titanic. The other workers were engaged in projects such as the building of Titanic’s sister ship, Olympic (Titanic Facts, 2019).
Of the core teams within the Titanic workforce, eight exceptional employees were specially selected to work on board during the maiden voyage, forming the Guarantee Group. Led by the ship’s chief designer, Thomas Andrews, they were technical trouble-shooters who ensured the smooth running of the ship (National Geographic, 2012; Encyclopedia Titanica, 2018). Tragically, none of the Guarantee Group survived the disaster and their bodies, if recovered, were never identified.
The number of married and single passengers onboard was approximately the same, as shown in Figure 16. A further observation is that the ratio of male to female passengers was similar in those who were married (~5:3) and single (~5:4.2).
Interestingly, there is only one divorced passenger recorded in the dataset: Rhoda Abbott. She suffered a difficult divorce from her husband, Stanton Abbott, a year prior to the voyage. She had boarded the Titanic to return to her home of Rhode Island, United States, for the benefit of her sons, who had become homesick in England. Tragically her two boys, Rossmore and Eugene, died after the three of them jumped from the deck during the sinking. Miraculously, Rhoda survived after climbing into a partially swamped collapsible lifeboat. Her further distinction was that she was the only female passenger in the disaster to be exposed to the cold Atlantic water and survive (Bracken, 2004). A factor in her anomalous survival could have been mental strength accrued from the suffering she experienced during her divorce.
Figure 16—Passenger Marital Status
Of the married passengers onboard, approximately as many boarded with their spouse as without, as shown in Figure 17. These 106 married couples comprised approximately a sixth of the whole ship’s passenger population.
Figure 17—Married Passengers Who Boarded with a Spouse
A slim majority of passengers voyaged with at least one companion (56%), meaning that just under half travelled alone. Age categorisation is described in section 3.6.
Just over half of passengers travelled with at least one adult (53%), as shown in Figure 18.
Figure 18—Passenger Adult Travel Companions
Young Adult Companions
A small minority of passengers travelled with young adults (9%), with most of those who did only journeying with one as portrayed in Figure 19.
Figure 19—Passenger Young Adult Travel Companions
Not many passengers travelled with children (14%), with most of those who did only travelling with one as shown in Figure 20.
Figure 20—Passenger Child Travel Companions
Ports of Embarkation
Passengers boarded the ship in English, Irish and French ports. The distribution of passengers who boarded the ship in each port is shown in Figure 21.
Titanic made several trips prior to the New York-bound maiden voyage, as depicted in Figure 22. These were as follows:
- After the Titanic had passed sea trials in the littoral waters of the Irish coast, she sailed from Belfast to Southampton, with the H&W Guarantee Group aboard, to pick up the majority of passengers (69%).
- From Southampton, there was a channel crossing to Cherbourg to pick up approximately a fifth of passengers (21%).
- From Cherbourg, there was a trip back to Ireland, to Queenstown, to pick up the remaining passengers (9%).
- From Queenstown, Titanic set sail on the New York-bound maiden voyage.
Figure 21—Passenger Port of Boarding
Figure 22—RMS Titanic Sea Route
People boarded the Titanic from all over the world to set sail to America, with notable differences in the demographic of each ticket class.
The majority of first-class passengers were American, as depicted in Figure 23. Just under a quarter were English or Canadian and around a fifth were non-English Europeans. The only non-western passengers in the first-class were three Uruguayan passengers and an Egyptian servant.
Figure 23—First-Class Passenger Nationality
For the most part, second-class passengers were English, as shown in Figure 24. Here, American passengers comprised significantly less of the class demographic than in the first-class. This is possibly because American passengers who had enough wealth to travel both to the United Kingdom and back to the United States could also afford the highest grade of ticket.
Figure 24—Second-Class Passenger Nationality
Almost half of third-class passengers were Irish or English, as displayed in Figure 25. Despite comprising less than a quarter of the class demographic, English passengers were roughly as numerous in the third-class as in the second-class. The third-class held the largest non-western group of people to board the Titanic: the 83 Syrian passengers.
Figure 25—Third-Class Passenger Nationality
Region of Nationality
Despite the United Kingdom’s close proximity to France, overall four times as many Northern European passengers as Western European passengers boarded, as shown in Figure 26. Unsurprisingly, as the Titanic was built in and launched from Britain, almost half of passengers were British.
Passenger regions of nationalities are split by ticket class in Figure 27, whereby it can be observed that British passengers comprised a fifth of the first-class, most of the second-class and just under half of the third-class.
Figure 26—Overall Passenger Region of Nationality
Figure 27—Passenger Region of Nationality Distribution by Class
HISTORIC STUDY OF PASSENGER SURVIVABILITY
This appendix investigates the survival rates of passengers relative to personal characteristics that were prepared in section 3. The number of data entries for these characteristics is found in APPENDIX A.
Due to inadequate volumes of data being available for Marital Status and Nationality, they have been omitted from this analysis.
The survival rate of passengers by their ticket class is depicted in Figure 28.
First-class male adults were approximately three times as likely to live through the sinking as those who boarded with a second- or third-class ticket, almost all of whom did not survive.
Female adults of the first- and second-class were around twice as likely to be saved as those of the third-class, who had only half a chance of surviving.
First- and second-class children were roughly twice as likely to survive as those in the third-class, of whom tragically fewer than half were saved.
Figure 28—Passenger Survivability by Ticket Class
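Survival rates of the kind compared above are the number of saved passengers in a group divided by the group's size. The following Python sketch shows one way such rates could be tabulated by ticket class and age-sex group; the record layout, field names and sample rows are hypothetical and are not drawn from the study's dataset.

```python
from collections import defaultdict

def survival_rates(passengers, *keys):
    """Return {group: fraction saved}, grouping records by the given keys."""
    saved = defaultdict(int)
    total = defaultdict(int)
    for passenger in passengers:
        group = tuple(passenger[key] for key in keys)
        total[group] += 1
        if passenger["saved"]:
            saved[group] += 1
    return {group: saved[group] / total[group] for group in total}

# Hypothetical records for illustration only; not rows from the dataset.
sample = [
    {"ticket_class": 1, "group": "Male Adults", "saved": True},
    {"ticket_class": 1, "group": "Male Adults", "saved": False},
    {"ticket_class": 3, "group": "Male Adults", "saved": False},
    {"ticket_class": 3, "group": "Children", "saved": True},
]
rates = survival_rates(sample, "ticket_class", "group")
```

Grouping by an arbitrary tuple of keys means the same function can be reused for the other features analysed in this appendix, such as port of boarding or number of companions.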
Passenger survivability varied substantially in accordance with age, as shown in Figure 29, which is annotated with dotted lines and diamond-shaped points to show the age category boundaries and the survival rate of the passengers on those boundaries (12 [11-13] and 18 [17-19]) respectively.
It was probable that new-borns, infants and toddlers (0 – 4 years) would survive, regardless of sex. Children (5 – 9 years) were about as likely to survive as they were to be lost in the disaster, irrespective of sex.
It was unlikely that male young adults (10 – 19 years) would survive the disaster. Likelihood of survival in female young adults was dependent on how far they were through their teenage years; those in their early adolescent years (10 – 14 years) were unlikely to survive, while it was probable that those in their late juvenile years (15 – 19 years) would survive.
Males in their earlier adult years (20 – 44 years) were decidedly unlikely to survive; however, adults in the middle of this age range (25 – 34 years) were slightly more likely to survive than those on the edges (20 – 24 and 40 – 44 years). Middle-aged males (45 – 54 years) had the best chance of survival, but were still fairly unlikely to survive. Males of senior age (≥ 55 years) were distinctly unlikely to survive.
It was probable that females in earlier adulthood (20 – 39) and middle-aged (45 – 59) years would survive the catastrophe. Those nearing their middle-aged years (40 – 44 years) however only had half a chance of survival and seniors (≥ 60 years) were only slightly more likely to survive than be lost in the disaster.
Figure 29—Passenger Survivability by Age Range
Passenger survivability by age category is further shown in Figure 30, whereby the age categories are defined in section 3.6. Here it can be observed that adults and young adults had comparable survival rates in each sex. Children had similar survivability regardless of whether they were male or female.
In order that survival rates could be analysed across sex and age for each passenger feature, and based on the observations described above, data was further grouped as follows:
a) Male Adults including young adults
b) Female Adults including young adults
c) Children
Figure 30—Passenger Survivability by Age Category
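The age-sex grouping described above can be expressed as a small mapping function. This Python sketch is illustrative only: the function name is hypothetical, and the child cut-off of 12 years is an assumption based on the age category boundary annotated in Figure 29, since the categories themselves are defined in section 3.6.

```python
def age_sex_group(sex, age, child_cutoff=12):
    """Assign a passenger to one of the three analysis groups.

    Young adults are merged with adults of the same sex; children are not
    split by sex. The default cut-off is an assumption, not a confirmed value.
    """
    if age < child_cutoff:
        return "Children"
    return f"{sex} Adults"

# Example: a 15-year-old female is analysed with the female adults.
example_group = age_sex_group("Female", 15)
```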
The survivability of passengers by the purpose of their voyage is portrayed in Figure 31. Children are excluded in this section as none of them were passengers who performed commercial services.
Manservants had a slightly lower survival probability than other male adult passengers; however, both of these subgroups were still unlikely to live. Notably, the H&W Guarantee Group and Ship’s Orchestra had a survival rate of zero.
Incredibly, all female servants lived through the disaster, whereas only approximately three quarters of other female adult passengers had the same luck.
Figure 31—Passenger Survivability by Purpose of Voyage
As shown in Figure 32, the likelihood of passengers’ survival did not particularly vary depending on whether or not they boarded with a spouse; those who boarded with a spouse were only slightly more likely to live through the disaster. Children and young adults are not included in this section as, understandably, none boarded with spouses.
Figure 32—Passenger Survivability by Boarding with a Spouse
Port of Embarkation
As depicted in Figure 33, the closer along the sea route a passenger’s boarding port was to the second-last port, Cherbourg, the more likely they were to be saved. Furthermore, most children who embarked at Cherbourg survived; however, only half of those who boarded in Southampton survived and none who boarded in Queenstown did.
The passengers who boarded in Belfast, all of whom were lost in the disaster, were the H&W Guarantee Group and an American businessman.
Figure 33—Passenger Survivability by Port of Boarding
Region of Nationality
Survivability varied significantly relative to the nationality held by passengers, as portrayed in Figure 34. No African, Southern or Eastern European children boarded, so are not included in this section.
Male adults were unlikely to survive across all nationalities, aside from Western Europeans, a slim majority of whom anomalously survived. Asia-Pacific, African and Northern European passengers were the only others in this group with survival rates higher than or equal to 20%.
A high percentage of female adults were saved regardless of nationality, aside from Northern and Eastern Europeans who had comparatively lower survival rates.
Children had varying but mostly positive survivability. They were all likely to live aside from those who were American or British; they only had half a chance of surviving.
Figure 34—Passenger Survivability by Region of Nationality
Figure 35 depicts passenger survival rates in accordance with the number of adult companions they travelled with.
It can be observed that male adults had a slump in survivability in those who boarded with 4-5 and 8-9 adults and a slight increase in those who travelled with seven.
The survival rate of female adults takes the form of a Gaussian bell curve among those who embarked with 0-5 adult companions, peaking at a staggering 95%. Those who boarded with 8 adults, however, all survived. There were no female adult passengers recorded as travelling with 6 or 7 adult companions, so a dotted line joins those points in Figure 35.
Children generally had half a chance of being saved, aside from those who boarded with three adults who all survived, or those who travelled with five and were all lost.
Figure 35—Passenger Survivability by Number of Adult Travel Companions
Young Adult Companions
As shown in Figure 36, boarding with young adults decreased passengers’ chances of being saved; the more they embarked with, the less probable it was that they would survive. In particular, all children and male adults who travelled with two or more young adults did not survive. Female adults who voyaged with two young adults had half a chance of surviving, but none of those with three or more lived.
Figure 36—Passenger Survivability by Number of Young Adult Travel Companions
As depicted in Figure 37 for the most part, the more children passengers travelled with, the lower their survival rate. If an adult travelled with one child however, their likelihood of being saved increased; this pattern was particularly significant in male adults.
Figure 37—Passenger Survivability by Number of Child Travel Companions
Due to a lack of available data, only first-class passengers are analysed in this section. Of these, only male adults and young adults are represented, because almost all female and child passengers survived, leaving no survivability pattern to analyse. Throughout this section, ‘A-E’ represent decks (with A being one floor below the boat deck) and dotted lines separate different parts of the ship.
Figure 38 depicts the number of passengers and their survival rate by allocated cabin deck. It can be observed that the closer a passenger’s cabin was to the middle first-class deck (C-deck), the lower their chances of survival and the more densely packed they were. This observed pattern forms the basis of this survivability sub-analysis.
Figure 38—Passenger Survivability by Cabin Deck
Proximity to Ship Centreline
The Captain’s order to board women and children onto lifeboats first is one of the most well-known facts surrounding the Titanic disaster. After the Titanic had collided with the iceberg, Captain Smith placed Chief Officer Wilde (with Second Officer Lightoller assisting) and First Officer Murdoch in charge of launching the lifeboats, overseeing the port and starboard sides respectively. (WilliamMurdoch.net, 2016)
Murdoch enforced his interpretation of the ‘women and children first’ evacuation order: that women and children should board as a priority, with men filling any spare spaces. Lightoller, however, strictly enforced his own interpretation: that only women and children could board. In addition, he and Wilde did not work well together, which further added to the inefficiency of lifeboat allocation on the port side. (WilliamMurdoch.net, 2016)
Because of these differing boarding strategies, the survival of first-class men largely depended on which direction they chose at the top of the grand staircase during the evacuation, which could have been influenced by the side of their allocated cabin (i.e. port or starboard). This theory is supported by the pattern observed in Figure 39 whereby in three out of four decks, passengers were more likely to survive if they were allocated a starboard-side cabin.
Figure 39—Passenger Survivability by Cabin Proximity to Centreline
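The deck-by-deck comparison behind Figure 39 can be sketched as follows: compute the survival rate for each (deck, side) pair, then list the decks on which the starboard rate exceeds the port rate. This is a minimal illustration that assumes each record carries a deck letter, a cabin side and a survival flag. The record layout, function names and toy values are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict


def rates_by_deck_and_side(records):
    """Return {(deck, side): proportion survived} from (deck, side, lived) records."""
    counts = defaultdict(lambda: [0, 0])  # [number survived, number of passengers]
    for deck, side, lived in records:
        counts[(deck, side)][0] += int(lived)
        counts[(deck, side)][1] += 1
    return {key: s / t for key, (s, t) in counts.items()}


def decks_favouring_starboard(rates):
    """List decks where the starboard survival rate is higher than the port rate."""
    favoured = []
    for deck in sorted({deck for deck, _side in rates}):
        port = rates.get((deck, "port"))
        starboard = rates.get((deck, "starboard"))
        # Only compare decks that have passengers recorded on both sides.
        if port is not None and starboard is not None and starboard > port:
            favoured.append(deck)
    return favoured


# Toy records (hypothetical values, for illustration only)
toy_records = [
    ("B", "port", False), ("B", "port", False),
    ("B", "starboard", True), ("B", "starboard", False),
    ("C", "port", True), ("C", "port", False),
    ("C", "starboard", False), ("C", "starboard", False),
]

deck_rates = rates_by_deck_and_side(toy_records)
favoured_decks = decks_favouring_starboard(deck_rates)
```

In this toy data only deck B favours starboard. With so few passengers per group, a difference in rates on its own says little, so group sizes should be reported alongside the rates.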
Proximity to Ship Centre
As observed in Figure 40, on three out of four decks, passengers were significantly more likely to survive if they were in inboard rather than outboard cabins. This could have been because passengers in these rooms were fathers looking after their families; many outboard rooms were joint family rooms (Beveridge, 2008). A-deck was omitted because all of its passenger cabins were positioned towards the centre of the ship.
Figure 40—Passenger Survivability by Cabin Proximity to Centre
Proximity to Amidships
Passengers were far more likely to be saved if they were allocated a cabin towards the front of the ship, as shown in Figure 41 (Beveridge, 2008). D-deck was excluded as only one first-class male adult was recorded as being in an astern cabin there.
Figure 41—Passenger Survivability by Cabin Proximity to Amidships
CONVERSION OF CATEGORICAL DATA
Data Value Conversion to MLM Dataset
|H&W Guarantee Group||3|
|Boarded with Spouse||Yes||1|
|Region of Nationality||Africa||0|
|Port of Boarding||Belfast||0|