The article analyses the survival of passengers aboard the Titanic using data mining techniques, specifically decision tree classification and clustering, with tools such as Weka. It uses a subset of the Titanic passenger data from Kaggle, with the attributes converted to nominal (categorical) values for analysis. The study aims to identify the factors most strongly associated with survival, such as sex, cabin class, age, and point of embarkation.
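Converting a numeric attribute such as age into nominal data can be sketched as follows. This is only an illustration, not the authors' preprocessing: the summary mentions a 20–49 adult group, but the other cut-offs, the band labels, and the function name `age_to_nominal` are assumptions made for this example.

```python
from typing import Optional

# Hypothetical age bands. Only the 20-49 band is taken from the summary;
# the other boundaries and all labels are assumptions for illustration.
AGE_BANDS = [
    (0, 19, "under_20"),
    (20, 49, "adult_20_49"),
    (50, 200, "50_plus"),
]


def age_to_nominal(age: Optional[float]) -> str:
    """Map a numeric age to a nominal (categorical) label."""
    if age is None:
        # Missing ages get their own category instead of being dropped.
        return "unknown"
    whole_years = int(age)  # truncate fractional ages such as 19.5
    for low, high, label in AGE_BANDS:
        if low <= whole_years <= high:
            return label
    return "unknown"  # values outside every band, e.g. negative ages
```

Wide bands like these make the data easy to feed to a classifier that expects nominal attributes, at the cost of hiding differences within each band.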
Key findings include:
- Sex: Being female was the most significant factor in survival, with women showing a higher likelihood of surviving.
- Cabin Class: Passengers in first class had higher survival rates compared to those in lower classes, particularly third class.
- Age Group: Adults aged 20–49 formed the largest group among those who perished, but the broad age bands used in the analysis limited deeper insights.
- Embarkation Point: Point of departure showed a weaker correlation with survival, but it appeared related to class distribution.
The study used a J48 decision tree classifier (Weka's implementation of the C4.5 algorithm), which reached about 81% accuracy, and Weka's SimpleKMeans clustering algorithm to visualise relationships between attributes. While the findings suggest strong associations, the authors caution against inferring causality without further analysis.
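To give a sense of how a C4.5-style tree such as J48 ranks nominal attributes, the sketch below computes the gain ratio of two attributes on a handful of rows. It is a simplified illustration under stated assumptions: the eight rows are invented for this example and are not drawn from the Kaggle data, the function names are hypothetical, and the real algorithm adds further steps (pruning, missing-value handling) that are omitted here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` on `target`, divided by split information."""
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])

    total = len(rows)
    # Weighted entropy that remains after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder

    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Invented toy rows, for illustration only (NOT the actual passenger data).
TOY_ROWS = [
    {"sex": "female", "pclass": "1", "survived": "yes"},
    {"sex": "female", "pclass": "2", "survived": "yes"},
    {"sex": "female", "pclass": "3", "survived": "yes"},
    {"sex": "female", "pclass": "3", "survived": "yes"},
    {"sex": "male", "pclass": "1", "survived": "yes"},
    {"sex": "male", "pclass": "2", "survived": "no"},
    {"sex": "male", "pclass": "3", "survived": "no"},
    {"sex": "male", "pclass": "3", "survived": "no"},
]

if __name__ == "__main__":
    for name in ("sex", "pclass"):
        print(name, round(gain_ratio(TOY_ROWS, name, "survived"), 3))
```

On these toy rows, `sex` scores a higher gain ratio than `pclass`, so a tree built this way would split on it first. That mirrors the ordering reported in the summary but proves nothing about the real dataset.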
The paper concludes by highlighting the need for additional research with the complete dataset and exploration of cross-classification dependencies to enhance the model’s accuracy and insights. It also reflects on the learning process, emphasising the importance of data preparation for effective analysis.
Find it on arxiv.org