Question in Business Statistics
Every year, millions of high school students apply and vie for acceptance to a college of their choice. For many students and their parents, this requires years of preparation, especially for those wishing to attend a top-ranked college. In high schools, students usually work with college advisors to research different colleges and navigate the admissions process.
Elena Sheridan, a college counselor at Beachside High School, is working with 14 students who are interested in applying to the same selective four-year college. She is asked by her school principal to prepare a report that analyzes the chances of the 14 students getting accepted into one of the three academic programs. In a database of past college applicants available to counselors at Beachside High, predictor variables include the student’s high school GPA, SAT score, and the Male, White, and Asian dummy variables that capture the student’s sex and ethnicity. Elena also wants to know whether or not the parents’ education can be a predictor of a student’s college acceptance and plans to include the education level of both parents in her analysis.
Based on her conversation with college counselors at other high schools, she believes that high school students with a GPA of 3.5 or above have a much higher chance of getting accepted into a selective college. She also thinks that SAT scores of at least 1,200 substantially increase the chance of acceptance. To test these anecdotal assumptions, Elena wants to convert the GPAs and SAT scores into the categories corresponding to these thresholds. In addition, the database has a target variable indicating whether or not the past applicant was accepted to the college.
Develop the naïve Bayes classification model and create a report that presents an analysis of the factors that may influence whether or not a high school student is admitted to a selective four-year college. Predictor variables should include the applicant’s sex, ethnicity, parents’ education levels, GPA, and SAT scores. Transform the GPAs and SAT scores into appropriate categorical variables. Make predictions whether or not each of the 14 high school students at Beachside High in the College_Admission_Score worksheet will be admitted.
High school students work hard to excel academically and set themselves apart with extracurricular achievements. Getting into the right college and choosing the right major can help start them out on a successful professional career.
The college admissions data set that is available to Beachside High School counselors includes records of past students who had applied to a selective four-year college that has three academic units: School of Arts and Letters, School of Business and Economics, and School of Mathematics and Sciences. The data set is used to develop classification models based on a naïve Bayes algorithm to predict whether or not the 14 high-achieving students at Beachside High will be admitted to any of the three academic units at the college.
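The way a naïve Bayes classifier turns categorical predictors into an admission probability can be sketched as follows. This is a minimal illustration, not the software procedure used for the report: the function names, the column names (gpa_high, sat_high, admitted), the six toy records, and the add-one (Laplace) smoothing are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and per-feature value frequencies by class."""
    class_counts = Counter(row[target] for row in rows)
    features = [name for name in rows[0] if name != target]
    # counts[feature][class][value] = number of training rows
    counts = {f: defaultdict(Counter) for f in features}
    levels = {f: set() for f in features}
    for row in rows:
        for f in features:
            counts[f][row[target]][row[f]] += 1
            levels[f].add(row[f])
    return class_counts, counts, levels

def predict_proba(model, record):
    """Return P(class | record), assuming predictors are independent given the class."""
    class_counts, counts, levels = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / total  # prior probability of the class
        for f in counts:
            # add-one smoothing (an assumption of this sketch) avoids zero probabilities
            p *= (counts[f][c][record[f]] + 1) / (n_c + len(levels[f]))
        scores[c] = p
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

# Hypothetical toy records, NOT taken from the Beachside High database.
toy_rows = [
    {"gpa_high": 1, "sat_high": 1, "admitted": 1},
    {"gpa_high": 1, "sat_high": 1, "admitted": 1},
    {"gpa_high": 1, "sat_high": 0, "admitted": 0},
    {"gpa_high": 0, "sat_high": 1, "admitted": 0},
    {"gpa_high": 0, "sat_high": 0, "admitted": 0},
    {"gpa_high": 1, "sat_high": 1, "admitted": 0},
]

model = train_naive_bayes(toy_rows, "admitted")
strong = predict_proba(model, {"gpa_high": 1, "sat_high": 1})
weak = predict_proba(model, {"gpa_high": 0, "sat_high": 0})
print(round(strong[1], 4), round(weak[1], 4))
```

In a full analysis the same calculation would include the sex, ethnicity, and parents’ education indicators, and a separate model would be fit for each academic school.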
The 14 current students are the top students in their graduating class. However, because the admission process can be highly competitive, some of these students might not get admitted to the college they wish to attend. Moreover, different academic programs may have different admission criteria. As a result, a naïve Bayes classification model is developed for each of the three academic schools based on the following variables: parents’ education, high school GPAs, SAT scores, and the male, white, and Asian indicators. Because the naïve Bayes algorithm requires that all predictor variables be categorical, GPAs and SAT scores are converted into binary values: a GPA of at least 3.50 is coded as 1 and 0 otherwise, and an SAT score of at least 1,200 is coded as 1 and 0 otherwise. A summary of the students’ demographic and academic information is as follows.
Of the 14 students, eight are female. Four students are of Asian descent, and six are nonwhite.
The average high school GPA and SAT score of the 14 students are 3.64 and 1,261, respectively.
Nine students have a current GPA of 3.50 or higher, and 10 students scored at least 1,200 on the SAT exam.
All but three students have at least one parent who completed a four-year college degree.
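The conversion of GPAs and SAT scores into binary categories at the 3.50 and 1,200 thresholds might look like the sketch below. The student records, identifiers, and column names are hypothetical placeholders and are not the values in the College_Admission_Score worksheet.

```python
GPA_CUTOFF = 3.50   # threshold stated in the report
SAT_CUTOFF = 1200   # threshold stated in the report

def to_binary(value, cutoff):
    """Return 1 when the value meets or exceeds the cutoff, 0 otherwise."""
    return 1 if value >= cutoff else 0

# Hypothetical records for illustration only.
students = [
    {"id": "S01", "gpa": 3.82, "sat": 1310},
    {"id": "S02", "gpa": 3.50, "sat": 1190},
    {"id": "S03", "gpa": 3.41, "sat": 1200},
]

coded = [
    {
        "id": s["id"],
        "gpa_high": to_binary(s["gpa"], GPA_CUTOFF),
        "sat_high": to_binary(s["sat"], SAT_CUTOFF),
    }
    for s in students
]

for row in coded:
    print(row)
```

Note that a value exactly at the threshold (a GPA of 3.50 or an SAT score of 1,200) is coded as 1, matching the "at least" wording of the rule.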
Even though the current students have not decided which academic program they want to pursue, most of them express an interest in the School of Arts and Letters. The data set has 6,964 records of past applicants to this program, which are partitioned into training, validation, and test data sets. Based on the test data set, the naïve Bayes model for the School of Arts and Letters has an overall accuracy rate of 75.81%. The specificity and sensitivity rates are 84.65% and 48.83%, respectively.
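To make the three performance measures concrete, the sketch below computes them from a confusion matrix. The counts are hypothetical, chosen only to show a model that is accurate overall yet misses many admitted applicants; they are not the counts from the actual test partition. The second function is simple arithmetic on the reported rates: because accuracy is a weighted average of sensitivity and specificity, the three reported figures imply the share of admitted applicants in the test data.

```python
def performance(tp, fn, fp, tn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)   # share of admitted applicants correctly identified
    specificity = tn / (tn + fp)   # share of rejected applicants correctly identified
    return accuracy, sensitivity, specificity

def implied_positive_share(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    return (specificity - accuracy) / (specificity - sensitivity)

# Hypothetical counts for illustration only.
acc, sens, spec = performance(tp=40, fn=60, fp=30, tn=270)
print(acc, sens, spec)

# Rates reported for the School of Arts and Letters model.
share = implied_positive_share(0.7581, 0.4883, 0.8465)
print(round(share, 4))
```

Under these definitions, the reported rates for the School of Arts and Letters correspond to roughly a quarter of the test records being admitted applicants, which is why overall accuracy can remain high even when sensitivity is low.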
A summary of performance measures of the classification models for the three academic schools is shown in Table 9.10. The naïve Bayes model predicts that only four of our 14 top students will be admitted to the School of Arts and Letters. This program appears to be the most selective of the three academic units. The scoring results of the 14 students for each of the three academic schools are presented in Table 9.11.
TABLE 9.10 Performance Measures of Naïve Bayes Classifiers
The School of Business and Economics is also a popular choice among the current students. The data set has 4,103 records of past applicants to this program, which are partitioned into training, validation, and test data sets. The accuracy, specificity, and sensitivity rates based on the test data are 77.44%, 83.20%, and 67.86%, respectively. The model predicts that six out of the 14 students will be admitted into this program.
A similar naïve Bayes classifier is developed based on the 6,272 records of past applicants to the School of Mathematics and Sciences. As presented in Table 9.10, the accuracy, specificity, and sensitivity rates based on the test data are 76.24%, 79.81%, and 69.19%, respectively. Based on the scoring results, six out of the 14 students are likely to be admitted into this program. These are the same six students that the previous model classifies as likely to be admitted into the School of Business and Economics.
Based on the overall accuracy rate, the naïve Bayes classifiers perform reasonably well. The lift ratios and the decile-wise lift chart also indicate that the naïve Bayes classifiers are more effective than a baseline random model. As shown in Table 9.10, the lift values of the first decile of the three models are above 2.0, and the AUC values are around 80%. However, compared to other performance measures, the sensitivity rate of the models is relatively low, especially for the School of Arts and Letters. This may be because the schools also use qualitative information that is not captured in the database, making it difficult to correctly classify all of the past applicants who were admitted to the college (i.e., identifying true positive cases). The qualitative factors that are relevant to the admission process include letters of recommendation, written essays, and, for the School of Arts and Letters, a student’s artwork.
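For readers unfamiliar with the lift measure, a first-decile lift can be sketched as follows: rank the test records by predicted admission probability, take the top 10%, and divide the admission rate in that group by the admission rate overall. The probabilities and outcomes below are hypothetical and serve only to illustrate the calculation; the function name is likewise an assumption.

```python
def first_decile_lift(probabilities, actuals):
    """Admission rate among the top 10% of scored records divided by the overall rate."""
    ranked = sorted(zip(probabilities, actuals), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)          # size of the first decile
    top_rate = sum(actual for _, actual in ranked[:n_top]) / n_top
    base_rate = sum(actuals) / len(actuals)    # what random selection would achieve
    return top_rate / base_rate

# Hypothetical scores and outcomes (1 = admitted) for 20 past applicants.
probs = [0.91, 0.85, 0.78, 0.66, 0.60, 0.55, 0.52, 0.48, 0.44, 0.40,
         0.37, 0.33, 0.30, 0.27, 0.22, 0.18, 0.15, 0.11, 0.08, 0.05]
actuals = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

lift = first_decile_lift(probs, actuals)
print(lift)
```

A lift above 1.0 means the model concentrates admitted applicants among its highest-scored records more than random selection would; a value above 2.0, as reported for the three models, means the top decile contains more than twice the overall share of admitted applicants.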
Table 9.11 presents the scoring results of the 14 current students. In general, the only students who are likely to be admitted into any of the three academic programs are those who maintain a GPA of at least 3.50 and score 1,200 or above on the SAT exam. Out of the 14 students, only six of them meet both criteria. A high GPA or a high SAT score alone is not likely to result in an acceptance to the college.
These data-driven results confirm the anecdotal intuition that a GPA of 3.50 and an SAT score of 1,200 are the minimum thresholds that students at Beachside High need to achieve in order to be admitted into a more selective college or university. Moreover, the School of Arts and Letters appears to be more selective than the other two schools. Only four of the six students with a GPA of at least 3.50 and an SAT score of at least 1,200 are likely to get accepted into this program. With this information, it is advised that some of the 14 students wishing to attend the School of Arts and Letters apply to additional colleges and universities with a similar degree program as a back-up plan.
TABLE 9.11 Prediction Results for the 14 Students