Communications of the ACM

Virtual extension

Data Mining and Revenue Management Methodologies in College Admissions


Competition for college admission grows fiercer each year: most colleges receive record numbers of applications and are therefore becoming increasingly selective. The acceptance rate at some elite colleges is as low as 10%, and the uncertainty often causes talented students to apply to schools in the next tier [1]. Students try to improve their chances by applying to multiple schools, each with its own timelines and deadlines for admissions. Consequently, students are often caught in a dilemma: the deadline to accept an offer from a university lower on their priority list may expire before they learn the decision of a university they value more. The college admissions process is thus extremely stressful and unpredictable for both students and their parents.

Universities, on the other hand, usually receive far more applications than they have seats. They consider various factors in making their decisions, with each university using its own process and timelines. A university typically relies on a weighted set of performance indicators to aid the decision-making process. These indicators and their associated weights are often chosen by a best-guess approach that relies mostly on past experience. Moreover, since not all admission offers are accepted, universities send out more offer letters than they have seats and hope that the best students accept.

Figure 1 shows a step-by-step sequence of events in a typical university admission process. The sequence describes a scenario that results in an unfavorable outcome for both the university and the student. The student applies to two universities and prefers one over the other (Step A); the priority 1 university is on the far right of the figure. Each university evaluates the application, and the priority 2 university makes an early offer with a deadline for accepting it (Step B). The student, uncertain about the priority 1 university, accepts the offer from the priority 2 university (Step C), possibly committing some funds. At a later date, the priority 1 university decides to accept the student (Step D), who may no longer be available. This process typically spans a number of months, is fraught with uncertainty, and results in a lose-lose situation for the priority 1 university and the student.

There are two challenges in the admissions process exemplified above:

  • i. The process of identifying the best applicants involves multiple credentials. Given the complex interactions between these credentials, it is not easy to identify a single model that is effective for this selection process. Furthermore, given the competitive nature of university admissions, there are no normative models in the literature.
  • ii. Once the most desirable candidates are identified, the decision to make an offer, and the composition of that offer, are both difficult. Better candidates are likely to be sought by multiple schools, so the university has to trade off the risk of chasing (and still losing) these students against the better chances of getting the next tier of students. Furthermore, in many universities, some admission decisions and offers may have to be made before all applications are received.

We believe that data mining and revenue management techniques can be used effectively to address both these challenges, and thus convert the lose-lose situation into a win-win situation. By applying these techniques, universities can methodically score an applicant and be able to respond almost immediately with an offer, mitigating prolonged uncertainty while increasing transparency. We demonstrate the approach using a simplistic admissions process. Although individual universities may have additional, and possibly subjective features in their admissions processes, we believe that our approach could be adapted to the specific processes of many universities and colleges.

The Approach: A Two-Step Process

Our approach consists of two parts. First, we use data-mining techniques to develop a model that can predict the quality of an applicant using historical student performances and a number of key parameters gleaned from the student's application. Then, revenue management techniques are used to generate a going rate for the quality of the qualified student, which can be used to decide on admission offers at various points in the admission period. The value of the student obtained from the data mining techniques is checked against the going rate tables to make an admission decision. This approach benefits the students by letting them know the university's decision soon after they apply, and also benefits the university by maximizing the overall quality of students it admits.

Before describing the process, we provide a brief overview of data mining and revenue management, with some pointers to sources for more information about these techniques.

Data-Mining Technologies: With the spread of information technology, most industries collect large amounts of data that are often organized into data warehouses. A number of data mining techniques have been developed over the last few decades that probe these data warehouses and provide the intelligence to find patterns in historical data. These patterns in the needs, preferences and disposition of customers can be used to enhance the customer experience and improve revenues and profitability [3].

Data-mining algorithms are broadly classified into two types: supervised and unsupervised. Supervised data mining techniques attempt to explain or categorize a particular target element in the data. Neural networks are one such technique and have been found to be particularly effective in environments where high accuracy, speed and adaptability to changing conditions are needed. They are known for their ability to generalize and to learn from data, much as people learn from experience. Their main drawback is the difficulty of explaining the rationale behind their outputs. A neural network is first trained on one data set until the error rate stabilizes, and is then validated by testing its ability to predict the target variable on another data set for which the target value is already known. Such models have been proposed to identify students who are more likely to accept an offer [6] and to identify students who are likely to drop out based on information available at admission [4]; this is the type of neural network used in our approach. Decision trees are another supervised data mining technique. Unsupervised data mining approaches, on the other hand, attempt to find patterns among groups of records without specifying a target element or a predefined class, and are mainly used for identifying similar cases or clusters.

Revenue-Management Methodologies: In recent years, the growing importance of "perishable" products in the service economy has motivated the development of special methods for pricing such products. Revenue management deals with demand-management decisions for perishable products, or products whose value changes over a finite lifetime during which demand can vary. The basic idea behind these techniques is to identify different classes of customers and then exploit the differences in their willingness to pay [5]. Revenue management has been very successful in developing dynamic pricing strategies for services such as lodging, travel and healthcare.

While both data mining and revenue management have proved to be useful and broadly applicable, to a great extent these powerful techniques have been implemented in different contexts, and have rarely been combined. Our approach is thus innovative not just because it provides new insights for the college admissions process, but also because it illustrates how data mining and revenue management can be combined to solve useful business problems.

Case Study

Step 1: Computing the value of the student. In this first step of the two-step process, data-mining techniques are used to compute the value of the student from information available on the student's application. The value of the student is taken to be the student's predicted performance in the freshman year, measured as the GPA earned. With 42% of U.S. universities witnessing a freshman attrition rate of 25% or higher [4], performance in the freshman year is considered a key indicator of student quality.

For this study, anonymized undergraduate admissions data spanning a four year period was collected from a leading business school in the U.S.A. This data included various credentials derived from the applications of over 6880 students. The grades of these students for the freshman year were obtained and GPA calculated. The data was partitioned randomly into a training set and a validation set, with 70% of the data used to train or learn and the remaining 30% used to validate the models created.
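The random 70/30 partition described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study (which relied on SAS Enterprise Miner); the function name `partition`, the fixed seed, and the use of integer IDs as stand-ins for applicant records are all assumptions made for the example.

```python
import random


def partition(records, train_fraction=0.7, seed=0):
    """Randomly split records into a training set and a validation set."""
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the applicant records: one integer ID per student.
applicants = list(range(6880))
train, validation = partition(applicants)
print(len(train), len(validation))
```

Fixing the seed keeps the split reproducible, so competing models can be compared on the same validation records.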

The following performance indicators obtained from the student's application were used to develop the data mining model:

  1. High School GPA: The student's high school GPA on a 4.0 scale reported on the application
  2. SAT Math score: The math component of the SAT score on a scale of 800 points
  3. SAT Verbal score: The verbal component of the SAT score on a scale of 800 points
  4. Strength of curriculum: The strength of the curriculum offered in the school the student attended. The admissions office assigns this score on a 5.0 scale
  5. Adjusted GPA: The admissions office adjusts the student's high school GPA based on the quality/difficulty of the classes the student completed during high school, using a 5.0 scale.
  6. Adjusted test scores: Scores on standardized tests such as SAT reported by the students are converted to a 5.0 scale.
  7. Subjective score: The quality of the essay, and other activities and experiences listed by the student on the application, scored by the admissions office on a 5.0 scale.
  8. Overall assessment score: An overall score on a 5.0 scale assigned by the admissions office after considering all the credentials of the student

In developing a data mining model, two techniques were considered: neural networks and decision trees. Both can be used for predictive and classification modeling, so we compared their performance on the student quality classification problem to determine which is the better fit for our study. For the decision tree model, students in the data set were classified into three groups based on freshman-year GPA: low tier, middle tier and upper tier. The low tier consisted of students who earned a GPA below 3.0 in the freshman year, the middle tier of students with a GPA between 3.0 and 3.5, and the upper tier of students with a GPA above 3.5. The decision tree was designed to classify a student into one of these three groups. With the neural network approach, we built models using both the raw GPA and the GPA categories. Both techniques were nearly twice as effective as simple random classification in assigning students to these three tiers. Based on the results, we chose the neural network model, not only because it performed better on our data sets, but also because the ability of neural networks to adapt to changing conditions makes them particularly suitable for the admissions context, where the characteristics of the applicant population change over time. The eight indicators listed above were used as the inputs to the network, with the classified freshman GPA as the target variable.
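The three-tier target variable can be expressed as a small function. This is only a sketch: the text does not say which tier a GPA of exactly 3.0 or 3.5 falls into, so treating both boundary values as middle tier is an assumption, as is the function name `gpa_tier`.

```python
def gpa_tier(freshman_gpa):
    """Map a freshman-year GPA to the tier used as the classification target.

    Assumption: the boundary values 3.0 and 3.5 are both counted as middle tier.
    """
    if freshman_gpa < 3.0:
        return "low"
    if freshman_gpa <= 3.5:
        return "middle"
    return "upper"


for gpa in (2.4, 3.2, 3.8):
    print(gpa, gpa_tier(gpa))
```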

The Enterprise Miner package from SAS Institute was used to design and assess the data mining model. The data obtained from the admissions office was first cleaned to create a consistent data set. For example, the high school GPA of some students exceeded 4.0 because the school they attended used a different scale, and these values had to be recalculated on a 4.0 scale. Figure 3 shows the process diagram used to develop the neural network data mining model. The application data is split for training and testing in the Data Partition module. The data is then normalized in the Transform Variables module, to remove any skewing of the model due to differences in absolute values, and fed to the Neural Network. The Assessment module provides various statistical measures and charts that can be used to assess the performance of the model. For instance, lift charts computed in the Assessment module measure the change in concentration of a particular class when the model is used to select a group from the general population. They show the effectiveness of a model by comparing the results predicted by the model against the results obtained with no model or with a random sample.
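To make the lift measure concrete, the sketch below computes lift for a single cutoff: the concentration of the target class among the top-scored records divided by its concentration in the whole population. It is a generic illustration, not the Enterprise Miner implementation; the function name `lift`, the scores, and the labels are invented for the example.

```python
def lift(scores, is_target, top_fraction=0.1):
    """Concentration of the target class in the top-scored group, relative to
    its concentration in the whole population (1.0 means no better than random).
    """
    ranked = sorted(zip(scores, is_target), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))      # size of the selected group
    top_rate = sum(flag for _, flag in ranked[:k]) / k
    base_rate = sum(is_target) / len(is_target)      # assumes at least one target
    return top_rate / base_rate


# Hypothetical model scores and upper-tier flags for ten students.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_upper = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
example_lift = lift(scores, is_upper, top_fraction=0.2)
print(round(example_lift, 2))
```

A lift chart plots this ratio for a range of cutoffs instead of a single one.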

In effect, this process demonstrated that the use of a data mining model could help the admission process by accurately identifying applicants who would be most likely to succeed at the university, at least through their first year. The output from the neural network was the predicted freshman GPA.

Quantifying the quality of the student using techniques outlined above solves the problem of identifying the best applicants in an automated manner soon after the application is received. However, the admission problem is complicated by the fact that not all applications are received at the same time and there is uncertainty about the quality and quantity of applications that would be received later. If an admission decision needs to be made without the university receiving all applications, an admission offer could be made to a student of lower value when the same seat could have been offered to a student of higher value who applied later. Conversely, with a conservative approach, a student with a higher potential could be denied admission, forcing the university to later admit a student of lower potential if high quality students do not apply as expected. Revenue management techniques aid in the decision making in such situations where there is uncertainty in demand.

Step 2: Computing going-rate tables. Companies such as airlines and hotels use revenue management techniques to maximize their revenue by collecting the best price possible for each seat/resource even when there is uncertainty in future demand. Thus, a university's efforts to maximize the overall quality of the students it admits as it tries to fill each seat with the best possible student is comparable to the revenue management process in the service industry. Similar to the service industry that tries to maximize the price for each seat, we use revenue management techniques to select the best applicant when the quality and quantity of future applicants is uncertain. However, instead of a price, we use the value or the quality of the applicant derived in Step 1 and compare it against the going-rate derived by revenue management techniques, to make a decision.

A common revenue management model is Littlewood's two-class model, which is applicable when there are only two classes of customers, each willing to pay a different price. The Littlewood model also assumes that demand at the lower price (value) arrives first, and it determines the protection level for the higher-priced units. Since students of differing value apply in random order, Littlewood's model is not suitable for our study. A dynamic model with Markovian periods is more suitable, as it can handle demand that arrives in a random fashion. The entire period during which applications are accepted is divided into tiny units called Markovian periods, so that at most one application can be received in each period. This dynamic model can be used to determine time-dependent nested protection/booking levels and also a time-dependent bid price table.
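One common way to write such a dynamic model is the standard single-resource recursion from the revenue management literature. The article does not state its exact formulation, so the version below is an assumption, and all of the symbols are introduced here for illustration.

```latex
% Assumed notation: t = 1..T indexes Markovian periods, x = seats remaining,
% r_j = value of a tier-j applicant, \lambda_j(t) = probability that a
% tier-j application arrives in period t, with \sum_j \lambda_j(t) \le 1.
\begin{align*}
V_t(x) &= \sum_j \lambda_j(t)\,\max\bigl\{ r_j + V_{t+1}(x-1),\; V_{t+1}(x) \bigr\}
          + \Bigl(1 - \sum_j \lambda_j(t)\Bigr) V_{t+1}(x), \\
V_{T+1}(x) &= 0, \qquad V_t(0) = 0, \\
\pi_t(x) &= V_{t+1}(x) - V_{t+1}(x-1), \qquad
\text{accept a tier-}j\text{ applicant in period } t \iff r_j \ge \pi_t(x).
\end{align*}
```

Here V_t(x) is the expected total value obtainable from period t onward with x seats left, and the bid price \pi_t(x) is the expected value given up by filling one seat now rather than holding it for later applicants.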

The bid price or going-rate table entries can then be used by the admissions office as a reference threshold to accept or deny a particular application. Because the university has a good idea of its yield rate from historical results, it makes more offers than it has seats. The time remaining from the current date to the end of the admission period is divided into a number of very small periods, for which a reasonable assumption can be made regarding the number of applications received for each tier. In this study, it was assumed that four weeks remained until the application deadline. Table 1 provides the assumed number of applications received in each of the four weeks for the three categories. It was assumed that these numbers include applications that were initially turned down and then reconsidered.

It was also assumed that the university receives 6650 applications and makes about 1000 offers. The figure of 1000 offers assumes that, with the new process, the yield rate would be around 80%, filling the 800 seats the university can accommodate. To satisfy the requirements of the dynamic model with Markovian periods, the four-week application window was divided into 33250 time periods, giving each period a 20% chance (6650/33250) of receiving an application. It was assumed that at most one application would be received during each of these tiny periods. With these assumptions, the dynamic model was run and the bid price table generated. Table 2 shows the bid prices for time periods in increments of 4000. Each block of 4000 periods corresponds to about 3.4 days (4000/33250 of the 28-day window).
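A bid price table of this kind can be generated by backward induction over the Markovian periods. The sketch below is a generic single-resource dynamic program and not the authors' model: the function name `bid_price_table`, the tier values (3.75, 3.25, 2.75), and the per-tier arrival probabilities are assumptions chosen so that the total arrival probability per period is 20%, as in the text. The instance is kept small (200 periods, 10 seats) so that it runs instantly.

```python
def bid_price_table(num_periods, capacity, classes):
    """Backward induction for a one-arrival-per-period admission model.

    classes: list of (value, arrival_probability) pairs, one per tier.
    Returns bid[t][x]: the lowest value worth accepting in period t with
    x seats left (infinity when x == 0, so nothing is accepted).
    """
    total_prob = sum(p for _, p in classes)
    assert total_prob <= 1.0, "at most one application per period"
    next_value = [0.0] * (capacity + 1)   # value-to-go after the final period
    bid = [None] * num_periods
    for t in range(num_periods - 1, -1, -1):
        # Bid price = expected value lost later by giving up one seat now.
        bid[t] = [float("inf")] + [
            next_value[x] - next_value[x - 1] for x in range(1, capacity + 1)
        ]
        value = [0.0] * (capacity + 1)
        for x in range(1, capacity + 1):
            expected = (1.0 - total_prob) * next_value[x]   # no application
            for r, p in classes:
                # Accept the applicant only if that beats keeping the seat.
                expected += p * max(r + next_value[x - 1], next_value[x])
            value[x] = expected
        next_value = value
    return bid


# Hypothetical tiers: (predicted GPA, probability of arriving in a period).
tiers = [(3.75, 0.05), (3.25, 0.10), (2.75, 0.05)]
table = bid_price_table(num_periods=200, capacity=10, classes=tiers)
print(round(table[0][10], 2), round(table[0][1], 2))
```

In this sketch the bid price falls as more seats remain and as the deadline approaches, which matches the qualitative behavior described for the going-rate table.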

This bid price table serves as a tool that the admissions office can use to make decisions on an individual application. The overall process then works as follows: the admissions office collects and categorizes admissions data and then uses the data mining model that was developed earlier to determine the value of a student (as a surrogate for the benefit derived by the university in accepting the student). At each period in time during the admission period and with a specific number of seats still available, the revenue management approach is used to generate the bid price table in terms of student value. This table can then be used to determine the going rate, based upon the remaining capacity and the remaining time.

For example, near the beginning of the admissions cycle, at Markovian time period P2000 and with all 1000 seats available, the bid price is 3.25. This indicates that at this stage the university should accept a student whose predicted value is 3.25 or more. If, on the other hand, only 800 seats were available at P2000, the university would be more selective and the going rate would be 3.73. If better students do not apply later as expected, the bid prices drop, so a student who is denied admission in one time period can be reconsidered in later periods and may then qualify. By using these bid prices as a reference, the university can maximize the overall quality of admitted students and, at the same time, decide on each application quickly.
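The lookup described in this example amounts to a single comparison. In the sketch below, the dictionary holds only the two bid prices quoted in the text for period P2000; the function name `decide`, the "hold" outcome (standing in for "denied for now, may be reconsidered"), and the applicant's predicted GPA of 3.4 are illustrative assumptions.

```python
# The two entries quoted in the text, keyed by (period, seats available).
# A real bid price table would cover every period and seat count.
bid_prices = {(2000, 1000): 3.25, (2000, 800): 3.73}


def decide(predicted_gpa, period, seats_left, table):
    """Offer admission when the predicted value meets the going rate."""
    going_rate = table[(period, seats_left)]
    # "hold" rather than "reject": the student can be reconsidered in a
    # later period if bid prices drop.
    return "offer" if predicted_gpa >= going_rate else "hold"


print(decide(3.4, 2000, 1000, bid_prices))
print(decide(3.4, 2000, 800, bid_prices))
```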

Discussion and Application

In this paper, we describe an innovative and promising approach to improving the college admissions process. We show how supervised data mining techniques such as decision trees and neural networks can be used to classify students effectively in terms of their predicted freshman GPA, using credentials gathered from their applications. By predicting the freshman GPA of applicants at the time of admission, we were able to quantify the value of the student to the university. This derived value, or quality, of the student was then used to maximize the overall quality of admitted students, using revenue management techniques. A dynamic model with Markovian time periods was used to generate a bid price table that the admissions office can use as a reference to accept or deny an application.

It should be noted that what we have presented is a modeling approach, and not a specific model. To employ the approach suggested in this study, a university would need to evaluate the performance measures on the application that it considers important, and use them in Step 1 to assess the value of the student using data mining techniques. The model would have to be refined periodically to accommodate any changes after implementation. Furthermore, the basic model would need to be enhanced to accommodate additional complexities such as early admission, waitlists, and the probability of a student not honoring a commitment to accept an offer. Nevertheless, this study demonstrated the efficacy of combining two powerful methodologies to address a problem common to most college admissions processes.

With the approach outlined in this study, instead of waiting to receive all applications before making a decision, a university can respond quickly to each student and at the same time maximize its own benefit by accepting the best possible student for every seat offered. The student would also be able to make a better decision, since he or she would know the result of the application sooner. Thus, the proposed approach attempts to convert the admissions process from a potentially lose-lose situation to a win-win situation. It is likely that the two techniques could be similarly combined in other applications as well, such as recruiting college graduates.

References

1. Athavaley, A. Colleges reject record numbers. The Wall Street Journal. (Apr. 3, 2007).

2. Barker, K., Trafalis, T., and Rhoads, T.R. Learning from student data. In Proceedings of the 2004 Systems and Information Engineering Design Symposium, (Apr. 2004), 79-86.

3. Berry, M.J.A. and Linoff, G. Data Mining Techniques, Second Edition, Wiley Publishing.

4. Kemerer, F.R. and Baldridge, J.V. Strategies for Effective Enrollment Management, (1982).

5. Talluri, K.T. and Van Ryzin, G.J. The Theory and Practice of Revenue Management, Second Edition, Wiley Publishing.

6. Walczak, S. Neural network models for a resource allocation problem. IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics 28, 2, (Apr. 1998), 276-284.

Authors

Amit Basu is the Carr P. Collins Chair in MIS at the Cox School of Business, Southern Methodist University, Dallas, TX.

Surya Rebbapragada is a software engineer at Verizon Communication in Irving, TX. He received his MBA from Southern Methodist University at Dallas, TX.

John Semple is the Charles Wyly Professor of MIS at the Cox School of Business, Southern Methodist University, Dallas, TX.

Footnotes

DOI: http://doi.acm.org/10.1145/1721654.1721690

Figures

Figure 1. A sequence of events in an admissions process resulting in an adverse outcome

Figure 2. The approach: a two-step process

Figure 3. Data mining model built using Enterprise Miner in SAS

Tables

Table 1. Number of applications received each week

Table 2. A condensed form of the Bid Price table


©2010 ACM  0001-0782/10/0400  $10.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.


Comments


The account that made this comment no longer exists.

When the first step is like this, the second step may be hiring a robot teacher...


Arvind Punj

It is unclear how the model adjusts if more candidates with the expected 3.25 score accept than anticipated, since the chances of admitting candidates with a higher GPA or score may then go down. It may be better to publish more instances of the bid price table to clear this up.

