Applications Quest: Computing Diversity

It helps admissions officers review college applicants holistically.

Posted Mar 1 2006

Recently, two landmark cases challenged the University of Michigan’s admissions policies. In Grutter v. Bollinger, which focused on admissions to the university’s law school, the U.S. Supreme Court ruled 5-4 in favor of the law school’s admissions policy, which was designed to enhance the diversity of the student body. However, in Gratz v. Bollinger, by a vote of 6-3, the court reversed, in part, the university’s undergraduate admissions policy of awarding points for race/ethnicity. Here, the court decided that although race could be considered in admission decisions, it cannot be the deciding factor. Although this decision appears to support affirmative action efforts, it severely limits how race can be used to achieve diversity goals. The Supreme Court thus ruled that diversity could be used in university admissions, but did not specify how this should be achieved.

Rather than simply excluding any consideration of race/ethnicity from the admissions process, the route taken by many educational institutions, the University of Michigan has chosen to implement an expensive and labor-intensive holistic evaluation process that incorporates race/ethnicity as a factor. In Michigan’s holistic review process, every application is read by different admissions counselors. Counselors rate each application as outstanding, excellent, good, average/fair, or below average/poor. Additionally, the counselors give each application a recommended decision: high admit, admit, admit with reservation, deny with reservation, or deny [6]. Even though this process adheres to the Supreme Court’s rulings, it provides no metrics for comparing large numbers of applications. Because counselors rate applications subjectively, there is no method for determining the extent to which applications are alike or different when they must be compared.

In general, a holistic evaluation uses race as one of many attributes being considered by the university as part of its decision, yet all attributes play a role and no single attribute is the determining attribute. This raises several interesting questions regarding holistic application evaluation: How does holistic evaluation translate into practice? What techniques could best be employed to compare large numbers of applications? Can holistic evaluations be performed economically without sacrificing quality? In an attempt to address these and other issues associated with holistic evaluation, we have developed a new computer algorithm, Applications Quest, a dynamic software tool that clusters applications, thereby giving admissions professionals a new perspective on holistic evaluation. This new approach includes race/ethnicity as one of the factors considered, but does not assign a numerical value to it and thus complies with the Supreme Court decision.

Clustering for Diversity

University admissions offices use applications to gather the same information about each applicant, ensuring that all applicants can be evaluated based on the same attributes. As the application represents every potential student, each university application is expected to contain pertinent information conveying the most important details about each applicant. Following this principle, it is possible to define diversity using a holistic view of an application. From this perspective, diversity is observed when the selected group of applications is holistically diverse. This level of diversity can be obtained through cluster analysis.

Clustering is the grouping of similar objects for the purpose of classification or categorization. Clustering is one of the most basic abilities of humans [1]. An intelligent being cannot treat every object as a unique entity unlike anything else in the universe. Instead, he or she must put objects in categories so as to apply hard-won knowledge about similar objects encountered in the past to the object at hand.

Clustering algorithms can be divided into two categories: hierarchical and non-hierarchical. Hierarchical clustering methods create clusters or groups by merging or dividing. These actions may occur in one of two forms: agglomeration or division [1, 2]. Agglomerative clustering methods form clusters by merging individuals and begin by assuming each instance in the collection population is an individual cluster. In the course of each processing cycle, two clusters are merged. This process continues until either there is only one cluster remaining that contains all instances in the population, or some other predefined stopping point has been reached, such as a specified number of clusters. The divisive clustering approach works in the opposite direction. It starts by assuming that all instances belong to one cluster. In each step of the process, a cluster is split into two clusters, until all clusters contain a single instance, or some other predefined stopping point has been reached, such as a specified number of clusters.

Non-hierarchical clustering methods result in faster execution times compared to hierarchical methods. The most common non-hierarchical method is k-means [1, 2, 7]. Before the k-means algorithm can be executed, the number of clusters is typically specified, which is k. Initially, k-means begins by selecting k instances as centroids. A centroid is the most representative instance within a cluster. It is the instance within a cluster that has the shortest distance from all the other instances within the cluster. The centroid instances are typically selected at random, or this process may utilize some heuristic. Much like the divisive approach, all the remaining non-centroid instances are compared to each centroid. The non-centroid instances are placed in the cluster with the most similar centroid. At the end of each cycle, the centroids are recalculated for each cluster and the instances are redistributed until the centroids do not change. There are several variations of k-means, such as bisecting k-means [7], but they all follow this basic approach.
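The cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the Applications Quest implementation: the function names are hypothetical, the starting centroids are passed in explicitly (rather than chosen at random) so the run is reproducible, and the centroid is taken to be an actual instance, as the paragraph describes, using squared Euclidean distance.

```python
from typing import List, Sequence

Point = Sequence[float]

def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def most_central(members: List[int], points: List[Point]) -> int:
    """Index of the member with the smallest total distance to the others."""
    return min(members,
               key=lambda i: sum(sq_dist(points[i], points[j]) for j in members))

def cluster_around_centroids(points: List[Point], initial: List[int],
                             max_cycles: int = 100) -> List[List[int]]:
    """Group point indices around k centroid instances (k = len(initial))."""
    centroids = list(initial)
    clusters: List[List[int]] = []
    for _ in range(max_cycles):
        # Place every instance in the cluster of its most similar centroid.
        clusters = [[] for _ in centroids]
        for i, p in enumerate(points):
            nearest = min(range(len(centroids)),
                          key=lambda c: sq_dist(p, points[centroids[c]]))
            clusters[nearest].append(i)
        # Recalculate each centroid; keep the old one if a cluster is empty.
        updated = [most_central(m, points) if m else c
                   for m, c in zip(clusters, centroids)]
        if updated == centroids:  # centroids did not change, so stop
            break
        centroids = updated
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(cluster_around_centroids(points, initial=[0, 1]))
```

Even with two poorly chosen starting centroids (both in the same corner), the recalculation step moves one of them to the second group after the first cycle.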

All clustering algorithms must utilize some distance, or similarity, measure. Distance measures determine the distance or similarity between instances within a given population. These measures can be calculated using several methods, but the Euclidean distance is the most commonly used distance measure [2]. Euclidean distance is based on Pythagoras’ theorem, where instances are represented as points in an n-dimensional space. The distance between any two points in an n-dimensional space is calculated as the square root of the sum of the squared differences between the two points along each dimension. Euclidean distance measures are used by clustering algorithms to determine distance or similarity, yielding a basis for comparison between instances, or objects, with the same attributes/characteristics. As a result, clustering algorithms can be applied to admissions applications.
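As a small illustration of the measure just described, the following Python sketch computes the Euclidean distance between two points of equal dimension. The function name is chosen here for illustration only.

```python
import math
from typing import Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the sum of squared differences along each dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The classic 3-4-5 right triangle: the distance is the hypotenuse.
print(euclidean((0, 0), (3, 4)))
```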

When holistic clustering is applied to admissions applications, the results yield clusters or groups of similar applications. The table here gives an illustration of three graduate school applications. Comparing each attribute, such as INST1_GPA, GRE_V, and GRE_Q, across the applications makes it easy to see that applications 146 and 59 are more similar to each other than to the third application. Comparing the three applications is fairly easy. However, when hundreds or thousands of applications have to be compared or when the similarity between applications is not so apparent, this task becomes practically impossible for a human admissions counselor. Consequently, Applications Quest was developed to perform holistic comparisons between applications, yielding clusters of similar applications.

Applications Quest is a software tool that uses hierarchical clustering approaches to holistically compare admissions applications and place them in clusters based on their similarity. This process begins by collecting applications from an admissions pool in an electronic format and placing them in a database table. Each application’s attributes are classified as numeric, opinion, or nominal. Numeric attributes contain numerically based values, such as GPA, GRE, GMAT, SAT, and ACT scores.

Opinion attributes must be evaluated by an admissions counselor and include personal statements and essays. Nominal attributes are those that do not have a numeric base but exist in name only, such as race/ethnicity, gender, first-generation student, and major. Before processing, all numeric attributes are scaled to values between 0 and 1. Opinion attributes are assigned ratings between 1 and 10 by the admissions counselors, which are later scaled to values between 0 and 1. This scaling provides a basis for comparison between attributes and ensures that all values are on the same numerical base. The nominal attributes are not assigned values and are handled differently from the numeric and opinion attributes. Applications Quest uses a squared Euclidean distance measure, which is the Euclidean distance with the final square root omitted, saving one operation. When two applications are compared, the numeric and opinion attributes are treated identically, in that the sum of the squared differences between the scaled values is computed. When considering nominal values, the attribute values are either the same or different, yielding 0 for identical attribute values or 1 for different values. Using the squared Euclidean distance measure, Applications Quest computes a similarity matrix.
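The comparison of two applications described above might look like the following Python sketch. It is only an illustration under stated assumptions: the article does not say how values are scaled, so simple min-max scaling is assumed, and the `Application` class, the attribute names, and the score ranges are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

def scale(value: float, low: float, high: float) -> float:
    """Min-max scaling to the range 0..1 (an assumed scaling method)."""
    return (value - low) / (high - low)

@dataclass
class Application:
    scaled: Dict[str, float]   # numeric and opinion attributes, already in 0..1
    nominal: Dict[str, str]    # attributes that exist in name only

def sq_distance(a: Application, b: Application) -> float:
    """Squared Euclidean distance with a 0/1 difference for nominal attributes."""
    numeric_part = sum((a.scaled[k] - b.scaled[k]) ** 2 for k in a.scaled)
    nominal_part = sum(0 if a.nominal[k] == b.nominal[k] else 1 for k in a.nominal)
    return numeric_part + nominal_part

# Hypothetical applicants: GPA on a 0-4 scale, essay rated 1-10 by a counselor.
first = Application({"GPA": scale(3.0, 0, 4), "ESSAY": scale(10, 1, 10)},
                    {"MAJOR": "CS"})
second = Application({"GPA": scale(4.0, 0, 4), "ESSAY": scale(1, 1, 10)},
                     {"MAJOR": "EE"})
print(sq_distance(first, second))  # 0.0625 (GPA) + 1.0 (essay) + 1 (major)
```

Because every attribute contributes at most 1 to the total, no single attribute, nominal or otherwise, can dominate the comparison.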

Applications Quest compares every application to every other application and places the result of each comparison into a database table called the similarity matrix. The similarity matrix contains an entry for every comparison and the similarity between each pair of applications. This is computationally expensive because it must consider all combinations, nCr = n! / (r! (n – r)!), where n is the number of applications and r represents the number compared at a time. Given a pool of 1,000 applications compared two at a time, 499,500 comparisons are required to build the similarity matrix. The similarity matrix cannot be avoided when holistically comparing applications. Once the similarity matrix has been built, Applications Quest applies one of the aforementioned clustering algorithms.
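The pairwise count can be checked directly, and the matrix itself can be sketched as a mapping from each pair of application indices to their distance. The sketch below keeps the matrix in an in-memory dictionary rather than a database table, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")

def build_similarity_matrix(apps: List[T],
                            distance: Callable[[T, T], float]
                            ) -> Dict[Tuple[int, int], float]:
    """One entry per unordered pair (i, j) with i < j."""
    return {(i, j): distance(apps[i], apps[j])
            for i, j in combinations(range(len(apps)), 2)}

# Number of comparisons for a pool of 1,000 applications taken two at a time.
print(comb(1000, 2))  # 499500

# Tiny example with one-dimensional "applications" and a squared difference.
matrix = build_similarity_matrix([0.0, 0.5, 1.0], lambda a, b: (a - b) ** 2)
print(matrix)  # {(0, 1): 0.25, (0, 2): 1.0, (1, 2): 0.25}
```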

Applications Quest’s processing is initiated by the user who specifies the number of clusters and the number of applications that Applications Quest will recommend for admissions from each cluster. Next, Applications Quest builds the similarity matrix and applies an agglomerative or divisive clustering approach to the application pool.

The divisive clustering approach begins by identifying the two most different applications using the similarity matrix. These two applications are used to split the pool into two clusters and are thus the centroids. All other applications are placed in one of the two clusters surrounding the selected two applications based on their distance/similarity to either of the centroids. Once all the applications have been placed in one of the two clusters, the largest cluster is selected and split again using the divisive approach by selecting the two most different applications within the selected cluster as centroids. This process continues until the specified number of clusters has been obtained.
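A minimal Python sketch of this divisive procedure follows. It assumes applications are represented as tuples of scaled numeric values and recomputes distances on the fly instead of reading them from the similarity matrix. The function names and the tie-breaking rule (an application equally close to both centroids goes to the first) are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]

def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def divisive(points: List[Point], k: int) -> List[List[int]]:
    """Split the pool until k clusters of point indices exist."""
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        largest = max(clusters, key=len)
        if len(largest) < 2:   # nothing left to split
            break
        # The two most different members become the centroids of the split.
        a, b = max(combinations(largest, 2),
                   key=lambda pair: sq_dist(points[pair[0]], points[pair[1]]))
        near_a: List[int] = []
        near_b: List[int] = []
        for i in largest:
            closer_to_a = sq_dist(points[i], points[a]) <= sq_dist(points[i], points[b])
            (near_a if closer_to_a else near_b).append(i)
        clusters.remove(largest)
        clusters.extend([near_a, near_b])
    return clusters

pool = [(0,), (1,), (2,), (10,), (11,), (12,)]
print(divisive(pool, 2))  # [[0, 1, 2], [3, 4, 5]]
```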

When the agglomerative approach is used, it merges the most similar applications into clusters until the specified number of clusters has been reached. Applications Quest uses an average linkage [2] comparison measure to merge clusters. When the final number of clusters has been reached using either method, an email is sent to the admissions counselors notifying them that processing is complete. The admissions counselors can then use Applications Quest’s cluster visualization interface to process the applications.
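The agglomerative approach with average linkage can be sketched as follows. This is an illustrative Python version, not the tool's code: it assumes tuple-valued applications, uses squared Euclidean distance, and defines the linkage between two clusters as the mean distance over all cross-cluster pairs. Function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]

def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_linkage(c1: List[int], c2: List[int], points: List[Point]) -> float:
    """Mean distance over every pair with one member from each cluster."""
    total = sum(sq_dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(points: List[Point], k: int) -> List[List[int]]:
    """Start with one cluster per point; merge the closest pair until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > max(k, 1):
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points))
        clusters[x] = clusters[x] + clusters[y]   # x < y, so deleting y is safe
        del clusters[y]
    return clusters

pool = [(0,), (1,), (10,), (11,), (30,)]
print(agglomerate(pool, 3))  # [[0, 1], [2, 3], [4]]
```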

Processing the Applications

After the applications have been placed into clusters, the admissions counselors can view the results using Applications Quest’s cluster visualization interface. Figure 1 is the summary page of the visualization tool and contains information about all the applications. In the example shown, 754 applications are divided into 75 clusters. Applications Quest recommends two applications from each cluster, as specified by the user. The recommended applications are selected from each cluster such that they optimize the Difference Index, which is defined as the average difference between applications. This measures the degree of difference within clusters and for the entire application pool. The larger the Difference Index value, the greater the difference between applications. For example, two completely opposite applications will have a Difference Index of 100%, vs. 0% for two identical applications.

In Figure 1, the Difference Index is 49.90%, with a standard deviation of 14. The Difference Index for the recommended applications is 52.96%, with a standard deviation of 13. Notice the Difference Index for the recommended applications is larger than the Difference Index for the applications pool. This illustrates how Applications Quest optimizes the Difference Index by selecting the N most different applications from each cluster, where N is the number of recommended applications per cluster specified by the user. Furthermore, the summary view gives the overall average for each of the numeric and opinion attributes, such as INST1_GPA of 1.078276. The top three values for each nominal attribute are also given. For example, the CITIZENSHIP attribute is a nominal attribute, with 257 applicants from India, 201 from the U.S., and 164 from China.
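The article defines the Difference Index only as the average difference between applications, expressed as a percentage. The Python sketch below is one possible reading, under assumptions that are not from the article: each pair's difference is the squared-distance sum divided by the number of attributes (so two completely opposite applications score 100%), and recommendations are found by exhaustively trying every group of N applications in a cluster, which is only practical for small clusters. All names are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence, Tuple, Union

Attr = Union[float, str]          # scaled numeric/opinion value or nominal value
App = Tuple[Attr, ...]

def pair_difference(a: App, b: App) -> float:
    """Percentage difference between two applications (assumed normalization)."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str):
            total += 0.0 if x == y else 1.0   # nominal: same or different
        else:
            total += (x - y) ** 2             # numeric/opinion: squared difference
    return 100.0 * total / len(a)

def difference_index(apps: Sequence[App]) -> float:
    """Average pairwise difference; 0.0 when fewer than two applications."""
    if len(apps) < 2:
        return 0.0
    return mean(pair_difference(a, b) for a, b in combinations(apps, 2))

def recommend(cluster: Sequence[App], n: int) -> Tuple[App, ...]:
    """The n applications in the cluster with the largest Difference Index."""
    return max(combinations(cluster, n), key=difference_index)

cluster = [(0.0, "A"), (0.1, "A"), (1.0, "B")]
print(difference_index([(0.0, "A"), (1.0, "B")]))  # 100.0
print(recommend(cluster, 2))  # ((0.0, 'A'), (1.0, 'B'))
```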

Applications Quest includes a navigation frame, shown on the left of Figure 1, containing a list of all the clusters, ordered by the number of applications within each cluster. In the Figure 1 example, Cluster 42 has 18 applications and Cluster 0 has 17 applications. When the user selects a cluster in the navigation frame, the summary results for that cluster are displayed in the content frame.

Figure 2 illustrates the summary view for Cluster 42. Notice the Difference Index within Cluster 42 is 23.61%, with a standard deviation of 4. The summary results are presented using the same format as the applications summary. However, the cluster summary also contains an additional table at the bottom of the content frame, as shown in Figure 3. Every application that is a member of Cluster 42 appears in one row of the cluster summary table.

Admissions Decisions

Now that admissions counselors have a powerful visualization tool in Applications Quest, how will they use it? How does this tool help them in the decision-making process?

Recall that the U.S. Supreme Court rulings deemed diversity a worthy goal for institutions of higher education. The rulings also stated that race/ethnicity could be used as part of the decision-making process, although race could not be the determining factor, nor could it be assigned points [3, 4]. Applications Quest holistically compares applications and places them in a specified number of clusters based on their similarity. During this process, race/ethnicity is defined as a nominal attribute.

When race is compared between two applications, it is not given any preferential consideration by assigning it some predefined weight. Race/ethnicity is measured based on its similarity, just as any nominal attribute. Applications Quest provides admissions counselors a specialized view based on holistic clustering of their applications pool. It recommends applicants who should be given strong consideration for admission while ensuring that these applicants optimize the Difference Index. The counselors can then use the recommendations to narrow their admissions search. For example, admissions counselors enter all the applications that meet some minimum admissions requirement into the Applications Quest database.

Next, the counselors specify the number of applications that Applications Quest should recommend for admission. The counselors execute Applications Quest, review the recommended applications, and admit students from the recommended application pool or simply admit selected students from each cluster. Therefore, admissions counselors are able to use Applications Quest to narrow their search space and achieve diversity.

Conclusion

The U.S. Supreme Court has ruled that race can be considered in university admissions decisions. However, it did not specify exactly how race should be used. Applications Quest allows university admissions counselors to cost-effectively and holistically evaluate applications and consider race/ethnicity as one of the application attributes.

Applications Quest is undergoing an extensive pilot study in which it is being used to measure the diversity obtained from various universities using their existing admissions policies. Diversity is being measured using the Difference Index developed for this project. The Difference Index of admitted students is being compared to the index for those that Applications Quest recommends. The results of the study will reveal the degree to which admissions decisions would be different using Applications Quest and whether this approach can be used effectively to measure diversity.

Figures

Figure 1. Applications Quest summary page.

Figure 2. Cluster summary.

Figure 3. Cluster summary table.

Table. Sample graduate school applications.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DOI: 10.1145/1118178.1118183

Published: March 1, 2006, in Communications of the ACM, Vol. 49 No. 3 (March 2006 Issue), Pages 99-104
