Shortly after the first author started his tenure-track position at Bar-Ilan University, he published a few additional papers with his doctoral advisor. These papers were mostly "lingering" results from his Ph.D. or direct extensions thereof. He was very surprised when his department chair reprimanded him for this, claiming it could be harmful to his career. Until now, however, we were unable to find any support for that claim in the literature.
The benefits and importance of mentoring have been long established and span a wide variety of vocational fields both in and outside of academia.2,7 In the academic realm, the benefits of supervision are commonly mutual:6 The advisor extends her ability to conduct research by delegation and extends her influence network, while the advisee learns the important skills needed to conduct scientific research, receives various types of academic support, and so on. Focusing on the advisee, prior research has shown the doctoral advisor's identity and characteristics can have a far-reaching effect on a doctoral student's future career. For example, having an advisor with a strong publication record was shown to drive graduate students' publication activity,16 to increase students' chances of obtaining an academic position,10 and to serve as a predictor for future academic success.9 These, in turn, include higher levels of scientific autonomy,5 active international collaboration dynamics,1 an increase in the advisee's chances of pioneering their own research topics (that is, not following their advisor's research topics), winning prestigious prizes and recognition,12 and publishing in top venues such as Nature and Science.18
An advisor's strong publication record is not the only factor to influence the advisee's career trajectory. A satisfactory advisor-advisee relationship, which is an essential component of successful doctoral training,8,19,20,22,25 the number of advisors and their characteristics,23 the number of advisees that an advisor mentors,14 and the advisor's academic age11 all influence the advisee's future academic path. However, to the best of our knowledge, existing literature has yet to focus on a continued relationship after the advisee has completed her doctoral studies. One notable exception is Ma et al.,12 who investigated various possible effects of having an award-winning advisor on the future success of the advisee across various fields (but not in computer science). The authors found the proportion of co-authored papers between an advisee and her advisor within the advisee's total body of work negatively relates to her chances of becoming an award winner herself. This result may suggest a continued advisor-advisee relationship indeed bears significant (negative) effects on the advisee's academic success in her career.
In this work, we adopt a more fine-grained approach than Ma et al.,12 which allows us to reach more nuanced conclusions for computer science. Specifically, we examine richer measures of academic success (also known as impact metrics) rather than awards and follow the advisor-advisee collaboration pattern on a yearly basis rather than relying on overall frequency of co-authored papers in one's body of work. Overall, our results confirm and significantly extend those reached in Ma et al. by identifying three distinct advisor-advisee collaboration patterns in computer science, which are in turn linked to the advisees' academic success and the advisors' characteristics.
The data used in this study comes from three highly popular datasets: DBLPa (∼5.5M papers and ∼1.7M authors), Academic Family Tree (AFT)b (∼00K authors) and Microsoft Academic Graph (MAG) (∼36M papers and ∼18M authors).c
DBLP is the most popular computer science database today, which indexes all major CS journals, conference proceedings, books, and preprint servers. From DBLP, we extracted all indexed data up to 2020 and focused on all authors who had published at least five papers over a publication career span of at least five years (as was done in past works, for example, Liu et al.11), resulting in over half a million authors who published approximately 12.5M non-unique papers (average of 22 papers per author) and over half a million unique papers.
AFT is a crowdsourced academic genealogy database that is often used in academic research to investigate different aspects of advisor-advisee relationships, for example, Liénard et al.,10 Ma et al.,12 and Sanyal et al.17 While AFT may not be completely representative of the advisor-advisee population, it does provide comprehensive state-of-the-art indexing of a variety of academic supervision relationships, including Ph.D. supervisions, along with basic (mostly partial) information on the first and last years of the supervision. From AFT, we extracted all advisor-advisee pairs for which a Ph.D. supervision was indicated and filtered out any pair for which the advisee or advisor was absent from the extracted DBLP author pool, for which both the first and last supervision years were missing, or for which the advisee had fewer than five active publication years after her graduation. We completed the missing first or last years as suggested elsewhere.10 Specifically, we inferred the missing year by identifying the earliest publication co-authored by the advisor and advisee and adding or subtracting the median lag between the start/end year and first publication (in our data, four years is the median Ph.D. duration and one year is the median duration from start to first publication). This process resulted in ∼14K advisor-advisee pairs for which complete estimated data is derived.d Out of these ∼14K pairs, 3,401 advisees have at least five active publication years after their Ph.D. graduation, and 993 have at least 10 active years after graduation. Since only 236 advisees have at least 15 active years after their Ph.D. graduation, we only consider the entire set of advisees and the former two subsets in our analysis.
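The year-completion step described above can be sketched as follows. This is a minimal illustration, not the procedure's actual code: the function name and signature are hypothetical, and the lag used for a missing end year (median Ph.D. duration minus the median start-to-first-publication lag, that is, three years) is our reading of the text rather than a stated value.

```python
from typing import Optional, Tuple

MEDIAN_PHD_DURATION = 4         # years, median reported in the text
MEDIAN_START_TO_FIRST_PUB = 1   # years, median reported in the text


def complete_supervision_years(
    start: Optional[int],
    end: Optional[int],
    first_joint_pub: int,
) -> Tuple[int, int]:
    """Fill in a missing first or last supervision year (hypothetical helper).

    first_joint_pub is the year of the earliest paper co-authored by the
    advisor and the advisee.
    """
    if start is None and end is None:
        # Such pairs are filtered out rather than completed.
        raise ValueError("both supervision years are missing")
    if start is None:
        # Subtract the median lag from Ph.D. start to first joint paper.
        start = first_joint_pub - MEDIAN_START_TO_FIRST_PUB
    if end is None:
        # Assumption: lag from first joint paper to graduation equals
        # median duration minus median start-to-first-publication lag.
        end = first_joint_pub + (MEDIAN_PHD_DURATION - MEDIAN_START_TO_FIRST_PUB)
    return start, end


filled_start = complete_supervision_years(None, 2004, 2001)
filled_end = complete_supervision_years(2000, None, 2001)
```

Known years are never overwritten; only the missing one is estimated from the earliest joint publication.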
MAG was used to extract the citations for the papers in our analysis. MAG was recently established as a leading citation database,15 second only to Google Scholar. Unfortunately, Google Scholar provides very limited access to its data and thus could not be used for our relatively large-scale analysis. Most papers in our study were matched in MAG and the rest (mainly workshop papers and preprints) were omitted from further consideration.
We started by examining the nature of the continued advisor-advisee relationship by looking at the portion of advisor-advisee co-authored papers within the advisee's total body of work over time as depicted in Figure 1.
As the figure shows, the percentage of advisor-advisee co-authored papers within an advisee's total body of work is growing quickly over time. Namely, more recent graduates tend to publish more with their Ph.D. advisors, in relative terms, compared to prior graduates. In addition, we encounter higher variability in the portion of co-authored papers within one's body of work for more recent graduates.
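For readers who want to reproduce this kind of measure on their own data, the portion of co-authored papers can be computed roughly as below. This is a sketch under assumptions: the input format (a list of `(year, set_of_author_names)` tuples) and both function names are hypothetical and are not taken from DBLP or from the study's code.

```python
from collections import defaultdict


def coauthored_share(papers, advisor):
    """Fraction of an advisee's papers that list the advisor as a co-author."""
    if not papers:
        return 0.0
    joint = sum(1 for _, authors in papers if advisor in authors)
    return joint / len(papers)


def yearly_coauthored_share(papers, advisor):
    """Same fraction, computed separately for each publication year."""
    totals = defaultdict(int)
    joint = defaultdict(int)
    for year, authors in papers:
        totals[year] += 1
        if advisor in authors:
            joint[year] += 1
    return {year: joint[year] / totals[year] for year in sorted(totals)}


# Hypothetical advisee record: four papers, one of them with the advisor.
papers = [
    (2003, {"Advisor", "Colleague A"}),
    (2003, {"Colleague A"}),
    (2004, {"Colleague B"}),
    (2004, {"Colleague A", "Colleague B"}),
]
overall = coauthored_share(papers, "Advisor")
per_year = yearly_coauthored_share(papers, "Advisor")
```

The overall share corresponds to the quantity discussed here, while the per-year series is the kind of input a time-series analysis would need.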
Next, we extend and verify the results from Ma et al.12 (given for other fields of science and different success metrics) and examine the portion of advisor-advisee co-authored publications within the advisee's total body of work against three classic measures of academic success five and 10 years after graduation. These measures are: H-index, i10-index, and total number of citations (see Horta and Santos5 for a review of these academic metrics and their use in practice). As depicted in Figure 2, higher collaboration rates are linked with lower academic success metrics. The three success metrics seem to present a very similar exponentially decaying behavior along the collaboration frequency axis. For concreteness, the mean H-index, i10-index, and total number of citations drop by 61%–71% as the collaboration frequency increases from the 0–9% range to the 40%–49% range. Similarly, the variability in each success metric is quickly reduced as the collaboration frequency increases.
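As a reminder of how the three success measures are defined, the sketch below computes them from a list of per-paper citation counts. The function names and the sample counts are illustrative only.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


def total_citations(citations):
    """Sum of citations over all papers."""
    return sum(citations)


# Hypothetical citation counts for one researcher's six papers.
sample = [25, 12, 10, 4, 3, 0]
metrics = (h_index(sample), i10_index(sample), total_citations(sample))
```

For the sample above, four papers have at least four citations each, three papers have at least 10, and the counts sum to 54.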
Combining the results here, we see that younger computer scientists seem to publish steadily and more frequently with their advisors after their graduation, which in turn is linked with lower academic success. However, this analysis misses what we consider to be important pieces of the puzzle—what collaboration patterns exist and how these relate to academic success.
To that end, we employ a cluster analysis while focusing on researchers with sufficiently long careers after their graduation (that is, researchers with at least five and 10 active publication years after their Ph.D. graduation). Specifically, we look at the ratio of papers each researcher has co-authored with her Ph.D. advisor per year for the first five years after graduation and the first 10 years after graduation and seek to group these time series into meaningful groups (or clusters) that could be analyzed and compared. We adopt the popular K-means algorithm13 with Dynamic Time Warping (DTW) distance measure,3 which is especially suited for time series clustering. To determine the appropriate number of clusters (k), we use the classic Elbow Method (which can be traced back to the mid-20th century21). As illustrated in Figure 3, a reasonable selection of k, under both time spans we examined, is k = 3.
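To give a feel for the distance measure, here is a minimal pure-Python sketch of DTW together with the assignment step of K-means (assigning a series to its nearest centroid). It is not the clustering pipeline used in the study: the centroid-update step, which under DTW requires a dedicated averaging procedure, is omitted, and the three centroids below are hypothetical shapes chosen for illustration, not the values behind the reported clusters.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier b
                cost[i][j - 1],      # b[j-1] also matches an earlier a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def nearest_centroid(series, centroids):
    """K-means assignment step: name of the centroid closest under DTW."""
    return min(centroids, key=lambda name: dtw_distance(series, centroids[name]))


# Hypothetical centroids: yearly share of papers co-authored with the advisor.
CENTROIDS = {
    "highly independent": [0.10, 0.05, 0.0, 0.0, 0.0],
    "moderately independent": [0.60, 0.45, 0.30, 0.15, 0.05],
    "weakly independent": [0.70, 0.70, 0.65, 0.60, 0.60],
}

# Hypothetical advisee whose collaboration fades out over three years.
example = [0.8, 0.5, 0.2, 0.0, 0.0]
label = nearest_centroid(example, CENTROIDS)
```

Because DTW allows one point to align with several neighbors, two series with the same shape but a small shift in time remain close, which is why it is a common choice for clustering yearly trajectories.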
Figure 3. Elbow graph for determining optimal k in clustering the collaboration patterns. The x-axis represents the number of clusters (k) and the y-axis represents the distortion (variance) in the data. The top line (blue) is for the five-year time span after graduation, followed by the line (green) for the 10-year time span.
The three clusters identified for the 10-year time span after graduation are depicted in Figure 4 using the centroid of each cluster (the diagram for the five-year time span after graduation is almost identical to the first five years of each cluster presented in the figure). Three distinct patterns clearly arise: Highly independent researchers who (almost) instantly stop collaborating with their advisors upon graduation; moderately independent researchers who gradually stop collaborating with their advisors (over ∼five years); and weakly independent researchers who maintain a high degree of collaboration with their advisors.
Figure 4. Cluster centroids as representatives of the three groups of researchers 10 years after graduation. The x-axis represents time after graduation and the y-axis represents the portion of co-authored papers the researcher has with her advisor each year. Highly Independent researchers are represented by the bottom line in blue, Moderately Independent researchers are represented by the middle line in orange, and Weakly Independent researchers are represented by the top line in green.
For concreteness, let us consider three well-known computer scientists. Andrew Ng, a pioneer in machine learning and AI, is one of the leading computer scientists today both in academia and industry. He is a prime example of the highly independent cluster, showing zero co-authored publications with his Ph.D. advisor (Michael I. Jordan) after his graduation in 2003. At the other extreme, let us consider Zhu Han from University of Houston. Since 2017, Han has been in the top 1% of the most cited researchers in all fields of science according to Web of Science.e Since his graduation in 2003, he has published extensively with his Ph.D. advisor (K.J. Ray Liu), publishing almost 50 papers together. Only in 2007 (four years after his graduation) did he author his first paper without his advisor, and even then more than 50% of his publications that year were with his advisor. As a representative of the moderately independent cluster, we consider another advisee of K.J. Ray Liu's, Wade Trappe, a world leader in cybersecurity and communication systems from Rutgers University. Since his graduation in 2002, he has gradually ceased collaborating with his advisor: in 2003 his advisor co-authored seven out of his eight papers, followed by four out of eight papers in 2004 and two out of nine papers in 2005. Since 2006 he has not published anything with his advisor.
We compare the three clusters in terms of their mean success metrics using a one-way ANOVA (ANalysis Of VAriance) with post-hoc Tukey HSD (Honestly Significant Difference) correction (see the accompanying table). We see that for the three examined success metrics, the highly independent cluster scores statistically significantly higher than the other two clusters both five and 10 years after graduation (one exception is the i10-index at 10 years where the clusters do not differ significantly, partially due to the low number of members in the weakly independent cluster). Surprisingly, while the moderately independent cluster indeed scores higher on all metrics and across both time points compared to the weakly independent cluster, the differences are not found to be statistically significant at p < 0.05 (p value ranged from 0.2 to 0.5). While the lack of significance for the 10-year time span may be partially attributed to the relatively low number of members in the weakly independent cluster, we cannot provide such an explanation for the five-year time span.
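The core of a one-way ANOVA is the F statistic, the ratio of between-group to within-group variance. The sketch below computes it in plain Python on made-up numbers; it does not compute p-values or the Tukey post-hoc comparisons, for which a statistics package would normally be used (for example, SciPy's `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`). The group values are hypothetical and unrelated to the study's data.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (mean - grand_mean) ** 2 for g, mean in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - mean) ** 2 for g, mean in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical H-index values for three small clusters.
weak = [1, 2, 3]
moderate = [2, 3, 4]
high = [5, 6, 7]
f_stat = one_way_anova_f([weak, moderate, high])
```

A large F value indicates that the cluster means differ by more than the spread within clusters would explain; the post-hoc test then identifies which pairs of clusters differ.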
Table. Clusters' rounded mean H-index|i10-index|total citations (size of the cluster in parentheses) at different points in their careers measured by years after their graduation. Results in bold are statistically significantly higher at p < 0.05 using a one-way ANOVA with post hoc Tukey correction.
Furthermore, most researchers in our data are members of the highly independent cluster (57.3% and 60.3% at five and 10 years after graduation, respectively) or the moderately independent one (34% and 36.7%, respectively). The fewest researchers are members of the weakly independent cluster (8.7% and 3%, respectively). This result seems to suggest that more independent graduates are also more likely to "survive" in academic research, especially through the transition from five years to 10 years after graduation. Incidentally, this is usually the time frame in which tenure-track researchers in academia apply for a tenured post. To verify our observation, we compare the average career length of researchers in each cluster (of the five-year clustering) using yet another one-way ANOVA with post-hoc Tukey correction. As expected, we find the clusters significantly differ, with the weakly independent cluster having a significantly shorter average career length than the other two clusters at p < 0.05, demonstrating about 10% shorter careers. The highly independent and moderately independent clusters do not differ significantly in this respect.
The identified clusters seem to differ in their members' collaboration behavior with their advisors even in the very first years after graduation and significantly differ in their academic success and their chances of having long academic careers. However, one may also wonder if the clusters also differ before and at graduation. We examine the following key metrics: the number of papers the advisee published during her Ph.D. period (with and without the advisor), the advisor's academic age (that is, time since first publication), the advisor's performance metrics (H-index, i10-index, and total citations), the advisor's country of affiliation (based on Scopusf) and the advisee's graduation year.
We find the highly and weakly independent cluster members publish significantly more than the moderately independent cluster members during their Ph.D., p < 0.05. Specifically, the moderately independent researchers publish 3.45 papers, on average, during their Ph.D. period while the highly and weakly independent researchers average 4.08 and 3.95, respectively. In other words, highly and weakly independent researchers do not differ significantly in their academic productivity during their Ph.D. period, despite their considerable differences after graduation, yet they are both more productive than moderately independent researchers, on average. Be that as it may, highly independent researchers do demonstrate significantly higher levels of academic independence also during their Ph.D. period. On average, a highly independent researcher publishes 49% of her papers with her advisor during her Ph.D. period compared to 61% and 69% in the moderately and weakly independent clusters, respectively. The latter two groups do not differ significantly. In other words, a researcher's independence post-graduation is strongly linked with her independence during her Ph.D. Focusing on the advisors' academic status and success, we find the highly and moderately independent researchers were advised by advisors of "similar caliber" (that is, no significant difference in the examined success metrics) and, as such, are expected to attract students of similar academic potential and promise. Interestingly, we find that weakly independent researchers are supervised by significantly more academically successful advisors in terms of both H-index and i10-index, p < 0.05. On average, weakly independent researchers were advised by advisors with 9%–10% higher H-indexes and 14%–16% higher i10-index compared to the other two groups. No significant difference was found for the total number of citations metric. 
We further find that weakly independent researchers graduate slightly later than the members of the other two groups by an average of 1.5 years, p < 0.05. Highly and moderately independent researchers do not differ on this account. No statistically significant differences were found for the advisors' academic age and the distribution over country of affiliation for all three groups.
Our analysis of genealogical and scientometric databases reveals three distinct collaboration patterns between computer science doctoral graduates and their advisors: highly independent researchers, who cease to collaborate with their advisors almost instantly upon graduation; moderately independent researchers, who gradually stop the collaboration over ∼five years; and weakly independent researchers, who continue to collaborate heavily with their advisors, at least for the first 10 years after their graduation. In turn, highly independent researchers are linked with greater academic success in terms of H-index, i10-index, and total number of citations both five and 10 years after graduation. Moderately independent researchers, who also stop collaborating with their doctoral advisors but at a slower pace than the first group, are not found to be significantly more academically successful than the weakly independent researchers, despite displaying higher average metrics on all accounts. Both highly and moderately independent researchers are also associated with greater chances of having a long academic career.
Distinguishing between the identified groups seems to be possible even by observing the very first years (1–2) after the advisee graduated. In the same manner, it turns out that the identified groups also differ in their Ph.D. profiles. Most notably, as one could expect, highly independent graduates are also associated with higher levels of independence during their Ph.D. period. Interestingly, weakly independent researchers tend to be supervised by more academically successful advisors. One may consider this result to be somewhat surprising, since more academically successful advisors are expected to attract promising doctoral students who are more capable of conducting independent research. However, one may also argue that advisors who continue to publish papers with their former students are effectively expanding their workforce and thus become more successful. Either way, we speculate that advisees of highly successful advisors may hold the common belief that they may be the next rising stars of their advisor's hit research topic,14 and thus continue to collaborate with their advisors. Additionally, it may be that highly successful advisors have greater academic resources (for example, funding, equipment, ideas, and collaboration network) that young graduates feel can still assist them after graduation in building their careers. A complete inquiry into this matter is outside the scope of this work and is left for future work.
We interpret the fact that highly and moderately independent researchers are statistically indistinguishable in terms of their advisors' examined characteristics (such as age, success metrics, and country of affiliation) and in their graduation year as supporting evidence, albeit arguable, of these researchers' similar potential and promise. If this assumption holds, the differences between the groups should be, at least partially, attributed to the collaboration between an advisee and her advisor during and after her doctoral studies.
Our results provide supporting evidence that young computer scientists should be reasonably encouraged to stop collaborating with their doctoral advisors—the sooner, the better. This may be especially important for productive Ph.D. graduates and those who were supervised by highly successful Ph.D. advisors. It further seems that promoting student independence during the Ph.D. period should be encouraged by the Ph.D. advisors. Our conclusion is further strengthened by recent evidence outside the computer science realm demonstrating how graduates who undergo additional training outside their advisors' research focus10 and those who change their research agenda away from their advisor's focus12,24 tend to significantly increase their scientific impact. Clearly, this evidence translates to lower collaboration rates with one's advisor early on in one's career.
As is the case in most literature on the academic labor markets, little is known about graduates who had very short academic career spans (in our study, less than five years after graduation). Specifically, we cannot easily determine why a graduate has ceased to publish academic research (for example, has she ever looked for a research post? Did she find one but was later laid off or quit?). As one could expect, we do find that those who qualified for our analysis had published 18% more papers during their Ph.D. compared to those who did not. We intend to investigate this population in future work. In addition, we plan to examine more subtle aspects of the advisor-advisee relationship including "informal" collaboration and advice which need not necessarily result in co-authored papers. Note that this work explicitly assumes that collaboration is manifested in co-authored papers, yet in the academic context and especially between doctoral advisors and their advisees, collaboration and advisement are much more than simple co-authorship.4 Last, we plan to investigate the collaboration patterns between other "family members" in the academic genealogy. For example, we plan to extend our analysis and investigate the possible relationships between "academic siblings," that is, researchers who were advised by the same doctoral advisor.
2. Eby, L., Allen, T., Evans, S., Ng, T., and DuBois, D. Does mentoring matter? A multidisciplinary meta-analysis comparing mentored and non-mentored individuals. J. of Vocational Behavior 72, 2 (2008), 254–267.
13. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symp. on Mathematical Statistics and Probability 1 (1967), Oakland, CA, USA, 281–297.
15. Martín-Martín, A., Thelwall, M., Orduna-Malea, E., and López-Cózar, E. Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations' COCI: A multidisciplinary comparison of coverage via citations. Scientometrics 126, 1 (2021), 871–906.
18. Sekara, V., Deville, P., Ahnert, S., Barabási, A., Sinatra, R., and Lehmann, S. The chaperone effect in scientific publishing. Proceedings of the National Academy of Sciences 115, 50 (2018), 12603–12607.
25. Zhao, C., Golde, C., and McCormick, A. More than a signature: How advisor choice and advisor behaviour affect doctoral student satisfaction. J. of Further and Higher Education 31, 3 (2007), 263–281.
c. https://academic.microsoft.com/ (no longer active).
d. We have examined the validity of using such data completion techniques in our data and noticed that it brings about extremely similar results to those derived solely based on the 1K pairs for which all information is given.
©2022 ACM 0001-0782/22/10