
Reevaluating Google’s Reinforcement Learning for IC Macro Placement

Meta-analysis discusses the reproduction and evaluation of results in a 2021 paper about using RL to design silicon chips, as well as the validity of methods, results, and claims.


A 2021 paper in Nature by Mirhoseini et al.30 about the use of reinforcement learning (RL) in the physical design of silicon chips raised eyebrows, drew critical media coverage, and stirred up controversy due to poorly documented claims. The paper, authored by Google researchers, withheld critical methodological steps and most of the inputs needed to reproduce its results. Our meta-analysis shows how two separate evaluations filled in the gaps and demonstrated that Google RL lags behind human chip designers, a well-known algorithm (simulated annealing), and generally available commercial software, while also being slower. Crosschecked data indicate that the integrity of the Nature paper is substantially undermined, owing to errors in conduct, analysis, and reporting. Before publication, Google rebuffed internal allegations of fraud, which still stand. We note policy implications.

Key Insights

  • A Nature paper from Google with revolutionary claims in AI-enabled chip design was heralded as a breakthrough in the popular press, but it was met with skepticism from domain experts for being too good to be true and for lacking reproducible evidence.

  • Now, crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in conduct, analysis, and reporting. Independently, detailed allegations of fraud and research misconduct in the Google Nature paper have been filed under oath in California.

  • Nature has been slow to enforce its own policies. Delaying retractions of problematic publications is distorting the scientific process. Swift and decisive action is necessary to maintain the integrity and credibility of scientific research.

As AI applications demand greater compute power, efficiency may be improved via better chip design. The Nature paper was advertised as a chip-design breakthrough using machine learning (ML). It addressed the challenging problem of optimizing the locations of circuit components on a chip and described applications to five tensor processing unit (TPU) chip blocks, implying that no better methods were available at the time in academia or industry. The paper generalized the claims beyond chip design to suggest that RL outperforms the state of the art in combinatorial optimization. "Extraordinary claims require extraordinary evidence" (per Carl Sagan), but the paper lacked results on public test examples (benchmarks16) and did not share the proprietary TPU chip blocks used. Source code (released seven months after publication13 to support the paper's findings after the initial controversy14,36,37,39,42) was missing key parts needed to reproduce the methods and results (as explained in Cheng et al.11 and Goth18). More than a dozen researchers14,18,36,42 from Google and academia questioned the claims of Mirhoseini et al.,30 performed experiments, and raised concerns5,11 about the reported research. Google engineers have updated their open source release13 many times since, filling in some missing pieces but not all.11 The single open source chip-design example in the Google repository13 does not clearly show strong performance of Google's RL code.11 Apparently, the only openly claimed reproduction of the techniques in Mirhoseini et al.30 independent of Google was developed in Fall 2022 by UCSD researchers.11 They reverse-engineered key components missing from Google's open source code13 and fully reimplemented the simulated annealing (SA) baseline11 absent from that code.13 Google released none of the proprietary TPU chip-design blocks used in Mirhoseini et al. (nor sanitized equivalents), ruling out full external reproduction of the results. So, the UCSD Team shared27 their experiments on modern, public chip designs: Both SA and commercial electronic design automation (EDA) tools outperformed Google RL code.13

Reporters from The New York Times and Reuters covered this controversy in 202214,42 and found that, well before the Nature submission, several Google researchers (see Table 1) disputed the claims they had been tasked with checking. The paper’s two lead authors complained of persistent allegations of fraud in their research.39 In 2022, Google fired the internal whistleblower14,42 and denied publication approval for a paper written by Google researchers critical of Mirhoseini et al.5 The whistleblower sued Google for wrongful termination under California whistleblower-protection laws: Court documents,37 filed under penalty of perjury, detail allegations of fraud and scientific misconduct related to research in Mirhoseini et al.30 The 2021 Nature News & Views article introducing the paper in the same issue urged replication of the paper’s results. Given the obstacles to replication and the results of replication attempts,11 the author of the News & Views article retracted it. On Sept. 20, 2023, Nature added an online Editor’s Note20 to the paper:

“Editor’s Note: Readers are alerted that the performance claims in this article have been called into question. The Editors are investigating these concerns, and, if appropriate, editorial action will be taken once this investigation is complete.”

A year later (late September 2024), as this article goes to print, the Editor's Note was removed from the Nature article, but an authors' addendum appeared. This addendum largely repeats the arguments from an earlier statement17 discussed in the section on rebuttals to critiques of the Nature paper. There is little for us to modify in this article: None of the major concerns about the Nature paper have been addressed. In particular, "results" on one additional proprietary TPU block with undisclosed statistics do not support any substantiated conclusions. This only aggravates concerns about cherry-picking and misreporting. The release of a pre-trained model without information about the pre-training data aggravates concerns about data contamination: any circuit could have been used in pre-training and then in testing. We do not comment on the recent Google blog post,a except to note that it repeats the demonstrably false claim of a full source-code release that allows one to reproduce the results in the Nature paper. Among other pieces, source code for SA is missing, and the Nature results cannot be reproduced without the proprietary training and test data.

This article first covers the background and the chip-design task solved in the Nature paper and then introduces secondary sources used.5,11,27,46 Next, the article lists initial suspicions about the paper and shows that many of them were later confirmed. The article then checks if Mirhoseini et al. improved the state of the art, outlines how the authors responded, and discusses possible uses of the work in practice. Finally, the article draws conclusions and notes policy implications.

Background

Components of integrated circuits (ICs) include small gates and standard cells, as well as memory arrays and reusable subcircuits. In physical design,23 they are represented by rectangles within the chip canvas (Figures 1 and 2). Connections between components are modeled by the circuit netlist before wire routes are known. A netlist is an unordered set of nets, each naming components that should be connected. The length of a net depends on components’ locations and on wire routes; long routes are undesirable. The macro placement problem addressed in the paper seeks (x, y) locations for large circuit components (macros) so that their rectangles do not overlap, and the remaining components can be well-placed to optimize chip layout.22,28,33

Figure 1.  A modern chip design layout with rectangular macros and numerous small standard cells placed in between (left); vertical and horizontal wire routes connecting macros and standard cells (right). On the left, colors distinguish logic from different parts of the design. On the right, colors distinguish wires routed on different metal layers on the chip.
Figure 2.  Layouts from Bae et al. with macros in red and standard cells in green, locations produced by RL (left) and RePlAce (right) for the IBM10 benchmark.2 Limiting macro locations to a coarse grid (left) leads to spreading of small macros (red squares on a grid) and elongates connecting wires from 27.5 (right) to 44.1 (left) for IBM10.5 High area utilization and many macros of different sizes make the ICCAD 2004 benchmarks2 challenging compared to benchmarks in Mirhoseini et al.30

Circuit placement as an optimization task.  After (x, y) locations of all components are known, wires that connect components' I/O pins are routed. Routes impact chip metrics (for power, timing/speed, and so on). The optimization of (x, y) locations starts with simplified estimates of wirelength without wire routes. Pin locations (x_1, y_1) and (x_2, y_2) may be connected by horizontal and vertical wire segments in many ways, but the shortest route length is |x_1 − x_2| + |y_1 − y_2|.

For multiple pin locations {(x_i, y_i)}, this estimate generalizes to

HPWL = (max_i x_i − min_i x_i) + (max_i y_i − min_i y_i)        (1)

HPWL stands for half-perimeter wirelength, where the perimeter is that of the bounding box of the points {(x_i, y_i)}.23,28,33 It is easy to compute and sum over many nets. This sum correlates with total routed wirelength reasonably well. When (x, y) locations are scaled by a factor γ > 0, HPWL also scales by γ, which makes HPWL optimization scale-invariant and appropriate for all semiconductor technology nodes.b Algorithms that optimize HPWL extend to more precisely optimize routed wirelength and technology-dependent chip metrics, so HPWL optimization is a precursor:4,10,22,28

  • To test new placement methods; once HPWL results are close to the best known, accurate metrics are used for evaluation; or

  • Followed by optimizations of advanced objectives that extend HPWL, for example, the RL proxy cost function in Mirhoseini et al.
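To make Equation 1 concrete, here is a minimal Python sketch that evaluates HPWL for a toy netlist. The netlist representation (a mapping from net names to pin coordinates) and the coordinate values are illustrative assumptions, not a format used by any of the tools discussed in this article.

```python
# Minimal HPWL illustration (Equation 1). The netlist format here is
# hypothetical: each net maps to the (x, y) pin locations it connects.
def hpwl(net_pins):
    """Half-perimeter wirelength of one net: bounding-box width + height."""
    xs = [x for x, _ in net_pins]
    ys = [y for _, y in net_pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(netlist):
    """Sum of per-net HPWL values, the usual placement objective."""
    return sum(hpwl(pins) for pins in netlist.values())

netlist = {
    "net_a": [(0.0, 0.0), (3.0, 4.0)],              # two-pin net: 3 + 4 = 7
    "net_b": [(1.0, 1.0), (2.0, 5.0), (6.0, 2.0)],  # bounding box 5 x 4 -> 9
}
print(total_hpwl(netlist))  # 16.0
```

Note that scaling every coordinate by a factor γ scales the printed value by γ, which is the scale-invariance property mentioned above.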

Widely adopted optimization frameworks for placement do not use ML4,22,23,28,33 and can be classified as: simulated annealing, partitioning-driven, and analytical. Simulated annealing, developed in the 1980s24,25,38 and dominant through the mid-1990s,45 starts with an initial layout (for example, random) and alters it by a sequence of actions, such as component moves and swaps, of prescribed length. To improve the final result, some actions may sacrifice quality to escape local minima. SA excels on smaller layouts (up to 100K placeable components) but takes a long time for large layouts. Partitioning-driven methods3 view the circuit connectivity (the netlist) as a hypergraph and use established software packages to subdivide it into partitions with more connections within the partitions (not between). These methods run faster than SA, capture global netlist structures, and were dominant for some 10 years. Yet, the mismatch between partitioning and placement objectives (Equation 1) leaves room for improvement.3 Analytical methods approximate Equation 1 by closed-form functions amenable to established optimization methods. Force-directed placement12 from the 1980s models nets by springs and finds component locations to reconcile spring forces.23 In the 2000s, advanced analytical placement techniques attained superiority10,22,28,33 on all large, public benchmark sets, including those with macros and routing data.10 RePlAce10 from UCSD is much faster than SA and partitioning-based methods, but lags in quality on small netlists.
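As a concrete illustration of the annealing loop described above, the following minimal sketch applies SA to macro locations. The move set (swap two macros or shift one), the cooling schedule, the parameter values, and the toy cost in the usage example are simplifications assumed for illustration; legality checks (overlap removal, canvas bounds) that a real placer needs are omitted, and none of this reflects the specific SA implementations discussed later in this article.

```python
import math
import random

def swap_two(macro_xy):
    """Swap the locations of two randomly chosen macros."""
    a, b = random.sample(list(macro_xy), 2)
    macro_xy[a], macro_xy[b] = macro_xy[b], macro_xy[a]
    return macro_xy

def shift_one(macro_xy, step=5.0):
    """Shift one randomly chosen macro by a small random offset."""
    name = random.choice(list(macro_xy))
    x, y = macro_xy[name]
    macro_xy[name] = (x + random.uniform(-step, step), y + random.uniform(-step, step))
    return macro_xy

def simulated_annealing(macro_xy, cost, steps=20_000, t0=1.0, alpha=0.9995):
    """Minimal SA loop: propose a move, always accept improvements, accept
    worsening moves with probability exp(-delta/T), and cool T geometrically."""
    best, best_cost = dict(macro_xy), cost(macro_xy)
    current_cost, temperature = best_cost, t0
    for _ in range(steps):
        candidate = random.choice([swap_two, shift_one])(dict(macro_xy))
        delta = cost(candidate) - current_cost
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            macro_xy, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = dict(macro_xy), current_cost
        temperature *= alpha  # geometric cooling
    return best, best_cost

# Toy usage: pull three macros toward (5, 5). A real cost would be a placement
# objective such as total HPWL over the nets touching macros.
toy = {"m0": (10.0, 10.0), "m1": (3.0, 4.0), "m2": (8.0, 1.0)}
_, final_cost = simulated_annealing(
    toy, cost=lambda p: sum((x - 5.0) ** 2 + (y - 5.0) ** 2 for x, y in p.values()))
print(final_cost)
```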

The Nature paper focuses on large circuit components (macros) among numerous small components. The fixed-outline macro-placement problem, which was formulated in the early 2000s,1,21,44 places all components onto a fixed-size canvas (prior formulations could stretch the canvas). It is now viewed as part of mixed-size placement.3 A 2004 benchmark suite2 for testing mixed-size placement algorithms evaluates the HPWL objective (Equation 1) which, as noted above, is apt for all semiconductor technology nodes. The suite has enjoyed significant use in the literature, for example Cheng et al.,10 Kahng,22 and Markov et al.28

Commercial and academic software for placement is developed to run on modest hardware within reasonable runtime. The methods and software in Mirhoseini et al. consume significantly greater resources, but at least with SA (during comparisons) it is straightforward to obtain progressively better results with greater runtime budget.

Circuit metrics for evaluating optimization results include circuit timing and dynamic power. Unlike power, timing metrics are sensitive to long/slow paths taken by signal transitions in a circuit and are difficult to predict before detailed placement and wire routing. Accurate early estimation of circuit metrics is a popular topic in the research literature but remains an unsolved challenge in physical design because metric values depend on the actual decisions by optimizers. For example, decisions on which wires take the shortest routes and which ones get detoured determine which pairs of wires experience crosstalk and which signal paths become slow.23 Because of this estimation difficulty, optimization methods with closed-form objectives are fundamentally limited in what they can achieve, and circuit implementation may need to be redone when routing cannot be completed or timing constraints cannot be satisfied.22

Key sources.  To solve mixed-size placement, the Nature paper first places macros and then places small components with commercial software. It places numerous macros with an RL action policy that is iteratively improved (fine-tuned) at the same time. The RL policy can be pre-trained on prior circuits or initialized “from scratch.” The iterative process runs for a set time (or until no change) and optimizes a fixed (not learned) proxy cost function that blends HPWL, component density, and routing congestion. To evaluate this function, the small components are placed with force-directed placement. The paper claims that RL beats three baselines: (1) macro placement by human chip designers, (2) parallel SA, and (3) RePlAce software from UCSD, which uses no RL.
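The proxy described above is a fixed weighted blend of terms. The sketch below only illustrates that structure; the weight values and the idea of passing precomputed term values are placeholders we assume for illustration, since the exact weights and term definitions used by the authors were not fully documented (a point revisited later in this article).

```python
# Hypothetical illustration of a fixed (not learned) proxy cost of the kind
# the Nature paper describes: wirelength blended with density and congestion.
# The weights below are placeholders, not the values used by the authors.
W_DENSITY = 0.5
W_CONGESTION = 0.5

def proxy_cost(wirelength, density, congestion):
    """Weighted blend of normalized wirelength, density, and congestion terms."""
    return wirelength + W_DENSITY * density + W_CONGESTION * congestion

# Evaluating one candidate macro placement from precomputed term values.
print(proxy_cost(wirelength=0.82, density=0.64, congestion=0.71))  # 1.495
```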

Among secondary sources discussed in the context of Mirhoseini et al., we prefer scholarly papers5,11,46 but also draw on open source repositories and include FAQs as needed.13,27,c Here, all benchmark sets have hundreds of macros per design, compared to only a handful in sets such as ISPD 2015. We crosscheck claims from three nonoverlapping groups of researchers: those associated with Google Team 1 (Mirhoseini et al. and CT), Google Team 2 (Bae et al.5), and the UCSD Team (Cheng et al.11 and the Macro Placement Repo—see Table 1). Consistent claims from different groups are even more trustworthy when backed by numerous benchmarks. Both Google Team 2 and the UCSD Team included highly cited experts on floor-planning and placement with extensive publication records and several key references cited in Mirhoseini et al. (such as Cheng et al.,10 Markov et al.,28 and others), as well as experience developing academic and commercial floor-planning and placement tools beyond Google.

Table 1. 
Secondary sources published by the teams and chip designs for which they report results. The IBM circuits2 are ICCAD 2004 benchmarks. Cheng et al.11 built three designs with two semiconductor technologies each.
Team | Secondary sources | Chip designs with reported results
Google Team 1 (Nature authors + coauthors) | Circuit Training (CT) repo and FAQ;13 ISPD 2022 paper46 | Four proprietary TPU blocks;30 Ariane (public)13 (all with numerous macros)
Google Team 2 + external coauthors | Stronger Baselines5 | 20 proprietary TPU blocks; 17 public IBM circuits2 (all with numerous macros)
UCSD Team | MacroPlacement repo and FAQ;27 ISPD 2023 paper11 | All with numerous macros: 17 public IBM circuits;2 2× Ariane (public);11,27 2× MemPool (public);11,27 2× BlackParrot (public)11,27

Initial Doubts

While the Nature paper was sophisticated and impressive, its research plan had notable shortfalls. For one, the proposed RL method was presented as being capable of broader combinatorial optimization (a field that includes puzzle-like tasks such as the Traveling Salesperson Problem, Vertex Cover, and Bin Packing). But instead of illustrating this with key problem formulations and easy-to-configure test examples, the paper solved a specialty task (macro placement for chip design) on proprietary Google TPU circuit-design blocks, providing results for five blocks out of many more available. The RL formulation did not track chip metrics and optimized a simplified proxy function that included HPWL, but it was not evaluated for pure HPWL optimization on open circuit examples, as is routine in the literature.3,4,10,16,22,28,33 New ideas in placement are usually evaluated in research contests on industry chip designs released as public benchmarks,22,33 but Mirhoseini et al. neglected these contest benchmarks.

Some aspects of Mirhoseini et al. looked suspicious: The paper did not substantiate several claims and withheld key aspects of experiments, claimed improvements in noisy metrics that the proposed technique did not optimize, relied on techniques with known handicaps that undermined performance in similar circumstances, and may have misconfigured and underreported its baselines. We spell these out below and confirm many of them later in the article.

Unsubstantiated claims and insufficient reporting.  Serious omissions are clear even without a background in chip design.

U1. With “fast chip design” in the title,30 the authors only described improvement in design-process time as “days or weeks to hours” without giving per-design time or breaking it down into stages. It was unclear if “days or weeks” for the baseline design process included the time for functional design changes, idle time, inferior EDA tools, and so on.

U2. The claim of RL runtimes per testcase being under six hours (for each of five TPU design blocks)30 excluded RL pre-training on 20 blocks (not amortized over many uses, as in some AI applications). Pausing the clock for pre-training (not used by prior methods) was misleading. Also, RL runtimes only cover macro placement, but RePlAce and industry tools place all circuit components.

U3. Mirhoseini et al. focused on placing macros but withheld the number, sizes, and shapes of macros in each TPU chip block, and other key design parameters such as area utilization.

U4. Mirhoseini et al. gave results on only five TPU blocks, with unclear statistical significance, but high-variance metrics produce noisy results (Table 2). Using more examples is common (Table 1).

Table 2. 
Evaluating the soundness of the proxy cost used with RL in the paper and the noisiness of reported chip metrics after RL-based optimization. We summarize data from Table 2 in Cheng et al.11 on the Kendall rank correlation of chip metrics to the RL proxy cost and from Tables 3 and 4 in Cheng et al.11 on statistics for chip metrics (only Ariane-NG45 design data is shown, but data for BlackParrot-NG45 shows similar trends). As expected, purely additive metrics (standard-cell area, routed wirelength, and chip power) exhibit low variance, but the TNS and WNS metrics that measure timing-constraint violations have high variance.
Chip metrics | Area | Routed wirelength | Power | WNS | TNS
Rank correlation to RL proxy cost | 0.00 | 0.28 | 0.05 | 0.20 | 0.05
Mean μ | 247.1K | 834.8 | 4,978 | −100 | −65
Standard deviation σ | 1.652K | 4.1 | 272 | 28 | 36.9
σ/|μ| | 0.01 | 0.00 | 0.05 | 0.28 | 0.57

U5. Mirhoseini et al. was silent on the qualifications and level of effort of the human chip designer(s) outperformed by RL. Reproducibility aside, those results could be easily improved (as shown in Cheng et al.11 later).

U6. Mirhoseini et al. claimed improved "area", but chip area, macro area, and standard-cell area do not change during placement (also see the 0.00 correlation in Table 2).

U7. For iterative algorithms that optimize results over time, fair comparisons show per testcase: better-quality metrics with equal runtime, better runtime with equal quality, or wins for both. Mirhoseini et al. offered no such evidence. In particular, if ML-based optimization is used with extraordinary compute resources, then so should be optimization by SA in its most competitive form.

A flawed optimization proxy.  The chip design methodology in Mirhoseini et al. uses physical synthesis to generate circuits for further layout optimization (physical design). The proposed RL technique places macros of those circuits to optimize a simplified proxy cost function. Then, a commercial EDA tool is invoked to place the remaining components (standard cells). The remaining operations (including power-grid design, clock-tree synthesis, and timing closure4,23) are outsourced to an unknown third party.30,35 Results are evaluated with respect to routed wirelength, area, power, and two circuit-timing metrics: TNS and WNS.d Per Mirhoseini et al., the proxy cost function did not perform circuit-timing analysis23 needed to evaluate TNS and WNS.e Therefore, it was misleading to claim in Mirhoseini et al. that the proposed RL method led to TNS and WNS improvements on five TPU design blocks without performing variance-based statistical significance tests (TNS and WNS were optimized at later steps unrelated to RL30).
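For readers unfamiliar with these timing metrics, the short sketch below shows how TNS and WNS are derived from per-path slacks (negative slack indicates a timing violation; see footnote d); the slack values are invented for illustration only.

```python
# Invented per-path slack values in nanoseconds (negative = violation).
path_slacks_ns = [0.12, -0.03, 0.40, -0.20, 0.05]

violations = [s for s in path_slacks_ns if s < 0]
tns = sum(violations)       # Total Negative Slack: sum over all violating paths
wns = min(path_slacks_ns)   # Worst Negative Slack: the single worst path
print(round(tns, 2), wns)   # -0.23 -0.2
```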

Use of limited techniques.  To experts, the methodology in Mirhoseini et al.30 looked to have shortcomings: Using outdated methods made it harder to improve the state of the art (SOTA).

H1. Proposed RL used exorbitant CPU/GPU resources compared to SOTA. Hence, the “fast chip design” claim (presumably due to fewer unsuccessful design attempts) required careful substantiation.

H2. Placing macros one by one (a type of constructive floor-planning23) is one of the simplest approaches. SA can swap and shift macros and make other incremental changes. Analytical methods relocate many components at once. One-by-one placement looked handicapped even when driven by deep RL.

H3. Mirhoseini et al. used circuit-partitioning (clustering) methods similar to partitioning-based methods from 20+ years ago.3,4,23 Those techniques are known to diverge from interconnect optimization objectives.3,23 By placing macros using a clustered netlist without gradual layout refinement, RL runs into the same problem.

H4. Mirhoseini et al. limited macro locations to a coarse grid, whereas SOTA methods10 avoid such a constraint. In Figure 1 (left) macros are placed freely, but a coarse grid used by Google’s RL implementation tends to spread macros apart and disallow large regions for cells, such as in the center of Figure 1 (left). Figure 2 illustrates the difference. Even if RL can run without gridding, it might not scale to large enough circuits without coarse gridding.

H5. The use of force-directed placement from the 1980s12 in Mirhoseini et al. left much room for improvement.

Questionable baselines.  The Nature paper used several baselines to claim the superiority of proposed techniques. We already mentioned that the human baseline was undocumented and not reproducible.

B1. Key results in Mirhoseini et al.30 (Table 1 there) give chip metrics for five TPU design blocks, but the comparisons to SA do not report those chip metrics.

B2. Mirhoseini et al. mentions that RL results were post-processed by SA but lacks ablation studies to evaluate the impact of SA on chip metrics.

B3. RePlAce10 was used as a baseline in Mirhoseini et al. in a way inconsistent with its intended use. As previously explained, analytical methods do well on circuits with millions of movable components, but RePlAce was not intended for clustered netlists with a reduced number of components: It should be used directly sans clustering (for details, see Bae et al. and Cheng et al.10,11). Clustering can worsen results due to a mismatch between placement and partitioning objectives,3 and by unnecessarily creating large clusters that are hard to pack without overlaps.

B4. Mirhoseini et al. did not describe how macro locations in SA were initialized, suggesting that the authors used a naive approach that could be improved. Later, Bae et al. identified more handicaps in the SA baseline, and Cheng et al.11 confirmed them.

Additional Evidence

Months after the Nature publication, more data became available in Bae et al., Google’s documentation and open source code,13 Nature peer review,35 and in Yue et al.,46 followed by the first wave of controversial media coverage.14,39,42 Nature editors released the peer review file for Mirhoseini et al., including authors’ rebuttals. In the lengthy back-and-forth,35 the authors assured reviewers that macro locations were not modified after placement by RL, confirming coarse-grid placement of macros. Among several contributions, Bae et al.5 implemented the request of Nature Reviewer #335 and benchmarked Google’s technique on 17 public chip-design examples:2 Prior methods decisively outperformed Google RL. American and German professors publicly expressed doubts about the Nature paper.14,42 As researchers noted gaps in the Google open source release,13 such as the grouping (clustering) flow, Google engineers released more code (but not all), prompting more questions. Another year passed, and initial suspicions were expanded11,27 by showing that when macro placement is not limited to a grid, both human designers and commercial EDA tools (separately) outperform Google code.13 In Table 2 of Cheng et al.,11 the authors estimated rank correlation of the proxy cost function optimized by RL to chip metrics used in Table 1 of the Nature paper. Cheng et al.,11 in Table 3, estimated the mean and standard deviation for chip metrics after RL-based optimization. A summary is provided in this article (Table 2), where rank correlations are low for all chip metrics, while TNS and WNS are noisy. Hence, the optimization of TNS and WNS in Mirhoseini et al. relied on a flawed proxy and produced results of dubious statistical significance (see Table 1 in Mirhoseini et al.). We note that σ/|μ | > 0.5 for TNS on Ariane-NG45, as well as on BlackParrot-NG45 in Table 3 of Cheng et al. In additional critical media coverage, Mirhoseini et al. was questioned by three U.S. professors.18,36
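For readers who want to reproduce this style of summary from raw per-run data, the sketch below shows the two computations behind Table 2: the Kendall rank correlation of proxy cost to a chip metric, and the noise measure σ/|μ|. The sample numbers are invented for illustration and are not data from Cheng et al.11

```python
import numpy as np
from scipy.stats import kendalltau

# Invented per-run data for one design: proxy cost and a chip metric (e.g.,
# TNS) measured after several independent RL runs. Not data from Cheng et al.
proxy = np.array([1.02, 0.98, 1.05, 1.01, 0.97, 1.03])
tns = np.array([-60.0, -110.0, -45.0, -90.0, -70.0, -55.0])

tau, p_value = kendalltau(proxy, tns)     # rank correlation (Table 2, row 1)
cv = tns.std(ddof=1) / abs(tns.mean())    # sigma / |mu| (Table 2, last row)
print(f"Kendall tau = {tau:.2f}, sigma/|mu| = {cv:.2f}")
```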

Table 3. 
Runtimes in hours for three mixed-size placement tools and methodologies on three large-chip modern designs reported in the arXiv version of Cheng et al.11 Google CT: Circuit Training code supporting RL in the Nature paper, used without pre-training. Cadence CMP: Concurrent Macro Placer (commercial EDA tool). SA: parallel simulated annealing implemented at UCSD following Bae et al.5 given 12.5h of runtime in each case. CT and SA are used only to place macros; the remaining components are placed by a commercial EDA tool whose runtime is not included. Cadence CMP places all circuit components. By quality of results in Cheng et al.11 (not shown here), Cadence CMP leads, followed by simulated annealing, and then Google CT. Additional evaluations of Cadence CMP versions by year concluded that performance and runtime on these examples did not appreciably change between the versions since 2019.27
Design | Google CT/RL | Cadence CMP | UCSD SA
Ariane-NG45 | 32.31 | 0.05 | 12.50
BlackParrot-NG45 | 50.51 | 0.33 | 12.50
MemPool-NG45 | 81.23 | 1.97 | 12.50

Undisclosed use of (x, y) locations from commercial tools.  The UCSD paper11 presents strong evidence, confirmed by Google engineers, that the authors withheld a critical detail. When clustering the input netlist, the CT merge step in Google's code13 read in a placement to restructure clusters based on locations. To produce (x, y) locations of macros, the paper's authors used initial (x, y) locations of all circuit components (including macros) produced by commercial EDA tools from Synopsys.13 The lead authors of Mirhoseini et al. confirmed using this step, claiming it was unimportant.17 But it improved key metrics by 7–10% in Cheng et al.11 So, the results in Mirhoseini et al. required algorithmic steps that were not described in the paper, such as obtaining (x, y) data from commercial software.

More undocumented techniques were itemized in Cheng et al.,11 which noted discrepancies between the Nature paper, the released source code,13 and the actual code used for chip design at Google. These discrepancies included specific weights of terms in the proxy cost function, a different construction of the adjacency matrix from the circuit, and several "blackbox" elements13 available as binaries with no source code or full description in Mirhoseini et al. Bae et al., Cheng et al.,11 and the Macro Placement Repo27 offer the missing descriptions. Moreover, the results in Mirhoseini et al. did not match the described methods because key components were not mentioned in the paper, and neither the results nor the methods were reproducible from the descriptions alone.

Data leakage between training and test data?  Per Mirhoseini et al., "as we expose the policy network to a greater variety of chip designs, it becomes less prone to overfitting." But Google Team 1 showed later in Yue et al.46 that pre-training on "diverse TPU blocks" did not improve quality of results. Pre-training on "previous netlist versions" improved quality somewhat. Pre-training RL and evaluating it on similar designs could be a serious flaw in the methodology of Mirhoseini et al. As Google did not release proprietary TPU designs or per-design statistics, we cannot compare training and test data.

Likely limitations.  Mirhoseini et al. did not disclose major limitations of its methods but promised success in broader combinatorial optimization. The Ariane design image in Mirhoseini et al. shows macro blocks of identical sizes: a potential limitation, given that commercial chip designs often use a variety of macro sizes. Yet, they do not report basic statistics per TPU block: the number of macros and their shapes, design area utilization, and the fraction of area taken by macros. Based on peer reviews35 and the guidance from Google engineers to the authors of Cheng et al.,11 it appears that TPU blocks had lower area utilization than in typical commercial chip designs. Poor performance of Google RL on challenging public benchmarks from Adya and Markov2 used in Bae et al. and Cheng et al.11 (illustrated in Figure 2) suggests undisclosed limitations. Another possible limitation is poor handling of preplaced (fixed) macros, common in industry layouts, but not discussed in Mirhoseini et al. By interfering with pre-placed macros, gridding (see H4) can impact usability in practice. Poor performance on public benchmarks may also be due to overfitting to proprietary TPU designs.

A middling simulated annealing baseline.  The "Stronger Baselines" paper5 from Google Team 2 improved the parallel SA used by Google Team 1 in Mirhoseini et al. by adding "move" and "shuffle" actions to the "swap," "shift," and "mirror" actions. This improved SA typically produces better results than RL in a shorter amount of time when optimizing the same objective function. Cheng et al.11 reproduced the qualitative conclusions of Bae et al. with an independent implementation of SA and found that SA results had less variance than RL results. Additionally, Bae et al. suggested a simple and fast macro-initialization heuristic for SA and equalized compute times when comparing RL to SA. Given that SA was widely used in the 1980s and 1990s, comparing to a weak SA baseline contributed to overestimating the new RL technique.

Did the Nature Paper Improve State of the Art?

The Nature editorial15 discussing the paper speculated that "this is an important achievement and will be a huge help in speeding up the supply chain." But today, after evaluations and reproduction attempts at multiple chip-design and EDA companies, it is safe to conclude that no important achievement occurred because prior chip-design software, particularly from Cadence Design Systems, produced better layouts faster.11,27 Had this been known to the paper's reviewers or to the public, the paper's claims of improving TPU designs would have appeared nonsensical. The Nature paper claimed that humans produced better results than commercial EDA tools but gave no substantiation. When license terms complicate publishing comparisons to commercial EDA tools,f one compares to academic software and to other prior methods, with the proviso that small improvements are not compelling. Google Team 2 and the UCSD Team took different approaches to comparing the methods of Mirhoseini et al. to baselines,5,11,27 but cumulatively they reported comparisons to commercial EDA tools, human designers, prior university software, and two independent custom implementations of SA.

  • Google Team 25 followed the descriptions in Mirhoseini et al.30 and did not supply initial placement information. The UCSD Team11,27 sought to replicate what Google actually did to produce results (details lacking in Mirhoseini et al.).

  • Google Team 2 had access to TPU design blocks and demonstrated5 that the impact of pre-training was small at best.g

  • The UCSD Team11,27 lacked access to Google training data and code but followed Google instructions by Google Team 113 for obtaining results similar to those in Mirhoseini et al. without pre-training. They also reimplemented SA following instructions by Google Team 25 and introduced several new chip-design examples (Table 1).

  • Comparisons using chip metrics were made against a commercial EDA tool (Cadence CMP),11,27 which outperformed Google RL. When running RePlAce in this context, Cheng et al.11 used only macro locations produced by RePlAce and placed standard cells with the same commercial software used after Google CT/RL13,30 (more details below).

  • The UCSD Team repeated SA vs. RL comparisons for several configurations11,27 (those in Mirhoseini et al., those in the Github repo,13 and additional ones suggested by Google engineers). The results were consistent.

  • A chip designer from IBM outperformed Google RL,11,13,27 whereas Bae et al. did not use human baselines.

For comparisons that can be crosschecked, Bae et al., Cheng et al.,11 and the Macro Placement Repo27 report qualitatively similar conclusions. RePlAce was used in the Nature paper in a way inconsistent with its intended use.5 With proper use of RePlAce, Bae et al. and, independently, Cheng et al.11 produce strong results for RePlAce on well-known public ICCAD 2004 benchmarks. The implementation of simulated annealing used in the Nature paper was handicapped.5 Removing the handicaps (in the same source-code base) improved results. When properly implemented, SA produces better solutions than Google CT/RL13 using less runtime, when both are given the same proxy cost function. This is shown consistently in Bae et al. and Cheng et al.11 on 17 widely used ICCAD 2004 benchmarks2 and on several modern design benchmarks.11 Compared to Google CT/RL,13 SA consistently improves wirelength and power metrics. For the circuit-timing metrics TNS and WNS, SA produces less noisy results that are comparable to RL's.11 Recall that the proxy function optimized by SA and RL does not include timing metrics,30 making any claims of improvement in these metrics due to SA or RL dubious. Improving upon SOTA requires improving upon all prior baselines.

Google CT/RL failed to improve on human baselines, commercial EDA tools, and SA in quality of results. It did not improve SOTA in runtime either (Table 3), and the authors did not disclose per-design data or design-process time. RePlAce and SA gave stronger baselines than described in the paper when configured and implemented well.

Rebuttals to Critiques of the Nature Paper

Despite critical media coverage14,31,36,42 and technical questions raised, the authors failed to remove remaining obstacles to reproducibility18 of the methods and results in Mirhoseini et al. The UCSD Team's engineering effort overcame those obstacles; they followed up on the work of Google Team 2,5 which criticized the Nature paper, and analyzed many of the issues. Google Team 2 had access to Google TPU designs and to the source code used in the paper before the CT GitHub repo appeared. The UCSD authors of Cheng et al.11 and the Macro Placement Repo27 had access to Circuit Training (CT)13 and benefited from a lengthy involvement of Google Team 1 engineers, but not to the SA code used in Bae et al. or Mirhoseini et al., or to other key pieces of code missing from the CT framework.13 Yet, the results in Bae et al., Cheng et al.,11 and the Macro Placement Repo27 corroborate each other, and their qualitative conclusions are consistent. UCSD results for Ariane-NG45 closely match those of Google Team 1 engineers: Figure 4 of Cheng et al.11 shows that CT training curves for Ariane-NG45 generated at UCSD match those produced by Google Team 1 engineers. Google Team 1 engineers carefully reviewed the paper11 and the work in Fall 2022 and Winter 2023, raising no objections.27

The two lead authors of the Nature paper left Google in August 2022, but in March 2023 they objected to the results in Cheng et al.11 without remedying the original work’s deficiencies. Those objections were addressed promptly in the FAQ section of the Macro Placement Repo,27 for example, in #6, #11, #13, #15. One issue was the lack of pre-training in experiments in Cheng et al.11

Pre-training.  Cheng et al.11 performed training using the code and instructions in Google's CT repo,13 which states (June 2023): "The results below are reported for training from scratch, since the pre-trained model cannot be shared at this time."

  • Per the MacroPlacement FAQ in the Macro Placement Repo, Cheng et al.11 did not use pretraining because, per Google’s CT FAQ,13 pre-training was not needed to reproduce results of Mirhoseini et al. Also, Google did not release pre-training data.

  • Google Team 25 evaluated pre-training using Google-internal code and saw no impact on comparisons to SA or RePlAce.

  • Google Team 1 showed46 that pre-training on “diverse TPU blocks” did not improve results, only runtime. Pre-training on “previous netlist versions” gave small improvement. No such previous versions were discussed, disclosed or released in the CT documentation13 or in the paper itself.30

In other words, the lead authors of the Nature paper want others to use pre-training while they did not describe it in detail sufficient for reproduction, did not release code or data for it, and have shown that it does not improve results in the context of their claims. In September 2024 (years after the publication), the authors announced the release of a pre-trained model but not the pre-training data. Hence, we cannot ensure that a particular example used for testing was not used in pre-training.

Old benchmarks.  Another objection31 is that the public circuit benchmarks2 used in Bae et al.5 and Cheng et al.11 allegedly use outdated infrastructure. In fact, those benchmarks2 have been evaluated with the HPWL objective, which scales accurately under geometric 2D scaling of chip designs and remains appropriate for all technology nodes (see the Background section). ICCAD benchmarks were requested35 by Peer Reviewer #3 of the paper. When Bae et al. and Cheng et al.11 implemented this request, Google RL ran into trouble before routing became relevant: RL lost by 20% or so in HPWL optimization (HPWL is the simplest yet most important term of the proxy cost optimized by CT/RL13,30).

Not training until convergence in experiments in Cheng et al.11  This concern was promptly addressed in FAQ #15 in the Macro Placement Repo: " 'training until convergence' is not described in any of the guidelines provided by the CT GitHub repo for reproducing the results in the Nature paper."27 Cheng et al. followed the guidelines given by Google in the CT repo. Later, their additional experiments indicated that "training until convergence worsens some key chip metrics while improving others, highlighting the poor correlation between proxy cost and chip metrics. Overall, training until convergence does not qualitatively change comparisons to results of Simulated Annealing and human macro placements reported in the ISPD 2023 paper." The RL-vs-SA experiments in Bae et al. predated the CT framework, so RL was trained until convergence per the six-hour protocol from Mirhoseini et al.30

Computational resources used in the Nature paper were exorbitant and difficult to replicate. Since both the RL and SA algorithms produce valid solutions early and then gradually improve the proxy function, the best-effort comparisons in Cheng et al.11 used smaller computational resources than Mirhoseini et al., with parity between RL and SA. The result: SA beat RL. Bae et al.5 compared RL to SA using the same computational resources as Mirhoseini et al. Results in Cheng et al.11 were consistent with Bae et al.5 If given greater resources, SA and RL are unlikely to further improve chip metrics due to the poor correlation of chip metrics with the proxy function from Mirhoseini et al.

The paper’s lead authors mention in Goldie and Mirhoseini17 that the paper is heavily cited, but they cite no positive reproductions outside Google that cleared all known obstacles. Bae et al. and Cheng et al. do not discuss other ways to use RL in IC design, so we avoid general conclusions.

Can the Work in the Nature Paper Be Used?

The Nature paper claimed applications to recent Google TPU chips, lending credence to the notion that those methods improved the state of the art. But aside from vague general claims, no chip-metric improvements were reported for specific production chips.h This article (see the section "Did the Nature Paper Improve State of the Art?") shows that the methods in the paper and in the CT framework lag behind SOTA, for example, simulated annealing from the 1980s.24,25,38,45 Moreover, a strong Google-internal implementation of SA from Bae et al. could serve as a drop-in replacement for RL in the CT framework and in the flow of the Nature paper. We try to reconcile the claimed use in TPUs with Google CT/RL lagging behind SOTA.5,11

  • Given the high variance of chip-timing metrics TNS and WNS in RL results (due to low correlation with the proxy metric), trying many independent randomized attempts with variant proxy cost functions and hyperparameter settings may improve best-seen results,37 with much greater runtimes. But SA can also be used this way.

  • Using in-house methods, even if inferior, is a common methodology in industry practice called dogfooding (“eat your own dogfood”). In most chips, some blocks are not critical (do not affect chip speed) and are good dogfooding candidates. This can explain selective “production use” and reporting.

  • The results of RL were postprocessed by SA30 but the CT FAQ13 disclaimed this postprocessing: it was used in the TPU design flow but not when comparing RL to SA. But since full-fledged SA consistently beats RL,5,11 SA could substitute for RL (initial locations can be accommodated using an adaptive temperature schedule in SA).

Google Team 1’s follow-up46 shows (in Figure 7) that pre-training improves results only when pre-training on essentially the same design. Perhaps, Google is leveraging RL when performing multiple revisions to IC designs—a valid context, but not described in the Nature paper. Moreover, commercial EDA tools are orders of magnitude faster than RL when running from scratch,11 so pre-training RL does not close the gap.

Can Google CT/RL code13 be improved?  RL and SA are orders of magnitude slower than SOTA (Table 3), but pre-training (missing in CT) speeds up RL46 by only several times. The CT repository now contains attempted improvements, but we have not seen serious improvements to chip metrics. Four major barriers to improving the CT repository and the paper remain:

  1. The proxy cost optimized by RL does not reflect circuit timing,11 so improving RL may not help to improve TNS and WNS.

  2. SA outperforms RL when optimizing a given proxy function.5,11 Hence, RL may lose even with a better proxy.

  3. RL’s placement of macros on a coarse grid limits their locations (Figure 2). When a human ignored the coarse grid, they found better macro locations.11 Commercial EDA tools also avoid this limitation and outperform Google CT/RL.

  4. Clustering as a preprocessing step creates mismatches between placement and netlist partitioning objectives.3,23

Conclusions

This meta-analysis discusses the reproduction and evaluation of results in the Nature paper by Mirhoseini et al.,30 as well as the validity of its methods, results, and claims. In the paper, we find a smorgasbord of questionable practices in ML,26 including irreproducible research practices, multiple variants of cherry-picking, misreporting, and likely data contamination (leakage). Based on crosschecked newer data, we draw conclusions with ample redundancy (resistant to isolated mistakes): The paper's integrity is substantially undermined owing to errors in the conduct, analysis, and reporting of its study. Omissions, inconsistencies, mistakes, and misrepresentations impacted its methods, data, results, and interpretation.

Conclusions about the Nature paper.  Google Team 2 had access to Google internal code, whereas Cheng et al.11 reverse-engineered and/or reimplemented missing components. Google Team 2 and the UCSD Team drew consistent conclusions from similar experiments, and each team made additional observations. We crosscheck the results reported by Google Team 2 and the UCSD Team and also account for the CT framework,13 the Nature peer reviews,35 and Yue et al.,46 and then summarize the conclusions drawn from these works. This confirms many of the initial doubts about the claims and identifies additional deficiencies. As a result, it is clear that the Nature paper by Mirhoseini et al. is misleading in several ways, such that readers can have no confidence in its top-line claims and conclusions. Mirhoseini et al. did not improve SOTA, and the methods and results of the original paper were not reproducible from the descriptions provided, contrary to stated editorial policies at Nature (see below). The reliance on proprietary TPU designs for evaluation, along with insufficient reporting of experiments, continues to obstruct reproducibility of the methods and the results. Attempts by the authors of the Nature paper to invalidate critiques have been unsuccessful. Surprisingly, the authors of Mirhoseini et al. have not offered new compelling empirical results in the one-and-a-half years since the publication of Cheng et al.11

Implications for chip design.  Our work highlights deficiencies only in the approach of the Nature paper. But a 2024 effort from China43 compared seven techniques for mixed-size placement using their new independent evaluation framework with 20 circuits (seven with macros). End-to-end results for chip metrics show that ML-based techniques lag behind RePlAce10 (embedded in OpenROAD) and other optimization-based techniques: DREAMPlace (a GPU-based variant of the RePlAce algorithm) and AutoDMP (a Bayesian Optimization wrapper around DREAMPlace). Despite the obvious need to replicate the methods of Mirhoseini et al., the authors of Wang et al.43 were unable to provide such results.

Policy implications.  Theoretical arguments and empirical evidence suggest that numerous published papers across various fields cannot be replicated and may be incorrect.34,41 As a case in point, the Nature paper aggravated the reproducibility crisis that is undermining trust in published research.8,34 Retraction Watch tracks 5,000 retractions per year, including prominent cases of research misconduct.19,34 “Research misconduct is a serious problem and (probably) getting worse”,8 which makes it even more important to separate honest mistakes from deliberate exaggerations and misconduct.6,7,19,40 Institutional response is needed,40,41 including clarity in Nature retraction notices.32

Nature Portfolio editorial policies should be followed broadly and rigorously.  Quoting from Nature Portfolio (https://go.nature.com/4dshcXv):

“An inherent principle of publication is that others should be able to replicate and build upon the authors’ published claims. A condition of publication in a Nature Portfolio journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications[…] After publication, readers who encounter refusal by the authors to comply with these policies should contact the chief editor of the journal.”

Specifically for Mirhoseini et al., the Nature editorial15 insisted that "the technical expertise must be shared widely." But when manuscript authors neglect requests for public benchmarking and obstruct reproducibility, their technical claims should be viewed with suspicion (especially if they later disagree with comparisons to their work17). This point has already been made in a Communications news article.18 Per the peer review file,35 the acceptance of the Nature paper30 was conditional on the release of code and data, but this did not happen when Mirhoseini et al.30 was published, or later.11 The Nature paper was amended by the authors to claim that the code had been made available (see the "Data and Code Availability" disclaimer), but serious omissions remain in the released code. This is particularly concerning because the paper omitted key comparisons and details, and fraud was alleged under oath in a California court by a Google whistleblower tasked with evaluating the project.37 This makes reproducibility even more critical.

It is in everyone’s interest to reach clear and unequivocal conclusions about published scientific claims, free of misrepresentations. Authors, Nature editors and reviewers, and the research community, share the burden of responsibility. Seeking the truth is a shared obligation.6,40

Acknowledgments

This meta-analysis would be impossible without the hard work and dedication to science of the authors of Bae et al. and Cheng et al.

    References

    • 1. Adya, S.N. and Markov, I.L. Fixed-outline floorplanning: Enabling hierarchical design. IEEE Trans. VLSI 11, 6 (2003), 1120–1135.
    • 2. Adya, S.N. and Markov, I.L. ICCAD04 Mixed-size placement benchmarks (2004); https://bit.ly/4gH0s1S
    • 3. Adya, S.N. and Markov, I.L. Combinatorial techniques for mixed-size placement. ACM Trans. on Design Automation of Electronic Systems 10, 1 (2005), 58–90.
    • 4. Alpert, C.J., Mehta, D.P., and Sapatnekar, S.S. (Eds.). Handbook of Algorithms for Physical Design Automation. Auerbach (2008).
    • 5. Bae, S. et al. Stronger baselines for evaluating deep reinforcement learning in chip placement (2022); https://bit.ly/3TG9P7M
    • 6. Baker, T. The research scandal at Stanford is more common than you think. The New York Times (July 30, 2023).
    • 7. Brainard, J. In some scientific papers, words expressing uncertainty have decreased. Science  (July 28, 2023).
    • 8. Brooks, J. The battle for research integrity is winnable. Research Professional News (July 28, 2023); https://bit.ly/4egqib6
    • 9. Chen, S-T., Chang, Y-W., and Chen, T-C. An integrated-spreading-based macro-refining algorithm for large-scale mixed-size circuit designs. In Proceedings of the Intern. Conf. Computer-Aided Design (2017), 496–503.
    • 10. Cheng, C.-K., Kahng, A.B., Kang, I., and Wang, L. RePlAce: Advancing solution quality and routability validation in global placement. IEEE Trans. on Computer-Aided Design 38, 9 (2018), 1717–1730; https://bit.ly/4gDOuGa
    • 11. Cheng, C.-K. et al. Assessment of reinforcement learning for macro placement. In Proceedings of the Intern. Symp. Physical Design. ACM (2023), 158–166; https://bit.ly/4eeIthA
    • 12. Cheng, C.-K. and Kuh, E.S. Module placement based on resistive network optimization. IEEE Trans. Comp.-Aided Des. 3, 3 (1984), 218–225.
    • 13. Circuit training: An open-source framework for generating chip floorplans with distributed deep reinforcement learning. Github  (2022); https://bit.ly/4eoqTYr
    • 14. Dave, P. Google faces internal battle over research on AI to speed chip design. Reuters (May 3, 2022); https://reut.rs/3MZr3cz
    • 15. Google’s AI approach to microchips is welcome—but needs care. Nature (June 9, 2021); https://go.nature.com/3N1tnjt
    • 16. Goering, R. IC placement benchmarks needed, researchers say. EE Times (Apr. 10, 2003).
    • 17. Goldie, A. and Mirhoseini, A. Statement on reinforcement learning for chip design. The Register  (March 2023); https://bit.ly/3MZe9LJ
    • 18. Goth, G. More details, but not enough. Communications  (Mar. 29, 2023).
    • 19. Jack, A. ‘Open science’ advocates warn of widespread academic fraud. Financial Times (July 31, 2023).
    • 20. Joelving, F. Nature flags doubts over Google AI study, pulls commentary. Retraction Watch  (Sept. 26, 2023); https://bit.ly/4dl18aa
    • 21. Kahng, A.B. Classical floorplanning harmful? In Proceedings of the Intern. Symp. Physical Design (2000), 207–213.
    • 22. Kahng, A.B. Advancing placement. In Proceedings of the Intern. Symp. Physical Design. ACM (2021), 15–22.
    • 23. Kahng, A.B., Lienig, J., Markov, I.L., and Hu, J. VLSI Physical Design: From Graph Partitioning to Timing Closure, 2nd ed. Springer (2022); https://bit.ly/4dCHPJL
    • 24. Kirkpatrick, S., Gelatt, C.D. Jr., and Vecchi, M.P. Optimization by simulated annealing. Science 220, 4598 (June 1983), 671–680.
    • 25. Kravitz, S.A. and Rutenbar, R.A. Placement by simulated annealing on a multiprocessor. IEEE Trans. Comp.-Aided Design 6, 4 (1987), 534–549.
    • 26. Leech, L. et al. Questionable practices in machine learning. arXiv  (2024); https://bit.ly/4eleQes
    • 27. Macro Placement Repository. Github (2022); https://bit.ly/3Zzi7Cp
    • 28. Markov, I.L., Hu, J., and Kim, M.C. Progress and challenges in VLSI placement research. Proceedings of the IEEE 103, 11 (2015), 1985–2003.
    • 29. Mazyavkina, N., Sviridov, S., Ivanov, S., and Burnaev, E. Reinforcement learning for combinatorial optimization: A survey. Computers and Operations Research 134 (2021), 105400; https://bit.ly/4dsfbKV
    • 30. Mirhoseini, A. et al. A graph placement methodology for fast chip design. Nature 594 (June 2021), 207–212; https://go.nature.com/3N3AyaV
    • 31. Moore, S. Ending an ugly chapter in chip design: Study tries to settle a bitter disagreement over Google's chip design AI. IEEE Spectrum (Apr. 4, 2023); https://bit.ly/3Y1UsJO
    • 32. Naddaf, M. Retraction notices are getting clearer—but progress is slow. Nature (July 2024); https://go.nature.com/3znXGOh
    • 33. Nam, G.-J. and Cong, J. Modern Circuit Placement, Best Practices and Results. Springer, (2007).
    • 34. Oransky, I. and Marcus, A. There’s far more scientific fraud than anyone wants to admit. The Guardian  (Aug. 9, 2023); https://bit.ly/3THAwJv
    • 35. Peer review file for the Nature paper. Nature (2022); https://bit.ly/3XZYZvW
    • 36. Quach, K. Google’s claims of super-human AI chip layout back under the microscope. The Register (Mar. 27, 2023).
    • 37. Superior Court of Santa Clara County. Satrajit Chatterjee vs. Google, Case No. 22CV398683, First Amended Complaint (FAC) (Feb. 2023); https://bit.ly/4eiw6km
    • 38. Sechen, C. and Sangiovanni-Vincentelli, A. The TimberWolf placement and routing package. IEEE J. of Solid-State Circuits SC-20, 2 (Apr. 1985), 510–522.
    • 39. Simonite, T. Tension inside Google over a fired AI researcher's conduct. Wired (May 31, 2022).
    • 40. Thorp, H.H. A generative approach to research integrity. Science 381, 6668 (Aug. 10, 2023), 587.
    • 41. van Ravenzwaaij, D. et al. Perspectives on Scientific Error. Royal Society Open Science (July 19, 2023).
    • 42. Wakabayashi, D. and Metz, C. Another firing among Google’s A.I. brain trust, and more discord. The New York Times  (2022); https://nyti.ms/3XNEAcy
    • 43. Wang, Z. et al. Benchmarking end-to-end performance of AI-based chip placement algorithms. arXiv (2024); https://bit.ly/4ef8RI0
    • 44. Wein, E. and Benkoski, J. Hard macros will revolutionize SoC design. EE Times  (Aug. 2004); https://bit.ly/3NlIRiB
    • 45. Wong, D.-F., Leong, H.W., and Liu, C.L. Simulated Annealing for VLSI Design. Springer (1988).
    • 46. Yue, S. et al. Scalability and generalization of circuit training for chip floorplanning. In Proceedings of the Intern. Symp. Physical Design. ACM (2022), 65–70.
    • a. See https://deepmind.google/discover/blog/how-alphachip-transformed-computer-chip-design/
    • b. With semiconductor technology scaling, macros may scale differently, but placement algorithms should handle a variety of macro sizes.
    • c. Google Team 1 posted their code under the Circuit Training (CT) GitHub repository,13 and the UCSD Team posted theirs under the Macro Placement (MP) GitHub repository.27
    • d. TNS = Total Negative Slack; WNS = Worst Negative Slack. These metrics measure violations of timing constraints (negative slack represents a violation) by summing violations along all critical circuit paths or taking the worst violation. They are noisy because chip timing is often determined by a handful of paths, and small changes to macro locations may change timing a lot.
    • e. Proxy values correlate poorly with TNS and WNS.11
    • f. The lawsuit37 alleges that Google obtained better results with commercial tools before the Nature submission was made.
    • g. A consistent conclusion was reported in Yue et al.46 by Google Team 1: Training on diverse designs does not improve quality of results, and improvements are seen only when training on earlier versions of the same design.
    • h. Table 1 in the Nature paper shows results for TPU designs of an earlier generation (that is, chips that were already manufactured at the time). Assuming substantial use in production, more recent TPU design blocks must have used the CT framework and Mirhoseini et al. for tape-out.
