Why Scientists Still Can't Measure Big G Precisely

Newton's gravitational constant, G, is one of the oldest quantities in physics — and one of the least precisely known. Decades of increasingly sophisticated experiments have not settled its value. They have made the problem harder.

A Constant Known to Only Three Significant Figures — and Getting Harder to Narrow

Most fundamental constants of nature have been measured to extraordinary precision. The mass of an electron is pinned down to roughly eight significant figures. The speed of light is defined exactly. G is different. After more than three centuries, it remains reliably known to only about three significant figures, sitting near 6.674 × 10⁻¹¹ m³ kg⁻¹ s⁻².

Part of the difficulty is intrinsic. Gravity is the weakest of the four fundamental forces by an enormous margin — roughly 10³⁶ times weaker than electromagnetism. Julian Stirling, a NIST postdoctoral guest researcher, offered a concrete sense of scale: the gravitational force between two parked sedans one space apart is approximately 100,000 times weaker than the force required to peel apart two Post-it notes. At that scale, isolating a gravitational signal from experimental noise is genuinely hard.
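To give a rough feel for these magnitudes, the following is a minimal Python sketch of Newton's law of gravitation, F = G m₁ m₂ / r². The sedan mass (1,500 kg) and centre-to-centre spacing (3 m) are assumptions chosen for illustration, not figures supplied by Stirling. The comparison with electromagnetism is computed for two protons, one common convention that the text above does not specify, using rounded textbook constants.

```python
G = 6.674e-11  # m^3 kg^-1 s^-2, the approximate value quoted above


def gravitational_force(m1_kg, m2_kg, r_m):
    """Newtonian attraction between two point-like masses, in newtons."""
    return G * m1_kg * m2_kg / r_m**2


# Assumed values for illustration only (not from the article).
sedan_mass_kg = 1500.0
separation_m = 3.0

force_n = gravitational_force(sedan_mass_kg, sedan_mass_kg, separation_m)
# If this force is 100,000 times weaker than the Post-it peel force,
# the peel force implied by these assumptions is:
implied_peel_force_n = force_n * 1e5

# Ratio of electrostatic to gravitational force between two protons.
# The separation cancels because both forces scale as 1/r^2.
COULOMB_K = 8.988e9      # N m^2 C^-2 (rounded)
E_CHARGE = 1.602e-19     # C (rounded)
PROTON_MASS = 1.673e-27  # kg (rounded)
em_to_gravity_ratio = COULOMB_K * E_CHARGE**2 / (G * PROTON_MASS**2)

print(f"Force between sedans: {force_n:.2e} N")
print(f"Implied peel force:   {implied_peel_force_n:.2f} N")
print(f"EM / gravity ratio:   {em_to_gravity_ratio:.2e}")
```

Under these assumptions the attraction between the two cars comes out at a few tens of micronewtons, and the proton ratio lands on the order of 10³⁶. Different choices of mass, spacing, or particle would shift the numbers without changing the conclusion.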

What makes the situation unusual is the direction of the problem. As measurement technology improves, the various international groups attempting to measure G have not converged. Their results have diverged, with error bars from different experiments that do not overlap. Recent high-precision measurements have ranged from approximately 6.670 × 10⁻¹¹ to 6.676 × 10⁻¹¹ m³ kg⁻¹ s⁻² — a spread that is small in absolute terms but large relative to the stated uncertainties of individual experiments.
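The arithmetic behind "small in absolute terms but large relative to the stated uncertainties" can be sketched in a few lines of Python. The two end points are the range quoted above. The error bar of ±0.001 × 10⁻¹¹ assigned to each end is a hypothetical value chosen only to show what non-overlapping error bars look like; it does not describe any particular experiment.

```python
def intervals_overlap(v1, u1, v2, u2):
    """True if [v1 - u1, v1 + u1] and [v2 - u2, v2 + u2] share any point."""
    return abs(v1 - v2) <= u1 + u2


# Range quoted in the text, in units of 1e-11 m^3 kg^-1 s^-2.
low, high = 6.670, 6.676

# Spread relative to the midpoint, in parts per million.
spread_ppm = (high - low) / ((high + low) / 2) * 1e6

# Hypothetical error bar for each end of the range (illustration only).
hypothetical_unc = 0.001
overlap = intervals_overlap(low, hypothetical_unc, high, hypothetical_unc)

print(f"Spread: {spread_ppm:.0f} ppm")
print(f"Error bars overlap: {overlap}")
```

A spread of roughly 900 parts per million is tiny in everyday terms, yet two results separated by that much cannot both be right if each claims an uncertainty several times smaller than the gap.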

The body that sets recommended values for fundamental constants, CODATA (the Committee on Data for Science and Technology), has had to respond. Between its 2006 and 2010 recommended values, CODATA widened the standard uncertainty on G — an admission that the field's collective precision had not improved, and that something was going wrong across multiple independent experimental programs. The chart below shows the CODATA recommended values and their standard uncertainties across three successive updates.
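For readers who want the numbers, here is a short Python sketch that converts CODATA recommended values into relative uncertainties. The values in the dictionary are supplied here from the published 2006, 2010, and 2014 CODATA adjustments, not from this article, and should be checked against the official tables before being relied on. The choice of these three years is also an assumption about which updates are meant.

```python
# CODATA recommended values of G as (value, standard uncertainty),
# in units of 1e-11 m^3 kg^-1 s^-2. Supplied from the published
# adjustments rather than from this article; verify before reuse.
codata = {
    2006: (6.67428, 0.00067),
    2010: (6.67384, 0.00080),
    2014: (6.67408, 0.00031),
}


def relative_uncertainty_ppm(value, uncertainty):
    """Standard uncertainty as a fraction of the value, in parts per million."""
    return uncertainty / value * 1e6


rel_ppm = {year: relative_uncertainty_ppm(*vu) for year, vu in codata.items()}
for year in sorted(rel_ppm):
    print(f"CODATA {year}: {rel_ppm[year]:.0f} ppm")

# The widening described in the text: 2010 is less certain than 2006.
widened_2006_to_2010 = rel_ppm[2010] > rel_ppm[2006]
```

Expressing each uncertainty in parts per million makes the 2006 to 2010 widening easy to see at a glance.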

Carl Williams, Deputy Director of NIST's Physical Measurement Laboratory, described the situation plainly at a 2014 workshop: "The more work we do to nail it down, the bigger the divergences seem to be. This is an issue that no metrologist can be pleased with."

The 2014 Workshop and the Decision to Retest an Outlier

The divergence came to a head in October 2014, when NIST convened a specialist workshop drawing 53 scientists from international measurement programs. The mood, by Williams's own framing, was one of shared alarm. Attendees agreed on several coordinated responses: establishing recurring focused meetings, backing new experimental configurations, and exploring a centralized consortium to build consensus.

One development from 2013 made replication particularly urgent. The International Bureau of Weights and Measures, BIPM, had published a G measurement with an unusually tight stated uncertainty — just ±0.00018 × 10⁻¹¹ m³ kg⁻¹ s⁻² — but an anomalously high central value of 6.67545 × 10⁻¹¹ m³ kg⁻¹ s⁻², sitting well above most other experimental results. A narrow error bar on an outlying value is more worrying, not less: it implies high confidence in a result that other experiments cannot reproduce, which points toward an undetected systematic bias somewhere in the apparatus.
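A brief Python sketch shows why a tight error bar makes an outlier more troubling. It measures how far the BIPM value sits from a reference value in units of the BIPM stated uncertainty. The reference used here is the rounded figure of 6.674 quoted earlier in this article, treated as exact purely for illustration; a proper comparison would also fold in the uncertainty of the reference. The looser uncertainty of 0.00080 is a hypothetical value for contrast.

```python
# All values in units of 1e-11 m^3 kg^-1 s^-2.
bipm_value = 6.67545
bipm_unc = 0.00018

# Rounded reference from earlier in the article, treated as exact
# here for illustration only.
reference = 6.674


def offset_in_sigma(value, uncertainty, ref):
    """Distance of a result from a reference, in units of its own uncertainty."""
    return (value - ref) / uncertainty


rel_unc_ppm = bipm_unc / bipm_value * 1e6
tight = offset_in_sigma(bipm_value, bipm_unc, reference)

# Same central value with a hypothetical, looser error bar for contrast.
loose = offset_in_sigma(bipm_value, 0.00080, reference)

print(f"BIPM relative uncertainty: {rel_unc_ppm:.0f} ppm")
print(f"Offset with stated uncertainty:  {tight:.1f} sigma")
print(f"Offset with looser uncertainty:  {loose:.1f} sigma")
```

The central value is identical in both cases. Only the claimed precision changes, and it is the claimed precision that turns a modest-looking offset into a discrepancy demanding an explanation.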

The response was unusual for metrology: the BIPM hardware would be physically shipped to NIST, where an entirely different team would re-run the experiment at a different facility. Separately, in June 2014, Rosi et al. published a G measurement in Nature based on an entirely different physical principle: atom interferometry with laser-cooled rubidium atoms accelerated past tungsten source masses. That result added a data point from a method free of the mechanical assumptions embedded in torsion balance designs.

The timeline below maps the investigative milestones from the 2014 workshop through the 2016 arrival of the replication apparatus at NIST.

How the BIPM Torsion Balance Actually Measures G — and Where Systematic Error Can Hide

The apparatus at the center of the replication effort modernizes Henry Cavendish's 1798 torsion balance. The core arrangement uses eight copper-tellurium alloy cylinders. Four smaller inner masses rest on a disk that hangs from a central pillar by a copper-beryllium strip 2.5 mm wide, 160 mm long, and approximately as thick as a human hair. Four larger outer masses sit on a rotating carousel.

The experiment runs in two distinct modes, which is a meaningful design feature rather than redundancy. In the first mode, rotating the outer carousel moves the outer masses relative to the inner ones, so that their gravitational pull exerts a net torque on the inner masses and twists the suspension strip. A laser bouncing off a mirror at the top of the strip tracks the angular deflection. Because the twist takes place in the horizontal plane, perpendicular to Earth's gravitational field, terrestrial gravity does not contribute to the measured torque.

In the second mode, electrodes apply a precisely measured electrostatic counter-force to hold the inner masses completely stationary. The magnitude of the electrostatic force needed to maintain equilibrium gives an independent route to G. Two modes yielding consistent results increase confidence that neither result is an artifact of the measurement method. Significant disagreement between modes, on the other hand, would be a specific flag for a systematic problem.
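The two modes can be summarized with textbook torsion balance relations. The following is a simplified sketch, not the BIPM team's actual analysis, and every symbol in it is an assumption introduced here: κ is the effective torsional stiffness of the suspension, θ the equilibrium twist angle, I the moment of inertia of the suspended assembly, T its free oscillation period, Γ a geometry factor fixed by the masses and positions of all eight cylinders, C the capacitance between the electrodes and the suspended assembly, and V the applied voltage.

```latex
% Gravitational torque on the inner masses is proportional to G:
\begin{align}
  \tau_g &= G\,\Gamma \\
% Mode 1 (deflection): the strip's restoring torque balances gravity,
% with the stiffness obtained from the free oscillation period.
  \tau_g &= \kappa\,\theta, \qquad \kappa = I\left(\frac{2\pi}{T}\right)^{2}
  \quad\Longrightarrow\quad
  G = \frac{\kappa\,\theta}{\Gamma} \\
% Mode 2 (servo): an electrostatic torque holds the assembly still.
  \tau_g &= \frac{1}{2}\,\frac{dC}{d\theta}\,V^{2}
  \quad\Longrightarrow\quad
  G = \frac{1}{2\,\Gamma}\,\frac{dC}{d\theta}\,V^{2}
\end{align}
```

In this simplified picture the two routes share the geometry factor Γ but rely on different auxiliary measurements: an angle and an oscillation period in one case, a voltage and a capacitance gradient in the other. An error in κ or in dC/dθ would therefore show up as a disagreement between modes, while an error in Γ would shift both results together.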

Jon Pratt, Chief of NIST's Quantum Measurement Division, was direct about what the replication attempt was actually looking for: "The terrifying part is obvious: bias or unaccounted-for physics in this experiment is far and away the most likely explanation, yet they will be extremely hard to find, since some of the best measurement scientists in the world have already done their best to eliminate them."

The NIST team introduced one notable procedural change. The original BIPM experiment shifted the outer masses in discrete steps between measurements. NIST planned to rotate the outer carousel continuously and slowly through a full 360 degrees while monitoring torque data in real time. This matters because a continuous sweep produces a richer, higher-density dataset that makes it easier to spot systematic trends or periodicities that a stepped approach might average over or miss entirely.
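A toy Python model illustrates what a continuous sweep can reveal. It is a sketch under stated assumptions, not a description of the NIST analysis: the torque is modelled as a four-fold signal sin(4φ), which assumes the four-mass layout repeats every 90 degrees, plus a hypothetical once-per-revolution systematic sin(φ) with 2% of the signal amplitude. The two-position stepped scheme is likewise hypothetical, and the model contains no noise.

```python
import math


def torque(phi, a4=1.0, b1=0.02):
    """Synthetic torque (arbitrary units) versus carousel angle phi in radians.

    a4: amplitude of the assumed four-fold gravitational signal
    b1: amplitude of a hypothetical once-per-revolution systematic
    """
    return a4 * math.sin(4 * phi) + b1 * math.sin(phi)


def harmonic_amplitude(angles, values, k):
    """Project evenly spaced samples over a full turn onto sin(k * phi)."""
    n = len(angles)
    return 2.0 / n * sum(v * math.sin(k * p) for p, v in zip(angles, values))


# Continuous sweep: 360 evenly spaced samples over one full revolution.
sweep_angles = [2 * math.pi * i / 360 for i in range(360)]
sweep_values = [torque(p) for p in sweep_angles]
signal_amp = harmonic_amplitude(sweep_angles, sweep_values, 4)
systematic_amp = harmonic_amplitude(sweep_angles, sweep_values, 1)

# Hypothetical stepped scheme: two positions at the maxima of the
# four-fold term, with the signal taken as half the difference.
phi_max = math.pi / 8  # sin(4 * phi_max) = 1
stepped_estimate = (torque(phi_max) - torque(-phi_max)) / 2

print(f"Sweep, four-fold amplitude:        {signal_amp:.4f}")
print(f"Sweep, once-per-turn amplitude:    {systematic_amp:.4f}")
print(f"Two-position estimate of signal:   {stepped_estimate:.4f}")
```

In this toy case the full sweep separates the two components cleanly, while the two-position estimate absorbs part of the systematic into the signal with no way to tell them apart. Real data would include noise and drift, so the separation would be less clean, but the advantage of sampling many angles remains.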

The physical infrastructure required for the experiment reflects its sensitivity. A Coordinate Measuring Machine — capable of measuring geometric features of the apparatus to within half a micrometre — arrived at NIST in the summer of 2016 and had to be craned down an air shaft to a laboratory 12 metres underground. A wall was removed to accommodate it. Vibration isolation and distance from surface-level disturbances are not incidental concerns when measuring a force as faint as gravity between kilogram-scale objects.

The diagram below outlines the dual-mode torsion balance layout.

The prevailing working hypothesis — shared among the metrologists involved — is that the divergence across international experiments reflects undetected systematic errors embedded in specific experimental setups, not a breakdown of Newtonian gravitational theory at laboratory scales. The BIPM apparatus transferred to NIST represents one of the few cases in precision metrology where a complete hardware replication by a different team, at a different site, was judged necessary and worth the logistical cost. Whether that replication finds a systematic error in the original BIPM result, confirms its anomalously high value, or introduces new questions of its own, the outcome will narrow the space of possible explanations for a problem that has persisted through more than a decade of coordinated international effort.