Yanuo Zhou

Research

Working Papers:

Estimating the Harmlessness-Accuracy Trade-off in AI: Evidence from LLM-based Hiring Audits [Link to Paper].

Abstract: Large Language Models (LLMs) have emerged as foundational technologies reshaping many sectors of the economy. While often viewed as sophisticated prediction machines, their final behaviors reflect strategic decisions embedded by developers during a crucial production phase: fine-tuning. This process, in which producers balance the competing goals of accuracy and perceived harmlessness, is a significant and under-examined locus of managerial judgment. Using a large-scale LLM-based experimental audit of prominent models from four major producers, modeled on hiring discrimination studies, I test how this trade-off is resolved using a dataset of over 12,000 real job advertisements from major U.S. cities spanning 1950 to 2020. The results show that: 1) In harmlessness-neutral contexts, all four tested models (Gemini 1.5 Flash, GPT-4o, Grok-2, and Llama 3.1 405B) accurately identify superior applicant profiles. 2) In contexts with clear perceived harmlessness considerations, all models, despite possessing internal knowledge of historical racial discrimination, systematically trade off accuracy for perceived harmlessness, either by inverting this reality to produce pro-diversity outputs or by enforcing stricter neutrality. 3) In contexts with ambiguous perceived harmlessness considerations, the models diverge significantly, revealing distinct, firm-specific philosophies. 4) This trade-off is a persistent managerial challenge that can even intensify across model updates. These findings demonstrate that LLMs are not neutral tools but artifacts of managerial choice, making producer judgment a key dimension of product differentiation.

The Value of Open Source Software [Link to Paper].

with Manuel Hoffmann and Frank Nagle.

Abstract: The value of a non-pecuniary (free) product is inherently difficult to assess. A pervasive example is open source software (OSS), a global public good that plays a vital role in the economy and is foundational for most technology we use today. However, OSS is difficult to value because of its non-pecuniary nature and the lack of centralized usage tracking, so it remains largely unaccounted for in economic measures. Although prior studies have estimated the supply-side costs to recreate this software, a lack of data has hampered estimating the much larger demand-side (usage) value created by OSS. To understand the complete economic and social value of widely used OSS, we leverage unique global data from two complementary sources capturing OSS usage by millions of global firms. We first estimate the supply-side value by calculating the cost to recreate the most widely used OSS once. We then calculate the demand-side value as a replacement value for each firm that uses the software and would need to build it internally if OSS did not exist. We estimate that the supply-side value of widely used OSS is $4.15 billion, while the demand-side value is much larger, at $8.8 trillion. We find that firms would need to spend 3.5 times more on software than they currently do if OSS did not exist. The top six programming languages in our sample account for 84% of the demand-side value of OSS. Further, 96% of the demand-side value is created by only 5% of OSS developers.
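The gap between the two measures comes from counting: the supply side prices each package once, while the demand side prices it once for every firm that would have to rebuild it. The toy sketch below illustrates only that counting logic. The package names, costs, and firm usage are hypothetical placeholders rather than data from the paper, and the paper's actual method for estimating recreation cost is not reproduced here.

```python
# Illustrative only: package names, costs, and usage are hypothetical,
# not figures from the paper.
recreation_cost = {"pkg_a": 100_000, "pkg_b": 250_000, "pkg_c": 50_000}

# Hypothetical usage: the set of packages each firm relies on.
firm_usage = {
    "firm_1": {"pkg_a", "pkg_b"},
    "firm_2": {"pkg_a"},
    "firm_3": {"pkg_a", "pkg_b", "pkg_c"},
}


def supply_side_value(costs):
    """Cost to recreate every package exactly once."""
    return sum(costs.values())


def demand_side_value(costs, usage):
    """Replacement value: each firm rebuilds every package it uses."""
    return sum(costs[pkg] for pkgs in usage.values() for pkg in pkgs)


supply = supply_side_value(recreation_cost)
demand = demand_side_value(recreation_cost, firm_usage)
print(supply, demand)  # 400000 850000
```

With only three firms the demand side is already roughly twice the supply side; in general it grows with the number of firms relying on each package.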

Technical Reports:

Census II of Free and Open Source Software – Application Libraries (2022). White Paper, Linux Foundation and Laboratory for Innovation Science at Harvard.
[Link to Publication].

with Frank Nagle, James Dana, Jennifer Hoffman, and Steven Randazzo.

Abstract: Produced in partnership with the Harvard Laboratory for Innovation Science (LISH) and the Open Source Security Foundation (OpenSSF), Census II is the second investigation into the widespread use of Free and Open Source Software (FOSS). The Census II effort uses data from partner Software Composition Analysis (SCA) companies, including Snyk, the Synopsys Cybersecurity Research Center (CyRC), and FOSSA. The aggregated data include over half a million observations of FOSS libraries used in production applications at thousands of companies and aim to shed light on the most commonly used FOSS packages at the application-library level. This effort builds on the Census I report, which focused on lower-level critical operating system libraries and utilities, and improves our understanding of the FOSS packages that software applications rely on. Such insights will help identify critical FOSS packages so that resources can be prioritized to address security issues in this widely used software.