Tailored Truths: Persuasive Capabilities of LLMs

Main Motivation and Research Question:
- Large Language Models (LLMs) can argue effectively, but do humans find these arguments persuasive?
- Assuming threat actors weaponize LLMs for disinformation, what are the most persuasive strategies and how do they compare to human-written arguments?

Key Findings
Why does this matter? A Path to Scalable Disinformation
Likert and Loaded: Measuring AI’s Persuasive Punch
Details
Topics
| Example topics |
|---|
| Prescription drug importation should be allowed to increase access and lower cost. |
| Genetic modification of unborn babies is unethical and dangerous. |
| Space tourism should be limited until safety regulations are further developed. |
| AI must be transparent and explainable in order to be widely accepted. |
| Internet access should be considered a basic human right. |
Interaction Types
- Static Arguments: a single persuasive paragraph read by participants.
  - arg-hum: Paragraph written by a human.
  - arg-llm: Paragraph written by an LLM.
- Interactive Debates: the LLM debates the participant in a live conversation.
  - Simple: Basic debate with no additional persuasion instructions.
  - Stats: LLM uses (mostly) fabricated statistics to persuade.
  - Personalized: LLM tailors responses using user demographics and personality traits.
  - Mixed: Multi-agent approach combining personalized and stats agents, with an executive agent finalizing responses.
Key Terms
- Likert Δ: Difference between a participant's final and initial rating on the Likert scale; shifts in the direction the LLM argued for count as positive.
- P(+change): Proportion of participants whose rating moved in the direction the LLM argued for (see the sketch below).
Figure: Visual representation of Likert Δ.
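To make these metrics concrete, here is a minimal sketch in Python of how Likert Δ and P(+change) can be computed from paired initial/final ratings. This is not the authors' code; the function name, the `direction` encoding, and the example numbers are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the authors' code): compute Likert Δ and P(+change)
# from paired initial/final Likert ratings. `direction` is +1 when the LLM
# argued toward the high end of the scale for that topic, -1 when it argued
# toward the low end, so shifts toward the LLM's stance count as positive.
def persuasion_metrics(initial, final, direction):
    initial, final, direction = map(np.asarray, (initial, final, direction))
    shifts = (final - initial) * direction        # signed per-participant shift
    likert_delta = shifts.mean()                  # average shift toward the LLM's stance
    p_positive_change = (shifts > 0).mean()       # fraction who moved toward it
    return float(likert_delta), float(p_positive_change)

# Hypothetical example: three participants, LLM arguing toward the high end.
print(persuasion_metrics(initial=[3, 5, 4], final=[5, 5, 6], direction=[1, 1, 1]))
# -> (1.333..., 0.666...)
```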
Debating Under Influence: Mixing a Persuasion Cocktail
Personalization alone led to a Likert Δ of 0.479, a modest shift in opinion. The simple approach performed better, with a Likert Δ of 0.782, and the statistics-based method did better still, reaching 0.823. The mixed approach had by far the greatest effect, with a Likert Δ of 1.146. Since a 1-point shift represents a full step on the scale, the mixed approach clearly outperformed every other method on its own. This suggests that the choice of strategy matters more than simply prompting the LLM to persuade: personalizing fabricated statistics makes arguments significantly more convincing than either technique alone.


Comparing static LLM- and human-written arguments revealed an interesting split: arg-llm had a higher P(+change), while arg-hum had a higher Likert Δ. This suggests that LLM arguments sway opinions more often, but when human arguments do persuade, they can shift opinions significantly further.
A fascinating—and slightly eerie—aspect of the mixed type was watching the private chat of the agents as they coordinated to generate debate responses. They categorized users by demographics and personality traits, exchanging responses and debating which arguments and fabricated statistics would be most persuasive. It felt like observing an AI focus group fine-tune the perfect pitch, adjusting strategies on the fly to maximize influence.

The challenge ahead
The low cost and high impact of AI-driven persuasion highlight the need for safeguards. Detecting AI-generated content in conversations is difficult without clear markers, so improving detection methods, content verification, and platform safeguards is key to preventing misuse.
Ethical considerations
At the conclusion of the study, participants were informed that some of the models had been instructed to fabricate statistics in order to strengthen their arguments. They were also given a recommended reading list to help them better recognize false information online.
Citation
@misc{timm2025tailored,
author = {Jasper Timm and Chetan Talele and Jacob Haimes},
title = {Tailored Truths: Optimizing LLM Persuasion with Personalization and Fabricated Statistics},
year = {2025},
language = {en},
month = {jan},
eprint = {2501.17273},
url = {https://arxiv.org/abs/2501.17273},
}