An analysis on a research blog site suggests that a famous Duke professor and researcher could have committed academic fraud.
Dan Ariely, James B. Duke professor of psychology and behavioral economics, has had at least two of his works come under scrutiny in recent weeks due to ambiguous or potentially fraudulent data.
The two incidents
One of Ariely’s works received an expression of concern in July 2021 from the journal Psychological Science due to discrepancies in the data. An expression of concern alerts readers to potential problems in a journal article. The 2004 article, “Effort for Payment: A Tale of Two Markets,” contained ambiguous statistics that Ariely “could not resolve,” the journal said.
Data Colada also published an analysis in August questioning the results of a 2012 paper where Ariely was one of five authors. The 2012 paper reported that “dishonesty can be reduced by asking people to sign a statement of honest intent before providing information (i.e., at the top of a document) rather than after providing information (i.e., at the bottom of a document).”
In 2020, the five authors and two others published a follow-up article entitled “Signing at the beginning versus the end does not decrease dishonesty.” The paper reported six studies that did not replicate the two original lab studies, one of which was an attempt to directly replicate the study and the other five of which were conceptual replications.
A brief analysis of the inconsistencies in the data and summaries of Data Colada’s findings can be found at the conclusion of this article.
How this could have happened
Data Colada noted that it is “impossible to tell from the data who fabricated it” but listed three possibilities, given that Ariely made clear in his response that he was the only author in touch with the insurance company: Ariely, someone in Ariely’s lab or someone at the insurance company.
Ariely responded to the allegation, but the research blog Data Colada and other analyses suggest that both his statement and the theory that the insurance company fabricated the data contain inconsistencies.
Ariely wrote on Sunday in a statement that he “[agrees] with the conclusions” of the data analysis and that he “fully [agrees] that posting data sooner would help to improve data quality and scientific accuracy.”
He wrote that the data were collected, entered, merged and anonymized by the insurance company and then sent to him. Ariely wrote that he was not involved in data collection or entry, nor did he merge data with information from the insurance database “for privacy reasons.”
He reiterated that he was the only author in contact with the insurance company and that none of the co-authors were involved.
“In situations like this, [Duke’s policy is to] immediately engage the Office of Research Integrity, which reviewed the available information, the process that was used to gather and verify the data, and its subsequent dissemination,” he wrote. “I fully supported this effort and am grateful for their guidance in developing appropriate regulatory oversight, data management and document retention procedures for the future.”
He added that he “did not suspect any problems with the data” and did not test the data for irregularities—“which after this painful lesson, [he] will start doing regularly.”
“I am committed to ensuring the integrity and validity of our research, and we are developing new policies to ensure that our data collection and analysis meets the highest standards.”
Apparent conflicts with Ariely’s statement and the insurance company theory
Aaron Charlton, a marketing professor at Illinois State University, wrote on his website OpenMKTG that Ariely’s written statement conflicted with other statements he’d made about the study.
The Excel data file posted online and analyzed by Data Colada shows that Ariely created the Excel file and the last person to modify it was Nina Mazar, a co-author of the study.
On Aug. 6, Mazar forwarded Data Colada an email that she received from Ariely on Feb. 11, 2011. The email reads “Here is the file” and attaches an earlier version of the Excel data file, which shows Ariely both created the file and was the last person to modify it.
Mazar found two mistakes in the data file. First, the effect observed from the data was the opposite direction from the paper’s hypothesis. When Mazar asked Ariely about this, he wrote that “when preparing the dataset for her, he had changed the condition labels to be more descriptive and in that process had switched the meaning of the conditions, and that [Mazar] should swap the labels back.” Mazar did so, according to Data Colada.
But Charlton noted that if Ariely didn’t touch the data, it wouldn’t have been possible for him to miscode the data.
“You can’t miscode a variable if someone else does all the data work and you didn’t touch it,” he wrote.
The other error was that the Excel formula used to compute the difference in mileage between the two odometer readings was missing in the last two rows of data, which Mazar also corrected.
“It is worth noting that the names of the other three authors—Lisa Shu, Francesca Gino, and Max Bazerman—do not appear on the properties of either Excel file,” according to Data Colada.
Charlton also noted a “logic problem” with the hypothesis that the insurance company itself fabricated the data: “Why on earth would an insurance company fabricate data in such a way as to support Dan Ariely’s hypothesis?”
Bazerman ‘completely convinced’ study contains fraudulent data
Bazerman, a behavioral economist at Harvard Business School and co-author of the original study, wrote in a statement to Data Colada that he was “completely convinced” it contained fraudulent data. He asserted that he raised concerns about the data to a co-author in 2011 and was “assured the data was accurate.” He did not examine the data more carefully nor ask anyone else to, he wrote.
Likewise, Shu wrote on Twitter that she, Bazerman and Gino had never looked at the data and that the collaboration with Ariely and Mazar began from “a place of assumed trust.”
When the randomization failure was discovered, Bazerman expressed concerns to his co-authors, only to be told it had been discussed collectively in 2012. Bazerman had no memory or evidence of the discussion, he wrote.
Bazerman wrote in the statement that PNAS raised the issue of retraction after the 2020 study, and although he and Shu favored retraction, the other three authors did not. Bazerman later requested a retraction of the study in July 2021.
“I wish I had worked harder to identify the data were fraudulent, to ensure rigorous research in a collaborative context, and to promptly retract the 2012 paper,” Bazerman wrote.
The Chronicle reached out to Ariely for comment on the allegations of fraudulent data, the concerns about inconsistencies in his response, the statistical ambiguity in the 2004 study and Bazerman’s comments. He said he "[doesn't] really have much more to add" to what he has already said publicly.
When asked if the Office of Research Integrity has been in contact with Duke about the matter, a representative said that they can "neither admit to nor deny the existence of any ongoing case or any past case in which ORI did not make a research misconduct finding."
The 2012 study
The original study was conducted by an auto insurance company in the southeastern United States under supervision of Ariely; the name of the company was never identified in the study or subsequent follow-ups. The study asked customers to report the current odometer reading of up to four cars covered by their insurance policy. The customers had to sign a statement that read “I promise that the information I am providing is true,” but whether they signed that statement on the top or bottom of the form was assigned randomly.
The data showed that customers who had to sign the statement at the top of the paper reported driving 2,400 more miles—10.3% more—than those who signed the statement at the bottom of the paper.
The anomaly in the 2012 study
Baseline odometer readings across conditions were collected “months, if not many years before” the study participants were assigned to sign the paper at the top or bottom of the form. Before the random assignment, the condition difference was about 15,000 miles; after random assignment, this difference was about 2,400 miles.
The authors of the 2020 paper first proposed that “the randomization failed (or may have even failed to occur as instructed) in that study.”
A team of anonymous researchers then downloaded the data from the original 2012 paper and the attempts of the 2020 team to replicate the Study 3 data. The researchers felt that rather than a randomization failure, there was strong evidence that the data were fabricated.
The findings were published on Data Colada, a website that analyzes data and is managed by Uri Simonsohn at the ESADE Business School in Barcelona, Spain; Leif D. Nelson at the University of California-Berkeley; and Joe Simmons of the University of Pennsylvania.
The analysis by Data Colada and the anonymous researchers concluded that “the data underwent at least two forms of fabrication: (1) many Time 1 data points were duplicated and then slightly altered (using a random number generator) to create additional observations, and (2) all of the Time 2 data were created using a random number generator that capped miles driven, the key dependent variable, at 50,000 miles.”
“Time 1” refers to the baseline odometer reading collected before the study and “Time 2” refers to the reading customers reported on the form. The researchers asserted that the Time 1 data contained multiple “twins,” or pairs of similar data points within 1,000 miles of each other, and that the Time 2 data appeared too uniform to be valid.
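For readers curious how checks of this kind work in principle, the following is a minimal Python sketch. It is not Data Colada's actual code or method, and it does not use the study's data; every number it generates is synthetic and made up purely for illustration. The function names `count_near_pairs` and `bin_counts` are hypothetical. The first function counts readings that sit within 1,000 miles of another reading, and the second tallies miles driven into 10,000-mile bins so that a flat distribution with a hard cap becomes visible.

```python
import random
from collections import Counter


def count_near_pairs(readings, max_gap=1000):
    """Count adjacent pairs (after sorting) whose gap is 1..max_gap miles."""
    ordered = sorted(readings)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if 0 < b - a <= max_gap)


def bin_counts(miles, bin_width=10_000):
    """Count observations per mileage bin, keyed by the bin's lower edge."""
    counts = Counter((m // bin_width) * bin_width for m in miles)
    return dict(sorted(counts.items()))


# Synthetic illustration only: none of these values come from the study.
rng = random.Random(0)

# 100 made-up baseline readings spaced 10,000 miles apart, so none are "twins".
originals = [10_000 * i for i in range(1, 101)]

# Simulate the alleged duplication: copy each reading and add 1 to 1,000 miles.
twins = [x + rng.randint(1, 1000) for x in originals]

print("near pairs without duplication:", count_near_pairs(originals))
print("near pairs with duplication:   ", count_near_pairs(originals + twins))

# Simulate miles driven drawn uniformly between 0 and 49,999 miles.
miles_driven = [rng.randrange(0, 50_000) for _ in range(5_000)]
print("miles driven per 10,000-mile bin:", bin_counts(miles_driven))
```

In the synthetic data, each duplicated reading produces one near pair, and the binned counts come out roughly equal with nothing at or above 50,000 miles. Real mileage would be expected to cluster around typical values and thin out gradually, and real odometer readings can fall close together by chance, so a simple count like this is only a starting point rather than evidence on its own.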
Nadia Bey, Trinity '23, was managing editor for The Chronicle's 117th volume and digital strategy director for Volume 118.
Leah Boyd is a Pratt senior and a social chair of The Chronicle's 118th volume. She was previously editor-in-chief for Volume 117.