More than 200 students scrutinized, analyzed and visualized data during this weekend’s DataFest.
The event, the fourth annual DataFest held at Duke, drew teams from Duke and surrounding universities. Participants were given data from a major consumer shopping company and had 42 hours to analyze it and develop interesting ways of presenting their results. Teams that found valuable insights in the data, successfully integrated outside sources and created engaging visualizations won final prizes. The event was the first in a series of DataFests being held at universities across the country this spring.
“The purpose is to get exposure to a large and real dataset,” said Mine Çetinkaya-Rundel, assistant professor of the practice of statistics and one of the organizers of the event. “It’s also to give them quite a bit of liberty in what to do with it.”
Students in teams of two to five worked with up to a million rows of consumer website interaction data from a major shopping company that connects buyers to sellers. Rather than giving students specific objectives, organizers encouraged groups to think outside the box about what they could do with the dataset.
“It’s interesting trying to run an analysis without the very guided question that you usually receive on work and tests,” said junior Tori Hall, a member of the Bayes Anatomy team.
While many of the participants were undergraduates at Duke, the event was open to master's students as well as students from other universities. Teams came from the University of North Carolina at Chapel Hill, North Carolina State University and as far away as the University of Michigan.
Participants were majors in subjects including statistics, computer science, engineering and the natural sciences. Çetinkaya-Rundel said the dataset was chosen so that students with any level of statistics knowledge would be able to participate.
“The dataset is such that someone with only one semester of statistics could take a small random sample and still provide some interesting insights,” she explained.
The data was taken directly from the company’s databases. Several teams said they learned a lot from working with real-world information, but also said that the dataset presented a wide range of challenges.
“There’s so many issues that we run into from just the dataset itself,” said Heather Shapiro, a senior. “It’s good experience for interviews and stuff.”
Anurag Sodhi, a first-year master's student in engineering management, noted that the dataset was larger than most others he had worked with.
“We might require another week to get the whole feel of what’s inside it,” he said.
Engineers and analysts from industry, as well as professors, were on site to advise teams. Brittany Cohen, Trinity ’14 and quality assurance engineer with Applied Predictive Technologies, said she had enjoyed participating in DataFest during her time at Duke and was interested in helping current students.
“[The teams] are telling me what they’re trying to do and bouncing ideas off of me,” she said.
The team that won the Best Use of Outside Data award combined data on political ideology with transaction data from the provided dataset. The Bayes Anatomy team, which won the Best Visualization award, created an alluvial flow chart and a network graph using the provided data. Winning teams in each category received books, certificates and medals.
The winners emphasized that they enjoyed being able to engage with real-world data. The Bayes Anatomy team added that the weekend was “stressful but rewarding.”
“It was just a massive learning experience,” said team member David Clancy, a junior.