Students compete in first annual Duke DataFest
While crowds of students flocked to Old Duke Friday afternoon, 23 students gathered in the Old Chemistry Building for a 48-hour statistical science competition.
DataFest, a team competition in which students analyze a complex data set in a limited period of time, was hosted at Duke for the first time this weekend. The participants used real data from Kiva, a nonprofit organization that connects lenders with people looking for microloans around the world, to make recommendations that would be relevant to a Kiva client.
“They learned so much about analyzing data, which as a statistician is important to me,” said Mine Cetinkaya-Rundel, the organizer of the event and assistant professor of the practice of statistical science. “But more importantly, they also learned so much about context and how to come to useful conclusions.”
Before the competition, students met their client, Noah Balmer, a software engineer at Kiva, via a Skype call. He said Kiva’s goal is to give lenders clear insight into how their money is spent.
“Unlike a lot of charities that do not give insight into how the money they receive is used, we like a lot of insight into how the money is used,” Balmer said. “To do this, a couple of years ago, we made a public database of our information that can be accessed by anyone.”
Students used this public information, which provided data on 50,000 lenders and their loans, Cetinkaya-Rundel said. Teams faced similar issues when analyzing the data, such as how to maximize each group member’s contribution.
“The way they decided to resolve their problems is to have each person on a team work with whatever they are most comfortable with,” she said. “They needed to parse their problem into pieces so that everyone did what they were best at.”
The teams took different approaches to analyzing the data. The DataCrunchers chose to look at how important social impact and overall risk are to a lender’s loan. Team Dendenwins created a naive Bayes classifier to identify words that people seeking a loan should include in their profiles to increase their probability of receiving one.
Dennis Zhan, a freshman on team Ducks, said it was difficult working with data that was not guaranteed to yield any particular result.
“We are using this data where you don’t know if there are any correlations,” he said. “Most of the time you are looking for something, but since we defined the question, we don’t even know what we would find.”
There were three categories in which students could place first: best recommendations or insight, best visualization and best use of outside data. The Statisto-nots won best use of outside data, team Icepack won best visualization and team DataCrunchers won best insight.
Only five of the initial 10 registered teams showed up to compete, Cetinkaya-Rundel said, adding that some teams dropped out due to a demanding workload.
Niel Lebeck, a sophomore on team DataCrunchers, found the opportunity to work with real data rewarding.
“This competition lines up with the stuff that I am interested in,” he said.
Cetinkaya-Rundel said she was pleased with the overall turnout for the competition, but added that she may move it to the fall to attract more students.
“This happens to be a busy weekend for students,” she said. “Still, I was really surprised by the amount of effort they put into everything. If we had five awards, I would have given them all out.”