DataFest: Finding and Communicating the Meaning in Data


“Big Data.” We hear this term used more often than just “data” these days. Thanks to the internet, organizations around the globe are gathering more information than anyone thought possible, even at the turn of the century. With so much data, problem-solving and reasoning are critical skills students need to develop, and the Organisation for Economic Co-operation and Development measures them globally, as shown in its Programme for International Student Assessment report. So how are higher education institutions meeting this challenge? Many are exploring different ideas, but one idea gaining in popularity is modeled after hackathons.

ASA DataFest began at UCLA in 2011 and provided a forum for students to spend an intense 48 hours engaged in a friendly competition to find and communicate meaning in a large, complex data set. Since then, ASA DataFest has become a national event, held annually at seven sites around the country, including colleges and universities. For the fifth year in a row, students from across the country will gather for ASA DataFest @ UCLA, April 24-26, 2015. For two days, students, working in teams of two to five members, will dive deeply into a massive amount of data, looking for meaning. Most importantly, they will work to communicate this meaning to a panel of expert judges in under five minutes! Robert Gould, undergraduate vice-chair of the Dept. of Statistics and director of the Center for the Teaching of Statistics at UCLA, is the creator of DataFest, and has taken some time to talk with us about the event.

Q. I understand that the goal of this hackathon is to help students learn how to work with large amounts of data. But what are some other things you want students to learn from this experience?

First, I guess I should say that the term “hackathon” isn’t quite right. DataFest shares many similarities with hackathons: a large group of people are united to work on a large-scale problem in a very short, intense time. But at DataFest, the end goal isn’t to produce code, although most teams will do quite a bit of coding. Instead, it is to produce a presentation that teaches the judges something they didn’t know about the data, and does so elegantly and creatively.

I hope that students learn to work well in a group. The time pressure is intense, and collaborating well with your teammates is essential for DataFest and is also a valued marketplace skill. I also hope they learn that they can teach themselves things they didn’t know. Students come to DataFest at different levels; some are first-year students, some are seniors, and some haven’t even studied statistics! Even those with lots of statistical know-how will find that they need even more to do what they want with the data. While we have roving experts on the floor to assist teams, the students who are successful will teach themselves new methods and new computational skills.

Finally, I hope they learn to communicate well. While it’s called “DataFest”, and while data are front-and-center, this event is just as much about communication. Students have to communicate with their teammates, they have to communicate with the experts to ask for help, and, of course, they have to communicate clearly to the judges.

Q. You invited data professionals from the Los Angeles area to help the students during the event. What do you think their presence will offer students?

Their presence is the one thing students cite each year as the most important aspect of DataFest. These professionals, who include engineers, computer scientists, statisticians, social scientists, and faculty from research institutes, universities, and colleges, give students a different perspective beyond the classroom. In the classroom, they interact with their instructor in a student/teacher role. Here, they interact more like peers. The data professionals give them the perspective of the workplace, and also teach them new tools.

And we can’t overlook the fact that DataFest is a great recruiting opportunity for these data professionals. Where else can you observe potential employees working under time pressure on a novel and complex problem in a team? Visitors get a sense not just of the students’ technical skills but also of their personalities.

Q. Why did you decide to create this event as a hackathon? What I mean is hackathons involve exploring problems and offering solutions within a short period of time. Why is working with data in this type of situation important?

I’m not sure that it’s a good idea to tackle serious problems when you’re sleep deprived, but it is a lot of fun. And fun, or more precisely, play, is a really important part of learning that I think gets left out of the classroom too often. DataFest is low stakes. It doesn’t show up on your transcript, and simply competing teaches you a lot and looks impressive on a resume. And so you’re free to try things out that you wouldn’t usually try. The time limit helps you focus, and gives you permission to set aside some of the other things you might have going on so that you can focus on this, knowing that by Sunday afternoon, it will be over.

The other reason is that lots of the best data are not publicly available. Sometimes, one of us might have a consulting client who allows us to use some subset of their data, but getting together a dataset of this size and scope requires lots of work on the part of the company that gives us their data. We’ve been really fortunate to work with organizations and companies that took DataFest very seriously, and devoted many hours to assembling the data, verifying that no confidentiality is breached, and sending representatives to the events around the country to teach the students what the data are about. The companies and organizations that have given us data to date, notably the Los Angeles Police Department, eHarmony, GridPoint, and this year’s provider (which is still a secret until the Big Reveal!), did so only because we promised that the data would be used only at this event, and then deleted. Even so, lots of time was spent making sure that, say, no one’s eHarmony profile was identified or any other client information was compromised. There’s no way we could do this in the classroom; it’s just too hard and takes too much time.

Q. Why do you encourage students to find team members with a mix of different skills?

What’s the old saying? If all you have is a hammer, then every problem looks like a nail. If your team has lots of tools, you can tackle problems from a variety of perspectives and reach a deeper understanding.

Q. What gives you the greatest amount of satisfaction from the event?

Watching the enthusiasm of the students. Also, I’m always really, really impressed with the quality of the work that the students produce. They do things that amaze me, and I have no idea where they learned to do it!

Q. Is there anything else you would like to add?

Is it fair to make a plea for support? Part of the success of DataFest is the food. We feed them six meals and give them a mountain of snacks to motivate and encourage them. Food is important because it makes this more like a party, and it gives students a chance to unwind a bit and to talk to other teams, some of whom come from other universities and colleges, and perhaps learn to think a little bit differently.

Feeding almost 300 people six meals is quite expensive. This year, we’re estimating that about 700 students will participate in ASA DataFest events around the country. If you belong to a company that wants to support DataFest, please consider contributing to your local DataFest site. Speaking for our own event, there’s just no way it would happen if we didn’t have the patronage of a dozen or so very generous sponsors.

One more thing: you can learn more about DataFest at the official national site. And you can learn about UCLA’s event.

Thanks for the opportunity to talk to you!


About Robert Gould

Rob Gould, Ph.D., is a lecturer, vice-chair of undergraduate studies, and director of the Center for the Teaching of Statistics at UCLA. He’s been involved in statistics education since the late 1990s, and is interested in the role of technology in teaching statistics. He founded the e-journal Technology Innovations in Statistics Education, and is a co-founder of and regular blogger on the Citizen Statistician blog.

Rob earned a Ph.D. in Mathematics at the University of California, San Diego, in 1987. He earned a Bachelor of Science in Applied Mathematics at Harvey Mudd College. He has written articles that were published in International Statistical Review and Statistics Education Research Journal, and he published an introductory statistics textbook with Colleen Ryan: Exploring the World Through Data (Pearson, 2012).