DataFest UCLA: Wrestling with big data impacts students long term
ASA DataFest™ 2016 is set to be another great event for students studying statistics, mathematics, computer science, and humanities. There is so much excitement that some students are coming from up to 200 miles away. Founder and Professor Robert Gould will host the event again at the University of California, Los Angeles, and even though he was busy with final preparations for the April 29-May 1 event, he took the time to chat with me about the growing momentum for this event that occurs not only in Southern California, but also around the world. From what Gould has shared, the impact on students doesn’t stop at the end of the event, but is growing and spreading throughout the year.
Q. With the event only a few days away, what do you have planned for it this year?
Gould: Last year was our biggest event ever with 250 students, and we decided that was pretty much our capacity. So we are sticking with the same number this year, in part because it is all we can handle. It takes so many hours to judge each team; they all give five-minute presentations, and then we deliberate. Time dictates that we stick with this number.
This year we have a new data sponsor we are very excited about. As usual, we cannot divulge the name of the sponsor prior to the event. The data set this year is the biggest in size, and the most complex we have had yet. So it will provide a lot of technical challenges for the students both for their coding skills and interpretation skills. We are really looking forward to seeing how that plays out.
Otherwise, this is the first year of some continuity. We have moved into a slightly larger space; it is one of the large banquet conference facilities here on campus. So the students will have a little more elbow room. We will have about the same number of visitors as we had last year, and we are looking forward to seeing them.
So this year is about continuity, and taking stock of what is best about DataFest.
Q. What growth have you seen compared to last year?
Gould: Demand for DataFest is growing. We are welcoming back the institutions outside of UCLA that participated last year. These include Pomona College, University of California Riverside, CalPoly San Luis Obispo, and University of Southern California. Each of these institutions, as well as UCLA, reported strong growth in the number of students participating, even though we have no additional space this year. At UCLA alone, we had 400 students apply.
The competition is tight just to join the competition. This speaks to the fact that we could easily have two DataFests at UCLA, if only we had enough judges. So we are looking to expand in the Southern California area next year, adding an additional site or two to meet the demand.
Furthermore, the American Statistical Association (ASA) has been one of our sponsors, and has been helping with logistical information. We are an official event of the ASA, and this year the event was held at 21 locations around the world, as far away as Germany and Canada, and all over the US as well. I think we will have almost 1,500 students competing somewhere in the country this year.
Q. What type of students should consider participating?
Gould: We work hard to make sure there is something for everyone, so students at all levels can find something that engages them. If they are beginners in statistics and data analysis, they shouldn’t expect to win, but they should expect to have a pretty exciting weekend. It is essential to have some computer skills or have someone on the team with some computer skills, because they really need to know how to read data and do some data manipulation like cleaning data, sorting data, linking files together, and subsetting data to extract just the rows that are useful to them.
It isn’t necessary that every student knows how to do that, but they should have someone on the team with those skills. They should have someone on the team who is a good communicator, who can piece the story together, and communicate clearly and quickly. I was looking over the enrollments this year, and several of the teams have students who are majoring in the humanities, and sometimes that is a really great idea because even if they don’t have strong data analysis skills, they know how to put together a strong presentation. (And of course, someone from the humanities with strong data skills is a killer combination.) So there really is room for students from all areas.
Q. One of the aspects of DataFest that I think is a great idea is involving professionals in the industry to participate and interact with students. Who do you have lined up this year?
Gould: Professionals play a really big role with the students. Like last year, we are asking them to come, go onto the floor, and talk to the teams. The students really appreciate the guidance that experienced data professionals can give them, especially strategic support because students often get so immersed in the minutiae that they need someone to help them pull back and see the big picture.
I don’t call DataFest a recruiting opportunity, but from the visitors’ point of view it is a scouting opportunity. They get to see students working in a stressful situation, in a team environment, and problem solving. We have had a number of students get interviews, internships and jobs from these events.
The final list of visitors is not quite complete, but so far we have people signed up from Google, Education Management Systems, KPMG, Under Armour, Walt Disney, Navigant Consulting, Sony Pictures, and Cedars-Sinai Medical Center. So they are coming from a variety of industries and research centers around the Los Angeles area.
Q. What do you think this type of an event gives students that you cannot replicate in the classroom?
Gould: First of all, it gives them a data set that is more complex than what we can give them in the classroom. There are two reasons: The first reason is that the data is so complex that it takes a good 30-40 hours of work with a team to figure out what to do with it. Thirty to 40 hours is about the limit of contact hours across the quarter or term, so a problem of that size is hard to fit into a class in a meaningful way. The second reason is we work hard to get data from our data provider, and it takes a good eight months of work, mostly on their end, not just cleaning the data but also working with their lawyers to ensure that it is something they are willing to share. There aren’t that many businesses willing to put in that amount of work for a single classroom experience. But they are willing to do it for an event of this size; it brings so many good students to the problem, all at the same time.
We are also able to give them a more open-ended problem. It isn’t like a classroom assignment that says answer this question. It is an open-ended problem that says here is some data, here are the sorts of things the data sponsors want to know, now go and frame the problem. It is a complete modeling exercise where they have to frame the problem in a way that is meaningful for the data sponsor. So it is a very difficult experience to bring into a classroom.
Q. I know I asked you this last year, but I still like the question. What gives you the greatest amount of satisfaction?
Gould: Oh gosh, I guess the greatest amount of satisfaction is stepping onto the main floor, during the peak hours, and seeing 250 students buzzing. They are so deeply engaged in a problem, and it is at a level that we don’t get to see in our classrooms. I think in the classroom they are often trying to please their professor; most of the time they have one eye on the professor trying to figure out what it is he or she really wants them to do. But here they are really trying to please themselves. They understand this is a problem that they alone are tackling and no one really knows the answer to. They take complete ownership and responsibility for it, and they come alive. They come up with solutions that are far beyond what happens in the classroom. I don’t know if it is just the competition that brings this out, but it is really exciting to see.
Another thing we have started to see (and organizers from other campuses are seeing the same thing) is that it is really starting to up the game across the campus. Students are coming to the classroom with a higher level of engagement. They are organizing their own workshops and seminars to learn things that they feel are important. It has created this community of learning on campus where everyone is deeply engaged and learning as much as they can throughout the year. So it has had this wonderful spill-over effect. It isn’t just about this one weekend; it has created a heightened level of activity and attention throughout the whole year.
Q. How can people get involved to help?
Gould: We created a website about the event, and have provided a page, “Support DataFest,” where people can see the different opportunities. One thing we would really like, if they are in the area, is for them to visit. Another thing is financial support, and there is a place for people to give. A third thing is that students love swag; they love to get t-shirts or mugs, or things like that. They also love books, especially ones that might help them with the problem they are working on at the moment. And, last, we need snacks and such. The website shows how those things can be sent along.
The major expense is renting a hall, the security, the AV, and all the sorts of stuff involved in keeping a building open for 48 hours. We had a very substantial donation from the Office of Residential Life, which donated the use of the space. But looking into the future, we have to renew this grant with them every year, so we don’t know how long we will be able to do it. This is something we are looking for sponsors for in the future. But that still leaves the cost of food. We feed the 250 students six meals during the course of the event, plus all of the snacks, and caffeine, and such to keep them going.
About Robert Gould
Rob Gould, Ph.D., is a lecturer with security of employment, vice-chair of undergraduate studies, and director of the Center for Teaching of Statistics at UCLA. He’s been involved in statistics education since the late 1990s, and is interested in the role of technology in teaching statistics. He founded the e-journal Technology Innovations in Statistics Education, and is a co-founder of and regular blogger on the Citizen Statistician blog.
Rob earned a Ph.D. in Mathematics at the University of California, San Diego, in 1987. He earned a Bachelor of Science in Applied Mathematics at Harvey Mudd College. He has written articles that were published in International Statistical Review and Statistics Education Research Journal, and has published several textbooks: