April 13, 2017

Student-led course sparks interest in data science

Amit Mizrahi/Provided
Students attend the Cornell Data Science Training Program in Gates G01.

Walk into Gates Hall G01 on Wednesdays at 5 p.m. and you’ll see a crowd of students gathered in their seats. A lecturer at the front of the room runs through carefully prepared slides, describing machine-learning algorithms and data-processing techniques. Students take notes and ask questions, which are answered in detail on the chalkboard. This scene looks and sounds like a typical Cornell lecture, aside from one detail: All of the material is prepared and taught by students.

The Cornell Data Science Training Program is a one-credit unofficial course offered through the Cornell Data Science project team. The 12-week course focuses on manipulating data, visualizing trends and implementing machine-learning algorithms. Students are not expected to have any programming experience; they are taught the R programming language (used for statistical analysis and data mining) to supplement the concepts they encounter. They then use their new skills to complete four assignments and two real-world data science projects.

Cornell Data Science has been selected as co-winner of the 2017 College of Engineering Alumni Association's Albert R. George Student Team Award (the other winner is Engineers Without Borders). The award includes $1,000 for the team to spend on initiatives like the training program.

The course materials are the work of Dae Won Kim ’17, Amit Mizrahi ’19, Chase Thomas ’19, Kenta Takatsu ’19 and Jared Lim ’20. The five noticed a demand in the undergraduate community for a practical, hands-on introduction to data science. The group hopes to equip students with the tools and knowledge needed to take on their own projects.

“The Data Science Training Program can be more sensitive to trending industry standards because we’re not bound by the same constraints as college courses, and we are positioned to provide a more industry-sensitive introduction to data science,” Kim said.

The course is aimed at freshmen and sophomores representing a variety of majors and interests. “Everything today from marketing, to health care, to finance, to agriculture uses data science,” Thomas said. “There’s a lot of buzz around data science right now; we saw the course as an opportunity to provide an introduction accessible to people from all backgrounds.”

Over winter break, the group collaborated to create lecture slides, notes, assignments and projects from scratch. The team members drew on past experiences, such as working in data science internships and taking online courses from peer institutions like the Massachusetts Institute of Technology and the University of California, Berkeley.

“With the widespread growth of big data tools, there has been no better time to learn data science,” Mizrahi said. “It’s exciting to be able to help teach a field that hardly existed 15 or 20 years ago.”

The course will continue through May and will be offered again next semester.

Leslie Morris is director of communications for Computing and Information Science.