Capstone Projects

Education is one of the pillars of the Data Science Institute.

Through educational activities, we strive to create a community in Data Science at Columbia. The capstone project is one of the most lauded elements of our MS in Data Science program. As a final step during their study at Columbia, our MS students work on a project sponsored by a DSI industry affiliate or a faculty member over the course of a semester.

Faculty-Sponsored Capstone Projects

A DSI faculty member proposes a research project and advises a team of students working on it. This is a great way to run a research project with enthusiastic students who are eager to try out their newly acquired data science skills in a research setting. It is an especially good opportunity for developing and accelerating interdisciplinary collaboration.

2023-2024 Academic Year: July 15, 2023 via this form

Project Archive

  • Spring 2022
  • Spring 2020
  • Spring 2019
  • Spring 2018
  • Spring 2016

Data Science


DS 401: Capstone Projects

Selected projects from 2022:

P1: Public Health - Overdose Data Dashboard  

P2: USDA Commodity Dashboard(s)

P3: Classification and Analysis Pipeline of Political Video Advertisements - Dashboard with Google Cloud

P4: CSAFE - Assessing and Modeling Quality of 3D Topographic Scans of Fired Bullets

P5: Department of Residence - The Impact of Living on Campus on Student Success

The Public Science Collaborative at ISU is looking for advanced data science students to join a research project focusing on the engineering and visualization of public health data. The key task for the spring semester will be to build an opioid overdose data dashboard similar to this one in California. DS 401 interns will work in a supervised, collaborative team science environment to clean, analyze, and visualize data from four data sets: a) vital statistics mortality data, b) emergency department overdose data, c) substance abuse treatment episodes data, and d) the Iowa Youth Survey dataset. We are a pluralistic coding environment and welcome students using Python, R, Stata, SAS, and other data management and analytic platforms.

Because this project is funded by an Overdose Data to Action grant from the Centers for Disease Control, students who are accepted to the project will have the opportunity to pair a funded research assistantship with their DS 401 internship. This opportunity would be an especially good fit for students who are interested in data visualization and data science communication.

Project advisors: Shawn F. Dorius - Associate Professor of Sociology


Students selecting this project will develop a series of dashboards using Tableau. These dashboards will use data from the USDA Agricultural Census to show trends in production for selected commodities (such as apples, cheese, grapes, dairy, pork, lettuce, tomatoes, potatoes, strawberries, bees, and honey or wine). Trends may also include the monthly or annual quantity, the number of producers, acres in production, total sales, and other metrics at multiple geographies (county, state, nation). Students will also incorporate demographic data for selected areas of interest that highlight the potential regional market and the market and consumption profile (food expenditures, farmers' market density, schools with farm-to-school programs, etc.). Students will be provided with access to Tableau and Tableau Server and will use R’s tidycensus package to acquire data from the American Community Survey (ACS).

Project advisors:

Christopher J. Seeger, PLA, GISP - Professor, GIS Specialist and Director of Extension Indicators Program and 2022 DSPG Chair

Bailey Hanson, GISP - GIS Specialist; leads the GIS program and the Data for Decision Maker program. Her background includes a master's degree in Human-Computer Interaction.


Using public data from the Google Transparency Report, this project will create a pipeline for extracting, processing, classifying, and visualizing political ads data on the Google Cloud computing platform.

Campaign advertising through social media platforms has been growing at a high rate, which creates a large volume of content on the Internet. To increase transparency in federal campaign advertising, Google Inc. created the Google Transparency Report (GTR). GTR provides websites and searchable databases about federal election campaign ads aired on Google and partners’ platforms. According to GTR, political advertisers have spent around $800M on election campaigns since May 2018.

This project built a platform that collects video ads aired on YouTube and analyzes their content automatically. It is able to 1) classify a video ad as either political or non-political, 2) classify each predicted political ad into one of the types of interest to political science scholars: promote, attack, or contrast, 3) extract issues of interest for political science research, 4) determine the polarity and subjectivity of a given ad, and 5) create visualization charts from the preceding analysis.
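The first three stages of such a pipeline can be sketched in a few lines of Python. This is only an illustration of how the stages chain together, not the project's actual implementation: the keyword sets, the `AdAnalysis` record, and the `analyze_ad` function are all hypothetical, and simple keyword matching stands in for the trained classifiers a real system would need. Polarity, subjectivity, and visualization are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical keyword lists standing in for trained classifiers.
POLITICAL_TERMS = {"vote", "election", "senate", "congress", "candidate"}
ATTACK_TERMS = {"failed", "corrupt", "wrong"}
PROMOTE_TERMS = {"proud", "delivered", "support"}
ISSUE_TERMS = {"healthcare", "taxes", "immigration", "jobs"}


@dataclass
class AdAnalysis:
    """Result of running one ad transcript through the sketch pipeline."""
    is_political: bool
    ad_type: Optional[str] = None  # "promote", "attack", "contrast", or None
    issues: List[str] = field(default_factory=list)


def tokenize(transcript: str) -> set:
    """Lowercase the transcript and strip basic punctuation from each word."""
    return {word.strip(".,!?;:").lower() for word in transcript.split()}


def analyze_ad(transcript: str) -> AdAnalysis:
    words = tokenize(transcript)

    # Stage 1: political vs. non-political.
    if not words & POLITICAL_TERMS:
        return AdAnalysis(is_political=False)

    # Stage 2: promote, attack, or contrast (both tones present).
    attacks = bool(words & ATTACK_TERMS)
    promotes = bool(words & PROMOTE_TERMS)
    if attacks and promotes:
        ad_type = "contrast"
    elif attacks:
        ad_type = "attack"
    elif promotes:
        ad_type = "promote"
    else:
        ad_type = None  # no tone keywords matched

    # Stage 3: issues mentioned in the ad, in alphabetical order.
    issues = sorted(words & ISSUE_TERMS)
    return AdAnalysis(is_political=True, ad_type=ad_type, issues=issues)
```

For example, `analyze_ad("Vote for Smith. She delivered on jobs and healthcare.")` yields a political ad of type `"promote"` with issues `["healthcare", "jobs"]`. Non-political ads exit at the first stage, so the later stages only run on ads predicted to be political, as in the description above.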

Adisak Sukul - Associate Teaching Professor, Computer Science; instructor for Data Science courses; Google Cloud Faculty Expert


A large part of a forensic examiner’s job is to visually compare evidence to decide whether two pieces of evidence come from the same source (e.g. bullets fired from the same barrel, prints from the same shoe, the same finger).

3D digital microscopy provides a basis for algorithms that attempt to make comparisons of evidence objective and to quantify similarities (or dissimilarities). The high-resolution microscopy lab at Iowa State has acquired scans of bullet lands.

Good-quality scans are essential for assessing the similarity of the striations (the marks engraved on the bullet as it passes through the barrel). 

In this project, the goal is to derive features capturing (aspects of) the quality of scans and build a model to predict a quality indicator. Ideally, this feedback will be given at the time of scanning, such that a lack of quality can be addressed immediately.

Students will work under the guidance of Dr. Heike Hofmann to derive features capturing scan quality, work on a model incorporating these scan analytics, and depending on time, design an app for giving feedback to scanning personnel.

Preferred skills: proficiency in R; knowledge of HTML/JavaScript would be a plus.

Heike Hofmann, Professor, and Professor in Charge of the Data Science Program - Department of Statistics

Final R Package:


Project Description: The Department of Residence is interested in understanding how living on campus, both in the first year and in subsequent years, affects student success measures such as graduation and retention. We’re also looking to understand whether those impacts are the same or different for different subgroups of students (such as students of color, first-generation students, etc.). The audience for this analysis is non-technical, with a limited background in understanding and analyzing data. The data file is already compiled and will be provided to this team. There is no preference for analysis software.

This project contains sensitive and private information. All of the findings from this project will remain private.
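As a rough illustration of the kind of subgroup comparison described above, the sketch below computes retention rates by subgroup and housing status, then the on-campus versus off-campus gap within each subgroup. The records are invented for the example and do not reflect any project data or findings; the function names and the "other" subgroup label are likewise hypothetical. A raw gap like this is descriptive only and does not by itself show that living on campus causes the difference.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only:
# (subgroup, lived_on_campus, retained_next_year)
RECORDS = [
    ("first-generation", True, True),
    ("first-generation", True, True),
    ("first-generation", True, True),
    ("first-generation", True, False),
    ("first-generation", False, True),
    ("first-generation", False, False),
    ("other", True, True),
    ("other", True, True),
    ("other", False, True),
    ("other", False, False),
]


def retention_rates(records):
    """Return {(subgroup, lived_on_campus): share of students retained}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [retained, total]
    for subgroup, on_campus, retained in records:
        counts[(subgroup, on_campus)][0] += int(retained)
        counts[(subgroup, on_campus)][1] += 1
    return {key: kept / total for key, (kept, total) in counts.items()}


def on_campus_gap(rates):
    """Return {subgroup: on-campus rate minus off-campus rate}.

    Subgroups missing either housing status are skipped.
    """
    subgroups = sorted({subgroup for subgroup, _ in rates})
    return {
        s: rates[(s, True)] - rates[(s, False)]
        for s in subgroups
        if (s, True) in rates and (s, False) in rates
    }


rates = retention_rates(RECORDS)
gaps = on_campus_gap(rates)
```

Reporting one gap per subgroup keeps the result readable for a non-technical audience: each number answers "how much higher (or lower) is retention for on-campus students in this group?"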

Dr. Elizabeth Housholder serves as the Senior Research Analyst for the Department of Residence.


Capstone Projects

The culminating experience in the Master’s in Applied Data Science program is a Capstone Project where you’ll put your knowledge and skills into practice. You will immerse yourself in a real business problem and gain valuable, data-driven insights using authentic data. Together with project sponsors, you will develop a data science solution to address organizational problems, enhance analytics capabilities, and expand talent pools and employment opportunities. Leveraging the university’s rich research portfolio, you also have the option to join a research-focused team.

Selected Capstone Projects

  • COPD Readmission and Cost Reduction Assessment
  • An NFL Ticket Pricing Study: Optimizing Revenue Using Variable and Dynamic Pricing Methods
  • Using Image Recognition to Identify Yoga Poses
  • Using Image Recognition to Measure the Speed of a Pitch
  • Real-Time Credit Card Fraud Detection

Interested in Becoming a Capstone Sponsor?

The Master’s in Applied Data Science program accepts projects year-round for placement at the beginning of every quarter, with the Spring quarter being the largest cohort. All projects must be submitted no later than one month prior to the beginning of the preferred starting quarter based on the UChicago academic calendar.

Capstone Sponsor Incentives

Sponsors derive measurable benefits from this unique opportunity to support higher education. Partner organizations propose real-world problems, untested ideas, or research queries. Students review them from the perspective of data scientists trained to generate actionable insights that provide long-term value. Through the project, Capstone partners gain access to a pool of world-class students, highly accomplished instructors, and cited researchers who apply modern data science methods to the sponsor's own data. For many sponsors, the project also becomes a meaningful source of recruitment from the pool of students who work on it.

Capstone Sponsor Obligations

While there is no monetary cost or contract necessary to sponsor a project, we do consider this a partnership. Teams are composed of four students and guided by an instructor and a subject matter expert. Each team receives expectations from the capstone sponsor, along with learning objectives, assignments, and evaluation requirements from instructors. In turn, Capstone partners should be prepared to provide the following:

  • A detailed problem statement with a description of the data and expected results
  • Two or more points of contact
  • Access to data relevant to the project by the first week of the applicable quarter
  • Engagement through regular meetings (typically bi-weekly) while classes are in session
  • If requested, a non-disclosure agreement that may be completed by the student team

Interested in Becoming a Capstone or Industry Research Partner?

Get in touch with us to submit your idea for a collaboration or ask us questions about how the partnership process works.


