Stanford - Department of Biomedical Data Science

Biomedical Data Science Graduate Program Overview

The Biomedical Data Science Training Program is an interdisciplinary graduate and postdoctoral training program, part of the Department of Biomedical Data Science at Stanford University’s School of Medicine.

Our Mission


The mission of DBDS is to train the next generation of research leaders in Biomedical Data Science. Our students gain knowledge of the scholarly informatics literature and of the application requirements of specific areas within biology and medicine. They learn to design and implement novel methods that generalize to a defined class of problems, focusing on the acquisition, representation, retrieval, and analysis of biomedical information. We also require training in the ethical, social, and legal issues and consequences of research. We seek to attract diverse candidates from all backgrounds and experiences.

What is Biomedical Data Science?

Biomedical Data Science is a broad term comprising multiple areas.

  • Bioinformatics develops novel methods for problems in basic biology.
  • Translational Bioinformatics moves developments in our understanding of disease from basic research to clinical care.
  • Clinical Informatics develops methods and tools directly applied to patient care.
  • Public Health Informatics works on challenging problems from health systems and populations.
  • Imaging Informatics addresses intelligent management, interpretation, and annotation of biomedical images.

Take a look at our current courses. 

Our Graduate Degrees

The graduate training program offers the PhD degree and three MS degrees: an academic, research-oriented degree; a professional distance-learning master’s for part-time students; and a coterminal degree for Stanford undergraduates. We also host postdoctoral fellows and offer a distance-learning certificate.

  • Prerequisites. For a graduate degree, the University requires the applicant to have a bachelor’s degree. We do not require any particular major, but we do require strong undergraduate preparation in computer science/software engineering, mathematics (especially calculus, probability and statistics, and linear algebra), and college-level biology. Applicants with limited backgrounds in these areas should address these gaps before applying to our program.
  • Curriculum. MS and PhD candidates take coursework in four areas: (1) core DBDS classes; (2) an individual plan with electives in computer science, statistics, mathematics, engineering, and allied informatics-related disciplines; (3) required coursework in social, legal, and ethical issues; and (4) unrestricted electives. In addition, PhD candidates are required to choose electives in some area of biology or medicine. Degree candidates also develop teaching skills by serving as teaching assistants in our core courses.
  • Funding. We have been continuously funded since 1984 by a training grant from the National Library of Medicine, which provides fellowship support for students who are US citizens or permanent residents. International students bring outside funding or compete for Stanford Graduate Fellowships. Senior graduate students typically receive funding through their research supervisor.

The History of Our Graduate Program

History at Stanford

The Biomedical Data Science Graduate Program has a long history both at Stanford and internationally, as the first program of its kind. The degree program was initiated in October 1982 as Medical Information Sciences (MIS) and continues to emphasize interdisciplinary education spanning medicine, computer science, and statistics, offering predoctoral degrees as well as postdoctoral training. The DBDS Program has been supported by a training grant from the National Library of Medicine since 1984. The grant initially funded only postdoctoral trainees but was broadened to include predoctoral trainees in 1987. The NLM training grant has been renewed every five years since and has provided tuition and stipend support for hundreds of trainees.

Today, the Biomedical Data Science Graduate Program sits in the newly formed Department of Biomedical Data Science and emphasizes methods development and application across the entire spectrum of biology, medicine, and human health.

A Foundation in Medicine and Computer Science

The interaction between Computer Science and other disciplines has produced vibrant areas of research and education at Stanford since the late 1960s; computing activities in the School of Medicine were stimulated even earlier, principally by the Chair of Genetics, Nobel Laureate Joshua Lederberg. Professor Lederberg collaborated with Professor Carl Djerassi (Chemistry) and Professor Edward Feigenbaum (Computer Science) to create what is arguably the first research program to apply the nascent field of artificial intelligence to biomedical problems. Their Dendral system, which modeled the expertise of mass spectroscopists who could interpret an organic compound’s mass spectrum to infer its chemical structure, is considered the first expert system.

Professor Lederberg’s second key effort was to attract NIH funding for a large, medically focused shared computer for the medical school. This computer, known as ACME, was heavily used by Stanford medical researchers, educators, and students until 1973. It brought a computing culture into the medical school, which in turn began to attract medical students interested in the intersection of the two fields. ACME later gave way to the SUMEX-AIM computer, also funded by NIH with Lederberg as PI. This resource was the first biomedically focused machine on the ARPANET, which evolved into today’s Internet. The SUMEX computer was a key resource at Stanford for almost 20 years.

Working closely with Stanley Cohen (a Professor of Medicine who later succeeded Lederberg as Chair of Genetics) and Bruce Buchanan (a research scientist in computer science who was a member of the Dendral Project), Edward Shortliffe undertook a combined MD/PhD, pursuing the doctoral degree in a self-designed interdisciplinary program. Subsequent discussions with faculty, students, and researchers underscored both the interest in and the need for formalizing this kind of interdisciplinary education, leading directly to the formation of the MIS graduate program.

The Human Genome Project and a Turn at the Turn of the Century

The launch of the Human Genome Project in 1990 and its completion in 2003 created substantial interest in, and need for, computing in the biological community. In 2000, Dr. Russ B. Altman succeeded Dr. Shortliffe as Director of the MIS Program. In recognition of a new mission that extended beyond clinical informatics to fundamental issues of biomedical knowledge, its representation, and its application, the program was renamed Biomedical Informatics (BMI). The term Biomedical Informatics represents not only the continued development of medical information systems but also the use of sophisticated computation to study medicine at the molecular, cellular, organismal, and population levels.

Biomedical Data Science Today

On September 1, 2023, the Biomedical Informatics (BMI) training program finalized its last step in merging with the Department of Biomedical Data Science (DBDS) and formally changed its name to the Biomedical Data Science Training Program.

Our trainees admitted after September 1, 2023 will earn their Master’s and PhD degrees in Biomedical Data Science.

The missions of our department and of the training program remain fully aligned: to “advance precision health by leveraging large, complex, multi-scale real-world data through the development and implementation of novel analytical tools and methods.” Aligning the name of the degree program with the department name was widely regarded as both logical and appropriate. More importantly, it reflects a shared vision in our research and education missions that brings our integrated work in biomedical informatics, biostatistics, and AI/ML under a unified interdisciplinary umbrella.

The DBDS Training Program at Stanford continues to evolve to meet the needs of biomedical computation and application. Under the guidance of Professor Sylvia Plevritis, Director since 2018 and Chair of the Department of Biomedical Data Science, and with support from NLM, the DBDS Program continues to innovate in Healthcare and Clinical Informatics, Translational Bioinformatics, and Clinical Research Informatics. In addition to historical research thrusts in biomedical knowledge representation and the genetic basis of disease, current research explores algorithms for real-world biomedical data, multimodal data and meta-analysis, medical image analysis, responsible clinical decision-making, reproducibility, methods for efficient querying of and access to big biomedical data, and more.

Gifts and Donations to Stanford DBDS

DBDS has benefited greatly from the financial generosity of alumni and other donors. We have two funds that support student activities:

  • Biomedical Data Science Gift Fund. Donations from alumni support pizza lunches, dinners during grant-writing sessions, turkey and trimmings for the annual holiday party and DBDS logo items for recruits and students. Additionally, the fund provides books for graduating student representatives and gift cards for poster award winners at the annual retreat.
  • Darlene Vian Memorial Fund for Student Support. This endowed fund was established in memory of beloved student services administrator, mentor, and confidant Darlene P. Vian, who passed away in 2011. In May 2016, the fund was endowed with an initial $100,000 in gifts from Darlene’s many alumni, colleagues, friends, and relatives. The Vian Fund supports student activities that Darlene knew were key to a cohesive program and career success, such as the annual offsite retreat, travel expenses for conferences, student recruiting receptions, barbecues, TGI parties, and alumni dinners. It may also be used to support student research projects and tuition. Donors at the $1,000+ level are acknowledged on a perpetual memorial plaque in the Medical School Office Building.

Please contact our Student Services Team if you would like information about how to make a tax-deductible contribution to either of these funds.

Prospective students interested in careers in Biomedical Data Science should review the list of our alumni and their current jobs in the People Directory.

If you have a job posting that you would like to send to the DBDS students and post-docs, please email it to dbds-job-openings at lists.stanford.edu for distribution as we deem appropriate for our audience.

DBDS Current Students and Alumni

The School of Medicine Career Center offers resources for professional and leadership development, as well as job-search resources ranging from presentation skills, resume preparation, and interview skills to job-search strategy. It also hosts a seminar series with speakers from industry and academia, and a number of industry events such as demos, job fairs, and industry mixers.

The University’s Career Development Center supports undergraduate and graduate career development and hosts Career Fairs.

To add your name to the DBDS jobs email list, send your request to the DBDS student services team.

External Job Listings in Biomedical Data Science

  • AMIA Job Exchange
  • BayBio’s Job Sites list
  • BioCareer’s Job site
  • Bioinformatics.org’s Jobs site
  • BioinformaticsDirectory listings
  • Genomeweb’s Job listings
  • ISCB Jobs Database
  • Nature’s Jobs list
  • New Scientist Jobs
  • NIH’s job listings
  • Science Careers
  • ZipRecruiter

Postdoctoral Positions at Stanford

Please see the descriptions of various opportunities in Biomedical Data Science under Postdoctoral Training.

Directions to DBDS Program Offices

The DBDS Program Offices are in Stanford’s Medical School Office Building (MSOB). The street address is 1265 Welch Road, Stanford, CA 94305.

MSOB is located on the corner of Campus Drive West and Welch Road, between Panama Street and Welch Road. MSOB is a three-story white building with redwood window framing. Its exact latitude/longitude is 37.431734, -122.179476.

There are two options for parking:

  • The parking lot in front of our building, which has an entrance on Welch Road. This lot has a few coin-metered parking spaces.
  • The large parking lot across Welch Road. The lot is entered from Stock Farm Road or Oak Road, but you must drive within the lot toward the corner of Welch Road and Campus Drive. Payment is by cash, coins, or credit card at an automated permit dispenser. Information: https://transportation.stanford.edu/parking

For all questions about the program, email: 

[email protected]

Mailing Address and Office Location:

Department of Biomedical Data Science Graduate Training Program

Stanford University School of Medicine

1265 Welch Road, MSOB X-343

Stanford, CA 94305-5464

University of Delaware

PhD in Bioinformatics Data Science


A Ph.D. in Bioinformatics Data Science will train the next generation of researchers and professionals who will play key roles in multidisciplinary and interdisciplinary teams bridging the life sciences and computational sciences. Students will receive training in experimental, computational, and mathematical disciplines through their coursework and research. Students who complete this degree will be able to generate and analyze experimental data for biomedical research and to develop physical or computational models of the molecular components that drive the behavior of biological systems.

Students must complete a minimum of 15 credit hours of coursework, plus 3 credit hours of seminar, 6 credit hours of research, and 9 credit hours of doctoral dissertation. The Ph.D. requires a minimum of 33 credits. Students admitted directly after a B.S. degree will be required to complete up to 9 additional credits to fulfill the core curriculum in the following areas: Database Systems, Statistics, and Introduction to Discipline. Students entering the program with an M.S. degree who lack equivalent prerequisites will also be required to complete courses in these three areas; however, these courses may fulfill the elective requirement in the Ph.D. program if approved in the program of study.

Academic Load

PhD students holding research or teaching assistantships (RA or TA) are considered full-time when enrolled in 6 credit hours. Students without an RA or TA are considered full-time if enrolled in at least 9 credit hours or in sustaining credit; those enrolled in fewer than 9 credit hours are considered part-time. Generally, the maximum load is 12 graduate credit hours; however, additional credit hours may be taken with the approval of the student’s adviser and the Graduate College. The maximum course load in either the summer or winter session is 7 credit hours. Permission must be obtained from the Graduate College to carry an overload in any session.

Bioinformatics Data Science Courses

Students must take one course in each of the following areas (9 credits):

Prerequisites

Students must fulfill the core curriculum requirement in each of the following areas (3-9 credits):

Elective Courses

Students must take two courses that complement their bioinformatics data science dissertation project (6 credits):

See Elective courses

Students must take six semesters of seminar (three for 0 credits and three for 1 credit each) and give a presentation during three semesters.

Other Requirements:

  • Formation of Graduate Dissertation Committee
  • Successful completion of Graduate Preliminary Exam
  • Research on a significant scientific problem
  • Successful completion of Ph.D. Candidacy Exam
  • Successful completion of Dissertation Defense

Formation of Graduate Committee

The student must establish a Dissertation Committee within the first year of study. The Committee should consist of at least four faculty members: the primary faculty advisor (serving as Committee Chair), a secondary faculty advisor (in a field complementary to the primary advisor’s), a second faculty member from the home department, and one CBCB-affiliated faculty member from outside the departments of the primary and secondary advisors or from outside the University. Students must complete the Dissertation Committee Formation form and submit it to the Associate Director.

Students should convene their dissertation committee at least once every six months.

Preliminary Examination

The preliminary examination should be taken before the end of the fourth semester and will consist of an oral exam on subjects based on the Bioinformatics Data Science core.* Because the core curriculum provides a good test of the student’s knowledge, students must achieve a minimum 3.0 GPA in the core curriculum before taking the preliminary exam. Students will not be permitted to take the preliminary examination if the core grade requirement and a cumulative GPA of 3.0 have not been met. The exam will be administered by the Preliminary Exam Committee, which will consist of one instructor from each of the three core courses. Each member of the Committee will provide a single grade (pass, conditional pass, or fail), and the final grades will be submitted via the Results of Preliminary Exam Form:

  • Pass. The student may proceed to the next stage of his/her degree training.
  • Conditional pass. If the examination committee feels that the student did not have an adequate background or understanding in one or more specific areas, the Preliminary Exam Committee will communicate the conditional pass to the student and must provide the student with specific requirements and guidelines for completing the conditional pass. The student must inform the Preliminary Exam Committee, the Graduate Program Director, and the Program Committee when these conditions have been completed. The Preliminary Exam Committee will then meet with the student to confirm that all recommendations have been completed and to determine whether a re-examination is necessary. If required, the re-examination will use the same format and take place before the beginning of the next academic semester. If the student still does not perform satisfactorily on this re-examination, he/she will then be recommended to the Graduate Affairs Committee for dismissal from the graduate program.
  • Failure. This outcome indicates that the Examination Committee considers the student incapable of completing degree training. The student’s academic progress will be reviewed by the Graduate Affairs Committee, which will make recommendations to the Program Director regarding the student’s enrollment status. The Program Director may recommend to the Graduate College that the student be dismissed from the Program immediately.

*Students who need to complete prerequisite courses may request a deadline extension for the preliminary and subsequently the candidacy examination. Requests must be submitted to the Graduate Program Committee prior to the start of the third semester.

Candidacy Exam

The candidacy examination must be completed by the end of the sixth semester of enrollment.* It requires that a formal, detailed proposal be submitted to the Dissertation Committee and that the student orally defend the proposed research project. Upon the recommendation of the Dissertation Committee, the student may be admitted to candidacy for the Ph.D. degree. The stipulations for admission to doctoral candidacy are that the student has (i) completed one academic year of full-time graduate study in residence at the University of Delaware, (ii) completed all required courses with the exception of BINF865 and BINF969, (iii) passed the preliminary exam, (iv) demonstrated the ability to perform research, and (v) had a research project accepted by the Dissertation Committee. Within one week of the candidacy exam, the student must complete and submit the Recommendation for Candidacy for Doctoral Degree form and give a copy of the completed form to the Associate Director.

*Students who need to complete prerequisite courses may request a deadline extension for the preliminary and subsequently the candidacy examination.  Requests must be submitted to the Graduate Program Committee prior to the start of the third semester.

Dissertation Exam

The dissertation examination of the Ph.D. program involves approval of the written dissertation and an oral defense of the candidate’s dissertation. The written dissertation will be submitted to the Dissertation Committee and the CBCB office at least three weeks before the oral defense date. The oral defense date will be publicly announced at least two weeks before the scheduled date. The oral presentation will be open to the public and to all members of the Bioinformatics Data Science program. The Dissertation Committee will approve the candidate’s dissertation. The student and the primary faculty advisor will be responsible for making all corrections to the dissertation document and for meeting all Graduate College deadlines. Within one week of the dissertation defense, the student must complete and submit the Certification of Doctoral Dissertation Defense Form and give a copy of the completed form to the Associate Director.


Center for Computational Biology

Computational Biology PhD

The main objective of the Computational Biology PhD is to train the next generation of scientists who are both passionate about exploring the interface of computation and biology, and committed to functioning at a high level in both computational and biological fields.

The program emphasizes multidisciplinary competency, interdisciplinary collaboration, and transdisciplinary research, and offers an integrated and customizable curriculum that consists of two semesters of didactic course work tailored to each student’s background and interests, research rotations with faculty mentors spanning computational biology’s core disciplines, and dissertation research jointly supervised by computational and biological faculty mentors.

The Computational Biology Graduate Group facilitates student immersion into UC Berkeley’s vibrant computational biology research community. Currently, the Group includes over 46 faculty from across 14 departments of the College of Letters and Science, the College of Engineering, the College of Natural Resources, and the School of Public Health. Many of these faculty are available as potential dissertation research advisors for Computational Biology PhD students, with more available for participation on doctoral committees.


The First Year

The time to degree (normative time) of the Computational Biology PhD is five years. The first year of the program emphasizes gaining competency in computational biology, the biological sciences, and the computational sciences (broadly construed). Since student backgrounds will vary widely, each student will work with faculty and student advisory committees to develop a program of study tailored to their background and interests. Specifically, all first-year students must:

  • Perform three rotations with Core faculty (one rotation with a non-Core faculty is acceptable with advance approval)
  • Complete course work requirements (see below)
  • Complete a course in the Responsible Conduct of Research
  • Attend the computational biology seminar series
  • Complete experimental training (see below)

Laboratory Rotations

Entering students are required to complete three laboratory rotations during their first year in the program to seek out a Dissertation Advisor under whose supervision dissertation research will be conducted. Students should rotate with at least one computational Core faculty member and one experimental Core faculty member. Click here to view rotation policy. 

Course Work & Additional Requirements

Students must complete the following coursework in the first three (up to four) semesters. Courses must be taken for a grade and a grade of B or higher is required for a course to count towards degree progress:

  • Fall and Spring semester of CMPBIO 293, Doctoral Seminar in Computational Biology
  • A Responsible Conduct of Research course, most likely through the Department of Molecular and Cell Biology.
  • STAT 201A & STAT 201B: Intro to Probability and Statistics at an Advanced Level. Note: Admitted students who are not prepared to complete STAT 201A and 201B will be required to complete STAT 134 or PH 142 first.
  • CS61A: The Structure and Interpretation of Computer Programs. Note: Students with equivalent background can replace this requirement with a more advanced CS course of their choosing.
  • 3 elective courses relevant to the field of Computational Biology, one of which must be at the graduate level (see below for details).
  • Attend the computational biology invited speaker seminar series. A schedule is circulated to all students by email and is available on the Center website. Starting with the 2023 entering class, CCB PhD students must enroll in CMPBIO 275: Computational Biology Seminar, which provides credit for this seminar series.
  • Complete experimental training, which can be satisfied by any one of the following:
      • completion of a laboratory course at Berkeley with a minimum grade of B;
      • completion of a rotation in an experimental lab (with an experimental project), with a positive evaluation from the PI;
      • a biological sciences undergraduate major with at least two upper-division laboratory-based courses;
      • a semester or equivalent of supervised undergraduate experimental laboratory-based research at a university; or
      • previous paid or volunteer/internship work in an industry-based experimental laboratory.

Students are expected to develop a course plan for their program requirements and to consult with the Head Graduate Advisor before the Spring semester of their first year for formal approval (signature required). The course plan will take into account the student’s undergraduate training areas and goals for PhD research areas.

Satisfactory completion of first year requirements will be evaluated at the end of the spring semester of the first year. If requirements are satisfied, students will formally choose a Dissertation advisor from among the core faculty with whom they rotated and begin dissertation research.

Waivers: Students may request waivers for the specific courses STAT 201A, STAT 201B, and CS61A. In all cases of waivers, the student must take alternative courses in related areas so that they still complete the six required courses described above. To waive STAT 201A/B, students can demonstrate that they have completed the equivalent by passing a proctored assessment exam on campus. To waive CS61A, the Head Graduate Advisor will evaluate the student’s previous coursework, based on that course’s syllabus and other course materials, to determine equivalency.

Electives: Of the three electives, students are required to choose one course in each of the following two cluster areas:

  • Cluster A (Biological Science): These courses are defined as those for which the learning goals are primarily related to biology. This includes courses covering topics in molecular biology, genetics, evolution, environmental science, experimental methods, and human health. This category may also cover courses whose focus is on learning how to use bioinformatic tools to understand experimental data.
  • Cluster B (Computational Sciences): These courses are defined as those for which the learning goals involve computing, inference, or mathematical modeling, broadly defined. This includes courses on algorithms, computing languages or structures, mathematical or probabilistic concepts, and statistics. This category would include courses whose focus is on biological applications of such topics.

The link below lists some relevant courses, but students may take courses beyond this list; for courses not on the list, the Head Graduate Advisor (HGA) will determine the cluster to which the course can be credited. For classes with significant overlap between the two clusters, the department offering the course may influence the HGA’s decision on whether to assign it to cluster A or B.

See below for some suggested courses in these categories:

Suggested Coursework Options

Second Year & Beyond

At the beginning of the fall semester of the second year, students begin full-time dissertation research in earnest under the supervision of their Dissertation advisor. Students are expected to take three (up to four) semesters to complete the six-course requirement. Students are required to continue participating annually in the computational biology seminar series.

Qualifying Examination

Students are expected to take and pass an oral Qualifying Examination (QE) by the end of the spring semester (June 15th) of their second year of graduate study. Students must present a written dissertation proposal to the QE committee no fewer than four weeks prior to the oral QE. The write-up should follow the format of an NIH-style grant proposal (i.e., it should include an abstract, background and significance, specific aims to be addressed (~3), and a research plan for addressing the aims) and must thoroughly discuss plans for research to be conducted in the dissertation lab. Click here for more details on the guidelines and format for the QE. Click here to view the rules for the composition of the committee and the form for declaring your committee.

Advancement to Candidacy

After successfully completing the QE, students will Advance to Candidacy. At this time, students select the members of their dissertation committee and submit the committee to the Graduate Division for approval. Students should endeavor to include a member whose research represents an area complementary to yet distinct from that of the dissertation advisor (i.e., biological vs. computational, experimental vs. theoretical) and whose expertise will be integrated into the student’s dissertation research. Click here to view the rules for the composition of the committee and the form for declaring your committee.

Meetings with the Dissertation Committee

After Advancing to Candidacy, students are expected to meet with their Dissertation Committee at least once each year.

Teaching Requirements

Computational Biology PhD students are required to teach at least two semesters (starting with the Fall 2019 entering class), but may teach more. The requirement can be modified if the student has funding that does not allow teaching. Starting with the Fall 2019 class, at least one of those teaching appointments should require that you teach a section. Berkeley Connect or CMPBIO 293 can count toward one of the required semesters.

The Dissertation

Dissertation projects will represent scholarly, independent, and novel research that contributes new knowledge to Computational Biology by integrating knowledge and methodologies from both the biological and computational sciences. Students must submit their dissertation by the May Graduate Division filing deadline (see Graduate Division for date) of their fifth and final year.

Special Requirements

Students will be required to present their research either orally or via a poster at the annual retreat beginning in their second year.

Financial Support

The Computational Biology Graduate Group provides a competitive stipend (the stipend for 2023-24 is $43,363) as well as full payment of fees and non-resident tuition (which includes health care). Students maintaining satisfactory academic progress are provided full funding for five to five and a half years. The program supports students in the first year, while the PI/mentor provides support from the second year on. A portion of this support is in the form of salary from teaching assistance as a Graduate Student Instructor (GSI) in allied departments, such as Molecular and Cell Biology, Integrative Biology, Plant and Microbial Biology, Mathematics, Statistics or Computer Science. Teaching is part of the training of the program and most students will not teach more than two semesters, unless by choice.

Due to cost constraints, the program admits few international students; the average is two per year. Those admitted are also given full financial support (as noted above): stipend, fees and tuition.

Students are also strongly encouraged to apply for extramural fellowships, for the proposal-writing experience. Berkeley students apply for a number of extramural fellowships that current applicants may find appealing. Please note that the NSF now allows only two submissions: one as an undergraduate and one in graduate school. The NSF funds students with potential rather than specific research projects, so do not be concerned that you don't know your graduate school plans yet; just put together a good proposal! Although we make admissions offers before fellowship results are released, all eligible students should take advantage of both opportunities to apply, as it's a great opportunity and a great addition to a CV.

  • National Science Foundation Graduate Research Fellowship (application deadline in October)
  • Hertz Foundation Fellowship (application deadline in October)
  • National Defense Science and Engineering Graduate Fellowship (application deadline in mid-fall)
  • DOE Computational Science Graduate Fellowship (Krell Institute) (application deadline in January)

CCB no longer requires the GRE (general or subject test) for admission. GRE scores will not be seen by the review committee, even if sent to Berkeley.

PLEASE NOTE: The application deadline is Thursday, November 30, 2023, 8:59 pm PST / 11:59 pm EST.

If you would like to learn more about our program, you can watch informational YouTube recordings of the past two UC Berkeley Graduate Diversity Admissions Fairs (2021 and 2020).

We invite applications from students with distinguished academic records, strong foundations in the basic biological, physical and computational sciences, as well as significant computer programming and research experience. Admission for the Computational Biology PhD is for the fall semester only, and Computational Biology does not offer a Master’s degree.

We are happy to answer any questions you may have, but please be sure to read this entire page first, as many of your questions will be answered below or on the Tips tab.

IMPORTANT: Please note that it is not possible to select a specific PhD advisor until the end of the first year in the program, so contacting individual faculty about openings in their laboratories will not increase your chances of being accepted into the program. You will have an opportunity to discuss your interests with relevant faculty if you are invited to interview in February.

Undergraduate Preparation

Minimum requirements for admission to graduate study:

  • A bachelor’s degree or recognized equivalent from an accredited institution.
  • Minimum GPA of 3.0.
  • Undergraduate preparation reflecting a balance of training in computational biology’s core disciplines (biology, computer science, statistics/mathematics), for example, a single interdisciplinary major, such as computational biology or bioinformatics; a major in a core discipline and a combination of interdisciplinary course work and research experiences; or a double major in core disciplines.
  • Basic research experience and aptitude are key considerations for admission, so evidence of research experience and letters of recommendation from faculty mentors attesting to the applicant’s research experience are of particular interest.
  • GRE: NOT required or used for review.
  • TOEFL scores for international students (see below for details).

Application Requirements

ALL materials, including letters, are due November 30, 2023 (8:59 pm PST). More information is provided, and required, as part of the online application, so please create an account and review the application before emailing with questions (and please set up an account well before the deadline):

  • A completed graduate application: The online application opens in early or mid-September and is located on the Graduate Division website. Paper applications are not accepted. Please create your account and review the application well ahead of the submission date, as it will take time to complete and requests information not listed here.
  • A nonrefundable application fee: The fee must be paid by major credit card. For US citizens and permanent residents, the fee is $135, and a fee waiver may be requested as part of the online application. For all other (international) applicants, the fee is $155 (no waivers, no exceptions). Graduate Admissions, not the program, manages the fee, so please contact them with questions.
  • Three letters of recommendation, minimum (up to five are accepted): Letters of recommendation must be submitted online as part of the Graduate Division's application process. Letters are also due November 30, so please inform your recommenders of this deadline and give them sufficient advance notice. It is your responsibility to monitor the status of your letters of recommendation (sending prompts, as necessary) in the online system.
  • Transcripts: Unofficial copies of all relevant transcripts, uploaded as part of the online application (see application for details). Scanned copies of official transcripts are strongly preferred, because transcripts must include the applicant's name, the institution's name, and the degree goal, and should be easy for reviewers to read (printouts of online personal schedules can be hard to read, and transcripts without your name and the institution's name cannot be used for review). Do not mail official transcripts to the Graduate Division or to Computational Biology; they will be discarded.
  • Essays: A Statement of Purpose (2-3 pages) and a Personal Statement (1-2 pages); see the descriptions of what each essay should include. Also review the Tips tab for formatting advice.
  • (Highly recommended) Applicants should consider applying for extramural funding, such as NSF Fellowships. These are amazing opportunities and the application processes are great preparation for graduate studies. Please see Financial Support tab.
  • Read and follow all of the “Application Tips” listed on the last tab. This ensures that everything goes smoothly and you make a good impression on the faculty reviewing your file.

The GRE general test is not required, and GRE subject tests are not required. GRE scores will not be a factor in application review and admission, and will NOT be seen by the CCB admissions committee. We do not encourage anyone to take the exam, but if you decide to apply to a different Berkeley program that does require it, the UC Berkeley school code is 4833; department codes are unnecessary. As long as scores are sent to UC Berkeley, they will be received by any program you apply to on campus.

TOEFL/IELTS

Applicants from countries where English is not the official language must demonstrate adequate proficiency in English. There are two standardized tests you may take: the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS). The minimum passing TOEFL scores are 90 for the Internet-based test (iBT) and 570 for the paper-based test (PBT). The TOEFL may be waived if an international student has completed at least one year of full-time academic coursework with grades of B or better while in residence at a U.S. university (a transcript will be required).

Application Deadlines

The application deadline is 8:59 pm Pacific Standard Time, November 30, 2023. The application will lock at 9:00 pm PST, precisely. All materials should be received by the deadline. Letters of recommendation can still be submitted after the deadline, but the committee meets in early December and will review applications even if they are incomplete. TOEFL tests should be taken on or before the deadline, but self-reported scores are acceptable for review while the official scores are being processed. All submitted applications will be reviewed, even if materials are missing, but missing materials may affect the evaluation.

It is your responsibility to ensure and verify that your application materials are submitted in a timely manner. Please be sure to hit the submit button when you have completed the application and to monitor the status of your letters of recommendation (sending prompts, as necessary). Please include the statement of purpose and personal statement in the online application. While you can upload a CV, please DO NOT upload entire publications or papers. Please DO NOT send paper résumés, separate folders of information, or articles via mail. They will be discarded unread.

The Computational Biology Interview Visit dates will be: February 25-27, 2024

Top applicants being considered for admission will be invited to visit campus for interviews with faculty. Invitations will be sent by early January. Students are expected to stay for the entire event, arriving in Berkeley by 5:30 pm on the first day and leaving the evening of the final day. In the application, you must list 7 to 10 faculty from the Computational Biology website with whom you are interested in conducting research or performing rotations. This helps route your application to our reviewers and facilitates interview scheduling. An invitation is not a guarantee of admission.

International students may be interviewed virtually, as flights are often prohibitively expensive.

Tips for the Application Process

Uploaded Documents: Put your name and the type of essay (Statement of Purpose [2-3 pages], Personal Statement [1-2 pages]) on each essay as a header or before the text, whether you use the text box or upload a PDF or Word document. There is no minimum length for either essay, but a maximum of 3 pages is suggested. The Statement of Purpose should describe your research and educational background and aspirations. The Personal Statement can include personal achievements not necessarily related to research, barriers you've had to overcome, mentoring and volunteering activities, and anything that makes you unique and demonstrates the qualities you will bring to the program.

Letters of Recommendation: These should come from people who have supervised your research or academic work and who can evaluate your intellectual ability, creativity, leadership potential and promise for productive scholarship. If lab supervision was provided by a postdoc or graduate student, the letter should carry the signature or support of the faculty member in charge of the research project. Note: the application can be submitted before all of your recommenders have completed their letters. It is your responsibility to track your recommenders' progress through the online system and to send reminders if they have not submitted their letters.

Extramural fellowships: It is to your benefit to apply for fellowships, as they may facilitate entry into the lab of your choice, are a great addition to your CV, and often provide higher stipends. Do not let concerns about coming up with a research proposal before joining a lab prevent you from applying. The fellowships are looking for research potential and proposal-writing skills and will not hold you to specific research projects once you have started graduate school.

Calculating GPA: Schools can differ in how they assign grades and calculate grade point averages, so it may be difficult for this office to offer advice. The best resource for calculating your school's GPA is the back of the official transcript, where a guide is often provided; free online GPA conversion tools can also be found via an internet search.

Faculty Contact/Interests: Please be sure to list faculty who interest you as part of the online application. You are not required to contact any faculty in advance, and doing so will not help your admission, but you are welcome to if you wish to learn more about their research.

Submitting the application: To avoid the possibility of computer problems on either side, it is NOT advisable to wait until the last day to start and/or submit your application. It is not unusual for the application system to have difficulties during times of heavy traffic. However, there is no need to submit the application too early. No application will be reviewed before the deadline.

Visits: We only arrange one campus visit for recruitment purposes. If you are interested in visiting the campus and meeting with faculty before the application deadline, you are welcome to do so on your own time (we will be unable to assist).

Name: Please double check that you have entered your first and last names in the correct fields. This is our first impression of you as a candidate, so you do want to get your name correct! Be sure to put your name on any documents that you upload (Statement of Purpose, Personal Statement).

California Residency: You are not considered a resident if you hope to enter our program in the Fall, but have never lived in California before or are here on a visa. So, please do not mark “resident” on the application in anticipation of admission. You must have lived in California previously, and be a US citizen or Permanent Resident, to be a resident.

Faculty Leadership

Head Graduate Advisor and Chair for the PhD & DE: John Huelsenbeck ([email protected])

Associate Head Graduate Advisor for PhD & DE: Liana Lareau ([email protected])

Equity Advisor: Rasmus Nielsen ([email protected])

Director of CCB: Elizabeth Purdom ([email protected])

Core PhD & DE Faculty

Staff Support

Student Services Advisor (GSAO): Kate Chase ([email protected])


Cornell University, College of Agriculture & Life Sciences (CALS)

Computational Biology Program

The Computational Biology Ph.D. program trains the next generation of computational scientists to tackle research using the big genomic, imaging, remote sensing, clinical, and real-world data that are transforming the biological sciences.

The graduate field of Computational Biology offers Ph.D. degrees in the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological systems.

Computation has become essential to biological research. Genomic databases, protein databanks, MRI images of the human brain, and remote sensing data on landscapes contain unprecedented amounts of detailed information that are transforming almost all of biology. The computational biologist must have skills in mathematics and computation as well as in biology. A key goal in training is to develop the ability to relate biological processes to computational models.

The field provides interdisciplinary training and research opportunities in a range of subareas of computational biology including comparative and functional genomics, systems biology, molecular and protein networks, population genomics and genetics, bioinformatics, model system genomics, agricultural genomics, and medical genomics.

Students majoring in computational biology are expected to obtain a broad, interdisciplinary knowledge of fundamental principles in biology, computational science, and mathematics. But because the field covers a wide range of areas, it would be unrealistic to expect a student to master each facet in detail. Instead, students choose specific subareas of study: they are expected to develop competence in at least one specific subdomain of biology and in relevant subareas of computational science and mathematics.


About Cornell

  • Welcoming all to our community

Student Research

  • Spotlight on student research

Faculty Research

  • Spotlight on Computational Genomics

News & Congratulations

  • Professor McCouch elected to National Academy of Sciences

Entrepreneurship Spotlight

  • Embark: Dog DNA Testing & Analysis

How to join us

  • Prepare yourself and your application

Program Contacts


Associate Professor

  • (607) 255-3984
  • pm544 [at] cornell.edu


Assistant to the Chair and Graduate Field Administrator

  • (607) 255-5488
  • jf633 [at] cornell.edu
Clemson University, Public Health Sciences

Biomedical Data Science and Informatics, M.S. / Ph.D.


The nation’s transition to new healthcare delivery models and the exponential growth in biomedical data translate to a need for professionals with expertise in data science focused on biomedical research who can leverage big data to improve health in the state and the nation. Specialized tracks will initially include precision medicine, population health, and clinical and translational informatics.

Biomedical data science and informatics is an interdisciplinary field that applies concepts and methods from computer science and other quantitative disciplines together with principles of information science to solve challenging problems in biology, medicine and public health.

The BDSI program is a unique collaboration that brings together Clemson's strengths in computing, engineering, and public health and the biomedical sciences expertise of the Medical University of South Carolina (MUSC).


The program is designed for students with undergraduate computer science, math, engineering, or biomedical sciences backgrounds who wish to contribute to biomedical sciences or individual and societal health.

Highly qualified Clemson undergraduates who are interested in earning the M.S. in Biomedical Data Science and Informatics may begin earning their master's degree while simultaneously completing their bachelor's degree in Health Science through the accelerated B.S. to M.S. program.

The M.S. and Ph.D. programs are a joint venture between Clemson University and the Medical University of South Carolina.

This interdisciplinary program leverages the broad strength at Clemson spanning computing, engineering, mathematics, biology, public health, and other areas to produce the next generation of data scientists, prepared to manage and analyze big data sources from mobile sensors to genomic and imaging technologies. Graduates will possess the necessary skills for informatics careers in biology, medicine or public health focused on the development of prescriptive analytics from large data sources to improve health in the state and the nation.

This program is unique to South Carolina, and very few programs nationally focus on data science applied to the health and biomedical fields. Specialized tracks include precision medicine, population health, and clinical and translational informatics.

Graduates will possess marketable skills for informatics careers in biology, medicine, or public health focused on the development of prescriptive analytics from large data sources and will be prepared to lead research programs in academic, healthcare, public health, and industry. These specially trained scientists will be critical to ongoing efforts to improve health outcomes in South Carolina and the nation.

This degree program is housed in the School of Computing, 100 McAdams Hall. For more information, contact:

Dr. Brian Dean
Director and Professor, School of Computing
Clemson University, 205 McAdams Hall, Clemson, SC 29634
[email protected], 864-656-5866

Adam Rollins
Graduate Student Services Coordinator, School of Computing
Clemson University, 100G McAdams Hall, Clemson, SC 29634
[email protected], 864-656-5853

University of Delaware

Bioinformatics Data Science: Ph.D.

Bioinformatics data science is an emerging and rapidly expanding field where biological, computational and quantitative disciplines converge. The field encompasses the development and application of computational tools and techniques for the collection, analysis, management and visualization of biological data, as well as modeling and simulation methods for the study of biological systems. Fundamental to modern-day biological studies and key to the basic understanding of complex biological systems, bioinformatics data science is making an impact upon the science and technology of fields ranging from agricultural and environmental sciences to pharmaceutical and medical sciences. The research requires close collaboration among multi-disciplinary teams of researchers in quantitative and life sciences and their interfaces.

The Ph.D. in bioinformatics data science is offered as a University-wide interdisciplinary graduate program with a scientific curriculum that builds upon the research and educational strengths of departments across the colleges of Engineering (COE), Arts and Sciences (CAS), Agriculture and Natural Resources (CANR) and Earth, Ocean and Environment (CEOE). The Center for Bioinformatics and Computational Biology (CBCB) administers the Ph.D. program in bioinformatics data science and coordinates with the individual departments involved in the program.

The Ph.D. in bioinformatics data science trains the next generation of researchers and professionals to play a key role in multi- and interdisciplinary teams, bridging life sciences and computational sciences. Students will receive training in experimental, computational and mathematical disciplines through their coursework and research. Students who complete this degree will be able to generate and analyze experimental data for biomedical research as well as develop physical or computational models of the molecular components that drive the behavior of the biological system.

Degrees Offered

Bioinformatics Data Science–Ph.D. (CANR)

Bioinformatics Data Science–Ph.D. (CAS)

Bioinformatics Data Science–Ph.D. (CEOE)

Bioinformatics Data Science–Ph.D. (COE)

Application Deadlines

The 2023-2024 UD graduate student tuition rate is $1,028 per credit hour.

All or nearly all doctoral students receive a stipend and full tuition scholarship.


Graduate College


234 Hullihen Hall, Newark, DE 19716 USA. Email: [email protected]. General: (302) 831-6824. Fax: (302) 831-8745.

DiscoverDataScience.org

PhD in Data Science – Your Guide to Choosing a Doctorate Degree Program


Created by aasif.faizal

Professional opportunities in data science are growing incredibly fast. That’s great news for students looking to pursue a career as a data scientist. But it also means that there are a lot more options out there to investigate and understand before developing the best educational path for you.

A PhD is the most advanced data science degree you can get, reflecting a depth of knowledge and technical expertise that will put you at the top of your field.

As a result, PhD programs are the most time-intensive degree option, typically requiring students to complete a dissertation based on rigorous research, and they are not for everyone. Indeed, many who work in the world of big data hold master's degrees rather than PhDs; master's programs tend to involve similar coursework without the dissertation component. However, for the right candidate, a PhD program is an excellent way to become a true expert in an area of focus.

If you’ve concluded that a data science PhD is the right path for you, this guide is intended to help you choose the best program to suit your needs. It will walk through some of the key considerations while picking graduate data science programs and some of the nuts and bolts (like course load and tuition costs) that are part of the data science PhD decision-making process.

Data Science PhD vs. Masters: Choosing the right option for you

If you're considering pursuing a data science PhD, it's worth knowing that such an advanced degree isn't strictly necessary to get good work opportunities. Many who work in the field of big data hold only a master's degree, which is generally the level of education expected of a competitive candidate for data science positions.

So why pursue a data science PhD?

Simply put, a PhD in data science will leave you qualified to enter the big data industry at a high level from the outset.

You’ll be eligible for advanced positions within companies, holding greater responsibilities, keeping more direct communication with leadership, and having more influence on important data-driven decisions. You’re also likely to receive greater compensation to match your rank.

However, PhDs are not for everyone. Dissertations require a great deal of time and an interest in intensive research. If you are eager to jumpstart a career quickly, a master’s program will give you the preparation you need to hit the ground running. PhDs are appropriate for those who want to commit their time and effort to schooling as a long-term investment in their professional trajectory.

For more information on the differences between data science PhD and master's programs, see our separate guide.

Topics include:

  • Can I Get an Online Ph.D. in Data Science?
  • Overview of Ph.D. Coursework
  • Preparing for a Doctorate Program: building a solid track record of professional experience; things to consider when choosing a school
  • What Does It Cost to Get a Ph.D. in Data Science?
  • School Listings

Data Science PhD Programs, Historically

Historically, data science PhD programs were one of the main avenues to a good data-related position in academia or industry. But PhD programs are heavily research-oriented and require a long-term investment of time, money, and energy. The issue some data science PhD holders report, especially in industry settings, is that the state of the art and the data science industry are evolving so rapidly that deep research-oriented expertise is not always what employers most want.

Instead, many companies are looking for candidates who are up to date with the latest data science techniques and technologies, and are willing to pivot to match emerging trends and practices.

One recent development making data science graduate school decisions more complex is the introduction of specialty master's degrees that focus on rigorous but compact professional training. Both students and companies are recognizing the value of an intensive, more industry-focused degree that provides enough training to manage complex projects and is client-oriented rather than research-oriented.

However, not all prospective data science PhD students are looking for jobs in industry. There are some pretty amazing research opportunities opening up across a variety of academic fields that are making use of new data collection and analysis tools. Experts who understand how to apply statistics and computer science to analyze trends and build models are likely to be in high demand.

Can You Get a PhD in Data Science Online?

While it is not common to get a data science Ph.D. online, there are currently two options for those looking to take advantage of the flexibility of an online program.

Indiana University Bloomington and Northcentral University both offer online Ph.D. programs with either a minor or specialization in data science.

Given the trend for schools to continue increasing online offerings, expect to see additional schools adding this option in the near future.


Overview of PhD Coursework

A PhD requires a great deal of academic work and generally takes four to five years (sometimes longer) to complete.

Here are some of the high level factors to consider and evaluate when comparing data science graduate programs.

How many credits are required for a PhD in data science?

On average, a PhD in data science requires 71 credits to graduate, almost double the credits of a traditional master's degree program. In addition to coursework, most PhD students also have research and teaching responsibilities that can be simultaneously demanding and excellent career preparation.

What’s the core curriculum like?

In a data science doctoral program, you'll be expected to learn many skills and also how to apply them across domains and disciplines. Core curricula vary from program to program, but almost all have a core foundation in statistics.

All PhD students will have to take a qualifying exam. The format varies from university to university; at Yale, for example, it is broken into three parts: a practical exam, a theory exam, and an oral exam. The goal is to make sure doctoral students are developing the appropriate level of expertise.

Dissertation

One of the final steps of a PhD program involves presenting original research findings in a formal document called a dissertation. These will provide background and context, as well as findings and analysis, and can contribute to the understanding and evolution of data science. A dissertation idea most often provides the framework for how a PhD candidate’s graduate school experience will unfold, so it’s important to be thoughtful and deliberate while considering research opportunities.

Since data science is such a rapidly evolving field and because choosing the right PhD program is such an important factor in developing a successful career path, there are some steps that prospective doctoral students can take in advance to find the best-fitting opportunity.

Join professional associations

Even before you are fully credentialed, joining professional associations and organizations such as the Data Science Association and the American Association of Big Data Professionals is a good way to get exposure to the field. Many professional societies welcome new members and even encourage student participation with discounted membership fees and with awards and contest categories for student researchers. One of the biggest advantages of joining is that these associations bring data scientists together for conferences, research-sharing, networking and continuing education opportunities.

Leverage your social network

Be on the lookout to make professional connections with professors, peers, and members of industry. There are a number of LinkedIn groups dedicated to data science. A well-maintained professional network is always useful to have when looking for advice or letters of recommendation while applying to graduate school and then later while applying for jobs and other career-related opportunities.

Kaggle competitions

Kaggle competitions provide the opportunity to solve real-world data science problems and win prizes. A list of data science problems can be found at Kaggle.com. Winning one of these competitions is a good way to demonstrate professional interest and experience.

Internships

Internships are a great way to get real-world experience in data science while also working for top names in the world of business. For example, IBM offers a data science internship, which can help you stand out when applying to PhD programs as well as when seeking employment in the future.

Demonstrating professional experience is not only important when looking for jobs, but it can also help while applying for graduate school. There are a number of ways for prospective students to gain exposure to the field and explore different facets of data science careers.

Get certified

There are a number of data-related certificate programs that are open to people with a variety of academic and professional experience. DeZyre has an excellent guide to different certifications, some of which might help provide good background for graduate school applications.

Conferences

Conferences are a great place to meet people presenting new and exciting research in the data science field and bounce ideas off of newfound connections. Like professional societies and organizations, discounted student rates are available to encourage student participation. In addition, some conferences will waive fees if you are presenting a poster or research at the conference, which is an extra incentive to present.

It can be hard to quantify what makes a good fit when it comes to data science graduate programs. There are easy-to-evaluate factors, such as cost and location, and harder-to-evaluate criteria, such as networking opportunities, accessibility of professors, and how up to date the program’s curriculum is.

Nevertheless, there are some key relevant considerations when applying to almost any data science graduate program.

What most schools will require when applying:

  • All undergraduate and graduate transcripts
  • A statement of intent for the program (reason for applying and future plans)
  • Letters of reference
  • Application fee
  • Online application
  • A curriculum vitae (outlining all of your academic and professional accomplishments)

What Does it Cost to Get a PhD in Data Science?

The great news is that many PhD data science programs are supported by fellowships and stipends. Some are completely funded, meaning the school will pay tuition and basic living expenses. Here are several examples of fully funded programs:

  • University of Southern California
  • University of Nevada, Reno
  • Kennesaw State University
  • Worcester Polytechnic Institute
  • University of Maryland

For all other programs, tuition typically ranges from $1,300 to $2,000 per credit hour, depending on the school. Remember, typical PhD programs in data science require between 60 and 75 credit hours, meaning you could spend up to $150,000 over several years.
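To make the arithmetic concrete, here is a minimal Python sketch that multiplies the per-credit rates by the credit-hour range quoted above. The function name `tuition_range` is hypothetical, and the sketch assumes tuition is billed purely per credit hour, ignoring fees, fellowships, and stipends.

```python
# Rough total-tuition range for a PhD billed per credit hour.
# Assumption: tuition = rate * credits; fees, stipends, and aid are ignored.
def tuition_range(min_rate, max_rate, min_credits, max_credits):
    """Return (lowest, highest) total tuition in dollars."""
    lowest = min_rate * min_credits    # cheapest rate, shortest program
    highest = max_rate * max_credits   # most expensive rate, longest program
    return lowest, highest

low, high = tuition_range(1_300, 2_000, 60, 75)
print(f"${low:,} to ${high:,}")  # $78,000 to $150,000
```

The upper bound matches the $150,000 figure above; the lower bound shows that even the cheapest per-credit program could cost close to $80,000 without funding.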

That’s why it is so important to evaluate the financial side of each PhD program: some schools offer full stipends, so you can attend without having to find supplemental scholarships or tuition assistance.

Can I become a professor of data science with a PhD? Yes! If you are interested in teaching at the college or graduate level, a PhD is the degree needed to establish the full expertise expected of a professor. Some data scientists who hold PhDs start by entering the field of big data and pivot over to teaching after gaining a significant amount of work experience. If you’re driven to teach others or to pursue advanced research in data science, a PhD is the right degree for you.

Do I need a master’s in order to pursue a PhD? No. Many who pursue PhDs in Data Science do not already hold advanced degrees, and many PhD programs include all the coursework of a master’s program in the first two years of school. For many students, this is the most time-effective option, allowing you to complete your education in a single pass rather than interrupting your studies after your master’s program.

Can I choose to pursue a PhD after already receiving my master’s? Yes. A master’s program can be an opportunity to get the lay of the land and determine the specific career path you’d like to forge in the world of big data. Some schools may allow you to simply extend your academic timeline after receiving your master’s degree, and it is also possible to return to school to receive a PhD if you have been working in the field for some time.

If a PhD isn’t necessary, is it a waste of time? While not all students are candidates for PhDs, for the right students – who are keen on doing in-depth research, have the time to devote to many years of school, and potentially have an interest in continuing to work in academia – a PhD is a great choice. For more information on this question, take a look at our article Is a Data Science PhD. Worth It?

Complete List of Data Science PhD Programs

Below you will find the most comprehensive list of schools offering a doctorate in data science. Each school listing contains a link to the program-specific page, the GRE or master’s degree requirements, and a link to a page with detailed course information.

Note that the listing only contains true data science programs. Other similar programs are often lumped together on other sites, but we have chosen to list programs such as data analytics and business intelligence in a separate section of the website.

Boise State University – Boise, Idaho

PhD in Computing – Data Science Concentration

The Data Science emphasis focuses on the development of mathematical and statistical algorithms, software, and computing systems to extract knowledge or insights from data.  

In 60 credits, students complete an Introduction to Graduate Studies, 12 credits of core courses, 6 credits of data science elective courses, 10 credits of other elective courses, a Doctoral Comprehensive Examination worth 1 credit, and a 30-credit dissertation.

Electives can be taken in focus areas such as Anthropology, Biometry, Ecology/Evolution and Behavior, Econometrics, Electrical Engineering, Earth Dynamics and Informatics, Geoscience, Geostatistics, Hydrology and Hydrogeology, Materials Science, and Transportation Science.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $7,236 total (Resident), $24,573 total (Non-resident)

Bowling Green State University – Bowling Green, Ohio

Ph.D. in Data Science

Data Science students at Bowling Green intertwine knowledge of computer science with statistics.

Students learn techniques in analyzing structured, unstructured, and dynamic datasets.

Courses train students to understand the principles of analytic methods and to articulate their strengths and limitations.

The program requires 60 credit hours, including coursework in Computer Science (6 credit hours), Statistics (6 credit hours), Data Science Exploration and Communication, Ethical Issues, Advanced Data Mining, and Applied Data Science Experience.

Students must also complete 21 credit hours of elective courses, a qualifying exam, a preliminary exam, and a dissertation.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $8,418 (Resident), $14,410 (Non-resident)

Brown University – Providence, Rhode Island

PhD in Computer Science – Concentration in Data Science

Brown University’s database group is a world leader in systems-oriented database research; the group seeks PhD candidates with strong system-building skills who are interested in researching TupleWare, MLbase, MDCC, Crowd DB, or PIQL.

To improve their chances of admission, applicants should consider first doing a research internship at Brown with this group. Other ways to boost an application are to take and do well in massive open online courses, do an internship at a large company, and get involved in a large open-source software project.

Coding well in C++ is preferred.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $62,680 total

Chapman University – Irvine, California

Doctorate in Computational and Data Sciences

Candidates for the doctorate in computational and data science at Chapman University begin by completing 13 core credits in basic methodologies and techniques of computational science.

Students complete 45 credits of electives, which are personalized to match the specific interests and research topics of the student.

Finally, students complete up to 12 credits in dissertation research.

Applicants must have completed courses in differential equations, data structures, and probability and statistics, or take specific foundation courses, before beginning coursework toward the PhD.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $37,538 per year

Clemson University / Medical University of South Carolina (MUSC) – Joint Program – Clemson, South Carolina & Charleston, South Carolina

Doctor of Philosophy in Biomedical Data Science and Informatics – Clemson

The PhD in biomedical data science and informatics is a joint program offered by Clemson University and the Medical University of South Carolina (MUSC).

Students choose one of three tracks: precision medicine, population health, or clinical and translational informatics. Students complete 65-68 credit hours and take courses in each of 5 areas: biomedical informatics foundations and applications; computing/math/statistics/engineering; population health, health systems, and policy; biomedical/medical domain; and lab rotations, seminars, and doctoral research.

Applicants must have a bachelor’s in health science, computing, mathematics, statistics, engineering, or a related field; competency in a second of these areas is also recommended.

Program requirements include a year of calculus and college biology, as well as experience in computer programming.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $10,858 total (South Carolina Resident), $22,566 total (Non-resident)

George Mason University – Fairfax, Virginia

Doctor of Philosophy in Computational Sciences and Informatics – Emphasis in Data Science

George Mason’s PhD in computational sciences and informatics requires a minimum of 72 credit hours, though this can be reduced if a student has already completed a master’s. Of these, 48 credits are for graduate coursework and 24 are for dissertation research.

Students choose an area of emphasis, either computer modeling and simulation or data science, and complete 18 credits of coursework in this area. Students are expected to complete the coursework in 4-5 years.

Applicants to this program must have a bachelor’s degree in a natural science, mathematics, engineering, or computer science, and must have knowledge and experience with differential equations and computer programming.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $13,426 total (Virginia Resident), $35,377 total (Non-resident)

Harrisburg University of Science and Technology – Harrisburg, Pennsylvania

Doctor of Philosophy in Data Sciences

Harrisburg University’s PhD in data science is a 4-5 year program, the first 2 of which make up the Harrisburg master’s in analytics.

Beyond this, PhD candidates complete six milestones to obtain the degree, including 18 semester hours of doctoral-level courses such as multivariate data analysis, graph theory, and machine learning.

Following the completion of ANLY 760 Doctoral Research Seminar, students in the program complete 12 hours of dissertation research, bringing the total program hours to 36.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $14,940 total

Icahn School of Medicine at Mount Sinai – New York, New York

Genetics and Data Science, PhD

As part of the Biomedical Science PhD program, the Genetics and Data Science multidisciplinary training offers research opportunities that expand on genetic research and modern genomics. The training also integrates several disciplines of biomedical sciences with machine learning, network modeling, and big data analysis.

Students in the Genetics and Data Science program complete a predetermined course schedule with a total of 64 credits and 3 years of study.

Additional course requirements and electives include laboratory rotations, a thesis proposal exam and thesis defense, Computer Systems, Intro to Algorithms, Machine Learning for Biomedical Data Science, Translational Genomics, and Practical Analysis of a Personal Genome.

Delivery Method: Campus
GRE: Not Required
2022-2023 Tuition: $31,303 total

Indiana University-Purdue University Indianapolis – Indianapolis, Indiana

PhD in Data Science; PhD Minor in Applied Data Science

Doctoral candidates pursuing the PhD in data science at Indiana University-Purdue University Indianapolis (IUPUI) must demonstrate competency in research, data analytics, and data management and infrastructure to earn the degree.

The PhD comprises 24 credits of data science core courses, 18 credits of methods courses, 18 credits of a specialization, written and oral qualifying exams, and 30 credits of dissertation research. All requirements must be completed within 7 years.

Applicants are generally expected to have a master’s in social science, health, data science, or computer science. 

Currently, a majority of the PhD students at IUPUI are funded by faculty grants, and two are funded by the federal government. None of the students are self-funded.

IUPUI also offers a PhD Minor in Applied Data Science that is 12-18 credits. The minor is open to students enrolled at IUPUI or IU Bloomington in a doctoral program other than Data Science.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $9,228 per year (Indiana Resident), $25,368 per year (Non-resident)

Jackson State University – Jackson, Mississippi

PhD Computational and Data-Enabled Science and Engineering

Jackson State University offers a PhD in computational and data-enabled science and engineering with 5 concentration areas: computational biology and bioinformatics, computational science and engineering, computational physical science, computational public health, and computational mathematics and social science.

Students complete 12 credits of common core courses, 12 credits in the specialization, 24 credits of electives, and 24 credits in dissertation research.

Students may complete the doctoral program in as little as 5 years and no more than 8 years.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $8,270 total

Kennesaw State University – Kennesaw, Georgia

PhD in Analytics and Data Science

Students pursuing a PhD in analytics and data science at Kennesaw State University must complete 78 credit hours: 48 hours of coursework and 6 hours of electives (spread over 4 years of study), a minimum of 12 credit hours of dissertation research, and a minimum 12-credit-hour internship.

Prior to dissertation research, the comprehensive examination will cover material from the three areas of study: computer science, mathematics, and statistics.

Successful applicants will have a master’s degree in a computational field, Calculus I and II, programming experience, and modeling experience; they are also encouraged to have a Base SAS certification.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $5,328 total (Georgia Resident), $19,188 total (Non-resident)

New Jersey Institute of Technology – Newark, New Jersey

PhD in Business Data Science

Students may enter the PhD program in business data science at the New Jersey Institute of Technology with either a relevant bachelor’s or master’s degree. Students with bachelor’s degrees begin with 36 credits of advanced courses, and those with master’s degrees take 18 credits, before moving on to dissertation research.

Core courses include business research methods, data mining and analysis, data management system design, statistical computing with SAS and R, and regression analysis.

Students take qualifying examinations at the end of years 1 and 2, and must defend their dissertations successfully by the end of year 6.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $21,932 total (New Jersey Resident), $32,426 total (Non-resident)

New York University – New York, New York

PhD in Data Science

Doctoral candidates in data science at New York University must complete 72 credit hours, pass a comprehensive and qualifying exam, and defend a dissertation within 10 years of entering the program.

Required courses include an introduction to data science, probability and statistics for data science, machine learning and computational statistics, big data, and inference and representation.

Applicants must have an undergraduate or master’s degree in fields such as mathematics, statistics, computer science, engineering, or other scientific disciplines. Experience with calculus, probability, statistics, and computer programming is also required.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $37,332 per year

Northcentral University – San Diego, California

PhD in Data Science-TIM

Northcentral University offers a PhD in technology and innovation management with a specialization in data science.

The program requires 60 credit hours, including 6-7 core courses, 3 research courses, a PhD portfolio, and 4 dissertation courses.

The data science specialization requires 6 courses: data mining, knowledge management, quantitative methods for data analytics and business intelligence, data visualization, predicting the future, and big data integration.

Applicants must already hold a master’s degree.

Delivery Method: Online
GRE: Required
2022-2023 Tuition: $16,794 total

Stevens Institute of Technology – Hoboken, New Jersey

Ph.D. in Data Science

Stevens Institute of Technology has developed a data science Ph.D. program designed to help graduates become innovators in the field.

The rigorous curriculum emphasizes mathematical and statistical modeling, machine learning, computational systems and data management.

The program is directed by Dr. Ted Stohr, a recognized thought leader in the information systems, operations and business process management arenas.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $39,408 per year

University at Buffalo – Buffalo, New York

PhD Computational and Data-Enabled Science and Engineering

The curriculum for the University at Buffalo’s PhD in computational and data-enabled science and engineering centers on three areas: data science, applied mathematics and numerical methods, and high-performance and data-intensive computing. 9 credits of coursework must be completed in each of these three areas. Altogether, the program consists of 72 credit hours and should be completed in 4-5 years. A master’s degree is required for admission; courses taken during the master’s may count toward some of the core coursework requirements.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $11,310 per year (New York Resident), $23,100 per year (Non-resident)

University of Colorado Denver – Denver, Colorado

PhD in Big Data Science and Engineering

The University of Colorado – Denver offers a unique program for those students who have already received admission to the computer science and information systems PhD program.

The Big Data Science and Engineering (BDSE) program is a PhD fellowship program that allows selected students to pursue research in the area of big data science and engineering. This new fellowship program was created to train more computer scientists in data science application fields such as health informatics, geosciences, precision and personalized medicine, business analytics, and smart cities and cybersecurity.

Students in the doctoral program must complete 30 credit hours of computer science classes beyond a master’s level, and 30 credit hours of dissertation research.

The BDSE fellowship requires students to have two advisors: one in a core discipline (either computer science or mathematics and statistics) and one in an application discipline (medicine and public health, business, or geosciences).

In addition, the fellowship covers a full stipend, tuition, and fees of up to about $50,000 annually for BDSE fellows.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $55,260 total

University of Maryland – College Park, Maryland

PhD in Information Studies

Data science is a potential research area for doctoral candidates in information studies at the University of Maryland – College Park. This includes big data, data analytics, and data mining.

Applicants for the PhD must have taken the following courses in undergraduate studies: programming languages, data structures, design and analysis of computer algorithms, calculus I and II, and linear algebra.

Students must complete 6 qualifying courses, 2 elective graduate courses, and at least 12 credit hours of dissertation research.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $16,238 total (Maryland Resident), $35,388 total (Non-resident)

University of Massachusetts Boston – Boston, Massachusetts

PhD in Business Administration – Information Systems for Data Science Track

The University of Massachusetts – Boston offers a PhD in information systems for data science. As this is a business degree, students must complete coursework in their first two years with a focus on data for business; for example, taking courses such as business in context: markets, technologies, and societies.

Students must take and pass qualifying exams at the end of year 1, comprehensive exams at the end of year 2, and defend their theses at the end of year 4.

Those with a degree in statistics, economics, math, computer science, management sciences, information systems, and other related fields are especially encouraged, though a quantitative degree is not necessary.

Students accepted by the program are ordinarily offered full tuition credits and a stipend ($25,000 per year) to cover educational expenses and help defray living costs for up to three years of study.

During the first two years of coursework, students are assigned to a faculty member as research assistants; in the third year, they are engaged in instructional activities. Funding for the fourth year is merit-based and comes from a limited pool of program funds.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $18,894 total (in-state), $36,879 (out-of-state)

University of Nevada Reno – Reno, Nevada

PhD in Statistics and Data Science

The University of Nevada – Reno’s doctoral program in statistics and data science comprises 72 credit hours to be completed over the course of 4-5 years. Coursework is entirely within the scope of statistics, with titles such as statistical theory, probability theory, linear models, multivariate analysis, statistical learning, statistical computing, and time series analysis.

The completion of a Master’s degree in mathematics or statistics prior to enrollment in the doctoral program is strongly recommended, but not required.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $5,814 total (in-state), $22,356 (out-of-state)

University of Southern California – Los Angeles, California

PhD in Data Sciences & Operations

USC Marshall School of Business offers a PhD in data sciences and operations to be completed in 5 years.

Students can choose either a track in operations management or in statistics. Both tracks require 4 courses in fall and spring of the first 2 years, as well as a research paper and courses during the summers. Year 3 is devoted to dissertation preparation and year 4 and/or 5 to dissertation defense.

A bachelor’s degree is necessary for application, but no field or further experience is required.

Students should complete 60 units of coursework. If students are admitted with Advanced Standing (e.g., a master’s degree in an appropriate field), this requirement may be reduced to 40 units.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $63,468 total

University of Tennessee-Knoxville – Knoxville, Tennessee

The Data Science and Engineering PhD

The data science and engineering PhD at the University of Tennessee – Knoxville requires 36 hours of coursework and 36 hours of dissertation research. For those entering with an MS degree, only 24 hours of course work is required.

The core curriculum includes work in statistics, machine learning, and scripting languages and is enhanced by 6 hours in courses that focus either on policy issues related to data, or technology entrepreneurship.

Students must also choose a knowledge specialization in one of these fields: health and biological sciences, advanced manufacturing, materials science, environmental and climate science, transportation science, national security, urban systems science, and advanced data science.

Applicants must have a bachelor’s or master’s degree in engineering or a scientific field. 

All admitted students are supported by a research fellowship that includes tuition.

Many students perform research with scientists from Oak Ridge National Laboratory, which is about a 30-minute drive from campus.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $11,468 total (Tennessee Resident), $29,656 total (Non-resident)

University of Vermont – Burlington, Vermont

Complex Systems and Data Science (CSDS), PhD

Through the College of Engineering and Mathematical Sciences, the Complex Systems and Data Science (CSDS) PhD program is pan-disciplinary and provides computational and theoretical training. Students may customize the program depending on their chosen area of focus.

Students in this program work in research groups across campus.

Core courses include Data Science, Principles of Complex Systems and Modeling Complex Systems. Elective courses include Machine Learning, Complex Networks, Evolutionary Computation, Human/Computer Interaction, and Data Mining.

The program requires at least 75 credits to graduate, with approval by the student’s graduate studies committee.

Delivery Method: Campus
GRE: Not Required
2022-2023 Tuition: $12,204 total (Vermont Resident), $30,960 total (Non-resident)

University of Washington Seattle Campus – Seattle, Washington

PhD in Big Data and Data Science

The University of Washington’s PhD program in data science has 2 key goals: training of new data scientists and cyberinfrastructure development, i.e., development of open-source tools and services that scientists around the world can use for big data analysis.

Students must take core courses in data management, machine learning, data visualization, and statistics.

Students are also required to complete at least one internship that covers practical work in big data.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $17,004 per year (Washington resident), $30,477 (non-resident)

University of Wisconsin-Madison – Madison, Wisconsin

PhD in Biomedical Data Science

The PhD program in Biomedical Data Science offered by the Department of Biostatistics and Medical Informatics at UW-Madison is unique in blending the best of statistics and computer science, biostatistics and biomedical informatics.

Students complete three year-long course sequences in biostatistics theory and methods, computer science/informatics, and a specialized sequence to fit their interests.

Students also complete three research rotations within their first two years in the program, to both expand their breadth of knowledge and assist in identifying a research advisor.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $10,728 total (in-state), $24,054 total (out-of-state)

Vanderbilt University – Nashville, Tennessee

Data Science Track of the BMI PhD Program

The PhD in biomedical informatics at Vanderbilt has the option of a data science track.

Students complete courses in the areas of biomedical informatics (3 courses), computer science (4 courses), statistical methods (4 courses), and biomedical science (2 courses). Students are expected to complete core courses and defend their dissertations within 5 years of beginning the program.

Applicants must have a bachelor’s degree in computer science, engineering, biology, biochemistry, nursing, mathematics, statistics, physics, information management, or some other health-related field.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $53,160 per year

Washington University in St. Louis – St. Louis, Missouri

Doctorate in Computational & Data Sciences

Washington University now offers an interdisciplinary Ph.D. in Computational & Data Sciences where students can choose from one of four tracks (Computational Methodologies, Political Science, Psychological & Brain Sciences, or Social Work & Public Health).

Students are fully funded and will receive a stipend for at least five years contingent on making sufficient progress in the program.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $59,420 total

Worcester Polytechnic Institute – Worcester, Massachusetts

PhD in Data Science

The PhD in data science at Worcester Polytechnic Institute focuses on 5 areas: integrative data science, business intelligence and case studies, data access and management, data analytics and mining, and mathematical analysis.

Students first complete a master’s in data science, and then complete 60 credit hours beyond the master’s, including 30 credit hours of research.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $28,980 per year

Yale University – New Haven, Connecticut

PhD Program – Department of Stats and Data Science

The PhD in statistics and data science at Yale University offers broad training in the areas of statistical theory, probability theory, stochastic processes, asymptotics, information theory, machine learning, data analysis, statistical computing, and graphical methods. Students complete 12 courses in the first year in these topics.

Students are required to teach one course each semester of their third and fourth years.

Most students complete and defend their dissertations in their fifth year.

Applicants should have an educational background in statistics, with an undergraduate major in statistics, mathematics, computer science, or similar field.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $46,900 total

UW Bioengineering

PhD Data Science Option

Data Science Option

PhD students have the opportunity to pursue their PhD with a Data Science option. The Data Science option prepares the next generation of thought leaders to both apply new data science methods and build new data science tools. It recognizes Ph.D. students whose thesis work focuses specifically on advanced data science tools and provides an advanced education to students who will push the state-of-the-art in data science methods, such as developing new machine learning methods.

Students enrolled in this option can expect to interact with students in similar programs in genome sciences, statistics, oceanography, computer science & engineering and astronomy. The Data Science option is an official UW degree option which will be part of your degree title and appear on your transcript.

The Data Science Option overlays our standard course requirements. In other words, students must satisfy the universal PhD Curriculum requirements, in addition to the Data Science Option requirements. This may impose some extra constraints on course selection. However, note that some of the required data science courses may substitute for required electives.

Required Curriculum:

  • 2 credits of eScience seminar, currently offered as CHEME 599, Topics in Data Science (1 credit per enrollment, CR/NC, so it is taken twice).
  • Three courses totaling 9-14 credits from three of the five categories listed below: Scientific Computing, Statistics and Machine Learning, Big Data and Image Processing, Data Visualization, and Data Science. Each category listed includes approved courses offered within Bioengineering and other departments on the UW Seattle campus. The large number of approved courses reflects both the breadth of the data science field, and the fact that space in non-Bioengineering courses is sometimes limited.

Scientific Computing

PhD students who choose to enroll in the Data Science Option must have the approval of their research advisor. Email this approval to the Graduate Program Advisor, Kalei Combs, [email protected]. There is no additional admission procedure.

Computational and Systems Biology

  • Biological Data Sciences Concentration

IMPORTANT: Effective Winter 2022 (22W), the concentrations have been replaced with tracks. For students admitted to the major (not the pre-major) in 22W or later, more information on the new track curriculum can be found here.

The Biological Data Sciences concentration tackles a diverse set of biological questions, ranging from medicine, genomics, physiology, pharmacology, and neuroscience to ecology and evolution, using recent tools and advances in mathematics and computation (specifically machine learning, statistical data sciences, and informatics). This concentration leverages new and developing courses within CaSB and across UCLA, and it will greatly aid students aiming to go directly into industry (biotech, pharmaceuticals, and more) as well as those heading to computational biology graduate school. The concentration has a strong focus on, and deep integration with, the life sciences.

Concentration Curriculum

1. COM SCI CM121: Introduction to Bioinformatics (4)

2. COM SCI 180: Introduction to Algorithms and Complexity (4)

3. COM SCI M146: Introduction to Machine Learning (4) OR STATS 161: Introduction to Pattern Recognition and Machine Learning (4) OR STATS 101C: Introduction to Statistical Models and Data Mining (4)

4 & 5. Two elective courses chosen from the list below:

Students in the Biological Data Sciences concentration must also complete:

  • Computer Science 32

Concentration Course Planning

When planning major coursework, students must be mindful of prerequisites. Some courses for the Biological Data Sciences concentration have additional prerequisites that are not part of the CaSB major or pre-major curriculum. The flowcharts below are meant to help students plan out concentration coursework by depicting the requisites for each requirement. These flowcharts were last updated in September 2021. Always check the Registrar’s Course Descriptions for updated requisites. Additionally, students must be mindful of when classes are offered (i.e., in which quarters). Students should check the Schedule of Classes for updated course offerings.

It is recommended that students meet with their Departmental Counselor regularly to plan out major coursework.


Requisites vary based on chosen courses. Check the Registrar’s Course Descriptions for updated requisites.

Other Important Information

All major courses must be taken for a letter grade and passed with a C or better. Note: CaSB made a temporary exception allowing pre-major courses taken between Spring 2020 and Summer 2021 to be taken for a Pass grade. More details on this exception can be found here.

Students must have a minimum 2.0 GPA in upper-division major coursework to graduate.

Students who receive a C- or below in a major course must either repeat the course or petition to have the lower grade count for the major. More information on petitioning can be found here.

Students are subject to any requirement changes in the major, including concentrations, until they are officially admitted to the major.


Can Biologists Become Data Scientists? (Answered!)

This post may contain paid links to my personal recommendations that help to support the site!

You’re an established scientist in the biological domain and you’re keen to know how you can transition from biologist to data scientist.

Here’s a short answer:

Yes, biologists can transition to become data scientists. Biology is becoming an increasingly quantitative and data-heavy domain. Fields that quantify life through data, such as biomedical data science, bioinformatics, and computational biology, make the move from biology to data science smoother.


When I say that biologists can transition, I mean that there are real areas of overlap between the two fields.

I’m honestly no expert in either field, but I have done some work in each.

While observing work in both areas, I noticed several aspects where biology overlaps with data science.

If you’re a biologist aspiring to enter data science, do check out the three reasons below!

Why Can Biologists Transition to Data Science?

1. Biological Data is Complex


Biology is very much the study of life, exploring all the different systems in your body that react and respond to each other to make you, you!

To look into these systems and study them, one requires a strong understanding of complex systems and how they can be explained through the data collected.

DNA is so information-dense that a single gram of it could, in principle, store 455 exabytes of data, according to New Scientist.

Studying the intricate complexity hidden within biological data requires a strong understanding of how to properly clean, manipulate, and preserve the right data to reach sound insights.

These are some of the areas where a biologist’s experience can be a real asset when working in data science.


2. Biology is Becoming Increasingly Computational

Although biologists may seem to lack transferable skills, they actually have several skills that make them good candidates for data science.

Of course, the transition will not happen overnight, and deliberately developing these skills will help make it much smoother.

Building on the previous point, working with biological data requires much the same data manipulation toolkit that a data scientist uses.

To explore data, biologists often use programming languages such as Python and R for their flexibility in running statistical tests.
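As a minimal illustration of the kind of statistical test mentioned above, the Python sketch below computes Welch's t-statistic and its approximate (Welch-Satterthwaite) degrees of freedom for one gene measured in two groups, using only the standard library. The expression values are hypothetical and the function name `welch_t_test` is illustrative; in practice, a library routine such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` also returns the p-value.

```python
import math
import statistics


def welch_t_test(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Squared standard errors: sample variance (n - 1 denominator) / group size.
    se2_a = statistics.variance(group_a) / len(group_a)
    se2_b = statistics.variance(group_b) / len(group_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a**2 / (len(group_a) - 1) + se2_b**2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical expression values for one gene (arbitrary units).
control = [5.1, 4.9, 5.3, 5.0]
treated = [6.2, 6.0, 6.4, 6.1]

t_stat, dof = welch_t_test(control, treated)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large negative t here simply says the treated group's mean is well above the control mean relative to the spread within each group.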

Want to know more about Python in biology?

Check out this other article I wrote on Python and Biology!

Python & Biology: Here’s 15 KEY Things You SHOULD Know!

3. Scientific Thinking


What’s the main similarity between a biologist and a data scientist, you say? Based on what I’ve observed from my interactions with both sides, they’re both scientists! That means that the same scientific thinking is present in both fields of work.

For example, the scientific method of forming an initial hypothesis and then testing it is a common thought process in both fields.

If you’re a biologist, you already know what’s next. The data has to be collected through some means and stored somewhere.

When that’s complete you’d have to pull up these data and have a look at them for initial data exploration.

Then comes the cleaning and removal of possible outliers due to error. Lastly, all the data is put together in digestible charts and put in a report. Now, does that all sound similar to the data science pipeline?

For comparison, here’s an article I found about the data science pipeline. It covers many points, but I’d like to highlight how similar its overall flow is:

Acquiring data > Cleaning data > Data exploration > Data modeling > Data interpretation

What Areas Connect Biology to Data Science?

1. Bioinformatics

According to this article I found in Science, Medicine, and the Future, bioinformatics can be seen as the application of computational approaches to better understand biological data.

I would say that this field is still very much biology-focused in terms of results.

The difference is that its methods, such as cleaning and processing data, are similar to those of data science.

You would likely be looking at large gene datasets, mining for possible biological insights that are useful in understanding how genes work.

Of course, the data is not limited to genes; it can also include protein expression levels.

Additionally, some bioinformaticians work on creating the algorithms behind bioinformatics tools for processing data.
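To give a flavor of this kind of data processing, here is a small, hypothetical example: a minimal FASTA parser and a GC-content calculation, two routine steps when working with sequence data. The function names and sequences are made up for illustration; dedicated libraries such as Biopython (`Bio.SeqIO`) handle FASTA parsing more robustly in real projects.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical two-record FASTA file (sequences may wrap across lines).
FASTA = """>gene_a
ATGCGC
GCTA
>gene_b
ATATATTA
"""

for name, seq in parse_fasta(FASTA):
    print(name, round(gc_content(seq), 2))
```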

Based on my conversations with a few bioinformaticians, this small field sits right at the intersection of data science, biology, and computer science.

This makes it an ideal transition field for a general biologist stepping into data science.

You’re going to want to learn some R and Bash to get some substantial work done in this area.

Learning R for the first time?

Find out how long it takes to learn R in my other article here.

How Long Does it Take to Learn R? (Answered!)

2. Biomedical Data Science

According to an article I found, biomedical data science is a rather new term coined by data scientists who work on biomedical data.

Essentially, it describes an intersection where biological and medical knowledge is combined with data science to better uncover insights from data for use in biomedicine.

Much of the data is used to drive improvements in healthcare, which separates this area from bioinformatics, an older field. I would say that this area is still growing, with more universities expanding their departments.

For example, Harvard University started its research Department in Biomedical Data Science in 2015 to meet this demand.

Nanyang Technological University, Singapore, has also recently set up a Master’s course in Biomedical Data Science to meet the same demand.

If you’re looking for a field that can still make use of your previous biomedical knowledge, here’s a field you might want to look into!

I’d recommend starting with the IBM Data Science Professional Certificate to get started!

Read my review of it here.

IBM Data Science Certificate: Worth It? (Read THIS First!)

Or if you’re thinking of going the data analytics route, you can use these 9 smarter ways to get started!

Learning Data Analytics: 9 SMARTER Ways to Get Started!

3. Computational Biology


Computational biologists use mathematical models to predict outcomes and to understand how molecules interact.

Typical applications in this field include drug development and the study of protein interactions.

Due to the quantitative nature of this field, a strong mathematics background is essential. This is also a great asset when approaching data science, especially when working with machine learning models.
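As a simple example of the kind of mathematical model mentioned above, the sketch below uses the textbook equation for equilibrium 1:1 ligand binding, where the fraction of receptor bound is θ = [L] / (Kd + [L]). It assumes a single binding site and a free ligand concentration roughly equal to the amount added; the Kd value and doses are hypothetical. Real computational biology models are usually far richer, but they build on the same idea of predicting an outcome from measured parameters.

```python
def fraction_bound(ligand_conc, kd):
    """Equilibrium fraction of receptor occupied by ligand (simple 1:1 binding).

    Assumes one binding site and ligand in excess: theta = L / (Kd + L).
    """
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc / (kd + ligand_conc)


# Hypothetical drug with Kd = 10 nM, evaluated at several doses (nM).
KD_NM = 10.0
for dose in (1.0, 10.0, 90.0):
    print(f"{dose:>5} nM -> {fraction_bound(dose, KD_NM):.0%} bound")
```

At a dose equal to Kd, half the receptors are occupied, which is why Kd is a convenient single number for comparing candidate drugs.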


How Can a Biologist Transition to Data Science?


1. Work on Basic Statistics and Programming Exercises

For starters, you might want to work through some basic statistics and coding exercises to build your technical skills, which will be useful in technical interviews.

Some really good places to start learning are DataCamp and Coursera. I would recommend taking the introductory courses and working through the basic practice questions to build solid foundations.

With these two online learning sites, you should be able to find a wide variety of topics on data science. I would personally go for the Python, R, and Basic Statistics courses.

Personally, I took the Google Data Analytics Professional Certificate to get extra training in R programming. You can check out my review of that course over here.

Google Data Analytics Certification: An Honest Review (2023)

2. Apply Computational Methods To Your Project

“For the things we have to learn before we can do them, we learn by doing them.” ― Aristotle, The Nicomachean Ethics

I have always believed that learning by doing is by far one of the fastest ways to learn. Rather than just staring at boring lecture content, try applying some of that newfound knowledge to the project you’re in.

For example, if you’re doing wet-lab benchwork for your biology project, try to incorporate some programming scripts in your data analysis after the data collection is done.

You can pick up some useful languages such as R and Bash much faster through this method.
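For instance, after a wet-lab experiment you might replace manual spreadsheet work with a short script that summarizes replicate measurements. The sketch below is in Python for consistency with the other examples here (the same idea carries over to R); the CSV layout, column names, and values are hypothetical. In a real project you would read the instrument export from a file instead of an inline string.

```python
import csv
import io
import statistics

# Hypothetical plate-reader export: one row per well.
RAW_CSV = """sample,replicate,od600
control,1,0.42
control,2,0.45
control,3,0.44
mutant,1,0.31
mutant,2,0.29
mutant,3,0.33
"""


def summarize(csv_text):
    """Group readings by sample and return {sample: (mean, standard deviation)}."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["sample"], []).append(float(row["od600"]))
    return {s: (statistics.mean(v), statistics.stdev(v)) for s, v in groups.items()}


for sample, (mean, sd) in summarize(RAW_CSV).items():
    print(f"{sample}: mean={mean:.3f}, sd={sd:.3f}")
```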

Want to know how long a data science project would take?

Read this article to find out!

Data Science Projects: Here’s How LONG They Take!

3. Learn from YouTube

If you’re very much a visual learner like me, or if you need someone to guide you through your very first project in bioinformatics or data science, you should look to YouTube for learning content.

What I would personally recommend is the Data Professor YouTube channel. This channel is run by a bioinformatics professor who transitioned from biology into data science, and you should really check his channel out.

Here’s a video that I think would help you out in your transition to data science.

Final Thoughts

The field of data science is still rather new and the entry requirements might still be flexible at this point (2020). That means that you are definitely able to make that transition out of biology if you take the necessary training seriously and start applying data science approaches in your daily biology work. Thanks for reading!

My Favorite Learning Resources:

Here are some of the learning resources I’ve personally found to be useful as a data analyst and I hope you find them useful too!

These may contain affiliate links and I earn a commission from them if you use them.

However, I’d honestly recommend them to my juniors, friends, or even my family!

My Recommended Learning Platforms!

My recommended online courses + books.

To see all of my most up-to-date recommendations, check out this resource I’ve put together for you here.


I'm a tech nerd, data analyst, and data scientist hungry to learn new skills, tools, and software. I love sharing content with my years of experience in data science, marketing, and tech startups.

Genomics Proteomics Bioinformatics, v.18(1); 2020 Feb

The Birth of Bio-data Science: Trends, Expectations, and Applications

Wilson Wen Bin Goh

1 School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore

Limsoon Wong

2 Department of Computer Science, National University of Singapore, Singapore 117417, Singapore

Components of bio-data science

Biology is becoming increasingly digitized and has now taken on the sheen of a quantitative scientific discipline. A key driving factor is the increasing pervasiveness of high-throughput technological platforms in biological research, allowing millions of data points on genes, proteins, and other biological moieties across thousands of tissues and organisms to be compiled, cleaned, stored, and integrated for the purpose of systematic studies. In this data-rich landscape, it is not an exaggeration to say that the future of biological (and where deployed on clinical samples, biomedical) research lies in strategic maximization of data.

Big bio-data is not a distant fantasy. Not only have we already been living in the age of big bio-data, but biological data is also being generated and accrued at an increasingly accelerated pace. Between 1990 and 2003, unraveling the human genome cost approximately $2.7 billion and took more than a decade, with many teams involved [1]. By 2016, the same experiment cost less than $1500 and required only an afternoon within a single laboratory. Similarly, mapping a single tomato genome initially took an international consortium 5 years [2]; today, 150 different tomato genomes may be completed within a year [3]. The big bio-data landscape has also spurred the development of big data management systems such as the Expression Atlas [4] and the PRoteomics IDEntifications (PRIDE) database [5].
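Taking the two whole-genome cost figures above at face value, the size of the drop works out as follows (because the 2016 cost is an upper bound, the ratio is a lower bound on the reduction):

```latex
\frac{\$2.7 \times 10^{9}}{\$1.5 \times 10^{3}} = 1.8 \times 10^{6}
```

That is, the cost of sequencing a human genome fell by more than a factor of 1.8 million over roughly that period.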

The rise of big bio-data needs to be leveraged for understanding diseases and improving health. Problems in the generation, management, analysis, visualization, and interpretation of data should assume a leading role, requiring a paradigm shift in attitude and know-how. Moreover, addressing larger data volumes requires advances in database management platforms and improved algorithm efficiency. Where large amounts of data are accrued, issues of veracity and complexity also emerge and need to be tackled with more urgency than ever. Traditional disciplines such as bioinformatics and computational biology are now more challenged than ever. In today’s technological landscape, data science and artificial intelligence (AI) have already acted as innovation drivers in areas such as business and finance, where data scientists take the helm in converting data into practicable insights instead of working behind the scenes in operations. Examples include AI-driven algorithmic trading and stock recommendation systems in financial technology (fintech), and automated engine design, system maintenance, and robotics in engineering. Given the recent data explosion and concomitant advances in data science in other disciplines such as business, finance, and computing, we predict that, alongside the rapid and voluminous generation of biological data, a new variant of data science that specifically addresses domain-specific issues pertinent to biology will emerge. We term this variant of data science “bio-data science (BDS).”

BDS comprises three core disciplinary areas: biology (which constitutes the application domain), computer science, as well as mathematics and statistics (Figure 1). The biology core area is concerned with questions regarding biological origin, such as the cause of a disease or understanding the diagnostic utility of an inferred biomarker. The computer science core area is concerned with devising appropriate algorithms for problem-solving, dealing with repetition (e.g., running the same algorithm on large subsets of data many times over), and resolving data storage issues, especially if the data to be analyzed is large. The mathematics and statistics core area is concerned with issues such as data summarization, normalization, and modeling. Although descriptive and exploratory statistical data analysis is by no means unique to BDS (also being an essential component of biostatistics and, to a lesser degree, bioinformatics), BDS has an added focus on prediction using emerging technology based on applying AI/machine learning (ML) on big data.

Figure 1. The core areas of bio-data science

Bio-data science may be split into 3 core areas. The theory is supplied by mathematics and statistics and put into action by computers via computer science, and both are applied within the biology domain. ML, machine learning; AI, artificial intelligence.

Thinking of BDS additively in terms of the disciplinary cores is a mistake. BDS is more than the sum of its parts. Data science is often likened to storytelling with data, and telling a good story requires in-depth domain knowledge, so that the domain’s idiosyncrasies are carefully considered during data interpretation. In other words, BDS requires synergy amongst its disciplinary core areas. To give an example of the importance of domain knowledge and its synergy with statistics: proteins do not operate independently but rather as functional units called protein complexes. For a complex to function, its components must be tightly co-expressed, so that the complex can form in the first place. However, when we interpret a matrix of gene or protein expression from a purely statistical viewpoint, we mistakenly assume that each gene or protein operates independently of the others, a fundamental assumption of many statistical tests. This means that when we try to limit false positive rates, we make corrections based on the total number of genes being considered, even though the genes are not independent of each other (e.g., two proteins in a protein complex tend to be correlated in their expression profiles). Assuming independence results in overcorrection, causing loss of statistical power. In such cases, a more reasonable approach would have been to make corrections based on the potential number of protein complexes that can be formed instead [6], [7], [8]. Therefore, the biological domain does not merely create the questions that need to be answered; it also provides constraints that must be understood and incorporated to create robust models.
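To make the overcorrection argument concrete, the sketch below uses the simple Bonferroni correction (a per-test cutoff of α / m for m tests) as a stand-in; it is not the specific method of references [6], [7], [8], which are more involved. The gene count, complex count, and p-value are hypothetical numbers chosen only to show how the number of tests drives the cutoff.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff that controls family-wise error at alpha."""
    return alpha / n_tests


ALPHA = 0.05
N_GENES = 20_000      # hypothetical: every gene treated as an independent test
N_COMPLEXES = 1_000   # hypothetical: tests counted per candidate protein complex

p_value = 1e-5        # hypothetical p-value for one finding

gene_cutoff = bonferroni_threshold(ALPHA, N_GENES)         # 0.05 / 20,000
complex_cutoff = bonferroni_threshold(ALPHA, N_COMPLEXES)  # 0.05 / 1,000

print(f"gene-level cutoff {gene_cutoff:.1e}: significant = {p_value < gene_cutoff}")
print(f"complex-level cutoff {complex_cutoff:.1e}: significant = {p_value < complex_cutoff}")
```

With 20,000 tests the finding is rejected; with 1,000 tests it passes, illustrating how counting correlated genes as separate tests costs statistical power.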

We may also categorize BDS by analytical outcomes. Borrowing from Gartner (www.gartner.com), data science outcomes may be categorized into four levels in order of difficulty and value: descriptive, diagnostic, predictive, and prescriptive. We have summarized these outcomes and levels in Table 1.

Table 1. The four levels of a bio-data science analysis goal or achieved outcome

Note : ML, machine learning; AI, artificial intelligence.

Currently, most modern-day investigations are at the first two levels. Descriptive analytics is concerned with simple data exploration and data description by plotting basic graphs such as pie charts and line graphs, as well as calculating simple statistics such as mean and median. Diagnostic analytics goes a step further and is concerned with identifying potential underlying causes that can explain why something happens. For example, if the stock market crashes today, we may examine existing data to identify potential causes. It may so happen that a political crisis occurs somewhere else. We know that, in general, political uncertainty leads towards economic instability; so this is a potential explanation, even though it may not, in fact, be the correct explanation. The purpose of diagnostic analytics is to attempt, within the constraints of the available evidence, to figure out the true cause. To see why it is so hard to determine the true cause of, say, a stock market crash: we only have empirical data showing past correlations linking uncertainty and market crashes, and it may simply be that these two phenomena tend to happen together. The more direct way to determine the true cause with certainty is to test for causality; however, it would be unethical and infeasible to deliberately cause a crisis just to observe its impact on the stock market.

Predictive analytics is concerned with translating what we currently know into judgements on future phenomena. Unlike diagnostic analytics, which retrospectively analyzes the possible explanatory causes, predictive analytics goes a step further and attempts to predict a phenomenon before it happens. In order to do so, it needs a good grasp of the potential causes and appropriate indicators. But this is all it requires: a good grasp of the causes and indicators. It may be able to predict that something will happen; however, without knowing how the causes and indicators actually work together, it is helpless to change what will eventually happen.

Being able to control outcomes is the realm of prescriptive analytics. Here, a good grasp of the causes and indicators is not enough. Prescriptive analytics demands that you know how the causes work together, and how changes in specific factors will produce a change consistent with the desired outcome. When working with complex systems, prescriptive analytics is incredibly difficult to achieve, but it is also powerful. Prescriptive analytics requires a deep and detailed understanding of the system. In a complex system where many alternative pathways exist, several factors need to be targeted simultaneously in order to achieve an intended effect. Hence, network modeling in biology has proven to be especially vital for prescriptive analytics [9].
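The point about alternative pathways can be illustrated with a toy network. The sketch below is hypothetical: a receptor signals to a phenotype through two parallel routes, and a breadth-first search checks whether the signal still gets through after a set of nodes is knocked out. It is not a model from reference [9], only an illustration of why a single target can fail when redundant paths exist.

```python
from collections import deque

# Hypothetical signaling network with two parallel routes (upstream -> downstream).
NETWORK = {
    "receptor": ["kinase_a", "kinase_b"],
    "kinase_a": ["tf_a"],
    "kinase_b": ["tf_b"],
    "tf_a": ["phenotype"],
    "tf_b": ["phenotype"],
    "phenotype": [],
}


def reaches(graph, source, target, knocked_out=frozenset()):
    """Breadth-first search: can a signal travel from source to target
    when the nodes in `knocked_out` are removed?"""
    if source in knocked_out:
        return False
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in knocked_out:
                seen.add(nxt)
                queue.append(nxt)
    return False


for targets in ({"kinase_a"}, {"kinase_b"}, {"kinase_a", "kinase_b"}):
    blocked = not reaches(NETWORK, "receptor", "phenotype", frozenset(targets))
    print(sorted(targets), "blocks phenotype:", blocked)
```

Knocking out either kinase alone leaves the other route intact, so only the combined knockout blocks the phenotype.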

Categorization of BDS by core area or by outcome is useful for theoretical discourse but otherwise has limited practical value. Moreover, in the case of core areas, notions of what should constitute core skills and expertise for data scientists are rapidly evolving. As we enter the “third wave” (at the time of writing), strategic and leadership skills are increasingly touted as critical areas for enablement and empowerment, which is hardly surprising: without charm or charisma, it is difficult to convince other stakeholders to act on advice. As bio-data scientists are probably less concerned with the exigent needs of the business sector, such revisions in the core skill set are useful but nonessential. We do hold the viewpoint that, regardless of anyone’s beliefs regarding what should be a disciplinary core area, being an effective bio-data scientist is less about what one knows than about what one does with it. Therefore, the emergent skills and behaviors that arise from such divergent multidisciplinary training are more important than the core content (knowledge and skills) itself. We also cannot emphasize enough that to be an effective bio-data scientist, it is critical to leverage idiosyncrasies and informative contexts drawn from domain knowledge and to use these creatively for problem-solving.

As far as analytical outcome level is concerned, there are also some gray areas. For example, descriptive analytics may also involve denoising and normalization approaches to some extent without the use of any correlation analysis. Also, an intended “prescriptive” analysis may fall short, perhaps due to unresolvable technical errors or other reasons, such that the predictive model cannot generalize and therefore has to be abandoned at the “diagnostic” level.

Ultimately, these divisions and classifications, no matter by disciplinary component or by analytical outcome, are arbitrary.

Despite its seemingly “new” status, BDS is ultimately a science of inquiry, and in this respect, not different from any typical scientific investigation. In the example shown in Figure 2, as a simplified mode of BDS inquiry, we may use the following seven steps to help us answer the question of whether alterations in gene expression correlate meaningfully with mental states. The main difference is that BDS requires strong ability in meaningful data manipulation and analysis, with less emphasis on lower-throughput or underpowered physical experiments.

Figure 2. A bio-data science inquiry requires a well-defined question

Data science is like any other scientific pursuit. It can involve first choosing a question to investigate. We can then scope this question by identifying a relevant hypothesis, which is testable. Appropriate experiments for obtaining data to test the hypothesis can then be designed and fielded. We then determine the results and assess their validity, that is, whether the data is suitable for answering the research question. Finally, we deploy the model and see if our findings are repeatable.

Bioinformaticians and computational biologists can be bio-data scientists

We define BDS as the application of data science principles and associated technologies for deriving insights from bio-data. This has important implications for drug development, personalized medicine, automated diagnosis, and health service monitoring systems. Currently, some bioinformaticians, depending on their scope and/or research question, already function as bio-data scientists.

Bioinformatics is the application of information technology (IT) and computer science (CS) to biology. It emerged and evolved in response to the growth of digital biological information, which creates new analytical problems. For example, when full-length DNA or protein sequences became more common, data storage, organization, and representations emerged, paving the way toward pioneering databases such as Dayhoff’s Atlas of protein sequence in 1966 [10]. At the dawn of the Human Genome Project (HGP) and the emergence of DNA-sequencing technologies, it was unnecessarily arduous to identify overlapping DNA fragments by eye. Such tasks are highly repetitive and can be automated by designing and implementing appropriate algorithms. Bioinformatics emerged at this time to provide support for these emerging analytical requirements. Some successes of bioinformatics include the provision of algorithms for assembling a full genome or performing highly intensive annotation tasks, such as marking approximately 10 million single nucleotide polymorphism (SNP) locations in the human genome. Bioinformatics also includes algorithms for noise removal and bias correction. This includes normalization procedures such as robust multi-array average (RMA) [11] in microarrays, base-calling [12], and gene length-based correction approaches, e.g., transcripts per million (TPM) and reads per kilobase million (RPKM) [13], in RNA sequencing.
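The two RNA-seq normalizations named above can be written out directly. The sketch below follows their usual definitions: RPKM divides counts by transcript length in kilobases and by total mapped reads in millions, while TPM first normalizes by length and then rescales so that values sum to one million per sample. The gene names, counts, and lengths are hypothetical, and real pipelines compute these from aligned reads (for paired-end data the fragment-based analogue is FPKM).

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_millions = sum(counts.values()) / 1e6
    return {g: counts[g] / (lengths_bp[g] / 1e3) / total_millions for g in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rpk = {g: counts[g] / (lengths_bp[g] / 1e3) for g in counts}
    scale = sum(rpk.values()) / 1e6
    return {g: v / scale for g, v in rpk.items()}


# Hypothetical read counts and transcript lengths (bp) for a three-gene sample.
counts = {"gene_a": 1_000, "gene_b": 2_000, "gene_c": 500}
lengths = {"gene_a": 2_000, "gene_b": 4_000, "gene_c": 1_000}

print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

Here all three genes have the same reads per kilobase, so they receive equal TPM values even though their raw counts differ. Because TPM always sums to one million within a sample, it is often preferred when comparing relative abundance across samples.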

Bioinformatics draws upon IT and CS concepts to identify suitable parallels, create reasonable models, and then solve the biological problem. In this respect, bioinformatics acts as a support discipline that solves a technical issue, so that the biologist may move forward in dissection of some biological problems, such as unraveling causal mechanisms that give rise to a phenotype. However, a bioinformatician acting in this respect does not take the lead in generating actionable interventions or building possible explanatory models that lead directly toward understanding the biological problem.

This is not to say that bioinformaticians do not care about developing models that explain biological phenomena. Certainly, within many laboratories, bioinformaticians also provide explanatory models by collaborating closely with biologists. We regard activities that require a bioinformatician to translate digitized data output into biological insight as the realm of computational biology. A bioinformatician can therefore act as a computational biologist.

Both bioinformaticians and computational biologists may act as bio-data scientists, provided they have the skill sets associated with the data science field. These include being able to tweak, optimize, and deploy ML and AI technologies, and being well trained in applied statistics. Notably, these are not currently formal training requirements for computational biologists and bioinformaticians.

Computational biologists, bioinformaticians, and bio-data scientists will occupy and share the analytical space in this new digital biology landscape. The distinctions can be muddy, but there is certainly no barrier to a skilled individual occupying all three professional spaces. Moreover, we do not think that any of the three professions will supersede the others: bioinformaticians will certainly continue to play important roles as frontline data generators and aggregators, a role that becomes increasingly important given the large volumes of data being generated. Computational biologists will evolve, and the biological questions that interest them will change as new possibilities open with the advent of big bio-data. Computational biologists may already be trained bioinformaticians, and they may cross the divide to leverage data science, thus becoming bio-data scientists themselves and using this new know-how to create new solutions to the biological problems that interest them.

Finally, bio-data scientists, whether self-professed or by professional designation, will emerge as new players. They may be existing practitioners from a bioinformatics or computational biology background. However, they may also include newcomers with non-biological backgrounds such as mathematics, physics, and engineering, or even pure data science or AI training. Just as data scientists are transforming other fields, we foresee the emergence of a new breed of bio-data scientists who will actively shape and lead the narrative for research and development direction in their chosen biological domains or disease contexts.

Drivers for BDS

BDS is accelerated by three main drivers: the emergence of big bio-data, the second coming of AI, and a revolution in statistical thinking.

Emergence of big data in biology

HGP marked the start of numerous large-scale data acquisition initiatives such as the International HapMap Project [14], 1000 Genomes [15], The Cancer Genome Atlas ( https://www.cancer.gov/tcga ), as well as the recently announced Human Brain Project [16] and the Human Proteome Project [17]. These ambitious initiatives require advances in approaches for data generation, data flow, data storage, data access, and data representation. To address this need, new cloud technologies provide powerful methods for data storage and access beyond the limitations of local hard drives [18]. Parallel/distributed computing frameworks such as Hadoop provide powerful ways of performing analysis on the cloud [19].

Today, sequence data from DNA and RNA are the major driving force for the evolution of biology into a data science. More than 2.7 million samples are currently available from the Gene Expression Omnibus database (at the time of writing: November 19, 2018). Assuming that each file is approximately 1 GB (a very modest estimate), these samples easily add up to 2.7 petabytes (PB). Improvements in sequencing technologies will accelerate this data explosion. For instance, the current Illumina HiSeq X sequencing platform can generate 900 billion nucleotides of raw DNA sequence within 3 days. It is estimated that by 2025, the storage of human genomes alone will require 2–40 exabytes (EB) [20], [21]. Besides sequence data, other sources of big bio-data are also emerging. These include medical records, phenotyping and trait-based measurements collectively referred to as the “phenome,” imaging and microscopy data, and network-based information garnered from various interaction-based experiments.
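The storage and throughput figures above are back-of-envelope arithmetic, which the short sketch below reproduces. It assumes decimal (SI) units, so 1 PB = 10^6 GB, and keeps the text's assumption of about 1 GB per sample.

```python
# Back-of-envelope figures from the text; decimal (SI) units assumed: 1 PB = 10**6 GB
GEO_SAMPLES = 2_700_000        # GEO samples as of November 19, 2018
GB_PER_SAMPLE = 1              # assumed ~1 GB per sample ("a very modest estimate")
total_pb = GEO_SAMPLES * GB_PER_SAMPLE / 10**6

HISEQ_X_BASES = 900 * 10**9    # raw nucleotides generated within 3 days (from the text)
DAYS = 3
bases_per_day = HISEQ_X_BASES / DAYS

print(f"GEO storage ~ {total_pb:.1f} PB")
print(f"HiSeq X throughput ~ {bases_per_day:.2e} bases/day")
```

This prints `GEO storage ~ 2.7 PB` and `HiSeq X throughput ~ 3.00e+11 bases/day`; a larger per-sample file size scales the storage figure linearly.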

This changing data landscape has not gone unnoticed. In a survey of 704 National Science Foundation investigators, the unanimous response was that biology is awash with big data [22]. Respondents also ranked training in the integration of multiple data types, in data management and metadata, and in scaling analysis to cloud/high-performance computing as the three greatest unmet needs critical to advancement in their research fields [22]. The problem appears to be the growing gap between the accumulation of big data and researchers’ limited knowledge of how to use it effectively [22].

Second coming of AI

Although AI is heralded as the technological game changer that will drive the digital economy of the future, this is not the first time such high expectations have been heaped on AI technology. During the 1970s–1980s, AI was expected to usher in the age of the self-driving car and other technological marvels; these unfortunately did not come to pass, eventually leading to a period known as the “AI winter” [23]. Improvements in AI-based learning platforms, particularly neural networks [24], and newly revitalized paradigms such as deep learning (DL) [25] and reinforcement learning (RL) [26] have created new opportunities and applications.

RL is loosely defined as learning through trial-and-error interaction with an environment, rather than from large amounts of perfectly labeled data. Framed as an AI system, RL is about making appropriate decisions and taking actions to maximize reward in a particular situation, possibly under specific constraints (e.g., the rules of chess).
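The simplest setting that captures “taking actions to maximize reward” is the multi-armed bandit: an agent repeatedly chooses one of several actions and learns their value only from the rewards it receives. The sketch below uses an epsilon-greedy strategy on three actions whose reward probabilities are hypothetical; a full RL problem would also involve states and delayed rewards.

```python
import random


def epsilon_greedy(true_reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn action values from reward feedback alone (no labeled data)."""
    rng = random.Random(seed)
    n_actions = len(true_reward_probs)
    counts = [0] * n_actions        # how often each action was taken
    values = [0.0] * n_actions      # running mean reward per action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                        # explore
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < true_reward_probs[action] else 0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # incremental mean
        total_reward += reward
    return counts, values, total_reward


# Hypothetical environment: three actions with unknown success probabilities
counts, values, total = epsilon_greedy([0.2, 0.5, 0.8])
print("times each action was chosen:", counts)
print("estimated values:", [round(v, 2) for v in values])
```

The agent is never told which action is best; it ends up choosing the highest-reward action most often purely from the rewards it observes, while the small exploration rate keeps the other estimates up to date.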

DL loosely refers to architectures that facilitate complex decision-making using multi-layered neural networks, not unlike the neural connections found in the human brain. DL is well suited to big datasets with high levels of complexity, as it aims to learn feature hierarchies from the data, in which higher-level features are formed by composing lower-level features. We may think of these multi-level features as abstractions that allow a system to learn from complex inputs without depending entirely on pre-defined, human-engineered features. DL is gaining great popularity in biological research, with novel applications in proteomics [27], genomics [28], and biomedicine [29], [30], [31].

While much anticipated potential has led to several high-profile tie-ups between IBM’s Watson AI and various pharmaceutical giants (see the section “Trends and expectations for BDS”), it is important to remember that AI and ML techniques are intimately connected, and ML is already commonplace in bioinformatics. Algorithms for gene finding based on hidden Markov models (HMMs), e.g., GENSCAN [32], and neural networks for motif finding [33] are just a few notable examples. A key difference between earlier AI applications and today’s is scale: AI is now expected to identify long-range patterns and perform multi-omics integration across various levels of big bio-data, such as the genome and proteome, and thereby propose mechanisms and/or testable targets.
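To illustrate how an HMM labels a sequence, the sketch below decodes a toy two-state model (non-coding “N” versus coding “C”, where the coding state is assumed to be GC-rich) with the Viterbi algorithm. All probabilities are hypothetical, and real gene finders such as GENSCAN use far richer state structures.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for seq, computed in log space to avoid underflow."""
    # v[t][s]: best log-probability of any path that ends in state s at position t
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        v.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, v[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            v[t][s] = score + math.log(emit_p[s][seq[t]])
            back[t][s] = prev  # best predecessor of state s at position t
    # Trace back from the best final state
    path = [max(states, key=lambda s: v[-1][s])]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-state model: N = non-coding (AT-rich), C = coding (GC-rich)
states = ("N", "C")
start_p = {"N": 0.5, "C": 0.5}
trans_p = {"N": {"N": 0.8, "C": 0.2}, "C": {"N": 0.2, "C": 0.8}}
emit_p = {
    "N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "C": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}
seq = "ATATATGCGCGCGCATAT"
print(seq)
print("".join(viterbi(seq, states, start_p, trans_p, emit_p)))
```

With these parameters the decoder labels the central GC run as coding (`NNNNNNCCCCCCCCNNNN`): the emission gain over eight GC positions outweighs the cost of switching states twice.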

A revolution in statistical thinking

The field of statistics is undergoing a major transformation. Scientific arguments based solely on P values are no longer viewed as sufficiently robust. For example, a replication study across leading psychology journals revealed that fewer than 50% of the studies examined could be replicated [34]. Halsey et al. demonstrate the instability and variability of P values: even as increasing sample sizes and exact replication experiments (EREs) converge on the true effect size, there is no concomitant reduction in the variability of P values [35]. Halsey et al.’s work partially explains the high non-replication rates in Ioannidis’ experiment [34] and warns against the convenient yet ill-founded strategy of claiming conclusive research findings solely on the basis of P values, despite this being a commonly accepted practice.
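A quick simulation in the spirit of Halsey et al.’s argument repeats the same experiment many times with the same true effect and records the P value each time. The effect size, group size, and the use of a z-test with known standard deviation (instead of a t-test) are assumptions chosen to keep this sketch dependency-free.

```python
import math
import random
import statistics


def two_sample_z_pvalue(a, b, sigma=1.0):
    """Two-sided P value for a difference in means, assuming a known, shared sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def replicate_pvalues(effect=0.5, n=30, replicates=1000, seed=1):
    """Exact replications of one experiment: same true effect, fresh samples each time."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(replicates):
        treated = [rng.gauss(effect, 1.0) for _ in range(n)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        pvals.append(two_sample_z_pvalue(treated, control))
    return pvals


pvals = sorted(replicate_pvalues())
print(f"5th percentile P = {pvals[50]:.2g}, median P = {pvals[500]:.2g}, "
      f"95th percentile P = {pvals[950]:.2g}")
```

Although every replicate samples the same true effect, the P values range from well below 0.01 to well above 0.05, so a single P value says little about whether an exact replication would also be “significant.”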

Relatively simple mitigating measures against P value instability include using confidence intervals (CIs) [35] (although this viewpoint has been challenged by van Helden [36]), ranking variables by effect size [37], reporting the probability of replication (p-rep) [38], [39], and performing repeated subsampling on the data to determine whether the findings are consistent [40]. There has already been much discussion regarding the nature of P values; therefore, we will not elaborate on this further.
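Two of these measures, a confidence interval for the effect and repeated subsampling, need only a few lines. The sketch below uses a percentile bootstrap for the CI and half-size subsamples drawn without replacement; both choices, and the simulated data, are assumptions for illustration rather than the specific procedures of the cited works.

```python
import random
import statistics


def mean_diff(a, b):
    return statistics.fmean(a) - statistics.fmean(b)


def bootstrap_ci(a, b, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = sorted(
        mean_diff(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
        for _ in range(reps)
    )
    return diffs[int(reps * alpha / 2)], diffs[int(reps * (1 - alpha / 2)) - 1]


def subsample_consistency(a, b, frac=0.5, reps=500, seed=0):
    """Share of random subsamples (without replacement) in which the difference
    in means keeps the same sign as in the full data."""
    rng = random.Random(seed)
    full_positive = mean_diff(a, b) > 0
    agree = 0
    for _ in range(reps):
        sub_a = rng.sample(a, int(len(a) * frac))
        sub_b = rng.sample(b, int(len(b) * frac))
        agree += (mean_diff(sub_a, sub_b) > 0) == full_positive
    return agree / reps


# Hypothetical data: a true effect of 0.5 SD with 40 samples per group
rng = random.Random(3)
treated = [rng.gauss(0.5, 1.0) for _ in range(40)]
control = [rng.gauss(0.0, 1.0) for _ in range(40)]
low, high = bootstrap_ci(treated, control)
print(f"effect = {mean_diff(treated, control):.2f}, 95% CI [{low:.2f}, {high:.2f}]")
print(f"sign preserved in {subsample_consistency(treated, control):.0%} of half-size subsamples")
```

Unlike a lone P value, the interval reports the plausible size of the effect, and the subsampling score shows how much the conclusion depends on which samples happened to be collected.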

A very useful and, in our opinion, more balanced approach is to incorporate Bayesian thinking when reasoning about P values. The Bayesian perspective says that instead of considering only how the evidence supports a true effect, we should consider the evidence in its totality, including how the same evidence would support a non-true effect.

We may express the probability P(T|e) of a true effect (T), given some evidence (e), using Bayes’ theorem:

$$P(T \mid e) = \frac{P(T)\,P(e \mid T)}{P(e)}$$

On the right-hand side, the prior probability of a true effect, P(T), is multiplied by the probability of obtaining the evidence e given a true effect, P(e|T), and divided by the overall probability of observing the evidence, P(e). We also need to consider the probability P(−T|e) of a non-true effect (−T), given the same evidence (e). By Bayes’ theorem:

$$P(-T \mid e) = \frac{P(-T)\,P(e \mid -T)}{P(e)}$$

On the right-hand side, the prior probability of a non-true effect, P(−T), is multiplied by the probability of obtaining the evidence e given a non-true effect, P(e|−T), and divided by the overall probability of observing the evidence, P(e).

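Given some evidence e, we may then calculate the odds of a true effect against a non-true effect; the common denominator P(e) cancels:

$$\frac{P(T \mid e)}{P(-T \mid e)} = \frac{P(T)\,P(e \mid T)}{P(-T)\,P(e \mid -T)}$$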

When people observe a strong effect (e.g., a significant P value) in support of their hypothesis, they tend to conclude that there is a true effect. However, they often fail to consider that a significant P value can also arise when there is no true effect. The Bayesian perspective is more balanced because it weighs both possibilities, and it can be used in practical settings. For example, when a gene is reported as significantly correlated with a phenotype, we should be less inclined to immediately declare the finding important without first estimating how likely the same gene would be reported as significantly correlated even if it had no true correlation with the phenotype. This perspective can also be extended beyond “no effect” to situations in which a significant result is due to a confounder (e.g., batch effects).
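Plugging numbers into Bayes’ theorem makes the gene example concrete. In this sketch, P(e|T) is taken to be the statistical power of the test and P(e|−T) the significance threshold α, a standard simplification; the prior of one truly associated gene per 100 tested is a hypothetical figure.

```python
def posterior_true_effect(prior, power, alpha):
    """P(T | significant result) via Bayes' theorem.

    prior: P(T), prior probability that the gene truly affects the phenotype
    power: P(e | T), probability of a significant result when the effect is real
    alpha: P(e | -T), probability of a significant result when there is no effect
    """
    evidence = prior * power + (1 - prior) * alpha  # P(e) by the law of total probability
    return prior * power / evidence


def posterior_odds(prior, power, alpha):
    """P(T|e) / P(-T|e) = [P(T) P(e|T)] / [P(-T) P(e|-T)]; P(e) cancels."""
    return (prior * power) / ((1 - prior) * alpha)


# Hypothetical screen: 1 in 100 genes truly associated, 80% power, alpha = 0.05
p = posterior_true_effect(prior=0.01, power=0.8, alpha=0.05)
odds = posterior_odds(prior=0.01, power=0.8, alpha=0.05)
print(f"P(true effect | significant) = {p:.3f}, posterior odds = {odds:.3f}")
```

With these assumed inputs, a significant gene has only about a 14% probability of a true association, because the many null genes generate most of the significant results.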

A second important shift in statistical perspective is the movement against the blind use of measures of central tendency such as the mean, mode, and median. In symmetrical distributions, the arithmetic mean and median, combined with a measure of dispersion such as the standard deviation or interquartile range, are generally useful metrics. However, in many instances the use of central tendencies is unwarranted, and extreme values such as the minimum and maximum are more useful [41], including in adverse environments where a biological phenomenon is rare [42]. For example, suppose we want to find the configuration that best resists fire, given a fixed number of trees and lakes in a simulation model. The optimal configuration is the one that leaves the maximum number of surviving trees. If we simulate random placements of trees and lakes and record the number of surviving trees in each round, the average tells us only how many trees survive in a typical configuration; it says nothing about the best configuration, which is what we are looking for.
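The fire example can be sketched as a random search over layouts. The model details here are hypothetical: a 10 × 10 grid holds exactly 60 trees and 40 lakes, fire starts in every tree of the left-most column, and it spreads to horizontally or vertically adjacent trees while lakes act as firebreaks.

```python
import random
from collections import deque


def surviving_trees(grid):
    """Trees left after a fire that starts in every tree of the left-most column.

    Fire spreads to 4-neighbouring trees only; lakes ('L') do not burn.
    """
    rows, cols = len(grid), len(grid[0])
    burning = deque((r, 0) for r in range(rows) if grid[r][0] == "T")
    burned = set(burning)
    while burning:  # breadth-first spread of the fire
        r, c = burning.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "T" and (nr, nc) not in burned):
                burned.add((nr, nc))
                burning.append((nr, nc))
    total_trees = sum(row.count("T") for row in grid)
    return total_trees - len(burned)


def random_layout(rng, size, n_trees):
    """Hypothetical layout: a size x size grid with exactly n_trees trees, the rest lakes."""
    cells = ["T"] * n_trees + ["L"] * (size * size - n_trees)
    rng.shuffle(cells)
    return [cells[i * size:(i + 1) * size] for i in range(size)]


rng = random.Random(42)
layouts = [random_layout(rng, size=10, n_trees=60) for _ in range(500)]
scores = [surviving_trees(grid) for grid in layouts]
best = layouts[scores.index(max(scores))]
print(f"mean survivors = {sum(scores) / len(scores):.1f} of 60")
print(f"max survivors  = {max(scores)} of 60 (configuration kept in `best`)")
```

The question posed in the text is answered by `max(scores)` and the layout stored in `best`; the mean only describes a typical random layout.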

The third perspective is the recognition of the gap between theoretical and applied statistics. The studies of Halsey et al. and Ioannidis et al. have demonstrated that theoretical statistical perspectives do not work well in practice [34], [35]: the former reports P value instability leading to the winner’s curse (just as a lottery is won by sheer chance, a spectacular but false-positive finding can also arise by chance), and the latter shows that >50% of real-world studies in psychology are not reproducible. Similarly, in our own practice, we have found that statistically significant results are abundant owing to the presence of confounders and other irrelevant factors [43]. This is also known as the Anna Karenina effect. Such problems can be remedied by performing statistical analysis more logically and by considering the disparities and idiosyncrasies associated with both statistical techniques and data [43].
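The winner’s curse is easy to reproduce by simulation. The sketch below runs many identical, underpowered two-group studies of a small true effect and compares the average estimate over all studies with the average over those that reached significance. The effect size, group size, and z-based threshold are assumptions for illustration.

```python
import math
import random
import statistics


def winners_curse(true_effect=0.2, n=50, studies=2000, z_threshold=1.96, seed=7):
    """Simulate identical two-group studies (sigma = 1 in both groups).

    Returns the mean estimate over all studies, the mean over studies that were
    significant in the positive direction, and the number of such studies.
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n)  # standard error of a difference in means when sigma = 1
    all_estimates, significant = [], []
    for _ in range(studies):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimate = statistics.fmean(treated) - statistics.fmean(control)
        all_estimates.append(estimate)
        if estimate / se > z_threshold:  # the "winners" that would be reported
            significant.append(estimate)
    return statistics.fmean(all_estimates), statistics.fmean(significant), len(significant)


mean_all, mean_significant, n_significant = winners_curse()
print(f"true effect 0.20 | mean of all estimates {mean_all:.2f} | "
      f"mean of {n_significant} significant estimates {mean_significant:.2f}")
```

Averaged over all studies, the estimate sits close to the true effect of 0.2, but the significant subset overstates it more than twofold, because only studies that were lucky enough to overestimate the effect clear the threshold.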

Trends and expectations for BDS

The rise of data science, AI, and ML has led to several high-level collaborations between industry and computing firms, spawned new biotechnology companies, and created new opportunities for advancing scientific discovery.

We list a few examples. In late 2016, pharmaceutical giant Pfizer announced a collaboration with IBM involving the use of the latter’s Watson AI for immuno-oncology research. In June 2017, GNS Healthcare and Genentech (Roche) announced a collaboration to use GNS Healthcare’s causal ML and simulation platform to power the development of novel cancer therapies. In the same month, Novartis also announced a collaboration with IBM Watson to use AI to improve health outcomes in patients with breast cancer. New enterprises are also emerging rapidly. For example, XtalPi is a pharmaceutical technology company that aims to reinvent the industry’s approach to drug R&D with its Intelligent Digital Drug Discovery and Development (ID4) platform, which integrates quantum mechanics, AI, and cloud computing to help pharmaceutical companies increase their efficiency, accuracy, and success rates at critical stages of drug R&D. Since 2016, Bayer has been offering funding through its grant programs, with a clear preference for AI medical startups working on cancer (Turbine) and preventable diseases (xbird).

Besides pharmaceuticals, there are also instances of AI-led advances in biological research. For example, the Allen Cell Explorer uses ML to predict stem cell topology based on thousands of images; BenevolentAI and Microsoft Academic AI use learning algorithms that process natural language, formulate new ideas from what they read, and sift through vast chemical libraries, medical databases, and conventionally presented scientific papers to establish connections across knowledge networks.

Biological education is also expected to benefit from advances in AI/ML and data science. Smart learning platforms based on adaptive learning models are emerging [44].

Risks for BDS

BDS will be a challenging field, but its difficulties are not necessarily distinct from those of bioinformatics or computational biology. Biological systems are highly complex, and the technological platforms intended for assaying them are themselves highly sophisticated. Moreover, instruments developed for measuring biological entities are subject to technical uncertainty, while the components of biological systems change and vary naturally over time. Big bio-data does not automatically solve these issues, and it presents new difficulties of its own. While big bio-data may facilitate data science endeavors, such as identifying conserved patterns over very large numbers of observations, it can do so only if appropriate analytical pipelines are developed. This task is non-trivial. One may imagine such an analytical pipeline as an end-to-end integration of various approaches, forming an analysis stack that starts with data collection and continues through computational and statistical evaluations toward higher-level biological interpretations and insights. A simplified pipeline for biomarker analysis from high-throughput omics data and the associated key considerations are shown in Figure 3.


Figure 3 An analysis pipeline for biomarker analysis using a data-centric approach

While it is foolhardy to propose a one-size-fits-all approach toward biomarker prediction, a workable model may take the form shown. Tools typically associated with data science, such as machine learning, come much later in the pipeline and depend on good experimental design and adequate removal of confounding factors from the data. A few definitions are provided here for the reader’s convenience: reproducibility – the tendency for an identified signature to be repeated in another independent evaluation; robustness – the tendency for an identified signature to outperform randomly generated feature sets; and relevance – the consistency of a signature with a given phenotype. KNN, K-nearest neighbor.
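The robustness criterion defined above can be checked empirically by scoring many random gene sets of the same size and counting how often they do at least as well as the signature. In this sketch, the scoring function (mean absolute between-group difference) and the synthetic expression data are hypothetical stand-ins for a real signature-evaluation procedure.

```python
import random
import statistics


def signature_score(expr, labels, genes):
    """Hypothetical score: mean absolute between-group difference over the signature's genes."""
    diffs = []
    for g in genes:
        grp1 = [x for x, y in zip(expr[g], labels) if y == 1]
        grp0 = [x for x, y in zip(expr[g], labels) if y == 0]
        diffs.append(abs(statistics.fmean(grp1) - statistics.fmean(grp0)))
    return statistics.fmean(diffs)


def robustness_pvalue(expr, labels, signature, n_random=500, seed=0):
    """Share of random same-size gene sets scoring at least as well (add-one corrected)."""
    rng = random.Random(seed)
    observed = signature_score(expr, labels, signature)
    all_genes = list(expr)
    hits = sum(
        signature_score(expr, labels, rng.sample(all_genes, len(signature))) >= observed
        for _ in range(n_random)
    )
    return (hits + 1) / (n_random + 1)


# Synthetic data: 200 genes x 40 samples; only genes 0-9 differ between the groups
rng = random.Random(1)
labels = [0] * 20 + [1] * 20
expr = {
    g: [rng.gauss(1.0 if (g < 10 and y == 1) else 0.0, 1.0) for y in labels]
    for g in range(200)
}
p_real = robustness_pvalue(expr, labels, signature=list(range(10)))
p_noise = robustness_pvalue(expr, labels, signature=list(range(100, 110)))
print(f"real signature: p = {p_real:.3f}; noise signature: p = {p_noise:.3f}")
```

The real signature should beat essentially all random gene sets, whereas a signature drawn from noise genes typically will not, although by chance it occasionally may.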

Analytical pipelines need to be very flexible and change according to the needs of the research question. Since we lack perfect knowledge, it is also usual to iterate and refine, moving back and forth across several steps, to achieve some degree of optimization and reproducibility. For example, suppose that at the normalization step, two different normalization procedures produce very different, non-overlapping sets of differentially expressed genes. This may indicate that one of the procedures makes erroneous assumptions about the data or has been wrongly implemented. The key considerations shown in Figure 3 are not exhaustive. They illustrate that, although no system or pipeline is perfect, each step involves many considerations, and each decision point has consequences for the steps that follow. We also need to evaluate compatibility issues, such as whether a particular normalization approach works well with a downstream statistical procedure. Other issues include whether a particular procedure might lead to over-cleaning or overcorrection (problems associated with batch effect correction algorithms [45], [46] and some multiple test correction methods [47]). BDS may be likened to recipe development in the kitchen, requiring multiple rounds of trial and error while keeping a close eye on the intended endpoint or objective. There is no route map or standard operating procedure that guarantees a universally good result. In this regard, BDS is as much an art as it is a science [43], [48].
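One simple diagnostic for the normalization example is to measure how much the differential gene sets from two procedures overlap. The sketch compares library-size (counts-per-million) scaling with quantile normalization on hypothetical toy counts; ranking genes by absolute mean difference is a stand-in for a proper differential expression test.

```python
import random


def library_size_normalize(matrix):
    """Counts per million: divide each sample (column) by its total count."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    totals = [sum(matrix[g][s] for g in range(n_genes)) for s in range(n_samples)]
    return [[matrix[g][s] / totals[s] * 1e6 for s in range(n_samples)] for g in range(n_genes)]


def quantile_normalize(matrix):
    """Give every sample the same distribution: the mean of the sorted columns."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(sc[i] for sc in sorted_cols) / n_samples for i in range(n_genes)]
    out = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_genes), key=col.__getitem__)  # genes from lowest to highest
        for rank, g in enumerate(order):
            out[g][s] = reference[rank]  # ties are broken by gene order
    return out


def top_genes(matrix, group_a, group_b, k=10):
    """Stand-in for a DE test: the k genes with the largest absolute mean difference."""
    def abs_diff(row):
        mean_a = sum(row[s] for s in group_a) / len(group_a)
        mean_b = sum(row[s] for s in group_b) / len(group_b)
        return abs(mean_a - mean_b)
    ranked = sorted(range(len(matrix)), key=lambda g: abs_diff(matrix[g]), reverse=True)
    return set(ranked[:k])


def jaccard(x, y):
    return len(x & y) / len(x | y) if (x | y) else 1.0


# Hypothetical toy counts: 100 genes x 8 samples; genes 0-4 are strongly up in group B
rng = random.Random(3)
group_a, group_b = [0, 1, 2, 3], [4, 5, 6, 7]
counts = [[rng.randint(50, 500) for _ in range(8)] for _ in range(100)]
for g in range(5):
    for s in group_b:
        counts[g][s] *= 20

de_lib = top_genes(library_size_normalize(counts), group_a, group_b)
de_qn = top_genes(quantile_normalize(counts), group_a, group_b)
print(f"overlap of top-10 gene sets (Jaccard) = {jaccard(de_lib, de_qn):.2f}")
```

A low overlap flags the kind of discrepancy described above; it does not by itself say which procedure is right, only that their assumptions deserve scrutiny before moving downstream.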

Even if we reach our intended analytical objective, it should not be forgotten that the output is ultimately based on inference. Inferences based on massive data, in which heterogeneity and variability are harder to control, run the risk of generating errors (both false positives and false negatives). These errors in turn can lead to overfitting, in which predictive models are tuned to work well on the training data but not on independently generated future datasets.
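An extreme but instructive case of overfitting uses a 1-nearest-neighbour classifier (the K = 1 case of the KNN method named in Figure 3) on data whose labels are pure noise. Features and labels are generated independently, so there is nothing real to learn; the dataset sizes and dimensionality are arbitrary choices for this sketch.

```python
import random


def nearest_neighbour_predict(train_x, train_y, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]


def accuracy(train_x, train_y, xs, ys):
    hits = sum(nearest_neighbour_predict(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(ys)


def random_dataset(rng, n, dim=20):
    """Features and labels are independent noise: there is no signal to learn."""
    xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys


rng = random.Random(0)
train_x, train_y = random_dataset(rng, 200)
test_x, test_y = random_dataset(rng, 200)
train_acc = accuracy(train_x, train_y, train_x, train_y)  # each point is its own neighbour
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"training accuracy = {train_acc:.2f}, independent test accuracy = {test_acc:.2f}")
```

Training accuracy is perfect by construction, while accuracy on an independently generated dataset stays near the 50% expected from guessing: the model has memorized noise rather than learned anything generalizable.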

In practice, good research and development should include an accurate evaluation of error rates, and good methods should minimize error rates where practical. However, there is always a trade-off between getting only correct answers (at the cost of a higher false negative rate) and getting all the correct answers (at the cost of a higher false positive rate). Furthermore, estimates of error rates may be inaccurate if the statistical model fits the data poorly, for example, when using reference models that assume normality although the data are clearly not normally distributed.
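The trade-off shows up directly when the decision threshold of a scoring method is moved. The scores and labels below are hypothetical; a stricter threshold calls fewer positives (fewer false positives, more false negatives) and a looser one does the opposite.

```python
def error_rates(scores, labels, threshold):
    """False positive rate and false negative rate when score >= threshold is called positive."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    return fp / (fp + tn), fn / (fn + tp)


# Hypothetical classifier scores with true labels (1 = real signal, 0 = none)
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
for t in (0.85, 0.5, 0.15):
    fpr, fnr = error_rates(scores, labels, t)
    print(f"threshold {t}: FPR = {fpr:.1f}, FNR = {fnr:.1f}")
```

This prints FPR = 0.0 and FNR = 0.6 at threshold 0.85, FPR = 0.4 and FNR = 0.2 at 0.5, and FPR = 0.8 and FNR = 0.0 at 0.15: no threshold drives both error rates to zero at once.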

Toward a unified BDS curriculum

There are many insertion points into BDS. A computer scientist, statistician, or fintech analyst may enter the field by increasing their biological domain knowledge. A practicing computational biologist or bioinformatician may strengthen their statistical knowledge, learn parallel computing platforms such as Apache Spark or Hadoop, and learn how to use ML and AI frameworks such as TensorFlow. Professional training in the BDS landscape will therefore be highly heterogeneous. It would take more work (and time) for a pure biologist to cross over, as fundamental training in mathematics, statistics, and computing would be required.

The increased momentum toward data science has led to education reforms internationally. In recent years, the University of California, Berkeley and Carnegie Mellon University have sought to make digital literacy (basic programming and data science) a core component of all undergraduate education. Where BDS is concerned, the School of Biological Sciences (SBS), Nanyang Technological University (NTU), in consultation with other stakeholders, has proposed the curricula shown in Figure 4. The basic purpose is to equip biological science undergraduates with the computational thinking and digital literacy skills essential for the modern economy. This set of courses is also meant to provide an insertion point for undergraduates to pursue further training as bio-data scientists.


Figure 4 Current digital literacy offerings in SBS, NTU to facilitate immersion into bio-data science

SBS currently (at the time of writing: November 19, 2018) offers two compulsory data science modules, Introduction to Computational Thinking and Introduction to Data Science (other modules are electives). The purpose of these modules is to seed interest and to inform students about the new data-centric paradigms that will likely revolutionize biomedical research in the years to come. SBS, School of Biological Sciences; NTU, Nanyang Technological University.

At the graduate level, a handful of Master’s and PhD-level programs using the term BDS have emerged as well. The University of Wisconsin–Madison has launched a pre-doctoral training program and a Master’s program in BDS, with emphasis on statistics, mathematics, data visualization, and ML. Other similar and related programs include the Master’s in health data science offered by the Faculty of Biology, Medicine and Health, University of Manchester, UK; the Master’s in biomedical research (data science) offered by the Faculty of Medicine, Imperial College London, UK; and the Master’s in biostatistics and data science offered by the Graduate School of Medical Sciences, Cornell University, USA. Beginning in 2020, the School of Biological Sciences, Nanyang Technological University (NTU) will also offer a Master’s program in biomedical data science.

We believe that the core curriculum of BDS is still being formulated, and it may take several more years for the field to mature and stabilize. We have noticed that the term BDS is now being marketed in some graduate programs. In several cases, apart from the name change, the distinction between BDS and bioinformatics/biostatistics is not explicit. While we agree that bioinformatics knowledge is essential in BDS training, it is less clear exactly which aspects of bioinformatics are relevant and must be included. The advent of BDS will also drive changes in bioinformatics education, as educators re-examine course content for timely relevance and explore areas for synergistic collaboration. Indeed, given the rise of big data, educators are questioning whether current bioinformatics curricula include sufficient components to address it. After all, many bioinformatics programs were established before big data became a prominent area of focus.

For biology educators seeking to implement a BDS curriculum, we feel it is crucial not just to teach programming and the use of existing software tools such as TensorFlow [49]. Educational components that incorporate abstract, algorithmic, and logical thinking (computational thinking), which are important for problem-solving, are absolutely necessary.

Some analytical situations requiring BDS

BDS will emerge as a new discipline in light of novel challenges stemming from big bio-data, an increasing recognition of the gulf between applied and theoretical statistics, and expectations heaped upon it given the rise of AI. In this section, we describe some interesting challenges for BDS.

Creating new perspectives in doing cross-validation right

In our “Turning straw into gold” paper, we showed that about 50% of randomly generated (and therefore meaningless) gene signatures work well on a given breast cancer survival dataset, with some even outperforming published signatures [50]. Taken at face value, this would imply rather dramatically that all efforts to find prognostic signatures for breast cancer survival are a waste of effort (and therefore that all manuscripts focusing on finding such signatures should be rejected without review). Of course, that would be too drastic. However, it does suggest that, on deeper reflection, even higher stringency should be placed on validation than is currently practiced by data mining or ML researchers. In particular, given that a random signature has about a 50% chance of being significant in a dataset, more independent datasets must be used to ensure that the observed associations are not due to chance.

Assuming that the datasets are fully independent, we also observed that seven datasets are needed to ensure that a random signature has a <1% chance of being significant in all seven. This requirement (seven independent test datasets) is much higher than the common practice in the data mining and ML communities of simple cross-validation on a training dataset followed by a single independent test dataset. In other words, biology demands stronger proof of generalizability.
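The figure of seven follows from simple arithmetic under the independence assumption: if a random signature is significant in each dataset with probability 0.5, its chance of being significant in all k datasets is 0.5^k. The sketch finds the smallest k that brings this below 1%; the function name and defaults are illustrative.

```python
import math


def datasets_needed(p_single=0.5, target=0.01):
    """Smallest k with p_single**k < target, assuming fully independent datasets."""
    k = 1
    while p_single ** k >= target:
        k += 1
    return k


for k in (5, 6, 7):
    print(f"{k} datasets: P(random signature significant in all) = {0.5 ** k:.4f}")
print("datasets needed:", datasets_needed())
print("closed form:", math.ceil(math.log(0.01) / math.log(0.5)))
```

Six datasets still leave a chance of about 1.6% (0.5^6 = 0.015625), while seven bring it to about 0.8%, so both the loop and the closed form ceil(log 0.01 / log 0.5) give 7.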

Perceived interdependence of datasets in independent validation

In our meta-analysis of various breast cancer datasets, we also observed that the number of independent datasets in which a randomly generated gene signature is significant does not follow the binomial distribution, although the mode of the distribution is preserved and accentuated [50]. This suggests that the independent datasets might not be fully independent, despite being collected by different, independent groups. Perhaps some shared intrinsic population characteristics confound the random signatures (besides the effects of proliferation-associated genes, which are reportedly a major source of confounding). A deeper investigation into the meta-characteristics of these datasets is therefore useful and may reveal yet-unreported confounders. In other words, while existing ML and AI practitioners may rely on a single independent validation, this extremely crucial step has its own instabilities. A positive independent validation does not necessarily mean that the gene signature is truly good. It may simply be that the independent validation dataset shares some commonalities with the training data, so that data leakage has occurred.
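A departure from the binomial distribution can be quantified with a goodness-of-fit statistic. The sketch tabulates, for many random signatures, the number of datasets (out of seven) in which each is significant, and compares that histogram with the binomial expectation. The dependence model is hypothetical: each signature draws a latent propensity from Beta(2, 2) that applies to all datasets alike, standing in for a shared confounder. This particular model spreads the counts toward the extremes; other forms of dependence can distort the distribution differently, so it should not be read as a description of the breast cancer data.

```python
import math
import random
from collections import Counter


def binomial_expected(n_datasets, p, n_signatures):
    """Expected number of signatures significant in exactly k datasets (k = 0..n_datasets)
    if all datasets were independent with the same per-dataset probability p."""
    return [n_signatures * math.comb(n_datasets, k) * p ** k * (1 - p) ** (n_datasets - k)
            for k in range(n_datasets + 1)]


def chi_square_stat(observed, expected):
    """Pearson's chi-square statistic comparing observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def simulate_counts(n_datasets, n_signatures, shared_confounder, rng):
    """Histogram of the number of datasets in which each random signature is significant.

    Hypothetical dependence: with shared_confounder=True, each signature gets its own
    propensity from Beta(2, 2) (mean 0.5) that applies to every dataset.
    """
    hist = Counter()
    for _ in range(n_signatures):
        p = rng.betavariate(2, 2) if shared_confounder else 0.5
        hist[sum(rng.random() < p for _ in range(n_datasets))] += 1
    return [hist[k] for k in range(n_datasets + 1)]


rng = random.Random(0)
expected = binomial_expected(7, 0.5, 5000)
independent = simulate_counts(7, 5000, shared_confounder=False, rng=rng)
dependent = simulate_counts(7, 5000, shared_confounder=True, rng=rng)
print("chi-square vs binomial, independent datasets:", round(chi_square_stat(independent, expected), 1))
print("chi-square vs binomial, shared confounder:   ", round(chi_square_stat(dependent, expected), 1))
```

With eight count categories there are seven degrees of freedom, for which the 5% critical value of the chi-square distribution is about 14.07. The independent simulation should usually stay below it, whereas the shared-confounder simulation exceeds it by a wide margin.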

Stop to question even when prediction accuracy is good

Suppose we train a neural network W on a training set and test it on a test set, obtaining a high accuracy (e.g., 90%). Next, we randomly remove two edges in W to get a new network W′ and train and test it on the same training and test sets as W. W′ is very likely to achieve a high accuracy similar to that of W.

Now, suppose we randomly generate lots of new test data and feed them to both W and W′. Although we do not know the true class labels of the new test data, we can still determine whether W and W′ agree on them (i.e., both predict “yes” or both predict “no”) or disagree (one predicts “yes” and the other “no”). Quite often, W and W′ disagree drastically on the new test data, with disagreement rates that may exceed 50% of the new test instances. This means that despite their very similar and common origins, we may draw the following conclusions: (1) W and W′ are drastically different rules/models; (2) a single test dataset is insufficient to validate W and ensure that it is meaningful; and (3) there is often a significant sampling gap or bias in a test dataset. A corollary is (4): it is critical to carefully analyze W to derive a full explanation of the set of rules it represents and to properly ascertain the biological meaningfulness of these rules.
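The W versus W′ experiment can be sketched without deep-learning libraries by swapping in a single-layer perceptron, whose input weights play the role of edges. The data are hypothetical and deliberately built with a sampling gap: labels depend only on the first feature, and the remaining features barely vary in the training and test sets, so little constrains their weights. All names and parameters are illustrative assumptions.

```python
import random


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def train_perceptron(xs, ys, w_init, mask, epochs=100):
    """Perceptron training; weights with mask[i] == 0 are 'removed edges' fixed at zero."""
    w = [wi * m for wi, m in zip(w_init, mask)]
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if predict(w, x) != y:
                w = [wi + m * y * xi for wi, xi, m in zip(w, x, mask)]
                mistakes += 1
        if mistakes == 0:  # training data separated
            break
    return w


def narrow_sample(rng, n, dim=5):
    """Labels depend on x0 only; the other features barely vary (a sampling gap)."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.choice((-1, 1))
        xs.append([y * rng.uniform(0.2, 1.0)] + [rng.gauss(0, 0.05) for _ in range(dim - 1)])
        ys.append(y)
    return xs, ys


def accuracy(w, xs, ys):
    return sum(predict(w, x) == y for x, y in zip(xs, ys)) / len(ys)


rng = random.Random(0)
train = narrow_sample(rng, 200)
test = narrow_sample(rng, 200)
w_init = [rng.gauss(0, 1) for _ in range(5)]
w = train_perceptron(*train, w_init, mask=[1, 1, 1, 1, 1])
w_prime = train_perceptron(*train, w_init, mask=[1, 0, 0, 1, 1])  # two edges removed

# Fresh random inputs spanning the whole feature space; no labels are needed
probe = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(10_000)]
disagreement = sum(predict(w, x) != predict(w_prime, x) for x in probe) / len(probe)
print(f"test accuracy: W = {accuracy(w, *test):.2f}, W' = {accuracy(w_prime, *test):.2f}")
print(f"W and W' disagree on {disagreement:.0%} of random inputs")
```

The exact disagreement rate depends on the random initialization; the point is that it is invisible to the test set, which shares the training set’s sampling gap, and becomes measurable only when both models are probed on inputs outside that gap.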

In short, the ability to achieve a good prediction accuracy may have little to do with true biological meaning. This is also a major stumbling block when moving from “predictive” toward “prescriptive” analytical levels. While we offer no direct solution to this problem, it is important to realize that ML and AI are but tools with high tuneability and many performance exceptions. It is therefore important that aspiring bio-data scientists train hard in logical thinking instead of merely feeding massive heaps of data into an algorithmic black box. If good results are obtained, the good performance may be misleading. If bad results are obtained, knowing which likely factors to consider and test is crucial, rather than relying on trial and error, which may prove daunting when big data present many more variables to consider.

Biology has a golden opportunity to ride on the current data science wave. This will inevitably give rise to a new subfield: BDS. We are beginning to see new initiatives and achievements as a result. We foresee the rise of bio-data scientists as a new breed of specialists who will act as navigators and overseers in directing and leading future innovation from a data-centric/informed perspective.

Competing interests

The authors have declared no competing interests.

Acknowledgments

WWBG gratefully acknowledges support from the Accelerating Creativity and Excellence (ACE) and EdeX grants from Nanyang Technological University, Singapore. WWBG also acknowledges the support from a National Research Foundation of Singapore–National Natural Science Foundation of China (Grant No. NRF2018NRF-NSFC003SB-006). LW acknowledges support from a Kwan Im Thong Hood Cho Temple Chair Professorship and the National Research Foundation Singapore under its AI Singapore Programme (Grant Nos. AISG-100E-2019-027 and AISG-100E-2019-028). WWBG and LW also acknowledge Alex Bateman for close reading and contribution of ideas that helped improve the manuscript.

Handled by Zhang Zhang

Peer review under responsibility of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China.


NYU Center for Data Science

Harnessing Data’s Potential for the World

PhD in Data Science

An NRT-sponsored program in Data Science


Advances in computational speed and data availability, and the development of novel data analysis methods, have birthed a new field: data science. This new field requires a new type of researcher and actor: the rigorously trained, cross-disciplinary, and ethically responsible data scientist. Launched in Fall 2017, the pioneering CDS PhD Data Science program seeks to produce such researchers who are fluent in the emerging field of data science, and to develop a native environment for their education and training. The CDS PhD Data Science program has rapidly received widespread recognition and is considered among the top and most selective data science doctoral programs in the world. It has recently been recognized by the NSF through an NRT training grant.

The CDS PhD program model rigorously trains data scientists of the future who (1) develop methodology and harness statistical tools to find answers to questions that transcend the boundaries of traditional academic disciplines; (2) communicate clearly and extract crisp questions from big, heterogeneous, uncertain data; (3) effectively translate fundamental research insights into data science practice in the sciences, medicine, industry, and government; and (4) are aware of the ethical implications of their work.

Our programmatic mission is to nurture this new generation of data scientists by designing and building a data science environment where methodological innovations are developed and translated successfully to domain applications, both scientific and social. Our vision is that combining fundamental research on the principles of data science with translational projects involving domain experts creates a virtuous cycle: advances in data science methodology transform the process of discovery in the sciences and enable effective data-driven governance in the public sector. At the same time, the demands of real-world translational projects will catalyze the creation of new data science methodologies. An essential ingredient of such methodologies is that they embed ethics and responsibility by design.

These objectives will be achieved through a combination of an innovative core curriculum; a novel data assistantship mechanism that trains students in skills transfer through rotations and internships; and communication and entrepreneurship modules. Students will be exposed to a wider range of fields than in more standard PhD programs while working with our interdisciplinary faculty. In particular, we are proud to offer a medical track for students eager to explore data science as applied to healthcare or to develop novel theoretical models stemming from medical questions.

In short, the CDS PhD Data Science program prepares students to become leaders in data science research and to pursue outstanding careers in academia or industry. Successful candidates are guaranteed financial support in the form of tuition and a competitive stipend in the fall and spring semesters for up to five years.* We invite you to learn more through our webpage or by contacting [email protected].

*The Ph.D. program also offers students the opportunity to pursue their study and research with Data Science faculty based at NYU Shanghai. With this opportunity, students generally complete their coursework in New York City before moving full-time to Shanghai for their research. For more information, please visit the NYU Shanghai Ph.D. page.

The University of Edinburgh

Postgraduate study

Data Science for Biology MSc

Awards: MSc

Study modes: Full-time

Funding opportunities

Programme website: Data Science for Biology

Discovery Day

Join us online on 18th April to learn more about postgraduate study at Edinburgh


Programme description

Biological Science is being transformed by a range of high-throughput technologies that generate vast amounts of data. These technologies include high-throughput genomic sequencing, imaging, screening and proteomics.

As a result, scientists and researchers with the expertise to extract biological meaning from these data are in high demand.

This programme uses concepts from data science to train students in a generic set of skills that can be adapted to solve problems in a flexible way across different types of data. The intention is to complement and extend existing approaches, especially those from bioinformatics, with data science, to build highly adaptable skill sets for modern biological research.

Is this MSc for me?

The University of Edinburgh has a thriving data science community, which you would become part of. The programme is based in the School of Biological Sciences, ranked in the top 5 in the UK for research power (REF 2022), and you will be deeply focused on applying data science concepts within the domain of biological sciences.

On completion of this programme you will be well equipped to undertake further studies, or to work in industry.

Programme structure

The MSc comprises two semesters of taught courses followed by a research project and dissertation.

Semesters 1 and 2 include 60 credits each of a mixture of compulsory and optional courses.

The research project and dissertation is also worth 60 credits, giving a total of 180 credits for the year.

The programme is designed to provide training in statistics, Python and R programming, and data analysis, all tailored specifically to biological examples. Teaching is through lectures, tutorials, seminars, computer practicals and lab demonstrations.

Compulsory courses:

  • Statistics and Data Analysis (Semester 1, 20 credits)
  • Biological Databases (Semester 1, 10 credits)
  • Using R for Data Science (Semester 1, 10 credits)
  • Introduction to Python Programming for Data Science (Semester 2, 10 credits)
  • Research Project Proposal (Semester 2, 20 credits)
  • Project Dissertation (Semester 2, 60 credits)

Available recommended and optional courses include:

  • Information Processing in Biological Cells (Semester 1, 10 credits)
  • Practical Systems Biology (Semester 2, 20 credits)
  • Foundations in Responsible Research and Innovation (Semester 1, 10 credits)
  • Functional Genomic Technologies (Semester 2, 10 credits)
  • Bioinformatics Algorithms (Semester 2, 10 credits)
  • Introduction to Website & Database Design (Semester 2, 10 credits)
  • Social Dimensions of Systems and Synthetic Biology (Semester 2, 10 credits)
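The credit arithmetic behind the course lists can be checked with a short Python sketch. It assumes that the 60-credit Project Dissertation is the separate research component carried out over the summer, not part of Semester 2's 60 taught credits; on that reading, the remaining taught credits would come from the recommended and optional courses. Course offerings change between years, so the figures are indicative only.

```python
# Credit structure of the Data Science for Biology MSc, using the figures listed above.
# Assumption: the 60-credit Project Dissertation is counted as the separate
# research-project component, not as part of Semester 2's 60 taught credits.
TAUGHT_CREDITS_PER_SEMESTER = 60
DISSERTATION_CREDITS = 60

# Compulsory taught courses per semester (dissertation excluded, per the assumption).
COMPULSORY_TAUGHT = {
    1: {
        "Statistics and Data Analysis": 20,
        "Biological Databases": 10,
        "Using R for Data Science": 10,
    },
    2: {
        "Introduction to Python Programming for Data Science": 10,
        "Research Project Proposal": 20,
    },
}


def optional_credits_needed(semester: int) -> int:
    """Taught credits left over for recommended/optional courses in a semester."""
    compulsory = sum(COMPULSORY_TAUGHT[semester].values())
    return TAUGHT_CREDITS_PER_SEMESTER - compulsory


total_credits = 2 * TAUGHT_CREDITS_PER_SEMESTER + DISSERTATION_CREDITS

for semester in (1, 2):
    print(f"Semester {semester}: {optional_credits_needed(semester)} optional credits")
print(f"Total: {total_credits} credits")
```

Under this assumption the sketch gives 20 optional credits in Semester 1, 30 in Semester 2, and 180 credits for the year, matching the total stated above.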

Your research project is a key element in deciding how your career should develop further and is carried out independently during the summer.

Working under the guidance of a supervisor from one of our world-leading laboratories, you will present your results in a dissertation.

A wide range of projects is available through the School of Biological Sciences and other University of Edinburgh schools, and projects may also be available from industry.

Find out more about compulsory and optional courses

We link to the latest information available. Please note that this may be for a previous academic year and should be considered indicative.

Learning outcomes

By completing this programme, you will:

  • acquire a set of skills necessary for the analysis and interpretation of complex biological data (these skills may be sufficiently generic that they can be applied across many different types of data)
  • acquire skills that might be used to make new discoveries from data
  • learn how to develop new software and database components using a variety of programming languages
  • learn how to interpret existing scientific literature and design experiments

Career opportunities

This programme will equip you with the quantitative skills that employers are looking for in:

  • programming
  • data analysis and interpretation

The programme will also give you the skills to work within the developing area of data science. Alongside data science, you will have a mixed portfolio of skills from established fields such as systems biology, bioinformatics and genomics.

As a Data Science for Biology MSc graduate, you will have a wide range of career options. You might work in:

  • a diverse range of bioscience companies
  • academic labs

Equally, you will be well-prepared for a career in academic research and we expect many of our students to continue to PhD level.

Our Careers Service will support you throughout your time studying with us and for 2 years after graduation. They can provide:

  • tailored advice
  • individual guidance and personal assistance
  • internship and networking opportunities (with employers from local organisations to top multinationals)
  • access to the experience of our worldwide alumni network

We invest in your future beyond the end of your degree. Studying at the University of Edinburgh will lay the foundations for your future success, whatever shape that takes.

Entry requirements

These entry requirements are for the 2024/25 academic year and requirements for future academic years may differ. Entry requirements for the 2025/26 academic year will be published on 1 Oct 2024.

A UK 2:1 honours degree, or its international equivalent, in an area of biological sciences. You must have a strong background in molecular biology, biochemistry or related sciences and some experience of computer science, including programming (Python, Perl, C/C++, Java or R), and statistics. This programme is aimed primarily at biological science graduates.

We may also consider your application if you have a background in physics, computing, mathematics or engineering and some experience of biological sciences. Your application must show that you are keen to learn data science from a biological perspective.

Students from China

This degree is Band C.

  • Postgraduate entry requirements for students from China

International qualifications

Check whether your international qualifications meet our general entry requirements:

  • Entry requirements by country
  • English language requirements

Regardless of your nationality or country of residence, you must demonstrate English language competency at a level that will enable you to succeed in your studies.

English language tests

We accept the following English language qualifications at the grades specified:

  • IELTS Academic: total 7.0 with at least 6.0 in each component. We do not accept IELTS One Skill Retake to meet our English language requirements.
  • TOEFL-iBT (including Home Edition): total 100 with at least 20 in each component. We do not accept TOEFL MyBest Score to meet our English language requirements.
  • C1 Advanced (CAE) / C2 Proficiency (CPE): total 185 with at least 169 in each component.
  • Trinity ISE: ISE III with passes in all four components.
  • PTE Academic: total 70 with at least 59 in each component.

Your English language qualification must be no more than three and a half years old from the start date of the programme you are applying to study, unless you are using IELTS, TOEFL, Trinity ISE or PTE, in which case it must be no more than two years old.
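The score thresholds and validity windows above amount to two simple checks, sketched below in Python. The sketch makes several assumptions not stated on this page: the validity window is measured from the test date to the programme start date, three and a half years is counted as 42 calendar months, and Trinity ISE (assessed as a pass in each component) is covered only by the date check. It does not model exclusions such as IELTS One Skill Retake or TOEFL MyBest Score, and the helper names are hypothetical.

```python
import calendar
from datetime import date

# (minimum total, minimum in each component), copied from the list above.
# Trinity ISE is pass-based (ISE III, all four components), so it is not scored here.
MINIMUM_SCORES = {
    "IELTS Academic": (7.0, 6.0),
    "TOEFL-iBT": (100, 20),
    "C1 Advanced / C2 Proficiency": (185, 169),
    "PTE Academic": (70, 59),
}

# These tests must be at most two years old; other qualifications up to 3.5 years.
TWO_YEAR_TESTS = {"IELTS Academic", "TOEFL-iBT", "Trinity ISE", "PTE Academic"}


def meets_scores(test: str, total: float, components: list[float]) -> bool:
    """True if the overall score and every component reach the published minimums."""
    min_total, min_component = MINIMUM_SCORES[test]
    return total >= min_total and all(c >= min_component for c in components)


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping the day to the month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_recent_enough(test: str, taken: date, programme_start: date) -> bool:
    """Apply the 24-month or 42-month validity window at the programme start date."""
    limit_months = 24 if test in TWO_YEAR_TESTS else 42
    return programme_start <= add_months(taken, limit_months)


print(meets_scores("IELTS Academic", 7.0, [6.5, 6.0, 7.5, 6.0]))  # True
print(is_recent_enough("IELTS Academic", date(2022, 9, 9), date(2024, 9, 10)))  # False
```

Comparing dates after shifting by whole months avoids a subtle off-by-one: a result that is two years and one day old is rejected, while one that is exactly two years old is accepted.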

Degrees taught and assessed in English

We also accept an undergraduate or postgraduate degree that has been taught and assessed in English in a majority English speaking country, as defined by UK Visas and Immigration:

  • UKVI list of majority English speaking countries

We also accept a degree that has been taught and assessed in English from a university on our list of approved universities in non-majority English speaking countries (non-MESC).

  • Approved universities in non-MESC

If you are not a national of a majority English speaking country, then your degree must be no more than five years old* at the beginning of your programme of study. (*Revised 05 March 2024 to extend degree validity to five years.)

Find out more about our language requirements:

Fees and costs

If you receive an offer of admission (either unconditional or conditional), you will need to pay a deposit to secure your place.

  • £1,500 (this contributes towards your tuition fees)

Find out more about tuition fee deposits:

  • Tuition fee deposits

Tuition Fees

Scholarships and funding

Featured funding

  • School of Biological Sciences Taught Postgraduate Bursaries

UK government postgraduate loans

If you live in the UK, you may be able to apply for a postgraduate loan from one of the UK’s governments.

The type and amount of financial support you are eligible for will depend on:

  • your programme
  • the duration of your studies
  • your tuition fee status

Programmes studied on a part-time intermittent basis are not eligible.

  • UK government and other external funding

Other funding opportunities

Search for scholarships and funding opportunities:

  • Search for funding

Further information

  • Recruitment and Employability Manager, Rona Lindsay
  • Phone: +44 (0)131 650 8649
  • Contact: [email protected]
  • Programme Director, Dr Simon Tomlinson
  • School of Biological Sciences
  • The King's Buildings Campus
  • Programme: Data Science for Biology
  • School: Biological Sciences
  • College: Science & Engineering


MSc Data Science for Biology - 1 Year (Full-time)

Application deadlines

Due to high demand, the school operates a number of selection deadlines.

We strongly recommend you apply as early as possible. Applications may close earlier than the published deadlines if there is exceptionally high demand.

We will make a small number of offers to the most outstanding candidates on an ongoing basis, but hold the majority of applications until the next published selection deadline. We aim to make the majority of decisions within eight weeks of the selection deadline.

Applicants who are not made an offer at a given selection deadline will either be notified that they have been unsuccessful in securing a place on the programme or, if they hear nothing, remain under consideration, in which case their application will be carried forward to the next selection deadline.

Selection Deadlines

  • How to apply

You must submit one reference with your application.

Find out more about the general application process for postgraduate programmes:

Where To Earn A Ph.D. In Data Science Online In 2024

Mikeie Reiland, MFA

Published: Apr 3, 2024, 2:15pm


Data science is among the most in-demand skill sets in the modern economy. Data science professionals help businesses make decisions by creating analytical models, combining elements of math, artificial intelligence, machine learning and statistics.

If you want to pursue a high-paying data science career or teach data science at the college level, you may want to earn a terminal degree in the field. Online Ph.D. in data science programs allow you to advance your career while balancing other responsibilities at work or home.

We found two online data science programs that met our ranking criteria. Read on to learn more about these schools and find answers to frequently asked questions about data science.

Why You Can Trust Forbes Advisor Education

Forbes Advisor’s education editors are committed to producing unbiased rankings and informative articles covering online colleges, tech bootcamps and career paths. Our ranking methodologies use data from the National Center for Education Statistics, education providers, and reputable educational and professional organizations. An advisory board of educators and other subject matter experts reviews and verifies our content to bring you trustworthy, up-to-date information. Advertisers do not influence our rankings or editorial content.

  • 6,290 accredited, nonprofit colleges and universities analyzed nationwide
  • 52 reputable tech bootcamp providers evaluated for our rankings
  • All content is fact-checked and updated on an annual basis
  • Rankings undergo five rounds of fact-checking
  • Only 7.12% of all colleges, universities and bootcamp providers we consider are awarded

Online Ph.D. in Data Science Option

Capitol Technology University

Located just outside Washington, D.C., in South Laurel, Maryland, Capitol Technology University offers an online doctoral degree in business analytics and data science. The program includes a limited residency requirement: Students must complete a course in contemporary research in management on campus, during which they take a qualifying exam. The degree requires 54 to 66 credits, and students can graduate within three years.

All students must also complete a dissertation and an oral defense of their work. The program costs $950 per credit for both in-state and out-of-state learners. Retired and active duty military receive a tuition discount.

At a Glance

  • School Type: Private
  • Application Fee: $100
  • Degree Credit Requirements: 54 to 66 credits
  • Program Enrollment Options: Part-time
  • Notable Major-Specific Courses: Management theory in a global economy; analytics and decision analysis
  • Concentrations Available: N/A
  • In-Person Requirements: Yes, for residency

National University

Based in San Diego, California, National University (NU) offers a variety of online programs, including a Ph.D. in data science. NU’s program requires 60 credits and takes an estimated 40 months. NU aims for flexibility, delivering coursework asynchronously and offering a new start date each Monday. The curriculum comprises 20 courses covering data science principles and data preparation methods.

NU runs on the quarter system and charges $442 per quarter unit for graduate courses. The program does not include any in-person requirements.

  • Application Fee: Free
  • Degree Credit Requirements: 60 credits
  • Notable Major-Specific Courses: Principles of data science, data preparation methods
  • In-Person Requirements: No

How To Find the Right Online Ph.D. in Data Science for You

Consider Your Future Goals

A Ph.D. in data science makes sense if you want to become a college professor, conduct original research or compete for the highest-paying and most cognitively demanding business analytics and machine learning positions. If you plan to pursue other careers, you may not need a terminal degree in this field.

If you want to work in academia, make sure your chosen doctorate in data science includes a dissertation requirement. A dissertation allows you to perform original research and contribute to scholarship in your field before you graduate. In turn, you’ll get a sense of your chosen career and a head start on professional publication.

Understand Your Expenses and Financing Options

Per-credit tuition rates for the programs in our guide ranged from $442 to $950. A 60-credit degree from NU totals about $26,500, while the 66-credit option at Capitol Tech costs more than $62,000.

Private universities, including NU and Capitol Tech, tend to cost more than public schools. Graduate students at nonprofit private universities paid an average of $20,408 per year in 2022-23, according to the National Center for Education Statistics. Over the course of a typical three-year Ph.D. program, this translates to about $61,000. This roughly matches Capitol Tech’s tuition, while NU offers a more affordable program.
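The cost comparisons above come from multiplying a per-credit rate by the number of credits. This Python sketch reproduces them using only the figures quoted in this article; it ignores fees, rate changes during the degree and financial aid, and it takes the article's three-year program length as given. The 3-units-per-course figure for NU is simply 60 credits divided by the 20 courses mentioned above.

```python
# Figures quoted in this article (data as of February 2024).
NU_RATE, NU_CREDITS, NU_COURSES = 442, 60, 20   # National University, per quarter unit
CAPITOL_RATE = 950                              # Capitol Tech, per credit
CAPITOL_MIN_CREDITS, CAPITOL_MAX_CREDITS = 54, 66

# NCES average annual tuition for graduate students at nonprofit private schools, 2022-23.
PRIVATE_AVG_PER_YEAR = 20_408
YEARS = 3  # the article's "typical three-year" program


def program_cost(rate: int, credits: int) -> int:
    """Total tuition for a program priced per credit (fees excluded)."""
    return rate * credits


nu_total = program_cost(NU_RATE, NU_CREDITS)
nu_per_course = program_cost(NU_RATE, NU_CREDITS // NU_COURSES)  # 3 units per course
capitol_low = program_cost(CAPITOL_RATE, CAPITOL_MIN_CREDITS)
capitol_high = program_cost(CAPITOL_RATE, CAPITOL_MAX_CREDITS)
private_benchmark = PRIVATE_AVG_PER_YEAR * YEARS

for label, amount in [
    ("NU, 60 credits", nu_total),
    ("NU, per course", nu_per_course),
    ("Capitol Tech, 54 credits", capitol_low),
    ("Capitol Tech, 66 credits", capitol_high),
    ("Private nonprofit average, 3 years", private_benchmark),
]:
    print(f"{label}: ${amount:,}")
```

The results ($26,520 for NU, $51,300 to $62,700 for Capitol Tech, and $61,224 for the three-year private average) show that the "more than $62,000" figure applies only to Capitol Tech's 66-credit option; the 54-credit minimum falls below the private-school benchmark.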

While a Ph.D. might help you land a lucrative role in the long run, the upfront investment is still significant. Make sure to fill out the FAFSA® to access federal student aid. This application is the gateway to opportunities like scholarships, grants and loans. You can pursue similar opportunities through schools and nonprofit organizations.

As a doctoral student, you may be able to access graduate assistantships or stipends, but these are often reserved for on-campus students who teach undergraduates or assist professors with research.

Should You Enroll in a Ph.D. in Data Science Online?

Pursuing a Ph.D. in data science online suits a specific kind of learner. To decide if that’s you, ask yourself a few key questions:

  • What’s my budget? In some cases, public universities allow students who exclusively enroll in online courses to pay in-state or otherwise discounted tuition rates. Even if you have to pay full price, distance learners generally save on costs associated with housing and transportation.
  • What are my other commitments? Distance learning is often a good fit for parents and students who need to work full time while pursuing their degree. Learners with outside responsibilities might pursue a program with asynchronous course delivery, which eliminates scheduled class sessions.
  • What’s my learning style? Distance learning requires a great deal of discipline, organization and time management. If you need external accountability or prefer the structure of a peer group or physical classroom, on-campus learning might offer a better fit.

Accreditation for Online Ph.D.s in Data Science

There are two important types of college accreditation to consider: institutional and programmatic.

Institutional accreditation is essential; it involves vetting schools to ensure the quality of their finances, academics, and faculty, among other areas. The Council for Higher Education Accreditation (CHEA) and the U.S. Department of Education oversee the regional agencies that administer this process.

You should only enroll at institutionally accredited schools. Otherwise, you will be ineligible for federal financial aid. You can check a school’s accreditation status on its website or by visiting the directory on CHEA’s website.

Individual departments and degrees earn programmatic accreditation based on their curriculum, faculty and learner outcomes. This process has not been widely established for data science programs, so it shouldn’t make or break your enrollment decision. You can still keep an eye out for accreditation from the Data Science Council of America (DASCA).

Our Methodology

We ranked two accredited, nonprofit colleges offering online Ph.D.s in data science in the U.S. using 15 data points in the categories of student experience, credibility, student outcomes and affordability. We pulled data for these categories from reliable resources such as the Integrated Postsecondary Education Data System; private, third-party data sources; and individual school and program websites.

Data is accurate as of February 2024. Note that because online doctorates are relatively uncommon, fewer schools meet our ranking standards at the doctoral level.

We scored schools based on the following metrics:

Student Experience:

  • Student-to-faculty ratio
  • Socioeconomic diversity
  • Availability of online coursework
  • Total number of graduate assistants
  • Proportion of graduate students enrolled in at least some distance education

Credibility:

  • Fully accredited
  • Programmatic accreditation status
  • Nonprofit status

Student Outcomes:

  • Overall graduation rate
  • Median earnings 10 years after graduation

Affordability:

  • In-state graduate student tuition
  • In-state graduate student fees
  • Alternative tuition plans offered
  • Median federal student loan debt
  • Student loan default rate
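As a quick consistency check, the four categories above account for exactly the 15 data points the methodology describes (5 + 3 + 2 + 5). The article does not publish how the metrics are weighted, so this Python sketch only tallies them; it makes no attempt to reproduce the actual scores.

```python
# Metrics listed in the methodology above, grouped by category.
METRICS = {
    "Student Experience": [
        "Student-to-faculty ratio",
        "Socioeconomic diversity",
        "Availability of online coursework",
        "Total number of graduate assistants",
        "Proportion of graduate students enrolled in at least some distance education",
    ],
    "Credibility": [
        "Fully accredited",
        "Programmatic accreditation status",
        "Nonprofit status",
    ],
    "Student Outcomes": [
        "Overall graduation rate",
        "Median earnings 10 years after graduation",
    ],
    "Affordability": [
        "In-state graduate student tuition",
        "In-state graduate student fees",
        "Alternative tuition plans offered",
        "Median federal student loan debt",
        "Student loan default rate",
    ],
}

# Count the metrics in each category and overall.
counts = {category: len(items) for category, items in METRICS.items()}
total_metrics = sum(counts.values())

for category, count in counts.items():
    print(f"{category}: {count}")
print(f"Total data points: {total_metrics}")
```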

We listed the two schools in the U.S. that met our ranking criteria.


Frequently Asked Questions (FAQs) About Earning a Ph.D. in Data Science Online

Can I do a Ph.D. in data science online?

Yes, you can. National University and Capitol Technology University both offer Ph.D. programs in data science that you can complete mostly or entirely online.

Is a Ph.D. worth it for data science?

It depends on your goals and circumstances. A Ph.D. in data science may be a good fit if you want to pursue a career in research or academia or compete for advanced, lucrative positions in business analytics, artificial intelligence or machine learning.

Is it okay to get a Ph.D. online?

Yes, as long as the program is accredited. Distance learning requires strong motivation and self-discipline, so it suits some students better than others.

Can you become a professor with an online Ph.D.?

Yes, you can. Accredited online programs generally have the same coursework and degree requirements as their in-person counterparts, and the diploma typically does not say “online.”


Mikeie Reiland is a writer who has written features for Oxford American, Bitter Southerner, Gravy, and SB Nation, among other publications. He received a James Beard nomination for a feature he wrote in 2023.


Computational Genomics and Data Science Program

Extracting knowledge from data is a defining challenge of science.


Computational genomics has been an important area of focus for NHGRI since the beginning of the Human Genome Project. Today, however, advances in tools and techniques for data generation are rapidly increasing the amount of data available to researchers, particularly in genomics. This increase requires researchers to rely ever more heavily on computational and data science tools for the storage, management, analysis, and visualization of data. NHGRI’s commitment to computational genomics and data science is a key component of the NHGRI 2020 Strategic Vision and is in alignment with the NIH Strategic Plan for Data Science, which provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem.

Read the Genomic Data Science Fact Sheet.

See the Draft 2023-2028 NIH Strategic Plan for Data Science

The NHGRI 2020 Strategic Vision highlights the importance of bioinformatics and computational biology by stating, “all major genomics breakthroughs to date have been accompanied by the development of groundbreaking statistical and computational methods.” See an extensive outline of the 2020 Strategic Vision.

Projects involving a substantial element of computational genomics or data science account for around 30% of NHGRI’s FY2023 budget; these areas are key components of many NHGRI grants and programs.

NHGRI’s support for computational genomics and data science follows the general principles and priorities identified in the FY2023 NHGRI Funding Policy. NHGRI prioritizes funding support on “the development of resources, approaches, and technologies that accelerate and support studies focused on the structure and biology of genomes; functional genomics; the genomics of disease; the implementation and effectiveness of genomic medicine, computational genomics and data science; training, developing, and expanding the diversity of the genomics workforce; and ethical, legal, and social issues related to genomic advances.”

The Computational Genomics and Data Science Program (CGDS) supports the development of advanced computational approaches, innovative data analysis tools, and data resources that provide scientific utility across the extramural research programs and divisions. The CGDS program includes a number of managed grants and programs spanning many scientific topics. These grants can be categorized usefully, though neither exhaustively nor perfectly, into three categories: Computational Genomics and Data Science Methods Development, Genomic Data Resources and Informatics Platforms, and Computational Genomics Training and Workforce Development. 

The links below lead to NIH RePORTER, a database that provides information on NIH funded grants and research activities. Each link associated with a category will display the portfolio of FY2023 grants that received funding from the NHGRI Computational Genomics and Data Science Program.

Computational Genomics and Data Science Methods Development

  • Computational Methods for Clinical Genomics: Development and implementation of genomic-based clinical informatics resources and tools that harmonize scalable, sharable and computable inferences of genomic knowledge with clinical practice guidelines. Includes frameworks and collaborative tools that allow researchers to share, analyze and secure genomic data and patient information.
  • Computational Methods for Functional Genomics: Development of novel methods, software and tools to analyze gene regulation, gene expression, epigenetic modifications and methylation data. Includes methods to integrate and interpret across multiple data types.
  • Computational Methods for Genomic Sequencing Data: Development of novel methods, software and tools to process, align, format and visualize genomic sequence reads; perform genome assembly; and extract sequence features. Includes graph-based and other novel approaches for pangenome analysis. Also includes general genomic analysis tools.
  • Computational Methods for Variation and Association Analysis: Development of novel methods, software and tools for identifying and interpreting genetic variation, elucidating the genetic architecture of human traits and disease, and analyzing population and evolutionary level genomic data.
  • Privacy and Security Technologies: Development of novel methods, software and tools to maximize security in genomic data sharing and storage.
  • General Computational Tools: Any development of novel methods, software, or tools not covered in the other categories.

Genomic Data Resources and Informatics Platforms

  • Genomic and Phenotypic Measures and Standards: Development of tools and standards to facilitate sharing and analysis of large-scale genomics data, phenotype data and associated metadata. Includes approaches to harmonize phenotypic information for use in genomic analysis, such as incorporating family history information, electronic phenotyping and ontology development.
  • Genomic Community Resources: Development and maintenance of resources that collect, curate, integrate and distribute comprehensive sets of genomic information from humans or biomedically relevant species. Includes software environments to store, share, analyze and visualize genomics data.

Computational Genomics Training and Workforce Development

  • Training and Workforce Development: Development of resources, research training, career development, classroom courses, or events for expanding and diversifying the genome informatics workforce.

Looking for a contact in your research area? Visit the NHGRI Extramural Scientific Areas of Emphasis and Program Contacts page for the Computational Genomics and Data Science Program.

2023-2028: In December 2023, NIH released a Draft 2023-2028 Strategic Plan for Data Science to solicit public comments. This updated Strategic Plan for Data Science builds on accomplishments from the initial NIH Strategic Plan for Data Science and will prepare NIH to face the acceleration of sophisticated new technologies and address the rapid rise in the quantity and diversity of data. The updated Strategic Plan supports the NIH Policy for Data Management and Sharing and embraces data-driven discovery as a powerful tool to elucidate biological processes and better characterize the health and health consequences of all people. The plan also fosters ethical use of new methodologies arising from artificial intelligence (AI) and machine learning (ML).

More information to come when the final 2023-2028 Strategic Plan for Data Science is released. 

2018-2023: As a result of the rapid changes in biomedical research and information technology, several pressing issues related to the data-resource ecosystem confront NIH and other components of the biomedical research community. To address these challenges, NIH released its first Strategic Plan for Data Science on June 4, 2018, to provide a roadmap for modernizing the NIH-funded biomedical data science ecosystem. In establishing this plan, NIH addresses storing data efficiently and securely; making data usable to as many people as possible (including researchers, institutions, and the public); developing a research workforce poised to capitalize on advances in data science and information technology; and setting policies for productive, efficient, secure, and ethical data use.

  • Future Directions of the NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL), October 29, 2021 (virtual): This workshop aimed to identify gaps, challenges, and future opportunities related to NHGRI’s investment in the AnVIL’s cloud-based infrastructure, tools, and services.
  • Genomic Medicine XIII: Developing a Clinical Genomic Informatics Research Agenda, February 9-10, 2021 (virtual): The goal of this meeting was to develop a research strategy on the use of genomic-based clinical informatics resources to improve the detection, treatment, and reporting of genetic disorders in clinical settings.
  • NHGRI Extramural Informatics & Data Science Workshop, September 29-30, 2016, Bethesda, MD: The goal of the workshop was to identify and prioritize opportunities of significance to the NHGRI Computational Genomics and Data Science Program over the next 3-5 years. The workshop produced a report outlining the opportunities it identified, which was presented to the NHGRI Council in May 2017.

Investigators interested in submitting applications to NHGRI are encouraged to contact NHGRI program staff before submission to discuss their specific aims and their choice of Funding Opportunity Announcement (FOA). Contact information for NHGRI program staff is at the bottom of this page. 

Investigator Initiated Research in Computational Genomics and Data Science (R01 and R21): PAR-21-254 and PAR-21-255 invite applications for a broad range of research efforts in computational genomics, data science, statistics, and bioinformatics that are relevant to basic genomic science, clinical genomic science, or both, and broadly applicable to human health and disease.

Genomic Resource Grants for Community Resource Projects (U24): PAR-23-124 is tightly focused on supporting major genomic resources, including those in informatics. Potential applicants are strongly encouraged to contact NHGRI Program Staff before developing an application.

Trans-NIH Enhancement and Management of Established Biomedical Data Repositories and Knowledgebases (U24): PAR-23-237 supports the enhancement and maintenance of established, widely used data resources.

Trans-NIH Early-stage Biomedical Data Repositories and Knowledgebases (U24): PAR-23-236 supports the initial development of a data resource, or a pilot of a significant modification to an existing resource.

Development and Implementation of Clinical Informatics Tools to Enhance Patients’ Use of Genomic Information (NOSI): NOT-HG-22-011 encourages applications to develop and implement patient-facing genomic-based clinical informatics tools that facilitate or enhance patient-provider electronic communication, patient tracking and registry functions, patient self-management and support, provider electronic prescribing, test tracking, referral tracking, and health care decision-making.

Parent NIH Solicitations: R01 (PA-20-185 and PA-20-183), Parent R21 (PA-20-195 and PA-20-194), and Parent K25 (PA-20-199) solicitations. These investigator-initiated grants allow researchers to target their specific area of science relevant to NHGRI’s mission (per the NHGRI Funding Policy). Other funding opportunities include PAR-21-075, which focuses on research experiences for students seeking a master’s degree. Additionally, NIH funding opportunities for Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) grants can be found at https://sbir.nih.gov/funding.

Broadening Opportunities for Computational Genomics and Data Science Education (UE5): RFA-HG-23-002 supports educational activities that encourage individuals from diverse backgrounds, including those from groups underrepresented in the biomedical and behavioral sciences, to pursue further studies or careers in research. This is a parallel effort with the (expired) RFA-HG-22-002, Educational Hub for Enhancing Diversity in Computational Genomics and Data Science.

Other Relevant NIH Funding Opportunities 

NHGRI's Funding Opportunities page links to various NHGRI funding opportunities and provides instructions for signing up for NHGRI's funding opportunities email list.

The webpage of the Office of Data Science Strategy (ODSS) provides resources and links to various informatics-related funding opportunities across the NIH and other Federal agencies.

Expired Funding Opportunities

RFA-HG-22-020 : Limited Competition: The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) (U24 Clinical Trial Not Allowed)

RFA-HG-22-021 : The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space Clinical Resource (ACR) (U24 Clinical Trial Not Allowed)

RFA-HG-17-011 : The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) (U24)

RFA-HG-22-002 : Educational Hub for Enhancing Diversity in Computational Genomics and Data Science (U24 Clinical Trials Not Allowed)

PAR-20-097 : Trans-NIH Biomedical Knowledgebase (U24)

  • See PAR-23-237 and PAR-23-236

PAR-20-089 : Trans-NIH Biomedical Data Repository (U24)

  • See PAR-23-237 and PAR-23-236

Genomic Data Science

Program Directors

Daniel A. Gilchrist, Ph.D.

  • Program Director
  • Division of Genome Sciences

Ajay Pillai, Ph.D.

  • Office of Genomic Data Science

Chris Wellington, B.S.

  • Program Director, Computational Genomics and Data Science

Helen Thompson

  • Program Specialist

Sarah Hutchison

  • Scientific Program Analyst

Nicolas Keller

Last updated: April 2, 2024

  • Office of Graduate Education

Program Overview

[Figure: Organization of the Stanford Biosciences PhD programs]

When you join Stanford Biosciences, you join a collaborative network tackling some of the world’s toughest questions. The Stanford Biosciences Home Programs comprise nine departments and five interdisciplinary programs, which span the School of Medicine and the School of Humanities and Sciences. These Home Programs are the foundation of our collaborative culture, offering students the opportunity to tailor their graduate education by working within an entire network of faculty, labs, and approaches to pursue their research.

Each student is admitted to a particular Home Program and initiates training with a core group of faculty, students, and postdoctoral fellows who share scientific interests. Many Home Programs host annual retreats—facilitating the exchange of ideas between Stanford colleagues and fostering team-building—as well as seminar series that invite outside speakers.

In addition to that intimate setting, all Biosciences students have access to faculty in every Home Program for laboratory rotations and potential thesis work. One of Stanford Biosciences’ biggest strengths is the physical proximity of programs and labs, which encourages face-to-face collaboration and fosters an environment of interdisciplinary innovation. Indeed, the Biosciences PhD Programs combine the supportive atmosphere of a small program with the many opportunities afforded by a large umbrella program: the best of both worlds.

A closer look

The 14 Home Programs in Stanford Biosciences’ collaborative network:

Biochemistry

Biomedical Data Science

Cancer Biology

Chemical and Systems Biology

Developmental Biology

Microbiology and Immunology

Molecular and Cellular Physiology

Neurosciences

Stem Cell Biology and Regenerative Medicine

Structural Biology

Related programs

Bioengineering

Biomedical Physics

Health Policy

Epidemiology and Clinical Research

Dual-Degree Programs

Dual-degree programs provide a select group of medical students with the opportunity to pursue training designed to equip them for careers in academic investigative medicine.

Table of Contents

  • Bioinformatics: Where Biology and Data Science Meet
  • Bioinformatics and Data Science in Biology
  • What Is Bioinformatics Used For?
  • What’s the Difference Between Bioinformatics and Computational Biology?
  • Do Both Require Coding Skills?
  • What Is Bioinformatic Visualization?
  • Conclusion: Bioinformaticians Needed

Bioinformatics: Where Biology and Data Science Meet

Scientists first mapped the human genome in 2003. Since then, the pace of genome sequencing has exploded, resulting in the generation of massive quantities of data. Experts predict that by 2025, genome sequencing will produce 40 exabytes (40 billion gigabytes) of data per year. For comparison, five exabytes is approximately equivalent to all the words ever spoken by humankind.

The challenges of storing, organizing, and gleaning insights from such a large volume of data are immense. That’s why bioinformatics, the application of computational tools to store, analyze, and interpret biological “big data,” is a fast-growing and increasingly important field. Bioinformaticians program and maintain databases of biological data, and they create and use algorithms to analyze and interpret that data.

Bioinformatics and Data Science in Biology

Bioinformatics is a multidisciplinary field that uses computer programming, machine learning, algorithms, statistics, and other computational tools to organize and analyze large volumes of biological data. Fields of biology that generate massive amounts of data include genomics, transcriptomics, proteomics, and metabolomics.

  • Genomics is the study of the complete genetic makeup of an organism. It focuses on deoxyribonucleic acid (DNA), the main component of chromosomes and the repository of genetic information. Sequencing just a single human genome generates 200 gigabytes of data. It once took over a decade to sequence a complete human genome. Today, with next-generation sequencing (NGS), the same task takes a single day.
  • Transcriptomics is the study of transcriptomes, the ribonucleic acid (RNA) transcripts produced by a genome. Scientists are particularly interested in how diseases and environmental factors affect transcript patterns. NGS is used in transcriptomics as well.    
  • Proteomics is the study of proteins, which carry out cellular work and regulate our bodies’ organs. Protein sequencing is usually done via a process called mass spectrometry. 
  • Metabolomics is the study of metabolites, small molecules inside of cells, tissues, and fluids in organisms. A better understanding of how metabolites work can help doctors deliver more individualized treatments for patients, a field called precision medicine. Nuclear magnetic resonance and mass spectrometry are used in metabolomics.  
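
The two data-volume figures quoted above can be checked and related with a little arithmetic. This short sketch assumes decimal (SI) units (1 EB = 10^18 bytes, 1 GB = 10^9 bytes). It also treats the 200 GB per-genome figure as fixed, which is a simplification. The result confirms that 40 exabytes equals 40 billion gigabytes. That volume matches the data from roughly 200 million genomes of that size, although the projected total covers all sequencing, not just human genomes.

```python
# Decimal (SI) units, assumed for this sketch.
EXABYTE = 10**18
GIGABYTE = 10**9

annual_output_bytes = 40 * EXABYTE   # projected yearly sequencing output (from the text)
per_genome_bytes = 200 * GIGABYTE    # data generated by one human genome (from the text)

annual_output_gb = annual_output_bytes // GIGABYTE
genome_equivalents = annual_output_bytes // per_genome_bytes

print(f"{annual_output_gb:,} GB per year")           # 40,000,000,000 GB per year
print(f"{genome_equivalents:,} genome-equivalents")   # 200,000,000 genome-equivalents
```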

Mapping and comparing DNA, studying protein sequences, and identifying patterns in large volumes of data are some of the primary ways bioinformatics aims to improve our understanding of biological processes.
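
Comparing DNA or protein sequences usually starts with an alignment. As an illustration, not a description of any particular bioinformatics tool, the sketch below implements the classic Needleman-Wunsch dynamic-programming algorithm for global alignment. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for the example. Practical tools use substitution matrices and affine gap penalties.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the optimal score and one optimal alignment ('-' marks a gap).
    """
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                              score[i - 1][j] + gap,                            # gap in b
                              score[i][j - 1] + gap)                            # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(global_alignment("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The table takes O(nm) time and memory, which is why genome-scale aligners rely on indexing and heuristics instead of filling the full matrix.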

What Is Bioinformatics Used For?

Bioinformatics entails the storage and management of biological data via the creation and maintenance of powerful databases, as well as the retrieval, analysis, and interpretation of data via algorithms and other computational tools. As such, it has applications for a wide range of fields. Here are just a few examples of how bioinformatics helps tackle real-world problems:

  • It can help cancer researchers identify which gene mutations cause cancer. Scientists can then develop targeted therapies exploiting that knowledge.
  • It can help biologists map evolutionary connections and ancestry. 
  • It can help pharmaceutical companies develop new drugs customized to a person’s individual genome.
  • It can aid in the development of new vaccines.
  • It can enable the development of crops that are more resistant to insects and disease.
  • It can identify microbes that are able to clean up environmental waste.
  • It can improve the health of livestock.  
  • It can help forensic scientists identify incriminating DNA evidence.

What’s the Difference Between Bioinformatics and Computational Biology?

Bioinformatics uses computer programming and algorithms to store, analyze, and interpret massive volumes of biological data. Computational biology uses computer science, statistics, and mathematics to analyze what are typically smaller volumes of data. Bioinformatics also incorporates more machine learning and artificial intelligence than computational biology does.

Do Both Require Coding Skills?

Becoming a bioinformatician requires coding skills and more technical training than becoming a computational biologist. Programming languages commonly used in bioinformatics include Bash, Python, Perl, R, C, and C++. Bioinformatics and computational biology overlap considerably, however, and are often integrated in colleges and research centers.
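
As a small taste of the coding involved, the Python sketch below parses FASTA, a common plain-text format for DNA and protein sequences. It then computes each sequence's GC content, the fraction of bases that are G or C. The function names are illustrative, not from any library. Production work would typically use an established library such as Biopython rather than a hand-written parser.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Header lines start with '>'; a sequence may wrap across several lines.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


example = io.StringIO(">seq1 toy example\nATGC\nGGCC\n>seq2\nATAT\n")
for name, seq in read_fasta(example):
    print(name, seq, round(gc_content(seq), 2))
# seq1 toy example ATGCGGCC 0.75
# seq2 ATAT 0.0
```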

What Is Bioinformatic Visualization?

Sometimes insights buried deep in a large volume of data come to light only when displayed in the right visual configuration. Bioinformatic visualization employs computerized procedures to transform data into visual representations that make the data more meaningful and easier to interpret. Examples of data visualization include:

  • Genome browsers that display genomic data in linear layouts consisting of multiple parallel “tracks,” enabling the comparison of sequencing data and experimental results (see figure).

[Figure: Bioinformatics]

  • Graphs that can identify outliers, errors, or mistaken assumptions in raw statistical data
  • 3D representations of genomes 
  • 3D representations of proteins
  • Visual representations of spatial transcriptomics
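
Graphs such as box plots flag outliers using a simple rule. The function below is a hedged sketch of the statistic behind that kind of plot, not of any specific visualization tool. It applies Tukey's fences, flagging values that fall more than 1.5 interquartile ranges below the first quartile or above the third. The read-depth numbers in the usage line are made up for illustration.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical per-sample sequencing read depths; one sample is suspicious.
depths = [30, 31, 29, 32, 30, 28, 31, 95]
print(iqr_outliers(depths))  # [95]
```

Note that `statistics.quantiles` uses the "exclusive" interpolation method by default. Other tools may place the quartiles slightly differently.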

Conclusion: Bioinformaticians Needed

We are amassing biological data at speeds and quantities that require increasingly powerful computational tools to store, organize, analyze, and interpret. Life scientists need bioinformatic skills to stay at the forefront of many research fields, while industries ranging from health care to agriculture to environmental conservation stand to benefit from the insights waiting to be gleaned from biological data. If you are passionate about biology, interested in computer programming, and excited about a career in data science, this may be the field for you!

A.I. Is Learning What It Means to Be Alive

Given troves of data about genes and cells, A.I. models have made some surprising discoveries. What could they teach us someday?

By Carl Zimmer

Published March 10, 2024; updated March 12, 2024

In 1889, a French doctor named Francois-Gilbert Viault climbed down from a mountain in the Andes, drew blood from his arm and inspected it under a microscope. Dr. Viault’s red blood cells, which ferry oxygen, had surged 42 percent. He had discovered a mysterious power of the human body: When it needs more of these crucial cells, it can make them on demand.

In the early 1900s, scientists theorized that a hormone was the cause. They called the theoretical hormone erythropoietin, or “red maker” in Greek. Seven decades later, researchers found actual erythropoietin after filtering 670 gallons of urine.

And about 50 years after that, biologists in Israel announced they had found a rare kidney cell that makes the hormone when oxygen drops too low. It’s called the Norn cell, named after the Norse deities who were believed to control human fate.

It took humans 134 years to discover Norn cells. Last summer, computers in California discovered them on their own in just six weeks.

The discovery came about when researchers at Stanford programmed the computers to teach themselves biology. The computers ran an artificial intelligence program similar to ChatGPT, the popular bot that became fluent with language after training on billions of pieces of text from the internet. But the Stanford researchers trained their computers on raw data about millions of real cells and their chemical and genetic makeup.

The researchers did not tell the computers what these measurements meant. They did not explain that different kinds of cells have different biochemical profiles. They did not define which cells catch light in our eyes, for example, or which ones make antibodies.

The computers crunched the data on their own, creating a model of all the cells based on their similarity to each other in a vast, multidimensional space. When the machines were done, they had learned an astonishing amount. They could classify a cell they had never seen before as one of over 1,000 different types. One of those was the Norn cell.

“That’s remarkable, because nobody ever told the model that a Norn cell exists in the kidney,” said Jure Leskovec, a computer scientist at Stanford who trained the computers.

The software is one of several new A.I.-powered programs, known as foundation models, that are setting their sights on the fundamentals of biology. The models are not simply tidying up the information that biologists are collecting. They are making discoveries about how genes work and how cells develop.

As the models scale up, with ever more laboratory data and computing power, scientists predict that they will start making more profound discoveries. They may reveal secrets about cancer and other diseases. They may figure out recipes for turning one kind of cell into another.

“A vital discovery about biology that otherwise would not have been made by the biologists — I think we’re going to see that at some point,” said Dr. Eric Topol, the director of the Scripps Research Translational Institute.

Just how far they will go is a matter of debate. While some skeptics think the models are going to hit a wall, more optimistic scientists believe that foundation models will even tackle the biggest biological question of them all: What separates life from nonlife?

Heart Cells and Mole Rats

Biologists have long sought to understand how the different cells in our bodies use genes to do the many things we need to stay alive.

About a decade ago, researchers started industrial-scale experiments to fish out genetic bits from individual cells. They recorded what they found in catalogs, or “cell atlases,” that swelled with billions of pieces of data.

Dr. Christina Theodoris, a medical resident at Boston Children’s Hospital, was reading about a new kind of A.I. model made by Google engineers in 2017 for language translations. The researchers provided the model with millions of sentences in English, along with their translations into German and French. The model developed the power to translate sentences it hadn’t seen before. Dr. Theodoris wondered if a similar model could teach itself to make sense of the data in cell atlases.

In 2021, she struggled to find a lab that might let her try to build one. “There was a lot of skepticism that this approach would work at all,” she said.

Shirley Liu, a computational biologist at the Dana-Farber Cancer Institute in Boston, gave her a shot. Dr. Theodoris pulled data from 106 published human studies, which collectively included 30 million cells, and fed it all into a program she created called GeneFormer.

The model gained a deep understanding of how our genes behave in different cells. It predicted, for example, that shutting down a gene called TEAD4 in a certain type of heart cell would severely disrupt it. When her team put the prediction to the test in real cells called cardiomyocytes, the beating of the heart cells grew weaker.

In another test, she and her colleagues showed GeneFormer heart cells from people with defective heartbeat rhythms as well as from healthy people. “Then we said, Now tell us what changes we need to happen to the unhealthy cells to make them healthy,” said Dr. Theodoris, who now works as a computational biologist at the Gladstone Institutes in San Francisco.

GeneFormer recommended reducing the activity of four genes that had never before been linked to heart disease. Dr. Theodoris’s team followed the model’s advice, knocking down each of the four genes. In two out of the four cases, the treatment improved how the cells contracted.

The Stanford team got into the foundation-model business after helping to build one of the biggest databases of cells in the world, known as CellXGene. Beginning in August, the researchers trained their computers on the 33 million cells in the database, focusing on a type of genetic information called messenger RNA. They also fed the model the three-dimensional structures of proteins, which are the products of genes.

From this data, the model — known as Universal Cell Embedding, or U.C.E. — calculated the similarity among cells, grouping them into more than 1,000 clusters according to how they used their genes. The clusters corresponded to types of cells discovered by generations of biologists.
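
The article does not describe U.C.E.'s internals, and the sketch below is not its method. It is a toy illustration of the general idea of placing cells in a shared vector space and labeling a new cell by similarity. It assumes each cell has already been converted into an embedding vector; the 3-dimensional values and the cell-type labels here are invented. A new cell is assigned to the type whose average embedding (centroid) is closest by cosine similarity.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroids(embeddings: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Average the embedding vectors of all cells that share a label."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def classify(vec: list[float], cents: dict[str, list[float]]) -> str:
    """Assign a new cell to the type whose centroid is most similar."""
    return max(cents, key=lambda label: cosine(vec, cents[label]))


# Invented toy embeddings for two hypothetical cell types.
embeddings = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 1.0, 0.2], [0.1, 0.9, 0.0]]
labels = ["type_A", "type_A", "type_B", "type_B"]
cents = centroids(embeddings, labels)
print(classify([0.8, 0.0, 0.1], cents))  # type_A
```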

U.C.E. also taught itself some important things about how the cells develop from a single fertilized egg. For example, U.C.E. recognized that all the cells in the body can be grouped according to which of three layers they came from in the early embryo.

“It essentially rediscovered developmental biology,” said Stephen Quake, a biophysicist at Stanford who helped develop U.C.E.

The model was also able to transfer its knowledge to new species. Presented with the genetic profile of cells from an animal that it had never seen before — a naked mole rat, say — U.C.E. could identify many of its cell types.

“You can bring a completely new organism — chicken, frog, fish, whatever — you can put it in, and you will get something useful out,” Dr. Leskovec said.

After U.C.E. discovered the Norn cells, Dr. Leskovec and his colleagues looked in the CellXGene database to see where they had come from. While many of the cells had been taken from kidneys, some had come from lungs or other organs. It was possible, the researchers speculated, that previously unknown Norn cells were scattered across the body.

Dr. Katalin Susztak, a physician-scientist at the University of Pennsylvania who studies Norn cells, said that the finding whetted her curiosity. “I want to check these cells,” she said.

She is skeptical that the model found true Norn cells outside the kidneys, since the erythropoietin hormone hasn’t been found in other places. But the new cells may sense oxygen as Norn cells do.

In other words, U.C.E. may have discovered a new type of cell before biologists did.

An ‘Internet of Cells’

Just like ChatGPT, biological models sometimes get things wrong. Kasia Kedzierska, a computational biologist at the University of Oxford, and her colleagues recently gave GeneFormer and another foundation model, scGPT, a battery of tests. They presented the models with cell atlases they hadn’t seen before and had them perform tasks such as classifying the cells into types. The models performed well on some tasks, but in other cases they fared poorly compared with simpler computer programs.

Dr. Kedzierska said she had great hopes for the models but that, for now, “they should not be used out of the box without a proper understanding of their limitations.”

Dr. Leskovec said that the models were improving as scientists trained them on more data. But compared with ChatGPT’s training on the entire internet, the latest cell atlases offer only a modest amount of information. “I’d like an entire internet of cells,” he said.

More cells are on the way as bigger cell atlases come online. And scientists are gleaning different kinds of data from each of the cells in those atlases. Some scientists are cataloging the molecules that stick to genes, or taking photographs of cells to illuminate the precise location of their proteins. All of that information will allow foundation models to draw lessons about what makes cells work.

Scientists are also developing tools that let foundation models combine what they’re learning on their own with what flesh-and-blood biologists have already discovered. The idea would be to connect the findings in thousands of published scientific papers to the databases of cell measurements.

With enough data and computing power, scientists say, they may eventually create a complete mathematical representation of a cell.

“That’s going to be hugely revolutionary for the field of biology,” said Bo Wang, a computational biologist at the University of Toronto and the creator of scGPT. With this virtual cell, he speculated, it would be possible to predict what a real cell would do in any situation. Scientists could run entire experiments on their computers rather than in petri dishes.

Dr. Quake suspects that foundation models will learn not just about the kinds of cells that currently reside in our bodies but also about kinds of cells that could exist. He speculates that only certain combinations of biochemistry can keep a cell alive. Dr. Quake dreams of using foundation models to make a map showing the realm of the possible, beyond which life cannot exist.

“I think these models are going to help us get some really fundamental understanding of the cell, which is going to provide some insight into what life really is,” Dr. Quake said.

Having a map of what’s possible and impossible to sustain life might also mean that scientists could actually create new cells that don’t yet exist in nature. The foundation model might be able to concoct chemical recipes that transform ordinary cells into new, extraordinary ones. Those new cells might devour plaque in blood vessels or explore a diseased organ to report back on its condition.

“It’s very ‘Fantastic Voyage’-ish,” Dr. Quake admitted. “But who knows what the future is going to hold?”

If foundation models live up to Dr. Quake’s dreams, they will also raise a number of new risks. On Friday, more than 80 biologists and A.I. experts signed a call for the technology to be regulated so that it cannot be used to create new biological weapons. Such a concern might apply to new kinds of cells produced by the models.

Privacy breaches could happen even sooner. Researchers hope to program personalized foundation models that would look at an individual’s unique genome and the particular way that it works in cells. That new dimension of knowledge could reveal how different versions of genes affect the way cells work. But it could also give the owners of a foundation model some of the most intimate knowledge imaginable about the people who donated their DNA and cells to science.

Some scientists have their doubts about how far foundation models will make it down the road to “Fantastic Voyage,” however. The models are only as good as the data they are fed. Making an important new discovery about life may depend on having data on hand that we haven’t figured out how to collect. We might not even know what data the models need.

“They might make some new discoveries of interest,” said Sara Walker, a physicist at Arizona State University who studies the physical basis of life. “But ultimately they are limited when it comes to new fundamental advances.”

Still, the performance of foundation models has already led their creators to wonder about the role of human biologists in a world where computers make important insights on their own. Traditionally, biologists have been rewarded for creative and time-consuming experiments that uncover some of the workings of life. But computers may be able to see those workings in a matter of weeks, days or even hours by scanning billions of cells for patterns we can’t see.

“It’s going to force a complete rethink of what we consider creativity,” Dr. Quake said. “Professors should be very, very nervous.”

Carl Zimmer covers news about science for The Times and writes the Origins column.

Explore Our Coverage of Artificial Intelligence

News and Analysis

David Autor, an M.I.T. economist and tech skeptic, argues that A.I. is fundamentally different from past waves of computerization.

Economists doubt that artificial intelligence is already visible in productivity data. Big companies, however, talk often about adopting it to improve efficiency.

OpenAI unveiled Voice Engine, an A.I. technology that can recreate a person’s voice from a 15-second recording.

Amazon said it had added $2.75 billion to its investment in Anthropic, an A.I. start-up that competes with companies like OpenAI and Google.

Gov. Bill Lee of Tennessee signed a bill to prevent the use of A.I. to copy a performer’s voice. It is the first such measure in the United States.

French regulators said Google failed to notify news publishers that it was using their articles to train its A.I. algorithms, part of a wider ruling against the company for its negotiating practices with media outlets.

Earn a data science certificate

Professional Certificates

  • Google Data Analytics Professional Certificate (offered by Google; 6 months at 10 hours per week)
  • IBM Data Science Professional Certificate (offered by IBM; 3 months at 12 hours per week)
  • IBM Data Analyst Professional Certificate (3 months at 10 hours per week)
  • IBM Data Engineering Professional Certificate (7 months at 3 hours per week)
  • Meta Database Engineer Professional Certificate (offered by Meta; 6 months at 6 hours per week)
  • IBM Data Analytics with Excel and R Professional Certificate (6 months at 4 hours per week)
  • IBM Business Intelligence (BI) Analyst Professional Certificate (offered by SkillUp EdTech; 5 months at 10 hours per week)
  • IBM Data Warehouse Engineer Professional Certificate (8 months at 3 hours per week)
  • Fractal Data Science Professional Certificate (offered by Fractal Analytics; 5 months at 10 hours per week)
  • Google Advanced Data Analytics Professional Certificate
  • Google Business Intelligence Professional Certificate (2 months at 10 hours per week)
  • Microsoft Azure Data Scientist Associate (DP-100) Professional Certificate (offered by Microsoft; 6.5 months at 4 hours per week)
  • DeepLearning.AI TensorFlow Developer Professional Certificate (offered by DeepLearning.AI; 4 months at 5 hours per week)
  • Microsoft Azure Data Engineering Associate (DP-203) Professional Certificate (7 months at 2 hours per week)
  • IBM Machine Learning Professional Certificate (3 months at 5 hours per week)
  • SAS Advanced Programmer Professional Certificate (offered by SAS; 5 months at 3 hours per week)
  • CertNexus Certified Data Science Practitioner Professional Certificate (offered by CertNexus)
  • CertNexus Certified Ethical Emerging Technologist Professional Certificate (6 months at 3 hours per week)
  • IBM AI Engineering Professional Certificate (3-4 months at 12 hours per week)
  • IBM Applied AI Professional Certificate
  • Preparing for Google Cloud Certification: Cloud Data Engineer Professional Certificate (offered by Google Cloud; 1.5 months at 5 hours per week)
  • SAS Programmer Professional Certificate (2 months at 14 hours per week)
  • SAS Visual Business Analytics Professional Certificate (2 months at 3-5 hours per week)
  • SAS Statistical Business Analyst Professional Certificate

MasterTrack Certificates

  • Data Analytics: Visualization, Prediction, and Decision-Making MasterTrack® Certificate (offered by Universidad de los Andes)
  • Introduction to Data Science MasterTrack® Certificate (offered by Pontificia Universidad Católica de Chile)

University Certificates

  • Post Graduate Certificate in Data Science and Machine Learning (offered by IIT Roorkee)
  • Digital Transformation Certificate (offered by Dartmouth College)
  • Post Graduate Certificate in Strategic Supply Chain Management with AI
  • Post Graduate Certificate in Machine Learning for Finance

Graduate Certificates

  • Data Science Graduate Certificate (offered by University of Colorado Boulder)

Learn more about online data science certificates on Coursera

Get job-ready for an in-demand career

Professional Certificates on Coursera can help you get job-ready for an in-demand career field in less than a year. Earn a career credential, apply your knowledge to hands-on projects that showcase your skills for employers, and get access to career support resources. Many programs also provide a pathway to an industry-recognized certification.

Benefit from master’s degree learning that can count as credit

With MasterTrack Certificates, portions of master’s programs have been split into online modules, so you can earn a high-quality, university-issued career credential at a breakthrough price in a flexible, interactive format. Benefit from a deeply engaging learning experience with real-world projects and feedback from expert instructors. If you are accepted to the full master’s program, your MasterTrack coursework can count towards your degree.

University Certificates

Begin developing expertise in your chosen field of study

In these certificate programs from leading universities, you can get the advanced training necessary to take on more senior roles in your chosen profession. Upon completion, you’ll earn a university-issued certificate you can share with recruiters, hiring managers, and employers. Plus, by immersing yourself in a cohort-based learning experience where you’ll engage in live, expert instruction, you’ll build your network and gain insights and advice from other working professionals.

COMMENTS

  1. Biomedical Data Science Graduate Program Overview

    Biomedical Data Science is a broad term comprising multiple areas. Bioinformatics develops novel methods for problems in basic biology. Translational Bioinformatics moves developments in our understanding of disease from basic research to clinical care. Clinical Informatics develops methods and tools directly applied to patient care.

  2. How a Biologist Became a Data Scientist

    I have been working in data science since 2004 when I was in my second year of PhD studies. By the year 2006, ... Fast forward to 2020, I am still using data science to make sense of data from biology, chemistry and medicine. Much of my work revolves around the discovery of drugs that exert promising modulatory property against diseases by ...

  3. PhD in Bioinformatics Data Science

    PhD in Bioinformatics Data Science. A Ph.D. in Bioinformatics Data Science will train the next generation of researchers and professionals who will play a key role in multi- and interdisciplinary teams, bridging life sciences and computational sciences. Students will receive training in experimental, computational and mathematical disciplines ...

  4. Computational Biology PhD

    The Computational Biology Graduate Group facilitates student immersion into UC Berkeley's vibrant computational biology research community. Currently, the Group includes over 46 faculty from across 14 departments of the College of Letters and Science, the College of Engineering, the College of Natural Resources, and the School of Public Health.

  5. Computational Biology Program

    The Computational Biology Ph.D. program is training the next generation of computational scientists to tackle research using the big genomic, image, remote sensing, clinical, and real-world data that are transforming the biological sciences. The graduate field of Computational Biology offers Ph.D. degrees in the development and application of ...

  6. Biomedical Data Science and Informatics, M.S. / Ph.D.

    Clemson University and the Medical University of South Carolina offer a joint Master of Science and Ph.D. degree in Biomedical Data Science and Informatics. This unique collaboration combines Clemson's strengths in computing, engineering, and public health with MUSC's expertise in biomedical sciences to produce the next generation of data scientists, prepared to manage and analyze big data ...

  7. Biological Data Sciences Track

    Biological Data Sciences leverages new and developing courses within computational and systems biology and across UCLA, and greatly aids students who aim to go directly into industry—biotech, pharmaceuticals, and more—as well as computational biology graduate school. The track has a strong focus and deep integration with life sciences.

  8. Bioinformatics Data Science

    The Ph.D. in bioinformatics data science trains the next generation of researchers and professionals to play a key role in multi- and interdisciplinary teams, bridging life sciences and computational sciences. Students will receive training in experimental, computational and mathematical disciplines through their coursework and research.

  9. Biological Data Science, MS

    Program description. The MS degree program in biological data science provides students with a foundation in biology and computational methods along with hands-on training through practical projects at the interface of the natural and mathematical sciences. Students learn to manipulate big data, including the generation and analysis of data ...

  10. Biomedical Data Science

    Kavli Neuroscience Discovery Institute. Mathematical Institute for Data Science (MINDS). Translational Tissue Engineering Center. Johns Hopkins Biomedical Engineering, Homewood Campus, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218. (410) 516-8120.

  11. PhD in Data Science

    A PhD is the most advanced data science degree you can get, reflecting a depth of knowledge and technical expertise that will put you at the top of your field. ... Program requirements include a year of calculus and college biology, as well as experience in computer programming. Delivery method: campus. GRE: required. 2022-2023 tuition: $10,858 ...

  12. PHD Data Science Option

    PhD students have the opportunity to pursue their PhD with a Data Science option. The Data Science option prepares the next generation of thought leaders to both apply new data science methods and build new data science tools. It recognizes Ph.D. students whose thesis work focuses specifically on advanced data science tools and ...

  13. Biological Data Sciences Concentration

    Track Curriculum. The Biological Data Sciences concentration tackles a diverse set of biological questions, ranging from medicine, genomics, physiology, pharmacology, and neuroscience to ecology and evolution, using recent tools and advances in mathematics and computation, specifically machine learning, statistical data sciences ...

  14. Can Biologists Become Data Scientists? (Answered!)

    Here's a short answer: Yes, biologists can transition to be data scientists. Biology is becoming an increasingly quantitative and data-heavy domain. It is a complex field where life is quantified through biomedical data science, bioinformatics, and computational biology, which allow smoother transitions from biology to data science.

  15. Biological Sciences (data science) PhD Projects, Programmes ...

    Fully funded studentship - Deep learning methods for large citizen science data sets. University of Kent School of Mathematics, Statistics and Actuarial Science. Supervisors: This is a joint project between the University of Kent and Butterfly Conservation, and the PhD student will be supervised by a team with expertise in Statistics ...

  16. The Birth of Bio-data Science: Trends, Expectations, and Applications

    We term this variant of data science "bio-data science" (BDS). BDS comprises three core disciplinary areas: biology (which constitutes the application domain), computer science, as well as mathematics and statistics (Figure 1). The biology core area is concerned with questions regarding biological origin, such as the cause of a ...

  17. PhD in Data Science

    An NRT-sponsored program in Data Science. Advances in computational speed and data availability, and the development of novel data analysis methods, have birthed a new field: data science. This new field requires a new type of researcher and actor: the rigorously trained, cross-disciplinary, and ethically responsible data scientist. Launched in Fall 2017, the …

  18. Data Science for Biology MSc

    The University of Edinburgh has a thriving data science community, which you would become part of. Based in the School of Biological Sciences, ranked in the top 5 for the UK for research power (REF 2022), you will be deeply focused on applying data science concepts within the domain of biological sciences. On completion of this programme you ...

  19. Where To Earn A Ph.D. In Data Science Online In 2024

    Based in San Diego, California, National University (NU) offers a variety of online programs, including a Ph.D. in data science. NU's program requires 60 credits and takes an estimated 40 months ...

  20. Getting a PhD in Data Science: What You Need to Know

    A PhD in Data Science is a research degree that typically takes four to five years to complete but can take longer depending on a range of personal factors. In addition to taking more advanced courses, PhD candidates devote a significant amount of time to teaching and conducting dissertation research with the intent of advancing the field. At ...

  21. Computational Genomics and Data Science Program

    The NHGRI 2020 Strategic Vision highlights the importance of bioinformatics and computational biology by stating, "all major genomics breakthroughs to date have been accompanied by the development of groundbreaking statistical and computational methods." Projects involving a substantial element of computational genomics or data science account for over a quarter of NHGRI's FY2021 budget ...

  22. What can a PhD add to your data science career?

    There are many career paths towards data science. Even though the field was mostly populated by people with academic backgrounds at the beginning, this is definitely not the only valid entry point. The long-standing debate about whether or not you should have a PhD to be a data scientist has been settled: you don't. However, a PhD can ...

  23. Program Overview

    The Graduate Home Program in Immunology is a premier training program that is collaborative and multidisciplinary. We offer two tracks: Molecular, Cellular, and Translational Immunology (MCTI) and Computational and Systems Immunology (CSI). Our Ph.D. curriculum includes lab and foundation courses, immunology and computational biomedical ...

  24. MS in Bioinformatics

    The Master of Science in Bioinformatics is structured to provide students with the skills and knowledge to develop, evaluate, and deploy bioinformatics and computational biology applications. The program is designed to prepare students for employment in the biotechnology sector, where the need for knowledgeable life scientists with quantitative ...

  25. Bioinformatics: Where Biology and Data Science Meet

    Bioinformatics and Data Science in Biology. Bioinformatics is a multidisciplinary field that utilizes computer programming, machine learning, algorithms, statistics, and other computational tools to organize and analyze large volumes of biological data. Fields of biology that generate massive amounts of data include genomics, transcriptomics ...

  26. Graduate Degrees

    Northeastern is the world leader in experiential learning. Here, graduate students—from the master's through the doctorate, and in professional and certificate programs—put knowledge to work at Fortune 500 and startup companies, universities, government agencies, nonprofits, and global organizations.

  27. A.I. Is Learning What It Means to Be Alive

    On Friday, more than 80 biologists and A.I. experts signed a call for the technology to be regulated so that it cannot be used to create new biological weapons. Such a concern might apply to new ...

  28. Medical Sciences MS with Concentration in Medical Anatomy & Physiology

    This online, non-thesis Master of Science program is designed to provide online courses in a learning format that address anatomical base knowledge, including gross anatomy, microscopic anatomy, embryology, and cell biology, and physiology. Furthermore, the program is flexible enough to allow the student to pursue neuroanatomy and medical physiology courses as electives. This MS…

  29. Build your Data career with a Certificate in Data Science

    Data Science Graduate Certificate. Offered by University of Colorado Boulder. 6-9 months. Develop interdisciplinary skills in data science and gain knowledge of statistical analysis, data mining, and machine learning.