Advanced Statistics for Data Science Specialization

Course Title:

Advanced Statistics for Data Science Specialization



Course Description:

Fundamental concepts in probability, statistics, and linear models are primary building blocks for data science work. Learners aspiring to become biostatisticians and data scientists will benefit from the foundational knowledge offered in this specialization. It enables learners to understand the behind-the-scenes mechanisms of key modeling tools in data science, such as least squares and linear regression. The specialization starts with the Mathematical Biostatistics Boot Camps, which cover concepts and methods used in biostatistics applications, ranging from probability, distributions, and likelihood to hypothesis testing and case-control sampling. It then covers linear models for data science, moving from least squares viewed from a linear algebraic and mathematical perspective to statistical linear models, including multivariate regression using the R programming language. These courses give learners a firm foundation in the linear algebraic treatment of regression modeling, which will greatly augment applied data scientists' general understanding of regression models.



Course instructional level:


Advanced

Course Duration:


3 Months/6 Months
Hours: 45

Course coordinator:


Prof. Nilanjan Nandy


Course coordinator's profile(s):

Dr. Nilanjan Nandy is a mathematician who earned his Ph.D. in Mathematics from Visvesvaraya National Institute of Technology, Nagpur. He has published numerous research papers in well-regarded international journals. In addition to his research, Dr. Nandy is likely involved in teaching and mentoring students.

Course Outcomes:

  1. Learn about probability, expectations, conditional probabilities, distributions, confidence intervals, bootstrapping, binomial proportions, and more.
  2. Understand the matrix algebra of linear regression models.
  3. Learn about canonical examples of linear models to relate them to techniques that you may already be using.
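As a rough illustration of the bootstrapping and confidence-interval ideas named in the first outcome, the sketch below computes a percentile bootstrap interval for a sample mean. It is not taken from the course materials: the course itself uses R, Python is used here only for illustration, and the function name `bootstrap_mean_ci`, its parameters, and the sample values are all hypothetical.

```python
import random


def bootstrap_mean_ci(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Resample n observations with replacement and record the mean.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    alpha = 1.0 - level
    # Read the interval endpoints off the sorted bootstrap means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements, made up for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
lower, upper = bootstrap_mean_ci(sample)
print(f"95% bootstrap CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The percentile method is only one of several bootstrap interval constructions; it is chosen here because it is the shortest to write down.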

Module-wise course details (online platform: Coursera):

1. Mathematical Biostatistics Boot Camp 1
University/organization name: Johns Hopkins University
Course instructor name: Brian Caffo, PhD
Web page link (all sub-topics): https://www.coursera.org/learn/biostatistics?specialization=advanced-statistics-data-science
Sub-topics:
  1. Introduction, Probability, Expectation, and Random Vectors
  2. Conditional Probability, Bayes' Rule, Likelihood, Distributions, and Asymptotics
  3. Confidence Intervals, Bootstrapping, and Plotting
  4. Binomial Proportions and Logs

2. Mathematical Biostatistics Boot Camp 2
University/organization name: Johns Hopkins University
Course instructor name: Brian Caffo, PhD
Web page link (all sub-topics): https://www.coursera.org/learn/biostatistics-2?specialization=advanced-statistics-data-science
Sub-topics:
  1. Hypothesis testing
  2. Two Binomials
  3. Discrete Data Setting
  4. Techniques

3. Advanced Linear Models for Data Science 1: Least Squares
University/organization name: Johns Hopkins University
Course instructor name: Brian Caffo, PhD
Web page link (all sub-topics): https://www.coursera.org/learn/linear-models?specialization=advanced-statistics-data-science#modules
Sub-topics:
  1. Background
  2. One and two parameter regression
  3. Linear regression
  4. General least squares
  5. General least squares
  6. Least squares example

4. Advanced Linear Models for Data Science 2: Statistical Linear Models
University/organization name: Johns Hopkins University
Course instructor name: Brian Caffo, PhD
Web page link (all sub-topics): https://www.coursera.org/learn/linear-models-2?specialization=advanced-statistics-data-science
Sub-topics:
  1. Introduction and expected values
  2. The multivariate normal distribution
  3. Distributional results
  4. Residuals

Course Contents:



Modules/topics and sub-topics:

1. Mathematical Biostatistics Boot Camp 1
  1. Introduction, Probability, Expectation, and Random Vectors
  2. Conditional Probability, Bayes' Rule, Likelihood, Distributions, and Asymptotics
  3. Confidence Intervals, Bootstrapping, and Plotting
  4. Binomial Proportions and Logs

2. Mathematical Biostatistics Boot Camp 2
  1. Hypothesis testing
  2. Two Binomials
  3. Discrete Data Setting
  4. Techniques

3. Advanced Linear Models for Data Science 1: Least Squares
  1. Background
  2. One and two parameter regression
  3. Linear regression
  4. General least squares
  5. General least squares
  6. Least squares example

4. Advanced Linear Models for Data Science 2: Statistical Linear Models
  1. Introduction and expected values
  2. The multivariate normal distribution
  3. Distributional results
  4. Residuals
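To give a flavor of the "One and two parameter regression" sub-topic listed under module 3, here is a minimal sketch of a least squares line fit obtained by solving the 2x2 normal equations directly. It is an illustration under stated assumptions, not material from the course: the course uses R, Python is used here only for illustration, and the function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 for the model y = b0 + b1 * x."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations (X^T X) b = X^T y, where X has columns [1, x]:
    #   [ n   sx  ] [b0]   [ sy  ]
    #   [ sx  sxx ] [b1] = [ sxy ]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be equal")
    # Solve the 2x2 system with Cramer's rule.
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical points lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1)  # prints 1.0 2.0
```

With more than one predictor, the same normal equations apply with a wider design matrix, and in practice they would be solved with a linear algebra routine rather than by hand.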

