MATH/COSC 3570 Introduction to Data Science (Spring 2026)

This course introduces the main aspects of carrying out a practical data science project, from importing data to deploying what is learned from it. We start with popular data science tools: basic R and Python programming, Git and GitHub, and the interactive publishing system Quarto. We then cover data importing, data visualization, and data wrangling in both R and Python. The second half of the course focuses on several basic simulation and machine learning methods, including Monte Carlo simulation, linear regression, K-nearest neighbors, logistic regression, principal component analysis, and K-means clustering. In R, we use the tidyverse and tidymodels packages; in Python, we introduce the Pandas, NumPy, and Scikit-Learn libraries.

Learning outcomes

By the end of the course, students will be able to:

Execute a reproducible data science workflow using R and Python with Quarto and version control practices (Git and GitHub) to produce, share, and update analyses.
Import, clean, reshape, and visualize data to create effective representations that support exploration and decision making.
Apply machine learning and statistical modeling methods, including simulation and core supervised and unsupervised learning tools, to detect patterns and evaluate model performance.
Design and communicate a practical data science project by posing meaningful questions, justifying method choices, acknowledging limitations, and presenting results with clear, reproducible materials, with appropriate citation of external code and of any permitted generative AI use.

[Figure: six generic data visualizations: a scatterplot, a density plot, a contour plot, a line plot, a box plot, and another scatterplot.]