Data Science Overview 📖

MATH/COSC 3570 Introduction to Data Science

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

What is Data Science or a Data Scientist?

A Little History of Data Science

Source: https://www.reddit.com/r/meme/comments/floq3q/reality_behind_data_science/

Source: https://br.ifunny.co/picture/we-will-work-together-statistics-computer-science-please-teach-now-h4hdtthT9?s=cl

😕 Still, what on earth is data science?

Battle of the Data Science Venn Diagrams

Shall We Continue?

  • You probably get the idea. There are so many ways to define data science.

What Is a Data Scientist?

Source: https://www.instagram.com/data_science_learn/

Source: https://www.instagram.com/data_science_dojo/

How Wikipedia Defines It

GenAI Prompting

  • Make AI your thinking partner and tutor!

  • Copy and paste the paragraph below into your GenAI chat as the initial prompt.

I am a college student in an introductory data science course. Be my thinking partner and tutor, not just an answer generator. Explain everything at an introductory level in plain language and avoid advanced jargon unless I ask. Walk through problems step by step and show your reasoning, and ask up to two quick questions if you need missing details. Use a tiny example when it helps. If code is needed, use either R (tidyverse) or Python (pandas, scikit-learn) based on what I say I am using, and keep the code minimal, well commented, and explained line by line. If my approach is wrong, tell me clearly what is wrong and how to fix it. End with one quick check for understanding and a one-sentence summary.


  • Why should we care about and learn data science?

> How does data science show up in everyday decisions I already care about?

> What mistakes happen when decisions are made without data or with bad data?

> How is data science used in my major: [environmental science, computer science, etc.]?


  • Share your AI’s responses and your thoughts!

Data Science in This Course

  • Data science is a discipline that allows us to turn raw data into understanding, insight, and knowledge.

  • We’re going to learn to do this in a tidy way – more on that later!

  • This is an introductory data science course with an emphasis on the important R/Python tools that help us do data science.

A Data Science Project

Data Science Workflow

  • Import: Take data stored somewhere and load it into your workspace.
  • Tidy: Store data in a consistent rectangular form, i.e., a data matrix.
  • Transform: Narrow in on observations of interest, create new variables, and calculate summary statistics.
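
The three steps above can be sketched in a few lines of Python. This is only an illustration using the standard library: the CSV text, its values, and the helper names `import_data`, `tidy`, and `transform` are all made up for this example (the column names `displ` and `hwy` are borrowed from the `mpg` data, but the numbers are not). In practice you would likely use pandas or the tidyverse instead.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data standing in for a CSV file stored somewhere.
# The values are invented; note the missing displ in the last record.
RAW_CSV = """model,displ,hwy
compact_a,1.8,29
compact_b,2.0,31
suv_a,5.3,20
suv_b,,17
"""

def import_data(text):
    """Import: load the raw records into the workspace (everything is a string)."""
    return list(csv.DictReader(io.StringIO(text)))

def tidy(rows):
    """Tidy: one row per car, one column per variable, consistent types."""
    return [
        {
            "model": row["model"],
            "displ": float(row["displ"]) if row["displ"] else None,  # None marks missing
            "hwy": int(row["hwy"]),
        }
        for row in rows
    ]

def transform(rows):
    """Transform: keep small engines, add a new variable, compute a statistic."""
    small = [dict(r) for r in rows if r["displ"] is not None and r["displ"] < 3]
    for r in small:
        # New variable: highway mileage converted from miles/gallon to km/litre.
        r["hwy_km_per_l"] = round(r["hwy"] * 0.4251, 2)
    return small, mean(r["hwy"] for r in small)

raw = import_data(RAW_CSV)
cars = tidy(raw)
small_cars, mean_hwy = transform(cars)
print(small_cars)
print(mean_hwy)
```

Each function returns a new table rather than modifying its input, so every stage of the workflow can be inspected on its own.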

Data Matrix

  • Each row corresponds to a unique case or observational unit.

  • Each column represents a characteristic or variable.

  • This structure allows new cases to be added as rows or new variables as new columns.
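
As a small illustration of the points above, here is a data matrix sketched in plain Python as a list of rows. The variable names, the values, and the helper `is_rectangular` are hypothetical and chosen only for this example.

```python
# A tiny data matrix: each dict is one row (a case), each key is a column (a variable).
# All values are made up for illustration.
data_matrix = [
    {"student": "A", "hours_studied": 2, "score": 70},
    {"student": "B", "hours_studied": 5, "score": 88},
]

def is_rectangular(rows):
    """Return True when every row has exactly the same set of variables."""
    columns = set(rows[0])
    return all(set(row) == columns for row in rows)

# Adding a new case appends a row.
data_matrix.append({"student": "C", "hours_studied": 3, "score": 79})

# Adding a new variable adds a column: one value for every existing row.
for row in data_matrix:
    row["passed"] = row["score"] >= 75

n_rows = len(data_matrix)
n_cols = len(data_matrix[0])
print(n_rows, n_cols, is_rectangular(data_matrix))
```

The table stays rectangular after both changes because the new row supplies every existing variable and the new column supplies a value for every row.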

  • Visualisation: A good visualisation shows you things that you did not expect or raises new questions about the data.

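library(tidyverse)  # loads ggplot2, which also provides the mpg dataset
mpg |> ggplot(aes(x = displ, y = hwy)) +  # engine displacement vs. highway mileage
    geom_point(aes(color = class)) +      # one point per car, coloured by vehicle class
    geom_smooth() +                       # add a smoothed trend curve
    theme_bw()                            # black-and-white theme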

  • Model: Models are complementary tools to visualisation. Once you have made your questions sufficiently precise, you can use a model to answer them.

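library(tidymodels)  # also attaches ggplot2, which provides mpg
linear_reg() |>                    # specify a linear regression model
    set_engine("lm") |>            # fit it with base R's lm()
    fit(hwy ~ displ, data = mpg)   # regress highway mileage on engine displacement

parsnip model object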


Call:
stats::lm(formula = hwy ~ displ, data = data)

Coefficients:
(Intercept)        displ  
      35.70        -3.53  
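
The output above reports two numbers, an intercept and a slope. For a linear regression with one predictor, these are the ordinary least squares estimates, which can be computed by hand. The sketch below does this in plain Python on a tiny made-up dataset (not the `mpg` data, so it will not reproduce 35.70 and -3.53); the function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: how x and y vary together, divided by how much x varies.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    # Intercept: makes the fitted line pass through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up values standing in for engine size (x) and highway mileage (y).
x = [1.0, 2.0, 3.0, 4.0]
y = [30.0, 27.0, 26.0, 21.0]

intercept, slope = fit_line(x, y)
print(intercept, slope)

# Use the fitted line to predict y at a new x value.
prediction = intercept + slope * 2.5
print(prediction)
```

A negative slope reads the same way as in the R output: each one-unit increase in x is associated with a decrease in the predicted y.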

  • Communication: It doesn’t matter how well your models and visualisations have led you to understand the data unless you can also communicate your results to others.

  • Programming: Surrounding all these tools is programming.

R for Data Science

Source: https://teachdatascience.com/tidyverse/

Python for Data Science

Source: https://www.e2enetworks.com/blog/9-python-libraries-for-data-science-and-artificial-intelligence

GenAI Prompting


  • Making Data Science Concrete Through Examples

> Walk through a simple data science problem from question to conclusion.


  • Share your example! Does the workflow generally follow what we discussed?