Welcome Aboard 🙌

MATH/COSC 3570 Introduction to Data Science

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

Taipei, Taiwan

My Journey

  • Assistant Professor (2020/08 - )

  • Postdoctoral Fellow

  • PhD in Statistics

  • MA in Economics/PhD program in Statistics

How to Reach Me

  • Office hours: TuTh 3:20 - 4:20 PM and Wed 2 - 3 PM in Cudahy Hall 353.
  • 📧
    • I will answer your question within 24 hours.
    • Expect a reply on Monday if you send me a message over the weekend.
    • Start your subject line with [math3570] or [cosc3570] followed by a clear description of your question.
  • I will NOT reply to your e-mail if … Check the email policy in the syllabus!

What is This Course?

  • Every aspect of doing a practical data science project, from importing data to deploying what we learn from data.

❓ What are the prerequisites?
👉 COSC 1010 (Intro Programming) and MATH 4720 (Intro Stats) or MATH 2780 (Intro Regression)


❓ Is this like another intro stats course?
👉 No. Statistics and data science are closely related.

Nowadays
👉 Data science is a broader subject than statistics.

👉 Statistics focuses more on analyzing and learning from data, a part of the entire workflow of data science.


❓ Is this like another intro CS or programming course?
👉 Absolutely not. We learn how to code for doing data science, not for understanding computer systems and structures.

What is NOT Covered in This Course

  • Advanced data analytics and computing
    • MATH 4750 Statistical Computing
    • MATH 4760 Time Series Analysis
    • MATH 4780 Regression Analysis
    • MATH 4790 Bayesian Statistics
    • COSC 4600 Fundamentals of Artificial Intelligence
    • COSC 4610 Data Mining
    • COEN 4860 Introduction to Neural Networks
  • Big data: We start with small, in-memory data sets. You don’t know how to tackle big data unless you have experience with small data.
  • Database: Learn SQL in
    • COSC 4800 Principles of Database Systems
    • INSY 4052 Database Management Systems.

What Computing Languages?

~ 60%

~ 40%

  • You’ve learned Python in COSC 1010. Being bilingual in R and Python is becoming more important!

👉 Wouldn’t it be great to add both languages to your resume? 😎

❌ Don’t want to learn R and/or Python? Take another section or take the course in a later semester.

❌ Drop deadline: 01/20 (Tue), 11:59 PM.

Where to Code? Posit Cloud


  • Posit Cloud gives you solid computing power and interactive collaboration with me and your teammates!

Course Materials

  • All course materials and information can be found on this website. Bookmark it!

Textbook (Really?!)

Learning Management System (D2L)

  • Check your grade: Assessments > Grades

  • New announcement: News

Grading Policy ✨

  • Your grade is based on the following categories and weights:

    • 25% In-class lab activities

    • 10% In-class AI activities

    • 45% Mini projects

    • 20% Final project competition

  • ❌ You must participate in the final presentation in order to pass the course.

  • ❌ You will NOT be allowed to do any extra credit projects/homework/exam to compensate for a poor grade.
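
As a rough illustration of how the four category weights combine, here is a minimal Python sketch. It assumes each category is scored on a 0-100 scale, which the policy does not state; the names `WEIGHTS`, `overall_percentage`, and the example scores are hypothetical.

```python
# Category weights taken from the grading policy (percent of the final grade).
WEIGHTS = {
    "labs": 25,
    "ai_activities": 10,
    "mini_projects": 45,
    "final_project": 20,
}


def overall_percentage(scores):
    """Combine per-category scores (assumed to be on a 0-100 scale) into one percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    # Weighted sum of the category scores, divided by the total weight (100).
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Made-up scores, for illustration only.
example = {"labs": 100, "ai_activities": 90, "mini_projects": 80, "final_project": 85}
print(overall_percentage(example))  # 87.0
```

The sketch covers only the weighting; it does not model the requirement to take part in the final presentation.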

Grade-Percentage Conversion

  • \([x, y)\) means greater than or equal to \(x\) and less than \(y\). For example, 94.0 is in [94, 100], so the grade is A; 93.8 is in [90, 94), so the grade is A-.
Grade Percentage
A [94, 100]
A- [90, 94)
B+ [87, 90)
B [84, 87)
B- [80, 84)
C+ [77, 80)
C [74, 77)
C- [70, 74)
D+ [65, 70)
D [60, 65)
F [0, 60)
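
The conversion table can be read as a lookup on lower bounds. The following Python sketch is one way to express it; the names `CUTOFFS` and `letter_grade` are hypothetical, and it applies no rounding because the table does not mention any.

```python
# Lower bound of each interval in the conversion table, highest first.
CUTOFFS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (65, "D+"), (60, "D"), (0, "F"),
]


def letter_grade(percentage):
    """Return the letter whose interval [lower, upper) contains the percentage."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # The first cutoff that the percentage reaches determines the letter.
    for lower, letter in CUTOFFS:
        if percentage >= lower:
            return letter


print(letter_grade(94.0))  # A
print(letter_grade(93.8))  # A-
```

Checking the bounds from highest to lowest makes each interval closed on the left and open on the right, matching the \([x, y)\) notation.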

Lab Exercises (25%)

  • Graded as Complete/Incomplete and used as evidence of attendance and participation.

  • You are allowed one incomplete lab exercise without any penalty.

  • Beyond that, 2% will be taken off your grade for each missing/incomplete exercise.

  • You will create a project in Posit Cloud to save all of your lab exercises. (We’ll go through how to do this together.)
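
To make the penalty rule concrete, here is a small Python sketch of the arithmetic. It computes only the size of the deduction; the slides do not say whether the deduction applies to the lab category or to the overall grade, so that is left open. The names `FREE_INCOMPLETE_LABS`, `PENALTY_PER_LAB`, and `lab_penalty` are hypothetical.

```python
FREE_INCOMPLETE_LABS = 1  # one incomplete lab carries no penalty
PENALTY_PER_LAB = 2       # percent deducted for each additional incomplete lab


def lab_penalty(incomplete_labs):
    """Return the total percent deducted for a given number of missing/incomplete labs."""
    if incomplete_labs < 0:
        raise ValueError("incomplete_labs cannot be negative")
    # Only labs beyond the free allowance are penalized.
    return PENALTY_PER_LAB * max(0, incomplete_labs - FREE_INCOMPLETE_LABS)


for n in range(4):
    print(n, lab_penalty(n))  # 0 0, 1 0, 2 2, 3 4
```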

AI activities (10%)

  • AI activities are short presentations during class. They are usually graded as complete or incomplete.

  • AI activities are used as evidence of both attendance and participation.

  • Groups take turns presenting what they learn from GenAI about data science.

Mini Projects (45%)

  • You will work in a team on 3 mini projects.

  • Each project will focus on a subset of the course content up to that point, for example, data wrangling and visualization or a simple predictive model.

  • More detailed instructions and rubric will be provided later in the semester.

Final Project Competition (20%)

  • You will team up to do the final project.

  • Your project can be in any of the following categories:

    1. Data analysis using statistical models or machine learning algorithms

    2. Introduce an R or Python package not learned in class, including a live demo

    3. Introduce a data science tool (visualization, computing, etc.) not learned in class, including a live demo

    4. Introduce a programming language for data science not learned in class (for example, Julia, SQL, MATLAB, or SAS), including a live demo

    5. Web development: Shiny website or dashboard, including a live demo

  • The final project presentation is on Monday, 5/4, 10:30 AM - 12:30 PM.

  • More information will be released later.

Document Generative AI Use

  • You may use generative AI tools such as ChatGPT and Gemini to generate a first draft of text for your work, provided that this use is documented and cited.
  • Unless explicitly stated otherwise, you may make use of any online resources, but you must explicitly cite/document where you obtained any code you directly use or use as inspiration in your solutions.

  • If you use GenAI, please include the following in your submitted work:

    • Why/How I used AI (prompts or questions)
    • Generated output (screenshot or copy-paste excerpt)
    • How I used the output

Document Generative AI Use

  • Why/How I used AI (prompts and questions)
    • I asked ChatGPT to generate a histogram using R.
  • Generated output (screenshot or copy-paste excerpt)

  • How I used the output
    • I reviewed the suggestions, but I did not use the exact code. Instead, I changed the code formatting and set the breaks value to 50.

GenAI and Academic Integrity