AI Activity Example

Author

Dr. Cheng-Han Yu

Modified

April 23, 2026

Topic: Data Importing and Hidden Assumptions

You will use an AI tool as a learning partner to investigate a core data science reality:

Importing data is not a neutral step. Import tools make assumptions about missing values, data types, dates, text encoding, and formatting. When those assumptions are wrong, the import can silently alter the dataset before analysis begins.

You are graded on how you think, critique, and explain, not on whether AI gives perfect answers.

To Do

Download the dataset from the Dataset section below.
Write your investigation question (2 to 3 sentences). Step 1 in Tasks.
Run 3 to 5 AI prompts that cover missing values, types, dates, and validation. Step 2 in Tasks.
Import the data and complete checklist items 1 to 4 first. Step 4 in Tasks.
Draft your synthesis, then build slides using the required 5 slide structure. Steps 5 and 6 in Tasks.

Assigned Roles (3 students)

Prompt Engineer

Responsibilities

Create 3 to 5 purposeful AI prompts
Save AI responses and build the AI Interaction Log
Write annotations for each prompt and response pair

Data Science Auditor

Responsibilities

Identify hidden assumptions in AI guidance
Design and run post import validation checks
Provide evidence for at least two hidden assumptions

Synthesizer

Responsibilities

Write the Human Authored Synthesis in clear course language
Build slides using the required structure
Ensure the final work is consistent and concise

Dataset

insideairbnb_chicago_listings_sample_600_2025_09_22.csv

What you will submit

Group submissions
- AI Interaction Log
- Human Authored Synthesis
- Slides for a 5 to 7 minute presentation
Individual submission
- Individual Reflection (150 to 200 words)

Step-by-step tasks

Step 1 Define your investigation question (before using AI)

Write 2 to 3 sentences answering:

What import problem are we investigating?
Why does this problem matter in real data science?

You must focus on hidden assumptions, for example assumptions about missing values, types, dates, or text encoding.

Step 2 Use AI strategically (3 to 5 prompts)

Your prompts must be purposeful and iterative.

Note: You may run more prompts than this, but report only the 3 to 5 most useful and meaningful ones.

Prompt requirements

At least 1 prompt about missing values and missing value tokens
At least 1 prompt about type guessing and type coercion risks
At least 1 prompt about date parsing and mixed date formats
At least 1 prompt about validation steps after import

Suggested prompt starters (you may adapt)

“I am importing a CSV where missing values appear as blank, NA, N A, N slash A, and a period. What should I do in R or Python to import correctly and verify it?”
“What are the most common silent errors from automatic type guessing during CSV import?”
“If a date column mixes formats like YYYY MM DD and MM slash DD slash YYYY, what should I check after import?”
“Give me a post import validation checklist for columns that look numeric but are stored as text, such as currency and percentages.”

Step 3 Create the AI Interaction Log

For each prompt, include:

Prompt goal
The AI response excerpt you used
Your annotation:
- What AI got right
- What AI assumed
- What was missing, misleading, or incorrect

Important rule

You may not paste AI text into your final synthesis verbatim.

Step 4 Import the dataset and run required validation checks

Import the required file, then complete the validation checklist below. Your evidence can be screenshots, printed outputs, or short summaries that cite the specific numbers or outputs you observed. A summary without numbers or outputs does not count as evidence.

Required post import validation checklist

Submit evidence for every item.

1. Rows and columns

Confirm the dataset has 600 rows and 79 columns after import.
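One way to approach the row and column check is sketched below, using only the Python standard library. It runs on a few made-up rows, not the assignment file, and the helper name `table_shape` is hypothetical. The point it illustrates is that counting physical lines can disagree with counting parsed records when a quoted field contains a line break.

```python
import csv
import io

# Toy stand-in for a real file: 3 data rows, 4 columns.
# One quoted field contains a line break, which naive line counting miscounts.
raw = (
    "id,name,price,notes\n"
    '1,Loft,"$1,200.00","Bright\nand quiet"\n'
    "2,Studio,$85.00,\n"
    "3,Condo,$140.00,N/A\n"
)


def table_shape(text):
    """Return (data_rows, columns) as seen by a real CSV parser."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header, data = rows[0], rows[1:]
    # Every data row should have as many fields as the header.
    ragged = [i for i, r in enumerate(data, start=1) if len(r) != len(header)]
    if ragged:
        raise ValueError(f"rows with unexpected field counts: {ragged}")
    return len(data), len(header)


shape = table_shape(raw)
naive_line_count = raw.count("\n") - 1  # physical lines minus the header line

print("parsed shape:", shape)
print("naive data line count:", naive_line_count)
```

If the two counts differ on your own file, that difference is itself evidence worth reporting.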

2. Missing value tokens

Identify at least three different missing value representations present in the file (examples include NA, N slash A, blank, period).
For at least two columns of your choice, report how many times each representation appears.
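A minimal sketch of counting missing value tokens follows. It reads every cell as raw text so that no token is converted before you see it. The token set, the toy rows, and the helper `count_missing_tokens` are assumptions for illustration; confirm the actual spellings against the file itself.

```python
import csv
import io
from collections import Counter

# Assumed token spellings for illustration; the real file may use others.
MISSING_TOKENS = {"", "NA", "N/A", "."}

# Made-up rows, not the assignment data.
raw = (
    "host_response_rate,host_since\n"
    "100%,2015-03-01\n"
    "N/A,NA\n"
    ",.\n"
    "NA,2019-07-15\n"
)


def count_missing_tokens(text, column):
    """Count each missing-value spelling in one column, reading cells as text."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        cell = row[column].strip()
        if cell in MISSING_TOKENS:
            # Label blanks explicitly so they are visible in the report.
            counts[cell if cell else "<blank>"] += 1
    return dict(counts)


rate_counts = count_missing_tokens(raw, "host_response_rate")
since_counts = count_missing_tokens(raw, "host_since")

print("host_response_rate:", rate_counts)
print("host_since:", since_counts)
```

Counting on raw text first, then comparing with the missing count your import tool reports, shows which tokens the tool recognized and which it kept as ordinary strings.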

3. Date columns

Identify at least two date columns (examples include last_scraped, host_since).
Show evidence that more than one date format exists in the data.
State whether your import tool parsed them as dates or left them as text.
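The sketch below shows one way to gather evidence of mixed date formats: classify each raw string by its layout before any parsing happens. The two layouts, the sample values, and the helper `classify_date` are assumptions for illustration only.

```python
import re
from collections import Counter
from datetime import datetime

# Two layouts assumed for illustration; a real file may contain others.
PATTERNS = {
    "YYYY-MM-DD": (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    "MM/DD/YYYY": (re.compile(r"\d{1,2}/\d{1,2}/\d{4}"), "%m/%d/%Y"),
}


def classify_date(text):
    """Label a raw string by layout, and flag layouts that hold impossible dates."""
    for label, (pattern, fmt) in PATTERNS.items():
        if pattern.fullmatch(text):
            try:
                datetime.strptime(text, fmt)
            except ValueError:
                return "matches layout but invalid date"
            return label
    return "unrecognized"


# Made-up values, not the assignment data.
host_since = ["2015-03-01", "07/15/2019", "2020-11-30", "3/4/2018", "N/A", "13/45/2020"]
format_counts = Counter(classify_date(v) for v in host_since)

for label, n in format_counts.items():
    print(label, n)
```

Note that a value such as 3/4/2018 passes this check under either a month-first or a day-first reading, so a layout count alone does not prove the dates were interpreted correctly.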

4. Percent columns stored as text

Check host_response_rate and host_acceptance_rate.
Show the raw format you observe (for example values that include a percent sign).
State what type your import tool assigned and what conversion would be required.
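As a hedged illustration of the conversion a percent column might need, the sketch below strips the percent sign and rescales to a proportion. The helper `parse_percent`, the sample values, and the set of accepted missing tokens are all assumptions, not part of the assignment data.

```python
def parse_percent(text):
    """Convert '95%' to 0.95; return None for missing or unparseable text."""
    cleaned = text.strip()
    if not cleaned.endswith("%"):
        return None
    try:
        return float(cleaned[:-1]) / 100
    except ValueError:
        return None


# Made-up values, not the assignment data.
raw_rates = ["100%", "95%", "N/A", "", "0%"]
parsed = [parse_percent(v) for v in raw_rates]

# Any value that failed to convert but is not a known missing token
# deserves a closer look before analysis.
known_missing = {"", "N/A"}
unexpected = [
    v for v, p in zip(raw_rates, parsed)
    if p is None and v.strip() not in known_missing
]

print(parsed)
print("unexpected failures:", unexpected)
```

Listing the unexpected failures separately keeps a genuine 0% distinct from a value that simply failed to convert.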

5. Currency stored as text

Check the price column.
Show the raw format you observe (for example dollar sign and commas).
State what type your import tool assigned and what conversion would be required.
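The following sketch, built on made-up prices, shows why currency left as text is risky: comparing strings gives a different maximum than comparing numbers. The helper `parse_price` is hypothetical, and using `Decimal` here is one reasonable choice for money rather than a requirement.

```python
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Convert '$1,200.00' to Decimal('1200.00'); return None if not parseable."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


# Made-up values, not the assignment data.
raw_prices = ["$85.00", "$1,200.00", "", "$140.00"]
prices = [parse_price(v) for v in raw_prices]

# Text comparison goes character by character, so "$85.00" beats "$1,200.00".
text_max = max(v for v in raw_prices if v)
numeric_max = max(p for p in prices if p is not None)

print("max as text:", text_max)
print("max as number:", numeric_max)
```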

6. Logical values coded as text

Check host_is_superhost.
List the unique values you observe (for example t and f).
State what type your import tool assigned and what conversion would be required.
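A short sketch of the logical conversion follows, assuming the column uses t and f as in the example above. It also shows a trap in Python: any non-empty string counts as true, so a naive `bool()` call turns "f" into `True`. The mapping and the helper `parse_flag` are illustrative assumptions.

```python
# Assumed coding for illustration: 't' means true, 'f' means false.
BOOL_MAP = {"t": True, "f": False}


def parse_flag(text):
    """Map 't'/'f' to True/False, blanks to None, and reject anything else."""
    cleaned = text.strip().lower()
    if cleaned == "":
        return None
    if cleaned not in BOOL_MAP:
        raise ValueError(f"unexpected value: {text!r}")
    return BOOL_MAP[cleaned]


# Made-up values, not the assignment data.
raw_flags = ["t", "f", "", "f"]
flags = [parse_flag(v) for v in raw_flags]

# The trap: bool("f") is True because the string is non-empty.
naive = [bool(v) for v in raw_flags]

print("explicit mapping:", flags)
print("naive bool():    ", naive)
```

Raising an error on unexpected values is a deliberate choice here: it surfaces a new code in the data instead of quietly assigning it a meaning.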

7. Text encoding and non ASCII characters

Find at least one host_name that contains non ASCII characters (examples include accented letters).
Show evidence that the character displays correctly after import.
State what could go wrong if encoding is not handled properly.
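To make the encoding risk concrete, the sketch below encodes a few made-up names as UTF-8 and then decodes the same bytes two ways. Decoding with the wrong encoding raises no error; it simply produces different characters. The names and the helper `has_non_ascii` are hypothetical.

```python
def has_non_ascii(text):
    """Return True if any character falls outside the ASCII range."""
    return any(ord(ch) > 127 for ch in text)


# Made-up names, not taken from the assignment data.
original = "Zoë, José, Chloé"
utf8_bytes = original.encode("utf-8")

correct = utf8_bytes.decode("utf-8")
wrong = utf8_bytes.decode("latin-1")  # succeeds silently with garbled output

print("non ASCII present:", has_non_ascii(original))
print("decoded as UTF-8:  ", correct)
print("decoded as Latin-1:", wrong)
print("lengths:", len(correct), len(wrong))
```

Because the wrong decoding succeeds, a check that only looks for import errors will not catch it. Comparing a known name by eye, or comparing string lengths, is more reliable.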

8. Silent type coercion risk

Identify one column where automatic guessing could lead to a silent mistake.
Explain the mistake, why it could be hard to notice, and how you would detect it.
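One detection strategy is a round trip check: convert the raw text to the guessed type, convert it back to text, and compare with the original. The sketch below applies this to two made-up values, a long identifier read as a floating point number and a code with leading zeros read as an integer. The values and the helper `survives_round_trip` are assumptions for illustration.

```python
def survives_round_trip(text, convert):
    """Return True if converting and printing the value reproduces the raw text."""
    try:
        return str(convert(text)) == text
    except ValueError:
        return False


# Hypothetical 19-digit identifier: too long for a float to hold exactly.
id_text = "1234567890123456789"
id_ok_as_float = survives_round_trip(id_text, lambda s: int(float(s)))
id_ok_as_int = survives_round_trip(id_text, int)

# Hypothetical code with leading zeros: an integer drops them.
code_text = "00123"
code_ok_as_int = survives_round_trip(code_text, int)
code_ok_as_text = survives_round_trip(code_text, str)

print("identifier survives float:", id_ok_as_float)
print("identifier survives int:  ", id_ok_as_int)
print("code survives int:        ", code_ok_as_int)
print("code survives text:       ", code_ok_as_text)
```

Both failures are easy to miss because the converted column still looks numeric and raises no warning. Only the comparison against the raw text exposes the change.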

Step 5 Write the Human-Authored Synthesis (group)

Length target: 400 to 600 words.

Your synthesis must include:

Your investigation question
At least two hidden assumptions with evidence from your validation checks
A clear explanation of why the assumptions matter for later analysis
A short recommended import and validation routine for this dataset type

Your synthesis must be written in your own words.

Step 6 Presentation slides (group)

Use this exact slide structure:

Our question
What AI suggested
Where AI fell short
Our corrected understanding, with evidence
One takeaway for future data science work

Time: 12 to 15 minutes.

Individual Reflection (each student)

Write 150 to 200 words answering:

What did AI help you learn?
What did AI miss or oversimplify?
What did you contribute as a human thinker?
How will you change your AI use in future data work?

Your reflection must match your assigned role.
