AI Activity Example
Assigned Roles (3 students)
Prompt Engineer
Responsibilities
- Create 3 to 5 purposeful AI prompts
- Save AI responses and build the AI Interaction Log
- Write annotations for each prompt and response pair
Data Science Auditor
Responsibilities
- Identify hidden assumptions in AI guidance
- Design and run post-import validation checks
- Provide evidence for at least two hidden assumptions
Synthesizer
Responsibilities
- Write the Human-Authored Synthesis in clear course language
- Build slides using the required structure
- Ensure the final work is consistent and concise
Dataset
What you will submit
Group submissions
- AI Interaction Log
- Human-Authored Synthesis
- Slides for a 5 to 7 minute presentation
Individual submission
- Individual Reflection (150 to 200 words)
Step-by-step tasks
Step 1 Define your investigation question (before using AI)
Write 2 to 3 sentences answering:
- What import problem are we investigating
- Why this problem matters in real data science
You must focus on hidden assumptions, for example assumptions about missing values, types, dates, or text encoding.
Step 2 Use AI strategically (3 to 5 prompts)
Your prompts must be purposeful and iterative.
Note: You may run more prompts than this, but report only the 3 to 5 most useful and meaningful ones.
Prompt requirements
- At least 1 prompt about missing values and missing value tokens
- At least 1 prompt about type guessing and type coercion risks
- At least 1 prompt about date parsing and mixed date formats
- At least 1 prompt about post-import validation steps
Suggested prompt starters (you may adapt)
- “I am importing a CSV where missing values appear as blank, NA, N A, N/A, and a period. What should I do in R or Python to import it correctly and verify the result?”
- “What are the most common silent errors from automatic type guessing during CSV import?”
- “If a date column mixes formats like YYYY-MM-DD and MM/DD/YYYY, what should I check after import?”
- “Give me a post-import validation checklist for columns that look numeric but are stored as text, such as currency and percentages.”
Step 3 Create the AI Interaction Log
For each prompt, include:
- Prompt goal
- The AI response excerpt you used
- Your annotation:
- What AI got right
- What AI assumed
- What was missing, misleading, or incorrect
Important rule
- You may not paste AI text into your final synthesis verbatim.
Step 4 Import the dataset and run required validation checks
Import the required file, then complete the validation checklist below. Your evidence can be screenshots, printed outputs, or short summaries that cite the specific numbers or outputs you observed. A summary without numbers or outputs does not count as evidence.
Submit evidence for every item.
1. Rows and columns
- Confirm the dataset has 600 rows and 79 columns after import.
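One way to produce evidence for this item, independent of your import tool, is to count records directly. The sketch below uses only Python's standard library; the function name csv_shape and the two-row sample are illustrative assumptions, not part of the required file.

```python
import csv
import io

# Expected shape taken from the checklist item above.
EXPECTED_ROWS, EXPECTED_COLS = 600, 79


def csv_shape(handle):
    """Return (data_rows, columns); fail loudly if a record is wider or narrower than the header."""
    reader = csv.reader(handle)
    header = next(reader)
    n_rows = 0
    for row in reader:
        if len(row) != len(header):
            raise ValueError(
                f"record ending at line {reader.line_num} has {len(row)} fields, "
                f"expected {len(header)}"
            )
        n_rows += 1
    return n_rows, len(header)


# Tiny made-up sample; note the quoted comma in the second record.
sample = io.StringIO('id,host_name,price\n1,Ana,$50.00\n2,"Lee, Jr.",$75.00\n')
print(csv_shape(sample))  # (2, 3)
```

For the real file, open it with newline="" and an explicit encoding, then compare the result with (EXPECTED_ROWS, EXPECTED_COLS).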
2. Missing value tokens
- Identify at least three different missing value representations present in the file (examples include NA, N/A, blank, period).
- Report how many times each representation appears in at least two columns of your choice.
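A count like this can be gathered by reading every field as raw text and tallying candidate tokens per column. The following is a minimal sketch under assumptions: the token set MISSING_TOKENS is a starting guess that you should extend after inspecting the file, and the sample rows are invented.

```python
import csv
import io
from collections import Counter

# Assumed starting list of tokens; extend it with whatever the raw file contains.
MISSING_TOKENS = {"", "NA", "N/A", "."}


def count_missing_tokens(handle, columns):
    """Tally each missing-value token per column, reading every field as raw text."""
    reader = csv.DictReader(handle)
    counts = {col: Counter() for col in columns}
    for row in reader:
        for col in columns:
            value = (row[col] or "").strip()
            if value in MISSING_TOKENS:
                # Label blanks explicitly so they are visible in printed output.
                counts[col][value or "<blank>"] += 1
    return counts


# Invented sample rows using two column names from the checklist.
sample = io.StringIO(
    "host_response_rate,price\n"
    "95%,$50.00\n"
    "N/A,\n"
    "NA,.\n"
    ",N/A\n"
)
result = count_missing_tokens(sample, ["host_response_rate", "price"])
for column, tally in result.items():
    print(column, dict(tally))
```

Counting on raw text matters because many import tools convert some of these tokens to missing values automatically and leave others as ordinary strings.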
3. Date columns
- Identify at least two date columns (examples include last_scraped, host_since).
- Show evidence that more than one date format exists in the data.
- State whether your import tool parsed them as dates or left them as text.
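Evidence of mixed formats can be as simple as a tally of which pattern each raw value matches. This sketch assumes two candidate formats, taken from the examples in the suggested prompts; the sample values are invented, and the names classify_date and date_format_counts are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Candidate formats are an assumption; add any others you actually observe.
CANDIDATE_FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "MM/DD/YYYY": "%m/%d/%Y",
}


def classify_date(text):
    """Return the label of the first candidate format that parses the text."""
    for label, fmt in CANDIDATE_FORMATS.items():
        try:
            datetime.strptime(text.strip(), fmt)
            return label
        except ValueError:
            continue
    return "unrecognized"


def date_format_counts(values):
    """Count how many raw values fall under each format label."""
    return Counter(classify_date(v) for v in values)


# Invented raw values, including one unexpected format and one blank.
sample = ["2023-03-14", "03/14/2023", "2021-11-02", "14.03.2023", ""]
counts = date_format_counts(sample)
print(dict(counts))
```

A value such as 03/04/2023 parses under MM/DD/YYYY even if the source meant day first, so a successful parse is not proof that the date was read as intended.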
4. Percent columns stored as text
- Check host_response_rate and host_acceptance_rate.
- Show the raw format you observe (for example values that include a percent sign).
- State what type your import tool assigned and what conversion would be required.
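The conversion for percent text usually amounts to stripping the percent sign and dividing by 100, while refusing anything unexpected. A minimal sketch, assuming values look like 95% and that the listed missing tokens should become None; the helper name percent_to_fraction is hypothetical.

```python
# Assumed missing-value tokens; adjust to match what the file actually contains.
MISSING_TOKENS = {"", "NA", "N/A", "."}


def percent_to_fraction(text):
    """Convert text such as '95%' to 0.95; missing tokens become None."""
    value = text.strip()
    if value in MISSING_TOKENS:
        return None
    if not value.endswith("%"):
        # Refuse to guess: a bare number might already be a fraction.
        raise ValueError(f"expected a trailing percent sign, got {value!r}")
    return float(value[:-1]) / 100


print(percent_to_fraction("100%"))  # 1.0
print(percent_to_fraction("N/A"))   # None
```

Raising on unexpected input, rather than returning a missing value, keeps a format surprise from turning into silent data loss.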
5. Currency stored as text
- Check the price column.
- Show the raw format you observe (for example dollar sign and commas).
- State what type your import tool assigned and what conversion would be required.
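For currency text, the conversion removes the dollar sign and thousands separators before converting to a number. The sketch below assumes US-style formatting such as $1,250.00 and uses Decimal to avoid floating-point rounding; the pattern and the name parse_price are illustrative assumptions.

```python
import re
from decimal import Decimal

# Assumed shape: a dollar sign, digits with optional comma grouping, optional decimals.
PRICE_PATTERN = re.compile(r"\$(\d{1,3}(,\d{3})*|\d+)(\.\d+)?")


def parse_price(text):
    """Convert text such as '$1,250.00' to Decimal('1250.00'); reject other shapes."""
    value = text.strip()
    if not PRICE_PATTERN.fullmatch(value):
        raise ValueError(f"unexpected price format: {value!r}")
    return Decimal(value[1:].replace(",", ""))


print(parse_price("$1,250.00"))  # 1250.00
print(parse_price("$50.00"))     # 50.00
```

Validating the shape first means a value like 1.250,00 or a stray currency code is reported instead of being converted into a wrong number.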
6. Logical values coded as text
- Check host_is_superhost.
- List the unique values you observe (for example t and f).
- State what type your import tool assigned and what conversion would be required.
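Listing unique values first, then mapping them explicitly, avoids relying on a tool's guess about what counts as true or false. A minimal sketch, assuming the codes are t and f as in the example above and that blanks mean missing; the helper names are hypothetical.

```python
# Assumed coding, based on the example values in the checklist item.
BOOL_MAP = {"t": True, "f": False}


def unique_values(values):
    """Return the sorted set of distinct raw values, after trimming whitespace."""
    return sorted({v.strip() for v in values})


def to_bool(text):
    """Map 't'/'f' to True/False, blank to None, and reject anything else."""
    value = text.strip()
    if value == "":
        return None
    if value not in BOOL_MAP:
        raise ValueError(f"unexpected logical code: {value!r}")
    return BOOL_MAP[value]


# Invented raw values for illustration.
sample = ["t", "f", "f", "", "t "]
print(unique_values(sample))         # ['', 'f', 't']
print([to_bool(v) for v in sample])  # [True, False, False, None, True]
```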
7. Text encoding and non-ASCII characters
- Find at least one host_name that contains non-ASCII characters (examples include accented letters).
- Show evidence that the character displays correctly after import.
- State what could go wrong if encoding is not handled properly.
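The failure mode here can be reproduced on a small scale: text saved as UTF-8 but decoded with a different encoding still loads without error, yet the characters change. The sketch below uses invented names, not values from the required file, and the helpers non_ascii and decode_as are hypothetical.

```python
def non_ascii(values):
    """Return the values that contain at least one non-ASCII character."""
    return [v for v in values if not v.isascii()]


def decode_as(text, encoding):
    """Simulate reading UTF-8 bytes with a possibly wrong encoding."""
    return text.encode("utf-8").decode(encoding)


# Invented host names for illustration.
names = ["José", "Zoë", "Anna"]

print(non_ascii(names))              # ['José', 'Zoë']
print(decode_as("José", "utf-8"))    # José
print(decode_as("José", "latin-1"))  # JosÃ©
```

Because the wrong decoding raises no error, the damage is usually noticed only later, for example when names fail to match across files or grouped counts split one host into two.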
8. Silent type coercion risk
- Identify one column where automatic guessing could lead to a silent mistake.
- Explain the mistake, why it could be hard to notice, and how you would detect it.
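One generic way to detect a silent change is a round-trip test: convert each raw value the way the tool would, turn it back into text, and compare it with the original. This is a sketch under assumptions; the postal-code-like values are a hypothetical example and not a column named in this assignment, and changed_by_conversion is an illustrative helper.

```python
def changed_by_conversion(values, convert):
    """Return (original, round_trip) pairs whose text changes after conversion."""
    changed = []
    for text in values:
        try:
            round_trip = str(convert(text))
        except ValueError:
            # Not convertible, so the value would stay text rather than change silently.
            continue
        if round_trip != text:
            changed.append((text, round_trip))
    return changed


# Hypothetical identifier-like codes with leading zeros.
codes = ["02134", "10001", "00501"]
print(changed_by_conversion(codes, int))
# [('02134', '2134'), ('00501', '501')]
```

An empty result suggests the conversion preserved every value as written; any pairs returned are concrete evidence of what the guess would have altered.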
Step 5 Write the Human-Authored Synthesis (group)
Length target: 400 to 600 words.
Your synthesis must include:
- Your investigation question
- At least two hidden assumptions with evidence from your validation checks
- A clear explanation of why the assumptions matter for later analysis
- A short recommended import and validation routine for this dataset type
Your synthesis must be written in your own words.
Step 6 Presentation slides (group)
Use this exact slide structure:
- Our question
- What AI suggested
- Where AI fell short
- Our corrected understanding, with evidence
- One takeaway for future data science work
Time: 5 to 7 minutes.
Individual Reflection (each student)
Write 150 to 200 words answering:
- What did AI help you learn
- What did AI miss or oversimplify
- What did you contribute as a human thinker
- How will you change your AI use in future data work
Your reflection must match your assigned role.