Mini Project 1
Exploratory Data Storytelling via Data Visualization
Overview
In this mini project, your team will explore one data set and tell a short data story with visuals.
You will practice:
- Asking a clear question for a real audience
- Exploring with a few plots
- Choosing the best evidence
- Writing a short, readable story
This is an exploratory project. Use descriptive analysis only. Do NOT run hypothesis tests and do NOT fit complex models.
What you will submit
In your Posit Cloud, show:
- The rendered HTML report
- The completed version of this source .qmd file
- Any data file you used, but only if it is not a built-in package data set
What you will present
A 10-minute team presentation that summarizes your story and highlights your 3 final visuals.
Team Info
Team Name: Your Team Name
Team members and roles for this project:
- Project lead (keeps time, coordinates tasks): Your Member Name(s)
- Data wrangler (loads data, checks quality, makes a clean table): Your Member Name(s)
- Visualization designer (builds plots, improves readability): Your Member Name(s)
- Storyteller (writes narrative, connects visuals to claims): Your Member Name(s)
Step 1: Choose a data set
Choose one option below. Two teams may use the same data set.
If another team uses the same data set, make your story meaningfully different from theirs. Your stories may differ in:
- Audience
- Focus variables or outcome
- Comparison frame (group comparison, relationship, ranking, time trend, and so on)
Data sets
- penguins data set from R package palmerpenguins
- flights data set from R package nycflights13
- gapminder data set from R package gapminder
- diamonds data set from R package ggplot2
- Your own data set!
Use the code chunk import-data below to import your data, and name the data set data_raw.
## Import your data
data_raw <-
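As a minimal sketch, assuming your team picks the penguins option and that the palmerpenguins package is installed, the import step could look like the following. The commented-out line shows one way to read your own file instead; the path data/my_data.csv is hypothetical.

```r
## Import your data
library(palmerpenguins)  # assumed to be installed; provides the penguins data set

# Option A: a data set that ships with an R package
data_raw <- penguins

# Option B (hypothetical path): your own CSV file uploaded to the project
# data_raw <- readr::read_csv("data/my_data.csv")

# Confirm the import worked: number of rows and columns
dim(data_raw)
```

Whichever option you use, keep the object name data_raw so the later chunks run unchanged.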
Quick description of your data set
Answer the following questions.
- What does one row represent?
Answer:
- Which columns do you think might matter for your story?
Answer:
- Who might care about this data?
Answer:
Step 2: Mini proposal
Write short answers. Keep it specific.
- Audience: Who is your audience?
Answer:
- Investigation question: One sentence
Answer:
- Your current guess: One sentence
Answer:
- Focus variables: List some variables you plan to use
Answer:
- Comparison frame: Choose one or two (group comparison, relationship, ranking, time trend, etc.)
Answer:
Step 3: Quick data check and light cleaning
Your goal is to understand what you have. Keep cleaning minimal.
3.1 First look
## Check your data (you can check it your own way)
## Assumes the tidyverse is loaded, for example with library(tidyverse) in a setup chunk
glimpse(data_raw)
summary(data_raw)
3.2 Missing values
## Count missing values in each column, most missing first
missing_counts <- data_raw |>
  summarise(across(everything(), ~ sum(is.na(.x)))) |>
  pivot_longer(cols = everything(), names_to = "variable", values_to = "n_missing") |>
  arrange(desc(n_missing))
missing_counts
In 2 to 4 sentences, describe what missing values you noticed and how you plan to handle them.
Answer:
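One possible way to support your answer is to count how many rows you would lose by dropping rows with missing values in your focus variables. The sketch below assumes data_raw is the penguins data set and uses species, flipper_length_mm, and body_mass_g as illustrative focus variables; substitute your own.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)  # assumed to be installed; used only as example data

data_raw <- penguins

# Illustrative focus variables; replace with the ones in your proposal
focus_vars <- c("species", "flipper_length_mm", "body_mass_g")

# Keep only rows that are complete on the focus variables
data_complete <- data_raw |>
  drop_na(all_of(focus_vars))

# Number and share of rows that would be dropped
n_dropped <- nrow(data_raw) - nrow(data_complete)
share_dropped <- n_dropped / nrow(data_raw)

n_dropped
round(share_dropped, 3)
```

If the share is small, dropping those rows and saying so in the text is usually a reasonable choice for a descriptive project. If it is large, it may be worth checking whether the missing rows cluster in one group.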
3.3 Create a working clean table
Create a new object data with:
- Clean column names if needed
- Only the columns you might use
- Any simple recodes you need
Use the code chunk clean below to clean data_raw, and save the cleaned, analysis-ready table as data.
## Clean your data
Describe what you changed and why, in plain language:
Answer:
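The three cleaning tasks listed above could be sketched as follows. This is only an illustration, assuming the penguins data set; the selected columns and the kilogram recode are example choices, not requirements.

```r
library(dplyr)
library(palmerpenguins)  # assumed to be installed; used only as example data

data_raw <- penguins

data <- data_raw |>
  # Keep only the columns the story might use
  select(species, island, sex, flipper_length_mm, body_mass_g) |>
  # Drop rows that are missing either measurement
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) |>
  # Simple recode: body mass in kilograms can be easier to read than grams
  mutate(body_mass_kg = body_mass_g / 1000)

glimpse(data)
```

Keeping every step in one pipeline makes it easy to describe, in order, what you changed and why.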
Step 4: Exploration
Create 4 quick exploratory plots. These are NOT final. They help you decide what story to tell.
Suggestions:
- One distribution plot
- One bar plot or count plot
- One relationship plot (two numeric variables)
- One plot that compares groups
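The four suggested plot types can be sketched quickly with default styling. The example below assumes the penguins data set stands in for your cleaned table data; the variable choices are illustrative.

```r
library(ggplot2)
library(palmerpenguins)  # assumed to be installed; used only as example data

data <- penguins  # stand-in for your cleaned table

# 1. Distribution of one numeric variable
p1 <- ggplot(data, aes(x = body_mass_g)) +
  geom_histogram(bins = 30, na.rm = TRUE)

# 2. Counts per category
p2 <- ggplot(data, aes(x = species)) +
  geom_bar()

# 3. Relationship between two numeric variables
p3 <- ggplot(data, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE)

# 4. Comparison across groups
p4 <- ggplot(data, aes(x = species, y = body_mass_g)) +
  geom_boxplot(na.rm = TRUE)

# Print each plot; in the report, place each one in its own chunk
p1
p2
p3
p4
```

At this stage, default labels and colors are fine. Polishing is only needed for the three plots you keep.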
Exploratory plot 1
# ggplot(data, aes(x = ...)) + geom_...
What you learned (1 to 2 sentences):
Answer:
Exploratory plot 2
What you learned (1 to 2 sentences):
Answer:
Exploratory plot 3
What you learned (1 to 2 sentences):
Answer:
Exploratory plot 4
What you learned (2 to 3 sentences):
Answer:
Decision
- What is the most interesting pattern you found?
Answer:
- Revised investigation question (one sentence)
Answer:
- Which 3 visuals will become your final visuals, and why?
Answer:
Step 5: Build your mini data story
Your final report must include:
- A short opening paragraph that sets context and states the question (4 to 6 sentences)
- Exactly 3 final visuals, each with a short interpretation paragraph (3 to 5 sentences)
- A short closing paragraph with takeaways, limitations, and next steps (4 to 5 sentences)
Keep the whole report short. Aim for about 1 to 2 pages when rendered.
5.1 Story opening
Write your opening paragraph in this section. Include:
- Context: What is the setting in plain language
- Audience: Who should care about this story
- Investigation question: What are you trying to learn
- Why it matters: What decision or understanding this could support
- Data description: Name the data set and explain what one row represents
- Preview: Briefly state what your three visuals will show
5.2 Final visual 1
# Build a clean, readable plot
Interpretation (3 to 4 sentences). Include:
- What is shown: What is on the x axis and y axis, and what groups or colors mean
- Key pattern: Describe the main visible pattern using directional language and comparisons
- Evidence: Mention at least one concrete detail (an approximate value, a ranking, or a noticeable gap)
- Connection: Explain how this plot helps answer your investigation question
- Caveat: State one caution, such as missing values, outliers, or “this is association, not causation”
Tip: Avoid describing every point. Focus on what you want your audience to notice.
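For the evidence point, a small summary table is one way to find a concrete detail to cite, such as a ranking or the gap between groups. The sketch below assumes the penguins data set and compares body mass by species; both are illustrative choices.

```r
library(dplyr)
library(palmerpenguins)  # assumed to be installed; used only as example data

data <- penguins  # stand-in for your cleaned table

# Group sizes and median body mass per species, heaviest first
evidence <- data |>
  filter(!is.na(body_mass_g)) |>
  group_by(species) |>
  summarise(
    n = n(),
    median_mass_g = median(body_mass_g),
    .groups = "drop"
  ) |>
  arrange(desc(median_mass_g))

evidence
```

Quote rounded values from a table like this in your interpretation rather than reading numbers off the plot by eye.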
5.3 Final visual 2
# Build a clean, readable plot
Interpretation (3 to 4 sentences): Include the same suggested points as visual 1.
5.4 Final visual 3
# Build a clean, readable plot
Interpretation (3 to 4 sentences): Include the same suggested points as visual 1.
5.5 Closing
Write your closing paragraph. Include:
- Your answer: State your best answer to the investigation question (one sentence)
- Support: Point to your strongest visual evidence (one to two sentences)
- Limitation: Name one limitation of the data or your approach (one sentence)
- Next step: State one concrete next step you would take with more time (one sentence)
- So what: Explain what this means for your audience (one sentence)
Tip: Keep claims descriptive and reasonable. Do not overstate certainty.
Step 6: Visual quality checklist
For each final plot, confirm:
- Title and axis labels are descriptive
- Units are clear when relevant
- Text is readable at presentation size
- Legend is readable, or direct labels are used
- If you filtered rows, you said so in the text
Improve your plots accordingly.
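A sketch of how the checklist items might map onto ggplot2 code is shown below. It assumes the penguins data set; the title wording, text size, and legend position are example choices to adapt.

```r
library(ggplot2)
library(palmerpenguins)  # assumed to be installed; used only as example data

data <- penguins  # stand-in for your cleaned table

final_plot <- ggplot(
  data,
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point(alpha = 0.7, na.rm = TRUE) +
  labs(
    # Descriptive title and axis labels, with units
    title = "Body mass versus flipper length, by species",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Species",
    # State any rows that are left out
    caption = "Penguins with missing measurements are not shown."
  ) +
  # Larger base text size so labels stay readable on a projector
  theme_minimal(base_size = 16) +
  # Legend below the plot leaves more width for the data
  theme(legend.position = "bottom")

final_plot
```

Check the result in the rendered HTML at the size you will present it, since text that looks fine in the editor can be too small on a slide.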
Step 7: Team reflection
Each team member writes 2 to 4 sentences:
- What you contributed
- One thing you learned
- One thing you would improve next time
Member 1: your name
Answer:
Member 2: your name
Answer:
Member 3: your name
Answer:
Member 4: your name (if applicable)
Answer:
Step 8: Presentation plan
Plan a 10-minute talk using this suggested structure:
- About 1 minute: context, audience, question
- About 2 minutes: data set, key variables, and any cleaning choices
- About 5 minutes: your 3 visuals and the story they tell
- About 2 minutes: takeaway, limitation, next step
Presentation order
teams <- c("Team 1", "Superb Statisticians", "The Data Scientists", "Stat Padders", "Data Divers", "")
set.seed(3570)
sample(teams, 6, replace = FALSE)
[1] "Team 1"              "Stat Padders"        "Superb Statisticians"
[4] "The Data Scientists" ""                    "Data Divers"
Grading guide
Total 15 points:
- Clear question and audience (3 pts)
- Evidence-based story with 3 high-quality visuals (6 pts)
- Reproducible workflow and readable report (3 pts)
- Presentation clarity and timing (3 pts)