Day 10: A/B Testing - The Reality Check
I was excited to try PostHog's A/B testing feature.
I set up my experiment: "Create New Task" button vs "Add Task" button. Hypothesis: Clearer wording = more task creation.
Then I looked at my user count.
TaskFlow has... 3 users. Including me.
That's not enough for an A/B test. Not even close.
But let me show you how A/B testing works anyway, and when you should actually use it.
The Experiment I Wanted to Run
I wanted to test two variations of the task creation button:
Control (A): "Add Task"
Test (B): "Create New Task"
Hypothesis: "Create New Task" is clearer and will lead to more tasks being created.
Goal metric: Task completion rate (users who click the button AND actually create a task)
Secondary metrics:
Button click rate
Time to first task creation
Task editing after creation
Sounds good, right?
Here's the problem...
Why TaskFlow Can't Run This Test Yet
A/B tests need statistical significance to be reliable.
PostHog's recommended sample size calculator told me the truth:
For my experiment:
Baseline conversion rate: ~60% (2 out of 3 users create tasks)
Minimum detectable effect: 20% improvement
Required sample size: ~200 users per variant
I need 400 total users.
I have 3.
That means I need roughly 133x more users than I have.
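For a rough feel of where a number like that comes from, here's a minimal sketch of the textbook two-proportion sample size approximation. It is not PostHog's calculator, which may use a different formula. The function name `sampleSizePerVariant` is made up for this example, the 20% is treated as a relative lift (60% → 72%), and the z-values assume a two-sided 5% significance level and 80% power.

```typescript
// Rough per-variant sample size for comparing two conversion rates.
// Assumptions (not taken from PostHog): two-sided alpha = 0.05, power = 0.80,
// and the minimum detectable effect is a RELATIVE lift over the baseline.
const Z_ALPHA = 1.959964; // z-value for two-sided 5% significance
const Z_POWER = 0.841621; // z-value for 80% power

function sampleSizePerVariant(baseline: number, relativeLift: number): number {
  const p1 = baseline;
  const p2 = baseline * (1 + relativeLift);
  if (p1 <= 0 || p1 >= 1 || p2 <= 0 || p2 >= 1) {
    throw new RangeError("conversion rates must be strictly between 0 and 1");
  }
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const delta = p2 - p1;
  return Math.ceil(((Z_ALPHA + Z_POWER) ** 2 * variance) / delta ** 2);
}

const perVariant = sampleSizePerVariant(0.6, 0.2);
console.log(perVariant, perVariant * 2); // 241 482
```

That lands in the same ballpark as the ~200 per variant above. The exact figure depends on the formula and settings the calculator uses.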
If I ran this test with 3 users:
User 1 sees "Add Task" → creates task (100% conversion)
User 2 sees "Create New Task" → doesn't create task (0% conversion)
User 3 sees "Add Task" → creates task (100% conversion)
Result: "Add Task" wins! 100% vs 0%!
Reality: This means absolutely nothing. Sample size too small.
One user having a bad day skews everything.
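To put a number on "this means absolutely nothing", here's a small sketch that computes a 95% Wilson score interval for each variant in the 3-user scenario. The interval choice and the names `wilsonInterval`, `addTaskCi`, and `createNewTaskCi` are assumptions for illustration, not something PostHog reports in this form.

```typescript
// Wilson score interval for a conversion rate (95% confidence, z = 1.96).
// Illustrative only; the function and type names are hypothetical.
interface Interval {
  low: number;
  high: number;
}

function wilsonInterval(conversions: number, users: number, z = 1.96): Interval {
  if (users <= 0) throw new RangeError("need at least one user");
  const p = conversions / users;
  const z2 = z * z;
  const denom = 1 + z2 / users;
  const center = (p + z2 / (2 * users)) / denom;
  const half =
    (z * Math.sqrt((p * (1 - p)) / users + z2 / (4 * users * users))) / denom;
  return { low: Math.max(0, center - half), high: Math.min(1, center + half) };
}

const addTaskCi = wilsonInterval(2, 2); // "Add Task": 2 of 2 users converted
const createNewTaskCi = wilsonInterval(0, 1); // "Create New Task": 0 of 1 converted

// If the two intervals overlap, the data cannot separate the variants.
const overlap =
  addTaskCi.low <= createNewTaskCi.high && createNewTaskCi.low <= addTaskCi.high;
console.log(addTaskCi, createNewTaskCi, overlap);
```

Under these assumptions the plausible range is roughly 34% to 100% for "Add Task" and 0% to 79% for "Create New Task". The ranges overlap heavily, so "100% vs 0%" tells you nothing about which wording is better.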
How A/B Testing Works in PostHog
Even though I can't run a meaningful test yet, let me show you how it works.
Step 1: Create Your Experiment
Go to Experiments in PostHog → New Experiment
What you need:
Feature flag key: button-text-experiment
Hypothesis: Clear description of what you're testing and why
Variants:
Control: "Add Task" (existing)
Test: "Create New Task" (new version)
Goal metric: Primary success metric (e.g., "task_created" event)
Step 2: Set Up Your Metrics
PostHog offers 4 metric types:
Funnel: Conversion rate through multiple steps
- Use case: "Users who saw button → clicked → created task"
- This is what I'd use for TaskFlow
Mean: Average value per user
- Use case: "Average tasks created per user"
Ratio: One metric divided by another
- Use case: "Tasks completed / Tasks started"
Retention: Users who return after X days
- Use case: "Users who return within 7 days"
For my button test, I'd use Funnel:
Step 1: Exposed to experiment
Step 2: Clicked button
Step 3: Created task
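The three funnel steps above can be sketched as a tiny in-memory calculation. This is only an illustration of what a funnel metric counts, not how PostHog computes it. The `TrackedEvent` shape, the sample data, and the use of `$feature_flag_called` as the exposure event are assumptions. `add_task_dialog_opened` and `task_created` are the events from the TaskFlow code.

```typescript
// Minimal in-memory funnel: exposed -> clicked -> created, counted per variant.
// Simplification: timestamps are ignored, so this checks that a user hit every
// earlier step, not the order in which they did it.
type Variant = "control" | "test";

interface TrackedEvent {
  userId: string;
  event: string;
  variant: Variant;
}

// Treating "$feature_flag_called" as the exposure step is an assumption here.
const FUNNEL_STEPS = ["$feature_flag_called", "add_task_dialog_opened", "task_created"];

function funnelCounts(events: TrackedEvent[], variant: Variant): number[] {
  let reached: Set<string> | undefined;
  return FUNNEL_STEPS.map((step) => {
    const previous = reached;
    const atStep = new Set<string>();
    for (const e of events) {
      const passedEarlierSteps = previous === undefined || previous.has(e.userId);
      if (e.variant === variant && e.event === step && passedEarlierSteps) {
        atStep.add(e.userId);
      }
    }
    reached = atStep;
    return atStep.size;
  });
}

// Made-up sample data for three users.
const sampleEvents: TrackedEvent[] = [
  { userId: "u1", event: "$feature_flag_called", variant: "control" },
  { userId: "u1", event: "add_task_dialog_opened", variant: "control" },
  { userId: "u1", event: "task_created", variant: "control" },
  { userId: "u2", event: "$feature_flag_called", variant: "control" },
  { userId: "u3", event: "$feature_flag_called", variant: "test" },
  { userId: "u3", event: "add_task_dialog_opened", variant: "test" },
];

console.log(funnelCounts(sampleEvents, "control")); // [ 2, 1, 1 ]
console.log(funnelCounts(sampleEvents, "test")); // [ 1, 1, 0 ]
```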
Step 3: Configure Variants
Control (50%): "Add Task"
Test (50%): "Create New Task"
PostHog automatically splits users 50/50.
You can add more variants if you want (e.g., "New Task", "Make Task", etc.)
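With more than two variants, a lookup table tends to read better than a chain of conditionals. Here's a minimal sketch. The variant keys `test-new-task` and `test-make-task` are hypothetical (use whatever keys you define in the experiment), and the parameter type is an assumption about what a flag lookup can return.

```typescript
// Map experiment variant keys to button labels, falling back to the control text.
// "control" and "test" match the variants above; the other keys are hypothetical.
const CONTROL_TEXT = "Add Task";

const BUTTON_TEXT_BY_VARIANT = new Map<string, string>([
  ["control", CONTROL_TEXT],
  ["test", "Create New Task"],
  ["test-new-task", "New Task"],
  ["test-make-task", "Make Task"],
]);

function buttonTextFor(variant: string | boolean | undefined): string {
  // A flag value may be missing (not loaded yet) or a boolean, so guard first.
  if (typeof variant !== "string") return CONTROL_TEXT;
  return BUTTON_TEXT_BY_VARIANT.get(variant) ?? CONTROL_TEXT;
}

console.log(buttonTextFor("test-make-task")); // Make Task
console.log(buttonTextFor(undefined)); // Add Task
```

Falling back to the control text means an unknown or missing variant never leaves the button without a label.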
Implementing the Experiment
Here's how the code would look (if I had enough users):
```typescript
import { useMemo } from "react";
import posthog from "posthog-js";

// Excerpt from inside the task form component; title, description, priority,
// category, dueDate, their setters, setOpen, and addTask are defined elsewhere.

// A/B Test: Button text experiment
// Note: the flag is read once on first render. If flags have not loaded yet,
// getFeatureFlag() returns undefined and the user gets the control text.
const buttonText = useMemo(() => {
  const variant = posthog.getFeatureFlag("button-text-experiment");
  return variant === "test" ? "Create New Task" : "Add Task"; // control
}, []);

const handleSubmit = (e: React.FormEvent) => {
  e.preventDefault();
  if (!title.trim()) {
    // Track form submission with empty title - UX friction indicator
    posthog.capture("task_form_submitted_empty", {
      button_text: buttonText, // same property name as the other events
    });
    return;
  }
  addTask({
    title: title.trim(),
    description: description.trim() || undefined,
    completed: false,
    priority,
    category,
    dueDate: dueDate ? new Date(dueDate) : undefined,
  });
  // Track task creation - key conversion event
  posthog.capture("task_created", {
    priority,
    category,
    has_description: !!description.trim(),
    has_due_date: !!dueDate,
    button_text: buttonText, // Track which variant they saw
  });
  // Reset form
  setTitle("");
  setDescription("");
  setPriority("medium");
  setCategory("work");
  setDueDate("");
  setOpen(false);
};

const handleOpenChange = (newOpen: boolean) => {
  if (newOpen) {
    // Track dialog opening - top of task creation funnel
    posthog.capture("add_task_dialog_opened", {
      button_text: buttonText, // Track which button text led to this
    });
  }
  setOpen(newOpen);
};

const handleCancel = () => {
  // Track cancellation - potential friction indicator
  posthog.capture("add_task_dialog_cancelled", {
    had_title: !!title.trim(),
    had_description: !!description.trim(),
    button_text: buttonText,
  });
  setOpen(false);
};
```
Key points:
posthog.getFeatureFlag() assigns the user to a variant
Track both actions: button click AND task creation
Include context: Which variant they saw
Important: Only call getFeatureFlag() for users who will actually see the button. Otherwise, you're including people who never had a chance to interact with it (skews results).
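One way to follow that rule is to put the flag lookup behind the same condition that renders the button. Here's a minimal sketch. `FlagClient` is a hypothetical stand-in for the PostHog client, narrowed to the one method used here, and the fake client counts lookups in place of real exposure events.

```typescript
// Only resolve the flag when the button is actually rendered, so users who
// never see it are not counted as exposed to the experiment.
interface FlagClient {
  getFeatureFlag(key: string): string | boolean | undefined;
}

function resolveButtonText(client: FlagClient, buttonVisible: boolean): string | null {
  if (!buttonVisible) return null; // no flag lookup, so no exposure is recorded
  const variant = client.getFeatureFlag("button-text-experiment");
  return variant === "test" ? "Create New Task" : "Add Task";
}

// Fake client that counts lookups (a stand-in for exposure events).
let lookups = 0;
const fakeClient: FlagClient = {
  getFeatureFlag: () => {
    lookups += 1;
    return "test";
  },
};

console.log(resolveButtonText(fakeClient, false), lookups); // null 0
console.log(resolveButtonText(fakeClient, true), lookups); // Create New Task 1
```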
When Is Your App Ready for A/B Testing?
Minimum Requirements:
1. Enough users
100+ active users minimum
1000+ active users ideal
More users = faster, more reliable results
2. Clear baseline metrics
You need existing data to compare against
"Users create 2.5 tasks on average"
"40% of users complete onboarding"
3. A real hypothesis
Not "I wonder if this is better"
But "I think X will improve Y because Z"
4. A metric that matters
Revenue
Activation
Retention
Not vanity metrics
TaskFlow Isn't Ready Because:
❌ Only 3 users (need 400+)
❌ No baseline data (not enough history)
❌ Low event volume (2-3 events per week)
⚠️ Hypothesis is fine
⚠️ Metric is fine (task completion)
I'll revisit A/B testing when TaskFlow has 100+ active users.
For now, I'll make decisions based on:
Session replays (watch users struggle)
User interviews (ask them directly)
My own intuition (founder mode)
A/B testing is for optimization, not validation.
Better Ways to Test When You're Small
1. Session Replay (What I'm Using)
Manually watch 5-10 users interact with each version of the button text.
Advantage: Qualitative insights, see actual confusion
Disadvantage: Not statistically significant
2. User Interviews
Ask users: "Which button text is clearer?"
Show them both options. Get feedback.
Advantage: Direct feedback, understand why
Disadvantage: What people say ≠ what they do
3. Ship It and Watch
Change the button text. Monitor for 1-2 weeks.
Did task creation go up or down?
Advantage: Real-world results
Disadvantage: No control group, can't isolate the change
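A before/after comparison like this can be as simple as two conversion rates. Here's a rough sketch. The `PeriodStats` shape and the numbers are made up, and because there is no control group the result only shows that something changed, not why.

```typescript
// Compare the task creation rate before and after shipping a change.
// Sample numbers are invented; without a control group this is correlation only.
interface PeriodStats {
  usersWhoOpenedDialog: number;
  usersWhoCreatedTask: number;
}

function conversionRate(stats: PeriodStats): number {
  if (stats.usersWhoOpenedDialog === 0) return 0;
  return stats.usersWhoCreatedTask / stats.usersWhoOpenedDialog;
}

function describeChange(before: PeriodStats, after: PeriodStats): string {
  const points = Math.round((conversionRate(after) - conversionRate(before)) * 100);
  if (points === 0) return "no visible change";
  return `${points > 0 ? "up" : "down"} ${Math.abs(points)} percentage points`;
}

const beforeChange: PeriodStats = { usersWhoOpenedDialog: 20, usersWhoCreatedTask: 10 };
const afterChange: PeriodStats = { usersWhoOpenedDialog: 20, usersWhoCreatedTask: 13 };
console.log(describeChange(beforeChange, afterChange)); // up 15 percentage points
```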
4. Fake Door Test
Instead of A/B testing the button text, test if people WANT the feature at all.
Advantage: Validates demand before building
Disadvantage: Not for existing features
For TaskFlow, I'm using Session Replay + watching my funnels.
Once I hit 100+ users, I'll revisit A/B testing.
What I Learned About A/B Testing
1. Sample size matters more than you think
3 users ≠ statistically significant
You need hundreds, sometimes thousands.
2. A/B testing is for optimization, not discovery
Use it when you have:
Traffic
Baseline data
Clear hypothesis
Don't use it when you're still figuring out product-market fit.
3. Small changes need big traffic
Testing button text? Need 400+ users.
Testing a complete redesign? You can get away with fewer users (a bigger effect is easier to detect).
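Here's a quick sketch of that relationship, using the textbook two-proportion approximation rather than PostHog's own calculator. The 60% baseline, the three relative lifts, and the function name `usersNeededPerVariant` are illustrative assumptions, with a two-sided 5% significance level and 80% power baked in.

```typescript
// How the required users per variant shrink as the expected effect grows.
// Assumes two-sided alpha = 0.05 (z = 1.959964) and 80% power (z = 0.841621).
function usersNeededPerVariant(baseline: number, relativeLift: number): number {
  const improved = baseline * (1 + relativeLift);
  const zSum = 1.959964 + 0.841621;
  const variance = baseline * (1 - baseline) + improved * (1 - improved);
  return Math.ceil((zSum * zSum * variance) / Math.pow(improved - baseline, 2));
}

const lifts = [0.1, 0.2, 0.5]; // hypothetical relative lifts over a 60% baseline
for (const lift of lifts) {
  console.log(`${lift * 100}% lift: ~${usersNeededPerVariant(0.6, lift)} users per variant`);
}
```

Under these assumptions the requirement drops from roughly 1,000 users per variant for a 10% lift to about 240 for a 20% lift and about 30 for a 50% lift.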
4. PostHog makes it easy... when you're ready
The tool is simple to use.
The challenge is having enough users to make it meaningful.
5. There are other ways to test
Session replay, user interviews, and gut instinct are valid for early-stage products.
Follow along with the series:
- 🎥 TikTok: [https://www.tiktok.com/@hey_techys?is_from_webapp=1&sender_device=pc]
- 📸 Instagram: [https://www.instagram.com/hey_techys]
- 💼 LinkedIn: [https://www.linkedin.com/in/onyenekwe-elizabeth-46a467183/]
- 🐦 Twitter/X: [https://x.com/ElizabethOnyen6]
Byeeeeeeee!!!!
- Lizzy