A/B testing

I was excited to try PostHog's A/B testing feature.

I set up my experiment: "Create New Task" button vs "Add Task" button. Hypothesis: Clearer wording = more task creation.

Then I looked at my user count.

TaskFlow has... 3 users. Including me.

That's not enough for an A/B test. Not even close.

But let me show you how A/B testing works anyway, and when you should actually use it.

The Experiment I Wanted to Run

I wanted to test two variations of the task creation button:

Control (A): "Add Task"
Test (B): "Create New Task"

Hypothesis: "Create New Task" is clearer and will lead to more task creation.

Goal metric: Task creation rate (users who click the button AND actually create a task)

Secondary metrics:

Button click rate
Time to first task creation
Task editing after creation

Sounds good, right?

Here's the problem...

Why TaskFlow Can't Run This Test Yet

A/B tests need statistical significance to be reliable.

PostHog's recommended sample size calculator told me the truth:

For my experiment:

Baseline conversion rate: ~60% (2 out of 3 users create tasks)
Minimum detectable effect: 20% improvement
Required sample size: ~200 users per variant

I need 400 total users.

I have 3.

That means I need about 133x more users than I have.
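To see where a number like that can come from, here's a rough sketch of the textbook sample-size approximation for comparing two conversion rates. It is not PostHog's calculator. It assumes things the numbers above don't state: that the 20% is a relative lift (60% → 72%), a two-sided 5% significance level, and 80% power. The function name and parameters are made up for illustration.

```typescript
// Rough per-variant sample size for comparing two conversion rates
// (normal approximation to the two-proportion z-test).
// Assumptions, not taken from PostHog: two-sided alpha = 0.05, power = 0.80.
const Z_ALPHA = 1.96; // z-score for a two-sided 5% significance level
const Z_BETA = 0.8416; // z-score for 80% power

function sampleSizePerVariant(baselineRate: number, relativeLift: number): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift); // rate we hope the test variant reaches
  if (p1 <= 0 || p2 >= 1 || relativeLift <= 0) {
    throw new Error("rates must stay strictly between 0 and 1, and the lift must be positive");
  }
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const delta = p2 - p1; // absolute difference we want to detect
  return Math.ceil(((Z_ALPHA + Z_BETA) ** 2 * variance) / delta ** 2);
}

// Baseline ~60%, hoping for a 20% relative improvement
const perVariant = sampleSizePerVariant(0.6, 0.2);
console.log(`~${perVariant} users per variant, ~${perVariant * 2} total`);
```

Under those assumptions the result lands in the same few-hundred-per-variant range as the calculator's ~200. The exact figure shifts with the significance level, the power, and how the effect is defined.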

If I ran this test with 3 users:

User 1 sees "Add Task" → creates task (100% conversion)
User 2 sees "Create New Task" → doesn't create task (0% conversion)
User 3 sees "Add Task" → creates task (100% conversion)

Result: "Add Task" wins! 100% vs 0%!

Reality: This means absolutely nothing. Sample size too small.

One user having a bad day skews everything.
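One way to put a number on "means absolutely nothing" is Fisher's exact test, which is designed for tiny samples. The sketch below is only an illustration with made-up helper names, run on the hypothetical three-user outcome above: 2 of 2 "Add Task" users converted, 0 of 1 "Create New Task" users did.

```typescript
// Binomial coefficient C(n, k); fine for the small counts used here.
function choose(n: number, k: number): number {
  if (k < 0 || k > n) return 0;
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// Two-sided Fisher's exact test for a 2x2 table:
//               converted   did not convert
//   control         a              b
//   test            c              d
function fisherExactTwoSided(a: number, b: number, c: number, d: number): number {
  const row1 = a + b;
  const row2 = c + d;
  const col1 = a + c;
  const total = row1 + row2;
  // Probability of seeing x conversions in control, with all margins fixed
  const prob = (x: number): number =>
    (choose(row1, x) * choose(row2, col1 - x)) / choose(total, col1);
  const observed = prob(a);
  let pValue = 0;
  for (let x = Math.max(0, col1 - row2); x <= Math.min(row1, col1); x++) {
    const px = prob(x);
    // Sum every outcome that is at least as unlikely as the observed one
    if (px <= observed * (1 + 1e-9)) pValue += px;
  }
  return Math.min(1, pValue);
}

const pValue = fisherExactTwoSided(2, 0, 0, 1);
console.log(pValue.toFixed(2)); // 0.33
```

A p-value of about 0.33 says a split this lopsided would show up roughly one time in three by pure chance, nowhere near the usual 0.05 bar.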

How A/B Testing Works in PostHog

Even though I can't run a meaningful test yet, let me show you how it works.

Step 1: Create Your Experiment

Go to Experiments in PostHog → New Experiment

What you need:

Feature flag key: button-text-experiment
Hypothesis: Clear description of what you're testing and why
Variants:
- Control: "Add Task" (existing)
- Test: "Create New Task" (new version)
Goal metric: Primary success metric (e.g., "task_created" event)

Step 2: Set Up Your Metrics

PostHog offers 4 metric types:

Funnel: Conversion rate through multiple steps

Use case: "Users who saw button → clicked → created task"
This is what I'd use for TaskFlow

Mean: Average value per user

Use case: "Average tasks created per user"

Ratio: One metric divided by another

Use case: "Tasks completed / Tasks started"

Retention: Users who return after X days

Use case: "Users who return within 7 days"

For my button test, I'd use Funnel:

Step 1: Exposed to experiment
Step 2: Clicked button
Step 3: Created task
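PostHog builds this funnel for you in the UI. To make the idea concrete, here's a minimal sketch of the counting behind it. The event shape and the sample events are made up. `$feature_flag_called` is the exposure event PostHog records when a flag is evaluated, and treating `add_task_dialog_opened` as the "clicked button" step is an assumption based on the event names used elsewhere in this post. Timestamps are ignored for brevity.

```typescript
type Variant = "control" | "test";

// Minimal event shape for this sketch; real PostHog events carry many more fields.
interface FunnelEvent {
  userId: string;
  variant: Variant;
  event: string;
}

// Funnel steps in order: exposed -> clicked button -> created task
const STEPS = ["$feature_flag_called", "add_task_dialog_opened", "task_created"];

// For one variant, count how many distinct users reached each step.
// A user only counts for a step if they also reached every earlier step.
function funnelCounts(events: FunnelEvent[], variant: Variant): number[] {
  const reached = STEPS.map(() => new Set<string>());
  for (const e of events) {
    if (e.variant !== variant) continue;
    const step = STEPS.indexOf(e.event);
    if (step >= 0) reached[step].add(e.userId);
  }
  for (let i = 1; i < reached.length; i++) {
    const previous = reached[i - 1];
    reached[i] = new Set(Array.from(reached[i]).filter((id) => previous.has(id)));
  }
  return reached.map((users) => users.size);
}

// Made-up events, only to show the shape of the calculation
const sample: FunnelEvent[] = [
  { userId: "u1", variant: "control", event: "$feature_flag_called" },
  { userId: "u1", variant: "control", event: "add_task_dialog_opened" },
  { userId: "u1", variant: "control", event: "task_created" },
  { userId: "u2", variant: "test", event: "$feature_flag_called" },
  { userId: "u2", variant: "test", event: "add_task_dialog_opened" },
  { userId: "u3", variant: "control", event: "$feature_flag_called" },
];

console.log(funnelCounts(sample, "control")); // [2, 1, 1]
console.log(funnelCounts(sample, "test")); // [1, 1, 0]
```

The conversion rate for a variant is the last count divided by the first one.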

Step 3: Configure Variants

Control (50%): "Add Task"
Test (50%): "Create New Task"

PostHog automatically splits users 50/50.

You can add more variants if you want (e.g., "New Task", "Make Task", etc.)
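PostHog handles the assignment for you. The sketch below is not its implementation, just one common way to get a split that is stable per user: hash the flag key plus the user ID and map the result to a bucket. The function names and the choice of the FNV-1a hash are assumptions for illustration.

```typescript
// FNV-1a: a small, well-known non-cryptographic 32-bit string hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, kept to 32 bits
  }
  return hash >>> 0; // convert to an unsigned integer
}

// Illustrative only: the same flag key and user ID always give the same variant,
// so a user never flips between button texts from one session to the next.
function assignVariant(flagKey: string, userId: string, variants: string[]): string {
  const bucket = fnv1a(`${flagKey}:${userId}`) % 100; // 0..99
  const index = Math.floor((bucket * variants.length) / 100); // equal-sized slices
  return variants[index];
}

const variants = ["control", "test"];
console.log(assignVariant("button-text-experiment", "user-123", variants));
```

Adding a third variant only means passing a longer array; each variant gets an equal slice of the 100 buckets.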

Implementing the Experiment

Here's how the code would look (if I had enough users):


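  // A/B Test: Button text experiment
  // getFeatureFlag() returns undefined until flags have loaded; in that case
  // (and for the "control" variant) this falls back to the existing "Add Task" text
  const buttonText = useMemo(() => {
    const variant = posthog.getFeatureFlag("button-text-experiment");
    return variant === "test" ? "Create New Task" : "Add Task"; // control
  }, []);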

  const handleSubmit = (e: React.FormEvent) => {
    e.preventDefault();

    if (!title.trim()) {
      // Track form submission with empty title - UX friction indicator
      posthog.capture("task_form_submitted_empty", {
        button_text: buttonText, // same property name as the other events
      });
      return;
    }

    addTask({
      title: title.trim(),
      description: description.trim() || undefined,
      completed: false,
      priority,
      category,
      dueDate: dueDate ? new Date(dueDate) : undefined,
    });

    // Track task creation - key conversion event
    posthog.capture("task_created", {
      priority,
      category,
      has_description: !!description.trim(),
      has_due_date: !!dueDate,
      button_text: buttonText, // Track which variant they saw
    });

    // Reset form
    setTitle("");
    setDescription("");
    setPriority("medium");
    setCategory("work");
    setDueDate("");
    setOpen(false);
  };

  const handleOpenChange = (newOpen: boolean) => {
    if (newOpen) {
      // Track dialog opening - top of task creation funnel
      posthog.capture("add_task_dialog_opened", {
        button_text: buttonText, // Track which button text led to this
      });
    }
    setOpen(newOpen);
  };

  const handleCancel = () => {
    // Track cancellation - potential friction indicator
    posthog.capture("add_task_dialog_cancelled", {
      had_title: !!title.trim(),
      had_description: !!description.trim(),
      button_text: buttonText,
    });
    setOpen(false);
  };

Key points:

posthog.getFeatureFlag() returns the variant this user is assigned to, and the call itself counts as exposure to the experiment
Track both actions: button click AND task creation
Include context: Which variant they saw

Important: Only call getFeatureFlag() for users who will actually see the button. Otherwise, you're including people who never had a chance to interact with it (skews results).
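Here's a minimal sketch of that guard. `FlagClient` is a stand-in for the one posthog-js method involved so the example runs without the SDK, and `resolveButtonText` and `canSeeButton` are hypothetical names, not part of TaskFlow or PostHog.

```typescript
// Minimal stand-in for the one posthog-js method this sketch needs.
interface FlagClient {
  getFeatureFlag(key: string): string | boolean | undefined;
}

// Hypothetical helper: only evaluate the flag for users who can see the button.
// Returns null when the button is not rendered at all.
function resolveButtonText(client: FlagClient, canSeeButton: boolean): string | null {
  if (!canSeeButton) {
    return null; // no flag call, so this user never enters the experiment
  }
  const variant = client.getFeatureFlag("button-text-experiment");
  return variant === "test" ? "Create New Task" : "Add Task";
}

// Fake client that counts flag evaluations, to show who gets exposed
let exposures = 0;
const fakeClient: FlagClient = {
  getFeatureFlag: () => {
    exposures += 1;
    return "test";
  },
};

resolveButtonText(fakeClient, false); // user who never sees the button: not exposed
resolveButtonText(fakeClient, true); // user who sees the button: exposed
console.log(exposures); // 1
```

Only the second call touches the flag, so only that user would show up in the experiment's results.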

When Is Your App Ready for A/B Testing?

Minimum Requirements:

1. Enough users

100+ active users minimum
1000+ active users ideal
More users = faster, more reliable results

2. Clear baseline metrics

You need existing data to compare against
"Users create 2.5 tasks on average"
"40% of users complete onboarding"

3. A real hypothesis

Not "I wonder if this is better"
But "I think X will improve Y because Z"

4. A metric that matters

Revenue
Activation
Retention
Not vanity metrics

TaskFlow Isn't Ready Because:

❌ Only 3 users (need 400+)
❌ No baseline data (not enough history)
❌ Low event volume (2-3 events per week)
✅ Hypothesis is fine
✅ Metric is fine (task creation)

I'll revisit A/B testing when TaskFlow has 100+ active users.

For now, I'll make decisions based on:

Session replays (watch users struggle)
User interviews (ask them directly)
My own intuition (founder mode)

A/B testing is for optimization, not validation.

Better Ways to Test When You're Small

1. Session Replay (What I'm Using)

Manually watch replays of 5-10 users interacting with each version of the button text.

Advantage: Qualitative insights, see actual confusion
Disadvantage: Not statistically significant

2. User Interviews

Ask users: "Which button text is clearer?"

Show them both options. Get feedback.

Advantage: Direct feedback, understand why
Disadvantage: What people say ≠ what they do

3. Ship It and Watch

Change the button text. Monitor for 1-2 weeks.

Did task creation go up or down?

Advantage: Real-world results
Disadvantage: No control group, can't isolate the change

4. Fake Door Test

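Instead of A/B testing the button text, test whether people WANT a feature at all: add a button or menu item for something you haven't built yet and count how many people click it.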

Advantage: Validates demand before building
Disadvantage: Not for existing features
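A fake door can be as small as a click handler that records interest and tells the user the feature is on its way. Here's a minimal sketch: `Analytics` stands in for a client with a `capture()` method like posthog-js has, and the handler name, the `fake_door_clicked` event, and the "Recurring tasks" feature are all hypothetical.

```typescript
// Minimal stand-in for an analytics client with a capture() method.
interface Analytics {
  capture(event: string, properties?: Record<string, unknown>): void;
}

// Hypothetical fake-door handler: the feature does not exist yet, so a click
// only records interest and returns a message to show the user.
function handleFakeDoorClick(analytics: Analytics, featureName: string): string {
  analytics.capture("fake_door_clicked", { feature: featureName }); // made-up event name
  return `${featureName} is coming soon. Thanks for the interest!`;
}

// In-memory recorder so the sketch runs without the real SDK
const recorded: { event: string; properties?: Record<string, unknown> }[] = [];
const recorder: Analytics = {
  capture: (event, properties) => {
    recorded.push({ event, properties });
  },
};

const message = handleFakeDoorClick(recorder, "Recurring tasks");
console.log(message, recorded.length);
```

Counting those events against the number of users who saw the entry point gives a rough read on demand before anything gets built.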

For TaskFlow, I'm using Session Replay + watching my funnels.

Once I hit 100+ users, I'll revisit A/B testing.

What I Learned About A/B Testing

1. Sample size matters more than you think

3 users ≠ statistically significant

You need hundreds, sometimes thousands.

2. A/B testing is for optimization, not discovery

Use it when you have:

Traffic
Baseline data
Clear hypothesis

Don't use it when you're still figuring out product-market fit.

3. Small changes need big traffic

Testing button text? Need 400+ users.

Testing a complete redesign? Can get away with less (bigger effect).
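The reason shows up in the standard sample-size approximation for comparing two conversion rates. This is a textbook formula, not something specific to PostHog. Here $p_1$ is the baseline rate, $p_2$ is the rate you hope the new version reaches, and the $z$ terms are fixed by the significance level and power you choose:

```latex
n_{\text{per variant}} \approx
  \frac{(z_{\alpha/2} + z_{\beta})^2 \,\bigl[\, p_1(1 - p_1) + p_2(1 - p_2) \,\bigr]}
       {(p_2 - p_1)^2}
```

The gap $p_2 - p_1$ is squared in the denominator, so doubling the effect you expect cuts the required sample to roughly a quarter.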

4. PostHog makes it easy... when you're ready

The tool is simple to use.

The challenge is having enough users to make it meaningful.

5. There are other ways to test

Session replay, user interviews, and gut instinct are valid for early-stage products.

Follow along with the series:
- 🎥 TikTok: [https://www.tiktok.com/@hey_techys?is_from_webapp=1&sender_device=pc]
- 📸 Instagram: [https://www.instagram.com/hey_techys]
- 💼 LinkedIn: [https://www.linkedin.com/in/onyenekwe-elizabeth-46a467183/]
- 🐦 Twitter/X: [https://x.com/ElizabethOnyen6]

Byeeeeeeee!!!!

- Lizzy

Day 10: A/B Testing - The Reality Check
