Day 10: A/B Testing - The Reality Check

Frontend engineer with 6 years of experience building and shipping production web applications. Currently focused on developer experience and developer relations through hands-on analysis and public writing on developer tools, onboarding, and product messaging.

I was excited to try PostHog's A/B testing feature.

I set up my experiment: "Create New Task" button vs "Add Task" button. Hypothesis: Clearer wording = more task creation.

Then I looked at my user count.

TaskFlow has... 3 users. Including me.

That's not enough for an A/B test. Not even close.

But let me show you how A/B testing works anyway, and when you should actually use it.

The Experiment I Wanted to Run

I wanted to test two variations of the task creation button:

Control (A): "Add Task"

Test (B): "Create New Task"

Hypothesis: "Create New Task" is clearer and will lead to more tasks being created.

Goal metric: Task creation rate (users who click the button AND actually create a task)

Secondary metrics:

  • Button click rate

  • Time to first task creation

  • Task editing after creation

Sounds good, right?

Here's the problem...

Why TaskFlow Can't Run This Test Yet

An A/B test result is only reliable once it reaches statistical significance, and that takes a large enough sample.

PostHog's recommended sample size calculator told me the truth:

For my experiment:

  • Baseline conversion rate: ~67% (2 out of 3 users create tasks)

  • Minimum detectable effect: 20% improvement

  • Required sample size: ~200 users per variant

I need 400 total users.

I have 3.

That's about 133x more users than I have.
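To get a feel for where a number like that comes from, here's a rough sketch of the textbook two-proportion sample size formula. This is not PostHog's calculator, which may use a different formula and defaults. The function name `sampleSizePerVariant`, the 5% significance level, the 80% power, and reading "20% improvement" as a relative lift are all my assumptions.

```typescript
// Approximate users needed per variant to detect a lift between two conversion rates.
// Assumptions (not taken from PostHog): two-sided alpha = 0.05, power = 0.80,
// and a normal approximation to the binomial distribution.
const Z_ALPHA = 1.959964; // z-score for two-sided 5% significance
const Z_BETA = 0.841621; // z-score for 80% power

function sampleSizePerVariant(baselineRate: number, relativeLift: number): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  if (p1 <= 0 || p1 >= 1 || p2 <= 0 || p2 >= 1 || p1 === p2) {
    throw new Error("rates must be strictly between 0 and 1 and must differ");
  }
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const effect = p2 - p1;
  return Math.ceil(((Z_ALPHA + Z_BETA) ** 2 * variance) / effect ** 2);
}

// TaskFlow's numbers: 2 out of 3 users create tasks, hoping for a 20% relative lift.
const perVariant = sampleSizePerVariant(2 / 3, 0.2);
console.log(`per variant: ${perVariant}, total: ${perVariant * 2}`);
```

With these assumptions the answer lands in the low hundreds per variant, the same ballpark as the calculator's ~200, though the two won't match exactly. Try a smaller lift (say `0.05`) and watch the required sample size balloon.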

If I ran this test with 3 users:

  • User 1 sees "Add Task" → creates task (100% conversion)

  • User 2 sees "Create New Task" → doesn't create task (0% conversion)

  • User 3 sees "Add Task" → creates task (100% conversion)

Result: "Add Task" wins! 100% vs 0%!

Reality: This means absolutely nothing. Sample size too small.

One user having a bad day skews everything.
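If you want to put a number on "means absolutely nothing", here's a minimal sketch of Fisher's exact test on that 3-user result. Fisher's test is my pick for the illustration because it works on tiny samples; it's not necessarily what PostHog uses under the hood, and the function names are made up for this example.

```typescript
// Binomial coefficient C(n, k), computed iteratively.
function choose(n: number, k: number): number {
  if (k < 0 || k > n) return 0;
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// Two-sided Fisher's exact test for conversions in two variants.
// Returns the probability of a split at least as extreme as the observed one,
// assuming the button text makes no difference at all.
function fisherExactTwoSided(
  aConverted: number,
  aTotal: number,
  bConverted: number,
  bTotal: number
): number {
  const totalConverted = aConverted + bConverted;
  const n = aTotal + bTotal;
  const prob = (x: number): number =>
    (choose(aTotal, x) * choose(bTotal, totalConverted - x)) / choose(n, totalConverted);

  const observed = prob(aConverted);
  const lowest = Math.max(0, totalConverted - bTotal);
  const highest = Math.min(aTotal, totalConverted);
  let pValue = 0;
  for (let x = lowest; x <= highest; x++) {
    const px = prob(x);
    // Sum every outcome that is no more likely than the observed one.
    if (px <= observed * (1 + 1e-9)) pValue += px;
  }
  return Math.min(1, pValue);
}

// "Add Task": 2 of 2 converted. "Create New Task": 0 of 1 converted.
console.log(fisherExactTwoSided(2, 2, 0, 1).toFixed(3)); // "0.333"
```

A p-value of 0.333 means that even if the two labels performed identically, you'd see a "100% vs 0%" split like this about one time in three. The usual bar is below 0.05.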

How A/B Testing Works in PostHog

Even though I can't run a meaningful test yet, let me show you how it works.

Step 1: Create Your Experiment

Go to Experiments in PostHog → New Experiment

What you need:

  1. Feature flag key: button-text-experiment

  2. Hypothesis: Clear description of what you're testing and why

  3. Variants:

    • Control: "Add Task" (existing)

    • Test: "Create New Task" (new version)

  4. Goal metric: Primary success metric (e.g., "task_created" event)

Step 2: Set Up Your Metrics

PostHog offers 4 metric types:

Funnel: Conversion rate through multiple steps

  • Use case: "Users who saw button → clicked → created task"

  • This is what I'd use for TaskFlow

Mean: Average value per user

  • Use case: "Average tasks created per user"

Ratio: One metric divided by another

  • Use case: "Tasks completed / Tasks started"

Retention: Users who return after X days

  • Use case: "Users who return within 7 days"

For my button test, I'd use Funnel:

  • Step 1: Exposed to experiment

  • Step 2: Clicked button

  • Step 3: Created task
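To make those three steps concrete, here's a small sketch of how a funnel could be counted per variant from a list of events. The `TrackedEvent` shape and `funnelCounts` function are hypothetical and only for illustration; PostHog does this computation for you. I'm using `$feature_flag_called` as the "exposed" step (as far as I understand, that's the event PostHog records when a flag is read) and the two event names TaskFlow's form code captures.

```typescript
// Hypothetical, simplified event shape; real PostHog events carry far more data.
type Variant = "control" | "test";
interface TrackedEvent {
  userId: string;
  event: string;
  variant: Variant;
}

// Funnel steps in order: exposed -> clicked button (dialog opened) -> created task.
const FUNNEL_STEPS = ["$feature_flag_called", "add_task_dialog_opened", "task_created"];

// For one variant, returns how many users reached each step.
// Assumes `events` is sorted oldest to newest.
function funnelCounts(events: TrackedEvent[], variant: Variant): number[] {
  const stepsDone: Record<string, number> = {};
  for (const e of events) {
    if (e.variant !== variant) continue;
    const done = stepsDone[e.userId] || 0;
    // A user only advances when they fire the next step in sequence.
    if (done < FUNNEL_STEPS.length && e.event === FUNNEL_STEPS[done]) {
      stepsDone[e.userId] = done + 1;
    }
  }
  const perUser = Object.keys(stepsDone).map((id) => stepsDone[id]);
  return FUNNEL_STEPS.map((_, i) => perUser.filter((done) => done > i).length);
}

// The imaginary 3-user run: two users on control convert, one on test bounces.
const sampleEvents: TrackedEvent[] = [
  { userId: "u1", event: "$feature_flag_called", variant: "control" },
  { userId: "u1", event: "add_task_dialog_opened", variant: "control" },
  { userId: "u1", event: "task_created", variant: "control" },
  { userId: "u2", event: "$feature_flag_called", variant: "test" },
  { userId: "u3", event: "$feature_flag_called", variant: "control" },
  { userId: "u3", event: "add_task_dialog_opened", variant: "control" },
  { userId: "u3", event: "task_created", variant: "control" },
];
console.log(funnelCounts(sampleEvents, "control")); // [2, 2, 2]
console.log(funnelCounts(sampleEvents, "test")); // [1, 0, 0]
```

The conversion rate for a variant is the last count divided by the first.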

Step 3: Configure Variants

Control (50%): "Add Task"

Test (50%): "Create New Task"

PostHog automatically splits users 50/50.

You can add more variants if you want (e.g., "New Task", "Make Task", etc.)
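If you're wondering how a split like this can stay stable (the same user always sees the same button), the general idea is deterministic bucketing: hash the flag key plus the user ID into a number, then map that number onto the variant percentages. The sketch below only illustrates that idea. It is NOT PostHog's actual hashing, and `hashString`, `assignVariant`, and `VariantSplit` are names I made up.

```typescript
interface VariantSplit {
  key: string;
  rolloutPercentage: number; // percentages across all variants should sum to 100
}

// Simple 32-bit polynomial string hash. Illustrative only, not PostHog's algorithm.
function hashString(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit int
  }
  return hash;
}

// Same flag key + same user ID always lands in the same bucket, so the
// assignment is stable across sessions without storing anything.
function assignVariant(flagKey: string, userId: string, variants: VariantSplit[]): string {
  if (variants.length === 0) throw new Error("at least one variant is required");
  const bucket = hashString(`${flagKey}:${userId}`) % 100; // 0..99
  let cumulative = 0;
  let fallback = "";
  for (const v of variants) {
    cumulative += v.rolloutPercentage;
    if (bucket < cumulative) return v.key;
    fallback = v.key;
  }
  return fallback; // only reached if the percentages sum to less than 100
}

const split: VariantSplit[] = [
  { key: "control", rolloutPercentage: 50 },
  { key: "test", rolloutPercentage: 50 },
];
console.log(assignVariant("button-text-experiment", "user-1", split));
console.log(assignVariant("button-text-experiment", "user-1", split)); // same as above
```

Adding a third variant is just another entry in the array with the percentages rebalanced.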

Implementing the Experiment

Here's how the code would look (if I had enough users):


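  // A/B test: button text experiment
  // Note: getFeatureFlag() returns undefined until flags have loaded,
  // in which case this falls back to the control text.
  const buttonText = useMemo(() => {
    const variant = posthog.getFeatureFlag("button-text-experiment");
    return variant === "test" ? "Create New Task" : "Add Task"; // control
  }, []);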

  const handleSubmit = (e: React.FormEvent) => {
    e.preventDefault();

    if (!title.trim()) {
      // Track form submission with empty title - UX friction indicator
      posthog.capture("task_form_submitted_empty", {
        button_text: buttonText, // same property name as the other events
      });
      return;
    }

    addTask({
      title: title.trim(),
      description: description.trim() || undefined,
      completed: false,
      priority,
      category,
      dueDate: dueDate ? new Date(dueDate) : undefined,
    });

    // Track task creation - key conversion event
    posthog.capture("task_created", {
      priority,
      category,
      has_description: !!description.trim(),
      has_due_date: !!dueDate,
      button_text: buttonText, // Track which variant they saw
    });

    // Reset form
    setTitle("");
    setDescription("");
    setPriority("medium");
    setCategory("work");
    setDueDate("");
    setOpen(false);
  };

  const handleOpenChange = (newOpen: boolean) => {
    if (newOpen) {
      // Track dialog opening - top of task creation funnel
      posthog.capture("add_task_dialog_opened", {
        button_text: buttonText, // Track which button text led to this
      });
    }
    setOpen(newOpen);
  };

  const handleCancel = () => {
    // Track cancellation - potential friction indicator
    posthog.capture("add_task_dialog_cancelled", {
      had_title: !!title.trim(),
      had_description: !!description.trim(),
      button_text: buttonText,
    });
    setOpen(false);
  };

Key points:

  1. posthog.getFeatureFlag() returns the variant the user was assigned to (PostHog handles the assignment itself)

  2. Track both actions: button click AND task creation

  3. Include context: Which variant they saw

Important: Only call getFeatureFlag() for users who will actually see the button. Otherwise, you're including people who never had a chance to interact with it (skews results).
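Here's one way that rule could look in code, as a sketch rather than the definitive approach. `resolveButtonText` is a hypothetical helper: it only reads the flag when the button is going to be rendered, and it falls back to the control text when the flag value is missing or unexpected. In the real app, `readFlag` would be something like `() => posthog.getFeatureFlag("button-text-experiment")`; here it's passed in so the logic runs without PostHog.

```typescript
// A flag read can come back as a variant key, a boolean, or undefined.
type FlagValue = string | boolean | undefined;

// Hypothetical helper: decides the button label without touching the flag
// for users who will never see the button.
function resolveButtonText(buttonVisible: boolean, readFlag: () => FlagValue): string | null {
  if (!buttonVisible) {
    return null; // flag is never read, so this user is not counted as exposed
  }
  const variant = readFlag();
  // Anything other than "test" (including undefined) gets the control text.
  return variant === "test" ? "Create New Task" : "Add Task";
}

// Stand-in for posthog.getFeatureFlag() that counts how often it is called.
let flagReads = 0;
const fakeReadFlag = (): FlagValue => {
  flagReads += 1;
  return "test";
};

console.log(resolveButtonText(false, fakeReadFlag)); // null
console.log(flagReads); // 0
console.log(resolveButtonText(true, fakeReadFlag)); // "Create New Task"
console.log(flagReads); // 1
```

Passing the flag reader in as a function also makes the variant logic easy to unit test.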

When Is Your App Ready for A/B Testing?

Minimum Requirements:

1. Enough users

  • 100+ active users as a bare minimum (only enough to detect large effects)

  • 1000+ active users ideal

  • More users = faster, more reliable results

2. Clear baseline metrics

  • You need existing data to compare against

  • "Users create 2.5 tasks on average"

  • "40% of users complete onboarding"

3. A real hypothesis

  • Not "I wonder if this is better"

  • But "I think X will improve Y because Z"

4. A metric that matters

  • Revenue

  • Activation

  • Retention

  • Not vanity metrics

TaskFlow Isn't Ready Because:

❌ Only 3 users (need 400+)
❌ No baseline data (not enough history)
❌ Low event volume (2-3 events per week)
✅ Hypothesis is fine
✅ Metric is fine (task creation)

I'll revisit A/B testing when TaskFlow has 100+ active users.

For now, I'll make decisions based on:

  • Session replays (watch users struggle)

  • User interviews (ask them directly)

  • My own intuition (founder mode)

A/B testing is for optimization, not validation.

Better Ways to Test When You're Small

1. Session Replay (What I'm Using)

Watch 5-10 users interact with both button text versions manually.

Advantage: Qualitative insights, see actual confusion
Disadvantage: Not statistically significant

2. User Interviews

Ask users: "Which button text is clearer?"

Show them both options. Get feedback.

Advantage: Direct feedback, understand why
Disadvantage: What people say ≠ what they do

3. Ship It and Watch

Change the button text. Monitor for 1-2 weeks.

Did task creation go up or down?

Advantage: Real-world results
Disadvantage: No control group, can't isolate the change
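A rough sketch of what "did it go up or down?" could look like if you exported daily numbers. The `DailyStats` shape, the function names, and the figures below are all made up for illustration; none of it is real TaskFlow data.

```typescript
interface DailyStats {
  date: string; // ISO date such as "2025-01-15", so string comparison sorts correctly
  activeUsers: number;
  tasksCreated: number;
}

// Tasks created per active user-day across a set of days.
function tasksPerActiveUser(days: DailyStats[]): number {
  const users = days.reduce((sum, d) => sum + d.activeUsers, 0);
  const tasks = days.reduce((sum, d) => sum + d.tasksCreated, 0);
  return users === 0 ? 0 : tasks / users;
}

// Splits the days at the date the button text changed and compares both sides.
function compareBeforeAfter(days: DailyStats[], changeDate: string): { before: number; after: number } {
  const before = days.filter((d) => d.date < changeDate);
  const after = days.filter((d) => d.date >= changeDate);
  return { before: tasksPerActiveUser(before), after: tasksPerActiveUser(after) };
}

// Made-up numbers, purely to show the shape of the comparison.
const exampleDays: DailyStats[] = [
  { date: "2025-01-01", activeUsers: 2, tasksCreated: 2 },
  { date: "2025-01-02", activeUsers: 2, tasksCreated: 1 },
  { date: "2025-01-08", activeUsers: 2, tasksCreated: 3 },
  { date: "2025-01-09", activeUsers: 1, tasksCreated: 1 },
];
console.log(compareBeforeAfter(exampleDays, "2025-01-08"));
```

Keep the disadvantage in mind when reading the output: with no control group, a higher "after" number could come from the new label, a busy week, or pure chance.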

4. Fake Door Test

Instead of A/B testing the button text, test if people WANT the feature at all.

Advantage: Validates demand before building
Disadvantage: Not for existing features

For TaskFlow, I'm using Session Replay + watching my funnels.

Once I hit 100+ users, I'll revisit A/B testing.

What I Learned About A/B Testing

1. Sample size matters more than you think

3 users ≠ statistically significant

You need hundreds, sometimes thousands.

2. A/B testing is for optimization, not discovery

Use it when you have:

  • Traffic

  • Baseline data

  • Clear hypothesis

Don't use it when you're still figuring out product-market fit.

3. Small changes need big traffic

Testing button text? Need 400+ users.

Testing a complete redesign? Can get away with less (bigger effect).

4. PostHog makes it easy... when you're ready

The tool is simple to use.

The challenge is having enough users to make it meaningful.

5. There are other ways to test

Session replay, user interviews, and gut instinct are valid for early-stage products.

Follow along with the series:
- 🎥 TikTok: [https://www.tiktok.com/@hey_techys?is_from_webapp=1&sender_device=pc]
- 📸 Instagram: [https://www.instagram.com/hey_techys]
- 💼 LinkedIn: [https://www.linkedin.com/in/onyenekwe-elizabeth-46a467183/]
- 🐦 Twitter/X: [https://x.com/ElizabethOnyen6]

Byeeeeeeee!!!!

- Lizzy