How to choose the right sample size for your M&E Project
November 18, 2025
TL;DR
Need to figure out your sample size? Here’s the quick version:
For baseline surveys and monitoring: You need 4 things – your population size, how precise you want to be (usually ±5%), how confident (95% is standard), and your best guess at what you’ll find. Plug these into an online calculator. For most M&E work, expect 200-400 people.
For impact evaluations: You need bigger samples because you’re comparing treatment vs control groups. The smaller the change you want to detect, the more people you need. A typical impact evaluation needs 400-800 people total (split evenly between groups).
Reality check: Cluster sampling (villages/schools instead of individuals) usually doubles your sample size. Budget for 15-20% dropout. And always calculate sample size before you write your budget.
Can’t afford the calculated sample size? You have three choices: find more money, accept you’ll only detect larger impacts, or don’t do the evaluation at all. An underpowered study wastes everyone’s time and money.
Try out our free calculator. Document every assumption. When in doubt, hire a statistician for a few hours – it’s cheaper than a failed evaluation.
Table of Contents
- Introduction
- Why sample size matters more than you think
- The two types of M&E studies and why they need different approaches
- The four numbers you need before calculating anything
- Quick guide: Sample size for monitoring and baseline studies
- Quick guide: Sample size for impact evaluations
- The reality check: When your actual situation is messier
- What about qualitative research?
- Purposeful sampling is not cheating
- One warning: Skip convenience sampling
- Seven practical tips from the field
- FAQs
- What if I can’t afford the calculated sample size?
- Do I need different calculations for each outcome?
- Can I just sample 10% of my population?
- What if my population is really small (under 100)?
- How do I handle high dropout rates?
- What if I’m comparing more than two groups?
- How precise is “precise enough” for my margin of error?
- Should I power for the smallest possible effect?
- The bottom line
Introduction
Getting your sample size right is one of those things that keeps M&E practitioners up at night. Sample too few people and your findings won’t be credible. Sample too many and you’ve wasted precious time and money that could have gone to actually helping people.
Here’s what I’ve seen happen in the field: An organization spends three months collecting data from 50 farmers to evaluate their agriculture program, only to have a donor reject the findings because the sample was too small to prove anything. Or worse, a team surveys 2,000 beneficiaries when 400 would have been enough – burning through their entire M&E budget on data collection when they needed money for analysis and reporting.
The good news? You don’t need a statistics PhD to get this right.
This guide will walk you through the practical steps to determine your sample size, whether you’re running a baseline survey or evaluating program impact. I’ll skip the heavy math and focus on what actually works in the real world.
Why sample size matters more than you think

Think of sample size as insurance for your evaluation. Get it wrong and you’re essentially gambling with your organization’s resources and credibility.
When your sample is too small, you risk missing real program impacts. I once worked with a health NGO that surveyed 30 clinics to measure whether their quality improvement program worked. The data showed a 15% improvement in patient satisfaction, but the sample was too small to prove it wasn’t just random chance. They couldn’t convince their board to continue funding.
That’s called a Type II error in statistics – failing to detect an impact that actually exists. Your program might be working beautifully but your evaluation can’t prove it.
The flip side is just as bad. Oversample and you’re wasting money that could feed more children, train more teachers, or treat more patients. If you need 200 surveys to get reliable results but you collect 800, you’ve just spent 4x more than necessary on data collection.
There’s also an ethical dimension here. Every person you survey is giving you their time. If you’re collecting more data than you need to answer your questions, you’re wasting people’s time. If you’re collecting too little data to produce reliable findings, you’re also wasting their time because the results won’t be useful.
The right sample size sits in that sweet spot where you can:
- Detect the impacts you care about
- Convince skeptical stakeholders
- Stay within your budget
- Respect participants’ time
The two types of M&E studies and why they need different approaches
Before you calculate anything, you need to be clear about what kind of study you’re doing. This determines everything about how you calculate sample size.
Descriptive studies answer questions like “how many?” or “what proportion?”. These include baseline surveys, monitoring reports, and needs assessments. You’re measuring the current situation, not trying to prove your program caused a change.
Example: “What percentage of farmers in our project area currently use improved seeds?”
Impact evaluations answer the question “did our program cause this change?”. You’re comparing people who got your intervention against people who didn’t, trying to prove the difference is because of your program.
Example: “Did our training program increase the percentage of farmers using improved seeds compared to farmers who didn’t get training?”
The key difference? Descriptive studies focus on precision – getting an accurate snapshot. Impact evaluations focus on detection – being able to spot the difference your program made.
This is important because impact evaluations almost always need bigger samples. You’re not just measuring one group, you’re measuring two groups (treatment and control) and trying to find the difference between them. That’s harder statistically, so it requires more data.
A baseline survey of 200 farmers might give you a perfectly good estimate of current seed use. But proving your training changed seed use? You’d probably need 400-600 farmers (split between trained and untrained groups) to confidently detect a meaningful change.
The four numbers you need before calculating anything
Every sample size calculation needs four inputs. Let’s break them down in plain language.
1. Confidence level: How sure do you want to be?
This is about how certain you want to be that your findings reflect reality. The standard is 95% confidence, which means that if you repeated your study 100 times, roughly 95 of those confidence intervals would contain the true population value.
Most M&E studies use 95% confidence. It’s what donors expect and what’s considered acceptable in the development sector. You could go to 90% (less certain, smaller sample needed) or 99% (more certain, bigger sample needed), but 95% is the sweet spot.
In practical terms: stick with 95% unless you have a really good reason not to.
2. Margin of error: How precise do you need to be?
The margin of error tells you how much your sample results might differ from the true population value. A 5% margin of error means if your survey shows 60% of farmers use improved seeds, the real number is probably between 55% and 65%.
Common margins of error:
- ±3%: Very precise, expensive, usually only for large national surveys
- ±5%: Standard for most M&E work, good balance of precision and cost
- ±10%: Less precise, cheaper, sometimes acceptable for small pilots
Real example: If you survey 200 households and find 40% have access to clean water (±5% margin), you can say with confidence that the true rate is between 35% and 45%. That’s usually precise enough for program decisions.
3. Expected baseline proportion: What do you expect to find?
This is your best guess at what the current situation looks like. If you’re measuring whether people wash hands with soap, what percentage do you think currently do it?
This matters because outcomes near 50% have the most variation, requiring larger samples. Outcomes near 0% or 100% have less variation and need smaller samples.
Where to get this number:
- Pilot studies or previous surveys in your area
- National statistics (DHS, census data)
- Similar programs in similar contexts
- If you truly have no idea, use 50% – it’s the most conservative estimate
4. Minimum detectable effect: The smallest change worth finding
This one only applies to impact evaluations. It’s the smallest difference between your treatment and control groups that would be meaningful enough to care about.
Let’s say your baseline handwashing rate is 20%. Would you care about a 2% increase? Probably not. How about 10%? Maybe. How about 20%? Definitely.
The smaller the effect you want to detect, the bigger your sample needs to be. Detecting a 5 percentage point change requires roughly four times the sample size of detecting a 10 point change.
Be realistic here. If you set this too small (wanting to detect tiny changes), you’ll need a massive sample you can’t afford. If you set it too large (only detecting huge changes), you might miss real but modest program impacts.
Quick guide: Sample size for monitoring and baseline studies
Here’s how to calculate sample size for a descriptive study in five steps. I’ll use a real example so you can see how it works.
Example scenario: You need to survey office staff to find out what percentage are using new project management software. There are 500 staff members total.
Step 1: Know your population size
How many people total are in the group you want to study? In our example, that’s 500 office staff.
Step 2: Choose your margin of error
How precise do you need to be? Let’s say ±5% (0.05) is good enough for making decisions about training needs.
Step 3: Set your confidence level
We’ll stick with the standard 95% confidence level.
Step 4: Estimate the baseline proportion
We have no idea how many people are using the software yet, so we’ll use 50% (0.5) – the most conservative estimate.
Step 5: Calculate
You can use an online calculator (I recommend Calculator.net or SurveyMonkey’s calculator) or the formula if you’re comfortable with it.
For our example, the calculator tells us we need 217 staff members to survey.
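If you'd rather check the math yourself than trust a website, here's a minimal Python sketch of the formula most of these calculators use (Cochran's formula with a finite population correction). The function name and defaults are mine, not from any particular calculator, and small rounding differences explain why some tools report 217 and others 218 for this example.

```python
def descriptive_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for a descriptive survey: Cochran's formula plus a finite
    population correction. margin is the margin of error (0.05 = +/-5%),
    z is the z-score for the confidence level (1.96 = 95%), and p is the
    expected baseline proportion (0.5 is the most conservative choice)."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population sample size (~384)
    n = n0 / (1 + (n0 - 1) / population)        # shrink for a finite population
    return round(n)                             # some calculators round up instead

print(descriptive_sample_size(500))        # -> 217 for our 500 office staff
print(descriptive_sample_size(5_000_000))  # -> 384: population size barely matters once it's huge
```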
Here’s the practical reality: if your population is small (under 1,000), the population size matters a lot. If your population is huge (over 20,000), the population size barely matters at all.
This surprises people. Whether you’re surveying a city of 50,000 or a country of 50 million, you’d need almost the same sample size to get a ±5% margin of error. The precision you want (that margin of error) becomes the main driver of your sample size, not the total population.
| Population Size | Sample Needed (±5% margin, 95% confidence) |
|---|---|
| 500 | 217 |
| 1,000 | 278 |
| 5,000 | 357 |
| 50,000 | 381 |
| 500,000 | 384 |
| 5,000,000 | 384 |
See how it plateaus? Once your population gets big, your sample size barely changes.
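Here's a quick sketch that reproduces the table above with the same formula, in case you want to check it against your own population size:

```python
for N in (500, 1_000, 5_000, 50_000, 500_000, 5_000_000):
    n0 = 1.96 ** 2 * 0.25 / 0.05 ** 2   # +/-5% margin, 95% confidence, p = 0.5
    n = n0 / (1 + (n0 - 1) / N)         # finite population correction
    print(f"{N:>9,} -> {round(n)}")
```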
Quick guide: Sample size for impact evaluations
Impact evaluations are trickier because you’re comparing two groups. Here’s how to think through it with a real example.
Example scenario: A nutrition program wants to know if their intervention increases the percentage of children meeting growth targets. Currently, 20% of children meet these targets (your baseline). You want to detect at least a 10 percentage point increase (so 30% in the treatment group).
The key inputs for impact evaluation
- Baseline proportion: 20% (0.20) – current rate in control group
- Expected proportion after intervention: 30% (0.30) – expected rate in treatment group
- Minimum detectable effect: 10 percentage points (the difference between them)
- Power: 80% – your chance of detecting this difference if it exists
- Confidence level: 95% – how certain you want to be
What the calculation tells you
Using these inputs and an online power calculator (try the one from Sealed Envelope or Evan’s Awesome A/B Tools), you’d need:
- 294 children in the control group
- 294 children in the treatment group
- 588 children total
This is almost always bigger than people expect. You need enough children in each group to reliably detect the difference between 20% and 30%.
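If you want to check these numbers in code rather than a web calculator, here's a minimal Python sketch of the standard two-proportion formula (two-sided test, 95% confidence, 80% power). The function and its defaults are just illustrative; calculators use slightly different variants of this formula, so expect answers in the 290-300 range per group rather than exactly 294.

```python
from math import ceil
from scipy.stats import norm

def impact_sample_size(p_control, p_treatment, alpha=0.05, power=0.80):
    """Per-group sample size for detecting a difference between two proportions
    (two-sided test, equal-sized treatment and control groups)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for 95% confidence
    z_beta = norm.ppf(power)            # 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control    # minimum detectable effect
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

per_group = impact_sample_size(0.20, 0.30)
print(per_group, 2 * per_group)   # about 291 per group, roughly 580 total
```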
Why effect size matters so much
Here’s where it gets interesting. Small changes in your expected effect size create huge changes in required sample size.
Let’s look at the same nutrition program but with different expected impacts:
| Expected Impact | Control Rate | Treatment Rate | Sample Needed Per Group | Total Sample |
|---|---|---|---|---|
| Small (5 points) | 20% | 25% | 1,167 | 2,334 |
| Medium (10 points) | 20% | 30% | 294 | 588 |
| Large (20 points) | 20% | 40% | 74 | 148 |
See the pattern? Detecting small changes requires massive samples. If you want to detect a 5 percentage point change instead of a 10 point change, you need four times as many participants.
This is why you need to be realistic about your minimum detectable effect. Don’t set it at 3% just because that would be nice to detect. Set it at the smallest change that would actually matter for program decisions.
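The same formula makes the pattern easy to reproduce. The numbers below differ a little from the table above because calculators use slightly different formula variants (some add a continuity correction), but the roughly four-fold jump between a 10 point and a 5 point effect is the same:

```python
from math import ceil
from scipy.stats import norm

z = norm.ppf(0.975) + norm.ppf(0.80)   # 95% confidence plus 80% power
p_control = 0.20
for p_treat in (0.25, 0.30, 0.40):     # small, medium, large expected impact
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    n = ceil(z ** 2 * variance / (p_treat - p_control) ** 2)
    print(f"{p_control:.0%} -> {p_treat:.0%}: about {n} per group")
```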
The 1:1 ratio rule
Always split your sample evenly between treatment and control groups. If you need 300 people total, that’s 150 in each group. Uneven splits reduce your statistical power and waste resources.
The reality check: When your actual situation is messier
The calculations I showed you assume simple random sampling – you randomly pick individuals from your population. But real M&E work is rarely that simple. Here are the complications that inflate your sample size.
Cluster sampling: The hidden sample size killer
Let’s say you’re evaluating a school program. It’s way easier to randomly select 20 schools and survey all students in those schools than to randomly select individual students across 100 schools.
That’s cluster sampling. It saves you time and money on fieldwork, but it costs you statistically.
Why? Students within the same school are more similar to each other than students from different schools. They share teachers, resources, and context. This similarity means you’re not getting as much unique information as if you’d randomly sampled individual students.
To compensate, you need to inflate your sample size using something called the design effect. A design effect of 2.0 means you need twice as many participants. Design effects for cluster sampling typically range from 1.5 to 3.0, depending on how similar people within clusters are.
Practical example: Your calculation says you need 400 students. But you’re sampling schools (clusters), not individual students. With a design effect of 2.0, you actually need 800 students.
If you’re doing cluster sampling, get help from a statistician to estimate your design effect. It’s worth the consultation fee to get this right.
Dropout and non-response
People drop out of studies. They move away, lose interest, or can’t be reached for follow-up surveys. This is especially common in impact evaluations that require surveying people multiple times over months or years.
The fix is simple: inflate your sample size to account for expected losses.
Formula: Take your calculated sample size and divide it by (1 – expected dropout rate)
Example: You need 300 people for your endline survey. You expect 20% will drop out or be lost to follow-up.
300 ÷ (1 – 0.20) = 300 ÷ 0.80 = 375
So you need to recruit 375 people at baseline to end up with 300 at endline.
Where to get dropout estimates? Look at similar studies in your context. If you’re doing something new, budget for 15-20% dropout as a conservative estimate.
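If your design has both complications, the adjustments stack on top of your base calculation. Here's a minimal sketch using the illustrative figures from the examples above (a design effect of 2.0 and 20% dropout); your own values will differ:

```python
from math import ceil

def adjust_for_design(n, design_effect):
    """Inflate a simple-random-sample size to account for cluster sampling."""
    return ceil(n * design_effect)

def adjust_for_dropout(n, dropout_rate):
    """Inflate for expected attrition so you still end up with n at endline."""
    return ceil(n / (1 - dropout_rate))

base = 400                                       # sample size from the basic calculation
clustered = adjust_for_design(base, 2.0)         # design effect of 2.0 -> 800
recruited = adjust_for_dropout(clustered, 0.20)  # 20% expected dropout -> 1,000 at baseline
print(base, clustered, recruited)
```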
Multiple outcomes
Many programs track several indicators. Maybe you’re measuring both household income AND children’s school attendance.
If your success criteria require all outcomes to show significant results (an “and” condition), calculate for each outcome separately and use the largest sample size. This ensures you’re adequately powered for your toughest outcome.
If you’re claiming success if any outcome shows results (an “or” condition), you need to adjust your statistical tests to avoid false positives. This can require larger samples. Talk to a statistician about this one.
What about qualitative research?
Qualitative research follows completely different rules. You’re not trying to measure prevalence or prove causation – you’re trying to understand processes, experiences, and meaning.
For interviews and focus groups, sample size is about reaching saturation – the point where you stop hearing new information. Once three interviews in a row tell you the same themes with no new insights, you’ve probably hit saturation.
Typical sample sizes for qualitative M&E:
- In-depth interviews: 8-15 participants
- Focus groups: 4-6 groups of 6-10 people each
- Key informant interviews: 5-12 informants
But these are just guidelines. What really matters is whether you’ve heard enough to answer your questions.
I once did a study on why farmers weren’t adopting a new technique. After 10 interviews, the same three barriers kept coming up (lack of credit, fear of risk, peer pressure). By interview 12, I wasn’t learning anything new. That’s saturation.
Purposeful sampling is not cheating
Unlike quantitative studies that need random sampling, qualitative studies deliberately select participants based on specific criteria. This is called purposeful sampling and it’s methodologically sound.
You might specifically seek out:
- Early adopters and resisters of your program
- Different age groups or genders
- Participants from high-performing and low-performing communities
- Program staff at different levels
The goal is to capture diverse perspectives that illuminate your research questions, not to be statistically representative.
One warning: Skip convenience sampling
Convenience sampling means just grabbing whoever is easiest to reach – friends, colleagues, people hanging around the office. This rarely produces useful insights and makes your findings hard to defend. Always have clear selection criteria for your qualitative participants.
Seven practical tips from the field
1. Calculate sample size before budgeting, not after
I can’t tell you how many times I’ve seen this: Someone writes a budget for 200 surveys, then does the sample size calculation and discovers they need 400. Now they’re either underpowered or scrambling to find more money.
Do your sample size calculation first. Then build your budget around it.
2. Use online calculators, not rules of thumb
Never just sample “10% of the population” or “30 people per cluster.” These rules ignore critical factors like your desired precision and expected effect size.
Free calculators I recommend:
- Raosoft: Simple interface for descriptive studies
- Sealed Envelope: Good for impact evaluations
- G*Power: Free software with lots of options (a bit technical)
3. Document everything
Write down every assumption you made:
- Where you got your baseline proportion
- Why you chose your minimum detectable effect
- Your expected dropout rate and its source
- Your design effect estimate
Your donor or ethics board will ask. Future you will thank present you.
4. Link your MDE to program theory
Don’t choose your minimum detectable effect based on what’s statistically convenient. Choose it based on what matters for program decisions.
Ask yourself: “What’s the smallest improvement that would convince us this program is worth scaling up?” That’s your MDE.
5. When in doubt, get help
If your study involves cluster sampling, multiple stages, or complex statistical methods, pay for a statistician consultation. A few hundred dollars for expert advice beats a failed evaluation that cost thousands.
6. Build in a buffer
Your calculated sample size is the minimum you need if everything goes perfectly. Things rarely go perfectly. Budget for 10-20% more than your calculation to account for messy reality.
7. Consider sequential sampling
If you’re genuinely unsure about your baseline proportion or expected effect, consider starting with a smaller pilot sample. Analyze those results to refine your assumptions, then calculate how many additional participants you need.
This adaptive approach costs more in time but can save money if your initial assumptions were way off.
FAQs
What if I can’t afford the calculated sample size?
This happens all the time. You have three options:
- Find more money. Sometimes you need to make the case to your donor that adequate M&E requires adequate resources.
- Increase your minimum detectable effect. Accept that you’ll only be able to detect larger impacts. If you needed 600 people to detect a 10% change, maybe 200 people can detect a 20% change.
- Reduce your confidence or power slightly. Going from 90% power to 80% power reduces your sample size. Just don’t go below 80% power or you’re gambling too much.
What you shouldn’t do: Collect an inadequate sample anyway and hope for the best. That’s worse than doing no evaluation at all because you’re spending money on findings that aren’t credible.
Do I need different calculations for each outcome?
Yes. Calculate for your primary outcome (the main thing you care about). If you’re tracking multiple primary outcomes, calculate for each and use the largest sample size. This ensures you’re powered for your most demanding outcome.
Can I just sample 10% of my population?
No. This “10% rule” ignores precision and power completely.
A village of 1,000 people needs about 278 respondents for ±5% precision (that’s 28%). A city of 100,000 people needs about 383 respondents (that’s 0.4%). See how the percentage changes wildly?
Sample size is about precision and power, not population percentages.
What if my population is really small (under 100)?
If your population is tiny, you might need to survey almost everyone to get adequate precision. For example, a population of 50 needs a sample of 44 for ±5% margin of error. At that point, it’s often easier to just do a census and survey everyone.
How do I handle high dropout rates?
Inflate your baseline sample by dividing your calculated sample by (1 – dropout rate).
If you need 400 people for your endline and expect 30% dropout: 400 ÷ 0.70 = 571 people at baseline.
Estimate dropout based on similar studies in your context. If you’re unsure, 20% is a reasonable conservative estimate for most development contexts.
What if I’m comparing more than two groups?
Let’s say you have three treatment arms plus a control group. The calculations get more complex and you’ll need specialized software. Use G*Power (free) or consult a statistician. As a rough rule, you’ll need more total participants than a simple two-group comparison.
How precise is “precise enough” for my margin of error?
It depends on how the results will be used:
- ±3%: For large national surveys or when very precise estimates matter for policy
- ±5%: Standard for most M&E work, good for program decisions
- ±7-10%: Acceptable for pilots, exploratory studies, or when budget is very limited
Remember, cutting your margin of error in half roughly quadruples your sample size. Going from ±10% to ±5% means needing about 4x as many respondents.
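You can see the quadrupling for yourself: the required sample scales with one over the margin of error squared. A quick sketch, assuming 95% confidence, a 50% baseline proportion, and a large population:

```python
for margin in (0.10, 0.05, 0.03):
    n = 1.96 ** 2 * 0.25 / margin ** 2   # 95% confidence, p = 0.5, large population
    print(f"+/-{margin:.0%}: about {round(n)} respondents")
```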
Should I power for the smallest possible effect?
No. Power for the smallest effect that would be programmatically meaningful. If your handwashing program improves rates by 3%, is that worth the cost of scaling up? If not, don’t power your study to detect a 3% change.
Be honest about what magnitude of impact would actually change your program decisions. That’s your minimum detectable effect.
The bottom line
Getting sample size right doesn’t require a statistics degree, but it does require thinking carefully about what you’re trying to learn and being honest about your constraints.
Start with clarity about your research question. Are you describing the current situation or trying to prove your program caused a change? That determines your approach.
Use a calculator, not guesswork. Document your assumptions. And remember that sample size is ultimately about making good decisions with limited resources.
The goal isn’t perfection. The goal is generating findings that are credible enough to guide your program forward while staying within the bounds of your budget and respecting participants’ time.
When you get that balance right, you’ve done good M&E work.

