Medical Disclaimer: This website does not provide medical advice. Content is for informational and educational purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider before starting any supplement. Read full disclaimer →

How to Read a Clinical Trial About Ayurvedic Herbs

The vocabulary, framework, and red flags you need to evaluate any herb claim — RCT, placebo, p-value, sample size, effect size, journal quality. Once you read this, every other article on HerbVerdict makes more sense.

Why this article exists. You see "clinically proven" on every Ayurvedic supplement label. Most consumers have no framework for evaluating whether that phrase means anything. This article gives you the framework. It uses real trials from our Ashwagandha, Boswellia, and Curcumin scorecards as concrete examples — so you can apply the framework immediately to anything else you read.

What this article covers

By the end of this guide you will be able to:

- Read a study abstract and identify the trial design, sample size, duration, and primary outcome.
- Recognise common methodological weaknesses that should reduce your confidence in a finding.
- Distinguish between in-vitro, animal, and human evidence.
- Understand what "p < 0.05" actually means and what it doesn't.
- Find a study yourself on PubMed and check it against marketing claims.
- Know the difference between "statistically significant" and "clinically meaningful."

This is the foundation that makes the rest of HerbVerdict's evidence framework usable.

The hierarchy of evidence — why some studies count more than others

Not all studies are equal. The clinical-research community has spent decades developing a hierarchy that ranks evidence types by how much weight they should carry in decisions.

Hierarchy of evidence (strongest at top):

1. Meta-analysis: pooled RCT data
2. Systematic review: comprehensive review of RCTs
3. Randomised controlled trial (RCT): gold-standard single trial
4. Cohort study: observational, no randomisation
5. Case-control study: retrospective comparison
6. Animal studies: mice, rats; not human
7. In-vitro / cell studies: lab petri dish; earliest stage

The pyramid is the single most important concept in this article. When a brand cites "studies show," your first question should be: what level of the pyramid is the study? An in-vitro finding (bottom) is interesting biology but not clinical proof. A meta-analysis of multiple RCTs (top) is the strongest evidence we typically have outside of regulatory drug-approval data.

Most "studies show" marketing claims for Ayurvedic supplements rely on in-vitro or animal evidence. Reading the actual cited paper usually reveals this gap.

What is a clinical trial?

A clinical trial is a structured experiment that tests a specific intervention (a herb, drug, treatment, or procedure) in human participants under controlled conditions, with predefined outcomes measured before and after the intervention.

Clinical trials are typically described in phases. Phase 1 establishes safety and dosing in small healthy volunteer groups (typically 20-80 people). Phase 2 evaluates efficacy and side effects in patients with the target condition (100-300 people). Phase 3 confirms efficacy and monitors adverse reactions in larger populations (1,000-3,000 people). Phase 4 covers post-marketing surveillance after a treatment is approved.

For Ayurvedic herbs, most clinical research lives in Phase 1 or Phase 2 territory. Few Ayurvedic herbs have completed Phase 3 trials, and even fewer have Phase 4 data. This is one of the reasons HerbVerdict's verdicts are calibrated more cautiously than pharmaceutical drug evidence — the trial base is generally smaller and earlier-stage.

RCT explained — the gold standard

A Randomised Controlled Trial (RCT) is the methodologically strongest design for testing whether an intervention causes an outcome. Three features make it the gold standard.

Randomisation. Participants are randomly assigned to either the intervention group (gets the herb) or the control group (gets placebo or standard care). Random assignment ensures the two groups are similar at baseline on average: same age distribution, same gender mix, same baseline severity of whatever condition you're studying. Without randomisation, you can't tell whether differences in outcomes come from the intervention or from differences between the groups.

Control group. A comparison group that gets either no intervention, a placebo (inactive substance), or standard care. The control group lets you isolate the intervention's effect from natural disease course, time-related improvement, and placebo response.

Outcome measurement. Predefined endpoints measured before and after the intervention using standardised, reproducible methods. The outcome should be specified before the trial begins, not selected after results are known.

A well-designed RCT can answer the question "does this intervention cause this outcome?" with reasonable confidence. A poorly-designed RCT — small sample, short duration, no real control, multiple ad-hoc outcome changes — answers nothing useful even if it gets published.

Placebo and blinding

This is the section that confuses readers most because the concepts are subtle.

Placebo is an inactive substance (sugar pill, capsule with cellulose filler, water injection) that looks identical to the real intervention but has no active pharmacology. Placebo groups exist because human participants in trials often improve regardless of what they take, due to natural disease course, attention from medical staff, or psychological expectations. Without a placebo control, you can't tell whether the herb's effect is the herb or the placebo response.

Blinding means that participants and/or researchers don't know who is in which group during the trial. Single-blind means participants don't know but researchers do. Double-blind means neither participants nor researchers know; only an independent statistician maintains the assignment code. Double-blind is stronger because it eliminates both participant expectation effects and researcher bias in measurement.

For Ayurvedic herb trials, blinding is sometimes difficult — many herbs have distinctive smells, tastes, or appearances that make them hard to disguise. The Ashwagandha studies we cite often note this limitation: participants may have correctly guessed their group assignment from the herb's earthy smell.

Quick rule. Double-blind RCT > Single-blind RCT > Open-label RCT > Observational study. When evaluating a herb claim, ask: was it double-blind? Most strong trials are. Many Ayurvedic supplement marketing claims rely on weaker designs.

Sample size — why n matters

The "n" in a study (e.g., n=60) is the number of participants. Sample size matters because small studies produce noisy results that may or may not replicate.

A 30-person trial showing a 20% improvement could be a real effect or random variation. A 300-person trial showing the same 20% improvement is much more likely to reflect a real effect. Statisticians have formal tools (statistical power calculations) to estimate the sample size needed to detect a given effect with a given confidence.

For Ayurvedic herb research, sample sizes vary widely:

- Small trials (n < 30): Common in older Ayurvedic literature. Findings should be treated as preliminary.
- Medium trials (n = 30-100): The most common range for modern Ayurvedic RCTs. Findings are interpretable but warrant replication.
- Large trials (n > 100): Less common for Ayurvedic herbs. Findings carry more weight.
- Multi-centre trials with n > 200: The strongest evidence type. Rare for Ayurvedic single-herb research.

When a brand cites "research shows" with an n=15 trial, that's preliminary evidence at best. When the cited evidence is a meta-analysis pooling multiple RCTs to n=400+, that's much stronger ground.
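The noise argument above can be made concrete with a short simulation. This is an illustrative sketch only: it assumes a true average improvement of 20% with large individual variation, not data from any real trial, and shows how much more an n=30 estimate bounces around than an n=300 estimate.

```python
import random
import statistics

def simulated_trial(n, true_effect=0.20, sd=0.50, rng=None):
    """Simulate one trial: each participant's improvement is drawn from a
    normal distribution whose true mean is 20%, with wide individual
    variation. Returns the trial's observed mean improvement."""
    rng = rng or random.Random()
    return statistics.mean(rng.gauss(true_effect, sd) for _ in range(n))

rng = random.Random(42)
# Run 1,000 hypothetical trials at each size and compare how far the
# observed effect strays from the true 20%.
small = [simulated_trial(30, rng=rng) for _ in range(1000)]
large = [simulated_trial(300, rng=rng) for _ in range(1000)]

print(f"n=30 trials:  observed effects span {min(small):.2f} to {max(small):.2f}")
print(f"n=300 trials: observed effects span {min(large):.2f} to {max(large):.2f}")
print(f"spread (std dev) at n=30:  {statistics.stdev(small):.3f}")
print(f"spread (std dev) at n=300: {statistics.stdev(large):.3f}")
```

The smaller trials regularly produce observed effects far from the true 20%, which is exactly why a single small positive trial can be random variation rather than a real finding.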

P-values — what they mean and don't mean

This is the section where most public-health communication fails. Let me explain it clearly.

A p-value is a probability calculated by a statistical test, expressing how likely you would be to observe a difference at least as large as the one your trial found if there were actually no real difference between the groups.

When a paper reports "p < 0.05," it means: if there were truly no effect, the probability of seeing a result at least this extreme by chance alone is less than 5%. The convention is to call results "statistically significant" if p < 0.05.

What p < 0.05 does NOT mean:

- It does not mean "the herb works." It means "the result is unlikely to be pure chance."
- It does not mean "5% probability of being wrong." That's a common misinterpretation.
- It does not measure the size of the effect, only its statistical reliability.
- It does not mean "clinically meaningful." A tiny statistically significant effect may be clinically irrelevant.

A more useful question than "is p < 0.05?" is "what is the effect size?" — i.e., how big is the actual change, and is it big enough to matter to a patient?

Plain-English example. A trial finds Ashwagandha reduced cortisol by 27.9% vs 7.9% on placebo (p < 0.0001). The p-value tells you the difference is statistically reliable. The 20-percentage-point effect size tells you it's a meaningful difference, not just a tiny statistical blip.
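To see how the p-value and the effect size are separate quantities, here is a minimal Python sketch using a two-sample z-test. The numbers echo the cortisol example in spirit only: the group standard deviations and per-arm sample sizes are assumptions for illustration, not the actual trial's data, and real papers typically use t-tests rather than this normal approximation.

```python
import math
from statistics import NormalDist

def two_sample_p(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided p-value for a difference in group means, using a normal
    approximation (adequate for illustration at these sample sizes)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical figures: herb group's cortisol drops 27.9%, placebo's
# drops 7.9%, both with an assumed SD of 15 percentage points and 30
# participants per arm.
p = two_sample_p(27.9, 15, 30, 7.9, 15, 30)
effect = 27.9 - 7.9  # the effect size: a 20-percentage-point difference

print(f"effect size = {effect:.1f} percentage points, p = {p:.2e}")
```

The p-value answers "is this difference statistically reliable?"; the 20-point difference answers "is it big enough to matter?". Both come out of the same summary numbers, but they are different questions.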

Effect size — what changes by how much

Effect size measures the magnitude of an intervention's effect on the outcome, separate from statistical significance.

Common effect size measures include:

- Mean difference: The raw change (e.g., LDL cholesterol reduced by 15 mg/dL)
- Standardised mean difference (SMD): A normalised measure used in meta-analyses
- Relative risk reduction: Useful for binary outcomes (e.g., 30% fewer infections)

A statistically significant result with tiny effect size may be clinically uninteresting. A barely-not-significant result with large effect size may still warrant follow-up research. Both questions matter.

When reading a trial summary, look for both: "is the result statistically significant?" AND "is the effect size big enough to matter?"
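As a worked illustration, the standardised mean difference (Cohen's d) can be computed from summary statistics alone. The LDL numbers below are hypothetical, chosen to show why the same raw change can be meaningful or trivial depending on how much people naturally vary.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardised mean difference (Cohen's d) using a pooled SD,
    the normalised effect-size measure meta-analyses typically report."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Same raw difference (15 mg/dL LDL reduction, 50 per arm), two
# hypothetical populations with different natural variability:
medium = cohens_d(15, 30, 50, 0, 30, 50)    # SD 30  -> d = 0.5 ("medium")
tiny = cohens_d(15, 150, 50, 0, 150, 50)    # SD 150 -> d = 0.1 (trivial)

print(f"d = {medium:.2f} vs d = {tiny:.2f} for the same 15 mg/dL change")
```

This is why meta-analyses normalise by the pooled SD: it puts different trials' outcomes on one scale and reveals whether a raw change is large relative to ordinary person-to-person variation.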

Limitations to watch for

This is the checklist I run through for every Ayurvedic herb trial I evaluate.

Sample size too small. n < 30 is preliminary. n = 30-100 is interpretable but needs replication. n > 100 is more robust.
Trial duration too short. Cognitive trials typically need 8-12 weeks. Cortisol trials at least 8 weeks. Cardiovascular trials longer.
Single-centre design. Multi-centre is stronger because it tests whether findings generalise across different clinical contexts.
Industry funding. Industry-funded trials are not automatically wrong but warrant closer scrutiny of methodology and outcome selection.
Open-label design. Lack of blinding allows expectancy effects to inflate results, particularly for subjective outcomes.
Wrong dose form. Trial uses standardised extract; the retail product uses whole-plant powder. Effects don't necessarily transfer.
Surrogate endpoints. Trial measures biomarkers (e.g., cortisol levels) rather than clinical outcomes (e.g., reduced absenteeism due to stress). Both are useful, but a biomarker change does not guarantee clinical benefit.
Multiple outcome testing. Trials measuring 20 outcomes will find some "significant" by chance. Pre-registered primary endpoints are stronger than post-hoc analysis.
No replication. A single study showing an effect is interesting. Multiple independent replications make it credible.

A trial can be high-quality on most of these dimensions and weak on one or two. Perfect trials are rare. The skill is weighing the limitations against the strength of the finding.
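The multiple-outcome-testing problem in the checklist above is easy to demonstrate by simulation: when an intervention truly does nothing, each outcome's p-value is effectively a random draw from 0 to 1, so a trial measuring 20 outcomes will very often report at least one "significant" result by chance alone. A minimal sketch:

```python
import random

def chance_significant_rate(n_outcomes=20, alpha=0.05, n_sims=2000, seed=7):
    """Simulate trials of a completely inert intervention. Under the null,
    each outcome's p-value is uniform on [0, 1]; count how often at least
    one of the outcomes still clears the p < 0.05 bar by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = [rng.random() for _ in range(n_outcomes)]
        if any(p < alpha for p in p_values):
            hits += 1
    return hits / n_sims

rate = chance_significant_rate()
print(f"share of no-effect trials with a 'significant' outcome: {rate:.0%}")
```

The theoretical rate is 1 − 0.95²⁰, roughly 64%: nearly two out of three trials of a useless intervention would still have something "significant" to report. This is why a pre-registered primary endpoint carries so much more weight than a post-hoc dig through many outcomes.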

How to find and check a study yourself

This is the practical skill that makes all of the above usable.

Step 1 — Search PubMed. Go to [pubmed.ncbi.nlm.nih.gov](https://pubmed.ncbi.nlm.nih.gov/). Type the herb name plus the outcome. For example: `ashwagandha cortisol RCT`. You can refine with filters: clinical trials only, last 5 years only, full-text available.

Step 2 — Read the abstract. The abstract gives you trial design, n, duration, dose, primary outcome, and key result. This alone tells you whether the trial is worth deeper engagement.

Step 3 — Note the journal. Indexed peer-reviewed journals (those that appear in PubMed) are generally more reliable than predatory or non-indexed journals. Big-name journals (NEJM, JAMA, Lancet) carry strong editorial scrutiny; specialty journals vary.

Step 4 — Find the limitations section. Honest trials state their own limitations. Marketing summaries usually omit them. Read the limitations section in the original paper rather than relying on press releases.

Step 5 — Check funding source. Disclosure of funding is required in reputable journals. Note whether the trial was industry-funded or independent.

Step 6 — Look for replication. Has anyone independently replicated this finding? Single positive studies are interesting; replicated findings are credible.

This is more work than most consumers will do for any individual product. But for a herb you're considering taking long-term, this 20-minute exercise is a defensible time investment.
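For readers who want to script Step 1, PubMed searches can also be run programmatically through NCBI's E-utilities API. The sketch below only builds the query URL rather than fetching results; the `clinical trial[pt]` publication-type filter is standard PubMed search syntax, but treat the exact parameters as assumptions and verify them against NCBI's E-utilities documentation before relying on them.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint that backs programmatic PubMed search.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(herb, outcome, trials_only=True):
    """Build an esearch URL for a herb + outcome query, optionally
    narrowed to clinical trials via PubMed's publication-type tag."""
    term = f"{herb} {outcome}"
    if trials_only:
        term += " AND clinical trial[pt]"
    return f"{BASE}?{urlencode({'db': 'pubmed', 'term': term, 'retmode': 'json'})}"

url = pubmed_search_url("ashwagandha", "cortisol")
print(url)
```

Pasting the printed URL into a browser returns a JSON list of matching PubMed IDs, which you can then look up by hand exactly as in Steps 2 through 6.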

How HerbVerdict's verdict system maps to all this

We use three verdict tiers across our scorecards:

PROVEN — Multiple replicated RCTs at meaningful sample size (typically n > 100 cumulative across trials), with consistent effects in the same direction, plus at least one meta-analysis or systematic review supporting the finding. Examples in our library: Boswellia for knee osteoarthritis pain, Curcumin for knee osteoarthritis pain.

PROMISING — At least one well-designed RCT with positive findings, plus supportive but weaker replication, plus a plausible biological mechanism. Most outcomes for Ashwagandha, Brahmi, Tulsi, Triphala, and Amla sit at this tier. Reasonable evidence, not yet definitive.

LIMITED — Few RCTs, small samples, weak replication, or significant safety concerns. Shilajit and Giloy currently sit at this tier in our scorecards.

The framework forces us to be specific about what evidence supports each verdict, rather than relying on general impressions.

What "clinically proven" usually means on a label (and what it should mean)

When you see "clinically proven" on an Ayurvedic supplement, here is the spectrum of what it could actually mean:

1. The actual specific product has been tested in a published RCT (rare).
2. The herb in general has been tested in some RCTs (common).
3. An ingredient in the product has been tested in some preclinical studies (very common).
4. Someone wrote a marketing brochure saying it works (most common).

The first interpretation is the only one that justifies "clinically proven" as a claim. The other three should not be marketed that way, but often are. A more sophisticated consumer asks: "what specific product was tested in what specific trial?" If a brand can answer with citations, the claim is defensible. If not, treat it as marketing.

Common Ayurveda research red flags — a checklist for marketing claims

This is where the framework actually becomes useful for everyday consumer decisions.

When you see "clinically proven" or "research shows" on an Ayurvedic supplement, run through this checklist:

- Does the marketing cite a specific study or just say "studies show"? Specific citation is much stronger than vague reference. If a brand cites a study, you can look it up. If they don't, you can't.
- Was the study on the specific product or on the underlying herb? Brands often cite herb-level evidence to support specific-product claims. The two are different. KSM-66 ashwagandha has 15+ RCTs; a generic ashwagandha capsule from Brand X has zero.
- What was the trial design? RCT is strongest. Open-label or single-arm trials are weaker. In-vitro and animal studies are not clinical evidence.
- What was the sample size and duration? n > 100 is meaningfully better than n < 30. 12+ weeks is meaningfully better than 4 weeks for most outcomes.
- What was the outcome measured? Biomarker change is weaker than clinical outcome. Self-reported subjective improvement is weaker than objective measurement.
- Was the study replicated? A single positive study is interesting; multiple independent replications make it credible.
- Who funded the study? Industry funding warrants closer methodological scrutiny.

A reader who runs this checklist on every supplement marketing claim will quickly find that most claims fail multiple items. That's not a paranoia exercise — it's just calibrated consumer literacy.

What "evidence-based supplementation" actually looks like

The phrase "evidence-based" gets used loosely in consumer wellness marketing. Here is what it actually means in practice.

Evidence-based supplementation means: choosing supplements where the specific intervention you are taking matches what was tested in published clinical trials at meaningful sample size, with replicated findings, in populations similar to yours, for outcomes relevant to your goals.

This is a narrow standard. Most consumer supplement use does not meet it. That's not necessarily a problem — many people take supplements for traditional, cultural, or general wellness reasons that don't require clinical-trial-level evidence justification.

But "evidence-based" should mean something specific. When a brand uses the phrase, they should be able to point to specific clinical trial evidence that matches their specific product. Most can't. A more honest version of supplement marketing would distinguish between "this herb has a research base" and "this specific product has trial-level evidence" — they are different claims.

Frequently asked questions

What is an RCT?

A Randomised Controlled Trial — a clinical study where participants are randomly assigned to either the intervention group (gets the herb) or a control group (gets placebo or standard care), with predefined outcomes measured before and after. RCTs are the gold standard for testing whether an intervention causes an outcome.

What does p < 0.05 mean?

It means: if there were truly no effect, the probability of seeing a result at least as extreme as this one by chance alone is less than 5%. It is the convention for calling a result "statistically significant." It does NOT mean "the herb works for sure" or "5% probability of being wrong" — those are common misinterpretations.

How big does a trial need to be to count?

n < 30 is preliminary. n = 30-100 is interpretable but needs replication. n > 100 is more robust. Multi-centre trials with n > 200 are the strongest single-trial evidence. Meta-analyses pooling multiple RCTs to combined n > 400 are stronger still.

Why does double-blind matter?

Blinding prevents both participant expectation effects (placebo response) and researcher bias in measuring outcomes. Double-blind (neither participants nor researchers know assignment) is stronger than single-blind (only participants don't know). Open-label (everyone knows) is weakest.

What's the difference between in-vitro, animal, and human evidence?

In-vitro = lab petri dish or cell culture studies. Animal = mice, rats, dogs. Human = actual human trials. The hierarchy of evidence prioritises human RCTs. In-vitro and animal findings are useful for understanding mechanisms but don't reliably translate to human clinical effects.

How do I check a study myself?

Go to pubmed.ncbi.nlm.nih.gov, search for the herb plus the claimed outcome, read the abstract, note the journal, find the limitations section, and check funding source. This 20-minute exercise lets you evaluate any specific study claim.
