How to Select a Keeper Study: A Step-by-Step Guide for Systematic and Literature Reviews
A practical walkthrough of the keeper study selection process — how to define eligibility criteria using PICO, apply a two-phase screening protocol, conduct quality appraisal, and document every decision in a way that survives peer scrutiny and journal review.
A keeper study is a source that passes your eligibility criteria and is retained for full inclusion in your systematic review, scoping review, or structured literature review. The selection process — deciding which studies are keepers and which are excluded — is the methodological step most likely to compromise your review if handled inconsistently. Instructors and journal reviewers can accept a narrow topic or a small final sample. What they cannot accept is a selection process that cannot be reconstructed, that applied different standards to different studies, or that confuses topic relevance with methodological eligibility. This guide explains how to structure the keeper study selection process so that your decisions are defensible at every stage.
What This Guide Covers
This guide covers selection methodology — the process, the criteria, the tools, and the documentation. It does not conduct the selection for you, and it does not determine which specific studies belong in your review. Those decisions depend on your research question, your field’s quality standards, and the database results your search produces. What this guide gives you is the framework for making those decisions correctly.
What a Keeper Study Is (and Is Not)
A keeper study is any study that meets all of your pre-defined inclusion criteria and none of your exclusion criteria. The term is used in educational research, social science, and some health science contexts to describe the final retained set of sources after a systematic screening process. In clinical research, the same concept appears under labels like “eligible study,” “included study,” or “retained record.” The terminology varies by field, but the concept is identical: a study you keep is one that answers your research question, meets your methodological standards, and falls within your scope boundaries.
What a keeper study is not is a study you like, a study from an author whose work you recognize, or a study that confirms your hypothesis. Selection bias — the systematic tendency to include studies that support a particular direction — is one of the most serious methodological threats to any review. A correctly constructed keeper selection process is specifically designed to prevent this. Your criteria are set before screening begins, applied uniformly across every record, and documented transparently enough that another researcher could reproduce your decisions.
The keeper study concept applies to systematic reviews, scoping reviews, integrative reviews, and structured narrative reviews. It does not apply to traditional non-systematic literature reviews, where source selection is not required to follow a documented, reproducible protocol. If your assignment or thesis requires a systematic or structured review — even at the student level — the keeper selection process described in this guide applies. If your assignment is a general literature review without a documented search and screening methodology, your institution may use different standards. Confirm with your supervisor which type of review you are conducting before designing your selection criteria.
Before You Screen: Building Your Criteria
The most important rule in keeper study selection is one that most students violate: you must define your eligibility criteria before you run your search and before you see any results. Criteria built after you have seen which studies exist are post-hoc criteria — they are shaped, consciously or not, by what you found rather than by what your research question requires. Post-hoc criteria produce biased selections and cannot be defended in a methods section.
Criteria development is a planning step. It happens at the same time you develop your research question and design your search strategy — not after. For most student reviews, this means your criteria should be committed to paper (or a pre-registration document) before you log into any database. Everything that follows — the screening, the full-text review, the quality appraisal — applies those criteria mechanically to whatever the search returns.
Start with Your Research Question
Your eligibility criteria are a direct translation of your research question into screening rules. Every element of your research question — population, intervention, comparison, outcome — becomes a category of criteria. If you cannot connect a criterion directly to your research question, question whether it belongs.
Review Field-Specific Standards
Different fields apply different methodological standards to what counts as acceptable evidence. Health sciences reviews may require randomized controlled trial designs. Social science reviews may accept mixed-methods or qualitative designs. Know your field’s evidence hierarchy before writing your inclusion criteria for study design.
Set Scope Boundaries
Date range, language, publication type, geographic scope, and setting are all scope decisions that belong in your criteria before screening. These are not arbitrary — each one should be justifiable in your methods section. “English-language only” is acceptable if you explain why; unexplained language restrictions are a methodological weakness.
Using PICO/PICOS to Define Eligibility
PICO is the most widely used framework for converting a research question into eligibility criteria in health sciences and increasingly in social sciences and education. It stands for Population, Intervention, Comparison, and Outcome. An extended version — PICOS — adds Study design as a fifth element. Each letter of the acronym becomes a category of your inclusion and exclusion criteria.
Population
Who are the participants or subjects? Age range, diagnosis, demographic group, setting, or professional role. Your P criterion determines which studies have the right sample — everything else about the study is irrelevant if the population does not match.
Intervention
What exposure, program, treatment, or phenomenon is being examined? Define this precisely — a study of “mindfulness-based stress reduction” is different from one of “mindfulness practices” broadly. Vague intervention criteria produce a heterogeneous sample of keeper studies that cannot be meaningfully synthesized.
Comparison
What is the intervention compared against — a control group, an alternative treatment, a pre-intervention baseline, or nothing? Not all reviews require a comparison; qualitative and descriptive reviews often have no C element. If your research question does not involve comparison, this criterion may not apply.
Outcome
What results or effects are you interested in? Define your outcomes specifically — “academic achievement” is too broad if you need “standardized reading scores at third-grade level.” Studies that measure related but different outcomes may not be keepers unless your criteria explicitly include them.
Adding S: Study Design
PICOS adds a study design component to the framework. This is the criterion that determines which research designs are eligible — RCTs only, RCTs plus quasi-experimental, any quantitative design, mixed methods included, qualitative included. In student reviews, study design is often the most debated criterion because it determines how much evidence you will find. Narrowing to RCTs only is methodologically rigorous but may leave you with very few keeper studies in fields where RCTs are rare. Your study design criterion should be calibrated to your research question and justified in your methods section.
PICO gives you categories for thinking about eligibility — it does not write your criteria for you. Within each PICO element, you still need to make specific decisions: exactly which populations, which intervention variants, which outcomes, which designs. A PICO table with only one-word answers in each cell (“adults / CBT / usual care / anxiety”) is not a usable criterion set. Each cell needs enough specificity that a second researcher applying your criteria to the same study would reach the same inclusion or exclusion decision independently.
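One way to see the difference between a label and a usable criterion is to write the PICOS table as a data structure. This is an illustrative sketch only — the class and the criterion wording are invented, not a standard screening format:

```python
from dataclasses import dataclass

# Illustrative sketch: contrasts a one-word PICOS cell with an
# operational criterion. Class and wording are invented.

@dataclass
class PICOSCriteria:
    population: str
    intervention: str
    comparison: str
    outcome: str
    study_design: str

# A table of one-word cells -- categories, not criteria:
vague = PICOSCriteria("adults", "CBT", "usual care", "anxiety", "RCT")

# The same cells with enough specificity that two screeners applying
# them independently would reach the same decision:
specific = PICOSCriteria(
    population="Adults aged 18+ with a clinician-confirmed anxiety disorder",
    intervention="Individual or group CBT delivered by a licensed therapist, minimum 6 sessions",
    comparison="Treatment as usual or wait-list control",
    outcome="Score on a validated anxiety scale (e.g., GAD-7) at post-treatment",
    study_design="Randomized controlled trial, individually or cluster randomized",
)
```

The vague version cannot resolve a borderline case; the specific version can, which is the entire test of a criterion set.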
Writing Inclusion and Exclusion Criteria
Inclusion criteria define what a study must have to be a keeper. Exclusion criteria define what disqualifies a study that would otherwise pass. Both are required — they are not two ways of saying the same thing. A study can technically meet all inclusion criteria and still be excluded for a reason that is easier to state as an exclusion rule.
| Criterion Category | Inclusion Example | Exclusion Example |
|---|---|---|
| Population | Studies involving adult participants (18+) diagnosed with Type 2 diabetes | Studies conducted exclusively with children or adolescents under 18 |
| Intervention | Studies examining structured dietary interventions lasting at least 8 weeks | Studies examining pharmacological interventions without a dietary component |
| Outcome | Studies reporting HbA1c levels as a primary or secondary outcome measure | Studies reporting only patient-reported outcomes without clinical measures |
| Study Design | Randomized controlled trials and quasi-experimental designs with control groups | Case reports, case series, editorials, opinion pieces, and letters to the editor |
| Publication | Peer-reviewed journal articles published between 2014 and 2024 | Conference abstracts, dissertations, grey literature, and non-peer-reviewed reports |
| Language | Articles published in English | Articles published in languages other than English where no translation is available |
| Setting | Studies conducted in primary care or community health settings | Studies conducted exclusively in inpatient or hospital settings |
Write your criteria in complete sentences, not just keywords. “Adults with Type 2 diabetes” is not a criterion — it is a category label. “Studies must include participants aged 18 or older with a confirmed diagnosis of Type 2 diabetes (ICD-10 code E11 or equivalent)” is a criterion. The specificity matters because when you encounter a borderline case during screening — a study of adults with pre-diabetes who are classified as Type 2 by one measure but not another — your written criterion is what you apply to reach a decision.
1. Population: Studies must include K–12 students in any grade level in public or private school settings. Studies focused exclusively on home-schooled students are excluded.
2. Intervention: Studies must examine a structured, school-based social-emotional learning (SEL) program delivered by classroom teachers or school counselors. Programs must have a defined curriculum with at least four sequential sessions.
3. Outcome: Studies must report at least one of the following outcomes: student academic achievement (grades or standardized test scores), student behavioral outcomes (disciplinary referrals, attendance), or teacher-rated social competence using a validated instrument.
4. Study Design: Quantitative studies with experimental or quasi-experimental designs, including randomized controlled trials, cluster randomized trials, controlled before-and-after studies, and interrupted time series designs.
5. Publication: Peer-reviewed journal articles published between January 2010 and December 2024, in English.
Note: Each criterion is a complete statement that can be applied as a yes/no decision. A second researcher reading only these criteria — without access to the research question — could apply them to any study and reach the same keeper/exclude decision.
The Two-Phase Screening Protocol
Keeper study selection happens in two sequential phases. The first phase — title and abstract screening — applies your criteria to brief summaries to eliminate clearly irrelevant records quickly. The second phase — full-text review — applies your criteria in full to the complete paper for every record that survived Phase 1. Both phases use the same eligibility criteria. The difference is how much information you have available and how much time each decision takes.
This structure exists because systematic searches return large numbers of records — often hundreds to thousands — and full-text review is time-intensive. Phase 1 functions as an efficient filter. Phase 2 is where the detailed, defensible decisions are made. Studies excluded in Phase 1 are excluded because the title and abstract provide enough information to confirm they do not meet at least one inclusion criterion. Studies that survive both phases are your keeper studies.
Phase 1: Title and Abstract Screening
Title and abstract screening is a rapid filtering step. You are reading enough of each record to determine whether it could plausibly meet all of your inclusion criteria. The operative word is “plausibly” — at this phase, you are not required to be certain that a study is a keeper. You are only required to be certain that a study is not a keeper.
1. Export your search results to a screening tool: Import all database results into a reference manager (Zotero, Mendeley, EndNote) or a dedicated screening platform (Rayyan, Covidence, Abstrackr). Deduplicate the records — the same study may appear in multiple databases. Deduplication happens before Phase 1 begins. The number of records after deduplication is your starting total for the PRISMA flow diagram.
2. Apply your criteria to the title first: The title alone excludes a significant proportion of records in most searches. If the title clearly indicates the study is about a different population, a different intervention, or a different discipline entirely, it is excluded at the title stage. Document this as “excluded at title screening” in your tracking tool. Do not spend time reading the abstract of a study whose title already confirms exclusion.
3. Read the abstract for borderline or unclear records: For records where the title is potentially relevant but not conclusive, read the abstract. You are looking for enough information to confirm or rule out each inclusion criterion. If the abstract confirms that the study meets all criteria, mark it for Phase 2. If the abstract confirms that the study violates at least one criterion, exclude it and record the reason. If the abstract is ambiguous — insufficient information to decide — include it in Phase 2. Do not exclude on ambiguity.
4. Record your decisions systematically: Every record receives one of three outcomes: included for Phase 2, excluded with reason, or flagged for discussion (in a dual-screener setup). The reason for exclusion must reference a specific criterion — not “not relevant” but “wrong population: participants were children, inclusion criterion requires adults 18+.” Vague exclusion reasons cannot be defended in a methods section.
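The deduplication in step 1 can be sketched as follows. This is a simplified illustration — dedicated platforms match more fuzzily on authors, pagination, and abstracts — assuming each record carries invented doi, title, and year fields:

```python
import re

# Simplified deduplication sketch: match on DOI where present, otherwise
# on normalized title + year, keeping the first occurrence. Record fields
# (doi, title, year) are illustrative.

def _norm_title(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rec in records:
        doi = (rec.get("doi") or "").lower()
        key = doi if doi else (_norm_title(rec["title"]), rec["year"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)  # first occurrence wins
    return unique               # len(unique) is the PRISMA post-deduplication total
```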
The most common Phase 1 error is excluding records that should have gone to full-text review. If the abstract does not clearly describe the population, the intervention, the design, or the outcomes — information that many abstracts omit or report incompletely — you cannot exclude based on missing information. Missing information in an abstract is not evidence that the criterion is not met. Only confirmed criterion violations justify exclusion. When information is absent, err toward inclusion and retrieve the full text. False exclusions at Phase 1 are irrecoverable — you will not know you missed a keeper study unless you go back.
Phase 2: Full-Text Review
Every record that survived Phase 1 requires full-text retrieval and review. This phase is where your final keeper study determinations are made. You are reading the complete paper — methods section especially — and applying every inclusion and exclusion criterion with full information available.
Quality Appraisal: Does Quality Affect Keeper Status?
Quality appraisal is the process of evaluating the methodological rigor of each study that passes your eligibility screening. It is a separate step from eligibility screening and typically happens after your eligibility decisions are made. Whether quality appraisal can exclude a study from the final keeper set — or only weight it differently in synthesis — is one of the most contested decisions in systematic review methodology.
Position 1: Quality as Exclusion Criterion
Some review protocols treat a minimum quality threshold as an eligibility criterion — studies below a certain score on the appraisal tool are excluded from the keeper set. This is most common in Cochrane-style clinical reviews where evidence quality is tied directly to clinical decision-making. If your protocol takes this position, the quality threshold must be specified in your criteria before screening begins — not after you see the scores.
Position 2: Quality as Synthesis Weight
The more common approach in social science and education reviews is to include all eligible studies regardless of quality score, then account for quality in the synthesis and discussion. Low-quality studies are retained as keepers but their findings are interpreted with appropriate caution. This approach avoids the circularity of using quality to exclude studies when quality assessment itself involves judgment calls.
For student-level reviews, your supervisor or the assignment rubric will usually specify which approach applies. The key methodological requirement is consistency: you cannot exclude some low-quality studies and retain others of similar quality without an explicit, pre-specified rule. If you use quality as an exclusion filter, document the tool, the threshold, and the score each study received.
Common Quality Appraisal Tools by Study Design
| Study Design | Appraisal Tool | What It Assesses |
|---|---|---|
| Randomized Controlled Trials | Cochrane Risk of Bias Tool (RoB 2) | Randomization process, allocation concealment, blinding, outcome reporting |
| Observational Studies | Newcastle-Ottawa Scale (NOS) | Selection, comparability, and outcome/exposure assessment across cohort and case-control designs |
| Quasi-Experimental | ROBINS-I | Risk of bias in non-randomized studies of interventions — seven bias domains |
| Qualitative Studies | CASP Qualitative Checklist | Research design justification, recruitment, data collection rigor, reflexivity, ethical issues |
| Mixed Methods | MMAT (Mixed Methods Appraisal Tool) | Addresses quantitative, qualitative, and mixed methods components within a single tool |
| Any Design | JBI Critical Appraisal Tools | Design-specific checklists for 12 study types maintained by the Joanna Briggs Institute |
Documenting Every Decision — PRISMA and Flow Diagrams
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework is the international standard for documenting the keeper study selection process. The PRISMA flow diagram is a visual record of how many records entered the selection process at each stage and how many exited at each stage, with reasons. For any review that presents itself as systematic, PRISMA documentation is expected — by journal reviewers, thesis committees, and in many student assignment rubrics.
What the PRISMA Flow Diagram Must Show
- Records identified: The total number of records returned by each database searched, listed by database name. If you also searched grey literature, reference lists, or other sources, those are listed separately.
- Records after deduplication: The number of unique records remaining after duplicate records are removed.
- Records screened (Phase 1): The number of records reviewed at title and abstract stage — this equals the post-deduplication total.
- Records excluded at Phase 1: The number excluded after title/abstract screening, with the primary reason categories listed (e.g., wrong population: n=143; wrong intervention: n=87; not a primary study: n=52).
- Full texts retrieved (Phase 2): The number of records for which full text was sought.
- Full texts not retrieved: The number where full text could not be obtained and why.
- Full texts excluded (Phase 2): The number excluded after full-text review, with specific exclusion reasons and counts for each reason.
- Studies included: Your final keeper study count — the number that met all eligibility criteria and are included in the review.
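Because the flow is simple arithmetic, the counts can be reconciled mechanically before you draw the diagram. A small sketch with invented numbers:

```python
# Arithmetic sketch: every PRISMA count must reconcile with the stage
# before it. The flow numbers below are invented for illustration.

def check_prisma_flow(f: dict) -> list[str]:
    """Return the list of inconsistencies (empty means the flow reconciles)."""
    errors = []
    if f["identified"] - f["duplicates_removed"] != f["screened"]:
        errors.append("identified - duplicates != screened")
    if f["screened"] - f["excluded_phase1"] != f["fulltext_sought"]:
        errors.append("screened - Phase 1 exclusions != full texts sought")
    if f["fulltext_sought"] - f["not_retrieved"] - f["excluded_phase2"] != f["included"]:
        errors.append("full texts - not retrieved - Phase 2 exclusions != included")
    return errors

flow = {"identified": 1420, "duplicates_removed": 310, "screened": 1110,
        "excluded_phase1": 1002, "fulltext_sought": 108,
        "not_retrieved": 4, "excluded_phase2": 89, "included": 15}
# check_prisma_flow(flow) == [] -- the funnel reconciles
```

A non-empty result means a record was lost or double-counted somewhere in your log, which is exactly the error a reviewer will find by adding up your diagram.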
The PRISMA statement website (prisma-statement.org) provides the official 2020 PRISMA checklist, the flow diagram template, and the elaboration and explanation document that describes what belongs in each item. If you are conducting a systematic review at any level — student, thesis, or publication — this is the primary methodological reference for your selection documentation requirements.
The 2020 PRISMA statement covers standard systematic reviews and meta-analyses. Specific review types have their own extensions: PRISMA-ScR for scoping reviews, PRISMA-IPD for individual participant data meta-analyses, PRISMA-Harms for reporting of harms, and PRISMA-Equity for equity-focused reviews. If your review has a specific methodological character — particularly a scoping review — use the appropriate extension rather than the standard PRISMA template. Check with your supervisor or the target journal for which version applies.
Inter-Rater Reliability and Second Screeners
A systematic review conducted by a single screener introduces reviewer bias — the risk that your individual judgments, blind spots, or knowledge gaps influence which studies become keepers. The methodological standard for high-quality reviews is dual independent screening: two reviewers apply the criteria to each record independently, then compare decisions. Where they agree, the decision stands. Where they disagree, the discrepancy is resolved through discussion or by a third reviewer.
Agreement between screeners is measured using Cohen’s kappa (κ), a statistic that accounts for agreement occurring by chance. A kappa of 0.61–0.80 is generally considered substantial agreement; above 0.80 is near-perfect. Most published systematic reviews report kappa values for both Phase 1 and Phase 2 screening to demonstrate that the keeper set is not the idiosyncratic product of a single reviewer’s decisions.
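Cohen's kappa is straightforward to compute by hand. The sketch below uses invented screening decisions for two raters; libraries such as scikit-learn provide an equivalent cohen_kappa_score function:

```python
# Worked sketch of Cohen's kappa on two screeners' Phase 1 decisions.
# The decision lists are invented example data.

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # chance agreement from each rater's marginal label proportions
    p_chance = sum((rater_a.count(l) / n) * (rater_b.count(l) / n)
                   for l in labels)
    return (p_observed - p_chance) / (1 - p_chance)

a = ["inc", "inc", "exc", "exc", "exc", "exc", "inc", "exc", "exc", "exc"]
b = ["inc", "exc", "exc", "exc", "exc", "exc", "inc", "exc", "exc", "exc"]
# 9/10 raw agreement; kappa is about 0.74 after correcting for chance
```

Note how chance-correction pulls 90% raw agreement down to substantial but not near-perfect kappa — raw percent agreement always overstates reliability when one label dominates.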
For Published Reviews and Thesis-Level Work
Dual independent screening is expected. This requires finding a second reviewer — often a colleague, fellow student, supervisor, or research assistant — who applies your criteria to the same records without seeing your decisions first. After both reviewers complete screening independently, decisions are compared and discrepancies resolved. Report kappa values in your methods section.
For Course-Level Student Reviews
Many undergraduate and some master’s-level assignments do not require a second screener due to the practical constraints of a student project. If your assignment does not mandate dual screening, note this as a limitation in your discussion section. Some courses allow a simplified version — applying criteria to a subset of records with a second reviewer to demonstrate the process — rather than dual screening of the full record set.
Even if your assignment does not require dual screening, piloting your criteria before full screening begins is a practice that improves decision quality. Apply your criteria to ten records before screening the full set. If you find yourself uncertain about how a criterion applies to borderline cases, resolve those ambiguities now — not mid-screen, when changing your interpretation is a consistency violation.
Where Keeper Study Selection Goes Wrong
Criteria Defined After Seeing Search Results
Building your eligibility criteria after you have browsed the database results means your criteria are shaped by what exists rather than what your question requires. This is the most fundamental bias in study selection. Reviewers and committee members who ask “when were your criteria finalized?” are checking for exactly this problem.
Instead
Write your complete criteria set before running your first database search. If you pre-register your review (PROSPERO for health sciences, OSF for others), your criteria are timestamped before data collection. Even without formal pre-registration, document your criteria in a dated file before searching.
Excluding on Ambiguity at Phase 1
Reading an abstract that does not mention the outcome measure and excluding the study because “it probably doesn’t report the right outcomes.” Ambiguity is not a criterion violation. If you cannot confirm exclusion from the abstract, the study goes to Phase 2.
Instead
Train yourself to ask only one question at Phase 1: “Can I confirm, from this abstract alone, that this study definitely violates at least one criterion?” If no — it goes to Phase 2. This conservative approach costs time in Phase 2 but eliminates false exclusions, which are irrecoverable.
Vague Exclusion Reasons in the Log
Recording “excluded — not relevant” or “excluded — poor quality” for Phase 2 rejections without specifying which criterion was violated. A committee or journal reviewer can challenge “not relevant” — they cannot challenge “excluded: study population was adults with Type 1 diabetes; inclusion criterion specifies Type 2 diabetes only.”
Instead
Every exclusion at Phase 2 maps to a specific, named criterion from your inclusion/exclusion list. Your exclusion log column heading should read “Criterion Violated” — not “Reason for Exclusion.” The specific criterion reference is what makes the decision reproducible and defensible.
Applying Criteria Inconsistently Across Studies
Including a study conducted in 2008 (outside your 2010–2024 date range) because it was a landmark study in the field, while excluding other studies from the same period. Exceptions that are not justified by a pre-specified criterion are a consistency violation. A reviewer who notices one 2008 study in your keeper list will check whether there are others from the same period that were excluded.
Instead
Apply your criteria mechanically. If a landmark study falls outside your date range, it does not become a keeper — it can be referenced in your introduction or discussion as context, but it should not appear in your PRISMA keeper count or your data extraction table. If you realize mid-review that your date range was too restrictive, adjust the criterion for all records and document the change, not just for the one study you want.
Conflating Eligibility with Quality at Phase 1
Excluding studies at the abstract stage because they appear to be methodologically weak — small samples, no control group, cross-sectional design — when your eligibility criteria did not specify those as exclusion factors. Quality appraisal is a separate, later step.
Instead
Ask only eligibility questions during Phase 1 and Phase 2. “Is this study design listed in my study design inclusion criterion?” is a Phase 2 eligibility question. “Is this study well-designed?” is a quality appraisal question. Mixing them at the screening stage produces a keeper set shaped by quality preferences rather than defined criteria, which is a form of selection bias.
Missing the Full PRISMA Count
Reporting only the final keeper count without documenting the funnel — how many records were identified, how many were deduplicated, how many excluded at each phase. A statement that “15 studies were included after a systematic search” with no PRISMA diagram is not a reproducible or auditable selection process.
Instead
Track your numbers from the first database search. Record how many records each database returned before deduplication, how many duplicates were removed, how many went through Phase 1, how many were excluded at Phase 1 (with the primary reason categories), how many went to Phase 2, how many were excluded at Phase 2 (with specific reason counts), and your final keeper count. These numbers build your PRISMA flow diagram.
Tools That Support the Screening Process
Keeper study selection is a systematic process — and it is substantially more manageable when conducted in a tool designed for it rather than in a spreadsheet. Several platforms exist specifically for systematic review screening, each with different features, costs, and institutional availability.
Rayyan
Free web-based platform designed for systematic review screening. Supports dual-blind screening with built-in conflict detection. Imports records from most reference managers and databases. The blind mode hides one screener’s decisions from the other until both have reviewed each record. Widely used in student and academic reviews. Available at rayyan.ai.
Covidence
Full-featured systematic review platform including screening, full-text review, data extraction, and quality appraisal. Subscription-based but many universities provide institutional access — check your library. Generates PRISMA flow diagram data automatically as you screen. The most commonly used platform for Cochrane reviews and health sciences systematic reviews.
Excel / Sheets with a Protocol
Acceptable for smaller student reviews when dedicated platforms are unavailable. Create columns for: Record ID, Title, Authors, Year, Phase 1 Decision (Include/Exclude/Unsure), Phase 1 Reason if Excluded, Phase 2 Decision, Phase 2 Criterion Violated. Track your numbers as you screen to build the PRISMA flow at the end. Less efficient than dedicated tools but fully functional.
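A minimal sketch of that layout, with three invented rows, shows how the PRISMA counts fall directly out of the log:

```python
import csv, io

# Sketch of the spreadsheet layout described above, using csv. Column
# names follow the text; the three rows are invented.

LOG = """record_id,title,year,phase1_decision,phase1_reason,phase2_decision,phase2_criterion_violated
R001,SEL and reading scores,2016,Include,,Include,
R002,Adult workplace mindfulness,2019,Exclude,Wrong population,,
R003,Classroom SEL pilot,2021,Include,,Exclude,Study Design
"""

rows = list(csv.DictReader(io.StringIO(LOG)))
screened = len(rows)                                            # Phase 1 total
excluded_p1 = sum(r["phase1_decision"] == "Exclude" for r in rows)
to_phase2 = screened - excluded_p1                              # full texts sought
keepers = sum(r["phase2_decision"] == "Include" for r in rows)  # final keeper count
# screened=3, excluded_p1=1, to_phase2=2, keepers=1
```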
Zotero / EndNote
Reference managers that handle deduplication and organization well, but are not built for screening decisions. Best used in combination with a screening platform — import from the database, deduplicate in Zotero or EndNote, export the deduplicated set to Rayyan or Covidence for actual screening.
ASReview / EPPI-Reviewer
Machine learning-assisted screening tools that use active learning to prioritize the records most likely to be keepers for human review. Useful for very large record sets (1,000+). EPPI-Reviewer is commonly used in education and social policy reviews in the UK. Both require some learning investment before use.
PROSPERO
Not a screening tool — but the international registry where systematic review protocols are pre-registered before data collection begins. Registering on PROSPERO timestamps your criteria and signals to readers that your selection process was planned prospectively. Free to register. Required by many journals and encouraged by most systematic review methodologists.
Connecting the Steps: How Keeper Selection Fits Into Your Full Review
Keeper study selection does not begin when you open your database results and does not end when you color your final study green in Rayyan. It begins when you write your research question and ends when you finalize your PRISMA flow diagram and exclusion log. Every step in between — criteria development, search strategy design, Phase 1 screening, Phase 2 full-text review, quality appraisal, and data extraction — is connected to the integrity of your keeper set.
The most common reason students struggle with keeper selection is treating it as a clerical task rather than a methodological one. Deciding which studies count as evidence for your research question is an intellectual act with consequences for everything you conclude. A keeper set built on vague criteria, inconsistently applied, produces a review whose conclusions cannot be trusted — regardless of how well-written the discussion section is.
Before moving from keeper selection to data extraction and synthesis, audit your process against three questions: Can you reproduce every inclusion and exclusion decision from your written criteria alone? Does your PRISMA diagram account for every record that entered the search? Does your exclusion log for Phase 2 cite a specific criterion for every excluded study? If yes to all three, your keeper set is methodologically sound. If not, the gap is in documentation rather than in your judgment — and documentation gaps are fixable before submission.
For direct support with study selection protocols, PICO criteria development, PRISMA documentation, or any stage of a systematic or structured review, our research paper writing team works specifically with evidence synthesis methodology at the student, thesis, and publication level.