

STATISTICS · CRIMINOLOGY · COMPARATIVE ANALYSIS · RESEARCH METHODS

How to Effectively Analyze Statistical Data of Police Brutality: Comparisons and Analysis

A section-by-section framework for approaching police brutality data analysis assignments — what data sources to use, how to set up valid comparisons, which statistical methods apply to which questions, how to handle bias and measurement problems, and where most analyses lose credibility before the numbers are even interpreted.

Custom University Papers — Criminology & Statistical Analysis Writing Team
Specialist guidance on quantitative research methods, comparative data analysis, and criminology assignments — grounded in what assignment rubrics actually evaluate and the methodological standards that distinguish rigorous analysis from cherry-picked numbers.

Analyzing statistical data on police brutality is one of the most methodologically demanding tasks in criminology and social science coursework — not because the math is complicated, but because the data itself is broken in ways that most students do not identify before they start running numbers. Incomplete reporting, inconsistent definitions, selection bias in the underlying incidents, and the absence of a reliable national denominator mean that two analysts using different sources on the same research question can reach contradictory conclusions, both technically correct given their data. Understanding why that happens, and knowing how to structure an analysis that accounts for it, is what separates a credible assignment from one that gets marked down for methodological naivety.

This guide walks through every stage of a police brutality data analysis — from selecting and evaluating data sources, to setting up valid comparisons, to choosing appropriate statistical methods, to presenting findings with the limitations they require. It explains the reasoning behind each step so you can apply it to your specific research question and assignment requirements. It does not complete any analysis for you; it tells you what to do and why.

Why Police Brutality Data Is Different From Other Crime Data

Most crime statistics are collected through established federal reporting systems — the FBI’s Uniform Crime Reporting (UCR) program or the Bureau of Justice Statistics’ National Crime Victimization Survey. These systems have well-documented methodologies, known participation rates, and decades of methodological literature evaluating their reliability and validity. Police brutality data has none of that.

There is no comprehensive, mandatory national database of police use-of-force incidents in the United States. The FBI launched its National Use-of-Force Data Collection in 2019, but in its first years of operation fewer than half of eligible law enforcement agencies were submitting data — which means the dataset systematically undercounts incidents. Non-participation is not random: it varies with agency size and region, which skews the data in ways that are hard to correct because the characteristics of the unobserved incidents are unknown. This is not a flaw specific to one database — it is a structural feature of police brutality data that affects every analysis and must be addressed in any credible assignment.

  • <50%: agency participation rate in the FBI’s National Use-of-Force Data Collection in its early years; the rest is unobserved
  • 4+: major competing databases tracking police killings, each with different definitions, methods, and coverage
  • 0: mandatory federal reporting requirements for non-fatal use-of-force incidents; all non-fatal data is voluntary or crowdsourced
  • 2–3×: ratio by which crowdsourced databases like Mapping Police Violence typically exceed official counts of police killings
Fatal vs. Non-Fatal Incidents: An Entirely Different Data Landscape

Police killings (fatal use of force) and non-fatal use-of-force incidents are tracked by completely different methods with completely different coverage rates. Fatal incidents are relatively well-documented because deaths require official records — death certificates, medical examiners, coroner reports — that are independent of police self-reporting. Non-fatal incidents — excessive force complaints, injuries from restraint, tasings, beatings — are almost entirely dependent on police self-report or civilian complaint systems, both of which have severe underreporting problems. An analysis that conflates these two categories is not analyzing the same phenomenon with two data points — it is analyzing two different phenomena with incomparable measurement approaches. Clarify which category your assignment addresses before selecting any data source.

Step 1: Identifying and Evaluating Your Data Sources

Before conducting any analysis, you need to identify the data sources available for your specific research question, evaluate their coverage and methodology, and decide which sources are appropriate given what you are trying to measure. Using the wrong source for the wrong question produces findings that cannot be interpreted correctly — and your instructor will know.

FBI National Use-of-Force Data Collection
  Covers: Use-of-force incidents reported voluntarily by participating agencies; fatal and non-fatal; 2019–present.
  Key limitations: Low agency participation, and non-participating agencies are not randomly distributed; data is not nationally representative.
  Best used for: Trend analysis within participating agencies; agency-level comparisons among reporters only, never as national totals.

Mapping Police Violence
  Covers: Police killings in the US; compiled from media reports, obituaries, and public records; includes race/ethnicity, armed status, mental health context.
  Key limitations: Media-sourced data may miss low-profile incidents; some variables coded inconsistently across years; methodology has evolved.
  Best used for: Cross-department racial disparity analysis; trend analysis of police killings over time; state-level comparisons.

Washington Post Fatal Force Database
  Covers: On-duty police shootings in the US; 2015–present; includes race, age, armed status, fleeing, mental illness, body camera presence.
  Key limitations: Covers only shootings, not other forms of lethal force (e.g., positional asphyxia, vehicle strikes); does not include off-duty shootings.
  Best used for: Analysis of officer-involved shootings specifically; armed vs. unarmed breakdowns; weapon-type analysis.

The Guardian’s The Counted (2015–2016)
  Covers: People killed by police in the US; crowdsourced with editorial verification; rich individual-level data.
  Key limitations: Only covers 2015–2016; discontinued; cannot support multi-year trend analysis beyond that window.
  Best used for: In-depth individual-level analysis for 2015–2016; validation against other sources for the same period.

Bureau of Justice Statistics Arrest-Related Deaths
  Covers: Deaths occurring during arrest or in custody; federally collected; 2003–2009, 2011–2012, revived 2016+.
  Key limitations: Coverage gaps; definitional changes across collection periods; relies on agency reporting with known undercount.
  Best used for: Cross-period trend analysis with appropriate caveats; comparison against media-derived sources to estimate undercount.

Police-Public Contact Survey (PPCS)
  Covers: Self-reported civilian experiences of police contact, including perceived excessive force; BJS; periodic.
  Key limitations: Survey-based; subject to recall bias and social desirability bias; does not capture severity or outcome of force.
  Best used for: Prevalence of force encounters from the civilian perspective; demographic differences in reported police contact.
Triangulation Is a Methodological Requirement, Not an Option

No single database is sufficient as the sole source for a serious police brutality analysis. Methodologically sound assignments compare findings across at least two sources — and explicitly note where the sources converge and where they diverge. When Mapping Police Violence and the Washington Post Fatal Force Database produce different counts for the same population and time period, the divergence itself is a finding that needs explanation. Either the definitions differ, the coverage differs, or the verification methods differ — and identifying which of these is driving the discrepancy tells you something important about both the data and the phenomenon you are studying.
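The year-by-year triangulation check can be sketched in a few lines of Python. All counts below are hypothetical placeholders, not figures from any real database; what matters is the structure of the comparison, not the numbers.

```python
# Compare annual police-killing counts across two sources and quantify divergence.
# All counts below are HYPOTHETICAL placeholders, not real database values.

source_a = {2018: 1100, 2019: 1095, 2020: 1125}   # e.g., a crowdsourced tracker
source_b = {2018: 990,  2019: 1000, 2020: 1020}   # e.g., a shootings-only database

def divergence_by_year(a, b):
    """Return {year: (count_a, count_b, ratio)} for years present in both sources."""
    out = {}
    for year in sorted(set(a) & set(b)):
        out[year] = (a[year], b[year], round(a[year] / b[year], 3))
    return out

report = divergence_by_year(source_a, source_b)
for year, (ca, cb, ratio) in report.items():
    print(year, ca, cb, ratio)
```

A ratio consistently above 1.0, as in this toy example, points to a definitional or coverage gap (one source counting all lethal force, the other only shootings) rather than random noise — and that gap is the finding you explain.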

Step 2: Defining Your Variables Precisely

“Police brutality” is not a variable. It is a concept, and before you can analyze it statistically, you have to operationalize it — translate it into a specific, measurable variable that exists in an actual dataset. The operationalization you choose determines what you can and cannot conclude, and an assignment that uses the term “police brutality” without specifying exactly what is being measured will be read as conceptually underdeveloped.

Lethal Force / Police Killings

Deaths caused by police action — the most completely documented category because death is independently recorded. Includes officer-involved shootings but also deaths from tasers, vehicle pursuit, restraint, and in-custody medical neglect, depending on the database. Specify whether you mean all lethal force or only shootings — these are different variables.

Use of Force (Non-Fatal)

Physical contact, weapon deployment, or restraint beyond handcuffing — the least reliably documented category. Definitions vary by agency policy. Data exists only in agency self-reports or civilian complaint databases, both with severe coverage problems. Any analysis of non-fatal use of force requires extensive limitation discussion.

Complaints and Adjudications

Civilian complaints of excessive force filed with police departments or civilian review boards. Subject to complaint filing barriers, variable adjudication standards across agencies, and systematic suppression in some jurisdictions. Useful for within-agency trend analysis but not valid for cross-agency comparison without extensive standardization.

Beyond the primary variable (type of force), you need to precisely define any comparative variables your analysis uses. Race/ethnicity coding differs across databases — some use five categories, some use two, some code Hispanic ethnicity separately from race and some do not. Armed status is coded differently across databases and has changed over time within some databases. “Mental illness” as a coded variable has no standardized definition across any major dataset. Before running any comparison, confirm that the variable you are comparing means the same thing across your sources.
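One practical way to enforce that confirmation is to recode every source into a single common scheme before any comparison, and to flag (rather than guess at) codes that do not map. A minimal Python sketch; the code values and category labels here are illustrative assumptions, not the actual codebook of any database:

```python
# Harmonize race/ethnicity codes from two hypothetical coding schemes into one
# common scheme before cross-source comparison. Codes below are ILLUSTRATIVE,
# not the real codebooks of any dataset.

COMMON = {"black", "white", "hispanic", "other", "unknown"}

SCHEME_A = {"B": "black", "W": "white", "H": "hispanic", "O": "other"}
SCHEME_B = {"African American": "black", "Caucasian": "white",
            "Latino": "hispanic", "Asian": "other", "Unknown": "unknown"}

def harmonize(codes, scheme):
    """Map each raw code to the common scheme; collect unmappable codes."""
    mapped, unmappable = [], []
    for code in codes:
        if code in scheme:
            mapped.append(scheme[code])
        else:
            unmappable.append(code)   # never silently drop or guess a category
    return mapped, unmappable

mapped, bad = harmonize(["B", "W", "H", "X"], SCHEME_A)
```

Any unmappable codes should be reported in your methods section, not quietly discarded — the share of unmappable records is itself a coverage statistic.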

Step 3: Setting Up Valid Comparisons

The most common analytical failure in police brutality assignments is an invalid comparison — comparing two numbers that are not measuring the same thing in the same units for the same population and time period. A ratio that compares police killing rates across states means nothing if the underlying databases have systematically different agency participation rates by state. A comparison of Black and white killing rates means nothing if the race coding methodology is not consistent across both groups in the same dataset.

Valid comparisons require that four conditions are met: the same outcome variable is measured using the same definition in both comparison groups; the same time period is used; the comparison is expressed as a rate, not a raw count, using a population denominator that is consistent and appropriate for both groups; and the comparison is made at the same unit of analysis.

COMPARISON VALIDITY — checking your comparison before you run it

SAME DEFINITION: Are both comparison groups using the same database? If comparing Black and white police killing rates, both rates must come from the same source using the same case-selection criteria — not one from Mapping Police Violence and one from official FBI data, which have different coverage and definitions.

SAME TIME PERIOD: Are both numbers from the same year or the same multi-year window? Annual police killing counts fluctuate substantially — a single-year comparison may reflect year-specific noise rather than a structural difference. Three-to-five year averages are more stable for cross-group comparisons.

RATE, NOT COUNT: Are you dividing by a population denominator? The fact that more incidents involve Black individuals than white individuals in a jurisdiction with a majority-white population is not evidence of disparity — it requires a rate calculation before it means anything. The numerator (incidents) must be divided by the denominator (population) to produce a rate that can be compared.

SAME UNIT OF ANALYSIS: Is your comparison at the individual level (incidents per person), the agency level (incidents per department), or the geographic level (incidents per county or state)? Mixing levels of analysis — using individual-level rates for one group and agency-level rates for another — produces a meaningless comparison.
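The checklist above translates directly into a rate calculation. A minimal Python sketch using hypothetical counts and populations, with a shared three-year window and the same definition for both groups:

```python
# Average annual rate per 100,000 over a multi-year window -- the form a valid
# cross-group comparison requires. Counts and populations are HYPOTHETICAL.

def rate_per_100k(incident_counts, population, years):
    """Average annual rate per 100,000 across a multi-year window.

    incident_counts: list of annual incident counts for the window
    population: group population (assumed roughly stable across the window)
    years: number of years in the window
    """
    assert len(incident_counts) == years, "one count per year in the window"
    person_years = population * years
    return sum(incident_counts) / person_years * 100_000

# Same definition, same 3-year window, same unit of analysis for both groups:
rate_group_a = rate_per_100k([62, 58, 60], population=1_000_000, years=3)
rate_group_b = rate_per_100k([98, 105, 97], population=5_000_000, years=3)
```

Note that Group B has more incidents in every year but a far lower rate — exactly the raw-count trap the checklist warns about.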

Step 4: Choosing the Right Denominator

The choice of denominator for rate calculations in police brutality analysis is one of the most methodologically contested issues in the field — and one that your assignment almost certainly needs to address explicitly. There are three main denominator options, each of which answers a different research question and produces a different rate. Using the wrong denominator for your research question is a fundamental methodological error.

Population-Based Rate
Incidents per 100,000 people in the demographic group. Answers: which demographic group faces the highest absolute risk of being killed by police, relative to its share of the population? This is the most commonly used denominator in public health and epidemiology-style analyses. Limitation: it does not account for differential rates of police contact across groups — which is itself a product of policing decisions, not just population composition.
Police-Contact-Based Rate
Incidents per arrest or per police encounter. Answers: conditional on having police contact, what is the likelihood of experiencing force? This controls for differential patrol intensity across neighborhoods and groups. Limitation: arrest data itself reflects policing decisions and may not be an unbiased measure of contact rate; using arrest as a denominator may partially obscure the disparity if arrests themselves are biased.
Encounter-Specific Rate
Incidents per encounter in specific contexts (traffic stops, mental health calls, domestic disturbances). The most precise denominator for causal inference about officer behavior. Limitation: encounter-level data is extremely rare — only a small number of jurisdictions collect it, and it is not available at the national level. Most student assignments cannot use this denominator.
Which One to Use
Use population-based rates when your research question is about absolute risk to community members. Use police-contact-based rates (typically per arrest) when your research question is about officer decision-making or conditional probability of force given contact. State explicitly which denominator you are using, why, and what its limitations mean for your conclusions. Do not switch denominators between comparisons in the same analysis without flagging it.
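A short sketch makes the stakes of the choice concrete: the same hypothetical incident counts produce a disparity ratio of 3.0 under a population denominator but 1.8 under an arrest denominator. Every number below is invented for illustration.

```python
# The same incidents yield different disparity ratios depending on the
# denominator. All counts, populations, and arrest totals are HYPOTHETICAL.

killings   = {"group_a": 60,        "group_b": 100}
population = {"group_a": 1_000_000, "group_b": 5_000_000}
arrests    = {"group_a": 40_000,    "group_b": 120_000}

def rate(numerator, denominator, per=100_000):
    return numerator / denominator * per

# Population-based: absolute risk to community members.
pop_ratio = (rate(killings["group_a"], population["group_a"])
             / rate(killings["group_b"], population["group_b"]))

# Arrest-based: conditional probability of force given (arrest-level) contact.
arrest_ratio = ((killings["group_a"] / arrests["group_a"])
                / (killings["group_b"] / arrests["group_b"]))
```

Neither number is wrong; they answer different questions — which is why the write-up must name the denominator and the question it serves.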

Step 5: Matching Statistical Methods to Your Research Question

The statistical method you use needs to match the type of research question you are asking, the level of measurement of your variables, and the structure of your data. A common error in student assignments is applying a method that is technically valid but answers a different question than the one being asked — or applying a sophisticated method to data that does not meet its assumptions.

Descriptive Statistics — What the Data Looks Like

Frequency counts, proportions, means, medians, rates per population. Use for: describing who is affected, where incidents occur, how rates have changed over time. Required in any analysis before inferential methods. Present rates, not raw counts, for any cross-group comparison. Use measures of central tendency and dispersion to characterize the distribution before attempting comparisons.

Rate Ratios and Risk Ratios — How Groups Compare

The ratio of one group’s rate to another’s. A rate ratio of 2.5 means Group A experiences 2.5 times the rate of Group B. Use for: quantifying racial, geographic, or temporal disparities. Always report confidence intervals around rate ratios — a ratio of 2.5 with a 95% CI of [0.8, 6.3] crosses 1.0 and is not statistically significant; one with a CI of [2.1, 3.0] is. A rate ratio without a confidence interval is incomplete.
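The log-normal approximation for a Poisson rate ratio is simple enough to compute directly: SE(log RR) = sqrt(1/a + 1/b), where a and b are the event counts in each group. A sketch with hypothetical counts:

```python
import math

# Rate ratio with a 95% CI via the standard log-normal approximation for
# Poisson event counts. Counts and populations are HYPOTHETICAL.

def rate_ratio_ci(events_a, pop_a, events_b, pop_b, z=1.96):
    """Return (rate ratio, CI lower bound, CI upper bound)."""
    rr = (events_a / pop_a) / (events_b / pop_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)   # SE of log(RR)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

rr, lo, hi = rate_ratio_ci(60, 1_000_000, 100, 5_000_000)
# Here the interval excludes 1.0, so the disparity is statistically significant.
```

Report all three numbers: the point estimate without the interval is the incomplete form the paragraph above warns against.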

Chi-Square Tests — Categorical Comparisons

Tests whether the distribution of a categorical outcome (e.g., armed vs. unarmed, racial category) differs significantly across comparison groups. Use for: testing whether the proportion of unarmed individuals differs significantly between racial groups in a police killing dataset. Requires sufficient cell counts (expected frequency ≥5 per cell). Does not tell you the magnitude of the difference — that requires a rate ratio or odds ratio alongside it.
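For a 2×2 table the statistic can be computed from first principles, which also makes the expected-count requirement explicit. Cell counts below are hypothetical:

```python
# Chi-square test of independence for a 2x2 table (group x armed status),
# computed from first principles. Cell counts are HYPOTHETICAL.

observed = {
    ("group_a", "armed"): 80, ("group_a", "unarmed"): 20,
    ("group_b", "armed"): 60, ("group_b", "unarmed"): 40,
}

groups   = ("group_a", "group_b")
statuses = ("armed", "unarmed")
n   = sum(observed.values())
row = {g: sum(observed[(g, s)] for s in statuses) for g in groups}
col = {s: sum(observed[(g, s)] for g in groups) for s in statuses}

chi2 = 0.0
for g in groups:
    for s in statuses:
        expected = row[g] * col[s] / n
        assert expected >= 5, "chi-square needs expected counts of at least 5"
        chi2 += (observed[(g, s)] - expected) ** 2 / expected

CRITICAL_05_DF1 = 3.841   # critical value at alpha = .05, df = (2-1)*(2-1) = 1
significant = chi2 > CRITICAL_05_DF1
```

As the paragraph notes, significance is not magnitude: pair the test with an odds ratio — here (80×40)/(20×60) ≈ 2.67 — to report how large the difference is.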

Trend Analysis — How Rates Change Over Time

Joinpoint regression or simple linear regression of rates over time. Use for: determining whether police killing rates are increasing, decreasing, or stable; whether a policy intervention is associated with a change in trend. Requires at least 5–7 data points (years) for meaningful trend analysis. Single-year-to-year changes are not evidence of a trend — calculate the annualized percent change across the full time series.
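The annualized percent change can be recovered from a log-linear fit of rates on year. The sketch below uses hypothetical rates constructed with exactly 5% annual growth, so the method's output is checkable by eye:

```python
import math

# Annualized percent change (APC) from a log-linear OLS fit of rates on year.
# Rates are HYPOTHETICAL, built with exactly 5% annual growth for illustration.

years = [2016, 2017, 2018, 2019, 2020]
rates = [2.0 * 1.05 ** i for i in range(len(years))]   # rate per 100,000

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

beta = ols_slope(years, [math.log(r) for r in rates])
apc = (math.exp(beta) - 1) * 100   # annualized percent change, in percent
```

With only five points, report the APC descriptively and avoid strong inference; joinpoint methods that detect trend changes need longer series.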

Geographic Analysis — Spatial Patterns

Mapping rates by county, state, or census tract; calculating local rates using population denominators from Census data. Use for: identifying geographic concentration of high-rate areas; comparing urban vs. rural rates; mapping racial composition against incident rates. State-level analysis is straightforward; sub-state geographic analysis requires Census population data at the matching geographic level.

Multivariate Regression — Controlling for Confounders

OLS regression (for continuous outcomes) or Poisson/negative binomial regression (for count outcomes) that includes multiple predictors simultaneously. Use for: testing whether racial disparities in police killing rates persist after controlling for neighborhood poverty rate, crime rate, and population density. Required for any causal claim — without controlling for confounders, observed disparities could reflect unmeasured structural factors rather than the variable of interest.

Step 6: Analyzing Racial Disparity Data Specifically

Racial disparity analysis is one of the most common and most analytically fraught components of police brutality assignments. The methodological literature — including work published in peer-reviewed criminology and sociology journals — disagrees sharply on which analytical approach is appropriate. Your assignment needs to acknowledge this methodological debate, not paper over it with a single set of numbers.

The central issue is that the observed racial disparity in police killing rates (Black Americans are killed at approximately 2.5–3 times the rate of white Americans per capita, according to multiple databases) is not itself evidence of racial bias in officer decision-making — and it is also not evidence against it. The disparity could reflect differential police deployment, differential rates of poverty-driven encounters, differential levels of police presence in high-crime areas, or direct racial bias by individual officers. Disentangling these explanations requires controlling for structural variables at the neighborhood and department level — which requires data that is not available nationally and is only available in a small number of jurisdictions.

The Denominator Debate in Racial Disparity Research

Research by Knox, Lowe, and Mummolo (2020), published in the American Political Science Review, showed that estimates conditioned on recorded police encounters can understate anti-Black bias in fatal shootings, because the encounter records themselves are produced by a selection process that may be racially biased. Research by Johnson, Tress, Burkel, Taylor, and Cesario (2019), published in Proceedings of the National Academy of Sciences, used a different analytic approach and denominator and initially concluded there was no evidence of anti-Black disparity in fatal shootings; the paper was subsequently retracted after critiques of its benchmark logic. The exchange between these research groups is one of the clearest illustrations in the literature of how denominator choice drives conclusions in police brutality analysis. Citing both this debate and its resolution in any racial disparity analysis demonstrates methodological awareness that the rubric will reward at the excellent band.

  • Population-based denominators (per 100,000 residents) consistently show large racial disparities, with Black Americans killed at substantially higher rates than white Americans
  • Arrest-based denominators (per arrest) show smaller but still significant disparities in most analyses
  • Encounter-based denominators are available only in a handful of jurisdictions and produce the most precise estimates of officer decision-making bias
  • No denominator is “correct” — each answers a different question, and your analysis should specify which question it is answering
Do Not Conflate Disparity With Discrimination

A measured racial disparity in police killing rates — even a large and statistically significant one — is not synonymous with racial discrimination in officer decision-making. Disparity is a measured difference in outcomes between groups. Discrimination is a causal explanation for that difference. Establishing discrimination requires ruling out alternative explanations through appropriate statistical controls — a task that requires encounter-level data that is not nationally available. Your assignment should characterize disparities precisely (using rates and rate ratios) and be explicit about what those disparities can and cannot support as a causal claim. Treating disparity as self-evidently equivalent to discrimination is a logical error that sophisticated analysis does not make — and that will be marked as such.

Step 7: Using Regression to Control for Confounders

If your assignment requires multivariate analysis — testing whether a predictor variable (race, department policy, state law, poverty rate) is associated with police killing rates after controlling for other factors — you need to structure your regression model correctly before interpreting its output. The three most common errors in regression-based police brutality analysis are: choosing the wrong regression family for the outcome type, failing to account for the count-based nature of most police brutality outcomes, and treating observational regression results as causal estimates.

Choosing the Right Regression Model

Police killing counts are count data — non-negative integers with a right-skewed distribution. OLS linear regression assumes a normally distributed continuous outcome. Applying OLS to count data with small expected counts per unit produces biased coefficient estimates and invalid standard errors. For count outcomes, use Poisson regression (when the mean and variance are approximately equal) or negative binomial regression (when the variance substantially exceeds the mean, which is typical in police killing data because of overdispersion across agencies and jurisdictions). If your outcome is a binary variable (was force used: yes/no), logistic regression is appropriate.

What to Include as Covariates

The covariates you include in a regression model should be selected based on theory and the existing literature — not based on which variables make your main predictor significant. Include variables that are theoretically expected to confound the relationship you are studying. For a regression of racial composition on police killing rates across counties, include: median household income (poverty-crime nexus), violent crime rate (exposure of police to dangerous situations), population density (urban vs. rural policing differences), and region (South vs. non-South historical policing differences). Missing important confounders produces omitted variable bias — which in this literature typically means underestimating structural explanations and overattributing variation to the focal predictor.

REGRESSION SETUP — structure for a county-level analysis

OUTCOME: Police killing rate per 100,000 residents in each county over a 5-year period. Use negative binomial regression with the logged county population as an offset term — this is the standard approach for modeling rates as count outcomes with varying population exposure.

FOCAL PREDICTOR: Percentage of county population that is Black (or another racial/ethnic variable). This is the variable whose association with the outcome you are estimating after controlling for confounders.

COVARIATES: Violent crime rate per 100,000; median household income; poverty rate; population density; region (dummy variables); percentage of police with college degrees; agency size. Source each covariate from a documented, citable source — Census ACS for income/poverty/demographics; FBI UCR for crime rates; LEMAS (Law Enforcement Management and Administrative Statistics) for agency characteristics.

INTERPRETATION: Report the incidence rate ratio (IRR) from the negative binomial model, not the raw coefficient. An IRR of 1.08 for the racial composition variable means that for each percentage point increase in the Black population share, the expected police killing rate increases by 8%, holding all covariates constant. Always report 95% confidence intervals alongside the IRR.
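Converting model output into the reported IRR is simple arithmetic. In the sketch below the coefficient and standard error are hypothetical stand-ins for your model's estimates, not results from any real data:

```python
import math

# Turn a negative binomial coefficient into an incidence rate ratio (IRR)
# with a 95% CI. Coefficient and SE are HYPOTHETICAL stand-ins for model output.

beta = 0.077   # hypothetical coefficient on Black population share (per point)
se   = 0.020   # hypothetical standard error of that coefficient

irr   = math.exp(beta)               # multiplicative effect per 1-point increase
ci_lo = math.exp(beta - 1.96 * se)   # CI is computed on the log scale,
ci_hi = math.exp(beta + 1.96 * se)   # then exponentiated

# Interpretation: each 1-point increase in the predictor multiplies the expected
# killing rate by `irr`, holding covariates constant; this CI excludes 1.0.
```

The model itself would be fit in statistical software (for example, a negative binomial GLM with a log-population offset in R or Python's statsmodels); the arithmetic above is how you convert its output into the IRR and interval your results section reports.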

Step 8: Visualizing Police Brutality Data

Data visualization for police brutality analysis serves a specific analytical purpose — it should show something about the data that a table of numbers does not convey efficiently. Common visualization errors in student assignments are using bar charts to show rates when a rate ratio or confidence interval would be more informative, and presenting maps without a corresponding rate calculation that accounts for population size.

Time Series Line Chart

Appropriate for showing how annual rates have changed across a multi-year period. Plot rates (per 100,000), not raw counts — raw counts conflate population growth with genuine rate change. Show separate lines for comparison groups (race/ethnicity, region) to make temporal disparity patterns visible. Label any major policy events on the timeline if they are relevant to your analysis.

Choropleth Map

Appropriate for showing geographic variation in rates across states or counties. Color intensity encodes the rate magnitude — but the underlying number must be a rate per population, not a raw count. States with large populations will always have more raw incidents; mapping raw counts produces a map of population density, not police brutality. Use a diverging color scale for rate ratios (above/below national average) rather than a sequential scale for raw rates.

Rate Ratio with Confidence Intervals

A forest plot or coefficient plot showing rate ratios and their 95% confidence intervals for multiple comparison groups or regression predictors is the most information-dense visualization for disparity analysis. It shows both the magnitude of the estimated disparity and its statistical uncertainty in a single compact graphic — more informative than a bar chart showing mean rates, which loses the uncertainty information.

Do Not Map Raw Counts — Ever

A map showing the number of police killings by state, with California, Texas, and Florida appearing darkest, is a map of population size — not police brutality. California, Texas, and Florida are the three most populous states; of course they have the most incidents in absolute terms. Any geographic visualization must use a rate per 100,000 population (or per department, or per arrest, depending on your research question) as the mapped variable. Presenting a map of raw counts as evidence of geographic variation in police brutality is one of the most common visualization errors in student assignments and signals that the analysis is not methodologically sound.

Step 9: Assessing and Reporting Data Limitations

Every serious analysis of police brutality data includes an explicit, substantive limitations section — not a perfunctory paragraph at the end acknowledging that “all data has limitations,” but a genuine engagement with the specific ways the data’s weaknesses affect what the findings can and cannot support. Rubrics for criminology and social science research methods assignments consistently evaluate whether the student demonstrates awareness of measurement and coverage limitations. A technically correct analysis with no limitations discussion will score lower than a slightly less sophisticated analysis that honestly addresses what the data cannot tell you.

Identify the Coverage Limitation

State the percentage of incidents your data source is estimated to capture. For the FBI National Use-of-Force Data Collection, this means identifying which agencies did not report and whether non-participating agencies are systematically different from participating ones (they are — non-participation is correlated with agency size and region). For media-derived databases, explain the search methodology and what types of incidents would be systematically missed (incidents with no media coverage, incidents in jurisdictions with poor press access).

Identify the Definition Limitation

State whether the definition of the outcome variable has changed over time in your dataset (many have), whether it differs from the definition used in comparison sources, and whether coding inconsistencies exist for key variables like race/ethnicity or armed status. These are not peripheral concerns — they directly affect whether your trend analysis or cross-source comparison is valid.

Identify the Denominator Limitation

State which denominator you used and why it is the most appropriate available option — and then state what its limitations mean for interpretation. If you used population as the denominator (because encounter-level data is not available), state that your rate reflects absolute community risk, not officer decision-making propensity, and that differential patrol intensity across neighborhoods may confound the racial disparity estimates you report.

Identify the Confounding Limitation

For any regression or comparative analysis, state which theoretically relevant confounders were not included because data was unavailable. Omitted variable bias is a near-universal limitation in this literature — acknowledge it and explain the likely direction of bias (does the omitted variable, if included, likely increase or decrease your estimated association?). This demonstrates methodological sophistication that marks the difference between a descriptive report and an analytical one.

State What the Findings Cannot Support

Be explicit about the causal inferences your analysis does not support. Observational analysis of cross-sectional data does not establish that a measured disparity is caused by racial bias in officer decision-making. Correlation between a department policy variable and killing rates does not establish that the policy caused the change in rates. State the causal claim your data could theoretically support and note that your analysis does not meet the bar for that claim.

Step 10: Writing Up the Analysis

The structure of a police brutality data analysis write-up follows the same logic as any quantitative social science paper: research question, data and methods, results, and limitations/discussion. Each section has specific content requirements, and the connections between sections — how the data section justifies the methods, how the methods produce the results, how the limitations qualify the discussion — are what the instructor is evaluating when they read for analytical coherence.

Research Question Section
State a specific, answerable research question — not “is there police brutality?” but “do racial disparities in police killing rates persist across jurisdictions after controlling for violent crime rates and poverty?” The research question determines everything downstream: the data needed, the method appropriate, and what the results mean.
Data Section
Identify each data source by full name, version/year accessed, and coverage. Describe how incidents are identified and coded. Report the total N (number of incidents) in your analytic sample. Note any inclusion/exclusion criteria you applied and why. If merging multiple datasets, describe the merge logic and any cases lost to incomplete matching.
Methods Section
Name the specific statistical methods used (negative binomial regression, chi-square test, rate ratio calculation). State the denominator used for rate calculations and justify the choice. List the covariates included in any regression model and their data sources. State the software used (R, Stata, SPSS, Python) and the significance threshold (typically α = .05).
Results Section
Present descriptive statistics first. Then present the key comparative findings — rates and rate ratios with confidence intervals. Then present any regression results — coefficients or incidence rate ratios with standard errors and p-values. Tables should present numbers; the narrative should interpret them. Do not repeat every number in the narrative — state what the results mean.
Limitations and Discussion
Connect the findings back to the research question — did the analysis answer it? If the data limitations mean the research question cannot be fully answered, state that explicitly and explain what data would be needed. Connect findings to existing literature — do your results replicate, extend, or contradict prior published analyses, and why might any differences exist?
“The analysis that earns the highest marks is not the one with the most sophisticated methods — it is the one that uses appropriate methods for the question being asked, applies them correctly to the available data, and is honest about what the results do and do not mean.”

Where Most Analyses Lose Credibility

Using Raw Counts as Evidence of Disparity

Reporting that “1,000 Black Americans were killed by police compared to 400 white Americans” without converting to rates per population. Raw counts tell you about the size of the affected population, not the risk to individuals in that population. Without a denominator, no disparity claim is valid.

Instead

Calculate per-capita rates using Census population data as the denominator. Report the rate ratio with a 95% confidence interval. State the specific database and time period. Example: “Black Americans were killed at a rate of X.X per 100,000, compared to X.X per 100,000 for white Americans (rate ratio = X.X, 95% CI [X.X, X.X]).”
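As a sketch of that calculation — counts and populations invented for illustration, to be replaced with figures from your database and Census denominators — the standard Wald interval on the log rate ratio takes only a few lines:

```python
import math

# Hypothetical inputs (invented) -- replace with counts from your database
# and Census population denominators for your jurisdiction and period
deaths_a, pop_a = 250, 41_000_000    # group A: count, population
deaths_b, pop_b = 450, 197_000_000   # group B: count, population

rate_a = deaths_a / pop_a * 100_000  # per 100,000
rate_b = deaths_b / pop_b * 100_000

rr = rate_a / rate_b
# Wald 95% CI on the log rate ratio: log(RR) +/- 1.96 * sqrt(1/a + 1/b)
se_log_rr = math.sqrt(1 / deaths_a + 1 / deaths_b)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

print(f"rate A = {rate_a:.2f}, rate B = {rate_b:.2f} per 100,000")
print(f"rate ratio = {rr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

This Wald approximation is adequate for large counts; for small counts, use the exact Poisson method discussed later in this section.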

Treating One Database as Definitive

Building an entire analysis on a single data source — typically Mapping Police Violence or the Washington Post Fatal Force Database — without acknowledging that this source covers only specific incident types, has its own methodological limitations, and may diverge from other available sources. Single-source analyses cannot assess the robustness of findings.

Instead

Use at least two sources for any major comparative finding. Where they agree, the convergence strengthens confidence. Where they disagree, explain the source of divergence (definition differences, coverage differences, verification methodology). Triangulation is a methodological standard in this literature, not an optional supplement.

Equating Disparity With Causation

Writing “the data proves that police are racist” or “the data shows systemic racial bias” based on observational disparity statistics alone. Observational rate disparities — even large, statistically significant ones — do not establish the causal mechanism producing them. This is not a defense of any particular mechanism; it is a statement about what observational data can and cannot establish.

Instead

Describe the disparity precisely: “Black Americans experience police killings at X times the rate of white Americans per capita, a disparity that persists after controlling for [covariates included], though the data cannot distinguish between [mechanism A] and [mechanism B] as explanatory factors.” Precision in causal language is a methodological standard, not a political hedge.

Ignoring the Armed/Unarmed Distinction

Analyzing all police killings as a single homogeneous category when the armed status of the individual is a documented variable in most major databases and fundamentally changes the legal and policy context of the incident. Armed and unarmed killings are not the same outcome for analytical purposes, and treating them identically produces results that cannot be interpreted meaningfully.

Instead

Stratify analysis by armed status where the database includes it — and note the racial disparities within each stratum. The Washington Post Fatal Force Database codes armed status for each incident. Conduct separate analyses for armed and unarmed incidents and compare disparity ratios across strata. Report whether the racial disparity is larger, smaller, or similar between the two strata.
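Stratified disparity ratios can be sketched with plain Python — all counts below are invented, and in a real analysis they would come from the database's armed-status field and Census denominators:

```python
# Hypothetical stratified counts (invented) and population denominators
counts = {
    ("armed", "black"): 180, ("armed", "white"): 380,
    ("unarmed", "black"): 30, ("unarmed", "white"): 25,
}
population = {"black": 41_000_000, "white": 197_000_000}

def rate_per_100k(group, stratum):
    """Per-capita rate within one armed-status stratum."""
    return counts[(stratum, group)] / population[group] * 100_000

# Compare the disparity ratio across strata, as recommended above
for stratum in ("armed", "unarmed"):
    rr = rate_per_100k("black", stratum) / rate_per_100k("white", stratum)
    print(f"{stratum}: disparity ratio = {rr:.2f}")
```

Reporting both stratum-specific ratios, rather than one pooled ratio, is what lets you state whether the disparity is larger, smaller, or similar across strata.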

Missing the Selection Bias Problem

Failing to address that the incidents in any police brutality database are not a random sample of police-civilian encounters — they are a selected subset of encounters that resulted in force or a formal complaint. Any analysis based on these databases is studying the characteristics of force incidents, not the determinants of force from all encounters. This distinction matters for every inference drawn from the analysis.

Instead

Acknowledge selection bias explicitly in the limitations section. State that the database captures only incidents that entered the reporting system — through death records, media coverage, or formal complaint — and that unreported incidents may differ systematically from reported ones in ways that affect your findings. Where possible, cite published estimates of the undercount magnitude.

No Confidence Intervals on Rate Comparisons

Reporting a rate ratio of 2.8 without a confidence interval. Without the interval, there is no way to know whether the estimated disparity is consistent with chance variation. A ratio of 2.8 based on 12 incidents in one group and 5 in another has an extremely wide confidence interval and may not be statistically distinguishable from 1.0. Point estimates without uncertainty quantification are incomplete findings.

Instead

Calculate 95% confidence intervals for all rate ratios. For count-based rate ratios, use the exact Poisson method or bootstrapped confidence intervals. Report whether the interval excludes 1.0 (statistically significant disparity) or includes it (disparity is consistent with chance). Stata, R, and Python all have built-in functions for rate ratio confidence intervals that require minimal code.
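The exact Poisson method mentioned above can be sketched with scipy: conditional on the total count, the group-A count is binomial, so a Clopper-Pearson interval on that proportion converts directly into an exact rate-ratio interval. Counts and denominators here are invented for illustration:

```python
from scipy.stats import beta

# Hypothetical counts and person-time denominators (invented)
a, t_a = 12, 1_500_000   # group A: count, denominator
b, t_b = 5, 2_000_000    # group B: count, denominator

# Conditional on a + b, a ~ Binomial(a + b, p) where
# p = (rate_a * t_a) / (rate_a * t_a + rate_b * t_b).
# Clopper-Pearson exact 95% CI for p:
p_lo = beta.ppf(0.025, a, b + 1)
p_hi = beta.ppf(0.975, a + 1, b)

# Convert the bounds on p back to bounds on the rate ratio
rr = (a / t_a) / (b / t_b)
rr_lo = (p_lo / (1 - p_lo)) * (t_b / t_a)
rr_hi = (p_hi / (1 - p_hi)) * (t_b / t_a)

print(f"rate ratio = {rr:.2f}, exact 95% CI [{rr_lo:.2f}, {rr_hi:.2f}]")
```

With only 12 and 5 incidents, the resulting interval is very wide — exactly the small-count problem described above, made visible by the uncertainty quantification.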

Frequently Asked Questions

Which database should I use if my assignment doesn’t specify one?
For fatal use-of-force analysis, the Washington Post Fatal Force Database and Mapping Police Violence are the two most methodologically documented and most frequently cited in peer-reviewed literature. The Washington Post database is strongest for officer-involved shootings with armed/unarmed and fleeing-status variables. Mapping Police Violence covers a broader definition of police killings including deaths from tasers, vehicles, and in-custody incidents, and has stronger racial disparity data. Use both and cross-reference. For non-fatal use of force, the options are significantly more limited — the FBI National Use-of-Force Data Collection is the only national source, but its low agency participation rate severely limits what conclusions are valid. The BJS Police-Public Contact Survey is the best option for population-level prevalence estimates of force encounters from the civilian perspective.
How do I get Census population data to use as a denominator?
The Census Bureau provides population data by race/ethnicity at the national, state, and county level through the American Community Survey (ACS) and Decennial Census. For annual rate calculations, use the ACS 5-year estimates covering the relevant period — they are available at data.census.gov at no cost and require no account. Download the B02001 table (Race) and B03002 table (Hispanic or Latino Origin by Race) for racial/ethnic population denominators. For sub-county geographic units, use the ACS 5-year estimates at the census tract or block group level. The Census Bureau also provides the Population Estimates Program (PEP) for intercensal annual estimates at the state and county level, which is more appropriate for year-specific annual rates than the ACS when the analysis spans multiple years.
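Once the ACS tables are downloaded, joining them to incident counts is a standard merge. A sketch with invented state-level values follows — real ACS exports have different column names, so treat these frames and fields as placeholders:

```python
import pandas as pd

# Invented example frames -- real exports will have different column names
incidents = pd.DataFrame({
    "state": ["CA", "TX", "NY"],
    "killings": [150, 110, 40],
})
acs_pop = pd.DataFrame({
    "state": ["CA", "TX", "NY", "FL"],
    "population": [39_000_000, 30_000_000, 19_500_000, 22_000_000],
})

# Left-merge keeps every state with incident data; validate catches bad joins
merged = incidents.merge(acs_pop, on="state", how="left", validate="one_to_one")
merged["rate_per_100k"] = merged["killings"] / merged["population"] * 100_000
print(merged)
```

The `validate` argument makes the merge fail loudly if either key column contains duplicates — the kind of silent merge error worth reporting in the data section if cases are lost to incomplete matching.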
My assignment asks me to compare police brutality across countries — what data exists for international comparisons?
International comparative analysis is significantly harder than domestic US analysis because there is no standardized cross-national database of police killings. The OECD and Eurostat do not systematically collect police use-of-force data. Cross-national comparisons typically rely on three approaches: national government statistics where they exist and are publicly accessible (the UK’s Independent Office for Police Conduct publishes annual death-in-custody data; Germany’s Federal Statistical Office publishes some police violence data); academic datasets compiled from national sources by researchers (Amnesty International and Human Rights Watch maintain country-level assessments that can be used as secondary sources, though they are not standardized statistical databases); or journalistic databases that compile cross-national incident data for specific countries. Any international comparison must acknowledge that definitions of “police killing” or “use of force” differ substantially across national legal and reporting systems, making rate comparisons across countries inherently approximate.
My analysis shows a large racial disparity in police killings. How should I write about the cause of the disparity without overclaiming?
State what you can demonstrate from the data and what requires additional evidence. You can state, based on rate calculations: “Black Americans in [jurisdiction/time period] were killed by police at X times the rate of white Americans per 100,000 population.” You can note whether that disparity persists after controlling for [covariates you included] if you ran a regression. What you cannot state from observational rate data alone: that the disparity is caused by racial bias in officer decision-making, or that it is caused entirely by differential crime exposure. Both are possible explanations, and the data you have cannot distinguish between them without encounter-level data and a stronger research design. Writing “the disparity is consistent with racial bias in police decision-making” is defensible if you have controlled for the main structural confounders — writing “the data proves racial bias” is not defensible from any currently available national-level dataset.
How many years of data should I include in a trend analysis?
A minimum of five years is needed for a trend line to be meaningful — with fewer data points, the trend estimate is too sensitive to single-year fluctuations to be interpretable. Ten or more years is preferable for trend analysis, but creates a methodological problem in police brutality research: the most complete databases (Washington Post, Mapping Police Violence) begin in 2013–2015, and earlier data from other sources uses different definitions and methodologies. Comparing rates across methodological discontinuities requires careful handling and explicit acknowledgment. For most student assignments, a five-to-ten year window from a single consistent database is more defensible than a longer window that crosses definitional or coverage changes in the data. State the start and end year of your trend analysis explicitly and explain any methodological constraints on the time window chosen.
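As a minimal trend check over a consistent window — annual counts invented for illustration — a linear fit to log annual rates gives the average annual percent change. This is enough to establish direction; a Poisson trend model with a population offset is preferable for formal inference:

```python
import numpy as np

# Hypothetical annual counts from one consistent database (invented)
years = np.arange(2015, 2023)
counts = np.array([995, 963, 987, 996, 1004, 1021, 1055, 1096])
population = 330_000_000  # or use year-specific PEP estimates as the denominator

rates = counts / population * 100_000
# The slope of log(rate) on year approximates the average annual % change
slope, intercept = np.polyfit(years, np.log(rates), 1)
annual_pct_change = (np.exp(slope) - 1) * 100
print(f"average annual change: {annual_pct_change:+.1f}%")
```

Holding the denominator fixed is a simplification; as noted above, year-specific PEP estimates are the more defensible choice when the window spans several years.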
What statistical software should I use for this analysis?
The choice of software depends on what your institution has available and what your assignment requires. R is free and has robust packages for negative binomial regression (MASS), rate ratio confidence intervals (epitools), and geographic visualization (ggplot2, sf). Stata is common in criminology and public policy programs and handles count regression models cleanly. SPSS can produce descriptive statistics and chi-square tests but is weaker for advanced regression modeling and geographic analysis. Python (pandas, statsmodels, scipy) is increasingly used in data science-oriented criminology courses. Excel is appropriate only for basic descriptive statistics and simple charts — it cannot run negative binomial regression or produce confidence intervals for rate ratios without add-ins. Whatever software you use, cite the version in your methods section.

Need Help With Your Police Brutality Data Analysis Assignment?

Our criminology and statistical analysis writing team works with police use-of-force datasets, comparative analysis frameworks, regression modeling, and APA-formatted results sections — at the methodological level your assignment rubric requires.

Getting the Analysis Right: What Instructors Are Looking For

The methodological sophistication expected in a police brutality data analysis assignment varies by course level — undergraduate courses typically expect correct rate calculation, source evaluation, and basic descriptive comparison; graduate courses expect regression modeling, denominator justification, and explicit engagement with the measurement and causal inference literature. But across both levels, the core evaluative criterion is the same: does the student understand that the numbers in a police brutality database are not self-interpreting facts, but measurements produced by specific collection methods with specific limitations that constrain what conclusions the data can support?

Instructors in criminology and social science methods courses read assignments looking for whether the student knows which questions their data can answer and which it cannot. An analysis that presents a rate ratio with a confidence interval, identifies the source of that rate ratio, explains why population rather than encounter counts was used as the denominator, controls for one or two key confounders, and honestly describes what the data cannot establish will consistently outperform an analysis with more sophisticated methods applied to data that was not evaluated for coverage or validity in the first place.

For direct support with this analysis — whether you need help selecting and merging the right data sources, setting up a regression model appropriate to your research question, or writing an analytically coherent results and limitations section — our statistical analysis assignment team works with criminology data at the undergraduate and graduate levels and is familiar with the methodological standards this topic requires.

Police Brutality Data Analysis Support That Matches Your Assignment

From data source selection and rate calculation through regression modeling and limitations write-up — specialist criminology and statistical analysis support at the level your rubric requires.

Get Assignment Help
Article Reviewed by

Simon

Experienced content lead, SEO specialist, and educator with a strong background in social sciences and economics.

