## To start, you can think of a research question, theory, and hypothesis

Think of the assignment as writing a report for an NGO (or a government ministry, company, academic journal, etc.). You are in charge of the research process from start to finish, except that you are using what we call ‘secondary’ data – data gathered by someone else.

The datasets you may use are the Quality of Government dataset, the American National Election Study 2016, or the Canadian Election Study 2015.
All are posted on the course website in the Datasets Module.

To start, you can think of a research question, theory, and hypothesis (and then check if there are suitable variables) OR you can look for a dependent variable in the codebooks and then generate a question, theory, and hypothesis. In either case, your theory and hypothesis, which aim to explain variation in your dependent variable, will lead you to look for a variable that measures what we’ll call your primary independent variable.

After that, identify two other independent variables that might be ‘confounding’ variables, that is, variables you need to control for. Make sure you know what a confounding variable is.

At the end of the assignment you will report the results of a multiple regression using those three variables to explain the dependent variable.

But you must work up to that multiple regression on a pathway that mirrors the progression of the course. Do the following sections:
(You may number the sections in your assignment. You may use headings but don’t waste vertical space.)

1. You must state your research question concisely in terms of concepts (not variables), like we did in week 2.

2. You should present a maximum of four sentences of theory about how ONE of the concepts – the one you’re focussing on – is related to your dependent variable (ideally it would be a causal relationship). (Once this has a measurement, we’ll call this your primary independent variable).

3. Identify two other independent variables (at the level of concepts) that you think you need to control for in order to get an accurate estimate of the relationship between your primary IV and the DV. Both of these additional independent variables should plausibly be confounding variables, and they should not be just slightly different measures of the same concept, or the same concept as the primary IV.
For one or both of these two other variables, explain why you think you really need to control for it in order to get a good estimate of the relationship between your primary independent variable and your dependent variable. How might it be related to both the primary independent and the dependent variable? [Both additional independent variables should plausibly be confounding variables, but you only need to explain why for one variable].

4. You should include a clearly stated hypothesis about the relationship between your primary independent variable and the dependent variable. (Put the hypothesis in a separate, indented paragraph in italics.)

5. You must separate the concepts involved in your variables (dependent and three independent) from the measurements of those variables. Write about both concepts and measures for all four variables in an integrated way so your audience knows what variables you’re working with and what they measure. You should smoothly and very concisely mention each concept and how it is measured. This step can be a challenge. In some cases you need to abstract away from the measure to identify the concept. For instance, voter turnout is one measure of the following concepts: political participation, regime satisfaction, and citizen inclusion in politics (among others). Another thing, and I can’t believe I have to say this: you cannot copy directly from the codebook without attribution. If you use a word-for-word statement, put it in quotation marks and cite the codebook. Better yet, rephrase it in your own words.

6. You should first have a look at descriptive statistics (tabulate, summarize, histograms, etc.) and make sure you’ve dealt appropriately with missing data. In some cases you will need to spend some time recoding variables. It depends, of course, on the variables you choose. (For instance, if you use a survey question with ordinal values but their numeric codes are 1,3,5,7, you may want to recode it so that a ‘one unit change’ is more meaningful.)

7. You should then present some descriptive statistics to give your audience a feel for how your variables are distributed in the dataset. You must decide which descriptive statistics are appropriate given the levels of measurement of the variables. If you want to use graphics, do so sparingly and only to show the distribution of one of your variables.

8. Then do some correlational analysis and present your results.

   1. You should start with a bivariate look at the relationship between your primary independent variable and the dependent variable. That might be a crosstab, or a difference of means if it’s a binary independent variable, or it could be a scatterplot and bivariate regression.
   1. Report these results and include a table or graphic to give us a sense of the two-variable relationship.

   1. Then you should run a multiple regression of your dependent variable on all three of your independent variables.[1] Report the multiple regression results and interpret them in the text. Connect your results to your hypothesis.

9. Conclude with a summary ‘answer’ to your question in prose. Short and to the point. Not the usual essay conclusion, please.
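
The recoding example in step 6 can be sketched as follows. The course itself works in Stata, so this is only a Python illustration of the logic, not the commands you would type. The sample responses and the -9 missing-data code are made up for the example; check your codebook for the real codes.

```python
# Hypothetical survey responses coded 1, 3, 5, 7.
# -9 is an ASSUMED missing-data code; real codebooks differ.
raw_responses = [1, 7, 3, -9, 5, 3, 1]

# Map the unevenly spaced codes onto consecutive integers so that
# a one-unit change means moving up exactly one response category.
RECODE_MAP = {1: 1, 3: 2, 5: 3, 7: 4}


def recode(value):
    """Return the recoded value, or None for missing or unrecognized codes."""
    return RECODE_MAP.get(value)


recoded = [recode(v) for v in raw_responses]
print(recoded)  # [1, 4, 2, None, 3, 2, 1]
```

Note that the missing-data code becomes `None` rather than being treated as a real response, which is the kind of check step 6 asks you to make before computing any statistics.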

Submit the paper by uploading it after clicking into Assignment 3 on the Canvas site.

The final product will be a Word or PDF document that is a maximum of 5 pages. The text must be line-spaced at 1.5 lines and the minimum font size is 11-point. That is, your reports need to be concise. In fact, I’d like you to do it really efficiently. Too much essay-writing makes many students undisciplined writers. Concise and focused writing will serve you well when you move on to careers and/or graduate school.

The TA will offer some guidance. And in class sometime in the last two weeks, I’ll do one of these “live”, with your advice.

Important Points: (Read all of these. They are… important)

• First, a really important point: You don’t need to worry about whether or not the evidence you find is consistent with your hypotheses. Your p-values could be huge but that’s ok. Indeed, when it comes to evaluating scientific research, the quality of the research process is more important than the actual results. As long as you do the right things with the data and the words you use to describe it and interpret the results, you will do well.
• One important difference between this assignment and a normal research study is that you do not need to consult or summarize existing research. You do not need to read or cite ANY actual political science literature on these topics. If you consult existing research you will be wasting your time. Even if you intend to use it to look for ideas or for smart ways of approaching a question, the real political science research on the topic you choose will probably make you much more confused and you will suffer a blow to your confidence because their analytic techniques are likely much more sophisticated. So simply do it with your existing knowledge, within the confines of the available data, and with the methodological tools you have from the course. This is the right way to try out your skills.
• If your dependent variable is an ‘index’ variable in the Quality of Government data set, make sure your independent variables are not part of the index. For instance, the ‘Hobbes index’ includes a measure of national income. It wouldn’t make sense to use national income as an independent variable.
• A small fraction of your grade is based on the novelty and thoughtfulness of the relationships you choose to explore. For instance, you can still earn a good grade if you explore how age, gender, and income are related to voter turnout, but you can earn a higher grade if you consider some less obvious relationships. The key here is to demonstrate that you spent time thinking about possible causal factors rather than picking the first few that come to mind. Don’t make it totally wacky – but try to avoid being boring.
• You should paste the Stata table for your multiple regression at the very end of your report as an appendix. It will not count in the page limit. Either paste it as an image or paste it as text and then change the font to Courier so it looks real pretty, like regression always does. :) If you highlight the results in Stata and right-click on them (two-finger click on a Mac) you’ll see four copy options. See what works on your computer; I can’t tell you in advance.
• Clear writing in the discussion of your results is a substantial part of the grade. Look at the Canvas learning modules, the assignment/test answer sheets, the textbook, and the lab workbook for examples of how to clearly, concisely, and correctly discuss results. For instance, sometimes a one-unit change in the independent variable in a regression isn’t the best way to tell us about the relationship (e.g. a one dollar increase in household income?).
• Some of the posted datasets do not contain all of the variables included in the codebook. There used to be a maximum number of variables that ‘Small Stata’ (which some people rented) could handle. If you use the small data sets, you are obviously restricted to the variables in that data set. If there are variables in the codebook you would like to use, you can do so by opening the ‘large’ versions of the data sets.
• There is almost no instance where you should recode a continuous variable into a binary or ordinal variable. You may need to recode a categorical variable (like religion) into one or more binary variables if you wish to include it in your regression. If you are unsure if a particular recoding of a variable is a good idea, check with me or the TA.
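
Two of the points above involve transforming a variable before the regression: rescaling a variable so that a one-unit change is meaningful, and turning a categorical variable into binary variables. Here is a minimal Python sketch of both ideas. The course itself uses Stata, and the data, the $10,000 unit, and the helper name `make_indicators` are all invented for illustration. Leaving one category out as the reference group is a common convention, not a requirement stated in this assignment.

```python
# Hypothetical household incomes in dollars.
incomes = [42000, 58500, 120000]

# Express income in units of $10,000 so that a one-unit change in the
# regression is a substantively meaningful difference.
income_10k = [dollars / 10_000 for dollars in incomes]

# Hypothetical categorical variable (not from any course dataset).
religion = ["Catholic", "None", "Protestant", "None"]


def make_indicators(values, reference):
    """Build one 0/1 variable per category, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


indicators = make_indicators(religion, reference="None")
print(indicators)
```

With this coding, each binary variable's coefficient is read as a comparison with the omitted reference category ("None" here).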

[1] Notice that we say run a regression of Dependent ON independent. That seems backwards. But the sentence is written like the equation Dependent [Y] = a + B(Independent [X]). That’s just how we say that and write the equation. Unfortunately, this is linguistically the opposite of how we ask our questions and present our theories where we say X “influences” Y.
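
Written out in the footnote's notation, the bivariate regression and the multiple regression from step 8 might look like the sketch below. The subscripts are labels introduced here for illustration: X_1 stands for the primary independent variable, and X_2 and X_3 for the two control variables.

```latex
% Bivariate regression, as written in the footnote
\[ Y = a + B X \]

% Multiple regression: primary IV (X_1) plus two controls (X_2, X_3)
\[ Y = a + B_1 X_1 + B_2 X_2 + B_3 X_3 \]
```

In the second equation, B_1 is the coefficient that speaks to your hypothesis: the estimated relationship between the primary independent variable and Y, holding the two controls constant.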