Basic Econometrics

Research Report Group Assignment

This is a group assignment: you can work alone or with up to three other students (a maximum group size of four). All group members will receive the same marks for the assignment. You must submit an electronic copy of your assignment in Canvas in PDF, DOC or DOCX format; hard copies will not be accepted. Show your tables and calculations, and answer the questions in full sentences. Please make sure your tables of results are neatly formatted, not just copied and pasted from STATA, and that your answers are written in clear sentences. You should write no more than 1000 words in total for this assignment (not including tables and calculations). The numbers of words, tables, graphs and calculations given in parentheses after each question are a guide.


This assignment uses data from the BUPA health insurance call centre. Each observation includes data from one call to the call centre. The variables describe several characteristics of the call (eg the length of the call, the amount of silence in the call), characteristics of the customer (eg state of residence, family type, number of adults and children), and measures of performance (eg net promoter score, sentiment score of the customer). In this assignment we are interested in predicting the net promoter score and the length of the call.

Please use the dataset CallCentre.dta and the associated information file CC_DEFINITIONS_.XLSX to answer these questions. Use the software program STATA 15, available through RMIT MyDesktop, for all data analysis.

  1. Calculate descriptive statistics using the ‘summarize’ command for the variables net_promoter_score, total_silence, total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted, and present the results in a table. Comment on what we learn about these variables from the descriptives. Graph a scatter plot of net_promoter_score against agent_crosstalk_weighted and describe the relationship between these two variables.

(4.5 marks) (100 words, 1 table, 1 graph)
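STATA's ‘summarize’ command reports, for each variable, the number of non-missing observations (Obs), the mean, the standard deviation, the minimum and the maximum. The short Python sketch below is only meant to show what those columns measure; the helper `summarize` and the values are hypothetical, and the actual analysis should still be done in STATA as required.

```python
import statistics

def summarize(name, values):
    """Hypothetical helper mirroring the columns of STATA's summarize output.

    Missing values (None) are dropped first, so Obs counts only non-missing
    observations. The standard deviation is the sample (n - 1) version.
    """
    clean = [v for v in values if v is not None]
    return {
        "Variable": name,
        "Obs": len(clean),
        "Mean": statistics.mean(clean),
        "Std. Dev.": statistics.stdev(clean),
        "Min": min(clean),
        "Max": max(clean),
    }

# Made-up values for illustration only; NOT taken from CallCentre.dta.
nps = [9, 10, None, 3, 8]
row = summarize("net_promoter_score", nps)
print(row)
```

Comparing Obs across variables shows how much data is missing, and comparing the mean with Min and Max gives a first sense of each variable's scale and spread.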

  2. Estimate a multiple linear regression with net_promoter_score as the dependent variable and total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted as the explanatory (independent) variables. Predict the change in net_promoter_score associated with a 0.1 increase in total_silence_weighted, and the change associated with a 0.01 increase in agent_crosstalk_weighted. Assuming this is the correct model specification, can we be sure that total_silence_weighted has a negative effect? [Hint: consider the t-statistic and p-value]

(6 marks) (50 words, 1 table, 2 calculations)
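In a linear regression, the predicted change in the dependent variable from changing one regressor by Δx, holding the others fixed, is the coefficient times Δx; whether we can be "sure" of the sign depends on the t-statistic (coefficient divided by its standard error) and its p-value. The sketch below illustrates both calculations with hypothetical coefficients and standard errors (not estimates from CallCentre.dta); its p-value uses a normal approximation, which is close to STATA's t-based p-value only when the sample is large.

```python
import math

def predicted_change(beta, delta):
    """Change in the fitted value when one regressor changes by `delta`,
    holding the other regressors fixed: beta * delta."""
    return beta * delta

def t_stat_and_p(beta, se):
    """t-statistic for H0: beta = 0 and an approximate two-sided p-value.

    Uses the standard normal approximation 2 * (1 - Phi(|t|)),
    written as erfc(|t| / sqrt(2)).
    """
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Hypothetical values, NOT estimates from CallCentre.dta.
b_silence, se_silence = -2.0, 1.5
b_crosstalk = -15.0

print(predicted_change(b_silence, 0.1))     # 0.1 rise in total_silence_weighted
print(predicted_change(b_crosstalk, 0.01))  # 0.01 rise in agent_crosstalk_weighted
t, p = t_stat_and_p(b_silence, se_silence)
print(round(t, 3), round(p, 3))
```

With these made-up numbers |t| is about 1.33 and p is about 0.18, so a zero effect could not be rejected at the 5% level even though the point estimate is negative.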

  3. Add dummy variables to the regression to control for all of the potential effects of State and Package. Make sure the base category is customers with the “HOSPITAL AND EXTRAS” package in NSW. Carefully interpret the estimated coefficient on the package1 dummy variable you have included. Why is this NOT a very important result?

[Hint: Use the variable labels to include and interpret the correct variables; consider the descriptive statistics of the dummy variables to interpret their importance]

(4.5 marks) (50 words, 1 table)
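Choosing a base category means creating a dummy for every category except the base, so each dummy coefficient measures the difference from the base group and the dummy variable trap is avoided. The mean of a dummy is the share of observations in that category, which is what the hint about descriptive statistics points to. The sketch below uses a hypothetical helper `make_dummies` and made-up state labels; it is not the dataset's own coding.

```python
def make_dummies(values, base):
    """Hypothetical helper: one 0/1 indicator list per non-base category.

    The base category gets no dummy, so its effect is absorbed into the
    intercept and every dummy coefficient is measured relative to it.
    """
    categories = sorted(set(values) - {base})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Made-up observations; labels other than NSW are illustrative only.
states = ["NSW", "VIC", "QLD", "NSW", "VIC"]
dummies = make_dummies(states, base="NSW")
print(dummies)

# The mean of a dummy is the share of observations in that category,
# one way to judge how many customers a coefficient actually applies to.
shares = {c: sum(d) / len(d) for c, d in dummies.items()}
print(shares)
```

A dummy with a very small share can have a sizeable coefficient that nonetheless describes only a handful of customers.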

  4. Include a quadratic specification of the variable “sentiment_score_cust” in the model, along with the existing explanatory variables. Calculate and interpret the marginal effect of a 1-point change in “sentiment_score_cust” when sentiment_score_cust = 1 and when sentiment_score_cust = 4.

(4.5 marks) (50 words, 1 table, 2 calculations)
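With a quadratic term, ŷ = b0 + b1·x + b2·x² + (other terms), the effect of a 1-point change in x depends on the starting value of x. The sketch below shows two common ways to compute it: the exact change in the fitted value from x to x + 1, and the calculus derivative b1 + 2·b2·x. The coefficients are hypothetical, not estimates from CallCentre.dta, and which version the marker expects is not stated in the brief.

```python
def effect_of_one_point_change(b1, b2, x):
    """Exact change in the fitted value when x rises from x to x + 1 in
    y_hat = b0 + b1*x + b2*x**2 (other regressors held fixed):
    [b1*(x+1) + b2*(x+1)**2] - [b1*x + b2*x**2] = b1 + b2*(2*x + 1)."""
    return b1 + b2 * (2 * x + 1)

def derivative_marginal_effect(b1, b2, x):
    """Calculus marginal effect d(y_hat)/dx = b1 + 2*b2*x at the point x."""
    return b1 + 2 * b2 * x

# Hypothetical coefficients, NOT estimates from CallCentre.dta.
b1, b2 = 1.2, -0.1
for x in (1, 4):
    print(x, effect_of_one_point_change(b1, b2, x), derivative_marginal_effect(b1, b2, x))
```

With a negative b2, as in these made-up numbers, the effect of a further 1-point rise shrinks as sentiment_score_cust increases.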

  5. Explain the conditional mean independence assumption and assess its relevance with respect to the explanatory variable “sentiment_score_cust”.

[Hint: Think about factors that may be included in the error term of the regression: the customer’s experience with the company (positive or negative), the general attitude of the customer towards call centre conversations (positive or negative), and whether these may be correlated with sentiment_score_cust]

(3 marks) (100 words)
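One standard way to write conditional mean independence, with X_i the regressor of interest (here sentiment_score_cust), W_i the control variables and u_i the error term, is shown below, together with the single-regressor omitted-variable-bias formula that describes what goes wrong when the error is correlated with X_i. The notation is an assumption and is not taken from the course materials.

```latex
E(u_i \mid X_i, W_i) = E(u_i \mid W_i)
\qquad\qquad
\hat{\beta}_1 \xrightarrow{\;p\;} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}
```

If, for example, a customer's positive past experience with the company sits in u_i and also raises sentiment_score_cust, then ρ_Xu > 0 and the estimated effect of sentiment_score_cust would tend to overstate its true effect.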

  6. Write an executive summary of the findings in questions 2 to 5 on which variables are, and are not, likely to be important drivers of net promoter score.

(1.5 marks) (100 words)


“The rise in energy consumption of rapidly growing developing countries, especially China and India, has accounted for the vast majority of the global increase in energy use in recent years. Non-OECD countries currently account for approximately 60% of global energy demand, which is predicted to rise to 70% by 2040 (International Energy Agency, 2014). This increasing energy use exacerbates environmental problems including global climate change due to greenhouse gas emissions and local environmental problems such as the recent episodes of extreme air pollution in Beijing and other Chinese cities. Besides its environmental impacts, increasing energy use also raises questions of national energy supply security. As the share of world energy use consumed in developing countries increases, it is increasingly important to understand how energy use evolves across the full income continuum from less developed to highly developed countries (van Ruijven et al., 2009).”  Csereklyei and Stern (2015) page 633.

In this part of the home assignment we will explore the drivers of total and sectoral energy use across several developed and developing countries. Please use the dataset “energy_econometrics_data_SIM2060.dta”.

  7. Countries have a keen interest in exploring the drivers of their sectoral energy consumption, including ELECTRICITY USE IN INDUSTRY. Please examine the log of final ELECTRICITY use by INDUSTRY per capita, “ln_elec_indus_pc”.
  (a) Model Design

Present the results of the descriptive statistics in a table (Table 1).

(1 mark)

Design TWO regression models to predict “ln_elec_indus_pc”:

  • One with a linear per capita GDP term (or its log) [Model 1].

Presentation of tables and model adequacy: choose which explanatory variables to include, and whether to include them as dummies, logs, polynomials or interactions, as you feel appropriate. (2 marks)

Interpret the coefficients, including dummies, elasticities or semi-elasticities. (1 mark)

Interpret the statistical significance of these coefficients. (1 mark)

(Subtotal: 4 marks)
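Because the dependent variable is in logs, the coefficients are read as elasticities (log regressor), semi-elasticities (level regressor) or percentage differences (dummy). The sketch below shows the three calculations with hypothetical coefficients, not estimates from the energy dataset; the helper names are illustrative only.

```python
import math

def pct_change_log_log(elasticity, pct_change_x):
    """ln(y) = a + b*ln(x): b is an elasticity, so a pct_change_x % rise in x
    is associated with roughly b * pct_change_x % change in y."""
    return elasticity * pct_change_x

def pct_change_log_level(semi_elasticity, change_x):
    """ln(y) = a + b*x: a change_x unit rise in x is associated with roughly
    100 * b * change_x % change in y (a semi-elasticity)."""
    return 100 * semi_elasticity * change_x

def pct_change_dummy(b):
    """ln(y) = a + b*D: exact % difference in y when D switches from 0 to 1,
    100 * (exp(b) - 1); close to 100 * b only when b is small."""
    return 100 * (math.exp(b) - 1)

# Hypothetical coefficients, NOT estimates from energy_econometrics_data_SIM2060.dta.
print(pct_change_log_log(0.8, 10))    # 10% higher GDP per capita
print(pct_change_log_level(0.02, 1))  # one-unit rise in a level regressor
print(pct_change_dummy(0.25))         # dummy switching from 0 to 1
```

For a dummy coefficient of 0.25 the exact difference is about 28.4%, noticeably larger than the 25% approximation.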

  • One with a quadratic per capita GDP term (or its log) [Model 2].

Presentation of tables and model adequacy: choose which explanatory variables to include, and whether to include them as dummies, logs, polynomials or interactions, as you feel appropriate. (1 mark)

What are the major differences compared with Model 1? (1 mark)

Which model do you think is more appropriate (Model 1 or Model 2)? How do you explain the quadratic model? (2 marks)

(Subtotal: 4 marks)
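In a model with a quadratic in log GDP per capita, ln(y) = a + b1·ln(x) + b2·ln(x)² + (other terms), the income elasticity is no longer a single number: it equals b1 + 2·b2·ln(x) and changes with income, and the fitted curve turns at ln(x*) = -b1/(2·b2). The sketch below uses hypothetical coefficients (not estimates from the energy dataset) and a hypothetical argument name `ln_gdp_pc`, since the dataset's GDP variable name is not given here.

```python
import math

def income_elasticity(b1, b2, ln_gdp_pc):
    """ln(y) = a + b1*ln(x) + b2*ln(x)**2: the income elasticity
    d ln(y) / d ln(x) = b1 + 2*b2*ln(x) varies with the income level."""
    return b1 + 2 * b2 * ln_gdp_pc

def turning_point(b1, b2):
    """Income level at which the fitted curve peaks (b2 < 0) or bottoms out
    (b2 > 0): ln(x*) = -b1 / (2*b2). Returns (ln_x_star, x_star)."""
    ln_x_star = -b1 / (2 * b2)
    return ln_x_star, math.exp(ln_x_star)

# Hypothetical coefficients, NOT estimates from energy_econometrics_data_SIM2060.dta.
b1, b2 = 2.0, -0.1
for ln_x in (7.0, 9.0):
    print(ln_x, income_elasticity(b1, b2, ln_x))
print(turning_point(b1, b2))
```

With b2 < 0, as here, the income elasticity falls as countries get richer; whether the turning point lies inside the observed income range is worth checking before reading it as a peak in industrial electricity use.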

  (b) Discuss how you have designed your models with reference to the Gauss-Markov assumptions, and whether these assumptions are likely to be met. (2 marks)
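The rubric splits the Gauss-Markov assumptions into 1-3 and 4-5. One common numbering, MLR.1 to MLR.5 in Wooldridge's Introductory Econometrics, is sketched below; it is an assumption that the course uses the same order.

```latex
\begin{aligned}
&\text{MLR.1 (linear in parameters): } y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i \\
&\text{MLR.2 (random sampling): } \{(x_{i1}, \dots, x_{ik}, y_i) : i = 1, \dots, n\} \text{ is a random sample} \\
&\text{MLR.3 (no perfect collinearity): no } x_j \text{ is constant or an exact linear function of the other regressors} \\
&\text{MLR.4 (zero conditional mean): } E(u_i \mid x_{i1}, \dots, x_{ik}) = 0 \\
&\text{MLR.5 (homoskedasticity): } \operatorname{Var}(u_i \mid x_{i1}, \dots, x_{ik}) = \sigma^2
\end{aligned}
```

Under MLR.1 to MLR.5, OLS is the best linear unbiased estimator; MLR.4 is the assumption most directly threatened by omitted drivers of industrial electricity use that are correlated with income per capita.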

Interpret the results of THREE of your explanatory variables, including income per capita, that you consider to be the key drivers of per capita industrial electricity consumption. (3 marks)

(Total: 14 marks) (550 words, 3 tables, 4 calculations)

There will be up to 2 additional marks awarded for presentation of your answers (neat formatting of tables and clear expression of answers in full sentences).

Rubric for marking

1. Descriptive statistics: A) present descriptive statistics table; B) comment on descriptives; C) present and comment on graph. 4.5 pts (1.5 marks each)
2. Multiple linear regression: A) estimate regression model; B) present table; C) two predictions; D) comment on total_silence_weighted effect. 6.0 pts (1.5 marks each)
3. Dummy variables: A) include dummy variables correctly; B) comment on package1 coefficient; C) why not an important result. 4.5 pts (1.5 marks each)
4. Quadratic specification: A) include quadratic specification correctly and present results in table; B) calculate marginal effect when sentiment_score_cust = 1; C) calculate marginal effect when sentiment_score_cust = 4. 4.5 pts (1.5 marks each)
5. Conditional mean independence: A) explain conditional mean independence assumption; B) discuss with reference to the variable “sentiment_score_cust”. 3.0 pts (1.5 marks each)
6. Executive summary. 1.5 pts
7a. Model design: A) descriptive statistics table (1 pt); B) linear model with explanations (4 pts); C) quadratic model with explanation (4 pts). 9 pts
7b. Model design: A) discuss Gauss-Markov assumptions 1-3; B) discuss Gauss-Markov assumptions 4-5; C) prediction 1; D) prediction 2; E) prediction 3. 5 pts
8. Neat formatting of tables. 1.0 pts
9. Clear expression of answers in full sentences. 1.0 pts