
What does a jagged / v-shaped residuals vs fitted plot mean? I am doing a multiple linear regression with three explanatory variables

I used a linear regression model with the data and indicators below; there are 217 data points:

    emissions.urban <- lm(EN.ATM.METH.KT.CE ~ AG.LND.AGRI.K2 + AG.LND.FRST.K2 + SP.URB.TOTL, data = wbcc)
    summary(emissions.urban)

I've posted the data below. The variables are:

  • AG.LND.AGRI.K2 = Agricultural land (sq. km)
  • AG.LND.FRST.K2 = Forest area (sq. km)
  • SP.URB.TOTL = Urban population
  • EN.ATM.METH.KT.CE = Methane emissions (kt of CO2 equivalent)

iso3c country AG.LND.AGRI.K2 AG.LND.FRST.K2 SP.URB.TOTL EN.ATM.METH.KT.CE
ABW Aruba 20 4.2 46654 NA
AFG Afghanistan 379190 12084.4 10131490 81510
AGO Angola 69524.9 666073.8 21962884 35520
ALB Albania 11740.81 7889 1762579 3160
AND Andorra 188.3 160 67928 50
ARE United Arab Emirates 3817.5 3173 8609395 52960
ARG Argentina 1487680 285730 41796990 117850
ARM Armenia 16773 3284.7 1876112 2430
ASM American Samoa 49 171.3 48106 NA
ATG Antigua and Barbuda 90 81.2 23927 200
AUS Australia 3588950 1340051 22152761 139070
AUT Austria 26528.3 38991.5 5238680 6660
AZE Azerbaijan 47795 11317.7 5701802 43600
BDI Burundi 20330 2796.4 1629988 2140
BEL Belgium 13540 6893 11334006 8200
BEN Benin 39500 31351.5 5869446 5530
BFA Burkina Faso 121000 62164 6397866 15600
BGD Bangladesh 92023 18834 62873466 83790
BGR Bulgaria 50300 38930 5242987 6910
BHR Bahrain 86 7 1523019 14850
BHS Bahamas, The 140 5098.6 327359 230
BIH Bosnia and Herzegovina 22110 21879.1 1608256 3380
BLR Belarus 84530 87676 7470497 16730
BLZ Belize 1720 12770.5 183005 530
BMU Bermuda 3 10 63903 NA
BOL Bolivia 377270 508337.6 8185478 25050
BRA Brazil 2368788.01 4966196 185081854 416280
BRB Barbados 100 63 89634 2350
BRN Brunei Darussalam 144 3800 342330 8830
BTN Bhutan 5130 27250.8 326515 860
BWA Botswana 258616 152547 1666761 4560
CAF Central African Republic 50800 223030 2038064 23200
CAN Canada 581990 3469281 30997832 93980
CHE Switzerland 15100.756 12691.1 6383962 4790
CHI Channel Islands 92.2 10.2 53832 NA
CHL Chile 156930 182107 16770077 13480
CHN China 5285287 2199781.8 861289359 1238630
CIV Cote d'Ivoire 212000 28367.1 13639151 6560
CMR Cameroon 97500 203404.8 15279799 14960
COD Congo, Dem. Rep. 315000 1261552.4 40874034 38620
COG Congo, Rep. 106280 219460 3742867 3540
COL Colombia 494920 591419.1 41431388 76700
COM Comoros 1310 329.2 255487 270
CPV Cabo Verde 790 457.2 370577 120
CRI Costa Rica 17825 30348.7 4114567 4820
CUB Cuba 63000 32420 8743468 12690
CUW Curacao NA 0.7 138059 NA
CYM Cayman Islands 27 127.2 65720 NA
CYP Cyprus 1309.47 1725.3 806771 740
CZE Czech Republic 35230 26770.9 7923709 12430
DEU Germany 166450 114190 64472284 53370
DJI Djibouti 17020 58 771254 690
DMA Dominica 250 478.7 51178 50
DNK Denmark 26320 6284.4 5138400 7140
DOM Dominican Republic 24290 21441 8953860 9200
DZA Algeria 413588.47 19490 32332690 49550
ECU Ecuador 54480 124978.3 11320846 20350
EGY Egypt, Arab Rep. 38359.688 449.8 43781728 56870
ERI Eritrea 75920 10552.6 1149669 3730
ESP Spain 261833.239 185721.7 38264801 39940
EST Estonia 10040 24384 921477 1050
ETH Ethiopia 379030 170685 24941349 103110
FIN Finland 22720 224090 4729705 4240
FJI Fiji 4250 11400.2 513187 670
FRA France 286601 172530 54570334 58340
FRO Faroe Islands 30 0.8 20718 NA
FSM Micronesia, Fed. Sts. 220 644.2 26378 60
GAB Gabon 22126.4 235306 2005203 1120
GBR United Kingdom 173508.616 31900 56395647 51210
GEO Georgia 23718 28224 2208084 5210
GHA Ghana 147827.4 79857.1 17820023 21350
GIB Gibraltar NA 0 33691 NA
GIN Guinea 145000 61890 4842717 17830
GMB Gambia, The 6050 2426.7 1512397 1700
GNB Guinea-Bissau 8151.1 19800.1 869776 1500
GNQ Equatorial Guinea 2840 24484.2 1025582 12230
GRC Greece 61036 39018 8541900 9670
GRD Grenada 80 177 41111 2030
GRL Greenland 2431.1 2.2 49198 NA
GTM Guatemala 38560 35278 8738685 11750
GUM Guam 180 280 160239 NA
GUY Guyana 12512.5 184153.4 210688 1540
HKG Hong Kong SAR, China 50 NA 7481800 NA
HND Honduras 33560 63592.6 5780230 8150
HRV Croatia 14840 19391.1 2329285 3820
HTI Haiti 18400 3473 6509478 4730
HUN Hungary 52960 20530.1 7014174 7120
IDN Indonesia 623000 921332 154926514 287500
IMN Isle of Man 403 34.6 44980 NA
IND India 1796740 721600 481980332 666510
IRL Ireland 45160 7820.2 3179292 16820
IRN Iran, Islamic Rep. 459540 107518.7 63728813 149690
IRQ Iraq 92500 8250 28514939 16750
ISL Iceland 18720 513.5 344066 530
ISR Israel 6233 1400 8533651 12070
ITA Italy 124050 95661.3 42306582 43670
JAM Jamaica 4440 5968.9 1667459 780
JOR Jordan 10218 975 9327507 6300
JPN Japan 44200 249350 115494817 21110
KAZ Kazakhstan 2160365 34546.8 10815873 41360
KEN Kenya 276300 36110.9 15053275 40250
KGZ Kyrgyz Republic 105413 13153.8 2429400 4990
KHM Cambodia 55660 80683.7 4051341 20310
KIR Kiribati 340 11.8 66405 20
KNA St. Kitts and Nevis 60 110 16406 80
KOR Korea, Rep. 16520 62870 42156641 25530
KWT Kuwait 1500 62.5 4270563 6080
LAO Lao PDR 23940 165955 2640299 7610
LBN Lebanon 6580 1433.3 6069524 3250
LBR Liberia 19540.4 76174.4 2634493 6210
LBY Libya 153500 2170 5544510 37790
LCA St. Lucia 106 207.7 34598 270
LIE Liechtenstein 51.6 67 5498 20
LKA Sri Lanka 28116 21130.2 4101702 10030
LSO Lesotho 24333 345.2 621853 2320
LTU Lithuania 29470 22010 1901682 3150
LUX Luxembourg 1315.59 887 578234 540
LVA Latvia 19380 34107.9 1299043 1940
MAC Macao SAR, China NA NA 649342 NA
MAF St. Martin (French part) NA 12.4 NA NA
MAR Morocco 300690 57424.9 23450016 17670
MCO Monaco NA 0 39244 NA
MDA Moldova 22571 3865 1121710 3310
MDG Madagascar 408950 124298.1 10670457 17470
MDV Maldives 79 8.2 219833 130
MEX Mexico 1068910 656920.8 104088701 144610
MHL Marshall Islands 86 94 46049 30
MKD North Macedonia 12640 10014.9 1218402 2520
MLI Mali 412010 132960 8891939 23290
MLT Malta 103.8 4.6 497676 220
MMR Myanmar 128890 285438.9 16943754 65790
MNE Montenegro 2568 8270 419585 820
MNG Mongolia 1134330 141727.8 2250777 17860
MNP Northern Mariana Islands 30 243.6 52836 NA
MOZ Mozambique 414138.32 367437.6 11587640 16850
MRT Mauritania 396610 3128 2572517 6830
MUS Mauritius 860 387.7 515916 1930
MWI Malawi 56500 22417 3333777 11020
MYS Malaysia 85710 191140.4 24973604 46580
NAM Namibia 388100 66389 1322115 4510
NCL New Caledonia 1840.3 8380.2 194500 NA
NER Niger 466000 10797 4024595 29860
NGA Nigeria 691234.5 216269.5 107106007 127900
NIC Nicaragua 50650 34075.3 3909282 9830
NLD Netherlands 18220 3695 16087009 17260
NOR Norway 9862.97 121800 4463566 4850
NPL Nepal 41210 59620.3 5995190 30800
NRU Nauru 4 0 10834 0
NZL New Zealand 104670 98925.9 4408037 32530
OMN Oman 14588.9 25 4405789 5460
PAK Pakistan 363000 37259 82094635 151020
PAN Panama 22590 42138.4 2951905 5390
PER Peru 236087 723303.7 25815966 31410
PHL Philippines 124400 71885.9 51950201 67660
PLW Palau 43 414.1 14652 20
PNG Papua New Guinea 11900 358557.6 1193981 11310
POL Poland 145120 94830 22786800 47540
PRI Puerto Rico 1689 4963.3 2989009 NA
PRK Korea, Dem. People's Rep. 26300 60300.9 16081083 18710
PRT Portugal 35739.9 33120 6833619 11320
PRY Paraguay 218190 161022.6 4435221 29070
PSE West Bank and Gaza 2969.200134 101.4 3685020 NA
PYF French Polynesia 455 1494.6 174090 NA
QAT Qatar 670 0 2859020 8110
ROU Romania 134140 69290.5 10451921 23780
RUS Russian Federation 2154940 8153116 107723564 849570
RWA Rwanda 18117 2760 2257829 2910
SAU Saudi Arabia 1736290 9770 29343564 44170
SDN Sudan 681861.6 183595.5 15458183 58850
SEN Senegal 88780 80681.6 8057514 10750
SGP Singapore 6.6 155.7 5685807 4150
SLB Solomon Islands 1170 25229.7 169453 410
SLE Sierra Leone 39490 25348.8 3423961 4610
SLV El Salvador 14791.2 5838.8 4763725 3990
SMR San Marino 23 10 33089 NA
SOM Somalia 441250 59800 7333290 19430
SRB Serbia 34640 27226.5 3899416 12400
SSD South Sudan 285332 71570 2261021 34170
STP Sao Tome and Principe 440 519 162955 30
SUR Suriname 840 151962.9 388053 1370
SVK Slovak Republic 18890 19259 2934665 4360
SVN Slovenia 6120 12378.3 1157547 1980
SWE Sweden 30090 279800 9108648 4580
SWZ Eswatini 12220 4975.6 280423 1500
SXM Sint Maarten (Dutch part) NA 3.7 40812 NA
SYC Seychelles 15.5 337 56661 90
SYR Syrian Arab Republic 139210 5220.8 9708489 12770
TCA Turks and Caicos Islands 10 105.2 36242 NA
TCD Chad 502380 43130 3863362 53990
TGO Togo 38200 12092.7 3543299 3450
THA Thailand 221100 198730 35898129 84140
TJK Tajikistan 47277 4238 2623424 5520
TKM Turkmenistan 338380 41270 3167338 49580
TLS Timor-Leste 3800 9211 412936 5280
TON Tonga 350 89.5 24415 100
TTO Trinidad and Tobago 540 2281.9 744725 1280
TUN Tunisia 97430 7027.3 8221976 6420
TUR Turkey 378020 222203.6 64186247 47400
TUV Tuvalu 18 10 7549 10
TZA Tanzania 396500 457450 21042571 62650
UGA Uganda 144150 23379 11414209 33250
UKR Ukraine 413290 96900 30721277 62950
URY Uruguay 140159 20310 3317930 20940
USA United States 4058103.538 3097950 272364755 622590
UZB Uzbekistan 255777 36896.6 17258430 105930
VCT St. Vincent and the Grenadines 70 285.4 58837 70
VEN Venezuela, RB 215000 462309 25102966 72340
VGB British Virgin Islands 70 36.2 14669 NA
VIR Virgin Islands (U.S.) 40 199.1 101974 NA
VNM Vietnam 121688 146430.9 36346227 87750
VUT Vanuatu 1870 4423 78400 510
WSM Samoa 757 1616.7 35494 300
XKX Kosovo 5700 NA NA NA
YEM Yemen, Rep. 233877 5490 11306428 8590
ZAF South Africa 963410 170500.9 39946775 45140
ZMB Zambia 238360 448140.3 8204576 17870
ZWE Zimbabwe 162000 174445.8 4792105 11850
Natalie
  • You don't provide much information, and it's hardly helpful to make guesses about what could be going on. Instead explain your problem, the data, what is the outcome variable, what are the three predictors, what is the model that you've fitted, etc.... – dipetkov Oct 20 '22 at 18:01
  • Thanks for posting the data. It seems clear that CPV Cabo Verde needs editing before the data can be used. Two columns have been mushed into one. (The total land area of the country is 4033 sq. km according to Wikipedia.) – Nick Cox Oct 22 '22 at 17:16
  • There are many possible analyses of these data. I fixed Cabo Verde, as above, and ignored missing values. My suggestions: They are all essentially size variables for countries covering a wide range. There are some zeros. As a brute force method that won't appeal universally, I pushed them all (outcome and predictors) through log1p() $= \ln(1 + v)$ and then plain regression works well enough and the residual versus fitted plot looks very well behaved. – Nick Cox Oct 22 '22 at 17:29
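The `log1p()` suggestion in the last comment could be sketched roughly as follows. This is only an illustration under stated assumptions: `wbcc_small` and `fit_log1p` are hypothetical names, and the data frame holds just twelve rows copied from the posted table rather than all 217, so the coefficients mean little in themselves.

```r
# Twelve rows copied from the posted table, for illustration only.
# A real analysis would use every complete row of wbcc.
wbcc_small <- data.frame(
  iso3c = c("AFG", "ARG", "AUS", "BRA", "CHN", "IND",
            "NRU", "QAT", "RUS", "USA", "ZWE", "KIR"),
  AG.LND.AGRI.K2 = c(379190, 1487680, 3588950, 2368788.01, 5285287, 1796740,
                     4, 670, 2154940, 4058103.538, 162000, 340),
  AG.LND.FRST.K2 = c(12084.4, 285730, 1340051, 4966196, 2199781.8, 721600,
                     0, 0, 8153116, 3097950, 174445.8, 11.8),
  SP.URB.TOTL = c(10131490, 41796990, 22152761, 185081854, 861289359, 481980332,
                  10834, 2859020, 107723564, 272364755, 4792105, 66405),
  EN.ATM.METH.KT.CE = c(81510, 117850, 139070, 416280, 1238630, 666510,
                        0, 8110, 849570, 622590, 11850, 20)
)

# log1p(v) = log(1 + v), so the zeros (Nauru, Qatar) stay finite.
fit_log1p <- lm(
  log1p(EN.ATM.METH.KT.CE) ~
    log1p(AG.LND.AGRI.K2) + log1p(AG.LND.FRST.K2) + log1p(SP.URB.TOTL),
  data = wbcc_small
)
summary(fit_log1p)

# Fitted values and residuals on the transformed scale.
res_vs_fit <- data.frame(fitted = fitted(fit_log1p), residual = resid(fit_log1p))
print(res_vs_fit)
```

Calling `plot(res_vs_fit$fitted, res_vs_fit$residual)` would then give the residuals vs fitted view on the log1p scale.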

2 Answers

A dip (or hump) in the residuals curve is a sign of non-linearity. It means that you might get a better model using some quadratic terms.

But that is not the most striking issue shown by your graph. The very non-uniform distribution of the predicted values is more important. The few very large values will make a big impact on the regression coefficients because of the use of least-squares as the loss function.

There are several ways of addressing this:

  • ignore very large values as "outliers"
  • use an L1 loss function
  • transform the value to be predicted before making the model

In this case the third option looks best - if you make a linear regression for $\log y$ instead of $y$ you might get much better results.
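As a rough sketch of that third option, the same model can be fitted to $y$ and to $\log y$ and the spread of fitted values compared. The names `wbcc_small`, `positive`, `fit_raw` and `fit_logy` are hypothetical, and the twelve rows are copied from the posted table purely for illustration, not as a substitute for the full data.

```r
# Twelve rows copied from the posted table, for illustration only.
wbcc_small <- data.frame(
  iso3c = c("AFG", "ARG", "AUS", "BRA", "CHN", "IND",
            "NRU", "QAT", "RUS", "USA", "ZWE", "KIR"),
  AG.LND.AGRI.K2 = c(379190, 1487680, 3588950, 2368788.01, 5285287, 1796740,
                     4, 670, 2154940, 4058103.538, 162000, 340),
  AG.LND.FRST.K2 = c(12084.4, 285730, 1340051, 4966196, 2199781.8, 721600,
                     0, 0, 8153116, 3097950, 174445.8, 11.8),
  SP.URB.TOTL = c(10131490, 41796990, 22152761, 185081854, 861289359, 481980332,
                  10834, 2859020, 107723564, 272364755, 4792105, 66405),
  EN.ATM.METH.KT.CE = c(81510, 117850, 139070, 416280, 1238630, 666510,
                        0, 8110, 849570, 622590, 11850, 20)
)

# log(0) is -Inf, which lm() rejects, so rows with zero emissions
# (Nauru here) have to be dropped or handled some other way.
positive <- subset(wbcc_small, EN.ATM.METH.KT.CE > 0)

# Original specification: response on its raw scale.
fit_raw <- lm(EN.ATM.METH.KT.CE ~ AG.LND.AGRI.K2 + AG.LND.FRST.K2 + SP.URB.TOTL,
              data = positive)

# Same predictors, log-transformed response.
fit_logy <- lm(log(EN.ATM.METH.KT.CE) ~ AG.LND.AGRI.K2 + AG.LND.FRST.K2 + SP.URB.TOTL,
               data = positive)

# Compare how widely the fitted values are spread on each scale.
print(range(fitted(fit_raw)))
print(range(fitted(fit_logy)))
```

Dropping the zero rows is a choice made here for simplicity; whether that is acceptable depends on how many zeros the full data contain.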

Nick Cox
chrishmorris

I want to add to @chrishmorris's helpful answer, although I am never in favour of dropping outliers just because they are awkward, and I am much more in favour of working on a logarithmic scale.

It is all too easy to over-interpret a "smooth" which for high fitted values is summarizing only a very few residuals and utterly dependent on what is seen within a smoothing window, even if a data point is down-weighted somehow as it enters or leaves a window. So, the V shape is probably dependent in the right-hand half of the plot on just about the 5 residuals visible as individuals.

Two rather different stories are broadly consistent with each other and the data, on this information:

  1. You would be better off with transforming $y$ or perhaps even better off with a generalized linear model with logarithmic link, such as Poisson regression.

  2. The regression is doing a fair job of following the main trend of the data, but uncertainty is bound to accompany severe skewness of the response, and there may be nonlinearity better modelled otherwise.
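The first story could be tried along these lines, as a minimal sketch rather than a recommended final model. The names `wbcc_small` and `fit_pois` are hypothetical, the twelve rows are copied from the posted table for illustration only, and putting the predictors through `log1p()` is an assumption borrowed from the comment thread, not something this answer prescribes.

```r
# Twelve rows copied from the posted table, for illustration only.
wbcc_small <- data.frame(
  iso3c = c("AFG", "ARG", "AUS", "BRA", "CHN", "IND",
            "NRU", "QAT", "RUS", "USA", "ZWE", "KIR"),
  AG.LND.AGRI.K2 = c(379190, 1487680, 3588950, 2368788.01, 5285287, 1796740,
                     4, 670, 2154940, 4058103.538, 162000, 340),
  AG.LND.FRST.K2 = c(12084.4, 285730, 1340051, 4966196, 2199781.8, 721600,
                     0, 0, 8153116, 3097950, 174445.8, 11.8),
  SP.URB.TOTL = c(10131490, 41796990, 22152761, 185081854, 861289359, 481980332,
                  10834, 2859020, 107723564, 272364755, 4792105, 66405),
  EN.ATM.METH.KT.CE = c(81510, 117850, 139070, 416280, 1238630, 666510,
                        0, 8110, 849570, 622590, 11850, 20)
)

# GLM with a logarithmic link: the mean response is modelled as
# exp(linear predictor), so a zero response needs no special treatment.
# Assumption: predictors are log1p-transformed because they are size
# variables spanning many orders of magnitude.
fit_pois <- glm(
  EN.ATM.METH.KT.CE ~
    log1p(AG.LND.AGRI.K2) + log1p(AG.LND.FRST.K2) + log1p(SP.URB.TOTL),
  family = poisson(link = "log"),
  data = wbcc_small
)
summary(fit_pois)

# Pearson dispersion statistic; values far above 1 indicate overdispersion
# relative to the Poisson variance assumption.
dispersion <- sum(resid(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(dispersion)
```

If the dispersion turns out to be large, `family = quasipoisson(link = "log")` is a common alternative that keeps the same mean model while rescaling the standard errors.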

How many data points are there here? Is it possible to post the data?

Nick Cox
  • Thank you both for your suggestions - there are 217 data points

    emissions.urban <- lm(EN.ATM.METH.KT.CE ~ AG.LND.AGRI.K2 + AG.LND.FRST.K2 + SP.URB.TOTL, data = wbcc)
    summary(emissions.urban)

    – Natalie Oct 22 '22 at 14:13