First see data distribution:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2700 3550 4050 4202 4750 6300 2
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 172.0 190.0 197.0 200.9 213.0 231.0 2
Test it with a model:
##
## Call:
## lm(formula = flipper_length_mm ~ body_mass_g + species, data = penguins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.5455 -3.1845 0.1307 3.3533 17.5313
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.589e+02 2.387e+00 66.564 < 2e-16 ***
## body_mass_g 8.402e-03 6.339e-04 13.255 < 2e-16 ***
## speciesChinstrap 5.597e+00 7.882e-01 7.101 7.33e-12 ***
## speciesGentoo 1.568e+01 1.091e+00 14.374 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.395 on 338 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.8541, Adjusted R-squared: 0.8528
## F-statistic: 659.4 on 3 and 338 DF, p-value: < 2.2e-16
Check model residuals:
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.052 0.208 0.94 0.832 0.564 0.064 0.256 0.772 0.192 0.456 0.028 0.268 0.548 0.668 0.176 0.912 0.532 0.268 0.476 0.004 ...
Plot:
## `geom_smooth()` using formula = 'y ~ x'