install.packages(c("tidyverse",
"here",
"tidylog",
"summarytools")) 2026-05-04
https://rstats-courses.github.io/CursoR-AEET-2026/materiales.html
La exploración de datos nos permite verificar su calidad, generar y probar hipótesis de forma rápida, identificando pistas prometedoras para analizar más a fondo luego.
La visualización de los datos es un buen comienzo, pero por sí sola no suele ser suficiente, ya que a menudo requiere transformar los datos previamente.
Más de una variable por columna
Source: Data Carpentry
Múltiples tablas
Source: Data Carpentry
Información en colores
Se puede evitar simplemente añadiendo una columna a la tabla original.
[1] "broom" "conflicted" "cli" "dbplyr"
[5] "dplyr" "dtplyr" "forcats" "ggplot2"
[9] "googledrive" "googlesheets4" "haven" "hms"
[13] "httr" "jsonlite" "lubridate" "magrittr"
[17] "modelr" "pillar" "purrr" "ragg"
[21] "readr" "readxl" "reprex" "rlang"
[25] "rstudioapi" "rvest" "stringr" "tibble"
[29] "tidyr" "xml2" "tidyverse"
Da información de las operaciones que se realizan en el dataset
Permite hacer resumenes completos de los datasets
read.table(), read.csv(), readRDS(), read.txt()
Argumentos útiles: sep, dec, comment.char, na.strings, stringsAsFactors
read_csv(), read_csv2(), read_table(), read_delim()
Más rapido, formato “tibbles”, no convierte characteres a factores automaticamente, no usa los nombres de fila, deja strings como texto.
Argumentos útiles: delim, comment, na, col_types, skip_empty_rows, guess_max
read_excel(), read_xls(), read_xlsx()
Argumentos útiles: sheet, col_types, skip
La función here() permite hacer referencia siempre al directorio donde se encuentra el proyecto
Ejemplo usando ruta absoluta:
Ejemplo usando ruta relativa al proyecto:
Mecanismo para encadenar varias funciones:
Rows: 213062 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): site_name, megaplot, plot, plant_ID, species_name, height_diameter_...
dbl (6): trap, year, count, stem_diameter_cm, trap_area_m2, burned
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 213,062
Columns: 14
$ site_name <chr> "AND", "AND", "AND", "AND", "AND", "AND", "AND",…
$ megaplot <chr> "Bare Mountain", "Bare Mountain", "Bare Mountain…
$ plot <chr> "CNCT_01", "CNCT_01", "CNCT_01", "CNCT_01", "CNC…
$ trap <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ plant_ID <chr> "CNCT_01ABAM1", "CNCT_01ABAM1", "CNCT_01ABAM1", …
$ species_name <chr> "Abies_amabilis", "Abies_amabilis", "Abies_amabi…
$ year <dbl> 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, …
$ count <dbl> 22, 0, 0, 2, 0, 2, 108, 0, 0, 7, 0, 0, 2, 0, 12,…
$ stem_diameter_cm <dbl> 56.6, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ trap_area_m2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ height_diameter_taken <chr> "Breast Height", "Breast Height", "Breast Height…
$ burned <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ general_method <chr> "PARTIALCONECOUNT", "PARTIALCONECOUNT", "PARTIAL…
$ methods_notes <chr> "all cones visible from established view points …
# A tibble: 6 × 14
site_name megaplot plot trap plant_ID species_name year count
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
1 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1962 22
2 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1963 0
3 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1964 0
4 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1965 2
5 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1966 0
6 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1967 2
# ℹ 6 more variables: stem_diameter_cm <dbl>, trap_area_m2 <dbl>,
# height_diameter_taken <chr>, burned <dbl>, general_method <chr>,
# methods_notes <chr>
arrange() - Ordenar variables por casosrename() - Renombrar variablesrelocate() - Reordenar variablesselect() - Extraer variablesarrange()# A tibble: 213,062 × 14
site_name megaplot plot trap plant_ID species_name year count
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
1 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1963 0
2 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1964 0
3 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1966 0
4 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1969 0
5 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1970 0
6 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1972 0
7 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1973 0
8 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1975 0
9 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1977 0
10 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1981 0
# ℹ 213,052 more rows
# ℹ 6 more variables: stem_diameter_cm <dbl>, trap_area_m2 <dbl>,
# height_diameter_taken <chr>, burned <dbl>, general_method <chr>,
# methods_notes <chr>
arrange()De mayor a menor:
# A tibble: 213,062 × 14
site_name megaplot plot trap plant_ID species_name year count
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
1 LUQ 1 1 92 <NA> Cecropia_schreberiana 1997 1114340
2 LUQ 1 1 93 <NA> Ficus_trigonata 2015 106650
3 LUQ 1 1 92 <NA> Ficus_trigonata 2013 69450
4 LUQ 1 1 93 <NA> Ficus_trigonata 2018 44090
5 LUQ 1 1 109 <NA> Ficus_trigonata 2015 39670
6 LUQ 1 1 92 <NA> Ficus_trigonata 2015 35075
7 LUQ 1 1 93 <NA> Ficus_trigonata 2016 33500
8 LUQ 1 1 92 <NA> Ficus_trigonata 2008 33000
9 LUQ 1 1 107 <NA> Ficus_trigonata 2011 32689
10 LUQ 1 1 99 <NA> Cecropia_schreberiana 2009 30594
# ℹ 213,052 more rows
# ℹ 6 more variables: stem_diameter_cm <dbl>, trap_area_m2 <dbl>,
# height_diameter_taken <chr>, burned <dbl>, general_method <chr>,
# methods_notes <chr>
arrange()Por orden jerárquico:
# A tibble: 213,062 × 14
site_name megaplot plot trap plant_ID species_name year count
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
1 AEC adirondack adirondack 971 <NA> Acer_rubrum 2009 191
2 AEC adirondack adirondack 971 <NA> Acer_rubrum 2004 171
3 AEC adirondack adirondack 971 <NA> Acer_rubrum 1995 141
4 AEC adirondack adirondack 941 <NA> Acer_rubrum 1994 105
5 AEC adirondack adirondack 941 <NA> Acer_rubrum 1995 85
6 AEC adirondack adirondack 972 <NA> Acer_rubrum 2007 82
7 AEC adirondack adirondack 971 <NA> Acer_rubrum 2008 81
8 AEC adirondack adirondack 971 <NA> Acer_rubrum 1993 79
9 AEC adirondack adirondack 941 <NA> Acer_rubrum 1991 77
10 AEC adirondack adirondack 938 <NA> Acer_rubrum 2004 72
# ℹ 213,052 more rows
# ℹ 6 more variables: stem_diameter_cm <dbl>, trap_area_m2 <dbl>,
# height_diameter_taken <chr>, burned <dbl>, general_method <chr>,
# methods_notes <chr>
rename()rename: renamed one variable (site)
# A tibble: 213,062 × 14
site megaplot plot trap plant_ID species_name year count stem_diameter_cm
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
1 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1962 22 56.6
2 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1963 0 NA
3 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1964 0 NA
4 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1965 2 NA
5 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1966 0 NA
6 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1967 2 NA
7 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1968 108 NA
8 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1969 0 NA
9 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1970 0 NA
10 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1971 7 NA
# ℹ 213,052 more rows
# ℹ 5 more variables: trap_area_m2 <dbl>, height_diameter_taken <chr>,
# burned <dbl>, general_method <chr>, methods_notes <chr>
relocate()relocate: columns reordered (site_name, year, megaplot, plot, trap, …)
# A tibble: 213,062 × 14
site_name year megaplot plot trap plant_ID species_name count
<chr> <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 AND 1962 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 22
2 AND 1963 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 0
3 AND 1964 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 0
4 AND 1965 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 2
5 AND 1966 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 0
6 AND 1967 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 2
7 AND 1968 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 108
8 AND 1969 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 0
9 AND 1970 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 0
10 AND 1971 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 7
# ℹ 213,052 more rows
# ℹ 6 more variables: stem_diameter_cm <dbl>, trap_area_m2 <dbl>,
# height_diameter_taken <chr>, burned <dbl>, general_method <chr>,
# methods_notes <chr>
select()select: dropped 10 variables (megaplot, plot, trap, plant_ID, stem_diameter_cm,
…)
# A tibble: 213,062 × 4
site_name year species_name count
<chr> <dbl> <chr> <dbl>
1 AND 1962 Abies_amabilis 22
2 AND 1963 Abies_amabilis 0
3 AND 1964 Abies_amabilis 0
4 AND 1965 Abies_amabilis 2
5 AND 1966 Abies_amabilis 0
6 AND 1967 Abies_amabilis 2
7 AND 1968 Abies_amabilis 108
8 AND 1969 Abies_amabilis 0
9 AND 1970 Abies_amabilis 0
10 AND 1971 Abies_amabilis 7
# ℹ 213,052 more rows
select()Quitar variables:
select: dropped 3 variables (megaplot, plot, trap)
# A tibble: 213,062 × 11
site_name plant_ID species_name year count stem_diameter_cm trap_area_m2
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 AND CNCT_01ABAM1 Abies_amabi… 1962 22 56.6 NA
2 AND CNCT_01ABAM1 Abies_amabi… 1963 0 NA NA
3 AND CNCT_01ABAM1 Abies_amabi… 1964 0 NA NA
4 AND CNCT_01ABAM1 Abies_amabi… 1965 2 NA NA
5 AND CNCT_01ABAM1 Abies_amabi… 1966 0 NA NA
6 AND CNCT_01ABAM1 Abies_amabi… 1967 2 NA NA
7 AND CNCT_01ABAM1 Abies_amabi… 1968 108 NA NA
8 AND CNCT_01ABAM1 Abies_amabi… 1969 0 NA NA
9 AND CNCT_01ABAM1 Abies_amabi… 1970 0 NA NA
10 AND CNCT_01ABAM1 Abies_amabi… 1971 7 NA NA
# ℹ 213,052 more rows
# ℹ 4 more variables: height_diameter_taken <chr>, burned <dbl>,
# general_method <chr>, methods_notes <chr>
select()La función select() nos permite seleccionar, renombrar y recolocar - todo a la vez
select()La función select() nos permite seleccionar, renombrar y recolocar - todo a la vez
Rows: 213,062
Columns: 8
$ site <chr> "AND", "AND", "AND", "AND", "AND", "AND", "AND", "AND", "…
$ year <dbl> 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 197…
$ species_name <chr> "Abies_amabilis", "Abies_amabilis", "Abies_amabilis", "Ab…
$ plant_ID <chr> "CNCT_01ABAM1", "CNCT_01ABAM1", "CNCT_01ABAM1", "CNCT_01A…
$ count <dbl> 22, 0, 0, 2, 0, 2, 108, 0, 0, 7, 0, 0, 2, 0, 12, 0, 21, 1…
$ method <chr> "PARTIALCONECOUNT", "PARTIALCONECOUNT", "PARTIALCONECOUNT…
$ stem_cm <dbl> 56.6, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ trap_area_m2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
dfSummary()dfSummary()Data Frame Summary
dt
Dimensions: 213062 x 1
Duplicates: 212921
-----------------------------------------------------------------------------------------------------
No Variable Stats / Values Freqs (% of Valid) Graph Valid Missing
---- -------------- ------------------------- -------------------- ------------- ---------- ---------
1 species_name 1. Abies_amabilis 15488 ( 7.3%) I 213062 0
[character] 2. Juniperus_monosperma 10201 ( 4.8%) (100.0%) (0.0%)
3. Amelanchier_arborea 10182 ( 4.8%)
4. Abies_procera 10141 ( 4.8%)
5. Tsuga_mertensiana 9684 ( 4.5%)
6. Acer_saccharum 8403 ( 3.9%)
7. Fraxinus_americana 7997 ( 3.8%)
8. Acer_rubrum 6231 ( 2.9%)
9. Tsuga_canadensis 6078 ( 2.9%)
10. Sassafras_albidum 5200 ( 2.4%)
[ 131 others ] 123457 (57.9%) IIIIIIIIIII
-----------------------------------------------------------------------------------------------------
dfSummary()dfSummary()Data Frame Summary
dt
Dimensions: 213062 x 1
Duplicates: 211712
-----------------------------------------------------------------------------------------------
No Variable Stats / Values Freqs (% of Valid) Graph Valid Missing
---- ----------- --------------------------- ---------------------- ------- --------- ---------
1 count Mean (sd) : 34.6 (2481.6) 1349 distinct values : 209047 4015
[numeric] min < med < max: : (98.1%) (1.9%)
0 < 0 < 1114340 :
IQR (CV) : 4 (71.7) :
:
-----------------------------------------------------------------------------------------------
distinct() - Extraer valores únicosmutate() - Crear nuevas variablesfilter() - Filtrar datos por casosgroup_by() - Agrupar datos por casossummarise() - Resumir datos por casoscount() - Contar casos por variabledistinct()Niveles de una variable:
distinct()Equivalente en library(base):
distinct()Niveles de una variable:
mutate()Ej: transformar frutos a frutos/m2
mutate: new variable 'fruits_m2' (double) with 2,116 unique values and 35% NA
# A tibble: 213,062 × 9
site year species_name plant_ID count method stem_cm trap_area_m2 fruits_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 AND 1962 Abies_amabi… CNCT_01… 22 PARTI… 56.6 NA NA
2 AND 1963 Abies_amabi… CNCT_01… 0 PARTI… NA NA NA
3 AND 1964 Abies_amabi… CNCT_01… 0 PARTI… NA NA NA
4 AND 1965 Abies_amabi… CNCT_01… 2 PARTI… NA NA NA
5 AND 1966 Abies_amabi… CNCT_01… 0 PARTI… NA NA NA
6 AND 1967 Abies_amabi… CNCT_01… 2 PARTI… NA NA NA
7 AND 1968 Abies_amabi… CNCT_01… 108 PARTI… NA NA NA
8 AND 1969 Abies_amabi… CNCT_01… 0 PARTI… NA NA NA
9 AND 1970 Abies_amabi… CNCT_01… 0 PARTI… NA NA NA
10 AND 1971 Abies_amabi… CNCT_01… 7 PARTI… NA NA NA
# ℹ 213,052 more rows
filter() y filter_out()filter: removed 208,641 rows (98%), 4,421 rows remaining
# A tibble: 4,421 × 8
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 BNZ 1957 Picea_glauca <NA> 3 TRAP NA 0.25
2 BNZ 1957 Picea_glauca <NA> 1 TRAP NA 0.25
3 BNZ 1957 Picea_glauca <NA> 2 TRAP NA 0.25
4 BNZ 1957 Picea_glauca <NA> 3 TRAP NA 0.25
5 BNZ 1957 Picea_glauca <NA> 3 TRAP NA 0.25
6 BNZ 1957 Picea_glauca <NA> 0 TRAP NA 0.25
7 BNZ 1957 Picea_glauca <NA> 2 TRAP NA 0.25
8 BNZ 1957 Picea_glauca <NA> 3 TRAP NA 0.25
9 BNZ 1957 Picea_glauca <NA> 0 TRAP NA 0.25
10 BNZ 1957 Picea_glauca <NA> 1 TRAP NA 0.25
# ℹ 4,411 more rows
filter() y filter_out()# A tibble: 208,641 × 8
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1962 Abies_amabilis CNCT_01ABAM1 22 PARTIALCO… 56.6 NA
2 AND 1963 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
3 AND 1964 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
4 AND 1965 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
5 AND 1966 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
6 AND 1967 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
7 AND 1968 Abies_amabilis CNCT_01ABAM1 108 PARTIALCO… NA NA
8 AND 1969 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
9 AND 1970 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
10 AND 1971 Abies_amabilis CNCT_01ABAM1 7 PARTIALCO… NA NA
# ℹ 208,631 more rows
filter() y filter_out()filter: removed 150,475 rows (71%), 62,587 rows remaining
filter: removed 41,369 rows (66%), 21,218 rows remaining
# A tibble: 21,218 × 8
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1962 Abies_amabilis CNCT_01ABAM1 22 PARTIALCO… 56.6 NA
2 AND 1968 Abies_amabilis CNCT_01ABAM1 108 PARTIALCO… NA NA
3 AND 1976 Abies_amabilis CNCT_01ABAM1 12 PARTIALCO… NA NA
4 AND 1978 Abies_amabilis CNCT_01ABAM1 21 PARTIALCO… NA NA
5 AND 1980 Abies_amabilis CNCT_01ABAM1 30 PARTIALCO… NA NA
6 AND 1982 Abies_amabilis CNCT_01ABAM1 61 PARTIALCO… NA NA
7 AND 1985 Abies_amabilis CNCT_01ABAM1 76 PARTIALCO… NA NA
8 AND 1991 Abies_amabilis CNCT_01ABAM1 42 PARTIALCO… NA NA
9 AND 1995 Abies_amabilis CNCT_01ABAM1 75 PARTIALCO… NA NA
10 AND 1997 Abies_amabilis CNCT_01ABAM1 52 PARTIALCO… NA NA
# ℹ 21,208 more rows
group_by() y summarise()group_by() y summarise()group_by: one grouping variable (site)
summarise: now 9 rows and 2 columns, ungrouped
# A tibble: 9 × 2
site fruits
<chr> <dbl>
1 AEC 55731.
2 AND 1968048
3 BNZ 915902
4 CDR 85431
5 CWT 292939
6 HBR 24556.
7 HFR 3683
8 LUQ 3653588
9 SEV 231905.
group_by() y summarise()dt |>
group_by(site) |>
summarise(max_fruit = max(count, na.rm = TRUE),
min_fruit = min(count, na.rm = TRUE)) group_by: one grouping variable (site)
summarise: now 9 rows and 3 columns, ungrouped
# A tibble: 9 × 3
site max_fruit min_fruit
<chr> <dbl> <dbl>
1 AEC 591 0
2 AND 5000 0
3 BNZ 7230 0
4 CDR 151 0
5 CWT 1383 0
6 HBR 244 0
7 HFR 77 0
8 LUQ 1114340 0
9 SEV 1100 0
group_by() y summarise()Crear dataset con media de frutos de cada especie de árbol por sitio y por año:
dt |>
group_by(site, species_name, year) |>
summarise(mean_fruits = mean(count, na.rm = TRUE),
sd_fruit = sd(count, na.rm = TRUE)) |>
ungroup()group_by: 3 grouping variables (site, species_name, year)
summarise: now 3,212 rows and 5 columns, 2 group variables remaining (site, species_name)
ungroup: no grouping variables remain
# A tibble: 3,212 × 5
site species_name year mean_fruits sd_fruit
<chr> <chr> <dbl> <dbl> <dbl>
1 AEC Acer_rubrum 1988 0 0
2 AEC Acer_rubrum 1989 3.1 5.61
3 AEC Acer_rubrum 1990 0.44 1.05
4 AEC Acer_rubrum 1991 9.36 16.4
5 AEC Acer_rubrum 1992 3.90 5.60
6 AEC Acer_rubrum 1993 4.45 12.0
7 AEC Acer_rubrum 1994 9.75 18.4
8 AEC Acer_rubrum 1995 6.52 23.2
9 AEC Acer_rubrum 1996 6.86 12.0
10 AEC Acer_rubrum 1997 1.80 2.47
# ℹ 3,202 more rows
count()if_else() - Corregir datoscase_when() - Crear categoriasif_else()# A tibble: 213,062 × 8
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 LUQ 1997 Cecropia_schreberiana <NA> 1.11e6 TRAP NA 0.16
2 LUQ 2015 Ficus_trigonata <NA> 1.07e5 TRAP NA 0.5
3 LUQ 2013 Ficus_trigonata <NA> 6.95e4 TRAP NA 0.5
4 LUQ 2018 Ficus_trigonata <NA> 4.41e4 TRAP NA 0.5
5 LUQ 2015 Ficus_trigonata <NA> 3.97e4 TRAP NA 0.5
6 LUQ 2015 Ficus_trigonata <NA> 3.51e4 TRAP NA 0.5
7 LUQ 2016 Ficus_trigonata <NA> 3.35e4 TRAP NA 0.5
8 LUQ 2008 Ficus_trigonata <NA> 3.30e4 TRAP NA 0.5
9 LUQ 2011 Ficus_trigonata <NA> 3.27e4 TRAP NA 0.5
10 LUQ 2009 Cecropia_schreberiana <NA> 3.06e4 TRAP NA 0.5
# ℹ 213,052 more rows
if_else()mutate: changed one value (<1%) of 'count' (one new NA)
# A tibble: 213,062 × 8
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1962 Abies_amabilis CNCT_01ABAM1 22 PARTIALCO… 56.6 NA
2 AND 1963 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
3 AND 1964 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
4 AND 1965 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
5 AND 1966 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
6 AND 1967 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
7 AND 1968 Abies_amabilis CNCT_01ABAM1 108 PARTIALCO… NA NA
8 AND 1969 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
9 AND 1970 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
10 AND 1971 Abies_amabilis CNCT_01ABAM1 7 PARTIALCO… NA NA
# ℹ 213,052 more rows
if_else()if_else()Crear una nueva variable de dos niveles de frutos.
mutate: new variable 'frutos_cat' (character) with 3 unique values and 2% NA
# A tibble: 213,062 × 9
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1962 Abies_amabilis CNCT_01ABAM1 22 PARTIALCO… 56.6 NA
2 AND 1963 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
3 AND 1964 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
4 AND 1965 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
5 AND 1966 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
6 AND 1967 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
7 AND 1968 Abies_amabilis CNCT_01ABAM1 108 PARTIALCO… NA NA
8 AND 1969 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
9 AND 1970 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
10 AND 1971 Abies_amabilis CNCT_01ABAM1 7 PARTIALCO… NA NA
# ℹ 213,052 more rows
# ℹ 1 more variable: frutos_cat <chr>
if_else()Crear una nueva variable de dos niveles de frutos.
dt |>
# add missing para los valores NA
mutate(frutos_cat = if_else(count > 100, "mucho", "poco",
missing = "nodata"))mutate: new variable 'frutos_cat' (character) with 3 unique values and 0% NA
# A tibble: 213,062 × 9
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1962 Abies_amabilis CNCT_01ABAM1 22 PARTIALCO… 56.6 NA
2 AND 1963 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
3 AND 1964 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
4 AND 1965 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
5 AND 1966 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
6 AND 1967 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
7 AND 1968 Abies_amabilis CNCT_01ABAM1 108 PARTIALCO… NA NA
8 AND 1969 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
9 AND 1970 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
10 AND 1971 Abies_amabilis CNCT_01ABAM1 7 PARTIALCO… NA NA
# ℹ 213,052 more rows
# ℹ 1 more variable: frutos_cat <chr>
case_when()Crear una nueva variable en base a diferentes niveles de frutos.
Ej - un factor de 3 niveles de cantidad frutos:
case_when()Crear una nueva variable en base a diferentes niveles de frutos.
Ej - un factor de 3 niveles de cantidad frutos:
dt |>
mutate(nivel_frutos = case_when(
count <= 100 ~ "bajo",
count > 100 & count <= 1000 ~ "medio",
count > 1000 ~ "alto",
.default = "nodata"))mutate: new variable 'nivel_frutos' (character) with 4 unique values and 0% NA
# A tibble: 213,062 × 9
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1962 Abies_amabilis CNCT_01ABAM1 22 PARTIALCO… 56.6 NA
2 AND 1963 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
3 AND 1964 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
4 AND 1965 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
5 AND 1966 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
6 AND 1967 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
7 AND 1968 Abies_amabilis CNCT_01ABAM1 108 PARTIALCO… NA NA
8 AND 1969 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
9 AND 1970 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
10 AND 1971 Abies_amabilis CNCT_01ABAM1 7 PARTIALCO… NA NA
# ℹ 213,052 more rows
# ℹ 1 more variable: nivel_frutos <chr>
case_when()Contar numero de arboles con distintos niveles de frutos:
dt |>
mutate(nivel_frutos = case_when(
count <= 100 ~ "bajo",
count > 100 & count <= 1000 ~ "medio",
count > 1000 ~ "alto")) |>
group_by(nivel_frutos) |>
summarise(trees = n())mutate: new variable 'nivel_frutos' (character) with 4 unique values and 2% NA
group_by: one grouping variable (nivel_frutos)
summarise: now 4 rows and 2 columns, ungrouped
# A tibble: 4 × 2
nivel_frutos trees
<chr> <int>
1 alto 698
2 bajo 200157
3 medio 8192
4 <NA> 4015
arrange() - Ordenar variable por casosrename() - Renombrar variablesrelocate() - Reordenar variablesselect() - Extraer variablesdistinct() - Extraer valores únicosmutate() - Crear nuevas variablesfilter() - Filtrar datos por casosgroup_by() - Agrupar datos por casossummarise() - Resumir datos por casoscount() - Contar casos por variableif_else() - Corregir datoscase_when() - Crear categoriaslibrary(tidyr)pivot_wider()pivot_longer()Primero creamos dataset reducido:
group_by: 2 grouping variables (site, year)
summarise: now 283 rows and 3 columns, one group variable remaining (site)
# A tibble: 283 × 3
# Groups: site [9]
site year fruits
<chr> <dbl> <dbl>
1 AEC 1988 8.43
2 AEC 1989 25.7
3 AEC 1990 1.18
4 AEC 1991 4.84
5 AEC 1992 8.82
6 AEC 1993 1.24
7 AEC 1994 15.7
8 AEC 1995 11.4
9 AEC 1996 9.95
10 AEC 1997 5.10
# ℹ 273 more rows
pivot_wider()Convertir a formato corto:
dt_short <- dt |>
group_by(site, year) |>
summarise(fruits = mean(count, na.rm. = TRUE)) |>
pivot_wider(names_from = "site",
values_from = "fruits")group_by: 2 grouping variables (site, year)
summarise: now 283 rows and 3 columns, one group variable remaining (site)
pivot_wider: reorganized (site, fruits) into (AEC, AND, BNZ, CDR, CWT, …) [was 283x3, now 65x10]
# A tibble: 65 × 10
year AEC AND BNZ CDR CWT HBR HFR LUQ SEV
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1988 8.43 NA 31.9 NA NA NA NA NA NA
2 1989 25.7 NA 274. NA NA NA NA NA NA
3 1990 1.18 NA 154. NA NA NA NA NA NA
4 1991 4.84 NA 207. NA 0.669 NA NA NA NA
5 1992 8.82 NA 56.0 NA 0.877 NA NA 39.9 NA
6 1993 1.24 NA 311. NA 1.59 0.516 NA 43.1 NA
7 1994 15.7 NA 102. NA 2.58 6.67 NA 43.9 NA
8 1995 11.4 NA 95.7 22.7 3.63 1.56 NA 39.1 NA
9 1996 9.95 NA 96.3 27.4 4.88 7.88 NA 58.4 NA
10 1997 5.10 NA 318. 17.4 6.68 0.115 NA 1445. NA
# ℹ 55 more rows
pivot_longer()Convertir a formato largo:
dt_short |>
pivot_longer(cols = c(AEC:SEV), # or cols = -year,
names_to = "site",
values_to = "fruits")pivot_longer: reorganized (AEC, AND, BNZ, CDR, CWT, …) into (site, fruits) [was
65x10, now 585x3]
# A tibble: 585 × 3
year site fruits
<dbl> <chr> <dbl>
1 1988 AEC 8.43
2 1988 AND NA
3 1988 BNZ 31.9
4 1988 CDR NA
5 1988 CWT NA
6 1988 HBR NA
7 1988 HFR NA
8 1988 LUQ NA
9 1988 SEV NA
10 1989 AEC 25.7
# ℹ 575 more rows
Leemos un nuevo dataset con información de atributos para las especies de árboles:
Rows: 104 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): species_name, family, genus, epithet, pollinator_code, mycorrhiza_...
dbl (2): seed_development_years, seed_mass_mg
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 104
Columns: 17
$ species_name <chr> "Abies_amabilis", "Abies_concolor", "Abies_gran…
$ family <chr> "Pinaceae", "Pinaceae", "Pinaceae", "Pinaceae",…
$ genus <chr> "Abies", "Abies", "Abies", "Abies", "Abies", "A…
$ epithet <chr> "amabilis", "concolor", "grandis", "lasiocarpa"…
$ seed_development_years <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
$ pollinator_code <chr> "wind", "wind", "wind", "wind", "wind", "wind",…
$ mycorrhiza_type <chr> "EM", "EM", "EM", "EM", "EM", "EM", "AM", "AM",…
$ needleleaf_broadleaf <chr> "needleleaf", "needleleaf", "needleleaf", "need…
$ deciduous_evergreen <chr> "evergreen", "evergreen", "evergreen", "evergre…
$ seed_maturation_timing <chr> "late summer", "fall", "late summer", "late sum…
$ seed_mass_mg <dbl> 46.2063354, 34.2847056, 21.0800075, 13.7327226,…
$ sexual_system <chr> "monoecious", "monoecious", "monoecious", "mono…
$ shade_tolerance <chr> "tolerant", "tolerant", "tolerant", "tolerant",…
$ growth_form <chr> "tree", "tree", "tree", "tree", "tree", "tree",…
$ seed_bank <chr> "no", "no", "no", "no", "yes", "yes", "no", "no…
$ fleshy_fruit <chr> "no", "no", "no", "no", "no", "no", "no", "no",…
$ dispersal_syndrome <chr> "abiotic", "abiotic", "abiotic", "abiotic", "ab…
# A tibble: 2 × 2
pollinator_code n
<chr> <int>
1 animal 73
2 wind 31
# A tibble: 41 × 2
family n
<chr> <int>
1 Aceraceae 2
2 Annonaceae 2
3 Aquifoliaceae 1
4 Araliaceae 2
5 Arecaceae 1
6 Betulaceae 4
7 Bignoniaceae 2
8 Boraginaceae 2
9 Burseraceae 2
10 Cecropiaceae 1
# ℹ 31 more rows
left_join()left_join()Rows: 213,062
Columns: 24
$ site <chr> "AND", "AND", "AND", "AND", "AND", "AND", "AND"…
$ year <dbl> 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
$ species_name <chr> "Abies_amabilis", "Abies_amabilis", "Abies_amab…
$ plant_ID <chr> "CNCT_01ABAM1", "CNCT_01ABAM1", "CNCT_01ABAM1",…
$ count <dbl> 22, 0, 0, 2, 0, 2, 108, 0, 0, 7, 0, 0, 2, 0, 12…
$ method <chr> "PARTIALCONECOUNT", "PARTIALCONECOUNT", "PARTIA…
$ stem_cm <dbl> 56.6, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trap_area_m2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ family <chr> "Pinaceae", "Pinaceae", "Pinaceae", "Pinaceae",…
$ genus <chr> "Abies", "Abies", "Abies", "Abies", "Abies", "A…
$ epithet <chr> "amabilis", "amabilis", "amabilis", "amabilis",…
$ seed_development_years <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
$ pollinator_code <chr> "wind", "wind", "wind", "wind", "wind", "wind",…
$ mycorrhiza_type <chr> "EM", "EM", "EM", "EM", "EM", "EM", "EM", "EM",…
$ needleleaf_broadleaf <chr> "needleleaf", "needleleaf", "needleleaf", "need…
$ deciduous_evergreen <chr> "evergreen", "evergreen", "evergreen", "evergre…
$ seed_maturation_timing <chr> "late summer", "late summer", "late summer", "l…
$ seed_mass_mg <dbl> 46.20634, 46.20634, 46.20634, 46.20634, 46.2063…
$ sexual_system <chr> "monoecious", "monoecious", "monoecious", "mono…
$ shade_tolerance <chr> "tolerant", "tolerant", "tolerant", "tolerant",…
$ growth_form <chr> "tree", "tree", "tree", "tree", "tree", "tree",…
$ seed_bank <chr> "no", "no", "no", "no", "no", "no", "no", "no",…
$ fleshy_fruit <chr> "no", "no", "no", "no", "no", "no", "no", "no",…
$ dispersal_syndrome <chr> "abiotic", "abiotic", "abiotic", "abiotic", "ab…
dt_clean <- dt_sp |>
# quitar un valor equivocado
mutate(count = if_else(count > 200000, NA, count)) |>
# considerar valores de 0
filter(count > 0) |> #con esto tambien quitamos NAs
# calcular número de frutos por m2
mutate(fruits_per_m2 = count/trap_area_m2) |>
# crear variable con la cantidad de frutos de count o corregida
mutate(fruits = if_else(is.na(fruits_per_m2), count, fruits_per_m2))mutate: changed one value (<1%) of 'count' (one new NA)
filter: removed 130,782 rows (61%), 82,280 rows remaining
mutate: new variable 'fruits_per_m2' (double) with 2,114 unique values and 40% NA
mutate: new variable 'fruits' (double) with 2,303 unique values and 0% NA
write_csv - usa separador de “,”write_csv2 - usa separador de “;”write_delim - usa cualquier separador de datos (ej. delim = “|”)El formato parquet para guardar datos es una forma muy eficiente de manejar grandes bases de datos.
Este formato archiva los datos en forma de columnas, ofrece una compresion mayor que .csv incluso mayor que .rds y es más rapido para trabajar.
Además permite el particionado de datos en diferentes ficheros.
readr, readxl, and googlesheets4dplyrtidyrstringrforcatslubridateUsando la base de datos final (dt_clean), seleccionar datos con información para diámetro de tronco (stem_cm) y ordernar de mayor a menor:
Usando la base de datos final (dt_clean), seleccionar datos con información para diámetro de tronco (stem_cm) y ordernar de mayor a menor:
# A tibble: 1,503 × 26
site year species_name plant_ID count method stem_cm trap_area_m2 family
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 AND 1993 Abies_procera CNCT_37AB… 250 PARTI… 221. NA Pinac…
2 AND 1962 Abies_procera CNCT_15AB… 50 PARTI… 198. NA Pinac…
3 AND 1962 Abies_procera CNCT_15AB… 68 PARTI… 196. NA Pinac…
4 AND 1993 Abies_procera CNCT_37AB… 350 PARTI… 186. NA Pinac…
5 AND 1961 Abies_procera CNCT_37AB… 231 PARTI… 186. NA Pinac…
6 AND 1993 Abies_procera CNCT_37AB… 410 PARTI… 184. NA Pinac…
7 AND 1962 Abies_procera CNCT_15AB… 130 PARTI… 183. NA Pinac…
8 AND 1992 Abies_procera CNCT_02AB… 54 PARTI… 180. NA Pinac…
9 AND 1962 Abies_procera CNCT_15AB… 300 PARTI… 178. NA Pinac…
10 AND 1993 Abies_procera CNCT_37AB… 250 PARTI… 174. NA Pinac…
# ℹ 1,493 more rows
# ℹ 17 more variables: genus <chr>, epithet <chr>,
# seed_development_years <dbl>, pollinator_code <chr>, mycorrhiza_type <chr>,
# needleleaf_broadleaf <chr>, deciduous_evergreen <chr>,
# seed_maturation_timing <chr>, seed_mass_mg <dbl>, sexual_system <chr>,
# shade_tolerance <chr>, growth_form <chr>, seed_bank <chr>,
# fleshy_fruit <chr>, dispersal_syndrome <chr>, fruits_per_m2 <dbl>, …
Usando la base de datos final (dt_clean), calcular diámetro medio y SD para cada especie de árbol.
Usando la base de datos final (dt_clean), calcular diámetro medio y SD para cada especie de árbol.
dt_clean |>
filter(!is.na(stem_cm)) |>
group_by(species_name) |>
summarise(mean = mean(stem_cm),
sd = sd(stem_cm))filter: removed 80,777 rows (98%), 1,503 rows remaining
group_by: one grouping variable (species_name)
summarise: now 10 rows and 3 columns, ungrouped
# A tibble: 10 × 3
species_name mean sd
<chr> <dbl> <dbl>
1 Abies_amabilis 65.0 19.7
2 Abies_concolor 63.1 18.6
3 Abies_grandis 74.5 14.8
4 Abies_lasiocarpa 45.0 16.2
5 Abies_magnifica 87.9 19.9
6 Abies_procera 104. 34.4
7 Picea_engelmannii 80.2 16.5
8 Pinus_lambertiana 114. 27.7
9 Pinus_monticola 63.4 22.4
10 Tsuga_mertensiana 56.5 12.5
Usando la base de datos final (dt_clean), calcular el número de árboles y número de especies mayores de 40cm de diámetro y menores de 40cm de diámetro. Debería salir algo asi:
# A tibble: 2 × 3
tree_size n_trees n_species
<chr> <int> <int>
1 big 1405 10
2 small 98 6
Usando la base de datos final (dt_clean), calcular el número de árboles y número de especies mayores de 40cm de diámetro y menores de 40cm de diámetro.
dt_clean |>
filter(!is.na(stem_cm)) |>
mutate(tree_size = case_when(stem_cm >= 40 ~ "big",
stem_cm < 40 ~ "small")) |>
group_by(tree_size) |>
summarise(n_trees = n(),
n_species = n_distinct(species_name))filter: removed 80,777 rows (98%), 1,503 rows remaining
mutate: new variable 'tree_size' (character) with 2 unique values and 0% NA
group_by: one grouping variable (tree_size)
summarise: now 2 rows and 3 columns, ungrouped
# A tibble: 2 × 3
tree_size n_trees n_species
<chr> <int> <int>
1 big 1405 10
2 small 98 6
Usando la base de datos final (dt_clean), seleccionar sitios con método de conteo tipo “TRAP” y calcular cantidad máxima y mínima de frutos por m2 para cada sitio.
Usando la base de datos final (dt_clean), seleccionar sitios con método de conteo tipo “TRAP” y calcular cantidad máxima y mínima de frutos por m2 para cada sitio.
dt_clean |>
filter(method == "TRAP") |>
group_by(site) |>
summarise(max_fruit = max(fruits_per_m2),
min_fruit = mean(fruits_per_m2))filter: removed 33,191 rows (40%), 49,089 rows remaining
group_by: one grouping variable (site)
summarise: now 5 rows and 3 columns, ungrouped
# A tibble: 5 × 3
site max_fruit min_fruit
<chr> <dbl> <dbl>
1 AEC 8107. 296.
2 BNZ 28920 1088.
3 CWT 12207. 197.
4 HBR 2440 93.4
5 LUQ 213300 299.
Usando la base de datos final (dt_clean), crear una tabla que compare la suma de frutos contados en los sitios CWT y HFR (en columnas), para los años entre 2000-2010 (filas).
Usando la base de datos final (dt_clean), crear una tabla que compare la suma de frutos contados en los sitios CWT y HFR (en columnas), para los años entre 2000-2010 (filas).
dt_clean |>
filter(site %in% c("CWT", "SEV")) |>
filter(year %in% c(2000:2010)) |>
group_by(site, year) |>
summarise(fruits = sum(fruits)) |>
pivot_wider(names_from = site, values_from = fruits) # A tibble: 11 × 3
year CWT SEV
<dbl> <dbl> <dbl>
1 2000 92776. 5561.
2 2001 133160. 28243.
3 2002 45746. 302.
4 2003 63213. 13646.
5 2004 67092. 23964.
6 2005 47034. 20558.
7 2006 84603. 726.
8 2007 114387. 11630.
9 2008 147617. 14634
10 2009 138707. 6
11 2010 139444. 15441.
Usando la base de datos final (dt_clean), crear una tabla que compare la suma de frutos contados entre los años 2001 y 2005 (en columnas), para las especies de Abies (filas).
Usando la base de datos final (dt_clean), crear una tabla que compare la suma de frutos contados entre los años 2001 y 2005 (en columnas), para las especies de Abies (filas).
dt_clean |>
filter(year %in% c(2001:2005)) |>
filter(str_detect(species_name, "Abies")) |>
group_by(year, species_name) |>
summarise(fruits = sum(fruits)) |>
pivot_wider(names_from = year, values_from = fruits)# A tibble: 6 × 6
species_name `2001` `2002` `2003` `2004` `2005`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Abies_amabilis 721 2819 5907 54 864
2 Abies_concolor 1429 92 3032 NA 136
3 Abies_grandis 3509 238 4414 17 1119
4 Abies_magnifica 52 1374 6324 8 570
5 Abies_procera 3308 7772 10485 957 1588
6 Abies_lasiocarpa NA 443 1974 17 73
Igual al ejercicio 6 pero donde se muestre la suma de frutos acumulada por años y el nombre de la especie separado por espacio en vez de barra baja (i.e. Abies amabilis en vez de Abies_amabilis). Usando la base de datos final (dt_clean), crear una tabla que compare la suma de frutos contados acumulada entre los años 2001 y 2005 (en columnas), para las especies de Abies (filas).
# A tibble: 6 × 6
# Groups: species_name [6]
species_name `2001` `2002` `2003` `2004` `2005`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Abies amabilis 721 3540 9447 9501 10365
2 Abies concolor 1429 1521 4553 NA 4689
3 Abies grandis 3509 3747 8161 8178 9297
4 Abies magnifica 52 1426 7750 7758 8328
5 Abies procera 3308 11080 21565 22522 24110
6 Abies lasiocarpa NA 443 2417 2434 2507
dt_clean |>
filter(year %in% c(2001:2005)) |>
filter(str_detect(species_name, "Abies")) |>
group_by(year, species_name) |>
summarise(fruits = sum(fruits)) |>
group_by(species_name) |>
mutate(fruits_acc = cumsum(fruits),
species_name = str_replace(species_name, "_", " ")) |>
select(-fruits) |>
pivot_wider(names_from = year, values_from = fruits_acc)# A tibble: 6 × 6
# Groups: species_name [6]
species_name `2001` `2002` `2003` `2004` `2005`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Abies amabilis 721 3540 9447 9501 10365
2 Abies concolor 1429 1521 4553 NA 4689
3 Abies grandis 3509 3747 8161 8178 9297
4 Abies magnifica 52 1426 7750 7758 8328
5 Abies procera 3308 11080 21565 22522 24110
6 Abies lasiocarpa NA 443 2417 2434 2507
Selecciona la familia de las fagaceas del dataset final (dt_clean) y calcula el número de observaciones por genero (si puedes sacalo de la columna species, sin usar la variable genero) en cada sitio.
Selecciona la familia de las fagaceas del dataset final (dt_clean) y calcula el número de observaciones por género (si puedes sacalo de la columna ‘species_name’, sin usar la variable ‘genus’) en cada sitio.