Filter lineage_defs for specific lineages, keeping only mutations that are present in at least one of the selected lineages

Usage

filter_lineages(
  lineage_defs = NULL,
  lineages = c("B.1.526", "B.1.1.7", "B.1.351", "B.1.617.2", "B.1.427", "B.1.429", "P.1"),
  return_df = FALSE,
  path = NULL,
  shared_order = TRUE
)

Arguments

lineage_defs: The result of astronomize(). If NULL, tries to run astronomize().
lineages: Vector of lineage names (must be in rownames(lineage_defs)). Defaults to lineages circulating in 2021-2022.
return_df: Should the function return a data frame? Note that the returned data frame is transposed relative to lineage_defs. Default FALSE.
path: Passed on to astronomize if lineage_defs is NULL.
shared_order: Put shared mutations first? Default TRUE.

Value

A lineage definition matrix with fewer rows and columns than lineage_defs. If return_df is TRUE, a data frame is returned instead, in which the columns represent lineage names and a mutations column is added.
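
As a rough illustration of the transposed layout produced when return_df = TRUE, the following base R sketch builds such a data frame by hand. It is an assumption about the shape of the result, not the package's code: toy_defs and its lineage and mutation names are hypothetical, and the column order and row names returned by filter_lineages() may differ.

```r
# Hypothetical toy definitions: rows are lineages, columns are mutations.
toy_defs <- matrix(
  c(1, 1, 0,
    1, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("lin_A", "lin_B"), c("m1", "m2", "m3"))
)

# Transpose so that each lineage becomes a column and each mutation a row.
toy_df <- as.data.frame(t(toy_defs))

# Store the mutation labels in their own column.
toy_df$mutations <- rownames(toy_df)

str(toy_df)
```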

Details

After removing some lineages, some of the remaining mutations might not be present in any of the remaining lineages. This function removes mutations that no longer belong to any lineage.

shared_order = TRUE places the mutations that are present in the largest number of lineages first. This is convenient for human inspection, but does not affect estimation.
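
The behaviour described above can be sketched in base R on a toy matrix. This is a minimal illustration, not the package's implementation: toy_defs and its lineage and mutation names are made up, and it assumes the same layout as lineage_defs (lineages in rows, mutations in columns, 0/1 entries).

```r
# Hypothetical toy definitions: rows are lineages, columns are mutations.
toy_defs <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    0, 0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("lin_A", "lin_B", "lin_C"), c("m1", "m2", "m3", "m4"))
)

# Keep only the requested lineages (rows).
kept <- toy_defs[c("lin_A", "lin_B"), , drop = FALSE]

# Drop mutations (columns) that no remaining lineage carries; here m4.
kept <- kept[, colSums(kept) > 0, drop = FALSE]

# Put mutations shared by the most lineages first; here m1 leads.
kept <- kept[, order(colSums(kept), decreasing = TRUE), drop = FALSE]

print(kept)
```

In this toy case m4 belongs only to lin_C, so it disappears once lin_C is removed, and m1 moves to the front because both remaining lineages carry it.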

Examples

# After cloning the constellations repo
lineage_defs <- astronomize(path = "../constellations")
#> Warning: Path does not exist. Using built-in definitions.
dim(lineage_defs)
#> [1]  37 325
lineage_defs <- filter_lineages(lineage_defs, c("B.1.1.7", "B.1.617.2"))
dim(lineage_defs) # rows and columns have changed
#> [1]  2 36