R for Animal Crossing flower genetics

Animal Crossing: New Horizons uses Mendelian inheritance of bi-allelic polygenic traits to determine flower colors. I wrote some tools in R to help plan flower crosses.
R
Just for fun
Published

May 4, 2020

There are already very nice and user-friendly tools to help out with flower breeding in Animal Crossing: New Horizons. The best one I’ve found is the Flower Breeding Simulator. So, if you’re just trying to get particular flower colors, there are plenty of guides available (but be aware that many of them are not quite accurate).

But for fun, I wanted to develop semi-interactive tools in R myself, because I like to do things the hard way. In a very distant past life, I was a molecular biologist / geneticist, so once I heard about how flower crosses work in ACNH, I was intrigued.

Good background on the mechanics is at ACNH Flower Genetics Guide by Paleh, with a shout-out to Aeter for the data mining.

In brief:

  • flower phenotypes are determined by 3 bi-allelic genes for all species except roses, which have 4
  • the colors are polygenic traits, so it does not make sense to talk about dominant and recessive alleles
  • only flowers of the same species can cross, which occurs via classical Mendelian genetics (e.g., meiosis resulting in random selection of one allele of each gene, from each parent)
  • however, if flowers do not have a suitable adjacent partner, they can ‘clone’ themselves. This is more like mitosis (generating an identical clone) than, for example, the self-fertilization of the hermaphroditic C. elegans. This effect isn’t as well documented, but seems likely.
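The inheritance rule in the list above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the cross_flowers() function used later in this post: the helper names cross_gene() and cross_genotypes() are hypothetical, genotypes are written as strings of 0/1/2 codes (the encoding described in the datafile section), and each gene is assumed to segregate independently.

```r
# Offspring distribution for ONE gene, with genotypes coded 0, 1, 2
# (the code counts copies of the second allele). Hypothetical helper.
cross_gene <- function(p1, p2) {
  a <- p1 / 2  # chance that parent 1 passes on the second allele
  b <- p2 / 2  # chance that parent 2 passes on the second allele
  c(`0` = (1 - a) * (1 - b),
    `1` = a * (1 - b) + (1 - a) * b,
    `2` = a * b)
}

# Combine independent genes; g1 and g2 are strings such as "120"
cross_genotypes <- function(g1, g2) {
  p1 <- as.integer(strsplit(g1, "")[[1]])
  p2 <- as.integer(strsplit(g2, "")[[1]])
  stopifnot(length(p1) == length(p2))
  per_gene <- Map(cross_gene, p1, p2)
  # Every combination of per-gene outcomes (3^n rows)
  grid <- expand.grid(rep(list(0:2), length(p1)))
  prob <- apply(grid, 1, function(combo) {
    prod(mapply(function(d, g) d[[g + 1]], per_gene, combo))
  })
  out <- data.frame(genotype = apply(grid, 1, paste0, collapse = ""),
                    prop_cross = prob,
                    stringsAsFactors = FALSE)
  out <- out[out$prop_cross > 0, ]  # drop impossible genotypes
  rownames(out) <- NULL
  out
}

cross_genotypes("120", "120")
```

Only the first gene is heterozygous in this call, so the three possible offspring genotypes differ only at that gene. Mapping genotypes to colors is a separate lookup against the data-mined table and is not attempted here.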

Challenges:

  • the concept that seems to generate the most confusion is that simply knowing the phenotype (i.e., color) of a flower usually says very little about the genotype. If you just pick flowers off your island, you have no idea what you’re starting with.
  • so, it’s important to use as flower breeding source materials either flowers started from seed or flowers found at rare flower islands, because those would have a known genotype.

Goals:

  • write functions to generate the outcomes of flower crosses, using R tidyverse principles to generate nice dataframes, including summary statistics of proportions of each outcome.

Example of use:

Flower genotype / phenotype datafile

The flower datafile contains one row per flower genotype, with information on flower species (type, e.g., rose, cosmos, etc.), the genotypes (four genes, encoded as 0, 1, 2 for homozygous first allele, heterozygous both alleles, and homozygous second allele, respectively), the phenotype (color), the source (seed, rare flower island, by crosses), a Boolean of whether it will breedtrue (all genes homozygous), and a unique ‘genotype’ field to help with table joins.
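To make the structure concrete, here is a rough base R sketch of how the genotype, breedtrue, and join-key columns could be built for one three-gene species. It is an assumption about the layout rather than the code that produced the actual datafile, the object name genes is made up for the example, and the color and source columns are left out because they have to come from the data-mined tables.

```r
# Enumerate every genotype for a three-gene species (codes 0, 1, 2 per gene)
genes <- expand.grid(red = 0:2, yellow = 0:2, white = 0:2)

# A genotype breeds true when no gene is heterozygous (no code equals 1)
genes$breedtrue <- rowSums(genes[, c("red", "yellow", "white")] == 1) == 0

# Join key: the gene codes pasted together, followed by the species name
genes$genotype <- paste(paste0(genes$red, genes$yellow, genes$white), "hyacinth")

nrow(genes)           # 27 genotypes (3^3)
sum(genes$breedtrue)  # 8 of them are homozygous at every gene (2^3)
```

The same construction with four genes gives 81 rows, matching the 27 rows per three-gene species and 81 rows for roses in the summary of the datafile.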

data %>% mutate_if(is.character, as.factor) %>% summary
       type         red        yellow      white     brightness      color       source    breedtrue               genotype  
 rose    :81   Min.   :0   Min.   :0   Min.   :0   Min.   :0.0   yellow :62   cross :227   Mode :logical   000 cosmos  :  1  
 cosmos  :27   1st Qu.:0   1st Qu.:0   1st Qu.:0   1st Qu.:0.0   red    :50   island: 19   FALSE:198       000 hyacinth:  1  
 hyacinth:27   Median :1   Median :1   Median :1   Median :0.0   white  :46   seed  : 24   TRUE :72        000 lily    :  1  
 lily    :27   Mean   :1   Mean   :1   Mean   :1   Mean   :0.3   orange :37                                000 mum     :  1  
 mum     :27   3rd Qu.:2   3rd Qu.:2   3rd Qu.:2   3rd Qu.:0.0   pink   :27                                000 pansy   :  1  
 pansy   :27   Max.   :2   Max.   :2   Max.   :2   Max.   :2.0   purple :27                                000 tulip   :  1  
 (Other) :54                                                     (Other):21                                (Other)     :264  

I disagree with naming the genes red, yellow, white, and brightness, because the genotype - phenotype associations are not consistent, and also because the different flower species can’t be crossed with one another so there’s no reason why the flowers should have genes of the same name. But I kept the gene names to remain consistent with the data source.

What hyacinths are available with known genotype?

Say I wanted to breed hyacinths. First I might like to know which genotypes I can obtain from seeds and rare flower islands. We’ll filter the datafile by type and source:

data %>% filter(type == 'hyacinth', source != 'cross') %>% arrange(source, genotype)
  type       red yellow white brightness color  source breedtrue genotype    
  <chr>    <int>  <int> <int>      <int> <chr>  <chr>  <lgl>     <chr>       
1 hyacinth     1      0     1          0 pink   island FALSE     101 hyacinth
2 hyacinth     1      2     0          0 orange island FALSE     120 hyacinth
3 hyacinth     2      1     0          0 blue   island FALSE     210 hyacinth
4 hyacinth     0      0     1          0 white  seed   FALSE     001 hyacinth
5 hyacinth     0      2     0          0 yellow seed   TRUE      020 hyacinth
6 hyacinth     2      0     1          0 red    seed   FALSE     201 hyacinth

I wrote a multi-purpose function that can show you the results of a single cross, or else you can specify all the genotypes you have available to you, and it will generate the results of all possible crosses (including self-crosses).

Determining possible outcomes of a simple cross

First, a simple cross of two flowers of the same genotype. Let’s say the orange hyacinth from a rare flower island with genotype 120 hyacinth (the first gene is heterozygous and the other 2 are homozygous):

Cross where all progeny have known genotype

cross_flowers('hyacinth 120 120', single_cross = TRUE)
# A tibble: 3 x 6
  cross                       genotype     color  breedtrue prop_cross prop_color
  <chr>                       <chr>        <chr>  <lgl>          <dbl>      <dbl>
1 120 (orange) x 120 (orange) 120 hyacinth orange FALSE           0.5           1
2 120 (orange) x 120 (orange) 220 hyacinth purple TRUE            0.25          1
3 120 (orange) x 120 (orange) 020 hyacinth yellow TRUE            0.25          1

What do you know? You get the classic 1:2:1 Mendelian ratio, which comes from the heterozygous first gene. Columns:

  • cross: specifies the originating parents (more helpful later)
  • genotype: the progeny genotype
  • color: the outcome phenotype
  • breedtrue: whether the progeny genotype is homozygous at every gene, so that self-crosses reproduce it
  • prop_cross: in this specific cross, the proportion which will be this genotype
  • prop_color: in this specific cross AND outcome color, the proportion of progeny of this color that will be this genotype.

In this example, it’s rather nice that every possible result color is from a known genotype. That’s not always the case. For example, suppose we grow store-bought white hyacinth seeds (with genotype 001) and self-cross them:

Cross where some progeny look alike but can have different genotypes

cross_flowers('hyacinth 001 001', single_cross = TRUE)
  cross                     genotype     color breedtrue prop_cross prop_color
  <chr>                     <chr>        <chr> <lgl>          <dbl>      <dbl>
1 001 (white) x 001 (white) 002 hyacinth blue  TRUE            0.25      1    
2 001 (white) x 001 (white) 001 hyacinth white FALSE           0.5       0.667
3 001 (white) x 001 (white) 000 hyacinth white TRUE            0.25      0.333

Blue progeny will make up 25% of the offspring (prop_cross = 0.25), and you know that all of them will be of genotype 002 (prop_color = 1). Furthermore, breedtrue == TRUE, meaning any self-cross will regenerate the same genotype. Yay! However, 75% of the time you’ll get white progeny, and of them, 66.7% will be genotype 001 and 33.3% will be genotype 000. You should not use white flowers that result from this cross for further crosses without being aware of the uncertain starting genotype.
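The prop_color arithmetic is just prop_cross rescaled within each color. Here is a small base R sketch using the three rows of the white self-cross above. The data frame is typed in by hand for illustration, and this is not necessarily how cross_flowers() computes the column internally.

```r
# Outcomes of the 001 x 001 hyacinth self-cross, copied from the table above
outcomes <- data.frame(
  genotype   = c("002", "001", "000"),
  color      = c("blue", "white", "white"),
  prop_cross = c(0.25, 0.5, 0.25)
)

# Share of each genotype among progeny of the SAME color:
# divide by the total prop_cross of that color
outcomes$prop_color <- outcomes$prop_cross /
  ave(outcomes$prop_cross, outcomes$color, FUN = sum)

round(outcomes$prop_color, 3)  # 1.000 0.667 0.333
```

White progeny account for 0.5 + 0.25 = 0.75 of the offspring, so the 001 genotype makes up 0.5 / 0.75 of the whites and 000 makes up 0.25 / 0.75.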

Finding ALL possible outcomes of ALL crosses, from a list of sources

Another feature I wanted was to know what I could generate with what I had on hand, in all possible crosses. Say I had access to all the store-bought hyacinth seeds:

  • white 001
  • yellow 020
  • red 201

... but none of the rare flower island sources. What can I get from all possible crosses (including self-crosses) of these sources?

cross_flowers('hyacinth 001 020 201')
   cross                       genotype     color  breedtrue prop_cross prop_color
   <chr>                       <chr>        <chr>  <lgl>          <dbl>      <dbl>
 1 001 (white) x 001 (white)   002 hyacinth blue   TRUE            0.25      1    
 2 001 (white) x 001 (white)   001 hyacinth white  FALSE           0.5       0.667
 3 001 (white) x 001 (white)   000 hyacinth white  TRUE            0.25      0.333
 4 020 (yellow) x 001 (white)  010 hyacinth yellow FALSE           0.5       0.5  
 5 020 (yellow) x 001 (white)  011 hyacinth yellow FALSE           0.5       0.5  
 6 201 (red) x 001 (white)     101 hyacinth pink   FALSE           0.5       1    
 7 201 (red) x 001 (white)     100 hyacinth red    FALSE           0.25      1    
 8 201 (red) x 001 (white)     102 hyacinth white  FALSE           0.25      1    
 9 020 (yellow) x 020 (yellow) 020 hyacinth yellow TRUE            1         1    
10 201 (red) x 020 (yellow)    110 hyacinth orange FALSE           0.5       1    
11 201 (red) x 020 (yellow)    111 hyacinth yellow FALSE           0.5       1    
12 201 (red) x 201 (red)       201 hyacinth red    FALSE           0.5       0.5  
13 201 (red) x 201 (red)       200 hyacinth red    TRUE            0.25      0.25 
14 201 (red) x 201 (red)       202 hyacinth red    TRUE            0.25      0.25 

So the 6 possible crosses generate 14 possible genetic outcomes. Seven (with prop_color == 1) can be genotyped precisely based on the parental cross and the progeny phenotype. These might be easier to use for subsequent crosses.
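The count of 6 crosses comes from pairing every source with every other source and then adding the self-crosses. Here is a quick base R sketch of that enumeration. The object names are made up for the example, and it only lists the parent pairs; it does not compute any progeny.

```r
sources <- c("001", "020", "201")

# Unordered pairs of two different parents
pairs <- t(combn(sources, 2))

# Each parent crossed with itself
selfs <- matrix(rep(sources, 2), ncol = 2)

crosses <- rbind(pairs, selfs)
nrow(crosses)  # 6, i.e. n * (n + 1) / 2 for n = 3 sources
```

Parent order is ignored here, so 020 x 001 and 001 x 020 count as the same cross.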

Finding cross outcomes where color will uniquely identify genotype

cross_flowers('hyacinth 001 020 201') %>% filter(prop_color == 1)
  cross                       genotype     color  breedtrue prop_cross prop_color
  <chr>                       <chr>        <chr>  <lgl>          <dbl>      <dbl>
1 001 (white) x 001 (white)   002 hyacinth blue   TRUE            0.25          1
2 201 (red) x 001 (white)     101 hyacinth pink   FALSE           0.5           1
3 201 (red) x 001 (white)     100 hyacinth red    FALSE           0.25          1
4 201 (red) x 001 (white)     102 hyacinth white  FALSE           0.25          1
5 020 (yellow) x 020 (yellow) 020 hyacinth yellow TRUE            1             1
6 201 (red) x 020 (yellow)    110 hyacinth orange FALSE           0.5           1
7 201 (red) x 020 (yellow)    111 hyacinth yellow FALSE           0.5           1

With the source dataframe, we can also ask other questions. Like, what color roses are there, and how many genotypes account for each color?

Listing rose colors and number of genotypes specifying each

data %>%
  filter(type == 'rose') %>%
  group_by(color) %>%
  summarise(n = n()) %>%
  arrange(desc(n))
  color      n
  <chr>  <int>
1 white     18
2 yellow    18
3 red       13
4 orange     9
5 pink       9
6 purple     9
7 black      4
8 blue       1

Ahh, there’s only 1 genotype that produces the elusive blue rose.

data %>% filter(type == 'rose', color == 'blue')
  type    red yellow white brightness color source breedtrue genotype 
  <chr> <int>  <int> <int>      <int> <chr> <chr>  <lgl>     <chr>    
1 rose      2      2     2          0 blue  cross  TRUE      2220 rose

It’s actually pretty boring: it’s homozygous at each of the 4 loci (4 genes for roses, 3 for all the other flowers), so it will breed true without any further work. ACNH players would have been far more confused if blue roses were generated by, say, a 1111 genotype. In that case, I would be trying to generate 2222 and 0000 parents, since crossing them would produce 100% of the target 1111 genotype, whereas selfing the target would generate an unholy mess of outcomes.
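To put a number on that hypothetical mess, here is a short base R sketch of selfing a 1111 flower. It assumes the four genes segregate independently, and it only counts genotypes, since the colors would need the datafile.

```r
# One heterozygous gene, selfed: probabilities of codes 0, 1, 2
het_self <- c(0.25, 0.5, 0.25)

# Four independent genes: joint probabilities of all 3^4 genotypes
joint <- Reduce(function(acc, g) as.vector(outer(acc, g)),
                rep(list(het_self), 4))

length(joint)   # 81 possible genotypes
sum(joint > 0)  # all 81 can actually occur
max(joint)      # 0.0625, the chance of getting 1111 back (1/2 per gene)
```

So only 1 in 16 offspring of a selfed 1111 would be 1111 again, versus all of the offspring of a 2222 x 0000 cross.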

How problematic is it to use a flower color alone to infer genotype?

With the flower data available, it’s easy to see exactly why it’s a very bad idea to just pick flowers off of player islands and use the color alone to infer genotype and initiate crosses.

First, let’s see how many unique flower species / color combinations there are:

data %>% select(type, color) %>% unique() %>% nrow()
[1] 52

So, there are 52 flower / color combinations available. We can actually generate a table listing them, with the numbers representing the number of distinct genotypes that generate that combination:

Number of genotypes representing each flower / color combination

data %>%
  group_by(type, color) %>%
  summarise(n = as.character(n())) %>%
  pivot_wider(names_from = color, values_from = n, values_fill = list(n = '-'))
  type       black orange pink  red   white yellow blue  purple green
  <chr>      <chr> <chr>  <chr> <chr> <chr> <chr>  <chr> <chr>  <chr>
1 cosmos     2     7      4     5     4     5      -     -      -    
2 hyacinth   -     2      1     6     4     9      2     3      -    
3 lily       2     4      3     3     8     7      -     -      -    
4 mum        -     -      4     6     3     6      -     6      2    
5 pansy      -     5      -     6     2     8      3     3      -    
6 rose       4     9      9     13    18    18     1     9      -    
7 tulip      2     2      1     5     5     9      -     3      -    
8 windflower -     8      5     6     2     -      3     3      -    

Only 3 out of the 52 different flower / color combinations are represented by a unique genotype, hence the warning that flower colors alone should not be used to infer genotype. Flowers sourced from seeds or a rare flower island, both of which have known genotypes, should be used instead.

The only 3 flowers where species and color alone identify genotype

data %>% group_by(type, color) %>% filter(n() == 1) %>% ungroup()
  type       red yellow white brightness color source breedtrue genotype    
  <chr>    <int>  <int> <int>      <int> <chr> <chr>  <lgl>     <chr>       
1 rose         2      2     2          0 blue  cross  TRUE      2220 rose   
2 tulip        1      0     1          0 pink  island FALSE     101 tulip   
3 hyacinth     1      0     1          0 pink  island FALSE     101 hyacinth

It looks like the easiest way (other than ‘cloning’ individual flowers with no available partners) to generate sources of pink tulips and pink hyacinths would be to cross parental 202 and 000 genotypes, which would always yield the 101 genotype of the pink flowers.

There you go, tidy analysis of Animal Crossing New Horizons flower genetics, done in R!

The full source code of the functions as well as the data file are available on GitHub.