Dataset description:

Animal Age Dataset.

AnAge is a database of longevity, ageing, and life history in extant species. It is built on the same engine as GenAge (the benchmark database of genes related to ageing).

Initial questions: Based on the AnAge dataset, is there a relationship between the time mammals take to reach maturity and their maximum lifespan? Does gender affect this relationship?


Description: These two scatterplots show the relationship between the average time mammals take to reach maturity (maturity days) and the average maximum longevity of species in the class Mammalia. The left plot shows female avg. maturity days against avg. maximum longevity. The right plot shows male avg. maturity days against avg. maximum longevity. Each point represents a species; darker points indicate several species at the same or nearly the same position. Hovering over a point reveals the species' name, its female and male average maturity days, and its average maximum longevity. Species discussed in the text are highlighted in red.
Insight: The scatterplots above show a relationship between the time mammals take to reach maturity and their maximum lifespan. In general, the longer a mammal takes to reach maturity, the longer it lives, and both genders show a similar pattern. The correlation is not perfect, however: several species take longer than others to reach maturity yet have a shorter average maximum longevity (e.g. crassidens, highlighted in red).
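One way to put a number on "not perfect" is Pearson's correlation coefficient, which is 1 for a perfectly linear increasing relationship and smaller when points stray from the line. The sketch below is only an illustration: the values are made up rather than taken from AnAge, and `pearson_r` is a hypothetical helper that was not part of building the plots.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only (not real AnAge records).
maturity_days = [100, 400, 900, 1500, 3000]
longevity_yrs = [5, 12, 20, 18, 45]

r = pearson_r(maturity_days, longevity_yrs)
print(f"Pearson r = {r:.2f}")
```

A value close to, but below, 1 would match what the scatterplots suggest: a clear upward trend with some species off the line.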

Another interesting finding is that female mammals generally reach maturity sooner than male mammals. In some species, such as crassidens (highlighted in red), females reach maturity far sooner than males. Hovering over several points makes this visible.

More interestingly, the WHO has come to a similar conclusion, but only for humans (the sapiens species).
Design considerations: I chose a scatterplot because it is a very effective way to show the relationship between two quantitative variables. Alternative visualisations of distributions, such as box plots and histograms, cannot display a relationship between two quantitative variables as clearly. Box plots are usually used to show the distribution of numeric values, especially when comparing multiple groups. Similarly, histograms are good for showing the general distributional features of a single variable.

I have also used tooltips, so hovering over a point shows the associated information. A limitation of this plot is that it suffers from overplotting: many points are drawn on top of each other. To reduce the impact of this I have used opacity, so darker circles indicate more species at the same or a very similar position. This has limitations of its own. Notably, it is impossible to tell exactly how many species sit at one particular value, and many points are clustered around 0, but the overall trend is still very apparent.
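Since opacity cannot show exact counts, one complementary check is to count how many points fall into each small grid cell. This is a minimal sketch under assumptions: the cell sizes are arbitrary, the points are made up, and `overlap_counts` is a hypothetical helper rather than part of the original plot.

```python
from collections import Counter


def overlap_counts(points, cell_x, cell_y):
    """Count points per grid cell of size cell_x by cell_y."""
    return Counter((int(x // cell_x), int(y // cell_y)) for x, y in points)


# Made-up (maturity days, longevity yrs) pairs, for illustration only.
points = [(90, 3), (95, 4), (110, 6), (1500, 20)]

counts = overlap_counts(points, cell_x=100, cell_y=5)
for cell, n in counts.most_common():
    print(cell, n)
```

Cells with a count above 1 are the places where circles overlap and appear darker in the plot.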

Removing some data, such as the top right point, would reduce overplotting and spread out the many points clustered around 0. However, because that point is very much in line with the overall trend, I kept it.

Data filtering and transformation: Only species that belong to the class Mammalia, have a recorded maximum longevity (yrs) greater than zero, and have a male or female maturity (days) greater than zero are shown.

The data are aggregated with the "average" operation to show the average maximum longevity and the average maturity days.
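The filtering and averaging steps can be sketched in plain Python as follows. This is an illustration under assumptions, not the procedure used to build the plots: the field names are modelled on the fields named above and may not match the actual file, "male or female" is read as "at least one of the two", the records are made up, and `keep_row` and `average_by_species` are hypothetical helpers.

```python
from collections import defaultdict
from statistics import mean

# Field names are assumptions modelled on the fields named in the text.
CLASS = "Class"
SPECIES = "Species"
FEMALE = "Female maturity (days)"
MALE = "Male maturity (days)"
LONGEVITY = "Maximum longevity (yrs)"
MEASURES = (FEMALE, MALE, LONGEVITY)


def keep_row(row):
    """Apply the three filters to one record (a dict)."""
    return (
        row[CLASS] == "Mammalia"
        and row[LONGEVITY] > 0
        and (row[FEMALE] > 0 or row[MALE] > 0)
    )


def average_by_species(rows):
    """Average each measure over all kept records of each species."""
    groups = defaultdict(list)
    for row in rows:
        if keep_row(row):
            groups[row[SPECIES]].append(row)
    return {
        species: {m: mean(r[m] for r in members) for m in MEASURES}
        for species, members in groups.items()
    }


# Made-up records for illustration only (not real AnAge values).
rows = [
    {CLASS: "Mammalia", SPECIES: "alpha", FEMALE: 100, MALE: 120, LONGEVITY: 10},
    {CLASS: "Mammalia", SPECIES: "alpha", FEMALE: 200, MALE: 220, LONGEVITY: 20},
    {CLASS: "Mammalia", SPECIES: "beta", FEMALE: 0, MALE: 0, LONGEVITY: 5},
    {CLASS: "Aves", SPECIES: "gamma", FEMALE: 300, MALE: 320, LONGEVITY: 30},
    {CLASS: "Mammalia", SPECIES: "delta", FEMALE: 50, MALE: 60, LONGEVITY: 0},
]

averages = average_by_species(rows)
print(averages)
```

In this toy input only "alpha" passes all three filters, and its two records are averaged into a single point. Note that under the "at least one" reading, a zero in one maturity field would still enter that field's average, so treating zeros as missing values may be worth considering.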