Dataset description:

Animal Age Dataset.

AnAge is a database of longevity, ageing, and life history in extant species. It uses the same engine as GenAge (the benchmark database of genes related to ageing).

Initial questions: Where does the data come from? Is the quality of the data good enough? Can the data be trusted?


Description: This bar chart shows where the data come from, how many records were collected from each specimen origin, and how good those records are. The bars are sorted by number of records, from largest to smallest. Each bar represents a specimen origin and is coloured according to data quality: blue means the quality of the corresponding data is acceptable, orange means it is high, and so on. The legend lists every quality level. Hovering over a coloured segment reveals the associated specimen origin, data quality and number of records.
Insight: The bar chart shows that the data have three origins: captivity, wild and unknown. Most of the data come from captive or wild individuals, and there are slightly more captive records than wild ones. The quality of most of the data is acceptable, which means the data are worth analysing and interpreting. In particular, some of the captive data are of high quality, while low-quality data make up the majority of the records with unknown origin.

Overall, based on the bar chart above, I believe that the quality of this dataset is good enough to support further analysis and research.

Design considerations: I chose a bar chart not only because it is a very effective way to convey quantitative information, but also because it shows the differences between categories visually. I decided against a pie chart because it cannot effectively show small differences between categories (such as wild and captivity), although a pie chart is a good way to show a category's share of the whole.

Rather than adding more bars, I chose to use colours to separate the data qualities within each bar, because this presents the information more clearly and more attractively. Separate bars are unnecessary here because the exact number of records for each data quality in each category does not need to be read directly from the chart. Colour conveys the general proportions within each bar, which is enough for the current purpose.

I have also used tooltips, so hovering over a coloured segment shows the associated information. I placed the specimen origin on the Y-axis and the number of records on the X-axis so that the category labels read horizontally.
Data filtering and transformation: The data are aggregated with a "count" operation to calculate the number of records for each specimen origin and data quality.
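
The "count" aggregation and the descending bar order can be sketched in Python as follows. This is only an illustration using the standard library, not the tool used to build the chart; the sample records and the field names `origin` and `quality` are hypothetical and may not match the actual AnAge columns.

```python
from collections import Counter

# Hypothetical sample records; real AnAge column names and values may differ.
records = [
    {"origin": "captivity", "quality": "acceptable"},
    {"origin": "captivity", "quality": "high"},
    {"origin": "captivity", "quality": "acceptable"},
    {"origin": "wild", "quality": "acceptable"},
    {"origin": "wild", "quality": "acceptable"},
    {"origin": "unknown", "quality": "low"},
]


def count_records(rows):
    """Count records per (origin, quality) pair: one count per coloured segment."""
    return Counter((row["origin"], row["quality"]) for row in rows)


def sort_origins(segment_counts):
    """Return (origin, total) pairs ordered by total, largest first (the bar order)."""
    totals = Counter()
    for (origin, _quality), n in segment_counts.items():
        totals[origin] += n
    return totals.most_common()


segment_counts = count_records(records)
bar_order = sort_origins(segment_counts)
```

Here `segment_counts` holds the length of each coloured segment, and `bar_order` gives the total length of each bar in the order it would appear on the Y-axis.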