VisQualdex: a Comprehensive Guide to Good Data Visualization

Sawicki, Jan; Burdukiewicz, Michał

doi:10.26583/sv.15.1.11

Scientific Visualization, 2023, volume 15, number 1, pages 127 - 149, DOI: 10.26583/sv.15.1.11

Authors: Jan Sawicki^1,A, Michał Burdukiewicz^2,B

^A Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland

^B Institute of Biotechnology and Biomedicine, Autonomous University of Barcelona, Barcelona, Spain

¹ ORCID: 0000-0002-8930-7564, jansawickisawicki@gmail.com

² ORCID: 0000-0001-8926-582X, michalburdukiewicz@gmail.com

Abstract

The rapid influx of low-quality data visualizations is one of the main challenges in today’s communication. Misleading, unreadable, or confusing visualizations spread misinformation. Furthermore, they fail to deliver their message to the viewer. The scale of the problem is large enough that there already exist public fora gathering tens of thousands of users criticizing graphics and charts made with obvious mistakes (reddit.com/r/dataisugly). Current attempts at evaluating data visualization appear mostly as simple and overgeneralized checklists, and often lack systematicity and versatility. The lack of proper tooling for evaluating data visualization quality further aggravates the problem.

Therefore, this paper proposes VisQualdex, a systematic set of guidelines for static data visualization. The codex categorization is based on and inspired by the theory of the Grammar of Graphics. It contains dozens of criteria designed to catch various errors and mistakes of different categories and magnitude. Furthermore, it has been peer-reviewed and tested by experts in data visualization, data science, graphics design, information technology and computer science.

To apply theory in the real world and increase the practical impact of VisQualdex, this contribution also introduces a practical tool. The implementation of the guidelines is available in the form of a web server, https://visqual.onrender.com, developed as a single page application in JavaScript using Vue.js and Material Design principles.

Keywords: data visualization, guidelines, visualization checklist, grammar of graphics.

1 Introduction

The earliest traces of data visualization reach back to before the 17th century. Its “golden age” dates to the second half of the 19th century [1]. However, it is the 21st century that may be called the renaissance of data visualization.

Human preference to acquire information with visual means [2] in tandem with the time efficiency of conveying information with charts [3] results in massive production of data visualization applied in various fields, including business analysis, Big Data, psychology, journalism, and production process [4,5]. Mass-produced charts, graphs, diagrams, schemes and infographics flood the market and the viewers [6].

The influx of data visualizations is possible thanks to a vast plethora of specialized tools. Although novel software makes visualization easier, it does not ensure the quality of its creations. Moreover, no modern tools are designed to evaluate the quality of a data visualization. We are still limited to guidelines presented in checklists, lists of questions or suggestions that often fail to fulfill their task [7]. To fill this void, we propose VisQualdex: organized, systematic data visualization guidelines based on state-of-the-art practices.

Our methodology allows everyone, from non-specialists to data science experts, to assess the quality of a data visualization and pinpoint existing problems. Compared to current methods for data visualization evaluation, VisQualdex leads to a more exhaustive and complete evaluation because it uses systematic, precise and scientifically supported criteria. Moreover, VisQualdex is also available as a companion web application, VisQual, and thus can be easily incorporated into many design workflows.

1.1 The spectrum of data visualizations

The focus of VisQualdex is to evaluate the quality of broadly understood static “data visualizations”. The definitions in the literature vary from the “image that is representative of the raw data” [8] to “the set of methods for graphically displaying information in a way that is understandable and straightforward” [9]. Despite the relative vagueness of these definitions, they capture the essential aim of data visualization, which is to communicate information in a graphical form. The “static” keyword indicates that this guideline refers to visualizations that could simply be printed out without losing their key features, i.e. interactive dashboards and true 3D visualizations are out of scope.

In all types of communication, there are many possibilities of conveying the same message. Therefore, the data visualization itself covers drastically different entities, ranging from the simple chart (Figure 1) to more complex visualizations (Figure 2). It provides an additional layer of complication to the data visualization assessment, as the rules must be general enough to apply to all types of data visualization.

Figure 1: An example of a “classical” data visualization

Figure 2: An example of a “non-classical” data visualization (adapted from Charles Lallemand’s “tour de force” nomogram from a work on hexagonal charts [10])

1.2 State of the art

Despite the vagueness of definitions, the way we describe data visualizations has been subject to numerous improvements over time [1]. These small steps have resulted in more structured descriptions of visualizations, exemplified by the grammar of graphics. These theoretical achievements have contributed to the development of practical tools such as Matplotlib, Seaborn, Plotly and others [11–14].

In stark contrast, the evaluation of data visualization is still as unsystematic as it was in the past. The first debate on the topic dates back to 1857 and resulted in several pieces of advice [15]. The theoretical understanding of the correctness of data visualization has grown over time with new or revised good practices [8], but very few works propose practical and applicable evaluation methods.

The lack of a systematic approach stems from the popularity of checklists as one of the most commonly adopted evaluation systems. Here, a checklist is a list of potential mistakes, sometimes divided into thematic sections. The first checklists (or guidelines) date back to 1915 [16], but the community of data visualization practitioners is still producing newer counterparts [17].

Checklists have two main practical advantages: simplicity and shortness. Simplicity means that the majority of the state-of-the-art checklists can be easily operated by any viewer with a basic understanding of the most trivial data visualization concepts, like chart, axis or scale. Moreover, most visualization checklists are brief (e.g., about 20 questions [17, 18]). This short length, along with the simplicity, translates to a relatively good assessment pace.

1.3 Disadvantages of the checklist usage

However, checklists are characterized by the disorder happening on two different conceptual levels: the lack of organization and varying levels of universality.

In the majority of studied examples, checklists do not possess any grouping or hierarchy of the guidelines. Although some checklists categorize their guidelines to some degree, these categories often do not reflect state-of-the-art descriptions of data visualization and may rest on a purely practical rationale.

The varying level of universality arises when general guidelines (e.g., “keep the graph two-dimensional”) occur along with more specific pieces of advice (e.g., “use bar charts to visualize achievement of an objective”). This limits the scope of the checklist to a particular set of data visualizations. The problem is even more pronounced if the checklist contains a scale based on the number of questions answered correctly or guidelines fulfilled. In this situation, non-general guidelines falsely lower the actual score of a visualization.

Moreover, using checklists means following the state-of-the-art methodology and logic of checklists [19], which states that (some points are skipped):

• “The list should be complete (no significant omissions).” and “The checkpoints should refer to criteria and not mere indicators.”

According to this research, the second rule is most often broken as none of the state-of-the-art checklists have “complete”/“full” coverage of the evaluation criteria. As most state-of-the-art data visualization checklists aim at an “engineering” approach, they tend to have very strict, concise and precise points. However, this causes them to sometimes be superficial, oversimplified and focused on indicators instead of criteria. For example, the rule “No more than 3 colors” [18] focuses purely on an indicator and prevents great 4-color visualizations from passing the benchmark. Similarly, the rule “Did you start the Y-axis at 0?” (answer yes/no) [20] fails charts that truncate the axis for a justifiable reason (e.g. shoe sizes for adults, Earth temperatures in kelvins, etc.).

• “The criteria should be commensurable.”

This criterion is not applicable to data visualization checklists, because some mistakes are more significant than others. There exist visualizations which follow all criteria except one and are still gravely wrong.

• “The list should be concise (to assist its mnemonic function).”

In contrast to engineering processes or medical applications [21], data visualization does not always follow strict regularities as it is a mix of applied arts and fine arts which cannot be separated [22,23].

Therefore, applying the checklist format to data visualization evaluation may lead to problems which are the fault of neither the checklist methodology nor data visualization evaluation. The problem lies in forcefully fusing the two ideas together.

All problems described above, together with the brevity, result in the non-exhaustiveness of checklists. Right now, there are no checklists that approach the evaluation of data visualization in a systematized way. Therefore, we have designed VisQualdex to at least partially alleviate these issues and produce an evaluation methodology applicable to the broad spectrum of data visualizations.

2 Design of VisQualdex

The development of VisQualdex follows the VISupply framework for design of data visualization guidelines [24]. It covers four main steps:

• Evidence collection

Collection and diligent analysis of research works about data visualization.

• Integration

Curation of existing data visualization guidelines, good practices, suggestions and similar.

• Contextualization & Generalization

Merging concepts from different works and forming clusters, a.k.a. “categories” (see section 2.1).

• Guideline Definition

Formalization of VisQualdex, i.e. stating the “question” format, supplementing missing areas and verifying VisQualdex in practice, utilizing and extending nomenclatural notions [25].

Moreover, an original concept introduced in VisQualdex consists of the four main traits of a correct data visualization. These four pillars of VisQualdex are:

1. Real data instead of guesstimates.

2. Clarity and readability instead of incomprehensibility and ambiguity.

3. Simplicity and summarization instead of complexity and raw data.

4. Guidance and objectivity instead of manipulation and subjectivity.

Finally, the codex has been peer-reviewed by 4 independent reviewers. The reviewers were experts and specialists in the following fields (parentheses give the reviewer’s broader domain): data visualization (computer science), data science (computer science), graphics and design (fine arts) and information technology (computer science). All reviewers submitted their critique of every question and of the overall codex design. The feedback was gathered in 1–3 iterations, depending on the reviewer. The comments and suggestions caused various criteria (VisQualdex questions) to be introduced, redefined or abandoned due to lack of quality evidence.

It is important to note that in some initial stages of development, the tool was supposed to be based on user testing and feedback in a “wisdom of the crowds” methodology. However, the users’ feedback was very fragmented and not unanimous. Moreover, it was highly biased by the user experience. Therefore, the expert approach was chosen to strengthen the final criteria, maximize the good practices and minimize the bias of random user evaluation.

2.1 Categories

An important part of VisQualdex is the categorization of guidelines. The baselines are the formalized descriptions of the Grammar of Graphics (GoG) [26] and the Layered Grammar of Graphics (LGoG) [27]. Although both of these approaches constitute an in-depth description of data visualization, they are used primarily for either building or decomposing the visualization object. As this is a different goal from the evaluation of data visualizations, GoG and LGoG are only reference points. Therefore, VisQualdex utilizes a redesigned categorization of guidelines. Furthermore, the proposed categories are complete (i.e. there is no “others” category) and disjoint (i.e. each question belongs to exactly one category).

Subjective

The objective of this category is to incorporate any purely subjective aspects of the visualization. Although all categories concern issues that may be answered differently depending on the viewer, this one focuses on things exclusively related to the opinion of the on-looker.

Theme

This category contains all visual features and artistic choices not directly depending on data, like colors (not related to the color scale), fonts, spacing, and any additional graphics that are not strictly part of the chart.

Coordinates

This category is responsible for the coordinate system and units. Its purpose is to check if all coordinate systems (or their alternatives), units and axes are correctly prepared, provided and presented. It also examines if the relation between the data itself and all the aspects above is consistent.

Geometry

This category includes all information about the shapes used for data presentations (e.g., the shape itself and its dimensions). It concerns the shape of the whole figure as well as all used figures and any other geometrical aspects.

Guides

This category handles any text content that appears on the visualization. It focuses only on the content, not the display, of e.g. the title, legend, axis labels, additional comments, labels etc. It verifies, most importantly, the content of the text, but also its clarity, objectivity and overall necessity.

Perception

This category focuses on the general perception of the data. It is also responsible for detecting all misuses leading to the incorrect understanding of the data, e.g., bar charts with bars starting at an arbitrary point to make the difference between bar length more pronounced.

Data

This category is responsible for evaluating issues related only to data and all the possible issues such as data source/validity, missing data, and appropriateness of data explanations (e.g., used metrics).
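The two properties claimed for this categorization (complete and disjoint) can be checked mechanically. The sketch below is a minimal illustration in Python and is not part of VisQualdex itself. The `CATEGORIES` tuple lists the seven categories named in this section; the function name `check_partition` and the sample question-to-category mapping are hypothetical and serve only to show the check.

```python
# Hypothetical sketch: verify that a question-to-category assignment is
# complete (every question has a known category) and disjoint (exactly one).
CATEGORIES = (
    "Subjective", "Theme", "Coordinates", "Geometry",
    "Guides", "Perception", "Data",
)


def check_partition(assignment):
    """Return a list of problems found in a {question: [categories]} mapping."""
    problems = []
    for question, cats in assignment.items():
        unknown = [c for c in cats if c not in CATEGORIES]
        if unknown:
            problems.append(f"{question!r}: unknown category {unknown}")
        if len(cats) == 0:
            problems.append(f"{question!r}: no category (would need 'others')")
        elif len(cats) > 1:
            problems.append(f"{question!r}: belongs to {len(cats)} categories")
    return problems


# Illustrative data only; the last entry is deliberately ambiguous.
sample = {
    "Is there a scale/axis?": ["Coordinates"],
    "Is the data correct?": ["Data"],
    "Are colors distinct?": ["Theme", "Perception"],
}
problems = check_partition(sample)
for line in problems:
    print(line)
```

An empty result means the assignment is a valid partition under these assumptions; here the deliberately ambiguous last entry is reported.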

2.1.1 Questions

Each category contains questions which represent unitary criteria based on the pillars described in the beginning of section 2. The most important features of them are:

• All questions are “yes or no” and trigger (negative answer) only if something is incorrect.

• Questions do not overlap or include each other.

• All questions address issues that are as general as possible while focusing on one particular type of mistake. It means that each question can be applied to any visualization regardless of factors such as form, type or content. However, some categories are incompatible with some visualizations by definition, e.g., a simple bar chart without any faceting cannot be evaluated in terms of faceting.

• It is possible for one general bad practice to trigger many questions.

• Depending on the context, a single negative answer may have a tiny or gigantic impact on the visualization understanding. It means that it is impossible to judge the quality of a visualization solely by the fraction of positively answered questions.

• Literature sources support most questions (the complete citations list available in the supplementary materials).

VisQualdex contains a total of 60 criteria, each phrased as a question that addresses/detects a different mistake. We present their general content in the form of a word cloud (Figure 3). The full list of questions is available in the supplementary materials. Here, we present and analyze a few exemplary questions.

Figure 3: Wordcloud displaying most popular words used in questions (without stopwords)

Q: "Is there not too many colors representing the data?" In the case of a gradient color scale, the distribution of the colors should be regular. Figure 4 shows a proper distribution of colors on a gradient scale. Even though this rule does not touch upon the choice of colors, it is worth mentioning that other studies [28] suggest refraining from the “rainbow scale” and advise simpler/fewer color combinations instead.

Figure 4: Example of an equidistributed color gradient scale (using dataset Iris [29])
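One way to read “regular distribution” on a gradient scale is that consecutive colors are separated by equal steps. The sketch below makes that reading concrete with plain linear interpolation in RGB. This is an assumption made for illustration only: interpolating in a perceptually uniform color space, the topic of [28], would be a more careful choice. The function name `even_gradient` and the endpoint colors are hypothetical.

```python
def even_gradient(start, end, n):
    """Return n RGB tuples evenly spaced between start and end (inclusive)."""
    if n < 2:
        raise ValueError("need at least two colors")
    colors = []
    for i in range(n):
        t = i / (n - 1)  # position along the gradient, from 0.0 to 1.0
        colors.append(tuple(round(s + (e - s) * t) for s, e in zip(start, end)))
    return colors


# Five evenly spaced steps from white to a dark blue (arbitrary endpoints).
palette = even_gradient((255, 255, 255), (0, 0, 128), 5)
print(["#%02x%02x%02x" % c for c in palette])
```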

Q: "Does it omit or utilize properly the third dimension?" This question focuses on minimizing the additional dimensional complications of the visualization. According to state-of-the-art research [30], using more than two dimensions in a visualization may be misleading and difficult to perceive. Moreover, another study [31] shows that interpreting angles (which appear a lot more often in 3D graphics) has many possible harmful implications, ranging from minor “illusory effects” distorting the viewer’s perception to completely hiding some data points in the visualization. See Figure 5 for a graphical example of how differently the same angle (data point) looks due to different projections.

Figure 5: Example of different angles for a 3D pie chart
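The distortion of angles can be quantified with a simplified model. The sketch below assumes a flat pie chart tilted about its horizontal axis and drawn with an orthographic projection, so that vertical coordinates shrink by the cosine of the tilt. Real 3D pie charts typically add perspective and extrusion, so this is only a rough illustration; the function name `apparent_angle` and the chosen tilt are hypothetical.

```python
import math


def apparent_angle(start_deg, end_deg, tilt_deg):
    """Apparent central angle (degrees) of a slice after tilting the pie.

    The slice runs counter-clockwise from start_deg to end_deg on the flat
    chart. Tilting by tilt_deg about the horizontal axis scales vertical
    coordinates by cos(tilt) (orthographic projection, an assumption).
    """
    k = math.cos(math.radians(tilt_deg))

    def projected(deg):
        a = math.radians(deg)
        return math.atan2(k * math.sin(a), math.cos(a))

    span = math.degrees(projected(end_deg) - projected(start_deg))
    return span % 360.0


# The same 90-degree slice, placed at the side and at the back of the chart.
for start, end in [(-45, 45), (45, 135)]:
    print(start, end, round(apparent_angle(start, end, 60), 1))
```

With a tilt of 60 degrees, the same quarter slice appears as roughly 53 degrees when placed at the side of the chart and roughly 127 degrees when placed at the back or front, which is the kind of distortion the question is meant to detect.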

Q: "Does the visualization obey the reading gravity?" The concept of reading gravity [32] comprehensively aggregates all aspects of the order in which the viewer perceives (reads) the visualization. It accounts for how the user reads the text of the visualization and in what order they see the data points, sub-charts (if faceting) and all other visuals. Although most western cultures are sinistrodextral (i.e. read from left to right) and read from top to bottom, some cultures follow different patterns (e.g., Arabic is written right to left [33], Hanunuo script is written bottom to top [34]). Hence, to maintain flexibility and universality, the question imposes general “reading gravity” instead of “left to right, top to bottom”.

3 Technical implementation

VisQualdex has been implemented as an easy-to-use online tool for evaluating data visualizations. The tool is hosted at https://visqual.onrender.com and a screenshot is presented in Figure 6. The tool allows anyone, even without advanced visualization knowledge, to upload a data visualization (a picture in any common format, e.g. JPG or PNG) and evaluate it by going through all the questions. Each question has a yes or no answer. If a question does not apply to a particular chart, or a user cannot answer it because they are unsure, they can skip it, which marks it as “NA” (Not Applicable). Moreover, most of the questions come with examples presenting good or bad practices related to that question. For example, the question “Are colors distinct?” shows a “bad example” in which the colors are used incorrectly (i.e. a continuous color scale is applied to a categorical axis). After processing all the questions, a user can see a report with the percentage of questions answered positively in each category. The higher the percentage, the better the evaluation. Moreover, they can download a persistent form of this report in JSON or CSV format.

Figure 6: VisQual website screenshot
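The per-category report described above can be sketched as follows. This is an illustrative reimplementation in Python, not the code of the VisQual web application (which is written in JavaScript). The answer encoding (`True`, `False`, and `None` for “NA”), the function name `category_report`, and the choice to leave skipped questions out of the denominator are all assumptions.

```python
import json
from collections import defaultdict


def category_report(answers):
    """Percentage of positively answered questions per category.

    `answers` is a list of (category, answer) pairs, where answer is
    True (yes), False (no) or None (skipped, "NA").
    Assumption: NA answers are excluded from the percentage entirely.
    """
    yes = defaultdict(int)
    counted = defaultdict(int)
    for category, answer in answers:
        if answer is None:
            continue
        counted[category] += 1
        if answer:
            yes[category] += 1
    return {cat: round(100.0 * yes[cat] / counted[cat], 1) for cat in counted}


# Made-up answers for illustration.
answers = [
    ("Coordinates", True), ("Coordinates", False), ("Coordinates", None),
    ("Data", True), ("Data", True),
    ("Theme", None),
]
report = category_report(answers)
print(json.dumps(report, indent=2))  # a JSON form of the report
```

Under these assumptions, a category in which every question was skipped receives no score at all. As noted in section 2.1.1, such percentages are only a summary: a single negative answer may outweigh many positive ones.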

From the technical point of view, the website is implemented in Vue.js [35] as a single page application (SPA). Therefore, it does not need to reload the page at any point. Moreover, it is responsive to changes and intuitive because it is based on Material Design [36]. This makes it easy to use for users without technical knowledge. Additionally, all questions are stored in an external file, so it is possible to hot swap any content. Finally, the whole project has been carefully maintained using a version control system, Git [37].
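Keeping the questions in an external file means the question set can be replaced without touching the application logic. The sketch below illustrates that idea in Python with a JSON document. The file layout and the field names (`id`, `category`, `text`) are hypothetical; the actual format used by VisQual is not described here. The two sample questions are taken from the supplement.

```python
import json

# Hypothetical file content; a real deployment would read this from disk.
QUESTIONS_JSON = """
[
  {"id": 3, "category": "Coordinates", "text": "Does the scale cover the whole data?"},
  {"id": 14, "category": "Data", "text": "Does it contain any data?"}
]
"""


def load_questions(raw):
    """Parse a JSON question list and reject entries with missing fields."""
    questions = json.loads(raw)
    for q in questions:
        missing = {"id", "category", "text"} - q.keys()
        if missing:
            raise ValueError(f"question {q!r} lacks fields: {sorted(missing)}")
    return questions


questions = load_questions(QUESTIONS_JSON)

# Group question texts by category, as an evaluation form might display them.
by_category = {}
for q in questions:
    by_category.setdefault(q["category"], []).append(q["text"])
print(by_category)
```

Validating the file on load is a design choice of this sketch: a swapped-in question set with a malformed entry fails immediately instead of surfacing as a broken form later.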

4 Discussion

The field of data visualization does not suffer from a lack of guidelines and checklists, but rather from their surplus and disorder. This disarray results in a situation where existing checklists are not comprehensive and thus disregard the impact of the interaction of the many factors determining the effectiveness of visual communication [38]. VisQualdex is the first codex (a structured set of criteria) that could be at the same time versatile and extensive enough to cover all existing data visualizations.

Our systematic approach results in a set of rules that constitutes a foundation for tools for data visualization creation (e.g., Microsoft Excel, Plotly, ggplot, Matplotlib, D3, etc.) and instruments for automatic/semi-automatic data visualization correction (e.g., project ReVision [39]). A principal example is the use of VisQualdex to inform the default settings of these tools.

One of the problems concerning checklists is a varying level of detail. A single checklist may mix very narrow and very general criteria on the same “level” of evaluation, or even give them equal evaluation weight. This leads to imbalanced evaluation, which may either allow “incorrect” visualizations to slip through the metric or cause “good” visualizations to be unfairly punished for minor mistakes. VisQualdex partially solves this problem with categories, which guard against question overlap and enforce thematic division. However, we see it only as one of the first steps towards a comprehensive visualization ontology focused on evaluation instead of creation [40].

There are certain aspects of data visualization that could not be included in questions due to a lack of scientific consensus and ongoing heated debate regarding the right answer. A primary example is the question “Is the data-to-ink ratio rational?”. There are respected experts like Edward Tufte [41] and others [42, 43] who favor minimalism in data visualization and reject “chart junk” [44]. There are also respected experts like Alberto Cairo [23] who claim that “chart junk” can be useful [45] and that redundancy (e.g., highlighting in color) may help convey the message more quickly and improve memorability [46]. Overly encumbering the visualization with unnecessary information may lead to confusion, but leaving too little trace of the information may also turn a visualization into a “clue hunt” instead of quickening information perception. A similar debate concerns “Y axis trimming” [47]. There are works claiming that starting the Y axis from 0 is the best way [48], while others suggest that different ranges apply in different situations [49]. These and other scientific arguments show, first, that data visualization is a still developing and lively domain and, second, that it is not purely an applied art/exact science but also belongs to the fine arts/humanities. Moreover, they imply that the consensus regarding specific aspects of visualizing information is still fluid and that the VisQualdex guidelines will need to be updated in the future.

Finally, the current and future techniques of image processing will make it possible to automate or semi-automate some evaluation steps.

5 Conclusion

The following conclusions emerged from this thorough analysis and from examining thousands of graphs, charts and infographics. Data visualization is a field on the border of computer science, data science and arts [22], which renders it highly subject to the bias of the creator and the viewer. However, we believe it is possible to forge universal criteria and find a universal standard to visualize data more understandably. VisQualdex is our first attempt at this task.

References

[1] M. Friendly, “A brief history of data visualization,” in Handbook of data visualization, pp. 15–56, Springer, 2008.

[2] J. H. Kaas and P. Balaram, “Current research on the organization and function of the visual system in primates,” Eye and brain, vol. 6, no. Suppl 1, p. 1, 2014.

[3] G. V. R. J. S. Prasad and A. Ojha, “Text, table and graph – which is faster and more accurate to understand?,” in 2012 IEEE Fourth International Conference on Technology for Education, pp. 126–131, 2012.

[4] W. Weber and H. Rall, “Data visualization in online journalism and its implications for the production process,” in 2012 16th International Conference on Information Visualisation, pp. 349–356, IEEE, 2012.

[5] E. F. Sinar, “Data visualization,” Big data at work: The data science revolution and organizational psychology, pp. 115–157, 2015.

[6] S. Few and P. Edge, “Data visualization: past, present, and future,” IBM Cognos Innovation Center, 2007.

[7] K. Catchpole and S. Russ, “The problem with checklists,” BMJ Quality & Safety, vol. 24, pp. 545–549, Sept. 2015.

[8] T. Azzam, S. Evergreen, A. A. Germuth, and S. J. Kistler, “Data visualization and evaluation,” New Directions for Evaluation, vol. 2013, no. 139, pp. 7–32, 2013.

[9] S. Tonidandel, E. B. King, and J. M. Cortina, Big data at work: The data science revolution and organizational psychology. Routledge, 2015.

[10] C. Lallemand, “Les abaques héxagonaux: Nouvelle méthode générale de calcul graphique, avec de nombreux exemples d’application,” Ministère des travaux publics, Comité du nivellement général de la France, Paris, 1885.

[11] V. Porcu, “Matplotlib,” in Python for Data Mining Quick Syntax Reference, pp. 201–234, Springer, 2018.

[12] M. L. Waskom, “Seaborn: statistical data visualization,” Journal of Open Source Software, vol. 6, no. 60, p. 3021, 2021.

[13] H. Wickham, “An introduction to ggplot: An implementation of the grammar of graphics in r,” Statistics, 2006.

[14] S. M. Ali, N. Gupta, G. K. Nayak, and R. K. Lenka, “Big data visualization: Tools and challenges,” in 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), pp. 656–660, IEEE, 2016.

[15] G. Palsky, “The debate on the standardization of statistical maps and diagrams (1857-1901). elements of the history of graphical semiotics,” Cybergeo: European Journal of Geography, 1999.

[16] “Joint committee on standards for graphic presentation,” Publications of the American Statistical Association, vol. 14, no. 112, pp. 790–797, 1915.

[17] S. Evergreen and A. K. Emery, “Data visualization checklist,” 2016.

[18] B. D. I. Team, “Data visualization checklist,” Berkley Business Process Management, pp. 1–4, 2020.

[19] M. Scriven, “The logic and methodology of checklists,” 2000.

[20] D. McKie, “Data visualization checklist.” (http://www.davidmckie.com/infogram-data visualization-checklist.pdf), accessed 2020.

[21] E. Grigg, “Smarter clinical checklists: how to minimize checklist fatigue and maximize clinician performance,” Anesthesia & Analgesia, vol. 121, no. 2, pp. 570–573, 2015.

[22] M. Mirahan-Farag, “Chapter fourteen the segregation of applied arts from fine arts and the status of fashion,” An Anthology of Philosophical Studies Volume 7, p. 145.

[23] A. Cairo, The Functional Art: An introduction to information graphics and visualization. New Riders, 2012.

[24] U. Engelke, A. Abdul-Rahman, and M. Chen, “Visupply: A supply-chain process model for visualization guidelines,” in 2018 International Symposium on Big Data Visual and Immersive Analytics (BDVA), pp. 1–9, IEEE, 2018.

[25] A. Diehl, M. Kraus, A. Abdul-Raman, M. El-Assady, B. Bach, R. S. Laramee, D. Keim, and M. Chen, “Studying visualization guidelines according to grounded theory,” arXiv preprint arXiv:2010.09040, 2020.

[26] L. Wilkinson, “The grammar of graphics,” in Handbook of computational statistics, pp. 375–414, Springer, 2012.

[27] H. Wickham, “A layered grammar of graphics,” Journal of Computational and Graphical Statistics, vol. 19, no. 1, pp. 3–28, 2010.

[28] B. E. Rogowitz, A. D. Kalvin, A. Pelah, and A. Cohen, “Which trajectories through which perceptually uniform color spaces produce appropriate colors scales for interval data?,” in Color and Imaging Conference, vol. 1999, pp. 321–326, Society for Imaging Science and Technology, 1999.

[29] R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of eugenics, vol. 7, no. 2, pp. 179–188, 1936.

[30] E. Bertini, N. Elmqvist, and T. Wischgoll, “Judgment error in pie chart variations,” in Proceedings of the Eurographics/IEEE VGTC conference on visualization: Short papers, pp. 91–95, 2016.

[31] W. S. Cleveland and R. McGill, “Graphical perception and graphical methods for analyzing scientific data,” Science, vol. 229, no. 4716, pp. 828–833, 1985.

[32] C. Wheildon, D. Ogilvy, and G. Heard, Type & Layout: Are You Communicating Or Just Making Pretty Shapes. Kickstarting Business Series, Worsley Press, 2005.

[33] T. R. Jordan, A. A. Almabruk, E. A. Gadalla, V. A. McGowan, S. J. White, L. Abedipour, and K. B. Paterson, “Reading direction and the central perceptual span: Evidence from arabic and english,” Psychonomic bulletin & review, vol. 21, no. 2, pp. 505–511, 2014.

[34] Y. J. S. Epo, Discourse analysis of suyot: a Hanunuo-Mangyan folk narrative. PhD thesis, Citeseer, 2014.

[35] “Vue.js – the progressive javascript framework v3.0.” (https://vuejs.org/guide/introduction.html), 2014.

[36] Google Inc., “Material design lite,” 2015.

[37] D. Spinellis, “Git,” IEEE Software, vol. 29, no. 3, pp. 100–101, 2012.

[38] E. Kandogan and H. Lee, “A grounded theory study on the language of data visualization principles and guidelines,” Electronic Imaging, vol. 2016, no. 16, pp. 1–9, 2016.

[39] M. Savva, N. Kong, A. Chhajta, L. Fei-Fei, M. Agrawala, and J. Heer, “Revision: Automated classification, analysis and redesign of chart images,” in Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST ’11, (New York, NY, USA), pp. 393–402, Association for Computing Machinery, 2011.

[40] D. J. Duke, K. W. Brodlie, and D. A. Duce, “Building an ontology of visualization,” in IEEE Visualization 2004, pp. 7p–7p, IEEE, 2004.

[41] E. R. Tufte, N. H. Goeler, and R. Benson, Envisioning information, vol. 2. Graphics press Cheshire, CT, 1990.

[42] K. McGurgan, Data-ink ratio and task complexity in graph comprehension. Rochester Institute of Technology, 2015.

[43] O. Inbar, N. Tractinsky, and J. Meyer, “Minimalism in information visualization: attitudes towards maximizing the data-ink ratio,” in Proceedings of the 14th European conference on Cognitive ergonomics: invent! explore!, pp. 185–188, 2007.

[44] M. A. Gatto, “Making research useful: Current challenges and good practices in data visualisation,” 2015.

[45] H. Li and N. Moacdieh, “Is “chart junk” useful? an extended examination of visual embellishment,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 58, pp. 1516–1520, Sage Publications Sage CA: Los Angeles, CA, 2014.

[46] S. Bateman, R. L. Mandryk, C. Gutwin, A. Genest, D. McDine, and C. Brooks, “Useful junk? the effects of visual embellishment on comprehension and memorability of charts,” in Proceedings of the SIGCHI conference on human factors in computing systems, pp. 2573–2582, 2010.

[47] M. Correll, E. Bertini, and S. Franconeri, “Truncating the y-axis: Threat or menace?,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–12, 2020.

[48] L. King, “Preparing better graphs,” Journal of Public Health and Emergency, vol. 2, no. 1, 2018.

[49] J. K. Witt, “Graph construction: An empirical investigation on setting the range of the y-axis,” Meta-psychology, 2019.

Supplement 1 - VisQual questions

1.1 Coordinates

1 Do the units match their values? [17]

2 Are the coordinates changing in the intuitive direction? [18]

3 Does the scale cover the whole data? [17]

See Figure 1.1 for a good example.

Figure 1.1: Good example for question 3¹

4 Is there a scale/axis? [17]

See Figure 1.2 for a bad example.

Figure 1.2: Bad example for question 4

5 Are there enough axes or value references? [17]

See Figure 1.3 for a bad example.

Figure 1.3: Bad example for question 5

6 Is there no more than one scale in one dimension? [7]

See Figure 1.4 for a bad example.

Figure 1.4: Bad example for question 6

7 Are the units properly displayed? [17]

See Figure 1.5 for a bad example.

Figure 1.5: Bad example for question 7

8 Does it use the proper coordinate system? [17]

See Figure 1.6 for a bad example.

Figure 1.6: Bad example for question 8

9 Are the coordinates consistent? [17]

See Figure 1.7 for a bad example.

Figure 1.7: Bad example for question 9

10 Are the coordinate units equal? [17]

See Figure 1.8 for a bad example.

Figure 1.8: Bad example for question 10

11 Is each axis in one dimension? [17]

See Figure 1.9 for a bad example.

Figure 1.9: Bad example for question 11

12 Do the coordinates correspond to the rest of the visualization? [17]

13 Are the units displayed clearly? [17]

See Figure 1.10 for a bad example.

Figure 1.10: Bad example for question 13

1.2 Data

14 Does it contain any data? [17]

See Figure 1.11 for a bad example.

Figure 1.11: Bad example for question 14

15 Is the data correct? [17]

16 Is missing data represented? [17]

17 Is the data organized according to five hat racks? [10]

1.3 Position

18 Does it omit or properly utilize the third dimension? [2]

See Figure 1.12 for a bad example.

Figure 1.12: Bad example for question 18

1.4 Geometry

19 Are the shapes associated with the topic? [13]

See Figure 1.13 for a good example.

Figure 1.13: Good example for question 19²

20 Do the shapes represent the data at the proper scale? [17]

See Figure 1.14 for a bad example.

Figure 1.14: Bad example for question 20

21 Does it use a linear layout instead of a radial one if it requires value lookup? [6]

1.5 Perception

22 If there are lines that may be compared with each other, are they far from being horizontal or vertical? [5]

23 Are all 2D shapes presented in a simple projection/without projections? [9]

24 Are the pie charts used simple? [2]

1.6 Position

25 Are the angles neither too sharp nor too flat? [5]

1.7 Guides

26 Are the axes labeled? [15]

See Figure 1.15 for a bad example.

Figure 1.15: Bad example for question 26

27 Are all crucial data points labelled? [17]

See Figure 1.16 for a bad example.

Figure 1.16: Bad example for question 27

28 Is all text readable? [17]

See Figure 1.17 for a bad example.

Figure 1.17: Bad example for question 28

29 Is the title descriptive and well formed? [3]

1.8 Facets

30 Does it preserve the focus-plus-context principle? [4]

See Figure 1.18 for a good example.

Figure 1.18: Good example for question 30³

1.9 Perception

31 Is the data memorable after the first sight? [3]

32 Are the main features of the data easy to remember long-term? [3]

33 Does the visualization obey the reading gravity? [18]

34 Is the information organized according to five hat racks? [10]

See Figure 1.19 for a good example.

Figure 1.19: Good example for question 34⁴

35 Are the directions and shapes of all axes appropriate? [18]

1.10 Manipulation

36 Does it take into account the Gestalt principles of perception? [8]

37 Is redundancy used appropriately and consistently? [3]

38 Is the visualization properly addressed to the target audience? [15]

1.11 Subjective

39 Is this visualization better than similar ones that you have seen?

40 Did you learn something from this visualization?

41 Is the visualization aesthetically pleasing? [10]

42 Is the presented information useful?

43 Was the time spent experiencing the visualization worth it?

44 Would you like to see a visualization similar to this one?

45 Do you understand the visualization?

46 Do you like it?

47 Is this visualization interesting?

1.12 Theme

48 Are colors distinct? [16]

See Figure 1.20 for a bad example.

Figure 1.20: Bad example for question 48

49 Is the use of colors appropriate? [14]

See Figure 1.21 for a bad example.

Figure 1.21: Bad example for question 49

50 Are bipolar problems represented using a two-color scale? [16]

51 Are the color values on the color scale consistent with their representation? [16]

52 Does it contain additional graphics? [3]

See Figure 1.22 for a good example.

Figure 1.22: Good example for question 52⁵

53 Are there not too many colors? [16]

54 Are the colors intuitive? [11]

See Figure 1.23 for a bad example.

Figure 1.23: Bad example for question 54

55 Is the gradient scale wide enough to allow distinct colors? [11]

See Figure 1.24 for a good example.

Figure 1.24: Good example for question 55⁶

56 Are colors readable on the medium that the visualization is supposed to be presented on? [15]

See Figure 1.25 for a bad example.

Figure 1.25: Bad example for question 56

57 Are the colors understandable for viewers with color blindness? [1]

58 Is the color scale appropriate? [15]

59 Are there not too many colors representing the data? [12]

60 Are there not too many colors representing the data? [12]
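As an illustration of how answers to such a checklist could be recorded, the following is a minimal Python sketch. It is not the implementation of the VisQualdex web tool: the names `QUESTIONS` and `failed_by_category` are hypothetical, only a handful of the questions above are included, and it assumes that every listed question is phrased so that "yes" means the criterion is met.

```python
from collections import defaultdict

# Hypothetical subset of the checklist: (question number, category, text).
# Only questions where "yes" clearly means "criterion met" are included.
QUESTIONS = [
    (4, "Coordinates", "Is there a scale/axis?"),
    (14, "Data", "Does it contain any data?"),
    (26, "Guides", "Are the axes labeled?"),
    (28, "Guides", "Is all text readable?"),
    (48, "Theme", "Are colors distinct?"),
]


def failed_by_category(answers):
    """Group questions answered "no" (False) by category.

    `answers` maps a question number to True (criterion met) or False.
    Unanswered questions are skipped rather than counted as failures.
    """
    failed = defaultdict(list)
    for number, category, text in QUESTIONS:
        if answers.get(number) is False:
            failed[category].append((number, text))
    return dict(failed)


# Example: a chart with unlabeled axes and unreadable text.
example_answers = {4: True, 14: True, 26: False, 28: False, 48: True}
report = failed_by_category(example_answers)
for category, items in report.items():
    for number, text in items:
        print(f"[{category}] Q{number}: {text}")
```

Grouping the failed criteria by category mirrors the way the questions are organized above, so a reviewer can see at a glance which aspect of a visualization needs attention.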

The questions were built using, among others, the following sources:

References

[1] Basaeir Y Ahmed, Zainab H Majeed, and Safa A Najim. Applying the visualization technique to solve the human color blindness. In AIP Conference Proceedings, volume 2290, page 040008. AIP Publishing LLC, 2020.

[2] Enrico Bertini, Niklas Elmqvist, and Thomas Wischgoll. Judgment error in pie chart variations. In Proceedings of the Eurographics/IEEE VGTC conference on visualization: Short papers, pages 91–95, 2016.

[3] Michelle A Borkin, Zoya Bylinskii, Nam Wook Kim, Constance May Bainbridge, Chelsea S Yeh, Daniel Borkin, Hanspeter Pfister, and Aude Oliva. Beyond memorability: Visualization recognition and recall. IEEE transactions on visualization and computer graphics, 22(1):519–528, 2015.

[4] Mackinlay Card. Readings in information visualization: using vision to think. Morgan Kaufmann, 1999.

[5] William S Cleveland and Robert McGill. Graphical perception and graphical methods for analyzing scientific data. Science, 229(4716):828–833, 1985.

[6] Joseph Goldberg and Jonathan Helfman. Eye tracking for visualization evaluation: Reading values on linear versus radial graphs. Information visualization, 10(3):182–195, 2011.

[7] Petra Isenberg, Anastasia Bezerianos, Pierre Dragicevic, and Jean Daniel Fekete. A study on dual-scale data charts. IEEE Transactions on Visualization and Computer Graphics, 17(12):2469–2478, 2011.

[8] Juuso Koponen and Jonatan Hildén. Data visualization handbook. Aalto korkeakoulusäätiö, 2019.

[9] Robert Kosara. Evidence for area as the primary visual cue in pie charts. In 2019 IEEE Visualization Conference (VIS), pages 101–105. IEEE, 2019.

[10] William Lidwell, Kritina Holden, and Jill Butler. Universal principles of design, revised and updated: 125 ways to enhance usability, influence perception, increase appeal, make better design decisions, and teach through design. Rockport Pub, 2010.

[11] Sharon Lin, Julie Fortuna, Chinmay Kulkarni, Maureen Stone, and Jeffrey Heer. Selecting semantically-resonant colors for data visualization. In Computer Graphics Forum, volume 32, pages 401–410. Wiley Online Library, 2013.

[12] Vinh T Nguyen, Kwanghee Jung, and Vibhuti Gupta. Examining data visualization pitfalls in scientific publications. Visual Computing for Industry, Biomedicine, and Art, 4(1):1–15, 2021.

[13] Don Norman. Things that make us smart: Defending human attributes in the age of the machine. Diversion Books, 2014.

[14] Christine Nothelfer, Michael Gleicher, and Steven Franconeri. Redundant encoding strengthens segmentation and grouping in visual displays of data. Journal of Experimental Psychology: Human Perception and Performance, 43(9):1667, 2017.

[15] Nicolas P Rougier, Michael Droettboom, and Philip E Bourne. Ten simple rules for better figures. PLoS computational biology, 10(9):e1003833, 2014.

[16] Maureen Stone. Choosing colors for data visualization. Business Intelligence Network, 2, 2006.

[17] Edward R Tufte. The visual display of quantitative information. The Journal for Healthcare Quality (JHQ), 7(3):15, 1985.

[18] Colin Wheildon and Geoffrey Heard. Type & layout: Are you communicating or just making pretty shapes. Worsley Press, 2005.

Scientific Visualization

Open Access Electronic Journal

National Research Nuclear University "MEPhI"