Supporting cognition in systems biology analysis: findings on users' processes and design implications

Background
Current usability studies of bioinformatics tools suggest that tools for exploratory analysis support some tasks related to finding relationships of interest but not the deep causal insights necessary for formulating plausible and credible hypotheses. To better understand design requirements for gaining these causal insights in systems biology analyses, a longitudinal field study of 15 biomedical researchers was conducted. The researchers used the same protein-protein interaction tools to discover possible disease mechanisms for further experimentation.

Results
Findings reveal patterns in scientists' exploratory and explanatory analysis and show that the tools positively supported a number of well-structured query and analysis tasks. However, the tools did not adequately support several of scientists' more complex, higher-order ways of knowing and reasoning. Results show that, to better fit scientists' cognition for exploratory analysis, systems biology tools need to better match scientists' processes for validating, for making a transition from classification to model-based reasoning, and for engaging in causal mental modelling.

Conclusion
As the next great frontier in bioinformatics usability, tool designs for exploratory systems biology analysis need to move beyond the successes already achieved in supporting formulaic query and analysis tasks and now reduce current mismatches with several of scientists' higher-order analytical practices. The implications of the results for tool design are discussed.

• Provide "why?" provenance and semantics of data integration (Chapman et al, 2008)
• Criteria that scientists use to find relationships of interest (those mentioned in Mismatch 2) (Jayapandian and Jagadish, 2008)
• Enable scientists to interact with multiple-scale views of complex query results
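"Why?" provenance can be made concrete by keeping, for each integrated interaction, the set of source records that justify it, so a tool can answer "why is this edge shown?" on demand. The sketch below is a minimal illustration under stated assumptions: the `SourceRecord` fields, the database names, and the merge rule (treating A-B and B-A as one undirected interaction) are hypothetical and are not taken from Chapman et al or from any tool in the study.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRecord:
    """One interaction record from an input database (hypothetical schema)."""
    database: str
    record_id: str
    protein_a: str
    protein_b: str
    method: str  # experimental method reported by the source


@dataclass
class IntegratedInteraction:
    """An undirected interaction after integration, with its why-provenance."""
    protein_a: str
    protein_b: str
    # Why-provenance: the source records that witness this integrated result.
    why: set = field(default_factory=set)


def integrate(records):
    """Merge records that describe the same undirected protein pair."""
    merged = {}
    for rec in records:
        key = tuple(sorted((rec.protein_a, rec.protein_b)))
        if key not in merged:
            merged[key] = IntegratedInteraction(key[0], key[1])
        merged[key].why.add(rec)
    return merged


def explain(merged, a, b):
    """Answer "why is this interaction shown?" with its supporting records."""
    interaction = merged.get(tuple(sorted((a, b))))
    if interaction is None:
        return []
    return sorted(f"{r.database}:{r.record_id} ({r.method})" for r in interaction.why)


if __name__ == "__main__":
    # Toy, made-up records; protein names and identifiers are placeholders.
    records = [
        SourceRecord("DB1", "r1", "A", "B", "two-hybrid"),
        SourceRecord("DB2", "x9", "B", "A", "co-immunoprecipitation"),
        SourceRecord("DB1", "r2", "A", "C", "pull-down"),
    ]
    merged = integrate(records)
    for line in explain(merged, "B", "A"):
        print(line)
```

Because the provenance set travels with each merged interaction, the same records could also drive the "counts or types of experiments" encodings that scientists asked for.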
D. User need: How confident am I that the displayed interactions I'm interpreting are not occurring by chance alone?
Mismatch: 1, 2
Objective: Generate trust, reduce search/analysis space, contextualize and cue biological relationships, give users flexible interactivity

Design strategies
• Develop algorithms, or let users run them, for significance testing on overall network structures and motifs (Barabasi and Oltvai, 2004)
• Reveal the logic and parameters of the algorithms
• Offer additional indicators of the strength of relationships, e.g. term enrichment statistics (MeSH, GO), and reveal the logic of these computations
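As one illustration of an enrichment indicator whose logic can be shown to users, the sketch below computes a one-sided hypergeometric p-value for a single annotation term (e.g. a GO or MeSH term) in a user's selection of genes. It assumes a simple over-representation test against a user-chosen background; the function names and parameters are hypothetical, and real tools may use other tests or other corrections for testing many terms.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric test for over-representation of one term.

    N: genes in the background (e.g. all annotated genes in the network)
    K: background genes annotated with the term (e.g. a GO or MeSH term)
    n: genes in the user's selection (e.g. a displayed sub-network)
    k: selected genes annotated with the term

    Returns P(X >= k) for X ~ Hypergeometric(N, K, n), i.e. the chance of
    seeing at least k annotated genes in a random selection of size n.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("counts must satisfy 0 <= K <= N, 0 <= k <= n <= N")
    # Sum the probability mass for every possible overlap from k upward.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def bonferroni(p_values):
    """Adjust p-values for testing many terms at once (conservative)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Because every count is an explicit argument, a tool could display N, K, n, and k next to each p-value, which is one way to "reveal the logic of these computations."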

User need: Will I be able to move from static multidimensional relationships of interest to the dynamics of association and effect?
Mismatch: 2
Objective: Generate trust, reduce search/analysis space, contextualize and cue biological relationships, give users flexible interactivity

• Clarify/display what comprises an interaction and a molecule
• Additionally, give prominence to other high-priority information for seeing patterns and relationships: GO annotations, homology, pathways, reactions, and interrelated layers of GO annotations across classes (Dadzie and Burger, 2005)
• Provide the ability to perceptually encode node and edge traits, including the counts or types of experiments that showed a particular interaction (Barsky et al, 2007)
• Highlight indirect interactions, and combine visual highlighting and motion to draw selective attention to neighbors and sub-graphs, especially motifs or clusters sharing attributes (Ware and Bobrow, 2002)
• Provide several side-by-side views for diverse perspectives on biological relationships, and link them dynamically on operations such as selection, filtering, and color coding
• In views rich in conceptual biology data, cue groupings, scales, and content relevant to domain-based inferences (Baldonado et al, 2000)
• Show/provide the ability to import one's own data (Cline et al, 2007)
• Show/provide abilities for users to: aggregate by self-specified fields and select

Objective: Contextualize and cue biological relationships

Design strategies
• Represent molecular attributes, provide ways to perceptually encode them (e.g. color coding), and display the encoding on pathways (Efroni et al, 2007)
• Provide capabilities for user annotations (publicly shared if desired), and give some means of standardizing them to afford perceptual encoding, filtering, etc. (Chen et al, 2008)
• Represent sub-networks associated with disease (Chuang et al, 2007)

B. User need: Can I inventively group genes and relationships of interest to infer causal relationships?
Mismatch: 3
Objective: Trust, reduce search/analysis space, contextualize and cue biological relationships

Design strategies
• Represent relationships across GO classes and hierarchical levels (e.g. a specific function "is involved in" a certain process and "acts in" a certain component), or provide visual pivots to show many hierarchical levels in multiple classes and cross-membership (Myre et al, 2006; Robertson et al, 2002)
• Provide capabilities for users to group clusters of interactions into aggregates defined by a superordinate class (e.g. a GO category, perhaps at a certain level of the hierarchy)
• Provide the ability for users to create "smart" aggregates on available traits (Tesone and Goodall, 2007)
• Represent relationships between gene/protein interactions and significantly enriched MeSH terms, letting users select the relationships to display
• Highlight motifs in biological networks with the significance of their frequency, and let users interactively impose biological traits on visualized motifs to find biological meaning (Schreiber and Schwöbbermeyer, 2005)
• Provide zoom capabilities that leave context visible
• Provide the ability to drill down into aggregates and roll up again
• Reveal the computations on which a tool's pre-calculated clusters are based
• Build in hierarchical graph structures (in which nodes contain graphs) to accommodate displays of aggregates

C. User need: Can I place interactions in context to infer and judge the credibility of variable behaviors, contingencies, and dynamic effects?
Mismatch: 3
Objective: Trust, reduce search/analysis space, contextualize and cue biological relationships

Design strategies
• Provide views/layouts of gene interactions that suggest temporal contexts, e.g. gene interactions layered by regulatory processes (Barsky et al, 2007)
• Represent (e.g. through overlays) gene/protein interactions and states in relation to canonical pathways, to the "disease-ome," to regulatory relationships, or to all three (Reese et al, 2005; Efroni et al, 2007)
• In representations of neighbors, draw attention (or let users draw attention) to biologically meaningful neighbors, chains/loops of interactions, or factors limiting behaviors in a hypothetical biological event
• Provide graph-theoretic statistics on networks, with cues to their implications for inferring biological meaning and with significance values to judge the likelihood of a topological structure occurring by chance alone in the database (Zhang et al, 2007; Bader and Hogue, 2003; Wong et al, 2006)
• Provide visual indicators signaling confidence levels when statistical values are encoded by color, size, or thickness (Holloway et al, 2008)

D. User need: Can I spatially transform my mental model of causal relationships to better develop and validate explanations of biological events and consequences?
Mismatch: 3
Objective: Reduce search/analysis space, contextualize and cue biological relationships, give users flexible interactivity

Design strategies
• Build in functionality for graph customizations, e.g. let users construct a workspace for side-by-side comparisons (Jonker et al, 2005; Kerpedjiev and Roth, 2001)
• Build in functionality for aggregating data, including custom aggregations, and for perceptually encoding by aggregates
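Custom aggregation of this kind can be sketched as collapsing nodes by a user-chosen key while keeping each aggregate's members for drill-down, then encoding aggregate size perceptually. This is a minimal sketch under assumptions: the `go` attribute in the usage below, the linear pixel-size mapping, and the function names are hypothetical and do not describe features of the tools in the study.

```python
from collections import defaultdict


def aggregate(nodes, edges, key):
    """Collapse nodes into aggregates chosen by a user-supplied key.

    nodes: dict mapping node id -> attribute dict
    edges: iterable of (u, v) pairs between node ids
    key:   function from an attribute dict to a group label; swapping this
           function is what makes the aggregation "custom"
    Returns (members, agg_edges):
      members   maps group label -> set of node ids (kept for drill-down)
      agg_edges maps (group1, group2) -> number of underlying interactions
    """
    group_of = {node: key(attrs) for node, attrs in nodes.items()}
    members = defaultdict(set)
    for node, group in group_of.items():
        members[group].add(node)
    agg_edges = defaultdict(int)
    for u, v in edges:
        # Order the pair so that (X, Y) and (Y, X) count as one aggregate edge.
        g1, g2 = sorted((group_of[u], group_of[v]))
        agg_edges[(g1, g2)] += 1
    return dict(members), dict(agg_edges)


def encode_sizes(members, min_px=10.0, max_px=40.0):
    """Map aggregate membership counts linearly onto node sizes in pixels."""
    if not members:
        return {}
    counts = {group: len(ids) for group, ids in members.items()}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all groups are equal
    return {
        group: min_px + (count - lo) * (max_px - min_px) / span
        for group, count in counts.items()
    }
```

For example, `aggregate(nodes, edges, key=lambda attrs: attrs["go"])` groups nodes by a hypothetical GO-category attribute. Keeping `members` alongside the aggregated edges is what would let a view drill down into an aggregate and roll up again, as listed among the strategies for grouping genes and relationships.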