Scalable Visual Analytics of Massive Textual Datasets
Krishnan M, SJ Bohn, WE Cowley, VL Crow, and J Nieplocha. 2007. "Scalable Visual Analytics of Massive Textual Datasets." In IEEE International Parallel & Distributed Processing Symposium. Long Beach, CA, March 26-30, 2007.
Abstract
This paper describes the first scalable implementation of a text processing engine used in Visual
Analytics tools. These tools aid information analysts in interacting with and understanding large textual
information content through visual interfaces. By developing a parallel implementation of the text
processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive
datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear
scaling when processing multi-gigabyte data sets such as PubMed. This approach enables interactive analysis
of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.
Visual Analysis of Weblog Content
Gregory ML, DA Payne, D McColgin, NO Cramer, and DV Love. 2007. "Visual Analysis of Weblog Content." In International Conference on Weblogs and Social Media '07. pp. 227-230. Boulder, March 26-28, 2007.
Abstract
In recent years, one of the advances of the World Wide Web is social media and one of the fastest
growing aspects of social media is the blogosphere. Blogs make content creation easy and are highly
accessible through web pages and syndication. With their growing influence, a need has arisen to be
able to monitor the opinions and insight revealed within their content. This paper describes a technical
approach for analyzing the content of blog data using a visual analytic tool, IN-SPIRE, developed by
Pacific Northwest National Laboratory. We will describe both how an analyst can explore blog data with
IN-SPIRE and how the tool could be modified in the future to handle the specific nuances of analyzing blog
data.
Diverse Information Integration and Visualization
Havre SL, A Shah, C Posse, and BM Webb-Robertson. 2006. "Diverse Information Integration and Visualization." In Visualization and Data Analysis 2006 (EI10). SPIE The International Society for Optical Engineering, San Jose, CA.
Abstract
This paper presents and explores a technique for visually integrating and exploring diverse information. Society produces,
collects, and processes ever larger and more diverse data, including semi-structured and unstructured text, as well as transaction,
communication, and scientific data. It is no longer sufficient to analyze one type of data or information in isolation.
Users need to explore their data/information in the context of related information to discover often hidden, but meaningful,
complex relationships. Our approach visualizes multiple, like entities across multiple dimensions where each dimension is a
partitioning of the entities. The partitioning may be based on inherent or assigned attributes of the entities (or entity data)
such as meta-data or prior knowledge captured in annotations. The partitioning may also be derived from entity data. For example,
clustering, or unsupervised classification, can be applied to arrays of multidimensional entity data to partition the entities
into groups of similar entities, or clusters. The same entities may be clustered on data from different experiment types or
processing approaches. This reduction of diverse data/information on an entity to a series of partitions, or discrete
(and unit-less) categories, allows the user to view the entities across a variety of data without concern for data types and units.
Parallel coordinates visualize entity data across multiple dimensions of typically continuous attributes. We adapt parallel
coordinates for dimensions with discrete attributes (partitions) to allow the comparison of entity partition patterns for
identifying trends and outlier entities. We illustrate this approach through a prototype, Juxter (short for Juxtaposer).
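As a rough illustration of the partition-pattern idea in this abstract, the following Python sketch treats each dimension as a mapping from entity to a discrete category and flags entities whose pattern across dimensions is rare. It is not Juxter's implementation; the entity names, dimension names, data values, and the `rare_threshold` parameter are all hypothetical.

```python
from collections import Counter

def partition_patterns(dimensions, entities):
    """Return each entity's tuple of category labels, one label per dimension."""
    return {e: tuple(dim[e] for dim in dimensions.values()) for e in entities}

def rare_entities(patterns, rare_threshold=1):
    """Entities whose pattern is shared by at most rare_threshold entities."""
    counts = Counter(patterns.values())
    return sorted(e for e, p in patterns.items() if counts[p] <= rare_threshold)

# Hypothetical data: three partitionings (dimensions) of five entities.
dimensions = {
    "cluster_expt_A": {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 0},
    "cluster_expt_B": {"g1": 2, "g2": 2, "g3": 0, "g4": 0, "g5": 1},
    "annotation":     {"g1": "x", "g2": "x", "g3": "y", "g4": "y", "g5": "x"},
}
entities = ["g1", "g2", "g3", "g4", "g5"]

patterns = partition_patterns(dimensions, entities)
outliers = rare_entities(patterns)  # entities with a pattern no other entity shares
```

Because every dimension is reduced to unit-less category labels, the comparison needs no knowledge of the original data types; a parallel-coordinates display would draw each pattern tuple as one polyline across the dimension axes.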
From Question Answering to Visual Exploration
McColgin DW, ML Gregory, EG Hetzler, and AE Turner. 2006. "From Question Answering to Visual Exploration." In Proceedings of the ACM SIGIR workshop on Evaluating Exploratory Search Systems, pp. 47-50. Seattle, August 10th, 2006.
Abstract
Research in Question Answering has focused on the quality of information retrieval or extraction using
the metrics of precision and recall to judge success; these metrics drive toward finding the specific
best answer(s) and are best supportive of a lookup type of search. These do not address the opportunity
that users' natural language questions present for exploratory interactions. In this paper, we present
an integrated Question Answering environment that combines a visual analytics tool for unstructured text
and a state-of-the-art query expansion tool designed to complement the cognitive processes associated with
an information analyst's workflow. Analysts are seldom looking for factoid answers to simple questions;
their information needs are much more complex in that they may be interested in patterns of answers over
time, conflicting information, and even related non-answer data may be critical to learning about a problem
or reaching prudent conclusions. In our visual analytics tool, questions result in a comprehensive answer
space that allows users to explore the variety within the answers and spot related information in the rest
of the data. The exploratory nature of the dialog between the user and this system requires tailored
evaluation methods that better address the evolving user goals and counter cognitive biases inherent to
exploratory search tasks.
Generating Graphs for Visual Analytics through Interactive Sketching
Wong PC, HP Foote, PS Mackey, KA Perrine, and G Chin, Jr. 2006. "Generating Graphs for Visual Analytics through Interactive Sketching." IEEE Transactions on Visualization and Computer Graphics 12(6). doi:10.1109/TVCG.2006.91
Abstract
We introduce an interactive graph generator, GreenSketch, designed to facilitate the creation of descriptive
graphs required for different visual analytics tasks. The human-centric design approach of GreenSketch enables
users to master the creation process without specific training or prior knowledge of graph model theory. The
customized user interface encourages users to gain insight into the connection between the compact matrix
representation and the topology of a graph layout when they sketch their graphs. Both the human-enforced and
machine-generated randomnesses supported by GreenSketch provide the flexibility needed to address the uncertainty
factor in many analytical tasks. This paper describes over two dozen examples that cover a wide variety of graph
creations from a single line of nodes to a real-life small-world network that describes a snapshot of telephone
connections. While the discussion focuses mainly on the design of GreenSketch, we include a case study that applies
the technology in a visual analytics environment and a usability study that evaluates the strengths and weaknesses
of our design approach.
Graph Signatures for Visual Analytics
Wong PC, HP Foote, G Chin, Jr, PS Mackey, and KA Perrine. 2006. "Graph Signatures for Visual Analytics." IEEE Transactions on Visualization and Computer Graphics 12(6). doi:10.1109/TVCG.2006.92
Abstract
We present a visual analytics technique to explore graphs using the concept of a data signature. A data
signature, in our context, is a multidimensional vector that captures the local topology information
surrounding each graph node. Signature vectors extracted from a graph are projected onto a low-dimensional
scatterplot through the use of scaling. The resultant scatterplot, which reflects the similarities of the
vectors, allows analysts to examine the graph structures and their corresponding real-life interpretations
through repeated use of brushing and linking between the two visualizations. The interpretation of the graph
structures is based on the outcomes of multiple participatory analysis sessions with intelligence analysts
conducted by the authors at the Pacific Northwest National Laboratory. The paper first uses three public
domain datasets with either well-known or obvious features to explain the rationale of our design and illustrate
its results. More advanced examples are then used in a customized usability study to evaluate the effectiveness
and efficiency of our approach. The study results reveal not only the limitations and weaknesses of the
traditional approach based solely on graph visualization but also the advantages and strengths of our
signature-guided approach presented in the paper.
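To make the notion of a per-node signature vector more concrete, here is a minimal Python sketch. The abstract does not define the signature, so this sketch assumes one plausible choice: the number of nodes first reached at each hop distance from the node. The graph, the `max_hops` parameter, and the function name are hypothetical, and the scaling step that produces the scatterplot is not shown; only the pairwise distance it would consume is computed.

```python
from collections import deque
from math import dist

def node_signature(adjacency, start, max_hops=3):
    """Count the nodes first reached at each hop distance 1..max_hops (BFS)."""
    seen = {start: 0}            # node -> hop distance from start
    queue = deque([start])
    counts = [0] * max_hops      # counts[i] = nodes at hop distance i + 1
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_hops:
            continue             # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen[neighbor] = depth + 1
                counts[depth] += 1
                queue.append(neighbor)
    return counts

# Hypothetical undirected graph: a hub "a" with a short chain attached.
adjacency = {
    "a": ["b", "c", "d"],
    "b": ["a"],
    "c": ["a"],
    "d": ["a", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
signatures = {n: node_signature(adjacency, n) for n in adjacency}

# Nodes with identical local topology get identical signatures (distance 0).
same_role = dist(signatures["b"], signatures["c"])
hub_vs_leaf = dist(signatures["a"], signatures["f"])
```

Nodes "b" and "c" play the same structural role (leaves attached to the hub), so they would land on the same point in a signature scatterplot, while the hub "a" would be placed apart from the chain end "f".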
Have Green - A Visual Analytics Framework for Large Semantic Graphs
Wong PC, G Chin, Jr, HP Foote, PS Mackey, and JJ Thomas. 2006. "Have Green - A Visual Analytics Framework for Large Semantic Graphs." In IEEE Symposium on Visual Analytics Science and Technology, pp. 67-74. Baltimore, Maryland, October 31-November 2, 2006.
Abstract
A semantic graph is a network of heterogeneous nodes and links annotated with a domain ontology. In
intelligence analysis, investigators use semantic graphs to organize concepts and relationships as graph
nodes and links in hopes of discovering key trends, patterns, and insights. However, as new information
continues to arrive from a multitude of sources, the size and complexity of the semantic graphs will soon
overwhelm an investigator's cognitive capacity to carry out significant analyses. We introduce a powerful
visual analytics framework designed to enhance investigators' natural analytical capabilities to comprehend
and analyze large semantic graphs. The paper describes the overall framework design, presents major
development accomplishments to date, and discusses future directions of a new visual analytics system known
as Have Green.
Walking the Path-A New Journey to Explore and Discover through Visual Analytics
Wong PC, SJ Rose, G Chin, Jr, D Frincke, RA May, II, C Posse, AP Sanfilippo, and JJ Thomas. 2006. "Walking the Path-A New Journey to Explore and Discover through Visual Analytics." Information Visualization 5(4):237-249. doi:10.1057/palgrave.ivs.9500133
Abstract
Visual representations are essential aids to human cognitive tasks and are valued to the extent that they
provide stable and external reference points upon which dynamic activities and thought processes may be
calibrated and upon which models and theories can be tested and confirmed. The active use and manipulation
of visual representations makes many complex and intensive cognitive tasks feasible. As described in the
recently published "Illuminating the Path", visual analytics is "the science of analytical reasoning
facilitated by interactive visual interfaces." We describe research and development at PNNL focused on
improving the value that interactive visual representations provide to persons engaged in complex cognitive
tasks. We describe work at PNNL that carries forward research from multiple disciplines with a goal to
improve the capability of visual representations and present examples whose aim is to improve the extraction of,
and reasoning about, information, knowledge, and data.
A Typology for Visualizing Uncertainty
Thomson JR, EG Hetzler, A MacEachren, MN Gahegan, and M Pavel. 2005. "A Typology for Visualizing Uncertainty." In Visualization and Data Analysis 2005, Published in Proceedings of the SPIE, vol. 5669, pp. 146-157. SPIE, IS&T, San Jose, CA.
Abstract
Information analysts must rapidly assess information to determine its usefulness in supporting and informing decision makers.
In addition to assessing the content, the analyst must also be confident about the quality and veracity of the information.
Visualizations can concisely represent vast quantities of information, thus helping the analyst examine larger quantities of
material; however, visualization programs are challenged to incorporate a notion of confidence or certainty because the factors
that influence the certainty or uncertainty of information vary with the type of information and the type of decisions being
made. For example, the assessment of potentially subjective human-reported data leads to a large set of uncertainty concerns
in fields such as national security, law enforcement (witness reports), and even scientific analysis where data is collected
from a variety of individual observers. What's needed is a formal model or framework for describing uncertainty as it relates
to information analysis, to provide a consistent basis for constructing visualizations of uncertainty. This paper proposes an
expanded typology for uncertainty, drawing from past frameworks targeted at scientific computing. The typology provides general
categories for analytic uncertainty, a framework for creating task-specific refinements to those categories, and examples drawn
from the national security field.
Bioinformatic Insights from Metagenomics through Visualization
Havre SL, BM Webb-Robertson, A Shah, C Posse, B Gopalan, and FJ Brockman. 2005. "Bioinformatic Insights from Metagenomics through Visualization." In Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB 2005). August 8-11, 2005, pp. 341-350. IEEE Computer Society, Los Alamitos, CA.
Abstract
Cutting-edge biological and bioinformatics research seeks a systems perspective through the analysis of multiple types of
high-throughput and other experimental data for the same sample. Systems-level analysis requires the integration and fusion of
such data, typically through advanced statistics and mathematics. Visualization is a complementary computational approach that
supports integration and analysis of complex data or its derivatives. We present a bioinformatics visualization prototype,
Juxter, which depicts categorical information derived from or assigned to these diverse data for the purpose of comparing
patterns across categorizations. The visualization allows users to easily discern correlated and anomalous patterns in the data.
These patterns, which might not be detected automatically by algorithms, may reveal valuable information leading to insight and
discovery. We describe the visualization and interaction capabilities and demonstrate its utility in a new field, metagenomics,
which combines molecular biology and genetics to identify and characterize genetic material from multi-species microbial samples.
Building a Human Information Discourse Interface to Uncover Scenario Content
Sanfilippo AP, BL Baddeley, AJ Cowell, ML Gregory, RE Hohimer, and SC Tratz. 2005. "Building a Human Information Discourse Interface to Uncover Scenario Content." In 2005 International Conference on Intelligence Analysis. Mitre Website, McLean, VA.
Abstract
In this paper, we present an interactive visual environment designed to provide a coherent discourse structure to information
from multiple documents. Based on event structure and discourse relationships, our system enables users to manipulate information
by choosing the topics and events that are of interest, query in detail, and store information concurrently. The three-panel user
interface was specifically designed according to the cognitive processes that analysts use to gather, analyze, and construct
intelligence products.
Dynamic Visualization of Graphs with Extended Labels
Wong PC, PS Mackey, KA Perrine, JR Eagan, HP Foote, and J Thomas. 2005. "Dynamic Visualization of Graphs with Extended Labels." In 2005 IEEE Symposium on Information Visualization, Los Alamitos, CA, October 2005, pp. 73-80. IEEE, Piscataway, NJ.
Abstract
The paper describes a novel technique to visualize graphs with extended node and link labels. The lengths of these labels range
from a short phrase to a full sentence to an entire paragraph and beyond. Our solution is different from all the existing approaches
that almost always rely on intensive computational effort to optimize the label placement problem. Instead, we share the visualization
resources with the graph and present the label information in static, interactive, and dynamic modes without the requirement for
tackling the intractability issues. This allows us to reallocate the computational resources for dynamic presentation of real-time
information. The paper includes a user study to evaluate the effectiveness and efficiency of the visualization technique.
Extending the Reach of Augmented Cognition To Real-World Decision Making Tasks
Greitzer FL. 2005. "Extending the Reach of Augmented Cognition To Real-World Decision Making Tasks." In Augmented Cognition International Conference. HCI-International, Las Vegas.
Abstract
The focus of this paper is on the critical challenge of bridging the gap between psychophysiological sensor data and the inferred
cognitive states of users. It is argued that a more robust behavioral data collection foundation will facilitate accurate inferences
about the state of the user so that an appropriate mitigation strategy, if needed, can be applied. The argument for such a foundation
is based on two premises: (1) To realize the envisioned impact of augmented cognition systems, the technology should be applied to a
broad, and more cognitively complex, range of real-world problems. (2) To support identifying cognitive states for more complex,
real-world tasks, more sophisticated instrumentation will be needed for behavioral data collection. It is argued that such
instrumentation would enable inferences to be made about higher-level semantic aspects of performance. The paper describes how
instrumentation software developed to support information analysis R&D may serve as an integration environment that can provide
additional behavioral data, in context, to facilitate inferences of cognitive state that will enable the successful augmenting of
cognitive performance.
InfoStar: An Adaptive Visual Analytics Platform for Mobile Devices
Sanfilippo AP, RA May, II, GR Danielson, RM Riensche, and BL Baddeley. 2005. "InfoStar: An Adaptive Visual Analytics Platform for Mobile Devices." In First International Workshop on Managing Context Information in Mobile and Pervasive Environments. CEUR-WS.org, Ayia Napa, Cyprus.
Abstract
We present the design and implementation of InfoStar, an adaptive Visual Analytics platform for mobile devices such as PDAs, laptops,
Tablet PCs and mobile phones. InfoStar extends the reach of visual analytics technology beyond the traditional desktop paradigm to provide
ubiquitous access to interactive visualizations of information spaces. These visualizations are critical in addressing the knowledge needs
of human agents operating in the field, in areas as diverse as business, homeland security, law enforcement, protective services, emergency
medical services and scientific discovery. We describe an initial real world deployment of this technology, in which the InfoStar platform
has been used to offer mobile access to scheduling and venue information to conference attendees at Supercomputing 2004.
Metrics and Measures for Intelligence Analysis Task Difficulty
Greitzer FL and KM Allwein. 2005. "Metrics and Measures for Intelligence Analysis Task Difficulty." In First International Conference on Intelligence Analysis Methods and Tools. MITRE Corp, McLean, VA.
Abstract
Recent workshops and conferences supporting the intelligence community (IC) have highlighted the need to characterize the difficulty or complexity of
intelligence analysis (IA) tasks in order to facilitate assessments of the impact or effectiveness of IA tools that are being considered for
introduction into the IC. Some fundamental issues are: (a) how to employ rigorous methodologies in evaluating tools, given a host of problems such as
controlling for task difficulty, effects of time or learning, and small-sample size limitations; (b) how to measure the difficulty/complexity of IA tasks
in order to establish valid experimental/quasi-experimental designs aimed to support evaluation of tools; and (c) how to develop more rigorous (summative),
performance-based measures of human performance during the conduct of IA tasks, beyond the more traditional reliance on formative assessments
(e.g., subjective ratings). Invited discussants will be asked to comment on one or more of these issues, with the aim of bringing the most salient issues
and research needs into focus.
New Challenges Facing Integrative Biological Science in the Post-Genomic Era
Oehmen CS, T Straatsma, GA Anderson, G Orr, BM Webb-Robertson, RC Taylor, RW Mooney, DJ Baxter, DR Jones, and DA Dixon. 2005. "New Challenges Facing Integrative Biological Science in the Post-Genomic Era." Journal of Biological Systems.
Abstract
The future of biology will be increasingly driven by the fundamental paradigm shift from hypothesis-driven research to data-driven
discovery research employing the massive amounts of available biological data. We identify key technological developments needed
to enable this paradigm shift involving (1) the ability to store and manage extremely large datasets which are dispersed over a
wide geographical area, (2) development of novel analysis and visualization tools which are capable of operating on enormous data
resources without overwhelming researchers with unusable information, and (3) formalisms for integrating mathematical models of
biosystems from the molecular level to the organism population level. This will require the development of tools which efficiently
utilize high-performance compute power, large storage infrastructures and large aggregate memory architectures. The end result
will be the ability of a researcher to integrate complex data from many different sources with simulations to analyze a given
system at a wide range of temporal and spatial scales in a single conceptual model.
Turning the Bucket of Text into a Pipe
Hetzler EG, VL Crow, DA Payne, and AE Turner. 2005. "Turning the Bucket of Text into a Pipe." In Proceedings of the IEEE Symposium on Information Visualization. INFOVIS 2005. 23-25 Oct. 2005, pp. 89-94. IEEE, Los Alamitos, CA.
Abstract
Many visual analysis tools operate on a fixed set of data. However, professional information analysts follow issues over a period of
time, and need to be able to easily add new documents to an ongoing exploration. Some analysts handle documents in a moving
window of time, with new documents constantly added and old ones aging out. This paper describes both the user interaction and the
technical implementation approach for a visual analysis system designed to support constantly evolving text collections.
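The moving-window behavior described in this abstract, with new documents constantly added and old ones aging out, can be sketched with a double-ended queue. This is only an illustration under stated assumptions, not the system's implementation: the class name, the seven-day window, and the document ids are hypothetical, and documents are assumed to arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta

class MovingWindowCollection:
    """Keep only documents newer than `window` relative to the latest arrival."""

    def __init__(self, window):
        self.window = window
        self.docs = deque()  # (timestamp, doc_id) pairs, oldest first

    def add(self, timestamp, doc_id):
        """Add a document and return the ids that aged out as a result."""
        self.docs.append((timestamp, doc_id))
        cutoff = timestamp - self.window
        aged_out = []
        # Assumes in-order arrival, so the oldest documents sit at the left end.
        while self.docs and self.docs[0][0] < cutoff:
            aged_out.append(self.docs.popleft()[1])
        return aged_out

    def current_ids(self):
        """Ids of the documents still inside the window, oldest first."""
        return [doc_id for _, doc_id in self.docs]

collection = MovingWindowCollection(window=timedelta(days=7))
start = datetime(2005, 10, 1)
collection.add(start, "doc-1")
collection.add(start + timedelta(days=3), "doc-2")
removed = collection.add(start + timedelta(days=9), "doc-3")
```

Returning the aged-out ids from `add` lets a caller update a visualization incrementally, removing only the documents that left the window instead of rebuilding the whole view.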
Scientist-Centered Graph-Based Models of Scientific Knowledge
Chin G, Jr, EG Stephan, DK Gracio, OA Kuchar, PD Whitney, and KL Schuchardt. 2005. "Scientist-Centered Graph-Based Models of Scientific Knowledge." In HCI International 2005. 11th International Conference on Human-Computer Interaction, 22-27 July 2005, Caesars Palace, Las Vegas, Nevada, USA, 10 pages. Lawrence Erlbaum and Associates, Mahwah, NJ.
Abstract
At the Pacific Northwest National Laboratory, we are researching and developing visual models and paradigms that will allow scientists
to capture and represent conceptual models in a computational form that may be linked to and integrated with scientific data sets and
applications. Captured conceptual models may be logical in conveying how individual concepts tie together to form a higher theory,
analytical in conveying intermediate or final analysis results, or temporal in describing the experimental process in which concepts
are physically and computationally explored. In this paper, we describe and contrast three different research and development systems
that allow scientists to capture and interact with computational graph-based models of scientific knowledge. Through these examples, we
explore and examine ways in which researchers may graphically encode and apply scientific theory and practice on computer systems.
Top Ten Needs for Intelligence Analysis Tool Development
Badalamente RV and FL Greitzer. 2005. "Top Ten Needs for Intelligence Analysis Tool Development." In First International Conference on Intelligence Analysis Methods and Tools. MITRE Corp, McLean, VA.
Abstract
The purpose of this paper is to report on the results of R&D to generate ideas about future enhancements to software systems
designed to aid the process of intelligence analysis (IA). Use of IA tools in actual settings has revealed significant problems:
the user's thought process has not been adequately modeled and is therefore not reflected in the design of analysis tools; users
find the tools difficult to learn and use; the tools are not tailored to specific intelligence domains; the tools do not offer an
integrated approach (data preprocessing/ingest is a particular problem); the tools do not address the longitudinal nature
(continuing over extended periods of time) of the general analysis problem. The aim of this work was to establish an enduring,
well-integrated, robust technical foundation for the development and deployment of information-technology (IT)-based IA tools
recognized by users and clients as uniquely well designed to meet their varied analysis needs. An overarching strategy or "roadmap"
is needed to guide technology development, and a more accurate understanding is needed about how real intelligence analysts do
their job. To address these needs, we conducted a facilitated workshop with nine working analysts. An intelligence analysis process
model was developed and discussed with the analysts as a point of departure for the discussion. Participants worked in break-out
groups to discuss concepts for tools and enhanced products to aid in the IA process. The top ten enhancements identified during
the workshop were: seamless data access and ingest; diverse data ingest and fusion; shared electronic folders for collaborative
analysis; hypothesis generation and tracking; template for analysis strategy; electronic skills inventory; dynamic data processing
and visualization; intelligent tutor for intelligence product development; imagery data resources; intelligence analysis knowledge
base. This paper and presentation will discuss the conduct of the workshop and the results obtained.
Toward the Development of Cognitive Task Difficulty Metrics to Support Intelligence Analysis Research
Greitzer FL. 2005. "Toward the Development of Cognitive Task Difficulty Metrics to Support Intelligence Analysis Research." In The Fourth IEEE Conference on Cognitive Informatics, Aug. 8-10, 2005. ICCI 2005, pp. 315-320. Institute of Electrical and Electronics Engineers, Piscataway, NJ.
Abstract
Intelligence analysis is a cognitively complex task that is the subject of considerable research aimed at developing methods and
tools to aid the analysis process. To support such research, it is necessary to characterize the difficulty or complexity of
intelligence analysis tasks in order to facilitate assessments of the impact or effectiveness of tools that are being considered for
deployment. A number of informal accounts of "What makes intelligence analysis hard" are available, but there has been no attempt to
establish a more rigorous characterization with well-defined difficulty factors or dimensions. This paper takes an initial step in
this direction by describing a set of proposed difficulty metrics based on cognitive principles.
Visual Sample Plan (VSP) Software: Designs and Data Analyses for Sampling Contaminated Buildings
Pulsipher BA, JE Wilson, RO Gilbert, LL Nuffer, and NL Hassig. 2005. "Visual Sample Plan (VSP) Software: Designs and Data Analyses for Sampling Contaminated Buildings." In Proceedings of 24th Annual National Conference on Managing Environmental Quality Systems, vol. 24-2-2, pp. 24-34. US EPA, Washington, DC.
Abstract
A new module of the Visual Sample Plan (VSP) software has been developed to provide sampling designs and data analyses for potentially
contaminated buildings. An important application is assessing levels of contamination in buildings after a terrorist attack. This new
module, funded by DHS through the Combating Terrorism Technology Support Office, Technical Support Working Group, was developed to
provide a tailored, user-friendly and visually oriented buildings module within the existing VSP software toolkit, the latest version
of which can be downloaded from http://dqo.pnl.gov/vsp. In case of, or when planning against, a chemical, biological, or radionuclide
release within a building, the VSP module can be used to quickly and easily develop and visualize technically defensible sampling
schemes for walls, floors, ceilings, and other surfaces to statistically determine if contamination is present, its magnitude and extent
throughout the building and if decontamination has been effective. This paper demonstrates the features of this new VSP buildings module,
which include: the ability to import building floor plans or to easily draw, manipulate, and view rooms in several ways; being able to
insert doors, windows and annotations into a room; 3-D graphic room views with surfaces labeled and floor plans that show building zones
that have separate air handling units. The paper will also discuss the statistical design and data analysis options available in the
buildings module. Design objectives supported include comparing an average to a threshold when the data distribution is normal or unknown,
and comparing measurements to a threshold to detect hotspots or to ensure most of the area is uncontaminated when the data distribution
is normal or unknown.
Analysis Experiences Using Information Visualization
Hetzler E and A Turner. 2004. "Analysis Experiences Using Information Visualization." IEEE Computer Graphics and Applications 24(5):22-26.
Abstract
To deliver truly useful tools, researchers must learn how to map between the knowledge domains inherent in information collections and the
knowledge domains in users' minds. The true measure of this work is not what the software shows, but what the user is able to understand by using
it. This article summarizes lessons learned from an observational study of the application of the IN-SPIRE
visually-oriented text exploitation system in an operational analysis environment.
Supporting Mutual Understanding in a Visual Dialogue Between Analyst and Computer
Chappell AR, AJ Cowell, DA Thurman, and JR Thomson. 2004. "Supporting Mutual Understanding in a Visual Dialogue Between Analyst and Computer." In HFES 2004 Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting: September 20-24, 2004, New Orleans, Louisiana, p. 5. Human Factors & Ergonomics Society, Santa Monica, CA.
Abstract
The Knowledge Associates for Novel Intelligence (KANI) project is developing a system of automated "associates" to actively support
and participate in the information analysis task. The primary goal of KANI is to use automatically extracted information in a reasoning
system that draws on the strengths of both a human analyst and automated reasoning. The interface between the two agents is a key
element in achieving this goal. The KANI interface seeks to support a visual dialogue with mixed-initiative manipulation of information
and reasoning components. To be successful, the interface must achieve mutual understanding between the analyst and KANI of the other's
actions. Toward this mutual understanding, KANI allows the analyst to work at multiple levels of abstraction over the reasoning process,
links the information presented across these levels to make use of interaction context, and provides querying facilities to allow
exploration and explanation.
Visual Analytics
Wong PC and J Thomas. 2004. "Visual Analytics." IEEE Computer Graphics and Applications 24(5):20-21.
Excerpt
The information revolution is upon us, and it's guaranteed to change our lives and the
way we conduct our daily business. The fact that we have to deal with not just the size but also the variety
and complexity of this information makes it a real challenge to survive the revolution. Enter visual analytics, a
contemporary and proven approach to combine the art of human intuition and the science of mathematical
deduction to directly perceive patterns and derive knowledge and insight from them.
Visual analytics is the formation of abstract visual metaphors in combination with a human information discourse (interaction) that enables detection of the expected and discovery of the unexpected within massive, dynamically changing information spaces. These suites of technologies apply to almost all fields but are being driven by critical needs in biology and national security...
Visualizing Data Streams
Wong PC, HP Foote, DR Adams, WE Cowley, LR Leung, and JJ Thomas. 2004. "Visualizing Data Streams." Chapter 11 in Visual and Spatial Analysis: Advances in Data Mining, Reasoning, and Problem Solving, ed. Boris Kovalerchuk and James Schwing, pp. 265-291,568,569,570,571. Springer, Dordrecht, Netherlands.
Abstract
We introduce two dynamic visualization techniques using multi-dimensional scaling to analyze transient data streams such as newswires and remote sensing imagery. While the time-sensitive nature of these data streams requires immediate attention in many applications, the unpredictable and unbounded characteristics of this information can potentially overwhelm many scaling algorithms that require a full re-computation for every update. We present an adaptive visualization technique based on data stratification to ingest stream information adaptively when influx rate exceeds processing rate. We also describe an incremental visualization technique based on data fusion to project new information directly onto a visualization subspace spanned by the singular vectors of the previously processed neighboring data. The ultimate goal is to leverage the value of legacy and new information and minimize re-processing of the entire dataset in full resolution. We demonstrate these dynamic visualization results using a newswire corpus and a remote sensing imagery sequence.
Dynamic Visualization of Transient Data Streams
Wong PC, HP Foote, DR Adams, WE Cowley, and JJ Thomas. 2003. "Dynamic Visualization of Transient Data Streams." In IEEE Symposium on Information Visualization 2003. Proceedings IEEE Symposium Information Visualization, Seattle, WA.
Abstract
We introduce two dynamic visualization techniques using multi-dimensional scaling to analyze transient
data streams such as newswires and remote sensing imagery. While the time-sensitive nature of these data
streams requires immediate attention in many applications, the unpredictable and unbounded characteristics
of this information can potentially overwhelm many scaling algorithms that require a full re-computation
for every update. We present an adaptive visualization technique based on data stratification to ingest
stream information adaptively when influx rate exceeds processing rate. We also describe an incremental
visualization technique based on data fusion to project new information directly onto a visualization subspace
spanned by the singular vectors of the previously processed neighboring data. The ultimate goal is to leverage
the value of legacy and new information and minimize re-processing of the entire dataset in full resolution.
We demonstrate these dynamic visualization results using a newswire corpus and a remote sensing imagery
sequence.
Global Visualization and Alignments of Whole Bacterial Genomes
Wong PC, K Wong, HP Foote, and JJ Thomas. 2003. "Global Visualization and Alignments of Whole Bacterial Genomes." IEEE Transactions on Visualization and Computer Graphics 9(3):361-377.
Abstract
We present a novel visualization technique to align whole bacterial genomes with millions of nucleotides.
Our basic design combines the descriptive power of pixel-based visualizations with the interpretative strength
of digital image-processing filters. The innovative use of pixel enhancement techniques on pixel-based
visualizations brings out the best of the recursive data patterns and further enhances the effectiveness of the
visualization techniques. The result is a fast, versatile, and cost-effective analysis tool to reveal the
functional identifications and the phenotypic changes of whole bacterial genomes. Our experiments show that our
visualization-based genome alignment technique outperforms other computational-based tools in processing time.
They also show that our pictorial results are far superior to the hardcopy printouts generated by
computation-based programs in studying the overall genomic structures. Six different bacterial genomes obtained
from public genome banks are used to demonstrate our designs and measure their performances.
Multivariate Visualization with Data Fusion
Wong PC, HP Foote, DL Kao, LR Leung, and JJ Thomas. 2002. "Multivariate Visualization with Data Fusion." In Information Visualization, vol. 1, no. 3/4, ed. Chaomei Chen, pp. 182-193. MacMillan, Hampshire, United Kingdom.
Abstract
We discuss a fusion-based visualization method to analyze a 2D flow field together with its related scalars. The primary difference between a conventional visualization and a fusion-based visualization is that the former draws on a single image whereas the latter draws on multiple see-through layers, which are then overlaid on each other to form the final visualization. We propose uniquely designed colormaps to highlight flow features that would not be shown with conventional colormaps. We present fusion techniques that integrate multiple single-purpose flow visualization techniques into the same viewing space. Our highly flexible fusion approach allows scientists to explore multiple parameters concurrently by mixing and matching images without frequently reconstructing new visualizations from its data for every possible combination. Sample datasets collected from a climate modeling study are used to demonstrate our approach.
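The idea of overlaying see-through layers can be illustrated with a minimal sketch. It assumes standard "over" alpha compositing on non-premultiplied RGBA values; the abstract does not say which blending rule the authors use, and the function names `over` and `fuse` are hypothetical.

```python
def over(top, bottom):
    """Composite one RGBA pixel over another (non-premultiplied, channels in 0..1)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t, b):
        # Weight each colour by its own alpha; the bottom layer is also
        # attenuated by whatever the top layer covers.
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def fuse(layers):
    """Fuse a bottom-to-top list of RGBA pixels into a single pixel."""
    result = (0.0, 0.0, 0.0, 0.0)  # start from a fully transparent pixel
    for layer in layers:
        result = over(layer, result)
    return result


# Assumed example: an opaque red base layer under a half-transparent blue layer.
fused = fuse([(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 0.5)])
print(fused)
```

Because each layer stays a separate image, swapping one layer for another only requires re-running the fusion step, not rebuilding the individual visualizations.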
ThemeRiver: Visualizing Thematic Changes in Large Document Collections
S. Havre, E. Hetzler, P. Whitney, L. Nowell. “ThemeRiver: Visualizing Thematic Changes in Large Document Collections.” IEEE Transactions on Visualization and Computer Graphics, Vol. 8, No. 1, January-March 2002.
Abstract
The ThemeRiver visualization depicts thematic variations over time within a large collection of documents. The thematic changes are shown in
the context of a timeline and corresponding external events. The focus on temporal thematic change within a context framework allows a user
to discern patterns that suggest relationships or trends. For example, the sudden change of thematic strength following an external event may
indicate a causal relationship. Such patterns are not readily accessible in other visualizations of the data. We use a river metaphor to convey
several key notions. The document collection's time line, selected thematic content, and thematic strength are indicated by the river's directed
flow, composition, and changing width, respectively. The directed flow from left to right is interpreted as movement through time and the
horizontal distance between two points on the river defines a time interval. At any point in time, the vertical distance, or width, of the river
indicates the collective strength of the selected themes. Colored "currents" flowing within the river represent individual themes. A current's
vertical width narrows or broadens to indicate decreases or increases in the strength of the individual theme.
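The mapping from theme strengths to river geometry can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `river_layers`, the sample strengths, and the choice to center the river on a horizontal axis are all assumptions.

```python
def river_layers(strengths):
    """Stack per-theme strengths into (lower, upper) bounds for each current.

    strengths maps a theme name to a list of non-negative strengths, one per
    time step. Returns (layers, widths): layers maps each theme to a list of
    (lower, upper) pairs, and widths is the total river width per time step.
    """
    themes = list(strengths)
    steps = len(strengths[themes[0]])
    widths = [sum(strengths[t][i] for t in themes) for i in range(steps)]
    layers = {t: [] for t in themes}
    for i in range(steps):
        # Assumption: the river is centered so it widens symmetrically.
        lower = -widths[i] / 2.0
        for t in themes:
            upper = lower + strengths[t][i]
            layers[t].append((lower, upper))
            lower = upper
    return layers, widths


# Hypothetical theme strengths at three consecutive time steps.
strengths = {
    "economy": [1.0, 2.0, 4.0],
    "health": [2.0, 2.0, 1.0],
    "energy": [1.0, 0.0, 3.0],
}
layers, widths = river_layers(strengths)
print(widths)            # collective strength (river width) per time step
print(layers["health"])  # vertical extent of one current over time
```

The river's width at each step is the sum of the selected theme strengths, and each current's vertical extent equals its own strength, which matches the description in the abstract.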
Change blindness in information visualization: a case study
Nowell LT, EG Hetzler, and TE Tanasse. 2001. "Change Blindness in Information Visualization." October 22-23, 2001 Proceedings of the IEEE Information Visualization Symposium 2001 (InfoVis 2001), San Diego, CA.
Interactive Visualization of Multiple Query Results
S. Havre, E. Hetzler, K. Perrine, E. Jurrus, and N. Miller. “Interactive Visualization of Multiple Query Results.” October 22-23, 2001 Proceedings of the IEEE Information Visualization Symposium 2001 (InfoVis 2001), San Diego, CA.
Abstract
This paper introduces a graphical method for visually presenting and exploring the results of multiple queries simultaneously. This method allows
a user to visually compare multiple query result sets, explore various combinations among the query result sets, and identify the “best”
matches for combinations of multiple independent queries. This approach might also help users explore methods for progressively improving queries
by visually comparing the improvement in result sets.
Radical SAM, A Novel Protein Superfamily Linking Unresolved Steps in Familiar Biosynthetic Pathways with Radical Mechanisms: Functional Characterization Using New Analysis and Information Visualization Methods
Sofia HJ, G Chen, EG Hetzler, JF Reyes Spindola, and NE Miller. 2001. "Radical SAM, A Novel Protein Superfamily Linking Unresolved Steps in Familiar Biosynthetic Pathways with Radical Mechanisms: Functional Characterization Using New Analysis and Information Visualization Methods." Nucleic Acids Research 29(5):1097-1106.
Abstract
A large protein superfamily with over 500 members has been discovered and analyzed using powerful new
bioinformatics and information visualization methods. Evidence exists that these proteins generate a
5′-deoxyadenosyl radical by reductive cleavage of S-adenosylmethionine (SAM) through an unusual Fe-S center.
Radical SAM superfamily proteins function in DNA precursor, vitamin, cofactor, antibiotic, and herbicide
biosynthesis in a collection of basic and familiar pathways. One of the members is interferon-inducible and is
considered a candidate drug target for osteoporosis. The identification of this superfamily suggests that
radical-based catalysis is important in a number of previously well-studied but unresolved biochemical pathways.
Data Signatures and Visualization of Very Large Datasets
Pak Chung Wong, Harlan Foote, Ruby Leung, Dan Adams, and Jim Thomas. Data Signatures and Visualization of Very Large Datasets. IEEE Computer Graphics and Applications, Vol 20, No 2, March 2000.
Abstract
Today, as data sets used in computations grow in size and complexity, the technologies
developed over the years to deal with scientific data sets have become less efficient
and effective. Many frequently used operations, such as Eigenvector computation, could
quickly exhaust our desktop workstations once the data size reaches certain limits.
On the other hand, the high-dimensional data sets we collect every day don't relieve the problem. Many conventional metric designs that build on quantitative or categorical data sets cannot be applied directly to heterogeneous data sets with multiple data types. While building new machines with more resources might conquer the data size problems, the complexity of today's computations requires a new breed of projection techniques to support analysis of the data and verification of the results.
We introduce the concept of a data signature, which captures the essence of a scientific data set in a compact format, and use it to conduct analysis as if using the original. A time-dependent climate simulation data set demonstrates our approach and presents the results.
DriftWeed - A Visual Metaphor for Interactive Analysis of Multivariate Data
Stuart Rose and Pak Chung Wong. DriftWeed - A Visual Metaphor for Interactive Analysis of Multivariate Data. Proceedings IS&T/SPIE Conference on Visual Data Exploration and Analysis, San Jose, CA, Jan 2000.
Abstract
We present a visualization technique that allows a user to identify and detect
patterns and structures within a multivariate data set. Our research builds on
previous efforts to represent multivariate data in a two-dimensional information
display through the use of icon plots. Although the icon plot work done by Pickett
and Grinstein is similar to our approach, we improve on their efforts in several
ways.
Our technique allows analysis of a time series without using animation; promotes
visual differentiation of information clusters based on measures of variance; and
facilitates exploration through direct manipulation of geometry based on scales of
variance.
Our goal is to provide a visualization that implicitly conveys the degree to which
an element's ordered collection (pattern) of attributes varies from the prevailing
pattern of attributes for other elements in the collection. We apply this technique to
multivariate abstract data and use it to locate exceptional elements in a data set and
divisions among clusters.
ThemeRiver: Visualizing Theme Changes over Time
S. Havre, B. Hetzler, L. Nowell, "ThemeRiver: Visualizing Theme Changes over Time", Proceedings of IEEE Symposium on Information Visualization, InfoVis 2000, 2000, pp. 115 - 123.
Abstract
ThemeRiver™ is a prototype system that visualizes
thematic variations over time within a large collection
of documents. The "river" flows from left to right
through time, changing width to depict changes in
thematic strength of temporally associated documents.
Colored "currents" flowing within the river narrow or
widen to indicate decreases or increases in the strength
of an individual topic or a group of topics in the
associated documents. The river is shown within the
context of a timeline and a corresponding textual
presentation of external events.
Vector Fields Simplification - A Case Study of Visualizing Climate Modeling and Simulation Data Sets
Pak Chung Wong, Harlan Foote, Ruby Leung, Elizabeth Jurrus, Dan Adams, and Jim Thomas. Vector Fields Simplification - A Case Study of Visualizing Climate Modeling and Simulation Data Sets. Proceedings IEEE Visualization 2000. Salt Lake City, Utah, Oct 8 - Oct 13, 2000.
Abstract
In our study of regional climate modeling and simulation, we frequently encounter vector fields that are crowded with
large numbers of critical points. A critical point in a flow is where the vector field vanishes. While these critical
points accurately reflect the topology of the vector fields, in our study only a subset of them is worth further
investigation. We present a filtering technique based on the vorticity of the vector fields to eliminate the less
interesting and sometimes sporadic critical points in a multi-resolution fashion. The neighboring regions of the preserved
features, which are characterized by strong shear and circulation, are potential locations of weather instability. We
apply our feature-filtering technique to a regional climate modeling data set covering East Asia in the summer of 1991.
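A rough sketch of vorticity-based filtering of critical points is given below. It assumes a regular 2D grid, central differences, and a single fixed threshold; the multi-resolution aspect of the paper is not modeled, and the function names are hypothetical.

```python
def vorticity(u, v, i, j, h=1.0):
    """Central-difference estimate of dv/dx - du/dy at interior grid point (i, j).

    u and v are 2D lists indexed as [row][column], i.e. [y][x], with spacing h.
    """
    dv_dx = (v[i][j + 1] - v[i][j - 1]) / (2.0 * h)
    du_dy = (u[i + 1][j] - u[i - 1][j]) / (2.0 * h)
    return dv_dx - du_dy


def filter_critical_points(u, v, threshold, eps=1e-9, h=1.0):
    """Keep interior grid points where the field vanishes and |vorticity| >= threshold."""
    kept = []
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            is_critical = abs(u[i][j]) < eps and abs(v[i][j]) < eps
            if is_critical and abs(vorticity(u, v, i, j, h)) >= threshold:
                kept.append((i, j))
    return kept


n = 5
# Assumed test fields, each with one critical point at the grid center (2, 2):
# a pure rotation (strong circulation) and a saddle (no circulation).
rot_u = [[float(-(i - 2)) for j in range(n)] for i in range(n)]
rot_v = [[float(j - 2) for j in range(n)] for i in range(n)]
sad_u = [[float(j - 2) for j in range(n)] for i in range(n)]
sad_v = [[float(-(i - 2)) for j in range(n)] for i in range(n)]

print(filter_critical_points(rot_u, rot_v, threshold=1.0))
print(filter_critical_points(sad_u, sad_v, threshold=1.0))
```

In this toy setup the rotational critical point survives the filter while the saddle is discarded, mirroring the idea of keeping only features with strong circulation.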
Visualizing Sequential Patterns for Text Mining
Pak Chung Wong, Wendy Cowley, Harlan Foote, Elizabeth Jurrus, and Jim Thomas. Visualizing Sequential Patterns for Text Mining. Proceedings IEEE Information Visualization 2000, Salt Lake City, Utah, Oct 8 - Oct 13, 2000.
Abstract
A sequential pattern in data mining is a finite series of elements such as A → B → C → D where A, B, C, and
D are elements of the same domain. The mining of sequential patterns is designed to find patterns of discrete events that
frequently happen in the same arrangement along a timeline. Like association and clustering, the mining of sequential patterns is
among the most popular knowledge discovery techniques that apply statistical measures to extract useful information from large
datasets. As our computers become more powerful, we are able to mine bigger datasets and obtain hundreds of thousands of
sequential patterns in full detail. With this vast amount of data, we argue that neither data mining nor visualization by itself
can manage the information and reflect the knowledge effectively. Subsequently, we apply visualization to augment data mining in a
study of sequential patterns in large text corpora. The result shows that we can learn more and more quickly in an integrated
visual data-mining environment.
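To make the notion of a sequential pattern concrete, here is a small brute-force sketch that counts how many event sequences contain a pattern in the same order. It is only an illustration under assumed data: the function names and sample sequences are hypothetical, and enumerating all candidate patterns this way would not scale to the corpora discussed in the paper.

```python
from itertools import permutations


def contains_in_order(sequence, pattern):
    """Return True if the items of pattern occur in sequence in the same order."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the match, so later
    # items must appear after earlier ones.
    return all(item in remaining for item in pattern)


def frequent_patterns(sequences, length, min_support):
    """Map each ordered pattern of the given length to its support, if frequent."""
    items = sorted({item for seq in sequences for item in seq})
    result = {}
    for pattern in permutations(items, length):
        support = sum(contains_in_order(seq, pattern) for seq in sequences)
        if support >= min_support:
            result[pattern] = support
    return result


# Hypothetical event sequences, e.g. topics in the order they appear in documents.
sequences = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "D"],
    ["A", "B", "D", "C"],
]
patterns = frequent_patterns(sequences, length=3, min_support=3)
print(patterns)
```

Support here means the number of sequences that contain the pattern, which is one common way to decide that events "frequently happen in the same arrangement".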
Visual Data Mining - Guest Editor's Introduction
Pak Chung Wong. Visual Data Mining - Guest Editor's Introduction. IEEE Computer Graphics and Applications, Vol 19, No 5, Sep 1999.
Abstract
Seeing is knowing, though merely seeing is not enough. When you understand what you see, seeing becomes
believing. A while ago scientists discovered that seeing and understanding together enable humans to glean
knowledge and deeper insight from large amounts of data. The approach integrates the human mind's
exploration abilities with the enormous processing power of computers to form a powerful knowledge discovery
environment that capitalizes on the best of both worlds. The technology builds on visual and analytical
processes developed in various disciplines including scientific visualization, data mining, statistics, and
machine learning with custom extensions that handle very large, multidimensional, multivariate data sets.
The methodology is based on both functionality that characterizes structures and displays data and human
capabilities that perceive patterns, exceptions, trends, and relationships. Here I'll define the vision,
present the state of the art, and discuss the future of a young discipline called visual data mining.
Visualizing Association Rules for Text Mining
Pak Chung Wong, Paul Whitney, and Jim Thomas. Visualizing Association Rules for Text Mining. Proceedings IEEE Information Visualization 99, San Francisco, CA, Oct 24 - Oct 29, 1999.
Abstract
An association rule in data mining is an implication of the form X → Y where X is a set of
antecedent items and Y is the consequent item. For years researchers have developed many tools to
visualize association rules. However, few of these tools can handle more than dozens of rules, and
none of them can effectively manage rules with multiple antecedents. Thus, it is extremely difficult
to visualize and understand the association information of a large data set even when all the rules
are available. This paper presents a novel visualization technique to tackle many of these problems.
We apply the technology to a text mining study on large corpora. The results indicate that our design
can easily handle hundreds of multiple antecedent association rules in a three-dimensional display
with minimum human interaction, low occlusion percentage, and no screen swapping.
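As a small illustration of a rule with multiple antecedents, the sketch below computes support and confidence for a rule X → Y over a toy set of documents. Support and confidence are the standard association-rule measures; the abstract does not state which measures the visualization encodes, and the function name and sample documents are assumptions.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    transactions is a list of sets of items; antecedent is a set of items
    and consequent is a single item.
    """
    antecedent = set(antecedent)
    both = antecedent | {consequent}
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_both = sum(both <= t for t in transactions)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical documents, each reduced to its set of topic words.
docs = [
    {"climate", "model", "simulation"},
    {"climate", "model", "data"},
    {"climate", "simulation"},
    {"model", "data"},
]
support, confidence = rule_metrics(docs, {"climate", "model"}, "simulation")
print(support, confidence)
```

Each rule therefore carries at least three pieces of information (its antecedent set, its consequent, and its metrics), which hints at why rules with several antecedents are hard to display in a flat two-dimensional table.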
ThemeRiver™: In Search of Trends, Patterns, and Relationships
Susan Havre, Beth Hetzler, and Lucy Nowell. ThemeRiver™: In Search of Trends, Patterns, and Relationships. In Proceedings of IEEE Symposium on Information Visualization, InfoVis '99, October 25-26, 1999, San Francisco, CA.
Abstract
ThemeRiver™ is a prototype system that visualizes thematic
variations over time across a collection of documents. The
"river" flows through time, changing width to depict changes
in the thematic strength of documents temporally collocated.
Themes or topics are represented as colored "currents" flowing
within the river that narrow or widen to indicate decreases or
increases in the strength of a topic in associated documents at
a specific point in time. The river is shown within the context
of a timeline and a corresponding textual presentation of
external events.
Human Computer Interaction with Global Information Spaces - Beyond Data Mining
Jim Thomas, Kris Cook, Vern Crow, Beth Hetzler, Richard May, Dennis McQuerry, Renie McVeety, Nancy Miller, Grant Nakamura, Lucy Nowell, Paul Whitney, Pak Chung Wong. 1999. Pacific Northwest National Laboratory, Richland, WA 99352
Abstract
This invited paper describes a vision and progress towards
a fundamentally new approach for dealing with the
massive information overload situation of the emerging
global information age. Today we use techniques such as
data mining, through a WIMP interface, for searching or
for analysis. Yet, the human mind can deal and interact
simultaneously with millions of information items, e.g.
documents. The challenge is to find visual paradigms,
interaction techniques, and physical devices that encourage
a new human information discourse between the human
and their massive global and corporate information
resources. After describing the vision and the current progress towards
some core technology development, we present the grand
challenges to bring this vision to reality.
TOPIC ISLANDS™ - A Wavelet-Based Text Visualization System
Nancy E. Miller, Pak Chung Wong, Mary Brewster, and Harlan Foote. TOPIC ISLANDS - A Wavelet Based Text Visualization System. 1998. In Proceedings of the conference on Visualization '98, pp. 189-196.
Abstract
We present a novel approach to visualize and explore unstructured text. The underlying technology,
called TOPIC-O-GRAPHY™, applies wavelet transforms to a custom digital signal constructed from
words within a document. The resultant multiresolution wavelet energy is used to analyze the
characteristics of the narrative flow in the frequency domain, such as theme changes, which is then
related to the overall thematic content of the text document using statistical methods. The thematic
characteristics of a document can be analyzed at varying degrees of detail, ranging from section-sized
text partitions to partitions consisting of a few words. Using this technology, we are developing a
visualization system prototype known as TOPIC ISLANDS™ to browse a document, generate fuzzy
document outlines, summarize text by levels of detail and according to user interests, define
meaningful subdocuments, query text content, and provide summaries of topic evolution.
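The role of multiresolution wavelet energy can be illustrated with a minimal sketch. It assumes an orthonormal Haar transform and a made-up numeric signal standing in for the word-derived signal; the abstract does not specify which wavelet or signal construction TOPIC-O-GRAPHY™ uses, and `haar_energies` is a hypothetical name.

```python
import math


def haar_energies(signal):
    """Return the detail-coefficient energy at each level of an orthonormal Haar transform.

    The signal length must be a power of two. Index 0 is the finest level
    (adjacent samples); the last index is the coarsest level.
    """
    approx = list(signal)
    energies = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2.0) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
        energies.append(sum(d * d for d in detail))
    return energies


# Assumed signal: a constant "theme" value that shifts once, halfway through.
signal = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
energies = haar_energies(signal)
print(energies)
```

In this example nearly all detail energy sits at the coarsest level, which is how a single large-scale shift in the signal would show up; finer levels would pick up changes between shorter text partitions.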
Four Critical Elements for Designing Information Exploration Systems. [web page]
Beth Hetzler, Nancy Miller. Four Critical Elements for Designing Information Exploration Systems. 1998. Presented at Information Exploration workshop for ACM SIGCHI '98. Los Angeles, CA. April 1998.
Abstract
Designing an information exploration system requires attention to four critical components.
Since information exploration is a highly interactive process, the user is a key element.
The second and third critical elements are the presentation methods that are used to
communicate information and the interaction techniques that enable that user to actively
explore that information. Finally, powerful mathematics are needed to identify and manipulate
features of the information. This paper describes how these four critical components can work
together to flexibly meet varied user goals.
Visualizing the Full Spectrum of Document Relationships
Hetzler, Beth, W. Michelle Harris, Susan Havre, Paul Whitney. Visualizing the Full Spectrum of Document Relationships. In: Structures and Relations in Knowledge Organization. Proc. 5th Int. ISKO Conf. Wurzburg: ERGON Verlag, 1998. pp. 168-175.
Abstract
Documents embody a rich and potentially very useful set of complex interrelationships, both
among the documents themselves and among the terms they contain. However, the very richness of
these relationships and the variety of potential applications make it difficult to present them
in a usable form. This paper describes an approach that enables the user to visualize a
multitude of document or entity relationships. Two visual metaphors are presented that allow the
user to gain new insights and understandings by interactively exploring these relationship
patterns at multiple levels of detail.
Multi-faceted Insight Through Interoperable Visual Information Analysis Paradigms.
Beth Hetzler, Paul Whitney, Lou Martucci, Jim Thomas. Multi-faceted Insight Through Interoperable Visual Information Analysis Paradigms. 1998. In Proceedings of IEEE Symposium on Information Visualization, InfoVis '98, October 19-20, 1998, Research Triangle Park, North Carolina. pp.137-144.
Abstract
To gain insight and understanding of complex information collections, users must be able to visualize
and explore many facets of the information. This paper presents several novel visual methods from an
information analyst's perspective. We present a sample scenario, using the various methods to gain a
variety of insights from a large information collection. We conclude that no single paradigm or visual
method is sufficient for many analytical tasks. Often a suite of integrated methods offers a better
analytic environment in today's emerging culture of information overload and rapidly changing issues.
We also conclude that the interactions among these visual paradigms are equally as important as, if
not more important than, the paradigms themselves.
Beyond Word Relations - SIGIR '97
Hetzler, E. (1997). Beyond Word Relations. SIGIR Forum, Fall 1997. Vol 31, No. 2. ACM Press, p. 28-32.
Abstract
Many information retrieval systems identify documents or provide a document visualization based on
analysis of a particular relationship among documents — that of similar topical content. But there
may be layers of other less apparent and less traditional relationships that are useful to the user.
Exploring this other information was the subject of this workshop, with a focus on identifying new
non-traditional relationships. An initial taxonomy was introduced and fleshed out during the workshop.
The Need For Metrics In Visual Information Analysis
Nancy Miller, Beth Hetzler, Grant Nakamura, Paul Whitney. The Need For Metrics In Visual Information Analysis. Workshop on New Paradigms in Information Visualization and Manipulation in conjunction with the Sixth ACM International Conference on Information and Knowledge Management (CIKM '97), November 13-14, 1997, Las Vegas, Nevada. ACM Press.
Abstract
This paper explores several methods for visualizing the thematic content of large document collections.
As opposed to traditional query-driven document retrieval, these methods are used for exploring and gaining
insight into document collections. For our experiments, we used 12,000 medical abstracts. The
SPIRE [now IN-SPIRE] system was used to create the mathematical
signal from text and to project the documents into a universe of "docustars" and as a thematic contour map
based on thematic proximity. A self-organizing map is used to project the documents onto a "Tree" fractal. A
topic-based approach is used to align documents between concepts in the "Cosmic Tumbleweed" projection. In
the 32-D Hypercube, documents are organized by cascading theme strengths. An argument is made for a new type
of metric that would facilitate comparisons among the many methods for visualizing or browsing document
collections. An initial organization is proposed for some of the relevant research that metrics for
information visualization can draw upon.