Technical White Paper
  Interactive Data Analytics for Information Visualization
Advanced Visual Systems (AVS), a world leader in data visualization software and services for over a decade, is providing technology, vision and direction to leading software manufacturers and corporations that are making visual analysis a core element of their decision support practices and product development strategies.

OpenViz, AVS’ information visualization technology, is designed to generate highly customizable, interactive data visualizations based on raw data from a data source. The highly versatile Field data structure on which OpenViz is based can represent both hierarchical and relational data models, so data can be imported directly from OLAP cubes or standard relational databases while retaining the data relationships inherent in these common data models.

Augmenting OpenViz’ unmatched data visualization functionality are numerous components that enable sophisticated data analysis. The OpenViz data visualization system allows application developers to create visualizations based on analytic operations performed entirely within the flexible OpenViz architecture.

OpenViz supports a host of aggregation functions that can analyze data in records segregated into bins. OpenViz allows the definition of record-binning schemes, ranging from the simplest one-dimensional discrete bin specification to the most complex three-dimensional scheme with non-uniform continuous binning ranges.

The mechanism for defining bin-membership criteria in OpenViz is called the AxisMap. The wide range of AxisMaps that can be created within OpenViz provides dynamic flexibility in defining bins, including:

Discrete and continuous binning: Bin membership can be defined based on unique, discrete data values or on continuous ranges of data values.

Non-uniform bin ranges: Bins can be defined by chopping a continuous data space into a series of ranges. These ranges can be of non-uniform size and can be adjusted dynamically.

Variety of data types: Bin membership criteria need not be based on simple numerical data, but can be based on a wider range of types including string, date/time and currency values. In the case of date/time values, the bins can be defined using convenient intervals such as months or quarters and can easily be made to omit weekends and holidays.

Multi-dimensional analysis: As many as three different criteria can be used to define bin membership, allowing for 1-, 2- and 3-dimensional analyses.

The versatile binning options outlined above support the implementation of analytic techniques such as cluster analysis and correspondence analysis, which require bin membership criteria to be flexible and mutable. (Cluster analysis provides tools for grouping data items with increasing specificity; for example, biologists classify man as an animal, a vertebrate, an amniote, a mammal, and ultimately a primate. Correspondence analysis portrays higher-level similarities between data sets by generalizing their lower-level values based on correspondence; for example, five data points expressing the prevalence of smoking per level of employees could be normalized and generalized to show a similar contour in two of those levels.)
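The binning options described above can be illustrated with a short Python sketch. It is not OpenViz or AxisMap code: the function names (discrete_bin, range_bin, bin_records), the sample records and the bin edges are all hypothetical, chosen only to show discrete keys, non-uniform continuous ranges and a two-dimensional scheme side by side.

```python
from bisect import bisect_right
from collections import defaultdict

def discrete_bin(value):
    """Discrete binning: each unique value is its own bin key."""
    return value

def range_bin(value, edges):
    """Continuous binning over ascending, possibly non-uniform edges.

    Bin i covers edges[i] <= value < edges[i + 1].
    Values outside the covered range return None.
    """
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1

def bin_records(records, key_funcs):
    """Group records by a tuple of bin keys, one key per dimension."""
    bins = defaultdict(list)
    for rec in records:
        key = tuple(func(rec) for func in key_funcs)
        if None not in key:  # skip records that fall outside every range
            bins[key].append(rec)
    return dict(bins)

# Hypothetical sample data.
records = [
    {"region": "East", "age": 23, "sales": 120.0},
    {"region": "East", "age": 41, "sales": 80.0},
    {"region": "West", "age": 37, "sales": 200.0},
    {"region": "West", "age": 68, "sales": 50.0},
]

# Non-uniform ranges: [0, 30), [30, 45), [45, 100)
age_edges = [0, 30, 45, 100]

# Two-dimensional scheme: discrete region crossed with continuous age range.
bins_2d = bin_records(
    records,
    [
        lambda r: discrete_bin(r["region"]),
        lambda r: range_bin(r["age"], age_edges),
    ],
)
```

Because the edges are an ordinary list, changing age_edges and calling bin_records again is enough to re-bin the same records, which is the kind of mutability that the techniques above rely on.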

The aggregation functions (also known as amalgamation functions) that can be applied to the records in each bin include:


Sum: Computes the sum of the values in a specified column of the records in each bin. In addition to the summing of raw data, this operation can be applied to the results of analytical operations to implement stacked generalization analyses. (Stacked generalization is an extension of cross-validation, and is a scheme for minimizing the error rate in data generalization.)

Mean: Computes the arithmetic mean of the values in a specified column of the records in each bin. In addition to the averaging of raw data, this operation can be applied to the results of analytical operations to implement bagging or voting analyses. (Bagging uses sampled data as a tool for predicting data grouping.)

Minimum: Identifies the least of the values in a specified column of the records in each bin.

Maximum: Identifies the greatest of the values in a specified column of the records in each bin.

Median: Computes the median of the values in a specified column of the records in each bin.

Count: Returns the number of records in each bin.

First: Identifies the first value encountered in a specified column of the records in each bin.

Last: Identifies the last value encountered in a specified column of the records in each bin.

Standard Deviation: Computes the standard deviation of the values in a specified column of the records in each bin. (The standard deviation is a measure of the dispersion of values in a data set and is equal to the square root of the variance. The variance is the average of the squares of the amounts by which each value deviates from the mean.)

nth Percentile: For any n, computes the data value corresponding to the nth percentile of the values in a specified column of the records in each bin. (The nth percentile is a value below which lie n% of the data values. So, for example, a test score in the 95th percentile would be among the highest.)

n% Confidence Limit: For any n, computes c, the data value corresponding to the n% confidence limit of the values in a specified column of the records in each bin. (If c is the n% confidence limit for a set of data, then there is an n% chance that any given data value is less than c.)

Mean Plus n Standard Deviations: For any n, computes the sum of the arithmetic mean and n times the standard deviation of the values in a specified column of the records in each bin. These measurements are useful for determining how a data set is distributed around its mean value.

Median Plus n Standard Deviations: For any n, computes the sum of the median and n times the standard deviation of the values in a specified column of the records in each bin. Similarly to the previous operation, this measurement is useful for determining how a data set is distributed around its median value.
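As a rough illustration of the functions listed above, the following Python sketch computes most of them for the values in each bin. It is not the OpenViz implementation: the names aggregate, percentile, n_sd and pct are hypothetical, the percentile uses the nearest-rank convention (one of several in common use), and the standard deviation uses the population form to match the definition given above. The confidence limit is left out because the text does not specify the method used to compute it.

```python
import math
import statistics

def percentile(values, n):
    """Nearest-rank nth percentile for 0 < n <= 100 (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(n / 100 * len(ordered)))
    return ordered[rank - 1]

def aggregate(values, n_sd=1.0, pct=90):
    """Apply the listed aggregation functions to one bin's column values."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # square root of the average squared deviation
    return {
        "sum": sum(values),
        "mean": mean,
        "minimum": min(values),
        "maximum": max(values),
        "median": median,
        "count": len(values),
        "first": values[0],
        "last": values[-1],
        "std_dev": sd,
        "percentile": percentile(values, pct),
        "mean_plus_n_sd": mean + n_sd * sd,
        "median_plus_n_sd": median + n_sd * sd,
    }

# Hypothetical bins, each holding the values of one column in encounter order.
bins = {"Q1": [4.0, 8.0, 6.0], "Q2": [10.0, 2.0, 3.0, 5.0]}
summary = {name: aggregate(vals) for name, vals in bins.items()}
```

First and last depend on the order in which records arrive in the bin, so the input lists are kept in encounter order rather than sorted.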

For data analyses based on a relational data model, binning and aggregation functionality is implemented within the ColumnDataToBins and BinStatistics components, using binning schemes that are defined using the AxisMap components.

For data analyses based on the hierarchical model, the ColumnDataToTree and TableRollUp components implement the aggregation functionality, allowing you to summarize data values in child nodes by assigning appropriate aggregated measures to the parent. The TableRollUp component allows for a drill-down analysis of multi-dimensional hierarchical data sets. (Drill-down refers to the exploration of successively more-detailed data items; for example, of students who achieved perfect scores, how many took the recommended prerequisites? Of those who did, how many received a grade of B or higher? Of those who did… and so on.)
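The roll-up idea can be sketched in a few lines of Python. This is only an illustration of summarizing child values into parent nodes, not the ColumnDataToTree or TableRollUp API; the nested-dictionary tree, the roll_up and drill_down functions and the sample figures are all assumptions.

```python
def roll_up(node, aggregate=sum):
    """Assign each parent the aggregate of its children's values, bottom-up."""
    children = node.get("children", [])
    if children:
        node["value"] = aggregate([roll_up(child, aggregate) for child in children])
    return node["value"]

def drill_down(node, path):
    """Follow a list of child names from the root and return the node reached.

    Raises StopIteration if a name in the path does not exist.
    """
    for name in path:
        node = next(c for c in node["children"] if c["name"] == name)
    return node

# Hypothetical hierarchy: only the leaves carry raw values.
tree = {
    "name": "All",
    "children": [
        {"name": "Europe", "children": [
            {"name": "UK", "value": 10},
            {"name": "Germany", "value": 15},
        ]},
        {"name": "Americas", "children": [
            {"name": "US", "value": 30},
        ]},
    ],
}

total = roll_up(tree)                   # fills in every parent value
europe = drill_down(tree, ["Europe"])   # one level of drill-down
```

Sum, minimum and maximum compose cleanly from level to level in this scheme. A measure such as the mean does not, since a mean of child means is not in general the mean of all leaves, so a fuller version would carry counts alongside the values.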


In addition to aggregation, OpenViz provides components to implement other data reduction and data cleansing techniques. Thresholding allows the discarding of records in which the data value in a specified column falls above or below a certain threshold value. Additional components allow for cropping and down-sampling to reduce the size of a data set.
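The three data reduction techniques mentioned above are simple to express in Python. The sketch below is illustrative only: the function names threshold, crop and down_sample, their parameters and the sample records are hypothetical and do not correspond to OpenViz component names.

```python
def threshold(records, column, low=None, high=None):
    """Discard records whose column value is below low or above high."""
    return [
        r for r in records
        if (low is None or r[column] >= low)
        and (high is None or r[column] <= high)
    ]

def crop(records, start, stop):
    """Keep only the records in positions start up to, but excluding, stop."""
    return records[start:stop]

def down_sample(records, step):
    """Keep every step-th record, starting with the first."""
    return records[::step]

# Hypothetical data set of ten records.
records = [{"id": i, "value": i * 10} for i in range(10)]

kept = threshold(records, "value", low=20, high=70)  # drops values under 20 or over 70
window = crop(records, 2, 5)                         # records 2, 3 and 4
sparse = down_sample(records, 3)                     # records 0, 3, 6 and 9
```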

The ExtractOutliers component permits the identification of records for which the data value in a certain column lies outside some specified statistical interval (such as a confidence interval or a number of standard deviations from the mean). Identification of outlying data points is useful both for data cleansing and for focused analysis of extreme values.
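One of the intervals mentioned above, a number of standard deviations from the mean, can be sketched in Python as follows. This is not the ExtractOutliers component itself; the extract_outliers function, its n_sd parameter, the use of the population standard deviation and the sample readings are all assumptions made for illustration.

```python
import statistics

def extract_outliers(records, column, n_sd=2.0):
    """Split records into (outliers, inliers).

    A record is an outlier when its column value lies more than
    n_sd standard deviations away from the mean of that column.
    """
    values = [r[column] for r in records]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    outliers, inliers = [], []
    for r in records:
        target = outliers if abs(r[column] - mean) > n_sd * sd else inliers
        target.append(r)
    return outliers, inliers

# Hypothetical readings with one extreme value.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
records = [{"id": i, "reading": v} for i, v in enumerate(readings)]

outliers, inliers = extract_outliers(records, "reading", n_sd=2.0)
```

Returning both lists supports the two uses named above: the inliers feed a cleansed analysis, while the outliers can be examined on their own.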

The amount and quality of information that can be derived from data analysis is often determined by the selection of analytic parameters. OpenViz manipulators (visible tools that are inserted directly into a scene to allow a user to define analysis parameters by direct manipulation) provide a highly intuitive, visual mechanism for refining your analysis.

Using OpenViz manipulators, an end-user can slide, scroll, zoom and select visualization elements to fine-tune data analyses with instant, visual feedback. These interactive techniques permit a vast amount of information to be presented in a scene, and empower the user to select the quantity of information on display. The OpenViz application developer has a wide range of interactive techniques available.

The OpenViz DataMath component allows developers to use their own custom analytical functions to generate new data values for each record. This component allows the specification of algorithms in a C-like programming language to compute new data values based on existing data values within each record. These new data values can be used as binning criteria, thus permitting the implementation of customized bin-membership functions and discriminant function analyses.

(Discriminant function analysis is used to determine which characteristics correspond most consistently to, and thus are the best predictors of, how data items fall into different groups. An educational researcher investigating student test scores would collect other data about each student as well, such as hours of study, grades in prerequisite courses, and so on. The researcher would use discriminant function analysis to determine which of those other data items were the best predictors of students' test scores.)
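The idea of deriving a new value per record and then binning on it can be sketched in Python. This is an analogue only: DataMath uses its own C-like language, whose syntax is not shown here. The add_column and score_bin functions, the student records, the weights and the cutoff of 20 are all invented for illustration and are not the result of a fitted discriminant analysis.

```python
from collections import defaultdict

def add_column(records, name, func):
    """Return copies of the records with a new column computed per record."""
    return [{**r, name: func(r)} for r in records]

def score_bin(record, cutoff=20.0):
    """Custom bin-membership function based on the derived score."""
    return "likely_pass" if record["score"] >= cutoff else "at_risk"

# Hypothetical student data.
students = [
    {"id": 1, "study_hours": 10, "prereq_grade": 3.5},
    {"id": 2, "study_hours": 2, "prereq_grade": 2.0},
    {"id": 3, "study_hours": 6, "prereq_grade": 3.0},
]

# Made-up linear scoring function standing in for a discriminant function.
scored = add_column(
    students,
    "score",
    lambda r: 1.0 * r["study_hours"] + 5.0 * r["prereq_grade"],
)

# Use the derived value as the binning criterion.
bins = defaultdict(list)
for rec in scored:
    bins[score_bin(rec)].append(rec["id"])
```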

The OpenViz architecture implements an event model that allows for easy modification of the data analysis pipeline. This feature allows for real-time modification and substitution of analytical logic with a minimum of computational overhead and with no need to re-query external data sources. The ability to alternate between multiple analytical techniques is useful in both the model-building and deployment phases of a data-mining project.
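The behaviour described above, swapping analytical logic without re-querying the data source, can be sketched in Python under some loud assumptions. The Pipeline class, its on_change and set_analysis methods and the load function are hypothetical; the actual OpenViz event model is not documented in this paper and may differ substantially.

```python
class Pipeline:
    """Holds loaded data and re-runs only the analysis stage when it changes."""

    def __init__(self, load):
        self._data = load()   # the external source is queried once, here
        self._listeners = []

    def on_change(self, listener):
        """Register a callback that receives each new analysis result."""
        self._listeners.append(listener)

    def set_analysis(self, func):
        """Substitute the analytical logic and notify listeners of the result."""
        result = func(self._data)
        for listener in self._listeners:
            listener(result)
        return result

# Hypothetical data source that records how often it is queried.
load_calls = []

def load():
    load_calls.append("query")
    return [3, 1, 2]

seen = []
pipeline = Pipeline(load)
pipeline.on_change(seen.append)   # stands in for a view that redraws itself

pipeline.set_analysis(sum)        # first analytical technique
pipeline.set_analysis(max)        # swapped in without reloading the data
```

After both substitutions the source has still been queried only once, which is the property the paragraph above attributes to the event model.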

In addition to its decision-enhancing visualization capabilities, OpenViz provides sophisticated analytical capabilities that offer a complete data analysis and exploration solution for all types of analytical applications.

For additional information: www.openviz.com | info@avs.com
World headquarters (Boston): 781.890.4300
European inquiries: (London) 44.0.2376.804840 (Berlin) 49.0.30.6392.6063