\section{Discussion and Future Work}
\label{sec:discussion}
We have presented a method for selecting good partitioning variables for small multiple displays.
An advantage of combining cognostics with non-parametric statistical approaches is that the combination can easily be extended to solve a variety of visual analytic problems. For example, we have described our algorithm in terms of a permutation test, which ignores sampling error in the data set. This is appropriate in many common analytic scenarios where the data set contains the entire population. If, however, the user wants to account for possible sampling error when scoring small multiple displays, they could instead use bootstrapping~\cite{Efron1994} to build the simulated null distributions. The structure of the approach would be unchanged.
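To make this concrete, the null simulation can be parameterized by its resampler. The following is a minimal R sketch of one possible instantiation, not our released implementation: \texttt{cognostic()} stands in for any panel-scoring function, \texttt{df} is a data frame, and the column name \texttt{g} is a hypothetical candidate partitioning variable.
\begin{verbatim}
# Simulated null distribution of a cognostic for one candidate
# partitioning variable; swapping the resampler turns the
# permutation test into a bootstrap, leaving the rest unchanged.
null_dist <- function(d, v, resample, n = 500) {
  replicate(n, {
    b <- resample(d)
    b[[v]] <- sample(b[[v]])   # permute the partition labels
    sapply(split(b, b[[v]]), cognostic)
  })
}
as_is     <- function(d) d                                     # permutation test
bootstrap <- function(d) d[sample(nrow(d), replace = TRUE), ]  # adds sampling error

sims <- null_dist(df, "g", bootstrap)   # or: null_dist(df, "g", as_is)
\end{verbatim}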
Another natural extension of our method would be to handle continuous variables. Our method works in a straightforward manner on discrete partitioning variables. For continuous variables, discrete partitions can be created through disjoint binning techniques~\cite{Freedman1981,Scott2009} or through overlapping bins (shingles)~\cite{Becker1996}. In either case, our approach can be extended to handle binning by first permuting the continuous variable and then applying the binning algorithm to partition the data. We used the equal-count binning algorithm with non-overlapping shingles in the example described in Section~\ref{sec:method}. Future work could investigate using this process to find interesting bins for continuous variables given a particular partitioning variable: the parameters of the binning algorithm could be varied while the partitioning variable is held constant, allowing us to pick out the binning that maximizes the cognostic for that variable.
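As an illustration, permuting before binning requires only a small change to the simulation. The sketch below assumes equal-count, non-overlapping bins; \texttt{equal\_count\_bins()} is an illustrative stand-in, not a reference implementation of any particular binning algorithm, and the column name \texttt{v} is hypothetical.
\begin{verbatim}
# Equal-count bins: cutting the ranks into k equal-width pieces
# yields (approximately) k bins with the same number of points.
equal_count_bins <- function(v, k) {
  cut(rank(v, ties.method = "first"), breaks = k, labels = FALSE)
}

# Permute first, then bin, so the simulated partitions are built
# exactly the same way as the observed ones.
null_partition <- equal_count_bins(sample(df$v), k = 6)
\end{verbatim}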
While we frame our algorithm in terms of scoring single variables, it is trivial to combine two discrete variables into a new discrete variable by crossing or nesting the levels of each variable~\cite{Wilkinson2005GG,Stolte2002}. Doing so would allow our algorithm to consider combinations of variables.
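In R, for example, crossing is a one-liner with the base function \texttt{interaction()} (the column names here are hypothetical):
\begin{verbatim}
# Cross two discrete variables into one partitioning variable;
# each level of g12 pairs a level of g1 with a level of g2.
# drop = TRUE discards empty combinations, which is close to the
# nested case when g2 only varies within levels of g1.
df$g12 <- interaction(df$g1, df$g2, drop = TRUE, sep = " / ")
\end{verbatim}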
Another common use case is creating small multiples by drilling down into aggregated data. A variation of our approach could be used to detect whether potentially interesting visual information would be revealed by a change in level of detail. Visualization tools could use this to recommend a drill-down or roll-up dimension.
We could also extend our approach to consider sequences of partitionings. This could be used to develop a decision-tree-based exploratory data analysis interaction mechanism guided by our algorithm. At each decision level, we would apply our algorithm to select a partitioning variable given a single view of the data at that level. This would produce a small multiple display in which each component plot could be further partitioned to reveal interesting structure. Because of the tree structure, each choice of partitioning variable would be conditional on the previously used variables, as in model selection methods.
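A rough R sketch of this interaction, with \texttt{best\_partition()} as a placeholder for the scoring algorithm described in this paper:
\begin{verbatim}
# Decision-tree style exploration: pick the best partitioning
# variable for the current subset, then recurse into each
# resulting component plot, conditioning on earlier choices.
explore <- function(d, vars, depth = 0, max_depth = 3) {
  if (depth >= max_depth || length(vars) == 0) return(NULL)
  v <- best_partition(d, vars)
  list(variable = v,
       children = lapply(split(d, d[[v]]), explore,
                         vars = setdiff(vars, v),
                         depth = depth + 1, max_depth = max_depth))
}
\end{verbatim}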
One weakness of our approach is that we do not correct for possible correlation between the patterns in the input visualization and the partitioning variables. As a result, we may redundantly choose a small multiple display that shows a pattern already clearly visible in the original plot. While exposing highly correlated variables can be useful, it is likely not what the user wants in an effective small multiple display. Statistical methods for variable selection, such as ridge or lasso regression, can downweight highly correlated variables; our approach would be improved by incorporating similar behavior.
Our use of Chebyshev's inequality produces a very conservative bound on the likelihood of a cognostic. Better results might be achieved if more information is available about the underlying distribution of the cognostic. For example, Wilkinson and Wills have suggested that the distributions of their graph-theoretic scagnostics are well modeled by a beta distribution~\cite{Wilkinson2008}. Fitting a beta distribution would capture the skew and truncation visible in some of our empirical cognostic distributions. We also suggest using the maximum absolute z-score across all component plots to score the overall small multiple display. This allows us to pick out single partitions with strong patterns, but it may discount small multiples with weaker patterns across many or all of the component plots. Averaging the z-scores across component plots might help address this, but could miss strong individual plots. Our choice of the maximum has worked well in practice, but more exploration is needed.
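The trade-off between these two scores is easy to state in code. In this sketch, \texttt{c\_obs} holds the observed per-panel cognostics, and \texttt{mu} and \texttt{sigma} are the mean and standard deviation of the simulated null distribution:
\begin{verbatim}
z          <- (c_obs - mu) / sigma  # per-panel standardized scores
score_max  <- max(abs(z))    # favors one strongly patterned panel
score_mean <- mean(abs(z))   # rewards many moderately patterned panels
\end{verbatim}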
% Future work could focus on taking such aspects of our approach that seem ad-hoc, towards a formalism like that of powerful statistical variable selection techniques.%, such as lasso regression.
Both cognostics and non-parametric methods are computationally demanding. In our approach, we compute the scagnostics for each partition of each variable, and then again for the randomly permuted partitions of each variable. For a moderately sized dataset with thousands of rows, our R implementation takes about ten seconds on average to evaluate each partitioning variable. More work on computationally efficient cognostics is needed. Also, in our work with scagnostics, we have found that they sometimes miss very obvious visual patterns. More work is needed to develop cognostics that are robust to properties such as sample size, the amount of noise in the data set, and the location and scale of the axes.
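To see where this cost comes from, note that with $V$ candidate variables, $P$ permutations, and $K$ partitions per variable, the algorithm needs on the order of $V \times (1 + P) \times K$ cognostic evaluations. A sketch of the outer loop, reusing the hypothetical \texttt{null\_dist()} and \texttt{as\_is()} helpers from the earlier sketch:
\begin{verbatim}
scores <- sapply(vars, function(v) {
  obs  <- sapply(split(df, df[[v]]), cognostic)  # K evaluations
  sims <- null_dist(df, v, as_is, n = P)         # P * K evaluations
  max(abs((obs - mean(sims)) / sd(sims)))        # display score
})
\end{verbatim}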