Methods of data exploration have become the centerpiece of phylogenetic inference, although the scientific importance of those methods has yet to be identified. We examine in some detail the procedures and justifications of Wheeler's sensitivity analysis and relative rate comparison (saturation analysis). In addition, we review methods designed to explore evidential decisiveness, clade stability, transformation series additivity, methodological concordance, sensitivity to prior probabilities (Bayesian analysis), skewness, computer-intensive tests, long-branch attraction, model assumptions (likelihood ratio test), sensitivity to amount of data, polymorphism, clade concordance index, character compatibility, partitioned analysis, spectral analysis, relative apparent synapomorphy analysis, and congruence with a "known" phylogeny.
In our review, we consider a method to be scientific if it performs empirical tests, i.e., if it applies empirical data that could potentially refute the hypothesis of interest. Methods that do not perform tests, and therefore are not scientific, may nonetheless be heuristic in the scientific enterprise if they point to relatively weakly or ambiguously corroborated hypotheses, because such hypotheses are more easily refuted than those that have been more severely tested and are more strongly corroborated. In common usage, data exploration in phylogenetics is accomplished by any method that performs sensitivity or quality analysis. Sensitivity analysis evaluates the responsiveness of results to variation or errors in parameter values and assumptions.
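To make this definition of sensitivity analysis concrete, the following Python sketch perturbs one parameter and checks whether a conclusion survives the perturbation. It is an illustration under stated assumptions, not a reimplementation of any published procedure: it assumes a hypothetical five-taxon, four-character binary matrix, unordered Fitch parsimony, taxon E as outgroup, and, as the varied parameter, the weight given to one character. (Wheeler's sensitivity analysis, as usually applied, varies alignment cost ratios such as gap and transversion/transition costs instead.) All rooted trees are enumerated exhaustively, which is feasible only for a handful of taxa. The names `MATRIX`, `sensitivity`, and the other helpers are hypothetical.

```python
import math

# Hypothetical toy data: taxon -> one state digit per character (characters 0-3).
MATRIX = {
    "A": "1100",
    "B": "1101",
    "C": "1010",
    "D": "0010",
    "E": "0000",
}
OUTGROUP = "E"


def fitch_length(tree, states):
    """Fitch parsimony length of one unordered character on a rooted binary tree.

    `tree` is a leaf name (str) or a 2-tuple of subtrees; `states` maps leaf -> state.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        (lset, lcost), (rset, rcost) = visit(node[0]), visit(node[1])
        common = lset & rset
        if common:
            return common, lcost + rcost
        return lset | rset, lcost + rcost + 1  # empty intersection costs one change

    return visit(tree)[1]


def insertions(node, new):
    """Yield every tree obtained by attaching leaf `new` on some branch of `node`."""
    yield (node, new)
    if not isinstance(node, str):
        left, right = node
        for sub in insertions(left, new):
            yield (sub, right)
        for sub in insertions(right, new):
            yield (left, sub)


def rooted_trees(taxa):
    """Enumerate all rooted binary trees on `taxa` by stepwise addition."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for tree in rooted_trees(taxa[:-1]):
        yield from insertions(tree, taxa[-1])


def clades(tree):
    """Return the set of clades, as frozensets of leaf names, in `tree`."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = visit(node[0]) | visit(node[1])
        found.add(members)
        return members

    visit(tree)
    return found


def weighted_length(tree, weights):
    """Sum over characters of weight * Fitch length."""
    total = 0.0
    for i, w in enumerate(weights):
        states = {taxon: row[i] for taxon, row in MATRIX.items()}
        total += w * fitch_length(tree, states)
    return total


def sensitivity(clade, weight_grid):
    """Vary the weight of character 2; report (number of optimal trees, clade in all of them)."""
    ingroup = sorted(t for t in MATRIX if t != OUTGROUP)
    trees = [(t, OUTGROUP) for t in rooted_trees(ingroup)]
    report = {}
    for w2 in weight_grid:
        weights = [1.0, 1.0, w2, 1.0]
        lengths = [weighted_length(t, weights) for t in trees]
        best = min(lengths)
        optimal = [t for t, length in zip(trees, lengths) if math.isclose(length, best)]
        report[w2] = (len(optimal), all(clade in clades(t) for t in optimal))
    return report


if __name__ == "__main__":
    for w2, (n_opt, stable) in sensitivity(frozenset("ABC"), [0.5, 1.0, 2.0]).items():
        print(f"weight {w2}: {n_opt} optimal tree(s); clade ABC in all: {stable}")
```

With these toy data, the clade (A,B,C) is on the unique optimal tree when character 2 has weight 0.5, ties with the competing topology ((A,B),(C,D)) at weight 1, and is absent from the optimal tree at weight 2. Under the common usage described here, that clade would be reported as sensitive to character weighting.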
Sensitivity analysis is generally interpreted as providing a measure of support: conclusions that are insensitive (robust, stable) to perturbations are judged to be accurate, probable, or reliable. As an alternative to that verificationist concept, we define support objectively as the degree to which critical evidence refutes competing hypotheses. So defined, degree of support is secondary to the scientific optimality criterion of maximizing explanatory power.
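The definition of support above is stated in words only. One possible parsimony reading, which is our assumption rather than a prescription of the text, counts how many extra steps the best tree lacking a clade requires relative to the best tree containing it; this resembles Bremer support (the decay index). The sketch below assumes a hypothetical five-taxon matrix, exhaustive enumeration of rooted trees with E as outgroup, and unordered Fitch parsimony; all names are illustrative.

```python
# Hypothetical toy data: taxon -> one state digit per character (characters 0-4).
MATRIX = {
    "A": "11001",
    "B": "11011",
    "C": "10101",
    "D": "00100",
    "E": "00000",
}
OUTGROUP = "E"


def fitch_length(tree, states):
    """Fitch length of one unordered character; `tree` is a leaf name or a 2-tuple."""
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        (lset, lcost), (rset, rcost) = visit(node[0]), visit(node[1])
        common = lset & rset
        if common:
            return common, lcost + rcost
        return lset | rset, lcost + rcost + 1

    return visit(tree)[1]


def insertions(node, new):
    """Yield every tree obtained by attaching leaf `new` on some branch of `node`."""
    yield (node, new)
    if not isinstance(node, str):
        left, right = node
        for sub in insertions(left, new):
            yield (sub, right)
        for sub in insertions(right, new):
            yield (left, sub)


def rooted_trees(taxa):
    """Enumerate all rooted binary trees on `taxa` by stepwise addition."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for tree in rooted_trees(taxa[:-1]):
        yield from insertions(tree, taxa[-1])


def clades(tree):
    """Return the set of clades, as frozensets of leaf names, in `tree`."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = visit(node[0]) | visit(node[1])
        found.add(members)
        return members

    visit(tree)
    return found


def tree_length(tree):
    """Unweighted Fitch length summed over all characters."""
    nchar = len(MATRIX[OUTGROUP])
    return sum(
        fitch_length(tree, {taxon: row[i] for taxon, row in MATRIX.items()})
        for i in range(nchar)
    )


def length_difference_support(clade):
    """Return (best length with clade, best length without it, their difference).

    The clade must be present on some trees and absent from others, or min() fails.
    """
    ingroup = sorted(t for t in MATRIX if t != OUTGROUP)
    trees = [(t, OUTGROUP) for t in rooted_trees(ingroup)]
    with_clade = min(tree_length(t) for t in trees if clade in clades(t))
    without_clade = min(tree_length(t) for t in trees if clade not in clades(t))
    return with_clade, without_clade, without_clade - with_clade


if __name__ == "__main__":
    for name in ("ABC", "CD"):
        print(name, length_difference_support(frozenset(name)))
```

For this toy matrix, the clade (A,B,C) is on the shortest tree (6 steps) and every tree lacking it needs at least 7, giving a difference of 1; the clade (C,D) scores -1 because it is absent from the optimal tree.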
An explicit definition of data exploration has yet to be offered in systematics, which has led to the proliferation of a bewildering number and variety of methods that purport to explore phylogenetic data. The lack of an explicit definition also hinders attempts to delimit what is, and what is not, data exploration. Nevertheless, common usage indicates that data exploration is accomplished by any method that performs either sensitivity analysis, defined broadly as the investigation of "the responsiveness of conclusions to changes or errors in parameter values and assumptions", or quality analysis, which purports to distinguish good, reliable data from bad, unreliable data and thereby to assess the ability of data to indicate the true phylogeny. Common usage also implies that methods of discovery such as maximum likelihood, parsimony, and neighbor-joining are not in themselves methods of data exploration (although the application of multiple discovery operations is), nor are reports on the optimality criteria employed by those operations, such as the ensemble consistency (CI; Kluge and Farris, 1969) and retention (RI) indices.
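The ensemble consistency and retention indices are simple ratios of step counts. For unordered characters, let S be the total number of steps observed on a tree, M the minimum possible total (per character, the number of states present minus one), and G the maximum possible total (per character, the steps required on an unresolved bush, i.e., the number of taxa minus the frequency of the commonest state). Then CI = M/S and RI = (G - S)/(G - M). The Python sketch below computes both for a hypothetical five-taxon matrix and tree; the data and function names are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy data: taxon -> one state digit per character (characters 0-3).
MATRIX = {
    "A": "1100",
    "B": "1101",
    "C": "1010",
    "D": "0010",
    "E": "0000",
}
TREE = (((("A", "B"), "C"), "D"), "E")  # hypothetical tree to be evaluated


def fitch_length(tree, states):
    """Fitch length of one unordered character; `tree` is a leaf name or a 2-tuple."""
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        (lset, lcost), (rset, rcost) = visit(node[0]), visit(node[1])
        common = lset & rset
        if common:
            return common, lcost + rcost
        return lset | rset, lcost + rcost + 1

    return visit(tree)[1]


def ensemble_ci_ri(tree, matrix):
    """Return (CI, RI, S, M, G) for unordered characters on a rooted binary tree.

    S: observed steps; M: minimum possible steps (states present minus one);
    G: maximum possible steps (taxa minus count of the commonest state).
    """
    taxa = list(matrix)
    nchar = len(matrix[taxa[0]])
    S = M = G = 0
    for i in range(nchar):
        states = {taxon: matrix[taxon][i] for taxon in taxa}
        counts = Counter(states.values())
        S += fitch_length(tree, states)
        M += len(counts) - 1
        G += len(taxa) - max(counts.values())
    ci = M / S if S else math.nan  # undefined if no character varies
    ri = (G - S) / (G - M) if G != M else math.nan  # undefined if no character is informative
    return ci, ri, S, M, G


if __name__ == "__main__":
    ci, ri, S, M, G = ensemble_ci_ri(TREE, MATRIX)
    print(f"S={S} M={M} G={G} CI={ci:.3f} RI={ri:.3f}")
```

On this tree S = 5, M = 4, and G = 7, so CI = 0.8 and RI is about 0.667. Character 3 is an autapomorphy: it adds one step to both S and M, which raises CI (from 0.75 without it to 0.8), but it contributes nothing to G - S or G - M, so RI is unaffected by such characters.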
If the outgroup shows any apomorphic states in a character, that character should not be used in cladistic analysis (Hedges, 1996).
Yet, these arguments are relevant only in nomothetic sciences, where discovery operations must contend with the objective indeterminism of the ...