
Richard J. Howarth

Dictionary of Mathematical Geosciences With Historical Notes

Richard J. Howarth Department of Earth Sciences University College London London, United Kingdom

ISBN 978-3-319-57314-4    ISBN 978-3-319-57315-1 (eBook)    DOI 10.1007/978-3-319-57315-1

Library of Congress Control Number: 2017942721
© Springer International Publishing AG 2017

This Springer imprint is published by Springer Nature The registered company is Springer International Publishing AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Introduction

Is it possible for mathematical geology to accede to the true dignity of a really rational and scientific discipline, or shall it be nothing but a purely mechanical application of computers? (G. Matheron 1967).

Many geologists become earth scientists because they are interested in rocks, minerals, fossils, and the great outdoors, and geology is not generally thought of as a mathematical branch of science. However, in recent years the interpretation of observational data, whether structures in the field, geochemical determinations, mineralogical or fossil assemblage compositions, stratigraphic correlation, the properties of geophysical or other time series and their analysis, etc., has become increasingly reliant on the use of statistical and mathematical methods. Geophysicists will generally have had a strong background in, or at least a good aptitude for, mathematics, but this is not necessarily the case with their geological colleagues. To assist with this situation, this dictionary has been compiled so as to provide some guidance to the methods and terminology encountered in the literature. All the words which appear in bold in the explanatory text are themselves included as topics elsewhere. It is intended to be self-describing from the mathematical point of view and no prior knowledge is assumed. For this reason, some of the entries deal with entirely mathematical or statistical terms because they have been used in explanations elsewhere. It is intended as an aid for undergraduate and postgraduate earth science students, as well as professionals in the academic world and industry, who need a guide to terminology with pointers to the appropriate literature from which further information, and examples of use, can be found. It includes methods used in geology, geochemistry, palaeontology, and geophysics. The field of “geostatistics” in its original sense, i.e. as applied to spatial statistics and reserve estimation, is also included, but readers should be aware that the International Association for Mathematical Geosciences has published a more specialist glossary in this field (Olea et al. 1991). Since many aspects of early computing which underpinned this growth have now themselves passed into history, some terms from this field have also been included as they occur in the geological literature of the 1960s–1980s and may not be familiar to readers brought up in an era of laptops and tablets. I have

included notes on the origin of as many as possible of the terms included in this dictionary as I hope it will add to the interest for the reader. Conventions used in biographical dates, etc. in this work: (?–1892) should be read as “year of birth unknown”; (1904?–1942), “possibly born in 1904, died in 1942”; (?1520–?1559), “born about 1520, died about 1559”; (1907–), “still alive or year of death not recorded”; John [?Henry] Koonce, “second given name is possibly Henry”; ?A. Otsuka, “given name is unknown, but may begin with the letter A.” Bibliographic citations such as “Raphson (1668?)” mean that the date of the original work is uncertain; and “(d’Alembert 1747 [1750])” means that the date of publication was much later than submission to the journal. The history of mathematics websites maintained by Jeff Miller (New Port Richey, Florida), and the Ancestry.com website were invaluable in preparing this work, and I am immensely grateful to Annett Büttner and Dörthe Mennecke-Buehler at Springer DE and particularly to their proofreaders at SPi Global (Chennai) for their help with the manuscript. Frits Agterberg (Ottawa) and John McArthur (London) are thanked for their initial encouragement and suggestions. Readers are welcome to send the author suggested text (including references where applicable) for additional entries, or improvements to existing ones, and their submissions will be credited should an updated edition of this work be produced. London, UK

Richard J. Howarth

Contents

A . . . . . 1
B . . . . . 35
C . . . . . 69
D . . . . . 133
E . . . . . 177
F . . . . . 201
G . . . . . 229
H . . . . . 251
I . . . . . 275
J . . . . . 299
K . . . . . 305
L . . . . . 315
M . . . . . 355
N . . . . . 397
O . . . . . 419
P . . . . . 435
Q . . . . . 493
R . . . . . 503
S . . . . . 541
T . . . . . 611
U . . . . . 637
V . . . . . 643
W . . . . . 653
X . . . . . 667
Y . . . . . 669
Z . . . . . 673
Bibliography . . . . . 677

Acronyms, Abbreviations and Symbols

Acronyms

A/D   Analog to digital conversion
AFM diagram   Alkalis-total FeO-MgO diagram (see AFM diagram)
AGC   Automatic gain control
AI   Artificial intelligence
AIC   Akaike information criterion
aln   Additive logistic normal distribution
alsn   Additive logistic skew-normal distribution
AM   Amplitude modulation
ANN   Artificial neural network
ANOVA   Analysis of variance
AR process   Autoregressive process
ARMA process   Autoregressive moving average process
ART   Algebraic reconstruction technique
ASA   Adaptive simulated annealing
BLUE   Best linear unbiased estimator
BME   Bayesian maximum entropy
BPT   Back projection tomography
CA plot   Concentration-area plot
CAD   Computer-aided design or drafting
CAI   Computer-aided or computer-assisted instruction
CART   Classification and regression trees
CASC   Correlation and scaling
CEP   Circular error probability
CICA   Constrained independent component analysis

Unless otherwise indicated, for explanation refer to their full titles.

CIPW norm   Cross, Iddings, Pirsson, Washington norm (see CIPW norm)
CONOP   Constrained optimisation
CP plot   Cumulative probability plot
D/A   Digital-to-analog
DBMS   Database management system
DEM   Digital elevation model
DFT   Discrete Fourier transform
DPSS   Discrete prolate spheroidal sequence
DSM   Digital surface model
DTM   Digital terrain model
EDA   Exploratory data analysis
EDF   Empirical discriminant function
EM algorithm   Expectation-maximization algorithm
F   Favorability function
FA   Factor analysis
FAP   FORTRAN assembly program
FCM   Fuzzy c-means clustering
FFT   Fast Fourier transform
f-k analysis   Frequency-wavenumber analysis
FM   Frequency modulation
FUNOP   Full normal plot
FWT   Fast Walsh transform, fast wavelet transform
GA   Genetic algorithm
GIS   Geographic information system
H   Entropy, Hurst exponent
ICA   Independent component analysis
IIR filter   Infinite impulse response filter
IQR   Interquartile range
IRLS   Iteratively reweighted least squares (see under IWLS)
IWLS   Iterative weighted least squares
KEE   Knowledge engineering environment
LAD   Least absolute deviation (see under LAV)
LAV   Least absolute value
LMS   Least mean squares, least median squares
LOWESS   Locally weighted scatterplot smoother
MA   Moving average
MANOVA   Multivariate analysis of variance
MAP   Macro assembly program
MCMC   Markov chain Monte Carlo
MDS   Multidimensional scaling
MED   Minimum entropy deconvolution

MEM   Maximum entropy method
MESA   Maximum entropy spectral analysis
MFA diagram   MgO-total FeO-alkalis diagram (see MFA diagram)
ML   Maximum likelihood
MLA   Machine learning algorithm
MLP   Mean lagged product
MTBF   Mean time between failure
MV   Multivariate
NLM algorithm   Nonlinear mapping algorithm
ODE   Ordinary differential equation
OLS   Ordinary least squares
OOP   Object-oriented programming
PCA   Principal components analysis
PDE   Partial differential equation
PDF   Probability density function
P-P or PP plot   Percent-percent plot
PSD   Power spectral density
PSO   Particle swarm optimization
QA   Quality assurance
QC   Quality control
Q-Q plot   Quantile-quantile plot
R   Multiple correlation coefficient
R/S analysis   Rescaled-range analysis
RBV   Relative biostratigraphic value
RDBMS   Relational database management system
REE diagram   Rare earth element diagram
RMA regression   Reduced major axis regression
RMS   Root mean square
RTL algorithm   Region-time-length algorithm
SA   Simulated annealing
S-A method   Spectrum-area method
SIR algorithm   Sampling-importance-resampling algorithm
SIRT   Simultaneous iterative reconstruction technique
SNR   Signal-to-noise ratio
SQC   Statistical quality control
SSA   Singular spectrum analysis
SVD   Singular value decomposition
TAS   Total alkalis-silica diagram (see TAS diagram)
UANOVA   Unbalanced analysis of variance
URV   Unit regional value
URPV   Unit regional production value, see URV

URW   Unit regional weight, see URV
UV   Univariate
VFSR   Very fast simulated re-annealing, see SA
VR   Virtual reality

Abbreviations [Notation]

asin   arcsine
arccos, acos   arccosine (see under arcsine)
arctan, atan   arctangent (see under arcsine)
cos   cosine
cosh   hyperbolic cosine (see hyperbolic functions)
div   divergence operator
e   Euler’s number
erf   error function
exp   exponential function
Γ   gamma function
grad   gradient operator
H   entropy, Hermite polynomial, Hilbert transform, Hurst exponent
I   identity matrix
i   imaginary unit
iff   if and only if
iid   independent and identically distributed
J   Jacobian
Λ   eigenvector
λ   eigenvalue
ln   Napierian logarithm
lods   log odds, logarithmic odds ratio
log   Briggsian or common logarithm
logit   logistic transform
r   Pearson’s product-moment correlation coefficient
r2, R2   coefficient of determination
rad   radian
sin   sine
sinc   cardinal sine
sinh   hyperbolic sine (see hyperbolic functions)
T   transpose, transposition
tan   tangent
tanh   hyperbolic tangent (see hyperbolic functions)

Mathematical Symbols [Notation]

... so that wi = wi−1 + μiei yi (Lau et al. 2004), where e is Euler’s number, the constant 2.71828. This same approach has been used in time series analysis to update estimates of the power spectrum of a time-varying process, such as seismic waveforms, to improve prediction (Griffiths 1975; Griffiths and Prieto-Diaz 1977; Ulrych and Ooe 1979).

Adaptive processing Data processing in which the parameters of the algorithm are varied with arrival time as measurements of the data statistics change (Sheriff 1984). See also: Baggeroer (1974), Lacoss (1971), Leverette (1977); adaptive least mean squares, maximum likelihood, maximum entropy methods.

Adaptive Simulated Annealing (ASA) See simulated annealing.

Addition-subtraction diagram 1. Pairs of side-by-side divided bar charts were used by the American geologist and petroleum engineer Joseph Bertram Umpleby (1883–1967) (Umpleby 1917), Butler et al. (1920) and Burbank and Henderson (1932) to compare the major element oxide gains or losses in a rock affected by metamorphism, e.g. the transition from limestone to wollastonite, diopside, etc.; from quartz monzonite to sericitised quartz monzonite, etc. Oxide percentages of each major constituent were multiplied by the specific gravity of the rock to express the changes relative to 100 cm3 of the unaltered rock. The proportional length of each bar was then equivalent to the same volume of unaltered rock. It is also known as a gain-loss diagram. 2. Use of a Harker variation diagram to back-calculate the composition of material added to, or subtracted from, a magma (Cox et al. 1979).

Additive logistic normal distribution (aln) The additive logistic transform applied to a multivariate normal distribution produces an additive logistic normal (aln) distribution over the simplex. The aln model is closed under perturbations, power transformations, subcompositions and permutations of its parts. It can also be obtained by applying the closure operation to a positive random vector with a multivariate lognormal distribution. The process of estimating the parameters follows the standard multivariate


procedure for the normal distribution but in terms of logratio-transformed data. To validate the aln model, it is sufficient to apply a goodness-of-fit test for compliance with a normal distribution to the logratio-transformed sample. In practice, there are a considerable number of compositional data sets whose distribution can be reasonably modelled by an additive logistic normal model. The aln distribution was introduced by the Scottish-born statistician, John Aitchison (1926–) and his doctoral student at the University of Hong Kong, Shen Shir-ming (Shen 1983) in Aitchison and Shen (1980); see also: Aitchison (1986, 2003) and Buccianti et al. (2006).

Additive logistic skew-normal distribution (alsn) The additive logistic transform applied to a multivariate skew-normal distribution produces an additive logistic skew-normal (alsn) distribution over the simplex. It is a generalization of the additive logistic normal distribution and it appears to be suitable for modelling compositional data sets when the logratio-transformed data has a moderate skewness. Like the additive logistic normal model, the alsn model is closed under perturbations, power transformations, subcompositions and permutations of its parts. Also, the alsn model can be obtained by applying the closure operation to a positive random vector with a logskew-normal distribution. The process of estimating the parameters follows the multivariate procedure for the skew-normal distribution but in terms of logratio-transformed data. To validate the alsn model it is sufficient to apply a goodness-of-fit test for skew-normality to the logratio-transformed sample. Introduced by the Spanish statistician, Glória Mateau-Figueras (1973–); see: Mateau-Figueras et al. (1998), Mateau-Figueras (2003), Buccianti et al. (2006).

Additive logistic transform If X is a k-part composition (e.g. a set of major-element oxide determinations on a physical sample) and one of the oxides xk is chosen as the basis, then the forward-transformation is to obtain the logratios:

yi = ln(xi/xk), i = 1 to k − 1.

The back-transformation is obtained by bi = exp(yi), where i = 1 to k − 1, and setting S = 1 + (b1 + b2 + ... + bk−1); then xi = bi/S, i = 1 to k − 1; and, finally, xk = 1 − (x1 + x2 + ... + xk−1). An unbiased mean for a set of compositional data can generally be obtained by forward-transforming from the original percentaged data to logratios on the basis of a chosen (k-th)


component (e.g. in a petrological context, SiO2); calculating the means of each of the k − 1 transformed variables; back-transforming them, and finally multiplying the results by 100 to convert the multivariate mean back to percentaged values. Introduced by the Scottish statistician, John Aitchison (1926–). See Aitchison (1981, 1982, 1986, 2003) and Buccianti et al. (2006).

Additive logratio transform The Scottish statistician, John Aitchison (1926–) has analysed many of the difficulties caused by the constant-sum nature of a percentaged data set, the closure problem (Aitchison 1981, 1982, 1986, 2003) which had previously been recognised by Chayes (1960, 1971), Krumbein (1962) and Vistelius (1980, 1992). Aitchison showed that if this is not taken into account, bias will occur in both estimates of the mean composition and in the application of multivariate statistical analysis methods (e.g. principal components analysis). Aitchison found that provided no zero percentages are present these types of problem can be overcome by re-expressing the data set in terms of the natural logarithms of the ratio of each of the k proportions (p1, ..., pk) in a sample to one variable selected as the basis (pB) and ratioing the rest to that, i.e. ln(p1/pB), ..., ln(pk−1/pB), the logratio transform. Statistics such as the mean composition are computed on the basis of the transformed data and then back-transformed to recover the actual percentage composition. See Zhou et al. (1991), Eynatten et al. (2003), and the papers in Buccianti et al. (2006) for discussion and recent applications; and Martín-Fernández et al. (2003) and Martín-Fernández and Thio-Henestrosa (2006) for work on the “zeros” problem.

Additivity Consider two independent variables x1 and x2 and a response variable y = β0 + β1x1 + β2x2 + β3x1x2, where β0, β1, β2, β3 are constants. y depends linearly on x1 for fixed values of x2, and linearly on x2 for fixed values of x1. However, the effect of changes in x1 and x2 will be additive if, and only if, there is no interaction between x1 and x2, i.e. β3 = 0. Only in this case will a given change in x1 produce the same magnitude of change in y regardless of the value of x2; and a given change in x2 produce the same change in y regardless of the value of x1. When changes in y resulting from changes in x1 and x2 are additive, there is no interaction between x1 and x2 with respect to their effect on the response y. So, in order for linearity and additivity to apply, in terms of three or more independent variables the function must be of the form: y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ... Transformation (Stanley 2006a, b) may be required to achieve additivity. For example, sampling and analytical errors are additive as variances, but not as standard deviations. The term additivity was introduced by the American statistician, Churchill Eisenhart (1913–1994). See: Eisenhart (1947), Miller and Kahn (1962), Vistelius (1980, 1992), Stanley (2006a).
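By way of illustration of the Additive logistic transform and Additive logratio transform entries above, the following minimal sketch (in Python with NumPy; the three-part compositions shown are invented for the example) forward-transforms percentaged data to logratios using the last component as the basis, averages in logratio space, and back-transforms the result to a closed percentaged mean.

```python
import numpy as np

def alr_forward(x):
    """Additive logratio transform: y_i = ln(x_i / x_k), i = 1 ... k-1,
    with the last (k-th) component of the composition x as the basis."""
    x = np.asarray(x, dtype=float)
    return np.log(x[:-1] / x[-1])

def alr_back(y):
    """Back-transformation: b_i = exp(y_i), S = 1 + sum(b), x_i = b_i / S,
    x_k = 1 - sum(x_1 ... x_{k-1}); the result is returned as percentages."""
    b = np.exp(np.asarray(y, dtype=float))
    frac = b / (1.0 + b.sum())
    return 100.0 * np.append(frac, 1.0 - frac.sum())

# Three hypothetical three-part compositions (percentages summing to 100):
comps = np.array([[55.0, 30.0, 15.0],
                  [60.0, 25.0, 15.0],
                  [50.0, 35.0, 15.0]])

# Compositional mean: average in logratio space, then back-transform.
logratios = np.array([alr_forward(c / 100.0) for c in comps])
print(alr_back(logratios.mean(axis=0)))   # a constant-sum mean, in percent
```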


Address A label which identifies a specific position within a computer memory. The term appears in this context in Bloch et al. (1948).

Adjoint, adjugate The inverse matrix multiplied by the determinant. The term was introduced by the American mathematician, Leonard Eugene Dickson (1874–1954) and the Swiss-American mathematician Saul Epsteen (1878–) (Dickson 1902; Epsteen 1902); it was also known as the adjugate in earlier work (Dodgson 1866).

Affine A linear function which has a constant derivative. First defined by the Swiss mathematician, Leonhard Euler (1707–1783) (Euler 1748; Blanton 1988, 1990).

Affine correction The affine correction of variance was introduced into geostatistics by Journel and Huijbregts (1978). It reduces the variance of the values of a frequency distribution Z (e.g. that of observed concentrations at a set of sampling points), σ²z, to σ²y for the corresponding transformed distribution Y (e.g. block averages), while retaining the same mean, m, using the transform

Y = (σy/σz)(Z − m) + m.

The shape of the original distribution is retained. Affine transformation Any transformation which preserves collinearity: when applied to positional data, such transformations can alter distances and angles, but straight lines remain straight, parallel lines remain parallel, and the ratio in which a point divides a line remains the same. Such changes could involve: translation, rotation and change of scale. They may be helpful when digitizing and plotting line data (e.g. where sets of positional data have been obtained from various sources and need to be brought to a common frame of reference before they can be plotted on the same map). First introduced by the Swiss mathematician, Leonhard Euler (1707–1783) (Euler 1748; Blanton 1988, 1990). See Loudon et al. (1980) and Doytsher and Hall (1997) for discussion. AFM diagram 1. A ternary diagram frequently used in igneous petrology to distinguish between tholeiitic and calcalkaline differentiation trends in subalkaline magma series; based on oxide weight percentage data, with apices: “A:” alkalis (Na2O + K2O), lower left; “F:” total FeO, i.e. (FeO + 0.8998Fe2O3), top; and “M:” MgO, lower right. Introduced by the American metamorphic petrologist and mineralogist, James Burleigh Thompson Jr. (1921–2011) (Thompson 1957) his method was extended by R. Thompson (1982). It is occasionally referred to as an MFA diagram (Kuno 1968).


2. A ternary diagram used in metamorphic petrology to show changing mineral compositions on the basis of Al2O3, FeO and MgO.

Akaike Information Criterion (AIC) A criterion for statistical model fitting: AIC = (−2)ln(maximum likelihood) + 2(number of independently adjusted parameters), where ln is the natural logarithm. It was introduced by the Japanese statistician, Hirotugu Akaike (1927–2009) (Akaike 1973) and has subsequently been used in seismology, volcanology and geostatistics, e.g. Takanami and Kitagawa (1988), Webster and McBratney (1989), Webster and Oliver (2001), Ammann and Naveau (2003).

Alanysis A term introduced into digital signal processing by the American statistician, John Wilder Tukey (1915–2000), as a name for the procedure for summarising, looking at, or dissecting a frequency series in terms of quefrency (Bogert et al. 1963; Oppenheim and Schafer 2004). See: cepstrum analysis.

Algebraic Reconstruction Technique (ART) The Japanese mathematician, Kunio Tanabe (1941–) implemented a projection method to solve a system of linear equations Ax = b (Tanabe 1971), following the work of the Polish mathematician, Stefan Kaczmarz (1895–1939) (Kaczmarz 1937). It is sometimes known as the Kaczmarz method. Each equation in the system can be thought of as the projection of the solution vector onto the hyperplane corresponding to that equation (Carr et al. 1985). The method was rediscovered by Gordon et al. (1970) in the field of biological image reconstruction from projections, who used it to reconstruct three-dimensional objects from a series of two-dimensional electron photomicrographs taken at a number of angles in a fan-like pattern (Bender et al. 1970; Herman et al. 1973; Schmidlin 1973) and named it the Algebraic Reconstruction Technique (ART). It has subsequently been applied to seismic tomography (McMechan 1983; Neumann-Denzau and Behrens 1984) and to cokriging (Carr et al. 1985; Freund 1986), although it proved to be slow (Carr and Myers 1990). However, in seismic work the method was found to be both ill-conditioned and slow, and it was subsequently replaced by the Simultaneous iterative reconstruction technique. See also Back projection tomography.

ALGOL Acronym for “Algorithmic Oriented Language,” a computer programming language originally developed by a group of European and American computer scientists at a meeting in Zurich (Perlis and Samelson 1958). It was subsequently refined and popularised as ALGOL-60 (Naur 1960), assisted by the work of the computer scientists Edsger Wybe Dijkstra (1930–2002) and Jaap A. Zonneveld (1924–) in the Netherlands; and (Sir) Charles Anthony Richard Hoare (1934–) then working with the computer manufacturer, Elliott Brothers, in England, and the Swiss computer scientist, Niklaus Emil Wirth (1934–) (Wirth and Hoare 1966). Later variants used in geological studies included BALGOL (Burroughs Algol), developed by the Burroughs Corporation in the USA. Early examples of its use in the earth sciences include Harbaugh (1963, 1964) and


Sackin et al. (1965) but it was soon replaced by the programming language FORTRAN. See also: computer program.

Algorithm A formal procedure (a set of well-defined logical instructions) for solving a numerical or logical problem. Given an initial state, it will terminate at a defined end-state. It may be embodied as computer code, written in a formal programming language, or as pseudocode (a notation which resembles a programming language, but which is not intended for actual compilation). The steps involved in an algorithm may also be shown graphically in the form of a flowchart which illustrates the individual steps and the input/output or logical links between them. The earliest use of the term, derived via Latin from Arabic, just meant “arithmetic.” Its use, in the sense of problem-solving, goes back to Gottfried Wilhelm von Leibniz (1646–1716) in the 1670s (Leibniz 1684). The term was first used in its modern sense by the Russian mathematician, Andrei Andreevich Markov [Jr.] (1903–1979) (Markov 1954, 1961) and in geology by the Russian mathematical geologist, Andreĭ Borisovich Vistelius (1915–1995) (Vistelius and Yanovskaya 1963). Early examples of such formalisation in the earth sciences were the development of calculation sheets and flowcharts to aid geochemical calculations, such as the CIPW norm for igneous rocks (so-called after the authors of the paper in which its use was proposed: Cross et al. 1903), later flowcharted and implemented as a computer program for an IBM 650 computer at the University of Oklahoma by Kenneth Johnson (1962); and the calculation of mineral compositions from sedimentary rocks (Imbrie and Poldervaart 1959).

Alias, aliasing, alias filter This is an inherent property of all time series sampling systems (e.g. in seismic processing) or measurement at discrete stations, such as along a traverse line in gravity or geochemical surveying. When a time series is sampled at regular intervals, any frequency present in the waveform which is greater than the Nyquist frequency (ω) by an amount δω will be indistinguishable from a lower frequency (ω − δω); e.g. suppose the sampling rate is every 5 ms (i.e. 200 times per second), the Nyquist frequency = 100 Hz (cycles per second), hence a waveform with a frequency of (100 + 50) = 150 Hz when sampled every 5 ms will be indistinguishable from a (100 − 50) = 50 Hz waveform; and a (100 + 100) = 200 Hz waveform will appear to be a series of constant-amplitude values. In general this phenomenon will occur when f1 = 2kfN ± f2, where k is an integer and fN is the Nyquist frequency (Blackman and Tukey 1958). Data which are suspected to be aliased should not be used for time series analysis. In geophysics an antialiasing filter, also called an alias filter, is used before sampling to remove undesired frequencies above the Nyquist frequency; see Smith (1997) for examples. The theory of aliasing was developed by the American statistician, John Wilder Tukey (1915–2000) and mathematician Richard Wesley Hamming (1915–1998) (Tukey and Hamming 1949). For discussion in a signal processing context, see Blackman and Tukey (1958) and Smith (1997); in an earth science context, see: Camina and Janacek (1984), Buttkus (1991, 2000), Weedon (2003), Gubbins (2004) and Costain and Çoruh (2004). See also: principal alias.
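The numerical example in the entry above is easily reproduced. The short sketch below (Python with NumPy; purely illustrative) samples 150 Hz, 50 Hz and 200 Hz cosine waves every 5 ms (Nyquist frequency 100 Hz) and confirms the aliasing behaviour described.

```python
import numpy as np

dt = 0.005                       # 5 ms sampling interval, i.e. 200 samples/s
t = np.arange(0, 0.2, dt)        # 0.2 s of samples

x150 = np.cos(2 * np.pi * 150 * t)   # frequency above the 100 Hz Nyquist frequency
x50  = np.cos(2 * np.pi * 50 * t)    # its alias at (2 * 100 - 150) = 50 Hz
print(np.allclose(x150, x50))        # True: the sampled sequences are indistinguishable

x200 = np.cos(2 * np.pi * 200 * t)   # twice the Nyquist frequency
print(np.allclose(x200, x200[0]))    # True: appears as a series of constant values
```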


Alignment chart, alignment diagram A graphical calculator: a diagram representing the relations between three or more variables by means of linear or curved scales, so arranged that the value of one variable can be read off by means of drawing a straight line intersecting the other scales at the appropriate values. Alignment charts have been used in structural geology to aid calculation of bed thickness; depth to a stratigraphic horizon; spacing interval for structure contours, etc. The methods for construction of such charts were first developed by the French geometer, Maurice D’Ocagne (1862–1938) (D’Ocagne 1896) and are also explained by Peddle (1910). Examples of use in early structural geology are found in Cloos (1947), Nevin (1949) and Billings (1954). Alignment chart seems to be the most usual terminology (Google Research 2012).

All-colour noise All-colour [American English sp., all-color] noise, also known as Brownian motion, random noise or white noise, is a one-dimensional random walk which begins with a value (e.g. 0, at time 0), and each successive value is obtained by adding a random value from a normal distribution to the previous value (Camina and Janacek 1984; Buttkus 1991, 2000).

All poles method, all poles model Alternative terms (Weedon 2003) for Burg’s (1967, 1968, 1975) Maximum Entropy method of power spectrum estimation. The procedure computes the autoregressive power spectrum coefficients directly from the data by estimating the partial autocorrelations at successive orders. Since the computed coefficients are the harmonic mean between the forward and backward partial autocorrelation estimates, this procedure is also known as the Harmonic algorithm. It will exhibit some bias in estimating the central frequencies of sine components, and higher order fits are notorious for splitting, a phenomenon which causes multiple spectral peaks to be generated when, in reality, only a single feature is present. The technique was introduced by the American geophysicist, John Parker Burg (1931–) (Burg 1967). See also Buttkus (1991, 2000), Camina and Janacek (1984).

Allometry, allometric growth This was originally the study of the relationship between a measurement characterising the size of a human or animal body as a whole (e.g. its weight or overall length) and that of any of its parts (e.g. a limb), or latterly, the relative growth of any two parts. In geology it has been particularly applied in palaeobiometrics. The German psychiatrist, Otto Snell (1859–1939) first drew attention to the importance of relating brain size to body size (Snell 1892). The English evolutionary biologist, Julian Huxley (1887–1975), followed this (Huxley 1932) with the suggestion that the relative change in growth of two skeletal parts, x and y, could be expressed by the general equation y = bx^k, where b and k are constants. If the parts grow at the same rate, k equals 1, and it is known as isogonic growth; if k is not equal to 1, it is known as heterogonic growth. Early palaeontological studies include Hersh (1934) and Olsen and Miller (1951). As it was realised that the assumption of dependent and independent variables was not really applicable in the case of morphological dimensions (Kermack and Haldane 1950), the


line of organic correlation, later known as the reduced major axis, was used to fit regression models to such morphological measurement data. Alphanumeric, alphameric Terms which arose in computer programming in the mid-1950s to mean a character set containing letters, numerals, and other characters. They first came into use in Beard et al. (1956), Bracken and Oldfield (1956) and Dunwell (1957). Alphameric never gained the frequency of usage achieved by alphanumeric and began to fall out of use by the 1970s (Google Research 2012). Amalgamation An operation performed on compositional data, which consists in summing two or more components to form a single new composite variable. The term was introduced by the Scottish statistician, John Aitchison (1926–) (Aitchison 1986, 2003; Buccianti et al. 2006). Ambient The term means surrounding, or background and was used as far back as the seventeenth Century in the context of “ambient air” and “ambient light.” Ambient noise is the level of pervasive noise (e.g. in a signal) associated with a particular environment. The term occurs in Morrical (1939) and in a geophysical context in Haggerty and Olson (1948). Ambiguity function A cross-correlation function with a stretch or shrink factor to enable better waveform matching. Introduced by American electrical engineers, Ronald Lee Gassner (1938–) and George Robert Cooper (1921–) (Gassner and Cooper 1967); and into geology by the American geophysicist, Norman Samson Neidell (1939–) (Neidell 1969). Amplitude 1. The maximum deviation of a periodic, or quasi-periodic, waveform from its mean value in a single cycle. The term was used by the Swiss mathematician, Leonhard Euler (1707–1783) (Euler 1727) and was later popularised through its use in physics (Thomson and Tait 1867) and in geophysics, in discussion of seismic waves from earthquakes, by the pioneer English seismologist, John Milne (1850–1913) (Milne 1882) and by the English mathematician and seismologist, Charles Davison (1858–1940) (Davison 1893). 2. In the case of the discrete Fourier transform of a time series X(t) consisting of n equispaced values, for each possible wavelength, say (n/4), the amplitude can be regarded as being given by: {the sum of the individual products of the X(t) values multiplied by the equivalent values of a cosine or sine wave} multiplied by (2/n). The modern theory was developed by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990), see Blackman and Tukey (1958). For recent discussion in an earth science context, see: Camina and Janacek (1984), Weedon (2003) and Gubbins (2004); see also: frequency.
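A worked illustration of sense 2 of the Amplitude entry above: the sketch below (Python with NumPy; an illustration only) computes the amplitude associated with one wavelength of a short equispaced series directly from the sums of products with cosine and sine waves, each multiplied by 2/n, and checks the result against a standard discrete Fourier transform.

```python
import numpy as np

n = 64
t = np.arange(n)
# A synthetic series with a known component of wavelength n/4 (harmonic index 4):
x = 3.0 * np.cos(2 * np.pi * 4 * t / n + 0.6) \
    + np.random.default_rng(0).normal(0.0, 0.1, n)

m = 4                                              # harmonic with wavelength n/4
a = (2.0 / n) * np.sum(x * np.cos(2 * np.pi * m * t / n))
b = (2.0 / n) * np.sum(x * np.sin(2 * np.pi * m * t / n))
amplitude = np.hypot(a, b)                         # close to the true amplitude, 3.0

# The same quantity obtained from the discrete Fourier transform:
print(amplitude, 2.0 * np.abs(np.fft.fft(x)[m]) / n)
```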


Amplitude Modulation (AM) A constant-amplitude sinusoidal “carrier” waveform with a relatively long wavelength is modulated in such a way that its amplitude becomes proportional to that of another waveform whose information content is to be transmitted. The resulting waveform will have a constant pattern of varying amplitude over a fixed interval (the beat wavelength). The technique was fundamental to the early transmission of radio signals carrying speech and music. The principle was originally conceived by Canadian-born chemist, physicist, and wireless telegrapher, Reginald Aubrey Fessenden (1866–1931) (Fessenden 1902), who also coined the word “heterodyne” from the Greek, heteros (other) and dynamis (force). The superheterodyne receiver evolved through the work of wireless telegrapher Lucien Lévy (1892–1965) in France, and the American electrical engineer, Edwin Howard Armstrong (1890–1954). Armstrong (1917) patented it, and by 1921 the term had come into frequent use (Armstrong 1921, 1924). In an earth science context Weedon (2003) distinguishes between: heterodyne amplitude modulation and imposed amplitude modulation: (i) Heterodyne amplitude modulation is the addition of two sinusoids with similar wavelengths to create a new waveform which has a frequency equal to the average of those of the two waveforms added. The amplitude of the resultant waveform (the beat) varies in a fixed pattern over the beat wavelength and has a frequency which equals the difference in the frequencies of the two added waveforms. (ii) Imposed amplitude modulation is the modification of a high frequency sinusoid by one of longer period (e.g. by multiplication of the two signals) to produce a combined signal in which amplitude varies in a fixed pattern; maximum amplitude corresponds to the frequency of the imposed, longer wavelength, signal.

Amplitude spectrum A representation of power spectral density analysis in which amplitude (rather than the usual squared amplitude), or the logarithm of this value, is plotted as a function of frequency. It is also known as a magnitude spectrum. A waveform a(t) and its frequency spectrum A(f), the variation of amplitude and phase as a function of frequency, where t is time and f is frequency (cycles/unit time), are Fourier transform pairs. A(f) is usually a complex valued function of frequency, extending over all positive and negative frequencies. It may be written in polar form as

A(f) = Σ_{t=0}^{∞} a_t e^{−2πift} ≡ |A(f)| e^{iφ(f)},

where i is the imaginary unit √(−1) and e is Euler’s number, the constant 2.71828. The magnitude |A(f)| is called the amplitude spectrum, and the angle φ(f) is called the phase spectrum. The theory was originally developed by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990) in Blackman and Tukey (1958). Early examples of its application in Earth science were by Ben-Menahem and Toksöz (1962) and Anderson and Koopmans (1963); it is also mentioned in Robinson (1967b), Buttkus (1991, 2000) and Weedon (2003). See also: phase-lag spectrum.
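In discrete form the amplitude and phase spectra defined above can be obtained directly from a fast Fourier transform. The following sketch (Python with NumPy; the signal is invented for illustration) computes |A(f)| and φ(f) for a short noisy sinusoid.

```python
import numpy as np

dt = 0.01                                    # sampling interval, s
t = np.arange(0, 2.56, dt)                   # 256 samples
a = np.sin(2 * np.pi * 12.5 * t) \
    + 0.2 * np.random.default_rng(1).normal(size=t.size)

A = np.fft.rfft(a)                           # complex spectrum A(f) for f >= 0
f = np.fft.rfftfreq(t.size, dt)              # frequencies, Hz

amplitude_spectrum = np.abs(A)               # |A(f)|
phase_spectrum = np.angle(A)                 # phi(f), in radians

print(f[np.argmax(amplitude_spectrum)])      # peak at 12.5 Hz
```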


Anaglyph, anaglyptic image An image with apparent depth perception (sometimes referred to in early literature as a stereogram) produced when the left and right eyes simultaneously view two images of an object taken from corresponding viewpoints. Aided by viewers (stereoscopes) developed by the English physicist, (Sir) Charles Wheatstone (1802–1875) in 1833 (Wheatstone 1838) and the Scottish physicist, (Sir) David Brewster (1781–1868) in 1849 (Brewster 1856), stereophotography became very popular towards the end of the nineteenth Century. Anaglyptic images, in which the pair of left and right views are superimposed using projected or printed images in red and green respectively, and then viewed through a pair of red and green filters, were discovered by the French physicists Joseph-Charles d’Almeida (1822–1880) (d’Almeida 1858) and Arthur-Louis Ducos du Hauron (1837–1920); the latter patented the technique in 1892. It was also Ducos du Hauron (1899) who coined the term anaglyph. Examples of its use in earth science include Galton (1865), Blee (1940), and Gay (1971). Use of stereoscopic aerial photographs in geology began in the 1940s and Rea (1941) introduced the term photogeology. Bartels (1931) used both stereograms and anaglyphs to illustrate geophysical data.

Analog, analogue 1. An object that has a resemblance to, or correspondence with, another, e.g. as used by Humboldt and Bonpland (1825) and Thorp (1840), especially in the sense of similarity of function (Page 1859). 2. A chemical compound with a molecular structure very similar to that of another (Liebig 1838). 3. A continuous physical variable which bears a direct physical relationship to another variable in such a way that it is proportional to it (Shaw 1890; Raman and Krishnan 1928). 4. In a computing context, a continuous-valued variable, as opposed to one which has discrete values (Housner and McCann 1949; Rajchman 1957). The American English spelling analog (Sheriff 1984) has also become the more frequently used, rather than analogue, in British English since the 1960s (Google Research 2012).

Analog computer Mechanical aids to computing had been in use since Victorian times, for example, the work of the English engineer, James Thomson (1822–1892) (Thomson 1876). However, the earliest electronic computers were analog [N.B. American English spelling], in which physical phenomena were modelled using electrical voltages and currents as the analogue quantities. Although these were subsequently used in geophysical applications (Housner and McCann 1949), following the work of the American electronic engineer and mathematician, Claude Shannon (1916–2001), who showed (Shannon 1938, 1993) that the operations of Boolean algebra could be accomplished using electronic relays and switches. About 1935, the American mathematician and physicist, John Vincent Atanasoff (1903–1995) developed with his graduate students, Lynn Hannum and Glenn Murphy, the “Laplaciometer,” an analogue calculator for solving Laplace’s equation with


various boundary conditions (Murphy and Atanasoff 1949). Atanasoff is reputed to have coined the term analog computer to contrast this type of device with the early (digital) computers which followed; both terms began to be used in the late 1940s (Merzbach and Atanasoff 1969) and this reached a maximum in the late 1960s (Google Research 2012). Analog[ue] to Digital conversion (A/D) This process is also called digitizing. The conversion of the amplitude values of a continuous time-varying waveform (usually an electrical or optical signal of some kind) to digital form as discrete numerical values at equally-spaced time intervals throughout its length. The process consists of: (i) choice of a suitable time-spacing; and (ii) extraction and quantisation of the values of the amplitude of the signal at each time point and its representation as a numerical reading, where each number is limited to a certain number of digits. Choice of the coarseness or fineness of the digitization interval will depend on such factors as: the requirements of the problem, quality of the analogue signal, the equipment available to perform the conversion, datastorage requirements, and the costs involved. Note that unsuitable choice of sampling interval will result in aliasing, the theory of which was developed by the American statistician, John Wilder Tukey (1915–2000) and mathematician Richard Wesley Hamming (1915–1998) in Tukey and Hamming (1949); see also: Blackman and Tukey (1958). Mentioned in an earth science context by Robinson (1967b), Buttkus (1991, 2000), Broding and Poole (1960), Aspinall and Latchman (1983), Thibault and Klink (1997), Xu and Xu (2014). Both terms have come into wide usage since the 1950s (Google Research 2012). Analog[ue] model A method of studying the effects of a subsurface body or structure by comparison with the response of one or more physical models, or the behaviour of instrumentation, using electrical (Braun and Wheatley 1960; Roy and Naidu 1970) or optical (Roy 1959) methods. See also: fluid flow model. The term first came into prominent usage in the 1950s, but this has declined since the 1970s (Google Research 2012). Analysis of Variance (ANOVA) This is a method of decomposing the total variation displayed by a set of observations (as measured by the sums of squares of differences from the mean) into components associated with defined sources of variation. It enables the relative magnitude of the contribution of each source to the overall observed total variation to be determined. Originally called “analysis of variation,” it was developed by the English statistician, (Sir) Ronald Alymer Fisher (1890–1962) in the early 1920s (Fisher and Mackenzie 1923) and was widely taken up following publication of his book Statistical Methods for Research Workers (1925a). It was first introduced into geology by the American statistician, Churchill Eisenhart (1913–1994) (Eisenhart 1935). Classical applications include the investigation of whether a statistically significant difference exists between samples of nominally similar composition (Eisenhart 1935) and the magnitude of variation introduced by different human operators in determination of a physical sample composition by modal analysis, determination of grain size, shape, orientation, density,


porosity, etc. (Griffiths and Rosenfeld 1954). See Miller (1949), Miller and Kahn (1962) and Krumbein and Graybill (1965) for examples of early geological applications. More recently, it has been widely used in the estimation of relative magnitudes of sources of sampling and analytical variance in regional- or local-scale environmental geochemical surveys (Miesch 1967a, 1976a; Garrett and Goss 1979; Thompson and Ramsey 1995; Ramsey et al. 1995). Garrett and Goss (1980a, b) provided a computer program for analysis of variance of unbalanced nested sampling designs (UANOVA); see also Goss and Garrett (1978).

Analytic continuation A method of extending the set of values over which a mathematical complex function is defined; the mathematical projection of a potential field from one datum surface to another level surface, lying either above or below the original datum. Consider two domains (regions in the complex plane), D1 and D2. Then the intersection of D1 and D2 is the set of all points common to both D1 and D2; the union of D1 and D2 is the set of all points which are either in D1 or D2. If the intersection of D1 and D2 is not empty and is connected to both domains then, if there is a function f1 that is analytic over D1, and another function f2 that is analytic over D2, with f1 = f2 in the region of the intersection of D1 and D2, then f2 is said to be an analytic continuation of f1 into the domain D2. See Weaver (1942), Peters (1949) and Buttkus (1991, 2000) in the context of digital filtering in applied geophysics; see also: downward continuation, upward continuation.

Analytic function 1. If w = f[z] = u[x, y] + iv[x, y] is a function of the complex variable z = x + iy, where i is the imaginary unit √(−1), its derivative is

f′[z] = dw/dz = lim_{Δz→0} {f[z + Δz] − f[z]}/Δz,

under the conditions that: f[z] must be defined at z; f[z] is not equal to ∞; and the limit does not depend on the direction in which Δz → 0. Then if f[z] is differentiable at z0 = (x0, y0) and throughout a region about z0, it is said to be analytic. If f1[z] and f2[z] are analytic functions in domains (regions in the complex plane), D1 and D2, then it may be shown that in a region D which is the intersection of D1 and D2: (i) a linear combination of f1[z] and f2[z] is analytic; (ii) the product f1[z] f2[z] is analytic; and (iii) f1[z]/f2[z] is analytic except at the points where f2[z] = 0. 2. A function which is locally given by a convergent power series.

Analytical geology A term introduced by the Russian mathematical geologist, Andreĭ Borisovich Vistelius (1915–1995) (Vistelius 1944; Tomkeieff 1947) for what in the 1960s


became known as mathematical geology (Vistelius 1962, 1967, 1980, 1992; Google Research 2012).

Andronov-Hopf bifurcation A local bifurcation in a dynamical system at which a fixed point loses its stability to perturbations that take the form of growing oscillations, shedding its limit cycle. It occurs in certain chemical reaction systems and predator-prey models. Named for the Russian control engineer, Aleksandr Aleksandrovich Andronov (1901–1952) who first discovered it (Andronov 1929), and the Austrian-born German mathematician, Eberhard Frederich Ferdinand Hopf (1902–1983) who independently discovered it (Hopf 1942, 1948, 1976). The term “Hopf bifurcation” was subsequently introduced by the Belgian physicist and mathematician, David Ruelle (1935–) and Dutch mathematician, Floris Takens (1940–2010) (Ruelle and Takens 1971). For discussion in an earth science context, see Turcotte (1997).

Angelier-Mechler diagram Given a population of compatible measurements of the characteristics of geological faults, determining the proportion of points in compression or extension in each direction enables the three orthogonal principal stress axes (σ1, σ2 and σ3) to be located. The method involves placing a plane perpendicular to the plane of movement in a fault, thereby dividing the fault into a set of four dihedra or quadrants. Two will be in compression (+) and two in extension (−). σ1 and σ3 will lie somewhere between the dihedra; if the directions of σ1 and σ3 can be determined, then the remaining stress axis, σ2, can be calculated from them, as it must be perpendicular to them both, or normal to the plane they define. σ1 will lie somewhere in the area of compression and σ3 will lie somewhere in the area of extension. As faults are rarely isolated, other faults in the fault system can also be plotted on a Lambert equal area projection. As increasing numbers are plotted, a direction for σ1 and σ2 representing the entire fault system may be determined. The two compressional right dihedrons and two extensional right dihedrons shown on the graph may be coloured white and black respectively, leading to its also being called a beach ball plot. Introduced by the French structural geologist, Jacques Angelier (1947–2010) and geophysicist, Pierre Mechler (1937–) (Angelier and Mechler 1977). Their method was improved on by the British structural geologist, Richard J. Lisle (1987, 1988, 1992).

Angular frequency A function of time f(t) has a constant period (T) if, for very many (or infinitely many) successive integer values k, f(t) = f(t + kT). The angular frequency ω = 2π/T; or ω = 2πν, where ν is the frequency measured in Hz. ω is measured in radians per second. Mentioned in an earth science context in: Knott (1908), Jeffreys (1924, Appendix E), Camina and Janacek (1984), Buttkus (1991, 2000), Weedon (2003), Chapman and Gubbins (2004); see also: periodic, quasi-periodic.

Angular shear strain A measure of the angular deflection (ψ) between two initially perpendicular lines which have been subjected to strain; shear strain, γ = tan(ψ). The


concept was introduced by the English mathematician, Augustus Edward Hough Love (1863–1940) (Love 1906) and was discussed in a geological context by the Hungarian-born American mechanical engineer, Árpád Ludwig Nádai (1883–1963) (Nádai 1927, 1931), but was popularised in the earth sciences through the work of the English geologist, John Graham Ramsay (1931–) (Ramsay 1967; Ramsay and Huber 1983).

Angular transform A transform, y = arcsin(x + k), where k ≥ 0 is a constant. Introduced by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) to stabilize the variance of a binomial variable x, the number of successes (Fisher 1922b). In modern usage, it is frequently taken to be y = arcsin(√p), where p is a proportion. See also arcsine; Bartlett (1947); for geological discussion, see Krumbein and Tukey (1956) and Weedon (2003).

Anomalous, anomaly A value (or a set of values) of a variable which differs markedly from the “usual” situation (see threshold). This may be expressed, for example, as a spatial cluster of unusually high values (positive anomaly) or low values (negative anomaly); or values which differ by an unexpected amount from those theoretically predicted from a fitted or conceptual model. Such a value (or values) may be spoken of as being anomalous. Sheriff (1984) states that it has been used (in geophysics) to mean the observed value minus a theoretical value (see residual). Usage of the term to mean something which deviates from the usual pattern goes back to at least the seventeenth Century in astronomical observations related to geophysical studies (Gellibrand 1635) and later marine magnetic observations (Scoresby 1819). Early earth science uses of the term were largely in exploration geophysics: Haasemann (1905) published a gravity anomaly map of the Harz region, Germany; see also Ambronn (1926, 1928), Elkins (1940), Heiland (1940), Agocs (1951); and in exploration geochemistry (Hawkes 1957).

ANSI-C A general-purpose computer programming language, still widely used in application and operating system development, originally developed by the American computer pioneer, Dennis MacAlistair Ritchie (1941–2011), at the AT & T Bell Laboratories Computing Sciences Research Centre, Murray Hill, NJ, USA. Together with Canadian, Brian Kernighan (1942–), he wrote the definitive book on the language (Kernighan and Ritchie 1978) and a second edition in 1988 when the American National Standards Institute (ANSI) standard version was brought out. With Kenneth Lane Thompson (1943–), Ritchie was also one of the developers of the Unix computer operating system. This was originally developed in assembly language in 1969, but by 1973 it had been recoded in C, which greatly aided its portability. One of the first C programs to be published in the earth sciences was a geostatistical simulation program (Gómez-Hernández and Srivastava 1990). Later examples include Brown (1995), Kutty and Gosh (1992) and Guzzetti et al. (2002). See also C++.
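A brief numerical sketch of the variance-stabilising behaviour described in the Angular transform entry above (Python with NumPy; the proportions and counts are invented for illustration): the raw variance of a binomial proportion depends strongly on p, whereas the variance of arcsin(√p) is roughly 1/(4n) whatever the value of p.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50                                   # number of trials per sample

for p in (0.1, 0.5, 0.9):                # true proportions
    counts = rng.binomial(n, p, size=20000)
    props = counts / n
    y = np.arcsin(np.sqrt(props))        # angular (arcsine) transform, y = arcsin(sqrt(p))
    # raw variance varies with p; the transformed variance is close to 1/(4n)
    print(p, props.var().round(5), y.var().round(5), 1 / (4 * n))
```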


Ant colony optimization A new approach to system optimization has been pioneered through the work of the Italian robotics engineer, Marco Dorigo (1961–) and his co-workers, inspired by the behaviour of biological systems, such as the foraging behaviour of an ant colony which results in finding the shortest route between its nest and a good source of food (Dorigo 1992; Dorigo et al. 1996; Dorigo and Gambardella 1997; Dorigo and Stützle 2004; Blum 2005). It enables the search to concentrate in parts of the search region containing high quality solutions and to learn which attributes contribute to their being good solutions, which assists the search process. Geoscience applications include geophysical inversion problems (Yuan et al. 2009), petroleum reservoir history matching (Hajizadeh 2011; Hajizadeh et al. 2011); tracking faults within 3-D seismic data (Yan et al. 2013); and open-pit mine production planning (Gilani and Sattarvand 2016).

Antialiasing filter In geophysics an antialiasing filter (see Smith (1997) for examples), also called an alias filter, is used before sampling to remove undesired frequencies above the Nyquist frequency. See also Pavlis (2011); alias.

Anti-causal filter A filter whose output depends only on future inputs (Gubbins 2004). See causal filter.

Anticipation function A function which collapses a wave train into an impulse at the front end of the train (Robinson 1966b, 1967b). Introduced in a geophysical context by the German-born Argentinian-American geophysicist, Sven O. Treitel (1929–) and American mathematician and geophysicist, Enders Anthony Robinson (1930–) (Treitel and Robinson 1964).

Antisymmetric function A function F which changes sign when its argument changes sign, thus F(−x) = −F(x). It is also known as the odd function. It occurs in geophysics in Press et al. (1950); see also: Camina and Janacek (1984) and Gubbins (2004). It is called the even (or symmetric) function when F(−x) = F(x).

Antisymmetric matrix A matrix (A), for which the transpose A^T = −A, e.g. the 3 × 3 matrix:

A = [   0     a12    a13
      −a12     0     a23
      −a13   −a23     0  ].

It is also known as a skew-symmetric matrix. See Camina and Janacek (1984). Aperiodic An irregular, non-periodic waveform (e.g. random noise). Briefly discussed in the context of the amplitude decay of a signal, as experienced in the critical damping of air- or oil-damped seismographs, by Heiland (1940). See also: period.
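A minimal check of the antisymmetric (skew-symmetric) property described in the Antisymmetric matrix entry above, using Python with NumPy (the numerical values are arbitrary):

```python
import numpy as np

a12, a13, a23 = 1.0, -2.0, 3.0
A = np.array([[  0.0,   a12,  a13],
              [ -a12,   0.0,  a23],
              [ -a13,  -a23,  0.0]])

print(np.array_equal(A.T, -A))   # True: the transpose equals the negative of A
```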


APL The acronym stands for A Programming Language (Iverson 1962), which uses symbols rather than linguistic words, based on a mathematical notation developed by the Canadian computer scientist, Kenneth Eugene Iverson (1920–2004), when he was at Harvard University in 1957. Its subsequent take-up was facilitated by the development of a keyboard carrying the symbol set, and it first became available in 1966 as APL\360 for the IBM 360 computer (Falkoff and Iverson 1968). Today, a variety of interpreters and compilers for the language are available. Early earth science applications of APL include plotting ternary diagrams (McHone 1977), the calculation of volumes and fugacity coefficients for pure H2O and CO2 and activities in H2O–CO2 mixtures throughout most of the crustal and upper mantle pressure-temperature conditions (Jacobs and Kerrick 1981), and in the generation of synthetic seismic sections (Mukhopadhyay 1985). The Scottish structural geologist and historian of geology, Donald Bertram McIntyre (1923–2009) was a keen advocate for APL (McIntyre 1993; Smillie 2011). See also: computer program.

Apodization function, apodizing function Sheriff (1984) uses these terms. It is also known as a tapering function, a term which was introduced by the American statistician, John Wilder Tukey (1915–2000) (Tukey 1950) and refers to the operation of multiplying a time series record by a smooth function which is near zero at the beginning and end of the record and near unity in between. Discussed in a seismological context by Tukey (1959a). Apodization function has been the most widely used spelling since the mid-1970s (Google Research 2012). See also: Gubbins (2004); Bartlett window, Blackman-Harris window, boxcar taper, cosine taper, Daniell window, data window, Gaussian taper, Hamming window, Hann window, multi-tapering method, optimal taper, Parzen window, Thomson tapering.

Arcsine transform (arcsin, asin) A transform, y = sin−1(x + k) (which may also be written as arcsin or asin), where k ≥ 0 is a constant. Introduced by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) (Fisher 1922b). In modern usage, it is usually taken to be the transformation y = arcsin(√p), where p is a proportion (formerly known as the angular transform). These types of transform were initially introduced to stabilize the variance of a binomial variable x, the number of successes. See also: Bartlett (1947); for geological discussion, see: Krumbein and Tukey (1956) and Weedon (2003). See also cosine transform, sine transform. Arccosine (arccos, acos) and arctangent (arctan, atan) are similarly defined.

Area dilation The ratio (ΔA) of the area of a strain ellipse to the area of the initial circle from which it was derived; ΔA = (1 + e1)(1 + e2) − 1, where (1 + e1) and (1 + e2) are the major and minor semi-axes of the strain ellipse respectively, and e1 and e2 are the principal finite extensions (also called principal finite strains). The term dilation was used by the British mathematician, Peter Guthrie Tait
(1831–1901) (Tait 1867). With regard to the equivalent ellipsoid, the increment of volume per unit volume was referred to in early literature (Love 1906) as dilatation. It was popularised by the work of the English geologist, John Graham Ramsay (1931–) in his book Folding and fracturing of rocks (Ramsay 1967).

Area-of-influence polygons An alternative term for a Dirichlet tessellation: the class of random polygons that describe growth about random centres, or the contraction-cracking of a surface. They are space-filling, convex polygons constructed around a set of points or centres, such that each polygon contains all of the points that are closer to its centre than to the centres of other polygons. The tessellation was first discovered by the German mathematician, Johann Peter Gustav Lejeune Dirichlet (1805–1859) (Dirichlet 1850), but was rediscovered by the Russian mathematician, Georgy Fedoseevich Voronoï (1868–1908), who studied the n-dimensional case (Voronoï 1909); the American meteorologist, Alfred Henry Thiessen (1872–1956), who applied them to finding the spatial average (Thiessen mean) of rainfall (Thiessen 1911), and others. Hence their alternative names, Voronoï polygons and Thiessen polygons. The concept has been advocated for use in the mining industry since the 1920s (Harding 1920, 1923). Note that Evans and Jones (1987) comment “the vast majority of naturally occurring polygons will not be approximated well by [such] polygons” as evidenced by the concave polygons formed by mud cracks, crystal interfaces, etc. See also: Beard (1959), Gilbert (1962), Lachenbruch (1962), Crain (1976), and Boots and Jones (1983).

Areal standard deviation The American sedimentologist, Robert Louis Folk (1925–), suggested (Folk 1973) the use of point-density contouring of bivariate scatterplots of data (e.g. skewness versus kurtosis values characterising the frequency distributions of a series of sediment samples), using a circular mask with a 50% overlap in each direction, then interpolating the density isoline which just excludes the outermost 32% of the points for a given population.

Areal value estimation The estimation of the tonnage and monetary value of a commodity in a given geographical area, in terms of Unit Regional Weight (URW, metric tonnes per km²) and Unit Regional Value (URV, deflated US $ value per km²) as bases for interregional or national-scale comparisons. This approach was developed by the Welsh-born American geologist and statistician, John Cedric Griffiths (1912–1992) (Griffiths 1967a, b, 1978a; Missan et al. 1978; Labovitz and Griffiths 1982). A modified criterion, the Unit Regional Production Value (URPV), the value of cumulative historical minerals production plus economic reserves (again, valued in deflated US $) per km², based on 33 major traded mineral commodities, was introduced by the American mineral economist, James Peter Dorian (Dorian 1983; Dorian and Johnson 1984; Dorian and Clark 1986).

Argand diagram A graphical method of depicting complex numbers: A complex number has both real and imaginary parts, e.g. z = x + iy, where the constant i represents
the imaginary unit √(−1), a concept developed by the Swiss mathematician Leonhard Euler (1707–1783) in 1777 but only published in 1794 (Euler 1768–94). It may also be written in the form z = Me^(iθ), where M is the magnitude (modulus) |z| of the complex number, i.e. M = √(x² + y²), e is Euler’s number, the constant 2.71828, and θ = arctan(y/x). The value of a complex number is visualised as a point plotted in a Cartesian coordinate system centred at (0,0), in which the horizontal axis (x) represents the value of the real component, designated Re(z) or ℜ(z), and the vertical axis that of the non-constant part (y) of the imaginary component, Im(z) or ℑ(z). Although the idea of this graphical portrayal is generally credited to the French mathematician, Jean-Robert Argand (1768–1822) (Anonymous 1806; Argand 1874), it was originally proposed by the Norwegian-Danish surveyor and mathematician, Caspar Wessel (1745–1818) (Wessel 1799). It is mentioned in a geophysical context in Camina and Janacek (1984), Buttkus (1991, 2000) and Gubbins (2004). In recent years (e.g. Osgood 1901; Zitelli 1948) it has also been referred to as the complex z-plane, or called a zero-pole diagram or occasionally a pole-zero diagram. The z-transform of the discrete impulse response function of a filter is the filter transfer function (FTF). This has the structure H(z) = P(z)/Q(z), where P(z) and Q(z) are both polynomial functions in z, then “zeros” are the values of z at which P(z) = 0, and the overall gain of the FTF is zero; “poles” are the values of z at which Q(z) = 0 and the overall gain of the FTF is infinite. In the Argand diagram the poles of stable causal filters will lie outside the unit circle |z| = 1; and all poles and all zeros of minimum-phase filters will lie outside the unit circle (Buttkus 1991, 2000). See also modulus.

Argument 1. The independent variable of a function or the particular value at which a function is evaluated (Cayley 1879); e.g. if y = 3 − 2x + 5x², the argument x = 3 yields the functional value y = 42. 2. The phase or amplitude of a complex number (Briot and Bouquet 1875).

Arithmetic mean A measure of the location of the centre of the frequency distribution of a set of observed values (x) of size n. The arithmetic mean of the values (m) is the sum of the individual values divided by their number:

m = (Σ_{i=1}^{n} x_i)/n.

Although use of such a concept goes back to the time of the Greek philosopher, Pythagoras (c. 530 BC), the first use of the term arithmetical mean was by the British polymath, Edmund Halley (1656–1742) (Halley 1695) and by the end of the eighteenth Century, it was in regular use for astronomical observations (Simpson 1755; Gauss 1809a). Its usage was popularised in geophysics by the work of the German mathematician, astronomer and geomagnetist, Carl Friedrich Gauss (1777–1855) and his colleague, the physicist Wilhelm
Eduard Weber (1804–1891) (Gauss and Weber 1837). It was used as a descriptive parameter in sedimentology by Krumbein and Pettijohn (1938). See Helsel (2005) for methods of computation when a data set contains values falling below a defined detection limit, as is often the case in geochemistry. See also: vector mean for orientation data and phi mean diameter.

Array A systematic arrangement of objects in rows and columns. For example, each row of a data array may correspond to a single rock specimen and each column to the amount of a given component, e.g. absolute counts from a petrological modal point-count; parts per million from a trace-element analysis; or the raw counts re-expressed as a percentage; the directly measured percentage of major-oxides; etc. In the earth sciences, the term was first applied by the American petrographer, Felix Chayes (1916–1993) (Chayes 1960). See: closed array, data array, parent array, matrix.

Array processor Also known as a vector processor. A special-purpose computer processor utilised as a peripheral device by a host computer to carry out special functions (such as matrix manipulations) very fast. Its instruction set enables the performance of mathematical operations on multiple data elements simultaneously. The IBM 2938 array processor was installed in 1967, developed by the American IBM systems engineers, John [?Henry] Koonce (?1916–?1992) and Byron Lee Gariepy (1937–2007), who had previously worked as a geophysicist with Marathon Oil Co., at Western Geophysical’s first computer centre at Shreveport, Louisiana (Sisko 2006; Ruggiero and Coryell 1969). Among other advantages, it was the first hardware to implement the Fast Fourier transform. The array processor subsequently formed the basis of the supercomputers of the 1970s and 1980s: e.g. the Texas Instruments Advanced Scientific Computer (1973); University of Illinois ILLIAC-IV (1975); Control Data Corporation CDC STAR-100 (1974); Cray Research CRAY-1 (1976); Control Data Corporation Cyber 205 (1981); Hitachi S-810 (1982); Cray X-MP (1982), etc., but became largely outmoded with improvements in performance and design of general-purpose central processor units. The technology now underpins video game consoles and specialised computer graphics hardware, and is incorporated in all modern central processor designs as SIMD (Single Instruction, Multiple Data) elements. Early geophysical applications include: Kobayashi (1970), Wang and Treitel (1973), Cassano and Rocca (1973) and Krzeczkowski et al. (1982).

Arrhenius plot The Arrhenius equation, k = A·e^(−Ea/RT), where k is a rate constant; A is a constant of proportionality; Ea is the activation energy; R is the universal gas constant (8.314472 J K⁻¹ mol⁻¹); and T is absolute temperature (degrees Kelvin), describes the rate of a chemical reaction as a function of temperature; and e is Euler’s number, the constant 2.71828. First described by Dutch chemist, Jacobus Henricus van’t Hoff (1852–1911) (van’t Hoff 1884), the physical interpretation was provided by the Swedish physicist and
chemist, Svante August Arrhenius (1859–1927) (Arrhenius 1889, 1967). The relationship may also be written in the form: ln(k) = ln(A) − (Ea/R)(1/T) and the Arrhenius plot is a graph of ln(k) as a function of 1/T. The observed data points will ideally fall on a straight line whose y-intercept is ln(A) and slope is −Ea/R, which represents the fraction of the molecules present in a gas which have energies equal to, or exceeding, the activation energy at a particular temperature. This graph has been widely used in conjunction with 40Ar/39Ar dating, e.g. a graph of 39Ar diffusivity as a function of reciprocal temperature (Lovera et al. 1989); Berger and York (1981) show Arrhenius plots for hornblende, biotite and potassium feldspar from the Haliburton Highlands intrusives, Ontario, Canada. Lovera (1992) gives a computer program for calculating such plots.

Arrow plot A graphic representation used to aid interpretation of early dipmeter log results (Matthews et al. 1965). The y-axis of the graph corresponds to down-hole depth and the x-axis to angle (0°–90°). At each depth at which the dip of the bedding has been measured an arrow is plotted at the coordinates corresponding to depth and angle, the stem of the arrow being drawn from the centre outwards in the direction of dip, using the convention that North-South is parallel to the y-axis with North at the top. Individually oriented short lines of equal length with a numeric dip value were used as a symbol by the Schlumberger Well Surveying Corp. in the 1940s; arrow plots came into use in the late 1950s and computer plotting first began in about 1961. See also: stick plot, tadpole plot, vector plot.

Artificial Intelligence (AI) A term meaning “the science and engineering of making intelligent machines” introduced by four American scientists: mathematician, John McCarthy (1927–2011), Dartmouth College; mathematician and neuroscientist, Marvin Minsky (1927–), Harvard University; computer scientist, Nathaniel Rochester (1919–2001), I.B.M. Corporation; and mathematician, Claude Shannon (1916–2001), Bell Telephone Laboratories in a proposal to hold a research conference on Artificial Intelligence, at Dartmouth College, Hanover, New Hampshire, in 1956: “For the present purpose the artificial intelligence problem is taken to be that of making a machine behave in ways that would be called intelligent if a human were so behaving” (McCarthy et al. 1955). McCarthy also developed the programming language LISP to aid this work. Early studies to apply this type of approach to geological decision-making include Duda et al. (1978), Summers and MacDonald (1988), Riedel (1989), DeMers (1990), Bugaets et al. (1991) and Ali and Chawathé (2000).

Artificial Neural Network (ANN) These are computer algorithms derived in part from attempts to imitate the activity of nerve cells (Ripley 1993; Bishop 1995; Warner and Misra 1996). This rapidly growing field of study originates from work by the American computer scientist, Frank Rosenblatt (1928–1969), who developed the perceptron, a connected network which simulated memory (Rosenblatt 1958) and physicist, John Joseph Hopfield (1933–) who introduced the first associative neural network (Hopfield 1982). They have
since been successfully applied to pattern recognition, discriminant analysis and time series prediction, and have often proved particularly useful when dealing with complex, poorly-understood, phenomena when there are many possible predictors (especially in nonlinear combination) and large amounts of training data are available: e.g. in remote sensing (Miller et al. 1995; Cracknell and Reading 2014), see also papers in Lees (1996); volcanological (Bertucco et al. 1999) and seismic event monitoring (Tiira 1999). Other examples of earth science applications include Osborne (1992), Benediktsson et al. (1993), Singer and Kouda (1997), Brown et al. (2000), Mohaghegh (2000), Koike et al. (2002), Yue and Tao (2005), Weller et al. (2007), Bassam et al. (2010), Baykan and Yilmaz (2010), and Cracknell and Reading (2014).

Aspect ratio In general, it is the ratio of the longer dimension (l) to the shorter dimension (s) of a two-dimensional object, generally written as l:s. It is related to the ellipticity or strain ratio (R) of a finite strain ellipsoid (Thomson 1856) with major and minor semiaxes (1 + ε1) and (1 + ε2), where ε1 and ε2 are the principal finite extensions, R = (1 + ε1)/(1 + ε2). See Ramsay (1967) and Ramsay and Huber (1983).

Assembler A computer program which translates symbolic assembler language code (e.g. FAP or MAP) into binary code for execution by a computer; it may also be known as an assembly program. See: Boehm and Steel (1959), Gear (1964) and, for early discussion in a geological context, Koch and Link (1970–71).

Assembly language, assembler language A source language which includes symbolic language statements in which there is a one-to-one correspondence between the instruction and data formats for a computer of a specific architecture. The code, specific to a given computer and written in assembler language, is processed by an assembler to produce binary code for execution by the computer. A program module written in this language would generally be called from a high-level program such as FORTRAN (Struble 1969). The first assembler code was developed in 1948 by the American computer scientist, Nathaniel Rochester (1919–2001), for the IBM 701 computer, which he designed. Each instruction consisted of a two-digit operation code, a four-digit data address and the four-digit address of the next instruction. There were 97 operation codes in all. Each instruction was read from a punched card in Hollerith code. The earliest published geological example of its use is a program for the computation of sand-shale ratios in stratigraphic analysis, written for the IBM 650 computer (Krumbein and Sloss 1958). Assembly language has remained the most widely-used spelling (Google Research 2012). See also: Creager et al. (1962), Koch and Link (1970–71); FAP, MAP.

Association analysis A method of classification based on measures of similarity suitable for binary-coded multi-attribute data, originally introduced by Williams and Lambert (1959) for the study of plant communities (Gill et al. 1976) using the Chi-squared statistic, the association measure being given by:


Qij = (n·nij − ni·nj)² / [ni(n − ni)·nj(n − nj)],

where n is the total number of samples; ni and nj are, respectively, the number of samples in which characters i or j are present; and nij is the number of samples in which both characters i and j are present. A simpler criterion was used by Soukup (1970): If nij is the number of samples in which both characters are present; n00 is the number of samples in which both are absent; ni and nj are, respectively, the number of samples in which only characters i or j are present; then

Qij = (nij·n00 − ni·nj) / (nij·n00 + ni·nj).
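A minimal Python sketch of these two coefficients, computed from binary presence/absence data (the counts n, ni, nj, nij and n00 follow the definitions above; the function names and the example data are illustrative only):

    import numpy as np

    def williams_lambert_q(x, y):
        # x, y: binary (0/1) arrays recording presence of characters i and j in each sample
        x, y = np.asarray(x), np.asarray(y)
        n = len(x)
        ni, nj = x.sum(), y.sum()          # samples containing character i (or j)
        nij = np.sum((x == 1) & (y == 1))  # samples containing both
        return (n * nij - ni * nj) ** 2 / (ni * (n - ni) * nj * (n - nj))

    def soukup_q(x, y):
        x, y = np.asarray(x), np.asarray(y)
        nij = np.sum((x == 1) & (y == 1))  # both present
        n00 = np.sum((x == 0) & (y == 0))  # both absent
        ni = np.sum((x == 1) & (y == 0))   # only i present
        nj = np.sum((x == 0) & (y == 1))   # only j present
        return (nij * n00 - ni * nj) / (nij * n00 + ni * nj)

    x = [1, 1, 0, 1, 0, 1, 1, 0]
    y = [1, 0, 0, 1, 0, 1, 1, 1]
    print(williams_lambert_q(x, y), soukup_q(x, y))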

Geological applications include Soukup (1970), Gill et al. (1976) and Gill (1993).

Association of proportions A graphical technique (Snow 1975) to determine whether the proportions in mixtures are associated, after allowing for constraint. In its simplest form, given two proportions x1 and x2 (e.g. proportions of silica and alumina in a suite of rocks), he proposed plotting all the n data points on the basis of x1 and x2. The points will all fall in the triangle bounded by the vertices (0,0), (0,1) and (1,0). Dividing the points by a line originating at the vertex (0,1) so that n/2 points lie on each side of it, effectively drawn through the median of x1, and similarly dividing them by a line originating at (1,0), the test criterion is based on the 2 × 2 contingency table for the number of points falling into each “quadrant.” This is equivalent to setting P1 = x1/(1 − x2) and P2 = x2/(1 − x1) and testing whether P1 and P2 are independent. However, Darroch and Ratcliff (1978) showed that the method was unsatisfactory.

Asymmetry In structural geology, it is a measure of the deflection of a fold from a curve which is symmetrical with respect to the line perpendicular to the fold envelope; a measure of the “relative overturn” or “relative vergence” of a fold (Dimitrijevič 1971). In other contexts, e.g. the shape of a frequency distribution, it simply means “lack of symmetry.”

Asymmetry analysis Numerical and graphical methods for the analysis of asymmetry in square, non-symmetric, matrices, were developed by the British applied statistician, John Gower (1930–) (Gower 1977; Constantine and Gower 1978). This type of data typically arises in the quantitative interpretation of spatial relationships in distributional maps (e.g. the number of times lithology i is the nearest unlike neighbour of lithology j). Reyment (1981) applied these methods to the study of relationships between tectonic units in the western Mediterranean.

Asymptote In analytic geometry, it is the limit of the tangent to a curve as the point of contact approaches infinity, e.g. as used in Sylvester (1866) and Birch (1938). Although use
of the term goes back to work of the Greek geometer, Apollonius of Perga (?262 BC–c. 190 BC) on conic sections, Sylvester used it to refer to any lines which do not meet in whatever direction they are produced.

Asymptotic error The term asymptotic, taken to mean “pertaining to large-sample behaviour,” came into use in statistics in the 1930s (Camp 1933). As the sample size approaches infinity, the standard error of a statistic, such as the standard error of the mean, tends to be unbiased (Bhattacharya and Rao 1976). In an earth science context, Buttkus (1991, 2000) uses the term in a discussion of error-reduction in periodogram and power spectrum estimation.

Attractor In dynamical systems, this is a set in phase space to which the system evolves after a very long time as transients die out. It may be a point, a line, a curve, or even a complicated set with a fractal structure. The first physical example was demonstrated by the American meteorologist, Edward Norton Lorenz (1917–2008) (Lorenz 1963), although he did not actually use the term attractor, which first appeared in a publication by the American mathematician, Pinchas Mendelson (Mendelson 1960). It was then informally used by the French topologist, René Thom (1923–2002) in the late 1960s, but was popularised following publication of work by the Belgian physicist and mathematician, David Ruelle (1935–) and Dutch mathematician, Floris Takens (1940–2010) (Ruelle and Takens 1971). See also: Lorenz (1963), Thom (1972, 1975), Milnor (1985), Turcotte (1997); Lorenz attractor, strange attractor, phase space, phase map.

Attribute, attribute space An inherent property of a class, member of a class, or object (Bouillé 1976a) which may be quantitative or non-quantitative (such as recording present/absent), depending on its nature. A multivariate statistical analysis of a data set may be spoken of as being carried out in attribute space (Bezvoda et al. 1986).

Attribute table A table characterising the properties of an object. Franklin et al. (1991) use the term for a table listing the characteristics and final classification type for pixels in a remotely-sensed image.

Autoassociation A measure similar to autocorrelation designed for data which consists of nominal variables only, e.g. codes representing a number of different lithological states (Sackin and Merriam 1969). See: cross-association, substitutability analysis.

AutoCAD First released by Autodesk Inc. in 1982, AutoCAD was one of the first computer-aided drafting (CAD) programs designed to run on personal computers. Two early earth science applications were to produce geological maps (Cameron et al. 1988) and, used in combination with gridding and contour mapping software, display of views of three-dimensional geological models (Marschallinger 1991).
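As a hedged sketch of the general idea behind the autoassociation entry above (the proportion of matching coded states when a nominal sequence is compared with a lagged copy of itself; this illustrates the principle only and is not necessarily Sackin and Merriam’s (1969) exact coefficient; the lithology codes are invented):

    import numpy as np

    def autoassociation(codes, max_lag):
        # codes: sequence of nominal states, e.g. lithology codes ['sh', 'ss', 'ls', ...]
        codes = np.asarray(codes)
        n = len(codes)
        result = {}
        for lag in range(1, max_lag + 1):
            matches = np.sum(codes[:n - lag] == codes[lag:])
            result[lag] = matches / (n - lag)   # proportion of matches at this lag
        return result

    print(autoassociation(['sh', 'sh', 'ss', 'ls', 'sh', 'ss', 'ls', 'sh'], 3))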


Autoconvolution This is the convolution of a function with itself (Biraud 1969; Mooers 1973). Convolution is the integral from 0 to t of the product of two time series: ∫₀ᵗ f1(x)·f2(t − x) dx. For two equal-interval discrete time series a = {a0, a1, a2, …, an} and b = {b0, b1, b2, …, bn}, the convolution, written as a∗b, is c = {c0, c1, c2, …, cn} where

c_t = Σ_{i=0}^{t} a_i·b_{t−i}.

The operation can be imagined as sliding a past b one step at a time and multiplying and summing adjacent entries. This type of integral was originally used by the French mathematician, Pierre Simon, Marquis de Laplace (1749–1827) (Laplace 1781). The Hungarian-born American mathematician, Aurel Friedrich Wintner (1903–1958) may have been the first to use the English term “convolution” (Wintner 1934), although its German equivalent Faltung (folding, referring to the way in which the coefficients may be derived from cross-multiplication of the a and b terms and summation of their products along diagonals, if they are written along the margins of a square table) appeared in Wiener (1933). Early mathematical discussions of the technique include Tukey and Hamming (1949), Blackman and Tukey (1958), Robinson (1967a) and, in the earth sciences by: Jones (1977), Vistelius (1980, 1992), Camina and Janacek (1984), Buttkus (1991, 2000) and Gubbins (2004). The operation has also been referred to as the Boltzmann-Hopkinson theorem, Borel’s theorem, Duhamel’s theorem, Green’s theorem, Faltungsintegral, and the superposition theorem and may also be achieved in terms of z-transforms or Fourier transforms. It may also be applied in more than two dimensions (helix transform). See also: deconvolution.

Autocorrelation, auto-correlation, auto correlation, autocorrelation function The correlation of a time-varying waveform with an offset copy of itself. Let x1, x2, …, xn be a series of equally-spaced observations in time (or on a line in space) of length n. The autocovariance function is the series of values of the covariance (Cd) computed between values xi and members of the same series xi+d at a later interval in time, the k pairs of points being separated by the lag, d = 0, 1, 2, …. Then

C_d = {[Σ_{i=1}^{n} x_i·x_{i+d}]/k} − m²,

where m is the mean of all the data. The autocorrelation function rd = Cd/s², i.e. Cd normalised by the variance of the data (which is the same as the autocovariance at lag 0, where the comparison is between all elements of x and itself). So r0 = 1 by definition. At lag d = 1, the correlation is between {x1, x2, ∙∙∙, xn−1} and {x2, x3, ∙∙∙, xn}, etc.; −1 ≤ rd ≤ 1. The term was introduced by the Swedish statistician, Herman Ole Andreas Wold
(1908–1992) (Wold 1938), although it may also have been used as an unnamed function by the American statistician, Norbert Wiener (1894–1964) from as early as 1926 (Wiener 1930). See also: Bartlett (1946), Blackman and Tukey (1958); mentioned in an earth science context by: Jones and Morrisson (1954), Horton (1955, 1957), Grant (1957), Horton et al. (1964), Davis and Sampson (1973), Camina and Janacek (1984), Buttkus (1991, 2000) and Weedon (2003). By far the most frequently used spelling is autocorrelation (Google Research 2012). See also: lagged product, retro-correlation.

Autocorrelogram A display of half of the autocorrelation function (the half for positive time shifts). Often used with analysis of seismic traces in record-section format (Clegg 1976; Sheriff 1984); see also: Barber (1956) and Duncan et al. (1980).

Autocovariance function Let x1, x2, …, xn be a series of equally-spaced observations in time (or on a line in space) of length n. The autocovariance function is the series of values of the covariance (Cd) computed between values xi and members of the same series xi+d at a later interval in time, the k pairs of points being separated by the lag, d = 0, 1, 2, …. Then

C_d = {[Σ_{i=1}^{n} x_i·x_{i+d}]/k} − m²,

where m is the mean of all the data. It was introduced by the American statistician, Norbert Wiener (1894–1964) (Wiener 1930). See autocorrelation. Automatic cartography The first computer-contoured maps were produced in the 1950s and the first major project in computer-based cartography, the Canadian Geographical Information System began in 1963. See: Rhind (1977), Monmonier (1982), Cromley (1992), Clarke (1995) and Dent et al. (2008) for general reviews; early geological examples are discussed by: Wynne-Edwards et al. (1970), Rhind (1971), Smith and Ellison (1999); and geochemical examples by: Nichol et al. (1966), Rhind et al. (1973), Howarth (1977a), Webb and Howarth (1979), Lecuyer and Boyer (1979), and Howarth and Garrett (2010). See also: automatic contouring, choropleth map, contour map, data mapping, digital mapping, digital surface model, digital terrain model, grey-level map, point-symbol map, point-value map, three-component map, trend-surface map, windrose map, interpolation. Automatic contouring The computer-based interpolation and delineation of isolines of equal value through regularly-spaced (gridded) or irregularly-spaced data points in a two-dimensional spatial field, so as to obtain a realistic depiction of the spatial distribution of the variable of interest across the field; also known as an isarithmic map. Each isoline with a value x separates a field of values > x from a field of values < x. Such methods also underpin the representation of surfaces in pseudo “three-dimensional” views.


The first depiction of a topographic contour map was published by the French scientist, Marcellin Ducarla-Bonifas (1738–1816) (Ducarla-Bonifas 1782), but the first isoline map of an abstract quantity was a map of magnetic declination over the Atlantic Ocean, published by the English scientist, Edmond Halley (1656–1742) (Halley 1701). Isoline maps began to be widely used from the early nineteenth Century; see Robinson (1982) for discussion. Prior to the introduction of computers into the earth sciences in the 1960s “contouring” was accomplished by hand (Robinson 1982). Since then, numerous algorithmic approaches have been developed (Mason 1956; Bugry 1981; Crain 1970; Jones et al. 1986; El Abbass et al. 1990; Monnet et al. 2003). Even now, some types of data (e.g. airborne- or shipborne-acquired observations in which data points are very dense along the tracks but there are relatively large distances between them) can prove problematic. The stages involved are: (i) choice of grid mesh-size on which the generally irregularly-spaced values are to be interpolated; (ii) choice of the size of search-radius and/or number of nearest-neighbours to be considered around each grid node; (iii) choice of the interpolation algorithm; (iv) interpolation of values at the grid nodes; (v) choice of contour-threading algorithm; (vi) choice of isoline spacing; and (vii) automatic annotation of isolines and unambiguous delineation of “lows” and “highs.” Some implementations use grey scale or colour scale infill between isolines with an accompanying key, rather than labelling the isolines themselves. Use of interpolation methods discussed under geostatistics (e.g. kriging) has become increasingly important in recent years as they exploit knowledge of the spatial correlation of the data to obtain an optimum solution which also provides the only means of estimating the uncertainty in the interpolated surface. One problem specific to the preparation of subsurface structure contour maps is the representation of abrupt discontinuities across fault planes. Tocher (1979) discusses the contouring of fabric diagrams.

Automatic digitization The computer-based process of sampling a continuous voltage signal (or other continuously varying function, such as a recorder trace of a time series, or a map contour line), usually at regular intervals, and recording the values in digital form for subsequent data storage or analysis (Robinson 1967b; Sheriff 1984; Pintore et al. 2005; Xu and Xu 2014). See also: analogue-to-digital conversion.

Automatic Gain Control (AGC) Gain is an increase (or change) in signal amplitude (or power) from one point in a circuit or system to another, e.g. System input → System output. The term occurs in Nyquist (1932) and in geophysics in Lehner and Press (1966), Camina and Janacek (1984). In automatic gain control, a sliding window of fixed length is used to compute the average amplitude within the window. This average is compared to a reference level and the gain computed for a point in the window. The window then slides down one data point and the next gain correction is computed.

Autoregressive model Used by Weedon (2003) as an alternative term for Burg’s maximum entropy method of spectrum estimation.
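A minimal Python sketch of the sliding-window automatic gain control described in the Automatic Gain Control entry above (the window length and reference level are arbitrary illustrative choices):

    import numpy as np

    def agc(trace, window=51, reference=1.0, eps=1e-12):
        # Sliding window of fixed length: the average absolute amplitude in each window
        # is compared with a reference level and the resulting gain is applied to the
        # sample at the window centre; the window then slides down one data point.
        trace = np.asarray(trace, dtype=float)
        half = window // 2
        out = np.zeros_like(trace)
        for i in range(len(trace)):
            w = trace[max(0, i - half): i + half + 1]
            mean_amp = np.mean(np.abs(w))
            out[i] = trace[i] * reference / (mean_amp + eps)  # gain = reference / window average
        return out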


Autoregressive Moving Average (ARMA) process A stationary process in which the value of a time series at time t is correlated in some way with the value(s) in the previous time steps. An autoregressive moving average process, ARMA(p, q), is:

x_t − m = φ1(x_{t−1} − m) + φ2(x_{t−2} − m) + ⋯ + φp(x_{t−p} − m) + ε_t − θ1·ε_{t−1} − θ2·ε_{t−2} − ⋯ − θq·ε_{t−q},

where m is the mean level; ε is a white noise process with zero mean and a finite and constant variance; φi, i = 1 to p, and θj, j = 1 to q, are the parameters; and p, q are the orders. To obey the assumption of stationarity, the absolute values of φ1 and θ1 should be less than unity. The basic idea was introduced by the Swedish statistician, Herman Ole Andreas Wold (1908–1992) (Wold 1938), and later developed by the British-born American chemist and mathematician, George Edward Pelham Box (1919–2013) and statistician, Gwilym Meirion Jenkins (1933–1982) (Box and Jenkins 1970). For discussion in an earth science context, see: Camina and Janacek (1984), Sarma (1990), Buttkus (1991, 2000) and Weedon (2003); see also: autoregressive process, moving average process.

Autoregressive process (AR process) A stationary process in which the value of a time series at time t is correlated in some way with the value(s) in the previous time steps. There are several types of model. An autoregressive process, AR(p), is:

x_t − m = φ1(x_{t−1} − m) + φ2(x_{t−2} − m) + ⋯ + φp(x_{t−p} − m) + ε_t,

where m is the mean level; φi, i = 1 to p, are the parameters; p is the order; and ε is a white noise process with zero mean and finite and constant variance, identically and independently distributed for all t. To obey the assumption of stationarity, the absolute value of φ1 should be less than unity. The term autoregressive process was introduced by the Swedish statistician, Herman Ole Andreas Wold (1908–1992) (Wold 1938), although processes of this type had previously been investigated mathematically by the British statistician, George Udny Yule (1871–1951) (Yule 1927). For discussion in an earth science context, see: Sarma (1990), Buttkus (1991, 2000), Weedon (2003); see also: moving average process, autoregressive moving average process.

Autoregressive series A time series generated from another time series as the solution of a linear difference equation. Usually previous values of the output enter into the determination of a current value. Their properties were first discussed by the English statistician, Maurice Kendall (1907–1983) in Kendall (1945), D. Kendall (1949), M. Kendall (1949); see also Quenouille (1958). Discussed in an earth science context by Sandvin and Tjøstheim (1978).

Autoregressive spectrum analysis Buttkus (1991, 2000) uses this term when referring to a discussion of the power spectral density analysis of an autoregressive process.
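A minimal Python sketch which simulates an AR(1) process of the form given above (with illustrative parameter values) and estimates its autocorrelation using the lagged-product formula from the autocorrelation entry:

    import numpy as np

    rng = np.random.default_rng(0)
    m, phi1, n = 10.0, 0.6, 2000           # mean level, AR(1) parameter (|phi1| < 1), length
    eps = rng.normal(0.0, 1.0, n)          # white noise with zero mean, constant variance

    x = np.empty(n)
    x[0] = m + eps[0]
    for t in range(1, n):
        x[t] = m + phi1 * (x[t - 1] - m) + eps[t]   # x_t - m = phi1 (x_{t-1} - m) + eps_t

    mean = x.mean()
    var = x.var()
    for d in range(4):                     # autocovariance C_d and autocorrelation r_d
        k = n - d
        C_d = np.sum(x[:k] * x[d:]) / k - mean ** 2
        print(d, C_d / var)                # r_0 = 1; for AR(1), r_d is roughly phi1 ** d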


Autorun function Introduced by the Turkish hydrologist, Zekâi Şen (1947–) (Şen 1977, 1984), the autorun function has been used to investigate the sequential properties of a time series (or spatial series along a transect) in which only two distinct values exist (e.g. in a transect of a porous medium encoded: +1 = solid, −1 = void), the data points being equally spaced, a distance δs apart. The small-sample estimate of a lag k autorun coefficient is given by r(kδs) = 2nk/(n − k), where nk is the number of overlapping successive +1 pairs at a distance of kδs apart, and n is the number of equally-spaced points in the characteristic function.

Autospectrum A term used by Schulz and Stattegger (1997) for a power spectrum obtained using the Fourier transform of a truncated autocovariance sequence.

Auxiliary functions In geostatistics, a set of functions introduced by the British geostatistician, Isobel Clark (1948–), which provide aids to the evaluation of a semivariogram between regions of differing shapes and sizes, e.g. the estimation of the average grade of a mining panel from a number of lengths of core samples (I. Clark 1976, 1979). It is assumed in all cases that the spherical model of the semivariogram applies; the functions define the average semivariogram for a number of fixed configurations which can be combined so as to produce the average semivariogram for other configurations, e.g. χ(l) gives the average value of the semivariogram between a point and an adjacent segment of length l; F(l) gives the average value of the semivariogram between every possible combination of points within the length l; χ(l, b) gives the average value of the semivariogram between a segment of length b and a panel adjacent to it of width l; F(l, b) gives the average value of the semivariogram between all possible combinations of points within the panel; and H(l, b) yields the average value of the semivariogram between: (i) a sample point on the corner of a panel and every point within the panel or, equivalently (ii) between every point of a segment of length b and every point on a segment of length l perpendicular to one end of the first segment.

Average (ave) The arithmetic mean of the values of a sample of observations from a given population. The method was introduced in astronomical observations by the Danish astronomer, Tycho Brahe (1546–1601) towards the end of the sixteenth Century and was subsequently applied by the French mathematician, Pierre-Louis Moreau de Maupertuis (1698–1759) in the course of his measurements of the length of a degree of latitude in Lapland (1736–7) in comparison with that at the Equator (Plackett 1958).

Averaging filter A type of filter often used to remove periodic disturbances from a power spectrum (Camina and Janacek 1984). See: frequency selective-filter.
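A minimal Python sketch of the small-sample autorun estimate quoted in the autorun function entry above, for a ±1 coded series (the example series is arbitrary):

    import numpy as np

    def autorun(series, k):
        # series: equally spaced values coded +1 (e.g. solid) or -1 (e.g. void)
        x = np.asarray(series)
        n = len(x)
        # nk = number of overlapping successive (+1, +1) pairs a distance k apart
        nk = np.sum((x[:n - k] == 1) & (x[k:] == 1))
        return 2.0 * nk / (n - k)          # r(k * delta_s) = 2 nk / (n - k)

    series = [1, 1, -1, 1, 1, 1, -1, -1, 1, 1]
    print([autorun(series, k) for k in (1, 2, 3)])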


Averaging function A smooth function of weights over an interval xa to xb, chosen so as to optimise the estimation of the value of an observed variable y = f(x) throughout the interval. Gubbins (2004) gives an example of its use in determining the mass of the Earth’s core.

awk An interpretative text-processing computer programming language which copes with text parsing and pattern-matching. Its name is based on the family names of its authors as given in their publication (Aho et al. 1988), the Canadian computer scientists, Alfred Vaino Aho (1941–) and Brian Wilson Kernighan (1942–), and American computer scientist, Peter Jay Weinberger (1942–). Lieberman (1992) demonstrated its use with earth science bibliographic databases.

Axial data, axial ratio The Swiss geologist, Theodor Zingg (1901–1974) introduced a method of classifying pebble shape by approximating it to that of a triaxial ellipsoid and using the axial ratios of the pebble’s long (L), intermediate (I) and short (S) diameters, I/L and S/I, to define its shape (Zingg 1935). See also: Krumbein (1941); Zingg plot.

Axis, axes 1. In the Cartesian coordinate system for graphs, named for the French philosopher, René Descartes (1596–1650), the perpendicular reference lines are referred to as the axes of the graph. However, in his classic text which, in one of its appendices, relates algebra to geometry and vice versa (Descartes 1637) he does not actually use the x-y coordinate system characteristic of analytic geometry. The systematic use of two axes and both positive and negative coordinates first appears in the work of the English polymath, (Sir) Isaac Newton (1642–1727): Enumeratio linearum tertii ordinis [Enumeration of curves of third degree], written in 1676 but published as an appendix to his treatise Opticks (1704). 2. In the earth sciences, by the mid nineteenth Century, the term was also in use to denote a “line about which objects are symmetrical, about which they are bent, around which they turn, or to which they have some common relation,” e.g. as in: synclinal axis, axis of elevation, the axis of a crystal, etc. (Page 1859).

Azimuth, azimuth angle A horizontal angle which is generally measured clockwise from true North. Magnetic north may also occasionally be used (if so, this should be made clear). The term azimuth was in use in astronomy by the early eighteenth Century (Leadbetter 1727) and the azimuth compass came into being at about the same time (Middleton 1737).

Azure noise Coloured (colored, American English sp.) noise can be obtained from white noise by passing the signal through a filter which introduces a degree of autocorrelation, e.g. x(t) = ax(t − 1) + kw(t) where w(t) is a white noise signal; a is a constant, 0 < a < 1; k is the gain, and x(t) is the output signal at time t. The power spectrum density for blue
(or azure) noise increases linearly with frequency. The concept of white light as having a uniform power density over its spectrum was first discussed by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1926), and taken up in digital signal processing by the American mathematician Richard Wesley Hamming (1915–1998) and statistician John Wilder Tukey (1915–2000) (Tukey and Hamming 1949); see also Blackman and Tukey (1958). For discussion in an earth science context, see Weedon (2003).
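A minimal Python sketch of the recursive filtering of white noise quoted in this entry (the values of a and k are arbitrary; the spectral colour of the output depends on the filter chosen, and the crude periodogram below can be inspected against frequency to see how the power is distributed):

    import numpy as np

    rng = np.random.default_rng(1)
    n, a, k = 4096, 0.8, 1.0
    w = rng.normal(0.0, 1.0, n)            # white noise input w(t)

    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + k * w[t]     # x(t) = a x(t-1) + k w(t), 0 < a < 1

    # Crude periodogram estimate of the power spectral density of the filtered noise;
    # compare 'power' against 'freqs' to see where the energy is concentrated.
    freqs = np.fft.rfftfreq(n, d=1.0)
    power = np.abs(np.fft.rfft(x)) ** 2 / n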

B

B-spline An abbreviation of basis spline, introduced in Schoenberg (1967): A chain of polynomials of fixed degree (usually cubic functions are used) ordered in such a way that they are continuous at the points at which they join (knots). The knots are usually placed at the x-coordinates of the data points. The function is fitted in such a way that it has continuous first- and second-derivatives at the knots; the second derivative can be set to zero at the first and last data points. Splines were first described by the Romanian-American mathematician, Isaac Jacob Schoenberg (1903–1990) (Schoenberg 1946, 1971). Other types include: quadratic, cubic and bicubic splines (Ahlberg et al. 1967). Jupp (1976) described an early application of B-splines in geophysics. See also: piecewise function, spline, smoothing spline regression.

Back Projection Tomography (BPT) An early method used in seismic tomography. It has its origins in the work of the Australian-born American physicist, radio astronomer and electrical engineer, Robert Newbold Bracewell (1921–2007) who showed theoretically (Bracewell 1956) how an image of a celestial body (e.g. the brightness distribution over the Sun) could be obtained by “line integration” of the observations obtained by a narrow beam sweeping across it. In exploration seismology, the aim is to determine the velocity structure in a region which has been sampled with a set of rays. In the basic back projection tomography approach (Aki et al. 1977), a reference velocity structure (e.g. a laterally-averaged plane-layer model for the region studied) is assumed, and deviations from the travel times are inverted to obtain the slowness (i.e. reciprocal velocity) perturbations of the blocks. Only the assumed velocity structure is used to guide the ray’s path. The least squares solution to the problem is found by solving the normal equations L^T·L·s = L^T·t, where t = L·s; t are the time delays, s are the slowness perturbations associated with the blocks, L is an N by M matrix of lengths (l) of the ray segments associated with each block, N is the number of travel-time data, and M is the number of blocks in the model. Because
most of the blocks are not hit by any given ray, the majority of the elements of L are zero. Considering only the diagonal of L^T·L, s = D⁻¹·L^T·t, where D = diag(L^T·L), each ray is projected back from its receiver, one at a time. For each block encountered, the contributions to the sums of t·l and l² are accumulated separately. Once all the rays have been back-projected, each block’s slowness is estimated using s = Σt·l/Σl². The method is fast, but provides rather blurred results (Humphreys and Clayton 1988). This problem can be overcome using techniques which iterate on the basis of the travel-time residuals, such as the Algebraic Reconstruction Technique and the Simultaneous Iterative Reconstruction Technique.

Background 1. In geophysics: The average systematic or random noise level of a time-varying waveform upon which a desired signal is superimposed (Dyk and Eisler 1951; Sheriff 1984). 2. In exploration geochemistry: a range of values above which the magnitude of the concentration of a geochemical element is considered to be “anomalous.” The term was adopted following the work of the pioneering American geochemist, Herbert Edwin Hawkes (1912–1996) (Hawkes 1957). See recent discussion by Reimann et al. (2005). 3. In computing, a background process is one which does not require operator intervention but can be run by a computer while the workstation is used to do other work (International Business Machines [undated]). See also: anomaly.

Backus-Gilbert method, Backus-Gilbert inversion A numerical method for the solution of inverse problems in geophysics (less frequently known as Backus-Gilbert inversion; Google Research 2012) first introduced into geophysics by the American geophysicists, George Edward Backus (1930–) and James Freeman Gilbert (1931–2014) in a series of papers (Backus and Gilbert 1967, 1968, 1970) to infer the internal density structure, bulk modulus, shear modulus, etc. of the Earth from seismically-derived observations of its vibration frequencies. Their method aims to optimise the resolution of undetermined model parameters. See also: trade-off curve; Parker (1977), Menke (1989, 2012), Eberhart-Phillips (1986), Snieder (1991), Buttkus (1991, 2000), Press et al. (1992), Koppelt and Rojas (1994) and Gubbins (2004).

Backward elimination A method of subset selection used in both multiple regression and classification (discriminant analysis) in which there may be a very large number (N) of potential predictors, some of which may be better than others. Backward elimination begins with all N predictors; each one is temporarily eliminated at a time, then the best-performing subset of the remaining (N − 1) predictors is retained. Selection stops when no further improvement in the regression fit or classification success rate is obtained. See Berk (1978), and in an earth science context, Howarth (1973a).
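A minimal Python sketch of the diagonal back-projection estimate described in the Back Projection Tomography entry above, s = D⁻¹·L^T·t, i.e. each block’s slowness perturbation is Σ t·l / Σ l² accumulated over the rays that cross it (L and t here are small made-up arrays, purely for illustration):

    import numpy as np

    # L: N rays x M blocks matrix of ray-segment lengths (mostly zero); t: N travel-time delays
    L = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.5, 1.5],
                  [1.0, 0.0, 2.0]])
    t = np.array([0.30, 0.24, 0.27])

    num = np.zeros(L.shape[1])             # accumulates sum of t * l for each block
    den = np.zeros(L.shape[1])             # accumulates sum of l**2 for each block
    for i in range(L.shape[0]):            # back-project each ray in turn
        num += t[i] * L[i]
        den += L[i] ** 2

    s = num / den                          # equivalent to diag(L.T @ L)^(-1) @ (L.T @ t)
    print(s)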


BALGOL An acronym for “Burroughs ALGOL.” ALGOL is itself an acronym for Algorithmic Oriented Language, a computer programming language originally developed by a group of European and American computer scientists at a meeting in Zurich in 1958 (Perlis and Samelson 1958). It was subsequently refined and popularised as ALGOL60 (Naur 1960), assisted by the work of the computer scientists, Edsger Wybe Dijkstra (1930–2002) and Jaap A. Zonneveld (1924–) in the Netherlands; and (Sir) Charles Antony Richard Hoare (1934–), then working with the computer manufacturers, Elliott Brothers, in England. Later variants used in geological studies included BALGOL, developed by the Burroughs Corporation in the USA. Early examples of its use in the earth sciences include Harbaugh (1963, 1964) and Sackin et al. (1965), but it was soon replaced by programming in FORTRAN.

Band A range of frequencies such as those passed (band-pass) or rejected (band-reject) by a filter. Electrical low-pass, high-pass and band-pass “wave filters” were initially conceived by the American mathematician and telecommunications engineer, George Ashley Campbell (1870–1954) between 1903 and 1910, working with colleagues, physicist, Otto Julius Zobel (1887–1970) and mathematician Hendrick Wade Bode (1905–1982), but the work was not published until some years later (Campbell 1922; Zobel 1923a, 1923b, 1923c; Bode 1934). The term band pass was subsequently used in Stewart (1923) and Peacock (1924); see also: Wiggins (1966) and Steber (1967). See: frequency selective-filter.

Band-limited function A function whose Fourier transform vanishes, or is very small, outside some finite interval, i.e. band of frequencies. The term was introduced into digital signal processing by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990) (Blackman and Tukey 1958). For discussion in a geophysical context, see: Grillot (1975) and Boatwright (1978).

Band-pass filter Filters are algorithms for selectively removing noise from a time series (or spatial set of data), smoothing, or for enhancing particular components of the signal by removing components that are not wanted. A band-pass filter attenuates all frequencies except those in a given range between two given cut-off frequencies and may also be applied to smoothing of a periodogram. A low-pass filter and a high-pass filter connected in series is one form of a band-pass filter. Frequency components in the passband are treated as signal, and those in the stopband are treated as unwanted and rejected by the filter. There will always be a narrow frequency interval, known as the transition band, between the passband and stopband in which the relative gain of the passed signal decreases to its near-zero values in the stopband. Electrical low-pass, high-pass and band-pass “wave filters” were initially conceived by the American mathematician and telecommunications engineer, George Ashley Campbell (1870–1954) between 1903 and 1910, working with colleagues, physicist, Otto Julius Zobel (1887–1970) and mathematician Hendrick Wade Bode (1905–1982), but the work was not published until some
years later (Campbell 1922; Zobel 1923a, 1923b, 1923c; Bode 1934). Equivalent filters were introduced into digital signal processing by the American statistician, John Wilder Tukey (1915–2000) and mathematician Richard Wesley Hamming (1915–1998) (Tukey and Hamming 1949). Parallel theoretical background was provided by the work of the American physicist, George W. Stewart (1876–1956), who worked on acoustics between 1903 and 1926 and solved the fundamental wave equations involved in acoustic filter design (Crandall 1926; Stewart 1923). See Buttkus (1991, 2000), Camina and Janacek (1984), Gubbins (2004) and Vistelius (1961) for discussion in an earth sciences context.

Band-reject filter, band-stop filter A filter which is designed to remove (reject) a narrow band of frequencies in a signal while passing all others. It is also known as a notch or rejection filter (Sheriff 1984; Wood 1968; Buttkus 2000; Gubbins 2004). The opposite of a band-pass filter. See: Steber (1967) and Ulrych et al. (1973).

Banded equation solution This refers to the solution of a system of linear equations involving a square symmetric matrix in which the band referred to is a symmetrical area on either side of, and parallel to, the matrix diagonal which itself contains nonzero values. Outside this band, all entries are zero. See: Segui (1973), Carr (1990) and Carr and Myers (1990).

Bandwidth 1. The width of the passband of a frequency selective-filter; the term was introduced into digital signal processing by the American statistician, John Wilder Tukey (1915–2000) and mathematician Richard Wesley Hamming (1915–1998) (Tukey and Hamming 1949). 2. A term introduced by the British statistician, Maurice Stevenson Bartlett (1910–2002), in the context of the smoothing parameter used in smoothing a periodogram (Bartlett 1950). Bandwidth always exceeds the Rayleigh frequency. 3. It has more recently been applied to the smoothing parameter used in kernel density estimation (Chin 1991). Mentioned in an earth science context by: Sheriff (1984), Buttkus (1991, 2000), Weedon (2003) and Gubbins (2004).

Bandwidth retention factor A criterion used in normalizing the taper coefficients in designing a multitaper filter, it is the ratio: (energy within the chosen spectral frequency band)/(energy in the entire band). It was called the bandwidth retention factor by Park et al. (1987). See: multi-tapering method.

Bar chart A graph in which either the absolute frequency or relative frequency of occurrence of a category is shown by the proportional-length of a vertical bar for each category in a data set. Since they are categorical variables, ideally, the side-by-side bars should be drawn with a gap between them. Not to be confused with a histogram, which
shows the binned frequency distribution for a continuous- or discrete-valued variable. The earliest bar chart, based on absolute amount, was published by the English econometrician, William Playfair (1759–1823) (Playfair and Corry 1786). An early earth science use was by Federov (1902) to show relative mineral birefringences. In a divided bar chart, each bar is divided vertically into a number of proportional-width zones to illustrate the relative proportions of various components in a given sample; total bar-length may be constant (e.g. 100% composition) or vary, depending on the type of graph. These were first used by the German scientist, Alexander von Humboldt (1769–1859) (Humboldt 1811). In geology, divided bars were first used by the Norwegian geologist, metallurgist and experimental petrologist, Johan Herman Lie Vogt (1858–1932) (Vogt 1903–1904). The Collins (1923) bar chart uses double divided bars to show the cationic and anionic compositions of a water sample separately; each set is recalculated to sum to 100% and plotted in the left- and right-hand bars respectively. Usage in geology increased following publication of Krumbein and Pettijohn’s Manual of sedimentary petrography (1938).

Bartlett method, Bartlett spectrum, Bartlett taper, Bartlett window, Bartlett weighting function Named for the British statistician, Maurice Stevenson Bartlett (1910–2002) who first estimated the power spectrum density of a time series, by dividing the data into a number of contiguous non-overlapping segments, calculating a periodogram for each (after detrending and tapering), and calculating the average of them (Bartlett 1948, 1950). The term Bartlett window (occasionally misspelt in recent literature as the “Bartlet” window), was introduced into digital signal processing by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990) (Blackman and Tukey 1958) and has remained the most frequently used term since the mid-1970s (Google Research 2012). It is used in the operation of smoothing a periodogram with a lag window of weights applied to a discrete time waveform. N, the width of the Bartlett window, is typically even and an integer power of 2, e.g. 2, 4, 8, 16, 32, etc.; for each point, n = 0, …, N, the weight w(n) is given by

w(n) = 2n/(N − 1) for 0 ≤ n ≤ (N − 1)/2

and

w(n) = 2 − 2n/(N − 1) for (N − 1)/2 ≤ n < N;

otherwise zero. It is also known (Harris 1978) as the triangle, triangular, or Fejér window, named for the Hungarian mathematician, Lipót Fejér (1880–1959) (Fejér 1904). See also Blackman and Tukey (1958) and, for a comprehensive survey, Harris (1978). Mentioned in an earth science context by: Buttkus (1991, 2000) and Weedon (2003). See also: spectral window.
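A minimal Python sketch of the Bartlett (triangular) window weights as defined above; the result can be compared with numpy.bartlett, which uses essentially the same definition:

    import numpy as np

    def bartlett_window(N):
        n = np.arange(N)
        w = np.where(n <= (N - 1) / 2,
                     2.0 * n / (N - 1),            # rising limb: 0 <= n <= (N-1)/2
                     2.0 - 2.0 * n / (N - 1))      # falling limb: (N-1)/2 <= n < N
        return w

    N = 16
    print(np.allclose(bartlett_window(N), np.bartlett(N)))  # True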


Barycentric coordinates The percentage-based coordinate system used today in an equilateral ternary diagram is equivalent to the barycentric coordinate system introduced by the German mathematician, August Ferdinand Möbius (1790–1886) (Möbius 1827). Imagine three masses wA, wB and wC, placed at the apices A, B, C of a triangle and all joined by threads to an interior point P at equilibrium; then the areas of the subtriangles BPC = a, APC = b and APB = c are proportional to wC, wB and wA respectively. The barycentric coordinates {a, b, c} may be normalised so that a + b + c = 1. However, Möbius never seems to have used the idea as the basis for a graphical tool.

BASIC Acronym for Beginner’s All-purpose Symbolic Instruction Code, a general-purpose interactive computer programming language (i.e. interpreted on the fly, rather than compiled and run) originally developed in 1963–1964 by American mathematicians and computer scientists, John George Kemeny (1926–1992), and Thomas Eugene Kurtz (1928–) at Dartmouth College, New Hampshire, USA, as a teaching tool for non-scientist undergraduates (Kemeny and Kurtz 1964). It was partly based on FORTRAN II and ALGOL, with additions to make it suitable for timesharing use. Because of its ease of use, it was subsequently adopted for use on minicomputers, such as the DEC PDP series, Data General and Hewlett Packard in the late 1960s and early 1970s, but it was the development of BASIC interpreters by Paul Allen (1953–) and William Henry Gates III (1955–), co-founders of Microsoft, and Monte Davidoff (1956–) for the Altair and Apple computers, and its subsequent take-up in many other dialects by other manufacturers which popularised its use in the personal-computing environment of the 1980s. Early applications in the earth sciences include: Till et al. (1971), McCann and Till (1973) and Jeremiasson (1976).

Basin analysis The quantitative modelling of the behaviour of sedimentary basins through time has become an important tool in studying the probable hydrocarbon potential of a basin as an aid to exploration. Modelling generally embraces factors such as basement subsidence, compaction and fluid flow, burial history, thermal history, thermal maturation, and hydrocarbon generation, migration and accumulation. The aim is to determine the relative timing of hydrocarbon evolution in relation to the development of traps and their seals, and the continuing integrity of the sealed traps following petroleum entrapment. The methods used have been largely developed by the British-American physicist, theoretical astronomer, and geophysicist, Ian Lerche (1941–); see: Lerche (1990, 1992), Dore et al. (1993), Harff and Merriam (1993) and Lerche et al. (1998).

Basin of attraction A region in phase space in which solutions for the behaviour of a dynamical system approach a particular fixed point; the set of initial conditions gives rise to trajectories which approach the attractor as time approaches infinity. The term was introduced by the French topologist, René Thom (1923–2002) in the late 1960s and published in Thom (1972, 1975). For discussion in an earth science context, see Turcotte (1997). See also: phase map.
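A minimal Python sketch converting normalised barycentric coordinates {a, b, c} (a + b + c = 1), as in the barycentric coordinates entry above, into Cartesian plotting positions on an equilateral ternary diagram with vertices A = (0, 0), B = (1, 0) and C = (0.5, √3/2); this vertex layout is an arbitrary but common convention:

    import numpy as np

    def barycentric_to_cartesian(a, b, c):
        # P = a*A + b*B + c*C for unit-sum barycentric coordinates {a, b, c}
        total = a + b + c                  # normalise in case the inputs do not sum to 1
        a, b, c = a / total, b / total, c / total
        x = b + 0.5 * c
        y = (np.sqrt(3.0) / 2.0) * c
        return x, y

    print(barycentric_to_cartesian(0.2, 0.3, 0.5))   # point in the interior of the triangle
    print(barycentric_to_cartesian(1.0, 0.0, 0.0))   # vertex A at (0, 0)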


Basis function, basis vector 1. An element of a particular basis (a set of vectors that, in a linear combination, can represent every vector in a given vector space, such that no element of the set can be represented as a linear combination of the others) for a function space (a set of functions of a given kind). Basis function has been the most frequently used spelling since the 1980s (Google Research 2012). Examples include the sine and cosine functions which make up a Fourier series, Legendre polynomials, and splines. 2. Algorithms which form the basis for numerical modelling and for methods of approximation (Sheriff 1984; Gubbins 2004) Batch processing The execution of a series of “jobs” (programs) on a computer, established so that they can all be run to completion without manual intervention. Used on mainframe computers since the 1950s, it ensures the maximum level of usage of the computer facilities by many users. Early examples of geological programs for such an environment are those of Krumbein and Sloss (1958), Whitten (1963) and Kaesler et al. (1963). By the 1970s, “time-shared” operations enabled input/output via remote Teletype terminals which offered both keyboard and punched paper-tape readers as means of input and, in the latter case, output also. An early example of a suite of statistical computer programs for geological usage written for a time-sharing environment is that of Koch et al. (1972). Batch sampling 1. An alternative name for channel sampling, a means of physical sampling in a mine environment in which a slot, or channel, of given length is cut into the rock face in a given alignment (generally from top to bottom of the bed, orthogonal to the bedding plane); all the rock fragments broken out of the slot constitute the sample. 2. In statistical sampling, it is a method used to reduce the volume of a long data series: the arithmetic mean of all the values in a fixed non-overlapping sampling interval is determined and that value constitutes the channel sample. See: Krumbein and Pettijohn (1938) and Krumbein and Graybill (1965); composite sample. Baud In asynchronous transmission, the unit of modulation rate corresponding to one unit interval per second; e.g. if the duration of the interval is 20 ms, the modulation rate is 50 baud (International Business Machines [undated]) Bayes rule, Bayesian methods Given a prior frequency distribution of known (or sometimes assumed) functional form for the occurrence of the event, the posterior frequency distribution is given by Bayes' rule, named after the English philosopher and mathematician, Thomas Bayes (1702–1761). Expressed in modern notation as:


p(S|X) = [p(X|S) p(S)] / {[p(x_1|S) p(S)] + [p(x_2|S) p(S)] + ... + [p(x_n|S) p(S)]},


where p(S|X) is the posterior distribution of a given state (or model parameters) S occurring, given a vector of observations, X; p(S) is the prior distribution; and p(x|S) is the likelihood. However, this "rule" does not appear in Bayes (1763); John Aldrich in Miller (2015a) gives the first use of the term "la règle de Bayes" to Cournot (1843) but attributes its origin to Laplace (1814). The term Bayesian was first used by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) in Fisher (1950). See: Wrinch and Jeffreys (1919) and, in an earth science context: Appendix B in Jeffreys (1924), also: Rendu (1976), Vistelius (1980, 1992), Christakos (1990), Curl (1998), Solow (2001) and Rostirolla et al. (2003); Bayesian inversion, Bayesian/maximum-entropy method.

Bayesian inversion The application of Bayesian methods to the solution of inverse problems (e.g. the reconstruction of a two-dimensional cross-sectional image of the interior of an object from a set of measurements made round its periphery). For discussion in an earth science context, see: Scales and Snieder (1997), Oh and Kwon (2001), Spichak and Sizov (2006), Hannisdal (2007), Gunning and Glinsky (2007) and Cardiff and Kitandis (2009).

Bayesian/Maximum-Entropy (BME) method A methodological approach to the incorporation of prior information in an optimal manner in the context of spatial and spatio-temporal random fields: given measurements of a physical variable at a limited number of positions in space, the aim is to obtain estimates of the variable which are most likely to occur at unknown positions in space, subject to the a priori information about the spatial variability characteristics. Introduced by the Greek-born American environmental scientist and statistician, George Christakos (1956–) (Christakos 1990, 2000). See also: Bayes rule, maximum entropy principle.

Beach ball plot Given a population of compatible measurements of the characteristics of geological faults, determining the proportion of points in compression or extension in each direction enables the three orthogonal principal stress axes (σ1, σ2 and σ3) to be located. The method involves placing a plane perpendicular to the plane of movement of a fault, dividing the fault into a set of four dihedra or quadrants. Two will be in compression (+) and two will be in extension (−). σ1 and σ3 will lie somewhere between the dihedra; if the directions of σ1 and σ3 can be determined, then the remaining stress axis, σ2, can be calculated from them, as it must be perpendicular to them both, or normal to the plane they define. σ1 will lie somewhere in the area of compression and σ3 will lie somewhere in the area of extension. As faults are rarely isolated, other faults in the fault system can also be plotted on a Lambert equal area projection. As increasing numbers are plotted, a direction for σ1 and σ3 representing the entire fault system may be determined. The two compressional right dihedra and two extensional right dihedra shown on the graph may be
coloured white and black respectively, leading to its being called a beach ball plot. Introduced by the French structural geologist, Jacques Angelier (1947–2010) and geophysicist, Pierre Mechler (1937–) (Angelier and Mechler 1977), when it was known as an Angelier-Mechler diagram. Their method was improved on by the British structural geologist, Richard J. Lisle (1987, 1988, 1992).

Beat If two sinusoids of similar wavelengths are added together, the resultant waveform will have constant wavelength (equal to the average of the wavelengths of the two sinusoids), but the amplitude of the resulting waveform, the beat, will vary in a fixed manner which will be repeated over the beat wavelength. The term originally derives from the acoustics of music, and was used (battement) by the French mathematician and physicist, Joseph Sauveur (1653–1716) (Sauveur [1701] 1743); by 1909 it was in use in wireless telegraphy, first patented by the Italian physicist, Guglielmo Marconi (1874–1937) in 1896. Mentioned in an earth science context by Panza (1976) and Weedon (2003). See also: amplitude modulation.

Belyaev dichotomy Named for the Russian statistician, Yuri Konstantinovich Belyaev (1932–), who proved (Belyaev 1961, 1972) that, with a probability of one, a stationary Gaussian process in one dimension either has continuous sample paths, or else almost all its paths are unbounded in all intervals. The implication for a Gaussian random field is that if it is smooth it is very smooth, but if it is irregular, it is highly irregular and there is no in-between state. This concept was applied to the topography of a soil-covered landscape by the British theoretical geomorphologist and mathematician, William Edward Herbert Culling (1928–1988) in Culling and Datko (1987) and Culling (1989), who used it to justify the view that the fractal nature of a landscape renders "the customary geomorphic stance of phenomenological measurement, naïve averaging and mapping by continuous contour lines" both "inappropriate" and "inadmissible" (Culling 1989).

Bell-curve, bell-shaped curve, bell-shaped distribution An informal descriptive name for the shape described by a continuous Gaussian ("normal") frequency distribution. John Aldrich in Miller (2015a) says that although the term "bell-shaped curve" appears in Francis Galton's description of his Apparatus affording Physical Illustration of the action of the Law of Error or of Dispersion: "Shot are caused to run through a narrow opening among pins fixed in the face of an inclined plane, like teeth in a harrow, so that each time a shot passes between any two pins it is compelled to roll against another pin in the row immediately below, to one side or other of which it must pass, and, as the arrangement is strictly symmetrical, there is an equal chance of either event. The effect of subjecting each shot to this succession of alternative courses is, to disperse the stream of shot during its downward course under conditions identical with those supposed by the hypothesis on which the law of error is commonly founded. Consequently, when the shot have reached the bottom of the tray, where long narrow compartments are arranged to receive them, the general outline of the mass of shot there collected is always found to assimilate to the
well-known bell-shaped curve, by which the law of error or of dispersion is mathematically expressed," Galton demonstrated his apparatus at a meeting of the Royal Institution in February 1874 (Committee of Council on Education 1876), but did not actually use the term in his many statistical publications. Nevertheless, the term began to be used in the early 1900s and by Thompson (1920), but it gained in popularity following its appearance in textbooks, such as Uspensky (1937) and Feller (1950).

Bending power law spectrum An energy spectrum which is a modification of the linear (1/f) power law spectrum (f is frequency) which includes an element enabling it to bend downwards, i.e. steepen, at high frequencies. It has the form:

E(f) = N f^(−c) / [1 + (f/f_B)^(d−c)],

where N is a factor which sets the amplitude, f_B is the frequency at which the bend occurs, and c (usually in the range 0 to 1) and d (usually in the range 1 to 4) are constants which govern the slope of the spectrum above and below the bend. Vaughan et al. (2011) discuss the problems inherent in the choice of a first-order autoregressive, AR(1), process as a model for the spectrum in cyclostratigraphy and recommend use of the power law, bending power law or Lorentzian power law models as alternatives. See also power spectrum.

Bernoulli model, Bernoulli variable A Bernoulli random variable is a binary variable for which the probability that, e.g., a species is present at a site is Pr(X = 1) = p and the probability that it is not present is Pr(X = 0) = 1 − p. Named for the Swiss mathematician, Jacques or Jacob Bernoulli (1654–1705), whose book, Ars Conjectandi (1713), was an important contribution to the early development of probability theory. A statistical model using a variable of this type has been referred to since the 1960s as a Bernoulli model (Soal 1965; Merrill and Guber 1982).

Bernstein distribution A family of probability distributions of the form:

F(x; m) = Φ{(x − m)/√f(x)},

where Φ{•} is the standard normal cumulative distribution function; m is the median; and f(x) is a polynomial function in x (e.g. ax^2 − 2bx + c, where a, b, and c are constants), whose value is greater than zero for all x. Introduced by the Russian mathematician, Sergei Natanovich Bernštein (1880–1968) (Bernštein 1926a, b; Gertsbakh and Kordonsky 1969); for discussion in an earth science context, see Vistelius (1980, 1992).
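A minimal Python sketch of the Bernstein distribution entry above (illustrative only; the function name and parameter values are my own, not from the cited sources): it evaluates F(x; m) = Φ{(x − m)/√f(x)} for a simple quadratic f(x) that stays positive.

```python
import numpy as np
from scipy.stats import norm

def bernstein_cdf(x, m, a, b, c):
    """Bernstein-type distribution function F(x; m) = Phi((x - m) / sqrt(f(x))),
    with f(x) = a*x**2 - 2*b*x + c assumed positive for all x."""
    x = np.asarray(x, dtype=float)
    f = a * x**2 - 2.0 * b * x + c          # polynomial scale term
    if np.any(f <= 0):
        raise ValueError("f(x) must be > 0 for all x supplied")
    return norm.cdf((x - m) / np.sqrt(f))   # Phi is the standard normal CDF

# Example: median m = 0, f(x) = 0.5*x**2 + 2 (a = 0.5, b = 0, c = 2)
print(bernstein_cdf([-2.0, 0.0, 2.0], m=0.0, a=0.5, b=0.0, c=2.0))
```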


Bessel function A set of functions that are solutions to Laplace's equation in cylindrical polar coordinates. Named (Lommel 1868) for the German astronomer and mathematician, Friedrich Wilhelm Bessel (1784–1846). The first spherical Bessel function is the same as the unnormalised sinc function, i.e. sin(x)/x. Mentioned in an earth science context by Buttkus (1991, 2000).

Best Linear Unbiased Estimator (BLUE) A linear unbiased estimator of a parameter which has a smaller variance associated with it than any other linear unbiased estimator; e.g. the ordinary least squares estimator of the coefficients in the case of fitting a linear regression equation, as shown by the French mathematician, Pierre Simon, Marquis de Laplace (1749–1827) (Laplace 1812, p. 326), or the use of "ordinary" kriging to estimate concentration values for spatially distributed data in applied geostatistics (Journel and Huijbregts 1978; Isaaks and Srivastava 1989).

Beta diagram Introduced by the Austrian structural geologist, Bruno Sander (1884–1979) (Sander 1948, 1970), the β-axis is the line of intersection between two or more planes distinguished by a parallel fabric (e.g. bedding planes, foliation planes). If the attitudes of these planes in a folded structure are plotted in cyclographic form on a stereographic projection, the unimodal ensemble of intersections statistically defines the location of the mean β-axis, which may correspond to a cylindrical fold axis (in certain types of complex folding they may not represent a true direction of folding). Also called a pole diagram. See: Turner and Weiss (1963), Robinson (1963) and Ramsay (1964, 1967).

Beta distribution, Beta function A family of continuous probability distributions of the form:

f(x) = x^(α−1) (1 − x)^(β−1) / B(α, β),

where 0 < x < 1; 0 < α, β < ∞; and B(α, β) is the Beta function: B(α, β) = ∫_0^1 u^(α−1) (1 − u)^(β−1) du, first studied by the Swiss mathematician, Leonhard Euler (1707–1783) (Euler 1768–1794), and by the French mathematician, Pierre Simon, Marquis de Laplace (1749–1827) (Laplace 1774). It was subsequently given its name by the French mathematician, Jacques Phillippe Marie Binet (1776–1856) (Binet 1839). The distribution is J-shaped if α or β lie between 0 and 1, and U-shaped if both are within this range. Otherwise, if α and β are both greater than 1, it is unimodal with the peak of the distribution (mode) falling at (α − 1)/(α + β − 2). It is frequently used to fit data on a finite interval and has been applied to the modelling of the proportions of microlithotype data in coal (Cameron and Hunt 1985). A Beta distribution scaled to the observed maxima and minima is known as a
stretched Beta distribution, which is now being used for distribution-fitting in petroleum resource estimation studies (Senger et al. 2010). See also incomplete Beta function.
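As a brief illustration of the Beta distribution entry above (a hypothetical sketch, not part of the dictionary; the parameter values are arbitrary), the following Python fragment evaluates the density and mode and fits a Beta distribution to data rescaled to the (0, 1) interval using scipy.

```python
import numpy as np
from scipy.stats import beta

alpha, b = 2.5, 4.0                      # both > 1, so the density is unimodal
mode = (alpha - 1) / (alpha + b - 2)     # peak of the distribution
print("mode:", mode)
print("density at the mode:", beta.pdf(mode, alpha, b))

# Fit a Beta distribution to data lying strictly within (0, 1),
# e.g. proportions such as microlithotype percentages divided by 100.
rng = np.random.default_rng(42)
data = beta.rvs(alpha, b, size=500, random_state=rng)
a_hat, b_hat, loc, scale = beta.fit(data, floc=0, fscale=1)  # fix support to [0, 1]
print("fitted parameters:", a_hat, b_hat)
```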


Beta test To test a pre-release version of a piece of software by making it available to selected users (International Business Machines [undated]).

Bias, biased 1. In statistical terms, bias is the difference between the estimated value of a parameter, or set of parameters, and the true (but generally unknown) value. The terms biased and unbiased errors were introduced by the British econometrician, (Sir) Arthur Lyon Bowley (1869–1957) (Bowley 1897). Typically, the estimated value might be inflated by erroneous observations or the presence of an outlier, or outliers, in the data. In time series analysis, it may be applied to the incorrect estimation of the periodogram as a result of the leakage effect. For discussion in an earth science context, see: Miller and Kahn (1962), Buttkus (1991, 2000) and Weedon (2003). 2. In geochemical analysis, or similar measurement processes, it is the difference between a test result (or the mean of a set of test results) and the accepted reference value (Analytical Methods Committee 2003). In practice, it is equivalent to systematic error. In analytical (chemical) work, the magnitude of the bias is established using a standard reference material, and it is generally attributable to instrumental interference and/or incomplete recovery of the analyte. See also: accuracy, precision, inaccuracy, blank.

Bicoherence This is a measure of the proportion of the signal energy at any bifrequency that is quadratically phase-coupled. Nonlinear frequency modulation of a signal will be indicated by the presence of phase- and frequency-coupling at the frequencies corresponding to the sidebands, e.g. where a signal is composed of three cosinusoids with frequencies f1, f2, and f1 + f2 and phases φ1, φ2 and φ1 + φ2. This will be revealed by peaks in the bicoherence, a squared normalised version of the bispectrum of the time series, B(f1, f2):

b(f1, f2) = |B(f1, f2)|^2 / {E[|P(f1) P(f2)|^2] E[|P(f1 + f2)|^2]},

plotted as a function of f1 and f2, where P(f) is the complex Fourier transform of the time series at frequency f and E(•) is the expectation operator. The term was introduced into digital signal processing by the American statistician, John Wilder Tukey (1915–2000) about 1953 (Brillinger 1991; Tukey 1953). See also: Brillinger (1965), Brillinger and Rosenblatt (1967a, b) and Brillinger and Tukey (1985); discussed in an earth science context by: Elgar and Sebert (1989), Mendel (1991), Nikias and Petropulu (1993), Persson (2003) and Weedon (2003).
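The following Python sketch (an illustrative implementation under my own assumptions, not code from any of the works cited in the bicoherence entry above) estimates the bicoherence of a signal by averaging FFTs over non-overlapping segments, using the normalisation given in the entry.

```python
import numpy as np

def bicoherence(x, nseg=32):
    """Estimate bicoherence b(f1, f2) by segment averaging.
    Returns a (nfreq, nfreq) array, where nfreq = segment length // 2."""
    x = np.asarray(x, dtype=float)
    seglen = len(x) // nseg
    nfreq = seglen // 2
    f1, f2 = np.meshgrid(np.arange(nfreq), np.arange(nfreq), indexing="ij")
    num = np.zeros((nfreq, nfreq), dtype=complex)   # accumulates B(f1, f2)
    den1 = np.zeros((nfreq, nfreq))                 # E[|P(f1) P(f2)|^2]
    den2 = np.zeros((nfreq, nfreq))                 # E[|P(f1 + f2)|^2]
    for k in range(nseg):
        seg = x[k * seglen:(k + 1) * seglen]
        P = np.fft.fft(seg - seg.mean())
        prod = P[f1] * P[f2]
        psum = P[f1 + f2]
        num += prod * np.conj(psum)
        den1 += np.abs(prod) ** 2
        den2 += np.abs(psum) ** 2
    return np.abs(num / nseg) ** 2 / ((den1 / nseg) * (den2 / nseg))

# Quadratically phase-coupled test signal: components at f1, f2 and f1 + f2.
t = np.arange(2**14)
x = (np.cos(2 * np.pi * 0.05 * t) + np.cos(2 * np.pi * 0.11 * t)
     + 0.5 * np.cos(2 * np.pi * 0.16 * t) + 0.1 * np.random.randn(len(t)))
b = bicoherence(x)
print(b.shape, b.max())   # a high value indicates quadratic phase coupling
```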


Bicubic spline A chain of polynomials of fixed degree (usually cubic functions are used) joined in such a way that they are continuous at the points at which they join (knots). The knots are usually placed at the x-coordinates of the data points. The function is fitted in such a way that it has continuous first and second derivatives at the knots; the second derivative can be set to zero at the first and last data points. Splines were discovered by the Romanian-American mathematician, Isaac Jacob Schoenberg (1903–1990) (Schoenberg 1946). See also: Schoenberg (1971), Ahlberg et al. (1967) and Davis and David (1980); smoothing spline regression, spline, piecewise function.

Bifrequency A reference to two frequencies of a single signal. The term was introduced into digital signal processing by the American statistician, John Wilder Tukey (1915–2000) (Brillinger 1991; Tukey 1953). See: bicoherence, bispectrum.

Bifurcation A sudden change in the behaviour of a dynamical system when a control parameter (p) is varied, resulting in a period-doubling, quadrupling, etc. with the onset of chaos: a system that previously exhibited only one mode of behaviour subsequently exhibits 2, 4, etc. It shows on a logistic map as a splitting of the trace made by the variable representing the behaviour of the system when plotted as a function of p; the splitting becomes more and more frequent, at progressively shorter intervals, as p increases in magnitude. The term was coined by the French mathematical physicist and mathematician, Jules Henri Poincaré (1854–1912) (Poincaré 1885, 1902) but was first used in this context by the Austrian-born German mathematician, Eberhard Frederich Ferdinand Hopf (1902–1983) (Hopf 1942; Howard and Kopell 1976), and the Russian theoretical physicist, Lev Davidovich Landau (1908–1968) (Landau 1944). For earth science discussion see: Turcotte (1997) and Quin et al. (2006). See also: Andronov-Hopf bifurcation, period-doubling bifurcation, pitchfork bifurcation.

Bi-Gaussian approach A method of geostatistical estimation (Marcotte and David 1985) in which the conditioning is based on the simple kriging estimate of the mean value of the Gaussian variable representing the grades of a point, or block, rather than the actual data values. See also: multi-Gaussian approach.

Bilinear interpolation A two-dimensional interpolation method in which values are first interpolated in one direction and then in the orthogonal direction. It was originally used for interpolation in tables, e.g. Wilk et al. (1962). Konikow and Bredehoeft (1978) used the method in computing solute transport in groundwater, and Sheriff (1984) gives the example of first interpolating in time between picks at velocity analysis points and then spatially between velocity analysis positions.

Bilinear mapping, bilinear transform A stability-preserving transform used in digital signal processing to transform continuous-time system representations (analogue signals) to discrete-time (digital signals) and vice versa. It is often used in the design of digital filters from an analogue prototype. Sometimes known as the Tustin transform or Tustin's method, after the British electrical engineer, Arnold Tustin (1899–1994), who first introduced it (Tustin 1947). See Buttkus (1991, 2000) for discussion in an earth science context.

Billings net This graphical net (a Lambert equal-area (polar) projection of the sphere) is used as an aid to plotting structural data (e.g. poles to joint planes). Named for the American structural geologist, Marland Pratt Billings (1902–1996), whose textbook (Billings 1942) greatly helped to promote its use in the analysis of geological structures. This seems a little surprising, as the stereographic net which appeared in the second edition (Billings 1954) is acknowledged by him as being reproduced from a paper by the American structural geologist, Walter Herman Bucher (1888–1965) (Bucher 1944). However, it was the Austrian mineralogist, Walter Schmidt (1885–1945) who was the first to adopt the use of the Lambert projection in petrofabric work in structural geology (Schmidt 1925), and it was first used in macroscopic structural work by Fischer (1930), but it was undoubtedly Billings' work which popularised its use in macro-scale structural geology (Howarth 1996b).

Bimodal distribution A variable with two local maxima in its probability density. Use of the term goes back to about 1900. The first attempt to decompose a bimodal distribution into two normally distributed components in the geological literature appears to be that of the British petrologist, William Alfred Richardson (1887–1965) who, in 1923, applied it to the frequency distribution of silica in igneous rocks (Richardson 1923), using the method of moments originally described by the British statistician, Karl Pearson (1857–1936) (Pearson 1894). Jones and James (1969) discuss the case of bimodal orientation data. See also: frequency distribution decomposition.

Bin 1. One of a set of fixed-interval divisions into which the range of a variable is divided so as to count its frequency distribution. The term is believed to have been first used by the British statistician, Karl Pearson (1857–1936) in his lectures at Gresham College, London, probably in 1892/1893 when he introduced the histogram (Bibby 1986). 2. Sheriff (1984) uses the term for one of a set of discrete areas into which a survey region is divided (it is also used in this sense in astronomical surveys).

Binary coefficient Statistical models for the analysis of binary-coded (presence/absence) data were reviewed by Cox (1970). Cheetham and Hazel (1969) review 22 similarity coefficients for such data in the literature, some of which are discussed in more detail by Sokal and Sneath (1963) and Hohn (1976); see also Hazel (1970) and Choi et al. (2010). Of these, the Dice coefficient, Jaccard coefficient, Otsuka coefficient, Simpson coefficient and simple matching coefficient were embodied in a FORTRAN program for biostratigraphical use by Millendorf et al. (1978).


coefficient and simple matching coefficient were embodied in a FORTRAN program for biostratigraphical use by Millendorf et al. (1978). Binary digit (bit) Usually known by its acronym bit, the term was coined by the American statistician, John Wilder Tukey (1915–2000) about 1946, because the two states of an element in a computer’s core can represent one digit in the binary representation of a number. It first appeared in print in an article by the American mathematician, Claude Elwood Shannon (1916–2001) (Shannon 1948), see also Koons and Lubkin (1949) and Shaw (1950). A series of 8 bits linked together are referred to as a byte (Buchholz 1981). It is mentioned in Davis and Sampson (1973). Binary notation The representation of integer numbers in terms of powers of two, using only the digits 0 and 1. The position of the digits corresponds to the successive powers, e.g. in binary arithmetic: 0 + 0 ¼ 0, 0 + 1 ¼ 1, 1 + 0 ¼ 1, 1 + 1 ¼ 10; decimal 2 ¼ 0010, decimal 3 ¼ 0011, decimal 4 ¼ 0100, etc. and, e.g., decimal 23 ¼ decimal 16 + 4 + 2 + 1, i.e. 10000 + 00100 + 00010 + 00001 ¼ 10111 in binary notation. Although it has been asserted (Leibniz 1703, 1768) that binary arithmetic may have been used in the Chinese I-king [Book of permutations] which is believed to have been written by the Chinese mystic, W€ on-wang (1182–1135 BC), it “has no historical foundation in the I-king as originally written” (Smith 1923–1925, I, 25). Binary arithmetic was discussed by the German mathematician and philosopher, Gottfried Wilhelm von Leibniz (1646–1716), (Leibniz 1703). In computing, the binary numbering system was used in a report (von Neumann 1945) on the EDVAC (Electronic Discrete Variable Automatic Computer), developed under J. Presper Eckert (1919–1995) and John Mauchly (1907–1980) at the Eckert-Mauchly Computer Corporation, USA, in 1946 and its successor, Binac (Binary Automatic Computer) (Eckert-Mauchly Computer Corp. 1949). Statistical models for the analysis of presence/absence data, often coded as {1,0} values, were reviewed by Cox (1970). Binary notation in an earth science context is discussed by Ramsayer and BonhamCarter (1974), who consider the classification of petrographical and palaeontological data when represented by strings of binary variables. See also: Sheriff (1984) and Camina and Janacek (1984); binary coefficient. Binary variable A variable which may take one of only two discrete values, e.g. the presence of a particular lithology in a map cell might be coded: absent ¼ 0, present ¼ 1. Statistical methods for the analysis of such data were reviewed by Cox (1970). See also: Ramsayer and Bonham-Carter (1974); binary coefficient. Bingham distribution A spherical frequency distribution first studied by the American statistician, Christopher Bingham (1937–) in 1964, but only published some years later (Bingham 1964, 1974). It is the distribution of a trivariate vector of normal distributions, all with zero mean and an arbitrary covariance matrix, C, given that the
length of the vector is unity. If the random vector is x = (x1, x2, x3), the probability distribution is given by:

f(x; m, k) = [1/(4π d(k))] exp{k1 (x·m1)^2 + k2 (x·m2)^2 + k3 (x·m3)^2},

where k = (k1, k2, k3) is a matrix of constants, known as the concentrations; m1, m2 and m3 are three orthogonal normalised vectors, the principal axes; m = (m1, m2, m3); d(k) is a constant which depends only on k1, k2 and k3; and e is Euler's number, the constant 2.71828.... See Mardia (1972), Fisher et al. (1993), and Mardia and Jupp (2000) for further discussion. This distribution was popularised for use with paleomagnetic data by Onstott (1980). For other earth science applications, see: Kelker and Langenberg (1976) and Cheeney (1983). See also: e, spherical statistics, Fisher distribution, Kent distribution.

Binomial distribution, binomial model, binomial probability If p is the probability of an event occurring one way (e.g. a "success") and q is the probability of it occurring in an alternative way (e.g. a "failure"), so that p + q = 1, and p and q remain constant in n independent trials, then the probability distribution for x individuals occurring in a sampling unit is:

P(x; n, p) = [n! / (x! (n − x)!)] p^x q^(n−x),

where x is the number of individuals per sampling unit; and k! means k factorial. The arithmetic mean is np and the standard deviation is √(npq). Knowledge of this distribution goes back to the eighteenth century, but the term binomial was introduced by the British statistician, George Udny Yule (1871–1951) (Yule 1911). For discussion in an earth science context, see: Miller and Kahn (1962), Koch and Link (1970–1971), Vistelius (1980, 1992), Agterberg (1984a) and Camina and Janacek (1984). See also trinomial distribution.

Biochronologic correlation A method of correlation between two or more spatial positions based on the dates of first and last appearances of taxa, their reaching a particular evolutionary state, etc. For general reviews, see: Hay and Southam (1978) and Agterberg (1984c, 1990). See also: biostratigraphic zonation, correlation and scaling, ranking and scaling, unitary associations.

Biofacies map A map showing the areal distribution of the biological composition of a given stratigraphic unit based on quantitative measurements, expressed as percentages of the types of group present (e.g. brachiopods, pelecypods, corals, etc.). The American mathematical geologist, William Christian Krumbein (1902–1979) and Laurence
Louis Sloss (1913–1996) used isolines to portray the ratio of cephalopods/(gastropods + pelecypods) in the Mancos Shale of New Mexico (Krumbein and Sloss 1951). See also: lithofacies map.

Biometrical methods, biometrics Statistical and mathematical methods developed for application to problems in the biological sciences have long been applied to the solution of palaeontological problems. The term has been in use in the biological sciences since at least the 1920s, e.g. Hartzell (1924). The journal Biometrics began under the title Biometrics Bulletin in 1945 but changed to the shorter title in 1947 when the Biometrics Society became established in the USA under the Presidency of the English statistician, (Sir) Ronald Aylmer Fisher (1890–1962), and a British "region" followed in 1948. Important early studies include those by the American palaeontologists, Benjamin H. Burma (1917–1982), followed by those of Robert Lee Miller (1920–1976) and Everett Claire Olsen (1910–1993), and by the English vertebrate palaeontologist, Kenneth A. Kermack (1919–2000) (Burma 1948, 1949, 1953; Miller 1949; Olsen and Miller 1951, 1958; Kermack 1954). The American geologist, John Imbrie (1925–) commented (Imbrie 1956) on the slowness with which palaeontologists were taking up such methods and he promoted the use of reduced major axis regression (Jones 1937), introduced into palaeontology by Kermack's (1954) study, while regretting (in the pre-computer era) that practicalities limited such studies to the use of one- or two-dimensional methods. In later years, they embraced multivariate techniques such as principal components analysis, nonlinear mapping and correspondence analysis (Temple 1982, 1992). See also: Sepkoski (2012); biochronologic correlation.

Biostratigraphic zonation A biostratigraphic zone is a general term for any kind of biostratigraphic unit regardless of its thickness or geographic extent. Use of microfossils as an aid to stratigraphic zonation in the petroleum industry dates from about 1925, and graphical depiction of microfossil assemblage abundances as a function of stratigraphic unit position in a succession has been in use since at least the 1940s (Ten Dam 1947; LeRoy 1950a). Methods for achieving quantitative stratigraphic zonation are discussed in Hay and Southam (1978), Cubitt and Reyment (1982), Gradstein et al. (1985), Hattori (1985) and Agterberg (1984c, 1990).

Biphase The phase relationship of two nonlinearly related frequency components. The term was introduced into digital signal processing by the American statistician, John Wilder Tukey (1915–2000) (Tukey 1953). See: bispectrum. See also: Brillinger (1965), Brillinger and Rosenblatt (1967a, 1967b) and Brillinger and Tukey (1985); and in an earth science context: Elgar and Sebert (1989), King (1996) and Weedon (2003).

Biplot, Gabriel biplot Graphical display of the rows and columns of a rectangular n × p data matrix X, where the rows generally correspond to the sample compositions and the columns to the variables. In almost all applications, biplot analysis starts with performing
some transformation on X, depending on the nature of the data, to obtain a transformed matrix Z, which is the one that is actually displayed. The graphical representation is based on a singular value decomposition of the matrix Z. There are essentially two different biplot representations: the form biplot, which favours the display of the individuals (it does not represent the covariance of each variable, so as to better represent the natural form of the data set), and the covariance biplot, which favours the display of the variables (it preserves the covariance structure of the variables but represents the samples as a spherical cloud). Also known as the Gabriel biplot, named for the German-born statistician, Kuno Ruben Gabriel (1929–2003) who introduced the method (Gabriel 1971). See also: Greenacre and Underhill (1982), Aitchison and Greenacre (2002); and, in an earth science context, Buccianti et al. (2006).

Bispectral analysis, bispectrum The bispectrum, B(f1, f2), of a time series measures the statistical dependence between three frequency bands centred at f1, f2, and f1 + f2: B(f1, f2) = E[P(f1) P(f2) P*(f1 + f2)], where P(f) is the complex Fourier transform of the time series at frequency f; E(•) is the expectation operator; and P*(f) is the complex conjugate. Each band will be characterised by an amplitude and phase. If the sum or difference of the phases of these bands is statistically independent, then on taking the average, the bispectrum will tend to zero as a result of random phase mixing; but if the three frequency bands are related, the total phase will not be random (although the phase of each band may be randomly changing) and averaging will yield a peak at {f1, f2} on a graph of B(f1, f2) as a function of f1 and f2. The term was introduced by the American statistician, John Wilder Tukey (1915–2000) in an unpublished paper (Tukey 1953). See also: Tukey (1959b), Mendel (1991) and Nikias and Petropulu (1993); and, in an earth science context: Haubrich (1965), Hagelberg et al. (1991), Rial and Anaclerio (2000), Persson (2003) and Weedon (2003). See also: bicoherence.

bit An acronym for binary digit, coined by the American statistician, John Wilder Tukey (1915–2000) about 1946, because the two states of an element in a computer core can represent one digit in the binary representation of a number. In the binary system, representation of integer numbers is in terms of powers of two, using only the digits 0 and 1; the position of the digits corresponds to the successive powers, e.g. in binary arithmetic 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 10; decimal 2 = 0010, decimal 3 = 0011, decimal 4 = 0100, etc. and, e.g., decimal 23 = decimal 16 + 4 + 2 + 1, i.e. 10000 + 00100 + 00010 + 00001 = 10111 in binary notation. It first appeared in print in an article by the American mathematician, Claude Elwood Shannon (1916–2001) (Shannon 1948). A series of 8 bits linked together is referred to as a byte. Mentioned in Davis and Sampson (1973).

Bit-map A set of bits that represent an image. Armstrong and Bennett (1990) describe a classifier for the detection of trends in hydrogeochemical parameters as a function of time, based on the conversion of concentration-time curves into bit-strings.
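To illustrate the biplot entry above, here is a brief Python sketch (my own illustrative code with hypothetical variable names; the simple column-centring stands in for whatever transformation is appropriate to the data) that takes the singular value decomposition of the centred matrix and forms form-biplot and covariance-biplot coordinates.

```python
import numpy as np

def biplot_coordinates(X):
    """Return two-dimensional row and column scores for the form and
    covariance biplots of a data matrix X (rows = samples, columns = variables)."""
    Z = X - X.mean(axis=0)                 # column-centre the data
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Form biplot: the singular values are assigned to the rows (samples).
    rows_form = U[:, :2] * s[:2]
    cols_form = Vt[:2, :].T
    # Covariance biplot: the singular values are assigned to the columns (variables).
    rows_cov = U[:, :2]
    cols_cov = Vt[:2, :].T * s[:2]
    return (rows_form, cols_form), (rows_cov, cols_cov)

X = np.random.default_rng(1).normal(size=(30, 5))   # 30 samples, 5 variables
(form_rows, form_cols), (cov_rows, cov_cols) = biplot_coordinates(X)
print(form_rows.shape, form_cols.shape)
```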


Bivariate, bivariate frequency distribution 1. The term bivariate is used in the context of the analysis of data in which each observation consists of values from two variables. It came into usage following its use by the British statistician, Karl Pearson (1857–1936) (Pearson 1920). 2. A bivariate frequency distribution is the probability distribution corresponding to the simultaneous occurrence of any pair of values from each of two variables (x and y). It shows not only the univariate frequency distributions for x and y, but also the way in which each value of y is distributed among the values of x, and vice versa. It is also known as a two-way or joint frequency distribution. The distribution of the "joint chance" involving two variables was discussed by the British mathematician, mathematical astronomer and geophysicist, (Sir) Harold Jeffreys (1891–1989) (Jeffreys 1939). However, bivariate frequency distributions were actually used earlier in geology, in an empirical fashion, by the French mathematician and cataloguer of earthquakes, Alexis Perrey (1807–1882) (Perrey 1847) and subsequently by Alkins (1920) and Schmid (1934); see also Miller and Kahn (1962), Smart (1979), Camina and Janacek (1984) and Swan and Sandilands (1995); joint distribution, multivariate.

Black box A conceptual model which has input variables, output variables and behavioural characteristics, but without specification of internal structure or mechanisms explicitly linking the input to output behaviours. The term is used to describe an element in a statistical model which contains features common to most techniques of statistical inference and in which only the input and output characteristics are of interest, without regard to its internal mechanism or structure. Although attributed to the Canadian statistician, Donald Alexander Stuart Fraser (1925–) (Fraser 1968), the term was previously used by the American statistician, John Wilder Tukey (1915–2000) in a geophysical context (Tukey 1959a). For discussion in geoscience applications see: Griffiths (1978a, 1978b), Kanasewich (1981), Tarantola (1984), Spero and Williams (1989), Gholipour et al. (2004), Jiracek et al. (2007) and Cabalar and Cevik (2009).

Black noise Coloured (American English sp. colored) noise can be obtained from white noise by passing the signal through a filter which introduces a degree of autocorrelation, e.g. x(t) = a x(t − 1) + k w(t), where w(t) is a white noise signal; a is a constant, 0 < a < 1; k is the gain; and x(t) is the output signal at time t. The power spectrum density for black noise is either characterised by predominantly zero power over most frequency ranges, with the exception of a few narrow spikes or bands, or varies as f^(−p), p > 2, where f is frequency. The concept of white light as having a uniform power density over its spectrum was first discussed by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1926), and taken up in digital signal processing by the American mathematician Richard Wesley Hamming (1915–1998) and statistician John Wilder Tukey (1915–2000) (Tukey and Hamming
1949); see also Blackman and Tukey (1958). For discussion in an earth science context, see Weedon (2003).
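A small Python sketch of the filtering idea described in the black noise entry above (illustrative only; the coefficient values are arbitrary): white noise is passed through the first-order recursion x(t) = a x(t − 1) + k w(t) to produce autocorrelated (coloured) noise.

```python
import numpy as np

def colour_noise(n, a=0.9, k=1.0, seed=0):
    """Generate coloured noise by recursive filtering of white noise:
    x[t] = a * x[t-1] + k * w[t], with 0 < a < 1."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)     # white noise input
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + k * w[t]
    return x

x = colour_noise(4096)
# The lag-1 autocorrelation should be close to the filter constant a.
print(np.corrcoef(x[:-1], x[1:])[0, 1])
```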


Blackman-Harris window, Blackman-Harris taper Used in the operation of smoothing a periodogram with a lag window of weights applied to a discrete time waveform. N, the length of the window, is typically even and an integer power of 2. The weight for a four-term window, for each point n = 0, 1, 2, ..., (N − 1), is given by:

w(n) = 0.35875 − 0.48829 cos(2πn/N) + 0.14128 cos(4πn/N) − 0.01168 cos(6πn/N).

Named for the American communications engineer, Ralph Beebe Blackman (1904–1990) (Blackman and Tukey 1958) and signal processing and communications specialist, Frederic J. Harris (1940–). Use of this window (Gubbins 2004) was introduced by Harris (1976) and subsequently became more widely known through industrial taught-courses (Harris 1977) and publication (Harris 1978; Rabiner et al. 1970). Window seems to be the preferred usage over taper (Google Research 2012). See also: Bartlett window, boxcar taper, cosine taper, Daniell window, data window, Gaussian taper, Hamming window, Hann window, multi-tapering method, optimal taper, Parzen window, Thomson tapering.

Blackman-Tukey method, Blackman-Tukey spectrum estimation Named for the American communications engineer, Ralph Beebe Blackman (1904–1990) and statistician, John Wilder Tukey (1915–2000) who introduced it (Blackman and Tukey 1958), this method of power spectral density analysis is based on the Fourier transform of the smoothed autocovariance function, which has been computed for lags up to a certain value (the truncation point), so as to eliminate the most noisy values (which are based on only a small number of data) prior to the Fourier transform. The results were shown in one study (Edmonds and Webb 1970) to be similar in practice to those obtained using the Fast Fourier transform (FFT) method, although the latter was found to be superior from the point of view of flexibility of use and computation time. For discussion in an earth science context, see Buttkus (1991, 2000) and Weedon (2003); see also: mean lagged product.

Blake's method A method for determining the ellipticity (strain ratio) from measurements of the pressure-deformed spiral logarithmic growth curve in ammonites, goniatites and cephalopods. Named for the British geologist, John Frederick Blake (1839–1906) (Blake 1878). Mentioned in Ramsay and Huber (1983).
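The following Python sketch is my own illustration of the Blackman-Tukey approach described above (not code from the cited sources; the choice of a Bartlett lag window and the truncation lag are arbitrary): it estimates a power spectrum by computing the autocovariance up to a truncation point, tapering it with a lag window, and Fourier transforming the result.

```python
import numpy as np

def blackman_tukey_psd(x, max_lag):
    """Blackman-Tukey power spectral density estimate: autocovariance up to
    max_lag, tapered by a Bartlett (triangular) lag window, then transformed."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased autocovariance estimates for lags 0 .. max_lag.
    acov = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])
    # Bartlett lag window, tapering smoothly towards zero at the truncation point.
    acov *= 1.0 - np.arange(max_lag + 1) / (max_lag + 1)
    # Build the symmetric (two-sided) autocovariance sequence and transform it.
    full = np.concatenate([acov, acov[-1:0:-1]])
    psd = np.real(np.fft.rfft(full))
    freqs = np.fft.rfftfreq(len(full))
    return freqs, psd

t = np.arange(2048)
x = np.sin(2 * np.pi * 0.1 * t) + 0.5 * np.random.randn(len(t))
freqs, psd = blackman_tukey_psd(x, max_lag=128)
print(freqs[np.argmax(psd)])   # should be close to 0.1 cycles per sample
```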


Blank 1. In analytical geochemistry, a dummy sample which has a chemical composition designed to contain a "zero" quantity of an analyte of interest. The term was in use in this sense in geochemistry by the early 1900s (Strutt 1908; Holmes 1911). 2. In geophysics, to replace a value by zero (Sheriff 1984).

Blind source separation, blind signal separation More usually known as Independent Component Analysis, this is a technique based on information theory, originally developed in the context of signal processing (Hérault and Ans 1984; Jutten and Hérault 1991; Comon 1994; Hyvärinen and Oja 2000; Hyvärinen et al. 2001; Comon and Jutten 2010) intended to separate independent sources in a multivariate time series which have been mixed in signals detected by several sensors. After whitening the data to ensure the different channels are uncorrelated, they are rotated so as to make the frequency distributions of the points projected onto each axis as near uniform as possible. The source signals are assumed to have non-Gaussian probability distribution functions and to be statistically independent of each other. Unlike principal components analysis (PCA), the axes do not have to be orthogonal, and linearity of the mixture model is not required. ICA extracts statistically independent components. Ciaramella et al. (2004) and van der Baan (2006) describe its successful application to seismic data. Blind source separation appears to be the most frequent usage (Google Research 2012).

Block averaging A technique for smoothing spatial distribution patterns in the presence of highly erratic background values, using the mean values of non-overlapping blocks of fixed size so as to enhance the presence of, for example, mineralized zones (Chork and Govett 1979).

Block diagram This is typically an oblique pseudo three-dimensional view of a gridded (contoured) surface with cross-sectional views of two of its sides. It has its origins in diagrams to illustrate geological structure. Early examples were produced as a by-product in computer mapping packages such as SURF (Van Horik and Goodchild 1975) and SURFACEII (Sampson 1975).

Block matrix This is a matrix which is subdivided into sections called blocks. Each block is separated from the others by imaginary horizontal and vertical lines, which cut the matrix completely in the given direction. Thus, the matrix is composed of a series of smaller matrices. A block Toeplitz matrix, in which each block is itself a Toeplitz matrix, is used in Davis (1987b). It is also known as a partitioned matrix, but the term block matrix has become the more widely used since the 1990s (Google Research 2012).

Block model A method of modelling, say, a mineral deposit, by its representation as a grid of three-dimensional blocks. One approach is to use equal sized ("fixed") blocks.
Dunstan and Mill (1989) discuss the use of the octree encoding technique to enable blocks of different sizes to be used, so as to better model the topography of the spatial boundary of the deposit by enabling the use of progressively finer resolution blocks as it is approached.

Blue noise Coloured [U.S. spelling, colored] noise can be obtained from white noise by passing the signal through a filter which introduces a degree of autocorrelation, e.g. x(t) = a x(t − 1) + k w(t), where w(t) is a white noise signal; a is a constant, 0 < a < 1; k is the gain; and x(t) is the output signal at time t. The power spectrum density for blue (or azure) noise increases linearly as f, where f is frequency. The concept of white light as having a uniform power density over its spectrum was first discussed by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1926), and taken up in digital signal processing by the American mathematician Richard Wesley Hamming (1915–1998) and statistician John Wilder Tukey (1915–2000) (Tukey and Hamming 1949); see also Blackman and Tukey (1958). For discussion in an earth science context, see Weedon (2003).

Bochner's theorem This theorem, used in Armstrong and Diamond (1984), is named for the American mathematician of Austro-Hungarian origin, Salomon Bochner (1899–1982). It characterizes the Fourier transform of a positive finite Borel measure on the real line: every positive definite function Q is the Fourier transform of a positive finite Borel measure.

Bochner window This is another name for a window, named after the Austro-Hungarian-American mathematician, Salomon Bochner (1899–1982), used in the operation of smoothing a periodogram with a lag window of weights applied to a discrete time signal (Parzen 1957, 1961). N, the length of the window, is typically even and an integer power of 2; for each point 0 ≤ n ≤ N − 1, the weight is given by:

w(n) = 1 − 6[|n − N/2|/(N/2)]^2 + 6[|n − N/2|/(N/2)]^3, for 0 ≤ |n − N/2| ≤ N/4;
w(n) = 2[1 − |n − N/2|/(N/2)]^3, for N/4 < |n − N/2| ≤ N/2.

It is also named for the American statistician, Emanuel Parzen (1929–2016). Parzen (1962) applied a similar technique to estimation of a density trace. It is also known (Harris 1978) as the Riesz window. See also: Preston and Davis (1976), Buttkus (1991, 2000); spectral window.

Body rotation, body translation Body rotation: when a body moves as a rigid mass by rotation about some fixed point. Body translation: when a body moves without rotation or internal distortion. Both terms were used by Thomson and Tait (1878) and popularised in geology through the work of the English geologist, John Graham Ramsay (1931–) (Ramsay 1967, 1976). See also: Hobbs et al. (1976) and Ramsay and Huber (1983).
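A short Python sketch of the piecewise weights given in the Bochner window entry above (an illustrative implementation of the formula as printed; the window length is arbitrary):

```python
import numpy as np

def bochner_parzen_window(N):
    """Piecewise cubic (Parzen/Bochner) lag window of length N."""
    n = np.arange(N)
    r = np.abs(n - N / 2) / (N / 2)        # |n - N/2| scaled by the half-length
    return np.where(r <= 0.5,
                    1.0 - 6.0 * r**2 + 6.0 * r**3,   # central section, |n - N/2| <= N/4
                    2.0 * (1.0 - r)**3)              # outer section, N/4 < |n - N/2| <= N/2

w = bochner_parzen_window(64)
print(w[32], w[0], w[-1])   # 1.0 at the centre, tapering towards zero at the ends
```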


Boltzmann-Hopkinson theorem Convolution is the integral from i = 0 to t of the product of two functions, ∫_0^t f1(i) f2(t − i) di. For two equal-interval discrete time series a = {a0, a1, a2, ..., an} and b = {b0, b1, b2, ..., bn}, the convolution, usually written as a∗b or a ⨂ b, is c = {c0, c1, c2, ..., cn}, where

c_t = Σ_{i=0}^{t} a_i b_{t−i}.

The operation can be imagined as sliding a past b one step at a time and multiplying and summing adjacent entries. This type of integral was originally used by the French mathematician, Pierre Simon, Marquis de Laplace (1749–1827) (Laplace 1781). The Hungarian-born American mathematician, Aurel Friedrich Wintner (1903–1958) may have been the first to use the English term convolution (Wintner 1934), although its German equivalent Faltung (folding, referring to the way in which the coefficients may be derived from cross-multiplication of the a and b terms and summation of their products along diagonals if they are written along the margins of a square table) appeared in Wiener (1933). The operation has also been referred to as the Boltzmann-Hopkinson theorem, Borel's theorem, Duhamel's theorem, Green's theorem, Faltungsintegral, and the superposition theorem, and a similar result may also be achieved in terms of z-transforms or Fourier transforms. It can also be applied in more than two dimensions (see: helix transform). See also: Tukey and Hamming (1949), Blackman and Tukey (1958), and in an earth science context: Robinson (1967b), Jones (1977), Vistelius (1980, 1992), Camina and Janacek (1984), Buttkus (1991, 2000) and Gubbins (2004); deconvolution.

Boolean algebra A version of standard algebra introduced by the British mathematician George Boole (1815–1864) (Boole 1854), based solely on use of the integer values zero (false) and unity (true). The usual algebraic operations of addition (x + y), multiplication (xy), and negation (−x) are replaced by the operators: OR (disjunction, equivalent to the arithmetic result x + y − xy), AND (conjunction, equivalent to xy), and NOT (negation or complement, equivalent to 1 − x). Mentioned in an earth science context by Vistelius (1972).

Boolean similarity matrix This similarity criterion is named for George Boole (1815–1864), a British mathematician who pioneered the use of binary logic in problem solving (Boole 1854). Each attribute (e.g. the occurrence of n indicator mineral species at m mineralised districts to be compared) is coded as either zero for "absent" or unity for "present." The resultant m (row) × n (column) data matrix (M) is multiplied by its n × m transpose (M^T) to form a product matrix (P). The square roots of the sums of squares of the elements of the rows of P were called the mineral typicalities by the American geologist, Joseph Moses Botbol (1937–) (Botbol 1970). See also characteristic analysis.
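A minimal Python sketch of the discrete convolution sum defined in the Boltzmann-Hopkinson theorem entry above (illustrative code; the series values are arbitrary), checked against numpy's built-in routine:

```python
import numpy as np

def convolve_series(a, b):
    """Discrete convolution c_t = sum_{i=0}^{t} a_i * b_{t-i},
    computed by sliding one series past the other."""
    n = len(a) + len(b) - 1
    c = np.zeros(n)
    for t in range(n):
        for i in range(len(a)):
            j = t - i
            if 0 <= j < len(b):
                c[t] += a[i] * b[j]
    return c

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0, 0.25, 2.0])
print(convolve_series(a, b))
print(np.convolve(a, b))        # should agree with the loop above
```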


Booton integral equation The American mathematician, Norbert Wiener (1894–1964) and the Austrian-born American mathematician, Eberhard Frederich Ferdinand Hopf (1902–1983), who worked with Wiener at the Massachusetts Institute of Technology (1931–1936), devised a method for the solution of a class of integral equations of the form:

f(x) = ∫_0^∞ k(x − y) f(y) dy,

where x ≥ 0 (Wiener and Hopf 1931; Wiener 1949; Widom 1997). The solution for the non-stationary case was developed by the American electrical engineer Richard Crittenden Booton Jr. (1926–2009) in the context of prediction of random signals and their separation from random noise (Booton 1952). The objective is to obtain the specification of a linear dynamical system (Wiener filter) which accomplishes the prediction, separation, or detection of a random signal. For discussion in a geophysical context, see Buttkus (1991, 2000).

Bootstrap A technique which involves computer-intensive resampling of a data set, in order to obtain nonparametric estimates of the standard error and confidence interval for medians, variances, percentiles, correlation and regression coefficients, etc. It is based on repeatedly drawing at random, with replacement, a set of n samples from a pre-existing set of data values and determining the required statistics from a large number of trials. It was introduced by the American statistician, Bradley Efron (1938–) (Efron 1979; Efron and Tibshirani 1993). Examples of earth science applications include: Solow (1985), Campbell (1988), Constable and Tauxe (1990), Tauxe et al. (1991), Joy and Chatterjee (1998), Birks et al. (1990), Birks (1995) and Caers et al. (1999a, b); see also: cross-validation, jackknife.

Borehole log, well log A graphical or digital record of one or more physical measurements (or quantities derived from them) as a function of depth in a borehole; also known as a well log or wireline log, as they are often derived from measurements made by instruments contained in a sonde which is lowered down the borehole (Nettleton 1940; LeRoy 1950b). The first geophysical log ("electrical coring") was made by Henri Doll (1902–1991), Roger Jost and Charles Scheibli over a 5 h period on September 5, 1927, in the Diefenbach Well 2905, in Pechelbronn, France, over an interval of 140 m, beginning at a depth of 279 m, using equipment designed by Doll following an idea for Recherches Électriques dans les Sondages [Electrical research in boreholes] outlined by Conrad Schlumberger (1878–1936) in a note dated April 28, 1927 (Allaud and Martin 1977, 103–108). The unhyphenated well log appears to be by far the most frequent usage (Google Research 2012).

Borel algebra, Borel measure The Borel algebra over any topological space is the sigma algebra generated by either the open sets or the closed sets. A measure is defined on the
sigma algebra of a topological space onto the set of real numbers (ℝ). If the mapping is onto the interval [0, 1], it is a Borel measure. Both are named for the French mathematician, Félix Edouard Justin Émile Borel (1871–1956) and are mentioned in an earth science context by Vistelius (1980, 1992).

Borel's theorem Convolution is the integral from i = 0 to t of the product of two functions, ∫_0^t f1(i) f2(t − i) di. For two equal-interval discrete time series a = {a0, a1, a2, ..., an} and b = {b0, b1, b2, ..., bn}, the convolution, usually written as a∗b or a ⨂ b, is c = {c0, c1, c2, ..., cn}, where

c_t = Σ_{i=0}^{t} a_i b_{t−i}.

The operation can be imagined as sliding a past b one step at a time and multiplying and summing adjacent entries. This type of integral was originally used by the French mathematician, Pierre Simon, Marquis de Laplace (1749–1827) (Laplace 1781). The Hungarian-born American mathematician, Aurel Friedrich Wintner (1903–1958) may have been the first to use the English term convolution (Wintner 1934), although its German equivalent Faltung (folding, referring to the way in which the coefficients may be derived from cross-multiplication of the a and b terms and summation of their products along diagonals if they are written along the margins of a square table) appeared in Wiener (1933). The operation has also been referred to as the Boltzmann-Hopkinson theorem, Duhamel's theorem, Green's theorem, Faltungsintegral, and the superposition theorem, and a similar result may also be achieved in terms of z-transforms or Fourier transforms. It can also be applied in more than two dimensions (see: helix transform). See also: Tukey and Hamming (1949) and Blackman and Tukey (1958), and in an earth science context: Robinson (1967b), Jones (1977), Vistelius (1980, 1992), Camina and Janacek (1984), Buttkus (1991, 2000) and Gubbins (2004); deconvolution.

Boundary condition A constraint that a function must satisfy along a boundary. Knopoff (1956) and Cheng and Hodge (1976) are early examples of usage in geophysics and geology respectively.

Boundary value problem Solution of a differential equation with boundary conditions. The term was used in mathematics in Birkhoff (1908). Wuenschel (1960) and Cheng and Hodge (1976) are early examples of usage in geophysics and geology respectively.
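As a brief illustration of the boundary value problem entry above (a hypothetical example; neither the equation nor the solver choice comes from the cited works), the following Python fragment solves a simple two-point boundary value problem with scipy.

```python
import numpy as np
from scipy.integrate import solve_bvp

# Solve y'' + y = 0 with boundary conditions y(0) = 0 and y(1) = 1.
def fun(x, y):
    # y[0] is y and y[1] is y'; return their derivatives.
    return np.vstack([y[1], -y[0]])

def bc(ya, yb):
    # Residuals of the two boundary conditions.
    return np.array([ya[0], yb[0] - 1.0])

x = np.linspace(0.0, 1.0, 11)        # initial mesh
y0 = np.zeros((2, x.size))           # initial guess for y and y'
sol = solve_bvp(fun, bc, x, y0)
# Compare the numerical solution at x = 0.5 with the exact value sin(x)/sin(1).
print(sol.sol(0.5)[0], np.sin(0.5) / np.sin(1.0))
```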


Box-Cox transform A general method of transformation of a skewed (asymmetrical) frequency distribution into one which is more symmetrical, for the purposes of statistical analysis:

x* = (x^λ − 1)/λ, if λ ≠ 0;
x* = log_e(x), if λ = 0,

where e is Euler's number, the constant 2.71828.... In practice, the value of λ is determined empirically so that it minimises one or more measures of the asymmetry of the distribution (e.g. skewness). Introduced by the British-born American chemist and mathematician, George Edward Pelham Box (1919–2013) and statistician, (Sir) David Roxbee Cox (1924–) (Box and Cox 1964); it is also known as the power transformation. Introduced into geochemical usage by Howarth and Earle (1979), its usage has been further developed by Joseph and Bhaumik (1997) and Stanley (2006a, b).

Box-count dimension This is a popular term for an estimator of the fractal dimension (D > 0) of a two-dimensional spatial point pattern. The area occupied by the set of points is covered with a square mesh of cells, beginning with one of diameter d, sufficient to cover the whole of the area occupied by the point set. The mesh size is then progressively decreased, and the number of occupied cells, N(d), at each size step is counted. Then, N(d) = c d^(−D), where c is a constant; a graph of log[N(d)] (y-axis) as a function of log(d) (x-axis) will be linear with a slope of −D. This is more properly known as the Minkowski or Minkowski-Bouligand dimension, named after the Russian-born German mathematician, Hermann Minkowski (1864–1909) and the French mathematician, Georges Louis Bouligand (1889–1979). See: Minkowski (1901), Bouligand (1928, 1929), Mandelbrot (1975a, 1977, 1982), Turcotte (1997) and Kenkel (2013) for a cautionary note on sample-size requirements for such dimensionality estimation methods. Taud and Parrot (2005) discuss methods applied to topographic surfaces. See also Richardson plot.

Box-Jenkins process A stationary process in which the value of a time series at time t is correlated in some way with the value(s) in the previous time steps. An autoregressive moving average process, ARMA(p, q), is:

x_t − m = φ_1(x_{t−1} − m) + φ_2(x_{t−2} − m) + ... + φ_p(x_{t−p} − m) + ε_t − θ_1 ε_{t−1} − θ_2 ε_{t−2} − ... − θ_q ε_{t−q},

where m is the mean level; ε is a white noise process with zero mean and a finite and constant variance; φ_i, i = 1, ..., p and θ_j, j = 1, ..., q are the parameters; and p, q are the orders. To obey the assumption of stationarity, the absolute values of φ_1 and θ_1 should be less than unity. The basic idea was introduced by the Swedish statistician, Herman Ole Andreas Wold (1908–1992) (Wold 1938), and later developed by the British-born
American chemist and mathematician, George Edward Pelham Box (1919–2013) and statistician, Gwilym Meirion Jenkins (1933–1982) (Box and Jenkins 1970). For discussion in an earth science context, see: Camina and Janacek (1984), Sarma (1990), Buttkus (1991, 2000) and Weedon (2003); see also: autoregressive process.

Boxcar distribution A probability density in which the probability of occurrence of the value of a variable f(x) is the same for all values of x lying between xmin and xmax inclusive, and zero outside that range (Vistelius 1980, 1992; Feagin 1981; Camina and Janacek 1984). The distribution is named after the shape of a "boxcar" railway freight waggon, a term which has been used in U.S. English since at least the 1890s. It is also known as the rectangular or uniform distribution.

Boxcar taper, boxcar weighting function, boxcar window The boxcar taper or window (Blackman and Tukey 1958; Alsop 1968) is named after the shape of a "boxcar" railway freight waggon, a term which has been used in American English since at least the 1890s, and is used in the operation of smoothing a periodogram with a lag window of weights applied to a discrete time waveform. N, the half-width of the window, is typically even and an integer power of 2; for each point within 0 ≤ n ≤ N − 1, the weight w(n) = 1, otherwise it is zero. Its shape contrasts with that of the smoothly changing weights in windows which are tapered. It is also known as a Daniell window (Blackman and Tukey 1958), rectangular window (Harris 1978), and Dirichlet window (Rice 1964; Harris 1978); see also: Camina and Janacek (1984) and Gubbins (2004).

Boxplot A graphical display, originally devised by the American statistician, John Wilder Tukey (1915–2000) (Tukey 1977; McGill et al. 1978), which is extremely useful for the simultaneous comparison of a number of frequency distributions (e.g. concentrations of a trace element in a number of different sampled rock types). For each set of data, the top and bottom of a central "box" are given by the first and third quartiles (Q1, Q3), so the rectangle formed by the box (which is conventionally drawn parallel to the vertical axis, corresponding to increasing magnitude of the variable studied) encloses the central 50% of the frequency distribution. The position of the second quartile (the median) is shown by a horizontal line dividing the box. In the most useful version of the graph, so-called whiskers are drawn outwards from the top and bottom of the box to the smallest data value lying within Q1 and Q1 − 1.5R, where R = Q3 − Q1, and to the largest data value lying within Q3 and Q3 + 1.5R; any data values lying further out are deemed to be outliers and are plotted individually. Less informative plots are produced by simply extending the whiskers out to the maximum and minimum of the data values. In a multi-group comparison, box-width can be made proportional to the sample size of each group. See Helsel (2005) for discussion of the treatment of data containing nondetects. Although the spelling box-and-whisker plot was originally used, the contractions boxplot or box plot now appear to be equally frequent (Google Research 2012). See also: notched boxplot and Chambers et al. (1983), Kurzl (1988), Frigge et al. (1989), Helsel and Hirsch (1992) and Reimann et al. (2008) for examples of usage.
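Looking back at the Box-Cox transform entry above, the sketch below (my own illustration; the grid of λ values and the use of sample skewness as the asymmetry measure are arbitrary choices) selects λ empirically by minimising the absolute skewness of the transformed values.

```python
import numpy as np
from scipy.stats import skew

def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1)/lam for lam != 0, ln(x) for lam == 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam

def best_lambda(x, grid=np.linspace(-2, 2, 81)):
    """Return the lambda on the grid giving the least-skewed transformed data."""
    scores = [abs(skew(box_cox(x, lam))) for lam in grid]
    return grid[int(np.argmin(scores))]

rng = np.random.default_rng(0)
data = rng.lognormal(mean=1.0, sigma=0.8, size=500)   # positively skewed data
lam = best_lambda(data)
print("chosen lambda:", lam, "skewness after transform:", skew(box_cox(data, lam)))
```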



Branching process A Markov process that models a population in which each individual in generation n produces some random number of offspring in generation (n + 1), according to a fixed probability distribution which does not vary from individual to individual, the lines of descent "branching out" as new members are born. It has been applied to the study of the evolution of populations of individuals who reproduce independently. The mathematical problem was originally solved by the French statistician, Irénée-Jules Bienaymé (1796–1878), who published, without proof (Bienaymé 1845), the statement that eventual extinction of a family name would occur with a probability of one if, and only if, the mean number of male children is less than or equal to one (Heyde and Seneta 1977). The topic was revisited with the work of the English statistician, (Sir) Francis Galton (1822–1911) and mathematician, Rev. Henry William Watson (1827–1903) (Galton and Watson 1874; Watson and Galton 1875). Modern work began in the 1940s (Kolmogorov and Dmitriev 1947, 1992; Harris 1963; Jagers 1975). Discussed in the context of earthquake-induced crack-propagation in rocks by Vere-Jones (1976, 1977). See also Turcotte (1997).

Bray-Curtis coefficient A measure of the similarity of one sample to another in terms of their p-dimensional compositions. Given two samples j and k and percentages of the i-th variable (e.g. in ecological or paleoecological studies, species abundance) in each sample, the Bray-Curtis metric, named for American botanists and ecologists, J. Roger Bray (1929–) and John T. Curtis (1913–1961), is:

d_jk^BC = 2 Σ_{i=1}^{p} min(x_ij, x_ik) / Σ_{i=1}^{p} (x_ij + x_ik),

where min() implies the minimum of the two counts where a species is present in both samples (Bray and Curtis 1957). In their usage, the data were first normalized by dividing the percentages for each species by the maximum attained by that species over all samples. However, Bray and Curtis attribute this formulation to Motyka et al. (1950) and Osting (1956). Use of the minimum abundance alone was proposed as an "index of affinity" by Rogers (1976). An alternative measure:

d_jk^S = 100 [1 − Σ_{i=1}^{p} |x_ij − x_ik| / Σ_{i=1}^{p} (x_ij + x_ik)],

where the difference without regard to sign (the absolute difference) replaces the minimum, has been used in Stephenson and Williams (1971) and later studies, but use of this measure has been criticised by Michie (1982). See also the comments by Somerfield (2008).
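A minimal Python sketch of the two measures defined above (the percentage vectors are invented purely for illustration):

    import numpy as np

    def bray_curtis(xj, xk):
        """Bray-Curtis similarity: 2 * sum of minima / sum of totals."""
        xj, xk = np.asarray(xj, float), np.asarray(xk, float)
        return 2.0 * np.minimum(xj, xk).sum() / (xj + xk).sum()

    def abs_difference_measure(xj, xk):
        """Alternative measure: 100 * (1 - sum of |differences| / sum of totals)."""
        xj, xk = np.asarray(xj, float), np.asarray(xk, float)
        return 100.0 * (1.0 - np.abs(xj - xk).sum() / (xj + xk).sum())

    a = [10.0, 40.0, 30.0, 20.0]   # percentages of p = 4 species in sample j
    b = [15.0, 35.0, 25.0, 25.0]   # percentages of the same species in sample k
    print(bray_curtis(a, b), abs_difference_measure(a, b))

For comparison, scipy.spatial.distance.braycurtis returns the complementary dissimilarity, sum of |x_ij − x_ik| divided by the sum of (x_ij + x_ik).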


Breakage model, breakage process Theoretical statistical models for the size frequency distribution which results from progressive breakage of a single homogeneous piece of material. First discussed by the Russian mathematician, Andrey Nikolaevich Kolmogorov (1903–1987), (Kolmogorov 1941a, 1992) the result of a breakage process (Halmos 1944; Epstein 1947) yielded size distributions which followed the lognormal distribution, but it was subsequently found that this model may not always fit adequately. Applied to consideration of the comminution of rocks, minerals and coal, see Filippov (1961) and more recently discussed in connection with the formation of the lunar regolith (Marcus 1970; Martin and Mills 1977). See the discussion in the context of particle-size distribution by Dacey and Krumbein (1979); see also: Rosin's law, Pareto distribution.

Breakpoint The point at which a statistically significant change in amplitude in the mean and/or variance of a time series occurs, indicating a change in the nature of the underlying process controlling the formation of the time series. Generally detected by means of a graph of the cumulative sum of mean and/or variance as a function of time (Montgomery 1991a) in which changepoints are indicated by a statistically significant change in slope, e.g. Green (1981, 1982) but see discussion in Clark and Royall (1996). See also: Leonte et al. (2003); segmentation.

Breddin curves In structural geology, a set of curves of angular shear strain (ψ; y-axis) as a function of orientation of the greatest principal extension direction (φ; x-axis) for differing values of the strain ratio, or ellipticity, (R). The strain ratio in a given case may be estimated by matching a curve of observed ψ versus φ as found from field measurements of deformed fossils with original bilateral symmetry. Introduced by the German geologist, Hans Breddin (1900–1973) (Breddin 1956); see Ramsay and Huber (1983).

Briggsian or common logarithm (log) An abbreviation for the common (i.e. base-10) logarithm. If x = z^y, then y is the logarithm to the base z of x, e.g. log10(100) = 2; and log(xy) = log(x) + log(y); log(x/y) = log(x) − log(y), etc. The principle was originally developed by the Scottish landowner, mathematician, physicist and astronomer, John Napier, 8th Laird of Merchiston (1550–1617), who produced the first table of natural logarithms of sines, cosines and tangents, intended as an aid to astronomical, surveying and navigational calculations (Napier 1614; Napier and Briggs 1618; Napier and Macdonald 1889). "The same were transformed, and the foundation and use of them illustrated with his approbation" by the British mathematician, Henry Briggs (1561–1630), who following discussions with Napier, whom he visited in 1615 and 1616, developed the idea of common logarithms (sometimes called Briggsian logarithms), defining log(1) = 0 and log(10) = 1, and obtaining the intermediate values by taking successive roots, e.g. √10 is 3.16227, so log(3.16227) = 0.50000, etc. His first publication (Briggs 1617) consisted of the first 1000 values computed, by hand, to 14 decimal places (they are almost entirely accurate to within 10^−14; see Monta (2015) for an interesting



analysis). A full table was initially published in Latin (Briggs 1624). After Briggs' death an English edition was published "for the benefit of such as understand not the Latin tongue" (Briggs 1631). Briggs' logarithms were soon being applied in works on geophysics, e.g. by the English mathematician, Henry Gellibrand (1597–1637) who was studying terrestrial magnetism (Gellibrand 1635). The first extensive table of (Briggsian) anti-logarithms was made by the British mathematician, James Dodson (?1705–1757) (Dodson 1742). All the tables mentioned here were calculated by hand as mechanical calculators did not come into use until the beginning of the twentieth Century. Although 10 is the common or Briggsian base, others may be used, see: Napierian logarithm and phi scale.

Broken-line distribution This refers to the shape of the cumulative distribution of two complementarily truncated normal or lognormal distributions, which form two straight lines which join at an angle at the truncation point. Parameter estimation uses a numerical estimation of maximum likelihood. Applied by the British physicist, Cecil Reginald Burch (1901–1983) to analysis of major and trace element geochemical distributions (Burch and Murgatroyd 1971).

Brown noise Coloured (colored, American English sp.) noise can be obtained from white noise by passing the signal through a filter which introduces a degree of autocorrelation, e.g. x(t) = a·x(t − 1) + k·w(t), where w(t) is a white noise signal; a is a constant, 0 < a < 1; k is the gain, and x(t) is the output signal at time t. The power spectrum density for brown noise decreases as 1/f^2. The concept of white light as having a uniform power density over its spectrum was first discussed by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1926), and taken up in digital signal processing by the American mathematician Richard Wesley Hamming (1915–1998) and statistician John Wilder Tukey (1915–2000) (Tukey and Hamming 1949); see also Blackman and Tukey (1958). For discussion in an earth science context, see Weedon (2003).

Brownian motion, Brownian walk Now generally considered in the context of a one-dimensional time series in which over a fixed interval (T) the variance is proportional to T and the standard deviation is proportional to √T. In fractional Brownian motion (fractal), the variance is proportional to T^2H and the standard deviation to T^H, where H is the Hurst exponent. It is named for the British botanist, Robert Brown (1773–1858), who first described the phenomenon (Brown 1828), which he observed in 1827 in microscopic examination of the random movement of pollen grains suspended in water. In 1905, the German-American physicist, Albert Einstein (1879–1955), unaware of Brown's observations, showed theoretically (Einstein 1905, 1926) that the random difference between the pressure of molecules bombarding a microscopic particle from different sides would cause such movement, and that the probability distribution of a particle moving a distance d in a given time period in a given direction would be governed by the normal distribution. His theory of Brownian motion was verified in emulsions by the


French physicist, Jean-Baptiste Perrin (1870–1942), following invention of the ultramicroscope (Perrin 1908; Newburgh et al. 2006). See also: Wiener (1923, 1949) and Weedon (2003); random walk.

Buffon's needle problem This was first posed in an editorial comment by the French natural historian and mathematician, Georges-Louis Leclerc, Comte de Buffon (1707–1788) in 1733. It seeks the probability P(x) with which a needle of given length l, dropped at random onto a floor composed of parallel strips of wood of constant width d, will lie across the boundary between two of the strips. He showed (Buffon 1777, 46–123) that if d ≥ l then

P(x) = 2l/(πd),

and if d < l, then

P(x) = [2l/(πd)](1 − cos θ) + (π − 2θ)/π,

where θ = arcsin(d/l). In modern times, it has been used as a model for an airborne survey seeking a linear target and flying along parallel, equi-spaced, flight lines (Agos 1955; McCammon 1977). Chung (1981) solved the problem for the case of search using unequally-spaced parallel strips and a needle with a preferred orientation.

Bug An error in a computer program, or hardware (International Business Machines [undated]) which causes it to produce erroneous, or unexpected, results. Although use of the term in this context was popularised following work in engineering, radar and early computers in the late 1940s (Shapiro 1987), its origins go back to nineteenth Century telegraphy and its use by Thomas Edison to indicate the occurrence of some kind of problem in electrical circuits (Edison 1878; Mangoun and Israel 2013).

Burg algorithm A method of spectrum analysis, also known as the maximum entropy method, introduced by the American geophysicist, John Parker Burg (1931–) in 1967–1968 (Burg 1967, 1968, 1975). It minimizes the forward and backward prediction errors in the least squares sense, with the autoregressive coefficients constrained to satisfy the Levinson-Durbin recursion. For earth science applications see: Ulrych (1972), Ulrych et al. (1973), Camina and Janacek (1984), Yang and Kouwe (1995), Buttkus (1991, 2000) and Weedon (2003).

Burnaby's similarity coefficient This is a weighted similarity coefficient. The English palaeontologist, Thomas Patrick Burnaby (1924–1968) discussed the use of character weighting in the computation of a similarity coefficient in a paper, originally drafted in


1965, which was only published posthumously (Burnaby 1970). See Gower (1970) for a critique of Burnaby’s approach.


Burr distribution Named for the American statistician, Irving Wingate Burr (1908–1989), this right-skew distribution, introduced by Burr (1942), is

f(x) = c k x^(c−1) / (1 + x^c)^(k+1)

and the cumulative distribution

F(x) = 1 − 1/(1 + x^c)^k,

where x ≥ 0 and with shape parameters c ≥ 1 and k ≥ 1. The Weibull, exponential and log-logistic distributions can be regarded as special cases of the Burr distribution. It has been widely applied in reliability studies and failure-time modelling. Discussed in an earth science context by Caers et al. (1999a, b).

Burr-Pareto logistic distribution A bivariate distribution which fits bivariate data which exhibit heteroscedasticity quite well. Introduced by Cook and Johnson (1981) as a unifying form for the multivariate versions of the Burr, Pareto and logistic distributions and used by them (1986) in analysis of hydrogeochemical data from a large regional survey. Named for the American statistician, Irving Wingate Burr (1908–1989) and the French economist, Vilfredo Pareto (1848–1923).

Butterfly effect The property of sensitivity of a dynamical system to initial conditions. The idea was first popularised by the French mathematical physicist and mathematician, Jules Henri Poincaré (1854–1912) (Poincaré 1908), but the term itself apparently arose from the title of a talk given by the American meteorologist, Edward Norton Lorenz (1917–2008) to the American Association for the Advancement of Science in 1972: "Predictability: Does the flap of a butterfly's wings in Brazil set off a tornado in Texas?" See: Lorenz attractor.

Butterworth filter An electronic filter designed to have as flat as possible a frequency response (i.e. free of ripples) in the passband; the gain drops off in a linear fashion towards negative infinity away from the edges of the passband. Introduced by the British physicist, Steven Butterworth (1885–1958) (Butterworth 1930). Mentioned in an earth science context by Buttkus (1991, 2000) and Gubbins (2004).
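A minimal sketch of applying a low-pass Butterworth filter to a noisy signal using scipy's signal-processing routines (the filter order, cut-off frequency and sampling rate below are arbitrary illustrative choices):

    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 100.0                          # sampling frequency, Hz (illustrative)
    t = np.arange(0, 10, 1 / fs)
    x = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.random.randn(t.size)   # 1 Hz signal plus noise

    # 4th-order low-pass Butterworth with a 5 Hz cut-off
    b, a = butter(N=4, Wn=5.0, btype="low", fs=fs)
    y = filtfilt(b, a, x)               # forward-backward filtering: zero-phase output

filtfilt runs the filter forwards and then backwards over the data, so the combined response has no phase shift; a single forward pass (lfilter) would be the causal alternative.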


Byte A sequence of eight bits. The term was introduced by the German-American computer scientist, Werner Buchholz (1922–2003) in 1956, when he was working on the design of the International Business Machines (IBM) 7030 “Stretch” computer, their first transistorized supercomputer, to describe the number of bits used to encode a single character of text in a computer (Buchholz 1962, 1981). Mentioned in Davis and Sampson (1973).

C

C, C++ 1. C is a general-purpose computer programming language, which produces compact, efficient code which can be run on different computers with minimal changes and still widely used in application and operating system development. It was originally developed by the American computer pioneer, Dennis MacAlistair Ritchie (1941–2011), at the AT & T Bell Laboratories Computing Sciences Research Centre, Murray Hill, NJ, USA. Together with Canadian, Brian Kernighan (1942–), he wrote the definitive book on the language (Kernighan and Ritchie 1978), with a second edition in 1988 when the ANSI standard version was brought out. With Kenneth Lane Thompson (1943–), he was also one of the developers of the Unix computer operating system. This was originally developed in assembler language in 1969, but by 1973 it had been recoded in C, which greatly aided its portability. One of the first programs coded in C to be published in the earth sciences was a geostatistical simulation program (Gómez-Hernández and Srivastava 1990); a second early example is Kutty and Gosh (1992). 2. C++ is a general-purpose object-oriented programming language, originally developed in 1979 by the Danish computer scientist, Bjarne Stroustrup (1950–) at AT & T Bell Laboratories, Murray Hill, NJ, USA, as an enhancement to the C programming language. It is suitable for the development of both object-oriented and procedural code. Originally known as “C with Classes,” after years of development, it was renamed C++ in 1983, and the first commercial implementation of the language was released two years later (Stroustrup 1985). An application as a high-level specification language in computer cartography, in the context of geographical information systems, was described by Frank and Egenhofer (1992).





Cagniard’s method The French geophysicist, Louis Cagniard (1900–1971) devised an ingenious, but extremely complex, method for the solution of problems involving the interaction of a dynamic source with an elastic half-space (i.e. a body of infinite extent limited by a flat surface), applicable to the theory of seismic wave propagation in stratified media (Cagniard 1939, 1962). His method was gradually simplified and made more tractable by the work of Dix (1954, 1958), de Hoop (1960, 1988) and Bleistein and Cohen (1992); see also Ben-Hador and Buchen (1999). For a more general discussion of this type of problem see: Červený and Ravindra (1971).

Calculus of variations A field of mathematics, pioneered by the Swiss mathematician and physicist, Leonhard Euler (1707–1783) in 1756 (Euler 1766), concerned with functionals (a function which takes a vector as its argument and returns a scalar), as opposed to ordinary calculus which deals with functions (Weinstock 1947). It is applied to finding a path, curve, or surface, for which a given function has a stationary value (in most physical problems, this will be a minimum or maximum). See Lanzano (1987) and Buttkus (1991, 2000) for examples of its application in geophysics.

Calibration Setting up an instrument so that its readout (y) corresponds in a known way to the actual values of the variable of interest (x), e.g. the concentration of an analyte. Generally accomplished by determining instrumental readings on a series of equi-spaced reference materials spanning the range of interest (e.g. the required concentration range). So as to assess the inherent error, readings should be made at least in duplicate for every reference value. The calibration is performed by regression of instrumental responses y as a function of the reference values x (usually a linear model is applicable). The regression equation is then inverted to obtain estimates of x given y. Although it is often assumed that error in the reference values may be neglected, statistical methods (errors-in-variables regression) are available which can take this into account. For useful discussion of techniques, see the classic papers: Mandel and Linnig (1957) and Eisenhart (1962); and, for more recent developments: Swallow and Trout (1983), Knafl et al. (1984), Vecchia et al. (1989), Spiegelman et al. (1991), Miller (1991), Analytical Methods Committee (1994), Webster (1997) and Jones and Rocke (1999). Mentioned in an earth science context by Nettleton (1940) and Heiland (1940). The term has also been applied to checking the results of a computer-based model against actual data, e.g. in Cathles (1979).

Canonical correlation, canonical variates The American statistician, Harold Hotelling (1895–1973) showed that given two sets of variables: x1, x2, ∙∙∙, xp and xp+1, xp+2, ∙∙∙, xp+q, it is possible to linearly transform both sets into two new sets of variables: g1, g2, …, gp and gp+1, gp+2, ∙∙∙, gp+q in such a way that the members of each group are uncorrelated among themselves; each member of one group is only correlated with one member of the other; and non-zero correlations between the members of the two groups are maximised (Hotelling 1936). He called the new variables canonical variates and the correlations between them canonical correlations. Suppose we have three elements (Zn, Cd, Cu) representing potential mineralisation and a further two (Fe, Mn) representing environmental effects in a set of


stream sediment samples, then if the canonical variables are U and V, we find the coefficients of U = a1Zn + a2Cd + a3Cu and V = b1Fe + b2Mn such that maximum correlation exists between U and V. A scatterplot of the samples in terms of U and V may well separate background from anomalous mineralisation effects. See: Lee and Middleton (1967) and Lee (1981) for applications to mapping; Reyment (1991) for discussion in the context of morphometrics; and Pan and Harris (1992) for its application to favorability function analysis.

Canonical transformation A transformation from one set of coordinates and momenta to another set in which the equations of motion are preserved. First introduced by the French astronomer and mathematician, Charles-Eugène Delaunay (1816–1872) in a study of the Earth-Moon-Sun system (Delaunay 1860, 1867). An early example of discussion in a geophysical context is Scheidegger and Chaudhari (1964).

Cantor dust, Cantor set A fractal set (Cantor set) generated by recursively subdividing a straight line into three equal parts and removing the central third; this process is repeated on each "occupied" line segment. As the number of iterations tends to infinity, the total occupied line length tends to zero. At this limit the Cantor set is known as a Cantor dust, an infinite set of clustered points. The set has a dimension of ~0.63. Named for the Russian-born German mathematician Georg Ferdinand Ludwig Philipp Cantor (1845–1918), who first described it in his papers on set theory (Cantor 1883, 1884). See Turcotte (1997) for discussion in an earth science context.

Cardinal sine function (sinc, sincn) The cardinal sine function (from the Latin sinus cardinalis) is better known in mathematics as the function sinc(x). This is historically defined in the unnormalised form as sinc(x) = sin(x)/x, and

∫_{−∞}^{∞} sinc(x) dx = π.

In signal processing, e.g. Woodward (1953), it is more convenient to use it in normalised form, which is sincn(x) = sin(πx)/(πx); since sincn(0) = 1 and sincn(k) = 0 for non-zero integer values of k, and

∫_{−∞}^{∞} sincn(x) dx = 1.

In its unnormalised form, it is also known as the sampling function and interpolation function. It is of interest in signal processing (Gubbins 2004) because it is the impulse response of the ideal low-pass filter, the Fourier transform of a boxcar function (Daniell window) which cuts off at half the sampling rate (i.e. −π and π).
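A brief Python check of the two forms of the cardinal sine function (numpy's sinc is the normalised form described above):

    import numpy as np

    x = np.array([0.0, 0.5, 1.0, 2.0, 3.5])
    print(np.sinc(x))                 # normalised form: 1 at x = 0, exactly 0 at non-zero integers

    def sinc_unnormalised(x):
        """Unnormalised cardinal sine, sin(x)/x, built from the normalised one."""
        return np.sinc(np.asarray(x) / np.pi)

    print(sinc_unnormalised(np.pi))   # sin(pi)/pi, ~0 apart from rounding error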



Cartesian coordinates A method of specifying the position of a point P in 2-, 3-, or higher-dimensional space with reference to 2, 3 or more axes with a common origin O, corresponding to zero in the coordinate system. Unless otherwise specified, the axes are assumed to be orthogonal (i.e. providing a rectangular as opposed to an oblique coordinate system). The coordinates of P, say {x, y, z}, specify how many units it lies away from O along the direction of each axis. The concept is fundamental to both map construction and the development of statistical graphics. The idea of coordinates was first proposed by the French mathematician and philosopher René Descartes (1596–1650) in an appendix to his book on scientific method (Descartes 1637). However, it was the German mathematician and philosopher, Gottfried Wilhelm von Leibniz (1646–1716) who introduced the terms x-axis, y-axis (ordinata) and coordinates (coordinatae) (Leibniz 1692).

Cartography The first computer-contoured map was produced in the 1950s and the first major project in computer cartography, the Canadian Geographical Information System, began in 1963. See: Rhind (1977), Monmonier (1982), Cromley (1992), Clarke (1995) and Dent et al. (2008) for general reviews; early geological examples are discussed by: Wynne-Edwards et al. (1970), Rhind (1971), Smith and Ellison (1999) and geochemical examples by: Nichol et al. (1966), Howarth (1977a), Webb and Howarth (1979), Lecuyer and Boyer (1979) and Howarth and Garrett (2010). See also: choropleth map, three-component map, windrose map, interpolation.

Cascading regression A technique introduced in an earth science context by Karlinger and Troutman (1985) for predicting a value of a dependent variable when no paired measurements exist to perform a standard regression analysis. For example, one may have suitable pairs of data points to obtain the linear regression equations: Y ≅ a + bX and Z ≅ c + dY, hence Z ≅ c + d(a + bX), giving the final estimated equation: Z = p + qX, where p = c + da and q = db.

Catastrophe theory A branch of mathematical modelling which attempts to reproduce phenomena in which completely different discontinuous results can arise from the same continuous process. It was initially developed by the French topologist, René Thom (1923–2002) (Thom 1968), with the intention of studying the bifurcations of particular dynamical systems. The term catastrophe theory was informally introduced by the Japanese-born British mathematician, (Sir) Erik Christopher Zeeman (1925–) who worked with Thom (Zeeman 1971, 1976). Its possible relevance to geological processes was noted by Henley (1976) and Cubitt and Shaw (1976) but only recently has it been shown to be useful in predictive modelling (Quin et al. 2006). See also: Thom (1972, 1975), Aubin (1998) and Castrigiano and Hayes (2003) and, in earth science: Chillingworth and Furness (1975), Lantzy et al. (1977) and subsequent discussion (Platt 1978; Kitchell 1979).
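A small numerical sketch of the cascading-regression composition just described (Python; the regression coefficients a, b, c, d are invented for illustration):

    # suppose the two available regressions are Y = a + b*X and Z = c + d*Y
    a, b = 2.0, 0.5      # illustrative coefficients fitted to the X-Y data pairs
    c, d = 1.0, 3.0      # illustrative coefficients fitted to the Y-Z data pairs

    # composing them: Z = c + d*(a + b*X) = (c + d*a) + (d*b)*X
    p = c + d * a        # intercept of the cascaded equation
    q = d * b            # slope of the cascaded equation

    x = 10.0
    print(p + q * x)             # prediction via the cascaded equation
    print(c + d * (a + b * x))   # identical prediction via the two-step route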


Catastrophic rupture, catastrophic splitting model A numerical model of the size frequency distribution resulting from the breaking of a rock particle into a finite (or countably infinite) number of smaller particles as the result of the impact of another body, e.g. a meteoroid, upon it. The Russian statistician, Andrey Nikolaevich Kolmogorov (1903–1987) developed a number of random independent splitting models with self-similar one-shot splitting laws (Kolmogorov 1941a, 1992). Assuming that each impacted rock splits according to a Poisson process whose rate depends only on size, Marcus (1970) applied Filippov's (1961) extension of Kolmogorov's work, which produced satisfactory number frequency distributions of particle mass according to an inverse power-law combined with a surface-count correction for small particles subjected to coverage and burial (Hartmann 1967; Melroy and O'Keefe 1968). For discussion of the size distribution of observed samples, beginning with those obtained from the Apollo 11 and Apollo 12 Lunar missions, which proved to have a slightly bimodal distribution, attributed to contamination from the regolith of the highland regions, see King et al. (1971), Carrier (1973, 2003) and Graf (1993).

Category theory A mathematical conceptual framework enabling consideration of the universal components of a family of structures of a given kind, and how such structures are inter-related (Marquis 2010). As originally introduced by Polish-American mathematician, Samuel Eilenberg (1913–1998) and American mathematician, Saunders Mac Lane (1909–2005) (Eilenberg and Mac Lane 1945), in the context of their work in topology, a category C consists of the following mathematical entities: a class Ob(C) of abstract elements, called the objects of C; a class Hom(C), (Hom: homomorphism, a structure-preserving function) whose elements are called morphisms (depicted as arrows), each morphism (f) having a unique source object (a) and target object (b), represented as f: a → b; Hom(a, b) denotes the Hom-class of all morphisms from a to b. Finally, a binary operation (depicted as ∘), called the composition of morphisms, such that for any three objects a, b, and c, we have:

Hom(a, b) × Hom(b, c) → Hom(a, c).

The composition of f: a → b and g: b → c is written as g ∘ f. Thus, if f: a → b, g: b → c and h: c → d then

h ∘ (g ∘ f) = (h ∘ g) ∘ f.

Category theory is mentioned in a Geographical Information Systems context in Herring et al. (1990) and Herring (1992).


Cauchy distribution A probability distribution of the form:

F(x; α, β) = β / {π[β² + (x − α)²]},


where β > 0. The density does not have an expectation and is symmetrical about the origin, where it has a maximum. Although similar in shape to the Gaussian distribution, it has more pronounced tails, decaying more slowly for large values of x. Named for the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857), who first described it (Cauchy 1853). Discussed in an earth science context in Vistelius (1980, 1992). It is also known among physicists as the Lorentz distribution or the Cauchy-Lorentz distribution after the Dutch theoretical physicist, Hendrik Antoon Lorentz (1853–1928) who derived it as the theoretical shape of optical spectral lines formed by atomic emission or absorption.

Cauchy-Bunyakovsky-Schwarz inequality, Cauchy's inequality, Cauchy-Schwarz inequality, Schwarz's inequality Given two series of real numbers a1, …, an and b1, …, bn then

(Σ_{i=1}^{n} a_i b_i)² ≤ (Σ_{i=1}^{n} a_i²)(Σ_{i=1}^{n} b_i²).

This postulate was initially proved by the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857) (Cauchy 1821). A continuous analogue of this,

∫_a^b f(x)g(x) dx ≤ [∫_a^b f²(x) dx]^(1/2) [∫_a^b g²(x) dx]^(1/2),

was obtained by a Ukrainian former student of his, Viktor Yacovlevich Bunyakovsky (1804–1889) (Bouniakowsky 1859) and the principle was independently discovered by a Silesian (Polish) mathematician, Hermann Amandus Schwarz (1843–1921) in 1884 (Schwarz 1888). Used by Watson (1971) in an early discussion of Matheronian geostatistics and Buttkus (1991, 2000) in the context of matching filters to signals in the presence of noise.

Cauchy's integral theorem Named for the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857) who published it (Cauchy 1825), it applies to line integrals for holomorphic functions in the complex number plane; it implies that if two different paths connect the same two points, and a function is holomorphic everywhere "in between" the two points, then the two path integrals of the function will be the same. Discussed in an earth science context in Buttkus (1991, 2000).
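A quick numerical illustration of the Cauchy-Bunyakovsky-Schwarz inequality stated above, using arbitrary random vectors (a Python sketch):

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.normal(size=1000)
    b = rng.normal(size=1000)

    lhs = np.dot(a, b) ** 2                # (sum of a_i * b_i) squared
    rhs = np.dot(a, a) * np.dot(b, b)      # (sum of a_i^2) * (sum of b_i^2)
    print(lhs <= rhs)                      # always True, whatever the vectors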


Cauchy’s principal value The principal value of a definite integral over an integrand with a singularity at c, a < c < b, is obtained by dividing the integral into two parts and evaluating it:

∫_a^b f(x) dx = lim_{ε→0, ε>0} [∫_a^{c−ε} f(x) dx + ∫_{c+ε}^b f(x) dx].

Replacing ε by μτ in the first integral and by vτ in the second, where μ and v are two arbitrary and undetermined constants and τ represents an indefinitely small quantity approaching zero (so that neither part-integral contains the actual point at which the original integral becomes infinite or discontinuous), then following integration, replacing τ by 0 will yield the desired result. For example:

∫_0^π dx/(a + b cos x) = ∫_0^{α−μτ} dx/(a + b cos x) + ∫_{α+vτ}^π dx/(a + b cos x),

where α denotes the point at which the integrand becomes infinite. If a > b then

∫_0^π dx/(a + b cos x) = [2/√(a² − b²)] [tan^{−1}(√((a − b)/(a + b)) tan(x/2))]_0^π = π/√(a² − b²);

if a < b then

∫_0^π dx/(a + b cos x) = [1/√(b² − a²)] {[log(sin((α + x)/2)/sin((α − x)/2))]_0^{α−μτ} + [log(sin((x + α)/2)/sin((x − α)/2))]_{α+vτ}^π}
= [1/√(b² − a²)] {log(sin(α − μτ/2)/sin(μτ/2)) − log(sin(α + vτ/2)/sin(vτ/2))}
= [1/√(b² − a²)] log(v/μ) on replacing τ by 0.

The value of this integral is indeterminate because the values of the constants v and μ are undefined; it is known as a general definite integral. However, by setting these arbitrary constants to μ = v = 1, then the integral takes the form of a definite integral, which the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857) (Cauchy 1825, 1827) called the principal value of the definite integral. In this case log(v/μ) = 0, and so the principal value is 0. If a = b then

∫_0^π dx/[a(1 + cos x)] = (1/2a) ∫_0^π sec²(x/2) dx = ∞;

hence ∫_0^π dx/(a + b cos x) is a discontinuous function, equalling [1/√(b² − a²)] log(v/μ), ∞, or π/√(a² − b²), depending on whether a is less than, equal to, or greater than b (Price 1865).

If a function x(t) can be expressed as the Fourier integral x(t) = ∫_{−∞}^{∞} X(f) e^{i2πft} df, where X(f) = ∫_{−∞}^{∞} x(t) e^{−i2πft} dt, then X(f) is a representation of x(t) in the frequency domain; e is Euler's number, the constant 2.71828, and i is the imaginary unit √−1. They are related by x(t) → X(f) (Fourier transform) and X(f) → x(t) (inverse transform). Consider a signal consisting of a single rectangular pulse with a half-width in time of E/2: then x(t) = 1 when −E/2 ≤ t ≤ E/2, and 0 otherwise. The principal value of the Fourier integral is:

lim_{a→∞} ∫_{−a}^{a} X(f) e^{i2πft} df = 1 when |t| < E/2; 0.5 when |t| = E/2; 0 when |t| > E/2.

It is mentioned in an earth science context in Buttkus (1991, 2000).

Cauchy-Riemann equations The Cauchy-Riemann differential equations on a pair of real number valued functions u(x, y) and v(x, y) are the pair of equations: ∂u/∂x = ∂v/∂y and ∂u/∂y = −(∂v/∂x), where u and v are taken to be the real part and imaginary part of a complex function f(x + iy) = u(x, y) + iv(x, y) and i is the imaginary unit √−1. They have great importance in mathematical physics, and are mentioned in an earth science context, in Camina and Janacek (1984). First derived by the French mathematician and physicist, Jean le Rond d'Alambert (1717–1783) (d'Alambert 1747 [1750], 1750 [1752]), it was made the basis of a theory of complex analysis by the mathematicians, (Baron) Augustin-Louis Cauchy (1789–1857) in France, and Georg Friedrich Bernhard Riemann (1826–1866) in Germany in two works, Cauchy (1825) and Riemann (1851, 2004).
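Relating to the Cauchy's principal value entry above: scipy's quad integrator can evaluate principal values across a simple pole directly through its 'cauchy' weight option (a minimal sketch; the integrand and interval are illustrative):

    from scipy.integrate import quad

    # principal value of  PV ∫_{-1}^{2} 1/(x - 0.5) dx, with the pole at x = 0.5;
    # weight="cauchy" makes quad integrate f(x)/(x - wvar) in the principal-value sense
    pv, err = quad(lambda x: 1.0, -1.0, 2.0, weight="cauchy", wvar=0.5)
    print(pv)   # analytically log(1.5) - log(1.5) = 0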


Causal 1. Relating to a cause. 2. Not existing before some finite starting time and having a finite total energy (Sheriff 1984).

Causal filter A filter which reacts only to past events: it produces output at time t which depends only on input values prior to, or at, time t; also known as a realisable filter. (A filter whose output also depends on future inputs is noncausal; a filter whose output depends only on future inputs is anti-causal). Causal filters always introduce phase shifts. Discussed in an earth science context by Ferber (1984), Buttkus (1991, 2000) and Gubbins (2004). See also: acausal filter, impulse response filter.

Causal transform A transform which uses only previous samples of the input or output signals (Futterman 1962).

Causality condition For a filter to be causal, its response must only depend on the current and past inputs. Mentioned in an earth science context by Buttkus (1991, 2000). See also: Paley-Wiener criterion.

Cell Usually considered to refer to a position within a two- or three-dimensional mesh. Often used as the basis for interpolation of values into a regular spatial pattern from irregularly-spaced raw data values (Antoy 1983). See: contour map.

Cellular automata An infinite grid of cells, each one of which can be in a number of states. The state of a given cell at time t is a function of the states of a finite number of neighbouring cells in the previous time step. The same updating rules are applied to all cells in the grid. The Hungarian-American mathematician, Janosh (John) von Neumann (1903–1957) gave the first description of a theoretical model of a self-reproducing machine at a meeting held at the California Institute of Technology in September, 1948 (von Neumann 1951), but it was only after a suggestion by his colleague, the Polish-American mathematician, Stanislaw Marcin Ulam (1909–1984) in 1951 that he developed a theoretical cell-based model capable of self-reproduction (von Neumann 1966). However, the complex rules combined with the fairly rudimentary state of computing at the time did not allow practical development. The earliest automaton to actually work successfully on a computer (a PDP-7 with a visual display screen) was the two-state "Game of Life" devised in 1970 by the British mathematician, John Horton Conway (1937–) (Gardner 1970; Adamatzky 2010). The field was subsequently developed by the British mathematician, Stephen Wolfram (1959–) in the 1980s (Wolfram 1986) and others. These techniques have recently begun to be applied to modelling of complex geological phenomena such as debris and pyroclastic flows (Contreras and Suter 1990; D'Ambrosio et al. 2003; Crisci et al. 2005).
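A minimal sketch of a two-state cellular automaton of the "Game of Life" type described above (Python/numpy; the grid size and seed pattern are arbitrary, and periodic boundaries stand in for the infinite grid):

    import numpy as np

    def life_step(grid):
        """One update of the Game of Life rules on a 2-D array of 0s and 1s."""
        # count the eight neighbours of every cell by summing shifted copies of the grid
        n = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
        # birth on exactly 3 live neighbours; survival on 2 or 3
        return ((n == 3) | ((grid == 1) & (n == 2))).astype(int)

    grid = np.zeros((8, 8), dtype=int)
    grid[3, 2:5] = 1                 # a "blinker" oscillator as the seed pattern
    for _ in range(3):
        grid = life_step(grid)       # np.roll wraps the edges (toroidal boundary)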



Censored data, censored distribution A set of n observations of which a known number, n0, fall beyond some lower limit of observation x0 but their individual values are unknown, save that 0 ≤ x < x0. In geological data, censoring occurs much more frequently in the lower tail of a frequency distribution; typically, x0 is the method detection limit. The larger the proportion of n0 to n, the greater will be the bias in estimation of parameters such as the mean and standard deviation and special methods must be used (Hald 1949; Cohen 1959; Selvin 1976; Gentleman and Crowley 1991; Helsel 2005). The Danish statistician, Anders Hald (1913–2007) seems to have been the first to make a distinction between censored and truncated distributions in 1949, although what is now called a censored distribution was referred to as truncated in Stevens (1937) (Miller 2015a). In the earth sciences, problems arise in geochemistry (Miesch 1967a, b; Helsel 2005) and with joint-length and fracture-length data (Baecher 1980, 1983; de Caprariis 1988). See also truncated data, nondetect.

Centre-finding algorithm An algorithm for locating the centroid of an approximately circular or ellipsoidal shape, such as the two-dimensional outline of a quartz grain based on Fourier analysis (Full and Ehrlich 1982).

Centring Subtracting the overall mean value from every value in an observed time series, so as to centre the values in the series about zero (Weedon 2003). The term was in use in this context by the 1920s (Crum 1925).

Centred logratio transform The Scottish statistician, John Aitchison (1926–) has analysed many of the difficulties caused by the constant-sum nature of a percentaged data set (closure problem), demonstrating that if this is not taken into account, bias will occur in both estimates of the mean composition, and the results of the application of multivariate statistical analysis methods (e.g. principal components analysis). He has shown (Aitchison 1982, 1986, 2003) that, provided no zero percentages are present in the data set, these types of problem can be overcome by re-expressing the data set in terms of the natural logarithms of the ratio of each of the k proportions (p1, …, pk) in a sample to the geometric mean (mg) for that sample, i.e. ln(p1/mg), …, ln(pk/mg). This is known as the centred logratio transform. Statistics such as the mean composition are computed on the basis of the transformed data and then back-transformed to recover the actual percentage composition.

Central limit theorem This states that if the sum of many independent and identically distributed (i.i.d.) random variables has finite variance, then its probability density will be asymptotically that of the normal distribution. The earliest demonstration of this approximation was by the French-born English mathematician, Abraham De Moivre (1667–1754) (De Moivre 1733) [reproduced in Archibald (1926); English translation in Smith (1929, 566–568)], followed by a proof by the French mathematician, Pierre Simon, Marquis de Laplace (1749–1827), (Laplace 1810) but it was the Russian mathematician,


Aleksandr Mikhailovich Lyapunov (1857–1918) who gave the first rigorous proof of the theorem (Lyapunov 1901), and showed that so long as none of the variables exerts a much larger influence than the others, then the i.i.d. condition is not necessary. Mentioned in an earth science context by Miller and Kahn (1962), Koch and Link (1970–1971), Vistelius (1980, 1992) and Camina and Janacek (1984). Central tendency The value about which all other values group in a frequency distribution. The term is used in Fisher (1925b) and in an earth science context by Krumbein and Pettijohn (1938) and Chayes (1972). See also: mean, median, mode. Cepstral analysis, cepstrum, cepstrum analysis The term cepstrum (a rearrangement of the letters of spectrum) was introduced by the American statistician, John Wilder Tukey (1915–2000) in a paper with the mathematicians, Bruce Plympton Bogert (1923–) and Michael John Romer Healy (1923–), motivated by determining the depth to a seismic source, based on the recognition that a seismogram could contain an echo of the signal superimposed on the signal itself, with a time delay proportional to the depth of the source (Bogert et al. 1963; Kemerait and Childers 1972). The cepstrum is a dissection of the variance of the time series into portions associated with various quefrencies (quefrency is the analogue of frequency in the power domain). This is achieved by using the power spectrum of the logarithm of the (smoothed or filtered) power spectrum of the original waveform. The latter is first shifted by an integer multiple of 2π radians to properly unwrap the angle or imaginary part of the complex logarithm function. The real cepstrum uses only the information of the magnitude of the power spectrum; the imaginary cepstrum, uses the complex logarithm function and holds information about the magnitude and phase of the initial spectrum, allowing reconstruction of the signal. It was Tukey who suggested that by taking the logarithm of the spectrum, any ripples present would be rendered nearly cosinusoidal. He also proposed the use of additional terms to describe those identities or operations applied in cepstrum analysis which are equivalent to those used in traditional spectral analysis, hence: alanysis—for the process itself; quefrency/frequency, rahmonics/harmonics, gamnitude/magnitude, saphe/phase, saphe-cracking/complex demodulation, lifter/filter, short-pass lifter/low-pass filter and long-pass lifter/highpass filter (Oppenheim and Schafer 2004). Buttkus (1991, 2000) emphasises that it is the complex cepstrum which is used in modern geophysics, as the Bogert et al. (1963) cepstrum did not take any phase information into consideration. See: Cohen (1970), Lines and Ulrych (1977), Flinn et al. (1973), Butler (1988), Shumway et al. (2004), Hall (2006), Liu et al. (2008) and Xue et al. (2015) for earth science examples. Cerebellar model A mathematical model which attempts to simulate the informationprocessing functions of the human brain, first described by the British neuroscientist David Courtnay Marr (1945–1980) (Marr 1969) and, extending Marr’s theory, the American electrical engineer James Sacra Albus (1935–2011), developed the Cerebellar Model


Arithmetic Computer (CMAC), based on neural net architecture (Albus 1971, 1981). It was experimentally applied to subsurface mapping by Hagens and Doveton (1991).
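Relating to the Cepstral analysis entry above, a minimal sketch of a real cepstrum computed with numpy's FFT routines (the synthetic wavelet, sampling rate and echo delay are invented for illustration):

    import numpy as np

    def real_cepstrum(x):
        """Real cepstrum: inverse FFT of the log magnitude spectrum of x."""
        spectrum = np.fft.rfft(x)
        return np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))   # small offset avoids log(0)

    fs = 500.0
    t = np.arange(0, 2, 1 / fs)
    wavelet = np.sin(2 * np.pi * 25 * t) * np.exp(-3 * t)   # decaying wavelet
    delay = int(0.3 * fs)                                   # echo delayed by 0.3 s
    trace = wavelet + 0.5 * np.roll(wavelet, delay)         # signal plus its echo
    ceps = real_cepstrum(trace)                             # echo appears as a peak near quefrency 0.3 s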


Chadha diagram A composite diagram showing both cation and anion compositions of groundwater. Modified from the Piper-Hill diagram (by the omission of the two equilateral triangles, and transformation of its diamond-shaped field to a rectangle) by the Indian hydrogeologist, Devinder Kumar Chadha (1999). It plots the differences in milliequivalent percentages of {(CO3 + HCO3) − (Cl + SO4)} (y-axis) against those of {(Ca + Mg) − (Na + K)} (x-axis). It has been found useful for identification of natural waters as well as the study of geochemical processes.

Chain rule A term for a function of a function (Camina and Janacek 1984), e.g. if y = f(t) and t = g(x) so that y = f[g(x)], then dy/dx = (df/dt)(dt/dx) or dy/dx = (df/dt)(dg/dx).

Change point The point at which a statistically significant change in amplitude in the mean and/or variance of a time series occurs, indicating a change in the nature of the underlying process controlling the formation of the time series. Generally detected by means of a graph of the cumulative sum of mean and/or variance as a function of time (Montgomery 1991a), changepoints being indicated by a statistically significant change in slope (Clark and Royall 1996; Green 1981, 1982; Leonte et al. 2003; Gallagher et al. 2011). See also segmentation.

Channel sample 1. The term originates in physical sampling in a mine environment: A slot, or channel, of given length is cut into the rock face in a given alignment (generally from top to bottom of the bed, orthogonal to the bedding plane); all the rock fragments broken out of the slot constitute the sample. It is also known as batch sampling. 2. In statistical sampling, it is a method used to reduce the volume of a long data series: all the values in a fixed non-overlapping sampling interval are averaged and that value constitutes the channel sample (Krumbein and Pettijohn 1938). See also: Krumbein and Graybill (1965); composite sample.

Chaos, chaos theory, chaotic behaviour Although chaotic behaviour was referred to by the French topologist, René Thom (1923–2002) (Thom 1972, 1975); the Belgian physicist and mathematician, David Ruelle (1935–) and Dutch mathematician, Floris Takens (1940–) (Ruelle and Takens 1971), the term chaos was popularised by the work of the American applied mathematician, James Alan Yorke (1941–) and the Chinese-born American mathematician, Tien-Yien Li (1945–) (Yorke and Li 1975; Aubin 1998). Chaotic systems are intrinsically deterministic nonlinear dynamical systems but which exhibit a seemingly random behaviour which is very sensitive to initial conditions. Solutions to deterministic systems are said to be chaotic if adjacent solutions diverge exponentially in the phase


space of the system, i.e. a small change in the initial conditions may lead to very different long-term behaviour. The study of such systems can be traced back to the work of the French mathematical physicist and mathematician, Jules Henri Poincaré (1854–1912) on celestial dynamics (Poincaré 1890). For earth science examples, see: Yuen (1992), Turcotte (1997), Weedon (2003) and Quin et al. (2006); see also deterministic chaos.

Chaotic map This term applies to a logistic function of the form:

x_{t+1} = F(x_t) = c x_t (1 − x_t)

which comprises the so-called map: e.g. let x0 have a given (constant) value, then x1 = F(x0) = c x0(1 − x0); x2 = F(x1) = c x1(1 − x1), etc. Surprisingly, this can exhibit chaotic behaviour (i.e. adjacent solutions diverge exponentially in phase space) once the parameter c exceeds 4.2360679. The remarkable properties of this function were first described by the Australian-born physicist, (Baron) Robert McCredie May (1936–) (May 1976; Turcotte 1997). See also: chaos.

Character weighting, character-weighted similarity coefficient A weight is a value (wi) associated with each of a set of observations (xi) reflecting its perceived relative importance or reliability. For example, a weighted mean is given by

m_w = Σ_{i=1}^{n} w_i x_i / Σ_{i=1}^{n} w_i.

Weights are often normalised so that their sum is unity. The method was in use by the 1830s. The English palaeontologist, Thomas Patrick Burnaby (1924–1968) discussed the use of character weighting in connection with the computation of a similarity coefficient (a measure of the similarity of one sample to another in terms of their k-dimensional compositions) in a paper originally drafted in 1965 but only published posthumously (Burnaby 1970), but see Gower (1970) for a critique of his approach. Characteristic analysis A technique developed by the American geologist, Joseph Moses Botbol (1937–) for the prediction of areas of potential mineralisation by matching the characteristics (mineralogical, geological, geophysical, geochemical etc.) of encoded data for a grid of map cells to that for a training set based on an area (or areas) containing known mineralisation. Initial work (Botbol 1970) used a Boolean similarity matrix to encode the data for each map cell, but this was replaced (Botbol et al. 1977, 1978;



McCammon et al. 1983) by a {−1, 0, 1} coding scheme representing {unfavourable, unevaluated, favourable}. If data were missing, no assignment would be made. Favorability (N.B. U.S. English spelling) for mineralisation, lithology and other geological features was indicated by their presence; geophysical and geochemical data were generally encoded on the basis of presence or absence of a local anomaly, usually defined by the second derivative of a regionally interpolated surface. The favorability function (F) was given by the elements of the eigenvector (Λ) associated with the largest of the eigenvalues (λ) of the matrix-product of the data matrix (X) with its transpose (X^T), obtained by solving (X^T X)Λ = λΛ, scaled so that −1 ≤ F ≤ 1. Chaves (1993) describes the application of the NCHARAN program (McCammon et al. 1984) to oil exploration in the Sergipe-Alagoas basin, Brazil. See also: Bridges et al. (1985); weights of evidence model.

Characteristic root, characteristic value, characteristic polynomial Early terms for the eigenvalue of a square matrix (X): a value (λ) such that det(X − λI) = 0, where det denotes the determinant and I is the identity matrix (i.e. one whose diagonal elements are unity and whose off-diagonal elements are zero). In general, for an n × n matrix, there will be n eigenvalues. The "characteristic root" or "characteristic value" terminology was gradually replaced by the "eigen" terminology (the German Eigenwert is translated as "intrinsic" or "characteristic") and first appeared in the work of the German mathematician, David Hilbert (1862–1943) (Hilbert 1904); by the late 1930s, eigenvalue had become adopted into English mathematical usage. The characteristic equation p(λ) is equivalent to det(X − λI) = 0. For an n × n matrix

p(λ) = (−1)^n λ^n + c_1 λ^{n−1} + c_2 λ^{n−2} + ⋯ + c_{n−1} λ + c_n,

where c are the coefficients of the polynomial and c_1 = tr(X) and c_n = (−1)^n det(X), where tr(X) and det(X) denote the trace and determinant of (X) respectively.

Chayes-Kruskal procedure The American petrologist, Felix Chayes (1916–1993) and statistician, William Henry Kruskal (1919–2005) proposed a test to determine the significance of correlation coefficients calculated between variables in a closed (i.e. constant-sum) data matrix, such as the proportions of different minerals in a suite of rocks (Chayes and Kruskal 1966). However, subsequent examination by both Saha et al. (1974) and Kork (1977) showed the procedure to be unsatisfactory. See also: Aitchison (1986, 2003); closure problem.

Chebyshev's criterion, Chebyshev's fit, Chebyshev's inequality 1. Chebyshev's criterion or fit is named for the Russian mathematician, Pafnuty [Pafnutii] Lvovich Chebyshev [Tchebycheff] (1821–1894). In contrast to the method of least squares, when fitting a function, y = f(x), to a set of observations (xi, yi), his method finds the parameters that minimise the errors for the worst cases by minimizing the


maximum absolute value of the deviation: ε = max_i{abs[y_i − f(x_i)]} (Chebyshev 1867a, b). In spectral analysis the criterion is used in optimisation of digital filters; approximating the desired frequency response by minimizing the maximum of the deviation from the desired filter frequency. See: Sheynin (1994), Bratten (1958), Curtis and Frank (1959), Buttkus (1991, 2000). This is also known as the minimax fit. 2. Chebyshev's inequality states that if X is a discrete or continuous random variable with mean μ and variance σ², then the probability P(|x − μ| ≥ ε) ≤ σ²/ε². This implies that the further one is from the mean, then the smaller the proportion of values which will be that far, or further, out from it. Chebyshev (1867a), after whom it is now known, used an inequality originally published by the French mathematician, Irénée-Jules Bienaymé (1796–1878) (Bienaymé 1853) to derive a generalised Law of large numbers. It is mentioned in an earth science context by Vistelius (1980, 1992).

Checksum The result of the summation of a set of figures which is used for the assessment of accuracy or completeness, error-detection etc. An early example of usage in hand-computation occurs in Waugh (1935). Many algorithms were subsequently developed for use with computer-based systems for error-detection in transmitted blocks of data. Early examples are the cyclic codes investigated by American mathematician, Eugene A. Prange (1918–2006) (Prange 1957) and the algorithms developed by the German-American computer scientist, Hans Peter Luhn (1896–1964) in 1954 (Luhn 1960), and by the Dutch mathematician, Jacobus Verhoeff (1927–) (Verhoeff 1969) for automatic verification of identification numbers.

Chernoff faces Devised by the American mathematician, statistician and physicist, Herman Chernoff (1923–) (Chernoff 1973), this is a graphical multivariate display technique which assigns features of the human face (e.g. position/style of eyes, eyebrows, nose, mouth) to different variables to make comparative displays, each "face" corresponding to a sample's composition (Howarth and Garrett 1986). However, Turner (1986) found Chernoff faces to be unsatisfactory for the display of geochemical data, in that much work was required to find the best facial features to which a particular element should correspond (which implied that the technique could be used to deliberately distort results by emphasis or suppression of any variable) and that, so as to achieve the best visual emphasis for any anomalous samples, the analyst must have prior knowledge of which they are. See also Kleiner-Hartigan trees.

Chi (χ) A base-2 logarithmic scale of sediment grain settling velocity: χ = −log2(s), where s is the grain settling velocity in m/sec (May 1981).

Chi-squared plot A tool used to detect possible multivariate outliers. It is based on the assumption that the squared robust multivariate distances of the sample compositions from the mean (Mahalanobis' distance) follow a Chi-squared distribution with the degrees of freedom equal to the number of variables. The ordered squared robust


distances are plotted along the x-axis against the equivalent quantiles of the Chi-squared distribution along the y-axis. The presence of outliers is indicated by marked departure from linearity of data points at larger distances. Robert G. Garrett of the Geological Survey of Canada has applied it to analysis of regional geochemical data (Garrett 1989; Reimann et al. 2008).
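A minimal sketch of the chi-squared plot construction just described (Python, using scipy for the chi-square quantiles; the data matrix is simulated, and in practice a robust estimate of location and scatter would replace the classical mean and covariance used here):

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))                 # 200 samples, 3 variables (illustrative)

    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', X - mean, cov_inv, X - mean)   # squared Mahalanobis distances

    d2_sorted = np.sort(d2)
    prob = (np.arange(1, len(d2) + 1) - 0.5) / len(d2)           # plotting positions
    quantiles = chi2.ppf(prob, df=X.shape[1])                    # chi-square quantiles, df = number of variables
    # plotting d2_sorted (x-axis) against quantiles (y-axis) should be near-linear if
    # there are no outliers; points bending away at large distances are suspect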


Chi-square distribution, chi-squared distribution The probability distribution, f(y), of a random variable given by the sum of squares of v independent standard normal variables, i.e. y = Σz, where z = x², and x is normally distributed with a mean of 0 and standard deviation of 1, is:

f(y) = [1/(2^(v/2) Γ(v/2))] y^((v−2)/2) e^(−y/2),

where Γ is the standard Gamma distribution with α = v/2; y > 0, 0 otherwise. It has a mean of v and variance of 2v; v is often called the shape parameter. e is Euler's number, the constant 2.71828. The distribution was independently arrived at by the German geodesist, Friedrich Robert Helmert (1843–1917) (Helmert 1876) and the English statistician, Karl Pearson (1857–1936), who gave it its present name (Pearson 1900; Fisher 1922a). For discussion in a geological context (together with a full derivation of the distribution), see: Miller and Kahn (1962), Buttkus (1991, 2000), Reimann et al. (2008). Current usage appears to favour the spelling "chi-square" rather than "chi-squared," "Chisquare" or "Chi-square" with chi non-capitalized (Google Research 2012).

Chi-squared (χ²) statistic, chi-squared test A goodness-of-fit test for observed frequency counts which are being compared to a theoretical frequency distribution, introduced by English statistician, Karl Pearson (1857–1936) (Pearson 1900; Plackett 1983). The test statistic, which he named Chi-squared, is given by:

χ² = Σ_{i=1}^{n} (O_i − E_i)² / E_i,

where O_i is the observed frequency count per class (i = 1, 2, …, n, in increasing order of magnitude); and E_i is the expected (theoretical) frequency per class. Assuming, for the sake of example, that the model used for comparison is a normal distribution, N(μ, σ) with mean (μ) and standard deviation (σ), and that the values of these parameters are estimated from the data, yielding values of m and s respectively, then the expected frequencies for each class are calculated on the basis of N(m, s). The observed value of the test statistic is then compared to tabled critical values of χ², using (n − k − 1) degrees of freedom; k = 2 here (as m and s were estimated from the sample) together with the chosen level of significance. For discussion in a geological context, see Miller and Kahn


(1962) and Wells (1990). An early application to lithological variation was by the American statistician, Churchill Eisenhart (1913–1994) (Eisenhart 1935). Current usage appears to favour the spelling "chi-square" rather than "chi-squared," "Chi-square" or "Chi-square" with chi non-capitalized (Google Research 2012).

Cholesky decomposition If the matrix M is Hermitian and positive definite, then it can be decomposed as: M = LL*, where L is a lower triangular matrix with strictly positive diagonal entries and L* is the conjugate transpose of L (i.e. taking the transpose and then taking the complex conjugate of each entry by negating their imaginary parts but not their real parts). It is used in the numerical solution of linear equations Mx = b. If M is symmetric and positive definite, then this can be solved by first computing the Cholesky decomposition, M = LL^T, then solving first Ly = b for y, then L^T x = y for x. Named after the French mathematician and geodesist, Commandant André-Louis Cholesky (1875–1918) who discovered it, and used it in his surveying work. His method was posthumously published by a fellow officer (Benoit 1924) after Cholesky was killed in action during World War I.

Choropleth map A map in which distinct geographical areas (e.g. corresponding to counties or states) are each shaded, patterned, or coloured, to show the magnitude of a measured, or enumerated, variable over each distinct area taken as a whole, in contrast to "contour" (isoline, isarithmic) maps. The method was originally introduced by the French mathematician, (Baron) Pierre Charles François Dupin (1784–1873) (Dupin 1827). The term choroplethe map was introduced by the American geographer, John Kirtland Wright (1891–1969) (Wright 1938). Such maps were well suited to maps made by computer lineprinters (Rhind 1977). Sampling-based soil-class maps are also considered to be of this type (Burrough et al. 1997); Moon (2010) has used this type of map to show the geochemistry of stream drainage catchments.

Chronogram A graphical display illustrating the probable error associated with placement of a geological boundary at any particular age over a given range, based on a set of reported isotopic ages from dated physical samples stratigraphically above and below the boundary. All these age estimates will have different uncertainties associated with them. The method was originally adapted (Harland et al. 1982) from a method used to estimate the ages of magnetic reversals from isotopic data (Cox and Dalrymple 1967). See Agterberg (1990, 76–92) for a further improvement to the method.

Circular Error Probability (CEP) The radius of a circle such that a predefined proportion (e.g. 50%, 90%) of a set of spatially distributed data points fall within it; often used to determine the accuracy of military ballistics landing on, or close to, a target, and latterly in geodesy for assessment of positional accuracy of the Global Positioning System. Early work on this statistic was carried out by the American mathematical statistician, Harman


Leon Harter (1919–2010) (Harter 1960). Discussed in a geophysical target-location context by Sheriff (1974).
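By way of illustration of the Cholesky decomposition entry above, the following is a minimal sketch, in Python, of the solution of Mx = b by first solving Ly = b and then Lᵀx = y; the matrix and right-hand side are hypothetical, and the numpy and scipy libraries are assumed.

    import numpy as np
    from scipy.linalg import solve_triangular

    # Hypothetical symmetric, positive definite matrix M and right-hand side b
    M = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([2.0, 5.0, 6.0])

    L = np.linalg.cholesky(M)                    # M = L L^T, L lower triangular
    y = solve_triangular(L, b, lower=True)       # solve L y = b for y
    x = solve_triangular(L.T, y, lower=False)    # solve L^T x = y for x
    assert np.allclose(M @ x, b)                 # check: M x reproduces b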


Circular statistics Geological phenomena such as paleocurrent directions (as indicated by foreset orientation in bedded sedimentary deposits) or lineament directions have directional properties which can be adequately described in terms of 2-dimensional orientation relative to a fixed direction. These are usually modelled using the von Mises distribution, named for the German physicist and statistician, Edler Richard von Mises (1883–1953). Two early sedimentological studies were by the German geologist, Konrad Richter (1903–1979) (Richter 1932) of pebble orientations in subglacial till, and by the Swedish-American geologist, Hakon Wadell (1895–1962) (Wadell 1936) in an esker and outwash delta. A comprehensive treatment of methods for the statistical analysis of such data is given by Mardia (1972), Fisher (1993) and Mardia and Jupp (2000); see also Watson (1966), Jones (1968), Rao and Sengupta (1972), Whitten (1974), Plummer and Leppard (1979), Cheeney (1983), Engi (1989), for discussion in a geological context. See also Bennett et al. (1999); Rayleigh test. cis(θ) [notation] An abbreviation for "cos(θ) + i sin(θ)," where cos(θ) + i sin(θ) = e^(iθ); i is the imaginary unit √(−1) and e is Euler's number, the constant 2.71828. It was first used by the American mathematician, Washington Stringham (1847–1909) (Stringham 1893). See also: Euler's identity. City-block distance, city-block metric A measure of the similarity of one sample (x1) to another (x2) in terms of their k-dimensional composition, based on the sums of the differences between the projections of the two points onto each of the coordinate axes, i.e. in two dimensions, the sum of the lengths of the two sides of a right-angle triangle adjacent to the right angle, rather than that of the hypotenuse: dM = Σj=1…k |xj1 − xj2|, where k is the number of dimensions. Hence the name, city-block or Manhattan distance (as it is equivalent to driving round two sides of a rectangular city-block, such as are found in Manhattan, New York, to reach the opposite corner). Introduction of this measure is variously attributed to the American mathematicians Alston Scott Householder (1904–1993) and Herbert Daniel Landahl (1913–2003) (Householder and Landahl 1945) and to psychologist Fred Attneave (1919–1991) (Attneave 1950). See also Bray-Curtis coefficient. Cladistics, cladogram A method of biological classification, originally proposed by the German insect taxonomist, Willi Hennig (1913–1976) (Hennig 1950, 1966), which, in its


purest form, seeks to group taxa into sets and subsets based on the most parsimonious distribution of characters. The results are expressed in the form of a tree-diagram, called a cladogram, which shows the distribution of characters. See Wiley (1981), A. Smith (1994), Skelton et al. (2002), McGowan and Smith (2007), and the Palaeontological Association's PalAss Newsletter (ongoing) for discussion and examples of usage. Clark-Drew resource supply model A tripartite economic conceptual model, developed by geologists Allen L. Clark and Lawrence J. Drew of the U.S. Geological Survey, of the resource supply system, which links an occurrence model (yielding total resources), a search model (yielding resource availability at present commodity price and technology) and a production model (yielding predicted production at various assumed socio-economic conditions). The model (A. Clark 1976, 311, Fig. 1) was intended to assist assessment of the petroleum supply system of the United States. Class interval This was originally applied in geology to the arbitrary division of a continuous scale of particle sizes such that each scale unit or grade serves as a convenient interval in which to express the results of the analysis. By the 1930s, the term class interval was being applied to the width of the bins in a histogram of particle size (or shape, e.g. roundness) attributes (Krumbein and Pettijohn 1938). The earliest example of a grade scale, based on geometric class limits for sediment sizes, was introduced by the Swedish-American geologist Johan August Udden (1859–1932) (Udden 1898, 1914). Classification The action of classifying an object or entity into one of a number of pre-defined classes on a systematic basis, either: 1. by following a set of agreed, often hierarchical, rules (e.g. when identifying a mineral or fossil specimen; determining the classification of geographically-defined or geologically-defined areas in terms of their potential for development of a particular commodity, etc.); or 2. based on strictly statistical criteria (e.g. using a classifying function map, discriminant analysis, empirical discriminant analysis, neural net, etc.). In the latter situation, the characteristics of a (hopefully) representative, large, "training set" of samples (cases) drawn from each class are used to determine the statistically-based classification criteria. So as to estimate the likely successful classification rate of the classifier when being presented with measurements on a set of samples of unknown affinity, this should be determined using a separate "test set" for which the correct categories are already known. If this is not possible, then each sample in the training set may be eliminated in turn, used as a test sample, its classification being based on the characteristics of the rest of the training set, and then replaced. This is sometimes known as the "leave-one-out" test method (Howarth 1973a). Recent applications include such diverse topics as: geological field-mapping data (Brodaric et al. 2004), structural geology (Kelker and Langenberg 1976; Lisle et al. 2006), remotely-sensed multi-spectral imagery (Franklin


and Wilson 1991), glacial till composition (Granath 1984), fossil (Wright and Switzer 1971) and petrographic classification (Jacob 1975), and seismology (Tiira 1999). See also: fuzzy classification.
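As a rough illustration of the "leave-one-out" test described under Classification, the sketch below applies a simple nearest-centroid rule using the city-block distance defined earlier as the (assumed) dissimilarity measure; the data and class labels are hypothetical and only the numpy library is assumed.

    import numpy as np

    # Hypothetical training set: two measured variables, two known classes
    X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.2],
                  [3.0, 0.5], [3.2, 0.4], [2.8, 0.7]])
    y = np.array([0, 0, 0, 1, 1, 1])
    classes = np.unique(y)

    correct = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i                      # leave sample i out
        centroids = [X[keep & (y == c)].mean(axis=0) for c in classes]
        d = [np.sum(np.abs(X[i] - c)) for c in centroids]  # city-block distance
        correct += classes[np.argmin(d)] == y[i]           # correctly re-classified?

    print("leave-one-out success rate:", correct / len(X))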


Classification and Regression Trees (CART) This technique (Breiman et al. 1984) provides an alternative to classical regression methods. It was introduced by the American statistician, Leo Breiman (1928–2005); statistician and physicist, Jerome Harold Friedman (1939–); and statisticians, Richard Allen Olshen (1942–) and Charles Joel Stone (1936–). The fitting is achieved by binary recursive partitioning whereby a data set is split into increasingly homogeneous subsets. It provides output in the form of a decision tree and is particularly useful if a mixture of continuous and binary (presence/absence) variables is present and/or there are missing data. Howarth (2001a) gives an example of application of the method to a geological data set. See also Mertens et al. (2002), Spruill et al. (2002) and Kheir et al. (2007). Classifying function map Introduced by the American geologist, Chester Robert Pelto (1915–1984) as the D function (Pelto 1954), it expresses the relationship between the relative amount of each component in a multi-component system (e.g. non-clastics, sand, shale for a lithofacies map) selected as an end-member. It divides a continuous three-component system into seven classes: three sectors with a one-component end-mixture; three sectors in which two components approach equal proportions; and one sector in which all three components approach equal proportions. See also: Fogotson (1960). Clastic ratio map An isoline map, introduced by the American mathematical geologist, William Christian Krumbein (1902–1979) (Krumbein 1948), which shows the areal distribution of changing characteristics of a sedimentary formation or stratigraphic unit based on quantitative data (usually derived from outcrop and/or well log measurements, etc.), e.g. the total thickness of sand, shale, carbonate and evaporite rocks through each profile, yielding the clastic ratio, (conglomerate + sand + shale)/(carbonate + evaporite + coal), map. See also: lithofacies map. Clastic simulation Computer simulation of two-dimensional infilling of a sedimentary basin by successive inputs of clastic and carbonate sediments (Strobel et al. 1989). CLEAN spectral algorithm Originally developed by Swedish astronomer, Jan Högbom (1929–) (Högbom 1974) to assist deconvolution ("cleaning") of radio astronomy images, it also offers a powerful method for performing power spectral density analysis on time


series records with unequally-spaced or missing data (Robert et al. 1987; Heslop and Dekkers 2002). See Negi et al. (1990, 1996), Tiwari and Rao (2000) and Heslop and Dekkers (2002) for earth science applications. Clipping Also known as flat-topping. Resetting all values in a time series with amplitudes above (and/or below) a given threshold to the value of the threshold. The term was in use in signal communication by at least the late 1940s (Licklider and Pollack 1948). Note that more recently, it has been applied to the transformation of a real-valued time series into a binary series where 1 represents a value above the population mean and 0 below. Bagnall and Janacek (2005) showed that this can not only speed up cluster analysis of long time series but increase clustering accuracy as it provides robustness against outliers. However, clipping is known to pose problems with seismic data (e.g. Sloan et al. 2008). See: O'Brien et al. (1982) and Weedon (2003). Closed, closed array, closed covariance, closed data, closed system, closed variance The term closed array was first applied by the American petrologist, Felix Chayes (1916–1993), to data in which the modal compositions of igneous rocks, expressed as percentages, summed to 100% (Chayes 1960, 1962; Vistelius and Sarmanov 1961; Chayes and Kruskal 1966). It was adopted by the American mathematical geologist, William Christian Krumbein (1902–1979) (Krumbein 1962) for stratigraphic data subjected to a constant-sum constraint: he was considering the effect of what he termed "open" data for the total measured thicknesses of sandstone, shale, carbonate and evaporite in a number of stratigraphic sections, when subsequently converted to "closed" (i.e. constant-sum) data, by re-expressing as a percentage of total thickness in each section. He showed that not only correlations but also the resulting spatial patterns changed as a result. The term closed has also been applied to a covariance matrix in which each row sums to zero, in contrast to "open" variances or covariances (Butler 1975). Also referred to as "closed data" or a "closed system" (Krumbein and Watson 1972). See Aitchison (1982, 1986, 2003), Woronow and Butler (1986), Buccianti et al. (2006) and Buccianti (2013) for recent discussion of this closure problem and methods of statistical analysis of compositional data. See also: parent array. Closed set 1. In topology, a set S is "open" if every point in S has a neighbourhood (the set of points inside an n-dimensional sphere with a given central point (x0) and radius r > 0) lying in the set. An open set is the set of all points x such that |x − x0| < r. In one dimension, the open set S consists of all points located on a line such that a < |x − x0| < b, but which does not include the boundary points a and b, hence the set is "open" as opposed to a closed set which, by definition, contains its own limit points: a ≤ |x − x0| ≤ b. Similarly, in two dimensions all points within a disk of given radius; in three dimensions, all points interior to a sphere of given radius, etc. The foundations of set theory were established


by the Russian-born German mathematician, Georg Ferdinand Ludwig Philipp Cantor (1845–1918), beginning with his first paper on the subject (Cantor 1874). By the early 1900s, the term "closed set of points" had begun to be widely used but "closed set of equations" became usual only after the 1950s (Google Research 2012). 2. The term generally appears in the geological literature in connection with the closure problem in the sense of a "closed set of data." Closing One of the Minkowski set operations (Minkowski 1901). See Agterberg and Fabbri (1978) for a geological example. Closure, closure problem The American petrologist, Felix Chayes (1916–1993) and mathematical geologist, William Christian Krumbein (1902–1979), and the Russian mathematical geologist, Andreĭ Borisovich Vistelius (1915–1995), were the first to discuss in the geological literature the problems which arise from the inherent correlation in data which are subject to closure, i.e. the amounts of the individual variables present in each sample in the data set are subject to a constant-sum constraint (e.g. 1, in the case of proportions; 100, in the case of percentaged data). An obvious example is major-element oxide geochemical data for silicate rocks: as the SiO2% content increases, the relative amounts of the other oxides present must decrease, as seen in the traditional Harker variation diagram. Some of these correlations are spurious, because variables which are inherently uncorrelated may show strong correlation once they have been transformed to proportions or percentages. Basic statistical descriptors of the composition of a suite of samples such as the mean, standard deviation and covariance will also be biased (Chayes 1948, 1960, 1962, 1971; Krumbein 1962; Krumbein and Watson 1972; Vistelius 1948). Such difficulties were re-emphasised by Butler (1978, 1979) and Skala (1977, 1979). Chayes (1971) and others tried to provide solutions to these problems, but the results were not very satisfactory. However, Aitchison (1982, 1986, 2003) subsequently provided a largely satisfactory theoretical framework (see logratio transformation), but work on this topic is still ongoing (see Principle of statistical analysis on coordinates). See also: Woronow and Butler (1986), and Buccianti et al. (2006); closed array. Cluster A group of particles (objects) with nearest-neighbour links to other particles (objects) in the cluster (set). The term occurs both in percolation theory (percolation cluster) and in multivariate cluster analysis. Cluster analysis The term cluster analysis was introduced by the American psychologist, Robert Choate Tryon (1901–1967) (Tryon 1939), and means the assignment of n individual objects to groups of similar objects on the basis of their p-dimensional attributes. The first step of this multivariate method is to compute a similarity matrix between all pairs of samples; this is then used as the basis for assigning the samples to different groups. In one set of techniques, hierarchical clustering, the solution involves nesting sub-groups within larger groups. This is generally accomplished either by (i) agglomerative clustering, in


which the n individuals are successively fused into groups; and (ii) divisive methods, which progressively partition the set of individuals into successively finer groupings. The results are generally displayed in the form of a two-dimensional tree-diagram or dendrogram in which the individuals all occur at the topmost level, representing the tips of the branches; these are then progressively joined downwards as the similarity between the groups becomes more generalised until, at the base, they are all joined as a single group. Several standard algorithms are used to compute the tree structure (e.g. single linkage, complete linkage, median clustering, centroid, etc.); although the resulting structure will be broadly similar, some individuals (probably marginal in composition between two groups) may be forced into different sub-groups depending on the method used. The alternative is to use non-hierarchical methods, e.g. (i) the Nonlinear Mapping algorithm (Sammon 1969), a non-metric multi-dimensional scaling (Kruskal 1964) in which the samples are generally represented as points on a two-dimensional scatterplot, interpoint distance reflecting the distance between the points in the original p dimensions, thereby allowing the investigator to determine which samples constitute groups or sub-groups, or (ii) the ISODATA algorithm (Ball and Hall 1965, 1966), which can be applied to very large data sets. Mancey (1982) achieved a successful cluster analysis of gap-filled, moving-average smoothed maps (22,000 square map cells), based on c. 50,000 stream sediment samples over England and Wales (Webb et al. 1978), on the basis of 10 major and trace elements, into 9 meaningful groups. Although, again, not completely distortion-free, non-metric techniques are preferred by a number of workers. One obvious reason is that hierarchical methods will always force a structure on the data, whereas non-hierarchical methods will show when all the objects belong to a single group of essentially homogeneous composition. See Howarth (1973b) and Kovach (1989) for comparisons of the two approaches. Hierarchical clustering methods are also used in the reconstruction of evolutionary patterns by cladistic methods, the resultant tree-structure being known as a cladogram. These types of methods are also discussed in the pattern recognition literature as pattern analysis. See also: Parks (1966), Bezdek et al. (1984), Botbol (1989) and Southard (1992). Cluster model A model for the possible grouping of events in a time sequence such as earthquake occurrences in a given region (Adamopoulos 1976). See: point process. Cluster sample This refers to a technique in which "natural" subpopulations are identified and a number of them are selected at random, but differs from nested sampling in that the entire subpopulation is sampled (Krumbein and Graybill 1965), e.g. a number of drill-hole sites within the target population are selected at random and if the entire drill core over the interval of interest is assayed on the basis of dividing it into a number of equal-length segments, the drill-hole may be regarded as a cluster. Cluster validity A cluster analysis results in the partitioning of an n-dimensional data set X into a number (k) of groups, or clusters, of objects which are more similar to each


other than they are to the rest of the members of X. In many cases, the true value of k is unknown, and the determination of an optimal k has been termed the cluster validity problem (Bezdek et al. 1984).
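A minimal sketch of agglomerative (hierarchical) cluster analysis, in the sense of the Cluster analysis entry above, is given below; the data are synthetic and the numpy and scipy libraries are assumed.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (10, 3)),    # synthetic group 1
                   rng.normal(2.0, 0.3, (10, 3))])   # synthetic group 2

    Z = linkage(X, method='single')                  # single-linkage agglomeration
    groups = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into two clusters
    print(groups)                                    # group membership of each sample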


Clustered data Data in which spatial groupings of samples occur. Krumbein and Graybill (1965) pointed out that the presence of clusters may cause distortion when fitting an overall polynomial trend-surface to a data set. COBOL An acronym for Common Business-Oriented Language. The original outlines for a computer programming language which resembled the English language syntax were proposed by the American computer scientist, Grace Murray Hopper (1906–1992) in 1959 and the final specifications were drawn up by a committee representing the American computer manufacturers International Business Machines, RCA (formerly Radio Corporation of America), and Sylvania Electric Products later that year (United States Department of Defense 1961; Sammet 1961); subsequently used in some earth science applications (Hruška 1976; Sheriff 1984). Coconditional simulation An extension of conditional simulation which uses cokriging for the conditional process, dealing with more than one spatially-distributed attribute and preserving the cross-correlation between them (Carr and Myers 1985; Carr and Prezbindowski 1986). Coefficient of association Also known as the simple matching coefficient, Ssm, it indicates the degree of similarity between samples in which the variables used as a basis for comparison can be reduced to two states, e.g. presence/absence, yes/no, etc.: Ssm = (C + A)/(N1 + N2 − C + A), where C = present in both units compared; N1 = total present in the first unit; N2 = total present in the second unit; and A = absent in both (but present in others). Originally introduced by the Austrian-American biostatistician and anthropologist, Robert Reuven Sokal (1926–2012) and American entomologist, Charles Duncan Michener (1918–) (Sokal and Michener 1958; Cheetham and Hazel 1969), it was implemented in a computer program, which was one of the earliest programs published by the Kansas State Geological Survey (Kaesler et al. 1963). See also: binary coefficient.
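The simple matching coefficient defined above can be computed directly from two presence/absence vectors, as in the following sketch (hypothetical data; only numpy is assumed):

    import numpy as np

    def simple_matching(u, v):
        u, v = np.asarray(u, bool), np.asarray(v, bool)
        C = np.sum(u & v)             # present in both units
        A = np.sum(~u & ~v)           # absent in both (but present in others)
        N1, N2 = np.sum(u), np.sum(v)
        return (C + A) / (N1 + N2 - C + A)

    print(simple_matching([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))   # 0.6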

Coefficient of determination (r² or R²) A measure of the goodness-of-fit of a regression model: the square of the product-moment correlation coefficient between the observed and fitted values of y (the multiple correlation coefficient) and is equal to the variation in the dependent variable (y) explained by all the predictors, divided by the total variation in y, hence the term coefficient of determination. This ratio is often expressed as a percentage. The term was introduced by the American geneticist and evolutionary theorist, Sewall (Green) Wright (1889–1988) (Wright 1921) and its possible first use in geology was by the American sedimentologist, Lincoln Dryden (1903–1977) (Dryden 1935). However,


this criterion can be very misleading when fitting nonlinear regression models; see discussion in: Draper and Smith (1981), Kvålseth (1985), Willett and Singer (1988), Ratkowsky (1990), and Scott and Wild (1991). Coefficient of proportional similarity (cosθ coefficient) Better known as the cosθ coefficient, this was introduced by the American geologist, John Imbrie (1925–) and Dutch marine geologist, Tjeerd Hendrik van Andel (1923–2010) (Imbrie and Van Andel 1964) as a measure of the closeness of two compositional vectors xj and xk in p dimensions: cos θjk = Σi=1…p (xij·xik) / √[(Σi=1…p xij²)·(Σi=1…p xik²)], where the summation is over i = 1, 2, …, p. This coefficient ranges from 0–1, being zero when the two vectors are 90° apart (i.e. having nothing in common) to unity when they are coincident. It has been widely used in Q-mode factor analysis. See Howarth (1977b) for levels of significance. Coefficient of variation The ratio of the sample standard deviation to its mean. It has proved a useful measure in studies of sampling variability since its introduction by the English statistician, Karl Pearson (1857–1936) (Pearson 1896a). See Koch and Link (1971) for an early application to earth science. Coherence The coherence between two weakly stationary stochastic processes X(t) and Y(t), both with zero mean, is the square of the cross-spectrum, i.e.

|Pxy(f)|² / [Pxx(f)·Pyy(f)], where Pxx(f) is the estimated power spectrum of X, Pyy(f), the estimated power spectrum of Y, and Pxy(f) is their cross-power density-spectrum, or the (cospectrum)² + (quadrature spectrum)² divided by the product of the spectra, i.e. it is the square of coherency. However, as pointed out by Weedon (2003), some authors use the two terms synonymously. Introduced by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1928), coherence is an analogue in the frequency domain of the coefficient of determination. An approximate frequency distribution of the coherence for data having a Normal distribution was developed by Goodman (1957) and is known as the Goodman distribution. See also: coherency spectrum, semblance.
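One possible way of estimating the (magnitude-squared) coherence between two time series, in the sense of the entry above, is sketched below using two synthetic signals sharing a 5 Hz component; the numpy and scipy libraries are assumed.

    import numpy as np
    from scipy.signal import coherence

    fs = 100.0                                   # sampling frequency (Hz)
    t = np.arange(0.0, 60.0, 1.0 / fs)
    rng = np.random.default_rng(1)
    x = np.sin(2 * np.pi * 5 * t) + rng.normal(0, 0.5, t.size)
    y = np.sin(2 * np.pi * 5 * t + 0.7) + rng.normal(0, 0.5, t.size)

    f, Cxy = coherence(x, y, fs=fs, nperseg=512)
    print(f[np.argmax(Cxy)])                     # frequency of maximum coherence (about 5 Hz)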


Coherence spectrum, coherency spectrum A method by which two signals may be compared quantitatively in the frequency domain. Both terms are used in the earth science literature (Buttkus 1991, 2000; Weedon 2003).


Coherency The coherency between two weakly stationary stochastic processes X(t) and Y(t), both with zero mean, is the normalised modulus of the cross-spectrum, i.e. |Pxy(f)| / √[Pxx(f)·Pyy(f)], where Pxx(f) is the estimated power spectrum of X; Pyy(f) is the estimated power spectrum of Y; and Pxy(f) is their cross-power density-spectrum, or [(cospectrum) + i (quadrature spectrum)] divided by the square root of the product of the spectra, where i is the imaginary unit √(−1). Coherency is a measure of the correlation between two time series at different frequencies and is an analogue in the frequency domain of the correlation coefficient. The concept was introduced by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1928), and by the mid-1960s was being widely used in geophysics; see: Jansson and Husebye (1963), Foster and Guinzy (1967), Neidell and Taner (1971), Buttkus (1991, 2000) and Weedon (2003). Coherent noise Noise wave trains which have a systematic phase relation (coherence) between adjacent traces. Sheriff (1984) notes that most source-generated seismic noise is coherent. The term came into general use in the late 1950s and into geophysics in the 1960s (Carder 1963). See Sheriff and Geldart (1982) and Buttkus (1991, 2000) for discussion. Cokriging This is essentially a multivariate extension of kriging, originally introduced by the French geostatistician, Georges Matheron (1930–2000) (Matheron 1970). If, in a geostatistical estimation of spatially distributed ore grades, etc., one variable has not been sampled sufficiently often to provide adequate precision, then the precision of its estimation may be improved by taking into account the spatial correlation with another variable for which denser spatial sampling exists. For discussion see: Journel and Huijbregts (1978), Myers (1982), Freund (1986), Isaaks and Srivastava (1989), Carr and Myers (1990), Wackernagel (1995), and Bivand et al. (2013). Colatitude The complementary angle of the latitude, i.e. (90° − latitude); the polar angle on the sphere measured from the North Pole rather than the Equator. The term has been in use since at least the eighteenth century (e.g. Watts 1728). Since the 1950s colatitude appears to be the preferred spelling rather than co-latitude. See also longitude. Collins diagram Named after the American hydrogeochemist, William Dennis Collins (1875–), this bar chart (Collins 1923) uses double divided bars to show the cationic and anionic compositions of a water sample separately; each set is recalculated to sum to 100% and plotted in the left- and right-hand bars respectively.


Colour derivative image Collins and Doveton (1968) describe a technique using colour-mixing to highlight shape changes in spontaneous potential and Gamma-ray log curves, based on the first and second derivatives of the filtered log signal. Colored noise, coloured noise Colored (N.B. American English sp.) noise can be obtained from white noise by passing the signal through a filter which introduces a degree of autocorrelation, e.g. x(t) = a·x(t − 1) + k·w(t), where w(t) is a white noise signal; a is a constant, 0 < a < 1; k is the gain, and x(t) is the output signal at time t. The power spectrum density for brown noise decreases linearly as 1/f²; for pink noise (also known as one-over-f noise) it decreases linearly as 1/f; for blue (or azure) noise it increases linearly as f; for purple (or violet) noise it increases linearly as f². The power spectrum density for grey noise is U-shaped with a minimum at mid-range frequencies. That for black noise either: (i) is characterised by predominantly zero power over most frequency ranges, with the exception of a few narrow spikes or bands; or (ii) increases linearly as f^p, p > 2. Red noise is a synonym for brown noise (or sometimes pink noise). The concept of white light as having a uniform power density over its spectrum was first discussed by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1926), and taken up in digital signal processing by the American mathematician Richard Wesley Hamming (1915–1998) and statistician John Wilder Tukey (1915–2000) (Tukey and Hamming 1949); see also Blackman and Tukey (1958). For usage in an earth science context, see Weedon (2003), Treitel and Robinson (1969), and Kulhánek and Klíma (1970). The American spelling "colored noise" rather than the British "coloured noise" has continued to be the most widely used (Google Research 2012). Column vector A matrix with only one column. The term appears in a geological context in Krumbein and Graybill (1965). See also: row vector. Combination tone In the case of imposed amplitude modulation in which a long period sinusoidal wavelength with frequency f1 is imposed on another with frequency f2, f1 > f2, then minor combination tones will be generated at frequencies 1/f = 1/f1 ± 1/f2, the upper and lower sidebands on either side of the dominant frequency (f2). These appear as symmetrically placed minor-amplitude peaks on either side of f2 in the power spectrum of the resulting waveform. The term combination tone was used in acoustics by the German physicist, Georg Simon Ohm (1787–1854) (Ohm 1839). They are also called interference beats and interference tones; their generation is known as intermodulation or frequency mixing. The primary combination tone at f1 + f2 is known as a summation tone, and that at f1 − f2 as a difference tone. When a component frequency is higher than a fundamental frequency, it is called an overtone, and a difference tone at a lower frequency than the fundamental is called an undertone. For discussion in an earth science context see King (1996) and Weedon (2003).
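A minimal sketch of generating coloured ("red"/brown-like) noise from white noise with the first-order recursion x(t) = a·x(t − 1) + k·w(t) quoted above; the parameter values are hypothetical and only numpy is assumed.

    import numpy as np

    rng = np.random.default_rng(42)
    n, a, k = 5000, 0.9, 1.0
    w = rng.normal(0.0, 1.0, n)          # white noise input
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + k * w[t]   # autocorrelated (coloured) output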


Common logarithm (log, log10) An abbreviation for the common (i.e. base-10) logarithm. If x = z^y, then y is the logarithm to the base z of x, e.g. log10(100) = 2; and log(xy) = log(x) + log(y); log(x/y) = log(x) − log(y), etc. The principle was originally developed by the Scottish landowner, mathematician, physicist and astronomer, John Napier, 8th Laird of Murchiston (1550–1617), who produced the first table of natural logarithms of sines, cosines and tangents, intended as an aid to astronomical, surveying and navigational calculations (Napier 1614; Napier and Briggs 1618; Napier and Macdonald 1889). "The same were transformed, and the foundation and use of them illustrated with his approbation" by the British mathematician, Henry Briggs (1561–1630), who following discussions with Napier whom he visited in 1615 and 1616, developed the idea of common logarithms (sometimes called Briggsian logarithms), defining log(1) = 0 and log(10) = 1, and obtaining the intermediate values by taking successive roots, e.g. √10 is 3.16227, so log(3.16227) = 0.50000, etc. His first publication (Briggs 1617) consisted of the first 1000 values computed, by hand, to 14 decimal places (they are almost entirely accurate to within

10^−14; see Monta (2015) for an interesting analysis). A full table was initially published in Latin (Briggs 1624). After Briggs' death an English edition was published "for the benefit of such as understand not the Latin tongue" (Briggs 1631). Briggs' logarithms were soon being applied in works on geophysics, e.g. by the English mathematician, Henry Gellibrand (1597–1637) who was studying terrestrial magnetism (Gellibrand 1635). The first extensive table of (Briggsian) anti-logarithms was made by the British mathematician, James Dodson (?1705–1757) (Dodson 1742). All the tables mentioned here were calculated by hand as mechanical calculation did not come into use until the beginning of the twentieth century. Although 10 is the common or Briggsian base, others may be used, see: Napierian logarithm and phi scale. Communality Principal components analysis is usually based on the correlation matrix, in which the principal diagonal (the correlation of each variable with itself) is unity. However, in factor analysis, the entries in this diagonal are replaced by estimates of the communality, a measure of the amount which each variable has in common with the other variables retained in the factor solution, which can be regarded as a measure of the non-uniqueness of the variables expressed as a proportion of the total variance. A lower-bound estimate is the squared multiple correlation between each variable and all the others (Guttman 1954). Early geological use occurs in Imbrie and Purdy (1962). Compiler A computer program which translates a high-level program, written in a programming language such as ALGOL, BASIC, COBOL, FORTRAN, etc. into a machine-level object program before loading and running; if an interpreter is used, then the source code is translated and run concurrently. The first compiler was written in assembly language by the American computer scientist, Grace Hopper (1906–1992) at the Remington Rand Corporation for the Arithmetic Language version 0 (A-0) programming language on a UNIVAC (Universal Automatic Computer) I in 1952 (Hopper 1953); the first FORTRAN compiler was developed by a team at IBM for the IBM 704 in 1957


(Backus 1980). More recently, compilers written for Pascal and C have been written in the languages themselves. See Koch and Link (1970–1971) for early discussion in a geological context. Complex conjugate 1. The conjugate of a complex number is the number with the sign of its imaginary part reversed (Sheriff 1984), i.e. for a complex number, z = x + iy, where x and y are real and i is the imaginary unit √(−1), its complex conjugate is z̄ = x − iy. The complex conjugate is usually denoted with a bar (z̄) or a superscript asterisk (z*). Its use is attributed to the French mathematician, Augustin-Louis Cauchy (1789–1857) (Cauchy 1821). 2. In the case of a matrix, A = (aij), it is the matrix obtained by replacing each element aij by its complex conjugate, as above. An early example of its use in geophysics is Gosh (1961); it is also mentioned in Camina and Janacek (1984) and Yang (2008). See also unitary matrix. Complex dedomulation Its goal is similar to that of harmonic analysis, in seeking to describe the amplitude and phase of a waveform, but it makes use of low-pass (moving average) filters to enhance variations in amplitude and phase structure as a function of time. These variations in instantaneous amplitude and instantaneous phase can be usefully plotted as a function of time. The operation of complex dedomulation in the power cepstrum domain is equivalent to complex demodulation in the frequency domain: a shifting of frequency in a time series by multiplication by sines and by cosines of a quefrency, followed by smoothing and sometimes decimation of the two resulting time series, which can be regarded as the real and imaginary parts of a complex series. The term was introduced by the American statistician, John Wilder Tukey (1915–2000) in Bogert et al. (1963); see also Bingham et al. (1967) and Bloomfield (1976). For discussion in an earth science context see: Taner et al. (1979), Pisias and Moore (1981), Shackleton et al. (1995), Rutherford and D'Hondt (2000), Buttkus (1991, 2000) and Weedon (2003). Complex demodulation A technique that allows the examination of the variation with time of the amplitude and phase of selected frequency components of a time series (Banks 1975). A frequency band of interest, centred on frequency ω′, is chosen for a time series x(t) and is shifted to zero frequency by multiplying each term by e^(−iω′t), where e is Euler's number, the constant 2.71828, and i is the imaginary unit √(−1), to produce a new series Xs(ω′, t) = x(t)·e^(−iω′t). This is then low-pass filtered using a series of filter weights (ak), ranging from −m to +m, to produce a demodulated time series:


Xd(ω′, t) = Σk=−m…+m ak·Xs(ω′, t + kΔt).


Instantaneous values of signal phase and amplitude are derived from the complex demodulates and cross spectra can be estimated from averages of their products. In practice (after removing the mean, tapering and adding zeros) the Fast Fourier transform (FFT) is applied. This produces a set of real and imaginary Fourier coefficients which are then multiplied by a chosen set of weights which define the passband centred on ω′. The resultant frequency band is then shifted to zero and the new set of low frequency Fourier coefficients can be truncated to define a new Nyquist frequency ω′N. Finally, an inverse FFT converts back to the time domain to produce a demodulated series consisting of independent data points with a new sampling interval of ω′N/2 (Webb 1979). The method was originally introduced by Tukey (1959b, 1961), Bingham et al. (1967) and Godfrey (1965); examples of its application in the earth sciences include Banks (1975), Roberts and Dahl-Jensen (1989), Pälike et al. (2001) and Brzeziński (2012). Complex function A function in which both the independent variable and dependent variable are complex numbers. For discussion in an earth science context, see Camina and Janacek (1984), Buttkus (1991, 2000), Yang (2008). Complex matrix A matrix whose elements may contain complex numbers. For discussion in an earth science context, see Buttkus (1991, 2000), Gubbins (2004) and Yang (2008). Complex number, complex value A complex number z has both real and imaginary parts, e.g. z = x + iy, where x is the real part and iy is the so-called imaginary part. The constant i, known as the imaginary unit, is √(−1). This terminology was introduced by the Swiss mathematician, Leonhard Euler (1707–1783) in the fourth volume of Euler (1768–1794). Such a number may be envisaged as a point on an infinite two-dimensional plane bounded by an x-axis, −∞ ≤ x ≤ ∞, corresponding to the real part of the number, and a y-axis, −∞i ≤ iy ≤ ∞i, corresponding to the imaginary part. It may also be written in the form z = Me^(iθ), where M is the magnitude (modulus) of the complex number, and M = √(x² + y²). The notion of a complex number was introduced some years later by the German mathematician and physicist, Carl Friedrich Gauss (1777–1855) (Gauss 1831, p. 102). Any function of real variables can be extended to a function of complex variables. For discussion in an earth science context, see Leet (1950), Camina and Janacek (1984), Buttkus (1991, 2000), Gubbins (2004), Yang (2008). See also: Argand diagram, imaginary number, complex conjugate, Euler's relation, polar form.
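As a rough sketch of the complex demodulation procedure outlined above (shift the band of interest to zero frequency, then low-pass filter to recover instantaneous amplitude and phase), the following assumes a synthetic amplitude-modulated signal, a simple moving-average filter in place of a designed low-pass filter, and only the numpy library:

    import numpy as np

    fs = 50.0                                   # samples per unit time
    t = np.arange(0.0, 100.0, 1.0 / fs)
    x = (1 + 0.3 * np.sin(2 * np.pi * 0.05 * t)) * np.cos(2 * np.pi * 2.0 * t)

    w0 = 2.0                                    # frequency of interest
    xs = x * np.exp(-2j * np.pi * w0 * t)       # shift the band centred on w0 to zero
    weights = np.ones(101) / 101                # crude moving-average low-pass filter
    xd = np.convolve(xs, weights, mode='same')  # demodulated (complex) series

    amplitude = 2 * np.abs(xd)                  # instantaneous amplitude envelope
    phase = np.angle(xd)                        # instantaneous phase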


Complex variable, complex variate A variable which can take on the value of a complex number. For discussion in an earth science context, see Camina and Janacek (1984), Buttkus (1991, 2000), Gubbins (2004) and Yang (2008). Complex vector A vector whose elements may contain complex numbers. For discussion in an earth science context, see Camina and Janacek (1984), Buttkus (1991, 2000), Gubbins (2004) and Yang (2008). Component of a vector In two dimensions, the component of a given vector of magnitude m lying in a given direction at an angle θ to the reference direction (say the x-axis), is the projection of the vector onto that direction: the x-component is the scalar m·cosθ and, correspondingly, the y-component is m·sinθ with respect to the orthogonal y-axis. This concept may be generalised to N dimensions. For discussion in an earth science context see Gubbins (2004). Component transformation A component transformation in compositional space (J. Thompson 1982; Spear et al. 1982) is a linear transformation in an n-dimensional vector space. The components may be system components, phase components, or phases and are usually expressed as weight percentages of oxides obtained by chemical analysis of a rock. Phase or phase component compositions may be calculated if the petrographic mode and the chemical composition of a rock are known (Perry 1967a). Sebastián (1989) gives a computer program for calculating either the CIPW norm for oversaturated rocks, or modal norms for peraluminous or sillimanite-bearing granitoids. Composite map A method of integrating the information present in a number of spatially coincident maps (Merriam and Jewett 1989; Le Roux and Rust 1989; Herzfeld and Merriam 1990; Le Roux 1991). Depending on the number of maps employed, a specific standard value is allotted to each original map and its data are normalized so that every map has the same range of values. The new values for each data point are then added to give compound dimensionless values, which are contoured to produce a composite map. Le Roux and Rust (1989) applied the concept to a set of criteria favourable to uranium mineralization. Sepúlveda et al. (2013) have used the same idea in landslide risk assessment. Composite sample A compound sample or a channel sample, in which a set of subsamples is combined together to form a single aggregate sample prior to final sample preparation and analysis (Krumbein and Pettijohn 1938; Lancaster and Keller-McNulty 1998). Composite standard Graphic correlation is a method devised by the American palaeontologist and stratigrapher, Alan Bosworth Shaw (1922–), in 1958 (Shaw 1964) to aid stratigraphic correlation between sections. The method has conventionally consisted of


making a bivariate scatterplot of the heights (stratigraphic sections) or depths (wells) of occurrence of the tops and bases of as many taxa as possible which are common to the stratigraphic sections to be compared. Continuous linear or nonlinear functions are fitted to the whole section, or to segments of it; any abrupt discontinuity suggests a possible sequence boundary, condensed section or structural break. Smith (1989a) showed that the method can be very effective when used with smoothed well log data; modern applications are reviewed in Mann and Lane (1995) and Gradstein (1996). Pairwise comparisons of a number of stratigraphic sections (beginning with that which is believed to be most complete) enable a composite standard to be built up by gradually extending the observed stratigraphic ranges of the taxa from section to section until a "complete" reference standard is obtained. Compositional data, composition A term introduced by the Scottish statistician, John Aitchison (1926–) (Aitchison 1982, 1986, 2003) to refer to data sets subject to a constant-sum constraint or, more generally, to parts of some whole, whether closed or not (Aitchison 1986). In geology such data is often geochemical (e.g. major-element oxide compositions of rocks and, at least in theory, trace element data expressed as parts per million, etc.) and sediment grain-size data; see Pawlowsky-Glahn and Olea (2004), Pawlowsky-Glahn (2005), Buccianti et al. (2006), Pawlowsky-Glahn and Buccianti (2011) and Thió-Henestrosa and Martín Fernández (2015). For software, see van den Boogaart and Tolosana-Delgado (2008, 2013) and Templ et al. (2011); see also: closed data, closure problem, logratio transform, normal distribution on the simplex. Compound sample A set of grab samples (usually taken at a single field site, outcrop, etc.) combined together to form a single aggregate sample prior to final preparation and analysis of the physical samples (Krumbein and Pettijohn 1938; Lancaster and Keller-McNulty 1998). See also: composite sample. Computer The earliest electronic computers were analog [American English spelling], in which physical phenomena were modelled using electrical voltages and currents as the analogue quantities. Although these were subsequently used in geophysical applications (Housner and McCann 1949), following the work of the American electronic engineer and mathematician, Claude Shannon (1916–2001), who showed (Shannon 1937, 1993) that the operations of Boolean algebra could be accomplished using electronic relays and switches, the first electronic digital computer was developed by American mathematician and physicist, John Vincent Atanasoff (1903–1995), who is also reputed to have coined the term analog computer, and electrical engineer, Clifford E. Berry (1918–1963) at Iowa State University in 1939. However, it was not programmable, having been specifically designed to solve linear equations. The first programmable electronic computer, the Harvard Mark I, began operation in 1944; this was followed by the ENIAC (Electronic Numerical Integrator and Computer), the first stored-program computer with a programming language (Haigh 2014; Haigh et al. 2014b), developed at the University of


Pennsylvania by American physicist John Wendell Mauchly (1907–1980) and electrical engineer John Adam Presper Eckert Jr. (1919–1995). This came into operation in 1945, and by 1948 it was able to execute stored programs. Its successor, the first general-purpose stored program computer, EDVAC (Electronic Discrete Variable Automatic Computer), also designed by Mauchly and Eckert, with the Hungarian-born American mathematician, Janosh (John) von Neumann (1903–1957) as a consultant, began operation in 1951 (Knuth 1968–1973). Inspired by EDVAC, in England, the mathematical physicist, Maurice Wilkes (1913–2010) and his team at Cambridge built the EDSAC I (Electronic Delay Storage Automatic Calculator) computer which was used until 1958. Similar machines (LEO (Lyons Electronic Office) I, Ferranti I, UNIVAC (Universal Automatic Computer) I, IBM 701, IBM 704, IBM 650, etc.) were developed for the commercial market in the early 1950s, and oil companies were among the first customers. Within ten years integrated circuits had begun to replace valve technology. Analogue computers were then still in use in geophysics: in seismology (Housner and McCann 1949), and for applications such as filtering of seismic records (Jones and Morrison 1954), development of synthetic seismograms (Peterson et al. 1955), and seismic correlation (Tullos and Cummings 1961). However, by the late 1950s digital computers had begun to replace them, e.g.: for detection and filtering of seismic data (Wadsworth et al. 1953; Smith 1958; Morrison and Watson 1961); the interpretation and contouring of gravity data (Simpson 1954; Danes 1960; Talwani and Ewing 1960; Morrison and Watson 1961); interpretation of electrical and magnetic surveys (Vozoff 1958; Yungul 1961); and interpretation of well-log data (Broding and Poole 1960). The first publication on computer use in geology was by Krumbein and Sloss (1958) on the analysis of stratigraphic data; see also Creager et al. (1962). The Russian mathematical geologist, Andreĭ Borisovich Vistelius (1915–1995) describes the use of the BESM-2 computer (Bystrodeystvuyushchaya Electronnaya Stchetnaya Mashina [Fast electronic calculating machine]), designed by Sergey Alexeyevich Lebedev (1902–1974) in the years 1947–1951 and manufactured at the Volodarsky Plant, Ulyanovsk (Karpova and Karpov 2011), and installed in 1957 at the V.A. Steklov Mathematics Institute of the Academy of Sciences of the USSR (Moscow), where it was used by Vistelius and his colleagues to compute correlation coefficient matrices and trend-surfaces to aid the solution of geological problems (Vistelius and Yanovskaya 1963). Normative and similar calculations were soon implemented as computer programs (Thornton and McIntyre 1958; Imbrie and Poldervaart 1959; Vitaliano et al. 1961; Johnson 1962; Howarth 1966) and with the development of laboratory methods such as X-ray fluorescence for geochemical analysis (Leake et al. 1969), geochemical data-processing applications grew rapidly. However, the earliest publications of actual program code, generally for more complex applications, included mapping (Whitten 1963; Harbaugh 1963) and simulation (Harbaugh 1966). Koch and Link (1970–1971) was the first mathematical geology textbook to contain a chapter on "Electronic computers and


geology,” in which computer programming, data acquisition, storage and retrieval, and approaches to working with computers were discussed at any length. See also: flowchart.


Computer-Aided Design or drafting (CAD) The term is attributed to the American computer scientist, Douglas Taylor Ross (1929–2007) (Ross 1960), who headed a project at the Massachusetts Institute of Technology to investigate ways in which the computer could aid the engineering design process. Early applications of the subsequently-developed software (e.g. AutoCAD) to geology include Cameron et al. (1988) and Marschallinger (1991). However, dedicated earth science software packages (e.g. GOCAD) now underpin much three-dimensional geological modelling, e.g. Fallara et al. (2006); Caumon et al. (2009). Computer-Aided Instruction, computer assisted instruction (CAI) The use of computers in teaching academic skills. The underlying philosophy is clearly explained in Mann (2009). The first large-scale project involving the use of computers in education was piloted in 1959 by American electrical engineer and computer scientist, Donald Lester Bitzer (1934–), inventor of the plasma-display panel, at the Computer-based Education Research Laboratory, University of Illinois at Urbana-Champaign, using a Control Data Corporation 1604 computer (Bitzer et al. 1965). Computer-based self-paced instruction to teach mathematics and reading to schoolchildren was also introduced in 1963 by physicist, psychologist and philosopher of science, Patrick Suppes (1922–2014) and psychologist Richard Chatham Atkinson (1929–) of Stanford University, based on a Digital Equipment Corp. PDP-1 computer (Suppes and Jerman 1969). Early applications in the earth sciences appear in Merriam (1976a). Computer graphics Computational methods for 2D- and 3D-modelling and display of geological objects, such as a fossil (Sutton et al. 2013), an orebody or a mine (Xie et al. 2001; Schofield et al. 2010); surfaces, or landscapes (Groshong 2008), or the results of simulations of rock structure and properties, seismic data, etc. (Pflug and Harbaugh 1992; Uhlenküken et al. 2000; Jones et al. 2009; Zehner et al. 2010). This may include modelling the appearance of an illuminated surface as a result of the interaction of light with the surface material and the occurrence of shading. Statistical graphics and data display are generally considered to form a separate topic. See also: image processing. Computer modelling, computer-aided modelling Use of computer-based numerical methods to evaluate conceptual model, deterministic model, discovery-process model, fluid-flow model, mathematical model, stochastic process model. Computer program Programming is the method of encoding the instructions in a program (note American English spelling is conventionally used for this term) enabling


a computer to solve a problem by the input of raw data (if required), undertaking the necessary calculations, and output of the results. The initial analysis of the problem would probably have involved developing an algorithm, determining the detailed logical steps involved in the process, often developed diagrammatically in the form of a flowchart (to aid analysis and debugging of the logic) and, finally, embodying the results in a formal language to enable its execution on the computer hardware. From 1954, this would have been in the form of a low-level, often machine-specific, “machine language” or assembler code [such as FAP, acronym for FORTRAN Assembly Program, originally developed by David E. Ferguson and Donald P. Moore at the Western Data Processing Centre, University of California, Los Angeles; Moore (1960)], which enabled translation, by means of a compiler, of the human-originated instructions into the strings of binary bits required for the actual machine operation. The first manual on computer programming (Wilkes et al. 1951) was written for the EDSAC 1 (Electronic Delay Storage Automatic Computer) built at Cambridge in 1946–1949, which was the first stored-program computer. Krumbein and Sloss (1958) give an early example of such a program for compilation of stratigraphic thickness ratios. However, by the early 1960s high-level “autocodes,” i.e. computer languages such as FORTRAN (acronym for Formula Translation), developed for the IBM704 in early 1957 (McCracken 1963), or ALGOL (acronym for Algorithmic Oriented Language), developed mainly in Europe from 1958 (Dijkstra 1962), enabled easy coding of computational instructions and formats for reading data and outputting the results. This “source code” would be processed by a compiler to produce the “object code” which governed the actual operation of the computer. For early discussion in a geological context, see Koch and Link (1971). Early examples of geological usage include: Whitten (1963), Kaesler et al. (1963), Harbaugh (1964), Link et al. (1964), Fox (1964), Manson and Imbrie (1964), Koch et al. (1972) and Sackin et al. (1965). Successive versions of FORTRAN have continued to be used up to the present time. The interactive general-purpose programming language BASIC was introduced in 1964. Despite the later proliferation of computer packages such as Excel for performing spreadsheet, mathematical and statistical calculations, new special-purpose programming languages, such as S, originally developed by a team at AT&T’s Bell laboratories (Becker et al. 1988), and its successors S-Plus (Venables and Ripley 1994), and a freeware alternative R (originally developed by Robert Gentleman and Ross Ihaka of the Statistics Department, University of Auckland, New Zealand in 1993) have been developed (Maindonald and Braun 2003; Everitt and Hothon 2006; Reimann et al. 2008; Bivand et al. 2008, 2013) to assist customised statistical work and enabling the rapid inclusion of research-level methods contributed by its user community, which have been taken up by earth science users. Computer programming language High-level languages in which a computer program can be written so that they may subsequently be translated into machine language to execute the instructions. See: ALGOL, APL, awk, BASIC, C, COBOL, FORTRAN, Pascal, PL/I, Prolog, Python, R, S; see also: assembler language.


Concentration-Area (CA) plot A graph of log-transformed cumulative area for contours on a map plotted (y-axis) as a function of log-transformed [concentration] contour value (x-axis). This method was originally proposed by Qiuming Cheng (1994) in his doctoral thesis and published by Cheng et al. (1994). Anomalies with a Pareto distribution show as a straight-line segment on the right-hand side of a CA plot (Cheng and Agterberg 2009). See also Ballantyne (1994), Reimann et al. (2008). Conceptual model A formal expression of an idea which may be used to try to explain a set of observed data. The majority of conceptual models are expressed in qualitative terms and may well be embodied in diagrammatic form to show how various elements of the model are linked. They were early applied to fields such as: stratigraphic models, igneous and sedimentary models, geomorphological models, and palaeontological models (see Krumbein and Graybill 1965 for discussion). See also: deterministic model, discovery-process model, fluid-flow model, mathematical model, physical model, process-response model, scale model, statistical model, stochastic process model. Conceptual population Also known as a target population, it is the ideal population of individuals whose characteristics it is desired to determine (e.g. the rock types making up an igneous intrusion in its entirety). Sampling of such a unit is usually not realistically possible, so one makes do with the sampled population. The nature of the target population is inferred from parameter estimates made from the sampled population. Concordia diagram Attempts to determine the age of minerals by consideration of the amounts of uranium and lead they contain go back to Boltwood (1907); see also early discussion by Kovarik (1931). The first reported calculation of the age of the Earth based on common lead isotope analyses was that of Gerling (1942). The Concordia diagram is a graph (introduced by the American geophysicist, George West Wetherill (1925–2006) in 1956), used in the interpretation of uranium-lead isotope geochronology. Named for the "Concordia curve," which shows the theoretical locus of pairs of the ratios of 206Pb*/238U (y-axis) and 207Pb*/235U (x-axis) which give the same age (Wetherill 1956; see: D/P diagram); the asterisk denotes that the daughter element has been produced by radioactive decay; the name of the curve is derived from the fact that if two independent dates are the same, they are said to be "concordant." Thus at any point on the curve, 206Pb*/238U = exp(λ238·t) − 1 and 207Pb*/235U = exp(λ235·t) − 1 for the same value of t, where λ238 and λ235 are the respective decay constants and t is time (billion years, by). Overall, 0 < 206Pb*/238U < 0.5 and 0 < 207Pb*/235U < 12 and the upper limit of the curve is about 2.5 by. In practice, loss of radiogenic Pb from minerals (zircon or, less frequently, apatite and sphene) causes plotted data sets to lie below the curve. If the mineral crystallized from a magma and subsequently remained a closed system (i.e., no loss or gain of U or Pb) then the set of ratios will plot as points close to a straight line, below


the Concordia curve. A straight line fitted to these points may be extrapolated to intersect it: the upper point (older age) will correspond to the date at which the system became closed; the lower (younger age) will be that at which an external event, such as metamorphism, caused the lead leakage. Alternatively, as a result of U leakage, the set of approximately linear points may lie above the Concordia curve. In this case, the interpretation of the intersection of the extrapolated straight line fitted to these points with the Concordia curve is as before: the older age represents the time of the initial crystallisation of the zircon; the younger the date of the possible metamorphic event. Several variants of this method have also been proposed: 207Pb/206Pb (y-axis) and 238U/206Pb (x-axis) (Terra and Wasserberg 1972); 206Pb/207Pb (y-axis) and 235U/207Pb (x-axis) (Tatsumoto et al. 1972). Levchenkov and Shukolyukov (1970) recommended a three-dimensional approach, e.g. 204Pb/238U (vertical z-axis) as a function of both 207Pb/235U (x-axis) and 206Pb/238U (y-axis). See also Dickin (2005). Condition number, conditioning number A measure of the degree of ill-conditioning of a (symmetric) matrix, A, introduced by the English mathematician, Alan Mathison Turing (1912–1954). It is known as the condition number (Turing 1948), which is given by the ratio of the largest to smallest eigenvalues of A: λn/λ1, where λn > λn−1 > … > λ1 > 0. Conditional probability If A and B are two events occurring according to a probability distribution, then the probability of A, given the occurrence of B, is called the conditional probability of A given B, usually written as: Pr(A|B). For example, Pr(A and B) = Pr(A|B)·Pr(B). Note that it is not usually the case that Pr(A|B) = Pr(B|A); however, if Pr(A|B) = Pr(A), then the events A and B are said to be independent (Camina and Janacek 1984). See also: Bayesian methods, sequential Gaussian simulation. Conditional simulation, conditional indicator simulation Simulation of a spatially-distributed (regionalized) variable in one, two, or three dimensions in such a way that it honours both the behaviour of the data as defined by the directional variogram(s) and the values at known data points. The global optimisation method of simulated annealing, which was developed by American physicists at the IBM Thomas J. Watson Research Centre, Yorktown Heights, NY, Scott Kirkpatrick (1941–), (Charles) Daniel Gelatt Jr. (1947–) and Mario P. Vecchi (Kirkpatrick et al. 1983), and was independently discovered by the Slovakian physicist, Vladimir Černý (1952–) (Černý 1985), is used to achieve this. It can be regarded as an adaptation of the Metropolis-Hastings algorithm, based on an analogy with condensed matter physics in which the particles in an imagined physical system are regarded as equivalent to the many undetermined parameters of the system being optimized. The energy of the physical system is given by the objective function of the optimization problem. States of low energy in the imaginary physical system are the near-global optimum configurations required by the optimization problem. Their method



statistically models the evolution of the physical system at a number of increasing temperatures which allow it to “anneal” into a state of high order and very low energy (Kirkpatrick 1984). It was originally used at IBM for the optimization of integrated circuit layouts. In earth science applications, the variable being modelled may be either spatially continuous (e.g. porosity, ore grade) or a binary indicator of the presence or absence of some feature. See Matheron (1973), Gómez-Hernández and Srivastava (1990), Dowd (1991), Srivastava (1994), Deutsch and Journel (1994) and Hohn (1999) for discussion and examples in reservoir modelling. Carr and Myers (1985) and Carr and Prezbindowski (1986) extended the technique to coconditional simulation, dealing with more than one spatially-distributed attribute and preserving the cross-correlation between them. Pardo-Igúzquiza et al. (1992) use a spectral method for 1-D simulation based on Shinozuka and Jan (1972); see also: turning bands algorithm, Bivand et al. (2013). Confidence band, confidence belt, confidence region A delineated interval about a fitted regression line, distribution function, etc. corresponding to the upper and lower limits of the confidence bounds at any one point. In some cases, such as a fitted linear or polynomial regression, y = f(x), or a distribution function (Hastie and Tibshirani 1990; Chung 1989a, b), limits defining the 100(1 − α)% confidence interval on the predicted value of y corresponding to a given value of x may be determined analytically, but in more complex situations bootstrap estimates may be required (Efron 1979; Hall and Titterington 1988; Efron and Tibshirani 1993). Confidence bounds, confidence interval, confidence limits When estimating the value of a parameter of an observed population (e.g. the arithmetic mean or standard deviation) based on a finite set of observations, it is usually helpful to also specify an interval about the estimated value of the parameter within which the true (but unknown) value of this parameter should lie with a stated uncertainty, were the entire population to be sampled. The calculation of this interval is based on either a theoretical model for the underlying probability distribution (e.g. the normal distribution) or may be empirical, based on quantiles of the observed distribution. If one were to repeatedly calculate such an interval from many independent random samples, then one would in the long run be correct to state that the unknown true value of the parameter is contained in the confidence interval, say, 95% of the time. The values of the upper and lower limits are known as the confidence limits or confidence bounds. Hahn and Meeker (1991) caution that “a frequent mistake is to calculate a confidence interval to contain the population mean when the problem requires a tolerance interval or a prediction interval.” In some cases, a one-sided confidence bound is required. For example, if fluorite is expected to be present as an accessory mineral in a physical sample of a rock, and 1400 points have been counted over a thin-section but none has been found, one can still state with 99% certainty that the maximum amount which could be present in the parent material will not exceed 0.33% (Howarth 1998). Confidence intervals apply only to the sampled observations, not to possible future observations. See Hahn and Meeker (1991) for a review of methods of


calculation, and Helsel (2005) for treatment of geochemical data containing nondetects; Sneyd (1984) for fission-track dates; and Pardo-Igúzquiza and Rodríguez-Tovar (2004) for power spectral analysis. Although the idea can be traced back to the French mathematician Joseph-Louis, Comte de Lagrange (1736–1813), (Lagrange 1776), modern theory and the term confidence interval were introduced by the Russian-born American statistician, Jerzy Neyman (1894–1981) (Neyman 1934, 1935); see also: tolerance interval and prediction interval. Confidence ellipse A bivariate confidence bound. Often used to show uncertainty in isotopic compositions in isochron plots (Ludwig 1980, 2000). The bivariate normal distribution confidence ellipse was introduced by the French naval officer, astronomer and physicist, Auguste Bravais (1811–1863) (Bravais 1846). Confidence ellipsoid A trivariate confidence bound. The trivariate normal confidence ellipsoid was introduced by the French naval officer, astronomer and physicist, Auguste Bravais (1811–1863) (Bravais 1846). It is often used to show uncertainty in position of mean paleomagnetic directions (Irving 1954; Constable and Tauxe 1990; McElhinny and McFadden 2000) or other directional data (Fisher 1953) on the sphere. See also discussion in Weltje (2002) regarding data from sedimentary petrology and hexagonal confidence bound. Other earth science applications are discussed by Le Goff et al. (1992) and Gubbins (2004). Confirmatory data analysis A term introduced by American statistician John Wilder Tukey (1915–2000) (Tukey 1969, 1973, 1980) for model-fitting and traditional statistical procedures using hypothesis testing based on inference, significance and confidence to distinguish it from exploratory data analysis. “Rough confirmatory data analysis asks, perhaps quite crudely: ‘With what accuracy are the appearances already found to be believed?’” (Tukey 1973). For discussion in an earth science context see Romesburg (1985). Conformable In matrix multiplication, two matrices A and B can only be multiplied together to form C = AB if the number of columns in A is equal to the number of rows in B. If so, A and B are said to be conformable. Hence, if A is an m × p matrix and B is a p × n matrix, then their product C will be an m × n matrix (Camina and Janacek 1984). Conformal mapping, conformal projection 1. A geometrical transformation which does not alter the angle of intersection between two lines or two curves, e.g. the mapping of spherical coordinates on the Earth’s sphere onto a plane via the Lambert conformal conic projection (Thomas 1952). 2. A mathematical technique used to convert (“map”) one mathematical problem into another, through the use of complex numbers. Points in one complex plane can be mapped into an equivalent set of points in another (Spiegel 1972).
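The one-sided bound in the fluorite example under Confidence bounds above follows from the binomial distribution: if no grains are seen in n counted points, the upper 100(1 − α)% bound on the true proportion p satisfies (1 − p)^n = α. A minimal Python sketch (the function name and the call values are illustrative assumptions):

def upper_bound_zero_count(n, alpha=0.01):
    # Exact upper 100*(1 - alpha)% confidence bound on a proportion when
    # zero occurrences are observed in n binomial trials: (1 - p)**n = alpha.
    return 1.0 - alpha ** (1.0 / n)

# 1400 points counted, none containing fluorite:
print(round(100 * upper_bound_zero_count(1400, alpha=0.01), 2))  # about 0.33 (%)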



Conjugate The conjugate of a complex number is the number with the sign of its imaginary part reversed (Sheriff 1984), i.e. for a complex number, z = x + iy, its complex conjugate is z* = x − iy, where i is the imaginary unit √(−1). Attributed to the French mathematician, Augustin-Louis Cauchy (1789–1857) (Cauchy 1821). In the case of a matrix, it is the matrix obtained by replacing each element by its complex conjugate. An early example of its use in geophysics is by Gosh (1961); it is also mentioned in Camina and Janacek (1984) and Yang (2008). See also Hermitian conjugate. Conjugate gradient algorithm, conjugate gradient method An iterative method for the solution of large systems of linear equations of the form Ax = b, where A is a known, square, positive definite matrix, x is an unknown vector, and b is a known vector (Shewchuk 1994; Meurant 2006). It was introduced through the work of the American mathematician, Magnus Rudolph Hestenes (1906–1991) and Swiss mathematician Eduard Stiefel (1909–1978) (Hestenes and Stiefel 1952) and independent work by the Hungarian mathematician, Cornelius Lanczos (b. Lánczos Kornél, 1893–1974), (Lanczos 1950, 1952) when all three were at the Institute for Numerical Analysis of the U.S. National Bureau of Standards developing numerical methods suitable for computer-based solutions to problems (Golub and O’Leary 1989). Early applications in earth science were to digital filtering (Wang and Treitel 1973), tomographic inversion of seismic travel time residuals (Scales 1987) and adjustment of geodetic triangulation systems (Saxena 1974). Conservation An old term (Chapman and Bartels 1940) for autocorrelation. The correlation of a time-varying waveform with an offset copy of itself. Let x1, x2, ⋯, xn be a series of equally-spaced observations in time (or on a line in space) of length n. The autocovariance function is the series of values of the covariance (Cd) computed between values xi and members of the same series xi+d at a later interval in time, the k pairs of points being separated by the lag, d = 0, 1, 2, ⋯. Then

Cd = [(Σi xi·xi+d)/k] − m²,

where the sum is taken over the k overlapping pairs and m is the mean of all the data. The autocorrelation function rd = Cd/s², i.e. normalised by the variance of the data (which is the same as the autocovariance at lag 0, where the comparison is between all elements of x and itself). So, r0 = 1 by definition. At lag d = 1, the correlation is between {x1, x2, ⋯, xn−1} and {x2, x3, ⋯, xn}, etc.; −1 ≤ rd ≤ 1. The term conservation was introduced by the Swedish statistician, Herman Ole Andreas Wold (1908–1992) (Wold 1938), although it may also have been used as an unnamed function by the American mathematician, Norbert Wiener (1894–1964) from as early as 1926 (Wiener 1930, 1949); see also: Bartlett (1946), Blackman and Tukey (1958). It is mentioned in an earth science context by: Jones and Morrison (1954), Horton (1955, 1957), Grant (1957), Horton et al. (1964), Robinson (1967b), Davis and Sampson (1973), Camina


and Janacek (1984), Sheriff (1984), Buttkus (1991, 2000), Weedon (2003) and Gubbins (2004). See also: lagged product. Consistent estimate In estimation of a power spectrum, it is desirable that both the bias and variance of the estimate approach zero as the time interval over which it is estimated approaches infinity. Because an estimate with a smaller bias has a larger variance, and vice versa, the mean squared error of the estimate of the power spectrum is used as a comparator of alternative estimation methods. The reliability of the estimate is improved by either smoothing the periodogram over neighbouring frequencies using specified weights or taking the average of a large number of periodogram estimates based on 50% overlapping segments of the data (Buttkus 1991, 2000). Constant-sum data Data subject to a constant-sum constraint. In geology such data is often geochemical (e.g. major-element oxide compositions of rocks and, at least in theory, trace element data expressed as parts per million), proportions of grains of different sizes making up a sediment sample, or mineral percentages based on point-counts, etc. The interpretational problems this causes were first recognised by the American petrographer, Felix Chayes (1916–1993) (Chayes 1960, 1962), but viable solutions only came later (Aitchison 1984, 1986, 2003; Pawlowsky-Glahn and Olea 2004; Buccianti et al. 2006). See also: closed array, closure problem, compositional data, logratio transform. Constrained Independent Component Analysis (CICA) Independent Component Analysis (ICA), also known as Blind Source (or Signal) Separation, is a technique based on information theory, originally developed in the context of signal processing (Hérault and Ans 1984; Jutten and Hérault 1991; Comon 1994; Hyvärinen and Oja 2000; Hyvärinen et al. 2001; Comon and Jutten 2010) intended to separate independent sources in a multivariate time series which have been mixed in signals detected by several sensors. Constrained Independent Component Analysis (CICA) provides a method to incorporate more assumptions and prior information into ICA so as to improve the quality of source separation, e.g. if reference signals are available which carry some information to distinguish the desired components but are not identical to the corresponding sources. Lu and Rajapakse (2000, 2005) discuss the CICA approach in the context of facial image and brain-scan imagery and Lu and Liu (2009) apply the techniques to removal of multiples in seismic data. Constrained least squares The application of the least squares method to the solution of problems in which numerical constraints need to be considered, e.g. the percentages (pi) of each of several components present need to be such that 0 ≤ pi ≤ 100; the number of atoms (ni) in the formula of a mineral composition need to be such that 0 ≤ ni ≤ N, where N might be 1, 2, 3, 4, etc., depending on the mineral. Algorithms for solution of this type of problem were first independently developed by applied mathematicians, Josef Stoer (1934–) in Germany (Stoer 1971), and Charles L. Lawson (1931–) and Richard J. Hanson



(1938–) at the Jet Propulsion Laboratory in America in 1973, as part of a suite of FORTRAN subroutines which would accompany Lawson and Hanson (1974). Reid et al. (1973) apply the method to a petrological calculation; Ghiorso (1983) discusses the solution of this type of problem in the context of igneous, sulphide mineral and sedimentary petrology; and Speece et al. (1985) and Puryear et al. (2012) apply it to geophysical problems. Constrained Optimisation (CONOP) A technique for optimally sequencing large amounts of stratigraphic taxonomic first- and last-appearance data where the data is available from many sections (or cores etc.) so as to obtain an optimum palaeobiological time-line (Kemple et al. 1995; Cooper et al. 2001; Sadler et al. 2003; Gradstein 2005; Sadler 2012). Starting from an essentially random guess for the solution, the CONOP algorithm iteratively improves it to obtain a possible sequence of all palaeobiological events. It initially optimizes timelines only for the order of events, then adjusts the taxonomic range-ends in all sections, but only to other event horizons so as to obtain an ordinal solution which does not take into account the spacing of events. It is subsequently scaled using thickness information from all sections and then calibrated using dated events such as ash-fall bentonites, agreed taxonomic range-ends, radioisotopic dates, carbon isotope excursions, etc., so as to obtain a sequence of species originations and extinctions which minimise implied gaps in the record. Constraint A limitation externally imposed on a data set. The one most frequently encountered in geology is that of chemical or physical composition expressed as a percentage; in a regression analysis or Kriging, a series of weights may be constrained to be non-negative, or within the range >0 to 1, or to sum to 1, etc. For discussion, see Buccianti et al. (2006) and Pawlowsky-Glahn and Olea (2004). See also: constrained independent component analysis, constrained least squares, constrained optimisation. Consultation system A term used for a computer-based advice-giving or expert system (e.g. Hart 1975; Hart et al. 1978). Contingency table A table recording the joint frequencies of occurrence of the corresponding classes of two (usually categorical) variables. The corresponding univariate frequency distributions are given in the margins of the table. The method was introduced by the British statistician, Karl Pearson (1857–1936) (Pearson 1904). For discussion in a geological context see Krumbein and Graybill (1965), Vistelius (1980), Romesburg et al. (1981), Miller (1983) and Romesburg and Marshall (1985). Continuation Also known as analytic continuation: A method of extending the set of values over which a mathematical complex function is defined; the mathematical projection of a potential field from one datum surface to another level surface lying either above or below the original datum. See Weaver (1942), Peters (1949) and Buttkus (1991, 2000) in


the context of digital filtering in applied geophysics. It may also be referred to as: downward continuation or upward continuation. Continued fraction Literally, a fraction which is “continued on.” The first known use of this idea was by the Irish mathematician, (Lord) William Brouncker (1620–1684), first President of the Royal Society of London, in about 1654 or 1655, when he derived a continued fraction expansion for 4/π, from which he was able to correctly compute the value of π to 10 decimal places:

4/π = 1 + 1²/(2 + 3²/(2 + 5²/(2 + ⋯)))

or equivalently

π = 4/[1 + 1²/(2 + 3²/(2 + 5²/(2 + ⋯)))]

His result was first published in a text by his colleague, the English mathematician, John Wallis (1616–1703) in Wallis (1656). Numerous other examples of continued fractions subsequently discovered can be found in Abramowitz and Stegun (1965). Continuous distribution A frequency distribution in which the variable (x) can take any value within a range a ≤ x ≤ b. See, for example, Bingham distribution, extreme value distribution, Fisher distribution, Fractal distribution, Kent distribution, lognormal distribution, normal distribution, Pareto distribution, von Mises distribution. Continuous function A function of a single variable x, y = f(x). It may be thought of as continuous at a given point x0 if values of y fall on a smooth curve as x → x0, with small changes in x corresponding to smooth small changes in y with no abrupt discontinuities; the limit as x → x0 is f(x) = f(x0) (Camina and Janacek 1984). Continuous inversion A strategy for solving the inverse problem in geophysics: e.g. estimating the physical composition of a sequence of lithologies from data consisting of noisy measurements, in which it is assumed that the properties of the model are a continuous function of an independent variable. In practice, solutions are generally obtained by discretisation of the model. See discussions in a geophysical context by Backus and Gilbert (1967), Tarantola and Valette (1982), Russell (1988) and Gubbins (2004).
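Brouncker’s expansion above can be evaluated numerically by truncating it after a finite number of levels and unwinding the nested fractions from the inside outwards; the following Python sketch (the function name and the number of levels are arbitrary choices) illustrates this, although convergence is slow.

import math

def pi_from_brouncker(levels=100000):
    # 4/pi = 1 + 1^2/(2 + 3^2/(2 + 5^2/(2 + ...))); truncate after `levels`
    # nested terms and work outwards from the innermost "2".
    d = 2.0
    for j in range(levels, 0, -1):
        d = 2.0 + (2 * j + 1) ** 2 / d
    return 4.0 / (1.0 + 1.0 / d)

print(pi_from_brouncker(), math.pi)  # agreement improves only slowly with `levels`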



Continuous sampling In statistical sampling, continuous sampling is a method used to reduce the volume of a long data series: all the values within a number of fixed non-overlapping sampling intervals are averaged and that value constitutes a channel sample. The term originates in physical sampling in a mine environment: A slot, or channel, of given length is cut into the rock face in a given alignment (generally from top to bottom of the bed, orthogonal to the bedding plane) and all the rock fragments broken out of the slot constitute the sample (Krumbein and Pettijohn 1938). It is also known as batch sampling. See: Krumbein and Graybill (1965); composite sample. Continuous time series An assignment of a numerical value to each time of a continuous time range (Gubbins 2004). See also time series. Continuous-signal record A time series with a depth/thickness, or time scale with a (regular) sampling interval chosen by the investigator (Weedon 2003). Continuous variable, continuous variate A measurement which is not restricted to particular values—it may take values in a continuous range; equal-sized intervals in different parts of its range will be equivalent. The term has been in use since at least the 1840s (O’Brien 1842). Contour, contour interval, contour line, contouring A contour line (isoline, isopleth), with a value x, is a line joining points of equal value which separates a field of values greater than x from a field of values less than x.
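A minimal Python sketch of the interval averaging described under Continuous sampling above (the function name and block length are assumptions made for illustration):

import numpy as np

def channel_samples(series, interval):
    # Average successive fixed, non-overlapping blocks of `interval` values;
    # any incomplete block at the end of the series is discarded.
    series = np.asarray(series, dtype=float)
    n_blocks = len(series) // interval
    trimmed = series[:n_blocks * interval]
    return trimmed.reshape(n_blocks, interval).mean(axis=1)

readings = np.random.default_rng(0).normal(size=1000)
print(channel_samples(readings, 10).shape)  # (100,) channel samples from 1000 readings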
t0 is 0.05, then t0 is the critical value of t at the 5% level. The value of the statistic calculated from a sample of data will be compared with t0 in order to decide whether to reject the null hypothesis. Cross Iddings Pirsson Washington (CIPW) norm The recalculation of a major-oxide chemical analysis of an igneous, metamorphic or, occasionally, a sedimentary rock into a theoretical mineral association based on the proportions of a standard suite of minerals. This “normative composition” is most frequently used as a basis for comparison and classification. The earliest of these was the CIPW norm (Cross et al. 1902, 1903) so-called after the initials of the names of its originators, American petrologists and geochemists: Charles Whitman Cross (1854–1949), Joseph Paxton Iddings (1857–1920), Louis Valentine Pirsson (1860–1919) and Henry Stevens Washington (1867–1934). It recalculated a rock’s chemical composition in terms of two suites of minerals: I. Salic (dominantly siliceous and aluminous): quartz, zircon, corundum, orthoclase, albite, anorthite, leucite,


nephelite, kaliophilite, sodium chloride and sodium sulphate; and II. Femic (dominantly ferromagnesian): acmite, sodium metasilicate, potassium metasilicate, diopside, wollastonite, hypersthene, olivine, akermanite, magnetite, chromite, hematite, ilmenite, titanite, perovskite, rutile, apatite, fluorite, calcite, pyrite, native minerals and other oxides and sulphides. Its usage has diminished since the 1980s. See also norm.
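A critical value such as t0 above can be obtained from standard statistical software rather than printed tables; a short Python sketch (the significance level and degrees of freedom are arbitrary assumptions for the example):

from scipy.stats import t

alpha = 0.05   # significance level
df = 20        # degrees of freedom assumed for the example
t0 = t.ppf(1.0 - alpha, df)   # upper-tail critical value of Student's t
print(round(t0, 3))           # about 1.725; reject the null hypothesis if t > t0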


Cross-association, cross association A measure similar to cross-correlation but designed for data which consists of nominal variables only (e.g. codes representing a number of different lithological states in successive beds in two stratigraphic sections). It was originally introduced by the British biologist and numerical taxonomist, Peter Henry Andrews Sneath (1923–2011) and microbiologist Michael Jeffrey Sackin, for study of amino acid sequences (Sackin and Sneath 1965), and subsequently applied by them and the American mathematical geologist, Daniel Francis Merriam (1927–) to the study of geological time series, e.g. the comparison of the lithology and fossil content in each bed in a pair of stratigraphic sections (Sackin et al. 1965; Merriam and Sneath 1967). If the two coded sequences are of lengths l and m respectively, and one sequence is slid past the other, then the possible match positions are 1 to l + m − 1. For each position, if the number of matches of the coded characteristic (excluding matches of any unknown elements) is M, and the number of comparisons made, i.e. the size of overlap, not counting any comparison with an unknown element in either sequence, is C, then a useful measure of cross-association at each step is the simple matching coefficient (Sokal and Sneath 1963), Ssm = M/C. The values of Ssm will vary depending on the direction in which the comparisons are made and is therefore calculated in both “forward” and “backward” directions. The expected number of matches for a given overlap position is taken as P times the number of comparisons, where P is the probability that a pair of elements taken at random from the sequence will match:

P = (total number of matches)/(total number of comparisons),

summed over all overlap positions preceding the position of complete overlap. If a sequence is slid past itself, then the same criterion may be used as a measure of autoassociation. See Sackin et al. (1965) or Sackin and Merriam (1969) for discussion of significance testing of the results. See also: Agterberg and Nel (1982b). Cross-association tends to be the most frequently-used spelling (Google Research 2012). A disadvantage of the method is that it cannot include correlations across gaps caused by local nondeposition or eroded strata (Howell 1983); see multiple matching. Cross-correlation, crosscorrelation, cross correlation Let x1, x2, . . ., xn and y1, y2, . . ., yn be two sets of equally-spaced observations in time (or on a line in space) of length n. The


cross-correlation function is the series of values of the correlation coefficient computed between x and y at earlier or later intervals in time, given by the lag, d. Then

rd = cov(xi, yi+d)/(sx·sy),

where cov is the cross-covariance and s is the standard deviation of the members of the sequences where they overlap; and −1 ≤ rd ≤ 1. The term was used by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1942, 1949). Cross-correlation can also be regarded as the convolution of waveform x with the time-reversed waveform y. For earth science applications see: Jones and Morrison (1954), Tukey (1959a), Vistelius (1961), Schwarzacher (1964), Davis and Sampson (1973), Gubbins (2004); and Oz and Deutsch (2002) for discussion of volume-dependent cross-correlation. See also: ambiguity function. By far the most frequently used spelling is cross-correlation (Google Research 2012). Cross-correlation filter A known signal (“template”) is correlated with an unknown signal so as to detect the presence of the template in the signal under test. It was shown to be the optimum detector of a known signal in additive white noise. It was originally known as a North filter after its developer, the American theoretical physicist Dwight O. North (c. 1910–1998) (North 1943); the term matched filter was introduced by the American physicist and mathematician, John Hasbrouck Van Vleck (1899–1980) and physicist David Middleton (1920–2008), who discovered the principle independently (Van Vleck and Middleton 1944). According to Sheriff (1984) it is also known as a correlation or cross-correlation filter (Jones and Morrison 1954). See also: Clay and Liang (1962), Turin (1960), Gubbins (2004). Cross-correlation theorem If F(ω) and G(ω) are the Fourier transforms of two time functions, f(t) and g(t), then Φ(t) ↔ F(ω)*·G(ω), where F(ω)* indicates a complex conjugate, the double-headed arrow denotes a Fourier transform pair, and Φ(t), the Fourier transform of the cross-correlation, is known as the cross-energy spectrum (Sheriff and Geldart 1983). Cross-covariance, cross covariance, crosscovariance Let x = x1, x2, . . ., xn and y = y1, y2, . . ., yn be two sets of equally-spaced observations in time (or on a line in space) of length n. The cross-covariance (Cd) is a function of the magnitude of the lag (d) which separates all the pairs of x and y, xi and yj, which are separated by a constant distance d. Then


Cd = E{[xi − E(xi)]·[yj − E(yj)]},


where E(•) is the expectation operator which gives the value which a random variable takes on average. The term cross-covariance was in use by the early 1950s (e.g. Whittle 1953). In earth science it has become increasingly important in geostatistics, see: Dowd (1989), Künsch et al. (1997), Oliver (2003); Genton and Kleiber (2015); cross-correlation, ambiguity function. Curiously, cross-covariance is the most widely used spelling, unlike the case with alternative spellings of cross-correlation. Cross-energy spectrum If F(ω) and G(ω) are the Fourier transforms of two time functions, f(t) and g(t), then Φ(t) ↔ F(ω)*·G(ω), where F(ω)* indicates a complex conjugate, the double-headed arrow denotes a Fourier transform pair, and Φ(t), the Fourier transform of the cross-correlation, is known as the cross-energy spectrum (Sheriff and Geldart 1983). Cross-plot, cross plot, crossplot A bivariate graph in which pairs of values of two variables (x, y) are plotted as points on the basis of two orthogonal axes, the x-axis is, by convention, horizontal, and the y-axis is vertical. The terms scatter diagram, scattergram, and scatterplot came into use in the 1930s (e.g. Krumbein and Pettijohn 1938). Although the spelling crossplot has been in use since the 1930s and is more frequent today, cross-plot or cross plot have also been widely used (Google Research 2012). Cross-power density-spectrum (cross-spectrum) Generally shortened to cross-spectrum (Blackman and Tukey 1958). The expression of the mutual frequency properties of two time series, analogous to the power spectrum of a single series. A continuous function which is the Fourier transform of the second-order cross-moment sequence E[X(t)Y(t + l)] of two weakly stationary stochastic processes X(t) and Y(t), both with a zero mean, where E(•) is the expectation operator and l is the lag. It indicates the degree of linear association between the two stochastic processes at different frequencies. (Because mutual relations at a single frequency can be in phase, in quadrature, or in any mixture of these, either a single complex-valued cross-spectrum or a pair of real-valued cross-spectra are needed). The real part of the cross-spectrum is referred to as the cospectrum; the imaginary part as the quadrature spectrum. First introduced empirically (based on earlier work on spectrum analysis by the American statistician, John Wilder Tukey (1915–2000), who introduced the terms cross-spectrum, cospectrum and quadrature spectrum in 1952), by the American mathematician, George Proctor Wadsworth (1906–) and mathematician and geophysicist, Enders Anthony Robinson (1930–), the English geophysicist and structural geologist, Patrick Mason Hurley (1912–2000), and the Canadian-born


American mathematician, Joseph Gerard Bryan (1916–2005) (Wadsworth et al. 1953). Subsequent underpinning theory was provided by Tukey’s doctoral student, the American mathematician, Nathaniel Roy Goodman (Goodman 1957). An early example of its use in seismology is Tukey (1959a). Cross-product The cross-product is also known as the vector product or outer product: the multiplication of two vectors to give another vector (Sheriff 1984). If two vectors A and B lie in a plane at an angle θ to each other, then the magnitude of their product is A × B = AB sin θ, directed at right angles to the AB plane, pointing in the direction in which a right-handed screw would move on turning from A to B. It is equal to the area of a parallelogram of which A and B form the non-parallel sides. In a three-dimensional Cartesian coordinate system, if i, j and k are mutually orthogonal unit vectors, writing A = a1i + a2j + a3k and B = b1i + b2j + b3k, then

A × B = (a1i + a2j + a3k) × (b1i + b2j + b3k) = (a2b3 − a3b2)i + (a3b1 − a1b3)j + (a1b2 − a2b1)k.

The term first appears in an account of work by the American mathematical physicist, Josiah Willard Gibbs (1839–1903), who introduced it in his lectures on vector analysis in 1881 and 1884, by his last student, Edwin Bidwell Wilson (1879–1964) (Wilson 1901). Early geophysical examples of usage are Dobrin and Rimmer (1964) and Shimshoni and Smith (1964). Cross-spectrum The expression of the mutual frequency properties of two time series, analogous to the power spectrum of a single series. A continuous function which is the Fourier transform of the second-order cross-moment sequence E[X(t)Y(t + l)] of two weakly stationary stochastic processes X(t) and Y(t), both with a zero mean, where E(•) is the expectation operator and l is the lag. It indicates the degree of linear association between the two stochastic processes at different frequencies. (Because mutual relations at a single frequency can be in phase, in quadrature, or in any mixture of these, either a single complex-valued cross-spectrum or a pair of real-valued cross-spectra are needed). The real part of the cross-spectrum is referred to as the cospectrum; the imaginary part as the quadrature spectrum. First introduced empirically, based on earlier work on spectrum analysis by the American statistician, John Wilder Tukey (1915–2000), who introduced the terms cross-spectrum, cospectrum and quadrature spectrum (Tukey 1952), by the American mathematician, George Proctor Wadsworth (1906–) and mathematician and geophysicist, Enders Anthony Robinson (1930–), the English geophysicist and structural geologist, Patrick Mason Hurley (1912–2000), and the Canadian-born American


mathematician, Joseph Gerard Bryan (1916–2005) (Wadsworth et al. 1953). Subsequent underpinning theory was provided by Tukey’s doctoral student, the American mathematician, Nathaniel Roy Goodman (Goodman 1957). An early example of its use in seismology is Tukey (1959a).
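The lagged correlation rd defined under Cross-correlation above can be computed directly from the overlapping parts of the two series; a minimal Python sketch (the test series, lag range and function name are assumptions made for illustration):

import numpy as np

def cross_correlation(x, y, max_lag):
    # r_d between x and y for lags d = -max_lag..max_lag, using only the overlap.
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = {}
    for d in range(-max_lag, max_lag + 1):
        if d >= 0:
            a, b = x[:len(x) - d], y[d:]
        else:
            a, b = x[-d:], y[:len(y) + d]
        a, b = a - a.mean(), b - b.mean()
        r[d] = np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))
    return r

# y is a noisy copy of x shifted by 3 samples, so r_d should peak near d = 3.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = np.roll(x, 3) + 0.1 * rng.normal(size=200)
r = cross_correlation(x, y, 5)
print(max(r, key=r.get))  # expected: 3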


Cross-validation The partitioning of a (usually large) data set into two or more subgroups, so that a statistical analysis can be carried out on one group, while the remaining group(s) is/are used to validate the results. First used by the American psychologist Albert Kenneth Kurtz (1904–) (Kurtz 1948), its application became widespread following work by the American statistician, Seymour Geisser (1929–2004) (Geisser 1975). See also: Webb and Bryson (1972), Isaaks and Srivastava (1989), Reyment (1991), Birks (1995), bootstrap, jackknife. Cross-variogram The covariance of the increments of two different regionalized variables, e.g. U3O8 ore grade and radioactivity, whose values are measured at a large number of points within a mineral deposit (Guarascio 1976). Introduced by the French geostatistician, Georges Matheron (1930–2000) (Matheron 1970). The cross-covariogram, γ12(d), between the two regionalized variables, X1 and X2, in a given direction is:

γ12(d) = [1/(2N)]·Σ{[X1(i + d) − X1(i)]·[X2(i + d) − X2(i)]},

where d is the distance separating each pair of the N data points, X(i) and X(i + d), in the given direction as a function of increasing d, and the sum is taken over these N pairs. Cubic spline A chain of polynomials of fixed degree (usually cubic functions are used) joined in such a way that they are continuous at the points at which they join (knots). The knots are usually placed at the x-coordinates of the data points. The function is fitted in such a way that it has continuous first and second derivatives at the knots; the second derivative can be set to zero at the first and last data points. For example, a quadratic spline is an interpolating function of the form

S(x) = yi + ai(x − xi) + (ai+1 − ai)(x − xi)²/[2(xi+1 − xi)],

where the coefficients are found by choosing a0, then using the relationship ai+1 = −ai + 2(yi+1 − yi)/(xi+1 − xi). Its gradient at a new position x3 is a linear combination of that at nearby points x1 and x2. Splines were discovered by the Romanian-American mathematician, Isaac Jacob Schoenberg (1903–1990) (Schoenberg 1946, 1971). See also Rasmussen (1991), Teanby


(2007), Weltje and Roberson (2012); smoothing spline regression, spline, piecewise function. Cumulative distribution The cumulative integration of the area under the probability distribution (or frequency distribution). The cumulative distribution function gives the probability that a member of the population will be less than a stated value. Introduced by the French physicist Jean-Baptiste-Joseph Fourier (1768–1830) (Fourier 1821). Cumulative curve, cumulative distribution plot, cumulative probability (CP) plot A bivariate graph in which the coordinates on the x-axis correspond to the cumulative percentages 100{1/(n + 1), 2/(n + 1), ⋯, n/(n + 1)} corresponding to the n observed values of a variable and the equivalent empirical quantiles: i.e., the data values x1, ⋯, xn sorted into order of ascending magnitude, form the coordinates on the y-axis. A divisor of (n + 1) is used to allow for the fact that the possible extremes of the sampled distribution are unlikely to have been observed. Such plots have been widely used for comparison of sediment grain-size distributions since Krumbein and Pettijohn (1938) and in geochemistry by Reimann et al. (2008). First illustrated by the French physicist Jean-Baptiste-Joseph Fourier (1768–1830) (Fourier 1821) and popularised by the work of the British anthropologist, Francis Galton (1822–1911), who called it, after an architectural term, the ogive curve (Galton 1875). Cumulative probability (CP) plot Abbreviation for cumulative probability plot, which Reimann et al. (2008) use for a normal or lognormal probability plot of geochemical data. Often used as a visual goodness-of-fit test: A graph of the n observed values of a variable, x1, ⋯, xn, sorted into order of ascending magnitude (empirical quantiles) on the y-axis, versus the percentiles of an appropriate theoretical distribution (e.g. a normal or lognormal distribution) serving as a model (by convention, on the x-axis), equivalent to the cumulative proportions 1/(n + 1), 2/(n + 1), ⋯, n/(n + 1). A divisor of (n + 1) is used to allow for the fact that the possible extremes of the sampled distribution are unlikely to have been observed. An exact fit to the model results in a linear plot. Specially printed probability-scaled graph paper was widely used for this type of plot, but accurate numerical approximations for the quantiles of the normal distribution can now be obtained using standard software and have rendered probability-paper essentially obsolete. A log-scale vertical axis is used to plot the magnitude of the ordered observations if testing for fit to a lognormal distribution is required. Its use in connection with sediment size distributions is mentioned by Krumbein and Pettijohn (1938). Reimann et al. (2008) show examples of its use with geochemical data. See also quantile-quantile plot. Curl The curl of a vector X is given by the cross-product of the operator del, denoted by the nabla symbol (∇), and X, i.e.: ∇ × X. The term was introduced by the British physicist, James Clerk Maxwell (1831–1879) (Maxwell 1873). Treatment of displacement data using vector algebra followed the work of the English mathematician and geophysicist,


Augustus Edward Hough Love (1863–1940) (Love 1906). An early example of its use in geophysics is Macelwane (1932). See also Soto (1997) and Irving and Knight (2006).
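The component formula given under Cross-product above is easily checked numerically; a short Python sketch (the example vectors are arbitrary):

import numpy as np

def cross_product(a, b):
    # (a2*b3 - a3*b2, a3*b1 - a1*b3, a1*b2 - a2*b1), as in the Cross-product entry
    a1, a2, a3 = a
    b1, b2, b3 = b
    return np.array([a2 * b3 - a3 * b2,
                     a3 * b1 - a1 * b3,
                     a1 * b2 - a2 * b1])

A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 5.0, 6.0])
print(cross_product(A, B))  # [-3.  6. -3.]
print(np.cross(A, B))       # the built-in routine gives the same result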


Curvature The rate of change of direction of a curve or surface. An early example of its use in geophysics is Ambronn’s (1926, 1928) discussion of curvature of the geoid; many examples are also given in Heiland (1940). Curve-fitting The fitting of a nonlinear analytic function to a set of data or fitting a statistical regression function to predict the value of a “dependent” or “response” variable, y, which is considered to be controlled by either a single “predictor” or “explanatory variable” (x) or a group of such predictors (X). In this case the fitted function may be curvilinear, such as a polynomial in which all the parameters appear linearly, e.g. y = a0 + a1x + a2x² + a3x³ + ⋯; or nonlinear, in which one or more of its parameters appear nonlinearly, e.g. y = a[b − e^(−cx)]^d. The term was first introduced by the British statistician, Karl Pearson (1857–1936) (Pearson 1901). See also: Ratkowsky (1990); regression model, regression analysis, locally-weighted regression, ridge regression, smoothing spline regression, trend-surface analysis and Howarth (2001a) for a historical review of its application in the earth sciences. Cycle The interval before a function or series repeats itself; it was used in this sense in geology by the Scottish natural historian and geologist, David Page (1814–1879) (Page 1859): if the variable is time, then a cycle is one period; if the variable is distance, it is one wavelength. See also: periodic. Cycle-octave transform The original term for what is now known as wavelet analysis. A scale-invariant filter used to quantify time series, based on wavelets, mathematical functions which reduce data to different frequency components and then examine each component with a resolution matched to its scale, essentially using shorter windows at high frequencies and longer windows at lower frequencies (Graps 1995; Goswami and Chan 1999). Like evolutionary spectrum analysis, the result can be effectively represented as an isoline plot of power as a function of frequency and time (or depth in a stratigraphic section, etc.). It is particularly effective for revealing the changing structure of a non-stationary time series. The idea arose from work by the French geophysicist, Jean Morlet (1931–2007) and Croatian-born physicist, Alexander Grossmann (1930–) (Morlet et al. 1982a, b; Grossman and Morlet 1984; Goupillaud et al. 1984). The theoretical basis for this approach and the development of wavelets of optimal shape, was later extended by the French mathematicians Yves Meyer (1939–), Stéphane G. Mallat (1962–), the Belgian physicist and mathematician, Ingrid Daubechies (1954–) and others (Mallat


1989a, 1999; Ruskai et al. 1992; Meyer 1993). Early geophysical applications are discussed by Pike (1994), Chakraborty and Okaya (1995), and Fedi and Quarta (1998), and its potential for cyclostratigraphy was first demonstrated by Prokoph and Bartelmes (1996). See also: Fast wavelet transform, wavelet analysis. Cycle skipping A phenomenon which occurs in the acquisition of acoustic-velocity logs when, for some reason, the returns from a given sonic emission from the downhole probe are of unduly low amplitude, so they are not detected, or so slow that they miss being counted for one “listen” cycle and get counted in with the next one. Causes may include improper adjustment of signal or detection level, fractures or washouts, high-attenuation rocks, and gas in the fluid (Keys 1997). A similar effect can occur in seismic data acquisition (Gubbins 2004). Cyclographic diagram, cyclographic projection A diagram introduced into structural geology by the English geologist, Frank Coles Phillips (1902–1982) (Phillips 1954) in which each structural plane is represented on a stereographic projection by its great circular trace, rather than plotting the pole to the plane. See also: Beta diagram. Cyclostratigraphy This is a subdiscipline of stratigraphy which deals with the identification, characterization, correlation, and interpretation of cyclic variations in the stratigraphic record and, in particular, with their application in geochronology by improving the accuracy and resolution of time-stratigraphic frameworks. Schwarzacher (1993) suggests that the term was first used publicly at a meeting organised by A.G. Fischer and I. Premoli-Silva in 1988 (Fischer et al. 1990). For applications see: House and Gale (1995), Weedon (2003), D’Argenio et al. (2004), Strasser et al. (2006), Kodama and Hinnov (2015) and the important discussions of the practical difficulties of recognition of statistically significant cycles, the effect of time-trend removal, and choice of the correct model for the power spectrum in Vaughan et al. (2011, 2015).
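The cycle-octave (wavelet) transform described above can be sketched with a Morlet-type wavelet and plain convolution; the following Python code is a crude illustration only (the normalisation, the parameter w0 = 6 and the test signal are assumptions, and production work would use a dedicated wavelet library):

import numpy as np

def morlet_cwt(x, scales, w0=6.0, dt=1.0):
    # Convolve the signal with a scaled, complex Morlet-style wavelet at each
    # scale; returns an array of shape (len(scales), len(x)).
    x = np.asarray(x, float)
    out = np.empty((len(scales), len(x)), dtype=complex)
    for k, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + dt, dt)          # +/- 4 envelope widths
        psi = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2)
        psi /= np.sqrt(s)                              # crude scale normalisation
        out[k] = np.convolve(x, psi, mode="same")
    return out

# Two sinusoids of different period; wavelet power concentrates near the
# scales corresponding to each period.
t = np.arange(1024)
signal = np.sin(2 * np.pi * t / 50) + 0.5 * np.sin(2 * np.pi * t / 15)
power = np.abs(morlet_cwt(signal, scales=np.arange(2, 64))) ** 2
print(power.shape)   # (62, 1024): scale versus position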

D

D function Introduced by the American geologist, Chester Robert Pelto (1915–1984) (Pelto 1954), the D function expresses the relationship between the relative amount of each lithological component in a multi-component system (e.g. non-clastics, sand, shale for a lithofacies map) selected as an end-member. It divides a continuous three-component system into seven classes: three sectors with a one-component end-mixture; three sectors in which two components approach equal proportions; and one sector in which all three components approach equal proportions. See also: Forgotson (1960). D/P diagram A diagram first used in early isotope geochemistry by the South African geologist, Louis Herman Ahrens (1918–1990) in 1955. The mole concentrations of daughter (D1, 206Pb; D2, 207Pb) to parent (P1, 238U; P2, 235U) isotopes were plotted as ratios, with D1/P1, i.e. 206Pb/238U (y-axis) and D2/P2, i.e. 207Pb/235U (x-axis). Wetherill (1955, 1956) showed that the linear trends (which he named chords) found in the data of Ahrens (1955a, b) for samples from the Witwatersrand and the Rhodesian shield intersected Ahrens’ “age equality” curve (the locus of all points for which 207Pb/235U and 206Pb/238U ages were equal); further, the upper intersection defined the primary age of crystallisation of all the samples lying on the same chord, while the lower intersection related to the time of episodic lead loss. The American geophysicist, George West Wetherill (1925–2006) named the equal-age curve concordia. See also: Concordia diagram. Damping The progressive reduction of the peak amplitude of an oscillating waveform as a result of the dissipation of the oscillation energy (Willmore 1937; Leet 1950; Gubbins 2004). The resultant waveform is said to be damped. The term was used by the German physicist, Heinrich Rudolf Hertz (1857–1894) in his early researches on electric wave transmission (Hertz 1887, 1893) and by Jeffreys (1931) in the context of seismic wave propagation.
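Points on the concordia curve referred to in the D/P diagram and Concordia diagram entries can be generated directly from the decay equations; a brief Python sketch (the decay constants are the commonly used values of λ238 and λ235 in yr−1, and the ages chosen are arbitrary):

import numpy as np

LAMBDA_238 = 1.55125e-10   # decay constant of 238U, 1/yr
LAMBDA_235 = 9.8485e-10    # decay constant of 235U, 1/yr

def concordia_point(t_years):
    # For a closed system of age t, both ratios return the same age:
    # 207Pb*/235U = exp(lambda_235 t) - 1 and 206Pb*/238U = exp(lambda_238 t) - 1.
    return np.expm1(LAMBDA_235 * t_years), np.expm1(LAMBDA_238 * t_years)

for t in np.arange(0.5e9, 2.6e9, 0.5e9):
    x, y = concordia_point(t)
    print(f"t = {t/1e9:.1f} by: 207Pb*/235U = {x:.3f}, 206Pb*/238U = {y:.3f}")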





Daniell window This is used in the operation of smoothing a periodogram with a lag window of weights applied to a discrete time waveform. N, the width of the window, is typically even and an integer power of 2; for each point within the window the weight w = 1 and zero outside it. First named in a digital signal processing context by the American communications engineer, Ralph Beebe Blackman (1904–1990) and statistician, John Wilder Tukey (1915–2000) (Blackman and Tukey 1958) after the Chilean-born British physicist and mathematician, Percy John Daniell (1889–1946), who on the basis of his familiarity with Wiener’s then-classified work on time series (Wiener 1942, 1949), suggested its use in the context of “smoothing the periodogram” as a means of estimation of spectral intensity (Daniell 1946). Because of its rectangular shape, it is also known as the boxcar window (Alsop 1968), rectangular window (Harris 1978); and Dirichlet window (Rice 1964; Harris 1978); see also: Camina and Janacek (1984); Gubbins (2004). Its shape contrasts with that of the smoothly changing weights in windows which are tapered. Darcy’s law This empirical law was formulated by the French engineer Henri Philibert Gaspard Darcy (1803–1858) (Darcy 1856; Hubbert 1987; Freeze 1994; Brown 2002) on the basis of experiments on vertical fluid flow through porous media: Fluid flow rate (Darcy flow, Darcy flux, Darcy velocity, specific discharge) (q, cm s−1) is given by q/A = −k(Δh/Δx), where k is the hydraulic conductivity (cm s−1); A is the cross-sectional area through which flow proceeds (cm²); Δh is the hydraulic gradient, i.e. the drop in hydraulic head between the inlet and outlet (cm); and Δx is the distance through the porous medium over which flow takes place (cm). The negative sign is introduced because if there is a hydraulic gradient (difference in hydraulic head over a distance), then flow proceeds from the direction of the inlet (with the higher hydraulic head) to the outlet (low head), which is opposite to the direction of increasing gradient. See also: Kasenow (2001) and Hiscock and Bense (2014); permeability. Data acquisition The recording of data in the field or in a laboratory setting prior to subsequent storage, retrieval and analysis. Koch and Link (1971) described the varied sources then in use, noting that “most field recording is done on paper, which is either in loose sheets or bound in notebooks, the former being easier to lose initially but simpler to process later.” More sophisticated alternatives included the use of mark-sense or handpunched cards, optical document scanning and, for large amounts of data, recording directly onto 5-hole punched paper tape, magnetic tape or punched cards. Examples of data acquisition methods include: Leake et al. (1969), Oncescu et al. (1996), Briner et al. (1999), Ketcham and Carlson (2001), and Lee et al. (2013). Data adaptive filtering Given a time series or spatial signal which is contaminated with uncorrelated noise, if the nature of both the noise and the signal are known a priori, then the noise may be removed from the combined signal by subtraction. However, this cannot be done if both are phase-shifted in an unknown manner with respect to each other, nor if


the characteristics of the signal and/or noise change with time. The American electrical engineer, Bernard Widrow (1929–) developed the method of adaptive noise cancelling (Widrow and Hoff 1960; Widrow et al. 1975) to overcome this problem: Suppose at time t0 an underlying signal s is known to have been corrupted by added noise n0, so that (following drift-removal, if required) the observed signal, of total length N, is S = s + n0. Given a drift-free noise signal n1 (which is assumed to be uncorrelated with s but correlated with n0 in an unknown way) as a reference, S is then filtered adaptively until n0 is matched as closely as possible: The output of the filter at time t is yt = W·XtT, where Xt is the input vector and W are the current filter weights. yt is then subtracted from S to obtain the estimated error of fit, εt. The weights are then updated as Wt+1 = Wt + 2μεtXt, where μ is a factor whose chosen magnitude governs the speed of convergence of the process. This whole cycle is repeated until there is minimal change in εt. Hattingh (1988) adapted Widrow’s algorithm so as to be able to produce satisfactory results by using a delayed copy of part of S as input to the filter, the delay being equal to the filter length (L). The filtered results are then compared to the primary signal delayed by L/2. For the i-th value in the primary signal of total length N, εi = Si−L/2 − yi, where L ≤ i ≤ N, and the weights are updated as Wj+1 = Wj + 2μεiXj, where 1 ≤ j ≤ L. Hattingh called this algorithm the Correlated Data Adaptive Noise Cancelling Method (CANC). Hattingh (1990) had evidently used a robust version of the algorithm (R-CANC), but it does not seem to have been published. The CANC method has been successfully applied to both geophysical (Hattingh 1989, 1990; Sutcliffe and Yumoto 1989) and geochemical (Böttcher and Strebel 1990) problems. See also Diniz (2013). Data analysis A term introduced by the American statistician, John Wilder Tukey (1915–2000) (Tukey 1962) to describe what applied statisticians do, as opposed to formal statistical inference: It includes some inference, in the sample-to-population sense; the discovery of indications which could not be found simply by inspection of the raw data; and guides to the required distribution of statistical effort. Tukey’s philosophy of data analysis minimises prior assumptions and allows the data to guide the choice of appropriate models. His classic textbook Exploratory data analysis, which emphasised the use of graphical displays, transforms, robust (resistant) techniques, and residuals, first appeared in preliminary limited editions (1970–1971), the final version being published several years later (Tukey 1977). The use of Tukey’s new graphical methods and robust statistics began to be taken up in both geophysics and geochemistry during the 1980s (Kleiner and Graedel 1980; Howarth 1984). Data array A term introduced by the American petrographer, Felix Chayes (1916–1993) for an array (matrix), containing R rows, e.g. each corresponding to a rock sample, and C columns, e.g. each corresponding to one of a suite of chemical or petrographic constituents (Chayes 1960, 1962; Chayes and Kruskal 1966). It was used in the same sense by Krumbein and Graybill (1965). See also: closed array, data set.
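A minimal Python sketch of the adaptive noise-cancelling loop described under Data adaptive filtering above, using the basic Widrow–Hoff (LMS) update rather than Hattingh’s CANC variant (the filter length, step size μ and the synthetic signals are assumptions made for illustration):

import numpy as np

def lms_noise_cancel(primary, reference, n_weights=16, mu=0.01):
    # Adaptively filter the reference noise to match the noise in the primary
    # record; the running error e is the estimate of the underlying signal.
    w = np.zeros(n_weights)
    out = np.zeros(len(primary))
    for t in range(n_weights - 1, len(primary)):
        x = reference[t - n_weights + 1:t + 1][::-1]  # current and past reference samples
        y = w @ x                                     # current estimate of the noise
        e = primary[t] - y                            # error = signal estimate
        w = w + 2.0 * mu * e * x                      # Widrow-Hoff (LMS) weight update
        out[t] = e
    return out

rng = np.random.default_rng(0)
n1 = rng.normal(size=4000)                          # reference noise
n0 = np.convolve(n1, [0.6, 0.3, 0.1])[:len(n1)]     # correlated noise corrupting the record
s = np.sin(2 * np.pi * np.arange(4000) / 400)       # underlying signal
recovered = lms_noise_cancel(s + n0, n1)            # approaches s as the weights converge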


Data bank, data-bank A large, computer-based, repository of organised data which may consist of a number of separate databases (e.g. bibliographic, cartographic, geochemical, geophysical etc.) so organised that it is suitable for local or remote enquiry and/or data retrieval. The term seems to have first come into use about 1962 and began to appear in the geological literature by 1968 (e.g. Stauft 1968; Goddin 1968). Merriam (1974) includes a list of examples developed up to that time; see also Bouillé (1976a). The unhyphenated data bank appears to be the most widely-used spelling (Google Research 2012).


Database This is an organised collection of data whose size, complexity and usage requirements necessitate its management by a database management system. It may be a collection of: (i) data with a given structure for accepting, storing, and providing, on demand, data for multiple users; (ii) interrelated data organised according to a schema to serve one or more applications; or (iii) a collection of data fundamental to a system or enterprise. In all these cases, there would normally be a single person, the database administrator, who is responsible for defining the rules by which the data is stored and accessed as well as its integrity, security, performance and recovery (IBM undated). “‘Data base’ was originally a fashionable but vaguely defined term floating around cutting-edge, interactive computer projects. It was only gradually associated with the specific technologies that became known as the DBMS [data base management system]” (Haigh 2009). Design of the U.S. Geological Survey National Geologic Map Database (which contains geological, geophysical, geochemical, geochronological, and paleontological information) began in 1992 but, because of its complexity, only started active implementation five years later (Soller and Berg 1997). This was typical of many other large databases. Burk (1975), Harvey and Diment (1979), Baxter and Horder (1981) and Chayes (1983a,b) provide a cross-section of early examples in the geological sciences. Database Management System (DBMS) Software intended to manage and maintain data in a non-redundant structure for the purpose of being processed by multiple applications. It organises data elements in some predefined structure and retains relationships between different data elements within the database (Bergin and Haigh 2009; Haigh 2009). Typically, it will contain routines for the creation and management of the database, involving Data acquisition, verification, storage, retrieval, combination and security. However, use of the term database management system (DBMS) did not become widespread until the early 1970s (Haigh 2009). The early DBMSs had a restrictive hierarchical or network structure, closely tied to the physical (disk) storage of the data, but during the mid-1970s, as the result of a widely-influential paper on the topic, developed by the English-born American computer scientist, Edward Frank Codd (1923–2003) in 1969 (Codd 1970), these began to be replaced by so-called relational database management systems. See also data storage-and-retrieval-system.


Data communication, data exchange The transmission, reception and validation of data; the transfer of data among functional units by means of data transfer according to a protocol (IBM undated). The topic is discussed in an earth science context in Sutterlin et al. (1977), LeBas and Durham (1989) and Hueni et al. (2011). Data compression 1. The process of eliminating gaps, empty fields, redundancies, and unnecessary data to shorten the length of records or blocks. 2. Any encoding to reduce the number of bits used to represent a given message or record (IBM undated). See Salomon and Motta (2010) for a detailed survey of data compression methods and Wood (1974a, b), Bois (1975), Anderson et al. (1977), Spanias et al. (1991), Kidner and Smith (1992) for early examples of its use in the earth sciences. See also: Huffman coding, Walsh transform. Data display, data visualization The display of data by graphical or cartographic means so as to reveal its content in a way which makes it easy for the viewer to assess its meaning. “Data graphics visually display measured quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading and color . . . of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time most powerful” (Tufte 1983). For insightful discussions of the principles of good data display in a general context, see: Dickinson (1973), Tukey (1977), Tufte (1983, 2001), Wainer (1997). Aspects of the history of the use of graphics in geology are given by Howarth (1996a, b, 1998, 1999, 2001, 2009) and Howarth and Garrett (2010). The term data display began to be used in the earth science literature in the early 1960s (e.g. Adams and Allen 1961) but data visualization came into use only in the 1990s; by far the most usual spelling of the latter is visualization rather than visualisation in both British and American English (e.g. Dorn et al. 1995; Google Research 2012). Data editing The removal and/or correction of erroneous values present in a data file as a result of data entry, instrumental, or formatting errors. The term was in use in the computer industry by 1956; Koch and Link (1971) contains an early discussion of the importance of this topic in a geological context. Data file A named set of records containing data recorded in computer-accessible form and stored and/or processed as a unit (IBM undated, Brisbin and Ediger 1967). Early use of the term in computing is in Eisler (1956) although it was in use in a more general context by 1941. Early use in a geological context occurs in Brisbin and Ediger (1967) and Hubaux (1969). The spelling data file has been in use since the 1960s but datafile, although much less frequent, has come into use since 1980 (Google Research 2012).
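As a toy illustration of the second sense of Data compression above (run-length encoding is deliberately simple and is not one of the methods cited in the entry): long runs of repeated values, such as a coded lithology log, can be stored as (value, count) pairs. A short Python sketch:

def run_length_encode(values):
    # Compress a sequence into (value, count) pairs.
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return [tuple(pair) for pair in encoded]

def run_length_decode(pairs):
    # Recover the original sequence from (value, count) pairs.
    out = []
    for v, n in pairs:
        out.extend([v] * n)
    return out

log = list("SSSSSHHHLLLLLLSS")            # e.g. coded lithologies in successive beds
packed = run_length_encode(log)
print(packed)                             # [('S', 5), ('H', 3), ('L', 6), ('S', 2)]
assert run_length_decode(packed) == log   # lossless: the original log is recovered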


Data gap An interval (e.g. in time or depth) in which data in a sequence of recorded data is missing, usually as a result of missing coverage, a recording error, etc. An early example of the geological usage of this term is Ivanhoe (1956). Data integration The combining of data from a number of different sources into a unified product which provides a more meaningful view of the total data content to the user. Early geological examples include Missallati et al. (1979) and Bolivar et al. (1983).


Data kernel Consider establishing a theoretical model to explain some observed data: following Gubbins (2004), the data model may be parameterised with a set of P parameters which form a model vector: m^T = (m1, m2, ∙∙∙, mP); the data form a data vector of length D: d^T = (d1, d2, ∙∙∙, dD); and the corresponding errors in the data form an error vector: e^T = (e1, e2, ∙∙∙, eD), where T indicates transposition. For the sake of example, assuming a simple linear relationship between data and model applies, then in matrix form, what are known as the equations of condition are given by: d = Am + e, where A is a D × P matrix of coefficients that are independent of both data and model, and are found by consideration of the physics of the problem, geometric constraints, and the type of measurements. The i-th row of A, known as the data kernel, describes how the i-th datum depends on the model. Gubbins gives an illustrative example in which the density of the Earth’s core (ρc) and mantle (ρm) are to be estimated from data which consist of values of the Earth’s mass (M) and moment of inertia (I), given the radii of a spherical core (c) and a spherical surface (a). Then the data vector is d^T = (M, I/a²); the model vector is m^T = (ρc, ρm); and the equations of condition matrix, A, is:

\[
A = \frac{4\pi}{3}\begin{bmatrix} c^{3} & a^{3}-c^{3} \\ \dfrac{2c^{5}}{5a^{2}} & \dfrac{2\,(a^{5}-c^{5})}{5a^{2}} \end{bmatrix}.
\]

These may be solved to find ρc and ρm. In general, if D = P, the problem is said to be equi-determined; if D > P, it is said to be overdetermined; and if D < P, it is underdetermined, and different methods of solution apply in each case (Gubbins 2004).
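A minimal numerical sketch of this equi-determined case (not part of the original entry) is given below: it solves d = Am for the two densities, using rounded, publicly quoted values for the Earth’s mass, moment of inertia and radii that are assumed here purely for illustration.

```python
import numpy as np

# A sketch of the equi-determined example above (values are rounded, publicly
# quoted figures for the Earth, assumed here purely for illustration).
a = 6.371e6   # radius of the spherical surface (m)
c = 3.480e6   # radius of the spherical core (m)
M = 5.972e24  # mass of the Earth (kg)
I = 8.02e37   # moment of inertia of the Earth (kg m^2)

# Equations of condition matrix A and data vector d, so that d = A m
A = (4.0 * np.pi / 3.0) * np.array([
    [c**3,                  a**3 - c**3],
    [2 * c**5 / (5 * a**2), 2 * (a**5 - c**5) / (5 * a**2)],
])
d = np.array([M, I / a**2])

rho_core, rho_mantle = np.linalg.solve(A, d)
print(rho_core, rho_mantle)  # roughly 1.3e4 and 4.1e3 kg m^-3
```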


Data loading Also known as file generation, the process of initially reading data into machine storage, validating it, and preparing the database for subsequent update and retrieval (Gordon and Martin 1974; Hruška 1976).

Data logger An electronic device for the automatic acquisition and storage of data values in digital form, generally as a function of time and/or spatial position, in such a way that they can be subsequently retrieved as a time- or spatial-series (Miller 1963; Houliston et al. 1983).

Data mapping See: point-value map, point-symbol map, contour map; biofacies map, choropleth map, classifying function map, derivative map, facies departure map, isofacies map, isopach map, isopleth map, lithofacies map, sorting map, sphericity map, structure contour map, trend-surface analysis.

Data mining The nontrivial extraction of implicit, previously unknown, and potentially useful information, such as patterns of relationship, from large amounts of data (Knowledge Discovery in Databases, KDD): using statistical and graphical techniques (Exploratory Data Analysis, EDA) to discover and present knowledge in a form which is easily comprehensible to humans (Everitt 2002). This data exploration stage may be followed by model building to arrive at good predictive models, which would ideally be validated using an independent data set, and, finally, by use of the model as a predictive tool with new data (Kantardzic 2011). Geological applications include Stolorz and Dean (1996), Marketos et al. (2008), Landgrebe et al. (2013) and Cracknell and Reading (2014).

Data model 1. Consider establishing a theoretical model to explain some observed data: following Gubbins (2004), the data model may be parameterised with a set of P parameters which form a model vector: m^T = (m1, m2, ∙∙∙, mP); the data form a data vector of length D: d^T = (d1, d2, ∙∙∙, dD); and the corresponding errors in the data form an error vector: ε^T = (ε1, ε2, ∙∙∙, εD). For the sake of example, assuming a simple linear relationship between data and model applies, then in matrix form, what are known as the equations of condition are given by: d = Am + ε, where A is a D × P matrix of coefficients that are independent of both data and model, and are found by consideration of the physics of the problem, geometric constraints, and the type of measurements. The i-th row of A, known as the data kernel, describes how the i-th datum depends on the model. Gubbins gives an illustrative example in which the density of the Earth’s core (ρc) and mantle (ρm) are to be estimated from data which consist of values of the Earth’s mass (M) and moment of inertia (I), given the radii of a spherical core (c) and a spherical surface (a). Then the data vector is d^T = (M, I/a²); the model vector is m^T = (ρc, ρm); and the equations of condition matrix, A, is:

\[
A = \frac{4\pi}{3}\begin{bmatrix} c^{3} & a^{3}-c^{3} \\ \dfrac{2c^{5}}{5a^{2}} & \dfrac{2\,(a^{5}-c^{5})}{5a^{2}} \end{bmatrix}.
\]

These may be solved to find ρc and ρm. In general, if D = P, the problem is said to be equi-determined; if D > P, it is said to be overdetermined; and if D < P, it is underdetermined and different methods of solution apply in each case (Gubbins 2004). 2. The term data model is also applied to a generic, abstract, set of concepts for the representation of the logical organisation of the data in a database, consisting of a set of objects (named logical units of data) and the relationships between them, the appropriate operations and integrity rules between them being formally defined; it is separate from the data structure, which is a set of methods or programs to access the data which is stored in a specific way so as to ensure that the intended behaviour of the operations in the data model is preserved (Tsichiritzis and Lochovsky 1977). The entities and relationships in a data model are described using data structure diagrams (Bachman 1969), introduced by the American software engineer, Charles William Bachman III (1924–) when he developed General Electric’s Integrated Data Store (IDS) database management system in 1963–1964 (Bachman 1965). There are now three types of data model: the hierarchical model, network model and relational model; see Martin and Gordon (1977) for a discussion of their properties in a geological context and Frank (1992) on adaptation of the data model concept for Geographical Information Systems. A number of open-source data models for the geological community have been developed by the British Geological Survey (EarthDataModels.org). See West (2011) for a recent discussion of the subject.

Data partition, data partitioning 1. Partitioning is the subdivision of a table or index-organised table in a database into smaller entities (a partition), each of which has its own name and may have its own storage characteristics so as to improve both access to, and management of, the contents of the database (Clark 1989). 2. The subdivision of a data set into subsets by means of a set of rules (e.g. Chayes 1970).

Data processing The numerical treatment of data to improve its quality (e.g. improvement of its signal:noise ratio; applying corrections; filtering; etc.) and/or to increase its intelligibility. The term began to be used in the early 1950s (e.g. Dixon 1953; Bashe et al. 1954; Canning 1956) and appears in geophysics in Smith (1958).

Data quality A high-quality data set is characterised by its representativity, accuracy, good precision, and lack of bias. See: quality assurance, quality control.


Data reduction 1. The correction of raw experimental data for known effects so as to convert it to a useful form (Petrus and Kamber 2012). 2. Reducing the volume of a large data set while retaining its essential features to facilitate its storage, transmission and interpretation: (i) Representation of a complex data series by a set of key features, e.g. the positions of the major peaks in an X-ray diffraction trace (Ong et al. 1992). (ii) Reduction of high-dimensional data to a smaller number of variables by means of principal components analysis (Samson 1983). (iii) Compression of a data-series by coding techniques (Spanias et al. 1991). (iv) In image processing, in which the pixels will have a fixed range of values, e.g. 0–255 greylevels, compression may be achieved by encoding successive pixel-to-pixel differences along rows and columns relative to the previous column/row mean followed by recoding as an incremental sequence of look-up values according to their decreasing frequency of occurrence (Plumb 1993).

Data retrieval, data retrieval system Obtaining data from a database management system. Early earth science references include Dillon (1964), Brisbin and Ediger (1967) and Bowen and Botbol (1975).

Data set, dataset A collection of related data composed of separate variables or attributes which can be manipulated as a unit by a computer: in general, it will be in the form of a data table in which rows correspond to samples, cases, positions, times, etc. and the columns to the variables whose values have been recorded, often as a database table. An early use of the term in the geological literature occurs in Pelto et al. (1968) although Krumbein and Graybill (1965) use data array in the same sense. The two-word spelling data set seems to have first become usual about 1885, while dataset came into use about 1945, but the former is still by far the most frequently used; the hyphenated data-set occasionally appears in post-1980 literature (Google Research 2012).

Data-stacking The adding together (or averaging) of a group of time series covering the same time interval (usually with regard to a particular feature indicating the onset of a period of interest), so as to reduce noise and improve the signal. In more complex treatments, filtering may also be involved to overcome wave-shape differences between the elements being stacked. A term used in the common-depth-point method of seismic data processing, first used by the American geophysicist, William Harry Mayne (1913–? 1990) in 1950 (Mayne 1956, 1962). See also Camina and Janacek (1984), Weedon (2003) and Gubbins (2004).

Data storage, data storage-and-retrieval system A computer-based system for the storage of data and its later retrieval for inspection or further analysis using computer programs. Morgan et al. (1969) used the term data storage-and-retrieval system in a description of a databank first established in 1963 by the Kansas Geological Survey for groundwater and hydrochemical data. See also database management system.


Data structure A way of organising data within a computer so that it can be used (retrieved) efficiently; examples include data records, data arrays, and tree-like structures. See Aho et al. (1983) for further information and Bouillé (1976a) for an early discussion in a geological context.


Data system An integrated system of computer programs and subroutines for the storage, retrieval, processing and display of numeric and/or nonnumeric data held in a computer database (e.g. LeMaitre and Ferguson 1978); see also: database management system, data retrieval system.

Data transfer, data transfer rate, data transmission The physical transfer of digital data (a digital bitstream or digitized analogue signal), either via a medium such as magnetic tape (Dampney et al. 1985) or, in more recent years, over a point-to-point or point-to-multipoint communication channel (Pergola et al. 2001; Araya et al. 2015). Files are transferred using a standard protocol. The data transfer rate is the average number of bits, characters or blocks per unit time passing between corresponding equipment in a data transmission system (IBM undated).

Data validation The process of checking the validity of entries in a potential database, following data loading, against permitted codes or a permitted range of values, etc. (Gordon and Martin 1974; Hruška 1976).

Data window A function of discrete time by which a data series is multiplied; also known as a taper. It is the multiplication of the values of a time series within a given interval by a gradational series of weights, which smoothly increase from zero at the edges to a maximum value at the centre and are equal to zero outside this given interval (this contrasts with the rectangular boxcar window). The intention is to minimise periodogram leakage by gradually fading the magnitude of the oscillations towards the ends of the time series. The term was introduced by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990) in Blackman and Tukey (1958). Discussed in a geophysical context by Tukey (1959a). See also: Weedon (2003); lag window, spectral window.

Davidon-Fletcher-Powell algorithm A powerful iterative method for sequentially minimizing a general unconstrained function of n variables or parameters, based on the assumption that the function to be minimized behaves like a quadratic function. Originally developed by American physicist William Cooper Davidon (1927–2013) (Davidon 1959, 1991) [note: his name is frequently misspelled Davidson] and subsequently improved by British mathematicians Roger Fletcher (1939–) and Michael James David Powell (1936–2015) (Fletcher and Powell 1963). Mentioned by Buttkus (1991, 2000) and applied to the fitting of sand grain-size distributions by Fieller et al. (1992) and to earthquake studies (Zhuang 2011). See also Rosenbrock function.

dBASE One of the earliest database management systems for microcomputers. Originally developed in assembler language code under the name Vulcan in 1978 by American software engineer, (Cecil) Wayne Ratcliff (1946–) while he was at the Martin Marietta Corp., the software was subsequently licenced to Californian start-up company Ashton-Tate and marketed under the name dBASE II. dBASE III, released in 1984, was the first version to be coded in C and had a rapid take-up among geologists (Butler 1987; Mukhopadhyay et al. 1994). Following problems with dBASE IV, Ashton-Tate was sold to Borland Software Corp. in 1991 and dBASE was subsequently marketed by dataBased Intelligence (1999) and dBase LLC, Binghamton, NY (2012).

De Moivre’s formula, De Moivre’s theorem This states that e^(ikθ) = (cos θ + i sin θ)^k = cos(kθ) + i sin(kθ), where e is Euler’s number, the constant 2.71828, and i is the imaginary unit √−1. Named for the French-born English mathematician, Abraham De Moivre (1667–1754), who never explicitly stated the theorem in this form but who is believed to have derived an equivalent result in the early 1700s (Bellhouse 2011). It was first stated in the above form by the Swiss mathematician, Leonhard Euler (1707–1783) (Euler 1748, 104). An early reference to this formula under De Moivre’s name appears in Woodhouse (1809). An example of its use in geophysics is Chen and Alsop (1979).

de Wijs binomial cascade, de Wijsian model The Dutch economic geologist, Hendrica Johanna de Wijs (1911–1997), published an important paper which introduced the idea of self-similarity of element concentration values: the model has two parameters, the overall average concentration value and the dispersion index (d), and postulates that the values of two halves of a block of ore which has an overall concentration value of c are (1 + d)c and (1 − d)c, regardless of the size of the block (de Wijs 1951). In the early 1950s this model inspired the French geostatistician, Georges Matheron (1930–2000) to develop his theory of regionalized variables as applied to ore assays. Matheron’s (1962) absolute dispersion parameter, α, is a function of d and relates the logarithmic variance of element concentration values to the logarithmically transformed ratio of the volumes of a large block and smaller blocks contained within it. Krige (1966) showed that this version of the model applies to the spatial distribution of both gold and uranium in the Witwatersrand goldfields, South Africa. Mandelbrot (1982) demonstrated that the de Wijsian model was the first example of a multifractal. Lovejoy and Schertzer (2007) referred to this as the de Wijs binomial cascade. Independently, Brinck (1971) used the de Wijsian model for the spatial distribution of various chemical elements in large portions of the Earth’s crust. Brinck’s approach is described in detail by Harris (1984), together with other applications. Agterberg (2007) showed that estimation of the dispersion parameter can be improved by using multifractal theory. He proposed a 3-parameter de Wijsian model, the third parameter being the apparent number of subdivisions of the environment. This was introduced because although the de Wijsian model may be satisfied on a regional scale, the degree of dispersion generally decreases rapidly as local, sample-size, scales are reached.


Debug, debugging To find and rectify errors in a computer program (Sheriff 1984). Although use of the term in printing dates back to the 1940s, its first appearance in computing literature was by Orden (1952). An early example of usage in geophysics is in Simons (1968). See also bug.

Decibel (dB) scale A decibel (Martin 1929) equals 10 log10(power), where power, in time series spectrum analysis, is waveform amplitude squared. Thus a power of 0.01 is equivalent to −20 dB. See also Buttkus (1991, 2000).

Decimation A method of taking a subsample from a long time series in which individual data values are obtained by regularly omitting a sequence of n successive data values in the original series. There is a risk that this may introduce aliasing in the subsampled sequence. The term was introduced by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990) (Blackman and Tukey 1958). See Camina and Janacek (1984) and Weedon (2003) for discussion in an earth science context. See also twiddle factor.

Decision system A computer-based system for making decisions on a probabilistic basis, e.g. in the oil industry “the ultimate objective of such a system would be to provide a means of obtaining a sequence of optimum decisions that are custom-tailored for a particular oil operator’s financial goals and risk position” (Harbaugh 1977). An early example of this was the KOX (Kansas Oil Exploration) system developed by American geologist and petroleum engineer, John Warvelle Harbaugh (1926–) and others at the Kansas Geological Survey (Harbaugh 1972, 1977; Harbaugh et al. 1977).

Declination The angle on the horizontal plane between the direction along a meridian towards the geographical North Pole (true North) and magnetic north. The first known observations of the consistent deviation of a magnetic needle from the astronomical North–South axis were made by the Chinese polymath Shen Kuo (1031–1095 AD) in 1088, although he noted an earlier investigation by the astronomer Yi-Xing (672–717 AD) (Needham 1959; Jialu Fan et al. 2015). The first precise measurement of declination in Europe was not made until c. 1510, when Georg Hartman (1489–1564) determined the declination in Rome (Harradon 1943b).

Deconvolution, deconvolved Originally called prediction error filtering in the 1950s (Robinson 2015), it is a process designed to restore a waveform to the shape it had before being affected by some filtering action. The assumption is that a seismic trace consists of a series of reflection events convolved with a wavelet (whose shape depends on the shape of the pressure pulse created by the seismic source, reverberations and ghost reflections in the near-surface, the response of any filters involved in the data acquisition, and the effects of intrinsic attenuation), plus unrelated noise. The deconvolution process designs an inverse filter which compresses the wavelet and enhances the resolution of the seismic data (Dragoset 2005). In practice it may involve the following steps: (i) system deconvolution, to remove the filtering effect of the recording system; (ii) dereverberation or deringing, to remove the filtering action of a water layer (if present); (iii) predictive deconvolution, to attenuate the multiples which involve the surface or near-surface reflectors; (iv) deghosting, to remove the effects of energy which leaves the seismic source directly upwards; (v) whitening or equalizing to make all frequency components within a bandpass equal in amplitude; (vi) shaping the amplitude/frequency and/or phase response to match that of adjacent channels; and (vii) determination of the basic wavelet shape (Sheriff 1984). The principle of digital deconvolution of seismic traces was introduced, and its feasibility compared to the older analogue computing methods for this purpose proven, in 1952–1954 by the American mathematician and geophysicist, Enders Anthony Robinson (1930–) and members of the Geophysical Analysis Group at the Massachusetts Institute of Technology. They initially used the 16-bit word length Whirlwind I computer at MIT, which at that time had only 1024 words of random-access memory (Robinson 1954, 1967a). Subsequently, using the Digital Electronic Computer at the Raytheon Manufacturing Co. at Waltham, Massachusetts and the British-made Ferranti Mark 1 (FERUT) computer at the University of Toronto, which had a 40-bit word length, the feasibility of processing multiple seismic records by digital computer was eventually demonstrated (Robinson 1967b, 2015). See also: Camina and Janacek (1984), Buttkus (1991, 2000), Gubbins (2004) and Robinson and Treitel (2008); adaptive deconvolution, convolution, deterministic deconvolution, dynamic deconvolution, homomorphic deconvolution, inverse filtering, minimum entropy deconvolution, statistical deconvolution.

Definite integral An integral is the result of integrating a function: if y = f(x), then it is the total area between the curve defined by the values of y = f(x) and the x-axis. This can be imagined as the sum of the areas of an infinite number of infinitely thin rectangles parallel to the y-axis, all of equal width, δx, and with corresponding mid-point (MP) heights: yMP = f(xMP), hence

\[
\int f(x)\,dx \approx \sum_{i=1}^{n} \{ f(x_{MP}) \}_{i}\, \delta x
\]

as δx → 0 and, correspondingly, n → ∞. If the area considered is only that between stated lower and upper limits, x1 and x2, then it is referred to as a definite integral which is written in the still-current notation introduced by the German mathematician, Gottfried Wilhelm von Leibniz (1646–1716), (Leibniz 1686, 297; Roero 2005) as: \( \int_{x_1}^{x_2} f(x)\,dx \). Otherwise it is called an indefinite integral. See Camina and Janacek (1984) for discussion; Abramovitz and Stegun (1965) for special cases.
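A short numerical sketch of this limiting process (not part of the original entry; the function and integration limits are assumed for illustration) shows the mid-point rectangle sum approaching the exact value of the definite integral as the rectangle width shrinks:

```python
import numpy as np

# Mid-point rectangle sums for f(x) = x**2 between x1 = 0 and x2 = 3
# (an assumed example; the exact definite integral is 9).
def midpoint_integral(f, x1, x2, n):
    dx = (x2 - x1) / n                    # width of each thin rectangle
    mid = x1 + dx * (np.arange(n) + 0.5)  # mid-point of each rectangle
    return float(np.sum(f(mid)) * dx)

for n in (10, 100, 1000):
    print(n, midpoint_integral(lambda x: x ** 2, 0.0, 3.0, n))  # tends to 9 as n grows
```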


Deformation ellipsoid In three dimensions the semi-axes of the deformation or strain ellipsoid are (1 + ε1), (1 + ε2) and (1 + ε3), where ε1, ε2 and ε3 are the principal finite extensions (also called principal finite strains). In naturally deformed rocks five types of strain can occur and are characterised in terms of their principal extensions: Type 1, uniaxial flattening: ε1 = ε2, both positive, ε3 negative; Type 2, general flattening: ε1 and ε2 positive, ε3 negative; Type 3, plane strain: ε1 positive, ε2 = 0, ε3 negative; Type 4, general constriction: ε1 positive, ε2 and ε3 negative; and Type 5, uniaxial constriction: ε1 positive, ε2 = ε3, both negative. Strain ellipsoid shape may be characterised using the Flinn diagram, Ramsay logarithmic diagram or Jelinek diagram. The idea of the strain ellipsoid was first discussed by the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857) (Cauchy 1823). The terms (stress) ellipsoid and principal strain were used by the British physicist, William Thomson, Lord Kelvin (1824–1907) (Thomson 1856); the first analytical treatment in geology was given by the American mining and structural geologist, George Ferdinand Becker (1847–1919) (Becker 1893). The term deformation ellipsoid was introduced by the British geologist, Derek Flinn (1922–2012) (Flinn 1962); all were quantitative treatments. See also: ellipsoid d-value, ellipsoid D-value, ellipsoid k-value, ellipsoid K-value.

Deformation matrix Deformation is the change in position undergone by the material particles of a continuous body as a result of simple shear and pure shear. The Norwegian geologist, Hans Ramberg (1917–1998) was the first to model finite and progressive deformation as a simultaneous combination of pure and simple shearing based on the methods of continuum mechanics (Ramberg 1975; Tikoff and Fossen 1993). See Soto (1997) for a recent review and three-dimensional model of deformation in a general shear zone.

Deformation path A line connecting points corresponding to successive changes in shape of the strain ellipsoid under progressive deformation on a Flinn diagram or Ramsay logarithmic diagram. It may be regarded as the history of the homogeneous general rotational strain matrix (D), which may be factored into a pure strain matrix (T) and a rotational matrix (R), where D = TR. The term was introduced by the British geologist, Derek Flinn (1922–2012) (Flinn 1962). See also: Ramsay (1967), Elliott (1970, 1972), Ramsay and Huber (1983).

Deformation plot Introduced by the British structural geologist, Derek Flinn (1922–2012) in 1962 (following his 1956 study of deformed clast shapes which used an adaptation of the Zingg plot). A method of classifying the shape of the strain ellipsoid on the basis of the two principal strain ratios: the ratio of the maximum/intermediate extensions plotted on the y-axis and the ratio of the intermediate/minimum extensions plotted on the x-axis. It is now known as the Flinn plot. See also: Ramsay (1967), Ramsay and Huber (1983); the Ramsay logarithmic diagram and Jelinek diagram.

Degeneracy Given a square matrix, A, its characteristic polynomial is det(xI − A), where I is an identity matrix of the same dimensions as A, and det is the determinant. A degenerate eigenvalue (i.e. a multiply coinciding root of the characteristic polynomial) is one which has more than one linearly independent eigenvector. Use of the term occurs in physics in Koening (1933) and in geophysics by Chael and Anderson (1982); S-waves have a degeneracy of 2 in isotropic media (Sheriff 1984).

Deghosting A filtering technique to remove the effects of energy which leaves the seismic source directly upward, used as part of a process designed to restore a waveform to the shape it had before being affected by some filtering action. The assumption is that a seismic trace consists of a series of reflection events convolved with a wavelet (whose shape depends on the shape of the pressure pulse created by the seismic source, reverberations and ghost reflections in the near-surface, the response of any filters involved in the data acquisition, and the effects of intrinsic attenuation), plus unrelated noise. The deconvolution process designs an inverse filter which compresses the wavelet and enhances the resolution of the seismic data (Dragoset 2005). In practice it may involve the following steps: (i) system deconvolution, to remove the filtering effect of the recording system; (ii) dereverberation or deringing, to remove the filtering action of a water layer (if present); (iii) predictive deconvolution, to attenuate the multiples which involve the surface or near-surface reflectors; (iv) deghosting, to remove the effects of energy which leaves the seismic source directly upwards; (v) whitening or equalizing to make all frequency components within a band-pass equal in amplitude; (vi) shaping the amplitude/frequency and/or phase response to match that of adjacent channels; and (vii) determination of the basic wavelet shape (Sheriff 1984). The method was introduced by the American mathematician and geophysicist, Enders Anthony Robinson (1930–) in 1951 during study for his Massachusetts Institute of Technology PhD thesis (1954). Sometimes referred to as a ghost elimination filter (Camina and Janacek 1984). See also: Robinson (1967b), Sheriff (1984), Buttkus (2000), Gubbins (2004); adaptive deconvolution, convolution, deterministic deconvolution, dynamic deconvolution, homomorphic deconvolution, inverse filtering, minimum entropy deconvolution, predictive deconvolution, statistical deconvolution.

Degrees of freedom The number of parameters which may be independently varied. The term was introduced by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) in 1922. Discussed in earth science textbooks such as Miller and Kahn (1962), Krumbein and Graybill (1965), Vistelius (1980, 1992), Buttkus (1991, 2000) and Gubbins (2004).


del (∇) [notation] A vector differential operator denoted by the Greek symbol (∇, nabla):

\[
\nabla = \mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}
\]

where i, j and k are unit vectors directed along the orthogonal x-, y- and z-axes. An early example of its use in a seismology textbook is Macelwane (1932). It was first used (on its side) in a mathematical paper by the Irish mathematician, physicist and astronomer, (Sir) William Rowan Hamilton (1805–1865) (Hamilton 1837) but was subsequently taken up following its adoption (in its present orientation) by the Scottish mathematician and physicist, Peter Guthrie Tait (1831–1901) (Tait 1867, 1890, §145, 102). Apparently unsure what to call this new and as yet unnamed symbol, the Greek word nabla was suggested to him by the Scottish professor of divinity, and reader in Arabic and Hebrew, William Robertson Smith (1846–1894) on account of its resemblance to the shape of a harp of Phoenician origin, once used in the Middle East by the ancient Hebrews, called by them ‫ֵ֤נֶבל‬ (nêḇel) and known to the Greeks as the nabla or nablia (Rich 1890).

Delaunay tessellation, Delaunay triangulation This is a triangle-based tessellation which is the dual of the Dirichlet tessellation. If all neighbouring points of the Dirichlet tessellation are joined, then one has a tessellation of triangles. If a region consists of Ni interior polygons and Nb boundary polygons, then the Delaunay tessellation will consist of (2Ni + Nb − 2) triangles. Although algorithms for triangulating a polygon go back to the work of the American mathematician Nels Johann Lennes (1874–1951) (Lennes 1911), an algorithm for triangulating points in a plane was first introduced by the Russian mathematician, Boris Nikolaevich Delaunay [Delone] (1890–1980) (Delaunay 1934). Earth science usage includes Watson and Philip (1984) and Tsai (1993).

Delay See: maximum delay, minimum delay, minimum delay filter, phase delay, Takens’ time delay method.

Delay-and-sum method A method of stacking a number of seismic traces (or similar time series) recorded simultaneously at a number of receiver stations at varying distances from a source, so as to align them in such a manner that their cross-correlation is maximised. This is achieved by calculating an individual delay time and weighting to be applied to each trace in such a manner that the sum of all the weighted delayed traces is maximised. In practice, each trace in turn is chosen as a reference trace and the sum of squares of differences between the reference trace and the rest (the beam) is minimized. The optimal solution is obtained when the sum of all the misfits is minimized (Mao and Gubbins 1995; Gubbins 2004). This technique is a method of beamforming.
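A much-simplified numerical sketch of the shift-and-sum idea (not the Mao and Gubbins procedure itself; the synthetic traces, noise level and delays below are assumed purely for illustration) is:

```python
import numpy as np

# Traces recorded at several receivers contain the same wavelet arriving with
# different integer-sample delays plus noise; shifting each trace back by its
# delay and summing (the "beam") reinforces the signal and suppresses the noise.
rng = np.random.default_rng(0)
n = 200
wavelet = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0])
delays = [10, 25, 40, 55]                 # assumed known/estimated delays, in samples

traces = []
for d in delays:
    t = rng.normal(scale=0.5, size=n)
    t[d:d + wavelet.size] += wavelet      # embed the wavelet at the delayed position
    traces.append(t)

# Apply each delay and form the beam (here a plain average of the aligned traces)
aligned = np.array([np.roll(t, -d) for t, d in zip(traces, delays)])
beam = aligned.mean(axis=0)
print(beam[:wavelet.size].round(2))       # the wavelet emerges at the start of the beam
```

Averaging the aligned traces raises the signal-to-noise ratio roughly in proportion to the square root of the number of traces.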


Delphi estimation, Delphi method A structured method for the improvement of group estimates by the iterative application of controlled feedback of individual expert opinions to the participating panel (Lindstone and Turoff 1975; Barrington 1986). It was originally questionnaire-based and developed for military purposes at the RAND Corporation in the 1950s by Olaf Helmer (1910–2011), Norman Crolee Dalkey (1915–) and Nicholas Rescher (1928–) (Dalkey and Helmer 1951, 1962). There have been concerns about its use as a result of the ease with which the method may be influenced by bias or improper manipulation (Sackman 1974, Baxter et al. 1978, Krinitzsky 1993), and Woudenberg (1991) concluded that there is no evidence to support the view that Delphi is more accurate than other judgment methods. Despite this, an early application in geology was as an aid to estimation of mineral resources based on collective opinion (Ellis et al. 1975). It has also been applied to fields such as: oil reserves estimation (Masters et al. 1998, Liu et al. 2009), water resources (Taylor and Ryder 2003) and engineering geology (Liu and Chen 2007).

Delta function An abbreviation of the Dirac Delta function: a probability density function in which P(x) = 0 for all x from −∞ to ∞, except at x = 0 where P(x) = ∞. Its use was popularised by the British physicist, Paul Adrien Maurice Dirac (1902–1984) who introduced it (Dirac 1930, p. 58) as a tool in quantum mechanics. Also known as the Dirac function, and an impulse. Discussed in a geophysical context by Buttkus (1991, 2000) and Gubbins (2004); see also: Gunduz and Aral (2005); Dirac comb, Heaviside function, Kronecker Delta function.

Dempster-Shafer Theory Named for the Canadian-born American statistician, Arthur Pentland Dempster (1929–) and American mathematician, Glenn Shafer (1946–), it is also known as the Theory of Evidence and was initially introduced by Dempster (1966, 1967, 1968), and then further developed by Shafer (1976) as an aid to dealing with the uncertainty arising from the nature of evidence. For example, one could state that from the evidence provided by three sets of data (74 water-well pump tests, 161 core permeability determinations and 453 drill-stem tests obtained from depth ranges of 100–3500, 3000–8500 and 100–10,000 ft respectively; Mathon et al. 2010) the permeability of a certain formation at a depth of 6000 ft is reasonably believed to be 20 md but, because of uncertainty, it is plausible that it could be as high as 970 md. The Dempster-Shafer theory formalises such a reasoning process: a superset S is a set (e.g. of hypotheses) which contains all the n members of a smaller finite set A; this subset A is said to be contained in S, but is not equal to S. The void (empty) set (∅) is also a proper subset of S. This may be symbolised as: A ⊂ S; A ≠ ∅. The power set is the collection of all subsets of S, including the void set and S itself. For example, if S = {a, b}, then the possible subsets of S will be: {∅, {a}, {b}, {a, b}}, and hence the size of the power set is given by 2^n. Dempster-Shafer theory assigns an evidential weight to a set A ⊆ S, which contains a single hypothesis, or set of hypotheses, by means of a mapping m: 2^S → [0, 1]; m is known as the basic belief assignment or basic assignment and

\[
\sum_{A \subseteq S} m(A) = 1; \quad m(\varnothing) = 0.
\]


This can be regarded as assigning an evidential weight to the set A; the degree of belief that a statement is warranted. By applying the basic assignment, one can obtain two further measures: belief (bel), the total amount of justified support given to A; and plausibility (pl), the maximum amount of specific support which can be given to A, if justified by additional evidence; where:

\[
\mathrm{bel}: 2^{S} \rightarrow [0, 1] \quad \text{and} \quad \mathrm{bel}(A) = \sum_{B \subseteq A,\; B \neq \varnothing} m(B)
\]

and

\[
\mathrm{pl}: 2^{S} \rightarrow [0, 1] \quad \text{and} \quad \mathrm{pl}(A) = \sum_{B \cap A \neq \varnothing} m(B),
\]

where ∩ denotes the intersection of sets A and B, i.e. the elements of A which are also elements of B. Furthermore, 1 − bel(A) represents doubt; 1 − pl(A) represents disbelief; bel(A) ≤ pl(A); and pl(A) − bel(A) corresponds to the uncertainty in bel(A). Dempster and Shafer suggested a Rule of Combination which enables the basic assignments to be combined:

\[
m(Z) = \sum_{A \cap B = Z \neq \varnothing} m(A)\, m(B) \Bigg/ \left[ 1 - \sum_{A \cap B = \varnothing} m(A)\, m(B) \right].
\]
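A minimal sketch of the basic assignment, belief and plausibility calculations (not taken from the sources cited in this entry; the two-member frame of hypotheses and the weights are assumed purely for illustration) is:

```python
# Belief and plausibility over a two-member frame S = {a, b};
# the basic assignment m below is assumed purely for illustration.
m = {frozenset(): 0.0,            # m(empty set) = 0
     frozenset({"a"}): 0.5,
     frozenset({"b"}): 0.2,
     frozenset({"a", "b"}): 0.3}  # the weights sum to 1

def bel(A):
    """Total justified support: sum of m(B) over non-empty subsets B of A."""
    return sum(w for B, w in m.items() if B and B <= A)

def pl(A):
    """Maximum plausible support: sum of m(B) over sets B that intersect A."""
    return sum(w for B, w in m.items() if B & A)

A = frozenset({"a"})
print(bel(A), pl(A))  # 0.5 and 0.8; pl(A) - bel(A) = 0.3 is the uncertainty in bel(A)
```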

Rakowsky (2007) gives a very clear example of how these are used in practice; Mathon et al. (2010) illustrate the application of the theory to dealing with uncertainty in permeability measurement; Kachnic and Sadurski (2005) to estimating the extent of an unconfined aquifer; and Althuwaynee et al. (2012) to landslide susceptibility mapping.

Dendrogram A graphical method of depicting the results of a hierarchical cluster analysis. The term cluster analysis was introduced by the American psychologist, Robert Choate Tryon (1901–1967) (Tryon 1939), and means the assignment of n individual objects to groups of similar objects on the basis of their p-dimensional attributes. The first step of this multivariate method is to compute a similarity matrix between all pairs of samples; this is then used as the basis for assigning the samples to different groups. In hierarchical clustering, the solution involves nesting sub-groups within larger groups. This is generally accomplished either by (i) agglomerative clustering, in which the n individuals are successively fused into groups; or (ii) divisive methods, which progressively partition the set of individuals into successively finer groupings. The results are generally displayed in the form of a two-dimensional tree-diagram or dendrogram in which the individuals all occur at the topmost level, representing the tips of the branches; these are then progressively joined downwards as the similarity between the groups becomes more generalised until, at the base, they are all joined as a single group. Several standard algorithms are used to compute the tree structure (e.g. single linkage, complete linkage, median clustering, centroid, etc.); although the resulting structure is broadly similar whichever method is used, some individuals (probably marginal in composition between two groups) may be forced into different sub-groups depending on which method is used. Hierarchical methods will always force a structure on the data. An early example of geological usage is Valentine and Peddicord (1967). Hierarchical clustering is also used in the reconstruction of evolutionary patterns by cladistic methods, the resultant tree-structure being known as a cladogram.

Density diagram A graph of the point-density of directional vectors on an equal-area, or similar, projection. An early geological example is Robinson’s (1963) application to Beta diagrams.

Density function An expression specifying the way in which the probability of a given value of a variable (x) varies as a function of x. This applies to a conceptual model; observed distributions are described by a frequency distribution. See also the: additive logistic normal, additive logistic skew-normal, Bernstein, Beta, bimodal, Bingham, binomial, bivariate, broken-line, Burr-Pareto logistic, Cauchy, Chi-squared, cumulative, Dirichlet, discrete, double-exponential, exponential, extreme value, Fisher, fractal, Gamma, generalized Pareto, geometric, joint, Kent, Laplace, log-geometric, loghyperbolic, logistic, logistic-normal, log-logistic, lognormal, logskew normal, marginal, mixture, multinomial, multivariate Cauchy, multivariate lognormal, multivariate logskew normal, multivariate normal, multivariate skew-normal, negative binomial, normal, Pareto, Poisson, shifted Pareto, skew, skew-normal, standard normal, stretched Beta, superposition, triangular, truncated, truncated Pareto, uniform, von Mises, Weibull and Zipf distributions.

Density trace This frequency distribution is obtained by first placing smooth (Gaussian) density functions (“kernels”), each with the same spread parameter (bandwidth), at the position of each occurrence of the variable along the horizontal axis corresponding to its magnitude. These often overlapping densities are then summed to give the final smoothed density function. This avoids the blocky appearance of the traditional histogram, but choice of an appropriate bandwidth to avoid under- or over-smoothing is essential. The approach, and terminology, has its origins in work by the American statistician, John Wilder Tukey (1915–2000) on spectrum analysis in the late 1940s. See: Tukey (1950), Wegman (1972), Chambers et al. (1983) and, for earth science examples: Vita-Finzi et al. (2005), Nowell et al. (2006).
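A compact sketch of this construction (the observations and bandwidth are assumed, purely for illustration) is:

```python
import numpy as np

# A density trace built by summing Gaussian kernels of common bandwidth placed
# at each observation.
x_obs = np.array([2.1, 2.4, 3.0, 5.5, 5.9, 6.2, 6.4])
bandwidth = 0.5
grid = np.linspace(0.0, 9.0, 361)

kernels = np.exp(-0.5 * ((grid[:, None] - x_obs[None, :]) / bandwidth) ** 2)
density = kernels.sum(axis=1) / (x_obs.size * bandwidth * np.sqrt(2.0 * np.pi))
print(float(np.trapz(density, grid)))  # integrates to ~1, as a density should
```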


Dependent variable, dependent variate If a variable y is a function, y = f(x), of one (or more) predictors (x), then the x’s are termed the independent variable(s) and y the dependent variable, since its values are changed by changes in those of x. Both terms occur in a textbook by the Irish scientific writer, Dionysius Lardner (1793–1859) (Lardner 1825) and subsequent works of reference [e.g. Anonymous (1830a, b), Cayley (1879)], but it was the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962), who first explicitly used these terms in a regression context (Fisher 1925a).


Deposit model A generalization of a deposit type distinguished by: geological attributes, host rock environment, processes of formation and characteristic amounts of specific commodities (Hansen et al. 1978). Examples of its application include: Gaál et al. (1978), Sinding-Larsen and Vokes (1978), Divi (1980) and Briskey and Schulz (2007).

Depth-age curve A graph enabling accurate conversion of depths (e.g. down a drillhole) with reference to a given datum to ages (Ma), based on a number of known dates (Davies et al. 1992).

Depth map An occasionally-used term for a structure contour map. An isoline map of depths to a given subsurface horizon with reference to a datum level (usually mean sea level). See also contouring. The American geologist, Benjamin Smith Lyman (1835–1920), used “underground contour lines to give the shape of rock beds” for coal, iron and lead deposits in south-western Virginia, USA, in 1866–1867; his first published example occurs in Lyman (1870).

Dereverberation, deringing Deconvolution is a process designed to restore a waveform to the shape it had before being affected by some filtering action. The assumption is that a seismic trace consists of a series of reflection events convolved with a wavelet (whose shape depends on the shape of the pressure pulse created by the seismic source, reverberations and ghost reflections in the near-surface, the response of any filters involved in the data acquisition, and the effects of intrinsic attenuation), plus unrelated noise. The deconvolution process designs an inverse filter which compresses the wavelet and enhances the resolution of the seismic data (Dragoset 2005). In practice it may involve the following steps: (i) system deconvolution, to remove the filtering effect of the recording system; (ii) dereverberation or deringing, to remove the filtering action of a water layer (if present); (iii) predictive deconvolution, to attenuate the multiples which involve the surface or near-surface reflectors; (iv) deghosting, to remove the effects of energy which leaves the seismic source directly upwards; (v) whitening or equalizing to make all frequency components within a band-pass equal in amplitude; (vi) shaping the amplitude/frequency and/or phase response to match that of adjacent channels; and (vii) determination of the basic wavelet shape (Sheriff 1984). The method was introduced by the American mathematician and geophysicist, Enders Anthony Robinson (1930–) in 1951 during study for his Massachusetts Institute of Technology PhD thesis (1954). See also: Robinson (1967b), Camina and Janacek (1984), Sheriff (1984), Buttkus (2000), Gubbins (2004); see: convolution; adaptive, deterministic, dynamic, homomorphic, minimum entropy, predictive and statistical deconvolution; inverse filtering.

Derivative The rate of change of the amplitude of a function y = f(x) with change in x, which may represent time, distance, etc., or time as a function of distance in a time-distance curve: if the slope of the curve over a small interval (δx) is δy/δx, then as δx → 0, δy/δx → dy/dx, the derivative of y. dy/dx is also a function and it may also be written as f′(x), depending on the notation used. The second derivative, i.e. the rate of change of the derivative, d(dy/dx)/dx, is written as d²y/dx² or f″(x). The d notation was introduced by the German lawyer and mathematician Gottfried Wilhelm von Leibniz (1646–1716) in a manuscript of 26 October 1675. His explanation of differential calculus was eventually published in Leibniz (1684). The notation using f′(x), f″(x), etc. was introduced by the Sardinian-born French mathematician, Joseph-Louis Lagrange (1736–1813), (Lagrange 1772). Examples of earth science usage include: Jeffreys (1924), Macelwane (1932), Slotnick (1959) and Buttkus (1991, 2000).

Derivative map A map of one of the derivatives of a potential field (usually the second vertical derivative), used to emphasise short-wavelength, i.e. high-frequency, spatial anomalies (Peters 1949, Elkins 1951, Vacquier et al. 1951, Agarwal and Lal 1969 and Sheriff 1984).

Derived variables A compositional transformation used in chemical petrology (Chayes 1983c).

Design of experiments The purpose of designing an experiment is to provide the most efficient and economical method of reaching valid and relevant conclusions from the experiment. A properly designed experiment should permit a relatively simple statistical interpretation of the results, which may not be possible otherwise. The experimental design is the formal arrangement in which the experimental programme is to be conducted, the selection of the treatments to be used, and the order in which the experimental runs are undertaken. Experimental design may be applied equally to laboratory investigations or to solely computer-based numerical investigations in which a large number of variables are involved. The design may dictate the levels at which one or more of the variables (factors) are present, and the combination of factors used, in any one experiment. This formal approach was popularised following the work of the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) (Fisher 1935; Quenouille 1949; Montgomery 1991b). Use of these methods was first promoted in geology by the American petrologist, Felix Chayes (1916–1993) and the mathematical geologist, William Christian Krumbein (1902–1979). See: Chayes and Fairbairn (1951), Krumbein and Miller (1953), Krumbein (1955), Thompson et al. (1979), Damsleth et al. (1992) and Guest and Curtis (2009).
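Where the factors and their levels have been chosen, the runs of a full factorial design are simply every combination of levels; the sketch below (the factors and levels are assumed, and not taken from the cited works) enumerates such a design:

```python
from itertools import product

# A full factorial design: one experimental run for every combination of factor
# levels (the factors and levels here are assumed purely for illustration).
factors = {
    "temperature_C": [600, 800],
    "pressure_kbar": [1, 5],
    "run_time_h": [24, 48],
}
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]
for i, run in enumerate(runs, 1):  # a 2 x 2 x 2 design gives 8 runs
    print(i, run)
```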


Despiker A filter for removing spikes from time series records, e.g. in seismic signal processing (Evans 1982). See also: Treitel and Robinson (1966), Robinson and Treitel (1980) and Buttkus (1991, 2000).

det (determinant) A scalar function of a square matrix X, obtained by multiplying and adding the elements of X together in a systematic way, so as to reduce it to a single value. For a 3 × 3 matrix X, where

\[
X = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix},
\]

det(X) = a1b2c3 − a1b3c2 + a2b3c1 − a2b1c3 + a3b1c2 − a3b2c1. In general, it is given by

\[
\det(X) = \sum_{i=1}^{k} a_{ij} C_{ij},
\]

where Cij is the cofactor (De Morgan, 1849) of the element aij. The cofactor is (−1)^(i+j) times the determinant of the matrix obtained by deleting the i-th row and j-th column of X. If the value of the determinant is zero, the matrix is said to be singular. Note that det(X) may also be written as |X|, a notation introduced by the English mathematician Arthur Cayley (1821–1895) in 1846 (not to be confused with the similar notation used to denote the absolute value of a variable). First introduced by the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857) in 1812 and published in Cauchy (1815). An early example of its use in seismological calculations is by Macelwane (1932) and in a geological context by Miller and Kahn (1962) and Krumbein and Graybill (1965).

Detection In geophysics this generally implies the recognition of a weak signal in the presence of noise (Buttkus 1991, 2000).

Detection limit The detection limit of an analytical (chemical) procedure is the lowest concentration of an analyte that can be distinguished with confidence from a field blank (Analytical Methods Committee 1987). It is generally defined as the concentration or amount corresponding to a measurement level ksB units above the value for zero analyte, where sB is the standard deviation of responses of the field blanks and k is a constant. In analytical chemistry, k is taken as 3, but in geochemistry, it is often taken as 2 (a usage which may stem from a wish to lower the false-dismissal rate in exploration geochemistry). In practice, sB is best determined by linear regression of the standard deviation sC of a number of replicate measurements at each of a series of concentrations (C) as a function of C. This function is estimated using the data furnished by a method calibration experiment.


The term detection limit may have arisen from radiochemistry: The New Zealand-born British physicist, Ernest Rutherford, Lord Rutherford of Nelson (1871–1937), and the German physicist, Johannes (Hans) Wilhelm Geiger (1882–1945) used the term limit of detection (Rutherford and Geiger 1908), and it appears in the sense of concentration in a paper by Rutherford (1937). The concept of an Erfassungsgrenze (recording or detection limit) was used by the Austrian-born Brazilian chemist Friedrich Feigl (1891–1971) in Feigel (1923). However, the 3sB statistical definition seems to have been introduced by the German physicist, mathematician and chemist, Heinrich Kaiser (1907–1976) (Kaiser 1947) and its usage subsequently clarified in Kaiser and Specker (1956) and Kaiser (1965) [English translation in Kaiser and Menzies (1969)]. See L. Currie (1995, 2004) and Analytical Methods Committee (2001) for discussion and Helsel (2005) for a comprehensive review of methods for computation of: summary statistics; confidence, tolerance and prediction intervals; comparison of groups, correlation and regression analysis for data sets containing nondetects (i.e. concentrations less than one or multiple detection limits). See also: reporting limit, Thompson-Howarth plot.

Deterministic The presumption that a given situation is determined by a necessary chain of causation, or set of causes (Sheriff 1984). In a statistical sense, it is a process in which the past completely determines the future of a system; a process lacking a random element which consequently has a zero error of prediction (Kendall and Buckland 1982). Early distinction between deterministic and stochastic processes is made by Metropolis and Ulam (1949) and in geology by Krumbein and Graybill (1965). See also: probabilistic model, deterministic model, stochastic process model.

Deterministic chaos Deterministic chaos is the irregular or chaotic motion that is generated by nonlinear systems whose dynamical laws uniquely determine the time evolution of a state of the system from a knowledge of its previous history (Schuster 1984; Schuster and Just 2005). The term, which came into use in the 1970s (e.g. Oster 1976), is now often used in preference to chaos (Turcotte 1997; Weedon 2003).

Deterministic deconvolution Deconvolution in which the characteristics of the filter which is to be removed are known beforehand. For example, in signature deconvolution, a usual data processing step, a measured seismic source signature is used to design a filter which converts it to some desired shape; the shaping filter is then convolved with the seismic traces (Dragoset 2005). For early examples, see: Neidell (1972) and Schultz (1985). See also: statistical deconvolution.

Deterministic model 1. A numerical formulation expressing the exact functional relationship between a dependent variable and one or more predictors, arrived at on a purely theoretical basis and which may be tested by experiment (e.g. Stokes’ law). Such a model differs from a stochastic process model in that it has no random element built into it (Krumbein and Graybill 1965). The term seems to have come into use in the 1940s (e.g. Kendall 1949). 2. A dynamical system whose equations and initial conditions are fully specified and are not stochastic or random (Turcotte 1997). See also: conceptual model, discovery-process model, fluid-flow model, mathematical model, physical model, scale model, statistical model.


Detrend, detrending The process of removal of any major monotone long-term trend in a set of time series data prior to performing power spectral density analysis to estimate the properties of the shorter term oscillations. (This will eliminate the power at zero frequency). Its usage goes back to work by the English statistician, Maurice Stevenson Bartlett (1910–2002) (Bartlett 1948, 1950), taken up in the context of digital signal processing by the American statistician, John Wilder Tukey (1915–2000) with the communications engineer, Ralph Beebe Blackman (1904–1990) (Blackman and Tukey 1958). See Weedon (2003) and Gubbins (2004) for discussion in an earth science context.

Deviation map A term for a map of the values of residuals from a trend-surface fit, introduced by the American mathematical geologist, William Christian Krumbein (1902–1979), (Krumbein 1959b; Krumbein and Graybill 1965).

Diagonal matrix A square matrix which has zeros in all elements not lying on the principal diagonal:

\[
D = \begin{bmatrix} x_{11} & 0 & 0 \\ 0 & x_{22} & 0 \\ 0 & 0 & x_{33} \end{bmatrix}
\]

(Camina and Janacek 1984).

Diagram See: ACF, addition-subtraction, AFM, Angelier-Mechler, Argand, Beta, block, Chadha, Concordia, cyclographic, D/P, density, Durov, echelon, fabric, facies, fence, Flinn, Fry, Gresen, Hill-Piper, intensive variable, Jelinek, kite, Langelier-Ludwig, Mohr, nearest neighbour orientation, Panozzo, Pearce, phase, Pi, Piper-Hill, polar, pole, QAPF, Ramsay logarithmic, rare earth element, Ropes, rose, Schmidt, Schoeller, Sneed-Folk diagram, spider, Stiff, TAS, ternary, tetrahedral, topology, variation, Venn, and Vollmer diagrams; see also plot.

Dice coefficient A similarity coefficient for binary (presence/absence) data introduced by the American biogeographer, Lee Raymond Dice (1887–1977) (Dice 1945): 2C/(N1 + N2), where C = total species common to both units compared; N1 = total species present in the first unit; and N2 = total species present in the second unit (Cheetham and Hazel 1969). It was subsequently called the Dice coefficient by Sokal and Sneath (1963). See: binary coefficient.
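A direct transcription of this coefficient into code (the presence/absence lists are assumed, for illustration only) is:

```python
# Dice coefficient 2C / (N1 + N2) for presence/absence data, using assumed
# species lists for two faunal units (illustrative names only).
def dice(unit1, unit2):
    """C is the number of species common to both units."""
    s1, s2 = set(unit1), set(unit2)
    return 2 * len(s1 & s2) / (len(s1) + len(s2))

unit_a = {"sp1", "sp2", "sp3", "sp4"}
unit_b = {"sp2", "sp3", "sp5"}
print(dice(unit_a, unit_b))  # 2*2 / (4 + 3) = 0.571...
```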


Difference equation An equation that relates a value of a function x(i + 1) to a previous value x(i); it generates a discrete set of values of the function x (Turcotte 1997).

Difference tone In the case of imposed amplitude modulation in which a long period sinusoidal wavelength with frequency f1 is imposed on another with frequency f2, f1 > f2, then minor combination tones will be generated at frequencies 1/f = 1/f1 ± 1/f2, the upper and lower sidebands on either side of the dominant frequency (f2). These appear as symmetrically placed minor-amplitude peaks on either side of f2 in the power spectrum of the resulting waveform. The term combination tone was used in acoustics by the German physicist, Georg Simon Ohm (1787–1854) (Ohm 1839). They are also called interference beats and interference tones; their generation is known as intermodulation or frequency mixing. The primary combination tone at f1 + f2 is known as a summation tone, and at f1 − f2 as a difference tone. When a component frequency is higher than a fundamental frequency, it is called an overtone, and a difference tone at a lower frequency than the fundamental is called an undertone. For discussion in an earth science context see King (1996) and Weedon (2003).

Differential equation An equation that contains both a function and its derivatives, e.g. the behaviour of damped oscillation of a spring is described by:

\[
\frac{d^{2}x}{dt^{2}} + a\frac{dx}{dt} + b^{2}x = 0,
\]

where the constants a > 0 and b = √(k/m), and where x is the displacement from equilibrium of a mass m at time t and k is the stiffness of the spring. The order of such an equation is the order of the highest derivative which it contains (e.g. a “second order” equation will contain second derivatives). An ordinary differential equation involves ordinary derivatives as opposed to a partial differential equation which involves partial derivatives. General use of the differential equation followed its introduction by the German mathematician and philosopher, Gottfried Wilhelm von Leibniz (1646–1716), (Leibniz 1684); see Archibald et al. (2004) for an account of its historical development. For examples see Polyanin and Zaitsev (2003). For methods of solution using R, see Soetaert et al. (2012).

Differentiation a. A modern term for the mathematical operation which gives the rate of change (slope) of a function with respect to some variable. The term calculus differentialis was introduced by the German mathematician and philosopher, Gottfried Wilhelm von Leibniz (1646–1716), (Leibniz 1684). It can be regarded as a complementary process to integration. See also: derivative; Camina and Janacek (1984), Buttkus (1991, 2000) and Gubbins (2004).


b. In igneous petrology the term magmatic differentiation is used to indicate the changing chemical composition of a magma, and the suite of igneous rocks derived from it, as the result of crystal fractionation, crustal contamination and magma mixing, etc.

Diffusion equation A partial differential equation of the form ∂y/∂t = a ∂²y/∂x² + b, where t is time, which has been used to describe density fluctuations in a material undergoing diffusion (Camina and Janacek 1984, Hesse 2012); Begin (1987) applied the same principle to model the elevation of a channel bed in an alluvial drainage system with lateral inflow of sediment. See also Helmholtz's equation.
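A minimal numerical sketch of this equation (an explicit finite-difference scheme with assumed grid, coefficients and initial condition; not part of the original entry) is:

```python
import numpy as np

# Explicit finite-difference (FTCS) sketch of dy/dt = a * d2y/dx2 + b; the time
# step is chosen so that a * dt / dx**2 <= 0.5 for stability.
a, b = 1.0e-2, 0.0
dx, dt, nx, nt = 0.1, 0.25, 101, 400
y = np.zeros(nx)
y[45:56] = 1.0  # an initial concentration "spike"; the end values stay fixed at zero

for _ in range(nt):
    y[1:-1] += a * dt / dx**2 * (y[2:] - 2.0 * y[1:-1] + y[:-2]) + b * dt

print(round(float(y.max()), 3), round(float(y.sum() * dx), 3))  # peak decays; total ~conserved
```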

Diffusion-limited aggregation Particles diffusing (random walking) through a medium stick to a seed particle (aggregate) to form a dendritic structure. It is considered to be diffusion-limited because the particles are at sufficiently low concentration in the transporting medium that they do not come into contact with each other and hence aggregate one at a time rather than in clumps. Simulated examples resemble coral growth and dendritic manganese staining; “hairiness” of the agglomerating object may be controlled by a “stickiness” parameter, which governs the probability of a new particle sticking to the current object. See Whitten and Sander (1981) and Turcotte (1997).

Diffusion process The solution to a stochastic differential equation; a continuous time Markov process with a continuous path (Vistelius 1980, 1992).

Digital Representation of quantities in discrete units (Sheriff 1984), i.e. using numerical digits (Kilburn 1949); see also analog[ue].

Digital Elevation Model (DEM) A digital elevation model (DEM) is a format for the storage and transmission of digital terrain height data representing bare-ground terrain elevations at regularly spaced horizontal intervals. A digital terrain model differs in that it may be an irregularly-spaced vector model of bare-earth points (Anonymous 1992a, b). A standard was adopted worldwide in 1992 following its development by the U.S. Geological Survey, which began public distribution of DEM data in 1975. A DEM data set is a single file comprising 1024-byte ASCII-encoded (text) blocks that fall into three record categories called A, B, and C. There is no cross-platform ambiguity since line ending control codes are not used, and all data (including numbers) is represented in readable text form. The A record contains information defining the general characteristics of the DEM, including its name, boundaries, units of measurement, minimum and maximum elevations, number of B records, and projection parameters. Each B record consists of an elevation profile with associated header information, and the optional C record contains accuracy data. Each file contains a single A record and may contain a single C record, while there is a separate B record for each elevation profile. Early use of the term appears in Cruden and Krahn (1973) and McEwen and Jacknow (1980). See also Li et al. (2005).


Digital filter A system which performs operations on a sampled discrete-time signal to reduce or enhance certain aspects of that signal. The input signal may have been pre-processed by analogue-to-digital conversion, and the output signal may be reconverted to an analogue signal. Early applications were noise removal (Frank and Doty 1953); the elimination of water reverberations (Backus 1959); and ghost reflections (Lindsey 1960) in seismic signal processing. The June 1967 issue of Geophysics was devoted entirely to digital filtering. For detailed discussion see: Robinson (1967b), Camina and Janacek (1984), Buttkus (1991, 2000). A term which originated in radio engineering in the 1920s. Algorithms for selectively removing noise from a time series or spatial set of data (smoothing), or for enhancing particular components of the waveform. The term filter was first used in digital signal processing by the American statistician, John Wilder Tukey (1915–2000) and mathematician Richard Wesley Hamming (1915–1998) (Tukey and Hamming 1949). See also: Frank and Doty (1953), Gubbins (2004); acausal, anti-alias, averaging, band-pass, band-reject, Butterworth, causal, frequency-selective, high-pass, impulse response, low-pass, matched, minimum-delay, nonrealizable, notch, realisable, stacking, threshold, two-pass, wavenumber, Wiener, and zero-phase filters.

Digital mapping See: point-value, point-symbol, contour; biofacies, choropleth, classifying function, derivative, facies departure, isofacies, isopach, isopleth, lithofacies, sorting, sphericity, and structure contour maps, also trend-surface analysis.

Digital Surface Model (DSM) A digital elevation model which represents the height of the Earth's surface including reflective surfaces, such as vegetation cover and man-made objects such as buildings. All data sets captured from aircraft or satellites (e.g. using scanning laser-rangefinder systems, Kilian et al. 1996) are DSMs.

Digital terrain model (DTM) A digital elevation model which represents the bare ground without any vegetation coverage or buildings, etc. (Anonymous 1992a, b). It may be stored as an irregularly-spaced vector model rather than the regularly spaced digital elevation model.

Digital-to-Analog[ue] (D/A) The conversion of a digital number into the equivalent voltage in an analog[ue] system (e.g. De Bremacker et al. 1962; Prothero 1974).

Digital signal processing See: convolution, deconvolution, and deterministic, dynamic, homomorphic, minimum entropy and statistical deconvolution, inverse filtering.
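To illustrate the digital filter entry above, a minimal sketch (assuming SciPy's signal module is available) applying a zero-phase low-pass Butterworth filter to a synthetic noisy record; the sampling frequency, cut-off frequency and filter order are arbitrary illustrative choices.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 100.0                         # sampling frequency, Hz (arbitrary)
t = np.arange(0.0, 10.0, 1.0 / fs)
# synthetic record: a 1 Hz signal plus a higher-frequency "noise" component
x = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 20.0 * t)

# 4th-order low-pass Butterworth filter with a 5 Hz cut-off;
# filtfilt applies it forwards and backwards, giving a zero-phase (two-pass) result
b, a = butter(4, 5.0, btype="low", fs=fs)
x_filtered = filtfilt(b, a, x)

print(np.std(x), np.std(x_filtered))   # the 20 Hz component is largely removed
```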


Digitization, digitize, digitizing The process of sampling a continuous voltage signal (or other continuously varying function, such as a recorder trace of a time series, or a map isoline), usually at regular intervals, and recording the values in digital form for subsequent data storage or analysis (Robinson 1967b, Sheriff 1984). See Broding and Poole (1960), Aspinall and Latchman (1983), Thibault and Klink (1997) and Xu and Xu (2014) for examples; see also: analog[ue]-to-digital conversion.


Digitizer Apparatus for carrying out the process of digitization, e.g. in collection of well-log data (Broding and Poole 1960), a microscope field of view (Lohmann 1983), cartographic data (Bortoluzzi and Ligi 1986), etc.

Dihedron, dihedral A geometric figure with two sides formed by two intersecting planes; the dihedral angle is the angle at which the two planes meet in a third plane which cuts the line of intersection at right angles. A regular tiling or map on a sphere composed of two regular p-gons, each occupying a hemisphere and with edge lengths of 2π/p on a unit sphere (Klein 1888; Coxeter 1948).

Dilation One of the Minkowski set operations (Minkowski 1901). See Agterberg and Fabbri (1978) for a geological example. See also: area, extension, and volume dilation, mother wavelet.

Dimension The topological dimension of an object takes integer values: a point, 0; a line, 1; a square, 2; a cube, 3; etc. Dimensions 1–3 are used in this sense by Stone (1743). A vector space may also be spoken of as having a dimension. A fractal has the property of having a fractional ("fractal") dimension (Mandelbrot 1975a, 1977).

Dimensional analysis Developed by the American physicist, Percy Williams Bridgman (1882–1961), it is a method of reducing complex physical problems to their simplest form prior to obtaining a quantitative answer and involves equating units in a physical relationship so that the dimensions as well as the number values balance (Bridgman 1922, Shen et al. 2014). For a historical review of use of this approach, see Macagno (1971). The classic study in the earth sciences using this technique was by the American hydraulic engineer, Robert Elmer Horton (1875–1945) (Horton 1945), which influenced much subsequent work (Strahler 1958, 1992).

Dimensionless units Ratios which do not depend on the units in which the numerical quantities which form them are measured (Sheriff 1984).

Dip The downward angle which a bedding plane or other surface makes with the horizontal, at right angles to the strike direction of the bed. A term probably first used by metal miners in the sixteenth century. It is usually shown on a map or plan by: (i) an arrow showing the dip direction with the angle of dip written at its head; or (ii) by a T-shaped symbol (├), in which the cross-bar of the T corresponds to the strike direction and the short stem to the dip direction. Such symbols were introduced in the first years of the nineteenth century. The term dip was in use by coal miners in Britain by 1672, and its equivalent (fallen) in Germany c. 1500. See Howarth (1999, 2001b) for a review of the geological analysis and portrayal of such data and Woodcock (1976) for discussion of the magnitude of its measurement error.

Dirac comb The "comb" consists of an infinite time series of unit impulses, all equally spaced in time, formed from a combination of Dirac delta functions, named for the English physicist, Paul Adrien Maurice Dirac (1902–1984) (Blackman and Tukey 1958). It is also known as the sampling function, because multiplying a time-varying function by a comb gives the sample values at the comb interval, or the replicating function because convolution of a waveform with a comb replicates the waveform at the position of each impulse spike. Early examples of reference to it are by Blackman and Tukey (1958) and in geophysics Bakun and Eisenberg (1970); see also Gubbins (2004).

Dirac delta (δ) function, Dirac function A probability density function in which P(x) = 0 for x from −∞ to +∞, except at x = 0, where P(x) = ∞. Its use was popularised by the British physicist, Paul Adrien Maurice Dirac (1902–1984) who introduced it (Dirac 1930, p. 58) as a tool in quantum mechanics. Also known as the Dirac function (Blackman and Tukey 1958) and an impulse. Discussed in a geophysical context by Buttkus (1991, 2000) and Gubbins (2004); see also: Gunduz and Aral (2005). See also: Kronecker Delta, Heaviside function, Dirac comb.

Direct method A term for spectrum estimation, introduced by the American communications engineer, Ralph Beebe Blackman (1904–1990) and statistician, John Wilder Tukey (1915–2000) (Blackman and Tukey 1958). Given an infinite length record X(t), the power spectrum may be calculated either directly from X(t), or indirectly as the Fourier transform of the autocovariance function, which is calculable directly from X(t). The basic choice is essentially between squaring a Fourier transform, or Fourier transforming an average of products. Mentioned in an earth science context by Buttkus (1991, 2000) and Weedon (2003).

Direct problem Better known as a forward model (Parker 1972, 1977), it has also been called a direct problem (Ianâs and Zorilescu 1968) or normal problem (Sheriff 1984). It calculates what would be observed from a given conceptual model; it is prediction of observations, given the values of the parameters defining the model, e.g. predicting the gravity field over a salt dome whose characteristics have been inferred from a seismic survey (Sheriff 1984; Gubbins 2004). See also: inverse problem.
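As an illustrative sketch of a direct (forward) problem, the following Python fragment predicts the vertical gravity anomaly over a buried sphere using the standard point-mass approximation gz = (4/3)πGR³Δρ·z/(x² + z²)^(3/2); the radius, depth and density contrast are invented values and the formula is a textbook example rather than one taken from the entry above.

```python
import numpy as np

G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
R = 500.0                # sphere radius, m (arbitrary)
z = 2000.0               # depth to the sphere centre, m (arbitrary)
drho = 300.0             # density contrast, kg m^-3 (arbitrary)

x = np.linspace(-5000.0, 5000.0, 201)             # profile positions, m
mass = (4.0 / 3.0) * np.pi * R**3 * drho          # anomalous mass
gz = G * mass * z / (x**2 + z**2) ** 1.5          # vertical attraction, m s^-2

print(gz.max() * 1.0e5, "mGal peak anomaly over the centre of the sphere")
```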


Direction cosines A set of transformation equations for three-dimensional orientation data. In general, given a vector (a, b, c) in three dimensions, the direction cosines of this vector are given by:

cos α = a/√(a² + b² + c²), cos β = b/√(a² + b² + c²), cos γ = c/√(a² + b² + c²),

where α, β and γ are the angles which the vector (a, b, c) makes with the positive x-, y- and z-axes and cos²α + cos²β + cos²γ = 1. In the context of structural geology, if φ is the plunge of a line, δ is the dip of the normal to a plane, and θ is the corresponding azimuth in both cases, then the three direction cosines in the directions north, cn; east, ce; and down, cd are:

cn = cos(φ)·cos(θ), ce = cos(φ)·sin(θ), cd = sin(φ)

and

cn = sin(δ)·cos(θ), ce = sin(δ)·sin(θ), cd = cos(δ)

respectively. Direction cosines were used, in the context of strain analysis, by the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857) (Cauchy 1823) and later popularised by the work of the British physicist, William Thomson, Lord Kelvin (1824–1907) (Thomson 1856). The first analytical treatment in geology was given by the American mining and structural geologist, George Ferdinand Becker (1847–1919) (Becker 1893). Loudon (1964) provided an early computer program for analysis of structural orientation data. See also Watson (1965, 1966), Ramsay (1967), Cheeney (1983), Fisher et al. (1993); Woodcock diagram.
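A minimal illustrative sketch of the two sets of transformation equations above in Python; the plunge/trend and dip/azimuth values are invented and angles are assumed to be given in degrees.

```python
import numpy as np

def line_direction_cosines(plunge_deg, trend_deg):
    """Direction cosines (north, east, down) of a line of given plunge and trend."""
    p, t = np.radians(plunge_deg), np.radians(trend_deg)
    return np.cos(p) * np.cos(t), np.cos(p) * np.sin(t), np.sin(p)

def pole_direction_cosines(dip_deg, azimuth_deg):
    """Direction cosines (north, east, down) of the normal to a plane of given dip."""
    d, t = np.radians(dip_deg), np.radians(azimuth_deg)
    return np.sin(d) * np.cos(t), np.sin(d) * np.sin(t), np.cos(d)

cn, ce, cd = line_direction_cosines(30.0, 120.0)
print(cn, ce, cd, cn**2 + ce**2 + cd**2)   # the squared cosines sum to 1
```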


Directional statistics See: circular statistics, direction cosines, fluctuation, Fry diagram, gnomonic projection, Lambert equal-area projection, line rotation, nearest neighbour orientation diagram, Panozzo diagram, petrofabric, polar diagram, rose diagram, spherical statistics, stereographic projection, von Mises distribution.

Directivity graph, directivity plot 1. A directivity graph of: (a) relative amplitude of an outgoing seismic wave from a single charge or source pattern as a function of direction (Campbell 2005); or (b) the relative response of a geophone array as a function of apparent dip, apparent velocity, etc. as a function of direction; polar coordinates may be used where the directional data is angular (Sheriff 1984). 2. Directivity may also stand for the focusing of wave energy along a fault in the direction of rupture. Stereographic projections of earthquake ground motion data in relation to rupture direction are also known as directivity plots (Anonymous 2010b).

Dirichlet conditions The necessary and sufficient conditions for a Fourier series such that a real-valued periodic function f(x) is equal to the sum of its Fourier series at each point where the function is continuous. In any given interval, the function must have a finite number of maxima, minima, and discontinuities; it must be integrable, and it must be bounded, i.e. |f(x)| ≤ R, where R is a real number less than infinity, for all x (Sheriff 1984). Named for the German mathematician, Johann Peter Gustav Lejeune Dirichlet (1805–1859) who carried out the first comprehensive investigation of Fourier's series and stated these conditions (Dirichlet 1829). See also: Buttkus (1991, 2000) and Gubbins (2004).

Dirichlet distribution The Dirichlet distribution, named for the German mathematician, Johann Peter Gustav Lejeune Dirichlet (1805–1859), of order k is a (k − 1)-dimensional multivariate continuous probability density function, with parameters α = {α1, α2, …, αk}, all having values greater than zero. The probability density function is given by

f(x; α) = [1/B(α)]·x1^(α1 − 1)·x2^(α2 − 1) ⋯ xk^(αk − 1), where

B(α) = Γ(α1)·Γ(α2) ⋯ Γ(αk) / Γ(α1 + α2 + ⋯ + αk)

and Γ is the Gamma function. It is the multivariate analogue of the Beta distribution. Mentioned in an earth science context by Vistelius (1980, 1992), Aitchison (1984, 1986, 2003) and Strauss and Sadler (1989).

Dirichlet domain, Dirichlet tessellation A class of random polygons which describes growth about random centres, or the contraction-cracking of a surface. They are space-filling, convex polygons constructed around a set of points or centres, such that each polygon contains all of the points that are closer to its centre than to the centres of other polygons. The tessellation was first discovered by the German mathematician, Johann Peter Gustav Lejeune Dirichlet (1805–1859) (Dirichlet 1850), but was rediscovered by the Russian mathematician, Georgy Fedoseevich Voronoï (1868–1908), who studied the n-dimensional case (Voronoï 1908); the American meteorologist, Alfred Henry Thiessen (1872–1956), who applied them to finding the spatial average (Thiessen mean) of rainfall (Thiessen 1911); and others. Hence their alternative names, Voronoï polygons and Thiessen polygons. The concept was subsequently advocated for use in the mining industry (Harding 1920, 1923). Note that Evans and Jones (1987) comment that "the vast majority of naturally occurring polygons will not be approximated well by [such] polygons" as evidenced by the concave polygons formed by mud cracks, crystal interfaces, etc. See also: Beard (1959), Gilbert (1962), Lachenbruch (1962), Crain (1976), Boots and Jones (1983) and Evans and Jones (1987); Delaunay tessellation.

Dirichlet weighting function, Dirichlet window An alternative name for the Daniell window, the Dirichlet window (Rice 1964; Harris 1978) is named for the German mathematician, Johann Peter Gustav Lejeune Dirichlet (1805–1859). It is used in the operation of smoothing a periodogram with a lag window of weights applied to a discrete time waveform. N, the length of the window, is typically even and an integer power of 2; for each point within 0 ≤ n ≤ N − 1, the weight w(n) = 1, otherwise it is zero, contrasting with that of the smoothly changing weights in windows which are tapered. It is also known as the boxcar window (Alsop 1968), or the rectangular window (Harris 1978).

Discovery-process model Within a business cycle, the sizes of oil fields discovered change in a systematic way during the life of the play. The analysis of the discovery pattern as a function of the number of wildcat wells drilled provides a basis for forecasting future rates of discovery. The first study of this type, by Arps and Roberts (1958), postulated the discovery-process model:

Fa(w) = Fa(∞)·[1 − e^(−cAw/B)]

where Fa(w) is the cumulative number of discoveries estimated to be made in size-class A by the drilling of w wells; Fa(∞) is the ultimate number of fields in size-class A that occur in the basin; B is the area of the basin; A is the average areal extent of the fields in size-class A; w is the number of wildcat wells drilled; c is a constant representing the efficiency of the exploration process; and e is Euler's number, the constant 2.71828. For further discussion see Drew et al. (1980), Schuenemeyer and Drew (1983) and Drew (1990). See also: conceptual model, deterministic model, fluid-flow model, mathematical model, physical model, scale model, statistical model, stochastic process model.

Discrete The distinction between continuous and discrete (integer) numbers appears to have been made as early as the sixteenth century (Miller 2015a).
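A minimal illustrative sketch evaluating the Arps and Roberts discovery-process model given above in Python; the parameter values are invented purely for illustration.

```python
import numpy as np

def arps_roberts(w, f_ultimate, c, area_field, area_basin):
    """Cumulative number of discoveries in a size class after w wildcat wells."""
    return f_ultimate * (1.0 - np.exp(-c * area_field * w / area_basin))

w = np.array([100, 500, 1000, 5000])    # numbers of wildcat wells drilled
# illustrative values: 40 fields ultimately discoverable in this size class,
# average field area 25 km^2, basin area 50 000 km^2, exploration efficiency c = 2
print(arps_roberts(w, f_ultimate=40.0, c=2.0, area_field=25.0, area_basin=5.0e4))
```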


Discrete convolution theorem If {a} = a0, a1, a2, …, aN−1 is a sequence of N measurements made at regular intervals, Δt, its z-transform is a polynomial in the complex variable z: A(z) = a0 + a1z + a2z² + … + aN−1z^(N−1); and if {b} = b0, b1, b2, …, bM−1 is a series of M measurements also made at regular intervals Δt, its z-transform is B(z) = b0 + b1z + b2z² + … + bM−1z^(M−1); then the discrete convolution of {a} and {b} yields the series {c}, which has a length of N + M − 1, in which

cp = Σ(k = 0 to p) ak·bp−k.

Expressed in terms of the z-transform, this is C(z) = A(z)·B(z), i.e.

C(z) = [Σ(k = 0 to N − 1) ak·z^k]·[Σ(l = 0 to M − 1) bl·z^l]

(Gubbins 2004).

Discrete distribution A frequency distribution in which the variable (x) can take any integer value within a range a ≤ x ≤ b; a will often be zero. See, for example, the binomial, negative binomial, Poisson and the multivariate multinomial distributions. Ord (1972) gives a useful graphical test for distinguishing between the binomial, negative binomial, Poisson and other distributions. See also: Hastings and Peacock (1974) and Johnson et al. (2005).

Discrete Fourier Transform (DFT) The Fourier analysis of a time series of n equally-spaced observations {x0, x1, x2, …, xn−1} is its decomposition into a sum of sinusoidal components, the coefficients of which {J0, …, Jn−1} form the discrete Fourier transform of the series, where

Jj = (1/n)·Σ(t = 0 to n − 1) xt·e^(−iωjt);

i is the imaginary unit √(−1); e is Euler's number, the constant 2.71828; and ωj is the j-th Fourier frequency. In terms of magnitude A and phase φ, Jj = Aj·e^(iφj). The development of the theory goes back to work by the German mathematician, Carl Friedrich Gauss (1777–1855) (Gauss 1805), its rediscovery by the American physicist, Gordon Charles Danielson (1912–1983) and the Hungarian-born physicist, Cornelius Lanczos (b. Lánczos Kornél, 1893–1974), (Danielson and Lanczos 1942) in the early days of computers, and its popularisation following development of the Cooley-Tukey algorithm (1965). See also: Fast Fourier transform, periodogram, Lomb-Scargle Fourier transform; Heideman et al. (1984), Blackman and Tukey (1958), Cooley and Tukey (1965), Camina and Janacek (1984), Cooley (1990, 1992), Sorensen et al. (1995), Buttkus (1991, 2000) and Gubbins (2004).
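A minimal illustrative sketch in Python comparing the direct summation above with NumPy's fast Fourier transform; note that numpy.fft.fft returns the unnormalised sum with the e^(−iωjt) convention assumed above, so the 1/n factor is applied explicitly, and ωj is taken as 2πj/n.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
x = rng.standard_normal(n)               # an arbitrary equally-spaced series

# direct evaluation of Jj = (1/n) * sum_t x_t * exp(-i * 2*pi*j*t/n)
t = np.arange(n)
J_direct = np.array([np.sum(x * np.exp(-2j * np.pi * j * t / n)) / n
                     for j in range(n)])

J_fft = np.fft.fft(x) / n                # the FFT gives the same coefficients

print(np.allclose(J_direct, J_fft))      # True
print(np.abs(J_fft[:5]))                 # amplitudes of the lowest frequencies
```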


Discrete prolate spheroidal sequence (DPSS) Discrete prolate spheroidal sequences, also known as Slepian sequences after the American mathematician, David Slepian (1923–2007) who developed their application (Slepian and Pollak 1961, Slepian 1978), are defined in terms of their length, N, and the frequency interval (−W, W) in which they are maximally concentrated. The DPSS of k-th order for a given N and W is defined as the real solution to a system of equations for each k = 0, 1, 2, …, N − 1, with a specific normalization to ensure uniqueness. The system has N distinct eigenvalues and eigenvectors. The eigenvalues are related to the amount of concentration that is achieved. The window length, N, as well as the bandwidth of concentration, 0 < W < 0.5, parameterize the family of discrete prolate spheroidal windows. The main lobe width in the spectral window is directly related to the bandwidth of the concentration parameter. The sidelobe level is a function of both the window length and the bandwidth of the concentration parameter. See also Percival and Walden (1993). Mentioned in an earth science context in Weedon (2003).

Discrete series, discrete time series An assignment of a numerical value X(t) to each time t of a discrete time range (Camina and Janacek 1984).

Discrete-signal record A time series obtained when the thickness of successive layers (laminae, beds, cycles or growth-bands) forms the measured variable and the layer or cycle number is used as a proxy for the time or depth/thickness scale (Weedon 2003).

Discretisation, discretization The accurate numerical representation of a continuous function by an object consisting of a finite number of discrete elements, so as to render it more suitable for obtaining a computational solution. Commonly applied methods are pointwise discretisation or expansion in orthogonal functions, such as Legendre polynomials or spherical harmonics (Gubbins 2004).

Discriminant analysis In the pattern recognition literature, these applications are known as pattern classification—the assignment of an object of "unknown" affinity to one of a pre-defined number of groups on the basis of its p-dimensional composition. Whichever classification method is used, the approach is the same: (i) a data set consisting of n individuals, together representative of the k-classes of interest, is chosen (the training set); (ii) a suitable classification algorithm is chosen; (iii) feature selection is undertaken: the optimum subset of p* features to distinguish between the classes, as measured by the misclassification rate, is determined by experimental trials; (iv) the efficacy of the final classifier is determined by the misclassification rate found using an independent test set; if there is not enough data to do this, then the rate can be estimated by repeating the classification n times, omitting each training sample in turn, using the remaining (n − 1) samples as a temporary test set and the omitted sample as an independent test sample; and (v) the classification rules are now applied to the p* features to classify the candidates of unknown affinity. As a rule of thumb, the training set size should preferably consist of at least 3p* individuals per class. The classic approach to discriminant analysis is the method of canonical variates analysis (Campbell and Atchley 1981), introduced by the English statistician, (Sir) Ronald Aylmer Fisher (1890–1962) (Fisher 1936), in which linear or quadratic separating hypersurfaces are determined so as to best separate the p*-dimensional multivariate normal ellipsoids representing each group; robust estimation of the covariance matrices is desirable (Campbell 1980, 1982; Campbell and Reyment 1980; Chork and Rousseeuw 1992). Alternative nonparametric methods include: empirical estimation of the density function for each class (Specht 1967; Howarth 1971a, 1973); classification and regression trees; and neural networks. A discriminant function was first used in geology by the Russian mathematical geologist, Andreĭ Borisovich Vistelius (1915–1995) (Vistelius 1950). See also: Mahalanobis' generalized distance.

Disjunctive kriging Kriging is a term coined by the French mining engineer and mathematician, Georges Matheron (1930–2000), for a method which provides optimal estimation of ore grades at a point, or the mean grade within a block contained in an ore body (Matheron 1960). Named for the South African mining engineer, Daniel Gerhardus Krige (1919–2013) who was the first to make use of spatial correlation to overcome the observed disagreements between ore grades estimated from both sampling and stope samples in South African gold mines (Krige 1951). "Ordinary" kriging is essentially an optimum method for spatial interpolation which produces the best unbiased estimate of the mean value at a point with minimum estimation variance, and the best weighted moving average for a block. In the case of a point estimate Z*(x0) at a specified position surrounded by n data points with values Z(xi), Z*(x0) = ΣwiZ(xi), where wi are the weights, Σwi = 1. It is assumed that there is no underlying regional trend and the values of Z(x) should either conform to a normal distribution or should have been transformed so that the transformed values meet this requirement. The weights wi are assigned depending on both the distance and direction of xi from x0, taking into consideration the additional requirements that: nearer points should carry more weight than distant ones; points screened by a nearer point should carry less weight; and spatially clustered points should carry less weight compared to an isolated point at the same distance away. The weights are obtained using a set of variogram models g(d) fitted along directions aligned with the principal octants of the geographical coordinate system. This is generally sufficient to define the principal axes of the ellipsoids of equal weight with x0 as the centre. In many applications x0 will be the set of grid nodes at which values are to be interpolated prior to contour threading (see contouring).
Matheron formalised and generalised Krige's procedure (Matheron 1960, 1962–1963, 1965), defining kriging as the probabilistic process of obtaining the best linear unbiased estimator of an unknown variable, in the sense of minimizing the variance of the resulting estimation error (estimation variance). He subsequently (Matheron 1973, 1976) developed procedures to obtain unbiased nonlinear estimators (e.g. disjunctive kriging and kriging of transformed variables). Disjunctive kriging (Rivoirard 1994) is based on an exact transform of the cumulative distribution function of Z(x) to the equivalent quantiles of the standard normal distribution. See also: Bivand et al. (2008, 2013); indicator kriging, universal kriging, conditional simulation.
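The following is only a schematic sketch of "ordinary" point kriging as summarised above; it is not Matheron's or Krige's own formulation, and the data points, the exponential variogram model and its parameters are invented. The weights wi are obtained by solving the usual ordinary-kriging system built from the variogram, with a Lagrange multiplier enforcing Σwi = 1.

```python
import numpy as np

def gamma_exp(h, sill=1.0, rng_=500.0, nugget=0.0):
    """Exponential variogram model g(d) with illustrative parameter values."""
    return nugget + sill * (1.0 - np.exp(-h / rng_))

# observed points (x, y) and values Z(x_i); purely illustrative numbers
pts = np.array([[100.0, 200.0], [350.0, 120.0], [250.0, 400.0], [480.0, 380.0]])
z = np.array([2.1, 1.4, 3.0, 2.6])
x0 = np.array([300.0, 300.0])          # location to be estimated

n = len(z)
# ordinary-kriging system: [G 1; 1' 0] [w; mu] = [g0; 1]
G = np.zeros((n + 1, n + 1))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
G[:n, :n] = gamma_exp(d)
G[:n, n] = 1.0
G[n, :n] = 1.0
rhs = np.append(gamma_exp(np.linalg.norm(pts - x0, axis=1)), 1.0)

sol = np.linalg.solve(G, rhs)
w = sol[:n]                            # kriging weights, summing to 1
z_star = w @ z                         # point estimate Z*(x0)
print(w, w.sum(), z_star)
```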


Dispersion analysis A source signal of a given shape passing through a dispersive medium will lead to progressive distortion of the signal during its wave propagation. Inversion of these velocities can lead to an estimate of the physical properties of the medium through which it has passed (Buttkus 2000). Examples of its application are Bolt and Niazi (1964) and Chávez-Garcia et al. (1995).

Dispersion curve In seismology, a dispersion curve is a graph of seismic wave phase velocity (km/s or m/s) as a function of either wave period (sec), the reciprocal of frequency (Bullen 1947, Press et al. 1961), or frequency (Hz) (Jin and Colby 1991), which may then be inverted to obtain a velocity/depth profile.

Dispersion matrix If a folded surface is considered in three dimensions with n measurements of the orientation of normals to the bedding measured with reference to fixed axes, such as south (S), east (E) and vertical up (V), and if pi, qi and ri are the direction cosines of the i-th normal referred to S, E and V, then the 3 × 3 dispersion matrix (A) is given by:

A = [ Σpi²/n       Σ(pi·qi)/n   Σ(pi·ri)/n
      Σ(qi·pi)/n   Σqi²/n       Σ(qi·ri)/n
      Σ(ri·pi)/n   Σ(ri·qi)/n   Σri²/n    ]

where all the summations (Σ) are from 1 to n (Loudon 1964, Whitten 1968).
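A minimal illustrative sketch computing the 3 × 3 dispersion matrix above in Python from a small set of invented direction cosines (pi, qi, ri) of bedding-plane normals.

```python
import numpy as np

# direction cosines (p, q, r) of n bedding-plane normals referred to the
# south, east and vertical-up axes; invented illustrative values
C = np.array([[0.20, 0.10, 0.97],
              [0.25, 0.05, 0.97],
              [0.15, 0.20, 0.97],
              [0.30, 0.12, 0.95]])
C = C / np.linalg.norm(C, axis=1, keepdims=True)   # renormalise to unit vectors

n = C.shape[0]
A = C.T @ C / n          # A[j, k] = sum_i c_ij * c_ik / n, the dispersion matrix
print(A)
print(np.trace(A))       # the diagonal sums to 1 for unit vectors
```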


Displacement, displacement vector, displacement vector field, displacement vector gradient If a point in a body is displaced from an initial position at (x1, y1) to a final position at (x2, y2) in two-dimensional Cartesian coordinates, its displacement is the straight line displacement vector joining (x1, y1) and (x2, y2). In three dimensions, the displacement would be from (x1, y1, z1) to (x2, y2, z2). A displacement vector field is a set of displacement vectors relating to a set of initial points located on a grid which together define the type of displacement to which the body has been subjected, e.g. in body translation, the set of vectors will all be parallel and of equal magnitude throughout the body; in body rotation, they will vary in orientation and magnitude in a systematic way depending on their initial position, increasing in length as they get further away from the point about which the rotation takes place. A displacement vector gradient in two dimensions is a 2 × 2 matrix expressing the spatial rate of change of the displacement vector field with respect to the Cartesian x- and y-coordinates. In three dimensions, it is a 3 × 3 matrix with respect to the x-, y- and z-coordinates. If all terms in the matrix are the same throughout the body, a state of homogeneous strain exists; if they vary from place to place, the strain is heterogeneous. Treatment of such data using vector algebra followed the work of the English mathematician and geophysicist, Augustus Edward Hough Love (1863–1940), Love (1906). See also: Nádai (1927, 1931), Ramsay (1967), Hobbs et al. (1976) and Ramsay and Huber (1983).

Distance coefficient These are measures of the similarity of one sample to another in terms of their k-dimensional composition. In the case of quantitative data, a distance coefficient can be used. The most usual measure is the Euclidean (Pythagorean) distance, the length of the line joining the two points representing the sample compositions in k-dimensional space:

dE = √[ Σ(i = 1 to k) (xi1 − xi2)² / k ]

or the Manhattan or city-block distance

dM = Σ(j = 1 to k) |xj1 − xj2| / k.

Raup and Crick (1979) and A. Smith (1994) discuss their use in palaeontology.
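A minimal illustrative sketch of the Euclidean and Manhattan distance coefficients above for two invented k-dimensional compositions.

```python
import numpy as np

x1 = np.array([10.0, 25.0, 40.0, 25.0])   # composition of sample 1 (invented)
x2 = np.array([12.0, 20.0, 45.0, 23.0])   # composition of sample 2 (invented)
k = len(x1)

d_euclid = np.sqrt(np.sum((x1 - x2) ** 2) / k)    # Euclidean (Pythagorean) distance
d_manhattan = np.sum(np.abs(x1 - x2)) / k         # Manhattan (city-block) distance
print(d_euclid, d_manhattan)
```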


Distance function map A map based on the idea of the "distance" of one composition (point) from another, taken as a reference composition and which is not an end-member in a ternary diagram or tetrahedron. The distance forms concentric circles (or spheres) about the reference composition in 2- or 3-dimensions depending on whether three or four end-members are used. The idea was suggested by the American mathematical geologist, William Christian Krumbein (1902–1979) (Krumbein 1953a, b; Forgotson 1960) for facies mapping. This predates modern work on the nature of compositional data and distance would be better computed on the basis of the logratio transform than the method Krumbein adopted.

Distribution-free statistics See nonparametric statistics.

Distribution function The distribution function of a variable x is the total frequency of members with values ≤ x. As a general rule, the total frequency is taken as unity, in which case the distribution function corresponds to the proportion of members with values ≤ x. The term came into widespread use in the 1930s (e.g. Hotelling 1930) and has been occasionally used in the earth science literature (Chung 1989a, b). See also: frequency distribution, probability distribution.

Divergence operator (div) [notation] This is a scalar operator (Macelwane 1932; Sheriff 1984) such that for any vector function F(x, y, z) with components directed along the orthogonal x-, y- and z-axes, it is the sum of the scalar products of the unit vectors i, j and k and the partial derivatives of F in each of the three component directions:

div(F) = i·∂F/∂x + j·∂F/∂y + k·∂F/∂z.

It often describes the excess flux (e.g. fluid or heat flow) leaving a unit volume in space. The term was introduced by the British mathematician, William Kingdon Clifford (1845–1879) (Clifford 1878–1887). Treatment of displacement data using vector algebra followed the work of the English mathematician and geophysicist, Augustus Edward Hough Love (1863–1940) (Love 1906).

Divergence theorem This states that the flux through a surface (or the integral of the vector flux density over a closed surface) is equal to the divergence of the flux density integrated over the volume contained by the surface (Sheriff 1984). This result appears to have been independently discovered by a number of scientists in the early nineteenth century, but is generally attributed as Gauss's theorem, named for the German mathematician and physicist, Carl Friedrich Gauss (1777–1855), Gauss (1813), and as Green's theorem to the English mathematician and physicist, George Green (1793–1841), (Green 1828). The term divergence theorem was used by Heaviside (1892a) but may well have come into being before that. Mentioned in an earth science context by Camina and Janacek (1984) and Gubbins (2004), but see also the discussion in Macelwane (1932).

Diversity, diversity indices Many different indices have been developed to characterise biological diversity (Magurran 2004), some of which have been adopted from information theory (e.g. entropy; Shannon 1948; Shannon and Weaver 1949) and have been applied to paleoecology (Kaesler and Mulvany 1976).


Divided bar chart A graph in which either the absolute frequency or relative frequency of occurrence of a category is shown by the proportional-length of a vertical bar for each category in a data set. Since they are categorical variables, ideally, the side-by-side bars should be drawn with a gap between them. Not to be confused with a histogram, which shows the binned frequency distribution for a continuous- or discrete-valued variable. The earliest bar chart (based on absolute amount) was published by the English econometrician, William Playfair (1759–1823) (Playfair and Corry 1786). An early earth science use was by Federov (1902) to show relative mineral birefringences. In a divided bar chart, each bar is divided vertically into a number of proportional-width zones to illustrate the relative proportions of various components in a given physical sample; total bar-length may be constant (e.g. 100% composition) or vary, depending on the type of graph. These were first used by the German scientist, Alexander von Humboldt (1769–1859) (Humboldt 1811). In geology, divided bars were first used by the Norwegian geologist, metallurgist and experimental petrologist, Johan Herman Lie Vogt (1858–1932) (Vogt 1903–1904). The Collins (1923) diagram uses double divided bars to show the cationic and anionic compositions of a water sample separately; each set is recalculated to sum to 100% and plotted in the left- and right-hand bars respectively. Usage in geology increased following publication of Krumbein and Pettijohn's Manual of sedimentary petrography (1938).

Domain 1. The set of all allowable values which the independent variable x may take in the function f(x); hence it is the set of numbers on which a mathematical mapping is, or may be, carried out. This meaning of the term dates from pioneering work by the German mathematician, Georg Ferdinand Ludwig Philipp Cantor (1845–1918) (Cantor 1895, 1897, 1915). 2. All possible values of an attribute or data element in a database (IBM undated).

Dot-density plot A graphic display in which the x-y point-density of a cloud of randomly-distributed equal-area dots is proportional to the numerical value (z) pertaining to the data point. Suggested by Panchanathan (1987) as an alternative to contour plots.

dot NET (.NET) Framework A software environment, developed by Microsoft for its Windows operating system environment and first released in 2002 (now at v. 4.6, which comes installed in Windows 10). It provides a programming environment for developing a wide variety of applications. Used in conjunction with the .NET and Visual Studio development environments.

Dot product [notation] The dot product (also known as the inner product) of vectors x = {x1, x2, x3, ∙∙∙, xn} and y = {y1, y2, y3, ∙∙∙, yn} is x ∙ y = x1y1 + x2y2 + x3y3 + ∙∙∙ + xnyn (Sheriff 1984; Camina and Janacek 1984). It first appears in an account (Wilson 1901) of work by the American mathematical physicist, Josiah Willard Gibbs (1839–1903) and occurs in geophysical papers from the 1950s (e.g. Hall 1956) onwards. The unhyphenated spelling dot product rather than dot-product is the most widely used (Google Research 2012).

Double-exponential distribution Also known as the Laplace distribution, named for the French mathematician, Pierre-Simon, Marquis de Laplace (1749–1827), who described it in Laplace (1812). It is the distribution of differences between two independent variables with identical exponential distributions. Its probability density function is:

f(x; μ, s) = [1/(2s)]·exp[−(μ − x)/s] if x < μ, and
f(x; μ, s) = [1/(2s)]·exp[−(x − μ)/s] if x ≥ μ,

where μ is the location parameter and s is the scale parameter. For discussion in an earth science context, see: Vistelius (1980), Walden and Hosken (1986), and Walker and Jackson (2000).

Double integral, double integration Given a surface defined by z = f(x, y), it is the volume between the x-y plane (at which z = 0) and the surface, minus the volume (if any) between the plane and anywhere in which f(x, y) takes negative values, for a region, R, given by the limits x1 ≤ x ≤ x2 and y1 ≤ y ≤ y2. It is denoted ∬R f(x, y) dA (Camina and Janacek 1984). According to Todhunter (1861), the use of double integrals first appeared in a work by Gauss (1830).

Double precision A binary floating-point computer numbering format in which twice as many digits (8 bytes, 64 bits) were used to specify a numeric quantity as was usually the case (4 bytes, 32 bits). It provided a relative precision of about 16 digits, and magnitude range of about 10^−308 to 10^+308. Originally introduced in the FORTRAN programming language to enable greater accuracy to be obtained in complex scientific calculations, its use became widespread following the release of FORTRAN IV in 1962 (McCracken 1963). An early application in geology was the trend-surface fitting program of Whitten (1963); see also Thong and Liu (1977) and McCarn and Carr (1992).

Downward continuation The mathematical projection of a potential field from one datum surface to another level surface below the original datum (Peters 1949; Trejo 1954; Dean 1958; Henderson 1960).

Drift 1. A gradual (and often irregular) change in an instrumental reference value with time, e.g. changes in a measurement reading at a base station which is remeasured at regular intervals; element concentrations in a reference material determined each time an analytical (chemical) instrument is set-up or recalibrated (Nettleton 1940; Youden 1954). 2. In geostatistics the term "drift" is preferred to trend to indicate the presence of non-stationarity in the expectation m(x) of the spatial variable (x) studied: linear drift implies:


m(x) = β0 + β1x; and quadratic drift: m(x) = β0 + β1x + β2x² (Journel and Huijbregts 1978; Bivand et al. 2013).

Dual, duality The principle of duality in mathematics and physics gives two different points of view looking at the same object (Atiyah 2007 [2014]), such as that between the time domain and frequency domain in time series analysis (Camina and Janacek 1984).

Duhamel's theorem Convolution is the integral from i = 0 to t of the product of two functions, ∫₀ᵗ f1(i)·f2(t − i) di. For two equal-interval discrete time series a = {a0, a1, a2, …, an} and b = {b0, b1, b2, …, bn}, the convolution, usually written as a∗b or a ⨂ b, is c = {c0, c1, c2, …, cn}, where ct = Σ(i = 0 to t) ai·bt−i. The operation can be imagined as sliding a past b one step at a time and multiplying and summing adjacent entries. This type of integral was originally used by the French mathematician, Pierre Simon, Marquis de Laplace (1749–1827), (Laplace 1781). The Hungarian-born American mathematician, Aurel Friedrich Wintner (1903–1958) may have been the first to use the English term convolution (Wintner 1934), although its German equivalent Faltung (folding, referring to the way in which the coefficients may be derived from cross-multiplication of the a and b terms and summation of their products along diagonals if they are written along the margins of a square table) appeared in Wiener (1933). The operation has also been referred to as the Boltzmann-Hopkinson theorem, Borel's theorem, Green's theorem, Faltungsintegral, and the superposition theorem and a similar result may also be achieved in terms of z-transforms or Fourier transforms. It can also be applied in more than two dimensions (see: helix transform). See also: Tukey and Hamming (1949), Blackman and Tukey (1958), and in an earth science context: Robinson (1967b), Jones (1977), Vistelius (1980, 1992), Camina and Janacek (1984), Buttkus (1991, 2000) and Gubbins (2004); deconvolution.

Dummy variable A way of entering a categorical variable (e.g. month of the year) as a predictor in multiple regression. Each state is coded 1 if true, 0 otherwise. So, in this example, there would be twelve dummy variables, one for each month; January would be coded {1, 0, 0, 0, ∙∙∙}, February {0, 1, 0, 0, …} etc. It may also apply to simple binary coding of presence/absence data, e.g. indicating whether a position falls on a particular geological formation or not, 1 = yes, 0 = no. It seems to have first come into use during the 1950s (Suits 1957). See also: Koch and Link (1970–1971), Dorsett and Webster (1983).
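A minimal illustrative sketch of the discrete convolution defined in the Duhamel's theorem entry above, comparing the explicit sliding-sum with numpy.convolve; the two short series are invented.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])          # arbitrary short series
b = np.array([0.5, -1.0, 0.25, 2.0])

# explicit form: c_t = sum_i a_i * b_(t-i), giving len(a) + len(b) - 1 terms
c = np.zeros(len(a) + len(b) - 1)
for t in range(len(c)):
    for i in range(len(a)):
        if 0 <= t - i < len(b):
            c[t] += a[i] * b[t - i]

print(np.allclose(c, np.convolve(a, b)))   # True
```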



Duplicate samples, duplicates In any project it is always advisable to take duplicate samples ("duplicates") at a given site, to analyse a proportion of duplicate splits (subsamples) of a prepared specimen, etc. to ensure that one has adequate means of assessing variability attributable to measurement, subsampling, sampling and other sources of variation. The taking of duplicate samples is mentioned in Winchell and Winchell (1891) and Bain (1904) and duplicate analysis as a means of checking analytical (chemical) error is mentioned in Washington (1919). Although they do not appear in Page's Handbook of geological terms (1859), the practice of taking duplicate samples as a check was certainly current in other fields by that time (Anonymous 1835). See: nested sampling, analysis of variance, Thompson-Howarth plot.

Durov diagram The diagram introduced by the Russian geochemist, Svyatoslav Alekseevich Durov, in 1948 plots the major ions as percentages of milli-equivalents in two base triangles. The total cations and the total anions are made equal to 100% and the data points in the two triangles are projected onto a square grid that lies perpendicular to the third axis in each triangle. The main purpose of the Durov diagram is to show clustering of samples of similar composition. Expanded versions have been developed by Burdon and Mazloum (1958) and Lloyd (1965); see also Al-Bassam and Khalil (2012).

Dyadic A second-order tensor; a dyad, D, is formed from two (usually complex) vectors, a and b: D = (aᵀ)b. The terms dyad and dyadic were introduced in lectures on vector analysis given by the American mathematician and physicist, Joseph Willard Gibbs (1839–1903) (Gibbs 1881–1884), and first appeared in an account of work by his last student, Edwin Bidwell Wilson (1879–1964) (Wilson 1901). An early use in geophysics, in the context of Earth stress analysis, is by Macelwane (1932).

Dynamic deconvolution A method for directly computing the reflection coefficients from a seismogram (Ferber 1984, Buttkus 1991, 2000). See also deconvolution.

Dynamic programming Developed by the American applied mathematician, Richard Ernest Bellman (1920–1984) while at the RAND Corporation in 1951. He originally conceived it as a general method of solving stochastic decision processes but it was subsequently realised that it could be applied to engineering control theory. In principle, it is a multi-stage optimisation method for problem-solving by backward induction, which is achieved by breaking the overall task into a number of sub-problems, each of which can be solved individually and the answer(s) then used to enable solution of the next one up the hierarchy, and so on. It provides a framework in which many algorithms, contributing to the overall solution, may be developed. Given a problem to be solved, the steps to obtaining a solution are: (i) find a naïve exponential-time recursive algorithmic solution; (ii) speed up the algorithm by storing solutions to sub-problems, so that they can be looked up when needed, rather than being re-computed; and (iii) speed it up further by solving the subproblems in a more efficient order (Bellman 1954, 1957, 1984). In the earth sciences many early applications were in the fields of water resource problems (reviewed in Yakowitz 1982) or oil-field production (e.g. Johnson et al. 1979) but later focussed on sequence-matching (e.g. Hawkins and ten Krooden 1979, Hawkins 1984, Clark 1985). See also slotting.

Dynamic range The ratio (r) of maximum possible to minimum possible recorded signal, usually expressed in decibels as 20log10(r) dB (Gubbins 2004).

Dynamic shape factor This is a particle shape factor which takes into account the dynamic properties of non-spherical particles and was originally taken to be the ratio of the actual settling velocity of a particle to that of a true sphere, the "form coefficient" of Krumbein (1942). However, it was described as a "dynamic shape factor" (DSF) by Krumbein (1943). Briggs et al. (1962) took the DSF to be the squared ratio of fall velocity (cm/sec) of the particle to the fall velocity of its nominal sphere (i.e. a sphere of the same material whose volume is equal to that of the actual particle). His definition still seems to be used in sedimentology (e.g. Tomkins et al. 2005). However, in modern work with aerosols, the DSF now seems to be taken (Scheuch and Heyder 1990) as the ratio of the resistance force on the non-spherical particle to the resistance force on its volume-equivalent sphere when both move at the same relative velocity, following Fuchs (1964).

Dynamic system, dynamical system This is a system in which its behaviour is described by differential equations. Its long-term behaviour is determined by analytic or numerical integration of these equations. The natural dissipation of the system, combined with its underlying driving force, tends to kill off initial transients and it settles into its typical behaviour. The term was introduced by the American mathematical physicist, George David Birkhoff (1884–1944) (Birkhoff 1920, 1927), having studied the behaviour of such systems extensively since 1912, building on work by the French mathematical physicist and mathematician, Jules Henri Poincaré (1854–1912) in celestial mechanics (Poincaré 1890, 1892–1899). The terms dynamic system and dynamical system appear in the literature with equal frequency (Google Research 2012). See also: Yuen (1992), Turcotte (1992), Aubin (1998); nonlinear dynamical system, fractal, chaos.
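As an illustration of the dynamic-programming idea above (this is not any of the cited geological applications), the following sketch computes an edit distance between two invented lithology-symbol sequences by solving and storing the sub-problems in a table.

```python
def edit_distance(s1, s2):
    """Minimum number of insertions, deletions and substitutions turning s1 into s2."""
    n, m = len(s1), len(s2)
    # d[i][j] = cost of matching the first i symbols of s1 with the first j of s2
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
    return d[n][m]

# two invented lithology sequences (s = sand, m = mud, c = carbonate)
print(edit_distance("ssmmccms", "smmcccms"))
```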

E

Easting A coordinate value read along the east direction in a geographical grid-reference system, yielding a distance to the east from the north-south gridline which passes through the origin.

Eccentricity (e) A measure of the shape of a conic section: it is the ratio of the distance of a point on the curve from a fixed point, the focus, to its distance from a fixed line, the directrix. In the case of an ellipse (x²/a² + y²/b² = 1), where x and y are the points on the ellipse, and a and b are constants, there are two foci, each located towards the ends of the major axis, equi-distant from the centre, positioned such that the total distance from any point on the ellipse to the foci is constant. In this case, the eccentricity is the ratio of the distance between the foci to the length of the major axis (e = [√(a² − b²)]/a) and 0 ≤ e < 1. If it is zero, then the ellipse becomes a circle. In the case of a parabola (y² = 4ax), e = 1; for a hyperbola (x²/a² − y²/b² = 1), e = [√(a² + b²)]/a, and it is greater than 1. The term ultimately derives from the work of the Greek–Egyptian mathematician, astronomer and geographer, Claudios Ptolemaios (Ptolemy, ?100–?165).

Echelon matrix An m row n column matrix which has undergone Gaussian elimination has a particular structure as a result. It is called a (row) echelon matrix if: the first non-zero element in each non-zero row (i.e. a row with at least one non-zero element) is 1; the leading 1 in any non-zero row occurs to the right of the leading 1 in any preceding row; and the non-zero rows appear before the zero valued rows. All zero rows (if any) occur at the bottom of the matrix. For example:

[ 1  N  N  N  N ]
[ 0  0  1  N  N ]
[ 0  0  0  1  N ]

where the Ns are non-zero elements. It is called a reduced echelon matrix if the leading 1 in any non-zero row is the only non-zero element in the column in which that 1 occurs.

[ 1  0  N  0  N ]
[ 0  1  N  0  N ]
[ 0  0  0  1  N ]

Early use of the term echelon matrix occurs in Thrall and Tornheim (1957); see also Camina and Janacek (1984).
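A minimal illustrative sketch, assuming the SymPy library is available, which reduces an arbitrary small matrix to reduced (row) echelon form by Gaussian elimination.

```python
from sympy import Matrix

# an arbitrary 3 x 5 matrix of small integers (the third row is the sum of the first two)
M = Matrix([[2, 4, 1, 3, 0],
            [1, 2, 0, 1, 1],
            [3, 6, 1, 4, 1]])

rref_matrix, pivot_columns = M.rref()   # reduced (row) echelon form
print(rref_matrix)
print(pivot_columns)                    # the columns containing the leading 1s
```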


Edge effects Distortion in the fitting of a polynomial trend-surface to spatially-distributed data as a result of the leverage exerted by data points located close to the edges of the study area (Davis and Sampson 1973; Unwin and Wrigley 1987).

Edgeworth series, Edgeworth expansion A four-parameter distribution fitted (by nonlinear least squares optimisation) to a cumulative grain-size curve, using the observed mean, standard deviation, skewness and kurtosis as initial parameter estimates (Jones 1970; Dapples 1975). Named for the Irish economist and mathematician, Francis Ysidro Edgeworth (1845–1926) (Edgeworth 1905).

Effective record length The actual length of a record available after reduction to allow for end effects. The term was introduced by the American communications engineer, Ralph Beebe Blackman (1904–1990) and statistician, John Wilder Tukey (1915–2000) (Blackman and Tukey 1958). See Bardsley (1988).


Eigenanalysis Also known as the singular value decomposition: A = UΣVᵀ, where A is a real n × n matrix; Σ is the diagonal matrix diag(σ1, σ2, …, σn), whose nonnegative diagonal elements σ1, …, σn are in descending order of magnitude, σ1 > σ2 > ∙∙∙ > σn; U = (u1 u2 ∙∙∙ un); V = (v1 v2 ∙∙∙ vn); and T indicates the transpose, so that the matrix U is horizontal and V is vertical. The method was originally developed by the Italian mathematician, Eugenio Beltrami (1835–1900) (Beltrami 1873) and was almost simultaneously independently discovered (Stewart 1993) by the French mathematician, (Marie Ennemond) Camille Jordan (1838–1921). This matrix decomposition technique breaks any n × m matrix into a product of three matrices: an n × p left singular vector matrix, a p × p singular value matrix and an m × p transposed right singular vector matrix, where p is the rank of the original matrix. If the matrix corresponds to a centred data set, the obtained right-side singular vectors correspond to the principal components of the covariance matrix, while the squares of the singular values are equivalent to the eigenvalues of the principal components. The same can be said from a singular value decomposition derived from the standardized data matrix and a principal components analysis on the correlation matrix. In any case, the left-hand singular vector matrix gives the scores of each sample in each component. It is classically used as a reduction-of-dimensionality technique, for purely descriptive purposes. The term, as it is used today, was introduced by Scottish mathematician, Frank Smithies (1912–2002) (Smithies 1938) and the most widely-used algorithm for performing the decomposition was developed by the American mathematician, Gene Howard Golub (1932–) (Golub and Kahan 1965; Golub and Reinsch 1970). See Davis and Sampson (1973), Harvey (1981), Freire and Ulrych (1988), Reyment (1991) and Mari et al. (1999) for examples of earth science usage.

Eigenfunction One of a set of functions which satisfies both a differential equation and a set of boundary conditions; eigenfunctions which correspond to different eigenvalues are uncorrelated (independent) (Sheriff 1984). In systems theory, the eigenfunction of a system is the signal f(t) which, when input into the system, produces a response y(t) = λf(t), where the complex constant λ is the corresponding eigenvalue. The term was introduced by the German mathematician, David Hilbert (1862–1943) (Hilbert 1904) and first appeared in the English-language literature in Dirac (1926), but did not become frequent in geological and geophysical literature until the 1970s (e.g. Sidhu 1971; Winant et al. 1975). See also: eigenvalue, eigenvector.

Eigenstate The condition of a system represented by one eigenfunction (Sheriff 1984); one of the solutions of an eigenvalue equation: any equation which has a solution, subject to specified boundary conditions, only when a parameter occurring in it has certain values. Specifically, the equation Av = λv, which can have a solution only when the parameter λ has certain values, where A can be a square matrix which multiplies the vector v, or a linear differential or integral operator which operates on the function v, or in general, any linear operator operating on the vector v in a finite or infinite dimensional vector space. An eigenstate is also the measured state of some object possessing quantifiable characteristics such as position, momentum, etc. The state being measured and described must be observable (i.e. something such as position or momentum that can be experimentally measured, either directly or indirectly), and must have a definite value, called an eigenvalue. The term was introduced by the British theoretical physicist, Paul Adrien Maurice Dirac (1902–1984) (Dirac 1930), but does not seem to appear in the geophysical literature until the 1980s onwards (e.g. Eggers 1982; Pai 1990).


Eigenvalue The eigenvalue of a square matrix (X) is a value (λ) such that |X − λI| = 0, where I is the identity matrix (i.e. one whose diagonal elements are unity and whose off-diagonal elements are zero); in general, for a p × p matrix, there will be p eigenvalues. The corresponding column vectors (v) for which Xv = λv are called the eigenvectors. The method of solution ensures that the elements of λ are found in order of decreasing magnitude. In geometrical terms, the eigenvectors may be visualised as the vectors defining the p major axes of an ellipsoid formed by the p-dimensional data set X, and the eigenvalues as their respective lengths. Also known as the characteristic root or latent root. In actuality, this term has no connection with the elusive Rudolf Gottlieb Viktor Eigen (1833–1876), "father of mathematical geology" (Doveton and Davis 1993). According to John Aldrich in Miller (2015a), its origins go back to celestial mechanics in the early nineteenth century, but the "eigen" terminology (its English translation means proper or characteristic) first appeared in the work of the German mathematician, David Hilbert (1862–1943) (Eigenwert; Hilbert 1904, eigenwert and eigenvektor in Courant and Hilbert 1924); the term eigenvalue first appeared in English mathematical usage in Eddington (1927). For earth science usage, see: Kelker and Langenberg (1976), Le Maitre (1982), Camina and Janacek (1984) and Gubbins (2004).

Eigenvalue problem The so-called eigenvalue problem is to find the solution of the equation AV = VΦ, where the p × p square matrix A is real and symmetric (however, it may be singular and have zero eigenvalues), V is a p × p square matrix of eigenvectors, and Φ is a diagonal matrix of eigenvalues, λ1, …, λp. The term appears in English in Koenig (1933) and in an earth science context in Backus and Gilbert (1961), Knopoff (1961) and Buttkus (1991, 2000).

Eigenvector The eigenvalue of a square matrix (X) is a value (λ) such that |X − λI| = 0, where I is the identity matrix (i.e. one whose diagonal elements are unity and whose off-diagonal elements are zero); in general, for a p × p matrix, there will be p eigenvalues. The corresponding column vectors (v) for which Xv = λv are called the eigenvectors. The method of solution ensures that the elements of λ are found in order of decreasing magnitude. In geometrical terms, the eigenvectors may be visualised as the vectors defining the p major axes of the ellipsoid formed by the p-dimensional data set X, and the eigenvalues as their respective lengths. In actuality, this term has no connection with the elusive Rudolf Gottlieb Viktor Eigen (1833–1876), "father of mathematical geology" (Doveton and Davis 1993). According to John Aldrich in Miller (2015a), its origins go back to celestial mechanics in the early nineteenth century, but the "eigen" terminology (its English translation means proper or characteristic) first appeared in the work of the German mathematician, David Hilbert (1862–1943) (Eigenwert; Hilbert 1904, eigenwert and eigenvektor in Courant and Hilbert 1924); the term eigenvector first appeared in English mathematical usage in Brauer and Weyl (1935). For earth science usage, see: Davis and Sampson (1973), Le Maitre (1982), Camina and Janacek (1984) and Gubbins (2004).
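A minimal illustrative sketch obtaining the eigenvalues and eigenvectors of a small symmetric matrix, and its singular value decomposition, with NumPy; the matrix values are invented.

```python
import numpy as np

X = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])        # an invented symmetric 3 x 3 matrix

# eigen-decomposition: X v = lambda v (eigh is appropriate for symmetric matrices)
eigenvalues, eigenvectors = np.linalg.eigh(X)
print(eigenvalues)                     # returned in ascending order

# singular value decomposition: X = U diag(s) V^T
U, s, Vt = np.linalg.svd(X)
print(s)                               # singular values in descending order

# check: |X - lambda*I| is (numerically) zero for each eigenvalue
print([np.linalg.det(X - lam * np.eye(3)) for lam in eigenvalues])
```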


Eigenvector biplot A graphical display of the rows and columns of a rectangular n × p data matrix X, where the rows generally correspond to the sample compositions, and the columns to the variables. In almost all applications, biplot analysis starts with performing some transformation on X, depending on the nature of the data, to obtain a transformed matrix Z which is the one that is actually displayed. The graphical representation is based on a singular value decomposition of matrix Z. There are essentially two different biplot representations: the form biplot, which favours the display of individuals (it does not represent the covariance of each variable, so as to better represent the natural form of the data set), and the covariance biplot, which favours the display of the variables (it preserves the covariance structure of the variables but represents the samples as a spherical cloud). Also known simply as a biplot or the Gabriel biplot, named for the German-born statistician, Kuno Ruben Gabriel (1929–2003) who introduced the method in 1971. See also: Greenacre and Underhill (1982), Aitchison and Greenacre (2002); and, in an earth science context, Buccianti et al. (2006).

Elastic strain The change in shape or internal configuration of a solid body resulting from certain types of displacement as a result of stress. Homogeneous strain operates such that an initial shape defined by a set of markers in, say, the form of a circle (or sphere) is deformed into an ellipse (or ellipsoid). In heterogeneous strain the final shape formed by the markers will be irregular. Implicit in the work of the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857) (Cauchy 1823, 1827), the first rigorous definition of the term strain (in which it was contrasted with stress) was given by the British engineer, William John Macquorn Rankine (1820–1872) (Rankine 1855, 1858). The term homogeneous strain was used by the British physicist, William Thomson, Lord Kelvin (1824–1907) (Thomson 1856). Both strain and homogeneous strain were introduced into geology by the American mining and structural geologist, George Ferdinand Becker (1847–1919) (Becker 1893); see also Ramsay (1967) and Ramsay and Huber (1983).

Ellipse A conic section (a figure formed by a plane cutting diagonally through a cone) with the equation (x/a)² + (y/b)² = 1, where a and b are the semi-major and semi-minor axes respectively. The term is attributed to the Greek mathematician and astronomer, Apollonius of Perga (c. 262–190 BC) although it was previously studied by his fellow-countryman, Menaechmus (c. 380–320 BC), one of the first people recorded as having used conic sections to solve a problem (Boyer 1968). Mentioned in a structural geology context by the British geologist, Henry Clifton Sorby (1826–1908) (Sorby 1856) and by the Hungarian-born American mechanical engineer, Árpád Ludwig Nádai (1883–1963) (Nádai 1927, 1931). Hart and Rudman (1997) discuss fitting an ellipse by least squares to irregularly spaced two-dimensional observations. See also: Spath (1996), ellipsoid, ellipticity.

E

Ellipsoid A three-dimensional figure in which every plane cross-section is an ellipse. The term appears in a letter from the English mathematician, physicist and astronomer, (Sir) Isaac Newton (1643–1727) to the mathematician John Collins in 1672 (in: Turnbull 1959, 229–232). An early use in optical mineralogy was by the French mathematician, Pierre Simon, Marquis de Laplace (1749–1827), (Laplace 1808). Later works in this field (e.g. Groth 1895) used the terms polarisation ellipsoid, elasticity ellipsoid, index ellipsoid, etc. The concept of the stress ellipsoid was introduced by the French mathematician, Gabriel Lamé (1795–1870) and the engineer and physicist, Benoît Paul Émile Clapeyron (1799–1864) (Lamé and Clapeyron 1833). See also: Flinn (1962), Ramsay (1967) and Ramsay and Huber (1983); ellipticity, spheroid, strain ellipsoid, tensor.

Ellipsoid d-value 1. A parameter expressing the amount of deformation of a strain ellipsoid, as expressed by the Pythagorean distance it plots from the origin in a Flinn diagram (Flinn 1962):

d = √[(Rxy − 1)² + (Ryz − 1)²]

where Rxy and Ryz are the two principal strain ratios; introduced by the British structural geologist, Derek Flinn (1922–2012). 2. A parameter expressing the amount of deformation of an ellipsoid as expressed by the Pythagorean distance it plots from the origin in a Ramsay logarithmic diagram (Ramsay 1967; Ramsay and Huber 1983):

D = √[(ε1 − ε2)² + (ε2 − ε3)²]

where ε1, ε2 and ε3 are the principal finite extensions.

Ellipsoid k-value 1. A parameter expressing the amount of oblateness or prolateness of an ellipsoid plotted in a Flinn diagram (Flinn 1962): k = (Rxy − 1)/(Ryz − 1), where Rxy and Ryz are the two principal strain ratios. Introduced by the British structural geologist, Derek Flinn (1922–2012) in 1962 and also known as Flinn’s k-value (Mookerjee and Peek 2014). 2. A parameter expressing the amount of oblateness or prolateness of an ellipsoid in a Ramsay logarithmic diagram (Ramsay 1967; Ramsay and Huber 1983): K = (e1 − e2)/(e2 − e3), where e1, e2 and e3 are the principal finite extensions. See also: strain ellipsoid.
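A minimal Python sketch of the two Flinn-diagram parameters defined in item 1 of the entries above; the principal strain ratios supplied are hypothetical.

```python
import math

def flinn_parameters(Rxy, Ryz):
    """Flinn-diagram d (distance from the origin) and k (oblateness/prolateness)
    for the principal strain ratios Rxy and Ryz."""
    d = math.hypot(Rxy - 1.0, Ryz - 1.0)
    k = (Rxy - 1.0) / (Ryz - 1.0)
    return d, k

print(flinn_parameters(2.0, 1.5))   # d ~ 1.118, k = 2.0
```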


Ellipticity (e) 1. In geodesy, the English mathematician, physicist and astronomer, Isaac Newton (1643–1727) argued that if the Earth were of homogeneous composition, and its shape were to be as if it were entirely fluid (a grapefruit-shaped oblate spheroid), then its ellipticity (the “Figure of the Earth”) must be 1/290 (Newton 1687; Newton et al. 1934). Ellipticity (1/f, where f is known as the (polar) flattening in the early literature), is here analogous to eccentricity and was first defined by the French polymath, Alexis Claude de Clairaut (1713–1765), in the case of an oblate spheroid to be:

e = (re − rp)/re

where re is the equatorial radius and rp is the polar radius, with rp < re (Clairaut 1743). However, in France, the Italian-born astronomer, Jean-Dominique (Giovanni Domenico) Cassini (1625–1712) and his son, Jacques Cassini (1677–1756) argued for an Earth shaped like a prolate spheroid (Cassini 1718, 1720), in which case rp > re. Eventually, Newton’s hypothesis of polar flattening was confirmed by the measurements of a 1° arc-length, near the geographical North Pole, made in Lapland in 1737 by the French mathematician and philosopher, Pierre-Louis Moreau de Maupertuis (1698–1759) (Maupertuis 1738). Clairaut (1743) also showed that the excess of gravity at the Pole over that at an arbitrary point would vary as sin²(latitude), and that

(gp − ge)/ge = (5/2)(Fe/ge) − e

where Fe is the centrifugal force at the Equator, and ge and gp are the acceleration of gravity at the Equator and Pole respectively. By 1750, a number of measurements of 1° arc-length (Lθ) had been made at different latitudes (θ) in Italy, Germany, Hungary and South America, and the Croatian-born physicist and mathematician, Ruđer Josip Bošković [Roger Joseph Boscovich] (1711–1787) made the first attempt to fit the relationship Lθ = c0 + c1sin²(θ), where c0 is the arc length at the equator and c1 is a constant (Maire and Boscovich 1755). At that time, the method of least squares was unknown until published by the French mathematician and geodesist, Adrien-Marie Legendre (1752–1833), (Legendre 1805). Boscovich defined the ellipticity to be

e = 1/[3Le/(Lp − Le) + 2]

where Le and Lp are the arc lengths at the Equator and Pole respectively, obtaining e = 1/248. It was subsequently realised that, since the semiperiod of vibration (t) of a pendulum of length l is t = π√(l/g), then by having l ≈ 1 m, t ≈ 1 sec, and by observing the number of pendulum swings per day (n ≈ 86,400) with the same instrument at different latitudes, one could similarly determine: lθ = l0 + l1sin²(θ), where l0 is the estimated pendulum-length at the Equator and l1 is a constant. Using Clairaut’s (1743) theorem, e = 0.00865 − l1/l0, and this function was then used by the German polymath, Johann Heinrich Lambert (1728–1777) to find e from a best-fit (again before the introduction of least squares) to the 11 world-wide pendulum measurements which then existed. He obtained e = 1/338 (Lambert 1765, 424–448). Much of the original impetus given to the growth of the young science of geophysics arose from the subsequent interest in gravitational work, which was also undertaken to help establish the mean density of the Earth, together with exploration of the Earth’s geomagnetic field. An American study (United States Department of Defense 1987) using satellite data, established the ellipticity value for an Earth-centred reference system which is a best-fit for the whole Earth to be 1/298.257223563. 2. In structural geology, in two dimensions, the ellipticity or strain ratio (R) of a finite strain ellipse with major and minor semi-axes (1 + e1) and (1 + e2), where e1 and e2 are the principal finite extensions (also called principal finite strains), is R = (1 + e1)/(1 + e2). In three dimensions we have (1 + e1) ≥ (1 + e2) ≥ (1 + e3). The three orthogonal planes XY, YZ and ZX are the principal planes of finite strain and the strain ratios are: Rxy = (1 + e1)/(1 + e2), Ryz = (1 + e2)/(1 + e3), and Rzx = (1 + e1)/(1 + e3). See also: strain ellipsoid. The term principal strain appears in a discussion of elasticity by the British physicist, William Thomson, Lord Kelvin (1824–1907), (Thomson 1856).

Embedded Markov chain A Markov process is a natural process (in time or space) in which one or more previous events influence, but do not rigidly control, the state in a given direction in time (or position). By counting the numbers of transitions from the i-th of k possible states to the j-th of k possible states which occur at equal intervals in a given direction (in space or time), the nature of change in the system under study may be characterised by a Markov transition probability matrix. Given this conditional probability matrix and knowledge of the present state, the state at the next instant in time depends only on that state, and is unaffected by any additional knowledge of what might have happened in the past. If the change from state i to state j does not occur instantaneously, but is a random variable specified by a probability density function (e.g. a time-interval of geological non-deposition or erosion, represented by a bedding plane), then the process is known as a semi-Markov process. Markov transition matrices have been used in the study of bed-thickness distributions, and to characterise the nature of cyclic sedimentation and spatial variation in granitoids. Although the idea was introduced by the Russian mathematician, Andrei Andreevich Markov [Sr.] (1856–1922) (Markov 1906), he never applied it to the natural sciences and it was first called a “Markov chain” in Bernštein (1926a). It was first applied in geology by the Russian mathematical geologist, Andreĭ Borisovich Vistelius (1915–1995) (Vistelius 1949a, b). Krumbein and Dacey (1969) distinguish two types of transition matrix as used in a stratigraphic context: the first, corresponding to a Markov


chain, gives the transition probabilities between a suite of lithologies (sandstone, shale, siltstone, lignite) based on observations taken at equal intervals up the stratigraphic succession. The second, corresponding to an embedded Markov chain (Karlin 1966), gives the transition probabilities only between successive lithological units, so that the transition probabilities down the diagonal of the transition matrix are all zero. See also: Wold (1965); and in an earth science context: Vistelius (1966, 1980, 1992), Schwarzacher (1969), Dacey and Krumbein (1970), Doveton (1971) and Vistelius et al. (1983).

Embedding dimension The minimum dimension in phase space needed to capture the behaviour of a chaotic nonlinear dynamical system (i.e. one which is extremely sensitive to initial conditions). Given a discrete time series xt, t = 1, 2, 3, …, N, values at a future time may be predictable from past ones, i.e.:

xt = f(xt−1, xt−2, xt−3, …, xt−D) + εt

where εt is a noise term resulting either from real noise in the system or from an insufficient dimensionality (D) of the measurements. In general, it is expected that εt will reduce as D is increased. If the system is completely deterministic, then εt should vanish once D exceeds the minimum embedding dimension Dmin (Takens 1981; Farmer 1982). Several methods have been suggested for estimating Dmin, see Cellucci et al. (2003) for a review. See Urquizú and Correig (1998), and Frede and Mazzega (1999) for examples of its occurrence in geophysical contexts.

Empirical Discriminant Function (EDF) A method of discriminant analysis based on nonparametric estimation of a probability density function for each category to be classified using Bayes’ rule. Developed by Specht (1967), it was successfully applied to geological problems by Howarth (1971a, 1973a) and Castillo-Muñoz and Howarth (1976).

Empirical orthogonal functions A singular spectrum analysis is a decomposition of a time series X(t), of length N, based on an eigenvector decomposition of a matrix of the lagged data series for all lags being considered, up to a maximum L. Following standardisation of the series, the matrix is formed with elements

eij = [1/(N − k)] Σ xi xi+k,

the sum being taken over i = 1 to N − k, where 0 ≤ k ≤ L − 1. It thereby represents the signal as a sum of components that are not necessarily oscillations, but more general functions, and can both identify spectral lines and act as a very effective noise filter. It is useful for extracting information even from short and noisy time series without prior knowledge of the affecting dynamics. The graph of the logarithms of the square roots of the eigenvalues (singular values) ordered in decreasing magnitude is called the singular spectrum. Reconstructed components, based on the corresponding eigenvectors or empirical orthogonal functions introduced by the French atmospheric physicist Robert Vautard and Hungarian-born American atmospheric physicist, Michael Ghil (1944–) (Vautard and Ghil 1989; Vautard et al. 1992), first separate out broad trends, then superimposed cosinusoidal components, and finally noise. See also Schoellhamer (2001) and Weedon (2003).
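As an illustrative sketch only (assuming the lag-covariance matrix given above, with the lag k taken as |i − j|), the singular spectrum and empirical orthogonal functions of a standardised series can be approximated in Python as follows; this is not a complete singular spectrum analysis.

```python
import numpy as np

def singular_spectrum(x, max_lag):
    """Eigenvalues and eigenvectors (empirical orthogonal functions) of the
    lag-covariance matrix of a standardised series x, for lags 0..max_lag-1."""
    x = (x - x.mean()) / x.std()
    n = len(x)
    c = np.array([np.sum(x[:n - k] * x[k:]) / (n - k) for k in range(max_lag)])
    e = np.array([[c[abs(i - j)] for j in range(max_lag)] for i in range(max_lag)])
    eigval, eigvec = np.linalg.eigh(e)
    order = np.argsort(eigval)[::-1]          # singular spectrum: decreasing order
    return eigval[order], eigvec[:, order]    # EOFs are the columns of eigvec

t = np.arange(200)
series = np.sin(2 * np.pi * t / 20) + 0.3 * np.random.default_rng(0).normal(size=200)
values, eofs = singular_spectrum(series, max_lag=30)
print(values[:5])
```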


End-member A “pure” chemical compound, mineral, aqueous component, size-grade of sediments, etc. which represents an “extreme” composition and forms one end of a continuous series of natural mixtures (e.g. albite and anorthite in the plagioclase series). The term appears to have come into use in geology and mineralogy following its introduction by the Dutch chemist, Jacobus Henricus Van’t Hoff (1852–1911) (Van’t Hoff 1899). The concept was used in facies mapping by Krumbein (1955a) and so-called end-member mixture analysis based on principal components analysis has been applied in hydrology since the work of Christopherson and Hooper (1992). See also Yu et al. (2016). Enrichment A concept largely used in exploration geochemistry: also known as contrast, it is the ratio of anomalous to background values in a geochemical anomaly. The term was popularised in the English language following its use by the American applied geochemist, Herbert Edwin Hawkes Jr. (1912–1996) (Hawkes 1957) following its use in Russian work by Dmitriĭ Petrovich Maliuga (1947). See also: enrichment factor, positive weight. Enrichment factor A term used in geochemistry. An early use was by the German aristocrat, Stephan (Baron) Thyssen-Bornemisa von Kaszan (1904–1982), at one time owner of the Seismos geophysical company, in a paper (Thyssen 1942) in which he discusses the enrichment factor of the concentration of an element in the biosphere divided by that in the lithosphere. It was subsequently applied to ratios of element concentrations in weathered/parent rock, crust/mantle, etc. Ensemble 1. A family of functions with probabilities assigned to the various sub-families (Blackman and Tukey 1958). 2. A group of records (Camina and Janacek 1984). Entropy, relative entropy (H ) A thermodynamic quantity established by the German physicist and mathematician, Rudolf Julius Emanuel Clausius (1822–1888) (Clausius 1865), which is a measure of the degree of disorder in a system, characterised (Boltzmann 1872) by the natural logarithm of the probability of occurrence of its particular arrangement of particles. The idea was introduced by the American electronic engineer, Ralph Vinton Lyon Hartley (1888–1970) for use in communication theory (Hartley 1928),


although he did not use the term entropy, simply referring to it as a “unit of information,” and it was later introduced by the American mathematician, Claude Elwood Shannon (1916–2001) (Shannon 1948; Shannon and Weaver 1949). It was subsequently taken up in geology as a measure of the lack of uniformity in composition (Pelto 1954). In a k-component system, entropy (H) is defined as:

H = −Σ pi ln(pi)

where the sum is taken over i = 1 to k, pi is the proportion of the i-th component, and 0 ≤ pi ≤ 1. It reaches a maximum if all the pi = 1/k. If the logarithms are to base-2, the units of information (H) are bits; if natural logarithms, they are known as nats; and, if the logarithms are to base-10, as hartleys. Relative entropy (Hr) is defined as:

Hr = −100 [Σ pi ln(pi)]/ln(k),

the sum again being taken over i = 1 to k. Low values of Hr correspond to dominance by one of the k possible components present; high values correspond to increasingly uniform mixtures. It has subsequently been used in mapping multi-component taxonomic and sedimentological data to show the degree of mixing of end-members, following early use by Parker et al. (1953). See: Pelto (1954), Miller and Kahn (1962), Vistelius (1964, 1980, 1992), Botbol (1989), Christakos (1990), Buttkus (1991, 2000) and Baltrūnas and Gaigalas (2004). See also: Bayesian/maximum-entropy method, facies map, information coefficient, maximum entropy filter, maximum entropy principle, maximum entropy spectrum, minimum entropy deconvolution.

Envelope In time series, it is the pair of low-frequency curves which bound the deflections of a higher frequency time series; often drawn by smoothly connecting (i) all the adjacent peaks, and (ii) all the adjacent troughs (Sheriff 1984; Buttkus 2000).

Epoch Harmonic motion was originally defined in terms of mechanics: if a point P is moving round the circumference of a circle with uniform velocity V, then its orthogonal projection (M) onto the diameter of the circle which passes through the centre (O) will execute simple harmonic motion. The speed of M increases from zero at one end of the diameter (A) to V at O, then it falls off again to zero as M approaches the opposite end of the diameter (A′). The time taken for P to return to the same position in the circle is the period, T; the radius of the circle (r) is the amplitude of the simple harmonic motion, and T = 2πr/V (or 2π/ω, where ω is the angular velocity of P). The angle AOP is the phase of the simple harmonic motion. If P is the position of the point at time t, and Z (lying on the circle between A and P) was its position at time t = 0, then the angle AOZ is the epoch. If the distance OM at time t is x, then x = OP cos(AOP) = OP cos(POZ + ZOA), hence x = r cos(ωt + ε), where ε is a constant, and dx/dt = −rω sin(ωt + ε). A record of x as a function of time was known as the curve of sines or harmonic curve (Thomson and Tait 1878; Macquorn Rankine 1883).
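As a worked illustration of the relative entropy Hr defined in the Entropy entry above, a minimal Python sketch (the compositions used are hypothetical):

```python
import numpy as np

def relative_entropy(proportions):
    """Relative entropy Hr (0-100) of a k-component composition."""
    p = np.asarray(proportions, dtype=float)
    p = p[p > 0]                      # terms with p = 0 contribute nothing
    h = -np.sum(p * np.log(p))        # entropy H in nats
    return 100.0 * h / np.log(len(proportions))

print(relative_entropy([0.90, 0.05, 0.05]))   # ~35.9: one component dominates
print(relative_entropy([1/3, 1/3, 1/3]))      # 100.0: maximum mixing
```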


Equal-area projection A graph used to display three-dimensional orientation data, particularly in structural geology at micro- (petrofabric) and macro-scales. Originally introduced by the Swiss mathematician, Johann Heinrich Lambert (1728–1777), (Lambert 1772) for cartographic use. Unlike the stereographic projection (which preserves angles), the projection onto the equatorial plane is such that equal areas on the reference sphere remain equal on the projection. Lambert’s net was redrawn by Schmidt (1925), who introduced the projection into geology, and was subsequently popularised by the Austrian structural geologist, Bruno Sander (1884–1979) (Sander 1930). The resultant point pattern (projected onto the lower hemisphere) is generally contoured to show point-density (Schmidt 1925). The polar projection was used for plotting orientations of pebbles in sedimentary rocks by Reiche (1938), Krumbein (1939) and Harrison (1957). See also: Sander (1970), Fairbairn and Chayes (1949), Phillips (1954), Whitten (1966), Ramsay (1967) and Howarth (1999).

Equalization, equalizing Adjusting the gains of different time series so that their amplitudes are comparable; may be applicable to adjusting the instrumental response at one seismic recording station to match that at another (Sheriff 1984). The term was used by Watkins (1923); early geophysical examples include: Aki (1960), Toksöz et al. (1965) and Shanks (1966). See also: convolution.

Equation A mathematical statement that one expression is equal to another. It may involve one or more unknowns and may express linear or nonlinear relationships. The Latin word equatio, possibly derived from earlier Arabic works, first appears in a manuscript Liber Abaci [Book of calculation] (1202) by the Italian mathematician Leonardo Pisano (c. 1170 fl. 1240), whose original name was Leonardo Fibonacci. An English translation has been published by Sigler (2002), based on a definitive edition of manuscript versions of the earlier work (Boncompagni 1857). The English word equation first appears in a translation of Euclid’s Elements of Geometry by the merchant and parliamentarian (Sir) Henry Billingsley (c. 1538–1606) (Billingsley 1570). See: banded equation solution, Booton integral equation, difference equation, differential equation, diffusion equation, equation of state, Euler’s equation, Gresens’ equation, Helmholtz equation, Laplace’s equation, linear equation, ordinary differential equation, partial differential equation, prediction filter equation, quadratic equation, quasi-functional equation, state equation, wave equation, Wiener-Hopf integral equation.
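As an illustration of the Equal-area projection entry above, a minimal Python sketch of the lower-hemisphere plotting position of a linear feature of given trend and plunge; the radial scaling r = R√2 sin((90° − plunge)/2) is the standard Lambert azimuthal equal-area form, and the example values are arbitrary.

```python
import math

def schmidt_projection(trend_deg, plunge_deg, R=1.0):
    """Return (east, north) coordinates of a line (trend, plunge in degrees)
    projected onto the lower hemisphere of an equal-area (Schmidt) net of radius R."""
    colat = math.radians(90.0 - plunge_deg)          # angle from the vertical
    r = R * math.sqrt(2.0) * math.sin(colat / 2.0)   # equal-area radial scaling
    t = math.radians(trend_deg)
    return r * math.sin(t), r * math.cos(t)

print(schmidt_projection(45.0, 30.0))   # point for trend 045, plunge 30
```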


Equations of condition When a forward problem is linear, the relationship between the data and model vectors is linear and may be expressed as: d = Am + ε, where

di = Σ Aij mj + εi,

the sum being taken over j = 1 to p. This set of equations is known as the equations of condition, in which d is a data vector of length n; ε is a vector of errors, also of length n; m is the model vector, consisting of p parameters; and A is an n × p matrix of coefficients. The i-th row of A is known as the data kernel and describes how the i-th datum depends on the model (Gubbins 2004). The term equations of condition originally derives from the solution of values for the unknown constants in a set of equations which define the nature of a problem to be solved. For example, in trigonometric surveying, consider a new position (L) whose bearings have been observed from three previously established stations A, B and C. For the sake of example, let A and C lie on a North–South line; B lies to their East and L to their West; L and B lie on an East–West line. The bearings of A, B and C as observed from L are then found. However, it must be assumed that all these measurements contain unknown errors (ε). The set of triangles involved are: ALB, CLB, ALC and BLC, with common sides formed by LB and AC. Now, considering the angles within ALC and BLC:

ALC + LAC + ACL = 180° + ε1
BLC + LBC + BCL = 180° + ε2

also

[sin(LAB)/sin(ALB)] · [sin(BLC)/sin(BCL)] · [sin(ACB)/sin(CAB)] + ε3 = 1.

These are the three equations of condition, and the optimum solution for the values of the angles is found by application of the method of least squares (Legendre 1805; Ivory 1825). Subsequent calculation enables the lengths of AL and LC to be deduced. The name “equations of condition” arose because they are only true when the unknowns contained within them take particular values; their existence makes conditions which must be fulfilled. The term was originally used by the French mathematician, Marie-Jean-Antoine-Nicholas de Caritat, Marquis de Condorcet (1743–1794) during his studies on integral calculus to mean a function of a differential equation which was integrable (Condorcet 1765, Pt. 1, p. 5). Early use of the term in geophysics occurs in Loomis (1842) concerning measurements of magnetic dip.
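A minimal numerical illustration (with hypothetical values) of an over-determined set of equations of condition d = Am + ε solved by least squares using numpy:

```python
import numpy as np

# Hypothetical linear forward problem with n = 5 data and p = 2 model parameters
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])                 # n x p matrix of coefficients
d = np.array([0.1, 1.9, 4.2, 5.8, 8.1])    # data vector

m, residual_ss, rank, singular_values = np.linalg.lstsq(A, d, rcond=None)
print(m)   # least-squares estimate of the model vector
```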


Equation of state An equation which describes the relationship between functions of state (pressure (P), volume (V), temperature (T), internal energy or specific heat) for a gas, fluid, solid or mixtures of fluids. From experimental observations by Boyle, Gay-Lussac and Avogadro of the behaviour of gases it was found that: (i) for a given mass of gas at constant temperature, the pressure times the volume is constant; and (ii) for a given mass at constant pressure, the volume is directly proportional to temperature. Hence, for an “ideal gas” in which no interaction between the atoms or molecules occurs, the equation of state is:

PV = nRT

where V is the molar volume of gas under external pressure P, n is the number of moles, T is the temperature, and R is the ideal molar gas constant, 8.31451 J mol⁻¹ K⁻¹. This was modified for actual gases by the Dutch chemist, Johannes Diderik van der Waals (1837–1923) (1873, 2004), as:

[P + a(n/V)²][(V/n) − b] = RT

where a and b are positive constants which account for lowering of pressure as a result of molecular interaction forces (a, Pa m⁶ mol⁻²) and reduction in volume because of the physical space occupied by the molecules of the gas (b, m³/mol). The effect of the term a is important at low temperatures but, at high temperatures, b (known as the “hard sphere repulsive term”) becomes more important as the thermal energy becomes greater than any molecular attraction. The values of a and b are determined from experimental results. It was the first equation which could represent vapour-liquid coexistence. However, it has been superseded by many subsequent models, beginning with the Redlich-Kwong (1949) equation, which introduced a temperature dependence into the attractive term:

P = RT/(V − b) − a/[√T · V(V + b)]

and its subsequent modification with a more general temperature-dependent term by Soave (1972). Various modifications of the hard sphere term have also been proposed (Mulero et al. 2001), of which that by Guggenheim (1965) has been widely used. In mineral physics, equations of state are often used to describe how the volume or density of a material varies with P and T at increasing depths in the Earth (Duffy and Wang 1998). See also state equation.
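A minimal Python sketch contrasting the ideal-gas and van der Waals forms rearranged for pressure; the constants a and b used below are approximate literature values for carbon dioxide and are given for illustration only.

```python
R = 8.31451   # ideal molar gas constant, J mol-1 K-1

def ideal_gas_pressure(n, V, T):
    """P = nRT/V for n moles in volume V (m3) at temperature T (K)."""
    return n * R * T / V

def van_der_waals_pressure(n, V, T, a, b):
    """P = nRT/(V - nb) - a(n/V)^2, the van der Waals equation solved for P."""
    return n * R * T / (V - n * b) - a * (n / V) ** 2

# One mole of CO2 in one litre at 300 K (a in Pa m6 mol-2, b in m3 mol-1)
print(ideal_gas_pressure(1.0, 1.0e-3, 300.0))
print(van_der_waals_pressure(1.0, 1.0e-3, 300.0, a=0.3640, b=4.267e-5))
```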


Equidetermined model, equidetermined problem When a forward problem is linear, the relationship between the data and model vectors is linear and may be expressed as: d = Am + ε, where

di = Σ Aij mj + εi,

the sum being taken over j = 1 to p. This set of equations is known as the equations of condition, in which d is a data vector of length n; ε is a vector of errors, also of length n; m is the model vector, consisting of p parameters; and A is an n × p matrix of coefficients. The i-th row of A is known as the data kernel and describes how the i-th datum depends on the model (Gubbins 2004). If the number of linearly independent equations is equal to the number of unknown parameters to be estimated, the problem is said to be equidetermined.

Equimax rotation A method of orthogonal rotation used in factor analysis. Frequently used methods are varimax rotation, which tries to maximise the variance of the loadings in each column of the factor matrix, and quartimax rotation, which aims to maximise the variance of the squares of the loadings in each row of the factor matrix. Equimax rotation is a compromise between the other two.

Equipotential surface The concept of gravitational potential was first introduced by the French mathematician and astronomer Pierre-Simon Laplace (1749–1827), (Laplace 1784). An equipotential surface is a continuous surface which is everywhere perpendicular to the lines of force; e.g. the geoid, i.e. the best-fit global mean sea level surface with respect to gravity. First described as the mathematical “Figure of the Earth” by the German mathematician, Carl Friedrich Gauss (1777–1855) (Gauss 1828); the term geoide was introduced by his former student, the German mathematician and geodesist, Johann Benedict Listing (1808–1882) (Listing 1872). Equipotential surface occurs in Thomson (1866) and the concept was also used in electricity and magnetism by Maxwell (1873) and Adams (1874). See also: potential field.

Equivalent grade A graphical measure of grain size introduced by the British stratigrapher, Herbert Arthur Baker (1885–1954) (Baker 1920), shown by Krumbein and Pettijohn (1938) to be equivalent to the arithmetic mean diameter of a grain size distribution.

Equivalent source technique Bouguer gravity anomaly measurements on an irregular grid and at a variety of elevations can be synthesized by an equivalent source of discrete point masses on a plane of arbitrary depth below the surface. By keeping the depth of the plane within certain limits relative to the station spacing, we can ensure that the synthesized field closely approximates the true gravity field in the region close to and above the terrain. Once the equivalent source is obtained, the projection of the Bouguer anomaly onto a regularly gridded horizontal plane is easily accomplished, and it can then be used to carry out vertical continuation. The method was introduced by the Australian geophysicist and computer scientist, Christopher Noel Grant Dampney (1943–2004) (Dampney 1969). Roy and Burman (1960) previously applied a similar method to gravity and magnetic data.


Equivalent spherical diameter A descriptor of sedimentary particle settling velocity based on an empirical equation of Gibbs et al. (1971) as modified by Komar (1981). It is embodied in a computer program of Wright and Thornberg (1988).

Equivalent width The effective bandwidth is defined by the indefinite integral ∫p(υ) dυ/Pmax = [P(υ) + c]/Pmax, where P(υ) is the power at frequency υ, c is a constant of integration and Pmax is the maximum power. The equivalent width is the width of a theoretical Daniell window with the same total power and the same peak power (Sheriff 1984).

Ergodic, ergodicity This is a property of certain systems which evolve through time according to certain probabilistic laws. Under certain circumstances, a system will tend in probability to a limiting form which is independent of the initial position from which it started. This is the ergodic property. Although the term was first used in connection with classical mechanics at the end of the nineteenth century, it began to be used in the theory of stochastic processes in the 1950s, following work by the American mathematician, Joseph Leo Doob (1910–2004) (Doob 1953; Vistelius 1980, 1992; Camina and Janacek 1984).

Ergodic process In signal processing, a stochastic process is said to be ergodic if its statistical properties (e.g. mean and variance) can be deduced from a single, sufficiently long, sample. The American mathematician, David Birkhoff (1884–1944) gave a proof in 1931 that for a dynamical system, the time average along each trajectory exists almost everywhere and is related to the space average (Birkhoff 1931); see also Camina and Janacek (1984), Buttkus (1991, 2000).

Erosion One of the Minkowski set operations (Minkowski 1901). See Agterberg and Fabbri (1978) for a geological example.

Error The result of a measurement, or an estimated value (Robinson 1916; Camina and Janacek 1984; Buttkus 1991, 2000; Gubbins 2004; Analytical Methods Committee 2003), minus the true value of what is being measured. In practice, the true value is unknown and, in the case of the measurement of a standard reference material, an established reference value is used (cf. trueness). The observed error will generally be a combination of a component of random error, which varies in an unpredictable way, and systematic error, which remains constant or varies in a predictable way. The theory of errors began to be developed by mathematicians and astronomers in the nineteenth century (Stigler 1986). See also: accuracy, asymptotic error, bias, bug, calibration, circular error probability, drift, error control, error function, errors-in-variates regression, Gaussian distribution, Hamming error-correcting codes, inaccuracy, ill-conditioning, least absolute error, minimum absolute error, prediction error, normal distribution, outlier, probable error, propagation error, quality control, random error, residual, root mean square error, roundoff error, Thompson-Howarth error analysis, standard error, systematic error, truncation error, Type I error, Type II error.

Error-checking Techniques designed to detect errors which occur during the processing and transfer of data so that they can, if possible, be corrected (Sheriff 1984). See also: Hamming error-correcting code, checksum.

Error control A system designed to detect errors and, if possible, to correct them (Sheriff 1984).

Error function (erf, erfc) This is the integral:

∫ from x to ∞ of exp(−x²/2) dx;

the term error function, and the abbreviation for it (originally Erf), were introduced by the British physicist, James Whitbread Lee Glaisher (1848–1928) (Glaisher 1871). Today it is generally expressed as:

erf(t) = (2/√π) ∫ from 0 to t of exp(−y²) dy

and the complementary error function is: erfc(t) = 1 − erf(t). The former is also known as the Gauss error function. The term “error function” has also been used in a more general sense by Berlanga and Harbaugh (1981).

Errors-in-variates regression This method fits a bivariate linear regression function, y = b0 + b1x, where both x and y are subject to measurement or other error. The probability distributions for the errors in x and y are assumed to conform to normal distributions with means mx and my and standard deviations sx and sy respectively. The reduced major axis minimizes the sum of squares of the lengths of the orthogonal lines from the data points to the regression line.


The fitted line is then y = my ± (sy/sx)(x − mx), the sign of the slope being taken as that of the correlation between x and y.

In geology, it has become particularly important in fitting isochrons, and was introduced for this purpose by the British-born isotope geologist, Derek H. York (1936–2007) (York 1966, 1967, 1969; see also Mahon 1996); McCammon (1973) discusses its extension to nonlinear regression (see also Carroll et al. 2006). Solutions to the problem have a long prior history, going back to the work of the American surveyor, farmer, teacher and mathematician, Robert J. Adcock (1826–1895) (Adcock 1877, 1878). Numerous approaches to the problem are now available, see Fuller (1987), Ripley and Thompson (1987), Carroll and Spiegelmann (1992), Riu and Rius (1995), Björck (1996), Webster (1997), Cheng and Van Ness (1999), Carroll et al. (2006), Gillard and Isles (2009); see also organic correlation, reduced major axis.
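A minimal Python sketch of the reduced major axis fit given above; the slope is assigned the sign of the correlation coefficient, an assumption added here for completeness, and the data are hypothetical.

```python
import numpy as np

def reduced_major_axis(x, y):
    """Reduced major axis line y = b0 + b1*x for variables both subject to error."""
    mx, my = np.mean(x), np.mean(y)
    sx, sy = np.std(x, ddof=1), np.std(y, ddof=1)
    b1 = np.sign(np.corrcoef(x, y)[0, 1]) * sy / sx   # slope = +/- sy/sx
    b0 = my - b1 * mx
    return b0, b1

x = np.array([1.0, 2.1, 3.0, 4.2, 5.1])
y = np.array([2.2, 3.9, 6.1, 8.0, 10.2])
print(reduced_major_axis(x, y))
```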


Euler’s number (e) Euler’s number is the base of the system of “natural” logarithms. This irrational number (2.718281828) is defined as the limit, as n tends to infinity, of n 1 þ 1n ; alternatively, e ¼ 1=0! þ 1=1! þ 1=2! þ 1=3! þ where ! denotes factorial. Although it was in use at the beginning of the seventeenth century, the Swiss mathematician, Leonhard Euler (1707–1783) first used the notation e (the reason is unknown and still a subject of discussion) in a manuscript written in 1727 or 1728, and it became standard notation after the publication of Euler’s Mechanica (1736). See also: logarithm (Napierian). Even function A function which retains the same value when the variable is changed from positive to negative, i.e.: f (x) ¼ f (x). The exponents of such functions are even numbers, or fractions with numerators which are even numbers and denominators which 4 are odd (e.g. x5 ). The idea was introduced by the Swiss mathematician, Leonhard Euler (1707–1783) in 1727 (Euler 1729; Sandifer 2007). Mentioned in a geophysical context by Fu (1947a,b), Camina and Janacek (1984), Sheriff (1984) and Gubbins (2004). Event 1. An especially significant or noteworthy occurrence. 2. In geophysics, a line-up in a number of seismograph recorder traces which indicates the arrival of new seismic energy, indicated by a systematic change of phase or amplitude in the seismic record. By the early 1900s, the term was used in connection with volcanic phenomena and, in the sense used here, by at least 1933 (Goldstone 1934) and it became more widely used from the 1950s onwards. See also Nettleton (1940) and Sheriff and Geldart (1982). Event detection, event-detection Originating in the mid-1960s, this term has generally been applied to the automatic recognition of significant seismic events in multi-channel data originating from earthquakes or nuclear explosions (Green 1966; Ruingdal 1977; Sharma et al. 2010). Evolutionary spectrum analysis See short-time Fourier transform Exact Chi-squared test, Fisher’s exact test, Fisher-Yates test An alternative to use of the Chi-squared statistic for assessing the independence of two variables in a two-by-two contingency table, especially when the cell frequencies are small. The method consists of evaluating the sum of the probabilities associated with the observed table and all possible two-by-two tables which have the same row and column totals as the observed data but


exhibit more extreme departure from independence (Everitt 1992). Also known as the Fisher-Yates test (Finney 1948) and Fisher’s exact test, as following a suggestion by the English statistician, (Sir) Roland Aylmer Fisher (1890–1962), it was proposed by his colleague, Frank Yates (1902–1994) (Yates 1934). See Butler et al. (2010) for a palaeontological application.


Excel A spreadsheet program originally developed by the Microsoft Corporation for the Macintosh computer in 1985, with a version for the Windows operating system following in 1987. Geological applications of this software are described by Ludwig (2000), Keskin (2002, 2013), Ersoy and Helvaci (2010) and López-Moro (2012). Excess 1. The state of exceeding the usual or appropriate magnitude of something. 2. A measure of the excess of steepness of a unimodal probability density function compared with the normal distribution, given (Vistelius 1980, 1992) by (K 3), where K is the kurtosis of the distribution. Excursion set The foundations of set theory were established by the Russian-born German mathematician, Georg Ferdinand Ludwig Philipp Cantor (1845–1918) (Cantor 1874). An excursion set consists of the set of points obtained by thresholding a bounded region containing a realization of a random field (X) at a level u, so as to create a set of random shapes, which may be thought of a “peaks,” for which X u (Adler 1976). For application in landscape studies see Culling (1989) and Culling and Datko (1987). Expectation E(•), expected value An operator, E(x), which denotes the mean of a variable (x) in repeated sampling. The term was first introduced by the Swiss physicist, Gabriel Cramer (1704–1752) (Cramer 1728), but did not come into wider use until its adoption in publications by the French mathematician, Pierre Simon, Marquis de Laplace (1749–1827), and the Indian-born English mathematician, Augustus de Morgan (1806–1871), who had reviewed the 3rd (1820) edition of Laplace’s book (De Morgan 1838). See also: Camina and Janacek (1984), Buttkus (1991, 2000) and Gubbins (2004). Expectation-Maximization (EM) algorithm The EM algorithm was introduced by Canadian statistician, Arthur Pentland Dempster (1929–), and American statisticians, Nan McKenzie Laird (1943–) and Donald Bruce Rubin (1943–) (Dempster et al. 1977) as a method of maximising likelihood estimates of a parameter when the observations can be considered as incomplete data; each iteration of the algorithm consists of an expectation step followed by a maximisation step, hence the initials which form its name. See also: Navidi (1977), McLachlan and Krishnan (1997) and Heslop et al. (2002) and Palarea-Albaladejo and Martín-Fernández (2008) for earth science applications.

197

Expected value See expectation. Experimental design The purpose of designing an experiment is to provide the most efficient and economical methods of reaching valid and relevant conclusions from the experiment. A properly designed experiment should permit a relatively simple statistical interpretation of the results, which may not be possible otherwise. The experimental design is the formal arrangement in which the experimental programme is to be conducted, selection of the treatments to be used, and the order in which the experimental runs are undertaken. Experimental design may be applied equally to laboratory investigations or to solely computer-based computational investigations in which a large number of variables are involved. The design may dictate the levels at which one or more of the variables (factors) are present, and the combination of factors used, in any one experiment. This formal approach was popularised following the work of the British statistician, (Sir) Ronald Alymer Fisher (1890–1962) and his colleague, Frank Yates (1902–1994) (Fisher 1925a, 1935; Yates 1939; Quenouille 1949; Montgomery 1991b). Use of these methods was first promoted in geology by the American petrologist, Felix Chayes (1916–1993) and the mathematical geologist, William Christian Krumbein (1902–1979) (Chayes and Fairbairn 1951; Krumbein and Miller 1953; Krumbein 1955b). See also: Thompson et al. (1979) and Damsleth et al. (1992). Expert system Expert systems generally use symbolic or non-numeric (rule-based) reasoning, rather than numerical calculations (some of the major applications have been developed in special-purpose programming languages such as LISP, Prolog and KEE), to arrive at decisions, and are normally justified by attempting to mimic the reasoning process in performing a task previously undertaken by a human expert, or experts. The first artificial intelligence programs began to be developed in the 1950s. The earliest earth science application was the U.S. Geological Survey’s PROSPECTOR system to aid mineral deposit location, developed by the American computer scientist, Peter Elliot Hart (1941–) and electrical engineer, Richard Oswald Duda (1936–) (Hart 1975; Hart et al. 1978; Campbell et al. 1982; McCammon 1990) and it was the first expert system to prove that it could solve an economically important problem: predicting the position of a previously unknown orebody at Mt. Tolman in Washington State, USA. Reddy et al. (1992) describe a more recent, Prospector-like, system underpinned by a Geographic information system for location of volcanogenic massive sulphide deposits. Several prototype systems have been developed for petroleum exploration (Wong et al. 1988). However, in many cases, development costs have tended to be high, and long-term maintenance and knowledge-updating can prove difficult. Walker (1988) raised a number of pertinent questions regarding the cost-effectiveness of such tools. A typical example is a prototype developed within an oil company to demonstrate the feasibility of capturing the knowledge of expert micropalaeontologists to aid microfossil identification (Athersuch et al. 1994); an undoubted technical success, the system would have required several years of further development to enlarge the database of diagnostic information to the goal of c. 5000 thousand taxa of operational use. Unfortunately, work on the system ended when all in-house biostratigraphic work was terminated. See also Dimitrakopoulos et al. (1994) and Crain (2014).


E

Exploratory Data Analysis (EDA) “Exploratory Data Analysis is detective work—numerical detective work—or counting detective work—or graphical detective work” (Tukey 1973). An approach introduced by the American statistician, John Wilder Tukey (1915–2000) in his classic textbook Exploratory data analysis (EDA) (Tukey 1977), which first appeared in preliminary limited editions in 1970 and 1971. His practical philosophy of data analysis minimises prior assumptions and allows the data to guide the choice of models. It particularly emphasises the use of simple graphical displays (e.g. the histogram, boxplot, Q–Q plot), to reveal the behaviour of the data and the structure of the analyses; residuals, to focus attention on what remains of the data after some analysis; re-expressions (transformations) to simplify behaviour and clarify analyses; and resistance (e.g. median, locally-weighted regression), to down-weight the influence of outliers on the results of an analysis. See also: Chambers et al. (1983), Howarth (1984), Helsel and Hirsch (1992), Maindonald and Braun (2003) and Reimann et al. (2008).

Exponential distribution A right-skewed probability density function f(x; θ) = [exp(−x/θ)]/θ, with parameter θ > 0, for which the logarithms of both f(x) and the upper tail probability 1 − F(x), where the cumulative distribution is F(x) = 1 − exp(−x/θ), decrease linearly with x. It was investigated by Schuenemeyer and Drew (1983) as a possible model for oil field size distributions, but was not found to be very satisfactory. This distribution was originally described by the British statistician, Karl Pearson (1857–1936) (Pearson 1895). See also: Vistelius (1980, 1992), Camina and Janacek (1984) and Gubbins (2004).

Exponential decay A decrease of amplitude (y) with distance (d) or time (t) as y = exp(−ad) or y = exp(−bt), where a and b are decay constants (Sheriff 1984). Classic examples in physics and geophysics come from the study of radioactive decay, e.g. the New Zealand-born English physicist, Ernest Rutherford (Lord Rutherford of Nelson, 1871–1937) explicitly fitted an exponential model to the decay of “excited radiation” resulting from exposure of a metal plate or wire to “emanation” (radioactive gas) from thorium oxide (Rutherford 1900); the law was also noted as applicable to radium by the French physicist, Pierre Curie (1859–1906) in 1902. Shortly after, the American physicists, Henry Andrews Bumstead (1870–1920) and Lynde Phelps Wheeler (1874–1959) (Bumstead and Wheeler 1904) found, by similar means, that radioactive gas obtained from soil and water near New Haven, CT, USA, was “apparently identical with the emanation from radium.”

Exponential function (exp) A function (denoted exp) in which the dependent variable increases or decreases geometrically as the independent variable increases arithmetically. Usually expressed in the form e^x, as an equation of the type y = y0 e^(ax), where y0 is the value of y at the origin (x = 0), e is Euler’s number, the constant 2.71828…, and a is a constant. Transforming both sides by taking logarithms to the base e, ln(y) = ln(y0) + ax, which will plot as a straight line on paper with linear scaling on the x-axis and logarithmic scaling on the y-axis. Use of the term goes back to at least the work of the French


mathematician, Sylvestre François Lacroix (1765–1843) (Lacroix 1806). See also: Krumbein (1937), Krumbein and Pettijohn (1938) and Camina and Janacek (1984).

Exponential ramp, exponential taper A weighting function used at the edge of a window in time series computations, multiplying values at time t > t0 by exp[k(t − t0)], where k is a negative real number (Sheriff 1984). The term exponential taper occurs in Wheeler (1939).

Exponential model A function of the form T = ax^b − c, fitted to sonic log data when drilling an oil well, where T is the total travel-time in milliseconds from a given datum (usually the well head), x is vertical well depth (feet) and a, b and c are constants (Acheson 1963). Jupp and Stewart (1974) discuss the use of piecewise exponential models using spline-based fits.

Extension (e) In mechanics and structural geology, extension (e) is a measure of the change in length of a line element, where e = (l1 − l0)/l0, l0 is the initial length of the line, and l1 is its final length. Referred to in early literature as dilation or stretch. The concept was first introduced by the French mathematicians, (Baron) Augustin-Louis Cauchy (1789–1857) (Cauchy 1823, 1827) and Siméon-Denis Poisson (1781–1840) (Poisson 1831). Introduced in the geological literature by the German-born American petrologist and structural geologist, Ernst Cloos (1898–1974) (Cloos 1947). See also: Nádai (1927, 1931), Ramsay (1967) and Ramsay and Huber (1983).

Extrapolate, extrapolation The extension of a set of values (e.g. as a function of time) on the assumption that any observed continuous, or fitted, trend exhibited within their range is maintained outside it (Jones 1956).

Extreme value distribution In geochemistry, a distinction has been drawn by Reimann et al. (2005) in the univariate case between outliers, which are thought to be observations coming from one or more different distributions, and extreme values, which are far away from the centre of the data set but are believed to belong to the same distribution. This distinction should be easier to determine in a multivariate situation. The extreme value distribution is the distribution of the largest (or smallest) observation in a sample. The term was introduced by the German-born American mathematician Emil Julius Gumbel (1891–1966) (Gumbel 1935). Different models for the probability density function include: (i) the Gumbel distribution, which has a probability density function:

f(x; a, b) = (1/b) exp[−(y + e^(−y))]

where y = (x − a)/b; −∞ < x < +∞; a ≥ 0 is the location parameter and b > 0 is the scale parameter; and e is Euler’s number, the constant 2.71828…; (ii) the Fréchet distribution, which has a probability density function:

f(x; a, b) = (s/b)[b/(x − a)]^(s+1) exp{−[b/(x − a)]^s}

where a ≥ 0 is the location parameter; b > 0 is the scale parameter; s > 0 is a shape parameter; and −∞ < x < +∞; and (iii) the Weibull distribution, which has a probability density function:

f(x; a, b) = (s/b)[(x − a)/b]^(s−1) exp{−[(x − a)/b]^s}

where a ≥ 0 is the location parameter; b > 0 is the scale parameter; s > 0 is a shape parameter; and −∞ < x < +∞. See also: generalised Pareto distribution; Embrechts et al. (1997) and Coles (2001). Geoscience applications include: earthquake magnitude, seismic hazard intensity and rates, and flood-frequency analysis: Krumbein and Lieblein (1956), Brutsaert (1968), Bardsley (1978), Mulargia et al. (1985), McCue et al. (1989), Ho (1991), Voigt and Cornelius (1991) and Caers et al. (1996, 1999a, b).

Eyeball, eyeball estimate To make an educated guess at a value after casual visual inspection of data or a graph, etc., without any measurement or calculation (Sheriff 1984). The term eyeball estimate first appeared in the 1940s (Google Research 2012) and eventually came into use in geology (e.g. Leonard and Buddington 1964).
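As a numerical companion to the Extreme value distribution entry above, the three probability density functions coded as reconstructed there; parameter values in the example are arbitrary, and the Fréchet and Weibull forms are only evaluated for x > a.

```python
import numpy as np

def gumbel_pdf(x, a, b):
    y = (x - a) / b
    return np.exp(-(y + np.exp(-y))) / b

def frechet_pdf(x, a, b, s):
    y = b / (x - a)
    return (s / b) * y ** (s + 1) * np.exp(-y ** s)

def weibull_pdf(x, a, b, s):
    y = (x - a) / b
    return (s / b) * y ** (s - 1) * np.exp(-y ** s)

x = np.array([2.5, 3.0, 4.0, 6.0])
print(gumbel_pdf(x, a=2.0, b=1.0))
print(frechet_pdf(x, a=2.0, b=1.0, s=1.5))
print(weibull_pdf(x, a=2.0, b=1.0, s=1.5))
```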

F

F-test A test for the equality of the variances of two populations, each having a normal distribution, based on the ratio of the larger to the smaller variances of a sample taken from each (F-ratio). The test is widely used in analysis of variance, introduced by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) in Fisher and Mackenzie (1923) and Fisher (1935). Named the F-ratio and F-test in honour of Fisher by the American statistician, George Waddel Snedecor (1881–1974) in Snedecor (1934). Its first application to a geological problem was by the American statistician, Churchill Eisenhart (1913–1994) (Eisenhart 1935).

f-value A criterion proposed by Woronow and Love (1990) and Woronow (1990) for use with compositional data sets (see: closed array) to help quantify the significance of differences between the compositions of a given pair of components as observed in two such sets of data. However, Woronow’s views on the compositional data problem, in particular his rejection of the logratio transformation approach, have been strongly criticised by Aitchison (1999).

Fabric diagram The study of structural lineation and tectonic fabric was begun by the Austrian geologist and petrologist, Bruno Hermann Max Sander (1884–1979) c. 1910; however, it was his fellow-countryman, the mineralogist, Walter Schmidt (1885–1945) who first began microscopic studies in which he measured in petrological thin-sections the angle between the principal optical axis of uniaxial crystals (such as quartz or calcite) and the direction of schistosity (Schmidt 1917). By 1925, both Schmidt and Sander were using microscopes fitted with a Fedorov universal stage and employed the Lambert equal area projection of the sphere to portray the results, contouring the spatial density of points on the projection to produce what became known (in English) as a fabric diagram. Sander rapidly extended the method to tectonites and pioneered the science of what eventually became known in English-language texts as petrofabrics (Gefügekunde; Sander 1923, 1930, 1948, 1950; Sander and Schmidegg 1926; Sander 1970; Knopf and Ingerson 1938; Fairbairn 1942). Application to sedimentary fabrics soon followed (Richter 1936; Krumbein 1939).
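Referring back to the F-test entry above, a minimal Python sketch using scipy.stats to form the F-ratio of two sample variances and a two-sided p-value; the sample values are hypothetical.

```python
import numpy as np
from scipy import stats

def f_test(sample1, sample2):
    """F-ratio (larger/smaller sample variance) and two-sided p-value."""
    v1, v2 = np.var(sample1, ddof=1), np.var(sample2, ddof=1)
    if v1 >= v2:
        f, df1, df2 = v1 / v2, len(sample1) - 1, len(sample2) - 1
    else:
        f, df1, df2 = v2 / v1, len(sample2) - 1, len(sample1) - 1
    p = 2.0 * stats.f.sf(f, df1, df2)
    return f, min(p, 1.0)

a = np.array([10.2, 9.8, 11.1, 10.5, 9.9, 10.8])
b = np.array([10.0, 12.3, 8.1, 11.9, 9.2, 13.0])
print(f_test(a, b))
```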


Facies departure map A map based on the idea of the “distance” in a ternary diagram or tetrahedron of one composition from another, taken with regard to a chosen reference composition which is not an end-member. The “distance” forms concentric circles (or spheres) about the reference composition in 2- or 3-dimensions depending on whether three or four end-members are used. The idea was suggested by the American mathematical geologist, William Christian Krumbein (1902–1979) (Krumbein 1955a; Forgotson 1960) for facies mapping. (Note that this predates modern work on the nature of compositional data and distance would be better computed on the basis of a logratio transform rather than the method Krumbein adopted).

Facies diagram The concept of a metamorphic facies was introduced by the Finnish geologist, Pentti Eskola (1883–1964) (Eskola 1915, 1922). A modern definition is: a set of metamorphic mineral assemblages, repeatedly associated in time and space and showing a regular relationship between mineral composition and bulk chemical composition, such that different metamorphic facies (sets of mineral assemblages) appear to be related to different metamorphic conditions, in particular temperature and pressure, although other variables, such as PH2O, may also be important (Smulikowski et al. 2007). A metamorphic facies diagram shows in diagrammatic form the coexisting stable mineral assemblages for a particular set of thermodynamic conditions. Usdansky (1985) gave a program to aid the construction of such diagrams. See also: facies map, facies departure map.

Facies map An abbreviation of lithofacies map. These are isoline maps which show the areal distribution of changing characteristics of a sedimentary formation or stratigraphic unit based on quantitative data (usually derived from outcrop and/or well log measurements, etc.), e.g. the total thickness of sand, shale, carbonate and evaporite rocks through each profile, yielding lithological ratio or sand-shale ratio maps, such as a (conglomerate + sand)/shale ratio map; clastic ratio maps, such as a (conglomerate + sand + shale)/(carbonate + evaporite + coal) map; or lithological composition, shown in terms of boundaries drawn on a sand-shale-nonclastics end-member ternary diagram; ratios of lithological subtypes may also be used (Krumbein 1948), etc. Such maps have often been combined with isopachs for the total thickness of the unit studied, or isopleths of relative entropy to characterise the degree of mixture uniformity. See also: facies departure map. Although the term facies was first used by the Swiss geologist, Amanz Gressly (1814–1865) for units in sedimentary rocks formed in specific environments (Gressly 1838), these quantitative methods first became popular in the late 1940s following their promotion by the American geologists William Christian Krumbein (1902–1979) and Laurence Louis Sloss (1913–1996): Krumbein (1948, 1952, 1955a), Krumbein and Sloss (1951), Moore (1949), Le Roux and Rust (1989). Clark (1981) describes an early lithofacies mapping package developed by Mobil Oil Canada. See also: isolith map, biofacies map, most predictable surface.

Factor Analysis (FA) The multivariate technique of factor analysis, which was introduced by the English psychologist Charles Edward Spearman (1863–1945) (Spearman 1904b), and developed by the American psychologist, Louis Leon Thurstone (1887–1955) (Thurstone 1931), aims to explain the behaviour of a set of n observed objects on the basis of p measured variables in terms of a reduced set of k new variables. It is assumed that the latter reflect a number of latent, or unobserved, common factors which influence the behaviour of some, or all, of the original variables; some may be unique factors, influencing only one variable. Principal components analysis can be based on the correlation matrix, in which the principal diagonal (the correlation of each variable with itself) is unity. In factor analysis, the entries in this diagonal are replaced by estimates of the commonality, a measure of the non-uniqueness of the variables (e.g. the multiple correlation coefficient of a variable with all others). A matrix of correlations between the factors and the original set of variables (called the loadings matrix) is often used to interpret the nature of a causative scheme underlying the original measurement set, although this is not implicit in the model. The coordinates of the points projected onto the factor axes are called the factor scores. An analysis similar to principal components is performed, aiming to produce a “simple structure” in which, ideally, each variable would have a non-zero loading on only one common factor. Methods used to achieve this are: orthogonal rotation of the axes or, better, oblique rotation, in which the initial factor axes can rotate to best summarise any clustering of the variables. Common orthogonal methods are varimax rotation (Kaiser 1958), which tries to maximise the variance of the loadings in each column of the factor matrix; quartimax rotation, which aims to maximise the variance of the squares of the loadings in each row of the factor matrix; or equimax rotation, which is a compromise between the other two. Other criteria, e.g. maximum entropy, have also been applied. Interpretation of the meaning of the results is subjective. Imbrie and Purdy (1962) and Imbrie and Van Andel (1964) introduced the cosθ coefficient for factor analysis of palaeontological and mineralogical compositional data (see also Miesch 1976b). Analysis of the relationships between the variables, based on a correlation matrix, is referred to as an R-mode analysis, whereas an analysis of the relationships between specimen compositions, etc., based on the cosθ matrix, resolved in terms of a number of theoretical end-members, is referred to as a Q-mode analysis. The first computer program for this purpose available in the earth sciences was that of Imbrie (1963). Jöreskog et al. (1976) review methods then in use; however, as with principal components analysis, it has subsequently been realised that special methods must be used because of the closed nature of such data (Aitchison 1986, 2003; Buccianti et al. 2006).

Factor score The value an individual takes in the transformed space given by one or more of the set of new variables (the so-called factors) obtained by performing a factor analysis (Spearman 1904b; Thurstone 1931; Kaiser 1958; Imbrie and Purdy 1962) on the original data set.
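A minimal Python sketch of the first, unrotated stage of such an analysis: extracting loadings and scores from the correlation matrix by eigendecomposition (a principal-components style solution); rotation methods such as varimax are not included, and the data are random placeholders.

```python
import numpy as np

def loadings_and_scores(data, k=2):
    """Loadings and scores of the first k components of the correlation matrix."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # standardise
    corr = np.corrcoef(z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]                             # largest first
    eigval, eigvec = eigval[order], eigvec[:, order]
    loadings = eigvec[:, :k] * np.sqrt(eigval[:k])               # variable loadings
    scores = z @ eigvec[:, :k]                                   # object scores
    return loadings, scores

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
loadings, scores = loadings_and_scores(data, k=2)
print(loadings)
```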
(Spearman 1904b; Thurstone 1931; Kaiser 1958; Imbrie and Purdy 1962) on the original data set.

Factorial For a positive whole number n, the factorial, denoted n!, is: n × (n − 1) × (n − 2) × ∙∙∙ × 3 × 2 × 1, so that 3! = 6. Although the factorial function itself was used in the seventeenth century (Dutka 1991), the notation (n!) was introduced by the French mathematician, Christian (Chrétien) Kramp (1760–1826) in 1808 (Mitchell 1911).

Factorial co-kriging The decomposition of p observed variables into r underlying variables (which may be orthonormal, although this is not necessary) which are not observed at all the 2- or 3-dimensionally located data points. The model is:
Z(x) = AY(x) + μ, where Y(x) is the r × 1 vector of underlying zero-expectation, orthogonal, random variables at position x; Z(x) is the p × 1 vector of original random variables at position x; A is the p × r matrix of coefficients relating the underlying Y(x) variables to the observed Z(x) variables; and μ is the p × 1 vector of global means. The coefficients would be estimated after fitting all the variograms and cross-variograms of the observed variables (Marcotte 1991). This approach was originally introduced by the French geostatistician, Georges Matheron (1930–2000) (Matheron 1970, 1982); see also Bourgault and Marcotte (1991).

Faded overlap method, faded overlap segment method A method of time series spectral analysis in which the total length of the series is divided into much shorter overlapping equal-length segments. The spectrum is estimated for each piece and the resulting estimates are averaged frequency-by-frequency. Introduced by the British statistician, Maurice Stevenson Bartlett (1910–2002) (Bartlett 1948). Anderson and Koopmans (1963) is an example of its use in earth sciences; see also: Welch method.

Fast Fourier Transform (FFT) A related group of algorithms designed for fast computation of the discrete Fourier transform of a data series at all of the Fourier frequencies. Named for the French physicist Jean-Baptiste-Joseph Fourier (1768–1830) and traceable back to the work of the German mathematician, Carl Friedrich Gauss (1777–1855) (Gauss 1805). Cooley (1990, 1992) attributes the first computer implementation of the FFT to the American physicist, Philip Rudnick (1904–1982) of Scripps Institution of Oceanography, just prior to publication of the Cooley-Tukey algorithm, based on the Danielson and Lanczos (1942) paper, although the physicist, Llewellyn Hilleth Thomas (1903–1992) had implemented a Fourier series calculation using an IBM tabulator and multiplying punch in 1948. See also: Heideman et al. (1984), Cooley (1987, 1990), Cooley and Tukey (1965), Gentleman and Sande (1966), Cooley et al. (1967), Van Loan (1992), Sorensen et al.
(1995), Camina and Janacek (1984) and Buttkus (1991, 2000); Lomb-Scargle Fourier transform, Blackman-Tukey method.

Fast Walsh Transform (FWT) An algorithm (analogous to the Fast Fourier transform) used in spectrum analysis (Whelchel and Guinn 1968; Brown 1977) which involves fitting a type of square wave function (Walsh function), rather than sine and cosine waves, to a square wave time series. It has proved well-suited to the analysis of data from lithological sections in which lithological state is encoded as a function of distance through the section (e.g. the codes: shale, −1; limestone, +1). Named for the American mathematician, Joseph Leonard Walsh (1895–1973). For geoscience examples see Negi and Tiwari (1984), Weedon (1989, 2003); see also: power spectral density analysis, sequency.

Fast Wavelet Transform (FWT) An algorithm (analogous to the Fast Fourier transform) developed by the French applied mathematician, Stéphane G. Mallat (1962–) (1989a, 2008) which enabled the rapid computation of a wavelet analysis. This work built on earlier discoveries in the telecommunications field of the techniques of sub-band coding (Esteban and Galand 1977) and pyramidal algorithms (Burt and Adelson 1983). See Cohen and Chen (1993) and Ridsdill-Smith (2000) for discussion in a geoscience context.

Favorability function (F) A favorability function (N.B. U.S. spelling) in mineral exploration is defined as F = a1x1 + a2x2 + ∙∙∙ + anxn, where F (sometimes computed so as to be constrained to the interval {−1, +1}) is an index of the favourability of a region to the mineralisation of interest. Methods used to estimate F have included characteristic analysis, subjective choice of weighting factors and canonical correlation. See also weights of evidence model (Chung and Fabbri 1993).

Feature extraction The reduction of a very large, complex, data set with inherent redundancy to a simpler, much smaller, set of features which may then be used for comparison and classification purposes. The term derives from pattern recognition and image processing (Chien and Fu 1966). See Howarth (1973a) and Hills (1988) for early geoscience examples.

Feedback The use of part of the output of a system as a partial input to itself. The theory of feedback amplification was first investigated by the Swedish-American and American electrical engineers Harry Nyquist (1889–1976) and Harold Stephen Black (1898–1983), who made the first negative-feedback amplifier in 1927 (Nyquist 1932; Black 1934). In geophysics, the term occurs in Willmore (1937). See also: recursive filter.
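The following short Python sketch (not part of the original dictionary text) illustrates the Fast Fourier Transform entry above by computing the discrete Fourier transform of a synthetic series with an off-the-shelf FFT routine; the series, sampling interval and frequencies are illustrative assumptions only.

import numpy as np

dt = 1.0                                  # sampling interval (arbitrary units)
n = 256                                   # number of equally spaced samples
t = np.arange(n) * dt
# synthetic record: two sinusoids plus a little noise (illustrative values)
x = np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.sin(2 * np.pi * 0.12 * t)
x += 0.1 * np.random.default_rng(0).normal(size=n)

X = np.fft.rfft(x)                        # FFT of a real-valued series
freqs = np.fft.rfftfreq(n, d=dt)          # Fourier frequencies (cycles per unit time)
amplitude = np.abs(X) / n                 # amplitude spectrum
phase = np.angle(X)                       # phase spectrum

# the largest non-zero-frequency peak lies near 0.05 cycles per unit time
print(freqs[1:][np.argmax(amplitude[1:])])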


Feigenbaum constant A universal constant for functions approaching chaos, given by the ratios of successive differences between the parameter values, g(i), corresponding to the onset of the i-th period-doubling bifurcation as the controlling parameter value increases, e.g. in the logistic map. The ratio [g(n) − g(n−1)]/[g(n+1) − g(n)] eventually approaches the value 4.6692016091. . . for a large class of period-doubling mappings. It was discovered by the American mathematical physicist, Mitchell Jay Feigenbaum (1944–) in 1975 (Feigenbaum 1978, 1979). The first experimental confirmation of its value in a real physical process was provided by the French physicist, Albert J. Libchaber (1934–) and engineer, Jean Maurer, who observed period-doubling cascades in waves travelling up and down vortices in superfluid helium (Libchaber and Maurer 1982). For discussion in a geoscience context see Turcotte (1997).
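As a hedged illustration of the period-doubling cascade referred to in the entry above, the short Python sketch below iterates the logistic map at three control-parameter values (chosen purely for illustration) and prints the settled orbit, showing the period-1, period-2 and period-4 regimes whose onset values g(i) enter the Feigenbaum ratio.

# Illustrative sketch only: iterate the logistic map x -> g*x*(1-x) and
# print the orbit after transients have died away.
def settled_orbit(g, n_transient=1000, n_keep=8, x=0.5):
    for _ in range(n_transient):
        x = g * x * (1.0 - x)
    orbit = []
    for _ in range(n_keep):
        x = g * x * (1.0 - x)
        orbit.append(round(x, 4))
    return orbit

print(settled_orbit(2.9))   # a single repeated value: period 1
print(settled_orbit(3.2))   # two alternating values: period 2
print(settled_orbit(3.5))   # four repeating values: period 4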


Fejér window Used in the operation of smoothing a periodogram with a lag window of weights applied to a discrete time waveform. N, the length of the Bartlett window, is typically even and an integer power of 2; for each point, n = 0, . . ., (N − 1), the weight w(n) is given by

w(n) = [2/(N − 1)] × [(N − 1)/2 − |n − (N − 1)/2|].

Mentioned in an earth science context by Buttkus (1991, 2000). Named for the Hungarian mathematician, Lipót Fejér (1880–1959) (cf. Fejér 1904), it is also known as the triangle window, or Bartlett window, named for the British statistician, Maurice Stevenson Bartlett (1910–2002) (Bartlett 1948, 1950); the term was introduced by that name into digital signal processing by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990) (Blackman and Tukey 1958). For a comprehensive survey see Harris (1978). See also: spectral window, Bartlett method.

Fence diagram A set of vertical sections of geological or seismic record sections, etc. drawn (usually using a vertical isometric projection) so as to illustrate variations as though in three dimensions. An early example occurs in Sloss and Laird (1946). The same technique has also been used in geochemistry, e.g. the classic Eh-pH fence diagram for the stability fields of iron facies in Krumbein and Garrels (1952).

Fidelity 1. A measure (Bray and Curtis 1957) suggested by Belbin (1984) for use, in a geological context, as a comparator for the success of hierarchical cluster analysis algorithms. 2. The level of perceptibility of noise in a reconstructed continuous signal introduced by its quantization (Wang 2009).
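A minimal numerical cross-check (not part of the original text) of the triangular (Fejér/Bartlett) window formula in the entry above, using numpy's built-in bartlett() window; the window length N = 8 is an arbitrary illustrative choice.

import numpy as np

N = 8
n = np.arange(N)
# w(n) = (2/(N-1)) * ((N-1)/2 - |n - (N-1)/2|), as in the entry above
w = (2.0 / (N - 1)) * ((N - 1) / 2.0 - np.abs(n - (N - 1) / 2.0))
print(np.allclose(w, np.bartlett(N)))   # True: identical to numpy's Bartlett window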


Field In mathematics, it is a set of numbers which can be added, multiplied, subtracted or divided by each other (except zero) to give a result which is a member of the same set. According to Miller (2015a), field is the English equivalent of Zahlkörper (number-body), a term used by the German mathematician, Julius Wilhelm Richard Dedekind (1831–1916) in his lectures since 1858 but only published in Dedekind (1872). Field was introduced in this sense by the American mathematician, Eliakim Hastings Moore (1862–1932) (Moore 1893).

File generation Also known as data loading, the process of initially reading data into machine storage, validating it, and preparing the database for subsequent update and retrieval (Gordon and Martin 1974; Hruška 1976).

Filter A term originating in the work of the American electrical engineer, George Ashley Campbell (1870–1954) on telephonic transmission, who used low-pass, high-pass and band-pass filters from 1910 onwards (Campbell 1922). Algorithms for selectively removing noise from a time series or spatial set of data (smoothing), or for enhancing particular components of the waveform. First used in digital signal processing by the American statistician, John Wilder Tukey (1915–2000) and mathematician Richard Wesley Hamming (1915–1998) (Tukey and Hamming 1949). The June 1967 issue of Geophysics was devoted entirely to digital filtering. See also: Frank and Doty (1953), Robinson (1966a), Camina and Janacek (1984), Buttkus (1991, 2000) and Gubbins (2004); acausal filter, anti-alias filter, averaging filter, band-reject filter, Butterworth filter, causal filter, digital filter, frequency-selective filter, impulse response filter, matched filter, minimum-delay filter, nonrealizable filter, notch filter, realisable filter, stacking filter, threshold filter, two-pass filter, wavenumber filter, Wiener filter, zero-phase filter.

Filter coefficients The value of successive terms in a filter of total length N, which together define its shape (Camina and Janacek 1984).

Filtered spectrum The power spectrum of the output from any process which can be regarded as a filter (Blackman and Tukey 1958). The idea was popularised by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990).

Filtering The act of applying a filter to a signal of some kind, so as to attenuate certain components of the signal based on some measurable property. The term was in frequent use in geophysics from the late 1940s (e.g. Howell 1949). For more recent discussion, see Buttkus (1991, 2000).

Fineness factor The American ceramic engineer, Ross Coffin Purdy (1875–1949) introduced a fineness factor or surface factor for very fine-grained sediments in 1902 (Purdy 1908): it was computed by multiplying the reciprocal of the midpoint of each size
grade by the weight percentage of the material in the grade, expressed as a proportion of the total frequency; the sum of these values gives the fineness factor. It was subsequently used by Krumbein and Pettijohn (1938).

Finite data filter The application of a filter to a data series of finite length (Camina and Janacek 1984).
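To illustrate the filter and finite data filter entries above, a minimal Python sketch (an addition, not from the original text) applying a three-point moving-average (low-pass) smoothing filter to a short finite series; the data values are purely illustrative.

import numpy as np

x = np.array([2.0, 4.0, 3.0, 8.0, 6.0, 7.0, 5.0])   # a short finite data series
weights = np.ones(3) / 3.0                           # equal filter coefficients
smoothed = np.convolve(x, weights, mode="same")      # end points are edge-affected
print(smoothed)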


Finite-difference method A numerical method for the solution of differential equations by substituting difference quotients for derivatives, and then using these equations to approximate a derivative (Sheriff 1984). For example, the first derivative f′(x) of f(x) can be approximated by f′(x) ≈ [f(x + ∂) − f(x)]/∂, where ∂ is a small increment. The method originated in work in mathematical physics by the German mathematicians Richard Courant (1888–1972), Kurt Otto Friedrichs (1901–1982) and Hans Lewy (1904–1988) (Courant et al. 1928); see Morton and Mayers (2005). Early geoscience applications concern the exsolution of kamacite from taenite in iron meteorites (Wood 1964) and conductive cooling models of dikes etc. (Delaney 1988; Zhao et al. 2008).

Finite element analysis, finite element method A numerical method for finding approximate solutions to partial differential equations and integral equations using methods of numerical integration. This approach is based either on eliminating the partial differential equation completely, or on replacing it with an approximating system of ordinary differential equations which are then numerically integrated using standard techniques. The method originated from the need to find solutions to complex elasticity and structural analysis problems in civil and aeronautical engineering. This field of research began with work published by the Russian-Canadian structural engineer, Alexander Hrennikoff (1896–1984) (Hrennikoff 1941) and the German-American mathematician, Richard Courant (1888–1972) (Courant 1943). This was followed by work in Germany by the Greek mathematician, John Argyris (1913–2004) on matrix transformation methods, which was utilised by the American structural engineer, Ray William Clough (1920–) who, having used two-dimensional meshes composed of discrete triangular elements in related work (Turner et al. 1956), realised that they could be applied to solve problems in continuum mechanics, predicting stresses and displacements in continuous structures. Clough (1960) coined the term finite element method for the resulting computational approach. By the 1970s software packages for executing this type of analysis were being distributed (e.g. Wilson 1970). Early geoscience applications include: Coggon (1971), Stephansson and Berner (1971) and Cheng and Hodge (1976); see also Zhao et al. (2008). The unhyphenated spelling finite element rather than finite-element is by far the most widely-used (Google Research 2012).

Finite Fourier transform If X(j), j = 0, 1, ∙∙∙, N − 1 is a series of N finite-valued complex numbers, the finite Fourier transform of X(j) is defined as:
A(n) = (1/N) Σ_{j=0}^{N−1} X(j) e^{−2πinj/N},

where i is the imaginary unit √−1, and e is Euler's number, the constant 2.71828 (Schoenberg 1950; Cooley et al. 1969); see also Schouten and McCamy (1972).

Finite Impulse Response (FIR) filter In a non-recursive filter, the output y(t) of the filter at time t depends only on Σ_{i=−k}^{k} w_i x_{t−i}, where the w_i are the applied weights; in a recursive filter, the output will also depend on a previous output value:

y(t) = Σ_{i=−k}^{k} a_i x_{t−i} + Σ_{j=0}^{k} b_j y_{t−j},

where the a_i and b_j are the applied weights. If recursive filters are used for processing real-time problems, then observations for i or j > t will not exist; these are physically realisable, as opposed to the more general, physically unrealisable, case. Such "one-sided" physically realisable filters are also known as infinite impulse response (IIR) filters, as they can produce effects arbitrarily far into the future from a single impulse (e.g. a Dirac function). Non-recursive filters are correspondingly known as finite impulse response (FIR) filters. Filters which can be implemented on real-time physical systems are also known as causal filters; those which are applied to filtering an entire time series which has already been obtained are also known as acausal filters. For discussion see: Hamming (1977) and, in an earth science context, Buttkus (1991), Gubbins (2004) and Weedon (2003).

Finite strain analysis Finite strain is a dimensionless measure of the total changes in shape and size undergone by a rock body from its original condition as a result of its deformation. Strain analysis (Ramsay 1967) involves determining the strain orientation, magnitude and, if possible, the sequence of changes in strain over time (the strain path). Strain may be characterised by both the change in the length of a line (extension) or the change in the angle between two lines (shear strain).

Firmware A computer program implemented in a type of hardware, such as a read-only memory. The term was coined by the American computer scientist, Ascher Opler (1917–1969) in Opler (1967).

First differencing Subtracting successive values of a time series from each other (Weedon 2003).
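A minimal Python sketch (an addition, not from the original text) relating two of the entries above: a forward-difference approximation to a first derivative (finite-difference method) and first differencing of a short series; the function, step size and series values are illustrative assumptions.

import numpy as np

# forward-difference approximation of f'(x0) for f(x) = sin(x)
f = np.sin
x0, h = 1.0, 1e-5
print((f(x0 + h) - f(x0)) / h, np.cos(x0))   # approximation vs exact derivative

# first differencing of a time series (subtracting successive values)
series = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
print(np.diff(series))                        # [ 2. -1.  4. -1.]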


Fisher algorithm A method developed by the American economist Walter Dummer Fisher (1916–1995) for optimal non-hierarchical partitioning of a one-dimensional data set into a given number of groups, so as to minimise the total sum of the absolute deviations from the medians (Fisher 1958). A FORTRAN implementation was published in Hartigan (1975).

Fisher distribution This is a spherical distribution (also known as the spherical normal distribution) introduced by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) (Fisher 1953). It is an extension of the von Mises distribution to the spherical case. The probability density function is given by
F(θ; κ) = [κ sin(θ) e^{κ cos θ}] / [4π sinh(κ)],

where κ > 0 is the concentration parameter; θ is the angular separation of a particular direction away from the true mean direction; and e is Euler's number, the constant 2.71828. As κ tends to 0, the distribution tends to become uniform over the sphere; as κ becomes larger, the distribution tends to concentrate around the mean direction. cos θ = (l·l′ + m·m′ + n·n′), where (l, m, n) and (l′, m′, n′) are the vectors of the direction cosines of a directional random vector in three dimensions and its mean direction respectively. See Mardia (1972) and Fisher et al. (1993) or, in an earth science context, Cheeney (1983), Tauxe et al. (1991), Fuller (1992) and Buttkus (1991, 2000) for further discussion. According to Fuller (1992), use of the Fisher distribution is preferred to the Bingham distribution by many paleomagnetic workers as significance tests and determination of confidence limits are less complex. See also spherical statistics, Kent distribution.

Fisher's exact test, Fisher-Yates test An alternative to use of the Chi-squared statistic for assessing the independence of two variables in a two-by-two contingency table, especially when the cell frequencies are small. The method consists of evaluating the sum of the probabilities associated with the observed table and all possible two-by-two tables which have the same row and column totals as the observed data but exhibit more extreme departure from independence (Everitt 1992). Called the Fisher-Yates test by Finney (1948), as, following a suggestion by the English statistician, (Sir) Ronald Aylmer Fisher (1890–1962), it was proposed by his colleague, Frank Yates (1902–1994) (Yates 1934). It is also known as Fisher's exact test, or the exact chi-square(d) test. See Butler et al. (2010) for a palaeontological application.

Fitness for purpose Choosing a measurement process so as to minimise uncertainty (to ensure correct decision-making) while avoiding unnecessary expenditure on the measurement method (e.g. the very successful use of colorimetric tests in field geochemistry for mineral exploration; use of X-ray fluorescence analysis or atomic absorption spectroscopy as opposed to neutron activation analysis for particular analytical (chemical) tasks). From
the principles of error propagation, the uncertainty in the analytical result has a negligible effect on the combined uncertainty of the final result unless the analytical uncertainty is greater than about one third of the sampling uncertainty. In short, there is no point in paying a premium for high accuracy analysis when low accuracy will achieve the same ends (Tooms 1959; Webb 1970; Webb and Thompson 1977). This approach (regarded as highly controversial at the time) was pioneered by the English applied geochemist, John Stuart Webb (1920–2007), founder in 1959 of the Geochemical Prospecting Research Centre, Royal School of Mines, Imperial College, London, renamed the Applied Geochemistry Research Group (AGRG) in 1965. Under his colleague, chemist Michael Thompson (1938–), the methods employed and devised in the AGRG became progressively more informative, and supported by a firmer conceptual foundation with the passage of time (Ramsey and Thompson 1992; Thompson and Hale 1992). Fitness-for-purpose has become transformed from a vague idea into a quantitative theory that is now beginning to be applied in all sectors (Thompson and Fearn 1996; Fearn et al. 2002). The methods of analytical quality control invented in the AGRG have now been formalised in internationally recognised protocols (Thompson and Wood 1995; Thompson et al. 2002, 2006).

Fitted model A formal representation of a theory or causal scheme which is believed to account for an observed set of data. Often found in the context of: (i) regression analysis, when a numerical model, y = f(x) + ε, where ε is an error term, is fitted to an observed set of data; f(x) is usually an explicit linear, polynomial, or parametric nonlinear function and ε is an implicit error term accounting for the difference between observed and fitted values; a normal distribution of error is generally assumed. (ii) Fitting a probability density function to an observed frequency distribution. Model-fitting consists of a number of steps: obtaining a set of data representative of the process to be modelled; choosing a candidate model; fitting the model (usually by estimating the values of some parameters); summarizing the model; and using diagnostics to find out in what ways it might fail to fit as well as it should; if necessary, choosing an alternative model and repeating these steps until a satisfactory solution is arrived at. The term appears in Kendall and Stuart (1958) and in an earth science context in: Krumbein and Tukey (1956), Krumbein and Graybill (1965), Koch and Link (1970–1971); see also: conceptual model, deterministic model, discovery-process model, fluid-flow model, mathematical model, physical model, scale model, stochastic process model.

Fixed effects, fixed-effects The effects observed on a response variable, y = f(x), corresponding to a set of values of a factor (x) that are of interest and which exist only at given fixed values (as opposed to a random effects factor which has infinitely many possible levels of which only a random sample is available). The term arose in the context of analysis of variance (Eisenhart 1947; Scheffé 1956). Early discussion occurs in Mood
(1950), Kempthorne (1952) and Wilk and Kempthorne (1955); and in a geological context by Krumbein and Graybill (1965) and by Miller and Kahn (1962). The unhyphenated spelling fixed effects has consistently been most widely used (Google Research 2012).

Fixed point
1. A point which is left unchanged by a transformation. 2. A point in phase space towards which a dynamical system evolves as transients die out; once it has been reached, the evolution of the system remains unchanging; a synonym of singular point, a term introduced by the French mathematical physicist and mathematician, Jules Henri Poincaré (1854–1912) (Poincaré 1881, 1882; Turcotte 1997). 3. A method of data representation in which the position of the decimal point is fixed, or integer operations (Sheriff 1984). For example, a fixed-point representation that has seven decimal digits, with the decimal point assumed to be positioned after the fifth digit, can represent the numbers 12345.67, 8765.43, 123.00, etc. See also: floating point.

Flattening Used in structural geology (Ramsay 1967; Ramsay and Huber 1983), a synonym of ellipticity. In two dimensions, the ellipticity or strain ratio (R) of a finite strain ellipse with major and minor semi-axes (1 + e1) and (1 + e2), where e1 and e2 are the principal finite extensions (also called principal finite strains), is R = (1 + e1)/(1 + e2). In three dimensions we have (1 + e1) ≥ (1 + e2) ≥ (1 + e3). The three planes XY, YZ and ZX are the principal planes of finite strain and the strain ratios are: Rxy = (1 + e1)/(1 + e2), Ryz = (1 + e2)/(1 + e3), and Rzx = (1 + e1)/(1 + e3). See also: strain ellipsoid. (The term principle strain appears in a discussion of elasticity by the British physicist, William Thomson, Lord Kelvin (1824–1907) (Thomson 1856).)

Flat-topping Also known as clipping. Resetting all values in a time series with amplitudes above (and/or below) a given threshold(s) to the value of the threshold(s). The term was in use in signal communication by at least the late 1940s (Licklider and Pollack 1948); for discussion in a geophysical context see: O'Brien et al. (1982) and Weedon (2003).

Fletcher-Powell algorithm See Davidon-Fletcher-Powell algorithm.

Flier An anomalous data value (Sheriff 1984). For discussion, see the more frequently-used term outlier.

Flinn diagram, Flinn plot A method of classifying the shape of the strain ellipsoid on the basis of the two principal strain ratios: the ratio of the maximum/intermediate extensions plotted on the y-axis and the ratio of the intermediate/minimum extensions
plotted on the x-axis. Named for the British structural geologist, Derek Flinn (1922–2012) who introduced it (Flinn 1962) following an earlier study (Flinn 1956) of deformed clast shapes which used an adaptation of the Zingg plot under the name deformation plot. See also Jelinek diagram, Ramsay logarithmic diagram, Woodcock diagram.

Flinn's k-value A parameter expressing the amount of oblateness or prolateness of an ellipsoid plotted in a Flinn diagram: k = (Rxy − 1)/(Ryz − 1), where Rxy and Ryz are the two principal strain ratios (a short numerical sketch is given after the Flowchart entry below). Introduced by the British structural geologist, Derek Flinn (1922–2012) (Flinn 1962; Ramsay and Huber 1983). See Mookerjee and Peek (2014) for a comparison with Lode's number; see also: strain ellipsoid.

Flip transform, flipped data A transform applied to left-censored concentration data, introduced by the American hydrologist, Dennis R. Helsel (2005), so as to make analysis of such data by statistical methods, which generally assume one is analysing right-censored data, possible: flip(x) = c − x, where c is a suitably chosen large constant.

Floating-point representation The typical number which can be represented exactly is of the form (Camina and Janacek 1984; Sheriff 1984): significant digits × base^exponent. The term floating-point refers to the fact that the radix point (decimal point, or more frequently in computers, binary point) can "float," i.e., it can be placed anywhere relative to the significant digits of the number. A fixed-point representation that has seven decimal digits, with the decimal point assumed to be positioned after the fifth digit, can represent the numbers 12345.67, 8765.43, 123.00, etc., whereas a floating-point representation with seven decimal digits could in addition represent 1.234567, 123456.7, 0.00001234567, etc. In 1938, the German engineer and computer pioneer, Konrad Zuse (1910–1995) of Berlin completed the Z1, the first mechanical binary programmable computer; it worked with 22-bit floating-point numbers having a 7-bit exponent, and Zuse also used floating-point numbers in his Z4 computer in 1950; this was followed by the IBM 704 in 1954. See also: Goldberg (1991); roundoff error.

Flowchart A diagrammatic representation of the steps involved in obtaining the solution to a problem. The technique was first adopted for industrial process control by the American engineer, Frank Bunker Gilbreth Sr. (1868–1924) in 1921 and his approach was later taken up by the American mathematician and computer scientist, Herman Heine Goldstine (1913–2004) and the Hungarian-born American mathematician János (John) von Neumann (1903–1957), who in 1946 adapted the use of the "flow diagram" to the planning of computer programs (Goldstine and von Neumann 1947; Knuth 1968–1973; Goldstine 1972; Haigh et al. 2014a), to give "a picture of the motion of the control organ as it moves through the memory picking up and executing the instructions it finds there; [showing] the states of the variables at various key points in the computation; [and indicating] the formulas being evaluated" (Goldstine 1972, 267). The first high-level computer-programming language, FORTRAN, was released by International Business
Machines Corporation (IBM) in 1957, and this was followed by Algol in 1958. Subsequently, the symbols and methodology used for flowcharting by IBM (1969a) rapidly became the de facto standard, aided by the use of a plastic template (IBM 1969b) for drawing the symbols. In early published geological applications, Krumbein and Sloss (1958) flowcharted a machine-language program for the computation of stratigraphic ratios, and Johnson (1962) flowcharted and programmed the calculation of the CIPW norm (Cross et al. 1902) for an IBM 650 computer at the University of Oklahoma. See also: algorithm.
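The short Python sketch below (referred to from the Flinn's k-value entry above; not part of the original text) evaluates the two principal strain ratios and Flinn's k-value for a hypothetical strain ellipsoid; the principal stretches are illustrative assumptions only.

# hypothetical principal stretches (1+e1) > (1+e2) > (1+e3)
s1, s2, s3 = 2.0, 1.2, 0.6

Rxy = s1 / s2                     # maximum / intermediate strain ratio
Ryz = s2 / s3                     # intermediate / minimum strain ratio
k = (Rxy - 1.0) / (Ryz - 1.0)     # Flinn's k-value

print(Rxy, Ryz, k)                # k < 1: oblate; k > 1: prolate; k = 1: plane strain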


Flow graph A graphical representation of transition probabilities from one state to another (e.g. switching from one rock type to another in a stratigraphic section) in which the probabilities are represented as directed vectors, whose magnitude reflects the probability of one state succeeding another, joining the appropriate nodes among those representing the various possible states in the system (Berge and Ghouli-Houri 1965; Davis and Cocke 1972). See: substitutability analysis.

Fluctuation The range of orientation of long axes of a set of ellipsoidal deformed particles with respect to a marker direction. A term introduced by the German-born American petrologist and structural geologist, Ernst Cloos (1898–1974) (Cloos 1947); see also Ramsay (1967) and Dunnet (1969).

Fluid-flow model Electrical analogue models were used to solve transient-flow problems in hydrogeology in the 1950s. Although effective, they were time-consuming to set up and each hard-wired model was problem-specific. The digital computer provided a more flexible solution. Finite-difference methods (in which the user establishes a regular grid for the model area, subdividing it into a number of subregions, and assigns constant system parameters to each cell) were used initially (Ramson et al. 1965; Pinder and Bredehoeft 1968) but these gradually gave way to the use of finite-element models, in which the flow equations are approximated by integration rather than differentiation, as used in the finite-difference models (see Spitz and Moreno (1996) for a review). Although both types of model can provide similar solutions in terms of their accuracy, finite-element models had the advantage of allowing the use of irregular meshes which could be tailored to any specific application; they required a smaller number of nodes; and they enabled better treatment of boundary conditions and anisotropic media. They were first introduced into groundwater applications by Javandrel and Witherspoon (1969). With increasing interest in problems of environmental contamination, the first chemical-transport model was developed by Anderson (1979). Stochastic (random walk) "particle-in-cell" methods were subsequently used to assist visualization of contaminant concentration: the flow system "transports" numerical "particles" throughout the model domain. Plots of the particle positions at successive time-steps gave a good idea of how a concentration field developed (Prickett et al. 1981). Spitz and Moreno (1996, Table 9.1, p. 280–294) give a comprehensive summary of recent groundwater flow and transport models. Numerical models now
underpin applications in hydrogeology, petroleum geology and, latterly, nuclear and other contaminant-transport problems. Models in which both finite-element and stochastic simulation techniques are applied have become increasingly important. For example, Bitzer and Harbaugh (1987) and Bitzer (1999) have developed basin simulation models which include processes such as block fault movement, isostatic response, fluid flow, sediment consolidation, compaction, heat flow, and solute transport. Yu (1998) has reported significant reductions in processing-time for two- and three-dimensional fluid-flow models using a supercomputer. The papers in Gómez-Hernández and Deutsch (1999) discuss recent developments in the field. See also: conceptual model, deterministic model, discovery-process model, mathematical model, physical model, scale model, statistical model, stochastic process model; conditional simulation.

Folding frequency Discussed in a seismological context by Tukey (1959a), folding frequency is an alternative term for the Nyquist frequency. When a time series is sampled at regular intervals (Δ), the Nyquist frequency, ω, is π/Δ radians per unit time or 1/(2Δ) cycles per unit time, i.e. half the sampling frequency. It is named for the Swedish-American physicist, Harry Nyquist (1889–1976), who first discussed aliasing-related sampling issues (Nyquist 1928a), in Blackman and Tukey (1958); they too called it the folding frequency. For discussion in a geophysical context, see also Camina and Janacek (1984), Buttkus (1991, 2000), Weedon (2003), Gubbins (2004). See also: sampling theorem.

Form analysis A term used for automated computer-based two-dimensional particle shape analysis by Starkey and Simigian (1987).

Form biplot A biplot is a graphical display of the rows and columns of a rectangular n × p data matrix X, where the rows generally correspond to the specimen compositions, and the columns to the variables. In almost all applications, biplot analysis starts with performing some transformation on X, depending on the nature of the data, to obtain a transformed matrix Z, which is the one that is actually displayed. The graphical representation is based on a singular value decomposition of Z. There are essentially two different biplot representations: the form biplot, which favours the display of individuals (it does not represent the covariance of each variable, so as to better represent the natural form of the data set), and the covariance biplot, which favours the display of the variables (it preserves the covariance structure of the variables but represents the samples as a spherical cloud). Named for the German-born statistician, Kuno Ruben Gabriel (1929–2003) who introduced the method (Gabriel 1971). See also: Greenacre and Underhill (1982), Aitchison and Greenacre (2002); and, in an earth science context, Buccianti et al. (2006).

FORTRAN Acronym for "Formula Translation," as in the IBM Mathematical Formula Translating System (IBM 1954). FORTRAN was the first high-level computer programming language, originally proposed in 1954 by John Warner Backus (1924–2007), Harlan
Herrick and Irving Ziller of International Business Machines Corporation (IBM 1954), it was subsequently developed for the IBM 704 computer in early 1957 (IBM 1957; McCracken 1963) by a team, led by Backus, consisting of: Sheldon F. Best, Richard Goldberg, Lois Mitchell Haibt, Herrick, Grace E. Mitchell, Robert Nelson, Roy Nutt, David Sayre, Peter Sheridan and Ziller. Its optimising compiler generated machine code whose performance was comparable to the best hand-coded assembly language. By 1958 the language was extended to enable usage of blocks of self-contained code which could perform a particular set of operations (such as part of a calculation, input, or output, etc.) through the introduction of CALL, SUBROUTINE, FUNCTION, COMMON and END statements, with separate compilation of these program modules. This version was known as FORTRAN II (IBM 1958; McCracken 1963). The short-lived FORTRAN III, developed in 1958, was never released to the public, and an improved version of FORTRAN II (which removed any machine-dependent features, and included provision for processing logical data as well as arithmetic data) was released in 1962 as FORTRAN IV, first for the IBM 7030, followed by the IBM 7090 and 7094. Backward compatibility with FORTRAN II was retained by means of a FORTRAN II to FORTRAN IV translator (McCracken 1965). By 1963 over 40 different compilers existed to allow FORTRAN IV to be used on other manufacturers' hardware. The rapid take-up of the language and its widespread usage for scientific programming was reflected in the earth sciences (Whitten 1963; Kaesler et al. 1963; Fox 1964; Link et al. 1964; Manson and Imbrie 1964; Koch et al. 1972). Standardisation of its implementation across a wide variety of platforms was enabled by the introduction of FORTRAN 66 (largely based on FORTRAN IV) by the American Standards Association (later known as the American National Standards Institute) in 1966; FORTRAN 77, FORTRAN 90, FORTRAN 95 and FORTRAN 2003 followed. See also: FORTRAN assembly program.

FORTRAN assembly program (FAP) A low-level, machine-specific, assembler code, originally developed by David E. Ferguson and Donald P. Moore at the Western Data Processing Centre, University of California, Los Angeles (Moore 1960), which enabled translation, by means of a compiler, of the human-originated instructions in the FORTRAN programming language into the strings of binary bits required for the actual machine operation. It was the standard macro assembler for the International Business Machines Corporation (IBM) 709, 7090 and 7094 computers. William T. Fox's (1964) program for calculating and plotting geological time-trend curves is an early example of its explicit use in the earth sciences. See also: computer program.

Forward difference A finite difference defined as Δx_n ≡ x_{n+1} − x_n. Higher-order differences are obtained by repeated applications of the operator (Nabighian 1966).

Forward model, forward problem A forward model (Parker 1972, 1977) calculates what would be observed from a given conceptual model; it is the prediction of observations, given the values of the parameters defining the model (estimations of model parameters
→ quantitative model → predictions of data), e.g. predicting the gravity field over a salt dome whose characteristics have been inferred from a seismic survey (Sheriff 1984; Gubbins 2004) or structural kinematic modelling (Contreras and Suter 1990). Also called the direct problem (Ianâs and Zorilescu 1968) or normal problem (Sheriff 1984). See also: inverse problem.

Forward selection In both multiple regression and classification (discriminant analysis) there may be a very large number (N) of potential predictors, some of which may be better predictors than others. In order to find the best possible subset of predictors, one could look at the results obtained using every possible combination of 1, 2, . . ., N predictors, but this is often impractical. General strategies are: (i) forward selection, in which the best single predictor is found and retained, all remaining (N − 1) predictors are then evaluated in combination with it, the best two are then retained, etc.; (ii) backward elimination, which begins with all N predictors; each one is eliminated at a time and the best-performing subset of (N − 1) predictors is retained, etc. In either case, selection stops when no further improvement in the regression fit or classification success rate is obtained (Howarth 1973a; Berk 1978).

Forward solution The solution of a forward model. The term is used in this context in geodesy in Bjerhamar (1966) and in geophysics in Everett (1974).

Fourier analysis A data-analysis procedure which describes the fluctuations in an observed time series by decomposing it into a sum of sinusoidal components of different amplitudes, phase and frequency. Named for the French physicist, Jean-Baptiste-Joseph Fourier (1768–1830), who introduced the method (Fourier 1808, 1822). It was early applied in geophysics by the Dutch chemist and meteorologist, Christophorous Henricus Didericus Buys Ballot (1817–1890) and by the German-born British physicist, Franz Arthur Friedrich Schuster (1851–1934) (Buys Ballot 1847; Schuster 1897, 1898); for other early geophysical applications, see: Korn (1938) and Born and Kendall (1941). For examples of geological applications, see: Harbaugh and Merriam (1968), Schwarcz and Shane (1969), Dunn (1974), Camina and Janacek (1984), Weedon (2003) and Gubbins (2004). See also: Fourier frequency, Fourier's theorem, discrete Fourier transform.

Fourier coefficients The French physicist Jean-Baptiste-Joseph Fourier (1768–1830) stated in 1807 that every function f(t) of time, defined over the interval t = 0 to t = 2π, and which has a continuous first derivative except, at most, at a finite number of points in the interval, can be expressed as an infinite series of trigonometric functions: f(t) = a_0 + (a_1 cos t + b_1 sin t) + (a_2 cos 2t + b_2 sin 2t) + ∙∙∙, where a_0, a_1, b_1, a_2, b_2, etc. are the harmonic coefficients (Fourier 1808). This function can be re-expressed as a combination of sine waves: c_n sin(nt + p_n) of frequencies nt, where the
frequency n = 1, 2, 3, ∙∙∙; c_n is the amplitude; and p_n is the phase. In practice, the function may be approximated very closely using a series with a finite number of terms. An example of early usage in geophysics is by the Scottish physicist Cargill Gilston Knott (1856–1922) (Knott 1886). See also: Born and Kendall (1941), Buttkus (1991, 2000), Camina and Janacek (1984) and Gubbins (2004); Fourier analysis, Fourier series, Fourier synthesis, Fourier transform.
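To illustrate the Fourier coefficients entry above, a minimal Python sketch (an addition, not from the original text) estimating the first few harmonic coefficients a_n and b_n of a sampled square wave on [0, 2π); the test function and number of samples are illustrative assumptions.

import numpy as np

n_samples = 1024
t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
f = np.sign(np.sin(t))                       # square wave test function

for n in (1, 2, 3):
    a_n = 2.0 * np.mean(f * np.cos(n * t))   # discrete estimate of (1/pi) * integral of f(t) cos(nt) dt
    b_n = 2.0 * np.mean(f * np.sin(n * t))
    print(n, round(a_n, 4), round(b_n, 4))   # b_1 ~ 4/pi, b_2 ~ 0, b_3 ~ 4/(3*pi)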


Fourier frequency Given a time series of n equally-spaced observations, the j-th Fourier frequency is ω_j = 2πj/n (radians per unit time); it has a period n/j, and so will complete j whole cycles in the length of the data sequence. The sines and cosines of the Fourier frequencies are orthogonal (i.e. uncorrelated). Named for the French physicist Jean-Baptiste-Joseph Fourier (1768–1830) (Fourier 1808). See also: Fourier analysis, Fourier series, Fourier synthesis, Fourier's theorem, Fourier transform.

Fourier integral, Fourier integral transform Formulas which transform a time series function (waveform) into its frequency domain equivalent and vice versa. If X(f) is a representation of x(t) in the frequency domain, they are related by:

x(t) → X(f)  (Fourier transform)
X(f) → x(t)  (inverse transform).

(A physical analogy is a beam of white light passing through a prism and being separated into its continuous frequency components.) Named for the French physicist, Jean-Baptiste-Joseph Fourier (1768–1830) (Fourier 1808). The Fourier analysis of a time series of n equally-spaced observations {x_0, x_1, x_2, . . ., x_{n−1}} is its decomposition into a sum of sinusoidal components, the coefficients of which {J_0, . . ., J_{n−1}} form the discrete Fourier transform of the series, where

J_j = (1/n) Σ_{t=0}^{n−1} x_t e^{−iω_j t},

where i is the imaginary unit √−1; ω_j is the j-th Fourier frequency; and e is Euler's number, the constant 2.71828. In terms of magnitude A and phase φ, J_j = A_j e^{iφ_j}. The development of the theory goes back to work by the German mathematician, Carl Friedrich Gauss (1777–1855) (Gauss 1805), its rediscovery by the American physicist, Gordon Charles Danielson (1912–1983) and the Hungarian-born physicist, Cornelius Lanczos (b. Lánczos Kornél, 1893–1974) (Danielson and Lanczos 1942) in the early days of computers, and its popularisation following development of the Cooley-Tukey algorithm (1965). For discussion see: Heideman et al. (1984), Sorensen et al. (1995), Cooley (1990, 1992), Whittaker and Robinson (1924), Blackman and Tukey (1958) and, in an earth
science context, Tukey (1959a), Camina and Janacek (1984), Buttkus (2000) and Gubbins (2004). See also: Fast Fourier Transform, periodogram, Lomb-Scargle Fourier transform, Fourier series, Fourier synthesis, Fourier's theorem, Blackman-Tukey method. In two dimensions, it may be achieved by optical means, see: optical data-processing.

Fourier pair Operations and functions which Fourier transform into each other, such as: a time function ↔ an equivalent frequency function; e.g. the rectangular Daniell window (amplitude as a function of time, t), F(t) = 0 when |t| > τ/2 and F(t) = 1 when |t| < τ/2, is the pair to the sinc function (amplitude as a function of frequency, v), S(v) = τ sinc(vτ) (Sheriff 1984). The term occurs in Kurita (1973).

Fourier series The re-expression of a function as an infinite series of periodic (sine and cosine) functions whose frequencies are increased by a constant factor with each successive term, as they are integer multiples of a fundamental frequency. Named for the French physicist Jean-Baptiste-Joseph Fourier (1768–1830). See: Cayley (1879), Maxwell (1879), Camina and Janacek (1984), Buttkus (1991, 2000), Gubbins (2004). See also: Fourier analysis, Fourier synthesis, Fourier's theorem, Fourier transform.

Fourier synthesis The process of matching, or synthesising, a waveform by superimposing a set of chosen cosine and/or sine waves of chosen amplitudes and phase. The term was used in early X-ray crystallography (e.g. Bragg 1929) and occurs in geophysics in Nuttli (1964).

Fourier's theorem The French physicist Jean-Baptiste-Joseph Fourier (1768–1830) stated in 1807 (Fourier 1808) that every function f(t) of time, defined in the interval t = 0 to 2π, and which has a continuous first derivative except, at most, at a finite number of points in the interval, can be expressed as an infinite series of trigonometric functions: f(t) = a_0 + (a_1 cos t + b_1 sin t) + (a_2 cos 2t + b_2 sin 2t) + ∙∙∙, where a_0, a_1, b_1, a_2, b_2, etc. are the harmonic coefficients. This function can be re-expressed as a combination of sine waves: c_n sin(nt + p_n) of frequencies nt, where the frequency n = 1, 2, 3, ∙∙∙; c_n is the amplitude; and p_n is the phase. In practice, the function may be approximated very closely using a series with a finite number of terms. An example of early usage in geophysics is by the Scottish physicist Cargill Gilston Knott (1856–1922) (Knott 1886). See also: Born and Kendall (1941), Buttkus (1991, 2000), Camina and Janacek (1984), Gubbins (2004); Fourier analysis, Fourier series, Fourier synthesis, Fourier transform.

Fourier transform Formulas which transform a time series function (waveform) into its frequency domain equivalent and vice versa. If X(f) is a representation of x(t) in the frequency domain, they are related by:
x(t) → X(f)  (Fourier transform)
X(f) → x(t)  (inverse transform).


(A physical analogy is the splitting of a beam of white light by passing it through a prism, separating it into its continuous frequency components.) Named for the French physicist, Jean-Baptiste-Joseph Fourier (1768–1830). For discussion see: Whittaker and Robinson (1924), Blackman and Tukey (1958) and, in an earth science context, Tukey (1959a), Camina and Janacek (1984), Buttkus (1991, 2000), Gubbins (2004). See also Smalley (2009); Fourier analysis, Fourier series, Fourier synthesis, Fourier's theorem, Blackman-Tukey method, discrete Fourier transform, Fast Fourier transform, inverse Fourier transform, Lomb-Scargle Fourier transform. In two dimensions, it may be achieved by optical means, see: optical data-processing.

Fractal A fractal is an object made up of parts which are similar to the whole in some way, either exactly the same or statistically similar, except for their size (such as a fracture pattern in a rock), i.e. their shape is independent of scale. Their occurrence is characterised by a power-law size-frequency distribution, where the number of observed objects N of size A is N = cA^{−D}, where c is a constant of proportionality and D is a non-integer constant which may, in some circumstances, gradually change in value with time. A fractal is said to possess the property of self-similarity. The classic example is the measured length of a coastline as a function of ruler-size, described by the British mathematician, Lewis Fry Richardson (1881–1953) (Richardson 1960a, b). His work was developed by the Polish-born French-American mathematician, Benoît B. Mandelbrot (1924–2010), who coined the term fractal in 1975 (Mandelbrot 1967, 1975a, b, 1977, 1982). The constant D is known as the fractal dimension. Geometric objects which possess self-similarity can, however, be traced back to the nineteenth century and earlier. See also Unwin (1989), Turcotte (1992), Herzfeld (1993), Barton and La Pointe (1995) and Agterberg and Cheng (1999); fractal distribution, multifractal.

Fractal dimension Geometric fractals are geometric shapes possessing a fractional fractal dimension (D, D > 0). A coastline is the best-known example: if d (> 0) is a measure of length then N(d) = cd^{−D}, where N(d) is the number of straight-line segments of length d (or square boxes of side d) needed to cover the fractal object (coastline) and c is a constant of proportionality. A graph of log[N(d)] (y-axis) as a function of log(d) (x-axis) will be linear with a slope of −D, where D is the fractal dimension, also known as the Hausdorff dimension. A circle has D = 1; the Cantor set has D ≈ 0.633. First described by the German mathematician, Felix Hausdorff (1868–1942) (Hausdorff 1918), the concept was later explored in depth and popularised by the Polish-born French mathematician, Benoît B. Mandelbrot (1924–2010). He originally called it "fractional dimension" (1967), but replaced this by "fractal dimension" in Mandelbrot (1975a, b, 1977). The "roughness" of a self-affine fractal is described by the Hurst exponent. See Esteller et al. (1999) for a review
of some more recent estimators of fractal dimension with time series applications; Kenkel (2013) discusses sample size requirements. There are alternative estimators of fractal dimension (the Minkowski-Bouligand dimension, sandbox dimension, Hausdorff dimension). See also Adler (1981), Herzfeld (1993), Barton and La Pointe (1995), Agterberg and Cheng (1999) and Esteller et al. (1999).

Fractal distribution A fractal (a term coined by Mandelbrot 1975a, b) is an object made up of parts which are similar to the whole in some way. These parts will be either exactly the same or statistically similar, except for their scale (size). They are said to possess the property of self-similarity. Geometric fractals are geometric shapes possessing a fractional fractal dimension (D > 0). A coastline is the best-known example: if d (> 0) is a measure of length then N(d) = cd^{−D}, where N(d) is the number of straight-line segments of length d (or boxes of area d × d) needed to cover the coastline and c is a constant of proportionality. A graph of log[N(d)] (y-axis) as a function of log(d) (x-axis) will be linear with a slope of −D. Geological entities with self-similar properties are described by the Pareto distribution (Pareto's law), N(x) = cx^{−a}, where 0 < x < ∞, in the continuous case, or the equivalent Zipf distribution (Zipf's law) where x is discrete. Although fractal distributions may, in many cases, be approximated by a lognormal distribution, the power-law distribution, unlike the lognormal, does not include a characteristic length-scale and is thus more applicable to scale-invariant phenomena. The theory was first developed by the German mathematician, Felix Hausdorff (1868–1942) (Hausdorff 1918), and popularised by the work of the Polish-French mathematician, Benoît B. Mandelbrot (1924–2010) (Mandelbrot 1967, 1975a, 1975b, 1977, 1982).

Fractal sponge See Menger sponge.

Fréchet distribution The extreme value distribution is the distribution of the largest (smallest) observation in a sample. One of the models for this is the Fréchet distribution, which has a probability density function:

f(x; a, b) = (s/b) [b/(x − a)]^{s+1} e^{−[b/(x − a)]^{s}},

where a > 0 is the location parameter; b > 0 is the scale parameter; and s > 0 is a shape parameter. It is named for the French mathematician, Maurice Fréchet (Fréchet 1927); see also the Gumbel distribution and the Weibull distribution. Applications include: earthquake
magnitude, seismic hazard intensity and rates, and flood-frequency analysis (see under extreme value distribution for geological references).

Frequency, frequencies
1. In statistical usage, frequency is a count of the number of occurrences of a given type of event, or the number of a given population falling into a given size class (bin); if expressed as a proportion of the total count, it is termed relative frequency. See also: frequency distribution, histogram. 2. In a time series, it is the rate of repetition, i.e. the number of cycles per unit time (cycles per second; hertz), the reciprocal of period. The idea was discussed by the Italian mathematician and physicist, Giovanni Battista Benedetti (1530–1590) (Benedetti 1585) and the term was used by the Swiss mathematician, Leonhard Euler (1707–1783) (Euler 1727); by 1908, it was being used in wireless telegraphy. For discussion in the context of digital signal processing, see Blackman and Tukey (1958), and in earth science, Weedon (2003) and Gubbins (2004). See also: angular frequency, Nyquist frequency, sampling frequency, sinusoid, amplitude.

Frequency convolution theorem Also known as the convolution theorem, it states that the Fourier transform of the product of two time functions, f(t) and g(t), is equal to the convolution of their individual transforms, F(ω) and G(ω), and vice versa: f(t)g(t) ↔ F(ω)*G(ω), where * indicates convolution and the double-headed arrow denotes a Fourier transform pair (Sheriff 1984). Its converse is the time convolution theorem: f(t)*g(t) ↔ 2πF(ω)G(ω). The term Faltung theorem was used by the Austro-Hungarian born American mathematician Salomon Bochner (1899–1982) (Bochner 1932) and its English equivalent, "convolution theorem," appeared about 1935 (e.g. Haviland 1935).

Frequency distribution A specification of the way in which the frequency count (or relative frequency) of occurrence of the members of a population are distributed according to the values of the variable which they exhibit. In relative frequency distributions the counts per class are normalized by dividing through by the total number of counts. The term frequency distribution, introduced by the British statistician, Karl Pearson (1857–1936) (Pearson 1895), applies to observed distributions. Conceptual models are described by probability distributions. Early discussion in earth science textbooks includes: Krumbein and Pettijohn (1938), Miller and Kahn (1962) and Krumbein and Graybill (1965); see Helsel (2005) for discussion of the treatment of geochemical data containing nondetects. See also the: additive logistic normal, additive logistic skew-normal, Bernstein, Beta, bimodal, Bingham, binomial, bivariate, broken-line, Burr-Pareto logistic, Cauchy, Chi-squared, cumulative, Dirichlet, discrete, double-exponential, exponential, extreme value, Fisher, fractal, Gamma, generalized Pareto, geometric, joint, Kent, Laplace, log-geometric, log-hyperbolic, logistic, logistic-normal, log-logistic, lognormal, logskew normal, marginal, mixture, multinomial,
multivariate Cauchy, multivariate lognormal, multivariate logskew normal, multivariate normal, multivariate skew-normal, negative binomial, normal, Pareto, Poisson, Rosin-Rammler, shifted Pareto, skew, skew-normal, standard normal, stretched Beta, superposition, triangular, truncated, truncated Pareto, uniform, von Mises, Weibull and Zipf distributions.

Frequency distribution decomposition, frequency distribution splitting Methods for decomposing an observed frequency distribution into two or more subpopulations, based on estimation of the parameters of each subpopulation and the relative proportions in which they are combined. It requires assumptions as to the appropriate model to be used for the densities of the subpopulations (e.g. normal, lognormal) and the number of subpopulations likely to be present. There is no unique solution, as more than one model may be a good fit to the observed distribution. Early fitting methods (Lepeltier 1969; Sinclair 1974, 1976) used in the earth sciences were graphical, but soon gave way to computational solutions (e.g. Clark and Garnett 1974). The first attempt to do this in the geological literature appears to be by the British petrologist, William Alfred Richardson (1887–1965) (Richardson 1923), using the method of moments originally described by the British statistician, Karl Pearson (1857–1936) (Pearson 1894).

Frequency domain A representation in which frequency is the independent variable; the expression of a variable as a function of frequency as opposed to a function of time (Robinson and Treitel 1964; Camina and Janacek 1984; Sheriff 1984; Bezvoda et al. 1990). See: Fourier transform, amplitude spectrum, phase spectrum.

Frequency matrix A matrix (table) of absolute frequency values. This term was used by Boring (1941), but in the earth sciences it is more usually used in the context of transition frequency.

Frequency mixing In the case of imposed amplitude modulation, in which a long-period sinusoidal wavelength with frequency f1 is imposed on another with frequency f2, f1 > f2, minor combination tones will be generated at frequencies 1/f = 1/f1 ± 1/f2, the upper and lower sidebands on either side of the dominant frequency (f2). These appear as symmetrically placed minor-amplitude peaks on either side of f2 in the power spectrum of the resulting waveform. The term combination tone was used in acoustics by the German physicist, Georg Simon Ohm (1787–1854) (Ohm 1839). They are also called interference beats and interference tones; their generation is known as intermodulation or frequency mixing. The primary combination tone at f1 + f2 is known as a summation tone, and that at f1 − f2 as a difference tone. When a component frequency is higher than a fundamental frequency, it is called an overtone, and a difference tone at a lower frequency than the fundamental is called an undertone. For discussion in a geoscience context see King (1996) and Weedon (2003).
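A hedged modern sketch of the frequency distribution decomposition entry above (not part of the original text): a two-component normal mixture fitted by expectation-maximisation with scikit-learn, rather than the graphical or moment-based methods cited in the entry; the simulated subpopulations are illustrative assumptions only.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
background = rng.normal(loc=1.0, scale=0.3, size=400)   # e.g. log10 background values
anomalous = rng.normal(loc=2.2, scale=0.4, size=100)    # e.g. log10 anomalous values
x = np.concatenate([background, anomalous]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(x)
print(gm.means_.ravel())                  # estimated subpopulation means
print(np.sqrt(gm.covariances_).ravel())   # estimated subpopulation standard deviations
print(gm.weights_)                        # estimated mixing proportions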


Frequency modulation (FM) A process in which a waveform to be transmitted is encoded in a constant amplitude “carrier wave” by altering its wavelength (frequency) such that it is proportional to the amplitude of the waveform to be transmitted. The technique was patented by the American electrical engineer, Edwin Howard Armstrong (1890–1954) in 1933 (Armstrong 1936). See also: Weedon (2003). Frequency polygon A graph similar to a histogram, but in which the lines showing frequency as a function of magnitude are joined up via the centres of each magnitude class, rather than as a stepped line with horizontals drawn from each class minimum to maximum. Early examples occur in Perrey (1845) for orientation data and Robinson (1916) for geochemical data; see also Krumbein and Pettijohn (1938).


Frequency pyramid Krumbein (1934a) noted with disapproval that "the term 'histogram' appears to have been discarded in favor of frequency pyramid by some workers in mechanical analysis [of sediment size distributions]." However, the term still occasionally appears in the literature, e.g. Simons and Sentürk (1992).

Frequency response, frequency response function The characteristics of a system viewed as a function of frequency (Silverman 1939; Sheriff 1984; Buttkus 1991, 2000).

Frequency-selective filter Algorithms for selectively removing noise from a time series (or a spatial set of data), smoothing, or for enhancing particular components of a signal by removing components that are not wanted. A low-pass filter (e.g. moving average and similar smoothing operators) passes frequencies below some cut-off frequency while substantially attenuating higher frequencies. A high-pass filter does the opposite, attenuating frequencies below some cut-off value while passing higher frequencies (it may be used to emphasise anomalies in the data with unusually large positive, or negative, magnitudes). A band-pass filter attenuates all frequencies except those in a given range between two given cut-off frequencies and may also be applied to smoothing of a periodogram. One form of a band-pass filter can be made by using a low-pass and a high-pass filter connected in series. Information in the passband frequencies is treated as signal, and that in the stopband is treated as unwanted and rejected by the filter. There will always be a narrow frequency interval, known as the transition band, between the passband and stopband in which the relative gain of the passed signal decreases to its near-zero values in the stopband. Electrical low-pass, high-pass and band-pass "wave filters" were initially conceived by the American mathematician and telecommunications engineer, George Ashley Campbell (1870–1954) between 1903 and 1910, in his work with colleagues, physicist Otto Julius Zobel (1887–1970) and mathematician Hendrik Wade Bode (1905–1982), but the results were not published until some years later (Campbell 1922; Zobel 1923a, b, c; Bode 1934). Equivalent filters were introduced into digital signal processing by the American statistician, John Wilder Tukey (1915–2000) and mathematician Richard Wesley Hamming (1915–1998) (Tukey and Hamming 1949; Wiener 1949).


Parallel theoretical background was provided by the work of the American physicist, George W. Steward (1876–1956), who worked on acoustics between 1903 and 1926 and solved the fundamental wave equations involved in acoustic filter design (Crandall 1926). See Buttkus (1991, 2000), Camina and Janacek (1984), Gubbins (2004), Howarth et al. (1980) and Vistelius (1961) for discussion in an earth sciences context. A short worked sketch is given after the Fry arc entry below.

Frequency spectrum A waveform g(t) and its frequency spectrum G(f) (the variation of amplitude and phase as a function of frequency), where t is time and f is frequency (cycles/unit time), are Fourier transform pairs. G(f) is usually a complex-valued function of frequency, extending over all positive and negative frequencies. It may be written in polar form as G(f) = |G(f)|e^{iφ(f)}, where i is the imaginary unit √(−1) and e is Euler's number, the constant 2.71828; the magnitude |G(f)| is called the amplitude spectrum, and the angle φ(f) is called the phase spectrum. The theory (Blackman and Tukey 1958) was originally developed by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990).

Frequency-wavenumber (f-k) analysis Also referred to as f-k analysis (Linville and Laster 1966), frequency-wavenumber analysis is the search for identifiable sets of data in the frequency-wavenumber domain; used to examine the direction and apparent velocity of seismic waves (Sheriff 1984; Buttkus 1991, 2000). In f-k analysis, the seismic energy density within a given time period is plotted and contoured in the f-k plane. Early use of the method was by American geophysicists, Andrew F. Linville Jr., Stanley J. Laster (Linville and Laster 1966) and Jack J. Capon (1931–1999) (Capon 1969, 1973). See also: κ-κ domain, Nyquist wavenumber.

Frequency-wavenumber (f-k) plane The frequency-wavenumber plane (sometimes referred to as the f-k plane, e.g. Linville and Laster 1966; Mari 2006) is often used to analyse the result of a two-dimensional Fourier transform in the time-distance domain of a seismic record (Sheriff 1984) or multivariate time series (Gubbins 2004). See also: frequency-wavenumber analysis; κ-κ domain.

Frequency-wavenumber response The frequency-wavenumber response of quadrant, multi-channel and velocity filters is discussed by Buttkus (1991, 2000).

Frequency-wavenumber spectra These are discussed by Buttkus (1991, 2000) in the context of frequency-wavenumber analysis.

Frequency window This is the Fourier transform of a data window. The term was introduced by the American statistician, John Wilder Tukey (1915–2000), and mathematician Richard Wesley Hamming (1915–1998) (Tukey and Hamming 1949). See also:


Blackman and Tukey (1958) and, for discussion in an earth science context, Robinson (1967b) and Weedon (2003). Fry arc The hemisphere locus in some stress space (its coordinates being weighted matrix elements of the stress tensor) of the conditions able to generate one fault datum which consists of: fault plane orientation, slip direction, and slip sense (assuming slip is in the direction of maximum resolved shear stress). This particular stress space was introduced by the British structural geologist, Norman Fry (1946–) (Fry 1979). The arcuate nature of a lower-dimensional analogue was illustrated by Fry (2001). The term Fry arc was subsequently introduced, along with improved specification of the stress space, by the Japanese structural geologists, Atsushi Yamaji (1958–) and Katsushi Sato in 2006.
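As a concrete illustration of the frequency-selective filter entry above, the following Python sketch (an illustrative addition; the cut-off period, window width and synthetic series are arbitrary choices) applies a simple moving-average low-pass filter and forms the complementary high-pass output by subtraction.

```python
import numpy as np

def moving_average_lowpass(x, width):
    """Smooth a series with an equal-weight moving average of odd width."""
    w = np.ones(width) / width
    return np.convolve(x, w, mode="same")

# Synthetic series: slow sinusoid (signal) plus a fast sinusoid treated as noise.
t = np.arange(500)
series = np.sin(2 * np.pi * t / 100) + 0.4 * np.sin(2 * np.pi * t / 8)

low = moving_average_lowpass(series, 11)   # passes the long-period component
high = series - low                        # complementary high-pass residual

# The residual is dominated by the short-period component.
print(round(np.corrcoef(high[20:-20], np.sin(2 * np.pi * t / 8)[20:-20])[0, 1], 2))
```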


Fry diagram A graphical technique for determining the characteristics of a strain ellipse from field data (e.g. deformed ooliths, pebbles): The point-centres of the original objects are initially plotted and numbered 1 to n on a transparent overlay (A). A second overlay (B) is centred at point 1 and point-centres 2 to n are marked on it. Overlay B is then shifted parallel to a given azimuth and moved until it is centred at point 2, and points 1, 3, 4, ⋯, n are plotted; this process is repeated for all remaining points. Eventually the point-density pattern reveals the shape and orientation of the strain ellipse about the centre of sheet B, from which its orientation and ellipticity may be determined. Named for the British geologist, Norman Fry (1946–) who showed (Fry 1979) that the method could be used to indicate strain and that it is available when other methods are not, because markers are undeformed and their initial-neighbour relationships are not identifiable.

Full normal plot (FUNOP) A robust graphical procedure for detecting unusually large or small values in a frequency distribution, introduced by the American statistician, John Wilder Tukey (1915–2000) (Tukey 1962). The n observed values of a variable, x1, ∙∙∙, xn, are first sorted into order of ascending magnitude and the median (M) of these values is calculated. These are then transformed to Judds, where the i-th Judd = (xi − M)/Qi, and Qi is the quantile of the standard normal distribution equivalent to the plotting proportion i/(n + 1). A divisor of (n + 1) is used to allow for the fact that the possible extremes of the sampled distribution are unlikely to have been observed. The Judds are plotted on the y-axis as a function of i. If the observations all corresponded to a normal (lognormal, if the data were first transformed to logarithms) distribution, the Judds would be nearly equal to their standard deviation, and the graph would be linear. See Koch and Link (1970–1971) for discussion in an earth science context.

Function An association between the elements of two sets; a rule (e.g. an equation) which gives a fixed output for a given input value; any defined procedure which relates one number to one or more others. The Latin equivalent of this term (functio) was introduced by the German mathematician and philosopher, Gottfried Wilhelm von Leibniz (1646–1716) (Leibniz 1692).

Function space A class X of functions (with fixed domain and range) in either real or complex topological vector space, together with a norm which assigns a non-negative number ‖f‖_X to every function f in X; typically ‖f‖_X > 0 for non-zero f. The set of all polynomial functions is a subspace of the space of continuous functions on the interval [a, b]. For example, y(t) = α1 sin ωt + α2 cos ωt is a function in a two-dimensional functional space having sin ωt and cos ωt as fundamental functions. The scalar product of two functions f and g, defined on the interval [a, b], usually called the inner product, is given by:

$$\langle f, g \rangle = \int_a^b f(x)\,g(x)\,dx$$

and if this integral is zero, the functions f and g are said to be orthogonal. The "distance" between the two functions is

$$\| f - g \| = \langle f - g,\, f - g \rangle^{1/2}.$$

The term function space was used by Shaw (1918) and may have been in use earlier in the 1900s; see Gubbins (2004).

Function value, functional value The value of a function f(x) corresponding to a given value of x. An argument is the independent variable of a function or the particular value at which a function is evaluated; e.g. if y = 3 − 2x + 5x², the argument x = 3 yields the functional value, y = 42 (Cayley 1879; Camina and Janacek 1984).

Functional regression The estimation of a smooth functional response ("a curve") to a scalar explanatory variable where repeated measurements of the function are available, using nonparametric modelling of the shape of the curves (Ramsay and Silverman 1997; Calvo 2013). Manté et al. (2007) discuss this type of problem with reference to grain-size analysis.

Fundamental frequency, fundamental tone The lowest frequency in a harmonic series (Hamming 1977; Weedon 2003); it corresponds to the shortest interval in which a periodic function exactly repeats itself, the period. In acoustics, the concept goes back to the earliest days of music.
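To make the inner-product definition in the function space entry concrete, here is a small Python check (an illustrative sketch, not from the dictionary, using trapezoidal quadrature from NumPy) that sin ωt and cos ωt are orthogonal over one full period.

```python
import numpy as np

# Inner product <f, g> = integral of f(t) g(t) over [a, b], approximated here
# by trapezoidal quadrature on a fine grid over one period [0, 2*pi].
omega = 1.0
t = np.linspace(0.0, 2 * np.pi, 20001)

def inner(f_vals, g_vals, t):
    return np.trapz(f_vals * g_vals, t)

print(round(inner(np.sin(omega * t), np.cos(omega * t), t), 6))  # ~0.0 (orthogonal)
print(round(inner(np.sin(omega * t), np.sin(omega * t), t), 6))  # ~pi (squared norm of sin)
```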


Fundamental theorem of calculus The first fundamental theorem of calculus implies that, if f and F are real-valued continuous functions on the closed interval [a, b] and F is the indefinite integral of f on [a, b], then

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

The generalised mathematical theory was developed by the English physicist and mathematician, Isaac Newton (1642–1727) in 1666 and was first expressed in the now-familiar notation of calculus by the German mathematician and philosopher, Gottfried Wilhelm von Leibniz (1646–1716) in 1675, although it remained unpublished until 1684. It is mentioned in a geophysical context by Camina and Janacek (1984).

Fuzzy classification, fuzzy c-means clustering (FCM) Most clustering algorithms partition a data set X into c clusters which are distinct, nonempty, pairwise disjoint and, via union of the subsets, reproduce X. Each member of X may only be associated with membership of a single subset or cluster. Such partitioning may be thought of as "hard." By contrast, fuzzy partitioning enables all members of X to have a stated affinity ("membership function") with each of the c clusters. This may range between 0 and 1 in each case. A minimal sketch of the basic algorithm is given after the next entry. See also fuzzy logic; Bezdek (1981), Bezdek et al. (1984; corrections in Bezdek 1985) and geoscience applications in Granath (1984), Vriend et al. (1988), Frapporti et al. (1993), Demicco and Klir (2004), Porwal et al. (2004) and Kim et al. (2013).

Fuzzy logic, fuzzy set A fuzzy set is one in which there are many grades of membership. The term was coined by the Azerbaijani-American electrical engineer and computer scientist, Lotfi Asker Zadeh (1921–) (Zadeh 1965). It is an extension of conventional (Boolean) logic, which allows for only "true" or "false" states, to include cases which are equivalent to "partially true" and "partially false" situations, by means of a probabilistic "membership function" which categorises the degree to which membership of a set applies. Zadeh showed how set theoretic operations could be calculated with this type of information, leading to "fuzzy reasoning" (Zadeh 1975, 1983). See also Klir (2004), Rao and Prasad (1982), Kacewicz (1987), Bardossy et al. (1989, 1990).
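The following Python sketch (an illustrative addition, not taken from Bezdek's papers; the array shapes and the fuzziness exponent m = 2 are arbitrary choices) shows the basic fuzzy c-means iteration: alternately updating cluster centres and fuzzy memberships until the membership matrix stabilises.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Basic fuzzy c-means: X is (n_samples, n_features); returns (centres, memberships)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, each row summing to 1 over the c clusters.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centres: membership-weighted means of the observations.
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances of every observation to every centre (small floor avoids division by zero).
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        if np.allclose(U_new, U, atol=1e-6):
            U = U_new
            break
        U = U_new
    return centres, U

# Two well-separated synthetic groups; memberships come out close to 0 or 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centres, U = fuzzy_c_means(X, c=2)
print(np.round(centres, 2))
```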

G

Gabriel biplot Graphical display of the rows and columns of a rectangular n × p data matrix X, where the rows generally correspond to the specimen compositions, and the columns to the variables. In almost all applications, biplot analysis starts with performing some transformation on X, depending on the nature of the data, to obtain a transformed matrix Z, which is the one that is actually displayed. The graphical representation is based on a singular value decomposition of matrix Z. There are essentially two different biplot representations: the form biplot, which favours the display of individuals (it does not represent the covariance of each variable, so as to better represent the natural form of the data set), and the covariance biplot, which favours the display of the variables (it preserves the covariance structure of the variables but represents the samples as a spherical cloud). Named for the German-born statistician, Kuno Ruben Gabriel (1929–2003) who introduced the method (Gabriel 1971). See also: Greenacre and Underhill (1982) and Aitchison and Greenacre (2002); and, in an earth science context, Buccianti et al. (2006).

Gain An increase (or change) in signal amplitude (or power) from one point in a circuit or system to another, e.g. the amplitude of the system output compared with that of its input. The term occurs in Nyquist (1932) and in geophysics in Lehner and Press (1966) and Camina and Janacek (1984).

Gain function A function defining how the amplitude of a waveform changes as a result of passing through a filter (Robinson 1967b; Weedon 2003).
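A minimal Python sketch of the singular-value-decomposition step behind the Gabriel biplot (an illustrative addition; here the transformed matrix Z is simply the column-centred data matrix, and only the first two singular axes are kept):

```python
import numpy as np

def biplot_coordinates(X, kind="form"):
    """Return 2-D row (sample) and column (variable) coordinates for a biplot."""
    Z = X - X.mean(axis=0)            # column-centred data matrix
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    if kind == "form":                # form biplot: samples carry the singular values
        rows = U[:, :2] * s[:2]
        cols = Vt[:2, :].T
    else:                             # covariance biplot: variables carry them
        rows = U[:, :2]
        cols = Vt[:2, :].T * s[:2]
    return rows, cols

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
rows, cols = biplot_coordinates(X, kind="covariance")
print(rows.shape, cols.shape)  # (30, 2) (4, 2)
```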


Gain-loss diagram 1. Pairs of side-by-side divided bar charts were used by the American geologist and petroleum engineer, Joseph Bertram Umpleby (1883–1967) in Umpleby (1917), Butler et al. (1920) and Burbank and Henderson (1932) to compare the major element oxide gains or losses in a rock affected by metamorphism, e.g. the transition from limestone to wollastonite, diopside, etc.; from quartz monzonite to sericitised quartz monzonite, etc. Oxide percentages of each major constituent were multiplied by the specific gravity of the rock to express the changes relative to 100 cm³ of the unaltered rock; the proportional length of each bar was then equivalent to the same volume of unaltered rock. It is also known as an addition-subtraction diagram. 2. Use of a Harker variation diagram to back-calculate the composition of material added to, or subtracted from, a magma (Cox et al. 1979).


Galerkin's method Named for the Russian mathematician and structural engineer, Boris Grigorievich Galerkin (1871–1945) who gave the first examples of the application of a finite-element based method for the approximate integration of differential equations (Galerkin 1915). Mathews (2003) outlines the method as follows: Given a differential equation

$$D[y(x)] + f(x) = 0 \qquad (1)$$

over the interval a ≤ x ≤ b, where D[y(x)] represents the computation of a derivative d/dx, then multiplying equation (1) by an arbitrary weight function w(x) and integrating over the interval [a, b] yields:

$$\int_a^b w(x)\,\bigl(D[y(x)] + f(x)\bigr)\,dx = 0. \qquad (2)$$

Because w(x) is an arbitrary function, equations (1) and (2) are equivalent. Introducing a trial solution u(x) to (1) of the form

$$u(x) = \phi_0(x) + \sum_{j=1}^{n} c_j\,\phi_j(x),$$

where {ϕi(x)}, i = 0, ⋯, n, is a set of a finite number of linearly independent functions, y(x) on the left side of equation (1) is then replaced by u(x). The residual r(x) is then given by

$$r(x) = D[u(x)] + f(x).$$

The aim is to choose the w(x) such that

$$\int_a^b w(x)\,\bigl(D[u(x)] + f(x)\bigr)\,dx = 0.$$

In Galerkin's method, the weight functions are chosen from the basis functions {ϕi(x)}, i = 1, ⋯, n, so that

$$\int_a^b \phi_i(x)\,\bigl(D[u(x)] + f(x)\bigr)\,dx = 0 \quad \text{for } i = 1, 2, \cdots, n.$$

The first account of his method in the Western literature was given by Duncan (1937, 1938) and it has since become used in earth science for solving problems involving fluid flow through porous media, such as in groundwater flow (Pinder and Frind 1972) or reservoir modelling (Gottardi and Mesini 1987). Gamma analysis Because the different lithofacies in a stratigraphic section may have different sedimentation rates, the true time series for a section will differ from that of the observed spatial series. Gamma analysis is a method introduced by the American geophysicist, Michelle Anne Kominz, to try to correct this problem. The γ-value for each lithofacies is a unique thickness-time conversion factor. The period Tj (γ-time unit) of the j-th parasequence (i.e. a relatively conformable succession of genetically related beds or bedsets bounded by marine flooding surfaces and their correlative surfaces) is equal to the sum over all lithofacies in the parasequence of the product of their γ-values and thicknesses. Tj is initially assumed to be unity and the set of equations is then iteratively solved for optimum γ-values for the various facies, which stabilize as outlier parasequences are successively removed. The duration of the i-th facies in a particular parasequence is then given by the product of γ i and its thickness. The sum over all parasequences and facies gives the duration of the entire sequence (apart from hiatuses represented by entirely missed parasequences). The corrected γ-tuned time series then yields a γ-tuned spectrum, e.g. via the multitaper method. See: Kominz and Bond (1990), Kominz et al. (1991), Kominz (1996), Weedon (2003) and Yang and Lehrmann (2003).


Gamma distribution A probability distribution of the form

$$F(x;\beta,\lambda) = \frac{x^{\lambda-1}\,e^{-x/\beta}}{\beta^{\lambda}\,\Gamma(\lambda)}$$

where β is a scale parameter; λ is a shape parameter; Γ is the Gamma function; and e is Euler's number, the constant 2.71828. Originally known as a Pearson Type III distribution, it was introduced by the British statistician, Karl Pearson (1857–1936) in 1895, but it became known as the Gamma distribution in the 1930s because of its incorporation of the Gamma function. Mentioned in an earth science context in Krumbein and Graybill (1965), Vistelius (1980, 1992), Burgess and Webster (1986) and Li et al. (2012).
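A short Python sketch evaluating this density (an illustrative addition; it uses the Gamma function from the standard library and checks, by crude numerical integration, that the density integrates to approximately one for the arbitrary choice β = 2, λ = 3):

```python
import math
import numpy as np

def gamma_pdf(x, beta, lam):
    """Gamma density with scale beta and shape lam (lambda)."""
    return x ** (lam - 1) * math.exp(-x / beta) / (beta ** lam * math.gamma(lam))

beta, lam = 2.0, 3.0
x = np.linspace(1e-6, 60.0, 200001)
pdf = np.array([gamma_pdf(v, beta, lam) for v in x])
print(round(np.trapz(pdf, x), 4))  # ~1.0
```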


Gamma function The ("complete") Gamma function, denoted Γ(z), is a so-called Eulerian integral of the second kind:

$$\Gamma(z) = \int_0^{\infty} x^{z-1}\,e^{-x}\,dx,$$

where e is Euler's number, the constant 2.71828. Γ(z) ≠ 0 for all x. However, Γ(z) > 0 for all x > 0 and whenever −2 < x < −1, −4 < x < −3, −6 < x < −5, etc.; Γ(z) < 0 when −1 < x < 0, −3 < x < −2, −5 < x < −4, etc. The multivaluedness of the function is eliminated by setting x^(z−1) = e^((z−1) ln x); then Γ(z) becomes

$$\Gamma(z) = \lim_{n \to \infty}\; n^{z} \Big/ \Big[\, z\,(1+z)\Big(1+\frac{z}{2}\Big)\Big(1+\frac{z}{3}\Big)\cdots\Big(1+\frac{z}{n}\Big) \Big].$$

Although its introduction has been attributed to the Swiss mathematician Leonhard Euler (1707–1783) in a letter to the Russian mathematician Christian Goldbach (1690–1764) in 1729 (Godefroy 1901; Davis 1959; Sebah and Gourdon 2002), see also Dutka (1991). The Gamma symbol (Γ) was first used to denote the function by the French mathematician and geodesist, Adrien-Marie Legendre (1752–1833) (Legendre 1811). Earth science applications include Brutsaert (1968), Vistelius (1980, 1992) and Kagan (1993). See also incomplete Gamma function. There are standard algorithms available to calculate values of these functions (Pearson 1922).

Gamnitude The magnitude of a complex number. A term used in cepstrum analysis (Bogert et al. 1963; Oppenheim and Schafer 2004) for the equivalent of magnitude in traditional spectral analysis.


Gauss error function The integral:

$$\int_x^{\infty} e^{-x^{2}/2}\,dx,$$

the term error function, and the abbreviation for it (originally Erf), were introduced by the British physicist, James Whitbread Lee Glaisher (1848–1928) (Glaisher 1871). e is Euler's number, the constant 2.71828. Today it is generally expressed as:

$$\operatorname{erf}(t) = \frac{2}{\sqrt{\pi}} \int_0^{t} e^{-y^{2}}\,dy$$

and the complementary error function is: erfc(t) = 1 − erf(t). The former is also known as the Gauss error function. The term is used in a more general sense by Berlanga and Harbaugh (1981).

Gauss's theorem This states that the flux through a surface (or the integral of the vector flux density over a closed surface) is equal to the divergence of the flux density integrated over the volume contained by the surface (Sheriff 1984). This result appears to have been independently discovered by a number of scientists in the early nineteenth century, but is generally attributed as Gauss's theorem (or Gauss's divergence theorem) to the German mathematician and physicist, Carl Friedrich Gauss (1777–1855) (Gauss 1813), and as Green's theorem to the English mathematician and physicist, George Green (1793–1841) (Green 1828). The term divergence theorem was used by Heaviside (1892a) but may well have come into being before that. Mentioned in an earth science context by Camina and Janacek (1984) and Gubbins (2004), but see also the discussion in Macelwane (1932).

Gaussian A single quantity, or a finite number of quantities, distributed according to the Gaussian distribution (Blackman and Tukey 1958).

Gaussian distribution, normal distribution Also known as the normal distribution. This is one of the most important frequency distributions since its properties are well known and other distributions (e.g. the lognormal) can be conveniently modelled using it. The probability distribution is given by:

$$f(x; m, s) = \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-m}{s}\right)^{2}}; \quad -\infty \le x \le \infty.$$

Its parameters are: m, the mean; and s, the standard deviation; e is Euler's number, the constant 2.71828. The frequency distribution is symmetric and "bell-shaped." The frequency distribution of analytical (chemical or other) measurement errors measured repeatedly over a short interval of time generally conforms fairly closely to this model. The term was introduced by the British statistician, Karl Pearson (1857–1936) in 1895. Its description as a bell-shaped curve appears to date from usage in the 1930s. For discussion in a geological context see: Miller and Kahn (1962), Krumbein and Graybill (1965), Vistelius (1980), Thompson and Howarth (1980), Camina and Janacek (1984), Buttkus (1991, 2000) and Reimann and Filzmoser (2000). Pearson did not think it a good model for the law of error, nor did he approve of the name "normal distribution" (as in his experience many results tended to have asymmetrical distributions and therefore did not conform to this distribution). He preferred the term "Gaussian curve of errors" (Pearson 1902), named for the German mathematician, Carl Friedrich Gauss (1777–1855), who derived its properties in a work on mathematical astronomy (Gauss 1809a). Although the term Gaussian distribution became more widely used after 1945, it never became as frequent as did normal distribution (Google Research 2012).
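A small Python check of the relationship between the error function and the normal distribution entries above (an illustrative addition; it relies on the standard-library erf and the identity Φ(x) = ½[1 + erf(x/√2)] for the standard normal cumulative distribution):

```python
import math

def normal_cdf(x, m=0.0, s=1.0):
    """Cumulative probability of a normal(m, s) variate, via the error function."""
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))

# About 68.3% of a normal population lies within one standard deviation of the
# mean, and about 95.4% within two.
print(round(normal_cdf(1) - normal_cdf(-1), 4))   # 0.6827
print(round(normal_cdf(2) - normal_cdf(-2), 4))   # 0.9545
```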


Gaussian elimination, Gauss elimination A method of solving a set of m simultaneous equations in n unknowns AX = b, calculating determinants and obtaining a matrix inverse. It enables an augmented matrix formed by combining the coefficients and the results of the original equations:

$$[\,\mathbf{A}\;\mathbf{b}\,] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{bmatrix}$$

to be reduced to an upper triangular form:

$$\begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1n} & d_1 \\ 0 & c_{22} & \cdots & c_{2n} & d_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & c_{mn} & d_m \end{bmatrix}$$

Successive backward substitution in the equations corresponding to each row in this matrix, proceeding upwards row-by-row from the solution of the last equation, located in the lowest row, yields the desired set of results, from the last unknown back to the first. Earth science applications include kriging (Carr et al. 1985; Freund 1986; Carr 1990; Carr and Myers 1990; McCarn and Carr 1992) and frequency distribution fitting (Woodbury 2004). Although not expressed in matrix algebra notation, a similar approach to solving equations was known to Chinese mathematicians in antiquity; it was rediscovered both by the English polymath, (Sir) Isaac Newton (1642–1727) sometime between 1673 and 1683 (Newton 1707) and by the German mathematician, Carl Friedrich Gauss (1777–1855) (Gauss 1810 [1874]). The latter method was first improved for hand-calculation by the American geodesist Myrick Hascall Doolittle (1830–1911) (Doolittle 1878 [1881]). Gaussian elimination was first formulated in terms of matrix algebra in the context of electronic computers by von Neumann and Goldstine (1947). The term Gaussian elimination seems to have been explicitly used only since the 1950s (Grcar 2011, Table 1); however, it seems to be far more frequently used than Gauss elimination (Google Research 2012). See Grcar (2011) for discussion. A short worked sketch is given after the next entry.

Gaussian field, Gaussian random field In a homogeneous n-dimensional Gaussian random field, the probability distribution of each variable is Gaussian, e.g. in a one-dimensional time series, or spatial transect, the point values in the interval x(t) and x(t + τ) have a Gaussian probability distribution about x = 0:

$$F(x_0) = \frac{1}{\sqrt{2\pi}} \int_0^{x_0} e^{-x^{2}/2}\,dx$$

and

$$F(x_0) = p\!\left[\frac{x(t+\tau) - x(t)}{\tau^{H}} < x_0\right],$$

where H is a constant known as the Hausdorff dimension and e is Euler's number, the constant 2.71828. If H = 0, then adjacent values are uncorrelated and the result is white noise; if 0 < H < 1, then the signal is known as fractional Brownian noise; if H = 0.5, it is known as Brownian noise. For discussion in an earth science context, see: Culling and Datko (1987), Culling (1989), MacDonald and Aasen (1994), Tyler et al. (1994), Turcotte (1997) and Bivand et al. (2013). See also: fractal.
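A compact Python sketch of the Gaussian elimination entry above (an illustrative addition, with partial pivoting added for numerical stability, which the entry itself does not discuss):

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b by forward elimination to upper-triangular form, then back substitution."""
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])  # augmented matrix [A b]
    n = len(b)
    for k in range(n - 1):
        # Partial pivoting: bring the largest remaining pivot into row k.
        p = k + np.argmax(np.abs(M[k:, k]))
        M[[k, p]] = M[[p, k]]
        for i in range(k + 1, n):
            factor = M[i, k] / M[k, k]
            M[i, k:] -= factor * M[k, k:]
    # Back substitution, proceeding upwards from the last row.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (M[i, -1] - M[i, i + 1:n] @ x[i + 1:]) / M[i, i]
    return x

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
print(gaussian_elimination(A, b))  # [ 2.  3. -1.]
```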


the window is typically even and an integer power of 2; for each point 0 ≤ n ≤ N − 1, the weight w(n) is given by

$$w(n) = e^{-0.5\left[\dfrac{n - \frac{N-1}{2}}{k\,\frac{N-1}{2}}\right]^{2}},$$

where k ≤ 0.5, which is the shape of the normal (Gaussian) probability distribution and e is Euler's number, the constant 2.71828. It has the property that its Fourier transform is also Gaussian in shape. The theory was originally developed by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990). See Blackman and Tukey (1958), Harris (1978) and Gubbins (2004); see also: spectral window.
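The weighting formula is easy to reproduce directly; the Python sketch below (an illustrative addition; N = 64 and k = 0.4 are arbitrary choices) computes the window and confirms that the weights peak at the centre of the window and fall off symmetrically.

```python
import numpy as np

def gaussian_window(N, k=0.5):
    """Gaussian (normal-shaped) taper of length N with shape parameter k <= 0.5."""
    n = np.arange(N)
    centre = (N - 1) / 2.0
    return np.exp(-0.5 * ((n - centre) / (k * centre)) ** 2)

w = gaussian_window(64, k=0.4)
print(round(w.max(), 3), w.argmax())   # ~1.0, at the point adjacent to the window centre
print(np.allclose(w, w[::-1]))         # True: the taper is symmetric
```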


Generalised inverse A square matrix, X⁻¹, with elements such that when multiplied by the matrix X, it yields the identity matrix (I), i.e. X⁻¹X = I. The term and notation were introduced by the English mathematician, Arthur Cayley (1821–1895) (Cayley 1858). The pseudoinverse is the generalization of an inverse to all matrices, including rectangular as well as square. It was discovered by the American mathematician, Eliakim Hastings Moore (1862–1932) (Moore 1935), who called it the general reciprocal. It was independently rediscovered by the English mathematical physicist, (Sir) Roger Penrose (1931–) (Penrose 1955), who named it the generalized inverse; Greville (1959) says that the (now widely used) name pseudoinverse was suggested to him by the American applied mathematician, Max A. Woodbury (b. 1926). The term inverse (in the sense of a matrix inverse) becomes frequent in geophysics from the 1960s (e.g. Harkrider and Anderson 1962), and pseudoinverse from the 1980s (e.g. Tarlowski 1982). See also: Greenberg and Sarhan (1959).

Generalised Pareto distribution A two-parameter distribution with scale parameter σ and shape parameter k with a density function F(x; σ, k) = 1 − (1 − kx/σ)^(1/k) for k ≠ 0 and x > 0; and F(x; σ, k) = 1 − e^(−x/σ) if k equals zero and 0 ≤ x ≤ σ/k; σ > 0 in both cases. e is Euler's number, the constant 2.71828. For the special cases of k = 0 and k = 1, the density function becomes the exponential distribution with mean σ, and the uniform distribution on [0, σ] respectively. It can be used to model exceedances Y = X − u over a given (high) threshold u; X > u. It was introduced by the American statistician, James Pickands IIIrd (1931–), named for the Italian engineer and economist, Vilfredo Federico Damaso Pareto (1848–1923),


(Pickands 1975). See also: Castillo and Hadi (1997), Embrechts et al. (1997), Caers et al. (1999a, b) and Sparks and Aspinall (2004); Pareto distribution, extreme value distribution. Genetic Algorithm (GA) Automatic development of models by selecting candidate models according to a set of rules relating them to successful models which have previously been computed. The model selection process attempts to mimic the biological process of “survival of the fittest.” Initially there is a population of potential solutions and a criterion for measuring the suitability of each one. A new generation of solutions is then produced either by allowing the existing solutions to become modified slightly, or for two solutions to combine so as to produce one retaining aspects of both, the aim being to produce a new generation of high-scoring models. The process terminates when a solution is found which satisfies a set of given criteria (Fraser and Burnell 1973; Whitley 1994). Attempts at computer simulation of biological genetic processes began in the 1950s. Geophysical applications include: Sambridge and Gallagher (1993), Sen and Stoffa (1995), Dorrington and Link (2004), and Wang and Zhang (2006). See also: neural-net, simulated annealing. Geographic information system (GIS) A computer-based data management system used to capture, store, manage, retrieve, analyse, and display spatial geographical information. All the data are referenced using coordinates in the same geographical projection and can be overlaid for comparison purposes or combined as a single display. Common feature identification keys link the spatial and attribute information in a relational database. Both raster (e.g. satellite imagery) and vector data (e.g. geological boundaries) may usually be combined. The first GIS was the Canada Geographic Information System, begun in 1962 by the English geographer Roger F. Tomlinson (1933–2014), then working at Spartan Air Services, Ottawa, together with staff at International Business Machines (Ottawa), both companies under contract to the Agriculture Rehabilitation and Development Agency to establish a land resource inventory for Canada (CGIS Development Group 1970–1972). The first use of the actual term geographic information system is believed to be that of Tomlinson (1968). Following establishment of the Geographic Information Systems Laboratory in the Department of Geography of the State University of New York at Buffalo, Amherst, NY, in 1975 and similar research groups elsewhere, use of GIS rapidly proliferated. The first Annual Conference on “The management, analysis and display of geoscience data” took place at Golden, CO, in 1982 (Merriam 1983). See also: Van Driel and Davis (1989), Maguire and Raper (1992), Bonham-Carter (1994), Houlding (1994), Singh and Fiorentino (1996), Coburn and Yarus (2000), Madden (2009), and Carranza (2009). Geological object computer aided design (GOCAD) A suite of software for 3D geological modelling originally developed at the University of Nancy, France, under Prof. Jean-Laurent Mallet (Mallet et al. 1989; Mallet 1997; Caumon et al. 2004; Frank et al.


2007) and now commercially supported. It enables reservoir data analysis, velocity modelling, etc. Geometric data model, geometric data structure These are concepts in geographical information systems as defined by Frank (1992): A geometric data model is defined as a formalized abstract set of spatial object classes and the operations performed on them; a comprehensive set of tools to be used to structure (spatial) data. A geometric data structure is the specific implementation of a geometric data model which fixes the storage structure, utilization and performance; detailed and low-level descriptions of storage structures (traditional data structures) and the pertinent operations, with details of how the effects are to be achieved. They will not only provide a specific function (i.e. fulfil the conditions of an operation) but also are fixed in terms of performance, storage, utilization, etc.; they are a specific solution for a generic problem. See also: Maguire and Raper (1992).


Geometric distribution A frequency distribution in which the frequencies fall off in a geometric progression. In a series of independent trials, it models the length of a run of "unsuccessful" trials before a "successful" result occurs. If the probability of success in any trial is p, then the probability that k trials are needed to obtain the first success is P(X = k) = (1 − p)^k p, where k = 0, 1, 2, 3, ⋯. The expected value of X is (1 − p)/p. Although this type of distribution may have been used as far back as the seventeenth century, the actual term Geometric distribution only appears in the 1950s (e.g. Feller 1950; Google Research 2012); it occurs in Vistelius (1980, 1992). See also: log-geometric distribution.

Geometric mean A measure of the location of the centre of the probability distribution of a set of observed positive values of size n. It is given by the n-th root of the product of the values: mg = (x1 · x2 ⋯ xn)^(1/n). This has been found useful in the description of permeability data (Warren and Price 1961) and underpins Aitchison's (1984, 1986, 2003) centred logratio transform for compositional data: the geometric mean is the best linear unbiased estimator when working on coordinates with data on the strictly positive real line, and the closed geometric mean is the best linear unbiased estimator of the expected value of a distribution when working with compositional data in the simplex. The geometric mean has been in use since the time of the Greek philosopher, Pythagoras (c. 530 BC), Halley (1695); see also: Krumbein and Pettijohn (1938), Pawlowsky-Glahn (2003) and Buccianti et al. (2006).

Geometric probability The study of probabilities involved in geometric problems (Kendall and Moran 1963; Vistelius 1980, 1992).
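A short Python sketch of the geometric mean and the centred logratio (clr) transform it underpins (an illustrative addition; the three-part composition used here is invented for the example):

```python
import numpy as np

def geometric_mean(x):
    """n-th root of the product of n positive values, computed via logarithms."""
    x = np.asarray(x, dtype=float)
    return np.exp(np.mean(np.log(x)))

def clr(composition):
    """Centred logratio transform: log of each part divided by the geometric mean."""
    composition = np.asarray(composition, dtype=float)
    return np.log(composition / geometric_mean(composition))

parts = np.array([0.62, 0.28, 0.10])     # a closed three-part composition
print(round(geometric_mean(parts), 4))
print(np.round(clr(parts), 4), round(clr(parts).sum(), 10))  # clr values sum to zero
```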


Geometric progression A series of numbers in which each term after the first is found by multiplying the previous term by a constant, non-zero number, e.g. 1, 7, 49, 343, ∙∙∙, etc. Known from as early as c. 2000 BC (Sanford 1930).

Geometric series A series in which there is a constant ratio between successive terms, e.g. the n-term series: a + ax + ax² + ax³ + . . . + ax^(n−1). Archimedes (c. 225 BC) is known to have summed this infinite series for the case a = 1 and x = 1/4. The sum S = a(1 − x^n)/(1 − x), x ≠ 1; as n tends to infinity, it will only converge if the absolute value of x is less than 1, in which case S = a/(1 − x). It was eventually generalised to the m-th power by Bernoulli (1713; Sanford 1930).

Geomorphometry The quantification of land-surface form and terrain modelling (including digital terrain and digital elevation models), also known as morphometry, terrain analysis, terrain modelling, and quantitative geomorphology. The term geomorphometry seems to have been introduced by Chorley et al. (1957). Summaries of its history, methods and recent advances and a comprehensive bibliography are provided by Pike (1993, 1995, 1996, 1999, 2002); a recent overview is Hengl and Reuter (2009).

Geoptronics A term proposed by Srivastava (1975) for optical data-processing of geological (or geophysical) data. It does not seem to have been widely used thereafter.

Geostatistics, geostatistical The term geostatistics originally appeared in the North American mathematical geology literature of the 1970s to imply any application of statistical methods in the earth sciences (e.g. Merriam 1970; McCammon 1975a, b) and still appears to be used by many in that context. However, from the early 1960s (Matheron 1962–1963), the term began to be used, initially in Europe, to refer to a specific approach to optimum interpolation of spatial data, initially in mining applications (David 1977; Journel and Huijbregts 1978), which had been pioneered by the French mining engineer and mathematician, Georges Matheron (1930–2000). Watson (1971) gives a useful introduction to the subject at a simpler mathematical level than that used by Matheron and his co-workers; see also Dowd (1991). Its use has now spread to hydrogeology (Kitandis 1997); petroleum geology (Yarus and Chambers 1994; Hohn 1999); the environmental sciences (Cressie 1993; Webster and Oliver 2001) and other fields. Geostatistics is now a topic of research in its own right in statistical science as well as its applications in the earth


sciences; an extensive glossary will be found in Olea et al. (1991). However, in current usage, the term has also come to be used as a synonym for mathematical geology. See also: simulated annealing, regionalized variable, kriging. See Bivand et al. (2008, 2013) for an exposition of the methods using the R statistical software package. Ghost In a seismic survey this term refers to the energy which travels upward from an energy release and is then reflected downward (e.g. from the water surface in a marine survey) which joins with the down-travelling wave train to change the wave shape (Sheriff 1984). Early use of the term ghost reflection in this context occurs in Evjen (1943) and Lindsey and Piety (1959). See also deghosting.


Ghost elimination filter Camina and Janacek (1984) use this term, but it is more usually referred to as deghosting. A filtering technique to remove the effects of energy which leaves the seismic source directly upward, used as part of a process designed to restore a waveform to the shape it had before being affected by some filtering action. The assumption is that a seismic trace consists of a series of reflection events convolved with a wavelet (whose shape depends on the shape of the pressure pulse created by the seismic source, reverberations and ghost reflections in the near-surface, the response of any filters involved in the data acquisition, and the effects of intrinsic attenuation), plus unrelated noise. The deconvolution process designs an inverse filter which compresses the wavelet and enhances the resolution of the seismic data (Dragoset 2005). In practice it may involve the following steps: (i) system deconvolution, to remove the filtering effect of the recording system; (ii) dereverberation or deringing, to remove the filtering action of a water layer (if present); (iii) predictive deconvolution, to attenuate the multiples which involve the surface or near-surface reflectors; (iv) deghosting, to remove the effects of energy which leaves the seismic source directly upwards; (v) whitening or equalizing to make all frequency components within a band-pass equal in amplitude; (vi) shaping the amplitude/frequency and/or phase response to match that of adjacent channels; and (vii) determination of the basic wavelet shape (Sheriff 1984). The method was introduced by the American mathematician and geophysicist, Enders Anthony Robinson (1930–) in 1951 during study for his Massachusetts Institute of Technology PhD thesis (1954, 1967a). See also: Robinson (1967b), Sheriff (1984), Buttkus (1991, 2000), Gubbins (2004); adaptive deconvolution, convolution, deterministic deconvolution, dynamic deconvolution, homomorphic deconvolution, inverse filtering, minimum entropy deconvolution, predictive deconvolution, statistical deconvolution. Gibbs effect, Gibbs oscillations, Gibbs phenomenon, Gibbs ringing When a discontinuous function, such as a series (“train”) of equal-amplitude square-wave or rectangular pulses (or a waveform that includes a jump discontinuity) is approximated by the sum of a finite series of sines or cosines, the discontinuities cannot be exactly fitted, no matter how many terms are used. An example is the Fourier series expansion of a square wave:


$$f(t) = \begin{cases} -1, & -1 \le t < 0 \\ \;\;\,1, & \;\;\;\,0 \le t < 1 \end{cases}, \quad \text{etc.},$$

which is

$$f(t) = \frac{4}{\pi} \sum_{i=1,\ \mathrm{odd}}^{\infty} \frac{1}{i}\sin(i\pi t) = \frac{4}{\pi}\left[\sin(\pi t) + \frac{1}{3}\sin(3\pi t) + \frac{1}{5}\sin(5\pi t) + \frac{1}{7}\sin(7\pi t) + \cdots\right].$$

However, a truncated N-term expansion

$$f_N(t) = \frac{4}{\pi} \sum_{i=1,\ \mathrm{odd}}^{N} \frac{1}{i}\sin(i\pi t)$$

provides only a partial sum and the fitted waveform exhibits symmetrical sets of small-amplitude ripples on either side of the zero-crossings, which occur at t = ⋯, −3, −2, −1, 0, 1, 2, 3, ⋯. These ripples alternately overshoot and undershoot the intended level of the fitted square wave, gradually converging and reducing in amplitude while increasing in wavelength, with increasing distance away from the jumps at the zero-crossings. The larger N, the better the approximation, but it does not vanish completely. This phenomenon was first recognised by the British mathematician, Henry Wilbraham (1825–1883) (Wilbraham 1848), and was discussed independently by the American mathematical physicist, Josiah Willard Gibbs (1839–1903) (Gibbs 1898, 1899), but it was named for Gibbs by the American mathematician Maxime Bôcher (1867–1918), who gave the first complete analysis of the phenomenon (Bôcher 1906). See Hewitt and Hewitt (1979) for discussion in a historical context. It has also been called the Gibbs effect (Pennell 1930), Gibbs oscillations or Gibbs phenomenon (Buttkus 1991, 2000), Gibbs ringing in electronics (Johannesen 1965), or simply ringing in seismology (de Bremaecker 1964). Gibbs phenomenon appears to be by far the most widely used term (Google Research 2012). See also: Tukey and Hamming (1949), Hamming (1977) and Weedon (2003).

Gibbs sampler One of a group of computer-intensive techniques (Markov chain Monte Carlo) used for simulating complex nonstandard multivariate distributions. Introduced by American mathematicians Stuart and Donald Geman (b. 1943) (Geman and Geman 1984; Gelfand and Smith 1990; Casella and George 1992), it is named for the American mathematical physicist, Josiah Willard Gibbs (1839–1903). Liu and Stock (1993) applied it to the quantification of errors in the propagation of refracted seismic waves through a series of horizontal or dipping subsurface layers.
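The overshoot described in the Gibbs phenomenon entry above is easy to demonstrate numerically; this Python sketch (an illustrative addition; the truncation points are arbitrary) evaluates the partial sum just after the jump at t = 0 and shows the roughly 9% overshoot that persists however many terms are added.

```python
import numpy as np

def square_wave_partial_sum(t, N):
    """Truncated Fourier series of the unit square wave, using odd harmonics up to N."""
    total = np.zeros_like(t)
    for i in range(1, N + 1, 2):
        total += np.sin(i * np.pi * t) / i
    return 4.0 / np.pi * total

t = np.linspace(0.001, 0.999, 5000)   # one half-period, just after the jump at t = 0
for N in (9, 99, 999):
    overshoot = square_wave_partial_sum(t, N).max() - 1.0
    print(N, round(overshoot, 3))      # stays near 0.09 (about 9%) as N grows
```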


GIGO An acronym for “Garbage in, garbage out,” it implies that however good one’s data-processing methods are, you are ultimately reliant on the quality of your data. The term is generally attributed to George Fuechsel (1931–), an IBM programmer and instructor in New York, who worked on the IBM 305 RAMAC computer (which was in use between 1956 and 1961), but it apparently first appeared in print in a paper on the application of the Program Evaluation and Review Technique (PERT) in manufacturing of complex systems (Tebo 1962) and a newspaper article (Crowley 1963). Early examples of its use in the earth sciences are Merriam (1966) and Sheriff (1984).


Gnomonic projection The polar gnomonic projection has been used to display three-dimensional orientation data, particularly in mineralogy (poles to crystal faces). It is the projection of a vector from an origin at the centre of the sphere to a point on the upper hemisphere onto a plane parallel with the Equatorial plane but touching the sphere at the North Pole. If X is longitude and Y is latitude (degrees) then in Cartesian coordinates

x = tan(X); y = sec(X) tan(Y).

The projection was originally introduced for the construction of astronomical maps of the stars. Although its use may go back to the Greek-Egyptian astronomer, mathematician and geographer, Claudios Ptolemaios (Ptolemy, ?100–?165), the first certain account is that of the Austrian astronomer and geographer, Christoph Grienberger (1561–1636) (Grienberger 1612). Its application to mineralogy, especially as an aid in crystal drawing, was described by the German physicist and mineralogist, Franz Ernst Neumann (1798–1895) (Neumann 1823), but its use only became popular following work by the French mineralogist, François Ernest Mallard (1833–1894) (Mallard 1879) and the German mineralogist and crystallographer, Victor Goldschmidt (1853–1933) (Goldschmidt 1887). It was in frequent use until the mid-1930s. The British mathematician and mathematical crystallographer, Harold Simpson (né Hilton, 1876–1974) introduced a gnomonic net as an aid to plotting data in this projection (Hilton 1907). See also: stereographic projection.

Goodman distribution The coherence between two weakly stationary stochastic processes X(t) and Y(t), both with zero mean, is the square of the cross-power density spectrum, i.e.

$$\frac{\bigl[P_{xy}(f)\bigr]^{2}}{P_{xx}(f)\,P_{yy}(f)},$$

where Pxx(f) is the estimated power spectrum of X, Pyy(f), the estimated power spectrum of Y, and Pxy(f) is their cross-spectrum, or the (cospectrum)² + (quadrature spectrum)² divided by the product of the spectra, i.e. it is the square of coherency. However, as pointed out by Weedon (2003), some authors use the two terms synonymously. As introduced by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1928), coherence is an analogue, in the frequency domain, of the coefficient of determination. An approximate frequency distribution of the coherence for data having a Gaussian distribution was introduced by the American statistician, Nathaniel Roy Goodman (1926–?1981) (Goodman 1957), and has been subsequently known as the Goodman distribution, e.g. Foster and Guinzy (1967). See also: Weedon (2003), coherency spectrum, semblance.

Goodness-of-fit A measure of the closeness of agreement between a set of observed values and the equivalent values predicted by a hypothetical model fitted to the data. The term goodness-of-fit was introduced by the British statistician, Karl Pearson (1857–1936) (Pearson 1895). If there are several competing models, the one which shows the minimum sum of squared differences between the observed and fitted values is considered to be the "best" fit in a least squares sense. Often used in the context of fitting alternative regression models, fitting a theoretical frequency distribution to an observed frequency distribution, or comparing two observed frequency distributions. See Miller and Olsen (1955), Griffiths (1960), Draper and Smith (1981) and Bates and Watts (1988).

Grab sample A method of sampling a geographical area, stratigraphic section, outcrop, or data sequence, by selecting points (at random or in some systematic fashion) from within it, leaving the bulk of the population unsampled. The term appears to have been used in mine sampling since at least 1903. Krumbein and Pettijohn (1938) felt that the term "may imply a degree of carelessness in collection of the sample" and recommended that the term spot sample be used instead.

grad (gradient operator) [notation] This is a vector operator such that for any function f(x, y, z) it has components directed along the orthogonal x-, y- and z-axes with magnitudes equal to the partial derivatives with respect to x, y and z; thus

$$\mathrm{grad}(f) = \mathbf{i}\,\frac{\partial f}{\partial x} + \mathbf{j}\,\frac{\partial f}{\partial y} + \mathbf{k}\,\frac{\partial f}{\partial z},$$

where i, j and k are unit vectors. It appears in the work of the German mathematician, Heinrich Martin Weber (1842–1913), whose book The partial differential equations of mathematical physics (Weber 1900–1901), although drawing on Bernhard Riemann’s lectures, was an entirely new work by Weber with the same aims. Treatment of displacement data using vector algebra followed the work (Love 1906) of the English mathematician and geophysicist, Augustus Edward Hough Love (1863–1940). An early example of its use in geophysics is Macelwane (1932).
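For gridded data the gradient operator is usually approximated by finite differences; a brief Python sketch (an illustrative addition; the scalar field and grid spacing are invented for the example) using NumPy's gradient routine:

```python
import numpy as np

# Scalar field f(x, y) = x**2 + 3*y sampled on a regular grid.
x = np.linspace(0.0, 1.0, 101)
y = np.linspace(0.0, 1.0, 101)
X, Y = np.meshgrid(x, y, indexing="ij")
F = X ** 2 + 3.0 * Y

dx = x[1] - x[0]
dy = y[1] - y[0]
dF_dx, dF_dy = np.gradient(F, dx, dy)   # finite-difference components of grad(f)

# Analytically, grad(f) = (2x, 3); check at the centre of the grid (x = 0.5).
print(round(dF_dx[50, 50], 3), round(dF_dy[50, 50], 3))  # ~1.0 and 3.0
```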



Grade scale An arbitrary division of a continuous scale of sizes, such that each scale unit or grade may serve as a convenient class interval for conducting the analysis or expressing the results of an analysis (Krumbein and Pettijohn 1938). The first grade scale for sedimentological use was introduced by the Swedish-born American natural scientist and geologist, Johan August Udden (1859–1932), in which "grade" referred to the material ranging between two successive size separations such that the diameter of the largest particles in one grade had twice the length of the diameter of the coarsest particles in the next finer grade (Udden 1898), thus: Coarse gravel, 8–4 mm; Gravel, 4–2 mm; … ; Very fine dust, 1/128–1/256 mm. This scale was extended to both coarser and finer materials in Udden (1914) but the terminology of the grades was later modified by the American geologist, Chester Keeler Wentworth (1891–1969) (Wentworth 1922), establishing the now familiar scale: Boulder, > 256 mm; Cobble, 256–64 mm; Pebble, 64–4 mm; … ; Clay < 1/256 mm. Krumbein (1934a, 1938) introduced the phi scale, in which φ = −log₂(ω), where ω is the Wentworth scale diameter in mm, so as to "permit the direct application of conventional statistical practices to sedimentary data." For discussion in a modern sedimentological context see Blott and Pye (2001).
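The phi transform and its inverse are one-liners; a Python sketch (an illustrative addition; the example diameters are arbitrary Wentworth class limits):

```python
import numpy as np

def to_phi(diameter_mm):
    """Krumbein phi scale: phi = -log2(diameter in mm)."""
    return -np.log2(diameter_mm)

def to_mm(phi):
    """Inverse transform back to grain diameter in millimetres."""
    return 2.0 ** (-phi)

diameters = np.array([256.0, 64.0, 4.0, 2.0, 1.0 / 256.0])
phi = to_phi(diameters)
print(phi)            # [-8. -6. -2. -1.  8.]
print(to_mm(phi))     # recovers the original diameters
```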

Graduation A term used by Whittaker and Robinson (1924) for a one-dimensional moving average. The term was used in geology by Fox and Brown (1965). It was generally used to mean smoothing, a term which became far more frequently used from the 1950s onwards (Google Research 2012). See also Spencer’s formula, Sheppard’s formula. Gram matrix The Gram (or Gramian) matrix of a set V of m vectors x1, x2, xm is the m m matrix G with elements gij ¼ viTvj. Given a real matrix A, the matrix ATA is the Gram matrix of the columns of A, and the matrix AAT is the Gram matrix of the rows of A. Named for the Danish mathematician, Jorgen Pedersen Gram (1850–1916). See Sreeram and Agathoklis (1994) and Gubbins (2004). Gram-Schmidt orthogonalization Named for procedures independently developed by the Danish mathematician, Jorgen Pedersen Gram (1850–1916) (Gram 1883), and the German mathematician, Erhard Schmidt (1876–1959) (Schmidt 1907), which take a finite, linearly-independent k-dimensional set of vectors and generate an orthogonal set of vectors which span the same k-dimensions (Wong 1935). For example, if the GramSchmidt orthogonalization is applied to the column vectors of a square matrix, A, of order n, it is decomposed into an orthogonal matrix, Q, and an upper triangular matrix, R,

245

such that A ¼ QR (the so-called QR decomposition). See also Alam and Sicking (1981), Kacewicz (1991), and Thompson (1992). Graph, graphics, graphical The term graph was introduced by the English mathematician, James Joseph Sylvester (1814–1897) (Sylvester 1878). The terms “to graph” and the “graph of a function” followed at the end of the nineteenth century. Graphical methods for data-display in the earth sciences have a long history, particularly in geochemistry and structural geology (Howarth 1998, 1999, 2001b, 2002, 2009). Graph types include: (i) univariate graphs, e.g. bar charts, frequency distributions (histograms), polar diagrams, and boxplots; (ii) bivariate line graphs and scatterplots; (iii) ternary (trilinear) or tetrahedral diagrams of trivariate percentaged data; and (iv) multi-component plots, including line diagrams (e.g. chondrite-normalised REE abundance diagrams), multioxide variation diagrams, enrichment-depletion diagrams; see Rollinson (1993) for a comprehensive review. Plots using multivariate symbols such as the star plot and pie chart, and specific multi-element plots (e.g. kite, Stiff and Piper diagrams) tend to be frequently used in hydrogeochemistry. See Zaporozec (1972) and Helsel and Hirsch (1992) for use of graphics in hydrogeology; and Chambers et al. (1983), Maindonald and Braun (2003), and Chen et al. (2008) for excellent reviews of modern statistical graphics. Reimann et al. (2008) give extensive examples from regional geochemistry. See also: plot. Graph theory The study of “graphs” which, in this context, refers to a group of vertices or nodes and a group of edges that connect pairs of vertices: mathematical structures used to model pair-wise relations between objects from a certain collection. In this sense, a graph may be undirected, meaning that there is no distinction between the two vertices associated with each edge, or its edges may be directed from one vertex to another. In a stratigraphic context, Guex and Davaud (1984) give an example in which a set of vertices, the nodes of a graph represent a set of species or taxa; the edges of the graph represent the non-ordered pairs of compatible species (which occupy adjacent vertices); and the arcs of the graph represent the stratigraphical relationships observed between species: the arc x ! y implies that species y lies stratigraphically above x. This type of formal analysis goes back to the work of the Swiss mathematician, Leonhard Euler (1707–1783) in 1736, in a discussion of a topological problem, the crossing of the seven bridges of K€onigsberg which connected to two islands in the River Pregel in what is now Kaliningrad (Euler 1741; Alexanderson 2006). The term graph was introduced by the English mathematician, James Joseph Sylvester (1814–1897) (Sylvester 1878). The first book on graph theory is attributed to the Hungarian mathematician, Dénes K€onig (1884–1944) (K€onig 1936). In geology, Bouillé (1976a, b) applied graph theory to the problem of digitization of boundaries in geological maps and to geological cartographic data bases. It has also been used to establish biostratigraphic relationships (Guex and Davaud 1984) and the hydraulic continuity between stratigraphic units (Hirsch and Schuette 1999). In geophysics, it has been used for the adjustment of data along marine geophysical survey track-lines (Ray 1987) and


to the calculation of minimum travel times in 2-D (Moser 1991) and 3-D (Cheng and House 1996) in seismic migration schemes.
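A minimal Python sketch of the Gram-Schmidt orthogonalization entry above, building Q and R column by column (an illustrative addition; classical, unpivoted Gram-Schmidt, which is adequate for well-conditioned matrices):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: factor square A into Q (orthogonal) and R (upper triangular)."""
    A = A.astype(float)
    n = A.shape[1]
    Q = np.zeros_like(A)
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # component of column j along earlier basis vector i
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```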


Graphic correlation This is a term for a method devised by the American palaeontologist and stratigrapher, Alan Bosworth Shaw (1922–) in 1958, although it was first published in Shaw (1964, 1995), to aid stratigraphic correlation between sections. The method has conventionally consisted of making a bivariate scatterplot of the heights (stratigraphic sections) or depths (wells) of occurrence of the tops and bases of as many taxa as possible common to the stratigraphic sections to be compared. Continuous linear or nonlinear trends are fitted to the whole section, or to segments; any abrupt discontinuity suggests a possible geological sequence boundary, condensed section or structural break. (This is a case in which structural regression or the use of smoothing splines is appropriate as neither variable can be regarded as "explanatory.") Smith (1989a) showed that the method can be very effective when used with smoothed well log data; modern applications are reviewed in Mann and Lane (1995). Pairwise comparisons of a number of stratigraphic sections (beginning with that which is believed to be most complete) enable a composite standard to be built up by gradually extending the observed stratigraphic ranges of the taxa from section to section until a "complete" reference standard is obtained.

Graphic well log The graphical depiction of the varying proportion of one or more components as a function of down-hole depth. Early published examples include plots of varying percentages of lithic components by the American economic geologist, Earl Adam Trager (1893–1978) (Trager 1920). With increasing interest in the use of micropaleontology as a correlative tool in petroleum exploration in the 1920s, paleontological range/abundance charts began to be used, e.g. by the Russian-born American economic geologist and palaeontologist, Paul Pavel Goudkoff (1880–1955) (Goudkoff 1926).

Gray code A cyclic binary number code in which successive numbers differ only by one bit: e.g. 0 = 000, 1 = 001, 2 = 011, 3 = 010, 4 = 110, 5 = 111, 6 = 101, 7 = 100, etc. It is widely used to facilitate error-correction in digital communications because the number of bit changes is the same for a step change regardless of the magnitude of the quantity (Sheriff 1984). Although first introduced in telegraphy by the French engineer, Émile Baudot (1845–1903) in 1870, the technique is today named for the physicist, Frank Gray (1887–1969) at Bell Laboratories, New Jersey, USA, who applied for a patent under the name "reflected binary code" in 1947. The term Gray code seems to have been first used by Flores (1956); by the 1970s it was being used in high-speed analog-to-digital converters (Schmidt 1970) and has subsequently been generalized (Sankar et al. 2004).

Green's function An integral kernel that can be used to solve an inhomogeneous differential equation with boundary conditions, e.g. a Green's function, G(x, s), of a linear differential operator L = L(x) acting on distributions over a subset of Euclidean space at a point s, is any solution of LG(x, s) = δ(x − s), where δ is the Dirac delta function. In general Green's functions are distributions rather than proper functions. Named for the British mathematician George Green (1793–1841) (Green 1828). Early examples of its use in geophysics are Gosh (1961), Knopoff (1961) and Herrera (1964).
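The binary-reflected Gray code in the entry above can be generated with a single exclusive-or; a Python sketch (an illustrative addition) that reproduces the 3-bit table:

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert the Gray code by exclusive-or of the progressively shifted bits."""
    n = g
    shift = 1
    while (g >> shift) > 0:
        n ^= g >> shift
        shift += 1
    return n

for k in range(8):
    g = to_gray(k)
    print(k, format(g, "03b"), from_gray(g) == k)
# 0 000, 1 001, 2 011, 3 010, 4 110, 5 111, 6 101, 7 100: one bit changes at each step
```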

247

British mathematician George Green (1793–1841) (Green 1828). Early examples of its use in geophysics are Gosh (1961), Knopoff (1961) and Herrera (1964). Green’s theorem This states that the flux through a surface (or the integral of the vector flux density over a closed surface) is equal to the divergence of the flux density integrated over the volume contained by the surface (Sheriff 1984). This result appears to have been independently discovered by a number of scientists in the early nineteenth century, but is generally attributed either as Gauss’s theorem, named for the German mathematician and physicist, Carl Friedrich Gauss (1777–1855), Gauss (1813), or as Green’s theorem named for the English mathematician and physicist, George Green (1793–1841) (Green 1828). The term divergence theorem was used by Heaviside (1892a) but may well have come into being earlier. Mentioned in an earth science context by Ingram (1960), Camina and Janacek (1984) and Gubbins (2004), but see also the discussion in Macelwane (1932). Gregory-Newton interpolation formula, Gregory-Newton forward difference formula An interpolation formula: Assume values of function f(x) are known at regularly-spaced points x0 , x0 d , x0 2d , ∙ ∙ ∙ ∙, all a distance h apart, and that one wishes to find its value at an intermediate point x ¼ x0 + kh. Then if f(x) f (x0 + kh):

$$f(x) = f(x_0) + kDf(x_0) + \frac{k(k-1)}{2!}D^2 f(x_0) + \frac{k(k-1)(k-2)}{3!}D^3 f(x_0) + \frac{k(k-1)(k-2)(k-3)}{4!}D^4 f(x_0) + \cdots$$

where the first-order difference is Df(k) = f(k + 1) − f(k) and the higher-order differences are given by Dⁿf(k) = Dⁿ⁻¹f(k + 1) − Dⁿ⁻¹f(k) for all n > 1; hence D|k|ᵐ = m|k|ᵐ⁻¹. First described by the Scottish mathematician and physicist, James Gregory (1638–1675) in a letter to a colleague in 1670, it was later independently discovered by the English polymath, Isaac Newton (1643–1727), who first published it in Newton (1687), but its proof was not published until Newton (1711). Credited as the Gregory-Newton formula since 1924; see also: Meijering (2002). An early application in geophysics was by Landisman et al. (1959); also mentioned in Sheriff (1984).

Gresens' diagram, Gresens' equation, Gresens' metasomatic equation Gresens (1967) introduced a method for studying the composition-volume relationships involved in mass transfer during metasomatic processes. The gains and losses in the system may be calculated by solution of a set of equations: for example, in the case of an original mineral A, of specific gravity gA, in the parent rock, and its alteration product B, of specific gravity gB, in the product rock, then for a given chemical component x with weight fractions xA and xB in the parent and product rocks respectively: wA[fv(gB/gA)xB − xA] = x, where x is the unknown weight (gm) of a given component lost or gained and fv is the volume factor (the volume ratio of product rock to parent rock): fv > 1 for replacement with volume gain; fv = 1 for isovolumetric replacement; or fv < 1 if volume loss occurs. It is estimated from the ratios of immobile elements, fv ≅ (TiO2)A/(TiO2)B ≅ (Al2O3)A/(Al2O3)B. Grant (1986) provides a spreadsheet-based method for solution in the more general case. Sketchley and Sinclair (1987) discuss an interesting example of its use. López-Moro (2012) provides an Excel-based solution incorporating improvements to the Grant (1986) method.
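As an illustration only, the single-component form of the relation quoted above can be evaluated directly; the rock densities, weight fractions and volume factor below are entirely hypothetical.

```python
# Sketch of the Gresens-type mass-balance relation quoted above:
# x = wA * (fv * (gB / gA) * xB - xA), the weight of a component gained (+)
# or lost (-) per wA grams of parent rock. All numbers are hypothetical.

def gresens_gain(wA, fv, gA, gB, xA, xB):
    """Weight of a component gained or lost for wA grams of parent rock."""
    return wA * (fv * (gB / gA) * xB - xA)

# Estimate fv from an element assumed immobile, e.g. fv ~ (TiO2)A / (TiO2)B:
fv = 0.95 / 1.05          # hypothetical TiO2 weight fractions (parent / product)
delta = gresens_gain(wA=100.0, fv=fv, gA=2.70, gB=2.85, xA=0.035, xB=0.020)
print(round(delta, 3))     # grams gained (+) or lost (-)
```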


Grey level, grey scale, grey-level, grey-scale The terms grey scale, grey level (or even grey code) have sometimes been applied to images or maps in which the relative darkness in appearance of an overprinted group of symbols or a cell in a photograph-like image is proportional to the concentration, amplitude etc. of a variable of interest. The technique was originally introduced using successive overprinting of lineprinter characters in the "synographic mapping system" (SYMAP) software developed at Harvard University in 1963 by the American architect and urban designer, Howard T. Fisher (1903–1979), working with programmer Betty Tufvander Benson (1924–2008), and was released for general use some years later (Fisher 1968; Chrisman 2006). The author, then at the Imperial College of Science and Technology, London, subsequently developed a lineprinter-based package better suited to the needs of regional geochemical mapping (Howarth 1971b). This was subsequently adapted for microfilm and laser-plotter output (Webb et al. 1973, 1978; Howarth and Garrett 2010). Similar software is still being used (Reimann et al. 2008). The unhyphenated spellings of both terms are the most widely used; grey is the most widely-used spelling in British English, while gray is most popular in American English (Google Research 2012).

Grey noise Coloured (colored, American English sp.) noise can be obtained from white noise by passing the signal through a filter which introduces a degree of autocorrelation, e.g. x(t) = ax(t − 1) + kw(t), where w(t) is a white noise signal; a is a constant, 0 < a < 1; k is the gain, and x(t) is the output signal at time t. The power spectrum density for grey noise is U-shaped, with a minimum at mid-range frequencies. The concept of white light as having a uniform power density over its spectrum was first discussed by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1926), and taken up in digital signal processing by the American mathematician Richard Wesley Hamming (1915–1998) and statistician John Wilder Tukey (1915–2000) (Tukey and Hamming 1949); see also Blackman and Tukey (1958). For discussion in an earth science context, see Weedon (2003).

Grid, grid resolution A two- or three-dimensional grid, often referred to as a mesh (Mason 1956), usually constructed (gridded) using square, rectangular or triangular grid cells (sometimes referred to as grid blocks when in three dimensions), either as a basis for the interpolation of data in the form of a contour map, or in the approximation of a surface for numerical modelling (Sheriff 1984; Camina and Janacek 1984). The grid resolution is the ratio of the area (or volume) of an individual grid cell (or block) relative to the area (or volume) of the region (or object) being covered. The grid points may be simply a computational aid, as in contouring the values of a spatially-distributed variable over a mapped area, or real positions at which specimens are taken or geophysical attributes measured. See: contour map, sampling design.

Grid sample, grid sampling Taking point samples at the intersections of a regular grid pattern. Early geological examples of its use are in work by Pratje (1932) and Krumbein and Pettijohn (1938).

Grid search Systematically searching for a target on the basis of a grid pattern overlying the search region. May be applied to exploration of an actual geographical area (Davis 1976; Shurygin 1976; McCammon 1977), or searching for maxima or minima on the surface formed by a theoretical function (Gubbins 2004).

Gridding The interpolation of randomly spaced two-dimensional data on to the nodes of a rectangular (or square) grid. Possible methods include estimation based on a weighted average of a chosen number of closest points (the weighting function used is often proportional to inverse distance of the data point from the grid node); a local surface fit using an n-th order polynomial; interpolation using splines or piecewise continuous polynomials, etc. One of the first computer-based applications was that of McCue and DuPrie (1965) for contouring data acquired by the Ranger 7 lunar probe. Useful reviews of early methods are given by Crain (1970), Naidu (1970a), Braile (1978) and El Abbass et al. (1990).

Gumbel distribution The distribution of the magnitude of the largest (or smallest) observation in a sample. The term extreme value was introduced by the German-born American mathematician, Emil Julius Gumbel (1891–1966) (Gumbel 1935, 1941a, 1941b, 1945, 1954). The Gumbel distribution has the probability density

$$f(x; a, b) = \frac{1}{b}\, e^{-(y + e^{-y})}$$

where y = (x − a)/b; b > 0; −∞ < x < +∞; a is the location parameter and b is the scale parameter; and e is Euler's number, the constant 2.71828. If a = 0 and b = 1, it becomes the standard Gumbel distribution:

$$f(x) = e^{-x}\, e^{-e^{-x}}.$$


Applications include: earthquake magnitude, seismic hazard intensity and rates, and flood-frequency analysis (Brutsaert 1968; Caers et al. 1999a, b). See also: extreme value distributions, Fréchet distribution, Weibull distribution.
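A minimal numerical sketch of the density written above (not part of the original entry; the parameter values are arbitrary):

```python
# Gumbel density as given in the entry above:
# f(x; a, b) = (1/b) * exp(-(y + exp(-y))), with y = (x - a)/b.

import math

def gumbel_pdf(x, a=0.0, b=1.0):
    y = (x - a) / b
    return math.exp(-(y + math.exp(-y))) / b

print(round(gumbel_pdf(0.0), 4))            # standard form at x = 0: exp(-1) ~ 0.3679
print(round(gumbel_pdf(7.0, a=5.0, b=2.0), 4))
```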


H

H (entropy), relative entropy A thermodynamic quantity established by the German physicist and mathematician, Rudolf Julius Emanuel Clausius (1822–1888) (Clausius 1865), which is a measure of the degree of disorder in a system, characterised (Boltzmann 1872) by the natural logarithm of the probability of occurrence of its particular arrangement of particles. The idea was introduced by the American electronic engineer, Ralph Vinton Lyon Hartley (1888–1970) for use in communication theory (Hartley 1928), although he did not use the term "entropy," simply referring to it as a "unit of information," and it was later introduced by the American mathematician, Claude Elwood Shannon (1916–2001) (Shannon 1948; Shannon and Weaver 1949). It was subsequently taken up in geology as a measure of the lack of uniformity in composition (Pelto 1954). In a k-component system, entropy (H) is defined as:

$$H = -\sum_{i=1}^{k} p_i \ln(p_i)$$

where pi is the proportion of the i-th component, 0 ≤ pi ≤ 1. It reaches a maximum if all the pi = 1/k. If the logarithms are to base-2, the units of information (H) are bits; if natural logarithms, they are known as nats; and, if the logarithms are to base-10, as hartleys. Relative entropy (Hr) is defined as:

$$H_r = 100\left[-\sum_{i=1}^{k} p_i \ln(p_i)\right] \Big/ \ln(k).$$

Hr therefore reaches 100 when all k possible components are present in equal proportions, and falls towards zero as a single component becomes dominant. It has subsequently been used in mapping multi-component taxonomic and sedimentological data to show the degree of mixing of end-members, following early use by Parker et al. (1953). See: Pelto (1954), Miller and Kahn (1962), Vistelius (1964, 1980, 1992), Botbol (1989), Christakos (1990), Buttkus (1991, 2000) and Baltrūnas and Gaigalas (2004). See also: Bayesian/maximum-entropy method, facies map, information coefficient, maximum entropy filter, maximum entropy principle, maximum entropy spectrum, minimum entropy deconvolution.
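As a brief illustration (not part of the original entry), the entropy and relative entropy of a k-component composition can be computed directly; the proportions below are invented.

```python
# Sketch of the entropy H and relative entropy Hr defined above for a
# k-component composition (proportions summing to 1). Values are hypothetical.

import math

def entropy(p):
    """H = -sum(p_i * ln(p_i)), ignoring zero proportions."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def relative_entropy(p):
    """Hr = 100 * H / ln(k), i.e. percent of the maximum possible entropy."""
    return 100.0 * entropy(p) / math.log(len(p))

print(round(relative_entropy([1/3, 1/3, 1/3]), 1))     # 100.0 : perfectly even mixture
print(round(relative_entropy([0.90, 0.05, 0.05]), 1))  # low   : one component dominates
```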


H (Hurst exponent) Named for the British hydrologist, Harold Edwin Hurst (1880–1978), who during a career spent largely in Egypt, studied the 800-year flooding history and river flow pattern of the river Nile. If r is the range of the values of a time series and s is its standard deviation, both taken over a time interval T, then for some process which has no persistence the ratio r/s, known as the rescaled range (Hurst et al. 1965), is independent of T; for others, r/s is found to be equal to (T/2)^H, where the constant H (originally designated by Hurst as K) is now known as the Hurst exponent. H ~ 0.5 for a random walk. If H < 0.5, the series has negative autocorrelation. If 0.5 < H < 1.0 the series has positive autocorrelation and is known as a long memory, or persistent memory, process. The fractal dimension, D = 2 − H. For a time series xi, i = 1, n, H is estimated from the slope of a graph of log(r/s) as a function of log(n), where n, the length of the series considered, may be increased by intervals of, say, 100 or more successive points. In some cases, long sequences (several thousand points) may be required before a conclusive result can be obtained. See also: Hurst (1951, 1955), Hurst et al. (1965) and Turcotte (1997).

Half-plane, half plane A plane which exists everywhere on one side of a line, but not on the other. In mathematics, the upper half-plane is the set of complex numbers with a positive imaginary part, and vice versa. Although generally attributed to the mathematician Henri Poincaré, the concept was first discussed by the Italian mathematician, Eugenio Beltrami (1835–1900). The term occurs in geophysics in Omer (1947), Olsson (1980, 1983), Poddar (1982) and Dhanasekaran and Poddar (1985). Half-plane is the most frequently-used spelling (Google Research 2012). See also half-space.

Half-space, half space In three-dimensional geometry, it is either of the two parts into which a plane divides the three-dimensional space; more generally, it is either of the two parts into which a hyperplane divides a higher-dimensional space with the properties of Euclidean space. A mathematical model which is so large in other dimensions that only one bounding plane surface affects the results. The first proof that waves could propagate along a free surface of a homogeneous, isotropic, linear elastic half-space (now known as Rayleigh waves) was given by the English physicist, John William Strutt, 3rd Baron Rayleigh (1842–1919) in 1885 (Rayleigh 1887). The English mathematician, Augustus Edward Hough Love (1863–1940), investigating the propagation of waves in a multilayered half-space, sought to explain the propagation of earthquake waves and discovered (Love 1911) the horizontally-polarised wave. These two types were subsequently known as Rayleigh and Love waves respectively (e.g. Walker 1919; Macelwane 1932; Harkrider 1964; Gupta and Kisslinger 1964). Half-space is the most frequently-used spelling (Google Research 2012).

Half-width, half width 1. The horizontal distance across a peak (or trough) at the half-maximum (or half-minimum) amplitude. The term was originally used in spectroscopy (e.g. Rayleigh 1915) and X-ray crystallography (e.g. Parratt 1932). Its usage became frequent in geophysics from the 1960s. It was used as above in Gupta and Kisslinger (1964) but the concept was also applied to gravity anomaly width by Simmons (1964). 2. Half-width of a spectrum: the absolute difference between the frequencies or wavelengths at which the spectral radiant intensity surrounding the centre frequency has a power level equal to half that of the maximum power (Buttkus 1991, 2000). Halfwidth is the most frequently-used spelling (Google Research 2012).

Hamming (error-correcting) code A series of systematic codes in which each code symbol has exactly n binary digits, where m digits are associated with the information while the other (n − m) digits are used for error detection and correction in a transmitted digital signal. The code expresses the sequence of numbers in such a way that any error which has been introduced can be detected, and hopefully corrected, based on the remaining numbers. First developed by the American mathematician, Richard Wesley Hamming (1915–1998) in the 1940s (Hamming 1950); see also Kanasewich (1975) and Moon (2005).

Hamming taper, Hamming weighting function, Hamming window Used in the operation of smoothing a periodogram with a lag window of weights applied to a discrete time signal. N, the length of the window, is typically even and an integer power of 2; for each point 0 ≤ n ≤ N − 1, the weight w(n) is given by w(n) = 0.53836 − 0.46164 cos(2πn/N). This tapered cosine window is also referred to as a Hamming window (occasionally spelt hamming), a term introduced by the American statistician, John Wilder Tukey (1915–2000), in honour of the work of his colleague, the mathematician, Richard Wesley Hamming (1915–1998), in Blackman and Tukey (1958). See: Tukey and Hamming (1949), Hamming (1977), Harris (1978); mentioned in Sheriff (1984); see also: spectral window.

Hankel integral, Hankel transform A Hankel transform of order v of the real function f(x) is a linear integral transformation:

$$F_v(y) = \int_0^{\infty} f(x)\,\sqrt{xy}\,J_v(yx)\,dx,$$

where y > 0 and Jv is a Bessel function of the first kind, of order v ≥ −½. Named for the German mathematician, Hermann Hankel (1839–1873) who introduced it. It arises in the context of solving boundary-value problems formulated in cylindrical coordinates and Fourier transforms, as the Hankel transform of order zero is effectively a two-dimensional Fourier transform of a circularly symmetric function. Hankel transforms of order ½ and −½ are equivalent to the Fourier sine and cosine transforms, as

$$J_{1/2}(x) = \sqrt{2/(\pi x)}\,\sin(x)$$

and

$$J_{-1/2}(x) = \sqrt{2/(\pi x)}\,\cos(x).$$


An early application in geophysics is that of Ştefănescu et al. (1930) and in geostatistics by Armstrong and Diamond (1984); see also: Christensen (1990), Zayed (1996) and Buttkus (1991, 2000).

Hann window, hanning, Hanning function, Hanning weighting function, Hanning window Used in the operation of smoothing a periodogram with a lag window of weights applied to a discrete time signal. N, the length of the window, is typically even and an integer power of 2; for each point 0 ≤ n ≤ N − 1, the weight w(n) is given by w(n) = ½{1 − cos[2πn/N]}. The window is named for the Austrian meteorologist, Julius Ferdinand von Hann (1839–1921), in Blackman and Tukey (1958), although the term Hanning window (most often spelt with a capital H; cf. Sheriff 1984, Buttkus 1991, 2000) is now used much more widely than Hann window (Google Research 2012); see Hamming (1977) and Harris (1978) for discussion. One of the earliest earth science applications was that of Anderson and Koopmans (1963); see also: spectral window.

Harker diagram, Harker variation diagram A bivariate graph, first popularised by the English petrologist, Alfred Harker (1859–1939), in which the percentages of major element oxides present in a suite of rock specimens are plotted (y-axis) versus the percentage of SiO2 (x-axis) (Harker 1909). Today it is more widely known simply as the Harker diagram (Google Research 2012). Several other types were later developed (e.g. the total alkalis—silica diagram; see also Howarth 1998). As pointed out by the American petrologist, Felix Chayes (1916–1993) (Chayes 1962), the inherently constant-sum nature of the data (closed data) causes problems for interpretation of the apparent trends.
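As a small illustration (not part of the original entries), the Hamming and Hann lag-window weights quoted above can be generated directly. Note that some texts use N − 1 rather than N in the denominator; the sketch follows the form given in the entries.

```python
# Hamming and Hann window weights, w(n) for 0 <= n <= N-1, as quoted above.

import math

def hamming(N):
    return [0.53836 - 0.46164 * math.cos(2 * math.pi * n / N) for n in range(N)]

def hann(N):
    return [0.5 * (1 - math.cos(2 * math.pi * n / N)) for n in range(N)]

print([round(w, 3) for w in hamming(8)])
print([round(w, 3) for w in hann(8)])
```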


Harmonic A frequency which is a multiple of a fundamental frequency. The term originally derives from the acoustics of music, and was used by the French mathematician and physicist, Joseph Sauveur (1653–1716) in 1701 (Sauveur 1743). An early use of the term in geophysics was by the Scottish physicist, Cargill Gilston Knott (1856–1922) in 1884 (Knott 1886).

Harmonic algorithm The procedure computes the autoregressive power spectrum coefficients directly from the data by estimating the partial autocorrelations at successive orders. Since the computed coefficients are the harmonic mean between the forward and backward partial autocorrelation estimates, this procedure is more usually known as the Maximum entropy method. It minimizes the forward and backward prediction errors in the least squares sense, with the autoregressive coefficients constrained to satisfy the Levinson-Durbin recursion. The algorithm will exhibit some bias in estimating the central frequencies of sine components, and higher-order fits are notorious for splitting, a phenomenon in which multiple spectral peaks are generated where only a single feature is present. It was originally introduced by the American geophysicist, John Parker Burg (1931–) in unpublished papers (Burg 1967, 1968, 1975). For earth science applications see: Ulrych (1972), Ulrych et al. (1973), Camina and Janacek (1984), Yang and Kouwe (1995), Buttkus (1991, 2000) and Weedon (2003).

Harmonic analysis The decomposition of a time series waveform into a sum of sinusoidal components so as to detect periodic signal components in the presence of noise. The term began to be used in the 1870s (e.g. Maxwell 1879), following its introduction by the Irish physicist, William Thomson (1824–1907) (Thomson 1878), who later became Lord Kelvin. He adapted a mechanical analyser, invented by his brother, the civil engineer, James Thomson (1822–1892), to carry out harmonic analysis of tidal records (Thomson 1876). It was recast in the context of spectral analysis by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1930). It began to be used in geophysics by the Scottish physicist, Cargill Gilston Knott (1856–1922) (Knott 1884 [1886]), and by the English mathematician and seismologist, Charles Davison (1858–1940) (Davison 1893, 1921) and others, and is discussed in Jeffreys (1924, Appendix E); see also Howell et al. (1959). An early geological application is the analysis of varve thickness sequences by Anderson and Koopmans (1963). See also: Fourier analysis, harmonic motion, periodogram, power spectral density analysis.

Harmonic dial A graphical representation of the terms in a Fourier series approximated by sine waves cn sin(nt + pn) of frequencies nt, where n = 1, 2, 3, . . .; cn is the amplitude; and pn is the phase (Chapman 1928). A given waveform is plotted as a point P at the end of a vector OP, drawn from the origin O with length equal to the amplitude cn at an angle θ, taken anticlockwise from the horizontal axis, where θ is the phase, e.g. for a 24-hour (diurnal) component of, say, a geomagnetic variation, the time from midnight to midnight is equivalent to the interval 0–2π, hence the circular scale is 15° per hour and goes from 0 to 24 hours. It has been used in studies of long-term variation in the Earth's magnetic field (Bartels 1932a, b), cosmic-ray intensity (Sandström 1955) etc.

Harmonic function A function which repeats after successive equal intervals of the arguments (e.g. Thomson 1861; Jeffreys 1924). Such functions also occur as a result of the solution of Laplace's equation (Camina and Janacek 1984).

Harmonic mean A useful measure of central tendency for ellipticity measurements (Lisle 1997), it is given by

$$m_h = n \Big/ \sum_{i=1}^{n} \frac{1}{x_i}.$$


It was in use at the time of the Greek philosopher, Pythagoras (c. 530 BC), and the British mathematician, Robert Smith (1689–1768), discussed calculation of the harmonic mean in a treatise on harmonics in music (Smith 1759).

Harmonic motion A regularly repeated sequence that can be expressed as the sum of a set of sine waves. Originally defined in terms of mechanics (Thomson and Tait 1878; Macquorn Rankine 1883): if a point P is moving round the circumference of a circle with uniform velocity V, then its orthogonal projection (M) onto the diameter of the circle which passes through the centre (O) will execute simple harmonic motion. The speed of M increases from zero at one end of the diameter (A) to V at O, then it falls off again to zero as M approaches the opposite end of the diameter (A'). The time taken for P to return to the same position in the circle is the period, T; the radius of the circle (r) is the amplitude of the simple harmonic motion, and T = 2πr/V (or 2π/ω, where ω is the angular velocity of P). The angle AÔP is the phase of the simple harmonic motion. If P is the position of the point at time t, and Z (lying on the circle between A and P) was its position at time t = 0, then the angle AÔZ is the epoch. If the distance OM at time t is x, then

x = OP cos(PÔA) = OP cos(PÔZ + ZÔA),

hence

x = r cos(ωt + ε),

where ε is a constant, and

dx/dt = −rω sin(ωt + ε).

A record of x as a function of time was known as the curve of sines or a harmonic curve. The concept has been used in geophysical models such as Cooper et al. (1965).

Hat matrix (H) The term hat matrix (H) was coined by the American statistician, John Wilder Tukey (1915–2000) about 1968 (Hoaglin and Welsch 1978). The general linear regression model is y = Xb + ε, where y is the vector of the n values of the dependent variable; X is the n × p matrix of predictors; b is the vector of the p regression coefficients; and ε are the n prediction errors. Then H is the n × n matrix:

$$H = X\left(X^T X\right)^{-1} X^T$$

where the superscript T denotes the matrix transpose. The individual elements of this matrix indicate the values of y which have a large influence on the overall fit. The estimated values fitted by the regression are often denoted by ŷ ("y-hat"), hence the name. The diagonal elements of H are the leverages, which indicate the influence which each of the n observed values has on the fitted value for that observation. See Unwin and Wrigley (1987) and Agterberg (1989).

Hausdorff dimension, Hausdorff-Besicovitch dimension More usually known in the earth sciences as the fractal dimension, it was introduced by the German mathematician, Felix Hausdorff (1868–1942) (Hausdorff 1918) but, as methods of computing it for very irregular sets were developed between 1929 and 1937 by the Russian mathematician, Abram Samoilovitch Besicovitch (1891–1970) (Besicovitch 1929, 1934, 1935a, b; Besicovitch and Ursell 1937), it is also sometimes called the Hausdorff-Besicovitch (H-B) dimension. In two dimensions, the number of circles of radius r needed to cover the surface, each of which includes all the points within a radius r of its centre, (N) is proportional to 1/r²; for a 3-dimensional solid, the number of spheres of radius r required to do so is proportional to 1/r³. In general, given an object covered in the space in which it exists by a number N of small identical spheres of diameter δ, then the measurement unit μ = δ^α, where α is an unknown exponent. The H-B dimension is the value of α at which the α-covering measure

$$\lim_{\varepsilon \to 0} \inf \left\{ \sum \delta^{\alpha} ;\ \delta \le \varepsilon \right\}$$

jumps from zero to infinity. See Edgar (1990), Turcotte (1992) and La Pointe (1995). There are alternative estimators of fractal dimension, see: Richardson plot, box-count dimension, sandbox dimension.
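To make the hat matrix entry above concrete, the following minimal NumPy sketch (not from the source) computes H and its diagonal leverages for an invented design matrix; the highly outlying predictor value illustrates a large leverage.

```python
# Hat matrix H = X (X^T X)^(-1) X^T and its diagonal leverages, as defined in
# the hat matrix entry above. The small design matrix is invented.

import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 10.0]])          # column of ones plus one predictor

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverages = np.diag(H)

print(np.round(leverages, 3))        # the point at x = 10 has high leverage
print(round(leverages.sum(), 3))     # trace(H) equals the number of parameters, p = 2
```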



Hazard map Construction of a map showing the likely area of a region to be affected by a natural hazard, i.e. a potentially damaging or destructive event, such as a flood (Waananen et al. 1977), earthquake (Lemke and Yehle 1972; Nichols and Buchanan-Banks 1974), tsunami (Imamura 2009), lava flow or ash-cloud from a volcanic eruption (Sheridan 1980; Wadge et al. 1994; Alatorre-Ibargüengoitia et al. 2006), an avalanche or rock-flow (Kienholz 1978), etc. The hazard risk may be defined as the probability that an event of a certain type and magnitude will occur within a specified time period and will affect a designated area (Booth 1978), or the expectable consequences of an event in terms of deaths or injuries among a population and the destruction of various kinds of property or other kinds of economic loss (Crandell et al. 1984). Probabilistic ground motion maps depict earthquake hazard by showing, by contour values, the earthquake ground motions (of a particular frequency) that have a common given probability of being exceeded in 50 years (and other time periods). The ground motions being considered at a given position are those from all future possible earthquake magnitudes at all possible distances from that position. The ground motion coming from a particular magnitude and distance is assigned an annual probability equal to the annual probability of occurrence of the causative magnitude and distance (United States Geological Survey 2015).

Head-banging A median-based smoothing method for use with irregularly-spaced spatial data, designed to remove small-scale variation from the data set while preserving regional trends and edge-structures, introduced by the American mathematicians, John Wilder Tukey (1915–2000) and Paul A. Tukey; the term refers, in the context of data-smoothing, to "banging-down" the heads of nails projecting from a wall (Tukey and Tukey 1981). It was further developed by the American mathematician, Katherine M. Hansen in 1989 (Hansen 1991). Applied to heat-flow data by Barr and Dahlen (1990).

Heaviside function A discontinuous function, H = f(x), whose value is 0 for x < 0 and +1 for x > 0. It is the integral of the Dirac Delta function. It is also known as the unit step function, or step function. Named for the English telegrapher, electromagnetist and mathematician, Oliver Heaviside (1850–1925). Both Heaviside function and unit step function appear in Poritsky (1936).

Helix transform An algorithm which enables a multidimensional convolution to be carried out using a 1-dimensional convolution algorithm. So called because it can be likened to a 1-D wire coiled round the 2-D surface of a cylinder. Developed by the American geophysicist, Jon Claerbout (1937–) in 1997 (Claerbout 1998), it is well suited to problems involving noise attenuation and seismic data regularization with prediction error filters (Naghizadeh and Sacchi 2009).

Helmholtz equation An elliptic partial differential equation which may be written as: (∇² + k²)A = 0, where ∇² is the Laplacian (∇ is the Nabla operator), k is the wavenumber (wavelengths per unit distance; angular frequency/velocity) and A the amplitude. It represents a time-independent version of the original equation and is the most important, and simplest, eigenvalue equation in two dimensions (Sheriff 1984; Bullen and Bolt 1985); "the classical form for dilational elastic waves" (Macelwane 1932). If k = 0, it reduces to Laplace's equation, ∇²A = 0. If k² is negative (i.e. k is imaginary), it becomes the space part of the diffusion equation. Named for the German physician and physicist, Hermann Ludwig Ferdinand von Helmholtz (1821–1894), who included it in a paper on the vibration of air in organ pipes (Helmholtz 1859). In geophysics, the equation appears, unnamed, in Macelwane (1932), but the term becomes frequent from the early 1960s (e.g. Mal 1962).

Herglotz-Wiechert transform, Herglotz-Wiechert-Bateman transform Named for the German geophysicist, Emil Johan Wiechert (1861–1928) and mathematician, Gustav Herglotz (1881–1953), it provides the solution of an inverse problem: the derivation of the velocity along a radius vector from the source on the basis of the observed travel time of a seismic wave. The solution involves use of an Abelian integral, a type of integral equation first solved by the Norwegian mathematician, Niels Henrik Abel (1802–1829). The method for the problem's solution in a seismological context was developed by Herglotz (1907) and modified into a computationally simpler solution by Wiechert (1907) and Wiechert and Geiger (1910). It was subsequently improved by the English mathematician, Harry Bateman (1882–1946) (Bateman 1910). In the simplest case of a spherical Earth, let seismic velocity v increase continuously as a function of depth as the seismic rays travel downwards along a path which lies at a constant angle to the radial direction from the Earth's centre. This angle increases with depth until it reaches 90° at the turning point: the rays then begin to travel upwards, returning to the surface at an angular distance Δ from the epicentre. The angle θ which subtends half the epicentral distance at the Earth's surface is given by:

$$\theta = \frac{\Delta}{2R} = \frac{1}{V_\Delta} \int_{r_v}^{R} \frac{1}{\sqrt{\frac{1}{V_r^2} - \frac{1}{V_\Delta^2}}}\, d\log r$$

where R is the Earth’s radius; rv is the radius at the turning point (vertex); r is the general radius to any point on the raypath; VΔ is the apparent velocity, given by the tangent to the time-distance curve at an arcuate distance Δ from the epicentre; and Vr is the apparent surface velocity of any ray at its point of emergence. It is then possible to show that:

$$\log\left(\frac{R}{r_v}\right) = \frac{1}{\pi R} \int_0^{\Delta} q\, d\Delta$$

where

$$\cosh(q) = \frac{\sin(i_\Delta)}{\sin(i_r)}$$

and sin(iΔ) = v/VΔ, where iΔ is the angle between the ray and a normal to the Earth's surface at the point of emergence; ir is the angle of incidence for a general ray; v is the true velocity at the surface, obtained by direct observation; and the apparent velocity VΔ is obtained from the time-distance curve, hence sin(iΔ) and q as a function of Δ. By the early 1930s, the first tables of depths to the vertex and velocities at the vertex of P- and S-waves were being produced. For discussion, see Macelwane (1932), Nowak (1990, 1997), Bormann et al. (2002) and Aki and Richards (2009).

Hermite polynomials A set of orthogonal polynomials Hn(x) over the interval (−∞, ∞) with a weighting function exp(−x²/2):

$$H_n(x) = (-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2}$$

where n = 0, 1, 2, 3, etc., so that

H₀(x) = 1
H₁(x) = x
H₂(x) = x² − 1
H₃(x) = x³ − 3x
H₄(x) = x⁴ − 6x² + 3
H₅(x) = x⁵ − 10x³ + 15x

etc. Note that in some definitions the exponents are both x² rather than x²/2. Named for the French mathematician, Charles Hermite (1822–1901), who independently described them in Hermite (1864), although such functions had been used earlier. Mentioned in Agterberg and Fabbri (1978).

Hermitian conjugate The adjoint of an operator on a Hilbert space is called the Hermitian conjugate (Gubbins 2004). Named for the French mathematician, Charles Hermite (1822–1901).
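The polynomials listed in the Hermite polynomials entry above can be generated with the standard three-term recurrence Hₙ₊₁(x) = x·Hₙ(x) − n·Hₙ₋₁(x); the recurrence is a well-known identity quoted here for illustration, not part of the original entry.

```python
# Sketch generating the Hermite polynomials listed above via the recurrence
# H_{n+1}(x) = x*H_n(x) - n*H_{n-1}(x); coefficients stored constant term first.

def hermite(n):
    """Coefficients of H_n(x), lowest degree first."""
    h_prev, h_curr = [1.0], [0.0, 1.0]            # H_0 = 1, H_1 = x
    if n == 0:
        return h_prev
    for k in range(1, n):
        nxt = [0.0] + h_curr                      # x * H_k(x)
        for i, c in enumerate(h_prev):            # minus k * H_{k-1}(x)
            nxt[i] -= k * c
        h_prev, h_curr = h_curr, nxt
    return h_curr

print(hermite(3))   # [0.0, -3.0, 0.0, 1.0]        i.e. x^3 - 3x
print(hermite(4))   # [3.0, 0.0, -6.0, 0.0, 1.0]   i.e. x^4 - 6x^2 + 3
```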


Hermitian matrix A matrix which equals the transpose of its complex conjugate: (A*)ᵀ = Aᴴ. Named for the French mathematician, Charles Hermite (1822–1901), who first introduced it (Hermite 1855). The term occurs in geophysics in Tarlowski (1982); see also Gubbins (2004).

Hermitian operator An operator on a Hilbert space that is its own Hermitian conjugate is called a Hermitian operator (Gubbins 2004). Named for the French mathematician, Charles Hermite (1822–1901).

Hermitian transpose The Hermitian transpose of an m by n matrix A with complex elements is the n by m matrix Aᴴ, obtained by first converting all elements of A to their complex conjugates A* and then transposing the elements, so that Aᴴ = (A*)ᵀ. Named for the French mathematician, Charles Hermite (1822–1901). See Buttkus (1991, 2000).

Hertz (Hz) A unit of frequency: the number of complete cycles per second. It was named by the Commission électrotechnique internationale in 1930 in honour of the German physicist, Heinrich Rudolph Hertz (1857–1894), who undertook a great deal of pioneering electromagnetic research and developed the Hertz antenna in 1886. It was reconfirmed as an international unit by the Conférence générale des poids et mesures in 1960.

Heterodyne amplitude modulation A constant-amplitude sinusoidal "carrier" waveform with a relatively long wavelength is modulated such that its amplitude becomes proportional to that of another waveform whose information content is to be transmitted. The resulting waveform will have a constant pattern of varying amplitude over a fixed interval (beat wavelength). The technique was fundamental to the early transmission of radio signals carrying speech and music. In an earth science context Weedon (2003) distinguishes between heterodyne amplitude modulation and imposed amplitude modulation: heterodyne amplitude modulation is the addition of two sinusoids with similar wavelengths to create a new waveform which has a frequency equal to the average of those of the two waveforms added. The amplitude of the resultant waveform (the beat) varies in a fixed pattern over the beat wavelength and has a frequency which equals the difference in the frequencies of the two added waveforms. The principle was originally conceived by the Canadian-born chemist, physicist, and wireless telegrapher, Reginald Aubrey Fessenden (1866–1931) (Fessenden 1902), who also coined the term heterodyne from the Greek, heteros (other) and dynamis (force). The superheterodyne receiver evolved through the work of the wireless telegrapher Lucien Lévy (1892–1965) in France, and the American electrical engineer, Edwin Howard Armstrong (1890–1954), who patented it in 1917; by 1921 the term had come into general use (Armstrong 1917, 1921, 1924).

Heterogeneous data set A data set formed of two or more subpopulations which have different characteristics in terms of their statistical parameters, e.g. mean and standard deviation. The term occurs in an earth science context in Gutenberg (1954).
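A quick numerical check of the Hermitian transpose relation quoted above, using NumPy (illustrative only; the matrix values are invented):

```python
# Check that A^H = (A*)^T: conjugate then transpose, in either order.

import numpy as np

A = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 4 + 0j],
              [2 - 3j, 1 + 1j]])      # a 3 x 2 complex matrix (invented values)

A_H = A.conj().T
print(A_H.shape)                      # (2, 3)
print(np.allclose(A_H, A.T.conj()))   # True: the two operations commute
```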


Heterogeneous strain The change in shape or internal configuration of a solid body resulting from certain types of displacement as a result of stress. Homogeneous strain operates such that an initial shape defined by a set of markers in, say, the form of a circle (or sphere) is deformed into an ellipse (or ellipsoid). In heterogeneous strain the final shape formed by the markers will be irregular. Implicit in the work of the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857) (Cauchy 1823, 1827), the first rigorous definition of the term strain (in which it was contrasted with stress) was given by the British engineer, William John Macquorn Rankine (1820–1872) (Rankine 1855, 1858). The term homogeneous strain was used by the British physicist, William Thomson [Lord Kelvin] (1824–1907) (Thomson 1856). Both strain and homogeneous strain were introduced into geology by the American mining and structural geologist, George Ferdinand Becker (1847–1919) (Becker 1893).


Heterogonic growth Allometry was originally the study of the relationship between a measurement characterising the size of a body as a whole (e.g. its weight or overall length) and that of any of its parts (e.g. a limb) or, latterly, the relative growth of any two parts. In geology it has been particularly applied in palaeo-biometrics. The German psychiatrist, Otto Snell (1859–1939), first drew attention to the importance of relating brain size to body size (Snell 1892). The English evolutionary biologist, Julian Huxley (1887–1975), suggested (Huxley 1932) that the relative change in growth of two skeletal parts, x and y, could be expressed by the general equation y = bx^k, where b and k are constants. If the parts grow at the same rate, k equals 1, and it is known as isogonic growth; if k is not equal to 1, it is known as heterogonic growth. Early palaeontological studies include Hersh (1934) and Olsen and Miller (1951). As it was realised that the assumption of dependent and independent variables in the case of morphological dimensions was not really applicable (Kermack and Haldane 1950), the line of organic correlation, later known as the reduced major axis, was used to fit regression models to such morphological measurement data.

Heteroscedasticity, heteroscedastic This applies when the magnitude of the variance of a variable is not the same for all fixed values of that variable, e.g. the spread of analytical (chemical) error for a given analyte tends to increase with concentration of the analyte. The term was introduced by the British statistician, Karl Pearson (1857–1936) (Pearson 1905b). These terms occur in an earth science context in Reyment (1962), Griffiths (1967a, b, c), Hutchinson et al. (1976) and Hawkins and ten Krooden (1979).

Heterotopic data An effect in one variable that is controlled by variations in another (Herzfeld 1990).

Hexadecimal A base-16 number system used in computing. Introduced to assist human understanding of binary-coded information, as all 256 possible values of a byte can be represented using two digits in hexadecimal notation. It was in use from 1956 in the Bendix G-15 computer, but the current system of representation was introduced by IBM in 1963 (cf. Amdahl et al. 1964). An early usage in geophysics is given in Nordquist (1964) for an earthquake catalogue; his programs were used on a Bendix G15D computer to calculate the distribution of seismicity over Southern California.

Hexagonal field of variation A method of placing a trivariate uncertainty region around the arithmetic mean composition, plotted on a ternary diagram: usually drawn by calculating a separate 1- or 2-standard deviation interval on the mean for each of the three components, drawing these as a pair of lines parallel to each side of the triangle, and then joining them at their intersections to form the final hexagonal boundary. Introduced into geology by Stevens et al. (1956). See Philip et al. (1987) and Howard (1994) for discussion. However, because of the constant-sum nature of the data (closed data), their construction is based on erroneous application of univariate theory. See Weltje (2002) and Buccianti et al. (2006) for an improved method based on the additive lognormal distribution.

Hidden-line problem, hidden-surface problem In early computer graphics, "solid" 3-dimensional objects were often represented by "wire-frame models" in which only the edges of each surface were represented. The hidden-line problem was to determine which parts (if any) of these lines should be hidden from view when the opaque object was displayed in 2-D as though viewed from a particular point in space. The hidden-surface problem extends the problem to the representation of the opaque exterior surface of the object, which will probably include shaded surfaces. An extensive review of early algorithms directed at solving such problems was given by Sutherland et al. (1974); the first solution to the hidden-line problem being that of the American computer scientist, Lawrence Gilman Roberts (1937–) (Roberts 1963). The first applications in geology appeared in the 1970s (Sprunt 1975; Tipper 1976) and have followed the subsequent evolution into shaded grey scale images (Savazzi 1990), colour images (Erlebacher et al. 2001) and virtual reality (Lin et al. 1998).

Hierarchical cluster analysis A method of cluster analysis which imposes a tree-like structure on the objects classified in terms of their similarity to each other. See: cladogram, dendrogram.

Hierarchical sampling Hierarchical, stratified (or stratified random), multi-stage or nested sampling are all names for a sampling design in which the n specimens to be taken from a fixed interval (e.g. a vertical section through the horizon of interest) are selected at random positions (chosen using a random number table, or computer-generated sequence of random numbers, to avoid bias) within n equal-length subdivisions of the entire interval. The name derives from the division of the population to be sampled into parts, known (probably after early geological usage) as "strata." This sampling strategy is particularly appropriate in spatial geological studies so as to achieve regionally adequate coverage. For example, in a region covered by a particular geological formation to be sampled for a pilot environmental survey, one might divide the area occupied by the formation in question into 10 × 10 km grid squares, and select a number of these either on a spatially regular or random basis; within each, select at random two 1 × 1 km sub-cells; within each of these, take pairs of samples 100 m apart at two randomly-selected positions, and combine these four field samples together so as to provide a single composite sample which will subsequently be used for laboratory preparation and analysis. This hierarchical approach originated in social survey work by the Norwegian statistician, Anders Nicolai Kiaer (1838–1919) (Kiaer 1895) and was later established on a sound theoretical basis by the Russian-born American statistician, Jerzy Neyman (1894–1981) (Neyman 1934). It was introduced into geology by the American mathematical geologist, William Christian Krumbein (1902–1979) and statistician, John Wilder Tukey (1915–2000) (Krumbein and Tukey 1956); see also Krumbein and Graybill (1965), Tourtelot and Miesch (1975) and Alley (1993).
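A minimal sketch of the first stage described above (one randomly placed specimen inside each of n equal-length subdivisions of an interval); the interval length, n and seed are arbitrary illustrations, not from the source.

```python
# Stratified (hierarchical) sampling of a vertical section: one random
# position inside each of n equal-length subdivisions of the interval.

import random

def stratified_positions(interval_length, n, seed=None):
    rng = random.Random(seed)
    stratum = interval_length / n
    return [i * stratum + rng.uniform(0, stratum) for i in range(n)]

print([round(z, 2) for z in stratified_positions(interval_length=12.0, n=4, seed=1)])
```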


High-cut filter More usually known as a low-pass filter.

High-level language A user-friendly computer programming language which hides the details of the underlying computer operations. See: ALGOL, APL, awk, BASIC, C, COBOL, FORTRAN, MATLAB, Pascal, PL/I, Prolog, R, S; see also: assembler language.

High-pass filter, high pass filter, highpass filter Filters are algorithms for selectively removing noise from a time series (or spatial set of data), smoothing, or for enhancing particular components of the signal by removing those that are not wanted. A high-pass filter attenuates frequencies below some cut-off value while passing higher frequencies (it may be used to emphasise anomalies in the data with unusually large positive, or negative, magnitudes). Electrical low-pass, high-pass and band-pass "wave filters" were initially conceived by the American mathematician and telecommunications engineer, George Ashley Campbell (1870–1954) between 1903 and 1910, working with colleagues, the physicist, Otto Julius Zobel (1887–1970) and mathematician Hendrick Wade Bode (1905–1982), but the work was not published until some years later (Campbell 1922; Zobell 1923a, b, c; Bode 1934). Equivalent filters were introduced into digital signal processing by the American statistician, John Wilder Tukey (1915–2000) and mathematician Richard Wesley Hamming (1915–1998) (Tukey and Hamming 1949). Parallel theoretical background was provided by the work of the American physicist, George W. Steward (1876–1956), who worked on acoustics between 1903 and 1926 and solved the fundamental wave equations involved in acoustic filter design (Crandall 1926). Highpass filter is still the most frequently used spelling (Google Research 2012). See also Wiener (1942, 1949), Vistelius (1961), Buttkus (1991, 2000), Howarth et al. (1980), Camina and Janacek (1984), Gubbins (2004).
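As a crude illustration of the high-pass principle (not any of the filter designs cited above), the sketch below simply zeros Fourier coefficients beneath a cut-off frequency; the synthetic series of a slow and a fast sinusoid is invented.

```python
# Crude frequency-domain high-pass filter: transform, zero the coefficients
# below a cut-off frequency, and invert.

import numpy as np

def highpass(x, dt, f_cut):
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=dt)
    X[freqs < f_cut] = 0.0           # remove low-frequency components
    return np.fft.irfft(X, n=len(x))

t = np.arange(0, 10, 0.01)
x = np.sin(2 * np.pi * 0.2 * t) + 0.3 * np.sin(2 * np.pi * 5.0 * t)
y = highpass(x, dt=0.01, f_cut=1.0)  # retains (approximately) only the 5 Hz component
print(round(float(np.std(y)), 3))
```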


High-tailed frequency distribution A frequency distribution in which, as the absolute values of the observations get large, the ratio of f1(x)/f2(x) approaches infinity, where f1(x) is the frequency distribution of the high-tailed distribution and f2(x) is the frequency distribution of the normal distribution (Link and Koch 1975).

Hilbert space An abstract vector space in which the methods of vector algebra and calculus are applicable in many dimensions; three-dimensional Euclidean space may be regarded as a subset. Named for the German mathematician, David Hilbert (1862–1943), in work by the Hungarian-American mathematician János (John) von Neumann (1903–1957) (von Neumann 1929), it occurs in Backus and Gilbert (1967) and Gubbins (2004).

Hilbert transform The Hilbert transform of a signal g(t) is:

$$H\{g(t)\} = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{g(t - \tau)}{\tau}\, d\tau,$$

(where the principal value of the integral is used), the convolution of g(t) with the signal 1/(πt). In practice, in the frequency domain, given an input signal, application of the transform induces a +90° phase shift on the negative frequencies, and a −90° phase shift on all positive frequencies, e.g. H{cos(at)} = sin(at); it has no effect on amplitude. In the time domain the signal and its Hilbert transform are orthogonal and have the same energy, as the energy of the signal remains unchanged. Its use provides a means of determining the instantaneous frequency and power of a signal. Named for the German mathematician, David Hilbert (1862–1943), it occurs in Moon et al. (1988) and Buttkus (1991, 2000).

Hill-Piper diagram A composite diagram showing both cation and anion compositions of groundwater. It is composed of three subplots: (i) a pair of ternary diagrams joined at their common (invisible) base to form a diamond with axes corresponding to the relative percentages of: 0–100% (Cl + SO4), above left, increasing upwards; 0–100% (Ca + Mg), above, right, increasing upwards; 0–100% (Na + K), below, left, increasing downwards; and 0–100% (CO3 + HCO3), below, right, increasing downwards; the two pairs of cations and anions are therefore opposite each other. (ii) Below the diamond to the right is a ternary diagram with its side corresponding to 0–100% (CO3 + HCO3), parallel to the lower right side of the diamond and increasing to 100% at its lower left apex; bottom, 0–100% Cl, increasing to the lower right apex; and 0–100% SO4, right, increasing to the top apex. (iii) Below the diamond to the left is another ternary diagram for 0–100% (Na + K), parallel to the lower left side of the diamond and increasing to 100% at the lower right apex; 0–100% Ca, increasing towards the lower left apex; and 0–100% Mg, increasing towards the top apex. Each water sample composition is plotted as a point in all three diagrams. Popularised by the American hydrogeologist, Arthur Maine Piper (1898–1989) (Piper 1944) in a form modified from a diagram introduced by the American civil engineer Raymond Alva Hill (1892–1973) (Hill 1940). See also: Chadha diagram.
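The frequency-domain description in the Hilbert transform entry above can be illustrated with a short FFT-based sketch (not from the source); it verifies that the transform of a cosine is the corresponding sine.

```python
# Discrete Hilbert transform via the FFT: multiply the spectrum by -i*sign(f),
# i.e. -90 degrees for positive and +90 degrees for negative frequencies,
# as described in the Hilbert transform entry above.

import numpy as np

def hilbert_transform(x):
    n = len(x)
    X = np.fft.fft(x)
    f = np.fft.fftfreq(n)
    X *= -1j * np.sign(f)
    return np.real(np.fft.ifft(X))

t = np.linspace(0, 1, 400, endpoint=False)
x = np.cos(2 * np.pi * 5 * t)
print(np.allclose(hilbert_transform(x), np.sin(2 * np.pi * 5 * t), atol=1e-6))  # True
```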


Hill shading, hill-shading A method of depicting topographic relief by drawing shadows on a map to simulate the effect of natural illumination of the bare landscape by the sun from a given direction (Peucker and Cochrane 1974). See Horn (1981) for an historical review. As early as the sixteenth century, hand-drawn shading was used schematically to indicate mountainous relief in maps, e.g. in an edition of Claudius Ptolemy's Geography (Moleti 1562). One of the most impressive early examples of accurate hill shading is a 1:32,000-scale topographic map of the Canton of Zurich, Switzerland, surveyed over a 38-year period and completed in 1667 by the Swiss artist, mathematician, surveyor and cartographer, Hans [Johann] Conrad Gÿger [Geiger] (1599–1674) (Gÿger 1667). With the advent of the computer, a variety of algorithmic solutions have become available in which the terrain is considered in terms of a very large number of small plane elements and the correct illumination is calculated for each one separately (e.g. Yoëli 1967; Brassel 1974; Imhof 1982; Katzil and Doytsher 2003). Colour coding of amplitude is often used when the technique is applied to show the topography and strength of a gravity or magnetic field (e.g. Neumann et al. 2015). Spellings with and without hyphenation seem to be equally frequent; the technique appears to be less frequently known as relief shading (Google Research 2012).

Histogram A graph in which the absolute or relative frequencies of occurrence of a continuous or discrete variable are shown by the proportional lengths of the vertical bars for each category in a data set. The side-by-side bars should be drawn with no gap between them. Bin-width and choice of endpoints may well affect the visual appearance of the graph. The term was originally used by the British statistician, Karl Pearson (1857–1936), in his lectures at Gresham College, London, in 1892 (Bibby 1986). The Swedish-born American geologist, Johan August Udden (1859–1932), originator of the grade scale for sedimentary grain sizes, used histograms (without explicitly naming them) to illustrate sediment grain-size distributions in Udden (1898). Even so, in the 1930s, some workers in sedimentology referred to a histogram in the context of sediment size distributions as a frequency pyramid (a practice discouraged by Krumbein 1934a). See also: Scott (1979) and Wand (1997), density trace. For discussion of treatment of data containing nondetects, see Helsel (2005).

History matching A type of inverse problem in which observed historical reservoir behaviour is used to aid the estimation of the reservoir model variables (such as permeability and porosity) which caused that behaviour, since a model (which may include a number of sub-models) which can reproduce past behaviour is believed to have a reasonable chance of estimating future behaviour. An early study by Coats et al. (1970) used Linear Programming. Gavalas et al. (1976) introduced a Bayesian approach. The importance of using both multiple algorithmic approaches and not necessarily accepting the first global minimum-error solution as "the answer" was subsequently recognised, and Koppen (2004) suggested the use of stochastic population-based algorithms. See Oliver and Chen (2011) and Hajizadeh (2011) for a review of the development of history matching techniques and the application of new population-based optimization methods: Ant Colony Optimization (Dorigo 1992; Dorigo et al. 1996); Differential Evolution (Storn and Price 1995); and the Neighbourhood Algorithm (Sambridge 1999a, b).

Hodges-Ajne test A simple test for uniformity of 2-D directional data, independently developed by the American statistician, Joseph Lawson Hodges Jr. (1922–2000) (Hodges 1955) and the Swedish statistician Björn Ajne (1935–2005) (Ajne 1968). Cheeney (1983) shows a simple geological usage; see also Mardia (1972) and Mardia and Jupp (2000).

Hoeppener plot A graphical technique to aid palaeo-stress analysis of fault plane data, introduced by the German structural geologist, Rolf Hoeppener. The poles of the fault planes plotted on a lower-hemisphere equal-area stereographic projection are combined with arrows indicating the direction of movement of the hanging block in each case (Hoeppener 1955; Krejci and Richter 1991).

Holdout validation See Jackknife validation.

Hollerith code Named for the German–American mining engineer and statistician, Herman Hollerith (1860–1929), who in 1889 was granted a patent for a method for encoding numerical, alphabetic and special characters using holes punched on the basis of a rectangular grid pattern on 45-column "punched cards" for use in mechanical tabulating machines (punched paper tape came into use in the 1840s). He founded the Tabulating Machine Co. which eventually became the International Business Machines Corporation (IBM) in 1924 (Kistermann 1991). In 1928, IBM introduced use of the 12-row 80-column (7⅜ × 3¼ inch; 18.733 × 8.255 cm) Hollerith punched card on which alphabetic, numerical and special characters were encoded for input/output of data (IBM 2014). These lasted into the 1970s, until being replaced by magnetic storage media. The first comprehensive illustration of the use of punched cards in geology was in a paper by the American mathematical geologist, William Christian Krumbein (1902–1979) and Laurence Louis Sloss (1913–1996) (Krumbein and Sloss 1958), recording positional and sand, shale and non-clastic rock thickness data, although Margaret Parker of the Illinois Geological Survey had also begun using punched cards in connection with stratigraphic and geochemical studies (Parker 1952, 1957).


Holomorphic function A complex-valued function of one or more complex variables that is complex-differentiable in a neighbourhood of every point in its domain. The term has been in use since its introduction by the French mathematicians Charles Auguste Briot (1817–1882) and Jean Claude Bouquet (1819–1895) (Briot and Bouquet 1856).

Homogeneity, homogeneous Terms generally used in statistics in the sense that samples from different populations which have essentially identical values of a parameter (or parameters, e.g. mean, standard deviation) are said to be "homogeneous" in respect of that parameter. An early example of geological usage is by the American statistician, Churchill Eisenhart (1913–1994) (Eisenhart 1935). These terms are also used with regard to the uniform composition of a physical sample, e.g. in geochemistry and mineralogy (Miesch 1976a).

Homogeneous equations A function f(x₁, x₂, · · ·, xₙ) is called homogeneous of degree k if

f(ax₁, ax₂, · · ·, axₙ) = aᵏ f(x₁, x₂, · · ·, xₙ)

is true for every real number a. The term was used in Hutton (1815; Miller 2015a). A system of linear equations is called homogeneous if Ax = 0, i.e. the right hand side is a column vector whose entries are all zero (Camina and Janacek 1984).

Homogeneous strain The change in shape or internal configuration of a solid body resulting from certain types of displacement as a result of stress. Homogeneous strain operates such that an initial shape defined by a set of markers in, say, the form of a circle (or sphere) is deformed into an ellipse (or ellipsoid). In heterogeneous strain the final shape formed by the markers will be irregular. Implicit in the work of the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857) (Cauchy 1823, 1827), the first rigorous definition of the term strain (in which it was contrasted with stress) was given by the British engineer, William John Macquorn Rankine (1820–1872) (Rankine 1855). The term was used in 1856 by the British physicist, William Thomson (Lord Kelvin, 1824–1907) (Thomson 1856). Both strain and homogeneous strain were introduced into geology by the American mining and structural geologist, George Ferdinand Becker (1847–1919) (Becker 1893). See also Ramsay (1967), Ramsay and Huber (1983).

Homomorphic deconvolution, homomorphic filtering, homomorphic signal processing Homomorphic filtering is a nonlinear technique used in signal processing to separate signals which have been nonadditively combined, e.g. through convolution or multiplication. It is based on linear filtering operations applied to the complex cepstrum, followed by back-transformation to the original domain (Tribolet 1979). The method was introduced by the American electrical engineer, Alan Victor Oppenheim (1937–) (Oppenheim 1965a, b; Oppenheim et al. 1968). Early applications in seismology are Ulrych (1971) and Stoffa et al. (1974).

Homoscedastic, homoscedasticity The magnitude of variance of a variable is the same for all fixed values of that variable. The term was introduced by the British statistician, Karl Pearson (1857–1936) in Pearson (1905b). The term homoscedastic appears in Griffiths (1958) and Hawkins and ten Krooden (1979).

Hopf bifurcation A local bifurcation in a dynamical system at which a fixed point loses its stability to perturbations that take the form of growing oscillations, shedding a limit cycle. It occurs in certain chemical reaction systems and predator-prey models. Also known as the Andronov-Hopf bifurcation, it is named for the Russian control engineer, Aleksandr Aleksandrovich Andronov (1901–1952), who first discovered it in 1929, and the Austrian-born German mathematician, Eberhard Frederich Ferdinand Hopf (1902–1983), who independently discovered it (Hopf 1942, 1948, 1976). The term Hopf bifurcation was introduced by the Belgian physicist and mathematician, David Ruelle (1935–) and Dutch mathematician, Floris Takens (1940–) (Ruelle and Takens 1971). For discussion in an earth science context, see Turcotte (1997).

Horton diagram, Horton analysis Named for the American hydraulic engineer, Robert Elmer Horton (1875–1945), whose work founded the science of hydrology (Horton 1945). The American geologist and geomorphologist, Arthur Newell Strahler (1918–2002), showed (Strahler 1952) that in a drainage network in a single drainage basin, the logarithms of stream lengths and drainage sub-basin areas increase with stream order. The logarithm of stream numbers decreases approximately linearly with stream order (so chosen that the fingertip or unbranched tributaries are 1st order; streams which receive only 1st order tributaries are 2nd order, etc.). Such plots became known as a Horton diagram or Horton analysis. The theoretical basis for these results was investigated by Shreve (1966). See also Zavoianu (1985).

Horton-Strahler number Stream order is a method of classifying stream segments between confluences (links) based on the number of tributaries upstream. In the original scheme, devised by the American hydraulic engineer, Robert Elmer Horton (1875–1945), whose work led to the founding of the science of hydrology, Horton (1945) designated a headwater (or "fingertip") stream, i.e. one with no tributaries, as 1st order. Tributaries or streams of the 2nd order received branches or tributaries only of the 1st order; a 3rd order stream must receive one or more tributaries of the 2nd order, but may also receive tributaries of the 1st order, etc. However, so as to determine which is the parent and which is the tributary stream at a given split (bifurcation), in his scheme the stream joining the "parent" stream at the greatest angle was, by definition, of lower order. This led to reclassification of some links and extension of higher-order streams up the drainage network, so that some fingertips, and the channel leading from them, could become 2nd

270

H

or 3rd order. To avoid this problem, the American geologist and geomorphologist, Arthur Newell Strahler (1918–2002) adapted Horton's scheme: fingertip channels were all, by definition, 1st order. A stream segment downstream of the junction of two 1st order streams became 2nd order and, in general, an nth order stream lay downstream of the confluence of two (n − 1)th order streams. Streams of lower order joining a higher order stream did not change the order of the higher stream. Hence, if a 1st-order stream joined a 2nd-order stream, it remained a 2nd-order stream. It is not until a 2nd-order stream combines with another 2nd-order stream that it becomes a 3rd-order stream. Streams up to 3rd order constitute headwater streams, and anything larger than 6th order is considered to be a river. The largest-known river (the Amazon) is 12th order. See Shreve (1966, Fig. 1) for a comparison of the Horton and Strahler definitions applied to the same stream network. Whichever scheme is adopted, the logarithm of the number of streams of a given order (Horton stream numbers, Strahler stream numbers) decreases linearly as a function of increasing stream order. Such a graph is known as a Horton diagram. The Swiss-born Austrian geologist, Adrian Scheidegger (1925–2014), then proposed a scheme in which every junction was associated with a progressive increase in stream order (Scheidegger 1965) and the American geomorphologist, Ronald Lee Shreve (1930–), introduced the concept of link magnitude, given by the number of 1st order (headwater) channels upstream of a given link. Thus, a 1st order stream joining a 2nd order results in a 3rd order downstream; a 2nd order and 3rd order stream joining produce a 5th order link, etc. This scheme appears to have become subsequently known as the Shreve order. The Strahler scheme has since been generally adopted, denoted the Strahler number, and on occasion (incorrectly, for the reason given above) as the Horton-Strahler number. Latterly, the Strahler number has been applied to binary trees, either in drainage network simulation (Yuan and Vanderpool 1986) or more generally (Devroye and Kruszewski 1995). A minimal computational sketch of Strahler ordering is given below, following the entry for hurricane sample. Hough transform A feature extraction technique used in digital image processing to identify the presence of lines in an image. Originally developed by the American electrical engineer, Richard Oswald Duda (1936–) and computer scientist, Peter Elliot Hart (1941–), and named by them for the physicist, Paul van Campen Hough (1925–), whose algorithm for line detection (Hough 1962) they showed to be computationally unfeasible, and which they improved (Duda and Hart 1972). Yamaji et al. (2006) applied it to paleostress analysis. See also: multiple inverse method. Huffman coding A method of data encoding, based on a frequency-sorted binary tree, which ensures lossless data compression. The value at each internal node of the tree is the sum of the frequencies of occurrence of the nodes below it. Published by the computer scientist, David Albert Huffman (1925–1999) in 1952. Kidner and Smith (1992) give an earth science example of its use. Hurricane sample A term used in Russian and East European literature on ore deposits to refer to specimens with an anomalously high metal content (Vassilev 1972).
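As an illustration of the Strahler ordering described under Horton-Strahler number above, a minimal Python sketch, assuming (hypothetically) that a binary channel network is stored as nested tuples whose leaves are named fingertip channels:

def strahler(node):
    """Return the Strahler order of a binary drainage network given as nested tuples."""
    if not isinstance(node, tuple):          # a fingertip (headwater) channel
        return 1
    left, right = (strahler(branch) for branch in node)
    # two tributaries of equal order raise the order by one; otherwise the larger order persists
    return left + 1 if left == right else max(left, right)

# two 1st-order streams join to form a 2nd-order stream, which is not raised
# by a further 1st-order tributary:
network = ((("a", "b"), "c"), ("d", "e"))
print(strahler(network))                     # 3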


Hurst exponent (H) Named for the British hydrologist, Harold Edwin Hurst (1880–1978) who, during a career spent largely in Egypt, studied the 800-year flooding history and flow pattern of the river Nile. If r is the range of the values of a time series and s is its standard deviation, both determined over a time interval T, then for some process which has no persistence (i.e., regular behaviour), the ratio r/s, known as the rescaled range (Hurst et al. 1965), is independent of T; for other processes, r/s = (T/2)^H, where the constant H (originally designated by Hurst as K) is now known as the Hurst exponent. H ~ 0.5 for a random walk; if H < 0.5, the series has negative autocorrelation; and if 0.5 < H < 1.0, the series has positive autocorrelation and is known as a long memory, or persistent memory, process. The fractal dimension, D = 2 − H. For a time series xi (i = 1, …, n), H is estimated from the slope of a graph of log(r/s) as a function of log(n), where n, the length of the series considered, may be increased by intervals of, say, 100 or more successive points. In some cases, long sequences (several thousand points) may be required before a conclusive result can be obtained. A short numerical sketch of the estimation is given below, following the entry for hyperbolic functions. See also: Hurst (1951, 1955), Hurst et al. (1965) and Turcotte (1997). Hurwitz criterion, Routh-Hurwitz criterion A test used to show whether the equations of motion of a linear time-invariant control system have only stable solutions. Proposed by the German mathematician, Adolf Hurwitz (1859–1919) (Hurwitz 1895). Also known as the Routh-Hurwitz criterion as the English mathematician Edward John Routh (1831–1907) had proposed an equivalent procedure in 1876 (Routh 1877) to determine whether all the roots of the characteristic polynomial of a linear system have negative real parts. Mentioned in Buttkus (1991, 2000). Hybrid walk A method for determining fractal dimension (Clark 1986). Investigated in an earth science context by Longley and Batty (1989). Hyperbolic functions (sinh, cosh, tanh) The hyperbolic sine function, sinh(x) = [e^x − e^(−x)]/2; hyperbolic cosine, cosh(x) = [e^x + e^(−x)]/2; and hyperbolic tangent, tanh(x) = sinh(x)/cosh(x) = (e^(2x) − 1)/(e^(2x) + 1), where e is Euler's number, the constant 2.71828…. Also: sinh(x) = −i sin(ix); cosh(x) = cos(ix) and tanh(x) = −i tan(ix), where i is the imaginary unit √(−1); and sinh^(−1)(x) = ln[x + (x² + 1)^0.5]; cosh^(−1)(x) = ln[x + (x² − 1)^0.5]. These relations were introduced by the Italian mathematician, Count Vincenzo Riccati (1707–1775) (Riccati 1757–1762). An example of their early use in geophysics is by Macelwane (1932).
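A minimal Python sketch of the rescaled-range estimation of the Hurst exponent described above; it uses the conventional form in which r is the range of the cumulative departures from the subseries mean, and the subseries lengths chosen are illustrative assumptions only:

import numpy as np

def rescaled_range(x):
    """r/s for one subseries: range of cumulative mean-deviations over the standard deviation."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())
    return (y.max() - y.min()) / x.std()

def hurst_exponent(x, lengths=(100, 200, 400, 800, 1600)):
    """Slope of log(r/s) against log(n) for increasing subseries lengths n."""
    x = np.asarray(x, dtype=float)
    rs = [np.mean([rescaled_range(x[i:i + n])
                   for i in range(0, len(x) - n + 1, n)]) for n in lengths]
    slope, _ = np.polyfit(np.log(lengths), np.log(rs), 1)
    return slope

rng = np.random.default_rng(1)
print(round(hurst_exponent(rng.normal(size=4000)), 2))
# around 0.5-0.6 for white noise (the r/s estimator is biased upwards for short series)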


Hyperbolic distribution See log-hyperbolic distribution. Hypercube A D-dimensional analogue of a square (D = 2) and a cube (D = 3); mentioned in Davis and David (1978). The Swiss mathematician, Ludwig Schläfli (1814–1895) was the first to investigate the properties of the hypercube in 1850–1852 (Schläfli and Graf 1901), although it is also discussed in Delboeuf (1894). The American computer scientist, A. Michael Noll (1939–), working at Bell Telephone Laboratories, used a Stromberg-Carlson microfilm plotter to make the first computer-generated stereoscopic film of a rotating 4-D hypercube (Noll 1964, 1965a, b). See also: hypersurface, Latin hypercube sampling.


Hyperplane, hypersurface The equivalent of a plane in four or more dimensions: a hyperplane of a D-dimensional space is a "flat" subset of dimension (D − 1). A hypersurface is the equivalent of a surface in four or more dimensions. Rao and Rao (1970b) discuss the properties of a quadratic hypersurface generated in a three-dimensional functional space, e.g. the ellipsoid:

a(x1)² + b(x2)² + c(x3)² + 2f x2x3 + 2g x3x1 + 2h x1x2 + 2u x1 + 2v x2 + 2w x3 + d = 0.

See also hypercube. Hypothesis test The use of statistics to determine the probability that a stated hypothesis is true. Assume that f(X, θ) is a population density function (e.g. a normal distribution with a parameter θ) and some hypothesis about θ is to be tested, based on a random sample of size n. Generally, the null hypothesis (usually designated H0) is to test either whether θ is equal to some assigned value, or whether it lies within a stated interval. The alternative hypothesis (Ha) states the values that θ can assume if the hypothesis under test is not true. Note that in general it is inappropriate to use data to formulate a hypothesis and then to use the same data to test it: a second, independently obtained, sample is required to do this. A suitable statistical test is chosen and the value of the test statistic calculated. There are two possible types of error which one would like to minimise: Type I, that H0 may in fact be true, but is rejected as false; and Type II, that H0 may in fact be false, but is accepted as true. In practice, the observed value of the test statistic (τo) is compared with the theoretical value (τα,n) for a sample size of n and a chosen small level of the probability of committing a Type I error (α = 0.001, 0.005, 0.01, 0.05 or 0.10), the size of the test. Then if τo exceeds τα,n, reject H0; or, if τo is less than or equal to τα,n, accept H0. The general theory of hypothesis testing was developed by the Russian-born American statistician, Jerzy Neyman (1894–1981) and the English statistician, Egon Sharpe Pearson (1895–1980) in Neyman and Pearson (1928). See Miller and Kahn (1962) for an early extended discussion in a geological context.
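As a minimal illustration of the testing procedure just described, a one-sample t-test in Python; the sample values and the hypothesised population mean of 2.65 are hypothetical:

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=2.71, scale=0.40, size=30)    # e.g. 30 density determinations (g/cm3)

# H0: the population mean equals 2.65; Ha: it does not (two-sided test)
t_obs, p_value = stats.ttest_1samp(sample, popmean=2.65)
alpha = 0.05                                          # size of the test (Type I error risk)
print(f"t = {t_obs:.2f}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")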


Hypsometric curve A hypsometric curve is constructed by calculating the area between the contours on a topographic map for a region, or a watershed, and plotting, as the y-axis, either (i) the given elevation (h) with respect to sea level, with the cumulative area (a) above the corresponding elevation as the x-axis, or (ii) the relative elevation (h/H) as a function of relative area (a/A), where H and A are the maximum height and area, respectively. The latter, non-dimensional, version was introduced by the American geologist and geomorphologist, Arthur Newell Strahler (1918–2002) (Strahler 1952), who recommended it as the preferable form, since comparisons between drainage basins could then be made irrespective of their true geographic scale. He called the area under the curve the hypsometric integral; a short numerical sketch is given below, following the entry for Hz. For discussion see: Pike and Wilson (1971), Harlin (1978), Ohmori (1993), Willgoose and Hancock (1998) and Soares and Riffel (2006). Hz (hertz) A unit of frequency: the number of complete cycles per second. First named by the Commission électrotechnique internationale in 1930, in honour of the German physicist, Heinrich Rudolph Hertz (1857–1894), who undertook a great deal of pioneering electromagnetic research and developed the Hertz antenna in 1886. It was reconfirmed as an international unit by the Conférence générale des poids et mesures in 1960.
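A minimal Python sketch of the relative hypsometric curve and the hypsometric integral, assuming a hypothetical grid of basin elevations:

import numpy as np

# hypothetical 50 x 50 grid of elevations (m) for a single drainage basin
rng = np.random.default_rng(0)
z = rng.uniform(200.0, 1400.0, size=(50, 50)).ravel()

h = np.sort(z)
rel_height = (h - h.min()) / (h.max() - h.min())      # h/H, from 0 to 1
rel_area = 1.0 - np.arange(h.size) / h.size           # a/A lying above each elevation

# hypsometric integral: area under rel_height plotted against rel_area
order = np.argsort(rel_area)
x, y = rel_area[order], rel_height[order]
hyps_integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
print(round(hyps_integral, 2))                        # ~0.5 for uniformly distributed elevations
# the elevation-relief ratio gives a quick approximation to the same quantity:
print(round((z.mean() - z.min()) / (z.max() - z.min()), 2))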

I

i [notation], imaginary unit A mathematical symbol generally used to denote the imaginary unit, the constant √(−1). Note that some authors use j for this purpose. Although such "imaginary" numbers had been used by the Italian mathematician Girolamo Cardan (1501–1576) in 1545 and other mathematicians subsequently, it was the Swiss mathematician and physicist, Leonhard Paul Euler (1707–1783) who introduced the symbolic notation i (Euler 1748). An example of early use in geophysics is Macelwane (1932). See also: complex conjugate, complex number. Identification coefficient A measure of the similarity of the characteristics of an unknown sample to those of a number of pre-defined taxa or groups, all being coded simply as the presence or absence of each criterion, as an aid to classification (Sneath 1979). The stored matrix of the properties of the groups, in terms of the percentage of occasions on which a given characteristic is associated with the group, is referred to as the identification matrix. Sneath uses a number of identification coefficients to characterise the unknown's similarity to each group, including the Willcox probability (Willcox et al. 1973), taxonomic distance (Sokal 1961) and pattern distance (Sneath 1968). Identity matrix (I) I is the usual notation for the identity matrix: a square matrix in which the elements on the principal diagonal are all equal to unity and the off-diagonal elements are all zero:

        | 1 0 0 |
    I = | 0 1 0 |
        | 0 0 1 |


It was introduced by the English mathematician, Arthur Cayley (1821–1895) (Cayley 1858). Originally known as the “unit matrix,” the term identity matrix appears to have come into use in the early 1900s (e.g. Dickson 1908; Miller 2015a) and since 1960 has become the more usual term of the two (Google Research 2012). An early usage in a geological context occurs in Krumbein and Graybill (1965). iff [notation] It is an abbreviation for the logical condition “if, and only if.” Generally believed to have been introduced by the Hungarian-American mathematician, Paul Richard Halmos (1916–2006) in the mid-1950s, its first appearance in print is credited to the American mathematician, John Le Roy Kelley (1916–1999) (Kelley 1955; Miller 2015a). However, the use of symbolic logic for this purpose goes back to the work of the German polymath, Gottfried Wilhelm von Leibniz (1646–1716) in 1679 (Rescher 1954). The term appears in a geological context in Dienes and Mann (1977) and is listed in Sheriff (1984).


IGROCS A program, written in Visual Basic for the .NET Framework environment, for the chemical classification and nomenclature of igneous rocks based on the International Union of Geological Sciences classification scheme (Verma and Rivera-Gómez 2013). iid [notation] Abbreviation for "independent and identically distributed" random variables. While the term itself seems to have first become frequent in the 1940s, the abbreviation appears to have been first used in the 1960s (Google Research 2012). Ill-conditioning, ill conditioning, illconditioning, ill-conditioned An error is created when the exact mathematical value of a decimal number is replaced by a number with a finite number of digits (or even by an integer). This is known as roundoff error. When a series of calculations (e.g. solving a system of equations) is subject to roundoff error, these may accumulate in some cases so as to render the result of the calculation meaningless. Such problems can easily arise in computations for multiple regression in which the symmetric and positive-definite matrix of normal equations is likely to be ill-conditioned, particularly if polynomial equations of reasonably high degree are involved (Healy 1963). Those with the smallest determinant are likely to be most ill-conditioned. The extent to which this is true is indicated by the condition number (Turing 1948) and special methods have subsequently been developed to minimise the computational problems (Riley 1955; Healy 1963; Ashenhurst and Metropolis 1965; Goldberg 1991). For early discussion of such problems in petrology see Vistelius (1967, 29–40); and in geophysics see: Paul (1961), Kovach and Anderson (1964); see also Sarma and Selvaraj (1990), Cicci (1992), Ellmann (2005) and Santos and Bassrei (2007). Ill-conditioning and ill-conditioned are the most widely-used spellings (Google Research 2012). See also: floating-point representation, L-curve, truncation error.
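A short Python illustration of ill-conditioning in the normal equations of a polynomial regression; the coordinate values are hypothetical and the exact condition numbers will vary by platform:

import numpy as np

# normal-equations matrix for a degree-8 polynomial fit on a long coordinate range:
# a classically ill-conditioned computation
x = np.linspace(1000.0, 1010.0, 50)
A = np.vander(x, 9)                       # design matrix of the polynomial regression
N = A.T @ A                               # symmetric normal-equations matrix
print(f"condition number: {np.linalg.cond(N):.3e}")

# centring and scaling the coordinate greatly improves the conditioning
xs = (x - x.mean()) / x.std()
Ns = np.vander(xs, 9).T @ np.vander(xs, 9)
print(f"after standardising x: {np.linalg.cond(Ns):.3e}")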


Ill-structured problem Early usage of this term in the field of computing occurs in Martin (1965), who defined it as a problem "we do not understand well enough to solve with a mathematical model" (i.e., there is no single, identifiable, objectively optimal solution); consequently, there is no clear route to its solution. Although the term occurs in Tuska (1944), it seems to have gained popularity through management science, possibly through the work of Russian-born American mathematician and pioneer of strategic management, H. Igor Ansoff (1918–2002) (Ansoff 1959), and operations research (Optner 1965). Typical of such problems are the making of a geological map, the interpretation of seismic sections, or the assessment of the vulnerability of an urban area to earthquake damage. For discussion see Simon (1973), Rashed and Weeks (2003), and Bond et al. (2011). Image analysis The extraction of meaningful information from digital images by means of image processing techniques (Taranik 1978; Scales 1995; Heilbronner and Barrett 2014). It could be considered as the digital era's equivalent of human photographic interpretation (Lueder 1959; Miller and Miller 1961; Lattman and Ray 1965). Image compression The reduction of the file size (in bytes) of a digitized image (generally a monochrome or colour photographic image of some kind, or a digitized map), consisting of a two-dimensional array of integer pixel values, for the purposes of efficient storage and transmission. The compressed image may be subsequently restored to its original size. Algorithms used to accomplish this are classified as either lossless, which have no information loss in the restored image, or lossy, which enable a higher degree of compression at the expense of some information loss (Rabbani and Jones 1991). A widely-used example is the JPEG (Joint Photographic Experts Group) file format (Taubman and Marcellin 2001). Current standards for space-borne imagery are described in Consultative Committee for Space Data Systems (2005, 2015). Southard (1992) discusses the compression of digitized map images. Image processing Any form of signal processing for which the input is a 2-dimensional image. Early work accomplished this by optical image processing, but as cheaper and more powerful computers became available, from the 1960s onwards, the term was used to imply digital image processing (Abrams 1978; Condit and Chavez 1979; Fabbri 1984; Gonzalez and Wintz 1987; Pike 1992; Petrou and Petrou 2010). Image texture Texture in an image is an expression of its topography as characterised by the variation in grey scale (or colour) intensity of a group of adjacent pixels in an image, which a human observer might describe as e.g. a "smooth," "rough," "bumpy," or "ridged," etc. surface. The identification of regions of an image which have similar textures can be used as a means of partitioning or to aid classification of elements within the image, e.g. Franklin and Peddle (1987) and Materka and Strzelecki (1998).


Imaginary number, imaginary part A complex number has both real and imaginary parts, terms introduced by the French mathematician and philosopher, René Descartes (1596–1650) (Descartes 1637), e.g. z = x + iy, where x is the real part and iy is called an imaginary number and forms the imaginary part of the complex number, where the constant i represents the imaginary unit √(−1). For usage in a geophysical context see Camina and Janacek (1984), Buttkus (1991, 2000) and Gubbins (2004). Imposed amplitude modulation This is the modification of a high-frequency sinusoid by one of longer period (e.g. by multiplication of the two signals) to produce a combined signal in which amplitude varies in a fixed pattern; maximum amplitude corresponding to the frequency of the imposed, longer wavelength, signal. The term occurs in a geological context in Weedon (2003), Rial (2003) and Weedon et al. (2004). See also: heterodyne amplitude modulation.


Impulse A probability distribution in which P(x) = 0 for x from −∞ to +∞, x ≠ 0; however, at x = 0, P(x) = ∞. Its use was popularised by the British physicist, Paul Adrien Maurice Dirac (1902–1984) who introduced it (Dirac 1930, p. 58) as a tool in quantum mechanics. Discussed in a geophysical context by Buttkus (1991, 2000), Gubbins (2004), Gunduz and Aral (2005). Also known as the Dirac Delta function, Dirac function; see also: Kronecker Delta, Heaviside function, Dirac comb. Impulse response function The time function (characteristic waveform) describing a filter in terms of the output resulting from an input described by a Dirac function applied at time t = 0. The filter is completely described by its transfer function. See Blackman and Tukey (1958) and in an earth science context: Robinson (1967b), Buttkus (1991, 2000) and Weedon (2003). See also Finite impulse response filter. In-phase 1. A condition in which the crests (troughs) of two time series waveforms are of the same phase. An early example of the term in this context in geophysics occurs in Neumann (1925). 2. An induced signal with the same phase angle as that of the exciting, or comparison, signal. See also: out-of-phase. Inaccuracy The departure of a measurement or recorded value from the true value as a result of instrumental error such as bias, lack of repeatability, drift, etc. The term was used by the English physician, chemist, metallurgist and crystallographer, William Hyde Wollaston (1766–1828) in his description (Wollaston 1809) of an optical goniometer which he developed so as to make crystallographic measurements. See also: accuracy, precision.


Inconsistency A set of two or more equations which cannot be solved because there is no set of values for the variables which can satisfy all the equations simultaneously. The problem can arise in seismic data processing; see: Camina and Janacek (1984), Vasco (1986), Hanna (2003) and Lebedev and Van der Hilst (2008). Incomplete Beta function This is defined as:

B_x(α, β) = ∫_0^x t^(α−1) (1 − t)^(β−1) dt, 0 ≤ x ≤ 1,

where α > 0, β > 0 if x ≠ 1, or, in normalized form:

I_x(α, β) = [1/B(α, β)] ∫_0^x t^(α−1) (1 − t)^(β−1) dt, and B(α, β) = ∫_0^1 t^(α−1) (1 − t)^(β−1) dt = Γ(α)Γ(β)/Γ(α + β),

where B(α, β) is the Beta function and Γ(u) is the Gamma function, given by:

Γ(u) = ∫_0^∞ t^(u−1) e^(−t) dt, u > 0

(Abramowitz and Stegun 1965). Attempts to solve integrals of the form of B_x go back to the work of the English philosopher and mathematician, Thomas Bayes (1702–1761) (Bayes 1763; Dutka 1981). Pearson (1934) published tables of the incomplete Beta function for different values of α and β, which had been hand-calculated by human "computers" under his direction during the years 1923–1932. It occurs in Brutsaert (1968). Incomplete Gamma function The normalized function is defined as:

P(α, x) = [1/Γ(α)] ∫_0^x t^(α−1) e^(−t) dt,

where Γ(u) is the Gamma function, given by:

Γ(u) = ∫_0^∞ t^(u−1) e^(−t) dt, u > 0

(Abramowitz and Stegun 1965). The "upper" incomplete Gamma function is given by:

Γ(z, b) = ∫_b^∞ x^(z−1) e^(−x) dx;

and the "lower" incomplete Gamma function by:

γ(z, b) = ∫_0^b x^(z−1) e^(−x) dx.

Since the pioneering work of the French mathematician and geodesist, Adrien-Marie Legendre (1752–1833) (Legendre 1826) and of the British statistician, Karl Pearson (1857–1936) and his colleagues (Pearson 1922) to hand-calculate tables, computer algorithms have become widely available; a brief numerical illustration is given below, following the entry for independence testing. Its usage in earth science includes Brutsaert (1968), Kagan (1993) and Alamilla et al. (2015). Indefinite integral Also known as an antiderivative, it is defined by the relationship:

∫ f(x) dx = F(x) + C,

where the integral is without limits; f(x) is the integrand; dx is the variable of integration; F(x) is the indefinite integral and C is the constant of integration. For example:

∫ x^n dx = x^(n+1)/(n + 1) + C, (n ≠ −1).

The term was introduced by the French mathematician Sylvestre François Lacroix (1765–1843) (Lacroix 1806, 1810–1819; Miller 2015a). An early use in geophysics is Fisher (1881); mentioned in Camina and Janacek (1984). Independence, independent Two quantities are statistically independent if they possess a joint probability distribution such that neither incomplete nor complete knowledge of one alters the distribution of the other (Blackman and Tukey 1958). The concept of an independent event was first introduced by the French-born mathematician, Abraham De Moivre (1667–1754): "Two Events are independent, when they have no connection one with the other, and that the happening of one neither forwards nor obstructs the happening of the other" (De Moivre 1738). Independence testing A term used by Woronow and Butler (1986) to mean statistically testing whether two variables in a data set are uncorrelated. They focus particularly on the problems associated with constant-sum (e.g. percentaged) or so-called closed data.
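Referring to the incomplete Beta and Gamma functions defined above, a brief Python illustration of their regularized forms as implemented in SciPy; the parameter values are arbitrary:

from scipy import special

# regularized (normalized) incomplete Beta function I_x(alpha, beta)
print(special.betainc(2.0, 3.0, 0.5))      # I_0.5(2, 3) = 0.6875

# regularized lower incomplete Gamma function P(alpha, x) and its upper complement
print(special.gammainc(2.0, 1.0))          # P(2, 1) = 1 - 2/e, about 0.264
print(special.gammaincc(2.0, 1.0))         # about 0.736

# the complete Beta and Gamma functions used in the definitions above
print(special.beta(2.0, 3.0), special.gamma(4.0))   # 1/12 and 6.0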


Independent Component Analysis (ICA) Independent Component Analysis, also known as Blind Source (or Signal) Separation: a technique based on information theory, originally developed in the context of signal processing (Hérault and Ans 1984; Jutten and Hérault 1991; Comon 1994; Hyvärinen and Oja 2000; Hyvärinen et al. 2001; Comon and Jutten 2010) intended to separate independent sources in a multivariate time series which have been mixed in signals detected by several sensors. After whitening the data to ensure the different channels are uncorrelated, they are rotated so as to make the frequency distributions of the points projected onto each axis as near uniform as possible. The source signals are assumed to be non-Gaussian and statistically independent of each other. Unlike principal components analysis (PCA), the axes do not have to be orthogonal; linearity of the mixture model is not required; and ICA extracts statistically independent components, even if these components have non-Gaussian probability distribution functions. Ciaramella et al. (2004) and van der Baan (2006) describe its successful application to seismic data. See also: Constrained independent component analysis. Independent sample 1. Two or more samples collected from either the same population, or from different populations, in such a way that their collection has no effect on the other sample(s). The term was used by the Irish economist and mathematician, Francis Ysidro Edgeworth (1845–1926) (Edgeworth 1896) and had come into general use in statistical literature by the 1930s (Google Research 2012). Early examples of use of the term in a geochemical context are Farrington (1902), Abbott (1925) and Marble (1936). The importance of establishing a truly representative field sample was recognised by the American geologist, John Robert Reeves (1896–1985): "It has been the general practice in the oil-shale region of the mid-eastern States, as probably in the west, to make selective collection of samples; that is, a portion of the richest sample of the outcrop would be taken as a sample, the leaner portion of the outcrop not being collected. This practice is misleading as to the value of the formation at the locality at which it is sampled and is as impracticable as using one piece of coal from a six foot vein for a representative sample" (Reeves 1923). 2. A set of specimens collected in the same manner and similar in nature to a prior set whose data have been used to determine a statistic (e.g. mean concentration) or to establish a method which is to be used subsequently for predictive purposes (e.g. fitting a regression equation or a discriminant function, etc.). However, the data for this second, completely independent, set of samples are used so as to test the veracity of the predictive method or other prior result. Independent variable If a variable y is a function, y = f(x), of one (or more) predictors (x), then x is/are termed the independent variable(s) and y the dependent variable. In Anonymous (1830a, b) reference was made to one quantity which depends upon another as an algebraic function, and the term independent variable was used by Cayley (1879), but it was the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962), who first used these terms in a regression context in Fisher (1935). Early geological usage of the term occurs in papers by the American mathematical geologist, William Christian Krumbein (1902–1979) (Krumbein 1934b, 1936a, 1937, 1938). Index 1. A superscript or subscript to a symbol: e.g. a subscript symbol or number is used to identify one element out of a set, such as an element of a matrix (Sheriff 1984). 2. A number characterising a particular property. Such usage, particularly in economics where it typically represents the monetary value of "a basket of physical commodities," can be traced back to the eighteenth century (Kendall 1969). For example, the Division of Public Relations of the American Petroleum Institute (1929) gives tables reporting the annual "index of production of minerals [including crude petroleum]" and "wholesale prices of specified products [including gasoline]" in the United States.


Indicator kriging, indicator random function A geostatistical estimation method in which use is made of an indicator variable: a spatially distributed binary variable (indicator random function, Dowd 1991) which is assigned a value of 1 if the spatial variable of prime interest (e.g. a grade) is below a given cut-off, and a value of 0 if it is above it. The ordinary kriging of several thresholds, using a different variogram model for each cut-off, is referred to as indicator kriging. In practice, the best results are often obtained using a threshold close to the median. It was introduced by Journel (1982, 1988); see also: Isaaks and Srivastava (1989), Gómez-Hernández and Srivastava (1990), Dowd (1991), Bivand et al. (2008, 2013). Indirect method A term for spectrum estimation, introduced by the American communications engineer, Ralph Beebe Blackman (1904–1990) and statistician, John Wilder Tukey (1915–2000) (Blackman and Tukey 1958). Given an infinite-length record X(t), the power spectrum P(f) may be calculated either directly from X(t), or indirectly as the Fourier transform of the autocovariance function, which is calculable directly from X(t). The basic choice is essentially between squaring a Fourier transform, or Fourier transforming an average of products. An example of its use in the earth sciences is Luque-Espinar et al. (2008); a minimal numerical sketch is given below, following the entry for inf. inf [notation] An abbreviation for infimum, the largest value which is less than or equal to every element of a set (S); it is sometimes called the greatest lower bound of the set. This is in contrast to the minimum of S, which is the smallest element of S. An early use of the term in mathematics was by the German mathematician, Georg August Nöbeling (1907–2008) (Nöbeling 1935).
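A minimal Python sketch in the spirit of the indirect (Blackman-Tukey style) estimate described under indirect method above: the autocovariance is computed to a chosen maximum lag, tapered by a Bartlett lag window, and Fourier transformed; the test series, maximum lag and window choice are illustrative assumptions:

import numpy as np

def indirect_psd(x, max_lag=50, n_freq=256):
    """Spectrum estimate as the Fourier transform of the lag-windowed autocovariance."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    lags = np.arange(max_lag + 1)
    acov = np.array([np.dot(x[:n - k], x[k:]) / n for k in lags])   # biased autocovariance
    w = np.bartlett(2 * max_lag + 1)[max_lag:]                      # Bartlett lag window
    cw = acov * w
    freqs = np.linspace(0.0, 0.5, n_freq)                           # cycles per sample
    psd = np.array([cw[0] + 2.0 * np.sum(cw[1:] * np.cos(2.0 * np.pi * f * lags[1:]))
                    for f in freqs])
    return freqs, psd

rng = np.random.default_rng(3)
t = np.arange(1000)
series = np.sin(2 * np.pi * 0.1 * t) + rng.normal(scale=0.5, size=t.size)
f, p = indirect_psd(series)
print(round(f[np.argmax(p)], 2))    # spectral peak near 0.10 cycles per sample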


Inference engine A computer program within an expert system, written in a logic programming language such as Prolog or its successors, which derives conclusions from the facts and encoded rules (initially supplied by an expert) contained within its knowledge base, using either forward or backward chaining to move through the knowledge base and reach a conclusion. In forward chaining, the acquisition of a new piece of data in working memory activates rules whose conditions match the new data; it then finds those whose conditions are true for the new data and then carries out their actions. These may include the generation of new data which will in its turn initiate additional actions, etc. In backward chaining (used in Prolog) the system needs to determine the value of a piece of data; it will search for facts which may give an answer directly or rules whose conclusions mention the data, but before initiating them it will evaluate their conditions and this in turn may involve determining the values of more pieces of data (Prospector uses the latter approach). See Sutterlin and Visher (1990). Infinite Impulse Response (IIR) filter In a non-recursive filter, the output y(t) of the filter at time t depends only on

Σ_{i=−k}^{k} w_i x_(t−i),

where the w_i are the applied weights. In a recursive filter, the output will also depend on a previous output value,

y(t) = Σ_{i=−k}^{k} a_i x_(t−i) + Σ_{j=0}^{k} b_j y_(t−j),

where a_i and b_j are the applied weights. If recursive filters are used for processing real-time problems, then observations for i or j > t will not exist; these are physically realisable, as opposed to the more general physically unrealisable case. Such "one-sided" physically realisable filters are also known as infinite impulse response (IIR) filters, as they can produce effects arbitrarily far into the future from a single impulse (e.g. a Dirac function). Non-recursive filters are correspondingly known as finite impulse response (FIR) filters. Filters which can be implemented on real-time physical systems are also known as causal filters; those which are applied to filtering an entire time series which has already been obtained are also known as acausal filters. For discussion see: Hamming (1977) and, in an earth science context, Buttkus (1991), Gubbins (2003) and Weedon (2003). Inflection point In two dimensions, it may be visualised as the value on the horizontal coordinate (x-axis) of a two-dimensional rectangular Cartesian coordinate system of an (x, y) graph, at which a mathematical or empirically-fitted function y = f(x) changes from
a positive to a negative slope, or vice versa. A given function may have more than one turning point (e.g. the variation of strontium or carbon isotope ratio curves with time). The Latin term puncta inflexionum first appears in a posthumous edition of the work of the French lawyer and mathematician, Pierre de Fermat (1601–1665) (de Fermat 1679; Miller 2015a). Although point of inflection appears to have been more widely used than inflection point in many nineteenth century geological works, the latter began to become more frequent in the 1920s and, from 1960 onwards, has been the preferred term (Google Research 2012). It has also occasionally been referred to as a turning point. Information coefficient The Russian mathematical geologist, Andreĭ Borisovich Vistelius (1915–1995), suggested use of the Information coefficient, I(x), as a measure of the uniformity of a frequency distribution (Vistelius 1964a, b). If the data are grouped into k classes, the entropy

H = −Σ_{i=1}^{k} p_i ln(p_i),

where p_i is the proportion of the n observations falling into the i-th class. Since, in practice, the observed proportions are estimates, a correction for bias is required and so

H′(x) = −Σ_{i=1}^{k} p_i ln(p_i) + (k − 1)/(2n)

and I(x) = 1 − H′(x)/ln(k); 0 ≤ I(x) < 1. He suggested that this is a more useful descriptor for asymmetric and multimodal distributions than the standard deviation. Initialize To establish an initial value of something, e.g. a variable in a computer program. The term was in use in this context in computing in the 1950s and is cited by Sheriff (1984). It was implicit in the 'order codes' used to program the EDSAC 1 (Electronic Delay Storage Automatic Computer) built at Cambridge in 1946–1949 (Wilkes et al. 1951), which was the first stored-program computer. See also: computer programming language. Inman deviation, Inman kurtosis The Inman deviation and kurtosis are named for the American geologist, Douglas Lamar Inman (1920–2016) who introduced them (Inman 1952). The Inman deviation is a measure of the spread of a sediment size distribution, ½(ϕ84 − ϕ16), where ϕ16 and ϕ84 are the 16th and 84th percentiles, measured on the phi scale and estimated from the cumulative sediment size grade distribution. The Inman kurtosis is a dimensionless measure of the shape (peakedness) of a sediment size distribution:

b = [½(ϕ95 − ϕ5) − ½(ϕ84 − ϕ16)] / [½(ϕ84 − ϕ16)],

where ϕ5, ϕ16, ϕ84 and ϕ95 are the 5th, 16th, 84th and 95th percentiles of the cumulative sediment size grade distribution, again measured on the phi scale. See also: phi standard deviation, kurtosis, phi kurtosis. Inner product The inner product (also known as the dot product) of the vectors x = {x1, x2, x3, ⋯, xn} and y = {y1, y2, y3, ⋯, yn} is x · y = x1y1 + x2y2 + x3y3 + ⋯ + xnyn (Sheriff 1984; Camina and Janacek 1984). This operator first appears in German (inneres produkt) in the work of the Prussian mathematician and linguist Hermann Günther Graßmann (1809–1877), who discovered linear algebra (Grassmann 1844; Hyde 1890; Miller 2015a). It occurs in geophysical papers from the 1960s (e.g. Gilbert and Backus 1965) onwards. Note that the spelling is usually the unhyphenated inner product rather than inner-product (Google Research 2012). Instantaneous amplitude, instantaneous frequency, instantaneous phase The application of these measures is similar to that of harmonic analysis, in seeking to describe the amplitude and phase of a waveform, but use is made of low-pass (moving average) filters to enhance variations in amplitude and phase structure as a function of time. Given a complex time series: z(t) = x(t) + iy(t), where i is the imaginary unit √(−1), then z(t) = A(t)e^(iΘ(t)), where e is Euler's number and A(t) is the instantaneous amplitude:

A(t) = |z(t)| = √[x²(t) + y²(t)]

and the instantaneous phase is:

Θ(t) = arctan[y(t)/x(t)]

(Buttkus 1991, 2000). Variations in A(t) and Θ(t) can be usefully plotted as a function of time. These terms were introduced by the American statistician, John Wilder Tukey (1915–2000) in Bogert et al. (1963); see also Bingham et al. (1967) and Bloomfield (1976). For discussion in an earth science context see: Taner et al. (1979), Pisias and Moore (1981), Shackleton et al. (1995), Rutherford and D'Hondt (2000), Buttkus (1991, 2000) and Weedon (2003). A short computational sketch using the analytic signal is given below, following the entry for integer. Integer The set of whole numbers: {…, −3, −2, −1, 0, 1, 2, 3, …}. The term integer was first used in a treatise by the British mathematicians, Leonard Digges (c. 1515–1579) and his son Thomas Digges (c. 1546–1595) (Digges and Digges 1571). Lloyd (1849) is an early example of use of the term in an earth science context.
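As an illustration of the instantaneous amplitude and phase defined above, a short Python sketch in which the analytic signal z(t) is obtained with a Hilbert transform; the amplitude-modulated test signal is hypothetical:

import numpy as np
from scipy.signal import hilbert

t = np.arange(0.0, 10.0, 0.01)
carrier = np.sin(2 * np.pi * 2.0 * t)                  # 2 Hz sinusoid
envelope_in = 1.0 + 0.5 * np.sin(2 * np.pi * 0.2 * t)
trace = envelope_in * carrier                          # amplitude-modulated trace

z = hilbert(trace)                                     # analytic signal x(t) + i y(t)
inst_amplitude = np.abs(z)                             # A(t) = |z(t)|
inst_phase = np.unwrap(np.angle(z))                    # unwrapped Theta(t)
inst_frequency = np.diff(inst_phase) / (2 * np.pi * 0.01)   # Hz

print(round(inst_frequency.mean(), 1))                 # close to the 2 Hz carrier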


Integral, integration An integral is the result of integrating a function: if y = f(x), then it is the total area between the curve defined by the values of y = f(x) and the x-axis. This can be imagined as the sum of the areas of an infinite number of infinitely thin rectangles parallel to the y-axis, all of equal width, δx, and with corresponding mid-point (MP) heights: y_MP = f(x_MP), hence

∫ f(x) dx ≈ Σ_{i=1}^{n} [f(x_MP) δx]_i

as δx → 0 and, correspondingly, n → ∞. If the area considered is only that between stated lower and upper limits, x1 and x2, then it is referred to as a definite integral, which is written in the still-current notation introduced by the German mathematician, Gottfried Wilhelm von Leibniz (1646–1716) (Leibniz 1686, 297; Roero 2005) as: ∫_{x1}^{x2} f(x) dx. Otherwise it is called an indefinite integral. See Camina and Janacek (1984) for discussion; Abramowitz and Stegun (1965) for special cases. See also: Abelian integral, Booton integral equation, Cauchy's integral theorem, double integral, Fourier integral, Hankel integral, Lebesgue integral, line integral, path integral, Wiener-Hopf integral equation.


Integrated power spectrum Given the power spectral density G(f) for a time series x(t), as a function of frequency (f) Hz. For a very narrow frequency range of width Δf, i.e. between f and f + Δf, as Δf → 0, x(t) has the power G(f)Δf. Its integrated power spectrum A(f) resembles a cumulative curve corresponding to G(f) and, for an upper limit f0:

A(f0) = ∫_{−f0}^{f0} G(f) df.

It can be obtained by applying a low-pass filter to x(t) with an upper cut-off frequency of f0 and calculating the mean square of the output. When f0 → ∞, A(f0) corresponds to the total power of x(t) (Buttkus 1991, 2000). Intensive variable diagram The American mathematical physicist, Josiah Willard Gibbs (1839–1903), considered each state of matter a phase and each substance a component (e.g. water and ice are one component in two phases). His Phase Rule, developed in Gibbs (1873, 1876, 1878a, b), states that in any system the number of independent intensive properties (N) depends on the number of chemical species present (C) and the number of phases present (K): N = C + 2 − K. In his treatment of thermodynamics, he distinguishes between intensive variables, which do not depend on mass, such as pressure (P), temperature (T) and composition, and extensive variables, which are related to mass, such as flow rate, volume, number of moles, etc. For example, the mole fraction (X) of a fluid is related to the amount of mass of the fluid components. The terms intensive and extensive variables appear to have come into use by the 1930s (Weber 1939). Usage in a petrological context occurs in Greenwood (1975) and Perkins et al. (1986a, b), who describe a number of FORTRAN-77 computer programs for calculation of P-T, T-X and P-X intensive variable diagrams for geological usage; see also Guiraud and Powell (2006). Interactive program A computer program in which a human interacts with the computer, inputting information in response to text or visual output from the computer during the course of its execution. The program changes its course of action depending on the information input. Computers with this capability (e.g. Bendix G-15, International Business Machines 610, Digital Equipment Corporation PDP-1) began to emerge in the 1950s, although capability was very limited (e.g. McCarthy and Silver 1960). By the late 1960s, man-machine interaction was relatively widespread and had begun to come to the attention of earth scientists (e.g. Peikert 1969; Sanderson 1973). Intercept In geometry, the term intercept was originally used in the sense of one line crossing another. However, in the case of a fitted bivariate (or equivalent multivariate) regression model, y = f(x) + c, it refers to the constant c, which corresponds to x = 0. In early literature (e.g. Merriman 1877), it is referred to, like the other coefficients, simply as a "constant" but the terms y-intercept and intercept also came into use (Young 1833; Smith and Gale 1904). See also: slope. Interference The superposition of two or more waveforms, especially noise from another source interfering with a signal. Early examples are interference from atmospherics on radio transmission (e.g. Watson-Watt and Appleton 1923) and in geophysics (Blaik and Donn 1954). Interference beats, interference tones, intermodulation In the case of imposed amplitude modulation in which a long-period sinusoid with frequency f1 is imposed on another with frequency f2, f1 > f2, then minor combination tones will be generated at frequencies 1/f = 1/f1 ± 1/f2, the upper and lower sidebands on either side of the dominant frequency (f2). These appear as symmetrically placed minor-amplitude peaks on either side of f2 in the power spectrum of the resulting waveform. The term combination tone was used in acoustics by the German physicist, Georg Simon Ohm (1787–1854) (Ohm 1839). They are also called interference beats and interference tones; their generation is known as intermodulation or frequency mixing. The primary combination tone at f1 + f2 is known as a summation tone, and at f1 − f2 as a difference tone. When a component frequency is higher than a fundamental frequency, it is called an overtone, and a difference tone at a lower frequency than the fundamental is called an undertone. For discussion in an earth science context see King (1996) and Weedon (2003). Interpolation The process of estimating the y-values at a series of x positions placed in-between existing data points for the function y = f(x) in one dimension (or the
equivalent operation on a two- or three-dimensional mesh). This is achieved by fitting a local parametric or nonparametric function so as to pass a curve through near-by data values (e.g. by simple linear or quadratic interpolation, smoothing spline regression, locally-weighted regression, etc.). The Latin equivalent of the term (interpolare) was originally introduced by the British mathematician, John Wallis (1616–1703) (Wallis 1656). Such methods were included in early statistical texts (e.g. Whittaker and Robinson 1932) and were used in the earth sciences (e.g. Bauer 1895; Jones 1956); see Meijering (2002) for a comprehensive survey. Because of their importance in a two-dimensional context underpinning contour (isoline) mapping for earth science and other applications, by the 1970s evaluation of the various methods had become a matter of importance and was reviewed by Tempfli and Makarovic (1979) and others. However, in this context, the older methods have gradually become replaced by those of geostatistics, pioneered by the French mining engineer and mathematician, Georges Matheron (1930–2000) (Matheron 1962–1963, 1965; Bivand et al. 2013), which provide a statistically optimum basis for the interpolation where a directional spatial autocorrelation structure exists. See also: contour map, Gregory-Newton interpolation formula, Lagrange interpolation polynomial, head-banging, kriging, variogram.
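A minimal Python sketch of one-dimensional interpolation between irregularly spaced observations; the depth-porosity values are hypothetical:

import numpy as np
from scipy.interpolate import CubicSpline

# irregularly spaced depth (m) and porosity (%) observations
depth = np.array([10.0, 14.0, 21.0, 30.0, 42.0])
poro = np.array([18.2, 17.5, 15.9, 14.8, 12.1])

new_depth = np.arange(10.0, 43.0, 1.0)
linear = np.interp(new_depth, depth, poro)   # simple piecewise-linear interpolation
spline = CubicSpline(depth, poro)            # a smoother, cubic-spline alternative
print(round(float(linear[new_depth == 25.0][0]), 2), round(float(spline(25.0)), 2))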


Interpolation function (sinc, sincn) sinc is a contraction of the Latin sinus cardinalis (cardinal sine); in mathematics, the function sinc(x) was first defined by the British mathematician, Philip M. Woodward (1919–) as: sinc(x) ≡ sin(πx)/(πx), where x ≠ 0 (Woodward and Davis 1952). This is sometimes called its normalised form and designated sincn, so that

∫_{−∞}^{∞} sincn(x) dx = 1,

sincn(0) = 1 and sincn(x) = 0 for non-zero integer values of x. The unnormalised equivalent is: sinc(x) = sin(x)/x and

∫_{−∞}^{∞} sinc(x) dx = π.

This is also referred to as the sinc interpolation function or sampling function (Harris 1977). It is of interest in signal processing (Woodward 1953; Harris 1977; Gubbins 2004) because it is the impulse response of the ideal low-pass filter, the Fourier transform of a boxcar function (Daniell window) which cuts off at half the sampling rate (i.e. −π and π). Interpreter An interpreter is a computer program which translates instructions written in a high-level source code programming language and runs them concurrently (Reigel et al. 1972). The first interpreter was written for LISP on an IBM 704 computer in 1958 by
American computer scientist, Steve R. Russell (1937–) (McCarthy 1979); subsequently, interpreters were developed for ALGOL 60 (Anderson 1961) and FORTRAN (Melbourne and Pugmire 1965). William Henry Gates (1955–) and Paul Gardner Allen (1953–), subsequently the founders of Microsoft, wrote the first BASIC interpreter for the Micro Instrumentation and Telemetry Systems (Albuquerque, NM) Altair 8800 "personal computer" in 1975, and by late 1977 versions of Microsoft BASIC were also available for the early Apple and Commodore microcomputers (Steil 2008). Interquartile Range (IQR) A measure of the dispersion of a set of measurements. Quartiles are three (interpolated) values which divide the set of observed values for a variable sorted into order of ascending magnitude such that 25% fall below, or at, the first quartile (Q1); 50% below, or at, the second (Q2); and 75% below, or at, the third (Q3). The second quartile is more usually known as the median. The interquartile range (IQR) is (Q3 − Q1). It was introduced by the American geologist William Christian Krumbein (1902–1979) as one of a number of measures for characterising sediment size distributions (Krumbein 1936b; Krumbein and Pettijohn 1938). Inverse, inverse function In general, given two operations, the starting point of one is the result of the other; e.g. an inverse ratio is one in which the terms are reversed with respect to a given ratio; in an inverse relationship between two variables, the value of one increases as that of the other decreases. e^x and ln(x) are examples of inverse functions; f^(−1) is used to denote the inverse function of f; e.g. sin^(−1)x is the angle whose sine is x. The term is also used to refer to an inverse matrix (Camina and Janacek 1984). Inverse filter A filter, often designed to provide a smooth response function (e.g. one shaped like a Gaussian distribution, or similar), used to remove unwanted features in a time series, such as seismic records (Goupillaud 1961; Brigham et al. 1968; Sheriff 1984); see also Rice (1962). The term is also used as a synonym for deconvolution (Robinson 1967b). Inverse Fourier transform The reconstruction of an original waveform from its Fourier transform by determining the time domain waveform from the frequency domain. If X(f) is a representation of x(t) in the frequency domain, they are related by:

x(t) → X(f)  (Fourier transform)
X(f) → x(t)  (inverse transform).

See Camina and Janacek (1984), Buttkus (1991, 2000), Gubbins (2004). Inverse matrix A square matrix, X^(−1), such that when multiplied by the matrix X, it yields the identity matrix (I), i.e. X^(−1)X = I. The term and notation were introduced by the
English mathematician, Arthur Cayley (1821–1895) (Cayley 1858). The pseudoinverse, the generalization of an inverse to all matrices including rectangular as well as square, was discovered by the American mathematician, Eliakim Hastings Moore (1862–1932) (Moore 1935), under the name general reciprocal. It was independently rediscovered by the English mathematical physicist, (Sir) Roger Penrose (1931–) (Penrose 1955), who named it the generalized inverse; Greville (1959) said that the now frequently used term pseudoinverse was suggested to him by the American applied mathematician, Max A. Woodbury (1926–). In geophysics, the term inverse (in the sense of a matrix inverse) became frequent from the 1960s (e.g. Harkrider and Anderson 1962), and pseudoinverse from the 1980s (e.g. Tarlowski 1982). See also: Greenberg and Sarhan (1959), illconditioning.
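A brief Python illustration of the matrix inverse and the Moore-Penrose pseudoinverse; the matrices shown are arbitrary examples:

import numpy as np

A = np.array([[4.0, 7.0], [2.0, 6.0]])
A_inv = np.linalg.inv(A)                   # X^(-1) such that X^(-1) X = I
print(np.allclose(A_inv @ A, np.eye(2)))   # True

# Moore-Penrose pseudoinverse of a rectangular (over-determined) matrix,
# here used to obtain the least-squares solution of G m = d
G = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
d = np.array([1.1, 1.9, 3.2])
m = np.linalg.pinv(G) @ d
print(np.round(m, 3))                      # intercept and slope of the least-squares fit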


Inverse model, inverse problem Determining what kind of conceptual model(s) could have given rise to a set of observed data; using the results of actual observations to infer the values of parameters characterising the system under observation; obtaining subsurface models that may adequately describe a set of observations (observations of data → quantitative model → model parameters). Inverse problems are, by their nature, much harder to solve than the corresponding forward problem (estimates of model parameters → quantitative model → predictions of data). In principle, it is assumed that d = A(m), where d are the observed data; m is the assumed physical model which gives rise to the observations; and A is the function linking the data and model. Solution proceeds iteratively: Assuming an initial model m0, the expected data values based on it, d0, are calculated, yielding the data misfit (Δd = d − d0). Using a suitable inverse operator, A^(−1), the corresponding model perturbation, Δm, is then calculated, and hence an updated model: m1 = m0 + Δm, etc. In practice, a very large number of possible models will need to be searched so as to obtain the best overall solution, and it is always conceivable that more than one model will provide an "acceptable fit to the data," in which case the geologically most plausible solution should be adopted. In principle, the majority of geophysical inverse problems are continuous, involving a model described by piecewise continuous functions, but so as to obtain practical solutions, it is usually necessary to discretise the problem by using point-wise discretisation or the use of orthogonal polynomials, etc. A numerical method for solution of this type of problem was first published by the Russian mathematicians, Israel Moiseevich Gel'fand (1913–2009) and Boris Moiseevich Levitan (1914–2004) in 1951 (Gel'fand and Levitan 1955) and subsequently introduced into geophysics by the American geophysicists George Edward Backus (1930–) and James Freeman Gilbert (1931–2014) in a series of papers (1967, 1968, 1970); see also: Parker (1970, 1972, 1977, 1994), Cassinis (1981), Tarantola and Valette (1982), Cicci (1992), Herzfeld (1996), Zhdanov (2002) and Gubbins (2004) for discussion in a geoscience context; see also Tarantola (2005) and Sabatier (2009), Backus-Gilbert method, direct problem.


Inverse transform This generally refers to the inverse Fourier transform (Knopoff 1956; Camina and Janacek 1984; Gubbins 2004), transforming from the time domain to the frequency domain and vice versa. Inverse z-transform Given a continuous time function, the wavelet b(t), whose amplitude is sampled at regular unit time intervals, t = 0, 1, 2, 3, …, n, is: b = (b0, b1, b2, ⋯, bn, ⋯). The z-transform of this wavelet is a polynomial:

B(z) = b0 + b1 z + b2 z² + … + bn z^n,

in which the coefficients of (z, z², z³, ⋯) represent the wavelet amplitudes at successive times t = 1, 2, 3, ⋯, and z is a complex variable. The inverse z-transform is then given by:

b(t) = (1/2πi) ∮ B(z) z^−(t+1) dz,

where ∮ … dz denotes a line integral, i.e. an integral taken over a closed path, and i is the imaginary unit √(−1). The convolution of two wavelets is equivalent to multiplying their z-transforms; a brief numerical illustration is given below, following the entry for inversion filtering. See Robinson (1966a, b, 1967a, b), Camina and Janacek (1984), Claerbout (1985), Buttkus (1991, 2000), Gubbins (2004). Inverse trigonometric functions These are: (i) the inverse sine, the angle whose sine is x, denoted as sin^(−1)(x), arcsin(x), or asin(x); (ii) the inverse cosine, the angle whose cosine is x, denoted as cos^(−1)(x), arccos(x), or acos(x); and (iii) the inverse tangent, the angle whose tangent is x, denoted as tan^(−1)(x), arctan(x), or atan(x); etc. Examples of these different notations can be found in De Morgan (1847), Nichols (1900), Kenyon et al. (1913), Camina and Janacek (1984). Inversion The process of solving the inverse problem: Determining what kind of conceptual model(s) could have given rise to a set of observed data; using the results of actual observations to infer the values of parameters characterising the system under observation (Gubbins (2004) refers to this process as parameter estimation); obtaining subsurface geological models that may adequately describe a set of geophysical observations (Sheriff 1984). By their nature, most inverse models are underdetermined, with several alternative solutions fitting the same data; Caers and Hoffman (2006) suggested using a Bayesian solution to this problem, the Probability perturbation method. Inversion filtering A two-stage procedure introduced by Ferber (1984): Firstly, construction of a causal filter by factorization of the spectral density function using Levinson-Durbin recursion, followed by filtering of the seismogram.
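A brief numerical illustration of the statement above that convolving two wavelets is equivalent to multiplying their z-transform polynomials; the wavelet coefficients are arbitrary:

import numpy as np

# two short wavelets sampled at unit time intervals
a = np.array([1.0, -0.5, 0.25])             # A(z) = 1 - 0.5 z + 0.25 z^2
b = np.array([2.0, 1.0])                    # B(z) = 2 + z

conv = np.convolve(a, b)                    # time-domain convolution of the wavelets
poly = np.polymul(a[::-1], b[::-1])[::-1]   # product of their z-transform polynomials
print(conv, np.allclose(conv, poly))        # identical coefficient sequences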


Inverted Gaussian Model Taking the positive half of the curve describing the "bell-shaped" normal (Gaussian) probability distribution and inverting it yields an S-shaped curve which has been used both by Hare et al. (1985) and Bonham-Carter (1988) as an empirical fit to vegetation reflectance spectra between 680 and 790 nm. Invertibility, invertible matrix A square n × n matrix A is said to be invertible if there exists a square n × n matrix B such that ordinary matrix multiplication AB = BA = I, where I is an n × n identity matrix (Warner 1965; Camina and Janacek 1984). Irrational number A real number which cannot be expressed as a fraction of any two integers and which has a decimal value which neither terminates nor becomes periodic (e.g. the square root of two, and the constants e, π). Although the term may go back to the Greek geometers, it was probably not in wide usage until the sixteenth century, following its appearance in the Welsh mathematician Robert Record's (1510–1558) Pathwaie to knowledg (1551), which was itself based on Euclid's Elements. See also: rational number, ℝ.


Isarithmic map, isarithmic surface Alternative terms for an isoline contour map or surface of the magnitude of a measured or counted attribute other than topographic height. The term isarithmic map appears to have been in use by 1930 (Google Research 2012; Finch et al. 1957; Robinson 1961). See also: isoline, isolith, isopach map, isopleth map, isoseismal map. iso- A prefix meaning equal, e.g. isochron, isoline, isopach, isopleth, etc. Isochron plot An isochron is a line corresponding to a constant ratio of particular radioactive isotopes used to determine the age of a rock or mineral. The earliest work plotted values of ²⁰⁷Pb/²⁰⁴Pb (y-axis) versus ²⁰⁶Pb/²⁰⁴Pb (x-axis) in galena specimens. Specimens of the same age will lie on a straight line, first called an "isochrone" by the Polish-Austrian physicist, Friedrich Georg Houtermans (1903–1996) (Houtermans 1946). The present-day term isochron was introduced by the American geochemist, Clair Cameron Patterson (1922–1995) (Patterson 1956). The isotopes in such a graph need not be all of the same element, e.g. in modern work ⁸⁷Sr/⁸⁶Sr plotted as a function of ⁸⁷Rb/⁸⁶Sr. Today the best-fit of the linear isochron to the data is achieved using linear regression, ideally using a robust method which takes into account the magnitudes of the uncertainties in the data. Isocon Gresens' (1967) method of analysis of changes in volume and concentrations during metasomatism has been applied in many studies of hydrothermal alteration. Grant (1986) provides a simple method of solution of Gresens' equation, for both volume (or mass) change and concentration changes. The equation is rearranged into a linear relationship between the concentration of a component in the altered rock and that in the
original. Simultaneous solution of such equations for all components that show no relative gain or loss of mass defines an isocon. On a graph of the concentrations in the altered rock plotted as a function of those in the original, an isocon is portrayed as a straight line through the origin. The slope of the isocon defines the mass change in the alteration, and the deviation of a data point from the isocon defines the concentration change for the corresponding component. As Grant has shown, this can be applied to several stages of alteration simultaneously, and to other kinds of mass transfer such as migmatization. ISODATA An early algorithm for performing cluster analysis (Ball and Hall 1965) which could be applied to very large data sets. Mancey (1982) used it to achieve a successful cluster analysis of gap-filled, moving average smoothed, maps consisting of 22,000 square map cells, based on c. 50,000 stream sediment specimens taken over the whole of England and Wales (Webb et al. 1978), on the basis of ten major and trace elements, resulting in nine meaningful groups. Iso-diametric line An early term for an isopach: an isopach with a value x is an isoline joining points of equal thickness of a stratigraphic or other rock unit, coal seam, etc., and which separates a field of values >x from a field of values <x. […] in the field y > 0, prolate ellipsoids in the field y < 0, and spheres plot at the origin {x, y} = {0, 0}. See also Flinn diagram, Ramsay logarithmic diagram. Jensen plot Jensen (1976) proposed a method for the classification of subalkalic volcanic rocks based on a ternary diagram for Al2O3 (lower left), FeO + Fe2O3 + TiO2 (top) and MgO (lower right). Introduced by the Canadian geologist and geochemist, Larry Sigfred Jensen (1942–). Jitter plot A jitter plot, introduced into statistical graphics by Chambers et al. (1983), is a two-dimensional scatterplot of the values of a single variable (x) in which, instead of simply plotting the data points along a horizontal line parallel to the x-axis, showing the actual values to which they correspond, they are jittered in the vertical (y) dimension by adding a small amount of uniform (white) noise, to form a uniformly distributed dummy y-variable. The plotted data points thus fall along a narrow band instead of a line, so that those which would otherwise fall on top of each other become clearly distinguished. The term jitter has long been used in electronics as a measure of the variability of a time-varying signal. An example of its use in earth science is Nowell et al. (2006).
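A minimal Python/matplotlib sketch of a jitter plot; the ages used as the plotted variable are hypothetical:

import numpy as np
import matplotlib.pyplot as plt

# hypothetical zircon U-Pb ages (Ma) with many closely similar values
rng = np.random.default_rng(5)
ages = np.round(rng.normal(450.0, 3.0, size=120), 0)

jitter = rng.uniform(-0.1, 0.1, size=ages.size)   # small uniform noise forms the dummy y-variable
plt.scatter(ages, jitter, s=10, alpha=0.6)
plt.yticks([])
plt.xlabel("age (Ma)")
plt.title("jitter plot")
plt.savefig("jitter_plot.png", dpi=150)           # output filename is arbitrary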


Joint density, joint density function, joint distribution, joint probability distribution, joint frequency distribution The frequency or probability distribution corresponding to the simultaneous occurrence of any pair of values from each of two variables (x and y). It shows not only the univariate frequency distribution for x and y, but also the way in which each value of y is distributed among the values of x and vice versa. Also known as a two-way or bivariate frequency distribution. The term bivariate was first used by the British statistician, Karl Pearson (1857–1936) (Pearson 1920). The distribution of the "joint chance" of two variables was discussed by the British mathematician, mathematical astronomer and geophysicist, (Sir) Harold Jeffreys (1891–1989) (Jeffreys 1939). However, bivariate frequency distributions were actually used in geology in an empirical fashion by the French mathematician and cataloguer of earthquakes, Alexis Perrey (1807–1882) (Perrey 1847). See also: Alkins (1920), Schmid (1934), Smart (1979), Camina and Janacek (1984) and Swan and Sandilands (1995).

Joint probability The probability of simultaneous occurrence of values of two (or more) variables. Joint probability appears in a legal context in Marble (1878) and in actuarial work (Sutton and King 1882–1887). The former had come into widespread use by 1943 (Google Research 2012). The term joint chance was used by the British mathematician, cosmologist and geophysicist, (Sir) Harold Jeffreys (1891–1989) (Jeffreys 1939).


Judd A full normal plot (FUNOP) is a robust graphical procedure for detecting unusually large or small values in a frequency distribution, introduced by the American statistician, John Wilder Tukey (1915–2000) (Tukey 1962). The n observed values of a variable, x1, …, xn, are first sorted into order of ascending magnitude and the median (M) of these values is calculated. These are then transformed to Judds, where the i-th Judd = (xi − M)/Qi, and Qi is the quantile of the standard normal distribution equivalent to the plotting proportion i/(n + 1). A divisor of (n + 1) is used to allow for the fact that the possible extremes of the sampled distribution are unlikely to have been observed. The Judds are plotted on the y-axis as a function of i. If the observations all corresponded to a normal (lognormal, if the data were first transformed to logarithms) distribution, the Judds would be nearly equal to their standard deviation, and the graph would be linear. See Koch and Link (1970–1971) for discussion in an earth science context. A Google search suggests that their book is the first text in which "a new quantity, the Judd" appears, in a description of the FUNOP; in its Preface, one of the authors, Richard F. Link, acknowledges Tukey "who contributed materially to his development in statistics and data analysis." The presumption must be that the term was suggested by Tukey.
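A minimal sketch of the Judd calculation described above, assuming NumPy and SciPy; Tukey's full FUNOP procedure adds further steps (for example, attention is usually restricted to the outer thirds of the ordered values) that are omitted here.

```python
import numpy as np
from scipy.stats import norm

def judds(x):
    """Judds: (x_i - median) / Q_i, with Q_i the standard normal quantile of i/(n + 1)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    M = np.median(x)
    Q = norm.ppf(np.arange(1, n + 1) / (n + 1))
    with np.errstate(divide="ignore", invalid="ignore"):
        j = (x - M) / Q          # undefined at the centre of the plot, where Q = 0
    return j
```

Plotted against i, an approximately constant level indicates consistency with a normal distribution, while unusually large Judds flag suspect values.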


κ-κ [kappa-kappa] domain A wavefield in which the independent variables are both wavenumbers, i.e. the reciprocal of wavelength (Sheriff 1984). See also: Buttkus (1991, 2000) and Gubbins (2004), wavenumber filtering, frequency-wavenumber domain.

k-NN classification The k-nearest neighbors (k-NN; note American English sp.) algorithm is a classification algorithm in which an unknown test sample is compared to the k nearest others in a training set and assigned to a predicted class based on a majority vote cast by the neighbouring samples (Cover and Hart 1967). See Cracknell and Reading (2014) for discussion of its performance in an earth science context.

Kaczmarz method In 1971, the Japanese mathematician, Kunio Tanabe (1941–) implemented a projection method to solve a system of linear equations, Ax = b, following the work of the Polish mathematician, Stefan Kaczmarz (1895–1939) (Kaczmarz 1937). It has subsequently been known as the Kaczmarz method. Each equation in the system can be thought of as the projection of the solution vector onto the hyperplane corresponding to that equation (Carr et al. 1985). It was rediscovered, by Gordon et al. (1970), in the field of biological image reconstruction from projections. They used the method to reconstruct three-dimensional objects from a series of two-dimensional electron photomicrographs taken at a number of angles in a fan-like pattern (Bender et al. 1970; Herman et al. 1973). The method was then called the Algebraic Reconstruction Technique (ART). It has been applied to seismic tomography (McMechan 1983; Neumann-Denzau and Behrens 1984) and to cokriging (Carr et al. 1985), although it proved to be slow (Carr and Myers 1990). However, in seismic work the method was also found to be poorly conditioned, and it was subsequently replaced by the Simultaneous iterative reconstruction technique. See also back projection tomography.
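A minimal sketch of the cyclic Kaczmarz iteration for Ax = b described above, assuming NumPy; no relaxation factor or row reordering is used.

```python
import numpy as np

def kaczmarz(A, b, n_sweeps=100, x0=None):
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(n_sweeps):
        for i in range(m):
            ai = A[i]
            # project the current solution onto the hyperplane a_i . x = b_i
            x = x + (b[i] - ai @ x) / (ai @ ai) * ai
    return x
```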


Kálmán filter, Kálmán-Bucy filter A time series estimation method of the predictor-corrector type, usually applied to a noisy signal in a discrete time series setting. Named for the Hungarian-American mathematician and electrical engineer, Rudolf Emil Kálmán (1930–2016) who introduced it (Kálmán 1960). It is particularly useful when the measurement error varies with time. A set of time-update equations project the current state and error covariance estimates forward in time to the next time-step; the new observation then provides a set of measurement update equations with the means to determine the error of the prediction. This is then taken into account (together with all prior information) to update the prediction process and so obtain a new (and hopefully improved) forward estimate for the next time-step, etc. A continuous time version was developed with the American statistician, Richard Snowden Bucy (1935–) (Kálmán and Bucy 1961). For general discussion see: Meinhold and Singpurwalla (1983) and Harvey (1999); and in an earth science context: Bayless and Brigham (1970), Ruckebusch (1983), Camina and Janacek (1984), Buttkus (1991, 2000), Baziw and Weir-Jones (2002) and Gu and Oliver (2006).

Kamb method A method for contouring the point-density of the projections of three-dimensional crystallographic orientations on a lower-hemisphere equal-area stereographic projection, using a variable counting-area. Named for the American physicist, geologist and Antarctic glaciologist, Walter Barclay Kamb (1931–2011), who introduced it (Kamb 1959). Examples of its application in geology include Rivers and Fyson (1977), Christensen (1984) and Tamagawa and Pollard (2008).
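A minimal scalar sketch of the predictor-corrector recursion described in the Kálmán filter entry above, assuming NumPy and a random-walk state model; q and r are assumed process- and measurement-noise variances.

```python
import numpy as np

def kalman_1d(z, q=1e-3, r=0.1, x0=0.0, p0=1.0):
    """Filter a sequence of noisy scalar observations z."""
    x, p = x0, p0
    estimates = []
    for zk in z:
        # time update (prediction): a random-walk model leaves the state estimate unchanged
        p = p + q
        # measurement update (correction)
        k = p / (p + r)          # Kalman gain
        x = x + k * (zk - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return np.array(estimates)
```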


Kaplan-Meier method Also known as the product-limit estimator. A standard method in medical studies for calculating the summary statistics of right-censored survival data. Named for the American statistician, Edward Lynn Kaplan (1920–2006) and biostatistician Paul Meier (1924–2011) who introduced it (Kaplan and Meier 1958). Chung (1988, 1989a) has applied the method to the lengths of fractures in a granitic pluton where both ends of 1567 fractures can only be observed in 257 cases and it is required to obtain a confidence band for the observed distribution of fracture lengths. Helsel (2005) gives a clear example of the application of the method to left-censored geochemical concentration data.

Kapteyn's transform A method proposed by the Dutch astronomer, Jacobus Cornelius Kapteyn (1851–1922) (Kapteyn 1903, 1916; Kapteyn and van Uven 1916) for the transformation of a skew distribution into a normal distribution:

f(z) = [1/(σz√(2π))] e^{−(z − z̄)²/(2σz²)},

where z = f(x); x is the observed variable (e.g. z = ln x); z̄ and σz are the mean and standard deviation of z respectively; and e is Euler's number, the constant 2.71828. Applied by Březina (1963) to grain size distributions where the normalizing function is
the settling velocity logarithm as a function of grain diameter. See Stamhuis and Seneta (2009) for some historical background.

KEE (Knowledge Engineering Environment) An early environment for expert system development (introduced by IntelliCorp in 1983). It used packages of pre-written code, originally for dedicated LISP machines, which provided researchers with the tools needed to develop an expert system. KEE supported a variety of knowledge representation schemes including an object-oriented frame language. The inference engine supported both forward and backward chaining and had sophisticated graphics.

Kendall's rank correlation coefficient, Kendall's tau (τ) Devised by the British statistician (Sir) Maurice George Kendall (1907–1983) in 1938, it is a nonparametric correlation coefficient (τ) between two ranked variables. It measures the extent to which the order of observations in one variable differs from the order of the observations in the other. If {X, Y} = (x1, y1), (x2, y2), …, (xn, yn) is a set of observations of the joint random variables X and Y respectively, such that all the values of (xi) and (yi) are unique, then any pair of observations (xi, yi) and (xj, yj) are said to be concordant if the ranks for both elements agree: that is, if both xi > xj and yi > yj or if both xi < xj and yi < yj. They are said to be discordant if xi > xj and yi < yj or if xi < xj and yi > yj. If xi = xj or yi = yj, the pair is neither concordant nor discordant. In the simplest case, in which there are no tied rankings:

τ = [(no. of concordant pairs) − (no. of discordant pairs)] / (total no. of pair combinations).
It is a nonparametric measure of the statistical dependence between two variables which reflects the strength of their monotone relationship even if it is nonlinear. An early geological application is Melton (1958a); see also Cheeney (1983).

Kent distribution This is an elliptical analogue of the Fisher distribution, proposed by the British statistician John T. Kent (1951–) (Kent 1982). The probability distribution expressed in polar coordinates is:

F(θ, ϕ; κ, β) = c e^{[κ cos θ + β sin²θ (cos²ϕ − sin²ϕ)]} sin θ,

where c is a normalization constant; κ ≥ 0 is a concentration parameter; 0 ≤ θ ≤ π is the colatitude; 0 ≤ ϕ ≤ 2π is the longitude; and β is an "ovalness" parameter, 0 ≤ β ≤ 2κ; when β = 0 the Kent distribution reduces to a Fisher distribution. See Peel et al. (2001) for an earth science application; see also spherical statistics, Bingham distribution.

Kernel density estimation The frequency distribution is obtained by first placing a series of smooth, symmetrical, density functions (the "kernel" of the method's name),
each with the same spread parameter (known as its "window width" or bandwidth), at the position of each occurrence of the variable along the horizontal axis corresponding to its magnitude. These can be imagined as a series of often overlapping equal-size "bumps" which are then summed to give the final smoothed density function. This avoids the blocky appearance of the traditional histogram, but choice of an appropriate bandwidth to avoid under- or over-smoothing is essential. The approach, and terminology, has its origins in work by the American statistician, John Wilder Tukey (1915–2000) on spectrum analysis in the late 1940s (Tukey 1950). See Wegman (1972), Tukey (1973), Vita-Finzi et al. (2005) and Nowell et al. (2006) for earth science applications.

Kernel function A kernel function (K) of two variables (x, y) is one which defines an integral transform (T). The input is a function, f, and the output is a second function Tf, thus:

Tf(y) = ∫_{x1}^{x2} K(x, y) f(x) dx

(Sheriff 1984). An early application in geophysics was by the Romanian geophysicist, Sabba S. Ştefănescu (1902–1994) when he was working with the pioneer geophysicists Conrad and Marcel Schlumberger in Paris in 1929–1933, developing the theory of geoelectrical methods (Ştefănescu et al. 1930). See also Onodera (1960), Roman (1963), Koefoed (1968) and Loewenthal (1975).
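A minimal sketch of the summed-"bumps" construction described in the kernel density estimation entry above, assuming NumPy and a Gaussian kernel; the bandwidth must be chosen by the user.

```python
import numpy as np

def kde_gaussian(data, grid, bandwidth):
    """Kernel density estimate on the points in `grid` from the observations in `data`."""
    data = np.asarray(data, dtype=float)[:, None]       # one row per observation
    u = (np.asarray(grid, dtype=float)[None, :] - data) / bandwidth
    bumps = np.exp(-0.5 * u**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=0)                           # the average of the bumps integrates to 1
```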


Kernel principal components analysis See: principal components analysis.

Kernel smoothing These are modern bivariate nonparametric regression methods which improve on early use of m-point moving averages or medians. The American statistician and computer scientist, William Swain Cleveland (1943–) introduced the Locally Weighted Scatterplot Smoother (LOWESS) (Cleveland 1979). This uses a recursive locally-weighted regression centred on the x-coordinate of each data point in turn, whereas kernel methods (Muller 1987) fix the weights explicitly. Such methods have been successfully applied in both hydrogeochemistry (Helsel and Hirsch 1992) and Sr-isotope geochemistry (Howarth and McArthur 1997). See also smoothing spline regression.

Kinematic forward model A forward model (Parker 1972, 1977) calculates what would be observed from a given conceptual model; it is the prediction of observations, given the values of the parameters defining the model, e.g. predicting the gravity field over a salt dome whose characteristics have been inferred from a seismic survey (Sheriff 1984; Gubbins 2004). Crosby et al. (2008) give an example of kinematic forward modelling
in a geological setting. Also called a direct problem (Ianâs and Zorilescu 1968). See also: Inverse problem. Kinematic vorticity number A measure of the rotation rate of progressive deformation relative to the rate of extension; or of the non-coaxiality of the deformation and a descriptor of the relative contribution of pure shearing and simple shearing in progressive deformation. Introduced by the American applied mathematician, Clifford Ambrose Truesdell (1919–2000) and first applied in geology by the American structural geologist, Winthrop Dickinson Means (1933–) and others (Means et al. 1980). Kinetics, kinetic model Kinetics is the study of the time-behaviour of a system of coupled chemical reactions away from equilibrium (Lecca 2009); sometimes referred to as “reaction rates.” A kinetic model is a numerical model of the interaction between two or more chemical components; it may predict the amount of a particular reaction product as a function of time. The first kinetic theory of gases originated with the study by Swiss mathematician and physicist, Daniel Bernoulli (1700–1782) of the behaviour of fluids and gases (Bernoulli 1738) but probabilistic treatment only began with James Clark Maxwell (1860a, 1860b). Modern mathematical treatment began in the early twentieth century (e.g. Hinshelwood 1926). See Schmalz (1967) and Gardner and Lerche (1990) for geochemical examples. Kite diagram A graph of concentrations of: total sulphate + chloride (left), calcium + magnesium (top), bicarbonate + carbonate (right) and sodium + potassium (base) plotted on four orthogonal axes with a common origin; cations are plotted on the upper and lower vertical axes and anions on the left and right horizontal axes. The four coordinates along each axis are joined by lines to form a “kite-shaped” multivariate graphic symbol for each sample composition. Kite diagrams for each of a number of samples are often plotted in a spatial context, samples of similar composition being recognised by their having similar shapes. This type of diagram appears to have been introduced by Colby et al. (1956); see also Davis and Rogers (1984). Kleiner-Hartigan trees This is a multivariate graphical display technique, developed by Swiss statistician Beat Kleiner, and American statistician John Hartigan (Kleiner and Hartigan 1981) at AT&T Bell Laboratories, which uses a tree morphology based on the dendrogram obtained from a prior hierarchical cluster analysis of a correlation matrix of the data set; branch-lengths of the trees are then drawn proportional to the magnitudes of the variables corresponding to each branch (e.g. element concentrations in a specimen). The tree morphology remains unchanged throughout, so that the composition of each specimen is reflected by the relative magnitudes of the branches from one tree to another. Initially suggested for use in applied geochemistry by Robert G. Garrett of the Geological Survey of Canada (Garrett 1983), it was subsequently found by Turner (1986) to be far more effective at portraying the multi-element sample compositions than Chernoff faces
and has been likened to "performing a visual factor analysis." Although the physical size of the plotted trees can make it difficult to use them in a spatial context with a large data set by plotting them at their corresponding sample position on a map, nevertheless, side-by-side comparison of the trees laid out as a graphic table, in numerical order of sample numbers, proved quite satisfactory. They were also extensively used by Coward (1986). See also Reimann et al. (2008) and Howarth and Garrett (2010).

Kolmogorov factorisation Kolmogorov factorization of the spectrum is a procedure used to construct a minimum-delay inverse filter from a given amplitude spectrum. The procedure used (Gubbins 2004; Claerbout 1992) is: starting with the power spectrum, take its logarithm, inverse transform, discard negative time terms, take the transform and exponentiate. Named for the Russian mathematician, Andrei Nikolaevich Kolmogorov (1903–1987) who introduced the technique (Kolmogorov 1939).

Kolmogorov-Smirnov filter A probabilistic filter, based on the Kolmogorov-Smirnov test, it computes whether the cumulative frequency distribution of a square central block of cells in an image corresponds to a statistically greater concentration distribution than that of a surrounding square annulus at some distance away. Originally developed for image-processing applications (Muerle and Allan 1968), it was subsequently used for anomaly detection in regional geochemical mapping (Howarth et al. 1980; Howarth 1983; Chork and Cruikshank 1984).
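A numerical sketch of the recipe given in the Kolmogorov factorisation entry above (log spectrum → inverse transform → keep the non-negative-time terms → transform → exponentiate), assuming NumPy; the half-weights on the zero-lag and Nyquist terms follow Claerbout's formulation.

```python
import numpy as np

def kolmogorov_factorise(power_spectrum):
    """Return a minimum-delay (minimum-phase) spectrum H(f) with |H|**2 equal to the input."""
    s = np.asarray(power_spectrum, dtype=float)   # spectrum sampled on the full FFT frequency grid
    c = np.fft.ifft(np.log(s))                    # 'cepstrum' of the log spectrum
    n = s.size
    w = np.zeros(n)
    w[0] = 0.5                                    # halve the zero-lag term
    w[1:(n + 1) // 2] = 1.0                       # keep positive-time terms, discard negative-time ones
    if n % 2 == 0:
        w[n // 2] = 0.5                           # the Nyquist lag is shared between +/- time
    return np.exp(np.fft.fft(c * w))

# a minimum-delay inverse filter is then np.fft.ifft(1.0 / kolmogorov_factorise(s)).real
```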


Kolmogorov-Smirnov test A nonparametric goodness-of-fit test between two cumulative distributions proposed by the Russian mathematician, Andrei Nikolaevich Kolmogorov (1903–1987) (Kolmogorov 1933), and extended by his fellow-countryman, mathematician Nikolai Vasilyevich Smirnov (1900–1966) (Smirnov 1939a, 1939b, 1948). If O(x) is the cumulative distribution of the set of observations and F(x) is the cumulative distribution of the fitted model, then the test is based on the statistic: D = max|O(x) − F(x)|; 0 < D < 100%. First called the Kolmogorov-Smirnov test in Massey (1951); it has also been referred to as the Smirnov test (Rock 1986b); see Lilliefors (1967) and Stephens (1993) for discussion. Early geological applications include Miller and Olsen (1955), Degens et al. (1957) and Miller and Kahn (1962).

Kramers-Kronig relation(s) In filter theory, in the time domain, the output function of a linear, deterministic, time-invariant filter y(t) can be calculated for any filter input x(t) when h(t), the impulse response of the filter, is known: thus x(t) convolved with h(t) yields y(t). In the frequency domain, the equivalent is the multiplication: Y(f) = H(f)X(f),

where X(f) is the spectrum of the input function; H(f) is the frequency response of the filter; and Y(f) is the filter output. The filtering operations in the time and frequency domains have the equivalency:

x(t) → X(f), h(t) → H(f), y(t) → Y(f) (Fourier transform)

and

X(f) → x(t), H(f) → h(t), Y(f) → y(t) (inverse transform).

For a real impulse-response function

h(t) = ∫_{−∞}^{∞} H(f) e^{i2πft} df,

where i is the imaginary unit √−1, and e is Euler's number, the constant 2.71828. The real and imaginary parts of H(f) are given by:

U(f) = ∫_{−∞}^{∞} h(t) cos(2πft) dt and V(f) = −∫_{−∞}^{∞} h(t) sin(2πft) dt.

The analytic continuation of H(f) in the plane of the complex variable p = α + i2πf is:

H(p) = ∫_{−∞}^{∞} h(t) e^{−pt} dt.

Also H(f) = A(f) + iB(f), where

B(f) = −(1/(πf)) ∗ A(f) = (1/π) P ∫_{−∞}^{∞} A(g)/(g − f) dg.

Similarly,

A(f) = (1/(πf)) ∗ B(f) = −(1/π) P ∫_{−∞}^{∞} B(g)/(g − f) dg,
where P is the Cauchy principal value. See Warwick (1956) for a clear, graphicallyillustrated, explanation. The Kramers-Kronig relation (sometimes referred to as “relations”) is named for the Dutch physicist, Hendrik Anthony Kramers (1894–1952) and German physicist, Ralph de Laer Kronig (1904–1995) who are reputed to have independently arrived at the relationships between the real and imaginary parts of response functions in the course of their unrelated work on optical dispersion in 1926–1927; however, see discussion by Bohren (2010). The term was in use by the 1950s (e.g. Warwick 1956), probably following publication of Gorter and Kronig (1936) which referred to Kramer’s work. See also Buttkus (1991, 2000) and Mavko et al. (2009).


Kriging Kriging is a term coined c. 1960 for one of the geostatistical techniques developed by the French mining engineer and mathematician, Georges Matheron (1930–2000), for optimal estimation of ore grades at a point, or the mean grade within a block, within an ore body. Both in English and French, it is more usually spelt with a small "k": kriging (Fr. krigage) rather than Kriging (Google Research 2012). Named for the South African mining engineer, Daniel Gerhardus Krige (1919–2013) who was the first to make use of spatial correlation to overcome the observed disagreements between ore grades estimated from sampling and stope samples in South African gold mines (Krige 1951). Ordinary kriging is essentially an optimum method for spatial interpolation which produces the best unbiased estimate of the mean value at a point with minimum estimation variance, and the best weighted moving average for a block. In the case of a point estimate Z*(x0) at a specified position surrounded by n data points with values Z(xi), Z*(x0) = Σwi Z(xi), where the wi are the weights and Σwi = 1. It is assumed that there is no underlying regional trend and the values of Z(x) should either conform to a normal distribution or should have been transformed so that the transformed values meet this requirement. The weights wi are assigned depending on both the distance and direction of xi from x0, taking into consideration the additional requirements that: nearer points should carry more weight than distant ones; points screened by a nearer point should carry less weight; and spatially clustered points should carry less weight compared to an isolated point at the same distance away. The weights are obtained using a set of variogram models, g(d), fitted along directions aligned with the principal octants of the geographical coordinate system. This is generally sufficient to define the principal axes of the ellipsoids of equal weight with x0 as the centre. The support is a volume defined in terms of shape, size and orientation for which the average values of the regionalized variables are to be estimated. If this is essentially as small as a point, and both observations and estimates have the same support, then the process is known as point or punctual kriging. In many applications, x0 will be the set of grid nodes at which values are to be interpolated prior to contour threading (see contouring). Matheron formalised and generalised Krige's procedure (Matheron 1960, 1962–1963, 1966), defining kriging as the probabilistic process of obtaining the best linear unbiased estimator of an unknown variable, in the sense of minimizing the variance of the resulting estimation error (estimation variance). He subsequently (Matheron 1973a, b) developed procedures to obtain unbiased nonlinear estimators (e.g. disjunctive kriging and kriging of transformed variables). Related techniques include universal kriging, which is intended to enable any large-scale regional trend (so-called drift) to be taken into account; indicator kriging (Goovaerts 1997), which is analogous to logistic regression in that Z(x) is a binary variable and the kriged values are probabilities; and disjunctive kriging (Rivoirard 1994), based on an exact transform of the cumulative distribution function of Z(x) to the equivalent quantiles of the standard normal distribution. Cokriging uses knowledge of one regionalized variable to assist with the estimation of values of another correlated with it. See also conditional simulation, geostatistics. See also: Bivand et al. (2013).
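A minimal sketch of ordinary (point) kriging as summarised above, assuming NumPy, an isotropic spherical variogram model and a single estimation point; the Lagrange-multiplier row of the system enforces weights that sum to one. The variogram parameters are illustrative, not fitted values.

```python
import numpy as np

def spherical_variogram(h, nugget=0.0, sill=1.0, a=500.0):
    """Spherical semivariogram with range a; gamma(0) = 0 by definition."""
    g = np.where(h < a, nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3), sill)
    return np.where(h == 0.0, 0.0, g)

def ordinary_kriging(xy, z, x0):
    """Estimate Z at point x0 from data coordinates xy (n x 2) and values z (length n)."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)  # data-data distances
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical_variogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = spherical_variogram(np.linalg.norm(xy - x0, axis=1))  # data-target distances
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]                                       # weights (summing to 1) and multiplier
    return w @ z, w @ b[:n] + mu                                  # estimate and estimation variance
```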


Kronecker delta (δ) function, Kronecker function This is a mathematical function with equi-spaced discrete values of zero from minus infinity to plus infinity, except at x = 0, where it has unit value. Named after the German mathematician, Leopold Kronecker (1823–1891) who introduced it (Kronecker 1868). The term Kronecker('s) function was in use by the 1940s (Google Research 2012). See also: Gubbins (2004), Dirac delta function.

Kruskal-Wallis test A nonparametric statistical test (Kruskal and Wallis 1952; Conover 1980) for comparing k ≥ 3 groups of data (of sizes ni, i = 1, …, k, which may not all be the same) to see whether the data for one or more of the groups contains larger values than the others, as opposed to them all being identical (and therefore having identical means). The test is carried out by first ranking all N = Σ_{i=1}^{k} ni observations without regard to which group they belong to. Let

Ri = Σ_{j=1}^{ni} rij

be the sum of the individual ranks (r) for all the samples belonging to the i-th group. The test statistic T is defined as:

T = (1/S²) [Σ_{i=1}^{k} Ri²/ni − N(N + 1)²/4],

where

S² = [1/(N − 1)] [Σ_{i=1}^{k} Σ_{j=1}^{ni} rij² − N(N + 1)²/4].

If there are no tied ranks then this simplifies to:

T = [12/(N(N + 1))] Σ_{i=1}^{k} Ri²/ni − 3(N + 1).
Named for American mathematician and statistician, William Henry Kruskal (1919–2005) and economist and statistician, Wilson Allen Wallis (1912–1998) and known as the Kruskal-Wallis test since the publication of their 1952 paper. Early geological discussion includes Miller and Kahn (1962), Davis (1986), Cheeney (1983) and Rock (1986a). Kuiper’s test A nonparametric statistical test for uniformity in ungrouped orientation data, introduced by Dutch mathematician Nicolaas Hendrik Kuiper (1920–1994)
(Kuiper 1960). Given a set of n samples from a circular distribution 0 < θi ≤ 360, rearrange the values into ascending order: θ(1) ≤ θ(2) ≤ … ≤ θ(n). Now convert these to values 0 < xi ≤ 1 by letting xi = θ(i)/360, and calculate the statistics:

Dn+ = max{1/n − x1, 2/n − x2, …, 1 − xn}

and

Dn− = max{x1, x2 − 1/n, x3 − 2/n, …, xn − (n − 1)/n}.

Then Vn = Dn+ + Dn− and the test statistic is V = Vn(√n + 0.155 + 0.24/√n) (Fisher 1993). The hypothesis that the samples belong to a uniform distribution is rejected if V is too large. Discussed in a geological context in Cheeney (1983), Schuenemeyer (1984) and Rock (1987).

Kurtosis A measure of the peakedness of a unimodal frequency distribution. For a sample of size n it is given by m4/s⁴, where m4 = Σ(xi − x̄)⁴/n is the fourth moment about the mean and s² is the variance (see moments).

Minkowski dimension, Minkowski-Bouligand dimension A box-counting estimate of the fractal dimension, D (D > 0), of a two-dimensional spatial point pattern. The area occupied by the set of points is covered with a square mesh of cells, beginning with one of diameter d which is sufficient to cover the whole of the area occupied by the point set. The mesh size is then progressively decreased, and the number of occupied cells, N(d), at each size step is counted. Then N(d) = c·d^{−D}, where c is a constant; a graph of log[N(d)] (y-axis) as a function of log(d) (x-axis) will be linear with a slope of −D. This is known as the Minkowski or Minkowski-Bouligand dimension, named after the Russian-born German mathematician, Hermann Minkowski (1864–1909) and the French mathematician, Georges Louis Bouligand (1889–1979) (Minkowski 1901; Bouligand 1928, 1929; Mandelbrot 1975a, 1977, 1982). The discussion by Turcotte (1992) is in a geological context.

Minkowski set operations Named after the Russian-born German mathematician, Hermann Minkowski (1864–1909) who developed the theory (Minkowski 1901; Hadwiger 1950; but see discussion in Soille 2002) which now underpins many algorithmic operations in image processing. In its simplest form it is applied to binary (black-and-white) images. For example, given a set of black objects (O), located in a white background (the complement of O, Oc) to be manipulated; and a defined shape, say a square of given diameter or circle of given radius (S), then several basic morphological operations may be defined: dilation of O by S, written as O ⊕ S, will generally cause O to grow larger by uniform thickening along its boundary; erosion of O by S, O ⊖ S, will generally cause O to shrink by uniform thinning at its boundary; opening (erosion followed by dilation of the result) causes the smoothing of the boundary of O by removal of pixels at sharp corners: (O ⊖ S) ⊕ S; and closing (dilation followed by erosion of the result) causes the filling-in of small irregularities along the boundary of O: (O ⊕ S) ⊖ S; and so on. Early application in geology was by Agterberg and Fabbri (1978) and Fabbri (1980).

Minor The determinant of a square matrix which is the result of deleting one or more rows and columns of a pre-existing matrix. The term was introduced by the English mathematician, James Joseph Sylvester (1814–1879) (Sylvester 1850) and is used in this sense in a geophysical context by Jeffreys (1926).
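A minimal sketch of the box-counting estimate described in the Minkowski dimension entry above, assuming NumPy; the cell diameters and the rescaling of the point set to the unit square are illustrative choices.

```python
import numpy as np

def box_counting_dimension(points, diameters=(1/2, 1/4, 1/8, 1/16, 1/32, 1/64)):
    pts = np.asarray(points, dtype=float)
    pts = (pts - pts.min(axis=0)) / np.ptp(pts, axis=0)       # rescale to the unit square
    counts = []
    for d in diameters:
        cells = {tuple(c) for c in np.floor(pts / d).astype(int)}
        counts.append(len(cells))                             # N(d): number of occupied cells
    slope, _ = np.polyfit(np.log(diameters), np.log(counts), 1)
    return -slope                                             # log N(d) vs log d has slope -D
```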


Mixed effects model, mixed-effects model, mixed components model A model applied to dependent-variable data measured on all individuals belonging to two or more different groups and which includes both fixed effects and random effects (in contrast to a standard regression model, which has only fixed effects). There are two types: (i) random intercepts models, in which all the responses for a single group are additively shifted by a value which is specific to that group, and (ii) random slopes models, in which each group follows a linear model but the intercept and slope are specific to that group. Existing software can cope with both linear and nonlinear regression models. The term may also be applied to analysis of variance models in which the usual assumptions that the errors are independent and identically distributed are relaxed. Early discussion occurs in Mood (1950), Kempthorne (1952) and Wilk and Kempthorne (1955); and in a geological context by Krumbein and Graybill (1965) and by Miller and Kahn (1962), although they use the term mixed components model. The unhyphenated form of the spelling appears to be slightly more frequent (Google Research 2012).
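A minimal sketch of fitting the two model types described above with statsmodels' mixedlm interface; the file name and the column names y, x and site are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("measurements.csv")   # hypothetical data set with columns y, x and site

# (i) random-intercepts model: each site receives its own additive shift
m1 = smf.mixedlm("y ~ x", df, groups=df["site"]).fit()

# (ii) random-slopes model: intercept and slope both vary by site
m2 = smf.mixedlm("y ~ x", df, groups=df["site"], re_formula="~x").fit()

print(m1.summary())
```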


Mixing model, mixture model Mixing models are often encountered in petrology (using the chemical compositions of a suite of rock or mineral samples) and sedimentology (using grain-size compositions). Typically, one has known compositions of a number of end-members (e.g. a suite of minerals, or of sediments thought to be representative of particular environments) and wish to determine the proportions in which they might have been mixed in order to match the compositions of a set of igneous or metamorphic rock analyses or a suite of sediment grain-size distributions, on the assumption that compositional changes are linear in nature. Early contributions in this field were by the Canadian geologist, Hugh J. Greenwood (1931–) (Greenwood 1967, 1968), followed by Perry (1967), Bryan et al. (1969), Wright and Doherty (1970), Albarède and Provost (1977), Clarke (1978), Le Maitre (1979, 1981, 1982) and Ehrlich and Full (1987). However, although it had been recognised that the closed nature of compositional data required special treatment, it was not until Aitchison’s (1982, 1984, 1986) early publications that a possible means of dealing with the problem was provided (but there has been resistance to his ideas, see Aitchison 1999). Krawczynski and Olive (2011) applied this approach to mass-balance problems and found that it eliminated the calculations of phase proportions which produced negative mass-balance coefficients. See also mixture distributions. Mixture distributions, mixed frequency distributions, mixed populations Many sets of geological data consist of mixtures of two or more populations (e.g. background and anomalous concentration levels of an element in a mineralised area). Provided some assumptions are made about the number of populations present from which the samples are drawn and their nature (e.g. whether they can be adequately modelled by a normal distribution and/or a lognormal distribution) the parameters of the populations, and the proportions in which the populations are mixed, may be estimated by graphical or computational methods (e.g. Burch and Murgatroyd 1971; Sinclair 1974; Titterington et al. 1986; Galbraith and Green 1990). Le Maitre (1982) reviews the use of mixing models
in petrology, e.g. determining the proportions in which a group of minerals (whose ideal major-element oxide compositions are known) are combined to give a composition matching that of an analysed igneous rock; see also Ehrlich and Full (1987). Renner (1993a, b) has developed a number of algorithms for estimating ‘end-member’ compositions which can account for the compositional variation observed in a group of samples, and for determining the proportions in which these are mixed. Renner et al. (1998) and Evans et al. (2003) are examples of this type of approach applied to the geochemistry of recent sediments. The analysis of this type of frequency distribution was first discussed by the British statistician, Karl Pearson (1857–1936) (Pearson 1894) and in the geological literature by the British petrologist, William Alfred Richardson (1887–?1964) (Richardson 1923). See also: frequency distribution decomposition. Modal analysis A technique also referred to as point-count, or point count, analysis (e.g. Demirmen 1971, 1972) largely used in sedimentary and igneous petrography, micropaleontology and palynology, to estimate compositions (in terms of the volume percentage of petrographic constituents, heavy minerals present in separated concentrates, or the proportions of different taxa present). Good practice is to systematically traverse one or more microscope slide(s) in equal-sized steps, counting the occurrences of the different constituents until a pre-defined total number has been reached. Large total counts may be required to detect rare constituents believed to be present. Although the technique was first developed by the French mineralogist and petrologist, Achille Ernest Oscar Joseph Delesse (1817–1881) and the Austrian geologist and mineralogist, August Karl Rosiwal (1860–1923) in the nineteenth century (Delesse 1848, Rosiwal 1898), it was the American petrologist, Felix Chayes (1916–1993) who established a statistical basis for the method (Chayes 1956), see also Demirmen (1971, 1972); and Weltje (2002) for discussion of determining confidence intervals on the estimated proportions. The constant-sum nature of the percentaged or proportioned data (to 100% or unity respectively) leads to a number of statistical problems: see closure problem and logratio transformation for discussion. The term modal analysis has been consistently much more widely used than point count analysis (Google Research 2012). Mode 1. In statistics, it is the value of a variable corresponding to a maximum in its probability density. The term was introduced by the British statistician, Karl Pearson (1857–1936) (Pearson 1895). The use of the mode, in both arithmetic units and the phi scale of sediment grain-size diameter, as a descriptor in sedimentology, was introduced by the American mathematical geologist, William Christian Krumbein (1902–1979) in Krumbein and Pettijohn (1938). 2. In petrology, it refers to the actual mineral composition of a rock, as quantitatively determined by modal analysis (Cross et al. 1902, 1903). See also: G-mode, Q-mode, R-mode.
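As a worked complement to the Modal analysis entry above, the sketch below gives the simple binomial (normal-approximation) confidence interval for a counted proportion; see Weltje (2002) for more rigorous treatments. The counts used are hypothetical.

```python
import math

def point_count_interval(k, n, z=1.96):
    """Approximate 95% confidence interval for a modal proportion from k hits in n counted points."""
    p = k / n
    se = math.sqrt(p * (1.0 - p) / n)      # binomial standard error of the proportion
    return p - z * se, p + z * se

print(point_count_interval(150, 500))      # e.g. 150 quartz grains out of 500 points counted
```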


Model A formal expression of an idea which may be used to try to explain a set of observed data, by comparing its deduced behaviour to the actual observed data, or to predict the outcome of an event. The term became popular following its use by the American statistician, Alexander McFarlane Mood (1913–) (Mood 1950). The American mathematical geologist, William Christian Krumbein (1902–1979) and statistician Franklin Arno Graybill (1921–2012) brought the distinction between various types of models to the attention of geologists (Krumbein and Graybill 1965), classifying them into: (i) scale models and physical models, (ii) mathematical models, and (iii) conceptual models. Mathematical models may be subdivided into: (i) statistical models (e.g. a regression equation) and stochastic process models, which describe a phenomenon in a probabilistic way, having a specific random process built into the model, and (ii) deterministic models. See also: discovery-process model, fluid-flow model. Note that both modelling and modeling are acceptable spellings, the latter being the more usual in American English (Google Research 2012).


Model of de Wijs, de Wijs binomial cascade, de Wijsian model The Dutch economic geologist, Hendrica Johanna de Wijs (1911–1997), published a seminal paper (de Wijs 1951) in which he introduced the idea of self-similarity of element concentration values by assuming that the values of two halves of a block of ore with an overall average concentration value X are (1 + d)X and (1 − d)X, regardless of the size of the block (de Wijs 1951), where d is the dispersion index. X and d are the two parameters of the model. In the early 1950s this model inspired the French geostatistician, Georges Matheron (1930–2000) to develop his theory of regionalized variables as applied to ore assays. Matheron's (1962) absolute dispersion parameter, α, is a function of d and relates logarithmic variance of element concentration values to the logarithmically transformed ratio of volumes of a larger block and smaller blocks contained within it. Krige (1966) showed that this version of the model applies to the spatial distribution of gold and uranium in the Witwatersrand goldfields, South Africa. Mandelbrot (1982) demonstrated that the de Wijsian model was the first example of a multifractal. Lovejoy and Schertzer (2007) referred to this as the de Wijs binomial cascade. Independently, Brinck (1971) used the model of de Wijs for spatial distribution of various chemical elements in large portions of the Earth's crust. Brinck's approach is described in detail by Harris (1984) together with other applications. Agterberg (2007) showed that estimation of the dispersion parameter can be improved by using multifractal theory. He proposed a 3-parameter de Wijsian model, the third parameter being the apparent number (N) of subdivisions of the environment. This was introduced because, although the de Wijsian model may be satisfied on a regional scale, the degree of dispersion generally decreases rapidly as local, sample-size scales are reached.

Model resolution The resolution of a model is the smallest change in its input which will produce a detectable change in its output (Gubbins 2004).
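A minimal sketch of the de Wijs binomial cascade described above, assuming NumPy; assigning the enriched half of each block at random is an illustrative choice (the model itself only specifies the two factors).

```python
import numpy as np

def de_wijs_cascade(x_mean, d, n_levels, seed=0):
    """Repeatedly split blocks; the halves receive (1 + d)*X and (1 - d)*X of their parent value X."""
    rng = np.random.default_rng(seed)
    values = np.array([x_mean], dtype=float)
    for _ in range(n_levels):
        values = np.repeat(values, 2)                      # split every block into two halves
        enrich_left = rng.random(values.size // 2) < 0.5   # which half of each pair is enriched
        f = np.empty(values.size)
        f[0::2] = np.where(enrich_left, 1 + d, 1 - d)
        f[1::2] = np.where(enrich_left, 1 - d, 1 + d)
        values *= f
    return values
```

The overall mean is preserved at every level while the spread of the block values grows, reproducing the self-similar dispersion behaviour of the model.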


Modulus 1. A constant multiplier or parameter. 2. The distance of a complex number from the origin (0,0) in the complex plane (see Argand diagram). The absolute magnitude of a complex number, x + iy (where i is the imaginary unit √−1) is given by √(x² + y²). The idea of this graphical portrayal is generally credited to the French mathematician, Jean-Robert Argand (1768–1822) (Miller 2015a) (Anonymous 1806, Argand 1874). 3. A multiplier used to convert logarithms of one base to the equivalent in another base. 4. A number reflecting a physical property of a substance, usually determined by the ratio of the magnitude of a mechanical cause to that of its effect (e.g. bulk modulus, elastic modulus).

Mohr's circle, Mohr's diagram A method originally developed by the German civil engineer, Christian Otto Mohr (1835–1918) (Mohr 1882) for analysing two-dimensional stress distribution at a point in a material. On a graph of tension (y-axis) versus compression (x-axis), a Mohr's circle describes the two-dimensional stresses acting at a point in the material; each point on the circle represents the normal and shear stresses acting on one side of an element oriented at a given angle. Constructions using the Mohr diagram can be used to determine strain state based on field measurements of extension. Although discussed by the Austrian geologist, Bruno Hermann Max Sander (1884–1979) (Sander 1948, 1970), it did not really see geological application until the 1960s, following discussion by the American structural and engineering geologist, William Francis Brace (1926–2012) (Brace 1961, Ramsay 1967, Ramsay and Huber 1983).

Moments The term (based on analogy with the moment of a force in physics) was introduced into statistics by the British statistician, Karl Pearson (1857–1936) (Pearson 1893). Moments are all parameters which characterise a frequency distribution in terms of the average value (m) of integer powers of the values of the variable. The first four moments are: the mean, M1 = m(x); the variance, M2 = m[x − m(x)]²; M3 = m[x − m(x)]³; and M4 = m[x − m(x)]⁴ (for a normal distribution, M4 = 3{m[x − m(x)]²}²). Except for the first moment, they all reflect in some way dispersion with respect to the centre of the distribution as characterised by the mean. Skewness = M3/s³, where s is the standard deviation (i.e. √M2); and kurtosis = M4/s⁴. Suggested as possible measures in sedimentology (Van Orstrand 1925; Wentworth 1929; Krumbein 1936a, b; Krumbein and Pettijohn 1938) and fold shape in structural geology by Loudon (1964): M1, average attitude; M2, tightness; M3, asymmetry; and M4, reflecting the shape of the fold closure. See also: second moment; Vistelius (1980, 1992), Camina and Janacek (1984) and Gubbins (2004).

Monotone sequence, monotone function, monotonic sequence, monotonic function A monotone (also known as a monotonic) series or sequence of numbers is one in which each
successive value xi is either consistently ≥ xi−1 (increasing monotonic) or consistently ≤ xi−1 (decreasing monotonic). Similarly, a monotonic function of a variable x, y = f(x), either increases or stays constant as x increases (monotonic increasing function), or decreases or stays constant as x increases. The basic theory was developed by an English mathematician, William Henry Young (1863–1942) (Young 1908). Mentioned in Camina and Janacek (1984); Wolery and Walters (1975) use monotonic sequences to provide an efficient means to determine error bounds on free ion concentrations in determination of chemical equilibria in natural waters.


Monte Carlo method A computer-based method of statistical sampling (Tocher 1954) used to solve statistical or modelling problems. Results obtained using this technique are called Monte Carlo (MC) estimates. Having a reliable algorithm for generating a stream of pseudorandom numbers from a uniform distribution underpins the method and is of critical importance (Sharp and Bays 1992; Gentle 1998; Eddelbuettel 2006), these are then used to draw random values from a specified theoretical or actual frequency distribution, based on its corresponding cumulative distribution, as though they are “observed.” The accuracy of the result increases with the number of trials (many thousand are typical today). The first publications on the MC method (Metropolis and Ulam 1949; Donsker and Kac 1950; Kahn 1950; Kac and Donsker 1950) resulted from the implementation of the technique in classified work by the Polish physicist, Stanislaw Marcin Ulam (1909–1984) and the Hungarian mathematician, Janosh (John) von Neumann (1903–1957) on theoretical aspects of thermonuclear weapon development during the Manhattan Project at Los Alamos, USA, in 1946. The name itself was suggested by their colleague, the Greek-American physicist, Nicholas Constantine Metropolis (1915–1999) (Metropolis and Ulam 1949; Metropolis 1987). The first computer programs to use the MC method were run on the ENIAC (Electronic Numerical Integrator And Computer) at the U.S. Army’s Ballistic Research Laboratory, Aberdeen, MD, in April–May 1947, based on flowcharts developed by von Neumann and coded by his wife Klára Dán von Neumann (1911–1963) and the American mathematician, Adele Goldstine (née Katz; 1920–1964), who had written the operators manual for the ENIAC (Haigh et al. 2014b). In 1934, the Italian physicist, Enrico Fermi (1901–1954), at that time working in Rome prior to his joining the Manhattan Project, had apparently also tried a similar approach using a hand calculator to solve neutron diffusion problems (Anderson 1986), but its potential had to await ENIAC before it was able to be fully realised. See also: Hammersley and Handscomb (1964) and Efron and Tibshirani (1993) and, for earth science applications: Harbaugh and Bonham-Carter (1970), Camina and Janacek (1984) and Gubbins (2004). Monte Carlo significance test Monte Carlo sampling provides methods to obtain statistically significant results in test situations which cannot be resolved using classical methods. The American statistician Meyer Dwass (1923–1996) and the British statistician, George Alfred Barnard (1915–2002) independently suggested the use of Monte Carlo tests for hypothesis tests in which the test statistic does not have a known frequency
distribution (Dwass 1957; Barnard 1963). Given a simple null hypothesis H0 and a set of relevant data, Monte Carlo testing consists of calculating a statistic u1 from a set of data. Although the frequency distribution of the statistic is unknown, it is possible to simulate data sets on the null hypothesis H0 and calculate the values of the statistic for these sets {ui, i = 2, 3, …, m}. The observed test statistic u1 is then ranked among this corresponding set of values which have been generated by random sampling. When the frequency distribution of u is effectively continuous, the rank determines an exact significance level for the test since, under the null hypothesis H0, each of the m possible rankings of u1 are equally likely (Hope 1968; Besag and Diggle 1977). See also: Romesburg et al. (1981), Romesburg and Marshall (1985), Foxall and Baddeley (2002) and Chen et al. (2015). See also permutation test, randomization test.

Morphometrics In geology, morphometrics is the application of statistical methods to the biometric analysis of shape in palaeontology, micropalaeontology and palynology. Studies of this sort were originally popularised by the work of the British zoologist, (Sir) D'Arcy Wentworth Thompson (1860–1948) (Thompson 1915, 1917). More recently, geometric morphometrics has developed, in which topologically corresponding points (so-called landmarks) can be compared in three dimensions using digital reconstruction of fossil skulls, etc. For examples of work in this field, see: Reyment (1971b, 1991), Reyment et al. (1984), Bookstein (1991, 1995), MacLeod (2002a, b), Elewa (2004) and Sutton et al. (2013).

Most predictable surface An application of canonical correlation to lithostratigraphic thickness data to derive a composite variable, interpreted as reflecting overall facies variations for a given formation (e.g. based on a weighted combination of the thickness of limestone, dolomite, anhydrite, mudstone and marlstone in a given stratigraphic interval), suitable for regional mapping using trend-surface analysis. Introduced (Lee 1981) by the Chinese-born Canadian petroleum geologist, Pei-Jen Lee (1934–1999).

Mother wavelet The mother wavelet is a finite length, or fast decaying, oscillating waveform as a function of time with a particular shape and with a fixed number of oscillations, chosen for use in a wavelet analysis. The first of these was the Morlet wavelet, named for the French geophysicist, Jean Morlet (1931–2007) who first introduced the term wavelet [Fr. ondelettes] and the accompanying theory about 1975, while working for Elf-Aquitaine (Farge et al. 2012). It is given by:

ψ(t) = π^{−0.25} s^{−0.5} e^{i2πf0[(t − τ)/s]} e^{−0.5[(t − τ)/s]²},

where s is the scaling parameter (s < 1 yields compression and s > 1, dilation); τ is the shift parameter; f0 is the basic frequency of the mother wavelet; and i is the imaginary unit √−1. In its original form, the Morlet wavelet enabled a continuous transform (Morlet et al.
1982a, b). The Belgian-born American physicist and mathematician, Ingrid Daubechies (1954–) introduced a discrete approach (Daubechies et al. 1986) which enabled functions to be reconstructed from a discrete set of values. The Daubechies wavelet provides an alternative (Daubechies 1988, 1990). The mother wavelet is orthogonal to all functions which are obtained by translation (shifting) it right or left by an integer amount and to all functions which are obtained by dilating (i.e. stretching) it by a factor of 2^j (j = 2, 3, …). These dilations and translations enable a whole family of functions to be developed. See also Heil and Walnut (2006) and Weedon (2003) for discussion in a geological context.

Moveout filter A velocity filter which attenuates seismic events based on their apparent velocity. Effectively an angularity correction applied to adjust adjacent traces in a display, used in areas of flat bedding to remove steep noise lineups or multiples from seismic data, so as to present a true depiction of the reflecting surfaces. The amount of the correction required depends on the time from the shot, the average seismic wave velocity in the ground, and the distance between the shot-point and the detector groups. Long detector groups have been used to assist discrimination between the seismic signals and noise (Savitt et al. 1958).

Moving average, moving-average, moving window 1-dimensional: A technique used to smooth a series of observations (often a time series): x1, x2, …, xm. Choose a window of width N = 2n + 1 points, where N is odd, and a series of weights w0, w1, …, wN, where Σ_{i=0}^{N} wi = 1. The window is then centred on each data point in turn from x_{n+1} to x_{m−(n+1)}; the smoothed series is given by the weighted average

gw = Σ_{i=0}^{N} wi xi.

Choice of weights
may be equal-valued or related to their distance away from the centre of the window, e.g. after the shape of the normal distribution, etc. More recently, the median or some other robust estimate of the location has been used if the series is likely to contain outliers. The first application of the method was to disguise the absolute highs and lows in the monthly holdings of gold bullion by the Bank of England through the use of a 3-point moving average (House of Commons Committee on Secrecy 1832) and subsequently in geophysics (Stewart 1889), although the term moving average did not come into use until King (1912). Early use in earth science includes Krumbein and Pettijohn (1938), Korn (1938), Vistelius (1944), Tukey (1959a), Vistelius (1961) and Harbaugh and Merriam (1968). See also: Spencer's formula, Sheppard's formula, smoothing. 2-dimensional: A technique used to smooth a two-dimensional series of observations. A square n × n window is passed across the map area, generally using adjacent non-overlapping positions on a notional grid superimposed on the map area, and all points falling within the window are averaged. The mapped symbol corresponds to the mean orientation and direction of the data points within each window position. Geological applications include Potter (1955) and Chork and Govett (1979). The unhyphenated spelling moving average has always been the most frequently used (Google Research 2012).
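A minimal sketch of the one-dimensional weighted moving average described above, assuming NumPy; the weights are supplied by the user (here, equal weights) and only positions where the full window fits are smoothed.

```python
import numpy as np

def moving_average(x, weights):
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)   # should sum to 1 and have odd length N = 2n + 1
    N = w.size
    return np.array([w @ x[i:i + N] for i in range(x.size - N + 1)])

smoothed = moving_average(np.random.default_rng(1).normal(size=100), np.ones(5) / 5)
```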


Moving Average (MA) process A stationary process in which the value of a time series at time t is correlated in some way with the value(s) in the previous time steps. A moving average process, MA(q), is:

(xt − m) = εt − θ1εt−1 − θ2εt−2 − … − θqεt−q,

where m is the mean level; ε is a white noise process with zero mean and a finite and constant variance; θj, j = 1, …, q, are the parameters; and q is the order. To obey the assumption of stationarity, the absolute value of θ1 should be less than unity. The American mathematician, Joseph Leo Doob (1910–2004) noted (Doob 1953) that every time series which exhibits a continuous spectrum is in fact a moving average process. For discussion in a geological context, see Sarma (1990), Buttkus (1991, 2000) and Weedon (2003); see also: autoregressive process, autoregressive moving average process.

Moving average spectrum The power spectrum of a moving average process (Buttkus 2000).

Moving window spectrum analysis Evolutionary spectrum analysis is a technique in which many power spectra are calculated from a series of closely-spaced overlapping windows along the length of a time series. The result can be effectively represented as a contoured graph of power as a function of frequency and time (or depth in a stratigraphic section, etc.). It is particularly effective for revealing the changing structure of a nonstationary time series. Independently developed by the Hungarian-born British electrical engineer and physicist, Dennis Gabor (1900–1979) (Gabor 1946) for the analysis of speech (sonogram), using a decomposition of the waveform via a set of time-shifted and modulated wavelets in the frequency domain known as Gabor's elementary functions; and the British statistician, Maurice Bertram Priestley (1933–2013) (Priestley 1965, 1996). Examples of its geological application are Pisias and Moore (1981) and Melnyk et al. (1994). Also known as: sliding or moving window spectrum analysis, windowed Fourier analysis, short-time or short-term Fourier transform, spectrogram. See also: power spectrum, Fourier analysis, wavelet analysis.

Multichannel filter A filter whose characteristics are partly determined by the characteristics of other channels. It can be regarded as the matrix-based counterpart of single-channel filter theory (Robinson and Treitel 1964; Robinson 1966b; Treitel 1970; Buttkus 1991, 2000). See also: multichannel processing, z-transform.

Multichannel processing Data processing in which data from different input channels are treated as an ensemble and combined in some way, e.g. in seismic stacking, filtering, migration, etc. (Sheriff and Geldart 1982). Multichannel recorders for both seismic and
well-logging data were in use by the 1940s (e.g. Nettleton 1940; Eisler and Silverman 1947). See also: multichannel filter. Multicollinearity The occurrence of two or more strongly correlated predictor variables in a multiple regression model. While this does not affect the predictive power of the model as a whole, the coefficients in the fitted regression equation may not be a reliable guide to the relative predictive power of any given predictor, and the coefficients may change in an unpredictable manner in response to small changes in the model or the data. If the estimated regression coefficients in the model have major changes on the addition or deletion of a predictor variable, this may be cause for concern (Hoerl and Kennard 1970; Jones 1972). See: ridge regression. Multidimensional, multi-dimensional Two or more dimensions. The unhyphenated spelling has always been the most usual (Google Research 2012) Multidimensional convolution Performing convolution in two or more dimensions. See also: helix transform.
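A minimal two-dimensional example of the convolution described in the Multidimensional convolution entry above, assuming SciPy; the random grid and the 5 × 5 averaging kernel are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

grid = np.random.default_rng(0).normal(size=(100, 100))  # e.g. a gridded geophysical variable
kernel = np.ones((5, 5)) / 25.0                           # 5 x 5 moving-average kernel
smoothed = convolve2d(grid, kernel, mode="same", boundary="symm")
```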


Multidimensional Scaling (MDS) A non-hierarchical method of cluster analysis introduced by the American computer scientist, statistician and psychometrician, Joseph Bernard Kruskal (1928–2010) (Kruskal 1964) in which the multivariate compositions of a number of samples are represented as points on a two-dimensional scatterplot in which interpoint distance reflects the distance between the points in the original number of dimensions, thereby allowing the investigator to visually determine which samples constitute groups or sub-groups. Geoscience applications include Williams et al. (1987), Pan and Harris (1991), Greenough and Owen (2002) and Dzwinel et al. (2005). See also: nonlinear mapping algorithm. Multifractal This term, introduced by the Polish-born French mathematician, Benoît B. Mandelbrot (1924–2010) (Mandelbrot 1972), is used when many fractal subsets with different scaling properties coexist simultaneously. An infinite sequence of fractal dimensions is obtained from moments of a statistical distribution (Halsey et al. 1986). The theory has been applied to the spatial distribution of earthquakes (Hirata and Imoto 1991), fracture networks, pore spaces, topography, well logs, etc.(Turcotte 1997). Multifractal power spectrum-area (S-A) method A Cartesian plot of log transformed cumulative area of isolines on a two-dimensional power spectrum as a function of log transformed isoline value. This is frequently a good method for separating anomalies from background as originally proposed by Cheng et al. (2001). The method is compared to other power spectral techniques in Lovejoy and Schertzer (2007). See also: power spectrum.


Multifractality An index to characterize the multifractal spectrum of fractal dimensions, introduced by Chinese-Canadian geoscientist Qiuming Cheng. It is written as τ″(1), representing the second derivative of the mass exponent τ(q) with respect to q. The multifractality is estimated using the expression τ″(1) = τ(2) − 2τ(1) + τ(0), or τ″(1) = τ(2) + τ(0) if τ(1) = 0, conservation of total mass within the system (Cheng 1999).

Multigaussian approach, Multi-Gaussian approach A method of geostatistical estimation (Verly 1983) in which, in the simple kriging estimate of the grade of one point or block, the Gaussian variable is assumed to be multivariate normal. See also: bi-Gaussian approach. The most frequently used spelling appears to be multigaussian (Google Research 2012).

Multimodal distribution, multi-modal distribution A variable with more than two local maxima in its probability distribution. Early use of the term occurs in a biometrics text by the American civil engineer, zoologist and eugenicist, Charles Benedict Davenport (1868–1944) (Davenport 1899) and he mentions (Davenport 1900) that the occurrence of multimodal populations had been recognised by the German botanist, Friedrich Ludwig (1851–1918) in the 1890s. He visited the English statistician, Karl Pearson in London in 1897, who had attempted the first analytical solution to the problem of separating a bimodal distribution into Gaussian subcomponents in Pearson (1894). M. Clark (1976) discusses the history of analytical, graphical and numerical methods for statistical analysis of multimodal distributions and compares their results using various historical data sets as well as sediment grain size distributions. Although not using the term multimodal, Krumbein and Pettijohn (1938) state that "frequency curves of glacial till, either on arithmetic or logarithmic [size] scales, commonly display several modes." The unhyphenated spelling multimodal distribution is by far the most frequent spelling (Google Research 2012).

Multinomial distribution Sometimes called a polynomial distribution (Vistelius 1980). This is a discrete distribution which is associated with events which can have more than two outcomes; it is a generalisation of the binomial distribution to situations in which k > 2 outcomes can occur in each of n trials:

P(n1, n2, …, nk) = [n!/(n1! n2! ⋯ nk!)] p1^{n1} p2^{n2} ⋯ pk^{nk},

where ni is the number of trials with outcome i and pi is the probability of outcome Pk i occurring on any particular trial; and n ¼ i¼1 ni . It underpins the statistics of pointcounting of mineral constituents in rocks, or counts of species types in the analysis of micropaleontological or pollen abundance data. Introduced by the British statistician, (Sir)
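As a worked illustration of the multinomial probability above, a minimal Python sketch (the point-count totals and the assumed modal proportions are hypothetical):

from math import factorial

def multinomial_pmf(counts, probs):
    # P(n1, ..., nk) = [n!/(n1! ... nk!)] p1^n1 ... pk^nk
    n = sum(counts)
    denom = 1
    for c in counts:
        denom *= factorial(c)
    coefficient = factorial(n) // denom
    p = 1.0
    for c, pi in zip(counts, probs):
        p *= pi ** c
    return coefficient * p

# Hypothetical 100-point modal count of a thin section with assumed true
# proportions quartz 0.6, feldspar 0.3, mica 0.1
print(multinomial_pmf([62, 28, 10], [0.6, 0.3, 0.1]))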

Multiple correlation coefficient (R) The product-moment correlation coefficient between the observed and fitted values of the dependent variable in a regression analysis. The term was introduced by the British statistician, Karl Pearson (1857–1936) (Pearson 1914). See also: Krumbein and Graybill (1965); coefficient of determination.

Multiple inverse method The method, introduced by the Japanese structural geologist, Atsushi Yamaji (1958–) (Yamaji 2000), is a numerical technique designed to separate stresses from heterogeneous fault-slip data. It resamples all n fault subsets from a data set and calculates the optimum stress from each subset. The method uses the generalised Hough transform to identify clusters in parameter space. The results can be improved (Otsubo and Yamaji 2006) by removing subsets which cannot be explained by a single stress.

Multiple matching An algorithm, based on dynamic programming, for optimal matching of stratigraphic sections which allows for gaps in the strata being compared and the matching of one stratum to a number of others if required (Smith and Waterman 1980; Howell 1983). See also Waterman and Raymond (1987).

Multiple regression This is one of the most widely-used statistical applications in the earth sciences. At its simplest, it involves the fitting of a linear model

y = β0 + β1x1 + β2x2 + ⋯ + βnxn

to predict the value of a "dependent" or "response" variable, y, which is considered to be controlled by a group of n "predictors" or "explanatory variables" (x). The coefficients (β) are the parameters whose value is to be estimated. It is generally assumed that the predictors can be treated as though they are independent. Care is needed in the selection of the appropriate method; see Mark and Church (1977), Mann (1987), Troutman and Williams (1987) and Webster (1997) for discussion. Nonlinear and tree-based predictor models may also be used. The term multiple regression first appeared in a paper by the British statistician, Karl Pearson (1857–1936) (Pearson 1903b). For earth science applications see: Burma (1949), Griffiths (1958), Mark and Church (1977), Mann (1987), Troutman and Williams (1987) and Webster (1997); see also: Draper and Smith (1981) and Bates and Watts (1988); locally-weighted regression, logistic regression, ridge regression, stepwise selection, tree-based regression.
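A minimal sketch of fitting such a linear model by ordinary least squares in Python with numpy (the response and predictor values are hypothetical):

import numpy as np

# Hypothetical data: response y and two predictors x1, x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.1, 1.9, 3.5, 4.0, 5.2])
y = np.array([3.0, 4.1, 6.2, 7.9, 9.8])

X = np.column_stack([np.ones_like(x1), x1, x2])   # column of ones gives the intercept beta0
beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares estimates of beta0, beta1, beta2
y_fit = X @ beta                                  # fitted values of the response
print(beta)
print(y_fit)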

Multistage sampling, multi-stage sampling Hierarchical, stratified, or stratified random, multi-stage or nested sampling are all names for a sampling design in which the n samples to be taken from a fixed interval (e.g. a vertical section through the horizon of interest) are taken at random positions (chosen using a random number table, or computer-generated sequence of random numbers, to avoid bias) within n equal-length subdivisions of the entire interval. The name derives from the division of the population to be sampled into parts, known as strata, probably after geological usage, e.g. Woodward (1695). This sampling strategy is particularly appropriate in spatial geological studies so as to achieve regionally adequate coverage. For example, in a region covered by a particular geological formation to be sampled for a pilot environmental survey, one might, say, divide the area occupied by the formation in question into 10 km × 10 km grid squares, and select a number of these, either on a spatially regular or random basis; within each, select at random two 1 km × 1 km sub-cells; within each of these, take pairs of samples 100 m apart at two randomly-selected positions, and combine these four field samples together to provide a single composite sample which will subsequently be used for laboratory preparation and analysis. The method originated with social survey work by the Norwegian statistician, Anders Nicolai Kiaer (1838–1919) and was later established on a sound theoretical basis by the Russian-born American statistician, Jerzy Neyman (1894–1981) (Neyman 1934). It was introduced into geology by the American mathematical geologist, William Christian Krumbein (1902–1979) and statistician, John Wilder Tukey (1915–2000) (Krumbein and Tukey 1956). The unhyphenated spelling multistage sampling has been the most frequently-used since the 1970s (Google Research 2012).

Multitaper method, multi-window prolate spectrum analysis A method of estimating the power spectrum density of a time series by tapering the detrended data values using a sequence of special wave-shaped weighting functions (discrete prolate spheroidal sequences, also known as Slepian sequences). Each is applied in turn, and the result is then Fourier transformed. Each periodogram is based on a different (uncorrelated) weighting for different parts of the data. The individual spectral coefficients of each are then averaged to reduce the variance. The resulting spectrum density is effectively smoothed, but without losing information at the ends of the time series. The smoothing functions used are orthogonal and reduce leakage as much as possible. The method is also known as Thomson tapering after its developer, the Canadian-born American statistician, David James Thomson (1942–) (Thomson 1982), or multi-window prolate spectrum analysis. For geological applications see: Park and Herbert (1987), Percival and Walden (1993), Lees and Park (1995), Weedon (2003) and Gubbins (2004). The unhyphenated multitaper has been the most frequent spelling since the 1970s (Google Research 2012).
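A minimal Python sketch of the approach, assuming scipy is available for the Slepian (dpss) tapers; Thomson's adaptive weighting of the individual eigenspectra is omitted and a simple average is used instead, and the test series is hypothetical:

import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, dt=1.0, nw=4, k=7):
    # Average of k tapered periodograms; each taper is a Slepian sequence.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # detrend by mean removal only
    n = x.size
    tapers = dpss(n, nw, k)                 # shape (k, n)
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, dt)
    return freqs, dt * spectra.mean(axis=0)

# Hypothetical 256-sample series with a 16-sample period plus noise
rng = np.random.default_rng(1)
t = np.arange(256)
series = np.cos(2 * np.pi * t / 16) + 0.5 * rng.standard_normal(t.size)
f, psd = multitaper_psd(series)
print(f[np.argmax(psd)])   # peak frequency, close to 1/16 = 0.0625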

Multivariate (MV) The term, which came into wide use from the 1930s onwards (Google Research 2012, Bartlett 1939), refers to the analysis of data in which each observation consists of values from more than one variable. Burma (1948, 1949, 1953), Krumbein and Tukey (1956) and Miller and Kahn (1962) are early examples of usage in a geological context. The term bivariate is often used if there are only two variables, trivariate if there are three, and multivariate is usually taken to imply that more than three variables are being considered. See also: (i) Graphics, e.g. the Kite diagram; (ii) data analysis methods, e.g. cluster analysis; correlation coefficient or similarity coefficient matrix; discriminant analysis; factor analysis; logratio transformation; principal components analysis; linear, nonlinear, or logistic regression analysis; Markov chain Monte Carlo; multivariate analysis of variance; and (iii) multivariate frequency distributions: e.g. the Dirichlet and MV Cauchy, MV lognormal, MV logskew-normal, multinomial, MV normal, MV skew-normal distributions.

Multivariate Analysis of Variance (MANOVA) Analysis of variance applied to two or more variables simultaneously. The method was developed as an extension of the univariate (ANOVA) approach by the American statistician, Samuel Stanley Wilks (1906–1964) (Wilks 1932). It was introduced into geology by the American mathematical geologist, William Christian Krumbein (1902–1979) and statistician John Wilder Tukey (1915–2000) in Krumbein and Tukey (1956).

Multivariate Cauchy distribution A group of variables which can all be described by Cauchy distributions can be treated as a joint distribution. The probability distribution does not have an expectation and is symmetrical about the origin, where it has a maximum. Named for the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857). It is mentioned in an earth science context by Vistelius (1980, 1992).


Multivariate lognormal distribution A group of variables which can all be described by lognormal distributions (usually all with different means and standard deviations) can be treated as a joint distribution characterised by a mean vector and covariance matrix (Aitchison and Brown 1957). Garrett (1989) described a graphical method for the detection of outliers in multivariate lognormal trace element distributions. It is also mentioned by Vistelius (1980, 1992). See also Chi-squared plot.

Multivariate logskew-normal distribution A positive random vector x is said to have a multivariate logskew-normal distribution if the frequency distribution of y = loge(x) conforms to a multivariate skew-normal distribution. All the marginals have a univariate logskew-normal distribution with either positive or negative bias. It represents a generalization of the multivariate lognormal distribution. The logskew-normal distribution can arise as the distribution of a positive random vector whose associated composition has an additive logistic skew-normal distribution. Introduced in an earth science context by the Spanish statistician, Glòria Mateu-Figueras (1973–) (Mateu-Figueras 2003; Buccianti et al. 2006).

Multivariate normal distribution A group of variables which can all be described by normal distributions (usually all with different means and standard deviations) can be treated as a joint distribution characterised by a mean vector and covariance matrix. Early use of the term was by the American statistician, Harold Hotelling (1895–1973) in his development of principal components analysis (Hotelling 1933). Early use in an earth science context occurs in Miller and Kahn (1962), Reyment (1969a, 1971a), Blackith and Reyment (1971) and Vistelius (1980, 1992).

Multivariate point-density analysis A method introduced by the Swedish geochemist, Otto Brotzen (1926–?2005) for grouping rock samples into similar groups (centred on modes in multidimensional space) on the basis of their major-element geochemical composition (Brotzen 1975).

Multivariate skew-normal distribution The univariate skew-normal distribution has a probability distribution which is given by

F(x; α, μ, σ) = (2/σ) φ((x − μ)/σ) Φ(α(x − μ)/σ),

where φ(z) = e^(−z²/2)/√(2π) is the standard normal distribution function; Φ(z) = ∫_{−∞}^{z} φ(η) dη is the cumulative normal distribution; μ and σ are the location and scale parameters; and the parameter α regulates the shape of the distribution, bounded by α = 0, corresponding to the normal distribution, and ±∞, corresponding to the left- or right-sided half-normal distribution. A k-dimensional random vector x is said to have a multivariate skew-normal distribution if it is continuous with density function

Fk(x) = 2 φk(x; μ, Σ) Φ(αᵀ Σ^(−1/2) (x − μ)),

where φk is the standard multivariate normal distribution function; the location and scale parameters are both vectors: μ = {μ1, μ2, …, μk}ᵀ is the mean vector and Σ is the covariance matrix. The α parameter is a vector which regulates the shape of the distribution and indicates the direction of maximum skewness. When α = 0, the distribution becomes the multivariate normal distribution. Each component of vector x has a univariate skew-normal distribution. Azzalini and Dalla-Valle (1996) and Azzalini and Capitanio (1999, 2014) develop and prove some properties, most of them similar to those of the multivariate normal distribution. It has begun to be applied to compositional data in geochemistry (Mateu-Figueras et al. 1998).
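A minimal Python sketch of the univariate density given above, checked against scipy's skewnorm implementation (assuming scipy is available; the parameter values are hypothetical):

import numpy as np
from scipy.stats import norm, skewnorm

def skew_normal_pdf(x, alpha, mu, sigma):
    # F(x; alpha, mu, sigma) = (2/sigma) * phi((x - mu)/sigma) * Phi(alpha * (x - mu)/sigma)
    z = (x - mu) / sigma
    return (2.0 / sigma) * norm.pdf(z) * norm.cdf(alpha * z)

x = np.linspace(-2.0, 6.0, 5)
print(skew_normal_pdf(x, alpha=4.0, mu=1.0, sigma=1.5))
print(skewnorm.pdf(x, 4.0, loc=1.0, scale=1.5))   # should agree with the line above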

N

Nabla operator (∇) [notation] A vector differential operator denoted by the Greek symbol (∇, nabla):

∇ = i ∂/∂x + j ∂/∂y + k ∂/∂z,

where i, j and k are unit vectors directed along the orthogonal x-, y- and z-axes. An early example of its use in a seismology textbook is Macelwane (1932). It was first used (on its side) in a mathematical paper by the Irish mathematician, physicist and astronomer, (Sir) William Rowan Hamilton (1805–1865) (Hamilton 1837) but was subsequently taken up following its adoption (in its present orientation) by the Scottish mathematician and physicist, Peter Guthrie Tait (1831–1901) (Tait 1867, 1890, §145, p. 102). Apparently unsure what to call this new and as yet unnamed symbol, the Greek word nabla was suggested to him by the Scottish professor of divinity, and reader in Arabic and Hebrew, William Robertson Smith (1846–1894) on account of its resemblance to the shape of a harp of Phoenician origin, once used in the Middle East by the ancient Hebrews, called by them נבל (nêbel) and known to the Greeks as the nabla or nablia (Rich 1890).

Naïve Bayes A probabilistic pattern classification algorithm which uses Bayes' rule to assign an unknown to a given class such that the a posteriori probability of it belonging to the class is maximised, based on the "naïve" assumption that the features used to determine the result are independent, given the predicted class (Duda and Hart 1973; Henery 1994; Rish 2001). For recent discussion in an earth science context, see Cracknell and Reading (2014).
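A minimal self-contained Python sketch of a Gaussian naïve Bayes classifier of the kind described above (the feature values, class labels and function names are hypothetical):

import numpy as np

def fit_gaussian_nb(X, y):
    # Per-class prior, feature means and variances; the "naive" step treats
    # each feature as independent within a class.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def predict_gaussian_nb(params, x):
    # Assign x to the class with the largest log-posterior (Bayes' rule).
    scores = {}
    for c, (prior, mu, var) in params.items():
        loglik = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
        scores[c] = np.log(prior) + loglik
    return max(scores, key=scores.get)

X = np.array([[1.0, 2.0], [1.2, 1.8], [3.0, 4.1], [3.2, 3.9]])   # training features
y = np.array([0, 0, 1, 1])                                       # class labels
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, np.array([2.9, 4.0])))          # expected: class 1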


Napierian logarithm, natural logarithm (ln) ln (sometimes written as loge) is an abbreviation for the natural or Napierian (base-e) logarithm (logarithm meaning "ratio number"). It is defined as follows: If x = e^y, then y is the logarithm to the base e of x, e.g. ln(10) = 2.302585, and hence log(xy) = log(x) + log(y); log(x/y) = log(x) − log(y), etc. Although named for the Scottish mathematician, physicist, astronomer and astrologer, John Napier [Neper], 8th Baron of Merchiston (1550–1617), his tables (Napier 1614; Napier and Briggs 1618; Napier and Macdonald 1889) did not explicitly involve the concept of a base but were designed to assist calculations of sines; they are related by sin θ = 10^7 exp[−NapLog(sin θ)/10^7], where NapLog denotes Napier's logarithm. However, by 1618, it had been recognised that ln(10) = 2.302585 and the English mathematician, John Speidell (fl. 1600–1634) published a new table of logarithms using this base: "in whose use was and is required the knowledge of algebraicall addition and substraction, according to + and −: these being extracted from and out of them (they being first over seene, corrected, and amended) require not at all any skill in algebra, or cossike numbers, but may be used by every one that can onely adde and substract" (Speidell 1619). However, in his system, the logarithm of a number n is equal to Napier's logarithm of 1/n, i.e. the logarithm of 1 is zero and of 10 is 2302584. The abbreviation ln was first used by the American mathematician, Washington Stringham (1847–1909) (Stringham 1893). See Bruce (2012) for a modern English translation; also Briggsian logarithm.

Natural number A term generally meaning a member of the series of positive integers: 1, 2, 3, …, ∞ (Camina and Janacek 1984). It appears to have first been used by the English mathematician, William Emerson (1701–1782) in Emerson (1763). Whether zero is also included appears to vary between different authors (Miller 2015a).


Nearest neighbors classification The k-nearest neighbors [kNN; note American English sp.] algorithm is a classification algorithm in which an unknown test sample is compared to the k nearest others in a training set and assigned to a predicted class based on a majority vote cast by the neighbouring samples (Cover and Hart 1967). See Cracknell and Reading (2014) for discussion of its performance in an earth science context.

Nearest neighbour orientation diagram A method for determining the characteristics of a strain ellipse from field data (e.g. deformed ooliths, pebbles). The point-centres of the original objects are marked on an overlay. The lengths of each pair of centre-to-centre distances (d) are measured, together with their orientations with respect to a given azimuth (θ). The ellipticity of the strain ellipse may be determined from the parameters of a nonlinear function fitted to a graph of d as a function of θ. Introduced by the British structural geologist, John Graham Ramsay (1931–) and the Swiss structural geologist, Martin Immanuel Huber (Ramsay and Huber 1983).

Negative binomial distribution The probability distribution for the number, x = 0, 1, 2, …, of individuals encountered in a sampling unit when the population is clumped or aggregated. The parameters of the distribution are the mean μ and an exponent k, which is related to the degree of clumping of the individuals in the population: as k → 0, the amount of clumping increases. The probability of there being x individuals in a sampling unit is given by:

F(x; μ, k) = [(k + x − 1)!/(x!(k − 1)!)] [μ/(μ + k)]^x (1 + μ/k)^(−k),

where g! means g factorial. If f is the observed frequency of occurrence of x counts, the estimate of μ is given by the arithmetic mean: x̄ = Σfx/Σf. The variance s² = [Σ(fx²) − x̄Σfx]/(n − 1). An initial estimate of k is given by: k̂ = x̄²/(s² − x̄), and this is then refined by a trial-and-error process to determine a maximum likelihood value of k̂ such that:

n loge(1 + μ̂/k̂) − Σ[N(x)/(k̂ + x)] = 0,

where n is the total number of sampling units and N(x) is the total number of counts exceeding x, and

Σ[N(x)/(k̂ + x)] = N(x=0)/k̂ + N(x=1)/(k̂ + 1) + N(x=2)/(k̂ + 2) + ⋯

(Koch and Link 1970; Haldane 1941; Elliott 1971). The distribution can be used for modelling counts of numbers of species per unit sampling area (Buzas 1968; Elliott 1971) and the number of occurrences of a given commodity per unit area over a region (Koch and Link 1970). The distribution was introduced by the British statisticians, Major Greenwood (1880–1949) and George Udny Yule (1871–1951) (Greenwood and Yule 1920).

Negative index, negative indices An expression in which one or more terms involve a number raised to a negative power, e.g. x^(−3) = 1/x³ (Camina and Janacek 1984).

Negative weight In a weights of evidence model, a positive weight is assigned to a part of a binary map pattern, which is positively correlated with a point pattern (e.g., occurrences of mineral deposits). A negative weight is the corresponding coefficient assigned to the remainder of the map pattern, except that zero weight is assigned to missing data. The sum of positive weights and absolute values of negative weights is called the contrast. If all map patterns considered in the model are approximately conditionally independent of the point pattern, the contrast is approximately equal to the coefficient obtained by logistic regression of 0–1 data for the point pattern on the 0–1 data for the binary map patterns.
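Returning to the negative binomial distribution entry above, a minimal Python sketch of the probability function (in its integer-k factorial form) and of the initial moment-based estimate of k; the observed counts are hypothetical:

from math import factorial

def neg_binom_pmf(x, mu, k):
    # F(x; mu, k) = [(k + x - 1)!/(x!(k - 1)!)] [mu/(mu + k)]^x (1 + mu/k)^(-k), integer k
    coeff = factorial(k + x - 1) // (factorial(x) * factorial(k - 1))
    return coeff * (mu / (mu + k)) ** x * (1.0 + mu / k) ** (-k)

# Hypothetical counts of individuals per sampling unit
counts = [0, 0, 1, 0, 2, 5, 0, 1, 7, 0, 3, 0]
n = len(counts)
mean = sum(counts) / n
var = sum((c - mean) ** 2 for c in counts) / (n - 1)
k_hat = max(1, round(mean ** 2 / (var - mean)))   # initial estimate, refined later by trial and error
print(mean, var, k_hat)
print(neg_binom_pmf(2, mu=mean, k=k_hat))         # probability of observing 2 individuals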


Nested sampling Hierarchical, stratified, stratified random, multi-stage and nested sampling are all names for a sampling design in which the n samples to be taken from a fixed interval (e.g. a vertical section through a horizon of interest) are taken at random positions (chosen using a random number table, or computer-generated sequence of random numbers, to avoid bias) within n equal-length subdivisions of the entire interval. The name derives from the division of the population to be sampled into parts, known (probably after geological usage) as strata. This sampling strategy is particularly appropriate in spatial geological studies so as to achieve regionally adequate coverage. For example, in a region covered by a particular geological formation to be sampled for a pilot environmental survey, one might, say, divide the area occupied by the formation in question into 10 km × 10 km grid squares, and select a number of these either on a spatially regular or random basis; within each select at random two 1 km × 1 km sub-cells; within each of these, take pairs of samples 100 m apart at two randomly-selected positions, and combine these four field samples together to provide a single composite sample which will subsequently be used for laboratory preparation and analysis. This approach originated with social survey work by the Norwegian statistician, Anders Nicolai Kiaer (1838–1919) and was later established on a sound theoretical basis by the Russian-born American statistician, Jerzy Neyman (1894–1981) (Neyman 1934). It was introduced into geology by the American mathematical geologist, William Christian Krumbein (1902–1979) and statistician, John Wilder Tukey (1915–2000) (Krumbein and Tukey 1956). See also: Krumbein and Graybill (1965), Tourtelot and Miesch (1975) and Alley (1993).

.NET Framework See dot Net Framework.
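A minimal Python sketch of the nested/multistage idea above at its simplest level, drawing one random sampling position within each of n equal subdivisions of a section (the section length, subdivision count and seed are hypothetical):

import random

def stratified_positions(section_length, n, seed=None):
    # One random position within each of n equal-length subdivisions, so that
    # coverage of the whole section is guaranteed while positions remain unbiased.
    rng = random.Random(seed)
    width = section_length / n
    return [i * width + rng.uniform(0.0, width) for i in range(n)]

print(stratified_positions(30.0, 6, seed=42))   # e.g. a 30 m section divided into 6 strata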


Network Two principal types of network have been studied in geology: stream and crack/fracture networks. Both exhibit a hierarchically-branched structure in which conditions at one point in the network may be influenced by conditions in connected branches. The first studies of stream network topology developed methods based on stream order to investigate geomorphological causes of erosion and watershed evolution (Horton 1945; Strahler 1952; Shreve 1966, 1967). See also: Smart (1969), Werner (1971), Dacey (1971), Dacey and Krumbein (1976), Moon (1979), Shimano (1992) and Beven and Kirkby (1993). Properties of crack and fracture networks are discussed by Gray et al. (1976), Chilès (1988), Berkowitz (1995) and Liu et al. (2009).

Neural net, neural networks See: artificial neural networks.

Neutral line A line connecting points of zero extension. The term was popularised following publication of the classic text on photoelasticity by the English engineer, Ernest George Coker (1899–1946) and the French-born English applied mathematician, Louis Napoleon George Filon (1875–1937) (Coker and Filon 1931). It is referred to in structural geology by Ramsay (1967) and Ramsay and Huber (1983).


Neutral point The strains (extensions) at such a point (also known as an isotropic point) are zero in all directions. The term was popularised following publication of the classic text on photoelasticity by the English engineer, Ernest George Coker (1899–1946) and the French-born English applied mathematician, Louis Napoleon George Filon (1875–1937) (Coker and Filon 1931). It is referred to in structural geology by Ramsay (1967) and Ramsay and Huber (1983).

Neutral surface A surface connecting points of zero extension. Referred to in the work of the British mathematician and geophysicist, Augustus Edward Hough Love (1863–1940) (Love 1906). It is referred to in structural geology by Ramsay (1967) and Ramsay and Huber (1983).

Newton-Raphson algorithm, Newton-Raphson iteration, Newton-Raphson method, Newton's approximation method An iterative technique for finding the solution to a well-behaved nonlinear algebraic equation, first devised in unpublished work by the English mathematician and polymath, Isaac Newton (1643–1727), in 1669 and subsequently modified by his colleague, the English mathematician, Joseph Raphson (1668?–1715) (Raphson 1690; see Cajori 1911; Kollerstrom 1992; Ypma 1995). Assuming the function f(x) is well behaved, then given a first reasonable guess at the value (x0) of the root of the function, at which f(x) = 0, the next approximate solution, x1, is obtained from: x1 = x0 − f(x0)/f′(x0), where f′(x) is its derivative. This solution is then used to obtain a successive estimate x2, and so on, i.e. xn+1 = xn − f(xn)/f′(xn). This method of solution is based on the fact that the new estimate of the root, xn+1, corresponds to the point on the x-axis at which it is crossed by the extrapolated tangent to the curve y = f(x) at the point x = xn. Provided the first guess is reasonably close to the value of the unknown root and that f′(x0) is not equal to zero, the method should converge. For earth science applications see: Deffeyes (1965), Jones and James (1969), Camina and Janacek (1984), Gubbins (2004). Newton-Raphson method appears to be the most frequently used attribution (Google Research 2012).

Niggli grain size measures The Swiss crystallographer, petrologist and geochemist, Paul Niggli (1888–1953) introduced (Niggli 1935) some empirical statistical descriptors of sediment grain size: (i) δ = 2d/(dmax + dmin), where dmin and dmax are the smallest and largest grain-size in the sediment and d is Baker's equivalent grade; let the total percentage of material lying between dmin and d be p, then π = 2p/100. Note how δ and π depart from unity. (ii) By means of d and p, the sediment distribution is divided into coarse and fine parts and for each of these, the mean grain sizes are d′ and d″ respectively; then the Niggli sorting index a = 3(d″ − d′)/d. This has values of a ≤ 1 for well sorted sediments and a > 1 for poorly sorted sediments. These measures were used by Zingg (1935) and are mentioned in Krumbein and Pettijohn (1938).
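As an illustration of the Newton-Raphson entry above, a minimal Python sketch (the test function, tolerance and starting value are hypothetical):

def newton_raphson(f, fprime, x0, tol=1e-10, max_iter=50):
    # Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) until |f(x)| < tol.
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)
    return x

# Example: the root of f(x) = x**3 - 2*x - 5 near x = 2
root = newton_raphson(lambda x: x**3 - 2*x - 5, lambda x: 3*x**2 - 2, x0=2.0)
print(root)   # approximately 2.0945515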


Niggli number, Niggli values A method devised by the Swiss crystallographer, petrologist and geochemist, Paul Niggli (1888–1953) (Niggli and Beger 1923; Niggli 1948, 1954) for recalculating weight percentaged major oxide results as equivalent amounts of the participating oxides. The weight percentages of the major oxides (SiO2, Al2O3, Fe2O3, FeO, MnO, MgO, CaO, Na2O, K2O, TiO2 and P2O5) are first normalised by dividing each by its equivalent molecular weight, to give: SiO2*, Al2O3*, Fe2O3*, FeO*, MnO*, MgO*, CaO*, Na2O*, K2O*, TiO2* and P2O5*. If S = (Al2O3* + Fe2O3* + FeO* + MnO* + MgO* + CaO* + Na2O* + K2O*), then al = 100Al2O3*/S; fm = 100(Fe2O3* + FeO* + MnO* + MgO*)/S; c = 100CaO*/S; alk = 100(Na2O* + K2O*)/S; si = 100SiO2*/S; ti = 100TiO2*/S; p = 100P2O5*/S; k = K2O*/(Na2O* + K2O*) and mg = MgO*/(Fe2O3* + FeO* + MnO* + MgO*). This was an attempt to avoid the problems caused by the essentially constant-sum nature of the major oxide data. However, as pointed out by Chayes (1949), this causes considerable problems with the interpretation of the apparent trends shown by Niggli plots. Niggli values appears to be the most frequently used attribution (Google Research 2012). See also: closure problem, remaining-space variable, norm.

No-space graph A graphical aid to the determination of the ordering in a biostratigraphic range chart based on the occurrences of fossil taxa in two or more well-sampled geological sections, introduced by the American biostratigrapher, Lucy E. Edwards. The first and last occurrences of each species as expected in a hypothesised sequence are plotted as a function of their observed sequence. This aids recognition of out-of-place events, such as a taxon which does not fill its entire taxonomic range or the fact that the hypothesised sequence may require partial revision. The term no-space graph arises from the fact that it is based on the relative position only of successive events as compared to the hypothesized sequence. The hypothesized sequence is successively revised, based on data from several sections, until all biostratigraphic events occur in-place in at least one section, and all events in all sections may be interpreted as either in-place or unfulfilled range events (Edwards 1978). Subsequent applications include those by Hazel et al. (1980) and Sadler et al. (2009).
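Returning to the Niggli number entry above, a minimal Python sketch of the recalculation for the principal values al, fm, c, alk and si (the molecular weights are rounded published values, the analysis is hypothetical, and ti, p, k and mg follow in the same way):

MOL_WT = {'SiO2': 60.08, 'Al2O3': 101.96, 'Fe2O3': 159.69, 'FeO': 71.84,
          'MnO': 70.94, 'MgO': 40.30, 'CaO': 56.08, 'Na2O': 61.98, 'K2O': 94.20}

def niggli_values(wt_percent):
    # Convert weight % oxides to molecular proportions, then normalise to S.
    m = {ox: wt_percent.get(ox, 0.0) / MOL_WT[ox] for ox in MOL_WT}
    s = (m['Al2O3'] + m['Fe2O3'] + m['FeO'] + m['MnO'] + m['MgO']
         + m['CaO'] + m['Na2O'] + m['K2O'])
    return {'al': 100 * m['Al2O3'] / s,
            'fm': 100 * (m['Fe2O3'] + m['FeO'] + m['MnO'] + m['MgO']) / s,
            'c': 100 * m['CaO'] / s,
            'alk': 100 * (m['Na2O'] + m['K2O']) / s,
            'si': 100 * m['SiO2'] / s}

# Hypothetical basaltic analysis (wt %)
print(niggli_values({'SiO2': 49.0, 'Al2O3': 15.0, 'Fe2O3': 3.0, 'FeO': 8.0,
                     'MnO': 0.2, 'MgO': 7.5, 'CaO': 11.0, 'Na2O': 2.5, 'K2O': 0.6}))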


Node 1. The start of a drainage channel system (source node) or point at which two branches of a drainage channel system join (junction node) (Horton 1945). 2. A fixed point in phase space towards which solutions for dynamical systems evolve. (If its eigenvalues are real and negative, it is stable; if real and positive, it is unstable; see also: saddlepoint). The term originally derived from its use in acoustics to refer to the “stationary” positions on a vibrating string, as discussed by the French mathematician and physicist, Joseph Sauveur (1653–1716) in 1701 (Sauveur 1743). It was subsequently introduced into the mathematical study of dynamical systems by the French mathematical physicist and mathematician, Jules Henri Poincaré (1854–1912) (Poincaré 1881, 1882); see also Turcotte (1997).


Noise, noisy 1. In time series analysis, it is an unwanted time-related process consisting of random disturbances corrupting the signal being monitored. If it has equal power in all frequency intervals (i.e. it is uncorrelated) over a wide range of frequencies (producing a flat power spectrum) then it is known as white noise. The observed values at each time interval are independent with zero mean and constant variance, i.e., it is a purely random process. If the amplitude of the power spectrum is not equal at all frequencies (i.e. partially correlated in some frequency band), then it is known as colored noise [American English sp.] (Tukey and Hamming 1949), e.g. red noise is partially correlated at the lowest frequencies (see also white noise). The American statistician, John Wilder Tukey (1915–2000) pointed out that the repetition of a signal would produce an exact copy, whereas a repetition of noise would only have statistical characteristics in common with the original (see also Tukey 1959b). The concept of noise was introduced by the Swiss-born German physicist, Walter Schottky (1886–1976), who predicted (Schottky 1918) that a vacuum tube would have two intrinsic sources of time-dependent current fluctuations: shot noise (Schroteffekt; a superposition of impulses occurring at random Poisson distributed times) and thermal noise (Wärmeeffekt). The former was observed as current fluctuations around an average value, as a result of the discreteness of the electrons and their stochastic emission from the cathode. The latter, manifested as fluctuating voltage across a conductor in thermal equilibrium, is caused by the thermal motion of electrons and occurs in any conductor which has a resistance, and it is temperature-related. It is now called Johnson-Nyquist noise, after two Swedish-born American physicists, John Bertrand Johnson (1887–1970) and Harry Nyquist (1889–1976), who first studied it quantitatively and explained the phenomenon (Johnson 1928; Nyquist 1928b). See also: van der Ziel (1954), Wax (1954), Davenport and Root (1958), Blackman and Tukey (1958); and, in an earth science context, Horton (1955, 1957), Buttkus (2000), Gubbins (2004); one-over-f noise, random walk, nugget effect. 2. The American mathematical geologist, William Christian Krumbein (1902–1979) used noise (Krumbein 1960a) to mean fluctuations in data which cannot be assigned to specific causes and which, if they are large, may obscure the meaningful information in the data.

Noise analysis Routine analysis of seismic noise levels recorded by seismometers on a weekly and annual basis, using the power spectral density to characterise background levels over a wide range of frequencies to assist in determining seasonal and secular noise characteristics and to identify instrumental problems (Buttkus 2000).

Nomogram, nomograph A graphical calculator: a diagram representing the relations between three or more variables by means of linear or curved scales, so arranged that the value of one variable can be read off by means of drawing a straight line intersecting the other scales at the appropriate values. Alignment charts have been used in structural geology to aid calculation of bed thickness; depth to a stratigraphic horizon; the spacing interval for structure contours, etc. The methods for construction of such charts were developed by the French geometer, Maurice D'Ocagne (1862–1938) (D'Ocagne 1896); however, it is now a lost art, formerly much used in engineering and statistics (e.g. Peddle 1910; Levens 1959), having become totally outmoded with the arrival of personal computers. Methods of construction for alignment charts are explained by Peddle (1910). Examples of earth science usage will be found in: Herbert Smith (1907), Krumbein and Pettijohn (1938), Lawlor (1939), Cloos (1947), Nevin (1949), Billings (1954), Ivanhoe (1957). The term nomogram rather than nomograph has been the preferred usage in both British and American English since the 1960s (Google Research 2012).

Noncausal filter, non-causal filter A filter whose output at time t also depends on future inputs is called noncausal. Discussed in an earth science context by Ferber (1984) and Gubbins (2004). See also: causal filter, acausal filter, impulse response filter. The unhyphenated spelling is by far the most frequent (Google Research 2012).

Nondetect, nondetectable An observation whose value falls below a method detection limit. Although the term non-detectable has been in use in the context of chemical and radiological analysis since the 1960s (e.g. Anonymous 1963), nondetect seems to have first appeared in this context in the 1980s (e.g. Loftis et al. 1989), but the term has since been popularised following the work of the American environmental statistician, Dennis Raymond Helsel (1951–) (Helsel and Hirsch 1992; Helsel 2005).

Non-Gaussian random process A Gaussian random process is not always suitable for modelling processes with high variability and models based on long-tailed distributions (non-Gaussian processes) may be required in some circumstances (Samorodnitsky and Taqqu 1994; Johnny 2012). See also Markov process.


Nonlinear dynamical system, non-linear dynamical system A dynamical system is one whose behaviour is described by a set of deterministic ordinary differential equations and its long-term behaviour is determined by analytical or numerical integration of these equations. The natural dissipation of a dynamical system, combined with its underlying driving force, tends to kill off initial transients and settle the system into its typical behaviour. In dissipative systems governed by finite sets of linear equations, a constant forcing eventually leads to a constant response while periodic forcing leads to a periodic response. However, if the governing equations are nonlinear, constant forcing can lead to a variable response. The behaviour of such nonperiodic systems was first studied by the American meteorologist, Edward Norton Lorenz (1917–2008) (Lorenz 1963). Earth science applications are discussed by Shaw (1987), Turcotte (1992, 1997) and Quin et al. (2006). The unhyphenated spelling is by far the most frequent (Google Research 2012).


Nonlinear inverse problem, non-linear inverse problem This refers to obtaining the solution to a problem in which the relationship between the data and the model is nonlinear in nature (Gubbins 2004), e.g. estimating the origin time, depth, latitude and longitude of an earthquake from observations at a number of distant positions. Methods of solution generally involve iterative modelling, using an initial model, computing the residuals to the data, then adjusting the model so as to try to reduce the residual magnitudes, etc. Candidate solutions may be found using either a comprehensive grid search of model space with incremental changes in the model parameter values (e.g. Sambridge and Kennett 1986), which can be very computer-intensive; or by Monte Carlo search (Sambridge and Mosegaard 2002), including simulated annealing (Billings 1994) or genetic algorithms (Stoffa and Sen 1991; Sambridge and Gallagher 1993). Prior information on the relative likelihood of particular models is particularly helpful. The term came into common use in the 1970s; the unhyphenated spelling is generally preferred (Google Research 2012).

Nonlinear least squares, non-linear least squares The least squares criterion applied to the estimation of the parameters of nonlinear functions (e.g. y = β1·e^(β2·x) or similar). Parameters are often determined using an optimization procedure such as the Newton-Raphson algorithm. The term came into wide use during the early 1960s; the unhyphenated spelling has become the most frequent in both American and British English since the 1980s (Google Research 2012). See also nonlinear regression, Bates and Watts (1988), Ratkowsky (1990).

Nonlinear mapping (NLM) algorithm A non-hierarchical method of cluster analysis introduced by the American electrical engineer, John W. Sammon (1939–) (Sammon 1969), a non-metric multidimensional scaling (MDS) (Kruskal 1964) in which the samples are generally represented as points on a two-dimensional scatterplot, interpoint distance reflecting the distance between the points in the original p dimensions, thereby allowing the investigator to determine which samples constitute groups or sub-groups. If the distance between the i-th and j-th objects in the original p-dimensional space is dij(p) and the distance between their two-dimensional projections is dij(2), the optimal mapping is achieved by minimizing

E = [1/Σ dij(p)] Σ {[dij(p) − dij(2)]²/dij(p)},

where the sums are taken over all pairs i < j.

Penalty function In constrained optimization, a means of converting the problem of minimizing an objective function f(x) subject to a set of constraints into an unconstrained problem, by minimizing f(x) + kP(x), where the penalty function P(x) ≥ 0; P(x) = 0 only if x is a member of the set of constraints, and k is a positive constant, the penalty parameter. If there are a number of inequality constraints: ci(x) ≤ 0; i = 1, 2, …, m, then the penalty function is typically a quadratic of the form:

P(x) = (1/2) Σi=1..m {max[0, ci(x)]}².

To solve the problem, one starts with a relatively small value of k and a set of values of x corresponding to a point lying outside the feasible region; as k is gradually increased, successive solution points will approach the feasible solution region and will tend to minimize f(x). A solution to constrained minimization problems was first introduced by the American computer scientist, Judah Ben Rosen (1922–2009) (Rosen 1960, 1961); penalty function algorithms were introduced by the British computer scientist, Roger Fletcher; chemist, Colin M. Reeves; and mathematician, Michael James David Powell (1936–2015), building on the work of the American physicist William Cooper Davidon (1927–2013) (Davidon 1959; Rosen 1960, 1961; Fletcher and Powell 1963; Fletcher and Reeves 1964). The utility of penalty functions to the solution of problems in geophysics is discussed in Gubbins (2004) and Kirkner and Reeves (1990) describe their application to chemical equilibrium calculations.
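A minimal Python sketch of the quadratic-penalty approach just described, assuming scipy is available; the objective, the single inequality constraint and the schedule of k values are hypothetical:

import numpy as np
from scipy.optimize import minimize

def objective(x):
    return (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2      # f(x); unconstrained minimum at (3, 2)

def constraint(x):
    return x[0] + x[1] - 4.0                          # feasible region: c(x) <= 0

def penalised(x, k):
    # f(x) + k*P(x), with the quadratic penalty P(x) = 0.5 * max(0, c(x))**2
    return objective(x) + k * 0.5 * max(0.0, constraint(x)) ** 2

x = np.array([5.0, 5.0])                              # start at a point outside the feasible region
for k in (1.0, 10.0, 100.0, 1000.0):                  # gradually increase the penalty parameter k
    x = minimize(penalised, x, args=(k,)).x
print(x)                                              # approaches the constrained minimum near (2.5, 1.5)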


Percent-percent (P-P) plot, PP plot A graph comparing the cumulative percentage values of a model distribution (y-axis) with those of the empirical distribution at fixed quantiles (x-axis). If there are xi, i = 1, …, n observations sorted into ascending order of magnitude, the empirical probabilities (0 < pi < 1) fall at pi = (i − 0.5)/n. Essentially, the cumulative probabilities of the two distributions are plotted one against the other and, if the model fits well, the majority of the plotted points will fall on a straight line. The plot was introduced by the Canadian statistician, Martin Bradbury Wilk (1922–2013) and the Indian statistician, Ram Gnanadesikan (1932–2015) (Wilk and Gnanadesikan 1968) while both were working at the AT&T Bell Labs at Murray Hill, NJ, USA, and its use was popularised by books such as Chambers et al. (1983). An early use in earth sciences was by Switzer and Parker (1976). It has been advocated for use with geochemical data by Reimann et al. (2008). The hyphenated spelling P-P plot has now become the most widely used (Google Research 2012). See also Q-Q plot, CP plot.

Percentage (%) Expressing a proportion (e.g., of one part to the sum of all the parts) in terms of parts per hundred. Its use goes back to at least the fifteenth Century; by the sixteenth Century, it was widely used in calculating interest (Smith 1923–1925). The solidus abbreviation (%) has become increasingly frequent since about 1900 (Google Research 2012).

Percentage points 1. The absolute unit of difference between two given percentages. A percentage point is 1%. 2. A level of significance expressed as a percentage, a usage essentially begun by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) (Fisher 1925a, b). 3. Today the term is sometimes written simply as points, to imply a percentaged value.

Percentile The set of divisions which produces exactly 100 equal parts in a series of continuous values: the p-th percentile is an interpolated value such that p percent of the observations for a given variable fall below it. The term was introduced by the British anthropologist, Francis Galton (1822–1911) (Galton 1885). Early geological use of the term is often in a sedimentological context, e.g. Trask (1932a, b), Wilson (1937), Krumbein and Monk (1943); see also Miller and Kahn (1962), Krumbein and Graybill (1965).
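Returning to the percent-percent (P-P) plot entry above, a minimal Python sketch of how the plotted pairs can be obtained, assuming scipy is available and using a fitted normal distribution as the model (the data values are hypothetical):

import numpy as np
from scipy.stats import norm

def pp_points(data):
    # Empirical probabilities p_i = (i - 0.5)/n for the ordered data, paired with
    # the fitted-model cumulative probabilities evaluated at those data values.
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    empirical = (np.arange(1, n + 1) - 0.5) / n
    model = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))
    return empirical, model

emp, mod = pp_points([3.1, 2.7, 3.5, 2.9, 3.3, 3.0, 2.8, 3.6, 3.2, 3.4])
print(np.column_stack([emp, mod]))   # points near the 1:1 line suggest a good fit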

Percolation cluster Given a grid of sites in 2 or 3 dimensions, for which the probability that a site is permeable is specified, there will be a sudden onset of flow through the grid at a critical value of the probability (percolation threshold) as the result of the creation of a continuous path of nearest neighbours from one side of the grid to the other. Percolation theory began with a model of the deterministic propagation of a fluid through a random medium by the British statistician and expert on advertising evaluation, Simon Ralph Broadbent (1928–2002) and mathematician, John Michael Hammersley (1920–2004) (Broadbent and Hammersley 1957); the study of percolation clusters was subsequently taken up by the German physicist, Dietrich Stauffer (1943–) (Stauffer 1976, 1985), the German-born American mathematician, Harry Kesten (1931–) (Kesten 1982), and others. In the earth sciences it is discussed by Bebbington et al. (1990), Turcotte (1997) and Hunt et al. (2014).

Period The time taken for one complete cycle of oscillation of a time series, i.e. the time interval between successive similar points (e.g. troughs or peaks) on two adjacent cycles of a periodic waveform. It is equal to both the reciprocal of frequency and the ratio of wavelength to phase velocity. In a time series, the times between successive peaks/troughs may well vary and it may be resolvable into a number of periods. The term was used in acoustics by the British mathematician, Brooke Taylor (1685–1731) (Taylor 1713), by the Belgian mathematician, Alexis Perrey (1807–1882) in an analysis of the monthly frequency of occurrence of European earthquakes (Perrey 1844); and by the English mathematician and seismologist, Charles Davison (1858–1940) (Davison 1893); it was generally popularised by Thomson and Tait (1867). See also: Camina and Janacek (1984), Weedon (2003), Gubbins (2004); harmonic motion.

Period-doubling, period doubling, period-doubling bifurcation A sequence of periodic oscillations in a dynamical system in which the period doubles as a parameter is varied. The phenomenon was investigated by the American mathematician, Mitchell Jay Feigenbaum (1944–) and led to the discovery of the Feigenbaum constant (Feigenbaum 1979, 1980). A period-doubling bifurcation is a local bifurcation in a discrete dynamical system at which the basic period of a limit cycle doubles. Turcotte (1992) discusses these concepts in an earth science context. Both hyphenated and unhyphenated spellings seem to be used with equal frequency (Google Research 2012).


Periodic, periodic function, periodic process, periodicity A function of time, f(t), is periodic (Thomson and Tait 1867), with period T, if, for very many (or infinite) successive integer values k, f(t) = f(t + kT); the oscillations in this waveform have constant wavelength. The periodicity of earthquakes was referred to by the pioneer English seismologist, John Milne (1850–1913) in Milne (1882) and by his colleague, the British physicist, Cargill Gilston Knott (1856–1922) in an early statistical analysis of earthquake frequency (Knott 1884 [1886]). For discussion in an earth science context, see also: Schuster (1897, 1898), Oldham (1901), Knott (1908), Jeffreys (1924), Davison (1932), Chapman and Bartels (1940), Buttkus (1991, 2000), Weedon (2003), Gubbins (2004). See also: quasi-periodic, angular frequency.

Periodic noise The occurrence of noise in a signal which has a known period. Anderssen and Seneta (1971) discuss an example from the analysis of geomagnetic disturbances.


Periodic trajectory These are special cases of quasi-periodic trajectories in phase space. Any trajectory passing through a point through which it has previously passed must continue to repeat its past behaviour and so must be periodic. This concept was first applied to the behaviour of nonlinear dynamical systems by the American meteorologist, Edward Norton Lorenz (1917–2008) (Lorenz 1963) and is discussed in Turcotte (1992).

Periodogram The Fourier analysis of a time series of n equally-spaced observations {x0, x1, x2, …, xn−1} is its decomposition into a sum of sinusoidal components, the coefficients of which {J0, …, Jn−1} form the discrete Fourier transform of the series, where

Jj = (1/n) Σt=0..n−1 x(t) e^(−iωj·t),

where i is the imaginary unit √(−1) and ωj is the j-th Fourier frequency. In terms of magnitude A and phase φ, Jj = Aj·e^(iφj). The periodogram, a term coined by the German-born British mathematician and physicist, (Sir) Arthur Schuster (1851–1934) (Schuster 1898), is defined as I(ωj) = (n/2π)|Jj|², but he subsequently modified this definition in Schuster (1900, 1906). In practice, it is computed as

A(ω)² = [(2/n) Σt=0..n−1 (xt − m) cos ωt]² + [(2/n) Σt=0..n−1 (xt − m) sin ωt]²,

where m is the mean of the data series, and it is often displayed as a graph of log10 A(ω)² as a function of frequencies ωj = 2πj/n, where j = 1, …, (n/2) + 1. It was recast in the context of spectrum analysis by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1926, 1930); see also: Whittaker and Robinson (1932), Wiener (1942, 1949), Bartlett (1950). For discussion in an earth science context, see: Schuster (1897, 1898, 1900), Jeffreys (1924), Anderson and Koopmans (1963), Buttkus (1991, 2000), Weedon (2003).
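A minimal Python sketch of the practical computation given above (the series length, period and noise level are hypothetical):

import numpy as np

def periodogram(x):
    # A(omega)^2 at the Fourier frequencies omega_j = 2*pi*j/n, from the cosine
    # and sine transforms of the mean-corrected series, as in the formula above.
    x = np.asarray(x, dtype=float)
    n = x.size
    m = x.mean()
    t = np.arange(n)
    omegas = 2.0 * np.pi * np.arange(1, n // 2 + 1) / n
    a = np.array([(2.0 / n) * np.sum((x - m) * np.cos(w * t)) for w in omegas])
    b = np.array([(2.0 / n) * np.sum((x - m) * np.sin(w * t)) for w in omegas])
    return omegas, a ** 2 + b ** 2

# Hypothetical series with an 8-sample period plus noise
rng = np.random.default_rng(0)
t = np.arange(64)
x = np.sin(2 * np.pi * t / 8) + 0.2 * rng.standard_normal(t.size)
w, power = periodogram(x)
print(w[np.argmax(power)])   # close to 2*pi/8, approximately 0.785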

Permeability A measure of fluid conductivity: the ease with which fluids can pass through a rock. This property of porous media is defined by a reformulation of Darcy's law: the permeability

k = Qμ/[A(dP/dL)],

where Q = volumetric flow rate (cm³ s⁻¹), μ = fluid viscosity (cP), A = sample cross-sectional area (cm²), and (dP/dL) = pressure gradient across the sample (atm cm⁻¹). k is measured in Darcies, named for the French engineer Henry (Henri)-Philibert-Gaspard Darcy (1803–1858), who discovered Darcy's law and named the proportionality constant permeability (Darcy 1856); 1 Darcy = 10⁻⁸ cm²; 1 milliDarcy = 10⁻¹¹ cm². Hence the rate of flow of liquid per unit area is proportional to the pressure gradient. See Fancher et al. (1933) and Archer and Wall (1986) for discussion of measurement techniques. The first investigation of the permeability of an oil sand is believed to be that by the American mining engineer and hydrographer, Frederick Haynes Newell (1862–1932) (Newell 1885). Fraser (1935) experimentally investigated the factors affecting the porosity and permeability of clastic sediments. Early examples of permeability maps are given by Headlee and Joseph (1931); see also Hubbert (1956).

Per mil, per mille, permillage (‰) Expressing a proportion (e.g., of one part to the sum of all the parts) in terms of parts per thousand. While use of the Latin term per mille (in each thousand) goes back to at least the sixteenth Century, the equivalent symbol (‰) began to be used only in the 1920s (Smith 1923–5, v. II).

Permutation Rearranging all the elements of a set into a different order. The term seems to have been introduced by the English mathematician, Thomas Strode (?1626–1688) (Strode 1678). Early use of the term in a geological context occurs in a description of the textural patterns used in geological maps (Anonymous 1890) and crystallography (Peacock 1936).


Permutation test A procedure for determining the statistical significance of a test without knowledge of the sampling distribution. For example, in determining whether there is a statistically significant difference between the value of a statistic observed on (two or more) groups, the data values are repeatedly randomly assigned to the groups so that all possible values of the test statistic may be determined. If the proportion of the permutations which yield a value of the test statistic as large as that associated with the observed data is smaller than some chosen level of significance (α), then the actual test result is significant at the α-level. This test method was introduced by the British-born American chemist and mathematician, George Edward Pelham Box (1919–2013) and the Danish-born American statistician, Sigurd Løkken Andersen (1924–2012) (Box and Andersen 1955) and by the American statistician, Henry Scheffé (1907–1977) (Scheffé 1956). Gordon and Buckland (1996) discuss the use of this type of test in a geological context. It is more usually known as a randomization test; see also Romesburg (1985); Monte Carlo significance test.

Perturbation In mathematics, this means a fundamental operation of compositional change in the simplex considered as a Euclidean space. It allows one to describe the change between two compositions as a new composition in the same space. A term introduced by the Scottish statistician, John Aitchison (1926–); see Aitchison (1984), Buccianti et al. (2006).


Petrofabrics, petrofabric analysis The English translation of the term Gefügekunde, introduced by the Austrian geologist, Bruno Hermann Max Sander (1884–1979) (Sander 1930): the study of the three-dimensional spatial orientation of grains of particular minerals (e.g. mica, quartz) in an oriented rock specimen at the microscopic scale as a means to understanding the nature of the fabric of a rock at a macroscopic scale. Pioneered by the Austrian mineralogist, Walter Schmidt (1885–1945) and Sander during the 1920s and 1930s; and promoted in the English-speaking world by the crystallographer and geologist, Frank Coles Phillips (1902–1982) in England and Australia (Phillips 1937); by the New Zealand-born American geologist, Francis John Turner (1904–1985) (Turner 1938) in New Zealand and North America; and in North America by Harold Williams Fairbairn (1906–1994) (Fairbairn and Chayes 1949). The early studies were based on optical methods using thin-sections; recent work also uses X-ray methods. Prior to the 1950s, lack of adequate understanding of crystal deformation mechanisms led to controversial kinematic interpretation of the results and, in some cases, to erroneous conclusions (Howarth and Leake 2002). Experimental rock-deformation and computer-simulation studies (e.g. Lister and Hobbs 1980) between 1950 and 1980 led to a greatly improved understanding of the mechanisms involved and subsequently improved results.

Petrogenetic modelling The numerical modelling of processes involving fractional crystallization, batch partial melting or mixing in igneous rocks. See Nielsen (1985, 1988), Conrad (1987), Holm (1988, 1990), Cebriá and López-Ruiz (1992), D'Orazio (1993), Spera and Bohrson (2001), Bohrson and Spera (2001), Keskin (2002, 2013), Ersoy and Helvaci (2010) for examples.

Phase, phase difference 1. In time series, phase, or phase difference, is the interval between the turning points of a periodic waveform. If two waves are "in phase" their maxima and minima coincide. It is also the term used for the angle of lag or lead of a sine wave with respect to a reference. The term was introduced by the French physicist Jean-Baptiste-Joseph Fourier (1768–1830) in 1807 (Fourier 1808), popularised by Thomson and Tait (1867), and was in use in wireless telegraphy by 1889. The unhyphenated spelling of phase difference is by far the most widely used (Google Research 2012). See also: Camina and Janacek (1984), Buttkus (1991, 2000), Gubbins (2004); aliasing, biphase, harmonic motion, in-phase, out-of-phase, phase delay, phase map, phase space, phase spectrum, phase-lag spectrum, sinusoid. 2. In chemistry, it is a chemically and physically homogeneous quantity of matter consisting of a single substance, or mixture of substances, which may exist (or coexist) in one of several thermodynamically distinct phases: solid, crystalline, colloid, glass, amorphous, liquid, gas or plasma. If one form transforms into another it is said to undergo a phase change. The relations between various phases are governed by the Phase Rule. See also phase map, phase space.


Phase angle The phase angle of a signal is tan⁻¹(quadrature component/in-phase component). Early uses of the term in geophysics occur in Chapman (1915) and Klotz (1918).

Phase coherence Having the same phase relationship in time between a set of observed waveforms (e.g. Blaik and Donn 1954). The coherence of reflected seismic waves was theoretically investigated by the Scottish physicist, Cargill Gilston Knott (1856–1922) (Knott 1899, 1910).

Phase delay The American mathematician and geophysicist, Enders Anthony Robinson (1930–) and German-born Argentinian-American geophysicist, Sven O. Treitel (1929–) introduced this term as an equivalent to phase when referring to the phase of a filter (Robinson and Treitel 1964). See also Buttkus (1991, 2000).

Phase diagram A graph showing the limits of stability of the various thermodynamically distinct phases which coexist at equilibrium in a chemical system, with respect to variables such as molar composition, temperature and pressure. See Phase Rule.

Phase-lag spectrum The negative of the phase spectrum. It is a more convenient function to work with than the phase spectrum, as positive phase-lag is equivalent to delay, which is usually more convenient to work with than phase, which is equivalent to advance. A waveform a(t) and its frequency spectrum A(f), where t is time and f is frequency (cycles/unit time), are Fourier transform pairs. A(f) is usually a complex-valued function of frequency, extending over all positive and negative frequencies. It may be written in polar form as

A(f) = Σt=0..∞ at·e^(−2πift) = |A(f)|·e^(iφ(f)),

where i is the imaginary unit √(−1), the magnitude |A(f)| is called the amplitude spectrum, and the angle φ(f) is called the phase spectrum; the phase-lag spectrum is θ(f) = −φ(f). Referred to in Robinson (1967b), Buttkus (1991, 2000) and Weedon (2003).
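A minimal Python sketch of obtaining amplitude, phase and phase-lag spectra for a sampled waveform via the discrete Fourier transform (the sample values are hypothetical):

import numpy as np

x = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])   # hypothetical samples a(t)
A = np.fft.rfft(x)                     # complex spectrum A(f) at the non-negative frequencies
amplitude = np.abs(A)                  # |A(f)|, the amplitude spectrum
phase = np.angle(A)                    # phi(f), the phase spectrum (radians)
phase_lag = -phase                     # theta(f) = -phi(f), the phase-lag spectrum
freqs = np.fft.rfftfreq(x.size, d=1.0)
print(np.column_stack([freqs, amplitude, phase, phase_lag]))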

Phase map, phase portrait This is a graph in which each possible state of a dynamical system is represented by a unique point in the phase space of the system, which together form a series of curves. The curve along which the phase point moves is called a phase trajectory. A set of phase trajectories represents the set of all possible configurations of the system and the types of possible motions in the system. The theory was developed by the French mathematical physicist and mathematician, Jules Henri Poincaré (1854–1912) (Poincaré 1881, 1882). The Dutch mathematician, Floris Takens (1940–2010) showed (Takens 1981) that it is not necessary to know a priori all the state variables governing the system under study; it is only necessary to monitor the time evolution of a single variable which depends on one or more of the quantities that actually determine the behaviour of the system. In practice, for an observed time series it is only necessary to plot the values of observations at x(t) as a function of x(t − 1); x(t − 1) versus x(t − 2), using a fixed lag; etc. to reveal it. This is known as Takens' time delay method. In the time derivative method, the values of observations at x(t) are plotted as a function of dx/dt; and dx/dt versus d²x/dt², using a fixed lag. See D. Smith (1994), Turcotte (1997), Weedon (2003). The term phase portrait was used from the early 1950s, followed by phase map or Poincaré map in the early 1970s (Google Research 2012).


1881, 1882). The Dutch mathematician, Floris Takens (1940–2010) showed (Takens 1981) that it is not necessary to know a priori all the state variables governing the system under study—it is only necessary to monitor the time evolution of a single variable which depends on one or more of the quantities that actually determine the behaviour of the system. In practice, for an observed time series it is only necessary to plot the values of observations at x(t) as a function of x(t 1); x(t 1) versus x(t 2), using a fixed lag; etc. to reveal it. This is known as Takens’ time delay method. In the time derivative method, the values of observations at x(t) are plotted as a function of dx/dt; and dx/dt versus d2x/dt2, using a fixed lag. See D. Smith (1994), Turcotte (1997), Weedon (2003). The term phase portrait was used from the early 1950s, followed by phase map or Poincare´ map in the early 1970s (Google Research 2012). Phase Rule This was proposed by the American chemist and physicist, Josiah Willard Gibbs (1839–1903): Assuming that the state of a system is normally governed by its composition and two of the three variables: pressure (P), volume and temperature (T ), if the degrees of freedom of the system ( f ) are the number of such variables which may be varied independently without changing the number of phases present at equilibrium; the number of components (c) is the smallest number of independently variable constituents needed to express the composition of each phase; and p is the number of phases in equilibrium (e.g. for a system consisting of solid, liquid and vapour, p ¼ 3); the Phase Rule states that f ¼ c + 2 p (Gibbs 1876, 1878a, b). Unfortunately, his arguments leading to this were couched in such abstract mathematical terms that at the time there was little interest in his work. Its importance was only realised later as a result of laboratory investigations. It had been known since the seventeenth Century that “wootz” steel (an Anglicization of ukku, the word for the crucible-produced steel made in Southern India since ca. 500 AD) used to make weapons such as sword blades, possessed far superior properties to steels made in the West. The presence of carbon in wootz steel was first demonstrated by the Finnish chemist, Johan Gadolin (1760–1852) in a doctoral thesis (Gadolin 1781) supervised by the Swedish chemist and mineralogist, Torbern Olof Bergman (1735–1784) (Srinivasan and Ranganathan 2004). Subsequently, the English assayer and metallurgist, William Chandler Roberts (1843–1902; known from 1885 as Roberts-Austen), determined the first measured composition-temperature diagram (“melting-point curve”) in terms of molar fraction (x) and temperature [T-x plot] for the Cu-Ag alloy system (Roberts 1875); a similar “freezing-point curve” for Cu-Zn alloys (RobertsAusten 1895) and for the Fe-C system followed (Roberts-Austen 1897, 1899; see also Smith 1914, Kayser and Patterson 1998). Roberts-Austen’s work was subsequently improved on by the Dutch chemist, Hendrik Willem Bakhuis Roozeboom (1854–1907), who from 1886 finally gave clarity to Gibbs’ work in experimental studies of solid-liquidgas systems and used both T-x and pressure-temperature [P-T] plots in his work (Snelders 1993; Wisniak 2003), recognising the presence of both triple and quadruple points. He finally elucidated the behaviour of the Fe-C system (Bakhuis Roozeboom 1900), showing that the constituents involved were: carbon in the form of graphite, pure iron, solid


solutions of carbon and iron, iron carbide and a ferrite-cementite eutectic mixture. He also initiated a long-term systematic study of how the Phase Rule was exemplified in the behaviour of one-component (Bakhuis Roozeboom 1900) and binary systems. This was continued to more complex systems by his former students after his death (Snelders 1993; Wisniak 2003). His ideas began the development of experimental petrology through the work of Norwegian geologist and metallurgist, Johan Herman Lie Vogt (1858–1932); American physicist, Arthur Louis Day (1869–1960) and his colleagues at the Geophysical Laboratory of the Carnegie Institution of Washington (founded in 1905), who included the petrologist Norman Levi Bowen (1887–1956). See for example: Vogt (1903–1904), Day and Shepherd (1906), Day et al. (1906), Bowen (1912, 1915). Examples of computer applications for the calculation of phase diagrams, usually with the aid of a compositional and thermodynamic database, include: Bergman and Brown (1984), Perkins et al. (1986), Niederkorn and Blumenfeld (1989), Connolly and Petrini (2002), Saika-Voivod et al. (2004). A phase-equilibrium algorithm, a numerical method for the study of mineral-melt equilibria (e.g. plagioclase, olivine and clinopyroxene crystallization from basaltic liquids) was introduced by Weaver and Langmuir (1990); see also Danyushevsky (2001). Phase shift, phase-shift The result of adding to, or subtracting from, a phase measurement; unless all components are shifted in proportion to their frequencies, it may result in a change of the wave shape. The term was in frequent use in physics by the 1930s; early usage in geophysics occurs in Jakosky (1938), see also Tukey (1959a). The unhyphenated spelling phase shift is the more widely used (Google Research 2012).
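To make the time-delay method described under phase map above more concrete, the following Python sketch is purely illustrative (the function name, lag value and synthetic series are assumptions, not taken from the works cited): it assembles the delay-coordinate pairs [x(t), x(t − lag)] whose plot traces out a phase portrait.

import numpy as np

def delay_embed(x, lag=1, dim=2):
    # Rows are [x(t), x(t - lag), ..., x(t - (dim - 1)*lag)] for each usable t.
    n = len(x) - (dim - 1) * lag
    cols = [x[(dim - 1 - k) * lag : (dim - 1 - k) * lag + n] for k in range(dim)]
    return np.column_stack(cols)

t = np.linspace(0.0, 20.0 * np.pi, 2000)
x = np.sin(t) + 0.3 * np.sin(3.0 * t)      # synthetic "observed" variable
pairs = delay_embed(x, lag=25, dim=2)      # columns: x(t) and x(t - 25)
# Plotting pairs[:, 0] against pairs[:, 1] traces out the phase portrait.
print(pairs.shape)                         # (1975, 2)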


Phase space, phase-space A coordinate space defined by the state variables of a dynamical system, i.e. each possible state of the system is represented by a unique point in the phase space; e.g. for a single particle moving in one dimension (e.g. a driven damped pendulum), its behaviour with time can be described in terms of two coordinates: position and velocity. The term phase space was first introduced by the French mathematical physicist and mathematician, Jules Henri Poincaré (1854–1912) in a treatment of the solution of differential equations (Poincaré 1881). It was first applied to the behaviour of nonlinear dynamical systems by the American meteorologist, Edward Norton Lorenz (1917–2008) (Lorenz 1963). In many real systems one may not know what all the state variables are. Fortunately, the Dutch mathematician, Floris Takens (1940–2010) showed (Takens 1981) that such knowledge is not, in fact, necessary: see phase map. It is discussed in an earth science context by Turcotte (1997). The unhyphenated spelling phase space is the most frequently used (Google Research 2012). Phase spectrum A waveform a(t) and its frequency spectrum A( f ), where t is time and f is frequency (cycles/unit time), are Fourier transform pairs. A( f ) is usually a complex-valued function of frequency, extending over all positive and negative frequencies. It may be written in polar form as


A( f ) = Σ_(t=0)^∞ a_t e^(−i2πft) = |A( f )| e^(iφ( f )),

where i is the imaginary unit √(−1), the magnitude |A( f )| is called the amplitude spectrum, and the angle φ( f ) is called the phase spectrum. Referred to in Ben-Menahem and Toksöz (1962), Robinson (1967b), Buttkus (1991, 2000), Weedon (2003), Gubbins (2004). See also: phase-lag spectrum, autospectrum. Phase splitting 1. The separation of a trough (or peak) in a seismic waveform into two or more troughs (peaks) on adjacent traces (Sheriff 1984). 2. The splitting of earthquake-related SKS or S shear waves in the upper mantle as a result of the preferred orientation of crystallographic axes of elastically anisotropic minerals such as olivine producing polarization of the fast component (e.g. Liu et al. 1995). The unhyphenated spelling phase splitting has become slightly more frequent since the 1970s (Google Research 2012). Phase velocity, phase-velocity The velocity with which any given phase (e.g. a seismic wave crest or trough of single frequency) travels in the direction of propagation (Fu 1947a, b; Press and Ewing 1950). The term “velocity of transmission of phase” occurs in the work of the Irish mathematician, physicist, and astronomer, William Rowan Hamilton (1805–1865) (Hamilton 1841). The unhyphenated spelling phase velocity is by far the most widely used (Google Research 2012). Phi curve A name given to the lognormal distribution of sedimentary particle sizes when transformed into the phi scale (Krumbein 1936a, 1938). Phi deviation, phi kurtosis, phi mean diameter, phi median diameter Phi deviation is the standard deviation of sedimentary particle sizes when transformed to the phi scale (Krumbein 1936a). This measure was established by the American geologist, Douglas Lamar Inman (1920–2016) (Inman 1952) as: σϕ = (ϕ84 − ϕ16)/2, where ϕ16 and ϕ84 are the 16th and 84th percentiles. It was later redefined by the American sedimentologist, Robert Louis Folk (1925–) and his student William Cruse Ward (1933–2011) as the Inclusive graphic standard deviation (Folk and Ward 1957) so as to include more of the distribution curve in

the sorting measure: σI = (ϕ84 − ϕ16)/4 + (ϕ95 − ϕ5)/6.6, where ϕ5 and ϕ95 are the 5th and 95th percentiles. This parameter is also referred to as sorting (e.g. by Graf 1993). Folk and Ward (1957) also introduced phi kurtosis, also known as Graphic kurtosis: KG = (ϕ95 − ϕ5)/[2.44(ϕ75 − ϕ25)], which compares the spread of the extremes with that of the central part of the distribution, where ϕ25 is the 25th percentile. The phi mean diameter, the arithmetic mean of sedimentary particles


sizes transformed to the phi scale (Krumbein 1936a), was used by Inman (1952) as a measure of the centre of a sediment size distribution: Mϕ = (ϕ16 + ϕ84)/2, where ϕ16 and ϕ84 are estimated from the cumulative sediment size grade. It was redefined by Folk and Ward (1957), on the basis that Inman’s statistic did not accurately reflect the mean of bimodal distributions nor strongly skewed distributions, as: Mz = (ϕ16 + ϕ50 + ϕ84)/3, where ϕ50 is the median (50th percentile). The phi median diameter was introduced by Inman (1952) as the median sediment size grade measured on the phi scale. However, Folk and Ward (1957) recommended that since it was based on only one point of the cumulative size curve, it provided a very misleading measure and its use should be abandoned. See also Trask sorting coefficient.
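As a minimal sketch, assuming grain sizes already expressed in phi units, the Inman and Folk-and-Ward graphic measures defined above can be computed directly from percentiles; the function name and example data below are illustrative only.

import numpy as np

def graphic_statistics(phi):
    # Inman (1952) and Folk & Ward (1957) graphic measures from phi-scale sizes.
    p5, p16, p25, p50, p75, p84, p95 = np.percentile(phi, [5, 16, 25, 50, 75, 84, 95])
    return {
        "inman_mean":        (p16 + p84) / 2.0,                      # M_phi
        "inman_deviation":   (p84 - p16) / 2.0,                      # sigma_phi
        "graphic_mean":      (p16 + p50 + p84) / 3.0,                # M_z
        "inclusive_sorting": (p84 - p16) / 4.0 + (p95 - p5) / 6.6,   # sigma_I
        "graphic_kurtosis":  (p95 - p5) / (2.44 * (p75 - p25)),      # K_G
    }

rng = np.random.default_rng(0)
print(graphic_statistics(rng.normal(loc=2.5, scale=0.8, size=500)))  # simulated phi values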

Phi scale, phi unit A logarithmic scale of sediment grain-size: ϕ = −log2(d ), where d is the grain-size in mm. It was established by the American mathematical geologist, William Christian Krumbein (1902–1979) (Krumbein 1934a, 1936a) as it transforms the Wentworth grade scale (Wentworth 1922: more than 2 outcomes can occur in each of n trials: P(n1, n2, …, nk) = [n!/(n1! n2! ⋯ nk!)] p1^n1 p2^n2 ⋯ pk^nk, where ni is the number of trials with outcome i and pi is the probability of outcome i occurring on any particular trial. It underpins the statistics of point-counting of mineral constituents in rocks, or counts of species types in the analysis of micropaleontological or pollen abundance data (Miller and Kahn 1962; Mosimann 1965). Introduced by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) (Fisher 1925a, b). See also: modal analysis and closed data. Polynomial function A function which can be defined by evaluating a polynomial.
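The point-counting probability formula quoted above can be evaluated directly; the following sketch is illustrative only, with made-up modal proportions and counts.

from math import factorial

def multinomial_probability(counts, probs):
    # P(n1,...,nk) = [n!/(n1!...nk!)] * p1^n1 * ... * pk^nk
    n = sum(counts)
    coeff = factorial(n)
    for ni in counts:
        coeff //= factorial(ni)
    p = float(coeff)
    for ni, pi in zip(counts, probs):
        p *= pi ** ni
    return p

# e.g. 300 points counted on a thin section with assumed true modal
# proportions 0.6 quartz, 0.3 feldspar, 0.1 mica (illustrative values):
print(multinomial_probability([180, 90, 30], [0.6, 0.3, 0.1]))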


Population An ideal finite or infinite selection of individual “units” such that a sufficiently large finite sample taken from it is hoped to be representative; a collection of objects with probabilities attached to relevant subcollections (Blackman and Tukey 1958). The English scientist and anthropologist, Francis Galton (1822–1911) introduced the concept (Galton 1877, 1889), but it was his colleague the statistician, Karl Pearson (1857–1936) who established it in a firmer statistical setting (Pearson 1903a). It is also occasionally called a universe. For discussion in a geological context, see Krumbein (1960b). See also: target population, sampled population. Porosity The ratio of void volume (pore space) to bulk volume in a rock, usually expressed as a percentage. It is a measure of the fluid capacity of the solid. In 1838, English chemist John Frederick Daniell (1790–1845) and physicist (Sir) Charles Wheatstone (1802–1875) determined the bulk of water absorbed by a well-dried two-inch


cube of rock on over one hundred specimens sampled in the course of an investigation to find, on the basis of a large number of attributes, the most suitable stone for building the “New Houses of Parliament” in London, concluding that “the magnesian limestone of Bolsover Moor is . . . the most fit and proper material to be employed” (Barry et al. 1839). Fraser (1935) experimentally investigated the factors affecting the porosity and permeability of clastic sediments. See Fancher et al. (1933), Archer and Wall (1986), Anovitz and Cole (2015) for discussion of measurement techniques. Positive definite matrix An n by n real symmetric matrix M is positive definite if z^T M z > 0, where z^T is the transpose of z, for all non-zero vectors z with real entries. Early use of the term occurs in Pell (1919) and in a palaeontological context in Reyment (1969a). Positive weight In a weights of evidence model, a positive weight is assigned to a part of a binary map pattern, which is positively correlated with a point pattern (e.g., occurrences of mineral deposits). A negative weight is the corresponding coefficient assigned to the remainder of the map pattern, except that zero weight is assigned to missing data. The sum of positive weights and absolute value of the negative weights is called the contrast. If all map patterns considered in the model are approximately conditionally independent of the point pattern, the contrast is approximately equal to the coefficient obtained by logistic regression of 0–1 data for the point pattern on 0–1 data for the binary map patterns. See Bonham-Carter et al. (1988), Agterberg et al. (1993). Posterior probability Bayesian methods are a class of methods for estimating the probability of occurrence of a set of events. Given a prior frequency distribution of known (or sometimes assumed) functional form for the occurrence of the event, the posterior frequency distribution is given by Bayes’ rule. Computer-intensive simulation methods, such as Markov chain Monte Carlo, may be required to obtain a solution, because of the difficulty of performing the necessary integration in many practical problems. The term Bayesian was first used by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) in 1950. See: Wrinch and Jeffreys (1919) and, in an earth science context: Appendix B in Jeffreys (1924), also: Rendu (1976), Vistelius (1980, 1992), Christakos (1990), Curl (1998), Solow (2001), Rostirolla et al. (2003); Bayesian inversion, Bayesian/maximum-entropy method. Potential field Potential is the amount of work needed to position a unit charge, unit pole, or unit mass at a given position (usually with respect to infinity). The concept of gravitational potential was first introduced by the French mathematician and astronomer Pierre-Simon, Marquis de Laplace (1749–1827), (Laplace 1784). Gravitational, magnetic and electric fields are scalar potential fields, and the gradient of a potential field is called the field strength, field intensity, or flux density. In geophysics, one is usually concerned with: (i) the potential itself; (ii) the potential gradient; (iii) the direction of the field and (iv) the second derivative of the potential, i.e. the field gradient and its direction. In gravitational


prospecting, gravity is the first derivative of the gravity potential with respect to the vertical (Heiland 1940). See also Baranov (1975), Bhattacharyya and Chan (1977): equipotential surface. Potential function Defined by the French mathematician and astronomer, Pierre-Simon, Marquis de Laplace (1749–1827), (Laplace 1784) as the function V, the sum of the masses of the molecules of an attracting body divided by their respective distances from the attracted point: V = ∭ ρ dx dy dz / √[(x − α)^2 + (y − β)^2 + (z − γ)^2], where ρ is the density of the body at the point {x, y, z} and {α, β, γ} are the coordinates of the attracted point, the limits of integration being determined by the form of the attracting mass. Laplace showed that (in polar coordinates) V satisfied a partial differential equation equivalent to: d^2V/dα^2 + d^2V/dβ^2 + d^2V/dγ^2 = 0. This formulation was first given in rectangular coordinates by Laplace (1787 [1789]), but it was first named potential by the British mathematician and physicist, George Green (1793–1841) (Green 1828); and is referred to as potential function throughout Todhunter (1873). See also: harmonic function. Power


1. In time series spectral analysis, power is signal (waveform) amplitude squared. The American communications engineer, Ralph Beebe Blackman (1904–1990) and statistician, John Wilder Tukey (1915–2000), justify their use of the adjective power as in power spectrum, etc. (Blackman and Tukey 1958) in terms of a time-varying voltage across, or current through, a pure resistance of one ohm: the long-term average power dissipated in the resistance will be strictly proportional to the variance of X(t). In the case of the discrete Fourier transform of a time series X(t) of n equi-spaced values, for each possible wavelength, say (n/4), the amplitude can be regarded as being given by the {sum of the individual products of X(t) multiplied by the equivalent values of a cosine wave} multiplied by (2/n); and the same for the corresponding sine wave. The power corresponding to a frequency of 4/n is then given by the sum of the squares of these two amplitudes (Buttkus 2000; Weedon 2003). 2. The power of a hypothesis test in statistics is the probability of rejecting a false null hypothesis (H0) when the alternative hypothesis (H1) is the one which is true. It will


depend on the magnitude of the effect being measured, the chosen level of significance for the test, and the sample size. In the literature it can be expressed either as a proportion or as a percentage. See: power function, Beta test, Chi-squared test, exact Chi-squared test, F-test, Hodges-Ajne test, Kolmogorov-Smirnov test, Kruskal-Wallis test, Kuiper’s test, Mann-Whitney test, Mardia’s uniform scores test, Monte Carlo significance test, one-sided test, randomization test, Rayleigh test, Smirnov test, squared ranks test, Student’s t-test, two-sided test, Van der Waerden test, Watson’s u2 test, Watson-Williams test. Power density, power density spectrum See: power spectrum. Power function, power law 1. A nonlinear function of the form y = ax^b, where a is a constant and b (a real number) is the exponent. It may be linearized by taking logarithms of both sides of the equation, i.e. log y = log(a) + b log(x), which may be plotted as a straight line on log-log scaled paper (Krumbein and Pettijohn 1938; Moeller et al. 1979). The use of the term “power” in this sense goes back at least as far as the work of the Greek mathematician, Diophantus of Alexandria (?201–299 AD) (Smith 1923–1925) although only a very small number of such series were known from the time of Euclid (c. 325–265 BC). The modern form of algebraic notation was introduced by Descartes (1637) (Sanford 1930). In the context of biological growth this type of function is known as an allometric function, following Huxley and Tessier (1936). It may also be called a power law; often the phenomenon under consideration exhibits self-similarity; see: Barenblatt (2003). The unhyphenated spellings power function and power law are by far the most frequent (Google Research 2012). 2. In hypothesis testing there is a parameter being hypothesised about and there are two mutually exclusive alternatives: the null hypothesis (H0) and the alternative hypothesis (H1). Suppose we have a sample X of size n, (xi; i = 1, …, n), drawn from a parent normal distribution whose variance (σ^2) happens to be known, but whose actual mean (μ) is unknown and that H0: μ = c; the alternative is H1: μ > c, where c is some reasonable prior expectation of what the mean might actually be. In making the test, there are two types of error which can occur: a Type I error (0 ≤ α ≤ 1), rejecting H0 when it is in fact true and a Type II error (0 ≤ β ≤ 1), accepting H0 when it is false. The power of a test is the probability that the hypothesis H0 will be rejected. Making a given test provides a value of a real-valued function, the test statistic, based on the data for which the sample size is n. Suppose the test statistic is the observed mean of the sample (x̄), the acceptance region (R0) is the region within the sample space of all possible outcomes of sampling a population X containing those values of x̄ which necessitate accepting H0; and the rejection region (R1) is the region containing those values of x̄ which necessitate rejecting H0 and accepting H1. In this case, the probability density of means of samples of size n drawn from a normal population with mean μ and variance σ^2 is


another normal distribution also with mean μ but having a standard deviation σ/√n. The probability of rejecting H0 (the Type I error probability α when μ = c; the power, 1 − β, otherwise) is, in this case, given by P(x̄ > c) = 1 − Φ[(x̄ − c)/√(σ^2/n)], where Φ is the cumulative standard normal distribution, 0 < Φ(∙) < 1. The power function is a graph of the values of α (expressed either as a proportion or a percentage) as a function of possible values of the parameter whose value is being estimated, in this case c. The theory was developed by the Russian-born American statistician, Jerzy Neyman (1894–1981) and the English statistician, Egon Sharpe Pearson (1895–1980) in Neyman and Pearson (1936, 1938). Discussed in a geological context by Miller and Kahn (1962). Power law spectrum, power-law spectrum An energy spectrum of the form E( f ) ~ f^(−c), for large f (also known as a “1/f^c process”)—a graph of log10(power spectral density) as a function of log10(frequency) will be approximately linear with a slope of −c; e.g. intensity (dB) as a function of frequency (Hz) as occurs in pink noise or brown noise. Shiomi et al. (1997) give examples drawn from seismic P- and S-wave velocities and density. Vaughan et al. (2011) discuss the problems inherent in choice of a first-order autoregressive, AR(1), process as a model for the spectrum in cyclostratigraphy and recommend use of the power law or bending power law as alternatives. Both hyphenated and unhyphenated spellings occur with similar frequency (Google Research 2012). See power spectrum. Power series A polynomial function of the form y = a0 + a1x + a2x^2 + a3x^3 + ⋯, where a0, a1, a2, a3, . . . are constants. Early use of the term is by Forsyth (1893). The unhyphenated spelling is by far the most frequent (Google Research 2012).
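A hedged numerical illustration of sense 2: the sketch below computes the power of a one-sided z-test for a range of true means, with the rejection cutoff fixed by a chosen significance level α; the particular values of c, σ, n and α are assumptions introduced for the example, not part of the definition above.

from statistics import NormalDist

def power_one_sided(mu, c=10.0, sigma=2.0, n=25, alpha=0.05):
    # Power of the one-sided z-test of H0: mu = c against H1: mu > c,
    # i.e. the probability of rejecting H0 when the true mean is mu.
    z = NormalDist()
    se = sigma / n ** 0.5
    x_crit = c + z.inv_cdf(1.0 - alpha) * se   # reject H0 when the sample mean exceeds x_crit
    return 1.0 - z.cdf((x_crit - mu) / se)

for mu in (10.0, 10.5, 11.0, 11.5):
    print(mu, round(power_one_sided(mu), 3))   # rises from alpha (0.05) towards 1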


Power spectrum, power spectral density (PSD) The power spectrum, also known as the power density spectrum, power spectral density, spectral density and power density (Blackman and Tukey 1958; Buttkus 1991, 2000) is a continuous function which is the Fourier transform of the second-order moment sequence E[X(t)X(t + τ)] of a weakly stationary stochastic process X(t) with a zero mean, where E(•) is the expectation operator and τ is the lag. It describes the contribution to the expectation of two Fourier components whose frequencies are the same. The autocovariance function C(τ) is C(τ) = ave{X(t) ∙ X(t + τ)}, where ave is the average value, and the covariance at lag τ is

C(τ) = lim_(T→∞) (1/T) ∫_(−T/2)^(T/2) X(t) ∙ X(t + τ) dt.

If P( f ) is the power spectrum, then

C(τ) = ∫_(−∞)^(∞) P( f ) e^(i2πfτ) df,

where

P( f ) = lim_(T→∞) (1/T) |∫_(−T/2)^(T/2) X(t) e^(−i2πft) dt|^2

and i is the imaginary unit √(−1). P( f )df represents the contribution to the variance from frequencies between f and ( f + df ). Blackman and Tukey (1958) make the physical analogy that if X(t) is thought of as the current through a pure resistance of one ohm, the long-term average power dissipated in the resistance will be strictly proportional to the variance of X(t), hence the basis for the use of the term “power.” Note that

P( f ) = ∫_(−∞)^(∞) C(τ) cos 2πfτ dτ and P( f ) = 2 ∫_(0)^(∞) C(τ) cos 2πfτ dτ;

also

var{X(t)} = ∫_(0)^(∞) 2P( f ) df

(Blackman and Tukey 1958). The integral of the power spectrum, equal to the variance of the sequence, shows the relative contribution made by cycles of each frequency to the overall variance of the series. In practice, squared amplitude (power), or the logarithm of this value, is plotted as a function of frequency (Hz) to give the power spectrum, which is continuous and is independent of the phase of the signal (Blackman and Tukey 1958). The term spectrum was first introduced by the British mathematician, physicist and astronomer, (Sir) Isaac Newton (1643–1727) in his studies of optics (Newton 1704). The term “power in the spectrum” was used by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1926, 1949) and power spectrum was introduced by the American statistician, John Wilder Tukey (1915–2000) (Tukey and Hamming 1949; Tukey 1950) and became widely used following his book with communications engineer, Ralph Beebe Blackman (1904–1990) (Blackman and Tukey 1958) and provision of algorithms for its calculation (Swinnerton-Dyer 1962). However, the first use of this approach appears to have been by the German-born American theoretical physicist, Albert Einstein (1879–1955) (Einstein 1914). For discussion in an earth science context see Anderson and Koopmans (1963), Horton et al. (1964), Camina and Janacek (1984), Buttkus (1991, 2000), Gubbins (2004), Weedon (2003), and Vaughan et al. (2011) who emphasise the importance of the correct choice of model for the spectrum. The unhyphenated forms of spelling: power spectrum, power density spectrum, power spectral density, spectral density and power density are by far the most frequently used (Google Research 2012). See also: amplitude spectrum, power spectral density analysis.
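As an illustration of the ideas above, the following sketch computes a basic periodogram estimate of the power spectrum of an evenly sampled series using the discrete Fourier transform; the scaling convention and the synthetic example series are assumptions, and the tapering and smoothing recommended by Blackman and Tukey (1958) are deliberately omitted.

import numpy as np

def power_spectrum(x, dt=1.0):
    # Basic periodogram: squared amplitude of the discrete Fourier transform.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                    # remove the mean (zero-frequency component)
    n = len(x)
    X = np.fft.rfft(x)
    freq = np.fft.rfftfreq(n, d=dt)     # 0 ... Nyquist frequency 1/(2*dt)
    power = (np.abs(X) ** 2) * dt / n   # one possible scaling convention
    return freq, power

t = np.arange(1000)
rng = np.random.default_rng(1)
x = np.sin(2.0 * np.pi * 0.05 * t) + 0.5 * rng.normal(size=1000)
freq, power = power_spectrum(x)
print(freq[np.argmax(power)])           # ~0.05, the dominant frequency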


Power spectral density analysis, power spectrum analysis A method of statistical analysis applied to a time series, so as to account for its behaviour in terms of a mixture of contributions of signals of different frequency. In the case of a periodic function (in which the data repeat themselves indefinitely, both forwards from the end and backwards from the beginning), the spectrum will be a discrete distribution made up of a finite number of frequencies; in the non-periodic case, the frequencies are continuous, but with varying amplitude. Virtually all earth science data are non-periodic in nature. The process of fitting a power spectrum is analogous to performing a multiple regression involving trigonometric (sine, cosine) transformations of the explanatory variable. Any major monotone long-term trend in the data should be removed prior to estimation of the spectrum of the shorter-term oscillations. The method of power spectrum analysis (power spectral density analysis) was initiated by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1942, 1949) and developed by the statistician, John Wilder Tukey (1915–2000), who in his later writings always recommended using the term spectrum analysis in preference to spectral analysis. The method was widely taken up following publication of his book with the communications engineer, Ralph Beebe Blackman (1904–1990) (Blackman and Tukey 1958). Examples of its application in the earth sciences include: Wadsworth et al. (1953), Jones and Morrison (1954), Smith (1956), Anderson and Koopmans (1963), Williams (1989), Buttkus (1991, 2000) and Yang and Kouwe (1995). See also: Walsh spectrum analysis. Power transfer function The function expressing the ratio of output power near a given frequency to the input power near that frequency (Blackman and Tukey 1958). Power transformation, power transform A general method of transformation of a skewed (asymmetric) frequency distribution into one which is more symmetric, for the purposes of statistical analysis: x* = (x^λ − 1)/λ when λ is non-zero and x* = loge(x) when λ = 0. In practice, the value of λ is determined empirically such that it minimises one or


more measures of the asymmetry of the distribution (e.g. skewness). Introduced by the British-born American chemist and mathematician, George Edward Pelham Box (1919–2013) and statistician, (Sir) David Roxbee Cox (1924–) (Box and Cox 1964); hence it is also known as the Box-Cox transform. However, power transformation is the most frequently used term (Google Research 2012). Geological applications include: Howarth and Earle (1979), Joseph and Bhaumik (1997) and Stanley (2006a, b). Powering An auxiliary operation of change in the simplex considered as a Euclidean space, the complement of the perturbation operation. A term introduced by the Scottish statistician, John Aitchison (1926–); see Aitchison (1986, 2003), Buccianti et al. (2006). Precision The closeness of agreement between independent test results obtained under stipulated conditions. It is generally expressed in terms of the standard deviation (s) of the test results (±2s is often used in practice, but the multiplier should always be stated). Precision depends only on the magnitude of the random errors present and does not relate to the true (or specified) value. It may be determined under repeatability or reproducibility conditions; it should always be made clear which applies. A satisfactory measure of precision does not imply the data are necessarily accurate. See Thompson (1988), Analytical Methods Committee (2002, 2003), Reimann et al. (2008), Thomson and Coles (2011). See also: bias, accuracy, inaccuracy, Thompson-Howarth error analysis. Prediction error The difference between a value predicted on the basis of past data (or a model) and the value actually observed (Buttkus 1991, 2000). See also: prediction error filter, prediction interval. Prediction error filter A filter which minimises some function of the observed prediction errors; e.g. in processing seismic data, where the predictable part of the signal, such as source wavelet and multiples are removed, leaving the unpredictable part of the signal, which may include reflections of interest. See: Ott and Meder (1972), Mendel (1977), Hildebrand (1981), Buttkus (1991, 2000). See also: Kálmán filter. Prediction interval This is a statistical interval, based on data from a past sample of observations, which is expected to contain with a stated degree of confidence, the next one or more randomly selected observations from the population. It is based on the fundamental assumption that previous and future observations can be regarded as random samples from the same population and that the underlying process is not changing with time (Hahn and Meeker 1991). For example, if levels of a contaminant in a water supply sampled monthly over the past 5 years show no trend with time, and there have been two occurrences of it exceeding a given threshold, then one can be 95% confident that the number of exceedances in the next half year will be less than or equal to 3. Gibbons (1994) discusses the use of prediction intervals in groundwater monitoring. See Helsel (2005) for discussion of treatment of geochemical data containing nondetects.
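A minimal sketch of the power (Box-Cox) transformation described above, choosing λ empirically by minimising the absolute skewness over a trial grid; the grid, the skewness criterion and the synthetic data are illustrative assumptions rather than the only possible choices.

import numpy as np

def skewness(x):
    x = np.asarray(x, dtype=float)
    return np.mean(((x - x.mean()) / x.std()) ** 3)

def box_cox(x, lam):
    # Power transform: (x**lam - 1)/lam, or ln(x) in the limit lam -> 0.
    x = np.asarray(x, dtype=float)
    return np.log(x) if abs(lam) < 1e-8 else (x ** lam - 1.0) / lam

def best_lambda(x, grid=np.linspace(-2.0, 2.0, 81)):
    # Choose lambda empirically so that the transformed data are least skewed.
    return min(grid, key=lambda lam: abs(skewness(box_cox(x, lam))))

rng = np.random.default_rng(2)
z = np.exp(rng.normal(1.0, 0.7, size=400))    # positively skewed, lognormal-like values
print(round(float(best_lambda(z)), 2))        # typically near 0, i.e. close to a log transform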


Predictive deconvolution Deconvolution is a process designed to restore a waveform to the shape it had before being affected by some filtering action. The assumption is that a seismic trace consists of a series of reflection events convolved with a wavelet (whose shape depends on the shape of the pressure pulse created by the seismic source, reverberations and ghost reflections in the near-surface, the response of any filters involved in the data acquisition, and the effects of intrinsic attenuation), plus unrelated noise. The deconvolution process designs an inverse filter which compresses the wavelet and enhances the resolution of the seismic data (Dragoset 2005). In practice it may involve the following steps: (i) system deconvolution, to remove the filtering effect of the recording system; (ii) dereverberation or deringing, to remove the filtering action of a water layer (if present); (iii) predictive deconvolution, to attenuate the multiples which involve the surface or near-surface reflectors; (iv) deghosting, to remove the effects of energy which leaves the seismic source directly upwards; (v) whitening or equalizing to make all frequency components within a band-pass equal in amplitude; (vi) shaping the amplitude/frequency and/or phase response to match that of adjacent channels; and (vii) determination of the basic wavelet shape (Sheriff 1984). The method was introduced by the American mathematician and geophysicist, Enders Anthony Robinson (1930–) in 1951 during study for his Massachusetts Institute of Technology PhD thesis (1954, 1967a). See also: Robinson (1967a, b), Camina and Janacek (1984), Sheriff (1984), Buttkus (1991, 2000), Gubbins (2004); adaptive deconvolution, convolution, deterministic deconvolution, dynamic deconvolution, homomorphic deconvolution, inverse filtering, minimum entropy deconvolution, statistical deconvolution. Predictive decomposition An early term (Robinson 1954) for predictive deconvolution. It had been largely replaced by the latter by the late 1960s (Robinson 1967b; Google Research 2012).
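The prediction error filter and predictive deconvolution entries above both rest on removing the predictable part of a signal; the following is a hedged least-squares sketch of that idea only (not the production seismic algorithms cited): a filter predicting each sample from its m predecessors is fitted, and the residual series retains the unpredictable part.

import numpy as np

def prediction_error(x, m=10):
    # Fit an m-term linear predictor of x[t] from x[t-1..t-m] by least squares
    # and return the prediction-error series (practical implementations
    # usually work from the autocorrelation via Levinson recursion instead).
    x = np.asarray(x, dtype=float)
    A = np.column_stack([x[m - k - 1 : len(x) - k - 1] for k in range(m)])  # lagged values
    b = x[m:]
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return b - A @ coeffs                     # the unpredictable part

rng = np.random.default_rng(3)
noise = rng.normal(size=500)
x = np.convolve(noise, [1.0, 0.7, 0.3], mode="same")   # "predictable" colouring of white noise
err = prediction_error(x, m=5)
print(round(x.var(), 2), round(err.var(), 2))          # the error variance is smaller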


Preemphasis, pre-emphasis Emphasis of certain frequencies, in comparison to others, before processing a signal, as an aid to the quality of the result. The term was introduced by the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990) (Blackman and Tukey 1958). See Knapp and Steeples (1986). The unhyphenated spelling preemphasis has become slightly more frequent since the 1970s (Google Research 2012). Preferred orientation The dominant direction of the orientations of a set of fabric elements represented as unit vectors radiating outwards from the centre of a circle in two dimensions or from the centre of a sphere in three dimensions. This phenomenon may be exhibited by crystallites, mineral grains, pebbles, ripple marks on a bedding plane etc. Examples of early macroscale studies are those of Ruedemann (1897), Krumbein (1939), Phillips (1938), Ingerson (1938), Balk (1948) and Crowell (1955). See also Stauffer


(1983), Fisher (1993); and petrofabric analysis for studies of preferred orientation at a microscopic scale. Prewhitening, pre-whitening Preprocessing of a signal to make it more like a series of independent, identically distributed, values; this makes the spectral density of the signal being processed more nearly constant (the spectrum more nearly flat), thereby avoiding difficulty with minor lobes in the spectral window. The term was introduced by the Russian-born American statistician, Harry Press (1921–; fl. 1970) and American statistician, John Wilder Tukey (1915–2000) in Press and Tukey (1956). The unhyphenated spelling prewhitening has remained by far the most widely used since (Google Research 2012). See also: Blackman and Tukey (1958), Buttkus (2000), Weedon (2003), Gubbins (2004); whitening. Primitive 1. In differential calculus, if F(x) is a continuous function such that (d/dx)F(x) = f(x) everywhere in the domain of definition of f, F(x) is said to be a primitive for f, e.g. since (d/dz)ln(z) = 1/z, ln(z) is a primitive for 1/z. Any other function which has the same derivative would also be a primitive. The term was used by Lagrange (1797). 2. In a stereographic projection, the limiting circle on the plane of projection, the circumference of the projection, is called the primitive both in general usage and in crystallography (Farrar 1822; Phillips 1954).
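A minimal sketch of prewhitening by removal of an estimated AR(1) (red-noise) component; this is one common recipe, not necessarily that of Press and Tukey (1956), and the example series is synthetic.

import numpy as np

def prewhiten_ar1(x):
    # Estimate the lag-1 autocorrelation r1 and return the residual series
    # x[t] - r1*x[t-1], which has a flatter (more nearly white) spectrum.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    r1 = np.sum(x[1:] * x[:-1]) / np.sum(x * x)
    return x[1:] - r1 * x[:-1], r1

rng = np.random.default_rng(4)
red = np.cumsum(rng.normal(size=300)) * 0.1 + rng.normal(size=300)   # red-noise-like series
white, r1 = prewhiten_ar1(red)
print(round(r1, 2))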

Principal alias When a continuous signal has been sampled at discrete intervals Δt, so as to be able to reconstruct the signal perfectly from the sampled version, the sampling frequency must be greater than twice the highest frequency present in the signal being sampled. The principal alias is a frequency lying between the Nyquist frequency λ = 1/(2Δt) and 2λ (Bakun and Eisenberg 1970). All oscillations of frequency f > λ show up in the spectral analysis as low-frequency aliases in the 0 to λ spectrum. An oscillation of frequency f within that spectral range cannot be distinguished from oscillations of higher frequencies, given by (n/Δt) ± f, where n is a positive integer. This aliasing should, if possible, be removed by filtering to remove all power at frequencies exceeding λ (Tukey and Hamming 1949; Blackman and Tukey 1958). The aliased frequency ( fa) of a sinusoidal oscillation is given by fa = |f − (1/Δt)⌊fΔt + 0.5⌋|, where ⌊∙⌋ returns the largest integer less than or equal to its argument (Jacobs et al. 1992). For discussion in the context of exploration seismology see Costain and Çoruh (2004). See also: alias. Principal axes A set of orthogonal lines in Euclidean space generalising the axes of an ellipsoid (Price 1862). Their computation was discussed in the context of factor analysis by Harman (1960). In structural geology they are given by the eigenvectors in terms of


direction cosines referring to the original three-dimensional coordinate axes (Loudon 1964).
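The aliased-frequency expression quoted under principal alias above can be evaluated directly; the sampling interval and test frequencies in the short sketch below are made-up values for illustration.

from math import floor

def aliased_frequency(f, dt):
    # Frequency at which an oscillation of true frequency f appears after
    # sampling at interval dt (cf. the expression under principal alias).
    return abs(f - (1.0 / dt) * floor(f * dt + 0.5))

# Sampling every 0.1 s (Nyquist frequency 5 Hz): a 9 Hz signal aliases to 1 Hz.
print(aliased_frequency(9.0, 0.1))   # 1.0
print(aliased_frequency(4.0, 0.1))   # 4.0 (below the Nyquist frequency, unchanged)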


Principal Components Analysis (PCA) A multivariate technique in which the dispersion of a set of n points (i.e. objects represented by a set of measurements) in p-dimensional measurement space is described by introducing a new set of orthogonal linear axes, passing through the multivariate mean of the original data set. These new axes are called the principal components (PCs) (Hotelling 1933; Johnson and Wichern 1982) and are by definition uncorrelated. The algorithm ensures that the variance of the coordinates corresponding to the projections of the points onto PC1 is greater than that on any of the other axes; PC2 again has maximum variance subject to it being orthogonal to PC1, and so on. By definition, there will be p PCs altogether, but most of the variability of the data will be represented (“explained”) by the first few. If data-compression is the aim, then the analysis is based on the covariance matrix; if data-understanding is the aim then, in geological work, the data set is standardized and the analysis based on the correlation matrix. In the latter case, a matrix of correlations between the PCs and the original set of variables (called the loadings matrix) is often used to interpret the nature of a causative scheme underlying the original measurement set, although this is not implicit in the model. The coordinates of the points projected onto the PCs are called scores. Interpretation is generally based on the first few PCs (although the last two may be useful for identifying multivariate outliers). Rotation of the set of orthogonal PCs into positions nearer to the ends of the original vectors (or making them non-orthogonal, which corresponds to allowing a certain amount of correlation between them) can be used to increase “interpretability” of the solution; this is known as factor analysis. For a reliable interpretation of the meaning of the PCs in terms of the original variables, n should be at least 3 times, and preferably 10 or more times, larger than p. Sarma et al. (2008) discuss the use of Kernel PCA (Schölkopf et al. 1998; Schölkopf and Smola 2002) for dealing with problems which would otherwise require determining the eigenvalues of large covariance matrices. Strictly speaking, principal components analysis of compositional data sets requires logratio transformation (Aitchison 1986, 2003; Buccianti et al. 2006). Devlin et al. (1981) and Zhou (1989) discuss the robust estimation of principal components. Henrion et al. (1992) show how principal components analysis may be extended to data sets involving time as an additional dimension. Earth science applications include Santisteban and Munoz (1978), Reyment (1991), Done et al. (1991), Hohn (1993), Brown (1998). Principal diagonal The elements xii of the diagonal running from top left to bottom right of a square matrix X, i.e. the elements x11, x22, …, xnn.


The term was used by the English mathematician, James Joseph Sylvester (1814–1897) in discussion of matrices in 1883. It is mentioned in a geological context in Parks (1966). Principal direction curves A fold already present in a rock is represented in the deformation ellipsoid by a bundle of planes intersecting in a common line: the fold axis. The loci on the stereographic projection sphere of principal directions for such a bundle cutting the ellipsoid at any angle are, in general, three curves, called the principal direction curves. Introduced by the British structural geologist, Derek Flinn (1922–2012) (Flinn 1962). Principal finite strain In structural geology, in two dimensions the ellipticity or strain ratio (R) of a finite strain ellipse with major and minor semi-axes (1 + e1) and (1 + e2), where e1 and e2 are the principal finite extensions (also called principal finite strains), is R = (1 + e1)/(1 + e2). In three dimensions they are (1 + e1) ≥ (1 + e2) ≥ (1 + e3). The three planes XY, YZ and ZX are the principal planes of finite strain and the strain ratios are: Rxy = (1 + e1)/(1 + e2), Ryz = (1 + e2)/(1 + e3), Rzx = (1 + e1)/(1 + e3). See Ramsay (1967), Ramsay and Huber (1983); strain ellipsoid. The term principle strain appears in a discussion of elasticity by the British physicist, William Thomson, Lord Kelvin (1824–1907), (Thompson 1856). Principal finite extension, principal quadratic extension The principal quadratic extensions of a finite strain ellipse with major and minor semi-axes (1 + e1) and (1 + e2), where e1 and e2 are the principal finite extensions, are λ1 = (1 + e1)^2 and λ2 = (1 + e2)^2; see Ramsay (1967), Ramsay and Huber (1983). The term principal extension was used by the British mathematician and geophysicist, Augustus Edward Hough Love (1863–1940) (Love 1906). See also: strain ellipsoid. Principle of statistical modelling on coordinates For sample spaces with a Euclidean vector space structure, the application of standard statistical techniques to the coefficients with respect to an orthonormal basis (Pawlowsky-Glahn 2003). Examples: the usual n-dimensional real space ℝ^n with coordinates corresponding to the raw observations; the D-part simplex S^D with logratio coordinates, as described in Egozcue et al. (2003); the positive real line R+ with coordinates ln(x), the unit square (0, 1) × (0, 1) with coordinates as the logistic-transforms. See also Buccianti et al. (2006).
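A hedged sketch of correlation-matrix principal components analysis as outlined above, via eigendecomposition of the correlation (or covariance) matrix; the synthetic three-variable data set and the function name are illustrative assumptions, not a reference implementation.

import numpy as np

def pca(data, standardize=True):
    # Principal components via eigendecomposition; returns eigenvalues,
    # loadings (eigenvectors) and scores (projected coordinates).
    X = np.asarray(data, dtype=float)
    X = X - X.mean(axis=0)
    if standardize:                       # correlation-matrix PCA
        X = X / X.std(axis=0, ddof=1)
    C = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(C)    # returned in ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = X @ eigvec
    return eigval, eigvec, scores

rng = np.random.default_rng(5)
a = rng.normal(size=200)
data = np.column_stack([a + 0.1 * rng.normal(size=200),
                        2 * a + 0.2 * rng.normal(size=200),
                        rng.normal(size=200)])
eigval, loadings, scores = pca(data)
print(np.round(eigval / eigval.sum(), 2))   # proportion of total variance per PC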


Principle value (of Cauchy) The principal value of a definite integral over an integrand with a singularity at c, a < c < b, is obtained by dividing the integral into two parts and evaluating it:

∫_a^b f(x)dx = lim_(ε→0, ε>0) [∫_a^(c−ε) f(x)dx + ∫_(c+ε)^b f(x)dx].

Replacing ε by μτ in the first integral and by vτ in the second, where μ and v are two arbitrary and undetermined constants and τ represents an indefinitely small quantity approaching zero (so that neither part-integral contains the actual point at which the original integral becomes infinite or discontinuous), then following integration, replacing τ by 0 will yield the desired result. For example:

∫_0^π dx/(a + b cos x) = ∫_0^(α−μτ) dx/(a + b cos x) + ∫_(α+vτ)^π dx/(a + b cos x),

where α is the point at which a + b cos x vanishes. If a > b then

∫_0^π dx/(a + b cos x) = [2/√(a^2 − b^2)] [tan^(−1){√((a − b)/(a + b)) tan(x/2)}]_0^π = π/√(a^2 − b^2);

if a < b then

∫_0^π dx/(a + b cos x) = [1/√(b^2 − a^2)] {[log{sin((α + x)/2)/sin((α − x)/2)}]_0^(α−μτ) + [log{sin((x + α)/2)/sin((x − α)/2)}]_(α+vτ)^π}

= [1/√(b^2 − a^2)] log{[sin(α − μτ/2) sin(vτ/2)]/[sin(μτ/2) sin(α + vτ/2)]} = [1/√(b^2 − a^2)] log(v/μ).

The value of this integral is indeterminate because the values of the constants v and μ are undefined; it is known as a general definite integral. However, by setting these arbitrary constants to μ = v = 1, then the integral takes the form of a definite integral, which the French mathematician, (Baron) Augustin-Louis Cauchy (1789–1857) (Cauchy 1825, 1827) called the principal value of the definite integral. In this case log(v/μ) = 0, and so the principal value is 0. If a = b then

∫_0^π dx/[a(1 + cos x)] = (1/2a) ∫_0^π sec^2(x/2) dx = ∞;

hence ∫_0^π dx/(a + b cos x) is a discontinuous function, equalling [1/√(b^2 − a^2)] log(v/μ), ∞, or π/√(a^2 − b^2), depending on whether a is less than, equal to, or greater than b (Price 1865). If a function x(t) can be expressed as the Fourier integral

x(t) = ∫_(−∞)^(∞) X( f ) e^(i2πft) df, where X( f ) = ∫_(−∞)^(∞) x(t) e^(−i2πft) dt,

X( f ) is a representation of x(t) in the frequency domain, and i is the imaginary unit √(−1). They are related by the Fourier transform x(t) → X( f ) and the inverse transform X( f ) → x(t). Consider a signal consisting of a single rectangular pulse with a half-width in time of E/2: then x(t) = 1 when |t| ≤ E/2 and 0 otherwise. The principal value of the Fourier integral is:

lim_(a→∞) ∫_(−a)^(a) X( f ) e^(i2πft) df = 1 when |t| < E/2; 0.5 when |t| = E/2; and 0 when |t| > E/2.

It is mentioned in an earth science context in Buttkus (1991, 2000).

Prior information The evidence available about the occurrence of an event up to the time a particular evaluation or decision has to be made. It may provide quantitative information which can improve the way a ground survey is to be carried out (McCammon 1975b) or even determine whether it is worth undertaking at all; it may influence the way data are to be evaluated (Smith 1968; Caterina et al. 2014); the selection of a geochemical threshold (Garrett and Goss 1980a, b); enable an estimate of the prior probability of an event or


events taking place to be made (Weiss and Marshall 1999); or improve the specification of a numerical model (Cooley 1983; Scales and Tenorio 2001). An early use of the term was by the Argentinian-born, British-educated American mathematician, Dorothy Wrinch (1894–1976) and the British mathematician, mathematical astronomer and geophysicist, (Sir) Harold Jeffreys (1891–1989) in a paper on the nature of probability (Wrinch and Jeffreys 1919). See: prior probability, Bayes’ rule. Prior probability Bayesian methods are a class of methods for estimating the probability of occurrence of a set of events making best use of prior information. Given a prior frequency distribution of known (or sometimes assumed) functional form for the occurrence of the event, the posterior frequency distribution is given by Bayes’ rule, named after the English philosopher and mathematician, Thomas Bayes (1702–1761), expressed in modern notation as: p(S|X) = [p(X|S)p(S)]/{[p(X|S1)p(S1)] + [p(X|S2)p(S2)] + ⋯ + [p(X|Sn)p(Sn)]}, where p(S|X) is the posterior probability distribution of a given state (or model parameters) S occurring, given a vector of observations, X, and S1, …, Sn are the possible states; p(S) is the prior probability distribution; and p(X|S) is the likelihood. Computer-intensive simulation methods, such as Markov chain Monte Carlo, may be required to obtain a solution because of the difficulty of performing the integration in the denominator in many practical problems. The term Bayesian was first used by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) (Fisher 1950). See: Wrinch and Jeffreys (1919) and, in an earth science context: Jeffreys (1924, Appendix B), also: Rendu (1976), Vistelius (1980), Christakos (1990), Curl (1998), Solow (2001), Rostirolla et al. (2003); Bayesian inversion, Bayesian/maximum-entropy method.
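A minimal sketch of Bayes’ rule for a discrete set of states, as in the prior and posterior probability entries above; the states, prior values and likelihoods below are invented purely for illustration.

def posterior(prior, likelihood):
    # Bayes' rule for discrete states: prior and likelihood are dicts keyed
    # by state; returns the normalised posterior probabilities.
    joint = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(joint.values())
    return {s: v / total for s, v in joint.items()}

# Illustrative (made-up) figures: prior chance a prospect is mineralised,
# and the likelihood of the observed geochemical anomaly under each state.
prior = {"mineralised": 0.1, "barren": 0.9}
likelihood = {"mineralised": 0.8, "barren": 0.2}
print(posterior(prior, likelihood))   # posterior probability shifts towards "mineralised"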


Probabilistic Something subject to, or involving, chance variations or uncertainties. For example, a probabilistic model contains a specific random process built into the description of the phenomenon, and the outcome is not entirely predictable. Use of the term appears to date from the 1940s (Google Research 2012); it occurs in a paper by the Japanese statistician, Tosio Kitagawa (1909–1993) (Kitagawa et al. 1942) and early usage in geological literature is in Krumbein (1955b). See also: stochastic, deterministic model, stochastic process model. Probability Defined by the French mathematician, Pierre Simon, Marquis de Laplace (1749–1827), (Laplace 1814) as the ratio of the number of favourable cases to all cases, assuming that the various cases are equally possible. The chance, or degree of belief, P(x), 0 ≤ P(x) ≤ 1, that a stated event will (will not) occur, or a stated criterion is (is not) true etc., under a stated set of conditions; it is a metric for uncertainty. See also odds, Feller (1950); and in the earth sciences Elkins (1940), Krumbein and Graybill (1965), Camina and Janacek (1984), Buttkus (1991, 2000).


Probability density, probability density function (PDF), probability distribution An expression specifying the way in which the probability of a given value of a variable x occurring varies as a function of x. This applies to a conceptual model; observed distributions are described by a frequency distribution. In general, the term probability distribution appears to be more widely used than probability density (Google Research 2012). In the earth sciences, early occurrences of the terms probability distribution and probability density occur in Law (1944) and Massé (1955) respectively. See also Miller and Kahn (1962), Krumbein and Graybill (1965), Buttkus (1991, 2000); additive logistic normal, additive logistic skew-normal, Bernstein, Beta, bimodal, Bingham, binomial, bivariate, broken-line, Burr-Pareto logistic, Cauchy, Chi-squared, continuous, cumulative, Dirichlet, discrete, double-exponential, exponential, extreme value, Fisher, fractal, Gamma, generalized Pareto, geometric, joint, Kent, Laplace, log-geometric, log-hyperbolic, logistic, logistic-normal, log-logistic, lognormal, logskew normal, marginal, mixture, multinomial, multivariate Cauchy, multivariate lognormal, multivariate logskew normal, multivariate normal, multivariate skew-normal, negative binomial, normal, Pareto, Poisson, shifted Pareto, Rosin-Rammler, skew, skew-normal, standard normal, stretched Beta, superposition, triangular, truncated, truncated Pareto, uniform, von Mises, Weibull and Zipf distributions. Probability perturbation method A Bayesian approach to the solution of inverse problems which uses a “pre-posterior” distribution, the probability of the model parameters given some subset of the data, to split the data into linear and “nonlinear” types. The former could be a set of point measurements combined with secondary information whose relationship to the data can be regarded as essentially linear in nature. The latter exhibits a complex multi-point and nonlinear relationship with the model. The method uses fast non-iterative sequential simulation to obtain model realizations. The nonlinear data is matched by perturbing an initial realization using “probability perturbation,” so-called because it consists of perturbing the probability models used to generate the model realization, moving the initial guess closer to matching the nonlinear data, while maintaining the prior model statistics and the conditioning to the linear data (Caers 2003; Caers and Hoffman 2006). Probability plot, probability graph Often used as a visual goodness-of-fit test: A graph of the n observed values of a variable, xi (i = 1, …, n), sorted into order of ascending magnitude (empirical quantiles) (y-axis) as a function of the percentiles of an appropriate theoretical frequency distribution (e.g. the normal distribution) serving as a model, equivalent to the cumulative proportions (i − 0.5)/n or i/(n + 1), which by convention are plotted on the x-axis. These plotting positions are used to allow for the fact that the possible extremes of the sampled distribution are unlikely to have been observed. An exact fit to the model results in a linear graph. If testing for fit to a lognormal distribution is required, a log-scaled y-axis is used to plot the magnitude of the ordered observations. Specially printed arithmetic or logarithmic probability-scaled graph paper was widely used, but


accurate numerical approximations for the quantiles of the normal distribution can now be obtained using standard software and have rendered such graph paper essentially obsolete. Although probability theory had been applied to the study of rainfall events by hydraulic engineers in the 1890s (Binnie 1892; Horton 1896), the first use of an arithmetic probability graph with a horizontal scale corresponding to values of the cumulative normal distribution was made by the American civil and sanitary engineer, Allen Hazen (1869–1930) in 1913, based on values given by Merriman (1903, Appendix Table I), to illustrate “the agreement of flow and storage data with the normal law of error” (Hazen 1914). In subsequent discussion of his paper (ibid., p. 1666) he says that “the experiment was tried of making logarithmic probability paper.” Arithmetic and logarithmic probability graph sheets could subsequently be purchased from the Codex Book Co., New York. Krumbein (in Krumbein and Pettijohn 1938) showed the use of conventional logarithmic probability paper in sedimentological studies, but preferred the use of his phi scale (Krumbein 1934a). Usage in petroleum geology (Preston and van Scoyoc 1964) and geochemistry (Miesch 1967b) followed. The term probability graph was frequently used until the 1970s, when it became overtaken by probability plot, which is now by far the most widely used term (Google Research 2012). See also: Barnett (1975), Harter (1984); percent-percent plot, quantile-quantile plot. Probability sampling A term synonymous with statistical sampling, implying a general class of samples which includes a randomization procedure in the sampling scheme. A formal procedure for selecting one or more samples from a population in such a manner that each individual or sampling unit in the population has an equal known chance of appearing in the randomly selected (statistical) sample (Krumbein and Graybill 1965). The requirement to obtain representative samples has long been known in the mining industry, e.g. Brunton (1895), Hoover (1948); an early reference to selecting random samples occurs in Dresser (1909). Geologists began to become aware of this topic during the 1950s (see the bibliography in Krumbein 1960b); see also Cochran et al. (1954).
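A short sketch of the probability plot construction described above, using the plotting positions (i − 0.5)/n and normal quantiles obtained numerically; the synthetic sample and the use of a fitted straight line to summarise the plot are illustrative choices.

import numpy as np
from statistics import NormalDist

def probability_plot_points(x):
    # Return (theoretical normal quantile, ordered observation) pairs
    # using the plotting positions (i - 0.5)/n.
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    p = (np.arange(1, n + 1) - 0.5) / n
    q = np.array([NormalDist().inv_cdf(pi) for pi in p])
    return q, x    # a near-linear plot of x against q suggests normality

rng = np.random.default_rng(6)
q, ordered = probability_plot_points(rng.normal(10.0, 2.0, size=100))
slope, intercept = np.polyfit(q, ordered, 1)
print(round(slope, 1), round(intercept, 1))   # roughly the standard deviation (2) and mean (10)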


Probability space A probability space (Ω, F, P) is given by the non-empty set Ω, whose elements are the possible outcomes, or states of nature; the set F, whose elements are called events (a set of outcomes for which one can ask their probability of occurrence); and P, the probability (or probability measure) assigning each event a probability between zero and one. The total measure of the probability space P(Ω) = 1. It can be visualised as the volume within a coordinate system in which each coordinate is of unit length and corresponds to the range of possible values [0, 1] of the probability 0 ≤ P(xi) ≤ 1 for each of a set of possible events x1, x2, …, xn, within which all values of feasible joint probabilities of occurrence lie. If the events are all mutually independent, then the coordinates will be orthogonal (Vistelius 1980, 1992). Probable error The probable error (of the mean) is the error which will not be exceeded in more than 50 percent of the observed cases; for a normal distribution it is given by


0.6745 times the standard deviation. The term probable error (wahrscheinliche Fehler) was introduced by the German mathematician and astronomer, Friedrich Bessel (1784–1846) in 1815 and defined in Bessel (1816). It was applied to the sampling of sediments by the American sedimentologist and mathematical geologist, William Christian Krumbein (1902–1979) (Krumbein 1934b). Process-response model This is a conceptual model which attempts to define a set of variables (“processes”) which control a number of “responses.” The classic example is that developed by the American mathematical geologist, William Christian Krumbein (1902–1979), for a beach environment (Krumbein 1963a; Krumbein and Graybill 1965) in which the process elements of the model include: (i) energy factors: waves (height, period, angle of approach), tides (range, diurnal pattern, stage), currents (velocity, direction) and wind on the backshore (velocity, direction); (ii) material factors (mean grain diameter, sorting, mineral composition, moisture content, stratification of the material making up the beach); and (iii) the shore geometry (straight, curved, angle of bottom slope). The response elements are: (i) beach geometry (foreshore slope, width, height of berm, backshore width); and (ii) beach materials (mean grain diameter, grain size sorting, mineral composition, moisture content, stratification). There will also be feedback from the response to process elements, e.g. wave action and water depth directly affect near-shore currents, mean grain size and nearshore bottom slope; but the latter is also affected by mean grain size and shore currents, while itself affecting the wave action (feedback). See also discussion by Whitten (1964). Product-limit estimator Also known as the Kaplan-Meier method. A standard method in medical studies for calculating the summary statistics of right-censored survival data. Named for the American statistician, Edward Lynn Kaplan (1920–2006) and biostatistician Paul Meier (1924–2011) who introduced it (Kaplan and Meier 1958). Chung (1988, 1989a) has applied the method to the lengths of fractures in a granitic pluton where both ends of 1567 fractures can only be observed in 257 cases and it is required to obtain a confidence band for the observed distribution of fracture lengths. Helsel (2005) gives a clear example of the application of the method to left-censored geochemical concentration data. Product-moment correlation coefficient (r) A dimensionless measure of the mutual association between a pair of variables. Unless stated otherwise, this relationship is assumed to be linear and the statistic is taken to be the (Pearson) sample product-moment correlation coefficient for n pairs of (x, y) values:

r = cov(X, Y)/√[var(X)var(Y)],

where cov(X, Y) is the covariance of random variables X and Y which have variances var(X) and var(Y ), i.e.

r = Σ_(i=1)^n [(xi − mx)(yi − my)]/(n sx sy),

where the variables are x and y, with means mx and my and standard deviations sx, sy. In the case of perfect sympathetic variation of x and y, r = 1; a perfect antipathetic variation corresponds to r = −1. So-called rank correlation coefficients (such as the Spearman rho and Kendall tau) are based on the relative rather than absolute magnitudes of the values of the variables and can reflect monotone nonlinear relationships between them. See Raveh (1986) for a review of measures of monotone association. The correlation coefficient is inadequate whenever the sample space is not n-dimensional real space with the usual Euclidean space structure; see also: principle of statistical modelling on coordinates, spurious correlation, closed data and Helsel (2005) for discussion of treatment of geochemical data containing nondetects. Although the British scientist and anthropologist, Francis Galton (1822–1911) was the first to measure “co-relation” (Galton 1888), the formula was previously given by the French naval officer, astronomer and physicist, Auguste Bravais (1811–1863) (Bravais 1846). The “Pearson” formula was introduced by the British statistician, Karl Pearson (1857–1936) (Pearson 1896a). Early use in geology was by the American sedimentologist, Lincoln Dryden (1903–1977) (Dryden 1935). See also: Krumbein and Pettijohn (1938), Miller and Kahn (1962), Gubbins (2004); cophenetic correlation coefficient, correlation matrix. Profile A graph of a measured quantity as a function of horizontal distance (or the corresponding data values), e.g. a topographic profile. A graph of corrected seismic reflection times or depths as a function of horizontal distance was known as a reflection profile (Nettleton 1940).
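The product-moment correlation coefficient defined above can be computed directly from paired data; a brief illustrative sketch follows (the data values are invented).

import numpy as np

def pearson_r(x, y):
    # Product-moment correlation coefficient of paired samples x and y.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd, yd = x - x.mean(), y - y.mean()
    return np.sum(xd * yd) / np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2))

x = np.array([2.1, 3.4, 4.0, 5.2, 6.8, 7.1])
y = np.array([1.9, 3.0, 4.5, 5.0, 6.9, 7.4])
print(round(pearson_r(x, y), 3))    # close to +1 (sympathetic variation)
print(round(pearson_r(x, -y), 3))   # close to -1 (antipathetic variation)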


Program, programming Program is an abbreviation of computer program. Programming is the method of encoding the instructions in a program (note U.S. spelling is conventionally used for this term) enabling a computer to solve a problem by the input of raw data (if required), undertaking the necessary calculations, and output of the results. The initial analysis of the problem would probably have involved developing an algorithm, determining the detailed logical steps involved in the process, often developed diagrammatically in the form of a flowchart (to aid analysis and debugging of the logic) and, finally, embodying the results in a formal computer language to enable its execution on the computer hardware. From 1954, this would have been in the form of a low-level, often machine-specific, “machine language” or assembler code [such as FAP, acronym for FORTRAN Assembly Program, originally developed by David E. Ferguson and Donald P. Moore at the Western Data Processing Centre, University of California, Los Angeles;


Moore (1960)], which enabled translation, by means of a compiler, of the human-originated instructions into the strings of binary bits required for the actual machine operation. The first manual on computer programming (Wilkes et al. 1951) was written for the EDSAC 1 (Electronic Delay Storage Automatic Computer), built at Cambridge in 1946–1949, which was the first stored-program computer. Krumbein and Sloss (1958) give an early example of such a program for the compilation of stratigraphic thickness ratios. However, by the early 1960s high-level “autocodes,” i.e. computer languages such as FORTRAN (acronym for “Formula Translation”), developed for the IBM 704 in early 1957 (McCracken 1963), or ALGOL (acronym for Algorithmic Oriented Language), developed mainly in Europe from 1958 (Dijkstra 1962), enabled easy coding of computational instructions and formats for reading data and outputting the results. This “source code” would be processed by a compiler to produce the “object code” which governed the actual operation of the computer. For early discussion in a geological context, see Koch and Link (1971). Early examples of geological usage include: Whitten (1963), Kaesler et al. (1963), Harbaugh (1964), Link et al. (1964), Fox (1964), Manson and Imbrie (1964), Koch et al. (1972) and Sackin et al. (1965). Successive versions of FORTRAN have continued to be used up to the present time. The interactive general-purpose programming language BASIC was introduced in 1964. Despite the later proliferation of computer packages such as Excel for performing spreadsheet, mathematical and statistical calculations, new special-purpose programming languages, such as S, originally developed by a team at AT&T’s Bell Laboratories (Becker et al. 1988), its successor S-Plus (Venables and Ripley 1994), and a freeware alternative, R (originally developed by Robert Gentleman and Ross Ihaka of the Statistics Department, University of Auckland, New Zealand, in 1993), have been developed (Maindonald and Braun 2003; Everitt and Hothorn 2006; Reimann et al. 2008; Bivand et al. 2008, 2013) to assist customised statistical work and to enable the rapid inclusion of research-level methods contributed by their user communities; these have been taken up by earth science users. Programming language An abbreviation for computer programming language. These are high-level languages in which a computer program can be written so that they may subsequently be translated into machine language to execute the instructions. See: ALGOL, APL, awk, BASIC, C, COBOL, FORTRAN, PASCAL, PL/I, Prolog, R, S, assembler language. Projection A linear transformation which maps a point (or line) in one plane onto another plane by connecting corresponding points on the two planes by parallel lines. This linear transformation can be represented by a projection matrix, P. For example, the orthogonal transformation which maps a point (x, y, z) onto the x-y plane to give a corresponding point (x, y, 0), is given by the matrix


$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

and P(x y z)^T = (x y 0)^T. Journel (1977) reviewed the mathematics of kriging in terms of projections. Prolate spheroid This is the lemon-shaped solid of revolution formed by the rotation of an ellipse about its major axis which, by convention, corresponds to extension at the poles. The Italian-born French astronomer, Jean-Dominique (Giovanni Domenico) Cassini (1625–1712) and his son, Jacques Cassini (1677–1756), argued for an Earth shaped like a prolate spheroid (Cassini 1720). Newton’s (1687) hypothesis of polar flattening was eventually confirmed by the measurements of a 1° arc-length, near the geographic North Pole, made in Lapland in 1737 by the French mathematician and philosopher, Pierre-Louis Moreau de Maupertuis (1698–1759) (de Maupertuis 1738). Prolog A general-purpose logic programming language used in expert systems and similar artificial intelligence applications (Bratko 2001). Originally developed by the French computer scientists, Alain Colmerauer (1941–) and Philippe Roussel (1945–) in 1972 (Kowalski 1988). Early earth science applications include: Fisher and Balachandran (1989), Riedel (1989), Armstrong and Bennett (1990) and Luo et al. (1994). Prospector The earliest application of an expert system in the earth sciences was the Prospector system to aid mineral deposit location, originally developed at SRI International, Menlo Park, CA, by the American computer scientist, Peter Elliot Hart (1941–) and electrical engineer, Richard Oswald Duda (1936–), and subsequently extended (Hart 1975; Hart et al. 1978; Campbell et al. 1982; McCammon 1990); it was the first expert system to prove that it could solve an economically important problem. In 1983 its development was transferred to the U.S. Geological Survey (McCammon 1994); see also Katz (1991). Propagation error An error caused by a change in seismic velocity which has not been allowed for (Sheriff 1974).


Pseudocode, pseudo-code An algorithm is a formal procedure (set of well-defined logical instructions) for solving a numerical or logical problem. Given an initial state, it will terminate at a defined end-state. It may initially be planned using pseudocode, a natural-language notation which resembles a programming language, but which is not intended for actual compilation. It has simple rules, such as: all statements showing dependency (while, do, for, if, etc.) are indented; certain keywords may be used to bracket lines of dependent statements, such as: If . . . EndIf; Do While . . . EndDo; etc.; others may


indicate actions: Read, Print, etc. Although its syntax may resemble that of the programming language ultimately to be used, it may be written in a more general style so long as it is intelligible. Although the term is found in an early paper by the Canadian-born physicist and computer scientist, Morris Rubinoff (1917–2003) (Rubinoff 1953), it appears to have become widespread only in the 1970s; the spelling pseudocode remains the most widely used (Google Research 2012). It occurs in the earth science literature in Fricke (1988) and Dunstan and Mill (1989). See also flowchart. Pseudoinverse, pseudo-inverse, pseudo inverse The inverse of a square matrix X is the matrix X⁻¹ which, when multiplied by X, yields the identity matrix (I): X⁻¹X = I. The term and notation were introduced by the English mathematician, Arthur Cayley (1821–1895) (Cayley 1858). The pseudoinverse, the generalization of the inverse to all matrices, rectangular as well as square, was discovered by the American mathematician, Eliakim Hastings Moore (1862–1932) (Moore 1935), under the name “general reciprocal.” It was independently rediscovered by the English mathematical physicist, (Sir) Roger Penrose (b. 1931) (Penrose 1955), who named it the generalized inverse; Greville (1959) says that the (now widely used) term pseudoinverse was suggested to him by the American applied mathematician, Max A. Woodbury (b. 1926). The term inverse (in the sense of a matrix inverse) becomes more frequent in geophysics from the 1960s (e.g. Harkrider and Anderson 1962), and pseudoinverse from the 1980s (e.g. Tarlowski 1982); the latter remains the most widely used (Google Research 2012). See also: Greenberg and Sarhan (1959). Pseudolognormal distribution A frequency distribution which mimics the lognormal distribution by having a right-handed skewness (Link and Koch 1975). Pseudorandom numbers, pseudo-random numbers This is a sequence of numbers generated by a computer program embodying an algorithm that gives a very good approximation to the properties of random numbers. The first computer-based experiments were made by the Hungarian-American mathematician, John (Janosh) von Neumann (1903–1957) on the ENIAC computer (the first general-purpose electronic digital computer) in 1946. However, great care has to be taken to be sure that the sequence of pseudorandom numbers produced by a given algorithm is in fact adequate (Sharp and Bays 1992; Gentle 1998; Eddelbuettel 2006). One of the most successful methods now used is the “Mersenne twister” algorithm, originally developed by the Japanese mathematicians, Makoto Matsumoto and Takuji Nishimura in 1997 (Matsumoto and Nishimura 1998; Saito and Matsumoto 2008), which provides fast generation of very long period (2^19937 − 1 ≈ 4.3 × 10^6001) high-quality number sequences. The unhyphenated spelling pseudorandom numbers remains the most widely used (Google Research 2012). See also: Monte Carlo method.
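As an editorial aside, most modern computing environments expose a Mersenne-twister-based generator directly; Python’s standard random module, for instance, implements it. A minimal, seeded sketch (illustrative only):

```python
import random

# CPython's built-in generator is an implementation of the Mersenne Twister
# (Matsumoto and Nishimura 1998); seeding makes the sequence reproducible.
rng = random.Random(42)
uniform_draws = [rng.random() for _ in range(5)]       # pseudorandom reals in [0, 1)
integer_draws = [rng.randint(0, 9) for _ in range(5)]  # pseudorandom digits 0-9
print(uniform_draws)
print(integer_draws)
```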


Pseudorank, pseudo-rank If a matrix X has r rows and c columns, where c ≤ r, then its mathematical rank, rank(X) ≤ min(r, c). If the values in X are real data values subject to measurement error, then X can be represented as X = X̂ + E, where X̂ represents the systematic variation in X and E the measurement errors, etc. The pseudorank of X is the mathematical rank of X̂ and, in general, rank(X̂) ≤ rank(X). The spelling pseudorank has become the more widely used since 1980. Pseudospectrum method, pseudo-spectrum method, pseudo spectrum method A method for the numerical solution of ordinary and partial differential equations and integral equations (Orszag 1972), providing an alternative to the finite element and finite difference methods for solution of the wave equation by computing global spatial derivatives in the Fourier domain, and which has been used in seismic modelling for the solution of geophysical problems (Fornberg 1987; Kang and McMechan 1990; Huang 1992). The unhyphenated spelling pseudospectrum is the most widely used (Google Research 2012). Pulse 1. A waveform whose duration is short compared to the timescale of interest, and whose initial and final values are the same (usually zero) (Sheriff 1984). The pioneer English seismologist, John Milne (1850–1913) used the term “pulsation” to refer to a group of “exceedingly small [amplitude] but extremely regular waves” (Milne 1898). 2. A small group of waves (usually 1–3 successive peaks and troughs) which is considered to indicate a seismic reflection (Nettleton 1940). Pulse shaping, pulse-shaping To change the shape of a pulse into a more useful one (such as a square wave, or to indicate a time-break more effectively). The term came into use in electronics in the 1940s (e.g. Lattin 1945) and was discussed in a seismological context by the American statistician, John Wilder Tukey (1915–2000) (Tukey 1959a); see also Robinson (1967b). The unhyphenated spelling pulse shaping is by far the most widely used (Google Research 2012).


Pulse stabilization Processing to ensure the same effective wavelet shape (Sheriff 1984). The term was used in electronics by 1953 (Mayer 1957). Punched card, punch card The 12-row, 80-column (7 3/8 × 3 1/4 in.) Hollerith punched card, on which alphabetic, numerical and special characters were encoded using the Hollerith code, is named for the German-American mining engineer and statistician, Herman Hollerith (1860–1929), who in 1889 was granted a patent for a method for encoding numerical, alphabetic and special characters using holes punched on the basis of a rectangular grid pattern on 45-column “punched cards” for use in mechanical


tabulating machines (punched paper tape was already in use in the 1840s). He founded the Tabulating Machine Co., which eventually became the International Business Machines Corporation (IBM) in 1924 (Kistermann 1991). IBM first introduced use of the 80-column cards for input of data in 1928 (International Business Machines 2014). Widely adopted for the input of computer programs and data and, sometimes, the output of results, they lasted into the 1970s, until replaced by magnetic storage media. The first comprehensive illustration of the use of punched cards in geology was in a paper by the American mathematical geologist, William Christian Krumbein (1902–1979) and Laurence Louis Sloss (1913–1996) (Krumbein and Sloss 1958), recording compositional data and the thicknesses of sand, shale and non-clastic rock units, although geologist Margaret Ann Parker of the Illinois Geological Survey had also begun using punched cards in connection with stratigraphic and geochemical studies (Parker 1952, 1957); see also Melton (1958b). Their use in the analysis of geophysical time series is mentioned by the applied mathematician, Archie Blake (c. 1910–) of the U.S. Coast and Geodetic Survey (Blake 1941). The spellings punched card and punched-card were the most widely used terms, with the former dominating, but their usage rapidly declined during the 1970s (Google Research 2012). Punctual kriging The original, but now less widely used (Google Research 2012), term for point kriging. Introduced to the English language literature by the French geostatistician, Georges Matheron (1930–2000), who remarked (Matheron 1967) “this terminology is classical in France since [Matheron] 1960.” Pure shear An irrotational strain where the area dilation is zero. The importance of explicitly distinguishing between pure shear (which is three-dimensional) and simple shear (which is two-dimensional) seems to have been first made by the British mathematician and geophysicist, Augustus Edward Hough Love (1863–1940) (Love 1906). See also Ramsay (1967, 1976), Hobbs et al. (1976), Ramsay and Huber (1983). Purple noise Coloured (American English sp. colored) noise can be obtained from white noise by passing the signal through a filter which introduces a degree of autocorrelation, e.g. x(t) = a·x(t − 1) + k·w(t), where w(t) is a white noise signal; a is a constant, 0 < a < 1; k is the gain; and x(t) is the output signal at time t. The power spectrum density for purple (or violet) noise increases in proportion to f². The concept of white light as having a uniform power density over its spectrum was first discussed by the American mathematician, Norbert Wiener (1894–1964) (Wiener 1926), and taken up in digital signal processing by the American mathematician Richard Wesley Hamming (1915–1998) and statistician John Wilder Tukey (1915–2000) (Tukey and Hamming 1949); see also Blackman and Tukey (1958). For discussion in an earth science context, see Weedon (2003). Purposeful sampling The subjective selection of samples or specimens thought to assist in the solution of a particular problem. As pointed out by Imbrie (1956) and Krumbein and


Graybill (1965), such sampling is very likely to be biased, and statistical inferences made on the basis of a collection of such samples may well be invalid. Probability sampling is generally to be preferred. Python A general-purpose, high-level, object-oriented programming language initially developed in 1989 by its continuing principal author, the Dutch computer programmer, Guido van Rossum (1956–). Its first release was in 1991 (van Rossum 1995) and it is now in its third major release by the Python Software Foundation (van Rossum and Drake 2011). Examples of its application in earth science are Sáenz et al. (2002), Grohmann and Campanha (2010), Wassermann et al. (2013), Tonini et al. (2015), and Hector and Hinderer (2016).
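As a brief illustration (not part of the original entry), the following self-contained snippet shows the flavour of the language; the data values are hypothetical.

```python
# A minimal, hypothetical example of the sort of task Python is routinely used
# for in the earth sciences: summary statistics for a set of measured values.
grain_sizes_phi = [2.1, 1.8, 2.4, 2.0, 2.6, 1.9]  # hypothetical grain sizes (phi units)

n = len(grain_sizes_phi)
mean = sum(grain_sizes_phi) / n
variance = sum((x - mean) ** 2 for x in grain_sizes_phi) / (n - 1)
print(f"mean = {mean:.2f} phi, standard deviation = {variance ** 0.5:.2f} phi")
```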


Q

Q-mode analysis A term introduced by the American paleoceanographer, John Imbrie (1925–2016) in 1963, to refer to a multivariate analysis (e.g. cluster analysis, factor analysis, or principal components analysis) in which the investigator’s interest is in studying the relationships between the sample compositions (Imbrie 1963; Imbrie and Van Andel 1964; Krumbein and Graybill 1965). It began to fall out of use in the 1980s (Google Research 2012). See also: R-mode analysis. QAPF diagram A double ternary diagram for the classification of igneous rocks based on the contents of Quartz, Alkali-feldspar, Plagioclase and Feldspathoid (as determined from thin-sections by modal analysis), with a common base A-P. Since quartz and feldspathoids are mutually exclusive, each of the two triangles sums to 100%. Originally devised by the Swiss petrographer, Albert Streckeisen (1901–1998) (Streckeisen 1974, 1976, 1978). QR algorithm A very important method for computing the eigenvalues and eigenvectors of a matrix, independently discovered in 1961 by the computer scientist, John G. F. Francis (1934–fl. 2015) in England (Francis 1961, 1962) and, in Russia, by the mathematician Vera Nikolaevna Kublanovskaya (1920–2012), where it was known as the method of one-sided rotations (Kublanovskaya 1963). Named for the orthogonal matrix Q and the upper triangular matrix R used in Francis's solution: given a real matrix A for which the eigenvalues are required, let A_0 be equal by definition to A. Then, starting with index k = 0, form the QR decomposition A_k = Q_kR_k and let A_{k+1} = R_kQ_k; then, in principle, A_{k+1} = Q_k^T Q_k R_k Q_k = Q_k^T A_k Q_k = Q_k^{-1} A_k Q_k; in favourable circumstances, the A_k converge to a triangular matrix which contains the desired eigenvalues along its diagonal (the actual computational steps are more complex, so as to reduce the number of operations required).
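A minimal numerical sketch of the basic (unshifted) iteration described above, using NumPy's qr routine; as the entry notes, practical implementations add shifts and a preliminary reduction to Hessenberg form, so this is illustrative only (the test matrix is hypothetical).

```python
import numpy as np

def qr_algorithm(A, iterations=200):
    """Unshifted QR iteration: in favourable cases A_k converges towards an
    upper-triangular matrix whose diagonal holds the eigenvalues of A."""
    Ak = np.array(A, dtype=float)
    for _ in range(iterations):
        Q, R = np.linalg.qr(Ak)   # A_k = Q_k R_k
        Ak = R @ Q                # A_{k+1} = R_k Q_k = Q_k^T A_k Q_k
    return np.diag(Ak)

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric, so the iteration behaves well
print(np.sort(qr_algorithm(A)))
print(np.sort(np.linalg.eigvals(A)))  # cross-check against the library routine
```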




QR decomposition This is named for procedures independently developed by the Danish mathematician, Jorgen Pedersen Gram (1850–1916) (Gram 1883), and the German mathematician, Erhard Schmidt (1876–1959) (Schmidt 1907), which take a finite, linearly independent k-dimensional set of vectors and generate an orthogonal set of vectors which span the same k dimensions. For example, if the Gram-Schmidt orthogonalization is applied to the column vectors of a square matrix, A, of order n, it is decomposed into an orthogonal matrix, Q, and an upper triangular matrix, R, such that A = QR (the so-called QR decomposition). For earth science applications see Mendoza (1986) and Zhang and Schultz (1990). Quadratic deviation An alternative name (Vistelius 1980, 1992) for standard deviation: a measure of the spread of a set of observed values of size n about the centre of the distribution, characterised by the mean of the values (m):

$$s = \sqrt{\left[\sum_{i=1}^{n} (x_i - m)^2\right] / (n - 1)}.$$

In estimating the spread from a finite sample, the divisor is (n − 1) rather than n, since one degree of freedom has been used in the prior estimation of m. See Helsel (2005) for discussion of the treatment of geochemical data containing nondetects. Also known as root mean square error. See also: Krumbein and Pettijohn (1938), Buttkus (1991, 2000), Gubbins (2004) and Helsel (2005); Inman deviation measure; Trask sorting coefficient. Quadratic equation An equation of the form a + bx + cx² = 0, where a, b and c are constants. The English term quadratic derives from the Latin quadratus, meaning a square, and was first used in an algebraic sense in a book by the English theologian and natural philosopher, Bishop John Wilkins (1614–1672) (Wilkins 1668; Miller 2015a). However, the earliest-known study of equations involving a squared term occurs in an Egyptian papyrus of the Middle Kingdom (c. 2160–1700 BC), but quadratics with three constants were known to Hindu mathematicians c. 500 BC, and by the seventeenth century analytical methods of solution had replaced geometric-based ones (Smith 1929). Quadratic form


1. A quadratic form is a polynomial equation of the second degree: e.g. Q(x, y) = ax² + 2bxy + cy² for two variables, x and y, or Q(x, y, z) = ax² + by² + cz² + 2dxy + 2exz + 2fyz for three, x, y, z; where a, b, c, d, e and f are constants. The terminology was introduced by the German mathematician, astronomer and geomagnetist, Carl Friedrich Gauss (1777–1855) (Gauss 1801). In general, such equations may be expressed in matrix form as:

$$Q(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{ij} x_i x_j = \mathbf{x}^T \mathbf{A}\, \mathbf{x},$$

where A is an n × n symmetric matrix in which

$$\alpha_{ij} = \begin{cases} \alpha_{ii}, & i = j \\ \tfrac{1}{2}\left(\alpha_{ij} + \alpha_{ji}\right), & i \neq j \end{cases}$$

so that, in the case of the ternary equation above:

$$\mathbf{A} = \begin{bmatrix} a & d & e \\ d & b & f \\ e & f & c \end{bmatrix}.$$

See Camina and Janacek (1984), Gubbins (2004) and Gubbins and Bloxham (1985). 2. A three-dimensional surface whose topography is determined by a quadratic equation (Krumbein 1963b). Quadratic spline A type of spline: a chain of polynomials of fixed degree (usually cubic functions are used) joined in such a way that they are continuous at the points at which they join, referred to as “knots.” The knots are usually placed at the x-coordinates of the data points. The function is fitted in such a way that it has continuous first and second derivatives at the knots; the second derivative can be made zero at the first and last data points. It is an interpolating function of the form

$$F(x) = y_i + a_i (x - x_i) + \frac{(a_{i+1} - a_i)(x - x_i)^2}{2(x_{i+1} - x_i)},$$

where the coefficients are found by choosing a_0 and then using the relationship

$$a_{i+1} = -a_i + \frac{2(y_{i+1} - y_i)}{x_{i+1} - x_i}.$$

Its gradient at a new position, x3, is a linear combination of that at nearby points x1 and x2. Splines were discovered by the Romanian-American mathematician, Isaac Jacob Schoenberg (1903–1990) (Schoenberg 1946, 1971; Ahlberg et al. 1967). See also: Rasmussen (1991); smoothing spline regression, piecewise function.
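The recursion given above can be turned directly into a small interpolation routine; the following Python sketch (with hypothetical knot values and a_0 = 0) is illustrative only.

```python
import bisect

def quadratic_spline(x_knots, y_knots, a0=0.0):
    """Build the interpolant described above: the slope a_i at each knot follows
    a_{i+1} = -a_i + 2*(y_{i+1} - y_i)/(x_{i+1} - x_i), starting from a chosen a0."""
    a = [a0]
    for i in range(len(x_knots) - 1):
        a.append(-a[i] + 2.0 * (y_knots[i + 1] - y_knots[i]) / (x_knots[i + 1] - x_knots[i]))

    def f(x):
        # locate the interval [x_i, x_{i+1}] containing x
        i = min(max(bisect.bisect_right(x_knots, x) - 1, 0), len(x_knots) - 2)
        dx = x - x_knots[i]
        return (y_knots[i] + a[i] * dx
                + (a[i + 1] - a[i]) * dx ** 2 / (2.0 * (x_knots[i + 1] - x_knots[i])))

    return f

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 0.0, 2.0]       # hypothetical data
spline = quadratic_spline(xs, ys)
print([round(spline(x), 3) for x in (0.5, 1.0, 1.5, 2.5)])  # reproduces y at the knots
```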


Quadrature, quadrature component 1. The construction of a square which has a given area, “squaring.” 2. Solving an integral either analytically or numerically. 3. A component which is always 90° (π/2 radians) out of phase with a cosine carrier-wave signal when it is being modulated so as to use the signal to carry information. The quadrature component of an induced signal is that part which is out of phase with the generating signal. Use of the term in this sense was popularised following the work of the American statistician, John Wilder Tukey (1915–2000) and communications engineer, Ralph Beebe Blackman (1904–1990) (Blackman and Tukey 1958). Early use in a geophysical context is by Anderson (1968). See also Lyons (2008). Quadrature spectrum The imaginary part of the cross-spectrum of two functions (the real part is referred to as the cospectrum). These terms were introduced by the American statistician, Nathaniel Roy Goodman (1926–1981) (Goodman 1957; see also Lyons 2008). Discussed in a seismological context by Tukey (1959a) and Iyer and Healy (1972); see also Weller et al. (2004). Quadtree A hierarchical tree data structure in which each internal node has exactly four branches leading to sub-nodes (“children”). It was introduced by the American computer scientists, Raphael A. Finkel (1951–) and Jon Louis Bentley (1953–) as an aid to storing and searching multi-dimensional point data in image databases, maps, etc. (Finkel and Bentley 1974). For example, if the whole of a 2-dimensional image corresponds to the root node of the tree, it is then partitioned into 2 × 2 cells, and each of these is then divided into 2 × 2 smaller cells, and so on. The primary node acts as a “point” which forms one of the corners of the sub-cells corresponding to its four children. If a sub-cell itself contains a point which needs to be referenced, then it will again act as a node for a recursive subdivision into four cells, etc. See also Samet (1984, 2005). Originally spelt quad tree, or quad-tree, quadtree was rapidly adopted as the standard (Google Research 2012). Earth science applications include: Chang and Tso (1996), Nickerson et al. (1999) and Ersoy et al. (2006). See also octree.
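A minimal sketch of point insertion in a point quadtree of the Finkel-Bentley type described above (the coordinates and labels are hypothetical):

```python
class QuadTree:
    """A minimal point quadtree: each node stores one point and recursively
    delegates later points to one of four children (NW, NE, SW, SE)."""
    def __init__(self, x, y, payload=None):
        self.x, self.y, self.payload = x, y, payload
        self.children = {}  # keys: 'NW', 'NE', 'SW', 'SE'

    def insert(self, x, y, payload=None):
        key = ('N' if y >= self.y else 'S') + ('E' if x >= self.x else 'W')
        if key in self.children:
            self.children[key].insert(x, y, payload)
        else:
            self.children[key] = QuadTree(x, y, payload)

# Hypothetical sample localities (eastings, northings)
root = QuadTree(50.0, 50.0, "station A")
for east, north, name in [(20.0, 80.0, "B"), (75.0, 60.0, "C"), (70.0, 65.0, "D")]:
    root.insert(east, north, name)
print(sorted(root.children))   # branches occupied at the first level
```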


Quality Assurance (QA) A term often erroneously used synonymously with that of quality control, it in fact refers to planned and systematic activities in a quality system intended to minimise sources of errors whose effects are monitored by statistical quality control and provide confidence that a product or service will fulfil requirements (Bemowski 1992). See: accuracy, bias, calibration, detection limit, error, fitness-forpurpose, precision, recommended value, repeatability conditions, reporting limit, reproducibility conditions, trueness, uncertainty. Quality control (QC) Quality control, the application of statistical methods to assuring the quality of a product (in the earth sciences, this often refers to chemical analysis, etc.);


also referred to as statistical quality control. Bemowski (1992) gives a useful glossary of terms used. See: Shewhart (1931), Otto (1937) and Juran (1991, 1997); accuracy, bias, calibration, detection limit, error, fitness-for-purpose, precision, recommended value, repeatability conditions, reporting limit, reproducibility conditions, trueness, uncertainty; see also: quality assurance. Quantiles The general term for the set of (n − 1) values which divide the total frequency distribution of a variable into n parts. See quartiles, quantile-quantile plot. Quantile function If ℙ is a probability measure (0 ≤ p ≤ 1) on an underlying sample space and X is a real-valued random variable belonging to the set of all real numbers (ℝ), then the cumulative distribution function of X is the function F(x) = ℙ(X ≤ x); F(−∞) = 0; F(∞) = 1; which is continuous and increases from left to right,

$$F(x) = \int_{-\infty}^{x} f(t)\,dt,$$

as x increases from −∞ (or x_min) to +∞ (or x_max). The quantile function can then be regarded as the inverse of the cumulative distribution of x, and is defined as Q(p) = min{x ∈ ℝ : F(x) ≥ p}. If x is a quantile of order p, then Q(p) ≤ x. A quantile of order 1/2 is the median (second quartile) of the distribution; the quantiles of order 1/4 and 3/4 are the first and third quartiles respectively. Chung (1989a, b) discusses the construction of confidence bands for the quantile functions of truncated and randomly censored data. Quantile functions have been used in the study of deposit size-distributions (Caers et al. 1996) and are extensively applied in hydrological studies (Helsel and Hirsch 1992). The term appears to have come into use in the early 1960s (e.g. Hájek 1961). Quantile-quantile (Q-Q) plot Often used as a visual goodness-of-fit test: the n observed values of a variable, x₁, . . ., xₙ, sorted into order of ascending magnitude (the empirical quantiles), are plotted on the y-axis against, by convention, the corresponding quantiles of an appropriate theoretical distribution (e.g. a lognormal distribution) serving as a model on the x-axis, the plotting points being equivalent to the cumulative proportions 1/(n + 1), 2/(n + 1), . . ., n/(n + 1). A divisor of (n + 1) is used to allow for the fact that the possible extremes of the sampled distribution are unlikely to have been observed. An exact fit of the observed to the model distribution results in a linear plot. Comparison of the shapes of two arbitrary distributions is achieved by plotting the values of the quantiles for each variable corresponding to a set of percentiles (the same ones are used for each variable) chosen by the user. If the two distributions differ only in the magnitudes of their centres and spreads, but not their shape, the plot will again be linear. The plot was introduced by the Canadian statistician, Martin Bradbury Wilk (1922–2013) and the Indian statistician, Ram Gnanadesikan (1932–2015) (Wilk and Gnanadesikan 1968) while both were working at the AT&T Bell Labs at Murray


Hill, NJ, USA, and its use was popularised by books such as Chambers et al. (1983). See Helsel (2005) for discussion of data containing nondetects. See Caers et al. (1996), Schmidt et al. (2005), Reimann et al. (2008) and Wu (2010) for examples of earth science usage. The spelling Q-Q plot is far more frequent than QQ plot (Google Research 2012). See also CP plot, P-P plot. Quantitative paleoecology The application of quantitative methods to paleoecology, the study of the ecology of ancient or fossil organisms. Much of the pioneering work in the field was accomplished by the American palaeontologist, Roger Leroy Kaesler (1937–2007) and the Australian-born, British and Swedish geologist and biometrician, Richard Arthur Reyment (1926–2016). See: Kaesler (1966), an ecological study which demonstrated the potential for the application of computer-based methods to paleoecology; Kaesler (1969a, 1969b, 1979), Kaesler and Mulvany (1976), Kovach (1989), Reyment (1963, 1970, 1971a, 1978b, 1980, 1981), Campbell and Reyment (1978), Blackith and Reyment (1971) and Reyment et al. (1984). Quantitative stratigraphy The application of quantitative methods to stratigraphic, biostratigraphic, lithostratigraphic and chronostratigraphic correlation. Methods used include aids to the recognition of useful index fossils; use of cluster analysis, principal components analysis and factor analysis to determine micropalaeontological, mineralogical or geochemical assemblage zones; the use of graphic correlation, ranking and scaling, and Correlation and scaling to determine stratigraphic biozonation; chronograms for estimation of stratigraphic boundary ages. The term was introduced in this context by the Austrian-born British geologist and quantitative stratigrapher, Walther Schwarzacher (1925–) (Schwarzacher 1975). See also: Cubitt and Reyment (1982), Gradstein et al. (1985), Agterberg and Gradstein (1988), Tipper (1988), Agterberg (1990, 2014), Pearce and Jarvis (1995) and Harff et al. (1999).
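Referring back to the quantile-quantile (Q-Q) plot entry above, the following minimal sketch (illustrative only; it assumes NumPy and SciPy are available, and the data and model parameters are hypothetical) computes the plotting positions i/(n + 1) and the corresponding model quantiles described there; a near-linear relationship between the two indicates a good fit.

```python
import numpy as np
from scipy import stats

# Hypothetical positively skewed concentrations (e.g. a trace element, ppm)
rng = np.random.default_rng(1)
obs = np.sort(rng.lognormal(mean=1.0, sigma=0.5, size=200))

# Plotting positions i/(n + 1) and the corresponding quantiles of the model
p = np.arange(1, obs.size + 1) / (obs.size + 1)
model_q = stats.lognorm.ppf(p, s=0.5, scale=np.exp(1.0))

# A near-linear relationship (slope ~1, intercept ~0) indicates a good lognormal fit
slope, intercept = np.polyfit(model_q, obs, 1)
print(round(slope, 2), round(intercept, 2))
```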


Quartiles Three (interpolated) values which divide the set of observed values for a variable, sorted into order of ascending magnitude, such that 25% of the data fall below or at the first quartile; 50% below or at the second quartile; and 75% below or at the third quartile. The second quartile is more usually known as the median. The English statistician, (Sir) Francis Galton (1822–1911) seems to have been the first to use this definition of quartiles (Galton 1880). However, the term was previously used in astronomy to refer to an aspect of the planets when their longitudes are 90°, a quarter of the twelve Signs of the Zodiac apart (Goodacre 1828; Gadbury 1717). In geology its use began with the quantitative study of sediment size distributions by the American mathematician and engineering geologist, Parker Davies Trask (1899–1961), and the mathematical geologist, William Christian Krumbein (1902–1979) (Trask 1930; Krumbein 1933, 1936b). See also: quantile function.
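For illustration, quartiles (and hence the quartile-based measures in the entries which follow) can be obtained directly in most statistical software; a minimal Python sketch with hypothetical data:

```python
import statistics

data = [2.3, 3.1, 3.8, 4.0, 4.4, 5.2, 5.9, 6.5, 7.1]  # hypothetical measurements

# statistics.quantiles with n=4 returns the three quartiles Q1, Q2 (median) and Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)
print("quartile deviation:", (q3 - q1) / 2)
```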


Quartile deviation The arithmetic quartile deviation, a measure of the spread of a frequency distribution, is (Q3 − Q1)/2, where Q1 is the first quartile and Q3 is the third quartile; Q3 > Q1. Their use in sedimentology was encouraged by the work of the American mathematical geologist, William Christian Krumbein (1902–1979) in Krumbein (1936b), Krumbein and Aberdeen (1937) and Krumbein and Pettijohn (1938). Quartile kurtosis A graphical measure of the peakedness of a frequency distribution, given by the ratio of the quartile deviation to the difference between the 90th and 10th percentiles (P90, P10), i.e. [(Q3 − Q1)/2]/[P90 − P10], where Q1 is the first quartile and Q3 is the third quartile; Q3 > Q1. Its introduction was attributed by Krumbein and Pettijohn (1938) to the American psychometrician and statistician, Truman Lee Kelley (1884–1961) (Kelley 1924). N.B. the term was confused by Sheriff (1984) with kurtosis. Quartile skewness The American mathematical geologist, William Christian Krumbein (1902–1979) defined the phi quartile skewness measure of the asymmetry of a sediment size grade distribution (Krumbein 1936b) as Sk = 0.5(ϕ25 + ϕ75) − ϕ50, where ϕ25, ϕ50 and ϕ75 are the quartiles (i.e. 25th, 50th and 75th percentiles) measured on the phi scale and estimated from the cumulative sediment size grade curve. However, the American geologist, Douglas Lamar Inman (1920–2016) recommended (Inman 1952) two dimensionless measures: αϕ = (ϕ16 + ϕ84 − 2ϕ50)/(ϕ84 − ϕ16) to measure the asymmetry of the central part of the distribution, and α2ϕ = (ϕ95 + ϕ5 − 2ϕ50)/(ϕ95 − ϕ5) to measure the asymmetry of the extremes, where ϕ5, ϕ16, ϕ84 and ϕ95 are the 5th, 16th, 84th and 95th percentiles. It was redefined as the average of the two, as a better measure of the overall skewness, and called the Inclusive Graphic Skewness, by the American sedimentologist, Robert Louis Folk (1925–) and his student William Cruse Ward (1933–2011) in 1957: SkI = (ϕ16 + ϕ84 − 2ϕ50)/[2(ϕ84 − ϕ16)] + (ϕ95 + ϕ5 − 2ϕ50)/[2(ϕ95 − ϕ5)]. See also: Trask skewness coefficient. Quartimax rotation Used in factor analysis, a multivariate technique which was introduced by the psychologist Charles Edward Spearman (1863–1945) in England (Spearman 1904b) and developed in America by the psychologist Louis Leon Thurstone (1887–1955) (Thurstone 1931). It aims to explain the behaviour of a set of n observed objects on the basis of p measured variables in terms of a reduced set of k new variables. It is assumed that the latter reflect a number of latent, or unobserved, common factors which influence the behaviour of some, or all, of the original variables; some may be “unique” factors, influencing only one variable. Principal components analysis can be based on the correlation matrix, in which the principal diagonal (the correlation of each variable with itself) is unity. In factor analysis, the entries in this diagonal are replaced by estimates of the commonality, a measure of the non-uniqueness of the variables (e.g. the multiple correlation of a variable with all others). A matrix of correlations between the factors and


the original set of variables (called the loadings matrix) is often used to interpret the nature of a causative scheme underlying the original measurement set, although this is not implicit in the model. The coordinates of the points projected onto the factor axes are called the factor scores. An analysis similar to principal components is performed, aiming to produce a “simple structure” in which, ideally, each variable would have a non-zero loading on only one common factor. Methods used to achieve this are: orthogonal rotation of the axes or, better, oblique rotation, in which the initial factor axes can rotate to best summarise any clustering of the variables. Common orthogonal methods are varimax rotation, which tries to maximise the variance of the loadings in each column of the factor matrix; quartimax rotation, which aims to maximise the variance of the squares of the loadings in each row of the factor matrix; and equimax rotation, which is a compromise between the other two. Other criteria, e.g. maximum entropy, have also been applied. Interpretation of the meaning of the results is subjective. Imbrie and Purdy (1962) and Imbrie and Van Andel (1964) introduced the cosθ coefficient for factor analysis of palaeontological and mineralogical compositional data (see also Miesch 1976b). Analysis of the relationships between the variables, based on a correlation matrix, is referred to as an R-mode analysis, whereas an analysis of the relationships between sample compositions, etc., based on the cosθ matrix, resolved in terms of a number of theoretical end-members, is referred to as a Q-mode analysis. The first computer program for this purpose available in the earth sciences was that of Imbrie (1963). However, as with principal components analysis, it has subsequently been realised that special methods must be used because of the closed nature of such data (Aitchison 1986, 2003; Buccianti et al. 2006). Quasifunctional equation A functional equation is an equation which specifies a function in implicit form, e.g. the equation f(xy) = f(x) + f(y) is satisfied by all logarithmic functions. Mann (1974) used the term quasi-functional equation to mean an equation based on derived parameters (such as a set of eigenvalues), themselves derived from a man-made classification of facies which is inherently subjective in origin, rather than in terms of natural parameters, and which is being used to characterise a given petrographic facies.


Quasilinearisation A method of solution of nonlinear problems which may be applied to nonlinear ordinary differential equations, or to a partial n-th order differential equation in N dimensions, as the limit of a series of linear differential equations; for example, the solution of an equation of the form du/dx = u² + b(x)u + c(x). It was developed by the American applied mathematicians, Richard Ernest Bellman (1920–1984) and Robert Edwin Kalaba (1926–2004) (Bellman 1955; Kalaba 1959; Bellman and Kalaba 1965); see also Mandelzwig and Tabakin (2001). Widely applied in hydrological studies (e.g. Yeh and Tauxe 1971), it has also been used in geophysics (Santos 2002; Gubbins 2004).
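The following sketch illustrates the idea for the Riccati-type equation quoted above: the quadratic term is linearised about the previous iterate and the resulting linear equation integrated numerically. It is a toy illustration (simple Euler integration, with hypothetical b, c and initial value), not the Bellman-Kalaba formulation itself.

```python
def quasilinearise(b, c, u0, x_end=1.0, n=1000, sweeps=8):
    """Solve du/dx = u**2 + b(x)*u + c(x), u(0) = u0, by repeatedly linearising
    u**2 about the previous iterate (u**2 ~ 2*u_prev*u - u_prev**2) and
    integrating the resulting linear ODE with a simple Euler scheme."""
    h = x_end / n
    u_prev = [u0] * (n + 1)                 # initial guess: constant profile
    for _ in range(sweeps):
        u = [u0]
        for i in range(n):
            x = i * h
            slope = (2.0 * u_prev[i] + b(x)) * u[i] + (c(x) - u_prev[i] ** 2)
            u.append(u[i] + h * slope)      # Euler step of the linearised equation
        u_prev = u
    return u_prev

# Hypothetical problem: du/dx = u**2 - u, u(0) = 0.5, which has the exact
# solution u(x) = 1/(1 + exp(x)), so u(1) ~ 0.2689.
solution = quasilinearise(b=lambda x: -1.0, c=lambda x: 0.0, u0=0.5)
print(round(solution[-1], 4))
```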


Quasiperiodic, quasi-periodic 1. A time series whose oscillations have a nearly constant wavelength. See also: periodic. 2. A trajectory in phase space with a stable limiting trajectory. The concept was used by the American meteorologist, Edward Norton Lorenz (1917–2008) in his classic study of the behaviour of nonlinear dynamical systems (Lorenz 1963). Earth science examples are discussed in Turcotte (1997) and Weedon (2003). The unhyphenated spelling quasiperiodic has become slightly the more frequent since the early 1980s (Google Research 2012). Quasipolynomial, quasi-polynomial A polynomial function is a mathematical expression of finite length composed of one or more variables and a constant, using only the operations of addition, subtraction, multiplication and non-negative integer exponents, e.g. a₀ + a₁x + a₂x² + . . . + aₙxⁿ, where a₀, a₁, a₂, . . ., aₙ are constants and n is the degree of the polynomial, i.e. the highest power to which a variable within it is raised. If negative exponents occur, then it is known as a quasipolynomial. Early examples of their use in earth science occur in Simpson (1954), Krumbein (1959a, b) and Krumbein and Graybill (1965); see also Camina and Janacek (1984) and Gubbins (2004). The unhyphenated spelling quasipolynomial has become more frequent since the mid-1980s. Quefrency A term used in cepstrum analysis (Bogert et al. 1963; Oppenheim and Schafer 2004) for the equivalent of frequency in traditional spectral analysis (the number of cycles of a time series in unit time). Examples of earth science use include Cohen (1970), Lines and Ulrych (1977) and Butler (1987). Quelling The American geophysicist, George Edward Backus (1930–) introduced the concept of quelling (Backus 1970a, b). Myerholtz et al. (1989) showed how it could be applied to improve imaging in seismic tomography as a smoothing to overcome problems when data kernels have a square root singularity at the turning points of the rays: non-square-integrable singularities may be overcome by quelling using integration by parts. This is equivalent to a damped weighted least squares solution. See Johnson and Gilbert (1972), Chou and Booker (1979), Myerholtz et al. (1989), Neal and Pavlis (2001) and Gubbins (2004). However, note that Chiao and Kuo (2001) have argued that the weighting schemes used are “based on a priori prejudice that is seldom physically justifiable.” Queueing theory The study of the properties of a waiting line or queue, such as the times of arrival of the objects forming the queue (which is often governed by a stochastic process of some kind), the length of time they remain in it, the overall length of the queue at any one time, etc. Early discussion of the theory in a statistical context is by the English statistician, Maurice Kendall (1907–1983) (Kendall 1951). However, work on the development of queues began as a result of problems of call-handling in manual telephone exchanges by


the Danish mathematician, Agner Krarup Erlang (1878–1929) in 1909, and by the Head of Traffic and Operations in the Norwegian telegraph company, Telegrafverket, Tore Olaus Engset (1865–1943) in 1915 (Erlang 1909, [1917] 1918; Engset [1915] 1998, [1917] 1918, [1918] 1992; Stordahl 2007). To date, application in the geosciences has been relatively peripheral, e.g. in hydrology (Langbein 1958), groundwater management (Batabyal 1996) and mine planning (Goodfellow and Dimitrakopoulos 2013).
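Although applications in the geosciences remain limited, the basic quantities of queueing theory are easily explored by simulation; the sketch below simulates a single-server (M/M/1) queue with exponential inter-arrival and service times (the rates are hypothetical) and can be checked against the classical analytical result for the mean wait.

```python
import random

def mm1_mean_wait(arrival_rate, service_rate, n_customers=200_000, seed=1):
    """Single-server queue with exponential inter-arrival and service times;
    returns the simulated mean time customers spend waiting for service."""
    rng = random.Random(seed)
    clock = 0.0
    server_free_at = 0.0
    total_wait = 0.0
    for _ in range(n_customers):
        clock += rng.expovariate(arrival_rate)          # next arrival time
        start = max(clock, server_free_at)
        total_wait += start - clock
        server_free_at = start + rng.expovariate(service_rate)
    return total_wait / n_customers

# Hypothetical rates: arrivals ~0.8 per minute, service ~1.0 per minute;
# queueing theory gives mean wait rho/(mu - lambda) = 4.0 minutes.
print(round(mm1_mean_wait(0.8, 1.0), 2))
```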


R

r2 (r-squared, R-squared, coefficient of determination) The square of the product-moment correlation coefficient; a measure of the goodness-of-fit of a regression model: the square of the product-moment correlation coefficient between the observed and fitted values of y (the multiple correlation coefficient), which is equal to the variation in the dependent variable (y) explained by all the predictors, divided by the total variation in y, hence the term coefficient of determination. This ratio is often expressed as a percentage. The term was introduced by the American geneticist and evolutionary theorist, Sewall (Green) Wright (1889–1988) (Wright 1921) and its possible first use in geology was by the American sedimentologist, Lincoln Dryden (1903–1977) (Dryden 1935). This criterion can be very misleading when fitting nonlinear regression models. See discussion in: Draper and Smith (1981), Kvålseth (1985), Willett and Singer (1988), Ratkowsky (1990), and Scott and Wild (1991). ℝ Notation for the set of all real numbers. Introduced by the German mathematician, Julius Wilhelm Richard Dedekind (1831–1916) and used in his own work since 1858, but only published in Dedekind (1872). See also: rational number, irrational number. R A freeware computer programming language for performing customised statistical analysis and graphics (not to be confused with the early IBM System R) which has succeeded S (Becker and Chambers 1984) following its commercialization as S-PLUS. R development was first begun by the Canadian statistician, Robert Clifford Gentleman (1959–) and the New Zealand statistician, George Ross Ihaka (1954–) in 1993 (Ihaka and Gentleman 1996), and it is now maintained and developed by the R Project for Statistical Computing (http://www.r-project.org). See the introductory text by Crawley (2005) and, for more advanced discussion, Maindonald and Braun (2003), Reimann et al. (2008) and Bivand et al. (2013). Garrett (2013) describes the rgr package of methods for the display




and analysis of applied geochemical data; see also Reimann et al. (2008) and Janoušek et al. (2016). Bell and Lloyd (2014) discuss a palaeontological application for phylogenetic analysis. R-mode analysis A term introduced by the American paleoceanographer, John Imbrie (1925–2016) to refer to a multivariate analysis (e.g. cluster analysis, factor analysis, or principal components analysis) in which the investigator’s interest is in studying the relationships between the variables (Imbrie 1963; Imbrie and Van Andel 1964; Krumbein and Graybill 1965). It began to fall out of use in the 1980s (Google Research 2012). See also: Q-mode analysis. RQ-mode analysis A term which became current following the publications of Imbrie (1963), Imbrie and Van Andel (1964) and Krumbein and Graybill (1965) to refer to a multivariate analysis (e.g. cluster analysis, factor analysis, or principal components analysis) in which the investigator’s interest is in studying both the relationships between the variables (R-mode analysis) and those between the samples (Q-mode analysis). The term probably arose as a contraction of the phrase “both R- and Q-modes”; it does not appear in texts nearly as often as do R- or Q-mode analysis (Google Research 2012). Rademacher function See square wave. Radial plot A graphical method introduced by the British statistician, Rex F. Galbraith (1988, 1990, 2005) to compare n estimates {z₁, z₂, . . ., zₙ}, such as the fission-track geochronological ages of several zircons from the same granite, each of which has an associated standard error {σ₁, σ₂, . . ., σₙ}, to see if they are consistent with each other. The data are displayed as a bivariate graph (x, y) of the standardized values yᵢ = (zᵢ − m)/σᵢ as a function of xᵢ = 1/σᵢ, where m is the weighted average:

$$m = \left( \sum_{i=1}^{n} \frac{z_i}{\sigma_i^2} \right) \bigg/ \left( \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \right).$$

If only one population is present, the points will scatter in a band about the horizontal line through y = 0; otherwise they will fan out from the origin at (0, 0), the scatter increasing as x becomes larger; outliers will also be readily apparent. The y-axis of the graph is not extended beyond ±2 (although occasional y-values could obviously fall beyond this) so as to emphasise the range of scatter around the horizontal at y = 0 to be expected for a single population. An arc of a circle drawn at the right-hand side of the plot helps to visualise lines with slopes corresponding to various z-values and it is annotated with values corresponding to z; it is drawn with the origin at {x = 0, y = 0} and radius r, given by

$$\begin{cases} x = \dfrac{r}{\sqrt{1 + h^2 (z - m)^2}} \\ y = (z - m)\,x \end{cases}$$

where the scale factor h = (length of 1 unit of x)/(length of 1 unit of y). The value of z at y = 0 corresponds to z = m.

In the case of radiometric age data it is helpful to log-transform the age estimates (Ma) first (Galbraith 1988, 1990). See also Carter et al. (1995). Radian (rad) [notation] The angle subtended at the centre of a circle by an arc whose length is equal to the radius of the circle. π radians = 180°, hence 1 rad ≈ 57.2958°. It is attributed (Sanford 1930; Cooper 1992) to the Irish engineer and physicist, James Thomson (1822–1892) in 1871. Radon transform This is named for the Austrian mathematician, Johann Karl August Radon (1887–1956), who first published the transform and its inverse (Radon 1917). It consists of the integral of a function over straight lines; given a complete set of line integrals, it enables the reconstruction of an object in two or three dimensions. This work eventually led to the straight-line tomography algorithm. The principle also led to early work in exploration seismology by the American geophysicist, Frank Rieber (1891–1948), who used optical methods to delay and add analogue sound recordings from a number of equally-spaced ground positions to construct sonograms, later known as slant stacks (linear stacks along lines of different slope), over a short range of offsets (Rieber 1936). The so-called τ-p transform is a discrete transform, based on the Radon transform, used to map seismic data from the time (t, sec) and offset (x, km) domain to that of intercept time (τ, sec) and horizontal slowness (p, sec/km). It carries out a summation along lines in the data at sampled values of τ and p, so that a linear event in the t-x domain maps to a point in the τ-p domain and vice-versa: t = τ + p·x and p = dx/dt = sin(δ)/v, where δ is the plane wave propagation angle within the n-th layer, which has an internal velocity v (Zhou and Greenhalgh 1994). With the advent of the computer, it was first applied in exploration seismology by Chapman (1978), Schultz and Claerbout (1978) and McMechan and Ottolini (1980). It has subsequently been widely used in velocity analysis, migration and modelling, improvement of signal-to-noise ratios, suppression of multiple reflections, etc. See also: Chapman (1987), Nowack (1990) and Edme (2003); tomography. Rahmonic A term used in cepstrum analysis (Bogert et al. 1963; Oppenheim and Schafer 2004) for the equivalent of a harmonic in traditional spectral analysis, i.e. one of the higher-quefrency components generated from a sinusoidal component of a time series by a nonlinear process applied to the time series or any equivalent time function. Ramsay logarithmic diagram A method of classifying the shape of the strain ellipsoid on the basis of the base-10 logarithms of the two principal strain ratios: the ratio of the


maximum/intermediate extensions plotted on the y-axis and the ratio of the intermediate/minimum extensions plotted on the x-axis. This has the advantage that all lines representing equal changes of length of the deformation ellipsoid X > Y > Z are straight (unlike the earlier Flinn diagram). Named for the British structural geologist, John Graham Ramsay (1931–) (Ramsay 1967; Ramsay and Huber 1983; Wood 1974a, b). See also: Jelinek diagram. Random A random event is one which happens by pure chance; the probability of it happening is governed by a probability distribution, e.g. a uniform distribution, normal distribution, or other specified distribution. Although used in a general sense in early literature, the term only began to be used in a statistical context from the late 1920s (e.g. Tippett 1927). Early examples in the earth science literature are Krumbein and Aberdeen (1937), Elkins (1952) and Krumbein and Graybill (1965). See also: random digits, random effects, random field, random forest, random noise, random number, random process, random sample, random selection, random signal, random variable, random walk, randomization. Random effects The effects observed on a response variable, y = f(x), corresponding to a set of values of a factor (x) that are of interest and which exist at infinitely many possible levels (in contrast to fixed effects), of which only a random sample are available. The term arose in the context of analysis of variance (Eisenhart 1947; Scheffé 1956). Early discussion occurs in Mood (1950), Kempthorne (1952) and Wilk and Kempthorne (1955). For discussion in a geological context see Krumbein and Graybill (1965) and Miller and Kahn (1962), although they use the term “random components model.” Random field A set of random values of a variable (or a multidimensional vector of values of n variables), each of which corresponds to a given position in 2- or 3-dimensional Euclidean space. The values for a given variable are usually spatially correlated in some way (see semivariogram). Depending on the type of distribution function to which its values conform, the random field may be called a Gaussian random field, Markov random field, etc. In an early investigation of natural data, the Indian electrical engineer and geophysicist, Prabhakar Satyanarayan Naidu (1937–) (Naidu 1970b) studied the statistical properties of the aeromagnetic field over a 4,500 sq. mi. area of Canada and showed that from the evidence of first-order statistics (mean, variance, probability distribution, etc.) half of it was homogeneous and Gaussian and the rest was inhomogeneous and non-Gaussian, whereas the second-order statistics (spectrum) showed that it was entirely inhomogeneous. See also random function.
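One simple way to appreciate what a spatially correlated (approximately Gaussian) random field looks like is to smooth white noise; the following sketch is purely illustrative (it assumes NumPy is available and the correlation length is arbitrary) and is not a geostatistical simulation method conditioned on data.

```python
import numpy as np

def correlated_field(shape=(100, 100), correlation_length=5, seed=0):
    """Smooth white noise with a Gaussian kernel (via the FFT) so that nearby
    grid values become spatially correlated; returns a standardized field."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(shape)
    ny, nx = shape
    yy, xx = np.meshgrid(np.arange(ny) - ny // 2, np.arange(nx) - nx // 2, indexing="ij")
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * correlation_length**2))
    kernel /= kernel.sum()
    field = np.real(np.fft.ifft2(np.fft.fft2(white) * np.fft.fft2(np.fft.ifftshift(kernel))))
    return (field - field.mean()) / field.std()

z = correlated_field()
print(z.shape, round(float(z.std()), 2))
```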


Random forest A tree-based classification algorithm (Breiman et al. 1984; Breiman 2001; Liaw and Wiener 2002), developed by the American statistician, Leo Breiman (1928–2005). It is related to the CART algorithm and utilises a majority vote to predict classes, based on the partition of data from multiple decision trees: A random forest is a


classifier, h(x), consisting of a collection of tree-structured classifiers {h(x, Θk), k = 1, . . .}, where the {Θk} are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input x (Breiman 2001). The numerous realizations are obtained by bootstrap sampling from the original data, in each case growing the predictive tree using a random subsample of the predictors; new predictions are obtained by aggregating the predictions of the set of trees using a majority vote. See Cracknell and Reading (2014), Carranza and Laborte (2015) and Harris and Grunsky (2015) for discussion in an earth science context. Random function This term has been used in geostatistics to mean the same as a random field, but it has also been used to mean a function selected at random from a family of possible functions. See also stationarity. Random noise 1. In time series analysis, it is an unwanted time-related process consisting of random disturbances corrupting the signal being monitored. If it has equal power in all frequency intervals (i.e. it is uncorrelated) over a wide range of frequencies (producing a flat power spectrum) then it is known as white noise. The observed values at each time interval are independent with zero mean and constant variance, i.e., it is a purely random process. If the amplitude of the power spectrum is not equal at all frequencies (i.e. partially correlated in some frequency band), then it is known as colored noise [American English sp.] (Tukey and Hamming 1949); e.g. red noise is partially correlated at the lowest frequencies (see also white noise). The American statistician, John Wilder Tukey (1915–2000) pointed out that the repetition of a signal would produce an exact copy, whereas a repetition of noise would only have statistical characteristics in common with the original (see also Tukey 1959b). The concept of noise was introduced by the Swiss-born German physicist, Walter Schottky (1886–1976), who predicted (Schottky 1918) that a vacuum tube would have two intrinsic sources of time-dependent current fluctuations: shot noise (Schroteffekt) and thermal noise (Wärmeeffekt). The former was observed as current fluctuations around an average value, as a result of the discreteness of the electrons and their stochastic emission from the cathode. The latter, manifested as fluctuating voltage across a conductor in thermal equilibrium, is caused by the thermal motion of electrons and occurs in any conductor which has a resistance, and it is temperature-related. It is now called Johnson-Nyquist noise, after two Swedish-born American physicists, John Bertrand Johnson (1887–1970) and Harry Nyquist (1889–1976), who first studied it quantitatively and explained the phenomenon (Johnson 1928; Nyquist 1928b). See also: van der Ziel (1954), Wax (1954), Davenport and Root (1958), Blackman and Tukey (1958); and, in an earth science context, Horton (1955, 1957), Buttkus (2000), Gubbins (2004); one-over-f noise, random walk, nugget effect.


2. The American mathematical geologist, William Christian Krumbein (1902–1979) used noise (Krumbein 1960a) to mean fluctuations in data which cannot be assigned to specific causes and which, if they are large, may obscure the meaningful information in the data. Random number A random number is generated as part of a set of numbers which exhibit statistically random properties, drawn from a uniform frequency distribution. These may be either digits drawn at random from the set of integers (I): {0, 1, 2, . . ., 9}, or {I_min, . . ., I_max}, or real numbers in the range {ℝ_min, . . ., ℝ_max}. The earliest work using such numbers usually involved physically drawing marked balls from a container in such a way that the numbers could not be seen beforehand. The concept of randomization was discussed by the British statistician, (Sir) Ronald Aylmer Fisher (1890–1962) in his book The Design of Experiments (Fisher 1935), and the first book of tables of random numbers was published by Leonard Henry Caleb Tippett (1902–1985) (Tippett 1927). However, by the late 1940s, pseudorandom numbers were beginning to be generated by computer (Rand Corporation 1955). In practice, it has proved extremely difficult to produce pseudorandom number generators which do not eventually prove to have problems with the generation of very long sequences of numbers (see Sharp and Bays 1992; Gentle 2003; Deng and Xu 2003; McCullough 2008; Barker and Kelsey 2015). Current research is being driven by simulation and cryptographic needs and may involve hardware as well as algorithmic generators. See also Monte Carlo method. Random process A process in which a random variable (or a collection of random variables) is subject to evolution in time (or with distance) which is stochastic rather than deterministic in nature. If the random process has been sampled at times t₀, t₁, t₂, . . ., t_N, the resulting real-valued random variables will be x(t₀), x(t₁), x(t₂), . . ., x(t_N). A Gaussian random process is fully characterised by the mean value across all the x(t) at a given instant in time, together with the autocorrelation, which describes the correlation between the x(t) at any two instants of time separated by a time interval Δt. It is discussed in a geoscience context in Merriam (1976b), Brillinger (1988) and Buttkus (1991, 2000). The study of such processes began with the Russian mathematician, Aleksandr Yakovlevich Khinchin (1894–1959) (Khinchin 1932, 1934). However, this model is not always suitable for modelling processes with high variability, and models based on long-tailed distributions (non-Gaussian processes) may be required in some circumstances (Samorodnitsky and Taqqu 1994; Johnny 2012). See also Markov process.
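A realisation of a simple discrete random process is easily generated; the sketch below simulates x(t) = a·x(t − 1) + w(t), with Gaussian white noise w(t) and an arbitrarily chosen constant a, and recovers the lag-1 autocorrelation (illustrative only).

```python
import random

def ar1_realisation(a=0.7, n=5000, seed=0):
    """One realisation of the discrete random process x(t) = a*x(t-1) + w(t),
    where w(t) is Gaussian white noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(a * x[-1] + rng.gauss(0.0, 1.0))
    return x

x = ar1_realisation()
mean = sum(x) / len(x)
num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
den = sum((v - mean) ** 2 for v in x)
print(round(num / den, 2))   # estimated lag-1 autocorrelation, close to a = 0.7
```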


Random sample Krumbein and Pettijohn (1938) defined a random sample (in a geological context) as: “one in which characteristics of the sample show no systematic variations from the characteristics of the deposit at the sampling locality.” See: random selection, simple random sample, stratified random sample, systematic sample and: cluster sample, composite sample, duplicate samples, grab sample, grid sampling, nested sampling, point sample, probability sampling, purposeful sampling, sampling


interval, sampled population, target population, sampling design, serial sample. These methods were introduced by the British statistician, Karl Pearson (1857–1936) (Pearson 1900) and later popularised in geology by the American mathematical geologist, William Christian Krumbein (1902–1979) (Krumbein and Pettijohn 1938; Krumbein and Graybill 1965). Random selection A method of collecting n physical samples, data values, etc., in such a way that each one has a fixed and determinate probability of selection. Subjective, haphazard sampling by humans does not guarantee randomness, and tables of random numbers or computer-generated sequences of pseudorandom numbers should be used to guide the sampling process (e.g. by picking n random positions within the overall length of a transect, or target outcrop section, etc.). The term occurs, in a statistical sense, in a paper by the British mathematician and actuary, Benjamin Gompertz (1779–1865), first delivered at the International Statistical Congress in July 1860 (Gompertz 1871). Its use was popularised in geology by the 1965 textbook by the American mathematical geologist, William Christian Krumbein (1902–1979) and statistician, Franklin Arno Graybill (c. 1921–2012). Random signal This generally means a randomly generated noise signal. Fox (1987) gave a Fortran program for generating random signals conforming to different types of amplitude-frequency spectrum, including band-limited series. Early use of the term in a geophysical context occurs in Dyk and Eisler (1951). Random variable, random variate A random variable is a quantity which may take any value in a specified set with a specified relative frequency or probability, governed by an associated empirical frequency distribution or a specified probability density. It is also known as a variate, following the work of the English statisticians, Karl Pearson (1857–1936) (Pearson 1909) and (Sir) Ronald Aylmer Fisher (1890–1962) (Fisher 1925a, b). Both terms appear in an English-language publication by the Swedish mathematician, statistician and actuary, Harald Cramér (Cramer 1930), and random variable in Cramer (1937). The latter has since been by far the more widely used (Google Research 2012). In the earth science literature, random variable occurs in Oldham and Sutherland (1955), Miller and Kahn (1962) and Krumbein and Graybill (1965); but random variate appears to be seldom used. See also: Markovian variable. Random walk A one-dimensional random walk begins with a value, e.g. 0, at time 0; each successive value is obtained by adding a random number from a normal distribution to the previous value. Also known as Brownian motion, random noise and white noise. The name originates from a letter to Nature by the English statistician, Karl Pearson (1857–1936) in which he asked for assistance with the solution of determining the probability that a man undertaking a “random walk” (i.e. starting from a fixed point, walking a distance d in a straight line in a given direction before turning in a randomly

Random variable, random variate A random variable is a quantity which may take any value in a specified set with a specified relative frequency or probability, governed by an associated empirical frequency distribution or a specified probability density. It is also known as a variate, following the work of the English statisticians, Karl Pearson (1857–1936) (Pearson 1909), and (Sir) Ronald Aylmer Fisher (1890–1962) (Fisher 1925a, b). Both terms appear in an English-language publication by the Swedish mathematician, statistician and actuary, Harald Cramér (Cramér 1930), and random variable in Cramér (1937). The latter has since been by far the more widely used (Google Research 2012). In earth science literature, random variable occurs in Oldham and Sutherland (1955), Miller and Kahn (1962) and Krumbein and Graybill (1965); but random variate appears to be seldom used. See also: Markovian variable.

Random walk A one-dimensional random walk begins with a value, e.g. 0, at time 0; each successive value is obtained by adding a random number drawn from a normal distribution to the previous value. Also known as Brownian motion; its successive increments constitute random noise or white noise. The name originates from a letter to Nature by the English statistician, Karl Pearson (1857–1936), in which he asked for assistance in determining the probability that a man undertaking a “random walk” (i.e. starting from a fixed point, walking a distance d in a straight line in a given direction before turning in a randomly chosen direction and again walking a distance d before changing direction, etc.) will have reached an overall linear distance D ± δ from the origin after repeating this course of action n times (Pearson 1905c). Discussed in an earth science context in Raup and Gould (1974), Price (1976) and Buttkus (1991, 2000); the model is also widely used in hydrological studies, e.g. Delay et al. (2005).
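A minimal Python sketch of the one-dimensional random walk described above, assuming an arbitrary number of steps and unit-variance normal increments:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_steps = 500
steps = rng.standard_normal(n_steps)   # independent draws from a normal distribution

# Start at 0 at time 0; each value adds a random normal step to the previous one.
walk = np.concatenate(([0.0], np.cumsum(steps)))

print("final position:", walk[-1].round(3))
print("maximum excursion from origin:", np.abs(walk).max().round(3))
```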

Randomization The process of ensuring that a set of samples or test specimens, etc. is arranged “deliberately at random” in a statistical sense. This formal process was first described as “random arrangement” by the English statistician, (Sir) Ronald Aylmer Fisher (1890–1962) in Fisher (1925a), but he subsequently introduced the term randomization in Fisher (1926). Early use of the term in a geological context occurs in work by the American mathematical geologist, William Christian Krumbein (1902–1979) (Krumbein 1953a, b). The spelling randomization is far more frequent than randomisation (Google Research 2012).

Randomization test A procedure for determining the statistical significance of a test without knowledge of the sampling distribution. For example, in determining whether there is a statistically significant difference between the value of a statistic observed on (two or more) groups, the data values are repeatedly randomly assigned to the groups so that all possible values of the test statistic may be determined. If the proportion of the permutations which yield a value of the test statistic as large as that associated with the observed data is smaller than some chosen level of significance (α), then the actual test result is significant at the α-level. This test method was introduced by the British-born American chemist and mathematician, George Edward Pelham Box (1919–2013) and Danish-born American statistician, Sigurd Løkken Andersen (1924–2012) (Box and Andersen 1955), who gave the term as an alternative to their permutation test, and by the American statistician, Henry Scheffé (1907–1977) as a randomization test (Scheffé 1956). Gordon and Buckland (1996) and Romesburg (1985) discuss the use of this type of test in a geological context. See also: Dwass (1957), Edgington and Onghena (2007); Monte Carlo significance test.
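A minimal Python sketch of a two-group randomization test, using a Monte Carlo sample of random permutations rather than full enumeration; the data values, number of permutations and significance level are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical measurements on two groups (e.g. grain sizes from two outcrops).
group_a = np.array([2.1, 2.5, 2.8, 3.0, 3.3])
group_b = np.array([1.6, 1.9, 2.0, 2.4, 2.2])

observed = abs(group_a.mean() - group_b.mean())    # test statistic
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

n_perm = 9999
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)             # random re-assignment to the groups
    stat = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())
    if stat >= observed:
        count += 1

p_value = (count + 1) / (n_perm + 1)               # include the observed arrangement
print("randomization p-value:", round(p_value, 4))
# Significant at the alpha = 0.05 level if p_value < 0.05.
```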

Range

1. The difference (without regard to sign) between the minimum and maximum observed data values for a variable; a crude measure of the dispersion of a set of measurements (Fisher 1925a; Krumbein and Pettijohn 1938).
2. A set of numbers which form the possible results of a mapping, i.e. the set of values which a function f(x) can take for all possible values of x.
3. In spatial analysis, the distance at which the zone of influence of a sample effectively vanishes (this is equivalent to the autocorrelation effectively becoming zero at a large distance); see: variogram (Matheron 1965).

Range chart A comparative chart in which the times of first and last appearance of each of a number of taxa are joined by a straight line parallel to the geological-time axis of the diagram. Line width may be drawn proportional to some estimate of relative abundance at particular times. An early example is that of the French-born palaeontologist and stratigrapher, Joachim Barrande (1799–1883), drawn to illustrate the abundance of different Silurian trilobite species (Barrande 1852).
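Purely as an illustration of the construction (not a reproduction of Barrande’s figure), the following Python/matplotlib sketch draws each taxon’s observed range as a line parallel to the time axis, with line width scaled by an assumed relative abundance; the taxon names, ranges and abundances are invented:

```python
import matplotlib.pyplot as plt

# Hypothetical taxa: (first appearance, last appearance) in Ma and a relative abundance.
taxa = {
    "Taxon A": (440, 425, 0.9),
    "Taxon B": (438, 430, 0.4),
    "Taxon C": (432, 420, 0.7),
}

fig, ax = plt.subplots(figsize=(4, 5))
for i, (name, (first, last, abundance)) in enumerate(taxa.items()):
    # Each range is a line parallel to the geological-time axis;
    # line width is drawn proportional to relative abundance.
    ax.plot([i, i], [first, last], linewidth=8 * abundance, solid_capstyle="butt")

ax.set_xticks(range(len(taxa)))
ax.set_xticklabels(list(taxa))
ax.invert_yaxis()                    # so that older ages plot at the bottom
ax.set_ylabel("Age (Ma)")
ax.set_title("Schematic range chart")
plt.tight_layout()
plt.show()
```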

Rank

1. Given a list of values of a variable which can be sorted into a sequence of ascending magnitude, the rank of the i-th member of the list is its equivalent position in the sorted list.
2. The rank of a matrix is the order of the largest non-vanishing minor. The term was introduced by the German mathematician, Ferdinand Georg Frobenius (1849–1917) (Frobenius 1878). Given a matrix

$$X = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$

then the values in its two columns are independent and its column-rank is 2. However, if

$$X = \begin{bmatrix} 1 & 3 \\ 2 & 6 \\ 3 & 9 \end{bmatrix}$$

then the rightmost column contains values three times the left; consequently the values in its two columns are not independent and its column-rank will be 1. Generalising, if X has r rows and c columns, where c ≤ r, then rank(X) ≤ min(r, c). In general, there will be n roots for a matrix of rank n. See also pseudo-rank.
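The column ranks quoted above can be checked numerically, for example with NumPy’s matrix_rank:

```python
import numpy as np

# The two matrices discussed above.
X1 = np.array([[1, 4],
               [2, 5],
               [3, 6]])
X2 = np.array([[1, 3],
               [2, 6],
               [3, 9]])   # second column is three times the first

print(np.linalg.matrix_rank(X1))   # 2: the columns are linearly independent
print(np.linalg.matrix_rank(X2))   # 1: the columns are linearly dependent
```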

Rank correlation A nonparametric measure of the statistical dependence between two variables which reflects the strength of their monotone relationship even if it is nonlinear. Common measures are the Spearman rank correlation coefficient (1904a, b), denoted by the Greek letter rho (ρ) and named for the English psychologist and statistician, Charles Edward Spearman (1863–1945); and the Kendall rank correlation coefficient (1938), denoted by the Greek letter tau (τ) and named for the English statistician, (Sir) Maurice George Kendall (1907–1983). Early geological applications include Melton (1958a) and Johnson (1960).
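A minimal Python sketch computing both coefficients with SciPy on a small invented data set with a monotone but nonlinear relationship:

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

# Hypothetical paired observations with a monotone, nonlinear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 3 + np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.2])   # small perturbations

rho, p_rho = spearmanr(x, y)       # Spearman's rho
tau, p_tau = kendalltau(x, y)      # Kendall's tau

print(f"Spearman rho = {rho:.3f} (p = {p_rho:.4f})")
print(f"Kendall tau  = {tau:.3f} (p = {p_tau:.4f})")
```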

Rank scores test See van der Waerden test.

Ranking algorithm, Ranking And Scaling (RASC) Ranking is the process of arranging a number of individuals in order according to the magnitude of some attribute which they all possess. The final order is called the ranking and each individual’s place in it is its rank. As used in stratigraphy, a ranking algorithm is a pairwise comparison technique which tries to determine the most likely sequence of biostratigraphic events as recorded in different stratigraphic sections (Agterberg and Nel 1982a). Scaling determines the spacing of these events on a relative timescale (Agterberg and Nel 1982b). This probabilistic statistical method is embodied in the RASC (Ranking And SCaling) computer program. See the detailed discussion in Gradstein et al. (1985). The method is well suited to multi-well micropalaeontological studies within a sedimentary basin. The history of the development of the method is described in Agterberg (1990, pp. 404–407). For an approach based on extreme-value distributions see Gil-Bescós et al. (1998). See also Cooper et al. (2001), Gradstein et al. (2008), Agterberg et al. (2013); Correlation And Scaling.

Rare earth element (REE) diagram Comparison of sample compositions on the basis of their rare earth element (REE) content is often made by means of a diagram developed by the Japanese ge