diff --git a/mathosphere-core/src/test/resources/com/formulasearchengine/mathosphere/mlp/gold/eval_dataset.xml b/mathosphere-core/src/test/resources/com/formulasearchengine/mathosphere/mlp/gold/eval_dataset.xml new file mode 100644 index 00000000..9c45a27e --- /dev/null +++ b/mathosphere-core/src/test/resources/com/formulasearchengine/mathosphere/mlp/gold/eval_dataset.xml @@ -0,0 +1,24186 @@ + + + enwikimath + enwikimath + https://en.formulasearchengine.com/wiki/Main_Page + MediaWiki 1.27alpha + first-letter + + Media + Special + + Talk + User + User talk + enmath + enmath talk + File + File talk + MediaWiki + MediaWiki talk + Template + Template talk + Help + Help talk + Category + Category talk + Module + Module talk + + + + Van der Waerden's theorem + 0 + 2458 + + 2459 + 2013-11-12T18:20:53Z + + 2620:101:F000:700:21E:C2FF:FEAB:82AE + + corrected reference for result (previous was erroneously to Brown article which is just a survey) + wikitext + text/x-wiki + '''Van der Waerden's theorem''' is a theorem in the branch of [[mathematics]] called [[Ramsey theory]]. Van der Waerden's theorem states that for any given positive [[integer]]s ''r'' and ''k'', there is some number ''N'' such that if the integers {1, 2, ..., ''N''} are colored, each with one of ''r'' different colors, then there are at least ''k'' integers in [[arithmetic progression]] all of the same color. The least such ''N'' is the [[Van der Waerden number]] ''W''(''r'',&nbsp;''k''). It is named after the Dutch mathematician [[Bartel Leendert van der Waerden|B. L. van der Waerden]].<ref>{{cite journal |authorlink=Bartel Leendert van der Waerden |first=B. L. |last=van der Waerden |title={{lang|de|Beweis einer Baudetschen Vermutung}} |journal=Nieuw. Arch. Wisk. |volume=15 |year=1927 |issue= |pages=212–216 }}</ref> + +For example, when ''r'' = 2, you have two colors, say <span style="color:red;">red</span> and <span style="color:blue;">blue</span>. ''W''(2, 3) is bigger than 8, because you can color the integers from {1, ..., 8} like this: +{|class="wikitable" +|&nbsp;1&nbsp; +|&nbsp;2&nbsp; +|&nbsp;3&nbsp; +|&nbsp;4&nbsp; +|&nbsp;5&nbsp; +|&nbsp;6&nbsp; +|&nbsp;7&nbsp; +|&nbsp;8&nbsp; +|- +|&nbsp;'''<span style="color:blue;">B</span>'''&nbsp; +|&nbsp;'''<span style="color:red;">R</span>'''&nbsp; +|&nbsp;'''<span style="color:red;">R</span>'''&nbsp; +|&nbsp;'''<span style="color:blue;">B</span>'''&nbsp; +|&nbsp;'''<span style="color:blue;">B</span>'''&nbsp; +|&nbsp;'''<span style="color:red;">R</span>'''&nbsp; +|&nbsp;'''<span style="color:red;">R</span>'''&nbsp; +|&nbsp;'''<span style="color:blue;">B</span>'''&nbsp; +|} + +and no three integers of the same color form an [[arithmetic progression]]. But you can't add a ninth integer to the end without creating such a progression. If you add a <span style="color:red;">red 9</span>, then the <span style="color:red;">red 3</span>, <span style="color:red;">6</span>, and <span style="color:red;">9</span> are in arithmetic progression. Alternatively, if you add a <span style="color:blue;">blue 9</span>, then the <span style="color:blue;">blue 1</span>, <span style="color:blue;">5</span>, and <span style="color:blue;">9</span> are in arithmetic progression. In fact, there is no way of coloring 1 through 9 without creating such a progression. Therefore, ''W''(2, 3) is 9. + +It is an open problem to determine the values of ''W''(''r'', ''k'') for most values of ''r'' and ''k''. The proof of the theorem provides only an upper bound. 
For the case of ''r'' = 2 and ''k'' = 3, for example, the argument given below shows that it is sufficient to color the integers {1, ..., 325} with two colors to guarantee there will be a single-colored arithmetic progression of length 3. But in fact, the bound of 325 is very loose; the minimum required number of integers is only 9. Any coloring of the integers {1, ..., 9} will have three evenly spaced integers of one color. + +For ''r'' = 3 and ''k'' = 3, the bound given by the theorem is 7(2·3<sup>7</sup>&nbsp;+&nbsp;1)(2·3<sup>7·(2·3<sup>7</sup>&nbsp;+&nbsp;1)</sup>&nbsp;+&nbsp;1), or approximately 4.22·10<sup>14616</sup>. But actually, you don't need that many integers to guarantee a single-colored progression of length 3; you only need 27. (And it is possible to color {1, ..., 26} with three colors so that there is no single-colored arithmetic progression of length 3; for example, RRYYRRYBYBBRBRRYRYYBRBBYBY.) + +Anyone who can reduce the general upper bound to any 'reasonable' function can win a large cash prize. [[Ronald Graham]] has offered a prize of [[US$]]1000 for showing ''W''(2,''k'')&lt;2<sup>''k''<sup>2</sup></sup>.<ref>{{cite journal |authorlink=Ronald Graham |first=Ron |last=Graham |title=Some of My Favorite Problems in Ramsey Theory |journal=INTEGERS (The Electronic Journal of Combinatorial Number Theory |url=http://www.integers-ejcnt.org/vol7-2.html |volume=7 |issue=2 |year=2007 |pages=#A2 }}</ref> The best upper bound currently known is due to [[Timothy Gowers]],<ref>{{cite journal |authorlink=Timothy Gowers |first=Timothy |last=Gowers |title=A new proof of Szemerédi's theorem |journal=Geom. Funct. Anal. |volume=11 |issue=3 |pages=465–588 |year=2001 |url=http://www.dpmms.cam.ac.uk/~wtg10/papers.html |doi=10.1007/s00039-001-0332-9 }}</ref> who establishes + +: <math>W(r,k) \leq 2^{2^{r^{2^{2^{k + 9}}}}},</math> + +by first establishing a similar result for [[Szemerédi's theorem]], which is a stronger version of Van der Waerden's theorem. The previously best-known bound was due to [[Saharon Shelah]] and proceeded via first proving a result for the [[Hales&ndash;Jewett theorem]], which is another strengthening of Van der Waerden's theorem. + +The best lower bound currently known for <math>W(2, k)</math> is that for all positive <math>\varepsilon</math> we have <math>W(2, k) > 2^k/k^\varepsilon</math>, for all sufficiently large <math>k</math>.<ref>{{cite journal |authorlink=Zoltan Szábo|first=Zoltán |last=Szabó |title=An application of Lovász' local lemma -- a new lower bound for the van der Waerden number |journal=Random Struct. Algorithms |volume=1 | issue = 3 |pages=343-360 |year=1990 }}</ref> + +== Proof of Van der Waerden's theorem (in a special case) == + +The following proof is due to [[Ronald Graham|Ron Graham]] and B.L. Rothschild.<ref name="Graham1974">{{cite journal |authorlink=Ronald Graham |first=R. L. |last=Graham |first2=B. L. |last2=Rothschild |title=A short proof of van der Waerden's theorem on arithmetic progressions |journal=Proc. American Math. Soc. |volume=42 |issue=2 |year=1974 |pages=385–386 |doi=10.1090/S0002-9939-1974-0329917-8 }}</ref> [[A. Ya. Khinchin|Khinchin]]<ref>{{Cite document + | last1 = Khinchin | first1 = A. Ya. + | title = Three Pearls of Number Theory + | publisher = Dover + | location = Mineola, NY + | date = 1998 + | isbn = 978-0-486-40026-6 + | postscript = .}} +</ref> gives a fairly simple proof of the theorem without estimating ''W''(''r'',&nbsp;''k''). + +We will prove the special case mentioned above, that ''W''(2, 3) ≤ 325. 
Let ''c''(''n'') be a coloring of the integers {1, ..., 325}. We will find three elements of {1, ..., 325} in arithmetic progression that are the same color. + +Divide {1, ..., 325} into the 65 blocks {1, ..., 5}, {6, ..., 10}, ... {321, ..., 325}, thus each block is of the form {''b'' ·5 + 1, ..., ''b'' ·5 + 5} for some ''b'' in {0, ..., 64}. Since each integer is colored either red or blue, each block is colored in one of 32 different ways. By the [[pigeonhole principle]], there are two blocks among the first 33 blocks that are colored identically. That is, there are two integers ''b''<sub>1</sub> and ''b''<sub>2</sub>, both in {0,...,32}, such that + +: ''c''(''b''<sub>1</sub>&middot;5 + ''k'') = ''c''(''b''<sub>2</sub>&middot;5 + ''k'') + +for all ''k'' in {1, ..., 5}. Among the three integers ''b''<sub>1</sub>·5 + 1, ''b''<sub>1</sub>·5 + 2, ''b''<sub>1</sub>·5 + 3, there must be at least two that are the same color. (The [[pigeonhole principle]] again.) Call these ''b''<sub>1</sub>·5 + ''a''<sub>1</sub> and ''b''<sub>1</sub>·5 + ''a''<sub>2</sub>, where the ''a''<sub>''i''</sub> are in {1,2,3} and ''a''<sub>1</sub> &lt; ''a''<sub>2</sub>. Suppose (without loss of generality) that these two integers are both red. (If they are both blue, just exchange 'red' and 'blue' in what follows.) + +Let ''a''<sub>3</sub> = 2·''a''<sub>2</sub>&nbsp;&minus;&nbsp;''a''<sub>1</sub>. If ''b''<sub>1</sub>·5 + ''a''<sub>3</sub> is red, then we have found our arithmetic progression: ''b''<sub>1</sub>·5&nbsp;+&nbsp;''a''<sub>''i''</sub> are all red. + +Otherwise, ''b''<sub>1</sub>·5 + ''a''<sub>3</sub> is blue. Since ''a''<sub>3</sub> ≤ 5, ''b''<sub>1</sub>·5 + ''a''<sub>3</sub> is in the ''b''<sub>1</sub> block, and since the ''b''<sub>2</sub> block is colored identically, ''b''<sub>2</sub>·5 + ''a''<sub>3</sub> is also blue. + +Now let ''b''<sub>3</sub> = 2·''b''<sub>2</sub>&nbsp;&minus;&nbsp;''b''<sub>1</sub>. Then ''b''<sub>3</sub> ≤ 64. Consider the integer ''b''<sub>3</sub>·5 + ''a''<sub>3</sub>, which must be ≤ 325. What color is it? + +If it is red, then ''b''<sub>1</sub>·5 + ''a''<sub>1</sub>, ''b''<sub>2</sub>·5 + ''a''<sub>2</sub>, and ''b''<sub>3</sub>·5 + ''a''<sub>3</sub> form a red arithmetic progression. But if it is blue, then ''b''<sub>1</sub>·5 + ''a''<sub>3</sub>, ''b''<sub>2</sub>·5 + ''a''<sub>3</sub>, and ''b''<sub>3</sub>·5 + ''a''<sub>3</sub> form a blue arithmetic progression. Either way, we are done. + +A similar argument can be advanced to show that ''W''(3, 3) ≤ 7(2·3<sup>7</sup>+1)(2·3<sup>7·(2·3<sup>7</sup>+1)</sup>+1). One begins by dividing the integers into 2·3<sup>7·(2·3<sup>7</sup>&nbsp;+&nbsp;1)</sup>&nbsp;+&nbsp;1 groups of 7(2·3<sup>7</sup>&nbsp;+&nbsp;1) integers each; of the first 3<sup>7·(2·3<sup>7</sup>&nbsp;+&nbsp;1)</sup>&nbsp;+&nbsp;1 groups, two must be colored identically. + +Divide each of these two groups into 2·3<sup>7</sup>+1 subgroups of 7 integers each; of the first 3<sup>7</sup>&nbsp;+&nbsp;1 subgroups in each group, two of the subgroups must be colored identically. Within each of these identical subgroups, two of the first four integers must be the same color, say red; this implies either a red progression or an element of a different color, say blue, in the same subgroup. + +Since we have two identically-colored subgroups, there is a third subgroup, still in the same group that contains an element which, if either red or blue, would complete a red or blue progression, by a construction analogous to the one for ''W''(2, 3). 
Suppose that this element is yellow. Since there is a second group that is colored identically, it must contain copies of the red, blue, and yellow elements we have identified; we can now find a pair of red elements, a pair of blue elements, and a pair of yellow elements that 'focus' on the same integer, so that whatever color it is, it must complete a progression.

The proof for ''W''(2, 3) depends essentially on proving that ''W''(32, 2) ≤ 33. We divide the integers {1,...,325} into 65 'blocks', each of which can be colored in 32 different ways, and then show that two blocks among the first 33 must be colored identically, and that there is a block colored the opposite way. Similarly, the proof for ''W''(3, 3) depends on proving that

: <math>W(3^{7(2 \cdot 3^7+1)},2) \leq 3^{7(2 \cdot 3^7+1)}+1.</math>

By a double [[mathematical induction|induction]] on the number of colors and the length of the progression, the theorem is proved in general.

== Proof ==

A [[Generalized arithmetic progression|''D-dimensional arithmetic progression'']] consists of numbers of the form:
::<math> a + i_1 s_1 + i_2 s_2 + \cdots + i_D s_D </math>
where a is the base point, the s's are the different step sizes, and the i's range from 0 to L-1. A D-dimensional AP is ''homogenous'' for some coloring when it is all the same color.

A ''D-dimensional arithmetic progression with benefits'' is all numbers of the form above, but where you add on some of the "boundary" of the arithmetic progression, i.e. some of the indices i's can be equal to L. The sides you tack on are those where the first k i's are equal to L, and the remaining i's are less than L.

The boundaries of a D-dimensional AP with benefits are these additional arithmetic progressions of dimension D-1, D-2, D-3, ..., down to 0. The 0-dimensional arithmetic progression is the single point at index value (L,L,L,...,L). A D-dimensional AP with benefits is ''homogenous'' when each of the boundaries is individually homogenous, but different boundaries do not necessarily have to have the same color.

Next define the quantity MinN(L, D, N) to be the least integer such that any assignment of N colors to an interval of length MinN or more necessarily contains a homogenous D-dimensional arithmetic progression with benefits.

The goal is to bound the size of MinN. Note that MinN(L,1,N) is an upper bound for the van der Waerden number ''W''(''N'', ''L''). There are two induction steps, as follows:

1. Assume MinN is known for a given length L for all dimensions of arithmetic progressions with benefits up to D. This formula gives a bound on MinN when you increase the dimension to D+1:

Let <math> M = {\mathrm MinN}(L,D,n)</math>. Then

::<math> {\mathrm MinN}(L, D+1 , n) \le M \cdot {\mathrm MinN}(L,1,n^M)</math>

Proof:
First, if you have an n-coloring of the interval 1...I, you can define a ''block coloring'' of k-size blocks: just consider each sequence of k colors in each k-block to define a unique color. Call this ''k-blocking'' an n-coloring. k-blocking an n-coloring of length l produces an n^k coloring of length l/k.

So given an n-coloring of an interval I of size M·MinN(L,1,n^M), you can M-block it into an n^M coloring of length MinN(L,1,n^M). But that means, by the definition of MinN, that you can find a 1-dimensional arithmetic sequence (with benefits) of length L in the block coloring, which is a sequence of blocks equally spaced, which are all the same block-color, i.e.
you have a bunch of blocks of length M in the original sequence, which are equally spaced, which have exactly the same sequence of colors inside. + +Now, by the definition of M, you can find a d-dimensional arithmetic sequence with benefits in any one of these blocks, and since all of the blocks have the same sequence of colors, the same d-dimensional AP with benefits appears in all of the blocks, just by translating it from block to block. This is the definition of a d+1 dimensional arithmetic progression, so you have a homogenous d+1 dimensional AP. The new stride parameter s_{D+1} is defined to be the distance between the blocks. + +But you need benefits. The boundaries you get now are all old boundaries, plus their translations into identically colored blocks, because i_{D+1} is always less than L. The only boundary which is not like this is the 0 dimensional point when <math>i_1=i_2=...=i_{D+1}=L</math>. This is a single point, and is automatically homogenous. + +2. Assume MinN is known for one value of L and all possible dimensions D. Then you can bound MinN for length L+1. + +::<math>{\mathrm MinN}(L+1,D,n) \le 2{\mathrm MinN}(L,n,n)</math> + +proof: +Given an n-coloring of an interval of size MinN(L,n,n), by definition, you can find an arithmetic sequence with benefits of dimension n of length L. But now, the number of "benefit" boundaries is equal to the number of colors, so one of the homogenous boundaries, say of dimension k, has to have the same color as another one of the homogenous benefit boundaries, say the one of dimension p<k. This allows a length L+1 arithmetic sequence (of dimension 1) to be constructed, by going along a line inside the k-dimensional boundary which ends right on the p-dimensional boundary, and including the terminal point in the p-dimensional boundary. In formulas: + +if +::<math> a+ L s_1 +L s_2... + L s_{D-k}</math> has the same color as +::<math> a + L s_1 +L s_2 ... +L s_{D-p}</math> +then +::<math> a + L*(s_1 ... +s_{D-k}) + u *(s_{D-k+1} ... +s_p) </math> have the same color +::<math> u = 0,1,2,...,L-1,L </math> i.e. u makes a sequence of length L+1. + +This constructs a sequence of dimension 1, and the "benefits" are automatic, just add on another point of whatever color. To include this boundary point, one has to make the interval longer by the maximum possible value of the stride, which is certainly less than the interval size. So doubling the interval size will definitely work, and this is the reason for the factor of two. This completes the induction on L. + +Base case: MinN(1,d,n)=1, i.e. if you want a length 1 homogenous d-dimensional arithmetic sequence, with or without benefits, you have nothing to do. So this forms the base of the induction. 
The VanDerWaerden theorem itself is the assertion that MinN(L,1,N) is finite, and it follows from the base case and the induction steps.<ref name="Graham1974" /> + +==See also== +* [[Van der Waerden number]]s for all known values for ''W''(''n'',''r'') and the best-known bounds for unknown values + +==References== +{{reflist}} + +==External links== +* [http://www.math.uga.edu/~lyall/REU/ramsey.pdf Proof of Van der Waerden's theorem] + +[[Category:Ramsey theory]] +[[Category:Theorems in discrete mathematics]] +[[Category:Articles containing proofs]] + a97js29b6j82oed48e6wr1l6qm3s2ud + + + + Bounded variation + 0 + 4049 + + 4050 + 2014-01-01T19:15:08Z + + Daniele.tampieri + 0 + + + /* References */ Abridged a web link + wikitext + text/x-wiki + {{Use dmy dates|date=June 2013}} +In [[mathematical analysis]], a function of '''bounded variation''', also known as a '''BV function''', is a [[real number|real]]-valued [[function (mathematics)|function]] whose [[total variation]] is bounded (finite): the [[graph of a function]] having this property is well behaved in a precise sense. For a [[continuous function]] of a single [[Variable (mathematics)|variable]], being of bounded variation means that the [[distance]] along the [[Direction (geometry, geography)|direction]] of the [[y-axis|''y''-axis]], neglecting the contribution of motion along [[x-axis|''x''-axis]], traveled by a [[point (mathematics)|point]] moving along the graph has a finite value. For a continuous function of several variables, the meaning of the definition is the same, except for the fact that the continuous path to be considered cannot be the whole graph of the given function (which is a [[Glossary of differential geometry and topology#H|hypersurface]] in this case), but can be every [[Intersection (set theory)|intersection]] of the graph itself with a [[hyperplane]] (in the case of functions of two variables, a [[Plane (mathematics)|plane]]) parallel to a fixed ''x''-axis and to the ''y''-axis. + +Functions of bounded variation are precisely those with respect to which one may find [[Riemann&ndash;Stieltjes integral]]s of all continuous functions. + +Another characterization states that the functions of bounded variation on a closed interval are exactly those ''f'' which can be written as a difference ''g''&nbsp;&minus;&nbsp;''h'', where both ''g'' and ''h'' are bounded [[monotonic function|monotone]]. + +In the case of several variables, a function ''f'' defined on an [[open subset]] '''<math> \Omega </math>''' of ℝ''<sup>n</sup>'' is said to have bounded variation if its [[distribution (mathematics)|distributional derivative]] is a finite [[Vector-valued function|vector]] [[Radon measure]]. + +One of the most important aspects of functions of bounded variation is that they form an [[Associative algebra|algebra]] of [[continuous function|discontinuous functions]] whose first derivative exists [[almost everywhere]]: due to this fact, they can and frequently are used to define [[generalized solution]]s of nonlinear problems involving [[functional (mathematics)|functional]]s, [[ordinary differential equation|ordinary]] and [[partial differential equation]]s in [[mathematics]], [[physics]] and [[engineering]]. 
Considering the problem of [[Distribution_(mathematics)#Problem_of_multiplication|multiplication of distributions]], or more generally the problem of defining general nonlinear operations on [[generalized function]]s, ''functions of bounded variation form the smallest [[Algebra over a field|algebra]] which has to be embedded in every space of [[generalized function]]s preserving the result of [[multiplication]]''.

==History==
According to Boris Golubov, ''BV'' functions of a single variable were first introduced by [[Camille Jordan]], in the paper {{Harv|Jordan|1881}} dealing with the convergence of [[Fourier series]]. The first successful step in the generalization of this concept to functions of several variables was due to [[Leonida Tonelli]],<ref>[[Leonida Tonelli|Tonelli]] introduced what is now called after him '''Tonelli plane variation''': for an analysis of this concept and its relations to other generalizations, see the entry "[[Total variation]]".</ref> who introduced a class of ''[[continuous function|continuous]]'' ''BV'' functions in 1926 {{Harv|Cesari|1986|pp=47–48}}, to extend his [[Direct method in the calculus of variations|direct method]] for finding solutions to problems in the [[calculus of variations]] in more than one variable. Ten years later, in {{Harv|Cesari|1936}}, [[Lamberto Cesari]] ''changed the [[continuous function|continuity]] requirement'' in Tonelli's definition ''to a less restrictive [[integral|integrability]] requirement'', obtaining for the first time the class of functions of bounded variation of several variables in its full generality: as Jordan did before him, he applied the concept to solve a problem concerning the convergence of [[Fourier series]], but for functions of ''two variables''. After him, several authors applied ''BV'' functions to study [[Fourier series]] in several variables, [[geometric measure theory]], the [[calculus of variations]], and [[mathematical physics]]. [[Renato Caccioppoli]] and [[Ennio de Giorgi]] used them to define the [[measure theory|measure]] of [[smooth function|nonsmooth]] [[boundary (topology)|boundaries]] of [[Set (mathematics)|sets]] (see the entry "''[[Caccioppoli set]]''" for further information). [[Olga Arsenievna Oleinik]] introduced her view of [[generalized solution]]s for [[nonlinear]] [[partial differential equation]]s as functions from the space ''BV'' in the paper {{Harv|Oleinik|1957}}, and was able to construct a generalized solution of bounded variation of a first-order partial differential equation in the paper {{Harv|Oleinik|1959}}: a few years later, [[Edward D. Conway]] and [[Joel A. Smoller]] applied ''BV'' functions to the study of a single [[hyperbolic equation|nonlinear hyperbolic partial differential equation]] of [[First-order partial differential equation|first order]] in the paper {{Harv|Conway|Smoller|1966}}, proving that the solution of the [[Cauchy problem]] for such equations is a function of bounded variation, provided the [[Cauchy boundary condition|initial value]] belongs to the same class. [[Aizik Isaakovich Vol'pert]] extensively developed a calculus for ''BV'' functions: in the paper {{Harv|Vol'pert|1967}} he proved the [[Bounded variation#Chain rule for BV functions|chain rule for BV functions]], and in the book {{Harv|Hudjaev|Vol'pert|1985}} he, jointly with his pupil [[Sergei Ivanovich Hudjaev]], explored extensively the properties of ''BV'' functions and their applications.
His chain rule formula was later extended by [[Luigi Ambrosio]] and [[Gianni Dal Maso]] in the paper {{Harv|Ambrosio|Dal Maso|1990}}. + +==Formal definition== + +=== ''BV'' functions of one variable === +{{EquationRef|1|Definition 1.1.}} The '''total variation'''<ref name="Tvar">See the entry "[[Total variation]]" for further details and more information.</ref> of a [[real number|real]]-valued (or more generally [[complex number|complex]]-valued) [[function (mathematics)|function]] ''f'', defined on an [[interval (mathematics)|interval]] [''a'', ''b'']⊂ℝ is the quantity + +:<math> V^a_b(f)=\sup_{P \in \mathcal{P}} \sum_{i=0}^{n_{P}-1} | f(x_{i+1})-f(x_i) |. \,</math> + +where the [[supremum]] is taken over the set <math> \scriptstyle \mathcal{P} =\left\{P=\{ x_0, \dots , x_{n_P}\}|P\text{ is a partition of } [a, b] \right\} </math> of all [[partition of an interval|partitions]] of the interval considered. + +If ''f'' is [[derivative|differentiable]] and its derivative is Riemann-integrable, its total variation is the vertical component of the [[arc length|arc-length]] of its graph, that is to say, + +:<math> V^a_b(f) = \int _a^b |f'(x)|\,\mathrm{d}x.</math> + +{{EquationRef|2|Definition 1.2.}} A real-valued function <math> f </math> on the [[real line]] is said to be of '''bounded variation''' ('''BV function''') on a chosen [[interval (mathematics)|interval]] [''a'', ''b'']⊂ℝ if its total variation is finite, ''i.e.'' +:<math> f \in BV([a,b]) \iff V^a_b(f) < +\infty </math> + +It can be proved that a real function ''ƒ'' is of bounded variation in an interval if and only if it can be written as the difference ''ƒ'' = ''ƒ''<sub>1</sub> − ''ƒ''<sub>2</sub> of two non-decreasing functions: this result is known as the [[Hahn decomposition_theorem#Hahn–Jordan decomposition|Jordan decomposition]]. + +Through the [[Stieltjes integral]], any function of bounded variation on a closed interval [''a'', ''b''] defines a [[bounded linear functional]] on ''C''([''a'', ''b'']). In this special case,<ref>See for example {{harvtxt|Kolmogorov|Fomin|1969|pp=374–376}}.</ref> the [[Riesz representation theorem]] states that every bounded linear functional arises uniquely in this way. The normalised positive functionals or [[probability measure]]s correspond to positive non-decreasing lower [[semicontinuous function]]s. This point of view has been important in +[[spectral theory]],<ref>For a general reference on this topic, see {{harvtxt|Riesz|Szőkefalvi-Nagy|1990}}</ref> in particular in its application to [[spectral theory of ordinary differential equations|ordinary differential equations]]. + +===''BV'' functions of several variables=== +Functions of bounded variation, BV [[function (mathematics)|functions]], are functions whose distributional [[directional derivative|derivative]] is a [[Wikt:finite|finite]]<ref>In this context, "finite" means that its value is never [[Infinity|infinite]], i.e. it is a [[finite measure]].</ref> [[Radon measure]]. More precisely: + +{{EquationRef|3|Definition 2.1.}} Let '''<math> \Omega </math>''' be an [[open subset]] of ℝ''<sup>n</sup>''. 
A function '''<math> u </math>''' belonging to '''[[Lp space|<math>L^1(\Omega)</math>]]''' is said of '''bounded variation''' ('''BV function'''), and written + +:<math> u\in BV(\Omega)</math> + +if there exists a [[Finite measure|finite]] [[Vector valued function|vector]] [[Radon measure]] <math> \scriptstyle Du\in\mathcal M(\Omega,\mathbb{R}^n)</math> such that the following equality holds + +:<math> +\int_\Omega u(x)\,\mathrm{div}\boldsymbol{\phi}(x)\mathrm{d}x = - \int_\Omega \langle\boldsymbol{\phi}, Du(x)\rangle +\qquad \forall\boldsymbol{\phi}\in C_c^1(\Omega,\mathbb{R}^n) +</math> + +that is, '''<math>u</math>''' defines a [[linear functional]] on the space <math> \scriptstyle C_c^1(\Omega,\mathbb{R}^n)</math> of [[Smooth function|continuously differentiable]] [[Vector valued function|vector functions]] <math> \scriptstyle\boldsymbol{\phi} </math> of [[support (mathematics)#Compact support|compact support]] contained in '''<math> \Omega </math>''': the vector [[measure (mathematics)|measure]] '''<math>Du</math>''' represents therefore the [[Distribution (mathematics)#Formal definition|distributional]] or [[weak derivative|weak]] [[gradient]] of '''<math>u</math>'''. + +An equivalent definition is the following. + +{{EquationRef|4|Definition 2.2.}} Given a function '''<math>u</math>''' belonging to '''<math>L^1(\Omega)</math>''', the '''total variation of <math>u</math>'''<ref name="Tvar"/> in <math>\Omega</math> is defined as + +:<math> V(u,\Omega):=\sup\left\{\int_\Omega u(x)\mathrm{div}\boldsymbol{\phi}(x)\mathrm{d}x\colon\boldsymbol{\phi}\in C_c^1(\Omega,\mathbb{R}^n),\ \Vert\boldsymbol{\phi}\Vert_{L^\infty(\Omega)}\le 1\right\}</math> + +where <math> \scriptstyle \Vert\;\Vert_{L^\infty(\Omega)}</math> is the [[essential supremum]] [[Norm (mathematics)|norm]]. Sometimes, especially in the theory of [[Caccioppoli set]]s, the following notation is used + +:<math>\int_\Omega\vert D u\vert = V(u,\Omega)</math> + +in order to emphasize that <math>V(u,\Omega)</math> is the total variation of the [[Distribution (mathematics)#Formal definition|distributional]] / [[weak derivative|weak]] [[gradient]] of '''<math>u</math>'''. This notation reminds also that if '''<math>u</math>''' is of class '''<math>C^1</math>''' (i.e. a [[continuous function|continuous]] and [[differentiable function]] having [[continuous function|continuous]] [[derivative]]s) then its [[Total variation|variation]] is exactly the [[Integral (measure theory)|integral]] of the [[absolute value]] of its [[gradient]]. + +The space of '''functions of bounded variation''' ('''BV functions''') can then be defined as + +:<math> BV(\Omega)=\{ u\in L^1(\Omega)\colon V(u,\Omega)<+\infty\}</math> + +The two definitions are equivalent since if <math> \scriptstyle V(u,\Omega)<+\infty </math> then + +:<math>\left|\int_\Omega u(x)\,\mathrm{div}\boldsymbol{\phi}(x)\mathrm{d}x \right |\leq V(u,\Omega)\Vert\boldsymbol{\phi}\Vert_{L^\infty(\Omega)} +\qquad \forall \boldsymbol{\phi}\in C_c^1(\Omega,\mathbb{R}^n) +</math> + +therefore <math>\scriptstyle \int_\Omega u(x)\,\mathrm{div}\boldsymbol{\phi}(x)</math> defines a [[continuous linear functional]] on +the space <math>\scriptstyle C_c^1(\Omega,\mathbb{R}^n)</math>. 
Since <math>\scriptstyle C_c^1(\Omega,\mathbb{R}^n) +\subset C^0(\Omega,\mathbb{R}^n)</math> as a [[linear subspace]], this [[continuous linear functional]] can be extended [[continuous function|continuously]] and [[linearity|linearily]] to the whole <math>\scriptstyle C^0(\Omega,\mathbb{R}^n)</math> by the [[Hahn–Banach theorem]] i.e. it defines a [[Radon_measure#Duality|Radon measure]]. + +===Locally ''BV'' functions=== +If the [[function space]] of [[locally integrable function]]s, i.e. [[Function (mathematics)|function]]s belonging to <math>\scriptstyle L^1_{loc}(\Omega)</math>, is considered in the preceding definitions {{EquationNote|2|1.2}}, {{EquationNote|3|2.1}} and {{EquationNote|4|2.2}} instead of the one of [[integrable function|globally integrable functions]], then the function space defined is that of '''functions of locally bounded variation'''. Precisely, developing this idea for {{EquationNote|4|definition 2.2}}, a '''[[local property|local]] variation''' is defined as follows, +:<math> V(u,U):=\sup\left\{\int_\Omega u(x)\mathrm{div}\boldsymbol{\phi}(x)\mathrm{d}x\colon\boldsymbol{\phi}\in C_c^1(U,\mathbb{R}^n),\ \Vert\boldsymbol{\phi}\Vert_{L^\infty(\Omega)}\le 1\right\}</math> +for every [[Set (mathematics)|set]] <math>\scriptstyle U\in\mathcal{O}_c(\Omega)</math>, having defined <math>\scriptstyle \mathcal{O}_c(\Omega)</math> as the set of all [[Relatively compact subspace|precompact]] [[open subset]]s of '''<math>\Omega</math>''' with respect to the standard [[topology]] of [[dimension (mathematics)|finite dimensional]] [[vector space]]s, and correspondingly the class of functions of locally bounded variation is defined as +:<math>BV_{loc}(\Omega)=\{ u\in L^1_{loc}(\Omega)\colon V(u,U)<+\infty\; \forall U\in\mathcal{O}_c(\Omega)\}</math> + +===Notation=== +There are basically two distinct conventions for the notation of spaces of functions of locally or globally bounded variation, and unfortunately they are quite similar: the first one, which is the one adopted in this entry, is used for example in references {{Harvtxt|Giusti|1984}} (partially), {{Harvtxt|Hudjaev|Vol'pert|1985}} (partially), {{Harvtxt|Giaquinta|Modica|Souček|1998}} and is the following one +*<math>\scriptstyle BV(\Omega)</math> identifies the [[Space (mathematics)|space]] of functions of globally bounded variation +*<math>\scriptstyle BV_{loc}(\Omega)</math> identifies the [[Space (mathematics)|space]] of functions of locally bounded variation +The second one, which is adopted in references {{Harvtxt|Vol'pert|1967}} and {{Harvtxt|Maz'ya|1985}} (partially), is the following: +*<math>\scriptstyle \overline{BV}(\Omega)</math> identifies the [[Space (mathematics)|space]] of functions of globally bounded variation +*<math>\scriptstyle BV(\Omega)</math> identifies the [[Space (mathematics)|space]] of functions of locally bounded variation + +==Basic properties== +Only the properties common to [[Function (mathematics)|function]]s of one variable and to [[Function (mathematics)|function]]s of several variables will be considered in the following, and [[Mathematical proof|proof]]s will be carried on only for functions of several variables since the [[Mathematical proof|proof]] for the case of one variable is a straightforward adaptation of the several variables case: also, in each section it will be stated if the property is shared also by functions of locally bounded variation or not. 
References {{Harv|Giusti|1984|pp=7–9}}, {{Harv|Hudjaev|Vol'pert|1985}} and {{Harv|Màlek|Nečas|Rokyta|Růžička|1996}} are extensively used. + +===''BV'' functions have only jump-type discontinuities=== +In the case of one variable, the assertion is clear: for each point <math>x_0</math> in the [[interval (mathematics)|interval]] <math>[a , b]</math>⊂ℝ of definition of the function '''<math>u</math>''', either one of the following two assertions is true + +:<math> \lim_{x\rightarrow x_{0^-}}\!\!\!u(x) = \!\!\!\lim_{x\rightarrow x_{0^+}}\!\!\!u(x) </math> +:<math> \lim_{x\rightarrow x_{0^-}}\!\!\!u(x) \neq \!\!\!\lim_{x\rightarrow x_{0^+}}\!\!\!u(x) </math> + +while both [[Limit of a function|limits]] exist and are finite. In the case of functions of several variables, there are some premises to understand: first of all, there is a [[Linear continuum|continuum]] of [[Direction (geometry, geography)|direction]]s along which it is possible to approach a given point '''<math>x_0</math>''' belonging to the domain '''<math>\Omega</math>'''⊂ℝ''<sup>n</sup>''. It is necessary to make precise a suitable concept of [[Limit of a function|limit]]: choosing a [[unit vector]] <math>\scriptstyle{\boldsymbol{\hat{a}}}\in\mathbb{R}^n</math> it is possible to divide '''<math>\Omega</math>''' in two sets + +:<math>\Omega_{({\boldsymbol{\hat{a}}},\boldsymbol{x}_0)} = \Omega \cap \{\boldsymbol{x}\in\mathbb{R}^n|\langle\boldsymbol{x}-\boldsymbol{x}_0,{\boldsymbol{\hat{a}}}\rangle>0\} \qquad +\Omega_{(-{\boldsymbol{\hat{a}}},\boldsymbol{x}_0)} = \Omega \cap \{\boldsymbol{x}\in\mathbb{R}^n|\langle\boldsymbol{x}-\boldsymbol{x}_0,-{\boldsymbol{\hat{a}}}\rangle>0\} </math> + +Then for each point '''<math>x_0</math>''' belonging to the domain <math>\scriptstyle\Omega\in\mathbb{R}^n</math> of the ''BV'' function '''<math>u</math>''', only one of the following two assertions is true + +:<math> \lim_{\overset{\boldsymbol{x}\rightarrow \boldsymbol{x}_0}{\boldsymbol{x}\in\Omega_{({\boldsymbol{\hat{a}}},\boldsymbol{x}_0)}}}\!\!\!\!\!\!u(\boldsymbol{x}) = \!\!\!\!\!\!\!\lim_{\overset{\boldsymbol{x}\rightarrow \boldsymbol{x}_0}{\boldsymbol{x}\in\Omega_{(-{\boldsymbol{\hat{a}}},\boldsymbol{x}_0)}}}\!\!\!\!\!\!\!u(\boldsymbol{x}) +</math> +:<math> \lim_{\overset{\boldsymbol{x}\rightarrow \boldsymbol{x}_0}{\boldsymbol{x}\in\Omega_{({\boldsymbol{\hat{a}}},\boldsymbol{x}_0)}}}\!\!\!\!\!\!u(\boldsymbol{x}) \neq \!\!\!\!\!\!\!\lim_{\overset{\boldsymbol{x}\rightarrow \boldsymbol{x}_0}{\boldsymbol{x}\in\Omega_{(-{\boldsymbol{\hat{a}}},\boldsymbol{x}_0)}}}\!\!\!\!\!\!\!u(\boldsymbol{x}) +</math> + +or '''<math>x_0</math>''' belongs to a [[subset]] of '''<math>\Omega</math>''' having zero <math>n-1</math>-dimensional [[Hausdorff measure]]. The quantities + +:<math>\lim_{\overset{\boldsymbol{x}\rightarrow \boldsymbol{x}_0}{\boldsymbol{x}\in\Omega_{({\boldsymbol{\hat{a}}},\boldsymbol{x}_0)}}}\!\!\!\!\!\!u(\boldsymbol{x})=u_{\boldsymbol{\hat a}}(\boldsymbol{x}_0) \qquad \lim_{\overset{\boldsymbol{x}\rightarrow \boldsymbol{x}_0}{\boldsymbol{x}\in\Omega_{(-{\boldsymbol{\hat{a}}},\boldsymbol{x}_0)}}}\!\!\!\!\!\!\!u(\boldsymbol{x})=u_{-\boldsymbol{\hat a}}(\boldsymbol{x}_0)</math> + +are called '''approximate limits''' of the ''BV'' function '''<math>u</math>''' at the point '''<math>x_0</math>'''. 
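
For example, let '''<math>\Omega</math>''' be the open unit [[Ball (mathematics)|disk]] of ℝ<sup>2</sup> and let '''<math>u</math>''' be the [[indicator function|characteristic function]] of the half-disk <math>\scriptstyle\{\boldsymbol{x}\in\Omega\,|\,x_1>0\}</math>: then <math>\scriptstyle u\in BV(\Omega)</math> and, at every point '''<math>x_0</math>''' of the segment <math>\scriptstyle\{x_1=0\}\cap\Omega</math>, the choice <math>\scriptstyle{\boldsymbol{\hat{a}}}=(1,0)</math> gives the approximate limits

:<math>u_{\boldsymbol{\hat a}}(\boldsymbol{x}_0)=1 \qquad u_{-\boldsymbol{\hat a}}(\boldsymbol{x}_0)=0</math>

so that '''<math>u</math>''' has a jump-type discontinuity at every point of the segment, a set of finite <math>1</math>-dimensional [[Hausdorff measure]].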
+ +===''V''(&middot;,&nbsp;&Omega;) is lower semi-continuous on ''BV''(&Omega;)=== +The [[functional (mathematics)|functional]] <math>\scriptstyle V(\cdot,\Omega):BV(\Omega)\rightarrow \mathbb{R}^+</math> is [[semi-continuity|lower semi-continuous]]: +to see this, choose a [[Cauchy sequence]] of ''BV''-functions '''<math>\scriptstyle\{u_n\}_{n\in\mathbb{N}}</math>''' converging to '''[[locally integrable function|<math>\scriptstyle u\in L^1_\text{loc}(\Omega)</math>]]'''. Then, since all the functions of the sequence and their limit function are [[integral|integrable]] and by the definition of [[lower limit]] + +:<math>\liminf_{n\rightarrow\infty}V(u_n,\Omega) = \liminf_{n\rightarrow\infty} \int_\Omega u_n(x)\,\mathrm{div}\, \boldsymbol{\phi}\, \mathrm{d}x \geq \int_\Omega \lim_{n\rightarrow\infty} u_n(x)\,\mathrm{div}\, \boldsymbol{\phi}\, \mathrm{d}x = \int_\Omega u(x)\,\mathrm{div}\boldsymbol{\phi}\, \mathrm{d}x \qquad\forall\boldsymbol{\phi}\in C_c^1(\Omega,\mathbb{R}^n),\quad\Vert\boldsymbol{\phi}\Vert_{L^\infty(\Omega)}\leq 1 </math> + +Now considering the [[supremum]] on the set of functions <math>\scriptstyle\boldsymbol{\phi}\in C_c^1(\Omega,\mathbb{R}^n)</math> such that <math>\scriptstyle \Vert\boldsymbol{\phi}\Vert_{L^\infty(\Omega)}\leq 1 </math> then the following inequality holds true + +:<math>\liminf_{n\rightarrow\infty}V(u_n,\Omega)\geq V(u,\Omega)</math> + +which is exactly the definition of [[semicontinuity|lower semicontinuity]]. + +===''BV''(&Omega;) is a Banach space=== +By definition '''<math>BV(\Omega)</math>''' is a [[subset]] of '''[[integrable function|<math>L^1(\Omega)</math>]]''', while [[linearity]] follows from the linearity properties of the defining [[integral]] i.e. + +:<math>\begin{align} +\int_\Omega [u(x)+v(x)]\,\mathrm{div}\boldsymbol{\phi}(x)\mathrm{d}x & = +\int_\Omega u(x)\,\mathrm{div}\boldsymbol{\phi}(x)\mathrm{d}x +\int_\Omega v(x)\,\mathrm{div}\boldsymbol{\phi}(x)\mathrm{d}x = \\ +& =- \int_\Omega \langle\boldsymbol{\phi}(x), Du(x)\rangle- \int_\Omega \langle \boldsymbol{\phi}(x), Dv(x)\rangle + =- \int_\Omega \langle \boldsymbol{\phi}(x), [Du(x)+Dv(x)]\rangle +\end{align} +</math> + +for all <math>\scriptstyle\phi\in C_c^1(\Omega,\mathbb{R}^n)</math> therefore <math>\scriptstyle u+v\in BV(\Omega)</math>for all <math>\scriptstyle u,v\in BV(\Omega)</math>, and + +:<math> +\int_\Omega c\cdot u(x)\,\mathrm{div}\boldsymbol{\phi}(x)\mathrm{d}x = +c\!\int_\Omega u(x)\,\mathrm{div}\boldsymbol{\phi}(x)\mathrm{d}x = +-c\! \int_\Omega \langle \boldsymbol{\phi}(x), Du(x)\rangle +</math> + +for all <math>\scriptstyle c\in\mathbb{R}</math>, therefore <math>\scriptstyle cu\in BV(\Omega)</math> for all <math>\scriptstyle u\in BV(\Omega)</math>, and all <math>\scriptstyle c\in\mathbb{R}</math>. The proved [[vector space]] properties imply that '''<math>BV(\Omega)</math>''' is a [[vector subspace]] of '''[[Lp space|<math>L^1(\Omega)</math>]]'''. Consider now the function <math>\scriptstyle\|\;\|_{BV}:BV(\Omega)\rightarrow\mathbb{R}^+</math> defined as + +:<math>\| u \|_{BV} := \| u \|_{L^1} + V(u,\Omega)</math> + +where <math>\scriptstyle\| \; \|_{L^1}</math> is the usual '''[[Lp space#Lp spaces|<math>L^1(\Omega)</math> norm]]''': it is easy to prove that this is a [[norm (mathematics)|norm]] on '''<math>BV(\Omega)</math>'''. To see that '''<math>BV(\Omega)</math>''' is [[completeness|complete]] respect to it, i.e. it is a [[Banach space]], consider a [[Cauchy sequence]] <math>\scriptstyle\{u_n\}_{n\in\mathbb{R}}</math> in '''<math>BV(\Omega)</math>'''. 
By definition it is also a [[Cauchy sequence]] in '''<math>L^1(\Omega)</math>''' and therefore has a [[limit of a sequence|limit]] '''<math>u</math>''' in '''<math>L^1(\Omega)</math>''': since '''<math>u_n</math>''' is bounded in '''<math>BV(\Omega)</math>''' for each '''<math>n</math>''', then <math>\scriptstyle \Vert u \Vert_{BV} < +\infty </math> by [[semicontinuity|lower semicontinuity]] of the variation <math>\scriptstyle V(\cdot,\Omega)</math>, therefore '''<math>u</math>''' is a ''BV'' function. Finally, again by lower semicontinuity, choosing an arbitrarily small positive number '''<math>\scriptstyle\varepsilon</math>'''

:<math>\Vert u_j - u_k \Vert_{BV}<\varepsilon\quad\forall j,k\geq N\in\mathbb{N} \quad\Rightarrow\quad V(u_k-u,\Omega)\leq \liminf_{j\rightarrow +\infty} V(u_k-u_j,\Omega)\leq\varepsilon</math>

===''BV''(&Omega;) is not separable===
To see this, it is sufficient to consider the following example belonging to the space '''<math>BV([0,1])</math>''':<ref>The example is taken from {{Harvtxt|Giaquinta|Modica|Souček|1998|p=331}}: see also {{harv|Kannan|Krueger|1996|loc=example 9.4.1, p. 237}}.</ref> for each 0<''&alpha;''<1 define
:<math>\chi_\alpha=\chi_{[\alpha,1]}=
\begin{cases} 0 & \mbox{if } x \notin\; [\alpha,1] \\
 1 & \mbox{if } x \in [\alpha,1]
\end{cases}
</math>
as the [[indicator function|characteristic function]] of the [[Interval (mathematics)#Terminology|left-closed interval]] <math>[\alpha,1]</math>. Then, choosing ''&alpha;'',''&beta;''∈<math>(0,1)</math> such that ''&alpha;''≠''&beta;'', the following relation holds true:
:<math>\Vert \chi_\alpha - \chi_\beta \Vert_{BV}=2+|\alpha-\beta|</math>
Now, in order to prove that no [[Dense set|dense subset]] of '''<math>BV([0,1])</math>''' can be [[countable set|countable]], it is sufficient to see that for every ''&alpha;''∈<math>(0,1)</math> it is possible to construct the [[Ball (mathematics)|ball]]s
:<math>B_\alpha=\left\{\psi\in BV([0,1]);\Vert \chi_\alpha - \psi \Vert_{BV}\leq 1\right\}</math>
Obviously those balls are [[Disjoint sets|pairwise disjoint]], and they form an [[indexed family]] of [[set (mathematics)|set]]s whose [[index set]] is <math>(0,1)</math>. This implies that this family has the [[cardinality of the continuum]]: now, since any dense subset of '''<math>BV([0,1])</math>''' must have at least one point inside each member of this family, its cardinality is at least that of the continuum, and therefore it cannot be a countable set.<ref>The same argument is used by {{Harvtxt|Kolmogorov|Fomin|1969|loc=example 7, pp. 48–49}}, in order to prove the non-[[Separable space|separability]] of the space of [[bounded sequence]]s, and also by {{harvtxt|Kannan|Krueger|1996|loc=example 9.4.1, p. 237}}.</ref> This example can obviously be extended to higher dimensions, and since it involves only [[Local property|local properties]], it implies that the same property is true also for '''<math>BV_{loc}</math>'''.

===Chain rule for ''BV'' functions===
[[Chain rule]]s for [[smooth function|nonsmooth function]]s are very important in [[mathematics]] and [[mathematical physics]] since there are several important [[Mathematical model|physical model]]s whose behavior is described by [[Function (mathematics)|functions]] or [[functional (mathematics)|functional]]s with a very limited degree of [[Smooth function|smoothness]]. The following version is proved in the paper {{Harv|Vol'pert|1967|p=248}}: all [[partial derivative]]s must be intended in a generalized sense, i.e.
as [[Generalized_derivative#Basic_idea|generalized derivative]]s + +'''Theorem'''. Let <math>\scriptstyle f:\mathbb{R}^p\rightarrow\mathbb{R}</math> be a function of class '''<math>C^1</math>''' (i.e. a [[continuous function|continuous]] and [[differentiable function]] having [[continuous function|continuous]] [[derivative]]s) and let <math>\scriptstyle\boldsymbol{u}(\boldsymbol{x})=(u_1(\boldsymbol{x}),\ldots,u_p(\boldsymbol{x})) </math> be a function in '''<math>BV(\Omega)</math>''' with '''<math> \Omega </math>''' being an [[open subset]] of <math> \scriptstyle\mathbb{R}^n </math>. +Then <math>\scriptstyle f\circ\boldsymbol{u}(\boldsymbol{x})=f(\boldsymbol{u}(\boldsymbol{x}))\in BV(\Omega) </math> and + +:<math>\frac{\partial f(\boldsymbol{u}(\boldsymbol{x}))}{\partial x_i}=\sum_{k=1}^p\frac{\partial\bar{f}(\boldsymbol{u}(\boldsymbol{x}))}{\partial u_k}\frac{\partial{u_k(\boldsymbol{x})}}{\partial x_i} +\qquad\forall i=1,\ldots,n</math> + +where <math>\scriptstyle\bar f(\boldsymbol{u}(\boldsymbol{x}))</math> is the mean value of the function at the point '''<math>\scriptstyle x \in\Omega</math>''', defined as + +:<math>\bar f(\boldsymbol{u}(\boldsymbol{x}))=\int_0^1 f\left(\boldsymbol{u}_{\boldsymbol{\hat a}}(\boldsymbol{x})t + \boldsymbol{u}_{-\boldsymbol{\hat a}}(\boldsymbol{x})(1-t)\right)dt</math> + +A more general [[chain rule]] [[formula]] for [[lipschitz continuity|Lipschitz continuous functions]] <math>\scriptstyle f:\mathbb{R}^p\rightarrow\mathbb{R}^s</math> has been found by [[Luigi Ambrosio]] and [[Gianni Dal Maso]] and is published in the paper {{Harv|Ambrosio|Dal Maso|1990}}. However, even this formula has very important direct consequences: choosing <math>\scriptstyle f(u)=v(\boldsymbol{x})u(\boldsymbol{x})</math>, where <math>\scriptstyle v(\boldsymbol{x})</math> is also a '''<math>BV</math>''' function, the preceding formula gives the '''''[[Product rule|Leibniz rule]]''''' for '''<math>BV</math>''' functions + +:<math>\frac{\partial v(\boldsymbol{x})u(\boldsymbol{x})}{\partial x_i} = {\bar u(\boldsymbol{x})}\frac{\partial v(\boldsymbol{x})}{\partial x_i} + +{\bar v(\boldsymbol{x})}\frac{\partial u(\boldsymbol{x})}{\partial x_i} </math> + +This implies that '''the product of two functions of bounded variation is again a function of bounded variation''', therefore '''<math>BV(\Omega)</math>''' is an [[Associative algebra|algebra]]. + +===''BV''(&Omega;) is a Banach algebra=== +This property follows directly from the fact that '''<math>BV(\Omega)</math>''' is a [[Banach space]] and also an [[associative algebra]]: this implies that if '''<math>\{v_n\}</math>''' and '''<math>\{u_n\}</math>''' are [[Cauchy sequence]]s of <math>BV</math> functions converging respectively to [[function (mathematics)|function]]s '''<math>v</math>''' and '''<math>u</math>''' in '''<math>BV(\Omega)</math>''', then + +::<math>\begin{matrix} + vu_n\xrightarrow[n\to\infty]{} vu \\ + v_nu\xrightarrow[n\to\infty]{} vu + \end{matrix}\quad\Longleftrightarrow +\quad vu\in BV(\Omega)</math> + +therefore the ordinary [[Pointwise product|product of functions]] is [[continuity (mathematics)|continuous]] in '''<math>BV(\Omega)</math>''' respect to each argument, making this function space a [[Banach algebra]]. + +==Generalizations and extensions== + +=== Weighted ''BV'' functions === +It is possible to generalize the above notion of [[total variation]] so that different variations are weighted differently. 
More precisely, let <math>\scriptstyle \varphi : [0, +\infty)\longrightarrow [0, +\infty)</math> be any increasing function such that <math>\scriptstyle \varphi(0) = \varphi(0+) =\lim_{x\rightarrow 0_+}\varphi(x) = 0</math> (the '''[[weight function]]''') and let <math>\scriptstyle f: [0, T]\longrightarrow X </math> be a function from the [[interval (mathematics)|interval]] <math>[0 , T]</math>⊂ℝ taking values in a [[normed vector space]] <math>X</math>. Then the <math>\scriptstyle \boldsymbol\varphi</math>'''-variation''' of <math>f</math> over <math>[0, T]</math> is defined as + +:<math>\mathop{\varphi\mbox{-Var}}_{[0, T]} (f) := \sup \sum_{j = 0}^{k} \varphi \left( | f(t_{j + 1}) - f(t_{j}) |_{X} \right),</math> + +where, as usual, the supremum is taken over all finite [[partition of an interval|partitions]] of the interval <math>[0, T]</math>, i.e. all the [[finite set]]s of [[real number]]s <math>t_i</math> such that + +:<math>0 = t_{0} < t_{1} < \ldots < t_{k} = T.</math> + +The original notion of [[Total variation|variation]] considered above is the special case of <math>\scriptstyle \varphi</math>-variation for which the weight function is the [[identity function]]: therefore an [[integrable function]] <math>f</math> is said to be a '''weighted ''BV'' function''' (of weight <math>\scriptstyle\varphi</math>) if and only if its <math>\scriptstyle \varphi</math>-variation is finite. + +:<math>f\in BV_\varphi([0, T];X)\iff \mathop{\varphi\mbox{-Var}}_{[0, T]} (f) <+\infty</math> + +The space <math>\scriptstyle BV_\varphi([0, T];X)</math> is a [[topological vector space]] with respect to the [[norm (mathematics)|norm]] + +:<math>\| f \|_{BV_\varphi} := \| f \|_{\infty} + \mathop{\varphi \mbox{-Var}}_{[0, T]} (f),</math> + +where <math>\scriptstyle\| f \|_{\infty}</math> denotes the usual [[supremum norm]] of ''<math>f</math>''. Weighted ''BV'' functions were introduced and studied in full generality by [[Władysław Orlicz]] and [[Julian Musielak]] in the paper {{Harvnb|Musielak|Orlicz|1959}}: [[Laurence Chisholm Young]] studied earlier the case <math>\scriptstyle\varphi(x)=x^p</math> where ''<math>p</math>'' is a positive integer. + +===''SBV'' functions=== +'''SBV functions''' ''i.e.'' ''Special functions of Bounded Variation'' were introduced by [[Luigi Ambrosio]] and [[Ennio de Giorgi]] in the paper {{Harv|Ambrosio|De Giorgi|1988}}, dealing with free discontinuity [[variational problem]]s: given an [[open subset]] '''<math> \Omega </math>''' of ℝ''<sup>n</sup>'', the space '''<math>SBV(\Omega)</math>''' is a proper [[linear subspace]] of '''<math>BV(\Omega)</math>''', since the [[weak derivative|weak]] [[gradient]] of each function belonging to it consists precisely of the [[sum]] of an <math>n</math>-[[dimension]]al [[Support (mathematics)|support]] and an <math>n-1</math>-[[dimension]]al [[Support (mathematics)|support]] [[Measure (mathematics)|measure]] and ''no intermediate-dimensional terms'', as seen in the following definition. + +'''Definition'''. 
Given a [[locally integrable function]] '''<math>u</math>''', then <math>\scriptstyle u\in {S\!BV}(\Omega) </math> if and only if + +'''1.''' There exist two [[Borel function]]s <math>f</math> and <math>g</math> of [[Domain of a function|domain]] '''<math>\Omega</math>''' and [[codomain]] ℝ''<sup>n</sup>'' such that + +:<math> \int_\Omega\vert f\vert dH^n+ \int_\Omega\vert g\vert dH^{n-1}<+\infty.</math> + +'''2.''' For all of [[Smooth function|continuously differentiable]] [[Vector valued function|vector functions]] <math> \scriptstyle\phi </math> of [[support (mathematics)#Compact support|compact support]] contained in '''<math> \Omega </math>''', ''i.e.'' for all <math> \scriptstyle \phi \in +C_c^1(\Omega,\mathbb{R}^n)</math> the following formula is true: + +:<math> \int_\Omega u\mbox{div} \phi dH^n = \int_\Omega \langle \phi, f\rangle dH^n +\int_\Omega \langle \phi, g\rangle dH^{n-1}.</math> + +where <math>H^\alpha</math> is the <math>\alpha</math>-[[dimension]]al [[Hausdorff measure]]. + +Details on the properties of ''SBV'' functions can be found in works cited in the bibliography section: particularly the paper {{Harv|De Giorgi|1992}} contains a useful [[bibliography]]. + +===''bv'' sequences=== +As particular examples of [[Banach spaces]], {{harvtxt|Dunford|Schwartz|1958|loc=Chapter IV}} consider spaces of '''sequences of bounded variation''', in addition to the spaces of functions of bounded variation. The total variation of a [[sequence (mathematics)|sequence]] ''x''=(''x''<sub>i</sub>) of real or complex numbers is defined by +:<math>TV(x) = \sum_{i=1}^\infty |x_{i+1}-x_i|.</math> + +The space of all sequences of finite total variation is denoted by ''bv''. The norm on ''bv'' is given by +:<math>\|x\|_{bv} = |x_1| + TV(x) = |x_1| + \sum_{i=1}^\infty |x_{i+1}-x_i|.</math> +With this norm, the space ''bv'' is a Banach space. + +The total variation itself defines a norm on a certain subspace of ''bv'', denoted by ''bv''<sub>0</sub>, consisting of sequences ''x'' = (''x''<sub>i</sub>) for which +:<math>\lim_{n\to\infty} x_n =0.</math> +The norm on ''bv''<sub>0</sub> is denoted +:<math>\|x\|_{bv_0} = TV(x) = \sum_{i=1}^\infty |x_{i+1}-x_i|.</math> +With respect to this norm ''bv''<sub>0</sub> becomes a Banach space as well. + +===Measures of bounded variation=== +A [[signed measure|signed]] (or [[complex measure|complex]]) [[Measure (mathematics)|measure]] ''<math>\mu</math>'' on a [[sigma-algebra|measurable space]] <math>(X,\Sigma)</math> is said to be of bounded variation if its [[Total variation#Total variation in measure theory|total variation]]'' <math>\scriptstyle\Vert \mu\Vert=|\mu|(X)</math>'' is bounded: see {{harvtxt|Halmos|1950|p=123}}, {{harvtxt|Kolmogorov|Fomin|1969|p=346}} or the entry "[[Total variation]]" for further details. 
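
For example, the [[Dirac measure]] <math>\scriptstyle\delta_0</math> on <math>\scriptstyle(\mathbb{R},\mathcal{B}(\mathbb{R}))</math> is of bounded variation, since <math>\scriptstyle|\delta_0|(\mathbb{R})=1</math>, while the [[Lebesgue measure]] on the same measurable space is not, since its total variation is infinite.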
+ +==Examples== +[[File:Sin x^-1.svg|right|thumb|The function ''f''(''x'')=sin(1/''x'') is ''not'' of bounded variation on the interval <math> [0,2 / \pi] </math>.]] +The function + +:<math>f(x) = \begin{cases} 0, & \mbox{if }x =0 \\ \sin(1/x), & \mbox{if } x \neq 0 \end{cases} </math> + +is ''not'' of bounded variation on the interval <math> [0, 2/\pi]</math> + +[[File:Xsin(x^-1).svg|thumb|right|The function ''f''(''x'')=''x''&nbsp;sin(1/''x'') is ''not'' of bounded variation on the interval <math> [0,2 / \pi] </math>.]] +While it is harder to see, the continuous function + +:<math>f(x) = \begin{cases} 0, & \mbox{if }x =0 \\ x \sin(1/x), & \mbox{if } x \neq 0 \end{cases} </math> + +is ''not'' of bounded variation on the interval <math> [0, 2/\pi]</math> either. + +[[File:X^2sin(x^-1).svg|thumb|right|The function ''f''(''x'')=''x''<sup>2</sup>&nbsp;sin(1/''x'') ''is'' of bounded variation on the interval <math> [0,2 / \pi] </math>.]] +At the same time, the function + +:<math>f(x) = \begin{cases} 0, & \mbox{if }x =0 \\ x^2 \sin(1/x), & \mbox{if } x \neq 0 \end{cases} </math> + +is of bounded variation on the interval <math> [0,2/\pi]</math>. However, ''all three functions are of bounded variation on each interval'' <math>[a,b]</math> ''with'' <math>a>0</math>. + +The [[Sobolev space]] '''<math> W^{1,1}(\Omega)</math>''' is a [[proper subset]] of '''<math> BV(\Omega)</math>'''. In fact, for each '''<math> u </math>''' in '''<math> W^{1,1}(\Omega) </math>''' it is possible to choose a [[Measure (mathematics)|measure]] <math> \scriptstyle \mu:=\nabla u \mathcal L</math> (where <math> \scriptstyle\mathcal L</math> is the [[Lebesgue measure]] on '''<math>\Omega</math>''') such that the equality + +:<math> \int u\mathrm{div}\phi = -\int \phi\, d\mu = -\int \phi \nabla u \qquad \forall \phi\in C_c^1 </math> + +holds, since it is nothing more than the definition of [[weak derivative]], and hence holds true. One can easily find an example of a ''BV'' function which is not '''<math>W^{1,1}</math>''': in dimension one, any step function with a non-trivial jump will do. + +==Applications== + +=== Mathematics === + +Functions of bounded variation have been studied in connection with the set of [[classification of discontinuities|discontinuities]] of functions and differentiability of real functions, and the following results are well-known. If <math>f</math> is a [[real number|real]] [[Function (mathematics)|function]] of bounded variation on an interval <math>[a,b]</math> then + +* <math>f</math> is [[continuous function|continuous]] except at most on a [[countable set]]; +* <math>f</math> has [[one-sided limit]]s everywhere (limits from the left everywhere in <math>(a,b]</math>, and from the right everywhere in <math>[a,b)</math> ; +* the [[derivative]] <math>f'(x)</math> exists [[almost everywhere]] (i.e. except for a set of [[measure zero]]). + +For [[real number|real]] [[Function (mathematics)|functions]] of several real variables + +* the [[Indicator function|characteristic function]] of a [[Caccioppoli set]] is a ''BV'' function: ''BV'' functions lie at the basis of the modern theory of perimeters. +* [[Minimal surface]]s are [[Graph of a function|graph]]s of ''BV'' functions: in this context, see reference {{Harv|Giusti|1984}}. 
+ +===Physics and engineering=== +The ability of ''BV'' functions to deal with discontinuities has made their use widespread in the applied sciences: solutions of problems in mechanics, physics, chemical kinetics are very often representable by functions of bounded variation. The book {{Harv|Hudjaev|Vol'pert|1985}} details a very ample set of mathematical physics applications of ''BV'' functions. Also there is some modern application which deserves a brief description. + +*The [[Mumford-Shah Functional]]: the segmentation problem for a two-dimensional image, i.e. the problem of faithful reproduction of contours and grey scales is equivalent to the [[minimum|minimization]] of such [[Functional (mathematics)|functional]]. + +==See also== +<div style="-moz-column-count:4; column-count:4;"> +* [[Renato Caccioppoli]] +* [[Caccioppoli set]] +* [[Lamberto Cesari]] +* [[Ennio de Giorgi]] +* [[Helly's selection theorem]] +* [[Locally integrable function]] +* [[Lp space|''L''<sup>''p''</sup>(&Omega;) space]] +* [[Lebesgue&ndash;Stieltjes integral]] +* [[Radon measure]] +* [[Reduced derivative]] +* [[Riemann&ndash;Stieltjes integral]] +* [[Total variation]] +* [[Aizik Isaakovich Vol'pert]] +</div> + +==Notes== +{{Reflist|30em}} + +==References== +*[[Luigi Ambrosio|Ambrosio]], Luigi; Fusco, Nicola; Pallara, Diego (2000) Functions of bounded variation and free discontinuity problems. Oxford Mathematical Monographs. The Clarendon Press, Oxford University Press, New York. +*{{Citation +| last = Dunford +| first = Nelson +| author-link = Nelson Dunford +| last2 = Jacob T. +| first2 = Schwartz +| author2-link = Jacob T. Schwartz +| title = Linear operators. Part I: General Theory +| place = [[New York]]-[[London]]-[[Sydney]] +| publisher = [[Wiley-Interscience]] +| year = 1958 +| series = Pure and Applied Mathematics +| volume = VII +| isbn = 0-471-60848-3 +| zbl = 0084.10402 +}}. Includes a discussion of the functional-analytic properties of spaces of functions of bounded variation. +*{{Citation +| last = Giaquinta +| first = Mariano +| author-link = +| last2 = Modica +| first2 = Giuseppe +| author2-link = +| last3 = Souček +| first3 = Jiří +| author3-link = +| title = Cartesian Currents in the Calculus of Variation I +| place = Berlin-Heidelberg-New York +| publisher = [[Springer Verlag]] +| year = 1998 +| series = [[Ergebnisse der Mathematik und ihrer Grenzgebiete]]. 3. Folge. A Series of Modern Surveys in Mathematics +| volume = 37 +| url = http://books.google.com/books?id=xx2vhd_uPS0C&printsec=frontcover&hl=it#v=onepage&q&f=true +| isbn = 3-540-64009-6 +| zbl = 0914.49001}}. +*{{Citation +| last = Giusti +| first = Enrico +| author-link = +| title = Minimal surfaces and functions of bounded variations +| place = [[Basel]]-[[Boston]]-[[Stuttgart]] +| publisher = [http://www.birkhauser.com Birkhäuser Verlag] +| year = 1984 +| series = Monographs in Mathematics +| volume = 80 +| url = http://books.google.com/?id=dNgsmArDoeQC&printsec=frontcover&dq=Minimal+surfaces+and+functions+of+bounded+variations +| isbn = 978-0-8176-3153-6 +| zbl = 0545.49018}}, particularly part I, chapter 1 "''Functions of bounded variation and Caccioppoli sets''". A good reference on the theory of [[Caccioppoli set]]s and their application to the [[Minimal surface]] problem. +*{{Citation +| last = Halmos +| first = Paul +| author-link = Paul Halmos +| title = Measure theory +| publisher = Van Nostrand and Co. 
+| year = 1950 +| url = http://books.google.com/?id=-Rz7q4jikxUC&printsec=frontcover&dq=halmos+measure+theory#PPP1,M1 +| isbn = 978-0-387-90088-9 +| zbl = 0040.16802 +}}. The link is to a preview of a later reprint by [[Springer-Verlag]]. +*{{Citation +| last = Hudjaev +| first = Sergei Ivanovich +| author-link = +| last2 = Vol'pert +| first2 = Aizik Isaakovich +| author2-link = Aizik Isaakovich Vol'pert +| title = Analysis in classes of discontinuous functions and equations of mathematical physics +| place = Dordrecht–Boston–Lancaster +| publisher = [[Martinus Nijhoff Publishers]] +| year = 1985 +| series = Mechanics: analysis +| volume = 8 +| url = http://books.google.com/?id=lAN0b0-1LIYC&printsec=frontcover&dq=%22Analysis+in+classes+of+discontinuous+functions%22 +| mr = 785938 +| isbn = 90-247-3109-7 +| zbl = 0564.46025 +}}. The whole book is devoted to the theory of ''BV'' functions and their applications to problems in [[mathematical physics]] involving [[discontinuous function]]s and geometric objects with [[smooth function|non-smooth]] [[boundary (topology)|boundaries]]. +*{{Citation +| last = Kannan +| first = Rangachary +| author-link = +| last2 = Krueger +| first2 = Carole King +| author2-link = +| title = Advanced analysis on the real line +| place = Berlin–Heidelberg–New York +| publisher = [[Springer Verlag]] +| year = 1996 +| series = Universitext +| pages = x+259 +| isbn = 978-0-387-94642-9 +| mr = 1390758 +| zbl = 0855.26001 +}}. Maybe the most complete book reference for the theory of ''BV'' functions in one variable: classical results and advanced results are collected in chapter 6 "''Bounded variation''" along with several exercises. The first author was a collaborator of [[Lamberto Cesari]]. +*{{Citation +| first=Andrej N. +| last=Kolmogorov +| author-link= Andrey Kolmogorov +| first2=Sergej V. +| last2=Fomin +| author2-link=Sergei Fomin +| title=Introductory Real Analysis +| publisher=[[Dover Publications]] +| pages=xii+403 +| url=http://books.google.com/?id=z8IaHgZ9PwQC&printsec=frontcover#v=onepage&q +| place=New York +| year=1969 +| isbn = 0-486-61226-0 +| mr=0377445 +| zbl=0213.07305 +}}. +*{{Citation +| last = Màlek +| first = Josef +| author-link = +| last2 = Nečas +| first2 = Jindřich +| author2-link = +| last3 = Rokyta +| first3 = Mirko +| last4 = Růžička +| first4 = Michael +| title = Weak and measure-valued solutions to evolutionary PDEs +| place = London-Weinheim-New York-Tokyo-Melbourne-Madras +| publisher = Chapman & Hall/[[CRC Press]] +| year = 1996 +| series = Applied Mathematics and Mathematical Computation +| volume = 13 +| pages = xi+331 +| url = http://books.google.com/?id=30_PBBzwSfAC&printsec=frontcover&dq=Weak+and+measure-valued+solutions+to+evolutionary+PDEs +| isbn = 0-412-57750-X +| mr = 1409366 +| zbl = 0851.35002}}. One of the most complete monographs on the theory of [[Young measure]]s, strongly oriented to applications in continuum mechanics of fluids. +*{{Citation +| last = Maz'ya +| first = Vladimir G. +| authorlink = Vladimir Gilelevich Maz'ya +| title = Sobolev Spaces +| publisher = [[Springer-Verlag]] +| location = Berlin-Heidelberg-New York +| year = 1985 +| isbn=0-387-13589-8 +| zbl = 0692.46023 +}}; particularly chapter 6, "On functions in the space '''<math>BV(\Omega)</math>'''". One of the best monographs on the theory of [[Sobolev spaces]]. +*{{Citation +| first = Jean Jacques +| last = Moreau +| author-link = +| editor-last = Moreau +| editor-first = J.J. +| editor2-last = Panagiotopoulos +| editor2-first = P.D. 
+| editor3-last = Strang +| editor3-first = G. +| editor3-link = Gilbert Strang +| contribution = Bounded variation in time +| contribution-url = +| title = Topics in nonsmooth mechanics +| year = 1988 +| pages = 1–74 +| place = Basel-Boston-Stuttgart +| publisher = [http://www.birkhauser.com Birkhäuser Verlag] +| isbn = 3-7643-1907-0 +| zbl = 0657.28008}} +*{{Citation +| last = Musielak +| first = Julian +| author-link = +| last2 = Orlicz +| first2 = Władysław +| author2-link = Władysław Orlicz +| title = On generalized variations (I) +| journal = [[Studia Mathematica]] +| place = Warszawa-Wrocław +| volume = 18 +| pages = 13–41 +| year = 1959 +| url = http://matwbn.icm.edu.pl/ksiazki/sm/sm18/sm1812.pdf +| zbl = 0088.26901 +}}. The first paper where weighted ''BV'' functions are studied in full generality. +*{{Citation +| first=Frigyes +| last=Riesz +| author-link=Frigyes Riesz +| first2=Béla +| last2=Szőkefalvi-Nagy +| author2-link=Béla Szőkefalvi-Nagy +| title=Functional Analysis +| publisher=[[Dover Publications]] +| place=New York +| url=http://books.google.com/?id=jlQnThDV41UC&printsec=frontcover#v=onepage&q +| year=1990 +| isbn=0-486-66289-6 +| zbl=0732.47001 +}} +*{{Citation +| last = Vol'pert +| first = Aizik Isaakovich +| author-link = +| title = Spaces BV and quasi-linear equations +| journal = [[Matematicheskii Sbornik]] +| series = (N.S.) +| volume = 73(115) +| language = Russian +| issue = 2 +| pages = 255–302 +| year = 1967 +| url = http://mi.mathnet.ru/eng/msb/v115/i2/p255 +| mr = 216338 +| zbl = 0168.07402 +}}. A seminal paper where [[Caccioppoli set]]s and ''BV'' functions are thoroughly studied and the concept of [[functional superposition]] is introduced and applied to the theory of [[partial differential equation]]s: it was also translated in English as {{Citation +| title = Spaces BV and quasi-linear equations +| journal = [[Sbornik: Mathematics|Mathematics USSR-Sbornik]] +| volume = 2 +| issue = 2 +| pages = 225–267 +| year = 1967 +| url = +| doi = 10.1070/SM1967v002n02ABEH002340 +| mr = 216338 +| zbl = 0168.07402 +}}. + +==Bibliography== +*{{Citation +| last = Adams +| first = C. Raymond +| author-link = +| last2 = Clarkson +| first2 = James A. +| author2-link = +| title = On definitions of bounded variation for functions of two variables +| journal = [[Transactions of the American Mathematical Society]] +| volume = 35 +| pages = 824–824 +| year = 1933 +| url = http://www.ams.org/journals/tran/1933-035-04/S0002-9947-1933-1501718-2/home.html +| doi = 10.1090/S0002-9947-1933-1501718-2 +| mr = 1501718 +| zbl = 0008.00602 +| issue = 4 +}}. +*{{Citation +| last = Alberti +| first = Giovanni +| author-link = +| last2 = Mantegazza +| first2 = Carlo +| author2-link = +| title = A note on the theory of SBV functions +| journal = [http://umi.dm.unibo.it/bollettino_dell_unione_matematica_italiana--68.html Bollettino della Unione Matematica Italiana, Sezione B] +| volume = 11 +| issue = 2 +| pages = 375&ndash;382 +| date = +| year = 1997 +| url = +| doi = +| zbl = 0877.49001 +}}. A paper containing a proof of the [[Compact space#Compactness of topological spaces|compactness]] of the space of SBV functions. 
+*{{Citation +| last = Ambrosio +| first = Luigi +| author-link = Luigi Ambrosio +| last2 = Dal Maso +| first2 = Gianni +| author2-link = +| title = A General Chain Rule for Distributional Derivatives +| journal = [[Proceedings of the American Mathematical Society]] +| volume = 108 +| issue = 3 +| pages = 691–691 +| year = 1990 +| url = http://www.ams.org/proc/1990-108-03/S0002-9939-1990-0969514-3/home.html +| doi = 10.1090/S0002-9939-1990-0969514-3 +| mr = 969514 +| zbl = 0685.49027 +}}. A paper containing a very general [[chain rule]] formula for [[Function composition|composition]] of BV functions. +*{{Citation +| last = Ambrosio +| first = Luigi +| author-link = Luigi Ambrosio +| last2 = De Giorgi +| first2 = Ennio +| author2-link = Ennio de Giorgi +| title = Un nuovo tipo di funzionale del calcolo delle variazioni (A new kind of functional in the calculus of variations) +| journal = Atti della [[Accademia Nazionale dei Lincei]], Rendiconti della Classe di Scienze Fisiche, Matematiche e Naturali +| series = 8 +| volume = 82 +| issue = 2 +| pages = 199–210 +| year = 1988 +| url = +| doi = +| zbl = 0715.49014 +}} (in Italian, with English [[Abstract (summary)|summary]]). The first paper about ''SBV'' functions and related variational problems. +*{{Citation +| last = Cesari +| first = Lamberto +| author-link = Lamberto Cesari +| title = Sulle funzioni a variazione limitata (On the functions of bounded variation) +| journal = [http://www.sns.it/it/edizioni/riviste/annaliscienze/ Annali della Scuola Normale Superiore] +| series = II +| volume = 5 +| issue = 3–4 +| pages = 299&ndash;313 +| date = +| year = 1936 +| url = http://www.numdam.org/item?id=ASNSP_1936_2_5_3-4_299_0 +| doi = +| jfm = 62.0247.03 +| zbl = 0014.29605 +}} (in Italian). Available at [http://www.numdam.org Numdam]. +*{{Citation +| first = Lamberto +| last = Cesari +| author-link = Lamberto Cesari +| editor-last = Montalenti +| editor-first = G. +| editor2-last = Amerio +| editor2-first = L.; et als. +| contribution = L'opera di Leonida Tonelli e la sua influenza nel pensiero scientifico del secolo (the work of Leonida Tonelli and his influence on scientific thinking in this century) +| title = Convegno celebrativo del centenario della nascita di Mauro Picone e Leonida Tonelli (International congress in occasion of the celebration of the centenary of birth of Mauro Picone and Leonida Tonelli) +| url = http://www.lincei.it/pubblicazioni/catalogo/volume.php?lg=e&rid=32847 +| series = Atti dei Convegni Lincei +| volume = 77 +| year = 1986 +| date = 6–9 May 1985 +| pages = 41–73 +| place = [[Rome, Italy|Roma]] +| publisher = [[Accademia Nazionale dei Lincei]] + }} (in Italian). Some recollections from one of the founders of the theory of ''BV'' functions of several variables. +*{{Citation +| last = Conway +| first = Edward D. +| author-link = +| last2 = Smoller +| first2 = Joel A. +| author2-link = +| title = Global solutions of the Cauchy problem for quasi-linear first-order equations in several space variables +| journal = [[Communications on Pure and Applied Mathematics]] +| volume = 19 +| issue = 1 +| pages = 95–105 +| year = 1966 +| doi = 10.1002/cpa.3160190107 +| mr = 0192161 +| zbl = 0138.34701 +}}. An important paper where properties of ''BV'' functions were applied to obtain a global in time [[existence theorem]] for ''single'' [[hyperbolic equation]]s of first order in any number of [[Variable (mathematics)|variables]]. 
+*{{Citation +| first = Ennio +| last = De Giorgi +| author-link = Ennio de Giorgi +| editor-last = Amaldi +| editor-first = E. +| editor-link = Edoardo Amaldi +| editor2-last = Amerio +| editor2-first = L., et als. +| editor2-link = +| contribution = Problemi variazionali con discontinuità libere (Free-discontinuity variational problems) +| contribution-url = +| title = Convegno internazionale in memoria di Vito Volterra (International congress in memory of Vito Volterra), 8–11 October 1990 +| url = http://www.lincei.it/pubblicazioni/catalogo/volume.php?rid=32862 +| series = Atti dei Convegni Lincei +| volume = 92 +| year = 1992 +| pages = 133–150 +| place = Roma +| publisher = [[Accademia Nazionale dei Lincei]] +}}. A survey paper on free-discontinuity [[calculus of variations|variational problems]] including several details on the theory of ''SBV'' functions, their applications and a rich bibliography (in Italian), written by [[Ennio de Giorgi]]. +*{{Citation +| last = Jordan +| first = Camille +| author-link = Camille Jordan +| title = Sur la série de Fourier +| journal = [[Comptes rendus hebdomadaires des séances de l'Académie des sciences]] +| volume = 92 +| pages = 228–230 +| year = 1881 +| url = http://gallica.bnf.fr/ark:/12148/bpt6k7351t/f227.chemindefer +}} (at [[Gallica]]). This is, according to Boris Golubov, the first paper on functions of bounded variation. +*{{Citation +| last = Oleinik +| first = Olga A. +| author-link = Olga Arsenievna Oleinik +| title = Discontinuous solutions of non-linear differential equations +| journal = '[http://www.mathnet.ru/umn UMN]' +| volume = 12 +| issue = 3(75) +| pages = 3–73 +| year = 1957 +| url = http://mi.mathnet.ru/eng/umn/v12/i3/p3 +| zbl = 0080.07701 +}} (in [[Russian language|Russian]]). An important paper where the author describes generalized solutions of [[nonlinear equation|nonlinear]] [[partial differential equation]]s as ''BV'' functions. +*{{Citation +| last = Oleinik +| first = Olga A. +| author-link = Olga Arsenievna Oleinik +| title = Construction of a generalized solution of the Cauchy problem for a quasi-linear equation of first order by the introduction of "vanishing viscosity" +| journal = '[http://www.mathnet.ru/umn UMN]' +| volume = 14 +| issue = 2(86) +| pages = 159–164 +| year = 1959 +| url = http://mi.mathnet.ru/eng/umn/v14/i2/p159 +| zbl = 0096.06603 +}} (in [[Russian language|Russian]]). An important paper where the author constructs a [[weak solution]] in ''BV'' for a [[nonlinear equation|nonlinear]] [[partial differential equation]] with the method of [[vanishing viscosity]]. +*[[Tony F. Chan]] and Jackie (Jianhong) Shen (2005), [http://jackieneoshen.googlepages.com/ImagingNewEra.html ''Image Processing and Analysis - Variational, PDE, Wavelet, and Stochastic Methods''], SIAM Publisher, ISBN 0-89871-589-X (with in-depth coverage and extensive applications of Bounded Variations in modern image processing, as started by Rudin, Osher, and Fatemi). + +==External links== + +=== Theory === +* {{springer +| title= Variation of a function +| id= V/v096110 +| last= Golubov +| first= Boris I. +| author-link= +| last2= Vitushkin +| first2= Anatolii G. +| author2-link= Anatolii Georgievich Vitushkin +}} +*{{planetmath reference|id=6969|title=BV function}}. +*{{MathWorld +|author=Rowland, Todd and Weisstein, Eric W. 
+|title=Bounded Variation +|urlname=BoundedVariation}} +*[http://www.encyclopediaofmath.org/index.php/Function_of_bounded_variation Function of bounded variation] at [http://www.encyclopediaofmath.org/ Encyclopedia of Mathematics] + +===Other=== +* Luigi Ambrosio [http://cvgmt.sns.it/people/ambrosio/ home page] at the [[Scuola Normale Superiore]], [[Pisa]]. Academic home page (with preprints and publications) of one of the contributors to the theory and applications of BV functions. +* [http://cvgmt.sns.it/ Research Group in Calculus of Variations and Geometric Measure Theory], [[Scuola Normale Superiore]], [[Pisa]]. + +{{PlanetMath attribution|id=6969|title=BV function}} + +{{DEFAULTSORT:Bounded Variation}} +[[Category:Real analysis]] +[[Category:Calculus of variations]] +[[Category:Measure theory]] + 5dnujkc24jldawrt2mub5zm7lag6t1v + + + + Lindemann–Weierstrass theorem + 0 + 4188 + + 4189 + 2013-12-27T23:49:19Z + + 166.137.208.15 + + /* Preliminary Lemmas */ + wikitext + text/x-wiki + {{stack|{{Pi box}}|{{E (mathematical constant)}}}} +In [[mathematics]], the '''Lindemann–Weierstrass theorem''' is a result that is very useful in establishing the [[transcendental number|transcendence]] of numbers. It states that if α<sub>1</sub>,&nbsp;...,&nbsp;α<sub>''n''</sub> are [[algebraic number]]s which are [[linearly independent]] over the [[rational number]]s '''Q''', then ''e''<sup>α<sub>1</sub></sup>,&nbsp;...,&nbsp;''e''<sup>α<sub>''n''</sub></sup> are [[algebraically independent]] over '''Q'''; in other words the [[extension field]] '''Q'''(''e''<sup>α<sub>1</sub></sup>,&nbsp;...,&nbsp;''e''<sup>α<sub>''n''</sub></sup>) has [[transcendence degree]] ''n'' over '''Q'''. + +An equivalent formulation {{Harv|Baker|1975|loc=Chapter 1, Theorem 1.4}}, is the following: If α<sub>1</sub>,&nbsp;...,&nbsp;α<sub>''n''</sub> are distinct algebraic numbers, then the exponentials ''e''<sup>α<sub>1</sub></sup>,&nbsp;...,&nbsp;''e''<sup>α<sub>''n''</sub></sup> are linearly independent over the algebraic numbers. This equivalence transforms a linear relation over the algebraic numbers into an algebraic relation over the '''Q''' by using the fact that a [[symmetric polynomial]] whose arguments are all [[algebraic conjugate|conjugates]] of one another gives a rational number. + +The theorem is named for [[Ferdinand von Lindemann]] and [[Karl Weierstrass]]. Lindemann proved in 1882 that ''e''<sup>α</sup> is transcendental for every non-zero algebraic number α, thereby establishing that [[pi|π]] is transcendental (see below). Weierstrass proved the above more general statement in 1885. + +The theorem, along with the [[Gelfond–Schneider theorem]], is extended by [[Baker's theorem]], and all of these are further generalized by [[Schanuel's conjecture]]. + +==Naming convention== +The theorem is also known variously as the '''Hermite–Lindemann theorem''' and the '''Hermite–Lindemann–Weierstrass theorem'''. [[Charles Hermite]] first proved the simpler theorem where the α<sub>''i''</sub> exponents are required to be [[rational integer]]s and linear independence is only assured over the rational integers,<ref>''Sur la fonction exponentielle'', Comptes Rendus Acad. Sci. Paris, '''77''', pages 18–24, 1873.</ref> a result sometimes referred to as Hermite's theorem.<ref>A.O.Gelfond, ''Transcendental and Algebraic Numbers'', translated by Leo F. Boron, Dover Publications, 1960.</ref> Although apparently a rather special case of the above theorem, the general result can be reduced to this simpler case. 
Lindemann was the first to allow algebraic numbers into Hermite's work in 1882.<ref>''Über die Ludolph'sche Zahl'', Sitzungsber. Königl. Preuss. Akad. Wissensch. zu Berlin, '''2''', pages 679–682, 1882.</ref> Shortly afterwards Weierstrass obtained the full result,<ref>''Zu Hrn. Lindemanns Abhandlung: 'Über die Ludolph'sche Zahl' '', Sitzungber. Königl. Preuss. Akad. Wissensch. zu Berlin, '''2''', pages 1067–1086, 1885</ref> and further simplifications have been made by several mathematicians, most notably by [[David Hilbert]].

== Transcendence of ''e'' and π ==
The [[transcendental number|transcendence]] of [[e (mathematical constant)|''e'']] and that of π are direct corollaries of this theorem.

Suppose α is a nonzero algebraic number; then {α} is a linearly independent set over the rationals, and therefore by the first formulation of the theorem {''e''<sup>α</sup>} is an algebraically independent set; or in other words ''e''<sup>α</sup> is transcendental. In particular, ''e''<sup>1</sup> = ''e'' is transcendental. (A more elementary proof that ''e'' is transcendental is outlined in the article on [[transcendental number]]s.)

Alternatively, by the second formulation of the theorem, if α is a nonzero algebraic number, then {0, α} is a set of distinct algebraic numbers, and so the set {''e''<sup>0</sup>,&nbsp;''e''<sup>α</sup>}&nbsp;=&nbsp;{1,&nbsp;''e''<sup>α</sup>} is linearly independent over the algebraic numbers and in particular ''e''<sup>α</sup> cannot be algebraic and so it is transcendental.

The proof that π is transcendental is [[proof by contradiction|by contradiction]]. If π were algebraic, π''i'' would be algebraic as well, and then by the Lindemann–Weierstrass theorem ''e''<sup>π''i''</sup> = −1 (see [[Euler's identity]]) would be transcendental, a contradiction.

A slight variant on the same proof will show that if α is a nonzero algebraic number then sin(α), cos(α), tan(α) and their [[hyperbolic function|hyperbolic]] counterparts are also transcendental.

== ''p''-adic conjecture ==
<blockquote>'''''p''-adic Lindemann–Weierstrass Conjecture.''' Suppose ''p'' is some [[prime number]] and α<sub>1</sub>,&nbsp;...,&nbsp;α<sub>''n''</sub> are [[p-adic numbers|''p''-adic numbers]] which are algebraic and linearly independent over '''Q''', such that |α<sub>''i''</sub>|<sub>''p''</sub>&nbsp;<&nbsp;1/''p'' for all ''i''; then the [[p-adic exponential function|''p''-adic exponential]]s exp<sub>''p''</sub>(α<sub>1</sub>),&nbsp;...,&nbsp;exp<sub>''p''</sub>(α<sub>''n''</sub>) are ''p''-adic numbers that are algebraically independent over '''Q'''.</blockquote>

==Modular conjecture==
An analogue of the theorem involving the [[modular function]] [[j-invariant|''j'']] was conjectured by Daniel Bertrand in 1997, and remains an open problem.<ref>Daniel Bertrand, ''Theta functions and transcendence'', The Ramanujan Journal '''1''', pages 339&ndash;350, 1997.</ref> Writing ''q''&nbsp;=&nbsp;''e''<sup>2π''i''τ</sup> for the [[Nome (mathematics)|nome]] and ''j''(τ)&nbsp;=&nbsp;''J''(''q''), the conjecture is as follows. Let ''q''<sub>1</sub>, ..., ''q''<sub>''n''</sub> be non-zero algebraic numbers in the complex [[unit disc]] such that the 3''n'' numbers

:<math>\left \{ J(q_1), J'(q_1), J''(q_1), \ldots, J(q_n), J'(q_n), J''(q_n) \right \}</math>

are algebraically dependent over '''Q'''. Then there exist two indices 1&nbsp;≤&nbsp;''i''&nbsp;<&nbsp;''j''&nbsp;≤&nbsp;''n'' such that ''q<sub>i</sub>'' and ''q''<sub>''j''</sub> are multiplicatively dependent.
+

==Lindemann–Weierstrass Theorem==
<blockquote>'''Lindemann–Weierstrass Theorem (Baker's Reformulation).''' If ''a''<sub>1</sub>, ..., ''a''<sub>''n''</sub> are non-zero algebraic numbers, and α<sub>1</sub>, ..., α<sub>''n''</sub> are distinct algebraic numbers, then<ref>{{fr icon}}[http://nombrejador.free.fr/article/lindemann-weierstrass_ttj.htm Proof's Lindemann-Weierstrass (HTML)]</ref>

:<math>a_1 e^{\alpha_1} +\cdots + a_n e^{\alpha_n}\ne 0.</math></blockquote>

==Proof==

===Preliminary Lemmas===
<blockquote>'''Lemma A.''' Let ''c''(1),&nbsp;...,&nbsp;''c''(''r'') be non-zero integers and, for every ''k'' between 1 and ''r'', let {γ(''k'')<sub>''i''</sub>} (''i''&nbsp;=&nbsp;1,&nbsp;...,&nbsp;''m''(''k'')) be the roots of a polynomial with integer [[coefficient]]s ''T''<sub>''k''</sub>(''x'') = ''v''(''k'')''x''<sup>''m''(''k'')</sup>&nbsp;+&nbsp;...&nbsp;+''u''(''k'') (with ''v''(''k''), ''u''(''k'') ≠&nbsp;0). If γ(''k'')<sub>''i''</sub>≠γ(''u'')<sub>''v''</sub> whenever (''k'',''i'')≠(''u'',''v''), then

: <math>c(1)\left (e^{\gamma(1)_1}+\cdots+ e^{\gamma(1)_{m(1)}} \right ) + \cdots + c(r) \left (e^{\gamma(r)_1}+\cdots+ e^{\gamma(r)_{m(r)}} \right) \ne 0.</math></blockquote>

'''Proof of Lemma A.''' To simplify the notation, let us put <math>n_0=0</math>, <math>n_i=\sum_{k=1}^i m(k)</math> (for <math>i=1,\dots,r</math>) and <math>n=n_r</math>. Let <math>\alpha_{n_i+j}=\gamma(i+1)_j</math> (for <math>0\le i<r</math> and <math>1\le j\le m(i+1)</math>). Let us also put <math>\beta_{n_i+j}=c(i+1)</math>.
The claim becomes that <math>\sum_{k=1}^n \beta_k e^{\alpha_k}\neq 0</math>.

Let ''p'' be a [[prime number]] and define the following polynomials:

: <math>f_i(x) = \frac {l^{np} (x-\alpha_1)^p \cdots (x-\alpha_n)^p}{(x-\alpha_i)},</math>

where ''l'' is a non-zero integer such that <math>l\alpha_1,\dots,l\alpha_n</math> are all algebraic integers, and the integrals:

: <math>I_i(s) = \int^s_0 e^{s-x} f_i(x) \, dx.</math>

(Up to a factor, this is the same integral appearing in [[Transcendental number#Sketch of a proof that ''e'' is transcendental|the proof that ''e'' is a transcendental number]], where β<sub>1</sub>,&nbsp;...,&nbsp;β<sub>''m''</sub>&nbsp;{{math|1==}}&nbsp;1,&nbsp;...,&nbsp;''m''. The rest of the proof of the Lemma is analogous to that proof.)

It can be shown by [[integration by parts]] that

: <math>I_i(s) = e^s \sum_{j=0}^{np-1} f_i^{(j)}(0) - \sum_{j=0}^{np-1} f_i^{(j)}(s),</math>

(<math>np-1</math> is the [[Degree of a polynomial|degree]] of <math>f_i</math>, and <math>f_i^{(j)}</math> is the ''j''th derivative of <math>f_i</math>). This also holds for ''s'' complex (in this case the integral has to be understood as a contour integral, for example along the straight segment from 0 to ''s'') because <math>-e^{s-x} \sum_{j=0}^{np-1} f_i^{(j)}(x)</math> is a primitive of <math>e^{s-x} f_i(x)</math>.

Let us consider the following sum:

: <math>J_i=\sum_{k=1}^n\beta_k I_i(\alpha_k)=\sum_{k=1}^n\left(\beta_k e^{\alpha_k}\sum_{j=0}^{np-1}f_i^{(j)}(0)\right)-\sum_{k=1}^n\left(\beta_k\sum_{j=0}^{np-1}f_i^{(j)}(\alpha_k)\right)=\left(\sum_{j=0}^{np-1}f_i^{(j)}(0)\right)\left(\sum_{k=1}^n \beta_k e^{\alpha_k}\right)-\sum_{k=1}^n\left(\beta_k\sum_{j=0}^{np-1}f_i^{(j)}(\alpha_k)\right)</math>

Suppose now that <math>\sum_{k=1}^n \beta_k e^{\alpha_k}=0</math>: we will reach a contradiction by estimating <math>|J_1\cdots J_n|</math> in two different ways.
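
As a quick illustrative aside, the integration-by-parts identity for <math>I_i(s)</math> above can be checked symbolically on a toy example. In the sketch below the polynomial and the value of ''s'' are arbitrary choices and play no role in the proof itself.

<syntaxhighlight lang="python">
import sympy as sp

x, s = sp.symbols('x s')
f = (x - 1)**2 * (x - 2)          # an arbitrary toy polynomial of degree d = 3
d = int(sp.degree(f, x))

# Left-hand side: I(s) = integral_0^s e^(s-x) f(x) dx
lhs = sp.integrate(sp.exp(s - x) * f, (x, 0, s))

# Right-hand side: e^s * sum_j f^(j)(0) - sum_j f^(j)(s), with j = 0, ..., d
derivs = [sp.diff(f, x, j) for j in range(d + 1)]
rhs = sp.exp(s) * sum(g.subs(x, 0) for g in derivs) - sum(g.subs(x, s) for g in derivs)

print(sp.simplify(lhs - rhs))     # prints 0: the identity holds for this example
</syntaxhighlight>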
+

We obtain <math>J_i=-\sum_{j=0}^{np-1}\sum_{k=1}^n\beta_k f_i^{(j)}(\alpha_k)</math>. Now <math>f_i^{(j)}(\alpha_k)</math> is an algebraic integer which is divisible by ''p''! for <math>j\ge p</math> and vanishes for <math>j<p</math> unless ''j''=''p''-1 and ''k''=''i'', in which case it equals <math>l^{np}(p-1)!\prod_{k\neq i}(\alpha_i-\alpha_k)^p</math>.

This is not divisible by ''p'' (if ''p'' is large enough) because otherwise, putting <math>\delta_i=\prod_{k\neq i}(l\alpha_i-l\alpha_k)</math> (which is an algebraic integer) and calling <math>d_i</math> the product of its conjugates, we would get that ''p'' divides <math>l^p(p-1)!d_i^p</math> (and <math>d_i</math> is a non-zero integer), so by Fermat's little theorem ''p'' would divide <math>l(p-1)!d_i</math>, which is false.

So <math>J_i</math> is a non-zero algebraic integer divisible by (''p''-1)!. Now
:<math>J_i=-\sum_{j=0}^{np-1}\sum_{t=0}^{r-1}c(t+1)\left(f_i^{(j)}(\alpha_{n_t+1})+\dots+f_i^{(j)}(\alpha_{n_{t+1}})\right).</math>
Since each <math>f_i(x)</math> is obtained by dividing a fixed polynomial with integer coefficients by <math>(x-\alpha_i)</math>, it is of the form <math>f_i(x)=\sum_{m=0}^{np-1}g_m(\alpha_i)x^m</math>, where <math>g_m</math> is a polynomial (with integer coefficients) independent of ''i''. The same holds for the derivatives <math>f_i^{(j)}(x)</math>.

Hence, by the fundamental theorem of symmetric polynomials, <math>f_i^{(j)}(\alpha_{n_t+1})+\dots+f_i^{(j)}(\alpha_{n_{t+1}})</math> is a fixed polynomial with integer coefficients evaluated in <math>\alpha_i</math> (this is seen by grouping the same powers of <math>\alpha_{n_t+1},\dots,\alpha_{n_{t+1}}</math> appearing in the expansion and using the fact that <math>\alpha_{n_t+1},\dots,\alpha_{n_{t+1}}</math> are a complete set of conjugates). So the same is true of <math>J_i</math>, i.e. it equals <math>G(\alpha_i)</math>, where ''G'' is a polynomial with integer coefficients which is independent of ''i''.

Finally <math>J_1\dots J_n=G(\alpha_1)\dots G(\alpha_n)</math> is an integer (again by the fundamental theorem of symmetric polynomials), it is non-zero (since the <math>J_i</math>'s are) and it is divisible by <math>(p-1)!^n</math>.

So <math>|J_1\dots J_n|\ge(p-1)!^n</math>, but clearly <math>|I_i(\alpha_k)|\le|\alpha_k|e^{|\alpha_k|}F_i(|\alpha_k|)</math>, where ''F''<sub>''i''</sub> is the polynomial whose coefficients are the absolute values of those of ''f''<sub>''i''</sub> (this follows directly from the definition of <math>I_i(s)</math>).

Thus <math>|J_i|\le\sum_{k=1}^n|\beta_k\alpha_k|e^{|\alpha_k|}F_i(|\alpha_k|)</math> and so by the construction of the <math>f_i</math>'s we have <math>|J_1\dots J_n|\le C^p</math> for a sufficiently large ''C'' independent of ''p'', which contradicts the previous inequality. This proves Lemma A.

<blockquote>'''Lemma B.''' If ''b''(1), ..., ''b''(''n'') are non-zero integers and γ(1), ..., γ(''n'') are distinct [[algebraic number]]s, then

: <math>b(1)e^{\gamma(1)}+\cdots+ b(n)e^{\gamma(n)}\ne 0.</math></blockquote>

'''Proof of Lemma B:''' Assuming

:<math>b(1)e^{\gamma(1)}+\cdots+ b(n)e^{\gamma(n)}= 0,</math>

we will derive a contradiction, thus proving Lemma B.

Let us choose a polynomial with integer coefficients which vanishes on all the <math>\gamma(k)</math>'s and let <math>\gamma(1),\dots,\gamma(n),\gamma(n+1),\dots,\gamma(N)</math> be all its distinct roots. Let ''b''(''n''+1)=...=''b''(''N'')=0.
+

Let us consider the product <math>\prod_{\sigma\in S_N}(b(1) e^{\gamma(\sigma(1))}+\dots+b(N) e^{\gamma(\sigma(N))})</math>. This vanishes by assumption, but by expanding it we obtain a sum of terms of the form <math>e^{h_1\gamma(1)+\dots+h_N\gamma(N)}</math> multiplied by integer coefficients.

Since the product is symmetric, we have that, for any <math>\tau\in S_N</math>, <math>e^{h_1\gamma(\tau(1))+\dots+h_N\gamma(\tau(N))}</math> has the same coefficient as <math>e^{h_1\gamma(1)+\dots+h_N\gamma(N)}</math>.

Thus (after having grouped the terms with the same exponent) we see that the set of the exponents forms a complete set of conjugates and, if two terms have conjugate exponents, they are multiplied by the same coefficient.

So we are in the situation of Lemma A. To reach a contradiction it suffices to see that at least one of the coefficients is non-zero.

This is seen by equipping <math>\mathbb{C}</math> with the lexicographic order and by choosing for each factor in the product the term with non-zero coefficient which has maximum exponent according to this ordering: the product of these terms has non-zero coefficient in the expansion and does not get simplified by any other term. This proves Lemma B.

===Final step===
We turn now to prove the theorem: Let ''a''(1), ..., ''a''(''n'') be non-zero [[algebraic number]]s, and α(1), ..., α(''n'') distinct algebraic numbers. Then let us assume that:

: <math>a(1)e^{\alpha(1)}+\cdots + a(n)e^{\alpha(n)} = 0.</math>

We will show that this leads to a contradiction and thus prove the theorem.

The proof is very similar to that of Lemma B, except that this time the choices are made over the ''a''(''i'')'s:

For every ''i'' ∈ {1, ..., ''n''}, ''a''(''i'') is algebraic, so it is a root of a polynomial with integer coefficients; we denote its degree by ''d''(''i''). Let us denote the roots of this polynomial ''a''(''i'')<sub>1</sub>, ..., ''a''(''i'')<sub>''d''(''i'')</sub>, with ''a''(''i'')<sub>1</sub> = ''a''(''i'').

Let σ be a function which chooses one element from each of the sequences (1, ..., ''d''(1)), (1, ..., ''d''(2)), ..., (1, ..., ''d''(''n'')), such that for every 1&nbsp;≤&nbsp;''i''&nbsp;≤&nbsp;''n'', σ(''i'') is an integer between 1 and ''d''(''i''). Then according to our assumption:

: <math>\prod\nolimits_{\{\sigma\}}\left(a(1)_{\sigma(1)}e^{\alpha(1)}+\cdots+ a(n)_{\sigma(n)} e^{\alpha(n)}\right) = 0</math>

where the product is over all possible choices. The product vanishes because one of the choices is just σ(''i'') = 1 for all ''i'', for which the term vanishes according to our assumption above.

By expanding this product we get a sum of the form:

: <math>b(1)e^{\beta(1)}+ b(2)e^{\beta(2)}+ \cdots + b(N)e^{\beta(N)}= 0</math>

for some non-zero integer ''N'', some distinct algebraic β(1), ..., β(''N'') (these are indeed algebraic because each is a sum of α's which are algebraic themselves), and ''b''(1), ..., ''b''(''N'') are polynomials in ''a''(''i'')<sub>''j''</sub> (''i'' in 1, ..., ''n'' and ''j'' in 1, ..., ''d''(''i'')) with integer coefficients.

Since the product is over all possible choices, each of ''b''(1), ..., ''b''(''N'') is symmetric in ''a''(''i'')<sub>1</sub>, ..., ''a''(''i'')<sub>''d''(''i'')</sub> for every ''i''; therefore each of ''b''(1), ..., ''b''(''N'') is a polynomial with integer coefficients in elementary symmetric polynomials of the sets {''a''(''i'')<sub>1</sub>, ..., ''a''(''i'')<sub>''d''(''i'')</sub>} for every&nbsp;''i''.
Each of the latter is a rational number (as in the proof of Lemma B).

Thus ''b''(1), ..., ''b''(''N'') ∈ '''Q''', and by multiplying the equation with an appropriate integer factor, we get an identical equation except that now ''b''(1), ..., ''b''(''N'') are all integers.

Therefore, according to Lemma B, the equality cannot hold, and we are led to a contradiction which completes the proof.

Note that Lemma A is sufficient to prove that π is irrational, since otherwise we may write π = ''k''/''n'' (''k'',&nbsp;''n'', integers) and then ±''i''π are the roots of ''x''<sup>2</sup>&nbsp;+&nbsp;''k''<sup>2</sup>/''n''<sup>2</sup>; thus 2&nbsp;+&nbsp;''e''<sup>''i''π</sup>&nbsp;+&nbsp;''e''<sup>−''i''π</sup>&nbsp;≠&nbsp;0; but this is false.

Similarly, Lemma B is sufficient to prove that π is transcendental, since otherwise we would have 1&nbsp;+&nbsp;''e''<sup>''i''π</sup>&nbsp;≠&nbsp;0.

== References ==
{{Reflist|2}}

== Further reading ==
*{{Citation|authorlink=Alan Baker (mathematician)|last=Baker|first=Alan|title=Transcendental Number Theory|publisher=Cambridge University Press|year=1975|isbn=0-521-39791-X}}

{{DEFAULTSORT:Lindemann-Weierstrass theorem}}
[[Category:Exponentials]]
[[Category:Number theory]]
[[Category:Pi]]
[[Category:Transcendental numbers]]
[[Category:Articles containing proofs]]
[[Category:Theorems in number theory]]
[[Category:E (mathematical constant)]]
 engbjl4o38faiqn627hd143grqvvj3y
 
 
 
 Orbit portrait
 0
 15331
 
 15332
 2011-07-29T10:23:36Z
 
 Yobot
 0
 
 
 [[WP:CHECKWIKI]] error fixes (category with space) + [[WP:GENFIXES|general fixes]] using [[Project:AWB|AWB]] (7796)
 wikitext
 text/x-wiki
 {{Wikibooks|Fractals }}
In [[mathematics]], an '''orbit portrait''' is a combinatorial tool used in [[Complex analytic dynamics|complex dynamics]] for understanding the behavior of [[Complex quadratic polynomial|one-complex dimensional quadratic maps]].

In simple words one can say that it is:
* a list of external angles for which rays land on points of that orbit
* a graph showing the above list

==Definition==
Given a [[Complex quadratic polynomial|quadratic map]]
:<math>f_c : z \to z^2 + c \,</math>
from the [[complex plane]] to itself
:<math>f_c : \mathbb{C} \to \mathbb{C} \,</math>
and a [[w:Periodic points of complex quadratic mappings|repelling or parabolic]] periodic [[Orbit (dynamics)|orbit]] <math>{\mathcal O} = \{z_1, \ldots z_n\}</math> of <math>f\,</math>, so that <math>f(z_j) = z_{j+1}\,</math> (where subscripts are taken modulo <math>n</math>), let <math>A_j</math> be the set of [[external ray|angles]] whose corresponding [[external ray]]s land at <math>z_j\,</math>.

Then the set <math>{\mathcal P} = {\mathcal P}({\mathcal O}) = \{A_1, \ldots A_n\}</math> is called '''the orbit portrait of the periodic orbit''' <math>{\mathcal O}</math>.

All of the sets <math>A_j\,</math> must have the same number of elements, which is called the '''valence''' of the portrait.
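
A commonly cited simple case is <math>c = -1</math>, where the external rays at angles 1/3 and 2/3 land together on the repelling α fixed point, so that single-point orbit has the orbit portrait <math>\{\{1/3, 2/3\}\}</math> of valence 2. The following minimal sketch (an illustration only; the helper function is an arbitrary implementation choice) checks the behaviour of these two angles under angle doubling.

<syntaxhighlight lang="python">
from fractions import Fraction

def double(t):
    """Angle doubling t -> 2t (mod 1) on the circle of external angles."""
    return (2 * t) % 1

A1 = {Fraction(1, 3), Fraction(2, 3)}   # angles of the rays landing on the alpha fixed point for c = -1

# Doubling maps the set A1 onto itself (1/3 -> 2/3 -> 1/3), so the portrait {A1}
# is consistent with an orbit of period n = 1 whose rays have period 2 (valence 2).
assert {double(t) for t in A1} == A1
assert double(Fraction(1, 3)) == Fraction(2, 3)
assert double(Fraction(2, 3)) == Fraction(1, 3)
print("period-1 orbit, ray period 2, valence 2")
</syntaxhighlight>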
+

==Examples==
[[File:Julia-p9.png|right|thumb|Julia set with external rays landing on period 3 orbit]]
* Parabolic orbit portrait

For the [[complex quadratic polynomial]] with ''c'' = −0.03111 + 0.79111''i'', the portrait of the parabolic period-3 orbit is:<ref>[http://comet.lehman.cuny.edu/keenl/FlekKeenJDEA.pdf Boundaries of Bounded Fatou Components of Quadratic Maps Ross Flek and Linda Keen]</ref>
<math>{\mathcal P} = \left \{
\left(\frac{74}{511},\frac{81}{511},\frac{137}{511} \right) ,
 \left(\frac{148}{511},\frac{162}{511},\frac{274}{511} \right) ,
 \left(\frac{296}{511},\frac{324}{511},\frac{37}{511} \right)
\right \}</math>

The valence is 3 rays per orbit point.

The rays at the above angles land on the points of that orbit. The parameter ''c'' is the center of a period-9 hyperbolic component of the [[Mandelbrot set]].

==Properties==
Every orbit portrait <math>{\mathcal P}</math> has the following properties:

*Each <math>A_j</math> is a finite subset of <math>{\mathbb R} / {\mathbb Z}</math>.

*The [[doubling map]] on the circle gives a bijection from <math>A_j</math> to <math>A_{j+1}</math> and preserves cyclic order of the angles.<ref>[http://www.ibiblio.org/e-notes/Chaos/saw.htm Chaotic 1D maps by Evgeny Demidov]</ref>

*All of the angles in all of the sets <math>A_1, \ldots, A_n</math> are periodic under the doubling map of the circle, and all of the angles have exactly the same period. This period must be a multiple of <math>n</math>, so the period is of the form <math>rn</math>, where <math>r</math> is called the recurrent ray period.

*The sets <math>A_j</math> are pairwise unlinked, which is to say that given any pair of them, there are two disjoint intervals of <math> {\mathbb R }/ {\mathbb Z}</math> where each interval contains one of the sets.

==Formal orbit portraits==
Any collection <math>\{A_1, \ldots, A_n\}</math> of subsets of the circle which satisfies the four properties above is called a '''formal orbit portrait'''. It is a theorem of [[John Milnor]] that every formal orbit portrait is realized by the actual orbit portrait of a periodic orbit of some quadratic one-complex-dimensional map. Orbit portraits contain dynamical information about how external rays and their landing points map in the plane, but formal orbit portraits are no more than combinatorial objects. Milnor's theorem states that, in truth, there is no distinction between the two.

==Trivial orbit portraits==
Orbit portraits where all of the sets <math>A_j</math> have only a single element are called trivial, except for the orbit portrait <math>\{\{0\}\}</math>. An alternative definition is that an orbit portrait is nontrivial if it is maximal, which in this case means that there is no orbit portrait that strictly contains it (i.e. there does not exist an orbit portrait <math>\{A^\prime_1,\ldots,A^\prime_n\}</math> such that <math>A_j \subsetneq A^\prime_j</math>). It is easy to see that every trivial formal orbit portrait is realized as the orbit portrait of some orbit of the map <math>f_0(z) = z^2</math>, since every external ray of this map lands, and they all land at distinct points of the [[Julia Set]]. Trivial orbit portraits are pathological in some respects, and in the sequel we will refer only to nontrivial orbit portraits.
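
The combinatorics of the period-9 example above can be checked directly: every listed angle has exact period 9 under the doubling map, and doubling sends each set of the portrait onto the next one. The following is a small illustrative sketch of such a check (the angle values are the ones listed in the example; everything else is an arbitrary implementation choice).

<syntaxhighlight lang="python">
from fractions import Fraction

def double(t):
    """The doubling map t -> 2t (mod 1) on the circle R/Z."""
    return (2 * t) % 1

# The three sets of external angles from the example above (denominator 511 = 2^9 - 1).
A = [
    {Fraction(74, 511), Fraction(81, 511), Fraction(137, 511)},
    {Fraction(148, 511), Fraction(162, 511), Fraction(274, 511)},
    {Fraction(296, 511), Fraction(324, 511), Fraction(37, 511)},
]

# Doubling maps each A_j onto A_(j+1) (indices modulo 3) ...
for j in range(3):
    assert {double(t) for t in A[j]} == A[(j + 1) % 3]

# ... and every angle has exact period 9 under doubling (9 = rn with n = 3, r = 3).
for t in A[0] | A[1] | A[2]:
    orbit = [t]
    while double(orbit[-1]) != t:
        orbit.append(double(orbit[-1]))
    assert len(orbit) == 9

print("doubling-map checks passed")
</syntaxhighlight>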
+

==Arcs==
In an orbit portrait <math>\{A_1, \ldots, A_n\}</math>, each <math>A_j</math> is a finite subset of the circle <math>\mathbb R / \mathbb Z</math>, so each <math>A_j</math> divides the circle into a number of disjoint intervals, called complementary arcs based at the point <math>z_j</math>. The length of each interval is referred to as its angular width.
Each <math>z_j</math> has a unique largest arc based at it, which is called its critical arc. The critical arc always has length greater than <math>\frac 1 2</math>.

These arcs have the property that every arc based at <math>z_j</math>, except for the critical arc, maps diffeomorphically to an arc based at <math>z_{j+1}</math>, and the critical arc covers every arc based at <math>z_{j+1}</math> once, except for a single arc, which it covers twice. The arc that it covers twice is called the critical value arc for <math>z_{j+1}</math>. This is not necessarily distinct from the critical arc.

When <math>c</math> escapes to infinity under iteration of <math>f_c</math>, or when <math>c</math> is in the Julia set, then <math>c</math> has a well-defined external angle. Call this angle <math>\theta_c</math>. <math>\theta_c</math> is in every critical value arc. Also, the two inverse images of <math>\theta_c</math> under the doubling map (<math>\frac {\theta_c} 2</math> and <math>\frac {\theta_c + 1} 2</math>) are both in every critical arc.

Among all of the critical value arcs for all of the <math>A_j</math>'s, there is a unique smallest critical value arc <math>{\mathcal I}_{\mathcal P}</math>, called the '''characteristic arc''', which is strictly contained within every other critical value arc. The characteristic arc is a complete invariant of an orbit portrait, in the sense that two orbit portraits are identical if and only if they have the same characteristic arc.

==Sectors==
Much as the rays landing on the orbit divide up the circle, they divide up the complex plane. For every point <math>z_j</math> of the orbit, the [[external ray]]s landing at <math>z_j</math> divide the plane into <math>v</math> open sets called sectors based at <math>z_j</math>. Sectors are naturally identified with the complementary arcs based at the same point. The angular width of a sector is defined as the length of its corresponding complementary arc. Sectors are called '''critical sectors''' or '''critical value sectors''' when the corresponding arcs are, respectively, critical arcs and critical value arcs.<ref>[http://www.ibiblio.org/e-notes/MSet/wakes.htm Periodic orbits and external rays by Evgeny Demidov]</ref>

Sectors also have the interesting property that <math>0</math> is in the critical sector of every point, and <math>c</math>, the [[Complex_quadratic_polynomial#Critical_value|critical value]] of <math>f_c</math>, is in the critical value sector.

==Parameter wakes==
Two [[External ray|parameter rays]] with angles <math>t_-</math> and <math>t_+</math> land at the same point of the [[Mandelbrot Set]] in parameter space if and only if there exists an orbit portrait <math>\mathcal P</math> with the interval <math>[t_-, t_+]</math> as its characteristic arc. For any orbit portrait <math>\mathcal P</math> let <math>r_{\mathcal P}</math> be the common landing point of the two external angles in parameter space corresponding to the characteristic arc of <math>\mathcal P</math>. These two parameter rays, along with their common landing point, split the parameter space into two open components.
Let the component that does not contain the point <math>0</math> be called the <math>\mathcal P</math>-wake and denoted as <math>{\mathcal W}_{\mathcal P}</math>. A [[Complex quadratic polynomial|quadratic polynomial]] <math>f_c(z) = z^2 + c</math> realizes the orbit portrait <math>{\mathcal P}</math> with a repelling orbit exactly when <math>c \in {\mathcal W}_{\mathcal P}</math>. <math>{\mathcal P}</math> is realized with a parabolic orbit only for the single value <math> c= r_{\mathcal P}</math>.

==Primitive and satellite orbit portraits==
Other than the zero portrait, there are two types of orbit portraits: primitive and satellite. If
<math>v</math> is the valence of an orbit portrait <math>\mathcal P</math> and <math>r</math> is the recurrent ray period, then these two types may be characterized as follows:
* Primitive orbit portraits have <math>r = 1</math> and <math>v = 2</math>. Every ray in the portrait is mapped to itself by <math>f^n</math>. Each <math>A_j</math> is a pair of angles, each in a distinct orbit of the doubling map. In this case, <math>r_{\mathcal P}</math> is the base point of a baby Mandelbrot set in parameter space.
* Satellite orbit portraits have <math>r = v \ge 2</math>. In this case, all of the angles make up a single orbit under the doubling map. Additionally, <math>r_{\mathcal P}</math> is the base point of a parabolic bifurcation in parameter space.

==See also==
* abstract Mandelbrot set <ref>[http://www.mostlymaths.net/2009/08/lavaurs-algorithm.html Lavaurs algorithm by Rubén Berenguel]</ref>
*[[Lamination (topology)|Lamination]]

==References==
{{Reflist}}

[[Category:Dynamical systems]]
 22rkzu69u1in4eskfaiywgq0dguiwa6
 
 
 
 First-order logic
 0
 390
 
 391
 2014-01-22T16:23:29Z
 
 Arthur Rubin
 0
 
 Reverted [[WP:AGF|good faith]] edits by [[Special:Contributions/Arthur Rubin|Arthur Rubin]] ([[User talk:Arthur Rubin|talk]]): Wrong button. ([[WP:TW|TW]])
 wikitext
 text/x-wiki
 '''First-order logic''' is a [[formal system]] used in [[mathematics]], [[philosophy]], [[linguistics]], and [[computer science]]. It is also known as '''first-order predicate calculus''', the '''lower predicate calculus''', '''quantification theory''', and [[predicate logic]]. First-order logic is distinguished from [[propositional logic]] by its use of [[Quantifier#Logic|quantified variables]].

A theory about some topic is usually first-order logic together with a specified [[domain of discourse]] over which the quantified variables range, finitely many functions which map from that domain into it, finitely many predicates defined on that domain, and a recursive set of axioms which are believed to hold for those things. Sometimes "theory" is understood in a more formal sense, which is just a set of sentences in first-order logic.

The adjective "first-order" distinguishes first-order logic from [[higher-order logic]] in which there are predicates having predicates or functions as arguments, or in which one or both of predicate quantifiers or function quantifiers are permitted.<ref>{{cite book|last=Mendelson|first=Elliott|title=Introduction to Mathematical Logic|year=1964|publisher=Van Nostrand Reinhold|pages=56}}</ref> In first-order theories, predicates are often associated with sets. In interpreted higher-order theories, predicates may be interpreted as sets of sets.
+ +There are many [[deductive system]]s for first-order logic that are [[Soundness#Logical systems|sound]] (all provable statements are true) and [[Completeness#Logical completeness|complete]] (all true statements are provable). Although the [[logical consequence]] relation is only [[semidecidability|semidecidable]], much progress has been made in [[automated theorem proving]] in first-order logic. First-order logic also satisfies several [[metalogic]]al theorems that make it amenable to analysis in [[proof theory]], such as the [[Löwenheim–Skolem theorem]] and the [[compactness theorem]]. + +First-order logic is the standard for the formalization of mathematics into [[Axiomatic system|axioms]] and is studied in the [[foundations of mathematics]]. Mathematical theories, such as [[number theory]] and [[set theory]], have been formalized into first-order axiom schemata such as [[Peano arithmetic]] and [[Zermelo–Fraenkel set theory]] (ZF) respectively. + +No first-order theory, however, has the strength to describe fully and [[categorical theory|categorically]] structures with an infinite domain, such as the [[natural number]]s or the [[real line]]. Categorical axiom systems for these structures can be obtained in stronger logics such as [[second-order logic]]. + +For a history of first-order logic and how it came to dominate formal logic, see José Ferreirós (2001). + +==Introduction== +While [[propositional logic]] deals with simple declarative propositions, first-order logic additionally covers [[Predicate (logic)|predicate]]s and [[quantification]]. + +A predicate takes an entity or entities in the [[domain of discourse]] as input and outputs either True or False. Consider the two sentences "Socrates is a philosopher" and "Plato is a philosopher". In [[Propositional calculus|propositional logic]], these sentences are viewed as being unrelated and are denoted, for example, by ''p'' and ''q''. However, the predicate "is a philosopher" occurs in both sentences which have a common structure of "''a'' is a philosopher". The variable ''a'' is instantiated as "Socrates" in first sentence and is instantiated as "Plato" in the second sentence. The use of predicates, such as "is a philosopher" in this example, distinguishes first-order logic from propositional logic. + +Predicates can be compared. Consider, for example, the first-order formula "if ''a'' is a philosopher, then ''a'' is a scholar". This formula is a [[material conditional|conditional]] statement with "''a'' is a philosopher" as hypothesis and "''a'' is a scholar" as conclusion. The truth of this formula depends on which object is denoted by ''a'', and on the interpretations of the predicates "is a philosopher" and "is a scholar". + +Variables can be [[Quantifier|quantified]] over. The variable ''a'' in the previous formula can be quantified over, for instance, in the first-order sentence "For every ''a'', if ''a'' is a philosopher, then ''a'' is a scholar". The [[universal quantifier]] "for every" in this sentence expresses the idea that the claim "if ''a'' is a philosopher, then ''a'' is a scholar" holds for ''all'' choices of ''a''. + +The negation of the above sentence "For every ''a'', if ''a'' is a philosopher, then ''a'' is a scholar" is logically equivalent to the sentence "There exists ''a'' such that ''a'' is a philosopher and ''a'' is not a scholar". The [[existential quantifier]] "there exists" expresses the idea that the claim "''a'' is a philosopher and ''a'' is not a scholar" holds for ''some'' choice of ''a''. 
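
Over a finite domain of discourse this relationship between the two quantifiers can be checked by brute force. The following is a small illustrative sketch (the domain and the two predicates are made-up examples, not drawn from any reference); it verifies that the negation of the universal statement and the existential statement always agree.

<syntaxhighlight lang="python">
# Illustrative only: a tiny finite domain and two hypothetical unary predicates.
domain = ["Socrates", "Plato", "Aristotle", "Xanthippe"]

def is_philosopher(a):
    return a in {"Socrates", "Plato", "Aristotle"}

def is_scholar(a):
    return a in {"Plato", "Aristotle"}

# "For every a, if a is a philosopher then a is a scholar"
forall_version = all((not is_philosopher(a)) or is_scholar(a) for a in domain)

# "There exists a such that a is a philosopher and a is not a scholar"
exists_version = any(is_philosopher(a) and not is_scholar(a) for a in domain)

# The negation of the universal statement coincides with the existential one.
print(not forall_version)   # True (in this made-up example Socrates is a philosopher but not a scholar)
print(exists_version)       # True
assert (not forall_version) == exists_version
</syntaxhighlight>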
+

The predicates "is a philosopher" and "is a scholar" each take a single variable. Predicates can take several variables. In the first-order sentence "Socrates is the teacher of Plato", the predicate "is the teacher of" takes two variables.

To interpret a first-order formula, one specifies what each predicate means and the entities that can instantiate the predicated variables. These entities form the [[domain of discourse]] or universe, which is usually required to be a nonempty set. Given an interpretation in which the domain of discourse consists of all human beings and the predicate "is a philosopher" is understood as "has written the Republic", the sentence "There exists ''a'' such that ''a'' is a philosopher" is seen as being true, as witnessed by Plato.

==Syntax==
There are two key parts of first-order logic. The [[syntax]] determines which collections of symbols are legal expressions in first-order logic, while the [[semantics]] determine the meanings behind these expressions.

===Alphabet===
Unlike natural languages, such as English, the language of first-order logic is completely formal, so that it can be mechanically determined whether a given expression is legal. There are two key types of legal expressions: '''terms''', which intuitively represent objects, and '''formulas''', which intuitively express predicates that can be true or false. The terms and formulas of first-order logic are strings of '''symbols''' which together form the '''alphabet''' of the language. As with all [[formal language]]s, the nature of the symbols themselves is outside the scope of formal logic; they are often regarded simply as letters and punctuation symbols.

It is common to divide the symbols of the alphabet into '''logical symbols''', which always have the same meaning, and '''non-logical symbols''', whose meaning varies by interpretation. For example, the logical symbol <math>\land</math> always represents "and"; it is never interpreted as "or". On the other hand, a non-logical predicate symbol such as Phil(''x'') could be interpreted to mean "''x'' is a philosopher", "''x'' is a man named Philip", or any other unary predicate, depending on the interpretation at hand.

====Logical symbols====
There are several logical symbols in the alphabet, which vary by author but usually include:
* The quantifier symbols [[∀]] and [[∃]]
* The [[logical connective]]s: ∧ for [[logical conjunction|conjunction]], ∨ for [[disjunction]], → for [[material conditional|implication]], ↔ for [[logical biconditional|biconditional]], ¬ for negation. Occasionally other logical connective symbols are included. Some authors use C''pq'' instead of →, and E''pq'' instead of ↔, especially in contexts where <math>\to</math> is used for other purposes. Moreover, the horseshoe ⊃ may replace →; the triple-bar ≡ may replace ↔, and a tilde (~), N''p'', or F''pq'', may replace ¬; ''||'', or A''pq'' may replace ∨; and &, K''pq'', or the middle dot, ⋅, may replace [[∧]], especially if these symbols are not available for technical reasons. (''Note'': the aforementioned symbols C''pq'', E''pq'', N''p'', A''pq'', and K''pq'' are used in [[Polish notation]].)
* Parentheses, brackets, and other punctuation symbols. The choice of such symbols varies depending on context.
* An infinite set of '''variables''', often denoted by lowercase letters at the end of the alphabet ''x'', ''y'', ''z'', … .
Subscripts are often used to distinguish variables: ''x''<sub>0</sub>, ''x''<sub>1</sub>, ''x''<sub>2</sub>, … . +* An '''equality symbol''' (sometimes, '''identity symbol''') =; see [[#Equality_and_its_axioms|the section on equality below]]. + +It should be noted that not all of these symbols are required – only one of the quantifiers, negation and conjunction, variables, brackets and equality suffice. There are numerous minor variations that may define additional logical symbols: +* Sometimes the truth constants T, V''pq'', or ⊤, for "true" and F, O''pq'', or ⊥, for "false" are included. Without any such logical operators of valence 0, these two constants can only be expressed using quantifiers. +* Sometimes additional logical connectives are included, such as the [[Sheffer stroke]], D''pq'' (NAND), and [[exclusive or]], J''pq''. + +====Non-logical symbols====<!-- This section is linked from [[Axiom of empty set]] --> +The [[non-logical symbols]] represent predicates (relations), functions and constants on the domain of discourse. It used to be standard practice to use a fixed, infinite set of non-logical symbols for all purposes. A more recent practice is to use different non-logical symbols according to the application one has in mind. Therefore it has become necessary to name the set of all non-logical symbols used in a particular application. This choice is made via a '''[[signature (mathematical logic)|signature]]'''.<ref>The word ''language'' is sometimes used as a synonym for signature, but this can be confusing because "language" can also refer to the set of formulas.</ref> + +The traditional approach is to have only one, infinite, set of non-logical symbols (one signature) for all applications. Consequently, under the traditional approach there is only one language of first-order logic.<ref>More precisely, there is only one language of each variant of one-sorted first-order logic: with or without equality, with or without functions, with or without propositional variables, ….</ref> This approach is still common, especially in philosophically oriented books. +# For every integer ''n''&nbsp;≥&nbsp;0 there is a collection of [[arity|''n''-'''ary''']], or ''n''-'''place''', '''predicate symbols'''. Because they represent [[relation (mathematics)|relations]] between ''n'' elements, they are also called '''relation symbols'''. For each arity ''n'' we have an infinite supply of them: +#:''P''<sup>''n''</sup><sub>0</sub>, ''P''<sup>''n''</sup><sub>1</sub>, ''P''<sup>''n''</sup><sub>2</sub>, ''P''<sup>''n''</sup><sub>3</sub>, … +# For every integer ''n''&nbsp;≥&nbsp;0 there are infinitely many ''n''-ary '''function symbols''': +#:''f<sup> n</sup>''<sub>0</sub>, ''f<sup> n</sup>''<sub>1</sub>, ''f<sup> n</sup>''<sub>2</sub>, ''f<sup> n</sup>''<sub>3</sub>, … + +In contemporary mathematical logic, the signature varies by application. Typical signatures in mathematics are {1, ×} or just {×} for [[group (mathematics)|group]]s, or {0, 1, +, ×, <} for [[ordered field]]s. There are no restrictions on the number of non-logical symbols. The signature can be [[empty set|empty]], finite, or infinite, even [[uncountable]]. Uncountable signatures occur for example in modern proofs of the [[Löwenheim-Skolem theorem]]. + +In this approach, every non-logical symbol is of one of the following types. +# A '''predicate symbol''' (or '''relation symbol''') with some '''valence''' (or '''arity''', number of arguments) greater than or equal to 0. 
These are often denoted by uppercase letters ''P'', ''Q'', ''R'',... . +#* Relations of valence 0 can be identified with [[propositional variable]]s. For example, ''P'', which can stand for any statement. +#* For example, ''P''(''x'') is a predicate variable of valence 1. One possible interpretation is "''x'' is a man". +#* ''Q''(''x'',''y'') is a predicate variable of valence 2. Possible interpretations include "''x'' is greater than ''y''" and "''x'' is the father of ''y''". +# A '''function symbol''', with some valence greater than or equal to 0. These are often denoted by lowercase letters ''f'', ''g'', ''h'',... . +#* Examples: ''f''(''x'') may be interpreted as for "the father of ''x''". In [[arithmetic]], it may stand for "-x". In [[set theory]], it may stand for "the [[power set]] of x". In arithmetic, ''g''(''x'',''y'') may stand for "''x''+''y''". In set theory, it may stand for "the union of ''x'' and ''y''". +#* Function symbols of valence 0 are called '''constant symbols''', and are often denoted by lowercase letters at the beginning of the alphabet ''a'', ''b'', ''c'',... . The symbol ''a'' may stand for Socrates. In arithmetic, it may stand for 0. In set theory, such a constant may stand for the empty set. + +The traditional approach can be recovered in the modern approach by simply specifying the "custom" signature to consist of the traditional sequences of non-logical symbols. + +===Formation rules=== +The [[formation rule]]s define the terms and formulas of first order logic. When terms and formulas are represented as strings of symbols, these rules can be used to write a [[formal grammar]] for terms and formulas. These rules are generally [[Context-free grammar|context-free]] (each production has a single symbol on the left side), except that the set of symbols may be allowed to be infinite and there may be many start symbols, for example the variables in the case of [[First-order logic#Terms|terms]]. + +====Terms==== +The set of '''[[Term (mathematics)|terms]]''' is [[inductive definition|inductively defined]] by the following rules: +# '''Variables.''' Any variable is a term. +# '''Functions.''' Any expression ''f''(''t''<sub>1</sub>,...,''t''<sub>''n''</sub>) of ''n'' arguments (where each argument ''t''<sub>''i''</sub> is a term and ''f'' is a function symbol of valence ''n'') is a term. In particular, symbols denoting individual constants are 0-ary function symbols, and are thus terms. +Only expressions which can be obtained by finitely many applications of rules 1 and 2 are terms. For example, no expression involving a predicate symbol is a term. + +====Formulas==== +The set of '''[[formula (mathematical logic)|formulas]]''' (also called '''[[well-formed formula]]s'''<ref>Some authors who use the term "well-formed formula" use "formula" to mean any string of symbols from the alphabet. However, most authors in mathematical logic use "formula" to mean "well-formed formula" and have no term for non-well-formed formulas. In every context, it is only the well-formed formulas that are of interest.</ref> or '''wff'''s) is inductively defined by the following rules: +# '''Predicate symbols.''' If ''P'' is an ''n''-ary predicate symbol and ''t''<sub>''1''</sub>, ..., ''t''<sub>''n''</sub> are terms then ''P''(''t''<sub>1</sub>,...,''t''<sub>n</sub>) is a formula. +# '''Equality.''' If the equality symbol is considered part of logic, and ''t''<sub>''1''</sub> and ''t''<sub>2</sub> are terms, then ''t''<sub>1</sub> = ''t''<sub>2</sub> is a formula. 
+# '''Negation.''' If φ is a formula, then <math>\lnot</math>φ is a formula. +# '''Binary connectives.''' If φ and ψ are formulas, then (φ <math>\rightarrow</math> ψ) is a formula. Similar rules apply to other binary logical connectives. +# '''Quantifiers.''' If φ is a formula and ''x'' is a variable, then <math>\forall x \varphi</math> (for all x, <math>\varphi</math> holds) and <math>\exists x \varphi</math> (there exists x such that <math>\varphi</math>) are formulas. +Only expressions which can be obtained by finitely many applications of rules 1–5 are formulas. The formulas obtained from the first two rules are said to be '''[[atomic formula]]s'''. + +For example, +:<math>\forall x \forall y (P(f(x)) \rightarrow\neg (P(x) \rightarrow Q(f(y),x,z)))</math> +is a formula, if ''f'' is a unary function symbol, ''P'' a unary predicate symbol, and Q a ternary predicate symbol. On the other hand, <math>\forall x\, x \rightarrow</math> is not a formula, although it is a string of symbols from the alphabet. + +The role of the parentheses in the definition is to ensure that any formula can only be obtained in one way by following the inductive definition (in other words, there is a unique [[parse tree]] for each formula). This property is known as '''unique readability''' of formulas. There are many conventions for where parentheses are used in formulas. For example, some authors use colons or full stops instead of parentheses, or change the places in which parentheses are inserted. Each author's particular definition must be accompanied by a proof of unique readability. + +This definition of a formula does not support defining an if-then-else function <tt>ite(c, a, b)</tt>, where "c" is a condition expressed as a formula, that would return "a" if c is true, and "b" if it is false. This is because both predicates and functions can only accept terms as parameters, but the first parameter is a formula. Some languages built on first-order logic, such as SMT-LIB 2.0, add this.<ref>The SMT-LIB Standard: Version 2.0, by Clark Barrett, Aaron Stump, and Cesare Tinelli. http://goedel.cs.uiowa.edu/smtlib/</ref> + +====Notational conventions==== +For convenience, conventions have been developed about the precedence of the logical operators, to avoid the need to write parentheses in some cases. These rules are similar to the [[order of operations]] in arithmetic. A common convention is: +* <math>\lnot</math> is evaluated first +* <math>\land</math> and <math>\lor</math> are evaluated next +* Quantifiers are evaluated next +* <math>\to</math> is evaluated last. +Moreover, extra punctuation not required by the definition may be inserted to make formulas easier to read. Thus the formula +:<math>(\lnot \forall x P(x) \to \exists x \lnot P(x))</math> +might be written as +:<math>(\lnot [\forall x P(x)]) \to \exists x [\lnot P(x)].</math> + +In some fields, it is common to use infix notation for binary relations and functions, instead of the prefix notation defined above. For example, in arithmetic, one typically writes "2 + 2 = 4" instead of "=(+(2,2),4)". It is common to regard formulas in infix notation as abbreviations for the corresponding formulas in prefix notation. + +The definitions above use infix notation for binary connectives such as <math>\to</math>. A less common convention is [[Polish notation]], in which one writes <math>\rightarrow</math>, <math>\wedge</math>, and so on in front of their arguments rather than between them. This convention allows all punctuation symbols to be discarded. 
Polish notation is compact and elegant, but rarely used in practice because it is hard for humans to read it. In Polish notation, the formula +:<math>\forall x \forall y (P(f(x)) \rightarrow\neg (P(x) \rightarrow Q(f(y),x,z)))</math> +becomes {{nowrap|1="&forall;x&forall;y&rarr;Pfx&not;&rarr; PxQfyxz".}} + +===<span id="sentence"></span>Free and bound variables=== +{{Main|Free variables and bound variables}} + +In a formula, a variable may occur '''free''' or '''bound'''. Intuitively, a variable is free in a formula if it is not quantified: in <math>\forall y\, P(x,y)</math>, variable ''x'' is free while ''y'' is bound. The free and bound variables of a formula are defined inductively as follows. +# '''Atomic formulas.''' If φ is an atomic formula then ''x'' is free in φ if and only if ''x'' occurs in φ. Moreover, there are no bound variables in any atomic formula. +# '''Negation.''' ''x'' is free in <math>\neg</math>φ if and only if ''x'' is free in φ. ''x'' is bound in <math>\neg</math>φ if and only if ''x'' is bound in φ. +# '''Binary connectives.''' ''x'' is free in (φ <math>\rightarrow</math> ψ) if and only if ''x'' is free in either φ or ψ. ''x'' is bound in (φ <math>\rightarrow</math> ψ) if and only if ''x'' is bound in either φ or ψ. The same rule applies to any other binary connective in place of <math>\rightarrow</math>. +# '''Quantifiers.''' ''x'' is free in <math>\forall</math>''y'' φ if and only if ''x'' is free in φ and ''x'' is a different symbol from ''y''. Also, ''x'' is bound in <math>\forall</math>''y'' φ if and only if ''x'' is ''y'' or ''x'' is bound in φ. The same rule holds with <math>\exists</math> in place of <math>\forall</math>. + +For example, in <math>\forall</math>''x'' <math>\forall</math>''y'' (''P''(''x'')<math>\rightarrow</math> ''Q''(''x'',''f''(''x''),''z'')), ''x'' and ''y'' are bound variables, ''z'' is a free variable, and ''w'' is neither because it does not occur in the formula. + +Freeness and boundness can be also specialized to specific occurrences of variables in a formula. For example, in <math>P(x) \rightarrow \forall x\, Q(x)</math>, the first occurrence of ''x'' is free while the second is bound. In other words, the ''x'' in <math>P(x)</math> is free while the <math>x</math> in <math>\forall x\, Q(x)</math> is bound. + +A formula in first-order logic with no free variables is called a '''first-order [[sentence (mathematical logic)|sentence]]'''. These are the formulas that will have well-defined [[truth value]]s under an interpretation. For example, whether a formula such as Phil(''x'') is true must depend on what ''x'' represents. But the sentence <math>\exists x\, \text{Phil}(x)</math> will be either true or false in a given interpretation. + +===Examples=== + +====Abelian groups==== +In mathematics the language of ordered [[abelian groups]] has one constant symbol 0, one unary function symbol &minus;, one binary function symbol +, and one binary relation symbol ≤. Then: +*The expressions +(''x'', ''y'') and +(''x'', +(''y'', &minus;(''z''))) are '''terms'''. These are usually written as ''x'' + ''y'' and ''x'' + ''y'' &minus; ''z''. +*The expressions +(''x'', ''y'') = 0 and ≤(+(''x'', +(''y'', &minus;(''z''))), +(''x'', ''y'')) are '''atomic formulas'''. +:These are usually written as ''x'' + ''y'' = 0 and ''x'' + ''y'' − ''z'' &nbsp;≤&nbsp; ''x'' + ''y''. 
+*The expression <math>(\forall x \forall y \, \mathop{\leq}(\mathop{+}(x, y), z) \to \forall x\, \forall y\, \mathop{+}(x, y) = 0)</math> is a '''formula''', which is usually written as <math>\forall x \forall y ( x + y \leq z) \to \forall x \forall y (x+y = 0).</math> + +====Loving relation==== +There are 10 different formulas with 8 different meanings, that use the ''loving'' [[Binary relation|relation]] Lxy ("x loves y.") and the [[Quantification|quantifiers]] ∀ and ∃: + +{| style="margin: 1em auto 1em auto;" +|- +| style="width:350px" | +<!--START-->{| style="text-align: center; border: 1px solid darkgray; width:300px" +|- +|colspan="2"|<span style="color:darkgray;">No column/row is empty:</span> +|- style="vertical-align:top;" +|[[File:Predicate logic; 2 variables; example matrix a2e1.svg|thumb|center|120px|1. <math>\forall x \exist y Lyx</math>:<br>Everyone is loved by someone.]] +|[[File:Predicate logic; 2 variables; example matrix a1e2.svg|thumb|center|120px|2. <math>\forall x \exist y Lxy</math>:<br>Everyone loves someone.]] +|}<!--END--> +| rowspan="2" style="width:210px" | +<!--START-->{| style="text-align: center; border: 1px solid darkgray; width:160px" +|- +|colspan="2"|<span style="color:darkgray;">The diagonal is<br>nonempty/full:</span> +|- style="vertical-align:top;" +|[[File:Predicate logic; 2 variables; example matrix e(12).svg|thumb|center|120px|5. <math>\exist x Lxx</math>:<br>Someone loves himself.]] +|- style="vertical-align:top;" +|[[File:Predicate logic; 2 variables; example matrix a(12).svg|thumb|center|120px|6. <math>\forall x Lxx</math>:<br>Everyone loves himself.]] +|}<!--END--> +| rowspan="2" style="width:250px" | +<!--START-->{| style="text-align: center; border: 1px solid darkgray; width:160px" +|- +|colspan="2"|<span style="color:darkgray;">The matrix is<br>nonempty/full:</span> +|- style="vertical-align:top;" +|[[File:Predicate logic; 2 variables; example matrix e12.svg|thumb|center|120px|7. <math>\exist x \exist y Lxy</math>:<br>Someone loves someone.<br><br>8. <math>\exist x \exist y Lyx</math>:<br>Someone is loved by someone.]] +|- style="vertical-align:top;" +|[[File:Predicate logic; 2 variables; example matrix a12.svg|thumb|center|120px|9. <math>\forall x \forall y Lxy</math>:<br>Everyone loves everyone.<br><br>10. <math>\forall x \forall y Lyx</math>:<br>Everyone is loved by everyone.]] +|}<!--END--> +|rowspan="2"|[[File:Predicate logic; 2 variables; implications.svg|thumb|250px|right|[[Hasse diagram]] of the implications]] +|- +| +<!--START-->{| style="text-align: center; border: 1px solid darkgray; width:300px" +|- +|colspan="2"|<span style="color:darkgray;">One row/column is full:</span> +|- style="vertical-align:top;" +|[[File:Predicate logic; 2 variables; example matrix e1a2.svg|thumb|center|120px|3. <math>\exist x \forall y Lxy</math>:<br>Someone loves everyone.]] +|[[File:Predicate logic; 2 variables; example matrix e2a1.svg|thumb|center|120px|4. <math>\exist x \forall y Lyx</math>:<br>Someone is loved by everyone.]] +|}<!--END--> +|} + +The [[logical matrix|logical matrices]] represent the formulas for the case that there are five individuals that can love (vertical axis) and be loved (horizontal axis). Except for the sentences 9 and 10, they are examples. E.g. the matrix representing sentence 5 stands for "b loves himself."; the matrix representing sentences 7 and 8 stands for "c loves b." 
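+
+These readings can be checked mechanically over a small finite domain by representing the relation ''L'' as a set of ordered pairs. The following minimal Python sketch is only illustrative: the domain <tt>D</tt>, the particular pairs in <tt>L</tt>, and the variable names are hypothetical choices made for the example, not part of the formal development.
+<syntaxhighlight lang="python">
+# Hypothetical five-element domain and "loving" relation, given as (lover, loved) pairs.
+D = {"a", "b", "c", "d", "e"}
+L = {("b", "b"), ("c", "b"), ("d", "a"), ("d", "c"), ("d", "e"), ("a", "d")}
+
+# Sentence 1:  for all x there exists y with L(y, x)  --  "Everyone is loved by someone."
+sentence_1 = all(any((y, x) in L for y in D) for x in D)
+
+# Sentence 3:  there exists x such that for all y, L(x, y)  --  "Someone loves everyone."
+sentence_3 = any(all((x, y) in L for y in D) for x in D)
+
+print(sentence_1, sentence_3)   # True False for this particular relation
+</syntaxhighlight>
+On this particular relation sentence 1 holds while sentence 3 fails, matching the one-way implication discussed in the next paragraph.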
+ +It is important and instructive to distinguish sentence 1, <math>\forall x \exist y Lyx</math>, from sentence 3, <math>\exist x \forall y Lxy</math>: in both cases everyone is loved, but in the first case everyone is loved by someone, while in the second case everyone is loved by the same person. + +Some of these sentences imply others &mdash; for example, if sentence 3 is true, then sentence 1 is also true, but not vice versa (see the Hasse diagram). + +==Semantics== +An [[Interpretation (logic)|interpretation]] of a first-order language assigns a denotation to all non-logical constants in that language. It also determines a [[domain of discourse]] that specifies the range of the quantifiers. The result is that each term is assigned an object that it represents, and each sentence is assigned a truth value. In this way, an interpretation provides semantic meaning to the terms and formulas of the language. The study of the interpretations of formal languages is called [[Formal semantics (logic)|formal semantics]]. What follows is a description of the standard or [[Truth definition#Tarski.27s Theory|Tarskian]] semantics for first-order logic. (It is also possible to define [[Game semantics#Classical logic|game semantics for first-order logic]], but aside from requiring the [[axiom of choice]], game semantics agree with Tarskian semantics for first-order logic, so game semantics will not be elaborated herein.) + +The domain of discourse ''D'' is a nonempty set of "objects" of some kind. Intuitively, a first-order formula is a statement about these objects; for example, <math>\exists x P(x)</math> states the existence of an object ''x'' such that the predicate ''P'' is true of it. The domain of discourse is the set of objects under consideration. For example, one can take <math>D</math> to be the set of integers. + +The interpretation of a function symbol is a function. For example, if the domain of discourse consists of integers, a function symbol ''f'' of arity 2 can be interpreted as the function that gives the sum of its arguments. In other words, the symbol ''f'' is associated with the function ''I(f)'' which, in this interpretation, is addition. + +The interpretation of a constant symbol is a function from the one-element set ''D''<sup>0</sup> to ''D'', which can be simply identified with an object in ''D''. For example, an interpretation may assign the value <math>I(c)=10</math> to the constant symbol <math>c</math>. + +The interpretation of an ''n''-ary predicate symbol is a set of ''n''-tuples of elements of the domain of discourse. This means that, given an interpretation, a predicate symbol, and ''n'' elements of the domain of discourse, one can tell whether the predicate is true of those elements according to the given interpretation. For example, an interpretation ''I(P)'' of a binary predicate symbol ''P'' may be the set of pairs of integers such that the first one is less than the second. According to this interpretation, the predicate ''P'' would be true if its first argument is less than the second. + +===First-order structures=== +{{Main|Structure (mathematical logic)}} + +The most common way of specifying an interpretation (especially in mathematics) is to specify a '''structure''' (also called a '''model'''; see below). The structure consists of a nonempty set ''D'' that forms the domain of discourse and an interpretation ''I'' of the non-logical terms of the signature. 
This interpretation is itself a function: +* Each function symbol ''f'' of arity ''n'' is assigned a function ''I(f)'' from <math>D^n</math> to <math>D</math>. In particular, each constant symbol of the signature is assigned an individual in the domain of discourse. +* Each predicate symbol ''P'' of arity ''n'' is assigned a relation ''I(P)'' over <math>D^n</math> or, equivalently, a function from <math>D^n</math> to <math>\{true, false\}</math>. Thus each predicate symbol is interpreted by a [[Boolean-valued function]] on ''D''. + +===Evaluation of truth values=== +A formula evaluates to true or false given an interpretation and a '''variable assignment''' μ that associates an element of the domain of discourse with each variable. The reason that a variable assignment is required is to give meanings to formulas with free variables, such as <math>y = x</math>. The truth value of this formula changes depending on whether ''x'' and ''y'' denote the same individual. + +First, the variable assignment μ can be extended to all terms of the language, with the result that each term maps to a single element of the domain of discourse. The following rules are used to make this assignment: +# '''Variables.''' Each variable ''x'' evaluates to μ(''x''). +# '''Functions.''' Given terms <math>t_1, \ldots, t_n</math> that have been evaluated to elements <math>d_1, \ldots, d_n</math> of the domain of discourse, and an ''n''-ary function symbol ''f'', the term <math>f(t_1, \ldots, t_n)</math> evaluates to <math>(I(f))(d_1,\ldots,d_n)</math>. + +Next, each formula is assigned a truth value. The inductive definition used to make this assignment is called the [[T-schema]]. +# '''Atomic formulas (1).''' A formula <math>P(t_1,\ldots,t_n)</math> is assigned the value true or false depending on whether <math>\langle v_1,\ldots,v_n \rangle \in I(P)</math>, where <math>v_1,\ldots,v_n</math> are the evaluations of the terms <math>t_1,\ldots,t_n</math> and <math>I(P)</math> is the interpretation of <math>P</math>, which by assumption is a subset of <math>D^n</math>. +# '''Atomic formulas (2).''' A formula <math>t_1 = t_2</math> is assigned true if <math>t_1</math> and <math>t_2</math> evaluate to the same object of the domain of discourse (see the section on equality below). +# '''Logical connectives.''' A formula of the form <math>\neg \phi</math>, <math>\phi \rightarrow \psi</math>, etc. is evaluated according to the [[truth table]] for the connective in question, as in propositional logic. +# '''Existential quantifiers.''' A formula <math>\exists x \phi(x)</math> is true according to ''M'' and <math>\mu</math> if there exists a variable assignment <math>\mu'</math> that differs from <math>\mu</math> at most on the evaluation of ''x'' and such that φ is true according to the interpretation ''M'' and the variable assignment <math>\mu'</math>. This formal definition captures the idea that <math>\exists x \phi(x)</math> is true if and only if there is a way to choose a value for ''x'' such that φ(''x'') is satisfied. +# '''Universal quantifiers.''' A formula <math>\forall x \phi(x)</math> is true according to ''M'' and <math>\mu</math> if φ(''x'') is true according to ''M'' and every variable assignment <math>\mu'</math> that differs from <math>\mu</math> only on the value of ''x''. This captures the idea that <math>\forall x \phi(x)</math> is true if every possible choice of a value for ''x'' causes φ(''x'') to be true. 
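+
+The evaluation rules above can be read as a recursive procedure when the domain of discourse is finite. The following minimal Python sketch is purely illustrative: the encoding of formulas as nested tuples and the names <tt>eval_term</tt>, <tt>eval_formula</tt>, <tt>D</tt>, <tt>I</tt> and <tt>mu</tt> are assumptions made for the example, not standard notation.
+<syntaxhighlight lang="python">
+def eval_term(term, I, mu):
+    """Terms: a variable is looked up in mu; f(t1,...,tn) applies I[f] to the evaluated arguments."""
+    if isinstance(term, str):                      # variable
+        return mu[term]
+    f, *args = term                                # function application
+    return I[f](*(eval_term(t, I, mu) for t in args))
+
+def eval_formula(phi, D, I, mu):
+    """The T-schema: atomic formulas via I, connectives via truth tables, quantifiers over D."""
+    op, *rest = phi
+    if op == "not":
+        return not eval_formula(rest[0], D, I, mu)
+    if op == "implies":
+        return (not eval_formula(rest[0], D, I, mu)) or eval_formula(rest[1], D, I, mu)
+    if op in ("forall", "exists"):
+        x, body = rest
+        results = (eval_formula(body, D, I, {**mu, x: d}) for d in D)
+        return all(results) if op == "forall" else any(results)
+    if op == "eq":
+        return eval_term(rest[0], I, mu) == eval_term(rest[1], I, mu)
+    return tuple(eval_term(t, I, mu) for t in rest) in I[op]   # atomic P(t1,...,tn)
+
+# Example: D = {0,...,4}, "<" interpreted as the usual order on integers.
+D = range(5)
+I = {"<": {(a, b) for a in D for b in D if a < b}}
+phi = ("forall", "x", ("exists", "y", ("<", "x", "y")))   # "every element has a strictly larger one"
+print(eval_formula(phi, D, I, {}))                         # False on this finite domain
+</syntaxhighlight>
+The recursion terminates only because each formula is finite and the quantifier clauses range over a finite domain; over an infinite domain the T-schema still defines truth values, but it no longer describes an algorithm.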
+ +If a formula does not contain free variables, and so is a sentence, then the initial variable assignment does not affect its truth value. In other words, a sentence is true according to ''M'' and <math>\mu</math> if and only if it is true according to ''M'' and every other variable assignment <math>\mu'</math>. + +There is a second common approach to defining truth values that does not rely on variable assignment functions. Instead, given an interpretation ''M'', one first adds to the signature a collection of constant symbols, one for each element of the domain of discourse in ''M''; say that for each ''d'' in the domain the constant symbol ''c''<sub>''d''</sub> is fixed. The interpretation is extended so that each new constant symbol is assigned to its corresponding element of the domain. One now defines truth for quantified formulas syntactically, as follows: +# '''Existential quantifiers (alternate).''' A formula <math>\exists x \phi(x)</math> is true according to ''M'' if there is some ''d'' in the domain of discourse such that <math>\phi(c_d)</math> holds. Here <math>\phi(c_d)</math> is the result of substituting ''c''<sub>''d''</sub> for every free occurrence of ''x'' in φ. +# '''Universal quantifiers (alternate).''' A formula <math>\forall x \phi(x)</math> is true according to ''M'' if, for every ''d'' in the domain of discourse, <math>\phi(c_d)</math> is true according to ''M''. +This alternate approach gives exactly the same truth values to all sentences as the approach via variable assignments. + +===Validity, satisfiability, and logical consequence=== +{{see also|Satisfiability}} +If a sentence φ evaluates to True under a given interpretation ''M'', one says that ''M'' '''satisfies''' φ; this is denoted <math>M \vDash \phi</math>. A sentence is '''satisfiable''' if there is some interpretation under which it is true. + +Satisfiability of formulas with free variables is more complicated, because an interpretation on its own does not determine the truth value of such a formula. The most common convention is that a formula with free variables is said to be satisfied by an interpretation if the formula remains true regardless which individuals from the domain of discourse are assigned to its free variables. This has the same effect as saying that a formula is satisfied if and only if its [[universal closure]] is satisfied. + +A formula is '''logically valid''' (or simply '''valid''') if it is true in every interpretation. These formulas play a role similar to [[tautology (logic)|tautologies]] in propositional logic. + +A formula φ is a '''logical consequence''' of a formula ψ if every interpretation that makes ψ true also makes φ true. In this case one says that φ is logically implied by ψ. + +===Algebraizations=== +An alternate approach to the semantics of first-order logic proceeds via [[abstract algebra]]. This approach generalizes the [[Lindenbaum–Tarski algebra]]s of propositional logic. There are three ways of eliminating quantified variables from first-order logic that do not involve replacing quantifiers with other variable binding term operators: +*[[Cylindric algebra]], by [[Alfred Tarski]] and his coworkers; +*[[Polyadic algebra]], by [[Paul Halmos]]; +*[[Predicate functor logic]], mainly due to [[Willard Van Orman Quine|Willard Quine]]. +These [[algebra]]s are all [[lattice (order)|lattices]] that properly extend the [[two-element Boolean algebra]]. 
+ +Tarski and Givant (1987) showed that the fragment of first-order logic that has no [[atomic sentence]] lying in the scope of more than three quantifiers has the same expressive power as [[relation algebra]]. This fragment is of great interest because it suffices for [[Peano arithmetic]] and most [[axiomatic set theory]], including the canonical [[Zermelo–Fraenkel set theory|ZFC]]. They also prove that first-order logic with a primitive [[ordered pair]] is equivalent to a relation algebra with two ordered pair [[projection function]]s. + +===First-order theories, models, and elementary classes=== +A '''first-order theory''' of a particular signature is a set of [[axiom]]s, which are sentences consisting of symbols from that signature. The set of axioms is often finite or [[recursively enumerable]], in which case the theory is called '''effective'''. Some authors require theories to also include all logical consequences of the axioms. The axioms are considered to hold within the theory and from them other sentences that hold within the theory can be derived. + +A first-order structure that satisfies all sentences in a given theory is said to be a '''model''' of the theory. An '''[[elementary class]]''' is the set of all structures satisfying a particular theory. These classes are a main subject of study in [[model theory]]. + +Many theories have an '''[[intended interpretation]]''', a certain model that is kept in mind when studying the theory. For example, the intended interpretation of [[Peano arithmetic]] consists of the usual [[natural number]]s with their usual operations. However, the Löwenheim–Skolem theorem shows that most first-order theories will also have other, [[nonstandard model]]s. + +A theory is '''[[consistency|consistent]]''' if it is not possible to prove a contradiction from the axioms of the theory. A theory is '''[[complete theory|complete]]''' if, for every formula in its signature, either that formula or its negation is a logical consequence of the axioms of the theory. [[Gödel's incompleteness theorem]] shows that effective first-order theories that include a sufficient portion of the theory of the natural numbers can never be both consistent and complete. + +For more information on this subject see [[List of first-order theories]] and [[Theory (mathematical logic)]] + +===Empty domains=== +{{Main|Empty domain}} +The definition above requires that the domain of discourse of any interpretation must be a nonempty set. There are settings, such as [[inclusive logic]], where empty domains are permitted. Moreover, if a class of algebraic structures includes an empty structure (for example, there is an empty [[poset]]), that class can only be an elementary class in first-order logic if empty domains are permitted or the empty structure is removed from the class. + +There are several difficulties with empty domains, however: +* Many common rules of inference are only valid when the domain of discourse is required to be nonempty. One example is the rule stating that <math>\phi \lor \exists x \psi</math> implies <math>\exists x (\phi \lor \psi)</math> when ''x'' is not a free variable in φ. This rule, which is used to put formulas into [[prenex normal form]], is sound in nonempty domains, but unsound if the empty domain is permitted. +* The definition of truth in an interpretation that uses a variable assignment function cannot work with empty domains, because there are no variable assignment functions whose range is empty. 
(Similarly, one cannot assign interpretations to constant symbols.) This truth definition requires that one select a variable assignment function (μ above) before truth values for even atomic formulas can be defined. Then the truth value of a sentence is defined to be its truth value under any variable assignment, and it is proved that this truth value does not depend on which assignment is chosen. This technique does not work if there are no assignment functions at all; it must be changed to accommodate empty domains. +Thus, when the empty domain is permitted, it must often be treated as a special case. Most authors, however, simply exclude the empty domain by definition. + +==Deductive systems== +A '''deductive system''' is used to demonstrate, on a purely syntactic basis, that one formula is a logical consequence of another formula. There are many such systems for first-order logic, including [[Hilbert-style deductive system]]s, [[natural deduction]], the [[sequent calculus]], the [[Method of analytic tableaux|tableaux method]], and [[Resolution (logic)|resolution]]. These share the common property that a deduction is a finite syntactic object; the format of this object, and the way it is constructed, vary widely. These finite deductions themselves are often called '''derivations''' in proof theory. They are also often called proofs but, unlike natural-language [[mathematical proofs]], they are completely formalized. + +A deductive system is '''sound''' if any formula that can be derived in the system is logically valid. Conversely, a deductive system is '''complete''' if every logically valid formula is derivable. All of the systems discussed in this article are both sound and complete. They also share the property that it is possible to effectively verify that a purportedly valid deduction is actually a deduction; such deduction systems are called '''effective'''. + +A key property of deductive systems is that they are purely syntactic, so that derivations can be verified without considering any interpretation. Thus a sound argument is correct in every possible interpretation of the language, regardless of whether that interpretation is about mathematics, economics, or some other area. + +In general, logical consequence in first-order logic is only [[semidecidability|semidecidable]]: if a sentence A logically implies a sentence B, then this can be discovered (for example, by searching for a proof until one is found, using some effective, sound, complete proof system). However, if A does not logically imply B, this does not mean that A logically implies the negation of B. There is no effective procedure that, given formulas A and B, always correctly decides whether A logically implies B. + +===Rules of inference=== +{{Further|List of rules of inference}} +A '''[[rule of inference]]''' states that, given a particular formula (or set of formulas) with a certain property as a hypothesis, another specific formula (or set of formulas) can be derived as a conclusion. The rule is sound (or truth-preserving) if it preserves validity in the sense that whenever any interpretation satisfies the hypothesis, that interpretation also satisfies the conclusion. + +For example, one common rule of inference is the '''rule of substitution'''. If ''t'' is a term and φ is a formula possibly containing the variable ''x'', then φ[''t''/''x''] (often denoted φ[''x''/''t'']) is the result of replacing all free instances of ''x'' by ''t'' in φ. 
The substitution rule states that for any φ and any term ''t'', one can conclude φ[''t''/''x''] from φ provided that no free variable of ''t'' becomes bound during the substitution process. (If some free variable of ''t'' becomes bound, then to substitute ''t'' for ''x'' it is first necessary to change the bound variables of φ to differ from the free variables of ''t''.) + +To see why the restriction on bound variables is necessary, consider the logically valid formula φ given by <math>\exists x (x = y)</math>, in the signature of (0,1,+,×,=) of arithmetic. If ''t'' is the term "x + 1", the formula φ[''t''/''y''] is <math>\exists x ( x = x+1)</math>, which will be false in many interpretations. The problem is that the free variable ''x'' of ''t'' became bound during the substitution. The intended replacement can be obtained by renaming the bound variable ''x'' of φ to something else, say ''z'', so that the formula after substitution is <math>\exists z ( z = x+1)</math>, which is again logically valid. + +The substitution rule demonstrates several common aspects of rules of inference. It is entirely syntactical; one can tell whether it was correctly applied without appeal to any interpretation. It has (syntactically-defined) limitations on when it can be applied, which must be respected to preserve the correctness of derivations. Moreover, as is often the case, these limitations are necessary because of interactions between free and bound variables that occur during syntactic manipulations of the formulas involved in the inference rule. + +===Hilbert-style systems and natural deduction=== +A deduction in a Hilbert-style deductive system is a list of formulas, each of which is a '''logical axiom''', a hypothesis that has been assumed for the derivation at hand, or follows from previous formulas via a rule of inference. The logical axioms consist of several [[axiom scheme]]s of logically valid formulas; these encompass a significant amount of propositional logic. The rules of inference enable the manipulation of quantifiers. Typical Hilbert-style systems have a small number of rules of inference, along with several infinite schemes of logical axioms. It is common to have only [[modus ponens]] and [[generalization (logic)|universal generalization]] as rules of inference. + +Natural deduction systems resemble Hilbert-style systems in that a deduction is a finite list of formulas. However, natural deduction systems have no logical axioms; they compensate by adding additional rules of inference that can be used to manipulate the logical connectives in formulas in the proof. + +===Sequent calculus=== +{{Further | Sequent calculus}} + +The sequent calculus was developed to study the properties of natural deduction systems. Instead of working with one formula at a time, it uses '''sequents''', which are expressions of the form +:<math>A_1, \ldots, A_n \vdash B_1, \ldots, B_k,</math> +where A<sub>1</sub>, ..., A<sub>''n''</sub>, B<sub>1</sub>, ..., B<sub>''k''</sub> are formulas and the turnstile symbol <math>\vdash</math> is used as punctuation to separate the two halves. Intuitively, a sequent expresses the idea that <math>(A_1 \land \cdots\land A_n)</math> implies <math>(B_1\lor\cdots\lor B_k)</math>. 
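+
+The capture problem described above under the rule of substitution can also be illustrated concretely. The following minimal Python sketch reuses the hypothetical nested-tuple encoding of formulas from the evaluation sketch above; the helper names <tt>free_vars</tt> and <tt>substitute</tt> and the renaming scheme are assumptions made for the example.
+<syntaxhighlight lang="python">
+import itertools
+
+def free_vars(e):
+    """Free variables of a term or formula in the nested-tuple encoding."""
+    if isinstance(e, str):
+        return {e}
+    op, *rest = e
+    if op in ("forall", "exists"):
+        x, body = rest
+        return free_vars(body) - {x}
+    return set().union(*(free_vars(p) for p in rest)) if rest else set()
+
+def substitute(phi, x, t, fresh=itertools.count()):
+    """phi[t/x]: replace free occurrences of x by t, renaming bound variables
+    of phi that would capture a free variable of t."""
+    if isinstance(phi, str):
+        return t if phi == x else phi
+    op, *rest = phi
+    if op in ("forall", "exists"):
+        y, body = rest
+        if y == x:                      # x is bound here, so nothing to replace
+            return phi
+        if y in free_vars(t):           # y would capture a free variable of t
+            z = "{}_{}".format(y, next(fresh))
+            body = substitute(body, y, z)   # rename the bound variable first
+            y = z
+        return (op, y, substitute(body, x, t))
+    return (op, *(substitute(p, x, t) for p in rest))
+
+# The example from the text: phi is "exists x (x = y)" and t is the term x + 1.
+phi = ("exists", "x", ("eq", "x", "y"))
+print(substitute(phi, "y", ("+", "x", "1")))
+# -> ('exists', 'x_0', ('eq', 'x_0', ('+', 'x', '1'))), i.e. "exists z (z = x + 1)"
+</syntaxhighlight>
+Just as in the renaming step described above, the bound variable is replaced by a fresh one before the substitution is carried out, so the free ''x'' of the substituted term is not captured.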
+ +===Tableaux method=== +[[File:Prop-tableau-4.svg|thumb|right|150px|A tableau proof for the [[propositional logic|propositional]] formula {{nowrap|1=((a ∨ ~b) & b) &rarr; a.}}]] + +{{Further | Method of analytic tableaux}} + +Unlike the methods just described, the derivations in the tableaux method are not lists of formulas. Instead, a derivation is a tree of formulas. To show that a formula A is provable, the tableaux method attempts to demonstrate that the negation of A is unsatisfiable. The tree of the derivation has <math>\lnot A</math> at its root; the tree branches in a way that reflects the structure of the formula. For example, showing that <math>C \lor D</math> is unsatisfiable requires showing that C and D are each unsatisfiable; this corresponds to a branching point in the tree with parent <math>C \lor D</math> and children C and D. + +===Resolution=== +The [[resolution (logic)|resolution rule]] is a single rule of inference that, together with [[Unification (computing)#Definition of unification for first-order logic|unification]], is sound and complete for first-order logic. As with the tableaux method, a formula is proved by showing that the negation of the formula is unsatisfiable. Resolution is commonly used in automated theorem proving. + +The resolution method works only with formulas that are disjunctions of atomic formulas and negations of atomic formulas (that is, with clauses); arbitrary formulas must first be converted to this form through [[Skolemization]] and conversion to [[conjunctive normal form]]. The resolution rule states that from the hypotheses <math>A_1 \lor\cdots\lor A_k \lor C</math> and <math>B_1\lor\cdots\lor B_l\lor\lnot C</math>, the conclusion <math>A_1\lor\cdots\lor A_k\lor B_1\lor\cdots\lor B_l</math> can be obtained. + +===Provable identities=== +The following sentences can be called "identities" because the main connective in each is the biconditional. +:<math>\lnot \forall x \, P(x) \Leftrightarrow \exists x \, \lnot P(x)</math> +:<math>\lnot \exists x \, P(x) \Leftrightarrow \forall x \, \lnot P(x)</math> +:<math>\forall x \, \forall y \, P(x,y) \Leftrightarrow \forall y \, \forall x \, P(x,y)</math> +:<math>\exists x \, \exists y \, P(x,y) \Leftrightarrow \exists y \, \exists x \, P(x,y)</math> +:<math>\forall x \, P(x) \land \forall x \, Q(x) \Leftrightarrow \forall x \, (P(x) \land Q(x)) </math> +:<math>\exists x \, P(x) \lor \exists x \, Q(x) \Leftrightarrow \exists x \, (P(x) \lor Q(x)) </math> +:<math>P \land \exists x \, Q(x) \Leftrightarrow \exists x \, (P \land Q(x)) </math> (where <math>x</math> must not occur free in <math>P</math>) +:<math>P \lor \forall x \, Q(x) \Leftrightarrow \forall x \, (P \lor Q(x)) </math> (where <math>x</math> must not occur free in <math>P</math>) + +==Equality and its axioms== +There are several different conventions for using equality (or identity) in first-order logic. The most common convention, known as '''first-order logic with equality''', includes the equality symbol as a primitive logical symbol which is always interpreted as the real equality relation between members of the domain of discourse, such that the "two" given members are the same member. This approach also adds certain axioms about equality to the deductive system employed. These equality axioms are: +# '''Reflexivity'''. For each variable ''x'', ''x'' = ''x''. +# '''Substitution for functions.''' For all variables ''x'' and ''y'', and any function symbol ''f'', +#:''x'' = ''y'' → ''f''(...,''x'',...) = ''f''(...,''y'',...). +# '''Substitution for formulas'''. 
For any variables ''x'' and ''y'' and any formula φ(''x''), if φ' is obtained by replacing any number of free occurrences of ''x'' in φ with ''y'', such that these remain free occurrences of ''y'', then +#:''x'' = ''y'' → (φ → φ'). + +These are [[axiom scheme]]s, each of which specifies an infinite set of axioms. The third scheme is known as '''[[Leibniz's law]]''', "the principle of substitutivity", "the indiscernibility of identicals", or "the replacement property". The second scheme, involving the function symbol ''f'', is (equivalent to) a special case of the third scheme, using the formula +:''x'' = ''y'' → (''f''(...,''x'',...) = z → ''f''(...,''y'',...) = z). + +Many other properties of equality are consequences of the axioms above, for example: +# '''Symmetry.''' If ''x'' = ''y'' then ''y'' = ''x''. +# '''Transitivity.''' If ''x'' = ''y'' and ''y'' = ''z'' then ''x'' = ''z''. + +===First-order logic without equality=== +An alternate approach considers the equality relation to be a non-logical symbol. This convention is known as '''first-order logic without equality'''. If an equality relation is included in the signature, the axioms of equality must now be added to the theories under consideration, if desired, instead of being considered rules of logic. The main difference between this method and first-order logic with equality is that an interpretation may now interpret two distinct individuals as "equal" (although, by Leibniz's law, these will satisfy exactly the same formulas under any interpretation). That is, the equality relation may now be interpreted by an arbitrary [[equivalence relation]] on the domain of discourse that is [[congruence relation|congruent]] with respect to the functions and relations of the interpretation. + +When this second convention is followed, the term '''normal model''' is used to refer to an interpretation where no distinct individuals ''a'' and ''b'' satisfy ''a'' = ''b''. In first-order logic with equality, only normal models are considered, and so there is no term for a model other than a normal model. When first-order logic without equality is studied, it is necessary to amend the statements of results such as the [[Löwenheim–Skolem theorem]] so that only normal models are considered. + +First-order logic without equality is often employed in the context of [[second-order arithmetic]] and other higher-order theories of arithmetic, where the equality relation between sets of natural numbers is usually omitted. + +===Defining equality within a theory=== +If a theory has a binary formula ''A''(''x'',''y'') which satisfies reflexivity and Leibniz's law, the theory is said to have equality, or to be a theory with equality. The theory may not have all instances of the above schemes as axioms, but rather as derivable theorems. For example, in theories with no function symbols and a finite number of relations, it is possible to [[definitional extension|define]] equality in terms of the relations, by defining the two terms ''s'' and ''t'' to be equal if any relation is unchanged by changing ''s'' to ''t'' in any argument. + +Some theories allow other ''ad hoc'' definitions of equality: +* In the theory of [[partial order]]s with one relation symbol ≤, one could define ''s'' = ''t'' to be an abbreviation for ''s'' ≤ ''t'' <math>\wedge</math> ''t'' ≤ ''s''. 
+* In set theory with one relation <math>\in</math>, one may define ''s'' = ''t'' to be an abbreviation for <math>\forall</math>''x'' (''s'' <math>\in</math> ''x'' <math>\leftrightarrow</math> ''t'' <math>\in</math> ''x'') <math>\wedge</math> <math>\forall</math>''x'' (''x'' <math>\in</math> ''s'' <math>\leftrightarrow</math> ''x'' <math>\in</math> ''t''). This definition of equality then automatically satisfies the axioms for equality. In this case, one should replace the usual [[axiom of extensionality]], <math>\forall x \forall y [ \forall z (z \in x \Leftrightarrow z \in y) \Rightarrow x = y]</math>, by <math>\forall x \forall y [ \forall z (z \in x \Leftrightarrow z \in y) \Rightarrow \forall z (x \in z \Leftrightarrow y \in z) ]</math>, i.e. if ''x'' and ''y'' have the same elements, then they belong to the same sets. + +==Metalogical properties== +One motivation for the use of first-order logic, rather than [[higher-order logic]], is that first-order logic has many [[metalogic]]al properties that stronger logics do not have. These results concern general properties of first-order logic itself, rather than properties of individual theories. They provide fundamental tools for the construction of models of first-order theories. + +===Completeness and undecidability=== +[[Gödel's completeness theorem]], proved by [[Kurt Gödel]] in 1929, establishes that there are sound, complete, effective deductive systems for first-order logic, and thus the first-order logical consequence relation is captured by finite provability. Naively, the statement that a formula φ logically implies a formula ψ depends on every model of φ; these models will in general be of arbitrarily large cardinality, and so logical consequence cannot be effectively verified by checking every model. However, it is possible to enumerate all finite derivations and search for a derivation of ψ from φ. If ψ is logically implied by φ, such a derivation will eventually be found. Thus first-order logical consequence is [[semidecidable]]: it is possible to make an effective enumeration of all pairs of sentences (φ,ψ) such that ψ is a logical consequence of&nbsp;φ. + +Unlike [[propositional logic]], first-order logic is [[Decidability (logic)|undecidable]] (although semidecidable), provided that the language has at least one predicate of arity at least 2 (other than equality). This means that there is no [[decision procedure]] that determines whether arbitrary formulas are logically valid. This result was established independently by [[Alonzo Church]] and [[Alan Turing]] in 1936 and 1937, respectively, giving a negative answer to the [[Entscheidungsproblem]] posed by [[David Hilbert]] in 1928. Their proofs demonstrate a connection between the unsolvability of the decision problem for first-order logic and the unsolvability of the [[halting problem]]. + +There are systems weaker than full first-order logic for which the logical consequence relation is decidable. These include propositional logic and [[monadic predicate logic]], which is first-order logic restricted to unary predicate symbols and no function symbols. The [[Bernays–Schönfinkel class]] of first-order formulas is also decidable. Decidable subsets of first-order logic are also studied in the framework of [[description logics]]. + +===The Löwenheim–Skolem theorem=== +The [[Löwenheim–Skolem theorem]] shows that if a first-order theory of [[cardinality]] λ has any infinite model then it has models of every infinite cardinality greater than or equal to λ. 
One of the earliest results in model theory, it implies that it is not possible to characterize [[countable set|countability]] or uncountability in a first-order language. That is, there is no first-order formula φ(''x'') such that an arbitrary structure M satisfies φ if and only if the domain of discourse of M is countable (or, in the second case, uncountable). + +The Löwenheim–Skolem theorem implies that infinite structures cannot be [[categorical theory|categorically]] axiomatized in first-order logic. For example, there is no first-order theory whose only model is the real line: any first-order theory with an infinite model also has a model of cardinality larger than the continuum. Since the real line is infinite, any theory satisfied by the real line is also satisfied by some [[nonstandard model]]s. When the Löwenheim–Skolem theorem is applied to first-order set theories, the nonintuitive consequences are known as [[Skolem's paradox]]. + +===The compactness theorem=== +The [[compactness theorem]] states that a set of first-order sentences has a model if and only if every finite subset of it has a model. This implies that if a formula is a logical consequence of an infinite set of first-order axioms, then it is a logical consequence of some finite number of those axioms. This theorem was proved first by Kurt Gödel as a consequence of the completeness theorem, but many additional proofs have been obtained over time. It is a central tool in model theory, providing a fundamental method for constructing models. + +The compactness theorem has a limiting effect on which collections of first-order structures are elementary classes. For example, the compactness theorem implies that any theory that has arbitrarily large finite models has an infinite model. Thus the class of all finite [[graph (mathematics)|graphs]] is not an elementary class (the same holds for many other algebraic structures). + +There are also more subtle limitations of first-order logic that are implied by the compactness theorem. For example, in computer science, many situations can be modeled as a [[directed graph]] of states (nodes) and connections (directed edges). Validating such a system may require showing that no "bad" state can be reached from any "good" state. Thus one seeks to determine if the good and bad states are in different [[connected component (graph theory)|connected components]] of the graph. However, the compactness theorem can be used to show that connected graphs are not an elementary class in first-order logic, and there is no formula φ(''x'',''y'') of first-order logic, in the signature of graphs, that expresses the idea that there is a path from ''x'' to ''y''. Connectedness can be expressed in [[second-order logic]], however, but not with only existential set quantifiers, as <math>\Sigma_1^1</math> also enjoys compactness. + +===Lindström's theorem=== +{{Main|Lindström's theorem}} + +[[Per Lindström]] showed that the metalogical properties just discussed actually characterize first-order logic in the sense that no stronger logic can also have those properties (Ebbinghaus and Flum 1994, Chapter XIII). Lindström defined a class of abstract logical systems, and a rigorous definition of the relative strength of a member of this class. He established two theorems for systems of this type: +* A logical system satisfying Lindström's definition that contains first-order logic and satisfies both the Löwenheim–Skolem theorem and the compactness theorem must be equivalent to first-order logic. 
+* A logical system satisfying Lindström's definition that has a semidecidable logical consequence relation and satisfies the Löwenheim–Skolem theorem must be equivalent to first-order logic. + +== Limitations == +Although first-order logic is sufficient for formalizing much of mathematics, and is commonly used in computer science and other fields, it has certain limitations. These include limitations on its expressiveness and limitations of the fragments of natural languages that it can describe. + +For instance, first-order logic is undecidable, meaning a sound, complete and terminating decision algorithm is impossible. This has led to the study of interesting decidable fragments such as C<sub>2</sub>, first-order logic with two variables and the counting quantifiers <math>\exist^{\ge n}</math> and <math>\exist^{\le n}</math> (these quantifiers are, respectively, "there exists at least ''n''" and "there exists at most ''n''") (Horrocks 2010). + +=== Expressiveness === +The [[Löwenheim–Skolem theorem]] shows that if a first-order theory has any infinite model, then it has infinite models of every cardinality. In particular, no first-order theory with an infinite model can be [[Morley's categoricity theorem|categorical]]. Thus there is no first-order theory whose only model has the set of natural numbers as its domain, or whose only model has the set of real numbers as its domain. Many extensions of first-order logic, including infinitary logics and higher-order logics, are more expressive in the sense that they do permit categorical axiomatizations of the natural numbers or real numbers. This expressiveness comes at a metalogical cost, however: by [[Lindström's theorem]], the compactness theorem and the downward Löwenheim–Skolem theorem cannot hold in any logic stronger than first-order. + +===Formalizing natural languages=== +First-order logic is able to formalize many simple quantifier constructions in natural language, such as "every person who lives in Perth lives in Australia". But there are many more complicated features of natural language that cannot be expressed in (single-sorted) first-order logic. "Any logical system which is appropriate as an instrument for the analysis of natural language needs a much richer structure than first-order predicate logic" (Gamut 1991, p.&nbsp;75). + +{| class="wikitable" +|- +! Type !! Example !! 
Comment +|- +| Quantification over properties ||If John is self-satisfied, then there is at least one thing he has in common with Peter || Requires a quantifier over predicates, which cannot be implemented in single-sorted first-order logic: Zj→ ∃X(Xj∧Xp) +|- +| Quantification over properties || Santa Claus has all the attributes of a sadist || Requires quantifiers over predicates, which cannot be implemented in single-sorted first-order logic: ∀X(∀x(Sx → Xx)→Xs) +|- +| Predicate adverbial || John is walking quickly || Cannot be analysed as Wj ∧ Qj; predicate adverbials are not the same kind of thing as second-order predicates such as colour +|- +| Relative adjective|| Jumbo is a small elephant || Cannot be analysed as Sj ∧ Ej; predicate adjectives are not the same kind of thing as second-order predicates such as colour +|- +| Predicate adverbial modifier || John is walking very quickly || - +|- +| Relative adjective modifier || Jumbo is terribly small || An expression such as "terribly", when applied to a relative adjective such as "small", results in a new composite relative adjective "terribly small" +|- +| Prepositions || Mary is sitting next to John || The preposition "next to" when applied to "John" results in the predicate adverbial "next to John" +|} + +==Restrictions, extensions, and variations== +There are many variations of first-order logic. Some of these are inessential in the sense that they merely change notation without affecting the semantics. Others change the expressive power more significantly, by extending the semantics through additional quantifiers or other new logical symbols. For example, infinitary logics permit formulas of infinite size, and modal logics add symbols for possibility and necessity. + +===Restricted languages=== +First-order logic can be studied in languages with fewer logical symbols than were described above. +* Because <math>\exists x \phi(x)</math> can be expressed as <math>\neg \forall x \neg \phi(x)</math>, and <math>\forall x \phi(x)</math> can be expressed as <math>\neg \exists x \neg \phi(x)</math>, either of the two quantifiers <math>\exists</math> and <math>\forall</math> can be dropped. +* Since <math>\phi \lor \psi</math> can be expressed as <math>\lnot (\lnot \phi \land \lnot \psi)</math> and <math>\phi \land \psi</math> can be expressed as <math>\lnot(\lnot \phi \lor \lnot \psi)</math>, either <math>\vee</math> or <math>\wedge</math> can be dropped. In other words, it is sufficient to have <math>\neg</math> and <math>\vee</math>, or <math>\neg</math> and <math>\wedge</math>, as the only logical connectives. +* Similarly, it is sufficient to have only <math>\neg</math> and <math>\rightarrow</math> as logical connectives, or to have only the [[Sheffer stroke]] (NAND) or the [[Peirce arrow]] (NOR) operator. +* It is possible to entirely avoid function symbols and constant symbols, rewriting them via predicate symbols in an appropriate way. For example, instead of using a constant symbol <math> \; 0 </math> one may use a predicate <math> \; 0(x) </math> (interpreted as <math> \; x=0 </math> ), and replace every predicate such as <math>\; P(0,y) </math> with <math> \forall x \;(0(x) \rightarrow P(x,y)) </math>. A function such as <math> f(x_1,x_2,...,x_n) </math> will similarly be replaced by a predicate <math> F(x_1,x_2,...,x_n,y) </math> interpreted as <math> y = f(x_1,x_2,...,x_n) </math>. 
This change requires adding additional axioms to the theory at hand, so that interpretations of the predicate symbols used have the correct semantics. + +Restrictions such as these are useful as a technique to reduce the number of inference rules or axiom schemes in deductive systems, which leads to shorter proofs of metalogical results. The cost of the restrictions is that it becomes more difficult to express natural-language statements in the formal system at hand, because the logical connectives used in the natural language statements must be replaced by their (longer) definitions in terms of the restricted collection of logical connectives. Similarly, derivations in the limited systems may be longer than derivations in systems that include additional connectives. There is thus a trade-off between the ease of working within the formal system and the ease of proving results about the formal system. + +It is also possible to restrict the arities of function symbols and predicate symbols, in sufficiently expressive theories. One can in principle dispense entirely with functions of arity greater than 2 and predicates of arity greater than 1 in theories that include a [[pairing function]]. This is a function of arity 2 that takes pairs of elements of the domain and returns an [[ordered pair]] containing them. It is also sufficient to have two predicate symbols of arity 2 that define projection functions from an ordered pair to its components. In either case it is necessary that the natural axioms for a pairing function and its projections are satisfied. + +===Many-sorted logic=== +{{unreferenced section|reason=delocalize "citation needed" tag; the entire second paragraph is what is challenged|date=June 2013}} +Ordinary first-order interpretations have a single domain of discourse over which all quantifiers range. '''Many-sorted first-order logic''' allows variables to have different '''sorts''', which have different domains. This is also called '''typed first-order logic''', and the sorts called '''types''' (as in [[data type]]), but it is not the same as first-order [[type theory]]. Many-sorted first-order logic is often used in the study of [[second-order arithmetic]]. + +When there are only finitely many sorts in a theory, many-sorted first-order logic can be reduced to single-sorted first-order logic. One introduces into the single-sorted theory a unary predicate symbol for each sort in the many-sorted theory, and adds an axiom saying that these unary predicates partition the domain of discourse. For example, if there are two sorts, one adds predicate symbols <math>P_1(x)</math> and <math>P_2(x)</math> and the axiom +:<math>\forall x ( P_1(x) \lor P_2(x)) \land \lnot \exists x (P_1(x) \land P_2(x))</math>. +Then the elements satisfying <math>P_1</math> are thought of as elements of the first sort, and elements satisfying <math>P_2</math> as elements of the second sort. One can quantify over each sort by using the corresponding predicate symbol to limit the range of quantification. For example, to say there is an element of the first sort satisfying formula φ(''x''), one writes +:<math>\exists x (P_1(x) \land \phi(x))</math>. + +===Additional quantifiers=== +Additional quantifiers can be added to first-order logic. +* Sometimes it is useful to say that "''P(x)'' holds for exactly one ''x''", which can be expressed as <math>\exists!</math>''x'' ''P''(''x''). 
This notation, called [[uniqueness quantification]], may be taken to abbreviate a formula such as <math>\exists</math>''x'' (''P''(''x'') <math>\wedge\forall</math>''y'' (''P''(''y'') <math>\rightarrow</math> (''x'' = ''y''))). +*'''First-order logic with extra quantifiers''' has new quantifiers ''Qx'',..., with meanings such as "there are many ''x'' such that ...". Also see [[branching quantifier]]s and the [[plural quantification|plural quantifier]]s of [[George Boolos]] and others. +* '''[[Bounded quantifier]]s''' are often used in the study of set theory or arithmetic. + +===Infinitary logics=== +{{Main|Infinitary logic}} +Infinitary logic allows infinitely long sentences. For example, one may allow a conjunction or disjunction of infinitely many formulas, or quantification over infinitely many variables. Infinitely long sentences arise in areas of mathematics including [[topology]] and [[model theory]]. + +Infinitary logic generalizes first-order logic to allow formulas of infinite length. The most common way in which formulas can become infinite is through infinite conjunctions and disjunctions. However, it is also possible to admit generalized signatures in which function and relation symbols are allowed to have infinite arities, or in which quantifiers can bind infinitely many variables. Because an infinite formula cannot be represented by a finite string, it is necessary to choose some other representation of formulas; the usual representation in this context is a tree. Thus formulas are, essentially, identified with their parse trees, rather than with the strings being parsed. + +The most commonly studied infinitary logics are denoted ''L''<sub>αβ</sub>, where α and β are each either [[cardinal number]]s or the symbol ∞. In this notation, ordinary first-order logic is ''L''<sub>ωω</sub>. +In the logic ''L''<sub>∞ω</sub>, arbitrary conjunctions or disjunctions are allowed when building formulas, and there is an unlimited supply of variables. More generally, the logic that permits conjunctions or disjunctions with less than κ constituents is known as ''L''<sub>κω</sub>. For example, ''L''<sub>ω<sub>1</sub>ω</sub> permits [[countable set|countable]] conjunctions and disjunctions. + +The set of free variables in a formula of ''L''<sub>κω</sub> can have any cardinality strictly less than κ, yet only finitely many of them can be in the scope of any quantifier when a formula appears as a subformula of another.<ref>Some authors only admit formulas with finitely many free variables in ''L''<sub>κω</sub>, and more generally only formulas with <&nbsp;λ free variables in ''L''<sub>κλ</sub>.</ref> In other infinitary logics, a subformula may be in the scope of infinitely many quantifiers. For example, in ''L''<sub>κ∞</sub>, a single universal or existential quantifier may bind arbitrarily many variables simultaneously. Similarly, the logic ''L''<sub>κλ</sub> permits simultaneous quantification over fewer than λ variables, as well as conjunctions and disjunctions of size less than κ. + +===Non-classical and modal logics=== +*'''[[intuitionistic logic|Intuitionistic first-order logic]]''' uses intuitionistic rather than classical propositional calculus; for example, ¬¬φ need not be equivalent to φ. + +*First-order '''[[modal logic]]''' allows one to describe other possible worlds as well as this contingently true world which we inhabit. In some versions, the set of possible worlds varies depending on which possible world one inhabits. 
Modal logic has extra ''modal operators'' with meanings which can be characterized informally as, for example "it is necessary that φ" (true in all possible worlds) and "it is possible that φ" (true in some possible world). With standard first-order logic we have a single domain and each predicate is assigned one extension. With first-order modal logic we have a ''domain function'' that assigns each possible world its own domain, so that each predicate gets an extension only relative to these possible worlds. This allows us to model cases where, for example, Alex is a Philosopher, but might have been a Mathematician, and might not have existed at all. In the first possible world ''P''(''a'') is true, in the second ''P''(''a'') is false, and in the third possible world there is no ''a'' in the domain at all. + +*'''[[t-norm fuzzy logics|first-order fuzzy logics]]''' are first-order extensions of propositional fuzzy logics rather than classical [[propositional calculus]]. + +===Higher-order logics=== +{{Main|Higher-order logic}} + +The characteristic feature of first-order logic is that individuals can be quantified, but not predicates. Thus +:<math>\exists a ( \text{Phil}(a))</math> +is a legal first-order formula, but +:<math>\exists \text{Phil} ( \text{Phil}(a))</math> +is not, in most formalizations of first-order logic. [[Second-order logic]] extends first-order logic by adding the latter type of quantification. Other [[higher-order logic]]s allow quantification over even higher [[type theory|types]] than second-order logic permits. These higher types include relations between relations, functions from relations to relations between relations, and other higher-type objects. Thus the "first" in first-order logic describes the type of objects that can be quantified. + +Unlike first-order logic, for which only one semantics is studied, there are several possible semantics for second-order logic. The most commonly employed semantics for second-order and higher-order logic is known as '''full semantics'''. The combination of additional quantifiers and the full semantics for these quantifiers makes higher-order logic stronger than first-order logic. In particular, the (semantic) logical consequence relation for second-order and higher-order logic is not semidecidable; there is no effective deduction system for second-order logic that is sound and complete under full semantics. + +Second-order logic with full semantics is more expressive than first-order logic. For example, it is possible to create axiom systems in second-order logic that uniquely characterize the natural numbers and the real line. The cost of this expressiveness is that second-order and higher-order logics have fewer attractive metalogical properties than first-order logic. For example, the Löwenheim–Skolem theorem and compactness theorem of first-order logic become false when generalized to higher-order logics with full semantics. + +==Automated theorem proving and formal methods== +{{Further2|[[Automated theorem proving#First-order theorem proving|First-order theorem proving]]}} + +[[Automated theorem proving]] refers to the development of computer programs that search and find derivations (formal proofs) of mathematical theorems. Finding derivations is a difficult task because the [[search algorithm|search space]] can be very large; an exhaustive search of every possible derivation is theoretically possible but [[computational complexity|computationally infeasible]] for many systems of interest in mathematics. 
Thus complicated [[heuristic function]]s are developed to attempt to find a derivation in less time than a blind search. + +The related area of automated [[proof verification]] uses computer programs to check that human-created proofs are correct. Unlike complicated automated theorem provers, verification systems may be small enough that their correctness can be checked both by hand and through automated software verification. This validation of the proof verifier is needed to give confidence that any derivation labeled as "correct" is actually correct. + +Some proof verifiers, such as [[Metamath]], insist on having a complete derivation as input. Others, such as [[Mizar system|Mizar]] and [[Isabelle (theorem prover)|Isabelle]], take a well-formatted proof sketch (which may still be very long and detailed) and fill in the missing pieces by doing simple proof searches or applying known decision procedures: the resulting derivation is then verified by a small, core "kernel". Many such systems are primarily intended for interactive use by human mathematicians: these are known as [[proof assistant]]s. They may also use formal logics that are stronger than first-order logic, such as type theory. Because a full derivation of any nontrivial result in a first-order deductive system will be extremely long for a human to write,<ref>Avigad ''et al.'' (2007) discuss the process of formally verifying a proof of the [[prime number theorem]]. The formalized proof required approximately 30,000 lines of input to the Isabelle proof verifier.</ref> results are often formalized as a series of lemmas, for which derivations can be constructed separately. + +Automated theorem provers are also used to implement [[formal verification]] in computer science. In this setting, theorem provers are used to verify the correctness of programs and of hardware such as [[CPU|processors]] with respect to a [[formal specification]]. Because such analysis is time-consuming and thus expensive, it is usually reserved for projects in which a malfunction would have grave human or financial consequences. + +==See also== +{{Portal|Logic}} +* [[Second-order logic]] +* [[Higher-order logic]] +* [[ACL2]] &mdash; A Computational Logic for Applicative Common Lisp. +* [[Equiconsistency]] +* [[Extension by definitions]] +* [[Hanf number]] +* [[Herbrandization]] +* [[Lowenheim number]] +* [[Prenex normal form]] +* [[Relational algebra]] +* [[Relational model]] +* [[Skolem normal form]] +* [[Table of logic symbols]] +* [[Tarski's World]] +* [[Truth table]] +* [[Type (model theory)]] + +==Notes== +{{Reflist}} + +==References== +* [[Peter B. Andrews|Andrews, Peter B.]] (2002); ''An Introduction to Mathematical Logic and Type Theory: To Truth Through Proof'', 2nd ed., Berlin: Kluwer Academic Publishers. Available from Springer. +* Avigad, Jeremy; Donnelly, Kevin; Gray, David; and Raff, Paul (2007); "A formally verified proof of the prime number theorem", ''ACM Transactions on Computational Logic'', vol. 9 no. 
1 {{doi|10.1145/1297658.1297660}} +* [[Jon Barwise|Barwise, Jon]] (1977); "An Introduction to First-Order Logic", in {{Cite book |editor1-last=Barwise |editor1-first=Jon |title=Handbook of Mathematical Logic |publisher=North-Holland |location=Amsterdam, NL |series=Studies in Logic and the Foundations of Mathematics |isbn=978-0-444-86388-1 |year=1982 }} +* Barwise, Jon; and [[John Etchemendy|Etchemendy, John]] (2000); ''Language Proof and Logic'', Stanford, CA: CSLI Publications (Distributed by the University of Chicago Press) +* [[Józef Maria Bocheński|Bocheński, Józef Maria]] (2007); ''A Précis of Mathematical Logic'', Dordrecht, NL: D. Reidel, translated from the French and German editions by Otto Bird +* Ferreirós, José (2001); [http://jstor.org/stable/2687794 ''The Road to Modern Logic — An Interpretation''], Bulletin of Symbolic Logic, Volume 7, Issue 4, 2001, pp.&nbsp;441–484, DOI 10.2307/2687794, [http://links.jstor.org/sici?sici=1079-8986%28200112%297%3A4%3C441%3ATRTMLI%3E2.0.CO%3B2-O JStor] +* [[L. T. F. Gamut|Gamut, L. T. F.]] (1991); ''Logic, Language, and Meaning, Volume 2: Intensional Logic and Logical Grammar'', Chicago, IL: University of Chicago Press, ISBN 0-226-28088-8 +* [[David Hilbert|Hilbert, David]]; and [[Wilhelm Ackermann|Ackermann, Wilhelm]] (1950); ''[[Principles of Mathematical Logic]]'', Chelsea (English translation of ''Grundzüge der theoretischen Logik'', 1928 German first edition) +* [[Wilfrid Hodges|Hodges, Wilfrid]] (2001); "Classical Logic I: First Order Logic", in Goble, Lou (ed.); ''The Blackwell Guide to Philosophical Logic'', Blackwell +* [[Heinz-Dieter Ebbinghaus|Ebbinghaus, Heinz-Dieter]]; Flum, Jörg; and Thomas, Wolfgang (1994); ''Mathematical Logic'', Undergraduate Texts in Mathematics, Berlin, DE/New York, NY: [[Springer-Verlag]], Second Edition, ISBN 978-0-387-94258-2 +* {{Citation + |last=Rautenberg + |first=Wolfgang + |authorlink=Wolfgang Rautenberg + |doi=10.1007/978-1-4419-1221-3 + |title=A Concise Introduction to Mathematical Logic + |url=http://www.springerlink.com/content/978-1-4419-1220-6/ + |publisher=[[Springer Science+Business Media]] + |location=[[New York City|New York, NY]] + |edition=3rd + |isbn=978-1-4419-1220-6 + |year=2010 +}} +*Tarski, Alfred and Givant, Steven (1987); ''A Formalization of Set Theory without Variables''. Providence RI: American Mathematical Society. +{{inconsistent citations}} + +==External links== +* {{springer|title=Predicate calculus|id=p/p074360}} +* [[Stanford Encyclopedia of Philosophy]]: Shapiro, Stewart; "[http://plato.stanford.edu/entries/logic-classical/ Classical Logic]". Covers syntax, model theory, and metatheory for first-order logic in the natural deduction style. +* [[P. D. Magnus|Magnus, P. D.]]; ''[http://www.fecundity.com/logic/ forall x: an introduction to formal logic]''. Covers formal semantics and proof theory for first-order logic. +* [http://us.metamath.org/index.html Metamath]: an ongoing online project to reconstruct mathematics as a huge first-order theory, using first-order logic and the [[axiomatic set theory]] [[Zermelo–Fraenkel set theory|ZFC]]. ''[[Principia Mathematica]]'' modernized. +* Podnieks, Karl; ''[http://www.ltn.lv/~podnieks/ Introduction to mathematical logic]'' +* [http://john.fremlin.de/schoolwork/logic/index.html ''Cambridge Mathematics Tripos Notes''] (typeset by John Fremlin). These notes cover part of a past Cambridge Mathematics Tripos course taught to undergraduates students (usually) within their third year. 
The course is entitled "Logic, Computation and Set Theory" and covers Ordinals and cardinals, Posets and Zorn's Lemma, Propositional logic, Predicate logic, Set theory and Consistency issues related to ZFC and other set theories. + +{{Logic}} + +{{DEFAULTSORT:First-Order Logic}} +[[Category:Systems of formal logic]] +[[Category:Predicate logic]] +[[Category:Model theory]] + 8lruzoj36g9nmoeud2dttzhf611finu + + + + Clenshaw algorithm + 0 + 6936 + + 6937 + 2013-12-28T01:00:19Z + + 71.41.210.146 + + /* Geodetic applications */ Linked to article describing recurrence relation in more detail. + wikitext + text/x-wiki + In [[numerical analysis]], the '''Clenshaw algorithm'''<ref name="Clenshaw55">{{Cite doi|10.1090/S0025-5718-1955-0071856-0}} Note that this paper is written in terms of the ''Shifted'' Chebyshev polynomials of the first kind <math>T^*_n(x) = T_n(2x-1)</math>.</ref> is a [[Recursion|recursive]] method to evaluate a linear combination of [[Chebyshev polynomials]]. It is a generalization of [[Horner's method]] for evaluating a linear combination of [[monomial]]s. + +It generalizes to more than just Chebyshev polynomials; it applies to any class of functions that can be defined by a three-term [[recurrence relation]].<ref>{{Citation |last1=Press |first1=WH |last2=Teukolsky |first2=SA |last3=Vetterling |first3=WT |last4=Flannery |first4=BP |year=2007 |title=Numerical Recipes: The Art of Scientific Computing |edition=3rd |publisher=Cambridge University Press |publication-place=New York |isbn=978-0-521-88068-8 |chapter=Section 5.4.2. Clenshaw's Recurrence Formula |chapter-url=http://apps.nrbook.com/empanel/index.html?pg=222}}</ref> + +==Clenshaw algorithm== + +Suppose that <math>\phi_k,\; k=0, 1, \ldots</math> is a sequence of functions that satisfy the linear recurrence relation + +:<math>\phi_{k+1}(x) = \alpha_k(x)\,\phi_k(x) + \beta_k(x)\,\phi_{k-1}(x),</math> + +where the coefficients <math>\alpha_k</math> and <math>\beta_k</math> are known in advance. Note that in the most common applications, <math>\alpha(x)</math> does not depend on <math>k</math>, and <math>\beta</math> is a constant that depends on neither <math>x</math> nor <math>k</math>. + +Our goal is to evaluate a weighted sum of these functions +:<math>S(x) = \sum_{k=0}^n a_k \phi_k(x)</math> + +Given the coefficients <math>a_0, \ldots, a_n</math>, compute the values <math>b_k(x)</math> by the "reverse" recurrence formula: + +:<math>\begin{align} + b_{n+1}(x) &= b_{n+2}(x) = 0, \\[.5em] + b_{k}(x) &= a_k + \alpha_k(x)\,b_{k+1}(x) + \beta_{k+1}(x)\,b_{k+2}(x). +\end{align}</math> + +The linear combination of the <math>\phi_k</math> satisfies: + +:<math>S(x) = \phi_0(x)a_0 + \phi_1(x)b_1(x) + \beta_1(x)\phi_0(x)b_2(x).</math> + +See Fox and Parker<ref name="FoxParker">{{Citation |author1=L. Fox |author2=I. B. Parker |title=Chebyshev Polynomials in Numerical Analysis |publisher=Oxford University Press |year=1968 |isbn=0-19-859614-6}}</ref> for more information and stability analyses. + +===Horner as a special case of Clenshaw=== +A particularly simple case occurs when evaluating a polynomial of the form +:<math>S(x) = \sum_{k=0}^n a_k x^k</math>. +The functions are simply +:<math>\begin{align} + \phi_0(x) &= 1, \\ + \phi_k(x) &= x^k = x\phi_{k-1}(x) +\end{align}</math> +and are produced by the recurrence coefficients <math>\alpha(x) = x</math> and <math>\beta = 0</math>. 
+ +In this case, the recurrence formula to compute the sum is +:<math>b_k(x) = a_k + x b_{k+1}(x)</math> +and, in this case, the sum is simply +:<math>S(x) = a_0 + x b_1(x) = b_0(x)</math>, +which is exactly the usual [[Horner's method]]. + +===Special case for Chebyshev series=== +Consider a truncated [[Chebyshev series]] + +:<math>p_n(x) = a_0 + a_1T_1(x) + a_2T_2(x) + \cdots + a_nT_n(x).</math> + +The coefficients in the recursion relation for the [[Chebyshev polynomials]] are + +:<math>\alpha(x) = 2x, \quad \beta = -1,</math> +with the initial conditions +:<math>T_0(x) = 1, \quad T_1(x) = x.</math> + +Thus, the recurrence is +:<math>b_k(x) = a_k + 2xb_{k+1}(x) - b_{k+2}(x)</math> +and the final sum is +:<math>p_n(x) = a_0 + xb_1(x) - b_2(x).</math> + +One way to evaluate this is to continue the recurrence one more step, and compute +:<math>b_0(x) = 2a_0 + 2xb_1(x) - b_2(x),</math> +(note the doubled ''a''<sub>0</sub> coefficient) followed by +:<math>p_n(x) = b_0(x)-x b_1(x)-a_0=\frac{1}{2}\left[b_0(x) - b_2(x)\right].</math> + +===Geodetic applications=== + +Clenshaw's algorithm is extensively used in geodetic applications +where it is usually referred to as '''Clenshaw summation'''.<ref> +{{Citation +| last1=Tscherning +| first1=C. C. +| last2=Poder +| first2=K. +| year=1982 +| title=Some Geodetic applications of Clenshaw Summation +| journal=Bolletino di Geodesia e Scienze Affini +| volume=41 +| number=4 +| pages=349–375 +| url=http://cct.gfy.ku.dk/publ_cct/cct80.pdf +}}</ref> A simple application is summing the trigonometric series to compute +the [[meridian arc]]. These have the form + +:<math>m(\theta) = C_0\,\theta + C_1\sin \theta + C_2\sin 2\theta + \cdots + C_n\sin n\theta.</math> + +Leaving off the initial <math>C_0\,\theta</math> term, the remainder is a summation of the appropriate form. There is no leading term because <math>\phi_0(\theta) = \sin 0\theta = \sin 0 = 0</math>. + +The [[List of trigonometric identities#Chebyshev method|recurrence relation for <math>\sin k\theta</math>]] is +:<math>\sin k\theta = 2 \cos\theta \sin (k-1)\theta - \sin (k-2)\theta</math>, + +making the coefficients in the recursion relation + +:<math>\alpha_k(\theta) = 2\cos\theta, \quad \beta_k = -1.</math> + +and the evaluation of the series is given by + +:<math>\begin{align} + b_{n+1}(\theta) &= b_{n+2}(\theta) = 0,\\[.3em] + b_k(\theta) &= C_k + 2 b_{k+1}(\theta)\cos \theta - b_{k+2}(\theta)\quad(n\ge k \ge 1). +\end{align}</math> + +The final step is made particularly simple because <math>\phi_0(\theta) = \sin 0 = 0</math>, so the end of the recurrence is simply <math>b_1(\theta)\sin(\theta)</math>; the <math>C_0\,\theta</math> term is added separately: + +:<math>m(\theta) = C_0\,\theta + b_1(\theta)\sin \theta.</math> + +Note that the algorithm requires only the evaluation of two trigonometric quantities <math>\cos \theta</math> and <math>\sin \theta</math>. 
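+
+The backward recurrence above translates into only a few lines of code. The following Python sketch is illustrative rather than drawn from the cited references: the function name is arbitrary and the sample coefficients do not correspond to any real reference ellipsoid. It evaluates a meridian-arc-type series by Clenshaw summation and checks the result against a direct term-by-term evaluation.
+
+<syntaxhighlight lang="python">
+import math
+
+def clenshaw_sine_sum(c, theta):
+    """Evaluate m(theta) = c[0]*theta + sum_{k>=1} c[k]*sin(k*theta) by Clenshaw summation."""
+    two_cos = 2.0 * math.cos(theta)        # cos(theta) here and sin(theta) below are the only trigonometric evaluations
+    b1 = b2 = 0.0                          # b_{n+2} = b_{n+1} = 0
+    for k in range(len(c) - 1, 0, -1):     # b_k = C_k + 2*cos(theta)*b_{k+1} - b_{k+2}
+        b1, b2 = c[k] + two_cos * b1 - b2, b1
+    return c[0] * theta + b1 * math.sin(theta)
+
+# Cross-check against a direct evaluation of the series (coefficients are made up for illustration)
+c = [1.0, -2.5e-3, 2.6e-6, -3.4e-9]
+theta = 0.7
+direct = c[0] * theta + sum(c[k] * math.sin(k * theta) for k in range(1, len(c)))
+assert abs(clenshaw_sine_sum(c, theta) - direct) < 1e-12
+</syntaxhighlight>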
+ +==See also== +*[[Horner scheme]] to evaluate polynomials in [[monomial form]] +*[[De Casteljau's algorithm]] to evaluate polynomials in [[Bézier form]] + +==References== +<references/> + +{{DEFAULTSORT:Clenshaw Algorithm}} +[[Category:Numerical analysis]] + 7iqimnyi34cb7u7pc81zn6mzge81fii + + + + Isolation lemma + 0 + 25609 + + 25610 + 2013-08-21T04:57:03Z + + Shreevatsa + 0 + + + add ref asked for + wikitext + text/x-wiki + In [[theoretical computer science]], the term '''isolation lemma''' (or '''isolating lemma''') refers to [[randomized algorithm]]s that reduce the number of solutions to a problem to one, should a solution exist. +This is achieved by adding random restrictions to the solution space such that, with non-negligible probability, exactly one solution satisfies the additional restrictions if the solution space is not empty. +Isolation lemmas have important applications in computer science, such as the [[Valiant–Vazirani theorem]] and [[Toda's theorem]] in [[computational complexity theory]]. + +The first isolation lemma was introduced by {{harvtxt|Valiant|Vazirani|1986}}, albeit not under that name. +Their isolation lemma chooses a random number of random hyperplanes, and has the property that, with non-negligible probability, the intersection of any fixed non-empty solution space with the chosen hyperplanes contains exactly one element. This suffices to show the [[Valiant–Vazirani theorem]]: +there exists a randomized [[polynomial-time reduction]] from the [[Boolean satisfiability problem|satisfiability problem for Boolean formulas]] to the problem of detecting whether a Boolean formula has a unique solution. +{{harvtxt|Mulmuley|Vazirani|Vazirani|1987}} introduced an isolation lemma of a slightly different kind: +Here every coordinate of the solution space gets assigned a random weight in a certain range of integers, and the property is that, with non-negligible probability, there is exactly one element in the solution space that has minimum weight. This can be used to obtain a randomized parallel algorithm for the [[maximum matching]] problem. + +Stronger isolation lemmas have been introduced in the literature to fit different needs in various settings. +For example, the isolation lemma of {{harvtxt|Chari|Rohatgi|Srinivasan|1993}} has similar guarantees as that of Mulmuley et al., but it uses fewer random bits. +In the context of the [[exponential time hypothesis]], {{harvtxt|Calabro|Impagliazzo|Kabanets|Paturi|2008}} prove an isolation lemma for [[k-Sat|k-CNF formulas]]. + +==The isolation lemma of Mulmuley, Vazirani, and Vazirani== +[[File:Linear optimization in a 2-dimensional polytope.svg|thumb|Any [[linear program]] with a randomly chosen linear cost function has a unique optimum with high probability. The isolation lemma of Mulmuley, Vazirani, and Vazirani extends this fact to ''arbitrary'' sets and a random cost function that is sampled using ''few'' random bits.]] +:'''Lemma.''' Let <math>n</math> and <math>N</math> be positive integers, and let <math>\mathcal F</math> be an arbitrary family of subsets of the universe <math>\{1,\dots,n\}</math>. Suppose each element <math>x\in\{1,\dots,n\}</math> in the universe receives an integer weight <math>w(x)</math>, each of which is chosen independently and uniformly at random from <math>\{1,\dots,N\}</math>. 
The weight of a set ''S'' in <math>\mathcal F</math> is defined as +::<math>w(S) = \sum_{x \in S} w(x)\,.</math> +:Then, with probability at least <math>1-n/N</math>, there is a ''unique'' set in <math>\mathcal F</math> that has the minimum weight among all sets of <math>\mathcal F</math>. + + +It is remarkable that the lemma assumes nothing about the nature of the family <math>\mathcal F</math>: for instance <math>\mathcal F</math> may include ''all'' <math>2^n-1</math> nonempty subsets. Since the weight of each set in <math>\mathcal F</math> is between <math>1</math> and <math>nN</math> on average there will be <math>(2^n-1) / (nN)</math> sets of each possible weight. +Still, with high probability, there is a unique set that has minimum weight. +<div class="NavFrame collapsed" style="width: 50%;"> + <div class="NavHead">[Mulmuley, Vazirani, and Vazirani's Proof]</div> + <div class="NavContent" style="text-align:left"> +Suppose we have fixed the weights of all elements except an element ''x''. Then ''x'' has a ''threshold'' weight ''α'', such that if the weight ''w''(''x'') of ''x'' is greater than ''α'', then it is not contained in any minimum-weight subset, and if <math>w(x) \le \alpha</math>, then it is contained in some sets of minimum weight. Further, observe that if <math>w(x) < \alpha</math>, then ''every'' minimum-weight subset must contain ''x'' (since, when we decrease ''w(x)'' from ''α'', sets that do not contain ''x'' do not decrease in weight, while those that contain ''x'' do). Thus, ambiguity about whether a minimum-weight subset contains ''x'' or not can happen only when its weight is exactly equal to its threshold; in this case we will call ''x'' "singular". Now, as the threshold of ''x'' was defined only in terms of the weights of the ''other'' elements, it is independent of ''w(x)'', and therefore, as ''w''(''x'') is chosen uniformly from {1,&nbsp;…,&nbsp;''N''}, + +:<math>\Pr[x\text{ is singular}] = \Pr[w(x) = \alpha] \le 1/N</math> + +and the probability that ''some'' ''x'' is singular is at most&nbsp;''n/N''. As there is a unique minimum-weight subset [[iff]] no element is singular, the lemma follows. + +Remark: The lemma holds with <math>\le</math> (rather than =) since it is possible that some ''x'' has no threshold value (i.e., ''x'' will not be in any minimum-weight subset even if ''w''(''x'') gets the minimum possible value, 1). + </div> +</div> + +<div class="NavFrame collapsed" style="width: 50%;"> + <div class="NavHead">[Joel Spencer's Proof]</div> + <div class="NavContent" style="text-align:left"> +This is a restatement version of the above proof, due to [[Joel Spencer]] (1995).<ref>{{harvtxt|Jukna|2001}}</ref> + +For any element ''x'' in the set, define + +:<math>\alpha(x) = \min_{S \in \mathcal F, x \not\in S}w(S) - \min_{S\in\mathcal F, x\in S}w(S\setminus\{x\}). \, </math> + +Observe that <math>\alpha(x)</math> depends only on the weights of elements other than ''x'', and not on ''w''(''x'') itself. So whatever the value of <math>\alpha(x)</math>, as ''w''(''x'') is chosen uniformly from {1,&nbsp;…,&nbsp;''N''}, the probability that it is equal to <math>\alpha(x)</math> is at most&nbsp;1/''N''. Thus the probability that <math>w(x) = \alpha(x)</math> for ''some'' ''x'' is at most&nbsp;''n/N''. 
+ +Now if there are two sets ''A'' and ''B'' in <math>\mathcal F</math> with minimum weight, then, taking any ''x'' in ''A\B'', we have + +:<math>\begin{align} +\alpha(x) &= \min_{S \in \mathcal F, x \not\in S}w(S) - \min_{S\in\mathcal F, x\in S}w(S\setminus\{x\}) \\ + &= w(B) - (w(A)-w(x)) \\ + &= w(x), +\end{align}</math> + +and as we have seen, this event happens with probability at most&nbsp;''n/N''. + </div> +</div> + +==Examples/applications== +{{Expand section|date=May 2010}} +* The original application was to minimum-weight (or maximum-weight) perfect matchings in a graph. Each edge is assigned a random weight in {1,&nbsp;…,&nbsp;2''m''}, and <math>\mathcal F</math> is the set of perfect matchings, so that with probability at least&nbsp;1/2, there exists a unique perfect matching. When each indeterminate <math>x_{ij}</math> in the [[Tutte matrix]] of the graph is replaced with <math>2^{w_{ij}}</math> where <math>w_{ij}</math> is the random weight of the edge, we can show that the determinant of the matrix is nonzero, and further use this to find the matching. +* More generally, the paper also observed that any search problem of the form "Given a set system <math>(S,\mathcal F)</math>, find a set in <math>\mathcal F</math>" could be reduced to a decision problem of the form "Is there a set in <math>\mathcal F</math> with total weight at most ''k''?". For instance, it showed how to solve the following problem posed by Papadimitriou and Yannakakis, for which (as of the time the paper was written) no deterministic polynomial-time algorithm is known: given a graph and a subset of the edges marked as "red", find a perfect matching with exactly ''k'' red edges. +* The [[Valiant–Vazirani theorem]], concerning unique solutions to NP-complete problems, has a simpler proof using the isolation lemma. This is proved by giving a randomized reduction from [[Clique problem|CLIQUE]] to UNIQUE-CLIQUE.<ref>{{harvtxt|Mulmuley|Vazirani|Vazirani|1987}}</ref> +* {{harvtxt|Ben-David|Chor|Goldreich|1989}} use the proof of Valiant-Vazirani in their search-to-decision reduction for [[average-case complexity]]. +* [[Avi Wigderson]] used the isolation lemma in 1994 to give a randomized reduction from [[NL (complexity)|NL]] to UL, and thereby prove that NL/poly ⊆ ⊕L/poly.<ref>{{harvtxt|Wigderson|1994}}</ref> Reinhardt and Allender later used the isolation lemma again to prove that NL/poly = UL/poly.<ref>{{harvtxt|Reinhardt|Allender|2000}}</ref> +* The book by Hemaspaandra and Ogihara has a chapter on the isolation technique, including generalizations.<ref>{{harvtxt|Hemaspaandra|Ogihara|2002}}</ref> +* The isolation lemma has been proposed as the basis of a scheme for [[digital watermarking]].<ref>{{harvtxt|Majumdar|Wong|2001}}</ref> +* There is ongoing work on derandomizing the isolation lemma in specific cases<ref>{{harvtxt|Arvind|Mukhopadhyay|2008}}</ref> and on using it for identity testing.<ref>{{harvtxt|Arvind|Mukhopadhyay|Srinivasan|2008}}</ref> + +==Notes== +{{reflist|colwidth=25em}} + +==References== +{{refbegin|colwidth=25em}} +* {{Cite conference | publisher = Springer-Verlag | isbn = 978-3-540-85362-6 | pages = 276–289 + | last1 = Arvind | first1 = V. 
+ | last2 = Mukhopadhyay | first2 = Partha + | title = Derandomizing the Isolation Lemma and Lower Bounds for Circuit Size | booktitle = Proceedings of the 11th international workshop, APPROX 2008, and 12th international workshop, RANDOM 2008 on Approximation, Randomization and Combinatorial Optimization: Algorithms and Techniques | location = Boston, MA, USA | accessdate = 2010-05-10 | date = 2008 | url = http://portal.acm.org/citation.cfm?id=1429791.1429816 + | ref=harv}} {{arxiv|0804.0957}} +* {{Cite conference | publisher = IEEE Computer Society | isbn = 978-0-7695-3169-4 | pages = 268–279 + | last1 = Arvind | first1 = V. + | last2 = Mukhopadhyay | first2 = Partha + | last3 = Srinivasan | first3 = Srikanth + | title = New Results on Noncommutative and Commutative Polynomial Identity Testing + | booktitle = Proceedings of the 2008 IEEE 23rd Annual Conference on Computational Complexity | accessdate = 2010-05-10 | date = 2008 | url = http://portal.acm.org/citation.cfm?id=1380843.1380966|ref=harv}} {{arxiv|0801.0514}} +* {{cite doi|10.1145/73007.73027| ref=harv}} +* {{cite doi|10.1016/j.jcss.2007.06.015| ref=harv}} +* {{cite doi|10.1145/167088.167213| ref=harv}} +* {{Cite book | year=2002 | chapter= Chapter 4. The Isolation Technique | url=http://www.cs.rochester.edu/~lane/=companion/isolation.pdf | title=The complexity theory companion | last1=Hemaspaandra | first1=Lane A. | last2=Ogihara | first2=Mitsunori | ref=harv| publisher=Springer | isbn=978-3-540-67419-1 | postscript=.}} +* {{Cite conference | publisher = ACM | doi = 10.1145/378239.378566 | isbn = 1-58113-297-2 | pages = 480–485 | last1 = Majumdar | first1 = Rupak | last2 = Wong | first2 = Jennifer L. | title = Watermarking of SAT using combinatorial isolation lemmas | booktitle = Proceedings of the 38th annual Design Automation Conference | location = Las Vegas, Nevada, United States | accessdate = 2010-05-10 | date = 2001 | ref=harv| url = http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.9300}} +* {{Cite journal | volume = 29 | pages = 1118 | last1 = Reinhardt | first1 = K. | last2=Allender | first2 = E. 
| title = Making Nondeterminism Unambiguous | journal = SIAM Journal on Computing | year = 2000| ref=harv|url=ftp://128.6.25.4/http/pub/allender/nlul.pdf | doi = 10.1137/S0097539798339041 | issue = 4}} +* {{Cite journal +| volume = 7 +| issue = 1 +| pages = 105–113 +| last1 = Mulmuley | first1 = Ketan | authorlink1 = Ketan Mulmuley +| last2 = Vazirani | first2 = Umesh | authorlink2 = Umesh Vazirani +| last3 = Vazirani | first3 = Vijay | authorlink3 = Vijay Vazirani +| title = Matching is as easy as matrix inversion +| journal = Combinatorica +| year = 1987 +| ref=harv +| url=http://www.springerlink.com/content/r4rw2x4l46476708/ +| doi = 10.1007/BF02579206}} +* {{Cite book | last=Jukna | first=Stasys | year=2001 | title=Extremal combinatorics: with applications in computer science | publisher=Springer | isbn=978-3-540-66313-3 | url=http://lovelace.thi.informatik.uni-frankfurt.de/~jukna/EC_Book/index.html | pages=147–150 | ref=harv | postscript=.}} +* {{cite doi|10.1016/0304-3975(86)90135-0| ref=harv}} +* {{Cite conference| last = Wigderson| first = Avi | authorlink = Avi Wigderson | title = NL/poly ⊆ ⊕L/poly | url = http://www.math.ias.edu/~avi/PUBLICATIONS/MYPAPERS/W94/proc.pdf | date = 1994 | conference = Proceedings of the 9th Structures in Complexity Conference | ref=harv| pages = 59–62}} +{{refend}} + +==External links== +* [http://blog.computationalcomplexity.org/2006/09/favorite-theorems-unique-witnesses.html Favorite Theorems: Unique Witnesses] by [[Lance Fortnow]] +* [http://rjlipton.wordpress.com/2009/07/01/the-isolation-lemma-and-beyond/ The Isolation Lemma and Beyond] by [[Richard J. Lipton]] + +{{DEFAULTSORT:Isolation Lemma}} +[[Category:Probability theorems]] +[[Category:Combinatorics]] +[[Category:Lemmas]] + jw00nsfsnwkzmj7rnj4h6ja67jxgrqb + + + + Singularity spectrum + 0 + 27418 + + 27419 + 2012-03-18T23:58:35Z + + Helpful Pixie Bot + 0 + + + ISBNs (Build J/) + wikitext + text/x-wiki + {{no footnotes|date=January 2012}} +{{one source|date=January 2012}} +The '''singularity spectrum''' is a function used in [[Multifractal analysis]] to describe the fractal dimension of a subset of points of a function belonging to a group of points that have the same [[Holder exponent]]. Intuitively, the singularity spectrum gives a value for how "fractal" a set of points are in a function. + +More formally, the singularity spectrum <math>D(\alpha)</math> of a function, <math>f(x)</math>, is defined as: + +:<math>D(\alpha) = D_F\{x, \alpha(x) = \alpha\}</math> + +Where <math>\alpha(x)</math> is the function describing the [[Holder exponent]], <math>\alpha(x)</math> of <math>f(x)</math> at the point <math>x</math>. <math>D_F\{\cdot\}</math> is the [[Hausdorff dimension]] of a point set. + +==See also== +* [[Multifractal analysis]] +* [[Holder exponent]] +* [[Hausdorff dimension]] +* [[Fractal]] +* [[Fractional Brownian motion]] + +==References== +* {{citation|last=van den Berg|first=J. C.|year=2004|title=Wavelets in Physics|publisher=Cambridge|isbn=978-0-521-53353-9}}. + + + +[[Category:Fractals]] + 62l4dre4r8yvwsytheqvocp0hpkg6rf + + + + Time-evolving block decimation + 0 + 14487 + + 14488 + 2013-10-13T15:53:21Z + + 217.189.227.252 + + /* The Schmidt decomposition */ + wikitext + text/x-wiki + {{Refimprove|date=March 2011}} +The '''time-evolving block decimation (TEBD)''' [[algorithm]] is a numerical scheme used to simulate one-dimensional [[quantum]] many-body systems, characterized by at most nearest-neighbour interactions. 
+It is dubbed Time-evolving Block Decimation because it dynamically identifies the relevant low-dimensional Hilbert subspaces of an exponentially larger original [[Hilbert space]]. The algorithm, based on the Matrix Product States formalism, is highly efficient when the amount of [[quantum entanglement|entanglement]] in the system is limited, a requirement fulfilled by a large class of quantum many-body systems in one dimension.
+
+==Introduction==
+{{Essay-like|date=October 2010}}
+There is nowadays considerable interest, in the field of quantum theory, in computational methods well suited to the physics of many-body systems. Given the inherent difficulty of simulating general quantum many-body systems, namely the [[Exponential growth|exponential increase]] in the number of parameters with the size of the system and, correspondingly, the high computational cost, one solution is to look for numerical methods that deal with special cases, where one can profit from the physics of the system. The raw approach, directly dealing with all the parameters used to fully characterize a quantum many-body system, is seriously impeded by the exponential buildup, with the system size, of the number of variables needed for the simulation, which leads, in the best cases, to unreasonably long computational times and extensive use of memory. To get around this problem a number of methods have been developed and put into practice over time, one of the most successful being the [[quantum Monte Carlo|quantum Monte Carlo method]] (QMC). The [[density matrix renormalization group]] (DMRG) method is, next to QMC, another very reliable method, with an expanding community of users and an increasing number of applications to physical systems.
+
+Once the first [[quantum computer]] is plugged in and functioning, the prospects for the field of computational physics will look rather promising, but until that day one has to restrict oneself to the mundane tools offered by classical computers. While experimental physicists are putting a lot of effort into trying to build the first quantum computer, theoretical physicists are searching, in the field of [[quantum information science|quantum information theory]] (QIT), for genuine quantum algorithms, appropriate for problems that would perform badly on a classical computer but quickly and successfully on a quantum one. The search for such algorithms is still ongoing, the best-known (and almost the only ones found) being [[Shor's algorithm]] for [[integer factorization|factoring]] large numbers and [[Grover's algorithm|Grover's search algorithm]].
+
+In the field of QIT one has to identify the primary resources necessary for genuine quantum computation. Such resources may be responsible for the speedup of quantum over classical computation; identifying them also means identifying systems that can be simulated in a reasonably efficient manner on a classical computer. Such a resource is [[quantum entanglement]]; hence, it is possible to establish a distinct lower bound for the entanglement needed for quantum computational speedups.
+ +[[Guifré Vidal]], then at the Institute for Quantum Information, [[California Institute of Technology|CalTech]], +has recently proposed a scheme useful for simulating a certain category of quantum<ref name=vidal>[[Guifré Vidal]], ''Efficient Classical Simulation of Slightly Entangled Quantum Computations'', PRL 91, 147902 (2003)[http://www.citebase.org/cgi-bin/citations?id=oai:arXiv.org:quant-ph/0301063]</ref> systems. He asserts that ''"any quantum computation with pure states can be efficiently simulated with a classical computer provided the amount of entanglement involved is sufficiently restricted" +''. +This happens to be the case with generic [[Hamiltonian (quantum mechanics)|Hamiltonians]] displaying local interactions, as for example, [[Hubbard model|Hubbard]]-like Hamiltonians. The method exhibits a low-degree polynomial behavior in the increase of computational time with respect to the amount of entanglement present in the system. The algorithm is based on a scheme that exploits the fact that in these one-dimensional systems the eigenvalues of the reduced [[density matrix]] on a bipartite split of the system are exponentially decaying, thus allowing us to work in a re-sized space spanned by the eigenvectors corresponding to the [[Eigenvalue, eigenvector and eigenspace|eigenvalues]] we selected. + +One can also estimate the amount of computational resources required for the simulation of a quantum system on a classical computer, knowing how the entanglement contained in the system scales with the size of the system. The classically (and quantum, as well) feasible simulations are those that involve systems only a trifle entangled—the strongly entangled ones being, on the other hand, good candidates only for genuine quantum computations. + +The numerical method is efficient in simulating real-time dynamics or calculations of [[Stationary state|ground states]] using imaginary-time evolution or [[Isentropic process|isentropic]] interpolations between a target Hamiltonian and a Hamiltonian with an already-known ground state. The computational time scales [[linear]]ly with the system size, hence many-particles systems in 1D can be investigated. + +A useful feature of the TEBD algorithm is that it can be reliably employed for [[time evolution]] simulations of time-dependent Hamiltonians, describing systems that can be realized with [[Laser cooling|cold]] atoms in [[optical lattice]]s, or in systems far from equilibrium in quantum transport. From this point of view, TEBD had a certain ascendance over DMRG, a very powerful technique, but until recently not very well suited for simulating time-evolutions. With the Matrix Product States formalism being at the mathematical heart of DMRG, the TEBD scheme was adopted by the DMRG community, thus giving birth to the time dependent DMRG [http://www.citebase.org/cgi-bin/citations?id=oai:arXiv.org:cond-mat/0403313], t-DMRG for short. + +Around the same time, other groups have developed similar approaches in which quantum information plays a predominant role as, for example, in DMRG implementations for periodic boundary conditions [http://arxiv.org/abs/cond-mat/0404706], and for studying mixed-state dynamics in one-dimensional quantum lattice systems,.<ref name="cond-mat/0406426">{{cite journal |doi=10.1103/PhysRevLett.93.207204 |author=F. Verstraete, J. J. Garcia-Ripoll, J. I. Cirac |title=Matrix Product Density Operators: Simulation of finite-T and dissipative systems |journal=Phys. Rev. Lett. 
|volume=93 |issue=20 |pages= 207204 |year=2004 |pmid=15600964 |bibcode=2004PhRvL..93t7204V|arxiv = cond-mat/0406426 }} [http://arxiv.org/abs/cond-mat/0406426]</ref><ref name="cond-mat/0406440">{{cite journal |doi=10.1103/PhysRevLett.93.207205 |author=M. Zwolak, G. Vidal |title=Mixed-state dynamics in one-dimensional quantum lattice systems: a time-dependent superoperator renormalization algorithm |journal=Phys. Rev. Lett. |volume=93 |issue=20 |pages= 207205 |year=2004 |pmid=15600965 |bibcode=2004PhRvL..93t7205Z|arxiv = cond-mat/0406440 }} [http://arxiv.org/abs/cond-mat/0406440]</ref> Those last approaches actually provide a formalism that is more general than the original TEBD approach, as it also allows to deal with evolutions with matrix product operators; this enables the simulation of nontrivial non-infinitesimal evolutions as opposed to the TEBD case, and is a crucial ingredient to deal with higher dimensional analogues of matrix product states. + +==The decomposition of state== +===Introducing the decomposition of State=== + +Let us consider a chain of ''N'' [[qubit]]s, described by the function <math>| \Psi \rangle \in H^{{\otimes} N }</math>. The most natural way of describing <math>| \Psi \rangle</math> would be using the local <math>M^N</math>-dimensional basis <math>| i_1,i_2,..,i_{N-1},i_N \rangle</math>: + +:<math>| \Psi \rangle=\sum\limits_{i=1}^{M}c_{i_1i_2..i_N} | {i_1,i_2,..,i_{N-1},i_N} \rangle</math> + +where ''M'' is the on-site dimension. + +The trick of TEBD is to re-write the coefficients <math>c_{i_1i_2..i_N}</math>: + +:<math> c_{i_1i_2..i_N}=\sum\limits_{\alpha_1,..,\alpha_{N-1}=0}^{\chi}\Gamma^{[1]i_1}_{\alpha_1}\lambda^{[1]}_{\alpha_1}\Gamma^{[2]i_2}_{\alpha_1\alpha_2}\lambda^{[2]}_{\alpha_2}\Gamma^{[3]i_3}_{\alpha_2\alpha_3}\lambda^{[3]}_{\alpha_3}\cdot..\cdot\Gamma^{[{N-1}]i_{N-1}}_{\alpha_{N-2}\alpha_{N-1}}\lambda^{[N-1]}_{\alpha_{N-1}}\Gamma^{[N]i_N}_{\alpha_{N-1}}</math> + +This may seem rather esoteric in the beginning but let's try take a look at how this decomposition is obtained and what it is good for. + +For this we have to make use of another piece of mathematics called the [[Schmidt decomposition]] of a state, or [[singular value decomposition]]. + +===The Schmidt decomposition=== + +Let us consider the state of a bipartite system <math>\vert \Psi \rangle \in {H_A \otimes H_B}</math>. Every such state <math>|{\Psi}\rangle</math> can be represented in an appropriately chosen basis as: + +:<math>|{\Psi}\rangle=\sum\limits_{i=1}^{M}a_i|{\Phi^A_i \Phi^B_i}\rangle</math> + +where <math>|{\Phi^A_i \Phi^B_i}\rangle=| {\Phi^A_i}\rangle\otimes | {\Phi^B_i}\rangle</math> are formed with vectors <math>|{\Phi^A_i}\rangle</math> that make an orthonormal basis in <math>H_A</math> and, correspondingly, vectors <math>|{\Phi^B_i}\rangle</math>, which form an orthonormal basis in <math>{H_B}</math>, with the coefficients <math>a_i</math> being real and positive, <math>\sum\limits_{i=1}^{M}a^2_i = 1</math>. This is called the Schmidt decomposition (SD) of a state. The summation can go up to <math>M_{\max}=\min(\dim({{H_A}}),\dim({{H_B}}))</math>, the lowest of the two Hilbert spaces. The Schmidt rank of a bipartite split is given by the number of non-zero Schmidt coefficients. If the Schmidt rank is one, the split is characterized by a product state. The vectors of the SD are determined up to a phase and the eigenvalues and the Schmidt rank are unique. 
+ +For example, the two-qubit state: + +:<math>| {\Psi}\rangle=\frac{1}{2\sqrt{2}}\left( | {00}\rangle + {\sqrt{3}} | {01}\rangle + {\sqrt{3}} | {10}\rangle + |{11}\rangle\right)</math> + +has the following SD: + +:<math> +|{\Psi}\rangle = \frac{\sqrt{3}+1}{2\sqrt{2}}|{\phi^{A}_1\phi^{B}_1}\rangle + \frac{\sqrt{3}-1}{2\sqrt{2}}|{\phi^{A}_2\phi^{B}_2}\rangle</math> + +with + +:<math>|{\phi^{A}_1}\rangle=\frac{1}{\sqrt{2}}(|{0_{A}}\rangle+|{1_{A}}\rangle), \ \ |{\phi^{B}_1}\rangle=\frac{1}{\sqrt{2}}(|{0_{B}}\rangle+|{1_{B}}\rangle), \ \ |{\phi^{A}_2}\rangle=\frac{1}{\sqrt{2}}(|{0_{A}}\rangle-|{1_{A}}\rangle), \ \ |{\phi^{B}_2}\rangle=\frac{1}{\sqrt{2}}(|{1_{B}}\rangle-|{0_{B}}\rangle)</math> + +On the other hand, the state: + +:<math>|{\Phi}\rangle =\frac{1}{\sqrt{3}}|{00}\rangle + \frac{1}{\sqrt{6}}|{01}\rangle- \frac{i}{\sqrt{3}}|{10}\rangle - \frac{i}{\sqrt{6}}|{11}\rangle</math> + +is a product state: + +:<math>|{\Phi}\rangle = {(}\frac{1}{\sqrt{3}}|{0_{A}}\rangle-\frac{i}{\sqrt{3}}|{1_{A}}\rangle{)} \otimes {(}|{0_{B}}\rangle+\frac{1}{\sqrt{2}}|{1_{B}}\rangle{)}</math> + +===Building the decomposition of state=== + +At this point we probably know enough to try to see how we explicitly build the decomposition (let's call it ''D''). + +Consider the bipartite splitting <math>[1]:[2..N]</math>. The SD has the coefficients <math>\lambda^{[1]}_{{\alpha}_1}</math> and eigenvectors <math>|{\Phi^{[1]}_{\alpha_1}}\rangle |{ \Phi^{[2..N]}_{\alpha_1}}\rangle</math>. +By expanding the <math>|{\Phi^{[1]}_{\alpha_1}}\rangle</math>'s in the local basis, one can write: + +:<math>|{\Psi}\rangle=\sum\limits_{i_1,{\alpha_1=1}}^{M,\chi}\Gamma^{[1]i_1}_{\alpha_1}\lambda^{[1]}_{\alpha_1}|{i_1}\rangle|{\Phi^{[2..N]}_{\alpha_1}}\rangle +</math> + +The process can be decomposed in three steps, iterated for each bond (and, correspondingly, SD) in the chain: + +'''''Step 1''''': express the <math>|{\Phi^{[2..N]}_{\alpha_1}}\rangle</math>'s in a local basis for qubit 2: + +:<math> |{\Phi^{[2..N]}_{\alpha_1}}\rangle=\sum_{i_2}|{i_2}\rangle|{\tau^{[3..N]}_{\alpha_1i_2}}\rangle</math> + +The vectors <math>|{\tau^{[3..N]}_{\alpha_1i_2}}\rangle</math> are not necessarily [[Normalisable wavefunction|normalized]]. + +'''''Step 2''''': write each vector <math>|{\tau^{[3..N]}_{\alpha_1i_2}}\rangle</math> in terms of the '''''at most''''' (Vidal's emphasis) <math>\chi</math> Schmidt vectors <math>|{\Phi^{[3..N]}_{\alpha_2}}\rangle</math> and, correspondingly, coefficients <math>\lambda^{[2]}_{{\alpha}_2}</math>: + +:<math>|\tau^{[3..N]}_{\alpha_1i_2}\rangle=\sum_{\alpha_2}\Gamma^{[2]i_2}_{\alpha_1\alpha_2}\lambda^{[2]}_{{\alpha}_2}|{\Phi^{[3..N]}_{\alpha_2}}\rangle +</math> + +'''''Step 3''''': make the substitutions and obtain: + +:<math>|{\Psi}\rangle=\sum_{i_1,i_2,\alpha_1,\alpha_2}\Gamma^{[1]i_1}_{\alpha_1}\lambda^{[1]}_{\alpha_1}\Gamma^{[2]i_2}_{\alpha_1\alpha_2}\lambda^{[2]}_{{\alpha}_2}|{i_1i_2}\rangle|{\Phi^{[3..N]}_{\alpha_2}}\rangle</math> + +Repeating the steps 1 to 3, one can construct the whole decomposition of state ''D''. The last <math>\Gamma</math>'s are a special case, like the first ones, expressing the right-hand Schmidt vectors at the <math>(N-1)^{th}</math> bond in terms of the local basis at the <math>N^{th}</math> lattice place. As shown in,<ref name=vidal/> it is straightforward to obtain the Schmidt decomposition at <math>k^{th}</math> bond, i.e. <math>[1..k]:[k+1..N]</math>, from ''D''. 
+ +The Schmidt eigenvalues, are given explicitly in ''D'': + +:<math>|{\Psi}\rangle=\sum_{\alpha_k}\lambda^{[k]}_{{\alpha}_k}|{\Phi^{[1..k]}_{\alpha_k}}\rangle|{\Phi^{[k+1..N]}_{\alpha_k}}\rangle</math> + +The Schmidt eigenvectors are simply: + +:<math>|{\Phi^{[1..k]}_{\alpha_k}}\rangle=\sum_{\alpha_1,\alpha_2..\alpha_{k-1}}\Gamma^{[1]i_1}_{\alpha_1}\lambda^{[1]}_{\alpha_1}\cdot\cdot\Gamma^{[k]i_k}_{\alpha_{k-1}\alpha_k}|{i_1i_2..i_k}\rangle</math> +and + +:<math>|{\Phi^{[k+1..N]}_{\alpha_k}}\rangle=\sum_{\alpha_{k+1},\alpha_{k+2}..\alpha_{N}}\Gamma^{[k+1]i_{k+1}}_{\alpha_k\alpha_{k+1}}\lambda^{[k+1]}_{\alpha_{k+1}}\cdot\cdot\lambda^{N-1}_{\alpha_{N-1}}\Gamma^{[N]i_N}_{\alpha_{N-1}}|{i_{k+1}i_{k+2}..i_N}\rangle</math> + +===Rationale=== + +Now, looking at ''D'', instead of <math>M^N</math> initial terms, there are <math>{\chi}^2{\cdot}M(N-2) + 2{\chi}M + (N-1)\chi</math>. Apparently this is just a fancy way of rewriting the coefficients <math>c_{i_1i_2..i_N}</math>, but in fact there is more to it than that. Assuming that ''N'' is even, the Schmidt rank <math>\chi</math> for a bipartite cut in the middle of the chain can have a maximal value of <math>M^{N/2}</math>; in this case we end up with at least <math>M^{N+1}{\cdot}(N-2)</math> coefficients, considering only the <math>{\chi}^2</math> ones, slightly more than the initial <math>M^N</math>! The truth is that the decomposition ''D'' is useful when dealing with systems that exhibit a low degree of entanglement, which fortunately is the case with many 1D systems, where the Schmidt coefficients of the ground state decay in an exponential manner with <math>\alpha</math>: + +:<math> +\lambda^{[l]}_{{\alpha}_l}{\sim}e^{-K\alpha_l},\ K>0.</math> + +Therefore it is possible to take into account only some of the Schmidt coefficients (namely the largest ones), dropping the others and consequently normalizing again the state: + +:<math>|{\Psi}\rangle=\frac{1}{\sqrt{\sum\limits_{{\alpha_l}=1}^{{\chi}_c}{|\lambda^{[l]}_{{\alpha}_l}|}^2}}\cdot\sum\limits_{{{\alpha}_l}=1}^{{\chi}_c}\lambda^{[l]}_{{\alpha}_l}|{\Phi^{[1..l]}_{\alpha_l}}\rangle|{ \Phi^{[l+1..N]}_{\alpha_l}}\rangle,</math> + +where <math>\chi_c</math> is the number of kept Schmidt coefficients. + +Let's get away from this abstract picture and refresh ourselves with a concrete example, to emphasize the advantage of making this decomposition. Consider for instance the case of 50 [[fermion]]s in a [[Ferromagnetism|ferromagnetic]] chain, for the sake of simplicity. A dimension of 12, let's say, for the <math>\chi_c</math> would be a reasonable choice, keeping the discarded eigenvalues at <math>0.0001</math>% of the total, as shown by numerical studies,<ref name=vidal2>Guifré Vidal, ''Efficient Simulation of One-Dimensional Quantum Many-Body-Systems'', PRL 93, 040502 (2004)[http://www.citebase.org/cgi-bin/citations?id=oai:arXiv.org:quant-ph/0310089]</ref> meaning roughly <math>2^{14}</math> coefficients, as compared to the originally <math>2^{50}</math> ones. + +Even if the Schmidt eigenvalues don't have this exponential decay, but they show an algebraic decrease, we can still use ''D'' to describe our state <math>\psi</math>. The number of coefficients to account for a faithful description of <math>\psi</math> may be sensibly larger, but still within reach of eventual numerical simulations. 
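+
+To make the truncation step concrete, here is a minimal NumPy sketch; it is an illustration under assumed conventions (a plain state vector and <code>numpy.linalg.svd</code>), not code taken from the references. The coefficient matrix of a bipartite state is decomposed by a singular value decomposition, only the <math>\chi_c</math> largest Schmidt coefficients are kept, and the state is renormalized as above. The test state is built with an exponentially decaying Schmidt spectrum, mimicking the behaviour described for one-dimensional ground states.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def truncate_bipartite(psi, dim_left, dim_right, chi_c):
+    """Keep only the chi_c largest Schmidt coefficients of a bipartite pure state.
+
+    The Schmidt coefficients across the cut are the singular values of the
+    reshaped coefficient matrix; the kept part is renormalized as in the text."""
+    c = psi.reshape(dim_left, dim_right)                  # coefficient matrix c[i, j]
+    u, lam, vh = np.linalg.svd(c, full_matrices=False)    # lam is sorted in decreasing order
+    kept = lam[:chi_c] / np.linalg.norm(lam[:chi_c])      # renormalize the truncated state
+    c_trunc = (u[:, :chi_c] * kept) @ vh[:chi_c, :]
+    return c_trunc.reshape(-1), kept
+
+# Test state with an exponentially decaying Schmidt spectrum, mimicking a 1D ground state
+rng = np.random.default_rng(0)
+chi_full = 32
+lam = np.exp(-0.5 * np.arange(chi_full))
+lam /= np.linalg.norm(lam)
+a = np.linalg.qr(rng.normal(size=(chi_full, chi_full)))[0]   # orthonormal left Schmidt vectors
+b = np.linalg.qr(rng.normal(size=(chi_full, chi_full)))[0]   # orthonormal right Schmidt vectors
+psi = ((a * lam) @ b.T).reshape(-1)                          # sum_k lam_k |a_k>|b_k>
+
+psi_trunc, kept = truncate_bipartite(psi, chi_full, chi_full, chi_c=12)
+print(1 - abs(np.vdot(psi, psi_trunc))**2)                   # discarded weight, roughly exp(-12)
+</syntaxhighlight>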
+ +==The update of the decomposition== + +One can proceed now to investigate the behaviour of the decomposition ''D'' when acted upon with one-qubit [[Quantum gate|gates]] (OQG) and two-qubit gates (TQG) acting on neighbouring qubits. Instead of updating all the <math>M^N</math> coefficients <math>c_{i_1i_2..i_N}</math>, we will restrict ourselves to a number of operations that increase in <math>\chi</math> as a [[polynomial]] of low degree, thus saving [[Computational resource|computational time]]. + +===One-qubit gates acting on qubit ''k''=== + +The OQGs are affecting only the qubit they are acting upon, the update of the state <math>|{\psi}\rangle</math> after an [[Unitary matrix|unitary operator]] at qubit ''k'' does not modify the Schmidt eigenvalues or vectors on the left, consequently the <math>\Gamma^{[k-1]}</math>'s, or on the right, hence the <math>\Gamma^{[k+1]}</math>'s. The only <math>\Gamma</math>'s that will be updated are the <math>\Gamma^{[k]}</math>'s (requiring only at most <math>{{O}}(M^2\cdot\chi^2)</math> operations), as + +:<math>\Gamma^{'[k]i_k}_{\alpha_{k-1}\alpha_k}=\sum_{j}U^{i_k}_{j_k}\Gamma^{[k]j_k}_{\alpha_{k-1}\alpha_k}.</math> + +===Two-qubit gates acting on qubits ''k, k+1''=== + +The changes required to update the <math>\Gamma</math>'s and the <math>\lambda</math>'s, following a [[Unitary matrix|unitary operation]] ''V'' on qubits ''k, k+1'', concern only <math>\Gamma^{[k]}</math>, and <math>\Gamma^{[k+1]}</math>. +They consist of a number of <math>{{O}}({M\cdot\chi}^3)</math> basic operations. + +Following Vidal's original approach, <math>|{\psi}\rangle</math> can be regarded as belonging to only four subsystems: + +:<math>{{H}=J{{{\otimes}}}H_C{\otimes}H_D{\otimes}K}.\, </math> + +The [[Linear subspace|subspace]] ''J'' is spanned by the eigenvectors of the reduced density matrix <math>\rho^{J}=Tr_{CDK}|\psi\rangle\langle\psi|</math>: + +:<math>\rho^{[1..{k-1}]}=\sum_{\alpha}{(\lambda^{[k-1]}_{\alpha})}^2|{\Phi^{[1..{k-1}]}_{\alpha}}\rangle\langle{\Phi^{[1..{k-1}]}_{\alpha}}|=\sum_{\alpha}{(\lambda^{[k-1]}_{\alpha})^2}|{\alpha}\rangle\langle{\alpha}|.</math> + +In a similar way, the subspace ''K'' is spanned by the eigenvectors of the reduced density matrix: + +:<math>\rho^{[{k+2}..{N}]}=\sum_{\gamma}{(\lambda^{[k+1]}_{\gamma})^2}|{\Phi^{[{k+2}..N]}_{\gamma}}\rangle\langle{\Phi^{[{k+2}..N]}_{\gamma}}|=\sum_{\gamma}{(\lambda^{[k+1]}_{\gamma})^2}|{\gamma}\rangle\langle{\gamma}|.</math> + +The subspaces <math>H_C</math> and <math>H_D</math> belong to the qubits ''k'' and ''k'' + 1. 
+Using this basis and the decomposition ''D'', <math>|{\psi}\rangle</math> can be written as: + +:<math>|{\psi}\rangle=\sum\limits_{\alpha,\beta,\gamma=1}^{\chi}\sum\limits_{i,j=1}^{M}\lambda^{[C-1]}_{\alpha}\Gamma^{[C]i}_{\alpha\beta}\lambda^{[C]}_{\beta}\Gamma^{[D]j}_{\beta\gamma}\lambda^{[D]}_{\gamma}|{{\alpha}ij{\gamma}}\rangle</math> + +Using the same reasoning as for the OQG, the applying the TQG ''V'' to qubits ''k'', ''k'' + 1 one needs only to update + +:<math>\Gamma^{[C]}</math>, <math>\lambda</math> and <math>\Gamma^{[D]}.</math> + +We can write <math>|{\psi'}\rangle=V|{\psi}\rangle</math> as: +:<math>|{\psi'}\rangle=\sum\limits_{\alpha,\gamma=1}^{\chi}\sum\limits_{i,j=1}^{M}\lambda_{\alpha}\Theta^{ij}_{\alpha\gamma}\lambda_{\gamma}|{{\alpha}ij\gamma}\rangle</math> + +where + +:<math>\Theta^{ij}_{\alpha\gamma}=\sum\limits_{\beta=1}^{\chi}\sum\limits_{m,n=1}^{M}V^{ij}_{mn}\Gamma^{[C]m}_{\alpha\beta}\lambda_{\beta}\Gamma^{[D]n}_{\beta\gamma}.</math> + +To find out the new decomposition, the new <math>\lambda</math>'s at the bond ''k'' and their corresponding Schmidt eigenvectors must be computed and expressed in terms of the <math>{{\Gamma}}</math>'s of the decomposition ''D''. The reduced density matrix <math>\rho^{'[DK]}</math> is therefore [[Diagonalizable matrix|diagonalized]]: + +:<math>\rho^{'[DK]}=Tr_{JC}|{\psi'}\rangle\langle{\psi'}|=\sum_{j,j',\gamma,\gamma'}\rho^{jj'}_{\gamma\gamma'}|{j\gamma}\rangle\langle{j'\gamma'}|.</math> + +The square roots of its eigenvalues are the new <math>\lambda</math>'s. +Expressing the eigenvectors of the diagonalized matrix in the basis :<math>\{|{j\gamma}\rangle\}</math> the <math>\Gamma^{[{{D]}}}</math>'s are obtained as well: + +:<math>|{\Phi^{'[{{DK}}]}}\rangle=\sum_{j,\gamma}\Gamma^{'[{{D}}]j}_{\beta\gamma}\lambda_{\gamma}|{j\gamma}\rangle.</math> + +From the left-hand eigenvectors, + +:<math>\lambda^{'}_{\beta}|{\Phi^{'[{{JC}}]}_{\beta}}\rangle=\langle{\Phi^{'[{DK}]}_{\beta}}|{\psi'}\rangle=\sum_{i,j,\alpha,\gamma}(\Gamma^{'[{D}]j}_{\beta\gamma})^{*}\Theta^{ij}_{\alpha\gamma}(\lambda_{\gamma})^2\lambda_{\alpha}|{{\alpha}i}\rangle</math> + +after expressing them in the basis <math>\{|{i\alpha}\rangle\}</math>, the <math>\Gamma^{[{C}]}</math>'s are: + +:<math>|{\Phi^{'[{{JC}}]}}\rangle=\sum_{i,\alpha}\Gamma^{'[{{C}}]i}_{\alpha\beta}\lambda_{\alpha}|{{\alpha}i}\rangle.</math> + +===The computational cost=== + +The dimension of the largest [[tensor]]s in ''D'' is of the order <math>{{O}}(M{\cdot}{\chi}^2)</math>; when constructing the <math>\Theta^{ij}_{\alpha\gamma}</math> one makes the summation over <math>\beta</math>, <math>\it{m}</math> and <math>\it{n}</math> for each <math>\gamma,\alpha,{\it{i,j}}</math>, adding up to a total of <math>{{O}}(M^4{\cdot}{\chi}^3)</math> operations. The same holds for the formation of the elements <math>\rho^{{{jj'}}}_{\gamma\gamma'}</math>, or for computing the left-hand eigenvectors <math>\lambda^{'}_{\beta}|{\Phi^{'[{\it{JC}}]}_{\beta}}\rangle</math>, a maximum of <math>{\it{O}}(M^3{\cdot}{\chi}^3)</math>, respectively <math>{\it{O}}(M^2{\cdot}{\chi}^3)</math> basic operations. In the case of qubits, <math>{\it{M}}=2</math>, hence its role is not very relevant for the order of magnitude of the number of basic operations, but in the case when the on-site dimension is higher than two it has a rather decisive contribution. 
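+
+The bookkeeping described in this subsection can be condensed into a short NumPy sketch. This is an illustration under assumed index conventions (each <code>gamma</code> is stored as <code>[physical index, left bond, right bond]</code>), not Vidal's reference implementation; in particular, a production code would guard the divisions by small Schmidt coefficients, which is omitted here for brevity. The dominant cost is the singular value decomposition of the <math>(\chi M)\times(M\chi)</math> matrix, consistent with the <math>\chi^3</math> scaling quoted above.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def tebd_two_site_update(gamma_c, lam_c, gamma_d, lam_left, lam_right, v, chi_max, eps=1e-12):
+    """One update of (Gamma^[C], lambda, Gamma^[D]) after a two-qubit gate V on sites k, k+1.
+
+    gamma_c[m, a, b] = Gamma^[C]m_{ab}, gamma_d[n, b, g] = Gamma^[D]n_{bg};
+    lam_left, lam_c, lam_right are the Schmidt coefficients to the left of k,
+    between k and k+1, and to the right of k+1; v[i, j, m, n] = V^{ij}_{mn}."""
+    # Theta^{ij}_{ag} = sum_{b,m,n} V^{ij}_{mn} Gamma^[C]m_{ab} lambda_b Gamma^[D]n_{bg}
+    theta = np.einsum('ijmn,mab,b,nbg->iajg', v, gamma_c, lam_c, gamma_d)
+    # Attach the outer Schmidt weights and reshape into a matrix over (a i) x (j g)
+    theta = np.einsum('a,iajg,g->aijg', lam_left, theta, lam_right)
+    chi_l, d_i, d_j, chi_r = theta.shape
+    u, s, vh = np.linalg.svd(theta.reshape(chi_l * d_i, d_j * chi_r), full_matrices=False)
+    chi_new = min(chi_max, int(np.sum(s > eps)))
+    lam_new = s[:chi_new] / np.linalg.norm(s[:chi_new])          # renormalized, as in the text
+    # Strip the outer Schmidt weights again to recover the new Gammas
+    u = u[:, :chi_new].reshape(chi_l, d_i, chi_new)
+    vh = vh[:chi_new, :].reshape(chi_new, d_j, chi_r)
+    gamma_c_new = np.einsum('a,aib->iab', 1.0 / lam_left, u)     # unguarded division, see note above
+    gamma_d_new = np.einsum('bjg,g->jbg', vh, 1.0 / lam_right)
+    return gamma_c_new, lam_new, gamma_d_new
+
+# Tiny example: |+>|0> followed by a CNOT gives a Bell pair, i.e. two equal Schmidt coefficients
+gamma_c = (np.array([1.0, 1.0]) / np.sqrt(2)).reshape(2, 1, 1)   # site k in |+>
+gamma_d = np.array([1.0, 0.0]).reshape(2, 1, 1)                  # site k+1 in |0>
+lam_left = lam_c = lam_right = np.array([1.0])
+cnot = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float).reshape(2, 2, 2, 2)
+gc, lam_new, gd = tebd_two_site_update(gamma_c, lam_c, gamma_d, lam_left, lam_right, cnot, chi_max=4)
+print(lam_new)   # -> [0.7071..., 0.7071...]
+</syntaxhighlight>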
+
+==The numerical simulation==
+
+The numerical simulation targets (possibly time-dependent) Hamiltonians of a system of <math>{\it{N}}</math> particles, composed of arbitrary one-site and two-site terms:
+
+:<math>H_n=\sum\limits_{l=1}^{N}K^{[l]}_1 + \sum\limits_{l=1}^{N}K^{[l,l+1]}_2.</math>
+
+It is useful to decompose <math>H_n</math> as a sum of two possibly non-commuting terms, <math>H_n = F + G</math>, where
+
+:<math>F \equiv \sum_{even\ \ l}(K^{l}_1 + K^{l,l+1}_2) = \sum_{even\ \ l}F^{[l]},</math>
+
+:<math>G \equiv \sum_{odd \ \ l}(K^{l}_1 + K^{l,l+1}_2) = \sum_{odd \ \ l}G^{[l]}.</math>
+
+Any two terms within <math>F</math> commute with each other, and likewise for <math>G</math>: <math>[F^{[l]},F^{[l']}]=0</math>, <math>[G^{[l]},G^{[l']}]=0</math>.
+This decomposition allows a Suzuki-Trotter expansion (ST)<ref name=suzuki>Naomichi Hatano and Masuo Suzuki, ''Finding Exponential Product Formulas of Higher Orders''[http://arxiv.org/abs/math-ph/0506007v1]</ref> of the exponential operator.
+
+===The Suzuki-Trotter expansion===
+
+The Suzuki-Trotter expansion of the first order (ST1) represents a general way of writing exponential operators:
+
+:<math> e^{(A+B)} = \lim_{n\rightarrow\infty}(e^{\frac{{A}}{n}}e^{\frac{{B}}{n}})^n</math>
+
+or, equivalently
+
+:<math>e^{{\delta}(A+B)} = \lim_{\delta\rightarrow0}[e^{{\delta}A}e^{{\delta}B} + {{\it{O}}}(\delta^2)].</math>
+
+The correction term vanishes in the limit <math>\delta\rightarrow0</math>.
+
+For simulations of quantum dynamics it is useful to use operators that are [[Unitary operator|unitary]], conserving the norm (unlike power series expansions), and this is where the Trotter-Suzuki expansion comes in. In problems of quantum dynamics the unitarity of the operators in the ST expansion proves quite practical, since the error tends to concentrate in the overall [[Phase (waves)|phase]], thus allowing us to faithfully compute expectation values and conserved quantities. Because the ST conserves the phase-space volume, it is also called a symplectic integrator.
+
+The trick of the ST2 is to write the unitary operators <math>e^{-iHt}</math> as:
+
+:<math>e^{-iH_nT} = [e^{-iH_n\delta}]^{T/{\delta}} = [e^{\frac{{\delta}}{2}F}e^{{\delta}G}e^{\frac{{\delta}}{2}F}]^{n}</math>
+
+where <math>n=\frac{T}{\delta}</math>. The number <math>{\it{n}}</math> is called the Trotter number.
+
+===Simulation of the time-evolution===
+
+The operators <math>e^{\frac{{\delta}}{2}F}</math>, <math>e^{{\delta}G}</math> are easy to express, as:
+
+:<math>e^{\frac{{\delta}}{2}F}=\prod_{even \ \ l}e^{\frac{{\delta}}{2}F^{[l]}}</math>
+
+:<math>e^{{\delta}G}=\prod_{odd \ \ l}e^{{\delta}G^{[l]}}</math>
+
+since any two operators <math>F^{[l]}</math>,<math>F^{[l']}</math> (respectively, <math>G^{[l]}</math>,<math>G^{[l']}</math>) commute for <math>l{\neq}l'</math> and an ST expansion of the first order keeps only the product of the exponentials, the approximation becoming, in this case, exact.
+
+The time-evolution can be carried out according to
+
+:<math>|{\tilde{\psi}_{t+\delta}}\rangle=e^{-i\frac{{\delta}}{2}F}e^{{-i\delta}G}e^{\frac{{-i\delta}}{2}F}|{\tilde{\psi}_{t}}\rangle.</math>
+
+For each "time-step" <math>\delta</math>, the operators <math>e^{-i\frac{{\delta}}{2}F^{[l]}}</math> are applied successively to all the even bonds, then <math>e^{{-i\delta}G^{[l]}}</math> to the odd ones, and <math>e^{-i\frac{{\delta}}{2}F^{[l]}}</math> again to the even ones; this is basically a sequence of TQG's, and it has been explained above how to update the decomposition <math>{\it{D}}</math> when applying them.
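+
+To make the even/odd ordering of the sweep concrete, the following Python sketch applies one second-order Trotter step to a small chain. It acts on a dense state vector rather than on the decomposition <math>{\it{D}}</math>, so it scales exponentially and is meant purely as an illustration; the Ising-type bond term and all names are assumptions made for this example, not taken from the cited references.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def two_site_gate(h_bond, dt):
+    """Return exp(-1j*dt*h_bond) for a Hermitian 4x4 bond term, via its eigendecomposition."""
+    e, w = np.linalg.eigh(h_bond)
+    return (w * np.exp(-1j * dt * e)) @ w.conj().T
+
+def apply_two_site(psi, gate, l, n_sites):
+    """Apply a 4x4 gate to qubits (l, l+1) of a dense 2**n_sites state vector."""
+    left, right = 2**l, 2**(n_sites - l - 2)
+    return np.einsum('ab,xbz->xaz', gate, psi.reshape(left, 4, right)).reshape(-1)
+
+def trotter_step(psi, h_bonds, dt, n_sites):
+    """One step exp(-i dt F/2) exp(-i dt G) exp(-i dt F/2), with F the sum of the
+    even-bond terms and G the sum of the odd-bond terms, as defined above."""
+    even_bonds, odd_bonds = range(0, n_sites - 1, 2), range(1, n_sites - 1, 2)
+    for l in even_bonds:
+        psi = apply_two_site(psi, two_site_gate(h_bonds[l], dt / 2), l, n_sites)
+    for l in odd_bonds:
+        psi = apply_two_site(psi, two_site_gate(h_bonds[l], dt), l, n_sites)
+    for l in even_bonds:
+        psi = apply_two_site(psi, two_site_gate(h_bonds[l], dt / 2), l, n_sites)
+    return psi
+
+# Compare one Trotterized step against the exact evolution on a 6-qubit chain
+n, dt, g = 6, 0.01, 1.0
+sx = np.array([[0., 1.], [1., 0.]])
+sz = np.array([[1., 0.], [0., -1.]])
+h_bond = -np.kron(sz, sz) - 0.5 * g * (np.kron(sx, np.eye(2)) + np.kron(np.eye(2), sx))
+h_bonds = [h_bond] * (n - 1)
+
+def embed(op, l):
+    return np.kron(np.kron(np.eye(2**l), op), np.eye(2**(n - l - 2)))
+
+H = sum(embed(h_bonds[l], l) for l in range(n - 1))
+e, w = np.linalg.eigh(H)
+rng = np.random.default_rng(1)
+psi0 = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
+psi0 /= np.linalg.norm(psi0)
+psi_trot = trotter_step(psi0, h_bonds, dt, n)
+psi_exact = (w * np.exp(-1j * dt * e)) @ (w.conj().T @ psi0)
+print(np.linalg.norm(psi_trot - psi_exact))   # error of a single second-order step, of order dt**3
+</syntaxhighlight>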
+ +Our goal is to make the time evolution of a state <math>|{\psi_0}\rangle</math> for a time T, towards the state <math>|{\psi_{T}}\rangle</math> using the n-particle Hamiltonian <math>H_n</math>. + +It is rather troublesome, if at all possible, to construct the decomposition <math>{\it{D}}</math> for an arbitrary n-particle state, since this would mean one has to compute the Schmidt decomposition at each bond, to arrange the Schmidt eigenvalues in decreasing order and to choose the first <math>\chi_c</math> and the appropriate Schmidt eigenvectors. Mind this would imply diagonalizing somewhat generous reduced density matrices, which, depending on the system one has to simulate, might be a task beyond our reach and patience. +Instead, one can try to do the following: + +'''''i)''''' construct the decomposition <math>{\it{D}}</math> for a simple initial state, let us say, some product state <math>|{\psi_P}\rangle</math>, for which the decomposition is straightforward. + +'''''ii)''''' relate <math>|{\psi_0}\rangle</math> to the ground state <math>|{\psi_{gr}}\rangle</math> of a Hamiltonian <math>\tilde{H}</math> by a sufficiently local transformation Q (one that can be expressed as a product of TQGs, for example) <math>|{\psi_0}\rangle=Q|{\psi_{gr}}\rangle</math> + +'''''iii)''''' make an imaginary-time evolution towards the ground state of the Hamiltonian <math>\tilde{H}</math>, <math>|{\psi_{gr}}\rangle</math> according to: + +:<math>|{\psi_{gr}}\rangle=\lim_{\tau\rightarrow\infty}\frac{e^{-\tilde{H}\tau}|{\psi_P}\rangle}{||e^{-\tilde{H}\tau}|{\psi_P}\rangle||},</math> + +or, alternatively, simulate an isentropic evolution using a time-dependent Hamiltonian, which interpolates between the Hamiltonian <math>H_1</math>, which has the product state <math>|{\psi_P}\rangle</math> as its ground state, and the Hamiltonian <math>\tilde{H}</math>; the evolution must be done slow enough, such that the system is always in the ground state or, at least, very close to it. + +'''''iv)'''''finally, make the time-evolution of the state <math>|{\psi_0}\rangle</math> towards <math>|{\psi_{T}}\rangle</math> using the Hamiltonian <math>H_n</math>: + +:<math>|{\psi_{{T}}}\rangle=e^{-iH_nT}|{\psi_0}\rangle</math> + +==Error sources== + +The errors in the simulation are resulting from the Suzuki-Trotter approximation and the involved truncation of the Hilbert space. + +===Errors coming from the Suzuki-Trotter expansion=== + +In the case of a Trotter approximation of <math>{\it{p^{th}}}</math> order, the error is of order <math>{\delta}^{p+1}</math>. Taking into account <math>n=\frac{T}{\delta}</math> steps, the error after the time T is: + +:<math> \epsilon=\frac{T}{\delta}\delta^{p+1}=T\delta^p</math> + +The unapproximated state <math>|{\tilde{\psi}_{Tr}}\rangle</math> is: + +:<math>|{\tilde{\psi}_{Tr}}\rangle = \sqrt{1-{\epsilon}^2}|{\psi_{Tr}}\rangle + {\epsilon}|{\psi^{\bot}_{Tr}}\rangle</math> + +where <math>|{\psi_{Tr}}\rangle</math> is the state kept after the Trotter expansion and <math>|{\psi^{\bot}_{Tr}}\rangle</math> accounts for the part that is neglected when doing the expansion. + +The total error scales with time <math>{\it{T}}</math> as: + +:<math>\epsilon({{{\it{T}}}}) = 1 -|\langle{\tilde{\psi_{Tr}}}|{\psi_{{Tr}}}\rangle|^2 = 1 - 1 + \epsilon^2 = \epsilon^2</math> + +One should notice that the Trotter error is '''independent''' of the dimension of the chain. 
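+
+The quoted scaling can be checked numerically on small matrices (an illustrative sketch, not part of the original text; the random Hermitian matrices merely stand in for <math>F</math> and <math>G</math>):
+
+<syntaxhighlight lang="python">
+import numpy as np
+from scipy.linalg import expm
+
+rng = np.random.default_rng(0)
+A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
+B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
+F = (A + A.conj().T) / 10.0   # Hermitian stand-ins for F and G
+G = (B + B.conj().T) / 10.0
+
+for delta in (0.1, 0.05, 0.025):
+    exact = expm(-1j * delta * (F + G))
+    st2 = expm(-1j * delta / 2 * F) @ expm(-1j * delta * G) @ expm(-1j * delta / 2 * F)
+    print(delta, np.linalg.norm(exact - st2, 2))
+# halving delta shrinks the per-step defect by roughly a factor of 8, i.e. order delta^(p+1) with p = 2,
+# so over T/delta steps the accumulated error behaves like T*delta^2
+</syntaxhighlight>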
+ +===Errors coming from the truncation of the Hilbert space=== + +Considering the errors arising from the truncation of the Hilbert space comprised in the decomposition ''D'', they are twofold. + +First, as we have seen above, the smallest contributions to the Schmidt spectrum are left away, the state being faithfully represented up to: + +:<math>\epsilon({{{\it{D}}}}) = 1 - \prod\limits_{n=1}^{N-1}(1-\epsilon_n)</math> + +where <math> \epsilon_n = \sum\limits_{\alpha=\chi_c}^{\chi}(\lambda^{[n]}_{\alpha})^2</math> is the sum of all the discarded eigenvalues of the reduced density matrix, at the bond <math>{\it{n}}</math>. +The state <math>|{\psi}\rangle</math> is, at a given bond <math>{\it{n}}</math>, described by the Schmidt decomposition: + +:<math>|{\psi}\rangle = \sqrt{1-\epsilon_n}|{\psi_{D}}\rangle + \sqrt{\epsilon_n}|{\psi^{\bot}_{D}}\rangle</math> + +where +:<math>|{\psi_{D}}\rangle = \frac{1}{\sqrt{1-\epsilon_n}}\sum\limits_{{{\alpha}_n}=1}^{{\chi}_c}\lambda^{[n]}_{{\alpha}_n}|{\Phi^{[1..n]}_{\alpha_n}}\rangle|{ \Phi^{[n+1..N]}_{\alpha_n}}\rangle</math> + +is the state kept after the truncation and + +:<math>|{\psi^{\bot}_{D}}\rangle = \frac{1}{\sqrt{\epsilon_n}}\sum\limits_{{{\alpha}_n}={\chi}_c}^{{\chi}}\lambda^{[n]}_{{\alpha}_n}|{\Phi^{[1..n]}_{\alpha_n}}\rangle|{ \Phi^{[n+1..N]}_{\alpha_n}}\rangle</math> + +is the state formed by the eigenfunctions corresponding to the smallest, irrelevant Schmidt coefficients, which are neglected. +Now, <math>\langle\psi^{\bot}_{D}|\psi_{D}\rangle=0</math> because they are spanned by vectors corresponding to orthogonal spaces. Using the same argument as for the Trotter expansion, the error after the truncation is: + +:<math> +\epsilon_n = 1 - |\langle{\psi}|\psi_{D}\rangle|^2 = \sum\limits_{\alpha=\chi_c}^{\chi}(\lambda^{[n]}_{\alpha})^2</math> + +After moving to the next bond, the state is, similarly: + +:<math>|{\psi_{D}}\rangle = \sqrt{1-\epsilon_{n+1}}|{{{\psi}'}_{D}}\rangle + \sqrt{\epsilon_{n+1}}|{{\psi'}^{\bot}_{D}}\rangle</math> +The error, after the second truncation, is: + +:<math>\epsilon = 1 - |\langle{\psi}|\psi'_{D}\rangle|^2 = 1 - (1-\epsilon_{n+1})|\langle{\psi}|\psi_{D}\rangle|^2 = 1 - (1-\epsilon_{n+1})(1-\epsilon_{n})</math> + +and so on, as we move from bond to bond. + +The second error source enfolded in the decomposition <math>{\it{D}}</math> is more subtle and requires a little bit of calculation. + +As we calculated before, the normalization constant after making the truncation at bond <math>{\it{l}}</math> <math>([1..l]:[l+1..N])</math> is: + +:<math> R = {\sum\limits_{{\alpha_l}=1}^{{\chi}_c}{|\lambda^{[l]}_{{\alpha}_l}|}^2} = {1 - \epsilon_l}</math> + +Now let us go to the bond <math>{\it{l}}-1</math> and calculate the norm of the right-hand Schmidt vectors <math>||{\Phi^{[l-1..N]}_{\alpha_{l-1}}}||</math>; taking into account the full Schmidt dimension, the norm is: + +:<math>n_1 = 1 = \sum\limits_{\alpha_l=1}^{\chi_c}(c_{\alpha_{l-1}\alpha_{l}})^2(\lambda^{[l]}_{\alpha_l})^2 + \sum\limits_{\alpha_l=\chi_c}^{\chi}(c_{\alpha_{l-1}\alpha_{l}})^2(\lambda^{[l]}_{\alpha_l})^2 = S_1 +S_2</math>, +where <math>(c_{\alpha_{l-1}\alpha_{l}})^2 = \sum\limits_{i_l=1}^{d}(\Gamma^{[l]i_l}_{\alpha_{l-1}\alpha_{l}})^{*}\Gamma^{[l]i_l}_{\alpha_{l-1}\alpha_{l}}</math>. 
+ +Taking into account the truncated space, the norm is: + +:<math>n_{2}=\sum\limits_{\alpha_l=1}^{\chi_c} (c_{\alpha_{{{l-1}}}\alpha_{l}})^2\cdot({\lambda'}^{[l]}_{\alpha_l})^2=\sum\limits_{\alpha_l=1}^{\chi_c}(c_{\alpha_{{{l-1}}}\alpha_{l}})^2\frac{(\lambda^{[l]}_{\alpha_l})^2}{R} = \frac{S_1}{R}</math> + +Taking the difference, <math>\epsilon = n_2 - n_1 = n_2 - 1</math>, we get: + +:<math>\epsilon = \frac{S_1}{R} - 1 \leq \frac{1-R}{R} = \frac{\epsilon_l}{1-\epsilon_l} {\rightarrow}0\ \ as\ \ {{\epsilon_l{\rightarrow}{{0}}}} </math> + +Hence, when constructing the reduced density matrix, the [[Trace (linear algebra)|trace]] of the matrix is multiplied by the factor: + +:<math>|\langle{\psi_{D}}|\psi_{D}\rangle|^2 = 1 - \frac{\epsilon_l}{1-\epsilon_l} = \frac{1-2\epsilon_l}{1-\epsilon_l} +</math> + +===The total truncation error=== + +The total truncation error, considering both sources, is upper bounded by: + +:<math>\epsilon({{{{D}}}}) = 1 - \prod\limits_{n=1}^{N-1}(1-\epsilon_n) \prod\limits_{n=1}^{N-1}\frac{1-2\epsilon_n}{1-\epsilon_n} = 1 - \prod\limits_{n=1}^{N-1}(1-2\epsilon_n)</math> + +When using the Trotter expansion, we do not move from bond to bond, but between bonds of same parity; moreover, for the ST2, we make a sweep of the even ones and two for the odd. But nevertheless, the calculation presented above still holds. The error is evaluated by successively multiplying with the normalization constant, each time we build the reduced density matrix and select its relevant eigenvalues. + +=="Adaptive" Schmidt dimension== + +One thing that can save a lot of computational time without loss of accuracy is to use a different Schmidt dimension for each bond instead of a fixed one for all bonds, keeping only the necessary amount of relevant coefficients, as usual. For example, taking the first bond, in the case of qubits, the Schmidt dimension is just two. Hence, at the first bond, instead of futilely diagonalizing, let us say, 10 by 10 or 20 by 20 matrices, we can just restrict ourselves to ordinary 2 by 2 ones, thus making the algorithm generally faster. What we can do instead is set a threshold for the eigenvalues of the SD, keeping only those that are above the threshold. + +TEBD also offers the possibility of straightforward parallelization due to the factorization of the exponential time-evolution operator using the Suzuki-Trotter expansion. A [[parallel-TEBD]] has the same mathematics as its non-parallelized counterpart, the only difference is in the numerical implementation. + +==References== +<references/> + +{{DEFAULTSORT:Time-Evolving Block Decimation}} +[[Category:Quantum mechanics]] +[[Category:Computational physics]] + 35yrblq4t8btsp0f0x50icsm2fulhet + + + + Monoidal t-norm logic + 0 + 16783 + + 16784 + 2013-02-28T13:58:28Z + + EmilJ + 0 + + + /* Semantics */ dab link + wikitext + text/x-wiki + '''Monoidal t-norm based logic''' (or shortly '''MTL'''), the logic of left-continuous [[t-norm]]s, is one of [[t-norm fuzzy logic]]s. It belongs to the broader class of [[substructural logic]]s, or logics of [[residuated lattice]]s;<ref name="Ono">Ono (2003).</ref> it extends the logic of commutative bounded integral residuated lattices (known as Höhle's [[monoidal logic]], Ono's FL<sub>ew</sub>, or intuitionistic logic without contraction) by the axiom of prelinearity. + +== Motivation == + +[[T-norm]]s are binary functions on the real unit interval [0,&nbsp;1] which are often used to represent a conjunction connective in [[fuzzy logic]]. 
Every ''left-continuous'' t-norm <math>*</math> has a unique [[t-norm#Residuum|residuum]], that is, a function <math>\Rightarrow</math> such that for all ''x'', ''y'', and ''z'', +:<math>x*y\le z</math> if and only if <math>x\le (y\Rightarrow z).</math> +The residuum of a left-continuous t-norm can explicitly be defined as +:<math>(x\Rightarrow y)=\sup\{z\mid z*x\le y\}.</math> +This ensures that the residuum is the largest function such that for all ''x'' and ''y'', +:<math>x*(x\Rightarrow y)\le y.</math> +The latter can be interpreted as a fuzzy version of the [[modus ponens]] rule of inference. The residuum of a left-continuous t-norm thus can be characterized as the weakest function that makes the fuzzy modus ponens valid, which makes it a suitable truth function for implication in fuzzy logic. Left-continuity of the t-norm is the necessary and sufficient condition for this relationship between a t-norm conjunction and its residual implication to hold. + +Truth functions of further propositional connectives can be defined by means of the t-norm and its residuum, for instance the residual negation <math>\neg x=(x\Rightarrow 0).</math> In this way, the left-continuous t-norm, its residuum, and the truth functions of additional propositional connectives (see the section ''[[#Standard semantics|Standard semantics]]'' below) determine the [[truth value]]s of complex [[propositional formula]]e in [0,&nbsp;1]. Formulae that always evaluate to 1 are then called ''[[tautology (logic)|tautologies]]'' with respect to the given left-continuous t-norm <math>*,</math> or ''<math>*\mbox{-}</math>tautologies.'' The set of all <math>*\mbox{-}</math>tautologies is called the ''logic'' of the t-norm <math>*,</math> since these formulae represent the laws of fuzzy logic (determined by the t-norm) which hold (to degree 1) regardless of the truth degrees of [[atomic formula]]e. Some formulae are tautologies with respect to ''all'' left-continuous t-norms: they represent general laws of propositional fuzzy logic which are independent of the choice of a particular left-continuous t-norm. These formulae form the logic MTL, which can thus be characterized as the ''logic of left-continuous t-norms.''<ref>Conjectured by Esteva and Godo who introduced the logic (2001), proved by Jenei and Montagna (2002).</ref> + +== Syntax == + +=== Language === + +The language of the propositional logic MTL consists of [[countable|countably]] many [[propositional variable]]s and the following primitive [[logical connective]]s: +* '''Implication''' <math>\rightarrow</math> ([[arity|binary]]) +* '''Strong conjunction''' <math>\otimes</math> (binary). The sign &amp; is a more traditional notation for strong conjunction in the literature on fuzzy logic, while the notation <math>\otimes</math> follows the tradition of substructural logics. +* '''Weak conjunction''' <math>\wedge</math> (binary), also called '''lattice conjunction''' (as it is always realized by the [[lattice (order)|lattice]] operation of [[meet (mathematics)|meet]] in algebraic semantics). Unlike [[basic fuzzy logic|BL]] and stronger fuzzy logics, weak conjunction is not definable in MTL and has to be included among primitive connectives. +* '''Bottom''' <math>\bot</math> ([[nullary]] — a [[propositional constant]]); <math>0</math> or <math>\overline{0}</math> are common alternative signs and '''zero''' a common alternative name for the propositional constant (as the constants bottom and zero of substructural logics coincide in MTL). 
+The following are the most common defined logical connectives: +* '''Negation''' <math>\neg</math> ([[unary operation|unary]]), defined as +::<math>\neg A \equiv A \rightarrow \bot</math> +* '''Equivalence''' <math>\leftrightarrow</math> (binary), defined as +::<math>A \leftrightarrow B \equiv (A \rightarrow B) \wedge (B \rightarrow A)</math> +: In MTL, the definition is equivalent to <math>(A \rightarrow B) \otimes (B \rightarrow A).</math> +* '''(Weak) disjunction''' <math>\vee</math> (binary), also called '''lattice disjunction''' (as it is always realized by the [[lattice (order)|lattice]] operation of [[join (mathematics)|join]] in algebraic semantics), defined as +::<math>A \vee B \equiv ((A \rightarrow B) \rightarrow B) \wedge ((B \rightarrow A) \rightarrow A)</math> +* '''Top''' <math>\top</math> (nullary), also called '''one''' and denoted by <math>1</math> or <math>\overline{1}</math> (as the constants top and zero of substructural logics coincide in MTL), defined as +::<math>\top \equiv \bot \rightarrow \bot</math> + +[[Well-formed formula]]e of MTL are defined as usual in [[propositional logic]]s. In order to save parentheses, it is common to use the following order of precedence: +* Unary connectives (bind most closely) +* Binary connectives other than implication and equivalence +* Implication and equivalence (bind most loosely) + +=== Axioms === + +A [[Hilbert-style deduction system]] for MTL has been introduced by Esteva and Godo (2001). Its single derivation rule is [[modus ponens]]: +:from <math>A</math> and <math>A \rightarrow B</math> derive <math>B.</math> +The following are its [[axiom scheme|axiom schemata]]: +:<math>\begin{array}{ll} + {\rm (MTL1)}\colon & (A \rightarrow B) \rightarrow ((B \rightarrow C) \rightarrow (A \rightarrow C)) \\ + {\rm (MTL2)}\colon & A \otimes B \rightarrow A\\ + {\rm (MTL3)}\colon & A \otimes B \rightarrow B \otimes A\\ + {\rm (MTL4a)}\colon & A \wedge B \rightarrow A\\ + {\rm (MTL4b)}\colon & A \wedge B \rightarrow B \wedge A\\ + {\rm (MTL4c)}\colon & A \otimes (A \rightarrow B) \rightarrow A \wedge B\\ + {\rm (MTL5a)}\colon & (A \rightarrow (B \rightarrow C)) \rightarrow (A \otimes B \rightarrow C)\\ + {\rm (MTL5b)}\colon & (A \otimes B \rightarrow C) \rightarrow (A \rightarrow (B \rightarrow C))\\ + {\rm (MTL6)}\colon & ((A \rightarrow B) \rightarrow C) \rightarrow (((B \rightarrow A) \rightarrow C) \rightarrow C)\\ + {\rm (MTL7)}\colon & \bot \rightarrow A +\end{array}</math> + +The traditional numbering of axioms, given in the left column, is derived from the numbering of axioms of Hájek's [[basic fuzzy logic]] BL.<ref name="BLaxioms">Hájek (1998), Definition&nbsp;2.2.4.</ref> The axioms (MTL4a)–(MTL4c) replace the axiom of ''divisibility'' (BL4) of BL. The axioms (MTL5a) and (MTL5b) express the law of [[residuated lattice|residuation]] and the axiom (MTL6) corresponds to the condition of [[prelinearity]]. The axioms (MTL2) and (MTL3) of the original axiomatic system were shown to be redundant (Chvalovský, 2012) and (Cintula, 2005). All the other axioms were shown to be independent (Chvalovský, 2012). 
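+
+As a purely illustrative check added here (not part of the original exposition), one can verify numerically that instances of these schemata evaluate to 1 in one particular standard algebra, namely the one given by the Łukasiewicz t-norm; the grid and tolerance are arbitrary choices of the sketch:
+
+<syntaxhighlight lang="python">
+import itertools
+
+def conj(x, y):        # strong conjunction: the Lukasiewicz t-norm
+    return max(0.0, x + y - 1.0)
+
+def impl(x, y):        # its residuum, interpreting implication
+    return min(1.0, 1.0 - x + y)
+
+def mtl5a(a, b, c):    # (A -> (B -> C)) -> (A & B -> C)
+    return impl(impl(a, impl(b, c)), impl(conj(a, b), c))
+
+def mtl6(a, b, c):     # ((A -> B) -> C) -> (((B -> A) -> C) -> C)
+    return impl(impl(impl(a, b), c), impl(impl(impl(b, a), c), c))
+
+grid = [i / 20.0 for i in range(21)]
+for a, b, c in itertools.product(grid, repeat=3):
+    assert abs(mtl5a(a, b, c) - 1.0) < 1e-9
+    assert abs(mtl6(a, b, c) - 1.0) < 1e-9
+print("(MTL5a) and (MTL6) evaluate to 1 at all sampled points")
+</syntaxhighlight>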
+
+== Semantics ==
+
+Like in other propositional [[t-norm fuzzy logics]], [[algebraic semantics (mathematical logic)|algebraic semantics]] is predominantly used for MTL, with three main classes of [[algebraic structure|algebras]] with respect to which the logic is [[completeness|complete]]:
+* '''General semantics''', formed of all ''MTL-algebras'' — that is, all algebras for which the logic is [[Soundness theorem|sound]]
+* '''Linear semantics''', formed of all ''linear'' MTL-algebras — that is, all MTL-algebras whose [[lattice (order)|lattice]] order is [[total order|linear]]
+* '''Standard semantics''', formed of all ''standard'' MTL-algebras — that is, all MTL-algebras whose lattice reduct is the real unit interval [0,&nbsp;1] with the usual order; they are uniquely determined by the function that interprets strong conjunction, which can be any left-continuous [[t-norm]]
+
+=== General semantics ===
+
+==== MTL-algebras ====
+
+Algebras for which the logic MTL is sound are called ''MTL-algebras.'' They can be characterized as ''prelinear commutative bounded integral residuated lattices.'' In more detail, an algebraic structure <math>(L,\wedge,\vee,\ast,\Rightarrow,0,1)</math> is an MTL-algebra if
+* <math>(L,\wedge,\vee,0,1)</math> is a [[lattice (order)|bounded lattice]] with the top element 1 and bottom element 0
+* <math>(L,\ast,1)</math> is a [[Commutativity|commutative]] [[monoid]]
+* <math>\ast</math> and <math>\Rightarrow</math> form an [[Galois connection|adjoint pair]], that is, <math>z*x\le y</math> if and only if <math>z\le x\Rightarrow y,</math> where <math>\le</math> is the lattice order of <math>(L,\wedge,\vee),</math> for all ''x'', ''y'', and ''z'' in <math>L</math> (the ''residuation'' condition)
+* <math>(x\Rightarrow y)\vee(y\Rightarrow x)=1</math> holds for all ''x'' and ''y'' in ''L'' (the ''prelinearity'' condition)
+
+Important examples of MTL algebras are ''standard'' MTL-algebras on the real unit interval [0,&nbsp;1]. Further examples include all [[Boolean algebra (structure)|Boolean algebra]]s, all linear [[Heyting algebra]]s (both with <math>\ast=\wedge</math>), all [[MV-algebra]]s, all [[BL (logic)|BL]]-algebras, etc. Since the residuation condition can equivalently be expressed by identities,<ref name="variety">The proof of Lemma&nbsp;2.3.10 in Hájek (1998) for BL-algebras can easily be adapted to work for MTL-algebras, too.</ref> MTL-algebras form a [[variety (universal algebra)|variety]].
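+
+The residuation and prelinearity conditions can be illustrated numerically (a sketch added here, not part of the original text) for one standard MTL-algebra, the one determined by the nilpotent minimum, a well-known left-continuous but not continuous t-norm; the grid resolution and tolerances are arbitrary assumptions of the sketch:
+
+<syntaxhighlight lang="python">
+def nilmin(x, y):
+    # nilpotent minimum: a left-continuous (but not continuous) t-norm
+    return min(x, y) if x + y > 1.0 else 0.0
+
+grid = [i / 100.0 for i in range(101)]
+
+def residuum(x, y):
+    # (x => y) = sup { z : z * x <= y }, taken over the grid
+    return max(z for z in grid if nilmin(z, x) <= y + 1e-12)
+
+for x in grid[::10]:
+    for y in grid[::10]:
+        r = residuum(x, y)
+        # residuation: z * x <= y holds exactly when z <= (x => y)
+        assert all((nilmin(z, x) <= y + 1e-12) == (z <= r + 1e-12) for z in grid)
+        # prelinearity: (x => y) v (y => x) = 1
+        assert max(r, residuum(y, x)) > 1.0 - 1e-12
+print("residuation and prelinearity hold at all sampled points")
+</syntaxhighlight>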
+ +==== Interpretation of the logic MTL in MTL-algebras ==== + +The connectives of MTL are interpreted in MTL-algebras as follows: +* Strong conjunction by the monoidal operation <math>\ast</math> +* Implication by the operation <math>\Rightarrow</math> (which is called the ''residuum'' of <math>\ast</math>) +* Weak conjunction and weak disjunction by the lattice operations <math>\wedge</math> and <math>\vee,</math> respectively (usually denoted by the same symbols as the connectives, if no confusion can arise) +* The truth constants zero (top) and one (bottom) by the constants 0 and 1 +* The equivalence connective is interpreted by the operation <math>\Leftrightarrow</math> defined as +::<math>x\Leftrightarrow y \equiv (x\Rightarrow y)\wedge(y\Rightarrow x)</math> +: Due to the prelinearity condition, this definition is equivalent to one that uses <math>\ast</math> instead of <math>\wedge,</math> thus +::<math>x\Leftrightarrow y \equiv (x\Rightarrow y)\ast(y\Rightarrow x)</math> +* Negation is interpreted by the definable operation <math>-x \equiv x\Rightarrow 0</math> + +With this interpretation of connectives, any evaluation ''e''<sub>v</sub> of propositional variables in ''L'' uniquely extends to an evaluation ''e'' of all well-formed formulae of MTL, by the following inductive definition (which generalizes [[Semantic theory of truth|Tarski's truth conditions]]), for any formulae ''A'', ''B'', and any propositional variable ''p'': +:<math>\begin{array}{rcl} + e(p) &=& e_{\mathrm v}(p) +\\ e(\bot) &=& 0 +\\ e(\top) &=& 1 +\\ e(A\otimes B) &=& e(A) \ast e(B) +\\ e(A\rightarrow B) &=& e(A) \Rightarrow e(B) +\\ e(A\wedge B) &=& e(A) \wedge e(B) +\\ e(A\vee B) &=& e(A) \vee e(B) +\\ e(A\leftrightarrow B) &=& e(A) \Leftrightarrow e(B) +\\ e(\neg A) &=& e(A) \Rightarrow 0 +\end{array}</math> + +Informally, the truth value 1 represents full truth and the truth value 0 represents full falsity; intermediate truth values represent intermediate degrees of truth. Thus a formula is considered fully true under an evaluation ''e'' if ''e''(''A'')&nbsp;=&nbsp;1. A formula ''A'' is said to be ''valid'' in an MTL-algebra ''L'' if it is fully true under all evaluations in ''L'', that is, if ''e''(''A'')&nbsp;=&nbsp;1 for all evaluations ''e'' in ''L''. Some formulae (for instance, ''p'' → ''p'') are valid in any MTL-algebra; these are called ''tautologies'' of MTL. + +The notion of global [[entailment]] (or: global [[consequence relation|consequence]]) is defined for MTL as follows: a set of formulae &Gamma; entails a formula ''A'' (or: ''A'' is a global consequence of &Gamma;), in symbols <math>\Gamma\models A,</math> if for any evaluation ''e'' in any MTL-algebra, whenever ''e''(''B'')&nbsp;=&nbsp;1 for all formulae ''B'' in &Gamma;, then also ''e''(''A'')&nbsp;=&nbsp;1. Informally, the global consequence relation represents the transmission of full truth in any MTL-algebra of truth values. + +==== General soundness and completeness theorems ==== + +The logic MTL is [[soundness theorem|sound]] and [[completeness|complete]] with respect to the class of all MTL-algebras (Esteva &amp; Godo, 2001): +:A formula is provable in MTL if and only if it is valid in all MTL-algebras. +The notion of MTL-algebra is in fact so defined that MTL-algebras form the class of ''all'' algebras for which the logic MTL is sound. 
Furthermore, the ''strong completeness theorem'' holds:<ref>A general proof of the strong completeness with respect to all ''L''-algebras for any weakly implicative logic ''L'' (which includes MTL) can be found in Cintula (2006).</ref> +:A formula ''A'' is a global consequence in MTL of a set of formulae &Gamma; if and only if ''A'' is derivable from &Gamma; in MTL. + +=== Linear semantics === + +Like algebras for other fuzzy logics,<ref name="wifl">Cintula (2006).</ref> MTL-algebras enjoy the following ''linear subdirect decomposition property'': +: Every MTL-algebra is a subdirect product of linearly ordered MTL-algebras. +(A ''subdirect product'' is a subalgebra of the [[direct product]] such that all [[projection (mathematics)|projection maps]] are [[surjective function|surjective]]. An MTL-algebra is ''linearly ordered'' if its [[lattice (order)|lattice order]] is [[total order|linear]].) + +In consequence of the linear subdirect decomposition property of all MTL-algebras, the ''completeness theorem with respect to linear MTL-algebras'' (Esteva &amp; Godo, 2001) holds: +*A formula is provable in MTL if and only if it is valid in all ''linear'' MTL-algebras. +*A formula ''A'' is derivable in MTL from a set of formulae &Gamma; if and only if ''A'' is a global consequence in all ''linear'' MTL-algebras of &Gamma;. + +=== Standard semantics === + +''Standard'' are called those MTL-algebras whose lattice reduct is the real unit interval [0,&nbsp;1]. They are uniquely determined by the real-valued function that interprets strong conjunction, which can be any left-continuous [[t-norm]] <math>\ast</math>. The standard MTL-algebra determined by a left-continuous t-norm <math>\ast</math> is usually denoted by <math>[0,1]_{\ast}.</math> In <math>[0,1]_{\ast},</math> implication is represented by the [[t-norm#Residuum|residuum]] of <math>\ast,</math> weak conjunction and disjunction respectively by the minimum and maximum, and the truth constants zero and one respectively by the real numbers 0 and 1. + +The logic MTL is complete with respect to standard MTL-algebras; this fact is expressed by the ''standard completeness theorem'' (Jenei &amp; Montagna, 2002): +: A formula is provable in MTL if and only if it is valid in all standard MTL-algebras. + +Since MTL is complete with respect to standard MTL-algebras, which are determined by left-continuous t-norms, MTL is often referred to as the ''logic of left-continuous t-norms'' (similarly as [[BL (logic)|BL]] is the logic of continuous t-norms). + +== Bibliography == + +* Hájek P., 1998, ''Metamathematics of Fuzzy Logic''. Dordrecht: Kluwer. +* Esteva F. & Godo L., 2001, "Monoidal t-norm based logic: Towards a logic of left-continuous t-norms". ''Fuzzy Sets and Systems'' '''124''': 271–288. +* Jenei S. & Montagna F., 2002, "A proof of standard completeness of Esteva and Godo's monoidal logic MTL". ''Studia Logica'' '''70''': 184–192. +* Ono, H., 2003, "Substructural logics and residuated lattices — an introduction". In F.V. Hendricks, J. Malinowski (eds.): Trends in Logic: 50 Years of Studia Logica, ''Trends in Logic'' '''20''': 177–212. +* Cintula P., 2005, "Short note: On the redundancy of axiom (A3) in BL and MTL". ''Soft Computing'' '''9''': 942. +* Cintula P., 2006, "Weakly implicative (fuzzy) logics I: Basic properties". ''Archive for Mathematical Logic'' '''45''': 673–704. +* Chvalovský K., 2012, "[http://karel.chvalovsky.cz/publications/nezavislost.pdf On the Independence of Axioms in BL and MTL]". 
''Fuzzy Sets and Systems'' '''197''': 123–129, {{doi|10.1016/j.fss.2011.10.018}}. + +== References == + +<references/> + +[[Category:Fuzzy logic]] + cenppz2gonv0h0jlm5gcl5wx58r5v78 + + + + Differentiation rules + 0 + 16715 + + 16716 + 2014-01-18T22:30:51Z + + Guy vandegrift + 0 + + + /* Derivatives of exponential and logarithmic functions */ Forgot to finish the sentence. + wikitext + text/x-wiki + {{Calculus |Differential}} + +This is a summary of '''differentiation rules''', that is, rules for computing the [[derivative]] of a [[function (mathematics)|function]] in [[calculus]]. + +== Elementary rules of differentiation == + +Unless otherwise stated, all functions are functions of [[real number|real numbers ('''R''')]] that return real values; although more generally, the formulae below apply wherever they are [[well defined]]<ref>''Calculus (5th edition)'', F. Ayres, E. Mendelson, Schuam's Outline Series, 2009, ISBN 978-0-07-150861-2.</ref><ref>''Advanced Calculus (3rd edition)'', R. Wrede, M.R. Spiegel, Schuam's Outline Series, 2010, ISBN 978-0-07-162366-7.</ref>—including [[complex number|complex numbers ('''C''')]].<ref>''Complex Variables'', M.R. Speigel, S. Lipschutz, J.J. Schiller, D. Spellman, Schaum's Outlines Series, McGraw Hill (USA), 2009, ISBN 978-0-07-161569-3</ref> + +===Differentiation is linear=== + +{{main|Linearity of differentiation}} + +For any functions ''f'' and ''g'' and any real numbers ''a'' and ''b'' the derivative of the function {{nowrap|1=''h''(''x'') = ''af''(''x'') + ''bg''(''x'')}} with respect to ''x'' is + +:<math> h'(x) = a f'(x) + b g'(x).\, </math> +In [[Leibniz's notation]] this is written as: +:<math> \frac{d(af+bg)}{dx} = a\frac{df}{dx} +b\frac{dg}{dx}.</math> + +Special cases include: +* ''The [[Constant factor rule in differentiation|constant division + rule]]'' +:<math>(af)' = af' \,</math> +* ''The [[Sum rule in differentiation|sum rule]]'' +:<math>(f + g)' = f' + g'\,</math> +* ''The subtraction rule'' +:<math>(f - g)' = f' - g'.\,</math> + +===The product rule=== + +{{main|Product rule}} + +For the functions ''f'' and ''g'', the derivative of the function ''h''(''x'') = ''f''(''x'') ''g''(''x'') +with respect to ''x'' is +:<math> h'(x) = f'(x) g(x) + f(x) g'(x).\, </math> +In Leibniz's notation this is written +:<math>\frac{d(fg)}{dx} = \frac{df}{dx} g + f \frac{dg}{dx}.</math> + +===The chain rule=== + +{{main|Chain rule}} + +The derivative of the function of a function ''h''(''x'') = ''f''(''g''(''x'')) with respect to ''x'' is +:<math> h'(x) = f'(g(x)) g'(x).\, </math> +In Leibniz's notation this is written as: +:<math>\frac{dh}{dx} = \frac{df(g(x))}{dg(x)} \frac{dg(x)}{dx}.\,</math> +However, by relaxing the interpretation of ''h'' as a function, this is often simply written +:<math>\frac{dh}{dx} = \frac{dh}{dg} \frac{dg}{dx}.\,</math> + +===The inverse function rule=== + +{{main|inverse functions and differentiation}} + +If the function ''f'' has an [[inverse function]] ''g'', meaning that {{nowrap|1=''g''(''f''(''x'')) = ''x''}} and {{nowrap|1=''f''(''g''(''y'')) = ''y''}}, then +:<math>g' = \frac{1}{f'\circ g}.\,</math> + +In Leibniz notation, this is written as +:<math> \frac{dx}{dy} = \frac{1}{dy/dx}.</math> + +==Power laws, polynomials, quotients, and reciprocals== +===The polynomial or elementary power rule=== + +{{main|Power rule}} + +If <math>f(x) = x^n</math>, for any [[integer]] ''n'' then +:<math>f'(x) = nx^{n-1}.\,</math> + +Special cases include: +* ''Constant rule'': if ''f'' is the constant function ''f''(''x'') = ''c'', 
for any number ''c'', then for all ''x'', ''f′''(''x'') = 0.
+* if ''f''(''x'') = ''x'', then ''f′''(''x'') = 1. This special case may be generalized to:
+*:''The derivative of an affine function is constant'': if ''f''(''x'') = ''ax'' + ''b'', then ''f′''(''x'') = ''a''.
+
+Combining the power rule with the linearity of the derivative permits the computation of the derivative of any polynomial.
+
+===The reciprocal rule===
+
+{{main|Reciprocal rule}}
+The derivative of ''h''(''x'') = 1/''f''(''x'') for any (nonvanishing) function ''f'' is:
+
+:<math> h'(x) = -\frac{f'(x)}{[f(x)]^2}.\ </math>
+
+In Leibniz's notation, this is written
+:<math> \frac{d(1/f)}{dx} = -\frac{1}{f^2}\frac{df}{dx}.\,</math>
+
+The reciprocal rule can be derived from the chain rule and the power rule.
+
+===The quotient rule===
+
+{{main|Quotient rule}}
+
+If ''f'' and ''g'' are functions, then:
+:<math>\left(\frac{f}{g}\right)' = \frac{f'g - g'f}{g^2}\quad</math> wherever ''g'' is nonzero.
+
+This can be derived from the reciprocal rule and the product rule. Conversely (using the constant rule) the reciprocal rule may be derived from the special case ''f''(''x'') = 1.
+
+===Generalized power rule===
+
+{{main|Power rule}}
+
+The elementary power rule generalizes considerably. The most general power rule is the '''functional power rule''': for any functions ''f'' and ''g'',
+:<math>(f^g)' = \left(e^{g\ln f}\right)' = f^g\left(f'{g \over f} + g'\ln f\right),\quad</math>
+wherever both sides are well defined.
+
+Special cases:
+* If ''f''(''x'') = ''x''<sup>''a''</sup>, ''f′''(''x'') = ''ax''<sup>''a'' − 1</sup> when ''a'' is any real number and ''x'' is positive.
+* The reciprocal rule may be derived as the special case where ''g''(''x'') = −1.
+
+== Derivatives of exponential and logarithmic functions ==
+
+:<math> \frac{d}{dx}\left(c^{ax}\right) = {c^{ax} \ln c \cdot a } ,\qquad c > 0</math>
+Note that the equation above is true for all ''c'', but the derivative for ''c'' < 0 yields a complex number.
+
+:<math> \frac{d}{dx}\left(e^x\right) = e^x</math>
+
+:<math> \frac{d}{dx}\left( \log_c x\right) = {1 \over x \ln c} , \qquad c > 0, c \ne 1</math>
+The equation above is also true for all ''c'', but yields a complex number if ''c'' < 0.
+
+:<math> \frac{d}{dx}\left( \ln x\right) = {1 \over x} ,\qquad x > 0</math>
+
+:<math> \frac{d}{dx}\left( \ln |x|\right) = {1 \over x} ,\qquad x \ne 0</math>
+
+:<math> \frac{d}{dx}\left( x^x \right) = x^x(1+\ln x).</math>
+
+===Logarithmic derivatives===
+
+The [[logarithmic derivative]] is another way of stating the rule for differentiating the [[logarithm]] of a function (using the chain rule):
+:<math> (\ln f)'= \frac{f'}{f} \quad</math> wherever ''f'' is positive.
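+
+As an illustrative check added here (not part of the original list of rules), the functional power rule above can be compared against a central finite difference at a sample point; the choice of ''f'', ''g'', the point and the step size are arbitrary assumptions of the sketch:
+
+<syntaxhighlight lang="python">
+import math
+
+def f(x):  return x * x + 1.0          # f(x) = x^2 + 1
+def fp(x): return 2.0 * x              # f'(x)
+def g(x):  return math.sin(x)          # g(x) = sin x
+def gp(x): return math.cos(x)          # g'(x)
+
+def h(x):  return f(x) ** g(x)         # h = f^g
+
+x0, eps = 1.3, 1e-6
+numeric  = (h(x0 + eps) - h(x0 - eps)) / (2.0 * eps)   # central difference
+analytic = h(x0) * (fp(x0) * g(x0) / f(x0) + gp(x0) * math.log(f(x0)))
+print(numeric, analytic)   # the two values agree to roughly nine significant digits
+</syntaxhighlight>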
+ +== Derivatives of trigonometric functions == +{{details|Differentiation of trigonometric functions}} + +{| style="width:100%; background:transparent; margin-left:2em;" +|width=50%|<math> (\sin x)' = \cos x \,</math> +|width=50%|<math> (\arcsin x)' = { 1 \over \sqrt{1 - x^2}} \,</math> +|- +|<math> (\cos x)' = -\sin x \,</math> +|<math> (\arccos x)' = -{1 \over \sqrt{1 - x^2}} \,</math> +|- +|<math> (\tan x)' = \sec^2 x = { 1 \over \cos^2 x} = 1 + \tan^2 x \,</math> +|<math> (\arctan x)' = { 1 \over 1 + x^2} \,</math> +|- +|<math> (\sec x)' = \sec x \tan x \,</math> +|<math> (\operatorname{arcsec} x)' = { 1 \over |x|\sqrt{x^2 - 1}} \,</math> +|- +|<math> (\csc x)' = -\csc x \cot x \,</math> +|<math> (\operatorname{arccsc} x)' = -{1 \over |x|\sqrt{x^2 - 1}} \,</math> +|- +|<math> (\cot x)' = -\csc^2 x = { -1 \over \sin^2 x} = -(1 + \cot^2 x)\,</math> +|<math> (\operatorname{arccot} x)' = -{1 \over 1 + x^2} \,</math> +|} + +==Derivatives of hyperbolic functions== +{| style="width:100%; background:transparent; margin-left:2em;" +|width=50%|<math>( \sinh x )'= \cosh x = \frac{e^x + + e^{-x}}{2}</math> +|width=50%|<math>(\operatorname{arsinh}\,x)' = { 1 \over \sqrt{x^2 + 1}}</math> +|- +|<math>(\cosh x )'= \sinh x = \frac{e^x - e^{-x}}{2}</math> +|<math>(\operatorname{arcosh}\,x)' = {\frac {1}{\sqrt{x^2-1}}}</math> +|- +|<math>(\tanh x )'= {\operatorname{sech}^2\,x}</math> +|<math>(\operatorname{artanh}\,x)' = { 1 \over 1 - x^2}</math> +|- +|<math>(\operatorname{sech}\,x)' = - \tanh x\,\operatorname{sech}\,x</math> +|<math>(\operatorname{arsech}\,x)' = -{1 \over x\sqrt{1 - x^2}}</math> +|- +|<math>(\operatorname{csch}\,x)' = -\,\operatorname{coth}\,x\,\operatorname{csch}\,x</math> +|<math>(\operatorname{arcsch}\,x)' = -{1 \over |x|\sqrt{1 + x^2}}</math> +|- +|<math>(\operatorname{coth}\,x )' = + + -\,\operatorname{csch}^2\,x</math> +|<math>(\operatorname{arcoth}\,x)' = -{ 1 \over 1 - x^2}</math> +|} + +==Derivatives of special functions== +{| style="width:100%; background:transparent; margin-left:2em;" +|width=50%| +;[[Gamma function]] +<math>\Gamma'(x) = \int_0^\infty t^{x-1} e^{-t} \ln t\,dt</math> +:<math>= \Gamma(x) \left(\sum_{n=1}^\infty \left(\ln\left(1 + \dfrac{1}{n}\right) - \dfrac{1}{x + n}\right) - \dfrac{1}{x}\right) = \Gamma(x) \psi(x)</math> +|width=50%| +|} +{| style="width:100%; background:transparent; margin-left:2em;" +|width=50%| +;[[Riemann Zeta function]] +<math>\zeta'(x) = -\sum_{n=1}^\infty \frac{\ln n}{n^x} = +-\frac{\ln 2}{2^x} - \frac{\ln 3}{3^x} - \frac{\ln 4}{4^x} - \cdots +\!</math> + +:<math>= -\sum_{p \text{ prime}} \frac{p^{-x} \ln p}{(1-p^{-x})^2}\prod_{q \text{ prime}, q \neq p} \frac{1}{1-q^{-x}} \!</math> +|} + +==Derivatives of integrals== + +{{main|Differentiation under the integral sign}} + +Suppose that it is required to differentiate with respect to ''x'' the function + +:<math>F(x)=\int_{a(x)}^{b(x)}f(x,t)\,dt,</math> + +where the functions <math>f(x,t)\,</math> and <math>\frac{\partial}{\partial x}\,f(x,t)\,</math> are both continuous in both <math>t\,</math> and <math>x\,</math> in some region of the <math>(t,x)\,</math> plane, including <math>a(x)\leq t\leq b(x),</math> <math>x_0\leq x\leq x_1\,</math>, and the functions <math>a(x)\,</math> and <math>b(x)\,</math> are both continuous and both have continuous derivatives for <math>x_0\leq x\leq x_1\,</math>. Then for <math>\,x_0\leq x\leq x_1\,\,</math>: + +:<math> F'(x) = f(x,b(x))\,b'(x) - f(x,a(x))\,a'(x) + \int_{a(x)}^{b(x)} \frac{\partial}{\partial x}\, f(x,t)\; dt\,. 
</math>
+
+This formula is the general form of the [[Leibniz integral rule]] and can be derived using the
+[[fundamental theorem of calculus]].
+
+==Derivatives to ''n''th order==
+Some rules exist for computing the ''n''th derivative of functions, where ''n'' is a positive integer. These include:
+
+===Faà di Bruno's formula===
+{{main|Faà di Bruno's formula}}
+If ''f'' and ''g'' are ''n'' times differentiable, then
+
+:<math> \frac{d^n}{d x^n} [f(g(x))]= n! \sum_{\{k_m\}}^{} f^{(r)}(g(x)) \prod_{m=1}^n \frac{1}{k_m!} \left(\frac{g^{(m)}(x)}{m!} \right)^{k_m}</math>
+
+where <math> r = \sum_{m=1}^{n} k_m</math> and the set <math> \{k_m\}</math> consists of all non-negative integer solutions of the Diophantine equation <math> \sum_{m=1}^{n} m k_m = n</math>.
+
+===General Leibniz rule===
+{{main|General Leibniz rule}}
+If ''f'' and ''g'' are ''n'' times differentiable, then
+
+:<math> \frac{d^n}{dx^n}[f(x)g(x)] = \sum_{k=0}^{n} \binom{n}{k} \frac{d^{n-k}}{d x^{n-k}} f(x) \frac{d^k}{d x^k} g(x)</math>
+
+==See also==
+
+*[[Derivative]]
+*[[Differential calculus]]
+*[[Vector calculus identities]]
+*[[Differentiable function]]
+*[[Differential of a function]]
+*[[Limit of a function]]
+*[[Function (mathematics)]]
+*[[List of mathematical functions]]
+*[[Trigonometric functions]]
+*[[Inverse trigonometric functions]]
+*[[Hyperbolic functions]]
+*[[Inverse hyperbolic functions]]
+*[[Matrix calculus]]
+*[[Differentiation under the integral sign]]
+
+==References==
+{{reflist}}
+
+==Sources and further reading==
+These rules are given in many books, both on elementary and advanced calculus, in pure and applied mathematics. Those in this article (in addition to the above references) can be found in:
+*''Mathematical Handbook of Formulas and Tables (3rd edition)'', S. Lipschutz, M.R. Spiegel, J. Liu, Schuam's Outline Series, 2009, ISBN 978-0-07-154855-7.
+*''The Cambridge Handbook of Physics Formulas'', G. Woan, Cambridge University Press, 2010, ISBN 978-0-521-57507-2.
+*''Mathematical methods for physics and engineering'', K.F. Riley, M.P. Hobson, S.J. Bence, Cambridge University Press, 2010, ISBN 978-0-521-86153-3
+*''NIST Handbook of Mathematical Functions'', F. W. J. Olver, D. W. Lozier, R. F. Boisvert, C. W. Clark, Cambridge University Press, 2010, ISBN 978-0-521-19225-5.
+ +==External links== +{{Library resources box +|by=no +|onlinebooks=no +|others=no +|about=yes +|label=Differentiation rules}} + +* [http://www.planetcalc.com/675/ Derivative calculator with formula simplification] +* [http://mathmajor.org/calculus-and-analysis/table-of-derivatives/ A Table of Derivatives] + +[[Category:Differential calculus|*]] +[[Category:Differentiation rules]] +[[Category:Mathematics-related lists|Derivatives]] +[[Category:Mathematical tables|Derivatives]] +[[Category:Mathematical identities]] + +[[ar:قائمة المطابقات التفاضلية]] +[[bs:Tablica izvoda]] +[[ca:Taula de derivades]] +[[es:Tabla de derivadas]] +[[fr:Dérivées usuelles]] +[[he:נגזרת]] +[[pl:Pochodna_funkcji#Pochodne_funkcji_elementarnych]] +[[sq:Tabela e derivateve]] +[[sl:tabela odvodov]] + 5vfeoipmcbrnq3dao9awuafhruw5jkk + + + + Fermat's spiral + 0 + 2874 + + 2875 + 2014-01-18T18:14:44Z + + Monkbot + 0 + + + Fix [[Help:CS1_errors#deprecated_params|CS1 deprecated date parameter errors]] + wikitext + text/x-wiki + [[Image:Fermat's spiral.svg|frame|right|Fermat's spiral]] +'''Fermat's spiral''' (also known as a [[parabola|parabolic]] [[spiral]]) follows the equation + +:<math>r = \pm\theta^{1/2}\,</math> + +in [[polar coordinates]] (the more general Fermat's spiral follows ''r''<sup>&nbsp;2</sup>&nbsp;=&nbsp;''a''<sup>&nbsp;2</sup>''&theta;''.) +It is a type of [[Archimedean spiral#General Archimedean spiral|Archimedean spiral]].<ref>{{mathworld|urlname=FermatsSpiral |title=Fermat Spiral}}</ref> + +In disc [[phyllotaxis]] ([[sunflower]], daisy), the mesh of spirals occurs in [[Fibonacci number]]s because divergence (angle of succession in a single spiral arrangement) approaches the [[golden ratio]]. The shape of the spirals depends on the growth of the elements generated sequentially. In mature-disc [[phyllotaxis]], when all the elements are the same size, the shape of the spirals is that of Fermat spirals&mdash;ideally. That is because Fermat's spiral traverses equal [[annulus (mathematics)|annuli]] in equal turns. The full model proposed by H Vogel in 1979<ref> +{{Cite journal + | last =Vogel + | first =H + | title =A better way to construct the sunflower head + | journal =Mathematical Biosciences + | issue =44 + | pages =179–189 + | year =1979 + | doi =10.1016/0025-5564(79)90080-4 + | volume =44 + | postscript =<!--None--> +}}</ref> is + +:<math>r = c \sqrt{n},</math> + +:<math>\theta = n \times 137.508^\circ,</math> + +where ''θ'' is the angle, ''r'' is the radius or distance from the center, and ''n'' is the index number of the floret and ''c'' is a constant scaling factor. The angle 137.508° is the [[golden angle]] which is approximated by ratios of [[Fibonacci number]]s.<ref>{{cite book + | last =Prusinkiewicz + | first =Przemyslaw + | authorlink =Przemyslaw Prusinkiewicz + | coauthors =[[Aristid Lindenmayer|Lindenmayer, Aristid]] + | title =The Algorithmic Beauty of Plants + | publisher =Springer-Verlag + | date =1990 + | location = + | pages =101&ndash;107 + | url =http://algorithmicbotany.org/papers/#webdocs + | doi = + | isbn = 978-0-387-97297-8 }}</ref> +Fermat's spiral has also been found to be an efficient layout for the mirrors of [[concentrated solar power]] plants.<ref> +{{Cite journal + | last =Noone + | first =Corey J. 
+ | last =Torrilhon + | first =Manuel + | last =Mitsos + | first = Alexander + | title =Heliostat Field Optimization: A New Computationally Efficient Model and Biomimetic Layout + | journal =Solar Energy + |date=December 2011 + | doi =10.1016/j.solener.2011.12.007 + | note = In Press + | postscript =<!--None--> +}}</ref> + +[[Image:Sunflower spiral.png|thumb|692px|The pattern of florets produced by Vogel's model (central image). The other two images show the patterns for slightly different values of the angle.]] +{{clear}} +== See also == + +* [[Patterns in nature]] +* [[Spiral of Theodorus]] + +== References == +{{Reflist}} + +* {{cite book | author=J. Dennis Lawrence | title=A catalog of special plane curves | publisher=Dover Publications | year=1972 | isbn=0-486-60288-5 | pages=31,186 }} + +==External links== +* {{springer|title=Fermat spiral|id=p/f038420}} +* [http://jsxgraph.uni-bayreuth.de/wiki/index.php/Fermat's_spiral Online exploration using JSXGraph (JavaScript)] + +[[Category:Spirals]] + nvcj4z9pvjgbtbl3cd4k6zqs8q9q037 + + + + Consistency criterion + 0 + 9318 + + 9319 + 2013-07-08T09:46:08Z + + John of Reading + 0 + + + Typo/[[WP:AWB/GF|general]] fixing, replaced: it have → it has using [[Project:AWB|AWB]] + wikitext + text/x-wiki + A [[voting system]] is '''consistent''' if, when the electorate is divided arbitrarily into two (or more) parts and separate elections in each part result in the same choice being selected, an election of the entire electorate also selects that alternative. Smith{{ref|Smith}} calls this property '''separability''' and Woodall{{ref|Woodall}} calls it '''convexity'''. + +It has been proven a [[Ranked voting systems|ranked voting system]] is consistent if and only if it is a [[positional voting system]].{{ref|Young}}{{Request quotation|date=January 2012}} [[Borda count]] is an example of this. + +The failure of the consistency criterion can be seen as an example of [[Simpson's paradox]]. + +{{TOC limit|limit=3}} + +== Examples == + +=== Copeland === +{{Main|Copeland's method}} + +This example shows that Copeland's method violates the Consistency criterion. Assume five candidates A, B, C, D and E with 27 voters with the following preferences: +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 3 || A > D > B > E > C +|- +| 2 || A > D > E > C > B +|- +| 3 || B > A > C > D > E +|- +| 3 || C > D > B > E > A +|- +| 3 || E > C > B > A > D +|- +| style="border-top: 3pt black solid"|3 || style="border-top: 3pt black solid"|A > D > C > E > B +|- +| 1 || A > D > E > B > C +|- +| 3 || B > D > C > E > A +|- +| 3 || C > A > B > D > E +|- +| 3 || E > B > C > A > D +|} + +Now, the set of all voters is divided into two groups at the bold line. The voters over the line are the first group of voters; the others are the second group of voters. + +==== First group of voters ==== +In the following the Copeland winner for the first group of voters is determined. +{| class="wikitable" +|- +! # of voters !! 
Preferences +|- +| 3 || A > D > B > E > C +|- +| 2 || A > D > E > C > B +|- +| 3 || B > A > C > D > E +|- +| 3 || C > D > B > E > A +|- +| 3 || E > C > B > A > D +|} + +The results would be tabulated as follows: +{| class=wikitable border=1 +|+ Pairwise Preferences +|- +| colspan=2 rowspan=2 | +| colspan=5 bgcolor="#c0c0ff" align=center | X +|- +| bgcolor="#c0c0ff" | A +| bgcolor="#c0c0ff" | B +| bgcolor="#c0c0ff" | C +| bgcolor="#c0c0ff" | D +| bgcolor="#c0c0ff" | E +|- +| bgcolor="#ffc0c0" rowspan=5 | Y +| bgcolor="#ffc0c0" | A +| +| bgcolor="#e0e0ff" | [X] 9 <br>[Y] 5 +| bgcolor="#ffe0e0" | [X] 6 <br/>[Y] 8 +| bgcolor="#ffe0e0" | [X] 3 <br>[Y] 11 +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 8 +|- +| bgcolor="#ffc0c0" | B +| bgcolor="#ffe0e0" | [X] 5 <br>[Y] 9 +| +| bgcolor="#e0e0ff" | [X] 8 <br>[Y] 6 +| bgcolor="#e0e0ff" | [X] 8 <br>[Y] 6 +| bgcolor="#ffe0e0" | [X] 5 <br>[Y] 9 +|- +| bgcolor="#ffc0c0" | C +| bgcolor="#e0e0ff" | [X] 8 <br>[Y] 6 +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 8 +| +| bgcolor="#ffe0e0" | [X] 5 <br>[Y] 9 +| bgcolor="#e0e0ff" | [X] 8 <br>[Y] 6 +|- +| bgcolor="#ffc0c0" | D +| bgcolor="#e0e0ff" | [X] 11 <br>[Y] 3 +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 8 +| bgcolor="#e0e0ff" | [X] 9 <br>[Y] 5 +| +| bgcolor="#ffe0e0" | [X] 3 <br>[Y] 11 +|- +| bgcolor="#ffc0c0" | E +| bgcolor="#e0e0ff" | [X] 8 <br>[Y] 6 +| bgcolor="#e0e0ff" | [X] 9 <br>[Y] 5 +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 8 +| bgcolor="#e0e0ff" | [X] 11 <br>[Y] 3 +| +|- +| colspan=2 bgcolor="#c0c0ff" | Pairwise election results (won-tied-lost): +| 3-0-1 +| 2-0-2 +| 2-0-2 +| 2-0-2 +| 1-0-3 +|} + +* [X] indicates voters who preferred the candidate listed in the column caption to the candidate listed in the row caption +* [Y] indicates voters who preferred the candidate listed in the row caption to the candidate listed in the column caption + +'''Result''': With the votes of the first group of voters, A can defeat three of the four opponents, whereas no other candidate wins against more than two opponents. Thus, '''A''' is elected Copeland winner by the first group of voters. + +==== Second group of voters ==== +Now, the Copeland winner for the second group of voters is determined. +{| class="wikitable" +|- +! # of voters !! 
Preferences +|- +| 3 || A > D > C > E > B +|- +| 1 || A > D > E > B > C +|- +| 3 || B > D > C > E > A +|- +| 3 || C > A > B > D > E +|- +| 3 || E > B > C > A > D +|} + +The results would be tabulated as follows: +{| class=wikitable border=1 +|+ Pairwise election results +|- +| colspan=2 rowspan=2 | +| colspan=5 bgcolor="#c0c0ff" align=center | X +|- +| bgcolor="#c0c0ff" | A +| bgcolor="#c0c0ff" | B +| bgcolor="#c0c0ff" | C +| bgcolor="#c0c0ff" | D +| bgcolor="#c0c0ff" | E +|- +| bgcolor="#ffc0c0" rowspan=5 | Y +| bgcolor="#ffc0c0" | A +| +| bgcolor="#ffe0e0" | [X] 6 <br/>[Y] 7 +| bgcolor="#e0e0ff" | [X] 9 <br>[Y] 4 +| bgcolor="#ffe0e0" | [X] 3 <br>[Y] 10 +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 7 +|- +| bgcolor="#ffc0c0" | B +| bgcolor="#e0e0ff" | [X] 7 <br>[Y] 6 +| +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 7 +| bgcolor="#ffe0e0" | [X] 4 <br>[Y] 9 +| bgcolor="#e0e0ff" | [X] 7 <br>[Y] 6 +|- +| bgcolor="#ffc0c0" | C +| bgcolor="#ffe0e0" | [X] 4 <br>[Y] 9 +| bgcolor="#e0e0ff" | [X] 7 <br>[Y] 6 +| +| bgcolor="#e0e0ff" | [X] 7 <br>[Y] 6 +| bgcolor="#ffe0e0" | [X] 4 <br>[Y] 9 +|- +| bgcolor="#ffc0c0" | D +| bgcolor="#e0e0ff" | [X] 10 <br>[Y] 3 +| bgcolor="#e0e0ff" | [X] 9 <br>[Y] 4 +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 7 +| +| bgcolor="#ffe0e0" | [X] 3 <br>[Y] 10 +|- +| bgcolor="#ffc0c0" | E +| bgcolor="#e0e0ff" | [X] 7 <br>[Y] 6 +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 7 +| bgcolor="#e0e0ff" | [X] 9 <br>[Y] 4 +| bgcolor="#e0e0ff" | [X] 10 <br>[Y] 3 +| +|- +| colspan=2 bgcolor="#c0c0ff" | Pairwise election results (won-tied-lost): +| 3-0-1 +| 2-0-2 +| 2-0-2 +| 2-0-2 +| 1-0-3 +|} + +'''Result''': Taking only the votes of the second group in account, again, A can defeat three of the four opponents, whereas no other candidate wins against more than two opponents. Thus, '''A''' is elected Copeland winner by the second group of voters. + +==== All voters ==== +Finally, the Copeland winner of the complete set of voters is determined. +{| class="wikitable" +|- +! # of voters !! 
Preferences +|- +| 3 || A > D > B > E > C +|- +| 3 || A > D > C > E > B +|- +| 1 || A > D > E > B > C +|- +| 2 || A > D > E > C > B +|- +| 3 || B > A > C > D > E +|- +| 3 || B > D > C > E > A +|- +| 3 || C > A > B > D > E +|- +| 3 || C > D > B > E > A +|- +| 3 || E > B > C > A > D +|- +| 3 || E > C > B > A > D +|} + +The results would be tabulated as follows: +{| class=wikitable border=1 +|+ Pairwise election results +|- +| colspan=2 rowspan=2 | +| colspan=5 bgcolor="#c0c0ff" align=center | X +|- +| bgcolor="#c0c0ff" | A +| bgcolor="#c0c0ff" | B +| bgcolor="#c0c0ff" | C +| bgcolor="#c0c0ff" | D +| bgcolor="#c0c0ff" | E +|- +| bgcolor="#ffc0c0" rowspan=5 | Y +| bgcolor="#ffc0c0" | A +| +| bgcolor="#e0e0ff" | [X] 15 <br>[Y] 12 +| bgcolor="#e0e0ff" | [X] 15 <br>[Y] 12 +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 21 +| bgcolor="#ffe0e0" | [X] 12 <br>[Y] 15 +|- +| bgcolor="#ffc0c0" | B +| bgcolor="#ffe0e0" | [X] 12 <br>[Y] 15 +| +| bgcolor="#e0e0ff" | [X] 14 <br>[Y] 13 +| bgcolor="#ffe0e0" | [X] 12 <br>[Y] 15 +| bgcolor="#ffe0e0" | [X] 12 <br>[Y] 15 +|- +| bgcolor="#ffc0c0" | C +| bgcolor="#ffe0e0" | [X] 12 <br>[Y] 15 +| bgcolor="#ffe0e0" | [X] 13 <br>[Y] 14 +| +| bgcolor="#ffe0e0" | [X] 12 <br>[Y] 15 +| bgcolor="#ffe0e0" | [X] 12 <br>[Y] 15 +|- +| bgcolor="#ffc0c0" | D +| bgcolor="#e0e0ff" | [X] 21 <br>[Y] 6 +| bgcolor="#e0e0ff" | [X] 15 <br>[Y] 12 +| bgcolor="#e0e0ff" | [X] 15 <br>[Y] 12 +| +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 21 +|- +| bgcolor="#ffc0c0" | E +| bgcolor="#e0e0ff" | [X] 15 <br>[Y] 12 +| bgcolor="#e0e0ff" | [X] 15 <br>[Y] 12 +| bgcolor="#e0e0ff" | [X] 15 <br>[Y] 12 +| bgcolor="#e0e0ff" | [X] 21 <br>[Y] 6 +| +|- +| colspan=2 bgcolor="#c0c0ff" | Pairwise election results (won-tied-lost): +| 2-0-2 +| 3-0-1 +| 4-0-0 +| 1-0-3 +| 0-0-4 +|} + +'''Result''': C is the Condorcet winner, thus Copeland chooses '''C''' as winner. + +==== Conclusion ==== +A is the Copeland winner within the first group of voters and also within the second group of voters. However, both groups combined elect C as the Copeland winner. Thus, Copeland fails the Consistency criterion. + +=== Instant-runoff voting === +{{Main|Instant-runoff voting}} + +This example shows that Instant-runoff voting violates the Consistency criterion. Assume three candidates A, B and C and 23 voters with the following preferences: +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 4 || A > B > C +|- +| 2 || B > A > C +|- +| 4 || C > B > A +|- +| style="border-top: 3pt black solid"|4 || style="border-top: 3pt black solid"|A > B > C +|- +| 6 || B > A > C +|- +| 3 || C > A > B +|} + +Now, the set of all voters is divided into two groups at the bold line. The voters over the line are the first group of voters; the others are the second group of voters. + +==== First group of voters ==== +In the following the instant-runoff winner for the first group of voters is determined. +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 4 || A > B > C +|- +| 2 || B > A > C +|- +| 4 || C > B > A +|} + +B has only 2 votes and is eliminated first. Its votes are transferred to A. Now, A has 6 votes and wins against C with 4 votes. +{| class="wikitable" +|- +!Votes in round/<br />Candidate !! 1st !! 2nd +|- +| A || bgcolor=#ddffbb|4 || bgcolor=#bbffbb|'''6''' +|- +| B || bgcolor=#ffbbbb|''2'' +|- +| C || bgcolor=#ddffbb|4 || bgcolor=#ffbbbb|''4'' +|} + +'''Result''': '''A''' wins against C, after B has been eliminated. + +==== Second group of voters ==== +Now, the instant-runoff winner for the second group of voters is determined. 
+{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 4 || A > B > C +|- +| 6 || B > A > C +|- +| 3 || C > A > B +|} + +C has the least votes count of 3 and is eliminated. A benefits from that, gathering all the votes from C. Now, with 7 votes A wins against B with 6 votes. +{| class="wikitable" +|- +!Votes in round/<br />Candidate !! 1st !! 2nd +|- +| A || bgcolor=#ddffbb|4 || bgcolor=#bbffbb|'''7''' +|- +| B || bgcolor=#ddffbb|6 || bgcolor=#ffbbbb|''6'' +|- +| C || bgcolor=#ffbbbb|''3'' +|} + +'''Result''': '''A''' wins against B, after C has been eliminated. + +==== All voters ==== +Finally, the instant runoff winner of the complete set of voters is determined. + +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 8 || A > B > C +|- +| 8 || B > A > C +|- +| 3 || C > A > B +|- +| 4 || C > B > A +|} + +C has the least first preferences and so is eliminated first, its votes are split: 4 are transferred to B and 3 to A. Thus, B wins with 12 votes against 11 votes of A. +{| class="wikitable" +|- +!Votes in round/<br />Candidate !! 1st !! 2nd +|- +| A || bgcolor=#ddffbb|8 || bgcolor=#ffbbbb|''11'' +|- +| B || bgcolor=#ddffbb|8 || bgcolor=#bbffbb|'''12''' +|- +| C || bgcolor=#ffbbbb|''7'' +|} + +'''Result''': '''B''' wins against A, after C is eliminated. + +==== Conclusion ==== +A is the instant-runoff winner within the first group of voters and also within the second group of voters. However, both groups combined elect B as the instant-runoff winner. Thus, instant-runoff voting fails the Consistency criterion. + +=== Kemeny–Young method === +{{Main|Kemeny–Young method}} + +This example shows that the Kemeny–Young method violates the Consistency criterion. Assume three candidates A, B and C and 38 voters with the following preferences: +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 7 || A > B > C +|- +| 6 || B > C > A +|- +| 3 || C > A > B +|- +| style="border-top: 3pt black solid"|8 || style="border-top: 3pt black solid"|A > C > B +|- +| 7 || B > A > C +|- +| 7 || C > B > A +|} + +Now, the set of all voters is divided into two groups at the bold line. The voters over the line are the first group of voters; the others are the second group of voters. + +==== First group of voters ==== +In the following the Kemeny-Young winner for the first group of voters is determined. +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 7 || A > B > C +|- +| 6 || B > C > A +|- +| 3 || C > A > B +|} + +The Kemeny–Young method arranges the pairwise comparison counts in the following tally table: + +{| class="wikitable" +|- +! colspan=2 rowspan=2|All possible pairs<br/>of choice names !! colspan=3|Number of votes with indicated preference +|- +! Prefer X over Y !! Equal preference !! Prefer Y over X +|- +| X = A || Y = B || 10 || 0 || 6 +|- +| X = A || Y = C || 7 || 0 || 9 +|- +| X = B || Y = C || 13 || 0 || 3 +|} + +The ranking scores of all possible rankings are: +{| class="wikitable" +|- +! Preferences !! 1. vs 2. !! 1. vs 3. !! 2. vs 3. !! Total +|- +| A > B > C || 10 || 7 || 13 || bgcolor=#bbffbb|'''30''' +|- +| A > C > B || 7 || 10 || 3 || bgcolor=#ffbbbb|''20'' +|- +| B > A > C || 6 || 13 || 7 || bgcolor=#ffbbbb|''26'' +|- +| B > C > A || 13 || 6 || 9 || bgcolor=#ffbbbb|''28'' +|- +| C > A > B || 9 || 3 || 10 || bgcolor=#ffbbbb|''22'' +|- +| C > B > A || 3 || 9 || 6 || bgcolor=#ffbbbb|''18'' +|} + +'''Result''': The ranking A > B > C has the highest ranking score. Thus, '''A''' wins ahead of B and C. 
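+
+The ranking scores in the table above can be reproduced with a short script (an illustration added here, not part of the original example); the ballot encoding is an assumption of the sketch:
+
+<syntaxhighlight lang="python">
+from itertools import permutations
+
+# ballots of the first group: (count, ranking from most to least preferred)
+ballots = [(7, "ABC"), (6, "BCA"), (3, "CAB")]
+
+def kemeny_score(ranking, ballots):
+    # sum, over all ballots, of the pairwise comparisons that agree with `ranking`
+    score = 0
+    for count, ballot in ballots:
+        for i in range(len(ranking)):
+            for j in range(i + 1, len(ranking)):
+                x, y = ranking[i], ranking[j]           # the ranking puts x above y
+                if ballot.index(x) < ballot.index(y):   # the ballot agrees
+                    score += count
+    return score
+
+for r in permutations("ABC"):
+    print("".join(r), kemeny_score(r, ballots))
+# prints 30, 20, 26, 28, 22, 18 for A>B>C, A>C>B, B>A>C, B>C>A, C>A>B, C>B>A,
+# matching the table; the ranking A > B > C has the highest score
+</syntaxhighlight>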
+ +==== Second group of voters ==== +Now, the Kemeny-Young winner for the second group of voters is determined. +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 8 || A > C > B +|- +| 7 || B > A > C +|- +| 7 || C > B > A +|} + +The Kemeny–Young method arranges the pairwise comparison counts in the following tally table: + +{| class="wikitable" +|- +! colspan=2 rowspan=2|All possible pairs<br/>of choice names !! colspan=3|Number of votes with indicated preference +|- +! Prefer X over Y !! Equal preference !! Prefer Y over X +|- +| X = A || Y = B || 8 || 0 || 14 +|- +| X = A || Y = C || 15 || 0 || 7 +|- +| X = B || Y = C || 7 || 0 || 15 +|} + +The ranking scores of all possible rankings are: +{| class="wikitable" +|- +! Preferences !! 1. vs 2. !! 1. vs 3. !! 2. vs 3. !! Total +|- +| A > B > C || 8 || 15 || 7 || bgcolor=#ffbbbb|''30'' +|- +| A > C > B || 15 || 8 || 15 || bgcolor=#bbffbb|'''38''' +|- +| B > A > C || 14 || 7 || 15 || bgcolor=#ffbbbb|''36'' +|- +| B > C > A || 7 || 14 || 7 || bgcolor=#ffbbbb|''28'' +|- +| C > A > B || 7 || 15 || 8 || bgcolor=#ffbbbb|''30'' +|- +| C > B > A || 15 || 7 || 14 || bgcolor=#ffbbbb|''36'' +|} + +'''Result''': The ranking A > C > B has the highest ranking score. Hence, '''A''' wins ahead of C and B. + +==== All voters ==== +Finally, the Kemeny-Young winner of the complete set of voters is determined. + +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 7 || A > B > C +|- +| 8 || A > C > B +|- +| 7 || B > A > C +|- +| 6 || B > C > A +|- +| 3 || C > A > B +|- +| 7 || C > B > A +|} + +The Kemeny–Young method arranges the pairwise comparison counts in the following tally table: + +{| class="wikitable" +|- +! colspan=2 rowspan=2|All possible pairs<br/>of choice names !! colspan=3|Number of votes with indicated preference +|- +! Prefer X over Y !! Equal preference !! Prefer Y over X +|- +| X = A || Y = B || 18 || 0 || 20 +|- +| X = A || Y = C || 22 || 0 || 16 +|- +| X = B || Y = C || 20 || 0 || 18 +|} + +The ranking scores of all possible rankings are: +{| class="wikitable" +|- +! Preferences !! 1. vs 2. !! 1. vs 3. !! 2. vs 3. !! Total +|- +| A > B > C || 18 || 22 || 20 || bgcolor=#ffbbbb|''60'' +|- +| A > C > B || 22 || 18 || 18 || bgcolor=#ffbbbb|''58'' +|- +| B > A > C || 20 || 20 || 22 || bgcolor=#bbffbb|'''62''' +|- +| B > C > A || 20 || 20 || 16 || bgcolor=#ffbbbb|''56'' +|- +| C > A > B || 16 || 18 || 18 || bgcolor=#ffbbbb|''52'' +|- +| C > B > A || 18 || 16 || 20 || bgcolor=#ffbbbb|''54'' +|} + +'''Result''': The ranking B > A > C has the highest ranking score. So, '''B''' wins ahead of A and C. + +==== Conclusion ==== +A is the Kemeny-Young winner within the first group of voters and also within the second group of voters. However, both groups combined elect B as the Kemeny-Young winner. Thus, the Kemeny–Young method fails the Consistency criterion. + +==== Ranking consistency ==== +The Kemeny-Young method satisfies ranking consistency, that is if the electorate is divided arbitrarily into two parts and separate elections in each part result in the same ranking being selected, an election of the entire electorate also selects that ranking. + +===== Informal proof ===== +The Kemeny-Young score of a ranking <math>\mathcal{R}</math> is computed by summing up the number of pairwise comparisons on each ballot that match the ranking <math>\mathcal{R}</math>. 
Thus, the Kemeny-Young score <math>s_V(\mathcal{R})</math> for an electorate <math>V</math> can be computed by separating the electorate into disjoint subsets <math>V = V_1 \cup V_2</math> (with <math>V_1 \cap V_2 = \emptyset</math>), computing the Kemeny-Young scores for these subsets and adding them up: +::<math>(I) \quad s_V(\mathcal{R}) = s_{V_1}(\mathcal{R}) + s_{V_2}(\mathcal{R})</math>. + +Now, consider an election with electorate <math>V</math>. The premise of the consistency criterion is to divide the electorate arbitrarily into two parts <math>V = V_1 \cup V_2</math>, and in each part the same ranking <math>\mathcal{R}</math> is selected. This means that the Kemeny-Young score for the ranking <math>\mathcal{R}</math> in each part is bigger than for every other ranking <math>\mathcal{R}'</math>: +::<math>(II) \quad \forall \mathcal{R}' : s_{V_1}(\mathcal{R}) > s_{V_1}(\mathcal{R}') </math> and +::<math>(III) \quad \forall \mathcal{R}' : s_{V_2}(\mathcal{R}) > s_{V_2}(\mathcal{R}') </math>. + +Now, it has to be shown that the Kemeny-Young score of the ranking <math>\mathcal{R}</math> in the entire electorate is bigger than the Kemeny-Young score of every other ranking <math>\mathcal{R}'</math>: +::<math>s_V(\mathcal{R}) \stackrel{(I)}{=} s_{V_1}(\mathcal{R}) + s_{V_2}(\mathcal{R}) \stackrel{(II)}{>} s_{V_1}(\mathcal{R}') + s_{V_2}(\mathcal{R}) \stackrel{(III)}{>} s_{V_1}(\mathcal{R}') + s_{V_2}(\mathcal{R}') \stackrel{(I)}{=} s_V(\mathcal{R}') \quad q.e.d.</math> +Thus, the Kemeny-Young method is consistent with respect to rankings. + +=== Majority Judgment === +{{Main|Majority Judgment}} + +This example shows that Majority Judgment violates the Consistency criterion. Assume two candidates A and B and 10 voters with the following ratings: +{| class="wikitable" +|- +! Candidates/<br /># of voters !! A !! B +|- +| 3 || bgcolor="green"|Excellent || bgcolor="yellow"|Fair +|- +| 2 || bgcolor="orangered"|Poor || bgcolor="yellow"| Fair +|- +| style="border-top: 3pt black solid"|3 || style="border-top: 3pt black solid" bgcolor="yellow"|Fair || style="border-top: 3pt black solid" bgcolor="orangered"|Poor +|- +| 2 || bgcolor="orangered"|Poor || bgcolor="yellow"|Fair +|} + +Now, the set of all voters is divided into two groups at the bold line. The voters over the line are the first group of voters; the others are the second group of voters. + +==== First group of voters ==== +In the following the Majority Judgment winner for the first group of voters is determined. +{| class="wikitable" +|- +! Candidates/<br /># of voters !! A !!
B +|- +| 3 || bgcolor="green"|Excellent || bgcolor="yellow"|Fair +|- +| 2 || bgcolor="orangered"|Poor || bgcolor="yellow"|Fair +|} + +The sorted ratings would be as follows: +{| +|- +| align=right | Candidate&nbsp;&nbsp;&nbsp; +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| width=49% | &nbsp; +| width=2% textalign=center | ↓ +| width=49% | Median point +|} +|- +| align=right | A +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| bgcolor=green width=60% | &nbsp; +| bgcolor=orangered width=40% | +|} +|- +| align=right | B +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| bgcolor=yellow width=100% | &nbsp; +|} +|- +| &nbsp; +| &nbsp; +|- +| &nbsp; +| +{| cellpadding=1 border=0 cellspacing=1 +|- +| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; +| bgcolor=green | &nbsp; +| &nbsp;Excellent&nbsp;&nbsp; +| bgcolor=YellowGreen | &nbsp; +| &nbsp;Good&nbsp;&nbsp; +| bgcolor=Yellow | &nbsp; +| &nbsp;Fair&nbsp;&nbsp; +| bgcolor=Orangered | &nbsp; +| &nbsp;Poor&nbsp;&nbsp; +|} +|} + +'''Result''': With the votes of the first group of voters, A has the median rating of "Excellent" and B has the median rating of "Fair". Thus, '''A''' is elected Majority Judgment winner by the first group of voters. + +==== Second group of voters ==== +Now, the Majority Judgment winner for the second group of voters is determined. +{| class="wikitable" +|- +! Candidates/<br /># of voters !! A !! B +|- +| 3 || bgcolor="yellow"|Fair || bgcolor="orangered"|Poor +|- +| 2 || bgcolor="orangered"|Poor || bgcolor="yellow"|Fair +|} + +The sorted ratings would be as follows: +{| +|- +| align=right | Candidate&nbsp;&nbsp;&nbsp; +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| width=49% | &nbsp; +| width=2% textalign=center | ↓ +| width=49% | Median point +|} +|- +| align=right | A +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| bgcolor=yellow width=60% | &nbsp; +| bgcolor=orangered width=40% | +|} +|- +| align=right | B +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| bgcolor=yellow width=40% | &nbsp; +| bgcolor=orangered width=60% | +|} +|- +| &nbsp; +| &nbsp; +|- +| &nbsp; +| +{| cellpadding=1 border=0 cellspacing=1 +|- +| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; +| bgcolor=green | &nbsp; +| &nbsp;Excellent&nbsp;&nbsp; +| bgcolor=YellowGreen | &nbsp; +| &nbsp;Good&nbsp;&nbsp; +| bgcolor=Yellow | &nbsp; +| &nbsp;Fair&nbsp;&nbsp; +| bgcolor=Orangered | &nbsp; +| &nbsp;Poor&nbsp;&nbsp; +|} +|} + +'''Result''': Taking only the votes of the second group in account, A has the median rating of "Fair" and B the median rating of "Poor". Thus, '''A''' is elected Majority Judgment winner by the second group of voters. + +==== All voters ==== +Finally, the Majority Judgment winner of the complete set of voters is determined. +{| class="wikitable" +|- +! Candidates/<br /># of voters !! A !! 
B +|- +| 3 || bgcolor="green"|Excellent || bgcolor="yellow"|Fair +|- +| 3 || bgcolor="yellow"|Fair || bgcolor="orangered"|Poor +|- +| 4 || bgcolor="orangered"|Poor || bgcolor="yellow"|Fair +|} + +The sorted ratings would be as follows: +{| +|- +| align=right | Candidate&nbsp;&nbsp;&nbsp; +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| width=49% | &nbsp; +| width=2% textalign=center | ↓ +| width=49% | Median point +|} +|- +| align=right | A +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| bgcolor=green width=30% | &nbsp; +| bgcolor=yellow width=30% | &nbsp; +| bgcolor=orangered width=40% | +|} +|- +| align=right | B +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| bgcolor=yellow width=70% | &nbsp; +| bgcolor=orangered width=30% | +|} +|- +| &nbsp; +| &nbsp; +|- +| &nbsp; +| +{| cellpadding=1 border=0 cellspacing=1 +|- +| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; +| bgcolor=green | &nbsp; +| &nbsp;Excellent&nbsp;&nbsp; +| bgcolor=YellowGreen | &nbsp; +| &nbsp;Good&nbsp;&nbsp; +| bgcolor=Yellow | &nbsp; +| &nbsp;Fair&nbsp;&nbsp; +| bgcolor=Orangered | &nbsp; +| &nbsp;Poor&nbsp;&nbsp; +|} +|} + +The median ratings for A and B are both "Fair". Since there is a tie, "Fair" ratings are removed from both, until their medians become different. After removing 20% "Fair" ratings from the votes of each, the sorted ratings are now: +{| +|- +| align=right | Candidate&nbsp;&nbsp;&nbsp; +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| width=49% | &nbsp; +| width=2% textalign=center | ↓ +| width=49% | Median point +|} +|- +| align=right | A +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| bgcolor=gray width=10% | &nbsp; +| bgcolor=green width=30% | &nbsp; +| bgcolor=yellow width=10% | &nbsp; +| bgcolor=orangered width=40% | + +| bgcolor=gray width=10% | +|} +|- +| align=right | B +| +{| cellpadding=0 width=500 border=0 cellspacing=0 +|- +| bgcolor=gray width=10% | +| bgcolor=yellow width=50% | &nbsp; +| bgcolor=orangered width=30% | + +| bgcolor=gray width=10% | +|} +|} + +'''Result''': Now, the median rating of A is "Poor" and the median rating of B is "Fair". Thus, '''B''' is elected Majority Judgment winner. + +==== Conclusion ==== +A is the Majority Judgment winner within the first group of voters and also within the second group of voters. However, both groups combined elect B as the Majority Judgment winner. Thus, Majority Judgment fails the Consistency criterion. + +=== Minimax === +{{Main|Minimax Condorcet}} + +This example shows that the Minimax method violates the Consistency criterion. Assume four candidates A, B, C and D with 43 voters with the following preferences: +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 1 || A > B > C > D +|- +| 6 || A > D > B > C +|- +| 5 || B > C > D > A +|- +| 6 || C > D > B > A +|- +| style="border-top: 3pt black solid"|8 || style="border-top: 3pt black solid"|A > B > D > C +|- +| 2 || A > D > C > B +|- +| 9 || C > B > D > A +|- +| 6 || D > C > B > A +|} + +Since all preferences are strict rankings (no equals are present), all three Minimax methods (winning votes, margins and pairwise opposite) elect the same winners. + +Now, the set of all voters is divided into two groups at the bold line. The voters over the line are the first group of voters; the others are the second group of voters. + +==== First group of voters ==== +In the following the Minimax winner for the first group of voters is determined. +{| class="wikitable" +|- +! # of voters !! 
Preferences +|- +| 1 || A > B > C > D +|- +| 6 || A > D > B > C +|- +| 5 || B > C > D > A +|- +| 6 || C > D > B > A +|} + +The results would be tabulated as follows: +{| class=wikitable border=1 +|+ Pairwise election results +|- +| colspan=2 rowspan=2 | +| colspan=4 bgcolor="#c0c0ff" align=center | X +|- +| bgcolor="#c0c0ff" | A +| bgcolor="#c0c0ff" | B +| bgcolor="#c0c0ff" | C +| bgcolor="#c0c0ff" | D +|- +| bgcolor="#ffc0c0" rowspan=4 | Y +| bgcolor="#ffc0c0" | A +| +| bgcolor="#e0e0ff" | [X] 11 <br/>[Y] 7 +| bgcolor="#e0e0ff" | [X] 11 <br/>[Y] 7 +| bgcolor="#e0e0ff" | [X] 11 <br/>[Y] 7 +|- +| bgcolor="#ffc0c0" | B +| bgcolor="#ffe0e0" | [X] 7 <br>[Y] 11 +| +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 12 +| bgcolor="#e0e0ff" | [X] 12 <br>[Y] 6 +|- +| bgcolor="#ffc0c0" | C +| bgcolor="#ffe0e0" | [X] 7 <br>[Y] 11 +| bgcolor="#e0e0ff" | [X] 12 <br>[Y] 6 +| +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 12 +|- +| bgcolor="#ffc0c0" | D +| bgcolor="#ffe0e0" | [X] 7 <br>[Y] 11 +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 12 +| bgcolor="#e0e0ff" | [X] 12 <br>[Y] 6 +| +|- +| colspan=2 bgcolor="#c0c0ff" | Pairwise election results (won-tied-lost): +| 0-0-3 +| 2-0-1 +| 2-0-1 +| 2-0-1 +|- +| colspan=2 bgcolor="#c0c0ff" | worst pairwise defeat (winning votes): +| bgcolor=#bbffbb|'''11''' +| bgcolor=#ffbbbb|''12'' +| bgcolor=#ffbbbb|''12'' +| bgcolor=#ffbbbb|''12'' +|- +| colspan=2 bgcolor="#c0c0ff" | worst pairwise defeat (margins): +| bgcolor=#bbffbb|'''4''' +| bgcolor=#ffbbbb|''6'' +| bgcolor=#ffbbbb|''6'' +| bgcolor=#ffbbbb|''6'' +|- +| colspan=2 bgcolor="#c0c0ff" | worst pairwise opposition: +| bgcolor=#bbffbb|'''11''' +| bgcolor=#ffbbbb|''12'' +| bgcolor=#ffbbbb|''12'' +| bgcolor=#ffbbbb|''12'' +|} + +* [X] indicates voters who preferred the candidate listed in the column caption to the candidate listed in the row caption +* [Y] indicates voters who preferred the candidate listed in the row caption to the candidate listed in the column caption + +'''Result''': The candidates B, C and D form a cycle with clear defeats. A benefits from that since it loses relatively closely against all three and therefore A's biggest defeat is the closest of all candidates. Thus, '''A''' is elected Minimax winner by the first group of voters. + +==== Second group of voters ==== +Now, the Minimax winner for the second group of voters is determined. +{| class="wikitable" +|- +! # of voters !! 
Preferences +|- +| 8 || A > B > D > C +|- +| 2 || A > D > C > B +|- +| 9 || C > B > D > A +|- +| 6 || D > C > B > A +|} + +The results would be tabulated as follows: +{| class=wikitable border=1 +|+ Pairwise election results +|- +| colspan=2 rowspan=2 | +| colspan=4 bgcolor="#c0c0ff" align=center | X +|- +| bgcolor="#c0c0ff" | A +| bgcolor="#c0c0ff" | B +| bgcolor="#c0c0ff" | C +| bgcolor="#c0c0ff" | D +|- +| bgcolor="#ffc0c0" rowspan=4 | Y +| bgcolor="#ffc0c0" | A +| +| bgcolor="#e0e0ff" | [X] 15 <br/>[Y] 10 +| bgcolor="#e0e0ff" | [X] 15 <br/>[Y] 10 +| bgcolor="#e0e0ff" | [X] 15 <br/>[Y] 10 +|- +| bgcolor="#ffc0c0" | B +| bgcolor="#ffe0e0" | [X] 10 <br>[Y] 15 +| +| bgcolor="#e0e0ff" | [X] 17 <br>[Y] 8 +| bgcolor="#ffe0e0" | [X] 8 <br>[Y] 17 +|- +| bgcolor="#ffc0c0" | C +| bgcolor="#ffe0e0" | [X] 10 <br>[Y] 15 +| bgcolor="#ffe0e0" | [X] 8 <br>[Y] 17 +| +| bgcolor="#e0e0ff" | [X] 16 <br>[Y] 9 +|- +| bgcolor="#ffc0c0" | D +| bgcolor="#ffe0e0" | [X] 10 <br>[Y] 15 +| bgcolor="#e0e0ff" | [X] 17 <br>[Y] 8 +| bgcolor="#ffe0e0" | [X] 9 <br>[Y] 16 +| +|- +| colspan=2 bgcolor="#c0c0ff" | Pairwise election results (won-tied-lost): +| 0-0-3 +| 2-0-1 +| 2-0-1 +| 2-0-1 +|- +| colspan=2 bgcolor="#c0c0ff" | worst pairwise defeat (winning votes): +| bgcolor=#bbffbb|'''15''' +| bgcolor=#ffbbbb|''17'' +| bgcolor=#ffbbbb|''16'' +| bgcolor=#ffbbbb|''17'' +|- +| colspan=2 bgcolor="#c0c0ff" | worst pairwise defeat (margins): +| bgcolor=#bbffbb|'''5''' +| bgcolor=#ffbbbb|''9'' +| bgcolor=#ffbbbb|''7'' +| bgcolor=#ffbbbb|''9'' +|- +| colspan=2 bgcolor="#c0c0ff" | worst pairwise opposition: +| bgcolor=#bbffbb|'''15''' +| bgcolor=#ffbbbb|''17'' +| bgcolor=#ffbbbb|''16'' +| bgcolor=#ffbbbb|''17'' +|} + +'''Result''': Taking only the votes of the second group in account, again, B, C and D form a cycle with clear defeats and A benefits from that because of its relatively close losses against all three and therefore A's biggest defeat is the closest of all candidates. Thus, '''A''' is elected Minimax winner by the second group of voters. + +==== All voters ==== +Finally, the Minimax winner of the complete set of voters is determined. +{| class="wikitable" +|- +! # of voters !! 
Preferences +|- +| 1 || A > B > C > D +|- +| 8 || A > B > D > C +|- +| 6 || A > D > B > C +|- +| 2 || A > D > C > B +|- +| 5 || B > C > D > A +|- +| 9 || C > B > D > A +|- +| 6 || C > D > B > A +|- +| 6 || D > C > B > A +|} + +The results would be tabulated as follows: +{| class=wikitable border=1 +|+ Pairwise election results +|- +| colspan=2 rowspan=2 | +| colspan=4 bgcolor="#c0c0ff" align=center | X +|- +| bgcolor="#c0c0ff" | A +| bgcolor="#c0c0ff" | B +| bgcolor="#c0c0ff" | C +| bgcolor="#c0c0ff" | D +|- +| bgcolor="#ffc0c0" rowspan=4 | Y +| bgcolor="#ffc0c0" | A +| +| bgcolor="#e0e0ff" | [X] 26 <br/>[Y] 17 +| bgcolor="#e0e0ff" | [X] 26 <br/>[Y] 17 +| bgcolor="#e0e0ff" | [X] 26 <br/>[Y] 17 +|- +| bgcolor="#ffc0c0" | B +| bgcolor="#ffe0e0" | [X] 17 <br>[Y] 26 +| +| bgcolor="#e0e0ff" | [X] 23 <br>[Y] 20 +| bgcolor="#ffe0e0" | [X] 20 <br>[Y] 23 +|- +| bgcolor="#ffc0c0" | C +| bgcolor="#ffe0e0" | [X] 17 <br>[Y] 26 +| bgcolor="#ffe0e0" | [X] 20 <br>[Y] 23 +| +| bgcolor="#e0e0ff" | [X] 22 <br>[Y] 21 +|- +| bgcolor="#ffc0c0" | D +| bgcolor="#ffe0e0" | [X] 17 <br>[Y] 26 +| bgcolor="#e0e0ff" | [X] 23 <br>[Y] 20 +| bgcolor="#ffe0e0" | [X] 21 <br>[Y] 22 +| +|- +| colspan=2 bgcolor="#c0c0ff" | Pairwise election results (won-tied-lost): +| 0-0-3 +| 2-0-1 +| 2-0-1 +| 2-0-1 +|- +| colspan=2 bgcolor="#c0c0ff" | worst pairwise defeat (winning votes): +| bgcolor=#ffbbbb|''26'' +| bgcolor=#ffbbbb|''23'' +| bgcolor=#bbffbb|'''22''' +| bgcolor=#ffbbbb|''23'' +|- +| colspan=2 bgcolor="#c0c0ff" | worst pairwise defeat (margins): +| bgcolor=#ffbbbb|''9'' +| bgcolor=#ffbbbb|''3'' +| bgcolor=#bbffbb|'''1''' +| bgcolor=#ffbbbb|''3'' +|- +| colspan=2 bgcolor="#c0c0ff" | worst pairwise opposition: +| bgcolor=#ffbbbb|''26'' +| bgcolor=#ffbbbb|''23'' +| bgcolor=#bbffbb|'''22''' +| bgcolor=#ffbbbb|''23'' +|} + +'''Result''': Again, B, C and D form a cycle. But now, their mutual defeats are very close. Therefore, the defeats A suffers from all three are relatively clear. With a small advantage over B and D, '''C''' is elected Minimax winner. + +==== Conclusion ==== +A is the Minimax winner within the first group of voters and also within the second group of voters. However, both groups combined elect C as the Minimax winner. Thus, Minimax fails the Consistency criterion. + +=== Ranked pairs === +{{Main|Ranked pairs}} + +This example shows that the Ranked pairs method violates the Consistency criterion. Assume three candidates A, B and C with 39 voters with the following preferences: +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 7 || A > B > C +|- +| 6 || B > C > A +|- +| 3 || C > A > B +|- +| style="border-top: 3pt black solid"|9 || style="border-top: 3pt black solid"|A > C > B +|- +| 8 || B > A > C +|- +| 6 || C > B > A +|} + +Now, the set of all voters is divided into two groups at the bold line. The voters over the line are the first group of voters; the others are the second group of voters. + +==== First group of voters ==== +In the following the Ranked pairs winner for the first group of voters is determined. +{| class="wikitable" +|- +! # of voters !! 
Preferences +|- +| 7 || A > B > C +|- +| 6 || B > C > A +|- +| 3 || C > A > B +|} + +The results would be tabulated as follows: +{| class=wikitable border=1 +|+ Pairwise election results +|- +| colspan=2 rowspan=2 | +| colspan=3 bgcolor="#c0c0ff" align=center | X +|- +| bgcolor="#c0c0ff" | A +| bgcolor="#c0c0ff" | B +| bgcolor="#c0c0ff" | C +|- +| bgcolor="#ffc0c0" rowspan=3 | Y +| bgcolor="#ffc0c0" | A +| +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 10 +| bgcolor="#e0e0ff" | [X] 9 <br/>[Y] 7 +|- +| bgcolor="#ffc0c0" | B +| bgcolor="#e0e0ff" | [X] 10 <br>[Y] 6 +| +| bgcolor="#ffe0e0" | [X] 3 <br>[Y] 13 +|- +| bgcolor="#ffc0c0" | C +| bgcolor="#ffe0e0" | [X] 7 <br>[Y] 9 +| bgcolor="#e0e0ff" | [X] 13 <br>[Y] 3 +| +|- +| colspan=2 bgcolor="#c0c0ff" | Pairwise election results (won-tied-lost): +| 1-0-1 +| 1-0-1 +| 1-0-1 +|} + +* [X] indicates voters who preferred the candidate listed in the column caption to the candidate listed in the row caption +* [Y] indicates voters who preferred the candidate listed in the row caption to the candidate listed in the column caption + +The sorted list of victories would be: +{| class="wikitable" +! Pair !! Winner +|- +| B (13) vs. C (3)|| B 13 +|- +| A (10) vs. B (6)|| A 10 +|- +| A (7) vs. C (9)|| C 9 +|} + +'''Result''': B > C and A > B are locked in first (and C > A can't be locked in after that), so the full ranking is A > B > C. Thus, '''A''' is elected Ranked pairs winner by the first group of voters. + +==== Second group of voters ==== +Now, the Ranked pairs winner for the second group of voters is determined. +{| class="wikitable" +! # of voters !! Preferences +|- +| 9 || A > C > B +|- +| 8 || B > A > C +|- +| 6 || C > B > A +|} + +The results would be tabulated as follows: +{| class=wikitable border=1 +|+ Pairwise election results +|- +| colspan=2 rowspan=2 | +| colspan=3 bgcolor="#c0c0ff" align=center | X +|- +| bgcolor="#c0c0ff" | A +| bgcolor="#c0c0ff" | B +| bgcolor="#c0c0ff" | C +|- +| bgcolor="#ffc0c0" rowspan=3 | Y +| bgcolor="#ffc0c0" | A +| +| bgcolor="#e0e0ff" | [X] 14 <br>[Y] 9 +| bgcolor="#ffe0e0" | [X] 6 <br>[Y] 17 +|- +| bgcolor="#ffc0c0" | B +| bgcolor="#ffe0e0" | [X] 9 <br>[Y] 14 +| +| bgcolor="#e0e0ff" | [X] 15 <br>[Y] 8 +|- +| bgcolor="#ffc0c0" | C +| bgcolor="#e0e0ff" | [X] 17 <br>[Y] 6 +| bgcolor="#ffe0e0" | [X] 8 <br>[Y] 15 +| +|- +| colspan=2 bgcolor="#c0c0ff" | Pairwise election results (won-tied-lost): +| 1-0-1 +| 1-0-1 +| 1-0-1 +|} + +The sorted list of victories would be: +{| class="wikitable" +! Pair !! Winner +|- +| A (17) vs. C (6)|| A 17 +|- +| B (8) vs. C (15)|| C 15 +|- +| A (9) vs. B (14)|| B 14 +|} + +'''Result''': Taking only the votes of the second group in account, A > C and C > B are locked in first (and B > A can't be locked in after that), so the full ranking is A > C > B. Thus, '''A''' is elected Ranked pairs winner by the second group of voters. + +==== All voters ==== +Finally, the Ranked pairs winner of the complete set of voters is determined. +{| class="wikitable" +|- +! # of voters !! 
Preferences +|- +| 7 || A > B > C +|- +| 9 || A > C > B +|- +| 8 || B > A > C +|- +| 6 || B > C > A +|- +| 3 || C > A > B +|- +| 6 || C > B > A +|} + +The results would be tabulated as follows: +{| class=wikitable border=1 +|+ Pairwise election results +|- +| colspan=2 rowspan=2 | +| colspan=3 bgcolor="#c0c0ff" align=center | X +|- +| bgcolor="#c0c0ff" | A +| bgcolor="#c0c0ff" | B +| bgcolor="#c0c0ff" | C +|- +| bgcolor="#ffc0c0" rowspan=3 | Y +| bgcolor="#ffc0c0" | A +| +| bgcolor="#e0e0ff" | [X] 20 <br>[Y] 19 +| bgcolor="#ffe0e0" | [X] 15 <br>[Y] 24 +|- +| bgcolor="#ffc0c0" | B +| bgcolor="#ffe0e0" | [X] 19 <br>[Y] 20 +| +| bgcolor="#ffe0e0" | [X] 18 <br>[Y] 21 +|- +| bgcolor="#ffc0c0" | C +| bgcolor="#e0e0ff" | [X] 24 <br>[Y] 15 +| bgcolor="#e0e0ff" | [X] 21 <br>[Y] 18 +| +|- +| colspan=2 bgcolor="#c0c0ff" | Pairwise election results (won-tied-lost): +| 1-0-1 +| 2-0-0 +| 0-0-2 +|} + +The sorted list of victories would be: +{| class="wikitable" +! Pair !! Winner +|- +| A (24) vs. C (15)|| A 24 +|- +| B (21) vs. C (18)|| B 21 +|- +| A (19) vs. B (20)|| B 20 +|} + +'''Result''': Now, all three pairs (A > C, B > C and B > A) can be locked in without a cycle. The full ranking is B > A > C. Thus, Ranked pairs chooses '''B''' as winner. In fact, B is also Condorcet winner. + +==== Conclusion ==== +A is the Ranked pairs winner within the first group of voters and also within the second group of voters. However, both groups combined elect B as the Ranked pairs winner. Thus, the Ranked pairs method fails the Consistency criterion. + +=== Schulze method === +{{Main|Schulze method}} + +This example shows that the Schulze method violates the Consistency criterion. Again, assume three candidates A, B and C with 39 voters with the following preferences: +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 7 || A > B > C +|- +| 6 || B > C > A +|- +| 3 || C > A > B +|- +| style="border-top: 3pt black solid"|9 || style="border-top: 3pt black solid"|A > C > B +|- +| 8 || B > A > C +|- +| 6 || C > B > A +|} + +Now, the set of all voters is divided into two groups at the bold line. The voters over the line are the first group of voters; the others are the second group of voters. + +==== First group of voters ==== +In the following the Schulze winner for the first group of voters is determined. +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 7 || A > B > C +|- +| 6 || B > C > A +|- +| 3 || C > A > B +|} + +The pairwise preferences would be tabulated as follows: +{| class="wikitable" style="text-align:center" +|+ Matrix of pairwise preferences +|- +! !! d[*,A] !! d[*,B] !! d[*,C] +|- +! d[A,*] +| || bgcolor=#ddffdd|10 || bgcolor=#ffdddd|7 +|- +! d[B,*] +| bgcolor=#ffdddd|6 || || bgcolor=#ddffdd|13 +|- +! d[C,*] +| bgcolor=#ddffdd|9 || bgcolor=#ffdddd|3 +|} + +Now, the strongest paths have to be identified, e.g. the path A > B > C is stronger than the direct path A > C (which is nullified, since it is a loss for A). +{| class="wikitable" style="text-align:center" +|+ Strengths of the strongest paths +|- +! !! d[*,A] !! d[*,B] !! d[*,C] +|- +! d[A,*] +| || bgcolor=#ddffdd|10 || bgcolor=#ddffdd|10 +|- +! d[B,*] +| bgcolor=#ffdddd|9 || || bgcolor=#ddffdd|13 +|- +! d[C,*] +| bgcolor=#ffdddd|9 || bgcolor=#ffdddd|9 +|} + +'''Result''': A > B, A > C and B > C prevail, so the full ranking is A > B > C. Thus, '''A''' is elected Schulze winner by the first group of voters. + +==== Second group of voters ==== +Now, the Schulze winner for the second group of voters is determined.
+{| class="wikitable" +! # of voters !! Preferences +|- +| 9 || A > C > B +|- +| 8 || B > A > C +|- +| 6 || C > B > A +|} + +The pairwise preferences would be tabulated as follows: +{| class="wikitable" style="text-align:center" +|+ Matrix of pairwise preferences +|- +! !! d[*,A] !! d[*,B] !! d[*,C] +|- +! d[A,*] +| || bgcolor=#ffdddd|9 || bgcolor=#ddffdd|17 +|- +! d[B,*] +| bgcolor=#ddffdd|14 || || bgcolor=#ffdddd|8 +|- +! d[C,*] +| bgcolor=#ffdddd|6 || bgcolor=#ddffdd|15 +|} + +Now, the strongest paths have to be identified, e.g. the path A > C > B is stronger than the direct path A > B. +{| class="wikitable" style="text-align:center" +|+ Strengths of the strongest paths +|- +! !! d[*,A] !! d[*,B] !! d[*,C] +|- +! d[A,*] +| || bgcolor=#ddffdd|15 || bgcolor=#ddffdd|17 +|- +! d[B,*] +| bgcolor=#ffdddd|14 || || bgcolor=#ffdddd|14 +|- +! d[C,*] +| bgcolor=#ffdddd|14 || bgcolor=#ddffdd|15 +|} + +'''Result''': A > B, A > C and C > B prevail, so the full ranking is A > C > B. Thus, '''A''' is elected Schulze winner by the second group of voters. + +==== All voters ==== +Finally, the Schulze winner of the complete set of voters is determined. +{| class="wikitable" +|- +! # of voters !! Preferences +|- +| 7 || A > B > C +|- +| 9 || A > C > B +|- +| 8 || B > A > C +|- +| 6 || B > C > A +|- +| 3 || C > A > B +|- +| 6 || C > B > A +|} + +The pairwise preferences would be tabulated as follows: +{| class="wikitable" style="text-align:center" +|+ Matrix of pairwise preferences +|- +! !! d[*,A] !! d[*,B] !! d[*,C] +|- +! d[A,*] +| || bgcolor=#ffdddd|19 || bgcolor=#ddffdd|24 +|- +! d[B,*] +| bgcolor=#ddffdd|20 || || bgcolor=#ddffdd|21 +|- +! d[C,*] +| bgcolor=#ffdddd|15 || bgcolor=#ffdddd|18 +|} + +Now, the strongest paths have to be identified: +{| class="wikitable" style="text-align:center" +|+ Strengths of the strongest paths +|- +! !! d[*,A] !! d[*,B] !! d[*,C] +|- +! d[A,*] +| || bgcolor=#ffdddd|0 || bgcolor=#ddffdd|24 +|- +! d[B,*] +| bgcolor=#ddffdd|20 || || bgcolor=#ddffdd|21 +|- +! d[C,*] +| bgcolor=#ffdddd|0 || bgcolor=#ffdddd|0 +|} + +'''Result''': A > C, B > A and B > C prevail, so the full ranking is B > A > C. Thus, Schulze chooses '''B''' as winner. In fact, B is also Condorcet winner. + +==== Conclusion ==== +A is the Schulze winner within the first group of voters and also within the second group of voters. However, both groups combined elect B as the Schulze winner. Thus, the Schulze method fails the Consistency criterion. + +==References== +#{{note|Smith}}[[John H Smith (mathematician)|John H Smith]], "Aggregation of preferences with variable electorate", ''Econometrica'', Vol. 41 (1973), pp.&nbsp;1027&ndash;1041. +#{{note|Woodall}}[[Douglas R. Woodall|D. R. Woodall]], "[http://www.votingmatters.org.uk/ISSUE3/P5.HTM Properties of preferential election rules]", ''Voting matters'', Issue 3 (December 1994), pp.&nbsp;8&ndash;15. +#{{note|Young}}[[Peyton Young|H. P. Young]], "Social Choice Scoring Functions", ''SIAM Journal on Applied Mathematics'' Vol. 28, No. 4 (1975), pp.&nbsp;824&ndash;838. + +[[Category:Voting system criteria]] + pponuekg9z47c7kb2ejvghiuws4z4gb + + + + Basis (universal algebra) + 0 + 21981 + + 21982 + 2014-01-09T22:56:14Z + + Mark viking + 0 + + Added wl + wikitext + text/x-wiki + In [[universal algebra]] a '''basis''' is a structure inside of some (universal) algebras, which are called [[free algebra]]s. It generates all algebra elements from its own elements by the algebra operations in an independent manner. 
It also represents the [[endomorphisms]] of an algebra by certain indexings of algebra elements, which can correspond to the usual [[Matrix (mathematics)|matrices]] when the free algebra is a [[vector space]]. + +== Definitions == + +The '''basis''' (or '''reference frame''') ''' of a [[Universal algebra|(universal) algebra]]''' is a [[Function (set theory)|function]] ''b'' that takes some algebra elements as values <math>b(i)</math> and satisfies either one of the following two equivalent conditions. Here, the set of all <math>b(i)</math> is called '''basis set''', whereas several authors call it the "basis".<ref>Gould.</ref><ref>Grätzer 1968, p.198.</ref> The set <math>I</math> of its arguments ''i'' is called '''dimension set'''. Any function, with all its arguments in the whole <math>I</math>, that takes algebra elements as values (even outside the basis set) will be denoted by ''m''. Then, ''b'' will be an ''m''. + +=== Outer condition === +This condition will define bases by the set ''L'' of the <math>I</math>-'''ary elementary functions of the algebra''', which are certain functions <math>\ell</math> that take every ''m'' as argument to get some algebra element as value <math>\ell(m)</math>. In fact, they consist of all the '''projections''' <math>p_i</math> with ''i'' in <math>I</math>, which are the functions such that <math>p_i(m)=m(i)</math> for each ''m'', and of all functions that rise from them by repeated "multiple compositions" with operations of the algebra. + +(When an algebra operation has a single algebra element as argument, the value of such a composed function is the one that the operation takes from the value of a single previously computed <math>I</math>-ary function as in [[Function composition|composition]]. When it does not, such compositions require that many (or none for a nullary operation) <math>I</math>-ary functions are evaluated before the algebra operation: one for each possible algebra element in that argument. In case <math>I</math> and the numbers of elements in the arguments, or “arity”, of the operations are finite, this is the [[clone (algebra)|finitary multiple composition]] .) + +Then, according to the ''outer condition'' a basis has to ''generate'' the algebra (namely when <math>\ell</math> ranges over the whole ''L'', <math>\ell(b)</math> gets every algebra element) and must be ''independent'' (namely whenever any two <math>I</math>-ary elementary functions coincide at ''b'', they will do everywhere: <math>\ell'(b)=\ell''(b)</math> implies <math>\ell'=\ell''</math>).<ref>For instance, see (Grätzer 1968, p.198).</ref> This is the same as to require that there exists a ''single'' function <math>\chi</math> that takes every algebra element as argument to get an <math>I</math>-ary elementary function as value and satisfies <math>\chi({\ell(b)})=\ell</math> for all <math>\ell</math> in ''L''. + +=== Inner condition === +This other condition will define bases by the set ''E'' of the '''endomorphisms''' of the algebra, which are the [[Universal algebra|homomorphisms]] from the algebra into itself, through its '''analytic representation''' <math>\varrho</math> by a basis. The latter is a function that takes every endomorphism ''e'' as argument to get a function ''m'' as value: <math>\varrho(e)=m</math>, where this ''m'' is the "sample" of the values of ''e'' at ''b'', namely <math>m(i)=[\varrho(e)]_i=e(b(i))</math> for all ''i'' in the dimension set. 
+ +Then, according to the ''inner condition'' ''b'' is a basis, when <math>\varrho</math> is a '''bijection''' from ''E'' onto the set of all ''m'', namely for each ''m'' there is one and only one endomorphism ''e'' such that <math>m=\varrho(e)</math>. This is the same as to require that there exists an '''extension function''', namely a function <math>\eta</math> that takes every (sample) ''m'' as argument to extend it onto an endomorphism <math>\eta(m)</math> such that <math>\varrho(\eta(m))=m</math>.<ref>For instance, see '''0.4''' and '''0.5''' of (Ricci 2007)</ref> + +The link between these two conditions is given by the identity <math>[\chi(a)]_m=[\eta(m)]_a</math>, which holds for all ''m'' and all algebra elements ''a''.<ref>For instance, see '''0.4''' (E) of (Ricci 2007)</ref> Several other conditions that characterize bases for universal algebras are omitted. + +As the next example will show, present bases are a generalization of the [[Basis (linear algebra)|bases]] of vector spaces. Then, the name "reference frame" can well replace "basis". Yet, contrary to the vector space case, a universal algebra might lack bases and, when it has them, their dimension sets might have different finite positive cardinalities.<ref>Grätzer 1979.</ref> + +== Examples == + +=== Vector space algebras === +In the universal algebra corresponding to a vector space with positive dimension the bases essentially are the [[ordered basis|ordered bases]] of this vector space. Yet, this will come after several details. + +When the vector space is finite-dimensional, for instance <math>I=\{0,1,\ldots n-1\}</math> with <math>n > 0</math>, the functions <math>\ell</math> in the set ''L'' of the ''outer condition'' exactly are the ones that provide the [[Basis (linear algebra)|spanning and linear independence properties]] with linear combinations <math>\ell(b)=c_0 b_0+c_1b_1+\ldots c_{n-1}b_{n-1}</math> and present generator property becomes the spanning one. On the contrary, linear independence is a mere instance of present independence, which becomes equivalent to it in such vector spaces. (Also, several other generalizations of linear independence for universal algebras do not imply present independence.) + +The functions ''m'' for the ''inner condition'' correspond to the square arrays of field numbers (namely, usual vector-space square matrices) that serve to build the endomorphisms of vector spaces (namely, [[linear maps]] into themselves). Then, the ''inner condition'' requires a bijection property from endomorphisms also to arrays. In fact, each column of such an array represents a vector <math>m(i)</math> as its ''n''-tuple of [[coordinate]]s with respect to the basis ''b''. For instance, when the vectors are ''n''-tuples of numbers from the underlying field and ''b'' is the [[standard basis|Kronecker basis]], ''m'' is such an array ''seen by columns'', <math>\varrho</math> is the sample of such a linear map at the reference vectors and <math>\eta</math> extends this sample to this map as below. + +<math>{}\qquad +\left(\begin{array}{rrc} +0 & -1 & 2 \\ +-2 & 3 & 1 \\ +1 & 0 & 2 +\end{array}\right) +\quad +\begin{array}{c} +\stackrel{\eta}{\longmapsto}\\ +\stackrel{\varrho}{\longleftarrow\!\!{}^{{}_{\!{}_\mathsf{l}}}} +\end{array} +\quad +\left\{ +\begin{array}{rcrccr} +x'_0 & = & & -x_1 &+& 2x_2 \\ +x'_1 & = &-2x_0&+3x_1&+& x_2\\ +x'_2 & = & x_0 & & +&2x_2 +\end{array}\right.</math> + +When the vector space is not finite-dimensional, further distinctions are needed. 
In fact, though the functions <math>\ell</math> formally have an infinity of vectors in every argument, the linear combinations they evaluate never require infinitely many addenda <math>c_i m(i)</math> and each <math>\ell</math> determines a finite subset ''J'' of <math>I</math> that contains all required ''i''. Then, every value <math>\ell(m)</math> equals an <math>\ell'(m')</math>, where <math>m'</math> is the restriction of ''m'' to ''J'' and <math>\ell'</math> is the ''J''-ary elementary function corresponding to <math>\ell</math>. When the <math>\ell'</math> replace the <math>\ell</math>, both the linear independence and spanning properties for infinite basis sets follow from present ''outer condition'' and conversely. + +Therefore, as far as vector spaces of a positive dimension are concerned, the only difference between present bases for universal algebras and the [[ordered basis|ordered bases]] of vector spaces is that here no order on <math>I</math> is required. Still it is allowed, in case it serves some purpose. + +When the space is zero-dimensional, its ordered basis is empty. Then, being the [[empty function]], it is a present basis. Yet, since this space only contains the null vector and its only endomorphism is the identity, any function ''b'' from any set <math>I</math> (even a nonempty one) to this singleton space works as a present basis. This is not so strange from the point of view of Universal Algebra, where singleton algebras, which are called "trivial", enjoy a lot of other seemingly strange properties. + +=== Word monoid === +Let <math>I=\{ \mathsf{a, b, c,} \ldots\}</math> be an "alphabet", namely a (usually finite) set of objects called "letters". Let ''W'' denote the corresponding set of '''words''' or "strings", which will be denoted as in [[String (computer science)|strings]], namely either by writing their letters in sequence or by <math>\epsilon</math> in case of the empty word ([[formal language|Formal Language]] notation).<ref name=warning>Formal Language notation is used in Computer Science and sometimes collides with the set-theoretical definitions of words. See G. Ricci, ''An observation on a Formal Language notation,'' SIGACT News, '''17''' (1972), 18&ndash;23.</ref> Accordingly, the juxtaposition ''<math>vw</math>'' will denote the [[concatenation]] of two words ''v'' and ''w'', namely the word that begins with ''v'' and is followed by ''w''. + +Concatenation is a binary operation on ''W'' that together with the empty word <math>\epsilon</math> defines a [[free monoid]], the monoid of the words on <math>I</math>, which is one of the simplest universal algebras. Then, the ''inner condition'' will immediately prove that one of its bases is the function ''b'' that makes a single-letter word <math>{i}</math> of each letter <math>\mathsf{i}</math>, <math>b(\mathsf{i})=i</math>.
+ +(Depending on the set-theoretical implementation of sequences, ''b'' may not be an identity function, namely <math>i</math> may not be <math>\mathsf{i}</math>, rather an object like <math>\{ (\emptyset,\mathsf{i})\}</math>, namely a singleton function, or a pair like <math>(\emptyset,\mathsf{i})</math> or <math>(\mathsf{i},\emptyset)</math>.<ref name=warning/>) + +In fact, in the theory of D0L systems (Rozemberg & Salomaa 1980) such <math>m=\varrho(e)</math> are the tables of [[L-system|"productions"]], which such systems use to define the simultaneous substitutions of every <math>i</math> by a single word <math>w=m(\mathsf{i})</math> in any word ''u'' in ''W'': if <math>u={i}_0{i}_1\cdots {i}_k</math>, then <math>e(u)=m(\mathsf{i}_0)m(\mathsf{i}_1)\cdots m(\mathsf{i}_k)</math>. Then, ''b'' satisfies the ''inner condition'', since the function <math>\varrho</math> is the well-known bijection that identifies every word endomorphism with any such table. (The repeated applications of such an endomorphism starting from a given "seed" word are able to model many growth processes, where words and concatenation serve to build fairly heterogeneous structures as in [[L-system]], not just "sequences".) + +== Notes == +<references/> + +== References == + +# Gould, V. ''Independence algebras,'' Algebra Universalis '''33''' (1995), 294&ndash;318. +# Grätzer, G. (1968). ''Universal Algebra'', D. Van Nostrand Company Inc.. +# Grätzer, G. (1979). ''Universal Algebra'' 2-nd 2ed., Springer Verlag. ISBN 0-387-90355-0. +# Ricci, G. (2007). ''Dilatations kill fields'', Int. J. Math. Game Theory Algebra, '''16''' 5/6, pp.&nbsp;13&ndash;34. +# Rozenberg G. and Salomaa A. (1980). ''The mathematical theory of L systems'', Academic Press, New York. ISBN 0-12-597140-0 + +[[Category:Universal algebra]] + 81chlikxbhc2dkd8baepsetvbf3lucn + + + + Adequality + 0 + 26320 + + 26321 + 2014-01-26T10:05:22Z + + Tkuvho + 0 + + eta gives an "e" apparently + wikitext + text/x-wiki + '''Adequality''' is a term introduced by [[history of mathematics|historians of mathematics]] to discuss the mathematical meaning of the Latin word ''adaequalitas'', as it was used by [[Pierre de Fermat]] in his work on finding maxima, minima and tangents. [[André Weil]] wrote: "[Fermat] introduces the technical term adaequalitas, adaequare, etc., which he says he has borrowed from [[Diophantus]]. As Diophantus V.11 shows, it means an approximate equality, and this is indeed how Fermat explains the word in one of his later writings." (Weil 1973).<ref>see also [[André Weil]]: [[Number Theory, An approach through history from Hammurapi to Legendre]]. Birkhauser Boston, Inc., Boston, MA, 1984, ISBN 0-8176-4565-9 page 28.</ref> Diophantus coined the term παρισὀτης (parisotes) to refer to an approximate equality.<ref>{{citation + | last1 = Katz | first1 = Mikhail G. + | author1-link = Mikhail Katz + | last2 = Schaps | first2 = David + | last3 = Shnider | first3 = Steve + | author3-link = Steve Shnider + | arxiv = 1210.7750 + | doi = + | issue = 3 + | journal = [[Perspectives on Science]] + | pages = + | title = Almost Equal: The Method of Adequality from Diophantus to Fermat and Beyond + | volume = 21 + | year = 2013}}</ref> The term was rendered as ''adaequalitas'' in [[Claude Gaspard Bachet de Méziriac]]'s Latin translation of Diophantus, and ''adéquation'' and ''adégaler'' in [[Paul Tannery]]'s French translation of Fermat’s Latin treatises on maxima and minima and related problems. 
+ +== Fermat's method == + +Fermat used ''adequality'' first to find maxima of functions, and then adapted it to find tangent lines to curves. + +To find the maximum of a term <math>p(x)</math>, Fermat did equate (or more precisely adequate) <math>p(x)</math> and <math>p(x+e)</math> and after doing algebra he could divide by e, and then discard any remaining terms involving e. To illustrate the method by Fermat's own example, consider the problem of finding the maximum of <math>p(x)=bx-x^2</math>. Fermat ''adequated'' <math>bx-x^2</math> with <math>b(x+e)-(x+e)^2=bx-x^2+be-2ex-e^2</math>. That is (using the notation <math>\backsim</math> to denote adequality, introduced by [[Paul Tannery]]): +:<math>bx-x^2\backsim bx-x^2+be-2ex-e^2.</math> +Canceling terms and dividing by <math>e</math> Fermat arrived at +:<math>b\backsim 2x+e.</math> +Removing the terms that contained <math>e</math> Fermat arrived at the desired result that the maximum occurred when <math>x=b/2</math>. + +Fermat also used his principle to give a mathematical derivation of [[Snell's law]]s of refraction directly from the principle that light takes the quickest path.{{sfn|Grabiner|1983}} + +==Descartes' criticism== +Fermat's method was highly criticized by his contemporaries, particularly [[Descartes]]. V. Katz suggests this is because Descartes had independently discovered the same new mathematics, known as his [[method of normals]], and Descartes was quite proud of his discovery. He also notes that while Fermat's methods were closer to the future developments in calculus, Descartes methods had a more immediate impact on the development.{{sfn|Katz|2008}} + +==Scholarly controversy== +Both Newton and Leibniz referred to Fermat's work as an antecedent of [[infinitesimal calculus]]. Nevertheless, there is disagreement amongst modern scholars about the exact meaning of Fermat's adequality. Fermat's ''adequality'' was analyzed in a number of scholarly studies. In 1896, [[Paul Tannery]] published a French translation of Fermat’s Latin treatises on maxima and minima (Fermat, Œuvres, Vol. III, pp.&nbsp;121–156). Tannery translated Fermat's term as “adégaler” and adopted Fermat’s “adéquation”. Tannery also introduced the symbol <math>\scriptstyle\backsim</math> for adequality in mathematical formulas. + +'''Heinrich Wieleitner''' (1929)<ref>Wieleitner, H.:Bemerkungen zu Fermats Methode der Aufsuchung von Extremwerten und der Berechnung von Kurventangenten. Jahresbericht der Deutschen Mathematiker-Vereinigung '''38''' (1929)24-35, p.25</ref> wrote: "Fermat replaces A with A+E. Then he sets the new expression '''roughly equal''' ( '''angenähert gleich''') to the old one, cancels equal terms on both sides, and divides by the highest possible power of E. He then cancels all terms which contain E and sets those that remain equal to each other. From that [the required] A results. That E should be as small as possible is nowhere said and is at best expressed by the word "adaequalitas". (Wieleitner uses the symbol <math>\scriptstyle\sim</math>.) + +'''Max Miller''' (1934)<ref>Miller, M.: Pierre de Fermats Abhandlungen über Maxima und Minima. Akademische Verlagsgesellschaft, Leipzig (1934), p.1</ref> wrote: "Thereupon one should put the both terms, which express the maximum and the minimum, '''approximately equal''' ('''näherungsweise gleich'''), as Diophantus says." (Miller uses the symbol <math>\scriptstyle \approx</math>.) + +'''Jean Itard''' (1948)<ref>Itard, I: Fermat précurseur du calcul différentiel. Arch Int. Hist. Sci. 
'''27''' (1948), 589-610, p.597</ref> wrote: "One knows that the expression "adégaler" is adopted by Fermat from Diophantus, translated by Xylander and by Bachet. It is about an '''approximate equality''' ('''égalité approximative''') ". (Itard uses the symbol <math>\scriptstyle \backsim</math>.) + +'''Joseph Ehrenfried Hofmann''' (1963)<ref>Hofmann, J.E.: Über ein Extremwertproblem des Apollonius und seine Behandlung bei Fermat. Nova Acta Leopoldina (2) '''27''' (167) (1963), 105-113, p.107</ref> wrote: "Fermat chooses a quantity h, thought as sufficiently small, and puts f(x+h) '''roughly equal''' ('''ungefähr gleich''') to f(x). His technical term is ''adaequare''." (Hofmann uses the symbol <math>\scriptstyle \approx</math>.) + +'''Peer Strømholm''' (1968)<ref>Strømholm, P.: Fermat's method of maxima and minima and of tangents. A reconstruction. Arch. Hist Exact Sci. '''5''' (1968), 47-69, p.51</ref> wrote: "The basis of Fermat's approach was the comparition of two expressions which, though they had the same form, were '''not exactly equal'''. This part of the process he called "''comparare par adaequalitatem''" or "''comparer per adaequalitatem''", and it implied that the otherwise strict identity between the two sides of the "equation" was destroyed by the modification of the variable by a ''small'' amount: + +<math>\scriptstyle f(A){\sim}f(A+E)</math>. + +This, I believe, was the real significance of his use of Diophantos' πἀρισον, stressing the ''smallness'' of the variation. The ordinary translation of 'adaequalitas' seems to be "'''approximate equality'''", but I much prefer "'''pseudo-equality'''" to present Fermat's thought at this point." He further notes that "there was never in M1 (Method 1) any question of the variation E being put equal to zero. The words Fermat used to express the process of suppressing terms containing E was 'elido', 'deleo', and 'expungo', and in French 'i'efface' and 'i'ôte'. We can hardly believe that a sane man wishing to express his meaning and searching for words, would constantly hit upon such tortuous ways of imparting the simple fact that the terms vanished because E was zero." (p.&nbsp;51) + +'''Claus Jensen''' (1969)<ref>Jensen, C.: ''Pierre Fermat's method of determining tangents and its application to the conchoid and the quadratrix.'' Centaurus '''14''' (1969), 72-85, p.73</ref> wrote: "Moreover, in applying the notion of ''adégalité'' - which constitutes the basis of Fermat's general method of constructing tangents, and by which is meant a comparition of two magnitudes '''as if they were equal, although they are in fact not''' ("tamquam essent aequalia, licet revera aequalia non sint") - I will employ the nowadays more usual symbol <math>\scriptstyle \approx</math>." The Latin quotation comes from Tannery's 1891 edition of Fermat, volume 1, page 140. + +'''[[Michael Sean Mahoney]]''' (1971)<ref>Mahoney, M.S.: ''Fermat, Pierre de.'' Dictionary of Scientific Biography, vol. IV, Charles Scribner's Sons, New York (1971), p.569.</ref> wrote: "Fermat's Method of maxima and minima, which is clearly applicable to any polynomial P(x), originally rested on purely ''finitistic'' algebraic foundations. It assumed, '''counterfactually''', the inequality of two equal roots in order to determine, by Viete's theory of equations, a relation between those roots and one of the coefficients of the polynomial, a relation that was fully general. 
This relation then led to an extreme-value solution when Fermat removed his '''counterfactual assumption''' and set the roots equal. Borrowing a term from Diophantus, Fermat called this '''counterfactual equality''' 'adequality'." (Mahoney uses the symbol <math>\scriptstyle\approx</math>.) On p.&nbsp;164, end of footnote 46, Mahoney notes that one of the meanings of adequality is ''approximate equality'' or ''equality in the limiting case''. + +'''Charles Henry Edwards, Jr. ''' (1979)<ref>Edwards, C.H., Jr.:''The historical Development of the Calculus.'' Springer, New York 1979, p.122f</ref> wrote: "For example, in order to determine how to subdivide a segment of length <math>\scriptstyle b</math> into two segments <math>\scriptstyle x</math> and <math>\scriptstyle b-x</math> whose product <math>\scriptstyle x(b-x)=bx-x^2</math> is maximal, that is to find the rectangle with perimeter <math>\scriptstyle 2b</math> that has the maximal area, he [Fermat] proceeds as follows. First he substituted <math>\scriptstyle x+e</math> (he used ''A, E'' instead of ''x, e'') for the unknown ''x'', and then wrote down the following '''"pseudo-equality"''' to compare the resulting expression with the original one: + +::::::::::<math> \scriptstyle b(x+e)-(x+e)^2=bx+be-x^2-2xe-e^2\; \sim\; bx-x^2. </math> + +After canceling terms, he divided through by ''e'' to obtain <math>\scriptstyle 2\,x+b\;\sim\;b.</math> Finally he discarded the remaining term containing ''e'', transforming the '''pseudo-equality''' into the true equality <math>\scriptstyle x=\frac{b}{2}</math> that gives the value of ''x'' which makes <math>\scriptstyle bx-x^2</math> maximal. Unfortunately, Fermat never explained the logical basis for this method with sufficient clarity or completeness to prevent disagreements between historical scholars as to precisely what he meant or intended." + +'''[[Kirsti Andersen]]''' (1980)<ref>Andersen, K.: ''Techniques of the calculus 1630-1660.'' In: Grattan-Guinness, I. (ed): ''From the Calculus to Set Theory. An Introductory History.'' Duckworth, London 1980, 10-48, p.23</ref> wrote: "The two expressions of the maximum or minimum are made ''"adequal"'', which means something like '''as nearly equal as possible'''." (Anderson uses the symbol <math>\scriptstyle\approx</math>.) + +'''Herbert Breger''' (1994)<ref>Breger, H.: ''The mysteries of adaequare: A vindication of Fermat.'' Arch. Hist. Exact Sci. '''46''' (1994), 193-219</ref> wrote: “I want to put forward my hypothesis: ''Fermat used the word "adaequare" in the sense of'' '''"to put equal"''' ... In a mathematical context, the only difference between "aequare" and "adaequare" seems to be that the latter gives more stress on the fact that the equality is achieved." (Page 197f.) + +'''[[John Stillwell]]''' (Stillwell 2006 p. 91) wrote: "Fermat introduced the idea of adequality in 1630s but he was ahead of his time. His successors were unwilling to give up the convenience of ordinary equations, preferring to use equality loosely rather than to use adequality accurately. The idea of adequality was revived only in the twentieth century, in the so-called [[non-standard analysis]]." + +'''[[Enrico Giusti]]''' (2009)<ref>Giusti, Enrico, Les méthodes des maxima et minima de Fermat. Ann. Fac. Sci. Toulouse Math. 
(6) 18 (2009), Fascicule Spécial, 59–85.</ref> cites Fermat's letter to [[Marin Mersenne]] where Fermat wrote: "Cette comparaison par adégalité produit deux termes inégaux qui enfin produisent l'égalité (selon ma méthode) qui nous donne la solution de la question." Giusti notes in a footnote that this letter seems to have escaped Breger's notice. + +'''Klaus Barner''' (2011)<ref>Barner, K.: ''Fermat’s <<adaequare>> - and no end in sight? (Fermats <<adaequare>> - und kein Ende? '') Math. Semesterber. (2011) '''58''', p.13-45</ref> asserts that Fermat uses two different Latin words (aequabitur and adaequabitur) to replace the nowadays usual equals sign, ''aequabitur'' when the equation concerns a valid identity between two constants, a universally valid (proved) formula, or a conditional equation, ''adaequabitur'', however, when the equation describes a relation between two variables, which are ''not independent'' (and the equation is no valid formula). On page 36, Barner writes: "Why did Fermat continually repeat his inconsistent procedure for all his examples for the method of tangents? Why did he never mention the secant, with which he in fact operated? I do not know." + +'''Katz, Schaps, Shnider''' (2013)<ref>{{citation + | last1 = Katz | first1 = Mikhail G. | author1-link = Mikhail Katz | last2 = Schaps | first2 = David | last3 = Shnider | first3 = Steve | author3-link = Steve Shnider | arxiv = 1210.7750 | doi = + | issue = 3 | journal = [[Perspectives on Science]] | pages = | title = Almost Equal: The Method of Adequality from Diophantus to Fermat and Beyond | volume = 21 | year = 2013}}</ref> argue that Fermat's application of the technique to transcendental curves such as the cycloid shows that adequality goes beyond a purely algebraic algorithm, and that, contrary to Breger's interpretation, the terms ''parisotes'' and ''adaequalitas'' mean "approximate equality". They develop a formalisation of Fermat's technique of adequality in modern mathematics as the [[standard part function]] sending a finite [[hyperreal number]] to the [[real number]] infinitely close to it. + +==See also== +*[[Fermat's principle]] +*[[Transcendental Law of Homogeneity]] + +==References== +{{Reflist}} + +==Bibliography== +* {{citation|title=The Historical Development of the Calculus|last1=Edwards|first1=C. H. Jr.|publisher=Springer|year=1994}} +* {{citation|title=The Changing Concept of Change: The Derivative from Fermat to Weierstrass|last=Grabiner|first=Judith V.|journal=Mathematics Magazine|volume=56|number=4|date=Sep 1983|pages=195&ndash;206}} +* {{citation|title=A History of Mathematics: An Introduction|last=Katz|first=V.|publisher=Addison Wesley|year=2008}} + +* Barner, K. (2011) "Fermats <<adaequare>> - und kein Ende?" Mathematische Semesterberichte (58), pp.&nbsp;13–45 +* Breger, H. (1994) "The mysteries of adaequare: a vindication of Fermat", [[Archive for History of Exact Sciences]] 46(3):193&ndash;219. +* [[Enrico Giusti|Giusti, E.]] (2009) "Les méthodes des maxima et minima de Fermat", Ann. Fac. Sci. Toulouse Math. (6) 18, Fascicule Special, 59–85. +* Stillwell, J.(2006) ''Yearning for the impossible. The surprising truths of mathematics'', page 91, [[A K Peters, Ltd.]], Wellesley, MA. +* [[André Weil|Weil, A.]], Book Review: The mathematical career of Pierre de Fermat. Bull. Amer. Math. Soc. 79 (1973), no. 6, 1138–1149. 
+ +{{Infinitesimals}} + +[[Category:Mathematical terminology]] +[[Category:History of calculus]] + lpgt4vzofs06i5m8tis4ums6crp9njy + + + + Mason–Weaver equation + 0 + 13491 + + 13492 + 2013-10-22T01:25:08Z + + 132.203.109.67 + + Adding precision to the explanations + wikitext + text/x-wiki + The '''Mason–Weaver equation''' (named after [[Max Mason]] and [[Warren Weaver]]) describes the [[sedimentation]] and [[diffusion]] of solutes under a uniform [[force]], usually a [[gravitation]]al field.<ref name="mason_1924" >{{cite journal | last = Mason | first = M | coauthors = Weaver W | year = 1924 | title = The Settling of Small Particles in a Fluid | journal = [[Physical Review]] | volume = 23 | pages = 412–426 | doi = 10.1103/PhysRev.23.412 | bibcode=1924PhRv...23..412M}}</ref> Assuming that the [[gravitation]]al field is aligned in the ''z'' direction (Fig. 1), the Mason–Weaver equation may be written + +:<math> +\frac{\partial c}{\partial t} = +D \frac{\partial^{2}c}{\partial z^{2}} + +sg \frac{\partial c}{\partial z} +</math> + +where ''t'' is the time, ''c'' is the [[solution|solute]] [[concentration]] (moles per unit length in the ''z''-direction), and the parameters ''D'', ''s'', and ''g'' represent the [[solution|solute]] [[diffusion constant]], [[sedimentation coefficient]] and the (presumed constant) [[acceleration]] of [[gravitation|gravity]], respectively. + +The Mason–Weaver equation is complemented by the [[boundary conditions]] +:<math> +D \frac{\partial c}{\partial z} + s g c = 0 +</math> +at the top and bottom of the cell, denoted as <math>z_{a}</math> and <math>z_{b}</math>, respectively (Fig. 1). These [[boundary conditions]] correspond to the physical requirement that no [[solution|solute]] pass through the top and bottom of the cell, i.e., that the [[flux]] there be zero. The cell is assumed to be rectangular and aligned with +the [[Cartesian coordinate system|Cartesian axes]] (Fig. 1), so that the net [[flux]] through the side walls is likewise +zero. Hence, the total amount of [[solution|solute]] in the cell +:<math> +N_{tot} = \int_{z_{b}}^{z_{a}} dz \ c(z, t) +</math> +is conserved, i.e., <math>dN_{tot}/dt = 0</math>. + +[[Image:Mason Weaver cell.png|frame|left|Figure 1: Diagram of Mason–Weaver cell and Forces on Solute]] + +==Derivation of the Mason–Weaver equation== +A typical particle of [[mass]] ''m'' moving with vertical [[velocity]] ''v'' is acted upon by three [[force]]s (Fig. 1): the +[[drag (physics)|drag force]] <math>f v</math>, the force of [[gravitation|gravity]] <math>m g</math> and the [[buoyancy|buoyant force]] <math>\rho V g</math>, where ''g'' is the [[acceleration]] of [[gravitation|gravity]], ''V'' is the [[solution|solute]] particle volume and <math>\rho</math> is the [[solvent]] [[density]]. At [[mechanical equilibrium|equilibrium]] (typically reached in roughly 10 ns for [[molecule|molecular]] [[solution|solutes]]), the +particle attains a [[terminal velocity]] <math>v_{term}</math> where the three [[force]]s are balanced. Since ''V'' equals the particle [[mass]] ''m'' times its [[partial specific volume]] <math>\bar{\nu}</math>, the [[mechanical equilibrium|equilibrium]] condition may be written as + +:<math> +f v_{term} = m (1 - \bar{\nu} \rho) g \ \stackrel{\mathrm{def}}{=}\ m_{b} g +</math> + +where <math>m_{b}</math> is the [[buoyant mass]]. + +We define the Mason–Weaver [[sedimentation coefficient]] <math>s \ \stackrel{\mathrm{def}}{=}\ m_{b} / f = v_{term}/g</math>. 
Since the [[drag coefficient]] ''f'' is related to the [[diffusion constant]] ''D'' by the [[Einstein relation (kinetic theory)|Einstein relation]] + +:<math> +D = \frac{k_{B} T}{f} +</math>, + +the ratio of ''s'' and ''D'' equals + +:<math> +\frac{s}{D} = \frac{m_{b}}{k_{B} T} +</math> + +where <math>k_{B}</math> is the [[Boltzmann constant]] and ''T'' is the [[temperature]] in [[kelvin]]s. + +The [[flux]] ''J'' at any point is given by + +:<math> +J = -D \frac{\partial c}{\partial z} - v_{term} c + = -D \frac{\partial c}{\partial z} - s g c. +</math> + +The first term describes the [[flux]] due to [[diffusion]] down a [[concentration]] gradient, whereas the second term +describes the [[convective flux]] due to the average velocity <math>v_{term}</math> of the particles. A positive net [[flux]] out of a small volume produces a negative change in the local [[concentration]] within that volume + +:<math> +\frac{\partial c}{\partial t} = -\frac{\partial J}{\partial z}. +</math> + +Substituting the equation for the [[flux]] ''J'' produces the Mason–Weaver equation + +:<math> +\frac{\partial c}{\partial t} = +D \frac{\partial^{2}c}{\partial z^{2}} + +sg \frac{\partial c}{\partial z}. +</math> + +==The dimensionless Mason–Weaver equation== + +The parameters ''D'', ''s'' and ''g'' determine a length scale <math>z_{0}</math> + +:<math> +z_{0} \ \stackrel{\mathrm{def}}{=}\ \frac{D}{sg} +</math> + +and a time scale <math>t_{0}</math> + +:<math> +t_{0} \ \stackrel{\mathrm{def}}{=}\ \frac{D}{s^{2}g^{2}} +</math> + +Defining the [[dimensionless]] variables <math>\zeta \ \stackrel{\mathrm{def}}{=}\ z/z_{0}</math> and <math>\tau \ \stackrel{\mathrm{def}}{=}\ t/t_{0}</math>, the Mason–Weaver equation becomes + +:<math> +\frac{\partial c}{\partial \tau} = +\frac{\partial^{2} c}{\partial \zeta^{2}} + +\frac{\partial c}{\partial \zeta} +</math> + +subject to the [[boundary conditions]] + +:<math> +\frac{\partial c}{\partial \zeta} + c = 0 +</math> +at the top and bottom of the cell, <math>\zeta_{a}</math> and +<math>\zeta_{b}</math>, respectively. + +==Solution of the Mason–Weaver equation== + +This partial differential equation may be solved by [[separation of variables]]. Defining <math>c(\zeta,\tau) \ \stackrel{\mathrm{def}}{=}\ e^{-\zeta/2} T(\tau) P(\zeta)</math>, we obtain two ordinary differential equations coupled by a constant <math>\beta</math> + +:<math> +\frac{dT}{d \tau} + \beta T = 0 +</math> + +:<math> +\frac{d^{2} P}{d \zeta^{2}} + + \left[ \beta - \frac{1}{4} \right] P = 0 +</math> + +where acceptable values of <math>\beta</math> are defined by the [[boundary conditions]] + +:<math> +\frac{dP}{d\zeta} + \frac{1}{2} P = 0 +</math> + +at the upper and lower boundaries, <math>\zeta_{a}</math> and <math>\zeta_{b}</math>, respectively. Since the ''T'' equation +has the solution <math>T(\tau) = T_{0} e^{-\beta \tau}</math>, where <math>T_{0}</math> is a constant, the Mason–Weaver equation is reduced to solving for the function <math>P(\zeta)</math>. + +The [[ordinary differential equation]] for ''P'' and its [[boundary conditions]] satisfy the criteria +for a [[Sturm–Liouville theory|Sturm–Liouville problem]], from which several conclusions follow. '''First''', there is a discrete set of [[orthonormal]] [[eigenfunction]]s +<math>P_{k}(\zeta)</math> that satisfy the [[ordinary differential equation]] and [[boundary conditions]]. 
'''Second''', the corresponding [[eigenvalue]]s <math>\beta_{k}</math> are real, bounded below by a lowest +[[eigenvalue]] <math>\beta_{0}</math> and grow asymptotically like <math>k^{2}</math> where the nonnegative integer ''k'' is the rank of the [[eigenvalue]]. (In our case, the lowest eigenvalue is zero, corresponding to the equilibrium solution.) '''Third''', the [[eigenfunction]]s form a complete set; any solution for <math>c(\zeta, \tau)</math> can be expressed as a weighted sum of the [[eigenfunction]]s + +:<math> +c(\zeta, \tau) = +\sum_{k=0}^{\infty} c_{k} P_{k}(\zeta) e^{-\beta_{k}\tau} +</math> + +where <math>c_{k}</math> are constant coefficients determined from the initial distribution <math>c(\zeta, \tau=0)</math> + +:<math> +c_{k} = +\int_{\zeta_{a}}^{\zeta_{b}} d\zeta \ +c(\zeta, \tau=0) e^{\zeta/2} P_{k}(\zeta) +</math> + +At equilibrium, <math>\beta=0</math> (by definition) and the equilibrium concentration distribution is + +:<math> +e^{-\zeta/2} P_{0}(\zeta) = B e^{-\zeta} = B e^{-m_{b}gz/k_{B}T} +</math> + +which agrees with the [[Boltzmann distribution]]. The <math>P_{0}(\zeta)</math> function satisfies the [[ordinary differential equation]] and [[boundary conditions]] at all values of <math>\zeta</math> (as may be verified by substitution), and the constant ''B'' may be determined from the total amount of [[solution|solute]] + +:<math> +B = N_{tot} \left( \frac{sg}{D} \right) +\left( \frac{1}{e^{-\zeta_{b}} - e^{-\zeta_{a}}} \right) +</math> + +To find the non-equilibrium values of the [[eigenvalue]]s <math>\beta_{k}</math>, we proceed as follows. The P equation has the form of a simple [[harmonic oscillator]] with solutions <math>P(\zeta) = e^{i\omega_{k}\zeta}</math> where + +:<math> +\omega_{k} = \pm \sqrt{\beta_{k} - \frac{1}{4}} +</math> + +Depending on the value of <math>\beta_{k}</math>, <math>\omega_{k}</math> is either purely real (<math>\beta_{k}\geq\frac{1}{4}</math>) or purely imaginary (<math>\beta_{k} < \frac{1}{4}</math>). Only one purely imaginary solution can satisfy the [[boundary conditions]], namely, the equilibrium solution. Hence, the non-equilibrium [[eigenfunctions]] can be written as + +:<math> +P(\zeta) = A \cos{\omega_{k} \zeta} + B \sin{\omega_{k} \zeta} +</math> + +where ''A'' and ''B'' are constants and <math>\omega</math> is real and strictly positive. 
+ +By introducing the oscillator [[amplitude]] <math>\rho</math> and [[phase (waves)|phase]] <math>\phi</math> as new variables, + +:<math> +u \ \stackrel{\mathrm{def}}{=}\ \rho \sin(\phi) \ \stackrel{\mathrm{def}}{=}\ P +</math> + +:<math> +v \ \stackrel{\mathrm{def}}{=}\ \rho \cos(\phi) \ \stackrel{\mathrm{def}}{=}\ - \frac{1}{\omega} +\left( \frac{dP}{d\zeta} \right) +</math> + +:<math> +\rho^{2} \ \stackrel{\mathrm{def}}{=}\ u^{2} + v^{2} +</math> + +:<math> +\tan(\phi) \ \stackrel{\mathrm{def}}{=}\ v / u +</math> + +the second-order equation for ''P'' is factored into two simple first-order equations + +:<math> +\frac{d\rho}{d\zeta} = 0 +</math> + +:<math> +\frac{d\phi}{d\zeta} = \omega +</math> + +Remarkably, the transformed [[boundary conditions]] are independent of <math>\rho</math> and the endpoints <math>\zeta_{a}</math> and <math>\zeta_{b}</math> + +:<math> +\tan(\phi_{a}) = +\tan(\phi_{b}) = \frac{1}{2\omega_{k}} +</math> + +Therefore, we obtain an equation + +:<math> +\phi_{a} - \phi_{b} + k\pi = k\pi = +\int_{\zeta_{b}}^{\zeta_{a}} d\zeta \ \frac{d\phi}{d\zeta} = +\omega_{k} (\zeta_{a} - \zeta_{b}) +</math> + +giving an exact solution for the frequencies <math>\omega_{k}</math> + +:<math> +\omega_{k} = \frac{k\pi}{\zeta_{a} - \zeta_{b}} +</math> + +The eigenfrequencies <math>\omega_{k}</math> are positive as required, since <math>\zeta_{a} > \zeta_{b}</math>, and comprise the set of [[harmonic]]s of the [[fundamental frequency]] <math>\omega_{1} \ \stackrel{\mathrm{def}}{=}\ \pi/(\zeta_{a} - \zeta_{b})</math>. Finally, the [[eigenvalue]]s <math>\beta_{k}</math> can be derived from <math>\omega_{k}</math> + +:<math> +\beta_{k} = \omega_{k}^{2} + \frac{1}{4} +</math> + +Taken together, the non-equilibrium components of the solution correspond to a [[Fourier series]] decomposition of the initial concentration distribution <math>c(\zeta, \tau=0)</math> +multiplied by the [[weight function|weighting function]] <math>e^{\zeta/2}</math>. Each Fourier component decays independently as <math>e^{-\beta_{k}\tau}</math>, where <math>\beta_{k}</math> is given above in terms of the [[Fourier series]] frequencies <math>\omega_{k}</math>. + +==See also== +* [[Lamm equation]] +* The Archibald approach, which also provides a simpler presentation of the basic physics of the Mason–Weaver equation than the original.<ref>{{cite web |url=http://prola.aps.org/abstract/PR/v53/i9/p746_1 |title=Phys. Rev.
53, 746 (1938): The Process of Diffusion in a Centrifugal Field of Force |format= |work= |accessdate=}}</ref> + +==References== + +{{reflist|1}} + +{{DEFAULTSORT:Mason-Weaver equation}} +[[Category:Laboratory techniques]] +[[Category:Partial differential equations]] + jkltrdfu8hpffnc9o23opnbb82k4jj5 + + + + Kernel Fisher discriminant analysis + 0 + 27274 + + 27275 + 2013-07-16T14:11:56Z + + 12.170.248.36 + + /* Linear discriminant analysis */ rephrased for clearer wording + wikitext + text/x-wiki + In [[statistics]], '''kernel Fisher discriminant analysis (KFD)''',<ref name=flda>{{cite journal|last=Mika|first=S|coauthors=Ratsch, G.; Weston, J.; Scholkopf, B.; Mullers, KR|title=Fisher discriminant analysis with kernels|journal=Neural Networks for Signal Processing|year=1999}}</ref> also known as '''generalized discriminant analysis'''<ref name=gda>{{cite journal|last=Baudat|first=G.|coauthors=Anouar, F.|title=Generalized discriminant analysis using a kernel approach|journal=Neural Computation|year=2000|volume=12|issue=10|pages=2385–2404}}</ref> and '''kernel discriminant analysis''',<ref name=faces3>{{cite journal|last=Li|first=Y.|coauthors=Gong, S.; Liddell, H.|title=Recognising trajectories of facial identities using kernel discriminant analysis|journal=Image and Vision Computing|year=2003|volume=21|issue=13-14|pages=1077–1086}}</ref> is a kernelized version of [[linear discriminant analysis]]. It is named after [[Ronald Fisher]]. Using the [[kernel trick]], LDA is implicitly performed in a new feature space, which allows non-linear mappings to be learned. + +==Linear discriminant analysis== +Intuitively, the idea of LDA is to find a projection where class separation is maximized. Given two sets of labeled data, <math>\mathbf{C}_1</math> and <math>\mathbf{C}_2</math>, define the class means <math>\mathbf{m}_1</math> and <math>\mathbf{m}_2</math> to be + +: <math> +\mathbf{m}_i = \frac{1}{l_i}\sum_{n=1}^{l_i}\mathbf{x}_n^i, +</math> + +where <math>l_i</math> is the number of examples of class <math>\mathbf{C}_i</math>. The goal of linear discriminant analysis is to give a large separation of the class means while also keeping the in-class variance small.<ref name=bishop>{{cite book|last=Bishop|first=CM|title=Pattern Recognition and Machine Learning|year=2006|publisher=Springer|location=New York, NY}}</ref> This is formulated as maximizing + +: <math> +J(\mathbf{w}) = \frac{\mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w}}{\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w}}, +</math> + +where <math>\mathbf{S}_B</math> is the between-class covariance matrix and <math>\mathbf{S}_W</math> is the total within-class covariance matrix: + +: <math> +\begin{align} +\mathbf{S}_B & = (\mathbf{m}_2-\mathbf{m}_1)(\mathbf{m}_2-\mathbf{m}_1)^{\text{T}} \\ +\mathbf{S}_W & = \sum_{i=1,2}\sum_{n=1}^{l_i}(\mathbf{x}_n^i-\mathbf{m}_i)(\mathbf{x}_n^i-\mathbf{m}_i)^{\text{T}}. +\end{align} +</math> + +Differentiating <math>J(\mathbf{w})</math> with respect to <math>\mathbf{w}</math>, setting equal to zero, and rearranging gives + +: <math> +(\mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w})\mathbf{S}_W\mathbf{w} = (\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w})\mathbf{S}_B\mathbf{w}.
+</math> + +Since we only care about the direction of <math>\mathbf{w}</math> and <math>\mathbf{S}_B\mathbf{w}</math> has the same direction as <math>(\mathbf{m}_2-\mathbf{m}_1)</math>, <math>\mathbf{S}_B\mathbf{w}</math> can be replaced by <math>(\mathbf{m}_2-\mathbf{m}_1)</math> and we can drop the scalars <math>(\mathbf{w}^{\text{T}}\mathbf{S}_B\mathbf{w})</math> and <math>(\mathbf{w}^{\text{T}}\mathbf{S}_W\mathbf{w})</math> to give + +: <math> +\mathbf{w} \propto \mathbf{S}^{-1}_W(\mathbf{m}_2-\mathbf{m}_1). +</math> + +==Kernel trick with LDA== +To extend LDA to non-linear mappings, the data can be mapped to a new feature space, <math>F</math>, via some function <math>\phi</math>. In this new feature space, the function that needs to be maximized is<ref name=flda /> + +: <math> +J(\mathbf{w}) = \frac{\mathbf{w}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{w}}{\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w}}, +</math> + +where + +: <math> +\begin{align} +\mathbf{S}_B^{\phi} & = (\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi})(\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi})^{\text{T}} \\ +\mathbf{S}_W^{\phi} & = \sum_{i=1,2}\sum_{n=1}^{l_i}(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi})(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi})^{\text{T}}, +\end{align} +</math> + +and + +: <math> +\mathbf{m}_i^{\phi} = \frac{1}{l_i}\sum_{j=1}^{l_i}\phi(\mathbf{x}_j^i). +</math> + +Further, note that <math>\mathbf{w}\in F</math>. Explicitly computing the mappings <math>\phi(\mathbf{x}_i)</math> and then performing LDA can be computationally expensive, and in many cases intractable. For example, <math>F</math> may be infinite-dimensional. Thus, rather than explicitly mapping the data to <math>F</math>, the data can be implicitly embedded by rewriting the algorithm in terms of [[dot product]]s and using the [[kernel trick]] in which the dot product in the new feature space is replaced by a kernel function, <math>k(\mathbf{x},\mathbf{y})=\phi(\mathbf{x})\cdot\phi(\mathbf{y})</math>. + +LDA can be reformulated in terms of dot products by first noting that <math>\mathbf{w}</math> will have an expansion of +the form<ref>{{cite journal|last=Scholkopf|first=B|coauthors=Herbrich, R.; Smola, A.|title=A generalized representer theorem|journal=Computational learning theory|year=2001}}</ref> + +: <math> +\mathbf{w} = \sum_{i=1}^l\alpha_i\phi(\mathbf{x}_i). +</math> +Then note that + +: <math> +\mathbf{w}^{\text{T}}\mathbf{m}_i^{\phi} = \frac{1}{l_i}\sum_{j=1}^{l}\sum_{k=1}^{l_i}\alpha_jk(\mathbf{x}_j,\mathbf{x}_k^i) = \mathbf{\alpha}^{\text{T}}\mathbf{M}_i, +</math> + +where + +: <math> +(\mathbf{M}_i)_j = \frac{1}{l_i}\sum_{k=1}^{l_i}k(\mathbf{x}_j,\mathbf{x}_k^i). +</math> + +The numerator of <math>J(\mathbf{w})</math> can then be written as: + +: <math> +\begin{align} +\mathbf{w}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{w} & = \mathbf{w}^{\text{T}}(\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi})(\mathbf{m}_2^{\phi}-\mathbf{m}_1^{\phi})^{\text{T}}\mathbf{w} \\ +& = \mathbf{\alpha}^{\text{T}}\mathbf{M}\mathbf{\alpha}, +\end{align} +</math> +where <math>\mathbf{M} = (\mathbf{M}_2-\mathbf{M}_1)(\mathbf{M}_2-\mathbf{M}_1)^{\text{T}}</math>.
Similarly, the denominator can be written as + +: <math> +\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w}=\mathbf{\alpha}^{\text{T}}\mathbf{N}\mathbf{\alpha}, +</math> + +where + +: <math> +\mathbf{N} = \sum_{j=1,2}\mathbf{K}_j(\mathbf{I}-\mathbf{1}_{l_j})\mathbf{K}_j^{\text{T}}, +</math> + +with the <math>n^{\text{th}}, m^{\text{th}}</math> component of <math>\mathbf{K}_j</math> defined as <math>k(\mathbf{x}_n,\mathbf{x}_m^j)</math>, <math>\mathbf{I}</math> is the identity matrix, and <math>\mathbf{1}_{l_j}</math> the matrix with all entries equal to <math>1/l_j</math>. This identity can be derived by starting out with the expression for <math>\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w}</math> and using the expansion of <math>\mathbf{w}</math> and the definitions of <math>\mathbf{S}_W^{\phi}</math> and <math>\mathbf{m}_i^{\phi}</math> + +: <math> +\begin{align} +\mathbf{w}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{w} & = +\left(\sum_{i=1}^l\alpha_i\phi^{\text{T}}(\mathbf{x}_i)\right)\left(\sum_{j=1,2}\sum_{n =1}^{l_j}(\phi(\mathbf{x}_n^j)-\mathbf{m}_j^{\phi})(\phi(\mathbf{x}_n^j)-\mathbf{m}_j^{\phi})^{\text{T}}\right) +\left(\sum_{k=1}^l\alpha_k\phi(\mathbf{x}_k)\right)\\ +& = \sum_{j=1,2}\sum_{i=1}^l\sum_{n =1}^{l_j}\sum_{k=1}^l\alpha_i\phi^{\text{T}}(\mathbf{x}_i)(\phi(\mathbf{x}_n^j)-\mathbf{m}_j^{\phi})(\phi(\mathbf{x}_n^j)-\mathbf{m}_j^{\phi})^{\text{T}} +\alpha_k\phi(\mathbf{x}_k) \\ +& = \sum_{j=1,2}\sum_{i=1}^l\sum_{n =1}^{l_j}\sum_{k=1}^l \left(\alpha_ik(\mathbf{x}_i,\mathbf{x}_n^j)-\frac{1}{l_j}\sum_{p=1}^{l_j}\alpha_ik(\mathbf{x}_i,\mathbf{x}_p^j)\right) +\left(\alpha_kk(\mathbf{x}_k,\mathbf{x}_n^j)-\frac{1}{l_j}\sum_{q=1}^{l_j}\alpha_kk(\mathbf{x}_k,\mathbf{x}_q^j)\right) \\ +& = \sum_{j=1,2}\left( \sum_{i=1}^l\sum_{n =1}^{l_j}\sum_{k=1}^l\Bigg( \alpha_i\alpha_kk(\mathbf{x}_i,\mathbf{x}_n^j)k(\mathbf{x}_k,\mathbf{x}_n^j)\right.\\ +& \left.{} - \frac{2\alpha_i\alpha_k}{l_j}\sum_{p=1}^{l_j}k(\mathbf{x}_i,\mathbf{x}_n^j)k(\mathbf{x}_k,\mathbf{x}_p^j) +\left. + \frac{\alpha_i\alpha_k}{l_j^2}\sum_{p=1}^{l_j}\sum_{q=1}^{l_j}k(\mathbf{x}_i,\mathbf{x}_p^j)k(\mathbf{x}_k,\mathbf{x}_q^j) \right)\right) \\ +& = \sum_{j=1,2}\left( \sum_{i=1}^l\sum_{n =1}^{l_j}\sum_{k=1}^l\left( \alpha_i\alpha_kk(\mathbf{x}_i,\mathbf{x}_n^j)k(\mathbf{x}_k,\mathbf{x}_n^j) + - \frac{\alpha_i\alpha_k}{l_j}\sum_{p=1}^{l_j}k(\mathbf{x}_i,\mathbf{x}_n^j)k(\mathbf{x}_k,\mathbf{x}_p^j) \right)\right) \\ +& = \sum_{j=1,2} \mathbf{\alpha}^{\text{T}} \mathbf{K}_j\mathbf{K}_j^{\text{T}}\mathbf{\alpha} - \mathbf{\alpha}^{\text{T}} \mathbf{K}_j\mathbf{1}_{l_j}\mathbf{K}_j^{\text{T}}\mathbf{\alpha} \\ +& = \mathbf{\alpha}^{\text{T}}\mathbf{N}\mathbf{\alpha}. +\end{align} +</math> + +With these equations for the numerator and denominator of <math>J(\mathbf{w})</math>, the equation for <math>J</math> can be rewritten as + +: <math> +J(\mathbf{\alpha}) = \frac{\mathbf{\alpha}^{\text{T}}\mathbf{M}\mathbf{\alpha}}{\mathbf{\alpha}^{\text{T}}\mathbf{N}\mathbf{\alpha}}. +</math> + +Then, differentiating and setting equal to zero gives + +: <math> +(\mathbf{\alpha}^{\text{T}}\mathbf{M}\mathbf{\alpha})\mathbf{N}\mathbf{\alpha} = (\mathbf{\alpha}^{\text{T}}\mathbf{N}\mathbf{\alpha})\mathbf{M}\mathbf{\alpha}. +</math> + +Since only the direction of <math>\mathbf{w}</math>, and hence the direction of <math>\mathbf{\alpha}</math>, matters, the above can be solved for <math>\mathbf{\alpha}</math> as + +: <math> +\mathbf{\alpha} = \mathbf{N}^{-1}(\mathbf{M}_2- \mathbf{M}_1). 
+</math> + +Note that in practice, <math>\mathbf{N}</math> is usually singular and so a multiple of the identity is added to it<ref name=flda /> + +: <math> +\mathbf{N}_{\epsilon} = \mathbf{N}+\epsilon\mathbf{I}. +</math> + +Given the solution for <math>\mathbf{\alpha}</math>, the projection of a new data point is given by<ref name=flda /> + +: <math> +y(\mathbf{x}) = (\mathbf{w}\cdot\phi(\mathbf{x})) = \sum_{i=1}^l\alpha_ik(\mathbf{x}_i,\mathbf{x}). +</math> + +==Multi-class KFD== + +The extension to cases where there are more than two classes is relatively straightforward.<ref name=gda /><ref name=duda>{{cite book|last=Duda|first=R.|coauthors = Hart, P.;Stork, D.|title=Pattern Classification|year=2001|publisher=Wiley|location=New York, NY}}</ref><ref name=texture>{{cite journal|last=Zhang|first=J.|coauthors=Ma, K.K.,|title=Kernel fisher discriminant for texture classification|year=2004}}</ref> Let <math>c</math> be the number of classes. Then multi-class KFD involves projecting the data into a <math>(c-1)</math>-dimensional space using <math>(c-1)</math> discriminant functions + +: <math> +y_i = \mathbf{w}_i^{\text{T}}\phi(\mathbf{x}) \qquad i= 1,\ldots,c-1. +</math> + +This can be written in matrix notation + +: <math> +\mathbf{y} = \mathbf{W}^{\text{T}}\phi(\mathbf{x}), +</math> + +where the <math>\mathbf{w}_i</math> are the columns of <math>\mathbf{W}</math>.<ref name=duda /> Further, the between-class covariance matrix is now + +: <math> +\mathbf{S}_B^{\phi} = \sum_{i=1}^c l_i(\mathbf{m}_i^{\phi}-\mathbf{m}^{\phi})(\mathbf{m}_i^{\phi}-\mathbf{m}^{\phi})^{\text{T}}, +</math> + +where <math>\mathbf{m}^\phi</math> is the mean of all the data in the new feature space. The within-class covariance matrix is + +: <math> +\mathbf{S}_W^{\phi} = \sum_{i=1}^c \sum_{n=1}^{l_i}(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi})(\phi(\mathbf{x}_n^i)-\mathbf{m}_i^{\phi})^{\text{T}}. +</math> + +The solution is now obtained by maximizing + +: <math> +J(\mathbf{W}) = \frac{\left|\mathbf{W}^{\text{T}}\mathbf{S}_B^{\phi}\mathbf{W}\right|}{\left|\mathbf{W}^{\text{T}}\mathbf{S}_W^{\phi}\mathbf{W}\right|}. +</math> + +The kernel trick can again be used and the goal of multi-class KFD becomes<ref name=texture /> + +: <math> +\mathbf{A}^* = \underset{\mathbf{A}}{\operatorname{argmax}} \frac{\left|\mathbf{A}^{\text{T}}\mathbf{M}\mathbf{A}\right|}{\left|\mathbf{A}^{\text{T}}\mathbf{N}\mathbf{A}\right|}, +</math> + +where <math>A = [\mathbf{\alpha}_1,\ldots,\mathbf{\alpha}_{c-1}]</math> and + +: <math> +\begin{align} +M & = \sum_{j=1}^cl_j(\mathbf{M}_j-\mathbf{M}_{*})(\mathbf{M}_j-\mathbf{M}_{*})^{\text{T}} \\ +N & = \sum_{j=1}^c\mathbf{K}_j(\mathbf{I}-\mathbf{1}_{l_j})\mathbf{K}_j^{\text{T}}. +\end{align} +</math> + +The <math>\mathbf{M}_i</math> are defined as in the above section and <math>\mathbf{M}_{*}</math> is defined as + +: <math> +(\mathbf{M}_{*})_j = \frac{1}{l}\sum_{k=1}^{l}k(\mathbf{x}_j,\mathbf{x}_k). +</math> + +<math>\mathbf{A}^{*}</math> can then be computed by finding the <math>(c-1)</math> leading eigenvectors of <math>\mathbf{N}^{-1}\mathbf{M}</math>.<ref name=texture /> Furthermore, the projection of a new input, <math>\mathbf{x}_t</math>, is given by<ref name=texture /> + +: <math> +\mathbf{y}(\mathbf{x}_t) = \left(\mathbf{A}^{*}\right)^{\text{T}}\mathbf{K}_t, +</math> + +where the <math>i^{th}</math> component of <math>\mathbf{K}_t</math> is given by <math>k(\mathbf{x}_i,\mathbf{x}_t)</math>.
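As an illustration of the two-class procedure derived above, the sketch below is a non-authoritative implementation in Python/NumPy; the Gaussian (RBF) kernel, the regularization constant <math>\epsilon</math> and all identifiers are illustrative choices rather than prescriptions from the text. It builds the matrices <math>\mathbf{K}_j</math>, the vectors <math>\mathbf{M}_j</math> and the matrix <math>\mathbf{N}</math>, solves <math>\mathbf{\alpha} = \mathbf{N}_{\epsilon}^{-1}(\mathbf{M}_2-\mathbf{M}_1)</math>, and projects new points via <math>y(\mathbf{x}) = \sum_i \alpha_i k(\mathbf{x}_i,\mathbf{x})</math>:

<syntaxhighlight lang="python">
# Illustrative two-class kernel Fisher discriminant sketch (kernel choice,
# regularization value and function names are assumptions, not from the article).
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kfd_fit(X1, X2, gamma=1.0, eps=1e-3):
    """Return the expansion coefficients alpha and the stacked training data."""
    X = np.vstack([X1, X2])                    # all l = l1 + l2 training points
    l1, l = len(X1), len(X1) + len(X2)
    K = rbf_kernel(X, X, gamma)                # l x l Gram matrix k(x_n, x_m)
    K1, K2 = K[:, :l1], K[:, l1:]              # K_j: columns restricted to class j
    M1 = K1.mean(axis=1)                       # (M_1)_n = (1/l_1) sum_k k(x_n, x_k^1)
    M2 = K2.mean(axis=1)
    N = np.zeros((l, l))
    for Kj in (K1, K2):                        # N = sum_j K_j (I - 1_{l_j}) K_j^T
        lj = Kj.shape[1]
        N += Kj @ (np.eye(lj) - np.full((lj, lj), 1.0 / lj)) @ Kj.T
    N += eps * np.eye(l)                       # N_eps = N + eps * I (regularization)
    alpha = np.linalg.solve(N, M2 - M1)        # alpha = N^{-1} (M_2 - M_1)
    return alpha, X

def kfd_project(alpha, X_train, X_new, gamma=1.0):
    """Projection y(x) = sum_i alpha_i k(x_i, x) for each new point."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
</syntaxhighlight>

A new point can then be classified, as described in the next section, by comparing its projection with the projected class means.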
+ +==Classification using KFD== +In both two-class and multi-class KFD, the class label of a new input can be assigned as<ref name=texture /> + +: <math> +f(\mathbf{x}) = arg\min_j D(\mathbf{y}(\mathbf{x}),\bar{\mathbf{y}}_j), +</math> + +where <math>\bar{\mathbf{y}}_j</math> is the projected mean for class <math>j</math> and <math>D(\cdot,\cdot)</math> is a distance function. + +==Applications== + +Kernel discriminant analysis has been used in a variety of applications. These include: +*Face recognition<ref name=faces3 /><ref>{{cite journal|last=Liu|first=Q.|coauthors=Lu, H.; Ma, S.|title=Improving kernel Fisher discriminant analysis for face recognition|journal=IEEE Transactions on Circuits and Systems for Video Technology|year=2004|volume=14|issue=1|pages=42–49}}</ref><ref>{{cite journal|last=Liu|first=Q.|coauthors=Huang, R.; Lu, H.;Ma, S.|title=Face recognition using kernel-based Fisher discriminant analysis|journal=IEEE International Conference on Automatic Face and Gesture Recognition|year=2002}}</ref> and detection<ref name=faceDetection1>{{cite journal|last=Kurita|first=T.|coauthors=Taguchi, T.|title=A modification of kernel-based Fisher discriminant analysis for face detection|journal=IEEE International Conference on Automatic Face and Gesture Recognition|year=2002}}</ref><ref name=faceDetection2>{{cite journal|last=Feng|first=Y.|coauthors=Shi, P.|title=Face detection based on kernel fisher discriminant analysis|journal=IEEE International Conference on Automatic Face and Gesture Recognition|year=2004}}</ref> +*Hand-written digit recognition<ref name=flda /><ref name=digitRecognition>{{cite journal|last=Yang|first=J.|coauthors=Frangi, AF; Yang, JY; Zang, D., Jin, Z.|title=KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2005|volume=27|issue=2}}</ref> +*Palmprint recognition<ref>{{cite journal|last=Wang|first=Y.|coauthors=Ruan, Q.|title=Kernel fisher discriminant analysis for palmprint recognition|journal=International Conference on Pattern Recognition|year=2006}}</ref> +*Classification of malignant and benign cluster microcalcifications<ref name=cancer>{{cite journal|last=Wei|first=L.|coauthors=Yang, Y.; Nishikawa, R.M.; Jiang, Y.|title=A study on several machine-learning methods for classification of malignant and benign clustered microcalcifications|journal=IEEE Transactions on Medical Imaging|year=2005|volume=24|issue=3|pages=371–380}}</ref> +*Seed classification<ref name=gda /> + +==See also== +* [[Kernel trick]] +* [[Kernel principal component analysis]] +* [[Linear discriminant analysis]] +* [[Factor analysis]] + +==References== +{{Reflist}} + +==External links== +* [http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/LI1/kda/index.html Kernel Discriminant Analysis] - This site gives a high level explanation of KFD. +* [http://crsouza.blogspot.com/2010/01/kernel-discriminant-analysis-in-c.html Kernel Discriminant Analysis in C#] - C# code to perform KFD. +* [http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html Matlab Toolbox for Dimensionality Reduction] - Includes a method for performing KFD. +* [http://www.codeproject.com/KB/recipes/handwriting-kda.aspx Handwriting Recognition using Kernel Discriminant Analysis] - C# code that demonstrates handwritten digit recognition using KFD. 
+ + + +[[Category:Multivariate statistics]] + lp44e8yrizmaesrormisp824y63u59z + + + + Implicit solvation + 0 + 15282 + + 15283 + 2014-01-21T22:17:04Z + + Monkbot + 0 + + + Fix [[Help:CS1_errors#deprecated_params|CS1 deprecated date parameter errors]] + wikitext + text/x-wiki + '''Implicit solvation''' (sometimes known as '''continuum solvation''') is a method of representing [[solvent]] as a continuous medium instead of individual “explicit” solvent molecules most often used in [[molecular dynamics]] simulations and in other applications of [[molecular mechanics]]. The method is often applied to estimate [[Thermodynamic free energy|free energy]] of [[solution|solute]]-[[solvent]] interactions in structural and chemical processes, such as folding or [[conformational change|conformational transitions]] of [[proteins]], [[DNA]], [[RNA]], and [[polysaccharide]]s, association of biological macromolecules with [[ligand]]s, or transport of [[drugs]] across [[biological membrane]]s. + +The implicit solvation model is justified in liquids, where the [[potential of mean force]] can be applied to approximate the averaged behavior of many highly dynamic solvent molecules. However, the interiors of [[biological membrane]]s or [[protein]]s can also be considered as media with specific [[solvation]] or [[dielectric]] properties. These media are continuous but not necessarily uniform, since their properties can be described by different analytical functions, such as “polarity profiles” of [[lipid bilayer]]s.<ref name="pmid11438731">{{cite journal | author = Marsh D | title = Polarity and permeation profiles in lipid membranes | journal = Proc. Natl. Acad. Sci. U.S.A. | volume = 98 | issue = 14 | pages = 7777–82 |date=July 2001 | pmid = 11438731 | pmc = 35418 | doi = 10.1073/pnas.131023798 | url =http://www.pnas.org/cgi/pmidlookup?view=long&pmid=11438731 | issn = 0027-8424 | format = Free full text |bibcode = 2001PNAS...98.7777M }}</ref> There are two basic types of implicit solvent methods: models based on [[accessible surface area]]s (ASA) that were historically the first, and more recent continuum electrostatics models, although various modifications and combinations of the different methods are possible. +The accessible surface area (ASA) method is based on experimental linear relations between [[Gibbs free energy]] of transfer and the [[surface area]] of a [[solution|solute]] molecule.<ref name="pmid326146">{{cite journal | author = [[Frederic M. Richards|Richards FM]] | title = Areas, volumes, packing and protein structure | journal = Annu. Rev. Biophys. Bioeng. | volume = 6 | issue = | pages = 151–76 | year = 1977 | pmid = 326146 | doi = 10.1146/annurev.bb.06.060177.001055 | url = | issn = 0084-6589 }}</ref> This method operates directly with free energy of [[solvation]], unlike [[molecular mechanics]] or [[electrostatic]] methods that include only the [[enthalpy|enthalpic]] component of free energy. The continuum representation of solvent also significantly improves the computational speed and reduces errors in statistical averaging that arise from incomplete sampling of solvent conformations,<ref name="pmid17030302">{{cite journal | author = Roux B, Simonson T | title = Implicit solvent models | journal = Biophys. Chem. 
| volume = 78 | issue = 1–2 | pages = 1–20 |date=April 1999 | pmid = 17030302 | doi = 10.1016/S0301-4622(98)00226-9| url = | issn = 0301-4622 }}</ref> so that the energy landscapes obtained with implicit and explicit solvent are different.<ref name="Zhou_2003">{{cite journal | author = Zhou R | title = Free energy landscape of protein folding in water: explicit vs. implicit solvent | journal = Proteins | volume = 53 | issue = 2 | pages = 148–61 |date=November 2003 | pmid = 14517967 | doi = 10.1002/prot.10483 | url = | issn = 0887-3585 }}</ref> Although the implicit solvent model is useful for simulations of biomolecules, this is an approximate method with certain limitations and problems related to parameterization and treatment of [[ionization]] effects. + +==Accessible surface area-based method== +{{Main|Accessible surface area}} +The free energy of solvation of a [[solution|solute]] molecule in the simplest ASA-based method is given by: +:<math> +\Delta G_\mathrm{solv} = \sum_{i} \sigma_{i} \ ASA_{i} +</math> +where <math> ASA_{i}</math> is the [[accessible surface area]] of atom ''i'', and +<math> \sigma_{i}</math> is ''solvation parameter'' of atom ''i'', i.e. a contribution to the free energy of [[solvation]] of the particular atom i per surface unit area. The required solvation parameters for different types of atoms ([[carbon|C]], [[nitrogen|N]], [[oxygen|O]], [[sulfur|S]], etc.) are usually determined by a [[least squares]] fit of the calculated and experimental transfer free energies for a series of [[organic compound]]s. The experimental energies are determined from [[partition coefficient]]s of these compounds between different solutions or media using standard mole concentrations of the solutes.<ref name="Ben-Naim">{{cite book | title = Hydrophobic interactions | author = Ben-Naim AY |year = 1980 | publisher = Plenum Press | location = New York | isbn = 0-306-40222-X | page = | pages = | url = }}</ref><ref name="pmid7766825">{{cite journal | author = Holtzer A | title = The "cratic correction" and related fallacies | journal = Biopolymers | volume = 35 | issue = 6 | pages = 595–602 |date=June 1995 | pmid = 7766825 | doi = 10.1002/bip.360350605 | url =http://www.scholaruniverse.com/ncbi-linkout?id=7766825 | issn = 0006-3525 | format = Free full text }}</ref> + +It is noteworthy that ''solvation energy'' is the free energy required to transfer a solute molecule from a solvent to “vacuum” (gas phase). This solvation energy can supplement the intramolecular energy in vacuum calculated in [[molecular mechanics]]. Therefore, the required atomic solvation parameters were initially derived from water-gas partition data.<ref name="pmid3472198">{{cite journal | author = Ooi T, Oobatake M, Némethy G, Scheraga HA | title = Accessible surface areas as a measure of the thermodynamic parameters of hydration of peptides | journal = Proc. Natl. Acad. Sci. U.S.A. | volume = 84 | issue = 10 | pages = 3086–90 |date=May 1987 | pmid = 3472198 | pmc = 304812 | doi = 10.1073/pnas.84.10.3086| issn = 0027-8424 | format = Free full text |bibcode = 1987PNAS...84.3086O }}</ref> However, the dielectric properties of proteins and [[lipid bilayer]]s are much more similar to those of nonpolar solvents than to vacuum. 
Newer parameters have therefore been derived from [[water]]-[[1-octanol]] [[partition coefficient]]s<ref name="pmid3945310">{{cite journal | author = Eisenberg D, McLachlan AD | title = Solvation energy in protein folding and binding | journal = Nature | volume = 319 | issue = 6050 | pages = 199–203 | year = 1986 | pmid = 3945310 | doi = 10.1038/319199a0 | url = | month = Jan | issn = 0028-0836 |bibcode = 1986Natur.319..199E }}</ref> or other similar data. Such parameters actually describe ''transfer'' energy between two condensed media or the ''difference'' of two solvation energies. + +==Poisson-Boltzmann== +{{Main|Poisson-Boltzmann equation}} +Although this equation has solid theoretical justification, it is computationally expensive to calculate without approximations. The [[Poisson-Boltzmann equation]] (PB) describes the electrostatic environment of a solute in a solvent containing [[ion]]s. It can be written in [[cgs]] units as: +:<math> +\vec{\nabla}\cdot\left[\epsilon(\vec{r})\vec{\nabla}\Psi(\vec{r})\right] = -4\pi\rho^{f}(\vec{r}) - 4\pi\sum_{i}c_{i}^{\infty}z_{i}q\lambda(\vec{r})e^{\frac{-z_{i}q\Psi(\vec{r})}{kT}} +</math> + +or (in [[MKS system of units|mks]]): + +:<math> +\vec{\nabla}\cdot\left[\epsilon(\vec{r})\vec{\nabla}\Psi(\vec{r})\right] = -\rho^{f}(\vec{r}) - \sum_{i}c_{i}^{\infty}z_{i}q\lambda(\vec{r})e^{\frac{-z_{i}q\Psi(\vec{r})}{kT}} +</math> + +where <math>\epsilon(\vec{r})</math> represents the position-dependent dielectric, <math>\Psi(\vec{r})</math> represents the electrostatic potential, <math>\rho^{f}(\vec{r})</math> represents the charge density of the solute, <math>c_{i}^{\infty}</math> represents the concentration of the ion ''i'' at a distance of infinity from the solute, <math>z_{i}</math> is the valence of the ion, ''q'' is the charge of a proton, ''k'' is the [[Boltzmann constant]], ''T'' is the [[temperature]], and <math>\lambda(\vec{r})</math> is a factor for the position-dependent accessibility of position ''r'' to the ions in solution (often set to uniformly 1). If the potential is not large, the equation can be [[linearization|linearized]] to be solved more efficiently.<ref name="pmid12501158">{{cite journal | author = Fogolari F, Brigo A, Molinari H | title = The Poisson-Boltzmann equation for biomolecular electrostatics: a tool for structural biology | journal = J. Mol. Recognit. 
| volume = 15 | issue = 6 | pages = 377–92 | year = 2002 | pmid = 12501158 | doi = 10.1002/jmr.577 | url = | month = Nov | issn = 0952-3499 }}</ref> + +A number of numerical Poisson-Boltzmann equation solvers of varying generality and efficiency have been developed,<ref name="pmid16290441">{{cite journal | author = Shestakov AI, Milovich JL, Noy A | title = Solution of the nonlinear Poisson-Boltzmann equation using pseudo-transient continuation and the finite element method | journal = J Colloid Interface Sci | volume = 247 | issue = 1 | pages = 62–79 |date=March 2002 | pmid = 16290441 | doi = 10.1006/jcis.2001.8033 | url = | issn = 0021-9797 }}</ref><ref name="pmid15974723">{{cite journal | author = Lu B, Zhang D, McCammon JA | title = Computation of electrostatic forces between solvated molecules determined by the Poisson-Boltzmann equation using a boundary element method | journal = J Chem Phys | volume = 122 | issue = 21 | pages = 214102 |date=June 2005 | pmid = 15974723 | doi = 10.1063/1.1924448 | url = | issn = 0021-9606 |bibcode = 2005JChPh.122u4102L }}</ref><ref name="pmid11517324">{{cite journal | author = Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA | title = Electrostatics of nanosystems: Application to microtubules and the ribosome | journal = Proc. Natl. Acad. Sci. U.S.A. | volume = 98 | issue = 18 | pages = 10037–41 |date=August 2001 | pmid = 11517324 | pmc = 56910 | doi = 10.1073/pnas.181342398 | url =http://www.pnas.org/cgi/pmidlookup?view=long&pmid=11517324 | issn = 0027-8424 | format = Free full text |bibcode = 2001PNAS...9810037B }}</ref> including one application with a specialized computer hardware platform.<ref name="pmid15942918">{{cite journal | author = Höfinger S | title = Solving the Poisson-Boltzmann equation with the specialized computer chip MD-GRAPE-2 | journal = J Comput Chem | volume = 26 | issue = 11 | pages = 1148–54 |date=August 2005 | pmid = 15942918 | doi = 10.1002/jcc.20250 | url = | issn = 0192-8651 }}</ref> However, performance from PB solvers does not yet equal that from the more commonly used generalized Born approximation.<ref name="pmid16540310">{{cite journal | author = Koehl P | title = Electrostatics calculations: latest methodological advances | journal = Curr. Opin. Struct. Biol. | volume = 16 | issue = 2 | pages = 142–51 |date=April 2006 | pmid = 16540310 | doi = 10.1016/j.sbi.2006.03.001 | url = | issn = 0959-440X }}</ref> + +==Generalized Born== +The ''Generalized Born'' (GB) model is an approximation to the exact (linearized) Poisson-Boltzmann equation. It is based on modeling the solute as a set of spheres whose internal dielectric constant differs from the external solvent. 
The model has the following functional form: +:<math> +G_{s} = \frac{1}{8\pi}\left(\frac{1}{\epsilon_{0}}-\frac{1}{\epsilon}\right)\sum_{i,j}^{N}\frac{q_{i}q_{j}}{f_{GB}} +</math> +where +:<math> +f_{GB} = \sqrt{r_{ij}^{2} + a_{ij}^{2}e^{-D}} +</math> +and +<math> +D = \left(\frac{r_{ij}}{2a_{ij}}\right)^{2}, a_{ij} = \sqrt{a_{i}a_{j}} +</math> + +where <math>\epsilon_{0}</math> is the [[permittivity of free space]], <math>\epsilon</math> is the [[dielectric constant]] of the solvent being modeled, <math>q_{i}</math> is the [[electrostatic charge]] on particle ''i'', <math>r_{ij}</math> is the distance between particles ''i'' and ''j'', and <math>a_{i}</math> is a quantity (with the dimension of length) known as the ''effective Born radius''.<ref name="Still">{{cite journal | author = Still WC, Tempczyk A, Hawley RC, Hendrickson T |year = 1990 | title = Semianalytical treatment of solvation for molecular mechanics and dynamics | journal = J Am Chem Soc | volume = 112 | issue = 16 | pages = 6127–6129 | doi = 10.1021/ja00172a038 }}</ref> The effective Born radius of an atom characterizes its degree of burial inside the solute; qualitatively it can be thought of as the distance from the atom to the molecular surface. Accurate estimation of the effective Born radii is critical for the GB model.<ref name="Onufriev">{{cite journal | author = Onufriev A, Bashford D, Case DA | year = 2002 | title = Effective Born radii in the generalized Born approximation: The importance of being perfect | journal = J Comp Chem | volume = 23 | issue = 14 | pages = 1297–1304 | doi = 10.1002/jcc.10126 | pmid = 12214312 }}</ref> + +===GBSA=== +GBSA is simply a Generalized Born model augmented with the hydrophobic solvent accessible surface area (SA) term. It is among the most commonly used implicit solvent model combinations. The use of this model in the context of [[molecular mechanics]] is known as MM/GBSA. Although this formulation has been shown to successfully identify the [[native state]]s of short peptides with well-defined [[tertiary structure]],<ref name="pmid16617376">{{cite journal | author = Ho BK, Dill KA | title = Folding Very Short Peptides Using Molecular Dynamics | journal = PLoS Comput. Biol. | volume = 2 | issue = 4 | pages = e27 |date=April 2006 | pmid = 16617376 | pmc = 1435986 | doi = 10.1371/journal.pcbi.0020027 | url =http://dx.plos.org/10.1371/journal.pcbi.0020027 | issn = 1553-734X | format = Free full text |bibcode = 2006PLSCB...2...27H }}</ref> the conformational ensembles produced by GBSA models in other studies differ significantly from those produced by explicit solvent and do not identify the protein's native state.<ref name="Zhou_2003" /> In particular, [[Salt bridge (protein)|salt bridge]]s are overstabilized, possibly due to insufficient electrostatic screening, and a higher-than-native [[alpha helix]] population was observed. Variants of the GB model have also been developed to approximate the electrostatic environment of membranes, which have had some success in folding the [[transmembrane helix|transmembrane helices]] of [[integral membrane protein]]s.<ref name="pmid14581194">{{cite journal | author = Im W, Feig M, Brooks CL | title = An Implicit Membrane Generalized Born Theory for the Study of Structure, Stability, and Interactions of Membrane Proteins | journal = Biophys. J. 
| volume = 85 | issue = 5 | pages = 2900–18 |date=November 2003 | pmid = 14581194 | pmc = 1303570 | doi = 10.1016/S0006-3495(03)74712-2 | url = | issn = 0006-3495 | bibcode=2003BpJ....85.2900I}}</ref> + +==Ad hoc fast solvation models== +Another possibility is to use ad hoc quick strategies to estimate solvation free energy. A first generation of fast implicit solvents is based on the calculation of a per-atom solvent accessible surface area. For each of group of atom types, a different parameter scales its contribution to solvation ("ASA-based model" described above).<ref name="pmid1304905">{{cite journal | author = Wesson L, Eisenberg D | title = Atomic solvation parameters applied to molecular dynamics of proteins in solution | journal = Protein Sci. | volume = 1 | issue = 2 | pages = 227–35 |date=February 1992 | pmid = 1304905 | pmc = 2142195 | doi = 10.1002/pro.5560010204 | issn = 0961-8368 | format = Free full text }}</ref> + +Another strategy is implemented for the [[CHARMM]]19 force-field and is called EEF1.<ref name="pmid10223287">{{cite journal | author = Lazaridis T, Karplus M | title = Effective energy function for proteins in solution | journal = Proteins | volume = 35 | issue = 2 | pages = 133–52 |date=May 1999 | pmid = 10223287 | doi = 10.1002/(SICI)1097-0134(19990501)35:2<133::AID-PROT1>3.0.CO;2-N | url = http://www.sci.ccny.cuny.edu/~themis/eef1.pdf | issn = 0887-3585 | format = }} {{dead link|date=September 2009}}</ref> EEF1 is based on a Gaussian-shaped solvent exclusion. The solvation free energy is + +:<math> +\Delta G_{i}^{solv} = \Delta G_{i}^{ref} - \sum_{j} \int_{Vj} f_i(r) dr +</math> + +The reference solvation free energy of ''i'' corresponds to a suitably chosen small molecule in +which group i is essentially fully solvent-exposed. The integral is over the volume ''V<sub>j</sub>'' of +group ''j'' and the summation is over all groups ''j'' around ''i''. EEF1 additionally uses a distance-dependent (non-constant) dielectric, and ionic side-chains of proteins are simply neutralized. It is only 50% slower than a vacuum simulation. This model was later augmented with the hydrophobic effect and called Charmm19/SASA.<ref name="pmid11746700">{{cite journal | author = Ferrara P, Apostolakis J, Caflisch A | title = Evaluation of a fast implicit solvent model for molecular dynamics simulations | journal = Proteins | volume = 46 | issue = 1 | pages = 24–33 |date=January 2002 | pmid = 11746700 | doi = 10.1002/prot.10001 }}</ref> + +==Hybrid implicit/explicit solvation models== +It is possible to include a layer or sphere of water molecules around the solute, and model the bulk with an implicit solvent. Such an approach is proposed by M. J. Frisch and coworkers<ref name="isbn0-8412-2981-3">{{cite book | editor = Smith D | title = Modeling the hydrogen bond | publisher = American Chemical Society | location = Columbus, OH | year = 1994 | author = TA Keith, MJ Frisch | chapter = Chapter 3: Inclusion of Explicit Solvent Molecules in a Self-Consistent-Reaction Field Model of Solvation | isbn = 0-8412-2981-3 }}</ref> and by other authors .<ref name="pmid15470756">{{cite journal | author = Lee MS, Salsbury FR, Olson MA | title = An efficient hybrid explicit/implicit solvent method for biomolecular simulations | journal = J Comput Chem | volume = 25 | issue = 16 | pages = 1967–78 |date=December 2004 | pmid = 15470756 | doi = 10.1002/jcc.20119 }}</ref> <ref>{{Cite journal | author = Marini A, Muñoz-Losa A, Biancardi A, Mennucci B | title = What is Solvatochromism? 
| journal = The Journal of Physical Chemistry B | volume = 114 | pages = 17128 | year = 2010 | doi = 10.1021/jp1097487 | issue = 51}}</ref> For instance in Ref. <ref name=pmid15470756 /> the bulk solvent is modeled with a Generalized Born approach and the multi-grid method used for Coulombic pairwise particle interactions. It is reported to be faster than a full explicit solvent simulation with the [[Ewald summation|particle mesh Ewald]] (PME) method of electrostatic calculation. + +==Effects not accounted for== +===The hydrophobic effect=== +Models like PB and GB allow estimation of the mean electrostatic free energy but do not account for the (mostly) [[entropy|entropic]] effects arising from solute-imposed constraints on the organization of the water or solvent molecules. This is known as the [[hydrophobic effect]] and is a major factor in the [[protein folding|folding]] process of [[globular protein]]s with [[hydrophobic core]]s. Implicit solvation models may be augmented with a term that accounts for the hydrophobic effect. The most popular way to do this is by taking the solvent accessible surface area (SASA) as a [[Proxy (statistics)|proxy]] of the extent of the hydrophobic effect. Most authors place the extent of this effect between 5 and 45 cal/(Å<sup>2</sup> mol).<ref>{{cite journal | first = K.A. | last = Sharp | authorlink =| coauthors = A. Nicholls, R.F. Fine, B. Honig | year =1991 | month = | title = Reconciling the magnitude of the microscopic and macroscopic hydrophobic effects | journal = Science | volume = 252 | issue = 5002| pages = 106–109 | id =| doi =10.1126/science.2011744 | pmid = 2011744|bibcode = 1991Sci...252..106S }}</ref> Note that this surface area pertains to the solute, while the hydrophobic effect is mostly entropic in nature at physiological temperatures and occurs on the side of the solvent. + +===Viscosity=== +Implicit solvent models such as PB, GB, and SASA lack the viscosity that water molecules impart by randomly colliding and impeding the motion of solutes through their van der Waals repulsion. In many cases, this is desirable because it makes sampling of configurations and [[phase space]] much faster. This acceleration means that more configurations are visited per simulated time unit, on top of whatever CPU acceleration is achieved in comparison to explicit solvent. It can, however, lead to misleading results when kinetics are of interest. + +Viscosity may be added back by using [[Langevin dynamics]] instead of [[Hamiltonian mechanics|Hamiltonian dynamics]] and choosing an appropriate damping constant for the particular solvent.<ref>Tamar Schlick (2002). ''Molecular Modeling and Simulation: An Interdisciplinary Guide'' Interdisciplinary Applied Mathematics: Mathematical Biology. Springer-Verlag New York, NY, ISBN 0-387-95404-X</ref> Recent work has also been done developing thermostats based on fluctuating hydrodynamics to account for momentum transfer through the solvent and related thermal fluctuations. <ref> Yaohong Wang, Jon Karl Sigurdsson, Paul J. Atzberger (2012). ''Dynamic Implicit-Solvent Coarse-Grained Models of Lipid Bilayer Membranes : Fluctuating Hydrodynamics Thermostat'' arXiv:1212.0449, http://arxiv.org/abs/1212.0449. 
</ref> One should keep in mind, though, that the folding rate of proteins does not depend linearly on viscosity for all regimes.<ref name="pmid12868108">{{cite journal | author = Zagrovic B, Pande V | title = Solvent viscosity dependence of the folding rate of a small protein: distributed computing study | journal = J Comput Chem | volume = 24 | issue = 12 | pages = 1432–6 |date=September 2003 | pmid = 12868108 | doi = 10.1002/jcc.10297 }}</ref> + +===Hydrogen bonds with solvent=== +Solute-solvent [[hydrogen bond]]s in the first [[solvation shell]] are important for solubility of organic molecules and especially [[ions]]. Their average energetic contribution can be reproduced with an implicit solvent model.<ref name="pmid21438609">{{cite journal | author = Lomize AL, Pogozheva ID, Mosberg HI | title = Anisotropic solvent model of the lipid bilayer. 1. Parameterization of long-range electrostatics and first solvation shell effects | journal = J Chem Inf Model | volume = 51 | issue = 4 | pages = 918–29 |date=April 2011 | pmid = 21438609 | doi = 10.1021/ci2000192 | pmc=3089899}}</ref><ref name="pmid21438606">{{cite journal | author = Lomize AL, Pogozheva ID, Mosberg HI | title = Anisotropic solvent model of the lipid bilayer. 2. Energetics of insertion of small molecules, peptides and proteins in membranes | journal = J Chem Inf Model | volume = 51 | issue = 4 | pages = 930–46 |date=April 2011 | pmid = 21438606 | doi = 10.1021/ci200020k | url = | pmc=3091260}}</ref> + +==Problems and limitations== +All implicit solvation models rest on the simple idea that nonpolar atoms of a [[solution|solute]] tend to cluster together or occupy nonpolar media, whereas polar and charged groups of the solute tend to remain in water. However, it is important to properly balance the opposite energy contributions from different types of atoms. Several important points have been discussed and investigated over the years. + +===Choice of model solvent=== +It has been noted that wet [[1-octanol]] solution is a poor approximation of proteins or biological membranes because it contains ~2M of water, and that [[cyclohexane]] would be a much better approximation.<ref name="Radzicka">{{cite journal | author = Radzicka A., Wolfenden R. | year = 1988 | title = Comparing the polarities of the amino acids: side-chain distribution coefficients between the vapor phase, cyclohexane, 1-octanol, and neutral aqueous solution | url = | journal = Biochemistry | volume = 27 | issue = | pages = 1664–1670 }}</ref> Investigation of passive permeability barriers for different compounds across lipid bilayers led to the conclusion that 1,9-decadiene can serve as a good approximation of the bilayer interior,<ref name="pmid11920749">{{cite journal | author = Mayer PT, Anderson BD | title = Transport across 1,9-decadiene precisely mimics the chemical selectivity of the barrier domain in egg lecithin bilayers | journal = J Pharm Sci | volume = 91 | issue = 3 | pages = 640–6 |date=March 2002 | pmid = 11920749 | doi = 10.1002/jps.10067| url = | issn = 0022-3549 }}</ref> whereas [[1-octanol]] was a very poor approximation.<ref name="pmid3735402">{{cite journal | author = Walter A, Gutknecht J | title = Permeability of small nonelectrolytes through lipid bilayer membranes | journal = J. Membr. Biol.
| volume = 90 | issue = 3 | pages = 207–17 | year = 1986 | pmid = 3735402 | doi = 10.1007/BF01870127| url = | issn = 0022-2631 }}</ref> A set of solvation parameters derived for protein interior from [[protein engineering]] data was also different from octanol scale: it was close to [[cyclohexane]] scale for nonpolar atoms but intermediate between cyclohexane and octanol scales for polar atoms.<ref name="Lomize_2002">{{cite journal | author = Lomize AL, Reibarkh MY, Pogozheva ID | title = Interatomic potentials and solvation parameters from protein engineering data for buried residues | journal = Protein Sci. | volume = 11 | issue = 8 | pages = 1984–2000 |date=August 2002 | pmid = 12142453 | pmc = 2373680 | doi = 10.1110/ps.0307002 | issn = 0961-8368 | format = Free full text }}</ref> Thus, different atomic solvation parameters should be applied for modeling of protein folding and protein-membrane binding. This issue remains controversial. The original idea of the method was to derive all solvation parameters directly from experimental [[partition coefficient]]s of organic molecules, which allows calculation of solvation free energy. However, some of the recently developed electrostatic models use ''ad hoc'' values of 20 or 40 cal/(Å<sup>2</sup> mol) for ''all'' types of atoms. The non-existent “hydrophobic” interactions of polar atoms are overridden by large electrostatic energy penalties in such models. + +===Solid-state applications=== +Strictly speaking, ASA-based models should only be applied to describe ''solvation'', i.e. energetics of transfer between [[liquid]] or uniform media. It is possible to express van der Waals interaction energies in the [[solid]] state in the surface energy units. This was sometimes done for interpreting [[protein engineering]] and [[ligand binding]] energetics,<ref name="pmid1553543">{{cite journal | author = Eriksson AE, Baase WA, Zhang XJ, Heinz DW, Blaber M, Baldwin EP, Matthews BW | title = Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect | journal = Science | volume = 255 | issue = 5041 | pages = 178–83 |date=January 1992 | pmid = 1553543 | doi = 10.1126/science.1553543| url = | issn = 0036-8075 |bibcode = 1992Sci...255..178E }}</ref> which leads to “solvation” parameter for [[aliphatic]] carbon of ~40 cal/(Å<sup>2</sup> mol),<ref name="pmid11297670">{{cite journal | author = Funahashi J, Takano K, Yutani K | title = Are the parameters of various stabilization factors estimated from mutant human lysozymes compatible with other proteins? | journal = Protein Eng. | volume = 14 | issue = 2 | pages = 127–34 |date=February 2001 | pmid = 11297670 | doi = 10.1093/protein/14.2.127| url =http://peds.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=11297670 | issn = 0269-2139 | format = Free full text }}</ref> which is 2 times bigger than ~20 cal/(Å<sup>2</sup> mol) obtained for transfer from water to liquid hydrocarbons, because the parameters derived by such fitting represent sum of the hydrophobic energy (i.e. 20 cal/Å<sup>2</sup> mol) and energy of van der Waals attractions of aliphatic groups in the solid state, which corresponds to [[heat of fusion|fusion enthalpy]] of [[alkanes]].<ref name="Lomize_2002" /> Unfortunately, the simplified ASA-based model can not capture the "specific" distance-dependent interactions between different types of atoms in the solid state which are responsible for clustering of atoms with similar polarities in protein structures and molecular crystals. 
Parameters of such interatomic interactions, together with atomic solvation parameters for the protein interior, have been approximately derived from [[protein engineering]] data.<ref name="Lomize_2002"/> The implicit solvation model breaks down when solvent molecules associate strongly with binding cavities in a protein, so that the protein and the solvent molecules form a continuous solid body.<ref name="Lomize_2004">{{cite journal | author = Lomize AL, Pogozheva ID, Mosberg HI | title = Quantification of helix–helix binding affinities in micelles and lipid bilayers | journal = Protein Sci. | volume = 13 | issue = 10 | pages = 2600–12 |date=October 2004 | pmid = 15340167 | pmc = 2286553 | doi = 10.1110/ps.04850804 | issn = 0961-8368 | format = Free full text }}</ref> On the other hand, this model can be successfully applied for describing transfer from water to the ''[[fluid]]'' lipid bilayer.<ref name="Lomize_2006">{{cite journal | author = Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI | title = Positioning of proteins in membranes: A computational approach | journal = Protein Sci. | volume = 15 | issue = 6 | pages = 1318–33 |date=June 2006 | pmid = 16731967 | pmc = 2242528 | doi = 10.1110/ps.062126106 | issn = 0961-8368 | format = Free full text }}</ref> + +===Importance of extensive testing=== +More testing is needed to evaluate the performance of different implicit solvation models and parameter sets. They are often tested only for a small set of molecules with very simple structure, such as hydrophobic and amphiphilic [[alpha helix|α-helices]]. This method was rarely tested for hundreds of protein structures.<ref name="Lomize_2006"/> + +===Treatment of ionization effects=== +Ionization of charged groups has been neglected in continuum [[electrostatic]] models of implicit solvation, as well as in standard [[molecular mechanics]] and [[molecular dynamics]]. The transfer of an ion from water to a nonpolar medium with [[dielectric constant]] of ~3 (lipid bilayer) or 4 to 10 (interior of proteins) costs significant energy, as follows from the [[Max Born|Born]] equation and from experiments. However, since the charged protein residues are ionizable, they simply lose their charges in the nonpolar environment, which costs relatively little at the neutral [[pH]]: ~4 to 7 kcal/mol for Asp, Glu, Lys, and Arg [[amino acid]] residues, according to the [[Henderson-Hasselbalch equation]], ''ΔG = 2.3RT (pH - pK)''. 
The low energetic costs of such ionization effects have indeed been observed for protein mutants with buried ionizable residues.<ref name="pmid1747370">{{cite journal | author = Dao-pin S, Anderson DE, Baase WA, Dahlquist FW, Matthews BW | title = Structural and thermodynamic consequences of burying a charged residue within the hydrophobic core of T4 lysozyme | journal = Biochemistry | volume = 30 | issue = 49 | pages = 11521–9 |date=December 1991 | pmid = 1747370 | doi = 10.1021/bi00113a006| url = | issn = 0006-2960 }}</ref> and hydrophobic α-helical peptides in membranes with a single ionizable residue in the middle.<ref name="pmid12641459">{{cite journal | author = Caputo GA, London E | title = Cumulative effects of amino acid substitutions and hydrophobic mismatch upon the transmembrane stability and conformation of hydrophobic alpha-helices | journal = Biochemistry | volume = 42 | issue = 11 | pages = 3275–85 |date=March 2003 | pmid = 12641459 | doi = 10.1021/bi026697d | url = | issn = 0006-2960 }}</ref> However, all electrostatic methods, such as PB, GB, or GBSA assume that ionizable groups remain charged in the nonpolar environments, which leads to grossly overestimated electrostatic energy. In the simplest [[accessible surface area]]-based models, this problem was treated using different solvation parameters for charged atoms or Henderson-Hasselbalch equation with some modifications.<ref name="Lomize_2006"/> However even the latter approach does not solve the problem. Charged residues can remain charged even in the nonpolar environment if they are involved in intramolecular ion pairs and H-bonds. Thus, the energetic penalties can be overestimated even using the Henderson-Hasselbalch equation. More rigorous theoretical methods describing such ionization effects have been developed,<ref name="pmid9615168">{{cite journal | author = Schaefer M, van Vlijmen HW, Karplus M | title = Electrostatic contributions to molecular free energies in solution | journal = Adv. Protein Chem. | volume = 51 | issue = | pages = 1–57 | year = 1998 | pmid = 9615168 | doi = 10.1016/S0065-3233(08)60650-6| url = | issn = 0065-3233 | series = Advances in Protein Chemistry | isbn = 978-0-12-034251-8 }}</ref> and there are ongoing efforts to incorporate such methods into the implicit solvation models.<ref name="pmid15051331">{{cite journal | author = García-Moreno E B, Fitch CA | title = Structural interpretation of pH and salt-dependent processes in proteins with computational methods | journal = Meth. Enzymol. 
| volume = 380 | issue = | pages = 20–51 | year = 2004 | pmid = 15051331 | doi = 10.1016/S0076-6879(04)80002-8 | url = | issn = 0076-6879 | series = Methods in Enzymology | isbn = 978-0-12-182784-7 }}</ref> + +==See also== +{{columns-list|2| +* [[Polarizable continuum model]] +* [[COSMO solvation model]] +* [[Molecular dynamics]] +* [[Molecular mechanics]] +* [[Water model]] +* [[Force field (chemistry)|Force field]]s in chemistry +* [[Force field implementation]] +* [[Poisson's equation]] +* [[Accessible surface area]] +* [[List of software for molecular mechanics modeling]] +}} + +==References== +{{reflist|2}} + +{{DEFAULTSORT:Implicit solvation}} +[[Category:Molecular modelling]] +[[Category:Computational chemistry]] +[[Category:Molecular dynamics]] +[[Category:Protein structure]] + m65tnpdv0aaa47drps6hi0b68ppnh1n + + + + Partition function (statistical mechanics) + 0 + 3896 + + 3897 + 2013-11-13T15:44:10Z + + Nanite + 0 + + /* Grand canonical partition function */ + wikitext + text/x-wiki + {{About|statistical mechanics|other uses|partition function (disambiguation)}} + +{{statistical mechanics}} + +In [[physics]], a '''partition function''' describes the [[statistics|statistical]] properties of a system in [[thermodynamic equilibrium]]. They are [[function (mathematics)|functions]] of [[temperature]] and other parameters, such as the [[volume]] enclosing a gas. Most of the aggregate [[thermodynamics|thermodynamic]] variables of the system, such as the [[energy|total energy]], [[Thermodynamic free energy|free energy]], [[entropy]], and [[pressure]], can be expressed in terms of the partition function or its [[derivative]]s. + +There are actually several different types of partition functions, each corresponding to different types of [[statistical ensemble]] (or, equivalently, different types of [[Thermodynamic free energy|free energy]].) The '''canonical partition function''' applies to a [[canonical ensemble]], in which the system is allowed to exchange [[heat]] with the environment at fixed temperature, volume, and [[number of particles]]. The '''grand canonical partition function''' applies to a [[grand canonical ensemble]], in which the system can exchange both heat and particles with the environment, at fixed temperature, volume, and [[chemical potential]]. Other types of partition functions can be defined for different circumstances; see [[partition function (mathematics)]] for generalizations. + +== Canonical partition function == + +=== Definition === +As a beginning assumption, assume that a thermodynamically large system is in thermal contact with the environment, with a temperature ''T'', and both the volume of the system and the number of constituent particles are fixed. This kind of system is called a [[canonical ensemble]]. Let us label with ''s'' = 1, 2, 3, ... the ''exact'' states ([[Microstate (statistical mechanics)|microstates]]) that the system can occupy, and denote the total energy of the system when it is in microstate ''s'' as ''E<sub>s</sub>''. Generally, these microstates can be regarded as analogous to discrete [[quantum state]]s of the system. + +The '''canonical partition function''' is + +: <math> Z = \sum_{s} \mathrm{e}^{- \beta E_s}</math> , + +where the "inverse temperature", ''[[Thermodynamic beta|&beta;]]'', is conventionally defined as + +: <math>\beta \equiv \frac{1}{k_BT}</math> + +with ''k''<sub>B</sub> denoting [[Boltzmann's constant]]. The [[Exponential function|exponential]] factor exp(−''βE<sub>s</sub>'') is known as the [[Boltzmann factor]]. 
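+
+As a minimal numerical sketch (not part of the formal development; the two energy levels and the temperature below are arbitrary example values), the sum defining ''Z'' and the corresponding Boltzmann weights can be evaluated directly for a small set of discrete microstate energies:
+<syntaxhighlight lang="python">
+import numpy as np
+
+k_B = 1.380649e-23  # Boltzmann constant in J/K
+
+def canonical_partition_function(energies, temperature):
+    """Z = sum_s exp(-E_s / (k_B * T)) for a list of microstate energies in joules."""
+    beta = 1.0 / (k_B * temperature)
+    return float(np.sum(np.exp(-beta * np.asarray(energies))))
+
+def boltzmann_weights(energies, temperature):
+    """Normalized weights exp(-beta * E_s) / Z for each microstate."""
+    beta = 1.0 / (k_B * temperature)
+    w = np.exp(-beta * np.asarray(energies))
+    return w / w.sum()
+
+# Hypothetical two-level system: ground state at 0 J, excited state at 1.0e-21 J, at T = 300 K.
+levels = [0.0, 1.0e-21]
+print(canonical_partition_function(levels, 300.0))  # about 1.79
+print(boltzmann_weights(levels, 300.0))             # about [0.56, 0.44]
+</syntaxhighlight>
+For degenerate energy levels, the same sketch applies with each Boltzmann factor multiplied by the corresponding degeneracy ''g''<sub>''j''</sub>, as in the level form of the sum given below.
+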
(For a detailed derivation of this result, see [[canonical ensemble]]). In systems with multiple quantum states ''s'' sharing the same ''E<sub>s</sub>'', it is said that the energy levels of the system are degenerate. In the case of degenerate energy levels, we can write the partition function in terms of the contribution from energy levels (indexed by ''j'' ) as follows: + +: <math> Z = \sum_{j} g_j \cdot \mathrm{e}^{- \beta E_j}</math>, + +where ''g''<sub>j</sub> is the degeneracy factor, or number of quantum states ''s'' which have the same energy level defined by ''E<sub>j</sub>'' = ''E<sub>s</sub>''. + +The above treatment applies to ''quantum'' [[statistical mechanics]], where a physical system inside a finite-sized box will typically have a discrete set of [[energy]] [[eigenstates]], which we can use as the states ''s'' above. In ''classical'' statistical mechanics, it is not really correct to express the partition function as a sum of discrete terms, as we have done. In [[classical mechanics]], the position and momentum variables of a particle can vary continuously, so the set of microstates is actually [[uncountable set|uncountable]]. In this case we must describe the partition function using an [[integral]] rather than a sum. For instance, the partition function of a gas of ''N'' identical classical particles is + +: <math>Z=\frac{1}{N! h^{3N}} \int \, \exp[-\beta H(p_1 \cdots p_N, x_1 +\cdots x_N)] \; d^3p_1 \cdots d^3p_N \, d^3x_1 \cdots d^3x_N </math> + +where +:''p<sub>i</sub>'' indicate particle momenta +:''x<sub>i</sub>'' indicate particle positions +:''d''<sup>3</sup> is a shorthand notation serving as a reminder that the ''p<sub>i</sub>'' and ''x<sub>i</sub>'' are vectors in three dimensional space, and +:''H'' is the classical [[Hamiltonian mechanics|Hamiltonian]]. + +The reason for the [[factorial]] factor ''N''! is discussed [[#Partition functions of subsystems|below]]. For simplicity, we will use the discrete form of the partition function in this article. Our results will apply equally well to the continuous form. The extra constant factor introduced in the denominator was introduced because, unlike the discrete form, the continuous form shown above is not [[dimensionless]]. To make it into a dimensionless quantity, we must divide it by ''h''<sup>3''N''</sup> where ''h'' is some quantity with units of [[action (physics)|action]] (usually taken to be [[Planck's constant]]). + +In quantum mechanics, the partition function can be more formally written as a [[trace (linear algebra)|trace]] over the [[mathematical formulation of quantum mechanics|state space]] (which is independent of the choice of [[basis (linear algebra)|basis]]): + +: <math>Z = \operatorname{tr} ( \mathrm{e} ^{-\beta\hat{H}} )</math>&nbsp;, + +where ''Ĥ'' is the [[Hamiltonian (quantum mechanics)|quantum Hamiltonian operator]]. The exponential of an operator can be defined using the [[Characterizations of the exponential function|exponential power series]]. +The classical form of ''Z'' is recovered when the trace is expressed in terms +of [[coherent state]]s +<ref>J. R. Klauder, B.-S. Skagerstam, +''Coherent States --- Applications in Physics and Mathematical Physics'', +World Scientific, 1985, p. 71-73.</ref> +and when quantum-mechanical [[uncertainty principle|uncertainties]] in the position and momentum of a particle +are regarded as negligible. 
Formally, one inserts under the trace for each degree of freedom the identity: +:<math> + \boldsymbol{1} = \int |x,p\rangle\,\langle x,p|~\frac{ dx\, dp}{h} +</math> +where {{!}}''x'', ''p''{{rangle}} is a normalised Gaussian wavepacket centered at +position ''x'' and momentum ''p''. Thus, +:<math> + Z = \int \operatorname{tr} \left( \mathrm{e}^{-\beta\hat{H}} |x,p\rangle\,\langle x,p| \right) + \frac{ dx\, dp}{h} + = \int\langle x,p| \mathrm{e} ^{-\beta\hat{H}}|x,p\rangle ~\frac{ dx\, dp}{h} +</math> +A coherent state is an approximate eigenstate +of both operators <math> \hat{x} </math> and <math> \hat{p} </math>, +hence also of the Hamiltonian ''Ĥ'', with errors of the size of +the uncertainties. If Δ''x'' and Δ''p'' can be regarded as +zero, the action of ''Ĥ'' reduces to multiplication by the classical +Hamiltonian, and ''Z'' reduces to the classical configuration integral. + +=== Meaning and significance === + +It may not be obvious why the partition function, as we have defined it above, is an important quantity. First, let us consider what goes into it. The partition function is a function of the temperature ''T'' and the microstate energies ''E''<sub>1</sub>, ''E''<sub>2</sub>, ''E''<sub>3</sub>, etc. The microstate energies are determined by other thermodynamic variables, such as the number of particles and the volume, as well as microscopic quantities like the mass of the constituent particles. This dependence on microscopic variables is the central point of statistical mechanics. With a model of the microscopic constituents of a system, one can calculate the microstate energies, and thus the partition function, which will then allow us to calculate all the other thermodynamic properties of the system. + +The partition function can be related to thermodynamic properties because it has a very important statistical meaning. The probability ''P<sub>s</sub>'' that the system occupies microstate ''s'' is + +: <math>P_s = \frac{1}{Z} \mathrm{e}^{- \beta E_s}. </math> + +The partition function thus plays the role of a normalizing constant (note that it does ''not'' depend on ''s''), ensuring that the probabilities sum up to one: + +: <math>\sum_s P_s = \frac{1}{Z} \sum_s \mathrm{e}^{- \beta E_s} = \frac{1}{Z} Z += 1. </math> + +This is the reason for calling ''Z'' the "partition function": it encodes how the probabilities are partitioned among the different microstates, based on their individual energies. The letter ''Z'' stands for the [[German language|German]] word ''Zustandssumme'', "sum over states". This notation also implies another important meaning of the partition function of a system: it counts the (weighted) number of states a system can occupy. Hence if all states are equally probable (equal energies) the partition function is the total number of possible states. Often this is the practical importance of Z. + +=== Calculating the thermodynamic total energy === + +In order to demonstrate the usefulness of the partition function, let us calculate the thermodynamic value of the total energy. 
This is simply the [[expected value]], or [[ensemble average]] for the energy, which is the sum of the microstate energies weighted by their probabilities: + +: <math>\langle E \rangle = \sum_s E_s P_s = \frac{1}{Z} \sum_s E_s +e^{- \beta E_s} = - \frac{1}{Z} \frac{\partial}{\partial \beta} +Z(\beta, E_1, E_2, \cdots) = - \frac{\partial \ln Z}{\partial \beta} +</math> + +or, equivalently, + +: <math>\langle E\rangle = k_B T^2 \frac{\partial \ln Z}{\partial T}.</math> + +Incidentally, one should note that if the microstate energies depend on a parameter λ in the manner + +: <math>E_s = E_s^{(0)} + \lambda A_s \qquad \mbox{for all}\; s </math> + +then the expected value of ''A'' is + +: <math>\langle A\rangle = \sum_s A_s P_s = -\frac{1}{\beta} +\frac{\partial}{\partial\lambda} \ln Z(\beta,\lambda).</math> + +This provides us with a method for calculating the expected values of many microscopic quantities. We add the quantity artificially to the microstate energies (or, in the language of quantum mechanics, to the Hamiltonian), calculate the new partition function and expected value, and then set ''λ'' to zero in the final expression. This is analogous to the [[source field]] method used in the [[path integral formulation]] of [[quantum field theory]]. + +=== Relation to thermodynamic variables === + +In this section, we will state the relationships between the partition function and the various thermodynamic parameters of the system. These results can be derived using the method of the previous section and the various thermodynamic relations. + +As we have already seen, the thermodynamic energy is + +: <math>\langle E \rangle = - \frac{\partial \ln Z}{\partial \beta}.</math> + +The [[variance]] in the energy (or "energy fluctuation") is + +: <math>\langle (\Delta E)^2 \rangle \equiv \langle (E - \langle +E\rangle)^2 \rangle = \frac{\partial^2 \ln Z}{\partial \beta^2}.</math> + +The [[heat capacity]] is + +: <math>C_v = \frac{\partial \langle E\rangle}{\partial T} = \frac{1}{k_B T^2} \langle (\Delta E)^2 \rangle.</math> + +The [[entropy]] is + +: <math>S \equiv -k_B\sum_s P_s\ln P_s= k_B (\ln Z + \beta \langle E\rangle)=\frac{\partial}{\partial T}(k_B T \ln Z) =-\frac{\partial A}{\partial T}</math> + +where ''A'' is the [[Helmholtz free energy]] defined as ''A'' = ''U'' − ''TS'', where ''U'' = {{langle}}''E''{{rangle}} is the total energy and ''S'' is the [[entropy]], so that + +: <math>A = \langle E\rangle -TS= - k_B T \ln Z.</math> + +=== Partition functions of subsystems === + +Suppose a system is subdivided into ''N'' sub-systems with negligible interaction energy, that is, we can assume the particles are essentially non-interacting. If the partition functions of the sub-systems are ''ζ''<sub>1</sub>, ''ζ''<sub>2</sub>, ..., ''ζ''<sub>N</sub>, then the partition function of the entire system is the ''product'' of the individual partition functions: + +: <math>Z =\prod_{j=1}^{N} \zeta_j.</math> + +If the sub-systems have the same physical properties, then their partition functions are equal, ζ<sub>1</sub> = ζ<sub>2</sub> = ... = ζ, in which case + +: <math>Z = \zeta^N.</math> + +However, there is a well-known exception to this rule. If the sub-systems are actually [[identical particles]], in the [[quantum mechanics|quantum mechanical]] sense that they are impossible to distinguish even in principle, the total partition function must be divided by a ''N''! 
(''N'' [[factorial]]): + +: <math>Z = \frac{\zeta^N}{N!}.</math> + +This is to ensure that we do not "over-count" the number of microstates. While this may seem like a strange requirement, it is actually necessary to preserve the existence of a thermodynamic limit for such systems. This is known as the [[Gibbs paradox]]. + +==Grand canonical partition function== +{{main|Grand canonical ensemble}} + +We can define a '''grand canonical partition function''' for a [[grand canonical ensemble]], which describes the statistics of a constant-volume system that can exchange both heat and particles with a reservoir. +The reservoir has a constant temperature ''T'', and a [[chemical potential]] ''μ''. + +The grand canonical partition function, denoted by <math>\mathcal{Z}</math>, is the following sum over [[microstate (statistical mechanics)|microstates]] +:<math> \mathcal{Z}(\mu, V, T) = \sum_{i} \exp((N_i\mu - E_i)/k_B T). </math> +Here, each microstate is labelled by <math>i</math>, and has total particle number <math>N_i</math> and total energy <math>E_i</math>. +This partition function is closely related to the [[Grand potential]], <math>\Phi_{\rm G}</math>, by the relation +:<math> -k_B T \ln \mathcal{Z} = \Phi_{\rm G} = \langle E \rangle - TS - \mu \langle N\rangle. </math> +This can be contrasted to the canonical partition function above, which is related instead to the [[Helmholtz free energy]]. + +It is important to note that the number of microstates in the grand canonical ensemble may be much larger than in the canonical ensemble, +since here we consider not only variations in energy but also in particle number. +Again, the utility of the grand canonical partition function is that it is related to the probability that the system is in state <math>i</math>: +:<math> p_i = \frac{1}{\mathcal Z} \exp((N_i\mu - E_i)/k_B T) .</math> + +An important application of the grand canonical ensemble is in deriving exactly the statistics of a non-interacting many-body quantum gas ([[Fermi-Dirac statistics]] for fermions, [[Bose-Einstein statistics]] for bosons), however it is much more generally applicable than that. +The grand canonical ensemble may also be used to describe classical systems, or even interacting quantum gases. + +The grand partition function is sometimes written (equivalently) in terms of alternate variables as<ref>{{cite isbn|9780120831807}}</ref> +:<math> \mathcal{Z}(z, V, T) = \sum_{N_i} z^{N_i} Z(N_i, V, T), </math> +where <math>z \equiv \exp(\mu/kT)</math> is known as the ''activity'' or ''fugacity'' and <math>Z(N_i, V, T)</math> is the canonical partition function. + +==See also== +* [[Partition function (mathematics)]] +* [[Virial theorem]] +* [[Widom insertion method]] + +==References== +<references /> +* Huang, Kerson, "Statistical Mechanics", John Wiley & Sons, New York, 1967. +* A. Isihara, "Statistical Physics", Academic Press, New York, 1971. +* Kelly, James J, [http://www.physics.umd.edu/courses/Phys603/kelly/Notes/IdealQuantumGases.pdf (Lecture notes)] +* L. D. Landau and E. M. Lifshitz, "Statistical Physics, 3rd Edition Part 1", Butterworth-Heinemann, Oxford, 1996. 
+* Vu-Quoc, L., [http://clesm.mae.ufl.edu/wiki.pub/index.php/Configuration_integral_%28statistical_mechanics%29 Configuration integral], 2008 +{{Statistical mechanics topics}} + +[[Category:Concepts in physics]] +[[Category:Partition functions| ]] + ppd8zp33hzsgtsz3masgk0ut077mdt7 + + + + Color balance + 0 + 8807 + + 8808 + 2014-02-04T02:23:34Z + + Monkbot + 0 + + + Fix [[Help:CS1_errors#deprecated_params|CS1 deprecated date parameter errors]] + wikitext + text/x-wiki + {{About|the process applied to still images|the equivalent process applied to video|Color grading}} +[[File:Lily-M7292-As-shot-and-manual.jpg|thumb|right|300px|The left half shows the photo as it came from the digital camera. The right half shows the photo adjusted to make a gray surface neutral in the same light.]] +[[File:Clifton Beach 5.jpg|thumb|right|300px|A seascape photograph at [[Clifton Beach, Tasmania|Clifton Beach]], [[South Arm, Tasmania|South Arm]], [[Tasmania]], Australia. The white balance has been adjusted towards the warm side for creative effect.]] +[[File:ColorChecker100423.jpg|thumb|right|300px|Photograph of a ColorChecker as a reference shot for color balance adjustments.]] +[[File:Government Center Miami color balance comparison.jpg|thumb|right|300px|Two photos of the Stephen P. Clark Government Center building in Miami, Florida taken with a Samsung SL50 point and shoot camera. Left photo shows a "normal", accurate color balance, while the right side shows a "vivid" color balance]] +[[Image:PIA16800-MarsCuriosityRover-MtSharp-ColorVersions-20120823.jpg|thumb|right|300px|Comparison of color versions (raw, natural, white balance) of "[[Aeolis Mons|Mount Sharp]]" on [[Mars]] (August 23, 2012).]] +[[Image:PIA16068 - Mars Curiosity Rover - Aeolis Mons - 20120817.jpg|thumb|right|300px|A white balanced image of "[[Aeolis Mons| Mount Sharp]]" on [[Mars]] (August 8, 2012).]] + +In [[photography]] and [[image processing]], '''color balance''' is the global adjustment of the intensities of the colors (typically red, green, and blue [[primary colors]]). An important goal of this adjustment is to render specific colors – particularly neutral colors – correctly; hence, the general method is sometimes called '''gray balance''', '''neutral balance''', or '''white balance'''. Color balance changes the overall mixture of colors in an image and is used for [[color correction]]; generalized versions of color balance are used to get colors other than neutrals to also appear correct or pleasing. + +Image data acquired by sensors – either [[photographic film|film]] or electronic [[image sensor]]s – must be transformed from the acquired values to new values that are appropriate for color reproduction or display. Several aspects of the acquisition and display process make such color correction essential – including the fact that the acquisition sensors do not match the sensors in the human eye, that the properties of the display medium must be accounted for, and that the ambient viewing conditions of the acquisition differ from the display viewing conditions. 
+ +The color balance operations in popular [[image editing]] applications usually operate directly on the red, green, and blue channel [[pixel]] values,<ref>{{Cite book| title = The Gimp for Linux and Unix | author = Phyllis Davis | publisher = Peachpit Press | year = 2000 | isbn = 0-201-70253-3 | url = http://books.google.com/?id=0sEnoWrMw-gC&pg=PA135&dq=%22color+balance%22+channels | page = 134}}</ref><ref>{{Cite book| title = Adobe Photoshop 6.0 | author = Adobe Creative Team | publisher = Adobe Press | year = 2000 | isbn = 0-201-71016-1 | url = http://books.google.com/?id=MRtx2-0GZc4C&pg=PA277&dq=%22color+balance%22+channels | page = 278 }}</ref> without respect to any color sensing or reproduction model. In shooting film, color balance is typically achieved by using [[color correction filter]]s over the lights or on the camera lens.<ref>{{Cite book| title = Cinematography: Theory and Practice : Imagemaking for Cinematographers, Directors, and Videographers | author = Blain Brown | publisher = Focal Press | year = 2002 | isbn = 0-240-80500-3| url = http://books.google.com/?id=1JL2jFbNPNAC&pg=PA170&dq=%22color+balance%22 | page=170 }}</ref> + +==Generalized color balance== +Sometimes the adjustment to keep neutrals neutral is called ''white balance'', and the phrase ''color balance'' refers to the adjustment that in addition makes other colors in a displayed image appear to have the same general appearance as the colors in an original scene.<ref>{{Cite book| title = Introduction to Color Imaging Science | author = Hsien-Che Lee | publisher = Cambridge University Press | year = 2005 | isbn = 0-521-84388-X | url = http://books.google.com/?id=CzAbJrLin_AC&pg=PA450&vq=color+balance&dq=%22color+balance%22+wandell | page=450 }}</ref> It is particularly important that neutral (gray, [[achromatic]], white) colors in a scene appear neutral in the reproduction. Hence, the special case of balancing the neutral colors (sometimes ''gray balance'', ''neutral balance'', or ''white balance'') is a particularly important – perhaps dominant – element of color balancing. + +Normally, one would not use the phrase ''color balance'' to describe the adjustments needed to account for differences between the sensors and the human eye, or the details of the display primaries. ''Color balance'' is normally reserved to refer to correction for differences in the ambient illumination conditions. However, the algorithms for transforming the data do not always clearly separate out the different elements of the correction. Hence, it can be difficult to assign color balance to a specific step in the color correction process. Moreover, there can be significant differences in the color balancing goal. Some applications are created to produce an accurate rendering – as suggested above. In other applications, the goal of color balancing is to produce a pleasing rendering. This difference also creates difficulty in defining the color balancing processing operations. + +==Illuminant estimation and adaptation== +Most digital cameras have a means to select a color correction based on the type of scene illumination, using either manual illuminant selection, or automatic white balance (AWB), or custom white balance. The algorithm that performs this analysis performs generalized color balancing, known as illuminant adaptation or [[chromatic adaptation]]. + +Many methods are used to achieve color balancing. Setting a button on a camera is a way for the user to indicate to the processor the nature of the scene lighting. 
Another option on some cameras is a button which one may press when the camera is pointed at a [[gray card]] or other neutral object. This "custom white balance" step captures an image of the ambient light, and this information is helpful in controlling color balance. + +There is a large literature on how one might estimate the ambient illumination from the camera data and then use this information to transform the image data. A variety of algorithms have been proposed, and the quality of these have been debated. A few examples and examination of the references therein will lead the reader to many others. Examples are [[Retinex]], an [[artificial neural network]]<ref name="Funt1996">Brian Funt, Vlad Cardei, and Kobus Barnard, "[http://www.cs.sfu.ca/~colour/publications/ARIZONA/arizona_abs.html Learning color constancy]," in ''Proceedings of the Fourth IS&T/SID Color Imaging Conference,'' p 58-60 (1996).</ref> or a [[Bayesian method]].<ref name=Finlayson2001>{{Cite journal + | author = Graham Finlayson, Paul M. Hubel, and Steven Hordley + |date=November 2001 + | title = Color by correlation: a simple, unifying framework for color constancy + | journal = [[IEEE Transactions on Pattern Analysis and Machine Intelligence]] + | volume = 23 + | issue = 11 + | pages = 1209–1221 + | doi = 10.1109/34.969113 + | url = http://www2.cmp.uea.ac.uk/Research/compvis/Papers/FinHorHub_PAMI01.pdf +|format=PDF}}</ref> + +==Color balance and chromatic colors== +Color balancing an image affects not only the neutrals, but other colors as well. An image that is not color balanced is said to have a color cast, as everything in the image appears to have been shifted towards one color or another.<ref name="Yule1967">John A C Yule, ''Principles of Color Reproduction.'' New York: Wiley, 1967.</ref>{{Page needed|date=September 2010}} Color balancing may be thought in terms of removing this color cast. + +Color balance is also related to [[color constancy]]. Algorithms and techniques used to attain color constancy are frequently used for color balancing, as well. Color constancy is, in turn, related to [[chromatic adaptation]]. Conceptually, color balancing consists of two steps: first, determining the [[standard illuminant|illuminant]] under which an image was captured; and second, scaling the components (e.g., R, G, and B) of the image or otherwise transforming the components so they conform to the viewing illuminant. + +Viggiano found that white balancing in the camera's native [[RGB]] tended to produce less color inconstancy (i.e., less distortion of the colors) than in monitor RGB for over 4000 hypothetical sets of camera sensitivities.<ref name="Viggiano2004"/> This difference typically amounted to a factor of more than two in favor of camera RGB. This means that it is advantageous to get color balance right at the time an image is captured, rather than edit later on a monitor. If one must color balance later, balancing the [[Raw image format|raw image data]] will tend to produce less distortion of chromatic colors than balancing in monitor RGB. + +==Mathematics of color balance== +Color balancing is sometimes performed on a three-component image (e.g., [[RGB color model|RGB]]) using a 3x3 [[matrix (mathematics)|matrix]]. This type of transformation is appropriate if the image were captured using the wrong white balance setting on a digital camera, or through a color filter. 
+ +===Scaling monitor R, G, and B=== +In principle, one wants to scale all relative luminances in an image so that objects which are believed to be [[grey|neutral]] appear so. If, say, a surface with <math>R=240</math> was believed to be a white object, and if 255 is the count which corresponds to white, one could multiply all [[red]] values by 255/240. Doing analogously for [[green]] and [[blue]] would result, at least in theory, in a color balanced image. In this type of transformation the 3x3 matrix is a [[diagonal matrix]]. + +: <math>\left[\begin{array}{c} R \\ G \\ B \end{array}\right]=\left[\begin{array}{ccc}255/R'_w & 0 & 0 \\ 0 & 255/G'_w & 0 \\ 0 & 0 & 255/B'_w\end{array}\right]\left[\begin{array}{c}R' \\ G' \\ B' \end{array}\right]</math> + +where <math>R</math>, <math>G</math>, and <math>B</math> are the color balanced red, green, and blue components of a [[pixel]] in the image; <math>R'</math>, <math>G'</math>, and <math>B'</math> are the red, green, and blue components of the image before color balancing, and <math>R'_w</math>, <math>G'_w</math>, and <math>B'_w</math> are the red, green, and blue components of a pixel which is believed to be a white surface in the image before color balancing. This is a simple scaling of the red, green, and blue channels, and is why color balance tools in [[Photoshop]] and the [[GIMP]] have a white eyedropper tool. It has been demonstrated that performing the white balancing in the phosphor set assumed by [[sRGB]] tends to produce large errors in chromatic colors, even though it can render the neutral surfaces perfectly neutral.<ref name="Viggiano2004">J A Stephen Viggiano, "[http://www.acolyte-color.com/papers/EI_2004.pdf Comparison of the accuracy of different white balancing options as quantified by their color constancy]." ''Sensors and Camera Systems for Scientific, Industrial, and Digital Photography Applications V: Proceedings of the SPIE,'' volume 5301. Bellingham, WA: SPIE: the International Society for Optical Engineering, p 323-333 (2004), retrieved online 2008-07-28.</ref> + +===Scaling X, Y, Z=== +If the image may be transformed into [[CIE 1931 color space|CIE XYZ tristimulus values]], the color balancing may be performed there. This has been termed a “wrong von Kries” transformation.<ref name=Terstiege1972>{{Cite journal + | author = Heinz Terstiege + | title = Chromatic adaptation: a state-of-the-art report + | year = 1972 + | journal = Journal of Color Appearance + | volume = 1 + | issue = 4 + | pages = 19–23 (cont. 40) +}}</ref><ref name="Fairchild1998">Mark D Fairchild, ''Color Appearance Models.'' Reading, MA: Addison-Wesley, 1998.</ref> Although it has been demonstrated to offer usually poorer results than balancing in monitor RGB, it is mentioned here as a bridge to other things. 
Mathematically, one computes: + +:<math>\left[\begin{array}{c} X \\ Y \\ Z \end{array}\right]=\left[\begin{array}{ccc}X_w/X'_w & 0 & 0 \\ 0 & Y_w/Y'_w & 0 \\ 0 & 0 & Z_w/Z'_w\end{array}\right]\left[\begin{array}{c}X' \\ Y' \\ Z' \end{array}\right]</math> + +where <math>X</math>, <math>Y</math>, and <math>Z</math> are the color-balanced tristimulus values; <math>X_w</math>, <math>Y_w</math>, and <math>Z_w</math> are the tristimulus values of the viewing illuminant (the white point to which the image is being transformed to conform to); <math>X'_w</math>, <math>Y'_w</math>, and <math>Z'_w</math> are the tristimulus values of an object believed to be white in the un-color-balanced image, and <math>X'</math>, <math>Y'</math>, and <math>Z'</math> are the tristimulus values of a pixel in the un-color-balanced image. If the tristimulus values of the monitor primaries are in a matrix <math>\mathbf{P}</math> so that: + +:<math>\left[\begin{array}{c} X \\ Y \\ Z \end{array}\right]=\mathbf{P}\left[\begin{array}{c}L_R \\ L_G \\ L_B \end{array}\right]</math> + +where <math>L_R</math>, <math>L_G</math>, and <math>L_B</math> are the un-[[gamma correction|gamma corrected]] monitor RGB, one may use: + +:<math>\left[\begin{array}{c} L_R \\ L_G \\ L_B \end{array}\right]=\mathbf{P^{-1}}\left[\begin{array}{ccc}X_w/X'_w & 0 & 0 \\ 0 & Y_w/Y'_w & 0 \\ 0 & 0 & Z_w/Z'_w\end{array}\right]\mathbf{P}\left[\begin{array}{c}L_{R'} \\ L_{G'} \\ L_{B'} \end{array}\right]</math> + +===Von Kries's method=== +[[Johannes von Kries]], whose theory of [[rod cell|rods]] and three color-sensitive [[cone cell|cone]] types in the [[retina]] has survived as the dominant explanation of color sensation for over 100 years, motivated the method of converting color to the [[LMS color space]], representing the effective stimuli for the Long-, Medium-, and Short-wavelength cone types that are modeled as adapting independently. A 3x3 matrix converts RGB or XYZ to LMS, and then the three LMS primary values are scaled to balance the neutral; the color can then be converted back to the desired final [[color space]]:<ref name=Sharma>{{Cite book| title = Digital Color Imaging Handbook | author = Gaurav Sharma| url = http://books.google.com/?id=AkByHKRGTsQC&pg=PA153&dq=%22von+Kries%22 | publisher = [[CRC Press]] | year = 2003 | isbn = 0-8493-0900-X | page=153 }}</ref> + +:<math>\left[\begin{array}{c} L \\ M \\ S \end{array}\right]=\left[\begin{array}{ccc}1/L'_w & 0 & 0 \\ 0 & 1/M'_w & 0 \\ 0 & 0 & 1/S'_w\end{array}\right]\left[\begin{array}{c}L' \\ M' \\ S' \end{array}\right]</math> + +where <math>L</math>, <math>M</math>, and <math>S</math> are the color-balanced LMS cone tristimulus values; <math>L'_w</math>, <math>M'_w</math>, and <math>S'_w</math> are the tristimulus values of an object believed to be white in the un-color-balanced image, and <math>L'</math>, <math>M'</math>, and <math>S'</math> are the tristimulus values of a pixel in the un-color-balanced image. 
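+
+The diagonal scalings above all have the same computational shape: each channel is multiplied by a fixed factor computed from a reference white. The following short sketch is illustrative only (the 8-bit white level of 255 and the sample white-patch values are assumptions, and conversion into and out of the chosen working space is omitted); it implements the monitor-RGB form, and setting ''white_level'' to 1 gives the von Kries normalization in which the reference white maps to unit values.
+<syntaxhighlight lang="python">
+import numpy as np
+
+def diagonal_white_balance(image, white, white_level=255.0):
+    """Channel-wise (diagonal-matrix) white balance.
+
+    image       : array of shape (..., 3) holding linear R', G', B' (or L', M', S') values
+    white       : the three channel values of a surface believed to be white or neutral
+    white_level : the count that should correspond to white after balancing (255 for 8-bit data)
+    """
+    scale = white_level / np.asarray(white, dtype=float)  # diagonal entries, e.g. 255 / R'_w
+    return np.clip(image * scale, 0.0, white_level)       # broadcasting scales each channel
+
+# Illustrative values: a patch measured as (240, 252, 230) is believed to be white.
+pixel = np.array([200.0, 210.0, 190.0])
+print(diagonal_white_balance(pixel, [240.0, 252.0, 230.0]))  # about [212.5, 212.5, 210.7]
+</syntaxhighlight>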
+ +Matrices to convert to LMS space were not specified by von Kries, but can be derived from CIE color matching functions and LMS color matching functions when the latter are specified; matrices can also be found in reference books.<ref name=Sharma/> + +===Scaling camera RGB=== +By Viggiano's measure, and using his model of gaussian camera spectral sensitivities, most camera RGB spaces performed better than either monitor RGB or XYZ.<ref name="Viggiano2004"/> If the camera's raw RGB values are known, one may use the 3x3 diagonal matrix: + +: <math>\left[\begin{array}{c} R \\ G \\ B \end{array}\right]=\left[\begin{array}{ccc}255/R'_w & 0 & 0 \\ 0 & 255/G'_w & 0 \\ 0 & 0 & 255/B'_w\end{array}\right]\left[\begin{array}{c}R' \\ G' \\ B' \end{array}\right]</math> + +and then convert to a working RGB space such as [[sRGB]] or [[Adobe RGB]] after balancing. +<!-- in progress! +However, if one has already converted to monitor RGB, one may still work in camera RGB if a 3x3 [[regular matrix]] <math>\mathbf{A}</math> characterizes the camera's color mixing behavior reasonably well, so that: + +: <math>\left[\begin{array}{c} X \\ Y \\ Z \end{array}\right]\approx\mathbf{A}\left[\begin{array}{c}L_R \\ L_G \\ L_B \end{array}\right]</math> + +(This matrix is included in the [[ICC profile]] for some cameras.<ref name="ICC01_2006">International Color Consortium, ''Specification ICC.1:2004-10 (Profile version 4.2.0.0) Image technology colour management — Architecture, profile format, and data structure'', (2006).</ref>) If this matrix is known, one computes: + +:<math>\left[\begin{array}{c} L_R \\ L_G \\ L_B \end{array}\right]=\mathbf{P^{-1}A^{-1}}\left[\begin{array}{ccc}R_w/R'_w & 0 & 0 \\ 0 & G_w/G'_w & 0 \\ 0 & 0 & B_w/B'_w\end{array}\right]\mathbf{A\cdotP}\left[\begin{array}{c}L_{R'} \\ L_{G'} \\ L_{B'} \end{array}\right]</math> + +where <math>\mathbf{P}</math> is the phosphor matrix mentioned in the previous section; <math>L_{Rw}</math, <math>G_w</math, <math>B_w</math are the +--> + +===Preferred chromatic adaptation spaces=== +Comparisons of images balanced by diagonal transforms in a number of different RGB spaces have identified several such spaces that work better than others, and better than camera or monitor spaces, for chromatic adaptation, as measured by several color appearance models; the systems that performed statistically as well as the best on the majority of the image test sets used were the "Sharp", "Bradford", "CMCCAT", and "ROMM" spaces.<ref>{{Cite journal| url = http://infoscience.epfl.ch/getfile.py?recid=34049&mode=best | title = Chromatic Adaptation Performance of Different RGB Sensors | author = Sabine Süsstrunk, Jack Holm, and Graham D. Finlayson | journal = IS&T/SPIE Electronic Imaging | volume = 4300|date=January 2001 }}</ref> + +===General illuminant adaptation=== +The best color matrix for adapting to a change in illuminant is not necessarily a diagonal matrix in a fixed color space. It has long been known that if the space of illuminants can be described as a linear model with ''N'' basis terms, the proper color transformation will be the weighted sum of ''N'' fixed linear transformations, not necessarily consistently diagonalizable.<ref>{{Cite book| author = Laurence T. Maloney and Brain A. Wandell | chapter = Color constancy: a method for recovering surface spectral reflectance | title = Readings in Computer Vision | editor = Martin A. 
Fischler and Oscar Firschein | year = 1987 | publisher = Morgan-Kaufmann | isbn = 0-934613-33-8 | url = http://books.google.com/?id=W5hLHUI8U-kC&pg=PA293&dq=maloney+wandell }}</ref> + +==See also== +* [[Color cast]] +* [[Color temperature]] +* [[Gamma correction]] +* [[White point]] + +==References== +{{reflist|35em}} + +==External links== +* [http://www.nikondigital.org/articles/white_balance.htm White Balance] - Intro at nikondigital.org +* [http://www.photoxels.com/tutorial_white-balance.html Understanding White Balance] - Tutorial +* [http://www.ipol.im/pub/algo/lmps_simplest_color_balance/ Affine color balance with saturation, with code and on-line demonstration] + +{{DEFAULTSORT:Color Balance}} +[[Category:Color]] +[[Category:Image processing]] + aattkg5sj9eaazy10042j0awqpam7k4 + + + + Hilbert's theorem (differential geometry) + 0 + 17335 + + 17336 + 2013-03-17T03:41:24Z + + Addbot + 0 + + + [[User:Addbot|Bot:]] Migrating 2 interwiki links, now provided by [[Wikipedia:Wikidata|Wikidata]] on [[d:q2008549]] + wikitext + text/x-wiki + In [[differential geometry]], '''Hilbert's theorem''' (1901) states that there exists no complete [[regular surface]] <math>S</math> of constant negative [[gaussian curvature]] <math>K</math> [[immersion (mathematics)|immersed]] in <math>\mathbb{R}^{3}</math>. This theorem answers the question for the negative case of which surfaces in <math>\mathbb{R}^{3}</math> can be obtained by isometrically immersing [[complete manifold]]s with [[constant curvature]]. + +Hilbert's theorem was first treated by [[David Hilbert]] in "Über Flächen von konstanter Krümmung" ([[Trans. Amer. Math. Soc.]] 2 (1901), 87-99). A different proof was given shortly after by E. Holmgren, "Sur les surfaces à courbure constante negative," (1902). + +==Proof== +The [[proof (mathematics)|proof]] of Hilbert's theorem is elaborate and requires several [[lemma (mathematics)|lemma]]s. The idea is to show the nonexistence of an isometric [[immersion (mathematics)|immersion]] +:<math>\varphi = \psi \circ \exp_p: S' \longrightarrow \mathbb{R}^{3}</math> + +of a plane <math>S'</math> to the real space <math>\mathbb{R}^{3}</math>. This proof is basically the same as in Hilbert's paper, although based on the books of Do Carmo and [[Michael Spivak|Spivak]]. + +''Observations'': In order to have a more manageable treatment, but without loss of generality, the [[curvature]] may be considered equal to minus one, <math>K=-1</math>. There is no loss of generality, since we are dealing with constant curvatures, and similarities of <math>\mathbb{R}^{3}</math> multiply <math>K</math> by a constant. The [[exponential map]] <math>\exp_p: T_p(S) \longrightarrow S</math> is a [[local diffeomorphism]] (in fact a covering map, by the Cartan-Hadamard theorem); therefore, it induces an [[inner product]] in the [[tangent space]] of <math>S</math> at <math>p</math>: <math>T_p(S)</math>. Furthermore, <math>S'</math> denotes the geometric surface <math>T_p(S)</math> with this inner product. If <math>\psi:S \longrightarrow \mathbb{R}^{3}</math> is an isometric immersion, the same holds for +:<math>\varphi = \psi \circ \exp_p:S' \longrightarrow \mathbb{R}^{3}</math>. + +The first lemma is independent of the others, and will be used at the end to contradict the results of the other lemmas. + +'''Lemma 1''': The area of <math>S'</math> is infinite. <br /> +''Proof sketch:'' <br /> +The idea of the proof is to create a [[global isometry]] between <math>H</math> and <math>S'</math>.
Then, since <math>H</math> has an infinite area, <math>S'</math> will have it too. <br /> +The fact that the [[Hyperbolic manifold|hyperbolic plane]] <math>H</math> has an infinite area comes from computing the [[surface integral]] with the corresponding [[coefficient]]s of the [[First fundamental form]]. To obtain these coefficients, the hyperbolic plane can be defined as the plane with the following inner product around a point <math>q\in \mathbb{R}^{2}</math> with coordinates <math>(u,v)</math><br /> +:<math>E = \left\langle \frac{\partial}{\partial u}, \frac{\partial}{\partial u} \right\rangle = 1 \qquad F = \left\langle \frac{\partial}{\partial u}, \frac{\partial}{\partial v} \right\rangle = \left\langle \frac{\partial}{\partial v}, \frac{\partial}{\partial u} \right\rangle = 0 \qquad G = \left\langle \frac{\partial}{\partial v}, \frac{\partial}{\partial v} \right\rangle = e^{u} </math> <br /> + +Since the hyperbolic plane is unbounded, the limits of the integral are [[Infinity|infinite]], and the area can be calculated through +:<math>\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{u} du dv = \infty</math> + +Next, one needs to create a map which will show that the global information from the hyperbolic plane can be transferred to the surface <math>S'</math>, i.e. a global isometry. <math>\varphi: H \rightarrow S'</math> will be the map whose domain is the hyperbolic plane and whose image is the [[2-dimensional manifold]] <math>S'</math>, which carries the inner product from the surface <math>S</math> with negative curvature. <math>\varphi</math> will be defined via the exponential map, its inverse, and a linear isometry between their tangent spaces, +:<math>\psi:T_p(H) \rightarrow T_{p'}(S')</math>. + +That is +:<math>\varphi = \exp_{p'} \circ \psi \circ \exp_p^{-1}</math>, + +where <math>p\in H, p' \in S'</math>. That is to say, the starting point <math>p\in H</math> goes to the tangent plane from <math>H</math> through the inverse of the exponential map. Then it travels from one tangent plane to the other through the isometry <math>\psi</math>, and then down to the surface <math>S'</math> with another exponential map. + +The following step involves the use of [[polar coordinates]], <math>(\rho, \theta)</math> and <math>(\rho', \theta')</math>, around <math>p</math> and <math>p'</math> respectively. The requirement will be that the axes are mapped to each other, that is, <math>\theta=0</math> goes to <math>\theta'=0</math>. Then <math>\varphi</math> preserves the first fundamental form. <br /> +In a geodesic polar system, the [[Gaussian curvature]] <math>K</math> can be expressed as +:<math>K = - \frac{(\sqrt{G})_{\rho \rho}}{\sqrt{G}}</math>. + +In addition, K is constant and fulfills the following differential equation +:<math>(\sqrt{G})_{\rho \rho} + K\cdot \sqrt{G} = 0</math> + +Since <math>H</math> and <math>S'</math> have the same constant Gaussian curvature, they are locally isometric ([[Minding's Theorem]]). That means that <math>\varphi</math> is a local isometry between <math>H</math> and <math>S'</math>. Furthermore, from Hadamard's theorem it follows that <math>\varphi</math> is also a covering map. <br /> +Since <math>S'</math> is simply connected, <math>\varphi</math> is a homeomorphism, and hence, a (global) isometry. Therefore, <math>H</math> and <math>S'</math> are globally isometric, and because <math>H</math> has an infinite area, then <math>S'=T_p(S)</math> has an infinite area, as well.
<math>\square</math> + +'''Lemma 2''': For each <math>p\in S'</math> there exists a parametrization <math>x:U \subset \mathbb{R}^{2} \longrightarrow S', \qquad p \in x(U)</math>, such that the coordinate curves of <math>x</math> are asymptotic curves of <math> x(U) = V'</math> and form a Tchebyshef net. + +'''Lemma 3''': Let <math>V' \subset S'</math> be a coordinate [[neighborhood]] of <math>S'</math> such that the coordinate curves are asymptotic curves in <math>V'</math>. Then the area A of any quadrilateral formed by the coordinate curves is smaller than <math>2\pi</math>. + +The next goal is to show that <math>x</math> is a parametrization of <math>S'</math>. + +'''Lemma 4''': For a fixed <math>t</math>, the curve <math>x(s,t), -\infty < s < +\infty </math>, is an asymptotic curve with <math>s</math> as arc length. + +The following two lemmas, together with Lemma 8, will demonstrate the existence of a [[parametrization]] <math>x:\mathbb{R}^{2} \longrightarrow S'</math>. + +'''Lemma 5''': <math>x</math> is a local diffeomorphism. + +'''Lemma 6''': <math>x</math> is [[surjective]]. + +'''Lemma 7''': On <math>S'</math> there are two differentiable linearly independent vector fields which are tangent to the [[asymptotic curve]]s of <math>S'</math>. + +'''Lemma 8''': <math>x</math> is [[injective]]. <br /> + +''Proof of Hilbert's Theorem:'' <br /> +First, it will be assumed that an isometric immersion from a [[complete surface]] with negative curvature <math>S</math> exists: <math>\psi:S \longrightarrow \mathbb{R}^{3}</math> + +As stated in the observations, the tangent plane <math>T_p(S)</math> is endowed with the metric induced by the exponential map <math>\exp_p: T_p(S) \longrightarrow S</math>. Moreover, <math>\varphi = \psi \circ \exp_p:S' \longrightarrow \mathbb{R}^{3}</math> is an isometric immersion and Lemmas 5, 6, and 8 show the existence of a parametrization <math>x:\mathbb{R}^{2} \longrightarrow S'</math> of the whole <math>S'</math>, such that the coordinate curves of <math>x</math> are the asymptotic curves of <math>S'</math>. This result was provided by Lemma 4. Therefore, <math>S'</math> can be covered by a union of "coordinate" quadrilaterals <math>Q_{n}</math> with <math> Q_{n} \subset Q_{n+1}</math>. By Lemma 3, the area of each quadrilateral is smaller than <math>2 \pi </math>. On the other hand, by Lemma 1, the area of <math>S'</math> is infinite, and therefore has no bound. This is a contradiction and the proof is concluded. <math>\square</math> + +==References== +* {{aut|Do Carmo, Manfredo}}, ''Differential Geometry of Curves and Surfaces'', Prentice Hall, 1976. +* {{aut|[[Michael Spivak|Spivak, Michael]]}}, ''A Comprehensive Introduction to Differential Geometry'', Publish or Perish, 1999. + +{{DEFAULTSORT:Hilberts theorem}} +[[Category:Hyperbolic geometry]] +[[Category:Theorems in differential geometry]] +[[Category:Articles containing proofs]] + 4b5h8vlcu7tzufwvwa4hxmtizs87mpt + + + + Mayer–Vietoris sequence + 0 + 4415 + + 4416 + 2014-01-22T10:51:29Z + + 81.65.121.64 + + /* Derivation */ + wikitext + text/x-wiki + In [[mathematics]], particularly [[algebraic topology]] and [[homology theory]], the '''Mayer–Vietoris sequence''' is an [[algebra]]ic tool to help compute [[algebraic invariant]]s of [[topological space]]s, known as their [[Homology group|homology]] and [[cohomology group]]s. The result is due to two [[Austria]]n mathematicians, [[Walther Mayer]] and [[Leopold Vietoris]].
The method consists of splitting a space into pieces, called [[Subspace topology|subspaces]], for which the homology or cohomology groups may be easier to compute. The sequence relates the (co)homology groups of the space to the (co)homology groups of the subspaces. It is a [[Natural (category theory)|natural]] [[long exact sequence]], whose entries are the (co)homology groups of the whole space, the [[direct sum of abelian groups|direct sum]] of the (co)homology groups of the subspaces, and the (co)homology groups of the [[intersection (set theory)|intersection]] of the subspaces. + +The Mayer–Vietoris sequence holds for a variety of [[cohomology theory|cohomology]] and [[homology theory|homology theories]], including [[singular homology]] and [[singular cohomology]]. In general, the sequence holds for those theories satisfying the [[Eilenberg–Steenrod axioms]], and it has variations for both [[Reduced homology|reduced]] and [[Relative homology|relative]] (co)homology. Because the (co)homology of most spaces cannot be computed directly from their definitions, one uses tools such as the Mayer–Vietoris sequence in the hope of obtaining partial information. Many spaces encountered in [[topology]] are constructed by piecing together very simple patches. Carefully choosing the two covering subspaces so that, together with their intersection, they have simpler (co)homology than that of the whole space may allow a complete deduction of the (co)homology of the space. In that respect, the Mayer–Vietoris sequence is analogous to the [[Seifert–van Kampen theorem]] for the [[fundamental group]], and a precise relation exists for homology of dimension one. + +==Background, motivation, and history== + +[[Image:Vietoris4343.jpg|Right|thumb|Leopold Vietoris on his 110th birthday]] + +Like the [[fundamental group]] or the higher [[homotopy group]]s of a space, homology groups are important topological invariants. Although some (co)homology theories are computable using tools of [[linear algebra]], many other important (co)homology theories, especially singular (co)homology, are not computable directly from their definition for nontrivial spaces. For singular (co)homology, the singular (co)chains and (co)cycles groups are often too big to handle directly. More subtle and indirect approaches become necessary. The Mayer–Vietoris sequence is such an approach, giving partial information about the (co)homology groups of any space by relating it to the (co)homology groups of two of its subspaces and their intersection. + +The most natural and convenient way to express the relation involves the algebraic concept of [[exact sequence]]s: sequences of [[Object (category theory)|objects]] (in this case [[Group (mathematics)|groups]]) and [[morphism]]s (in this case [[group homomorphism]]s) between them such that the [[Image (mathematics)|image]] of one morphism equals the [[Kernel (algebra)|kernel]] of the next. In general, this does not allow (co)homology groups of a space to be completely computed. However, because many important spaces encountered in topology are [[topological manifold]]s, [[simplicial complex]]es, or [[CW complex]]es, which are constructed by piecing together very simple patches, a theorem such as that of Mayer and Vietoris is potentially of broad and deep applicability. 
+ +Mayer was introduced to topology by his colleague Vietoris when attending his lectures in 1926 and 1927 at a local university in [[Vienna]].<ref>{{harvnb|Hirzebruch|1999}}</ref> He was told about the conjectured result and a way to its solution, and solved the question for the [[Betti number]]s in 1929.<ref>{{harvnb|Mayer|1929}}</ref> He applied his results to the [[torus]] considered as the union of two cylinders.<ref>{{harvnb|Dieudonné|1989|p=39}}</ref><ref>{{harvnb|Mayer|1929|p=41}}</ref> Vietoris later proved the full result for the homology groups in 1930 but did not express it as an exact sequence.<ref>{{harvnb|Vietoris|1930}}</ref> The concept of an exact sequence only appeared in print in the 1952 book ''Foundations of Algebraic Topology'' by [[Samuel Eilenberg]] and [[Norman Steenrod]]<ref>{{harvnb|Corry|2004|p=345}}</ref> where the results of Mayer and Vietoris were expressed in the modern form.<ref>{{harvnb|Eilenberg|Steenrod|1952|loc=Theorem 15.3}}</ref> +{{-}} + +==Basic versions for singular homology== +Let ''X'' be a [[topological space]] and ''A'', ''B'' be two subspaces whose [[Interior (topology)|interiors]] cover ''X''. (The interiors of ''A'' and ''B'' need not be disjoint.) The Mayer–Vietoris sequence in [[singular homology]] for the triad (''X'', ''A'', ''B'') is a [[long exact sequence]] relating the singular homology groups (with coefficient group the integers '''Z''') of the spaces ''X'', ''A'', ''B'', and the [[intersection (set theory)|intersection]] ''A''∩''B''.<ref>{{harvnb|Eilenberg|Steenrod|1952|loc=§15}}</ref> There is an unreduced and a reduced version. + +===Unreduced version=== +For unreduced homology, the Mayer–Vietoris sequence states that the following sequence is exact:<ref name="Hatcher149">{{harvnb|Hatcher|2002|p=149}}</ref> + +<br /><math>\begin{align} +\cdots\rightarrow H_{n+1}(X)\,&\xrightarrow{\partial_*}\,H_{n}(A\cap B)\,\xrightarrow{(i_*,j_*)}\,H_{n}(A)\oplus H_{n}(B)\,\xrightarrow{k_* - l_*}\,H_{n}(X)\xrightarrow{\partial_*}\\ +&\quad\xrightarrow{\partial_*}\,H_{n-1} (A\cap B)\rightarrow \cdots\rightarrow H_0(A)\oplus H_0(B)\,\xrightarrow{k_* - l_*}\,H_0(X)\rightarrow\,0. +\end{align}</math> + +Here the maps ''i'' : ''A''∩''B'' ↪ ''A'', ''j'' : ''A''∩''B'' ↪ ''B'', ''k'' : ''A'' ↪ ''X'', and ''l'' : ''B'' ↪ ''X'' are [[inclusion map]]s and <math>\oplus</math> denotes the [[direct sum of abelian groups]]. + +===Boundary map=== +[[Image:Mayer Vietoris sequence boundary map on torus.png|thumb|280px|right|Illustration of the boundary map ∂<sub>*</sub> on the torus where the 1-cycle ''x'' = ''u'' + ''v'' is the sum of two 1-chains whose boundary lies in the intersection of ''A'' and ''B''.]] +The boundary maps ∂<sub>*</sub> lowering the dimension may be made explicit as follows.<ref name="Hatcher 2002 150">{{harvnb|Hatcher|2002|p=150}}</ref> An element in ''H''<sub>n</sub>(''X'') is the homology class of an ''n''-cycle ''x'' which, by [[barycentric subdivision]] for example, can be written as the sum of two ''n''-chains ''u'' and ''v'' whose images lie wholly in ''A'' and ''B'', respectively. Thus ∂''x'' = ∂(''u'' + ''v'') = 0 so that ∂''u'' = &minus;∂''v''. This implies that the images of both these boundary (''n'' &minus; 1)-cycles are contained in the intersection ''A''∩''B''. Then ∂<sub>*</sub>([''x'']) is the class of ∂''u'' in ''H''<sub>n&minus;1</sub>(''A''∩''B''). 
Choosing a different representative ''x′'' does not affect ∂''u'' since ∂''x′'' = ∂''x'' = 0; nor does choosing another decomposition ''x'' = ''u′'' + ''v′'' since then ∂''u'' + ∂''v'' &minus; ∂''u′'' &minus; ∂''v′'' = 0 which implies ∂''u'' = ∂''u′'' and ∂''v'' = ∂''v′''. Notice that the maps in the Mayer–Vietoris sequence depend on choosing an order for ''A'' and ''B''. In particular, the boundary map changes sign if ''A'' and ''B'' are swapped. + +===Reduced version=== +For [[reduced homology]] there is also a Mayer–Vietoris sequence, under the assumption that ''A'' and ''B'' have [[non-empty]] intersection.<ref>{{harvnb|Spanier|1966|p=187}}</ref> The sequence is identical for positive dimensions and ends as: + +<br /><math>\cdots\rightarrow\tilde{H}_0(A\cap B)\,\xrightarrow{(i_*,j_*)}\,\tilde{H}_0(A)\oplus\tilde{H}_0(B)\,\xrightarrow{k_* - l_*}\,\tilde{H}_0(X)\rightarrow\,0.</math> + +===Analogy with the Seifert–van Kampen theorem=== +There is an analogy between the Mayer–Vietoris sequence (especially for homology groups of dimension 1) and the [[Seifert–van Kampen theorem]].<ref name="Hatcher 2002 150"/><ref>{{harvnb|Massey|1984|p=240}}</ref> Whenever ''A''∩''B'' is [[path-connected]] the reduced Mayer–Vietoris sequence yields the isomorphism + +:<math>H_1(X) \cong (H_1(A)\oplus H_1(B))/\text{Ker} (k_* - l_*)</math> + +where, by exactness, + +:<math>\text{Ker} (k_* - l_*) \cong \text{Im} (i_*, j_*).</math> + +This is precisely the [[Commutator subgroup#Abelianization|abelianized]] statement of the Seifert–van Kampen theorem. Compare with the fact that ''H''<sub>1</sub>(''X'') is the abelianization of the [[fundamental group]] π<sub>1</sub>(''X'') when ''X'' is path-connected.<ref>{{harvnb|Hatcher|2002|loc=Theorem 2A.1, p. 166}}</ref> + +==Basic applications== + +===''k''-sphere=== +[[Image:SphereCoverStriped.png|thumb|250px|right|The decomposition for ''X'' = ''S''<sup>2</sup>]] +To completely compute the homology of the [[n-sphere|''k''-sphere]] ''X'' = ''S''<sup>''k''</sup>, let ''A'' and ''B'' be two hemispheres of ''X'' with intersection [[homotopy equivalent]] to a (''k'' &minus; 1)-dimensional equatorial sphere. Since the ''k''-dimensional hemispheres are [[homeomorphic]] to ''k''-discs, which are [[contractible]], the homology groups for ''A'' and ''B'' are [[Trivial group|trivial]]. The Mayer–Vietoris sequence for [[reduced homology]] groups then yields + +:<br /><math> \cdots\rightarrow 0 \rightarrow \tilde{H}_{n}\left(S^k\right) \xrightarrow{\partial_*}\, \tilde{H}_{n-1}\left(S^{k-1}\right) \rightarrow 0 \rightarrow \cdots \! </math> + +Exactness immediately implies that the map ∂<sub>*</sub> is an isomorphism. Using the [[reduced homology]] of the [[0-sphere]] (two points) as a [[Mathematical induction|base case]], it follows<ref>{{harvnb|Hatcher|2002|loc=Example 2.46, p. 150}}</ref> + +:<br /><math>\tilde{H}_n\left(S^k\right)\cong\delta_{kn}\,\mathbb{Z}=\left\{\begin{matrix} +\mathbb{Z} & \mbox{if } n=k \\ +0 & \mbox{if } n \ne k \end{matrix}\right.</math> + +where δ is the [[Kronecker delta]]. 
Such a complete understanding of the homology groups for spheres is in stark contrast with current knowledge of [[homotopy groups of spheres]], especially for the case ''n'' > ''k'' about which little is known.<ref>{{harvnb|Hatcher|2002|p=384}}</ref> +{{-}} + +===Klein bottle=== +[[Image:KleinBottle2D covered by Möbius strips.svg|thumb|200px|right|The Klein bottle ([[fundamental polygon]] with appropriate edge identifications) decomposed as two Möbius strips ''A'' (in blue) and ''B'' (in red).]] +A slightly more difficult application of the Mayer–Vietoris sequence is the calculation of the homology groups of the [[Klein bottle]] ''X''. One uses the decomposition of ''X'' as the union of two [[Möbius strip]]s ''A'' and ''B'' [[Quotient space|glued]] along their boundary circle (see illustration on the right). Then ''A'', ''B'' and their intersection ''A''∩''B'' are [[Homotopy#Homotopy equivalence and null-homotopy|homotopy equivalent]] to circles, so the nontrivial part of the sequence yields<ref>{{harvnb|Hatcher|2002|p=151}}</ref> + +:<br /><math> 0 \rightarrow H_{2}(X) \rightarrow\, \mathbb{Z}\ \xrightarrow{\alpha} \ \mathbb{Z} \oplus \mathbb{Z} \rightarrow \, H_1(X) \rightarrow 0 \! </math> + +and the trivial part implies vanishing homology for dimensions greater than 2. The central map α sends 1 to (2, &minus;2) since the boundary circle of a Möbius band wraps twice around the core circle. In particular α is [[Injective function|injective]] so homology of dimension 2 also vanishes. Finally, choosing (1, 0) and (1, &minus;1) as a basis for '''Z'''<sup>2</sup>, it follows + +:<br /><math>\tilde{H}_n\left(X\right)\cong\delta_{1n}\,(\mathbb{Z}\oplus\mathbb{Z}_2)=\left\{\begin{matrix} +\mathbb{Z}\oplus\mathbb{Z}_2 & \mbox{if } n=1\\ +0 & \mbox{if } n\ne1 \end{matrix}\right. +</math> +{{-}} + +===Wedge sums=== +[[Image:WedgeSumSpheres.png|right|300px|thumb|This decomposition of the wedge sum ''X'' of two 2-spheres ''K'' and ''L'' yields all the homology groups of ''X''.]] +Let ''X'' be the [[wedge sum]] of two spaces ''K'' and ''L'', and suppose furthermore that the identified [[basepoint]] is a [[deformation retract]] of [[Neighbourhood (mathematics)|open neighborhoods]] ''U'' ⊂ ''K'' and ''V'' ⊂ ''L''. Letting ''A'' = ''K''∪''V'' and ''B'' = ''U''∪''L'' it follows that ''A''∪''B'' = ''X'' and ''A''∩''B'' = ''U''∪''V'', which is [[contractible]] by construction. The reduced version of the sequence then yields (by exactness)<ref>{{harvnb|Hatcher|2002|loc=Exercise 31}}</ref> +:<math>\tilde{H}_n(K\vee L)\cong \tilde{H}_n(K)\oplus\tilde{H}_n(L)</math> +for all dimensions ''n''. The illustration on the right shows ''X'' as the sum of two 2-spheres ''K'' and ''L''. For this specific case, using the result [[Mayer–Vietoris sequence#k-sphere|from above]] for 2-spheres, one has +:<math>\tilde{H}_n\left(S^2\vee S^2\right)\cong\delta_{2n}\,(\mathbb{Z}\oplus\mathbb{Z})=\left\{\begin{matrix} +\mathbb{Z}\oplus\mathbb{Z} & \mbox{if } n=2 \\ +0 & \mbox{if } n \ne 2 \end{matrix}\right.</math> +{{-}} + +===Suspensions=== +[[Image:0-Sphere Suspension - Mayer-Vietoris Cover.svg|right|500px|thumb|This decomposition of the suspension ''X'' of the 0-sphere ''Y'' yields all the homology groups of ''X''.]] +If ''X'' is the [[Suspension (topology)|suspension]] ''SY'' of a space ''Y'', let ''A'' and ''B'' be the [[Complement (set theory)|complements]] in ''X'' of the top and bottom 'vertices' of the double cone, respectively. Then ''X'' is the union ''A''∪''B'', with ''A'' and ''B'' contractible. 
Also, the intersection ''A''∩''B'' is homotopy equivalent to ''Y''. Hence the Mayer–Vietoris sequence yields, for all ''n'',<ref>{{harvnb|Hatcher|2002|loc=Exercise 32}}</ref> +:<math>\tilde{H}_n(SY)\cong \tilde{H}_{n-1}(Y)</math> + +The illustration on the right shows the 1-sphere ''X'' as the suspension of the 0-sphere ''Y''. Noting in general that the ''k''-sphere is the suspension of the (''k'' &minus; 1)-sphere, it is easy to derive the homology groups of the ''k''-sphere by induction, [[Mayer–Vietoris sequence#k-sphere|as above]]. +{{-}} + +==Further discussion== + +===Relative form=== +A [[relative homology|relative]] form of the Mayer–Vietoris sequence also exists. If ''Y'' ⊂ ''X'' and is the union of ''C'' ⊂ ''A'' and ''D'' ⊂ ''B'', then the exact sequence is:<ref>{{harvnb|Hatcher|2002|p=152}}</ref> + +<br /><math>\cdots\rightarrow H_{n}(A\cap B,C\cap D)\,\xrightarrow{(i_*,j_*)}\,H_{n}(A,C)\oplus H_{n}(B,D)\,\xrightarrow{k_* - l_*}\,H_{n}(X,Y)\,\xrightarrow{\partial_*}\,H_{n-1}(A\cap B,C\cap D)\rightarrow\cdots</math> + +===Naturality=== +The homology groups are [[Natural (category theory)|natural]] in the sense that if ''ƒ'' is a [[Continuous function (topology)|continuous]] map from ''X''<sub>1</sub> to ''X''<sub>2</sub>, then there is a canonical [[pushforward (homology)|pushforward]] map ''ƒ''<sub>∗</sub> of homology groups ''ƒ''<sub>∗</sub>&nbsp;:&nbsp;''H''<sub>''k''</sub>(''X''<sub>1</sub>)&nbsp;→&nbsp;''H''<sub>''k''</sub>(''X''<sub>2</sub>), such that the composition of pushforwards is the pushforward of a composition: that is, <math>(g\circ h)_* = g_*\circ h_*</math>. The Mayer–Vietoris sequence is also natural in the sense that if ''X''<sub>1</sub> = ''A''<sub>1</sub>∪''B''<sub>1</sub> to ''X''<sub>2</sub> = ''A''<sub>2</sub>∪''B''<sub>2</sub> and the mapping ''ƒ'' satisfies ''ƒ''(''A''<sub>1</sub>) ⊂ ''A''<sub>2</sub> and ''ƒ''(''B''<sub>1</sub>) ⊂ ''B''<sub>2</sub>, then the connecting morphism ∂<sub>∗</sub> of the Mayer–Vietoris sequence commutes with ''ƒ''<sub>∗</sub>.<ref>{{harvnb|Massey|1984|p=208}}</ref> That is,<ref>{{harvnb|Eilenberg|Steenrod|1952|loc=Theorem 15.4}}</ref> the following diagram [[Commutative diagram|commutes]] (the horizontal maps are the usual ones): +[[Image:Mayer-Vietoris naturality.png|center|740px]] + +===Cohomological versions=== + +The Mayer–Vietoris long exact sequence for [[singular cohomology]] groups with coefficient [[group (mathematics)|group]] ''G'' is [[Duality (mathematics)|dual]] to the homological version. It is the following:<ref>{{harvnb|Hatcher|2002|p=203}}</ref> + +<br /><math>\cdots\rightarrow H^{n}(X;G)\rightarrow H^{n}(A;G)\oplus H^{n}(B;G)\rightarrow H^{n}(A\cap B;G)\rightarrow H^{n+1}(X;G)\rightarrow\cdots</math> + +where the dimension preserving maps are restriction maps induced from inclusions, and the (co-)boundary maps are defined in a similar fashion to the homological version. There is also a relative formulation. + +As an important special case when ''G'' is the group of [[real number]]s '''R''' and the underlying topological space has the additional structure of a [[smooth manifold]], the Mayer–Vietoris sequence for [[de Rham cohomology]] is + +<br /><math>\cdots\rightarrow H^{n}(X)\,\xrightarrow{\rho}\,H^{n}(U)\oplus H^{n}(V)\,\xrightarrow{\Delta}\,H^{n}(U\cap V)\,\xrightarrow{d^*}\,H^{n+1}(X)\rightarrow\cdots</math> + +where {''U'', ''V''} is an [[open cover]] of ''X'', ''ρ'' denotes the restriction map, and Δ is the difference. The map ''d*'' is defined similarly as the map ''∂''<sub>*</sub> from above. 
It can be briefly described as follows. For a cohomology class [''ω''] represented by [[closed and exact differential forms|closed form]] ''ω'' in ''U''∩''V'', express ''ω'' as a difference of forms ''ω<sub>U</sub>'' - ''ω<sub>V</sub>'' via a [[partition of unity]] subordinate to the open cover {''U'', ''V''}, for example. The exterior derivative ''dω<sub>U</sub>'' and ''dω<sub>V</sub>'' agree on ''U''∩''V'' and therefore together define an ''n'' + 1 form ''σ'' on ''X''. One then has ''d*''([''ω'']) = [''σ'']. + +===Derivation=== +Consider the [[Homological algebra#Functoriality|long exact sequence associated to]] the [[short exact sequence]]s of [[chain group]]s (constituent groups of [[chain complex]]es) + +:<math>0 \rightarrow C_n(A\cap B)\,\xrightarrow{\alpha}\,C_n(A) \oplus C_n(B)\,\xrightarrow{\beta}\,C_n(A+B) \rightarrow 0 </math> + +where α(''x'') = (''x'', &minus;''x''), β(''x'', ''y'') = ''x'' + ''y'', and ''C''<sub>''n''</sub>(''A'' + ''B'') is the chain group consisting of sums of chains in ''A'' and chains in ''B''.<ref name="Hatcher149"/> It is a fact that the singular ''n''-simplices of ''X'' whose images are contained in either ''A'' or ''B'' generate all of the homology group ''H''<sub>''n''</sub>(''X'').<ref>{{harvnb|Hatcher|2002|loc=Proposition 2.21, p. 119}}</ref> In other words, ''H''<sub>''n''</sub>(''A'' + ''B'') is isomorphic to ''H''<sub>''n''</sub>(''X''). This gives the Mayer–Vietoris sequence for singular homology. + +The same computation applied to the short exact sequences of vector spaces of [[differential form]]s + +:<math> +0\rightarrow\Omega^{n}(X)\rightarrow\Omega^{n}(U)\oplus\Omega^{n}(V)\rightarrow\Omega^{n}(U\cap V)\rightarrow0 +</math> + +yields the Mayer–Vietoris sequence for de Rham cohomology.<ref>{{harvnb|Bott|Tu|1982|loc=§I.2}}</ref> + +From a formal point of view, the Mayer–Vietoris sequence can be derived from the [[Eilenberg–Steenrod axioms]] for [[homology theory|homology theories]] using the [[long exact sequence in homology]], at least for CW complexes.<ref>{{harvnb|Hatcher|2002|p=162}}</ref> + +===Other homology theories=== +The derivation of the Mayer–Vietoris sequence from the Eilenberg–Steenrod axioms does not require the [[dimension axiom]],<ref>{{harvnb|Kōno|Tamaki|2006|pp=25–26}}</ref> so in addition to existing in [[List of cohomology theories#Ordinary homology theories|ordinary cohomology theories]], it holds in [[extraordinary cohomology theories]] (such as [[topological K-theory]] and [[cobordism]]). + +===Sheaf cohomology=== +From the point of view of [[sheaf cohomology]], the Mayer–Vietoris sequence is related to [[Čech cohomology]]. Specifically, it arises from the [[Spectral sequence|degeneration]] of the [[spectral sequence]] that relates Čech cohomology to sheaf cohomology (sometimes called the [[Mayer–Vietoris spectral sequence]]) in the case where the open cover used to compute the Čech cohomology consists of two open sets.<ref>{{harvnb|Dimca|2004|pp=35–36}}</ref> This spectral sequence exists in arbitrary [[Topos|topoi]].<ref>{{harvnb|Verdier|1972}} (SGA 4.V.3)</ref> + +==See also== +*[[Excision theorem]] +*[[Zig-zag lemma]] + +==Notes== +{{reflist|colwidth=30em}} + +==References== + +*{{citation + |last1=Bott + |first1=Raoul + |author1-link=Raoul Bott + |last2=Tu + |first2=Loring W. + |title=Differential Forms in Algebraic Topology + |publisher=[[Springer Science+Business Media|Springer-Verlag]] + |location=Berlin, New York + |isbn=978-0-387-90613-3 + |year=1982}}. 
+ +*{{citation + |first= Leo + |last= Corry + |authorlink= Leo Corry + |title= Modern Algebra and the Rise of Mathematical Structures + |publisher= Birkhäuser + |year= 2004 + |page= 345 + |isbn= 3-7643-7002-5 +}}. + +*{{citation + |first= Jean + |last= Dieudonné + |authorlink= Jean Dieudonné + |title= A History of Algebraic and Differential Topology 1900–1960 + |publisher= Birkhäuser + |year= 1989 + |page= 39 + |isbn= 0-8176-3388-X +}}. + +*{{citation +| last=Dimca +| first=Alexandru +| title=Sheaves in topology +| publisher=[[Springer-Verlag]] +| year=2004 +| location=Berlin +| series=Universitext +| isbn=978-3-540-20665-1 +| mr=2050072 +}} + +* {{citation + |last1=Eilenberg + |first1=Samuel + |authorlink1=Samuel Eilenberg + |last2=Steenrod + |first2=Norman + |authorlink2=Norman Steenrod + |title=Foundations of Algebraic Topology + |year=1952 + |isbn=978-0-691-07965-3 + |publisher= [[Princeton University Press]] +}}. + +*{{citation + |first= Allen + |last= Hatcher + |author-link= Allen Hatcher + |title= Algebraic Topology + |url= http://www.math.cornell.edu/%7Ehatcher/AT/ATpage.html + |year= 2002 + |publisher= [[Cambridge University Press]] + |isbn= 978-0-521-79540-1 + |mr= 1867354 + }}. + +*{{citation +|title= The Heritage of Emmy Noether +|last=Hirzebruch +|first=Friedrich +|authorlink=Friedrich Hirzebruch +|contribution=Emmy Noether and Topology +|pages=61–63 +|editor= Teicher, M. +|series= Israel Mathematical Conference Proceedings +|publisher= [[Bar-Ilan University]]/[[American Mathematical Society]]/[[Oxford University Press]] +|year= 1999 +|isbn= 978-0-19-851045-1 +|oclc= 223099225 +}}. + +*{{citation + |last=Kōno + |first=Akira + |last2=Tamaki + |first2=Dai + |title=Generalized cohomology + |publisher=[[American Mathematical Society]] + |location=Providence, RI + |series=Iwanami Series in Modern Mathematics, Translations of Mathematical Monographs + |volume=230 + |year=2006 + |origyear=2002 + |edition=Translated from the 2002 Japanese edition by Tamaki + |isbn=978-0-8218-3514-2 + |mr=2225848 +}} + +*{{citation + |first= William + |last= Massey + |author-link= William S. Massey + |title= Algebraic Topology: An Introduction + |year= 1984 + |publisher= [[Springer Science+Business Media|Springer-Verlag]] + |isbn= 978-0-387-90271-5 + }}. + +*{{citation + |first= Walther + |last= Mayer + |author-link= Walther Mayer + |title= Über abstrakte Topologie + |year= 1929 + |journal= [[Monatshefte für Mathematik]] + |url= http://www.springerlink.com/content/x33611021p942518/ + |doi= 10.1007/BF02307601 + |issn= 0026-9255 + |volume= 36 + |issue= 1 + |pages= 1–42 +}}. {{de icon}} + +*{{citation + |first= Edwin + |last= Spanier + |author-link= Edwin Spanier + |title= Algebraic Topology + |year= 1966 + |publisher= [[Springer Science+Business Media|Springer-Verlag]] + |isbn= 0-387-94426-5 + }}. 
+ +*{{citation + |first=Jean-Louis + |last=Verdier + |author-link=Jean-Louis Verdier + |contribution=Cohomologie dans les topos + |editor1-first=Michael + |editor1-last=Artin + |editor1-link=Michael Artin + |editor2-first=Alexander + |editor2-last=Grothendieck + |editor2-link=Alexander Grothendieck + |editor3-first=Jean-Louis + |editor3-last=Verdier + |editor3-link=Jean-Louis Verdier + |title=Séminaire de Géométrie Algébrique du Bois Marie – 1963–64 – Théorie des topos et cohomologie étale des schémas – (SGA 4) – Tome 2 + |year=1972 + |publisher = [[Springer Science+Business Media|Springer-Verlag]] + |location = Berlin; Heidelberg + |language = French + |series=[[Lecture Notes in Mathematics]] + |volume=270 + |isbn=978-3-540-06012-3 + |doi=10.1007/BFb0061320 + |pages=1 +}} + +*{{citation + |first= Leopold + |last= Vietoris + |author-link= Leopold Vietoris + |title= Über die Homologiegruppen der Vereinigung zweier Komplexe + |year= 1930 + |journal= [[Monatshefte für Mathematik]] + |volume= 37 + |pages= 159–62 + |doi=10.1007/BF01696765 +}}. {{de icon}} + +==Further reading== +* {{citation + |last1=Reitberger + |first1=Heinrich + |title=Leopold Vietoris (1891–2002) + |url=http://www.ams.org/notices/200210/fea-vietoris.pdf + |format=PDF|year=2002 + |journal=[[Notices of the American Mathematical Society]] + |issn=0002-9920 + |volume=49 + |issue=20 +}}. + +{{good article}} + +{{DEFAULTSORT:Mayer-Vietoris Sequence}} +[[Category:Homology theory]] + 1uhah33ea1hxevvphjn985ak9cr6ac5 + + + + Graphlets + 0 + 25342 + + 25343 + 2014-01-28T12:16:23Z + + LilHelpa + 0 + + [[WP:AWB/T|Typo fixing]] and general fixes using [[Project:AWB|AWB]] + wikitext + text/x-wiki + {{Context|date=March 2010}} <!-- is this about the internet? fishing? pure mathematics? no. but what is it about? apparently something to do with biology? --> +'''Graphlets''' are small connected non-isomorphic ''induced'' subgraphs of a large network.<ref name="Przulj2004">Pržulj N, [[Derek Corneil|Corneil DG]], Jurisica I: Modeling Interactome, Scale-Free or Geometric?, Bioinformatics 2004, 20(18):3508-3515.</ref><ref name="Przulj2007">Pržulj N, Biological Network Comparison Using Graphlet Degree Distribution, Bioinformatics 2007, 23:e177-e183.</ref> Graphlets differ from [[network motif]]s, since they must be ''induced'' subgraphs, whereas motifs are ''partial'' subgraphs. An [[Induced subgraph#Subgraphs|induced subgraph]] must contain all edges between its nodes that are present in the large network, while a partial subgraph may contain only some of these edges. Moreover, graphlets do not need to be over-represented in the data when compared with randomized networks, while motifs do.<ref>R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Network motifs, simple building blocks of complex networks, ''Science'' 2002, 298(5594): p. 
824-7.</ref> + +Graphlets were first introduced by Nataša Pržulj, when they were used as a basis for designing two new highly sensitive measures of network local structural similarities: the relative graphlet frequency distance (RGF-distance)<ref name="Przulj2004" /> and the graphlet degree distribution agreement (GDD-agreement).<ref name="Przulj2007" /> Additionally, Pržulj group developed a novel measure of network topological similarity that generalizes the [[Degree (graph theory)|degree]] of a node in the network to its graphlet degree vector (GDV) or graphlet degree signature.<ref name="Milenkovic2008">Tijana Milenković and Nataša Pržulj, Uncovering Biological Network Function via Graphlet Degree Signatures, Cancer Informatics 2008, 6:257–273.</ref> + +==Graphlet-based network properties== + +=== Relative graphlet frequency distance === + +RGF-distance compares the frequencies of the appearance of all 3-5-node graphlets in two networks.<ref name="Przulj2004" /> Let ''N<sub>i</sub>(G)'' be the number of graphlets of type <math>i</math> (<math>i \in \{1,\ldots,29\}</math>) in network ''G'', and let <math>T(G) = \sum_{i=1}^{29} N_i(G)</math> be the total number of graphlets of ''G''. The "similarity" between two graphs should be independent of the total number of nodes or edges, and should depend only upon the differences between relative frequencies of graphlets. Thus, ''relative graphlet frequency distance'' ''D(G,H)'' between two graphs ''G'' and ''H'' is defined as:<br /> +<math>D(G,H) = \sum_{i=1}^{29} | F_i(G) - F_i(H) |</math>, <br /> +where <math>F_i(G) = -\log(N_i(G)/T(G))</math>. The logarithm of the graphlet frequency is used because frequencies of different graphlets can differ by several orders of magnitude and the distance measure should not be entirely dominated by the most frequent graphlets. + +===Graphlet degree distribution agreement=== +GDD-agreement generalizes the notion of the [[degree distribution]] to the spectrum of graphlet degree distributions (GDDs) in the following way.<ref name="Przulj2007" /> The degree distribution measures the number of nodes of degree ''k'' in graph ''G'', i.e., the number of nodes "touching" ''k'' edges, for each value of ''k''. Note that an edge is the only graphlet with two nodes. GDDs generalize the degree distribution to other graphlets: they measure for each 2-5-node graphlet ''G<sub>i</sub>'', <math>i = 0, 1,..., 29</math>, such as a triangle or a square, the number of nodes "touching" ''k'' graphlets ''G<sub>i</sub>'' at a particular node. A node at which a graphlet is "touched" is topologically relevant, since it allows us to distinguish between nodes "touching", for example, a three node path at an end node or at the middle node. This is summarized by [[automorphism]] orbits (or just orbits, for brevity): by taking into account the "symmetries" between nodes of a graphlet, there are 73 different orbits across all 2-5-node graphlets (see [Pržulj, 2007]<ref name="Przulj2007" /> for details). + +For each orbit ''j'', one needs to measure the ''j<sup>th</sup>'' GDD, ''d<sub>G</sub><sup>j</sup>(k)'', i.e., the distribution of the number of nodes in ''G'' "touching" the corresponding graphlet at orbit ''j'' ''k'' times. Clearly, the degree distribution is the ''0th'' GDD. 
''d<sub>G</sub><sup>j</sup>(k)'' is scaled as +<math>S_G^j(k) = \frac{d_G^j(k)}{k}</math> to decrease the contribution of larger degrees in a GDD and then normalized with respect to its total area +<math>T_G^j = \sum_{k=1}^\infty S_G^j(k)</math> giving the "normalized distribution" +<math>N_G^j(k) = \frac{S_G^j(k)}{T_G^j}</math>. + +The ''j<sup>th</sup>'' GDD-agreement compares the ''j<sup>th</sup>'' GDDs of two networks. +For two networks ''G'' and ''H'' and a particular orbit ''j'', the "distance" ''D<sup>j</sup>(G,H)'' between their normalized ''j<sup>th</sup>'' GDDs is: <br /> +<math>D^{j}(G,H)=\frac{1}{\sqrt{2}}(\sum_{k=1}^{\infty}[N_{G}^{j}(k)-N_{H}^{j}(k)]^{2})^{\frac{1}{2}}</math>. + +The distance is between 0 and 1, where 0 means that ''G'' and ''H'' have identical ''j<sup>th</sup>'' GDDs, and 1 means that their ''j<sup>th</sup>'' GDDs +are far away. Next, ''D<sup>j</sup>(G,H)'' is reversed to obtain the ''j<sup>th</sup>'' ''GDD-agreement'':<br /> +<math>A^j(G,H) = 1 - D^j(G,H)</math>, for <math>j \in \{0,1,\ldots,72\}</math>. + +The total GDD-agreement between two networks ''G'' and ''H'' is the arithmetic or the geometric average of the ''j<sup>th</sup>'' GDD-agreements over all ''j'', i.e., <br /> +<math>A_{arith}(G,H) = \frac{1}{73} \sum_{j=0}^{72}A^j(G,H)</math>,<br /> +and<br /> +<math>A_{geo}(G,H) = \left(\prod_{j=0}^{72}A^j(G,H)\right)^{\frac{1}{73}}</math>,<br /> +respectively. GDD-agreement is scaled to always be between 0 and 1, where 1 means that two networks are identical with respect to this property. (See [Pržulj, 2007]<ref name="Przulj2007" /> for details.) + +===Graphlet degree vectors (signatures) and signature similarities=== +This method generalizes the degree of a node, which counts the number of edges that the node touches, into the vector of graphlet degrees, or graphlet degree signature, counting the number of graphlets that the node touches at a particular orbit, for all graphlets on 2 to 5 nodes.<ref name="Milenkovic2008" /> The resulting vector of 73 coordinates is the signature of a node that describes the topology of node's neighborhood and captures its interconnectivities out to a distance of 4 (see [Milenković and Pržulj, 2008]<ref name="Milenkovic2008" /> for details). The graphlet degree signature of a node provides a highly constraining measure of local topology in its vicinity and comparing the signatures of two nodes provides a highly constraining measure of local topological similarity between them. + +The ''signature similarity''<ref name="Milenkovic2008" /> is computed as follows. For a node ''u'' in graph ''G'', ''u<sub>i</sub>'' denotes the ''i<sup>th</sup>'' coordinate of its signature vector, i.e., ''u<sub>i</sub>'' is the number of times node ''u'' is touched by an orbit ''i'' in ''G''. The distance ''D<sub>i</sub>(u,v)'' between the ''i<sup>th</sup>'' orbits of nodes ''u'' and ''v'' is defined as: <br /> +<math>D_i(u,v) = w_i \times \frac{|log(u_i + 1) - log(v_i + 1)|}{log(max\{u_i, v_i\} + 2)}</math>, <br /> +where ''w<sub>i</sub>'' is the weight of orbit ''i'' that accounts for dependencies between orbits (see [Milenković and Pržulj, 2008]<ref name="Milenkovic2008" /> for details). The total distance ''D(u,v)'' between nodes ''u'' and ''v'' is defined as: <br /> +<math>D(u,v) = \frac{\sum_{i=0}^{72}D_i}{\sum_{i=0}^{72}w_i}</math>. <br /> +The distance ''D(u,v)'' is in [0, 1), where distance 0 means that signatures of nodes ''u'' and ''v'' are identical. 
Finally, the signature similarity, ''S(u,v)'', between nodes ''u'' and ''v'' is: <br /> +<math>S(u,v) = 1 - D(u,v)</math>. <br /> +Clearly, a higher signature similarity between two nodes corresponds to a higher topological similarity between their extended neighborhoods (out to distance 4). + +==Application of graphlet-based network properties== +RGF-distance and GDD-agreement were used to evaluate the fit of various network models to real-world networks and to discover a new, well-fitting, [[Random geometric graph|geometric random graph model]] for [[Protein–protein interaction|protein-protein interaction]] networks,<ref name="Przulj2004" /><ref name="Przulj2007" /> as well as other types of [[Network biology|biological networks]], such as [[protein structure]] networks, also called residue interaction graphs.<ref>Tijana Milenković, Ioannis Filippis, Michael Lappe, and Nataša Pržulj, Optimized Null Model for Protein Structure Networks, 2009, PLoS ONE 4(6): e5967.</ref> These graphlet-based network properties are implemented in GraphCrunch, a software tool for large network analyses and modeling,<ref>Tijana Milenković, Jason Lai, and Nataša Pržulj, GraphCrunch: a tool for large network analyses, BMC Bioinformatics 2008, 9:70. Highly accessed.</ref> + +Graphlet degree vectors (signatures) and signature similarities were applied to biological networks to identify groups (or [[Cluster analysis|clusters]]) of topologically similar nodes in a network and predict biological properties of yet uncharacterized nodes based on known biological properties of characterized nodes. Specifically, they were applied to [[protein function prediction]],<ref name="Milenkovic2008" /> cancer gene identification,<ref name="Milenkovic2009">Tijana Milenković, Vesna Memisević, Anand K. Ganesan, and Nataša Pržulj, Systems-level Cancer Gene Identification from Protein Interaction Network Topology Applied to Melanogenesis-related Interaction Networks, Journal of the Royal Society Interface 2009, {{doi|10.1098/rsif.2009.0192}}.</ref> and discovery of pathways underlying certain biological processes, such as melanogenesis<ref name="Milenkovic2009" /> or protein degradation.<ref>Cortnie Guerrero, Tijana Milenković, Nataša Pržulj, Peter Kaiser, Lan Huang, Characterization of the Yeast Proteasome Interaction Network by QTAX-Based Tag-Team Mass Spectrometry and Protein Interaction Network Analysis, PNAS 2008, 105(36): 13333–13338.</ref><ref>Robyn Kaake, Tijana Milenković, Nataša Pržulj, Peter Kaiser, and Lan Huang, Quantifying Cell Cycle Dependent Changes in Protein Interacting Network of the Yeast 26S Proteasome, Journal of Proteome Research 2010, to appear.</ref> Additionally, [[GRAph ALigner (GRAAL)]],<ref name="Kuchaiev2010">Oleksii Kuchaiev, Tijana Milenković, Vesna Memisević, Wayne Hayes, and Nataša Pržulj, Topological network alignment uncovers biological function and phylogeny, Journal of the Royal Society Interface 2010, to appear.</ref> a global network alignment method, used graphlet degree vectors and signature similarities to produce ''topological'' alignments of biological networks, without using any information external to network topology. 
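+
+Evaluated directly from the formulas above, the signature similarity takes only a few lines of code. The sketch below (in Python) uses uniform orbit weights purely as a placeholder, since the actual weights ''w<sub>i</sub>'' are derived from dependencies between orbits (see [Milenković and Pržulj, 2008] for details).
+
+<syntaxhighlight lang="python">
+from math import log
+
+def signature_similarity(u, v, weights=None):
+    """Illustrative computation of S(u,v) from two 73-coordinate graphlet
+    degree vectors u and v.  Uniform weights are a placeholder only; the
+    published method derives w_i from dependencies between orbits."""
+    assert len(u) == len(v) == 73
+    if weights is None:
+        weights = [1.0] * 73          # placeholder assumption: uniform orbit weights
+    total = 0.0
+    for u_i, v_i, w_i in zip(u, v, weights):
+        # D_i(u,v) = w_i * |log(u_i+1) - log(v_i+1)| / log(max(u_i, v_i) + 2)
+        total += w_i * abs(log(u_i + 1) - log(v_i + 1)) / log(max(u_i, v_i) + 2)
+    distance = total / sum(weights)   # D(u,v), a value in [0, 1)
+    return 1.0 - distance             # S(u,v); identical signatures give 1.0
+</syntaxhighlight>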
+ +==References== +<!--- See [[Wikipedia:Footnotes]] on how to create references using<ref></ref> tags which will then appear here automatically --> +{{Reflist}} + +==External links== +* [http://bio-nets.doc.ic.ac.uk/graphcrunch2/] + +[[Category:Networks]] + ko7nkttoodno8ralnjq94bltbz23773 + + + + Kinetic energy + 0 + 565 + + 566 + 2014-01-31T16:00:00Z + + ClueBot NG + 0 + + + Reverting possible vandalism by [[Special:Contributions/64.251.52.199|64.251.52.199]] to version by Dger. False positive? [[User:ClueBot NG/FalsePositives|Report it]]. Thanks, [[User:ClueBot NG|ClueBot NG]]. (1678585) (Bot) + wikitext + text/x-wiki + {{Infobox physical quantity +|bgcolour={default} +|name = Kinetic energy +|image=[[File:Wooden roller coaster txgi.jpg|220px]] +|caption=The cars of a [[roller coaster]] reach their maximum kinetic energy when at the bottom of their path. When they start rising, the kinetic energy begins to be converted to gravitational [[potential energy]]. The sum of kinetic and potential energy in the system remains constant, ignoring losses to [[friction]]. +|unit = [[joule]] (J) +|symbols = KE, ''E''<sub>k</sub>, or T +|derivations = ''E''<sub>k</sub> = ½''[[mass|m]][[velocity|v]]''<sup>2</sup> <br> +''E''<sub>k</sub> = ''E''<sub>t</sub>+''E''<sub>r</sub> +}} +{{Classical mechanics}} +In [[physics]], the '''kinetic energy''' of an object is the [[energy]] which it possesses due to its [[motion (physics)|motion]].<ref>{{cite book +|title=Textbook of Engineering Physics (Part I) +|first1=Mahesh C. +|last1=Jain +|publisher=PHI Learning Pvt. Ltd. +|year=2009 +|isbn=81-203-3862-6 +|page=9 +|url=http://books.google.com/books?id=DqZlU3RJTywC}}, [http://books.google.com/books?id=DqZlU3RJTywC&pg=PA9 Chapter 1, p. 9] +</ref> +It is defined as the [[work (physics)|work]] needed to accelerate a body of a given mass from rest to its stated [[velocity]]. Having gained this energy during its [[acceleration]], the body maintains this kinetic energy unless its speed changes. The same amount of work is done by the body in decelerating from its current speed to a state of rest. + +In [[classical mechanics]], the kinetic energy of a non-rotating object of [[mass]] ''m'' traveling at a [[speed]] ''v'' is ''½ mv²''. In [[Special relativity|relativistic mechanics]], this is only a good approximation when ''v'' is much less than the [[speed of light]]. + +==History and etymology== +The adjective ''kinetic'' has its roots in the [[Ancient Greek|Greek]] word ''κίνησις'' ([[-kinesis|kinesis]]) meaning ''motion''. The dichotomy between kinetic energy and [[potential energy]] can be traced back to [[Aristotle]]'s concepts of [[actuality and potentiality]].{{citation needed|date=July 2012}} + +The principle in [[classical mechanics]] that ''E ∝ mv²'' was first developed by [[Gottfried Leibniz]] and [[Johann Bernoulli]], who described kinetic energy as the ''living force'', ''[[vis viva]]''. [[Willem 's Gravesande]] of the Netherlands provided experimental evidence of this relationship. By dropping weights from different heights into a block of clay, [[Willem 's Gravesande]] determined that their penetration depth was proportional to the square of their impact speed. [[Émilie du Châtelet]] recognized the implications of the experiment and published an explanation.<ref>{{Cite book|author=Judith P. 
Zinsser |title=Emilie du Chatelet: Daring Genius of the Enlightenment|publisher=Penguin|year= 2007|isbn=0-14-311268-6}}</ref> + +The terms ''kinetic energy'' and ''work'' in their present scientific meanings date back to the mid-19th century. Early understandings of these ideas can be attributed to [[Gaspard-Gustave Coriolis]], who in 1829 published the paper titled ''Du Calcul de l'Effet des Machines'' outlining the mathematics of kinetic energy. [[William Thomson, 1st Baron Kelvin|William Thomson]], later Lord Kelvin, is given the credit for coining the term "kinetic energy" c. 1849–51.<ref>{{cite book| author=Crosbie Smith, M. Norton Wise|title=Energy and Empire: A Biographical Study of Lord Kelvin|publisher=Cambridge University Press|pages=866| isbn=0-521-26173-2}}</ref><ref>{{cite book|author=John Theodore Merz|title=A History of European Thought in the Nineteenth Century|publisher=Blackwood|year=1912|page= 139|isbn=0-8446-2579-5}}</ref> + +==Introduction== +[[Energy]] occurs in many forms, including [[chemical energy]], [[thermal energy]], [[electromagnetic radiation]], [[gravitational energy]], [[electric energy]], [[elastic energy]], [[nuclear binding energy|nuclear energy]], and [[rest energy]]. These can be categorized in two main classes: [[potential energy]] and kinetic energy. + +Kinetic energy may be best understood by examples that demonstrate how it is transformed to and from other forms of energy. For example, a [[cyclist]] uses [[food energy|chemical energy provided by food]] to accelerate a [[bicycle]] to a chosen speed. On a level surface, this speed can be maintained without further work, except to overcome [[drag (physics)|air resistance]] and [[friction]]. The chemical energy has been converted into kinetic energy, the energy of motion, but the process is not completely efficient and produces heat within the cyclist. + +The kinetic energy in the moving cyclist and the bicycle can be converted to other forms. For example, the cyclist could encounter a hill just high enough to coast up, so that the bicycle comes to a complete halt at the top. The kinetic energy has now largely been converted to gravitational potential energy that can be released by freewheeling down the other side of the hill. Since the bicycle lost some of its energy to friction, it never regains all of its speed without additional pedaling. The energy is not destroyed; it has only been converted to another form by friction. Alternatively the cyclist could connect a [[Bottle_dynamo|dynamo]] to one of the wheels and generate some electrical energy on the descent. The bicycle would be traveling slower at the bottom of the hill than without the generator because some of the energy has been diverted into electrical energy. Another possibility would be for the cyclist to apply the brakes, in which case the kinetic energy would be dissipated through friction as [[heat]]. + +Like any physical quantity which is a function of velocity, the kinetic energy of an object depends on the relationship between the object and the observer's [[frame of reference]]. Thus, the kinetic energy of an object is not [[Galilean invariance|invariant]]. + +[[Spacecraft]] use chemical energy to launch and gain considerable kinetic energy to reach [[orbital speed|orbital velocity]]. In a perfectly circular orbit, this kinetic energy remains constant because there is almost no friction in near-earth space. However it becomes apparent at re-entry when some of the kinetic energy is converted to heat. 
If the orbit is [[elliptic orbit|elliptical]] or [[hyperbolic trajectory|hyperbolic]], then throughout the orbit kinetic and [[potential energy]] are exchanged; kinetic energy is greatest and potential energy lowest at closest approach to the earth or other massive body, while potential energy is greatest and kinetic energy the lowest at maximum distance. Without loss or gain, however, the sum of the kinetic and potential energy remains constant. + +Kinetic energy can be passed from one object to another. In the game of [[billiards]], the player imposes kinetic energy on the cue ball by striking it with the cue stick. If the cue ball collides with another ball, it slows down dramatically and the ball it collided with accelerates to a speed as the kinetic energy is passed on to it. [[Collisions]] in billiards are effectively [[elastic collision]]s, in which kinetic energy is preserved. In [[inelastic collision]]s, kinetic energy is dissipated in various forms of energy, such as heat, sound, binding energy (breaking bound structures). + +[[Flywheel]]s have been developed as a method of [[flywheel energy storage|energy storage]]. This illustrates that kinetic energy is also stored in rotational motion. + +Several mathematical descriptions of kinetic energy exist that describe it in the appropriate physical situation. For objects and processes in common human experience, the formula ½mv² given by [[Newtonian mechanics|Newtonian (classical) mechanics]] is suitable. However, if the speed of the object is comparable to the speed of light, [[special relativity|relativistic effects]] become significant and the relativistic formula is used. If the object is on the atomic or [[sub-atomic scale]], [[quantum mechanical]] effects are significant and a quantum mechanical model must be employed. + +==Newtonian kinetic energy== + +===Kinetic energy of rigid bodies=== +In [[classical mechanics]], the kinetic energy of a ''point object'' (an object so small that its mass can be assumed to exist at one point), or a non-rotating [[rigid body]] depends on the [[mass]] of the body as well as its [[speed]]. The kinetic energy is equal to the mass multiplied by the square of the speed, multiplied by the constant 1/2. In formula form: + +:<math>E_\text{k} =\tfrac{1}{2} mv^2 </math> + +where <math>m</math> is the mass and <math>v</math> is the speed (or the velocity) of the body. In [[SI]] units (used for most modern scientific work), mass is measured in [[kilogram]]s, speed in [[metres per second]], and the resulting kinetic energy is in [[joule]]s. + +For example, one would calculate the kinetic energy of an 80&nbsp;kg mass (about 180&nbsp;lbs) traveling at 18 metres per second (about 40&nbsp;mph, or 65&nbsp;km/h) as +:<math>E_\text{k} = \frac{1}{2} \cdot 80 \,\text{kg} \cdot \left(18 \,\text{m/s}\right)^2 = 12960 \,\text{J} = 12.96 \,\text{kJ}</math> + +When you throw a ball, you do [[work (physics)|work]] on it to give it speed as it leaves your hand. The moving ball can then hit something and push it, doing work on what it hits. The kinetic energy of a moving object is equal to the work required to bring it from rest to that speed, or the work the object can do while being brought to rest: '''net force × displacement = kinetic energy''', i.e., + +:<math>F s =\tfrac{1}{2} mv^2</math> + +Since the kinetic energy increases with the square of the speed, an object doubling its speed has four times as much kinetic energy. 
For example, a car traveling twice as fast as another requires four times as much distance to stop, assuming a constant braking force. As a consequence of this quadrupling, it takes four times the work to double the speed. + +The kinetic energy of an object is related to its [[momentum]] by the equation: +:<math>E_\text{k} = \frac{p^2}{2m}</math> + +where: +:<math>p\;</math> is momentum +:<math>m\;</math> is mass of the body + +For the ''translational kinetic energy,'' that is the kinetic energy associated with [[rectilinear motion]], of a [[rigid body]] with constant [[mass]] <math>m\;</math>, whose [[center of mass]] is moving in a straight line with speed <math>v\;</math>, as seen above is equal to + +:<math> E_\text{t} =\tfrac{1}{2} mv^2 </math> + +where: +:<math>m\;</math> is the mass of the body +:<math>v\;</math> is the speed of the [[center of mass]] of the body. + +The kinetic energy of any entity depends on the reference frame in which it is measured. However the total energy of an isolated system, i.e. one which energy can neither enter nor leave, does not change in whatever reference frame it is measured. Thus, the chemical energy converted to kinetic energy by a rocket engine is divided differently between the rocket ship and its exhaust stream depending upon the chosen reference frame. This is called the [[Oberth effect]]. But the total energy of the system, including kinetic energy, fuel chemical energy, heat, etc., is conserved over time, regardless of the choice of reference frame. Different observers moving with different reference frames disagree on the value of this conserved energy. + +The kinetic energy of such systems depends on the choice of reference frame: the reference frame that gives the minimum value of that energy is the [[center of momentum]] frame, i.e. the reference frame in which the total momentum of the system is zero. This minimum kinetic energy contributes to the [[invariant mass]] of the system as a whole. + +====Derivation==== +The work done accelerating a particle during the infinitesimal time interval ''dt'' is given by the dot product of ''force'' and ''displacement'': +:<math>\mathbf{F} \cdot d \mathbf{x} = \mathbf{F} \cdot \mathbf{v} d t = \frac{d \mathbf{p}}{d t} \cdot \mathbf{v} d t = \mathbf{v} \cdot d \mathbf{p} = \mathbf{v} \cdot d (m \mathbf{v})\,,</math> +where we have assumed the relationship '''p'''&nbsp;=&nbsp;''m''&nbsp;'''v'''. (However, also see the special relativistic derivation [[Kinetic energy#Relativistic kinetic energy of rigid bodies|below]].) + +Applying the [[product rule]] we see that: +:<math> d(\mathbf{v} \cdot \mathbf{v}) = (d \mathbf{v}) \cdot \mathbf{v} + \mathbf{v} \cdot (d \mathbf{v}) = 2(\mathbf{v} \cdot d\mathbf{v}).</math> + +Therefore (assuming constant mass so that ''dm''=0), the following can be seen: +:<math> \mathbf{v} \cdot d (m \mathbf{v}) = \frac{m}{2} d (\mathbf{v} \cdot \mathbf{v}) = \frac{m}{2} d v^2 = d \left(\frac{m v^2}{2}\right). </math> + +Since this is a total differential (that is, it only depends on the final state, not how the particle got there), we can integrate it and call the result kinetic energy: +:<math> E_\text{k} = \int \mathbf{F} \cdot d \mathbf{x} = \int \mathbf{v} \cdot d (m \mathbf{v}) = \int d \left(\frac{m v^2}{2}\right) = \frac{m v^2}{2}. </math> + +This equation states that the kinetic energy (''E''<sub>k</sub>) is equal to the [[integral]] of the [[dot product]] of the [[velocity]] ('''v''') of a body and the [[infinitesimal]] change of the body's [[momentum]] ('''p'''). 
It is assumed that the body starts with no kinetic energy when it is at rest (motionless). + +===Rotating bodies=== +If a rigid body is rotating about any line through the center of mass then it has [[rotational energy|''rotational kinetic energy'']] (<math>E_\text{r}\,</math>) which is simply the sum of the kinetic energies of its moving parts, and is thus given by: + +:<math> E_\text{r} = \int \frac{v^2 dm}{2} = \int \frac{(r \omega)^2 dm}{2} = \frac{\omega^2}{2} \int{r^2}dm = \frac{\omega^2}{2} I = \begin{matrix} \frac{1}{2} \end{matrix} I \omega^2 </math> + +where: +*ω is the body's [[angular velocity]] +*''r'' is the distance of any mass ''dm'' from that line +*<math>I\,</math> is the body's [[moment of inertia]], equal to <math>\int{r^2}dm</math>. + +(In this equation the moment of [[inertia]] must be taken about an axis through the center of mass and the rotation measured by ω must be around that axis; more general equations exist for systems where the object is subject to wobble due to its eccentric shape). + +===Kinetic energy of systems=== +A system of bodies may have internal kinetic energy due to the relative motion of the bodies in the system. For example, in the [[Solar System]] the planets and planetoids are orbiting the Sun. In a tank of gas, the molecules are moving in all directions. The kinetic energy of the system is the sum of the kinetic energies of the bodies it contains. + +A macroscopic body that is stationary (i.e. a reference frame has been chosen to correspond to the body's [[center of momentum]]) may have various kinds of [[internal energy]] at the molecular or atomic level, which may be regarded as kinetic energy, due to molecular translation, rotation, and vibration, electron translation and spin, and nuclear spin. These all contribute to the body's mass, as provided by the special theory of relativity. When discussing movements of a macroscopic body, the kinetic energy referred to is usually that of the macroscopic movement only. However all internal energies of all types contribute to body's mass, inertia, and total energy. + +===Frame of reference=== + +The speed, and thus the kinetic energy of a single object is frame-dependent (relative): it can take any non-negative value, by choosing a suitable [[inertial frame of reference]]. For example, a bullet passing an observer has kinetic energy in the reference frame of this observer. The same bullet is stationary from the point of view of an observer moving with the same velocity as the bullet, and so has zero kinetic energy.<ref>{{cite book +|title=Introduction to the theory of relativity +|first1=Francis Weston +|last1=Sears +|first2=Robert W. +|last2=Brehme +|publisher=Addison-Wesley +|year=1968 +|page=127 +}}, [http://books.google.com/books?ei=uLlaTKiSF5DuOaqf3JYP&ct=result&id=cpzvAAAAMAAJ&dq=%22in+its+own+rest+frame%22+%22kinetic+energy%22&q=%22in+its+own+rest+frame%22 Snippet view of page 127] +</ref> By contrast, the total kinetic energy of a system of objects cannot be reduced to zero by a suitable choice of the inertial reference frame, unless all the objects have the same velocity. In any other case the total kinetic energy has a non-zero minimum, as no inertial reference frame can be chosen in which all the objects are stationary. This minimum kinetic energy contributes to the system's [[invariant mass]], which is independent of the reference frame. 
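+
+For instance, the short calculation below (in Python, with arbitrarily chosen masses and velocities) evaluates the kinetic energy of the same two-particle system in the original frame and in its center-of-momentum frame; the latter value is the minimum, and adding the kinetic energy of the total mass moving at the center-of-mass speed recovers the original value, consistent with the decomposition derived below.
+
+<syntaxhighlight lang="python">
+# Two point masses on a line; the numerical values are arbitrary illustrations.
+m = [2.0, 3.0]          # masses in kg
+v = [4.0, -1.0]         # velocities in m/s, measured in the original frame
+
+def total_ke(velocities):
+    return sum(0.5 * mi * vi ** 2 for mi, vi in zip(m, velocities))
+
+v_cm = sum(mi * vi for mi, vi in zip(m, v)) / sum(m)   # center-of-mass velocity
+
+ke_lab = total_ke(v)                          # frame-dependent value (17.5 J)
+ke_cm = total_ke([vi - v_cm for vi in v])     # minimum over inertial frames (15.0 J)
+
+# The bulk-motion term restores the original value: 15.0 + 2.5 = 17.5 J.
+print(ke_lab, ke_cm, ke_cm + 0.5 * sum(m) * v_cm ** 2)
+</syntaxhighlight>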
+ +The total kinetic energy of a system depends on the [[inertial frame of reference]]: it is the sum of the total kinetic energy in a [[center of momentum frame]] and the kinetic energy the total mass would have if it were concentrated in the [[center of mass]]. + +This may be simply shown: let <math>\textstyle\mathbf{V}</math> be the relative velocity of the center of mass frame ''i'' in the frame ''k''. +Since <math>\textstyle v^2 = (v_i + V)^2 = (\mathbf{v}_i + \mathbf{V}) \cdot (\mathbf{v}_i + \mathbf{V}) = \mathbf{v}_i \cdot \mathbf{v}_i + 2 \mathbf{v}_i \cdot \mathbf{V} + \mathbf{V} \cdot \mathbf{V} = v_i^2 + 2 \mathbf{v}_i \cdot \mathbf{V} + V^2</math>, + +:<math>E_\text{k} = \int \frac{v^2}{2} dm = \int \frac{v_i^2}{2} dm + \mathbf{V} \cdot \int \mathbf{v}_i dm + \frac{V^2}{2} \int dm. </math> + +However, let <math> \int \frac{v_i^2}{2} dm = E_i </math> the kinetic energy in the center of mass frame, <math> \int \mathbf{v}_i dm </math> would be simply the total momentum which is by definition zero in the center of mass frame, and let the total mass: <math> \int dm = M </math>. Substituting, we get:<ref>[http://www.phy.duke.edu/~rgb/Class/intro_physics_1/intro_physics_1/node64.html Physics notes - Kinetic energy in the CM frame]. [[Duke University|Duke]].edu. Accessed 2007-11-24.</ref> + +:<math> E_\text{k} = E_i + \frac{M V^2}{2}. </math> + +Thus the kinetic energy of a system is lowest with respect to center of momentum reference frames, i.e., frames of reference in which the center of mass is stationary (either the [[center of mass frame]] or any other [[center of momentum frame]]). In any other frame of reference there is additional kinetic energy corresponding to the total mass moving at the speed of the center of mass. The kinetic energy of the system in the [[center of momentum frame]] is a quantity which is both invariant (all observers see it to be the same) and is conserved (in an isolated system, it cannot change value, no matter what happens inside the system). + +===Rotation in systems=== +It sometimes is convenient to split the total kinetic energy of a body into the sum of the body's center-of-mass translational kinetic energy and the energy of rotation around the center of mass ([[rotational energy]]): + +:<math> E_\text{k} = E_t + E_\text{r} \, </math> + +where: +:''E''<sub>k</sub> is the total kinetic energy +:''E''<sub>t</sub> is the translational kinetic energy +:''E''<sub>r</sub> is the ''rotational energy'' or ''angular kinetic energy'' in the rest frame + +Thus the kinetic energy of a tennis ball in flight is the kinetic energy due to its rotation, plus the kinetic energy due to its translation. + +==Relativistic kinetic energy of rigid bodies== +{{See also|Mass in special relativity|Tests of relativistic energy and momentum}} + +In [[special relativity]], we must change the expression for linear momentum. + +Using ''m'' for [[rest mass]], '''v''' and ''v'' for the object's velocity and speed respectively, and ''c'' for the speed of light in vacuum, we assume for linear momentum that <math>\mathbf{p}=m\gamma \mathbf{v}</math>, where <math>\gamma = 1/\sqrt{1-v^2/c^2}</math>. 
+ +[[Integration by parts|Integrating by parts]] gives +:<math>E_\text{k} = \int \mathbf{v} \cdot d \mathbf{p}= \int \mathbf{v} \cdot d (m \gamma \mathbf{v}) = m \gamma \mathbf{v} \cdot \mathbf{v} - \int m \gamma \mathbf{v} \cdot d \mathbf{v} = m \gamma v^2 - \frac{m}{2} \int \gamma d (v^2)</math> +Remembering that <math>\gamma = (1 - v^2/c^2)^{-1/2}\!</math>, we get: +:<math>\begin{align} +E_\text{k} &= m \gamma v^2 - \frac{- m c^2}{2} \int \gamma d (1 - v^2/c^2) \\ + &= m \gamma v^2 + m c^2 (1 - v^2/c^2)^{1/2} - E_0 +\end{align}</math> +where ''E''<sub>0</sub> serves as an integration constant. +Thus: +:<math>\begin{align} +E_\text{k} &= m \gamma (v^2 + c^2 (1 - v^2/c^2)) - E_0 \\ + &= m \gamma (v^2 + c^2 - v^2) - E_0 \\ + &= m \gamma c^2 - E_0 +\end{align}</math> +The constant of integration ''E''<sub>0</sub> is found by observing that, when <math>\mathbf{v }= 0 , \ \gamma = 1\!</math> and <math> E_\text{k} = 0 \!</math>, giving +:<math>E_0 = m c^2 \,</math> +and giving the usual formula: +:<math>E_\text{k} = m \gamma c^2 - m c^2 = \frac{m c^2}{\sqrt{1 - v^2/c^2}} - m c^2</math> + +If a body's speed is a significant fraction of the [[speed of light]], it is necessary to use relativistic mechanics (the [[Relativity theory|theory of relativity]] as developed by [[Albert Einstein]]) to calculate its kinetic energy. + +For a relativistic object the momentum p is equal to: + +:<math> p = \frac{m v}{\sqrt{1 - (v/c)^2}} </math>. + +Thus the work expended accelerating an object from rest to a relativistic speed is: + +:<math>E_\text{k} = \frac{m c^2}{\sqrt{1 - (v/c)^2}} - m c^2 </math>. + +The equation shows that the energy of an object approaches infinity as the velocity ''v'' approaches the speed of light ''c'', thus it is impossible to accelerate an object across this boundary. + +The mathematical by-product of this calculation is the [[mass-energy equivalence]] formula—the body at rest must have energy content equal to: + +:<math>E_\text{rest} = E_0 = m c^2 \!</math> + +At a low speed (v<<c), the relativistic kinetic energy may be approximated well by the classical kinetic energy. This is done by [[binomial approximation]]. Indeed, taking [[Taylor expansion]] for the reciprocal square root and keeping first two terms we get: + +:<math>E_\text{k} \approx m c^2 \left(1 + \frac{1}{2} v^2/c^2\right) - m c^2 = \frac{1}{2} m v^2 </math>, + +So, the total energy E can be partitioned into the energy of the rest mass plus the traditional Newtonian kinetic energy at low speeds. + +When objects move at a speed much slower than light (e.g. in everyday phenomena on Earth), the first two terms of the series predominate. The next term in the approximation is small for low speeds, and can be found by extending the expansion into a Taylor series by one more term: + +:<math> E_\text{k} \approx m c^2 \left(1 + \frac{1}{2} v^2/c^2 + \frac{3}{8} v^4/c^4\right) - m c^2 = \frac{1}{2} m v^2 + \frac{3}{8} m v^4/c^2 </math>. + +For example, for a speed of {{convert|10|km/s|mph|abbr=on}} the correction to the Newtonian kinetic energy is 0.0417&nbsp;J/kg (on a Newtonian kinetic energy of 50&nbsp;MJ/kg) and for a speed of 100&nbsp;km/s it is 417&nbsp;J/kg (on a Newtonian kinetic energy of 5&nbsp;GJ/kg), etc. + +For higher speeds, the formula for the relativistic kinetic energy<ref>In Einstein's original [http://www.uni-kiel.de/ub/digiport/ab1800/G4378.html Über die spezielle und die allgemeine Relativitätstheorie] (Zu Seite 41) and in most translations (e.g. 
[http://bartleby.com/173/15.html Relativity - The Special and General Theory]) kinetic energy is defined as <math>m c^2 / \sqrt{1 - v^2/c^2}</math>.</ref> is derived by simply subtracting the rest mass energy from the total energy: + +:<math> E_\text{k} = m \gamma c^2 - m c^2 = m c^2\left(\frac{1}{\sqrt{1 - (v/c)^2}} - 1\right) </math>. + +The relation between kinetic energy and [[momentum]] is more complicated in this case, and is given by the equation: + +:<math>E_\text{k} = \sqrt{p^2 c^2 + m^2 c^4} - m c^2</math>. + +This can also be expanded as a [[Taylor series]], the first term of which is the simple expression from Newtonian mechanics. + +What this suggests is that the formulas for energy and momentum are not special and axiomatic, but rather concepts which emerge from the equation of mass with energy and the principles of relativity. + +===General relativity=== +{{see also|Schwarzschild geodesics}} +Using the convention that +:<math>g_{\alpha \beta} \, u^{\alpha} \, u^{\beta} \, = \, - c^2 </math> + +where the [[four-velocity]] of a particle is +:<math>u^{\alpha} \, = \, \frac{d x^{\alpha}}{d \tau} </math> + +and <math>\tau \,</math> is the [[proper time]] of the particle, there is also an expression for the kinetic energy of the particle in [[general relativity]]. + +If the particle has momentum +:<math>p_{\beta} \, = \, m \, g_{\beta \alpha} \, u^{\alpha} </math> + +as it passes by an observer with four-velocity ''u''<sub>obs</sub>, then the expression for total energy of the particle as observed (measured in a local inertial frame) is +:<math>E \, = \, - \, p_{\beta} \, u_{\text{obs}}^{\beta} </math> + +and the kinetic energy can be expressed as the total energy minus the rest energy: +:<math>E_{k} \, = \, - \, p_{\beta} \, u_{\text{obs}}^{\beta} \, - \, m \, c^2 \, .</math> + +Consider the case of a metric which is diagonal and spatially isotropic (''g''<sub>tt</sub>,''g''<sub>ss</sub>,''g''<sub>ss</sub>,''g''<sub>ss</sub>). Since +:<math>u^{\alpha} = \frac{d x^{\alpha}}{d t} \frac{d t}{d \tau} = v^{\alpha} u^{t} \,</math> + +where ''v''<sup>α</sup> is the ordinary velocity measured w.r.t. the coordinate system, we get +:<math>-c^2 = g_{\alpha \beta} u^{\alpha} u^{\beta} = g_{t t} (u^{t})^2 + g_{s s} v^2 (u^{t})^2 \,.</math> + +Solving for ''u''<sup>t</sup> gives +:<math>u^{t} = c \sqrt{\frac{-1}{g_{t t} + g_{s s} v^2}} \,.</math> + +Thus for a stationary observer (''v''= 0) +:<math>u_{\text{obs}}^{t} = c \sqrt{\frac{-1}{g_{t t}}} \,</math> + +and thus the kinetic energy takes the form +:<math>E_\text{k} = - m g_{tt} u^t u_{\text{obs}}^t - m c^2 = m c^2 \sqrt{\frac{g_{tt}}{g_{tt} + g_{ss} v^2}} - m c^2\,.</math> + +Factoring out the rest energy gives: +:<math>E_\text{k} = m c^2 \left( \sqrt{\frac{g_{tt}}{g_{tt} + g_{ss} v^2}} - 1 \right) \,.</math> + +This expression reduces to the special relativistic case for the flat-space metric where +:<math>g_{t t} = -c^2 \,</math> +:<math>g_{s s} = 1 \,.</math> + +In the Newtonian approximation to general relativity +:<math>g_{t t} = - \left( c^2 + 2 \Phi \right) \,</math> +:<math>g_{s s} = 1 - \frac{2 \Phi}{c^2} \,</math> + +where Φ is the Newtonian [[gravitational potential]]. This means clocks run slower and measuring rods are shorter near massive bodies. + +==Kinetic energy in quantum mechanics== + +{{further2|[[Hamiltonian (quantum mechanics)]]}} + +In [[quantum mechanics]], observables like kinetic energy are represented as operators. 
For one particle of mass ''m'', the kinetic energy operator appears as a term in the [[Hamiltonian (quantum mechanics)|Hamiltonian]] and is defined in terms of the more fundamental momentum operator <math>\hat p</math> as + +:<math>\hat T = \frac{\hat p^2}{2m}.</math> + +Notice that this can be obtained by replacing <math>p</math> by <math>\hat p</math> in the classical expression for kinetic energy in terms of [[momentum]], +:<math>E_\text{k} = \frac{p^2}{2m}.</math> + +In the [[Schrödinger picture]], <math>\hat p</math> takes the form <math>-i\hbar\nabla </math> where the derivative is taken with respect to position coordinates and hence + +:<math>\hat T = -\frac{\hbar^2}{2m}\nabla^2.</math> + +The expectation value of the electron kinetic energy, <math>\langle\hat{T}\rangle</math>, for a system of ''N'' electrons described by the [[Wave function|wavefunction]] <math>\vert\psi\rangle</math> is a sum of 1-electron operator expectation values: +:<math>\langle\hat{T}\rangle = \bigg\langle\psi \bigg\vert \sum_{i=1}^N \frac{-\hbar^2}{2 m_\text{e}} \nabla^2_i \bigg\vert \psi \bigg\rangle = -\frac{\hbar^2}{2 m_\text{e}} \sum_{i=1}^N \bigg\langle\psi \bigg\vert \nabla^2_i \bigg\vert \psi \bigg\rangle</math> +where <math>m_\text{e}</math> is the mass of the electron and <math>\nabla^2_i</math> is the [[Laplacian]] operator acting upon the coordinates of the ''i''<sup>th</sup> electron and the summation runs over all electrons. + +The [[Density functional theory|density functional]] formalism of quantum mechanics requires knowledge of the electron density ''only'', i.e., it formally does not require knowledge of the wavefunction. Given an electron density <math>\rho(\mathbf{r})</math>, the exact N-electron kinetic energy functional is unknown; however, for the specific case of a 1-electron system, the kinetic energy can be written as +:<math> T[\rho] = \frac{1}{8} \int \frac{ \nabla \rho(\mathbf{r}) \cdot \nabla \rho(\mathbf{r}) }{ \rho(\mathbf{r}) } d^3r </math> +where <math>T[\rho]</math> is known as the [[Carl Friedrich von Weizsäcker|von Weizsäcker]] kinetic energy functional. + +==See also== +{{Portal|Energy}} +* [[Escape velocity]] +* [[Joule]] +* [[KE-Munitions]] +* [[Projectile#Typical_projectile_speeds|Kinetic energy per unit mass of projectiles]] +* [[Projectile#Kinetic projectiles|Kinetic projectile]] +* [[Parallel axis theorem]] +* [[Potential energy]] +* [[Recoil]] + +==Notes== +{{reflist}} + +==References== +* [http://www.kineticenergys.com kinetic energy]—What it is and how it works. +* [[Oxford Dictionary]] 1998 +* {{cite web | url = http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Coriolis.html | title = Biography of Gaspard-Gustave de Coriolis (1792-1843) | accessdate = 2006-03-03 | author = School of Mathematics and Statistics, University of St Andrews | year = 2000 }} +* {{cite book | last = Serway | first = Raymond A. | coauthors = Jewett, John W. | title = Physics for Scientists and Engineers | edition = 6th | publisher = Brooks/Cole | year = 2004 | isbn = 0-534-40842-7 }} +* {{cite book | last = Tipler | first = Paul | title = Physics for Scientists and Engineers: Mechanics, Oscillations and Waves, Thermodynamics | edition = 5th | publisher = W. H. Freeman | year = 2004 | isbn = 0-7167-0809-4 }} +* {{cite book | last = Tipler | first = Paul | coauthors = Llewellyn, Ralph | title = Modern Physics | edition = 4th | publisher = W. H. 
Freeman | year = 2002 | isbn = 0-7167-4345-0 }} + +{{Footer energy}} + +[[Category:Forms of energy]] +[[Category:Kinetic energy]] +[[Category:Dynamics]] + +[[ml:ഗതികോര്‍ജ്ജം]] + 137ml25w7xk4aoyd9ibyo0abz5acuqx + + + + Lattice problem + 0 + 22617 + + 22618 + 2014-01-28T17:50:54Z + + Arta1365 + 0 + + + wikitext + text/x-wiki + In [[computer science]], '''lattice problems''' are a class of optimization problems on [[Lattice (group)|lattices]]. The conjectured intractability of such problems is central to construction of secure [[Lattice-based cryptography|lattice-based]] [[cryptosystems]]. For applications in such cryptosystems, lattices over vector spaces (often <math>\mathbb{Q}^n</math>) or free modules (often <math>\mathbb{Z}^n</math>) are generally considered. + +For all the problems below, assume that we are given (in addition to other more specific inputs) a basis for the vector space ''V'' and a [[Norm (mathematics)|norm]] ''N''. The norms usually considered are [[Norm (mathematics)#Euclidean norm|''L''<sup>2</sup>]]. However, other norms (such as [[Norm (mathematics)#p-norm|''L''<sup>p</sup>]]) are also considered and show up in a variety of results.<ref>[[Subhash Khot]], "Hardness of approximating the shortest vector problem in lattices," J. ACM 52, no. 5 (2005): 789–808.</ref> Let <math>\lambda(L)</math> denote the length of the shortest non-zero vector in the lattice ''L'': <math> \lambda(L)=\mathbf{min} \{ \|v\|_N | v \in \mathbf{L}, v \neq 0 \} +</math>. + +==Shortest vector problem (SVP)== +In SVP, a [[Basis (linear algebra)|basis]] of a [[vector space]] ''V'' and a [[Norm (mathematics)|norm]] ''N'' (often [[Norm (mathematics)#Euclidean norm|''L''<sup>2</sup>]]) are given for a lattice ''L'' and one must find the shortest non-zero vector in ''V'', as measured by ''N'', in ''L''. In other words, the algorithm should output a non-zero vector ''v'' such that <math>N(v)=\lambda(L)</math>. + +In the <math>\gamma</math>-approximation version <math>SVP_\gamma</math>, one must find a non-zero lattice vector of length at most <math>\gamma \lambda(L)</math>. + +===Known results=== +The exact version of the problem is [[NP-hard]].<ref name="vEB">[http://staff.science.uva.nl/~peter/vectors/mi8104c.html Peter van Emde Boas], P. 1981. Another NP-complete problem and the complexity of computing short vectors in a lattice. Tech. rep., University of Amsterdam, Department of Mathematics, Netherlands. Technical Report 8104</ref> +Approach techniques: [[Lenstra–Lenstra–Lovász lattice basis reduction algorithm]] produces a "relatively short vector" in polynomial time, but does not solve the problem. +Kannan's HKZ basis reduction algorithm solves the problem in <math>n^{\frac{n}{2 e} + o(n)}</math> time where n is the dimension. +Lastly, Schnorr presented a technique that interpolates between LLL and HKZ called Block Reduction. Block reduction works with HKZ bases and if the number of blocks is chosen to be larger than the dimension, the resulting algorithm Kannan's full HKZ basis reduction. + +==GapSVP== +The problem <math>GapSVP_\beta</math> consists of differentiating between the instances of SVP in which the answer is at most 1 or larger than <math>\beta</math>, where <math>\beta</math> can be a fixed function of <math>n</math>, the number of vectors. Given a basis for the lattice, the algorithm must decide whether <math>\lambda(L) \leq 1</math> or <math>\lambda(L)>\beta</math>. Like other [[promise problem]]s, the algorithm is allowed to err on all other cases. 
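+
+As a toy illustration of these definitions, the sketch below (in Python) estimates <math>\lambda(L)</math> for a small two-dimensional lattice by naive enumeration and then answers the <math>GapSVP_\beta</math> promise problem; the basis, the coefficient bound and <math>\beta</math> are arbitrary choices, and enumeration of this kind is adequate only for such toy instances, not a substitute for the reduction algorithms mentioned above.
+
+<syntaxhighlight lang="python">
+from itertools import product
+from math import hypot
+
+B = [(3.0, 1.0), (1.0, 4.0)]    # toy 2-dimensional basis (rows are basis vectors)
+
+def shortest_nonzero_length(basis, bound=5):
+    """Naive estimate of lambda(L): enumerate integer combinations with
+    coefficients in [-bound, bound].  Reliable only for tiny, nearly reduced
+    bases such as the toy example above."""
+    best = float("inf")
+    for c1, c2 in product(range(-bound, bound + 1), repeat=2):
+        if c1 == 0 and c2 == 0:
+            continue                 # lambda(L) ranges over non-zero vectors only
+        x = c1 * basis[0][0] + c2 * basis[1][0]
+        y = c1 * basis[0][1] + c2 * basis[1][1]
+        best = min(best, hypot(x, y))
+    return best
+
+def gap_svp(basis, beta):
+    """Decide the promise problem: YES if lambda(L) <= 1, NO if lambda(L) > beta;
+    any answer is allowed on instances outside the promise."""
+    lam = shortest_nonzero_length(basis)
+    if lam <= 1:
+        return "YES"
+    if lam > beta:
+        return "NO"
+    return "outside the promise"
+
+print(shortest_nonzero_length(B), gap_svp(B, beta=2.0))   # ~3.16 and "NO"
+</syntaxhighlight>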
+ +Yet another version of the problem is <math>GapSVP_{\zeta,\gamma}</math> for some functions <math>\zeta,\gamma</math>. The input to the algorithm is a basis <math>B</math> and a number <math>d</math>. It is assured that all the vectors in the [[Gram–Schmidt orthogonalization]] are of length at least 1, and that <math>\lambda(L(B)) \leq \zeta(n) </math> and that <math>1 \leq d \leq \zeta(n)/\gamma(n)</math> where <math>n</math> is the dimension. The algorithm must accept if <math>\lambda(L(B)) \leq d</math>, and reject if <math>\lambda(L(B)) \geq \gamma(n).d</math>. For large <math>\zeta</math> (<math>\zeta(n)>2^{n/2}</math>), the problem is equivalent to <math>GapSVP_\gamma</math> because<ref>Chris Peikert, "Public-key cryptosystems from the worst-case shortest vector problem: extended abstract," in Proceedings of the 41st annual ACM symposium on Theory of computing (Bethesda, MD, USA: ACM, 2009), 333–342, http://portal.acm.org/citation.cfm?id=1536414.1536461.</ref> a preprocessing done using the [[LLL algorithm]] makes the second condition (and hence, <math>\zeta</math>) redundant. + +==Closest vector problem (CVP)== +<gallery caption="Lattice problems by example" widths="200px" heights="200px"> +Image:Svp09.png|The SVP by example +Image:Cvp3.png|The CVP by example +</gallery> +In CVP, a basis of a vector space ''V'' and a [[Metric (mathematics)|metric]] ''M'' (often [[Euclidean distance|''L''<sup>2</sup>]]) are given for a lattice ''L'', as well as a vector ''v'' in ''V'' but not necessarily in ''L''. It is desired to find the vector in ''L'' closest to ''v'' (as measured by ''M''). In the <math>\gamma</math>-approximation version <math>CVP_\gamma</math>, one must find a lattice vector at distance at most <math>\gamma</math>. + +===Relationship with SVP=== +The closest vector problem is a generalization of the shortest vector problem. It is easy<ref>Daniele Micciancio and [[Shafi Goldwasser]], Complexity of lattice problems (Springer, 2002)</ref> to show that given an oracle for <math>CVP_\gamma</math> (defined below), one can solve <math>SVP_\gamma</math> by making some queries to the oracle. The naive method to find the shortest vector by calling the <math>CVP_\gamma</math> oracle to find the closest vector to 0 does not work because 0 is itself a lattice vector and the algorithm could potentially output 0. + +The reduction from <math>SVP_\gamma</math> to <math>CVP_\gamma</math> is as follows: Suppose that the input to the <math>SVP_\gamma</math> problem is the basis for lattice <math>B=[b_1,b_2,\ldots,b_n]</math>. Consider the basis <math>B^i=[b_1,\ldots,2b_i,\ldots,b_n]</math> and let <math>x_i</math> be the vector returned by <math>CVP_\gamma(B^i, b_i)</math>. The claim is that the shortest vector in the set <math>\{x_i-b_i\}</math> is the shortest vector in the given lattice. + +===Known results=== +Goldreich et al.<ref>O. Goldreich et al., "Approximating shortest lattice vectors is not harder than approximating closest lattice vectors," Inf. Process. Lett. 71, no. 2 (1999): 55–61.</ref> showed that any hardness of SVP implies the same hardness for CVP. Using [[Probabilistically checkable proof (complexity)|PCP]] tools, Arora et al.<ref>[[Sanjeev Arora]] et al., "The hardness of approximate optima in lattices, codes, and systems of linear equations," J. Comput. Syst. Sci. 54, no. 
2 (1997): 317–331.</ref> showed that CVP is hard to approximate within factor <math>2^{\log^{1-\epsilon}(n)}</math> unless <math>\operatorname{NP} \subseteq \operatorname{DTIME}(2^{poly(\log n)})</math>. Dinur et al.<ref>I. Dinur et al., "Approximating CVP to Within Almost-Polynomial Factors is NP-Hard," Combinatorica 23, no. 2 (2003): 205–243.</ref> strengthened this by giving a NP-hardness result with <math>\epsilon=(\log \log n)^c</math> for <math>c<1/2</math>. + +===Sphere decoding=== +The algorithm for CVP, especially the Fincke and Pohst variant,<ref>Fincke, U. and Pohst, M., "Improved Methods for Calculating Vectors of Short Length in a Lattice, Including a Complexity Analysis," Math. Comp., vol. 44, no. 170, pp. 463–471, 1985</ref> have been used, for example, for data detection in multiple-input multiple-output ([[MIMO]]) wireless communication systems (for coded and uncoded signals)<ref>Biglieri, E. and Calderbank, R. and [[Anthony G. Constantinides]], A. and Goldsmith, A. and Paulraj, A. and Poor, H. V., MIMO Wireless Communications, Cambridge U. P., Cambridge, 2007</ref> +.<ref>Agrell, E. and Eriksson, T. and Vardy, A. and Zeger, K., "Closest Point Search in Lattices," IEEE Trans. Inform. Theory, vol. 48, no. 8, pp. 2201–2214, 2002. http://dx.doi.org/10.1109/TIT.2002.800499</ref> It is called ''sphere decoding''.<ref>Ping Wang, Tho Le-Ngoc, "A List Sphere Decoding Algorithm with Improved Radius Setting Strategies", Wireless Personal Communications +November 2011, Volume 61, Issue 1, pp 189-200 +</ref> + +It has been applied in the field of the integer ambiguity resolution of carrier-phase GNSS (GPS) +.<ref>Hassibi, A. and Boyd, S., Integer Parameter Estimation in Linear Models with Applications to GPS, IEEE Trans. Sig. Proc., 46, 11, 2938--2952, 1998.</ref> It is called ''LAMBDA method'' in that field. + +==GapCVP== +This problem is similar to the GapSVP problem. For <math>GapCVP_\beta</math>, the input consists of a lattice basis and a vector <math>v</math> and the algorithm must answer whether +* there is a lattice vector such that the distance between it and <math>v</math> is at most 1. +* every lattice vector is at a distance greater than <math>\beta</math> away from <math>v</math>. + +===Known results=== +The problem is trivially contained in [[NP (complexity)|NP]] for any approximation factor. + +Schnorr,<ref>C. P. Schnorr, Factoring integers and computing [[discrete logarithm]]s via diophantine approximation, Advances in Cryptology: Proceedings of Eurocrypt '91</ref> in 1987, showed that deterministic polynomial time algorithms can solve the problem for <math>\beta=2^{O(n(\log \log n)^2/\log n)}</math>. Ajtai et al.<ref>Miklós Ajtai, Ravi Kumar, and D. Sivakumar, "A sieve algorithm for the shortest lattice vector problem," in Proceedings of the thirty-third annual ACM symposium on Theory of computing (Hersonissos, Greece: ACM, 2001), 601–610, http://portal.acm.org/citation.cfm?doid=380752.380857</ref> showed that probabilistic algorithms can achieve a slightly better approximation factor of <math>\beta=2^{O(n \log \log n/\log n)}</math> + +In 1993, Banaszczyk<ref>W. Banaszczyk, New bounds in some transference theorems in the geometry of numbers, Math. Ann. 296 (1993) 625–635.</ref> showed that <math>GapCVP_n</math> is in <math>NP \cap coNP</math>. 
In 2000, Goldreich and Goldwasser<ref>Oded Goldreich and Shafi Goldwasser, "On the limits of non-approximability of lattice problems," in Proceedings of the thirtieth annual ACM symposium on Theory of computing (Dallas, Texas, United States: ACM, 1998), 1–9, http://portal.acm.org/citation.cfm?id=276704.</ref> showed that <math>\beta=\sqrt{n/\log n}</math> puts the problem in both NP and [[coAM]]. In 2005, Aharonov and Regev<ref>{{Cite journal| doi = 10.1145/1089023.1089025| volume = 52| issue = 5| pages = 749–765| last = Aharonov| first = Dorit| coauthors = Oded Regev| title = Lattice problems in NP <math>\cap</math> coNP| journal = J. ACM| year = 2005| url = http://portal.acm.org/citation.cfm?id=1089025}}</ref> showed that for some constant <math>c</math>, the problem with <math>\beta=c\sqrt{n}</math> is in <math>NP \cap coNP</math>. + +For lower bounds, Dinur et al.<ref>I. Dinur, G. Kindler, and S. Safra, "Approximating-CVP to within Almost-Polynomial Factors is NP-Hard," in Proceedings of the 39th Annual Symposium on Foundations of Computer Science (IEEE Computer Society, 1998), 99, http://portal.acm.org/citation.cfm?id=796466.</ref> showed in 1998 that the problem is NP-hard for <math>\beta=n^{o(1/\log{\log{n}})}</math>. + +==Shortest independent vectors problem (SIVP)== +Given a lattice L of dimension n, the algorithm must output n [[linearly independent]] <math>v_1, v_2, \ldots, v_n</math> so that <math>\max \|v_i\| < \max_{B} \|b_i\|</math> where the right hand side considers all basis <math>B=\{b_1,\ldots,b_n\}</math> of the lattice. + +In the <math>\gamma</math>-approximate version, given a lattice L with dimension n, find n [[linearly independent]] vectors <math>v_1, v_2,\ldots, v_n</math> of length max ||<math>v_i</math>|| ≤ <math>\gamma \lambda_n(L)</math>, where <math>\lambda_n(L)</math> is the <math>n</math>'th successive mininum of <math>L</math>. + +==Bounded distance decoding== +This problem is similar to CVP. Given a vector such that its distance from the lattice is at most <math>\lambda(L)/2</math>, the algorithm must output the closest lattice vector to it. + +==Covering radius problem== +Given a basis for the lattice, the algorithm must find the largest distance (or in some versions, its approximation) from any vector to the lattice. + +==Shortest basis problem== +Many problems become easier if the input basis consists of short vectors. An algorithm that solves the Shortest Basis Problem (SBP) must, given a lattice basis<math>B</math>, output an equivalent basis <math>B'</math> such that the length of the longest vector in <math>B'</math> is as short as possible. + +The approximation version <math>SBP_\gamma</math> problem consist of finding a basis whose longest vector is at most <math>\gamma</math> times longer than the longest vector in the shortest basis. + +==Use in cryptography== +{{main|Lattice-based cryptography}} + +[[Average case]] hardness of problems forms a basis for proofs-of-security for most cryptographic schemes. However, experimental evidence suggests that most NP-hard problems lack this property: they are probably only worst case hard. Many lattice problems have been conjectured or proven to be average-case hard, making them an attractive class of problems to base cryptographic schemes on. Moreover, worst-case hardness of some lattice problems have been used to create secure cryptographic schemes. The use of worst-case hardness in such schemes makes them among the very few schemes that are very likely secure even against [[quantum computers]]. 
+ +The above lattice problems are easy to solve if the algorithm is provided with a "good" basis. [[Lattice reduction]] algorithms aim, given a basis for a lattice, to output a new basis consisting of relatively short, nearly orthogonal vectors. The [[LLL algorithm]]<ref>{A. K. Lenstra, H. W. Lenstra, Jr., L. Lovász, Factoring polynomials with rational coefficients, Math. Ann. 261 (1982), 515–534.}</ref> was an early efficient algorithm for this problem which could output an almost reduced lattice basis in polynomial time. This algorithm and its further refinements were used to break several cryptographic schemes, establishing its status as a very important tool in cryptanalysis. The success of LLL on experimental data led to a belief that lattice reduction might be an easy problem in practice. However, this belief was challenged when in the late 1990s, several new results on the hardness of lattice problems were obtained, starting with the result of Ajtai.<ref name="ajtai">M. Ajtai, "Generating hard instances of lattice problems (extended abstract)," in Proceedings of the twenty-eighth annual ACM symposium on Theory of computing (Philadelphia, Pennsylvania, United States: ACM, 1996), 99–108, http://portal.acm.org/citation.cfm?id=237838</ref> + +In his seminal papers,<ref name="ajtai" /><ref>Miklós Ajtai, "The shortest vector problem in ''L<sub>2</sub>'' is ''NP''-hard for randomized reductions (extended abstract)," in Proceedings of the thirtieth annual ACM symposium on Theory of computing (Dallas, Texas, United States: ACM, 1998), 10–19, http://portal.acm.org/citation.cfm?id=276705</ref> Ajtai showed that the SVP problem was NP-hard and discovered some connections between the worst-case complexity and [[average-case complexity]] of some lattice problems. Building on these results, Ajtai and Dwork<ref>Miklós Ajtai and Cynthia Dwork, "A public-key cryptosystem with worst-case/average-case equivalence," in Proceedings of the twenty-ninth annual ACM symposium on Theory of computing (El Paso, Texas, United States: ACM, 1997), 284–293, http://portal.acm.org/citation.cfm?id=258604</ref> created a public-key cryptosystem whose security could be proven using only the worst case hardness of a certain version of SVP, thus making it the first<ref>1Jin-Yi Cai, "The Complexity of Some Lattice Problems," in Algorithmic Number Theory, 2000, 1–32, http://dx.doi.org/10.1007/10722028_1</ref> result to have used worst-case hardness to create secure systems. + +==See also== +*[[Learning with errors]] + +==References== +{{reflist|colwidth=30em}} + +* Daniele Micciancio: The Shortest Vector Problem is {NP}-hard to approximate to within some constant. SIAM Journal on Computing. 2001, http://cseweb.ucsd.edu/~daniele/papers/SVP.html. +* Phong Q. Nguyen and Jacques Stern, "Lattice Reduction in Cryptology: An Update," in Proceedings of the 4th International Symposium on Algorithmic Number Theory (Springer-Verlag, 2000), 85–112, http://portal.acm.org/citation.cfm?id=749906. +* {{cite journal |author=Agrell, E.; Eriksson, T.; Vardy, A.; Zeger, K. |title=Closest Point Search in Lattices |journal=IEEE Trans. Inform. 
Theory |volume=48 |issue=8 |pages=2201–2214 |doi=10.1109/TIT.2002.800499}} + +{{DEFAULTSORT:Lattice Problems}} +[[Category:Lattice-based cryptography]] +[[Category:Mathematical problems]] + 8wqxmb6db86t7o3uwra2k5anv632w1h + + + + Huffman coding + 0 + 484 + + 485 + 2014-01-22T17:33:28Z + + AnomieBOT + 0 + + + Dating maintenance tags: {{External links}} + wikitext + text/x-wiki + {{more footnotes|date=January 2011}} +[[Image:Huffman tree 2.svg|thumb|Huffman tree generated from the exact frequencies of the text "this is an example of a huffman tree". The frequencies and codes of each character are below. Encoding the sentence with this code requires 135 bits, as opposed to 288 bits if 36 characters of 8 bits were used. (This assumes that the code tree structure is known to the decoder and thus does not need to be counted as part of the transmitted information.)]] +{| class="wikitable sortable" style="float:right; clear:right;" +!Char!!Freq!!Code +|- +|space||7||111 +|- +|a ||4||010 +|- +|e ||4||000 +|- +|f ||3||1101 +|- +|h ||2||1010 +|- +|i ||2||1000 +|- +|m ||2||0111 +|- +|n ||2||0010 +|- +|s ||2||1011 +|- +|t ||2||0110 +|- +|l ||1||11001 +|- +|o ||1||00110 +|- +|p ||1||10011 +|- +|r ||1||11000 +|- +|u ||1||00111 +|- +|x ||1||10010 +|} +In [[computer science]] and [[information theory]], '''Huffman coding''' is an [[entropy encoding]] [[algorithm]] used for [[lossless data compression]]. The term refers to the use of a [[variable-length code]] table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by [[David A. Huffman]] while he was a [[Doctor of Philosophy|Ph.D.]] student at [[Massachusetts Institute of Technology|MIT]], and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". + +Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a [[prefix code]] (sometimes called "prefix-free codes", that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method ''of this type'': no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code.{{citation needed|date=July 2013}} The running time of Huffman's method is fairly efficient, it takes <math> O(n \log n) </math> operations to construct it. A method was later found to design a Huffman code in [[linear time]] if input probabilities (also known as ''weights'') are sorted.<ref>Jan van Leeuwen, On the construction of Huffman trees, ICALP 1976, 382-410</ref> + +For a set of symbols with a uniform probability distribution and a number of members which is a [[power of two]], Huffman coding is equivalent to simple binary [[Block code|block encoding]], e.g., [[ASCII]] coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm. + +Although Huffman's original algorithm is optimal for a symbol-by-symbol coding (i.e. 
a stream of unrelated symbols) with a known input probability distribution, it is not optimal when the symbol-by-symbol restriction is dropped, or when the [[probability mass function]]s are unknown, not [[independent and identically-distributed random variables|identically distributed]], or not [[independence (probability theory)|independent]] (e.g., "cat" is more common than "cta").{{citation needed|date=July 2013}} Other methods such as [[arithmetic coding]] and [[LZW]] coding often have better compression capability: both of these methods can combine an arbitrary number of symbols for more efficient coding, and generally adapt to the actual input statistics, the latter of which is useful when input probabilities are not precisely known or vary significantly within the stream. However, the limitations of Huffman coding should not be overstated; it can be used adaptively, accommodating unknown, changing, or context-dependent probabilities. In the case of known [[independent and identically distributed random variables]], combining symbols reduces inefficiency in a way that approaches optimality as the number of symbols combined increases. + +== History == + +In 1951, [[David A. Huffman]] and his [[MIT]] [[information theory]] classmates were given the choice of a term paper or a final [[exam]]. The professor, [[Robert M. Fano]], assigned a [[term paper]] on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted [[binary tree]] and quickly proved this method the most efficient.<ref>see Ken Huffman (1991)</ref> + +In doing so, the student outdid his professor, who had worked with [[information theory]] inventor [[Claude Shannon]] to develop a similar code. By building the tree from the bottom up instead of the top down, Huffman avoided the major flaw of the suboptimal [[Shannon-Fano coding]]. + +== Problem definition == + +=== Informal description === +;Given: A set of symbols and their weights (usually [[Proportionality (mathematics)|proportional]] to probabilities). +;Find: A [[Prefix code|prefix-free binary code]] (a set of codewords) with minimum [[Expected value|expected]] codeword length (equivalently, a tree with minimum [[weighted path length from the root]]). + +=== Formalized description === +'''Input'''.<br> +Alphabet <math>A = \left\{a_{1},a_{2},\cdots,a_{n}\right\}</math>, which is the symbol alphabet of size <math>n</math>. <br> +Set <math>W = \left\{w_{1},w_{2},\cdots,w_{n}\right\}</math>, which is the set of the (positive) symbol weights (usually proportional to probabilities), i.e. <math>w_{i} = \mathrm{weight}\left(a_{i}\right), 1\leq i \leq n</math>. <br> +<br> +'''Output'''.<br> +Code <math>C \left(A,W\right) = \left\{c_{1},c_{2},\cdots,c_{n}\right\}</math>, which is the set of (binary) codewords, where <math>c_{i}</math> is the codeword for <math>a_{i}, 1 \leq i \leq n</math>.<br> +<br> +'''Goal'''.<br> +Let <math>L\left(C\right) = \sum_{i=1}^{n}{w_{i}\times\mathrm{length}\left(c_{i}\right)}</math> be the weighted path length of code <math>C</math>. Condition: <math>L\left(C\right) \leq L\left(T\right)</math> for any code <math>T\left(A,W\right)</math>. 
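+
+For concreteness, the objective <math>L\left(C\right)</math> and the entropy bound discussed below can be evaluated directly. The short Python sketch that follows uses the weights and codewords of the sample in the next subsection; the variable names are chosen here purely for illustration.
+
+<syntaxhighlight lang="python">
+# Sketch: weighted path length L(C) and entropy H(A) for one candidate prefix code.
+from math import log2
+
+weights   = {'a': 0.10, 'b': 0.15, 'c': 0.30, 'd': 0.16, 'e': 0.29}
+codewords = {'a': '010', 'b': '011', 'c': '11', 'd': '00', 'e': '10'}
+
+L = sum(w * len(codewords[s]) for s, w in weights.items())   # 2.25 bits per symbol
+H = -sum(w * log2(w) for w in weights.values() if w > 0)     # about 2.205 bits per symbol
+print(L, H)
+</syntaxhighlight>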
+ +=== Samples === +{|class="wikitable" +!rowspan="2" style="background:#efefef"| Input (''A'', ''W'') +!style="background:#efefef;font-weight:normal"| Symbol (''a''<sub>''i''</sub>) +|align="center" style="background:#efefef"| a +|align="center" style="background:#efefef"| b +|align="center" style="background:#efefef"| c +|align="center" style="background:#efefef"| d +|align="center" style="background:#efefef"| e +!style="background:#efefef"| Sum +|- +!style="background:#efefef;font-weight:normal"| Weights (''w''<sub>''i''</sub>) +|align="center"| 0.10 +|align="center"| 0.15 +|align="center"| 0.30 +|align="center"| 0.16 +|align="center"| 0.29 +|align="center"| = 1 +|- +!rowspan="3" style="background:#efefef"| Output ''C'' +!style="background:#efefef;font-weight:normal"| Codewords (''c''<sub>''i''</sub>) +|align="center"| <tt>010</tt> +|align="center"| <tt>011</tt> +|align="center"| <tt>11</tt> +|align="center"| <tt>00</tt> +|align="center"| <tt>10</tt> +|rowspan="2"|&nbsp; +|- +!style="background:#efefef;font-weight:normal"| Codeword length (in bits)<br />(''l''<sub>''i''</sub>) +|align="center"| 3 +|align="center"| 3 +|align="center"| 2 +|align="center"| 2 +|align="center"| 2 +|- +!style="background:#efefef;font-weight:normal"| Weighted path length<br />(''l''<sub>''i''</sub> ''w''<sub>''i''</sub> ) +|align="center"| 0.30 +|align="center"| 0.45 +|align="center"| 0.60 +|align="center"| 0.32 +|align="center"| 0.58 +|align="center"| ''L''(''C'') = 2.25 +|- +!rowspan="3" style="background:#efefef"| Optimality +!style="background:#efefef;font-weight:normal"| Probability budget<br />(2<sup>-''l''<sub>''i''</sub></sup>) +| align="center" | 1/8 +| align="center" | 1/8 +| align="center" | 1/4 +| align="center" | 1/4 +| align="center" | 1/4 +| align="center" | = 1.00 +|- +! style="background: #efefef; font-weight: normal;" | Information content (in bits)<br />(−'''log'''<sub>2</sub> ''w''<sub>''i''</sub>) ≈ +|align="center"| 3.32 +|align="center"| 2.74 +|align="center"| 1.74 +|align="center"| 2.64 +|align="center"| 1.79 +|align="center"| &nbsp; +|- +! style="background: #efefef; font-weight: normal;" | Entropy<br />(−''w''<sub>''i''</sub> '''log'''<sub>2</sub> ''w''<sub>''i''</sub>) +|align="center"| 0.332 +|align="center"| 0.411 +|align="center"| 0.521 +|align="center"| 0.423 +|align="center"| 0.518 +|align="center"| ''H''(''A'') = 2.205 +|} + +For any code that is ''biunique'', meaning that the code is ''uniquely decodeable'', the sum of the probability budgets across all symbols is always less than or equal to one. In this example, the sum is strictly equal to one; as a result, the code is termed a ''complete'' code. If this is not the case, you can always derive an equivalent code by adding extra symbols (with associated null probabilities), to make the code complete while keeping it ''biunique''. + +As defined by [[A Mathematical Theory of Communication|Shannon (1948)]], the information content ''h'' (in bits) of each symbol ''a''<sub>i</sub> with non-null probability is + +:<math>h(a_i) = \log_2{1 \over w_i}. </math> + +The [[information entropy|entropy]] ''H'' (in bits) is the weighted sum, across all symbols ''a''<sub>''i''</sub> with non-zero probability ''w''<sub>''i''</sub>, of the information content of each symbol: + +:<math> H(A) = \sum_{w_i > 0} w_i h(a_i) = \sum_{w_i > 0} w_i \log_2{1 \over w_i} = - \sum_{w_i > 0} w_i \log_2{w_i}. 
</math> + +(Note: A symbol with zero probability has zero contribution to the entropy, since <math>\lim_{w \to 0^+} w \log_2 w = 0</math> So for simplicity, symbols with zero probability can be left out of the formula above.) + +As a consequence of [[Shannon's source coding theorem]], the entropy is a measure of the smallest codeword length that is theoretically possible for the given alphabet with associated weights. In this example, the weighted average codeword length is 2.25 bits per symbol, only slightly larger than the calculated entropy of 2.205 bits per symbol. So not only is this code optimal in the sense that no other feasible code performs better, but it is very close to the theoretical limit established by Shannon. + +Note that, in general, a Huffman code need not be unique, but it is always one of the codes minimizing <math>L(C)</math>. + +== Basic technique == + +===Compression=== +[[Image:Huffman coding example.svg|thumb|A source generates 4 different symbols <math>\{a_1 , a_2 , a_3 , a_4 \}</math> with probability <math>\{0.4 ; 0.35 ; 0.2 ; 0.05 \}</math>. A binary tree is generated from left to right taking the two least probable symbols and putting them together to form another equivalent symbol having a probability that equals the sum of the two symbols. The process is repeated until there is just one symbol. The tree can then be read backwards, from right to left, assigning different bits to different branches. The final Huffman code is: +{|class="wikitable" +! Symbol !! Code +|- +|a1 || 0 +|- +|a2 || 10 +|- +|a3 || 110 +|- +|a4 || 111 +|- +|} +The standard way to represent a signal made of 4 symbols is by using 2 bits/symbol, but the [[Information entropy|entropy]] of the source is 1.74 bits/symbol. If this Huffman code is used to represent the signal, then the average length is lowered to 1.85 bits/symbol; it is still far from the theoretical limit because the probabilities of the symbols are different from negative powers of two.]] + +The technique works by creating a [[binary tree]] of nodes. These can be stored in a regular [[Array data type|array]], the size of which depends on the number of symbols, <math>n</math>. A node can be either a [[leaf node]] or an [[internal node]]. Initially, all nodes are leaf nodes, which contain the '''symbol''' itself, the '''weight''' (frequency of appearance) of the symbol and optionally, a link to a '''parent''' node which makes it easy to read the code (in reverse) starting from a leaf node. Internal nodes contain symbol '''weight''', links to '''two child nodes''' and the optional link to a '''parent''' node. As a common convention, bit '0' represents following the left child and bit '1' represents following the right child. A finished tree has up to <math>n</math> leaf nodes and <math>n-1</math> internal nodes. A Huffman tree that omits unused symbols produces optimal code lengths. + +The process essentially begins with the leaf nodes containing the probabilities of the symbol they represent, then a new node whose children are the 2 nodes with smallest probability is created, such that the new node's probability is equal to the sum of the children's probability. With the previous 2 nodes merged into one node (thus not considering them anymore), and with the new node being now considered, the procedure is repeated until only one node remains, the Huffman tree. 
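+
+The following Python sketch illustrates this repeated-merge process using the standard <code>heapq</code> module as the priority queue and then reads the codewords off the finished tree; the helper names and the tuple representation of nodes are chosen here purely for illustration.
+
+<syntaxhighlight lang="python">
+# Sketch of the repeated-merge construction: nodes are nested tuples,
+# a leaf is (symbol,) and an internal node is (left_subtree, right_subtree).
+# A counter breaks ties in the heap so that trees are never compared directly.
+import heapq
+from itertools import count
+
+def huffman_code(freqs):
+    """freqs: dict mapping symbol -> weight. Returns dict mapping symbol -> bit string."""
+    tiebreak = count()
+    heap = [(w, next(tiebreak), (sym,)) for sym, w in freqs.items()]
+    heapq.heapify(heap)
+    if len(heap) == 1:                                   # degenerate one-symbol alphabet
+        return {next(iter(freqs)): '0'}
+    while len(heap) > 1:                                 # merge the two least probable nodes
+        w1, _, left = heapq.heappop(heap)
+        w2, _, right = heapq.heappop(heap)
+        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
+    _, _, root = heap[0]
+
+    codes = {}
+    def walk(node, prefix):
+        if len(node) == 1:                               # leaf: record its codeword
+            codes[node[0]] = prefix
+        else:                                            # internal node: 0 = left, 1 = right
+            walk(node[0], prefix + '0')
+            walk(node[1], prefix + '1')
+    walk(root, '')
+    return codes
+
+print(huffman_code({'a1': 0.4, 'a2': 0.35, 'a3': 0.2, 'a4': 0.05}))
+</syntaxhighlight>
+
+On the four-symbol source of the figure above this yields codeword lengths 1, 2, 3 and 3, and hence the same average length of 1.85 bits per symbol; the exact bit patterns may differ from those in the figure, since Huffman codes are not unique.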
+ +The simplest construction algorithm uses a [[priority queue]] where the node with lowest probability is given highest priority: + +# Create a leaf node for each symbol and add it to the priority queue. +# While there is more than one node in the queue: +## Remove the two nodes of highest priority (lowest probability) from the queue +## Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities. +## Add the new node to the queue. +# The remaining node is the root node and the tree is complete. + +Since efficient priority queue data structures require O(log ''n'') time per insertion, and a tree with ''n'' leaves has 2''n''−1 nodes, this algorithm operates in O(''n'' log ''n'') time, where ''n'' is the number of symbols. + +If the symbols are sorted by probability, there is a [[linear-time]] (O(''n'')) method to create a Huffman tree using two [[Queue (data structure)|queues]], the first one containing the initial weights (along with pointers to the associated leaves), and combined weights (along with pointers to the trees) being put in the back of the second queue. This assures that the lowest weight is always kept at the front of one of the two queues: + +#Start with as many leaves as there are symbols. +#Enqueue all leaf nodes into the first queue (by probability in increasing order so that the least likely item is in the head of the queue). +#While there is more than one node in the queues: +##Dequeue the two nodes with the lowest weight by examining the fronts of both queues. +##Create a new internal node, with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight. +##Enqueue the new node into the rear of the second queue. +#The remaining node is the root node; the tree has now been generated. + +Although this algorithm may appear "faster" complexity-wise than the previous algorithm using a priority queue, this is not actually the case because the symbols need to be sorted by probability before-hand, a process that takes O(''n'' log ''n'') time in itself. + +In many cases, time complexity is not very important in the choice of algorithm here, since ''n'' here is the number of symbols in the alphabet, which is typically a very small number (compared to the length of the message to be encoded); whereas complexity analysis concerns the behavior when ''n'' grows to be very large. + +It is generally beneficial to minimize the variance of codeword length. For example, a communication buffer receiving Huffman-encoded data may need to be larger to deal with especially long symbols if the tree is especially unbalanced. To minimize variance, simply break ties between queues by choosing the item in the first queue. This modification will retain the mathematical optimality of the Huffman coding while both minimizing variance and minimizing the length of the longest character code. + +Here's an example of optimized Huffman coding using the French subject string "j'aime aller sur le bord de l'eau les jeudis ou les jours impairs". 
Note that original Huffman coding tree structure would be different from the given example: + +[[Image:Huffman huff demo.gif|center]] + +===Decompression=== +Generally speaking, the process of decompression is simply a matter of translating the stream of prefix codes to individual byte values, usually by traversing the Huffman tree node by node as each bit is read from the input stream (reaching a leaf node necessarily terminates the search for that particular byte value). Before this can take place, however, the Huffman tree must be somehow reconstructed. In the simplest case, where character frequencies are fairly predictable, the tree can be preconstructed (and even statistically adjusted on each compression cycle) and thus reused every time, at the expense of at least some measure of compression efficiency. Otherwise, the information to reconstruct the tree must be sent a priori. A naive approach might be to prepend the frequency count of each character to the compression stream. Unfortunately, the overhead in such a case could amount to several kilobytes, so this method has little practical use. If the data is compressed using [[canonical Huffman code|canonical encoding]], the compression model can be precisely reconstructed with just <math>B2^B</math> bits of information (where <math>B</math> is the number of bits per symbol). Another method is to simply prepend the Huffman tree, bit by bit, to the output stream. For example, assuming that the value of 0 represents a parent node and 1 a leaf node, whenever the latter is encountered the tree building routine simply reads the next 8 bits to determine the character value of that particular leaf. The process continues recursively until the last leaf node is reached; at that point, the Huffman tree will thus be faithfully reconstructed. The overhead using such a method ranges from roughly 2 to 320 bytes (assuming an 8-bit alphabet). Many other techniques are possible as well. In any case, since the compressed data can include unused "trailing bits" the decompressor must be able to determine when to stop producing output. This can be accomplished by either transmitting the length of the decompressed data along with the compression model or by defining a special code symbol to signify the end of input (the latter method can adversely affect code length optimality, however). + +== Main properties == +The probabilities used can be generic ones for the application domain that are based on average experience, or they can be the actual frequencies found in the text being compressed. +This requires that a [[frequency table]] must be stored with the compressed text. See the Decompression section above for more information about the various techniques employed for this purpose. + +Huffman coding is optimal when the probability of each input symbol is the inverse of a power of two. Prefix codes tend to have inefficiency on small alphabets, where probabilities often fall between these optimal points. "Blocking", or expanding the alphabet size by grouping multiple symbols into "words" of fixed or variable-length before Huffman coding helps both to reduce that inefficiency and to take advantage of statistical dependencies between input symbols within the group (as in the case of natural language text). The worst case for Huffman coding can happen when the probability of a symbol exceeds 2<sup>−1</sup> = 0.5, making the upper limit of inefficiency unbounded. 
These situations often respond well to a form of blocking called [[run-length encoding]]; for the simple case of [[Bernoulli process]]es, [[Golomb coding]] is a provably optimal run-length code. + +[[Arithmetic coding]] produces some gains over Huffman coding, although arithmetic coding has higher computational complexity. Also, arithmetic coding was historically a subject of some concern over [[patent]] issues. However, as of mid-2010, various well-known effective techniques for arithmetic coding have passed into the public domain as the early patents have expired. + +== Variations == +Many variations of Huffman coding exist, some of which use a Huffman-like algorithm, and others of which find optimal prefix codes (while, for example, putting different restrictions on the output). Note that, in the latter case, the method need not be Huffman-like, and, indeed, need not even be [[polynomial time]]. An exhaustive list of papers on Huffman coding and its variations is given by "Code and Parse Trees for Lossless Source Encoding"[http://scholar.google.com/scholar?hl=en&lr=&cluster=6556734736002074338]. + +=== ''n''-ary Huffman coding === +The '''''n''-ary Huffman''' algorithm uses the {0, 1, ... , ''n'' − 1} alphabet to encode message and build an ''n''-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (''n'' equals 2) codes, except that the ''n'' least probable symbols are taken together, instead of just the 2 least probable. Note that for ''n'' greater than 2, not all sets of source words can properly form an ''n''-ary tree for Huffman coding. In this case, additional 0-probability place holders must be added. This is because the tree must form an ''n'' to 1 contractor; for binary coding, this is a 2 to 1 contractor, and any sized set can form such a contractor. If the number of source words is congruent to 1 modulo ''n''-1, then the set of source words will form a proper Huffman tree. + +=== Adaptive Huffman coding === +A variation called '''[[adaptive Huffman coding]]''' involves calculating the probabilities dynamically based on recent actual frequencies in the sequence of source symbols, and changing the coding tree structure to match the updated probability estimates. It is used rarely in practice, since the cost of updating the tree makes it slower than optimized [[Arithmetic_coding#Adaptive_arithmetic_coding|adaptive arithmetic coding]], that is more flexible and has a better compression. + +=== Huffman template algorithm === +Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only that the weights form a [[total order|totally ordered]] [[Monoid#Commutative monoid|commutative monoid]], meaning a way to order weights and to add them. The '''Huffman template algorithm''' enables one to use any kind of weights (costs, frequencies, pairs of weights, non-numerical weights) and one of many combining methods (not just addition). Such algorithms can solve other minimization problems, such as minimizing <math>\max_i\left[w_{i}+\mathrm{length}\left(c_{i}\right)\right]</math>, a problem first applied to circuit design. + +=== Length-limited Huffman coding/minimum variance huffman coding === +'''Length-limited Huffman coding''' is a variant where the goal is still to achieve a minimum weighted path length, but there is an additional restriction that the length of each codeword must be less than a given constant. 
The [[package-merge algorithm]] solves this problem with a simple [[Greedy algorithm|greedy]] approach very similar to that used by Huffman's algorithm. Its time complexity is <math>O(nL)</math>, where <math>L</math> is the maximum length of a codeword. No algorithm is known to solve this problem in [[Big O notation#Orders of common functions|linear or linearithmic]] time, unlike the presorted and unsorted conventional Huffman problems, respectively. + +=== Huffman coding with unequal letter costs === +In the standard Huffman coding problem, it is assumed that each symbol in the set that the code words are constructed from has an equal cost to transmit: a code word whose length is ''N'' digits will always have a cost of ''N'', no matter how many of those digits are 0s, how many are 1s, etc. When working under this assumption, minimizing the total cost of the message and minimizing the total number of digits are the same thing. + +''Huffman coding with unequal letter costs'' is the generalization without this assumption: the letters of the encoding alphabet may have non-uniform lengths, due to characteristics of the transmission medium. An example is the encoding alphabet of [[Morse code]], where a 'dash' takes longer to send than a 'dot', and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the weighted average codeword length, but it is no longer sufficient just to minimize the number of symbols used by the message. No algorithm is known to solve this in the same manner or with the same efficiency as conventional Huffman coding. + +=== Optimal alphabetic binary trees (Hu-Tucker coding) === +In the standard Huffman coding problem, it is assumed that any codeword can correspond to any input symbol. In the alphabetic version, the alphabetic order of inputs and outputs must be identical. Thus, for example, <math>A = \left\{a,b,c\right\}</math> could not be assigned code <math>H\left(A,C\right) = \left\{00,1,01\right\}</math>, but instead should be assigned either <math>H\left(A,C\right) =\left\{00,01,1\right\}</math> or <math>H\left(A,C\right) = \left\{0,10,11\right\}</math>. This is also known as the '''Hu-Tucker''' problem, after the authors of the paper presenting the first [[linearithmic]] solution to this optimal binary alphabetic problem,<ref>T.C. Hu and A.C. Tucker, ''Optimal computer search trees and variable length alphabetical codes'', Journal of SIAM on Applied Mathematics, vol. 21, no. 4, December 1971, pp. 514-532.</ref> which has some similarities to Huffman algorithm, but is not a variation of this algorithm. These optimal alphabetic binary trees are often used as [[binary search tree]]s. + +=== The canonical Huffman code === + +If weights corresponding to the alphabetically ordered inputs are in numerical order, the Huffman code has the same lengths as the optimal alphabetic code, which can be found from calculating these lengths, rendering Hu-Tucker coding unnecessary. The code resulting from numerically (re-)ordered input is sometimes called the ''[[canonical Huffman code]]'' and is often the code used in practice, due to ease of encoding/decoding. The technique for finding this code is sometimes called '''Huffman-Shannon-Fano coding''', since it is optimal like Huffman coding, but alphabetic in weight probability, like [[Shannon-Fano coding]]. The Huffman-Shannon-Fano code corresponding to the example is <math>\{000,001,01,10,11\}</math>, which, having the same codeword lengths as the original solution, is also optimal. 
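+
+As an illustration of how such a code can be rebuilt from the codeword lengths alone, the following Python sketch assigns codewords by counting upward in binary, processing the longest lengths first. The function name and the tie-breaking convention are chosen here purely for illustration; this particular convention is one of several in use, picked so that the output reproduces the <math>\{000,001,01,10,11\}</math> code quoted above.
+
+<syntaxhighlight lang="python">
+# Sketch: canonical codeword assignment from code lengths only.
+def canonical_code(lengths):
+    """lengths: dict mapping symbol -> codeword length. Returns dict symbol -> bit string."""
+    code = 0
+    prev_len = None
+    out = {}
+    # longest codewords first; ties broken alphabetically
+    for sym, length in sorted(lengths.items(), key=lambda kv: (-kv[1], kv[0])):
+        if prev_len is not None:
+            code >>= (prev_len - length)   # drop bits whenever the length shrinks
+        out[sym] = format(code, '0{}b'.format(length))
+        code += 1
+        prev_len = length
+    return out
+
+print(canonical_code({'a': 3, 'b': 3, 'c': 2, 'd': 2, 'e': 2}))
+# -> {'a': '000', 'b': '001', 'c': '01', 'd': '10', 'e': '11'}
+</syntaxhighlight>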
+ +== Applications == +[[Arithmetic coding]] can be viewed as a generalization of Huffman coding, in the sense that they produce the same output when every symbol has a probability of the form 1/2<sup>''k''</sup>; in particular it tends to offer significantly better compression for small alphabet sizes. Huffman coding nevertheless remains in wide use because of its simplicity and high speed. Intuitively, arithmetic coding can offer better compression than Huffman coding because its "code words" can have effectively non-integer bit lengths, whereas code words in Huffman coding can only have an integer number of bits. Therefore, there is an inefficiency in Huffman coding where a code word of length ''k'' only optimally matches a symbol of probability 1/2<sup>''k''</sup> and other probabilities are not represented as optimally; whereas the code word length in arithmetic coding can be made to exactly match the true probability of the symbol. + +Huffman coding today is often used as a "back-end" to some other compression methods. +[[DEFLATE (algorithm)|DEFLATE]] ([[PKZIP]]'s algorithm) and multimedia [[codec]]s such as [[JPEG]] and [[MP3]] have a front-end model and [[quantization (signal processing)|quantization]] followed by Huffman coding (or variable-length prefix-free codes with a similar structure, although perhaps not necessarily designed by using Huffman's algorithm{{clarify|date=February 2012}}). + +==See also== +*[[Adaptive Huffman coding]] +*[[Canonical Huffman code]] +*[[Data compression]] +*[[Huffyuv]] +*[[Lempel–Ziv–Welch]] +*[[Modified Huffman coding]] - used in [[fax machines]] +*[[Shannon-Fano coding]] +*[[Varicode]] + +== Notes == +{{Reflist}} + +== References== +* For Java Implementation see: [https://github.com/Glank/Huffman-Compression GitHub:Glank] +* D.A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proceedings of the I.R.E., September 1952, pp 1098–1102. Huffman's original article. +* Ken Huffman. [http://www.huffmancoding.com/my-uncle/scientific-american Profile: David A. Huffman], [[Scientific American]], September 1991, pp.&nbsp;54–58 +* [[Thomas H. Cormen]], [[Charles E. Leiserson]], [[Ronald L. Rivest]], and [[Clifford Stein]]. ''[[Introduction to Algorithms]]'', Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 16.3, pp.&nbsp;385–392. 
+ +== External links == +{{external links|date=January 2014}} +{{Commons category|Huffman coding}} +* [http://scanftree.com/Data_Structure/huffman-code Huffman Coding with c Algorithm ] +* [http://demo.tinyray.com/huffman Huffman Encoding process animation] +* [http://www.cs.pitt.edu/~kirk/cs1501/animations/Huffman.html Huffman Encoding & Decoding Animation] +* [http://alexvn.freeservers.com/s1/huffman_template_algorithm.html n-ary Huffman Template Algorithm] +* [http://huffman.ooz.ie/ Huffman Tree visual graph generator] +* [http://www.research.att.com/projects/OEIS?Anum=A098950 Sloane A098950] Minimizing k-ordered sequences of maximum height Huffman tree +* [http://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html A quick tutorial on generating a Huffman tree] +* Pointers to [http://web-cat.cs.vt.edu/AlgovizWiki/HuffmanCodingTrees Huffman coding visualizations] +* [http://rosettacode.org/wiki/Huffman_codes Explanation of Huffman coding with examples in several languages] +* [http://www.hightechdreams.com/weaver.php?topic=huffmancoding Interactive Huffman Tree Construction] +* [http://github.com/elijahbal/huffman-coding/ A C program doing basic Huffman coding on binary and text files] +* [http://www.reznik.org/software.html#ABC Efficient implementation of Huffman codes for blocks of binary sequences] +{{Compression Methods}} + +{{DEFAULTSORT:Huffman Coding}} +[[Category:1952 in computer science]] +[[Category:Lossless compression algorithms]] +[[Category:Binary trees]] + 2ec48t9nspiqbsz0zxd3o2v68gdxvqx + + + + Dispersion (optics) + 0 + 2622 + + 2623 + 2013-12-24T09:03:23Z + + Materialscientist + 0 + + + [[Help:Reverting|Reverted]] edits by [[Special:Contributions/49.205.169.108|49.205.169.108]] ([[User talk:49.205.169.108|talk]]) to last version by Cheolsoo + wikitext + text/x-wiki + {{About|dispersion of waves in optics|other forms of dispersion|Dispersion (disambiguation)}} +[[File:Prism rainbow schema.png|thumb|right|In a [[dispersive prism]], material dispersion (a [[wavelength]]-dependent [[refractive index]]) causes different colors to [[refraction|refract]] at different angles, splitting white light into a [[rainbow]].]] +[[File:Light dispersion of a compact fluorescent lamp seen through an Amici direct-vision prism PNr°0114.jpg|thumb|A [[compact fluorescent lamp]] seen through an [[Amici prism]]]] + +In [[optics]], '''dispersion''' is the phenomenon in which the [[phase velocity]] of a wave depends on its frequency,<ref>{{cite book +|last = Born +|first = Max +|authorlink = Max Born +|last2 = Wolf +|first2 = Emil +|title = Principles of Optics +|publisher = [[Cambridge University Press]] +|date = October 1999 +|location = Cambridge +|pages = 14–24 +|isbn = 0-521-64222-1}}</ref> or alternatively when the [[group velocity]] depends on the frequency. +Media having such a property are termed ''dispersive media''. Dispersion is sometimes called '''''chromatic'' dispersion''' to emphasize its wavelength-dependent nature, or '''group-velocity dispersion''' ('''GVD''') to emphasize the role of the group velocity. +Dispersion is most often described for [[light]] waves, but it may occur for any kind of wave that interacts with a medium or passes through an inhomogeneous geometry (e.g., a [[waveguide]]), such as [[sound]] waves. A material's dispersion is measured by its [[Abbe number]], ''V'', with low Abbe numbers corresponding to strong dispersion. 
+ +== Examples of dispersion == +The most familiar example of dispersion is probably a [[rainbow]], in which dispersion causes the spatial separation of a white light into components of different [[wavelengths]] (different [[color]]s). However, dispersion also has an effect in many other circumstances: for example, GVD causes [[Pulse (signal processing)|pulses]] to spread in [[optical fiber]]s, degrading signals over long distances; also, a cancellation between group-velocity dispersion and [[nonlinear]] effects leads to [[soliton]] waves. + +== Sources of dispersion == +There are generally two sources of dispersion: material dispersion and waveguide dispersion. '''Material dispersion''' comes from a frequency-dependent response of a material to waves. For example, material dispersion leads to undesired [[chromatic aberration]] in a [[lens (optics)|lens]] or the separation of colors in a [[Dispersive prism|prism]]. '''Waveguide dispersion''' occurs when the speed of a wave in a waveguide (such as an optical fiber) depends on its frequency for geometric reasons, independent of any frequency dependence of the materials from which it is constructed. More generally, "waveguide" dispersion can occur for waves propagating through any inhomogeneous structure (e.g., a [[photonic crystal]]), whether or not the waves are confined to some region. In general, ''both'' types of dispersion may be present, although they are not strictly additive. Their combination leads to signal degradation in [[optical fiber]]s for [[telecommunication]]s, because the varying delay in arrival time between different components of a signal "smears out" the signal in time. + +== Material dispersion in optics == +[[File:Dispersion-curve.png|right|thumb|320px|The variation of refractive index vs. vacuum wavelength for various glasses. The wavelengths of visible light are shaded in red.]] +[[File:Spidergraph Dispersion.GIF|320px|thumb|Influences of selected glass component additions on the mean dispersion of a specific base glass (n<sub>F</sub> valid for λ = 486 nm (blue), n<sub>C</sub> valid for λ = 656 nm (red))<ref>[http://glassproperties.com/dispersion/ Calculation of the Mean Dispersion of Glasses]</ref>]] + +Material dispersion can be a desirable or undesirable effect in optical applications. The dispersion of light by glass prisms is used to construct [[spectrometer]]s and [[spectroradiometer]]s. [[Holographic]] gratings are also used, as they allow more accurate discrimination of wavelengths. However, in lenses, dispersion causes [[chromatic aberration]], an undesired effect that may degrade images in microscopes, telescopes and photographic objectives. + +The ''[[phase velocity]]'', ''v'', of a wave in a given uniform medium is given by + +:<math>v = \frac{c}{n}</math> + +where ''c'' is the [[speed of light]] in a vacuum and ''n'' is the [[refractive index]] of the medium. + +In general, the refractive index is some function of the frequency ''f'' of the light, thus ''n'' = ''n''(''f''), or alternatively, with respect to the wave's wavelength ''n'' = ''n''(''λ''). The wavelength dependence of a material's refractive index is usually quantified by its [[Abbe number]] or its coefficients in an empirical formula such as the [[Cauchy's equation|Cauchy]] or [[Sellmeier equation]]s. 
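+
+The shape of such a fit can be made concrete with a short numerical sketch. The Python fragment below evaluates a three-term Sellmeier fit using coefficient values commonly quoted for fused silica (treated here as illustrative numbers rather than authoritative data) and forms the Abbe number <math>V_d = (n_d-1)/(n_F-n_C)</math> from the computed indices at the F, d and C lines; the function name is chosen only for illustration.
+
+<syntaxhighlight lang="python">
+# Sketch: refractive index from a three-term Sellmeier fit (wavelength in micrometres)
+# and the Abbe number formed from the indices at the F, d and C Fraunhofer lines.
+from math import sqrt
+
+# Coefficients commonly quoted for fused silica; illustrative only.
+B = (0.6961663, 0.4079426, 0.8974794)
+C = (0.0684043**2, 0.1162414**2, 9.896161**2)
+
+def n_sellmeier(lam_um):
+    lam2 = lam_um ** 2
+    return sqrt(1.0 + sum(b * lam2 / (lam2 - c) for b, c in zip(B, C)))
+
+n_F, n_d, n_C = (n_sellmeier(l) for l in (0.48613, 0.58756, 0.65627))
+print(n_F, n_d, n_C)                      # the index falls as wavelength increases
+print((n_d - 1) / (n_F - n_C))            # Abbe number V_d
+</syntaxhighlight>
+
+Over the visible range the computed index decreases with increasing wavelength, which is the normal-dispersion behaviour discussed below.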
+ +Because of the [[Kramers–Kronig relation]]s, the wavelength dependence of the real part of the refractive index is related to the material [[absorption (electromagnetic radiation)|absorption]], described by the imaginary part of the refractive index (also called the [[refractive index#Dispersion and absorption|extinction coefficient]]). In particular, for non-magnetic materials ([[Permeability (electromagnetism)|μ]]&nbsp;=&nbsp;[[magnetic constant|μ<sub>0</sub>]]), the [[Linear response function|susceptibility]] <math>\chi</math> that appears in the Kramers–Kronig relations is the [[electric susceptibility]] <math>\chi_e = n^2 - 1</math>. + +The most commonly seen consequence of dispersion in optics is the separation of [[white light]] into a [[optical spectrum|color spectrum]] by a [[triangular prism (optics)|prism]]. From [[Snell's law]] it can be seen that the angle of [[refraction]] of light in a prism depends on the refractive index of the prism material. Since that refractive index varies with wavelength, it follows that the angle that the light is refracted by will also vary with wavelength, causing an angular separation of the colors known as ''angular dispersion''. + +For visible light, refraction indices ''n'' of most transparent materials (e.g., air, glasses) decrease with increasing wavelength ''λ'': +:<math>1 < n(\lambda_{\rm red}) < n(\lambda_{\rm yellow}) < n(\lambda_{\rm blue})\ ,</math> + +or alternatively: +:<math>\frac{{\rm d}n}{{\rm d}\lambda} < 0.</math> + +In this case, the medium is said to have ''normal dispersion''. Whereas, if the index increases with increasing wavelength (which is typically the case for [[X-ray]]s), the medium is said to have ''anomalous dispersion''. + +At the interface of such a material with air or vacuum (index of ~1), Snell's law predicts that light incident at an angle ''θ'' to the [[surface normal|normal]] will be refracted at an angle arcsin(sin(''θ'')/''n''). Thus, blue light, with a higher refractive index, will be bent more strongly than red light, resulting in the well-known [[rainbow]] pattern. + +==Group and phase velocity==<!-- This section is linked from [[Speed of sound]] --> +{{merge section from|Group delay and phase delay#Group delay in optics|date=September 2013}} +Another consequence of dispersion manifests itself as a temporal effect. The formula ''v'' = ''c'' / ''n'' calculates the ''phase velocity'' of a wave; this is the [[velocity]] at which the ''[[phase (waves)|phase]]'' of any one frequency component of the wave will propagate. This is not the same as the ''[[group velocity]]'' of the wave, that is the rate at which changes in [[amplitude]] (known as the ''envelope'' of the wave) will propagate. For a homogeneous medium, the group velocity ''v''<sub>g</sub> is related to the phase velocity by (here λ is the wavelength in vacuum, not in the medium): + +:<math>v_g = c \left( n - \lambda \frac{dn}{d\lambda} \right)^{-1}.</math> + +The group velocity ''v''<sub>g</sub> is often thought of as the velocity at which energy or information is conveyed along the wave. In most cases this is true, and the group velocity can be thought of as the ''[[signal velocity]]'' of the waveform. In some unusual circumstances, called cases of anomalous dispersion, the rate of change of the index of refraction with respect to the wavelength changes sign, in which case it is possible for the group velocity to exceed the speed of light (''v''<sub>g</sub> &gt; ''c''). 
Anomalous dispersion occurs, for instance, where the wavelength of the light is close to an [[absorption (optics)|absorption]] resonance of the medium. When the dispersion is anomalous, however, group velocity is no longer an indicator of signal velocity. Instead, a signal travels at the speed of the wavefront, which is ''c'' irrespective of the index of refraction.<ref>Brillouin, Léon. Wave Propagation and Group Velocity. (Academic Press: San Diego, 1960). See esp. Ch. 2 by A. Sommerfeld.</ref> Recently, it has become possible to create gases in which the group velocity is not only larger than the speed of light, but even negative. In these cases, a pulse can appear to exit a medium before it enters.<ref>{{cite journal|author=Wang, L.J., Kuzmich, A., and Dogariu, A.|title=Gain-assisted superluminal light propagation|journal=Nature|volume=406|page=277|year=2000|bibcode = 2000Natur.406..277W | doi=10.1038/35018520 | issue=6793}}</ref> Even in these cases, however, a signal travels at, or less than, the speed of light, as demonstrated by Stenner, et al.<ref>{{cite journal|author=Stenner, M. D., Gauthier, D. J., and Neifeld, M. A.|title=The speed of information in a 'fast-light' optical medium|journal=Nature|volume=425 |year=2003|bibcode = 2003Natur.425..695S |doi = 10.1038/nature02016|issue=6959|pmid=14562097|pages=695–8}}</ref> + +The group velocity itself is usually a function of the wave's frequency. This results in '''group velocity dispersion''' (GVD), which causes a short pulse of light to spread in time as a result of different frequency components of the pulse travelling at different velocities. GVD is often quantified as the ''[[group delay]] dispersion parameter'' (again, this formula is for a uniform medium only): + +:<math>D = - \frac{\lambda}{c} \, \frac{d^2 n}{d \lambda^2}. </math> + +If ''D'' is less than zero, the medium is said to have ''positive dispersion''. If ''D'' is greater than zero, the medium has ''negative dispersion''. If a light pulse is propagated through a normally dispersive medium, the result is the higher frequency components travel slower than the lower frequency components. The pulse therefore becomes ''positively [[chirp]]ed'', or ''up-chirped'', increasing in frequency with time. Conversely, if a pulse travels through an anomalously dispersive medium, high frequency components travel faster than the lower ones, and the pulse becomes ''negatively [[chirp]]ed'', or ''down-chirped'', decreasing in frequency with time. + +The result of GVD, whether negative or positive, is ultimately temporal spreading of the pulse. This makes dispersion management extremely important in optical communications systems based on optical fiber, since if dispersion is too high, a group of pulses representing a bit-stream will spread in time and merge, rendering the bit-stream unintelligible. This limits the length of fiber that a signal can be sent down without regeneration. One possible answer to this problem is to send signals down the optical fibre at a wavelength where the GVD is zero (e.g., around 1.3–1.5 μm in [[silica]] [[fibres]]), so pulses at this wavelength suffer minimal spreading from dispersion—in practice, however, this approach causes more problems than it solves because zero GVD unacceptably amplifies other nonlinear effects (such as [[four wave mixing]]). 
Another possible option is to use [[soliton (optics)|soliton]] pulses in the regime of anomalous dispersion, a form of optical pulse which uses a [[nonlinear optics|nonlinear optical]] effect to self-maintain its shape—solitons have the practical problem, however, that they require a certain power level to be maintained in the pulse for the nonlinear effect to be of the correct strength. Instead, the solution that is currently used in practice is to perform dispersion compensation, typically by matching the fiber with another fiber of opposite-sign dispersion so that the dispersion effects cancel; such compensation is ultimately limited by nonlinear effects such as [[self-phase modulation]], which interact with dispersion to make it very difficult to undo. + +Dispersion control is also important in [[laser]]s that produce [[ultrashort pulse|short pulses]]. The overall dispersion of the [[laser construction|optical resonator]] is a major factor in determining the duration of the pulses emitted by the laser. A pair of [[Prism (optics)|prisms]] can be arranged to produce net negative dispersion, which can be used to balance the usually positive dispersion of the laser medium. [[Diffraction grating]]s can also be used to produce dispersive effects; these are often used in high-power laser amplifier systems. Recently, an alternative to prisms and gratings has been developed: [[chirped mirror]]s. These dielectric mirrors are coated so that different wavelengths have different penetration lengths, and therefore different group delays. The coating layers can be tailored to achieve a net negative dispersion. + +== Dispersion in waveguides == +[[Optical fibers]], which are used in telecommunications, are among the most abundant types of waveguides. Dispersion in these fibers is one of the limiting factors that determine how much data can be transported on a single fiber. + +The [[transverse mode]]s for waves confined laterally within a [[waveguide]] generally have different speeds (and field patterns) depending upon their frequency (that is, on the relative size of the wave, the wavelength) compared to the size of the waveguide. + +In general, for a waveguide mode with an [[angular frequency]] ω(β) at a [[propagation constant]] β (so that the electromagnetic fields in the propagation direction ''(z)'' oscillate proportional to <math>e^{i(\beta z - \omega t)}</math>), the group-velocity [[dispersion parameter]] ''D'' is defined as:<ref>Rajiv Ramaswami and Kumar N. Sivarajan, ''Optical Networks: A Practical Perspective'' (Academic Press: London 1998).</ref> + +:<math>D = -\frac{2\pi c}{\lambda^2} \frac{d^2 \beta}{d\omega^2} = \frac{2\pi c}{v_g^2 \lambda^2} \frac{dv_g}{d\omega}</math> + +where <math>\lambda = 2\pi c/\omega</math> is the vacuum wavelength and <math>v_g = d\omega/d\beta</math> is the group velocity. This formula generalizes the one in the previous section for homogeneous media, and includes both waveguide dispersion and material dispersion. The reason for defining the dispersion in this way is that |''D''| is the (asymptotic) temporal pulse spreading <math>\Delta t</math> per unit bandwidth +<math>\Delta\lambda</math> per unit distance travelled, commonly reported in [[picosecond|ps]]&nbsp;/&nbsp;[[nanometre|nm]]&nbsp;km for optical fibers. + +A similar effect due to a somewhat different phenomenon is [[modal dispersion]], caused by a waveguide having multiple modes at a given frequency, each with a different speed. 
A special case of this is [[polarization mode dispersion]] (PMD), which comes from a superposition of two modes that travel at different speeds due to random imperfections that break the symmetry of the waveguide. [[Modal dispersion]] can also be used to generate large, tunable group delay dispersion in a compact footprint using [[chromo-modal dispersion]].<ref>E.D. Diebold et al., "Giant tunable optical dispersion using chromo-modal excitation of a multimode waveguide," Optics Express 19 (24) 2011</ref> + +== Higher-order dispersion over broad bandwidths == +When a broad range of frequencies (a broad bandwidth) is present in a single wavepacket, such as in an [[ultrashort pulse]] or a [[chirp]]ed pulse or other forms of [[spread spectrum]] transmission, it may not be accurate to approximate the dispersion by a constant over the entire bandwidth, and more complex calculations are required to compute effects such as pulse spreading. + +In particular, the dispersion parameter ''D'' defined above is obtained from only one derivative of the group velocity. Higher derivatives are known as ''higher-order dispersion''.<ref>[http://www.rp-photonics.com/chromatic_dispersion.html Chromatic Dispersion], ''Encyclopedia of Laser Physics and Technology'' (Wiley, 2008).</ref> These terms are simply a [[Taylor series]] expansion of the [[dispersion relation]] <math>\beta(\omega)</math> of the medium or waveguide around some particular frequency. Their effects can be computed via numerical evaluation of [[Fourier transform]]s of the waveform, via integration of higher-order [[slowly varying envelope approximation]]s, by a [[split-step method]] (which can use the exact dispersion relation rather than a Taylor series), or by direct simulation of the full [[Maxwell's equations]] rather than an approximate envelope equation. + +== Dispersion in gemology == +{|Class="wikitable sortable collapsible collapsed" style="float:right; text-align: center" +|+Dispersion values of minerals<ref name=b1>{{cite book|author=Walter Schumann|title=Gemstones of the World: Newly Revised & Expanded Fourth Edition|url=http://books.google.com/books?id=V9PqVxpxeiEC&pg=PA42|accessdate=31 December 2011|year=2009|publisher=Sterling Publishing Company, Inc.|isbn=978-1-4027-6829-3|pages=41–2}}</ref> +!Name !!B–G !!C–F +|- +|[[Cinnabar]] (HgS) || 0.40 || – +|- +|Synth. [[rutile]] || 0.330 || 0.190 +|- +|[[Rutile]] (TiO<sub>2</sub>) || 0.280 || 0.120–0.180 +|- +|[[Anatase]] (TiO<sub>2</sub>)|| 0.213–0.259 || – +|- +|[[Wulfenite]] || 0.203 || 0.133 +|- +|[[Vanadinite]] || 0.202 || – +|- +|[[Fabulite]] || 0.190 || 0.109 +|- +|[[Sphalerite]] (ZnS) || 0.156 || 0.088 +|- +|[[Sulfur]] (S) || 0.155 || – +|- +|[[Stibiotantalite]] || 0.146 || – +|- +|[[Goethite]] (FeO(OH)) || 0.14 || – +|- +|[[Brookite]] (TiO<sub>2</sub>) || 0.131 || 0.12–1.80 +|- +|[[Zincite]] (ZnO) || 0.127 || – +|- +|[[Linobate]] || 0.13 || 0.075 +|- +|Synth. 
[[moissanite]] (SiC) || 0.104 || – +|- +|[[Cassiterite]] (SnO<sub>2</sub>)|| 0.071 || 0.035 +|- +|[[Zirconia]] (ZrO<sub>2</sub>)|| 0.060 || 0.035 +|- +|[[Powellite]] (CaMoO<sub>4</sub>) || 0.058 || – +|- +|[[Andradite]] || 0.057 || – +|- +|[[Demantoid]] || 0.057 || 0.034 +|- +|[[Cerussite]] || 0.055 || 0.033–0.050 +|- +|[[Titanite]] || 0.051 || 0.019–0.038 +|- +|[[Benitoite]] || 0.046 || 0.026 +|- +|[[Anglesite]] || 0.044 || 0.025 +|- +|[[Diamond]] (C) || 0.044 || 0.025 +|- +|[[Flint glass]] || 0.041 || – +|- +|[[Hyacinth (mineral)|Hyacinth]] || 0.039 || – +|- +|[[Jargoon]] || 0.039 || – +|- +|[[Starlite]] || 0.039 || – +|- +|[[Zircon]] (ZrSiO<sub>4</sub>)|| 0.039 || 0.022 +|- +|[[Gadolinium gallium garnet|GGG]] || 0.038 || 0.022 +|- +|[[Scheelite]] || 0.038 || 0.026 +|- +|[[Dioptase]] || 0.036 || 0.021 +|- +|[[Whewellite]] || 0.034 || – +|- +|[[Alabaster]] || 0.033 || – +|- +|[[Gypsum]] || 0.033 || 0.008 +|- +|[[Epidote]] || 0.03 || 0.012–0.027 +|- +|[[Achroite]] || 0.017 || – +|- +|[[Cordierite]] || 0.017 || 0.009 +|- +|[[Danburite]] || 0.017 || 0.009 +|- +|[[Dravite]] || 0.017 || – +|- +|[[Elbaite]] || 0.017 || – +|- +|[[Herderite]] || 0.017 || 0.008–0.009 +|- +|[[Hiddenite]] || 0.017 || 0.010 +|- +|[[Indicolite]] || 0.017 || – +|- +|[[Liddicoatite]] || 0.017 || – +|- +|[[Kunzite]] || 0.017 || 0.010 +|- +|[[Rubellite]] || 0.017 || 0.008–0.009 +|- +|[[Schorl]] || 0.017 || – +|- +|[[Scapolite]] || 0.017 || – +|- +|[[Spodumene]] || 0.017 || 0.010 +|- +|[[Tourmaline]] || 0.017 || 0.009–0.011 +|- +|[[Verdelite]] || 0.017 || – +|- +|[[Andalusite]] || 0.016 || 0.009 +|- +|[[Baryte]] (BaSO<sub>4</sub>)|| 0.016 || 0.009 +|- +|[[Euclase]] || 0.016 || 0.009 +|- +|[[Alexandrite]] || 0.015 || 0.011 +|- +|[[Chrysoberyl]] || 0.015 || 0.011 +|- +|[[Hambergite]] || 0.015 || 0.009–0.010 +|- +|[[Phenakite]] || 0.01 || 0.009 +|- +|[[Rhodochrosite]] || 0.015 || 0.010–0.020 +|- +|[[Sillimanite]] || 0.015 || 0.009–0.012 +|- +|[[Smithsonite]] || 0.014–0.031 || 0.008–0.017 +|- +|[[Amblygonite]] || 0.014–0.015 || 0.008 +|- +| [[Aquamarine (gemstone)|Aquamarine]] || 0.014 || 0.009–0.013 +|- +|[[Beryl]] || 0.014 || 0.009–0.013 +|- +|[[Brazilianite]] || 0.014 || 0.008 +|- +|[[Celestine (mineral)|Celestine]] || 0.014 || 0.008 +|- +|[[Goshenite]] || 0.014 || – +|- +|[[Heliodor]] || 0.014 || 0.009–0.013 +|- +|[[Morganite]] || 0.014 || 0.009–0.013 +|- +|[[Pyroxmangite]] || 0.015 || – +|- +|Synth. [[scheelite]] || 0.015 || – +|- +|[[Dolomite]] || 0.013 || – +|- +|[[Magnesite]] (MgCO<sub>3</sub>)|| 0.012 || – +|- +|Synth. [[emerald]] || 0.012 || – +|- +|Synth. [[alexandrite]] || 0.011 || – +|- +|Synth. [[sapphire]] (Al<sub>2</sub>O<sub>3</sub>)|| 0.011 || – +|- +|[[Phosphophyllite]] || 0.010–0.011 || – +|- +|[[Enstatite]] || 0.010 || – +|- +|[[Anorthite]] || 0.009–0.010 || – +|- +|[[Actinolite]] || 0.009 || – +|- +|[[Jeremejevite]] || 0.009 || – +|- +|[[Nepheline]] || 0.008–0.009 || – +|- +|[[Apophyllite]] || 0.008 || – +|- +|[[Hauyne]] || 0.008 || – +|- +|[[Natrolite]] || 0.008 || – +|- +|Synth. 
quartz (SiO<sub>2</sub>) || 0.008 || – +|- +|[[Aragonite]] || 0.007–0.012 || – +|- +|[[Augelite]] || 0.007 || – +|- +|[[Tanzanite]] || 0.030 || 0.011 +|- +|[[Thulite]] || 0.03 || 0.011 +|- +|[[Zoisite]] || 0.03 || – +|- +|[[YAG]] || 0.028 || 0.015 +|- +|[[Almandine]] || 0.027 || 0.013–0.016 +|- +|[[Hessonite]] || 0.027 || 0.013–0.015 +|- +|[[Spessartine]] || 0.027 || 0.015 +|- +|[[Uvarovite]] || 0.027 || 0.014–0.021 +|- +|[[Willemite]] || 0.027 || – +|- +|[[Pleonaste]] || 0.026 || – +|- +|[[Rhodolite]] || 0.026 || – +|- +|[[Boracite]] || 0.024 || 0.012 +|- +|[[Cryolite]] || 0.024 || – +|- +|[[Staurolite]] || 0.023 || 0.012–0.013 +|- +|[[Pyrope]] || 0.022 || 0.013–0.016 +|- +|[[Diaspore]] || 0.02 || – +|- +|[[Grossular]] || 0.020 || 0.012 +|- +|[[Hemimorphite]] || 0.020 || 0.013 +|- +|[[Kyanite]] || 0.020 || 0.011 +|- +|[[Peridot]] || 0.020 || 0.012–0.013 +|- +|[[Spinel]] || 0.020 || 0.011 +|- +|[[Vesuvianite]] || 0.019–0.025 || 0.014 +|- +|[[Clinozoisite]] || 0.019 || 0.011–0.014 +|- +|[[Labradorite]] || 0.019 || 0.010 +|- +|[[Axinite]] || 0.018–0.020 || 0.011 +|- +|[[Ekanite]] || 0.018 || 0.012 +|- +|[[Kornerupine]] || 0.018 || 0.010 +|- +|[[Corundum]] (Al<sub>2</sub>O<sub>3</sub>)|| 0.018 || 0.011 +|- +|[[Rhodizite]] || 0.018 || – +|- +|[[Ruby]] (Al<sub>2</sub>O<sub>3</sub>)|| 0.018 || 0.011 +|- +|[[Sapphire]] (Al<sub>2</sub>O<sub>3</sub>)|| 0.018 || 0.011 +|- +|[[Sinhalite]] || 0.018 || 0.010 +|- +|[[Sodalite]] || 0.018 || 0.009 +|- +|Synth. [[corundum]] || 0.018 || 0.011 +|- +|[[Diopside]] || 0.018–0.020 || 0.01 +|- +|[[Emerald]] || 0.014 || 0.009–0.013 +|- +|[[Topaz]] || 0.014 || 0.008 +|- +|[[Amethyst]] (SiO<sub>2</sub>)|| 0.013 || 0.008 +|- +|[[Anhydrite]] || 0.013 || – +|- +|[[Apatite]] || 0.013 || 0.010 +|- +|[[Apatite]] || 0.013 || 0.008 +|- +|[[Aventurine]] || 0.013 || 0.008 +|- +|[[Citrine]] || 0.013 || 0.008 +|- +|[[Morion (mineral)|Morion]] || 0.013 || – +|- +|[[Prasiolite]] || 0.013 || 0.008 +|- +|[[Quartz]] (SiO<sub>2</sub>)|| 0.013 || 0.008 +|- +|Smoky quartz (SiO<sub>2</sub>)|| 0.013 || 0.008 +|- +|Rose quartz (SiO<sub>2</sub>)|| 0.013 || 0.008 +|- +|[[Albite]] || 0.012 || – +|- +|[[Bytownite]] || 0.012 || – +|- +|[[Feldspar]] || 0.012 || 0.008 +|- +|[[Moonstone (gemstone)|Moonstone]] || 0.012 || 0.008 +|- +|[[Orthoclase]] || 0.012 || 0.008 +|- +|[[Pollucite]] || 0.012 || 0.007 +|- +|[[Sanidine]] || 0.012 || – +|- +|[[Sunstone]] || 0.012 || – +|- +|[[Beryllonite]] || 0.010 || 0.007 +|- +|[[Cancrinite]] || 0.010 || 0.008–0.009 +|- +|[[Leucite]] || 0.010 || 0.008 +|- +|[[Obsidian]] || 0.010 || – +|- +|[[Strontianite]] || 0.008–0.028 || – +|- +|[[Calcite]] (CaCO<sub>3</sub>) || 0.008–0.017 || 0.013–0.014 +|- +|[[Fluorite]] (CaF<sub>2</sub>)|| 0.007 || 0.004 +|- +|[[Hematite]] || 0.500 || – +|- +|Synth. [[cassiterite]] (SnO<sub>2</sub>)|| 0.041 || – +|- +|[[Gahnite]] || 0.019–0.021 || – +|- +|[[Datolite]] || 0.016 || – +|- +|[[Tremolite]] || 0.006–0.007 || – +|} + +In the [[technical terminology]] of [[gemology]], ''dispersion'' is the difference in the refractive index of a material at the B and G (686.7 [[Nanometre|nm]] and 430.8&nbsp;nm) or C and F (656.3&nbsp;nm and 486.1&nbsp;nm) [[fraunhofer line|Fraunhofer]] wavelengths, and is meant to express the degree to which a prism cut from the [[gemstone]] shows "fire", or color. Dispersion is a material property. 
Fire depends on the dispersion, the cut angles, the lighting environment, the refractive index, and the viewer.<ref name=b1 /> + +== Dispersion in imaging == +In photographic and microscopic lenses, dispersion causes [[chromatic aberration]], which causes the different colors in the image not to overlap properly. Various techniques have been developed to counteract this, such as the use of [[achromat]]s, multielement lenses with glasses of different dispersion. They are constructed in such a way that the chromatic aberrations of the different parts cancel out. + +== Dispersion in pulsar timing == +[[Pulsar]]s are spinning neutron stars that emit pulses at very regular intervals ranging from milliseconds to seconds. Astronomers believe that the pulses are emitted simultaneously over a wide range of frequencies. However, as observed on Earth, the components of each pulse emitted at higher radio frequencies arrive before those emitted at lower frequencies. This dispersion occurs because of the ionized component of the [[interstellar medium]], mainly the free electrons, which make the group velocity frequency dependent. The extra delay added at a frequency <math>\nu</math> is + +:<math>t = k_\mathrm{DM} \times \left(\frac{\mathrm{DM}}{\nu^2}\right)</math> + +where the dispersion constant <math>k_\mathrm{DM}</math> is given by + +:<math> k_\mathrm{DM} = \frac{k_e e^2}{2 \pi m_\mathrm{e}c} \simeq 4.149 \mathrm{GHz}^2\mathrm{pc}^{-1}\mathrm{cm}^3\mathrm{ms}</math>, + +and the dispersion measure ''DM'' is the column density of electrons — i.e. the number density of electrons <math>n_e</math> (electrons/cm<sup>3</sup>) integrated along the path traveled by the photon from the pulsar to the Earth — and is given by + +:<math>\mathrm{DM} = \int_0^d{n_e\;dl}</math> + +with units of [[parsec]]s per cubic centimetre (1pc/cm<sup>3</sup> = 30.857×10<sup>21</sup> m<sup>−2</sup>).<ref>Lorimer, D.R., and Kramer, M., ''Handbook of Pulsar Astronomy'', vol. 4 of Cambridge Observing Handbooks for Research Astronomers, ([[Cambridge University Press]], Cambridge, U.K.; New York, U.S.A, 2005), 1st edition.</ref> + +Typically for astronomical observations, this delay cannot be measured directly, since the emission time is unknown. What ''can'' be measured is the difference in arrival times at two different frequencies. The delay <math>\Delta T</math> between a high frequency <math>\nu_{hi}</math> and a low frequency <math>\nu_{lo}</math> component of a pulse will be + +:<math>\Delta t = k_\mathrm{DM} \times \mathrm{DM} \times \left( \frac{1}{\nu_{\mathrm{lo}}^2} - \frac{1}{\nu_{\mathrm{hi}}^2} \right)</math> + +Re-writing the above equation in terms of ''DM'' allows one to determine the ''DM'' by measuring pulse arrival times at multiple frequencies. This in turn can be used to study the interstellar medium, as well as allow for observations of pulsars at different frequencies to be combined. + +== See also == +{{colbegin|3}} +* [[Dispersion relation]] +* [[Sellmeier equation]] +* [[Cauchy's equation]] +* [[Abbe number]] +* [[Kramers–Kronig relations]] +* [[Group delay]] +* [[Calculation of glass properties]] incl. 
dispersion +* [[Linear response function]] +* [[Green–Kubo relations]] +* [[Fluctuation theorem]] +* [[Multiple-prism dispersion theory]] +* [[Ultrashort pulse]] +* [[Intramodal dispersion]] +{{colend}} + +== References == +<!-- ---------------------------------------------------------- + See [[Wikipedia:Footnotes]] for a + discussion of different citation methods and how to generate + footnotes using the<ref>,</ref> and <reference /> tags +----------------------------------------------------------- --> +{{reflist|2}} + +== External links == +{{Commons|Dispersion|Dispersion (optics)}} +* [http://ioannis.virtualcomposer2000.com/spectroscope/characteristics.html Optical Characteristics of the SF10 Crystal Prism] +* [http://ioannis.virtualcomposer2000.com/spectroscope/deviationangle.html Deviation Angle for a Prism] +* [http://tosio.math.toronto.edu/wiki/index.php/Main_Page Dispersive Wiki] – discussing the mathematical aspects of dispersion. +* [http://www.rp-photonics.com/dispersion.html Dispersion] – Encyclopedia of Laser Physics and Technology +* [http://qed.wikina.org/dispersion/ Animations demonstrating optical dispersion] by QED + +{{Glass science}} + +[[Category:Optics]] +[[Category:Glass physics]] + sij8wj1qz6fx95pv7748tgvepv7pgm1 + + + + Allan variance + 0 + 1226 + + 1227 + 2014-01-28T22:34:12Z + + Derekdoth + 0 + + + /* Power-law noise */ + wikitext + text/x-wiki + {{Use dmy dates|date=June 2013}} +The '''Allan variance''' ('''AVAR'''), also known as '''two-sample variance''', is a measure of frequency stability in [[clock]]s, [[oscillator]]s and [[amplifier]]s. It is named after [[David W. Allan]]. It is expressed mathematically as + +:<math>\sigma_y^2(\tau). \, </math> + +The '''Allan deviation''' ('''ADEV''') is the square root of Allan variance. It is also known as ''sigma-tau'', and is expressed mathematically as + +:<math>\sigma_y(\tau).\,</math> + +The ''M-sample variance'' is a measure of frequency stability using M samples, time T between measures and observation time <math>\tau</math>. ''M''-sample variance is expressed as + +:<math>\sigma_y^2(M, T, \tau).\,</math> + +The ''Allan variance'' is intended to estimate stability due to noise processes and not that of systematic errors or imperfections such as frequency drift or temperature effects. The Allan variance and Allan deviation describe frequency stability, i.e. the stability in frequency. See also the section entitled "[[Allan variance#Interpretation of value|Interpretation of value]]" below. + +There are also different adaptations or alterations of ''Allan variance'', notably the [[modified Allan variance]] MAVAR or MVAR, the [[total variance]], and the [[Hadamard variance]]. There also exist time stability variants such as [[time deviation]] TDEV or [[time deviation|time variance]] TVAR. Allan variance and its variants have proven useful outside the scope of [[timekeeping]] and are a set of improved statistical tools to use whenever the noise processes are not unconditionally stable, thus a derivative exists. + +The general ''M''-sample variance remains important since it allows [[dead time]] in measurements and bias functions allows conversion into Allan variance values. Nevertheless, for most applications the special case of 2-sample, or "Allan variance" with <math>T = \tau</math> is of greatest interest. 
+ +==Background== +When investigating the stability of [[crystal oscillator]]s and [[atomic clock]]s it was found that they did not have a [[phase noise]] consisting only of [[white noise]], but also of white frequency noise and [[flicker noise|flicker frequency noise]]. These noise forms become a challenge for traditional statistical tools such as [[standard deviation]] as the estimator will not converge. The noise is thus said to be divergent. Early efforts in analysing the stability included both theoretical analysis and practical measurements.<ref name=Cutler1966>{{Citation |last1=Cutler |first1=L. S. |last2=Searle |first2=C. L. |url=http://wwwusers.ts.infn.it/~milotti/Didattica/Segnali/Cutler&Searle_1966.pdf |title=Some Aspects of the Theory and Measurements of Frequency Fluctuations in Frequency Standards |journal=Proceedings of IEEE |volume=54 |number=2 |date=February 1966 |pages=136–154}}</ref><ref name=Leeson1966>{{Citation |last=Leeson |first=D. B |title=A simple Model of Feedback Oscillator Noise Spectrum |url=http://ccnet.stanford.edu/cgi-bin/course.cgi?cc=ee246&action=handout_download&handout_id=ID113350669026291 |pages=329–330 |journal=Proceedings of IEEE |volume=54 |number=2 |date=February 1966 |accessdate=20 September 2012}}</ref> + +An important side-consequence of having these types of noise was that, since the various methods of measurements did not agree with each other, the key aspect of repeatability of a measurement could not be achieved. This limits the possibility to compare sources and make meaningful specifications to require from suppliers. Essentially all forms of scientific and commercial uses were then limited to dedicated measurements which hopefully would capture the need for that application. + +To address these problems, David Allan introduced the M-sample variance and (indirectly) the two-sample variance.<ref name=Allan1966/> While the two-sample variance did not completely allow all types of noise to be distinguished, it provided a means to meaningfully separate many noise-forms for time-series of phase or frequency measurements between two or more oscillators. Allan provided a method to convert between any M-sample variance to any N-sample variance via the common 2-sample variance, thus making all M-sample variances comparable. The conversion mechanism also proved that M-sample variance does not converge for large M, thus making them less useful. IEEE later identified the 2-sample variance as the preferred measure.<ref name=IEEE1139>{{cite journal | doi = 10.1109/IEEESTD.1999.90575 | title=Definitions of physical quantities for fundamental frequency and time metrology &ndash; Random Instabilities | journal=IEEE Std 1139-1999}}</ref> + +An early concern was related to time and frequency measurement instruments which had a [[dead time]] between measurements. Such a series of measurements did not form a continuous observation of the signal and thus introduced a [[systematic bias]] into the measurement. Great care was spent in estimating these biases. The introduction of zero dead time counters removed the need, but the bias analysis tools have proved useful. + +Another early aspect of concern was related to how the [[Bandwidth (signal processing)|bandwidth]] of the measurement instrument would influence the measurement, such that it needed to be noted. It was later found that by algorithmically changing the observation <math>\tau</math>, only low <math>\tau</math> values would be affected while higher values would be unaffected. 
The change of <math>\tau</math> is done by letting it be an integer multiple <math>n</math> of the measurement timebase <math>\tau_0</math>. + +:<math>\tau = n\,\tau_0 </math> + +The physics of [[crystal oscillator]]s was analyzed by D. B. Leeson<ref name=Leeson1966/> and the result is now referred to as [[Leeson's equation]]. The feedback in the [[oscillator]] will make the [[white noise]] and [[flicker noise]] of the feedback amplifier and crystal become the [[power-law noise]]s of <math>f^{-2}</math> white frequency noise and <math>f^{-3}</math> flicker frequency noise respectively. These noise forms have the effect that the [[standard variance]] estimator does not converge when processing time error samples. This mechanics of the feedback oscillators was unknown when the work on oscillator stability started but was presented by Leeson at the same time as the statistical tools was made available by [[David W. Allan]]. For a more thorough presentation on the [[Leeson effect]] see modern phase noise literature.<ref name=Rubiola2009>{{Citation |last=Rubiola |first=Enrico |title=Phase Noise and Frequency Stability in Oscillators |publisher=Cambridge university press |isbn=0-521-88677-5 |year=2008}}</ref> + +==Interpretation of value== +Allan variance is defined as one half of the [[time]] average of the squares of the differences between successive readings of the [[frequency deviation]] sampled over the sampling period. The Allan variance depends on the time period used between samples: therefore it is a function of the sample period, commonly denoted as τ, likewise the distribution being measured, and is displayed as a graph rather than a single number. A low Allan variance is a characteristic of a clock with good stability over the measured period. + +Allan deviation is widely used for plots (conveniently in [[Log-log graph|log-log]] format) and presentation of numbers. It is preferred as it gives the relative amplitude stability, allowing ease of comparison with other sources of errors. + +An Allan deviation of 1.3×10<sup>&minus;9</sup> at observation time 1 s (i.e. τ = 1 s) should be interpreted as there being an instability in frequency between two observations a second apart with a relative [[root mean square]] (RMS) value of 1.3×10<sup>&minus;9</sup>. For a 10-MHz clock, this would be equivalent to 13 mHz RMS movement. If the phase stability of an oscillator is needed then the [[time deviation]] variants should be consulted and used. + +One may convert the Allan variance and other time-domain variances into frequency-domain measures of time (phase) and frequency stability. The following link shows these relationships and how to perform these conversions: +http://www.allanstime.com/Publications/DWA/Conversion_from_Allan_variance_to_Spectral_Densities.pdf + +==Definitions== + +===<math>M</math>-sample variance=== + +The <math>M</math>-sample variance is defined<ref name=Allan1966>Allan, D [http://tf.boulder.nist.gov/general/pdf/7.pdf ''Statistics of Atomic Frequency Standards''], pages 221–230. Proceedings of IEEE, Vol. 
54, No 2, February 1966.</ref> (here in a modernized notation form) as + +:<math>\sigma_y^2(M, T, \tau) = \frac{1}{M-1}\left\{\sum_{i=0}^{M-1}\left[\frac{x(iT+\tau )-x(iT)}{\tau}\right]^2 - \frac{1}{M}\left[\sum_{i=0}^{M-1}\frac{x(iT+\tau)-x(iT)}{\tau}\right]^2\right\}</math> + +or with [[Allan variance#Average fractional frequency|average fractional frequency]] time series + +:<math>\sigma_y^2(M, T, \tau) = \frac{1}{M-1}\left\{\sum_{i=0}^{M-1}\bar{y}_i^2 - \frac{1}{M}\left[\sum_{i=0}^{M-1}\bar{y}_i\right]^2\right\}</math> + +where <math>M</math> is the number of frequency samples used in variance, <math>T</math> is the time between each frequency sample and <math>\tau</math> is the time-length of each frequency estimate. + +An important aspect is that <math>M</math>-sample variance model counter dead-time by letting the time <math>T</math> be different from that of&nbsp;<math>\tau</math>. + +===Allan variance=== +The Allan variance is defined as + +:<math>\sigma_y^2(\tau) = \langle\sigma_y^2(2, \tau, \tau)\rangle</math> + +which is conveniently expressed as + +:<math>\sigma_y^2(\tau) = \frac{1}{2}\langle(\bar{y}_{n+1}-\bar{y}_n)^2\rangle = \frac{1}{2\tau^2}\langle(x_{n+2}-2x_{n+1}+x_n)^2\rangle</math> + +where <math>\tau</math> is the observation period, <math>\bar{y}_n</math> is the ''n''th [[allan variance#Fractional frequency|fractional frequency]] average over the observation time <math>\tau</math>. + +The samples are taken with no dead-time between them, which is achieved by letting + +:<math>T = \tau \, </math> + +===Allan deviation=== +Just as with [[standard deviation]] and [[variance]], the Allan deviation is defined as the square root of the Allan variance. + +:<math>\sigma_y(\tau) = \sqrt{\sigma_y^2(\tau)} \, </math> + +==Supporting definitions== + +===Oscillator model=== + +The oscillator being analysed is assumed to follow the basic model of + +: <math>V(t) = V_0 \sin (\Phi(t)) \, </math> + +The oscillator is assumed to have the nominal frequency of ''v''<sub>''n''</sub> being the nominal number of cycles per second or Hertz (Hz), corresponding to the nominal angular frequency <math>\omega_n</math> as related in + +: <math>\omega_n = 2\pi v_n \, </math> + +Removing the nominal phase ramp, the total phase can be separated into: + +: <math>\Phi(t) = \omega_nt + \phi(t) = 2\pi v_nt + \phi(t) \, </math> + +===Time error=== +The time error function ''x''(''t'') is the difference between expected nominal time and actual normal time + +: <math>x(t) = \frac{\phi(t)}{2\pi v_n} = \frac{\Phi(t)}{2\pi v_n} - t = T(t) - t </math> + +For measured values a time error series TE(''t'') is defined from the reference time function ''T''<sub>REF</sub>(''t'') as + +: <math>TE(t) = T(t) - T_\text{REF}(t). \, </math> + +===Frequency function=== +The frequency function ''v''(''t'') is the frequency over time defined as + +: <math>v(t) = \frac{1}{2\pi} \frac{d\Phi(t)}{dt}</math> + +===Fractional frequency=== +The fractional frequency ''y''(''t'') is the normalized delta from the nominal frequency ''v''<sub>''n''</sub>, thus + +:<math>y(t) = \frac{v(t)-v_n}{v_n} = \frac{v(t)}{v_n}-1</math> + +===Average fractional frequency=== +The average fractional frequency is defined as + +:<math>\bar{y}(t, \tau) = \frac{1}{\tau}\int\limits_0^\tau y(t+t_v) \, dt_v</math> + +where the average is taken over observation time ''&tau;'', the ''y''(''t'') is the fractional frequency error at time ''t'' and ''τ'' is the observation time. 
+ +Since ''y''(''t'') is the derivative of ''x''(''t'') we can without loss of generality rewrite it as + +:<math>\bar{y}(t, \tau) = \frac{x(t+\tau)-x(t)}{\tau}</math> + +==Estimators== +The definition is based on the statistical [[expected value]], integrating over infinite time. Real world situation does not allow for such time-series, in which case a statistical [[estimator]] needs to be used in its place. A number of different estimators will be presented and discussed. + +===Conventions=== +*The number of frequency samples in a fractional frequency series is denoted with ''M''. +*The number of time error samples in a time error series is denoted with ''N''. +The relation between the number of fractional frequency samples and time error series is fixed in the relationship +: <math>N = M + 1 \, </math> + +*For [[allan variance#Time error|time error]] sample series, ''x''<sub>''i''</sub> denotes the ''i'';th sample of the continuous time function ''x''(''t'') as given by + +:<math>x_i = x(iT) \, </math> + +where ''T'' is the time between measurements. For Allan variance, the time being used has ''T'' set to the observation time ''τ''. + +The [[allan variance#Time error|time error]] sample series let ''N'' denote the number of samples (''x''<sub>0</sub>&nbsp;...''x''<sub>''N-1''</sub>) in the series. The traditional convention uses index 1 through&nbsp;''N''. + +*For [[allan variance#Average fractional frequency|average fractional frequency]] sample series, <math>\bar{y}_i</math> denotes the ''i''th sample of the average continuous fractional frequency function ''y''(''t'') as given by + +:<math>\bar{y}_i = \bar{y}(Ti, \tau) \, </math> + +which gives + +:<math>\bar{y}_i = \frac{1}{\tau}\int\limits_0^\tau y(iT + t_v) \, dt_v = \frac{x(iT+\tau)-x(iT)}{\tau}</math> + +For the Allan variance assumption of ''T'' being ''τ'' it becomes + +:<math>\bar{y}_i = \frac{x_{i+1}-x_i}{\tau}. </math> + +The [[allan variance#Average fractional frequency|average fractional frequency]] sample series let ''M'' denote the number of samples (<math>\bar{y}_0 \ldots \bar{y}_{M-1}</math>) in the series. The traditional convention uses index 1 through&nbsp;''M''. + +As a shorthand is [[allan variance#Average fractional frequency|average fractional frequency]] often written without the average bar over it. This is however formally incorrect as the [[allan variance#Fractional frequency|fractional frequency]] and [[allan variance#Average fractional frequency|average fractional frequency]] is two different functions. A measurement instrument able to produce frequency estimates with no dead-time will actually deliver a frequency average time series which only needs to be converted into [[allan variance#Average fractional frequency|average fractional frequency]] and may then be used directly. + +*It is further a convention to let ''&tau;'' denote the nominal time-difference between adjacent phase or frequency samples. A time series taken for one time-difference ''&tau;''<sub>0</sub> can be used to generate Allan variance for any ''&tau;'' being an integer multiple of ''&tau;''<sub>0</sub> in which case ''&tau;''&nbsp;=&nbsp;''n&tau;''<sub>0</sub> is being used, and n becomes a variable for the estimator. + +*The time between measurements is denoted with ''T'', which is the sum of observation time ''τ'' and dead-time. 
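+
+As a compact illustration of these conventions, the following Python fragment assumes a zero-dead-time measurement (so ''T'' = ''τ''<sub>0</sub>): it forms the ''M'' = ''N'' − 1 average fractional-frequency samples from ''N'' time-error samples, and shows how a larger τ = ''n''τ<sub>0</sub> is obtained by differencing phase samples ''n'' steps apart. The data array is a placeholder.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+tau0 = 1.0                                          # time between phase samples; T = tau0 (no dead time)
+x = np.cumsum(np.random.normal(0.0, 1e-9, 1000))    # placeholder time-error series x_i (N samples)
+
+N = len(x)
+ybar = np.diff(x) / tau0                            # average fractional frequency, M = N - 1 samples
+assert len(ybar) == N - 1
+
+n = 4                                               # tau = n * tau0, reusing the same time series
+ybar_n = (x[n:] - x[:-n]) / (n * tau0)
+</syntaxhighlight>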
+ +===Fixed &tau; estimators=== +A first simple estimator would be to directly translate the definition into + +:<math>\sigma_y^2(\tau, M) = \text{AVAR}(\tau, M) = \frac{1}{2(M-1)} \sum_{i=0}^{M-2}(\bar{y}_{i+1}-\bar{y}_i)^2</math> + +or for the time series + +:<math>\sigma_y^2(\tau, N) = \text{AVAR}(\tau, N) = \frac{1}{2\tau^2(N-2)} \sum_{i=0}^{N-3}(x_{i+2}-2x_{i+1}+x_i)^2</math> + +These formulas however only provide the calculation for the ''&tau;''&nbsp;=&nbsp;''&tau;''<sub>0</sub> case. To calculate for a different value of ''&tau;'', a new time-series needs to be provided. + +===Non-overlapped variable &tau; estimators=== +If taking the time-series and skipping past ''n''&nbsp;−&nbsp;1 samples a new (shorter) time-series would occur with ''τ''<sub>0</sub> as the time between the adjacent samples, for which the Allan variance could be calculated with the simple estimators. These could be modified to introduce the new variable ''n'' such that no new time-series would have to be generated, but rather the original time series could be reused for various values of ''n''. The estimators become + +:<math>\sigma_y^2(n\tau_0, M) = \text{AVAR}(n\tau_0, M) = \frac{1}{2n(M-1)} \sum_{i=0}^{\frac{M-1}{n}-1}(\bar{y}_{ni+n}-\bar{y}_{ni})^2</math> + +with <math>n \le M - 1</math>, + +and for the time series + +:<math>\sigma_y^2(n\tau_0, N) = \text{AVAR}(n\tau_0, N) = \frac{1}{2n^2\tau_0^2(\frac{N-1}{n}-1)} \sum_{i=0}^{\frac{N-1}{n}-2}(x_{ni+2n}-2x_{ni+n}+x_{ni})^2</math> + +with <math>n \le \frac{N-1}{2}</math>. + +These estimators have a significant drawback in that they will drop a significant amount of sample data as only 1/''n'' of the available samples is being used. + +===Overlapped variable &tau; estimators=== +A technique presented by J.J. Snyder<ref name=Snyder1981>Snyder, J. J.: ''An ultra-high resolution frequency meter'', pages 464–469, Frequency Control Symposium #35, 1981</ref> provided an improved tool, as measurements was overlapped in ''n'' overlapped series out of the original series. The overlapping Allan variance estimator was introduced in.<ref name=Howe1981/> This can be shown to be equivalent to averaging the time or normalized frequency samples in blocks of ''n'' samples prior to processing. The resulting predictors becomes + +:<math>\sigma_y^2(n\tau_0, M) = \text{AVAR}(n\tau_0, M) = \frac{1}{2n^2(M-2n+1)} \sum_{j=0}^{M-2n} \left( \sum_{i=j}^{j+n-1}\bar{y}_{i+n}-\bar{y}_i \right)^2 </math> + +or for the time series + +:<math>\sigma_y^2(n\tau_0, N) = \text{AVAR}(n\tau_0, N) = \frac{1}{2n^2\tau_0^2(N-2n)} \sum_{i=0}^{N-2n-1}(x_{i+2n}-2x_{i+n}+x_i)^2</math> + +The overlapping estimators have far superior performance over the non-overlapping estimators as ''n'' rises and the time-series is of moderate length. The overlapped estimators have been accepted as the preferred Allan variance estimators in IEEE,<ref name=IEEE1139/> ITU-T<ref name=itutg810>ITU-T Rec. G.810: [http://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-G.810-199608-I!!PDF-E&type=items ''Definitions and terminology for synchronization and networks''], ITU-T Rec. G.810 (08/96)</ref> and ETSI<ref name=ETSIEN3004610101>ETSI EN 300 462-1-1: [http://www.etsi.org/deliver/etsi_en/300400_300499/3004620701/01.01.01_20/en_3004620701v010101c.pdf ''Definitions and terminology for synchronisation networks''], ETSI EN 300 462-1-1 V1.1.1 (1998–05)</ref> standards for comparable measurements such as needed for telecommunication qualification. 
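+
+As an illustration, the overlapped phase-data estimator above translates almost directly into code. The following Python sketch is a minimal implementation of that formula rather than a validated metrology routine; it assumes an evenly spaced, zero-dead-time time-error series ''x'' with spacing τ<sub>0</sub>, and the self-check at the end uses the white-FM relation ''h''<sub>0</sub>/(2τ) from the power-law table further down.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def overlapping_avar(x, tau0, n):
+    """Overlapping Allan variance at tau = n*tau0 from time-error samples x:
+    AVAR = sum_i (x[i+2n] - 2 x[i+n] + x[i])^2 / (2 n^2 tau0^2 (N - 2n))."""
+    x = np.asarray(x, dtype=float)
+    N = len(x)
+    if N < 2 * n + 1:
+        raise ValueError("need at least 2n+1 phase samples")
+    d2 = x[2 * n:] - 2.0 * x[n:-n] + x[:-2 * n]      # N - 2n overlapping second differences
+    return np.sum(d2 ** 2) / (2.0 * n ** 2 * tau0 ** 2 * (N - 2 * n))
+
+def overlapping_adev(x, tau0, n):
+    return np.sqrt(overlapping_avar(x, tau0, n))
+
+# Self-check with synthetic white frequency noise: AVAR(tau) should approach h0/(2 tau).
+tau0, h0 = 1.0, 1e-22
+y = np.random.normal(0.0, np.sqrt(h0 / (2 * tau0)), 100000)   # white average fractional frequency
+x = np.concatenate(([0.0], np.cumsum(y) * tau0))              # integrate to a time-error series
+for n in (1, 4, 16):
+    print(n * tau0, overlapping_avar(x, tau0, n), h0 / (2 * n * tau0))
+</syntaxhighlight>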
+ +===Modified Allan variance=== +In order to address the inability to separate white phase modulation from flicker phase modulation using traditional Allan variance estimators, an algorithmic filtering that reduces the bandwidth by ''n'' was introduced. This filtering provides a modification to the definition and estimators and is now identified as a separate class of variance called [[modified Allan variance]]. The modified Allan variance measure is a frequency stability measure, just as the Allan variance. + +===Time stability estimators=== +The Allan variance and Allan deviation provide the frequency stability variance and deviation. The time stability variants can be obtained by using frequency-to-time scaling from the modified (Mod.) Allan variance to [[time deviation|time variance]] + +:<math>\sigma_x^2(\tau) = \frac{\tau^2}{3}Mod.\sigma_y^2(\tau)</math> + +and similarly for Allan deviation to [[time deviation]] + +:<math>\sigma_x(\tau) = \frac{\tau}{\sqrt{3}}Mod.\sigma_y(\tau).</math> + +===Other estimators=== +Further developments have produced improved estimation methods for the same stability measure, the variance/deviation of frequency, but these are known by separate names such as the [[Hadamard variance]], [[modified Hadamard variance]], the [[total variance]], [[modified total variance]] and the [[Theo variance]]. These distinguish themselves by better use of statistics for improved confidence bounds or by the ability to handle linear frequency drift. + +==Confidence intervals and equivalent degrees of freedom== +Statistical estimators calculate an estimated value from the sample series used. The estimate may deviate from the true value, and the range of values which, with some probability, contains the true value is referred to as the [[confidence interval]]. The confidence interval depends on the number of observations in the sample series, the dominant noise type, and the estimator being used. The width also depends on the statistical certainty required, i.e. the probability that the true value lies within the stated range of values. For variable-τ estimators, the ''&tau;''<sub>0</sub> multiple ''n'' is also a variable. + +===Confidence interval=== +The [[confidence interval]] can be established using the [[chi-squared distribution]] by using the [[Variance#Distribution of the sample variance|distribution of the sample variance]]:<ref name=IEEE1139/><ref name=Howe1981>D.A. Howe, D.W. Allan and J.A. Barnes: [http://tf.boulder.nist.gov/general/pdf/554.pdf ''Properties of signal sources and measurement methods''], pages 464–469, Frequency Control Symposium #35, 1981</ref> + +:<math>\chi^2 = \frac{(d.f.)s^2}{\sigma^2}</math> + +where ''s''<sup>''2''</sup> is the sample variance of our estimate, ''σ''<sup>2</sup> is the true variance value, ''d.f.'' is the number of degrees of freedom for the estimator and ''χ''<sup>2</sup> is the chi-squared value corresponding to a certain probability. For a 90% confidence interval, covering the range from the 5% to the 95% points of the distribution, the upper and lower limits can be found using the inequality: + +:<math>\chi^2(0.05) \le \frac{(d.f.)s^2}{\sigma^2} \le \chi^2(0.95)</math> + +which after rearrangement for the true variance becomes: + +:<math>\frac{(d.f.)s^2}{\chi^2(0.95)} \le \sigma^2 \le \frac{(d.f.)s^2}{\chi^2(0.05)}</math> + +===Effective degrees of freedom=== +The [[Degrees of freedom (statistics)|degrees of freedom]] represent the number of free variables capable of contributing to the estimate.
Depending on the estimator and noise type, the effective degrees of freedom varies. Estimator formulas depending on ''N'' and ''n'' has been empirically found<ref name=Howe1981/> to be: + +{| border="1" cellpadding="5" cellspacing="0" align="center" +|+ '''Allan variance degrees of freedom''' +|- +|Noise type +|degrees of freedom +|- +|white phase modulation (WPM) +|<math>d.f. \cong \frac{(N+1)(N-2n)}{2(N-n)}</math> +|- +|flicker phase modulation (FPM) +|<math>d.f. \cong {e^\left(\ln \frac{N-1}{2n} \ln \frac{(2n+1)(N-1)}{4}\right)}^{- \frac{1}{2}}</math> +|- +|white frequency modulation (WFM) +|<math>d.f. \cong \left[ \frac{3(N-1)}{2n} - \frac{2(N-2)}{N}\right]\frac{4n^2}{4n^2+5}</math> +|- +|flicker frequency modulation (FFM) +|<math>d.f. \cong \begin{cases}\frac{2(N-2)}{2.3N-4.9} & n = 1 \\ \frac{5N^2}{4n(N+3n)}& n \ge 2\end{cases}</math> +|- +|random walk frequency modulation (RWFM) +|<math>d.f. \cong \frac{N-2}{n}\frac{(N-1)^2-3n(N-1)+4n^2}{(N-3)^2}</math> +|} + +==Power-law noise== +The Allan variance will treat various [[power-law noise]] types differently, conveniently allowing them to be identified and their strength estimated. As a convention, the measurement system width (high corner frequency) is denoted ''f''<sub>''H''</sub>. +{| border="1" cellpadding="5" cellspacing="0" align="center" +|+ '''Allan variance power-law response''' +|- +|Power-law noise type +|Phase noise slope +|Frequency noise slope +|Power coefficient +|Phase noise +|Allan variance +|Allan deviation +|- +|white phase modulation (WPM) +|<math>f^0=1</math> +|<math>f^2</math> +|<math>h_2</math> +|<math>S_x(f) = \frac{1}{(2\pi)^2}h_2</math> +|<math>\sigma_y^2(\tau) = \frac{3 f_H}{4\pi^2\tau^2}h_2</math> +|<math>\sigma_y(\tau) = \frac{\sqrt{3 f_H}}{2\pi\tau}\sqrt{h_2}</math> +|- +|flicker phase modulation (FPM) +|<math>f^{-1}</math> +|<math>f^1=f</math> +|<math>h_1</math> +|<math>S_x(f) = \frac{1}{(2\pi)^2f}h_1</math> +|<math>\sigma_y^2(\tau) = \frac{3[\gamma+\ln(2\pi f_H\tau)]-\ln 2}{4\pi^2\tau^2}h_1</math> +|<math>\sigma_t(\tau) = \frac{\sqrt{3[\gamma+\ln(2\pi f_H\tau)]-\ln 2}}{2\pi\tau}\sqrt{h_1}</math> +|- +|white frequency modulation (WFM) +|<math>f^{-2}</math> +|<math>f^0=1</math> +|<math>h_0</math> +|<math>S_x(f) = \frac{1}{(2\pi)^2f^2}h_0</math> +|<math>\sigma_y^2(\tau) = \frac{1}{2\tau}h_0</math> +|<math>\sigma_y(\tau) = \frac{1}{\sqrt{2\tau}}\sqrt{h_0}</math> +|- +|flicker frequency modulation (FFM) +|<math>f^{-3}</math> +|<math>f^{-1}</math> +|<math>h_{-1}</math> +|<math>S_x(f) = \frac{1}{(2\pi)^2f^3}h_{-1}</math> +|<math>\sigma_y^2(\tau) = 2\ln(2)h_{-1}</math> +|<math>\sigma_y(\tau) = \sqrt{2\ln(2)}\sqrt{h_{-1}}</math> +|- +|random walk frequency modulation (RWFM) +|<math>f^{-4}</math> +|<math>f^{-2}</math> +|<math>h_{-2}</math> +|<math>S_x(f) = \frac{1}{(2\pi)^2f^4}h_{-2}</math> +|<math>\sigma_y^2(\tau) = \frac{2\pi^2\tau}{3}h_{-2}</math> +|<math>\sigma_y(\tau) = \frac{\pi\sqrt{2\tau}}{\sqrt{3}}\sqrt{h_{-2}}</math> +|- +|} + +As found in<ref name=NBSTN394>J.A. Barnes, A.R. Chi, L.S. Cutler, D.J. Healey, D.B. Leeson, T.E. McGunigal, J.A. Mullen, W.L. Smith, R. Sydnor, R.F.C. Vessot, and G.M.R. Winkler: [http://tf.boulder.nist.gov/general/pdf/264.pdf ''Characterization of Frequency Stability''], NBS Technical Note 394, 1970</ref><ref>J.A. Barnes, A.R. Chi, L.S. Cutler, D.J. Healey, D.B. Leeson, T.E. McGunigal, J.A. Mullen, Jr., W.L. Smith, R.L. Sydnor, R.F.C. Vessot, and G.M.R. 
Winkler: [http://tf.boulder.nist.gov/general/pdf/118.pdf ''Characterization of Frequency Stability''], IEEE Transactions on Instruments and Measurements 20, pp. 105&ndash;120, 1971</ref> and in modern forms.<ref name=Bregni2002>Bregni, Stefano: [http://books.google.com/books?id=APEBaL4WHNoC&printsec=frontcover ''Synchronisation of digital telecommunication networks''], Wiley 2002, ISBN 0-471-61550-1</ref><ref name=NISTSP1065>NIST SP 1065: [http://tf.nist.gov/timefreq/general/pdf/2220.pdf ''Handbook of Frequency Stability Analysis'']</ref> + +The Allan variance is unable to distinguish between WPM and FPM, but is able to resolve the other power-law noise types. In order to distinguish WPM and FPM, the [[modified Allan variance]] needs to be employed. + +The above formulas assume that + +:<math>\tau \gg \frac{1}{2\pi f_H}</math> + +and thus that the bandwidth of the observation time is much lower than the instruments bandwidth. When this condition is not met, all noise forms depend on the instrument's bandwidth. + +===&alpha;-&mu; mapping=== +The detailed mapping of a phase modulation of the form + +:<math>S_x(f) = \frac{1}{4\pi^2}h_{\alpha}f^{\alpha-2} = \frac{1}{4\pi^2}h_{\alpha}f^{\beta}</math> + +where + +:<math>\beta \equiv \alpha - 2</math> + +or frequency modulation of the form + +:<math>S_y(f) = h_{\alpha}f^{\alpha}</math> + +into the Allan variance of the form + +:<math>\sigma_y^2(\tau) = K_{\alpha}h_{\alpha}\tau^{\mu}</math> + +can be significantly simplified by providing a mapping between α and μ. A mapping between α and ''K''<sub>α</sub> is also presented for convenience: + +{| border="1" cellpadding="5" cellspacing="0" align="center" +|+ '''Allan variance α-μ mapping''' +|- +|α +|β +|μ +|''K''<sub>α</sub> +|- +| -2 +| -4 +| 1 +|<math>\frac{2\pi^2}{3}</math> +|- +| -1 +| -3 +| 0 +|<math>2\ln{2}</math> +|- +| 0 +| -2 +| -1 +|<math>\frac{1}{2}</math> +|- +| 1 +| -1 +| -2 +|<math>\frac{3[\gamma+\ln(2\pi f_H\tau)]-\ln 2}{4\pi^2}</math> +|- +| 2 +| 0 +| -2 +|<math>\frac{3f_H}{4\pi^2}</math> +|- +|} + +The mapping is taken from.<ref name=IEEE1139/> + +===General Conversion from Phase Noise=== +A signal with spectral phase noise <math>S_\phi</math> with units rad<sup>2</sup>/Hz can be converted to Allan Variance by:<ref name=NISTSP1065/> + +<math>\sigma^2_y(\tau) = \frac{2}{\nu_0^2} \int^{f_b}_0 S_\phi(f) \frac{\sin^4(\pi \tau f)}{(\pi \tau)^2} df</math> + +==Linear response== +While Allan variance is intended to be used to distinguish noise forms, it will depend on some but not all linear responses to time. They are given in the table: + +{| border="1" cellpadding="5" cellspacing="0" align="center" +|+ '''Allan variance linear response''' +|- +! Linear effect +! time response +! frequency response +! Allan variance +! Allan deviation +|- +| phase offset +| <math>x_0</math> +| <math>0</math> +| <math>0</math> +| <math>0</math> +|- +| frequency offset +| <math>y_0t</math> +| <math>y_0</math> +| <math>0</math> +| <math>0</math> +|- +| linear drift +| <math>\frac{Dt^2}{2}</math> +| <math>Dt</math> +| <math>\frac{D^2\tau^2}{2}</math> +| <math>\frac{D\tau}{\sqrt{2}}</math> +|- +|} + +Thus, linear drift will contribute to output result. 
When measuring a real system, the linear drift or other drift mechanism may need to be estimated and removed from the time-series prior to calculating the Allan variance.<ref name=Bregni2002/> + +==Time and frequency filter properties== +In analysing the properties of the Allan variance and related measures, it has proven useful to consider the filter properties acting on the normalized frequency. Starting from the definition of the Allan variance + +:<math>\sigma_y^2(\tau) = \frac{1}{2}\langle(\bar{y}_{i+1}-\bar{y}_i)^2\rangle</math> + +where + +:<math>\bar{y}_i = \frac{1}{\tau} \int\limits_0^\tau y(i\tau+t) \, dt.</math> + +Replacing the time series <math>y_i</math> with its Fourier-transformed counterpart, the spectral density <math>S_y(f)</math>, the Allan variance can be expressed in the frequency domain as + +:<math>\sigma_y^2(\tau) = \int_0^\infty S_y(f)\frac{2\sin^4\pi\tau f}{(\pi \tau f)^2} \, df</math> + +Thus the transfer function for the Allan variance is + +:<math>\left\vert H_A(f)\right\vert^2 = \frac{2\sin^4\pi \tau f}{(\pi \tau f)^2}. </math> + +==Bias functions== +The ''M''-sample variance, and the defined special case Allan variance, will experience [[systematic bias]] depending on the number of samples ''M'' and on the relationship between ''T'' and ''τ''. In order to address these biases, the bias functions ''B''<sub>1</sub> and ''B''<sub>2</sub> have been defined<ref name=NBSTN375>Barnes, J.A.: [http://tf.boulder.nist.gov/general/pdf/11.pdf ''Tables of Bias Functions, ''B''<sub>1</sub> and ''B''<sub>2</sub>, for Variances Based On Finite Samples of Processes with Power Law Spectral Densities''], NBS Technical Note 375, 1969</ref> and allow conversion between different ''M'' and ''T'' values. + +These bias functions are not sufficient for handling the bias resulting from concatenating ''M'' samples to the ''Mτ''<sub>0</sub> observation time over the ''MT''<sub>0</sub> time, which has the dead time distributed among the ''M'' measurement blocks rather than at the end of the measurement. This created the need for the ''B''<sub>3</sub> bias function.<ref name=NISTTN1318/> + +The bias functions are evaluated for a particular µ value, so the α-µ mapping needs to be done for the dominant noise form as found using [[noise identification]]. Alternatively, as proposed in<ref name=Allan1966/> and elaborated in,<ref name=NBSTN375/> the µ value of the dominant noise form may be inferred from the measurements using the bias functions.
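+
+Because these bias corrections require the dominant-noise exponent µ, a common practical step is to estimate µ from the slope of the Allan variance on a log-log plot and map it back to α using the α-µ mapping table above. The short Python sketch below assumes that a single power-law noise type dominates over the fitted τ range; the input arrays are placeholders for values computed elsewhere.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def estimate_mu(taus, avars):
+    """Least-squares slope of log(AVAR) versus log(tau), i.e. sigma_y^2(tau) ~ K * tau**mu."""
+    mu, _ = np.polyfit(np.log(taus), np.log(avars), 1)
+    return mu
+
+def alpha_from_mu(mu):
+    """Invert the alpha-mu mapping (alpha = -mu - 1 for -2 <= alpha <= 1).
+    Near mu = -2 the Allan variance cannot separate white PM (alpha = 2) from flicker PM (alpha = 1)."""
+    return -mu - 1.0
+
+# Placeholder inputs: tau values and Allan variances computed with an estimator given earlier.
+taus = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
+avars = 2.5e-22 / taus                # a tau**-1 slope, i.e. white FM behaviour
+mu = estimate_mu(taus, avars)
+print(f"mu ~ {mu:+.2f}, implied alpha ~ {alpha_from_mu(mu):+.2f}")   # ~ -1.00 and 0.00
+</syntaxhighlight>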
+ +===B<sub>1</sub> bias function=== +The ''B''<sub>1</sub> bias function relates the ''M''-sample variance with the 2-sample variance (Allan variance), keeping the time between measurements ''T'' and time for each measurements ''τ'' constant, and is defined<ref name=NBSTN375/> as + +:<math>B_1 (N, r, \mu ) = \frac{ \left \langle\sigma_y^2(N, T, \tau ) \right \rangle}{ \left \langle\sigma_y^2(2, T, \tau ) \right\rangle}</math> + +where + +:<math>r = \frac{T}{\tau}.</math> + +The bias function becomes after analysis + +:<math>B_1(N, r, \mu) = \frac{1 + \sum_{n=1}^{N-1} \frac{N-n}{N(N-1)}\left [ 2\left (rn\right )^{\mu+2} - \left (rn+1\right )^{\mu+2} -\left |rn-1\right |^{\mu+2}\right ]}{1 + \frac{1}{2}\left [ 2r^{\mu+2} - \left (r+1\right )^{\mu+2}-\left |r-1\right |^{\mu+2}\right ]}.</math> + +===B<sub>2</sub> bias function=== +The ''B''<sub>2</sub> bias function relates the 2-sample variance for sample time ''T'' with the 2-sample variance (Allan variance), keeping the number of samples ''N''&nbsp;=&nbsp;2 and the observation time ''τ'' constant, and is defined<ref name=NBSTN375/> + +:<math>B_2 (r, \mu ) = \frac{ \left \langle\sigma_y^2(2, T, \tau ) \right \rangle}{ \left \langle\sigma_y^2(2, \tau, \tau ) \right\rangle}</math> + +where + +:<math>r = \frac{T}{\tau}.</math> + +The bias function becomes after analysis + +:<math>B_2(r, \mu) = \frac{1 + \frac{1}{2}\left [ 2r^{\mu+2} - \left (r+1\right )^{\mu+2}-\left |r-1\right |^{\mu+2}\right ]}{2\left ( 1-2^{\mu}\right )}. </math> + +===''B''<sub>3</sub> bias function=== +The ''B''<sub>3</sub> bias function relates the 2-sample variance for sample time ''MT''<sub>0</sub> and observation time ''Mτ''<sub>0</sub> with the 2-sample variance (Allan variance) and is defined<ref name=NISTTN1318>J.A. Barnes and D.W. Allan: [http://tf.boulder.nist.gov/general/pdf/878.pdf ''Variances Based on Data with Dead Time Between the Measurements''], NIST Technical Note 1318, 1990</ref> as + +:<math>B_3 (N, M, r, \mu) = \frac{\left\langle\sigma_y^2(N, M, T, \tau)\right\rangle}{\left\langle\sigma_y^2(N, T, \tau)\right\rangle}</math> + +where + +:<math>T = M T_0 \, </math> + +:<math>\tau = M \tau_0. \, </math> + +The ''B''<sub>3</sub> bias function is useful to adjust non-overlapping and overlapping variable ''τ'' estimator values based on dead-time measurements of observation time ''τ''<sub>0</sub> and time between observations ''T''<sub>0</sub> to normal dead-time estimates. + +The bias function becomes after analysis (for the ''N''&nbsp;=&nbsp;2 case) + +:<math>B_3(2, M, r, \mu) = \frac{2M + MF(Mr) - \sum_{n=1}^{M-1} (M-n)\left [ 2F(nr) - F((M+n)r) + F((M-n)r)\right ]}{M^{\mu+2} \left [ F(r) + 2\right ]}</math> + +where + +:<math>F(A) = 2A^{\mu+2} - (A+1)^{\mu+2} - |A-1|^{\mu+2}. \, </math> + +===&tau; bias function=== +While formally not formulated, it has been indirectly inferred as a consequence of the α-µ mapping. When comparing two Allan variance measure for different τ assuming same dominant noise in the form of same µ coefficient, a bias can be defined as + +:<math>B_\tau (\tau_1, \tau_2, \mu ) = \frac{ \left \langle\sigma_y^2(2, \tau_2, \tau_2 ) \right \rangle}{ \left \langle\sigma_y^2(2, \tau_1, \tau_1 ) \right\rangle}. \, </math> + +The bias function becomes after analysis + +:<math>B_\tau (\tau_1, \tau_2, \mu ) = \left ( \frac{\tau_2}{\tau_1} \right)^\mu.</math> + +===Conversion between values=== +In order to convert from one set of measurements to another the ''B''<sub>1</sub>, ''B''<sub>2</sub> and τ bias functions can be assembled. 
First the ''B''<sub>1</sub> function converts the (''N''<sub>1</sub>,&nbsp;''T''<sub>1</sub>,&nbsp;''τ''<sub>1</sub>) value into (2,&nbsp;''T''<sub>1</sub>,&nbsp;''τ''<sub>1</sub>), from which the ''B''<sub>2</sub> function converts into a (2,&nbsp;''τ''<sub>1</sub>,&nbsp;''τ''<sub>1</sub>) value, thus the Allan variance at&nbsp;''τ''<sub>1</sub>. The Allan variance can then be converted from ''τ''<sub>1</sub> to ''τ''<sub>2</sub> using the τ bias function, after which ''B''<sub>2</sub> converts it into a (2,&nbsp;''T''<sub>2</sub>,&nbsp;''τ''<sub>2</sub>) value and finally ''B''<sub>1</sub> converts it into the (''N''<sub>2</sub>,&nbsp;''T''<sub>2</sub>,&nbsp;''τ''<sub>2</sub>) variance. The complete conversion becomes + +:<math>\left \langle \sigma_y^2(N_2, T_2, \tau_2) \right \rangle = \left ( \frac{\tau_2}{\tau_1} \right )^\mu \left [ \frac{B_1(N_2, r_2, \mu)B_2(r_2, \mu)}{B_1(N_1, r_1, \mu)B_2(r_1, \mu)} \right ] \left \langle \sigma_y^2(N_1, T_1, \tau_1) \right \rangle</math> + +where + +:<math>r_1 = \frac{T_1}{\tau_1}</math> + +:<math>r_2 = \frac{T_2}{\tau_2}</math> + +Similarly, for concatenated measurements using M sections, the logical extension becomes + +:<math>\left \langle \sigma_y^2(N_2, M_2, T_2, \tau_2) \right \rangle = \left ( \frac{\tau_2}{\tau_1} \right )^\mu \left [ \frac{B_3(N_2, M_2, r_2, \mu)B_1(N_2, r_2, \mu)B_2(r_2, \mu)}{B_3(N_1, M_1, r_1, \mu)B_1(N_1, r_1, \mu)B_2(r_1, \mu)} \right ] \left \langle \sigma_y^2(N_1, M_1, T_1, \tau_1) \right \rangle.</math> + +==Measurement issues== +When making measurements to calculate Allan variance or Allan deviation, a number of issues may cause the measurements to degenerate. Covered here are the effects specific to the Allan variance, where results would otherwise be biased. + +===Measurement bandwidth limits=== +A measurement system is expected to have a bandwidth at or below the Nyquist rate, as described by the [[Nyquist–Shannon sampling theorem]]. As can be seen in the power-law noise formulas, the white and flicker noise modulations both depend on the upper corner frequency <math>f_H</math> (these systems are assumed to be low-pass filtered only). Considering the frequency filter property, it can be clearly seen that low-frequency noise has greater impact on the result. For relatively flat phase modulation noise types (e.g. WPM and FPM), the filtering has relevance, whereas for noise types with greater slope the upper frequency limit becomes of less importance, assuming that the measurement system bandwidth is wide relative to <math>\tau</math> as given by + +:<math>\tau \gg \frac{1}{2\pi f_H}.</math> + +When this assumption is not met, the effective bandwidth <math>f_H</math> needs to be noted alongside the measurement. The interested reader should consult NBS TN394.<ref name=NBSTN394/> + +If, however, one adjusts the bandwidth of the estimator by using integer multiples of the sample time <math>n\tau_0</math>, then the system bandwidth impact can be reduced to insignificant levels. For telecommunication needs, such methods have been required in order to ensure comparability of measurements and to allow vendors some freedom to do different implementations. One example is ITU-T Rec. G.813<ref name=ITUTG813>ITU-T Rec. G.813: [http://www.itu.int/rec/T-REC-G.813/recommendation.asp?lang=en&parent=T-REC-G.813-200303-I ''Timing characteristics of SDH equipment slave clock (SEC)''], ITU-T Rec. G.813 (03/2003)</ref> for the TDEV measurement.
+ +It is recommended that the lowest <math>\tau_0</math> multiples be ignored, such that the majority of the detected noise is well within the passband of the measurement system's bandwidth. + +Further developments on the Allan variance were performed to let the hardware bandwidth be reduced by software means. This development of a software bandwidth allowed addressing the remaining noise, and the method is now referred to as the [[modified Allan variance]]. This bandwidth reduction technique should not be confused with the enhanced variant of [[modified Allan variance]], which also changes a smoothing filter bandwidth. + +===Dead time in measurements=== +Many time and frequency measurement instruments have the stages of arming time, time-base time and processing time, and may then re-trigger the arming. The arming time runs from the time the arming is triggered to when the start event occurs on the start channel. The time-base then ensures that a minimum amount of time passes before an event on the stop channel is accepted as the stop event. The number of events and the time elapsed between the start event and stop event are recorded and presented during the processing time. While the processing occurs (a period also known as the dwell time), the instrument is usually unable to do another measurement. After the processing has occurred, an instrument in continuous mode triggers the arm circuit again. The time between the stop event and the following start event becomes [[dead time]], during which the signal is not being observed. Such dead time introduces systematic measurement biases, which need to be compensated for in order to get proper results. For such measurement systems, the time ''T'' denotes the time between adjacent start events (and thus measurements), while <math>\tau</math> denotes the time-base length, i.e. the nominal length between the start and stop event of any measurement. + +Dead-time effects on measurements have such an impact on the produced result that much study of the field has been done in order to quantify their properties properly. The introduction of zero-dead-time counters removed the need for this analysis. A zero-dead-time counter has the property that the stop event of one measurement is also used as the start event of the following measurement. Such counters create a series of event and time timestamp pairs, one for each channel, spaced by the time-base. Such measurements have also proved useful in other forms of time-series analysis. + +Measurements performed with dead time can be corrected using the bias functions ''B''<sub>1</sub>, ''B''<sub>2</sub> and ''B''<sub>3</sub>. Thus, dead time as such does not prohibit access to the Allan variance, but it makes it more problematic. The dead time must be known so that the time between samples ''T'' can be established. + +===Measurement length and effective use of samples=== +When studying the effect that the length ''N'' of the sample series and the variable-τ parameter ''n'' have on the [[Allan variance#Confidence interval|confidence intervals]], the confidence intervals may become very large, since the [[Allan variance#Effective degree of freedom|effective degree of freedom]] may become small for some combinations of ''N'' and ''n'' for the dominant noise form (at that τ). + +The effect may be that the estimated value is much smaller or much greater than the real value, which may lead to false conclusions from the result.
+ +It is recommended that the confidence interval is plotted along with the data, such that the reader of the plot is able to be aware of the statistical uncertainty of the values. + +It is recommended that the length of the sample sequence, i.e. the number of samples ''N'' is kept high to ensure that confidence interval is small over the τ-range of interest. + +It is recommended that the τ-range as swept by the ''&tau;''<sub>0</sub> multiplier ''n'' is limited in the upper end relative ''N'' such that the read of the plot is not being confused by highly unstable estimator values. + +It is recommended that estimators providing better degrees of freedom values be used in replacement of the Allan variance estimators or as complementing them where they outperform the Allan variance estimators. Among those the [[Total variance]] and [[Theo variance]] estimators should be considered. + +===Dominant noise type=== +A large number of conversion constants, bias corrections and confidence intervals depends on the dominant noise type. For proper interpretation shall the dominant noise type for the particular τ of interest be identified through noise identification. Failing to identify the dominant noise type will produce biased values. Some of these biases may be of several order of magnitude, so it may be of large significance. + +===Linear drift=== +Systematic effects on the signal is only partly cancelled. Phase and frequency offset is cancelled, but linear drift or other high degree forms of polynomial phase curves will not be cancelled and thus form a measurement limitation. Curve fitting and removal of systematic offset could be employed. Often removal of linear drift can be sufficient. Use of linear drift estimators such as the [[Hadamard variance]] could also be employed. A linear drift removal could be employed using a moment based estimator. + +===Measurement instrument estimator bias=== +Traditional instruments provided only the measurement of single events or event pairs. The introduction of the improved statistical tool of overlapping measurements by J.J. Snyder<ref name=Snyder1981/> allowed for much improved resolution in frequency readouts, breaking the traditional digits/time-base balance. While such methods is useful for their intended purpose, using such smoothed measurements for Allan variance calculations would give a false impression of high resolution,<ref name=Rubiola2005>{{Cite journal|url=http://www.femto-st.fr/~rubiola/pdf-articles/journal/2005rsi-hi-res-freq-counters.pdf|doi=10.1063/1.1898203|title=On the measurement of frequency and of its sample variance with high-resolution counters|year=2005|last1=Rubiola|first1=Enrico|journal=Review of Scientific Instruments|volume=76|pages=054703|issue=5|arxiv = physics/0411227 |bibcode = 2005RScI...76e4703R }}</ref><ref name=Rubiola2005ifcs>Rubiola, Enrico: [http://www.femto-st.fr/~rubiola/pdf-articles/conference/2005-ifcs-counters.pdf ''On the measurement of frequency and of its sample variance with high-resolution counters''], Proc. Joint IEEE International Frequency Control Symposium and Precise Time and Time Interval Systems and Applications Meeting pp. 
46–49, Vancouver, Canada, 29–31 August 2005.</ref><ref name=Rubiola2008cntpres>Rubiola, Enrico: [http://www.femto-st.fr/~rubiola/pdf-slides/2008T-femto-counters.pdf ''High-resolution frequency counters (extended version, 53 slides)''], seminar given at the FEMTO-ST Institute, at the Université Henri Poincaré, and at the Jet Propulsion Laboratory, NASA-Caltech.</ref> but for longer τ the effect is gradually removed and the lower-τ region of the measurement has biased values. This bias provides lower values than it should, so it is an overoptimistic bias (assuming that low numbers are what one wishes), reducing the usability of the measurement rather than improving it. Such smart algorithms can usually be disabled or otherwise circumvented by using time-stamp mode, which is much preferred if available.
+
+==Practical measurements==
+While several approaches to measurement of Allan variance can be devised, a simple example may illustrate how measurements can be performed.
+
+===Measurement===
+All measurements of Allan variance will in effect be the comparison of two different clocks. Consider a reference clock and a device under test (DUT), both having a common nominal frequency of 10&nbsp;MHz. A time-interval counter is used to measure the time between the rising edge of the reference (channel A) and the rising edge of the device under test.
+
+In order to provide evenly spaced measurements, the reference clock is divided down to form the measurement rate, triggering the time-interval counter (ARM input). This rate can be 1&nbsp;Hz (using the [[Pulse per second|1 PPS]] output of a reference clock), but other rates like 10&nbsp;Hz and 100&nbsp;Hz can also be used. The speed at which the time-interval counter can complete the measurement, output the result and prepare itself for the next arm limits the trigger frequency.
+
+A computer is then useful to record the series of time differences being observed.
+
+===Post-processing===
+The recorded time series requires post-processing to unwrap the wrapped phase, such that a continuous phase error is provided. If necessary, logging and measurement mistakes should also be fixed. Drift estimation and drift removal should be performed, and the drift mechanism needs to be identified and understood for the sources. Drift limitations in measurements can be severe, so it is necessary to let the oscillators stabilise by being powered on long enough. A minimal sketch of such processing is given at the end of this section.
+
+The Allan variance can then be calculated using the estimators given, and for practical purposes the overlapping estimator should be used due to its superior use of data over the non-overlapping estimator. Other estimators such as the Total or Theo variance estimators could also be used if bias corrections are applied such that they provide Allan-variance-compatible results.
+
+To form the classical plots, the Allan deviation (square root of the Allan variance) is plotted in log-log format against the observation interval τ.
+
+===Equipment and software===
+The time-interval counter is typically a commercially available off-the-shelf counter. Limiting factors involve single-shot resolution, trigger jitter, speed of measurements and stability of the reference clock. The computer collection and post-processing can be done using existing commercial or public-domain software. Highly advanced solutions exist which provide measurement and computation in one box.
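+
+As an illustration of the post-processing chain described above, the following Python sketch removes a linear frequency drift (a quadratic term in the phase) by least-squares fitting and then evaluates the overlapping Allan deviation directly from its definition. The sampling interval, averaging factors and synthetic input data are placeholder values only; a production analysis would normally use a dedicated, validated tool for this step.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def overlapping_adev(phase, tau0, n):
+    """Overlapping Allan deviation at tau = n*tau0 from phase data x[k] in seconds."""
+    x = np.asarray(phase, dtype=float)
+    d2 = x[2 * n:] - 2.0 * x[n:-n] + x[:-2 * n]          # second differences of phase
+    avar = np.sum(d2 ** 2) / (2.0 * (n * tau0) ** 2 * d2.size)
+    return np.sqrt(avar)
+
+tau0 = 1.0                                # spacing between measurements, e.g. 1 PPS
+t = np.arange(10000) * tau0
+
+# Placeholder data: white phase noise plus a linear frequency drift (quadratic in phase).
+rng = np.random.default_rng(0)
+x = 1e-9 * rng.standard_normal(t.size) + 0.5e-13 * t ** 2
+
+# Drift estimation and removal: fit and subtract a second-order polynomial in time.
+x = x - np.polyval(np.polyfit(t, x, 2), t)
+
+for n in (1, 10, 100, 1000):
+    print(n * tau0, overlapping_adev(x, tau0, n))
+</syntaxhighlight>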
+
+==Research history==
+The field of frequency stability has been studied for a long time; however, it was found during the 1960s that there was a lack of coherent definitions. The NASA-IEEE Symposium on Short-Term Stability in 1964 was followed by the IEEE Proceedings publishing a special issue on Frequency Stability in its February 1966 issue.
+
+The NASA-IEEE Symposium on Short-Term Stability in November 1964<ref name=NASA1964>NASA: [http://hdl.handle.net/2060/19660001092] ''Short-Term Frequency Stability'', NASA-IEEE symposium on Short Term Frequency Stability Goddard Space Flight Center 23–24 November 1964, NASA Special Publication 80</ref> brought together many fields and uses of short- and long-term stability, with papers from many different contributors. The articles and panel discussions are interesting in that they concur on the existence of frequency flicker noise and the wish for achieving a common definition for short- and long-term stability (even if the conference name only reflects the short-term stability intention).
+
+The IEEE proceedings on Frequency Stability 1966 included a number of important papers, including those of David Allan,<ref name=Allan1966/> James A. Barnes,<ref name=Barnes1966>Barnes, J. A.: [http://tf.boulder.nist.gov/general/pdf/6.pdf ''Atomic Timekeeping and the Statistics of Precision Signal Generators''], IEEE Proceedings on Frequency Stability, Vol 54 No 2, pages 207&ndash;220, 1966</ref> L. S. Cutler and C. L. Searle<ref name=Cutler1966/> and D. B. Leeson.<ref name=Leeson1966/> These papers helped shape the field.
+
+The classical ''M''-sample variance of frequency was analysed by David Allan,<ref name=Allan1966/> along with an initial bias function. This paper tackles the issues of dead time between measurements and analyses the case of ''M'' frequency samples (called ''N'' in the paper) and variance estimators. It provides the now standard ''α'' to ''µ'' mapping. It clearly builds on James Barnes' work, as detailed in his article<ref name=Barnes1966/> in the same issue. The initial bias functions introduced assume no dead time, but the formulas presented include dead-time calculations. The bias functions assume the use of the 2-sample variance as a base case, since any other variant of ''M'' may be chosen and values may be transferred via the 2-sample variance to any other variance of arbitrary ''M''. Thus, the 2-sample variance was only implicitly used and not clearly stated as the preference, even if the tools were provided. It nevertheless laid the foundation for using the 2-sample variance as the base case of comparison among other variants of the ''M''-sample variance. The 2-sample variance case is a special case of the ''M''-sample variance which produces an average of the frequency derivative.
+
+The work on bias functions was significantly extended by James Barnes,<ref name=NBSTN375/> in which the modern B<sub>1</sub> and B<sub>2</sub> bias functions were introduced. Curiously enough, it refers to the ''M''-sample variance as "Allan variance", while referencing Allan's paper.<ref name=Allan1966/> With these modern bias functions, full conversion among ''M''-sample variance measures of varying ''M'', ''T'' and τ values could be made, by conversion through the 2-sample variance.
+
+James Barnes and David Allan further extended the bias functions with the B<sub>3</sub> function<ref name=NISTTN1318/> to handle the concatenated samples estimator bias. This was necessary to handle the new use of concatenated sample observations with dead time in between.
+
+The IEEE Technical Committee on Frequency and Time, within the IEEE Group on Instrumentation & Measurements, provided a summary of the field in 1970, published as NBS Technical Note 394.<ref name=NBSTN394/> This paper could be considered the first in a line of more educational and practical papers aiding fellow engineers in grasping the field. In this paper, the 2-sample variance with ''T''&nbsp;=&nbsp;''τ'' is the recommended measurement, and it is referred to as the Allan variance (now without the quotes). The choice of such a parametrisation allows good handling of some noise forms and yields comparable measurements; it is essentially the least common denominator with the aid of the bias functions B<sub>1</sub> and B<sub>2</sub>.
+
+An improved method for using sample statistics of frequency counters in frequency estimation or variance estimation was proposed by J.J. Snyder.<ref name=Snyder1981/> The trick to get more effective degrees of freedom out of the available dataset was to use overlapping observation periods. This provides a square-root-of-''n'' improvement. It was included in the overlapping Allan variance estimator.<ref name=Howe1981/> The variable-τ software processing was also included there.<ref name=Howe1981/> This development improved the classical Allan variance estimators and likewise provided a direct inspiration for the work on the [[modified Allan variance]].
+
+The confidence interval and degrees-of-freedom analysis, along with the established estimators, was presented in the same work.<ref name=Howe1981/>
+
+==Educational and practical resources==
+The field of time and frequency and its use of the Allan variance, [[Allan deviation]] and related measures involves many aspects, for which both understanding of the concepts and practical measurements and post-processing require care and understanding. Thus, there is a realm of educational material available, stretching back some 40 years. Since these reflect the developments in the research of their time, they focus on teaching different aspects over time, in which case a survey of available resources may be a suitable way of finding the right resource.
+
+The first meaningful summary is the NBS Technical Note 394 "Characterization of Frequency Stability".<ref name=NBSTN394/> This is the product of the Technical Committee on Frequency and Time of the IEEE Group on Instrumentation & Measurement. It gives the first overview of the field, stating the problems, giving the basic supporting definitions and getting into the Allan variance, the bias functions ''B''<sub>1</sub> and ''B''<sub>2</sub>, and the conversion of time-domain measures. It is useful as it is among the first references to tabulate the Allan variance for the five basic noise types.
+
+A classical reference is the NBS Monograph 140<ref name=NBSMG140>Blair, B.E.: [http://tf.boulder.nist.gov/general/pdf/59.pdf ''Time and Frequency: Theory and Fundamentals''], NBS Monograph 140, May 1974</ref> from 1974, which in chapter 8 has "Statistics of Time and Frequency Data Analysis".<ref name=NBSMG140-8>David W. Allan, John H. Shoaf and Donald Halford: [http://tf.boulder.nist.gov/general/pdf/59.pdf ''Statistics of Time and Frequency Data Analysis''], NBS Monograph 140, pages 151&ndash;204, 1974</ref> This is the extended variant of NBS Technical Note 394 and essentially adds measurement techniques and practical processing of values.
+
+An important addition is ''Properties of signal sources and measurement methods''.<ref name=Howe1981/> It covers the effective use of data, confidence intervals and the effective degree of freedom, and likewise introduces the overlapping Allan variance estimator. It is highly recommended reading for those topics.
+
+The IEEE standard 1139, ''Standard definitions of Physical Quantities for Fundamental Frequency and Time Metrology'',<ref name=IEEE1139/> is, beyond being a standard, a comprehensive reference and educational resource.
+
+A modern book aimed towards telecommunication is Stefano Bregni's "Synchronisation of Digital Telecommunication Networks".<ref name=Bregni2002/> This summarises not only the field but also much of his research in the field up to that point. It includes both classical measures and telecommunication-specific measures such as MTIE. It is a handy companion when looking at measurements related to telecommunication standards.
+
+The NIST Special Publication 1065 "Handbook of Frequency Stability Analysis" of W.J. Riley<ref name=NISTSP1065/> is recommended reading for anyone wanting to pursue the field. It is rich in references and also covers a wide range of measures, biases and related functions that a modern analyst should have available. Further, it describes the overall processing needed for a modern tool.
+
+==Uses==
+Allan variance is used as a measure of frequency stability in a variety of precision oscillators, such as [[crystal oscillator]]s, [[atomic clock]]s and frequency-stabilized [[laser]]s, over a period of a second or more. Short-term stability (under a second) is typically expressed as [[phase noise]]. The Allan variance is also used to characterize the bias stability of [[gyroscopes]], including [[fiber optic gyroscope]]s and [[Microelectromechanical systems|MEMS]] gyroscopes.
+
+==See also==
+{{colbegin|2}}
+*[[Variance]]
+*[[Semivariance]]
+*[[Variogram]]
+*[[Metrology]]
+*[[Network time protocol]]
+*[[Precision Time Protocol]]
+*[[Synchronization]]
+{{colend}}
+
+==References==
+{{Reflist|2}}
+
+==External links==
+*[http://www.ieee-uffc.org/frequency_control/teaching.asp UFFC Frequency Control Teaching Resources]
+*[http://www.tf.nist.gov/timefreq/general/publications.htm NIST Publication search tool]
+*[http://www.allanstime.com/AllanVariance/ David W. Allan's Allan Variance Overview]
+*[http://www.allanstime.com David W.
Allan's official web site] +*[http://horology.jpl.nasa.gov/noiseinfo.html JPL Publications &ndash; Noise Analysis and Statistics] +*[http://www.wriley.com/ William Riley publications] +*[http://home.dei.polimi.it/bregni/public.htm Stefano Bregni publications] +*[http://rubiola.org/ Enrico Rubiola publications] +*[http://cran.r-project.org/web/packages/allanvar/index.html Allanvar: R package for sensor error characterization using the Allan Variance] +*[http://www.alamath.com/ Alavar windows software with reporting tools; Freeware ] + +{{DEFAULTSORT:Allan Variance}} +[[Category:Clocks]] +[[Category:Signal processing metrics]] +[[Category:Measurement]] + jbmna7z2u6ifyuoc0qnetvjp3mjhuny + + + + Superconducting radio frequency + 0 + 21691 + + 21692 + 2013-01-07T09:46:32Z + + 193.62.111.10 + + /* Physics of SRF cavities */ Capitalized Cooper pairs + wikitext + text/x-wiki + {{Use dmy dates|date=August 2012}} +[[Image:Cornell SRF B-Cell 1.png|thumb|The Cornell storage ring 500&nbsp;MHz SRF cavity.]] +'''Superconducting radio frequency (SRF)''' science and technology involves the application of electrical [[Superconductivity|superconductors]] to [[radio frequency]] devices. The ultra-low [[Electrical resistivity and conductivity|electrical resistivity]] of a superconducting material allows an RF resonator to obtain an extremely high [[Q factor|quality factor]], ''Q''. For example, it is commonplace for a 1.3&nbsp;GHz [[niobium]] SRF resonant cavity at 1.8&nbsp;[[Kelvin]] to obtain a quality factor of ''Q''=5×10<sup>10</sup>. Such a very high ''Q'' resonator stores energy with very low loss and narrow [[Bandwidth (signal processing)|bandwidth]]. These properties can be exploited for a variety of applications, including the construction of high-performance [[particle accelerator]] structures. + +==Introduction== +The amount of loss in an SRF resonant cavity is so minute that it is often explained with the following comparison: [[Galileo Galilei]] (1564–1642) was one of the first investigators of pendulous motion, a simple form of mechanical [[resonator|resonance]]. Had Galileo experimented with a 1&nbsp;Hz resonator with a quality factor ''Q'' typical of today's SRF cavities and left it swinging in a [[wiktionary:sepulchered|sepulchered]] lab since the early 17th century, that pendulum would still be swinging today with about half of its original amplitude. +[[Image:Cornell SRF B-Cell 2.png|thumb|Photograph of the Cornell storage ring 500&nbsp;MHz SRF cavity being lifted out of a cryogenic test [[Vacuum flask|dewar]] while still cold.]] + +The most common application of superconducting RF is in [[particle accelerator]]s. Accelerators typically use [[Resonator|resonant RF cavities]] formed from or coated with superconducting materials. Electromagnetic fields are excited in the cavity by coupling in an RF source with an antenna. When the RF frequency fed by the antenna is the same as that of a cavity mode, the resonant fields build to high amplitudes. Charged particles passing through apertures in the cavity are then accelerated by the electric fields and deflected by the magnetic fields. The resonant frequency driven in SRF cavities typically ranges from 200&nbsp;MHz to 3&nbsp;GHz, depending on the particle species to be accelerated. + +The most common fabrication technology for such SRF cavities is to form thin walled (1–3&nbsp;mm) shell components from high purity niobium sheets by [[Stamping (metalworking)|stamping]]. 
These shell components are then [[electron beam welding|welded]] together to form cavities. Several such finished products are pictured below. + +A simplified diagram of the key elements of an SRF cavity setup is shown below. The cavity is immersed in a [[Saturated fluid|saturated]] [[liquid helium]] bath. Pumping removes helium vapor boil-off and controls the bath temperature. The helium vessel is often pumped to a pressure below helium's [[superfluid]] [[lambda point]] to take advantage of the superfluid's thermal properties. Because superfluid has very high thermal conductivity, it makes an excellent coolant. In addition, superfluids boil only at free surfaces, preventing the formation of bubbles on the surface of the cavity, which would cause mechanical perturbations. An antenna is needed in the setup to couple RF power to the cavity fields and, in turn, any passing particle beam. The cold portions of the setup need to be extremely well insulated, which is best accomplished by a vacuum vessel surrounding the helium vessel and all ancillary cold components. The full SRF cavity containment system, including the vacuum vessel and many details not discussed here, is a [[cryomodule]]. + +[[Image:SRF Cavity Diagram 1.png|frame|A simplified diagram of an SRF cavity in a helium bath with RF coupling and a passing particle beam.]] + +Entry into superconducting RF technology can incur more complexity, expense, and time than normal-conducting RF cavity strategies. SRF requires chemical facilities for harsh cavity treatments, a low-particulate [[cleanroom]] for high-pressure water rinsing and assembly of components, and complex engineering for the cryomodule vessel and cryogenics. A vexing aspect of SRF is the as-yet elusive ability to consistently produce high ''Q'' cavities in high volume production, which would be required for a large [[International Linear Collider|linear collider]]. Nevertheless, for many applications the capabilities of SRF cavities provide the only solution for a host of demanding performance requirements. + +Several extensive treatments of SRF physics and technology are available, many of them free of charge and online. There are the proceedings of [[CERN]] accelerator schools,<ref>[http://documents.cern.ch/cgi-bin/setlink?base=cernrep&categ=Yellow_Report&id=2004-008 ''2004 CERN Accelerator School: Superconductivity and cryogenics for accelerators and detectors'']</ref><ref>[http://documents.cern.ch/cgi-bin/setlink?base=cernrep&categ=Yellow_Report&id=96-03 ''1996 CERN Accelerator School: Superconductivity in particle accelerators'']</ref><ref>[http://documents.cern.ch/cgi-bin/setlink?base=cernrep&categ=Yellow_Report&id=1989-004 ''1989 CERN Accelerator School: Course on superconductivity in particle accelerators'']</ref> a scientific paper giving a thorough presentation of the many aspects of an SRF cavity to be used in the [[International Linear Collider]],<ref name=aune>[http://prst-ab.aps.org/pdf/PRSTAB/v3/i9/e092001 B. Aune et al., "Superconducting TESLA cavities", Phys. Rev. ST Accel. Beams 3, 092001 (2000). 
A thorough presentation of the many aspects of an SRF cavity]</ref> biennial International Conferences on RF Superconductivity held at varying global locations in odd-numbered years,<ref>[http://www.helmholtz-berlin.de/events/srf2009/ 2009 Conference on RF Superconductivity]</ref> and tutorials presented at the conferences.<ref>[http://www.helmholtz-berlin.de/events/srf2009/programs/tutorials_de.html SRF Tutorials at the 2009 Conference on RF Superconductivity]</ref>
+[[Image:Cornell SRF Collection 1.png|thumb|A collection of SRF cavities developed at Cornell University with frequencies spanning 200&nbsp;MHz to 3&nbsp;GHz.]]
+
+==SRF cavity application in particle accelerators==
+A large variety of RF cavities are utilized in particle accelerators. Historically they have been made of copper, a good electrical conductor, and operated near room temperature with water cooling. The water cooling is necessary to remove the heat generated by the electrical loss in the cavity. In the past two decades, though, there has been a growing number of accelerator facilities for which superconducting cavities were deemed more suitable, or necessary, for the accelerator than normal-conducting copper versions. The motivation for using superconductors in RF cavities is ''not'' to achieve a net power savings. Though superconductors have very small electrical resistance, the little power that they do dissipate is dissipated at very low temperatures, typically in a liquid helium bath at 1.6&nbsp;K to 4.5&nbsp;K. The refrigeration power needed to maintain the cryogenic bath at low temperature in the presence of heat from small RF power dissipation is dictated by the [[Heat engine#Efficiency|Carnot efficiency]], and can easily be comparable to the normal-conductor power dissipation of a room-temperature copper cavity; a rough numerical estimate is sketched after the list below. The motivations for using superconducting RF cavities are instead the following:
+
+* '''High duty cycle or cw operation'''. SRF cavities allow the excitation of high electromagnetic fields at high duty cycle, or even cw, in such regimes that a copper cavity's electrical loss could ''melt'' the copper, even with robust water cooling.
+* '''Low beam impedance'''. The low electrical loss in an SRF cavity allows its geometry to have large beampipe apertures while still maintaining a high accelerating field along the beam axis. Normal-conducting cavities need small beam apertures to concentrate the electric field as compensation for power losses in wall currents. However, the small apertures can be deleterious to a particle beam due to their spawning of larger wakefields, which are quantified by the accelerator parameters termed "beam impedance" and "loss parameter".
+* '''Nearly all RF power goes to the beam'''. The RF source driving the cavity need only provide the RF power that is absorbed by the particle beam being accelerated, since the RF power dissipated in the SRF cavity walls is negligible. This is in contrast to normal-conducting cavities where the wall power loss can easily equal or exceed the beam power consumption. The RF power budget is important since the RF source technologies, such as a [[Klystron]], [[Inductive output tube]] (IOT), or [[Solid state (electronics)|solid state]] amplifier, have costs that increase dramatically with increasing power.
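+
+To give a rough sense of the refrigeration overhead mentioned above, the following Python sketch evaluates the wall-plug power from the Carnot and "practical" efficiencies defined in the Cryogenics section below. The 10&nbsp;W load at 1.8&nbsp;K and the practical efficiency of 0.3 are the illustrative values used in that section, not the properties of any particular plant.
+
+<syntaxhighlight lang="python">
+def refrigerator_wall_plug_power(p_cold, t_cold, t_warm=300.0, eta_practical=0.3):
+    """Wall-plug power (W) needed to remove p_cold watts at t_cold kelvin.
+
+    Uses the Carnot efficiency t_cold/(t_warm - t_cold), valid for
+    t_cold < t_warm - t_cold, times a catch-all practical efficiency.
+    """
+    eta_carnot = t_cold / (t_warm - t_cold)
+    return p_cold / (eta_carnot * eta_practical)
+
+# Illustrative values: 10 W dissipated at 1.8 K, as in the Cryogenics example.
+print(refrigerator_wall_plug_power(p_cold=10.0, t_cold=1.8))   # roughly 5.5 kW at the wall
+</syntaxhighlight>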
+ +When future superconducting material advances occur to obtain higher [[Superconductivity#Superconducting phase transition|superconducting critical temperatures]] ''T<sub>c</sub>'' and consequently higher SRF bath temperatures, then the better efficiencies of the refrigerator could yield a significant net power savings by SRF over the normal conducting approach to RF cavities. There are other issues that would have to be considered with a higher bath temperature, though, such as the absence of superfluidity that is presently exploited with liquid helium that would not be present with, e.g., liquid nitrogen. At present, none of the "high ''T<sub>c</sub>''" superconducting materials are suitable for RF applications. Shortcomings of these materials arise due to their underlying physics as well as their bulk mechanical properties not being amenable to fabricating accelerator cavities. However, depositing films of promising materials onto other mechanically amenable cavity materials could provide a viable option for exotic materials serving SRF applications. At present, the de facto choice of SRF material is still pure niobium, which has a critical temperature of 9.3&nbsp;K and functions as a superconductor nicely in a liquid helium bath of 4.2&nbsp;K or lower. + +==Physics of SRF cavities== +The physics of Superconducting RF can be complex and lengthy. A few simple approximations derived from the complex theories, though, can serve to provide some of the important parameters of SRF cavities. + +By way of background, some of the pertinent parameters of RF cavities are itemized as follows. A resonator's quality factor is defined by +:<math> Q_o = \frac{\omega U} {P_d} </math>, +where: +: ''&omega;'' is the resonant frequency in [rad/s], +:''U'' is the energy stored in [J], and +:''P<sub>d</sub>'' is the power dissipated in [W] in the cavity to maintain the energy ''U''. +The energy stored in the cavity is given by the integral of field energy density over its volume, +:<math> U = \frac{\mu_0}{2}\int{|\overrightarrow{H}|^2 dV}</math> , +where: +:''H'' is the magnetic field in the cavity and +:''&mu;<sub>0</sub>'' is the permeability of free space. +The power dissipated is given by the integral of resistive wall losses over its surface, +:<math> P_d = \frac{R_s}{2}\int{|\overrightarrow{H}|^2 dS} </math> , +where: +:''R<sub>s</sub>'' is the surface resistance which will be discussed below. + +The integrals of the electromagnetic field in the above expressions are generally not solved analytically, since the cavity boundaries rarely lie along axes of common coordinate systems. Instead, the calculations are performed by any of a variety of computer programs that solve for the fields for non-simple cavity shapes, and then numerically integrate the above expressions. + +An RF cavity parameter known as the Geometry Factor ranks the cavity's effectiveness of providing accelerating electric field due to the influence of its shape alone, which excludes specific material wall loss. The Geometry Factor is given by +:<math> G = \frac{\omega \mu_0 \int{|\overrightarrow{H}|^2 dV}}{\int{|\overrightarrow{H}|^2 dS}} </math> , +and then +:<math> Q_o = \frac{G} {R_s} \cdot </math> +The geometry factor is quoted for cavity designs to allow comparison to other designs independent of wall loss, since wall loss for SRF cavities can vary substantially depending on material preparation, cryogenic bath temperature, electromagnetic field level, and other highly variable parameters. 
The Geometry Factor is also independent of cavity size, it is constant as a cavity shape is scaled to change its frequency. + +As an example of the above parameters, a typical 9-cell SRF cavity for the [[International Linear Collider]]<ref name=aune/> (a.k.a. a TESLA cavity) would have ''G''=270 Ω and ''R<sub>s</sub>''= 10 nΩ, giving ''Q<sub>o</sub>''=2.7×10<sup>10</sup>. + +The critical parameter for SRF cavities in the above equations is the surface resistance ''R<sub>s</sub>'', and is where the complex physics comes into play. For normal-conducting copper cavities operating near room temperature, ''R<sub>s</sub>'' is simply determined by the empirically measured bulk electrical conductivity ''&sigma;'' by +:<math> R_{s\ normal} = \sqrt{ \frac{\omega \mu_0} {2 \sigma} }</math> . + +For copper at 300&nbsp;K, ''&sigma;''=5.8×10<sup>7</sup>&nbsp;(Ω·m)<sup>−1</sup> and at 1.3&nbsp;GHz, ''R<sub>s&nbsp;copper</sub>''= 9.4&nbsp;mΩ. + +For Type II superconductors in RF fields, ''R<sub>s</sub>'' can be viewed as the sum of the superconducting BCS resistance and temperature-independent "residual resistances", +:<math> R_s = R_{BCS} + R_{res}</math> . + +The ''BCS resistance'' derives from [[BCS theory]]. One way to view the nature of the BCS RF resistance is that the superconducting [[Cooper pair]]s, which have zero resistance for DC current, have finite mass and momentum which has to alternate sinusoidally for the AC currents of RF fields, thus giving rise to a small energy loss. The BCS resistance for niobium can be approximated when the temperature is less than half of niobium's [[Superconductivity#Superconducting phase transition|superconducting critical temperature]], ''T''<''T<sub>c</sub>''/2, by +:<math> R_{BCS} \simeq 2 \times 10^{-4} \left( \frac{f}{1.5 \times 10^{9}} \right)^2 \frac {e^{-17.67 / T}} {T} </math> [Ω], +where: +:''f'' is the frequency in [Hz], +:''T'' is the temperature in [K], and +:''T<sub>c</sub>''=9.3&nbsp;K for niobium, so this approximation is valid for ''T''<4.65&nbsp;K. + +Note that for superconductors, the BCS resistance increases quadratically with frequency, ~''f''&nbsp;<sup>2</sup>, whereas for normal conductors the surface resistance increases as the root of frequency, ~√''f''. For this reason, the majority of superconducting cavity applications favor lower frequencies, <3&nbsp;GHz, and normal-conducting cavity applications favor higher frequencies, >0.5&nbsp;GHz, there being some overlap depending on the application. + +The superconductor's ''residual resistance'' arises from several sources, such as random material defects, hydrides that can form on the surface due to hot chemistry and slow cool-down, and others that are yet to be identified. One of the quantifiable residual resistance contributions is due to an external magnetic field pinning [[Fluxon|magnetic fluxons]] in a Type II superconductor. The pinned fluxon cores create small normal-conducting regions in the niobium that can be summed to estimate their net resistance. For niobium, the magnetic field contribution to ''R<sub>s</sub>'' can be approximated by +:<math> R_{H} = \frac{H_{ext}}{2 H_{c2}} R_n \approx 9.49 \times 10^{-12} H_{ext}\sqrt{f} </math> [Ω], +where: +:''H<sub>ext</sub>'' is any external magnetic field in <nowiki>[</nowiki>[[Oersted|Oe]]<nowiki>]</nowiki>, +:''H<sub>c2</sub>'' is the Type II superconductor magnetic quench field, which is 2400&nbsp;Oe (190 kA/m) for niobium, and +:''R<sub>n</sub>'' is the normal-conducting resistance of niobium in [[ohm (unit)|ohms]]. 
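+
+Putting the above approximations together, the following Python sketch evaluates the surface resistance terms and the resulting quality factor. The frequency, bath temperature, residual magnetic field and geometry factor are the assumed inputs of the worked example given in the following paragraphs, and the sketch simply reproduces the numbers quoted there.
+
+<syntaxhighlight lang="python">
+import math
+
+def r_bcs(f_hz, t_k):
+    """Approximate BCS surface resistance of niobium in ohms, valid for T < Tc/2."""
+    return 2e-4 * (f_hz / 1.5e9) ** 2 * math.exp(-17.67 / t_k) / t_k
+
+def r_flux(h_ext_oe, f_hz):
+    """Residual surface resistance in ohms from trapped magnetic flux (H_ext in oersted)."""
+    return 9.49e-12 * h_ext_oe * math.sqrt(f_hz)
+
+f, temp, h_ext, geometry_factor = 1.3e9, 1.8, 0.010, 270.0   # 1.3 GHz, 1.8 K, 10 mOe, G = 270 ohm
+
+r_s = r_bcs(f, temp) + r_flux(h_ext, f)
+print("R_BCS = %.2f nOhm" % (r_bcs(f, temp) * 1e9))   # about 4.55 nOhm
+print("R_s   = %.2f nOhm" % (r_s * 1e9))              # about 7.97 nOhm
+print("Q_o   = %.2e" % (geometry_factor / r_s))       # about 3.4e10
+</syntaxhighlight>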
+ +The Earth's nominal magnetic flux of 0.5&nbsp;[[Gauss (unit)|gauss]] (50 [[microtesla|µT]]) translates to a magnetic field of 0.5&nbsp;Oe (40 [[ampere per meter|A/m]]) and would produce a residual surface resistance in a superconductor that is orders of magnitude greater than the BCS resistance, rendering the superconductor too lossy for practical use. For this reason, superconducting cavities are surrounded by [[Electromagnetic shielding#How magnetic shielding works|magnetic shielding]] to reduce the field permeating the cavity to typically <10&nbsp;mOe (0.8 A/m). + +Using the above approximations for a niobium a SRF cavity at 1.8&nbsp;K, 1.3&nbsp;GHz, and assuming a magnetic field of 10&nbsp;mOe (0.8 A/m), the surface resistance components would be +:''R<sub>BCS</sub>''&nbsp;=&nbsp;4.55&nbsp;nΩ and +:''R<sub>res</sub>''&nbsp;=&nbsp;''R<sub>H</sub>''&nbsp;=&nbsp;3.42&nbsp;nΩ, giving a net surface resistance +:''R<sub>s</sub>''&nbsp;=&nbsp;7.97&nbsp;nΩ. If for this cavity +:''G''&nbsp;=&nbsp;270&nbsp;Ω then the ideal quality factor would be +:''Q<sub>o</sub>''&nbsp;=&nbsp;3.4×10<sup>10</sup>. +The ''Q<sub>o</sub>'' just described can be further improved by up to a factor of 2 by performing a mild vacuum bake of the cavity. Empirically, the bake seems to reduce the BCS resistance by 50%, but increases the residual resistance by 30%. The plot below shows the ideal ''Q<sub>o</sub>'' values for a range of residual magnetic field for a baked and unbaked cavity. + +[[Image:SRF Cavity Max Qo vs H 2.jpg|frame|Plot of SRF cavity ideal ''Q<sub>o</sub>'' vs external DC magnetic field for the same cavity frequency, temperature, and geometry factor as used in the text.]] + +In general, much care and attention to detail must be exercised in the experimental setup of SRF cavities so that there is not ''Q<sub>o</sub>'' degradation due to RF losses in ancillary components, such as stainless steel vacuum flanges that are too close to the cavity's [[Evanescent wave|evanescent]] fields. However, careful SRF cavity preparation and experimental configuration have achieved the ideal ''Q<sub>o</sub>'' not only for low field amplitudes, but up to cavity fields that are typically 75% of the [[Superconductivity#Meissner effect|magnetic field quench]] limit. Few cavities make it to the magnetic field quench limit since residual losses and vanishingly small defects heat up localized spots, which eventually exceed the superconducting critical temperature and lead to a [[Superconducting magnet#Use|thermal quench]]. + +==''Q '' vs ''E''== +When using superconducting RF cavities in particle accelerators, the field level in the cavity should generally be as high as possible to most efficiently accelerate the beam passing through it. The ''Q<sub>o</sub>'' values described by the above calculations tend to degrade as the fields increase, which is plotted for a given cavity as a "''Q''&nbsp;vs&nbsp;''E''" curve, where "''E''" refers to the accelerating electric field of the [[Waveguide (electromagnetism)|TM<sub>01</sub>]] mode. Ideally, the cavity ''Q<sub>o</sub>'' would remain constant as the accelerating field is increased all the way up to the point of a magnetic quench field, as indicated by the "ideal" dashed line in the plot below. In reality, though, even a well prepared niobium cavity will have a ''Q''&nbsp;vs&nbsp;''E'' curve that lies beneath the ideal, as shown by the "good cavity" curve in the plot. 
+ +There are many phenomena that can occur in an SRF cavity to degrade its ''Q''&nbsp;vs&nbsp;''E'' performance, such as impurities in the niobium, hydrogen contamination due to excessive heat during chemistry, and a rough surface finish. After a couple decades of development, a necessary prescription for successful SRF cavity production is emerging. This includes: +* Eddy-current scanning of the raw niobium sheet for impurities, +* Good quality control of electron beam welding parameters, +* Maintain a low cavity temperature during acid chemistry to avoid hydrogen contamination, +* [[Electropolishing|Electropolish]] of the cavity interior to achieve a very smooth surface, +* High pressure rinse (HPR) of the cavity interior in a clean room with filtered water to remove particulate contamination, +* Careful assembly of the cavity to other vacuum apparatus in a clean room with clean practices, +* A vacuum bake of the cavity at 120&nbsp;°C for 48 hours, which typically improves ''Q<sub>o</sub>'' by a factor of 2. + +[[Image:SRF Cavity Q vs E 1.png|frame|Example plots of SRF cavity ''Q<sub>o</sub>'' vs the accelerating electric field ''E<sub>a</sub>'' and peak magnetic field of the TM<sub>01</sub> mode.]] +There remains some uncertainty as to the root cause of why some of these steps lead to success, such as the electropolish and vacuum bake. However, if this prescription is not followed, the ''Q''&nbsp;vs&nbsp;''E'' curve often shows an excessive degradation of ''Q<sub>o</sub>'' with increasing field, as shown by the "''Q''&nbsp;slope" curve in the plot below. Finding the root causes of ''Q''&nbsp;slope phenomena is the subject of ongoing fundamental SRF research. The insight gained could lead to simpler cavity fabrication processes as well as benefit future material development efforts to find higher ''T<sub>c</sub>'' alternatives to niobium. + +==Wakefields and higher order modes (HOMs)== +One of the main reasons for using SRF cavities in particle accelerators is that their large apertures result in low beam impedance and higher thresholds of deleterious beam instabilities. As a charged particle beam passes through a cavity, its electromagnetic radiation field is perturbed by the sudden increase of the conducting wall diameter in the transition from the small-diameter beampipe to the large hollow RF cavity. A portion of the particle's radiation field is then "clipped off" upon re-entrance into the beampipe and left behind as wakefields in the cavity. The wakefields are simply superimposed upon the externally driven accelerating fields in the cavity. The spawning of electromagnetic cavity modes as wakefields from the passing beam is analogous to a [[Drum stick|drumstick]] striking a [[drumhead]] and exciting many resonant mechanical modes. + +The beam wakefields in an RF cavity excite a subset of the spectrum of the many [[Waveguide (electromagnetism)|electromagnetic modes]], including the externally driven TM<sub>01</sub> mode. There are then a host of beam instabilities that can occur as the repetitive particle beam passes through the RF cavity, each time adding to the wakefield energy in a collection of modes. + +For a particle bunch with charge ''q'', a length much shorter than the wavelength of a given cavity mode, and traversing the cavity at time ''t''=0, the amplitude of the wakefield voltage left behind in the cavity in a given mode is given by<ref name=pwilson>[http://www.slac.stanford.edu/pubs/slacpubs/2000/slac-pub-2884.html P. 
Wilson, "High Energy Electron Linacs: Applications to Storage Ring RF Systems and Linear Colliders", SLAC-PUB-2884 (Rev) November 1991. See Section 6 of this excellent treatment of particle accelerator RF and beam loading.]</ref> +:<math> V_{wake} = \frac{q \omega_o R} {2 Q_o} \ e^{j \omega_o t} \ e^{-\frac{\omega t}{2 Q_L}} = k q \ e^{j \omega_o t} \ e^{-\frac{\omega t}{2 Q_L}}</math> , +where: +:''R'' is the [[shunt impedance]] of the cavity mode defined by +:<math> R = \frac{ \left( \int{\overrightarrow{E} \cdot dl} \right)^2}{P_d} = \frac{V^2}{P_d} </math> , +:''E'' is the electric field of the RF mode, +:''P<sub>d</sub>'' is the power dissipated in the cavity to produce the electric field ''E'', +:''Q<sub>L</sub>'' is the "loaded ''Q''" of the cavity, which takes into account energy leakage out of the coupling antenna, +:''&omega;<sub>o</sub>'' is the angular frequency of the mode, +:the imaginary exponential is the mode's sinusoidal time variation, +:the real exponential term quantifies the decay of the wakefield with time, and +:<math> k = \frac{\omega_o R} {2 Q_o} </math> is termed the ''loss parameter'' of the RF mode. + +The shunt impedance ''R'' can be calculated from the solution of the electromagnetic fields of a mode, typically by a computer program that solves for the fields. In the equation for ''V<sub>wake</sub>'', the ratio ''R''/''Q<sub>o</sub>'' serves as a good comparative measure of wakefield amplitude for various cavity shapes, since the other terms are typically dictated by the application and are fixed. Mathematically, +:<math> \frac{R} {Q_o} = \frac{V^2}{\omega U} = \frac{2 \left( \int{\overrightarrow{E} \cdot dl} \right)^2}{ \omega \mu_o\int{|\overrightarrow{H}|^2 dV} } = \frac {2k}{\omega_o}</math> , +where relations defined above have been used. ''R''/''Q<sub>o</sub>'' is then a parameter that factors out cavity dissipation and is viewed as measure of the cavity geometry's effectiveness of producing accelerating voltage per stored energy in its volume. The wakefield being proportional to ''R''/''Q<sub>o</sub>'' can be seen intuitively since a cavity with small beam apertures concentrates the electric field on axis and has high ''R''/''Q<sub>o</sub>'', but also clips off more of the particle bunch's radiation field as deleterious wakefields. + +The calculation of electromagnetic field buildup in a cavity due to wakefields can be complex and depends strongly on the specific accelerator mode of operation. For the straightforward case of a storage ring with repetitive particle bunches spaced by time interval ''T<sub>b</sub>'' and a bunch length much shorter than the wavelength of a given mode, the long term steady state wakefield voltage presented to the beam by the mode is given by<ref name=pwilson/> +:<math> V_{ss \ wake} = V_{wake} \left( \frac{1} {1 - e^{-\tau} e^{j\delta}} - \frac{1}{2} \right) </math> , +where: +:<math> \tau = \frac{\omega T_b}{2 Q_L} </math> is the decay of the wakefield between bunches, and +:''&delta;'' is the phase shift of the wakefield mode between bunch passages through the cavity. + +As an example calculation, let the phase shift ''&delta;=0'', which would be close to the case for the TM<sub>01</sub> mode by design and unfortunately likely to occur for a few HOM's. 
Having ''&delta;=0'' (or an integer multiple of an RF mode's period, ''&delta;=n2&pi;'') gives the worst-case wakefield build-up, where successive bunches are maximally decelerated by previous bunches' wakefields and give up even more energy than with only their "self wake". Then, taking ''&omega;''&nbsp;= 2''&pi;''&nbsp;500&nbsp;MHz, ''T<sub>b</sub>''=1&nbsp;µs, and ''Q<sub>L</sub>''=10<sup>6</sup>, the buildup of wakefields would be ''V<sub>ss wake</sub>''=637×''V<sub>wake</sub>''. A pitfall for any accelerator cavity would be the presence of what is termed a "trapped mode". This is an HOM that does not leak out of the cavity and consequently has a ''Q<sub>L</sub>'' that can be orders of magnitude larger than used in this example. In this case, the buildup of wakefields of the trapped mode would likely cause a beam instability. The beam instability implications due to the ''V<sub>ss wake</sub>'' wakefields are thus addressed differently for the fundamental accelerating mode TM<sub>01</sub> and all other RF modes, as described next.
+
+===Fundamental accelerating mode TM<sub>01</sub>===
+The complex calculations treating wakefield-related beam stability for the TM<sub>01</sub> mode in accelerators show that there are specific regions of phase between the beam bunches and the driven RF mode that allow stable operation at the highest possible beam currents. At some point of increasing beam current, though, just about any accelerator configuration will become unstable. As pointed out above, the beam wakefield amplitude is proportional to the cavity parameter ''R''/''Q<sub>o</sub>'', so this is typically used as a comparative measure of the likelihood of TM<sub>01</sub>-related beam instabilities. A comparison of ''R''/''Q<sub>o</sub>'' and ''R'' for a 500&nbsp;MHz superconducting cavity and a 500&nbsp;MHz normal-conducting cavity is shown below. The accelerating voltage provided by both cavities is comparable for a given net power consumption when including refrigeration power for SRF. The ''R''/''Q<sub>o</sub>'' for the SRF cavity is 15 times less than the normal-conducting version, and thus less beam-instability susceptible. This is one of the main reasons such SRF cavities are chosen for use in high-current storage rings.
+
+[[Image:Cornell SRF vs NRF 1.png|frame|center|Comparison of superconducting and normal-conducting RF cavity shapes and their ''R''/''Q<sub>o</sub>''.]]
+
+===Higher order modes (HOMs)===
+[[Image:Cornell HOM Load 3.png|thumb|Photograph of the Cornell electron storage ring beamline HOM load.]]
+In addition to the fundamental accelerating TM<sub>01</sub> mode of an RF cavity, numerous higher frequency modes and a few lower-frequency dipole modes are excited by charged particle beam wakefields, all generally denoted higher order modes (HOMs). These modes serve no useful purpose for accelerator particle beam dynamics, only giving rise to beam instabilities, and are best heavily damped to have as low a ''Q<sub>L</sub>'' as possible. The damping is accomplished by preferentially allowing dipole and all HOMs to leak out of the SRF cavity, and then coupling them to resistive RF loads. The leaking out of undesired RF modes occurs along the beampipe, and results from a careful design of the cavity aperture shapes. The aperture shapes are tailored to keep the TM<sub>01</sub> mode "trapped" with high ''Q<sub>o</sub>'' inside of the cavity and allow HOMs to propagate away.
The propagation of HOMs is sometimes facilitated by having a fluted beampipe on one side of the cavity, as seen in the SRF cavity photograph at the top of this wiki page. The flutes present an effectively larger beampipe diameter to asymmetric RF modes, allowing them to easily propagate away from the cavity, while presenting an effectively small diameter to the axisymmetric TM<sub>01</sub> mode and hindering its propagation. + +The resistive load for HOMs can be implemented by having loop antennas located at apertures on the side of the beampipe, with coaxial lines routing the RF to outside of the cryostat to standard RF loads. Another approach is to place the HOM loads directly on the beampipe as hollow cylinders with RF lossy material attached to the interior surface, as shown in the image to the right. This "beamline load" approach can be more technically challenging, since the load must absorb high RF power while preserving a high-vacuum beamline environment in close proximity to a contamination-sensitive SRF cavity. Further, such loads must sometimes operate at cryogenic temperatures to avoid large thermal gradients along the beampipe from the cold SRF cavity. The benefit of the beamline HOM load configuration, however, is a greater absorptive bandwidth and HOM attenuation as compared to antenna coupling. This benefit can be the difference between a stable vs. an unstable particle beam for high current accelerators. + +==Cryogenics== +{{main | Cryomodule }} +A significant part of SRF technology is cryogenic engineering. The SRF cavities tend to be thin-walled structures immersed in a bath of liquid helium having temperature 1.6&nbsp;K to 4.5&nbsp;K. Careful engineering is then required to insulate the helium bath from the room-temperature external environment. This is accomplished by: +* A vacuum chamber surrounding the cold components to eliminate [[Convection|convective]] heat transfer by gases. +* [[Multi-layer insulation]] wrapped around cold components. This insulation is composed of dozens of alternating layers of aluminized mylar and thin fiberglass sheet, which reflects infrared radiation that shines through the vacuum insulation from the 300&nbsp;K exterior walls. +* Low [[thermal conductivity]] mechanical connections between the cold mass and the room temperature vacuum vessel. These connections are required, for example, to support the mass of the helium vessel inside the vacuum vessel and to connect the apertures in the SRF cavity to the accelerator beamline. Both types of connections transition from internal cryogenic temperatures to room temperature at the vacuum vessel boundary. The thermal conductivity of these parts is minimized by having small cross sectional area and being composed of low thermal conductivity material, such as stainless steel for the vacuum beampipe and fiber reinforced epoxies (G10) for mechanical support. The vacuum beampipe also requires good electrical conductivity on its interior surface to propagate the image currents of the beam, which is accomplished by about 100&nbsp;µm of copper plating on the interior surface. + +The major cryogenic engineering challenge is the refrigeration plant for the liquid helium. The small power that is dissipated in an SRF cavity and the heat leak to the vacuum vessel are both heat loads at very low temperature. The refrigerator must replenish this loss with an inherent poor efficiency, given by the product of the Carnot efficiency ''&eta;<sub>C</sub>'' and a "practical" efficiency ''&eta;<sub>p</sub>''. 
The Carnot efficiency derives from the [[second law of thermodynamics]] and can be quite low. It is given by + +:<math> \eta_C = +\begin{cases} + \frac{T_{cold}} {T_{warm} - T_{cold}}, & \mbox{if } T_{cold} < T_{warm} - T_{cold} \\ + 1, & \mbox{otherwise} +\end{cases} +</math> +where +:''T<sub>cold</sub>'' is the temperature of the cold load, which is the helium vessel in this case, and +:''T<sub>warm</sub>'' is the temperature of the refrigeration heat sink, usually room temperature. + +In most cases ''T<sub>warm</sub> =''300&nbsp;K, so for ''T<sub>cold</sub> &ge;''150&nbsp;K the Carnot efficiency is unity. The practical efficiency is a catch-all term that accounts for the many mechanical non-idealities that come into play in a refrigeration system aside from the fundamental physics of the Carnot efficiency. For a large refrigeration installation there is some economy of scale, and it is possible to achieve ''&eta;<sub>p</sub>'' in the range of 0.2&ndash;0.3. The [[wall-plug efficiency|wall-plug]] power consumed by the refrigerator is then +:<math> P_{warm} = \frac{P_{cold}} {\eta_C \ \eta_{p}} </math> , +where +:''P<sub>cold</sub>'' is the power dissipated at temperature ''T<sub>cold</sub>'' . + +As an example, if the refrigerator delivers 1.8&nbsp;K helium to the [[cryomodule]] where the cavity and heat leak dissipate ''P<sub>cold</sub>''=10&nbsp;W, then the refrigerator having ''T<sub>warm</sub>''=300&nbsp;K and ''&eta;<sub>p</sub>''=0.3 would have ''&eta;<sub>C</sub>''=0.006 and a wall-plug power of ''P<sub>warm</sub>''=5.5&nbsp;kW. Of course, most accelerator facilities have numerous SRF cavities, so the refrigeration plants can get to be very large installations. + +[[Image:He4 T vs P 1.png|frame|Plot of helium-4 temperature vs. pressure, with the superfluid &lambda; point indicated.]] +The temperature of operation of an SRF cavity is typically selected as a minimization of wall-plug power for the entire SRF system. The plot to the right then shows the pressure to which the helium vessel must be pumped to obtain the desired liquid helium temperature. Atmospheric pressure is 760&nbsp;[[Torr]] (101.325 kPa), corresponding to 4.2&nbsp;K helium. The superfluid ''&lambda;'' point occurs at about 38&nbsp;Torr (5.1 kPa), corresponding to 2.18&nbsp;K helium. Most SRF systems either operate at atmospheric pressure, 4.2&nbsp;K, or below the λ point at a system efficiency optimum usually around 1.8&nbsp;K, corresponding to about 12&nbsp;Torr (1.6 kPa). + +==References== +{{reflist|colwidth=30em}} + +[[Category:Accelerator physics]] +[[Category:Superconductivity]] + 1b0elhtfqn8evz4kriz54t72e0am5j8 + + + + Vienna rectifier + 0 + 24276 + + 24277 + 2013-03-20T10:19:31Z + + Addbot + 0 + + + [[User:Addbot|Bot:]] Migrating 1 interwiki links, now provided by [[Wikipedia:Wikidata|Wikidata]] on [[d:q386167]] + wikitext + text/x-wiki + {{Multiple issues| +{{orphan|date=February 2012}} +{{cleanup|date=June 2010}} +{{expert-subject|date=June 2010}} +}} + +The '''Vienna Rectifier''' is a [[pulse-width modulation]] rectifier, invented in 1993 by Prof. Johann W. Kolar.<ref>J. W. Kolar, „Dreiphasen-Dreipunkt-Pulsgleichrichter“, filed Dec. 23, 1993, File No.: A2612/93, European Patent Appl.: EP 94 120 245.9-1242 entitled “Vorrichtung und Verfahren zur Umformung von Drehstrom in Gleichstrom”.</ref> It provides: +* [[Three-phase]] three-level three-switch PWM [[rectifier]] with controlled output voltage.<ref>J. W. Kolar, F. C. 
Zach, “A Novel Three-Phase Utility Interface Minimizing Line Current Harmonics of High-Power Telecommunications Rectifier Modules”, Record of the 16th IEEE International Telecommunications Energy Conference, Vancouver, Canada, Oct. 30 - Nov. 3, pp. 367-374 (1994).</ref> +* Three-wire input, no connection to neutral. +* Ohmic mains behaviour {{Citation needed|date=June 2009}} +* Boost system (continuous input current). +* Unidirectional power flow.<ref>J. W. Kolar, H. Ertl, F. C. Zach, “Design and Experimental Investigation of a Three-Phase High Power Density High Efficiency Unity Power Factor PWM (Vienna) Rectifier Employing a Novel Integrated Power Semiconductor Module”, Proceedings of the 11th IEEE Applied Power Electronics Conference, San Jose (CA), USA, March 3–7, Vol.2, pp.514-523 (1998).</ref> +* High power density. +* Low conducted common-mode EMI emissions. +* Simple control to stabilize the neutral point potential.<ref>J. W. Kolar, U. Drofenik, F. C. Zach, “Space Vector Based Analysis of the Variation and Control of the Neutral Point Potential of Hysteresis Current Controlled Three-Phase/Switch/Level PWM Rectifier Systems”, Proceedings of the International Conference on Power Electronics and Drive Systems, Singapore, Feb.21-24, Vol.1, pp.22-33 (1995).</ref> +* Low complexity, low realization effort <ref>J. W. Kolar, H. Ertl, F. C. Zach, “Design and Experimental Investigation of a Three-Phase High Power Density High Efficiency Unity Power Factor PWM (Vienna) Rectifier Employing a Novel Integrated Power Semiconductor Module”, Proceedings of the 11th IEEE Applied Power Electronics Conference, San Jose (CA), USA, March 3–7, Vol.2, pp.514-523 (1998).</ref> +* Low switching losses.<ref>*Report “How to Design a 10kW Three-Phase AC/DC Interface Step by Step” at [http://www.gecko-research.com/downloadFreeTrial/FrontEndComparison_Part_1.html www.gecko-research.com]</ref> +* Reliable behaviour (guaranteeing ohmic mains behaviour) under heavily unbalanced mains voltages and in case of mains failure.<ref>J. W. Kolar, U. Drofenik, F. C. Zach, “Current Handling Capability of the Neutral Point of a Three-Phase/Switch/Level Boost-Type PWM (Vienna) Rectifier”, Proceedings of the 27th IEEE Power Electronics Specialists Conference, Baveno, Italy, June 24–27, Vol.II, pp.1329-1336 (1996).</ref> + +==Topology== +[[Image:Vienna rectifier schematic.jpg|thumb|150px|Fig. 1: Schematic of a Vienna Rectifier.]] +The Vienna Rectifier is a unidirectional three-phase three-switch three-level [[Pulse-width modulation]] (PWM) rectifier. It can be seen as a three-phase [[diode bridge]] with an integrated boost converter. + +==Applications== +[[Image:Vienna rectifier real.jpg|thumb|200px|Fig. 2: Top and bottom views of an air-cooled 10kW-Vienna Rectifier (400kHz PWM).]] +The Vienna Rectifier is useful wherever six-switch converters are used for achieving sinusoidal mains current and controlled output voltage, when no energy feedback from the load into the mains is required. In practice, use of the Vienna Rectifier is advantageous when space is at a sufficient premium to justify the additional hardware cost. These include: +* Telecommunications power supplies. +* [[Uninterruptable power supply|Uninterruptable power supplies]]. +* Input stages of AC-drive converter systems. +Figure 2 shows the top and bottom views of an air-cooled 10&nbsp;kW-Vienna Rectifier (400&nbsp;kHz PWM), with sinusoidal input current s and controlled output voltage. 
Dimensions are 250&nbsp;mm × 120&nbsp;mm × 40&nbsp;mm, resulting in a power density of 8.5&nbsp;kW/dm<sup>3</sup>. The total weight of the converter is 2.1&nbsp;kg.<ref>S. D. Round, P. Karutz, M. L. Heldwein, J. W. Kolar, “Towards a 30 kW/liter, Three-Phase Unity Power Factor Rectifier”, Proceedings of the 4th Power Conversion Conference (PCC'07), Nagoya, Japan, April 2–5, CD-ROM, ISBN 1-4244-0844-X, (2007).</ref>
+
+==Current and voltage waveforms==
+[[Image:vr wave.jpg|thumb|200px|Fig. 3: Time behaviour of the mains phase voltages ua, ub, uc and of the phase currents ia, ib, ic. From top to bottom: 1) mains voltages ua, ub, uc; 2) mains currents ia, ib, ic; 3) rectifier voltage uDaM (see Fig. 1), which forms the input current; 4) midpoint current of the output capacitors (i0 in Fig. 1); 5) voltage between the mains midpoint M and the output voltage midpoint 0. Note: the inner mains inductance is not considered, and the voltage across the [[filter capacitor]]s is therefore equal to the mains voltage.]]
+
+Figure 3 shows the system behavior, calculated using a power-electronics circuit simulator.<ref>[http://www.gecko-research.com www.gecko-research.com]</ref> Between the output voltage midpoint (0) and the mains midpoint (M) the common-mode voltage u0M appears, as is characteristic of three-phase converter systems.
+
+==Current control and balance of the neutral point at the DC-side==
+It is possible to separately control the input current shape in each branch of the diode bridge by inserting a bidirectional switch into the node, as shown in Figure 3. The switch Ta controls the current by controlling the magnetization of the inductor. Switching it on charges the inductor, which drives the current through the bidirectional switch. Deactivating the switch causes the current to bypass the switch and flow through the freewheeling diodes Da+ and Da-. This results in a negative voltage across the inductor, which demagnetizes it. This demonstrates the ability of the topology to control the current in phase with the mains voltage ([[Power factor correction|PFC]] capability).
+
+To generate a sinusoidal input current which is in phase with the voltage,
+<math> \underline{i}_D = G \cdot \underline{u}_C \approx G \cdot \underline{u}_1,</math>
+the average voltage space vector over a pulse period must satisfy
+<math> \underline{u}_D^* = \underline{u}_1-j\omega_1L_1\underline{i}_D.</math>
+For high switching frequencies or low inductances <math>L_1</math> this requires <math>\underline{u}_D^* \approx \underline{u}_1</math>.
+The available voltage space vectors required for the input voltage are defined by the switching states (s<sub>a</sub>, s<sub>b</sub>, s<sub>c</sub>) and the direction of the phase currents. For example, for <math>i_{Da}>0,\; i_{Db}, i_{Dc}<0</math>, i.e. for the phase range <math> \phi_1 = -30^\circ \ldots +30^\circ</math> of the mains period, the input current space vector is <math>\underline{i}_D \approx \underline{i}_1</math>. Fig. 4 shows the conduction states of the system, and from these we get the input space vectors shown in Fig. 5.<ref>iPES (Interactive Power Electronics Seminar): Java-Applet Animation of the Vienna Rectifier at [http://www.ipes.ethz.ch/ipes/2002Vienna1/vr1.html www.ipes.ee.ethz.ch]</ref>
+
+[[Image:Conduction states vr.jpg|thumb|400px|Fig. 5: Conduction states of the Vienna Rectifier, for ia>0, ib,ic<0, valid in a <math> 60^\circ</math> sector of the period T1.
+sa, sb and sc characterise the switching state of the system.
The arrows represent the physical direction and value of the current midpoint i0.]] + +==References== +{{Reflist}} + +[[Category:Electronic circuits]] +[[Category:Electrical power conversion]] +[[Category:Power electronics]] + 601m8lfp3rx7p36hm3kk20etvftfwld + + + + Modigliani–Miller theorem + 0 + 3624 + + 3625 + 2013-12-19T10:49:51Z + + 202.120.150.43 + + wikitext + text/x-wiki + {{Refimprove|date=January 2007}} +The '''Modigliani–Miller theorem''' (of [[Franco Modigliani]], [[Merton Miller]]) is a theorem on capital structure, arguably forming the basis for modern thinking on [[capital structure]]. The basic theorem states that, under a certain market price process (the classical [[random walk]]), in the absence of [[tax]]es, [[bankruptcy]] costs, agency costs, and [[asymmetric information]], and in an [[efficient market]], the value of a firm is unaffected by how that firm is financed.<ref>MIT Sloan Lecture Notes, Finance Theory II, Dirk Jenter, 2003</ref> It does not matter if the firm's capital is raised by issuing [[stock]] or selling debt. It does not matter what the firm's [[dividend policy]] is. Therefore, the Modigliani–Miller theorem is also often called the '''capital structure irrelevance principle'''. + +Modigliani was awarded the [[Nobel Prize in Economics#Laureates|1985 Nobel Prize in Economics]] for this and other contributions. + +Miller was a professor at the [[University of Chicago]] when he was awarded the 1990 Nobel Prize in Economics, along with [[Harry Markowitz]] and [[William Forsyth Sharpe|William Sharpe]], for their "work in the theory of financial economics," with Miller specifically cited for "fundamental contributions to the theory of corporate finance." + +==Historical background== +Miller and Modigliani derived the theorem and wrote their groundbreaking article when they were both professors at the [[Tepper School of Business|Graduate School of Industrial Administration (GSIA)]] of [[Carnegie Mellon University]]. The story goes that Miller and Modigliani were set to teach corporate finance for business students despite the fact that they had no prior experience in corporate finance. When they read the material that existed they found it inconsistent so they sat down together to try to figure it out. The result of this was the article in the ''American Economic Review'' and what has later been known as the M&M theorem. + +Miller and Modigliani published a number of follow-up papers discussing some of these issues. The theorem was first proposed by F. Modigliani and M. Miller in 1958. + +==The theorem== +Consider two firms which are identical except for their financial structures. The first (Firm U) is '''unlevered''': that is, it is financed by '''equity''' only. The other (Firm L) is levered: it is financed partly by equity, and partly by debt. The Modigliani–Miller theorem states that the value of the two firms is the same. + +==Without taxes== +===Proposition I=== +<math>V_U = V_L \,</math> + +where + +<math>V_U</math> ''is the value of an unlevered firm'' = price of buying a firm composed only of equity, and <math>V_L</math> ''is the value of a levered firm'' = price of buying a firm that is composed of some mix of debt and equity. Another word for levered is ''geared'', which has the same meaning.<ref>Arnold G. (2007)</ref> + +To see why this should be true, suppose an investor is considering buying one of the two firms U or L. 
Instead of purchasing the shares of the levered firm L, he could purchase the shares of firm U and borrow the same amount of money B that firm L does. The eventual returns to either of these investments would be the same. Therefore the price of L must be the same as the price of U minus the money borrowed B, which is the value of L's debt. + +This discussion also clarifies the role of some of the theorem's assumptions. We have implicitly assumed that the [[investor]]'s cost of borrowing money is the same as that of the firm, which need not be true in the presence of asymmetric information, in the absence of efficient markets, or if the investor has a different risk profile than the firm. + +===Proposition II=== +[[Image:MM2.png|frame|right|Proposition II with risky debt. As [[leverage (finance)|leverage]] ([[Debt to equity ratio|D/E]]) increases, the [[weighted average cost of capital|WACC]] (k0) stays constant.]] + +:<math>r_E = r_0 + \frac{D}{E}(r_0 - r_D)</math> + +where + +* <math>r_E</math> ''is the required rate of return on equity, or [[cost of equity]].'' +* <math>r_0</math> ''is the company unlevered [[cost of capital]] (ie assume no leverage).'' +* <math>r_D</math> ''is the required rate of return on borrowings, or [[cost of debt]].'' +* <math>\frac{D}{E}</math> ''is the [[debt-to-equity ratio]].'' + +A higher debt-to-equity ratio leads to a higher required return on equity, because of the higher risk involved for equity-holders in a company with debt. The formula is derived from the theory of [[weighted average cost of capital]] (WACC). + +These propositions are true under the following assumptions: +* no transaction costs exist, and +* individuals and corporations borrow at the same rates. + +These results might seem irrelevant (after all, none of the conditions are met in the real world), but the theorem is still taught and studied because it tells something very important. That is, [[capital structure]] matters precisely because one or more of these assumptions is violated. It tells where to look for determinants of optimal capital structure and how those factors might affect optimal capital structure. + +==With taxes== +===Proposition I=== + +:<math>V_L =V_U + T_C D\,</math> + +where + +* <math>V_L</math> ''is the value of a levered firm.'' +* <math>V_U</math> ''is the value of an unlevered firm.'' +* <math>T_C D</math> ''is the tax rate (<math>T_C</math>) x the value of debt (D)'' +* the term <math>T_C D</math> assumes debt is perpetual + +This means that there are advantages for firms to be levered, since corporations can deduct interest payments. Therefore leverage lowers [[tax]] payments. [[Dividend]] payments are non-deductible. + +===Proposition II=== +:<math>r_E = r_0 + \frac{D}{E}(r_0 - r_D)(1-T_C)</math> + +where: + +* <math>r_E</math> ''is the required rate of return on equity, or cost of levered equity = unlevered equity + financing premium.'' +* <math>r_0</math> ''is the company cost of equity capital with no leverage (unlevered cost of equity, or return on assets with D/E = 0).'' +* <math>r_D</math> ''is the required rate of return on borrowings, or [[cost of debt]].'' +* <math>{D}/{E}</math> ''is the debt-to-equity ratio.'' +* <math>T_c</math> ''is the tax rate.'' + +The same relationship as earlier described stating that the cost of equity rises with leverage, because the risk to equity rises, still holds. The formula, however, has implications for the difference with the [[Weighted average cost of capital|WACC]]. 
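+
+As a brief numerical illustration (the figures here are hypothetical and are not drawn from Modigliani and Miller), take <math>r_0 = 10\%</math>, <math>r_D = 5\%</math>, <math>D/E = 1</math> and <math>T_C = 30\%</math>. The no-tax Proposition II gives <math>r_E = 0.10 + 1\times(0.10-0.05) = 0.15</math> (15%), while the with-tax version gives
+
+:<math>r_E = 0.10 + 1\times(0.10-0.05)(1-0.30) = 0.135</math> (13.5%),
+
+so the tax shield on debt moderates, but does not remove, the rise in the required return on equity with leverage.
+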
Miller and Modigliani's second attempt at capital structure, which incorporated taxes, identified that as the level of gearing increases by replacing equity with cheap debt, the WACC falls, and an optimal capital structure does indeed exist at the point where debt is 100%.
+
+The following assumptions are made in the propositions with taxes:
+* corporations are taxed at the rate <math>T_C</math> on earnings after interest,
+* no transaction costs exist, and
+* individuals and corporations borrow at the same rate.
+
+==Notes==
+{{Reflist}}
+
+==Further reading==
+{{More footnotes|date=March 2009}}
+*{{cite book |title=Principles of Corporate Finance |last=Brealey |first=Richard A. |authorlink= |coauthors=Myers, Stewart C. |year=2008 |edition=9th |origyear=1981 |publisher=McGraw-Hill/Irwin |location=Boston |isbn=978-0-07-340510-0 |pages= }}
+*{{cite book |title=The Quest for Value: The EVA management guide |last=Stewart |first=G. Bennett |authorlink= |coauthors= |year=1991 |publisher=HarperBusiness |location=New York |isbn=0-88730-418-4 |pages= }}
+*{{cite journal | last = Modigliani | first = F. | authorlink = | coauthors = Miller, M. | year = 1958 | month = | title = The Cost of Capital, Corporation Finance and the Theory of Investment | journal = American Economic Review | volume = 48 | issue = 3 | pages = 261&ndash;297 | doi = | jstor = 1809766 | accessdate = | quote = }}
+*{{cite journal | last = Modigliani | first = F. | authorlink = | coauthors = Miller, M. | year = 1963 | month = | title = Corporate income taxes and the cost of capital: a correction | journal = American Economic Review | volume = 53 | issue = 3 | pages = 433&ndash;443 | doi = | jstor = 1809167 | accessdate = | quote = }}
+*{{cite journal | last = Miles | first = J. | authorlink = | coauthors = Ezzell, J. | year = 1980 | month = | title = The weighted average cost of capital, perfect capital markets and project life: a clarification | journal = Journal of Financial and Quantitative Analysis | volume = 15 | issue = | pages = 719&ndash;730 | id = | jstor = 2330405 | accessdate = | quote =| doi =10.2307/2330405 }}
+
+==External links==
+* [http://rdcohen.50megs.com/MMabstract.htm Ruben D Cohen: An Implication of the Modigliani-Miller Capital Structuring Theorems on the Relation between Equity and Debt]
+
+{{corporate finance and investment banking}}
+
+{{DEFAULTSORT:Modigliani-Miller Theorem}}
+[[Category:Capital (economics)]]
+[[Category:Economics theorems]]
+[[Category:Financial economics]]
+ g8pzcvghsji1ha1h4me8o3w9y3r1bzj
+
+
+
+ Proximity effect (superconductivity)
+ 0
+ 17516
+
+ 17517
+ 2013-03-17T05:49:12Z
+
+ Addbot
+ 0
+
+
+ [[User:Addbot|Bot:]] Migrating 1 interwiki links, now provided by [[Wikipedia:Wikidata|Wikidata]] on [[d:q7252874]]
+ wikitext
+ text/x-wiki
+ '''Proximity effect''' or '''Holm-Meissner effect''' is a term used in the field of [[superconductivity]] to describe phenomena that occur when a superconductor (S) is placed in contact with a "normal" (N) non-superconductor. Typically the [[critical temperature]] <math>T_{c}</math> of the superconductor is suppressed and signs of weak superconductivity are observed in the normal material over [[Mesoscopic_physics|mesoscopic]] distances. The proximity effect has been known since the pioneering work by R. Holm and W. Meissner.<ref name=Holm>{{Cite journal |author=Holm, R.; Meissner, W. |journal=Z.Physik.
|year=1932 |volume=74 |page=715}}</ref> They observed zero resistance in SNS pressed contacts, in which two superconducting metals are separated by a thin film of a non-superconducting (i.e. normal) metal. The discovery of the supercurrent in SNS contacts is sometimes mistakenly attributed to B. Josephson's 1962 work, yet the effect was known long before his publication and was understood as the proximity effect.<ref name=Meis>{{Cite journal |author=Meissner, H. |title=Superconductivity in contacts with interposed barriers|journal=Phys.Rev. |year=1960 |volume=117 |pages=672–680}}</ref>
+
+==Origin of the effect==
+Electrons in a [[superconductor]] in the superconducting state are ordered in a very different way than in a normal metal, i.e. they are paired into Cooper pairs. Furthermore, electrons in a material cannot be said to have a definite position because of the momentum-position [[Complementarity_(physics)|complementarity]]. In solid state physics one generally chooses a momentum space basis, and all electron states are filled with electrons up to the [[fermi surface]] in a metal, or up to the gap edge energy in the superconductor.
+
+Because of the nonlocality of the electrons in metals, the properties of those electrons cannot change infinitely quickly. In the case of a superconductor and a normal metal, we have the superconducting Cooper-paired-electron order in the superconductor, and the gapless filled-up-to-the-Fermi-surface electron order in the normal metal. If we bring the two together, the electron order in one system cannot change infinitely quickly (in real space) into the other order at the border. The paired state in the superconducting layer is carried over to the normal metal, where the pairing is destroyed by scattering events that cause the paired electrons (Cooper pairs) to lose coherence. For very clean metals like Cu it can be several hundreds of micrometers before the pairing is destroyed.
+
+Conversely, the (gapless) electron order present in the normal metal is also carried over to the superconductor, in that the superconducting gap is lowered near the interface.
+
+The microscopic model describing this behavior in terms of single-electron processes is called [[Andreev reflection]]. It describes how electrons in one material "pick up" the order of the layer they are proximate to, by taking into account which states are present in the other material to scatter from and effects such as interface transparency.
+
+==Overview==
+As a contact effect, the superconducting proximity effect (SPE) is closely related to thermoelectric phenomena like the [[Peltier effect]] or the formation of [[Semiconductors#P-N junctions|pn junctions]] in [[semiconductors]]. The proximity-effect enhancement of <math>T_c</math> is largest when the normal material is a metal with a large diffusivity rather than an insulator (I). Proximity-effect suppression of <math>T_c</math> in a superconductor is largest when the normal material is ferromagnetic, as the presence of the internal magnetic field weakens superconductivity ([[Cooper pairs]] breaking).
+
+==Research==
+The study of S/N, S/I and S/S' (S' is a superconductor with a lower <math>T_c</math>) bilayers and multilayers has been a particularly active area of SPE research. The behavior of the compound structure in the direction parallel to the interface differs from that perpendicular to the interface.
In [[type II superconductor]]s exposed to a magnetic field parallel to the interface, vortex defects will preferentially nucleate in the N or I layers and a discontinuity in behavior is observed when an increasing field forces them into the S layers. In type I superconductors, flux will similarly first penetrate N layers. Similar qualitative changes in behavior do not occur when a magnetic field is applied perpendicular to the S/I or S/N interface. In S/N and S/I multilayers at low temperatures, the long penetration depths and coherence lengths of the Cooper pairs will allow the S layers to maintain a mutual, three-dimensional quantum state. As temperature is increased, communication between the S layers is destroyed resulting in a crossover to two-dimensional behavior. The anisotropic behavior of S/N, S/I and S/S' bilayers and multilayers has served as a basis for understanding the far more complex critical field phenomena observed in the highly anisotropic cuprate [[High-temperature superconductivity|high-temperature superconductors]]. + +Recently the Holm-Meissner proximity effect was observed in [[graphene]] by the Morpurgo research group.<ref name=Morp>{{Cite journal |author=Heersche, H.B. et al. |title=Bipolar Supercurrent in Graphene|journal=Nature |year=2007 |volume=446 |pages=56–59|doi=10.1038/nature05555}}</ref> The experiments have been done on nanometer scale devices made of single graphene layers with superimposed superconducting electrodes made of 10 nm Ti and 70 nm Al films. Al is a superconductor, which is responsible for inducing superconductivity into graphene. The distance between the electrodes was in the range between 100 nm and 500 nm. The proximity effect is manifested by observations of a supercurrent, i.e. a current flowing through the graphene junction with zero voltage on the junction. By using the gate electrodes the researches have shown that the proximity effect occurs when the carriers in the graphene are electrons as well as when the carriers are holes. The critical current of the devices was above zero even at the Dirac point. + +==See also== +*[[Andreev reflection]] + +==References== +{{reflist}} +*''Superconductivity of Metals and Alloys'' by [[Pierre-Gilles de Gennes|P.G. de Gennes]], ISBN 0-201-40842-2, a textbook which devotes significant space to the superconducting proximity effect (called "boundary effect" in the book). +<br /> +[[Category:Superconductivity]] + 4976oki2zbjzimwxmw9mwjdx5xlci0t + + + + Multicritical point + 0 + 22537 + + 22538 + 2013-06-09T12:38:59Z + + 79.201.167.142 + + /* Tricritical Point and Multicritical Points of Higher Order */ + wikitext + text/x-wiki + Multicritical points are special points in the parameter space of thermodynamic or +other systems with a continuous [[phase transition]]. At least two thermodynamic or other +parameters must be adjusted to reach a multicritical point. At a multicritical point the +system belongs to a [[universality class]] different from the "normal" universality class. + +A more detailed definition requires concepts from the theory of [[critical phenomena]], +a branch of [[physics]] that reached a very satisfying state in the 1970s. + +== Definition == +The union of all the points of the parameter space for which the system is critical is +called a critical [[manifold]]. 
+ +[[Image:multicritical_end.png|thumb|right|A critical curve terminating at a multicritical point (schematic).]] + +As an example consider a substance [[ferromagnetic]] below a +transition temperature <math>T_{c}</math>, and paramagnetic above <math>T_c</math>. The parameter space here is +the temperature axis, and the critical manifold consists of the point <math>T_c</math>. Now add +hydrostatic pressure <math>P</math> to the parameter space. Under hydrostatic pressure the substance +normally still becomes ferromagnetic below a temperature <math>T_{c}</math>(<math>P</math>). + +This leads to a +critical curve in the (<math>T,P</math>) plane - a <math>1</math>-dimensional critical manifold. Also taking into account +shear stress <math>K</math> as a thermodynamic parameter leads to a critical surface <math>T_c</math>(<math>P,K</math>) in the +(<math>T,P,K</math>) parameter space - a <math>2</math>-dimensional critical manifold. +Critical manifolds of dimension <math>d > 1</math> and <math>d > 2</math> may have physically reachable borders of dimension +<math>d-1</math> which in turn may have borders of dimension <math>d-2</math>. The system still is critical at +these borders. However, criticality terminates for good reason, and the points on the +borders normally belong to another [[universality class]] than the [[universality class]] realized +within the critical manifold. All the points on the border of a critical manifold are +multicritical points. +Instead of terminating somewhere critical manifolds also may branch or intersect. +The points on the intersections or branch lines also are multicritical points. + +At least two parameters must be adjusted to reach a multicritical point. +A <math>2</math>-dimensional critical manifold may have two <math>1</math>-dimensional borders intersecting at a point. Two parameters must be adjusted to reach such a border, three parameters must be adjusted to reach the intersection of the two borders. A system of this type represents up to four universality classes: one within the critical manifold, two on the borders and one on the intersection of the borders. + +The gas-liquid critical point is not multicritical, because the phase transition at +the vapour pressure curve <math>P</math>(<math>T</math>) is discontinuous and the critical manifold thus consists of a single point. + +== Examples == +=== Tricritical Point and Multicritical Points of Higher Order=== +To reach a [[tricritical point]] the parameters must be tuned in such a way that the renormalized counterpart of the <math>\phi^4</math>-term of the Hamiltonian vanishes. A well-known experimental realization is found in the mixture of [[Helium-3]] and [[Helium-4]]. + +=== Lifshitz Point=== +To reach a Lifshitz point the parameters must be tuned in such a way that the renormalized counterpart of the <math>\left(\nabla\phi\right)^2</math>-term of the Hamiltonian vanishes. Consequently, at the Lifshitz point phases of uniform and modulated order meet the disordered phase. An experimental example is the [[magnet]] +MnP. A Lifshitz point is realized in a prototypical way in the [[ANNNI model]]. + +=== Lifshitz Tricritical Point=== +This multicritical point is simultaneously tricritical and Lifshitz. Three parameters must be adjusted to reach +a Lifshitz tricritical point. Such a point has been discussed to occur in non-[[stochiometric]] ferroelectrics. + +== Renormalization Group == +The [[renormalization group]] provides a detailed and quantitative explanation of critical phenomena. 
+ +[[Category:Critical phenomena| ]] +[[Category:Renormalization group]] + 90r7501omd6a00wp65xo7ina6auzgqf + + + + Jenkins–Traub algorithm + 0 + 17496 + + 17497 + 2013-12-25T11:47:43Z + + LutzL + 0 + + One root at a time for the complex variant, deflation. + wikitext + text/x-wiki + The '''Jenkins–Traub algorithm for polynomial zeros''' is a fast globally convergent iterative method published in 1970 by [[Michael A. Jenkins]] and [[Joseph F. Traub]]. They gave two variants, one for general polynomials with complex coefficients, commonly known as the "CPOLY" algorithm, and a more complicated variant for the special case of polynomials with real coefficients, commonly known as the "RPOLY" algorithm. The latter is "practically a standard in black-box polynomial root-finders".<ref>Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (2007), Numerical Recipes: The Art of Scientific Computing, 3rd ed., Cambridge University Press, page 470.</ref> + +This article describes the complex variant. Given a polynomial ''P'', + +:<math>P(z)=\sum_{i=0}^na_iz^{n-i}, \quad a_0=1,\quad a_n\ne 0</math> + +with complex coefficients it computes approximations to the ''n'' zeros <math>\alpha_1,\alpha_2,\dots,\alpha_n</math> of ''P''(''z''), one at a time in roughly increasing order of magnitude. After each root is computed, its linear factor is removed from the polynomial. Using this ''deflation'' guarantees that each root is computed only once and that all roots are found. + +The real variant follows the same pattern, but computes two roots at a time, either two real roots or a pair of conjugate complex roots. By avoiding complex arithmetic, the real variant can be faster (by a factor of 4) than the complex variant. The Jenkins–Traub algorithm has stimulated considerable research on theory and software for methods of this type. + +==Overview== +The Jenkins–Traub algorithm calculates all of the roots of a [[polynomial]] with complex coefficients. The algorithm starts by checking the polynomial for the occurrence of very large or very small roots. If necessary, the coefficients are rescaled by a rescaling of the variable. In the algorithm proper, roots are found one by one and generally in increasing size. After each root is found, the polynomial is deflated by dividing off the corresponding linear factor. Indeed, the factorization of the polynomial into the linear factor and the remaining deflated polynomial is already a result of the root-finding procedure. The root-finding procedure has three stages that correspond to different variants of the [[inverse power iteration]]. See Jenkins and [[Joseph F Traub|Traub]].<ref>Jenkins, M. A. and Traub, J. F. (1970), [http://www.springerlink.com/content/q6w17w30035r2152/?p=ae17d723839045be82d270b45363625f&pi=1 A Three-Stage Variables-Shift Iteration for Polynomial Zeros and Its Relation to Generalized Rayleigh Iteration], Numer. Math. 14, 252–263.</ref> +A description can also be found in Ralston and [[Philip Rabinowitz (mathematician)| +Rabinowitz]]<ref>Ralston, A. and Rabinowitz, P. (1978), A First Course in Numerical Analysis, 2nd ed., McGraw-Hill, New York.</ref> p.&nbsp;383. +The algorithm is similar in spirit to the two-stage algorithm studied by Traub.<ref>Traub, J. F. (1966), [http://links.jstor.org/sici?sici=0025-5718(196601)20%3A93%3C113%3AACOGCI%3E2.0.CO%3B2-3 A Class of Globally Convergent Iteration Functions for the Solution of Polynomial Equations], Math. 
Comp., 20(93), 113–138.</ref>
+
+=== Root-finding procedure ===
+
+Starting with the current polynomial ''P''(''X'') of degree ''n'', the smallest root of ''P''(''X'') is computed. To that end, a sequence of so-called ''H'' polynomials is constructed. These polynomials are all of degree ''n''&nbsp;&minus;&nbsp;1 and are supposed to converge to the factor of ''P''(''X'') containing all the remaining roots. The sequence of ''H'' polynomials occurs in two variants, an unnormalized variant that allows easy theoretical insights and a normalized variant of <math>\bar H</math> polynomials that keeps the coefficients in a numerically sensible range.
+
+The construction of the ''H'' polynomials <math>\left(H^{(\lambda)}(z)\right)_{\lambda=0,1,2,\dots}</math> depends on a sequence of complex numbers <math>(s_\lambda)_{\lambda=0,1,2,\dots}</math> called shifts. These shifts themselves depend, at least in the third stage, on the previous ''H'' polynomials. The ''H'' polynomials are defined as the solution to the implicit recursion
+:<math>
+ H^{(0)}(z)=P^\prime(z)
+</math> and <math>
+ (X-s_\lambda)\cdot H^{(\lambda+1)}(X)\equiv H^{(\lambda)}(X)\pmod{P(X)}\ .
+</math>
+A direct solution to this implicit equation is
+:<math>
+ H^{(\lambda+1)}(X)
+ =\frac1{X-s_\lambda}\cdot
+ \left(
+ H^{(\lambda)}(X)-\frac{H^{(\lambda)}(s_\lambda)}{P(s_\lambda)}P(X)
+ \right)\,,
+</math>
+where the polynomial division is exact.
+
+Algorithmically, one would use for instance the [[Horner scheme]] or [[Ruffini rule]] to evaluate the polynomials at <math>s_\lambda</math> and obtain the quotients at the same time. With the resulting quotients ''p''(''X'') and ''h''(''X'') as intermediate results, the next ''H'' polynomial is obtained as
+:<math>
+\left.\begin{align}
+P(X)&=p(X)\cdot(X-s_\lambda)+P(s_\lambda)\\
+H^{(\lambda)}(X)&=h(X)\cdot(X-s_\lambda)+H^{(\lambda)}(s_\lambda)\\
+\end{align}\right\}
+\implies H^{(\lambda+1)}(z)=h(z)-\frac{H^{(\lambda)}(s_\lambda)}{P(s_\lambda)}p(z).
+</math>
+Since the highest degree coefficient is obtained from ''P''(''X''), the leading coefficient of <math>H^{(\lambda+1)}(X)</math> is <math>-\tfrac{H^{(\lambda)}(s_\lambda)}{P(s_\lambda)}</math>. If this is divided out, the normalized ''H'' polynomial is
+:<math>\begin{align}
+ \bar H^{(\lambda+1)}(X)
+ &=\frac1{X-s_\lambda}\cdot
+ \left(
+ P(X)-\frac{P(s_\lambda)}{H^{(\lambda)}(s_\lambda)}H^{(\lambda)}(X)
+ \right)\\[1em]
+ &=\frac1{X-s_\lambda}\cdot
+ \left(
+ P(X)-\frac{P(s_\lambda)}{\bar H^{(\lambda)}(s_\lambda)}\bar H^{(\lambda)}(X)
+ \right)\,.\end{align}
+</math>
+
+==== Stage one: no-shift process ====
+For <math>\lambda=0,1,\dots, M-1</math> set <math>s_\lambda=0</math>. Usually ''M=5'' is chosen for polynomials of moderate degrees up to ''n''&nbsp;=&nbsp;50. This stage is not necessary from theoretical considerations alone, but is useful in practice. It emphasizes in the ''H'' polynomials the cofactor (of the linear factor) of the smallest root.
+
+==== Stage two: fixed-shift process ====
+The shift for this stage is determined as some point close to the smallest root of the polynomial. It is quasi-randomly located on the circle with the inner root radius, which in turn is estimated as the positive solution of the equation
+:<math>
+R^n+|a_{1}|\,R^{n-1}+\dots+|a_{n-1}|\,R=|a_n|\,.
+</math>
+Since the left side is a convex function and increases monotonically from zero to infinity, this equation is easy to solve, for instance by [[Newton's method]].
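+
+The following short sketch (a Python illustration of this one step only, not the published CPOLY/RPOLY code; the function name, starting guess and tolerances are assumptions made here) solves the above equation for the Stage-2 shift radius with Newton's method:
+
+<syntaxhighlight lang="python">
+def inner_root_radius(a, tol=1e-12, max_iter=100):
+    """Positive solution R of |a_0| R^n + |a_1| R^(n-1) + ... + |a_{n-1}| R = |a_n|,
+    a lower bound on the moduli of the roots of P(z) = sum_i a_i z^(n-i)."""
+    n = len(a) - 1
+    c = [abs(x) for x in a]
+    # Left-hand side minus right-hand side; increasing and convex for r > 0.
+    f = lambda r: sum(c[i] * r ** (n - i) for i in range(n)) - c[n]
+    df = lambda r: sum((n - i) * c[i] * r ** (n - i - 1) for i in range(n))
+    r = c[n] ** (1.0 / n)          # rough starting guess
+    for _ in range(max_iter):
+        step = f(r) / df(r)
+        r -= step
+        if abs(step) < tol * max(r, 1.0):
+            break
+    return r
+
+# P(z) = z^2 - 3z + 2 has roots 1 and 2; the estimate is a lower bound for both.
+print(inner_root_radius([1, -3, 2]))   # ~0.5616
+</syntaxhighlight>
+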
+ +Now choose <math>s=R\cdot \exp(i\,\phi_\text{random})</math> on the circle of this radius. The sequence of polynomials <math>H^{(\lambda+1)}(z)</math>, <math>\lambda=M,M+1,\dots,L-1</math>, is generated with the fixed shift value <math>s_\lambda=s</math>. During this iteration, the current approximation for the root +:<math>t_\lambda=s-\frac{P(s)}{\bar H^{(\lambda)}(s)}</math> +is traced. The second stage is finished successfully if the conditions +:<math> + |t_{\lambda+1}-t_\lambda|<\tfrac12\,|t_\lambda| +</math> and <math> + |t_\lambda-t_{\lambda-1}|<\tfrac12\,|t_{\lambda-1}| +</math> +are simultaneously met. If there was no success after some number of iterations, a different random point on the circle is tried. Typically one uses a number of 9 iterations for polynomials of moderate degree, with a doubling strategy for the case of multiple failures. + +==== Stage three: variable-shift process ==== +The <math>H^{(\lambda+1)}(X)</math> are now generated using the variable shifts <math>s_{\lambda},\quad\lambda=L,L+1,\dots</math> which are generated by +:<math>s_L=t_L=s- \frac{P(s)}{\bar H^{(\lambda)}(s)}</math> +being the last root estimate of the second stage and +:<math>s_{\lambda+1}=s_\lambda- \frac{P(s_\lambda)}{\bar H^{(\lambda+1)}(s_\lambda)}, \quad \lambda=L,L+1,\dots,</math> +:where <math>\bar H^{(\lambda+1)}(z)</math> is the normalized ''H'' polynomial, that is <math>H^{(\lambda)}(z)</math> divided by its leading coefficient. + +If the step size in stage three does not fall fast enough to zero, then stage two is restarted using a different random point. If this does not succeed after a small number of restarts, the number of steps in stage two is doubled. + +==== Convergence ==== +It can be shown that, provided ''L'' is chosen sufficiently large, ''s''<sub>λ</sub> always converges to a root of ''P''. + +The algorithm converges for any distribution of roots, but may fail to find all roots of the polynomial. Furthermore, the convergence is slightly faster than the [[Rate of convergence|quadratic convergence]] of Newton–Raphson iteration, however, it uses at least twice as many operations per step. + +==What gives the algorithm its power?== +Compare with the [[Newton–Raphson iteration]] + +:<math>z_{i+1}=z_i - \frac{P(z_i)}{P^{\prime}(z_i)}.</math> + +The iteration uses the given ''P'' and <math>\scriptstyle P^{\prime}</math>. In contrast the third-stage of Jenkins–Traub + +:<math> +s_{\lambda+1} + =s_\lambda- \frac{P(s_\lambda)}{\bar H^{\lambda+1}(s_\lambda)} + =s_\lambda-\frac{W^\lambda(s_\lambda)}{(W^\lambda)'(s_\lambda)} +</math> + +is precisely a Newton–Raphson iteration performed on certain [[rational functions]]. More precisely, Newton–Raphson is being performed on a sequence of rational functions + +:<math>W^\lambda(z)=\frac{P(z)}{H^\lambda(z)}.</math> + +For <math>\lambda</math> sufficiently large, + +:<math>\frac{P(z)}{\bar H^{\lambda}(z)}=W^\lambda(z)\,LC(H^{\lambda})</math> + +is as close as desired to a first degree polynomial + +:<math>z-\alpha_1, \,</math> + +where <math>\alpha_1</math> is one of the zeros of <math>P</math>. Even though Stage 3 is precisely a Newton–Raphson iteration, differentiation is not performed. + +=== Analysis of the ''H'' polynomials === +Let <math>\alpha_1,\dots,\alpha_n</math> be the roots of ''P''(''X''). 
The so-called Lagrange factors of ''P(X)'' are the cofactors of these roots, +:<math>P_m(X)=\frac{P(X)-P(\alpha_m)}{X-\alpha_m}.</math> +If all roots are different, then the Lagrange factors form a basis of the space of polynomials of degree at most ''n''&nbsp;&minus;&nbsp;1. By analysis of the recursion procedure one finds that the ''H'' polynomials have the coordinate representation +:<math> +H^{(\lambda)}(X) + =\sum_{m=1}^n + \left[ + \prod_{\kappa=0}^{\lambda-1}(\alpha_m-s_\kappa) + \right]^{-1}\,P_m(X)\ . +</math> +Each Lagrange factor has leading coefficient 1, so that the leading coefficient of the H polynomials is the sum of the coefficients. The normalized H polynomials are thus +:<math> +\bar H^{(\lambda)}(X) + =\frac{\sum_{m=1}^n + \left[ + \prod_{\kappa=0}^{\lambda-1}(\alpha_m-s_\kappa) + \right]^{-1}\,P_m(X) + }{ + \sum_{m=1}^n + \left[ + \prod_{\kappa=0}^{\lambda-1}(\alpha_m-s_\kappa) + \right]^{-1} + } +=\frac{P_1(X)+\sum_{m=2}^n + \left[ + \prod_{\kappa=0}^{\lambda-1}\frac{\alpha_1-s_\kappa}{\alpha_m-s_\kappa} + \right]\,P_m(X) + }{ + 1+\sum_{m=1}^n + \left[ + \prod_{\kappa=0}^{\lambda-1}\frac{\alpha_1-s_\kappa}{\alpha_m-s_\kappa} + \right] + }\ . +</math> + +=== Convergence orders === +If the condition <math>|\alpha_1-s_\kappa|<\min{}_{m=2,3,\dots,n}|\alpha_m-s_\kappa|</math> holds for almost all iterates, the normalized H polynomials will converge at least geometrically towards <math>P_1(X)</math>. + +Under the condition that +:<math>|\alpha_1|<|\alpha_2|=\min{}_{m=2,3,\dots,n}|\alpha_m|</math> +one gets the aymptotic estimates for +*stage 1: +*:<math> + H^{(\lambda)}(X) + =P_1(X)+O\left(\left|\frac{\alpha_1}{\alpha_2}\right|^\lambda\right). +</math> +*for stage 2, if ''s'' is close enough to <math>\alpha_1</math>: +*:<math> + H^{(\lambda)}(X) + =P_1(X) + +O\left( + \left|\frac{\alpha_1}{\alpha_2}\right|^M + \cdot + \left|\frac{\alpha_1-s}{\alpha_2-s}\right|^{\lambda-M}\right) +</math> +*:and +*:<math> + s-\frac{P(s)}{\bar H^{(\lambda)}(s)} + =\alpha_1+O\left(\ldots\cdot|\alpha_1-s|\right).</math> +*and for stage 3: +*:<math> + H^{(\lambda)}(X) + =P_1(X) + +O\left(\prod_{\kappa=0}^{\lambda-1} + \left|\frac{\alpha_1-s_\kappa}{\alpha_2-s_\kappa}\right| + \right) +</math> +*:and +*:<math> + s_{\lambda+1}= + s_\lambda-\frac{P(s)}{\bar H^{(\lambda+1)}(s_\lambda)} + =\alpha_1+O\left(\prod_{\kappa=0}^{\lambda-1} + \left|\frac{\alpha_1-s_\kappa}{\alpha_2-s_\kappa}\right| + \cdot + \frac{|\alpha_1-s_\lambda|^2}{|\alpha_2-s_\lambda|} + \right) +</math> +:giving rise to a higher than quadratic convergence order of <math>\phi^2=1+\phi\approx 2.61</math>, where <math>\phi=\tfrac12(1+\sqrt5)</math> is the [[golden ratio]]. + +=== Interpretation as inverse power iteration === +All stages of the Jenkins–Traub complex algorithm may be represented as the linear algebra problem of determining the eigenvalues of a special matrix. This matrix is the coordinate representation of a linear map in the ''n''-dimensional space of polynomials of degree ''n''&nbsp;&minus;&nbsp;1 or less. 
The principal idea of this map is to interpret the factorization
+:<math>P(X)=(X-\alpha_1)\cdot P_1(X)</math>
+with a root <math>\alpha_1\in\C</math> and <math>P_1(X)=P(X)/(X-\alpha_1)</math> the remaining factor of degree ''n''&nbsp;&minus;&nbsp;1 as the eigenvector equation for the multiplication with the variable ''X'', followed by remainder computation with divisor ''P''(''X''),
+:<math>M_X(H)=(X\cdot H(X)) \bmod P(X)\,.</math>
+This maps polynomials of degree at most ''n''&nbsp;&minus;&nbsp;1 to polynomials of degree at most ''n''&nbsp;&minus;&nbsp;1. The eigenvalues of this map are the roots of ''P''(''X''), since the eigenvector equation reads
+:<math>0=(M_X-\alpha\cdot id)(H)=((X-\alpha)\cdot H) \bmod P\,,</math>
+which implies that <math>(X-\alpha)\cdot H(X)=C\cdot P(X)</math>, that is, <math>(X-\alpha)</math> is a linear factor of ''P''(''X''). In the monomial basis the linear map <math>M_X</math> is represented by a [[companion matrix]] of the polynomial ''P'', as
+:<math> M_X(H)=\sum_{m=1}^{n-1}(H_{m-1}-P_{m}H_{n-1})X^m-P_0H_{n-1}\,,</math>
+the resulting coefficient matrix is
+:<math>A=\begin{pmatrix}
+0 & 0 & \dots & 0 & -P_0 \\
+1 & 0 & \dots & 0 & -P_1 \\
+0 & 1 & \dots & 0 & -P_2 \\
+\vdots & \vdots & \ddots & \vdots & \vdots \\
+0 & 0 & \dots & 1 & -P_{n-1}
+\end{pmatrix}\,.</math>
+To this matrix the [[inverse power iteration]] is applied in the three variants of no shift, constant shift and generalized Rayleigh shift in the three stages of the algorithm. It is more efficient to perform the linear algebra operations in polynomial arithmetic and not by matrix operations; however, the properties of the inverse power iteration remain the same.
+
+==Real coefficients==
+The Jenkins–Traub algorithm described earlier works for polynomials with complex coefficients. The same authors also created a three-stage algorithm for polynomials with real coefficients. See Jenkins and Traub [http://links.jstor.org/sici?sici=0036-1429%28197012%297%3A4%3C545%3AATAFRP%3E2.0.CO%3B2-J A Three-Stage Algorithm for Real Polynomials Using Quadratic Iteration].<ref>Jenkins, M. A. and Traub, J. F. (1970), [http://links.jstor.org/sici?sici=0036-1429%28197012%297%3A4%3C545%3AATAFRP%3E2.0.CO%3B2-J A Three-Stage Algorithm for Real Polynomials Using Quadratic Iteration], SIAM J. Numer. Anal., 7(4), 545–566.</ref> The algorithm finds either a linear or quadratic factor, working completely in real arithmetic. If the complex and real algorithms are applied to the same real polynomial, the real algorithm is about four times as fast. The real algorithm always converges and the rate of convergence is greater than second order.
+
+==A connection with the shifted QR algorithm==
+There is a surprising connection with the shifted QR algorithm for computing matrix eigenvalues. See Dekker and Traub [http://linkinghub.elsevier.com/retrieve/pii/0024379571900358 The shifted QR algorithm for Hermitian matrices].<ref>Dekker, T. J. and Traub, J. F. (1971), [http://linkinghub.elsevier.com/retrieve/pii/0024379571900358 The shifted QR algorithm for Hermitian matrices], Lin. Algebra Appl., 4(2), 137–154.</ref> Again the shifts may be viewed as Newton–Raphson iteration on a sequence of rational functions converging to a first degree polynomial.
+
+==Software and testing==
+The software for the Jenkins–Traub algorithm was published as Jenkins and Traub [http://portal.acm.org/citation.cfm?id=361262&coll=portal&dl=ACM Algorithm 419: Zeros of a Complex Polynomial].<ref>Jenkins, M. A. and Traub, J. F.
(1972), [http://portal.acm.org/citation.cfm?id=361262&coll=portal&dl=ACM Algorithm 419: Zeros of a Complex Polynomial], Comm. ACM, 15, 97–99.</ref> The software for the real algorithm was published as Jenkins [http://portal.acm.org/citation.cfm?id=355643&coll=ACM&dl=ACM Algorithm 493: Zeros of a Real Polynomial].<ref>Jenkins, M. A. (1975), [http://portal.acm.org/citation.cfm?id=355643&coll=ACM&dl=ACM Algorithm 493: Zeros of a Real Polynomial], ACM TOMS, 1, 178–189.</ref> + +The methods have been extensively tested by many people. As predicted they enjoy faster than quadratic convergence for all distributions of zeros. + +However there are polynomials which can cause loss of precision as illustrated by the following example. The polynomial has all its zeros lying on two half-circles of different radii. [[James H. Wilkinson|Wilkinson]] recommends that it is desirable for stable deflation that smaller zeros be computed first. The second-stage shifts are chosen so that the zeros on the smaller half circle are found first. After deflation the polynomial with the zeros on the half circle is known to be ill-conditioned if the degree is large; see Wilkinson,<ref>Wilkinson, J. H. (1963), Rounding Errors in Algebraic Processes, Prentice Hall, Englewood Cliffs, N.J.</ref> p.&nbsp;64. The original polynomial was of degree 60 and suffered severe deflation instability. + +==References== +{{reflist}} + +==External links== +*[http://math.fullerton.edu/mathews/n2003/jenkinstraub/JenkinsTraubBib/Links/JenkinsTraubBib_lnk_2.html Additional Bibliography for the Jenkins–Traub Method] +*[http://math.fullerton.edu/mathews/n2003/jenkinstraub/JenkinsTraubBib/Links/JenkinsTraubBib_lnk_1.html Internet Resources for the Jenkins–Traub Method] +*[http://www.hvks.com/Numerical/winsolve.html A free downloadable Windows application using the Jenkins–Traub Method for polynomials with real and complex coefficients] +*[http://www.novanumeric.com/samples.php?CalcName=Roots Online Calculator] Online Polynomial Calculator using the Jenkins Traub procedure + +{{DEFAULTSORT:Jenkins-Traub Algorithm}} +[[Category:Numerical analysis]] +[[Category:Root-finding algorithms]] + i9gsjwg9t1e6e6zgsfju6q1s10s8s7c + + + + Doomsday argument + 0 + 5625 + + 5626 + 2014-02-01T14:42:15Z + + 192.76.7.203 + + /* See also */ + wikitext + text/x-wiki + [[Image:Population curve.svg|thumb|350px|World population from 10,000 BC to 2000 AD]] + +The '''Doomsday argument''' ('''DA''') is a [[probability theory|probabilistic argument]] that claims to [[predict]] the number of future members of the [[human species]] given only an estimate of the total number of humans born so far. Simply put, it says that supposing the humans alive today are in a random place in the whole human history timeline, chances are we are about halfway through it. + +It was first proposed in an explicit way by the astrophysicist [[Brandon Carter]] in 1983,<ref>{{Cite journal + | author = [[Brandon Carter]] + | title = The anthropic principle and its implications for biological evolution + | journal = [[Philosophical Transactions of the Royal Society of London]] + | volume = A310 + | pages = 347&ndash;363 + | year = 1983 + | doi = 10.1098/rsta.1983.0096 + | last2 = McCrea + | first2 = W. H. + | issue = 1512 +}}</ref> from which it is sometimes called the '''Carter catastrophe'''; the argument was subsequently championed by the [[philosopher]] [[John A. Leslie]] and has since been independently discovered by [[J. Richard Gott]]<ref>{{Cite journal + | author = J. 
Richard Gott, III + | title = Implications of the Copernican principle for our future prospects + | journal = [[Nature (journal)|Nature]] + | volume = 363 + | pages = 315&ndash;319 + | year = 1993 + | doi = 10.1038/363315a0 + | issue = 6427 +}}</ref> and [[Holger Bech Nielsen]].<ref>{{Cite journal + | author = [[Holger Bech Nielsen]] + | title = Random dynamics and relations between the number of fermion generations and the fine structure constants + | journal = [[Acta Physica Polonica]] + | volume = B20 + | pages = 427&ndash;468 + | year = 1989 +}}</ref> Similar principles of [[eschatology]] were proposed earlier by [[Heinz von Foerster]], among others. + +Denoting by ''N'' the total number of humans who were ever or will ever be born, the [[Copernican principle]] suggests that humans are equally likely (along with the other ''N''&nbsp;−&nbsp;1 humans) to find themselves at any position ''n'' of the total population ''N'', so humans assume that our fractional position ''f''&nbsp;=&nbsp;''n''/''N'' is [[Uniform distribution (continuous)|uniformly distributed]] on the [[interval (mathematics)|interval]] <nowiki>[0,&nbsp;1]</nowiki> [[Prior probability|prior]] to learning our absolute position. + +''f'' is uniformly distributed on (0,&nbsp;1] even after learning of the absolute position ''n''. That is, for example, there is 95% chance that ''f'' is in the interval (0.05,&nbsp;1], that is ''f''&nbsp;>&nbsp;0.05. In other words we could assume that we could be 95% certain that we would be within the last 95% of all the humans ever to be born. If we know our absolute position ''n'', this implies{{Dubious|The article is misleading, and the argument itself is absurd|date=March 2009}} an upper bound for ''N'' obtained by rearranging ''n''/''N''&nbsp;>&nbsp;0.05 to give ''N''&nbsp;<&nbsp;20''n''. + +If Leslie's Figure<ref>http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.5899&rep=rep1&type=pdf</ref> is used, then 60 billion humans have been born so far, so it can be estimated that there is a 95% chance that the total number of humans ''N'' will be less than 20&nbsp;×&nbsp;60 billion&nbsp;=&nbsp;1.2 trillion. Assuming that the [[world population]] stabilizes [[World Population#Forecasts|at 10 billion]] and a [[life expectancy]] of [[Longevity#Future|80 years]], it can be estimated that the remaining 1,140 billion humans will be born in 9,120 years. Depending on the projection of world population in the forthcoming centuries, estimates may vary, but the main point of the argument is that it is unlikely that more than 1.2 trillion humans will ever live. This problem is similar to the famous [[German tank problem]]. + +==Aspects== + +===Remarks=== +* The step that converts ''N'' into an extinction time depends upon a finite human lifespan. If [[immortality]] becomes common, and the birth rate drops to zero, then the human race could continue forever even if the total number of humans ''N'' is finite. +* A precise formulation of the Doomsday Argument requires the [[Bayesian probability|Bayesian]] interpretation of probability +* Even among Bayesians some of the assumptions of the argument's logic would not be acceptable; for instance, the fact that it is applied to a temporal phenomenon (how long something lasts) means that ''N'''s distribution simultaneously represents an "[[aleatory probability]]" (as a future event), and an "[[epistemic probability]]" (as a decided value about which we are uncertain). 
+* The ''U'' <nowiki>(0,1]</nowiki> ''f'' distribution is derived from two choices, which despite being the default are also arbitrary: +** The [[principle of indifference]], so that it is as likely for any other randomly selected person to be born after you as before you. +** The ''assumption'' of no 'prior' knowledge on the distribution of ''N''. + +===Simplification: two possible total numbers of humans=== +Assume for simplicity that the total number of humans who will ever be born is 60 billion (''N''<sub>1</sub>), or 6,000 billion (''N''<sub>2</sub>).<ref>Doomsday argument two-case section is partially based on [http://www.findarticles.com/p/articles/mi_m2346/is_n426_v107/ai_20550244 a refutation of the Doomsday Argument by Korb and Oliver].</ref> If there is no prior knowledge of the position that a currently living individual, ''X'', has in the history of humanity, we may instead compute how many humans were born before ''X'', and arrive at (say) 59,854,795,447, which would roughly place ''X'' amongst the first 60 billion humans who have ever lived. + +Now, if we assume that the number of humans who will ever be born equals ''N''<sub>1</sub>, the probability that ''X'' is amongst the first 60 billion humans who have ever lived is of course 100%. However, if the number of humans who will ever be born equals ''N''<sub>2</sub>, then the probability that ''X'' is amongst the first 60 billion humans who have ever lived is only 1%. Since X is in fact amongst the first 60 billion humans who have ever lived, this means that the total number of humans who will ever be born is more likely to be much closer to 60 billion than to 6,000 billion. In essence the DA therefore suggests that [[human extinction]] is more likely to occur sooner rather than later. + +It is possible to sum the probabilities for each value of ''N'' and therefore to compute a statistical 'confidence limit' on ''N''. For example, taking the numbers above, it is 99% certain that ''N'' is smaller than 6,000 billion. + +Note that as remarked above, this argument assumes that the prior probability for ''N'' is flat, or 50% for ''N''<sub>1</sub> and 50% for ''N''<sub>2</sub> in the absence of any information about ''X''. On the other hand, it is possible to conclude, given ''X'', that ''N''<sub>2</sub> is more likely than ''N''<sub>1</sub>, if a different prior is used for ''N''. More precisely, Bayes' theorem tells us that P(''N''|''X'')=P(''X''|''N'')P(''N'')/P(''X''), and the conservative application of the Copernican principle tells us only how to calculate P(''X''|''N''). Taking P(''X'') to be flat, we still have to make an assumption about the prior probability P(''N'') that the total number of humans is ''N''. If we conclude that ''N''<sub>2</sub> is much more likely than ''N''<sub>1</sub> (for example, because producing a larger population takes more time, increasing the chance that a low-probability but cataclysmic natural event will take place in that time), then P(''X''|''N'') can become more heavily weighted towards the bigger value of ''N''. A further, more detailed discussion, as well as relevant distributions P(''N''), are given below in the [[Doomsday Argument#Rebuttals|Rebuttals]] section. + +===What the argument is not=== +The Doomsday argument (DA) does ''not'' say that humanity cannot or will not exist indefinitely. It does not put any upper limit on the number of humans that will ever exist, nor provide a date for when humanity will become [[extinct]]. 
+ +An abbreviated form of the argument ''does'' make these claims, by confusing probability with certainty. However, the actual DA's conclusion is: +:There is a 95% ''chance'' of extinction within 9,120 years. + +The DA gives a 5% chance that some humans will still be alive at the end of that period. (These dates are based on the assumptions above; the precise numbers vary among specific ''Doomsday arguments''.) + +==Variations== +This argument has generated a lively philosophical debate, and no consensus has yet emerged on its solution. The variants described below produce the DA by separate derivations. + +===Gott's formulation: 'vague prior' total population=== +Gott specifically proposes the functional form for the [[Prior probability|prior distribution]] of the number of people who will ever be born (''N''). Gott's DA used the [[prior probability#Uninformative priors|vague prior distribution]]: +:<math>P(N) = \frac{k}{N}</math>. +where +* P(N) is the probability prior to discovering ''n'', the total number of humans who have ''yet'' been born. +* The constant, ''k'', is chosen to [[Normalizing constant|normalize]] the sum of P(''N''). The value chosen isn't important here, just the functional form (this is an [[improper prior]], so no value of ''k'' gives a valid distribution, but [[Bayesian inference]] is still possible using it.) + +Since Gott specifies the [[Prior probability|prior]] distribution of total humans, ''P(N)'', [[Bayes's theorem]] and the [[principle of indifference]] alone give us ''P(N|n)'', the probability of ''N'' humans being born if ''n'' is a random draw from ''N'': + +:<math>P(N\mid n) = \frac{P(n\mid N) P(N)}{P(n)}.</math> + +This is Bayes's theorem for the [[posterior probability]] of total population exactly ''N'', [[conditioning (probability)|conditioned]] on current population exactly ''n''. Now, using the indifference principle: + +:<math>P(n\mid N) = \frac{1}{N}</math>. + +The unconditioned ''n'' distribution of the current population is identical to the vague prior ''N'' probability density function,<ref>The only [[probability density function]]s that must be specified ''[[A priori and a posteriori|a priori]]'' are: +* Pr(''N'') - the ultimate number of people that will be born, assumed by J. Richard Gott to have a vague prior distribution, Pr(''N'') = ''k''/''N'' +* Pr(''n''|''N'') - the chance of being born in any position based on a total population ''N'' - all DA forms assume the [[Copernican principle]], making Pr(''n''|''N'') = 1/''N'' + +From these two distributions, the Doomsday Argument proceeds to create a Bayesian inference on the distribution of ''N'' from ''n'', through [[Bayes' theorem#For probability densities|Bayes' rule]], which requires P(''n''); to produce this, integrate over all the possible values of ''N'' which might contain an individual born ''n''th (that is, wherever ''N'' > ''n''): + +:<math> P(n) = \int_{N=n}^{N=\infty} P(n\mid N) P(N) \,dN = \int_{n}^{\infty}\frac{k}{N^2} \,dN </math> <math>= \frac{k}{n}.</math> + +This is why the marginal distribution of n and N are identical in the case of P(''N'') = ''k''/''N' +</ref> so: + +:<math>P(n) = \frac{k}{n}</math>, + +giving P (''N'' | ''n'') for each specific ''N'' (through a substitution into the posterior probability equation): + +:<math>P(N\mid n) = \frac{n}{N^2}</math>. 
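+
+As a quick consistency check (an added remark, not part of Gott's published derivation), this posterior density is properly normalized when ''N'' is treated as continuous:
+
+:<math>\int_{N=n}^{\infty} \frac{n}{N^2}\,dN = \left[-\frac{n}{N}\right]_{N=n}^{\infty} = 1.</math>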
+
+The easiest way to produce the doomsday estimate with a given confidence (say 95%) is to pretend that ''N'' is a [[Continuous random variable|continuous variable]] (since it is very large) and [[Integral|integrate]] over the probability density from ''N'' = ''n'' to ''N'' = ''Z''. (This will give a function for the probability that ''N'' ≤ ''Z''):
+
+:<math>P(N \leq Z) = \int_{N=n}^{N=Z} P(N\mid n)\,dN = \frac{Z-n}{Z}</math>
+
+Defining ''Z'' = 20''n'' gives:
+
+:<math>P(N \leq 20n) = \frac{19}{20}</math>.
+
+This is the simplest [[Bayes factor|Bayesian]] derivation of the Doomsday Argument:
+:The chance that the total number of humans that will ever be born (''N'') is greater than twenty times the total that have already been born is below 5%.
+
+The use of a [[prior probability#Uninformative priors|vague prior]] distribution seems well-motivated as it assumes as little knowledge as possible about ''N'', given that any particular function must be chosen. It is equivalent to the assumption that the probability density of one's fractional position remains uniformly distributed even after learning of one's absolute position (''n'').
+
+Gott's 'reference class' in his original 1993 paper was not the number of births, but the number of years 'humans' had existed as a species, which he put [[Human evolution#H. sapiens|at 200,000]]. Also, Gott tried to give a 95% confidence interval between a ''minimum'' survival time and a maximum. Because of the 2.5% chance that he gives to underestimating the minimum, he has only a 2.5% chance of overestimating the maximum. This equates to 97.5% confidence that extinction occurs before the upper boundary of his confidence interval.
+
+A 97.5% confidence level leaves one chance in forty of being wrong, which can be used in the integral above with ''Z'' = 40''n'' and ''n'' = 200,000 years:
+
+:<math>P(N \leq 40[200000]) = \frac{39}{40}</math>
+
+This is how Gott produces a 97.5% confidence of extinction within ''N'' ≤ 8,000,000 years. The number he quoted was the likely time remaining, ''N''&nbsp;−&nbsp;''n'' = '''7.8 million years'''. This was much higher than the temporal confidence bound produced by counting births, because it applied the principle of indifference to time. (Producing different estimates by sampling different parameters in the same hypothesis is [[Bertrand's paradox (probability)|Bertrand's paradox]].)
+
+His choice of 95% confidence bounds (rather than 80% or 99.9%, say) matched the scientifically accepted limit of [[statistical significance]] for hypothesis rejection. Therefore, he argued that the [[hypothesis]]: “humanity will cease to exist before 5,100 years or thrive beyond 7.8 million years” can be rejected.
+
+Leslie's argument differs from Gott's version in that he does not assume a ''vague prior'' probability distribution for ''N''. Instead he argues that the force of the Doomsday Argument resides purely in the increased probability of an early Doomsday once you take into account your birth position, regardless of your prior probability distribution for ''N''. He calls this the ''probability shift''.
+
+[[Heinz von Foerster]] argued that humanity's abilities to construct societies, civilizations and technologies do not result in self-inhibition. Rather, societies' success varies directly with population size. Von Foerster found that this model fit some 25 data points from the birth of [[Jesus]] to 1958, with only 7% of the [[variance]] left unexplained.
Several follow-up letters (1961, 1962, …) were published in ''Science'' showing that von Foerster's equation was still on track. The data continued to fit up until 1973. The most remarkable thing about von Foerster's model was that it predicted that the human population would reach infinity, or a mathematical singularity, on Friday, November 13, 2026. In fact, von Foerster did not imply that the world population on that day could actually become infinite. The real implication was that the world population growth pattern followed for many centuries prior to 1960 was about to come to an end and be transformed into a radically different pattern. Note that this prediction began to be fulfilled just a few years after the "Doomsday" paper was published.<ref>See, for example, [http://urss.ru/cgi-bin/db.pl?cp=&page=Book&id=34250&lang=en&blang=en&list=38 ''Introduction to Social Macrodynamics''] by [[Andrey Korotayev]] ''et al.''</ref>
+
+==Reference classes==
+One of the major areas of Doomsday Argument debate is the [[reference class problem|reference class]] from which ''n'' is drawn, and of which ''N'' is the ultimate size. The 'standard' Doomsday Argument [[hypothesis]] doesn't spend very much time on this point, and simply says that the reference class is the number of 'humans'. Given that you are human, the Copernican principle could be applied to ask if you were born unusually early, but the grouping of 'human' has been widely challenged on [[Anthropology|practical]] and [[Philosophy|philosophical]] grounds. [[Nick Bostrom]] has argued that [[consciousness]] is (part of) the discriminator between what is in and what is out of the reference class, and that [[extraterrestrial intelligence]]s might affect the calculation dramatically.
+
+The following sub-sections relate to different suggested reference classes, each of which has had the standard Doomsday Argument applied to it.
+
+===Sampling only WMD-era humans===
+The [[Doomsday clock]] shows the expected time to nuclear [[Doomsday event|doomsday]] by the judgment of an [[Bulletin of the Atomic Scientists|expert board]], rather than a Bayesian model. If the twelve hours of the clock symbolize the lifespan of the human species, its current time of 11:54 implies that we are among the last 1% of people who will ever be born (i.e. that ''n'' > 0.99''N''). [[J. Richard Gott]]'s temporal version of the Doomsday argument (DA) would require very strong prior evidence to overcome the improbability of being born in such a [[Copernican principle|special]] time.
+:If the clock's doomsday estimate is correct, there is less than 1 chance in 100 of seeing it show such a late time in human history, if observed at a random time within that history.
+
+The [[Bulletin of the Atomic Scientists|scientists']] warning can be reconciled with the DA, however{{Citation needed|date=May 2009}}: The Doomsday clock specifically estimates the proximity of [[Nuclear weapon|atomic]] self-destruction—which has only been possible for sixty years.<ref>The clock first appeared in 1949, and the date on which humanity gained the power to destroy itself is debatable, but to simplify the argument the numbers here are based on an assumption of fifty years.</ref>
+If doomsday requires nuclear weaponry then the Doomsday Argument 'reference class' is: people contemporaneous with nuclear weapons. In this model, the number of people living through, or born after, [[Atomic bombings of Hiroshima and Nagasaki|Hiroshima]] is ''n'', and the number of people who ever will is ''N''. Applying [[J.
Richard Gott|Gott's]] DA to these variable definitions gives a 50% chance of doomsday within 50 years. + +:In this model, the clock's hands are so close to midnight because a [[conditional probability|condition]] of doomsday is living post-1945, a condition which applies now but not to the earlier 11 hours and 53 minutes of the clock's metaphorical human 'day'.{{citation needed|date=March 2013}} + +If your life is randomly selected from all lives lived under the shadow of the bomb, this simple model gives a 95% chance of doomsday within 1000 years. + +The scientists' recent use of moving the clock forward to warn of the dangers posed by [[global warming]] muddles this reasoning, however. + +===SSSA: Sampling from observer-moments=== +[[Nick Bostrom]], [[Anthropic principle#Anthropic bias and anthropic reasoning|considering observation selection effects]], has produced a Self-Sampling Assumption (SSA): "''that you should think of yourself as if you were a random observer from a suitable reference class''". If the 'reference class' is the set of humans to ever be born, this gives ''N'' < 20''n'' with 95% confidence (the standard Doomsday argument). However, he has [[Anthropic bias|refined]] this idea to apply to ''observer-moments'' rather than just observers. He has formalized this ([http://anthropic-principle.com/preprints/self-location.html] as: + +:The Strong Self-Sampling Assumption ('''SSSA'''): Each observer-moment should reason as if it were randomly selected from the class of all observer-moments in its reference class. + +If the minute in which you read this article is randomly selected from every minute in every human's lifespan then (with 95% confidence) this event has occurred after the first 5% of human observer-moments. If the mean lifespan in the future is twice the historic mean lifespan, this implies 95% confidence that ''N'' < 10''n'' (the average future human will account for twice the observer-moments of the average historic human). Therefore, the 95th percentile extinction-time estimate in this version is '''4560 years'''. + +==Rebuttals== +{{Tone|date=November 2010}} + +===We are in the earliest 5%, ''a priori''=== +If one agrees with the statistical methods, still disagreeing with the Doomsday argument (DA) implies that: +# The current generation of humans '''are''' within the first 5% of humans to be born. +# This is '''not''' purely a coincidence. + +Therefore, these rebuttals try to give reasons for believing that the currently living humans are some of the earliest beings. + +For instance, if one is a member of 50,000 people in a collaborative project, the Doomsday Argument implies a 95% chance that there will never be more than a million members of that project. This can be refuted if one's other characteristics are typical of the [[early adopter]]. The mainstream of potential users will prefer to be involved when the project is nearly complete. If one were to enjoy the project's incompleteness, it is already known that he or she is unusual, prior to the discovery of his or her early involvement. + +If one has measurable attributes that sets one apart from the typical long run user, the project DA can be refuted based on the fact that one could expect to be within the first 5% of members, ''a priori''. 
The analogy to the total-human-population form of the argument is: Confidence in a prediction of the [[probability distribution|distribution]] of human characteristics that places modern & historic humans outside the mainstream, implies that it is already known, before examining ''n'' that it is likely to be very early in ''N''. + +For example, if one is certain that 99% of humans who will ever live will be [[cyborg]]s, but that only a negligible fraction of humans who have been born to date are cyborgs, one could be equally certain that at least one hundred times as many people remain to be born as have been. + +[[Robin Hanson]]'s paper sums up these criticisms of the DA: +:"All else is not equal; we have good reasons for thinking we are not randomly selected humans from all who will ever live." + +Drawbacks of this rebuttal: + +# The question of how the confident prediction is derived. An uncannily [[prescient]] picture of humanity's statistical [[probability distribution|distribution]] is needed through all time, before humans can pronounce ourselves extreme members of that [[population]]. (In contrast, project pioneers have clearly distinct psychology from the mainstream.) +# If the majority of humans have characteristics that they do not share, some would argue that this is equivalent to the Doomsday argument, since ''people similar to those observing these matters'' will become extinct. + +===Critique: Human extinction is distant, ''a posteriori''=== +The [[a posteriori]] observation that [[extinction level event]]s are rare could be offered as evidence that the DA's predictions are implausible; typically, [[extinction]]s of a dominant [[species]] happens less often than once in a million years. Therefore, it is argued that [[Human extinction]] is unlikely within the next ten millennia. (Another [[probability theory|probabilistic argument]], drawing a different conclusion than the DA.) + +In Bayesian terms, this response to the DA says that our knowledge of history (or ability to prevent disaster) produces a prior marginal for ''N'' with a minimum value in the trillions. If ''N'' is distributed uniformly from 10<sup>12</sup> to 10<sup>13</sup>, for example, then the probability of ''N'' < 1,200 billion inferred from ''n'' = 60 billion will be extremely small. This is an equally impeccable Bayesian calculation, rejecting the [[Copernican principle]] on the grounds that we must be 'special observers' since there is no likely mechanism for humanity to go extinct within the next hundred thousand years. + +This response is accused of overlooking the [[Human extinction#Scientific accidents|technological threats to humanity's survival]], to which earlier life was not subject, and is specifically rejected by most of the DA's academic critics (arguably excepting [[Robin Hanson]]). + +In fact, many [[futurologists]] believe the empirical situation is worse than Gott's DA estimate. For instance, Sir [[Martin Rees]] believes that the technological dangers give an estimated human survival duration of ninety-five years (with [[Our Final Hour|50% confidence]].) Earlier prophets made similar predictions and were 'proven' wrong (e.g. on [[Doomsday argument#Sampling only WMD-era humans|surviving the nuclear arms race]]). It is possible that their estimates were accurate, and that their common image as alarmists is a [[survivorship bias]]. 
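+
+The effect of such an informative prior is easy to reproduce numerically. The following is a minimal sketch (illustrative only; the flat prior on [10<sup>12</sup>, 10<sup>13</sup>] and ''n'' = 60 billion are simply the figures quoted above) showing that only a few percent of the posterior mass falls below ''N'' = 1,200 billion, instead of the 95% assigned by the vague-prior DA:
+
+<syntaxhighlight lang="python">
+import math
+
+n = 60e9               # births to date, as above
+lo, hi = 1e12, 1e13    # prior support: N uniform on [10^12, 10^13]
+cut = 1.2e12           # the bound below which the standard DA places 95% of its mass
+
+# The posterior density is proportional to 1/N on [lo, hi] (flat prior times the
+# 1/N birth-rank likelihood), so tail masses are ratios of logarithms.
+p_doom_soon = math.log(cut / lo) / math.log(hi / lo)
+print(round(p_doom_soon, 3))   # about 0.08, versus 0.95 under the vague prior
+</syntaxhighlight>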
+
+===The prior ''N'' distribution may make ''n'' very uninformative===
+[[Robin Hanson]] argues that ''N'''s prior may be [[exponential distribution|exponentially distributed]]:
+:<math>N = \frac{e^{U(0, q]}}{c}</math>
+
+Here, ''c'' and ''q'' are constants. If ''q'' is large, then our 95% confidence upper bound is on the uniform draw, not the exponential value of ''N''.
+
+The best way to compare this with Gott's Bayesian argument is to flatten the distribution from the vague prior by having the probability fall off more slowly with ''N'' (than inverse proportionally). This corresponds to the idea that humanity's growth may be exponential in time with doomsday having a vague prior [[probability density function|pdf]] in ''time''. This would mean that ''N'', the last birth, would have a distribution looking like the following:
+
+:<math>\Pr(N) = \frac{k}{N^\alpha}, 0 < \alpha < 1.
+</math>
+
+This prior ''N'' distribution is all that is required (with the principle of indifference) to produce the inference of ''N'' from ''n'', and this is done in an identical way to the standard case, as described by Gott (equivalent to <math>\alpha</math> = 1 in this distribution):
+
+:<math> \Pr(n) = \int_{N=n}^{N=\infty} \Pr(n\mid N) \Pr(N) \,dN = \int_{n}^{\infty} \frac{k}{N^{(\alpha+1)}} \,dN = \frac{k}{{\alpha}n^{\alpha}}</math>
+
+Substituting into the posterior probability equation:
+
+:<math>\Pr(N\mid n) = \frac{{\alpha}n^{\alpha}}{N^{(1+\alpha)}}.</math>
+
+Integrating the probability of any ''N'' above ''xn'':
+
+:<math>\Pr(N > xn) = \int_{N=xn}^{N=\infty} \Pr(N\mid n)\,dN = \frac{1}{x^{\alpha}}.</math>
+
+For example, if ''x'' = 20, and <math>\alpha</math> = 0.5, this becomes:
+
+:<math>\Pr(N > 20n) = \frac{1}{\sqrt{20}} \simeq 22.3\%. </math>
+
+Therefore, with this prior, the chance of a trillion births is well over 20%, rather than the 5% chance given by the standard DA. If <math>\alpha</math> is reduced further by assuming a flatter prior ''N'' distribution, then the limits on ''N'' given by ''n'' become weaker. An <math>\alpha</math> of one reproduces Gott's calculation with a birth reference class, and <math>\alpha</math> around 0.5 could approximate his temporal confidence interval calculation (if the population were expanding exponentially). As <math>\alpha \to 0</math> (gets smaller) ''n'' becomes less and less [[uninformative prior|informative]] about ''N''. In the limit this distribution approaches an (unbounded) [[uniform distribution (continuous)|uniform distribution]], where all values of ''N'' are equally likely. This is Page et al.'s '''"Assumption 3"''', which they find few reasons to reject, ''a priori''. (Although all distributions with <math>\alpha \leq 1</math> are improper priors, this applies to Gott's vague-prior distribution also, and they can all be converted to produce [[improper integral|proper integrals]] by postulating a finite upper population limit.) Since the probability of reaching a population of size 2''N'' is usually thought of as the chance of reaching ''N'' multiplied by the survival probability from ''N'' to 2''N'', it seems that Pr(''N'') must be a [[monotonic function|monotonically]] decreasing function of ''N'', but this doesn't necessarily require an inverse proportionality.
+
+A prior distribution with a very low <math>\alpha</math> [[parameter]] makes the DA's ability to constrain the ultimate size of humanity very weak.
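+
+These closed forms are straightforward to check numerically, for example by drawing from the posterior and counting the tail. A minimal sketch (illustrative only; the value ''n'' = 60 billion and the use of NumPy are assumptions of the example, not part of the argument):
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+rng = np.random.default_rng(0)
+n = 60e9                            # births to date
+for alpha in (1.0, 0.5):
+    u = rng.uniform(size=1_000_000)
+    # Inverse-CDF draw from the posterior Pr(N <= t | n) = 1 - (n/t)^alpha
+    N = n / u ** (1.0 / alpha)
+    print(alpha, (N > 20 * n).mean(), 20.0 ** (-alpha))
+# alpha = 1 reproduces the standard 5% tail; alpha = 0.5 gives 1/sqrt(20), about 22%
+</syntaxhighlight>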
+ +===Infinite Expectation=== +Another objection to the Doomsday Argument is that the [[Expected value|expected]] total human population is actually [[Infinity|infinite]]. The calculation is as follows: + +:The total human population <var>N</var> = <var>n</var>/<var>f</var>, where <var>n</var> is the human population to date and <var>f</var> is our fractional position in the total. +:We assume that <var>f</var> is uniformly distributed on <nowiki>(0,1]</nowiki>. +: The expectation of <var>N</var> is <math> E(N) = \int_{0}^{1} {n \over f} \, df = n \ln (1) - n \ln (0) = + \infty .</math> + +This infinite expectation shows that, under the framework of the DA, humanity still has some chance of surviving an arbitrarily long time. + +For a similar example of counterintuitive infinite expectations, see the [[St. Petersburg paradox]]. + +===Self-Indication Assumption: The possibility of not existing at all=== +One objection is that the possibility of your existing at all depends on how many humans will ever exist (''N''). If this is a high number, then the possibility of your existing is higher than if only a few humans will ever exist. Since you do indeed exist, this is evidence that the number of humans that will ever exist is high. + +This objection, originally by [[Dennis Dieks]] (1992), is now known by [[Nick Bostrom]]'s name for it: the "[[Self-Indication Assumption]] objection". It can be shown that some [[Self-Indication Assumption|SIAs]] prevent any inference of ''N'' from ''n'' (the current population); for details of this argument from the Bayesian inference perspective see: [[Self-Indication Assumption Doomsday argument rebuttal]]. + +===Caves' rebuttal=== +The [[Bayesian inference|Bayesian]] argument by [[Carlton M. Caves]] says that the uniform distribution assumption is incompatible with the [[Copernican principle]], not a consequence of it. + +He gives a number of examples to argue that Gott's rule is implausible. For instance, he says, imagine stumbling into a birthday party, about which you know nothing: + +<blockquote>Your friendly enquiry about the age of the celebrant elicits the reply that she is celebrating her (''t''<sub>''p''</sub>&nbsp;=&nbsp;) 50th birthday. According to Gott, you can predict with 95% confidence that the woman will survive between [50]/39 = 1.28 years and 39[&times;50] = 1,950 years into the future. Since the wide range encompasses reasonable expectations regarding the woman's survival, it might not seem so bad, till one realizes that [Gott's rule] predicts that with probability 1/2 the woman will survive beyond 100 years old and with probability 1/3 beyond 150. Few of us would want to bet on the woman's survival using Gott's rule. ''(See Caves' online paper [[Doomsday argument#External links|below]].)''</blockquote> + +Although this example exposes a weakness in [[J. Richard Gott]]'s "Copernicus method" DA (that he does not specify when the "Copernicus method" can be applied) it is not precisely analogous with the [[Doomsday argument#Numerical estimates of the Doomsday argument|modern DA]]; [[Epistemology|epistemological]] refinements of Gott's argument by [[philosopher]]s such as [[Nick Bostrom]] specify that: +: Knowing the absolute birth rank (''n'') must give no information on the total population (''N''). + +Careful DA variants specified with this rule aren't shown implausible by Caves' "Old Lady" example above, because, the woman's age is given prior to the estimate of her lifespan. 
Since human age gives an estimate of survival time (via [[actuary|actuarial]] tables) Caves' Birthday party age-estimate could not fall into the class of DA problems defined with this proviso.
+
+To produce a comparable "Birthday party example" of the carefully specified Bayesian DA we would need to completely exclude all prior knowledge of likely human life spans; in principle this could be done (e.g.: hypothetical [[Amnesia chamber]]). However, this would remove the modified example from everyday experience. To keep it in the everyday realm the lady's age must be ''hidden'' prior to the survival estimate being made. (Although this is no longer exactly the DA, it is much more comparable to it.)
+
+Without knowing the lady's age, the DA reasoning produces a ''rule'' to convert the birthday (''n'') into a maximum lifespan with 50% confidence (''N''). Gott's [[Copernicus principle|Copernicus method]] rule is simply: Prob (''N'' < 2''n'') = 50%. How accurate would this estimate turn out to be? Western [[demographics]] are now fairly [[uniform]] across ages, so a random birthday (''n'') could be (very roughly) approximated by a U(0,''M''<nowiki>]</nowiki> draw where ''M'' is the maximum lifespan in the census. In this 'flat' model, everyone shares the same lifespan so ''N'' = ''M''. If ''n'' happens to be less than ''M''/2 then Gott's 2''n'' estimate of ''N'' will be under ''M'', its true figure. The other half of the time 2''n'' overestimates ''M'', and in this case (the one Caves highlights in his example) the subject will die before the 2''n'' estimate is reached. In this 'flat demographics' model Gott's 50% confidence figure is proven right 50% of the time.
+
+===Self-referencing doomsday argument rebuttal===
+{{Main|Self-referencing doomsday argument rebuttal}}
+Some philosophers have been bold enough to suggest that only people who have contemplated the Doomsday argument (DA) belong in the reference class '[[human]]'. If that is the appropriate reference class, [[Brandon Carter|Carter]] defied his own prediction when he first described the argument (to the [[Royal Society]]). A member present could have argued thus:
+
+<blockquote>Presently, only one person in the world understands the Doomsday argument, so by its own logic there is a 95% chance that it is a minor problem which will only ever interest twenty people, and I should ignore it.</blockquote>
+
+[[Jeff Dewynne]] and Professor [[Peter Landsberg]] suggested that this line of reasoning will create a [[paradox]] for the Doomsday argument:
+
+If a member did pass such a comment, it would indicate that they understood the DA sufficiently well that in fact 2 people could be considered to understand it, and thus there would be a 5% chance that 40 or more people would actually be interested. Also, of course, ignoring something because you only expect a small number of people to be interested in it is extremely short sighted—if this approach were to be taken, nothing new would ever be explored, if we assume no ''a priori'' knowledge of the nature of interest and attentional mechanisms.
+
+Additionally, because [[Brandon Carter|Carter]] did present and describe his argument, the people to whom he explained it inevitably contemplated the DA; the conclusion could then be drawn that in the moment of explanation [[Brandon Carter|Carter]] created the basis for his own prediction.
+ +===Conflation of future duration with total duration=== +A rebuttal by Ronald Pisaturo in 2009<ref>{{Cite journal + | author = Ronald Pisaturo + | title = Past Longevity as Evidence for the Future + | journal = [[Philosophy of Science (journal)|Philosophy of Science]] + | volume = 76 + | pages = 73&ndash;100 + | year = 2009 + | doi = 10.1086/599273 +}}</ref> argues that the Doomsday Argument conflates future duration and total duration. + +According to Pisaturo, the Doomsday Argument relies on the equivalent of this equation: + +:<math> P(H_{TS}|D_pX)/P(H_{TL}|D_pX) = [P(H_{FS}|X)/P(H_{FL}|X)] \cdot [P(D_p|H_{TS}X)/P(D_p|H_{TL}X)] </math>, +:where: +:''X'' = the prior information; +:''D<sub>p</sub>'' = the data that past duration is ''t<sub>p</sub>''; +:''H<sub>FS</sub>'' = the hypothesis that the future duration of the phenomenon will be short; +:''H<sub>FL</sub>'' = the hypothesis that the future duration of the phenomenon will be long; +:''H<sub>TS</sub>'' = the hypothesis that the ''total'' duration of the phenomenon will be short—i.e., that ''t<sub>t</sub>'', the phenomenon’s ''total'' longevity, = ''t<sub>TS</sub>''; +: ''H<sub>TL</sub>'' = the hypothesis that the ''total'' duration of the phenomenon will be long—i.e., that ''t<sub>t</sub>'', the phenomenon’s ''total'' longevity, = ''t<sub>TL</sub>'', with ''t<sub>TL</sub>'' > ''t<sub>TS</sub>''. +Pisaturo then observes: + +:Clearly, this is an invalid application of Bayes’ theorem, as it conflates future duration and total duration. + +Pisaturo takes numerical examples based on two possible corrections to this equation: considering only future durations, and considering only total durations. In both cases, he concludes that the Doomsday Argument’s claim, that there is a ‘Bayesian shift’ in favor of the shorter future duration, is fallacious. + +==Mathematics-free explanation by analogy== +Assume the human species is a car driver. The driver has encountered some bumps but no catastrophes, and the car ([[Earth]]) is still road-worthy. However, insurance is required. The cosmic insurer has not dealt with humanity before, and needs some basis on which to calculate the premium. According to the Doomsday Argument, the insurer merely need ask how long the car and driver have been on the road&mdash;currently at least 40,000 years without an "accident"&mdash;and use the response to calculate insurance based on a 50% chance that a fatal "accident" will occur inside that time period. + +Consider a hypothetical insurance company that tries to attract drivers with long accident-free histories not because they necessarily drive more safely than newly qualified drivers, but for statistical reasons: the hypothetical insurer estimates that each driver looks for insurance quotes every year, so that the time since the last [[accident]] is an evenly distributed random sample between accidents. The chance of being more than halfway through an evenly distributed random sample is one-half, and (ignoring old-age effects) if the driver is more than half way between accidents then he is closer to his next accident than his previous one. A driver who was accident-free for 10 years would be quoted a very low premium for this reason, but someone should not expect cheap insurance if he only passed his test two hours ago (equivalent to the accident-free record of the human species in relation to 40,000 years of [[geological time]].) 
+
+===Analogy to the estimated final score of a cricket batsman===
+A random in-progress [[cricket]] [[Test cricket|test match]] is sampled for a single piece of information: the current [[batsman]]'s run tally so far. If the batsman is dismissed (rather than declaring), what is the chance that he will end up with a score more than double his current total?
+: A ''rough'' [[empirical]] result is that the chance is half (on average).
+
+The '''Doomsday argument''' (DA) is that even if we were completely ignorant of the game we could make the same prediction, or profit by offering a bet paying [[odds]] of 2-to-3 on the batsman doubling his current score.
+
+Importantly, we can only offer the bet before the current score is given (this is necessary because the absolute value of the current score would give a cricket expert a lot of information about the chance of that tally doubling). It is necessary to be ignorant of the absolute run tally before making the prediction because this is linked to the likely total, but if the likely total and absolute value are ''not'' linked the survival prediction can be made ''after'' discovering the batter's current score. Analogously, the DA says that ''if the absolute number of humans born gives no information on the number that will be'', we can predict the species’ total number of births after discovering that 60 billion people have ever been born: with 50% confidence it is 120 billion people, so that there is better-chance-than-not that '''the last human birth will occur before the 23rd century'''.
+
+It is ''not'' true that the chance is half, ''whatever'' the number of runs currently scored; [[batting (baseball)|batting]] records give an empirical [[correlation]] between reaching a given score (50 say) and reaching any other, higher score (say 100). On the average, the chance of doubling the current tally may be half, but the chance of reaching a century having scored fifty is much lower than reaching ten from five. Thus, the ''absolute'' value of the score gives information about the likely final total the batsman will reach, beyond the “scale invariant”.<ref>The cricketing rationale for the lengthening of future survival time with current score is that batting is a test of skill that a high-scoring batsman has passed. Therefore, higher scores are correlated with better players who will then be more likely to continue scoring heavily. Historic batting records give a [[Prior probability|prior]] distribution that provides other useful data. In particular, we know the [[mean]] score across all players and matches. High and low [[Posterior probability|posterior]] information (the current score) only gives a weak indication of the player's skill, which is more strongly described by this ''prior'' mean. (This statistical phenomenon of [[Prior probability#Informative priors|informative]] averages is called [[Regression toward the mean]].)</ref>
+
+An analogous Bayesian critique of the DA is that it somehow possessed [[Prior probability|prior]] knowledge of the all-time human population distribution (total runs scored), and that this is more significant than the finding of a low number of births until now (a low current run count).
+
+There are two alternative methods of making [[uniform]] draws from the current score (''n''):
+# Put the runs actually scored by the dismissed player in order, say 200, and randomly choose between these scoring increments by <nowiki>U(0, 200]</nowiki>.
+
+# Select a ''time'' randomly from the beginning of the match to the final dismissal.
+
+The second sampling-scheme will include those lengthy periods of a game where a dismissed player is replaced, during which the ‘current batsman’ is preparing to take the field and has no runs. If people sample based on time-of-day rather than running-score they will often find that a new batsman has a score of zero ''when the total score that day was low'', but humans will rarely sample a zero if one batsman stayed at the [[Crease (cricket)|crease]], piling on runs all day long. Therefore, sampling a non-zero score tells us something about the likely final score that the current batsman will achieve.
+
+Choosing sampling method 2 rather than method 1 would give a different statistical link between current and final score: any non-zero score would imply that the batsman reached a high final total, especially if the time to replace a batsman is ''very'' long. This is analogous to the [[Self-Indication Assumption|SIA]]-DA-refutation that ''N'''s distribution should include ''N'' = 0 states, which leads to the DA having reduced [[predictive power]] (in the extreme, no power to predict ''N'' from ''n'' at all).
+
+===The Doomsday Argument as a tricky problem===
+
+Sometimes, the Doomsday Argument is presented as a probability problem using Bayes’ formula.<ref>''"Logique, informatique et paradoxes"'' [[Jean-Paul Delahaye]], Belin, pages 30-32</ref>
+
+'''Hypotheses'''
+
+Two hypotheses are in competition:
+# Theory A says that humanity will disappear in 2150,
+# and theory B says that it will be much later.
+
+Under assumption A, a tenth of humanity was alive in the year 2000, and humanity has included 50 billion individuals.
+
+Under assumption B, one thousandth of humanity was alive in the year 2000, and humanity has included 5 trillion individuals.
+
+The first theory seems less likely, and its ''a priori'' probability is set at 1%, while the probability of the second is logically set to 99%.
+
+Now consider an event E, for example: "a person is part of the 5 billion people alive in the year 2000". One may ask "What is the most likely hypothesis, if you take into account this event?" and apply Bayes' formula:
+
+:<math>\mathbb{P}(A\mid E) = \frac{\mathbb{P}(E\mid A)\cdot \mathbb{P}(A)}{\mathbb{P}(E)}</math>
+According to the above figures:
+:<math>\mathbb{P}(E\mid A) = 10%\ , \ \mathbb{P}(E\mid B) = 0.10%</math>
+Now with:
+:<math>\mathbb{P}(A) = \frac {1}{100}\ , \ \mathbb{P}(B) = \frac {99}{100}</math>
+We get:
+:<math>\mathbb{P}(E) = \mathbb{P}(E \cap A) + \mathbb{P}(E\cap B) = \mathbb{P}(E\mid A)\cdot\mathbb{P}(A) + \mathbb{P}(E\mid B)\cdot \mathbb{P}(B)=\frac{19.9}{10 000}</math>
+
+Finally the probabilities have changed dramatically:
+
+:<math>\mathbb{P}(A\mid E) = \frac{10}{19.9}=50.25% </math>
+:<math>\mathbb{P}(B\mid E) = \frac{9.9}{19.9}=49.75% </math>
+
+Because an individual was chosen randomly, the probability of the end of the world has significantly increased.
+
+'''Attempted Refutations'''
+
+A potential refutation was provided in July 2003:<ref>''"La Belle au bois dormant, la fin du monde et les extraterrestres"'', Jean-Paul Delahaye, Belin, Pour la science, juillet 2003, pages 30-32, http://www2.lifl.fr/~delahaye/pls/2003/107</ref> Jean-Paul Delahaye showed that Bayes' formula introduces "probabilistic anamorphosis", and demonstrated that Bayes' formula is prone to misleading errors made in good faith by its users.
In 2011,<ref>''"L’Argument de l’Apocalypse… selon la Répression des Fraudes|collection"'' Philippe Gay, Image des mathématiques (CNRS),août 2011, http://images.math.cnrs.fr/L-Argument-de-l-Apocalypse-selon.html</ref> Philippe Gay showed that many similar problems can lead to these mistakes: each change of a weighted average by a simple one leads to odd results. + +In 2010,<ref>''"Détournements de Bayes"'' Philippe Gay and Édouard Thomas, Tangente, septembre-octobre 2010, n°136</ref> Philippe Gay and Édouard Thomas described a slightly different understanding: the formula must take into account the number of humans involved in each case. Whatever the explanation, both show the same algebra: + +:<math>\mathbb{P}(B\mid E) = \frac{0.1% \times 5 \cdot 10^{12} \times 99%}{0.1%\times 5\cdot 10^{12} \times 99% +10% \times 50\cdot 10^{9} \times 1%} = \frac{99%}{99% + 1%} =99%=\mathbb{P}(B) </math> + +Using a similar method, we get: + +:<math>\mathbb{P}(A\mid E) = \frac{1%}{99% + 1%} = 1%=\mathbb{P}(A) </math> + +==See also== +* [[Doomsday event]]s +* [[Fermi paradox]] +* [[Final anthropic principle]] +* [[List of disasters#Causes of hypothetical future disasters|Hypothetical disasters]] +* [[Mediocrity principle]] +* [[Quantum immortality]] +* [[Simulated reality]] +* [[Sic transit gloria mundi]] +* [[Survival analysis]] +* [[Survivalism]] +* [[Technological singularity]] + +==Notes== +{{Reflist}} + +==References== +{{Refbegin}} +* John Leslie, ''The End of the World: The Science and Ethics of Human Extinction'', Routledge, 1998, ISBN 0-415-18447-9. +* J. R. Gott III, ''Future Prospects Discussed'', Nature, vol. 368, p.&nbsp;108, 1994. +* This argument plays a central role in [[Stephen Baxter]]'s science fiction book, ''[[Manifold: Time]]'', Del Rey Books, 2000, ISBN 0-345-43076-X. +{{Refend}} + +==External links== +* [http://flatrock.org.nz/topics/environment/doom_soon.htm A non-mathematical, unpartisan introduction to the DA] +* [http://youtube.com/watch?v=F-QA2rkpBSY A compelling lecture from the University of Colorado-Boulder] +* [http://www.anthropic-principle.com/preprints/ali/alive.html Nick Bostrom's response to Korb and Oliver] +* [http://www.anthropic-principle.com/primer1.html Nick Bostrom's summary version of the argument] +* [http://www.anthropic-principle.com/preprints.html#doomsday Nick Bostrom's annotated collection of references] +* [http://arxiv.org/abs/gr-qc/9407002 Kopf, Krtouš & Page's early (1994) refutation] based on the [[Self-Indication Assumption|SIA]], which they called "Assumption 2". +* [http://xxx.lanl.gov/abs/gr-qc/0009081 The Doomsday argument and the number of possible observers by Ken Olum] In 1993 [[J. Richard Gott]] used his "Copernicus method" to predict the lifetime of Broadway shows. One part of this paper uses the same reference class as an empirical counter-example to Gott's method. +* [http://hanson.gmu.edu/nodoom.html A Critique of the Doomsday Argument by Robin Hanson] +* [http://cogprints.org/7044/ A Third Route to the Doomsday Argument by Paul Franceschi], ''Journal of Philosophical Research'', 2009, vol. 34, pp.&nbsp;263–278 +* [http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82931 Chambers' Ussherian Corollary Objection] +* [http://info.phys.unm.edu/papers/2000/Caves2000a.pdf Caves' Bayesian critique of Gott's argument. C. M. Caves, "Predicting future duration from present age: A critical assessment", Contemporary Physics 41, 143-153 (2000).] +* [http://arxiv.org/abs/0806.3538v1 C.M. 
Caves, "Predicting future duration from present age: Revisiting a critical assessment of Gott's rule.] +* [http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=2400044 "Infinitely Long Afterlives and the Doomsday Argument" by John Leslie] shows that Leslie has recently modified his analysis and conclusion (Philosophy 83 (4) 2008 pp.&nbsp;519–524): Abstract—A recent book of mine defends three distinct varieties of immortality. One of them is an infinitely lengthy afterlife; however, any hopes of it might seem destroyed by something like Brandon Carter's ‘doomsday argument’ against viewing ourselves as extremely early humans. The apparent difficulty might be overcome in two ways. First, if the world is non-deterministic then anything on the lines of the doomsday argument may prove unable to deliver a strongly pessimistic conclusion. Secondly, anything on those lines may break down when an infinite sequence of experiences is in question. +* [http://www.lrb.co.uk/v21/n13/gree04_.html Mark Greenberg, "Apocalypse Not Just Now" in London Review of Books] +* [http://pthbb.org/manual/services/grim/laster.html Laster]: A simple webpage applet giving the min & max survival times of anything with 50% and 95% confidence requiring only that you input how old it is. It is designed to use the same mathematics as [[J. Richard Gott]]'s form of the DA, and was programmed by [[sustainable development]] researcher Jerrad Pierce. + +{{Doomsday}} + +{{DEFAULTSORT:Doomsday Argument}} +[[Category:Eschatology]] +[[Category:Probabilistic arguments]] +[[Category:Theories of history]] +[[Category:Sociocultural evolution]] +[[Category:Fermi paradox]] + 4c7eogxp2ya90ue4xgnfj361z3jewit + + + + Divisor function + 0 + 4629 + + 4630 + 2013-12-07T21:38:29Z + + XLinkBot + 0 + + BOT--Reverting link addition(s) by [[:en:Special:Contributions/130.216.82.29|130.216.82.29]] to revision 570643994 (http://engineeringandmathematics.blogspot.co.nz/2012/11/dirichlet-series-of-divisor-function.html [\bblogspot\.]) + wikitext + text/x-wiki + [[Image:Divisor.svg|thumb|right|Divisor function σ<sub>0</sub>(''n'') up to ''n''&nbsp;=&nbsp;250]] +[[Image:Sigma function.svg|thumb|right|Sigma function σ<sub>1</sub>(''n'') up to ''n''&nbsp;=&nbsp;250]] +[[Image:Divisor square.svg|thumb|right|Sum of the squares of divisors, σ<sub>2</sub>(''n''), up to ''n''&nbsp;=&nbsp;250]] +[[Image:Divisor cube.svg|thumb|right|Sum of cubes of divisors, σ<sub>3</sub>(''n'') up to ''n''&nbsp;=&nbsp;250]] +In [[mathematics]], and specifically in [[number theory]], a '''divisor function''' is an [[arithmetic function]] related to the [[divisor]]s of an [[integer]]. When referred to as ''the'' divisor function, it counts the ''number of divisors of an integer''. It appears in a number of remarkable identities, including relationships on the [[Riemann zeta function]] and the [[Eisenstein series]] of [[modular form]]s. Divisor functions were studied by [[Ramanujan]], who gave a number of important [[Modular arithmetic|congruences]] and [[identity (mathematics)|identities]]. + +A related function is the [[divisor summatory function]], which, as the name implies, is a sum over the divisor function. + +==Definition== +The '''sum of positive divisors function''' σ<sub>''x''</sub>(''n''), for a real or complex number ''x'', is defined as the [[sum]] of the ''x''th [[Exponentiation|powers]] of the positive [[divisor]]s of ''n''. It can be expressed in [[Summation#Capital-sigma notation|sigma notation]] as + +:<math>\sigma_{x}(n)=\sum_{d|n} d^x\,\! 
,</math> + +where <math>{d|n}</math> is shorthand for "''d'' [[divides]] ''n''". +The notations ''d''(''n''), ν(''n'') and τ(''n'') (for the German ''Teiler'' = divisors) are also used to denote σ<sub>0</sub>(''n''), or the '''number-of-divisors function'''<ref name="Long 1972 46">{{harvtxt|Long|1972|p=46}}</ref><ref>{{harvtxt|Pettofrezzo|Byrkit|1970|p=63}}</ref> {{OEIS|id=A000005}}. When ''x'' is 1, the function is called the '''sigma function''' or '''sum-of-divisors function''',<ref name="Long 1972 46"/><ref>{{harvtxt|Pettofrezzo|Byrkit|1970|p=58}}</ref> and the subscript is often omitted, so σ(''n'') is equivalent to σ<sub>1</sub>(''n'') ({{OEIS2C|id=A000203}}). + +The '''aliquot sum''' s(n) of ''n'' is the sum of the [[proper divisor]]s (that is, the divisors excluding ''n'' itself, {{OEIS2C|id=A001065}}), and equals σ<sub>1</sub>(''n'')&nbsp;&minus;&nbsp;''n''; the [[aliquot sequence]] of ''n'' is formed by repeatedly applying the aliquot sum function. + +==Example== +For example, σ<sub>0</sub>(12) is the number of the divisors of 12: + +: <math> +\begin{align} +\sigma_{0}(12) & = 1^0 + 2^0 + 3^0 + 4^0 + 6^0 + 12^0 \\ +& = 1 + 1 + 1 + 1 + 1 + 1 = 6, +\end{align} +</math> + +while σ<sub>1</sub>(12) is the sum of all the divisors: + +: <math> +\begin{align} +\sigma_{1}(12) & = 1^1 + 2^1 + 3^1 + 4^1 + 6^1 + 12^1 \\ +& = 1 + 2 + 3 + 4 + 6 + 12 = 28, +\end{align} +</math> + +and the aliquot sum s(12) of proper divisors is: + +: <math> +\begin{align} +s(12) & = 1^1 + 2^1 + 3^1 + 4^1 + 6^1 \\ +& = 1 + 2 + 3 + 4 + 6 = 16. +\end{align} +</math> + +==Table of values== +{| class="wikitable" +|- +! ''n'' +! Divisors +! σ<sub>0</sub>(''n'') +! σ<sub>1</sub>(''n'') +! ''s''(''n'')&nbsp;=&nbsp;σ<sub>1</sub>(''n'')&nbsp;&minus;&nbsp;''n'' +! Comment +|- +! 1 +| 1 +| 1 +| 1 +| 0 +| square number: σ<sub>0</sub>(''n'') is odd; power of 2: s(''n'')&nbsp;=&nbsp;''n''&nbsp;&minus;&nbsp;1 (almost-perfect) +|- +! 2 +| 1,2 +| 2 +| 3 +| 1 +| Prime: σ<sub>1</sub>(n) = 1+n so s(n) =1 +|- +! 3 +| 1,3 +| 2 +| 4 +| 1 +| Prime: σ<sub>1</sub>(n) = 1+n so s(n) =1 +|- +! 4 +| 1,2,4 +| 3 +| 7 +| 3 +| square number: σ<sub>0</sub>(''n'') is odd; power of 2: ''s''(''n'')&nbsp;=&nbsp;''n''&nbsp;&minus;&nbsp;1 (almost-perfect) +|- +! 5 +| 1,5 +| 2 +| 6 +| 1 +| Prime: σ<sub>1</sub>(n) = 1+n so s(n) =1 +|- +! 6 +| 1,2,3,6 +| 4 +| 12 +| 6 +| first [[perfect number]]: ''s''(''n'')&nbsp;=&nbsp;''n'' +|- +! 7 +| 1,7 +| 2 +| 8 +| 1 +| Prime: σ<sub>1</sub>(n) = 1+n so s(n) =1 +|- +! 8 +| 1,2,4,8 +| 4 +| 15 +| 7 +| power of 2: ''s''(''n'') = ''n'' &minus; 1 (almost-perfect) +|- +! 9 +| 1,3,9 +| 3 +| 13 +| 4 +| square number: σ<sub>0</sub>(''n'') is odd +|- +! 10 +| 1,2,5,10 +| 4 +| 18 +| 8 +| +|- +! 11 +| 1,11 +| 2 +| 12 +| 1 +| Prime: σ<sub>1</sub>(n) = 1+n so s(n) =1 +|- +! 12 +| 1,2,3,4,6,12 +| 6 +| 28 +| 16 +| first [[abundant number]]: ''s''(''n'')&nbsp;>&nbsp;''n'' +|- +! 13 +| 1,13 +| 2 +| 14 +| 1 +| Prime: σ<sub>1</sub>(n) = 1+n so s(n) =1 +|- +! 14 +| 1,2,7,14 +| 4 +| 24 +| 10 +| +|- +! 15 +| 1,3,5,15 +| 4 +| 24 +| 9 +| +|- +! 16 +| 1,2,4,8,16 +| 5 +| 31 +| 15 +| square number: σ<sub>0</sub>(''n'') is odd; power of 2: ''s''(''n'')&nbsp;=&nbsp;''n''&nbsp;&minus;&nbsp;1 (almost-perfect) +|} + +The cases {{math|x{{=}}2}}, {{math|x{{=}}3}} and so on are tabulated +in {{OEIS2C|A001157}}, {{OEIS2C|A001158}}, {{OEIS2C|A001159}}, +{{OEIS2C|A001160}}, {{OEIS2C|A013954}}, {{OEIS2C|A013955}} ... 
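+
+For small arguments these values can be checked by direct trial division. A minimal sketch (illustrative only, not an efficient implementation):
+
+<syntaxhighlight lang="python">
+def sigma(x, n):
+    """sigma_x(n): the sum of the x-th powers of the positive divisors of n."""
+    return sum(d ** x for d in range(1, n + 1) if n % d == 0)
+
+def aliquot(n):
+    """Aliquot sum s(n): the sum of the proper divisors of n."""
+    return sigma(1, n) - n
+
+print(sigma(0, 12), sigma(1, 12), aliquot(12))   # 6 28 16, matching the example above
+</syntaxhighlight>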
+ +==Properties== +For a non-square integer every divisor d of n is paired with divisor n/d of n and <math>\sigma_{0}(n)</math> is then even; for a square integer one divisor (namely <math>\sqrt n</math>) is not paired with a distinct divisor and <math>\sigma_{0}(n)</math> is then odd. + +For a [[prime number]] ''p'', + +:<math> +\begin{align} +\sigma_0(p) & = 2 \\ +\sigma_0(p^n) & = n+1 \\ +\sigma_1(p) & = p+1 +\end{align} +</math> + +because by definition, the factors of a prime number are 1 and itself. Also,where ''p<sub>n</sub>''# denotes the [[primorial]], + +:<math> \sigma_0(p_n\#) = 2^n \, </math> + +since ''n'' prime factors allow a sequence of binary selection (<math>p_{i}</math> or 1) from ''n'' terms for each proper divisor formed. + +Clearly, <math>1 < \sigma_0(n) < n</math> and σ(''n'')&nbsp;>&nbsp;''n'' for all&nbsp;''n''&nbsp;>&nbsp;2. + +The divisor function is [[multiplicative function|multiplicative]], but not [[Completely multiplicative function|completely multiplicative]]. The consequence of this is that, if we write + +:<math>n = \prod_{i=1}^r p_i^{a_i}</math> + +where ''r''&nbsp;=&nbsp;''&omega;''(''n'') is the number of distinct [[prime factor]]s of ''n'', ''p<sub>i</sub>'' is the ''i''th prime factor, and ''a<sub>i</sub>'' is the maximum power of ''p<sub>i</sub>'' by which ''n'' is [[divisible]], then we have +:<math>\sigma_x(n) = \prod_{i=1}^{r} \frac{p_{i}^{(a_{i}+1)x}-1}{p_{i}^x-1}</math> + +which is equivalent to the useful formula: +:<math> +\sigma_x(n) = \prod_{i=1}^r \sum_{j=0}^{a_i} p_i^{j x} = +\prod_{i=1}^r (1 + p_i^x + p_i^{2x} + \cdots + p_i^{a_i x}). +</math> + +It follows (by setting ''x'' = 0) that ''d''(''n'') is: +:<math>\sigma_0(n)=\prod_{i=1}^r (a_i+1).</math> + +For example, if ''n'' is 24, there are two prime factors (''p<sub>1</sub>'' is 2; ''p<sub>2</sub>'' is 3); noting that 24 is the product of 2<sup>3</sup>×3<sup>1</sup>, ''a''<sub>1</sub> is 3 and ''a''<sub>2</sub> is 1. Thus we can calculate <math>\sigma_0(24)</math> as so: + +: <math> +\begin{align} +\sigma_0(24) & = \prod_{i=1}^{2} (a_i+1) \\ +& = (3 + 1)(1 + 1) = 4 \times 2 = 8. +\end{align} +</math> + +The eight divisors counted by this formula are 1, 2, 4, 8, 3, 6, 12, and 24. + +We also note ''s''(''n'') = ''&sigma;''(''n'')&nbsp;&minus;&nbsp;''n''. Here ''s''(''n'') denotes the sum of the proper divisors of ''n'', i.e. the divisors of ''n'' excluding ''n'' itself. +This function is the one used to recognize [[perfect number]]s which are the ''n'' for which ''s''(''n'') =&nbsp;''n''. If ''s''(''n'') > ''n'' then ''n'' is an [[abundant number]] and if ''s''(''n'') < ''n'' then ''n'' is a [[deficient number]]. + +If n is a power of 2, e.g. <math>n = 2^k</math>, then <math>\sigma(n) = 2 \times 2^k - 1 = 2n - 1,</math> and ''s(n) = n - 1'', which makes ''n'' [[Almost perfect number|almost-perfect]]. + +As an example, for two distinct primes ''p'' and ''q'' with ''p < q'', let + +:<math>n = pq. \, </math> + +Then +:<math>\sigma(n) = (p+1)(q+1) = n + 1 + (p+q), \, </math> +:<math>\varphi(n) = (p-1)(q-1) = n + 1 - (p+q), \, </math> +and +:<math>n + 1 = (\sigma(n) + \varphi(n))/2, \, </math> +:<math>p + q = (\sigma(n) - \varphi(n))/2, \, </math> +where ''&phi;''(''n'') is [[Euler phi|Euler's totient function]]. 
+ +Then, the roots of: +:<math>(x-p)(x-q) = x^2 - (p+q)x + n = x^2 - [(\sigma(n) - \varphi(n))/2]x + [(\sigma(n) + \varphi(n))/2 - 1] = 0 \, </math> +allows us to express ''p'' and ''q'' in terms of ''&sigma;''(''n'') and ''&phi;''(''n'') only, without even knowing ''n'' or ''p+q'', as: +:<math>p = (\sigma(n) - \varphi(n))/4 - \sqrt{[(\sigma(n) - \varphi(n))/4]^2 - [(\sigma(n) + \varphi(n))/2 - 1]}, \, </math> +:<math>q = (\sigma(n) - \varphi(n))/4 + \sqrt{[(\sigma(n) - \varphi(n))/4]^2 - [(\sigma(n) + \varphi(n))/2 - 1]}. \, </math> + +Also, knowing n and either ''&sigma;''(''n'') or ''&phi;''(''n'') (or knowing p+q and either ''&sigma;''(''n'') or ''&phi;''(''n'')) allows us to easily find ''p'' and ''q''. + +In 1984, [[Roger Heath-Brown]] proved that + +:<math>\sigma_0(n) = \sigma_0(n + 1)</math> + +will occur infinitely often. + +==Series relations== +Two [[Dirichlet series]] involving the divisor function are: + +:<math>\sum_{n=1}^\infty \frac{\sigma_{a}(n)}{n^s} = \zeta(s) \zeta(s-a),</math> + +which for ''d''(''n'')&nbsp;=&nbsp;''&sigma;''<sub>0</sub>(''n'') gives + +: <math>\sum_{n=1}^\infty \frac{d(n)}{n^s} = \zeta^2(s),</math> + +and + +:<math>\sum_{n=1}^\infty \frac{\sigma_a(n)\sigma_b(n)}{n^s} = \frac{\zeta(s) \zeta(s-a) \zeta(s-b) \zeta(s-a-b)}{\zeta(2s-a-b)}.</math> + +A [[Lambert series]] involving the divisor function is: + +:<math>\sum_{n=1}^\infty q^n \sigma_a(n) = \sum_{n=1}^\infty \frac{n^a q^n}{1-q^n}</math> + +for arbitrary [[complex number|complex]] |''q''|&nbsp;≤&nbsp;1 and&nbsp;''a''. This summation also appears as the Fourier series of the [[Eisenstein series]] and the invariants of the [[Weierstrass elliptic functions]]. + +==Approximate growth rate== +In [[Big O notation#Little-o notation|little-o notation]], the divisor function satisfies the inequality (see page 296 of Apostol’s book<ref name="Apostol">{{Apostol IANT}}</ref>) +:<math>\mbox{for all }\epsilon>0,\quad d(n)=o(n^\epsilon).</math> +More precisely, [[Severin Wigert]] showed that +:<math>\limsup_{n\to\infty}\frac{\log d(n)}{\log n/\log\log n}=\log2.</math> +On the other hand, since [[Prime number#There are infinitely many prime numbers|there are infinitely many prime numbers]], +:<math>\liminf_{n\to\infty} d(n)=2.</math> + +In [[Big-O notation]], [[Peter Gustav Lejeune Dirichlet]] showed that the [[Average order of an arithmetic function|average order]] of the divisor function satisfies the following inequality (see Theorem 3.3 of Apostol’s book<ref name="Apostol"/>) +:<math>\mbox{for all } x\geq1, \sum_{n\leq x}d(n)=x\log x+(2\gamma-1)x+O(\sqrt{x}),</math> +where <math>\gamma</math> is [[Euler–Mascheroni constant|Euler's constant]]. Improving the bound <math>O(\sqrt{x})</math> in this formula is known as [[Divisor summatory function#Dirichlet's divisor problem|Dirichlet's divisor problem]] + +{{anchor|Robin's theorem|Robin's inequality|Grönwall's theorem}} + +The behaviour of the sigma function is irregular. The asymptotic growth rate of the sigma function can be expressed by: +:<math> +\limsup_{n\rightarrow\infty}\frac{\sigma(n)}{n\,\log \log n}=e^\gamma, +</math> + +where lim sup is the [[limit superior]]. This result is '''[[Thomas Hakon Grönwall|Grönwall]]'s theorem''', published in 1913 {{harv|Grönwall|1913}}. His proof uses [[Mertens' theorems|Mertens' 3rd theorem]], which says that + +:<math>\lim_{n\to\infty}\frac{1}{\log n}\prod_{p\le n}\frac{p}{p-1}=e^{\gamma},</math> + +where ''p'' denotes a prime. 
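+
+Grönwall's theorem can be illustrated numerically. The sketch below (illustrative only; the sample values of ''n'' are divisor-rich numbers chosen by hand) compares σ(''n'')/(''n''&nbsp;log&nbsp;log&nbsp;''n'') with ''e''<sup>γ</sup> ≈ 1.781:
+
+<syntaxhighlight lang="python">
+import math
+
+def sigma1(n):
+    """sigma_1(n) by trial division up to sqrt(n)."""
+    total, d = 0, 1
+    while d * d <= n:
+        if n % d == 0:
+            total += d
+            if d * d != n:
+                total += n // d
+        d += 1
+    return total
+
+e_gamma = math.exp(0.5772156649015329)    # e^gamma, about 1.781
+for n in (120, 5040, 720720, 367567200):  # divisor-rich values, chosen for illustration
+    print(n, round(sigma1(n) / (n * math.log(math.log(n))), 4))
+# The ratio exceeds e^gamma for the two small values but stays below it for the
+# larger ones; the limit superior e^gamma is approached only as n grows without bound.
+</syntaxhighlight>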
+ +In 1915, Ramanujan proved that under the assumption of the [[Riemann hypothesis]], the inequality: +:<math>\ \sigma(n) < e^\gamma n \log \log n </math> (Robin's inequality) +holds for all sufficiently large ''n'' {{harv|Ramanujan|1997}}. In 1984 [[Guy Robin]] proved that the inequality is true for all ''n'' ≥ 5,041 if and only if the Riemann hypothesis is true {{harv|Robin|1984}}. This is '''Robin's theorem''' and the inequality became known after him. The largest known value that violates the inequality is ''n''=5,040. If the Riemann hypothesis is true, there are no greater exceptions. If the hypothesis is false, then Robin showed there are an infinite number of values of ''n'' that violate the inequality, and it is known that the smallest such ''n'' ≥ 5,041 must be [[superabundant number|superabundant]] {{harv|Akbary|Friggstad|2009}}. It has been shown that the inequality holds for large odd and square-free integers, and that the Riemann hypothesis is equivalent to the inequality just for ''n'' divisible by the fifth power of a prime {{Harv|Choie|Lichiardopol|Moree|Solé|2007}}. + +A related bound was given by [[Jeffrey Lagarias]] in 2002, who proved that the Riemann hypothesis is equivalent to the statement that +:<math> \sigma(n) \le H_n + \ln(H_n)e^{H_n}</math> +for every [[natural number]] ''n'', where <math>H_n</math> is the ''n''th [[harmonic number]], {{harv|Lagarias|2002}}. + +Robin also proved, unconditionally, that the inequality +:<math>\ \sigma(n) < e^\gamma n \log \log n + \frac{0.6483\ n}{\log \log n}</math> +holds for all ''n'' ≥ 3. + +== See also == +* [[Euler's totient function]] (Euler's phi function) +* [[Table of divisors]] +* [[Arithmetic function#Divisor sum convolutions|Divisor sum convolutions]] Lists a few identities involving the divisor functions +* [[Unitary divisor]] + +==Notes== +<references/> + +== References == +*{{Citation|doi=10.4169/193009709X470128|first1=Amir|last1=Akbary|first2=Zachary|last2=Friggstad|title=Superabundant numbers and the Riemann hypothesis|url=http://webdocs.cs.ualberta.ca/~zacharyf/Papers/superabundant.pdf|journal=American Mathematical Monthly|volume=116|issue=3|year=2009|pages=273–275}}. +* [[Eric Bach|Bach, Eric]]; [[Jeffrey Shallit|Shallit, Jeffrey]], ''Algorithmic Number Theory'', volume 1, 1996, MIT Press. ISBN 0-262-02405-5, see page 234 in section 8.8. +* {{Citation | last1=Caveney | first1=Geoffrey | last2=Nicolas | first2=Jean-Louis | last3=Sondow | first3=Jonathan | title=Robin's theorem, primes, and a new elementary reformulation of the Riemann Hypothesis | url=http://www.integers-ejcnt.org/l33/l33.pdf | year=2011 | journal=INTEGERS: the Electronic Journal of Combinatorial Number Theory | volume=11 | pages=A33}} +*{{Citation | last1=Choie | first1=YoungJu | last2=Lichiardopol | first2=Nicolas | last3=Moree | first3=Pieter | last4=Solé | first4=Patrick | title=On Robin's criterion for the Riemann hypothesis | url=http://jtnb.cedram.org/item?id=JTNB_2007__19_2_357_0 | mr=2394891 |arxiv=math.NT/0604314 | year=2007 | journal=Journal de théorie des nombres de Bordeaux | issn=1246-7405 | volume=19 | issue=2 | pages=357–372 | doi=10.5802/jtnb.591}} +* {{Citation | last1=Grönwall | first1=Thomas Hakon | author1-link=Thomas Hakon Grönwall | title=Some asymptotic expressions in the theory of numbers | year=1913 | journal=Transactions of the American Mathematical Society | volume=14 | pages=113–122 | doi=10.1090/S0002-9947-1913-1500940-6}} +* {{citation | last=Ivić | first=Aleksandar | title=The Riemann zeta-function. 
The theory of the Riemann zeta-function with applications | series=A Wiley-Interscience Publication | location=New York etc. | publisher=John Wiley & Sons | year=1985 | isbn=0-471-80634-X | zbl=0556.10026 | pages=385–440 }} +* {{Citation | last1=Lagarias | first1=Jeffrey C. | author1-link=Jeffrey C. Lagarias | title=An elementary problem equivalent to the Riemann hypothesis | doi=10.2307/2695443 | jstor=2695443 | mr=1908008 | year=2002 | journal=[[American Mathematical Monthly|The American Mathematical Monthly]] | issn=0002-9890 | volume=109 | issue=6 | pages=534–543}} +* {{citation | first1 = Calvin T. | last1 = Long | year = 1972 | title = Elementary Introduction to Number Theory | edition = 2nd | publisher = [[D. C. Heath and Company]] | location = Lexington | lccn = 77-171950 }} +* {{Citation | last=Ramanujan | first=Srinivasa | author-link=Srinivasa Ramanujan | title=Highly composite numbers, annotated by Jean-Louis Nicolas and Guy Robin | doi=10.1023/A:1009764017495 | mr=1606180 | year=1997 | journal=The Ramanujan Journal | issn=1382-4090 | volume=1 | issue=2 | pages=119–153}} +* {{citation | first1 = Anthony J. | last1 = Pettofrezzo | first2 = Donald R. | last2 = Byrkit | year = 1970 | title = Elements of Number Theory | publisher = [[Prentice Hall]] | location = Englewood Cliffs | lccn = 77-81766 }} +* {{Citation | last1=Robin | first1=Guy | title=Grandes valeurs de la fonction somme des diviseurs et hypothèse de Riemann | mr=774171 | year=1984 | journal=[[Journal de Mathématiques Pures et Appliquées]]|series= Neuvième Série | issn=0021-7824 | volume=63 | issue=2 | pages=187–213}} +* {{mathworld|urlname=DivisorFunction|title=Divisor Function}} +* {{mathworld|urlname=RobinsTheorem|title=Robin's Theorem}} +* [http://mathstat.carleton.ca/~williams/papers/pdf/249.pdf Elementary Evaluation of Certain Convolution Sums Involving Divisor Functions] PDF of a paper by Huard, Ou, Spearman, and Williams. Contains elementary (i.e. not relying on the theory of modular forms) proofs of divisor sum convolutions, formulas for the number of ways of representing a number as a sum of triangular numbers, and related results. + + +{{Divisor classes}} + +[[Category:Divisor function| ]] +[[Category:Number theory]] + +{{Link FA|hu}} +[[hu:Osztóösszeg-függvény]] +[[pl:Funkcja σ]] + hinnwfdztk20ohkwkcvnd2ebf258u1q + + + + LTI system theory + 0 + 8337 + + 8338 + 2014-01-25T13:26:23Z + + Monkbot + 0 + + + Fix [[Help:CS1_errors#deprecated_params|CS1 deprecated date parameter errors]] + wikitext + text/x-wiki + {{More footnotes|date=April 2009}} + +'''Linear time-invariant theory''', commonly known as '''LTI system theory,''' comes from [[applied mathematics]] and has direct applications in [[NMR spectroscopy]], [[seismology]], [[electrical network|circuit]]s, [[signal processing]], [[control theory]], and other technical areas. It investigates the response of a [[linear system|linear]] and [[time-invariant system]] to an arbitrary input signal. Trajectories of these systems are commonly measured and tracked as they move through time (e.g., an acoustic waveform), but in applications like [[image processing]] and [[Classical field theory|field theory]], the LTI systems also have trajectories in spatial dimensions. Thus, these systems are also called ''linear translation-invariant'' to give the theory the most general reach. In the case of generic [[discrete-time]] (i.e., [[sample (signal)|sampled]]) systems, ''linear shift-invariant'' is the corresponding term. 
A good example of LTI systems are electrical circuits that can be made up of resistors, capacitors, and inductors.<ref>Hespanha 2009, p. 78.</ref> + +==Overview== +The defining properties of any LTI system are ''linearity'' and ''time invariance''. + +* ''Linearity'' means that the relationship between the input and the output of the system is a [[linear map]]: If input <math>x_1(t)\,</math> produces response <math>y_1(t),\,</math> and input <math>x_2(t)\,</math> produces response <math>y_2(t),\,</math> then the ''scaled'' and ''summed'' input <math>a_1 x_1(t)+ a_2 x_2(t)\,</math> produces the scaled and summed response <math>a_1 y_1(t) + a_2y_2(t)\,</math> where <math>a_1</math> and <math>a_2</math> are [[real number|real]] [[scalar (mathematics)|scalar]]s. It follows that this can be extended to an arbitrary number of terms, and so for real numbers <math>c_1, c_2, \ldots, c_k</math>, +::Input &nbsp; <math>\sum_k c_k\,x_k(t)</math> &nbsp; produces output &nbsp; <math>\sum_k c_k\,y_k(t).\,</math> +:In particular, +{{NumBlk|::|Input &nbsp; <math>\int_{-\infty}^{\infty} c_{\omega}\,x_{\omega}(t) \, \operatorname{d}\omega</math> &nbsp; produces output &nbsp; <math>\int_{-\infty}^{\infty} c_{\omega}\,y_{\omega}(t) \, \operatorname{d}\omega\,</math>|{{EquationRef|Eq.1}}}} +:where <math>c_{\omega}</math> and <math>x_{\omega}</math> are scalars and inputs that vary over a [[Continuum (set theory)|continuum]] indexed by <math>\omega</math>. Thus if an input function can be represented by a continuum of input functions, combined "linearly", as shown, then the corresponding output function can be represented by the corresponding continuum of output functions, ''scaled'' and ''summed'' in the same way. + +* ''Time invariance'' means that whether we apply an input to the system now or ''T'' seconds from now, the output will be identical except for a time delay of the ''T'' seconds. That is, if the output due to input <math>x(t)</math> is <math>y(t)</math>, then the output due to input <math>x(t-T)</math> is <math>y(t-T)</math>. Hence, the system is time invariant because the output does not depend on the particular time the input is applied. + +The fundamental result in LTI system theory is that any LTI system can be characterized entirely by a single function called the system's [[impulse response]]. The output of the system is simply the [[convolution]] of the input to the system with the system's impulse response. This method of analysis is often called the ''[[time domain]]'' point-of-view. The same result is true of discrete-time linear shift-invariant systems in which signals are discrete-time samples, and convolution is defined on sequences. + +[[File:LTI.png|thumb|Relationship between the '''time domain''' and the '''frequency domain'''|right|320px]] + +Equivalently, any LTI system can be characterized in the ''[[frequency domain]]'' by the system's [[transfer function]], which is the [[Laplace transform]] of the system's impulse response (or [[Z transform]] in the case of discrete-time systems). As a result of the properties of these transforms, the output of the system in the frequency domain is the product of the transfer function and the transform of the input. In other words, convolution in the time domain is equivalent to multiplication in the frequency domain. + +For all LTI systems, the [[eigenfunction]]s, and the basis functions of the transforms, are [[complex number|complex]] [[exponential function|exponentials]]. 
This is, if the input to a system is the complex waveform <math>A e^{st}</math> for some complex amplitude <math>A</math> and complex frequency <math>s</math>, the output will be some complex constant times the input, say <math>B e^{st}</math> for some new complex amplitude <math>B</math>. The ratio <math>B/A</math> is the transfer function at frequency <math>s</math>. + +Because [[sine wave|sinusoids]] are a sum of complex exponentials with complex-conjugate frequencies, if the input to the system is a sinusoid, then the output of the system will also be a sinusoid, perhaps with a different [[amplitude]] and a different [[phase (waves)|phase]], but always with the same frequency. LTI systems cannot produce frequency components that are not in the input. + +LTI system theory is good at describing many important systems. Most LTI systems are considered "easy" to analyze, at least compared to the time-varying and/or [[nonlinear]] case. Any system that can be modeled as a linear homogeneous [[differential equation]] with constant coefficients is an LTI system. Examples of such systems are [[electrical network|electrical circuits]] made up of [[resistor]]s, [[inductor]]s, and [[capacitor]]s (RLC circuits). Ideal spring–mass–damper systems are also LTI systems, and are mathematically equivalent to RLC circuits. + +Most LTI system concepts are similar between the continuous-time and discrete-time (linear shift-invariant) cases. In image processing, the time variable is replaced with two space variables, and the notion of time invariance is replaced by two-dimensional shift invariance. When analyzing [[filter bank]]s and [[MIMO (systems theory)|MIMO]] systems, it is often useful to consider [[matrix (mathematics)|vectors]] of signals. + +A linear system that is not time-invariant can be solved using other approaches such as the [[Green's function|Green function]] method. The same method must be used when the initial conditions of the problem are not null. + +== Continuous-time systems == + +=== Impulse response and convolution=== + +The behavior of a linear, continuous-time, time-invariant system with input signal x(t) and output signal y(t) is described by the convolution integral,<ref>Crutchfield[web]</ref> ''':''' +:{| +|<math>y(t) = x(t) * h(t)\,</math> +|<math>{}\quad \stackrel{\mathrm{def}}{=} \ \int_{-\infty}^{\infty} x(t-\tau)\cdot h(\tau) \, \operatorname{d}\tau</math> +|- +| +|<math>{}\quad = \int_{-\infty}^{\infty} x(\tau)\cdot h(t-\tau) \,\operatorname{d}\tau,</math> &nbsp; &nbsp; &nbsp; (using [[Convolution#Commutativity|commutativity]]) +|} + +where <math>\scriptstyle h(t)</math> is the system's response to an [[Dirac delta function|impulse]]''':''' &nbsp;<math>\scriptstyle x(\tau) = \delta(\tau).</math> &nbsp; <math>\scriptstyle y(t)</math> is therefore proportional to a weighted average of the input function <math>\scriptstyle x(\tau).</math>&nbsp; The weighting function is <math>\scriptstyle h(-\tau),</math> simply shifted by amount <math>\scriptstyle t.</math> &nbsp; As <math>\scriptstyle t</math> changes, the weighting function emphasizes different parts of the input function. When <math>\scriptstyle h(\tau)</math> is zero for all negative <math>\scriptstyle \tau,</math>&nbsp; <math>\scriptstyle y(t)</math> depends only on values of <math>\scriptstyle x</math> prior to time <math>\scriptstyle t,</math>&nbsp; and the system is said to be [[Causal system|causal]]. 
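+
+In discrete time the corresponding convolution sum can be evaluated directly. A minimal numerical sketch (the first-order impulse response and the use of NumPy are illustrative choices, not part of the theory):
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+fs = 100.0                       # sample rate (arbitrary)
+t = np.arange(0, 1, 1 / fs)
+h = np.exp(-5 * t) / fs          # causal impulse response, scaled by the sample period
+x = np.sin(2 * np.pi * 3 * t)    # input signal
+y = np.convolve(x, h)[: len(t)]  # y[n] = sum_k x[k] h[n - k]: the system output
+print(y[:5])
+</syntaxhighlight>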
+ +To understand why the convolution produces the output of an LTI system, let the notation <math>\scriptstyle \{x(u-\tau);\ u\}</math> represent the function <math>\scriptstyle x(u-\tau)</math> with variable <math>\scriptstyle u</math> and constant <math>\scriptstyle \tau.</math>&nbsp; And let the shorter notation <math>\scriptstyle \{x\}\,</math> represent <math>\scriptstyle \{x(u);\ u\}.</math> Then a continuous-time system transforms an input function, <math>\scriptstyle \{x\},</math> into an output function, <math>\scriptstyle \{y\}.</math>&nbsp; And in general, every value of the output can depend on every value of the input. This concept is represented by''':''' + +:<math>y(t) \ \stackrel{\text{def}}{=}\ O_t\{x\},</math> + +where <math>\scriptstyle O_t</math> is the transformation operator for time <math>\scriptstyle t.</math>&nbsp; In a typical system, <math>\scriptstyle y(t)</math> depends most heavily on the values of <math>\scriptstyle x</math> that occurred near time <math>\scriptstyle t.</math>&nbsp; Unless the transform itself changes with <math>\scriptstyle t,</math> the output function is just constant, and the system is uninteresting. + +For a linear system, <math>\scriptstyle O</math> must satisfy {{EquationNote|Eq.1}}''':''' + +{{NumBlk|:|<math> +O_t\left\{\int_{-\infty}^{\infty} c_{\tau}\ x_{\tau}(u) \, \operatorname{d}\tau ;\ u\right\} = \int_{-\infty}^{\infty} c_{\tau}\ \underbrace{y_{\tau}(t)}_{O_t\{x_{\tau}\}} \, \operatorname{d}\tau. +\,</math>|{{EquationRef|Eq.2}}}} + +And the time-invariance requirement is''':''' + +{{NumBlk|:|<math> +\begin{align} +O_t\{x(u-\tau);\ u\}\ &\stackrel{\quad}{=}\ y(t-\tau)\\ +&\stackrel{\text{def}}{=}\ O_{t-\tau}\{x\}.\, +\end{align} +</math>|{{EquationRef|Eq.3}}}} + +In this notation, we can write the '''impulse response''' as &nbsp;<math>\scriptstyle h(t) \ \stackrel{\text{def}}{=}\ O_t\{\delta(u);\ u\}.</math> + +Similarly''':''' +:{| +|<math>h(t-\tau)\,</math> +|<math>{}\stackrel{\text{def}}{=}\ O_{t-\tau}\{\delta(u);\ u\}</math> +|- +| +|<math>{}= O_t\{\delta(u-\tau);\ u\}.\,</math> &nbsp; &nbsp; &nbsp; (using {{EquationNote|Eq.3}}) +|} + +Substituting this result into the convolution integral''':''' + +:<math> +\begin{align} +x(t) * h(t) &= \int_{-\infty}^{\infty} x(\tau)\cdot h(t-\tau) \,\operatorname{d}\tau\\ +&= \int_{-\infty}^{\infty} x(\tau)\cdot O_t\{\delta(u-\tau);\ u\} \, \operatorname{d}\tau,\, +\end{align} +</math> + +which has the form of the right side of {{EquationNote|Eq.2}} for the case <math>\scriptstyle c_{\tau} = x(\tau)</math> and <math>\scriptstyle x_{\tau}(u) = \delta(u-\tau).</math><br> +{{EquationNote|Eq.2}} then allows this continuation''':''' + +:<math> +\begin{align} +x(t) * h(t) &= O_t\left\{\int_{-\infty}^{\infty} x(\tau)\cdot \delta(u-\tau) \, \operatorname{d}\tau;\ u \right\}\\ +&= O_t\left\{x(u);\ u \right\}\\ +&\ \stackrel{\text{def}}{=}\ y(t).\, +\end{align} +</math> + +In summary, the input function, <math>\scriptstyle \{x\},</math>&nbsp; can be represented by a continuum of time-shifted impulse functions, combined "linearly", as shown at {{EquationRef|Eq.1}}. The system's linearity property allows the system's response to be represented by the corresponding continuum of impulse <u>responses</u>, combined in the same way. &nbsp;And the time-invariance property allows that combination to be represented by the convolution integral. 
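+
+In a discrete approximation the argument can also be checked directly: writing the input as a sum of shifted, scaled impulses and adding the correspondingly shifted, scaled copies of the impulse response reproduces the convolution. The following Python/NumPy sketch does this for two short, arbitrarily chosen sequences:
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+x = np.array([1.0, 2.0, 0.0, -1.0])   # arbitrary input samples x[k]
+h = np.array([0.5, 0.25, 0.125])      # arbitrary impulse response h[n]
+
+# Superposition: each input sample x[k] launches a copy of h delayed by k samples.
+y_superposition = np.zeros(len(x) + len(h) - 1)
+for k, xk in enumerate(x):
+    y_superposition[k:k + len(h)] += xk * h
+
+# Direct convolution gives the same output sequence.
+y_convolution = np.convolve(x, h)
+
+print(np.allclose(y_superposition, y_convolution))  # True
+</syntaxhighlight>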
+ +The mathematical operations above have a simple graphical simulation.<ref>Crutchfield</ref> + +=== Exponentials as eigenfunctions === +An [[eigenfunction]] is a function for which the output of the operator is a scaled version of the same function. That is, +:<math>\mathcal{H}f = \lambda f</math>, +where ''f'' is the eigenfunction and <math>\lambda</math> is the [[eigenvalue]], a constant. + +The [[exponential function]]s <math>A e^{s t}</math>, where <math>A, s \in \mathbb{C}</math>, are [[eigenfunction]]s of a [[linear]], [[time-invariant]] operator. A simple proof illustrates this concept. Suppose the input is <math>x(t) = A e^{s t}</math>. The output of the system with impulse response <math>h(t)</math> is then + +:<math>\int_{-\infty}^{\infty} h(t - \tau) A e^{s \tau}\, \operatorname{d} \tau</math> + +which, by the commutative property of [[convolution]], is equivalent to + +:<math>\begin{align} +\overbrace{\int_{-\infty}^{\infty} h(\tau) \, A e^{s (t - \tau)} \, \operatorname{d} \tau}^{\mathcal{H} f} +&= \int_{-\infty}^{\infty} h(\tau) \, A e^{s t} e^{-s \tau} \, \operatorname{d} \tau +&= A e^{s t} \int_{-\infty}^{\infty} h(\tau) \, e^{-s \tau} \, \operatorname{d} \tau\\ +&= \overbrace{\underbrace{A e^{s t}}_{\text{Input}}}^{f} \overbrace{\underbrace{H(s)}_{\text{Scalar}}}^{\lambda}, +\end{align}</math> +where the scalar +:<math>H(s)\ \stackrel{\text{def}}{=}\ \int_{-\infty}^\infty h(t) e^{-s t} \, \operatorname{d} t</math> +is dependent only on the parameter ''s''. + +So the system's response is a scaled version of the input. In particular, for any <math>A, s \in \mathbb{C}</math>, the system output is the product of the input <math>A e^{st}</math> and the constant <math>H(s)</math>. Hence, <math>A e^{s t}</math> is an [[eigenfunction]] of an LTI system, and the corresponding [[eigenvalue]] is <math>H(s)</math>. + +=== Fourier and Laplace transforms === +The eigenfunction property of exponentials is very useful for both analysis and insight into LTI systems. The [[Laplace transform]] + +:<math>H(s)\ \stackrel{\text{def}}{=}\ \mathcal{L}\{h(t)\}\ \stackrel{\text{def}}{=}\ \int_{-\infty}^\infty h(t) e^{-s t} \, \operatorname{d} t</math> + +is exactly the way to get the eigenvalues from the impulse response. Of particular interest are pure sinusoids (i.e., exponential functions of the form <math>e^{j \omega t}</math> where <math>\omega \in \mathbb{R}</math> and <math>j\ \stackrel{\text{def}}{=}\ \sqrt{-1}</math>). These are generally called complex exponentials even though the argument is purely imaginary. The [[Fourier transform]] <math>H(j \omega) = \mathcal{F}\{h(t)\}</math> gives the eigenvalues for pure complex sinusoids. Both of <math>H(s)</math> and <math>H(j\omega)</math> are called the ''system function'', ''system response'', or ''transfer function''. + +The Laplace transform is usually used in the context of one-sided signals, i.e. signals that are zero for all values of ''t'' less than some value. Usually, this "start time" is set to zero, for convenience and without loss of generality, with the transform integral being taken from zero to infinity (the transform shown above with lower limit of integration of negative infinity is formally known as the [[bilateral Laplace transform]]). + +The Fourier transform is used for analyzing systems that process signals that are infinite in extent, such as modulated sinusoids, even though it cannot be directly applied to input and output signals that are not [[square integrable]]. 
The Laplace transform actually works directly for these signals if they are zero before a start time, even if they are not square integrable, for stable systems. The Fourier transform is often applied to spectra of infinite signals via the [[Wiener–Khinchin theorem]] even when Fourier transforms of the signals do not exist. + +Due to the convolution property of both of these transforms, the convolution that gives the output of the system can be transformed to a multiplication in the transform domain, given signals for which the transforms exist +:<math>y(t) = (h*x)(t)\ \stackrel{\text{def}}{=}\ \int_{-\infty}^\infty h(t - \tau) x(\tau) \, \operatorname{d} \tau\ \stackrel{\text{def}}{=}\ \mathcal{L}^{-1}\{H(s)X(s)\}.</math> + +Not only is it often easier to do the transforms, multiplication, and inverse transform than the original convolution, but one can also gain insight into the behavior of the system from the system response. One can look at the modulus of the system function |''H''(''s'')| to see whether the input <math>\exp({s t})</math> is ''passed'' (let through) the system or ''rejected'' or ''attenuated'' by the system (not let through). + +=== Examples === +* A simple example of an LTI operator is the [[derivative]]. +** <math> \frac{\operatorname{d}}{\operatorname{d}t} \left( c_1 x_1(t) + c_2 x_2(t) \right) = c_1 x'_1(t) + c_2 x'_2(t) </math> &nbsp; (i.e., it is linear) +** <math> \frac{\operatorname{d}}{\operatorname{d}t} x(t-\tau) = x'(t-\tau) </math> &nbsp; (i.e., it is time invariant) +:When the Laplace transform of the derivative is taken, it transforms to a simple multiplication by the Laplace variable ''s''. +::<math> \mathcal{L}\left\{\frac{\operatorname{d}}{\operatorname{d}t}x(t)\right\} = s X(s) </math> +:That the derivative has such a simple Laplace transform partly explains the utility of the transform. + +* Another simple LTI operator is an averaging operator +::<math> \mathcal{A}\left\{x(t)\right\}\ \stackrel{\text{def}}{=}\ \int_{t-a}^{t+a} x(\lambda) \, \operatorname{d} \lambda. </math> +:By the linearity of integration, +::<math>\begin{align} +\mathcal{A}\left\{c_1 x_1(t) + c_2 x_2(t) \right\} +&= \int_{t-a}^{t+a} \left( c_1 x_1(\lambda) + c_2 x_2(\lambda) \right) \, \operatorname{d} \lambda\\ +&= c_1 \int_{t-a}^{t+a} x_1(\lambda) \, \operatorname{d} \lambda + c_2 \int_{t-a}^{t+a} x_2(\lambda) \, \operatorname{d} \lambda\\ +&= c_1 \mathcal{A}\left\{x_1(t) \right\} + c_2 \mathcal{A}\left\{x_2(t) \right\}, +\end{align}</math> +:it is linear. Additionally, because +::<math>\begin{align} +\mathcal{A}\left\{x(t-\tau)\right\} +&= \int_{t-a}^{t+a} x(\lambda-\tau) \, \operatorname{d} \lambda\\ +&= \int_{(t-\tau)-a}^{(t-\tau)+a} x(\xi) \, \operatorname{d} \xi\\ +&= \mathcal{A}\{x\}(t-\tau), +\end{align}</math> +:it is time invariant. In fact, <math>\mathcal{A}</math> can be written as a convolution with the [[boxcar function]] <math>\Pi(t)</math>. That is, +::<math> \mathcal{A}\left\{x(t)\right\} = \int_{-\infty}^\infty \Pi\left(\frac{\lambda-t}{2a}\right) x(\lambda) \, \operatorname{d} \lambda, </math> +:where the boxcar function +::<math>\Pi(t)\ \stackrel{\text{def}}{=}\ \begin{cases} 1 &\text{if } |t| < \frac{1}{2},\\ 0 &\text{if } |t| > \frac{1}{2}.\end{cases}</math> + +=== Important system properties === +Some of the most important properties of a system are causality and stability. Causality is a necessity if the independent variable is time, but not all systems have time as an independent variable. 
For example, a system that processes still images does not need to be causal. Non-causal systems can be built and can be useful in many circumstances. Even [[Quadrature filter|non-real]] systems can be built and are very useful in many contexts. +<!--does anyone have any counterexamples? a non-causal LTI CT system?--> + +==== Causality ==== +{{main|Causal system}} +<!--the causal system article needs work--> +A system is causal if the output depends only on present and past, but not future inputs. A necessary and sufficient condition for causality is + +:<math>h(t) = 0 \quad \forall t < 0,</math> + +where <math>h(t)</math> is the impulse response. It is not possible in general to determine causality from the Laplace transform, because the inverse transform is not unique. When a [[region of convergence]] is specified, then causality can be determined. + +==== Stability ==== +{{main|BIBO stability}} +A system is '''bounded-input, bounded-output stable''' (BIBO stable) if, for every bounded input, the output is finite. Mathematically, if every input satisfying + +:<math>\ \|x(t)\|_{\infty} < \infty</math> + +leads to an output satisfying + +:<math>\ \|y(t)\|_{\infty} < \infty</math> + +(that is, a finite [[Infinity norm|maximum absolute value]] of <math>x(t)</math> implies a finite maximum absolute value of <math>y(t)</math>), then the system is stable. A necessary and sufficient condition is that <math>h(t)</math>, the impulse response, is in [[Lp space|L<sup>1</sup>]] (has a finite L<sup>1</sup> norm): + +:<math>\ \|h(t)\|_1 = \int_{-\infty}^\infty |h(t)| \, \operatorname{d}t < \infty.</math> + +In the frequency domain, the [[region of convergence]] must contain the imaginary axis <math>s=j\omega</math>. + +As an example, the ideal [[low-pass filter]] with impulse response equal to a [[sinc function]] is not BIBO stable, because the sinc function does not have a finite L<sup>1</sup> norm. Thus, for some bounded input, the output of the ideal low-pass filter is unbounded. In particular, if the input is zero for <math>t < 0\,</math> and equal to a sinusoid at the [[cut-off frequency]] for <math>t > 0\,</math>, then the output will be unbounded for all times other than the zero crossings. + +== Discrete-time systems == +Almost everything in continuous-time systems has a counterpart in discrete-time systems. <!-- this section may be very redundant. don't remove this redundancy because these should probably be separate articles. --> + +=== Discrete-time systems from continuous-time systems === +In many contexts, a discrete time (DT) system is really part of a larger continuous time (CT) system. For example, a digital recording system takes an analog sound, digitizes it, possibly processes the digital signals, and plays back an analog sound for people to listen to. + +Formally, the DT signals studied are almost always uniformly sampled versions of CT signals. If <math>x(t)</math> is a CT signal, then an [[analog to digital converter]] will transform it to the DT signal: + +:<math>x[n] \ \stackrel{\text{def}}{=}\ x(nT) \qquad \forall \, n \in \mathbb{Z},</math> + +where ''T'' is the [[sampling frequency|sampling period]]. It is very important to limit the range of frequencies in the input signal for faithful representation in the DT signal, since then the [[sampling theorem]] guarantees that no information about the CT signal is lost. A DT signal can only contain a frequency range of <math>1/(2T)</math>; other frequencies are [[aliasing|aliased]] to the same range. 
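+
+As a small numerical illustration of aliasing (a Python/NumPy sketch; the sampling period and the two tone frequencies are arbitrary assumptions), two sinusoids whose frequencies differ by exactly the sampling rate <math>1/T</math> produce identical sample sequences:
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+T = 0.01                    # assumed sampling period (s); sampling rate 1/T = 100 Hz
+n = np.arange(50)           # sample indices
+f1 = 10.0                   # 10 Hz tone, below the 1/(2T) = 50 Hz limit
+f2 = f1 + 1.0 / T           # 110 Hz tone, aliased onto the 10 Hz tone
+
+x1 = np.cos(2 * np.pi * f1 * n * T)
+x2 = np.cos(2 * np.pi * f2 * n * T)
+
+print(np.allclose(x1, x2))  # True: the samples cannot distinguish the two tones
+</syntaxhighlight>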
+ +=== Impulse response and convolution === + +Let <math>\{x[m-k];\ m\}</math> represent the sequence <math>\{x[m-k];\ \mbox{for all integer values of m}\}</math>. + +And let the shorter notation <math>\{x\}\,</math> represent <math>\{x[m];\ m\}.</math> + +A discrete system transforms an input sequence, <math>\{x\}</math> into an output sequence, <math>\{y\}.</math> In general, every element of the output can depend on every element of the input. Representing the transformation operator by <math>O</math>, we can write: + +:<math>y[n] \ \stackrel{\text{def}}{=}\ O_n\{x\}.</math> + +Note that unless the transform itself changes with '''n''', the output sequence is just constant, and the system is uninteresting. (Thus the subscript, '''n'''.) In a typical system, '''y[n]''' depends most heavily on the elements of '''x''' whose indices are near '''n'''. + +For the special case of the [[Kronecker delta function]], <math>x[m] = \delta[m],</math> the output sequence is the '''impulse response:''' + +:<math>h[n] \ \stackrel{\text{def}}{=}\ O_n\{\delta[m];\ m\}.\,</math> + +For a linear system, <math>O</math> must satisfy: +{{NumBlk|:|<math> +O_n\left\{\sum_{k=-\infty}^{\infty} c_k\cdot x_k[m];\ m\right\} = +\sum_{k=-\infty}^{\infty} c_k\cdot O_n\{x_k\}. +\,</math>|{{EquationRef|Eq.4}}}} + +And the time-invariance requirement is: +{{NumBlk|:|<math> +\begin{align} +O_n\{x[m-k];\ m\}\ &\stackrel{\quad}{=}\ y[n-k]\\ +&\stackrel{\text{def}}{=}\ O_{n-k}\{x\}.\, +\end{align} +</math>|{{EquationRef|Eq.5}}}} + +In such a system, the impulse response, <math>\{h\},\,</math> characterizes the system completely. I.e., for any input sequence, the output sequence can be calculated in terms of the input and the impulse response. To see how that is done, consider the identity: + +:<math> +x[m] \equiv \sum_{k=-\infty}^{\infty} x[k]\cdot \delta[m-k], +\,</math> + +which expresses <math>\{x\}\,</math> in terms of a sum of weighted delta functions. + +Therefore: + +:<math> +\begin{align} +y[n] = O_n\{x\} +&= O_n\left\{\sum_{k=-\infty}^{\infty} x[k]\cdot \delta[m-k];\ m \right\}\\ +&= \sum_{k=-\infty}^{\infty} x[k]\cdot O_n\{\delta[m-k];\ m\},\, +\end{align} +</math> + +where we have invoked {{EquationNote|Eq.4}} for the case <math>c_k = x[k]\,</math> and <math>x_k[m] = \delta[m-k].\,</math> + +And because of {{EquationNote|Eq.5}}, we may write: + +:<math> +\begin{align} +O_n\{\delta[m-k];\ m\}\ &\stackrel{\quad}{=}\ O_{n-k}\{\delta[m];\ m\}\\ +&\stackrel{\text{def}}{=}\ h[n-k].\, +\end{align} +</math> + +Therefore: + +:{| +|<math>y[n]\,</math> +|<math>= \sum_{k=-\infty}^{\infty} x[k]\cdot h[n-k]\,</math> +|- +| +|<math>= \sum_{k=-\infty}^{\infty} x[n-k]\cdot h[k],</math> &nbsp; &nbsp; &nbsp; ([[Convolution#Commutativity|commutativity]]) +|} + +which is the familiar discrete convolution formula. The operator <math>O_n\,</math> can therefore be interpreted as proportional to a weighted average of the function '''x[k]'''. +The weighting function is '''h[-k]''', simply shifted by amount '''n'''. As '''n''' changes, the weighting function emphasizes different parts of the input function. Equivalently, the system's response to an impulse at '''n=0''' is a "time" reversed copy of the unshifted weighting function. When '''h[k]''' is zero for all negative '''k''', the system is said to be [[Causal system|causal]]. + +=== Exponentials as eigenfunctions === +An [[eigenfunction]] is a function for which the output of the operator is the same function, just scaled by some amount. 
In symbols, +:<math>\mathcal{H}f = \lambda f</math>, +where ''f'' is the eigenfunction and <math>\lambda</math> is the [[eigenvalue]], a constant. + +The [[exponential function]]s <math>z^n = e^{sT n}</math>, where <math>n \in \mathbb{Z}</math>, are [[eigenfunction]]s of a [[linear]], [[time-invariant]] operator. <math>T \in \mathbb{R}</math> is the sampling interval, and <math>z = e^{sT}, \ z,s \in \mathbb{C}</math>. A simple proof illustrates this concept. + +Suppose the input is <math>x[n] = \,\!z^n</math>. The output of the system with impulse response <math>h[n]</math> is then + +:<math>\sum_{m=-\infty}^{\infty} h[n-m] \, z^m</math> + +which is equivalent to the following by the commutative property of [[convolution]] + +:<math>\sum_{m=-\infty}^{\infty} h[m] \, z^{(n - m)} = z^n \sum_{m=-\infty}^{\infty} h[m] \, z^{-m} = z^n H(z)</math> +where +:<math>H(z)\ \stackrel{\text{def}}{=}\ \sum_{m=-\infty}^\infty h[m] z^{-m}</math> +is dependent only on the parameter ''z''. + +So <math>z^n</math> is an [[eigenfunction]] of an LTI system because the system response is the same as the input times the constant <math>H(z)</math>. + +=== Z and discrete-time Fourier transforms === +The eigenfunction property of exponentials is very useful for both analysis and insight into LTI systems. The [[Z transform]] +:<math>H(z) = \mathcal{Z}\{h[n]\} = \sum_{n=-\infty}^\infty h[n] z^{-n}</math> +is exactly the way to get the eigenvalues from the impulse response. Of particular interest are pure sinusoids, i.e. exponentials of the form <math>e^{j \omega n}</math>, where <math>\omega \in \mathbb{R}</math>. These can also be written as <math>z^n</math> with <math>z = e^{j \omega}</math>. These are generally called complex exponentials even though the argument is purely imaginary. +The [[Discrete-time Fourier transform]] (DTFT) <math>H(e^{j \omega}) = \mathcal{F}\{h[n]\}</math> +gives the eigenvalues of pure sinusoids. Both <math>H(z)</math> and <math>H(e^{j\omega})</math> are called the ''system function'', ''system response'', or ''transfer function''. + +The Z transform is usually used in the context of one-sided signals, i.e. signals that are zero for all values of ''n'' less than some value. Usually, this "start time" is set to zero, for convenience and without loss of generality. The Fourier transform is used for analyzing signals that are infinite in extent. + +Due to the convolution property of both of these transforms, the convolution that gives the output of the system can be transformed to a multiplication in the transform domain. That is, +:<math>y[n] = (h*x)[n] = \sum_{m=-\infty}^\infty h[n-m] x[m] = \mathcal{Z}^{-1}\{H(z)X(z)\}.</math> + +Just as with the Laplace transform transfer function in continuous-time system analysis, the Z transform makes it easier to analyze systems and gain insight into their behavior. One can look at the modulus of the system function ''|H(z)|'' to see whether the input <math>z^n</math> is ''passed'' (let through) by the system, or ''rejected'' or ''attenuated'' by the system (not let through). + +=== Examples === +*A simple example of an LTI operator is the delay operator <math>D\{x[n]\} \ \stackrel{\text{def}}{=}\ x[n-1]</math>. +**<math> D \left( c_1 x_1[n] + c_2 x_2[n] \right) = c_1 x_1[n-1] + c_2 x_2[n-1] = c_1 Dx_1[n] + c_2 Dx_2[n]</math> &nbsp; (i.e., it is linear) +**<math> D\{x[n-m]\} = x[n-m-1] = x[(n-1)-m] = D\{x\}[n-m]\,</math> &nbsp; (i.e., it is time invariant) +:The Z transform of the delay operator is a simple multiplication by ''z''<sup>-1</sup>.
That is, +::<math> \mathcal{Z}\left\{Dx[n]\right\} = z^{-1} X(z). </math> + +*Another simple LTI operator is the averaging operator +::<math> \mathcal{A}\left\{x[n]\right\}\ \stackrel{\text{def}}{=}\ \sum_{k=n-a}^{n+a} x[k]</math>. +:Because of the linearity of sums, +::<math>\begin{align} +\mathcal{A}\left\{c_1 x_1[n] + c_2 x_2[n] \right\} +&= \sum_{k=n-a}^{n+a} \left( c_1 x_1[k] + c_2 x_2[k] \right)\\ +&= c_1 \sum_{k=n-a}^{n+a} x_1[k] + c_2 \sum_{k=n-a}^{n+a} x_2[k]\\ +&= c_1 \mathcal{A}\left\{x_1[n] \right\} + c_2 \mathcal{A}\left\{x_2[n] \right\}, +\end{align}</math> +:and so it is linear. Because, +::<math>\begin{align} +\mathcal{A}\left\{x[n-m]\right\} +&= \sum_{k=n-a}^{n+a} x[k-m]\\ +&= \sum_{k'=(n-m)-a}^{(n-m)+a} x[k']\\ +&= \mathcal{A}\left\{x\right\}[n-m], +\end{align}</math> +:it is also time invariant. + +=== Important system properties === +The input-output characteristics of a discrete-time LTI system are completely described by its impulse response <math>h[n]</math>. +Some of the most important properties of a system are causality and stability. Unlike CT systems, non-causal DT systems can be realized. It is trivial to make an acausal [[Finite Impulse Response|FIR]] system causal by adding delays. It is even possible to make acausal [[Infinite impulse response|IIR]] systems.<ref>Vaidyanathan, 1995</ref> Non-stable systems can be built and can be useful in many circumstances. Even [[Quadrature filter|non-real]] systems can be built and are very useful in many contexts. + +==== Causality ==== +{{main|Causal system}} +<!--the causal system article needs work--> +A discrete-time LTI system is causal if the current value of the output depends only on the current and past values of the input.<ref>Phillips 2007, p. 508.</ref> A necessary and sufficient condition for causality is + +:<math>h[n] = 0 \ \forall n < 0,</math> + +where <math>h[n]</math> is the impulse response. It is not possible in general to determine causality from the Z transform, because the inverse transform is not unique. When a [[region of convergence]] is specified, then causality can be determined. + +==== Stability ==== +{{main|BIBO stability}} +A system is '''bounded-input, bounded-output stable''' (BIBO stable) if, for every bounded input, the output is finite. Mathematically, if + +:<math>\ \|x[n]\|_{\infty} < \infty</math> + +implies that + +:<math>\ \|y[n]\|_{\infty} < \infty</math> + +(that is, if bounded input implies bounded output, in the sense that the [[Infinity norm|maximum absolute values]] of <math>x[n]</math> and <math>y[n]</math> are finite), then the system is stable. A necessary and sufficient condition is that <math>h[n]</math>, the impulse response, satisfies + +:<math>\|h[n]\|_1\ \stackrel{\text{def}}{=}\ \sum_{n = -\infty}^\infty |h[n]| < \infty.</math> + +In the frequency domain, the [[region of convergence]] must contain the [[unit circle]] (i.e., the [[locus (mathematics)|locus]] satisfying <math>|z|=1</math> for complex ''z''). + +==Notes== +{{Reflist}} + +== See also == +* [[Circulant matrix]] +* [[Frequency response]] +* [[Impulse response]] +* [[System analysis]] +* [[Green's function|Green function]] + +==References== +* {{cite book + | author=Phillips, C.L., Parr, J.M., & Riskin, E.A. + | title=Signals, Systems and Transforms + | publisher=Prentice Hall + | year=2007 | isbn=0-13-041207-4}} +* {{cite book + | author=Hespanha, J.P.
+ | title=Linear System Theory + | publisher=Princeton university press + | year=2009| isbn=0-691-14021-9}} +* {{citation|last=Crutchfield|first=Steve|url=http://www.jhu.edu/signals/convolve/index.html|title=The Joy of Convolution +|work=Johns Hopkins University |date=October 12, 2010 |accessdate=November 21, 2010}} +* {{cite journal +| last1=Vaidyanathan +| first1=P. P. +| last2=Chen +| first2=T. +| title=Role of anticausal inverses in multirate filter banks &mdash; Part I: system theoretic fundamentals +| journal=IEEE Trans. Signal Proc. +|date=May 1995 +| doi=10.1109/78.382395 +| volume=43 +| pages=1090 +| issue=6 +|bibcode = 1995ITSP...43.1090V }} + +== Further reading == +{{refbegin}} +* {{cite book +| first=Boaz +| last=Porat +| authorlink=Boaz Porat +| title=A Course in Digital Signal Processing +| year=1997 +| isbn=978-0-471-14961-3 +| publisher=John Wiley +| location=New York +}} + +* {{cite journal +| last1=Vaidyanathan +| first1=P. P. +| last2=Chen +| first2=T. +| title=Role of anticausal inverses in multirate filter banks &mdash; Part I: system theoretic fundamentals +| journal=IEEE Trans. Signal Proc. +|date=May 1995 +| doi=10.1109/78.382395 +| volume=43 +| pages=1090 +| issue=5 +|bibcode = 1995ITSP...43.1090V }} + +== External links == +* [http://www.tedpavlic.com/teaching/osu/ece209/support/circuits_sys_review.pdf ECE 209: Review of Circuits as LTI Systems]&nbsp;&ndash; Short primer on the mathematical analysis of (electrical) LTI systems. +* [http://www.tedpavlic.com/teaching/osu/ece209/lab3_opamp_FO/lab3_opamp_FO_phase_shift.pdf ECE 209: Sources of Phase Shift]&nbsp;&ndash; Gives an intuitive explanation of the source of phase shift in two common electrical LTI systems. +*[http://www.ece.jhu.edu/~cooper/courses/214/signalsandsystemsnotes.pdf JHU 520.214 Signals and Systems course notes]. An encapsulated course on LTI system theory. Adequate for self teaching. + +{{DEFAULTSORT:Lti System Theory}} +[[Category:Digital signal processing]] +[[Category:Electrical engineering]] +[[Category:Control theory]] +[[Category:Signal processing]] +[[Category:Frequency domain analysis]] +[[Category:Time domain analysis]] + n1ahssrj45wikl2nhvjjet8d3vhg0e9 + + + + Phase-shift keying + 0 + 1319 + + 1320 + 2014-01-31T14:54:25Z + + RomanSpa + 0 + + + /* Demodulation */ + wikitext + text/x-wiki + {{Modulation techniques}} + +'''Phase-shift keying''' ('''PSK''') is a [[Digital data|digital]] [[modulation]] scheme that conveys [[Data#Uses of data in computing|data]] by changing, or modulating, the [[Phase (waves)|phase]] of a reference [[Signal (information theory)|signal]] (the [[carrier wave]]). + +Any digital modulation scheme uses a [[wiktionary:finite|finite]] number of distinct signals to represent digital data. PSK uses a finite number of phases, each assigned a unique pattern of [[bit|binary digit]]s. Usually, each phase encodes an equal number of bits. Each pattern of bits forms the [[Symbol (data)|symbol]] that is represented by the particular phase. The [[demodulator]], which is designed specifically for the symbol-set used by the modulator, determines the phase of the received signal and maps it back to the symbol it represents, thus recovering the original data. This requires the receiver to be able to compare the phase of the received signal to a reference signal &mdash; such a system is termed coherent (and referred to as CPSK). + +Alternatively, instead of operating with respect to a constant reference wave, the broadcast can operate with respect to itself. 
Changes in phase of a single broadcast waveform can be considered the significant items. In this system, the demodulator determines the changes in the phase of the received signal rather than the phase (relative to a reference wave) itself. Since this scheme depends on the difference between successive phases, it is termed '''differential phase-shift keying (DPSK)'''. DPSK can be significantly simpler to implement than ordinary PSK since there is no need for the demodulator to have a copy of the reference signal to determine the exact phase of the received signal (it is a non-coherent scheme). In exchange, it produces more erroneous demodulation. + +==Introduction== + +There are three major classes of [[modulation#Digital modulation methods|digital modulation]] techniques used for transmission of [[Digital data|digital]]ly represented data: + +* [[Amplitude-shift keying]] (ASK) +* [[Frequency-shift keying]] (FSK) +* Phase-shift keying (PSK) + +All convey data by changing some aspect of a base signal, the [[carrier wave]] (usually a [[Sine wave|sinusoid]]), in response to a data signal. In the case of PSK, the phase is changed to represent the data signal. There are two fundamental ways of utilizing the phase of a signal in this way: + +* By viewing the [[Phase (waves)|phase]] itself as conveying the information, in which case the [[demodulator]] must have a reference signal to compare the received signal's phase against; or +* By viewing the ''change'' in the phase as conveying information &mdash; [[#Differential encoding|''differential'']] schemes, [[Phase-shift keying#Differential phase-shift keying .28DPSK.29|some]] of which do not need a reference carrier (to a certain extent). + +A convenient way to represent PSK schemes is on a [[constellation diagram]]. This shows the points in the [[complex plane]] where, in this context, the [[real number|real]] and [[imaginary number|imaginary]] axis are termed the in-phase and quadrature axes respectively due to their 90° separation. Such a representation on perpendicular axes lends itself to straightforward implementation. The amplitude of each point along the in-phase axis is used to modulate a cosine (or sine) wave and the amplitude along the quadrature axis to modulate a sine (or cosine) wave. + +In PSK, the [[constellation diagram|constellation points]] chosen are usually positioned with uniform [[angle|angular]] spacing around a [[circle]]. This gives maximum phase-separation between adjacent points and thus the best immunity to corruption. They are positioned on a circle so that they can all be transmitted with the same energy. In this way, the moduli of the complex numbers they represent will be the same and thus so will the amplitudes needed for the cosine and sine waves. Two common examples are "binary phase-shift keying" ([[Phase-shift keying#Binary phase-shift keying (BPSK)|BPSK]]) which uses two phases, and "quadrature phase-shift keying" ([[Phase-shift keying#Quadrature phase-shift keying (QPSK)|QPSK]]) which uses four phases, although any number of phases may be used. Since the data to be conveyed are usually binary, the PSK scheme is usually designed with the number of constellation points being a [[power (mathematics)|power]] of 2. 
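+
+Such a constellation is simple to generate numerically. The following Python/NumPy sketch (the unit symbol energy and the phase offsets used for the two examples are arbitrary choices) places ''M'' points with uniform angular spacing on a circle in the complex plane:
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def psk_constellation(M, energy=1.0, phase_offset=0.0):
+    """Return M complex constellation points, uniformly spaced around a circle."""
+    k = np.arange(M)
+    return np.sqrt(energy) * np.exp(1j * (2 * np.pi * k / M + phase_offset))
+
+bpsk = psk_constellation(2)                          # points at 0 and 180 degrees
+qpsk = psk_constellation(4, phase_offset=np.pi / 4)  # points at 45, 135, 225, 315 degrees
+
+print(np.round(bpsk, 3))
+print(np.round(qpsk, 3))
+print(np.allclose(np.abs(qpsk), 1.0))                # all points carry the same energy
+</syntaxhighlight>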
+ +===Definitions=== + +For determining error-rates mathematically, some definitions will be needed: + +*<math>E_b</math> = Energy-per-[[bit]] +*<math>E_s</math> = Energy-per-symbol = <math>nE_b</math> with ''n'' bits per symbol +*<math>T_b</math> = [[Bit rate|Bit duration]] +*<math>T_s</math> = [[Symbol rate|Symbol duration]] +*<math>N_0/2</math> = [[Signal noise|Noise]] [[spectral density|power spectral density]] ([[Watt|W]]/[[Hertz|Hz]]) +*<math>P_b</math> = [[Probability]] of bit-error +*<math>P_s</math> = Probability of symbol-error + +<math>Q(x)</math> will give the probability that a single sample taken from a random process with zero-mean and unit-variance [[Normal distribution|Gaussian probability density function]] will be greater or equal to <math>x</math>. It is a scaled form of the [[Error function|complementary Gaussian error function]]: +: <math>Q(x) = \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty}e^{-t^{2}/2}dt = \frac{1}{2}\,\operatorname{erfc}\left(\frac{x}{\sqrt{2}}\right),\ x\geq{}0</math>. + +The error-rates quoted here are those in [[additive white Gaussian noise|additive]] [[white noise|white]] [[Gaussian noise]] ([[AWGN]]). These error rates are lower than those computed in [[fading channel]]s, hence, are a good theoretical benchmark to compare with. + +==Applications== + +Owing to PSK's simplicity, particularly when compared with its competitor [[quadrature amplitude modulation]], it is widely used in existing technologies. + +The [[wireless LAN]] standard, [[IEEE 802.11b-1999]],<ref name="ref80211">[http://standards.ieee.org/getieee802/download/802.11-1999.pdf IEEE Std 802.11-1999: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications] &mdash; the overarching IEEE 802.11 specification.{{dead link|date=October 2011}}</ref><ref name="80211b">[http://standards.ieee.org/getieee802/download/802.11b-1999.pdf IEEE Std 802.11b-1999 (R2003)] &mdash; the IEEE 802.11b specification.</ref> uses a variety of different PSKs depending on the data-rate required. At the basic-rate of 1 [[Mbit]]/s, it uses DBPSK (differential BPSK). To provide the extended-rate of 2 Mbit/s, DQPSK is used. In reaching 5.5 Mbit/s and the full-rate of 11 Mbit/s, QPSK is employed, but has to be coupled with [[complementary code keying]]. The higher-speed wireless LAN standard, [[IEEE 802.11g-2003]]<ref name="ref80211" /><ref name="80211g">[http://standards.ieee.org/getieee802/download/802.11g-2003.pdf IEEE Std 802.11g-2003] &mdash; the IEEE 802.11g specification.</ref> has eight data rates: 6, 9, 12, 18, 24, 36, 48 and 54 Mbit/s. The 6 and 9 Mbit/s modes use [[Orthogonal frequency-division multiplexing|OFDM]] modulation where each sub-carrier is BPSK modulated. The 12 and 18 Mbit/s modes use OFDM with QPSK. The fastest four modes use OFDM with forms of [[quadrature amplitude modulation]]. + +Because of its simplicity BPSK is appropriate for low-cost passive transmitters, and is used in [[RFID]] standards such as [[ISO/IEC 14443]] which has been adopted for [[biometric passport]]s, credit cards such as [[American Express]]'s [[ExpressPay]], and many other applications.<ref>[http://www.atmel.com/dyn/resources/prod_documents/doc2056.pdf Understanding the Requirements of ISO/IEC 14443 for Type B Proximity Contactless Identification Cards], Application Note, Rev. 2056B–RFID–11/05, 2005, ATMEL</ref> + +[[Bluetooth]] 2 will use <math>\pi/4</math>-DQPSK at its lower rate (2 Mbit/s) and 8-DPSK at its higher rate (3 Mbit/s) when the link between the two devices is sufficiently robust. 
Bluetooth 1 modulates with [[Minimum-shift keying|Gaussian minimum-shift keying]], a binary scheme, so either modulation choice in version 2 will yield a higher data-rate. A similar technology, [[IEEE 802.15.4]] (the wireless standard used by [[ZigBee]]) also relies on PSK. IEEE 802.15.4 allows the use of two frequency bands: 868&ndash;915 [[Megahertz|MHz]] using BPSK and at 2.4 [[Hertz|GHz]] using OQPSK. + +Notably absent from these various schemes is 8-PSK. This is because its error-rate performance is close to that of [[quadrature amplitude modulation|16-QAM]] &mdash; it is only about 0.5 [[decibel|dB]] better{{Citation needed|date=October 2007}} &mdash; but its data rate is only three-quarters that of 16-QAM. Thus 8-PSK is often omitted from standards and, as seen above, schemes tend to 'jump' from QPSK to 16-QAM ([[Quadrature amplitude modulation#Rectangular QAM|8-QAM]] is possible but difficult to implement). + +Included among the exceptions is [[HughesNet]] satellite ISP. For example, the model HN7000S modem +(on KU-band satcom) uses 8-PSK modulation. + +==Binary phase-shift keying (BPSK)== + +[[File:BPSK Gray Coded.svg|200px|right|thumb|Constellation diagram example for BPSK.]] + +BPSK (also sometimes called PRK, phase reversal keying, or 2PSK) is the simplest form of phase shift keying (PSK). It uses two phases which are separated by 180° and so can also be termed 2-PSK. It does not particularly matter exactly where the constellation points are positioned, and in this figure they are shown on the real axis, at 0° and 180°. This modulation is the most robust of all the PSKs since it takes the highest level of noise or distortion to make the [[demodulator]] reach an incorrect decision. It is, however, only able to modulate at 1 bit/symbol (as seen in the figure) and so is unsuitable for high data-rate applications. + + + +In the presence of an arbitrary phase-shift introduced by the [[communications channel]], the demodulator is unable to tell which constellation point is which. As a result, the data is often [[#Differential encoding|differentially encoded]] prior to modulation. + +BPSK is functionally equivalent to [[Quadrature amplitude modulation|2-QAM]] modulation. + +=== Implementation === +The general form for BPSK follows the equation: +:<math>s_n(t) = \sqrt{\frac{2E_b}{T_b}} \cos(2 \pi f_c t + \pi(1-n )), n = 0,1. </math> +This yields two phases, 0 and π. +In the specific form, binary data is often conveyed with the following signals: +:<math>s_0(t) = \sqrt{\frac{2E_b}{T_b}} \cos(2 \pi f_c t + \pi ) + = - \sqrt{\frac{2E_b}{T_b}} \cos(2 \pi f_c t)</math> for binary "0" +:<math>s_1(t) = \sqrt{\frac{2E_b}{T_b}} \cos(2 \pi f_c t) </math> for binary "1" +where ''f''<sub>''c''</sub> is the frequency of the carrier-wave. + +Hence, the signal-space can be represented by the single [[basis function]] +:<math>\phi(t) = \sqrt{\frac{2}{T_b}} \cos(2 \pi f_c t) </math> +where 1 is represented by <math>\sqrt{E_b} \phi(t)</math> and 0 is represented by <math>-\sqrt{E_b} \phi(t)</math>. This assignment is, of course, arbitrary. + +This use of this basis function is shown at the [[#timing|end of the next section]] in a signal timing diagram. The topmost signal is a BPSK-modulated cosine wave that the BPSK modulator would produce. The bit-stream that causes this output is shown above the signal (the other parts of this figure are relevant only to QPSK). + +=== Bit error rate === + +The [[bit error rate]] (BER) of BPSK in [[AWGN]] can be calculated as:<ref>Communications Systems, H. Stern & S. 
Mahmoud, Pearson Prentice Hall, 2004, p283</ref> +:<math>P_b = Q\left(\sqrt{\frac{2E_b}{N_0}}\right)</math> or <math>P_b = \frac{1}{2} \operatorname{erfc} \left( \sqrt{\frac{E_b}{N_0}}\right)</math> + +Since there is only one bit per symbol, this is also the symbol error rate. + +==Quadrature phase-shift keying (QPSK)== + +[[File:QPSK Gray Coded.svg|200px|right|thumb|Constellation diagram for QPSK with [[Gray coding]]. Each adjacent symbol only differs by one bit.]] + +Sometimes this is known as ''quaternary PSK'', ''quadriphase PSK'', 4-PSK, or 4-[[QAM]]. (Although the root concepts of QPSK and 4-QAM are different, the resulting modulated radio waves are exactly the same.) QPSK uses four points on the constellation diagram, equispaced around a circle. With four phases, QPSK can encode two bits per symbol, shown in the diagram with [[Gray coding]] to minimize the [[bit error rate]] (BER) &mdash; sometimes misperceived as twice the BER of BPSK. + +The mathematical analysis shows that QPSK can be used either to double the data rate compared with a BPSK system while maintaining the ''same'' [[bandwidth (signal processing)|bandwidth]] of the signal, or to ''maintain the data-rate of BPSK'' but halving the bandwidth needed. In this latter case, the BER of QPSK is ''exactly the same'' as the BER of BPSK - and deciding differently is a common confusion when considering or describing QPSK. The transmitted carrier can undergo numbers of phase changes. + +Given that radio communication channels are allocated by agencies such as the [[Federal Communication Commission]] giving a prescribed (maximum) bandwidth, the advantage of QPSK over BPSK becomes evident: QPSK transmits twice the data rate in a given bandwidth compared to BPSK - at the same BER. The engineering penalty that is paid is that QPSK transmitters and receivers are more complicated than the ones for BPSK. However, with modern [[electronics]] technology, the penalty in cost is very moderate. + +As with BPSK, there are phase ambiguity problems at the receiving end, and [[#Differential encoding|differentially encoded]] QPSK is often used in practice. + +=== Implementation === + +The implementation of QPSK is more general than that of BPSK and also indicates the implementation of higher-order PSK. Writing the symbols in the constellation diagram in terms of the sine and cosine waves used to transmit them: + +:<math>s_n(t) = \sqrt{\frac{2E_s}{T_s}} \cos \left ( 2 \pi f_c t + (2n -1) \frac{\pi}{4}\right ),\quad n = 1, 2, 3, 4. </math> + +This yields the four phases π/4, 3π/4, 5π/4 and 7π/4 as needed. + +This results in a two-dimensional signal space with unit [[basis functions]] +:<math>\phi_1(t) = \sqrt{\frac{2}{T_s}} \cos (2 \pi f_c t) </math> +:<math>\phi_2(t) = \sqrt{\frac{2}{T_s}} \sin (2 \pi f_c t) </math> +The first basis function is used as the in-phase component of the signal and the second as the quadrature component of the signal. + +Hence, the signal constellation consists of the signal-space 4 points + +:<math>\left ( \pm \sqrt{E_s/2}, \pm \sqrt{E_s/2} \right ).</math> + +The factors of 1/2 indicate that the total power is split equally between the two carriers. + +Comparing these basis functions with that for BPSK shows clearly how QPSK can be viewed as two independent BPSK signals. Note that the signal-space points for BPSK do not need to split the symbol (bit) energy over the two carriers in the scheme shown in the BPSK constellation diagram. + +QPSK systems can be implemented in a number of ways. 
An illustration of the major components of the transmitter and receiver structure are shown below. + +[[File:Transmitter QPSK 2.PNG|thumb|600px|center|Conceptual transmitter structure for QPSK. The binary data stream is split into the in-phase and quadrature-phase components. These are then separately modulated onto two orthogonal basis functions. In this implementation, two sinusoids are used. Afterwards, the two signals are superimposed, and the resulting signal is the QPSK signal. Note the use of polar non-return-to-zero encoding. These encoders can be placed before for binary data source, but have been placed after to illustrate the conceptual difference between digital and analog signals involved with digital modulation.]] + +[[File:Receiver QPSK.PNG|thumb|600px|center|Receiver structure for QPSK. The matched filters can be replaced with correlators. Each detection device uses a reference threshold value to determine whether a 1 or 0 is detected.]] + +=== Bit error rate === + +Although QPSK can be viewed as a quaternary modulation, it is easier to see it as two independently modulated quadrature carriers. With this interpretation, the even (or odd) bits are used to modulate the in-phase component of the carrier, while the odd (or even) bits are used to modulate the quadrature-phase component of the carrier. BPSK is used on both carriers and they can be independently demodulated. + +As a result, the probability of bit-error for QPSK is the same as for BPSK: +:<math>P_b = Q\left(\sqrt{\frac{2E_b}{N_0}}\right).</math> + +However, in order to achieve the same bit-error probability as BPSK, QPSK uses twice the power (since two bits are transmitted simultaneously). + +The symbol error rate is given by: +<!--Note this needs to be in a table to make the alignment and spacing right. --> +{| +|<math>\,\!P_s</math> +|<math>= 1 - \left( 1 - P_b \right)^2</math> +|- +| +|<math>= 2Q\left( \sqrt{\frac{E_s}{N_0}} \right) - \left[ Q \left( \sqrt{\frac{E_s}{N_0}} \right) \right]^2</math>. +|} + +If the [[signal-to-noise ratio]] is high (as is necessary for practical QPSK systems) the probability of symbol error may be approximated: +:<math>P_s \approx 2 Q \left( \sqrt{\frac{E_s}{N_0}} \right )</math> + + + +<div id="timing">The modulated signal is shown below for a short segment of a random binary data-stream. The two carrier waves are a cosine wave and a sine wave, as indicated by the signal-space analysis above. Here, the odd-numbered bits have been assigned to the in-phase component and the even-numbered bits to the quadrature component (taking the first bit as number 1). The total signal &mdash; the sum of the two components &mdash; is shown at the bottom. Jumps in phase can be seen as the PSK changes the phase on each component at the start of each bit-period. The topmost waveform alone matches the description given for BPSK above.</div> + +[[File:QPSK timing diagram.png|frame|center|Timing diagram for QPSK. The binary data stream is shown beneath the time axis. The two signal components with their bit assignments are shown the top and the total, combined signal at the bottom. Note the abrupt changes in phase at some of the bit-period boundaries.]] + +The binary data that is conveyed by this waveform is: 1 1 0 0 0 1 1 0. 
+ +* The odd bits, highlighted here, contribute to the in-phase component: '''<u>1</u>''' 1 '''<u>0</u>''' 0 '''<u>0</u>''' 1 '''<u>1</u>''' 0 +* The even bits, highlighted here, contribute to the quadrature-phase component: 1 '''<u>1</u>''' 0 '''<u>0</u>''' 0 '''<u>1</u>''' 1 '''<u>0</u>''' + +===Variants=== + +====Offset QPSK (OQPSK)==== + +[[File:Pi-by-O-QPSK Gray Coded.svg|thumb|Signal doesn't cross zero, because only one bit of the symbol is changed at a time]] + +''Offset quadrature phase-shift keying'' (''OQPSK'') is a variant of phase-shift keying modulation using 4 different values of the phase to transmit. It is sometimes called ''Staggered quadrature phase-shift keying'' (''SQPSK''). + +[[File:Oqpsk phase plot.svg|thumb|Difference of the phase between QPSK and OQPSK]] + +Taking four values of the phase (two [[bit]]s) at a time to construct a QPSK symbol can allow the phase of the signal to jump by as much as 180° at a time. When the signal is low-pass filtered (as is typical in a transmitter), these phase-shifts result in large amplitude fluctuations, an undesirable quality in communication systems. By offsetting the timing of the odd and even bits by one bit-period, or half a symbol-period, the in-phase and quadrature components will never change at the same time. In the constellation diagram shown on the right, it can be seen that this will limit the phase-shift to no more than 90° at a time. This yields much lower amplitude fluctuations than non-offset QPSK and is sometimes preferred in practice. + +The picture on the right shows the difference in the behavior of the phase between ordinary QPSK and OQPSK. It can be seen that in the first plot the phase can change by 180° at once, while in OQPSK the changes are never greater than 90°. + +The modulated signal is shown below for a short segment of a random binary data-stream. Note the half symbol-period offset between the two component waves. The sudden phase-shifts occur about twice as often as for QPSK (since the signals no longer change together), but they are less severe. In other words, the magnitude of jumps is smaller in OQPSK when compared to QPSK. + +[[File:OQPSK timing diagram.png|frame|center|Timing diagram for offset-QPSK. The binary data stream is shown beneath the time axis. The two signal components with their bit assignments are shown the top and the total, combined signal at the bottom. Note the half-period offset between the two signal components.]] + +====''&pi;'' /4&ndash;QPSK==== + +[[File:Pi-by-4-QPSK Gray Coded.svg|thumb|right|Dual constellation diagram for π/4-QPSK. This shows the two separate constellations with identical Gray coding but rotated by 45° with respect to each other.]] + +This variant of QPSK uses two identical constellations which are rotated by 45° (<math>\pi/4</math> radians, hence the name) with respect to one another. Usually, either the even or odd symbols are used to select points from one of the constellations and the other symbols select points from the other constellation. This also reduces the phase-shifts from a maximum of 180°, but only to a maximum of 135° and so the amplitude fluctuations of <math>\pi/4</math>&ndash;QPSK are between OQPSK and non-offset QPSK. + +One property this modulation scheme possesses is that if the modulated signal is represented in the complex domain, it does not have any paths through the origin. In other words, the signal does not pass through the origin. 
This lowers the dynamical range of fluctuations in the signal which is desirable when engineering communications signals. + +On the other hand, <math>\pi/4</math>&ndash;QPSK lends itself to easy demodulation and has been adopted for use in, for example, [[Time division multiple access|TDMA]] [[cellular telephone]] systems. + +The modulated signal is shown below for a short segment of a random binary data-stream. The construction is the same as above for ordinary QPSK. Successive symbols are taken from the two constellations shown in the diagram. Thus, the first symbol (1 1) is taken from the 'blue' constellation and the second symbol (0 0) is taken from the 'green' constellation. Note that magnitudes of the two component waves change as they switch between constellations, but the total signal's magnitude remains constant ([[constant envelope]]). The phase-shifts are between those of the two previous timing-diagrams. + +[[File:Pi-by-4-QPSK timing diagram.png|frame|center|Timing diagram for π/4-QPSK. The binary data stream is shown beneath the time axis. The two signal components with their bit assignments are shown the top and the total, combined signal at the bottom. Note that successive symbols are taken alternately from the two constellations, starting with the 'blue' one.]] + +==== SOQPSK ==== + +The license-free '''shaped-offset QPSK''' (SOQPSK) is interoperable with Feher-patented QPSK ('''FQPSK'''), in the sense that an integrate-and-dump offset QPSK detector produces the same output no matter which kind of transmitter is used.<ref> +Tom Nelson, Erik Perrins, and Michael Rice. +[http://people.eecs.ku.edu/~esp/publications/c2005ItcCommon.pdf "Common detectors for Tier 1 modulations"]. + +T. Nelson, E. Perrins, M. Rice. +[http://www.researchgate.net/publication/4213516_Common_detectors_for_shaped_offset_QPSK_(SOQPSK)_and_Feher-patented_QPSK_(FQPSK) "Common detectors for shaped offset QPSK (SOQPSK) and Feher-patented QPSK (FQPSK)"] +{{cite doi|10.1109/GLOCOM.2005.1578470}} +ISBN 0-7803-9414-3 +</ref> + +These modulations carefully shape the I and Q waveforms such that they change very smoothly, and the signal stays constant-amplitude even during signal transitions. (Rather than traveling instantly from one symbol to another, or even linearly, it travels smoothly around the constant-amplitude circle from one symbol to the next.) + +The standard description of SOQPSK-TG involves [[ternary signal|ternary symbols]]. + +==== DPQPSK ==== + +'''Dual-polarization quadrature phase shift keying''' (DPQPSK) or '''dual-polarization QPSK''' - involves the polarization multiplexing of two different QPSK signals, thus improving the spectral efficiency by a factor of 2. This is a cost-effective alternative, to utilizing 16-PSK instead of QPSK to double the spectral efficiency. + +==Higher-order PSK== + +[[File:8PSK Gray Coded.svg|200px|right|thumb|Constellation diagram for 8-PSK with Gray coding.]] + +Any number of phases may be used to construct a PSK constellation but 8-PSK is usually the highest order PSK constellation deployed. With more than 8 phases, the error-rate becomes too high and there are better, though more complex, modulations available such as [[quadrature amplitude modulation]] (QAM). Although any number of phases may be used, the fact that the constellation must usually deal with binary data means that the number of symbols is usually a power of 2 &mdash; this allows an equal number of bits-per-symbol. 
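+
+To make the bits-per-symbol point concrete, the following Python/NumPy sketch (the binary-reflected Gray labelling and the zero phase offset are assumptions made for the example, and the helper names are not standard) maps groups of log<sub>2</sub>(''M'') = 3 bits to 8-PSK points so that adjacent constellation points differ in exactly one bit:
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+M = 8                                    # 8-PSK: log2(M) = 3 bits per symbol
+k = np.arange(M)
+points = np.exp(1j * 2 * np.pi * k / M)  # unit-energy constellation positions
+
+# Binary-reflected Gray labels: neighbouring positions differ in exactly one bit.
+gray = k ^ (k >> 1)
+position_of_bits = np.empty(M, dtype=int)
+position_of_bits[gray] = k               # bit pattern (as an integer) -> position on the circle
+
+def modulate(bits):
+    """Map a flat bit sequence (length divisible by 3) to 8-PSK symbols."""
+    groups = np.asarray(bits).reshape(-1, 3)
+    values = groups @ np.array([4, 2, 1])            # each bit triple -> integer 0..7
+    return points[position_of_bits[values]]
+
+print(modulate([0, 0, 0, 0, 0, 1, 1, 1, 1]))         # three symbols
+</syntaxhighlight>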
+ +===Bit error rate=== +For the general <math>M</math>-PSK there is no simple expression for the symbol-error probability if <math>M>4</math>. Unfortunately, it can only be obtained from: +:<math> +P_s = 1 - \int_{-\frac{\pi}{M}}^{\frac{\pi}{M}}p_{\theta_{r}}\left(\theta_{r}\right)d\theta_{r} +</math> + +where + +:<math>p_{\theta_{r}}\left(\theta_r\right) = \frac{1}{2\pi}e^{-2\gamma_{s}\sin^{2}\theta_{r}}\int_{0}^{\infty}Ve^{-\left(V-\sqrt{4\gamma_{s}}\cos\theta_{r}\right)^{2}/2}dV</math>, +:<math>V = \sqrt{r_1^2 + r_2^2}</math>, +:<math>\theta_r = \tan^{-1}\left(r_2/r_1\right)</math>, +:<math>\gamma_{s} = \frac{E_{s}}{N_{0}}</math> and +:<math>r_1 \sim{} N\left(\sqrt{E_s},N_{0}/2\right)</math> and <math>r_2 \sim{} N\left(0,N_{0}/2\right)</math> are jointly Gaussian [[random variable]]s. + +[[File:PSK BER curves.svg|thumb|left|280px|Bit-error rate curves for BPSK, QPSK, 8-PSK and 16-PSK, AWGN channel.]] +This may be approximated for high <math>M</math> and high <math>E_b/N_0</math> by: +:<math>P_s \approx 2Q\left(\sqrt{2\gamma_s}\sin\frac{\pi}{M}\right)</math>. + +The bit-error probability for <math>M</math>-PSK can only be determined exactly once the bit-mapping is known. However, when [[Gray code|Gray coding]] is used, the most probable error from one symbol to the next produces only a single bit-error and +:<math>P_b \approx \frac{1}{k}P_s</math>. +(Using Gray coding allows us to approximate the [[Lee distance]] of the errors as the [[Hamming distance]] of the errors in the decoded bitstream, which is easier to implement in hardware.) + +The graph on the left compares the bit-error rates of BPSK, QPSK (which are the same, as noted above), 8-PSK and 16-PSK. It is seen that [[higher-order modulation]]s exhibit higher error-rates; in exchange however they deliver a higher raw data-rate. + +Bounds on the error rates of various digital modulation schemes can be computed with application of the [[union bound]] to the signal constellation. + +==Differential phase-shift keying (DPSK)== + +===Differential encoding=== + +{{main|differential coding}} + +Differential phase shift keying (DPSK) is a common form of phase modulation that conveys data by changing the phase of the carrier wave. As mentioned for BPSK and QPSK there is an ambiguity of phase if the constellation is rotated by some effect in the [[communications channel]] through which the signal passes. This problem can be overcome by using the data to ''change'' rather than ''set'' the phase. + +For example, in differentially encoded BPSK a binary '1' may be transmitted by adding 180° to the current phase and a binary '0' by adding 0° to the current phase. Another variant of DPSK is Symmetric Differential Phase Shift keying, SDPSK, where encoding would be +90° for a '1' and -90° for a '0'. + +In differentially encoded QPSK (DQPSK), the phase-shifts are 0°, 90°, 180°, -90° corresponding to data '00', '01', '11', '10'. This kind of encoding may be demodulated in the same way as for non-differential PSK but the phase ambiguities can be ignored. Thus, each received symbol is demodulated to one of the <math>M</math> points in the constellation and a [[comparator]] then computes the difference in phase between this received signal and the preceding one. The difference encodes the data as described above. +Symmetric Differential Quadrature Phase Shift Keying (SDQPSK) is like DQPSK, but encoding is symmetric, using phase shift values of -135°, -45°, +45° and +135°. 
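+
+The phase-change mappings just described are straightforward to express in code. The following Python/NumPy sketch (the initial carrier phase of zero and the example bit stream are arbitrary assumptions) differentially encodes pairs of bits as DQPSK phase increments of 0°, 90°, 180° and -90°:
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+# DQPSK phase increments, keyed by the two data bits (mapping as described above).
+phase_step = {(0, 0): 0.0,
+              (0, 1): np.pi / 2,
+              (1, 1): np.pi,
+              (1, 0): -np.pi / 2}
+
+def dqpsk_encode(bits, initial_phase=0.0):
+    """Turn an even-length bit sequence into a sequence of absolute carrier phases."""
+    phase = initial_phase
+    phases = []
+    for pair in zip(bits[0::2], bits[1::2]):
+        phase += phase_step[pair]        # the data select the *change* in phase
+        phases.append(phase)
+    return np.array(phases)
+
+symbol_phases = dqpsk_encode([0, 0, 0, 1, 1, 1, 1, 0])
+print(np.degrees(symbol_phases) % 360)   # [  0.  90. 270. 180.]
+</syntaxhighlight>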
+ +The modulated signal is shown below for both DBPSK and DQPSK as described above. In the figure, it is assumed that the ''signal starts with zero phase'', and so there is a phase shift in both signals at <math>t = 0</math>. + +[[File:DBQPSK timing diag.png|frame|center|Timing diagram for DBPSK and DQPSK. The binary data stream is above the DBPSK signal. The individual bits of the DBPSK signal are grouped into pairs for the DQPSK signal, which only changes every ''T<sub>s</sub>'' = 2''T<sub>b</sub>''.]] + +Analysis shows that differential encoding approximately doubles the error rate compared to ordinary <math>M</math>-PSK but this may be overcome by only a small increase in <math>E_b/N_0</math>. Furthermore, this analysis (and the graphical results below) are based on a system in which the only corruption is additive white Gaussian noise([[AWGN]]). However, there will also be a physical channel between the transmitter and receiver in the communication system. This channel will, in general, introduce an unknown phase-shift to the PSK signal; in these cases the differential schemes can yield a ''better'' error-rate than the ordinary schemes which rely on precise phase information. + +===Demodulation=== + +[[File:DPSK BER curves.svg|thumb|right|280px|BER comparison between DBPSK, DQPSK and their non-differential forms using gray-coding and operating in white noise.]] + +For a signal that has been differentially encoded, there is an obvious alternative method of demodulation. Instead of demodulating as usual and ignoring carrier-phase ambiguity, the phase between two successive received symbols is compared and used to determine what the data must have been. When differential encoding is used in this manner, the scheme is known as differential phase-shift keying (DPSK). Note that this is subtly different from just differentially encoded PSK since, upon reception, the received symbols are ''not'' decoded one-by-one to constellation points but are instead compared directly to one another. + +Call the received symbol in the <math>k</math><sup>th</sup> timeslot <math>r_k</math> and let it have phase <math>\phi_k</math>. Assume without loss of generality that the phase of the carrier wave is zero. Denote the [[AWGN]] term as <math>n_k</math>. Then +:<math>r_k = \sqrt{E_s}e^{j\phi_k} + n_k</math>. + +The decision variable for the <math>k-1</math><sup>th</sup> symbol and the <math>k</math><sup>th</sup> symbol is the phase difference between <math>r_k</math> and <math>r_{k-1}</math>. That is, if <math>r_k</math> is projected onto <math>r_{k-1}</math>, the decision is taken on the phase of the resultant complex number: +:<math>r_kr_{k-1}^{*} = E_se^{j\left(\theta_k - \theta_{k-1}\right)} + \sqrt{E_s}e^{j\theta_k}n_{k-1}^{*} + \sqrt{E_s}e^{-j\theta_{k-1}}n_k + n_kn_{k-1}</math> +where superscript * denotes [[complex conjugation]]. In the absence of noise, the phase of this is <math>\theta_{k}-\theta_{k-1}</math>, the phase-shift between the two received signals which can be used to determine the data transmitted. + +The probability of error for DPSK is difficult to calculate in general, but, in the case of DBPSK it is: +:<math>P_b = \frac{1}{2}e^{-E_b/N_0},</math> +which, when numerically evaluated, is only slightly worse than ordinary BPSK, particularly at higher <math>E_b/N_0</math> values. + +Using DPSK avoids the need for possibly complex carrier-recovery schemes to provide an accurate phase estimate and can be an attractive alternative to ordinary PSK. 
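+
+The closed-form result above can be checked with a short Monte Carlo experiment. The following Python/NumPy sketch (the E<sub>b</sub>/N<sub>0</sub> value, the number of bits and the random seed are arbitrary; the detector is the phase-comparison rule described above) simulates DBPSK over an AWGN channel and compares the measured bit error rate with the value <math>\tfrac{1}{2}e^{-E_b/N_0}</math> quoted above:
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+rng = np.random.default_rng(0)
+EbN0 = 10 ** (6.0 / 10)                          # Eb/N0 of 6 dB, an arbitrary choice
+Eb, N0 = 1.0, 1.0 / EbN0
+num_bits = 200_000
+
+bits = rng.integers(0, 2, num_bits)
+enc = np.concatenate(([0], np.bitwise_xor.accumulate(bits)))   # e_k = e_{k-1} XOR b_k, reference e = 0
+tx = np.sqrt(Eb) * (1 - 2 * enc)                 # BPSK mapping: 0 -> +1, 1 -> -1
+
+noise = np.sqrt(N0 / 2) * (rng.standard_normal(tx.size) + 1j * rng.standard_normal(tx.size))
+rx = tx + noise
+
+decision = np.real(rx[1:] * np.conj(rx[:-1]))    # compare the phases of successive symbols
+detected = (decision < 0).astype(int)            # a phase flip is read as bit 1
+
+print("simulated BER:   ", np.mean(detected != bits))
+print("0.5*exp(-Eb/N0): ", 0.5 * np.exp(-EbN0))
+</syntaxhighlight>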
+ +In [[optical communications]], the data can be modulated onto the phase of a [[laser]] in a differential way. The modulation is a laser which emits a [[continuous wave]], and a [[Mach-Zehnder modulator]] which receives electrical binary data. For the case of BPSK for example, the laser transmits the field unchanged for binary '1', and with reverse polarity for '0'. The demodulator consists of a [[delay line interferometer]] which delays one bit, so two bits can be compared at one time. In further processing, a [[photodiode]] is used to transform the [[optical field]] into an electric current, so the information is changed back into its original state. + +The bit-error rates of DBPSK and DQPSK are compared to their non-differential counterparts in the graph to the right. The loss for using DBPSK is small enough compared to the complexity reduction that it is often used in communications systems that would otherwise use BPSK. For DQPSK though, the loss in performance compared to ordinary QPSK is larger and the system designer must balance this against the reduction in complexity. + +===Example: Differentially encoded BPSK=== + +[[File:Differential Codec.png|center|500px|thumb|Differential encoding/decoding system diagram.]] + +At the <math>k^{\textrm{th}}</math> time-slot call the bit to be modulated <math>b_k</math>, the differentially encoded bit <math>e_k</math> and the resulting modulated signal <math>m_k(t)</math>. Assume that the constellation diagram positions the symbols at ±1 (which is BPSK). The differential encoder produces: +:<math>\,e_k = e_{k-1}\oplus{}b_k</math> +where <math>\oplus{}</math> indicates [[binary addition|binary]] or [[modular arithmetic|modulo-2]] addition. + +[[File:Diff enc BPSK BER curves.svg|thumb|right|280px|BER comparison between BPSK and differentially encoded BPSK with gray-coding operating in white noise.]] + +So <math>e_k</math> only changes state (from binary '0' to binary '1' or from binary '1' to binary '0') if <math>b_k</math> is a binary '1'. Otherwise it remains in its previous state. This is the description of differentially encoded BPSK given above. + +The received signal is demodulated to yield <math>e_k=</math>±1 and then the differential decoder reverses the encoding procedure and produces: +:<math>\,b_k = e_{k}\oplus{}e_{k-1}</math> since binary subtraction is the same as binary addition. + +Therefore, <math>b_k=1</math> if <math>e_k</math> and <math>e_{k-1}</math> differ and <math>b_k=0</math> if they are the same. Hence, if both <math>e_k</math> and <math>e_{k-1}</math> are ''inverted'', <math>b_k</math> will still be decoded correctly. Thus, the 180° phase ambiguity does not matter. + +Differential schemes for other PSK modulations may be devised along similar lines. The waveforms for DPSK are the same as for differentially encoded PSK given above since the only change between the two schemes is at the receiver. + +The BER curve for this example is compared to ordinary BPSK on the right. As mentioned above, whilst the error-rate is approximately doubled, the increase needed in <math>E_b/N_0</math> to overcome this is small. The increase in <math>E_b/N_0</math> required to overcome differential modulation in coded systems, however, is larger - typically about 3 dB. The performance degradation is a result of [[noncoherent transmission]] - in this case it refers to the fact that tracking of the phase is completely ignored. 
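+
+The behaviour described in this example is easy to check directly. The following Python sketch (illustrative only; the function names are ad hoc and not part of any standard library) implements the encoder and decoder above and confirms that inverting every encoded symbol, together with the initial reference, leaves the decoded bits unchanged:
+
+<syntaxhighlight lang="python">
+# Differential encoding/decoding of BPSK bits, demonstrating that a 180°
+# phase ambiguity (every symbol inverted) does not corrupt the decoded data.
+def diff_encode(bits, e_prev=0):
+    encoded = []
+    for b in bits:
+        e_prev ^= b                  # e_k = e_{k-1} XOR b_k
+        encoded.append(e_prev)
+    return encoded
+
+def diff_decode(symbols, e_prev=0):
+    decoded = []
+    for e in symbols:
+        decoded.append(e ^ e_prev)   # b_k = e_k XOR e_{k-1}
+        e_prev = e
+    return decoded
+
+data = [1, 0, 1, 1, 0, 0, 1]
+enc = diff_encode(data)
+assert diff_decode(enc) == data                   # normal reception
+inverted = [1 - e for e in enc]                   # 180° phase ambiguity
+assert diff_decode(inverted, e_prev=1) == data    # reference inverted too
+</syntaxhighlight>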
+ +== Channel capacity == + +[[File:Channel capacity for complex constellations.svg|thumb|300px|Given a fixed bandwidth, channel capacity vs. [[signal-to-noise ratio|SNR]] for some common modulation schemes]] + +Like all M-ary modulation schemes with M = 2<sup>''b''</sup> symbols, when given exclusive access to a fixed bandwidth, the channel capacity of any phase shift keying modulation scheme rises to a maximum of ''b'' bits per symbol as the [[signal-to-noise ratio]] increases. + +==See also== + +*[[Differential coding]] +*[[Filtered symmetric differential phase-shift keying]] +*[[Modulation]] &mdash; for an overview of all modulation schemes +*[[Phase modulation]] (PM) &mdash; the analogue equivalent of PSK +*[[Polar modulation]] +*[[PSK31]] +*[[PSK63]] + +==Notes== + +<!--This article uses the Cite.php citation mechanism. If you would like more information on how to add references to this article, please see http://meta.wikimedia.org/wiki/Cite/Cite.php --> +{{reflist}} + +==References== + +The notation and theoretical results in this article are based on material presented in the following sources: + +*{{cite book | author=Proakis, John G. | title=Digital Communications | location=Singapore | publisher=McGraw Hill | year=1995 | isbn=0-07-113814-5}} +*{{cite book | author=Couch, Leon W. II | title=Digital and Analog Communications | location=Upper Saddle River, NJ | publisher=Prentice-Hall | year=1997 | isbn=0-13-081223-4}} +*{{cite book | author=Haykin, Simon | title=Digital Communications | location=Toronto, Canada | publisher=John Wiley & Sons | year=1988 | isbn=0-471-62947-2}} + +{{DEFAULTSORT:Phase-Shift Keying}} +[[Category:Quantized radio modulation modes]] +[[Category:Data transmission]] + qkc3nxko03v95gi2cybrjvdyspdxho0 + + + + Binomial theorem + 0 + 132 + + 133 + 2014-01-23T13:45:46Z + + 149.77.175.51 + + wikitext + text/x-wiki + [[Image:Pascal's triangle 5.svg|right|thumb|200px|The [[binomial coefficients]] appear as the entries of [[Pascal's triangle]] where each entry is the sum of the two above it.]] +In [[elementary algebra]], the '''binomial theorem''' describes the algebraic expansion of [[exponentiation|powers]] of a [[binomial]]. According to the theorem, it is possible to expand the power (''x''&nbsp;+&nbsp;''y'')<sup>''n''</sup> into a [[sum]] involving terms of the form ''ax''<sup>''b''</sup>''y''<sup>''c''</sup>, where the exponents ''b'' and ''c'' are [[nonnegative integer]]s with {{nowrap|''b'' + ''c'' {{=}} ''n''}}, and the [[coefficient]] ''a'' of each term is a specific [[positive integer]] depending on ''n'' and ''b''. When an exponent is zero, the corresponding power is usually omitted from the term. For example, + +:<math>(x+y)^4 \;=\; x^4 \,+\, 4 x^3y \,+\, 6 x^2 y^2 \,+\, 4 x y^3 \,+\, y^4.</math> + +The coefficient ''a'' in the term of ''ax''<sup>''b''</sup>''y''<sup>''c''</sup> is known as the [[binomial coefficient]] <math>\tbinom nb</math> or <math>\tbinom nc</math> (the two have the same value). These coefficients for varying ''n'' and ''b'' can be arranged to form [[Pascal's triangle]]. These numbers also arise in [[combinatorics]], where <math>\tbinom nb</math> gives the number of different [[combinations]] of ''b'' [[element (mathematics)|elements]] that can be chosen from an ''n''-element [[set (mathematics)|set]]. + +==History== +This formula and the triangular arrangement of the binomial coefficients are often attributed to [[Blaise Pascal]], who described them in the 17th century, but they were known to many mathematicians who preceded him. 
For instance, Sir Isaac Newton is generally credited with the generalised binomial theorem, valid for any exponent. The 4th century B.C. [[Greek mathematics|Greek mathematician]] [[Euclid]] mentioned the special case of the binomial theorem for exponent&nbsp;2<ref>[http://mathworld.wolfram.com/BinomialTheorem.html Binomial Theorem]</ref><ref>[http://www.jstor.org/pss/2305028 The Story of the Binomial Theorem by J. L. Coolidge], ''The American Mathematical Monthly'' '''56''':3 (1949), pp. 147–157</ref> as did the 3rd century B.C. [[Indian mathematics|Indian mathematician]] [[Pingala]] to higher orders. A more general binomial theorem and the so-called "[[Pascal's triangle]]" were known in the 10th-century A.D. to Indian mathematician [[Halayudha]] and [[Islamic mathematics|Persian mathematician]] [[Al-Karaji]],<ref name=Karaji>{{MacTutor|id=Al-Karaji|title=Abu Bekr ibn Muhammad ibn al-Husayn Al-Karaji}}</ref> in the 11th century to Persian poet and mathematician [[Omar Khayyam]],<ref>{{cite book|last=Sandler|first=Stanley|title=An Introduction to Applied Statistical Thermodynamics|year=2011|publisher=John Wiley & Sons, Inc.|location=Hoboken NJ|isbn=978-0-470-91347-5}}</ref> and in the 13th century to [[Chinese mathematics|Chinese mathematician]] [[Yang Hui]], who all derived similar results.<ref>{{Cite web +| last = Landau +| first = James A +| title = <nowiki>Historia Matematica Mailing List Archive: Re: [HM] Pascal's Triangle</nowiki> +| work = Archives of Historia Matematica +| format = mailing list email +| accessdate = 2007-04-13 +| date = 1999-05-08 +| url = http://archives.math.utk.edu/hypermail/historia/may99/0073.html +}}</ref> Al-Karaji also provided a [[mathematical proof]] of both the binomial theorem and Pascal's triangle, using [[mathematical induction]].<ref name=Karaji/> + +==Statement of the theorem== +According to the theorem, it is possible to expand any power of ''x''&nbsp;+&nbsp;''y'' into a sum of the form + +:<math>(x+y)^n = {n \choose 0}x^n y^0 + {n \choose 1}x^{n-1}y^1 + {n \choose 2}x^{n-2}y^2 + \cdots + {n \choose n-1}x^1 y^{n-1} + {n \choose n}x^0 y^n, +</math> + +where each <math> \tbinom nk </math> is a specific positive integer known as [[binomial coefficient]]. This formula is also referred to as the '''binomial formula''' or the '''binomial identity'''. Using [[Capital-sigma notation|summation notation]], it can be written as + +:<math>(x+y)^n = \sum_{k=0}^n {n \choose k}x^{n-k}y^k = \sum_{k=0}^n {n \choose k}x^{k}y^{n-k}. +</math> +The final expression follows from the previous one by the symmetry of ''x'' and ''y'' in the first expression, and by comparison it follows that the sequence of binomial coefficients in the formula is symmetrical. + +A simple variant of the binomial formula is obtained by [[substitution (algebra)|substituting]] 1 for ''y'', so that it involves only a single [[Variable (mathematics)|variable]]. In this form, the formula reads + +:<math>(1+x)^n = {n \choose 0}x^0 + {n \choose 1}x^1 + {n \choose 2}x^2 + \cdots + {n \choose {n-1}}x^{n-1} + {n \choose n}x^n,</math> + +or equivalently + +:<math>(1+x)^n = \sum_{k=0}^n {n \choose k}x^k.</math> + +==Examples== +[[Image:Pascal triangle small.png|thumb|right|300px|Pascal's triangle]] +The most basic example of the binomial theorem is the formula for the [[Square (algebra)|square]] of ''x''&nbsp;+&nbsp;''y'': + +:<math>(x + y)^2 = x^2 + 2xy + y^2.\!</math> + +The binomial coefficients 1, 2, 1 appearing in this expansion correspond to the third row of Pascal's triangle. 
The coefficients of higher powers of ''x''&nbsp;+&nbsp;''y'' correspond to later rows of the triangle:
+
+:<math>
+\begin{align}
+(x+y)^3 & = x^3 + 3x^2y + 3xy^2 + y^3, \\[8pt]
+(x+y)^4 & = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4, \\[8pt]
+(x+y)^5 & = x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5, \\[8pt]
+(x+y)^6 & = x^6 + 6x^5y + 15x^4y^2 + 20x^3y^3 + 15x^2y^4 + 6xy^5 + y^6, \\[8pt]
+(x+y)^7 & = x^7 + 7x^6y + 21x^5y^2 + 35x^4y^3 + 35x^3y^4 + 21x^2y^5 + 7xy^6 + y^7.
+\end{align}
+</math>
+Notice that
+#the powers of ''x'' start at ''n'' (the ''n'' in <math>(x+y)^n</math>) and decrease by one in each successive term until they reach 0 (since <math>x^0=1</math>, the last term contains no ''x'');
+#the powers of ''y'' start at 0 (since <math>y^0=1</math>, the first term contains no ''y'') and increase by one in each successive term until they reach ''n'';
+#the ''n''th row of Pascal's triangle gives the coefficients of the expanded binomial (note that the top is row 0);
+#in each line, the sum of the coefficients equals <math>2^n</math>, the total number of products before like terms are collected (this follows from setting ''x''&nbsp;=&nbsp;''y''&nbsp;=&nbsp;1);
+#in each line, the number of terms after like terms are collected is <math>n+1</math>.
+The binomial theorem can be applied to the powers of any binomial. For example,
+
+:<math>\begin{align}
+(x+2)^3 &= x^3 + 3x^2(2) + 3x(2)^2 + 2^3 \\
+&= x^3 + 6x^2 + 12x + 8.\end{align}</math>
+
+For a binomial involving subtraction, the theorem can be applied as long as the [[additive inverse|opposite]] of the second term is used. This has the effect of changing the sign of every other term in the expansion:
+:<math>(x-y)^3 = x^3 - 3x^2y + 3xy^2 - y^3.\!</math>
+
+Another useful example is the expansion of the following square roots (these follow from the generalised form of the theorem discussed below):
+:<math>(1+x)^{0.5} = \textstyle 1 + \frac{1}{2}x - \frac{1}{8}x^2 + \frac{1}{16}x^3 - \frac{5}{128}x^4 + \frac{7}{256}x^5 - \cdots</math>
+
+:<math>(1+x)^{-0.5} = \textstyle 1 -\frac{1}{2}x + \frac{3}{8}x^2 - \frac{5}{16}x^3 + \frac{35}{128}x^4 - \frac{63}{256}x^5 + \cdots</math>
+
+===Geometric explanation===
+[[Image:BinomialTheorem.png|right|315px]]
+For positive values of ''a'' and ''b'', the binomial theorem with ''n''&nbsp;=&nbsp;2 is the geometrically evident fact that a square of side {{nowrap|''a'' + ''b''}} can be cut into a square of side ''a'', a square of side ''b'', and two rectangles with sides ''a'' and ''b''. With ''n''&nbsp;=&nbsp;3, the theorem states that a cube of side {{nowrap|''a'' + ''b''}} can be cut into a cube of side ''a'', a cube of side ''b'', three ''a''&times;''a''&times;''b'' rectangular boxes, and three ''a''&times;''b''&times;''b'' rectangular boxes.
+ +In [[calculus]], this picture also gives a geometric proof of the [[derivative]] <math>(x^n)'=nx^{n-1}:</math><ref name="barth2004">{{Harv|Barth|2004}}</ref> if one sets <math>a=x</math> and <math>b=\Delta x,</math> interpreting ''b'' as an infinitesimal change in ''a,'' then this picture shows the infinitesimal change in the volume of an ''n''-dimensional [[hypercube]], <math>(x+\Delta x)^n,</math> where the coefficient of the linear term (in <math>\Delta x</math>) is <math>nx^{n-1},</math> the area of the ''n'' faces, each of dimension <math>(n-1):</math> +:<math>(x+\Delta x)^n = x^n + nx^{n-1}\Delta x + \tbinom{n}{2}x^{n-2}(\Delta x)^2 + \cdots.</math> +Substituting this into the [[definition of the derivative]] via a [[difference quotient]] and taking limits means that the higher order terms – <math>(\Delta x)^2</math> and higher – become negligible, and yields the formula <math>(x^n)'=nx^{n-1},</math> interpreted as +:"the infinitesimal change in volume of an ''n''-cube as side length varies is the area of ''n'' of its <math>(n-1)</math>-dimensional faces". +If one integrates this picture, which corresponds to applying the [[fundamental theorem of calculus]], one obtains [[Cavalieri's quadrature formula]], the integral <math>\textstyle{\int x^{n-1}\,dx = \tfrac{1}{n} x^n}</math> – see [[Cavalieri's quadrature formula#Proof|proof of Cavalieri's quadrature formula]] for details.<ref name="barth2004" /> + +{{clear}} + +==The binomial coefficients== +{{main|Binomial coefficient}} +The coefficients that appear in the binomial expansion are called '''binomial coefficients'''. These are usually written <math> \tbinom nk </math>, and pronounced “''n'' choose ''k''”. + +===Formulae=== +The coefficient of ''x''<sup>''n''&minus;''k''</sup>''y''<sup>''k''</sup> is given by the formula + +:<math>{n \choose k} = \frac{n!}{k!\,(n-k)!}</math>, + +which is defined in terms of the [[factorial]] function ''n''!. Equivalently, this formula can be written + +:<math>{n \choose k} = \frac{n (n-1) \cdots (n-k+1)}{k (k-1) \cdots 1} = \prod_{\ell=1}^k \frac{n-\ell+1}{\ell} = \prod_{\ell=0}^{k-1} \frac{n-\ell}{k - \ell}</math> + +with ''k'' factors in both the numerator and denominator of the [[Fraction (mathematics)|fraction]]. Note that, although this formula involves a fraction, the binomial coefficient <math> \tbinom nk </math> is actually an [[integer]]. + +===Combinatorial interpretation=== +The binomial coefficient <math> \tbinom nk </math> can be interpreted as the number of ways to choose ''k'' elements from an ''n''-element set. This is related to binomials for the following reason: if we write (''x''&nbsp;+&nbsp;''y'')<sup>''n''</sup> as a [[Product (mathematics)|product]] +:<math>(x+y)(x+y)(x+y)\cdots(x+y),</math> +then, according to the [[distributive law]], there will be one term in the expansion for each choice of either ''x'' or ''y'' from each of the binomials of the product. For example, there will only be one term ''x''<sup>''n''</sup>, corresponding to choosing ''x'' from each binomial. However, there will be several terms of the form ''x''<sup>''n''&minus;2</sup>''y''<sup>2</sup>, one for each way of choosing exactly two binomials to contribute a ''y''. Therefore, after [[combining like terms]], the coefficient of ''x''<sup>''n''&minus;2</sup>''y''<sup>2</sup> will be equal to the number of ways to choose exactly 2 elements from an ''n''-element set. 
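+
+As a quick numerical check of the product formula (an illustrative sketch, not drawn from the references below), the following Python fragment evaluates <math>\tbinom{6}{k}</math> from the product formula and compares the result with the factorial definition:
+
+<syntaxhighlight lang="python">
+# Binomial coefficients from the product formula, checked against math.comb.
+from math import comb, prod
+
+def binom(n, k):
+    # n(n-1)...(n-k+1) divided by k!
+    return prod(range(n - k + 1, n + 1)) // prod(range(1, k + 1))
+
+assert all(binom(6, k) == comb(6, k) for k in range(7))
+print([binom(6, k) for k in range(7)])   # [1, 6, 15, 20, 15, 6, 1], a row of Pascal's triangle
+</syntaxhighlight>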
+ +==Proofs== +===Combinatorial proof=== +====Example==== +The coefficient of ''xy''<sup>2</sup> in + +:<math>\begin{align} +(x+y)^3 &= (x+y)(x+y)(x+y) \\ +&= xxx + xxy + xyx + \underline{xyy} + yxx + \underline{yxy} + \underline{yyx} + yyy \\ +&= x^3 + 3x^2y + \underline{3xy^2} + y^3. +\end{align} \, </math> + +equals <math>\tbinom{3}{2}=3</math> because there are three ''x'',''y'' strings of length 3 with exactly two ''y'''s, namely, + +:<math>xyy, \; yxy, \; yyx,</math> + +corresponding to the three 2-element subsets of {&nbsp;1,&nbsp;2,&nbsp;3&nbsp;}, namely, + +:<math>\{2,3\},\;\{1,3\},\;\{1,2\}, </math> + +where each subset specifies the positions of the ''y'' in a corresponding string. + +====General case==== +Expanding (''x''&nbsp;+&nbsp;''y'')<sup>''n''</sup> yields the sum of the 2<sup>&nbsp;''n''</sup> products of the form ''e''<sub>1</sub>''e''<sub>2</sub>&nbsp;...&nbsp;''e''<sub>&nbsp;''n''</sub> where each ''e''<sub>&nbsp;''i''</sub> is ''x'' or&nbsp;''y''. Rearranging factors shows that each product equals ''x''<sup>''n''&minus;''k''</sup>''y''<sup>''k''</sup> for some ''k'' between 0 and&nbsp;''n''. For a given ''k'', the following are proved equal in succession: +*the number of copies of ''x''<sup>''n''&nbsp;&minus;&nbsp;''k''</sup>''y''<sup>''k''</sup> in the expansion +*the number of ''n''-character ''x'',''y'' strings having ''y'' in exactly ''k'' positions +*the number of ''k''-element subsets of {&nbsp;1,&nbsp;2,&nbsp;...,&nbsp;''n''} +*<math>{n \choose k}</math> (this is either by definition, or by a short combinatorial argument if one is defining <math>{n \choose k}</math> as <math>\frac{n!}{k!\,(n-k)!}</math>). +This proves the binomial theorem. + +===Inductive proof=== +[[mathematical induction|Induction]] yields another proof of the binomial theorem&nbsp;(1). When ''n''&nbsp;=&nbsp;0, both sides equal 1, since ''x''<sup>0</sup>&nbsp;=&nbsp;1 for all nonzero ''x'' and <math>\tbinom{0}{0}=1</math>. +Now suppose that (1) holds for a given ''n''; we will prove it for ''n''&nbsp;+&nbsp;1. +For ''j'',&nbsp;''k''&nbsp;≥&nbsp;0, let [''ƒ''(''x'',&nbsp;''y'')]<sub>&nbsp;''jk''</sub> denote the coefficient of ''x''<sup>''j''</sup>''y''<sup>''k''</sup> in the polynomial ''ƒ''(''x'',&nbsp;''y''). +By the inductive hypothesis, (''x''&nbsp;+&nbsp;''y'')<sup>''n''</sup> is a polynomial in ''x'' and ''y'' such that [(''x''&nbsp;+&nbsp;''y'')<sup>''n''</sup>]<sub>&nbsp;''jk''</sub> is <math>\tbinom{n}{k}</math> if ''j''&nbsp;+&nbsp;''k''&nbsp;=&nbsp;''n'', and 0 otherwise. +The identity + +:<math> (x+y)^{n+1} = x(x+y)^n + y(x+y)^n, \, </math> + +shows that (''x''&nbsp;+&nbsp;''y'')<sup>''n''+1</sup> also is a polynomial in ''x'' and ''y'', and + +:<math> [(x+y)^{n+1}]_{jk} = [(x+y)^n]_{j-1,k} + [(x+y)^n]_{j,k-1}. \, </math> + +If ''j''&nbsp;+&nbsp;''k''&nbsp;=&nbsp;''n''&nbsp;+&nbsp;1, then (''j''&nbsp;&minus;&nbsp;1)&nbsp;+&nbsp;''k''&nbsp;=&nbsp;''n'' and ''j''&nbsp;+&nbsp;(''k''&nbsp;&minus;&nbsp;1)&nbsp;=&nbsp;''n'', so the right hand side is + +:<math> \tbinom{n}{k} + \tbinom{n}{k-1} = \tbinom{n+1}{k},</math> + +by [[Pascal's identity]]. On the other hand, if ''j''&nbsp;+''k''&nbsp;≠&nbsp;''n''&nbsp;+&nbsp;1, then (''j''&nbsp;–&nbsp;1)&nbsp;+&nbsp;''k''&nbsp;≠&nbsp;''n'' and ''j''&nbsp;+(''k''&nbsp;–&nbsp;1)&nbsp;≠&nbsp;''n'', so we get 0&nbsp;+&nbsp;0&nbsp;=&nbsp;0. Thus + +:<math>(x+y)^{n+1} = \sum_{k=0}^{n+1} \tbinom{n+1}{k} x^{n+1-k} y^k,</math> + +which is the inductive hypothesis with ''n''&nbsp;+&nbsp;1 substituted for ''n'' and so completes the inductive step. 
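+
+As a concrete instance of the identity used in the inductive step, take ''n''&nbsp;=&nbsp;3 and ''k''&nbsp;=&nbsp;2:
+
+:<math>\tbinom{3}{2} + \tbinom{3}{1} = 3 + 3 = 6 = \tbinom{4}{2},</math>
+
+which is exactly the coefficient of <math>x^2y^2</math> in the expansion of <math>(x+y)^4</math> given at the start of the article.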
+ +==Generalisations== +===Newton's generalised binomial theorem=== +{{main|Binomial series}} +Around 1665, [[Isaac Newton]] generalised the formula to allow real exponents other than nonnegative integers, and in fact it can be generalised further, to complex exponents. In this generalisation, the finite sum is replaced by an [[infinite series]]. In order to do this one needs to give meaning to binomial coefficients with an arbitrary upper index, which cannot be done using the above formula with factorials; however factoring out (''n''&nbsp;−&nbsp;''k'')! from numerator and denominator in that formula, and replacing ''n'' by ''r'' which now stands for an arbitrary number, one can define + +:<math>{r \choose k}=\frac{r\,(r-1) \cdots (r-k+1)}{k!} =\frac{(r)_k}{k!},</math> +<!-- +This is not the same as \frac{r!}{k!\,(r−k)!}. Factorials are typically only defined on natural number arguments, but even if you are using factorials generalised (e.g. by the \Gamma function) to non-integer values, they are still undefined on the negative integers. To get the usual binomial theorem as a special case of this so-called generalisation, we had better define the binomial coefficient when ''r'' is an integer, but in that case ''r''−''k'' will be a negative integer for sufficiently large ''k'', so one cannot use any formula involving the factorial <math>(r−k)!</math>. + +This negative comment about "not the same as…" seems to be needed. People keep coming along and completing this formula with this expression involving factorials, missing the point of this section. +~~~~perhaps someone could put a better explanation in! Here is an attempt!. +The problem with substituting \frac{r!}{k!\,(r−k)!} is that the ! ends up being used for negative numbers which doesn't work with the definition of !. Consequently, the notation here is used because if you look at it for a negative value of n, the value is still defined with this notation. That being said, many text books are careless about it. +--> +where <math>(\cdot)_k</math> is the [[Pochhammer symbol]] here standing for a [[falling factorial]]. Then, if ''x'' and ''y'' are real numbers with |''x''|&nbsp;>&nbsp;|''y''|,<ref name=convergence>This is to guarantee convergence. Depending on ''r'', the series may also converge sometimes when |''x''|&nbsp;=&nbsp;|''y''|.</ref> and ''r'' is any [[complex number]], one has + +:<math> +\begin{align} +(x+y)^r & =\sum_{k=0}^\infty {r \choose k} x^{r-k} y^k \qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad(2) \\ +& = x^r + r x^{r-1} y + \frac{r(r-1)}{2!} x^{r-2} y^2 + \frac{r(r-1)(r-2)}{3!} x^{r-3} y^3 + \cdots. +\end{align} +</math> +When ''r'' is a nonnegative integer, the binomial coefficients for ''k''&nbsp;>&nbsp;''r'' are zero, so (2) specializes to (1), and there are at most ''r''&nbsp;+&nbsp;1 nonzero terms. For other values of ''r'', the series (2) has infinitely many nonzero terms, at least if ''x'' and ''y'' are nonzero. + +This is important when one is working with infinite series and would like to represent them in terms of [[generalised hypergeometric function]]s. + +Taking ''r''&nbsp;=&nbsp;&minus;''s'' leads to a useful formula: + +:<math>\frac{1}{(1-x)^s} = \sum_{k=0}^\infty {s+k-1 \choose k} x^k \equiv \sum_{k=0}^\infty {s+k-1 \choose s-1} x^k.</math> + +Further specializing to ''s''&nbsp;=&nbsp;1 yields the [[Geometric series#Formula|geometric series formula]]. + +====Generalisations==== +Formula (2) can be generalised to the case where ''x'' and ''y'' are [[complex numbers]]. 
For this version, one should assume |''x''|&nbsp;>&nbsp;|''y''|<ref name=convergence/> and define the powers of ''x''&nbsp;+&nbsp;''y'' and ''x'' using a [[holomorphic]] [[complex logarithm|branch of log]] defined on an open disk of radius |''x''| centered at ''x''. + +Formula (2) is valid also for elements ''x'' and ''y'' of a [[Banach algebra]] as long as ''xy''&nbsp;=&nbsp;''yx'', ''x''&nbsp;is invertible, and&nbsp;||''y/x''||&nbsp;<&nbsp;1. + +===The multinomial theorem=== +{{main|Multinomial theorem}} +The binomial theorem can be generalised to include powers of sums with more than two terms. The general version is + +:<math>(x_1 + x_2 + \cdots + x_m)^n + = \sum_{k_1,k_2,\ldots,k_m} {n \choose k_1, k_2, \ldots, k_m} + x_1^{k_1} x_2^{k_2} \cdots x_m^{k_m}. </math> + +where the summation is taken over all sequences of nonnegative integer indices ''k''<sub>1</sub> through ''k''<sub>''m''</sub> such that the sum of all ''k''<sub>''i''</sub> is&nbsp;''n''. (For each term in the expansion, the exponents must add up to&nbsp;''n''). The coefficients <math> \tbinom n{k_1,\cdots,k_n} </math> are known as multinomial coefficients, and can be computed by the formula + +:<math> {n \choose k_1, k_2, \ldots, k_m} + = \frac{n!}{k_1!\, k_2! \cdots k_m!}.</math> + +Combinatorially, the multinomial coefficient <math>\tbinom n{k_1,\cdots,k_n}</math> counts the number of different ways to [[Partition of a set|partition]] an ''n''-element set into [[Disjoint sets|disjoint]] [[subset]]s of sizes ''k''<sub>1</sub>,&nbsp;...,&nbsp;''k''<sub>''n''</sub>. + +=== {{anchor|multi-binomial}} The multi-binomial theorem === +It is often useful when working in more dimensions, to deal with products of binomial expressions. By the binomial theorem this is equal to + +:<math> (x_{1}+y_{1})^{n_{1}}\dotsm(x_{d}+y_{d})^{n_{d}} = \sum_{k_{1}=0}^{n_{1}}\dotsm\sum_{k_{d}=0}^{n_{d}} \binom{n_{1}}{k_{1}}\, x_{1}^{k_{1}}y_{1}^{n_{1}-k_{1}}\;\dotsc\;\binom{n_{d}}{k_{d}}\, x_{d}^{k_{d}}y_{d}^{n_{d}-k_{d}}. </math> + +This may be written more concisely, by [[multi-index notation]], as + +:<math> (x+y)^\alpha = \sum_{\nu \le \alpha} \binom{\alpha}{\nu} \, x^\nu y^{\alpha - \nu}.</math> + +==Applications== +===Multiple angle identities=== +For the [[complex numbers]] the binomial theorem can be combined with [[De Moivre's formula]] to yield [[List of trigonometric identities#Multiple-angle formulae|multiple-angle formulas]] for the [[sine]] and [[cosine]]. According to De Moivre's formula, +:<math>\cos\left(nx\right)+i\sin\left(nx\right) = \left(\cos x+i\sin x\right)^n.\,</math> +Using the binomial theorem, the expression on the right can be expanded, and then the real and imaginary parts can be taken to yield formulas for cos(''nx'') and sin(''nx''). For example, since +:<math>\left(\cos x+i\sin x\right)^2 = \cos^2 x + 2i \cos x \sin x - \sin^2 x,</math> +De Moivre's formula tells us that +:<math>\cos(2x) = \cos^2 x - \sin^2 x \quad\text{and}\quad\sin(2x) = 2 \cos x \sin x,</math> +which are the usual double-angle identities. 
Similarly, since +:<math>\left(\cos x+i\sin x\right)^3 = \cos^3 x + 3i \cos^2 x \sin x - 3 \cos x \sin^2 x - i \sin^3 x,</math> +De Moivre's formula yields +:<math>\cos(3x) = \cos^3 x - 3 \cos x \sin^2 x \quad\text{and}\quad \sin(3x) = 3\cos^2 x \sin x - \sin^3 x.</math> +In general, +:<math>\cos(nx) = \sum_{k\text{ even}} (-1)^{k/2} {n \choose k}\cos^{n-k} x \sin^k x</math> +and +:<math>\sin(nx) = \sum_{k\text{ odd}} (-1)^{(k-1)/2} {n \choose k}\cos^{n-k} x \sin^k x.</math> + +===Series for e=== +The [[e (mathematical constant)|number ''e'']] is often defined by the formula + +:<math>e = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n.</math> + +Applying the binomial theorem to this expression yields the usual [[infinite series]] for ''e''. In particular: + +:<math>\left(1 + \frac{1}{n}\right)^n = 1 + {n \choose 1}\frac{1}{n} + {n \choose 2}\frac{1}{n^2} + {n \choose 3}\frac{1}{n^3} + \cdots + {n \choose n}\frac{1}{n^n}.</math> + +The ''k''th term of this sum is + +:<math>{n \choose k}\frac{1}{n^k} \;=\; \frac{1}{k!}\cdot\frac{n(n-1)(n-2)\cdots (n-k+1)}{n^k}</math> + +As ''n''&nbsp;→&nbsp;∞, the rational expression on the right approaches one, and therefore + +:<math>\lim_{n\to\infty} {n \choose k}\frac{1}{n^k} = \frac{1}{k!}.</math> + +This indicates that ''e'' can be written as a series: + +:<math>e = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots.</math> + +Indeed, since each term of the binomial expansion is an [[Monotonic function|increasing function]] of ''n'', it follows from the [[monotone convergence theorem]] for series that the sum of this infinite series is equal to&nbsp;''e''. + +==The binomial theorem in abstract algebra== + +Formula (1) is valid more generally for any elements ''x'' and ''y'' of a [[semiring]] satisfying ''xy''&nbsp;=&nbsp;''yx''. The [[theorem]] is true even more generally: [[alternativity]] suffices in place of [[associativity]]. + +The binomial theorem can be stated by saying that the [[polynomial sequence]] {&nbsp;1,&nbsp;''x'',&nbsp;''x''<sup>2</sup>,&nbsp;''x''<sup>3</sup>,&nbsp;...&nbsp;} is of [[binomial type]]. + +==In popular culture== +*The binomial theorem is mentioned in the [[Major-General's Song]] in the comic opera [[The Pirates of Penzance]]. +*Professor Moriarty is described by Sherlock Holmes as having written a treatise on the binomial theorem. + +== See also == +* [[A Treatise on the Binomial Theorem]] +* [[Binomial approximation]] +* [[Binomial distribution]] +* [[Binomial inverse theorem]] +* [[Binomial probability]] +* [[Binomial series]] +* [[Combination]] +* [[Multinomial theorem]] +* [[Negative binomial distribution]] +* [[Pascal's triangle]] +* [[Stirling's approximation]] + +==Notes== +{{reflist}} + +==References== +{{refbegin}} +*{{cite journal|last=Bag|first=Amulya Kumar|year=1966|title=Binomial theorem in ancient India|journal=Indian J. 
History Sci|volume=1|issue=1|pages=68–74}} +*{{cite doi|10.2307/4145193|noedit}} +*{{cite book|last1=Graham|first1=Ronald|first2=Donald |last2=Knuth|first3= Oren|last3= Patashnik|title=Concrete Mathematics|publisher=Addison Wesley|year=1994|edition=2nd|pages=153–256|chapter=(5) Binomial Coefficients|isbn=0-201-55802-5|oclc=17649857}} +{{refend}} + +== External links == +{{wikibooks|Combinatorics|Binomial Theorem|The Binomial Theorem}} +*{{SpringerEOM|id=Newton_binomial|first=E.D.|last= Solomentsev|title=Newton binomial}} +*[http://demonstrations.wolfram.com/BinomialTheorem/ Binomial Theorem] by [[Stephen Wolfram]], and [http://demonstrations.wolfram.com/BinomialTheoremStepByStep/ "Binomial Theorem (Step-by-Step)"] by Bruce Colletti and Jeff Bryant, [[Wolfram Demonstrations Project]], 2007. + +{{PlanetMath attribution|id=338|title=inductive proof of binomial theorem}} + +{{DEFAULTSORT:Binomial Theorem}} +[[Category:Factorial and binomial topics]] +[[Category:Theorems in algebra]] +[[Category:Articles containing proofs]] + c18sffx6q4vz1s32sen1844lmzclqqo + + + + Rate equation + 0 + 9081 + + 9082 + 2014-01-31T16:54:19Z + + Hairy Dude + 0 + + capitalisation + wikitext + text/x-wiki + The '''rate law''' or '''rate equation''' for a [[chemical reaction]] is an equation that links the [[reaction rate]] with concentrations or pressures of reactants and constant parameters (normally rate coefficients and partial [[reaction order]]s).<ref>[http://goldbook.iupac.org/R05141.html IUPAC Gold Book definition of rate law]. See also: According to [[IUPAC]] [[Compendium of Chemical Terminology]].</ref> To determine the rate equation for a particular system one combines the reaction rate with a [[mass balance]] for the system.<ref>Kenneth A. Connors ''Chemical Kinetics, the study of reaction rates in solution'', 1991, VCH Publishers. This book''' contains most of the rate equations in this article and their derivation.</ref> For a generic reaction {{nowrap|''a''A + ''b''B → C}} with no intermediate steps in its [[reaction mechanism]] (that is, an [[elementary reaction]]), the rate is given by +:<math>r\; =\; k[\mathrm{A}]^x[\mathrm{B}]^y</math> +where [A] and [B] express the concentration of the species A and B, respectively (usually in moles per liter ([[molarity]], M)); ''x'' and ''y'' must be determined experimentally (a common mistake is assuming they represent stoichiometric coefficients but this is not the case). ''k'' is the ''rate coefficient'' or ''rate constant'' of the reaction. The value of this coefficient ''k'' depends on conditions such as temperature, ionic strength, surface area of the [[adsorbent]] or light irradiation. For elementary reactions, the rate equation can be derived from first principles using [[collision theory]] under well-stirred conditions. + +The rate equation is a [[differential equation]], and it can be [[integral|integrated]] to obtain an '''integrated rate equation''' that links concentrations of reactants or products with time. + +==Stoichiometric reaction networks== +The most general description of a chemical reaction network considers a number <math>N</math> of distinct chemical species reacting via <math>R</math> reactions.<ref>Heinrich, R. and Schuster, S. (1996) The regulation of cellular systems. Chapman & Hall, New York.</ref> +<ref>Chen, L. and Wang, R. and Li, C. and Aihara, K. (2010) Modeling biomolecular networks in cells: structures and dynamics. 
Springer.</ref> The chemical equation of the <math>j</math>-th reaction can then be written in the generic form
+
+:<math>
+ s_{1j} X_1 + s_{2j} X_2 \ldots + s_{Nj} X_{N} \xrightarrow{k_j} \ r_{1j} X_{1} + \ r_{2j} X_{2} + \ldots + r_{Nj} X_{N},
+</math>
+
+which is often written in the equivalent form
+
+:<math>
+ \sum_{i=1}^{N} s_{ij} X_i \xrightarrow{k_j} \sum_{i=1}^{N}\ r_{ij} X_{i}.
+</math>
+
+Here
+
+: <math>j</math> is the reaction index running from 1 to <math>R</math>,
+: <math>X_i</math> denotes the <math>i</math>-th chemical species,
+: <math>k_j</math> is the [[Reaction rate constant|rate constant]] of the <math>j</math>-th reaction and
+: <math>s_{ij}</math> and <math>r_{ij}</math> are the stoichiometric coefficients of reactants and products, respectively.
+
+The rate of such a reaction can be inferred from the [[law of mass action]]
+
+:<math>
+ f_j([\vec{X}])= k_j \prod_{z=1}^N [X_z]^{s_{zj}}
+</math>
+
+which denotes the flux of molecules per unit time and unit volume. Here <math>[\vec{X}]=([X_1], [X_2], ... ,[X_N])</math> is the vector of concentrations. Note that this definition includes the [[elementary reaction]]s:
+
+* '''zero-order reactions''', for which <math>s_{zj}=0</math> for all <math>z</math>;
+* '''first-order reactions''', for which <math>s_{zj}=1</math> for a single <math>z</math>;
+* '''second-order reactions''', for which <math>s_{zj}=1</math> for exactly two <math>z</math> (a bimolecular reaction) or <math>s_{zj}=2</math> for a single <math>z</math> (a dimerization reaction).
+
+Each of these is discussed in detail below. One can define the [[Stoichiometry#Stoichiometry_matrix|stoichiometric matrix]]
+: <math>S_{ij}=r_{ij}-s_{ij},</math>
+denoting the net change in the number of molecules of species <math>i</math> in reaction <math>j</math>. The reaction rate equations can then be written in the general form
+
+:<math>
+ \frac{d [X_i]}{dt} =\sum_{j=1}^{R} S_{ij} f_j([\vec{X}]).
+</math>
+
+Note that this is the product of the stoichiometric matrix and the vector of reaction rate functions.
+Particularly simple solutions exist at equilibrium, <math>\frac{d [X_i]}{dt}=0</math>, for systems composed only of reversible reactions. In this case the rates of the forward and backward reactions are equal, a principle called [[detailed balance]]. Note that detailed balance is a property of the stoichiometric matrix <math>S_{ij}</math> alone and does not depend on the particular form of the rate functions <math>f_j</math>. All other cases, where detailed balance is violated, are commonly studied by [[flux balance analysis]], which has been developed to understand [[metabolic pathway]]s.<ref>Szallasi, Z. and Stelling, J. and Periwal, V. (2006) System modeling in cell biology: from concepts to nuts and bolts. MIT Press Cambridge.</ref><ref>Iglesias, P.A. and Ingalls, B.P. (2010) Control theory and systems biology. MIT Press Cambridge.</ref>
+
+The rate equation of a reaction with a multi-step mechanism cannot, in general, be deduced from the stoichiometric coefficients of the overall reaction; it must be derived theoretically using [[Steady State theory|quasi-steady state assumptions]] from the underlying elementary reactions or determined experimentally. The equation may involve fractions, or it may depend on the concentration of an intermediate species.
+
+==Zero-order reactions==
+A '''zero-order reaction''' has a rate that is independent of the concentration of the reactant(s). Increasing the concentration of the reacting species will not speed up the rate of the reaction; i.e.
the amount of substance reacted is proportional to the time. Zero-order reactions are typically found when a material that is required for the reaction to proceed, such as a surface or a [[catalyst]], is saturated by the reactants. The rate law for a zero-order reaction is + +:<math>\ r = k</math> + +where r is the reaction rate and k is the reaction rate coefficient with units of concentration or time. If, and only if, this zeroth-order reaction 1) occurs in a closed system, 2) there is no net build-up of intermediates, and 3) there are no other reactions occurring, it can be shown by solving a [[mass balance]] equation for the system that: + +:<math> r = -\frac{d[A]}{dt}=k</math> + +If this [[differential equation]] is [[integral|integrated]] it gives an equation often called the '''integrated zero-order rate law'''. + +:<math>\ [A]_t = -kt + [A]_0</math> + +where <math>\ [A]_t</math> represents the concentration of the chemical of interest at a particular time, and <math>\ [A]_0</math> represents the initial concentration. + +A reaction is zero order if concentration data are plotted versus time and the result is a straight line. A plot of <math>\ [A]_t</math> vs. time t gives a straight line with a slope of <math> -k </math>. + +The half-life of a reaction describes the time needed for half of the reactant to be depleted (same as the [[half-life]] involved in [[nuclear decay]], which is a first-order reaction). For a zero-order reaction the half-life is given by + +: <math>\ t_ \frac{1}{2} = \frac{[A]_0}{2k}</math> + +;Example of a zero-order reaction +* Reversed [[Haber process]]: <math>2NH_3 (g) \rightarrow \; 3H_2 (g) + N_2 (g)</math> + +The order of a reaction cannot be deduced from the chemical equation of the reaction. + +==First-order reactions== + +{{see also|Order of reaction}} + +A '''first-order reaction''' depends on the concentration of only one reactant (a '''unimolecular reaction'''). Other reactants can be present, but each will be zero-order. The rate law for a reaction that is first order with respect to a reactant A is +:<math>\frac{-d[A]}{dt} \equiv r = k[A]</math> + +''k'' is the first order rate constant, which has units of 1/s. + +The '''integrated first-order rate law''' is + +:<math>\ \ln{[A]} = -kt + \ln{[A]_0}</math> + +A plot of <math>\ln{[A]}</math> vs. time ''t'' gives a straight line with a slope of <math>-k</math>. + +The half-life of a first-order reaction is independent of the starting concentration and is given by <math>\ t_ \frac{1}{2} = \frac{\ln{(2)}}{k}</math>. + +Examples of reactions that are first-order with respect to the reactant: + +* <math>\mbox{H}_2 \mbox{O}_2 (l) \rightarrow \; \mbox{H}_2\mbox{O} (l) + \frac{1}{2}\mbox{O}_2 (g)</math> +* <math>\mbox{SO}_2 \mbox{Cl}_2 (l) \rightarrow \; \mbox{SO}_2 (g) + \mbox{Cl}_2 (g)</math> +* <math>2\mbox{N}_2 \mbox{O}_5 (g) \rightarrow \; 4\mbox{NO}_2 (g) + \mbox{O}_2 (g)</math> + +===Further properties of first-order reaction kinetics=== +The integrated first-order rate law +:<math>\ \ln{[A]} = -kt + \ln{[A]_0}</math> +is usually written in the form of the exponential decay equation +:<math>A=A_0e^{-kt}\,</math> +A different (but equivalent) way of considering first order kinetics is as follows: The exponential decay equation can be rewritten as: +:<math>A=A_{0}\left( e^{-k\Delta t_{p}} \right)^{n}</math> +where <math>\Delta t_{p}</math> corresponds to a specific time period and <math>n</math> is an integer corresponding to the number of time periods. 
At the end of each time period, the fraction of the reactant population remaining relative to the amount present at the start of the time period, <math>f_{RP}</math>, will be: +:<math>\frac{A_{n}}{A_{n-1}} =f_{RP}=e^{-k\Delta t_{p}}</math> +Such that after <math>n</math> time periods, the fraction of the original reactant population will be: +:<math>\frac{A}{A_{0}}\equiv \frac{A_{n}}{A_{0}}=\left( e^{-k\Delta t_{p}} \right)^{n}=\left( f_{RP} \right)^{n}=\left( 1-f_{BP} \right)^{n}</math> +where: <math>f_{BP}</math> corresponds to the fraction of the reactant population that will break down in each time period. +This equation indicates that the fraction of the total amount of reactant population that will break down in each time period is independent of the initial amount present. When the chosen time period corresponds to <math>\Delta t_{p}=\frac{\ln \left( 2 \right)}{k}</math>, the fraction of the population that will break down in each time period will be exactly ½ the amount present at the start of the time period (i.e. the time period corresponds to the half-life of the first-order reaction). + +The average rate of the reaction for the n<sup>th</sup> time period is given by: +:<math>r_{avg,n}=-\frac{\Delta A}{\Delta t_{p}}=\frac{A_{n-1}-A_{n}}{\Delta t_{p}}</math> +Therefore, the amount remaining at the end of each time period will be related to the average rate of that time period and the reactant population at the start of the time period by: +:<math>A_{n}=A_{n-1}-r_{avg,n}\Delta t_{p}</math> +Since the fraction of the reactant population that will break down in each time period can be expressed as: +:<math>f_{BP}=1-\frac{A_{n}}{A_{n-1}}</math> +The amount of reactant that will break down in each time period can be related to the average rate over that time period by: +:<math>f_{BP}=\frac{r_{avg,n}\Delta t_{p}}{A_{n-1}}</math> +Such that the amount that remains at the end of each time period will be related to the amount present at the start of the time period according to: +:<math>A_{n}=A_{n-1}\left( 1-\frac{r_{avg,n}\Delta t_{p}}{A_{n-1}} \right)</math> +This equation is a recursion allowing for the calculation of the amount present after any number of time periods, without need of the rate constant, provided that the average rate for each time period is known. +<ref>Walsh R, Martin E, Darvesh S. A method to describe enzyme-catalyzed reactions by combining steady state and time course enzyme kinetic parameters... Biochim Biophys Acta. 2010 Jan;1800:1-5</ref> + +==Second-order reactions== + +A '''second-order reaction''' depends on the concentrations of one second-order reactant, or two first-order reactants. + +For a second order reaction, its reaction rate is given by: + +:<math>\ -\frac{d[A]}{dt} = 2k[A]^2</math> or <math>\ -\frac{d[A]}{dt} = k[A][B]</math> or <math>\ -\frac{d[A]}{dt} = 2k[B]^2</math> + +In several popular kinetics books, the definition of the rate law for second-order reactions is written instead as<math>-\frac{d[A]}{dt} = k[A]^2</math>. This effectively conflates the 2 inside the constant, k, whose numerical meaning then becomes different. This simplifying convention is followed in the integrated rate laws provided below. However, this simplification leads to potentially problematic inconsistencies, i.e. if the reaction rate is described in terms of product formation vs reactant disappearance. 
Instead, the option of keeping the 2 in the rate law (rather than absorbing it into a rate constant with an altered meaning) maintains a consistent meaning for k and is considered more correct technically. This more technically consistent convention is almost always used in peer-reviewed literature, tables of rate constants, and simulation software.<ref name="2nd-order">[http://www.rcdc.nd.edu/compilations/Ali/Ali.htm NDRL Radiation Chemistry Data Center]. See also: [http://www.getcited.org/puba/101600761 Christos Capellos and Bennon H. Bielski ''"Kinetic systems: mathematical description of chemical kinetics in solution"'' 1972, Wiley-Interscience (New York)].</ref>
+
+The '''integrated second-order rate laws''' are respectively
+
+:<math>\frac{1}{[A]} = \frac{1}{[A]_0} + kt </math>
+
+or
+
+:<math>\frac{[A]}{[B]} = \frac{[A]_0}{[B]_0} e^{([A]_0 - [B]_0)kt}</math>
+
+[A]<sub>0</sub> and [B]<sub>0</sub> must be different for the second of these integrated equations to apply.
+
+The half-life equation for a second-order reaction dependent on one second-order reactant is <math>\ t_ \frac{1}{2} = \frac{1}{k[A]_0}</math>. For such a reaction, each successive half-life is twice as long as the one before: the half-life doubles each time the concentration of the reactant falls to half its previous value.
+
+Another way to present the rate law <math>r = k[A]^2</math> is to take the logarithm of both sides:
+<math>\ln{}r = \ln{}k + 2\ln\left[A\right] </math>
+
+;Examples of a second-order reaction:
+* <math>2\mbox{NO}_2(g) \rightarrow \; 2\mbox{NO}(g) + \mbox{O}_2(g)</math>
+
+===Pseudo-first-order===
+
+Measuring a second-order reaction rate with reactants A and B can be problematic: the concentrations of the two reactants must either be followed simultaneously, which is difficult, or one of them must be measured and the other calculated as a difference, which is less precise. A common solution to this problem is the '''pseudo-first-order approximation'''.
+
+If the concentration of one of the reactants remains constant because it is supplied in great excess, its concentration can be absorbed into the rate constant, giving a '''pseudo'''-first-order rate constant: the rate then effectively depends on the concentration of only one reactant. If, for example, [B] remains constant, then:
+
+<math>\ r = k[A][B] = k'[A]</math>
+
+where <math>k'=k[B]_0</math> (k' or k<sub>obs</sub> with units s<sup>−1</sup>) and an expression is obtained identical to the first-order expression above.
+
+One way to obtain a pseudo-first-order reaction is to use a large excess of one of the reactants ([B]&nbsp;>>&nbsp;[A] would work for the previous example) so that, as the reaction progresses, only a small amount of that reactant is consumed, and its concentration can be considered to stay constant. By collecting <math>k'</math> for many reactions with different (but excess) concentrations of [B], a plot of <math>k'</math> versus [B] gives <math>k</math> (the regular second-order rate constant) as the slope.
+
+Example:
+The hydrolysis of esters by dilute mineral acids follows pseudo-first-order kinetics because water is present in large excess.
+:CH<sub>3</sub>COOCH<sub>3</sub> + H<sub>2</sub>O → CH<sub>3</sub>COOH + CH<sub>3</sub>OH
+
+==Summary for reaction orders 0, 1, 2, and ''n''==
+
+Elementary reaction steps with order 3 (called '''ternary reactions''') are [[Elementary reaction|rare and unlikely]] to occur. However, overall reactions composed of several elementary steps can, of course, be of any (including non-integer) order.
+
+{| class="wikitable"
+!
+!Zero-Order +!First-Order +!Second-Order +!''n''th-Order +|- +|Rate Law +|<math>-\frac{d[A]}{dt} = k</math> +|<math>-\frac{d[A]}{dt} = k[A]</math> +|<math>-\frac{d[A]}{dt} = k[A]^2</math><ref name="2nd-order"/> +|<math>-\frac{d[A]}{dt} = k[A]^n</math> +|- +|Integrated Rate Law +|<math>\ [A] = [A]_0 - kt</math> +|<math>\ [A] = [A]_0 e^{-kt}</math> +|<math>\frac{1}{[A]} = \frac{1}{[A]_0} + kt</math><ref name="2nd-order"/> +|<math>\frac{1}{[A]^{n-1}} = \frac{1}{{[A]_0}^{n-1}} + (n-1)kt</math> +<small>[Except first order]</small> +|- +|Units of Rate Constant (''k'') +|<math>\rm\frac{M}{s}</math> +|<math>\rm\frac{1}{s}</math> +|<math>\rm\frac{1}{M \cdot s}</math> +|<math>\frac{1}{{\rm M}^{n-1} \cdot \rm s}</math> +|- +|Linear Plot to determine ''k'' +|<math>[A] \ \mbox{vs.} \ t</math> +|<math>\ln ([A]) \ \mbox{vs.} \ t </math> +|<math>\frac{1}{[A]} \ \mbox{vs.} \ t</math> +|<math>\frac{1}{[A]^{n-1}} \ \mbox{vs.} \ t</math> +<small>[Except first order]</small> +|- +|Half-life +|<math>t_{1/2} = \frac{[A]_0}{2k}</math> +|<math>t_{1/2} = \frac{\ln (2)}{k}</math> +|<math>t_{1/2} = \frac{1}{k[A]_0}</math><ref name="2nd-order"/> +|<math>t_{1/2} = \frac{2^{n-1}-1}{(n-1)k{[A]_0}^{n-1}}</math> +<small>[Except first order] +|} + +Where M stands for concentration in [[molarity]] (mol · L<sup>−1</sup>), ''t'' for time, and ''k'' for the reaction rate constant. The half-life of a first-order reaction is often expressed as ''t''<sub>1/2</sub> = 0.693/''k'' (as ln2 = 0.693). + +==Equilibrium reactions or opposed reactions== + +A pair of forward and reverse reactions may define an [[Chemical equilibrium|equilibrium]] process. For example, A and B react into X and Y and vice versa (s, t, u, and v are the [[stoichiometric coefficient]]s): + +:<math>\ sA + tB \rightleftharpoons uX + vY</math> + +The reaction rate expression for the above reactions (assuming each one is elementary) can be expressed as: + +:<math> r = {k_1 [A]^s[B]^t} - {k_2 [X]^u[Y]^v}\,</math> + +where: k<sub>1</sub> is the rate coefficient for the reaction that consumes A and B; k<sub>2</sub> is the rate coefficient for the backwards reaction, which consumes X and Y and produces A and B. + +The constants k<sub>1</sub> and k<sub>2</sub> are related to the equilibrium coefficient for the reaction (K) by the following relationship (set r=0 in balance): + +:<math> {k_1 [A]^s[B]^t = k_2 [X]^u[Y]^v}\,</math> +:<math> K = \frac{[X]^u[Y]^v}{[A]^s[B]^t} = \frac{k_1}{k_2}</math> + +[[Image:ChemicalEquilibrium.svg|thumb|300px|right|Concentration of A (A<sub>0</sub> = 0.25 mole/l) and B versus time reaching equilibrium k<sub>f</sub> = 2 min<sup>-1</sup> and k<sub>r</sub> = 1 min<sup>-1</sup>]] + +===Simple example=== + +In a simple equilibrium between two species: + +:<math> A \rightleftharpoons B </math> + +Where the reactions starts with an initial concentration of A, <math>[A]_0</math>, with an initial concentration of 0 for B at time t=0. + +Then the constant K at equilibrium is expressed as: + +:<math>K \ \stackrel{\mathrm{def}}{=}\ \frac{k_{f}}{k_{b}} = \frac{\left[B\right]_e} {\left[A\right]_e}</math> + +Where <math>[A]_e</math> and <math>[B]_e</math> are the concentrations of A and B at equilibrium, respectively. + +The concentration of A at time t, <math>[A]_t</math>, is related to the concentration of B at time t, <math>[B]_t</math>, by the equilibrium reaction equation: + +:<math>\ [A]_t = [A]_0 - [B]_t </math> + +Note that the term <math>[B]_0</math> is not present because, in this simple example, the initial concentration of B is 0. 
+ +This applies even when time t is at infinity; i.e., equilibrium has been reached: + +:<math>\ [A]_e = [A]_0 - [B]_e </math> + +then it follows, by the definition of K, that + +:<math>\ [B]_e = x = \frac{k_{f}}{k_f+k_b}[A]_0 </math> + +and, therefore, + +:<math>\ [A]_e = [A]_0 - x = \frac{k_{b}}{k_f+k_b}[A]_0 </math> + +These equations allow us to uncouple the [[system of equations|system of differential equations]], and allow us to solve for the concentration of A alone. + +The reaction equation, given previously as: + +:<math> r = {k_1 [A]^s[B]^t} - {k_2 [X]^u[Y]^v}\,</math> +:<math> -\frac{d[A]}{dt} = {k_f [A]_t} - {k_b [B]_t}\,</math> + +The derivative is negative because this is the rate of the reaction going from A to B, and therefore the concentration of A is decreasing. To simplify annotation, let x be <math>[A]_t</math>, the concentration of A at time t. Let <math>x_e</math> be the concentration of A at equilibrium. Then: + +:<math> -\frac{d[A]}{dt} = {k_f [A]_t} - {k_b [B]_t}\,</math> +:<math> -\frac{dx}{dt} = {k_f x} - {k_b [B]_t}\,</math> +:<math> -\frac{dx}{dt} = {k_f x} - {k_b ([A]_0 - x)}\,</math> +:<math> -\frac{dx}{dt} = {(k_f + k_b)x} - {k_b [A]_0}\,</math> + +Since: + +:<math> k_f + k_b = {k_b \frac{[A]_0}{x_e}} </math> + +The [[reaction rate]] becomes: + +:<math>\ \frac{dx}{dt} = \frac{k_b[A]_0}{x_e} (x_e - x) </math> + +which results in: + +:<math> \ln \left(\frac{[A]_0 - [A]_e}{[A]_t-[A]_e}\right) = (k_f + k_b)t </math> + +A plot of the negative [[natural logarithm]] of the concentration of A in time minus the concentration at equilibrium versus time t gives a straight line with slope k<sub>f</sub> + k<sub>b</sub>. By measurement of A<sub>e</sub> and B<sub>e</sub> the values of K and the two [[reaction rate constant]]s will be known.<ref>For a worked out example see: ''Determination of the Rotational Barrier for Kinetically Stable Conformational Isomers via NMR and 2D TLC An Introductory Organic Chemistry Experiment'' Gregory T. Rushton, William G. Burns, Judi M. Lavin, Yong S. Chong, Perry Pellechia, and Ken D. Shimizu [[J. Chem. Educ.]] '''2007''', 84, 1499. [http://jchemed.chem.wisc.edu/Journal/Issues/2007/Sep/abs1499.html Abstract]</ref> + +===Generalization of simple example=== + +If the concentration at the time t = 0 is different from above, the simplifications above are invalid, and a system of differential equations must be solved. However, this system can also be solved exactly to yield the following generalized expressions: + +<math>\left[ A \right]=\left[ A \right]_{0}\frac{1}{k_{f}+k_{b}}\left( k_{b}+k_{f}e^{-\left( k_{f}+k_{b} \right)t} \right)+\left[ B \right]_{0}\frac{k_{b}}{k_{f}+k_{b}}\left( 1-e^{-\left( k_{f}+k_{b} \right)t} \right)</math> + +<math>\left[ B \right]=\left[ A \right]_{0}\frac{k_{f}}{k_{f}+k_{b}}\left( 1-e^{-\left( k_{f}+k_{b} \right)t} \right)+\left[ B \right]_{0}\frac{1}{k_{f}+k_{b}}\left( k_{f}+k_{b}e^{-\left( k_{f}+k_{b} \right)t} \right)</math> + +When the equilibrium constant is close to unity and the reaction rates very fast for instance in [[Conformational isomerism|conformational analysis]] of molecules, other methods are required for the determination of rate constants for instance by complete lineshape analysis in [[NMR spectroscopy]]. 
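+
+As a minimal numerical sketch (illustrative only; the rate constants and initial concentration follow the figure above), the generalized expression for [A] can be checked against a direct forward-Euler integration of <math> -\frac{d[A]}{dt} = k_f [A] - k_b [B]</math>:
+
+<syntaxhighlight lang="python">
+# Forward-Euler integration of A <-> B compared with the closed-form [A](t)
+# given above, for the case [B]_0 = 0.
+from math import exp
+
+kf, kb = 2.0, 1.0          # rate constants (per minute), as in the figure
+A0, B0 = 0.25, 0.0         # initial concentrations (mol/L)
+dt, t_end = 1e-4, 3.0
+
+A, B, t = A0, B0, 0.0
+while t < t_end:
+    dA = (-kf * A + kb * B) * dt
+    A, B, t = A + dA, B - dA, t + dt
+
+closed_form = A0 * (kb + kf * exp(-(kf + kb) * t_end)) / (kf + kb)
+print(A, closed_form)      # both are about 0.0834 and agree closely
+</syntaxhighlight>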
+ +==Consecutive reactions== +If the rate constants for the following reaction are <math>k_1</math> and <math>k_2</math>; <math> A \rightarrow \; B \rightarrow \; C </math>, then the rate equation is: + +For reactant A: <math> \frac{d[A]}{dt} = -k_1 [A] </math> + +For reactant B: <math> \frac{d[B]}{dt} = k_1 [A] - k_2 [B]</math> + +For product C: <math> \frac{d[C]}{dt} = k_2 [B]</math> + +With the individual concentrations scaled by the total population of reactants to become probabilities, linear systems of differential equations such as these can be formulated as a [[master equation]]. The differential equations can be solved analytically and the integrated rate equations are + +<math>[A]=[A]_0 e^{-k_1 t}</math> + +<math>\left[ B \right]=\left\{ \begin{array}{*{35}l} + \left[ A \right]_{0}\frac{k_{1}}{k_{2}-k_{1}}\left( e^{-k_{1}t}-e^{-k_{2}t} \right) & k_{1}\ne k_{2} \\ + \left[ A \right]_{0}k_{1}te^{-k_{1}t}+\left[ B \right]_{0}e^{-k_{1}t} & \text{otherwise} \\ +\end{array} \right.</math> + +<math>\left[ C \right]=\left\{ \begin{array}{*{35}l} + \left[ A \right]_{0}\left( 1+\frac{k_{1}e^{-k_{2}t}-k_{2}e^{-k_{1}t}}{k_{2}-k_{1}} \right)+\left[ B \right]_{0}\left( 1-e^{-k_{2}t} \right)+\left[ C \right]_{0} & k_{1}\ne k_{2} \\ + \left[ A \right]_{0}\left( 1-e^{-k_{1}t}-k_{1}te^{-k_{1}t} \right)+\left[ B \right]_{0}\left( 1-e^{-k_{1}t} \right)+\left[ C \right]_{0} & \text{otherwise} \\ +\end{array} \right.</math> + +The [[steady state (chemistry)|steady state]] approximation leads to very similar results in an easier way. + +==Parallel or competitive reactions== +When a substance reacts simultaneously to give two different products, a parallel or competitive reaction is said to take place. + +*''Two first order reactions'': + +<math> A \rightarrow \; B </math> and <math> A \rightarrow \; C </math>, with constants <math> k_1</math> and <math> k_2</math> and rate equations <math>-\frac{d[A]}{dt}=(k_1+k_2)[A]</math>, <math> \frac{d[B]}{dt}=k_1[A]</math> and <math> \frac{d[C]}{dt}=k_2[A]</math> + +The integrated rate equations are then <math>\ [A] = [A]_0 e^{-(k_1+k_2)t}</math>; <math>[B] = \frac{k_1}{k_1+k_2}[A]_0 (1-e^{-(k_1+k_2)t})</math> and +<math>[C] = \frac{k_2}{k_1+k_2}[A]_0 (1-e^{-(k_1+k_2)t})</math>. + +One important relationship in this case is <math> \frac{[B]}{[C]}=\frac{k_1}{k_2}</math> + +*''One first order and one second order reaction'':<ref>José A. Manso et al."A Kinetic Approach to the Alkylating Potential of Carcinogenic Lactones" Chem. Res. Toxicol. 2005, 18, (7) 1161-1166</ref> + +This can be the case when studying a bimolecular reaction and a simultaneous hydrolysis (which can be treated as pseudo order one) takes place: the hydrolysis complicates the study of the reaction kinetics, because some reactant is being "spent" in a parallel reaction. For example A reacts with R to give our product C, but meanwhile the hydrolysis reaction takes away an amount of A to give B, a byproduct: <math> A + H_2O \rightarrow \ B </math> and <math> A + R \rightarrow \ C </math>. The rate equations are: <math> \frac{d[B]}{dt}=k_1[A][H_2O]=k_1'[A]</math> and <math> \frac{d[C]}{dt}=k_2[A][R]</math>. Where <math>k_1'</math> is the pseudo first order constant. + +The integrated rate equation for the main product [C] is <math> [C]=[R]_0 \left [ 1-e^{-\frac{k_2}{k_1'}[A]_0(1-e^{-k_1't})} \right ] </math>, which is equivalent to <math> ln \frac{[R]_0}{[R]_0-[C]}=\frac{k_2[A]_0}{k_1'}(1-e^{-k_1't})</math>. 
Concentration of B is related to that of C through <math> [B]=-\frac{k_1'}{k_2} ln \left ( 1 - \frac{[C]}{[R]_0} \right )</math> + +The integrated equations were analytically obtained but during the process it was assumed that <math>[A]_0-[C]\approx \;[A]_0</math> therefeore, previous equation for [C] can only be used for low concentrations of [C] compared to [A]<sub>0</sub> + +==General dynamics of unimolecular conversion== + +For a general unimolecular reaction involving interconversion of <math>N</math> different species, whose concentrations at time <math>t</math> are denoted by <math>X_1(t)</math> through <math>X_N(t)</math>, an analytic form for the time-evolution of the species can be found. Let the rate constant of conversion from species <math>X_i</math> to species <math>X_j</math> be denoted as <math>k_{ij}</math>, and construct a rate-constant matrix <math>K</math> whose entries are the <math>k_{ij}</math>. + +Also, let <math>X(t)=(X_1(t),X_2(t),...,X_N(t))^T</math> be the vector of concentrations as a function of time. + +Let <math>J=(1,1,1,...,1)^T</math> be the vector of ones. + +Let <math>I</math> be the <math>N</math>×<math>N</math> identity matrix. + +Let <math>Diag</math> be the function that takes a vector and constructs a diagonal matrix whose on-diagonal entries are those of the vector. + +Let <math> \displaystyle\mathcal{L}^{-1}</math> be the inverse Laplace transform from <math>s</math> to <math>t</math>. + +Then the time-evolved state <math>X(t)</math> is given by + +:<math>X(t)=\displaystyle\mathcal{L}^{-1}[(sI+Diag(KJ)-K^T)^{-1}X(0)]</math>, +thus providing the relation between the initial conditions of the system and its state at time <math>t</math>. + +==See also== +*[[Michaelis–Menten kinetics]] +*[[Petersen matrix]] +*[[Reaction-diffusion equation]] +*[[Reactions on surfaces]]: rate equations for reactions where at least one of the reactants [[adsorption|adsorbs]] onto a surface +*[[Reaction progress kinetic analysis]] +*[[Reaction rate]] +*[[Reaction rate constant]] +*[[Steady state (chemistry)|Steady state approximation]] + +==References== +{{reflist}} + +{{Reaction mechanisms}} + +{{DEFAULTSORT:Rate Equation}} +[[Category:Chemical kinetics]] +[[Category:Chemical engineering]] + +[[cy:Cyfradd adwaith#Hafaliadau cyfradd]] + py0pyrxamt3dxahemml6biggatonspx + + + + Martingale (betting system) + 0 + 3544 + + 3545 + 2013-12-02T13:07:40Z + + 212.9.31.12 + + /* Anti-martingale */ what are the scare quotes for? + wikitext + text/x-wiki + {{For|the generalised mathematical concept|Martingale (probability theory)}} +{{Refimprove|date=October 2010}} + +A '''martingale''' is any of a class of [[betting strategy|betting strategies]] that originated from and were popular in 18th century [[France]]. The simplest of these strategies was designed for a game in which the gambler wins his stake if a coin comes up heads and loses it if the coin comes up tails. The strategy had the gambler double his bet after every loss, so that the first win would recover all previous losses plus win a profit equal to the original stake. The martingale strategy has been applied to [[roulette]] as well, as the probability of hitting either red or black is close to 50%. + +Since a gambler with infinite wealth will, [[almost surely]], eventually flip heads, the martingale betting strategy was seen as a [[certainty|sure thing]] by those who advocated it. 
Of course, none of the gamblers in fact possessed infinite wealth, and the [[exponential growth]] of the bets would eventually bankrupt "unlucky" gamblers who chose to use the martingale. It is therefore a good example of a [[Taleb distribution]] – the gambler usually wins a small net reward, thus appearing to have a sound strategy. However, the gambler's expected value does indeed remain zero (or less than zero) because the small probability that he will suffer a catastrophic loss exactly balances with his expected gain. (In a casino, the expected value is ''negative'', due to the house's edge.) The likelihood of catastrophic loss may not even be very small. The bet size rises exponentially. This, combined with the fact that strings of consecutive losses actually occur more often than common intuition suggests, can bankrupt a gambler quickly. + +Casino betting limits eliminate the effectiveness of using the martingale strategy.<ref>{{cite web|url=http://www.goodbonusguide.com/casino-articles/roulette-systems-destroying-the-martingale-theory-myth.html|title=Roulette Systems: Destroying The Martingale System Myth |publisher=Good Bonus Guide|accessdate=31 March 2012}}</ref> + +==Effect of variance== +Sometimes, by temporarily avoiding a losing streak, a bettor achieves a better result than the expected negative return. A straight string of losses is the only sequence of outcomes that results in a loss of money, so even when a player has lost the majority of his bets, he can still be ahead overall, since he always wins 1 unit when a bet wins, regardless of how many previous losses.<ref>{{cite web|url=http://www.blackjackincolor.com/useless4.htm |title=Martingale Long Term vs. Short Term Charts |publisher=Blackjackincolor.com |date= |accessdate=2009-08-04}}</ref> + +==Intuitive analysis== + +Assuming that the win/loss outcomes of each bet are [[independent and identically distributed random variables]], the stopping time has finite [[expected value]]. This justifies the following argument, explaining why the betting system fails: Since [[expected value#Linearity|expectation is linear]], the expected value of a series of bets is just the sum of the expected value of each bet. Since in such games of chance the bets are [[statistical independence|independent]], the expectation of each bet does not depend on whether you previously won or lost. In most casino games, the expected value of any individual bet is negative, so the sum of lots of negative numbers is also always going to be negative. + +The martingale strategy fails even with unbounded stopping time, as long as there is a limit on earnings or on the bets (which are also true in practice).<ref name=mitzenmacherupfal>{{citation | year=2005 | title = Probability and computing: randomized algorithms and probabilistic analysis | author1=Michael Mitzenmacher | author2=Eli Upfal | publisher=Cambridge University Press | isbn=978-0-521-83540-4 | page=298 | url=http://books.google.com/books?id=0bAYl6d7hvkC&pg=PA298&dq=%22martingale+stopping%22}}</ref> It is only with unbounded wealth, bets ''and'' time that the martingale becomes a [[winning strategy]]. + +==Mathematical analysis== +One round of the idealized martingale without time or credit constraints can be formulated mathematically as follows. 
Let the coin tosses be represented by a sequence {{nowrap|1=''X''<sub>0</sub>, ''X''<sub>1</sub>, &hellip;}} of independent random variables, each of which is equal to ''H'' with probability ''p'', and ''T'' with probability {{nowrap|1=''q'' = 1 – ''p''.}} Let ''N'' be time of appearance of the first ''H''; in other words, {{nowrap|1=''X''<sub>0</sub>, ''X''<sub>1</sub>, &hellip;, ''X''<sub>''N''–1</sub> = ''T''}}, and {{nowrap|1=''X''<sub>''N''</sub> = ''H''.}} If the coin never shows ''H'', we write {{nowrap|1=''N'' = ∞.}} ''N'' is itself a random variable because it depends on the random outcomes of the coin tosses. + +In the first {{nowrap|1=''N'' – 1}} coin tosses, the player following the martingale strategy loses {{nowrap|1=1, 2, &hellip;, 2<sup>''N''–1</sup>}} units, accumulating a total loss of {{nowrap|1=2<sup>''N''</sup> − 1.}} On the ''N''<sup>th</sup> toss, there is a win of 2<sup>''N''</sup> units, resulting in a net gain of 1 unit over the first ''N'' tosses. For example, suppose the first four coin tosses are ''T'', ''T'', ''T'', ''H'' making {{nowrap|1=''N'' = 3.}} The bettor loses 1, 2, and 4 units on the first three tosses, for a total loss of 7 units, then wins 8 units on the fourth toss, for a net gain of 1 unit. As long as the coin eventually shows heads, the betting player realizes a gain. + +What is the probability that {{nowrap|1=''N'' = ∞,}} i.e., that the coin never shows heads? Clearly it can be no greater than the probability that the first ''k'' tosses are all ''T''; this probability is ''q<sup>k''</sup>. Unless {{nowrap|1=''q'' = 1}}, the only nonnegative number less than or equal to ''q<sup>k''</sup> for all values of ''k'' is zero. It follows that ''N'' is finite with probability 1; therefore with probability 1, the coin will eventually show heads and the bettor will realize a net gain of 1 unit. + +This property of the idealized version of the martingale accounts for the attraction of the idea. In practice, the idealized version can only be approximated, for two reasons. Unlimited credit to finance possibly astronomical losses during long runs of tails is not available, and there is a limit to the number of coin tosses that can be performed in any finite period of time, precluding the possibility of playing long enough to observe very long runs of tails. + +As an example, consider a bettor with an available fortune, or credit, of <math>2^{43}</math> (approximately 9 trillion) units, roughly half the size of the current US national debt in dollars. With this very large fortune, the player can afford to lose on the first 42 tosses, but a loss on the 43rd cannot be covered. The probability of losing on the first 42 tosses is <math>q^{42}</math>, which will be a very small number unless tails are nearly certain on each toss. In the fair case where <math>q=1/2</math>, we could expect to wait something on the order of <math>2^{42}</math> tosses before seeing 42 consecutive tails; tossing coins at the rate of one toss per second, this would require approximately 279,000 years. + +This version of the game is likely to be unattractive to both players. The player with the fortune can expect to see a head and gain one unit on average every two tosses, or two seconds, corresponding to an annual income of about 31.6 million units until disaster (42 tails) occurs. This is only a 0.0036 ''percent'' return on the fortune at risk. 
The other player can look forward to steady losses of 31.6 million units per year until hitting an incredibly large jackpot, probably in something like 279,000 years, a period far longer than any currency has yet existed. If <math>q > 1/2</math>, this version of the game is also unfavorable to the first player in the sense that it would have negative expected winnings. + +The impossibility of winning over the long run, given a limit of the size of bets or a limit in the size of one's bankroll or line of credit, is proven by the [[optional stopping theorem]].<ref name=mitzenmacherupfal/> + +==Mathematical analysis of a single round== +Let one round be defined as a sequence of consecutive losses followed by either a win, or bankruptcy of the gambler. After a win, the gambler "resets" and is considered to have started a new round. A continuous sequence of martingale bets can thus be partitioned into a sequence of independent rounds. Following is an analysis of the expected value of one round. + +Let ''q'' be the probability of losing (e.g. for American double-zero roulette, it is 10/19 for a bet on black or red). Let ''B'' be the amount of the initial bet. Let ''n'' be the finite number of bets the gambler can afford to lose. + +The probability that the gambler will lose all ''n'' bets is ''q''<sup>''n''</sup>. When all bets lose, the total loss is + +:<math>\sum_{i=1}^n B \cdot 2^{i-1} = B (2^n - 1)</math> + +The probability the gambler does not lose all ''n'' bets is 1&nbsp;&minus;&nbsp;''q''<sup>''n''</sup>. In all other cases, the gambler wins the initial bet (''B''.) Thus, the [[expected value|expected]] profit per round is + +:<math>(1-q^n) \cdot B - q^n \cdot B (2^n - 1) = B (1 - (2q)^n)</math> + +Whenever ''q''&nbsp;>&nbsp;1/2, the expression 1&nbsp;&minus;&nbsp;(2''q'')<sup>''n''</sup>&nbsp;<&nbsp;0 for all ''n''&nbsp;>&nbsp;0. Thus, for all games where a gambler is more likely to lose than to win any given bet, that gambler is expected to lose money, on average, each round. Increasing the size of wager for each round per the martingale system only serves to increase the average loss. + +Suppose a gambler has a 63 unit gambling bankroll. The gambler might bet 1 unit on the first spin. On each loss, the bet is doubled. Thus, taking ''k'' as the number of preceding consecutive losses, the player will always bet 2<sup>k</sup> units. + +With a win on any given spin, the gambler will net 1 unit over the total amount wagered to that point. Once this win is achieved, the gambler restarts the system with a 1 unit bet. + +With losses on all of the first six spins, the gambler loses a total of 63 units. This exhausts the bankroll and the martingale cannot be continued. + +In this example, the probability of losing the entire bankroll and being unable to continue the martingale is equal to the probability of 6 consecutive losses: (10/19)<sup>6</sup> =&nbsp;2.1256%. The probability of winning is equal to 1 minus the probability of losing 6 times: 1&nbsp;&minus;&nbsp;(20/38)<sup>6</sup>&nbsp;=&nbsp;97.8744%. + +The expected amount won is (1 &times; 0.978744) = 0.978744.<br> +The expected amount lost is (63 &times; 0.021256)= 1.339118.<br> +Thus, the total expected value for each application of the betting system is (0.978744&nbsp;&minus;&nbsp;1.339118) = &minus;0.360374 . + +In a unique circumstance, this strategy can make sense. Suppose the gambler possesses exactly 63 units but desperately needs a total of 64. 
Assuming ''q''&nbsp;>&nbsp;1/2 (it is a real casino) and he may only place bets at even odds, his best strategy is '''bold play''': at each spin, he should bet the smallest amount such that if he wins he reaches his target immediately, and if he doesn't have enough for this, he should simply bet everything. Eventually he either goes bust or reaches his target. This strategy gives him a probability of 97.8744% of achieving the goal of winning one unit vs. a 2.1256% chance of losing all 63 units, and that is the best probability possible in this circumstance.<ref name=dubinssavage>{{citation | year=1965 | title = How to gamble if you must: inequalities for stochastic processes | author1=Lester E. Dubins |authorlink1=Lester Dubins| author2=Leonard J. Savage|authorlink2=Leonard Jimmie Savage | publisher=McGraw Hill | url=http://books.google.nl/books?id=kt9QAAAAMAAJ }}</ref> However, bold play is not always the optimal strategy for having the biggest possible chance to increase an initial capital to some desired higher amount. If the gambler can bet arbitrarily small amounts at arbitrarily long odds (but still with the same expected loss of 2/38 of the stake at each bet), and can only place one bet at each spin, then there are strategies with above 98% chance of attaining his goal, and these use very timid play unless the gambler is close to losing all his capital, in which case he does switch to extremely bold play.<ref name=shepp>{{citation | year=2006 | title = Bold play and the optimal policy for Vardi's casino, pp 150–156 in: Random Walk, Sequential Analysis and Related Topics | author1=Larry Shepp | publisher=World Scientific | url=http://eproceedings.worldscinet.com/9789812772558/9789812772558_0010.html }}</ref>

==Alternative mathematical analysis==
The previous analysis calculates ''expected value'', but we can ask another question: what is the chance that one can play a casino game using the martingale strategy and avoid a losing streak long enough to double one's bankroll?

As before, this depends on the likelihood of losing 6 roulette spins in a row assuming we are betting red/black or even/odd. Many gamblers believe that the chances of losing 6 in a row are remote, and that with patient adherence to the strategy they will slowly increase their bankroll.

In reality, the odds of a streak of 6 losses in a row are much higher than many people intuitively believe. Psychological studies have shown that since people know that the odds of losing 6 times in a row out of 6 plays are low, they incorrectly assume that in a longer string of plays the odds are also very low. When people are asked to invent data representing 200 coin tosses, they often do not add streaks of more than 5 because they believe that these streaks are very unlikely.<ref>{{cite web|url=http://wizardofodds.com/image/ask-the-wizard/streaks.pdf|title=What were the Odds of Having Such a Terrible Streak at the Casino?|last=Martin|first=Frank A.|date=February 2009|publisher=WizardOfOdds.com|accessdate=31 March 2012}}</ref> This intuitive belief is sometimes referred to as the [[representativeness heuristic]].

The odds of losing a single spin at roulette are {{math|q {{=}} 20/38 {{=}} 52.6316%}}. If you play a total of 6 spins, the odds of losing 6 times are {{math|q<sup>6</sup> {{=}} 2.1256%}}, as stated above. However, if you play more and more spins, the odds of losing 6 times in a row begin to increase rapidly.
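The streak probabilities listed below can also be estimated empirically. The following Python sketch (not part of the original analysis; the number of simulated trials is an arbitrary choice) uses the single-spin loss probability {{math|q {{=}} 20/38}} from above and counts how often at least one run of six consecutive losses occurs within a given number of spins:

<syntaxhighlight lang="python">
import random

def prob_of_loss_streak(spins, trials=20_000, q=20/38, run=6):
    """Monte Carlo estimate of P(at least one run of `run` consecutive losses)."""
    hits = 0
    for _ in range(trials):
        consecutive = 0
        for _ in range(spins):
            if random.random() < q:       # a losing spin
                consecutive += 1
                if consecutive == run:
                    hits += 1
                    break
            else:
                consecutive = 0
    return hits / trials

random.seed(0)
for spins in (73, 150, 250):
    print(spins, round(prob_of_loss_streak(spins), 3))
# The three estimates should be close to the 50.3%, 77.2% and 91.1% figures quoted below.
</syntaxhighlight>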
+* In 73 spins, there is a 50.3% chance that you will at some point have lost at least 6 spins in a row. (The chance of still being solvent after the first six spins is 0.978744, and the chance of becoming bankrupt at each subsequent spin is (1&nbsp;&minus;&nbsp;0.526316)&times;0.021256 =&nbsp;0.010069, where the first term is the chance that you won the (''n''&nbsp;&minus;&nbsp;6)th spin – if you had lost the (''n''&nbsp;&minus;&nbsp;6)th spin, you would have become bankrupt on the (''n''&nbsp;&minus;&nbsp;1)th spin. Thus over 73 spins the probability of remaining solvent is 0.978744 x (1-0.010069)^67 = 0.49683, and thus the chance of becoming bankrupt is 1&nbsp;&minus;&nbsp;0.49683 = 50.3%.) +* Similarly, in 150 spins, there is a 77.2% chance that you will lose at least 6 spins in a row at some point. +* And in 250 spins, there is a 91.1% chance that you will lose at least 6 spins in a row at some point. + +To double the initial bankroll of 6,300 with initial bets of 100 would require a minimum of 63 spins (in the unlikely event you win every time), and a maximum of 378 spins (in the even more unlikely event that you win every single round on the sixth spin). Each round will last an average of approximately 2 spins, so, 63 rounds can be expected to take about 126 spins on average. Computer simulations show that the required number will almost{{Clarify|date=August 2009}} never exceed 150 spins. Thus many gamblers believe that they can play the martingale strategy with very little chance of failure long enough to double their bankroll. However, the odds of losing 6 in a row are 77.2% over 150 spins, as above. + +We can replace the roulette game in the analysis with either the ''pass line'' at [[craps]], where the odds of losing are lower {{math|q{{=}}(251:244}}, or {{math|251/495){{=}}50.7071%}}, or a ''coin toss'' game where the odds of losing are 50.0%. We should note that games like coin toss with no house edge are not played in a commercial casino and thus represent a ''limiting case''. +* In 150 turns, there is a 70.7% chance that you will lose 6 times in a row on the ''pass line''. +* In 150 turns, there is a 68.2% chance that you will lose 6 times in a row at ''coin tossing''. + +In larger casinos, the maximum table limit is higher, so you can double 7, 8, or 9 times without exceeding the limit. However, in order to end up with twice your initial bankroll, you must play even longer. The calculations produce the same results. The probabilities are overwhelming that you will reach the ''bust streak'' before you can even double your bankroll. + +The conclusion is that players using martingale strategy pose no threat to a casino. The odds are high that the player will go bust before he is even able to double his money. + +Contrary to popular belief, [[table limit]]s are not designed to limit players from exploiting a martingale strategy. Instead, table limits exist to reduce the variance for the casino. For example, a casino which wins an average of $1000 a day on a given roulette table might not accept a $7000 bet on black at that table. While that bet would represent a positive expectation of over $368 ({{math|10/19 &middot; 7000 &minus; 18/38 &middot; 7000 {{=}} 368.42}}) to the casino, it would also have a 47.37% chance of negating an entire week's profit. The effect however is the same - the ability of the player to use the martingale system to win is curtailed. + +==Anti-martingale== +This is also known as the reverse martingale. 
In a classic martingale betting style, gamblers increase bets after each loss in hopes that an eventual win will recover all previous losses. The anti-martingale approach instead increases bets after wins, while reducing them after a loss. The perception is that the gambler will benefit from a winning streak or a "hot hand", while reducing losses while "cold" or otherwise having a losing streak. As the single bets are independent from each other (and from the gambler's expectations), the concept of winning "streaks" is merely an example of [[gambler's fallacy]], and the anti-martingale strategy fails to make any money. If on the other hand, real-life stock returns are serially correlated (for instance due to economic cycles and delayed reaction to news of larger market participants), "streaks" of wins or losses do happen more often and are longer than those under a purely random process, the anti-martingale strategy could theoretically apply and can be used in trading systems (as trend-following or "doubling up"). + +== See also == +{{Portal|Mathematics}} + +*[[St. Petersburg paradox]] + +==References== + +{{Reflist}} + +{{DEFAULTSORT:Martingale (Betting System)}} +[[Category:Betting systems]] +[[Category:Roulette and wheel games]] +[[Category:Gambling terminology]] + 3wnogoecv939va0ky4sgwlkebx78n6w + + + + Borsuk's conjecture + 0 + 15432 + + 15433 + 2013-11-07T17:11:44Z + + Yobot + 0 + + + /* Problem */Reference before punctuation using [[Project:AWB|AWB]] (9585) + wikitext + text/x-wiki + [[File:Borsuk Hexagon.svg|200px|thumb|right|An example of a [[hexagon]] cut into three pieces of smaller diameter.]] + +The '''Borsuk problem in geometry''', for historical reasons incorrectly called '''Borsuk's [[conjecture]]''', is a question in [[discrete geometry]]. + +==Problem== +In 1932 [[Karol Borsuk]] showed<ref name="BorsukFM">K. Borsuk, ''Drei Sätze über die n-dimensionale euklidische Sphäre'', "Fundamenta Mathematicae", '''20''' (1933). 177&ndash;190</ref> that an ordinary 3-dimensional [[ball (mathematics)|ball]] in [[Euclidean space]] can be easily dissected into 4 solids, each of which has a smaller [[diameter]] than the ball, and generally ''d''-dimensional ball can be covered with {{nobr|''d'' + 1}} [[Compact space|compact]] [[Set (mathematics)|sets]] of diameters smaller than the ball. At the same time he proved that ''d'' [[subset]]s are not enough in general. The proof is based on the [[Borsuk–Ulam theorem]]. That led Borsuk to a general question: + +: ''Die folgende Frage bleibt offen: Lässt sich jede beschränkte Teilmenge E des Raumes <math>\Bbb R^n</math> in (n&nbsp;+&nbsp;1) Mengen zerlegen, von denen jede einen kleineren Durchmesser als E hat?''<ref name="BorsukFM" /> + +Translation: + +: ''The following question remains open: Can every [[bounded set|bounded]] subset E of the space <math>\Bbb R^n</math> be [[partition of a set|partitioned]] into (n&nbsp;+&nbsp;1) sets, each of which has a smaller diameter than E?'' + +The question got a positive answer in the following cases: +* ''d'' = 2 — the original result by Borsuk (1932). +* ''d'' = 3 — the result of Julian Perkal (1947),<ref>J. Perkal, Sur la subdivision des ensembles en parties de diamètre inférieur, ''Colloq. Math.'' '''2''' (1947), 45.</ref> and independently, 8 years later, H. G. Eggleston (1955).<ref>H. G. Eggleston, Covering a three-dimensional set with sets of smaller diameter, ''J. Lond. Math. Soc''. 30 (1955), 11–24.</ref> A simple proof was found later by [[Branko Grünbaum]] and Aladár Heppes. 
+
* For all ''d'' for [[Smooth manifold|smooth]] convex bodies — the result of [[Hugo Hadwiger]] (1946).<ref>Hadwiger H, Überdeckung einer Menge durch Mengen kleineren Durchmessers, ''Comment. Math. Helv.'', 18 (1945/46), 73–75; <br/> Mitteilung betreffend meine Note: Überdeckung einer Menge durch Mengen kleineren Durchmessers, 19 (1946/47), 72–73</ref>
* For all ''d'' for [[Rotational symmetry|centrally-symmetric]] bodies (A.S. Riesling, 1971).
* For all ''d'' for [[Solid of revolution|bodies of revolution]] — the result of Boris Dekster (1995).

The problem was finally solved in 1993 by [[Jeff Kahn]] and [[Gil Kalai]], who showed that the general answer to Borsuk's question is ''no''. Their construction shows that {{nobr|''d'' + 1}} pieces do not suffice for {{nobr|1=''d'' = 1,325}} and for each {{nobr|''d'' > 2,014}}.

After Andriy V. Bondarenko showed that Borsuk's conjecture is false for all {{nobr|''d'' ≥ 65}},<ref>Andriy V. Bondarenko, [http://arxiv.org/abs/1305.2584 On Borsuk's conjecture for two-distance sets]</ref> the current best bound, due to Thomas Jenrich, is 64.<ref>Thomas Jenrich, [http://arxiv.org/abs/1308.0206 A 64-dimensional two-distance counterexample to Borsuk's conjecture]</ref>

Apart from finding the minimum number ''d'' of dimensions such that the number of pieces <math>\alpha(d) > d+1</math>, mathematicians are interested in finding the general behavior of the function <math>\alpha(d)</math>. Kahn and Kalai show that in general (that is, for ''d'' big enough), one needs <math>\alpha(d) \ge (1.2)^{\sqrt{d}}</math> pieces. They also quote the upper bound by [[Oded Schramm]], who showed that for every ''ε'', if ''d'' is sufficiently large, <math>\alpha(d) \le \left(\sqrt{3/2} + \varepsilon\right)^d</math>. The correct order of magnitude of ''α''(''d'') is still unknown (see e.g. Alon's article); however, it is conjectured that there is a constant {{nobr|''c'' > 1}} such that <math>\alpha(d) > c^d</math> for all {{nobr|''d'' ≥ 1}}.

==See also==
*[[Hadwiger conjecture (combinatorial geometry)|Hadwiger's conjecture]] on covering convex bodies with smaller copies of themselves

==Notes==
{{reflist}}

==References==
* [http://matwbn.icm.edu.pl/ksiazki/fm/fm20/fm20117.pdf ''Drei Sätze über die n-dimensionale euklidische Sphäre''] (German 'Three statements of ''n''-dimensional Euclidean sphere') – original Borsuk's article in [[Fundamenta Mathematicae]], made available by [http://matwbn.icm.edu.pl/index.php?jez=en Polish Virtual Library of Science]
* Jeff Kahn and [[Gil Kalai]], [http://arxiv.org/abs/math.MG/9307229 A counterexample to Borsuk's conjecture], ''[[Bulletin of the American Mathematical Society]]'' '''29''' (1993), 60&ndash;62.
* [[Noga Alon]], [http://arxiv.org/abs/math.CO/0212390 Discrete mathematics: methods and challenges], ''Proceedings of the [[International Congress of Mathematicians]], [[Beijing]] 2002'', vol. 1, 119&ndash;135.
* Aicke Hinrichs and Christian Richter, [http://users.minet.uni-jena.de/~hinrichs/paper/18/borsuk.pdf New sets with large Borsuk numbers], ''Discrete Math.'' '''270''' (2003), 137&ndash;147
* Andrei M. Raigorodskii, The Borsuk partition problem: the seventieth anniversary, ''[[Mathematical Intelligencer]]'' '''26''' (2004), no. 3, 4&ndash;12.
* [[Oded Schramm]], Illuminating sets of constant width, ''Mathematika'' '''35''' (1988), 180–199.

==Further reading==
* Oleg Pikhurko, ''[http://www.math.cmu.edu/~pikhurko/AlgMet.ps Algebraic Methods in Combinatorics]'', course notes.
+ +==External links== +* {{MathWorld|urlname=BorsuksConjecture|title=Borsuk's Conjecture}} + +[[Category:Disproved conjectures]] +[[Category:Discrete geometry]] + 814djwi6iszq0u13tifikxrfduk64ke + + + + Theoretical motivation for general relativity + 0 + 12470 + + 12471 + 2013-08-31T21:53:56Z + + Anythingyouwant + 0 + + /* Geodesic equation for circular orbits */ better main article + wikitext + text/x-wiki + A '''Theoretical motivation for general relativity''', including the motivation for the [[geodesic equation]] and the [[Einstein field equation]], can be obtained from [[special relativity]] by examining the [[Dynamics (mechanics)|dynamics]] of particles in [[circular orbit]]s about the earth. A key advantage in examining circular orbits is that it is possible to know the solution of the Einstein Field Equation ''[[A priori and a posteriori|a priori]]''. This provides a means to inform and verify the formalism. + +[[General relativity]] addresses two questions: +# How does the [[curvature]] of [[spacetime]] affect the motion of [[matter]]? +# How does the presence of matter affect the curvature of spacetime? + +The former question is answered with the [[#The geodesic equation in a local coordinate system|geodesic equation]]. The second question is answered with the [[#Einstein field equation|Einstein field equation]]. The geodesic equation and the field equation are related through a [[principle of least action]]. The motivation for the geodesic equation is provided in the section [[#Geodesic equation for circular orbits|Geodesic equation for circular orbits]] The motivation for the Einstein field equation is provided in the section [[#Stress-energy tensor|Stress-energy tensor]] + +<div class="noprint" style="clear: right"> +{{General relativity}} +__TOC__ +</div> + +==Geodesic equation for circular orbits== + +{{main|Geodesics in general relativity}} + +===Kinetics of circular orbits=== + +[[Image:060322 helix.svg|thumb|250px|left|World line of a circular orbit about the Earth depicted in two spatial dimensions X and Y (the plane of the orbit) and a time dimension, usually put as the vertical axis. Note that the orbit about the Earth is a circle in space, but its worldline is a helix in spacetime.]] +For definiteness consider a circular earth orbit (helical [[world line]]) of a particle. The particle travels with speed v. An observer on earth sees that length is contracted in the frame of the particle. A measuring stick traveling with the particle appears shorter to the earth observer. Therefore the circumference of the orbit, which is in the direction of motion appears longer than <math> \pi </math> times the diameter of the orbit.<ref name="Ref. 1">{{cite book | author=Einstein, A. | title=Relativity: The Special and General Theory | location= New York | publisher=Crown| year=1961 | isbn=0-517-02961-8}}</ref> + +In [[special relativity]] the 4-proper-velocity of the particle in the [[inertial]] (non-accelerating) frame of the earth is + +:<math> u = \left ( \gamma , \gamma { \mathbf{v} \over c } \right ) </math> + +where c is the [[speed of light]], <math> \mathbf{v} </math> is the 3-velocity, and <math> \gamma </math> is + +:<math> \gamma = { 1 \over \sqrt { { 1 - { { \mathbf{v} \cdot \mathbf{v} } \over c^2 } } } } </math>. 
+ +The magnitude of the 4-velocity vector is always constant + +:<math> u_{\alpha} u^{\alpha} = -1 </math> + +where we are using a [[Minkowski metric]] + +:<math>\eta^{\mu\nu} =\eta_{\mu\nu} = \begin{pmatrix} +-1 & 0 & 0 & 0\\ +0 & 1 & 0 & 0\\ +0 & 0 & 1 & 0\\ +0 & 0 & 0 & 1 +\end{pmatrix}</math>. + +The magnitude of the 4-velocity is therefore a [[Lorentz scalar]]. + +The 4-acceleration in the earth (non-accelerating) frame is + +:<math> a \equiv { {d u} \over {d\tau} } = { d \over {d\tau} } { \left ( \gamma , \gamma { \mathbf{v} \over c } \right )} = { \left ( 0 , \gamma^2 { \mathbf{a} \over c^2 } \right )} = { \left ( 0 , - \gamma^2 { { \mathbf{v} \cdot \mathbf{v} } \over c^2 } { {\mathbf{r} } \over r^2 } \right )} </math> + +where <math> d\tau </math> is c times the proper time interval measured in the frame of the particle. This is related to the time interval in the Earth's frame by + +:<math> c dt = \gamma d\tau </math>. + +Here, the 3-acceleration for a circular orbit is + +:<math> \mathbf{a} = - \omega^2 \mathbf{r} = - { \mathbf{v} \cdot \mathbf{v} } { {\mathbf{r} } \over r^2 } </math> + +where <math> \omega </math> is the angular velocity of the rotating particle and <math> \mathbf{r} </math> is the 3-position of the particle. + +The magnitude of the 4-velocity is constant. This implies that the 4-acceleration must be perpendicular to the 4-velocity. The 4-acceleration is, in fact, perpendicular to the 4-velocity in this example (see [[Fermi-Walker transport]]). The inner product of the 4-acceleration and the 4-velocity is therefore always zero. The inner product is a [[Lorentz scalar]]. + +===Curvature of spacetime: Geodesic equation=== +The equation for the acceleration can be generalized, yielding the [[geodesic equation]] + +:<math> { {d u^{\mu}} \over {d\tau}} - a^{\mu} = 0 </math> + +:<math> { {d u^{\mu}} \over {d\tau}} + {R^{\mu}}_{\alpha \nu \beta } u^{\alpha} x^{\nu} u^{\beta} = 0 </math> + +where <math> x^{\mu} </math> is the 4-position of the particle and <math> {R^{\mu}}_{\alpha \nu \beta } </math> is the [[curvature]] tensor give by + +:<math> {R^{\mu}}_{\alpha \nu \beta } = { 1 \over r^2 } \eta_{\alpha \beta} {\delta^{\mu}}_{\nu} </math> + +where <math> {\delta^{\mu}}_{\nu} </math> is the [[Kronecker delta function]], and we have the constraints + +:<math> u_{\alpha} u^{\alpha} = -1</math> + +and + +:<math>a_{\alpha} u^{\alpha} = 0</math>. + +It is easily verified that circular orbits satisfy the geodesic equation. The geodesic equation is actually more general. Circular orbits are a particular solution of the equation. Solutions other than circular orbits are permissible and valid. + +===Ricci curvature tensor and trace=== +{{main|Ricci curvature}} +{{main|Scalar curvature}} + +The [[Ricci curvature]] tensor is a special curvature tensor given by the contraction + +:<math> R_{\alpha \beta } \equiv {R^{\nu}}_{\alpha \nu \beta } </math>. + +The trace of the Ricci tensor, called the [[scalar curvature]], is + +:<math> R \equiv {R^{\alpha}}_{ \alpha } </math>. + +===The geodesic equation in a local coordinate system=== + +[[Image:General relativity rdj 3.png|frame|right|Circular orbits at the same radius.]] + +Consider the situation in which there are now two particles in nearby [[Circular orbit|circular]] [[Polar orbit|polar]] orbits of the [[earth]] at radius <math> r </math> and speed <math> v </math>. + +The particles execute [[simple harmonic motion]] about the earth and with respect to each other. 
They are at their maximum distance from each other as they cross the equator. Their [[Trajectory|trajectories]] intersect at the poles. + +Imagine we have a spacecraft co-moving with one of the particles. The ceiling of the craft, the <math> \acute{\mathbf{z}} </math> direction, coincides with the <math> \mathbf{r} </math> direction. The front of the craft is in the <math> \acute{\mathbf{x} } </math> direction, and the <math> \acute{\mathbf{y} } </math> direction is to the left of the craft. The spacecraft is small compared with the size of the orbit so that the local frame is a local Lorentz frame. The 4-separation of the two particles is given by <math> \acute{x}^{\mu} </math>. In the local frame of the spacecraft the geodesic equation is given by + +:<math> { {d^2 \acute{x}^{\mu}} \over {d\tau^2}} + \acute{{R}^{\mu} }_{\alpha \nu \beta } \acute{u}^{\alpha} \acute{x}^{\nu} \acute{u}^{\beta} = 0 </math> + +where + +:<math> \acute{u}^{\mu} = { {d \acute{x}^{\mu}} \over {d\tau}} </math> + +and + +:<math> \acute{{R}^{\mu}}_{\alpha \nu \beta } </math> + +is the curvature tensor in the local frame. + +===Geodesic equation as a covariant derivative=== +The equation of motion for a particle in flat spacetime and in the absence of forces is + +:<math> { {d {u}^{\mu}} \over {d\tau}} =0 </math>. + +If we require a particle to travel along a geodesic in curved spacetime, then the analogous expression in curved spacetime is + +:<math> { {D \acute{u}^{\mu}} \over {D\tau}}= { {d \acute{u}^{\mu}} \over {d\tau}} + {\Gamma ^{\mu}}_{\alpha \beta} \acute{u}^{\alpha} \acute{u}^{\beta} =0 </math> + +where the derivative on the left is the [[covariant derivative]], which is the generalization of the normal derivative to a derivative in curved spacetime. Here + +:<math> {\Gamma ^{\mu}}_{\alpha \beta} </math> + +is a [[Christoffel symbol]]. + +The curvature is related to the Christoffel symbol by + +:<math> \acute{{R}^{\mu}}_{\alpha \nu \beta } = { { \partial {{\Gamma}^{\mu}}}_{\alpha \beta} \over {\partial x^{\nu}} } +- { { \partial {{\Gamma}^{\mu}}}_{\alpha \nu} \over {\partial x^{\beta}} } ++ {{\Gamma}^{\mu}}_{\gamma \nu} {{\Gamma}^{\gamma}}_{\alpha \beta} +- {{\Gamma}^{\mu}}_{\gamma \beta} {{\Gamma}^{\gamma}}_{\alpha \nu} +</math>. + +===Metric tensor in the local frame=== +The interval in the local frame is + +:<math> ds^2 = dx^2 +dy^2 + dz^2 - c^2 dt^2 \equiv g_{\mu \nu } d \acute{x}^{\mu} d \acute{x}^{\nu} </math> + +:<math> = d \acute{x} ^2 +d\acute{y}^2 + d\acute{z}^2 - c^2 d\acute{t}^2 +2\gamma \cos(\theta ) \cos(\phi) \,v \, d\acute{t} \,d\acute{x} +2\gamma \cos(\theta ) \sin (\phi) v \,d\acute{t} \,d\acute{y} -2\gamma \sin(\theta ) v \, d\acute{t} \, d\acute{z} </math> + +where + +:<math> \theta </math> is the angle with the <math>z</math> axis (longitude) and + +:<math> \phi </math> is the angle with the <math>x</math> axis (latitude). + +This gives a [[Metric tensor|metric]] of + +:<math> g_{\mu\nu} = \begin{pmatrix} +-1 & \gamma \cos( \theta ) \cos ( \phi ) \frac{v}{c} & \gamma \cos( \theta ) \sin ( \phi ) \frac{v}{c} & -\gamma \sin ( \theta ) \frac{v}{c} \\ +\gamma \cos( \theta ) \cos ( \phi ) {\frac{v}{c}} & 1 & 0 & 0\\ +\gamma \cos( \theta ) \sin ( \phi ) {\frac{v}{c}} & 0 & 1 & 0\\ +-\gamma \sin ( \theta ) \frac{v}{c} & 0 & 0 & 1 +\end{pmatrix} </math> + +in the local frame. + +The inverse of the metric tensor <math> g^{\mu \nu} </math> is defined such that + +:<math> g_{\mu \alpha} g^{\alpha \nu} = \delta_{\mu}^{\nu} </math> + +where the term on the right is the [[Kronecker delta]]. 
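This defining relation can be checked numerically for the metric just written down. The following Python sketch (illustrative only; the speed <math>v/c</math> and the two angles are assumed sample values) builds the matrix <math> g_{\mu\nu} </math>, inverts it, and verifies <math> g_{\mu \alpha} g^{\alpha \nu} = \delta_{\mu}^{\nu} </math>:

<syntaxhighlight lang="python">
import numpy as np

def local_metric(beta, theta, phi):
    """g_{mu nu} of the local frame as written above; beta = v/c,
    theta and phi are the angles with the z and x axes."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    a = gamma * np.cos(theta) * np.cos(phi) * beta
    b = gamma * np.cos(theta) * np.sin(phi) * beta
    c = -gamma * np.sin(theta) * beta
    return np.array([[-1.0,  a,   b,   c ],
                     [  a,  1.0, 0.0, 0.0],
                     [  b,  0.0, 1.0, 0.0],
                     [  c,  0.0, 0.0, 1.0]])

# Assumed sample values: v/c = 0.3, theta = 0.4 rad, phi = 1.1 rad
g = local_metric(0.3, 0.4, 1.1)
g_inv = np.linalg.inv(g)                    # this is g^{mu nu}
print(np.allclose(g @ g_inv, np.eye(4)))    # True: g_{mu alpha} g^{alpha nu} = delta
print(np.linalg.det(g))                     # negative, so sqrt(-g) below is real
</syntaxhighlight>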
+ +The transformation of the infinitesimal 4-volume <math> d\Omega </math> is + +:<math> d\acute{\Omega} = \sqrt{-g} d{\Omega } </math> + +where g is the determinant of the metric tensor. + +The differential of the determinant of the metric tensor is + +:<math> dg = g g^{\mu \nu} dg_{\mu \nu} = -g g_{\mu \nu} dg^{\mu \nu} </math>. + +The relationship between the Christoffel symbols and the metric tensor is + +:<math> {{\Gamma}^{\alpha}}_{ \mu \nu } = g^{\alpha \beta} {\Gamma}_{\beta \mu \nu } +</math> + +:<math> {\Gamma}_{\beta \mu \nu } = {\frac{1}{2}} \left ( +{ { \partial {g}}_{\beta \nu} \over {\partial x^{\mu}} } ++ { { \partial {g}}_{\beta \mu} \over {\partial x^{\nu}} } +- { { \partial {g}}_{\mu \nu} \over {\partial x^{\beta}} } +\right ) +</math>. + +===Principle of least action in general relativity=== +{{main|Einstein-Hilbert action}} + +The principle of least action states that the [[world line]] between two events in spacetime is that world line that minimizes the action between the two events. In [[classical mechanics]] the principle of least action is used to derive [[Newton's laws of motion]] and is the basis for [[Lagrangian dynamics]]. In relativity it is expressed as + +:<math> S = \int_1^2 \mathcal{L}\, d\Omega </math> + +between events 1 and 2 is a minimum. Here S is a [[Scalar (mathematics)|scalar]] and + +:<math> \mathcal{L} </math> + +is known as the [[Lagrangian density]]. The Lagrangian density is divided into two parts, the density for the orbiting particle <math> \mathcal{L}_p </math> and the density <math> \mathcal{L}_e </math> of the gravitational field generated by all other particles including those comprising the earth, + +:<math> \mathcal{L} = \mathcal{L}_p + \mathcal{L}_e </math>. + +In curved [[spacetime]], the "shortest" world line is that [[geodesic]] that minimizes the curvature along the geodesic. The action then is proportional to the curvature of the world line. Since S is a scalar, the [[scalar curvature]] is the appropriate measure of curvature. The action for the particle is therefore + +:<math> S_p = C \int_1^2 \acute{R}\, d\acute{\Omega} = C \int_1^2 { \acute{R} } \sqrt{-g} \,d{\Omega} = C \int_1^2 g^{\alpha \beta} \acute{R}_{\alpha \beta} \sqrt{-g}\, d{\Omega} </math> + +where <math> C </math> is an unknown constant. This constant will be determined by requiring the theory to reduce to Newton's law of gravitation in the nonrelativistic limit. + +The Lagrangian density for the particle is therefore + +:<math> \mathcal{L}_p = C g^{\alpha \beta} \acute{R}_{\alpha \beta} \sqrt{-g} </math>. + +The action for the particle and the earth is + +:<math> S = \int_1^2 C g^{\alpha \beta} \acute{R}_{\alpha \beta} \sqrt{-g}\, d\Omega + \int_1^2 \mathcal{L}_e \,d\Omega </math>. + +We find the world line that lies on the surface of the sphere of radius r by varying the metric tensor. Minimization and neglect of terms that disappear on the boundaries, including terms second order in the derivative of g, yields + +:<math> 0 = \delta S = \int_1^2 C \left ( \acute{R}_{\alpha \beta} - {1\over 2} \acute{R} g^{\alpha \beta} \right ) \delta g^{\alpha \beta} \sqrt{-g}\, d\Omega - \int_1^2 \acute{T}_{\alpha \beta} \delta g^{\alpha \beta} \sqrt{-g}\, d\Omega </math> + +where<ref name="Ref 3 Sec 94">{{cite book | author=Landau, L. D. and Lifshitz, E. 
M.| title=Classical Theory of Fields (Fourth Revised English Edition) | location=Oxford | publisher=Pergamon | year=1975 | isbn=0-08-018176-7}}</ref> + +:<math> \acute{T}_{\alpha \beta} = { 1 \over \sqrt{-g} } \left ( {d \over {dx^{\nu} } } { { \partial \mathcal{L}_e} \over { \partial \left ( { {d g^{ \alpha \beta } } \over { dx^{\nu} } } \right ) } } - { {\partial \mathcal{L}_e} \over { \partial g^{ \alpha \beta } } } \right ) </math> + +is the [[Stress-energy tensor|Hilbert stress-energy tensor]] of the field generated by the earth. + +The relationship, to within an unknown constant factor, between the stress-energy and the curvature is + +:<math> \acute{T}_{\alpha \beta} = C \left ( \acute{R}_{\alpha \beta} - {1\over 2} \acute{R} \, g_{\alpha \beta} \right ) </math>. + +==Stress-energy tensor== +===Newton's law of gravitation=== + +[[Image:Lorentz transform of world line.gif|right|framed|Diagram 1. Changing views of spacetime along the [[world line]] of a rapidly accelerating observer. +In this animation, the dashed line is the spacetime trajectory ("[[world line]]") of a particle. The balls are placed at regular intervals of [[proper time]] along the world line. The solid diagonal lines are the [[light cone]]s for the observer's current event, and intersect at that event. The small dots are other arbitrary events in the spacetime. For the observer's current instantaneous inertial frame of reference, the vertical direction indicates the time and the horizontal direction indicates distance. +The slope of the world line (deviation from being vertical) is the velocity of the particle on that section of the world line. So at a bend in the world line the particle is being accelerated. Note how the view of spacetime changes when the observer accelerates, changing the instantaneous inertial frame of reference. These changes are governed by the Lorentz transformations. Also note that: +* the balls on the world line before/after future/past accelerations are more spaced out due to time dilation. +* events which were simultaneous before an acceleration are at different times afterwards (due to the [[relativity of simultaneity]]), +* events pass through the light cone lines due to the progression of proper time, but not due to the change of views caused by the accelerations, and +* the world line always remains within the future and past light cones of the current event.]] + +[[Newton's law of gravitation|Newton's Law of Gravitation]] in non-relativistic mechanics states that the acceleration on an object of mass <math> m </math> due to another object of mass <math> M </math> is equal to + +:<math> \mathbf{f} = {d^2 \mathbf{r} \over d\tau^2} = - {GM \over { c^2 r^3} }\mathbf{r} </math> + +where <math> G </math> is the [[gravitational constant]], <math> \mathbf{r} </math> is a vector from mass <math> M </math> to mass <math> m </math> and <math> r </math> is the magnitude of that vector. The time t is scaled with the [[speed of light]] c + +:<math> \tau \equiv c t </math>. + +The acceleration <math> \mathbf{f} </math> is independent of <math> m </math>. + +For definiteness. consider a particle of mass <math> m </math> orbiting in the gravitational field of the earth with mass <math> M </math>. The law of gravitation can be written + +:<math> \mathbf{f} = - {4\pi G \over {3 c^2} }\rho(r) \mathbf{r} </math> + +where <math> \rho(r) </math> is the average mass density inside a [[Volume|sphere]] of radius <math> r </math>. 
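Both forms of the law give the same acceleration, which is easy to confirm numerically. The following Python sketch (illustrative only; the Earth mass and radius are assumed round values, and ordinary SI units with unscaled time are used, so the factor of <math>c^2</math> that appears above because of the scaled time <math> \tau \equiv c t </math> is left out) compares <math> GM/r^2 </math> with the average-density form:

<syntaxhighlight lang="python">
import math

G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
M = 5.972e24    # mass of the Earth, kg (assumed round value)
r = 6.371e6     # mean radius of the Earth, m (assumed round value)

rho = M / (4.0 / 3.0 * math.pi * r**3)         # average density inside radius r

g_mass = G * M / r**2                          # familiar GM/r^2 form
g_density = 4.0 * math.pi * G * rho * r / 3.0  # (4*pi*G/3) * rho * r form used above

print(round(rho))            # about 5.5e3 kg m^-3
print(g_mass, g_density)     # both about 9.8 m s^-2; the two forms agree
</syntaxhighlight>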
+ +===Gravitational force in terms of the 00 component of the stress-energy tensor=== + +Newton's law can be written + +:<math> \mathbf{f} = - {4\pi G \over {3 c^4}} \left ( {Mc^2 \over V }\right ) \mathbf{r} </math>. + +where <math> V </math> is the [[volume]] of a sphere of radius <math> r </math>. The quantity <math> Mc^2 </math> will be recognized from [[special relativity]] as the rest energy of the large body, the earth. This is the sum of the rest energies of all the particles that compose earth. The quantity in the parentheses is then the average rest energy density of a sphere of radius <math> r </math> about the earth. The gravitational field is proportional to the average energy density within a radius r. This is the 00 component of the [[stress-energy tensor]] in [[Special relativity|relativity]] for the special case in which all the energy is rest energy. More generally + +:<math> T_{00} = - {T^{0}}_0 = \sum_{i=1}^N \left ( {\gamma_i m_i c^2 \over V }\right ) </math> + +where + +:<math> \gamma_i \equiv { 1 \over {\sqrt {1 - {{\mathbf{v}_i \cdot \mathbf{v}_i } \over c^2} } } } </math> + +and <math> \mathbf{v_i} </math> is the velocity of particle i making up the earth and <math> m_i </math> in the rest mass of particle i. There are N particles altogether making up the earth. + +===Relativistic generalization of the energy density=== +[[Image:StressEnergyTensor.svg|thumb|250px|left|The components of the stress-energy tensor.]] +There are two simple relativistic entities that reduce to the 00 component of the stress-energy tensor in the nonrelativistic limit + +:<math> u^{\alpha} T_{\alpha \beta} u^{\beta} \rightarrow T_{00} </math> + +and the [[Trace (linear algebra)|trace]] + +:<math> T \equiv {T^{\alpha}}_{\alpha} = -u_{\alpha} u^{\alpha} T = -u^{\alpha} T \eta_{\alpha \beta} u^{\beta} \rightarrow - T_{00}</math> + +where <math> u^{\alpha} </math> is the 4-velocity. + +The 00 component of the stress-energy tensor can be generalized to the relativistic case as a linear combination of the two terms + +:<math> T_{00} \rightarrow u^{\alpha} \left ( A T_{\alpha \beta} + B T \eta_{\alpha \beta} \right ) u^{\beta} </math> + +where + +:<math> A + B = 1 </math> + +===4-acceleration due to gravity=== + +The 4-acceleration due to gravity can be written + +:<math> f^{\mu} = - 8\pi { G \over { 3 c^4 } } \left ( {A \over 2} T_{\alpha \beta} + {B \over 2} T \eta_{\alpha \beta} \right )\delta^{\mu}_{\nu} u^{\alpha} x^{\nu} u^{\beta} </math>. + +Unfortunately, this acceleration is nonzero for <math> \mu = 0 </math> as is required for circular orbits. Since the magnitude of the 4-velocity is constant, it is only the component of the force perpendicular to the 4-velocity that contributes to the acceleration. We must therefore subtract off the component of force parallel to the 4-velocity. This is known as [[Fermi-Walker transport]].<ref>{{cite book | author=Misner, Charles; Thorne, Kip S. & Wheeler, John Archibald | title=Gravitation | location=San Francisco | publisher=W. H. Freeman | year=1973 | isbn=0-7167-0344-0|pages= 170,171}}</ref> In other words + +:<math> f^{\mu} \rightarrow f^{\mu} + u^{\mu} u_{\nu} f^{\nu} </math>. + +This yields + +:<math> f^{\mu} = - 8\pi { G \over { 3 c^4 } } \left ( {A \over 2} T_{\alpha \beta} + {B \over 2} T \eta_{\alpha \beta} \right ) \left ( \delta^{\mu}_{\nu} + u^{\mu} u_{\nu} \right ) u^{\alpha} x^{\nu} u^{\beta} </math>. 
+ +The force in the local frame is + +:<math> \acute{f}^{\mu} = - 8\pi { G \over { 3 c^4 } } \left ( {A \over 2} \acute{T}_{\alpha \beta} + {B \over 2} \acute{T} g_{\alpha \beta} \right ) \left ( \delta^{\mu}_{\nu} + \acute{u}^{\mu} \acute{u}_{\nu} \right ) \acute{u}^{\alpha} \acute{x}^{\nu} \acute{u}^{\beta} </math>. + +==Einstein field equation== + +[[Image:spacetime curvature.png|thumb|right|400px|Two-dimensional visualization of space-time distortion. The presence of matter changes the geometry of spacetime, this (curved) geometry being interpreted as gravity.]] + +We obtain the [[Einstein field equation]]<ref>Landau 1975, p. 276</ref> by equating the acceleration required for circular orbits with the acceleration due to gravity + +:<math> a^{\mu} = f^{\mu} </math> + +:<math> \acute{{R}^{\mu}}_{\alpha \nu \beta} \acute{u}^{\alpha} \acute{x}^{\nu} \acute{u}^{\beta} = - \acute{f}^{\mu} </math>. + +This is the relationship between curvature of spacetime and the stress-energy tensor. + +The Ricci tensor becomes + +:<math> \acute{R}_{\alpha \beta} = 8\pi { G \over { c^4 } } \left ( { A \over 2 } \acute{T}_{\alpha \beta} + {B \over 2} \acute{T} g_{\alpha \beta} \right ) </math>. + +The trace of the Ricci tensor is + +:<math> \acute{R} = \acute{R}_{\alpha}^{ \alpha} = 8\pi { G \over { c^4 } } \left ( {A\over 2}\acute{T}_{\alpha}^{ \alpha} + {B \over 2} \acute{T} \delta_{\alpha}^{ \alpha} \right ) = 8\pi { G \over { c^4 } } \left ( {A\over 2} + 2B \right ) \acute{T } </math>. + +Comparison of the Ricci tensor with the Ricci tensor calculated from the principle of least action, [[Theoretical motivation for general relativity#Principle of least action in general relativity]] identifying the stress-energy tensor with the Hilbert stress-energy, and remembering that A+B=1 removes the ambiguity in A, B, and C. + +:<math> A=2 </math> + +:<math> B=-1 </math> + +and + +:<math> C= \left ( 8\pi { G \over { c^4 } } \right )^{-1} </math>. + +This gives + +:<math> \acute{R} = - 8\pi { G \over { c^4 } } \acute{T } </math>. + +The field equation can be written + +:<math> \mathcal{G}_{\alpha \beta} = 8\pi { G \over { c^4 } } \acute{T}_{\alpha \beta} </math> + +where + +:<math> \mathcal{G}_{\alpha \beta} \equiv \acute{R}_{\alpha \beta} - {1 \over 2} \acute{R} g_{\alpha \beta} </math>. + +This is the Einstein field equation that describes curvature of spacetime that results from stress-energy density. This equation, along with the geodesic equation have motivated by the kinetics and dynamics of a particle orbiting the earth in a circular orbit. They are true in general. + +==Solving the Einstein field equation== + +Solving the Einstein field equation requires an iterative process. The solution is represented in the metric tensor + +:<math> +g_{\mu \nu} +</math>. + +Typically there is an initial guess for the tensor. The guess is used to calculate [[Christoffel symbol]]s, which are used to calculate the curvature. If the Einstein field equation is not satisfied, the process is repeated. + +Solutions occur in two forms, vacuum solutions and non-vacuum solutions. A [[Vacuum solution (general relativity)|vacuum solution]] is one in which the stress-energy tensor is zero. The relevant vacuum solution for circular orbits is the [[Schwarzschild metric]]. There are also a number of [[Exact solutions in general relativity|exact solutions]] that are non-vacuum solutions, solutions in which the stress tensor is non-zero. 
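The procedure sketched above (compute the Christoffel symbols from a trial metric, build the curvature from them, and test whether the field equation holds) can be carried out symbolically. As a minimal illustration, assuming geometrized units <math>G = c = 1</math> and using the third-party Python library SymPy, the following sketch verifies that the [[Schwarzschild metric]] mentioned above satisfies the vacuum field equation <math> R_{\mu \nu} = 0 </math>, so no further iteration is needed for that guess:

<syntaxhighlight lang="python">
import sympy as sp

# Coordinates and the Schwarzschild metric (the vacuum solution mentioned above),
# in geometrized units G = c = 1; M is a symbolic mass parameter.
t, r, th, ph, M = sp.symbols('t r theta phi M', positive=True)
x = [t, r, th, ph]
f = 1 - 2 * M / r
g = sp.diag(-f, 1 / f, r**2, r**2 * sp.sin(th)**2)   # candidate metric g_{mu nu}
g_inv = g.inv()
n = 4

# Christoffel symbols Gamma^a_{bc} calculated from the metric
Gamma = [[[sum(g_inv[a, d] * (sp.diff(g[d, b], x[c]) + sp.diff(g[d, c], x[b])
                              - sp.diff(g[b, c], x[d])) for d in range(n)) / 2
           for c in range(n)] for b in range(n)] for a in range(n)]

# Ricci tensor R_{bd} = R^a_{b a d}, built from the Christoffel symbols
def ricci(b, d):
    expr = sp.Integer(0)
    for a in range(n):
        expr += sp.diff(Gamma[a][b][d], x[a]) - sp.diff(Gamma[a][b][a], x[d])
        for e in range(n):
            expr += Gamma[a][a][e] * Gamma[e][b][d] - Gamma[a][d][e] * Gamma[e][b][a]
    return sp.simplify(expr)

# The vacuum Einstein field equation reduces to R_{mu nu} = 0; confirm it holds.
print(all(ricci(b, d) == 0 for b in range(n) for d in range(n)))   # True
</syntaxhighlight>

For a non-vacuum problem the same steps would be repeated with an updated guess for <math> g_{\mu \nu} </math> until the field equation is satisfied, which is the iterative process described above.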
+ +==Solving the geodesic equation== +{{main|Solving the geodesic equations}} +Solving the geodesic equations requires knowledge of the metric tensor obtained through the solution of the Einstein field equation. Either the Christoffel symbols or the curvature are calculated from the metric tensor. The geodesic equation is then integrated with the appropriate [[boundary condition]]s. + +==Electrodynamics in curved spacetime== +{{main|Maxwell's equations in curved spacetime}} + +[[Maxwell's equations]], the equations of electrodynamics, in curved spacetime are a generalization of Maxwell's equations in flat [[spacetime]] (see [[Formulation of Maxwell's equations in special relativity]]). Curvature of spacetime affects electrodynamics. Maxwell's equations in curved spacetime can be obtained by replacing the derivatives in the equations in flat spacetime with [[covariant derivative]]s. The sourced and source-free equations become (cgs units): + +:<math> { 4 \pi \over c }J^ b = \partial_a F^{ab} + {\Gamma^a}_{\mu a} F^{\mu b} + {\Gamma^b}_{\mu a} F^{a \mu} \equiv D_a F^{ab} \equiv {F^{ab}}_{;a} \,\!</math>, + +and + +:<math>0 = \partial_c F_{ab} + \partial_b F_{ca} + \partial_a F_{bc} = D_c F_{ab} + D_b F_{ca} + D_a F_{bc}</math> + +where <math>\, J^a</math> is the [[4-current]], <math>\, F^{ab}</math> is the [[electromagnetic tensor|field strength tensor]], <math>\, \epsilon_{abcd}</math> is the [[Levi-Civita symbol]], and + +:<math> { \partial \over { \partial x^a } } \equiv \partial_a \equiv {}_{,a} \equiv (\partial/\partial ct, \nabla)</math> + +is the [[Four-gradient|4-gradient]]. Repeated indices are summed over according to [[Einstein notation|Einstein summation convention]]. We have displayed the results in several common notations. + +The first tensor equation is an expression of the two inhomogeneous Maxwell's equations, [[Gauss' law]] and the [[Ampère's circuital law|Ampère's law with Maxwell's correction]]. The second equation is an expression of the homogenous equations, [[Faraday's law of induction]] and [[Gauss's law for magnetism]]. + +The electromagnetic wave equation is modified from the equation in flat spacetime in two ways, the derivative is replaced with the covariant derivative and a new term that depends on the curvature appears. + +:<math> - {A^{\alpha ; \beta}}_{; \beta} + {R^{\alpha}}_{\beta} A^{\beta} = {4 \pi \over c } J^{\alpha} </math> + +where the [[Four-potential|4-potential]] is defined such that + +:<math>F^{ab} = \partial^b A^a - \partial^a A^b \,\!</math>. + +We have assumed the generalization of the [[Lorenz gauge]] in curved spacetime + +:<math> {A^{\mu}}_{ ; \mu} = 0 </math>. + +==See also== +*[[Newtonian foundation of general relativity]] + +==References== +{{reflist}} +* {{cite book | author=R. P. Feynman, F. B. Moringo, and W. G. Wagner | title=Feynman Lectures on Gravitation | publisher=Addison-Wesley | year=1995 | isbn=0-201-62734-5}} +* {{cite book | author=P. A. M. 
Dirac | title=General Theory of Relativity | publisher=Princeton University Press| year=1996 | isbn=0-691-01146-X}} + +{{Physics-footer}} + +[[Category:General relativity]] +[[Category:Concepts in physics]] + +{{Link FA|de}} +{{Link FA|ru}} +{{Link FA|zh}} + +[[ar:نظرية النسبية العامة]] +[[cs:Obecná teorie relativity]] +[[da:Almen relativitetsteori]] +[[de:Allgemeine Relativitätstheorie]] +[[et:Üldrelatiivsusteooria]] +[[el:Γενική Θεωρία Σχετικότητας]] +[[es:Teoría General de la Relatividad]] +[[eo:Fizika relativeco]] +[[fr:Relativité générale]] +[[gl:Relatividade Xeral]] +[[ko:일반 상대성 이론]] +[[id:Teori relativitas umum]] +[[it:Relatività generale]] +[[he:תורת היחסות הכללית]] +[[la:Relativitas generalis]] +[[lt:Bendroji reliatyvumo teorija]] +[[hu:Általános relativitáselmélet]] +[[nl:Algemene relativiteitstheorie]] +[[ja:一般相対性理論]] +[[pl:Ogólna teoria względności]] +[[pt:Relatividade geral]] +[[ru:Общая теория относительности]] +[[simple:General relativity]] +[[sk:Všeobecná teória relativity]] +[[sl:Splošna teorija relativnosti]] +[[fi:Yleinen suhteellisuusteoria]] +[[sv:Allmänna relativitetsteorin]] +[[th:ทฤษฎีสัมพัทธภาพทั่วไป]] +[[vi:Lý thuyết tương đối rộng]] +[[tr:Genel görelilik]] +[[uk:Теорія відносності загальна]] +[[zh:廣義相對論]] + dralqsj94wsd2cihknmhnd6oy5ezc0z + + + + Q-Vectors + 0 + 27610 + + 27611 + 2013-03-07T18:49:48Z + + Yobot + 0 + + + [[WP:CHECKWIKI]] errors fixed + general fixes using [[Project:AWB|AWB]] (8961) + wikitext + text/x-wiki + '''Q-vectors''' are used in atmospheric dynamics to understand physical processes such as vertical motion and [[frontogenesis]]. Q-vectors are not physical quantities that can be measured in the atmosphere but are derived from the quasi-geostrophic equations and can be used in the previous diagnostic situations. On meteorological charts, Q-vectors point toward upward motion and away from downward motion. Q-vectors are an alternative to the [[omega equation]] for diagnosing vertical motion in the quasi-geostrophic equations. + +==Derivation== +First derived in 1978,<ref name=autogenerated1>{{cite journal|last=Hoskins|first=B. J.|coauthors=I. Draghici and H. C. Davies|title=A new look at the ω-equation|journal=Quart. J. R. Met. 
Soc|year=1978|volume=104|pages=31–38}}</ref> Q-vector derivation can be simplified for the midlatitudes, using the midlatitude β-plane quasi-geostrophic prediction equations:<ref>{{cite book|last=Holton|first=James R.|title=An Introduction to Dynamic Meteorology|year=2004|publisher=Elsevier Academic|location=New York|isbn=0-12-354015-1|pages=168–72}}</ref> + +# <math> \frac{D_g u_g}{Dt} - f_{0}v_a - \beta y v_g = 0 </math> (x component of quasi-geostrophic momentum equation) +# <math> \frac{D_g v_g}{Dt} + f_{0}u_a + \beta y u_g = 0 </math> (y component of quasi-geostrophic momentum equation) +# <math> \frac{D_g T}{Dt} - \frac{\sigma p}{R} \omega = \frac{J}{c_p} </math> (quasi-geostrophic thermodynamic equation) + +And the [[thermal wind]] equations: + +<math> f_{0} \frac{\partial u_g}{\partial p} = \frac{R}{p} \frac{\partial T}{\partial y} </math> (x component of thermal wind equation) + +<math> f_{0} \frac{\partial v_g}{\partial p} = - \frac{R}{p} \frac{\partial T}{\partial x} </math> (y component of thermal wind equation) + +where <math>f_0</math> is the [[Coriolis parameter]], approximated by the constant 1e<sup>−4</sup> s<sup>−1</sup>; <math>R</math> is the atmospheric [[ideal gas constant]]; <math> \beta </math> is the latitudinal change in the Coriolis parameter <math> \beta = \frac{\partial f} {\partial y} </math>; <math> \sigma </math> is a static stability parameter; <math>c_p</math> is the [[specific heat]] at constant pressure; <math>p</math> is pressure; <math>T</math> is temperature; anything with a subscript <math>g</math> indicates [[geostrophic]]; anything with a subscript <math>a</math> indicates [[ageostrophic]]; <math>J</math> is a diabatic heating rate; and <math>\omega</math> is the Lagrangian rate change of pressure with time. <math>\omega = \frac{Dp}{Dt}</math>. Note that because pressure decreases with height in the atmosphere, a <math> - \omega </math> is upward vertical motion, analogous to <math>+w=\frac{Dz}{Dt}</math>. + +From these equations we can get expressions for the Q-vector: + +<math> Q_1 = - \frac{R}{p} \left[ \frac{\partial u_g}{\partial x} \frac{\partial T}{\partial x} + \frac{\partial v_g}{\partial x} \frac{\partial T}{\partial y} \right] </math> + +<math> Q_2 = - \frac{R}{p} \left[ \frac{\partial u_g}{\partial y} \frac{\partial T}{\partial x} + \frac{\partial v_g}{\partial y} \frac{\partial T}{\partial y} \right] </math> + +And in vector form: + +<math> Q_1 = - \frac{R}{p} \frac{\partial \vec{V_g}}{\partial x} \cdot \vec{\nabla} T </math> + +<math> Q_2 = - \frac{R}{p} \frac{\partial \vec{V_g}}{\partial y} \cdot \vec{\nabla} T </math> + +Plugging these Q-vector equations into the [[omega equation|quasi-geostrophic omega equation]] gives: + +<math> \left(\sigma \overrightarrow{\nabla^2} + f_{\circ}^2 \frac{\partial ^2}{\partial p^2} \right) \omega = -2 \vec{\nabla} \cdot \vec{Q} + f_{\circ} \beta \frac{\partial v_g}{\partial p} - \frac{\kappa}{p} \overrightarrow{\nabla^2} J </math> + +Which in an adiabatic setting gives: + +<math> -\omega \propto -2 \vec{\nabla} \cdot \vec{Q} </math> + +Expanding the left-hand side of the quasi-geostrophic omega equation in a [[Fourier Series]] gives the <math> -\omega </math> above, implying that a <math> -\omega </math> relationship with the right-hand side of the [[omega equation|quasi-geostrophic omega equation]] can be assumed. + +This expression shows that the divergence of the Q-vector (<math> \vec{\nabla} \cdot \vec{Q} </math>) is associated with downward motion. 
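On a gridded analysis these components are usually evaluated with finite differences. As a rough illustration (the synthetic temperature and geostrophic wind fields, the 700 hPa pressure level and the 100 km grid spacing are all assumed, and <math>R</math> is taken as the specific gas constant of dry air), the following Python sketch computes <math>Q_1</math>, <math>Q_2</math> and <math> \vec{\nabla} \cdot \vec{Q} </math> directly from the component formulas above:

<syntaxhighlight lang="python">
import numpy as np

R = 287.0          # gas constant for dry air, J kg^-1 K^-1 (assumed value)
p = 70000.0        # pressure level, Pa (assumed 700 hPa)
dx = dy = 100e3    # grid spacing, m (assumed)

ny, nx = 50, 60
X, Y = np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dy)

# Synthetic (assumed) fields, purely to illustrate the component formulas above
T = 280.0 - 1e-5 * Y + 2.0 * np.sin(2 * np.pi * X / (nx * dx))   # temperature, K
ug = 10.0 * np.sin(2 * np.pi * Y / (ny * dy))                    # geostrophic u, m/s
vg = 5.0 * np.cos(2 * np.pi * X / (nx * dx))                     # geostrophic v, m/s

dTdx, dTdy = np.gradient(T, dx, axis=1), np.gradient(T, dy, axis=0)
dugdx, dugdy = np.gradient(ug, dx, axis=1), np.gradient(ug, dy, axis=0)
dvgdx, dvgdy = np.gradient(vg, dx, axis=1), np.gradient(vg, dy, axis=0)

Q1 = -(R / p) * (dugdx * dTdx + dvgdx * dTdy)
Q2 = -(R / p) * (dugdy * dTdx + dvgdy * dTdy)

divQ = np.gradient(Q1, dx, axis=1) + np.gradient(Q2, dy, axis=0)
# Regions where divQ < 0 (Q-vector convergence) are diagnosed as forcing for ascent.
print(float(divQ.min()), float(divQ.max()))
</syntaxhighlight>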
Therefore, convergent <math> \vec{Q} </math> forces ascend and divergent <math> \vec{Q} </math> forces descend.<ref>{{cite book|last=Holton|first=James R.|title=An Introduction to Dynamic Meteorology|year=2004|publisher=Elsevier Academic|location=New York|isbn=0-12-354015-1|pages=170}}</ref> Q-vectors and all [[ageostrophic]] flow exist to preserve [[thermal wind]] balance. Therefore, low level Q-vectors tend to point in the direction of low-level ageostrophic winds.<ref>{{cite book|last=Hewitt|first=C. N.|title=Handbook of atmospheric science: principles and applications|year=2003|publisher=John Wiley & Sons|location=New York|isbn=0-632-05286-4|pages=286}}</ref> + +==Applications== +Q-vectors can be determined wholly with: [[geopotential height]] (<math> \Phi </math>) and temperature on a constant pressure surface. Q-vectors always point in the direction of ascending air. For an idealized cyclone and anticyclone in the Northern Hemisphere (where <math> \frac{\partial T} {\partial y} <0 </math>), cyclones have Q-vectors which point parallel to the thermal wind and anticyclones have Q-vectors that point antiparallel to the thermal wind.<ref>{{cite book|last=Holton|first=James R.|title=An Introduction to Dynamic Meteorology|year=2004|publisher=Elsevier Academic|location=New York|isbn=0-12-354015-1|pages=171}}</ref> This means upward motion in the area of warm air advection and downward motion in the area of cold air advection. + +In [[frontogenesis]], temperature gradients need to tighten for initiation. For those situations Q-vectors point toward ascending air and the tightening thermal gradients.<ref>{{cite web|last=National Weather Service|first=Jet Stream - Online School for Weather|title=Glossary: Q's|url=http://www.srh.weather.gov/jetstream/append/glossary_q.htm|work=NOAA - NWS|accessdate=15 March 2012}}</ref> In areas of convergent Q-vectors, cyclonic vorticity is created, and in divergent areas, anticyclonic vorticity is created.<ref name=autogenerated1 /> + +==References== +{{Reflist}} + +[[Category:Meteorology]] + gtwfz684n4k2p3s8tuj4zvszzhr5ru4 + + + + Stencil code + 0 + 22411 + + 22412 + 2013-09-12T16:55:50Z + + 2001:4878:8000:70:2C61:48D7:9286:428 + + disambiguating + wikitext + text/x-wiki + [[File:3D von Neumann Stencil Model.svg|thumb|right|The shape of a 6-point 3D [[von Neumann neighborhood|von Neumann]] style stencil.]] + +'''Stencil codes''' are a class of iterative [[GPGPU#Kernels|kernels]]<ref name="Roth"> + Roth, Gerald et al. (1997) + Proceedings of SC'97: High Performance Networking and Computing. + ''[http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.1505 Compiling Stencils in High Performance Fortran.]'' +</ref> +which update [[Array data structure|array elements]] according to some fixed pattern, called stencil.<ref name="Sloot"> + Sloot, Peter M.A. et al. (May 28, 2002) + ''[http://books.google.com/books?id=qVcLw1UAFUsC&pg=PA843&dq=stencil+array&sig=g3gYXncOThX56TUBfHE7hnlSxJg#PPA843,M1 Computational Science – ICCS 2002: International Conference, Amsterdam, The Netherlands, April 21–24, 2002. Proceedings, Part I.]'' + Page 843. Publisher: Springer. ISBN 3-540-43591-3. +</ref> +They are most commonly found in the [[Source code|codes]] of [[computer simulation]]s, e.g. for [[computational fluid dynamics]] in the context of scientific and engineering applications. 
+Other notable examples include solving [[partial differential equations]],<ref name="Roth"/> the [[Jacobi method|Jacobi]] kernel, the [[Gauss–Seidel method]],<ref name="Sloot"/> [[image processing]]<ref name="Roth"/> and [[Cellular automaton|cellular automata]].<ref name="Fey"> + Fey, Dietmar et al. (2010) + ''[http://books.google.com/books?id=RJRZJHVyQ4EC&pg=PA51&dq=fey+grid&hl=de&ei=uGk8TtDAAo_zsgbEoZGpBQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCoQ6AEwAA#v=onepage&q&f=true Grid-Computing: Eine Basistechnologie für Computational Science]''. + Page 439. Publisher: Springer. ISBN 3-540-79746-7</ref> +The regular structure of the arrays sets stencil codes apart from other modeling methods such as the [[Finite element method]]. Most [[Finite difference method|finite difference codes]] which operate on regular grids can be formulated as stencil codes. + +==Definition== +Stencil codes perform a sequence of sweeps (called timesteps) through a given array.<ref name="Sloot"/> Generally this is a 2- or 3-dimensional regular grid.<ref name="Fey"/> The elements of the arrays are often referred to as cells. In each timestep, the stencil code updates all array elements.<ref name="Sloot"/> Using neighboring array elements in a fixed pattern (called the stencil), each cell's new value is computed. In most cases boundary values are left unchanged, but in some cases (e.g. [[Lattice Boltzmann methods|LBM codes]]) those need to be adjusted during the course of the computation as well. Since the stencil is the same for each element, the pattern of data accesses is repeated.<ref> + Yang, Laurence T.; Guo, Minyi. (August 12, 2005) + ''[http://books.google.com/books?id=qA4DbnFB2XcC&pg=PA221&dq=Stencil+codes&as_brr=3&sig=H8wdKyABXT5P7kUh4lQGZ9C5zDk High-Performance Computing : Paradigm and Infrastructure.]'' + Page 221. Publisher: Wiley-Interscience. ISBN 0-471-65471-X +</ref> + +More formally, we may define stencil codes as a [[N-tuple|5-tuple]] <math>(I, S, S_0, s, T)</math> with the following meaning:<ref name="Fey"/> + +* <math>I = \prod_{i=1}^k [0, \ldots, n_i]</math> is the index set. It defines the topology of the array. +* <math>S</math> is the (not necessarily finite) set of states, one of which each cell may take on on any given timestep. +* <math>S_0\colon \Z^k \to S</math> defines the initial state of the system at time 0. +* <math>s \in \prod_{i=1}^l \Z^k</math> is the stencil itself and describes the actual shape of the neighborhood. (There are <math>l</math> elements in the stencil. +* <math>T\colon S^l \to S</math> is the transition function which is used to determine a cell's new state, depending on its neighbors. + +Since ''I'' is a ''k''-dimensional integer interval, the array will always have the topology of a finite regular grid. The array is also called simulation space and individual cells are identified by their index <math>c \in I</math>. The stencil is an ordered set of <math>l</math> relative coordinates. 
We can now obtain for each cell <math>c</math> the tuple of its neighbors indices <math>I_c</math> + +: <math>I_c = \{j \mid \exists x \in s: j = c + x\} \, </math> + +Their states are given by mapping the tuple <math>I_c</math> to the corresponding tuple of states <math>N_i(c)</math>, where <math>N_i\colon I \to S^l</math> is defined as follows: + +: <math> +N_i(c) = (s_1, \ldots, s_l) \text{ with } s_j = S_i(I_c(j)) \, +</math> + +This is all we need to define the system's state for the following time steps <math>S_{i+1}\colon \Z^k \to S</math> with <math>i \in \N</math>: + +: <math> +S_{i+1}(c) = \begin{cases}T(N_i(c)), & c \in I\\ + S_i(c), & c \in \Z^k \setminus I \end{cases} +</math> + +Note that <math>S_i</math> is defined on <math>\Z^k</math> and not just on <math>I</math> since the boundary conditions need to be set, too. Sometimes the elements of <math>I_c</math> may be defined by a vector addition modulo the simulation space's dimension to realize toroidal topologies: + +: <math> +I_c = \{j \mid \exists x \in s: j = ((c + x) \mod(n_1, \ldots, n_k))\} +</math> + +This may be useful for implementing [[periodic boundary conditions]], which simplifies certain physical models. + +=== Example: 2D Jacobi iteration === + +[[File:2D von Neumann Stencil.svg|thumb|right|Data dependencies of a selected cell in the 2D array.]] + +To illustrate the formal definition, we'll have a look at how a two dimensional [[Jacobi method|Jacobi]] iteration can be defined. The update function computes the arithmetic mean of a cell's four neighbors. In this case we set off with an initial solution of 0. The left and right boundary are fixed at 1, while the upper and lower boundaries are set to 0. After a sufficient number of iterations, the system converges against a saddle-shape. + +: <math> +\begin{align} +I & = [0, \ldots, 99]^2 \\ +S & = \R \\ +S_0 &: \Z^2 \to \R \\ +S_0((x, y)) & = \begin{cases} +1, & x < 0 \\ +0, & 0 \le x < 100 \\ +1, & x \ge 100 + \end{cases}\\ +s & = ((0, -1), (-1, 0), (1, 0), (0, 1)) \\ +T &\colon \R^4 \to \R \\ +T((x_1, x_2, x_3, x_4)) & = 0.25 \cdot (x_1 + x_2 + x_3 + x_4) +\end{align} +</math> + +{{multiple image + | width = 100 + | align = center + | footer = 2D Jacobi Iteration on a <math>100^2</math> Array + | image1 = 2D_Jacobi_t_0000.png + | alt1 = S_0 + | caption1 = <math>S_{0}</math> + | image2 = 2D_Jacobi_t_0200.png + | alt2 = S_200 + | caption2 = <math>S_{200}</math> + | image3 = 2D_Jacobi_t_0400.png + | alt3 = S_400 + | caption3 = <math>S_{400}</math> + | image4 = 2D_Jacobi_t_0600.png + | alt4 = S_600 + | caption4 = <math>S_{600}</math> + | image5 = 2D_Jacobi_t_0800.png + | alt5 = S_800 + | caption5 = <math>S_{800}</math> + | image6 = 2D_Jacobi_t_1000.png + | alt6 = S_1000 + | caption6 = <math>S_{1000}</math> + }} + +==Stencils== + +The shape of the neighborhood used during the updates depends on the application itself. The most common stencils are the 2D or 3D versions of the [[von Neumann neighborhood]] and [[Moore neighborhood]]. The example above uses a 2D von Neumann stencil while LBM codes generally use its 3D variant. [[Conway's Game of Life]] uses the 2D Moore neighborhood. That said, other stencils such as a 25-point stencil for seismic wave propagation<ref> + Micikevicius, Paulius et al. (2009) + ''[http://portal.acm.org/citation.cfm?id=1513905 3D finite difference computation on GPUs using CUDA]'' + Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units + ISBN 978-1-60558-517-8 +</ref> can be found, too. 
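+
+Returning to the 2D Jacobi example above, a sweep of this kind can be written in a few lines of array code. The sketch below is purely illustrative: the function name, the halo-based handling of the fixed boundary values and the use of NumPy slicing are ad-hoc choices for this example, not the interface of any particular stencil library.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def jacobi_2d(n=100, steps=1000):
+    """Minimal sketch of the 2D Jacobi stencil example above.
+
+    grid[1:-1, 1:-1] holds the n x n simulation cells; the one-cell halo
+    carries the fixed boundary values (left/right = 1, top/bottom = 0).
+    """
+    grid = np.zeros((n + 2, n + 2))
+    grid[:, 0] = 1.0    # left boundary
+    grid[:, -1] = 1.0   # right boundary
+    for _ in range(steps):
+        # 5-point von Neumann stencil: each cell becomes the arithmetic mean
+        # of its four neighbours; the halo cells are left unchanged.
+        grid[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
+                                   grid[1:-1, :-2] + grid[1:-1, 2:])
+    return grid[1:-1, 1:-1]
+</syntaxhighlight>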
+ +{{multiple image + | width = 100 + | align = center + | footer = A selection of stencils used in various scientific applications. + | image1 = Moore_d.gif + | alt1 = 9-point stencil + | caption1 = 9-point 2D stencil + | image2 = Vierer-Nachbarschaft.png + | alt2 = 5-point stencil + | caption2 = 5-point 2D stencil + | image3 = 3D_von_Neumann_Stencil_Model.svg + | alt3 = 6-point stencil + | caption3 = 6-point 3D stencil + | image4 = 3D_Earth_Sciences_Stencil_Model.svg + | alt4 = 25-point stencil + | caption4 = 25-point 3D stencil + }} + +==Implementation issues== +Many simulation codes may be formulated naturally as stencil codes. Since computing time and memory consumption grow linearly with the number of array elements, parallel implementations of stencil codes are of paramount importance to research.<ref name="Datta"> + Datta, Kaushik (2009) + ''[http://www.cs.berkeley.edu/~kdatta/pubs/EECS-2009-177.pdf Auto-tuning Stencil Codes for Cache-Based Multicore Platforms]'', + Ph.D. Thesis +</ref> +This is challenging since the computations are tightly coupled (because of the cell updates depending on neighboring cells) and most stencil codes are memory bound (i.e. the ratio of memory accesses to calculations is high).<ref name="Wellein"> + Wellein, G et al. (2009) + ''[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5254211 Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization]'', + 33rd Annual IEEE International Computer Software and Applications Conference, COMPSAC 2009 +</ref> +Virtually all current parallel architectures have been explored for executing stencil codes efficiently;<ref name="datta2"> + Datta, Kaushik et al. (2008) + ''[http://portal.acm.org/citation.cfm?id=1413375 Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures],'' + SC '08 Proceedings of the 2008 ACM/IEEE conference on Supercomputing +</ref> +at the moment [[GPGPU]]s have proven to be most efficient.<ref name="schaefer"> + Schäfer, Andreas and Fey, Dietmar (2011) + ''[http://www.sciencedirect.com/science/article/pii/S1877050911002791 High Performance Stencil Code Algorithms for GPGPUs]'', + Proceedings of the International Conference on Computational Science, ICCS 2011 +</ref> + +==Libraries== +Due to both, the importance of stencil codes to [[computer simulation]]s and their high computational requirements, there are a number of efforts which aim at creating reusable libraries to support scientists in implementing new stencil codes. The libraries are mostly concerned with the parallelization, but may also tackle other challenges, such as IO, [[Computational steering|steering]] and [[Application checkpointing|checkpointing]]. They may be classified by their API. + +===Patch-based libraries=== +This is a traditional design. The library manages a set of ''n''-dimensional scalar arrays, which the user code may access to perform updates. The library handles the synchronization of the boundaries (dubbed ghost zone or halo). The advantage of this interface is that the user code may loop over the arrays, which makes it easy to integrate legacy codes<ref name="walberla"> + S. Donath, J. Götz, C. Feichtinger, K. Iglberger and U. Rüde (2010) + ''[http://www.springerlink.com/content/p2583237l2187374/ waLBerla: Optimization for Itanium-based Systems with Thousands of Processors]'', + High Performance Computing in Science and Engineering, Garching/Munich 2009 +</ref> +. 
The disadvantage is that the library can not handle cache blocking (as this has to be done within the loops<ref name="35dblocking"> + Nguyen, Anthony et al. (2010) + ''[http://dl.acm.org/citation.cfm?id=1884658 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs]'', + SC '10 Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis +</ref>) +or wrapping of the code for accelerators (e.g. via CUDA or OpenCL). Notable implementations include [http://cactuscode.org/ Cactus], a physics problem solving environment, and [http://www10.informatik.uni-erlangen.de/Research/Projects/walberla/description.shtml waLBerla]. + +===Cell-based libraries=== +These libraries move the interface to updating single simulation cells: only the current cell and its neighbors are exposed to the user code, e.g. via getter/setter methods. The advantage of this approach is that the library can control tightly which cells are updated in which order, which is useful not only to implement cache blocking,<ref name=schaefer /> +but also to run the same code on multi-cores and GPUs.<ref name="physis"> + Naoya Maruyama, Tatsuo Nomura, Kento Sato, and Satoshi Matsuoka (2011) + ''Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers'', + SC '11 Proceedings of the 2011 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis +</ref> This approach requires the user to recompile his source code together with the library. Otherwise a function call for every cell update would be required, which would seriously impair performance. This is only feasible with techniques such as [[Template (programming)|class templates]] or [[metaprogramming]], which is also the reason why this design is only found in newer libraries. Examples are [https://github.com/naoyam/physis Physis] and [http://www.libgeodecomp.org LibGeoDecomp]. + +==See also== +* [[Finite difference method]] +* [[Computer simulation]] +* [[Five-point stencil]] +* [[Stencil jumping]] + +==References== +{{reflist|33em}} + +==External links== +* [https://github.com/naoyam/physis Physis] +* [http://www.libgeodecomp.org LibGeoDecomp] + +{{DEFAULTSORT:Computer Simulation}} +[[Category:Computational science]] +[[Category:Scientific modeling]] +[[Category:Simulation software]] + 146jte2al2o59kmby7bz2zkdih2ze5v + + + + Geometric algebra + 0 + 459 + + 460 + 2014-01-19T16:56:00Z + + Monkbot + 0 + + + Fix [[Help:CS1_errors#deprecated_params|CS1 deprecated date parameter errors]] + wikitext + text/x-wiki + {{Distinguish|Algebraic geometry}} +{{other uses}} +A '''geometric algebra''' (GA) is the [[Clifford algebra]] of a [[vector space]] over the field of [[real numbers]] endowed with a [[quadratic form]]. The term is also sometimes used as a collective term for the approach to classical, computational and relativistic [[geometry]] that applies these algebras. The distinguishing multiplication operation that defines the GA as a [[unital ring]] is the '''geometric product'''. Taking the geometric product among vectors can yield [[bivector]]s, trivectors, or general ''n''-vectors. The addition operation combines these into general [[multivector]]s, which are the elements of the ring. This includes, among other possibilities, a well-defined sum of a [[scalar (mathematics)|scalar]] and a [[Euclidean vector|vector]], an operation that is impossible by traditional [[vector addition]]. 
This operation may seem peculiar, but in geometric algebra it is seen as no more unusual than the representation of a [[complex number]] by the sum of its real and imaginary components. + +Geometric algebra is distinguished from Clifford algebra in general by its restriction to real numbers and its emphasis on its geometric interpretation and physical applications. Specific examples of geometric algebras applied in physics include the [[algebra of physical space]], the [[spacetime algebra]], and the [[#Conformal geometric algebra (CGA)|conformal geometric algebra]]. [[Geometric calculus]], an extension of GA that includes [[differentiation (mathematics)|differentiation]] and [[integral|integration]] can be further shown to incorporate other theories such as [[complex analysis]], [[differential geometry]], and [[differential form]]s. Because of such a broad reach with a comparatively simple algebraic structure, GA has been advocated, most notably by [[David Hestenes]]<ref>{{Citation + | last = Hestenes + | first = David + | author-link = David Hestenes + | title = Oersted Medal Lecture 2002: Reforming the Mathematical Language of Physics + | journal = Am. J. Phys. + | volume = 71 + | issue = 2 + | pages = 104–121 + |date=February 2003 + | url = http://geocalc.clas.asu.edu/pdf/OerstedMedalLecture.pdf +|bibcode = 2003AmJPh..71..104H |doi = 10.1119/1.1522700 }}</ref> and [[Chris J. L. Doran|Chris Doran]],<ref>{{cite journal + | last = Doran + | first = Chris + | authorlink = Chris J. L. Doran + | title = Geometric Algebra and its Application to Mathematical Physics + | journal = PhD thesis + | publisher = University of Cambridge + | year = 1994 + | url = http://www.mrao.cam.ac.uk/~clifford/publications/abstracts/chris_thesis.html}}</ref> as the preferred mathematical framework for [[physics]]. Proponents argue that it provides compact and intuitive descriptions in many areas including [[classical mechanics|classical]] and [[quantum mechanics]], [[electromagnetic theory]] and [[theory of relativity|relativity]].{{sfn|Lasenby|Lasenby|Doran|year=2000}} Others claim that in some cases the geometric algebra approach is able to sidestep a "proliferation of manifolds"<ref>McRobie, F. A.; Lasenby, J. (1999) ''Simo-Vu Quoc rods using Clifford algebra.'' Internat. J. Numer. Methods Engrg. Vol 45, #4, p. 377−398</ref> that arises during the standard application of [[differential geometry]]. + +The geometric product was first briefly mentioned by [[Hermann Grassmann]], who was chiefly interested in developing the closely related but more limited [[exterior algebra]]. In 1878, [[William Kingdom Clifford]] greatly expanded on Grassmann's work to form what are now usually called Clifford algebras in his honor (although Clifford himself chose to call them "geometric algebras"). For several decades, geometric algebras went somewhat ignored, greatly eclipsed by the [[vector calculus]] then newly developed to describe electromagnetism. The term "geometric algebra" was repopularized by Hestenes in the 1960s, who recognized its importance to relativistic physics.{{sfn|Hestenes|1966}} Since then, geometric algebra (GA) has also found application in [[computer graphics]] and [[robotics]]. + +==Definition and notation== +Given a finite dimensional real [[quadratic space]] {{nowrap|1=''V'' = '''R'''<sup>''n''</sup>}} with quadratic form {{nowrap|1=''Q'' : ''V'' → '''R'''}}, the '''geometric algebra''' for this quadratic space is the [[Clifford algebra]] ''C''ℓ(''V'',''Q''). 
+ +The algebra product is called the ''geometric product''. It is standard to denote the geometric product by juxtaposition. The above definition of the geometric algebra is abstract, so we summarize the properties of the geometric product by the following set of axioms. If ''a'', ''b'', and ''c'' are vectors, then the geometric product has the following properties: + +:<math>a(bc)=(ab)c</math> ([[associativity]]) +:<math>a(b+c)=ab+ac</math> ([[distributivity]] over addition) +:<math>a^2 \in \mathbb R</math> + +Note that in the final property above, the square is not necessarily positive. An important property of the geometric product is the existence of elements with multiplicative inverse, also known as [[unit (ring theory)|units]]. If {{nowrap|1=''a''<sup>2</sup> ≠ 0}} for some vector ''a'', then ''a''<sup>−1</sup> exists and is equal to {{nowrap|1=''a''/(''a''<sup>2</sup>)}}. Not all the elements of the algebra are necessarily units. For example, if ''u'' is a vector in ''V'' such that {{nowrap|1=''u<sup>2</sup>'' = 1}}, the elements {{nowrap|1=1 ± ''u''}} have no inverse since they are [[zero divisor]]s: {{nowrap|1=(1 − ''u'')(1 + ''u'') = 1 − ''uu'' = 1 − 1 = 0}}. There may also exist nontrivial [[idempotent element]]s such as {{nowrap|(1 + ''u'')/2}}. + +===Inner and outer product of vectors=== +[[File:GA parallel and perpendicular vectors.svg|200px|right|thumb|Given two vectors '''a''' and '''b''', if the geometric product '''ab''' is<ref>[http://geocalc.clas.asu.edu/html/IntroPrimerGeometricAlgebra.html]</ref> anticommutative; they are perpendicular (top) because '''a'''&and;'''b''' = −'''b'''&and;'''a''' and '''a · b''' = 0, if it's commutative; they are parallel (bottom) because '''a'''&and;'''b''' = '''0''' and '''a · b''' = '''b · a'''.]] +[[File:N-vector.svg|right|thumb|126px|Geometric interpretation for the '''outer product''' of ''n'' [[vector (geometry)|vector]]s ('''u''', '''v''', '''w''') to obtain an ''n''-vector ([[parallelotope]] elements), where ''n'' = [[graded algebra|grade]],<ref>{{cite book |author=R. Penrose| title=[[The Road to Reality]]| publisher= Vintage books| year=2007 | isbn=0-679-77631-1}}</ref> for ''n'' = 1, 2, 3. The "circulations" show [[Orientation (vector space)|orientation]].<ref>{{cite book|title=Gravitation|author=J.A. Wheeler, C. Misner, K.S. Thorne|publisher=W.H. Freeman & Co|year=1973|page=83|isbn=0-7167-0344-0}}</ref>]] + +From the axioms above, we find that, for vectors ''a'' and ''b'', we may write the geometric product of any two vectors ''a'' and ''b'' as the sum of a symmetric product and an antisymmetric product: + +:<math>ab=\frac{1}{2}(ab+ba)+\frac{1}{2}(ab-ba)</math> + +Thus we can define the ''inner product'' of vectors to be the symmetric product + +:<math>a \cdot b := \frac{1}{2}(ab + ba) = \frac{1}{2}((a+b)^2 - a^2 - b^2) ,</math> + +which is a real number because it is a sum of squares. The remaining antisymmetric part is the ''outer product'' (the exterior product of the contained [[exterior algebra]]): + +:<math>a \wedge b := \frac{1}{2}(ab - ba) = -(b \wedge a)</math> + +The inner and outer products are associated with familiar concepts from standard vector algebra. Pictorially, ''a'' and ''b'' are [[parallel (geometry)|parallel]] if all their geometric product is equal to their inner product whereas ''a'' and ''b'' are [[perpendicular]] if their geometric product is equal to their outer product. 
In a geometric algebra for which the square of any nonzero vector is positive, the inner product of two vectors can be identified with the [[dot product]] of standard vector algebra. The outer product of two vectors can be identified with the [[signed area]] enclosed by a [[parallelogram]] the sides of which are the vectors. The [[cross product]] of two vectors in 3 dimensions with positive-definite quadratic form is closely related to their outer product. + +Most instances of geometric algebras of interest have a nondegenerate quadratic form. If the quadratic form is fully [[nondegenerate quadratic form|degenerate]], the inner product of any two vectors is always zero, and the geometric algebra is then simply an exterior algebra. Unless otherwise stated, this article will treat only nondegenerate geometric algebras. + +The outer product is naturally extended as a completely antisymmetric, associative operator between any number of vectors + +:<math>a_1\wedge a_2\wedge\dots\wedge a_r = \frac{1}{r!}\sum_{\sigma\in\mathfrak{S}_r} \operatorname{sgn}(\sigma) a_{\sigma(1)}a_{\sigma(1)} \dots a_{\sigma(r)},</math> + +where the sum is over all permutations of the indices, with <math>\operatorname{sgn}(\sigma)</math> the [[parity of a permutation|sign of the permutation]]. + +===Blades, grading, and canonical basis=== +[[File:GA basis of 3d multivector.svg|301px|thumb|Canonical ''n''-vector basis; unit scalar 1 (represented by a black number line), unit vectors, unit bivectors, and a unit trivector, all in 3d.]] +A multivector that is the outer product of ''r'' independent vectors (<math>r \le n</math>) is called a ''blade'', and the blade is said to be a multivector of grade ''r''. From the axioms, with closure, every multivector of the geometric algebra is a sum of blades. + +Consider a set of ''r'' independent vectors <math>\{a_1,...,a_r\}</math> spanning an ''r''-dimensional subspace of the vector space. With these, we can define a real [[symmetric matrix]] + +:<math>[\mathbf{A}]_{ij}=a_i\cdot a_j</math> + +By the [[spectral theorem]], '''A''' can be diagonalized to [[diagonal matrix]] '''D''' by an [[orthogonal matrix]] '''O''' via + +:<math>\sum_{k,l}[\mathbf{O}]_{ik}[\mathbf{A}]_{kl}[\mathbf{O}^{\mathrm{T}}]_{lj}=\sum_{k,l}[\mathbf{O}]_{ik}[\mathbf{O}]_{jl}[\mathbf{A}]_{kl}=[\mathbf{D}]_{ij}</math> + +Define a new set of vectors <math>\{e_1,...,e_r\}</math>, known as orthogonal basis vectors, to be those transformed by the orthogonal matrix: + +:<math>e_i=\sum_j[\mathbf{O}]_{ij}a_j</math> + +Since orthogonal transformations preserve inner products, it follows that <math>e_i\cdot e_j=[\mathbf{D}]_{ij}</math> and thus the <math>\{e_1,...,e_r\}</math> are perpendicular. In other words the geometric product of two distinct vectors <math>e_i \ne e_j</math> is completely specified by their outer product, or more generally + +:<math>\begin{align}e_1e_2\cdots e_r &= e_1 \wedge e_2 \wedge \cdots \wedge e_r \\ +&= \left(\sum_j [\mathbf{O}]_{1j}a_j\right) \wedge \left(\sum_j [\mathbf{O}]_{2j}a_j\right) \wedge \cdots \wedge \left(\sum_j [\mathbf{O}]_{rj}a_j\right) \\ +&= \det [\mathbf{O}] a_1 \wedge a_2 \wedge \cdots \wedge a_r \end{align}</math> + +Therefore every blade of grade ''r'' can be written as a geometric product of ''r'' vectors. More generally, if a degenerate geometric algebra is allowed, then the orthogonal matrix is replaced by a [[block matrix]] that is orthogonal in the nondegenerate block, and the diagonal matrix has zero-valued entries along the degenerate dimensions. 
If the new vectors of the nondegenerate subspace are [[unit vector|normalized]] according to + +:<math>\hat{e}_i=\frac{1}{\sqrt{|e_i \cdot e_i|}}e_i,</math> + +then these normalized vectors must square to +1 or −1. By [[Sylvester's law of inertia]], the total number of +1's and the total number of −1's along the diagonal matrix is invariant. By extension, the total number ''p'' of orthonormal basis vectors that square to +1 and the total number ''q'' of orthonormal basis vectors that square to −1 is invariant. (If the degenerate case is allowed, then the total number of basis vectors that square to zero is also invariant.) We denote this algebra <math>\mathcal{G}(p,q)</math>. For example, <math>\mathcal G(3,0)</math> models 3D [[Euclidean space]], <math>\mathcal G(1,3)</math> relativistic [[spacetime]] and <math>\mathcal G(4,1)</math> a 3D [[conformal geometric algebra]]. + +The set of all possible products of ''n'' orthogonal basis vectors with indices in increasing order, including 1 as the empty product forms a basis for the entire geometric algebra (an analogue of the [[Poincaré–Birkhoff–Witt theorem|PBW theorem]]). For example, the following is a basis for the geometric algebra <math>\mathcal{G}(3,0)</math>: +:<math>\{1,e_1,e_2,e_3,e_1e_2,e_1e_3,e_2e_3,e_1e_2e_3\}\,</math> +A basis formed this way is called a '''canonical basis''' for the geometric algebra, and any other orthogonal basis for ''V'' will produce another canonical basis. Each canonical basis consists of 2<sup>''n''</sup> elements. Every multivector of the geometric algebra can be expressed as a linear combination of the canonical basis elements. If the canonical basis elements are {{nowrap|1={''B''<sub>''i''</sub> {{!}} ''i''∈''S''} }} with ''S'' being an index set, then the geometric product of any two multivectors is +:<math>(\Sigma_i \alpha_i B_i)(\Sigma_j \beta_j B_j)=\Sigma_{i,j} \alpha_i\beta_j B_i B_j\,</math>. + +===Grade projection=== + +Using a canonical basis, a [[graded vector space]] structure can be established. Elements of the geometric algebra that are simply scalar multiples of 1 are grade-0 blades and are called ''scalars''. Nonzero multivectors that are in the span of <math>\{e_1,\cdots,e_n\}</math> are grade-1 blades and are the ordinary vectors. Multivectors in the span of <math>\{e_ie_j\mid 1\leq i<j\leq n\}</math> are grade-2 blades and are the bivectors. This terminology continues through to the last grade of ''n''-vectors. Alternatively, grade-''n'' blades are called [[pseudoscalar]]s, grade-''n''−1 blades pseudovectors, etc. Many of the elements of the algebra are not graded by this scheme since they are sums of elements of differing grade. Such elements are said to be of ''mixed grade''. The grading of multivectors is independent of the orthogonal basis chosen originally. + +A multivector <math>A</math> may be decomposed with the '''grade-projection operator''' <math>\langle A \rangle _r</math> which outputs the grade-''r'' portion of ''A''. As a result: + +:<math> A = \sum_{r=0}^{n} \langle A \rangle _r </math> + +As an example, the geometric product of two vectors <math> a b = a \cdot b + a \wedge b = \langle a b \rangle_0 + \langle a b \rangle_2</math> since <math>\langle a b \rangle_0=a\cdot b\,</math> and <math>\langle a b \rangle_2 = a\wedge b\,</math> and <math>\langle a b \rangle_i=0\,</math> for ''i'' other than 0 and 2. 
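+
+The grade decomposition just described is easy to check numerically. The following Python sketch of <math>\mathcal{G}(3,0)</math> over an orthonormal basis is purely illustrative — the blade representation, the function names and the example vectors are ad-hoc choices, not the interface of any established geometric-algebra library.
+
+<syntaxhighlight lang="python">
+from itertools import product
+
+def blade_product(a, b):
+    """Geometric product of two orthonormal basis blades of G(3,0).
+
+    Blades are tuples of strictly increasing indices, e.g. (1, 3) = e1 e3.
+    Returns (sign, blade): distinct e_i anticommute, and e_i e_i = +1.
+    """
+    idx, sign = list(a + b), 1
+    # Bubble sort, flipping the sign for every transposition of distinct indices.
+    for i in range(len(idx)):
+        for j in range(len(idx) - 1 - i):
+            if idx[j] > idx[j + 1]:
+                idx[j], idx[j + 1] = idx[j + 1], idx[j]
+                sign = -sign
+    # Cancel repeated indices (Euclidean signature: each basis vector squares to +1).
+    out = []
+    for k in idx:
+        if out and out[-1] == k:
+            out.pop()
+        else:
+            out.append(k)
+    return sign, tuple(out)
+
+def geometric_product(A, B):
+    """Product of multivectors given as {blade tuple: coefficient} dicts."""
+    C = {}
+    for (ba, ca), (bb, cb) in product(A.items(), B.items()):
+        s, blade = blade_product(ba, bb)
+        C[blade] = C.get(blade, 0.0) + s * ca * cb
+    return {k: v for k, v in C.items() if v != 0.0}
+
+def grade(A, r):
+    """Grade projection <A>_r."""
+    return {k: v for k, v in A.items() if len(k) == r}
+
+# ab = <ab>_0 + <ab>_2 for two vectors a and b:
+a = {(1,): 1.0, (2,): 2.0}           # a = e1 + 2 e2
+b = {(1,): 3.0, (3,): 1.0}           # b = 3 e1 + e3
+ab = geometric_product(a, b)
+print(grade(ab, 0))                  # inner part: {(): 3.0}
+print(grade(ab, 2))                  # outer part: {(1, 3): 1.0, (1, 2): -6.0, (2, 3): 2.0}
+</syntaxhighlight>
+
+Representing a multivector as a map from canonical basis blades to coefficients keeps the sketch close to the canonical-basis expansion used above.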
+ +The decomposition of a multivector <math>A</math> may also be split into those components that are even and those that are odd: + +:<math> A^+ = \langle A \rangle _0 + \langle A \rangle _2 + \langle A \rangle _4 + \cdots </math> +:<math> A^- = \langle A \rangle _1 + \langle A \rangle _3 + \langle A \rangle _5 + \cdots </math> + +This makes the algebra a '''Z'''<sub>2</sub>-[[graded algebra]] or [[superalgebra]] with the geometric product. Since the geometric product of two even multivectors is an even multivector, they define an ''[[superalgebra#Even subalgebra|even subalgebra]]''. The even subalgebra of an ''n''-dimensional geometric algebra is [[algebra homomorphism|isomorphic]] to a full geometric algebra of (''n''−1) dimensions. Examples include <math>\mathcal G^+(2,0) \cong \mathcal G(0,1)</math> and <math>\mathcal G^+(1,3) \cong \mathcal G(3,0)</math>. + +===Representation of subspaces=== +Geometric algebra represents subspaces of ''V'' as multivectors, and so they coexist in the same algebra with vectors from ''V''. A ''k'' dimensional subspace ''W'' of ''V'' is represented by taking an orthogonal basis <math>\{b_1,b_2,\cdots b_k\}</math> and using the geometric product to form the [[blade (geometry)|blade]] {{nowrap|1=''D'' = ''b''<sub>1</sub>''b''<sub>2</sub>⋅⋅⋅''b''<sub>''k''</sub>}}. There are multiple blades representing ''W''; all those representing ''W'' are scalar multiples of ''D''. These blades can be separated into two sets: positive multiples of ''D'' and negative multiples of ''D''. The positive multiples of ''D'' are said to have ''the same [[orientation (vector space)|orientation]]'' as ''D'', and the negative multiples the ''opposite orientation''. + +Blades are important since geometric operations such as projections, rotations and reflections depend on the factorability via the outer product that (the restricted class of) ''n''-blades provide but that (the generalized class of) grade-''n'' multivectors do not when ''n'' ≥ 4. + +===Unit pseudoscalars=== +Unit pseudoscalars are blades that play important roles in GA. A '''unit pseudoscalar''' for a non-degenerate subspace ''W'' of ''V'' is a blade that is the product of the members of an orthonormal basis for ''W''. It can be shown that if {{math|''I''}} and {{math|''I''′}} are both unit pseudoscalars for ''W'', then {{math|1=''I'' = ±''I''′}} and {{math|1=''I''<sup>2</sup> = ±1}}. + +Suppose the geometric algebra <math>\mathcal{G}(n,0)</math> with the familiar positive definite inner product on '''R'''<sup>''n''</sup> is formed. Given a plane (2-dimensional subspace) of '''R'''<sup>''n''</sup>, one can find an orthonormal basis {''b''<sub>1</sub>,''b''<sub>2</sub>} spanning the plane, and thus find a unit pseudoscalar {{nowrap|1={{math|''I''}} = ''b''<sub>1</sub>''b''<sub>2</sub>}} representing this plane. The geometric product of any two vectors in the span of ''b''<sub>1</sub> and ''b''<sub>2</sub> lies in <math>\{\alpha_0+\alpha_1 I\mid \alpha_i\in\mathbb{R} \}</math>, that is, it is the sum of a 0-vector and a 2-vector. + +By the properties of the geometric product, {{nowrap|1={{math|''I''}}&nbsp;<sup>2</sup> = ''b''<sub>1</sub>''b''<sub>2</sub>''b''<sub>1</sub>''b''<sub>2</sub> = −''b''<sub>1</sub>''b''<sub>2</sub>''b''<sub>2</sub>''b''<sub>1</sub> = −1}}. The resemblance to the [[imaginary unit]] is not accidental: the subspace <math>\{\alpha_0+\alpha_1 I\mid \alpha_i\in\mathbb{R} \}</math> is '''R'''-algebra isomorphic to the [[complex number]]s. 
In this way, a copy of the complex numbers is embedded in the geometric algebra for each 2-dimensional subspace of ''V'' on which the quadratic form is definite. + +It is sometimes possible to identify the presence of an imaginary unit in a physical equation. Such units arise from one of the many quantities in the real algebra that square to −1, and these have geometric significance because of the properties of the algebra and the interaction of its various subspaces. + +In <math>\mathcal{G}(3,0)</math>, an exceptional case occurs. Given a canonical basis built from orthonormal ''e''<sub>''i''</sub>'s from ''V'', the set of ''all'' 2-vectors is generated by +:<math>\{e_3e_2,e_1e_3,e_2e_1\}\,</math>. +Labelling these ''i'', ''j'' and ''k'' (momentarily deviating from our uppercase convention), the subspace generated by 0-vectors and 2-vectors is exactly <math>\{\alpha_0+i\alpha_1+j\alpha_2+k\alpha_3\mid \alpha_i\in\mathbb{R}\}</math>. This set is seen to be a subalgebra, and furthermore is '''R'''-algebra isomorphic to the [[quaternion]]s, another important algebraic system. + +===Dual basis=== + +Let <math>\{e_i\}</math> be a basis of ''V'', i.e. a set of ''n'' linearly independent vectors that span the ''n''-dimensional vector space ''V''. The basis that is dual to <math>\{e_i\}</math> is the set of elements of the [[dual vector space]] ''V''<sup>∗</sup> that forms a [[biorthogonal system]] with this basis, thus being the elements denoted <math>\{e^i\}</math> satisfying +:<math>e^i \cdot e_j = \delta^i{}_j,</math> +where δ is the [[Kronecker delta]]. + +Given a nondegenerate quadratic form on ''V'', ''V''<sup>∗</sup> becomes naturally identified with ''V'', and the dual basis may be regarded as elements of ''V'', but are not in general the same set as the original basis. + +Given further a GA of ''V'', let +:<math> \epsilon = e_1 \wedge \cdots \wedge e_n</math> +be the pseudoscalar (which does not necessarily square to ±1) formed from the basis <math>\{e_i\}</math>. The dual basis vectors may be constructed as +:<math>e^i=(-1)^{i-1}(e_1 \wedge \cdots \wedge \check{e}_i \wedge \cdots \wedge e_n) \epsilon^{-1},</math> +where the <math>\check{e}_i</math> denotes that the ''i''th basis vector is omitted from the product. + +===Extensions of the inner and outer products=== + +It is common practice to extend the outer product on vectors to the entire algebra. This may be done through the use of the grade projection operator: + +: <math>C \wedge D := \sum_{r,s}\langle \langle C \rangle_r \langle D \rangle_s \rangle_{r+s} </math> (the ''outer product'') + +This generalization is consistent with the above definition involving antisymmetrization. Another generalization related to the outer product is the commutator product: +: <math>C \times D := \tfrac{1}{2}(CD-DC) </math> + +The regressive product is the dual of the outer product:<ref>Perwass (2005), ''Geometry and Computing'', §3.2.13 p.89</ref> +: <math>C \;\triangledown\; D := \sum_{r,s}\langle \langle C \rangle_r \langle D \rangle_s \rangle_{r+s-n} </math> + +The inner product on vectors can also be generalised, but in more than one non-equivalent way. The paper {{Harvard citation|Dorst|2002}} gives a full treatment of several different inner products developed for geometric algebras and their interrelationships, and the notation is taken from there. Many authors use the same symbol as for the inner product of vectors for their chosen extension (e.g. Hestenes and Perwass). No consistent notation has emerged. 
+ +Among these several different generalizations of the inner product on vectors are: + +: <math>\, C \;\big\lrcorner\; D := \sum_{r,s}\langle \langle C\rangle_r \langle D \rangle_{s} \rangle_{s-r} </math> &nbsp;&nbsp;(the ''left contraction'') +: <math>\, C \;\big\llcorner\; D := \sum_{r,s}\langle \langle C\rangle_r \langle D \rangle_{s} \rangle_{r-s} </math> &nbsp;&nbsp;(the ''right contraction'') +: <math>\, C * D := \sum_{r,s}\langle \langle C \rangle_r \langle D \rangle_s \rangle_{0} </math> &nbsp;&nbsp;(the ''scalar product'') +: <math>\, C \bullet D := \sum_{r,s}\langle \langle C\rangle_r \langle D \rangle_{s} \rangle_{|s-r|} </math> &nbsp;&nbsp;(the "(fat) dot" product) +: <math>\, C \bullet_H D := \sum_{r\ne0,s\ne0}\langle \langle C\rangle_r \langle D \rangle_{s} \rangle_{|s-r|} </math> &nbsp;&nbsp;(Hestenes's inner product)<ref>Distinguishing notation here is from Dorst (2007) ''Geometric Algebra for computer Science'' §B.1 p.590.; the point is also made that scalars must be handled as a special case with this product.</ref> + +{{Harvard citation|Dorst|2002}} makes an argument for the use of contractions in preference to Hestenes's inner product; they are algebraically more regular and have cleaner geometric interpretations. A number of identities incorporating the contractions are valid without restriction of their inputs. Benefits of using the left contraction as an extension of the inner product on vectors include that the identity <math> ab = a \cdot b + a \wedge b </math> is extended to <math> aB = a \;\big\lrcorner\; B + a \wedge B</math> for any vector ''a'' and multivector ''B'', and that the [[projection (linear algebra)|projection]] operation <math> \mathcal{P}_b (a) = (a \cdot b^{-1})b </math> is extended to <math> \mathcal{P}_B (A) = (A \;\big\lrcorner\; B^{-1}) \;\big\lrcorner\; B</math> for any blades ''A'' and ''B'' (with a minor modification to accommodate null ''B'', given [[#Projection and rejection|below]]). + +===Terminology specific to geometric algebra=== + +Some terms are used in geometric algebra with a meaning that differs from the use of those terms in other fields of mathematics. Some of these are listed here: + +;Vector: In GA this refers specifically to an element of the 1-vector subspace unless otherwise clear from the context, despite the entire algebra forming a [[vector space]]. +;Grade: In GA this refers to a [[graded algebra|grading as an algebra]] under the outer product (an <math>\mathbb{N}</math>-grading), and not under the geometric product (which produces a Z<sub>2</sub><sup>''n''</sup>-grading). +;Outer product: In GA this refers to what is generally called the [[exterior product]] (including in GA as an alternative). It is not the [[outer product]] of linear algebra. +;Inner product: In GA this generally refers to a scalar product on the vector subspace (which is not required to be positive definite) and may include any chosen extension of this product to the entire algebra. It is not specifically the [[inner product]] on a normed vector space. +;Versor: In GA this refers to an object that can be constructed as the geometric product of any number of non-null vectors. The term otherwise may refer to a [[Versor|unit quaternion]], analogous to a rotor in GA. +;[[Outermorphism]]: This term is used only in GA, and refers to a linear map on the vector subspace, extended to apply to the entire algebra by defining it as preserving the outer product. 
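+
+In the illustrative <math>\mathcal{G}(3,0)</math> sketch introduced under grade projection above, the extended products of the subsection on extensions of the inner and outer products reduce to grade projections of blade-by-blade geometric products. The helper names below are, again, ad-hoc choices for this example:
+
+<syntaxhighlight lang="python">
+def outer(A, B):
+    """C ^ D: keep the grade-(r + s) part of each blade product."""
+    C = {}
+    for (ba, ca), (bb, cb) in product(A.items(), B.items()):
+        s, blade = blade_product(ba, bb)
+        if len(blade) == len(ba) + len(bb):
+            C[blade] = C.get(blade, 0.0) + s * ca * cb
+    return C
+
+def left_contraction(A, B):
+    """C _| D: keep the grade-(s - r) part of each blade product."""
+    C = {}
+    for (ba, ca), (bb, cb) in product(A.items(), B.items()):
+        s, blade = blade_product(ba, bb)
+        if len(blade) == len(bb) - len(ba):
+            C[blade] = C.get(blade, 0.0) + s * ca * cb
+    return C
+
+# For a vector a, aB = (a _| B) + (a ^ B), cf. the identity above:
+a = {(1,): 1.0}                      # a = e1
+B = {(1, 2): 2.0, (2, 3): 1.0}       # B = 2 e1e2 + e2e3
+print(left_contraction(a, B))        # {(2,): 2.0}
+print(outer(a, B))                   # {(1, 2, 3): 1.0}
+</syntaxhighlight>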
+ +==Geometric interpretation== + +===Projection and rejection=== +[[File:GA plane subspace and projection.svg|right|300px|thumb|In 3d space, a bivector '''a'''&and;'''b''' defines a 2d plane subspace (light blue, extends infinitely in indicated directions). Any vector '''c''' in the 3-space can be projected onto and rejected normal to the plane, shown respectively by '''c'''<sub>&perp;</sub> and '''c'''<sub>&#8741;</sub>.]] + +For any vector ''a'' and any invertible vector ''m'', +:<math>\, a = amm^{-1} = (a\cdot m + a \wedge m)m^{-1} = a_{\| m} + a_{\perp m} </math> +where the '''projection''' of ''a'' onto ''m'' (or the parallel part) is +:<math>\, a_{\| m} = (a\cdot m)m^{-1} </math> +and the '''rejection''' of ''a'' onto ''m'' (or the perpendicular part) is +:<math>\, a_{\perp m} = a - a_{\| m} = (a\wedge m)m^{-1} .</math> + +Using the concept of a ''k''-blade ''B'' as representing a subspace of ''V'' and every multivector ultimately being expressed in terms of vectors, this generalizes to projection of a general multivector onto any invertible ''k''-blade ''B'' as<ref>This definition follows Dorst (2007) and Perwass (2009) – the left contraction used by Dorst replaces the ("fat dot") inner product that Perwass uses, consistent with Perwass's constraint that grade of ''A'' may not exceed that of ''B''.</ref> +:<math>\, \mathcal{P}_B (A) = (A \;\big\lrcorner\; B^{-1}) \;\big\lrcorner\; B </math> +with the rejection being defined as +:<math>\, \mathcal{P}_B^\perp (A) = A - \mathcal{P}_B (A) .</math> + +The projection and rejection generalize to null blades ''B'' by replacing the inverse ''B''<sup>−1</sup> with the pseudoinverse ''B''<sup>+</sup> with respect to the contractive product.<ref>Dorst appears to merely assume ''B''<sup>+</sup> such that {{nowrap|1=''B'' ⨼ ''B''<sup>+</sup> = 1}}, whereas Perwass (2009) defines {{nowrap|1=''B''<sup>+</sup> = ''B''<sup>†</sup>/(''B'' ⨼ ''B''<sup>†</sup>)}}, where ''B''<sup>†</sup> is the conjugate of ''B'', equivalent to the reverse of ''B'' up to a sign.</ref> The outcome of the projection coincides in both cases for non-null blades.{{sfn|Dorst|year=2007|loc=§3.6 p. 85}}<ref>Perwass (2009) §3.2.10.2 p83</ref> For null blades ''B'', the definition of the projection given here with the first contraction rather than the second being onto the pseudoinverse should be used,<ref>That is to say, the projection must be defined as {{nowrap|1=''P''<sub>''B''</sub>(''A'') = (''A'' ⨼ ''B''<sup>+</sup>) ⨼ ''B''}} and not as {{nowrap|1=(''A'' ⨼ ''B'') ⨼ ''B''<sup>+</sup>}}, though the two are equivalent for non-null blades ''B''</ref> as only then is the result necessarily in the subspace represented by ''B''.{{sfn|Dorst|year=2007|loc=§3.6 p. 85}} +The projection generalizes through linearity to general multivectors ''A''.<ref>This generalization to all ''A'' is apparently not considered by Perwass or Dorst.</ref> The projection is not linear in ''B'' and does not generalize to objects ''B'' that are not blades. + +===Reflections=== + +The definition of a reflection occurs in two forms in the literature. Several authors work with reflection ''on'' a vector (negating all vector components except that parallel to the specifying vector), while others work with reflection ''along'' a vector (negating only the component parallel to the specifying vector, or reflection in the hypersurface orthogonal to that vector). 
Either may be used to build general versor operations, but the former has the advantage that it extends to the algebra in a simpler and algebraically more regular fashion. + +====Reflection ''on'' a vector==== + +[[File:GA reflection on vector.svg|200px|left|thumb|Reflection of vector ''c'' on a vector ''n''. The rejection of ''c'' on ''n'' is negated.]] + +The result of reflecting a vector ''a'' on another vector ''n'' is to negate the rejection of ''a''. It is akin to reflecting the vector ''a'' through the origin, except that the projection of ''a'' onto ''n'' is not reflected. Such an operation is described by +:<math>\, a \mapsto nan^{-1} .</math> +Repeating this operation results in a general versor operation (including both rotations and reflections) of a general multivector ''A'' being expressed as +:<math>\, A \mapsto NAN^{-1} .</math> +This allows a general definition of any versor ''N'' (including both reflections and rotors) as an object that can be expressed as a geometric product of any number of non-null 1-vectors. Such a versor can be applied in a uniform sandwich product as above irrespective of whether it is of even (a proper rotation) or odd grade (an improper rotation i.e. general reflection). The set of all versors with the geometric product as the group operation constitutes the [[Clifford group]] of the Clifford algebra ''C''ℓ<sub>''p'',''q''</sub>('''R''').<ref>Perwass (2009) §3.3.1. Perwass also claims here that David Hestenes coined the term "versor", where he is presumably is referring to the GA context (the term [[versor]] appears to have been used by [[William Rowan Hamilton|Hamilton]] to refer to an equivalent object of the [[quaternion]] algebra).</ref> +{{-}} + +====Reflection ''along'' a vector==== + +[[File:GA reflection along vector.svg|200px|left|thumb|Reflection of vector ''c'' along a vector ''m''. Only the component of ''c'' parallel to ''m'' is negated.]] + +The reflection of a vector ''a'' along a vector ''m'', or equivalently in the hyperplane orthogonal to ''m'', is the same as negating the component of a vector parallel to ''m''. The result of the reflection will be +:<math>\! a' = {-a_{\| m} + a_{\perp m}} = {-(a \cdot m)m^{-1} + (a \wedge m)m^{-1}} += {(-m \cdot a - m \wedge a)m^{-1}} += -mam^{-1} </math> + +This is not the most general operation that may be regarded as a reflection when the dimension {{nowrap|''n'' ≥ 4}}. A general reflection may be expressed as the composite of any odd number of single-axis reflections. Thus, a general reflection of a vector may be written +:<math>\! a \mapsto -MaM^{-1} </math> +where +:<math>\! M = pq \ldots r</math> and <math>\! 
M^{-1} = (pq \ldots r)^{-1} = r^{-1} \ldots q^{-1}p^{-1} .</math> + +If we define the reflection along a non-null vector ''m'' of the product of vectors as the reflection of every vector in the product along the same vector, we get for any product of an odd number of vectors that, by way of example, +:<math> (abc)' = a'b'c' = (-mam^{-1})(-mbm^{-1})(-mcm^{-1}) = -ma(m^{-1}m)b(m^{-1}m)cm^{-1} = -mabcm^{-1} \,</math> +and for the product of an even number of vectors that +:<math> (abcd)' = a'b'c'd' = (-mam^{-1})(-mbm^{-1})(-mcm^{-1})(-mdm^{-1}) += mabcdm^{-1} .\,</math> + +Using the concept of every multivector ultimately being expressed in terms of vectors, the reflection of a general multivector ''A'' using any reflection versor ''M'' may be written +:<math>\, A \mapsto M\alpha(A)M^{-1} ,</math> +where ''α'' is the [[automorphism]] of [[reflection through the origin]] of the vector space (''v'' ↦ −''v'') extended through multilinearity to the whole algebra. + +===Hypervolume of an ''n''-parallelotope spanned by ''n'' vectors=== +For vectors <math> a </math> and <math> b </math> spanning a parallelogram we have +:<math> a \wedge b = ((a \wedge b) b^{-1}) b = a_{\perp b} b </math> +with the result that <math> a \wedge b</math> is linear in the product of the "altitude" and the "base" of the parallelogram, that is, its area. + +Similar interpretations are true for any number of vectors spanning an ''n''-dimensional [[parallelotope]]; the outer product of vectors ''a''<sub>1</sub>, ''a''<sub>2</sub>, ... ''a<sub>n</sub>'', that is <math>\bigwedge_{i=1}^n a_i </math>, has a magnitude equal to the volume of the ''n''-parallelotope. An ''n''-vector doesn't necessarily have a shape of a parallelotope – this is a convenient visualization. It could be any shape, although the volume equals that of the parallelotope. + +===Rotations=== +{{merge section from|Rotation formalisms in three dimensions#Rotors in a geometric algebra|date=September 2013}} +[[File:GA planar rotations.svg|right|200px|thumb|A rotor that rotates vectors in a plane rotates vectors through angle ''θ'', that is ''x''→''R''<sub>''θ''</sub>''xR''<sub>''θ''</sub><sup>†</sup> is a rotation of ''x'' through angle ''θ''. The angle between ''u'' and ''v'' is ''θ''/2. Similar interpretations are valid for a general multivector ''X'' instead of the vector ''x''.<ref>[http://geocalc.clas.asu.edu/html/IntroPrimerGeometricAlgebra.html]</ref>]] + +If we have a product of vectors <math>R = a_1a_2....a_r</math> then we denote the reverse as +:<math>R^{\dagger}= (a_1a_2....a_r)^{\dagger} = a_r....a_2a_1</math>. + +As an example, assume that <math> R = ab </math> we get +:<math>RR^{\dagger} = abba = ab^2a =a^2b^2 = R^{\dagger}R</math>. + +Scaling {{math|''R''}} so that {{math|1=''RR''<sup>†</sup> = 1}} then +:<math>(RvR^{\dagger})^2 = Rv^{2}R^{\dagger}= v^2RR^{\dagger} = v^2 </math> +so <math>RvR</math><sup>†</sup> leaves the length of <math>v</math> unchanged. We can also show that +:<math>(Rv_1R^{\dagger}) \cdot (Rv_2R^{\dagger}) = v_1 \cdot v_2</math> +so the transformation {{math|1=''RvR''<sup>†</sup>}} preserves both length and angle. It therefore can be identified as a rotation or rotoreflection; {{math|''R''}} is called a [[rotor (mathematics)|rotor]] if it is a [[proper rotation]] (as it is if it can be expressed as a product of an even number of vectors) and is an instance of what is known in GA as a ''[[versor]]'' (presumably for historical reasons). 
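+
+The sandwich product <math>R v R^{\dagger}</math> can be tried out directly in the small <math>\mathcal{G}(3,0)</math> sketch from the section on grade projection. The <code>reverse</code> helper and the chosen angle below are illustrative assumptions; the rotor performs a rotation by 90° in the <math>e_1 e_2</math> plane:
+
+<syntaxhighlight lang="python">
+import math
+
+def reverse(A):
+    """Reverse: a grade-r blade picks up the factor (-1)**(r*(r - 1)//2)."""
+    return {k: ((-1) ** (len(k) * (len(k) - 1) // 2)) * v for k, v in A.items()}
+
+theta = math.pi / 2
+# Rotor for a rotation by theta in the e1e2 plane: R = cos(theta/2) - sin(theta/2) e1e2.
+R = {(): math.cos(theta / 2), (1, 2): -math.sin(theta / 2)}
+x = {(1,): 1.0}                                    # the vector e1
+x_rot = geometric_product(geometric_product(R, x), reverse(R))
+print(x_rot)   # approximately {(2,): 1.0}: e1 is carried onto e2, up to round-off
+</syntaxhighlight>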
+ +There is a general method for rotating a vector involving the formation of a multivector of the form <math> R = e^{-\frac{B \theta}{2}} </math> that produces a rotation <math> \theta </math> in the plane and with the orientation defined by a 2-blade <math> B </math>. + +Rotors are a generalization of quaternions to ''n''-D spaces. + +For more about reflections, rotations and "sandwiching" products like {{math|1=''RvR''<sup>†</sup>}} see [[Plane of rotation]]. + +==Linear functions== +An important class of functions of multivectors are the [[linear function]]s mapping multivectors to multivectors. The geometric algebra of an ''n''-dimensional vector space is spanned by 2<sup>''n''</sup> canonical basis elements. If a multivector in this basis is represented by a 2<sup>''n''</sup> x 1 real [[column matrix]], then in principle all linear transformations of the multivector can be written as the [[matrix multiplication]] of a 2<sup>''n''</sup> x 2<sup>''n''</sup> real matrix on the column, just as in the entire theory of [[linear algebra]] in 2<sup>''n''</sup> dimensions. + +There are several issues with this naive generalization. To see this, recall that the [[eigenvalues]] of a real matrix may in general be complex. The scalar coefficients of blades must be real, so these complex values are of no use. If we attempt to proceed with an analogy for these complex eigenvalues anyway, we know that in ordinary linear algebra, complex eigenvalues are associated with [[rotation matrices]]. However if the linear function is truly general, it could allow arbitrary exchanges among the different grades, such as a "rotation" of a scalar into a vector. This operation has no clear geometric interpretation. + +We seek to restrict the class of linear functions of multivectors to more geometrically sensible transformations. A common restriction is to require that the linear functions be ''grade-preserving''. The grade-preserving linear functions are the linear functions that map scalars to scalars, vectors to vectors, bivectors to bivectors, etc. In matrix representation, the grade-preserving linear functions are [[block diagonal matrix|block diagonal matrices]], where each ''r''-grade block is of size <math>\binom nr \times \binom nr</math>. A weaker restriction allows the linear functions to map ''r''-grade multivectors into linear combinations of ''r''-grade and (''n''−''r'')-grade multivectors. These functions map scalars into scalars+pseudoscalars, vectors to vectors+pseudovectors, etc. + +Often an [[invertible matrix|invertible]] linear transformation from vectors to vectors is already of known interest. There is no unique way to generalize these transformations to the entire geometric algebra without further restriction. Even the restriction that the linear transformation be grade-preserving is not enough. We therefore desire a stronger rule, motivated by geometric interpretation, for generalizing these linear transformations of vectors in a standard way. The most natural choice is that of the ''[[outermorphism]]'' of the linear transformation because it extends the concepts of reflection and rotation straightforwardly. 
If ''f'' is a function that maps vectors to vectors, then its outermorphism is the function that obeys the rule + +:<math>\underline{\mathsf{f}}(a_1 \wedge a_2 \wedge \cdots \wedge a_r) = f(a_1) \wedge f(a_2) \wedge \cdots \wedge f(a_r).</math> + +In particular, the outermorphism of the reflection of a vector on a vector is + +:<math>nan^{-1} \mapsto nAn^{-1},</math> + +and the outermorphism of the rotation of a vector by a rotor is + +:<math>RaR^{\dagger} \mapsto RAR^{\dagger}.</math> + +==Examples and applications== + +===Intersection of a line and a plane=== + +[[File:LinePlaneIntersect.png|thumb|A line L defined by points T and P (which we seek) and a plane defined by a bivector B containing points P and Q.]] + +We may define the line parametrically by <math> p = t + \alpha \ v </math> where ''p'' and ''t'' are position vectors for points T and P and ''v'' is the direction vector for the line. + +Then +:<math>B \wedge (p-q) = 0</math> and <math>B \wedge (t + \alpha v - q) = 0</math> +so +:<math>\alpha = \frac{B \wedge(q-t)}{B \wedge v} </math> +and +:<math>p = t + \left(\frac{B \wedge (q-t)}{B \wedge v}\right) v</math>. + +===Rotating systems=== + +The mathematical description of rotational forces such as [[torque]] and [[angular momentum]] make use of the [[cross product]].[[File:Exterior calc cross product.svg|right|thumb|The cross product in relation to the outer product. In red are the unit normal vector, and the "parallel" unit bivector.]] + +The cross product can be viewed in terms of the outer product allowing a more natural geometric interpretation of the cross product as a bivector using the [[Hodge dual|dual]] relationship + +:<math>a \times b = -I (a \wedge b) \,.</math> + +For example,torque is generally defined as the magnitude of the perpendicular force component times distance, or work per unit angle. + +Suppose a circular path in an arbitrary plane containing orthonormal vectors <math>\hat{ u}</math> and<math>\hat{ v}</math> is parameterized by angle. + +:<math> +\mathbf{r} = r(\hat{ u} \cos \theta + \hat{ v} \sin \theta) = r \hat{ u}(\cos \theta + \hat{ u} \hat{ v} \sin \theta) +</math> + +By designating the unit bivector of this plane as the imaginary number + +:<math>{i} = \hat{ u} \hat{ v} = \hat{ u} \wedge \hat{ v}</math> +:<math>{i}^2 = -1</math> + +this path vector can be conveniently written in complex exponential form + +:<math> +\mathbf{r} = r \hat{ u} e^{{i} \theta} +</math> + +and the derivative with respect to angle is + +:<math> +\frac{d \mathbf{r}}{d\theta} = r \hat{ u} {i} e^{{i} \theta} = \mathbf{r} {i} +</math> + +So the torque, the rate of change of work ''W'', due to a force ''F'', is + +:<math>\tau = \frac{dW}{d\theta} = F \cdot \frac{d r}{d\theta} = F \cdot (\mathbf{r} {i}) +</math> + +Unlike the cross product description of torque, <math> \tau = \mathbf{r} \times F</math>, the geometric algebra description does not introduce a vector in the normal direction; a vector that does not exist in two and that is not unique in greater than three dimensions. The unit bivector describes the plane and the orientation of the rotation, and the sense of the rotation is relative to the angle between the vectors <math>{\hat{u}}</math> and <math>{\hat{v}}</math>. 
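+
+The dual relationship <math>a \times b = -I (a \wedge b)</math> used in this section can likewise be checked with the illustrative <math>\mathcal{G}(3,0)</math> sketch given earlier; the function name is, once more, an ad-hoc choice:
+
+<syntaxhighlight lang="python">
+def cross(a, b):
+    """a x b = -I (a ^ b) for 1-vectors a and b, with I = e1e2e3 (sketch code only)."""
+    minus_I = {(1, 2, 3): -1.0}
+    return geometric_product(minus_I, grade(geometric_product(a, b), 2))
+
+print(cross({(1,): 1.0}, {(2,): 1.0}))   # {(3,): 1.0}, i.e. e1 x e2 = e3
+</syntaxhighlight>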
+ +===Electrodynamics and special relativity=== +In physics, the main applications are the geometric algebra of [[Minkowski spacetime|Minkowski 3+1 spacetime]], ''C''ℓ<sub>1,3</sub>, called [[spacetime algebra]] (STA).{{sfn|Hestenes|1966}} or less commonly, ''C''ℓ<sub>3</sub>, called the [[algebra of physical space]] (APS) where ''C''ℓ<sub>3</sub> is isomorphic to the ''even'' subalgebra of the 3+1 Clifford algebra, ''C''ℓ{{su|p=0|b=3,1}}. + +While in STA points of spacetime are represented simply by vectors, in APS, points of (3+1)-dimensional spacetime are instead represented by [[paravector]]s: a 3-dimensional vector (space) plus a 1-dimensional scalar (time). + +In spacetime algebra the electromagnetic field tensor has a bivector representation <math>{F} = ({E} + i c {B})e_0</math>.<ref>{{cite web |url=http://www.av8n.com/physics/maxwell-ga.htm |title=Electromagnetism using Geometric Algebra versus Components + |accessdate=19 March 2013}}</ref> Here, the imaginary unit is the (four-dimensional) volume element, and <math>e_0</math> is the unit vector in time direction. Using the [[four-current]] <math>{J}</math>, [[Maxwell's equations]] then become + + +:{|class="wikitable" style="text-align: center;" +|- +!scope="column" width="160px"|Formulation +!| Homogeneous equations +!| Non-homogeneous equations +|- +! rowspan="2" |Fields +| colspan="2" |<math> D F = \mu_0 J </math> +|- +| <math> D\wedge F = 0 </math> +| <math> D\cdot F = \mu_0 J </math> +|- +!Potentials (any gauge) +||<math>F = D \wedge A</math> +||<math>D \cdot D \wedge A = \mu_0 J </math> +|- +!Potentials (Lorenz&nbsp;gauge) +||<math>F = D A</math> +<math> D\cdot A = 0 </math> +||<math>D^2 A = \mu_0 J </math> +|} + +In geometric calculus, juxtapositioning of vectors such as in <math>DF</math> indicate the geometric product and can be decomposed into parts as <math>DF=D\cdot F+D\wedge F</math>. Here <math>D</math> is the covector derivative in any spacetime and reduces to <math>\bigtriangledown</math> in flat spacetime. Where <math>\bigtriangledown</math> plays a role in Minkowski 4-spacetime which is synonymous to the role of <math>\nabla</math> in Euclidean 3-space and is related to the D'Alembertian by <math> \Box=\bigtriangledown^2 </math>. Indeed given an observer represented by a future pointing timelike vector <math>\gamma_0</math> we have + +:<math>\gamma_0\cdot\bigtriangledown=\frac{1}{c}\frac{\partial}{\partial t}</math> + +:<math>\gamma_0\wedge\bigtriangledown=\nabla</math> + +[[Lorentz boost|Boosts]] in this Lorenzian metric space have the same expression <math>e^{{\beta}}</math> as rotation in Euclidean space, where <math>{\beta}</math> is the bivector generated by the time and the space directions involved, whereas in the Euclidean case it is the bivector generated by the two space directions, strengthening the "analogy" to almost identity. + +==Relationship with other formalisms== +<math>\mathcal G(3,0)</math> may be [[Comparison of vector algebra and geometric algebra|directly compared]] to [[Vector calculus#Algebraic operations|vector algebra]]. 
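+
+As a concrete instance of this comparison, the geometric product of two vectors ''a'' and ''b'' in <math>\mathcal G(3,0)</math> packages the dot and cross products of ordinary vector algebra into a single product. Combining the dual relationship <math>a \times b = -I (a \wedge b)</math> quoted above with <math>I^2 = -1</math> gives the short worked identity (no additional definitions are involved)
+
+:<math>a b = a \cdot b + a \wedge b = a \cdot b + I\,(a \times b), \qquad I = e_1 e_2 e_3 ,</math>
+
+so that, for example, <math>e_1 e_2 = I e_3</math>: the bivectors of <math>\mathcal G(3,0)</math> play the role that axial vectors play in vector algebra.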
+ +The [[Superalgebra#Even subalgebra|even]] [[subalgebra]] of <math>\mathcal G(2,0)</math> is isomorphic to the [[complex number]]s, as may be seen by writing a vector {{math|1=''P''}} in terms of its components in an orthonormal basis and left multiplying by the basis vector {{math|1=''e''<sub>1</sub>}}, yielding + +:<math> Z = {e_1} P = {e_1} ( x {e_1} + y {e_2}) += x (1) + y ( {e_1} {e_2})\, +</math> + +where we identify {{math|1=''i'' ↦ ''e''<sub>1</sub>''e''<sub>2</sub>}} since + +:<math>({e_1}{e_2})^2 = {e_1}{e_2}{e_1}{e_2} = -{e_1}{e_1}{e_2}{e_2} = -1 \,</math> + +Similarly, the even subalgebra of <math>\mathcal G(3,0)</math> with basis {{math|1={1, ''e''<sub>2</sub>''e''<sub>3</sub>, ''e''<sub>3</sub>''e''<sub>1</sub>, ''e''<sub>1</sub>''e''<sub>2</sub>} }} is isomorphic to the [[quaternion]]s as may be seen by identifying {{math|1=''i'' ↦ −''e''<sub>2</sub>''e''<sub>3</sub>}}, {{math|1=''j'' ↦ −''e''<sub>3</sub>''e''<sub>1</sub>}} and {{math|1=''k'' ↦ −''e''<sub>1</sub>''e''<sub>2</sub>}}. + +Every [[associative algebra]] has a matrix representation; the [[Pauli matrices]] are a representation of <math>\mathcal G(3,0)</math> and the [[Dirac matrices]] are a representation of <math>\mathcal G(1,3)</math>, showing the equivalence with matrix representations used by physicists. + +==Geometric calculus== +{{main|Geometric calculus}} + +Geometric calculus extends the formalism to include differentiation and integration including differential geometry and differential forms.<ref>Clifford Algebra to Geometric Calculus, a Unified Language for mathematics and Physics (Dordrecht/Boston:G.Reidel Publ.Co.,1984</ref> + +Essentially, the vector derivative is defined so that the GA version of [[Green's theorem]] is true, +:<math>\int_{A} dA \nabla f = \oint_{\partial A} dx f</math> +and then one can write +:<math>\nabla f = \nabla \cdot f + \nabla \wedge f</math> +as a geometric product, effectively generalizing [[Stokes' theorem]] (including the differential form version of it). + +In <math>1D</math> when A is a curve with endpoints <math>a</math> and <math>b</math>, then +:<math>\int_{A} dA \nabla f = \oint_{\partial A} dx f</math> +reduces to +:<math>\int_{a}^{b} dx \nabla f = \int_{a}^{b} dx \cdot \nabla f = \int_{a}^{b} df = f(b) -f(a)</math> +or the fundamental theorem of integral calculus. + +Also developed are the concept of vector manifold and geometric integration theory (which generalizes Cartan's differential forms). + +==Conformal geometric algebra (CGA)== +{{main|Conformal geometric algebra}} + +A compact description of the current state of the art is provided by Bayro-Corrochano and Scheuermann (2010),<ref>Geometric Algebra Computing in Engineering and Computer Science, E.Bayro-Corrochano & Gerik Scheuermann (Eds),Springer 2010. Extract online at http://geocalc.clas.asu.edu/html/UAFCG.html #5 New Tools for Computational Geometry and rejuvenation of Screw Theory</ref> which also includes further references, in particular to Dorst ''et al'' (2007).<ref>{{cite book|first1=Leo |last1=Dorst |first2=Daniel |last2=Fontijne |first3=Stephen |last3=Mann |title=Geometric algebra for computer science: an object-oriented approach to geometry |publisher=Elsevier/Morgan Kaufmann |location=Amsterdam |year=2007 |isbn=978-0-12-369465-2 |oclc=132691969 |url=http://www.geometricalgebra.net/ |ref=harv}}</ref> Other useful references are Li (2008).<ref>Hongbo Li (2008) ''Invariant Algebras and Geometric Reasoning'', Singapore: World Scientific. 
Extract online at http://www.worldscibooks.com/etextbook/6514/6514_chap01.pdf</ref> and Bayro (2010).<ref>Bayro-Corrochano, Eduardo (2010). Geometric Computing for Wavelet Transforms, Robot Vision, Learning, Control and Action. Springer Verlag</ref> + +[[File:Conformal Embedding.svg|right|300px]] +Working within GA, Euclidian space <math>\mathcal E^3</math> is embedded projectively in the CGA <math>\mathcal G^{4,1}</math> via the identification of Euclidean points with 1D subspaces in the 4D null cone of the 5D CGA vector subspace, and adding a point at infinity. This allows all conformal transformations to be done as rotations and reflections and is [[Covariance and contravariance of vectors|covariant]], extending incidence relations of projective geometry to circles and spheres. + +Specifically, we add orthogonal basis vectors <math>\, e_+ </math> and <math>\, e_- </math> such that <math>\, {e_+}^2 = +1 </math> and <math>\, {e_-}^2 = -1 </math> to the basis of <math> \mathcal{G}(3,0) </math> and identify [[null vectors]] +:<math> n_{\infty} = e_- + e_+ </math> as an [[ideal point]] (point at infinity) (see [[Compactification (mathematics)|Compactification]]) and +:<math> n_{o} = \tfrac{1}{2}(e_- - e_+) </math> as the point at the origin, giving +:<math> n_{\infty} \cdot n_{o} = -1 </math>. + +This procedure has some similarities to the procedure for working with [[homogeneous coordinates]] in projective geometry and in this case allows the modeling of [[Euclidean transformation]]s as [[orthogonal transformation]]s. + +A fast changing and fluid area of GA, CGA is also being investigated for applications to +relativistic physics. + +==History== + +;Before the 20th century + +Although the connection of geometry with algebra dates as far back at least to [[Euclid]]'s ''[[Euclid's Elements|Elements]]'' in the 3rd century B.C. (see [[History of elementary algebra#Greek geometric algebra|Greek geometric algebra]]), +GA in the sense used in this article was not developed until 1844, when it was used in a ''systematic way'' to describe the geometrical properties and ''transformations'' of a space. In that year, [[Hermann Grassmann]] introduced the idea of a geometrical algebra in full generality as a certain calculus (analogous to the [[propositional calculus]]) that encoded all of the geometrical information of a space.<ref>{{cite book|first=Hermann |last=Grassmann |authorlink=Hermann Grassmann |title=Die lineale Ausdehnungslehre ein neuer Zweig der Mathematik: dargestellt und durch Anwendungen auf die übrigen Zweige der Mathematik, wie auch auf die Statik, Mechanik, die Lehre vom Magnetismus und die Krystallonomie erläutert |year=1844 |url=http://books.google.com/?id=bKgAAAAAMAAJ |oclc=20521674 |publisher=O. Wigand |location=Leipzig |ref=harv}}</ref> Grassmann's algebraic system could be applied to a number of different kinds of spaces, the chief among them being [[Euclidean space]], [[affine space]], and [[projective space]]. Following Grassmann, in 1878 [[William Kingdon Clifford]] examined Grassmann's algebraic system alongside the [[quaternions]] of [[William Rowan Hamilton]] in {{Harvard citation|Clifford|1878}}. From his point of view, the quaternions described certain ''transformations'' (which he called ''rotors''), whereas Grassmann's algebra described certain ''properties'' (or ''Strecken'' such as length, area, and volume). 
His contribution was to define a new product — the '''geometric product''' — on an existing Grassmann algebra, which realized the quaternions as living within that algebra. Subsequently [[Rudolf Lipschitz]] in 1886 generalized Clifford's interpretation of the quaternions and applied them to the geometry of rotations in ''n'' dimensions. Later these developments would lead other 20th-century mathematicians to formalize and explore the properties of the Clifford algebra. + +Nevertheless, another revolutionary development of the 19th-century would completely overshadow the geometric algebras: that of [[vector analysis]], developed independently by [[Josiah Willard Gibbs]] and [[Oliver Heaviside]]. Vector analysis was motivated by [[James Clerk Maxwell]]'s studies of [[electromagnetism]], and specifically the need to express and manipulate conveniently certain [[differential equation]]s. Vector analysis had a certain intuitive appeal compared to the rigors of the new algebras. Physicists and mathematicians alike readily adopted it as their geometrical toolkit of choice, particularly following the influential 1901 textbook ''[[Vector Analysis]]'' by [[Edwin Bidwell Wilson]], following lectures of Gibbs. + +In more detail, there have been three approaches to geometric algebra: [[quaternion]]ic analysis, initiated by Hamilton in 1843 and geometrized as rotors by Clifford in 1878; geometric algebra, initiated by Grassmann in 1844; and vector analysis, developed out of quaternionic analysis in the late 19th century by Gibbs and Heaviside. The legacy of quaternionic analysis in vector analysis can be seen in the use of {{math|1=''i'', ''j'', ''k''}} to indicate the basis vectors of '''R'''<sup>3</sup>: it is being thought of as the purely imaginary quaternions. From the perspective of geometric algebra, [[Quaternion#Quaternions as the even part of Cℓ3,0(R)|quaternions can be identified as ''C''ℓ<sup>0</sup><sub>3,0</sub>('''R''')]], the even part of the Clifford algebra on Euclidean 3-space, which unifies the three approaches. + +;20th century and present + +Progress on the study of Clifford algebras quietly advanced through the twentieth century, although largely due to the work of [[abstract algebra]]ists such as [[Hermann Weyl]] and [[Claude Chevalley]]. The ''geometrical'' approach to geometric algebras has seen a number of 20th-century revivals. In mathematics, [[Emil Artin]]'s ''Geometric Algebra''<ref>{{citation |first=Emil |last=Artin |title=Geometric algebra |series=Wiley Classics Library|publisher=John Wiley & Sons Inc. |place=New York |year=1988 |pages=x+214 |isbn=0-471-60839-4|mr=1009557}} ''(Reprint of the 1957 original; A Wiley-Interscience Publication)'' +</ref> discusses the algebra associated with each of a number of geometries, including [[affine geometry]], [[projective geometry]], [[symplectic geometry]], and [[orthogonal geometry]]. In physics, geometric algebras have been revived as a "new" way to do classical mechanics and electromagnetism, together with more advanced topics such as quantum mechanics and gauge theory.<ref>{{cite book|first=Chris J. L. |last=[[Chris J. L. Doran|Doran]] |date=February 1994 |title=Geometric Algebra and its Application to Mathematical Physics |type=Ph.D. 
thesis |publisher=[[University of Cambridge]] |url=http://www.mrao.cam.ac.uk/~clifford/publications/abstracts/chris_thesis.html |oclc=53604228 |ref=harv}}</ref> [[David Hestenes]] reinterpreted the Pauli and Dirac matrices as vectors in ordinary space and spacetime, respectively, and has been a primary contemporary advocate for the use of geometric algebra. + +In [[computer graphics]] and robotics, geometric algebras have been revived in order to efficiently represent rotations and other transformations. For applications of GA in robotics (screw theory, kinematics and dynamics using versors), computer vision, control and neural computing (geometric learning) see Bayro (2010). + +==Software== +GA is a very application oriented subject. There is a reasonably steep initial learning curve associated with it, but this can be eased somewhat by the use of applicable software. + +The following is a list of freely available software that does not require ownership of commercial software or purchase of any commercial products for this purpose: +* GA Viewer [http://www.geometricalgebra.net/downloads.html Fontijne, Dorst, Bouma & Mann] +The link provides a manual, introduction to GA and sample material as well as the software. +* CLUViz [http://www.clucalc.info/ Perwass] +Software allowing script creation and including sample visualizations, manual and GA introduction. +* Gaigen [[SourceForge:projects/g25/|Fontijne]] +For programmers,this is a code generator with support for C,C++,C# and Java. +* Cinderella Visualizations [http://sinai.apphy.u-fukui.ac.jp/gcj/software/GAcindy-1.4/GAcindy.htm Hitzer] and [http://staff.science.uva.nl/~leo/cinderella/ Dorst]. +* Gaalop [http://www.gaalop.de] Standalone GUI-Application that uses the Open-Source Computer Algebra Software [[Maxima (software)|Maxima]] to break down CLUViz code into C/C++ or Java code. +* Gaalop Precompiler [http://www.gaalop.de] Precompiler based on Gaalop integrated with [[CMake]]. +* Gaalet, C++ Expression Template Library [http://sourceforge.net/apps/trac/gaalet/ Seybold]. +* GALua, A Lua module adding GA data-types to the Lua programming language [http://spencerparkin.github.com/GALua/ Parkin]. + +==See also== +* [[Comparison of vector algebra and geometric algebra]] +* [[Clifford algebra]] +* [[Spacetime algebra]] +* [[Spinor]] +* [[Quaternion]] +* [[Algebra of physical space]] ([[wikibooks:Physics in the Language of Geometric Algebra. An Approach with the Algebra of Physical Space]]) +* [[Universal geometric algebra]] + +==References== +{{Reflist}} +* {{citation|editor=Baylis, W. E. |year=1996 |title=Clifford (Geometric) Algebra with Applications to Physics, Mathematics, and Engineering|publisher=Birkhäuser}} +* {{citation|last=Baylis|first=W. E.|year=2002 |title=Electrodynamics: A Modern Geometric Approach|edition=2|publisher=Birkhäuser|ISBN=978-0-8176-4025-5}} +* {{citation|last=Bourbaki|first=Nicolas|authorlink=Nicolas Bourbaki| year=1980|title=Eléments de Mathématique. Algèbre|location=Ch. 
9 "Algèbres de Clifford"|publisher=Hermann}} +*{{citation|last=Dorst|first=Leo|title=The inner products of geometric algebra|publisher=Birkhäuser Boston|place=Boston, MA|year=2002|page=35−46}} +* {{citation|last=Hestenes|first=David|authorlink=David Hestenes| year=1999| title=New Foundations for Classical Mechanics| edition=2|publisher=Springer Verlag| ISBN=978-0-7923-5302-7}} +*{{cite book|first=David |last=Hestenes |authorlink=David Hestenes |title=Space-time Algebra |location=New York |publisher=Gordon and Breach |year=1966 |oclc=996371 |isbn=978-0-677-01390-9}} +* {{citation|last1=Lasenby|first1= J.| last2=Lasenby|first2=A. N.|last3=Doran|first3= C. J. L.|year=2000|url=http://www.mrao.cam.ac.uk/%7Eclifford/publications/ps/dll_millen.pdf|title=A Unified Mathematical Language for Physics and Engineering in the 21st Century| journal=Philosophical Transactions of the Royal Society of London|issue=A 358|location=pp. 1–18}} +*{{cite book|last1=Doran|first1=Chris|last2=Lasenby|first2=Anthony |title=Geometric algebra for physicists |url=http://assets.cambridge.org/052148/0221/sample/0521480221WS.pdf |year=2003 |publisher=Cambridge University Press |isbn=978-0-521-71595-9}} +*{{cite book|first=Alan |last=Macdonald |title=Linear and Geometric Algebra |location=Charleston |publisher=CreateSpace |year=2011 |oclc=704377582 |isbn=9781453854938}} +*{{cite book|title=The ontology of spacetime |editor=[[Dennis Dieks]] |author=J Bain |chapter=Spacetime structuralism: §5 Manifolds ''vs.'' geometric algebra |page=54 ''ff'' |isbn=978-0-444-52768-4 |year=2006 |publisher=Elsevier |url=http://books.google.com/?id=OI5BySlm-IcC&pg=PT72}} +*{{cite book|last1=Bayro-Corrochano|first1=Eduardo|title=Geometric Computing for Wavelet Transforms, Robot Vision, Learning, Control and Action |year=2010 |publisher=Springer Verlag}} + +==External links== +* [http://faculty.luther.edu/~macdonal/GA&GC.pdf A Survey of Geometric Algebra and Geometric Calculus] [http://faculty.luther.edu/~macdonal/ Alan Macdonald], Luther College, Iowa. +* [http://www.mrao.cam.ac.uk/~clifford/introduction/intro/intro.html Imaginary Numbers are not Real – the Geometric Algebra of Spacetime]. Introduction (Cambridge GA group). +* [http://www.mrao.cam.ac.uk/~clifford/ptIIIcourse/ Physical Applications of Geometric Algebra]. Final-year undergraduate course by Chris Doran and Anthony Lasenby (Cambridge GA group; see also [http://www.mrao.cam.ac.uk/~clifford/ptIIIcourse/course99/ 1999 version]). +* [http://www.iancgbell.clara.net/maths/ Maths for (Games) Programmers: 5 – Multivector methods]. Comprehensive introduction and reference for programmers, from [[Ian Bell (programmer)|Ian Bell]]. +* {{planetmath reference|id=3770|title=Geometric Algebra}} +*[[arXiv:0907.5356|Clifford algebra, geometric algebra, and applications]] Douglas Lundholm, Lars Svensson Lecture notes for a course on the theory of Clifford algebras, with special emphasis on their wide range of applications in mathematics and physics. +*[http://www.visgraf.impa.br/Courses/ga/ IMPA SUmmer School 2010] Fernandes Oliveira Intro and Slides. +* [http://sinai.apphy.u-fukui.ac.jp/gcj/pubs.html University of Fukui] E.S.M. Hitzer and Japan GA publications. +* [http://groups.google.com/group/geometric_algebra Google Group for GA] +* [http://www.jaapsuter.com/geometric-algebra/ Geometric Algebra Primer] Introduction to GA, Jaap Suter. + +'''English translations of early books and papers''' +*[http://neo-classical-physics.info/uploads/3/0/6/5/3065888/combebiac_-_tri-quaternions.pdf G. 
Combebiac, "calculus of tri-quaternions"] (Doctoral dissertation) +*[http://neo-classical-physics.info/uploads/3/0/6/5/3065888/markic_-_tri_and_quadri-quaternions.pdf M. Markic, "Transformants: A new mathematical vehicle. A synthesis of Combebiac's tri-quaternions and Grassmann's geometric system. The calculus of quadri-quaternions"] +* [http://neo-classical-physics.info/uploads/3/0/6/5/3065888/burali-forti_-_grassman_and_proj._geom..pdf C. Burali-Forti, "The Grassmann method in projective geometry"] A compilation of three notes on the application of exterior algebra to projective geometry +* [http://neo-classical-physics.info/uploads/3/0/6/5/3065888/burali-forti_-_diff._geom._following_grassmann.pdf C. Burali-Forti, "Introduction to Differential Geometry, following the method of H. Grassmann"] Early book on the application of Grassmann algebra +* [http://neo-classical-physics.info/uploads/3/0/6/5/3065888/grassmann_-_mechanics_and_extensions.pdf H. Grassmann, "Mechanics, according to the principles of the theory of extension"] One of his papers on the applications of exterior algebra. + +'''Research groups''' +* [http://sinai.apphy.u-fukui.ac.jp/gcj/gc_int.html Geometric Calculus International]. Links to Research groups, Software, and Conferences, worldwide. +* [http://www.mrao.cam.ac.uk/~clifford/ Cambridge Geometric Algebra group]. Full-text online publications, and other material. +* [http://www.science.uva.nl/ga/ University of Amsterdam group] +* [http://geocalc.clas.asu.edu/ Geometric Calculus research & development] (Arizona State University). +* [http://gaupdate.wordpress.com/ GA-Net blog] and [http://sinai.apphy.u-fukui.ac.jp/GA-Net/archive/index.html newsletter archive]. Geometric Algebra/Clifford Algebra development news. +*[http://www.gdl.cinvestav.mx/edb/ Geometric Algebra for Perception Action Systems. Geometric Cybernetics Group] (CINVESTAV, Campus Guadalajara, Mexico). + +{{DEFAULTSORT:Geometric Algebra}} +[[Category:Clifford algebras]] +[[Category:Ring theory]] +[[Category:Geometric algebra| ]] + jw37avtwr2btx6vhld26n97cadbrc39 + + + + Stochastic game + 0 + 16802 + + 16803 + 2013-12-15T16:52:14Z + + 190.105.4.72 + + /* Further reading */ link + wikitext + text/x-wiki + In [[game theory]], a '''stochastic game''', introduced by [[Lloyd Shapley]] in the early 1950s, is a dynamic game with '''probabilistic transitions''' played by one or more players. The game is played in a sequence of stages. At the beginning of each stage the game is in some '''state'''. The players select actions and each player receives a '''payoff''' that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state and play continues for a finite or infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs. + +Stochastic games generalize both [[Markov decision process]]es and [[repeated game]]s. 
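+
+The state, action, payoff and transition structure just described, together with the discounted evaluation defined in the next section, can be illustrated by a small simulation. The following Python sketch uses purely hypothetical transition probabilities and payoffs (all numbers and names are chosen only for illustration) for a two-state game in which each of the two players has two actions and plays uniformly at random; it accumulates the discounted sum of stage payoffs <math>\lambda \sum_{t\geq 1}(1-\lambda)^{t-1}g_t</math> over a long finite horizon.
+
+<syntaxhighlight lang="python">
+import random
+
+# payoff[state][a1][a2]: stage payoff to player 1 (player 2 receives the negative).
+payoff = {0: [[3, -1], [0, 2]], 1: [[-2, 4], [1, 0]]}
+# trans[state][a1][a2]: probability that the next state is state 0.
+trans = {0: [[0.9, 0.2], [0.5, 0.1]], 1: [[0.3, 0.6], [0.7, 0.4]]}
+
+def discounted_payoff(lam, horizon=5000, state=0):
+    """Approximate lam * sum_{t>=1} (1-lam)**(t-1) * g_t for random stationary play."""
+    total, weight = 0.0, lam
+    for _ in range(horizon):
+        a1, a2 = random.randint(0, 1), random.randint(0, 1)   # both players randomize
+        total += weight * payoff[state][a1][a2]
+        weight *= 1.0 - lam
+        state = 0 if random.random() < trans[state][a1][a2] else 1
+    return total
+
+print(discounted_payoff(lam=0.1))
+</syntaxhighlight>
+
+The value results discussed below concern what happens when the random choices are replaced by optimizing strategies.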
+ +==Theory== +The ingredients of a stochastic game are: a finite set of players <math>I</math>; a state space <math>M</math> (either a finite set or a measurable space <math>(M,{\mathcal A})</math>); for each player <math>i\in I</math>, an action set <math>S^i</math> +(either a finite set or a measurable space <math>(S^i,{\mathcal S}^i)</math>); a transition probability <math>P</math> from <math>M\times S</math>, where <math>S=\times_{i\in I}S^i</math> is the action profiles, to <math>M</math>, where <math>P(A \mid m, s)</math> is the probability that the next state is in <math>A</math> given the current state <math>m</math> and the current action profile <math>s</math>; and a payoff function <math>g</math> from <math>M\times S</math> to <math>R^I</math>, where the <math>i</math>-th coordinate of <math>g</math>, <math>g^i</math>, is the payoff to player <math>i</math> as a function of the state <math>m</math> and the action profile <math>s</math>. + +The game starts at some initial state <math>m_1</math>. At stage <math>t</math>, players first observe <math>m_t</math>, then simultaneously choose actions <math>s^i_t\in S^i</math>, then observe the action profile <math>s_t=(s^i_t)_i</math>, and then nature selects <math>m_{t+1}</math> according to the probability <math>P(\cdot\mid m_t,s_t)</math>. A play of the stochastic game, <math>m_1,s_1,\ldots,m_t,s_t,\ldots</math>, +defines a stream of payoffs <math>g_1,g_2,\ldots</math>, where <math>g_t=g(m_t,s_t)</math>. + +The discounted game <math>\Gamma_\lambda</math> with discount factor <math>\lambda </math> (<math>0<\lambda \leq 1</math>) is the game where the payoff to player <math>i</math> is <math>\lambda \sum_{t=1}^{\infty}(1-\lambda)^{t-1}g^i_t</math>. The <math>n</math>-stage game +is the game where the payoff to player <math>i</math> is <math>\bar{g}^i_n:=\frac1n\sum_{t=1}^ng^i_t</math>. + +The value <math>v_n(m_1)</math>, respectively <math>v_{\lambda}(m_1)</math>, of a two-person zero-sum stochastic game <math>\Gamma_n</math>, respectively <math>\Gamma_{\lambda}</math>, with finitely many states and actions exists, and [[Truman Bewley]] and [[Elon Kohlberg]] (1976) proved that <math>v_n(m_1)</math> converges to a limit as <math>n</math> goes to infinity and that <math>v_{\lambda}(m_1)</math> converges to the same limit as <math>\lambda</math> goes to <math>0</math>. + +The "undiscounted" game <math>\Gamma_\infty</math> is the game where the payoff to player <math>i</math> is the "limit" of the averages of the stage payoffs. Some precautions are needed in defining the value of a two-person zero-sum <math>\Gamma_{\infty}</math> and in defining equilibrium payoffs of a non-zero-sum <math>\Gamma_{\infty}</math>. The uniform value <math>v_{\infty}</math> of a two-person zero-sum stochastic game <math>\Gamma_\infty</math> exists if for every <math>\varepsilon>0</math> there is a positive integer <math>N</math> and a strategy pair <math>\sigma_{\varepsilon}</math> of player 1 and <math>\tau_{\varepsilon}</math> of player 2 such that for every <math>\sigma</math> and <math>\tau</math> and every <math>n\geq N</math> the expectation of <math>\bar{g}^i_n</math> with respect to the probability on plays defined by <math>\sigma_{\varepsilon} </math> and <math>\tau</math> is at least <math>v_{\infty} -\varepsilon </math>, and the expectation of <math>\bar{g}^i_n</math> with respect to the probability on plays defined by <math>\sigma </math> and <math>\tau_{\varepsilon}</math> is at most <math>v_{\infty} +\varepsilon </math>. 
[[Jean-François Mertens]] and [[Abraham Neyman]] (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a uniform value. + +If there is a finite number of players and the action sets and the set of states are finite, then a stochastic game with a finite number of stages always has a [[Nash equilibrium]]. The same is true for a game with infinitely many stages if the total payoff is the discounted sum. [[Nicolas Vieille]] has shown that all two-person stochastic games with finite state and action spaces have +[[Epsilon-equilibrium|approximate Nash equilibria]] when the total payoff is the limit inferior of the averages of the stage payoffs. Whether such equilibria exist when there are more than two players is a challenging open question. + +A [[Markov perfect equilibrium]] is a refinement of the concept of [[sub-game perfect Nash equilibrium]] to stochastic games.. + +==Applications== +Stochastic games have applications in economics, evolutionary biology and computer networks.<ref>[http://www-net.cs.umass.edu/~sadoc/mdp/main.pdf Constrained Stochastic Games in Wireless Networks] by E.Altman, K.Avratchenkov, N.Bonneau, M.Debbah, R.El-Azouzi, D.S.Menasche</ref> They are generalizations of [[repeated game]]s which correspond to the special case where there is only one state. + +==Referring book== +The most complete reference is the book of articles edited by Neyman and Sorin. The more elementary book of Filar and Vrieze provides a unified rigorous treatment of the theories of [[Markov Decision Process]]es and two-person stochastic games. They coin the term Competitive MDPs to encompass both one- and two-player stochastic games. + +==Notes== +{{reflist}} + +==Further reading== +*{{cite journal |first=A. |last=Condon|authorlink=Anne Condon |title=The complexity of stochastic games |journal=Information and Computation |volume=96 |issue= |pages=203–224 |year=1992 |doi=10.1016/0890-5401(92)90048-K }} +* {{cite book|author=H. Everett|authorlink=Hugh Everett III |pages=67–78| +chapter=Recursive games|title=Contributions to the Theory of Games, Volume 3|series=Annals of Mathematics Studies| +editors=Melvin Dresher, [[Albert W. Tucker|Albert William Tucker]], [[Philip Wolfe (mathematician)|Philip Wolfe]]|publisher=Princeton University Press|year=1957|isbn=0-691-07936-6, ISBN 978-0-691-07936-3|id=(Reprinted in [[Harold W. Kuhn]], ed. ''Classics in Game Theory'', Princeton University Press, 1997. ISBN 978-0-691-01192-9)}} +*{{cite book |first=J. |last=Filar |lastauthoramp=yes |first2=K. |last2=Vrieze |title=Competitive Markov Decision Processes |location= |publisher=Springer-Verlag |year=1997 |isbn=0-387-94805-8 }} +*{{cite journal |first=J. F. |last=Mertens |lastauthoramp=yes |first2=A. |last2=Neyman |title=Stochastic Games |journal=International Journal of Game Theory |volume=10 |issue=2 |pages=53–66 |year=1981 |doi=10.1007/BF01769259 }} +*{{cite journal |first=A. |last=Neyman |lastauthoramp=yes |first2=S. |last2=Sorin |title=Stochastic Games and Applications |location=Dordrecht |publisher=Kluwer Academic Press |year=2003 |isbn=1-4020-1492-9 }} +*{{cite journal |first=L. S. |last=Shapley |title=Stochastic games |journal=[[Proceedings of the National Academy of Sciences|PNAS]] |volume=39 |issue=10 |pages=1095–1100 |year=1953 |url=http://www.pnas.org/content/39/10/1095 |doi=10.1073/pnas.39.10.1095}} +*{{cite book |first=N. 
|last=Vieille |chapter=Stochastic games: Recent results |title=Handbook of Game Theory |pages=1833–1850 |location=Amsterdam |publisher=Elsevier Science |year=2002 |isbn=0-444-88098-4 }} +* {{cite book|author1=Yoav Shoham|author2=Kevin Leyton-Brown|title=Multiagent systems: algorithmic, game-theoretic, and logical foundations|year=2009|publisher=Cambridge University Press|isbn=978-0-521-89943-7|pages=153–156}} (suitable for undergraduates; main results, no proofs) + +{{Game theory}} + +{{DEFAULTSORT:Stochastic Game}} +[[Category:Game theory]] + iiss35pvsrtaxkqrc23rd9xzfsx02o4 + + + + 331 model + 0 + 15857 + + 15858 + 2013-03-06T17:04:23Z + + BattyBot + 0 + + Converted {{Multiple issues}} to new format to fix expert parameter & [[WP:AWB/GF|general fixes]] using [[Project:AWB|AWB]] (8853) + wikitext + text/x-wiki + {{Multiple issues| +{{technical|date=July 2009}} +{{expert-subject|date=July 2009}} +{{refimprove|date=July 2009}} +}} + +The '''331 model''' in [[particle physics]] offers an explanation of why there must exist three families of quarks and leptons. One curious feature of the [[Standard Model]] is that the [[Anomaly (physics)|anomaly]] cancels exactly, for each quark-lepton family, of which we know three. The standard model thus offers no explanation of why there are three families, or indeed why there is more than one family. + +One idea, therefore, is to extend the standard model such as to destroy the perfect cancellation of the anomaly, per family, and to make the three families transform differently under an extended gauge group, and to arrange that the anomaly cancel, only for three families. But the cancellation will persist for 6, 9, ... families, so then there is a new super-family problem, which is best avoided by having only three families. + +Such a construction necessarily requires the addition of further gauge bosons and chiral fermions, which then provide testable predictions of the new model, in the form of elementary particles, to be sought experimentally, at masses above the weak scale, of about 100 GeV. The minimal 331 model predicts singly and doubly charged spin-one bosons, [[bilepton]]s, which could show up in electron-electron scattering when it is studied at TeV energy scales and may also be produced in multi-TeV proton–proton scattering at the [[Large Hadron Collider]] as early as 2011. + +The 331 model offers an explanation of why there must exist three families of quarks and leptons, a fact which is put in "by hand" in the Standard Model. The 331 model is an extension of the [[electroweak]] [[gauge symmetry]] from <math>SU(2)_W \times U(1)_Y</math> to <math>SU(3)_L \times U(1)_X</math> with <math>SU(2)_W \subset SU(3)_W</math> and the hypercharge <math>Y = \beta T_8 + I X</math> and the electric charge <math>Q=Y/2 + T_3/2</math> where T<sub>3</sub> and T<sub>8</sub> are the [[Gell-Mann matrices]] of SU(3)<sub>L</sub> and β and I are parameters of the model. The name 331 comes from the full gauge symmetry group <math>SU(3)_C \times SU(3)_L \times U(1)_X</math>. + +==References== +*{{Cite journal + |last=Frampton |first=P.H + |authorlink=Paul Frampton + |title=Chiral Dilepton Model and the Flavor Question + |url=http://ccdb4fs.kek.jp/cgi-bin/img_index?199207358 + |journal=[[Physical Review Letters]] + |volume=69 + |issue=20 |pages=2889–2891 + |year=1992 + |doi=10.1103/PhysRevLett.69.2889 +|pmid=10046667 +|bibcode=1992PhRvL..69.2889F +}} +*{{cite journal + |last1=Pisano | first1=F. + |last2=Pleitez | first2=V. 
+ |title=An SU(3) x U(1) model for electroweak interactions + |year=1992 + |doi=10.1103/PhysRevD.46.410 + |journal=Physical Review D + |volume=46 + |pages=410–417 |arxiv=hep-ph/9206242 +|bibcode = 1992PhRvD..46..410P }} +*{{cite journal + |last1=Foot | first1=R. + |last2=Hernandez | first2=O.F. + |last3=Pisano | first3=F. + |last4=Pleitez | first4=V. + |title=Lepton masses in an SU(3)<sub>L</sub> x U(1)<sub>N</sub> gauge model + |year=1992 + |doi=10.1103/PhysRevD.47.4158 + |journal=Physical Review D + |volume=47 + |issue=9 + |pages=4158–4161 |arxiv=hep-ph/9207264 +|bibcode = 1993PhRvD..47.4158F }} + +==See also== + +*[[Standard Model]] +*[[Standard model (basic details)]] +*[[Beyond the Standard Model]] + +[[Category:Particle physics]] + + +{{Particle-stub}} + sfi7emw697qwe2trgvsle5upk9fcuxz + + + + Sigma additivity + 0 + 7472 + + 7473 + 2013-12-22T17:50:31Z + + 157.181.98.186 + + fix math fuckups + wikitext + text/x-wiki + In [[mathematics]], '''additivity''' and '''sigma additivity''' (also called '''countable additivity''') of a [[function (mathematics)|function]] defined on [[subset]]s of a given [[Set (mathematics)|set]] are abstractions of the intuitive properties of size ([[length]], [[area]], [[volume]]) of a set. + +== Additive (or finitely additive) set functions == +Let ''<math>\mu</math>'' be a function defined on an [[field of sets|algebra of sets]] <math>\scriptstyle\mathcal{A}</math> with values in [&minus;&infin;, +&infin;] (see the [[extended real number line]]). The function <math>\mu</math> is called '''additive''', or '''finitely additive''', if, whenever ''A'' and ''B'' are [[disjoint set]]s in <math>\scriptstyle\mathcal{A}</math>, one has + +:<math> \mu(A \cup B) = \mu(A) + \mu(B). \, </math> + +(A consequence of this is that an additive function cannot take both &minus;&infin; and +&infin; as values, for the expression &infin;&nbsp;&minus;&nbsp;&infin; is undefined.) + +One can prove by [[mathematical induction]] that an additive function satisfies + +: <math>\mu\left(\bigcup_{n=1}^N A_n\right)=\sum_{n=1}^N \mu(A_n)</math> + +for any <math>A_1,A_2,\dots,A_N</math> disjoint sets in <math>\scriptstyle\mathcal{A}</math>. + +==&sigma;-additive set functions== +Suppose that <math>\scriptstyle\mathcal{A}</math> is a [[sigma algebra|&sigma;-algebra]]. If for any [[sequence]] <math>A_1,A_2,\dots,A_n,\dots </math> of disjoint sets in <math>\scriptstyle\mathcal{A}</math>, one has +:<math> \mu\left(\bigcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty \mu(A_n)</math>{{math|,}} +we say that &mu; is '''countably additive''' or '''&sigma;-additive'''. <br /> +Any &sigma;-additive function is additive but not vice-versa, as shown below. + +== Properties == +=== Basic properties === +Useful properties of an additive function &mu; include the following: +# Either &mu;(&empty;) = 0, or &mu; assigns ∞ to all sets in its domain, or &mu; assigns &minus;∞ to all sets in its domain. +# If &mu; is non-negative and ''A'' &sube; ''B'', then &mu;(''A'') &le; &mu;(''B''). +# If ''A'' &sube; ''B'' and &mu;(''B'') &minus; &mu;(''A'') is defined, then &mu;(''B'' \ ''A'') = &mu;(''B'') &minus; &mu;(''A''). +# Given ''A'' and ''B'', &mu;(''A'' &cup; ''B'') + &mu;(''A'' &cap; ''B'') = &mu;(''A'') + &mu;(''B''). + +==Examples== +An example of a &sigma;-additive function is the function &mu; defined over the [[power set]] of the [[real number]]s, such that +:<math> \mu (A)= \begin{cases} 1 & \mbox{ if } 0 \in A \\ + 0 & \mbox{ if } 0 \notin A. 
+\end{cases}</math> + +If <math>A_1,A_2,\dots,A_n,\dots</math> is a sequence of disjoint sets of real numbers, then either none of the sets contains 0, or precisely one of them does. In either case, the equality +:<math> \mu\left(\bigcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty \mu(A_n)</math> +holds. + +See [[measure (mathematics)|measure]] and [[signed measure]] for more examples of &sigma;-additive functions. + +===An additive function which is not &sigma;-additive=== +An example of an additive function which is not &sigma;-additive is obtained by considering &mu;, defined over the Lebesgue sets of the [[real number]]s by the formula + +:<math> \mu(A)=\lim_{k\to\infty} k \cdot \lambda\left(A \cap \left(0,\frac{1}{k}\right)\right),</math> +where ''λ'' denotes the [[Lebesgue measure]] and ''lim'' the [[Banach limit]]. + +One can check that this function is additive by using the linearity of the limit. That this function is not &sigma;-additive follows by considering the sequence of disjoint sets +:<math>A_n=\left[\frac {1}{n+1},\, \frac{1}{n}\right)</math> +for ''n''=1, 2, 3, ... The union of these sets is the interval (0, 1), and &mu; applied to the union is then one, while &mu; applied to any of the individual sets is zero, so the sum of &mu;(''A''<sub>''n''</sub>) is also zero, which proves the counterexample. + +==Generalizations== +One may define additive functions with values in any additive [[monoid]] (for example any [[group (mathematics)|group]] or more commonly a [[vector space]]). For sigma-additivity, one needs in addition that the concept of [[limit of a sequence]] be defined on that set. For example, [[spectral measure]]s are sigma-additive functions with values in a [[Banach algebra]]. Another example, also from quantum mechanics, is the [[positive operator-valued measure]]. + +== See also == +* [[signed measure]] +* [[measure (mathematics)]] +* [[additive function]] +* [[subadditive function]] +* [[σ-finite measure]] +* [[Hahn–Kolmogorov theorem]] + +{{PlanetMath attribution|id=3400|title=additive}} + +[[Category:Measure theory]] + lcqxgj6hnnav40ka0g0o2qa56unmzrr + + + + Marcus theory + 0 + 12129 + + 12130 + 2013-12-24T20:48:35Z + + BD2412 + 0 + + + Fixing [[Wikipedia:Disambiguation pages with links|links to disambiguation pages]] using [[Project:AWB|AWB]] + wikitext + text/x-wiki + '''Marcus Theory''' is a theory originally developed by [[Rudolph A. Marcus]], starting in 1956, to explain the rates of [[electron transfer]] reactions &ndash; the rate at which an electron can move or jump from one chemical species (called the electron donor) to another (called the electron acceptor).<ref name="Nobel">{{internetquelle|werk=Nobelstiftung|url=http://nobelprize.org/nobel_prizes/chemistry/laureates/1992/marcus-lecture.pdf |titel=Electron Transfer Reactions in Chemistry: Theory and Experiment|zugriff=02.04.2007}}</ref> It was originally formulated to address [[outer sphere electron transfer]] reactions, in which the two chemical species only change in their charge with an electron jumping (e.g. the oxidation of an ion like Fe<sup>2+</sup>/Fe<sup>3+</sup>), but do not undergo large structural changes. 
It was extended to include [[inner sphere electron transfer]] contributions, in which a change of distances or geometry in the solvation or coordination shells of the two chemical species is taken into account (the Fe-O distances in Fe(H<sub>2</sub>O)<sup>2+</sup> and Fe(H<sub>2</sub>O)<sup>3+</sup> are different).<ref>Contrary to Marcus' approach the inner sphere electron transfer theory of [[Noel S. Hush]] refers to a ''continuous'' change of the electron density during transfer along a geometrical coordinate (adiabatic case), and takes also into account the solvent influence as did Marcus. Hush's formulation is known as Marcus-Hush theory.</ref><ref name="Hush">Hush, N.S. Trans. Faraday Soc. 1961, '''57''',557</ref> + +For redox reactions without making or breaking bonds '''Marcus theory''' takes the place of Eyring's [[transition state theory]] <ref name="TST1">P. W. Atkins: ''Physical Chemistry'', 6. Ed., Oxford University Press, Oxford 1998 p.830</ref><ref name="TST2">R.S. Berry, S. A. Rice, J. Ross: ''Physical Chemistry'', Wiley, New York 1980, S. 1147 ff,</ref> which has been derived for reactions with structural changes. Both theories lead to rate equations of the same exponential form. However, whereas in Eyring theory the reaction partners become strongly coupled in the course of the reaction to form a structurally defined activated complex, in Marcus theory they are weakly coupled and retain their individuality. It is the thermally induced reorganization of the surroundings, the solvent (outer sphere) and the solvent sheath or the ligands (inner sphere) which create the geometrically favourable situation ''prior'' to and independent of the electron jump. + +The original classical Marcus theory for outer sphere electron transfer reactions demonstrates the importance of the solvent and leads the way to the calculation of the [[Gibbs free energy]] of activation, using the [[Polarization (electrochemistry)|polarization]] properties of the solvent, the size of the reactants, the transfer distance and the Gibbs free energy <math>\Delta</math>G<sup>0</sup> of the redox reaction. The most startling result of Marcus' theory was the "inverted region": whereas the reaction rates usually become higher with increasing [[Exergonic reaction|exergonicity]] of the reaction, electron transfer should, according to Marcus theory, become slower in the very negative <math>\Delta</math>G<sup>0</sup> domain. The inverted region was searched for 30 years until it was unequivocally verified experimentally in 1984. + +R.A. Marcus received the [[Nobel Prize in Chemistry]] in 1992 for this theory. Marcus theory is used to describe a number of important processes in chemistry and biology, including [[photosynthesis]], [[corrosion]], certain types of [[chemiluminescence]], charge separation in some types of [[solar cells|solar cell]] and more. Besides the inner and outer sphere applications, Marcus theory has been extended to address [[heterogeneous electron transfer]]. + +== The One-Electron Redox Reaction == + +Chemical reactions may lead to a substitution of a group in a molecule or a ligand in a complex, to the elimination of a group of the molecule or a ligand, or to a rearrangement of a molecule or complex. A chemical reaction may, however, also cause simply an exchange of charges between the reactants, and these redox reactions without making or breaking a bond seem to be quite simple in Inorganic Chemistry for ions and complexes. These reactions often become manifest by a change of colour, e.g. 
for ions or complexes of transition metal ions, but organic molecules, too, may change their colour by accepting or giving away an electron (like the herbicide [[Paraquat]] (4,4'-bipyridyl) which becomes blue when accepting an electron, thence the alternative name of methyl viologen). For this type of redox reactions R.A. Marcus has developed his theory. Here the trace of argument and the results are presented. For the mathematical development and details the original papers<ref name="Marcus 1">Marcus, R.A. "On the Theory of Oxidation-Reduction Reactions Involving Electron Transfer I" ''J.Chem.Phys.''1956, '''24''', 966. {{DOI|10.1063/1.1742723}} or [http://www.cce.caltech.edu/faculty/marcus/publications/16.pdf Free Text]</ref><ref name="Marcus 2">Marcus.R.A. "Electrostatic Free Energy and Other Properties of States Having Nonequilibrium Polarization I. ''J.Chem.Phys.''1956, '''24''', 979. {{DOI|10.1063/1.1742724}} or [http://www.cce.caltech.edu/faculty/marcus/publications/17.pdf Free Text]</ref> should be consulted. + +In a redox reaction one partner acts as an electron donor D the other as an acceptor A. For a reaction to take place D and A must diffuse together. They form the precursor complex, usually a kinetic, unstable, solvated encounter complex, which by electron transfer is transformed to the successor complex, and finally this separates by diffusion. For a one electron transfer the reaction is + +:<math>\mathrm{D+A \ \overset{\xrightarrow{k_{12}}}{\xleftarrow[k_{21}]{}} \ [D{\dotsm}A] \ \overset{\xrightarrow{k_{23}}}{\xleftarrow[k_{32}]{}} \ [D^+{\dotsm}A^-] \xrightarrow{k_{30}} \ D^+ + A^-}</math> + +(D and A may already carry charges). Here k<sub>12</sub>, k<sub>21</sub> and k<sub>30</sub> are diffusion constants, k<sub>23</sub> and k<sub>32</sub> rate constants of activated reactions. The total reaction may be diffusion controlled (the electron transfer step is faster than diffusion, every encounter leads to reaction) or activation controlled (the "equilibrium of association" is reached, the electron transfer step is slow, the separation of the successor complex is fast). + +==Outer Sphere Electron Transfer== +Redox reactions are preferably run in polar solvents. Donor and acceptor then have a solvent shell and the precursor and successor complexes are solvated also. The closest molecules of the solvent shell, or the ligands in complexes, are tightly bound and constitute the "inner sphere". Reactions in which these participate are called inner sphere redox reactions. The free solvent molecules constitute the "outer sphere". Outer sphere redox reactions do not change the inner sphere, no bonds are made nor broken. + +It was R.A. Marcus who realized the role of the solvent when he worked on the nature and magnitude of the Gibbs free energy of activation for redox reactions, more precisely: one-electron transfer reactions of the outer sphere type. He published two fundamental papers.<ref name="Marcus 1" /><ref name="Marcus 2" /> The ideas of these two papers are often referred to Marcus Theory although Marcus’ later work goes much beyond them.<ref name="Nobel"/> In the following the development and results of the ideas of these two papers are outlined. For the mathematics and more details the original papers should be consulted. + +== The Problem == + +In outer sphere redox reactions no bonds are formed or broken; only an electron transfer (ET) takes place. 
A quite simple example is the Fe<sup>2+</sup>/Fe<sup>3+</sup> redox reaction, the self exchange reaction which is known to be always occurring in an aqueous solution containing both FeSO<sub>4</sub> and Fe<sub>2</sub>(SO<sub>4</sub>)<sub>3</sub> (of course, with equal and measurable rates in both directions and with Gibbs free reaction energy <math>\Delta</math>G<sup>0</sup> = 0). + +From the reaction rate's temperature dependence (e.g. the S<sub>N2</sub>-substitution reaction of the saponification of an alkyl halide) an [[activation energy]] is determined, and this activation energy is interpreted as the energy of the [[transition state]] in a reaction diagram. The latter is drawn, according to Arrhenius and Eyring, as an energy diagram with the reaction coordinate as the abscissa. The reaction coordinate describes the minimum energy path from the reactants to the products, and the points of this coordinate are combinations of distances and angles between and in the reactants in the course of the formation and/or cleavage of bonds. The maximum of the energy diagram, the [[transition state]], is characterized by a specific configuration of the atoms. Moreover, in Eyring’s TST <ref name="TST1" /><ref name="TST2" /> a quite specific change of the [[nuclear coordinates]] is responsible for crossing the maximum point, a vibration in this direction is consequently treated as a translation. + +For outer sphere redox reactions there cannot be such a reaction path, but nevertheless one does observe an activation energy. The rate equation for activation-controlled reactions has the same exponential form as the Eyring equation, + +<math> k_{act} = A\cdot e^{-\frac{\Delta G^{\ddagger}}{RT}} </math> + +<math> \Delta G^{\ddagger}</math>is the Gibbs free energy of the formation of the transition state, the exponential term represents the probability of its formation, A contains the probability of crossing from precursor to successor complex. + +== The Marcus Model == + +The consequence of an electron transfer is the rearrangement of charges, and this gravely influences the solvent environment. For the dipolar solvent molecules rearrange in the direction of the field of the charges (this is called orientation polarisation), and also the atoms and electrons in the solvent molecules are slightly displaced (atomic and electron polarization, respectively). It is this [[solvent polarization]] which determines the free energy of activation and thus the reaction rate. + +Substitution, elimination and isomerization reactions differ from the outer sphere redox reaction not only in the structural changes outlined above, but also in the fact that the movements of the nuclei and the shift of charges ([[Intervalence charge transfer|charge transfer]], CT) on the reactions path take place in a continuous and concerted way: nuclear configurations and charge distribution are always “in equilibrium”. This is illustrated by the S<sub>N</sub>2 substitution of the saponification of an alkyl halide where the rear side attack of the OH<sup>-</sup> ion pushes out a halide ion and where a transition state with a five-coordinated carbon atom must be visualized. The system of the reactants becomes coupled so tightly during the reaction that they form the [[activated complex]] as an integral entity. The solvent here has a minor effect. + +By contrast, in outer sphere redox reactions the displacement of nuclei in the reactants are small, here the solvent has the dominant role. 
Donor-acceptor coupling is weak, both keep their identity during the reaction. Therefore the electron, being an elementary particle, can only “jump” as a whole ([[electron transfer]], ET). If the electron jumps, the transfer is much faster than the movement of the large solvent molecules, with the consequence that the nuclear positions of the reaction partners and the solvent molecules are the same before and after the electron jump ([[Franck-Condon principle]]).<ref>W.F. Libby, "Theory of Electron Exchange Reactions in Aquous Solution" ''J.Phys.Chem.'' 1952, '''56''', 863</ref> The jump of the electron is governed by quantum mechanical rules, it is only possible if also the energy of the ET system does not change “during” the jump. + +The arrangement of solvent molecules depends on the charge distribution on the reactants. If the solvent configuration must be the same before and after the jump ''and'' the energy may not change, then the solvent cannot be in the solvation state of the precursor nor in that of the successor complex as they are different, it has to be somewhere in between. For the self-exchange reaction for symmetry reasons an arrangement of the solvent molecules exactly in the middle of those of precursor and successor complex would meet the conditions. This means that the solvent arrangement with half of the electron on both donor and acceptor would be the correct environment for jumping. Also, in this state the energy of precursor and successor in their solvent environment would be the same. + +However, the electron as an elementary particle cannot be divided, it resides either on the donor or the acceptor and arranges the solvent molecules accordingly in an equilibrium. The “transition state”, on the other hand, requires a solvent configuration which would result from the transfer of half an electron, which is impossible. This means that real charge distribution and required solvent polarization are not in an “equilibrium”. Yet it is possible that the solvent takes a configuration corresponding to the “transition state”, even if the electron sits on the donor or acceptor. This, however, requires energy. This energy may be provided by the thermal energy of the solvent and [[thermal fluctuations]] can produce the correct polarization state. Once this has been reached the electron can jump. The ''creation'' of the correct solvent arrangement and the electron jump are decoupled and do not happen in a synchronous process. Thus the energy of the transition state is mostly polarization energy of the solvent. + +== Marcus Theory == + +=== The Macroscopic System: Two Conducting Spheres === + +On the basis of his reasoning R.A. Marcus developed a ''classical'' theory with the aim of calculating the polarization energy of the said non-equilibrium state. From thermodynamics it is well known that the energy of such a state can be determined if a reversible path to that state is found. Marcus was successful finding such a path via two reversible charging steps for the preparation of the “transition state” from the precursor complex. + +Four elements are essential for the model on which the theory is based: (1) Marcus employs a classical, purely electrostatic model. The charge (many elementary charges) may be transferred in any portion from one body to another. (2) Marcus separates the fast electron polarisation P<sub>e</sub> and the slow atom and orientation polarisation P<sub>u</sub> of the solvent on grounds of their time constants differing several orders of magnitude. 
(3) Marcus separates the inner sphere (reactant + tightly bound solvent molecules, in complexes + ligands) and the outer sphere (free solvent ) (4) In this model Marcus confines himself to calculating the outer sphere energy of the non-equilibrium polarization of the “transition state”. The outer sphere energy is often much larger than the inner sphere contribution because of the far reaching electrostatic forces (compare the [[Debye-Hückel theory]] of electrochemistry). + +Marcus’ tool is the theory of dielectric polarization in solvents. He solved the problem in a general way for a transfer of charge between two bodies of arbitrary shape with arbitrary surface and volume charge. For the self-exchange reaction, the redox pair (e.g. Fe(H<sub>2</sub>O)<sub>6</sub><sup>3+</sup> / Fe(H<sub>2</sub>O)<sub>6</sub><sup>2+</sup>) is substituted by two macroscopic conducting spheres at a defined distance carrying specified charges. Between these spheres a certain amount of charge is reversibly exchanged. + +In the first step the energy W<sub>I</sub> of the transfer of a specific amount of charge is calculated, e.g. for the system in a state when both spheres carry half of the amount of charge which is to be transferred. This state of the system can be reached by transferring the respective charge from the donor sphere to the vacuum and then back to the acceptor sphere.<ref>Marcus takes the vacuum state of the reactants as the zero energy point. Therefore many of his equations contain also the solvation energy of the isolated species W<sub>iso</sub> and the electrostatic energy of formation of the precursor and successor complexes.</ref> Then the spheres in this state of charge give rise to a defined electric field in the solvent which creates the total solvent polarization P<sub>u</sub> + P<sub>e</sub>. By the same token this polarization of the solvent interacts with the charges. + +In a second step the energy W<sub>II</sub> of the reversible (back) transfer of the charge to the first sphere, again via the vacuum, is calculated. However, ''the atom and orientation polarization P<sub>u</sub> is kept fixed'', only the electron polarization P<sub>e</sub> may adjust to the field of the new charge distribution ''and'' the fixed P<sub>u</sub>. After this second step the system is in the desired state with an electron polarization corresponding to the starting point of the redox reaction and an atom and orientation polarization corresponding to the “transition state”. The energy W<sub>I</sub> + W<sub>II</sub> of this state is, thermodynamically speaking, a Gibbs free energy G. + +[[Image:Marcusparabel.jpg|thumb|300px|Fig. 1. The parabolas of outer-sphere reorganisation energy of the system two spheres in a solvent. Parabola i: the charge on the first, transfer to the second, parabola f: the charge on the second, transfer to the first. The abscissa is the transferred amount of charge <math>\Delta</math>e or the induced polarization P, the ordinate the Gibbs free energy. ΔG(0)<sup>‡</sup> = λ<sub>o</sub>/4 is the reorganization energy at Δe = 0.5, it corresponds to the activation energy of the self-exchange reaction.]] + +Of course, in this classical model the transfer any arbitrary amount of charge <math>\Delta</math>e is possible. So the energy of the non-equilibrium state, and consequently of the polarization energy of the solvent, can be probed as a function of <math>\Delta</math>e. 
Thus Marcus has lumped together, in a very elegant way, the coordinates of all solvent molecules into a single coordinate of solvent polarization <math>\Delta</math>p, which is determined by the amount of transferred charge <math>\Delta</math>e. In this way he reduced the representation of the energy to only two dimensions: G = f(<math>\Delta</math>e). The result for two conducting spheres in a solvent is the formula of Marcus + +<math> G =\left(\frac{1}{2r_{1}}+\frac{1}{2r_2}-\frac{1}{R}\right)\cdot\left(\frac{1}{\epsilon_{op}}-\frac{1}{\epsilon_s}\right)\cdot(\Delta e)^2 </math> + +where r<sub>1</sub> and r<sub>2</sub> are the radii of the spheres, R is their separation, <math>\epsilon</math><sub>s</sub> and <math>\epsilon</math><sub>op</sub> are the static and high frequency (optical) dielectric constants of the solvent, and <math>\Delta</math>e is the amount of charge transferred. The graph of G vs. <math>\Delta</math>e is a parabola (Fig. 1). In Marcus theory the energy belonging to the transfer of a unit charge (<math>\Delta</math>e = 1) is called the (outer sphere) reorganization energy <math>\lambda</math><sub>o</sub>, i.e. the energy of a state where the polarization would correspond to the transfer of a unit amount of charge, but the real charge distribution is that before the transfer.<ref name= "square">Note: The quadratic dependence of outer sphere reorganization energy is ''not'' a consequence of vibrations in reactants or solvent!</ref> In terms of exchange direction the system is symmetric. + +=== The Microscopic System: The Redox Pair === + +Shrinking the two-sphere model to the molecular level creates the problem that in the self-exchange reaction the charge can no longer be transferred in arbitrary amounts, but only as a single electron. However, the polarization is still determined by the total ensemble of the solvent molecules and can therefore still be treated classically, i.e. the polarization energy is not subject to quantum limitations. Therefore the energy of solvent reorganization can be calculated as being due to a ''hypothetical'' transfer and back transfer of a partial elementary charge according to the Marcus formula. Thus the reorganization energy for chemical redox reactions, which is a Gibbs free energy, is also a parabolic function of the <math>\Delta</math>e of this hypothetical transfer. For the self-exchange reaction, where for symmetry reasons <math>\Delta</math>e = 0.5, the Gibbs free energy of activation is <math>\Delta</math>G(0)<sup>‡</sup> = <math>\lambda</math><sub>o</sub>/4 (see the intersection of parabolas i and f in Fig. 1 and of i and f(0) in Fig. 2, respectively). + +Up to now this was all physics; now some chemistry enters. The self-exchange reaction is a very special redox reaction; most redox reactions are between different partners,<ref>they are often called Marcus cross reactions.</ref> e.g. + +:<math>\mathrm{[Fe^{II}(CN)_{6}]^{4-}}+\mathrm{[Ir^{IV}Cl_{6}]^{2-}}\rightleftharpoons\mathrm{[Fe^{III}(CN)_{6}]^{3-}}+\mathrm{[Ir^{III}Cl_{6}]^{3-}}</math> + +and they have positive (endergonic) or negative (exergonic) Gibbs free energies of reaction <math>\Delta</math>G<sup>0</sup>. + +As Marcus’ calculations refer exclusively to the electrostatic properties of the solvent (outer sphere), <math>\Delta</math>G<sup>0</sup> and <math>\lambda</math><sub>o</sub> are independent of one another and can therefore simply be added. This means that the Marcus parabolas in systems with different <math>\Delta</math>G<sup>0</sup> are merely shifted up or down in the G vs.
<math>\Delta</math>e diagram (Fig. 2). Variation of <math>\Delta</math>G<sup>0</sup> can be effected in experiments by offering different acceptors to the same donor. + +Simple calculations of the points of intersection of the parabolas i (y = x<sup>2</sup>), f(0) (y = (x-d)<sup>2</sup>) and f<sub>1</sub> to f<sub>3</sub> (y = (x-d)<sup>2</sup> + c) give the Gibbs free energy of activation + +:<math>\Delta G^{\ddagger} = \frac{(\lambda_{o} + \Delta G^0)^2}{4 \lambda_{o}}</math> + +It should be noted that the intersection of those parabolas represents an activation energy and not the energy of a transition state of fixed configuration of all nuclei in the system, as is the case in the substitution and other reactions mentioned. The transition state of the latter reactions has to meet structural and energetic conditions, whereas redox reactions only have to comply with the energy requirement. Whereas the geometry of the transition state in the other reactions is the same for all pairs of reactants, for redox pairs many polarization environments may meet the energetic conditions. + +[[Image:Marcusparabel 2.jpg|thumb|300 px|Fig. 2 Marcus parabolas for different redox reactions: f<sub>1</sub> for a reaction with positive ΔG<sup>0</sup>, f(0) for the self-exchange reaction with ΔG<sup>0</sup> = 0 (broken line), f<sub>2</sub> for moderately negative ΔG<sup>0</sup> (selected so that ΔG<sup>‡</sup> = 0) and f<sub>3</sub> for strongly negative ΔG<sup>0</sup>. The free energy of activation ΔG<sup>‡</sup> decreases from f<sub>1</sub> (b<sub>1</sub>) via f(0) (a) to f<sub>2</sub> (zero) and increases again for f<sub>3</sub> (“Marcus inverted region”).]] + +Marcus’ formula shows a quadratic dependence of the Gibbs free energy of activation on the Gibbs free energy of reaction. It is common chemical experience that reactions usually become faster the more negative <math>\Delta</math>G<sup>0</sup> is. In many cases even a linear free energy relation is found. According to the Marcus formula the rates also increase when the reactions are more exergonic, but only as long as <math>\Delta</math>G<sup>0</sup> is positive or slightly negative. It is surprising that for redox reactions, according to the Marcus formula, the activation energy should increase for very exergonic reactions, i.e. in the cases when <math>\Delta</math>G<sup>0</sup> is negative and its absolute value is greater than that of <math>\lambda</math><sub>o</sub>. This realm of Gibbs free energy of reaction is called the “Marcus inverted region”. In Fig. 2 it becomes obvious that the intersection of the parabolas i and f moves upwards in the left part of the graph when <math>\Delta</math>G<sup>0</sup> continues to become more negative, and this means increasing activation energy. Thus the total graph of ln k vs. <math>\Delta</math>G<sup>0</sup> should have a maximum. + +The maximum of the ET rate is expected at <math>\Delta</math>G<sup>‡</sup> = 0. Here <math>\Delta</math>e = 0 and q = 0 (Fig. 2), which means that the electron may jump in the precursor complex at its equilibrium polarization. No thermal activation is necessary: the reaction is barrierless. In the inverted region the polarization corresponds to the difficult-to-imagine notion of a charge distribution where the donor has received and the acceptor given off charge. Of course, in the real world this does not happen: it is not a real charge distribution that creates this critical polarization, but thermal fluctuations in the solvent.
This polarization necessary for transfer in the inverted region can be created – with some probability – just as well as any other one.<ref>The reverse reaction may support understanding: for this reaction the polarization due to the hypothetical transfer of a unit electron charge is not sufficient to reach a polarization where the polarization energies of A/D and A<sup>-</sup>/D<sup>+</sup> are equal. This can only happen on the hypothetical transfer of more than one electron charge.</ref> The electron simply waits for such a fluctuation in order to jump. + +==Inner Sphere Electron Transfer== + +In the outer sphere model the donor or acceptor and the tightly bound solvation shells or the complex’s ligands were considered to form rigid structures which do not change in the course of electron transfer. However, the distances in the inner sphere are dependent on the charge of donor and acceptor, e.g. the central ion-ligand distances are different in complexes carrying different charges, and again the Franck-Condon principle must be obeyed: for the electron jump to occur, the nuclei have to adopt a configuration which is identical for both the precursor and the successor complex and which is, of course, highly distorted. In this case the energy requirement is fulfilled automatically. + +In this inner sphere case the Arrhenius concept holds: the transition state of definite geometric structure is reached along a geometrical reaction coordinate determined by nuclear motions. No further nuclear motion is necessary to form the successor complex, just the electron jumps, which marks a difference from transition state theory (TST). The reaction coordinate for the inner sphere energy is governed by vibrations, and these differ in the oxidized and reduced species.<ref name= "Sutin">N. Sutin, 'Theory of Electron Transfer Reactions: Insights and Hindsights', Progr. Inorg. Chem. 1983, '''30''', 441-448</ref> + +For the self-exchange system Fe<sup>2+</sup>/Fe<sup>3+</sup> only the symmetrical breathing vibration of the six water molecules around the iron ions is considered.<ref name = "Sutin" /> Assuming harmonic conditions, this vibration has frequencies <math>\nu_D</math> and <math>\nu_A</math>, the force constants f<sub>D</sub> and f<sub>A</sub> are <math>f = 4 \pi^2 \nu^2 \mu</math>, and the energies are +:<math>E_D = E_D(q_{0,D}) + 3 f_D(\Delta q_D)^2</math> +:<math>E_A = E_A(q_{0,A}) + 3 f_A(\Delta q_A)^2</math> +where q<sub>0</sub> is the equilibrium normal coordinate and <math>\Delta q = (q-q_0) </math> is the displacement along the normal coordinate; the factor 3 stems from 6 (H<sub>2</sub>O)<math>\cdot</math> ½. As for the outer-sphere reorganization energy, the potential energy curve is quadratic; here, however, it is so as a consequence of vibrations. + +The equilibrium normal coordinates differ in Fe(H<sub>2</sub>O)<sub>6</sub><sup>2+</sup> and Fe(H<sub>2</sub>O)<sub>6</sub><sup>3+</sup>. By thermal excitation of the breathing vibration a geometry can be reached which is common to both donor and acceptor, i.e. the potential energy curves of the breathing vibrations of D and A intersect here. This is the situation where the electron may jump. The energy of this transition state is the inner sphere reorganization energy <math>\lambda</math><sub>in</sub>.
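+A minimal numerical sketch of this picture is given below (the force constants and equilibrium metal–water distances are hypothetical illustration values, not data from the references). It takes the transition state as the common breathing coordinate q* to which both complexes are distorted at the lowest total cost, and it reproduces the closed-form expressions for q* and <math>\lambda</math><sub>in</sub> quoted in the following paragraph.
+
+<syntaxhighlight lang="python">
+# Sketch: inner-sphere barrier from the two harmonic breathing curves.
+# f_D, f_A, q0_D, q0_A are hypothetical values chosen only for illustration.
+import numpy as np
+
+f_D, f_A = 4.2, 5.0        # assumed force constants of the breathing mode (arb. units)
+q0_D, q0_A = 2.21, 2.05    # assumed equilibrium metal-water distances (arb. units)
+
+E_D = lambda q: 3.0 * f_D * (q - q0_D) ** 2   # factor 3 = 6 H2O * 1/2, as in the text
+E_A = lambda q: 3.0 * f_A * (q - q0_A) ** 2
+
+q = np.linspace(q0_A, q0_D, 100001)
+total = E_D(q) + E_A(q)          # cost of distorting both complexes to a common q
+i = int(np.argmin(total))
+
+q_star = (f_D * q0_D + f_A * q0_A) / (f_D + f_A)             # closed form (see below)
+lam_in = 3.0 * f_D * f_A / (f_D + f_A) * (q0_D - q0_A) ** 2  # closed form (see below)
+
+print(f"numerical:   q* = {q[i]:.4f}, Delta E* = {total[i]:.4f}")
+print(f"closed form: q* = {q_star:.4f}, lambda_in = {lam_in:.4f}")
+</syntaxhighlight>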
+ +For the self-exchange reaction the metal-water distance in the transition state can be calculated:<ref name="Sutin"/> + +:<math> q^*=\frac{f_D q_{0,D} + f_A q_{0,A}}{f_D + f_A} </math> + +This gives the inner sphere reorganization energy + +:<math> \lambda_{in} = \Delta E^* = \frac{3 f_D f_A}{f_D + f_A}(q_{0,D} - q_{0,A})^2 </math> + +It is fortunate that the expressions for the outer and inner reorganization energies have the same quadratic form. Inner sphere and outer sphere reorganization energies are independent, so they can be added to give <math>\lambda = \lambda_{in} + \lambda_o</math> and inserted into the Arrhenius equation + +<math> k_{act} = A \cdot e^{- \frac{ \Delta {G_{in}^{\ddagger}}+ \Delta {G_{o}}^{\ddagger}}{kT}}</math> + +Here, A can be seen to represent the probability of the electron jump, exp[-<math>\Delta</math>G<sub>in</sub><sup><math>^\ddagger</math></sup>/kT] that of reaching the transition state of the inner sphere, and exp[-<math>\Delta</math>G<sub>o</sub><sup><math>^\ddagger</math></sup>/kT] that of the outer sphere adjustment. +For unsymmetrical (cross) reactions like + +:<math>\mathrm{[Fe(H_2O)_6]^{2+}} +\mathrm{[Co(H_2O)_6]^{3+}} \rightleftharpoons \mathrm{[Fe(H_2O)_6]^{3+}} + \mathrm{[Co(H_2O)_6]^{2+}}</math> + +the expression for <math>\lambda_{in}</math> can also be derived, but it is more complicated.<ref name = "Sutin"/> These reactions have a free reaction enthalpy <math>\Delta</math>G<sup>0</sup> which is independent of the reorganization energy and determined by the different redox potentials of the iron and cobalt couples. Consequently the quadratic Marcus equation also holds for the inner sphere reorganization energy, including the prediction of an inverted region. One may visualize this as follows: (a) in the normal region both the initial state and the final state have to have stretched bonds; (b) in the <math>\Delta</math>G<sup>‡</sup> = 0 case the equilibrium configuration of the initial state is the stretched configuration of the final state; and (c) in the inverted region the initial state has compressed bonds whereas the final state has largely stretched bonds. +Similar considerations hold for metal complexes where the ligands are larger than solvent molecules and also for ligand-bridged polynuclear complexes. + +==The Probability of the Electron Jump== +The strength of the electronic coupling of the donor and acceptor decides whether the electron transfer reaction is adiabatic or non-adiabatic. In the non-adiabatic case the coupling is weak, i.e. H<sub>AB</sub> in Fig. 3 is small compared to the reorganization energy, and donor and acceptor retain their identity. The system has a certain probability to jump from the initial to the final potential energy curves.
In the adiabatic case the coupling is considerable: the gap of 2 H<sub>AB</sub> is larger and the system stays on the lower potential energy curve.<ref>In normal chemical reactions, like substitutions, which proceed via a transition state, the upper potential energy curve is so far up that it is neglected.</ref> + +Marcus theory, as laid out above, represents the non-adiabatic case.<ref>The theory of adiabatic electron transfer with participation of nuclear movement (which may be considered as a transfer of charge, not an electron jump) has been worked out by Hush.</ref> Consequently the semi-classical [[Landau-Zener formula|Landau-Zener theory]] can be applied, which gives the probability of interconversion of donor and acceptor for a single passage of the system through the region of the intersection of the potential energy curves +:<math> P_{if} = 1-\exp[-\frac{4\pi^2 {H_{if}^2}}{hv \mid(s_i - s_f)\mid}] </math> + +where H<sub>if</sub> is the interaction energy at the intersection, v the velocity of the system through the intersection region, and s<sub>i</sub> and s<sub>f</sub> the slopes there. + + [[Image:Parameters of the Marcus Equation.JPG]] + +Fig. 3 Energy diagram for electron transfer including inner and outer sphere reorganization and electronic coupling: the vertical axis is the free energy, and the horizontal axis is the "reaction coordinate" – a simplified axis representing the motion of all the atomic nuclei (including solvent reorganization). + +Working this out, one arrives at the basic equation of Marcus theory + +<math>k_{et} = \frac{2\pi}{\hbar}|H_{AB}|^2 \frac{1}{\sqrt{4\pi \lambda k_bT}}\exp \left ( -\frac{(\lambda +\Delta G^\circ)^2}{4\lambda k_bT} \right )</math> + +where <math>k_{et}</math> is the rate constant for electron transfer, <math>|H_{AB}|</math> is the electronic coupling between the initial and final states, <math>\lambda</math> is the reorganization energy (both inner and outer-sphere), and <math>\Delta G^\circ</math> is the total [[Gibbs free energy]] change for the electron transfer reaction (<math>k_b</math> is the [[Boltzmann constant]] and <math>T</math> is the [[absolute temperature]]). + +Thus Marcus's theory builds on the traditional Arrhenius equation for the rates of chemical reactions in two ways: +1. It provides a formula for the activation energy, based on a parameter called the reorganization energy, as well as the Gibbs free energy. The reorganization energy is defined as the energy required to “reorganize” the system structure from initial to final coordinates, without making the charge transfer. +2. It provides a formula for the pre-exponential factor in the Arrhenius equation, based on the electronic coupling between the initial and final state of the electron transfer reaction (i.e., the overlap of the electronic wave functions of the two states). + +== Experimental Results == + +Marcus published his theory in 1956. For many years there was an intensive search for the inverted region, which would be a proof of the theory. But all experiments with series of reactions of more and more negative <math>\Delta</math>G<sup>0</sup> revealed only an increase of the reaction rate up to the diffusion limit, i.e. to a value indicating that ''every'' encounter led to electron transfer, and this limit held also for very negative <math>\Delta</math>G<sup>0</sup> values (Rehm-Weller behaviour).<ref>Rehm, D., Weller, A. "Kinetik und Mechanismus der Elektronenübertragung bei der Fluoreszenzlöschung in Acetonitril" ''Ber.
Bunsenges.Physik.Chem.'' 1969, '''73''', 834-839 characterized this behaviour by the empirical formula <math> \Delta G^{\ddagger} = \frac{\Delta G^0}{2}+\sqrt{\Delta G^{\ddagger}(0)^2 + \left(\frac{\Delta G^0}{2}\right)^2} </math></ref> It took about 30 years until the inverted region was unequivocally substantiated by Miller, Calcaterra and Closs for an intramolecular electron transfer in a molecule where donor and acceptor are kept at a constant distance by means of a stiff spacer (Fig. 4).<ref>Miller J.R., Calcaterra L.T., Closs G.L.: "Intramolecular long-distance electron transfer in radical anions. The effects of free energy and solvent on the reaction rates", ''J.Am.Chem.Soc.'' 1984, '''106''', 3047, {{DOI|10.1021/ja00322a058}}</ref> + +[[Image:Marcusparabel MillerCloss.jpg|thumb|Fig. 4. Marcus behaviour in a molecule which is composed of a biphenyl entity, whose anion (produced by means of pulse radiolysis) acts as the donor, a steroid entity, which is a rigid spacer, and different aromatic hydrocarbons (1−3) and quinones (4−8), which are the acceptors.<ref>We recommend reference to the original paper, JACS does not license Wikipedia for graphs.</ref>]] + +''A posteriori'' one may presume that in systems where the reaction partners may diffuse freely the optimum distance for the electron jump is sought out, i.e. the distance for which <math>\Delta</math>G<sup>‡</sup> = 0 and <math>\Delta</math>G<sup>0</sup> = - <math>\lambda</math><sub>o</sub>. Because <math>\lambda</math><sub>o</sub> is dependent on R, <math>\lambda</math><sub>o</sub> increases for larger R and the opening of the parabola becomes smaller. It is formally always possible to close the parabola in Fig. 2 to such an extent that the f-parabola intersects the i-parabola in the apex. Then <math>\Delta</math>G<sup>‡</sup> = 0 always holds, and the rate k reaches the maximum diffusional value for all very negative <math>\Delta</math>G<sup>0</sup>. There are, however, other concepts for the phenomenon,<ref name="Nobel"/> e.g. the participation of excited states, or that the decrease of the rate constants would lie so far in the inverted region that it escapes measurement. + +R.A. Marcus and his coworkers have further developed the theory outlined here in several aspects. They have included inter alia statistical aspects and quantum effects,<ref>Siders, P., Marcus, R.A. "Quantum Effects in Electron-Transfer Reactions" ''J.Am.Chem.Soc.'' 1981,'''103''',741; Siders, P., Marcus, R.A. "Quantum Effects for Electron-Transfer Reactions in the 'Inverted Region'" ''J.Am.Chem.Soc.'' 1981,'''103''',748</ref> they have applied the theory to chemiluminescence<ref>Marcus, R.A. "On the Theory of Chemiluminescent Electron-Transfer Reactions" ''J.Chem.Phys.'' 1965,'''43''',2654</ref> and electrode reactions.<ref>Marcus, R.A. "On the theory of Electron-Transfer Reaction IV. Unified Treatment of Homogeneous and Electrode Reactions" ''J.Chem.Phys.'' 1965, '''43''', 679</ref> R.A. Marcus received the Nobel Prize in Chemistry in 1992; his Nobel Lecture gives an extensive view of his work.<ref name = "Nobel" /> + +==See also== +*[[Hammond's postulate]] + +== References == +<references/> + +==Marcus's Key Papers== +* {{cite journal | author = Marcus, R.A | journal = [[J. Chem. Phys.]] | year = 1956 | volume = 24 | issue = 5 | pages = 966 | doi = 10.1063/1.1742723 | title = On the Theory of Oxidation-Reduction Reactions Involving Electron Transfer. I|bibcode = 1956JChPh..24..966M }} +* {{cite journal | author = Marcus, R.A | journal = [[J. Chem.
Phys.]] | year = 1956 | volume = 24 | issue = 5 | pages = 979 | doi = 10.1063/1.1742724 | title = Electrostatic Free Energy and Other Properties of States Having Nonequilibrium Polarization. I|bibcode = 1956JChPh..24..979M }} +* {{cite journal | author = Marcus, R.A | journal = [[J. Chem. Phys.]] | year = 1957 | volume = 26 | issue = 4 | pages = 867 | doi = 10.1063/1.1743423 | title = On the Theory of Oxidation-Reduction Reactions Involving Electron Transfer. II. Applications to Data on the Rates of Isotopic Exchange Reactions|bibcode = 1957JChPh..26..867M }} +* {{cite journal | author = Marcus, R.A | journal = [[J. Chem. Phys.]] | year = 1957 | volume = 26 | issue = 4 | pages = 872 | doi = 10.1063/1.1743424 | title = On the Theory of Oxidation-Reduction Reactions Involving Electron Transfer. III. Applications to Data on the Rates of Organic Redox Reactions|bibcode = 1957JChPh..26..872M }} +* {{cite journal | author = Marcus, R.A | journal = [[Disc. Faraday Soc.]] | year = 1960 | volume = 29 | pages = 21 | doi = 10.1039/df9602900021 | title = Exchange reactions and electron transfer reactions including isotopic exchange. Theory of oxidation-reduction reactions involving electron transfer. Part 4.—A statistical-mechanical basis for treating contributions from solvent, ligands, and inert salt}} +* {{cite journal | author = Marcus, R.A | journal = [[J. Phys. Chem.]] | year = 1963 | volume = 67 | issue = 4 | pages = 853 | doi = 10.1021/j100798a033 | title = On The Theory Of Oxidation--Reduction Reactions Involving Electron Transfer. V. Comparison And Properties Of Electrochemical And Chemical Rate Constants}} +* {{cite journal | author = Marcus, R.A | journal = [[Annu. Rev. Phys. Chem.]] | year = 1964 | volume = 15 | issue = 1 | pages = 155 | doi = 10.1146/annurev.pc.15.100164.001103 | title = Chemical and Electrochemical Electron-Transfer Theory|bibcode = 1964ARPC...15..155M }} +* {{cite journal | author = Marcus, R.A | journal = [[J. Chem. Phys.]] | year = 1965 | volume = 43 | issue = 2 | pages = 679 | doi = 10.1063/1.1696792 | title = On the Theory of Electron-Transfer Reactions. VI. Unified Treatment for Homogeneous and Electrode Reactions|bibcode = 1965JChPh..43..679M }} +* {{cite journal | author = Marcus, R.A.; Sutin N | journal = [[Biochim. Biophys. Acta.]] | year = 1985 | volume = 811 | issue = 3 | pages = 265 | title = Electron transfers in chemistry and biology | doi = 10.1016/0304-4173(85)90014-X}} + +{{DEFAULTSORT:Marcus Theory}} +[[Category:Physical organic chemistry]] +[[Category:Physical chemistry]] + mfyygj1ey30t6ohk9o0otrg5zd0t09m + + + + Centrifugal fan + 0 + 16787 + + 16788 + 2014-01-15T22:32:46Z + + Monkbot + 0 + + + Fix [[Help:CS1_errors#deprecated_params|CS1 deprecated date parameter errors]] + wikitext + text/x-wiki + {{Refimprove|date=July 2011}} +[[File:Centrifugal fan.gif|thumb|right|275px|A typical backward-curved centrifugal fan, where the blades curve in the opposite direction than the one they rotate in]] +A '''centrifugal fan''' is a mechanical device for moving [[air]] or other [[gas]]es. The terms "blower" and "squirrel cage fan" (because it looks like a [[hamster wheel]]) are frequently used as synonyms. 
These fans increase the speed of air stream with the rotating [[impeller]]s.<ref name="Fans and Blowers">{{cite book|title=Electrical Energy Equipment: Fans and Blowers|year=2006|publisher=UNEP|pages=21}}</ref> +They use the kinetic energy of the [[impeller]]s or the rotating blade to increase the pressure of the air/gas stream which in turn moves them against the resistance caused by ducts, dampers and other components. Centrifugal fans accelerate air radially, changing the direction (typically by 90<sup>o</sup>) of the airflow. They are sturdy, quiet, reliable, and capable of operating over a wide range of conditions.<ref name="ACMA sourcebook">{{cite book|last=Lawrence Berkeley National Laboratory Washington, DC Resource Dynamics Corporation Vienna, VA|title=Improving Fan System Performance|url=http://www1.eere.energy.gov/manufacturing/tech_deployment/pdfs/fan_sourcebook.pdf|accessdate=29 February 2012|page=21}}</ref> + +Centrifugal fans are constant [[Actual cubic feet per minute|CFM]] devices or constant volume devices, meaning that, at a constant fan speed, a centrifugal fan will pump a constant volume of air rather than a constant mass. This means that the air velocity in a system is fixed even though mass flow rate through the fan is not. + +The centrifugal fan is one of the most widely used fans. Centrifugal fans are by far the most prevalent type of fan used in the [[Heating, Ventilation, and Air Conditioning|HVAC]] industry today. They are usually cheaper than axial fans and simpler in construction.<ref name="estar air">{{cite web|title=Air Distribution Systems|url=http://www.energystar.gov/index.cfm?c=business.EPA_BUM_CH8_AirDistSystems#SS_8_3_1|publisher=Energy Star|accessdate=29 February 2012}}</ref><!-----Don't give line break-----> +It is used in transporting gas or materials and in ventilation system for buildings.<ref name="Singh IITM">{{cite journal|last=Singh|first=O.P.|coauthors=Rakesh Khilwani, T. Sreenivasulu, M. Kannan|title=PARAMETRIC STUDY OF CENTRIFUGAL FAN PERFORMANCE: EXPERIMENTS AND NUMERICAL SIMULATION|journal=International Journal of Advances in Engineering & Technology|date=May 2011|volume=1|issue=2|pages=18|doi=|pmid=|url=http://www.ijaet.org/media/0001/5PARAMETRIC-STUDY-OF-CENTRIFUGAL-FAN-PERFORMANCE-EXPERIMENTS-AND-NUMERICAL-SIMULATION-Copyright-IJAET.pdf|accessdate=29 February 2012|issn=2231-1963}}</ref> They are also used commonly in central heating/cooling systems. They are also well-suited for [[Industry|industrial]] processes and [[air pollution]] control systems. + +It has a [[fan (mechanical)|fan]] [[wheel]] composed of a number of fan [[blade]]s, or [[ribs]], mounted around a hub. As shown in Figure 1, the hub turns on a [[driveshaft]] that passes through the fan housing. The gas enters from the side of the fan [[wheel]], turns 90 degrees and [[accelerate]]s due to [[centrifugal force]] as it flows over the fan blades and exits the fan housing.<ref name=EPA>[http://www.epa.gov/apti/bces/module5/fans/types/types.htm#types Fan types] ([[United States Environmental Protection Agency|U.S. 
Environmental Protection Agency]] website page)</ref> + +The centrifugal fan was invented by Russian military engineer [[Alexander Sablukov]] in 1832, and found its usage both in the Russian light industry (such as sugar making) and abroad.<ref>[http://www.elcomspb.ru/wiki/eltech_history/vent_invent/ A History of Mechanical Fan] {{ru icon}}</ref> + +==Construction== +[[Image:CentrifugalFan.png|thumb|right|275px|Figure 1: Components of a centrifugal fan]] +Main parts of a centrifugal fan are : +#Fan Housing +#[[Impeller]]s +#Inlet and outlet ducts +#[[Drive Shaft]] +#Drive mechanism +Other components used may include [[bearing (mechanical)|bearing]]s, [[coupling]]s, impeller locking device, fan discharge casing, shaft seal plates etc.<ref>{{cite web|title=TECHNICAL SPECIFICATION OF CENTRIFUGAL FANS DESIGN|url=http://www.flaktwoods.com/b02672b3-dc6f-442d-8e7f-0319f77799ad|accessdate=29 February 2012}}</ref> + +===Types of drive mechanisms=== + +The fan drive determines the speed of the fan wheel (impeller) and the extent to which this speed can be varied. There are three basic types of fan drives.<ref name=EPA/> + +====Direct drive==== + +The fan wheel can be linked directly to the shaft of an [[electric motor]]. This means that the fan wheel speed is identical to the motor's [[rotation]]al speed. With this type of fan drive mechanism, the fan speed cannot be varied unless the motor speed is adjustable. Air conditioning will then automatically provide faster speed because colder air is more dense. + +Some electronics manufacturers have made centrifugal fans with external rotor motors (the stator is inside the rotor), and the rotor is directly mounted on the fan wheel (impeller). + +====Belt drive==== +A set of [[sheave]]s are mounted on the motor shaft and the fan wheel shaft, and a belt transmits the mechanical energy from the motor to the fan. + +The fan wheel speed depends upon the [[ratio]] of the diameter of the motor sheave to the diameter of the fan wheel sheave and can be obtained from this equation:<ref name=EPA/> + +Very few people and manufacturers use chain-drive fans due to their greater noise output and complex workup, but they are more durable and don't require frequent replacement. + +<math>rpm_{fan} = rpm_{motor}\,\bigg(\frac{\,D_{motor}}{D_{fan}}\bigg)</math> +{| border="0" cellpadding="2" +|- +|align=left|where: +|&nbsp; +|- +!align=right|<math>rpm_{fan}</math> +|align=left|= fan wheel speed, revolutions per minute +|- +!align=right|<math>rpm_{motor}</math> +|align=left|= motor nameplate speed, revolutions per minute +|- +!align=right|<math> D_{motor}</math> +|align=left|= diameter of the motor sheave +|- +!align=right|<math>D_{fan}</math> +|align=left|= diameter of the fan wheel sheave +|} + +Fan wheel speeds in belt-driven fans are fixed unless the belt(s) slip. Belt slippage can reduce the fan wheel speed by several hundred revolutions per minute (rpm). + +====Variable drive==== + +Variable drive fans use [[Hydraulic coupling|hydraulic]] or [[Magnetic coupling]]s (between the fan wheel shaft and the motor shaft) that allow r speed. The fan speed controls are often integrated into [[automate]]d systems to maintain the desired fan wheel speed.<ref name=EPA/> + +An alternate method of varying the fan speed is by use of an electronic variable-speed drive which controls the speed of the motor driving the fan. This offers better overall energy efficiency at reduced speeds than mechanical couplings. + +===Bearings=== + +Bearings are an important part of a fan. 
+ +Sleeve-ring oil bearings are used extensively in fans. Some sleeve-ring bearings may be water-cooled. Water-cooled sleeve bearings are used when hot gases are being moved by the fan. Heat is conducted through the shaft and into the oil which must be cooled to prevent overheating of the bearing. + +Since lower-speed fans have bearings in hard-to-reach spots, grease-packed anti-friction bearings are used. + +===Fan dampers and Vanes=== + +Fan dampers are used to control gas flow into and out of the centrifugal fan. They may be installed on the inlet side or on the outlet side of the fan, or both. Dampers on the outlet side impose a flow resistance that is used to control gas flow. Dampers on the inlet side (inlet vanes) are designed to control gas flow by changing the amount of gas or air admitted to the fan inlet. + +Inlet dampers (Inlet vanes) reduce fan energy usage due to their ability to affect the airflow pattern into the fan.<ref name=EPA/> + +===Fan ribs=== +[[Image:CentrifugalFanBlades.png|thumb|right|325px|<center>Figure 3: Centrifugal fan blades</center>]] + +The fan wheel consists of a hub on which a number of fan blades are attached. The fan blades on the hub can be arranged in three different ways: forward-curved, backward-curved or radial.<ref name=EPA/> + +====Forward-curved blade==== +Forward-curved blades, as in Figure 3(a), curve in the direction of the fan wheel's rotation. These are especially sensitive to particulates. Forward-curved blades are for high flow, low pressure applications. A characteristic of forward curved blower wheels is their weight, due to the large number of blades they require. {{Citation needed|date=July 2011}} + +====Backward-curved blades==== + +Backward-curved blades, as in Figure 3(b), curve against the direction of the fan wheel's rotation. Smaller blowers may have '''backward-inclined''' blades, which are straight, not curved. Larger backward-inclined/-curved blowers have blades whose backward curvatures mimic that of an airfoil cross section, but both designs provide good operating efficiency with relatively economical construction techniques. These types of blowers are designed to handle gas streams with low to moderate particulate loadings {{Citation needed|date=July 2011}}. They can be easily fitted with wear protection but certain blade curvatures can be prone to solids build-up.{{Citation needed|date=July 2011}}. Backward curved wheels are often lighter than corresponding forward-curved equivalents, as they don't require so many blades. + +Backward curved fans can have a high range of specific speeds but are most often used for medium specific speed applications—high pressure, medium flow applications.{{Citation needed|date=July 2011}} + +Backward-curved fans are much more energy efficient than radial blade fans and so, for high power applications may be a suitable alternative to the lower cost radial bladed fan.{{Citation needed|date=July 2011}} + +Also, some backward curved fans can also operate in reverse - the wheel rotates in the opposite direction, forcing air backward through the housing. {{citation needed|date=September 2013}} Also available are '''plug fans, '''which are centrifugal fans, most often backward curved, without scroll housings, and '''in-line''' centrifugal blowers, in which the duct for the fan contains the wheel, but in a way that allows air to exit the fan in exactly the same direction as it enters, which was previously achievable only with axial fans. 
{{citation needed|date=September 2013}} + +====Straight radial blades==== + +Radial blowers, as in Figure 3(c), have wheels whose blades extend straight out from the center of the hub. Radial bladed wheels are often used on particulate-laden gas streams because they are the least sensitive to solid build-up on the blades, but they are often characterized by greater noise output. High speeds, low volumes, and high pressures are common with radial blowers{{Citation needed|date=July 2011}}, and are often used in [[vacuum cleaner]]s, pneumatic material conveying systems, and similar processes. + +==Principle of Working== + +The centrifugal fan uses the centrifugal power generated from the rotation of impellers to increase the pressure of air/gases. When the impellers rotate, the gas near the impellers is thrown-off from the impellers due to the centrifugal force and then moves into the fan casing. As a result the gas pressure in the fan casing is increased. The gas is then guided to the exit via outlet ducts. After the gas is thrown-off, the gas pressure in the middle region of the impellers decreases. The gas from the impeller eye rushes in to normalize this pressure. This cycle repeats and therefore the gas can be continuously transferred. + +{|border=1px align=right +|+Table 1 +|colspan="3" style="background:#7da7d9; color:white;" align="center"|Differences between fans and blowers +|- +!Equipment!!Pressure Ratio!!Pressure rise (mm Hg) +|- +|Fans||Up to 1.1||1136 +|- +|Blowers||1.1 to 1.2||1136-2066 +|} + +===Velocity Triangle=== +[[Velocity]] triangle helps us in determining the flow geometry at the entry and exit of a blade. A minimum number of data are required to draw a velocity triangle at a point on blade. Some component of velocity varies at different point on the blade due to changes in the direction of flow. Hence an infinite number of velocity triangles are possible for a given blade. In order to describe the flow using only two velocity triangles we define mean values of velocity and their direction. Velocity triangle of any turbo machine has three components as shown: +[[File:Velocity Triangle for Forward Facing Blade.png|thumb|Velocity triangle for forward facing blade]] +*U - Blade velocity +*V<sub>r</sub> – Relative Velocity +*V - Absolute velocity +These velocities are related by the triangle law of vector addition: - +*V=U+V<sub>r</sub> +This relatively simple equation is used frequently while drawing the velocity diagram. The velocity diagram for the forward, backward face blades shown are drawn using this law. The angle α is the angle made by the absolute velocity with the axial direction and angle β is the angle made by blade with respect to axial direction. +[[File:Velocity Triangle Backward Facing.png|thumb|Velocity triangle for backward Facing blade]] + +===Difference between fans and blowers=== +The property that distinguishes a centrifugal fan from a blower is the pressure ratio it can achieve. A blower in general can produce higher pressure ratio. As per [[American Society of Mechanical Engineers]] (ASME) the specific ratio - the ratio of the discharge pressure over the suction pressure – is used for defining the fans and blowers (refer Table 1). + +==Centrifugal fan ratings== +Ratings found in centrifugal fan performance tables and curves are based on standard air [[SCFM]]. 
Fan manufacturers define standard air as clean, dry air with a [[density]] of 0.075 pounds mass per cubic foot (1.2&nbsp;kg/m³), with the [[barometric pressure]] at sea level of 29.92&nbsp;inches of mercury (101.325 kPa) and a [[temperature]] of 70&nbsp;°F (21&nbsp;°C). Selecting a centrifugal fan to operate at conditions other than standard air requires adjustment to both static pressure and [[Electric power|power]]. + +At higher-than-standard elevation ([[sea level]]) and higher-than-standard temperature, air density is lower than standard density. Air density corrections need to be taken into account for centrifugal fans that are specified for continuous operation at higher temperatures. The centrifugal fan will displace a constant volume of air in a given system regardless of the air density. + +When a centrifugal fan is specified for a given CFM and static pressure at conditions other than standard, an air density correction factor must be applied to select the proper size fan to meet the new condition. Since {{convert|200|°F|°C|abbr=on}} air weighs only 80% of {{convert|70|°F|°C|abbr=on}} air, the centrifugal fan will create less pressure and require less power. To get the actual pressure required at {{convert|200|°F|°C|abbr=on}}, the designer would have to multiply the pressure at standard conditions by an air density correction factor of 1.25 (i.e., 1.0/0.8) to get the system to operate correctly. To get the actual power at {{convert|200|°F|°C|abbr=on}}, the designer would have to divide the power at standard conditions by the air density correction factor. + +==Air Movement and Control Association (AMCA)== + +The centrifugal fan performance tables provide the fan RPM and power requirements for the given CFM and static pressure at standard air density. When the centrifugal fan performance is not at standard conditions, the performance must be converted to standard conditions before entering the performance tables. Centrifugal fans rated by the [[Air Movement and Control Association]] (AMCA) are tested in laboratories with test setups that simulate installations that are typical for that type of fan. Usually they are tested and rated as one of four standard installation types as designated in AMCA Standard 210.<ref>ANSI/AMCA Standard 210-99, "Laboratory Methods Of Testing Fans for Aerodynamic Performance Rating"</ref> + +AMCA Standard 210 defines uniform methods for conducting laboratory tests on housed fans to determine airflow rate, pressure, power and efficiency, at a given speed of rotation. The purpose of AMCA Standard 210 is to define exact procedures and conditions of fan testing so that ratings provided by various manufacturers are on the same basis and may be compared. For this reason, fans must be rated in SCFM. + +==Losses in Centrifugal Fan== +In centrifugal fans losses will be there in both stationary and moving parts of the centrifugal fan stage. We can get the actual performance of the centrifugal fan by taking these stage losses into account. +Various types of losses + +=== Impeller entry losses === +Due to the flow at the eye and it’s turning from axial to radial direction causes losses at the entry. Friction and separation causes impeller blade losses since there is change in incidence. Normally these impeller blade losses are also included in this head. + +=== Leakage loss === +Leakage of some air and disturbance in the main flow field is caused due to the clearance provided between the rotating periphery of the impeller and the casing at the entry. 
+ +=== Impeller losses === +Passage friction and separation causes impeller losses which are dependent on relative velocity, rate of diffusion and blade geometry. + +Impeller balancing is done by small weights on a balancing machine. All energy of vibration is lost (''i.e.'', can easily amount to %50 air-flow loss in home AC units). + +=== Diffuser and volute losses === +[[Friction]] and [[separation process|separation]] also causes losses in [[diffuser]]{{disambiguation needed|date=September 2013}}. Further losses due to incidence occur if the device is working in off-design conditions. Flow from impeller or diffuser expands in the volute which is having larger cross section leading to the formation of [[Eddy]], which in turn reduces head. Friction and flow separation losses also occur due the volute passage. + +=== Disc Friction === +Viscous [[drag (physics)|drag]] on the back surface of the impeller disc causes Disc friction. + +==See also== +{{Commons category|Centrifugal fans}} +*[[Mechanical fan]] +*[[Ducted fan]] +*[[Standard temperature and pressure]] +*[[Wind turbine]] +* [[Three dimensional losses and correlation in turbomachinery]] +* [[Waddle fan]] + +==References== +{{reflist}} + +[[Category:Chemical engineering]] +[[Category:Compressors]] +[[Category:Turbomachinery]] +[[Category:Turbines]] +[[Category:Thermodynamics]] +[[Category:Fluid dynamics]] +[[Category:Aerodynamics]] +[[Category:Fans]] +[[Category:Russian inventions]] + q00ymq73fr7i6a9y0unfuddi28p2k72 + + + + Backstepping + 0 + 21959 + + 21960 + 2014-02-01T00:37:27Z + + Bgwhite + 0 + + There must be no material between TOC and headline per [[WP:TOC]], [[WP:AWB/T|typo(s) fixed]]: , → , using [[Project:AWB|AWB]] (9890) + wikitext + text/x-wiki + In [[control theory]], backstepping is a technique developed [[circa]] 1990 by [[Petar V. Kokotovic]] and others<ref name=Kokotovic1992>{{cite journal + | last = Kokotovic + | first = P.V. + | authorlink = Petar V. Kokotovic + | year = 1992 + | title = The joy of feedback: nonlinear and adaptive + | journal = Control Systems Magazine, IEEE + | volume = 12 + | issue = 3 + | pages = 7–17 + | url = http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=165507 + | accessdate = 2008-04-13 + | doi = 10.1109/37.165507 +}}</ref><ref name=LB92>{{cite journal | first1=R.|last1=Lozano| first2=B.|last2=Brogliato | year=1992 | title=Adaptive control of robot manipulators with flexible joints | journal= IEEE Transactions on Automatic Control, | volume=37 | issue=2 | pages=174–181 | doi=10.1109/9.121619}}</ref> for designing [[Lyapunov stability|stabilizing]] controls for a special class of [[nonlinear system|nonlinear]] [[dynamical system]]s. These systems are built from subsystems that radiate out from an irreducible subsystem that can be stabilized using some other method. Because of this [[recursion|recursive]] structure, the designer can start the design process at the known-stable system and "back out" new controllers that progressively stabilize each outer subsystem. The process terminates when the final external control is reached. Hence, this process is known as ''backstepping.<ref name="Khalil">{{cite book + | last = Khalil + | first = H.K. + | authorlink = Hassan K. 
Khalil + | year = 2002 + | edition = 3rd + | url = http://www.egr.msu.edu/~khalil/NonlinearSystems/ + | isbn = 0-13-067389-7 + | title = Nonlinear Systems + | publisher = [[Prentice Hall]] + | location = Upper Saddle River, NJ}}</ref>'' + +==Backstepping approach== +The backstepping approach provides a [[recursion|recursive]] method for [[Lyapunov stability|stabilizing]] the [[origin (mathematics)|origin]] of a system in [[strict-feedback form]]. That is, consider a [[dynamical system|system]] of the form<ref name="Khalil"/> + +:<math>\begin{cases} \dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1\\ +\dot{z}_1 = f_1(\mathbf{x},z_1) + g_1(\mathbf{x},z_1) z_2\\ +\dot{z}_2 = f_2(\mathbf{x},z_1,z_2) + g_2(\mathbf{x},z_1,z_2) z_3\\ +\vdots\\ +\dot{z}_i = f_i(\mathbf{x},z_1, z_2, \ldots, z_{i-1}, z_i) + g_i(\mathbf{x},z_1, z_2, \ldots, z_{i-1}, z_i) z_{i+1} \quad \text{ for } 1 \leq i < k-1\\ +\vdots\\ +\dot{z}_{k-1} = f_{k-1}(\mathbf{x},z_1, z_2, \ldots, z_{k-1}) + g_{k-1}(\mathbf{x},z_1, z_2, \ldots, z_{k-1}) z_k\\ +\dot{z}_k = f_k(\mathbf{x},z_1, z_2, \ldots, z_{k-1}, z_k) + g_k(\mathbf{x},z_1, z_2, \dots, z_{k-1}, z_k) u\end{cases}</math> + +where +* <math>\mathbf{x} \in \mathbb{R}^n</math> with <math>n \geq 1</math>, +* <math>z_1, z_2, \ldots, z_i, \ldots, z_{k-1}, z_k</math> are [[scalar (mathematics)|scalar]]s, +* <math>u</math> is a [[scalar (mathematics)|scalar]] input to the system, +* <math>f_x, f_1, f_2, \ldots, f_i, \ldots, f_{k-1}, f_k</math> [[vanish (mathematics)|vanish]] at the [[origin (mathematics)|origin]] (i.e., <math>f_i(0,0,\dots,0) = 0</math>), +* <math>g_1, g_2, \ldots, g_i, \ldots, g_{k-1}, g_k</math> are nonzero over the domain of interest (i.e., <math>g_i(\mathbf{x},z_1,\ldots,z_k) \neq 0</math> for <math>1 \leq i \leq k</math>). + +Also assume that the subsystem +:<math>\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) u_x(\mathbf{x})</math> +is [[Lyapunov stability|stabilized]] to the [[origin (mathematics)|origin]] (i.e., <math> \mathbf{x} = \mathbf{0}\,</math>) by some '''known''' control <math>u_x(\mathbf{x})</math> such that <math>u_x(\mathbf{0}) = 0</math>. It is also assumed that a [[Lyapunov function]] <math>V_x</math> for this stable subsystem is known. That is, this <math>\mathbf{x}</math> subsystem is stabilized by some other method and backstepping extends its stability to the <math>\textbf{z}</math> shell around it. + +In systems of this ''strict-feedback form'' around a stable <math>\mathbf{x}</math> subsystem, +* The backstepping-designed control input <math>u</math> has its most immediate stabilizing impact on state <math>z_n</math>. +* The state <math>z_n</math> then acts like a stabilizing control on the state <math>z_{n-1}</math> before it. +* This process continues so that each state <math>z_i</math> is stabilized by the ''fictitious'' "control" <math>z_{i+1}</math>. +The '''backstepping''' approach determines how to stabilize the <math>\mathbf{x}</math> subsystem using <math>z_1</math>, and then proceeds with determining how to make the next state <math>z_2</math> drive <math>z_1</math> to the control required to stabilize <math>\mathbf{x}</math>. Hence, the process "steps backward" from <math>\mathbf{x}</math> out of the strict-feedback form system until the ultimate control <math>u</math> is designed. 
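+As a concrete illustration of this structure, the following minimal sketch (all dynamics are hypothetical choices made for illustration, not taken from the references) writes out a k = 2 strict-feedback system in the notation above. The input u enters only the last equation, each z<sub>i</sub> enters only the equation directly before it, and the innermost <math>\mathbf{x}</math> subsystem is assumed to be stabilized by some separately designed control u<sub>x</sub>.
+
+<syntaxhighlight lang="python">
+# Hypothetical k = 2 strict-feedback chain:
+#   x'  = f_x(x) + g_x(x) z1
+#   z1' = f_1(x, z1) + g_1(x, z1) z2
+#   z2' = f_2(x, z1, z2) + g_2(x, z1, z2) u
+def f_x(x):          return x**3            # vanishes at the origin
+def g_x(x):          return 1.0             # nonzero on the domain of interest
+def f_1(x, z1):      return x * z1          # vanishes at the origin
+def g_1(x, z1):      return 1.0
+def f_2(x, z1, z2):  return z1 + x * z2**2  # vanishes at the origin
+def g_2(x, z1, z2):  return 1.0
+
+def u_x(x):          return -x**3 - x       # assumed known control stabilizing the x subsystem
+
+def rhs(x, z1, z2, u):
+    """Right-hand side of the strict-feedback chain (scalar x for simplicity)."""
+    return (f_x(x) + g_x(x) * z1,
+            f_1(x, z1) + g_1(x, z1) * z2,
+            f_2(x, z1, z2) + g_2(x, z1, z2) * u)
+</syntaxhighlight>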
+ +==Recursive Control Design Overview== + +# It is given that the smaller (i.e., lower-order) subsystem +#::<math>\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) u_x(\mathbf{x})</math> +#:is already stabilized to the origin by some control <math>u_x(\mathbf{x})</math> where <math>u_x(\mathbf{0}) = 0</math>. That is, choice of <math>u_x</math> to stabilize this system must occur using ''some other method.'' It is also assumed that a [[Lyapunov function]] <math>V_x</math> for this stable subsystem is known. Backstepping provides a way to extend the controlled stability of this subsystem to the larger system. +# A control <math>u_1(\mathbf{x},z_1)</math> is designed so that the system +#::<math>\dot{z}_1 = f_1(\mathbf{x},z_1) + g_1(\mathbf{x},z_1) u_1(\mathbf{x},z_1)</math> +#:is stabilized so that <math>z_1</math> follows the desired <math>u_x</math> control. The control design is based on the augmented Lyapunov function candidate +#::<math>V_1(\mathbf{x},z_1) = V_x(\mathbf{x}) + \frac{1}{2}( z_1 - u_x(\mathbf{x}) )^2</math> +#:The control <math>u_1</math> can be picked to bound <math>\dot{V}_1</math> away from zero. +# A control <math>u_2(\mathbf{x},z_1,z_2)</math> is designed so that the system +#::<math>\dot{z}_2 = f_2(\mathbf{x},z_1,z_2) + g_2(\mathbf{x},z_1,z_2) u_2(\mathbf{x},z_1,z_2)</math> +#:is stabilized so that <math>z_2</math> follows the desired <math>u_1</math> control. The control design is based on the augmented Lyapunov function candidate +#::<math>V_2(\mathbf{x},z_1,z_2) = V_1(\mathbf{x},z_1) + \frac{1}{2}( z_2 - u_1(\mathbf{x},z_1) )^2</math> +#:The control <math>u_2</math> can be picked to bound <math>\dot{V}_2</math> away from zero. +# This process continues until the actual <math>u</math> is known, and +#* The ''real'' control <math>u</math> stabilizes <math>z_k</math> to ''fictitious'' control <math>u_{k-1}</math>. +#* The ''fictitious'' control <math>u_{k-1}</math> stabilizes <math>z_{k-1}</math> to ''fictitious'' control <math>u_{k-2}</math>. +#* The ''fictitious'' control <math>u_{k-2}</math> stabilizes <math>z_{k-2}</math> to ''fictitious'' control <math>u_{k-3}</math>. +#* ... +#* The ''fictitious'' control <math>u_2</math> stabilizes <math>z_2</math> to ''fictitious'' control <math>u_1</math>. +#* The ''fictitious'' control <math>u_1</math> stabilizes <math>z_1</math> to ''fictitious'' control <math>u_x</math>. +#* The ''fictitious'' control <math>u_x</math> stabilizes <math>\mathbf{x}</math> to the origin. + +This process is known as '''backstepping''' because it starts with the requirements on some internal subsystem for stability and progressively ''steps back'' out of the system, maintaining stability at each step. Because +* <math>f_i</math> vanish at the origin for <math>0 \leq i \leq k</math>, +* <math>g_i</math> are nonzero for <math>1 \leq i \leq k</math>, +* the given control <math>u_x</math> has <math>u_x(\mathbf{0}) = 0</math>, +then the resulting system has an equilibrium at the '''origin''' (i.e., where <math> \mathbf{x}=\mathbf{0}\,</math>, <math>z_1=0</math>, <math>z_2=0</math>, ..., <math>z_{k-1}=0</math>, and <math>z_k=0</math>) that is [[Lyapunov function#Globally asymptotically stable equilibrium|globally asymptotically stable]]. + +==Integrator Backstepping== + +Before describing the backstepping procedure for general [[strict-feedback form]] [[dynamical system]]s, it is convenient to discuss the approach for a smaller class of strict-feedback form systems. 
These systems connect a series of integrators to the input of a +system with a known feedback-stabilizing control law, and so the stabilizing approach is known as ''integrator backstepping.'' With a small modification, the integrator backstepping approach can be extended to handle all strict-feedback form systems. + +===Single-integrator Equilibrium=== + +Consider the [[dynamical system]] +:{| border="0", width="75%" +|- +|align="left"|<math>\begin{cases} +\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1\\ +\dot{z}_1 = u_1 +\end{cases}</math> +|align="right"|<math> (1)\,</math> +|- +|} +where <math>\mathbf{x} \in \mathbb{R}^n</math> and <math>z_1</math> is a scalar. This system is a [[cascade connection]] of an [[integrator]] with the <math>\mathbf{x}</math> subsystem (i.e., the input <math>u</math> enters an integrator, and the [[integral]] <math>z_1</math> enters the <math>\mathbf{x}</math> subsystem). + +We assume that <math>f_x(\mathbf{0})=0</math>, and so if <math>u_1=0</math>, <math> \mathbf{x} = \mathbf{0}\,</math> and <math>z_1 = 0</math>, then +:<math>\begin{cases} +\dot{\mathbf{x}} = f_x(\underbrace{\mathbf{0}}_{\mathbf{x}}) + ( g_x(\underbrace{\mathbf{0}}_{\mathbf{x}}) )(\underbrace{0}_{z_1}) = 0 + ( g_x(\mathbf{0}) )(0) = \mathbf{0} & \quad \text{ (i.e., } \mathbf{x} = \mathbf{0} \text{ is stationary)}\\ +\dot{z}_1 = \overbrace{0}^{u_1} & \quad \text{ (i.e., } z_1 = 0 \text{ is stationary)} +\end{cases}</math> +So the [[origin (mathematics)|origin]] <math>(\mathbf{x},z_1) = (\mathbf{0},0)</math> is an equilibrium (i.e., a [[stationary point]]) of the system. If the system ever reaches the origin, it will remain there forever after. + +===Single-integrator Backstepping=== + +In this example, backstepping is used to [[Lyapunov stability|stabilize]] the single-integrator system in Equation&nbsp;(1) around its equilibrium at the origin. To be less precise, we wish to design a control law <math>u_1(\mathbf{x},z_1)</math> that ensures that the states <math>(\mathbf{x}, z_1)</math> return to <math>(\mathbf{0},0)</math> after the system is started from some arbitrary initial condition. + +* First, by assumption, the subsystem + +::<math>\dot{\mathbf{x}} = F(\mathbf{x}) \qquad \text{where} \qquad F(\mathbf{x}) \triangleq f_x(\mathbf{x}) + g_x(\mathbf{x}) u_x(\mathbf{x})</math> + +:with <math>u_x(\mathbf{0}) = 0</math> has a [[Lyapunov function]] <math>V_x(\mathbf{x}) > 0</math> such that + +::<math>\dot{V}_x=\frac{\partial V_x}{\partial \mathbf{x}}(f_x(\mathbf{x})+g_x(\mathbf{x})u_x(\mathbf{x})) \leq - W(\mathbf{x})</math> + +:where <math>W(\mathbf{x})</math> is a [[positive-definite function]]. That is, we '''assume''' that we have '''already shown''' that this '''existing simpler''' <math>\mathbf{x}</math> '''subsystem''' is '''[[Lyapunov stability|stable (in the sense of Lyapunov)]].''' Roughly speaking, this notion of stability means that: +** The function <math>V_x</math> is like a "generalized energy" of the <math>\mathbf{x}</math> subsystem. As the <math>\mathbf{x}</math> states of the system move away from the origin, the energy <math>V_x(\mathbf{x})</math> also grows. +** By showing that over time, the energy <math>V_x(\mathbf{x}(t))</math> decays to zero, then the <math>\mathbf{x}</math> states must decay toward <math> \mathbf{x}=\mathbf{0}\,</math>. That is, the origin <math> \mathbf{x}=\mathbf{0}\,</math> will be a '''stable equilibrium''' of the system – the <math>\mathbf{x}</math> states will continuously approach the origin as time increases. 
+** Saying that <math>W(\mathbf{x})</math> is positive definite means that <math>W(\mathbf{x}) > 0</math> everywhere except for <math> \mathbf{x}=\mathbf{0}\,</math>, and <math>W(\mathbf{0})=0</math>. +** The statement that <math>\dot{V}_x \leq -W(\mathbf{x})</math> means that <math>\dot{V}_x</math> is bounded away from zero for all points except where <math> \mathbf{x} = \mathbf{0}\,</math>. That is, so long as the system is not at its equilibrium at the origin, its "energy" will be decreasing. +** Because the energy is always decaying, then the system must be stable; its trajectories must approach the origin. +:Our task is to find a control <math>u</math> that makes our cascaded <math>(\mathbf{x},z_1)</math> system also stable. So we must find a ''new'' Lyapunov function '''candidate''' for this new system. That candidate will depend upon the control <math>u</math>, and by choosing the control properly, we can ensure that it is decaying everywhere as well. + +* Next, by ''adding'' '''and''' ''subtracting'' <math>g_x(\mathbf{x}) u_x(\mathbf{x})</math> (i.e., we don't change the system in any way because we make no ''net'' effect) to the <math>\dot{\mathbf{x}}</math> part of the larger <math>(\mathbf{x},z_1)</math> system, it becomes + +::<math>\begin{cases}\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1 + \mathord{\underbrace{\left( g_x(\mathbf{x})u_x(\mathbf{x}) - g_x(\mathbf{x})u_x(\mathbf{x}) \right)}_{0}}\\\dot{z}_1 = u_1\end{cases}</math> + +:which we can re-group to get + +::<math>\begin{cases}\dot{x} = \mathord{\underbrace{\left( f_x(\mathbf{x}) + g_x(\mathbf{x})u_x(\mathbf{x}) \right)}_{F(\mathbf{x})}} + g_x(\mathbf{x}) \underbrace{\left( z_1 - u_x(\mathbf{x}) \right)}_{z_1 \text{ error tracking } u_x}\\\dot{z}_1 = u_1\end{cases}</math> + +:So our cascaded supersystem encapsulates the known-stable <math>\dot{\mathbf{x}} = F(\mathbf{x})</math> subsystem plus some error perturbation generated by the integrator. + +* We now can change variables from <math>(\mathbf{x}, z_1)</math> to <math>(\mathbf{x}, e_1)</math> by letting <math>e_1 \triangleq z_1 - u_x(\mathbf{x})</math>. So + +::<math>\begin{cases}\dot{\mathbf{x}} = (f_x(\mathbf{x}) + g_x(\mathbf{x}) u_x(\mathbf{x})) + +g_x(\mathbf{x}) e_1\\\dot{e}_1 = u_1 - \dot{u}_x\end{cases}</math> + +: Additionally, we let <math>v_1 \triangleq u_1 - \dot{u}_x</math> so that <math>u_1 = v_1 + \dot{u}_x</math> and + +::<math>\begin{cases}\dot{\mathbf{x}} = (f_x(\mathbf{x}) + g_x(\mathbf{x}) u_x(\mathbf{x}))+g_x(\mathbf{x}) e_1\\\dot{e}_1 = v_1\end{cases}</math> + +: We seek to stabilize this '''error system''' by feedback through the new control <math>v_1</math>. By stabilizing the system at <math>e_1 = 0</math>, the state <math>z_1</math> will track the desired control <math>u_x</math> which will result in stabilizing the inner <math>\mathbf{x}</math> subsystem. 
+ +* From our existing Lyapunov function <math>V_x</math>, we define the ''augmented'' Lyapunov function ''candidate'' + +::<math>V_1(\mathbf{x}, e_1) \triangleq V_x(\mathbf{x}) + \frac{1}{2} e_1^2</math> + +: So + +::<math>\dot{V}_1 += \dot{V}_x(\mathbf{x}) + \frac{1}{2}\left( 2 e_1 \dot{e}_1 \right) += \dot{V}_x(\mathbf{x}) + e_1 \dot{e}_1 += \dot{V}_x(\mathbf{x}) + e_1 \overbrace{v_1}^{\dot{e}_1} += \overbrace{\frac{\partial V_x}{\partial \mathbf{x}} \underbrace{\dot{\mathbf{x}}}_{\text{(i.e., }\frac{\operatorname{d}\mathbf{x}}{\operatorname{d}t}\text{)}}}^{\dot{V}_x\text{ (i.e.,} \frac{\operatorname{d}V_x}{\operatorname{d}t}\text{)}} + e_1 v_1 += \overbrace{\frac{\partial V_x}{\partial \mathbf{x}} \underbrace{\left( (f_x(\mathbf{x}) + g_x(\mathbf{x})u_x(\mathbf{x})) + g_x(\mathbf{x}) e_1 \right)}_{\dot{\mathbf{x}}}}^{\dot{V}_x} + e_1 v_1</math> + +: By distributing <math>\partial V_x/\partial \mathbf{x}</math>, we see that + +::<math>\dot{V}_1 = \overbrace{\frac{\partial V_x}{\partial \mathbf{x}}(f_x(\mathbf{x}) + g_x(\mathbf{x}) u_x(\mathbf{x}))}^{{} \leq -W(\mathbf{x})} + \frac{\partial V_x}{\partial \mathbf{x}}g_x(\mathbf{x}) e_1 + e_1 v_1 \leq -W(\mathbf{x})+ \frac{\partial V_x}{\partial \mathbf{x}} g_x(\mathbf{x}) e_1 + e_1 v_1</math> + +: To ensure that <math>\dot{V}_1 \leq -W(\mathbf{x}) < 0</math> (i.e., to ensure stability of the supersystem), we '''pick''' the control law + +::<math>v_1 = -\frac{\partial V_x}{\partial \mathbf{x}}g_x(\mathbf{x})- k_1 e_1</math> + +: with <math>k_1 > 0</math>, and so + +::<math>\dot{V}_1 += -W(\mathbf{x}) + \frac{\partial V_x}{\partial \mathbf{x}} g_x(\mathbf{x}) e_1 + e_1\overbrace{\left( -\frac{\partial V_x}{\partial \mathbf{x}}g_x(\mathbf{x})-k_1 e_1 \right)}^{v_1}</math> + +: After distributing the <math>e_1</math> through, + +::<math>\dot{V}_1 += +-W(\mathbf{x}) + \mathord{\overbrace{\frac{\partial V_x}{\partial \mathbf{x}} g_x(\mathbf{x}) e_1 +- e_1 \frac{\partial V_x}{\partial \mathbf{x}}g_x(\mathbf{x})}^{0}} - k_1 e_1^2 += -W(\mathbf{x})-k_1 e_1^2 \leq -W(\mathbf{x}) +< 0</math> + +: So our ''candidate'' Lyapunov function <math>V_1</math> '''is''' a true [[Lyapunov function]], and our system is '''stable''' under this control law <math>v_1</math> (which corresponds the control law <math>u_1</math> because <math>v_1 \triangleq u_1 - \dot{u}_x</math>). Using the variables from the original coordinate system, the equivalent Lyapunov function +::{| border="0", width="75%" +|- +|align="left"|<math>V_1(\mathbf{x}, z_1) \triangleq V_x(\mathbf{x}) + \frac{1}{2} ( z_1 - u_x(\mathbf{x}) )^2</math> +|align="right"|<math> (2)\,</math> +|- +|} +: As discussed below, this Lyapunov function will be used again when this procedure is applied iteratively to multiple-integrator problem. + +* Our choice of control <math>v_1</math> ultimately depends on all of our original state variables. 
In particular, the actual feedback-stabilizing control law
+::{| border="0", width="75%"
+|-
+|align="left"|<math>\underbrace{u_1(\mathbf{x},z_1)=v_1+\dot{u}_x}_{\text{By definition of }v_1}=\overbrace{-\frac{\partial V_x}{\partial \mathbf{x}}g_x(\mathbf{x})-k_1(\underbrace{z_1-u_x(\mathbf{x})}_{e_1})}^{v_1} \, + \, \overbrace{\frac{\partial u_x}{\partial \mathbf{x}}(\underbrace{f_x(\mathbf{x})+g_x(\mathbf{x})z_1}_{\dot{\mathbf{x}} \text{ (i.e., } \frac{\operatorname{d}\mathbf{x}}{\operatorname{d}t} \text{)}})}^{\dot{u}_x \text{ (i.e., } \frac{ \operatorname{d}u_x }{\operatorname{d}t} \text{)}}</math>
+|align="right"|<math> (3)\,</math>
+|-
+|}
+: The states <math>\mathbf{x}</math> and <math>z_1</math> and functions <math>f_x</math> and <math>g_x</math> come from the system. The function <math>u_x</math> comes from our known-stable <math>\dot{\mathbf{x}}=F(\mathbf{x})</math> subsystem. The '''gain''' parameter <math>k_1 > 0</math> affects the convergence rate of our system. Under this control law, our system is [[Lyapunov stability|stable]] at the origin <math>(\mathbf{x},z_1)=(\mathbf{0},0)</math>.
+
+: Recall that <math>u_1</math> in Equation&nbsp;(3) drives the input of an integrator that is connected to a subsystem that is feedback-stabilized by the control law <math>u_x</math>. Not surprisingly, the control <math>u_1</math> has a <math>\dot{u}_x</math> term that will be integrated to follow the stabilizing control law <math>u_x</math> plus some offset. The other terms provide damping to remove that offset and any other perturbation effects that would be magnified by the integrator.
+
+So because this system is feedback stabilized by <math>u_1(\mathbf{x}, z_1)</math> and has Lyapunov function <math>V_1(\mathbf{x},z_1)</math> with <math>\dot{V}_1(\mathbf{x}, z_1) \leq -W(\mathbf{x}) < 0</math>, it can be used as the upper subsystem in another single-integrator cascade system.
+
+===Motivating Example: Two-integrator Backstepping===
+Before discussing the recursive procedure for the general multiple-integrator case, it is instructive to study the recursion present in the two-integrator case. That is, consider the [[dynamical system]]
+:{| border="0", width="75%"
+|-
+|align="left"|<math>\begin{cases}
+\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1\\
+\dot{z}_1 = z_2\\
+\dot{z}_2 = u_2
+\end{cases}</math>
+|align="right"|<math> (4)\,</math>
+|-
+|}
+where <math>\mathbf{x} \in \mathbb{R}^n</math> and <math>z_1</math> and <math>z_2</math> are scalars. This system is a cascade connection of the single-integrator system in Equation&nbsp;(1) with another integrator (i.e., the input <math>u_2</math> enters through an integrator, and the output of that integrator enters the system in Equation&nbsp;(1) by its <math>u_1</math> input).
+
+By letting
+* <math>\mathbf{y} \triangleq \begin{bmatrix} \mathbf{x} \\ z_1 \end{bmatrix}\,</math>,
+* <math>f_y(\mathbf{y}) \triangleq \begin{bmatrix} f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1 \\ 0 \end{bmatrix}\,</math>,
+* <math>g_y(\mathbf{y}) \triangleq \begin{bmatrix} \mathbf{0}\\ 1 \end{bmatrix},\,</math>
+then the two-integrator system in Equation&nbsp;(4) becomes the single-integrator system
+:{| border="0", width="75%"
+|-
+|align="left"|<math>\begin{cases}
+\dot{\mathbf{y}} = f_y(\mathbf{y}) + g_y(\mathbf{y}) z_2 &\quad \text{( where this } \mathbf{y} \text{ subsystem is stabilized by } z_2 = u_1(\mathbf{x},z_1) \text{ )}\\
+\dot{z}_2 = u_2. 
+\end{cases}</math> +|align="right"|<math> (5)\,</math> +|- +|} +By the single-integrator procedure, the control law <math>u_y(\mathbf{y}) \triangleq u_1(\mathbf{x},z_1)</math> stabilizes the upper <math>z_2</math>-to-<math>\mathbf{y}</math> subsystem using the Lyapunov function <math>V_1(\mathbf{x},z_1)</math>, and so Equation&nbsp;(5) is a new single-integrator system that is structurally equivalent to the single-integrator system in Equation&nbsp;(1). So a stabilizing control <math>u_2</math> can be found using the same single-integrator procedure that was used to find <math>u_1</math>. + +===Many-integrator backstepping=== + +In the two-integrator case, the upper single-integrator subsystem was stabilized yielding a new single-integrator system that can be similarly stabilized. This recursive procedure can be extended to handle any finite number of integrators. This claim can be formally proved with [[mathematical induction]]. Here, a stabilized multiple-integrator system is built up from subsystems of already-stabilized multiple-integrator subsystems. + +* First, consider the [[dynamical system]] +::<math>\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) u_x</math> +:that has scalar input <math>u_x</math> and output states <math>\mathbf{x} = [x_1, x_2, \ldots, x_n]^{\text{T}} \in \mathbb{R}^n</math>. Assume that +**<math>f_x(\mathbf{x}) = \mathbf{0}</math> so that the zero-input (i.e., <math>u_x = 0</math>) system is [[stationary point|stationary]] at the origin <math> \mathbf{x} = \mathbf{0}\,</math>. In this case, the origin is called an ''equilibrium'' of the system. +**The feedback control law <math>u_x(\mathbf{x})</math> stabilizes the system at the equilibrium at the origin. +**A [[Lyapunov function]] corresponding to this system is described by <math>V_x(\mathbf{x})</math>. +:That is, if output states <math>\mathbf{x}</math> are fed back to the input <math>u_x</math> by the control law <math>u_x(\mathbf{x})</math>, then the output states (and the Lyapunov function) return to the origin after a single perturbation (e.g., after a nonzero initial condition or a sharp disturbance). This subsystem is '''stabilized''' by feedback control law <math>u_x</math>. + +* Next, connect an [[integrator]] to input <math>u_x</math> so that the augmented system has input <math>u_1</math> (to the integrator) and output states <math>\mathbf{x}</math>. The resulting augmented dynamical system is +::<math>\begin{cases} +\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1\\ +\dot{z}_1 = u_1 +\end{cases}</math> +:This "cascade" system matches the form in Equation&nbsp;(1), and so the single-integrator backstepping procedure leads to the stabilizing control law in Equation&nbsp;(3). That is, if we feed back states <math>z_1</math> and <math>\mathbf{x}</math> to input <math>u_1</math> according to the control law +::<math>u_1(\mathbf{x},z_1)=-\frac{\partial V_x}{\partial \mathbf{x}}g_x(\mathbf{x})-k_1(z_1-u_x(\mathbf{x})) + \frac{\partial u_x}{\partial \mathbf{x}}(f_x(\mathbf{x})+g_x(\mathbf{x})z_1)</math> +: with gain <math>k_1 > 0</math>, then the states <math>z_1</math> and <math>\mathbf{x}</math> will return to <math>z_1 = 0</math> and <math> \mathbf{x}=\mathbf{0}\,</math> after a single perturbation. 
This subsystem is '''stabilized''' by feedback control law <math>u_1</math>, and the corresponding Lyapunov function from Equation&nbsp;(2) is +::<math>V_1(\mathbf{x},z_1) = V_x(\mathbf{x}) + \frac{1}{2}( z_1 - u_x(\mathbf{x}) )^2</math> +:That is, under feedback control law <math>u_1</math>, the Lyapunov function <math>V_1</math> decays to zero as the states return to the origin. + +* Connect a new integrator to input <math>u_1</math> so that the augmented system has input <math>u_2</math> and output states <math>\mathbf{x}</math>. The resulting augmented dynamical system is +::<math>\begin{cases} +\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1\\ +\dot{z}_1 = z_2\\ +\dot{z}_2 = u_2 +\end{cases}</math> +:which is equivalent to the ''single''-integrator system +::<math>\begin{cases} +\overbrace{ \begin{bmatrix} \dot{\mathbf{x}}\\ \dot{z}_1 \end{bmatrix} }^{\triangleq \, \dot{\mathbf{x}}_1} += +\overbrace{ \begin{bmatrix} f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1 \\ 0 \end{bmatrix} }^{\triangleq \, f_1(\mathbf{x}_1)} ++ +\overbrace{ \begin{bmatrix} \mathbf{0}\\ 1\end{bmatrix} }^{\triangleq \, g_1(\mathbf{x}_1)} z_2 &\qquad \text{ ( by Lyapunov function } V_1, \text{ subsystem stabilized by } u_1(\textbf{x}_1) \text{ )}\\ +\dot{z}_2 = u_2 +\end{cases}</math> +:Using these definitions of <math>\mathbf{x}_1</math>, <math>f_1</math>, and <math>g_1</math>, this system can also be expressed as +::<math>\begin{cases} +\dot{\mathbf{x}}_1 = f_1(\mathbf{x}_1) + g_1(\mathbf{x}_1) z_2 &\qquad \text{ ( by Lyapunov function } V_1, \text{ subsystem stabilized by } u_1(\textbf{x}_1) \text{ )}\\ +\dot{z}_2 = u_2 +\end{cases}</math> +:This system matches the single-integrator structure of Equation&nbsp;(1), and so the single-integrator backstepping procedure can be applied again. That is, if we feed back states <math>z_1</math>, <math>z_2</math>, and <math>\mathbf{x}</math> to input <math>u_2</math> according to the control law +::<math>u_2(\mathbf{x},z_1,z_2)=-\frac{\partial V_1}{\partial \mathbf{x}_1 } g_1(\mathbf{x}_1)-k_2(z_2-u_1(\mathbf{x}_1)) + \frac{\partial u_1}{\partial \mathbf{x}_1}(f_1(\mathbf{x}_1)+g_1(\mathbf{x}_1)z_2)</math> +:with gain <math>k_2 > 0</math>, then the states <math>z_1</math>, <math>z_2</math>, and <math>\mathbf{x}</math> will return to <math>z_1 = 0</math>, <math>z_2 = 0</math>, and <math> \mathbf{x}=\mathbf{0}\,</math> after a single perturbation. This subsystem is '''stabilized''' by feedback control law <math>u_2</math>, and the corresponding Lyapunov function is +::<math>V_2(\mathbf{x},z_1,z_2) = V_1(\mathbf{x}_1) + \frac{1}{2}( z_2 - u_1(\mathbf{x}_1) )^2</math> +:That is, under feedback control law <math>u_2</math>, the Lyapunov function <math>V_2</math> decays to zero as the states return to the origin. + +* Connect an integrator to input <math>u_2</math> so that the augmented system has input <math>u_3</math> and output states <math>\mathbf{x}</math>. 
The resulting augmented dynamical system is
+::<math>\begin{cases}
+\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1\\
+\dot{z}_1 = z_2\\
+\dot{z}_2 = z_3\\
+\dot{z}_3 = u_3
+\end{cases}</math>
+:which can be re-grouped as the ''single''-integrator system
+::<math>\begin{cases}
+\overbrace{ \begin{bmatrix} \dot{\mathbf{x}}\\ \dot{z}_1\\ \dot{z}_2 \end{bmatrix} }^{\triangleq \, \dot{\mathbf{x}}_2}
+=
+\overbrace{ \begin{bmatrix} f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1 \\ z_2 \\ 0\end{bmatrix} }^{\triangleq \, f_2(\mathbf{x}_2)}
++
+\overbrace{ \begin{bmatrix} \mathbf{0}\\ 0\\ 1\end{bmatrix} }^{\triangleq \, g_2(\mathbf{x}_2)} z_3 &\qquad \text{ ( by Lyapunov function } V_2, \text{ subsystem stabilized by } u_2(\textbf{x}_2) \text{ )}\\
+\dot{z}_3 = u_3
+\end{cases}</math>
+:By the definitions of <math>\mathbf{x}_1</math>, <math>f_1</math>, and <math>g_1</math> from the previous step, this system is also represented by
+::<math>\begin{cases}
+\overbrace{ \begin{bmatrix} \dot{\mathbf{x}}_1\\ \dot{z}_2 \end{bmatrix} }^{\dot{\mathbf{x}}_2}
+=
+\overbrace{ \begin{bmatrix} f_1(\mathbf{x}_1) + g_1(\mathbf{x}_1) z_2 \\ 0\end{bmatrix} }^{f_2(\mathbf{x}_2)}
++
+\overbrace{ \begin{bmatrix} \mathbf{0}\\ 1\end{bmatrix} }^{g_2(\mathbf{x}_2)} z_3 &\qquad \text{ ( by Lyapunov function } V_2, \text{ subsystem stabilized by } u_2(\textbf{x}_2) \text{ )}\\
+\dot{z}_3 = u_3
+\end{cases}</math>
+:Further, using these definitions of <math>\mathbf{x}_2</math>, <math>f_2</math>, and <math>g_2</math>, this system can also be expressed as
+::<math>\begin{cases}
+\dot{\mathbf{x}}_2 = f_2(\mathbf{x}_2) + g_2(\mathbf{x}_2) z_3 &\qquad \text{ ( by Lyapunov function } V_2, \text{ subsystem stabilized by } u_2(\textbf{x}_2) \text{ )}\\
+\dot{z}_3 = u_3
+\end{cases}</math>
+:So the re-grouped system has the single-integrator structure of Equation&nbsp;(1), and so the single-integrator backstepping procedure can be applied again. That is, if we feed back states <math>z_1</math>, <math>z_2</math>, <math>z_3</math>, and <math>\mathbf{x}</math> to input <math>u_3</math> according to the control law
+::<math>u_3(\mathbf{x},z_1,z_2,z_3)=-\frac{\partial V_2}{\partial \mathbf{x}_2 } g_2(\mathbf{x}_2)-k_3(z_3-u_2(\mathbf{x}_2)) + \frac{\partial u_2}{\partial \mathbf{x}_2}(f_2(\mathbf{x}_2)+g_2(\mathbf{x}_2)z_3)</math>
+:with gain <math>k_3 > 0</math>, then the states <math>z_1</math>, <math>z_2</math>, <math>z_3</math>, and <math>\mathbf{x}</math> will return to <math>z_1 = 0</math>, <math>z_2 = 0</math>, <math>z_3 = 0</math>, and <math> \mathbf{x}=\mathbf{0}\,</math> after a single perturbation. This subsystem is '''stabilized''' by feedback control law <math>u_3</math>, and the corresponding Lyapunov function is
+::<math>V_3(\mathbf{x},z_1,z_2,z_3) = V_2(\mathbf{x}_2) + \frac{1}{2}( z_3 - u_2(\mathbf{x}_2) )^2</math>
+:That is, under feedback control law <math>u_3</math>, the Lyapunov function <math>V_3</math> decays to zero as the states return to the origin. 
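+
+: The recursion described above can be carried out mechanically. The following sketch is illustrative only: it assumes the toy subsystem <math>\dot{x} = x^3 + z_1</math> with <math>u_x(x) = -x^3 - x</math> and <math>V_x = x^2/2</math> (not part of the article), and builds <math>u_1</math> and <math>u_2</math> symbolically from Equations&nbsp;(2) and&nbsp;(3).
+
+<syntaxhighlight lang="python">
+import sympy as sp
+
+x, z1, z2 = sp.symbols('x z1 z2', real=True)
+k1, k2 = sp.Integer(2), sp.Integer(2)   # gains k_i > 0 (arbitrary choice)
+
+# Toy subsystem (assumption, not from the article): xdot = x**3 + z1,
+# stabilized by u_x = -x**3 - x with Lyapunov function V_x = x**2 / 2.
+f_x, g_x = x**3, sp.Integer(1)
+u_x = -x**3 - x
+V_x = x**2 / 2
+
+# First step, Equation (3): control for the (x, z1) single-integrator system.
+u1 = (-sp.diff(V_x, x) * g_x
+      - k1 * (z1 - u_x)
+      + sp.diff(u_x, x) * (f_x + g_x * z1))
+
+# Equation (2): Lyapunov function of the stabilized (x, z1) subsystem.
+V1 = V_x + sp.Rational(1, 2) * (z1 - u_x)**2
+
+# Second step: treat x_1 = (x, z1) as the new upper state with input z2.
+# Here f_1 = [f_x + g_x*z1, 0] and g_1 = [0, 1], so dV1/dx_1 * g_1 = dV1/dz1.
+u2 = (-sp.diff(V1, z1)
+      - k2 * (z2 - u1)
+      + sp.diff(u1, x) * (f_x + g_x * z1) + sp.diff(u1, z1) * z2)
+
+print(sp.simplify(u1))
+print(sp.simplify(u2))
+</syntaxhighlight>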
+ +* This process can continue for each integrator added to the system, and hence any system of the form +::<math>\begin{cases} +\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1 &\qquad \text{ ( by Lyapunov function } V_x, \text{ subsystem stabilized by } u_x(\textbf{x}) \text{ )}\\ +\dot{z}_1 = z_2\\ +\dot{z}_2 = z_3\\ +\vdots\\ +\dot{z}_i = z_{i+1}\\ +\vdots\\ +\dot{z}_{k-2} = z_{k-1}\\ +\dot{z}_{k-1} = z_k\\ +\dot{z}_k = u +\end{cases}</math> +:has the recursive structure +::<math>\begin{cases} +\begin{cases} +\begin{cases} +\begin{cases} +\begin{cases} +\begin{cases} +\begin{cases} +\begin{cases} +\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1 &\qquad \text{ ( by Lyapunov function } V_x, \text{ subsystem stabilized by } u_x(\textbf{x}) \text{ )}\\ +\dot{z}_1 = z_2 +\end{cases}\\ +\dot{z}_2 = z_3 +\end{cases}\\ +\vdots +\end{cases}\\ +\dot{z}_i = z_{i+1} +\end{cases}\\ +\vdots +\end{cases}\\ +\dot{z}_{k-2} = z_{k-1} +\end{cases}\\ +\dot{z}_{k-1} = z_k +\end{cases}\\ +\dot{z}_k = u +\end{cases}</math> +:and can be feedback stabilized by finding the feedback-stabilizing control and Lyapunov function for the single-integrator <math>(\mathbf{x},z_1)</math> subsystem (i.e., with input <math>z_2</math> and output <math>\mathbf{x}</math>) and iterating out from that inner subsystem until the ultimate feedback-stabilizing control <math>u</math> is known. At iteration <math>i</math>, the equivalent system is +::<math>\begin{cases} +\overbrace{ \begin{bmatrix} \dot{\mathbf{x}}\\ \dot{z}_1\\ \dot{z}_2 \\ \vdots \\ \dot{z}_{i-2} \\ \dot{z}_{i-1} \end{bmatrix} }^{\triangleq \, \dot{\mathbf{x}}_{i-1}} += +\overbrace{ \begin{bmatrix} f_{i-2}(\mathbf{x}_{i-2}) + g_{i-2}(\mathbf{x}_{i-1}) z_{i-2} \\ 0 \end{bmatrix} }^{\triangleq \, f_{i-1}(\mathbf{x}_{i-1})} ++ +\overbrace{ \begin{bmatrix} \mathbf{0}\\ 1\end{bmatrix} }^{\triangleq \, g_{i-1}(\mathbf{x}_{i-1})} z_i &\quad \text{ ( by Lyap. func. } V_{i-1}, \text{ subsystem stabilized by } u_{i-1}(\textbf{x}_{i-1}) \text{ )}\\ +\dot{z}_i = u_i +\end{cases}</math> +:The corresponding feedback-stabilizing control law is +::<math>u_i(\overbrace{\mathbf{x},z_1,z_2,\dots,z_i}^{\triangleq \, \mathbf{x}_i})=-\frac{\partial V_{i-1}}{\partial \mathbf{x}_{i-1} } g_{i-1}(\mathbf{x}_{i-1}) \, - \, k_i(z_i \, - \, u_{i-1}(\mathbf{x}_{i-1})) \, + \, \frac{\partial u_{i-1}}{\partial \mathbf{x}_{i-1}}(f_{i-1}(\mathbf{x}_{i-1}) \, + \, g_{i-1}(\mathbf{x}_{i-1})z_i)</math> +:with gain <math>k_i > 0</math>. The corresponding Lyapunov function is +::<math>V_i(\mathbf{x}_i) = V_{i-1}(\mathbf{x}_{i-1}) + \frac{1}{2}( z_i - u_{i-1}(\mathbf{x}_{i-1}) )^2</math> +:By this construction, the ultimate control <math>u(\mathbf{x},z_1,z_2,\ldots,z_k) = u_k(\mathbf{x}_k)</math> (i.e., ultimate control is found at final iteration <math>i=k</math>). +Hence, any system in this special many-integrator strict-feedback form can be feedback stabilized using a straightforward procedure that can even be automated (e.g., as part of an [[adaptive control]] algorithm). + +==Generic Backstepping== + +Systems in the special [[strict-feedback form]] have a recursive structure similar to the many-integrator system structure. Likewise, they are stabilized by stabilizing the smallest cascaded system and then ''backstepping'' to the next cascaded system and repeating the procedure. So it is critical to develop a single-step procedure; that procedure can be recursively applied to cover the many-step case. 
Fortunately, due to the requirements on the functions in the strict-feedback form, each single-step system can be rendered by feedback to a single-integrator system, and that single-integrator system can be stabilized using methods discussed above. + +===Single-step Procedure=== + +Consider the simple [[strict-feedback form|strict-feedback]] [[dynamical system|system]] +:{| border="0", width="75%" +|- +|align="left"|<math>\begin{cases} +\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1\\ +\dot{z}_1 = f_1(\mathbf{x}, z_1) + g_1(\mathbf{x}, z_1) u_1 +\end{cases}</math> +|align="right"|<math> (6)\,</math> +|- +|} +where +* <math>\mathbf{x} = [x_1, x_2, \ldots, x_n]^{\text{T}} \in \mathbb{R}^n</math>, +* <math>z_1</math> and <math>u_1</math> are [[scalar (mathematics)|scalar]]s, +* For all <math>\mathbf{x}</math> and <math>z_1</math>, <math>g_1(\mathbf{x},z_1) \neq 0</math>. +Rather than designing feedback-stabilizing control <math>u_1</math> directly, introduce a new control <math>u_{a1}</math> (to be designed ''later'') and use control law +:<math>u_1( \mathbf{x}, z_1 ) += +\frac{ 1 }{ g_1( \mathbf{x}, z_1 ) } +\left( u_{a1} - f_1(\mathbf{x},z_1) \right)</math> +which is possible because <math>g_1 \neq 0</math>. So the system in Equation&nbsp;(6) is +:<math>\begin{cases} +\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1\\ +\dot{z}_1 = f_1(\mathbf{x}, z_1) + g_1(\mathbf{x}, z_1) \overbrace{\frac{ 1 }{ g_1( \mathbf{x}, z_1 ) } +\left( u_{a1} - f_1(\mathbf{x},z_1) \right)}^{u_1(\mathbf{x}, z_1)} +\end{cases}</math> +which simplifies to +:<math>\begin{cases} +\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1\\ +\dot{z}_1 = u_{a1} +\end{cases}</math> +This new <math>u_{a1}</math>-to-<math>\mathbf{x}</math> system matches the ''single-integrator cascade system'' in Equation&nbsp;(1). Assuming that a feedback-stabilizing control law <math>u_x(\mathbf{x})</math> and [[Lyapunov function]] <math>V_x(\mathbf{x})</math> for the upper subsystem is known, the feedback-stabilizing control law from Equation&nbsp;(3) is +:<math>u_{a1}(\mathbf{x},z_1)=-\frac{\partial V_x}{\partial \mathbf{x}}g_x(\mathbf{x})-k_1(z_1-u_x(\mathbf{x})) + \frac{\partial u_x}{\partial \mathbf{x}}(f_x(\mathbf{x})+g_x(\mathbf{x})z_1)</math> +with gain <math>k_1 > 0</math>. So the final feedback-stabilizing control law is +:{| border="0", width="75%" +|- +|align="left"|<math>u_1(\mathbf{x},z_1) = \frac{1}{ g_1(\mathbf{x},z_1) } \left( \overbrace{-\frac{\partial V_x}{\partial \mathbf{x}}g_x(\mathbf{x})-k_1(z_1-u_x(\mathbf{x})) + \frac{\partial u_x}{\partial \mathbf{x}}(f_x(\mathbf{x})+g_x(\mathbf{x})z_1)}^{u_{a1}(\mathbf{x},z_1)} \, - \, f_1(\mathbf{x}, z_1) \right)</math> +|<math> (7)\,</math> +|- +|} +with gain <math>k_1 > 0</math>. The corresponding Lyapunov function from Equation&nbsp;(2) is +:{| border="0", width="75%" +|- +|align="left"|<math>V_1(\mathbf{x},z_1) = V_x(\mathbf{x}) + \frac{1}{2} ( z_1 - u_x(\mathbf{x}) )^2</math> +|<math> (8)\,</math> +|- +|} +Because this ''strict-feedback system'' has a feedback-stabilizing control and a corresponding Lyapunov function, it can be cascaded as part of a larger strict-feedback system, and this procedure can be repeated to find the surrounding feedback-stabilizing control. + +===Many-step Procedure=== + +As in many-integrator backstepping, the single-step procedure can be completed iteratively to stabilize an entire strict-feedback system. In each step, +# The smallest "unstabilized" single-step strict-feedback system is isolated. 
+# Feedback is used to convert the system into a single-integrator system. +# The resulting single-integrator system is stabilized. +# The stabilized system is used as the upper system in the next step. +That is, any ''strict-feedback system'' +:<math>\begin{cases} +\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1 &\qquad \text{ ( by Lyapunov function } V_x, \text{ subsystem stabilized by } u_x(\textbf{x}) \text{ )}\\ +\dot{z}_1 = f_1( \mathbf{x}, z_1 ) + g_1( \mathbf{x}, z_1 ) z_2\\ +\dot{z}_2 = f_2( \mathbf{x}, z_1, z_2 ) + g_2( \mathbf{x}, z_1, z_2 ) z_3\\ +\vdots\\ +\dot{z}_i = f_i( \mathbf{x}, z_1, z_2, \ldots, z_i ) + g_i( \mathbf{x}, z_1, z_2, \ldots, z_i ) z_{i+1}\\ +\vdots\\ +\dot{z}_{k-2} = f_{k-2}( \mathbf{x}, z_1, z_2, \ldots z_{k-2} ) + g_{k-2}( \mathbf{x}, z_1, z_2, \ldots, z_{k-2} ) z_{k-1}\\ +\dot{z}_{k-1} = f_{k-1}( \mathbf{x}, z_1, z_2, \ldots z_{k-2}, z_{k-1} ) + g_{k-1}( \mathbf{x}, z_1, z_2, \ldots, z_{k-2}, z_{k-1} ) z_k\\ +\dot{z}_k = f_k( \mathbf{x}, z_1, z_2, \ldots z_{k-1}, z_k ) + g_k( \mathbf{x}, z_1, z_2, \ldots, z_{k-1}, z_k ) u +\end{cases}</math> +has the recursive structure +:<math>\begin{cases} +\begin{cases} +\begin{cases} +\begin{cases} +\begin{cases} +\begin{cases} +\begin{cases} +\begin{cases} +\dot{\mathbf{x}} = f_x(\mathbf{x}) + g_x(\mathbf{x}) z_1 &\qquad \text{ ( by Lyapunov function } V_x, \text{ subsystem stabilized by } u_x(\textbf{x}) \text{ )}\\ +\dot{z}_1 = f_1( \mathbf{x}, z_1 ) + g_1( \mathbf{x}, z_1 ) z_2 +\end{cases}\\ +\dot{z}_2 = f_2( \mathbf{x}, z_1, z_2 ) + g_2( \mathbf{x}, z_1, z_2 ) z_3 +\end{cases}\\ +\vdots\\ +\end{cases}\\ +\dot{z}_i = f_i( \mathbf{x}, z_1, z_2, \ldots, z_i ) + g_i( \mathbf{x}, z_1, z_2, \ldots, z_i ) z_{i+1} +\end{cases}\\ +\vdots +\end{cases}\\ +\dot{z}_{k-2} = f_{k-2}( \mathbf{x}, z_1, z_2, \ldots z_{k-2} ) + g_{k-2}( \mathbf{x}, z_1, z_2, \ldots, z_{k-2} ) z_{k-1} +\end{cases}\\ +\dot{z}_{k-1} = f_{k-1}( \mathbf{x}, z_1, z_2, \ldots z_{k-2}, z_{k-1} ) + g_{k-1}( \mathbf{x}, z_1, z_2, \ldots, z_{k-2}, z_{k-1} ) z_k +\end{cases}\\ +\dot{z}_k = f_k( \mathbf{x}, z_1, z_2, \ldots z_{k-1}, z_k ) + g_k( \mathbf{x}, z_1, z_2, \ldots, z_{k-1}, z_k ) u +\end{cases}</math> +and can be feedback stabilized by finding the feedback-stabilizing control and Lyapunov function for the single-integrator <math>(\mathbf{x},z_1)</math> subsystem (i.e., with input <math>z_2</math> and output <math>\mathbf{x}</math>) and iterating out from that inner subsystem until the ultimate feedback-stabilizing control <math>u</math> is known. At iteration <math>i</math>, the equivalent system is +:<math>\begin{cases} +\overbrace{ \begin{bmatrix} \dot{\mathbf{x}}\\ \dot{z}_1\\ \dot{z}_2 \\ \vdots \\ \dot{z}_{i-2} \\ \dot{z}_{i-1} \end{bmatrix} }^{\triangleq \, \dot{\mathbf{x}}_{i-1}} += +\overbrace{ \begin{bmatrix} f_{i-2}(\mathbf{x}_{i-2}) + g_{i-2}(\mathbf{x}_{i-2}) z_{i-2} \\ f_{i-1}(\mathbf{x}_i) \end{bmatrix} }^{\triangleq \, f_{i-1}(\mathbf{x}_{i-1})} ++ +\overbrace{ \begin{bmatrix} \mathbf{0}\\ g_{i-1}(\mathbf{x}_i)\end{bmatrix} }^{\triangleq \, g_{i-1}(\mathbf{x}_{i-1})} z_i &\quad \text{ ( by Lyap. func. 
} V_{i-1}, \text{ subsystem stabilized by } u_{i-1}(\textbf{x}_{i-1}) \text{ )}\\ +\dot{z}_i = f_i(\mathbf{x}_i) + g_i(\mathbf{x}_i) u_i +\end{cases}</math> +By Equation&nbsp;(7), the corresponding feedback-stabilizing control law is +:<math>u_i(\overbrace{\mathbf{x},z_1,z_2,\dots,z_i}^{\triangleq \, \mathbf{x}_i}) += +\frac{1}{g_i(\mathbf{x}_i)} +\left( \overbrace{-\frac{\partial V_{i-1}}{\partial \mathbf{x}_{i-1} } +g_{i-1}(\mathbf{x}_{i-1}) +\, - \, +k_i\left( z_i \, - \, u_{i-1}(\mathbf{x}_{i-1}) \right) +\, + \, +\frac{\partial u_{i-1}}{\partial \mathbf{x}_{i-1}}(f_{i-1}(\mathbf{x}_{i-1}) +\, + \, +g_{i-1}(\mathbf{x}_{i-1})z_i) }^{\text{Single-integrator stabilizing control } u_{a\;\!i}(\mathbf{x}_i)} +\, - \, +f_i( \mathbf{x}_{i-1} ) +\right)</math> +with gain <math>k_i > 0</math>. By Equation&nbsp;(8), the corresponding Lyapunov function is +:<math>V_i(\mathbf{x}_i) = V_{i-1}(\mathbf{x}_{i-1}) + \frac{1}{2} ( z_i - u_{i-1}(\mathbf{x}_{i-1}) )^2</math> +By this construction, the ultimate control <math>u(\mathbf{x},z_1,z_2,\ldots,z_k) = u_k(\mathbf{x}_k)</math> (i.e., ultimate control is found at final iteration <math>i=k</math>). +Hence, any strict-feedback system can be feedback stabilized using a straightforward procedure that can even be automated (e.g., as part of an [[adaptive control]] algorithm). + +==See also== +* [[Nonlinear control]] +* [[Strict-feedback form]] +* [[Robust control]] +* [[Adaptive control]] + +==References== +{{Reflist}} + +[[Category:Control theory]] +[[Category:Nonlinear control]] + kxu47zcvzt90bjb88afhx2h1x7q691u + + + + Stochastic volatility + 0 + 14317 + + 14318 + 2013-11-09T08:05:42Z + + Cydebot + 0 + + + Robot - Speedily moving category Options to [[:Category:Options (finance)]] per [[WP:CFDS|CFDS]]. + wikitext + text/x-wiki + {{Hatnote|See also [[Volatility (finance)]].}} + +'''Stochastic [[Volatility (finance)|volatility]]''' models are those in which the [[variance]] of a [[stochastic process]] is itself randomly distributed. <ref>Gatheral, J. (2006). The volatility surface: a practitioner's guide. Wiley.</ref> They are used in the field of [[mathematical finance]] to evaluate [[derivative (finance)|derivative]] [[securities]], such as [[option (finance)|options]]. The name derives from the models' treatment of the underlying security's volatility as a [[random process]], governed by [[state variable]]s such as the price level of the underlying security, the tendency of volatility to revert to some long-run mean value, and the [[variance]] of the volatility process itself, among others. + +Stochastic volatility models are one approach to resolve a shortcoming of the [[Black–Scholes]] model. In particular, these models assume that the underlying volatility is constant over the life of the derivative, and unaffected by the changes in the price level of the underlying security. However, these models cannot explain long-observed features of the implied volatility surface such as [[volatility smile]] and skew, which indicate that implied volatility does tend to vary with respect to [[strike price]] and expiry. By assuming that the volatility of the underlying price is a stochastic process rather than a constant, it becomes possible to model derivatives more accurately. + +==Basic model== +Starting from a constant volatility approach, assume that the derivative's underlying price follows a standard model for [[geometric brownian motion]]: + +:<math> dS_t = \mu S_t\,dt + \sigma S_t\,dW_t \, </math> + +where <math>\mu \,</math> is the constant drift (i.e. 
expected return) of the security price <math>S_t \,</math>, <math>\sigma \,</math> is the constant volatility, and <math>dW_t \,</math> is a standard [[Wiener process]] with zero [[mean]] and unit rate of [[variance]]. The explicit solution of this [[stochastic differential equation]] is +:<math>S_t= S_0 e^{(\mu- \frac{1}{2} \sigma^2) t+ \sigma W_t}</math>. + +The [[maximum likelihood|Maximum likelihood estimator]] to estimate the constant volatility <math>\sigma \,</math> for given stock prices <math>S_t \,</math> at different times <math>t_i \,</math> is +:<math>\begin{align}\hat{\sigma}^2 &= \left(\frac{1}{n} \sum_{i=1}^n \frac{(\ln S_{t_i}- \ln S_{t_{i-1}})^2}{t_i-t_{i-1}} \right) - \frac 1 n \frac{(\ln S_{t_n}- \ln S_{t_0})^2}{t_n-t_0}\\ +& = \frac 1 n \sum_{i=1}^n (t_i-t_{i-1})\left(\frac{\ln \frac{S_{t_i}}{S_{t_{i-1}}}}{t_i-t_{i-1}} - \frac{\ln \frac{S_{t_n}}{S_{t_{0}}}}{t_n-t_0}\right)^2;\end{align}</math> +its expectation value is <math>E \left[ \hat{\sigma}^2\right]= \frac{n-1}{n} \sigma^2</math>. + +This basic model with constant volatility <math>\sigma \,</math> is the starting point for non-stochastic volatility models such as Black–Scholes and [[Cox–Ross–Rubinstein]]. + +For a stochastic volatility model, replace the constant volatility <math>\sigma \,</math> with a function <math>\nu_t \,</math>, that models the variance of <math>S_t \,</math>. This variance function is also modeled as brownian motion, and the form of <math>\nu_t \,</math> depends on the particular SV model under study. +:<math> dS_t = \mu S_t\,dt + \sqrt{\nu_t} S_t\,dW_t \,</math> + +:<math> d\nu_t = \alpha_{S,t}\,dt + \beta_{S,t}\,dB_t \,</math> + +where <math>\alpha_{S,t} \,</math> and <math>\beta_{S,t} \,</math> are some functions of <math>\nu \,</math> and <math>dB_t \,</math> is another standard gaussian that is correlated with <math>dW_t \,</math> with constant correlation factor <math>\rho \,</math>. + +===Heston model=== +{{Main|Heston model}} + +The popular Heston model is a commonly used SV model, in which the randomness of the variance process varies as the square root of variance. In this case, the differential equation for variance takes the form: + +:<math> d\nu_t = \theta(\omega - \nu_t)dt + \xi \sqrt{\nu_t}\,dB_t \,</math> + +where <math>\omega</math> is the mean long-term volatility, <math>\theta</math> is the rate at which the volatility reverts toward its long-term mean, <math>\xi</math> is the volatility of the volatility process, and <math>dB_t</math> is, like <math>dW_t</math>, a gaussian with zero mean and <math>\sqrt{dt}</math> standard deviation. However, <math>dW_t</math> and <math>dB_t</math> are correlated with the constant [[correlation]] value <math>\rho</math>. + +In other words, the Heston SV model assumes that the variance is a random process that +#exhibits a tendency to revert towards a long-term mean <math>\omega</math> at a rate <math>\theta</math>, +#exhibits a volatility proportional to the square root of its level +#and whose source of randomness is correlated (with correlation <math>\rho</math>) with the randomness of the underlying's price processes. 
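+
+A discretised simulation makes the role of these parameters concrete. The sketch below is illustrative only; the parameter values and the full-truncation Euler scheme are assumptions, not part of the model description above. It draws one correlated path of <math>S_t</math> and <math>\nu_t</math>.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def simulate_heston(S0=100.0, v0=0.04, mu=0.05, theta=2.0, omega=0.04,
+                    xi=0.3, rho=-0.7, T=1.0, n_steps=252, seed=0):
+    """Euler-Maruyama sketch of one Heston path.
+
+    Names follow the article: theta = reversion rate, omega = long-run
+    variance, xi = vol of vol, rho = correlation of dW_t and dB_t.
+    """
+    rng = np.random.default_rng(seed)
+    dt = T / n_steps
+    S = np.empty(n_steps + 1)
+    v = np.empty(n_steps + 1)
+    S[0], v[0] = S0, v0
+    for i in range(n_steps):
+        z_w = rng.standard_normal()
+        z_b = rho * z_w + np.sqrt(1.0 - rho**2) * rng.standard_normal()
+        v_pos = max(v[i], 0.0)                        # full-truncation scheme
+        S[i + 1] = S[i] + mu * S[i] * dt + np.sqrt(v_pos) * S[i] * np.sqrt(dt) * z_w
+        v[i + 1] = v[i] + theta * (omega - v_pos) * dt + xi * np.sqrt(v_pos) * np.sqrt(dt) * z_b
+    return S, v
+
+S, v = simulate_heston()
+print(S[-1], v[-1])
+</syntaxhighlight>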
+
+
+There exist a few known parametrisations of the volatility surface based on the Heston model (Schönbucher, SVI and gSVI), as well as their de-arbitraging methodologies.<ref name=damghani>{{cite journal | author=Babak Mahdavi Damghani | title=De-arbitraging with a weak smile | publisher=Wilmott | year = 2013}}http://www.readcube.com/articles/10.1002/wilm.10201?locale=en</ref>
+
+===CEV Model ===
+{{Main|Constant Elasticity of Variance Model}}
+
+The '''CEV''' model describes the relationship between volatility and price, introducing stochastic volatility:
+
+:<math>dS_t=\mu S_t dt + \sigma S_t ^ \gamma dW_t</math>
+
+Conceptually, in some markets volatility rises when prices rise (e.g. commodities), so <math>\gamma > 1</math>. In other markets, volatility tends to rise as prices fall, modelled with <math>\gamma < 1</math>.
+
+Some argue that because the CEV model does not incorporate its own stochastic process for volatility, it is not truly a stochastic volatility model. Instead, they call it a [[local volatility]] model.
+
+===SABR volatility model===
+{{Main|SABR Volatility Model}}
+
+The '''SABR''' model (Stochastic Alpha, Beta, Rho) describes a single forward <math>F</math> (related to any asset e.g. an index, interest rate, bond, currency or equity) under stochastic volatility <math>\sigma</math>:
+
+:<math>dF_t=\sigma_t F^\beta_t\, dW_t,</math>
+
+:<math>d\sigma_t=\alpha\sigma^{}_t\, dZ_t,</math>
+
+The initial values <math>F_0</math> and <math>\sigma_0</math> are the current forward price and volatility, whereas <math>W_t</math> and <math>Z_t</math> are two correlated Wiener processes (i.e. Brownian motions) with correlation coefficient <math>-1<\rho<1</math>. The constant parameters <math>\beta,\;\alpha</math> are such that <math>0\leq\beta\leq 1,\;\alpha\geq 0</math>.
+
+The main feature of the SABR model is its ability to reproduce the smile effect of the [[volatility smile]].
+
+===GARCH model===
+The Generalized Autoregressive Conditional Heteroskedasticity ([[GARCH]]) model is another popular model for estimating stochastic volatility. It assumes that the randomness of the variance process varies with the variance, as opposed to the square root of the variance as in the Heston model. The standard GARCH(1,1) model has the following form for the variance differential:
+
+:<math> d\nu_t = \theta(\omega - \nu_t)dt + \xi \nu_t\,dB_t \,</math>
+
+The GARCH model has been extended via numerous variants, including the NGARCH, TGARCH, IGARCH, LGARCH, EGARCH, GJR-GARCH, etc.
+
+===3/2 model===
+The 3/2 model is similar to the Heston model, but assumes that the randomness of the variance process varies with <math>\nu_t^{3/2}</math>. The form of the variance differential is:
+
+:<math> d\nu_t = \nu_t(\omega - \theta\nu_t)dt + \xi \nu_t^\frac{3}{2}\,dB_t \,</math>.
+
+However, the meaning of the parameters is different from that in the Heston model. In this model, both the mean-reversion and the volatility-of-variance parameters are stochastic quantities, given by <math> \theta\nu_t</math> and <math> \xi\nu_t</math> respectively.
+
+===Chen model===
+In interest rate modeling, [[Lin Chen]] in 1994 developed the first stochastic mean and stochastic volatility model, the [[Chen model]]. 
+Specifically, the dynamics of the instantaneous interest rate are given by following the stochastic differential equations: + +:<math> dr_t = (\theta_t-\alpha_t)\,dt + \sqrt{r_t}\,\sigma_t\, dW_t</math>, +:<math> d \alpha_t = (\zeta_t-\alpha_t)\,dt + \sqrt{\alpha_t}\,\sigma_t\, dW_t</math>, +:<math> d \sigma_t = (\beta_t-\sigma_t)\,dt + \sqrt{\sigma_t}\,\eta_t\, dW_t</math>. + +==Calibration== +Once a particular SV model is chosen, it must be calibrated against existing market data. Calibration is the process of identifying the set of model parameters that are most likely given the observed data. One popular technique is to use [[Maximum likelihood|Maximum Likelihood Estimation]] (MLE). For instance, in the Heston model, the set of model parameters <math>\Psi_0 = \{\omega, \theta, \xi, \rho\} \,</math> can be estimated applying an MLE algorithm such as the Powell [[Directed set|Directed Set]] method [http://www.library.cornell.edu/nr/bookcpdf.html] to observations of historic underlying security prices. + +In this case, you start with an estimate for <math>\Psi_0 \,</math>, compute the residual errors when applying the historic price data to the resulting model, and then adjust <math>\Psi \,</math> to try to minimize these errors. Once the calibration has been performed, it is standard practice to re-calibrate the model periodically. + +==See also== +*[[Chen model]] +*[[Heston model]] +*[[Local volatility]] +*[[Risk-neutral measure]] +*[[SABR Volatility Model]] +*[[Volatility (finance)|Volatility]] +*[[Volatility, uncertainty, complexity and ambiguity]] +*[[Black–Scholes]] +*[[Subordinator_(mathematics)|Subordinator]] + +==References== +{{Reflist}} +* [http://www.wilmott.com/detail.cfm?articleID=245 Stochastic Volatility and Mean-variance Analysis], Hyungsok Ahn, Paul Wilmott, (2006). +* [http://www.javaquant.net/papers/Heston-original.pdf A closed-form solution for options with stochastic volatility], SL Heston, (1993). +* [http://www.amazon.com/s?platform=gurupa&url=index%3Dblended&keywords=inside+volatility+arbitrage Inside Volatility Arbitrage], Alireza Javaheri, (2005). +* [http://ssrn.com/abstract=982221 Accelerating the Calibration of Stochastic Volatility Models], Kilin, Fiodar (2006). +*{{cite book | title = Stochastic Mean and Stochastic Volatility -- A Three-Factor Model of the Term Structure of Interest Rates and Its Application to the Pricing of Interest Rate Derivatives. Blackwell Publishers. +| author = Lin Chen | publisher = Blackwell Publishers | year = 1996}} + +{{Derivatives market}} +{{Volatility}} + +[[Category:Mathematical finance]] +[[Category:Options (finance)]] +[[Category:Derivatives (finance)]] + oqljju4fjdslkqm7q1w2im4bwlr7yod + + + + Separation logic + 0 + 13071 + + 13072 + 2013-11-20T12:42:25Z + + 194.254.61.40 + + Disambiguated: [[adjunction]] → [[adjunction (category theory)]] + wikitext + text/x-wiki + In [[computer science]], '''separation logic'''<ref name="lics02">[http://www.cs.cmu.edu/~jcr/seplogic.pdf Separation Logic: A Logic for Shared Mutable Data Structures.] John C. Reynolds. LICS 2002.</ref> is an extension of [[Hoare logic]], a way of reasoning about programs. +It was developed by [[John C. Reynolds]], [[Peter O'Hearn]], Samin Ishtiaq and Hongseok Yang,<ref name="lics02" /><ref name="sl1999">Intuitionistic Reasoning about Shared Mutable Data Structure. John Reynolds. 
Millennial Perspectives in Computer Science, Proceedings of the 1999 Oxford-Microsoft Symposium in Honour of Sir Tony Hoare</ref><ref name="popl01">BI as an Assertion Language for Mutable Data Structures. Samin Ishtiaq, Peter O'Hearn. POPL 2001.</ref><ref name="csl01">[http://www.eecs.qmul.ac.uk/~ohearn/papers/localreasoning.pdf Local Reasoning about Programs that Alter Data Structures.] Peter O'Hearn, John Reynolds, Hongseok Yang. CSL 2001</ref> drawing upon early work by [[Rod Burstall]].<ref name="burstall">Some techniques for proving programs which alter data structures. R.M. Burstall. Machine Intelligence 7, 1972.</ref> The assertion language of separation logic is a special case of the [[logic of bunched implications]] (BI).<ref name="bi">The Logic of Bunched Implications. P.W. O'Hearn and D. J. Pym. Bulletin of Symbolic Logic, 5(2), June 1999, pp215-244</ref>
+
+Separation logic facilitates reasoning about:
+
+* programs that manipulate pointer data structures — including [[information hiding]] in the presence of pointers;
+* ''"transfer of ownership"'' (avoidance of semantic frame [[axiom]]s); and
+* virtual separation (modular reasoning) between concurrent modules.
+
+Separation logic supports the developing field of research described by [[Peter O'Hearn]] and others as ''local reasoning'', whereby specifications and proofs of a program component mention only the portion of memory used by the component, and not the entire global state of the system. Applications include automated [[program verification]] (where an [[algorithm]] checks the validity of another algorithm) and automated [[parallelization]] of software.
+
+== Assertions: Operators and semantics ==
+
+Separation logic assertions describe "states" consisting of a ''store'' and a ''heap'', roughly corresponding to the state of [[stack-based memory allocation|local (or ''stack-allocated'') variables]] and [[dynamic memory allocation|''dynamically-allocated'' objects]] in common programming languages such as [[C (programming language)|C]] and [[Java (programming language)|Java]]. A store <math>s</math> is a [[function (mathematics)|function]] mapping variables to values. A heap <math>h</math> is a [[partial function]] mapping memory addresses to values. Two heaps <math>h</math> and <math>h'</math> are ''disjoint'' (denoted <math>h \,\bot\, h'</math>) if their domains do not overlap (i.e., if for every memory address <math>\ell</math>, at least one of <math>h(\ell)</math> and <math>h'(\ell)</math> is undefined).
+
+The logic allows one to prove judgements of the form <math>s, h \models P</math>, where <math>s</math> is a store, <math>h</math> is a heap, and <math>P</math> is an ''assertion'' over the given store and heap. Separation logic assertions (denoted as <math>P</math>, <math>Q</math>, <math>R</math>) contain the standard boolean connectives and, in addition, <math>\mathbf{e}\mathbf{m}\mathbf{p}</math>, <math>e \mapsto e'</math>, <math>P \ast Q</math>, and <math>P {-\!\!\ast}\, Q</math>, where <math>e</math> and <math>e'</math> are expressions.
+* The constant <math>\mathbf{e}\mathbf{m}\mathbf{p}</math> asserts that the heap is ''empty'', i.e., <math>s, h \models \mathbf{e}\mathbf{m}\mathbf{p}</math> when <math>h</math> is undefined for all addresses.
+* The binary operator <math>\mapsto</math> takes an address and a value and asserts that the heap is defined at exactly one location, mapping the given address to the given value. 
I.e., <math>s, h \models e \mapsto e'</math> when <math>h([\![e]\!]_{s}) = [\![e']\!]_{s}</math> (where <math>[\![e]\!]_{s}</math> denotes the value of expression <math>e</math> evaluated in store <math>s</math>) and <math>h</math> is otherwise undefined.
+* The binary operator <math>\ast</math> (pronounced ''star'' or ''separating conjunction'') asserts that the heap can be split into two ''disjoint'' parts where its two arguments hold, respectively. I.e., <math>s, h \models P \ast Q</math> when there exist <math>h_1, h_2</math> such that <math>h_1 \,\bot\, h_2</math> and <math>h = h_1 \cup h_2</math> and <math>s, h_1 \models P</math> and <math>s, h_2 \models Q</math>.
+* The binary operator <math>-\!\!\ast</math> (pronounced ''magic wand'' or ''separating implication'') asserts that extending the heap with a disjoint part that satisfies its first argument results in a heap that satisfies its second argument. I.e., <math>s, h \models P -\!\!\ast\, Q</math> when for every heap <math>h' \,\bot\, h</math> such that <math>s, h' \models P</math>, also <math>s, h \cup h' \models Q</math> holds.
+
+The operators <math>\ast</math> and <math>-\!\!\ast</math> share some properties with the classical [[Logical Conjunction|conjunction]] and [[Entailment|implication]] operators. They can be combined using an inference rule similar to [[modus ponens]]
+:<math>\frac{s, h \models P \ast (P -\!\!\ast\, Q)}{s, h \models Q}</math>
+and they form an [[adjunction (category theory)|adjunction]], i.e., <math>s, h \cup h' \models P \ast Q \Rightarrow R</math> if and only if <math>s, h \models P \Rightarrow Q -\!\!\ast\, R</math> for <math>h \,\bot\, h'</math>; more precisely, the adjoint operators are <math>\_ \ast Q </math> and <math>Q -\!\!\ast\, \_</math>.
+
+==Reasoning about programs: triples and proof rules==
+
+In separation logic, Hoare triples have a slightly different meaning than in [[Hoare logic]]. The triple <math>\{P\}\ C\ \{Q\}</math> asserts that if the program, <math>C</math>, executes from an initial state satisfying the precondition, <math>P</math>, then the program will ''not go wrong'' (e.g., have undefined behaviour), and if it terminates, then the final state will satisfy the postcondition, <math>Q</math>. In essence, during its execution, <math>C</math> may access only memory locations whose existence is asserted in the precondition or that have been allocated by <math>C</math> itself.
+
+In addition to the standard rules from [[Hoare logic]], separation logic supports the following very important rule:
+
+<math>\frac{ \{P\}\ C\ \{Q\} }{ \{P \ast R\}\ C\ \{Q \ast R\} }~\mathsf{mod}(C) \cap \mathsf{fv}(R) =\emptyset</math>
+
+This is known as the '''frame rule''' (named after the [[frame problem]]) and enables local reasoning. It says that a program that executes safely in a small state (satisfying <math>P</math>) can also execute in any bigger state (satisfying <math>P \ast R</math>) and that its execution will not affect the additional part of the state (and so <math>R</math> will remain true in the postcondition). The side condition enforces this by specifying that none of the variables modified by <math>C</math> occur free in <math>R</math>, i.e. none of them are in the 'free variable' set <math>\mathsf{fv}</math> of <math>R</math>.
+
+==Implementations==
+The [[Ynot]] library for the [[Coq proof assistant]] contains an implementation. 
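+
+The store/heap semantics defined above can also be prototyped directly. The following toy sketch is illustrative only (it is unrelated to Ynot or any real separation-logic tool): it models stores and heaps as Python dictionaries and checks the separating conjunction by enumerating heap splits.
+
+<syntaxhighlight lang="python">
+# Toy model of the assertion semantics: a store and a heap are dicts.
+def emp(s, h):
+    """s, h |= emp  iff the heap is empty."""
+    return len(h) == 0
+
+def points_to(e, e2):
+    """e |-> e2, where e and e2 are functions evaluating expressions in the store."""
+    return lambda s, h: h == {e(s): e2(s)}
+
+def sep(P, Q):
+    """Separating conjunction P * Q: some split of h into disjoint h1, h2."""
+    def holds(s, h):
+        items = list(h.items())
+        for mask in range(2 ** len(items)):
+            h1 = dict(kv for i, kv in enumerate(items) if mask >> i & 1)
+            h2 = dict(kv for i, kv in enumerate(items) if not mask >> i & 1)
+            if P(s, h1) and Q(s, h2):
+                return True
+        return False
+    return holds
+
+s = {'x': 1, 'y': 2}          # store: variables to values
+h = {1: 10, 2: 20}            # heap: addresses to values
+P = sep(points_to(lambda s: s['x'], lambda s: 10),
+        points_to(lambda s: s['y'], lambda s: 20))
+print(P(s, h))                # True: h splits into {1: 10} and {2: 20}
+</syntaxhighlight>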
+ +==References== +<references /> + +[[Category:Program logic]] +[[Category:Substructural logic]] +[[Category:Logic in computer science]] + cujc0xk14v10bjfuwchkidlc503y2wn + + + + Image moment + 0 + 10049 + + 10050 + 2014-01-11T00:40:09Z + + 87.219.204.227 + + /* External links */ + wikitext + text/x-wiki + In [[image processing]], [[computer vision]] and related fields, an '''image moment''' is a certain particular weighted average ([[moment (mathematics)|moment]]) of the image pixels' intensities, or a function of such moments, usually chosen to have some attractive property or interpretation. + +Image moments are useful to describe objects after segmentation. [[#Examples|Simple properties of the image]] which are found ''via'' image moments include area (or total intensity), its [[centroid]], and [[Image moments#Examples 2|information about its orientation]]. + +== Raw moments == +For a 2D continuous function ''f''(''x'',''y'') the [[Moment (mathematics)|moment]] (sometimes called "raw moment") of order (''p'' + ''q'') is defined as + +:<math> M_{pq}=\int\limits_{-\infty}^{\infty} \int\limits_{-\infty}^{\infty} x^py^qf(x,y) \,dx\, dy</math> + +for ''p'',''q'' = 0,1,2,... +Adapting this to scalar (greyscale) image with pixel intensities ''I''(''x'',''y''), raw image moments ''M<sub>ij</sub>'' are calculated by + +:<math>M_{ij} = \sum_x \sum_y x^i y^j I(x,y)\,\!</math> + +In some cases, this may be calculated by considering the image as a [[probability density function]], ''i.e.'', by dividing the above by + +:<math>\sum_x \sum_y I(x,y) \,\!</math> + +A uniqueness theorem (Hu [1962]) states that if ''f''(''x'',''y'') +is piecewise continuous and has nonzero values only in a finite part of the ''xy'' +plane, moments of all orders exist, and the moment sequence (''M<sub>pq</sub>'') is uniquely determined by ''f''(''x'',''y''). Conversely, (''M<sub>pq</sub>'') uniquely determines ''f''(''x'',''y''). In practice, the image is summarized with functions of a few lower order moments. + +===Examples=== + +Simple image properties derived ''via'' raw moments include: +* Area (for binary images) or sum of grey level (for greytone images): ''M''<sub>00</sub> +* Centroid: { {{overbar|''x''}}, {{overbar|''y''}} } = {''M''<sub>10</sub>/''M''<sub>00</sub>, ''M''<sub>01</sub>/''M''<sub>00</sub> } + +== Central moments == +[[moment about the mean|Central moments]] are defined as + +:<math> \mu_{pq} = \int\limits_{-\infty}^{\infty} \int\limits_{-\infty}^{\infty} (x - \bar{x})^p(y - \bar{y})^q f(x,y) \, dx \, dy </math> + +where <math>\bar{x}=\frac{M_{10}}{M_{00}}</math> and <math>\bar{y}=\frac{M_{01}}{M_{00}}</math> are the components of the [[centroid]]. + +If ''&fnof;''(''x'',&nbsp;''y'') is a digital image, then the previous equation becomes + +:<math>\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^p(y - \bar{y})^q f(x,y)</math> + +The central moments of order up to 3 are: + +:<math>\mu_{00} = M_{00},\,\!</math> +:<math>\mu_{01} = 0,\,\!</math> +:<math>\mu_{10} = 0,\,\!</math> +:<math>\mu_{11} = M_{11} - \bar{x} M_{01} = M_{11} - \bar{y} M_{10},</math> +:<math>\mu_{20} = M_{20} - \bar{x} M_{10}, </math> +:<math>\mu_{02} = M_{02} - \bar{y} M_{01}, </math> +:<math>\mu_{21} = M_{21} - 2 \bar{x} M_{11} - \bar{y} M_{20} + 2 \bar{x}^2 M_{01}, </math> +:<math>\mu_{12} = M_{12} - 2 \bar{y} M_{11} - \bar{x} M_{02} + 2 \bar{y}^2 M_{10}, </math> +:<math>\mu_{30} = M_{30} - 3 \bar{x} M_{20} + 2 \bar{x}^2 M_{10}, </math> +:<math>\mu_{03} = M_{03} - 3 \bar{y} M_{02} + 2 \bar{y}^2 M_{01}. 
</math> + +It can be shown that: +:<math>\mu_{pq} = \sum_{m}^p \sum_{n}^q {p\choose m} {q\choose n}(-\bar{x})^{(p-m)}(-\bar{y})^{(q-n)} M_{mn}</math> + +Central moments are [[Translational invariance|translational invariant]]. +<!-- [[Invariant (mathematics)|invariant]] to [[translation (geometry)|translation]]. --> + +===Examples=== + +Information about image orientation can be derived by first using the second order central moments to construct a [[covariance matrix]]. + +:<math>\mu'_{20} = \mu_{20} / \mu_{00} = M_{20}/M_{00} - \bar{x}^2</math> +:<math>\mu'_{02} = \mu_{02} / \mu_{00} = M_{02}/M_{00} - \bar{y}^2</math> +:<math>\mu'_{11} = \mu_{11} / \mu_{00} = M_{11}/M_{00} - \bar{x}\bar{y}</math> + +The [[covariance matrix]] of the image <math>I(x,y)</math> is now + +:<math>\operatorname{cov}[I(x,y)] = \begin{bmatrix} \mu'_{20} & \mu'_{11} \\ \mu'_{11} & \mu'_{02} \end{bmatrix}</math>. + +The [[eigenvector]]s of this matrix correspond to the major and minor axes of the image intensity, so the '''orientation''' can thus be extracted from the angle of the eigenvector associated with the largest eigenvalue. It can be shown that this angle Θ is given by the following formula: + +:<math>\Theta = \frac{1}{2} \arctan \left( \frac{2\mu'_{11}}{\mu'_{20} - \mu'_{02}} \right)</math> + +The above formula holds as long as: +:<math>\mu'_{20} - \mu'_{02} \ne 0</math> + +The [[eigenvalue]]s of the covariance matrix can easily be shown to be + +:<math> \lambda_i = \frac{\mu'_{20} + \mu'_{02}}{2} \pm \frac{\sqrt{4{\mu'}_{11}^2 + ({\mu'}_{20}-{\mu'}_{02})^2 }}{2}, </math> + +and are proportional to the squared length of the eigenvector axes. The relative difference in magnitude of the eigenvalues are thus an indication of the eccentricity of the image, or how elongated it is. The [[Eccentricity (mathematics)|eccentricity]] is + +:<math> \sqrt{1 - \frac{\lambda_2}{\lambda_1}}. </math> + +== Scale invariant moments == +Moments ''&eta;<sub>i j</sub>'' where ''i'' + ''j'' ≥ 2 can be constructed to be [[Invariant (mathematics)|invariant]] to both [[translation (geometry)|translation]] and changes in [[Scale (ratio)|scale]] by dividing the corresponding central moment by the properly scaled (00)th moment, using the following formula. + +:<math>\eta_{ij} = \frac{\mu_{ij}} + {\mu_{00}^{\left(1 + \frac{i+j}{2}\right)}}\,\!</math> + +== Rotation invariant moments == +It is possible to calculate moments which are [[Invariant (mathematics)|invariant]] under [[translation (geometry)|translation]], changes in [[Scale (ratio)|scale]], and also ''[[rotation]]''. Most frequently used are the Hu set of invariant moments:<ref name="“hu">M. K. Hu, "Visual Pattern Recognition by Moment Invariants", IRE Trans. Info. Theory, vol. 
IT-8, pp.179&ndash;187, 1962</ref> + +:<math> + \begin{align} + I_1 =\ & \eta_{20} + \eta_{02} \\ + I_2 =\ & (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\ + I_3 =\ & (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\ + I_4 =\ & (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\ + I_5 =\ & (\eta_{30} - 3\eta_{12}) (\eta_{30} + \eta_{12})[ (\eta_{30} + \eta_{12})^2 - 3 (\eta_{21} + \eta_{03})^2] + \\ + \ & (3\eta_{21} - \eta_{03}) (\eta_{21} + \eta_{03})[ 3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] \\ + I_6 =\ & (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\ + I_7 =\ & (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - \\ + \ & (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]. + \end{align} +</math> + +The first one, ''I''<sub>1</sub>, is analogous to the [[moment of inertia]] around the image's centroid, where the pixels' intensities are analogous to physical density. The last one, ''I''<sub>7</sub>, is skew invariant, which enables it to distinguish mirror images of otherwise identical images. + +A general theory on deriving complete and independent sets of rotation invariant moments was proposed by J. Flusser<ref name="Flusser">J. Flusser: "[http://library.utia.cas.cz/prace/20000033.pdf On the Independence of Rotation Moment Invariants]", Pattern Recognition, vol. 33, pp. 1405&ndash;1410, 2000.</ref> and T. Suk.<ref name="Suk">J. Flusser and T. Suk, "[http://library.utia.cas.cz/separaty/historie/flusser-rotation%20moment%20invariants%20for%20recognition%20of%20symmetric%20objects.pdf Rotation Moment Invariants for Recognition of Symmetric Objects]", IEEE Trans. Image Proc., vol. 15, pp. 3784&ndash;3790, 2006.</ref> They showed that the traditional Hu's invariant set is not independent nor complete. ''I''<sub>3</sub> is not very useful as it is dependent on the others. In the original Hu's set there is a missing third order independent moment invariant: +:<math> + \begin{align} +I_8 =\ & \eta_{11}[ ( \eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2 ] - (\eta_{20}-\eta_{02}) (\eta_{30}+\eta_{12}) (\eta_{03}+\eta_{21}) + \end{align} +</math> + +== External links == +* [http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT2/node3.html Analysis of Binary Images], University of Edinburgh +* [http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/SHUTLER3/CVonline_moments.html Statistical Moments], University of Edinburgh +* [http://jamh-web.appspot.com/computer_vision.html Variant moments], Machine Perception and Computer Vision page (Matlab and Python source code) +* [http://www.youtube.com/watch?v=O-hCEXi3ymU Hu Moments] introductory video on YouTube + +==References== +<references /> + +{{DEFAULTSORT:Image Moment}} +[[Category:Computer vision]] + jyexw1l7mddngxwm67ahw4pij42dybk + + + + Geostrophic wind + 0 + 5362 + + 5363 + 2013-10-31T10:29:44Z + + Qwfp + 0 + + /* Governing formula */ D/Dt is the [[material derivative]] + wikitext + text/x-wiki + The '''geostrophic wind''' ({{IPAc-en|dʒ|iː|ɵ|ˈ|s|t|r|ɒ|f|ɨ|k}} or {{IPAc-en|dʒ|iː|ɵ|ˈ|s|t|r|oʊ|f|ɨ|k}}) is the theoretical [[wind]] that would result from an exact balance between the [[Coriolis effect]] and the [[pressure gradient]] force. 
This condition is called ''geostrophic balance.'' The geostrophic wind is directed [[Parallel (geometry)|parallel]] to [[isobar (meteorology)|isobar]]s (lines of constant [[Atmospheric pressure|pressure]] at a given height). This balance seldom holds exactly in nature. The true wind almost always differs from the geostrophic wind due to other forces such as [[friction]] from the ground. Thus, the actual wind would equal the geostrophic wind only if there were no friction and the isobars were perfectly straight. Despite this, much of the atmosphere outside the [[tropics]] is close to geostrophic flow much of the time and it is a valuable first approximation. Geostrophic flow in air or water is a zero-frequency [[inertial waves|inertial wave]]. + +==Origin== + +[[Air]] naturally moves from areas of high [[pressure]] to areas of low pressure, due to the [[pressure gradient]] force. As soon as the air starts to move, however, the [[Coriolis effect|Coriolis "force"]] deflects it. The [[wikt:deflection|deflection]] is to the right in the [[northern hemisphere]], and to the left in the [[southern hemisphere]]. As the air moves from the high pressure area, its speed increases, and so does its Coriolis deflection. The deflection increases until the Coriolis and pressure gradient forces are in geostrophic balance: at this point, the air flow is no longer moving from high to low pressure, but instead moves along an [[wikt:isobar|isobar]]. (Note that this explanation assumes that the atmosphere starts in a geostrophically unbalanced state and describes how such a state would evolve into a balanced flow. In practice, the flow is nearly always balanced.) The geostrophic balance helps to explain why, in the northern hemisphere, [[low pressure system]]s (or ''[[cyclone]]s'') spin counterclockwise and [[High pressure area|high pressure systems]] (or ''[[anticyclone]]s'') spin clockwise, and the opposite in the southern hemisphere. + +==Geostrophic currents== + +Flow of ocean water is also largely geostrophic. Just as multiple weather balloons that measure pressure as a function of height in the atmosphere are used to map the atmospheric pressure field and infer the geostrophic wind, measurements of density as a function of depth in the ocean are used to infer geostrophic currents. [[satellite altimetry|Satellite altimeters]] are also used to measure sea surface height anomaly, which permits a calculation of the geostrophic current at the surface. + +==Limitations of the geostrophic approximation== + +The effect of friction, between the air and the land, breaks the geostrophic balance. Friction slows the flow, lessening the effect of the Coriolis force. As a result, the pressure gradient force has a greater effect and the air still moves from high pressure to low pressure, though with great deflection. This explains why high pressure system winds radiate out from the center of the system, while low pressure systems have winds that spiral inwards. + +The geostrophic wind neglects [[friction]]al effects, which is usually a good [[approximation]] for the [[synoptic scale meteorology|synoptic scale]] instantaneous flow in the midlatitude mid-[[troposphere]].<ref>Holton, J.R., 'An Introduction to Dynamic Meteorology', International Geophysical Series, Vol 48 Academic Press.</ref> Although [[ageostrophic]] terms are relatively small, they are essential for the time evolution of the flow and in particular are necessary for the growth and decay of storms. 
Quasigeostrophic and Semigeostrophic theory are used to model flows in the atmosphere more widely. These theories allow for divergence to take place and for weather systems to then develop. + +==Governing formula== +[[Newton's Second Law]] can be written as follows if only the pressure gradient, gravity, and friction act on an air parcel, where the bold symbolizes a vector: + +<math>{D\boldsymbol{U} \over Dt} = -2\boldsymbol{\Omega} \times \boldsymbol{U} - {1 \over \rho} \nabla p + \boldsymbol{g} + \boldsymbol{F}_r</math> + +Here <math>\boldsymbol{U}</math> is the velocity field of the air, <math>\boldsymbol{\Omega}</math> is the angular velocity vector of the planet, <math>\rho</math> is the density of the air, <math>p</math> is the air pressure, <math>\boldsymbol{F}_r</math> is the friction, <math>\boldsymbol{g}</math> is the [[standard gravity|acceleration vector due to gravity]] and <math>{D \; \over Dt}</math> is the [[material derivative]]. + +Locally this can be expanded in [[Cartesian coordinates]], with a positive u representing an eastward direction and a positive v representing a northward direction. Neglecting friction and vertical motion, as justified by the [[Taylor-Proudman theorem]], we have: + +<math>{Du \over Dt} = -{1 \over \rho}{\partial P \over \partial x} + f \cdot v</math> + +<math>{Dv \over Dt} = -{1 \over \rho}{\partial P \over \partial y} - f \cdot u</math> + +<math> 0 = -g -{1 \over \rho}{\partial P \over \partial z}</math> + +With <math>f = 2 \Omega \sin{\phi}</math> the [[Coriolis effect|Coriolis parameter]] (approximately 10<sup>&minus;4</sup> s<sup>&minus;1</sup>, varying with latitude). + +Assuming geostrophic balance, the system is stationary and the first two equations become: + +<math>f \cdot v = {1 \over \rho}{\partial P \over \partial x}</math> + +<math>f \cdot u = -{1 \over \rho}{\partial P \over \partial y}</math> + +By substituting using the third equation above, we have: + +<math>f \cdot v = g\frac{\partial P / \partial x}{\partial P / \partial z} = g{\partial Z \over \partial x}</math> + +<math>f \cdot u = -g\frac{\partial P / \partial y}{\partial P / \partial z} = -g{\partial Z \over \partial y}</math> + +with ''Z'' the height of the constant pressure surface (satisfying <math>{\partial P \over \partial x}dx + {\partial P \over \partial y}dy + {\partial P \over \partial z} dZ = 0 </math>). + +This leads us to the following result for the geostrophic wind components <math>(u_g,v_g)</math>: + +: <math> u_g = - {g \over f} {\partial Z \over \partial y}</math> + +<!-- extra blank line between two lines of "displayed" [[TeX]], for legiblity --> + +: <math> v_g = {g \over f} {\partial Z \over \partial x}</math> + +The validity of this approximation depends on the local [[Rossby number]]. It is invalid at the equator, because ''f'' is equal to zero there, and therefore generally not used in the [[tropics]]. 
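+
+As a concrete illustration of these relations, the sketch below computes <math>(u_g, v_g)</math> from a constant-pressure height field with finite differences; the synthetic height field and grid spacing are assumptions made purely for demonstration.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+g = 9.81                                   # m s^-2
+Omega = 7.292e-5                           # Earth's rotation rate, s^-1
+lat = 45.0                                 # illustrative mid-latitude
+f = 2.0 * Omega * np.sin(np.radians(lat))  # Coriolis parameter
+
+# Synthetic height field Z(x, y) of a constant-pressure surface on a
+# 100 km grid (illustrative values only).
+dx = dy = 100e3
+x = np.arange(0.0, 2000e3, dx)
+y = np.arange(0.0, 2000e3, dy)
+X, Y = np.meshgrid(x, y, indexing='xy')
+Z = 5500.0 + 50.0 * np.sin(2 * np.pi * X / 2000e3) - 30.0 * np.cos(2 * np.pi * Y / 2000e3)
+
+dZdy, dZdx = np.gradient(Z, dy, dx)        # finite-difference gradients
+
+u_g = -(g / f) * dZdy                      # u_g = -(g/f) dZ/dy
+v_g =  (g / f) * dZdx                      # v_g =  (g/f) dZ/dx
+print(float(u_g.mean()), float(v_g.mean()))
+</syntaxhighlight>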
+ +Other variants of the equation are possible; for example, the geostrophic wind vector can be expressed in terms of the gradient of the [[geopotential height]] Φ on a surface of constant pressure: + +: <math> \overrightarrow{V_g} = {\hat{k} \over f} \times \nabla_p \Phi </math> + +== See also == +*[[Geostrophic current]] +*[[Thermal wind]] +*[[Gradient wind]] +*[[Prevailing winds]] + +==References== +{{Reflist}} + +== External links == +* [http://atmos.nmsu.edu/education_and_outreach/encyclopedia/geostrophic.htm Geostrophic approximation] +* [http://nsidc.org/arcticmet/glossary/geostrophic_winds.html Definition of geostrophic wind] +*[http://atmo.tamu.edu/class/atmo203/tut/windpres/wind8.html Geostrophic wind description] + +{{DEFAULTSORT:Geostrophic Wind}} +[[Category:Geophysics]] +[[Category:Fluid dynamics]] +[[Category:Atmospheric dynamics]] + pyv3e4wbh891h0lcnnfx1eptqjyz8tt + + + + Ext functor + 0 + 6692 + + 6693 + 2013-11-23T06:22:43Z + + 71.139.172.99 + + wikitext + text/x-wiki + In [[mathematics]], the '''Ext functors''' of [[homological algebra]] are [[derived functor]]s of [[Hom functor]]s. They were first used in [[algebraic topology]], but are common in many areas of mathematics. The name "Ext" comes from the connection between the [[functor]]s and extensions in abelian categories. + +== Definition and computation == +Let ''R'' be a [[ring (mathematics)|ring]] and let Mod<sub>''R''</sub> be the [[Category (mathematics)|category]] of [[module (mathematics)|modules]] over ''R''. Let ''B'' be in Mod<sub>''R''</sub> and set ''T''(''B'') = Hom<sub>''R''</sub>(''A,B''), for fixed ''A'' in Mod<sub>''R''</sub>. This is a [[left exact functor]] and thus has right [[derived functor]]s ''R<sup>n</sup>T''. The Ext functor is defined by + +:<math>\operatorname{Ext}_R^n(A,B)=(R^nT)(B).</math> + +This can be calculated by taking any [[injective resolution]] + +:<math>0 \rightarrow B \rightarrow I^0 \rightarrow I^1 \rightarrow \dots, </math> + +and computing + +:<math>0 \rightarrow \operatorname{Hom}_R(A,I^0) \rightarrow \operatorname{Hom}_R(A,I^1) \rightarrow \dots.</math> + +Then (''R<sup>n</sup>T'')(''B'') is the [[homology (mathematics)|homology]] of this complex. Note that Hom<sub>''R''</sub>(''A,B'') is excluded from the complex. + +An alternative definition is given using the functor ''G''(''A'')=Hom<sub>''R''</sub>(''A,B''). For a fixed module ''B'', this is a [[Covariance and contravariance of functors|contravariant]] [[left exact functor]], and thus we also have right [[derived functor]]s ''R<sup>n</sup>G'', and can define + +:<math>\operatorname{Ext}_R^n(A,B)=(R^nG)(A).</math> + +This can be calculated by choosing any [[projective resolution]] + +:<math>\dots \rightarrow P^1 \rightarrow P^0 \rightarrow A \rightarrow 0, </math> + +and proceeding dually by computing + +:<math>0\rightarrow\operatorname{Hom}_R(P^0,B)\rightarrow \operatorname{Hom}_R(P^1,B) \rightarrow \dots.</math> + +Then (''R<sup>n</sup>G'')(''A'') is the homology of this complex. Again note that Hom<sub>''R''</sub>(''A,B'') is excluded. + +These two constructions turn out to yield [[isomorphic]] results, and so both may be used to calculate the Ext functor. +== Ext and extensions == <!-- "Extension of modules" redirects here --> + +===Equivalence of extensions=== +Ext functors derive their name from the relationship to '''extensions of modules'''. 
Given ''R''-modules ''A'' and ''B'', an '''extension of ''A'' by ''B''''' is a [[short exact sequence]] of ''R''-modules + +:<math>0\rightarrow B\rightarrow E\rightarrow A\rightarrow0.</math> + +Two extensions + +:<math>0\rightarrow B\rightarrow E\rightarrow A\rightarrow0</math> +:<math>0\rightarrow B\rightarrow E^\prime\rightarrow A\rightarrow0</math> + +are said to be '''equivalent''' (as extensions of ''A'' by ''B'') if there is a [[commutative diagram]] + +[[Image:EquivalenceOfExtensions.png]]. + +Note that the [[Five Lemma]] implies that the middle arrow is an isomorphism. An extension of ''A'' by ''B'' is called '''split''' if it is equivalent to the '''trivial extension''' + +:<math>0\rightarrow B\rightarrow A\oplus B\rightarrow A\rightarrow0.</math> + +There is a bijective correspondence between [[equivalence class]]es of extensions + +:<math>0\rightarrow B\rightarrow E\rightarrow A\rightarrow 0</math> + +of ''A'' by ''B'' and elements of + +:<math>\operatorname{Ext}_R^1(A,B).</math> + +===The Baer sum of extensions=== +Given two extensions + +:<math>0\rightarrow B\rightarrow E\rightarrow A\rightarrow 0</math> +:<math>0\rightarrow B\rightarrow E^\prime\rightarrow A\rightarrow 0</math> + +we can construct the '''Baer sum''', by forming the [[Pullback (category theory)|pullback]] over <math>A</math>, + +<math>\Gamma = \left\{ (e, e') \in E \oplus E' \; | \; g(e) = g'(e')\right\}.</math> + +We form the quotient + +<math>Y = \Gamma / \{(f(b), 0) - (0, f'(b))\;|\;b \in B\}</math>, + +that is, we [[mod out]] by the relation <math>(f(b)+e, e') \sim (e, f'(b)+e')</math>. The extension + +:<math>0\rightarrow B\rightarrow Y\rightarrow A\rightarrow 0</math> + +where the first arrow is <math>b \mapsto [(f(b), 0)] = [(0, f'(b))]</math> and the second <math>(e, e') \mapsto g(e) = g'(e')</math> thus formed is called the Baer sum of the extensions ''E'' and ''E'''. + +[[Up to]] equivalence of extensions, the Baer sum is commutative and has the trivial extension as identity element. The extension 0 → ''B'' → ''E'' → ''A'' → 0 has for opposite the same extension with exactly one of the central arrows turned to their opposite ''eg'' the morphism ''g'' is replaced by ''-g''. + +The set of extensions up to equivalence is an [[abelian group]] that is a realization of the functor Ext{{su|b=''R''|p=1}}(''A'', ''B'') + +== Construction of Ext in abelian categories == +This identification enables us to define Ext{{su|b='''Ab'''|p=1}}(''A'', ''B'') even for [[abelian categories]] '''Ab''' without reference to [[Projective module|projectives]] and [[Injective module|injectives]]. We simply take Ext{{su|b='''Ab'''|p=1}}(''A'', ''B'') to be the set of equivalence classes of extensions of ''A'' by ''B'', forming an abelian group under the Baer sum. Similarly, we can define higher Ext groups Ext{{su|b='''Ab'''|p=''n''}}(''A'', ''B'') as equivalence classes of ''n-extensions'' + +:<math>0\rightarrow B\rightarrow X_n\rightarrow\cdots\rightarrow X_1\rightarrow A\rightarrow0</math> + +under the [[equivalence relation]] generated by the relation that identifies two extensions + +:<math>0\rightarrow B\rightarrow X_n\rightarrow\cdots\rightarrow X_1\rightarrow A\rightarrow0</math> +:<math>0\rightarrow B\rightarrow X'_n\rightarrow\cdots\rightarrow X'_1\rightarrow A\rightarrow0</math> + +if there are maps ''X<sub>m</sub>'' → ''X′<sub>m</sub>'' for all ''m'' in {1, 2, ..., ''n''} so that every resulting [[Commutative diagram|square commutes]]. 
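+
+In the simplest case ''n'' = 1, these classes are just the equivalence classes of ordinary extensions described above. As a small worked example (a standard computation, spelled out here only for concreteness), take ''R'' = '''Z''', ''A'' = '''Z'''/2 and ''B'' = '''Z'''. Up to equivalence there are exactly two extensions of '''Z'''/2 by '''Z''': the split extension
+
+:<math>0\rightarrow \mathbb{Z}\rightarrow \mathbb{Z}\oplus\mathbb{Z}/2\rightarrow \mathbb{Z}/2\rightarrow0</math>
+
+and the non-split extension
+
+:<math>0\rightarrow \mathbb{Z}\stackrel{2}{\longrightarrow} \mathbb{Z}\rightarrow \mathbb{Z}/2\rightarrow0,</math>
+
+so Ext{{su|b='''Z'''|p=1}}('''Z'''/2, '''Z''') ≅ '''Z'''/2. The same answer follows from the derived-functor definition: applying Hom<sub>'''Z'''</sub>(&minus;, '''Z''') to the projective resolution <math>0\rightarrow \mathbb{Z}\stackrel{2}{\longrightarrow} \mathbb{Z}\rightarrow \mathbb{Z}/2\rightarrow0</math> and omitting the Hom(''A'', ''B'') term gives the complex <math>0\rightarrow \mathbb{Z}\stackrel{2}{\longrightarrow} \mathbb{Z}\rightarrow 0</math>, whose homology is 0 in degree 0 and '''Z'''/2 in degree 1, with all higher Ext groups vanishing.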
+ +The Baer sum of the two ''n''-extensions above is formed by letting ''X{{su|b=1|p=′′}}'' be the [[Pullback (category theory)|pullback]] of ''X<sub>1</sub>'' and ''X{{su|b=1|p=′}}'' over ''A'', and ''X{{su|b=n|p=′′}}'' be the [[Pushout (category theory)|pushout]] of ''X<sub>n</sub>'' and ''X{{su|b=n|p=′}}'' under ''B'' quotiented by the skew diagonal copy of ''B''. Then we define the Baer sum of the extensions to be + +:<math>0\rightarrow B\rightarrow X''_n\rightarrow X_{n-1}\oplus X'_{n-1}\rightarrow\cdots\rightarrow X_2\oplus X'_2\rightarrow X''_1\rightarrow A\rightarrow0.</math> + +== Further properties of Ext == +The Ext functor exhibits some convenient properties, useful in computations. + +* Ext{{su|b=''R''|p=''i''}}(''A'', ''B'') = 0 for ''i'' > 0 if either ''B'' is [[injective module|injective]] or ''A'' is [[projective module|projective]]. + +* A converse also holds: if Ext{{su|b=''R''|p=1}}(''A'', ''B'') = 0 for all ''A'', then Ext{{su|b=''R''|p=''i''}}(''A'', ''B'') = 0 for all ''A'', and ''B'' is injective; if Ext{{su|b=''R''|p=1}}(''A'', ''B'') = 0 for all ''B'', then Ext{{su|b=''R''|p=''i''}}(''A'', ''B'') = 0 for all ''B'', and ''A'' is projective. + +* <math>\operatorname{Ext}^n_R \left (\bigoplus_\alpha A_\alpha,B \right )\cong\prod_\alpha\operatorname{Ext}^n_R(A_\alpha,B)</math> + +* <math>\operatorname{Ext}^n_R \left (A,\prod_\beta B_\beta \right )\cong\prod_\beta\operatorname{Ext}^n_R(A,B_\beta)</math> + +== Ring structure and module structure on specific Exts == +One more very useful way to view the Ext functor is this: an element of Ext{{su|b=''R''|p=''n''}}(''A'', ''B'') can be considered as an equivalence class of maps ''f'': ''P<sub>n</sub>'' → ''B'' for a [[Projective module#Facts|projective resolution]] ''P''<sub>*</sub> of ''A''; then we can pick a long exact sequence ''Q''<sub>*</sub> ending with ''B'' and lift the map ''f'', using the projectivity of the modules ''P<sub>m</sub>'', to a [[Chain complex#Chain maps|chain map]] ''f''<sub>*</sub>: ''P''<sub>*</sub> → ''Q''<sub>*</sub> of degree &minus;''n''. It turns out that [[Chain complex#Chain homotopy|homotopy classes]] of such chain maps correspond precisely to the equivalence classes in the definition of Ext above. + +Under sufficiently nice circumstances, such as when the [[Ring (mathematics)|ring]] ''R'' is a [[group ring]] over a field ''k'', or an augmented ''k''-[[Algebra over a field|algebra]], we can impose a ring structure on Ext{{su|b=''R''|p=*}}(''k'', ''k''). The multiplication has quite a few equivalent interpretations, corresponding to different interpretations of the elements of Ext{{su|b=''R''|p=*}}(''k'', ''k''). + +One interpretation is in terms of these homotopy classes of chain maps. Then the product of two elements is represented by the composition of the corresponding representatives. We can choose a single resolution of ''k'', and do all the calculations inside Hom<sub>''R''</sub>(''P''<sub>*</sub>,''P''<sub>*</sub>), which is a differential graded algebra, with cohomology precisely Ext{{su|b=''R''|p=*}}(''k'', ''k''). + +The Ext groups can also be interpreted in terms of exact sequences; this has the advantage that it does not rely on the existence of projective or injective modules. Then we take the viewpoint above that an element of Ext{{su|b=''R''|p=''n''}}(''A'', ''B'') is a class, under a certain equivalence relation, of exact sequences of length ''n'' + 2 starting with ''B'' and ending with ''A''.
This can then be spliced with an element in Ext{{su|b=''R''|p=''m''}}(''C'', ''A''), by replacing ... → ''X''<sub>1</sub> → ''A'' → 0 and 0 → ''A'' → ''Y<sub>n</sub>'' → ... with: + +:<math>\cdots \rightarrow X_1\rightarrow Y_n\rightarrow \cdots </math> + +where the middle arrow is the composition of the functions ''X''<sub>1</sub> → ''A'' and ''A'' → ''Y<sub>n</sub>''. This product is called the ''Yoneda splice''. + +These viewpoints turn out to be equivalent whenever both make sense. + +Using similar interpretations, we find that Ext{{su|b=''R''|p=*}}(''k'', ''M'') is a [[Module (mathematics)|module]] over Ext{{su|b=''R''|p=*}}(''k'', ''k''), again for sufficiently nice situations. + +== Interesting examples == +If '''Z'''[''G''] is the [[group ring|integral group ring]] for a [[Group (mathematics)|group]] ''G'', then Ext{{su|b='''Z'''[''G'']|p=*}}('''Z''', ''M'') is the [[group cohomology]] H*(''G,M'') with coefficients in ''M''. + +For '''F'''<sub>''p''</sub> the finite field on ''p'' elements, we also have that H*(''G,M'') = Ext{{su|b='''F'''<sub>''p''</sub>[''G'']|p=*}}('''F'''<sub>''p''</sub>, ''M''), and it turns out that the group cohomology doesn't depend on the base ring chosen. + +If ''A'' is a ''k''-[[algebra over a field|algebra]], then Ext{{su|b=''A'' ⊗<sub>''k''</sub> ''A''<sup>op</sup>|p=*}}(''A'', ''M'') is the [[Hochschild cohomology]] HH*(''A,M'') with coefficients in the ''A''-bimodule ''M''. + +If ''R'' is chosen to be the [[universal enveloping algebra]] for a [[Lie algebra]] <math>\mathfrak g</math> over a commutative ring ''k'', then Ext{{su|b=''R''|p=*}}(''k'', ''M'') is the [[Lie algebra cohomology]] <math>\operatorname{H}^*(\mathfrak g,M)</math> with coefficients in the module ''M''. + +==See also== +* [[Tor functor]] +* The [[Grothendieck group#Grothendieck group and extensions in an abelian category|Grothendieck group]] is a construction centered on extensions +* The [[universal coefficient theorem for cohomology]] is one notable use of the Ext functor + +==References== +* {{Citation | last1=Gelfand | first1=Sergei I. | last2=Manin | first2=Yuri Ivanovich | author2-link=Yuri Ivanovich Manin | title=Homological algebra | isbn=978-3-540-65378-3 | year=1999 | publisher=Springer | location=Berlin}} +* {{Weibel IHA}} + +[[Category:Homological algebra]] +[[Category:Binary operations]] + 0r919cwvcnxpejfyyj3ppf36460y1ij + + + + Covariant derivative + 0 + 4786 + + 4787 + 2013-10-31T20:42:27Z + + Cloudswrest + 0 + + + Wikified. + wikitext + text/x-wiki + {{about|covariant derivatives|directional tensor derivatives with respect to continuum mechanics|Tensor derivative (continuum mechanics)}} + +In [[mathematics]], the '''covariant derivative''' is a way of specifying a [[derivative]] along [[tangent vector]]s of a [[manifold]]. Alternatively, the covariant derivative is a way of introducing and working with a [[connection (mathematics)|connection]] on a manifold by means of a [[differential operator]], to be contrasted with the approach given by a [[connection (principal bundle)|principal connection]] on the frame bundle – see [[affine connection]]. In the special case of a manifold isometrically embedded into a higher dimensional [[Euclidean space]], the covariant derivative can be viewed as the [[orthogonal projection]] of the Euclidean derivative along a tangent vector onto the manifold's tangent space. In this case the Euclidean derivative is broken into two parts, the extrinsic normal component and the intrinsic covariant derivative component. 
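+
+For instance, in the simplest embedded case (a standard special case, stated here for concreteness): for the unit sphere <math>S^2 \subset \R^3</math> with outward unit normal <math>\mathbf{n}</math>, the covariant derivative of a tangent vector field <math>\mathbf{Y}</math> along a tangent vector <math>\mathbf{u}</math> is the tangential part of the ordinary directional derivative,
+
+:<math>\nabla_{\mathbf{u}} \mathbf{Y} = D_{\mathbf{u}} \mathbf{Y} - \left(D_{\mathbf{u}} \mathbf{Y}\cdot\mathbf{n}\right)\mathbf{n},</math>
+
+and the discarded normal component <math>\left(D_{\mathbf{u}} \mathbf{Y}\cdot\mathbf{n}\right)\mathbf{n}</math> is the extrinsic part just mentioned.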
+ +This article presents an introduction to the covariant derivative of a [[vector field]] with respect to a vector field, both in a coordinate free language and using a local [[coordinate system]] and the traditional index notation. The covariant derivative of a [[tensor field]] is presented as an extension of the same concept. The covariant derivative generalizes straightforwardly to a notion of differentiation associated to a [[connection on a vector bundle]], also known as a '''Koszul connection'''. + +==Introduction and history== +Historically, at the turn of the 20th century, the covariant derivative was introduced by [[Gregorio Ricci-Curbastro]] and [[Tullio Levi-Civita]]<ref>Levi-Civita, T. and Ricci, G. "Méthodes de calcul différential absolu et leurs applications", ''Math. Ann. B'', '''54''' (1900) 125–201.</ref> in the theory of [[Riemannian geometry|Riemannian]] and [[pseudo-Riemannian manifold|pseudo-Riemannian geometry]]. Ricci and Levi-Civita (following ideas of [[Elwin Bruno Christoffel]]) observed that the [[Christoffel symbols]] used to define the [[Riemann tensor|curvature]]<ref>[[Riemann]], G.F.B., "Über die Hypothesen, welche der Geomtrie zu Grunde liegen", ''Gesammelte Mathematische Werke'' (1866); reprint, ed. Weber, H.: Dover, New York, 1953.</ref><ref>Christoffel, E.B., "Über die Transformation der homogenen Differentialausdrücke zweiten Grades," ''J. für die Reine und Angew. Math.'' '''70''' (1869), 46–70.</ref> could also provide a notion of [[derivative|differentiation]] which generalized the classical [[directional derivative]] of [[vector fields]] on a manifold. This new derivative – the [[Levi-Civita connection]] – was ''[[Covariance and contravariance of vectors|covariant]]'' in the sense that it satisfied Riemann's requirement that objects in geometry should be independent of their description in a particular coordinate system. + +It was soon noted by other mathematicians, prominent among these being [[Hermann Weyl]], [[Jan Arnoldus Schouten]], and [[Élie Cartan]],<ref>cf. with Cartan, E. [http://www.numdam.org/item?id=ASENS_1923_3_40__325_0 "Sur les variétés à connexion affine et la theorie de la relativité généralisée"], ''Annales, Ecole Normale'' '''40''' (1923), 325–412.</ref> that a covariant derivative could be defined abstractly without the presence of a [[metric tensor|metric]]. The crucial feature was not a particular dependence on the metric, but that the Christoffel symbols satisfied a certain precise second order transformation law. This transformation law could serve as a starting point for defining the derivative in a covariant manner. Thus the theory of covariant differentiation forked off from the strictly Riemannian context to include a wider range of possible geometries. + +In the 1940s, practitioners of [[differential geometry]] began introducing other notions of covariant differentiation in general [[vector bundle]]s which were, in contrast to the classical bundles of interest to geometers, not part of the [[tensor analysis]] of the manifold. By and large, these generalized covariant derivatives had to be specified ''ad hoc'' by some version of the connection concept. In 1950, [[Jean-Louis Koszul]] unified these new ideas of covariant differentiation in a vector bundle by means of what is known today as a '''[[connection (vector bundle)|Koszul connection]]''' <ref>Koszul, J. L. "Homologie et cohomologie des algebres de Lie", ''Bulletin de la Société Mathématique'' '''78''' (1950) 65–127.</ref> or a '''connection on a vector bundle'''. 
Using ideas from [[Lie algebra cohomology]], Koszul successfully converted many of the analytic features of covariant differentiation into algebraic ones. In particular, Koszul connections eliminated the need for awkward manipulations of [[Christoffel symbols]] (and other analogous non-[[tensor]]ial) objects in differential geometry. Thus they quickly supplanted the classical notion of covariant derivative in many post-1950 treatments of the subject. + +==Motivation== +The '''covariant derivative''' is a generalization of the [[directional derivative]] from [[vector calculus]]. As with the directional derivative, the covariant derivative is a rule, <math>\nabla_{\bold u}{\bold v}</math>, which takes as its inputs: (1) a vector, '''u''', defined at a point ''P'', and (2) a [[vector field]], '''v''', defined in a neighborhood of ''P''.<ref>The covariant derivative is also denoted variously by '''<math>\partial</math><sub>v</sub>u''', '''D<sub>v</sub>u''', or other notations.</ref> The output is the vector <math>\nabla_{\bold u}{\bold v}(P)</math>, also at the point ''P''. The primary difference from the usual directional derivative is that <math>\nabla_{\bold u}{\bold v}</math> must, in a certain precise sense, be ''independent'' of the manner in which it is expressed in a [[coordinate system]]. + +A vector may be ''described'' as a list of numbers in terms of a [[basis (mathematics)|basis]], but as a geometrical object a vector retains its own identity regardless of how one chooses to describe it in a basis. This persistence of identity is reflected in the fact that when a vector is written in one basis, and then the basis is changed, the vector transforms according to a [[change of basis]] formula. Such a transformation law is known as a [[covariant transformation]]. The covariant derivative is required to transform, under a change in coordinates, in the same way as a vector does: the covariant derivative must change by a covariant transformation (hence the name). + +In the case of [[Euclidean space]], one tends to define the derivative of a vector field in terms of the difference between two vectors at two nearby points. +In such a system one [[Translation (geometry)|translates]] one of the vectors to the origin of the other, keeping it parallel. With a Cartesian (fixed [[orthonormal]]) coordinate system we thus obtain the simplest example: covariant derivative which is obtained by taking the derivative of the components. + +In the general case, however, one must take into account the change of the coordinate system. For example, if the same covariant derivative is written in [[coordinates (elementary mathematics)|polar coordinates]] in a two dimensional Euclidean plane, then it contains extra terms that describe how the coordinate grid itself "rotates". In other cases the extra terms describe how the coordinate grid expands, contracts, twists, interweaves, etc. + +Consider the example of moving along a curve γ(''t'') in the Euclidean plane. In polar coordinates, γ may be written in terms of its radial and angular coordinates by γ(''t'') = (''r''(''t''), θ(''t'')). A vector at a particular time ''t''<ref>In many applications, it may be better not to think of ''t'' as corresponding to time, at least for applications in [[general relativity]]. 
It is simply regarded as an abstract parameter varying smoothly and monotonically along the path.</ref> (for instance, the acceleration of the curve) is expressed in terms of <math>({\mathbf e}_r, {\mathbf e}_{\theta})</math>, where <math>{\mathbf e}_r</math> and <math>{\mathbf e}_{\theta}</math> are unit tangent vectors for the polar coordinates, serving as a basis to decompose a vector in terms of radial and [[tangential component]]s. At a slightly later time, the new basis in polar coordinates appears slightly rotated with respect to the first set. The covariant derivative of the basis vectors (the [[Christoffel symbols]]) serve to express this change. +{{Clear}} + +In a curved space, such as the surface of the Earth (regarded as a sphere), the [[Translation (geometry)|translation]] is not well defined and its analog, [[parallel transport]], depends on the path along which the vector is translated. + +A vector '''e''' on a globe on the equator in Q is directed to the north. Suppose we [[parallel transport]] the vector first along the equator until P and then (keeping it parallel to itself) drag it along a meridian to the pole N and (keeping the direction there) subsequently transport it along another meridian back to Q. Then we notice that the parallel-transported vector along a closed circuit does not return as the same vector; instead, it has another orientation. This would not happen in Euclidean space and is caused by the ''curvature'' of the surface of the globe. The same effect can be noticed if we drag the vector along an infinitesimally small closed surface subsequently along two directions and then back. The infinitesimal change of the vector is a measure of the curvature. +{{Clear}} + +===Remarks=== +* The definition of the covariant derivative does not use the metric in space. However, for each metric there is a unique [[Torsion tensor|torsion]]-free covariant derivative called the [[Levi-Civita connection]] such that the covariant derivative of the metric is zero. + +* The properties of a derivative imply that <math>\nabla_{\mathbf v} {\mathbf u}</math> depends on an arbitrarily small neighborhood of a point ''p'' in the same way as e.g. the derivative of a scalar function along a curve at a given point ''p'' depends on an arbitrarily small neighborhood of ''p''. + +* The information on the neighborhood of a point ''p'' in the covariant derivative can be used to define [[parallel transport]] of a vector. Also the [[Curvature of Riemannian manifolds|curvature]], [[Torsion tensor|torsion]], and [[geodesic]]s may be defined only in terms of the covariant derivative or other related variation on the idea of a [[linear connection]]. + +==Informal definition using an embedding into Euclidean space== +Assume a (pseudo) Riemann manifold is embedded into Euclidean space <math>(\R^n, \langle\cdot;\cdot\rangle)</math> via a (twice continuously) differentiable mapping <math>\vec\Psi : \R^d \supset U \rightarrow \R^n</math> such that the tangent space at <math>\vec\Psi(p) \in M</math> is spanned by the vectors +:<math>\left\lbrace \left. \frac{\partial\vec\Psi}{\partial x^i} \right|_p : i \in \lbrace1, \dots d\rbrace\right\rbrace</math> + +and the scalar product on <math>\R^n</math> is compatible with the metric on ''M'': <math>g_{ij} = \left\langle \frac{\partial\vec\Psi}{\partial x^i} ; \frac{\partial\vec\Psi}{\partial x^j} \right\rangle</math>. 
(Since the manifold metric is always assumed to be regular, the compatibility condition implies linear independence of the partial derivative tangent vectors.) + +For a tangent vector field +:<math>\vec V = v^j \frac{\partial \vec\Psi}{\partial x^j}\quad</math> one has <math>\quad\frac{\partial\vec V}{\partial x^i} = \frac{\partial v^j}{\partial x^i} \frac{\partial\vec \Psi}{\partial x^j} + v^j \frac{\partial^2 \vec\Psi}{\partial x^i \, \partial x^j} </math>. +The last term is not tangential to ''M'', but can be expressed as a linear combination of the tangent space base vectors using the [[Christoffel symbols]] as linear factors plus a non-tangent vector: +:<math> +\frac{\partial^2 \vec\Psi}{\partial x^i \, \partial x^j} = \Gamma^k{}_{ij} \frac{\partial\vec\Psi}{\partial x^k} + \vec n +</math>. +The covariant derivative is defined as just a tangential portion of the usual derivative: +:<math> +\nabla_i \vec V := \frac{\partial\vec V}{\partial x^i} - \vec n = \left( \frac{\partial v^k}{\partial x^i} + v^j \Gamma^k{}_{ij} \right) \frac{\partial\vec\Psi}{\partial x^k}. +</math> +In the case of the [[Levi-Civita connection]] <math>\vec n</math> is required to be orthogonal to tangent space, so +:<math> +\left\langle \frac{\partial^2 \vec\Psi}{\partial x^i \, \partial x^j} ; \frac{\partial\vec \Psi}{\partial x^l} \right\rangle = \Gamma^k{}_{ij} \left\langle \frac{\partial\vec\Psi}{\partial x^k} ; \frac{\partial\vec\Psi}{\partial x^l} \right\rangle = \Gamma^k{}_{ij} \, g_{kl} +</math>. +On the other hand +:<math> +\frac{\partial g_{ab}}{\partial x^c} = \left\langle \frac{\partial^2 \vec\Psi}{ \partial x^c \, \partial x^a} ; \frac{\partial \vec\Psi}{\partial x^b} \right\rangle + \left\langle \frac{\partial \vec\Psi}{\partial x^a} ; \frac{\partial^2 \vec\Psi}{ \partial x^c \, \partial x^b} \right\rangle +</math> +implies (using the symmetry of the scalar product and swapping the order of partial differentiations) +:<math> +\frac{\partial g_{jk}}{\partial x^i} + \frac{\partial g_{ki}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k} = 2\left\langle \frac{\partial^2 \vec\Psi}{\partial x^i \, \partial x^j} ; \frac{\partial\vec \Psi}{\partial x^k} \right\rangle +</math> +and yields the Christoffel symbols for the Levi-Civita connection in terms of the metric: +:<math> +g_{kl} \Gamma^k{}_{ij} = \frac{1}{2} \left( \frac{\partial g_{jl}}{\partial x^i} + \frac{\partial g_{li}}{\partial x^j}- \frac{\partial g_{ij}}{\partial x^l}\right). +</math> + +==Formal definition== +A covariant derivative is a [[connection (vector bundle)|(Koszul) connection]] on the [[tangent bundle]] and other [[tensor bundle]]s. Thus it has a certain behavior on functions, on vector fields, on the duals of vector fields (i.e., [[cotangent space|covector]] fields), and most generally of all, on arbitrary [[tensor field]]s. + +===Functions=== +Given a function <math>f\,</math>, the covariant derivative <math>\nabla_{\mathbf v}f</math> coincides with the normal differentiation of a real function in the direction of the vector '''v''', usually denoted by <math>{\mathbf v}f</math> and by <math>df({\mathbf v})</math>. 
+ +===Vector fields=== +A '''covariant derivative''' <math>\nabla</math> of a vector field <math>{\mathbf u}</math> in the direction of the vector <math>{\mathbf v} </math> denoted <math>\nabla_{\mathbf v} {\mathbf u}</math> is defined by the following properties for any vector '''v''', vector fields '''u, w''' and scalar functions ''f'' and ''g'': +# <math>\nabla_{\mathbf v} {\mathbf u}</math> is algebraically linear in <math>{\mathbf v}</math> so <math>\nabla_{f{\mathbf v}+g{\mathbf w}} {\mathbf u}=f\nabla_{\mathbf v} {\mathbf u}+g\nabla_{\mathbf w} {\mathbf u}</math> +# <math>\nabla_{\mathbf v} {\mathbf u}</math> is additive in <math>{\mathbf u}</math> so <math>\nabla_{\mathbf v}({\mathbf u}+{\mathbf w})=\nabla_{\mathbf v} {\mathbf u}+\nabla_{\mathbf v} {\mathbf w}</math> +# <math>\nabla_{\mathbf v} {\mathbf u}</math> obeys the [[product rule]], i.e. <math>\nabla_{\mathbf v} f{\mathbf u}=f\nabla_{\mathbf v} {\mathbf u}+{\mathbf u}\nabla_{\mathbf v}f</math> where <math>\nabla_{\mathbf v}f</math> is defined above. + +Note that <math>\nabla_{\mathbf v} {\mathbf u}</math> at point ''p'' depends on the value of '''v''' at ''p'' and on values of '''u''' in a neighbourhood of ''p'' because of the last property, the product rule. + +===Covector fields=== +Given a field of [[Cotangent space|covectors]] (or [[one-form]]) <math>\alpha</math>, its covariant derivative <math>\nabla_{\mathbf v}\alpha</math> can be defined using the following identity which is satisfied for all vector fields '''u''' +:<math>(\nabla_{\mathbf v}\alpha)({\mathbf u})=\nabla_{\mathbf v}(\alpha({\mathbf u}))-\alpha(\nabla_{\mathbf v}{\mathbf u}).</math> +The covariant derivative of a covector field along a vector field '''v''' is again a covector field. + +===Tensor fields=== +Once the covariant derivative is defined for fields of vectors and covectors it can be defined for arbitrary [[Tensor (intrinsic definition)|tensor]] fields using the following identities where <math>\varphi</math> and <math>\psi\,</math> are any two tensors: +:<math>\nabla_{\mathbf v}(\varphi\otimes\psi)=(\nabla_{\mathbf v}\varphi)\otimes\psi+\varphi\otimes(\nabla_{\mathbf v}\psi),</math> +and if <math>\varphi</math> and <math>\psi</math> are tensor fields of the same tensor bundle then +:<math>\nabla_{\mathbf v}(\varphi+\psi)=\nabla_{\mathbf v}\varphi+\nabla_{\mathbf v}\psi.</math> +The covariant derivative of a tensor field along a vector field '''v''' is again a tensor field of the same type. + +Explicitly, let ''T'' be a tensor field of type (''p'',''q''). Consider ''T'' to be a differentiable [[multilinear map]] of [[smooth function|smooth]] [[section (fiber bundle)|sections]] α<sup>1</sup>, α<sup>2</sup>, ..., α<sup>q</sup> of the cotangent bundle ''T*M'' and of sections ''X''<sub>1</sub>, ''X''<sub>2</sub>, ... ''X''<sub>p</sub> of the [[tangent bundle]] ''TM'', written ''T''(α<sup>1</sup>, α<sup>2</sup>, ..., ''X''<sub>1</sub>, ''X''<sub>2</sub>, ...) into '''R'''. 
The covariant derivative of ''T'' along ''Y'' is given by the formula + +:<math>(\nabla_Y T)(\alpha_1, \alpha_2, \ldots, X_1, X_2, \ldots) =Y(T(\alpha_1,\alpha_2,\ldots,X_1,X_2,\ldots))</math> +::<math>- T(\nabla_Y\alpha_1, \alpha_2, \ldots, X_1, X_2, \ldots) +- T(\alpha_1, \nabla_Y\alpha_2, \ldots, X_1, X_2, \ldots) -\ldots </math> +::<math>- T(\alpha_1, \alpha_2, \ldots, \nabla_YX_1, X_2, \ldots) +- T(\alpha_1, \alpha_2, \ldots, X_1, \nabla_YX_2, \ldots) - \ldots +</math> + +==Coordinate description== +{{Dablink|This section uses the [[Einstein summation convention]].}} +Given coordinate functions +: <math>x^i,\ i=0,1,2,\dots</math>, +any [[tangent vector]] can be described by its components in the basis +: <math>\mathbf{e}_i={\partial\over\partial x^i}</math>. +The covariant derivative of a basis vector along a basis vector is again a vector and so can be expressed as a linear combination <math>\Gamma^k {\mathbf e}_k\,</math>. +To specify the covariant derivative it is enough to specify the covariant derivative of each basis vector field <math>{\mathbf e}_j\,</math> along <math>{\mathbf e}_i\,</math>. +:<math> \nabla_{{\mathbf e}_i} {\mathbf e}_j = \Gamma^k {}_{i j} {\mathbf e}_k,</math> +the coefficients <math>\Gamma^k_{\ i j}</math> are called '''[[Christoffel symbols]]'''. +Then using the rules in the definition, we find that for general vector fields <math>{\mathbf v}= v^ie_i</math> and <math>{\mathbf u}= u^je_j</math> we get +:<math> \nabla_{\mathbf v} {\mathbf u} = \nabla_{v^i {\mathbf e}_i} u^j {\mathbf e}_j = v^i \nabla_{{\mathbf e}_i} u^j{\mathbf e}_j = v^i u^j \nabla_{{\mathbf e}_i} {\mathbf e}_j + v^i {\mathbf e}_j \nabla_{{\mathbf e}_i} u^j = v^i u^j \Gamma^k {}_{i j}{\mathbf e}_k+v^i{\partial u^j\over\partial x^i} {\mathbf e}_j </math> +so +:<math> \nabla_{\mathbf v} {\mathbf u} = \left(v^i u^j \Gamma^k {}_{i j}+v^i{\partial u^k\over\partial x^i}\right){\mathbf e}_k</math> +The first term in this formula is responsible for "twisting" the coordinate system with respect to the covariant derivative and the second for changes of components of the vector field ''u''. In particular +:<math>\nabla_{{\mathbf e}_j} {\mathbf u}=\nabla_j {\mathbf u} = \left( \frac{\partial u^i}{\partial x^j} + u^k \Gamma^i {}_{jk} \right) {\mathbf e}_i </math> +In words: the covariant derivative is the usual derivative along the coordinates with correction terms which tell how the coordinates change. + +The covariant derivative of a type (''r'',''s'') tensor field along <math>e_c</math> is given by the expression: + +:<math> (\nabla_c T)^{a_1 \ldots a_r}{}_{b_1 \ldots b_s} = \frac{\partial}{\partial x^c}T^{a_1 \ldots a_r}{}_{b_1 \ldots b_s}+\,\Gamma ^{a_1}{}_{dc} T ^{d \ldots a_r}{}_{b_1 \ldots b_s} + \cdots + \Gamma ^{a_r}{}_{dc} T ^{a_1 \ldots a_{r-1}d}{}_{b_1 \ldots b_s} </math> +::::::::<math> -\,\Gamma ^d {}_{b_1 c} T ^{a_1 \ldots a_r}{}_{d \ldots b_s} - \cdots - \Gamma ^d {}_{b_s c} T ^{a_1 \ldots a_r}{}_{b_1 \ldots b_{s-1} d}. +</math> + +Or, in words: take the partial derivative of the tensor and add: a <math>+\Gamma^{a_i}{}_{dc}</math> for every upper index <math>a_i</math>, and a <math>-\Gamma^{d}{}_{b_ic}</math> for every lower index <math>b_i</math>. + +If instead of a tensor, one is trying to differentiate a ''[[tensor density]]'' (of weight +1), then you also add a term +:<math>-\Gamma^d{}_{d c} T^{a_1 \ldots a_r}{}_{b_1 \ldots b_s}.</math> +If it is a tensor density of weight ''W'', then multiply that term by ''W''. 
+For example, <math>\sqrt{-g}</math> is a scalar density (of weight +1), so we get: +:<math>(\sqrt{-g})_{;c} = (\sqrt{-g})_{,c} - \sqrt{-g}\,\Gamma^{d}{}_{d c}</math> +where semicolon ";" indicates covariant differentiation and comma "," indicates partial differentiation. Incidentally, this particular expression is equal to zero, because the covariant derivative of a function solely of the metric is always zero. + +==Examples== +For a scalar field <math>\displaystyle \phi\,</math>, covariant differentiation is simply partial differentiation: +:<math>\displaystyle \phi_{;a}\equiv \partial_a \phi</math> + +For a contravariant vector field <math>\lambda^a\,</math>, we have: +:<math>\lambda^a{}_{;b}\equiv \partial_b \lambda^a+\Gamma^a{}_{bc}\lambda^c</math> + +For a covariant vector field <math>\lambda_a\,</math>, we have: +:<math>\lambda_{a;c}\equiv \partial_c \lambda_a-\Gamma^b{}_{c a}\lambda_b</math> + +For a type (2,0) tensor field <math>\tau^{a b}\,</math>, we have: +:<math>\tau^{a b}{}_{;c}\equiv \partial_c \tau^{a b}+\Gamma^a{}_{c d}\tau^{d b}+\Gamma^b{}_{c d}\tau^{a d}</math> + +For a type (0,2) tensor field <math>\tau_{a b}\,</math>, we have: +:<math>\tau_{a b ;c}\equiv \partial_c \tau_{a b}-\Gamma^d{}_{c a}\tau_{d b}-\Gamma^d{}_{c b}\tau_{a d}</math> + +For a type (1,1) tensor field <math>\tau^{a}{}_{b}\,</math>, we have: +:<math>\tau^{a}{}_{b;c}\equiv \partial_c \tau^{a}{}_{b}+\Gamma^a{}_{c d}\tau^d{}_b-\Gamma^d{}_{c b}\tau^{a}{}_{d}</math> + +The notation above is meant in the sense +:<math>\tau^{a b}{}_{;c}\equiv (\nabla_{{\mathbf e}_c}\tau)^{a b}</math> + +One must always remember that covariant derivatives do not commute, i.e. <math>\lambda_{a;bc}\neq\lambda_{a;cb}\,</math>. It is actually easy to show that: +:<math> \lambda_{a;bc}-\lambda_{a;cb}=R^d{}_{abc}\lambda_d</math> +where <math>R^d{}_{abc} \,</math> is the [[Riemann tensor]]. Similarly, +:<math> \lambda^a{}_{;bc}-\lambda^a{}_{;cb}=-R^a{}_{dbc}\lambda^d</math> +and +:<math> \tau^{ab}{}_{;cd}-\tau^{ab}{}_{;dc}=-R^a{}_{ecd}\tau^{eb}-R^b{}_{ecd}\tau^{ae}</math> +The latter can be shown by taking (without loss of generality) that <math>\tau^{ab}=\lambda^a \mu^b \,</math>. + +==Notation== +In textbooks on physics, the covariant derivative is sometimes simply stated in terms of its components in this equation. + +Often a notation is used in which the covariant derivative is given with a [[semicolon]], while a normal [[partial derivative]] is indicated by a [[comma]]. In this notation we write the same as: +:<math> + \nabla_{e_j} {\mathbf v} \ \stackrel{\mathrm{def}}{=}\ v^s {}_{;j}e_s \;\;\;\;\;\; + v^i {}_{;j} = + v^i {}_{,j} + v^k\Gamma^i {}_{k j} +</math> +Once again this shows that the covariant derivative of a vector field is not just simply obtained by differentiating to the coordinates <math> v^i {}_{,j}</math>, but also depends on the vector '''v''' itself through <math> v^k\Gamma^i {}_{k j}</math>. 
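+
+The coordinate formulas above lend themselves to a symbolic check. The following script is only an illustrative sketch (the coordinate system and the sample vector field are arbitrary choices): it uses the SymPy library to compute the Christoffel symbols of the Levi-Civita connection for polar coordinates on the Euclidean plane, with metric <math>ds^2 = dr^2 + r^2 d\theta^2</math>, from <math>\Gamma^k{}_{ij} = \frac{1}{2} g^{kl}\left(\frac{\partial g_{jl}}{\partial x^i} + \frac{\partial g_{li}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^l}\right)</math>, and then evaluates the components <math>v^i{}_{;j} = v^i{}_{,j} + v^k\Gamma^i{}_{kj}</math> for a sample vector field.
+
+<syntaxhighlight lang="python">
+import sympy as sp
+
+# Illustrative sketch: Christoffel symbols and a covariant derivative in
+# polar coordinates on the Euclidean plane (metric ds^2 = dr^2 + r^2 dtheta^2).
+r, th = sp.symbols('r theta', positive=True)
+coords = [r, th]
+n = len(coords)
+
+g = sp.Matrix([[1, 0], [0, r**2]])   # metric components g_ij
+g_inv = g.inv()
+
+# Gamma[k][i][j] = (1/2) g^{kl} (d_i g_{jl} + d_j g_{li} - d_l g_{ij})
+Gamma = [[[sp.simplify(sum(g_inv[k, l] * (sp.diff(g[j, l], coords[i])
+                                          + sp.diff(g[l, i], coords[j])
+                                          - sp.diff(g[i, j], coords[l])) / 2
+                           for l in range(n)))
+           for j in range(n)] for i in range(n)] for k in range(n)]
+
+print(Gamma[0][1][1])   # Gamma^r_{theta theta}, expected -r
+print(Gamma[1][0][1])   # Gamma^theta_{r theta}, expected 1/r
+
+# Covariant derivative v^i_{;j} = v^i_{,j} + Gamma^i_{kj} v^k of a sample field.
+v = [r**2, sp.sin(th)]   # arbitrary components (v^r, v^theta)
+nabla_v = [[sp.simplify(sp.diff(v[i], coords[j])
+                        + sum(Gamma[i][k][j] * v[k] for k in range(n)))
+            for j in range(n)] for i in range(n)]
+print(nabla_v)
+</syntaxhighlight>
+
+The two printed symbols, <math>\Gamma^r{}_{\theta\theta} = -r</math> and <math>\Gamma^\theta{}_{r\theta} = 1/r</math>, are the familiar polar-coordinate values behind the rotating basis vectors discussed in the motivation section above.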
+ +In some older texts (notably Adler, Bazin & Schiffer, ''Introduction to General Relativity''), the covariant derivative is denoted by a double pipe: +:<math> + \nabla_j {\mathbf v} \ \stackrel{\mathrm{def}}{=}\ v^i {}_{||j} \;\;\;\;\;\; +</math> + +==Derivative along curve== +Since the covariant derivative <math>\nabla_XT</math> of a tensor field <math>T</math> at a point <math>p</math> depends only on value of the vector field <math>X</math> at <math>p</math> one can define the covariant derivative along a smooth curve <math>\gamma(t)</math> in a manifold: +:<math>D_tT=\nabla_{\dot\gamma(t)}T.</math> +Note that the tensor field <math>T</math> only needs to be defined on the curve <math>\gamma(t)</math> for this definition to make sense. + +In particular, <math>\dot{\gamma}(t)</math> is a vector field along the curve <math>\gamma</math> itself. If <math>\nabla_{\dot\gamma(t)}\dot\gamma(t)</math> vanishes then the curve is called a geodesic of the covariant derivative. If the covariant derivative is the [[Levi-Civita connection]] of a certain metric then the geodesics for the connection are precisely the [[geodesics]] of the [[Metric tensor|metric]] that are parametrised by arc length. + +The derivative along a curve is also used to define the [[parallel transport]] along the curve. + +Sometimes the covariant derivative along a curve is called '''absolute''' or '''intrinsic derivative'''. + +==Relation to Lie derivative== +A covariant derivative introduces an extra geometric structure on a manifold which allows vectors in neighboring tangent spaces to be compared. This extra structure is necessary because there is no canonical way to compare vectors from different vector spaces, as is necessary for this generalization of the [[directional derivative]]. There is however another generalization of directional derivatives which ''is'' canonical: the [[Lie derivative]]. The Lie derivative evaluates the change of one vector field along the flow of another vector field. Thus, one must know both vector fields in an open neighborhood. The covariant derivative on the other hand introduces its own change for vectors in a given direction, and it only depends on the vector direction at a single point, rather than a vector field in an open neighborhood of a point. In other words, the covariant derivative is linear (over ''C''<sup>∞</sup>(''M'')) in the direction argument, while the Lie derivative is linear in neither argument. + +Note that the antisymmetrized covariant derivative ∇<sub>u</sub>''v'' − ∇<sub>v</sub>''u'', and the Lie derivative ''L''<sub>u</sub>''v'' differ by the [[torsion of connection|torsion of the connection]], so that if a connection is symmetric, then its antisymmetrization ''is'' the Lie derivative. + +==See also== +<div style="-moz-column-count:2; column-count:2;"> +* [[Affine connection]] +* [[Christoffel symbols]] +* [[Connection (algebraic framework)]] +* [[Connection (mathematics)]] +* [[Connection (vector bundle)]] +* [[Connection form]] +* [[Exterior covariant derivative]] +* [[Gauge covariant derivative]] +* [[Introduction to mathematics of general relativity]] +* [[Levi-Civita connection]] +* [[Parallel transport]] +* [[Ricci calculus]] +* [[Tensor derivative (continuum mechanics)]] +</div> + +==Notes== +{{Reflist}} + +==References== +*{{Cite book| author=Kobayashi, Shoshichi and Nomizu, Katsumi | title = [[Foundations of Differential Geometry]], Vol. 
1 | publisher=[[Wiley Interscience]] | year=1996 (New edition) |isbn = 0-471-15733-3}} +*{{springer|id=c/c026870|title=Covariant differentiation|author=I.Kh. Sabitov}} +*{{Cite book|first=Shlomo|last=Sternberg|title=Lectures on Differential Geometry|year=1964|publisher=Prentice-Hall}} +*{{Cite book|first=Michael|last=Spivak|title=A Comprehensive Introduction to Differential Geometry (Volume Two)|publisher=Publish or Perish, Inc.|year=1999}} + +{{tensors}} + +{{DEFAULTSORT:Covariant Derivative}} +[[Category:Differential geometry]] +[[Category:Riemannian geometry]] +[[Category:Connection (mathematics)]] +[[Category:Mathematical methods in general relativity]] +[[Category:Solid mechanics]] + q1okzxofz2sogv51rkamubbz1565nyg + + + + Anderson localization + 0 + 10208 + + 10209 + 2013-12-19T22:17:38Z + + 216.171.5.23 + + /* Analysis */ + wikitext + text/x-wiki + In [[condensed matter physics]], '''Anderson localization''', also known as '''strong localization''', is the absence of diffusion of waves in a ''disordered'' medium. This phenomenon is named after the American physicist [[P. W. Anderson]], who was the first one to suggest the possibility of electron localization inside a semiconductor, provided that the degree of [[Randomness#In_the_physical_sciences|randomness]] of the [[impurities]] or [[crystallographic defect|defects]] is sufficiently large.<ref name=a58>{{ cite journal | last = Anderson | first = P. W. | authorlink = | coauthors = | year = 1958 | month = | title = Absence of Diffusion in Certain Random Lattices | journal = [[Physical Review|Phys. Rev.]] | volume = 109 | issue = 5| pages = 1492&ndash;1505 | doi = 10.1103/PhysRev.109.1492 | url = | accessdate = | quote = |bibcode = 1958PhRv..109.1492A }}</ref> + +Anderson localization is a general wave phenomenon that applies to the transport of electromagnetic waves, acoustic waves, quantum waves, spin waves, etc. This phenomenon is to be distinguished from [[weak localization]], which is the precursor effect of Anderson localization (see below), and from [[Mott transition|Mott localization]], named after Sir [[Nevill Mott]], where the transition from metallic to insulating behaviour is ''not'' due to disorder, but to a strong mutual [[Coulomb repulsion]] of electrons. + +==Introduction== + +In the original '''Anderson tight-binding model''', the evolution of the [[wave function]] ''&psi;'' on the ''d''-dimensional lattice '''Z'''<sup>''d''</sup> is given by the [[Schrödinger equation]] + +:<math> i \hbar \dot{\psi} = H \psi~, </math> + +where the [[Hamiltonian (quantum mechanics)|Hamiltonian]] ''H'' is given by + +:<math> (H \phi)(j) = E_j \phi(j) + \sum_{k \neq j} V(|k-j|) \phi(k)~, </math> + +with ''E''<sub>''j''</sub> random and independent, and interaction ''V''(''r'') falling off as ''r''<sup>-2</sup> at infinity. For example, one may take ''E''<sub>''j''</sub> uniformly distributed in [&minus;''W'', &nbsp; +''W''], and + +:<math> V(|r|) = \begin{cases} 1, & |r| = 1 \\ 0, &\text{otherwise.} \end{cases} </math> + +Starting with ''&psi;''<sub>0</sub> localised at the origin, one is interested in how fast the probability distribution <math>|\psi|^2</math> diffuses. Anderson's analysis shows the following: + +* if ''d'' is 1 or 2 and ''W'' is arbitrary, or if ''d'' &ge; 3 and ''W''/ħ is sufficiently large, then the probability distribution remains localized: + +::<math> \sum_{n \in \mathbb{Z}^d} |\psi(t,n)|^2 |n| \leq C </math> + +:uniformly in ''t''. This phenomenon is called '''Anderson localization'''. 
+ +* if ''d'' &ge; 3 and ''W''/ħ is small, + +:<math> \sum_{n \in \mathbb{Z}^d} |\psi(t,n)|^2 |n| \approx D \sqrt{t}~, </math> + +:where ''D'' is the diffusion constant. + +==Analysis== +[[File:WF111-Anderson transition-multifractal.jpeg|thumbnail|Example of a multifractal electronic eigenstate at the Anderson localization transition in a system with 1367631 atoms.]] + +The phenomenon of Anderson localization, particularly that of weak localization, finds its origin in the [[wave interference]] between multiple-scattering paths. In the strong scattering limit, the severe interferences can completely halt the waves inside the disordered medium. + +For non-interacting electrons, a highly successful approach was put forward in 1979 by Abrahams ''et al.''<ref>{{cite journal|last1=Abrahams|first1=E.|last2=Anderson|first2=P.W.|last3=Licciardello|first3=D.C.|last4=Ramakrishnan|first4=T.V.|title=Scaling Theory of Localization: Absence of Quantum Diffusion in Two Dimensions|year=1979|journal=Phys. Rev. Lett.|volume=42|issue=10|pages=673&ndash;676|url=http://link.aps.org/doi/10.1103/PhysRevLett.42.673|doi=10.1103/PhysRevLett.42.673|bibcode = 1979PhRvL..42..673A }}</ref> This scaling hypothesis of localization suggests that a disorder-induced [[metal-insulator transition]] (MIT) exists for non-interacting electrons in three dimensions (3D) at zero magnetic field and in the absence of spin-orbit coupling. Much further work has subsequently supported these scaling arguments both analytically and numerically (Brandes ''et al.'', 2003; see Further Reading). In 1D and 2D, the same hypothesis shows that there are no extended states and thus no MIT. However, since 2 is the lower critical dimension of the localization problem, the 2D case is in a sense close to 3D: states are only marginally localized for weak disorder and a small magnetic field or [[spin-orbit coupling]] can lead to the existence of extended states and thus an MIT. Consequently, the localization lengths of a 2D system with potential-disorder can be quite large so that in numerical approaches one can always find a localization-delocalization transition when either decreasing system size for fixed disorder or increasing disorder for fixed system size. + +Most numerical approaches to the localization problem use the standard tight-binding Anderson [[Hamiltonian (quantum mechanics)|Hamiltonian]] with onsite-potential disorder. Characteristics of the electronic [[eigenstate]]s are then investigated by studies of participation numbers obtained by exact diagonalization, multifractal properties, level statistics and many others. Especially fruitful is the [[transfer-matrix method]] (TMM) which allows a direct computation of the localization lengths and further validates the scaling hypothesis by a numerical proof of the existence of a one-parameter scaling function. Direct numerical solution of Maxwell equations to demonstrate Anderson localization of light has been implemented (Conti and Fratalocchi, 2008). The phenomenon has also been observed in numerical simulation of the non-relativistic Schrödinger equation. + +==Experimental evidence== +Two reports of Anderson localization of light in 3D random media exist up to date (Wiersma ''et al.'', 1997 and Storzer ''et al.'', 2006; see Further Reading), even though absorption complicates interpretation of experimental results (Scheffold ''et al.'', 1999). 
Anderson localization can also be observed in a perturbed periodic potential where the transverse localization of light is caused by random fluctuations on a photonic lattice. Experimental realizations of transverse localization were reported for a 2D lattice (Schwartz ''et al.'', 2007) and a 1D lattice (Lahini ''et al.'', 2006). It has also been observed by localization of a [[Bose–Einstein condensate]] in a 1D disordered optical potential (Billy ''et al.'', 2008; Roati ''et al.'', 2008). Anderson localization of elastic waves in a 3D disordered medium has been reported (Hu ''et al.'', 2008). The observation of the MIT has been reported in a 3D model with atomic matter waves (Chabé ''et al.'', 2008). [[Random laser]]s can operate using this phenomenon. + +==Notes== +{{Reflist}} + +==Further reading== + +*{{ Cite document + | last1=Brandes + | first1=T. + | last2=Kettemann + | first2=S. + | lastauthoramp=yes + | title= The Anderson Transition and its Ramifications --- Localisation, Quantum Interference, and Interactions + | publisher=Springer Verlag + | place=Berlin + | year=2003 + | postscript=<!-- Bot inserted parameter. Either remove it; or change its value to "." for the cite to end in a ".", as necessary. -->{{inconsistent citations}} +}} + +*{{ cite journal | last = Wiersma | first = Diederik S. | authorlink = | coauthors = ''et al.'' | year = 1997 | month = | title = Localization of light in a disordered medium | journal = [[Nature (journal)|Nature]] | volume = 390 | issue = 6661 | pages = 671&ndash;673 | doi = 10.1038/37757 | url = | accessdate = | quote = |bibcode = 1997Natur.390..671W }} + +*{{ cite journal | last = Störzer | first = Martin | authorlink = | coauthors = ''et al.'' | year = 2006 | month = | title = Observation of the critical regime near Anderson localization of light | journal = [[Physical Review Letters|Phys. Rev. Lett.]] | volume = 96 | issue = 6| pages = 063904 | doi = 10.1103/PhysRevLett.96.063904 |pmid=16605998 | url = | accessdate = | quote = | bibcode=2006PhRvL..96f3904S|arxiv = cond-mat/0511284 }} + +*{{ cite journal | last = Scheffold | first = Frank | authorlink = | coauthors = ''et al.'' | year = 1999 | month = | title = Localization or classical diffusion of light? | journal = Nature | volume = 398 | issue = 6724| pages = 206&ndash;207 | doi = 10.1038/18347 | url = | accessdate = | quote = |bibcode = 1999Natur.398..206S }} + +*{{ cite journal | last = Schwartz| first = T. | authorlink = | coauthors = ''et al.'' | year = 2007 | month = | title = Transport and Anderson Localization in disordered two-dimensional Photonic Lattices | journal = Nature | volume = 446| issue = 7131| pages = 52&ndash;55 | doi = 10.1038/nature05623 | url = | accessdate = | quote = | pmid = 17330037 |bibcode = 2007Natur.446...52S }} + +*{{ cite journal | last = Lahini| first = Y. 
| authorlink = | coauthors = ''et al.'' | year = 2006 | month = | title = Direct Observation of Anderson Localized Modes and the Effect of Nonlinearity | journal = Photonic Metamaterials: From Random to Periodic (META), Grand Bahama Island, The Bahamas, June 5, 2006, Postdeadline Papers | volume = | issue = | pages = | doi = | url = http://www.opticsinfobase.org/abstract.cfm?URI=META-2006-ThC4 | accessdate = | quote = | pmid = | bibcode=}} + +*{{ cite journal | last = Billy | first = Juliette | authorlink = | coauthors = ''et al.'' | year = 2008 | month = | title = Direct observation of Anderson localization of matter waves in a controlled disorder | journal = Nature | volume = 453 | issue = 7197 | pages = 891&ndash;894 | doi = 10.1038/nature07000 | url = | accessdate = | quote = | pmid = 18548065 |bibcode = 2008Natur.453..891B |arxiv = 0804.1621 }} + +*{{ cite journal | last = Roati | first = Giacomo | authorlink = | coauthors = ''et al.'' | year = 2008 | month = | title = Anderson localization of a non-interacting Bose-Einstein condensate | journal = Nature | volume = 453 | issue = 7197 | pages = 895&ndash;898 | doi = 10.1038/nature07071 | url = | accessdate = | quote = | pmid = 18548066 |bibcode = 2008Natur.453..895R |arxiv = 0804.2609 }} + +*{{ cite journal | last = Ludlam | first = J. J. | authorlink = | coauthors = ''et al.'' | year = 2005 | month = | title = Universal features of localized eigenstates in disordered systems | journal = Journal of Physics: Condensed Matter | volume = 17 | issue = 30| pages = L321–L327 | doi = 10.1088/0953-8984/17/30/L01 | url = | accessdate = | quote = |bibcode = 2005JPCM...17L.321L }} + +*{{ cite journal | last = Conti | first = C | authorlink = | coauthors = A. Fratalocchi | year = 2008 | month = | title = Dynamic light diffusion, three-dimensional Anderson localization and lasing in inverted opals | journal = [[Nature Physics]] | volume = 4 | issue = 10| pages = 794&ndash;798 | doi = 10.1038/nphys1035 | url = | accessdate = | quote = |bibcode = 2008NatPh...4..794C |arxiv = 0802.3775 }} + +*{{ cite journal | last = Hu | first = Hefei | authorlink = | coauthors = ''et al.'' | year = 2008 | month = | title = Localization of ultrasound in a three-dimensional elastic network | journal = Nature Physics | volume = 4| issue = 12| pages = 945| doi = 10.1038/nphys1101 | url = | accessdate = | quote = |bibcode = 2008NatPh...4..945H |arxiv = 0805.1502 }} + +*{{ cite journal | last = Chabé| first = J. | authorlink = | coauthors = ''et al.'' | year = 2008 | month = | title = Experimental Observation of the Anderson Metal-Insulator Transition with Atomic Matter Waves | journal = Phys. Rev. Lett. | volume = 101| issue = 25| pages = 255702| doi = 10.1103/PhysRevLett.101.255702 | url = | accessdate = | quote = | pmid = 19113725 | bibcode=2008PhRvL.101y5702C|arxiv = 0709.4320 }} + +==External links== +*[http://ptonline.aip.org/journals/doc/PHTOAD-ft/vol_62/iss_8/24_1.shtml Fifty years of Anderson localization] ''Physics Today'', August 2009. +*[http://www2.warwick.ac.uk/fac/sci/csc/images/wf111.jpg Example of an electronic eigenstate at the MIT in a system with 1367631 atoms] Each cube indicates by its size the probability to find the electron at the given position. 
The color scale denotes the position of the cubes along the axis into the plane +*[http://www2.warwick.ac.uk/fac/sci/physics/research/theory/research/disqs/media Videos of multifractal electronic eigenstates at the MIT] +*[http://lpmmc.grenoble.cnrs.fr/spip.php?article408 Anderson localization of elastic waves] +*[http://www.opfocus.org/index.php?topic=story&v=1&s=1 Popular scientific article on the first experimental observation of Anderson localization in matter waves] + +[[Category:Mesoscopic physics]] +[[Category:Condensed matter physics]] + 9jvondbp4o4tl3uvtl4r0kw3jdzzo21 + + + + Trajectory of a projectile + 0 + 9832 + + 9833 + 2014-02-02T22:06:25Z + + Gdecarp + 0 + + /* External links */ + wikitext + text/x-wiki + {{refimprove|date=March 2012}} +In [[physics]], the '''ballistic trajectory of a projectile''' is the path that a thrown or launched [[projectile]] will take under the action of [[gravity]], neglecting all other forces, such as [[friction]] from air resistance, without [[Vehicle propulsion|propulsion]]. + +The [[United States Department of Defense]] and [[NATO]] define a [[Ballistics|ballistic]] [[trajectory]] as a trajectory traced after the propulsive force is terminated and the body is acted upon only by gravity and [[aerodynamic drag]].<ref>{{Cite web|url=http://www.dtic.mil/doctrine/jel/doddict/data/b/00611.html|title=Ballistic trajectory|accessdate=2011-07-28|publisher=[[Defense Technical Information Center]]}}</ref> + +The following applies for ranges which are small compared to the size of the Earth. For longer ranges see [[sub-orbital spaceflight]]. + +==Notation== +In the equations on this page, the following variables will be used: + +* <var>g</var>: the [[gravitational acceleration]]&mdash;usually taken to be 9.81 m/s<sup>2</sup> near the Earth's surface +* <var>θ</var>: the angle at which the projectile is launched +* <var>v</var>: the velocity at which the projectile is launched +* <var>y<sub>0</sub></var>: the initial height of the projectile +* <var>d</var>: the total horizontal distance traveled by the projectile +Ballistics (gr. βάλλειν ('ba'llein'), "throw") is the science of mechanics that deals with the flight, behavior, and effects of projectiles, especially bullets, gravity bombs, rockets, or the like; the science or art of designing and accelerating projectiles so as to achieve a desired performance. +A ballistic body is a body which is free to move, behave, and be modified in appearance, contour, or texture by ambient conditions, substances, or forces, as by the pressure of gases in a gun, by rifling in a barrel, by gravity, by temperature, or by air particles. A ballistic missile is a missile only guided during the relatively brief initial powered phase of flight, whose course is subsequently governed by the laws of classical mechanics. + +== Conditions at the final position of the projectile == +=== Distance traveled === +[[Image:Ideal_projectile_motion_for_different_angles.svg|thumb|350px|Trajectories of projectiles launched at different elevation angles but the same speed of 10 m/s in a vacuum and uniform downward gravity field of 10 m/s<sup>2</sup>. Points are at 0.05 s intervals and length of their tails is linearly proportional to their speed. ''t'' = time from launch, ''T'' = time of flight, ''R'' = range and ''H'' = highest point of trajectory (indicated with arrows).]] +The total horizontal distance <var>(d)</var> traveled. 
+ +: <math> d = \frac{v \cos \theta}{g} \left( v \sin \theta + \sqrt{(v \sin \theta)^2 + 2gy_0} \right) </math> + +When the surface the object is launched from and is flying over is flat (the initial height is zero), the distance traveled is: + +: <math> d = \frac{v^2 \sin(2 \theta)}{g} </math> + +Thus the maximum distance is obtained if <var>θ</var> is 45 degrees. This distance is: + +: <math> d = \frac{v^2}{g} </math> + +For explicit derivations of these results, see [[Range of a projectile]]. + +=== Time of flight === +The time of flight <var>(t)</var> is the time it takes for the projectile to finish its trajectory. + +: <math> t = \frac{d}{v \cos\theta} = \frac{v \sin \theta + \sqrt{(v \sin \theta)^2 + 2gy_0}}{g} </math> + +As above, this expression can be reduced to + +: <math> t = \frac{\sqrt{2} \cdot v}{g} </math> + +if <var>θ</var> is 45° and <var>y<sub>0</sub></var> is 0. + +The above results are found in [[Range of a projectile]]. + +=== Angle of reach === +The "angle of reach" (not quite a scientific term) is the angle (φ) at which a projectile must be launched in order to go a distance <var>d</var>, given the initial velocity <var>v</var>. + +: <math> \sin(2\phi) = \frac{gd}{v^2} </math> + +: <math> \phi = \frac{1}{2} \arcsin \left( \frac{gd}{v^2} \right) </math> + +== Conditions at an arbitrary distance <var>x</var> == +=== Height at <var>x</var> === +The height <var>y</var> of the projectile at distance <var>x</var> is given by + +: <math> y = y_0 + x \tan \theta - \frac {gx^2}{2(v\cos\theta)^2} </math>. + +The third term is the deviation from traveling in a straight line. + +=== Velocity at <var>x</var> === +The magnitude, <math>|v|,</math> of the velocity of the projectile at distance <var>x</var> is given by + +: <math> | v | = \sqrt{v^2 - 2gx \tan \theta + \left(\frac{gx}{v\cos \theta}\right)^2} </math>. + +==== Derivation ==== +The magnitude |<var>v</var>| of the velocity is given by + +: <math> | v | = \sqrt{V_x^2 + V_y^2} </math>, + +where <var>V<sub>x</sub></var> and <var>V<sub>y</sub></var> are the instantaneous velocities in the <var>x</var>- and <var>y</var>-directions, respectively. + +Here the <var>x</var>-velocity remains constant; it is always equal to <var>v</var> cos <var>θ</var>. + +The <var>y</var>-velocity can be found using the formula + +: <math> v_f = v_i + at </math> + +by setting <var>v<sub>i</sub></var> = <var>v</var> sin <var>θ</var>, <var>a</var> = <var>-g</var>, and <math>t = \frac{x}{v \cos \theta}</math>. (The latter is found by taking <var>x</var> = (<var>v</var> cos <var>θ</var>) <var>t</var> and solving for <var>t</var>.) Then, + +: <math> V_y = v \sin \theta - \frac{gx}{v \cos \theta} </math> + +and + +: <math> | v | = \sqrt{(v \cos \theta)^2 + \left(v \sin \theta - \frac{gx}{v \cos \theta} \right)^2} </math>. + +The formula above is found by simplifying. + +== Angle <math> \theta</math> required to hit coordinate (<var>x</var>,<var>y</var>) == +[[Image:Trajectory_for_changing_launch_angle.gif|right|thumb|330px|Vacuum trajectory of a projectile for different launch angles. 
Launch speed is the same for all angles, 50&nbsp;m/s, and ''g'' is taken to be 10&nbsp;m/s<sup>2</sup>.]]
+To hit a target at range <var>x</var> and altitude <var>y</var> when fired from (0,0) and with initial speed <var>v</var>, the required angle(s) of launch <math> \theta</math> are:
+
+: <math> \theta = \arctan{\left(\frac{v^2\pm\sqrt{v^4-g(gx^2+2yv^2)}}{gx}\right)} </math>
+
+The two roots of the equation correspond to the two possible launch angles, provided both are real; if the expression under the square root is negative, the initial speed is not great enough to reach the selected point (<var>x</var>,&nbsp;<var>y</var>). The main advantage of this formula is that it gives the required launch angle without the restriction ''y'' = 0.
+
+'''Derivation'''
+
+First, two elementary formulae relating to projectile motion are called upon:
+
+:<math>x = v t \cos \theta , t = \frac{x}{v \cos \theta}</math> (1)
+
+:<math>y = vt \sin \theta - \frac{1}{2} g t^2</math> (2)
+
+Solving (1) for t and substituting this expression in (2) gives:
+
+:<math> y = x \tan \theta - \frac{gx^2}{2v^2 \cos^2 \theta}</math> (2a)
+
+:<math> y = x \tan \theta - \frac{gx^2 \sec^2 \theta}{2v^2}</math> (2b) (Trigonometric identity)
+
+:<math>y =x \tan \theta - \frac{gx^2}{2v^2}(1+ \tan^2 \theta)</math> (2c) (Trigonometric identity)
+
+:<math> 0 = \frac{-gx^2}{2v^2} \tan^2 \theta + x \tan \theta - \frac{gx^2}{2v^2} - y</math> (2d) (Algebra)
+
+Let <math>p = \tan \theta</math>
+
+:<math> 0 = \frac{-gx^2}{2v^2} p^2 + xp - \frac{gx^2}{2v^2} - y</math> (2e) (Substitution)
+
+:<math> p = {\frac{-x\pm\sqrt{x^2-4(\frac{-gx^2}{2v^2})(\frac{-gx^2}{2v^2}-y)}}{2(\frac{-gx^2}{2v^2}) }}</math> (2f) ([[Quadratic formula]])
+
+:<math> p = \frac{v^2\pm\sqrt{v^4-g(gx^2+2yv^2)}}{gx} </math> (2g) (Algebra)
+
+:<math> \tan \theta = \frac{v^2\pm\sqrt{v^4-g(gx^2+2yv^2)}}{gx} </math> (2h) (Substitution)
+
+:<math> \theta = \tan^{-1}{\left(\frac{v^2\pm\sqrt{v^4-g(gx^2+2yv^2)}}{gx}\right)} </math> (2i) (Algebra)
+
+Also, if instead of a coordinate (<var>x</var>,&nbsp;<var>y</var>) the target is specified by a distance <var>r</var> and angle of elevation <math>\phi</math> (polar coordinates), use the relationships <math>x = r \cos \phi</math> and <math>y = r \sin \phi</math> and substitute to get:
+
+:<math> \theta = \tan^{-1}{\left(\frac{v^2\pm\sqrt{v^4-g(gr^2\cos^2\phi+2v^2r\sin\phi )}}{gr\cos\phi}\right)} </math>
+
+==Catching balls==
+If a projectile, such as a baseball or cricket ball, travels in a parabolic path, with negligible air resistance, and if a player is positioned so as to catch it as it descends, he sees its angle of elevation increasing continuously throughout its flight. The tangent of the angle of elevation is proportional to the time since the ball was sent into the air, usually by being struck with a bat. Even when the ball is really descending, near the end of its flight, its angle of elevation seen by the player continues to increase. The player therefore sees it as if it were ascending vertically at constant speed. Finding the place from which the ball appears to rise steadily helps the player to position himself correctly to make the catch. If he is too close to the batsman who has hit the ball, it will appear to rise at an accelerating rate. If he is too far from the batsman, it will appear to slow rapidly, and then to descend.
+
+'''Proof'''
+
+Suppose the ball starts with a vertical component of velocity of <math>v,</math> upward, and a horizontal component of velocity of <math>h</math> toward the player who wants to catch it.
Its altitude above the ground is given by:
+
+:<math>a=vt-\frac{1}{2}gt^2,</math> where <math>t</math> is the time since the ball was hit, and <math>g</math> is the acceleration due to gravity.
+
+The total time of the flight, until the ball returns to the ground from which it started, is found by setting
+
+:<math>a=0</math>
+
+:<math> \therefore T=\frac{2v}{g}.</math>
+
+The horizontal component of the ball's distance from the catcher at time <math>t</math> is:
+
+:<math>d=h(T-t) = \frac{2hv}{g}-ht</math>
+
+The tangent of the angle of elevation of the ball, as seen by the catcher, is:
+
+:<math>\tan(e)=\frac{a}{d}</math>
+
+:<math>=\frac{vt-\frac{gt^2}{2}}{\frac{2hv}{g}-ht}</math>
+
+:<math>=\frac{2gvt-g^2t^2}{4hv-2ght}</math>
+
+:<math>=\frac{gt(2v-gt)}{2h(2v-gt)}</math>
+
+The common factor <math>(2v-gt)</math> is zero only when the ball is back on the ground, so while the ball is in flight:
+
+:<math>\tan(e)=\left(\frac{g}{2h}\right)t</math>
+
+The factor <math>g/2h</math> in this last expression is constant for a given flight of the ball. Therefore the tangent of the angle of elevation of the ball, as seen by the player who is properly positioned to catch it, is directly proportional to the time since the ball was hit.
+
+== Trajectory of a projectile with air resistance ==
+{{Expert-subject|Physics|date=June 2008}}
+[[File:Inclinedthrow.gif|thumb|400px|Trajectories of a mass thrown at an angle of 70°:<br>
+{{color box|black}} without [[Drag (physics)|drag]]<br>
+{{color box|blue}} with [[Stokes'_law|Stokes drag]]<br>
+{{color box|green}} with [[Newtonian_fluid|Newton drag]]]]
+
+Air resistance will be taken to be in direct proportion to the velocity of the particle (i.e. <math>F_a \propto \vec{v}</math>). This is valid at low speed (low [[Reynolds number]]), and it is assumed here so that the equations describing the particle's motion are easily solved. At higher speed (high Reynolds number) the force of air resistance is proportional to the square of the particle's velocity (see [[drag equation]]). Here, <math>v_0</math>, <math>v_x</math> and <math>v_y</math> will be used to denote the initial velocity, the velocity along the direction of <var>x</var> and the velocity along the direction of <var>y</var>, respectively. The mass of the projectile will be denoted by <var>m</var>. For the derivation only the case where <math>0^\circ \le \theta \le 180^\circ</math> is considered. Again, the projectile is fired from the origin (0,0).
+
+This assumption, that the air resistance is directly proportional to the velocity of the particle, is not accurate for a typical projectile in air moving faster than a few tens of metres per second, so the equations derived below should not be applied to that situation.
+
+[[Image:Free_body_diagram2.png|right|thumb|320px|Free body diagram of a body on which only gravity and air resistance act]]
+
+The [[free body diagram]] on the right is for a projectile that experiences air resistance and the effects of gravity. Here, air resistance is assumed to act in the direction opposite to the projectile's velocity. The resistance is written as <math>F_{air} = -kv</math> (in reality <math>F_{air} = -k v^2</math> is more realistic, but it is not used here, so that an analytic solution can be obtained); the assumption of direct proportionality means that the air resistance and the velocity differ only by a constant factor <var>k</var> with units of N·s/m.
+
+As an example, say that when the velocity of the projectile is 4 m/s, the air resistance is 7 [[Newton (unit)|newtons]] (N).
When the velocity is doubled to 8 m/s, the air resistance doubles to 14 N accordingly. In this case, <var>k</var> = 7/4 N x s/m. Note that k is needed in order to relate the air resistance and the velocity by an equal sign: otherwise, it would be stating incorrectly that the two are always equal in value (i.e. 1 m/s of velocity gives 1 N of force, 2 m/s gives 2 N etc.) which isn't always the case, and also it keeps the equation dimensionally correct (a force and a velocity cannot be equal to each other, e.g. m/s = N). As another quick example, [[Hooke's Law]] (<math>F = -kx</math>) describes the force produced by a spring when stretched a distance <var>x</var> from its resting position, and is another example of a direct proportion: k in this case has units N/m (in metric). + +To show why k = 7/4 N·s/m above, first equate 4 m/s and 7 N: + +<math>4 \ \mathrm{m}/\mathrm{s} = 7 \ \mathrm{N}</math> (Incorrect) + +<math>4 \ \mathrm{m}/\mathrm{s} \times (\frac{7}{4} \ \mathrm{N} \times \frac {\mathrm{s}}{\mathrm{m}})= 7 \ \mathrm{N}</math> (Introduction of k) + +<math>4 \ \mathrm{N} \times \frac{7}{4}= 7 \ \mathrm{N}</math> (<math>\frac{\mathrm{s}}{\mathrm{m}} \times \frac{\mathrm{m}}{\mathrm{s}}</math> cancels) + +<math>7 \ \mathrm{N} = 7 \ \mathrm{N} (4 \times \frac{7}{4} = 7)</math> + +For more on proportionality, see: [[Proportionality (mathematics)]] +<br /><br /> +The relationships that represent the motion of the particle are derived by [[Newton's Second Law]], both in the x and y directions. +In the x direction <math>\Sigma F = -kv_x = ma_x</math> and in the y direction <math>\Sigma F = -kv_y - mg = ma_y</math>.<br /><br /> +This implies that: <math>a_x = \frac{-kv_x}{m} = \frac{dv_x}{dt}</math> (1),<br /><br /> and <br /> +<br /><math>a_y = \frac{1}{m}(-kv_y - mg) = \frac{-kv_y}{m} - g = \frac{dv_y}{dt}</math> (2) +<br /> +Solving (1) is an elementary [[differential equation]], thus the steps leading to a unique solution for <math>v_x</math> and, subsequently, <math>x</math> will not be enumerated. Given the initial conditions <math>v_x = v_{xo}</math> (where <math>v_{xo}</math> is understood to be the x component of the initial velocity) and <math>s_x = 0</math> for <math>t = 0</math>: <br /> +<br /> +<math>v_x = v_{xo} e^{-\frac{k}{m}t}</math> (1a)<br /> +<br /> +<math>s_x = \frac{m}{k}v_{xo}(1-e^{-\frac{k}{m}t})</math> (1b) <br /> +<br/> +While (1) is solved much in the same way, (2) is of distinct interest because of its non-homogeneous nature. Hence, we will be extensively solving (2). 
Note that in this case the initial conditions <math>v_y = v_{yo}</math> and <math>s_y = 0</math> at <math>t = 0</math> are used.<br />
+<br />
+<math>\frac{dv_y}{dt} = \frac{-k}{m}v_y - g </math> (2)<br />
+<br />
+<math>\frac{dv_y}{dt} + \frac{k}{m}v_y = - g </math> (2a) <br />
+<br />
+This first order, linear, non-homogeneous differential equation may be solved in a number of ways; however, in this instance it is quickest to approach the solution via an [[integrating factor]]: <math>e^{\int \frac{k}{m} \, dt}</math>.<br />
+<br />
+<math>e^{\frac{k}{m}t}(\frac{dv_y}{dt} + \frac{k}{m}v_y) = e^{\frac{k}{m}t}(-g)</math> (2c)<br />
+<br />
+<math>(e^{\frac{k}{m}t}v_y)^\prime = e^{\frac{k}{m}t}(-g) </math> (2d)<br />
+<br />
+<math>\int{(e^{\frac{k}{m}t}v_y)^\prime \,dt} = e^{\frac{k}{m}t}v_y = \int{ e^{\frac{k}{m}t}(-g) \, dt} </math> (2e)<br />
+<br />
+<math>e^{\frac{k}{m}t}v_y = \frac{m}{k} e^{\frac{k}{m}t}(-g) + C </math> (2f) <br />
+<br />
+<math>v_y = \frac{-mg}{k} + Ce^{\frac{-k}{m}t}</math> (2g)<br />
+<br />
+Applying the initial condition <math>v_y = v_{yo}</math> at <math>t = 0</math> gives <math>C = v_{yo} + \frac{mg}{k}</math>, so<br />
+<br />
+<math>v_y(t) = -\frac{mg}{k} + (v_{yo} + \frac{mg}{k})e^{-\frac{k}{m}t}</math> (2h)<br />
+<br />
+Integrating (2h) with respect to time, we find:<br />
+<br />
+<math>s_y = -\frac{mg}{k}t - \frac{m}{k}(v_{yo} + \frac{mg}{k})e^{-\frac{k}{m}t} + C</math> (3)<br />
+<br />
+and applying the initial condition <math>s_y = 0</math> at <math>t = 0</math> gives:<br />
+<br />
+<math>s_y(t) = -\frac{mg}{k}t - \frac{m}{k}(v_{yo} + \frac{mg}{k})e^{-\frac{k}{m}t} + \frac{m}{k}(v_{yo} + \frac{mg}{k})</math> (3a)<br />
+<br />
+With a bit of algebra to simplify (3a): <br />
+<math>s_y(t) = -\frac{mg}{k}t + \frac{m}{k}(v_{yo} + \frac{mg}{k})(1 - e^{-\frac{k}{m}t})</math> (3b)<br />
+<br />
+
+An example is given using values for the mass and terminal velocity of a [[baseball]] taken from [http://hyperphysics.phy-astr.gsu.edu/hbase/airfri2.html#c3].
+:''m'' = 0.145 kg (5.1 oz)
+:''v''<sub>0</sub> = 44.7 m/s (100 mph)
+:''g'' = -9.81 m/s² (-32.2 ft/s²)
+:''v''<sub>t</sub> = -33.0 m/s (-73.8 mph)
+:<math>k =\frac{mg}{v_t} = \frac{(0.145 \mbox{ kg})(-9.81 \ \mathrm{m}/\mathrm{s}^2)}{-33.0 \ \mathrm{m}/\mathrm{s}} = 0.0431 \mbox{ kg}/\mbox{s} , \ \theta = 45^\circ</math>.
+
+[[Image:BaseballProjectileGraph.jpg|left|This graph was produced using [http://www.graphcalc.com GraphCalc]]]{{clear}}
+
+The red path (the lower path) is the path taken by the projectile modeled by the equations derived above, and the green path is taken by an idealized projectile, one that ignores air resistance altogether. Ignoring air resistance is far from ideal in this scenario: with no air resistance, a home run could be hit with 270 ft (at 3.28 ft per metre, roughly 82 m) to spare. (The mechanics of pitching at 45 degrees notwithstanding.) In most cases it is also more accurate to assume <math>F_a \propto \vec{v}^2</math>, meaning that when the speed increases by a factor of <var>p</var> the air resistance increases by a factor of <math>p^2</math>. In the first example of proportionality, where the velocity was doubled to 8 m/s, the air resistance would then be quadrupled (<math>2^2=4</math>) to 28 N; this only adds to the large error incurred by neglecting air resistance.
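+
+The closed-form results (1b) and (3b) are easy to evaluate numerically. The following Python sketch (an illustrative addition, not the source of the graph above; the function names and the 0.5-second step are arbitrary choices) tabulates the linear-drag trajectory for the baseball values just quoted and compares it with the drag-free parabola:
+
+<syntaxhighlight lang="python">
+import math
+
+# Baseball values quoted above; magnitudes are used, and k = m*g/v_t as computed in the text.
+m = 0.145                     # mass, kg
+v0 = 44.7                     # launch speed, m/s
+g = 9.81                      # gravitational acceleration, m/s^2
+vt = 33.0                     # terminal speed, m/s
+k = m * g / vt                # ~0.0431 kg/s
+theta = math.radians(45.0)
+vx0 = v0 * math.cos(theta)    # initial horizontal velocity
+vy0 = v0 * math.sin(theta)    # initial vertical velocity
+
+def drag_position(t):
+    """Horizontal and vertical position from equations (1b) and (3b)."""
+    sx = (m / k) * vx0 * (1.0 - math.exp(-k * t / m))
+    sy = -(m * g / k) * t + (m / k) * (vy0 + m * g / k) * (1.0 - math.exp(-k * t / m))
+    return sx, sy
+
+def vacuum_position(t):
+    """Drag-free comparison trajectory."""
+    return vx0 * t, vy0 * t - 0.5 * g * t * t
+
+t = 0.0
+while True:
+    xd, yd = drag_position(t)
+    xv, yv = vacuum_position(t)
+    print(f"t = {t:4.1f} s   drag: ({xd:6.1f}, {yd:6.1f}) m   vacuum: ({xv:6.1f}, {yv:6.1f}) m")
+    if t > 0.0 and yd < 0.0:  # stop once the drag trajectory has returned to the ground
+        break
+    t += 0.5
+</syntaxhighlight>
+
+The printed positions show the same qualitative behaviour as the graph: the trajectory with linear drag stays below the ideal parabola and lands noticeably short of it.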
For an analytic solution, see [http://math.stackexchange.com/questions/150242/teenager-solves-newton-dynamics-problem-where-is-the-paper Shouryya Ray solves one of Newton's puzzles].
+
+==See also==
+*[[Ballistic coefficient]]
+*[[Range of a projectile]]
+*[[Trajectory]]
+*[[Drag (physics)]]
+
+==References==
+{{Reflist}}
+
+==External links==
+* [http://www.phy.ntnu.edu.tw/ntnujava/htmltag.php?code=users.sgeducation.lookang.Projectile02_pkg.Projectile02Applet.class&name=Projectile02&muid=14019 Open Source Physics computer model]
+*[http://formularium.org/?go=88 Spreadsheet to calculate distance and time of flight]
+*[http://www.phy.hk/wiki/englishhtm/ThrowABall.htm Java applet of projectile motion]
+* {{cite journal | date=2014 | title=Analytical Ballistic Trajectories with Approximately Linear Drag | publisher=Hindawi Publishing Corporation | url=http://dx.doi.org/10.1155/2014/463489 }}
+*[http://www.physics.usyd.edu.au/~cross/TRAJECTORIES/42.%20Ball%20Trajectories.pdf Ball Trajectories (PDF)]
+[[Category:Ballistics]]
+
+[[de:Wurfparabel]]
+[[sv:Kastparabel]]
+ cb8arxu0g6ku3n1mh5fy0sxm0ku6ka3
+ 
+ 
+ 
+ Cusp neighborhood
+ 0
+ 8876
+ 
+ 8877
+ 2012-06-06T07:05:00Z
+ 
+ Lockley
+ 0
+ 
+ 
+ remove context tag
+ wikitext
+ text/x-wiki
+ {{Unreferenced|date=October 2008}}
+
+In [[mathematics]], a '''cusp neighborhood''' is defined as a set of points near a [[cusp (singularity)|cusp]].
+
+==Cusp neighborhood for a Riemann surface==
+The cusp neighborhood for a hyperbolic [[Riemann surface]] can be defined in terms of its [[Fuchsian model]].
+
+Suppose that the [[Fuchsian group]] ''G'' contains a [[parabolic element]] ''g''. For example, the element ''t'' ∈ SL(2,'''Z''') where
+
+:<math>t(z)=\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}:z = \frac{1\cdot z+1}{0 \cdot z + 1} = z+1</math>
+
+is a parabolic element. Note that all parabolic elements of SL(2,'''C''') are [[conjugacy class|conjugate]] to this element. That is, if ''g'' ∈ SL(2,'''Z''') is parabolic, then <math>g=h^{-1}th</math> for some ''h'' ∈ SL(2,'''Z''').
+
+The set
+
+:<math>U=\{ z \in \mathbf{H} : \Im z > 1 \} </math>
+
+where '''H''' is the [[upper half-plane]] has
+
+:<math>\gamma(U) \cap U = \emptyset</math>
+
+for any <math>\gamma \in G - \langle g \rangle </math> where <math>\langle g \rangle</math> is understood to mean the [[group (mathematics)|group]] generated by ''g''. That is, γ acts [[properly discontinuously]] on ''U''. Because of this, it can be seen that the projection of ''U'' onto '''H'''/''G'' is thus
+
+:<math>E = U/ \langle g \rangle</math>.
+
+Here, ''E'' is called the '''neighborhood of the cusp corresponding to g'''.
+
+Note that the hyperbolic area of ''E'' is exactly 1, when computed using the canonical [[Poincaré metric]]. This is most easily seen by example: consider the intersection of ''U'' defined above with the [[fundamental domain]]
+
+:<math>\left\{ z \in H: \left| z \right| > 1,\, \left| \,\mbox{Re}(z) \,\right| < \frac{1}{2} \right\}</math>
+
+of the [[modular group]], as would be appropriate for the choice of ''t'' as the parabolic element.
When integrated over the [[volume element]] + +:<math>d\mu=\frac{dxdy}{y^2}</math> + +the result is trivially 1. Areas of all cusp neighborhoods are equal to this, by the invariance of the area under conjugation. + +[[Category:Hyperbolic geometry]] +[[Category:Riemann surfaces]] + 6taia3xpsxrwblsf2ofqaxhhqb3ktu5 + + + + Spectrum (functional analysis) + 0 + 2950 + + 2951 + 2014-01-27T21:14:22Z + + Chris the speller + 0 + + + per [[WP:HYPHEN]], sub-subsection 3, points 3,4,5, replaced: ,consider → , consider, densely- → densely (2) using [[Project:AWB|AWB]] + wikitext + text/x-wiki + In [[functional analysis]], the concept of the '''spectrum''' of a [[bounded operator]] is a generalisation of the concept of [[eigenvalue]]s for [[matrix (mathematics)|matrices]]. Specifically, a [[complex number]] λ is said to be in the spectrum of a bounded linear operator ''T'' if λ''I''&nbsp;&minus;&nbsp;''T'' is not [[inverse function|invertible]], where ''I'' is the [[identity operator]]. The study of spectra and related properties is known as [[spectral theory]], which has numerous applications, most notably the [[mathematical formulation of quantum mechanics|mathematical formulation]] of [[quantum mechanics]]. + +The spectrum of an operator on a [[Dimension (vector space)|finite-dimensional]] [[vector space]] is precisely the set of eigenvalues. However an operator on an infinite-dimensional space may have additional elements in its spectrum, and may have no eigenvalues. For example, consider the [[unilateral shift|right shift]] operator ''R'' on the [[Hilbert space]] [[Lp space|ℓ<sup>2</sup>]], +:<math>(x_1, x_2, \dots) \mapsto (0, x_1, x_2, \dots).</math> +This has no eigenvalues, since if ''Rx''=λ''x'' then by expanding this expression we see that ''x''<sub>1</sub>=0, ''x''<sub>2</sub>=0, etc. On the other hand 0 is in the spectrum because the operator ''R''&nbsp;&minus;&nbsp;0 (i.e. ''R'' itself) is not invertible: it is not surjective since any vector with non-zero first component is not in its range. In fact ''every'' bounded linear operator on a [[complex number|complex]] [[Banach space]] must have a non-empty spectrum. + +The notion of spectrum extends to [[densely defined operator|densely defined]] [[unbounded operator]]s. In this case a [[complex number]] λ is said to be in the spectrum of such an operator ''T'':''D''→''X'' (where ''D'' is dense in ''X'') if there is no bounded inverse (λ''I''&nbsp;&minus;&nbsp;''T'')<sup>−1</sup>:''X''→''D''. If ''T'' is a [[closed operator]] (which includes the case that ''T'' is a bounded operator), boundedness of such inverses follow automatically if the inverse exists at all. + +The space of bounded linear operators ''B''(''X'') on a Banach space ''X'' is an example of a [[unital algebra|unital]] [[Banach algebra]]. Since the definition of the spectrum does not mention any properties of ''B''(''X'') except those that any such algebra has, the notion of a spectrum may be generalised to this context by using the same definition verbatim. + +==Spectrum of a bounded operator== +===Definition=== +Let <math>T</math> be a bounded linear operator acting on a Banach space <math>\mathbb{X}</math> over the scalar field <math>\mathbb{K}</math>, and <math>I</math> be the [[identity operator]] on <math>\mathbb{X}</math>. The '''spectrum''' of <math>T</math> is the set of all <math>\lambda \in \mathbb{K}</math> for which the operator <math>\lambda I - T</math> does not have an inverse that is a bounded linear operator. 
+ +Since <math>\lambda I - T</math> is a linear operator, the inverse is linear if it exists; and, by the [[bounded inverse theorem]], it is bounded. Therefore the spectrum consists precisely of those scalars <math>\lambda</math> for which <math>\lambda I - T</math> is not [[bijective]]. + +The spectrum of a given operator <math>T</math> is often denoted <math>\sigma(T)</math>, and its complement, the [[resolvent set]], is denoted <math>\rho(T) = \mathbb{K} \setminus \sigma(T)</math>. + +===Spectrum and eigenvalues=== +If <math>\lambda</math> is an eigenvalue of <math>T</math>, then the operator <math>T-\lambda I</math> is not one-to-one, and therefore its inverse <math>(T-\lambda I)^{-1}</math> is not defined. However, the converse statement is not true: the operator <math>T - \lambda I</math> may not have an inverse, even if <math>\lambda</math> is not an eigenvalue. Thus the spectrum of an operator always contains all its eigenvalues, but is not limited to them. + +For example, consider the Hilbert space <math>\ell^2(\mathbb{Z})</math>, that consists of all [[Sequence#Finite_and_infinite|bi-infinite sequences]] of real numbers +:<math>v = (\ldots, v_{-2},v_{-1},v_0,v_1,v_2,\ldots)</math> +that have a finite sum of squares <math>\sum_{i=-\infty}^{+\infty} v_i^2</math>. The [[bilateral shift]] operator <math>T</math> simply displaces every element of the sequence by one position; namely if <math>u = T(v)</math> then <math>u_i = v_{i-1}</math> for every integer <math>i</math>. The eigenvalue equation <math>T(v) = \lambda v</math> has no solution in this space, since it implies that all the values <math>v_i</math> have the same absolute value (if <math>\lambda = 1</math>) or are a geometric progression (if <math>\lambda \neq 1</math>); either way, the sum of their squares would not be finite. However, the operator <math>T-\lambda I</math> is not invertible if <math>|\lambda| = 1</math>. For example, the sequence <math>u</math> such that <math>u_i = 1/(|i|+1)</math> is in <math>\ell^2(\mathbb{Z})</math>; but there is no sequence <math>v</math> in <math>\ell^2(\mathbb{Z})</math> such that <math>(T-I)v = u</math> (that is, <math>v_{i-1} = u_i + v_i</math> for all <math>i</math>). + +=== Basic properties === + +The spectrum of a bounded operator ''T'' is always a [[closed set|closed]], [[bounded set|bounded]] and [[empty set|non-empty]] subset of the [[complex plane]]. + +If the spectrum were empty, then the [[Resolvent formalism|''resolvent function'']] + +:<math>R(\lambda) = (\lambda I - T)^{-1} \,</math> + +would be defined everywhere on the complex plane and bounded. But it can be shown that the resolvent function ''R'' is [[holomorphic]] on its domain. By the vector-valued version of [[Liouville's theorem (complex analysis)|Liouville's theorem]], this function is constant, thus everywhere zero as it is zero at infinity. This would be a contradiction. + +The boundedness of the spectrum follows from the [[Neumann series|Neumann series expansion]] in ''λ''; the spectrum ''σ''(''T'') is bounded by ||''T''||. A similar result shows the closedness of the spectrum. + +The bound ||''T''|| on the spectrum can be refined somewhat. The ''[[spectral radius]]'', ''r''(''T''), of ''T'' is the radius of the smallest circle in the complex plane which is centered at the origin and contains the spectrum σ(''T'') inside of it, i.e. 
+ +:<math>r(T) = \sup \{|\lambda| : \lambda \in \sigma(T)\}.</math> + +The '''spectral radius formula''' says<ref>Theorem 3.3.3 of Kadison & Ringrose, 1983, ''Fundamentals of the Theory of Operator Algebras, Vol. I: Elementary Theory'', New York: Academic Press, Inc.</ref> that for any element <math>T</math> of a [[Banach algebra]], +:<math>r(T) = \lim_{n \to \infty} \|T^n\|^{1/n}.</math> + +== Classification of points in the spectrum of an operator == +{{Further2|[[Decomposition of spectrum (functional analysis)]]}} +A bounded operator ''T'' on a Banach space is invertible, i.e. has a bounded inverse, if and only if ''T'' is bounded below and has dense range. Accordingly, the spectrum of ''T'' can be divided into the following parts: + +#''λ'' ∈ ''σ''(''T''), if ''λ - T'' is not bounded below. In particular, this is the case, if ''λ - T'' is not injective, that is, ''λ'' is an eigenvalue. The set of eigenvalues is called the '''point spectrum''' of ''T'' and denoted by '''σ<sub>p</sub>(T)'''. Alternatively, ''λ - T'' could be one-to-one but still not be bounded below. Such ''λ'' is not an eigenvalue but still an ''approximate eigenvalue'' of ''T'' (eigenvalues themselves are also approximate eigenvalues). The set of approximate eigenvalues (which includes the point spectrum) is called the '''approximate point spectrum''' of ''T'', denoted by '''σ<sub>ap</sub>(T)'''. +#''λ'' ∈ ''σ''(''T''), if ''λ - T'' does not have dense range. No notation is used to describe the set of all ''λ'', which satisfy this condition, but for a subset: If ''λ - T'' does not have dense range but is injective, ''λ'' is said to be in the '''residual spectrum''' of ''T'', denoted by '''σ<sub>r</sub>(T)''' . + +Note that the approximate point spectrum and residual spectrum are not necessarily disjoint (however, the point spectrum and the residual spectrum are). + +The following subsections provide more details on the three parts of ''σ''(''T'') sketched above. + +===Point spectrum=== + +If an operator is not injective (so there is some nonzero ''x'' with ''T''(''x'') = 0), then it is clearly not invertible. So if λ is an [[eigenvalue]] of ''T'', one necessarily has λ ∈ σ(''T''). The set of eigenvalues of ''T'' is also called the '''point spectrum''' of ''T'', denoted by '''σ<sub>p</sub>(T)''' . + +===Approximate point spectrum=== + +More generally, ''T'' is not invertible if it is not bounded below; that is, if there is no ''c'' > 0 such that ||''Tx''||&nbsp;≥ ''c''||''x''|| for all {{nowrap|''x'' ∈ ''X''}}. So the spectrum includes the set of '''approximate eigenvalues''', which are those λ such that {{nowrap|''T'' - λ ''I''}} is not bounded below; equivalently, it is the set of λ for which there is a sequence of unit vectors ''x''<sub>1</sub>, ''x''<sub>2</sub>, ... for which + +:<math>\lim_{n \to \infty} \|Tx_n - \lambda x_n\| = 0</math>. + +The set of approximate eigenvalues is known as the '''approximate point spectrum''', denoted by '''σ<sub>ap</sub>(T)'''. + +It is easy to see that the eigenvalues lie in the approximate point spectrum. + +'''Example''' Consider the [[bilateral shift]] ''T'' on ''l''<sup>2</sup>('''Z''') defined by + +:<math> +T(\cdots, a_{-1}, \hat{a}_0, a_1, \cdots) = (\cdots, \hat{a}_{-1}, a_0, a_1, \cdots) +</math> + +where the ˆ denotes the zero-th position. 
Direct calculation shows ''T'' has no eigenvalues, but every λ with |λ| = 1 is an approximate eigenvalue; letting ''x''<sub>''n''</sub> be the vector + +:<math>\frac{1}{\sqrt{n}}(\dots, 0, 1, \lambda^{-1}, \lambda^{-2}, \dots, \lambda^{1 - n}, 0, \dots)</math> + +then ||''x''<sub>''n''</sub>|| = 1 for all ''n'', but + +:<math>\|Tx_n - \lambda x_n\| = \sqrt{\frac{2}{n}} \to 0.</math> + +Since ''T'' is a unitary operator, its spectrum lie on the unit circle. Therefore the approximate point spectrum of T is its entire spectrum. This is true for a more general class of operators. + +A unitary operator is [[normal operator|normal]]. By [[spectral theorem]], a bounded operator on a Hilbert space is normal if and only if it is a [[multiplication operator]]. It can be shown that, in general, the approximate point spectrum of a bounded multiplication operator is its spectrum. + +===Residual spectrum=== + +An operator may be injective, even bounded below, but not invertible. The [[unilateral shift]] on ''l'' <sup>2</sup>('''N''') is such an example. This shift operator is an [[isometry]], therefore bounded below by 1. But it is not invertible as it is not surjective. The set of ''λ'' for which ''λI - T'' is injective but does not have dense range is known as the '''residual spectrum''' or '''compression spectrum''' of ''T'' and is denoted by '''σ<sub>r</sub>(T)'''. + +===Continuous spectrum=== + +The set of all ''λ'' for which ''λI'' - ''T'' is injective and has dense range, but is not surjective, is called the '''continuous spectrum''' of ''T'', denoted by '''σ<sub>c</sub>(T)''' . The continuous spectrum therefore consists of those approximate eigenvalues which are not eigenvalues and do not lie in the residual spectrum. That is, + +:<math>\sigma_c(T) = \sigma_{ap}(T) \setminus (\sigma_r(T) \cup \sigma_p(T)) </math>. + +===Peripheral spectrum=== + +The peripheral spectrum of an operator is defined as the set of points in its spectrum which have modulus equal to its spectral radius. + +===Example=== +The [[hydrogen atom]] provides an example of this decomposition. The eigenfunctions of the [[molecular Hamiltonian|hydrogen atom Hamiltonian]] are called '''eigenstates''' and are grouped into two categories. The [[bound state]]s of the hydrogen atom correspond to the discrete part of the spectrum (they have a discrete set of eigenvalues that can be computed by [[Rydberg formula]]) while the [[ionization]] processes are described by the continuous part (the energy of the collision/ionization is not quantized). + +== Further results == + +If ''T'' is a [[compact operator]], then it can be shown that any nonzero λ in the spectrum is an eigenvalue. In other words, the spectrum of such an operator, which was defined as a generalization of the concept of eigenvalues, consists in this case only of the usual eigenvalues, and possibly 0. + +If ''X'' is a [[Hilbert space]] and ''T'' is a [[normal operator]], then a remarkable result known as the [[spectral theorem]] gives an analogue of the diagonalisation theorem for normal finite-dimensional operators (Hermitian matrices, for example). + +== Spectrum of an unbounded operator == + +One can extend the definition of spectrum for [[unbounded operator]]s on a [[Banach space]] ''X'', operators which are no longer elements in the Banach algebra ''B''(''X''). One proceeds in a manner similar to the bounded case. 
A complex number λ is said to be in the '''resolvent set''', that is, the [[complement (set theory)|complement]] of the spectrum of a linear operator + +:<math>T: D \subset X \to X</math> + +if the operator + +:<math>T-\lambda I: D \to X</math> + +has a bounded inverse, i.e. if there exists a bounded operator + +:<math>S : X \rightarrow D</math> + +such that + +:<math>S (T - I \lambda) = I_D, \, (T - I \lambda) S = I_X.</math> + +A complex number λ is then in the '''spectrum''' if this property fails to hold. One can classify the spectrum in exactly the same way as in the bounded case. + +The spectrum of an unbounded operator is in general a closed, possibly empty, subset of the complex plane. + +For ''λ'' to be in the resolvent (i.e. not in the spectrum), as in the bounded case λ''I''&nbsp;&minus;&nbsp;''T'' must be bijective, since it must have a two-sided inverse. As before if an inverse exists then its linearity is immediate, but in general it may not be bounded, so this condition must be checked separately. + +However, boundedness of the inverse ''does'' follow directly from its existence if one introduces the additional assumption that ''T'' is [[closed operator|closed]]; this follows from the [[closed graph theorem]]. Therefore, as in the bounded case, a complex number ''λ'' lies in the spectrum of a closed operator ''T'' if and only if λ''I''&nbsp;&minus;&nbsp;''T'' is not bijective. Note that the class of closed operators includes all bounded operators. + +Via its [[spectral measure]]s, one can define a [[decomposition of spectrum (functional analysis)|decomposition of the spectrum]] of any self adjoint operator, bounded or otherwise into absolutely continuous, pure point, and singular parts. + +== Spectrum of a unital Banach algebra == +{{Expand section|date=June 2009}} +Let ''B'' be a complex [[Banach algebra]] containing a [[unit (ring theory)|unit]] ''e''. Then we define the spectrum σ(''x'') (or more explicitly σ<sub>''B''</sub>(''x'')) of an element ''x'' of ''B'' to be the set of those [[complex number]]s λ for which λ''e''&nbsp;−&nbsp;''x'' is not invertible in ''B''. This extends the definition for bounded linear operators ''B''(''X'') on a Banach space ''X'', since ''B''(''X'') is a Banach algebra. + +==See also== +*[[Essential spectrum]] +*[[Self-adjoint operator]] +*[[Pseudospectrum]] + +== References == +{{Reflist}} +*Dales et al., ''Introduction to Banach Algebras, Operators, and Harmonic Analysis'', ISBN 0-521-53584-0 +*{{springer|title=Spectrum of an operator|id=p/s086610}} + +{{Functional Analysis}} + +{{DEFAULTSORT:Spectrum (Functional Analysis)}} +[[Category:Spectral theory]] + e41i14ug3b53vl6h9yuuk4r8o1y5161 + + + + Multivariate normal distribution + 0 + 1587 + + 1588 + 2014-01-19T05:00:05Z + + 76.94.227.51 + + /* Cumulative distribution function */ Added a request for citation for the numerical algorithms. 
+ wikitext + text/x-wiki + {{Redirect|MVN|the airport with that [[International Air Transport Association airport code|IATA code]]|Mount Vernon Airport}} +{{Probability distribution + | name = + | type = multivariate + | pdf_image = [[Image:MultivariateNormal.png|300px]]<br/> <small>Many samples from a multivariate normal distribution, shown along with the 3-sigma ellipse, the two marginal distributions, and the two 1-d histograms.</small> + | cdf_image = + | notation = <math>\mathcal{N}(\boldsymbol\mu,\,\boldsymbol\Sigma)</math> + | parameters = '''''μ''''' ∈ '''R'''<sup>''k''</sup> — [[location parameter|location]]<br/>'''Σ''' ∈ '''R'''<sup>''k×k''</sup> — [[covariance matrix|covariance]] ([[nonnegative-definite matrix]]) + | support = '''''x''''' ∈ '''μ'''+span('''Σ''') ⊆ '''R'''<sup>''k''</sup> + | pdf = <math>(2\pi)^{-\frac{k}{2}}|\boldsymbol\Sigma|^{-\frac{1}{2}}\, e^{ -\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)'\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu) },</math><br/>exists only when '''Σ''' is [[positive-definite matrix|positive-definite]] + | cdf = (no analytic expression) + | mean = '''''μ''''' + | median = + | mode = '''''μ''''' + | variance = '''Σ''' + | skewness = + | kurtosis = + | entropy = <math>\frac{k}{2} (1 + \ln (2\pi)) + \frac{1}{2} \ln |\boldsymbol\Sigma |</math> + | mgf = <math>\exp\!\Big( \boldsymbol\mu'\mathbf{t} + \tfrac{1}{2} \mathbf{t}'\boldsymbol\Sigma \mathbf{t}\Big)</math> + | char = <math>\exp\!\Big( i\boldsymbol\mu'\mathbf{t} - \tfrac{1}{2} \mathbf{t}'\boldsymbol\Sigma \mathbf{t}\Big)</math> + }} +In [[probability theory]] and [[statistics]], the '''multivariate normal distribution''' or '''multivariate Gaussian distribution''', is a generalization of the one-dimensional ([[univariate]]) [[normal distribution]] to higher dimensions. One possible definition is that a [[random vector]] is said to be ''k''-variate normally distributed if every [[linear combination]] of its ''k'' components has a univariate normal distribution. However, its importance derives mainly from the [[Central limit theorem#Multivariate central limit theorem|multivariate central limit theorem]]. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued [[random variable]]s each of which clusters around a mean value. + +== Notation and parametrization == +The multivariate normal distribution of a ''k''-dimensional random vector {{nowrap|'''x''' {{=}} [''X''<sub>1</sub>, ''X''<sub>2</sub>, …, ''X<sub>k</sub>'']}} can be written in the following notation: +: <math> + \mathbf{x}\ \sim\ \mathcal{N}(\boldsymbol\mu,\, \boldsymbol\Sigma), + </math> +or to make it explicitly known that ''X'' is ''k''-dimensional, +: <math> + \mathbf{x}\ \sim\ \mathcal{N}_k(\boldsymbol\mu,\, \boldsymbol\Sigma). + </math> +with ''k''-dimensional [[mean vector]] +:<math> \boldsymbol\mu = [ \operatorname{E}[X_1], \operatorname{E}[X_2], \ldots, \operatorname{E}[X_k]] </math> +and ''k x k'' [[covariance matrix]] +:<math> \boldsymbol\Sigma = [\operatorname{Cov}[X_i, X_j]], i=1,2,\ldots,k; j=1,2,\ldots,k </math> + +== Definition == + +A [[random vector]] {{nowrap|1='''x''' = (''X''<sub>1</sub>, …, ''X''<sub>''k''</sub>)'}} is said to have the multivariate normal distribution if it satisfies the following equivalent conditions.<ref>Gut, Allan (2009) ''An Intermediate Course in Probability'', Springer. 
ISBN 9781441901613 (Chapter 5)</ref> + +*Every linear combination of its components ''Y''&nbsp;=&nbsp;''a''<sub>1</sub>''X''<sub>1</sub> + … + ''a<sub>k</sub>X<sub>k</sub>'' is [[normal distribution|normally distributed]]. That is, for any constant vector {{nowrap|'''a''' ∈ '''R'''<sup>''k''</sup>}}, the random variable {{nowrap|1=''Y'' = '''a′x'''}} has a univariate normal distribution. + +*There exists a random ''ℓ''-vector '''z''', whose components are independent standard normal random variables, a ''k''-vector '''μ''', and a ''k×ℓ'' [[matrix (math)|matrix]] '''A''', such that {{nowrap|1='''x''' = '''Az''' + '''μ'''}}. Here ''ℓ'' is the [[rank (linear algebra)|rank]] of the [[covariance matrix]] {{nowrap|1='''Σ''' = '''AA′'''}}. Especially in the case of full rank, see the section below on [[#Geometric interpretation|Geometric interpretation]]. + +*There is a ''k''-vector '''μ''' and a symmetric, [[nonnegative-definite]] ''k×k'' matrix '''Σ''', such that the [[Characteristic function (probability theory)|characteristic function]] of '''x''' is +:: <math> + \varphi_\mathbf{x}(\mathbf{u}) = \exp\Big( i\mathbf{u}'\boldsymbol\mu - \tfrac{1}{2} \mathbf{u}'\boldsymbol\Sigma \mathbf{u} \Big). + </math> + +The covariance matrix is allowed to be singular (in which case the corresponding distribution has no density). This case arises frequently in [[statistics]]; for example, in the distribution of the vector of [[errors and residuals in statistics|residuals]] in the [[ordinary least squares]] regression. Note also that the ''X''<sub>''i''</sub> are in general ''not'' independent; they can be seen as the result of applying the matrix '''A''' to a collection of independent Gaussian variables '''z'''. + +== Properties == + +===Density function=== + +====Non-degenerate case==== +The multivariate normal distribution is said to be "non-degenerate" when the symmetric covariance matrix <math>\boldsymbol\Sigma</math> is [[Positive-definite matrix|positive definite]]. In this case the distribution has [[probability density function|density]]<ref>[http://www.math.uiuc.edu/~r-ash/Stat/StatLec21-25.pdf UIUC, Lecture 21. ''The Multivariate Normal Distribution''], 21.5:"Finding the Density".</ref> + +:<math> +f_{\mathbf x}(x_1,\ldots,x_k) = +\frac{1}{\sqrt{(2\pi)^k|\boldsymbol\Sigma|}} +\exp\left(-\frac{1}{2}({\mathbf x}-{\boldsymbol\mu})^T{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu}) +\right), +</math> + +where <math>|\boldsymbol\Sigma|</math> is the [[determinant]] of <math>\boldsymbol\Sigma</math>. Note how the equation above reduces to that of the univariate normal distribution if <math>\boldsymbol\Sigma</math> is a <math>1 \times 1</math> matrix (i.e. a real number). + +Each iso-density locus&mdash;the locus of points in ''k''-dimensional space each of which gives the same particular value of the density&mdash;is an [[ellipse]] or its higher-dimensional generalization; hence the multivariate normal is a special case of the [[elliptical distribution]]s. 
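+
+The density above is straightforward to evaluate numerically. The following short Python sketch (an illustrative addition, not taken from the cited sources; the function name <code>mvn_logpdf</code> and the example numbers are arbitrary) computes the log-density via a Cholesky factorisation of '''Σ''', which avoids forming the explicit inverse:
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def mvn_logpdf(x, mu, Sigma):
+    """Log-density of a non-degenerate multivariate normal, following the formula above."""
+    k = len(mu)
+    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
+    # Cholesky factorisation Sigma = L L^T (requires Sigma to be positive definite).
+    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float))
+    # Solving L z = diff gives z^T z = diff^T Sigma^{-1} diff, the squared Mahalanobis distance.
+    z = np.linalg.solve(L, diff)
+    maha_sq = z @ z
+    log_det = 2.0 * np.sum(np.log(np.diag(L)))
+    return -0.5 * (k * np.log(2.0 * np.pi) + log_det + maha_sq)
+
+# Bivariate example with unit variances and correlation 0.5.
+mu = [0.0, 0.0]
+Sigma = [[1.0, 0.5],
+         [0.5, 1.0]]
+print(np.exp(mvn_logpdf([0.3, -0.2], mu, Sigma)))
+</syntaxhighlight>
+
+In practice a library routine such as <code>scipy.stats.multivariate_normal</code> can be used instead; the sketch is only meant to make the roles of the determinant and of the quadratic form in the exponent explicit.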
+ +;Bivariate case +In the 2-dimensional nonsingular case ({{nowrap|1=''k'' = rank(Σ) = 2}}), the [[probability density function]] of a vector {{nowrap|[''X'' ''Y'']′}} is +: <math> + f(x,y) = + \frac{1}{2 \pi \sigma_x \sigma_y \sqrt{1-\rho^2}} + \exp\left( + -\frac{1}{2(1-\rho^2)}\left[ + \frac{(x-\mu_x)^2}{\sigma_x^2} + + \frac{(y-\mu_y)^2}{\sigma_y^2} - + \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y} + \right] + \right), + </math> +where ''ρ'' is the [[Pearson product-moment correlation coefficient|correlation]] between ''X'' and ''Y'' and +where <math> \sigma_x>0 </math> and <math> \sigma_y>0 </math>. In this case, +: <math> + \boldsymbol\mu = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad + \boldsymbol\Sigma = \begin{pmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ + \rho \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}. + </math> +In the bivariate case, the first equivalent condition for multivariate normality can be made less restrictive: it is sufficient to verify that [[countably infinite|countably many]] distinct linear combinations of X and Y are normal in order to conclude that the vector {{nowrap|[X Y]′}} is bivariate normal.<ref name=HT/> + +The bivariate iso-density loci plotted in the ''x,y''-plane are ellipses. As the correlation parameter ''ρ'' increases, these loci appear to be squeezed to the following line : + +: <math> + y\left( x \right) = {\mathop{\rm sgn}} \left( {{\rho }} \right)\frac{{{\sigma _y}}}{{{\sigma _x}}}\left( {x - {\mu _x}} \right) + {\mu _y}. + </math> + +This is because the above expression - but without the rho being inside a signum function - is the [[best linear unbiased prediction]] of ''Y'' given a value of ''X''.<ref name=wyattlms/> + +====Degenerate case==== +If the covariance matrix <math>\boldsymbol\Sigma</math> is not full rank, then the multivariate normal distribution is degenerate and does not have a density. More precisely, it does not have a density with respect to ''k''-dimensional Lebesgue measure (which is the usual measure assumed in calculus-level probability courses). Only random vectors whose distributions are [[absolute continuity#Absolute continuity of measures|absolutely continuous]] with respect to a measure are said to have densities (with respect to that measure). To talk about densities but avoid dealing with measure-theoretic complications it can be simpler to restrict attention to a subset of <math>\text{rank}(\boldsymbol\Sigma)</math> of the coordinates of <math>\mathbf{x}</math> such that the covariance matrix for this subset is positive definite; then the other coordinates may be thought of as an [[affine function]] of the selected coordinates.{{citation needed|date=July 2012}} + +To talk about densities meaningfully in the singular case, then, we must select a different base measure. Using the [[disintegration theorem]] we can define a restriction of Lebesgue measure to the <math>\text{rank}(\boldsymbol\Sigma)</math>-dimensional affine subspace of <math>\mathbb{R}^k</math> where the Gaussian distribution is supported, i.e. <math>\{\boldsymbol\mu+\boldsymbol{\Sigma ^{1/2}}\mathbf{v} : \mathbf{v} \in \mathbb{R}^k \}</math>. 
With respect to this probability measure the distribution has density: +:<math>f(\mathbf{x})=(\text{det}^*(2\pi\boldsymbol\Sigma))^{-\frac{1}{2}}\, e^{ -\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)'\boldsymbol\Sigma^+(\mathbf{x}-\boldsymbol\mu) }</math> +where <math>\boldsymbol\Sigma^+</math> is the [[generalized inverse]] and det* is the [[pseudo-determinant]].<ref name=rao/> + +===Higher moments=== +{{Main|Isserlis’ theorem}} +The ''k''th-order [[moment (mathematics)|moments]] of '''x''' are defined by + +:<math> +\mu _{1,\dots,N}(\mathbf{x})\ \stackrel{\mathrm{def}}{=}\ \mu _{r_{1},\dots,r_{N}}(\mathbf{x})\ \stackrel{\mathrm{def}}{=}\ E\left[ +\prod\limits_{j=1}^{N}x_j^{r_{j}}\right] +</math> + +where {{nowrap|''r''<sub>1</sub> + ''r''<sub>2</sub> + ⋯ + ''r<sub>N</sub>'' {{=}} ''k''.}} + +The central ''k''-order central moments are given as follows + +(a) If ''k'' is odd, {{nowrap|''μ''<sub>1, …, ''N''</sub>('''x''' − '''μ''') {{=}} 0}}. + +(b) If ''k'' is even with {{nowrap|''k'' {{=}} 2''λ''}}, then + +:<math> +\mu _{1,\dots,2\lambda }(\mathbf{x}-\boldsymbol\mu )=\sum \left( \Sigma _{ij}\Sigma _{k\ell}\cdots\Sigma _{XZ}\right) +</math> + +where the sum is taken over all allocations of the set <math>\left\{ 1,\dots,2\lambda +\right\}</math> into ''λ'' (unordered) pairs. That is, if you have a ''k''th ({{nowrap| {{=}} 2''λ'' {{=}} 6}}) central moment, you will be summing the products of {{nowrap|''λ'' {{=}} 3}} covariances (the -'''μ''' notation has been dropped in the interests of parsimony): + +:<math>\begin{align} +& {} E[x_1 x_2 x_3 x_4 x_5 x_6] \\ +&{} = E[x_1 x_2 ]E[x_3 x_4 ]E[x_5 x_6 ] + E[x_1 x_2 ]E[x_3 x_5 ]E[x_4 x_6] + E[x_1 x_2 ]E[x_3 x_6 ]E[x_4 x_5] \\ +&{} + E[x_1 x_3 ]E[x_2 x_4 ]E[x_5 x_6 ] + E[x_1 x_3 ]E[x_2 x_5 ]E[x_4 x_6 ] + E[x_1 x_3]E[x_2 x_6]E[x_4 x_5] \\ +&+ E[x_1 x_4]E[x_2 x_3]E[x_5 x_6]+E[x_1 x_4]E[x_2 x_5]E[x_3 x_6]+E[x_1 x_4]E[x_2 x_6]E[x_3 x_5] \\ +& + E[x_1 x_5]E[x_2 x_3]E[x_4 x_6]+E[x_1 x_5]E[x_2 x_4]E[x_3 x_6]+E[x_1 x_5]E[x_2 x_6]E[x_3 x_4] \\ +& + E[x_1 x_6]E[x_2 x_3]E[x_4 x_5 ] + E[x_1 x_6]E[x_2 x_4 ]E[x_3 x_5] + E[x_1 x_6]E[x_2 x_5]E[x_3 x_4]. +\end{align}</math> + +This yields <math>(2\lambda -1)!/(2^{\lambda -1}(\lambda -1)!)</math> terms in the sum (15 in the above case), each being the product of ''λ'' (in this case 3) covariances. For fourth order moments (four variables) there are three terms. For sixth-order moments there are 3&nbsp;×&nbsp;5 = 15 terms, and for eighth-order moments there are 3&nbsp;×&nbsp;5&nbsp;×&nbsp;7 = 105 terms. + +The covariances are then determined by replacing the terms of the list <math>\left[ 1,\dots,2\lambda \right]</math> by the corresponding terms of the list consisting of ''r''<sub>1</sub> ones, then ''r''<sub>2</sub> twos, etc.. To illustrate this, examine the following 4th-order central moment case: + +:<math>E\left[ x_i^4\right] = 3\Sigma _{ii}^2</math> +:<math>E\left[ x_i^3 x_j\right] = 3\Sigma _{ii} \Sigma _{ij}</math> +:<math>E\left[ x_i^2 x_j^2\right] = \Sigma _{ii}\Sigma_{jj}+2\left( \Sigma _{ij}\right) ^2</math> +:<math>E\left[ x_i^2x_jx_k\right] = \Sigma _{ii}\Sigma _{jk}+2\Sigma _{ij}\Sigma _{ik}</math> +:<math>E\left[ x_i x_j x_k x_n\right] = \Sigma _{ij}\Sigma _{kn}+\Sigma _{ik}\Sigma _{jn}+\Sigma _{in}\Sigma _{jk}. +</math> + +where <math>\Sigma_{ij}</math> is the covariance of ''x<sub>i</sub>'' and ''x<sub>j</sub>''. 
The idea with the above method is you first find the general case for a ''k''th moment where you have ''k'' different ''x'' variables - <math>E\left[ x_i x_j x_k x_n\right]</math> and then you can simplify this accordingly. Say, you have <math>E\left[ x_i^2 x_k x_n\right]</math> then you simply let {{nowrap|''x<sub>i</sub>'' {{=}} ''x<sub>j</sub>''}} and realise that {{nowrap|''<math>\Sigma _{ii}</math>'' {{=}} ''σ<sub>i</sub>''<sup>2</sup>}}. + +===Likelihood function=== + +If the mean and variance matrix are unknown, a suitable log likelihood function for a single observation '''x''' would be:{{citation needed|date=March 2012}} + +:<math>\ln(L)= -\frac{1}{2} \ln (|\boldsymbol\Sigma|\,) -\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^{\rm T}\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu) - \frac{k}{2}\ln(2\pi)</math> + +where ''x'' is a vector of real numbers. The complex case, where ''z'' is a vector of complex numbers, would be + +:<math>\ln(L) = -\frac{1}{2}\ln (|\boldsymbol\Sigma|\,) -\frac{1}{2}(\mathbf{z}-\boldsymbol\mu)^\dagger\boldsymbol\Sigma^{-1}(\mathbf{z}-\boldsymbol\mu) - \frac{k}{2}\ln(2\pi)</math> + +i.e. with the [[conjugate transpose]] (indicated by <math>\dagger</math>) replacing the normal [[transpose]] (indicated by <math>{}^{\rm T}</math>). A similar notation is used for [[multiple linear regression]].<ref>Tong, T. (2010) [http://amath.colorado.edu/courses/7400/2010Spr/lecture9.pdf Multiple Linear Regression : MLE and Its Distributional Results], Lecture Notes</ref> + +===Entropy=== + +The [[differential entropy]] of the multivariate normal distribution is<ref>{{cite journal + | last1 = Gokhale | first1 = DV | authorlink1= + | last2 = Ahmed | first2 = NA + | last3 = Res |first3=BC + | last4 = Piscataway |first4=NJ + | date = May 1989 + | title = Entropy Expressions and Their Estimators for Multivariate Distributions + | journal = Information Theory, IEEE Transactions on + | volume = 35 | issue = 3 | pages = 688–692 + | doi =10.1109/18.30996 +}}</ref> + +:<math> +\begin{align} +h\left(f\right) & = -\int_{-\infty}^\infty \int_{-\infty}^\infty \cdots\int_{-\infty}^\infty f(\mathbf{x}) \ln f(\mathbf{x})\,d\mathbf{x},\\ +& = \frac12 \ln\left|(2\pi e) \boldsymbol\Sigma \right|,\\ +\end{align} +</math> +where the bars denote the [[determinant|matrix determinant]]. + +===Kullback–Leibler divergence=== +The [[Kullback–Leibler divergence]] from <math>\mathcal{N}_0(\boldsymbol\mu_0, \boldsymbol\Sigma_0)</math> to <math>\mathcal{N}_1(\boldsymbol\mu_1, \boldsymbol\Sigma_1)</math>, for non-singular matrices Σ<sub>0</sub> and Σ<sub>1</sub>, is:<ref>Penny & Roberts, PARG-00-12, (2000) [http://www.allisons.org/ll/MML/KL/Normal]. pp. 18</ref> + +:<math> +D_\text{KL}(\mathcal{N}_0 \| \mathcal{N}_1) = { 1 \over 2 } \left\{ \mathrm{tr} \left( \boldsymbol\Sigma_1^{-1} \boldsymbol\Sigma_0 \right) + \left( \boldsymbol\mu_1 - \boldsymbol\mu_0\right)^{\rm T} \boldsymbol\Sigma_1^{-1} ( \boldsymbol\mu_1 - \boldsymbol\mu_0 ) - K -\ln { | \boldsymbol \Sigma_0 | \over | \boldsymbol\Sigma_1 | } \right\}, +</math> +where <math>K</math> is the dimension of the vector space. + +The [[logarithm]] must be taken to base ''[[e (mathematical constant)|e]]'' since the two terms following the logarithm are themselves base-''e'' logarithms of expressions that are either factors of the density function or otherwise arise naturally. The equation therefore gives a result measured in [[nat (information)|nats]]. Dividing the entire expression above by log<sub>''e''</sub>&nbsp;2 yields the divergence in [[bit]]s. 
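+
+As a minimal illustration (not taken from the cited reference; the function name <code>gaussian_kl</code> and the test matrices are arbitrary), the expression above can be evaluated directly with NumPy:
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def gaussian_kl(mu0, Sigma0, mu1, Sigma1):
+    """KL divergence from N(mu0, Sigma0) to N(mu1, Sigma1) in nats, for non-singular covariances."""
+    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
+    Sigma0, Sigma1 = np.asarray(Sigma0, float), np.asarray(Sigma1, float)
+    K = mu0.shape[0]
+    Sigma1_inv = np.linalg.inv(Sigma1)
+    diff = mu1 - mu0
+    term_trace = np.trace(Sigma1_inv @ Sigma0)
+    term_quad = diff @ Sigma1_inv @ diff
+    term_logdet = np.log(np.linalg.det(Sigma1) / np.linalg.det(Sigma0))
+    return 0.5 * (term_trace + term_quad - K + term_logdet)
+
+mu = [1.0, -1.0]
+Sigma = [[2.0, 0.3],
+         [0.3, 1.0]]
+print(gaussian_kl(mu, Sigma, mu, Sigma))                          # divergence of a distribution from itself: 0
+print(gaussian_kl([0.0, 0.0], np.eye(2), mu, Sigma) / np.log(2))  # the same quantity expressed in bits
+</syntaxhighlight>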
+ +=== Cumulative distribution function === + +The notion of [[cumulative distribution function]] (cdf) in dimension 1 can be extended in two ways to the multidimensional case. +The first way is to define the cumulative distribution function <math>F(r)</math> as the probability that a sample '''falls''' inside the ellipsoid determined by its [[Mahalanobis distance]] <math>r</math> from the Gaussian, a direct generalization of the standard deviation +.<ref name=Bensimhoun>[https://upload.wikimedia.org/wikipedia/commons/a/a2/Cumulative_function_n_dimensional_Gaussians_12.2013.pdf Bensimhoun Michael, ''N-Dimensional Cumulative Function, And Other Useful Facts About Gaussians and Normal Densities'' (2006)]</ref> +In order to compute the values of this function, closed analytic formulae exist.<ref name="Bensimhoun"/> + +Another way to extend the notion of cumulative distribution function is to define +the [[cumulative distribution function]] (cdf) ''F''('''x'''<sub>0</sub>) of a random vector '''x''' as the probability that all components of '''x''' are less than or equal to the corresponding values in the vector&nbsp;'''x'''<sub>0</sub>. Though there is no closed form for ''F''('''x'''), there are a number of algorithms that estimate it numerically.{{Citation needed|reason=Existence of algorithms not obvious but easily demonstrated with a citation|date=January 2014}} + +===Prediction Interval=== + +The [[prediction interval]] for the multivariate normal distribution yields a region consisting of those vectors '''x''' satisfying + +:<math>({\mathbf x}-{\boldsymbol\mu})^T{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu}) \leq \chi^2_k(p).</math> + +Here <math>{\mathbf x}</math> is a <math>k</math>-dimensional vector, <math>{\boldsymbol\mu}</math> is the known <math>k</math>-dimensional mean vector, <math>\boldsymbol\Sigma</math> is the known [[covariance matrix]] and <math>\chi^2_k(p)</math> is the [[quantile function]] for probability <math>p</math> of the [[chi-squared distribution]] with <math>k</math> degrees of freedom.<ref name=Siotani/> + +When <math>k = 2,</math> the expression defines the interior of an ellipse and the chi-squared distribution simplifies to an [[exponential distribution]] with mean equal to two. + +==Joint normality== + +===Normally distributed and independent=== + +If ''X'' and ''Y'' are normally distributed and [[statistical independence|independent]], this implies they are "jointly normally distributed", i.e., the pair (''X'',&nbsp;''Y'') must have multivariate normal distribution. However, a pair of jointly normally distributed variables need not be independent (would only be so if uncorrelated, <math> \rho = 0</math> ). + +===Two normally distributed random variables need not be jointly bivariate normal=== +{{See also|normally distributed and uncorrelated does not imply independent}} +The fact that two random variables ''X'' and ''Y'' both have a normal distribution does not imply that the pair (''X'',&nbsp;''Y'') has a joint normal distribution. A simple example is one in which X has a normal distribution with expected value 0 and variance 1, and ''Y''&nbsp;=&nbsp;''X'' if |''X''|&nbsp;>&nbsp;''c'' and ''Y''&nbsp;=&nbsp;−''X'' if |''X''|&nbsp;<&nbsp;''c'', where ''c''&nbsp;>&nbsp;0. There are similar counterexamples for more than two random variables. In general, they sum to a [[mixture model]]. + +===Correlations and independence=== + +In general, random variables may be uncorrelated but highly dependent. 
But if a random vector has a multivariate normal distribution then any two or more of its components that are uncorrelated are [[statistical independence|independent]]. This implies that any two or more of its components that are [[pairwise independence|pairwise independent]] are independent. + +But it is '''not''' true that two random variables that are (separately, marginally) normally distributed and uncorrelated are independent. Two random variables that are normally distributed may fail to be ''jointly'' normally distributed, i.e., the vector whose components they are may fail to have a multivariate normal distribution. In the preceding example, clearly ''X'' and ''Y'' are not independent, yet [[normally distributed and uncorrelated does not imply independent|choosing ''c'' to be 1.54]] makes them uncorrelated. + +==Conditional distributions== + +If '''μ''' and '''Σ''' are partitioned as follows + +:<math> +\boldsymbol\mu += +\begin{bmatrix} + \boldsymbol\mu_1 \\ + \boldsymbol\mu_2 +\end{bmatrix} +\text{ with sizes }\begin{bmatrix} q \times 1 \\ (N-q) \times 1 \end{bmatrix}</math> + +:<math> +\boldsymbol\Sigma += +\begin{bmatrix} + \boldsymbol\Sigma_{11} & \boldsymbol\Sigma_{12} \\ + \boldsymbol\Sigma_{21} & \boldsymbol\Sigma_{22} +\end{bmatrix} +\text{ with sizes }\begin{bmatrix} q \times q & q \times (N-q) \\ (N-q) \times q & (N-q) \times (N-q) \end{bmatrix}</math> + +then, the distribution of '''x'''<sub>1</sub> conditional on '''x'''<sub>2</sub> = ''a'' is multivariate normal {{nowrap|('''x'''<sub>1</sub>{{!}}'''x'''<sub>2</sub> {{=}} '''a''') ~ ''N''(<span style{{=}}"text-decoration:overline;">'''μ'''</span>, <span style{{=}}"text-decoration:overline;">'''Σ'''</span>)}} where + +:<math> +\bar{\boldsymbol\mu} += +\boldsymbol\mu_1 + \boldsymbol\Sigma_{12} \boldsymbol\Sigma_{22}^{-1} +\left( + \mathbf{a} - \boldsymbol\mu_2 +\right) +</math> + +and covariance matrix + +:<math> +\overline{\boldsymbol\Sigma} += +\boldsymbol\Sigma_{11} - \boldsymbol\Sigma_{12} \boldsymbol\Sigma_{22}^{-1} \boldsymbol\Sigma_{21}. +</math><ref name=eaton>{{cite book|last=Eaton|first=Morris L.|title=Multivariate Statistics: a Vector Space Approach|year=1983|publisher=John Wiley and Sons|isbn=0-471-02776-6|pages=116–117}}</ref> + +This matrix is the [[Schur complement]] of '''Σ'''<sub>22</sub> in '''Σ'''. This means that to calculate the conditional covariance matrix, one inverts the overall covariance matrix, drops the rows and columns corresponding to the variables being conditioned upon, and then inverts back to get the conditional covariance matrix. Here <math>\boldsymbol\Sigma_{22}^{-1}</math> is the [[generalized inverse]] of <math>\boldsymbol\Sigma_{22}</math>. + +Note that knowing that {{nowrap|'''x'''<sub>2</sub> {{=}} '''a'''}} alters the variance, though the new variance does not depend on the specific value of '''a'''; perhaps more surprisingly, the mean is shifted by <math>\boldsymbol\Sigma_{12} \boldsymbol\Sigma_{22}^{-1} \left(\mathbf{a} - \boldsymbol\mu_2 \right)</math>; compare this with the situation of not knowing the value of '''a''', in which case '''x'''<sub>1</sub> would have distribution +<math>\mathcal{N}_q \left(\boldsymbol\mu_1, \boldsymbol\Sigma_{11} \right)</math>. + +An interesting fact derived in order to prove this result, is that the random vectors <math>\mathbf{x}_2</math> and <math>\mathbf{y}_1=\mathbf{x}_1-\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}\mathbf{x}_2</math> are independent. 
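+
+As a small numerical sketch of the conditioning formulas above (illustrative only; the helper name <code>condition_mvn</code> is arbitrary, and a non-singular '''Σ'''<sub>22</sub> is assumed rather than using a generalized inverse):
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def condition_mvn(mu, Sigma, q, a):
+    """Mean and covariance of x_1 | x_2 = a when the vector is split after the first q coordinates."""
+    mu = np.asarray(mu, float)
+    Sigma = np.asarray(Sigma, float)
+    a = np.asarray(a, float)
+    mu1, mu2 = mu[:q], mu[q:]
+    S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
+    S21, S22 = Sigma[q:, :q], Sigma[q:, q:]
+    S22_inv = np.linalg.inv(S22)
+    mu_bar = mu1 + S12 @ S22_inv @ (a - mu2)    # conditional mean
+    Sigma_bar = S11 - S12 @ S22_inv @ S21       # Schur complement of Sigma_22
+    return mu_bar, Sigma_bar
+
+# Bivariate example: condition the first coordinate on the second being equal to 1.5.
+mu = [0.0, 0.0]
+Sigma = [[1.0, 0.8],
+         [0.8, 2.0]]
+print(condition_mvn(mu, Sigma, q=1, a=[1.5]))   # conditional mean 0.6, conditional variance 0.68
+</syntaxhighlight>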
+ +The matrix '''Σ'''<sub>12</sub>'''Σ'''<sub>22</sub><sup>−1</sup> is known as the matrix of [[regression analysis|regression]] coefficients. + +=== Bivariate case === +In the bivariate case where '''x''' is partitioned into ''X''<sub>1</sub> and ''X''<sub>2</sub>, the conditional distribution of ''X''<sub>1</sub> given ''X''<sub>2</sub> is<ref>{{cite book|last=Jensen|first=J|title=Statistics for Petroleum Engineers and Geoscientists|year=2000|publisher=Elsevier|location=Amsterdam|pages=207}}</ref> + +: <math>X_1|X_2=x_2 \ \sim\ \mathcal{N}\left(\mu_1+\frac{\sigma_1}{\sigma_2}\rho( x_2 - \mu_2),\, (1-\rho^2)\sigma_1^2\right). </math> + +where <math>\rho</math> is the [[Pearson product-moment correlation coefficient|correlation coefficient]] between ''X''<sub>1</sub> and ''X''<sub>2</sub>. + +=== Bivariate conditional expectation === + +====In the general case==== + +:<math> +\begin{pmatrix} + X_1 \\ + X_2 +\end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} + \mu_1 \\ + \mu_2 +\end{pmatrix} , \begin{pmatrix} + \sigma^2_1 & \sigma_{12} \\ + \sigma_{12} & \sigma^2_2 +\end{pmatrix} \right) +</math> + +The conditional expectation of X<sub>1</sub> given X<sub>2</sub> is: + +<math>\operatorname{E}(X_1 | X_2=x_2) = \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)</math> + +Proof: the result is simply obtained taking the expectation of the conditional distribution <math>X_1|X_2</math> above. + +====In the standard normal case==== + +:<math> +\begin{pmatrix} + X_1 \\ + X_2 +\end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} + 0 \\ + 0 +\end{pmatrix} , \begin{pmatrix} + 1 & \rho \\ + \rho & 1 +\end{pmatrix} \right) +</math> + +The conditional expectation of X<sub>1</sub> given X<sub>2</sub> is: + +<math>\operatorname{E}(X_1 | X_2=x_2)= \rho x_2 </math> + +and the conditional expectation of X<sub>1</sub> given that X<sub>2</sub> is smaller/bigger than z is (Maddala 1983, p.&nbsp;367<ref name=Maddala83>{{cite book|last=Gangadharrao|first=Maddala|title=Limited Dependent and Qualitative Variables in Econometrics|year=1983|publisher=Cambridge University Press}}</ref>) : + +:<math> +\operatorname{E}(X_1 | X_2 < z) = -\rho { \phi(z) \over \Phi(z) } , +</math> + +:<math> +\operatorname{E}(X_1 | X_2 > z) = \rho { \phi(z) \over (1- \Phi(z)) } , +</math> + +where the final ratio here is called the [[inverse Mills ratio]]. + +Proof: the last two results are obtained using the result <math>\operatorname{E}(X_1 | X_2=x_2)= \rho x_2 </math>, so that :<math> +\operatorname{E}(X_1 | X_2 < z) = \rho (X_2 | X_2 < z)</math> and then using the properties of the expectation of a [[truncated normal distribution]]. + +==Marginal distributions== +To obtain the [[marginal distribution]] over a subset of multivariate normal random variables, one only needs to drop the irrelevant variables (the variables that one wants to marginalize out) from the mean vector and the covariance matrix. The proof for this follows from the definitions of multivariate normal distributions and linear algebra.<ref>The formal proof for marginal distribution is shown here http://fourier.eng.hmc.edu/e161/lectures/gaussianprocess/node7.html</ref> + +''Example'' + +Let {{nowrap|'''x''' {{=}} [''X''<sub>1</sub>, ''X''<sub>2</sub>, ''X''<sub>3</sub>]}} be multivariate normal random variables with mean vector {{nowrap|'''μ''' {{=}} [''μ''<sub>1</sub>, ''μ''<sub>2</sub>, ''μ''<sub>3</sub>]}} and covariance matrix '''Σ''' (standard parametrization for multivariate normal distributions). 
Then the joint distribution of {{nowrap|'''x′''' {{=}} [''X''<sub>1</sub>, ''X''<sub>3</sub>]}} is multivariate normal with mean vector {{nowrap|'''μ′''' {{=}} [''μ''<sub>1</sub>, ''μ''<sub>3</sub>]}} and covariance matrix +<math> \boldsymbol\Sigma' = +\begin{bmatrix} +\boldsymbol\Sigma_{11} & \boldsymbol\Sigma_{13} \\ +\boldsymbol\Sigma_{31} & \boldsymbol\Sigma_{33} +\end{bmatrix} +</math>. + +==Affine transformation== + +If {{nowrap|'''y''' {{=}} '''c''' + '''Bx'''}} is an [[affine transformation]] of <math>\mathbf{x}\ \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma),</math> where '''c''' is an <math>M \times 1</math> vector of constants and '''B''' is a constant <math>M \times N</math> matrix, then '''y''' has a multivariate normal distribution with expected value {{nowrap|'''c''' + '''Bμ'''}} and variance '''BΣB'''<sup>T</sup> i.e., <math>\mathbf{y} \sim \mathcal{N} \left(\mathbf{c} + \mathbf{B} \boldsymbol\mu, \mathbf{B} \boldsymbol\Sigma \mathbf{B}^{\rm T}\right)</math>. In particular, any subset of the ''x<sub>i</sub>'' has a marginal distribution that is also multivariate normal. +To see this, consider the following example: to extract the subset (''x''<sub>1</sub>, ''x''<sub>2</sub>, ''x''<sub>4</sub>)<sup>T</sup>, use + +:<math> +\mathbf{B} += +\begin{bmatrix} + 1 & 0 & 0 & 0 & 0 & \ldots & 0 \\ + 0 & 1 & 0 & 0 & 0 & \ldots & 0 \\ + 0 & 0 & 0 & 1 & 0 & \ldots & 0 +\end{bmatrix} +</math> + +which extracts the desired elements directly. + +Another corollary is that the distribution of {{nowrap|'''Z''' {{=}} '''b''' · '''x'''}}, where '''b''' is a constant vector of the same length as '''x''' and the dot indicates a vector product, is univariate Gaussian with <math>Z\sim\mathcal{N}\left(\mathbf{b}\cdot\boldsymbol\mu, \mathbf{b}^{\rm T}\boldsymbol\Sigma \mathbf{b}\right)</math>. This result follows by using + +:<math> +\mathbf{B}=\begin{bmatrix} +b_1 & b_2 & \ldots & b_n +\end{bmatrix}. +</math> +Observe how the positive-definiteness of '''Σ''' implies that the variance of the dot product must be positive. + +An affine transformation of '''x''' such as 2'''x''' is not the same as the [[Sum of normally distributed random variables|sum of two independent realisations]] of '''x'''. + +==Geometric interpretation== + +The equidensity contours of a non-singular multivariate normal distribution are [[ellipsoid]]s (i.e. linear transformations of [[hypersphere]]s) centered at the mean.<ref>{{cite web|author=Nikolaus Hansen|title=The CMA Evolution Strategy: A Tutorial|url=http://www.lri.fr/~hansen/cmatutorial.pdf|format=PDF}}</ref> Hence the multivariate normal distribution is an example of the class of [[elliptical distribution]]s. The directions of the principal axes of the ellipsoids are given by the eigenvectors of the covariance matrix '''Σ'''. The squared relative lengths of the principal axes are given by the corresponding eigenvalues. 
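As a small numerical illustration of the preceding paragraph (a sketch only; the 2 × 2 covariance matrix is an arbitrary example), the directions and relative lengths of the principal axes can be read off an eigendecomposition of '''Σ''':

<syntaxhighlight lang="python">
import numpy as np

# Arbitrary 2-dimensional covariance matrix, used only for illustration.
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# eigh returns the eigenvalues in ascending order and orthonormal eigenvectors (columns).
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)

print(eigenvectors)           # columns: directions of the principal axes
print(np.sqrt(eigenvalues))   # relative lengths of the corresponding axes
</syntaxhighlight>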
+ +If {{nowrap|'''Σ''' {{=}} '''UΛU'''<sup>T</sup> {{=}} '''UΛ'''<sup>1/2</sup>('''UΛ'''<sup>1/2</sup>)<sup>T</sup>}} is an [[eigendecomposition]] where the columns of '''U''' are unit eigenvectors and '''Λ''' is a [[diagonal matrix]] of the eigenvalues, then we have + +::<math>\mathbf{x}\ \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma) \iff \mathbf{x}\ \sim \boldsymbol\mu+\mathbf{U}\boldsymbol\Lambda^{1/2}\mathcal{N}(0, \mathbf{I}) \iff \mathbf{x}\ \sim \boldsymbol\mu+\mathbf{U}\mathcal{N}(0, \boldsymbol\Lambda).</math> + +Moreover, '''U''' can be chosen to be a [[rotation matrix]], as inverting an axis does not have any effect on ''N''(0, '''Λ'''), but inverting a column changes the sign of '''U''''s determinant. The distribution ''N''('''μ''', '''Σ''') is in effect ''N''(0, '''I''') scaled by '''Λ'''<sup>1/2</sup>, rotated by '''U''' and translated by '''μ'''. + +Conversely, any choice of '''μ''', full rank matrix '''U''', and positive diagonal entries Λ<sub>''i''</sub> yields a non-singular multivariate normal distribution. If any Λ<sub>''i''</sub> is zero and '''U''' is square, the resulting covariance matrix '''UΛU'''<sup>T</sup> is [[singular matrix|singular]]. Geometrically this means that every contour ellipsoid is infinitely thin and has zero volume in ''n''-dimensional space, as at least one of the principal axes has length of zero. + +==Estimation of parameters== + +The derivation of the [[maximum likelihood|maximum-likelihood]] [[estimator]] of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle and elegant. See [[estimation of covariance matrices]]. + +In short, the probability density function (pdf) of a multivariate normal is + +:<math>f(\mathbf{x})= \frac{1}{\sqrt { (2\pi)^k|\boldsymbol \Sigma| } } \exp\left(-{1 \over 2} (\mathbf{x}-\boldsymbol\mu)^{\rm T} \boldsymbol\Sigma^{-1} ({\mathbf x}-\boldsymbol\mu)\right)</math> + +and the ML estimator of the covariance matrix from a sample of ''n'' observations is + +:<math>\widehat{\boldsymbol\Sigma} = {1 \over n}\sum_{i=1}^n ({\mathbf x}_i-\overline{\mathbf x})({\mathbf x}_i-\overline{\mathbf x})^T</math> + +which is simply the [[sample covariance matrix]]. This is a [[biased estimator]] whose expectation is + +:<math>E[\widehat{\boldsymbol\Sigma}] = \frac{n-1}{n} \boldsymbol\Sigma.</math> + +An unbiased sample covariance is + +:<math>\widehat{\boldsymbol\Sigma} = {1 \over n-1}\sum_{i=1}^n (\mathbf{x}_i-\overline{\mathbf{x}})(\mathbf{x}_i-\overline{\mathbf{x}})^{\rm T}.</math> + +The [[Fisher information matrix]] for estimating the parameters of a multivariate normal distribution has a closed form expression. This can be used, for example, to compute the [[Cramér–Rao bound]] for parameter estimation in this setting. See [[Fisher information#Multivariate normal distribution|Fisher information]] for more details. + +==Bayesian inference== +In [[Bayesian statistics]], the [[conjugate prior]] of the mean vector is another multivariate normal distribution, and the conjugate prior of the covariance matrix is an [[inverse-Wishart distribution]] <math>\mathcal{W}^{-1}</math> . 
Suppose then that ''n'' observations have been made +:<math>\mathbf{X} = \{\mathbf{x}_1,\dots,\mathbf{x}_n\} \sim \mathcal{N}(\boldsymbol\mu,\boldsymbol\Sigma)</math> +and that a conjugate prior has been assigned, where +:<math>p(\boldsymbol\mu,\boldsymbol\Sigma)=p(\boldsymbol\mu\mid\boldsymbol\Sigma)\ p(\boldsymbol\Sigma),</math> +where +:<math>p(\boldsymbol\mu\mid\boldsymbol\Sigma) \sim\mathcal{N}(\boldsymbol\mu_0,m^{-1}\boldsymbol\Sigma) ,</math> +and +:<math>p(\boldsymbol\Sigma) \sim \mathcal{W}^{-1}(\boldsymbol\Psi,n_0).</math> + +Then,{{citation needed|date=July 2012}} + +:<math> +\begin{array}{rcl} +p(\boldsymbol\mu\mid\boldsymbol\Sigma,\mathbf{X}) & \sim & \mathcal{N}\left(\frac{n\bar{\mathbf{x}} + m\boldsymbol\mu_0}{n+m},\frac{1}{n+m}\boldsymbol\Sigma\right),\\ +p(\boldsymbol\Sigma\mid\mathbf{X}) & \sim & \mathcal{W}^{-1}\left(\boldsymbol\Psi+n\mathbf{S}+\frac{nm}{n+m}(\bar{\mathbf{x}}-\boldsymbol\mu_0)(\bar{\mathbf{x}}-\boldsymbol\mu_0)', n+n_0\right), +\end{array} +</math> +where +:<math> +\begin{array}{rcl} +\bar{\mathbf{x}} & = & n^{-1}\sum_{i=1}^{n} \mathbf{x}_i ,\\ +\mathbf{S} & = & n^{-1}\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' . +\end{array} +</math> + +== Multivariate normality tests == + +Multivariate normality tests check a given set of data for similarity to the multivariate [[normal distribution]]. The [[null hypothesis]] is that the [[data set]] is similar to the normal distribution, therefore a sufficiently small [[p-value|''p''-value]] indicates non-normal data. Multivariate normality tests include the Cox-Small test<ref>{{cite doi | 10.1093/biomet/65.2.263}}</ref> +and Smith and Jain's adaptation<ref>{{cite doi | 10.1109/34.6789}}</ref> of the Friedman-Rafsky test.<ref>{{cite doi|10.1214/aos/1176344722}}</ref> + +'''Mardia's test'''<ref name=Mardia/> is based on multivariate extensions of [[skewness]] and [[kurtosis]] measures. For a sample {'''x'''<sub>1</sub>, ..., '''x'''<sub>''n''</sub>} of ''k''-dimensional vectors we compute +: <math>\begin{align} + & \widehat{\boldsymbol\Sigma} = {1 \over n} \sum_{j=1}^n \left(\mathbf{x}_j - \bar{\mathbf{x}}\right)\left(\mathbf{x}_j - \bar{\mathbf{x}}\right)^T \\ + + & A = {1 \over 6n} \sum_{i=1}^n \sum_{j=1}^n \left[ (\mathbf{x}_i - \bar{\mathbf{x}})^T\;\widehat{\boldsymbol\Sigma}^{-1} (\mathbf{x}_j - \bar{\mathbf{x}}) \right]^3 \\ + + & B = \sqrt{\frac{n}{8k(k+2)}}\left\{{1 \over n} \sum_{i=1}^n \left[ (\mathbf{x}_i - \bar{\mathbf{x}})^T\;\widehat{\boldsymbol\Sigma}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}) \right]^2 - k(k+2) \right\} + \end{align}</math> +Under the null hypothesis of multivariate normality, the statistic ''A'' will have approximately a [[chi-squared distribution]] with {{nowrap|{{frac2|1|6}}⋅''k''(''k'' + 1)(''k'' + 2)}} degrees of freedom, and ''B'' will be approximately [[standard normal]] ''N''(0,1). + +Mardia's kurtosis statistic is skewed and converges very slowly to the limiting normal distribution. For medium size samples <math>(50 \le n < 400)</math>, the parameters of the asymptotic distribution of the kurtosis statistic are modified<ref>Rencher (1995), pages 112-113.</ref> For small sample tests (<math>n<50</math>) empirical critical values are used. Tables of critical values for both statistics are given by Rencher<ref>Rencher (1995), pages 493-495.</ref> for ''k''=2,3,4. + +Mardia's tests are affine invariant but not consistent. 
For example, the multivariate skewness test is not consistent against +symmetric non-normal alternatives.<ref>{{cite doi|10.1016/0047-259X(91)90031-V}}</ref> + +The '''BHEP test'''<ref name=EP/> computes the norm of the difference between the empirical [[characteristic function (probability theory)|characteristic function]] and the theoretical characteristic function of the normal distribution. Calculation of the norm is performed in the [[Lp space|L<sup>2</sup>(''μ'')]] space of square-integrable functions with respect to the Gaussian weighting function <math>\scriptstyle \mu_\beta(\mathbf{t}) = (2\pi\beta^2)^{-k/2} e^{-|\mathbf{t}|^2/(2\beta^2)}</math>. The test statistic is +: <math>\begin{align} + T_\beta &= \int_{\mathbb{R}^k} \left| {1 \over n} \sum_{j=1}^n e^{i\mathbf{t}^T\widehat{\boldsymbol\Sigma}^{-1/2}(\mathbf{x}_j - \bar{\mathbf{x}})} - e^{-|\mathbf{t}|^2/2} \right|^2 \; \boldsymbol\mu_\beta(\mathbf{t}) d\mathbf{t} \\ + &= {1 \over n^2} \sum_{i,j=1}^n e^{-{\beta \over 2}(\mathbf{x}_i-\mathbf{x}_j)^T\widehat{\boldsymbol\Sigma}^{-1}(\mathbf{x}_i-\mathbf{x}_j)} - \frac{2}{n(1 + \beta^2)^{k/2}}\sum_{i=1}^n e^{ -\frac{\beta^2}{2(1+\beta^2)} (\mathbf{x}_i-\bar{\mathbf{x}})^T\widehat{\boldsymbol\Sigma}^{-1}(\mathbf{x}_i-\bar{\mathbf{x}})} + \frac{1}{(1 + 2\beta^2)^{k/2}} + \end{align}</math> +The limiting distribution of this test statistic is a weighted sum of chi-squared random variables;<ref name=BH/> in practice, however, it is more convenient to compute the sample quantiles using Monte Carlo simulations.{{citation needed|date=July 2012}} + +A detailed survey of these and other test procedures is available.<ref name=Henze/> + +==Drawing values from the distribution== + +A widely used method for drawing a random vector '''x''' from the ''N''-dimensional multivariate normal distribution with mean vector '''μ''' and [[covariance matrix]] '''Σ''' works as follows:<ref name=Gentle/> + +# Find any real matrix '''A''' such that {{nowrap|'''A'''&thinsp;'''A'''<sup>T</sup> {{=}} '''Σ'''}}. When '''Σ''' is positive-definite, the [[Cholesky decomposition]] is typically used, and the [[Cholesky decomposition#Avoiding taking square roots|extended form]] of this decomposition can always be used (as the covariance matrix may be only positive semi-definite); in both cases a suitable matrix '''A''' is obtained.{{citation needed|date=July 2012}} An alternative is to use the matrix '''A''' = '''UΛ'''<sup>½</sup> obtained from a [[Eigendecomposition of a matrix#Symmetric matrices|spectral decomposition]] '''Σ''' = '''UΛU'''<sup>T</sup> of '''Σ'''.{{citation needed|date=July 2012}} The former approach is more computationally straightforward, but the matrices '''A''' change for different orderings of the elements of the random vector, while the latter approach gives matrices that are related by simple re-orderings. In theory both approaches give equally good ways of determining a suitable matrix '''A''', but there are differences in computation time. +# Let {{nowrap|'''z''' {{=}} (''z''<sub>1</sub>, …, ''z<sub>N</sub>'')<sup>T</sup>}} be a vector whose components are ''N'' [[statistical independence|independent]] [[normal distribution|standard normal]] variates (which can be generated, for example, by using the [[Box–Muller transform]]). +# Let '''x''' be {{nowrap|'''μ''' + '''Az'''}}. This has the desired distribution due to the affine transformation property. 
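The three steps above translate directly into code. The following is a minimal sketch (the mean vector, covariance matrix and random seed are arbitrary examples) using NumPy, which supplies both the Cholesky factorization and a standard normal generator:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary mean vector and positive-definite covariance matrix.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])

# Step 1: any real A with A A^T = Sigma; here the Cholesky factor.
A = np.linalg.cholesky(Sigma)

# Step 2: a vector of independent standard normal variates.
z = rng.standard_normal(mu.size)

# Step 3: x = mu + A z has the desired N(mu, Sigma) distribution.
x = mu + A @ z
print(x)
</syntaxhighlight>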
+ +== See also == +* [[Chi distribution]], the [[probability density function|pdf]] of the [[Norm (mathematics)#p-norm|2-norm]] (or [[Euclidean norm]]) of a multivariate normally distributed vector (centered at zero). +* [[Complex normal distribution]], for the generalization to complex valued random variables. +* [[Multivariate stable distribution]] extension of the multivariate normal distribution, when the index (exponent in the characteristic function) is between zero to two. +* [[Mahalanobis distance]] +* [[Wishart distribution]] + +== References == +{{Reflist|refs= + +<ref name = Siotani>{{cite journal + | author = Siotani, Minoru + | title = Tolerance regions for a multivariate normal population + | journal = Annals of the Institute of Statistical Mathematics + | year = 1964 + | volume = 16 + | number = 1 + | pages = 135–153 + | doi = 10.1007/BF02868568 + | url = http://www.ism.ac.jp/editsec/aism/pdf/016_1_0135.pdf + }}</ref> + +<ref name=Mardia>{{cite journal + | last = Mardia | first = K. V. + | year = 1970 + | title = Measures of multivariate skewness and kurtosis with applications + | journal = Biometrika + | volume = 57 | issue = 3 | pages = 519–530 + | doi = 10.1093/biomet/57.3.519 + }}</ref> + +<ref name=EP>{{cite journal + | last1 = Epps | first1 = Lawrence B. + | last2 = Pulley | first2 = Lawrence B. + | year = 1983 + | title = A test for normality based on the empirical characteristic function + | journal = Biometrika + | volume = 70 | issue = 3 | pages = 723–726 + | doi = 10.1093/biomet/70.3.723 + }}</ref> + +<ref name=BH>{{cite journal + | last1 = Baringhaus | first1 = L. + | last2 = Henze | first2 = N. + | year = 1988 + | title = A consistent test for multivariate normality based on the empirical characteristic function + | journal = Metrika + | volume = 35 | issue = 1 | pages = 339–348 + | doi = 10.1007/BF02613322 + }}</ref> + +<ref name=Henze>{{cite journal + | last = Henze | first = Norbert + | year = 2002 + | title = Invariant tests for multivariate normality: a critical review + | journal = Statistical Papers + | volume = 43 | issue = 4 | pages = 467–506 + | doi = 10.1007/s00362-002-0119-6 + }}</ref> + +<ref name=HT>{{cite journal + | last1 = Hamedani | first1 = G. G. + | last2 = Tata | first2 = M. N. + | year = 1975 + | title = On the determination of the bivariate normal distribution from distributions of linear combinations of the variables + | journal = The American Mathematical Monthly + | volume = 82 | issue = 9 | pages = 913–915 + | doi = 10.2307/2318494 + }}</ref> + +<ref name=wyattlms>{{cite web|last=Wyatt|first=John + |title=Linear least mean-squared error estimation + |url=http://web.mit.edu/6.041/www/LECTURE/lec22.pdf + |work=Lecture notes course on applied probability|accessdate=23 January 2012}}</ref> + +<ref name=rao>{{cite book + | author = Rao, C.R. + | title = Linear Statistical Inference and Its Applications + |year = 1973 + |publisher = Wiley + |location = New York + | pages = 527–528 + }}</ref> + +<ref name=Gentle>{{cite book + | author = Gentle, J.E. + | title = Computational Statistics + |year = 2009 + |publisher = Springer + |location = New York + | pages = 315–316 + | doi = 10.1007/978-0-387-98144-4 + }}</ref> + +}} + +=== Literature === + +{{refbegin}} +* {{cite book + | author = Rencher, A.C. 
+ | title = Methods of Multivariate Analysis + |year = 1995 + |publisher = Wiley + |location = New York + }} +{{refend}} + +{{ProbDistributions|multivariate}} + +{{DEFAULTSORT:Multivariate Normal Distribution}} +[[Category:Continuous distributions]] +[[Category:Multivariate continuous distributions]] +[[Category:Normal distribution]] +[[Category:Exponential family distributions]] +[[Category:Stable distributions]] +[[Category:Probability distributions]] + iyo8nt33x5tnue4y0in6lhnal8b7kvf + + + + Algebraically closed group + 0 + 15119 + + 15120 + 2013-02-14T19:02:54Z + + David Eppstein + 0 + + /* Known Results */ dab [[Finitely generated group]] + wikitext + text/x-wiki + In [[mathematics]], in the realm of [[group theory]], a [[group (mathematics)|group]] <math>A\ </math> is '''algebraically closed''' if any finite set of equations and inequations that "make sense" in <math>A\ </math> already have a solution in <math>A\ </math>. This idea will be made precise later in the article. + +==Informal discussion== + +Suppose we wished to find an element <math>x\ </math> of a group <math>G\ </math> satisfying the conditions (equations and inequations): + +::<math>x^2=1\ </math> +::<math>x^3=1\ </math> +::<math>x\ne 1\ </math> + +Then it is easy to see that this is impossible because the first two equations imply <math>x=1\ </math>. In this case we say the set of conditions are [[inconsistent]] with <math>G\ </math>. (In fact this set of conditions are inconsistent with any group whatsoever.) + +{| class="infobox" style="width:auto; font-size:100%" +! style="text-align: center" | <math>G\ </math> +|- +| +{| class="wikitable" style="margin: 0" +|<math>. \ </math> +! style="background: #ddffdd;"|<math>\underline{1} \ </math> +! style="background: #ddffdd;"|<math>\underline{a} \ </math> +|- +! style="background: #ddffdd;"|<math>\underline{1} \ </math> +|<math>1 \ </math> +|<math>a \ </math> +|- +! style="background: #ddffdd;"|<math>\underline{a} \ </math> +|<math>a \ </math> +|<math>1 \ </math> +|} +|} + +Now suppose <math>G\ </math> is the group with the multiplication table: + +Then the conditions: + +::<math>x^2=1\ </math> +::<math>x\ne 1\ </math> + +have a solution in <math>G\ </math>, namely <math>x=a\ </math>. + +However the conditions: + +::<math>x^4=1\ </math> +::<math>x^2a^{-1} = 1\ </math> + +Do not have a solution in <math>G\ </math>, as can easily be checked. + +{| class="infobox" style="width:auto; font-size:100%" +! style="text-align: center" | <math>H\ </math> +|- +| +{| class="wikitable" style="margin: 0" +|<math>. \ </math> +! style="background: #ddffdd;"|<math>\underline{1} \ </math> +! style="background: #ddffdd;"|<math>\underline{a} \ </math> +! style="background: #ddffdd;"|<math>\underline{b} \ </math> +! style="background: #ddffdd;"|<math>\underline{c} \ </math> +|- +! style="background: #ddffdd;"|<math>\underline{1} \ </math> +|<math>1 \ </math> +|<math>a \ </math> +|<math>b \ </math> +|<math>c \ </math> +|- +! style="background: #ddffdd;"|<math>\underline{a} \ </math> +|<math>a \ </math> +|<math>1 \ </math> +|<math>c \ </math> +|<math>b \ </math> +|- +! style="background: #ddffdd;"|<math>\underline{b} \ </math> +|<math>b \ </math> +|<math>c \ </math> +|<math>a \ </math> +|<math>1 \ </math> +|- +! 
style="background: #ddffdd;"|<math>\underline{c} \ </math> +|<math>c \ </math> +|<math>b \ </math> +|<math>1 \ </math> +|<math>a \ </math> +|} +|} + +However if we extend the group <math>G \ </math> to the group <math>H \ </math> with multiplication table: + +Then the conditions have two solutions, namely <math>x=b \ </math> and <math>x=c \ </math>. + +Thus there are three possibilities regarding such conditions: +* They may be inconsistent with <math>G \ </math> and have no solution in any extension of <math>G \ </math>. +* They may have a solution in <math>G \ </math>. +* They may have no solution in <math>G \ </math> but nevertheless have a solution in some extension <math>H \ </math> of <math>G \ </math>. + +It is reasonable to ask whether there are any groups <math>A \ </math> such that whenever a set of conditions like these have a solution at all, they have a solution in <math>A \ </math> itself? The answer turns out to be "yes", and we call such groups algebraically closed groups. + +==Formal definition of an algebraically closed group== + +We first need some preliminary ideas. + +If <math>G\ </math> is a group and <math>F\ </math> is the [[free group]] on [[countably]] many generators, then by a '''finite set of equations and inequations with coefficients in''' <math>G\ </math> we mean a pair of subsets <math>E\ </math> and <math>I\ </math> of <math>F\star G</math> the [[free product]] of <math>F\ </math> and <math>G\ </math>. + +This formalizes the notion of a set of equations and inequations consisting of variables <math>x_i\ </math> and elements <math>g_j\ </math> of <math>G\ </math>. The set <math>E\ </math> represents equations like: +::<math>x_1^2g_1^4x_3=1</math> +::<math>x_3^2g_2x_4g_1=1</math> +::<math>\dots\ </math> +The set <math>I\ </math> represents inequations like +::<math>g_5^{-1}x_3\ne 1</math> +::<math>\dots\ </math> + +By a '''solution''' in <math>G\ </math> to this finite set of equations and inequations, we mean a homomorphism <math>f:F\rightarrow G</math>, such that <math>\tilde{f}(e)=1\ </math> for all <math>e\in E</math> and <math>\tilde{f}(i)\ne 1\ </math> for all <math>i\in I</math>. Where <math>\tilde{f}</math> is the unique homomorphism <math>\tilde{f}:F\star G\rightarrow G</math> that equals <math>f\ </math> on <math>F\ </math> and is the identity on <math>G\ </math>. + +This formalizes the idea of substituting elements of <math>G\ </math> for the variables to get true identities and inidentities. In the example the substitutions <math>x_1\mapsto g_6, x_3\mapsto g_7</math> and <math>x_4\mapsto g_8</math> yield: +::<math>g_6^2g_1^4g_7=1</math> +::<math>g_7^2g_2g_8g_1=1</math> +::<math>\dots\ </math> +::<math>g_5^{-1}g_7\ne 1</math> +::<math>\dots\ </math> + +We say the finite set of equations and inequations is '''consistent with''' <math>G\ </math> if we can solve them in a "bigger" group <math>H\ </math>. More formally: + +The equations and inequations are consistent with <math>G\ </math> if there is a group<math>H\ </math> and an embedding <math>h:G\rightarrow H</math> such that the finite set of equations and inequations <math>\tilde{h}(E)</math> and <math>\tilde{h}(I)</math> has a solution in <math>H\ </math>. Where <math>\tilde{h}</math> is the unique homomorphism <math>\tilde{h}:F\star G\rightarrow F\star H</math> that equals <math>h\ </math> on <math>G\ </math> and is the identity on <math>F\ </math>. 
+ +Now we formally define the group <math>A\ </math> to be '''algebraically closed''' if every finite set of equations and inequations that has coefficients in <math>A\ </math> and is consistent with <math>A\ </math> has a solution in <math>A\ </math>. + +==Known Results== + +It is difficult to give concrete examples of algebraically closed groups as the following results indicate: + +* Every [[countable]] group can be embedded in a countable algebraically closed group. +* Every algebraically closed group is [[simple group|simple]]. +* No algebraically closed group is [[Finitely generated group|finitely generated]]. +* An algebraically closed group cannot be [[presentation of a group|recursively presented]]. +* A finitely generated group has [[Word problem for groups|solvable word problem]] if and only if it can embedded in every algebraically closed group. + +The proofs of these results are, in general very complex. However a sketch of the proof that a countable group <math>C\ </math> can be embedded in an algebraically closed group follows. + +First we embed <math>C\ </math> in a countable group <math>C_1\ </math> with the property that every finite set of equations with coefficients in <math>C\ </math> that is consistent in <math>C_1\ </math> has a solution in <math>C_1\ </math> as follows: + +There are only countably many finite sets of equations and inequations with coefficients in <math>C\ </math>. Fix an enumeration <math>S_0,S_1,S_2,\dots\ </math> of them. Define groups <math>D_0,D_1,D_2,\dots\ </math> inductively by: + +::<math>D_0 = C\ </math> + +::<math>D_{i+1} = +\left\{\begin{matrix} +D_i\ &\mbox{if}\ S_i\ \mbox{is not consistent with}\ D_i \\ +\langle D_i,h_1,h_2,\dots,h_n \rangle &\mbox{if}\ S_i\ \mbox{has a solution in}\ H\supseteq D_i\ \mbox{with}\ x_j\mapsto h_j\ 1\le j\le n +\end{matrix}\right. +</math> + +Now let: + +::<math>C_1=\cup_{i=0}^{\infty}D_{i}</math> + +Now iterate this construction to get a sequence of groups <math>C=C_0,C_1,C_2,\dots\ </math> and let: + +::<math>A=\cup_{i=0}^{\infty}C_{i}</math> + +Then <math>A\ </math> is a countable group containing <math>C\ </math>. It is algebraically closed because any finite set of equations and inequations that is consistent with <math>A\ </math> must have coefficients in some <math>C_i\ </math> and so must have a solution in <math>C_{i+1}\ </math>. + +==References== + +* A. Macintyre: On algebraically closed groups, ann. of Math, 96, 53-97 (1972) +* B.H. Neumann: A note on algebraically closed groups. J. London Math. Soc. 27, 227-242 (1952) +* B.H. Neumann: The isomorphism problem for algebraically closed groups. In: Word Problems, pp 553–562. Amsterdam: North-Holland 1973 +* W.R. Scott: Algebraically closed groups. Proc. Amer. Math. Soc. 2, 118-121 (1951) + +[[Category:Properties of groups]] + d3ms56n4u7tmweyqp483tjpyx7n8jae + + + + Analytic capacity + 0 + 6391 + + 6392 + 2013-04-11T14:30:03Z + + 131.220.132.179 + + /* Positive length but zero analytic capacity */ + wikitext + text/x-wiki + In [[complex analysis]], the '''analytic capacity''' of a [[compact subset]] ''K'' of the [[complex plane]] is a number that denotes "how big" a [[bounded function|bounded]] [[analytic function]] from '''C'''\''K'' can become. Roughly speaking, γ(''K'') measures the size of the unit ball of the space of bounded analytic functions outside ''K''. + +It was first introduced by [[Ahlfors]] in the 1940s while studying the removability of [[mathematical singularity|singularities]] of bounded analytic functions. 
+ +==Definition== +Let ''K'' ⊂ '''C''' be [[compact space|compact]]. Then its analytic capacity is defined to be + +:<math>\gamma(K) = \sup \{|f'(\infty)|;\ f\in\mathcal{H}^\infty(\mathbf{C}\setminus K),\ \|f\|_\infty\leq 1,\ f(\infty)=0\}</math> + +Here, <math>\mathcal{H}^\infty (U) </math> denotes the set of [[bounded function|bounded]] analytic [[Function (mathematics)|functions]] ''U'' → '''C''', whenever ''U'' is an [[open set|open]] subset of the [[complex plane]]. Further, + +:<math> f'(\infty):= \lim_{z\to\infty}z\left(f(z)-f(\infty)\right) </math> +:<math> f(\infty):= \lim_{z\to\infty}f(z) </math> + +(note that usually <math> f'(\infty)\neq \lim_{z\to\infty} f'(z) </math>) + +==Ahlfors function== +For each compact ''K'' ⊂ '''C''', there exists a unique extremal function, i.e. <math>f\in\mathcal{H}^\infty(\mathbf{C}\setminus K)</math> such that <math>\|f\|\leq 1</math>, ''f''(∞) = 0 and ''f′''(∞) = γ(''K''). This function is called the '''Ahlfors function''' of ''K''. Its existence can be proved by using a normal family argument involving [[Montel's theorem]]. + +==Analytic capacity in terms of Hausdorff dimension== +Let dim<sub>''H''</sub> denote [[Hausdorff dimension]] and ''H''<sup>1</sup> denote 1-dimensional [[Hausdorff measure]]. Then ''H''<sup>1</sup>(''K'') = 0 implies γ(''K'') = 0 while dim<sub>''H''</sub>(''K'') > 1 guarantees γ(''K'') > 0. However, the case when dim<sub>''H''</sub>(''K'') = 1 and ''H''<sup>1</sup>(''K'') ∈ (0, ∞] is more difficult. + +===Positive length but zero analytic capacity=== +Given the partial correspondence between the 1-dimensional Hausdorff measure of a compact subset of '''C''' and its analytic capacity, it might be conjectured that γ(''K'') = 0 implies ''H''<sup>1</sup>(''K'') = 0. However, this conjecture is false. A counterexample was first given by [[Anatoli Georgievich Vitushkin|A. G. Vitushkin]], and a much simpler one by J. Garnett in his 1970 paper. This latter example is the '''linear four corners Cantor set''', constructed as follows: + +Let ''K''<sub>0</sub> := [0, 1] × [0, 1] be the unit square. Then, ''K''<sub>1</sub> is the union of 4 squares of side length 1/4 and these squares are located in the corners of ''K''<sub>0</sub>. In general, ''K<sub>n</sub>'' is the union of 4<sup>''n''</sup> squares (denoted by <math>Q_n^j</math>) of side length 4<sup>−''n''</sup>, each <math>Q_n^j</math> being in the corner of some <math>Q_{n-1}^k</math>. Take ''K'' to be the intersection of all ''K<sub>n</sub>'' then <math>H^1(K)=\sqrt{2}</math> but γ(''K'') = 0. + +===Vitushkin's Conjecture=== +Suppose dim<sub>''H''</sub>(''K'') = 1 and ''H''<sup>1</sup>(''K'') > 0. Vitushkin's conjecture states that + +:<math> \gamma(K)=0\ \Leftrightarrow\ K \ \text{ is purely unrectifiable} </math> + +In this setting, ''K'' is (purely) [[unrectifiable]] if and only if ''H''<sup>1</sup>(''K'' ∩ Γ) = 0 for all [[rectifiable curve]]s (or equivalently, ''C''<sup>1</sup>-curves or (rotated) Lipschitz graphs) Γ. + +Guy David published a proof in 1998 for the case when, in addition to the hypothesis above, ''H''<sup>1</sup>(''K'') < ∞. Until now, very little is known about the case when ''H''<sup>1</sup>(''K'') is infinite (even [[sigma-finite]]). + +==Removable sets and Painlevé's problem== +The compact set ''K'' is called '''removable''' if, whenever Ω is an open set containing ''K'', every function which is bounded and holomorphic on the set Ω\''K'' has an analytic extension to all of Ω. 
By [[Removable singularity#Riemann's theorem|Riemann's theorem for removable singularities]], every [[singleton (mathematics)|singleton]] is removable. This motivated Painlevé to pose a more general question in 1880: "Which subsets of '''C''' are removable?" + +It is easy to see that ''K'' is removable if and only if γ(''K'') = 0. However, analytic capacity is a purely complex-analytic concept, and much more work needs to be done in order to obtain a more geometric characterization. + +==References== +* {{cite book |last=Mattila |first=Pertti |title=Geometry of sets and measures in Euclidean spaces |year=1995 |publisher=Cambridge University Press |isbn=0-521-65595-1}} +* {{cite book |last=Pajot |first=Hervé |title=Analytic Capacity, Rectifiability, Menger Curvature and the Cauchy Integral |year=2002 |series=Lecture Notes in Mathematics |publisher=Springer-Verlag}} +* J. Garnett, Positive length but zero analytic capacity, ''Proc. Amer. Math. Soc.'' '''21''' (1970), 696-699 +* G. David, Unrectifiable 1-sets have vanishing analytic capacity, ''Rev. Math. Iberoam.'' '''14''' (1998) 269-479 +* {{cite book |last=Dudziak |first=James J. |title=Vitushkin's Conjecture for Removable Sets |year=2010 |series=Universitext |publisher=Springer-Verlag |isbn=978-14419-6708-4}} +[[Category:Analytic functions|*]] + eaugx9e3jklpymyeezmccz98660nb7p + + + + Block cipher mode of operation + 0 + 2481 + + 2482 + 2014-01-19T08:56:17Z + + Mitch Ames + 0 + + /* Common modes */ c/e + wikitext + text/x-wiki + {{about|cryptography|"method of operating"|modus operandi}} + +In [[cryptography]], a '''mode of operation''' is an algorithm that uses a [[block cipher]] to provide an [[information security|information service]] such as [[confidentiality]] or [[authentication|authenticity]].<ref name="NIST-BLOCK-CIPHER-MODES"> +{{cite web +| author = NIST Computer Security Division's (CSD) Security Technology Group (STG) +| title = Block cipher modes +| year = 2013 +| work = Cryptographic Toolkit +| publisher = NIST +| url = http://csrc.nist.gov/groups/ST/toolkit/BCM/index.html +| accessdate = April 12, 2013 +}}</ref> +A block cipher by itself is only suitable for the secure cryptographic transformation (encryption or decryption) of one fixed-length group of [[bit]]s called a [[Block (data storage)|block]].<ref name="FERGUSON"> +{{Cite book +| others = Ferguson, N., Schneier, B. and Kohno, T. +| year = 2010 +| title = Cryptography Engineering: Design Principles and Practical Applications +| publisher = Wiley Publishing, Inc. +| location = Indianapolis +| ISBN = 978-0-470-47424-2 +| pages = 63, 64 +}}</ref> A mode of operation describes how to repeatedly apply a cipher's single-block operation to securely transform amounts of data larger than a block.<ref name="NIST-PROPOSED-MODES"> +{{cite web +| author = NIST Computer Security Division's (CSD) Security Technology Group (STG) +| title = Proposed modes +| year = 2013 +| work = Cryptographic Toolkit +| publisher = NIST +| url = http://csrc.nist.gov/groups/ST/toolkit/BCM/modes_development.html +| accessdate = April 14, 2013 +}} +</ref><ref name="HAC"> +{{cite book +|authors=Alfred J. Menezes, Paul C. van Oorschot and Scott A. 
Vanstone +|title=Handbook of Applied Cryptography +|publisher=CRC Press +|year=1996 +|isbn=0-8493-8523-7 +|pages=228–233 +|url=http://www.cacr.math.uwaterloo.ca/hac/ +}}</ref><ref name="ISO-10116"> +{{Cite journal +| authors = ISO JTC 1/SC 27 +| title = ISO/IEC 10116:2006 - Information technology -- Security techniques -- Modes of operation for an n-bit block cipher +| journal = ISO Standards catalogue +| url = http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=38761 +| year = 2006 +}}</ref> + +Most modes require a unique binary sequence, often called an [[initialization vector]] (IV), for each encryption operation. The IV has to be non-repeating and for some modes random as well. The initialization vector is used to ensure distinct [[ciphertext]]s are produced even when the same [[plaintext]] is encrypted multiple times independently with the same [[Key (cryptography)|key]].<ref name="HUANG"> +{{Cite journal +| authors = Kuo-Tsang Huang, Jung-Hui Chiu, and Sung-Shiou Shen +| title = A Novel Structure with Dynamic Operation Mode for Symmetric-Key Block Ciphers +| journal = International Journal of Network Security & Its Applications (IJNSA) +| url = http://airccse.org/journal/ijnsa.html +|volume = 5 +|issue = 1 +|date=January 2013 +| pages = 19 +}}</ref> Block ciphers have one or more [[Block size (cryptography)|block size]](s), but during transformation the block size is always fixed. Block cipher modes operate on whole blocks and require that the last part of the data be [[Padding (cryptography)|padded]] to a full block if it is smaller than the current block size.<ref name="FERGUSON"/> There are, however, modes that do not require padding because they effectively use a block cipher as a [[stream cipher]]. + +Historically, encryption modes have been studied extensively in regard to their error propagation properties under various scenarios of data modification. Later development regarded [[integrity protection]] as an entirely separate cryptographic goal. Some modern modes of operation combine [[confidentiality]] and [[authentication|authenticity]] in an efficient way, and are known as [[authenticated encryption]] modes.<ref name="NIST-CURRENT-MODES"> +{{cite web +| author = NIST Computer Security Division's (CSD) Security Technology Group (STG) +| title = Current modes +| year = 2013 +| work = Cryptographic Toolkit +| publisher = NIST +| url = http://csrc.nist.gov/groups/ST/toolkit/BCM/current_modes.html +| accessdate = April 12, 2013 +}} +</ref> + +==History and standardization== +The earliest modes of operation, ECB, CBC, OFB, and CFB (see below for all), date back to 1981 and were specified in [http://www.itl.nist.gov/fipspubs/fip81.htm FIPS 81], ''DES Modes of Operation''. In 2001, NIST revised its list of approved modes of operation by including AES as a block cipher and adding CTR mode in [http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf SP800-38A], ''Recommendation for Block Cipher Modes of Operation''. Finally, in January, 2010, NIST added [[XTS-AES]] in [http://csrc.nist.gov/publications/nistpubs/800-38E/nist-sp-800-38E.pdf SP800-38E], ''Recommendation for Block Cipher Modes of Operation: The XTS-AES Mode for Confidentiality on Storage Devices''. Other confidentiality modes exist which have not been approved by NIST. For example, CTS is [[ciphertext stealing]] mode and available in many popular cryptographic libraries. 
+ +The block cipher modes ECB, CBC, OFB, CFB, CTR, and [[XTS mode|XTS]] provide confidentiality, but they do not protect against accidental modification or malicious tampering. Modification or tampering can be detected with a separate [[message authentication code]] such as [[CBC-MAC]], or a [[digital signature]]. The cryptographic community recognized the need for dedicated integrity assurances and NIST responded with HMAC, CMAC, and GMAC. [[HMAC]] was approved in 2002 as [http://csrc.nist.gov/publications/fips/fips198/fips-198a.pdf FIPS 198], ''The Keyed-Hash Message Authentication Code (HMAC)'', [[CMAC]] was released in 2005 under [http://csrc.nist.gov/publications/nistpubs/800-38B/SP_800-38B.pdf SP800-38B], ''Recommendation for Block Cipher Modes of Operation: The CMAC Mode for Authentication'', and GMAC was formalized in 2007 under [http://csrc.nist.gov/publications/nistpubs/800-38D/SP-800-38D.pdf SP800-38D], ''Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC''. + +After observing that compositing a confidentiality mode with an authenticity mode could be difficult and error prone, the cryptographic community began to supply modes which combined confidentiality and data integrity into a single cryptographic primitive. The modes are referred to as [[authenticated encryption]], AE or "authenc". Examples of authenc modes are [[CCM mode|CCM]] ([http://csrc.nist.gov/publications/nistpubs/800-38C/SP800-38C_updated-July20_2007.pdf SP800-38C]), [[GCM mode|GCM]] ([http://csrc.nist.gov/publications/nistpubs/800-38D/SP-800-38D.pdf SP800-38D]), [[CWC mode|CWC]], [[EAX mode|EAX]], [[IAPM mode|IAPM]], and [[OCB mode|OCB]]. + +Modes of operation are nowadays defined by a number of national and internationally recognized standards bodies. The most influential source is the US [[NIST]]{{citation needed|date=April 2012}}. Other notable standards organizations include [[International Organization for Standardization|ISO]] (with ISO/IEC 10116<ref name="ISO-10116"/>), the [[International Electrotechnical Commission|IEC]], the [[IEEE]], the national [[ANSI]], and the [[IETF]]. + +==Initialization vector (IV)== +{{Main|Initialization vector}} + +An [[initialization vector]] (IV) or starting variable (SV)<ref name="ISO-10116"/> is a block of bits that is used by several modes to randomize the encryption and hence to produce distinct ciphertexts even if the same plaintext is encrypted multiple times, without the need for a slower re-keying process.<ref name="HUANG"/> + +An [[initialization vector]] has different security requirements than a key, so the [[initialization vector|IV]] usually does not need to be secret. However, in most cases, it is important that an [[initialization vector]] is never reused under the same key. For CBC and CFB, reusing an IV leaks some information about the first block of plaintext, and about any common prefix shared by the two messages. For OFB and CTR, reusing an IV completely destroys security.<ref name="HUANG"/> This can be seen because both modes effectively create a bitstream that is XORed with the plaintext, and this bitstream is dependent on the password and IV only. 
Reusing a bitstream destroys security.<ref>{{cite web|title=Stream Cipher Reuse: A Graphic Example|url=http://www.cryptosmith.com/archives/70|publisher=Cryptosmith LLC|accessdate=27 March 2013}}</ref> In CBC mode, the [[initialization vector|IV]] must, in addition, be unpredictable at encryption time; in particular, the (previously) common practice of re-using the last ciphertext block of a message as the [[initialization vector|IV]] for the next message is insecure (for example, this method was used by SSL 2.0). If an attacker knows the [[initialization vector|IV]] (or the previous block of ciphertext) before he specifies the next plaintext, he can check his guess about plaintext of some block that was encrypted with the same key before (this is known as the TLS CBC IV attack).<ref>{{citation|author=B. Moeller|title=Security of CBC Ciphersuites in SSL/TLS: Problems and Countermeasures|date=May 20, 2004|url=http://www.openssl.org/~bodo/tls-cbc.txt}}</ref> + +==Padding== +{{Main|Padding (cryptography)}} +A [[block cipher]] works on units of a fixed [[block size (cryptography)|size]] (known as a ''block size''), but messages come in a variety of lengths. So some modes (namely [[Block cipher modes of operation#Electronic codebook .28ECB.29|ECB]] and [[Cipher block chaining|CBC]]) require that the final block be padded before encryption. Several [[padding (cryptography)|padding]] schemes exist. The simplest is to add null bytes to the [[plaintext]] to bring its length up to a multiple of the block size, but care must be taken that the original length of the plaintext can be recovered; this is so, for example, if the plaintext is a [[C (programming language)|C]] style [[Literal string|string]] which contains no null bytes except at the end. Slightly more complex is the original [[Data Encryption Standard|DES]] method, which is to add a single one [[bit]], followed by enough zero [[bit]]s to fill out the block; if the message ends on a block boundary, a whole padding block will be added. Most sophisticated are CBC-specific schemes such as [[ciphertext stealing]] or [[residual block termination]], which do not cause any extra ciphertext, at the expense of some additional complexity. [[Bruce Schneier|Schneier]] and [[Niels Ferguson|Ferguson]] suggest two possibilities<!-- Practical Crypto, sect 5.1 -->, both simple: append a byte with value 128 (hex 80), followed by as many zero bytes as needed to fill the last block, or pad the last block with ''n'' bytes all with value ''n''. + +CFB, OFB and CTR modes do not require any special measures to handle messages whose lengths are not multiples of the block size, since the modes work by XORing the plaintext with the output of the block cipher. The last partial block of plaintext is XORed with the first few bytes of the last [[keystream]] block, producing a final ciphertext block that is the same size as the final partial plaintext block. This characteristic of stream ciphers makes them suitable for applications that require the encrypted ciphertext data to be the same size as the original plaintext data, +and for applications that transmit data in streaming form where it is inconvenient to add padding bytes. + +==Common modes== +Many modes of operation have been defined. Some of these are described below. 
+ +===Electronic codebook (ECB)=== +{{Infobox +|name = +|bodystyle = +|title = +|titlestyle = +|image = +|imagestyle = +|caption = +|captionstyle = +|headerstyle = background:#ccf; +|labelstyle = background:#ddf; +|datastyle = + +|header1 = ECB +|label1 = +|data1 = +|header2 = +|label2 = +|data2 = Electronic codebook +|header3 = +|label3 = Encryption parallelizable: +|data3 = Yes +|header4 = +|label4 = Decryption parallelizable: +|data4 = Yes +|header5 = +|label5 = +|data5 = + +|belowstyle = background:#ddf; +|below = +}} +The simplest of the encryption modes is the '''electronic codebook''' (ECB) mode. The message is divided into blocks, and each block is encrypted separately. + +[[File:ECB encryption.svg]] + +[[File:ECB decryption.svg]] + +The disadvantage of this method is that identical [[plaintext]] blocks are encrypted into identical [[ciphertext]] blocks; thus, it does not hide data patterns well. In some senses, it doesn't provide serious message confidentiality, and it is not recommended for use in cryptographic protocols at all. + +A striking example of the degree to which ECB can leave plaintext data patterns in the ciphertext can be seen when ECB mode is used to encrypt a [[bitmap image]] which uses large areas of uniform colour. While the colour of each individual [[pixel]] is encrypted, the overall image may still be discerned as the pattern of identically coloured pixels in the original remains in the encrypted version. + +{{multiple image +| align = center +| image1 = Tux.jpg +| caption1 = Original image +| image2 = Tux ecb.jpg +| caption2 = Encrypted using ECB mode +| image3 = Tux secure.jpg +| caption3 = Modes other than ECB result in pseudo-randomness +| footer = The image on the right is how the image might appear encrypted with CBC, CTR or any of the other more secure modes—indistinguishable from random noise. Note that the random appearance of the image on the right does not ensure that the image has been securely encrypted; many kinds of insecure encryption have been developed which would produce output just as "random-looking". +| width = 196 +}} + +ECB mode can also make protocols without integrity protection even more susceptible to [[replay attack]]s, since each block gets decrypted in exactly the same way. For example, the ''[[Phantasy Star Online|Phantasy Star Online: Blue Burst]]'' online [[video game]] uses [[Blowfish (cipher)|Blowfish]] in ECB mode. Before the key exchange system was cracked, leading to even easier methods, cheaters repeated encrypted "monster killed" message packets, each an encrypted Blowfish block, to illegitimately gain [[experience point]]s quickly.{{Citation needed|date=February 2010}} + +===Cipher-block chaining (CBC)=== +{{Infobox +|name = +|bodystyle = +|title = +|titlestyle = +|image = +|imagestyle = +|caption = +|captionstyle = +|headerstyle = background:#ccf; +|labelstyle = background:#ddf; +|datastyle = + +|header1 = CBC +|label1 = +|data1 = +|header2 = +|label2 = +|data2 = Cipher-block chaining +|header3 = +|label3 = Encryption parallelizable: +|data3 = No +|header4 = +|label4 = Decryption parallelizable: +|data4 = Yes +|header5 = +|label5 = +|data5 = + +|belowstyle = background:#ddf; +|below = +}} +IBM invented the cipher-block chaining (CBC) mode of operation in 1976.<ref>William F. Ehrsam, Carl H. W. Meyer, John L. Smith, Walter L. 
Tuchman, "Message verification and transmission error detection by block chaining", US Patent 4074066, 1976</ref> In CBC mode, each block of plaintext is [[XOR]]ed with the previous ciphertext block before being encrypted. This way, each ciphertext block depends on all plaintext blocks processed up to that point. To make each message unique, an [[initialization vector]] must be used in the first block. + +[[File:CBC encryption.svg]] + +[[File:CBC decryption.svg]] + +If the first block has index 1, the mathematical formula for CBC encryption is +:<math>C_i = E_K(P_i \oplus C_{i-1}), C_0 = IV</math> + +while the mathematical formula for CBC decryption is +:<math>P_i = D_K(C_i) \oplus C_{i-1}, C_0 = IV.</math> + +CBC has been the most commonly used mode of operation. Its main drawbacks are that encryption is sequential (i.e., it cannot be parallelized), and that the message must be padded to a multiple of the cipher block size. One way to handle this last issue is through the method known as [[ciphertext stealing]]. Note that a one-bit change in a plaintext or IV affects all following ciphertext blocks. + +Decrypting with the incorrect IV causes the first block of plaintext to be corrupt but subsequent plaintext blocks will be correct. This is because a plaintext block can be recovered from two adjacent blocks of ciphertext. As a consequence, decryption ''can'' be parallelized. Note that a one-bit change to the ciphertext causes complete corruption of the corresponding block of plaintext, and inverts the corresponding bit in the following block of plaintext, but the rest of the blocks remain intact. + +===Propagating cipher-block chaining (PCBC)=== +{{Infobox +|name = +|bodystyle = +|title = +|titlestyle = +|image = +|imagestyle = +|caption = +|captionstyle = +|headerstyle = background:#ccf; +|labelstyle = background:#ddf; +|datastyle = + +|header1 = PCBC +|label1 = +|data1 = +|header2 = +|label2 = +|data2 = Propagating cipher-block chaining +|header3 = +|label3 = Encryption parallelizable: +|data3 = No +|header4 = +|label4 = Decryption parallelizable: +|data4 = No +|header5 = +|label5 = +|data5 = + +|belowstyle = background:#ddf; +|below = +}} +The propagating cipher-block chaining<ref>http://www.iks-jena.de/mitarb/lutz/security/cryptfaq/q84.html</ref> or plaintext cipher-block chaining<ref>{{cite book |last=Kaufman |first=C. |last2=Perlman |first2=R. |last3=Speciner |first3=M. |year=2002 |title=Network Security |location=Upper Saddle River, NJ |publisher=Prentice Hall |page=319 |edition=2nd |isbn=0130460192 }}</ref> mode was designed to cause small changes in the ciphertext to propagate indefinitely when decrypting, as well as when encrypting. + +[[File:PCBC encryption.svg]] + +[[File:PCBC decryption.svg]] + +Encryption and decryption algorithms are as follows: + +:<math>C_i = E_K(P_i \oplus P_{i-1} \oplus C_{i-1}), P_0 \oplus C_0 = IV</math> + +:<math>P_i = D_K(C_i) \oplus P_{i-1} \oplus C_{i-1}, P_0 \oplus C_0 = IV</math> + +PCBC is used in [[Kerberos (protocol)|Kerberos v4]] and [[WASTE]], most notably, but otherwise is not common. On a message encrypted in PCBC mode, if two adjacent ciphertext blocks are exchanged, this does not affect the decryption of subsequent blocks.<ref>{{cite book |last=Kohl |first=J. 
|chapter=The Use of Encryption in Kerberos for Network Authentication |title=Proceedings, Crypto '89 |year=1990 |publisher=Springer |location=Berlin |isbn=0387973176 |chapterurl=http://dsns.csie.nctu.edu.tw/research/crypto/HTML/PDF/C89/35.PDF }}</ref> For this reason, PCBC is not used in Kerberos v5. + +===Cipher feedback (CFB)=== +{{Infobox +|name = +|bodystyle = +|title = +|titlestyle = +|image = +|imagestyle = +|caption = +|captionstyle = +|headerstyle = background:#ccf; +|labelstyle = background:#ddf; +|datastyle = + +|header1 = CFB +|label1 = +|data1 = +|header2 = +|label2 = +|data2 = Cipher feedback +|header3 = +|label3 = Encryption parallelizable: +|data3 = No +|header4 = +|label4 = Decryption parallelizable: +|data4 = Yes +|header5 = +|label5 = +|data5 = + +|belowstyle = background:#ddf; +|below = +}} +The ''cipher feedback'' (CFB) mode, a close relative of CBC, makes a block cipher into a self-synchronizing [[stream cipher]]. Operation is very similar; in particular, CFB decryption is almost identical to CBC encryption performed in reverse: + +:<math>C_i = E_K (C_{i-1}) \oplus P_i</math> + +:<math>P_i = E_K (C_{i-1}) \oplus C_i</math> + +:<math>C_{0} = \ \mbox{IV}</math> + +[[File:CFB encryption.svg]] + +[[File:CFB decryption.svg]] + +This simplest way of using CFB described above is not any more self-synchronizing than other cipher modes like CBC. If a whole blocksize of ciphertext is lost both CBC and CFB will synchronize, but losing only a single byte or bit will permanently throw off decryption. To be able to synchronize after the loss of only a single byte or bit, a single byte or bit must be encrypted at a time. CFB can be used this way when combined with a [[shift register]] as the input for the block cipher. + +To use CFB to make a self-synchronizing stream cipher that will synchronize for any multiple of x bits lost, start by initializing a shift register the size of the block size with the initialization vector. This is encrypted with the block cipher, and the highest x bits of the result are XOR'ed with x bits of the plaintext to produce x bits of ciphertext. These x bits of output are shifted into the shift register, and the process repeats with the next x bits of plaintext. Decryption is similar, start with the initialization vector, encrypt, and XOR the high bits of the result with x bits of the ciphertext to produce x bits of plaintext. Then shift the x bits of the ciphertext into the shift register. This way of proceeding is known as CFB-8 or CFB-1 (according to the size of the shifting).<ref name="AESBlockDocumentation">[http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf NIST: Recommendation for Block Cipher Modes of Operation]</ref> + +In notation, where S<sub>i</sub> is the ith state of the shift register, a << x is ''a'' shifted up ''x'' bits, head(a, x) is the x highest bits of a and n is number of bits of IV: + +:<math>C_i = \mbox{head}(E_K (S_{i-1}), x) \oplus P_i</math> + +:<math>P_i = \mbox{head}(E_K (S_{i-1}), x) \oplus C_i</math> + +:<math>S_i = \ ((S_{i-1} << x) + C_i) \mbox{ mod } 2^n</math> + +:<math>S_{0} = \ \mbox{IV}</math> + +If x bits are lost from the ciphertext, the cipher will output incorrect plaintext until the shift register once again equals a state it held while encrypting, at which point the cipher has resynchronized. This will result in at most one blocksize of output being garbled. + +Like CBC mode, changes in the plaintext propagate forever in the ciphertext, and encryption cannot be parallelized. 
Also like CBC, decryption can be parallelized. When decrypting, a one-bit change in the ciphertext affects two plaintext blocks: a one-bit change in the corresponding plaintext block, and complete corruption of the following plaintext block. Later plaintext blocks are decrypted normally. + +CFB shares two advantages over CBC mode with the stream cipher modes OFB and CTR: the block cipher is only ever used in the encrypting direction, and the message does not need to be padded to a multiple of the cipher block size (though [[ciphertext stealing]] can also be used to make padding unnecessary). + +===Output feedback (OFB)=== +{{Infobox +|name = +|bodystyle = +|title = +|titlestyle = +|image = +|imagestyle = +|caption = +|captionstyle = +|headerstyle = background:#ccf; +|labelstyle = background:#ddf; +|datastyle = + +|header1 = OFB +|label1 = +|data1 = +|header2 = +|label2 = +|data2 = Output feedback +|header3 = +|label3 = Encryption parallelizable: +|data3 = No +|header4 = +|label4 = Decryption parallelizable: +|data4 = No +|header5 = +|label5 = +|data5 = + +|belowstyle = background:#ddf; +|below = +}} +The ''output feedback'' (OFB) mode makes a block cipher into a synchronous [[stream cipher]]. It generates [[keystream]] blocks, which are then [[XOR]]ed with the plaintext blocks to get the ciphertext. Just as with other stream ciphers, flipping a bit in the ciphertext produces a flipped bit in the plaintext at the same location. This property allows many [[Error-correcting code|error correcting codes]] to function normally even when applied before encryption. + +Because of the symmetry of the XOR operation, encryption and decryption are exactly the same: + +:<math>C_j = P_j \oplus O_j</math> + +:<math>P_j = C_j \oplus O_j</math> + +:<math>O_j = \ E_K (I_{j})</math> + +:<math>I_j =\ O_{j-1}</math> + +:<math>I_{0}= \ \mbox{IV}</math> + +[[File:OFB encryption.svg]] + +[[File:OFB decryption.svg]] + +Each output feedback block cipher operation depends on all previous ones, and so cannot be performed in parallel. However, because the plaintext or ciphertext is only used for the final XOR, the block cipher operations may be performed in advance, allowing the final step to be performed in parallel once the plaintext or ciphertext is available. + +It is possible to obtain an OFB mode keystream by using CBC mode with a constant string of zeroes as input. This can be useful, because it allows the usage of fast hardware implementations of CBC mode for OFB mode encryption. + +Using OFB mode with a partial block as feedback like CFB mode reduces the average cycle length by a factor of <math>2^{32}</math> or more. A mathematical model proposed by Davies and Parkin and substantiated by experimental results showed that only with full feedback an average cycle length near to the obtainable maximum can be achieved. For this reason, support for truncated feedback was removed from the specification of OFB.<ref>{{cite book |first=D. W. |last=Davies |first2=G. I. P. 
|last2=Parkin |chapter=The average cycle size of the key stream in output feedback encipherment |title=Advances in Cryptology, Proceedings of CRYPTO 82 |pages=263–282 |year=1983 |location=New York |publisher=Plenum Press |isbn=0306413663 }}</ref><ref>http://www.crypto.rub.de/its_seminar_ws0809.html</ref> + +===Counter (CTR)=== +{{Infobox +|name = +|bodystyle = +|title = +|titlestyle = +|image = +|imagestyle = +|caption = +|captionstyle = +|headerstyle = background:#ccf; +|labelstyle = background:#ddf; +|datastyle = + +|header1 = CTR +|label1 = +|data1 = +|header2 = +|label2 = +|data2 = Counter +|header3 = +|label3 = Encryption parallelizable: +|data3 = Yes +|header4 = +|label4 = Decryption parallelizable: +|data4 = Yes +|header5 = +|label5 = +|data5 = + +|belowstyle = background:#ddf; +|below = +}} +:''Note: CTR mode (CM) is also known as ''integer counter mode'' (ICM) and ''segmented integer counter'' (SIC) mode'' + +Like OFB, counter mode turns a [[block cipher]] into a [[stream cipher]]. It generates the next [[keystream]] block by encrypting successive values of a "counter". The counter can be any function which produces a sequence which is guaranteed not to repeat for a long time, although an actual increment-by-one counter is the simplest and most popular. The usage of a simple deterministic input function used to be controversial; critics argued that "deliberately exposing a cryptosystem to a known systematic input represents an unnecessary risk."<ref>{{cite book |first=Robert R. |last=Jueneman |chapter=Analysis of certain aspects of output feedback mode |title=Advances in Cryptology, Proceedings of CRYPTO 82 |pages=99–127 |year=1983 |location=New York |publisher=Plenum Press |isbn=0306413663 }}</ref> By now, CTR mode is widely accepted, and problems resulting from the input function are recognized as a weakness of the underlying block cipher instead of the CTR mode.<ref>Helger Lipmaa, Phillip Rogaway, and David Wagner. Comments to NIST concerning AES modes of operation: CTR-mode encryption. 2000</ref> Along with CBC, CTR mode is one of two block cipher modes recommended by Niels Ferguson and Bruce Schneier.<ref>Niels Ferguson, Bruce Schneier, Tadayoshi Kohno, Cryptography Engineering, page 71, 2010</ref> + +CTR mode has similar characteristics to OFB, but also allows a random access property during decryption. CTR mode is well suited to operate on a multi-processor machine where blocks can be encrypted in parallel. Furthermore, it does not suffer from the short-cycle problem that can affect OFB.<ref>http://www.quadibloc.com/crypto/co040601.htm</ref> + +Note that the [[cryptographic nonce|nonce]] in this diagram is the same thing as the [[initialization vector]] (IV) in the other diagrams. The IV/nonce and the counter can be combined together using any lossless operation (concatenation, addition, or XOR) to produce the actual unique counter block for encryption. + +[[File:CTR encryption 2.svg]] + +[[File:CTR decryption 2.svg]] + +==Error propagation== +Before the widespread use of [[message authentication codes]] and [[authenticated encryption]], it was common to discuss the "error propagation" properties as a selection criterion for a mode of operation. It might be observed, for example, that a one-block error in the transmitted ciphertext would result in a one-block error in the reconstructed plaintext for ECB mode encryption, while in CBC mode such an error would affect two blocks. 
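This contrast can be made concrete with a short experiment. The sketch below is illustrative only: the 4-round [[Feistel cipher|Feistel]] construction built on SHA-256 is an arbitrary stand-in for a real block cipher (it is not secure), and the key, IV and message are made up. It flips a single bit in the second CBC ciphertext block and prints how the error shows up in the recovered plaintext:

<syntaxhighlight lang="python">
# Toy demonstration of CBC error propagation; not a secure cipher.
import hashlib

BLOCK = 16           # block size in bytes
ROUNDS = 4

def _f(key, data, rnd):
    # Round function of the toy Feistel network.
    return hashlib.sha256(key + bytes([rnd]) + data).digest()[:BLOCK // 2]

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_block(key, block):
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _f(key, right, rnd))
    return left + right

def decrypt_block(key, block):
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _f(key, left, rnd)), left
    return left + right

def cbc_encrypt(key, iv, plaintext):       # plaintext length: multiple of BLOCK
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(key, iv, ciphertext):
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)

key = b"an arbitrary key"
iv = bytes(BLOCK)
message = b"0123456789abcdef" * 3          # three 16-byte blocks, no padding needed

damaged = bytearray(cbc_encrypt(key, iv, message))
damaged[BLOCK + 3] ^= 0x01                 # flip one bit in the second ciphertext block
recovered = cbc_decrypt(key, iv, bytes(damaged))

print(recovered[:BLOCK])                   # first block: intact
print(recovered[BLOCK:2 * BLOCK])          # second block: garbled beyond recognition
print(recovered[2 * BLOCK:])               # third block: exactly one bit flipped
</syntaxhighlight>

ECB behaves differently: decrypting the same damaged ciphertext block by block with <code>decrypt_block</code> alone would garble only the block containing the flipped bit and leave every other block untouched.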
+ +Some felt that such resilience was desirable in the face of random errors (e.g., line noise), while others argued that error correcting increased the scope for attackers to maliciously tamper with a message. + +However, when proper integrity protection is used, such an error will result (with high probability) in the entire message being rejected. If resistance to random error is desirable, [[error-correcting code]]s should be applied to the ciphertext before transmission. + +==Authenticated encryption== +{{Main|Authenticated encryption}} +A number of modes of operation have been designed to combine [[secrecy]] and [[authentication]] in a single cryptographic primitive. Examples of such modes are [[XCBC mode|XCBC]],<ref>[[Virgil D. Gligor]], Pompiliu Donescu, "Fast Encryption and Authentication: XCBC Encryption and XECB Authentication Modes". Proc. Fast Software Encryption, 2001: 92-108.</ref> [[IACBC mode|IACBC]], [[IAPM mode|IAPM]],<ref>Charanjit S. Jutla, "Encryption Modes with Almost Free Message Integrity", Proc. Eurocrypt 2001, LNCS 2045, May 2001.</ref> [[OCB mode|OCB]], [[EAX mode|EAX]], [[CWC mode|CWC]], [[CCM mode|CCM]], and [[Galois/Counter Mode|GCM]]. [[Authenticated encryption]] modes are classified as single pass modes or double pass modes. Unfortunately for the cryptographic user community, many of the single pass [[authenticated encryption]] algorithms (such as [[OCB mode]]) are patent encumbered. + +In addition, some modes also allow for the authentication of unencrypted associated data, and these are called [[AEAD block cipher modes of operation|AEAD]] (Authenticated-Encryption with Associated-Data) schemes. For example, EAX mode is a double pass AEAD scheme while OCB mode is single pass. + +==Other modes and other cryptographic primitives== +Many more modes of operation for block ciphers have been suggested. Some have been accepted, fully described (even standardized), and are in use. Others have been found insecure, and should never be used. Still others don't categorize as confidentiality, authenticity, or authenticated encryption - for example [[key feedback mode]] and [[One-way_compression_function#Davies.E2.80.93Meyer|Davies-Meyer]] hashing. + +[[NIST]] maintains a list of proposed modes for block ciphers at [http://csrc.nist.gov/groups/ST/toolkit/BCM/modes_development.html Modes Development].<ref name="AESBlockDocumentation" /><ref>[http://csrc.nist.gov/groups/ST/toolkit/BCM/modes_development.html NIST: Modes Development]</ref> + +Disk encryption often uses special purpose modes specifically designed for the application. Tweakable narrow-block encryption modes ([[LRW]], [[disk encryption theory#XEX|XEX]], and [[XTS mode|XTS]]) and wide-block encryption modes ([[Disk_encryption_theory#CBC-mask-CBC (CMC) and ECB-mask-ECB (EME)|CMC]] and [[Disk_encryption_theory#CBC-mask-CBC (CMC) and ECB-mask-ECB (EME)|EME]]) are designed to securely encrypt sectors of a disk. (See [[disk encryption theory]]) + +Block ciphers can also be used in other [[cryptographic protocol]]s. They are generally used in modes of operation similar to the block modes described here. As with all protocols, to be cryptographically secure, care must be taken to build them correctly. + +There are several schemes which use a block cipher to build a [[cryptographic hash function]]. See [[one-way compression function]] for descriptions of several such methods. + +[[Cryptographically secure pseudorandom number generator]]s (CSPRNGs) can also be built using block ciphers. 
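The counter-mode construction described above can be made concrete with a short Python sketch. This is an illustration only: <code>block_encrypt(key, block)</code> is a hypothetical stand-in for any concrete 16-byte block cipher (it is not a specific library API), and the 8-byte-nonce/8-byte-counter concatenation is just one of the lossless nonce–counter combinations mentioned in the CTR section.

<syntaxhighlight lang="python">
# Minimal sketch of counter (CTR) mode keystream generation and encryption.
# block_encrypt(key, block) -> 16 bytes is a hypothetical stand-in for any
# concrete block cipher; it is NOT a specific library call.

def ctr_keystream(block_encrypt, key, nonce, nblocks):
    """Yield keystream blocks E_K(nonce || counter) for counter = 0, 1, 2, ..."""
    for counter in range(nblocks):
        # Combine the 8-byte nonce with an 8-byte big-endian block counter
        # (concatenation is one of the lossless combinations mentioned above).
        counter_block = nonce + counter.to_bytes(8, "big")
        yield block_encrypt(key, counter_block)

def ctr_xcrypt(block_encrypt, key, nonce, data):
    """Encrypt or decrypt: both are the same XOR of the data with the keystream."""
    nblocks = (len(data) + 15) // 16
    keystream = b"".join(ctr_keystream(block_encrypt, key, nonce, nblocks))
    return bytes(d ^ k for d, k in zip(data, keystream))
</syntaxhighlight>

Because decryption is the same XOR, any ciphertext block can be recovered independently by starting the counter at the corresponding index, which is the random-access property noted above; taking the keystream blocks on their own gives the kind of block-cipher-based pseudorandom generator alluded to in the preceding paragraph.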
+ +[[Message authentication code]]s (MACs) are often built from block ciphers. [[CBC-MAC]], [[One-key MAC|OMAC]] and [[PMAC (cryptography)|PMAC]] are examples. + +==See also== + +{{col-begin}} +{{col-break}} +* [[Disk encryption]] +* [[Message authentication code]] +* [[Authenticated encryption]] +* [[One-way compression function]] +{{col-break}} +{{Portal|Cryptography}} +{{col-end}} + +==References== +{{Reflist|30em}} + +{{Cryptography navbox | block | hash}} + +{{DEFAULTSORT:Block Cipher Modes Of Operation}} +[[Category:Block cipher modes of operation| ]] +[[Category:Cryptographic algorithms]] + oxk4tzhvcjlpborolehhinerrec61j7 + + + + Partial derivative + 0 + 1665 + + 1666 + 2013-12-15T07:19:42Z + + Jasper Deng + 0 + + /* Examples */ perhaps the more-common use of the term "total derivative" + wikitext + text/x-wiki + {{other uses of|∂|∂}} +{{Calculus |Multivariable}} + +In [[mathematics]], a '''partial derivative''' of a [[function (mathematics)|function]] of several variables is its [[derivative]] with respect to one of those variables, [[Ceteris paribus|with the others held constant]] (as opposed to the [[total derivative]], in which all variables are allowed to vary). Partial derivatives are used in [[vector calculus]] and [[differential geometry]]. + +The partial derivative of a function ''f'' with respect to the variable ''x'' is variously denoted by + +: <math>f^\prime_x,\ f_x,\ \partial_x f, \frac{\partial}{\partial x}f, \text{ or } \frac{\partial f}{\partial x}</math> + +The partial-derivative symbol is ∂. One of the first known uses of the symbol in mathematics is by [[Marquis de Condorcet]] from 1770, who used it for partial differences. The modern [[partial derivative]] notation is by [[Adrien-Marie Legendre]] (1786), though he later abandoned it; [[Carl Gustav Jacob Jacobi]] re-introduced the symbol in 1841.<ref name="jeff_earliest">{{cite web|url=http://jeff560.tripod.com/calculus.html|title=Earliest Uses of Symbols of Calculus|author=Jeff Miller|date=2009-06-14|work=Earliest Uses of Various Mathematical Symbols|accessdate=2009-02-20}}</ref> + +==Introduction== +Suppose that ''ƒ'' is a function of more than one variable. For instance, + +:<math>z = f(x,y) = \,\! x^2 + xy + y^2.\,</math> + +{{multiple image + | align = right + | direction = vertical + | width = 250 + + | image1 = Grafico 3d x2+xy+y2.png + | caption1 = A graph of {{nowrap|''z'' {{=}} ''x''<sup>2</sup> + ''xy'' + ''y''<sup>2</sup>}}. For the partial derivative at {{nowrap|(1, 1, 3)}} that leaves ''y'' constant, the corresponding [[tangent]] line is parallel to the ''xz''-plane. + + | image2 = X2+x+1.png + | caption2 = A slice of the graph above showing the function in the ''xz''-plane at {{nowrap|''y''{{=}} 1}} +}} + +The [[graph of a function|graph]] of this function defines a [[surface]] in [[Euclidean space]]. To every point on this surface, there are an infinite number of [[tangent line]]s. Partial differentiation is the act of choosing one of these lines and finding its [[slope]]. Usually, the lines of most interest are those that are parallel to the ''xz''-plane, and those that are parallel to the ''yz''-plane (which result from holding either y or x constant, respectively.) + +To find the slope of the line tangent to the function at P{{nowrap|(1, 1, 3)}} that is parallel to the ''xz''-plane, the ''y'' variable is treated as constant. The graph and this plane are shown on the right. On the graph below it, we see the way the function looks on the plane {{nowrap|''y'' {{=}} 1}}. 
By finding the [[derivative]] of the equation while assuming that ''y'' is a constant, the slope of ''ƒ'' at the point {{nowrap|(''x'', ''y'', ''z'')}} is found to be: + +: <math>\frac{\partial z}{\partial x} = 2x+y</math> + +So at {{nowrap|(1, 1, 3)}}, by substitution, the slope is 3. Therefore + +: <math>\frac{\partial z}{\partial x} = 3</math> + +at the point {{nowrap|(1, 1, 3)}}. That is, the partial derivative of ''z'' with respect to ''x'' at {{nowrap|(1, 1, 3)}} is 3. + +==Definition== +=== Basic definition === +The function ''f'' can be reinterpreted as a family of functions of one variable indexed by the other variables: + +:<math>f(x,y) = f_x(y) = \,\! x^2 + xy + y^2.\,</math> + +In other words, every value of ''x'' defines a function, denoted ''f<sub>x</sub>'', which is a function of one variable.<ref>This can also be expressed as the [[adjoint functors|adjointness]] between the [[product topology|product space]] and [[function space]] constructions.</ref> That is, + +:<math>f_x(y) = x^2 + xy + y^2.\,</math> + +Once a value of ''x'' is chosen, say ''a'', then ''f''(''x'',''y'') determines a function ''f<sub>a</sub>'' which sends ''y'' to ''a''<sup>2</sup> + ''ay'' + ''y''<sup>2</sup>: + +:<math>f_a(y) = a^2 + ay + y^2. \,</math> + +In this expression, ''a'' is a ''constant'', not a ''variable'', so ''f<sub>a</sub>'' is a function of only one real variable, that being ''y''. Consequently, the definition of the derivative for a function of one variable applies: + +:<math>f_a'(y) = a + 2y. \,</math> + +The above procedure can be performed for any choice of ''a''. Assembling the derivatives together into a function gives a function which describes the variation of ''f'' in the ''y'' direction: + +:<math>\frac{\partial f}{\partial y}(x,y) = x + 2y.\,</math> + +This is the partial derivative of ''f'' with respect to ''y''. Here ∂ is a rounded ''d'' called the '''partial derivative symbol'''. To distinguish it from the letter ''d'', ∂ is sometimes pronounced "del" or "partial" instead of "dee". + +In general, the '''partial derivative''' of a function ''f''(''x''<sub>1</sub>,...,''x''<sub>''n''</sub>) in the direction ''x<sub>i</sub>'' at the point (''a''<sub>1</sub>,...,''a<sub>n</sub>'') is defined to be: + +:<math>\frac{\partial f}{\partial x_i}(a_1,\ldots,a_n) = \lim_{h \to 0}\frac{f(a_1,\ldots,a_i+h,\ldots,a_n) - f(a_1,\ldots, a_i, \dots,a_n)}{h}.</math> + +In the above difference quotient, all the variables except ''x<sub>i</sub>'' are held fixed. That choice of fixed values determines a function of one variable <math>f_{a_1,\ldots,a_{i-1},a_{i+1},\ldots,a_n}(x_i) = f(a_1,\ldots,a_{i-1},x_i,a_{i+1},\ldots,a_n)</math>, and by definition, + +:<math>\frac{df_{a_1,\ldots,a_{i-1},a_{i+1},\ldots,a_n}}{dx_i}(a_i) = \frac{\partial f}{\partial x_i}(a_1,\ldots,a_n).</math> + +In other words, the different choices of ''a'' index a family of one-variable functions just as in the example above. This expression also shows that the computation of partial derivatives reduces to the computation of one-variable derivatives. + +An important example of a function of several variables is the case of a [[scalar-valued function]] ''f''(''x''<sub>1</sub>,...''x''<sub>''n''</sub>) on a domain in Euclidean space '''R'''<sup>''n''</sup> (e.g., on '''R'''<sup>2</sup> or '''R'''<sup>3</sup>). In this case ''f'' has a partial derivative ∂''f''/∂''x''<sub>''j''</sub> with respect to each variable ''x''<sub>''j''</sub>. 
At the point ''a'', these partial derivatives define the vector + +:<math>\nabla f(a) = \left(\frac{\partial f}{\partial x_1}(a), \ldots, \frac{\partial f}{\partial x_n}(a)\right).</math> + +This vector is called the '''[[gradient]]''' of ''f'' at ''a''. If ''f'' is differentiable at every point in some domain, then the gradient is a vector-valued function ∇''f'' which takes the point ''a'' to the vector ∇''f''(''a''). Consequently, the gradient produces a [[vector field]]. + +A common [[abuse of notation]] is to define the [[del operator]] (∇) as follows in three-dimensional [[Euclidean space]] '''R'''<sup>3</sup> with [[unit vectors]] <math>\mathbf{\hat{i}}, \mathbf{\hat{j}}, \mathbf{\hat{k}}</math>: +:<math>\nabla = \bigg[{\frac{\partial}{\partial x}} \bigg] \mathbf{\hat{i}} + \bigg[{\frac{\partial}{\partial y}}\bigg] \mathbf{\hat{j}} + \bigg[{\frac{\partial}{\partial z}}\bigg] \mathbf{\hat{k}}</math> +Or, more generally, for ''n''-dimensional Euclidean space '''R'''<sup>''n''</sup> with coordinates (x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>,...,x<sub>''n''</sub>) and unit vectors (<math>\mathbf{\hat{e}_1}, \mathbf{\hat{e}_2}, \mathbf{\hat{e}_3}, \dots , \mathbf{\hat{e}_n}</math>): +:<math>\nabla = \sum_{j=1}^n \bigg[{\frac{\partial}{\partial x_j}}\bigg] \mathbf{\hat{e}_j} = \bigg[{\frac{\partial}{\partial x_1}}\bigg] \mathbf{\hat{e}_1} + \bigg[{\frac{\partial}{\partial x_2}}\bigg] \mathbf{\hat{e}_2} + \bigg[{\frac{\partial}{\partial x_3}}\bigg] \mathbf{\hat{e}_3} + \dots + \bigg[{\frac{\partial}{\partial x_n}}\bigg] \mathbf{\hat{e}_n}</math> + +===Formal definition=== +Like ordinary derivatives, the partial derivative is defined as a [[limit of a function|limit]]. Let ''U'' be an [[open set|open subset]] of '''R'''<sup>''n''</sup> and ''f'' : ''U'' → '''R''' a function. The partial derivative of ''f'' at the point '''''a''''' = (''a''<sub>1</sub>, ..., ''a''<sub>''n''</sub>) ∈ ''U'' with respect to the ''i''-th variable ''a''<sub>''i''</sub> is defined as + +:<math>\frac{ \partial }{\partial a_i }f(\mathbf{a}) = +\lim_{h \rightarrow 0}{ +f(a_1, \dots , a_{i-1}, a_i+h, a_{i+1}, \dots ,a_n) - +f(a_1, \dots, a_i, \dots ,a_n) \over h } +</math> + +Even if all partial derivatives ∂''f''/∂''a''<sub>''i''</sub>(''a'') exist at a given point ''a'', the function need not be [[continuous function|continuous]] there. However, if all partial derivatives exist in a [[neighborhood (topology)|neighborhood]] of ''a'' and are continuous there, then ''f'' is [[total derivative|totally differentiable]] in that neighborhood and the total derivative is continuous. In this case, it is said that ''f'' is a C<sup>1</sup> function. This can be used to generalize for vector valued functions (''f'' : ''U'' → ''R'''<sup>''m''</sup>) by carefully using a componentwise argument. + +The partial derivative <math>\frac{\partial f}{\partial x}</math> can be seen as another function defined on ''U'' and can again be partially differentiated. 
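The limit definition above can also be illustrated numerically. The following Python sketch (an illustration only; the step size <code>h</code> is an arbitrary choice) approximates each partial derivative by a difference quotient with all other variables held fixed, and assembles the results into the gradient, using the function {{nowrap|''f''(''x'', ''y'') {{=}} ''x''<sup>2</sup> + ''xy'' + ''y''<sup>2</sup>}} from the introduction.

<syntaxhighlight lang="python">
# Numerical illustration of the limit definition of a partial derivative:
# hold every variable except x_i fixed and take a difference quotient
# with a small step h (the value of h is an arbitrary choice).

def partial(f, point, i, h=1e-6):
    """Approximate the partial derivative of f with respect to x_i at `point`."""
    shifted = list(point)
    shifted[i] += h
    return (f(*shifted) - f(*point)) / h

def gradient(f, point, h=1e-6):
    """Assemble the vector of partial derivatives (the gradient) at `point`."""
    return [partial(f, point, i, h) for i in range(len(point))]

f = lambda x, y: x**2 + x*y + y**2
print(gradient(f, (1.0, 1.0)))   # approximately [3.0, 3.0], i.e. (2x + y, x + 2y) at (1, 1)
</syntaxhighlight>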
If all mixed second order partial derivatives are continuous at a point (or on a set), ''f'' is termed a C<sup>2</sup> function at that point (or on that set); in this case, the partial derivatives can be exchanged by [[Symmetry of second derivatives#Clairaut.27s theorem|Clairaut's theorem]]: + +:<math>\frac{\partial^2f}{\partial x_i\, \partial x_j} = \frac{\partial^2f} {\partial x_j\, \partial x_i}.</math> + +==Examples== +[[Image:Cone 3d.png|thumb|The volume of a cone depends on height and radius]] +The [[volume]] ''V'' of a [[cone (geometry)|cone]] depends on the cone's [[height]] ''h'' and its [[radius]] ''r'' according to the formula +:<math>V(r, h) = \frac{\pi r^2 h}{3}.</math> + +The partial derivative of ''V'' with respect to ''r'' is +:<math>\frac{ \partial V}{\partial r} = \frac{ 2 \pi r h}{3},</math> + +which represents the rate with which a cone's volume changes if its radius is varied and its height is kept constant. The partial derivative with respect to ''h'' is +:<math>\frac{ \partial V}{\partial h} = \frac{\pi r^2}{3},</math> + +which represents the rate with which the volume changes if its height is varied and its radius is kept constant. + +By contrast, the [[total derivative|''total'' derivative]] of ''V'' with respect to ''r'' and ''h'' are respectively +:<math>\frac{\operatorname dV}{\operatorname dr} = \overbrace{\frac{2 \pi r h}{3}}^\frac{ \partial V}{\partial r} + \overbrace{\frac{\pi r^2}{3}}^\frac{ \partial V}{\partial h}\frac{\operatorname d h}{\operatorname d r}</math> + +and +:<math>\frac{\operatorname dV}{\operatorname dh} = \overbrace{\frac{\pi r^2}{3}}^\frac{ \partial V}{\partial h} + \overbrace{\frac{2 \pi r h}{3}}^\frac{ \partial V}{\partial r}\frac{\operatorname d r}{\operatorname d h}</math> + +The difference between the total and partial derivative is the elimination of indirect dependencies between variables in partial derivatives. + +If (for some arbitrary reason) the cone's proportions have to stay the same, and the height and radius are in a fixed ratio ''k'', +:<math>k = \frac{h}{r} = \frac{\operatorname d h}{\operatorname d r}.</math> + +This gives the total derivative with respect to ''r'': +:<math>\frac{\operatorname dV}{\operatorname dr} = \frac{2 \pi r h}{3} + \frac{\pi r^2}{3}k</math> + +Which simplifies to: +:<math>\frac{\operatorname dV}{\operatorname dr} = k\pi r^2</math> + +Similarly, the total derivative with respect to ''h'' is: +:<math>\frac{\operatorname dV}{\operatorname dh} = \pi r^2</math> + +Equations involving an unknown function's partial derivatives are called [[partial differential equation]]s and are common in [[physics]], [[engineering]], and other [[science]]s and applied disciplines. + +The total derivative with respect to ''both'' r and h is given by the [[Jacobian matrix and determinant|Jacobian matrix]], which here takes the form of the [[gradient]] vector <math>\nabla V =(\frac{\partial V}{\partial r},\frac{\partial V}{\partial h}) = (\frac{2}{3}\pi rh, \frac{1}{3}\pi r^2)</math>. + +==Notation== +For the following examples, let ''f'' be a function in ''x'', ''y'' and ''z''. 
+ +First-order partial derivatives: + +:<math>\frac{ \partial f}{ \partial x} = f_x = \partial_x f.</math> + +Second-order partial derivatives: + +:<math>\frac{ \partial^2 f}{ \partial x^2} = f_{xx} = \partial_{xx} f.</math> + +Second-order [[mixed derivatives]]: + +:<math>\frac{\partial^2 f}{\partial y \, \partial x} = \frac{\partial}{\partial y} \left( \frac{\partial f}{\partial x} \right) = (f_{x})_{y} = f_{xy} = \partial_{yx} f.</math> + +Higher-order partial and mixed derivatives: + +:<math>\frac{ \partial^{i+j+k} f}{ \partial x^i\, \partial y^j\, \partial z^k } = f^{(i, j, k)}.</math> + +When dealing with functions of multiple variables, some of these variables may be related to each other, and it may be necessary to specify explicitly which variables are being held constant. In fields such as [[statistical mechanics]], the partial derivative of ''f'' with respect to ''x'', holding ''y'' and ''z'' constant, is often expressed as + +:<math>\left( \frac{\partial f}{\partial x} \right)_{y,z}.</math> + +==Antiderivative analogue== +There is a concept for partial derivatives that is analogous to [[antiderivative]]s for regular derivatives. Given a partial derivative, it allows for the partial recovery of the original function. + +Consider the example of <math>\frac{\partial z}{\partial x} = 2x+y</math>. The "partial" integral can be taken with respect to ''x'' (treating ''y'' as constant, in a similar manner to partial differentiation): +:<math>z = \int \frac{\partial z}{\partial x} \,dx = x^2 + xy + g(y)</math> +Here, the [[Constant of integration|"constant" of integration]] is no longer a constant, but instead a function of all the variables of the original function except ''x''. The reason for this is that all the other variables are treated as constant when taking the partial derivative, so any function which does not involve <math>x</math> will disappear when taking the partial derivative, and we have to account for this when we take the antiderivative. The most general way to represent this is to have the "constant" represent an unknown function of all the other variables. + +Thus the set of functions <math>x^2 + xy + g(y)</math>, where ''g'' is any one-argument function, represents the entire set of functions in variables ''x'',''y'' that could have produced the ''x''-partial derivative 2''x''+''y''. + +If all the partial derivatives of a function are known (for example, with the [[gradient]]), then the antiderivatives can be matched via the above process to reconstruct the original function up to a constant. + +==See also== +<div style="-moz-column-count:2; column-count:2"> +*[[d'Alembertian operator]] +*[[Chain rule]] +*[[Curl (mathematics)]] +*[[Directional derivative]] +*[[Divergence]] +*[[Exterior derivative]] +*[[Gradient]] +*[[Jacobian matrix and determinant]] +*[[Laplacian]] +*[[Symmetry of second derivatives]] +*[[Triple product rule]], also known as the cyclic chain rule. 
+</div> + +==Notes== +<references /> + +==External links== +*{{springer|title=Partial derivative|id=p/p071620}} +*[http://mathworld.wolfram.com/PartialDerivative.html Partial Derivatives] at MathWorld + +[[Category:Multivariable calculus]] +[[Category:Differential operators]] + lyy9y1pe1010hfrnenw4rxcgzlg5h7n + + + + Poset game + 0 + 27026 + + 27027 + 2013-12-20T12:53:37Z + + 75.0.178.254 + + Use canonical capitalization + wikitext + text/x-wiki + In [[combinatorial game theory]], '''poset games''' are [[mathematical game|mathematical]] [[game of strategy|games of strategy]], generalizing many well-known games such as [[Nim]] and [[Chomp]].<ref name="MSCW2011" /> In such games, two players start with a [[poset]] (a '''partially ordered set'''), and take turns choosing one point in the poset, removing it and all points that are greater. The player who is left with no point to choose, loses. + +==Game play== +Given a [[partially ordered set]] (''P'',&nbsp;<), let +:<math> P_x = P - \{ a\mid a \geq x\} </math> +denote the poset formed by removing ''x'' from ''P''. + +A poset game on ''P'', played between two players conventionally named [[Alice and Bob]], is as follows: + +* Alice chooses a point ''x''&nbsp;&isin;&nbsp;''P''; thus replacing ''P'' with ''P''<sub>''x''</sub>, and then passes the turn to Bob who plays on ''P''<sub>''x''</sub>, and passes the turn to Alice. +* A player loses if it is his/her turn and there are no points to choose. + +==Examples== +If ''P'' is a [[finite set|finite]] [[totally ordered set]], then game play in ''P'' is exactly the same as the game play in a game of [[Nim]] with a heap of size |''P''|. For, in both games, it is possible to choose a move that leads to a game of the same type whose size is any number smaller than |''P''|. In the same way, a poset game with a disjoint union of total orders is equivalent to a game of Nim with multiple heaps with sizes equal to the chains in the poset. + +A special case of [[Hackenbush]], in which all edges are green (able to be cut by either player) and every configuration takes the form of a [[tree (graph theory)|forest]], may be expressed similarly, as a poset game on a poset in which, for every element ''x'', there is at most one element ''y'' for which ''x'' [[covering relation|covers]] ''y''. If ''x'' covers ''y'', then ''y'' is the parent of ''x'' in the forest on which the game is played. + +[[Chomp]] may be expressed similarly, as a poset game on the [[Product order|product]] of total orders from which the [[infimum]] has been removed. + +==Grundy value== +Poset games are [[impartial game]]s, meaning that every move available to Alice would also be available to Bob if Alice were allowed to [[Null move|pass]], and vice versa. Therefore, by the [[Sprague–Grundy theorem]], every position in a poset game has a Grundy value, a number describing an equivalent position in the game of Nim. The Grundy value of a poset may be calculated as the least natural number which is not the Grundy value of any ''P''<sub>''x''</sub>, ''x''&nbsp;&isin;&nbsp;''P''. That is,<ref name="Byrnes2003"/> +: <math>G(P)=\min\bigl(\mathbb{N}\setminus \{G(P_x)\mid x\in P\}\bigr).</math> + +This number may be used to describe the optimal game play in a poset game. In particular, the Grundy value is nonzero when the player whose turn it is has a winning strategy, and zero when the current player cannot win against optimal play from his or her opponent. 
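The recursion for the Grundy value can be written down directly. The following Python sketch is a brute-force illustration (the poset representation and the names used here are chosen only for exposition, and the approach is feasible only for very small posets): a position is the set of points still available, choosing ''x'' removes ''x'' and every point above it, and the Grundy value is the minimum excluded value among the successors' Grundy values.

<syntaxhighlight lang="python">
# Brute-force sketch of the recursion G(P) = mex{ G(P_x) : x in P }.
# A poset is given as a set of elements plus a function less(a, b) meaning "a < b".
# Feasible only for tiny posets; deciding the winner in general is PSPACE-complete.

from functools import lru_cache

def grundy(elements, less):
    @lru_cache(maxsize=None)
    def g(position):                  # position: frozenset of points still available
        values = set()
        for x in position:
            # Choosing x removes x and every point above it, giving P_x.
            successor = frozenset(a for a in position if a != x and not less(x, a))
            values.add(g(successor))
        mex = 0                       # minimum excluded natural number
        while mex in values:
            mex += 1
        return mex
    return g(frozenset(elements))

# A chain 1 < 2 < 3 plays like a Nim heap of size 3, so its Grundy value is 3.
print(grundy({1, 2, 3}, lambda a, b: a < b))   # 3
</syntaxhighlight>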
A winning strategy in the game consists of moving to a position whose Grundy value is zero, whenever this is possible. + +==Strategy stealing== +A [[strategy-stealing argument]] shows that the Grundy value is nonzero for every poset that has a [[supremum]]. For, let ''x'' be the supremum of a partially ordered set ''P''. If ''P<sub>x</sub>'' has Grundy value zero, then ''P'' itself has a nonzero value, by the formula above; in this case, ''x'' is a winning move in ''P''. If, on the other hand, ''P<sub>x</sub>'' has a nonzero Grundy value, then there must be a winning move ''y'' in ''P<sub>x</sub>'', such that the Grundy value of (''P<sub>x</sub>'')<sub>''y''</sub> is zero. But by the assumption that ''x'' is a supremum, ''x''&nbsp;>&nbsp;''y'' and (''P<sub>x</sub>'')<sub>''y''</sub>&nbsp;=&nbsp;''P<sub>y</sub>'', so the winning move ''y'' is also available in ''P'' and again ''P'' must have a nonzero Grundy value.<ref name="MSCW2011"/> + +For more trivial reasons a poset with an infimum also has a nonzero Grundy value: moving to the infimum is always a winning move. + +==Complexity== +Deciding the winner of an arbitrary finite poset game is [[PSPACE-complete]].<ref name="Grier2012" /> This means that unless P=PSPACE, computing the Grundy value of an arbitrary poset game is computationally difficult. + +==References== +{{Reflist|refs= +<ref name="MSCW2011">{{citation + | last1 = Soltys | first1 = Michael + | last2 = Wilson | first2 = Craig + | doi = 10.1007/s00224-010-9254-y + | issue = 3 + | journal = Theory of Computing Systems + | mr = 2770813 + | pages = 680–692 + | title = On the complexity of computing winning strategies for finite poset games + | volume = 48 + | year = 2011}}.</ref> +<ref name="Byrnes2003">{{citation + | last = Byrnes | first = Steven + | issue = G3 + | journal = Integers + | mr = 2036487 + | pages = 1–16 + | title = Poset game periodicity + | url = http://www.emis.ams.org/journals/INTEGERS/papers/dg3/dg3.pdf + | volume = 3 + | year = 2003}}.</ref> +<ref name="Grier2012">{{citation + | last = Grier | first = Daniel + | journal = arXiv + | title = Deciding the Winner of an Arbitrary Finite Poset Game is PSPACE-Complete + | url = http://arxiv.org/abs/1209.1750 + | year = 2012}}.</ref> +}} + +[[Category:Combinatorial game theory]] +[[Category:Mathematical games]] + 163jp34icz7pm6ui2xfbjnvkz1qi17h + + + + Engine efficiency + 0 + 15166 + + 15167 + 2014-02-02T04:15:00Z + + APerson + 0 + + Reverted 1 [[WP:AGF|good faith]] edit by [[Special:Contributions/115.251.246.210|115.251.246.210]] using [[WP:STiki|STiki]] + wikitext + text/x-wiki + {{multiple issues| +{{Cleanup|date=July 2008}} +{{refimprove|date=December 2008}} +}} + +'''Engine efficiency''' of thermal [[engine]]s is the relationship between the total [[energy]] contained in the [[fuel]], and the amount of energy used to perform useful work. There are two classifications of thermal engines- +#[[Internal combustion]] ([[Otto cycle|gasoline]], [[Diesel cycle|diesel]] and [[gas turbine]], i.e., [[Brayton cycle]] engines) and +#[[External combustion engines]] ([[Steam engine|steam piston]], [[steam turbine]], and the [[Stirling cycle]] engine). + +Each of these engines has [[thermal efficiency]] characteristics that are unique to it. + +==Mathematical definition== +{{unreferenced section|date=July 2012}} +The efficiency of engine is defined as ratio of the useful '''[[work done]]''' to the heat provided. 
+ +: <math>\eta = \frac{ work\ done } {heat\ absorbed} = \frac{ Q1-Q2 }{ Q1}</math> + +where, <math>Q1</math> is the heat absorbed and <math>Q1-Q2</math> is the work done. + +Please note that the term '''work done''' relates to the power delivered '''at the clutch''' or '''at the driveshaft'''. + +This means the friction and other losses are subtracted from the work done by thermodynamic expansion. Thus an engine not delivering any work to the outside environment has zero efficiency. + +==Compression ratio== +{{unreferenced section|date=July 2012}} +The efficiency of internal combustion engines depends on several factors, one of which is the [[compression ratio]]. Most gasoline (petrol) engines have a ''geometric'' compression ratio (the compression ratio calculated purely from the geometry of the mechanical parts) of 10:1 ([[Octane rating|premium fuel]]) or 9:1 (regular fuel), with some engines reaching a ratio of 12:1 or more. The greater the compression ratio the more efficient is the engine, in principle, and higher compression-ratio conventional engines in principle need gasoline with higher [[Octane rating|octane]] value, though this simplistic analysis is complicated by the difference between actual and geometric compression ratios. High octane value inhibits the fuel's tendency to burn nearly instantaneously (known as [[Engine knocking|''detonation'' or ''knock'']]) at high compression/high heat conditions. However, in engines that utilize compression rather than spark ignition, by means of very high compression ratios (14-25:1), such as the [[diesel engine]] or [[Bourke engine]], high octane fuel is not necessary. In fact, lower-octane fuels, typically rated by [[cetane number]], are preferable in these applications because they are more easily ignited under compression. + +Under part throttle conditions (i.e. when the throttle is less than fully open), the ''effective'' compression ratio is less than when the engine is operating at full throttle, due to the simple fact that the incoming fuel-air mixture is being restricted and cannot fill the chamber to full atmospheric pressure. The engine efficiency is less than when the engine is operating at full throttle. One solution to this fact is to shift the load in a multi-cylinder engine from some of the cylinders (by deactivating them) to the remaining cylinders so that they may operate under higher individual loads and with correspondingly higher effective compression ratios. This technique is known as [[variable displacement]]. + +Diesel engines have a compression ratio between 14:1 to 25:1. In this case the general rule does not apply because Diesels with compression ratios over 20:1 are [[Diesel engine|indirect injection diesels]]. These use a prechamber to make possible high RPM operation as is required in automobiles and light trucks. The thermal and gas dynamic losses from the prechamber result in direct injection Diesels (despite their lower compression ratio) being more efficient. + +==Friction== +An engine has many moving parts that produce [[friction]]. Some of these friction forces remain constant (as long as applied load is constant); some of these friction losses increase as engine speed increases, such as piston side forces and connecting bearing forces (due to increased inertia forces from the oscillating piston). 
A few friction forces decrease at higher speed, such as the friction force on the [[camshaft|cam]]'s lobes used to operate the [[Four-stroke cycle engine valves|inlet and outlet valves]] (the valves' [[inertia]] at high speed tends to pull the cam follower away from the cam lobe). Along with friction forces, an operating engine has ''pumping losses'', which is the work required to move air into and out of the cylinders. This pumping loss is minimal at low speed, but increases approximately as the square of the speed, until at rated power an engine is using about 20% of total power production to overcome friction and pumping losses. + +==Oxygen== +[[Earth's atmosphere|Air]] is approximately 21% [[oxygen]]. If there is not enough [[oxygen]] for proper combustion, the fuel will not burn completely and will produce less energy. An excessively rich air fuel ratio will increase pollutants from the engine. The fuel burns in three stages. First, the hydrogen burns to form water vapour. Second, the carbon burns to carbon monoxide. Finally, the carbon monoxide burns to carbon dioxide. This last stage produces most of the power of the engine. If all of the [[oxygen]] is consumed before this stage because there is too much fuel, engine's power is reduced. + +There are a few exceptions where introducing fuel upstream of the combustion chamber can cool down the incoming air through evaporative cooling. The extra fuel that is not burned in the combustion chamber cools down the intake air resulting in more power. With direct injection this effect is not as dramatic but it can cool down the combustion chamber enough to reduce certain [[air pollution|pollutants]] such as nitrogen oxides ([[NOx]]), while raising others such as partially decomposed hydrocarbons. + +The air-fuel mix is drawn into an engine because downward motion of the pistons induces a partial vacuum. A [[gas compressor|compressor]] can additionally be used to force a larger charge (forced induction) into the cylinder to produce more power. The [[gas compressor|compressor]] is either mechanically driven [[supercharging]] or exhaust driven [[turbocharging]]. Either way, forced induction increases the air pressure exterior to the cylinder inlet port. + +There are other methods to increase the amount of oxygen available inside the engine; one of them, is to inject [[nitrous oxide]], (N2O) to the mixture, and some engines use [[nitromethane]], a fuel that provides the oxygen itself it needs to burn. Because of that, the mixture could be 1 part of fuel and 3 parts of air; thus, it is possible to burn more fuel inside the engine, and get higher power outputs... + +==Internal combustion engines== + +===Gasoline (petrol) engines=== +Modern [[gasoline]] engines have a maximum thermal efficiency of about 25% to 30% when used to power a car. In other words, even when the engine is operating at its point of maximum thermal efficiency, of the total heat energy released by the [[gasoline]] consumed, about 70-75% is rejected as heat without being turned into useful work, i.e. turning the crankshaft.<ref name="Baglione 2007">{{cite thesis |type=Ph.D. +|title=Development of System Analysis Methodologies and Tools for Modeling and Optimizing Vehicle System Efficiency + |last=Baglione + |first=Melody L. 
+|authorlink= +|coauthors= +|year=2007 |publisher =University of Michigan +|pages=52-54 |url=http://deepblue.lib.umich.edu/handle/2027.42/57640}}</ref> Approximately half of this rejected heat is carried away by the exhaust gases, and half passes through the cylinder walls or cylinder head into the engine cooling system, and is passed to the atmosphere via the cooling system radiator.<ref>http://www.arrowheadradiator.com/14_rules_for_improving_engine_cooling_system_capability_in_high-performance_automobiles.htm</ref> Some of the work generated is also lost as friction, noise, air turbulence, and work used to turn engine equipment and appliances such as [[circulation pump|water and oil pumps]] and the electrical [[alternator|generator]], leaving only about 25-30% of the energy released by the fuel consumed available to move the vehicle. + +At idle, the thermal efficiency is zero, since no usable work is being drawn from the engine. At low speeds, gasoline engines suffer efficiency losses at small throttle openings from the high turbulence and frictional (head) loss when the incoming air must fight its way around the nearly closed throttle; diesel engines do not suffer this loss because the incoming air is not throttled. At high speeds, efficiency in both types of engine is reduced by pumping and mechanical frictional losses, and the shorter period within which combustion has to take place. Engine efficiency peaks in most applications at around 75% of rated engine power, which is also the range of greatest engine torque (e.g. in the 2008 Baby Blue [[Ford Focus (North America)|Ford Focus]], maximum torque of 133 foot-pounds (180 Nm) (348 foot-pounds if turbocharged) is obtained at 4,500 [[RPM]], and maximum engine power of {{convert|136|bhp}} is obtained at 6,000 RPM). At all other combinations of engine speed and torque, the thermal efficiency is less than this maximum. + +A gasoline engine burns a mix of gasoline and air, consisting of a range of about twelve to eighteen parts (by weight) of air to one part of fuel (by weight). A mixture with a 14.7:1 air/fuel ratio is said to be [[stoichiometric]], that is when burned, 100% of the [[fuel]] and the [[oxygen]] are consumed. Mixtures with slightly less fuel, called [[lean burn]] are more efficient. The [[combustion]] is a reaction which uses the air's [[oxygen]] content to combine with the fuel, which is a mixture of several [[hydrocarbon]]s, resulting in [[water vapor]], [[carbon dioxide]], and sometimes [[carbon monoxide]] and partially burned hydrocarbons. In addition, at high temperatures the oxygen tends to combine with [[nitrogen]], forming [[nitrogen oxide|oxides of nitrogen]] (usually referred to as ''NOx'', since the number of oxygen atoms in the compound can vary, thus the "X" subscript). This mixture, along with the unused nitrogen and [[Atmospheric chemistry|other trace atmospheric elements]], is what we see in the [[Exhaust system|exhaust]]. + +In the past 3–4 years, GDI ([[Gasoline Direct Injection]]) increased the efficiency of the engines equipped with this fueling system up to 35%. Currently, the technology is available in a wide variety of vehicles ranging from less expensive cars produced by Mazda, Ford and Chevrolet to more expensive cars produced by BMW, Mercedes-Benz, and Volkswagen Auto Group. + +===Diesel engines=== +Engines using the Diesel cycle are usually more efficient, although the Diesel cycle itself is less efficient at equal compression ratios. 
Since diesel engines use much higher compression ratios (the heat of compression is used to ignite the slow-burning [[diesel fuel]]), that higher ratio more than compensates for the lower intrinsic cycle efficiency, and allows the diesel engine to be more efficient. The most efficient type, direct injection Diesels, are able to reach an efficiency of about 40% in the engine speed range of idle to about 1,800 rpm. Beyond this speed, efficiency begins to decline due to air pumping losses within the engine. +Modern turbo-diesel engines are using electronically controlled, common-rail fuel injection, that increases the efficiency up to 50% with the help of geometrically variable turbo-charging system; this also increases the engines' torque at low engine speeds (1200-1800RPM). + +===Gas turbine=== +The [[gas turbine]] is most efficient at maximum power output in the same way reciprocating engines are most efficient at maximum load. The difference is that at lower rotational speed the pressure of the compressed air drops and thus thermal and fuel efficiency drop dramatically. Efficiency declines steadily with reduced power output and is very poor in the low power range - the same is true in reciprocating engines, the friction losses at 3000 RPM are almost the same whether the engine is under 10% load or not having any useful output on the driveshaft. The inertia of high speed gas turbine together with the low air pressure under low speed cause it to have a significant lag which many drivers are unwilling to cope with. Today the gas turbine is not used for automobiles and trucks because the usage patterns dictate varying loads, including idling speeds. [[General Motors]] at one time manufactured a bus powered by a gas turbine, but due to the economy where crude oil prices rose exponentially (1970's) this concept was abandoned, [[Chrysler]] and [[Ford]] also built prototypes of turbine powered cars, Chrysler building a short prototype series of them. Driving comfort was good, but overall economy lacked due to reasons mentioned above. This is also why gas turbines can be used for permanent and peak power electric plants. In this application they are only run at or close to full power where they are efficient or shut down when not needed. + +Gas turbines do have advantage in power density - gas turbines are used as the engines in heavy armored vehicles and armored tanks and in power generators in jet fighters. + +One other factor negatively affecting the gas turbine efficiency is the ambient air temperature. With increasing temperature, intake air becomes less dense and therefore the gas turbine experiences power loss proportional to the increase in ambient air temperature.<ref>http://www.cospp.com/articles/print/volume-8/issue-6/features/gas-turbine-plant-efficiency-balancing-power-heat-and-operational-flexibility.html</ref> + +==External combustion engines== + +===Steam engine=== +{{Main|Steam engine}} +::See also: [[Steam engine#Efficiency]] +::See also: [[Timeline of steam power]] + +====Piston engine==== +Steam engines and turbines operate on the [[Rankine cycle]] which has a maximum [[Carnot efficiency]] of 63% for practical engines. + +The efficiency of steam engines is primarily related to the steam temperature and pressure and the number of stages or ''expansions''.<ref name="Thurston 1875">{{cite book +|title= A History of the Growth of the Steam-Engine + |last=Thurston + |first= Robert H. +|authorlink= +|coauthors= +|year=1875 |publisher =D. 
Appleton & Co.|location= +|pages=464–70 |url=http://www.history.rochester.edu/steam/thurston/1878/}}</ref> Steam engine efficiency improved as the operating principles were discovered, which lead to the development of the science of [[thermodynamics]]. See graph:[http://www.cuug.ab.ca/branderr/eeepc/017_coal.html Steam Engine Efficiency] + +In earliest steam engines the boiler was considered part of the engine. Today they are considered separate, so it is necessary to know whether stated efficiency is overall, which includes the boiler, or just of the engine. + +Comparisons of efficiency and power of the early steam engines is difficult for several reasons: 1) there was no standard weight for a bushel of coal, which could be anywhere from 82 to 96 pounds. 2) There was no standard heating value for coal, and probably no way to measure heating value. The coals had much higher heating value than today's steam coals, with 13,500 BTU/pound sometimes mentioned. 3) Efficiency was reported as "duty", meaning how many foot pounds of work lifting water were produced, but the mechanical pumping efficiency is not known.<ref name="Thurston 1875"/> + +The first piston steam engine, developed by [[Thomas Newcomen]] around 1710, was slightly over one half percent (0.5%) efficient. It operated with steam at near atmospheric pressure drawn into the cylinder by the load, then condensed by a spray of cold water into the steam filled cylinder, causing a partial vacuum in the cylinder and the pressure of the atmosphere to drive the piston down. Using the cylinder as the vessel in which to condense the steam also cooled the cylinder, so that some of the heat in the incoming steam on the next cycle was lost in warming the cylinder, reducing the thermal efficiency. Improvements made by [[John Smeaton]] to the Newcomen engine increased the efficiency to over 1%. + +[[James Watt]] made several improvements to the Newcomen engine, the most significant of which was the external condenser, which prevented the cooling water from cooling the cylinder. Watt's engine operated with steam at slightly above atmospheric pressure. Watt's improvements increased efficiency by a factor of over 2.5.<ref>John Enys, [http://books.google.ca/books?id=blhqAAAAMAAJ&pg=PA457 "Remarks on the Duty of the Steam Engines employed in the Mines of Cornwall at different periods"], ''Transactions of the Institution of Civil Engineers'', Volume 3 (14 January 1840), pg. 457</ref> +The lack of general mechanical ability, including skilled mechanics, [[machine tool]]s, and manufacturing methods, limited the efficiency of actual engines and their design until about 1840.<ref>{{Roe1916}}</ref> + +Higher pressures engines were developed by [[Oliver Evans]] and independently by [[Richard Trevithick]]. These engines were not very efficient but had high power-to-weight ratio, allowing them to be used for powering locomotives and boats. + +The centrifugal governor, which had first been used by Watt to maintain constant speed, worked by throttling the inlet steam, which lowered the pressure, resulting in a loss of efficiency on the high (above atmospheric) pressure engines.<ref>{{cite book +|title=A History of Control Engineering 1800-1930 + |last=Benett + |first= Stuart +|authorlink= +|coauthors= +|year=1986 |publisher =Institution of Engineering and Technology +|location= +|isbn= 978-0-86341-047-5 +|pages=}}</ref> Later control methods reduced or eliminated this pressure loss. + +The improved valving mechanism of the [[Corliss steam engine]] (Ptd. 
1849) was better able to adjust speed with varying load and increased efficiency by about 30%. The Corliss engine had separate valves and headers for the inlet and exhaust steam so the hot feed steam never contacted the cooler exhaust pots and valving. The valves were quick acting, which reduced the amount of throttling of the steam and resulted in faster response. Instead of operating a throttling valve, the governor was used to adjust the valve timing to give a variable steam cut off. The variable cut off was responsible for a major portion of the efficiency increase of the Corliss engine.<ref name="Hunter 1985">{{cite book +|title=A History of Industrial Power in the United States, 1730-1930, Vol. 2: Steam Power + |last1=Hunter + |first1= Louis C. +|authorlink= +|year=1985 |publisher =University Press of Virginia +|location= Charolttesville +|isbn= |page=}}</ref> + +Others before Corliss had at least part of this idea, including [[Zachariah Allen]], who patented variable cut off, but lack of demand, increased cost and complexity and poorly developed machining technology delayed introduction until Corliss.<ref name="Hunter 1985"/> + +The Porter-Allen high speed engine (ca. 1862) operated at from three to five times the speed of other similar sized engines. The higher speed minimized the amount of condensation in the cylinder, resulting in increased efficiency.<ref name="Hunter 1985"/> + +[[Compound engine]]s gave further improvements in efficiency.<ref name="Hunter 1985"/> By the 1870s triple expansion engines were being used on ships. Compound engines allowed ships to carry less coal than freight.<ref>{{cite book +|title=Recent Economic Changes and Their Effect on Production and Distribution of Wealth and Well-Being of Society +|last=Wells +|first=David A. +|authorlink= +|coauthors= +|year=1891 |publisher= D. Appleton and Co.|location= New York|isbn= 0-543-72474-3 |pages= |url= http://books.google.com/?id=2V3qF4MWh_wC&printsec=frontcover&dq=RECENT+ECONOMIC+CHANGES+AND+THEIR+EFFECT+ON+DISTRIBUTION+OF+WEALTH+AND+WELL+BEING+OF+SOCIETY+WELLS#v=onepage&q&f=false }}</ref> Compound engines were used on some locomotives but were not widely adopted because of their mechanical complexity. + +The most efficient reciprocating steam engine design (per stage) was the [[Uniflow steam engine|uniflow engine]], but by the time it appeared steam was being displaced by diesel engines, which were even more efficient and had the advantage of requiring less labor for coal handling and oil being a more dense fuel displaced less cargo. + +====Steam turbine==== +The [[steam turbine]] is the most efficient steam engine and for this reason is universally used for electrical generation. Steam expansion in a turbine is nearly continuous, which makes a turbine comparable to a very large number of expansion stages. Steam [[fossil fuel power station]]s operating at the [[critical point (thermodynamics)|critical point]] have efficiencies in the low 40% range. Turbines produce direct rotary motion and are far more compact and weigh far less than reciprocating engines and can be controlled to within a very constant speed. 
+ +===Stirling engines=== +The [[Stirling cycle engine]] has the highest theoretical efficiency of any thermal engine but it is more expensive to make and is not competitive with other types for normal commercial use.{{Citation needed|date=December 2010}} + +==See also== +* [[Specific fuel consumption (shaft engine)|Brake Specific Fuel Consumption]] +* [[Fuel efficiency]] +* [[Chrysler Turbine Car]] (1963) + +== References == +{{Reflist}} + +== External links == +* [http://www.viragotech.com/fixit/FuelEconomyEngineEfficiencyPower.html Fuel Economy, Engine Efficiency & Power] + +{{DEFAULTSORT:Engine Efficiency}} +[[Category:Engine technology]] + 8xum2l47pl6zy0as8ycl6hjjk3h5ez9 + + + + Legendre transformation + 0 + 4479 + + 4480 + 2014-01-07T20:22:13Z + + Dan Gluck + 0 + + /* Properties */ correcting the proofs + wikitext + text/x-wiki + [[Image:Legendre transformation.png|thumb|256px|right|The function {{math|''f(x)''}} is defined on the interval [''a,b'']. The difference {{math|''px'' − ''f(x)''}} takes a maximum at {{math|''x'''}}. Thus, {{math|''f <sup>*</sup>''(''p'') {{=}} ''px' − f(x')''}}.]] + +In [[mathematics]], the '''Legendre transformation''' or '''Legendre transform''', named after [[Adrien-Marie Legendre]], is an +[[involution (mathematics)|involutive]] [[List of transforms|transformation]] on the [[real number|real]]-valued [[convex function]]s of one real variable. Its generalisation to convex functions of affine spaces is sometimes called the [[Legendre-Fenchel transformation]]. It is commonly used in [[thermodynamics]] and to derive the [[Hamiltonian mechanics|Hamiltonian]] formalism of classical mechanics out of the [[Lagrangian mechanics|Lagrangian]] formulation, as well as in the solution of [[differential equation]]s of several variables. + +==Definition== + +Let {{math|''I'' ⊂ ℝ }} be an interval, and {{math|''f'': ''I'' → ℝ }} a [[convex function]]; then its ''Legendre transform'' is the function {{math|''f*'': ''I*'' → ℝ}} defined by +:<math>f^*(x^*) = \sup_{x\in I}(x^*x-f(x)),\quad x^*\in I^*</math> +with domain +:<math>I^*=\{x^*:\sup_{x\in I}(x^*x-f(x))<\infty\}</math>. +The transform is always well-defined when {{math|''f''(''x'')}} is [[convex function|convex]]. + +The generalization to convex functions {{math|''f'': ''X'' → ℝ }} on a convex set {{math|''X'' ⊂ ℝ<sup>''n''</sup> }} is straightforward: {{math|''f*'': ''X*'' → ℝ}} has domain +:<math>X^*=\{x^*:\sup_{x\in X}(\langle x^*,x\rangle-f(x))<\infty\}</math> +and is defined by +:<math>f^*(x^*) = \sup_{x\in X}(\langle x^*,x\rangle-f(x)),\quad x^*\in X^*</math>, +where <math>\langle x^*,x \rangle</math> denotes the [[dot product]] of {{math|''x''<sup>*</sup>}} and {{math|''x''}}. + +The function {{math|''f'' <sup>*</sup>}} is called the [[convex conjugate]] function of {{mvar|f}}. For historical reasons (rooted in analytic mechanics), the conjugate variable is often denoted {{mvar|p}}, instead of {{math|''x''<sup>*</sup>}}. If the convex function {{mvar|f}} is defined on the whole line and is everywhere [[differentiable]], then +:<math>f^*(p)=\sup_{x\in I}(px-f(x))</math> +can be interpreted as the negative of the [[y-intercept|''y''-intercept]] of the [[tangent line]] to the [[Graph of a function|graph]] of {{mvar|f}} that has slope {{mvar|p}}. + +The Legendre transformation is an application of the [[Duality (projective geometry)|duality]] relationship between points and lines. 
The functional relationship specified by {{mvar|f}} can be represented equally well as a set of {{math|(''x,y'')}} points, or as a set of tangent lines specified by their slope and intercept values.

==Properties==

The Legendre transform of a convex function is convex.

Let us show this for the case of a twice-differentiable ''f'' with a nonzero (and hence positive, by convexity) second derivative.

For a fixed {{mvar|p}}, let {{mvar|x}} maximize {{math|''px''−''f''(''x'')}}.
Then {{math|''f'' <sup>*</sup>(''p'') {{=}} ''px''−''f''(''x'')}}, noting that {{mvar|x}} depends on {{mvar|p}}.
So we have
:<math>f^\prime(x) = p</math>

The derivative of ''f'' is itself differentiable with a positive derivative and hence strictly monotonic and invertible. Thus
:<math>x = g(p),</math>
where <math>g \equiv (f^{\prime})^{-1}(p)</math>, meaning that ''g'' is defined so that <math>f'(g(p))= p</math>.

Note that ''g'' is also differentiable, with derivative
:<math>\frac{dg(p)}{dp} = \frac{1}{f''(g(p))}</math>

Thus {{math|''f'' <sup>*</sup>(''p'') {{=}} ''pg''(''p'')−''f''(''g''(''p''))}} is the composition of differentiable functions, hence differentiable.

Applying the product rule and the chain rule, we have
:<math>
\begin{align}
\frac{d(f^{*})}{dp}
&{} = g(p) + \left(p - f'(g(p))\right)\cdot \frac{dg(p)}{dp}\\
&{} = g(p),
\end{align}
</math>

giving
:<math>
\begin{align}
\frac{d^2(f^{*})}{dp^2}
&{} = \frac{dg(p)}{dp} \\
&{} = \frac{1}{f''(g(p))} \\
&{} > 0 ,
\end{align}
</math>
so {{math|''f'' <sup>*</sup>}} is convex.

Next we show that the Legendre transformation is an [[Involution (mathematics)|involution]], i.e., {{math|''f'' <sup>**</sup> {{=}} ''f''}}. Using the above equalities for {{math|''g''(''p'')}}, {{math|''f''<sup>*</sup>(''p'')}} and its derivative,
:<math>
\begin{align}
f^{**}(x)
&{} = {\left(x\cdot p_s - f^{*}(p_s)\right)}_{|\frac{d}{dp}f^{*}(p=p_s) = x} \\
&{} = g(p_s)\cdot p_s - f^{*}(p_s) \\
&{} = f(g(p_s)) \\
&{} = f(x).
\end{align}</math>

==Examples==

===Example 1===
Let {{math| ''f(x)'' {{=}} ''cx<sup>2</sup>''}} be defined on the whole of ℝ, where {{math|''c'' > 0}} is a fixed constant.

For {{math|''x''*}} fixed, the function {{math|''x''*''x'' – ''f''(''x'') {{=}} ''x''*''x'' – ''cx''<sup>2</sup>}} of {{mvar|x}} has the first derivative {{math|''x''* – 2''cx''}} and second derivative −2''c''; there is one stationary point at {{math|''x'' {{=}} ''x''*/2''c''}}, which is always a maximum.

Thus, ''I''* = ℝ and
:<math>f^*(x^*)=c^*{x^*}^2~,</math>
where {{math|''c''* {{=}} 1/4''c'' }}.

Clearly,
:<math>f^{**}(x)=\frac{1}{4c^*}x^2=cx^2\, ,</math>
namely {{math| ''f'' <sup>**</sup> {{=}} ''f''}}.

===Example 2===
Let {{math| ''f(x)'' {{=}} ''x<sup>2</sup>''}} for {{math| ''x'' ∈ ''I'' {{=}}}} [2,3].

For {{math|''x''*}} fixed, {{math| ''x''*''x'' − ''f''(''x'')}} is continuous on {{mvar|I}} [[compact space|compact]], hence it always takes a finite maximum on it; it follows that {{math|''I''*}} = ℝ. The stationary point at {{math| ''x'' {{=}} ''x''*/2}} is in the domain [2,3] if and only if {{math|4 ≤ ''x''* ≤ 6}}; otherwise the maximum is taken either at ''x''=2 or at ''x''=3.

It follows that
<math>f^*(x^*)=\begin{cases}2x^*-4,\quad&x^*<4\\ \frac{{x^*}^2}{4},&4\leqslant x^*\leqslant 6,\\3x^*-9,&x^*>6\end{cases}</math> .
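The closed forms in Examples 1 and 2 can be checked numerically by evaluating the supremum in the definition over a grid of ''x'' values. The Python sketch below is only an illustration; the grids and the sample values of {{mvar|c}} and {{math|''x''*}} are arbitrary choices.

<syntaxhighlight lang="python">
# Numerical check of f*(x*) = sup_x (x* x - f(x)) on a grid of x values.
# The grids and sample parameters below are arbitrary illustrative choices.

def legendre_numeric(f, xstar, xs):
    return max(xstar * x - f(x) for x in xs)

# Example 1: f(x) = c x^2 on all of R, with f*(x*) = (x*)^2 / (4c).
c = 2.0
xs = [i / 1000.0 for i in range(-5000, 5001)]            # grid on [-5, 5]
print(legendre_numeric(lambda x: c * x * x, 3.0, xs))    # ~1.125 = 3^2 / (4*2)

# Example 2: f(x) = x^2 on I = [2, 3], with the piecewise formula above.
xs_I = [2.0 + i / 1000.0 for i in range(1001)]           # grid on [2, 3]
for xstar in (2.0, 5.0, 8.0):
    print(xstar, legendre_numeric(lambda x: x * x, xstar, xs_I))
# ~0.0, 6.25, 15.0: the regimes 2x*-4, (x*)^2/4 and 3x*-9, respectively
</syntaxhighlight>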
+ +===Example 3=== +The function {{math|''f''(''x''){{=}}''cx''}} is convex, for every {{mvar|x}} (strict convexity is not required for the Legendre transformation to be well defined). +Clearly {{math| ''x''*''x'' − ''f''(''x'') {{=}} (''x''* − ''c'')''x''}} is never upper-bounded as a function of {{mvar|x}}, unless {{math|''x''* − ''c'' {{=}} 0}}. + +Hence {{math|''f''*}} is defined on <math>I^*=\{c\}</math> and <math>f^*(c)=0</math>. + +One may check involutivity: of course {{math| ''x''*''x'' − ''f'' *(''x''*)}} is always bounded as a function of ''x''* ∈ {''c''}, hence {{math|''I''** {{=}} ℝ}}. + +Then,{{math| ∀''x''}} one has +:<math>\sup_{x^*\in\{c\}}(xx^*-f^*(x^*))=xc,</math> +and thus {{math|''f''**(''x'') {{=}} ''cx'' {{=}} ''f''(''x'')}}. + +===Example 4 (many variables)=== +Let <math>f(x)=\langle x,Ax\rangle+c</math> be defined on {{math|''X'' {{=}} ℝ<sup>''n''</sup>}}, where {{mvar|A}} is a real, positive definite matrix. Then {{mvar|f}} is convex. + +<math>\langle p,x\rangle-f(x)=\langle p,x \rangle-\langle x,Ax\rangle-c</math> has gradient {{math|''p'' − 2''Ax''}} and [[Hessian matrix|Hessian]] {{math|−2''A''}}, which is negative; hence the stationary point {{math|''x'' {{=}} ''A''<sup>-1</sup>''p''/2}} is a maximum. We have {{math|''X''*{{=}} ℝ<sup>''n''</sup>}}, and +:<math>f^*(p)=\frac14\langle p,A^{-1}p\rangle-c</math> . + +==An equivalent definition in the differentiable case== + +Equivalently, two convex functions {{mvar|f}} and {{mvar|g}} defined on the whole line +are said to be Legendre transforms of each other if their first [[derivative]]s are [[inverse function]]s of each other, +:<math>Df = \left( Dg \right)^{-1}~,</math> +in which case one writes equivalently {{math|''f'' <sup>*</sup> {{=}} ''g''}} and {{math|''g''<sup>*</sup> {{=}} ''f''}}. +We can see this by first taking the derivative of {{math|''f'' <sup>*</sup>}}, +:<math>{df^\star(p) \over dp} = {d \over dp}(xp-f(x)) = x + p {dx \over dp} - {df \over dx} {dx \over dp} = x~.</math> + +This equation, taken together with the previous equation resulting from the maximization condition, results in the following pair of reciprocal equations, + +:<math>p = {df \over dx}(x),</math> + +:<math>x = {df^\star \over dp}(p).</math> + +From these, it is evident that {{math|''Df''}} and {{math|''Df'' <sup>*</sup>}} are inverses, as stated. One may exemplify this by considering {{math| ''f''(''x'') {{=}} exp ''x''}} and hence {{math| ''g''(''p'') {{=}} ''p'' log ''p'' − ''p''}}. + +They are unique, up to an additive constant, which is fixed by the additional requirement that +:<math>f(x) + f^\star(p) = x\,p ~.</math> +The symmetry of this expression underscores that the Legendre transformation is its own inverse (involutive). + +In practical terms, given {{math| ''f''(''x'')}}, the parametric plot of {{math| ''xf'''(''x'') −''f''(''x'')}} versus {{math| ''f'' '(''x'')}} amounts to the graph of {{math| ''g''(''p'')}} versus {{mvar|p}}. + +In some cases (e.g. thermodynamic potentials, below), a non-standard requirement is used, amounting to an alternative definition of {{math|''f'' <sup>*</sup>}} with a ''minus sign'', +:<math>f(x) - f^\star(p) = x\,p~.</math> + +==Behavior of differentials under Legendre transforms== +The Legendre transform is linked to [[integration by parts]], {{math|''p dx'' {{=}} ''d(px) − x dp''}}. 
+ +Let {{mvar|f}} be a function of two independent variables {{mvar|x}} and {{mvar|y}}, with the differential +:<math>df = {\partial f \over \partial x}dx + {\partial f \over \partial y}dy = pdx + vdy</math> . + +Assume that it is convex in {{mvar|x}} for all {{mvar|y}}, so that one may perform the Legendre transform in {{mvar|x}}, with {{mvar|p}} the variable conjugate to {{mvar|x}}. Since the new independent variable is +{{mvar|p}}, the differentials {{math|''dx''}} and {{math|''dy''}} devolve to {{math|''dp''}} and {{math|''dy''}}, i.e., we build another function with its differential expressed in terms of the new basis {{math|''dp''}} and {{math|''dy''}}. We thus consider the function {{math|''g(p, y)'' {{=}} ''f'' − ''px''}} so that +:<math>dg = df - pdx - xdp = pdx + vdy - pdx - xdp = -xdp + vdy</math> +:<math>x = -{\partial g \over \partial p}</math> +:<math>v = {\partial g \over \partial y} ~.</math> + +The function {{math|''g(p, y)''}} is the Legendre transform of {{math|''f(x,y)''}}, where only the independent variable {{mvar|x}} has been supplanted by {{mvar|p}}. This is widely used in thermodynamics, as illustrated below. + +==Applications== + +=== Hamilton-Lagrange mechanics === + +A Legendre transform is used in [[classical mechanics]] to derive the [[Hamiltonian mechanics|Hamiltonian formulation]] from the [[Lagrangian mechanics|Lagrangian formulation]], and conversely. A typical Lagrangian has the form +:<math>L(v,q)=\tfrac{1}2\langle v,Mv\rangle-V(q)</math>, +where {{math|(''v,q'')}} are coordinates on ℝ<sup>n</sup>×ℝ<sup>n</sup>, {{mvar|M}} is a positive real matrix, and +<math>\langle x,y\rangle=\sum_jx_jy_j</math>. For every {{mvar|q}} fixed, {{math|''L(v,q)''}} is a convex function of {{mvar|v}}, +while {{math| −''V(q)''}} plays the role of a constant. + +Hence the Legendre transform of {{math|''L(v,q)''}} as a function of {{mvar|v}} is the Hamiltonian function, +:<math>H(p,q)=\tfrac 12\langle p,M^{-1}p\rangle+V(q)</math>. + +In a more general setting, {{math|(''v,q'')}} are local coordinates on the tangent bundle <math>T\mathcal M</math> of a manifold <math>\mathcal M</math>. For each {{mvar|q}}, {{math|''L(v,q)''}} is a convex function of the tangent space {{math|''V''<sub>''q''</sub>}}. The Legendre transform +gives the Hamiltonian {{math|''H(p,q)''}} as a function of the coordinates {{math|''(p,q)''}} of the cotangent bundle <math>T^*\mathcal M</math>; +the inner product used to define the Legendre transform is inherited from the pertinent canonical [[symplectic vector space|symplectic structure]]. + +=== Thermodynamics === +The strategy behind the use of Legendre transforms in thermodynamics is to shift from a function that depends on a variable to a new (conjugate) function that depends on a new variable, the conjugate of the original one. The new variable is the partial derivative of the original function with respect to the original variable. The new function is the difference between the original function and the product of the old and new variables. Typically, this transformation is useful because it shifts the dependence of, e.g., the energy from an [[Intensive and extensive properties|extensive variable]] to its conjugate intensive variable, which can usually be controlled more easily in a physical experiment. 
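+
+This recipe (take the new variable to be {{math|''p'' {{=}} ∂''f''/∂''x''}} and the new function to be {{math|''g'' {{=}} ''f'' − ''px''}}) can be verified symbolically on a toy function. The following sketch is only an illustration: it assumes Python with SymPy, and the function {{math|''f''(''x'',''y'') {{=}} ''x''<sup>2</sup>''e''<sup>''y''</sup>}} is an arbitrary choice that is convex in {{mvar|x}} for every {{mvar|y}}:
+
+<syntaxhighlight lang="python">
+import sympy as sp
+
+x, y, p = sp.symbols('x y p', real=True)
+
+f = x**2 * sp.exp(y)                       # toy function, convex in x for every y
+p_of_x = sp.diff(f, x)                     # conjugate variable p = df/dx
+x_of_p = sp.solve(sp.Eq(p, p_of_x), x)[0]  # invert the relation: x = x(p, y)
+
+g = sp.simplify((f - p_of_x * x).subs(x, x_of_p))   # g(p, y) = f - p x
+
+# Check the differential relations dg = -x dp + v dy stated above:
+print(sp.simplify(sp.diff(g, p) + x_of_p))                          # dg/dp = -x, so this prints 0
+print(sp.simplify(sp.diff(g, y) - sp.diff(f, y).subs(x, x_of_p)))   # dg/dy = v = df/dy, prints 0
+</syntaxhighlight>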
+ +For example, the [[internal energy]] is an explicit function of the ''[[extensive quantity|extensive variables]]'' [[entropy]], [[volume]], and [[chemical composition]] + +:<math> U = U(S,V,\{N_i\})\,</math>, +which has a total differential +:<math> dU = TdS - PdV + \sum \mu _i dN _i</math>. + +By using the (non standard) Legendre transform of the internal energy, {{mvar|U}}, with respect to volume, {{mvar|V}}, it is possible to define the [[enthalpy]] as +:<math> H = U + PV \, = H(S,P,\{N_i\})\,</math>, +which is an explicit function of the pressure, ''P''. The enthalpy contains all of the same information as the internal energy, but is often easier to work with in situations where the pressure is constant. + +It is likewise possible to shift the dependence of the energy from the extensive variable of entropy, {{mvar|S}}, to the (often more convenient) intensive variable {{mvar|T}}, resulting in the [[Helmholtz energy|Helmholtz]] and [[Gibbs energy|Gibbs]] [[thermodynamic free energy|free energies]]. The Helmholtz free energy, {{mvar|A}}, and Gibbs energy, {{mvar|G}}, are obtained by performing Legendre transforms of the internal energy and enthalpy, respectively, +:<math> A = U - TS ~,</math> + +:<math> G = H - TS = U + PV - TS ~.</math> +The Helmholtz free energy is often the most useful thermodynamic potential when temperature and volume are held constant, while the Gibbs energy is often the most useful when temperature and pressure are held constant. + +=== An example &ndash; variable capacitor === +As another example from [[physics]], consider a parallel-plate [[capacitor]], in which the plates can move relative to one another. Such a capacitor would allow transfer of the electric energy which is stored in the capacitor into external mechanical work, done by the [[force]] acting on the plates. One may think of the electric charge as analogous to the "charge" of a [[gas]] in a [[cylinder (engine)|cylinder]], with the resulting mechanical [[force]] exerted on a [[piston]]. + +Compute the force on the plates as a function of '''x''', the distance which separates them. To find the force, +compute the potential energy, and then apply the definition of force as the gradient of the potential energy function. + +The energy stored in a capacitor of [[capacitance]] ''C''('''x''') and charge ''Q'' is +:<math> U (Q, \mathbf{x} ) = \begin{matrix} \frac{1}{2} \end{matrix} QV = \begin{matrix} \frac{1}{2} \end{matrix} \frac{Q^2}{C(\mathbf{x})}~</math> , +where the dependence on the area of the plates, the dielectric constant of the material between the plates, and the separation '''x''' are abstracted away as the [[capacitance]] ''C''('''x'''). (For a parallel plate capacitor, this is proportional to the area of the plates and inversely proportional to the separation.) + +The force '''F''' between the plates due to the electric field is then +:<math> \mathbf{F}(\mathbf{x}) = -\frac{dU}{d\mathbf{x}} ~. </math> + +If the capacitor is not connected to any circuit, then the ''[[electric charge|charges]]'' on the plates remain constant as they move, and the force is the negative [[gradient]] of the [[electrostatics|electrostatic]] energy +:<math> \mathbf{F}(\mathbf{x}) = \begin{matrix} \frac{1}{2} \end{matrix} \frac{dC}{d\mathbf{x}} \frac{Q^2}{C^2}~. 
</math> + +However, suppose, instead, that the ''[[volt]]age'' between the plates ''V'' is maintained constant by connection to a [[battery (electricity)|battery]], which is a reservoir for charge at constant potential difference; now the ''charge is variable'' instead of the voltage, its Legendre conjugate. +To find the force, first compute the non-standard Legendre transform, +:<math> U^* = U - QV = \begin{matrix} \frac{1}{2} \end{matrix}QV - QV = -\begin{matrix} \frac{1}{2} \end{matrix} QV= - \tfrac{1}{2} V^2 {C(\mathbf{x})} \,.</math> + +The force now becomes the negative gradient of this Legendre transform, still pointing in the same direction, +:<math> \mathbf{F}(\mathbf{x}) = -\frac{dU^*}{d\mathbf{x}}~.</math> + +The two conjugate energies happen to stand opposite to each other, only because of the [[linear]]ity of the [[capacitance]]—except now ''Q'' is no longer a constant. They reflect the two different pathways of storing energy into the capacitor, resulting in, for instance, the same "pull" between a capacitor's plates. + +=== Probability theory === +In [[large deviations theory]], the ''rate function'' is defined as the Legendre transformation of the logarithm of the [[moment generating function]] of a random variable. An important application of the rate function is in the calculation of tail probabilities of sums of i.i.d. random variables. +<!--***** I remove since it is a repetition **** +==Examples== +[[Image:LegendreExample.svg|right|thumb|200px|e<sup>''x''</sup> is plotted in red and its Legendre transform in dashed blue.]] +The [[exponential function]] + +<math> f(x) = e^x </math> has <math> f^\star(p) = p ( \ln p - 1 ) </math> + +as a Legendre transform since their respective first derivatives e<sup>''x''</sup> and &nbsp;ln&nbsp;''x'' are inverse to each other. This example shows that the respective [[domain (mathematics)|domain]]s of a function and its Legendre transform need not agree. As another easy example, for + +<math> f(x) = x^2, </math> the Legendre transform is <math> f^\star(p) = \frac{p^2}{4}. </math> + +Similarly, the [[quadratic form]] + +:<math> f(x) = \begin{matrix} \frac{1}{2} \end{matrix} \, x^T \, A \, x </math> + +with ''A'' a [[symmetric matrix|symmetric]] [[invertible matrix|invertible]] ''n''-by-''n''-[[Matrix (mathematics)|matrix]] has + +:<math> f^\star(p) = \begin{matrix} \frac{1}{2} \end{matrix} \, p^T \, A^{-1} \, p </math> + +as a Legendre transform. + +==Legendre transformation in one dimension== +In one dimension, a Legendre transform to a function <math>f: \R \rightarrow \R</math> with an invertible first derivative may be found using the formula + +:<math> f^\star(y) = y \, x - f(x), \, x = \dot{f}^{-1}(y). </math> + +This can be seen by integrating both sides of the defining condition restricted to one-dimension + +:<math> \dot{f}(x) = \dot{f}^{\star-1}(x) </math> + +from <math>x_0</math> to <math>x_1</math>, making use of the [[fundamental theorem of calculus]] on the left hand side and [[Substitution rule|substituting]] + +:<math> y = \dot{f}^{\star-1}(x) </math> + +on the right hand side to find + +:<math> f(x_1) - f(x_0) = \int_{y_0}^{y_1} y \, \ddot{f}^\star(y) \, dy </math> + +with <math>f^\star(y_0)=x_0, f^\star(y_1)=x_1</math>. Using [[integration by parts]] the last integral simplifies to + +:<math> y_1 \, \dot{f}^\star(y_1) - y_0 \, \dot{f}^\star(y_0) - \int_{y_0}^{y_1} \dot{f}^\star(y) \, dy += y_1 \, x_1 - y_0 \, x_0 - f^\star(y_1) + f^\star(y_0). 
</math> + +Therefore, + +:<math> f(x_1) + f^\star(y_1) - y_1 \, x_1 = f(x_0) + f^\star(y_0) - y_0 \, x_0. </math> + +Since the left hand side of this equation does only depend on <math>x_1</math> and the right hand side only on <math>x_0</math>, they have to evaluate to the same constant. + +:<math> f(x) + f^\star(y) - y \, x = C,\, x = \dot{f}^\star(y) = \dot{f}^{-1}(y). </math> + +Solving for <math>f^\star</math> and choosing <math>C</math> to be zero results in the above-mentioned formula. +--> + +==Geometric interpretation== +For a [[strictly convex function]], the Legendre transformation can be interpreted as a mapping between the [[graph of a function|graph]] of the function and the family of [[tangent]]s of the graph. (For a function of one variable, the tangents are well-defined at all but at most [[countable set|countably many]] points, since a convex function is [[derivative|differentiable]] at all but at most countably many points.) + +The equation of a line with [[slope]] ''p'' and [[y-intercept]] ''b'' is given by +:<math>y = px + b~.</math> + +For this line to be tangent to the graph of a function ''f'' at the point (''x''<sub>0</sub>, ''f''(''x''<sub>0</sub>)) requires +:<math>f\left(x_0\right) = p x_0 + b</math> +and +:<math>p = \dot{f}\left(x_0\right)</math> . +''f''' is strictly monotone as the derivative of a strictly convex function. The second equation can be solved for ''x''<sub>0</sub>, allowing elimination of ''x''<sub>0</sub> from the first, giving the ''y''-intercept ''b'' of the tangent as a function of its slope ''p'', +:<math> +b = f\left(\dot{f}^{-1}\left(p\right)\right) - p \cdot \dot{f}^{-1}\left(p\right) = -f^\star(p). +</math> +Here, ''f*'' denotes the Legendre transform of ''f''. + +The [[indexed family|family]] of tangents of the graph of ''f'' parameterized by ''p'' is therefore given by +:<math>y = px - f^\star(p)</math> , +or, written implicitly, by the solutions of the equation +:<math>F(x,y,p) = y + f^\star(p) - px = 0~.</math> + +The graph of the original function can be reconstructed from this family of lines as the [[envelope (mathematics)|envelope]] of this family by demanding +: <math>{\partial F(x,y,p)\over\partial p} = \dot{f}^\star(p) - x = 0~.</math> + +Eliminating ''p'' from these two equations gives +: <math>y = x \cdot \dot{f}^{\star-1}(x) - f^\star\left(\dot{f}^{\star-1}(x)\right).</math> + +Identifying ''y'' with ''f''(''x'') and recognizing the right side of the preceding equation as the Legendre transform of ''f*'', yields +: <math>f(x) = f^{\star\star}(x) ~.</math> + +==Legendre transformation in more than one dimension== + +For a differentiable real-valued function on an [[open set|open]] subset ''U'' of '''R'''<sup>''n''</sup> the Legendre conjugate of the pair (''U'', ''f'') is defined to be the pair (''V'', ''g''), where ''V'' is the image of ''U'' under the [[gradient]] mapping D''f'', and ''g'' is the function on ''V'' given by the formula + +:<math> +g(y) = \left\langle y, x \right\rangle - f\left(x\right), \, +x = \left(Df\right)^{-1}(y) +</math> + +where + +:<math>\left\langle u,v\right\rangle = \sum_{k=1}^{n}u_{k} \cdot v_{k}</math> + +is the [[scalar product]] on '''R'''<sup>''n''</sup>. 
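+
+For the quadratic function of Example 4, this gradient-map construction can be evaluated numerically and compared with the closed form {{math|''f'' *(''p'') {{=}} ¼⟨''p'',''A''<sup>−1</sup>''p''⟩ − ''c''}}. The sketch below is illustrative only; Python with NumPy and the particular matrix {{mvar|A}}, offset {{mvar|c}} and point {{mvar|y}} are arbitrary choices:
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+# Symmetric positive definite A and offset c, as in Example 4 (toy values)
+A = np.array([[2.0, 0.5],
+              [0.5, 1.0]])
+c = 3.0
+
+def f(x):
+    return x @ A @ x + c
+
+def conjugate(y):
+    """g(y) = <y, x> - f(x) with x = (Df)^{-1}(y); here Df(x) = 2 A x."""
+    x = np.linalg.solve(2 * A, y)   # invert the gradient mapping
+    return y @ x - f(x)
+
+y = np.array([1.0, -2.0])
+print(conjugate(y))                          # value via the gradient-map definition
+print(0.25 * y @ np.linalg.solve(A, y) - c)  # closed form (1/4)<y, A^{-1}y> - c, same value
+</syntaxhighlight>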
The multidimensional transform can be interpreted as an encoding of the [[convex hull]] of the function's [[epigraph (mathematics)|epigraph]] in terms of its [[supporting hyperplane]]s.[http://maze5.net/?page_id=733] + +Alternatively, if ''X'' is a [[real vector space]] and ''Y'' is its [[dual space|dual vector space]], then for each point ''x'' of ''X'' and ''y'' of ''Y'', there is a natural identification of the [[cotangent space]]s T*''X''<sub>''x''</sub> with ''Y'' and T*''Y''<sub>''y''</sub> with ''X''. If ''f'' is a real differentiable function over ''X'', then ∇''f'' is a section of the [[cotangent bundle]] T*''X'' and as such, we can construct a map from ''X'' to ''Y''. Similarly, if ''g'' is a real differentiable function over ''Y'', ∇''g'' defines a map from ''Y'' to ''X''. If both maps happen to be inverses of each other, we say we have a Legendre transform. +<!-- section on convex conjugation moved to own page --> + +When the function is not differentiable, the Legendre transform can still be extended, and is known as the [[Legendre-Fenchel transformation]]. In this more general setting, a few properties are lost: for example, the Legendre transform is no longer its own inverse (unless there are extra assumptions, like [[convex function|convexity]]). + +==Further properties== + +===Scaling properties=== +The Legendre transformation has the following scaling properties: For ''a>0'', + +:<math> +f(x) = a \cdot g(x) +\Rightarrow +f^\star(p) = a \cdot g^\star\left(\frac{p}{a}\right) +</math> + +:<math> +f(x) = g(a \cdot x) +\Rightarrow +f^\star(p) = g^\star\left(\frac{p}{a}\right). +</math> + +It follows that if a function is [[homogeneous function|homogeneous of degree ''r'']] then its image under the Legendre transformation is a homogeneous function of degree ''s'', where {{math|1/''r''+1/''s'' {{=}} 1}}. (For {{math|''f(x)'' {{=}} ''x''<sup>''r''</sup>/''r''}}, where {{math|''r'' > 1}}, &nbsp; ⇒ {{math|''f*(p)'' {{=}} ''p''<sup>''s''</sup>/''s''}}.) Thus, the only monomial whose degree is invariant under Legendre transform is the quadratic. + +===Behavior under translation=== + +:<math> +f(x) = g(x) + b +\Rightarrow +f^\star(p) = g^\star(p) - b +</math> + +:<math> +f(x) = g(x + y) +\Rightarrow +f^\star(p) = g^\star(p) - p \cdot y +</math> + +===Behavior under inversion=== +:<math> +f(x) = g^{-1}(x) +\Rightarrow +f^\star(p) = - p \cdot g^\star\left(\frac{1}{p}\right) +</math> + +===Behavior under linear transformations=== +Let ''A'' be a [[linear transformation]] from '''R'''<sup>''n''</sup> to '''R'''<sup>''m''</sup>. For any convex function ''f'' on '''R'''<sup>''n''</sup>, one has +:<math> \left(A f\right)^\star = f^\star A^\star </math> + +where ''A*'' is the [[adjoint operator]] of ''A'' defined by +:<math> \left \langle Ax, y^\star \right \rangle = \left \langle x, A^\star y^\star \right \rangle, </math> +and <math>A f</math> is the ''push-forward'' of <math>f</math> along <math>A</math> +:<math> (A f)(y) = \inf\{ f(x) : x \in X , A x = y \}. </math> + +A closed convex function ''f'' is symmetric with respect to a given set ''G'' of [[orthogonal matrix|orthogonal linear transformation]]s, +:<math>f\left(A x\right) = f(x), \; \forall x, \; \forall A \in G </math> +[[if and only if]] ''f*'' is symmetric with respect to ''G''. + +===Infimal convolution=== +The '''infimal convolution''' of two functions ''f'' and ''g'' is defined as +:<math> \left(f \star_\inf g\right)(x) = \inf \left \{ f(x-y) + g(y) \, | \, y \in \mathbb{R}^n \right \}. 
</math> + +Let ''f''<sub>1</sub>, …, ''f''<sub>m</sub> be proper convex functions on '''R'''<sup>''n''</sup>. Then +:<math> \left( f_1 \star_\inf \cdots \star_\inf f_m \right)^\star = f_1^\star + \cdots + f_m^\star. </math> + +===Fenchel's inequality === +For any function {{mvar|f}} and its convex conjugate {{math|''f''*}} ''Fenchel's inequality'' (also known as the ''Fenchel–Young inequality'') holds for every {{math|''x'' ∈ ''X''}} and {{math|''p'' ∈ ''X''*}}, i.e., ''independent'' {{math|''x,p''}} pairs, +:<math> +\left\langle p,x \right\rangle \le f(x) + f^\star(p). +</math> + +==See also== +* [[Dual curve]] +* [[Projective duality]] +* [[Young's inequality]] +* [[Convex conjugate]] +* [[Moreau's theorem]] +* [[Integration by parts]] +* [[Fenchel's duality theorem]] + +== References == +{{reflist}} +* {{cite book | last1=Courant |first1=Richard |authorlink1=Richard Courant |last2=Hilbert |first2=David |authorlink2=David Hilbert | title=Methods of Mathematical Physics |volume=2 |year=2008 | publisher=John Wiley & Sons |isbn=0471504394}} +* {{cite book | last=Arnol'd |first=Vladimir Igorevich |authorlink=Vladimir Igorevich Arnol'd | title=Mathematical Methods of Classical Mechanics |edition=2nd | publisher=Springer | year=1989 | isbn=0-387-96890-3}} +* Fenchel, W. (1949). "On conjugate convex functions", ''Canad. J. Math'' '''1''': 73-77. +* {{cite book | last=Rockafellar |first=R. Tyrrell | authorlink=R. Tyrrell Rockafellar |title=Convex Analysis |publisher=Princeton University Press |year=1996 |origyear=1970 |isbn=0-691-01586-4}} +* {{cite doi|10.1119/1.3119512|noedit}} + +==Further reading== +*{{cite web +|url = http://www.maths.qmw.ac.uk/~ht/archive/lfth2.pdf +|title = Legendre-Fenchel transforms in a nutshell +|accessdate = 2013-10-13 +|last = Touchette +|first = Hugo +|date = 2005-07-27 +|format = PDF +}} +*{{cite web +|url = http://www.maths.qmul.ac.uk/~ht/archive/convex1.pdf +|title = Elements of convex analysis +|accessdate = 2013-10-13 +|last = Touchette +|first = Hugo +|date = 2006-11-21 +|format = PDF +}} + +==External links== +*[http://maze5.net/?page_id=733 Legendre transform with figures] at onmyphd.com +*[http://www.onmyphd.com/?p=legendre.fenchel.transform Legendre and Legendre-Fenchel transforms in a step-by-step explanation] at maze5.net + +[[Category:Transforms]] +[[Category:Duality theories]] +[[Category:Concepts in physics]] +[[Category:Convex analysis]] + exex7vmz3kdik9x0yit8ertm8ambiyp + + + + Cooperative diversity + 0 + 17689 + + 17690 + 2013-11-07T14:48:25Z + + 24.1.195.143 + + /* Applications */ + wikitext + text/x-wiki + '''Cooperative diversity''' is a cooperative multiple antenna technique for improving or maximising total network [[channel capacity|channel capacities]] for any given set of bandwidths which exploits user [[Diversity scheme|diversity]] by decoding the combined signal of the relayed signal and the direct signal in wireless multihop networks. A conventional single hop system uses direct transmission where a receiver decodes the information only based on the direct signal while regarding the relayed signal as interference, whereas the cooperative diversity considers the other signal as contribution. That is, cooperative diversity decodes the information from the combination of two signals. Hence, it can be seen that cooperative diversity is an [[antenna diversity]] that uses distributed antennas belonging to each node in a wireless network. Note that user cooperation is another definition of cooperative diversity. 
''User cooperation'' refers to the additional fact that each user relays the other user's signal, while cooperative diversity can also be achieved by multi-hop relay networking systems. + +== Relaying Strategies == +The simplest cooperative relaying network consists of three nodes, namely the source, the destination, and a third node, denoted the relay, which supports the direct communication between source and destination. If the direct transmission of a message from source to destination is not (fully) successful, the overheard information from the source is forwarded by the relay to reach the destination via a different path. Since the two transmissions take different paths and take place one after another, this example implements the concepts of '''space diversity''' and '''time diversity'''.<ref>{{Cite journal|doi=10.1007/s00502-008-0571-7|title=Building blocks of cooperative relaying in wireless systems|url=https://mobile.aau.at/~welmenre/papers/elmenreich-2008-building-blocks.pdf|author=W. Elmenreich, N. Marchenko, H. Adam, C. Hofbauer, G. Brandner, C. Bettstetter, and M. Huemer|journal=e & i, Springer|pages=353–359|volume=125|issue=10|date=2008}}</ref> + +The relaying strategies can be further distinguished into the amplify-and-forward, decode-and-forward, and compress-and-forward strategies: +* The '''amplify-and-forward''' strategy allows the relay station to amplify the received signal from the source node and to forward it to the destination station. +* Relays following the '''decode-and-forward''' strategy overhear transmissions from the source, decode them and, in case of correct decoding, forward them to the destination. Whenever unrecoverable errors reside in the overheard transmission, the relay cannot contribute to the cooperative transmission. +* The '''compress-and-forward''' strategy allows the relay station to compress the received signal from the source node and forward it to the destination without decoding the signal; [[Distributed_source_coding#Wyner–Ziv_coding_–_lossy_distributed_coding|Wyner-Ziv coding]] can be used for optimal compression. + +== Relay Transmission Topology == + +'''Serial relay transmission''' is used for long-distance communication and range extension in shadowed regions. It provides power gain. In this topology, signals propagate from one relay to the next, and the channels of neighboring hops are orthogonal to avoid any interference. + +'''Parallel relay transmission''' may be used where serial relay transmission suffers from multi-path [[fading]]. For outdoor and [[non-line-of-sight propagation]] scenarios, the signal wavelength may be large and the installation of multiple antennas may not be possible. To increase the robustness against multi-path fading, parallel relay transmission can be used. In this topology, signals propagate through multiple relay paths in the same hop, and the destination combines the signals received with the help of various combining schemes. It provides power gain and [[diversity gain]] simultaneously. + +== System model == +We consider a wireless relay system that consists of source, relay and destination nodes. It is assumed that the channel operates in a half-duplex, orthogonal and amplify-and-forward relaying mode. In contrast to the conventional direct transmission system, we exploit a time-division relaying function in which the system delivers information in two temporal phases. + +In the first phase, the source node broadcasts information <math>x_{s}</math> toward both the destination and the relay nodes.
The received signals at the destination and the relay nodes are respectively written as: + +:<math> +r_{d,s} = h_{d,s} x_{s} + n_{d,s} \quad +</math> +:<math> +r_{r,s} = h_{r,s} x_{s} + n_{r,s} \quad +</math> + +where <math>h_{d,s}</math> is the channel from the source node to the destination node, <math>h_{r,s}</math> is the channel from the source node to the relay node, <math>n_{r,s}</math> is the noise added at the relay node and <math>n_{d,s}</math> is the noise added at the destination node. + +In the second phase, the relay transmits its received signal to the destination node, except in the direct transmission mode. + +== Signal Decoding == +We introduce four schemes to decode the signal at the destination node: the direct scheme, the non-cooperative scheme, the cooperative scheme and the adaptive scheme. In all schemes except the direct scheme, the destination node uses the relayed signal. + +=== Direct Scheme === +In the direct scheme, the destination decodes the data using the signal received from the source node in the first phase; the second-phase transmission is omitted, so the relay node is not involved in the transmission. The signal received from the source node and used for decoding is written as: +:<math> +r_{d,s} = h_{d,s} x_{s} + n_{d,s} \quad +</math> +While the advantage of the direct scheme is its simplicity in terms of the decoding processing, the received signal power can be severely low if the distance between the source node and the destination node is large. Thus, in the following we consider the non-cooperative scheme, which exploits signal relaying to improve the signal quality. + +=== Non-cooperative Scheme === +In the non-cooperative scheme, the destination decodes the data using the signal received from the relay in the second phase, which results in a signal power boosting gain. The signal received from the relay node, which retransmits the signal received from the source node, is written as: +:<math> +r_{d,r} = h_{d,r} r_{r,s} + n_{d,r} += h_{d,r} h_{r,s} x_{s} + h_{d,r} n_{r,s} + n_{d,r} \quad +</math> +where <math>h_{d,r}</math> is the channel from the relay node to the destination node and <math>n_{d,r}</math> is the noise added at the destination node in the second phase. + +The reliability of decoding can be low since the degree of freedom is not increased by signal relaying. There is no increase in the diversity order since this scheme exploits only the relayed signal, and the direct signal from the source node is either not available or not accounted for. When we can take advantage of such a signal, an increase in diversity order results. Thus, in the following we consider the cooperative scheme, which decodes the combined signal of both the direct and relayed signals. + +=== Cooperative Scheme === +For cooperative decoding, the destination node combines the two signals received from the source and the relay nodes, which results in a diversity advantage. The whole received signal vector at the destination node can be modeled as: + +:<math> +\mathbf{r} = [r_{d,s} \quad r_{d,r}]^T + = [h_{d,s} \quad h_{d,r} h_{r,s}]^T x_{s} + \left[1 \quad \sqrt{|h_{d,r}|^2+1} \right]^T n_{d} + = \mathbf{h} x_{s} + \mathbf{q} n_{d} +</math> + +where <math>r_{d,s}</math> and <math>r_{d,r}</math> are the signals received at the destination node from the source and relay nodes, respectively.
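+
+To make the benefit of keeping both observations concrete, the following Monte-Carlo sketch evaluates the per-branch signal-to-noise ratios implied by the stacked model above, and prints their sum, which is what a maximum-ratio-style choice of the linear combining weight discussed next could deliver. It is only an illustration: Python with NumPy, Rayleigh-fading channels, unit noise variance and unit transmit power are assumptions made here, not part of the model.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+rng = np.random.default_rng(0)
+trials = 100_000
+P = 1.0  # transmit power of x_s (assumed)
+
+def rayleigh(n):
+    # unit-variance complex Gaussian channel coefficients (Rayleigh fading, assumed)
+    return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
+
+h_ds, h_rs, h_dr = rayleigh(trials), rayleigh(trials), rayleigh(trials)
+
+# Per-branch SNRs implied by r = h*x_s + q*n_d with unit-variance noise n_d:
+snr_direct = np.abs(h_ds) ** 2 * P                                    # first entry of h, q_1 = 1
+snr_relay = np.abs(h_dr * h_rs) ** 2 * P / (np.abs(h_dr) ** 2 + 1.0)  # second entry, q_2 = sqrt(|h_dr|^2 + 1)
+
+print("average direct-branch SNR :", snr_direct.mean())
+print("average relayed-branch SNR:", snr_relay.mean())
+print("average sum of branch SNRs:", (snr_direct + snr_relay).mean())
+</syntaxhighlight>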
As a linear decoding technique, the destination combines elements of the received signal vector as follows: +:<math> +y = \mathbf{w}^H \mathbf{r} +</math> +where <math>\mathbf{w}</math> is the linear combining weight, which can be chosen to maximize the signal-to-noise ratio (SNR) of the combined signal subject to the allowed complexity of the weight calculation. + +=== Adaptive Scheme === +The adaptive scheme selects one of the three modes described above (the direct, the non-cooperative, and the cooperative schemes) depending on the network [[channel state information]] and other network parameters. + +== Trade-off == +It is noteworthy that cooperative diversity can increase the diversity gain at the cost of spending wireless resources such as frequency, time and power on the relaying phase. Wireless resources are spent because the relay node uses them to relay the signal from the source to the destination node. Hence, it is important to remark that there is a trade-off between the diversity gain and the waste of spectrum resources in cooperative diversity. + +== Channel Capacity of Cooperative Diversity == +In June 2005, A. Høst-Madsen published a paper analyzing in depth the [[channel capacity]] of the cooperative relay network.<ref>{{Cite journal|doi=10.1109/TIT.2005.847703|title=Capacity bounds and power allocation for the wireless relay channel|url=http://www.it.iitb.ac.in/~subbu/pdf_ps/relay_channel1.pdf|author=A. Høst-Madsen and J. Zhang|journal=IEEE Trans. Inform. Theory|pages=2020–2040|volume=51|issue=6|date=June, 2005}}</ref> + +We assume that the channels from the source node to the relay node, from the source node to the destination node, and from the relay node to the destination node are <math>c_{21} e^{j\varphi_{21}},c_{31} e^{j\varphi_{31}},c_{32} e^{j\varphi_{32}}</math> +where the source node, the relay node, and the destination node are denoted node 1, node 2, and node 3, respectively. + +===The capacity of cooperative relay channels=== +Using the [[max-flow min-cut theorem]] yields the upper bound of full-duplex relaying +:<math> +C^+ = \max_{f(X_1,X_2)} \min \{ I(X_1;Y_2,Y_3|X_2), I(X_1,X_2;Y_3)\} +</math> +where <math>X_1</math> and <math>X_2</math> are the transmitted information at the source node and the relay node, respectively, and <math>Y_2</math> and <math>Y_3</math> are the received information at the relay node and the destination node, respectively. Note that the max-flow min-cut theorem states that the maximum amount of flow is equal to the capacity of a minimum cut, i.e., it is dictated by the bottleneck. The capacity of the broadcast channel from <math>X_1</math> to <math>Y_2</math> and <math>Y_3</math> with given <math>X_2</math> is +:<math> +\max_{f(X_1,X_2)} I(X_1;Y_2,Y_3|X_2) = \frac{1}{2} \log(1 + (1 - \beta) (c^2_{21} + c^2_{31})P_1 ) +</math> +while the capacity of the multiple access channel from <math>X_1</math> and <math>X_2</math> to <math>Y_3</math> is +:<math> +\max_{f(X_1,X_2)} I(X_1,X_2;Y_3) = \frac{1}{2} \log(1 + c^2_{31} P_1 + c^2_{32} P_2 + 2 \sqrt{ \beta c^2_{31} c^2_{32} P_1 P_2}) +</math> +where <math>\beta</math> is the amount of correlation between <math>X_1</math> and <math>X_2</math>. Note that <math>X_2</math> copies some part of <math>X_1</math> for cooperative relaying capability. Using the cooperative relaying capability at the relay node improves the performance of reception at the destination node.
+ +Thus, the upper bound is rewritten as +:<math> +C^+ = \max_{0 \leq \beta \leq 1} \min \left\{ \frac{1}{2} \log(1 + (1 - \beta) (c^2_{21} + c^2_{31}) P_1), \frac{1}{2} \log(1 + c^2_{31} P_1 + c^2_{32} P_2 + 2 \sqrt{ \beta c^2_{31} c^2_{32} P_1 P_2}) \right\} +</math> + +===Achievable rate of a decode-and-forward relay=== +Using a relay which decodes and forwards its captured signal yields the achievable rate as follows: +:<math> +R_1 = \max_{f(X_1,X_2)} \min \{ I(X_1;Y_2|X_2), I(X_1,X_2;Y_3)\} +</math> +where the broadcast channel is reduced to the point-to-point channel because of decoding at the relay node, i.e., <math>I(X_1;Y_2,Y_3|X_2)</math> is reduced to <math>I(X_1;Y_2|X_2)</math>. The capacity of the reduced broadcast channel is +:<math> +\max_{f(X_1,X_2)} I(X_1;Y_2|X_2) = \frac{1}{2} \log(1 + (1 - \beta) c^2_{21} P_1 ). +</math> +Thus, the achievable rate is rewritten as +:<math> +R_1 = \max_{0 \leq \beta \leq 1} \min \left\{ \frac{1}{2} \log(1 + (1 - \beta) c^2_{21} P_1), \frac{1}{2} \log(1 + c^2_{31} P_1 + c^2_{32} P_2 + 2 \sqrt{ \beta c^2_{31} c^2_{32} P_1 P_2}) \right\} +</math> + +===Time-Division Relaying=== +The capacity of the TD relay channel is upper-bounded by +:<math> +C^+ = \max_{0 \leq \beta \leq 1} \min \{ C_1^+(\beta), C_2^+(\beta) \} +</math> +with +:<math> +C_1^+(\beta) = \frac{\alpha}{2} \log \left( 1 + (c_{31}^2 + c_{21}^2) P_1^{(1)} \right) + + \frac{1-\alpha}{2} \log \left( 1 + (1-\beta) c_{31}^2 P_1^{(2)} \right) +</math> +:<math> +C_2^+(\beta) = \frac{\alpha}{2} \log \left( 1 + c_{31}^2 P_1^{(1)} \right) + + \frac{1-\alpha}{2} \log \left( 1 + c_{31}^2 P_1^{(2)} + c_{32}^2 P_2 + 2 \sqrt{ \beta c_{31}^2 P_1^{(2)} c_{32}^2 P_2} \right) +</math> + +== Applications == + +In a [[cognitive radio]] system, unlicensed secondary users can use the resources that are licensed to primary users. When primary users want to use their licensed resources, secondary users have to vacate these resources. Hence secondary users have to constantly sense the channel to detect the presence of primary users. It is very challenging to sense the activity of spatially distributed primary users in a wireless channel. Spatially distributed nodes can improve the channel-sensing reliability by sharing the information and reduce the probability of false alarms. + +A [[wireless ad hoc network]] is an autonomous and self-organizing network without any centralized controller or pre-established infrastructure. In this network, randomly distributed nodes form a temporary functional network and support seamless leaving or joining of nodes. Such networks have been successfully deployed for military communication and have a lot of potential for civilian applications, including commercial and educational use, disaster management, road vehicle networks, etc.<ref>M. Eriksson, A. Mahmud, [http://apachepersonal.miun.se/myresearch/cit2010_eriksson_mahmud_DSFN_in_Wireless_Multihop_Networks.pdf “Dynamic Single Frequency Networks in Wireless Multihop Networks - Energy aware routing algorithms with performance analysis”], 2010 IEEE International Conference on Computer and Information Technology, CIT’10, Bradford, UK, June 2010.</ref> + +A [[wireless sensor network]] can use cooperative relaying to reduce the energy consumption in sensor nodes, hence the lifetime of the sensor network increases. Due to the nature of the wireless medium, communication through weaker channels requires much more energy than communication through relatively stronger channels.
Careful incorporation of relay cooperation into routing process can selects better communication links and precious battery power can be saved. + +==References== +{{reflist|1}} + +== See also == +=== Systems === +* 3GPP [[long term evolution]] (LTE) ''coordinated multipoint transmission/reception'' (CoMP), making it possible to increase the data rate to and from a mobile situated in the overlap of several base stations. +* [[5G]] +* [[Mesh network]] +* [[Mobile ad hoc network]] (MANet) +* [[Wireless mesh network]] +* [[Wireless ad hoc network]] + +=== Technologies === +* [[Cooperative wireless communications]] +* [[Cooperative MIMO]] +* [[Diversity scheme]]s +* [[Dynamic Single Frequency Networks]] (DSFN) +* [[Soft handover]] +* [[Space–time code]] +* [[Multiple-input multiple-output communications| Multiple-input multiple-output communications (MIMO)]] +* [[Multi-user MIMO]] +* [[Diversity combining]] +* [[Transmit diversity]] +* [[Diversity gain]] + +=== External references === +* A. Sendonaris, E. Erkip, and B. Aazhang, “User Cooperation Diversity Part I and Part II,” IEEE Trans. Commun., vol. 51, no. 11, November 2003, pp.&nbsp;1927–48. +* J. N. Laneman, D. N. C. Tse, and G. W. Wornell, “Cooperative Diversity in Wireless Networks: Efficient Protocols and Outage Behavior,” IEEE Trans. Inform. Theory, vol. 50, no. 12, pp.&nbsp;3062–3080, December 2004. +* [http://www.wireless-world-research.org/fileadmin/sites/default/files/about_the_forum/WG/WG3/White%20Papers/WWRF_WG3WP06-coopnw-heidelberg.pdf S. Valentin, et al., Cooperative wireless networking beyond store-and-forward: Perspectives for PHY and MAC design] +* [http://sebastien.simoens.free.fr/publis_seb/simoens_spawc_2006.pdf Sébastien Simoens, Josep Vidal, Olga Muñoz, COMPRESS-AND-FORWARD COOPERATIVE RELAYING IN MIMO-OFDM SYSTEMS] + +[[Category:Radio resource management]] + p8g54staf6aweapmbzsg6obi27qy5k0 + + + + Algebraic K-theory + 0 + 5762 + + 5763 + 2013-12-23T03:25:15Z + + TakuyaMurata + 0 + + /* See also */ Fundamental theorem of algebraic K-theory + wikitext + text/x-wiki + In [[mathematics]], '''algebraic K-theory''' is an important part of [[homological algebra]] concerned with defining and applying a sequence +:''K<sub>n</sub>(R)'' +of [[functor]]s from [[ring (mathematics)|rings]] to [[abelian group]]s, for all nonnegative integers ''n.'' For historical reasons, the '''[[#Lower K-groups|lower K-groups]]''' ''K''<sub>0</sub> and ''K''<sub>1</sub> are thought of in somewhat different terms from the '''[[#Higher K-theory|higher algebraic K-groups]]''' ''K<sub>n</sub>'' for ''n'' ≥ 2. Indeed, the lower groups are more accessible, and have more applications, than the higher groups. The theory of the higher K-groups is noticeably deeper, and certainly much harder to compute (even when ''R'' is the ring of [[integer]]s). + +The group ''K<sub>0</sub>(R)'' generalises the construction of the [[ideal class group]] of a ring, using [[projective module]]s. Its development in the 1960s and 1970s was linked to attempts to solve a conjecture of [[Jean-Pierre Serre|Serre]] on projective modules that now is the [[Quillen-Suslin theorem]]; numerous other connections with classical algebraic problems were found in this era. Similarly, ''K<sub>1</sub>(R)'' is a modification of the group of [[unit (ring theory)|units]] in a ring, using [[elementary matrix]] theory. 
The group ''K<sub>1</sub>(R)'' is important in [[topology]], especially when ''R'' is a [[group ring]], because its quotient the [[Whitehead torsion#The Whitehead group of a group|Whitehead group]] contains the [[Whitehead torsion]] used to study problems in [[CW complex|simple homotopy theory]] and [[surgery theory]]; the group ''K<sub>0</sub>(R)'' also contains other invariants such as the finiteness invariant. Since the 1980s, algebraic ''K''-theory has increasingly had applications to [[algebraic geometry]]. For example, [[motivic cohomology]] is closely related to algebraic ''K''-theory. + +== History == +[[Alexander Grothendieck]] discovered K-theory in the mid-1950s as a framework to state his far-reaching generalization of the [[Riemann-Roch theorem]]. Within a few years, its topological counterpart was considered by [[Michael Atiyah]] and [[Friedrich Hirzebruch]] and is now known as [[topological K-theory]]. + +Applications of ''K''-groups were found from 1960 onwards in [[surgery theory]] for [[manifold]]s, in particular; and numerous other connections with classical algebraic problems were brought out. + +A little later a branch of the theory for [[operator algebra]]s was fruitfully developed, resulting in [[operator K-theory]] and [[KK-theory]]. It also became clear that ''K''-theory could play a role in [[algebraic cycle]] theory in [[algebraic geometry]] ([[Gersten's conjecture]]): here the ''higher'' K-groups become connected with the ''higher codimension'' phenomena, which are exactly those that are harder to access. The problem was that the definitions were lacking (or, too many and not obviously consistent). Using Robert Steinberg's work on universal central extensions of classical algebraic groups, [[John Milnor]] defined the group ''K<sub>2</sub>(A)'' of a ring ''A'' as the center, isomorphic to H<sub>2</sub>(E(''A''),'''Z'''), of the universal central extension of the group E(''A)'' of infinite elementary matrices over ''A''. (Definitions below.) There is a natural bilinear pairing from ''K<sub>1</sub>(A) × K<sub>1</sub>(A)'' to ''K<sub>2</sub>(A)''. In the special case of a field k, with ''K<sub>1</sub>(k)'' isomorphic to the multiplicative group GL(1,''k''), computations of Hideya Matsumoto showed that ''K<sub>2</sub>(k)'' is isomorphic to the group generated by ''K<sub>1</sub>(A) × K<sub>1</sub>(A)'' modulo an easily described set of relations. + +Eventually the foundational difficulties were resolved (leaving a deep and difficult theory) by {{harvs|txt|authorlink=Daniel Quillen|last=Quillen|year1=1973|year2=1974}}, who gave several definitions of ''K<sub>n</sub>(A)'' for arbitrary non-negative ''n'', via the [[plus construction|+-construction]] and the ''Q''-construction. + +== Lower K-groups == +The lower K-groups were discovered first, and given various ad hoc descriptions, which remain useful. Throughout, let ''A'' be a [[ring (mathematics)|ring]]. + +=== ''K<sub>0</sub>'' === +The functor ''K<sub>0</sub>'' takes a ring ''A'' to the [[Grothendieck group]] of the set of isomorphism classes of its [[finitely generated module|finitely generated]] [[projective module]]s, regarded as a monoid under direct sum. Any ring homomorphism ''A'' → ''B'' gives a map ''K<sub>0</sub>(A)'' → ''K<sub>0</sub>(B)'' by mapping (the class of) a projective ''A''-module ''M'' to ''M'' ⊗<sub>''A''</sub> ''B'', making ''K<sub>0</sub>'' a covariant functor. 
+ +If the ring ''A'' is commutative, we can define a subgroup of ''K<sub>0</sub>(A)'' as the set + +:<math>\tilde{K}_0\left(A\right) = \bigcap\limits_{\mathfrak p\text{ prime ideal of }A}\mathrm{Ker}\dim_{\mathfrak p},</math> + +where : + +:<math>\dim_{\mathfrak p}:K_0\left(A\right)\to \mathbf{Z}</math> + +is the map sending every (class of a) finitely generated projective ''A''-module ''M'' to the rank of the free <math>A_{\mathfrak p}</math>-module <math>M_{\mathfrak p}</math> (this module is indeed free, as any finitely generated projective module over a local ring is free). This subgroup <math>\tilde{K}_0\left(A\right)</math> is known as the ''reduced zeroth K-theory'' of ''A''. + +If ''B'' is a [[pseudo-ring|ring without an identity element]], we can extend the definition of K<sub>0</sub> as follows. Let ''A'' = ''B''⊕'''Z''' be the extension of ''B'' to a ring with unity obtaining by adjoining an identity element (0,1). There is a short exact sequence ''B'' → ''A'' → '''Z''' and we define K<sub>0</sub>(''B'') to be the kernel of the corresponding map K<sub>0</sub>(''A'') → K<sub>0</sub>('''Z''') = '''Z'''.<ref name=Ros30>Rosenberg (1994) p.30</ref> + +====Examples==== +* (Projective) modules over a [[field (mathematics)|field]] ''k'' are [[vector space]]s and K<sub>0</sub>(''k'') is isomorphic to '''Z''', by [[Dimension (vector space)|dimension]]. +* Finitely generated projective modules over a [[local ring]] ''A'' are free and so in this case again K<sub>0</sub>(''A'') is isomorphic to '''Z''', by [[Rank of a free module|rank]].<ref name=Mil5>Milnor (1971) p.5</ref> +* For ''A'' a [[Dedekind domain]], + +:K<sub>0</sub>(''A'') = Pic(''A'') &oplus; '''Z''', + +where Pic(''A'') is the [[Picard group]] of ''A'',<ref name=Mil14>Milnor (1971) p.14</ref> and similarly the reduced K-theory is given by + +:<math>\tilde K_0(A)=\operatorname{Pic} A.</math> + +An algebro-geometric variant of this construction is applied to the category of [[algebraic variety|algebraic varieties]]; it associates with a given algebraic variety ''X'' the Grothendieck's K-group of the category of locally free sheaves (or coherent sheaves) on ''X''. Given a [[compact topological space]] ''X'', the [[topological K-theory]] K<sup>top</sup>(''X'') of (real) [[vector bundle]]s over ''X'' coincides with ''K<sub>0</sub>'' of the ring of [[continuous function|continuous]] real-valued functions on ''X''.<ref>{{Citation | last1=Karoubi | first1=Max | title=K-Theory: an Introduction | publisher=[[Springer-Verlag]] | location=Berlin, New York | series=Classics in mathematics | isbn=978-3-540-79889-7 | year=2008}}, see Theorem I.6.18</ref> + +====Relative K<sub>0</sub>==== +Let ''I'' be an ideal of ''A'' and define the "double" to be a subring of the [[Cartesian product]] ''A''×''A'':<ref name=Ros27>Rosenberg (1994) 1.5.1, p.27</ref> + +:<math>D(A,I) = \{ (x,y) \in A \times A : x-y \in I \} \ . </math> + +The ''relative K-group'' is defined in terms of the "double"<ref name=Ros27a>Rosenberg (1994) 1.5.3, p.27</ref> + +:<math>K_0(A,I) = \ker \left({ K_0(D(A,I)) \rightarrow K_0(A) }\right) \ . </math> + +where the map is induced by projection along the first factor. + +The relative K<sub>0</sub>(''A'',''I'') is isomorphic to K<sub>0</sub>(''I''), regarding ''I'' as a ring without identity. 
The independence from ''A'' is an analogue of the [[Excision theorem]] in homology.<ref name=Ros30/> + +====''K''<sub>0</sub> as a ring==== +If ''A'' is a commutative ring, then the [[tensor product]] of projective modules is again projective, and so tensor product induces a multiplication turning K<sub>0</sub> into a commutative ring with the class [''A''] as identity.<ref name=Mil5/> The [[exterior product]] similarly induces a [[λ-ring]] structure. +The [[Picard group]] embeds as a subgroup of the group of units K<sub>0</sub>(''A'')<sup>&lowast;</sup>.<ref name=Mil15>Milnor (1971) p.15</ref> + +=== ''K''<sub>1</sub> === +[[Hyman Bass]] provided this definition, which generalizes the group of units of a ring: ''K<sub>1</sub>(A)'' is the [[abelianization]] of the [[infinite general linear group]]: + +:<math>K_1(A) = \operatorname{GL}(A)^{\mbox{ab}} = \operatorname{GL}(A) / [\operatorname{GL}(A),\operatorname{GL}(A)]</math> + +Here + +:<math>\operatorname{GL}(A) = \operatorname{colim} \operatorname{GL}(n, A)</math> + +is the [[direct limit]] of the GL(''n''), which embeds in GL(''n''+1) as the upper left [[block matrix]], and the [[commutator subgroup]] agrees with the group generated by elementary matrices ''E(A)=[GL(A), GL(A)]'', by [[Whitehead's lemma]]. Indeed, the group GL(''A'')/E(''A'') was first defined and studied by Whitehead,<ref>J.H.C. Whitehead, ''Simple homotopy types'' Amer. J. Math. , 72 (1950) pp. 1–57</ref> and is called the '''Whitehead group''' of the ring ''A''. + +==== Relative ''K''<sub>1</sub> ==== +The ''relative K-group'' is defined in terms of the "double"<ref name=Ros92>Rosenberg (1994) 2.5.1, p.92</ref> + +:<math>K_1(A,I) = \ker \left({ K_1(D(A,I)) \rightarrow K_1(A) }\right) \ . </math> + +There is a natural [[exact sequence]]<ref name=Ros95>Rosenberg (1994) 2.5.4, p.95</ref> + +:<math> K_1(A,I) \rightarrow K_1(A) \rightarrow K_1(A/I) \rightarrow K_0(A,I) \rightarrow K_0(A) \rightarrow K_0(A/I) \ . </math> + +==== Commutative rings and fields ==== +For ''A'' a [[commutative ring]], one can define a determinant det: GL(''A'') → ''A*'' to the [[group of units]] of ''A'', which vanishes on E(''A'') and thus descends to a map det: ''K<sub>1</sub>(A)'' → ''A*''. As E(''A'') ◅ SL(''A''), one can also define the '''special Whitehead group''' SK<sub>1</sub>(''A'') := SL(''A'')/E(''A''). This map splits via the map ''A*'' → GL(1, ''A'') → ''K<sub>1</sub>(A)'' (unit in the upper left corner), and hence is onto, and has the special Whitehead group as kernel, yielding the [[split short exact sequence]]: + +:<math>1 \to SK_1(A) \to K_1(A) \to A^* \to 1,</math> + +which is a quotient of the usual split short exact sequence defining the [[special linear group]], namely + +:<math>1 \to \operatorname{SL}(A) \to \operatorname{GL}(A) \to A^* \to 1.</math> + +The determinant is split by including the group of units ''A*'' = GL<sub>1</sub>''(A)'' into the general linear group GL''(A)'', so ''K<sub>1</sub>(A)'' splits as the direct sum of the group of units and the special Whitehead group: ''K<sub>1</sub>(A)'' ≅ ''A*'' ⊕ SK<sub>1</sub> (''A''). + +When ''A'' is a Euclidean domain (e.g. a field, or the integers) SK<sub>1</sub>(''A'') vanishes, and the determinant map is an isomorphism from K<sub>1</sub>(''A'') to ''A''<sup>&lowast;</sup>.<ref name=Ros74>Rosenberg (1994) Theorem 2.3.2, p.74</ref> This is ''false'' in general for PIDs, thus providing one of the rare mathematical features of Euclidean domains that do not generalize to all PIDs. 
An explicit PID such that SK<sub>1</sub> is nonzero was given by Ischebeck in 1980 and by Grayson in 1981.<ref name=Ros75>Rosenberg (1994) p.75</ref> If ''A'' is a Dedekind domain whose quotient field is an [[algebraic number field]] (a finite extension of the rationals) then {{harvtxt|Milnor|1971|loc=corollary 16.3}} shows that SK<sub>1</sub>(''A'') vanishes.<ref name=Ros81>Rosenberg (1994) p.81</ref> + +The vanishing of SK<sub>1</sub> can be interpreted as saying that K<sub>1</sub> is generated by the image of GL<sub>1</sub> in GL. When this fails, one can ask whether K<sub>1</sub> is generated by the image of GL<sub>2</sub>. For a Dedekind domain, this is the case: indeed, K<sub>1</sub> is generated by the images of GL<sub>1</sub> and SL<sub>2</sub> in GL.<ref name=Ros75/> The subgroup of SK<sub>1</sub> generated by SL<sub>2</sub> may be studied by [[Mennicke symbol]]s. For Dedekind domains with all quotients by maximal ideals finite, SK<sub>1</sub> is a torsion group.<ref name=Ros78>Rosenberg (1994) p.78</ref> + +For a non-commutative ring, the determinant cannot in general be defined, but the map GL(''A'') → ''K<sub>1</sub>(A)'' is a generalisation of the determinant. + +====Central simple algebras==== +In the case of a [[central simple algebra]] ''A'' over a field ''F'', the [[reduced norm]] provides a generalisation of the determinant giving a map K<sub>1</sub>(''A'') → ''F''<sup>&lowast;</sup> and SK<sub>1</sub>(''A'') may be defined as the kernel. '''Wang's theorem''' states that if ''A'' has prime degree then SK<sub>1</sub>(''A'') is trivial,<ref name=GS47>Gille & Szamuely (2006) p.47</ref> and this may be extended to square-free degree.<ref name=GS48>Gille & Szamuely (2006) p.48</ref> [[Shianghao Wang|Wang]] also showed that SK<sub>1</sub>(''A'') is trivial for any central simple algebra over a number field,<ref name=Wang1950>{{cite journal | zbl=0040.30302 | last=Wang | first=Shianghaw | authorlink=Shianghao Wang | title=On the commutator group of a simple algebra | journal=Am. J. Math. | volume=72 | pages=323–334 | year=1950 | issn=0002-9327 }}</ref> but Platonov has given examples of algebras of degree prime squared for which SK<sub>1</sub>(''A'') is non-trivial.<ref name=GS48/> + +=== ''K''<sub>2</sub> === +{{See also|Steinberg group (K-theory)}} +<!--Matsumoto's theorem (K-theory) links here--> +[[John Milnor]] found the right definition of ''K<sub>2</sub>'': it is the [[centre of a group|center]] of the [[Steinberg group (K-theory)|Steinberg group]] St(''A'') of ''A''. + +It can also be defined as the [[kernel (algebra)|kernel]] of the map + +:<math>\varphi\colon\operatorname{St}(A)\to\mathrm{GL}(A),</math> + +or as the [[Schur multiplier]] of the group of [[elementary matrices]]. + +For a field, K<sub>2</sub> is determined by [[Steinberg symbol]]s: this leads to Matsumoto's theorem. 
+ +One can compute that K<sub>2</sub> is zero for any finite field.<ref name=Lam139>Lam (2005) p.139</ref><ref name=Lem66>Lemmermeyer (2000) p.66</ref> The computation of K<sub>2</sub>('''Q''') is complicated: Tate proved<ref name=Lem66/><ref name=Mil101>Milnor (1971) p.101</ref> + +:<math>K_2(\mathbf{Q}) = (\mathbf{Z}/4)^* \times \prod_{p\ge 3} (\mathbf{Z}/p)^* \ </math> + +and remarked that the proof followed [[Gauss]]'s first proof of the [[Law of Quadratic Reciprocity]].<ref name=Mil102>Milnor (1971) p.102</ref><ref name=Gras205>Gras (2003) p.205</ref> + +For non-Archimedean local fields, the group K<sub>2</sub>(''F'') is the direct sum of a finite [[cyclic group]] of order ''m'', say, and a [[divisible group]] K<sub>2</sub>(''F'')<sup>''m''</sup>.<ref name=Mil175>Milnor (1971) p.175</ref> + +We have K<sub>2</sub>('''Z''') = '''Z'''/2,<ref name=Mil81>Milnor (1971) p.81</ref> and in general K<sub>2</sub> is finite for the ring of integers of a number field.<ref name=Lem385>Lemmermeyer (2000) p.385</ref> + +We further have K<sub>2</sub>('''Z'''/''n'') = '''Z'''/2 if ''n'' is divisible by 4, and otherwise zero.<ref name=Sil228>Silvester (1981) p.228</ref> + +====Matsumoto's theorem==== +'''Matsumoto's theorem''' states that for a field ''k'', the second ''K''-group is given by<ref>{{citation | mr=0240214 | last=Matsumoto | first= Hideya | title=Sur les sous-groupes arithmétiques des groupes semi-simples déployés | journal=Ann. Sci. École Norm. Sup. (4) | issue= 2 | year=1969 | pages= 1–62 +| url=http://www.numdam.org/item?id=ASENS_1969_4_2_1_1_0 | zbl=0261.20025 | language=French | issn=0012-9593 }}</ref><ref name=Ros214>Rosenberg (1994) Theorem 4.3.15, p.214</ref> + +:<math>K_2(k) = k^\times\otimes_{\mathbf Z} k^\times/\langle a\otimes(1-a)\mid a\not=0,1\rangle.</math> + +Matsumoto's original theorem is even more general: For any [[root system]], it gives a presentation for the unstable K-theory. This presentation is different from the one given here only for symplectic root systems. For non-symplectic root systems, the unstable second K-group with respect to the root system is exactly the stable K-group for GL(''A''). Unstable second K-groups (in this context) are defined by taking the kernel of the universal central extension of the [[Chevalley group]] of universal type for a given root system. This construction yields the kernel of the Steinberg extension for the root systems ''A<sub>n</sub>'' (''n''>1) and, in the limit, stable second ''K''-groups. + +====Long exact sequences==== +If ''A'' is a [[Dedekind domain]] with [[field of fractions]] ''F'' then there is a [[long exact sequence]] + +:<math> K_2F \rightarrow \oplus_{\mathbf p} K_1 A/{\mathbf p} \rightarrow K_1 A \rightarrow K_1 F \rightarrow \oplus_{\mathbf p} K_0 A/{\mathbf p} \rightarrow K_0 A \rightarrow K_0 F \rightarrow 0 \ </math> + +where '''''p''''' runs over all prime ideals of ''A''.<ref name=Mil123>Milnor (1971) p.123</ref> + +There is also an extension of the exact sequence for relative K<sub>1</sub> and K<sub>0</sub>:<ref name=Ros200>Rosenberg (1994) p.200</ref> + +:<math>K_2(A) \rightarrow K_2(A/I) \rightarrow K_1(A,I) \rightarrow K_1(A) \cdots \ . </math> + +====Pairing==== +There is a pairing on K<sub>1</sub> with values in K<sub>2</sub>. Given commuting matrices ''X'' and ''Y'' over ''A'', take elements ''x'' and ''y'' in the [[Steinberg group]] with ''X'',''Y'' as images. 
The commutator <math>x y x^{-1} y^{-1}</math> is an element of K<sub>2</sub>.<ref name=Mil63>Milnor (1971) p.63</ref> The map is not always surjective.<ref name=Mil6>Milnor (1971) p.69</ref> + +== Milnor ''K''-theory == +{{main|Milnor K-theory}} + +The above expression for ''K<sub>2</sub>'' of a field ''k'' led Milnor to the following definition of "higher" ''K''-groups by + +:<math> K^M_*(k) := T^*(k^\times)/(a\otimes (1-a)) </math>, + +thus as graded parts of a quotient of the [[tensor algebra]] of the [[multiplicative group]] ''k''<sup>×</sup> by the [[two-sided ideal]], generated by the + +:<math>\left \{a\otimes(1-a): \ a \neq 0,1 \right \}.</math> + +For ''n'' = 0,1,2 these coincide with those below, but for ''n''≧3 they differ in general.<ref>{{Harvard citations|last=Weibel|year=2005}}, cf. Lemma 1.8</ref> For example, we have ''K''{{su|b=''n''|p=''M''}}''(F<sub>q</sub>) = 0'' for ''n'' ≧2 +but ''K<sub>n</sub>F<sub>q</sub>'' is nonzero for odd ''n'' (see below). + +The tensor product on the tensor algebra induces a product <math> K_m \times K_n \rightarrow K_{m+n}</math> making <math> K^M_*(F)</math> a [[graded ring]] which is [[graded-commutative]].<ref name=GS184>Gille & Szamuely (2006) p.184</ref> + +The images of elements <math>a_1 \otimes \cdots \otimes a_n</math> in <math>K^M_n(k)</math> are termed ''symbols'', denoted <math>\{a_1,\ldots,a_n\}</math>. For integer ''m'' invertible in ''k'' there is a map + +:<math>\partial : k^* \rightarrow H^1(k,\mu_m) </math> + +where <math>\mu_m</math> denotes the group of ''m''-th roots of unity in some separable extension of ''k''. This extends to + +:<math>\partial^n : k^* \times \cdots \times k^* \rightarrow H^n\left({k,\mu_m^{\otimes n}}\right) \ </math> + +satisfying the defining relations of the Milnor K-group. Hence <math>\partial^n</math> may be regarded as a map on <math>K^M_n(k)</math>, called the ''Galois symbol'' map.<ref name=GS108>Gille & Szamuely (2006) p.108</ref> + +The relation between [[étale cohomology|étale]] (or [[Galois cohomology|Galois]]) cohomology of the field and Milnor K-theory modulo 2 is the [[Milnor conjecture]], proven by Voevodsky.<ref>{{Citation | last1=Voevodsky | first1=Vladimir | author1-link=Vladimir Voevodsky | title=Motivic cohomology with '''Z'''/2-coefficients | doi=10.1007/s10240-003-0010-6 | mr=2031199 | year=2003 | journal=Institut des Hautes Études Scientifiques. Publications Mathématiques | issn=0073-8301 | issue=98 | pages=59–104 | volume=98}}</ref> The analogous statement for odd primes is the [[Bloch-Kato conjecture]], proved by Voevodsky, Rost, and others. + +== Higher ''K''-theory == +The accepted definitions of higher ''K''-groups were given by {{harvtxt|Quillen|1973}}, after a few years during which several incompatible definitions were suggested. 
The object of the program was to find definitions of '''K'''(''R'') and '''K'''(''R'',''I'') in terms of [[classifying space]]s so that +''R'' ⇒ '''K'''(''R'') and (''R'',''I'') ⇒ '''K'''(''R'',''I'') are functors into a [[homotopy category]] of spaces and the long exact sequence for relative K-groups arises as the [[long exact homotopy sequence]] of a [[fibration]] '''K'''(''R'',''I'')&nbsp;→&nbsp;'''K'''(''R'')&nbsp;→&nbsp;'''K'''(''R''/''I'').<ref name=Ros2456>Rosenberg (1994) pp.245-246</ref> + +Quillen gave two constructions, the "+-construction" and the "''Q''-construction", the latter subsequently modified in different ways.<ref name=Ros246>Rosenberg (1994) p.246</ref> The two constructions yield the same K-groups.<ref name=Ros289>Rosenberg (1994) p.289</ref> + +=== The +-construction === +One possible definition of higher algebraic ''K''-theory of rings was given by Quillen + +:<math> K_n(R) = \pi_n(BGL(R)^+),</math> + +Here π<sub>''n''</sub> is a [[homotopy group]], GL(''R'') is the [[direct limit]] of the [[general linear group]]s over ''R'' for the size of the matrix tending to infinity, ''B'' is the classifying space construction of [[homotopy theory]], and the <sup>+</sup> is Quillen's [[plus construction]]. + +This definition only holds for ''n>0'' so one often defines the higher algebraic ''K''-theory via + +:<math> K_n(R) = \pi_n(BGL(R)^+\times K_0(R)) </math> + +Since ''BGL''(''R'')<sup>+</sup> is path connected and ''K<sub>0</sub>(R)'' discrete, this definition doesn't differ in higher degrees and also holds for ''n=0''. + +=== The Q-construction === +{{main|Q-construction}} + +The Q-construction gives the same results as the +-construction, but it applies in more general situations. Moreover, the definition is more direct in the sense that the ''K''-groups, defined via the Q-construction are functorial by definition. This fact is not automatic in the +-construction. + +Suppose ''P'' is an [[exact category]]; associated to ''P'' a new category Q''P'' is defined, objects of which are those of ''P'' and morphisms from ''M''′ to ''M''″ are isomorphism classes of diagrams + +:<math> M'\longleftarrow N\longrightarrow M'',</math> + +where the first arrow is an admissible [[epimorphism]] and the second arrow is an admissible [[monomorphism]]. + +The ''i''-th '''''K''-group''' of the exact category ''P'' is then defined as + +:<math> K_i(P)=\pi_{i+1}(\mathrm{BQ}P,0)</math> + +with a fixed zero-object 0, where B''QP'' is the ''classifying space'' of ''QP'', which is defined to be the [[geometric realisation]] of the ''[[Nerve (category theory)|nerve]]'' of ''QP''. + +This definition coincides with the above definition of ''K<sub>0</sub>(P)''. If ''P'' is the category of finitely generated [[projective module|projective ''R''-modules]], this definition agrees with the above ''BGL<sup>+</sup>'' +definition of ''K<sub>n</sub>(R)'' for all ''n''. +<!-- +The ''K''-groups ''K''<sub>i</sub>(''R'') of the ring ''R'' are then the ''K''-groups ''K''<sub>i</sub>(''P<sub>R</sub>'') where ''P<sub>R</sub>'' is the category of finitely generated [[projective module|projective ''R''-modules]]. +--> +More generally, for a [[scheme (mathematics)|scheme]] ''X'', the higher ''K''-groups of ''X'' are defined to be the ''K''-groups of (the exact category of) locally free [[Coherent sheaf|coherent sheaves]] on ''X''. + +The following variant of this is also used: instead of finitely generated projective (=locally free) modules, take finitely generated modules. 
The resulting ''K''-groups are usually written ''G<sub>n</sub>(R)''. When ''R'' is a [[noetherian ring|noetherian]] [[regular ring]], then ''G''- and ''K''-theory coincide. Indeed, the [[global dimension]] of regular rings is finite, i.e. any finitely generated module has a finite projective resolution ''P<sub>*</sub> → M'', and a simple argument shows that the canonical map ''K''<sub>0</sub>(R) → ''G''<sub>0</sub>(R) is an [[isomorphism]], with ''[M]=&Sigma; ±[P<sub>n</sub>]''. This isomorphism extends to the higher ''K''-groups, too. + +=== The S-construction === +A third construction of ''K''-theory groups is the S-construction, due to [[Friedhelm Waldhausen|Waldhausen]].<ref>{{Citation | last1=Waldhausen | first1=Friedhelm | author1-link=Friedhelm Waldhausen | title=Algebraic ''K''-theory of spaces | doi=10.1007/BFb0074449 | publisher=[[Springer-Verlag]] | location=Berlin, New York | series=Lecture Notes in Mathematics | mr=802796 | year=1985 | volume=1126 | pages=318–419 | chapter=Algebraic K-theory of spaces | isbn=978-3-540-15235-4}}. See also Lecture IV and the references in {{Harvard citations|last1=Friedlander|last2=Weibel|year=1999}}</ref> It applies to categories with cofibrations (also called [[Waldhausen category|Waldhausen categories]]). This is a more general concept than exact categories. + +== Examples == +While the Quillen algebraic ''K''-theory has provided deep insight into various aspects of algebraic geometry and topology, the ''K''-groups have proved particularly difficult to compute except in a few isolated but interesting cases. + +=== Algebraic K-groups of finite fields === +The first and one of the most important calculations of the higher algebraic ''K''-groups of a ring were made by Quillen himself for the case of [[finite field]]s: + +If '''F'''<sub>''q''</sub> is the finite field with ''q'' elements, then: + +* ''K''<sub>0</sub>('''F'''<sub>''q''</sub>) = '''Z''', +* ''K''<sub>2i</sub>('''F'''<sub>''q''</sub>)=0 for ''i'' &ge;1, +* ''K''<sub>2i-1</sub>('''F'''<sub>''q''</sub>)= '''Z'''/(''q<sup>&nbsp;i</sup>''-1)'''Z''' for ''i''&nbsp;&ge;1. + +=== Algebraic K-groups of rings of integers === +Quillen proved that if ''A'' is the [[ring of integers|ring of algebraic integers]] in an algebraic [[number field]] ''F'' (a finite extension of the rationals), then the algebraic K-groups of ''A'' are finitely generated. [[Armand Borel|Borel]] used this to calculate K<sub>''i''</sub>(''A'') and K<sub>''i''</sub>(''F'') modulo torsion. For example, for the integers '''Z''', Borel proved that (modulo torsion) + +* ''K''<sub>i</sub> ('''Z''')/tors.=0 for positive ''i'' unless ''i=4k+1'' with ''k'' positive +* ''K''<sub>4''k''+1</sub> ('''Z''')/tors.= '''Z''' for positive ''k''. + +The torsion subgroups of K<sub>2''i''+1</sub>('''Z'''), and the orders of the finite groups K<sub>4''k''+2</sub>('''Z''') have recently been determined, but whether the latter groups are cyclic, and whether the groups K<sub>4''k''</sub>('''Z''') vanish depends upon [[Vandiver's conjecture]] about the class groups of cyclotomic integers. See [[Quillen-Lichtenbaum conjecture]] for more details. 
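+
+To make these computations concrete: Quillen's formulas above give, for instance, <math>K_3(\mathbf{F}_4) \cong \mathbf{Z}/15\mathbf{Z}</math> and <math>K_5(\mathbf{F}_2) \cong \mathbf{Z}/7\mathbf{Z}</math> for finite fields, while Borel's result gives, modulo torsion, <math>K_5(\mathbf{Z}) \cong \mathbf{Z}</math> (the case ''k''&nbsp;=&nbsp;1 of ''i''&nbsp;=&nbsp;4''k''&nbsp;+&nbsp;1) and <math>K_7(\mathbf{Z}) = 0</math>.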
+ +== Applications and open questions == +Algebraic ''K''-groups are used in conjectures on [[special values of L-functions]] and the formulation of an [[non-commutative main conjecture of Iwasawa theory]] and in construction of [[higher regulator]]s.<ref name=Lem385>Lemmermeyer (2000) p.385</ref> + +[[Parshin's conjecture]] concerns the higher algebraic ''K''-groups for smooth varieties over finite fields, and states that in this case the groups vanish up to torsion. + +Another fundamental conjecture due to [[Hyman Bass]] ([[Bass' conjecture]]) says that all of the groups ''G<sub>n</sub>(A)'' are finitely generated when ''A'' is a finitely generated '''Z'''-algebra. (The groups +''G<sub>n</sub>(A)'' are the ''K''-groups of the category of finitely generated ''A''-modules) <ref>{{Harvard citations|last1=Friedlander| last2=Weibel | year=1999}}, Lecture VI</ref> + +==Notes== +{{reflist}} + +== References == +*{{citation | last=Bass | first=Hyman | authorlink=Hyman Bass | title=Algebraic ''K''-theory | series=Mathematics Lecture Note Series | location=New York-Amsterdam | publisher=W.A. Benjamin, Inc. | year=1968 | zbl=0174.30302 }} +*{{Citation | editor1-last=Friedlander |editorlink1=Eric Friedlander | editor1-first=Eric | editor2-last=Grayson | editor2-first=Daniel | title=Handbook of K-Theory | url=http://www.springerlink.com/content/978-3-540-23019-9/ | publisher=[[Springer-Verlag]] | location=Berlin, New York | isbn=978-3-540-30436-4 | mr=2182598 | year=2005}} +* {{Citation | last1=Friedlander | first1=Eric M. | last2=Weibel | first2=Charles W. | title=An overview of algebraic ''K''-theory | publisher=World Sci. Publ., River Edge, NJ | mr=1715873 | year=1999 | pages=1–119}} +* {{citation | last1=Gille | first1=Philippe | last2=Szamuely | first2=Tamás | title=Central simple algebras and Galois cohomology | series=Cambridge Studies in Advanced Mathematics | volume=101 | location=Cambridge | publisher=[[Cambridge University Press]] | year=2006 | isbn=0-521-86103-9 | zbl=1137.12001 }} +* {{citation | last=Gras | first=Georges | title=Class field theory. From theory to practice | series=Springer Monographs in Mathematics | location=Berlin | publisher=[[Springer-Verlag]] | year=2003 | isbn=3-540-44133-6 | zbl=1019.11032 }} +* {{citation | first=Tsit-Yuen | last=Lam | authorlink=Tsit Yuen Lam | title=Introduction to Quadratic Forms over Fields | volume=67 | series=Graduate Studies in Mathematics | publisher=[[American Mathematical Society]] | year=2005 | isbn=0-8218-1095-2 | zbl=1068.11023 | mr = 2104929 }} +* {{citation | last=Lemmermeyer | first=Franz | title=Reciprocity laws. 
From Euler to Eisenstein | series=Springer Monographs in Mathematics | location=Berlin | publisher=[[Springer-Verlag]] | year=2000 | isbn=3-540-66957-4 | zbl=0949.11002 | mr=1761696 | doi=10.1007/978-3-662-12893-0 }} +* {{Citation | last1=Milnor | first1=John Willard | author1-link= John Milnor | title=Algebraic ''K''-theory and quadratic forms | mr=0260844 | year=1970 | month=1969 | journal=Inventiones Mathematicae | issn=0020-9910 | volume=9 | pages=318–344 | doi=10.1007/BF01425486 | issue=4}} +* {{Citation | last1=Milnor | first1=John Willard | author1-link= John Milnor | title=Introduction to algebraic K-theory | publisher=[[Princeton University Press]] | location=Princeton, NJ | mr=0349811 | year=1971 | zbl=0237.18005 | series=Annals of Mathematics Studies | volume=72 }} (lower K-groups) +*{{Citation | last1=Quillen | first1=Daniel | author1-link=Daniel Quillen | title=Algebraic K-theory, I: Higher K-theories (Proc. Conf., Battelle Memorial Inst., Seattle, Wash., 1972) | publisher=[[Springer-Verlag]] | location=Berlin, New York | series=Lecture Notes in Math | doi=10.1007/BFb0067053 | mr=0338129 | year=1973 | volume=341 | chapter=Higher algebraic K-theory. I | pages=85–147 | isbn=978-3-540-06434-3}} +* {{Citation | last1=Quillen | first1=Daniel | author1-link= Daniel Quillen | title=Proceedings of the International Congress of Mathematicians (Vancouver, B. C., 1974), Vol. 1 | publisher=Canad. Math. Congress | location=Montreal, Quebec | mr=0422392 | year=1975 | chapter=Higher algebraic K-theory | pages=171–176}} (Quillen's Q-construction) +* {{Citation | last1=Quillen | first1=Daniel | title=New developments in topology (Proc. Sympos. Algebraic Topology, Oxford, 1972) | publisher=[[Cambridge University Press]] | series=London Math. Soc. Lecture Note Ser. | mr=0335604 | year=1974 | volume=11 | chapter=Higher K-theory for categories with exact sequences | pages=95–103}} (relation of Q-construction to +-construction) +*{{Citation | last1=Rosenberg | first1=Jonathan | authorlink=Jonathan Rosenberg (mathematician) | title=Algebraic K-theory and its applications | url=http://books.google.com/books?id=TtMkTEZbYoYC | publisher=[[Springer-Verlag]] | location=Berlin, New York | series=[[Graduate Texts in Mathematics]] | isbn=978-0-387-94248-3 | mr=1282290 | zbl=0801.19001 | year=1994 | volume=147}}. [http://www-users.math.umd.edu/~jmr/KThy_errata2.pdf Errata] +* {{Citation | last1=Seiler | first1=Wolfgang | editor1-last=Rapoport | editor1-first=M. | editor2-last=Schneider | editor2-first=P. | editor3-last=Schappacher | editor3-first=N. | title=Beilinson's Conjectures on Special Values of L-Functions | publisher=Academic Press | location=Boston, MA | isbn=978-0-12-581120-0 | chapter=λ-Rings and Adams Operations in Algebraic K-Theory | year=1988}} +* {{citation | last=Silvester | first=John R. | title=Introduction to algebraic K-theory | series=Chapman and Hall Mathematics Series | location=London, New York | publisher=[[Chapman and Hall]] | year=1981 | isbn=0-412-22700-2 | zbl=0468.18006 }} +* {{Citation | last1=Weibel | first1=Charles | author1-link=Charles Weibel | title=Handbook of K-theory | url=http://www.math.uiuc.edu/K-theory/0691/KZsurvey.pdf | publisher=[[Springer-Verlag]] | location=Berlin, New York | mr=2181823 | year=2005 | chapter=Algebraic K-theory of rings of integers in local and global fields | pages=139–190}} (survey article) + +==Further reading== +*{{citation | last=Srinivas | first=V. 
| title=Algebraic ''K''-theory | edition=Paperback reprint of the 1996 2nd | series=Modern Birkhäuser Classics | location=Boston, MA | publisher=[[Birkhäuser]] | year=2008 | isbn=978-0-8176-4736-0 | zbl=1125.19300 }} +*C. Weibel "[http://www.math.rutgers.edu/~weibel/Kbook.html The K-book: An introduction to algebraic K-theory]" + +== See also == +*[[Bloch's formula]] +*[[Redshift conjecture]] +*[[K-theory spectrum]] +*[[Fundamental theorem of algebraic K-theory]] + +== External links == + +* [http://www.math.uiuc.edu/K-theory/ K theory preprint archive] + +{{DEFAULTSORT:Algebraic K-Theory}} +[[Category:Algebraic K-theory| ]] +[[Category:Algebraic geometry]] + srucb9h3xkw03en3m8rmfxdcqu2uhic + + + + Cone of curves + 0 + 21481 + + 21482 + 2012-01-03T09:29:52Z + + 129.215.5.180 + + /* A structure theorem */ + wikitext + text/x-wiki + In [[mathematics]], the '''cone of curves''' (sometimes the '''Kleiman-Mori''' cone) of an [[algebraic variety]] <math>X</math> is a [[combinatorial invariant]] of much importance to the [[birational geometry]] of <math>X</math>. + +==Definition== +Let <math>X</math> be a [[Proper morphism|proper]] variety. By definition, a (real) ''1-cycle'' on <math>X</math> is a formal [[linear combination]] <math>C=\sum a_iC_i</math> of irreducible, reduced and proper curves <math>C_i</math>, with coefficients <math>a_i \in \mathbb{R}</math>. ''Numerical equivalence'' of 1-cycles is defined by intersections: two 1-cycles <math>C</math> and <math>C'</math> are numerically equivalent if <math>C \cdot D = C' \cdot D</math> for every Cartier [[divisor]] <math>D</math> on <math>X</math>. Denote the [[real vector space]] of 1-cycles modulo numerical equivalence by <math>N_1(X)</math>. + +We define the ''cone of curves'' of <math>X</math> to be + +: <math>NE(X) = \left\{\sum a_i[C_i], \ 0 \leq a_i \in \mathbb{R} \right\} </math> + +where the <math>C_i</math> are irreducible, reduced, proper curves on <math>X</math>, and <math>[C_i]</math> their classes in <math>N_1(X)</math>. It is not difficult to see that <math>NE(X)</math> is indeed a [[Cone_(linear_algebra)#Convex_cone|convex cone]] in the sense of convex geometry. + +==Applications== +One useful application of the notion of the cone of curves is the '''[[Steven Kleiman|Kleiman]] condition''', which says that a (Cartier) divisor <math>D</math> on a complete variety <math>X</math> is [[ample line bundle|ample]] if and only if <math>D \cdot x > 0</math> for any nonzero element <math>x</math> in <math>\overline{NE(X)}</math>, the closure of the cone of curves in the usual real topology. (In general, <math>NE(X)</math> need not be closed, so taking the closure here is important.) + +A more involved example is the role played by the cone of curves in the theory of [[minimal model]]s of algebraic varieties. Briefly, the goal of that theory is as follows: given a (mildly singular) projective variety <math>X</math>, find a (mildly singular) variety <math>X'</math> which is [[birational]] to <math>X</math>, and whose [[canonical divisor]] <math>K_{X'}</math> is [[numerically effective|nef]]. The great breakthrough of the early 1980s (due to [[Shigefumi Mori|Mori]] and others) was to construct (at least morally) the necessary birational map from <math>X</math> to <math>X'</math> as a sequence of steps, each of which can be thought of as contraction of a <math>K_x</math>-negative extremal ray of <math>NE(X)</math>. 
This process encounters difficulties, however, whose resolution necessitates the introduction of the [[flip (algebraic geometry)|flip]]. + +==A structure theorem== +The above process of contractions could not proceed without the fundamental result on the structure of the cone of curves known as the '''Cone Theorem'''. The first version of this theorem, for [[smooth varieties]], is due to [[Shigefumi Mori|Mori]]; it was later generalised to a larger class of varieties by [[János_Kollár|Kollár]], [[Miles Reid|Reid]], [[Vyacheslav_Shokurov|Shokurov]], and others. Mori's version of the theorem is as follows: + +'''Cone Theorem.''' Let <math>X</math> be a smooth [[projective variety]]. Then + +1. There are [[countably many]] [[rational curve]]s <math>C_i</math> on <math>X</math>, satisfying <math>0< -K_X \cdot C_i \leq \operatorname{dim} X +1 </math>, and + +: <math>\overline{NE(X)} = \overline{NE(X)}_{K_X\geq 0} + \sum_i \mathbf{R}_{\geq0} [C_i].</math> + +2. For any positive real number <math>\epsilon</math> and any [[ample divisor]] <math>H</math>, + +: <math>\overline{NE(X)} = \overline{NE(X)}_{K_X+\epsilon H\geq0} + \sum \mathbf{R}_{\geq0} [C_i],</math> + +where the sum in the last term is finite. + +The first assertion says that, in the [[closed half-space]] of <math>N_1(X)</math> where intersection with <math>K_X</math> is nonnegative, we know nothing, but in the complementary half-space, the cone is spanned by some countable collection of curves which are quite special: they are [[Rational variety|rational]], and their 'degree' is bounded very tightly by the dimension of <math>X</math>. The second assertion then tells us more: it says that, away from the hyperplane <math>\{C : K_X \cdot C = 0\}</math>, extremal rays of the cone cannot accumulate. + + +If in addition the variety <math>X</math> is defined over a field of characteristic 0, we have the following assertion, sometimes referred to as the '''Contraction Theorem''': + +3. Let <math>F \subset \overline{NE(X)}</math> be an extremal face of the cone of curves on which <math>K_X</math> is negative. Then there is a unique [[morphism]] <math>\operatorname{cont}_F : X \rightarrow Z</math> to a projective variety ''Z'', such that <math>(\operatorname{cont}_F)_* \mathcal{O}_X = \mathcal{O}_Z</math> and an irreducible curve <math>C</math> in <math>X</math> is mapped to a point by <math>\operatorname{cont}_F</math> if and only if <math>[C] \in F</math>. + +==References== + +* Lazarsfeld, R., ''Positivity in Algebraic Geometry I'', Springer-Verlag, 2004. ISBN 3-540-22533-1 +* Kollár, J. and Mori, S., ''Birational Geometry of Algebraic Varieties'', Cambridge University Press, 1998. ISBN 0-521-63277-3 + +[[Category:Algebraic geometry]] +[[Category:Birational geometry]] + 545z1esamz48kbuts133ihjqiu97zga + + + + Geometrothermodynamics + 0 + 28601 + + 28602 + 2013-04-24T13:33:31Z + + Addbot + 0 + + + [[User:Addbot|Bot:]] Removing Orphan Tag - Linked from [[Differential geometry]] ([[User_talk:Addbot|Report Errors]]) + wikitext + text/x-wiki + {{multiple issues| +{{notability|date=January 2013}} +{{original research|date=January 2013}} +}} + +In physics, '''geometrothermodynamics (GTD)''' is a formalism developed recently by Hernando Quevedo to describe the properties of thermodynamic systems in terms of concepts of differential geometry.<ref name=quev07>{{cite journal|last=Quevedo | first=Hernando|year=2007|title=Geometrothermodynamics|journal=J. Math. 
Phys.|volume=48|pages=013506|DOI=10.1063/1.2409524 |arxiv=physics/0604164 }}</ref> Consider a thermodynamic system in the framework of classical equilibrium thermodynamics. The states of thermodynamic equilibrium are considered as points of an abstract equilibrium space in which a Riemannian metric can be introduced in several ways. In particular, one can introduce [[Hessian]] metrics like the [[Fisher information metric]], the [[Weinhold metric]], the [[Ruppeiner metric]] and others, whose components are calculated as the Hessian of a particular [[thermodynamic potential]]. Another possibility is to introduce metrics which are independent of the thermodynamic potential, a property which is shared by all thermodynamic systems in classical thermodynamics.<ref>{{Cite book|last=Callen|first=Herbert B.| title=Thermodynamics and an Introduction to Thermostatistics +|publisher=John Wiley & Sons Inc.|year=1985 | isbn=0-471-86256-8}}</ref> Since a change of thermodynamic potential is equivalent to a [[Legendre transformation]], and Legendre transformations do not act in the equilibrium space, it is necessary to introduce an auxiliary space to correctly handle the Legendre transformations. This is the so-called thermodynamic phase space. If the phase space is equipped with a Legendre invariant Riemannian metric, a smooth map can be introduced that induces a thermodynamic metric in the equilibrium manifold. The thermodynamic metric can then be used with different thermodynamic potentials without changing the geometric properties of the equilibrium manifold. One expects the geometric properties of the equilibrium manifold to be related to the macroscopic physical properties. The details of this relation can be summarized in three main points: + +#Curvature is a measure of the thermodynamical interaction. +#Curvature singularities correspond to curvature phase transitions. +#Thermodynamic geodesics correspond to quasi-static processes. + +==Geometric aspects== +The main ingredient of GTD is a (2''n''&nbsp;+&nbsp;1)-dimensional manifold <math>\mathcal{T}</math> +with coordinates <math>Z^A=\{\Phi,E^a,I^a\}</math>, where <math>\Phi</math> is an arbitrary thermodynamic potential, <math>E^a</math>, <math>a=1,2,\ldots,n</math>, are the +extensive variables, and <math>I^a</math> the intensive variables. It is also +possible to introduce in a canonical manner the fundamental +one-form <math>\Theta = d\Phi - \delta_{ab}I^a d E^b</math> (summation over repeated indices) with <math>\delta_{ab}={\rm +diag}(+1,\ldots,+1)</math>, which satisfies the condition <math>\Theta \wedge +(d\Theta)^n \neq 0</math>, where <math>n</math> is the number of thermodynamic +degrees of freedom of the system, and is invariant with respect to +Legendre transformations<ref>{{Cite book|last=Arnold|first=V.I. +| title=Mathematical Methods of Classical Mechanics +|publisher=Springer Verlag|year=1989 | isbn=0-387-96890-3}}</ref> + +: <math> +\{Z^A\}\longrightarrow \{\widetilde{Z}^A\}=\{\tilde \Phi, \tilde E ^a, \tilde I ^ a\}\ ,\quad +\Phi = \tilde \Phi - \delta_{kl} \tilde E ^k \tilde I ^l ,\quad +E^i = - \tilde I ^{i}, \quad + E^j = \tilde E ^j,\quad + I^{i} = \tilde E ^i , \quad + I^j = \tilde I ^j \ , + </math> + +where <math>i\cup j</math> is any disjoint decomposition of the set of indices <math>\{1,\ldots,n\}</math>, +and <math>k,l= 1,\ldots,i</math>. In particular, for <math>i=\{1,\ldots,n\}</math> and <math>i=\emptyset</math> we obtain +the total Legendre transformation and the identity, respectively. 
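+
+For example, for a system with two thermodynamic degrees of freedom described by its internal energy, one may take <math>\Phi=U</math>, <math>E^a=(S,V)</math> and <math>I^a=(T,-P)</math>, so that the fundamental one-form becomes <math>\Theta = dU - T\,dS + P\,dV</math>, which vanishes on equilibrium states by the first law. The partial Legendre transformation in the first variable then replaces <math>U</math> by the Helmholtz free energy <math>U-TS</math>, while the total Legendre transformation yields the Gibbs free energy <math>U-TS+PV</math>.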
+
+It is also assumed that in <math> \mathcal{T} </math>
+there exists a metric <math> G </math> which is also
+invariant with respect to Legendre transformations. The triad
+<math>(\mathcal{T},\Theta,G)</math> defines a Riemannian [[contact manifold]] which is
+called the thermodynamic phase space (phase manifold). The space of
+thermodynamic equilibrium states (equilibrium manifold) is an
+n-dimensional [[Riemannian submanifold]] <math>\mathcal{E}\subset \mathcal{T}</math>
+induced by a smooth map <math>\varphi:\mathcal{E}\rightarrow\mathcal{T}</math>,
+i.e. <math>\varphi:\{E^a\} \mapsto \{\Phi,E^a,I^a\}</math>, with <math>\Phi=\Phi(E^a)</math>
+and <math>I^a= I^a(E^a)</math>, such that <math> \varphi^*(\Theta)=\varphi^*(d\Phi
+- \delta_{ab}I^{a}dE^{b})=0 </math> holds, where <math>\varphi^*</math> is the
+pullback of <math>\varphi</math>. The manifold <math>\mathcal{E}</math> is naturally equipped
+with the Riemannian metric <math>g=\varphi^*(G)</math>. The purpose of GTD is
+to demonstrate that the geometric properties of <math>\mathcal{E}</math> are
+related to the thermodynamic properties of a system with fundamental
+thermodynamic equation <math>\Phi=\Phi(E^a)</math>.
+The condition of invariance with respect to total Legendre transformations leads to the metrics
+
+: <math>
+G^I = (d\Phi- \delta_{ab}I^a d E^b)^2 + \Lambda\, (\xi_{ab} E^a I^b)\left( \delta_{cd} dE^c d I^d\right)\ ,\quad
+\delta_{ab}={\rm diag}(1,\ldots,1)
+</math>
+
+: <math>
+G^{II} = (d\Phi- \delta_{ab}I^a d E^b)^2 + \Lambda\, (\xi_{ab} E^a I^b)\left( \eta_{cd} dE^c d I^d\right)\ ,\quad
+\eta_{ab}={\rm diag}(-1,1,\ldots,1)
+</math>
+
+where <math>\xi_{ab}</math> is a constant diagonal matrix that can be expressed in terms of <math>\delta_{ab}</math> and
+<math>\eta_{ab}</math>, and <math>\Lambda</math> is an arbitrary Legendre invariant function of <math>Z^A</math>. The metrics <math>G^I</math> and <math>G^{II}</math> have been used to describe thermodynamic systems with first and second order phase transitions, respectively. The most general metric which is invariant with respect to partial Legendre transformations is
+
+: <math>
+G^{III} = (d\Phi- \delta_{ab}I^a d E^b)^2 + \Lambda\, ( E_a I_a)^{2k+1} \left( dE^a d I^a\right)\ ,
+\quad
+ E_a= \delta_{ab} E^b \ ,
+\quad I_a = \delta_{ab} I^b \ .
+</math>
+
+The components of the corresponding metric for the equilibrium manifold <math> {\mathcal E} </math> can be computed as
+
+: <math>
+g_{ab} = \frac{\partial Z^A}{\partial E^a}\frac{\partial Z^B}{\partial E^b} G_{AB}\ .
+</math>
+
+==Applications==
+GTD has been applied to describe laboratory systems such as the ideal gas, the van der Waals gas and the Ising model, as well as more exotic systems such as black holes in different gravity theories,<ref name=qstv11>{{cite journal|last1=Quevedo |first1=H.|last2=Sanchez |first2=A.|last3=Taj |first3=S.|last4=Vazquez |first4=A. |year=2011|title=Phase transitions in Geometrothermodynamics|journal=Gen. Rel. Grav. |volume=43|pages=1153|DOI=10.1007/s10714-010-0996-2|arxiv=1010.5599 }}</ref> in the context of relativistic cosmology,<ref name=abcq12>{{cite journal|last=Aviles | first=A. |title=Extending the generalized Chaplygin gas model by using geometrothermodynamics |year=2012
+|journal=Phys. Rev. D |volume=86|pages=063508|DOI=10.1103/PhysRevD.86.063508|arxiv=1203.4637}}</ref> and to describe chemical reactions.<ref name=tq13>{{cite journal|last=Tapias | first=D.
|year=2013|title=Geometric description of chemical reactions
+|arxiv=1301.0262}}</ref>
+
+==References==
+
+<References />
+
+[[Category:Thermodynamics]]
+[[Category:Geometry]]
+[[Category:Information]]
+ lkddb14u1fxrza7upsd2pylzj8ti18z
+
+
+
+ Darcy's law
+ 0
+ 6723
+
+ 6724
+ 2014-02-03T08:06:05Z
+
+ Antwan718
+ 0
+
+ /* Description */
+ wikitext
+ text/x-wiki
+ '''Darcy's law''' is a [[Phenomenology (science)|phenomenologically]] derived [[constitutive equation]] that describes the flow of a [[fluid]] through a [[porous]] medium. The law was formulated by [[Henry Darcy]] based on the results of experiments<ref>H. Darcy, Les Fontaines Publiques de la Ville de Dijon, Dalmont, Paris (1856).</ref> on the flow of [[water]] through beds of [[sand]]. It also forms the scientific basis of fluid [[Permeability (fluid)|permeability]] used in the [[earth science]]s, particularly in [[hydrogeology]].
+
+== Background ==
+
+Although Darcy's law (an expression of [[conservation of momentum]]) was determined experimentally by Darcy, it has since been derived from the [[Navier-Stokes equations]] via [[Homogenization (mathematics)|homogenization]]. It is analogous to [[Fourier's law]] in the field of [[heat conduction]], [[Ohm's law]] in the field of [[electrical networks]], or [[Fick's law]] in [[diffusion]] theory.
+
+One application of Darcy's law is to water flow through an [[aquifer]]; Darcy's law along with the equation of [[conservation of mass]] are equivalent to the [[groundwater flow equation]], one of the basic relationships of [[hydrogeology]]. Darcy's law is also used to describe oil, water, and gas flows through petroleum reservoirs.
+
+== Description ==
+[[Image:Darcy's Law.png|thumb|right|300px|Diagram showing definitions and directions for Darcy's law.]]
+
+Darcy's law is a simple proportional relationship between the instantaneous discharge rate through a porous medium, the [[viscosity]] of the fluid and the pressure drop over a given distance.
+
+: <math>Q=\frac{-kA}{\mu} \frac{(P_b - P_a)}{L}</math>
+
+The total discharge, ''Q'' (units of volume per time, e.g., m<sup>3</sup>/s) is equal to the product of the intrinsic [[Permeability (fluid)|permeability]] of the medium, ''k'' (m<sup>2</sup>), the cross-sectional area to flow, ''A'' (units of area, e.g., m<sup>2</sup>), and the total pressure drop (P<sub>b</sub> - P<sub>a</sub>) (in pascals), all divided by the [[viscosity]], ''&mu;'' (Pa·s), and the length over which the pressure drop is taking place (m). The negative sign is needed because fluid flows from high pressure to low pressure. If the change in pressure is negative (where P<sub>a</sub> > P<sub>b</sub>), then the flow will be in the positive 'x' direction. Dividing both sides of the equation by the area and using more general notation leads to
+
+: <math>q=\frac{-k}{\mu} \nabla P</math>
+
+where ''q'' is the flux (discharge per unit area, with units of length per time, m/s) and <math>\nabla P</math> is the [[pressure gradient]] vector (Pa/m). This value of flux, often referred to as the Darcy flux, is not the velocity experienced by the fluid traveling through the pores. The fluid velocity (''v'') is related to the Darcy flux (''q'') by the [[porosity]] (''n''). The flux is divided by porosity to account for the fact that only a fraction of the total formation volume is available for flow. The fluid velocity would be the velocity a conservative tracer would experience if carried by the fluid through the formation.
+ +: <math>v=\frac{q}{n}</math> + +Darcy's law is a simple mathematical statement which neatly summarizes several familiar properties that [[groundwater]] flowing in [[aquifer]]s exhibits, including: +* if there is no pressure gradient over a distance, no flow occurs (these are [[hydrostatics|hydrostatic]] conditions), +* if there is a pressure gradient, flow will occur from high pressure towards low pressure (opposite the direction of increasing gradient - hence the negative sign in Darcy's law), +* the greater the pressure gradient (through the same formation material), the greater the discharge rate, and +* the discharge rate of fluid will often be different &mdash; through different formation materials (or even through the same material, in a different direction) &mdash; even if the same pressure gradient exists in both cases. + +A graphical illustration of the use of the steady-state [[groundwater flow equation]] (based on Darcy's law and the conservation of mass) is in the construction of [[flownet]]s, to quantify the amount of [[groundwater]] flowing under a [[dam]]. + +Darcy's law is only valid for slow, [[viscous]] flow; fortunately, most groundwater flow cases fall in this category. Typically any flow with a [[Reynolds number]] less than one is clearly laminar, and it would be valid to apply Darcy's law. Experimental tests have shown that flow regimes with Reynolds numbers up to 10 may still be Darcian, as in the case of groundwater flow. The Reynolds number (a dimensionless parameter) for porous media flow is typically expressed as + +: <math>Re = \frac{\rho v d_{30}}{\mu}</math>. + +where ''&rho;'' is the [[density]] of [[water]] (units of mass per volume), ''v'' is the specific discharge (not the pore velocity &mdash; with units of length per time), ''d<sub>30</sub>'' is a representative grain diameter for the porous media (often taken as the 30% passing size from a [[grain size]] analysis using sieves - with units of length), and ''&mu;'' is the [[viscosity]] of the fluid. + +== Derivation == +For stationary, creeping, incompressible flow, i.e. <math>D\left(\rho u_i\right)/Dt\approx0</math>, the Navier-Stokes equation simplify to the [[Stokes flow|Stokes equation]]: + +: <math> \mu\nabla^2 u_i +\rho g_i -\partial_i P=0</math>, + +where <math>\mu</math> is the viscosity, <math>u_i</math> is the velocity in the i direction, <math>g_i</math> is the gravity component in the i direction and P is the pressure. +Assuming the viscous resisting force is linear with the velocity we may write: + +: <math>-\left(k_{ij}\right)^{-1}\mu\phi u_j+\rho g_i-\partial_i P=0</math>, + +where <math>\phi</math> is the [[porosity]], and <math>k_{ij}</math> is the second order permeability tensor. This gives the velocity in the <math>n</math> direction, + +: <math>k_{ni}\left(k_{ij}\right)^{-1} u_j= \delta_{nj} u_j = u_n = -\frac{k_{ni}}{\phi\mu}\left(\partial_i P-\rho g_i\right)</math>, + +which gives Darcy's law for the volumetric flux density in the <math>n</math> direction, + +: <math>q_n=-\frac{k_{ni}}{\mu}\left(\partial_i P -\rho g_i\right)</math>. + +In isotropic porous media the off-diagonal elements in the permeability tensor are zero, <math>k_{ij}=0</math> for <math>i\neq j</math> and the diagonal elements are identical, <math> k=k_{ii}</math>, and the common form is obtained + +: <math>\boldsymbol{q}=-\frac{k}{\mu}\left(\boldsymbol{\nabla} P -\rho \boldsymbol{g}\right)</math>. 
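+
+As a quick numerical illustration of the relations above, the total discharge, Darcy flux and pore velocity for a one-dimensional flow can be evaluated directly; the parameter values in the following short Python sketch are hypothetical, order-of-magnitude choices made only for illustration:
+
+<syntaxhighlight lang="python">
+# Illustrative evaluation of Darcy's law; all parameter values are hypothetical.
+mu = 1.0e-3               # dynamic viscosity of water, Pa*s
+k = 1.0e-12               # intrinsic permeability, m^2
+A = 2.0                   # cross-sectional area, m^2
+L = 10.0                  # length over which the pressure drop occurs, m
+P_a, P_b = 2.0e5, 1.0e5   # upstream and downstream pressures, Pa
+
+Q = -(k * A / mu) * (P_b - P_a) / L   # total discharge, m^3/s  (= 2.0e-5)
+q = Q / A                             # Darcy flux, m/s         (= 1.0e-5)
+
+n = 0.3                               # porosity
+v = q / n                             # pore (tracer) velocity, m/s (about 3.3e-5)
+
+print(Q, q, v)
+</syntaxhighlight>
+
+Because <math>P_a > P_b</math>, the computed discharge is positive, consistent with the sign convention discussed above.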
+
+== Additional forms of Darcy's law ==
+For very short time scales, a time derivative of flux may be added to Darcy's law, which results in valid solutions at very small times (in heat transfer, this is called the modified form of [[Fourier's law]]),
+
+: <math>\tau \frac{\partial q}{\partial t}+q=-K \nabla h</math>,
+
+where ''&tau;'' is a very small time constant which causes this equation to reduce to the normal form of Darcy's law at "normal" times (> [[nanosecond]]s). The main reason for doing this is that the regular [[groundwater flow equation]] ([[diffusion equation]]) leads to [[Mathematical singularity|singularities]] at constant head boundaries at very small times. This form is more mathematically rigorous, but leads to a [[hyperbolic]] groundwater flow equation, which is more difficult to solve and is only useful at very small times, typically out of the realm of practical use.
+
+Another extension to the traditional form of Darcy's law is the Brinkman term, which is used to account for transitional flow between boundaries (introduced by Brinkman in 1949 <ref>{{cite journal
+ |last = Brinkman
+ |first = H. C.
+ |title = A calculation of the viscous force exerted by a flowing fluid on a dense swarm of particles
+ |journal = Applied Scientific Research
+ |volume = 1
+ |pages = 27–34
+ |url = http://dx.doi.org/10.1007/BF02120313
+ |doi = 10.1007/BF02120313
+ |year = 1949}}</ref>),
+
+: <math>\beta \nabla^{2}q +q =-K \nabla P</math>,
+
+where ''&beta;'' is an effective [[viscosity]] term. This correction term accounts for flow through a medium where the grains of the medium are themselves porous, but it is difficult to use and is typically neglected.
+
+Another derivation of Darcy's law is used extensively in [[petroleum engineering]] to determine the flow through permeable media; the simplest case is that of a one-dimensional, homogeneous rock formation with a fluid of constant [[viscosity]].
+
+: <math>Q= \frac {k A}{\mu} \left( \frac{\partial P}{\partial L} \right)</math>,
+
+where Q is the [[flowrate]] of the formation (in units of volume per unit time), k is the relative [[Permeability (earth sciences)|permeability]] of the formation (typically in [[millidarcies]]), A is the cross-sectional [[area]] of the formation, ''&mu;'' is the [[viscosity]] of the fluid (typically in units of [[centipoise]]), and L is the [[length]] of the porous media the fluid will flow through. <math>\partial P/ \partial L</math> represents the pressure change per unit length of the formation. This equation can also be solved for permeability, allowing for [[relative permeability]] to be calculated by forcing a fluid of known viscosity through a core of a known length and area, and measuring the pressure drop across the length of the core.
+
+For very high velocities in porous media, [[inertial]] effects can also become significant. Sometimes an [[inertial]] term is added to Darcy's equation, known as the [[Philipp Forchheimer|Forchheimer]] term. This term is able to account for the [[non-linear]] behavior of the pressure difference versus velocity data.<ref>A. Bejan, Convection Heat Transfer, John Wiley & Sons (1984)</ref>
+:<math>\nabla P=-\frac{\mu}{k}q-\frac{\rho}{k_1}q^2</math>,
+where the additional term <math>k_1</math> is known as the inertial permeability.
+
+Darcy's law is valid only for flow in the [[Continuum mechanics|continuum]] region.
For a flow in transition region, where both [[viscous]] and [[Knudsen]] friction are present a new formulation is used, which is known as binary friction model <ref>{{cite journal + |last = Pant + |first = Lalit M. + |coauthors = Sushanta K. Mitra, Marc Secanell + |title = Absolute permeability and Knudsen diffusivity measurements in PEMFC gas diffusion layers and micro porous layers + |journal = Journal of Power Sources + |url = http://dx.doi.org/10.1016/j.jpowsour.2012.01.099 + |doi=10.1016/j.jpowsour.2012.01.099 + |year = 2012}}</ref> + +<math>\nabla P=-\left(\frac{k}{\mu}+D_K\right)^{-1}q</math>, + +where <math>D_K</math> is the [[Knudsen]] [[diffusivity]] of the fluid in porous media. + +==See also== +*The [[darcy]] unit of fluid permeability +*[[Hydrogeology]] +*[[Groundwater flow equation]] + +== Notes == +<references/> + +[[Category:Water]] +[[Category:Civil engineering]] +[[Category:Soil mechanics]] +[[Category:Soil physics]] +[[Category:Hydrology]] + 7m5z0j20js0mg14yxfk10k3nxlv6fsg + + + + Unit root + 0 + 10487 + + 10488 + 2013-12-27T02:25:08Z + + BD2412 + 0 + + + Fixing [[Wikipedia:Disambiguation pages with links|links to disambiguation pages]], improving links, other minor cleanup tasks using [[Project:AWB|AWB]] + wikitext + text/x-wiki + A '''unit root''' is a feature of [[Dynamical system|processes that evolve through time]] that can cause problems in [[statistical inference]] involving [[time series]] [[model (abstract)|models]]. + +A linear [[stochastic process]] has a unit root if 1 is a root of the process's [[Characteristic equation (calculus)|characteristic equation]]. Such a process is [[Stationary process|non-stationary]]. If the other roots of the characteristic equation lie inside the unit circle—that is, have a modulus ([[absolute value]]) less than one—then the [[first difference]] of the process will be stationary. + +==Definition== +Consider a discrete-time [[stochastic process]] <math> \{y_t,t=1,\ldots,\infty\}</math>, and suppose that it can be written as an [[autoregressive]] process of order&nbsp;''p'': + +:<math>y_t=a_1 y_{t-1}+a_2 y_{t-2} + \cdots + a_p y_{t-p}+\varepsilon_t.</math> + +Here, <math> \{\varepsilon_{t},t=0,\infty\}</math> is a serially uncorrelated, mean zero stochastic process with constant variance <math>\sigma^2</math>. For convenience, assume <math> y_0 = 0 </math>. If <math>m=1</math> is a [[Root of a function|root]] of the [[Characteristic polynomial|characteristic equation]]: + +:<math> m^p - m^{p-1}a_1 - m^{p-2}a_2 - \cdots - a_p = 0 </math> + +then the stochastic process has a '''unit root''' or, alternatively, is [[Order of integration|integrated of order]] one, denoted <math> I(1) </math>. If ''m'' = 1 is a [[Multiplicity (mathematics)#Multiplicity of a root of a polynomial|root of multiplicity]] ''r'', then the stochastic process is integrated of order ''r'', denoted ''I''(''r''). + +==Example== +The first order autoregressive model, <math>y_t=a_{1}y_{t-1}+\varepsilon_t</math>, has a unit root when <math>a_1=1</math>. In this example, the characteristic equation is <math> m - a_1 = 0 </math>. The root of the equation is <math> m = 1 </math>. + +If the process has a unit root, then it is a non-stationary time series. That is, the moments of the stochastic process depend on <math>t</math>. 
To illustrate the effect of a unit root, we can consider the first order case, starting from ''y''<sub>0</sub>&nbsp;=&nbsp;0: + +:<math>y_{t}= y_{t-1}+\varepsilon_t.</math> + +By repeated substitution, we can write <math> y_t = y_0 + \sum_{j=1}^t \varepsilon_j</math>. Then the variance of <math> y_t</math> is given by: + +: <math> \operatorname{Var}(y_t) = \sum_{j=1}^t \sigma^2=t \sigma^2 .</math> + +The variance depends on ''t'' since <math> \operatorname{Var}(y_{1}) = \sigma^2 </math>, while <math> \operatorname{Var}(y_{2}) = 2\sigma^2 </math>. Note that the variance of the series is diverging to infinity with&nbsp;''t''. + +==Related models== +In addition to [[autoregressive model|AR]] and [[Autoregressive–moving-average model|ARMA]] models, other important models arise in [[regression analysis]] where the [[errors and residuals in statistics|model errors]] may themselves have a [[time series]] structure and thus may need to be modelled by an AR or ARMA process that may have a unit root, as discussed above. The [[sample size|finite sample]] properties of regression models with first order ARMA errors, including unit roots, have been analyzed.<ref>{{cite journal |authorlink=John Denis Sargan |last=Sargan |first=J. D. |authorlink2=Alok Bhargava |first2=Alok |last2=Bhargava |year=1983 |title=Testing residuals from least squares regressions for being generated by the Gaussian random walk |journal=[[Econometrica]] |volume=51 |issue=1 |pages=153–174 |jstor=1912252 }}</ref><ref>{{cite journal |last=Sargan |first=J. D. |first2=Alok |last2=Bhargava |year=1983 |title=Maximum Likelihood Estimation of Regression Models with First Order Moving Average Errors when the Root Lies on the Unit Circle |journal=Econometrica |volume=51 |issue=3 |pages=799–820 |jstor=1912159 }}</ref> + +==Estimation when a unit root may be present== + +Often, [[ordinary least squares]] (OLS) is used to estimate the slope coefficients of the [[autoregressive model]]. Use of OLS relies on the stochastic process being stationary. When the stochastic process is non-stationary, the use of OLS can produce invalid estimates. [[Clive Granger|Granger]] and Newbold called such estimates 'spurious regression' results:<ref>{{cite journal |last=Granger |first=C. W. J. |last2=Newbold |first2=P. |year=1974 |title=Spurious regressions in econometrics |journal=[[Journal of Econometrics]] |volume=2 |issue=2 |pages=111–120 |doi=10.1016/0304-4076(74)90034-7 }}</ref> high [[Coefficient of determination|R<sup>2</sup>]] values and high [[t-statistic|t-ratios]] yielding results with no economic meaning. + +To estimate the slope coefficients, one should first conduct a [[unit root test]], whose [[null hypothesis]] is that a unit root is present. If that hypothesis is rejected, one can use OLS. However, if the presence of a unit root is not rejected, then one should apply the [[Finite difference|difference operator]] to the series. If another unit root test shows the differenced time series to be stationary, OLS can then be applied to this series to estimate the slope coefficients. + +For example, in the AR(1) case, <math>\Delta y_{t} = y_{t} - y_{t-1} = \varepsilon_{t}</math> is stationary. + +In the AR(2) case, <math> y_{t} = a_{1}y_{t-1} + a_{2}y_{t-2} + \varepsilon_{t} </math> can be written as <math> (1 +-\lambda_{1}L)(1 - \lambda_{2}L)y_{t} = \varepsilon_{t} </math> where L is a [[lag operator]] that decreases the time index of a variable by one period: <math> Ly_{t} = y_{t-1} </math>. 
If <math> \lambda_{2} = 1 </math>, the model has a unit root and we can define <math> z_{t} = \Delta y_{t} </math>; then +: <math> z_{t} = \lambda_{1}z_{t-1} + \varepsilon_{t} </math> +is stationary if <math>|\lambda_1| < 1</math>. OLS can be used to estimate the slope coefficient, <math> \lambda_{1} </math>. + +If the process has multiple unit roots, the difference operator can be applied multiple times. + +==Properties and characteristics of unit-root processes== + +* Shocks to a unit root process have permanent effects which do not decay as they would if the process were stationary +* As noted above, a unit root process has a variance that depends on t, and diverges to infinity +* If it is known that a series has a unit root, the series can be differenced to render it stationary. For example, if a series <math> Y_t</math> is I(1), the series <math> \Delta Y_t=Y_t-Y_{t-1}</math> is I(0) (stationary). It is hence called a ''difference stationary'' series.{{Citation needed|date=December 2010}} + +==Unit root hypothesis== +[[File:Unit root hypothesis diagram.svg|thumb|The diagram above depicts an example of a potential unit root. The red line represents an observed drop in output. Green shows the path of recovery if the series has a unit root. Blue shows the recovery if there is no unit root and the series is trend stationary. The blue line returns to meet and follow the dashed trend line while the green line remains permanently below the trend. The unit root hypothesis also holds that a spike in output will lead to levels of output higher than the past trend.]] +Economists debate whether various economic statistics, especially [[economic output|output]], have a unit root or are [[trend stationary]].<ref name=econbrowser>{{cite web|url=http://www.econbrowser.com/archives/2009/03/trend_stationar.html |title=Trend Stationarity/Difference Stationarity over the (Very) Long Run |date=March 13, 2009 |publisher=Econbrowser }}</ref><ref>{{cite news|last=Krugman |first= Paul |title=Roots of evil (wonkish) |date =March 3, 2009 |url=http://krugman.blogs.nytimes.com/2009/03/03/roots-of-evil-wonkish/ |newspaper=The New York Times |authorlink=Paul Krugman}}</ref><ref>{{cite web|url=http://econlog.econlib.org/archives/2009/03/greg_mankiw_get.html |title=Greg Mankiw Gets Technical |publisher=Library of Economics and Liberty |date=March 3, 2009 |accessdate=2012-06-23}}</ref><ref>{{cite web|last=Verdon|first=Steve|title=Economic Cage Match: Mankiw vs. Krugman|url=http://www.outsidethebeltway.com/economic_cage_match_mankiw_vs_krugman/|publisher=Outside the Beltway|date=March 11, 2009}}</ref> A unit root process with drift is given in the first-order case by + +:<math>y_t = y_{t-1} + c + e_t</math> + +where ''c'' is a constant term referred to as the "drift" term, and <math>e_t</math> is white noise. Any non-zero value of the noise term, occurring for only one period, will permanently affect the value of <math>y_t</math> as shown in the graph, so deviations from the line <math>y_t = a + ct</math> are non-stationary; there is no reversion to any trend line. In contrast, a trend stationary process is given by + +:<math>y_t = k \cdot t + u_t</math> + +where ''k'' is the slope of the trend and <math>u_t</math> is noise (white noise in the simplest case; more generally, noise following its own stationary autoregressive process). Here any transient noise will not alter the long-run tendency for <math>y_t</math> to be on the trend line, as also shown in the graph. 
This process is said to be trend stationary because deviations from the trend line are stationary. + +The issue is particularly popular in the literature on business cycles.<ref>{{cite journal |last=Hegwood |first=Natalie |last2=Papell |first2=David H. |title=Are Real GDP Levels Trend, Difference, or Regime-Wise Trend Stationary? Evidence from Panel Data Tests Incorporating Structural Change |journal=Southern Economic Journal |volume=74 |issue=1 |year=2007 |pages=104–113 |jstor=20111955 }}</ref><ref>{{cite journal |last=Lucke |first=Bernd |authorlink=Bernd Lucke |title=Is Germany‘s GDP trend-stationary? A measurement-with-theory approach |journal=Jahrbücher für Nationalökonomie und Statistik |year=2005 |volume=225 |issue=1 |pages=60–76 |doi= |url=http://www.wiso-net.de/genios1.pdf?START=0A1&ANR=215850&DBN=ZECO&ZNR=1&ZHW=-4&WID=59162-3020953-72523_1 }}</ref> Research on the subject began with Nelson and Plosser whose paper on [[GNP]] and other output aggregates failed to reject the unit root hypothesis for these series.<ref>{{cite journal |last=Nelson |first=Charles R. |last2=Plosser |first2=Charles I. |year=1982 |title=Trends and Random Walks in Macroeconomic Time Series: Some Evidence and Implications |journal=[[Journal of Monetary Economics]] |volume=10 |issue=2 |pages=139–162 |doi=10.1016/0304-3932(82)90012-5 }}</ref> Since then, a debate—entwined with technical disputes on statistical methods—has ensued. Some economists<ref>[http://www.imf.org/external/pubs/ft/fandd/2009/09/blanchardindex.htm Olivier Blanchard] with the [[International Monetary Fund]] makes the claim that after a banking crisis "on average, output does not go back to its old trend path, but remains permanently below it."</ref> argue that [[GDP]] has a unit root or [[structural break]], implying that economic downturns result in permanently lower GDP levels in the long run. Other economists argue that GDP is trend-stationary: That is, when GDP dips below trend during a downturn it later returns to the level implied by the trend so that there is no permanent decrease in output. While the literature on the unit root hypothesis may consist of arcane debate on statistical methods, the hypothesis carries significant practical implications for economic forecasts and policies. + +==See also== +* [[Dickey–Fuller test]] +* [[Augmented Dickey–Fuller test]] +* [[Unit root test]] +* [[Phillips–Perron test]] +* [[Cointegration]], determining the relationship between two variables having unit roots +* Weighted symmetric unit root test (WS) +* Kwiatkowski, Phillips, Schmidt, Shin test, known as [[KPSS tests]] + +==Notes== +{{Reflist}} + +{{DEFAULTSORT:Unit Root}} +[[Category:Time series analysis]] +[[Category:Econometrics]] +[[Category:Regression with time series structure]] + hrc95n3bc2m9x5af8ykymc0m4fmi79l + + + + Drucker–Prager yield criterion + 0 + 22205 + + 22206 + 2013-09-19T16:03:00Z + + Erik Streb + 0 + + + corrected link + wikitext + text/x-wiki + [[Image:Drucker Prager Yield Surface 3Da.png|300px|right|thumb|Figure 1: View of Drucker–Prager yield surface in 3D space of principal stresses for <math>c=2, \phi=-20^\circ</math>]] + +The '''Drucker–Prager yield criterion'''<ref>Drucker, D. C. and Prager, W. (1952). ''Soil mechanics and plastic analysis for limit design''. Quarterly of Applied Mathematics, vol. 10, no. 2, pp. 157–165.</ref> is a pressure-dependent model for determining whether a material has failed or undergone plastic yielding. The criterion was introduced to deal with the plastic deformation of soils. 
It and its many variants have been applied to rock, concrete, polymers, foams, and other pressure-dependent materials. + +The [[Daniel C. Drucker|Drucker]]–[[William Prager|Prager]] yield criterion has the form +:<math> + \sqrt{J_2} = A + B~I_1 + </math> +where <math>I_1</math> is the [[Stress_(physics)#Principal_stresses_and_stress_invariants|first invariant]] of the [[Stress (physics)|Cauchy stress]] and <math>J_2</math> is the [[Stress_(physics)#Invariants_of_the_stress_deviator_tensor|second invariant]] of the [[Stress_(physics)#Stress_deviator_tensor|deviatoric]] part of the [[Stress (physics)|Cauchy stress]]. The constants <math>A, B </math> are determined from experiments. + +In terms of the [[von Mises stress|equivalent stress]] (or [[von Mises stress]]) and the [[hydrostatic stress|hydrostatic (or mean) stress]], the Drucker–Prager criterion can be expressed as +:<math> + \sigma_e = a + b~\sigma_m + </math> +where <math>\sigma_e</math> is the equivalent stress, <math>\sigma_m</math> is the hydrostatic stress, and +<math>a,b</math> are material constants. The Drucker–Prager yield criterion expressed in [[Yield_surface#Invariants_used_to_describe_yield_surfaces|Haigh–Westergaard coordinates]] is +:<math> + \tfrac{1}{\sqrt{2}}\rho - \sqrt{3}~B\xi = A + </math> + +The [[Yield surface#Drucker.E2.80.93Prager_yield_surface|Drucker–Prager yield surface]] is a smooth version of the [[Yield surface#Mohr.E2.80.93Coulomb_yield_surface|Mohr–Coulomb yield surface]]. + +== Expressions for A and B == +The Drucker–Prager model can be written in terms of the [[stress (physics)#Principal_stresses_and_stress_invariants|principal stresses]] as +:<math> + \sqrt{\cfrac{1}{6}\left[(\sigma_1-\sigma_2)^2+(\sigma_2-\sigma_3)^2+(\sigma_3-\sigma_1)^2\right]} = A + B~(\sigma_1+\sigma_2+\sigma_3) ~. + </math> +If <math>\sigma_t</math> is the yield stress in uniaxial tension, the Drucker–Prager criterion implies +:<math> + \cfrac{1}{\sqrt{3}}~\sigma_t = A + B~\sigma_t ~. + </math> +If <math>\sigma_c</math> is the yield stress in uniaxial compression, the Drucker–Prager criterion implies +:<math> + \cfrac{1}{\sqrt{3}}~\sigma_c = A - B~\sigma_c ~. + </math> +Solving these two equations gives +:<math> + A = \cfrac{2}{\sqrt{3}}~\left(\cfrac{\sigma_c~\sigma_t}{\sigma_c+\sigma_t}\right) ~;~~ B = \cfrac{1}{\sqrt{3}}~\left(\cfrac{\sigma_t-\sigma_c}{\sigma_c+\sigma_t}\right) ~. + </math> + +=== Uniaxial asymmetry ratio === +Different uniaxial yield stresses in tension and in compression are predicted by the Drucker–Prager model. The uniaxial asymmetry ratio for the Drucker–Prager model is +:<math> + \beta = \cfrac{\sigma_\mathrm{c}}{\sigma_\mathrm{t}} = \cfrac{1 - \sqrt{3}~B}{1 + \sqrt{3}~B} ~. + </math> + +=== Expressions in terms of cohesion and friction angle === +Since the Drucker–Prager [[yield surface]] is a smooth version of the [[Mohr-Coulomb theory|Mohr–Coulomb yield surface]], it is often expressed in terms of the cohesion (<math>c</math>) and the angle of internal friction (<math>\phi</math>) that are used to describe the [[Mohr-Coulomb theory|Mohr–Coulomb yield surface]]. 
If we assume that the Drucker–Prager yield surface '''circumscribes''' the Mohr–Coulomb yield surface then the expressions for <math>A</math> and <math>B</math> are
+:<math>
+ A = \cfrac{6~c~\cos\phi}{\sqrt{3}(3+\sin\phi)} ~;~~
+ B = \cfrac{2~\sin\phi}{\sqrt{3}(3+\sin\phi)}
+ </math>
+If the Drucker–Prager yield surface '''inscribes''' the Mohr–Coulomb yield surface then
+:<math>
+ A = \cfrac{6~c~\cos\phi}{\sqrt{3}(3-\sin\phi)} ~;~~
+ B = \cfrac{2~\sin\phi}{\sqrt{3}(3-\sin\phi)}
+ </math>
+:{| class="toccolours collapsible collapsed" width="90%" style="text-align:left"
+!Derivation of expressions for <math>A,B</math> in terms of <math>c,\phi</math>
+|-
+|The expression for the [[Mohr-Coulomb theory|Mohr–Coulomb yield criterion]] in [[Yield_surface#Invariants_used_to_describe_yield_surfaces|Haigh–Westergaard space]] is
+:<math>
+ \left[\sqrt{3}~\sin\left(\theta+\tfrac{\pi}{3}\right) - \sin\phi\cos\left(\theta+\tfrac{\pi}{3}\right)\right]\rho - \sqrt{2}\sin(\phi)\xi = \sqrt{6} c \cos\phi
+ </math>
+If we assume that the Drucker–Prager yield surface '''circumscribes''' the Mohr–Coulomb yield surface such that the two surfaces coincide at <math>\theta=\tfrac{\pi}{3}</math>, then at those points the Mohr–Coulomb yield surface can be expressed as
+:<math>
+ \left[\sqrt{3}~\sin\tfrac{2\pi}{3} - \sin\phi\cos\tfrac{2\pi}{3}\right]\rho - \sqrt{2}\sin(\phi)\xi = \sqrt{6} c \cos\phi
+ </math>
+or,
+:<math>
+ \tfrac{1}{\sqrt{2}}\rho - \cfrac{2\sin\phi}{3+\sin\phi}\xi = \cfrac{\sqrt{12} c \cos\phi}{3+\sin\phi} \qquad \qquad (1.1)
+ </math>
+
+The Drucker–Prager yield criterion expressed in [[Yield_surface#Invariants_used_to_describe_yield_surfaces|Haigh–Westergaard coordinates]] is
+:<math>
+ \tfrac{1}{\sqrt{2}}\rho - \sqrt{3}~B\xi = A \qquad \qquad (1.2)
+ </math>
+Comparing equations (1.1) and (1.2), we have
+:<math>
+ A = \cfrac{\sqrt{12} c \cos\phi}{3+\sin\phi} = \cfrac{6 c \cos\phi}{\sqrt{3}(3+\sin\phi)} ~;~~ B = \cfrac{2\sin\phi}{\sqrt{3}(3+\sin\phi)}
+ </math>
+These are the expressions for <math>A,B</math> in terms of <math>c,\phi</math>.
+
+On the other hand if the Drucker–Prager surface inscribes the Mohr–Coulomb surface, then matching the two surfaces at <math>\theta=0</math> gives
+:<math>
+ A = \cfrac{6 c \cos\phi}{\sqrt{3}(3-\sin\phi)} ~;~~ B = \cfrac{2\sin\phi}{\sqrt{3}(3-\sin\phi)}
+ </math>
+[[Image:MC DP Yield Surface 3Da.png|300px|left|thumb|Comparison of Drucker–Prager and Mohr–Coulomb (inscribed) yield surfaces in the <math>\pi</math>-plane for <math>c = 2, \phi = 20^\circ</math>]]
+[[Image:MC DP Yield Surface 3Db.png|300px|none|thumb|Comparison of Drucker–Prager and Mohr–Coulomb (circumscribed) yield surfaces in the <math>\pi</math>-plane for <math>c = 2, \phi = 20^\circ</math>]]
+|}
+{| border="0"
+|-
+| valign="bottom"|
+[[Image:Drucker Prager Yield Surface 3Db.png|300px|none|thumb|Figure 2: Drucker–Prager yield surface in the <math>\pi</math>-plane for <math>c = 2, \phi = 20^\circ</math>]]
+|
+|
+|valign="bottom"|
+[[Image:MC DP Yield Surface sig1sig2.png|300px|none|thumb|Figure 3: Trace of the Drucker–Prager and Mohr–Coulomb yield surfaces in the <math>\sigma_1-\sigma_2</math>-plane for <math>c = 2, \phi = 20^\circ</math>. Yellow = Mohr–Coulomb, Cyan = Drucker–Prager.]]
+|-
+|}
+
+== Drucker–Prager model for polymers ==
+The Drucker–Prager model has been used to model polymers such as [[polyoxymethylene]] and [[polypropylene]]{{Citation needed|date=September 2011}}.<ref>Abrate, S. (2008). ''Criteria for yielding or failure of cellular materials''.
Journal of Sandwich Structures and Materials, vol. 10. pp. 5–51.</ref> For [[polyoxymethylene]] the yield stress is a linear function of the pressure. However, [[polypropylene]] shows a quadratic pressure-dependence of the yield stress. + +== Drucker–Prager model for foams == +For foams, the GAZT model <ref>Gibson, L.J., [[M. F. Ashby|Ashby, M.F.]], Zhang, J. and Triantafilliou, T.C. (1989). ''Failure surfaces for cellular materials under multi-axial loads. I. Modeling''. International Journal of +Mechanical Sciences, vol. 31, no. 9, pp. 635–665.</ref> uses +:<math> + A = \pm \cfrac{\sigma_y}{\sqrt{3}} ~;~~ B = \mp \cfrac{1}{\sqrt{3}}~\left(\cfrac{\rho}{5~\rho_s}\right) + </math> +where <math>\sigma_{y}</math> is a critical stress for failure in tension or compression, <math>\rho</math> is the density of the foam, and <math>\rho_s</math> is the density of the base material. + +== Extensions of the isotropic Drucker–Prager model == +The Drucker–Prager criterion can also be expressed in the alternative form +:<math> + J_2 = (A + B~I_1)^2 = a + b~I_1 + c~I_1^2 ~. +</math> + +=== Deshpande–Fleck yield criterion === +The Deshpande–Fleck yield criterion<ref>V. S. Deshpande, and Fleck, N. A. (2001). ''Multi-axial yield behaviour of polymer foams.'' Acta Materialia, vol. 49, no. 10, pp. 1859–1866.</ref> for foams has the form given in above equation. The parameters <math>a, b, c</math> for the Deshpande–Fleck criterion are +:<math> + a = (1 + \beta^2)~\sigma_y^2 ~,~~ + b = 0 ~,~~ + c = -\cfrac{\beta^2}{3} +</math> +where <math>\beta</math> is a parameter<ref><math>\beta= \alpha/3</math> where <math>\alpha</math> is the +quantity used by Deshpande–Fleck</ref> that determines the shape of the yield surface, and <math>\sigma_y</math> is the yield stress in tension or compression. + +== Anisotropic Drucker–Prager yield criterion == +An anisotropic form of the Drucker–Prager yield criterion is the Liu–Huang–Stout yield criterion.<ref>Liu, C., Huang, Y., and Stout, M. G. (1997). ''On the asymmetric yield surface of plastically orthotropic materials: A phenomenological study.'' Acta Materialia, vol. 45, no. 6, pp. 
2397–2406</ref> This yield criterion is an extension of the [[Hill yield criteria|generalized Hill yield criterion]] and has the form +:<math> + \begin{align} + f := & \sqrt{F(\sigma_{22}-\sigma_{33})^2+G(\sigma_{33}-\sigma_{11})^2+H(\sigma_{11}-\sigma_{22})^2 + + 2L\sigma_{23}^2+2M\sigma_{31}^2+2N\sigma_{12}^2}\\ + & + I\sigma_{11}+J\sigma_{22}+K\sigma_{33} - 1 \le 0 + \end{align} + </math> + +The coefficients <math>F,G,H,L,M,N,I,J,K</math> are +:<math> + \begin{align} + F = & \cfrac{1}{2}\left[\Sigma_2^2 + \Sigma_3^2 - \Sigma_1^2\right] ~;~~ + G = \cfrac{1}{2}\left[\Sigma_3^2 + \Sigma_1^2 - \Sigma_2^2\right] ~;~~ + H = \cfrac{1}{2}\left[\Sigma_1^2 + \Sigma_2^2 - \Sigma_3^2\right] \\ + L = & \cfrac{1}{2(\sigma_{23}^y)^2} ~;~~ + M = \cfrac{1}{2(\sigma_{31}^y)^2} ~;~~ + N = \cfrac{1}{2(\sigma_{12}^y)^2} \\ + I = & \cfrac{\sigma_{1c}-\sigma_{1t}}{2\sigma_{1c}\sigma_{1t}} ~;~~ + J = \cfrac{\sigma_{2c}-\sigma_{2t}}{2\sigma_{2c}\sigma_{2t}} ~;~~ + K = \cfrac{\sigma_{3c}-\sigma_{3t}}{2\sigma_{3c}\sigma_{3t}} + \end{align} + </math> +where +:<math> + \Sigma_1 := \cfrac{\sigma_{1c}+\sigma_{1t}}{2\sigma_{1c}\sigma_{1t}} ~;~~ + \Sigma_2 := \cfrac{\sigma_{2c}+\sigma_{2t}}{2\sigma_{2c}\sigma_{2t}} ~;~~ + \Sigma_3 := \cfrac{\sigma_{3c}+\sigma_{3t}}{2\sigma_{3c}\sigma_{3t}} + </math> +and <math>\sigma_{ic}, i=1,2,3</math> are the uniaxial yield stresses in '''compression''' in the three principal directions of anisotropy, <math>\sigma_{it}, i=1,2,3</math> are the uniaxial yield stresses in '''tension''', and <math>\sigma_{23}^y, \sigma_{31}^y, \sigma_{12}^y</math> are the yield stresses in pure shear. It has been assumed in the above that the quantities <math>\sigma_{1c},\sigma_{2c},\sigma_{3c}</math> are positive and <math>\sigma_{1t},\sigma_{2t},\sigma_{3t}</math> are negative. + +== The Drucker yield criterion == +The Drucker–Prager criterion should not be confused with the earlier Drucker criterion <ref>Drucker, D. C. (1949) '' Relations of experiments to mathematical theories of plasticity'', Journal of Applied Mechanics, vol. 16, pp. 349–357.</ref> which is independent of the pressure (<math>I_1</math>). The Drucker yield criterion has the form +:<math> + f := J_2^3 - \alpha~J_3^2 - k^2 \le 0 + </math> +where <math>J_2</math> is the second invariant of the deviatoric stress, <math>J_3</math> is the third invariant of the deviatoric stress, <math>\alpha</math> is a constant that lies between -27/8 and 9/4 (for the yield surface to be convex), <math>k</math> is a constant that varies with the value of <math>\alpha</math>. For <math>\alpha=0</math>, <math>k^2 = \cfrac{\sigma_y^6}{27}</math> where <math>\sigma_y</math> is the yield stress in uniaxial tension. + +== Anisotropic Drucker Criterion == +An anisotropic version of the Drucker yield criterion is the Cazacu–Barlat (CZ) yield criterion <ref>Cazacu, O. and Barlat, F. (2001). ''Generalization of Drucker's yield criterion to orthotropy.'' Mathematics and Mechanics of Solids, vol. 6, no. 6, pp. 
613–630.</ref> which has the form +:<math> + f := (J_2^0)^3 - \alpha~(J_3^0)^2 - k^2 \le 0 + </math> +where <math>J_2^0, J_3^0</math> are generalized forms of the deviatoric stress and are defined as +:<math> + \begin{align} + J_2^0 := & \cfrac{1}{6}\left[a_1(\sigma_{22}-\sigma_{33})^2+a_2(\sigma_{33}-\sigma_{11})^2 +a_3(\sigma_{11}-\sigma_{22})^2\right] + a_4\sigma_{23}^2 + a_5\sigma_{31}^2 + a_6\sigma_{12}^2 \\ + J_3^0 := & \cfrac{1}{27}\left[(b_1+b_2)\sigma_{11}^3 +(b_3+b_4)\sigma_{22}^3 + \{2(b_1+b_4)-(b_2+b_3)\}\sigma_{33}^3\right] \\ + & -\cfrac{1}{9}\left[(b_1\sigma_{22}+b_2\sigma_{33})\sigma_{11}^2+(b_3\sigma_{33}+b_4\sigma_{11})\sigma_{22}^2 + + \{(b_1-b_2+b_4)\sigma_{11}+(b_1-b_3+b_4)\sigma_{22}\}\sigma_{33}^2\right] \\ + & + \cfrac{2}{9}(b_1+b_4)\sigma_{11}\sigma_{22}\sigma_{33} + 2 b_{11}\sigma_{12}\sigma_{23}\sigma_{31}\\ + & - \cfrac{1}{3}\left[\{2b_9\sigma_{22}-b_8\sigma_{33}-(2b_9-b_8)\sigma_{11}\}\sigma_{31}^2+ + \{2b_{10}\sigma_{33}-b_5\sigma_{22}-(2b_{10}-b_5)\sigma_{11}\}\sigma_{12}^2 \right.\\ + & \qquad \qquad\left. \{(b_6+b_7)\sigma_{11} - b_6\sigma_{22}-b_7\sigma_{33}\}\sigma_{23}^2 + \right] + \end{align} + </math> + +=== Cazacu–Barlat yield criterion for plane stress === +For thin sheet metals, the state of stress can be approximated as [[plane stress]]. In that case the Cazacu–Barlat yield criterion reduces to its two-dimensional version with +:<math> + \begin{align} + J_2^0 = & \cfrac{1}{6}\left[(a_2+a_3)\sigma_{11}^2+(a_1+a_3)\sigma_{22}^2-2a_3\sigma_1\sigma_2\right]+ a_6\sigma_{12}^2 \\ + J_3^0 = & \cfrac{1}{27}\left[(b_1+b_2)\sigma_{11}^3 +(b_3+b_4)\sigma_{22}^3 \right] + -\cfrac{1}{9}\left[b_1\sigma_{11}+b_4\sigma_{22}\right]\sigma_{11}\sigma_{22} + + \cfrac{1}{3}\left[b_5\sigma_{22}+(2b_{10}-b_5)\sigma_{11}\right]\sigma_{12}^2 + \end{align} + </math> + +For thin sheets of metals and alloys, the parameters of the Cazacu–Barlat yield criterion are +{| border="1" +|+ Table 1. '''Cazacu–Barlat yield criterion parameters for sheet metals and alloys''' +! Material !! <math>a_1</math> !! <math>a_2</math> !! <math>a_3</math> !! <math>a_6</math> !! <math>b_1</math> !! <math>b_2</math> !! <math>b_3</math> !! <math>b_4</math> !! <math>b_5</math> !! <math>b_{10}</math> !! <math>\alpha</math> +|- +! 6016-T4 Aluminum Alloy +| 0.815 || 0.815 || 0.334 || 0.42 || 0.04 || -1.205 || -0.958 || 0.306 || 0.153 || -0.02 || 1.4 +|- +! 2090-T3 Aluminum Alloy +| 1.05 || 0.823 || 0.586 || 0.96 || 1.44 || 0.061 || -1.302 || -0.281 || -0.375 || 0.445 || 1.285 +|} + +== See also == +{{Continuum mechanics|cTopic=[[Solid mechanics]]}} +*[[Yield surface]] +*[[Yield (engineering)]] +*[[Plasticity (physics)]] +*[[Material failure theory]] +*[[Daniel C. Drucker]] +*[[William Prager]] + +== References == +<references/> + +{{DEFAULTSORT:Drucker-Prager yield criterion}} +[[Category:Plasticity]] +[[Category:Soil mechanics]] +[[Category:Solid mechanics]] +[[Category:Yield criteria]] + 5l788gzx8snh2177nfuo81azb0gyi55 + + + + Antiplane shear + 0 + 23060 + + 23061 + 2013-08-19T23:43:42Z + + Bbanerje + 0 + + /* See also */ + wikitext + text/x-wiki + '''Antiplane shear''' or '''antiplane strain'''<ref>W. S. Slaughter, 2002, ''The Linearized Theory of Elasticity'', Birkhauser</ref> is a special state of [[Deformation (mechanics)|strain]] in a body. This state of strain is achieved when the [[displacement field (mechanics)|displacement]]s in the body are zero in the plane of interest but nonzero in the direction perpendicular to the plane. 
For small strains, the [[strain tensor]] under antiplane shear can be written as + +: <math>\boldsymbol{\varepsilon} = \begin{bmatrix} +0 & 0 & \epsilon_{13} \\ +0 & 0 & \epsilon_{23}\\ + \epsilon_{13} & \epsilon_{23} & 0\end{bmatrix}</math> +where the <math>12\,</math> plane is the plane of interest and the <math>3\,</math> direction is perpendicular to that plane. + +== Displacements == +The displacement field that leads to a state of antiplane shear is (in rectangular Cartesian coordinates) +:<math> + u_1 = u_2 = 0 ~;~~ u_3 = \hat{u}_3(x_1, x_2) + </math> +where <math>u_i,~ i=1,2,3</math> are the displacements in the <math>x_1, x_2, x_3\,</math> directions. + +== Stresses == +For an [[isotropic]], [[Deformation (engineering)#Elastic_deformation|linear elastic]] material, the [[stress (physics)|stress]] tensor that results from a state of antiplane shear can be expressed as +:<math> + \boldsymbol{\sigma} \equiv + \begin{bmatrix} + \sigma_{11} & \sigma_{12} & \sigma_{13} \\ + \sigma_{12} & \sigma_{22} & \sigma_{23} \\ + \sigma_{13} & \sigma_{23} & \sigma_{33} + \end{bmatrix} = + \begin{bmatrix} 0 & 0 & \mu~\cfrac{\partial u_3}{\partial x_1} \\ + 0 & 0 & \mu~\cfrac{\partial u_3}{\partial x_2} \\ + \mu~\cfrac{\partial u_3}{\partial x_1} & \mu~\cfrac{\partial u_3}{\partial x_2} & 0 \end{bmatrix} + </math> +where <math>\mu\,</math> is the shear modulus of the material. + +== Equilibrium equation for antiplane shear == +The conservation of linear momentum in the absence of inertial forces takes the form of the '''equilibrium equation'''. For general states of stress there are three equilibrium equations. However, for antiplane shear, with the assumption that body forces in the 1 and 2 directions are 0, these reduce to one equilibrium equation which is expressed as +:<math> + \mu~\nabla^2 u_3 + b_3(x_1, x_2) = 0 + </math> +where <math>b_3</math> is the body force in the <math>x_3</math> direction and <math>\nabla^2 u_3 = \cfrac{\partial^2 u_3}{\partial x_1^2} + \cfrac{\partial^2 u_3}{\partial x_2^2}</math>. Note that this equation is valid only for infinitesimal strains. + +== Applications == +The antiplane shear assumption is used to determine the stresses and displacements due to a [[screw dislocation]]. + +== References == +<references /> + +== See also == +*[[Infinitesimal strain theory]] +*[[Deformation (mechanics)]] + +[[Category:Elasticity (physics)]] +[[Category:Solid mechanics]] + bqtqqsgpn83ysfdaw6n42cqqhe9rivo + + + + Table of thermodynamic equations + 0 + 15871 + + 15872 + 2013-12-08T16:50:50Z + + 2605:A000:D420:B01:906F:3C15:187F:F359 + + /* Kinetic theory */ + wikitext + text/x-wiki + {{Thermodynamics|cTopic=[[Thermodynamic equations|Equations]]}} +{{For|list of mathematical notation used in these equations|mathematical notation}} +{{main|List of thermodynamic properties}} + +This article is summary of common [[equation]]s and [[physical quantity|quantities]] in [[thermodynamics]] (see [[thermodynamic equations]] for more elaboration). SI units are used for [[absolute temperature]], not celsius or fahrenheit. + +==Definitions== + +{{hatnote|Main articles: [[List of thermodynamic properties]], [[Thermodynamic potential]], [[Free entropy]], [[Defining equation (physical chemistry)]]}} + +Many of the definitions below are also used in the thermodynamics of [[chemical reaction]]s. + +===General basic quantities=== + +:{| class="wikitable" +|- +! scope="col" width="200" | Quantity (Common Name/s) +! scope="col" width="125" | (Common) Symbol/s +! scope="col" width="125" | SI Units +! 
scope="col" width="100" | Dimension +|- +!Number of molecules +| ''y' ' +| dimensionless +| dimensionless +|- +!Number of moles +|''n'' +| mol +| [N] +|- +![[Temperature]] +| ''T'' +| K +| [Θ] +|- +![[Heat|Heat Energy]] +| ''Q, q'' +| J +| [M][L]<sup>2</sup>[T]<sup>−2</sup> +|- +![[Latent Heat]] +| ''Q<sub>L</sub>'' +| J +| [M][L]<sup>2</sup>[T]<sup>−2</sup> +|- +|} + +===General derived quantities=== + +:{| class="wikitable" +|- +! scope="col" width="200" | Quantity (Common Name/s) +! scope="col" width="125" | (Common) Symbol/s +! scope="col" width="200" | Defining Equation +! scope="col" width="125" | SI Units +! scope="col" width="100" | Dimension +|- +![[Thermodynamic beta]], Inverse temperature +|| ''β'' +||<math> \beta = 1/k_B T \,\!</math> +|| J<sup>−1</sup> +|| [T]<sup>2</sup>[M]<sup>−1</sup>[L]<sup>−2</sup> +|- +![[Entropy]] +| ''S'' +| <math>S = -k_B\sum_i p_i\ln p_i</math> +| J K<sup>−1</sup> +| [M][L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> +|- +![[Negentropy]] +| ''J'' +| +| J K<sup>−1</sup> +| [M][L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> +|- +![[Internal Energy]] +| ''U'' +|<math>U = \sum_i E_i \!</math> +| J +| [M][L]<sup>2</sup>[T]<sup>−2</sup> +|- +![[Enthalpy]] +| ''H'' +|<math> H = U+pV\,\!</math> +| J +| [M][L]<sup>2</sup>[T]<sup>−2</sup> +|- +![[Partition function (statistical mechanics)|Partition Function]] +| ''Z'' +| +| dimensionless +| dimensionless +|- +![[Gibbs free energy]] +| ''G'' +|<math> G = H - TS \,\!</math> +| J +| [M][L]<sup>2</sup>[T]<sup>−2</sup> +|- +![[Chemical potential]] (of +component ''i'' in a mixture) +| ''μ<sub>i</sub>'' +|<math> \mu_i = \left (\partial U/\partial N_i \right )_{N_{i \neq j}, S, V } \,\!</math> +(''N<sub>i</sub>'', ''S'', ''V'' must all be constant) +| J +| [M][L]<sup>2</sup>[T]<sup>−2</sup> +|- +![[Helmholtz free energy]] +| ''A, F'' +|<math> F = U - TS \,\!</math> +| J +| [M][L]<sup>2</sup>[T]<sup>−2</sup> +|- +![[Grand potential|Landau potential]], Landau Free Energy +| ''Ω'' +|<math> \Omega = U - TS - \mu N\,\!</math> +| J +| [M][L]<sup>2</sup>[T]<sup>−2</sup> +|- +![[Grand potential]] +| ''Φ<sub>G</sub>'' +|<math> \Phi_G = U - TS - \mu N \,\!</math> +| J +| [M][L]<sup>2</sup>[T]<sup>−2</sup> +|- +!Massieu Potential, Helmholtz [[free entropy]] +| ''Φ'' +|<math> \Phi = S - U/T \,\!</math> +| J K<sup>−1</sup> +| [M][L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> +|- +!Planck potential, Gibbs [[free entropy]] +| ''Ξ'' +|<math> \Xi = \Phi - pV/T \,\!</math> +| J K<sup>−1</sup> +| [M][L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> +|- +|} + +===Thermal properties of matter=== + +{{hatnote|Main Articles: ''[[Heat capacity]]'', ''[[Thermal expansion]]''}} + +:{| class="wikitable" +|- +! scope="col" width="100" | Quantity (common name/s) +! scope="col" width="100" | (Common) symbol/s +! scope="col" width="300" | Defining equation +! scope="col" width="125" | SI units +! 
scope="col" width="100" | Dimension +|- +!General heat/thermal capacity +|| ''C'' +||<math> C = \partial Q/\partial T\,\!</math> +|| J K <sup>−1</sup> +|| [M][L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> +|- +!Heat capacity (isobaric) +|| ''C<sub>p</sub>'' +||<math> C_{p} = \partial H/\partial T\,\!</math> +|| J K <sup>−1</sup> +|| [M][L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> +|- +!Specific heat capacity (isobaric) +|| ''C<sub>mp</sub>'' +||<math> C_{mp} = \partial^2 Q/\partial m \partial T \,\!</math> +|| J kg<sup>−1</sup> K<sup>−1</sup> +| [L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> +|- +!Molar specific heat capacity (isobaric) +|| ''C<sub>np</sub>'' +||<math>C_{np} = \partial^2 Q/\partial n \partial T \,\!</math> +|| J K <sup>−1</sup> mol<sup>−1</sup> +|| [M][L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> [N]<sup>−1</sup> +|- +!Heat capacity (isochoric/volumetric) +|| ''C<sub>V</sub>'' +||<math> C_{V} = \partial Q/\partial T \,\!</math> +|| J K <sup>−1</sup> +|| [M][L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> +|- +!Specific heat capacity (isochoric) +|| ''C<sub>mV</sub>'' +||<math> C_{mV} = \partial^2 Q/\partial m \partial T \,\!</math> +|| J kg<sup>−1</sup> K<sup>−1</sup> +|| [L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> +|- +!Molar specific heat capacity (isochoric) +|| ''C<sub>nV</sub>'' +||<math> C_{nV} = \partial^2 Q/\partial n \partial T \,\!</math> +|| J K <sup>−1</sup> mol<sup>−1</sup> +|| [M][L]<sup>2</sup>[T]<sup>−2</sup> [Θ]<sup>−1</sup> [N]<sup>−1</sup> +|- +!Specific latent heat +|| ''L'' +||<math>L = \partial Q/ \partial m \,\!</math> +|| J kg<sup>−1</sup> +|| [L]<sup>2</sup>[T]<sup>−2</sup> +|- +!Ratio of isobaric to isochoric heat capacity, [[heat capacity ratio]], adiabatic index +|| ''γ'' +||<math>\gamma = C_p/C_V = c_p/c_V = C_{mp}/C_{mV} \,\!</math> +|| dimensionless +|| dimensionless +|- +|} + +===Thermal transfer=== + +{{Main|Thermal conductivity}} + +:{| class="wikitable" +|- +! scope="col" width="100" | Quantity (common name/s) +! scope="col" width="100" | (Common) symbol/s +! scope="col" width="300" | Defining equation +! scope="col" width="125" | SI units +! scope="col" width="100" | Dimension +|- +![[Temperature gradient]] +|| No standard symbol +||<math> \nabla T \,\!</math> +|| K m<sup>−1</sup> +|| [Θ][L]<sup>−1</sup> +|- +!Thermal conduction rate, thermal current, thermal/[[heat flux]], thermal power transfer +|| ''P'' +||<math>P = \mathrm{d} Q/\mathrm{d} t \,\!</math> +|| W = J s<sup>−1</sup> +|| [M] [L]<sup>2</sup> [T]<sup>−3</sup> +|- +!Thermal intensity +|| ''I'' +||<math>I = \mathrm{d} P/\mathrm{d} A </math> +|| W m<sup>−2</sup> +|| [M] [T]<sup>−3</sup> +|- +!Thermal/heat flux density (vector analogue of thermal intensity above) +|| '''q''' +||<math>Q = \iint \mathbf{q} \cdot \mathrm{d}\mathbf{S}\mathrm{d} t \,\!</math> +|| W m<sup>−2</sup> +|| [M] [T]<sup>−3</sup> +|- +|} + +==Equations== + +The equations in this article are classified by subject. + +===Phase transitions=== + +:{| class="wikitable" +|- +! scope="col" width="100" | Physical situation +! 
scope="col" width="10" | Equations +|- +!Adiabatic transition +|<math>\Delta Q = 0, \quad \Delta U = -\Delta W\,\!</math> +|- +!Isothermal transition +|<math>\Delta U = 0, \quad \Delta W = \Delta Q \,\!</math> + +For an ideal gas<br /> +<math>W=kTN \ln(V_2/V_1)\,\!</math> +|- +!Isobaric transition +|''p''<sub>1</sub> = ''p''<sub>2</sub>, ''p'' = constant <br /> +<math>\Delta W = p \Delta V, \quad \Delta Q = \Delta U + p \delta V\,\!</math> +|- +!Isochoric transition +|''V''<sub>1</sub> = ''V''<sub>2</sub>, ''V'' = constant <br /> +<math>\Delta W = 0, \quad \Delta Q = \Delta U\,\!</math> +|- +!Adiabatic expansion +|<math>p_1 V_1^{\gamma} = p_2 V_2^{\gamma}\,\!</math><br /> +<math>T_1 V_1^{\gamma - 1} = T_2 V_2^{\gamma - 1} \,\!</math> +|- +!Free expansion +|<math>\Delta U = 0\,\!</math> +|- +!Work done by an expanding gas +|Process<br /> +<math> \Delta W = \int_{V_1}^{V_2} p \mathrm{d}V \,\!</math> + +Net Work Done in Cyclic Processes<br /> +<math> \Delta W = \oint_\mathrm{cycle} p \mathrm{d}V \,\!</math> +|- +|} + +===Kinetic theory=== + +:{| class="wikitable" +|+Ideal gas equations +! scope="col" width="100" | Physical situation +! scope="col" width="250" | Nomenclature +! scope="col" width="10" | Equations +|- +! Ideal gas law +| <div class="plainlist"> +*''p'' = pressure +*''V'' = volume of container +*''T'' = temperature +*''n'' = number of moles +*''R'' = [[Gas constant]] +*''N'' = number of molecules +*''k'' = Boltzmann’s constant +</div> +|<math>pV = nRT = kTN\,\!</math><br /> +<math>\frac{p_1 V_1}{p_2 V_2} = \frac{n_1 T_1}{n_2 T_2} = \frac{N_1 T_1}{N_2 T_2} \,\!</math> +|- +! Pressure of an ideal gas +| <div class="plainlist"> +*''m'' = mass of ''one'' molecule +*''M<sub>m</sub>'' = molar mass +</div> +| <math>p = \frac{Nm \langle v^2 \rangle}{3V} = \frac{nM_m \langle v^2 \rangle}{3V} = \frac{1}{3}\rho \langle v^2 \rangle \,\!</math> +|- +|} + +==== Ideal gas ==== + +:{| class="wikitable" +|- +! Quantity +! General Equation +! Isobaric<br />Δ''p'' = 0 +! Isochoric<br />Δ''V'' = 0 +! Isothermal<br />Δ''T'' = 0 +! Adiabatic<br /><math>Q=0</math> +|- +! Work <br /> ''W'' +| align="center" | <math> \delta W = p dV\;</math> +| align="center" | <math>p\Delta V\;</math> +| align="center" | <math>0\;</math> +| align="center" | <math>nRT\ln\frac{V_2}{V_1}\;</math> +| align="center" | <math>\frac{PV^\gamma (V_f^{1-\gamma} - V_i^{1-\gamma}) } {1-\gamma} = C_V \left(T_1 - T_2 \right)</math> +|- +! Heat Capacity <br /> ''C'' +| align="center" | (as for real gas) +| align="center" | <math>C_p = \frac{5}{2}nR\;</math><br>(for monatomic ideal gas) +| align="center" | <math>C_V = \frac{3}{2}nR \;</math><br>(for monatomic ideal gas) +|| +|| +|- +! Internal Energy <br /> Δ''U'' +| align="center" | <math>\Delta U = C_v \Delta T\;</math> +| align="center" | <math>Q - W\;</math><br><br><math>Q_p - p\Delta V\;</math> +| align="center" | <math>Q\;</math><br><br><math>C_V\left ( T_2-T_1 \right )\;</math> +| align="center" | <math>0\;</math><br><br><math>Q=-W\;</math> +| align="center" | <math>-W\;</math><br><br><math>C_V\left ( T_2-T_1 \right )\;</math> +|- +! Enthalpy <br /> Δ''H'' +| align="center" | <math>H=U+pV\;</math> +| align="center" | <math>C_p\left ( T_2-T_1 \right )\;</math> +| align="center" | <math>Q_V+V\Delta p\;</math> +| align="center" | <math>0\;</math> +| align="center" | <math>C_p\left ( T_2-T_1 \right )\;</math> +|- +! 
Entropy <br /> Δ''S'' +| align="center" | <math>\Delta S = C_v \ln{T_2 \over T_1} + R \ln{V_2 \over V_1}</math><br><math>\Delta S = C_p \ln{T_2 \over T_1} - R \ln{p_2 \over p_1}</math><ref>Keenan, ''Thermodynamics'', Wiley, New York, 1947</ref> +| align="center" | <math>C_p\ln\frac{T_2}{T_1}\;</math> +| align="center" | <math>C_V\ln\frac{T_2}{T_1}\;</math> +| align="center" | <math>nR\ln\frac{V_2}{V_1}\;</math><br><math>\frac{Q}{T}\;</math> +| align="center" | <math>C_p\ln\frac{V_2}{V_1}+C_V\ln\frac{p_2}{p_1}=0\;</math> +|- +! Constant +| <math>\;</math> +| align="center" | <math>\frac{V}{T}\;</math> +| align="center" | <math>\frac{p}{T}\;</math> +| align="center" | <math>p V\;</math> +| align="center" | <math>p V^\gamma\;</math> +|} + +===Entropy=== + +*<math> S = k_B (\ln \Omega) </math>, where ''k<sub>B</sub>'' is the [[Boltzmann constant]], and Ω denotes the volume of [[macrostate]] in the [[phase space]] or otherwise called thermodynamic probability. + +*<math> dS = \frac{\delta Q}{T} </math>, for reversible processes only + +===Statistical physics=== + +Below are useful results from the [[Maxwell-Boltzmann distribution]] for an ideal gas, and the implications of the Entropy quantity. The distribution is valid for atoms or molecules constituting ideal gases. + +:{| class="wikitable" +|- +! scope="col" width="100" | Physical situation +! scope="col" width="250" | Nomenclature +! scope="col" width="10" | Equations +|- +!Maxwell–Boltzmann distribution +| <div class="plainlist"> +*''v'' = velocity of atom/molecule, +*''m'' = mass of each molecule (all molecules are identical in kinetic theory), +*''γ''(''p'') = Lorentz factor as function of momentum (see below) +*Ratio of thermal to rest mass-energy of each molecule:<math>\theta = k_B T/mc^2 \,\!</math> +</div"> + +''K''<sub>2</sub> is the Modified [[Bessel function]] of the second kind. +|Non-relativistic speeds<br /> +<math>P\left ( v \right )=4\pi\left ( \frac{m}{2\pi k_B T} \right )^{3/2} v^2 e^{-mv^2/2 k_B T} \,\!</math> + +Relativistic speeds (Maxwell-Juttner distribution)<br /> +<math> f(p) = \frac{1}{4 \pi m^3 c^3 \theta K_2(1/\theta)} e^{-\gamma(p)/\theta}</math> +|- +!Entropy [[logarithmic scale|Logarithm]] of the [[density of states]] +|<div class="plainlist"> +* ''P<sub>i</sub>'' = probability of system in microstate ''i'' +* Ω = total number of microstates +</div> +|<math>S = - k_B\sum_i P_i \ln P_i = k_\mathrm{B}\ln \Omega\,\!</math> + +where:<br /> +<math>P_i = 1/\Omega\,\!</math> +|- +!Entropy change +| +|<math>\Delta S = \int_{Q_1}^{Q_2} \frac{\mathrm{d}Q}{T} \,\!</math><br /> +<math>\Delta S = k_B N \ln\frac{V_2}{V_1} + N C_V \ln\frac{T_2}{T_1} \,\!</math> +|- +!Entropic force +| +|<math>\mathbf{F}_\mathrm{S} = -T \nabla S \,\!</math> +|- +!Equipartition theorem +| <div class="plainlist"> +*''d<sub>f</sub>'' = degree of freedom +</div> +| Average kinetic energy per degree of freedom +<math> \langle E_\mathrm{k} \rangle = \frac{1}{2}kT\,\!</math> + +Internal energy +<math> U = d_f \langle E_\mathrm{k} \rangle = \frac{d_f}{2}kT\,\!</math> +|- +|} + +Corollaries of the non-relativistic Maxwell-Boltzmann distribution are below. + +:{| class="wikitable" +|- +! scope="col" width="100" | Physical situation +! scope="col" width="250" | Nomenclature +! 
scope="col" width="10" | Equations +|- +!Mean speed +| +|<math> \langle v \rangle = \sqrt{\frac{8 k_B T}{\pi m}}\,\!</math> +|- +!Root mean square speed +| +| <math> v_\mathrm{rms} = \sqrt{\langle v^2 \rangle} = \sqrt{\frac{3k_B T}{m}} \,\!</math> +|- +!Modal speed +| +|<math> v_\mathrm{mode} = \sqrt{\frac{2k_B T}{m}}\,\!</math> +|- +![[Mean free path]] +|<div class="plainlist"> +*''σ'' = Effective cross-section +*''n'' = Volume density of number of target particles +*''{{ell}}'' = Mean free path +</div> +|<math>\ell = 1/\sqrt{2} n \sigma \,\!</math> +|- +|} + +===Quasi-static and reversible processes=== + +For [[Quasistatic process|quasi-static]] and [[Reversible process (thermodynamics)|reversible]] processes, the [[first law of thermodynamics]] is: + +:<math>dU=\delta Q - \delta W</math> + +where δ''Q'' is the heat supplied ''to'' the system and δ''W'' is the work done ''by'' the system. + +===Thermodynamic potentials=== +{{main|Thermodynamic potentials}} +{{See also|Maxwell relations}} + +The following energies are called the [[thermodynamic potentials]], + +:{{table of thermodynamic potentials}} + +and the corresponding [[fundamental thermodynamic relation]]s or "master equations"<ref>Physical chemistry, P.W. Atkins, Oxford University Press, 1978, ISBN 0 19 855148 7</ref> are: + +:{| class="wikitable" +|- +! Potential +! Differential +|- +! Internal energy +|<math>dU\left(S,V,{n_{i}}\right) = TdS - pdV + \sum_{i} \mu_{i} dN_i</math> +|- +! Enthalpy +|<math>dH\left(S,p,n_{i}\right) = TdS + Vdp + \sum_{i} \mu_{i} dN_{i}</math> +|- +!Helmholtz free energy +|<math>dF\left(T,V,n_{i}\right) = -SdT - pdV + \sum_{i} \mu_{i} dN_{i}</math> +|- +!Gibbs free energy +|<math>dG\left(T,p,n_{i}\right) = -SdT + Vdp + \sum_{i} \mu_{i} dN_{i}</math> +|- +|} + +===Maxwell's relations=== + +The four most common [[Maxwell's relations]] are: + +:{| class="wikitable" +|- +! scope="col" width="100" | Physical situation +! scope="col" width="250" | Nomenclature +! scope="col" width="10" | Equations +|- +!Thermodynamic potentials as functions of their natural variables +|<div class="plainlist"> +*<math>U(S,V)\,</math> = [[Internal energy]] +*<math>H(S,P)\,</math> = [[Enthalpy]] +*<math>F(T,V)\,</math> = [[Helmholtz free energy]] +*<math>G(T,P)\,</math> = [[Gibbs free energy]] +</div> +|<math> \left(\frac{\partial T}{\partial V}\right)_S = -\left(\frac{\partial P}{\partial S}\right)_V = \frac{\partial^2 U }{\partial S \partial V} </math> + +<math> \left(\frac{\partial T}{\partial P}\right)_S = +\left(\frac{\partial V}{\partial S}\right)_P = \frac{\partial^2 H }{\partial S \partial P} +</math> + +<math> +\left(\frac{\partial S}{\partial V}\right)_T = \left(\frac{\partial P}{\partial T}\right)_V = - \frac{\partial^2 F }{\partial T \partial V} </math> + +<math> -\left(\frac{\partial S}{\partial P}\right)_T = \left(\frac{\partial V}{\partial T}\right)_P = \frac{\partial^2 G }{\partial T \partial P} </math> +|- +|} + +More relations include the following. 
+ +:{| class="wikitable" +|- +|<math> \left ( {\partial S\over \partial U} \right )_{V,N} = { 1\over T } </math> +|<math> \left ( {\partial S\over \partial V} \right )_{N,U} = { p\over T } </math> +|<math> \left ( {\partial S\over \partial N} \right )_{V,U} = - { \mu \over T } </math> +|- +|<math> \left ( {\partial T\over \partial S} \right )_V = { T \over C_V } </math> +|<math> \left ( {\partial T\over \partial S} \right )_P = { T \over C_P } </math> +| +|- +|<math> -\left ( {\partial p\over \partial V} \right )_T = { 1 \over {VK_T} } </math> +| +| +|- +|} + +Other differential equations are: + +:{| class="wikitable" +|- +! Name +! ''H'' +! ''U'' +! ''G'' +|- +![[Gibbs–Helmholtz equation]] +|<math>H = -T^2\left(\frac{\partial \left(G/T\right)}{\partial T}\right)_p</math> +|<math>U = -T^2\left(\frac{\partial \left(F/T\right)}{\partial T}\right)_V</math> +|<math>G = -V^2\left(\frac{\partial \left(F/V\right)}{\partial V}\right)_T</math> +|- +| +|<math>\left(\frac{\partial H}{\partial p}\right)_T = V - T\left(\frac{\partial V}{\partial T}\right)_P</math> +|<math>\left(\frac{\partial U}{\partial V}\right)_T = T\left(\frac{\partial P}{\partial T}\right)_V - P</math> +| +|- +|} + +===Quantum properties=== + +*<math> U = N k_B T^2 \left(\frac{\partial \ln Z}{\partial T}\right)_V ~</math> + +*<math> S = \frac{U}{T} + N * ~ S = \frac{U}{T} + N k_B \ln Z - N k \ln N + Nk ~</math> Indistinguishable Particles + +where ''N'' is number of particles, ''h'' is [[Planck's constant]], ''I'' is [[moment of inertia]], and ''Z'' is the [[partition function (statistical mechanics)|partition function]], in various forms: + +:{| class="wikitable" +|- +!Degree of freedom +!Partition function +|- +!Translation +|<math> Z_t = \frac{(2 \pi m k_B T)^\frac{3}{2} V}{h^3} </math> +|- +!Vibration +|<math> Z_v = \frac{1}{1 - e^\frac{-h \omega}{2 \pi k_B T}} </math> +|- +!Rotation +|<math> Z_r = \frac{2 I k_B T}{\sigma (\frac{h}{2 \pi})^2} </math> + +<div class="plainlist"> +*where: +*σ = 1 ([[heteronuclear molecule]]s) +*σ = 2 ([[homonuclear]]) +</div> +|} + +==Thermal properties of matter== + +:{| class="wikitable" +|- +! Coefficients +! Equation +|- +![[Joule–Thomson effect|Joule-Thomson coefficient]] +|<math>\mu_{JT} = \left(\frac{\partial T}{\partial p}\right)_H</math> +|- +![[Compressibility]] (constant temperature) +|<math> K_T = -{ 1\over V } \left ( {\partial V\over \partial p} \right )_{T,N} </math> +|- +! [[Coefficient of thermal expansion]] (constant pressure) +|<math>\alpha_{p} = \frac{1}{V}\left(\frac{\partial V}{\partial T}\right)_p</math> +|- +! Heat capacity (constant pressure) +| <math>C_p += \left ( {\partial Q_{rev} \over \partial T} \right )_p += \left ( {\partial U \over \partial T} \right )_p + p \left ( {\partial V \over \partial T} \right )_p += \left ( {\partial H \over \partial T} \right )_p += T \left ( {\partial S \over \partial T} \right )_p </math> +|- +! 
Heat capacity (constant volume) +| <math>C_V += \left ( {\partial Q_{rev} \over \partial T} \right )_V += \left ( {\partial U \over \partial T} \right )_V += T \left ( {\partial S \over \partial T} \right )_V </math> +|- +|} + +:{| class="toccolours collapsible collapsed" width="80%" style="text-align:left" +!Derivation of heat capacity (constant pressure) +|- +| +Since + +:<math> +\left(\frac{\partial T}{\partial p}\right)_H +\left(\frac{\partial p}{\partial H}\right)_T +\left(\frac{\partial H}{\partial T}\right)_p += -1 +</math> + +:<math> +\begin{align} +\left(\frac{\partial T}{\partial p}\right)_H +& = -\left(\frac{\partial H}{\partial p}\right)_T + \left(\frac{\partial T}{\partial H}\right)_p +\\ +& = \frac{-1}{\left(\frac{\partial H}{\partial T}\right)_p} + \left(\frac{\partial H}{\partial p}\right)_T +\end{align} +</math> + +:<math>C_p = \left(\frac{\partial H}{\partial T}\right)_p</math> + +:<math> +\Rightarrow \left(\frac{\partial T}{\partial p}\right)_H += -\frac{1}{C_p} + \left(\frac{\partial H}{\partial p}\right)_T +</math> +|} + +:{| class="toccolours collapsible collapsed" width="80%" style="text-align:left" +!Derivation of heat capacity (constant volume) +|- +| +Since + +:<math>dU = \delta Q_{rev} - \delta W_{rev} ,</math> + +(where δ''W''<sub>rev</sub> is the work done by the system), + +:<math>\delta S = \frac{\delta Q_{rev}}{T}, \delta W_{rev}= p \delta V </math> + +:<math> d U = T \delta S- p\delta V </math> + +:<math> +\left(\frac{\partial U}{\partial T}\right)_V += T\left(\frac{\partial S}{\partial T}\right)_V +- p\left(\frac{\partial V}{\partial T}\right)_V ; C_V = \left(\frac{\partial U}{\partial T}\right)_V +</math> + +:<math>\Rightarrow C_V = T\left(\frac{\partial S}{\partial T}\right)_V</math> +|} + +===Thermal transfer=== + +:{| class="wikitable" +|- +! scope="col" width="100" | Physical situation +! scope="col" width="250" | Nomenclature +! scope="col" width="10" | Equations +|- +!Net intensity emission/absorption +| <div class="plainlist"> +*''T''<sub>external</sub> = external temperature (outside of system) +*''T''<sub>system</sub> = internal temperature (inside system) +*''ε'' = emmisivity +</div> +|<math> I = \sigma \epsilon \left ( T_\mathrm{external}^4 - T_\mathrm{system}^4 \right ) \,\!</math> +|- +!Internal energy of a substance +| <div class="plainlist"> +*''C<sub>V</sub>'' = isovolumetric heat capacity of substance +*Δ''T'' = temperature change of substance +</div> +|<math>\Delta U = N C_V \Delta T\,\!</math> +|- +!Meyer's equation +|<div class="plainlist"> +*''C<sub>p</sub>'' = isobaric heat capacity +*''C<sub>V</sub>'' = isovolumetric heat capacity +*''n'' = number of moles +</div> +|<math> C_p - C_V = nR \,\!</math> +|- +!Effective thermal conductivities +| <div class="plainlist"> +*''λ<sub>i</sub>'' = thermal conductivity of substance ''i'' +*''λ''<sub>net</sub> = equivalent thermal conductivity +</div> +| Series +<math> \lambda_\mathrm{net} = \sum_j \lambda_j \,\!</math> + +Parallel +<math> \frac{1}{\lambda}_\mathrm{net} = \sum_j \left ( \frac{1}{\lambda}_j \right ) \,\!</math> +|- +|} + +===Thermal efficiencies=== + +:{| class="wikitable" +|- +! scope="col" width="100" | Physical situation +! scope="col" width="250" | Nomenclature +! 
scope="col" width="10" | Equations +|- +!Thermodynamic engines +|<div class="plainlist"> +* ''η'' = efficiency +* ''W'' = work done by engine +* ''Q<sub>H</sub>'' = heat energy in higher temperature reservoir +* ''Q<sub>L</sub>'' = heat energy in lower temperature reservoir +* ''T<sub>H</sub>'' = temperature of higher temp. reservoir +* ''T<sub>L</sub>'' = temperature of lower temp. reservoir +</div> +|Thermodynamic engine:<br /> +<math>\eta = \left |\frac{W}{Q_H} \right|\,\!</math> + +Carnot engine efficiency:<br /> +<math>\eta_c = 1 - \left | \frac{Q_L}{Q_H} \right | = 1-\frac{T_L}{T_H}\,\!</math> +|- +!Refrigeration +| <div class="plainlist"> +*''K'' = coefficient of refrigeration performance +</div> +|Refrigeration performance +<math>K = \left | \frac{Q_L}{W} \right | \,\!</math> + +Carnot refrigeration performance +<math>K_C = \frac{|Q_L|}{|Q_H|-|Q_L|} = \frac{T_L}{T_H-T_L}\,\!</math> +|- +|} + +==See also== +{{multicol}} +*[[Antoine equation]] +*[[Bejan number]] +*[[Bowen ratio]] +*[[Bridgman's thermodynamic equations|Bridgman's equations]] +*[[Clausius–Clapeyron relation]] +*[[Departure function]]s +*[[Duhem–Margules equation]] +*[[Ehrenfest equations]] +{{multicol-break}} +*[[Gibbs–Helmholtz equation]] +*[[Gibbs' phase rule]] +*[[Kopp's law]] +*[[Kopp–Neumann law]] +*[[Noro–Frenkel law of corresponding states]] +*[[Onsager reciprocal relations]] +*[[Stefan number]] +*[[Triple product rule]] +{{multicol-end}} + +==References== +<references/> +{{refbegin|2}} +* [[Peter Atkins|Atkins, Peter]] and de Paula, Julio ''Physical Chemistry'', 7th edition, W.H. Freeman and Company, 2002 [ISBN 0-7167-3539-3]. +** Chapters 1 - 10, ''Part 1: Equilibrium''. +* Bridgman, P.W., ''Phys. Rev.'', 3, 273 (1914). +*Landsberg, Peter T. <u>Thermodynamics and Statistical Mechanics</u>. New York: Dover Publications, Inc., 1990. ''(reprinted from Oxford University Press, 1978)''. +* Lewis, G.N., and Randall, M., "Thermodynamics", 2nd Edition, McGraw-Hill Book Company, New York, 1961. +* Reichl, L.E., "A Modern Course in Statistical Physics", 2nd edition, New York: John Wiley & Sons, 1998. +*Schroeder, Daniel V. <u>Thermal Physics</u>. San Francisco: Addison Wesley Longman, 2000 [ISBN 0-201-38027-7]. +*Silbey, Robert J., et al. <u>Physical Chemistry</u>. 4th ed. New Jersey: Wiley, 2004. +*Callen, Herbert B. (1985). "Thermodynamics and an Introduction to Themostatistics", 2nd Ed., New York: John Wiley & Sons. +{{refend}} + +{{Physics-footer}} + +[[Category:Thermodynamic equations]] + 0nnx7aiizn4xjtnefujvotd80cnj5pt + + + + Fulton–Hansen connectedness theorem + 0 + 13965 + + 13966 + 2013-04-20T15:14:53Z + + Yobot + 0 + + + /* External links */[[WP:CHECKWIKI]] error fixes - Replaced endash with hyphen in sortkey per [[WP:MCSTJR]] using [[Project:AWB|AWB]] (9100) + wikitext + text/x-wiki + In [[mathematics]], the '''Fulton–Hansen connectedness theorem''' is a result from [[intersection theory]] in [[algebraic geometry]], for the case of [[subvarieties]] of [[projective space]] with [[codimension]] large enough to make the intersection have components of dimension at least 1. + +The formal statement is that if ''V'' and ''W'' are irreducible algebraic subvarieties of a [[projective space]] ''P'', all over an [[algebraically closed field]], and if + +: dim(''V'') + dim (''W'') > dim (''P'') + +in terms of the [[dimension of an algebraic variety]], then the intersection ''U'' of ''V'' and ''W'' is [[connected space|connected]]. 
+ +More generally, the theorem states that if <math>Z</math> is a projective variety and <math>f:Z \to P^n \times P^n</math> is any morphism such that <math>\dim f(Z) > n</math>, then <math>f^{-1}\Delta</math> is connected, where <math>\Delta</math> is the [[diagonal]] in <math>P^n \times P^n</math>. The special case of intersections is recovered by taking <math>Z = V \times W</math>, with <math>f</math> the natural inclusion. + +==See also== +* [[Zariski's connectedness theorem]] +* [[Grothendieck's connectedness theorem]] +* [[Deligne's connectedness theorem]] + +==References== +* {{citation|first=W.|last= Fulton|first2= J. |last2=Hansen|title=A connectedness theorem for projective varieties with applications to intersections and singularities of mappings|journal= Annals of Math. |volume=110 |year=1979|pages= 159–166|doi=10.2307/1971249|jstor=1971249|issue=1|publisher=Annals of Mathematics}} +* {{citation|first=R.|last= Lazarsfeld|title=Positivity in Algebraic Geometry|publisher= Springer|year= 2004}} + +==External links== +* [http://www.math.unizh.ch/fileadmin/math/preprints/20-05.pdf PDF lectures withe the result as Theorem 15.3 (attributed to Faltings, also)] + +{{DEFAULTSORT:Fulton-Hansen connectedness theorem}} +[[Category:Intersection theory]] +[[Category:Theorems in algebraic geometry]] + bxqkty288zw4798dydsxe7rioyfftt7 + + + + E (mathematical constant) + 0 + 334 + + 335 + 2014-01-30T15:02:07Z + + Fraggle81 + 0 + + + Reverted 2 edits by [[Special:Contributions/24.38.121.66|24.38.121.66]] identified using [[WP:STiki|STiki]] + wikitext + text/x-wiki + {{DISPLAYTITLE:{{mvar|e}} (mathematical constant)}} +{{Redirect|Euler's number|γ, a constant in number theory|<!-- Euler's constant -->Euler–Mascheroni constant|other uses|List of things named after Leonhard Euler#Euler's numbers}} +[[Image:Exp derivative at 0.svg|right|frame|Functions {{math|''f''(''x'') {{=}} ''a''<sup>''x''</sup>}} are shown for several values of {{math|''a''}}. +{{mvar|e}} is the unique value of {{math|''a''}}, such that the derivative of {{math|''f''(''x'') {{=}} ''a''<sup>''x''</sup>}} at the point {{math|''x'' {{=}} 0}} is equal to 1. The blue curve illustrates this case, {{math|''e''<sup>''x''</sup>}}. For comparison, functions {{math|2<sup>''x''</sup>}} (dotted curve) and {{math|4<sup>''x''</sup>}} (dashed curve) are shown; they are not [[tangent]] to the line of slope 1 and y-intercept 1 (red).]] +{{pp-move-indef|small=yes}} + +The number '''{{mvar|e}}''' is an important [[mathematical constant]] that is the base of the [[natural logarithm]]. It is approximately equal to 2.71828,<ref>[[Oxford English Dictionary]], 2nd ed.: [http://oxforddictionaries.com/definition/english/natural%2Blogarithm natural logarithm]</ref> and is the [[limit of a sequence|limit]] of {{math|(1 + 1/''n'')<sup>''n''</sup>}} as {{mvar|n}} approaches infinity, an expression that arises in the study of [[compound interest]]. 
It can also be calculated as the sum of the infinite [[series (mathematics)|series]]<ref>[[Encyclopedic Dictionary of Mathematics]] 142.D</ref> + +:<math>e = \displaystyle\sum\limits_{n = 0}^{ \infty} \dfrac{1}{n!} = 1 + \frac{1}{1} + \frac{1}{1\cdot 2} + \frac{1}{1\cdot 2\cdot 3} + \cdots</math> + +The constant can be defined in many ways; for example, {{mvar|e}} is the unique [[real number]] such that the value of the [[derivative]] (slope of the [[tangent line]]) of the function {{math|1=''f''(''x'') = ''e''<sup>''x''</sup>}} at the point {{math|1=''x'' = 0}} is equal to 1.<ref>{{cite book|title = Calculus|author = Jerrold E. Marsden, Alan Weinstein|publisher = Springer|year = 1985|isbn = 0-387-90974-5|url=http://books.google.com/?id=KVnbZ0osbAkC&printsec=frontcover}}</ref> The function {{math|''e''<sup>''x''</sup>}} so defined is called the [[exponential function]], and its [[Inverse function|inverse]] is the [[natural logarithm]], or logarithm to [[base (exponentiation)|base]] {{mvar|e}}. The natural logarithm of a positive number {{math|''k''}} can also be defined directly as the [[integral|area under]] the curve {{math|1=''y'' = 1/''x''}} between {{math|1=''x'' = 1}} and {{math|1=''x'' = ''k''}}, in which case, {{mvar|e}} is the number whose natural logarithm is 1. There are also more [[#Alternative characterizations|alternative characterizations]]. + +Sometimes called '''Euler's number''' after the [[Switzerland|Swiss]] [[mathematician]] [[Leonhard Euler]], {{mvar|e}} is not to be confused with {{math|γ}}—the [[Euler–Mascheroni constant]], sometimes called simply ''Euler's constant''. The number {{mvar|e}} is also known as '''Napier's constant''', but Euler's choice of the symbol {{mvar|e}} is said to have been retained in his honor.<ref name="mathworld">{{cite web|last=Sondow|first=Jonathan|title=e|url=http://mathworld.wolfram.com/e.html|work=[[MathWorld|Wolfram Mathworld]]|publisher=[[Wolfram Research]]|accessdate=10 May 2011}}</ref> The number {{mvar|e}} is of eminent importance in mathematics,<ref>{{cite book|title = An Introduction to the History of Mathematics|author = Howard Whitley Eves|year = 1969|publisher = Holt, Rinehart & Winston|isbn =0-03-029558-0}}</ref> alongside [[0 (number)|0]], [[1 (number)|1]], [[pi|{{pi}}]] and [[imaginary unit|{{mvar|i}}]]. All five of these numbers play important and recurring roles across mathematics, and are the five constants appearing in one formulation of [[Euler's identity]]. Like the constant {{pi}}, {{mvar|e}} is [[irrational number|irrational]]: it is not a ratio of [[integers]]; and it is [[transcendental number|transcendental]]: it is not a root of ''any'' non-zero [[polynomial]] with rational coefficients. The numerical value of {{mvar|e}} truncated to 50 [[decimal|decimal places]] is +:{{gaps|2.71828|18284|59045|23536|02874|71352|66249|77572|47093|69995...}} {{OEIS|A001113}}. + +{{E (mathematical constant)}} + +==History== +The first references to the constant were published in 1618 in the table of an appendix of a work on logarithms by [[John Napier]].<ref name="OConnor">{{cite web|url=<!-- http://www.gap-system.org/~history/PrintHT/e.html -->http://www-history.mcs.st-and.ac.uk/HistTopics/e.html|title=The number ''e''|publisher=MacTutor History of Mathematics|first1=J J|last1=O'Connor|first2=E F|last2=Robertson}}</ref> However, this did not contain the constant itself, but simply a list of logarithms calculated from the constant. It is assumed that the table was written by [[William Oughtred]]. 
The discovery of the constant itself is credited to [[Jacob Bernoulli]], who attempted to find the value of the following expression (which is in fact {{mvar|e}}): + +:<math>\lim_{n\to\infty} \left( 1 + \frac{1}{n} \right)^n</math> + +The first known use of the constant, represented by the letter {{math|''b''}}, was in correspondence from [[Gottfried Leibniz]] to [[Christiaan Huygens]] in 1690 and 1691. [[Leonhard Euler]] introduced the letter {{mvar|e}} as the base for natural logarithms, writing in a letter to [[Christian Goldbach]] of 25 November 1731.<ref>{{Cite book|last=Remmert|first=Reinhold|authorlink=Reinhold Remmert|title=Theory of Complex Functions|page=136|publisher=[[Springer-Verlag]]|year=1991|isbn=0-387-97195-5|postscript=<!-- Bot inserted parameter. Either remove it; or change its value to "." for the cite to end in a ".", as necessary. -->{{inconsistent citations}}}}</ref> Euler started to use the letter {{mvar|e}} for the constant in 1727 or 1728, in an unpublished paper on explosive forces in cannons,<ref name="Meditatio">Euler, ''[http://www.math.dartmouth.edu/~euler/pages/E853.html Meditatio in experimenta explosione tormentorum nuper instituta]''.</ref> and the first appearance of {{mvar|e}} in a publication was [[Mechanica|Euler's ''Mechanica'']] (1736). While in the subsequent years some researchers used the letter {{math|''c''}}, {{mvar|e}} was more common and eventually became the standard. + +==Applications== +===Compound interest=== +[[File:Compound Interest with Varying Frequencies.svg|thumb|right|350px|The effect of earning 20% annual interest on an initial $1,000 investment at various compounding frequencies]] + +[[Jacob Bernoulli]] discovered this constant by studying a question about [[compound interest]]:<ref name="OConnor" /> + +:An account starts with $1.00 and pays 100 percent interest per year. If the interest is credited once, at the end of the year, the value of the account at year-end will be $2.00. What happens if the interest is computed and credited more frequently during the year? + +If the interest is credited twice in the year, the interest rate for each 6 months will be 50%, so the initial $1 is multiplied by 1.5 twice, yielding $1.00×1.5<sup>2</sup>&nbsp;=&nbsp;$2.25 at the end of the year. Compounding quarterly yields $1.00×1.25<sup>4</sup>&nbsp;=&nbsp;$2.4414..., and compounding monthly yields $1.00×(1+1/12)<sup>12</sup>&nbsp;=&nbsp;$2.613035... If there are {{math|''n''}} compounding intervals, the interest for each interval will be {{math|100%/''n''}} and the value at the end of the year will be $1.00×{{math|1=(1 + 1/''n'')<sup>''n''</sup>}}. + +Bernoulli noticed that this sequence approaches a limit (the [[force of interest]]) with larger {{math|''n''}} and, thus, smaller compounding intervals. Compounding weekly ({{math|1=''n'' = 52}}) yields $2.692597..., while compounding daily ({{math|1=''n'' = 365}}) yields $2.714567..., just two cents more. The limit as {{math|''n''}} grows large is the number that came to be known as {{mvar|e}}; with ''continuous'' compounding, the account value will reach $2.7182818.... More generally, an account that starts at $1 and offers an annual interest rate of {{math|''R''}} will, after {{math|''t''}} years, yield {{math|''e''<sup>''Rt''</sup>}} dollars with continuous compounding. 
(Here {{math|''R''}} is a fraction, so for 5% interest, {{math|1=''R'' = 5/100 = 0.05}}) + +===Bernoulli trials=== +The number {{mvar|e}} itself also has applications to [[probability theory]], where it arises in a way not obviously related to exponential growth. Suppose that a gambler plays a slot machine that pays out with a probability of one in {{math|''n''}} and plays it {{math|''n''}} times. Then, for large {{math|''n''}} (such as a million) the [[probability]] that the gambler will lose every bet is (approximately) {{math|1/''e''}}. For {{math|1=''n'' = 20}} it is already 1/2.72. + +This is an example of a [[Bernoulli trials]] process. Each time the gambler plays the slots, there is a one in one million chance of winning. Playing one million times is modelled by the [[binomial distribution]], which is closely related to the [[binomial theorem]]. The probability of winning {{math|''k''}} times out of a million trials is; +:<math>\binom{10^6}{k} \left(10^{-6}\right)^k(1-10^{-6})^{10^6-k}.</math> +In particular, the probability of winning zero times ({{math|1=''k'' = 0}}) is +:<math>\left(1-\frac{1}{10^6}\right)^{10^6}.</math> +This is very close to the following limit for {{math|1/''e''}}: +:<math>\frac{1}{e} = \lim_{n\to\infty} \left(1-\frac{1}{n}\right)^n.</math> + +===Derangements=== +Another application of {{mvar|e}}, also discovered in part by Jacob Bernoulli along with [[Pierre Raymond de Montmort]] is in the problem of [[derangement]]s, also known as the ''hat check problem'':<ref>Grinstead, C.M. and Snell, J.L.''[http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/book.html Introduction to probability theory] (published online under the [[GFDL]]), p. 85.</ref> {{math|''n''}} guests are invited to a party, and at the door each guest checks his hat with the butler who then places them into {{math|''n''}} boxes, each labelled with the name of one guest. But the butler does not know the identities of the guests, and so he puts the hats into boxes selected at random. The problem of de Montmort is to find the probability that ''none'' of the hats gets put into the right box. The answer is: + +:<math>p_n = 1-\frac{1}{1!}+\frac{1}{2!}-\frac{1}{3!}+\cdots+\frac{(-1)^n}{n!} = \sum_{k = 0}^n \frac{(-1)^k}{k!}.</math> + +As the number {{math|''n''}} of guests tends to infinity, {{math|''p''<sub>''n''</sub>}} approaches {{math|1/''e''}}. Furthermore, the number of ways the hats can be placed into the boxes so that none of the hats is in the right box is {{math|''n''!/''e''}} rounded to the nearest integer, for every positive&nbsp;{{math|''n''}}.<ref>Knuth (1997) ''[[The Art of Computer Programming]]'' Volume I, Addison-Wesley, p. 183 ISBN 0-201-03801-3.</ref> + +===Asymptotics=== +The number {{mvar|e}} occurs naturally in connection with many problems involving [[asymptotics]]. A prominent example is [[Stirling's formula]] for the [[Asymptotic analysis|asymptotics]] of the [[factorial function]], in which both the numbers {{mvar|e}} and [[Pi|{{pi}}]] enter: +:<math>n! \sim \sqrt{2\pi n}\, \left(\frac{n}{e}\right)^n.</math> +A particular consequence of this is +:<math>e = \lim_{n\to\infty} \frac{n}{\sqrt[n]{n!}}</math>. 
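+
+Both this limit and the compound-interest limit <math>(1 + 1/n)^n</math> converge slowly. The following minimal Python sketch (an illustration only, using nothing beyond the standard library) evaluates them for a few values of <math>n</math>, computing <math>\ln n!</math> with <code>math.lgamma</code> so that large factorials do not overflow:
+
+<syntaxhighlight lang="python">
+import math
+
+# e as the limit of (1 + 1/n)^n and of n / (n!)^(1/n);
+# ln(n!) is evaluated as lgamma(n + 1) to avoid huge integers.
+for n in (10, 1000, 100000):
+    interest = (1.0 + 1.0 / n) ** n
+    stirling = math.exp(math.log(n) - math.lgamma(n + 1) / n)
+    print(n, round(interest, 6), round(stirling, 6))
+
+print(math.e)  # 2.718281828459045
+</syntaxhighlight>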
+ +===Standard normal distribution=== +(from [[Normal distribution]]) + +The simplest case of a normal distribution is known as the ''standard normal distribution'', described by this [[probability density function]]: + +:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math> + +The factor <math style="position:relative; top:-.2em">\scriptstyle\ 1/\sqrt{2\pi}</math> in this expression ensures that the total area under the curve ''ϕ''(''x'') is equal to one<sup>[[Gaussian integral|[proof]]]</sup>. The {{frac2|1|2}} in the exponent ensures that the distribution has unit variance (and therefore also unit standard deviation). This function is symmetric around ''x''=0, where it attains its maximum value <math style="position:relative; top:-.2em">\scriptstyle\ 1/\sqrt{2\pi}</math>; and has [[inflection point]]s at +1 and −1. + +=={{mvar|e}} in calculus== +[[Image:Ln+e.svg|thumb|right|The natural log at (x-axis) {{mvar|e}}, {{math|ln(''e'')}}, is equal to 1]] + +The principal motivation for introducing the number {{mvar|e}}, particularly in [[calculus]], is to perform [[derivative (mathematics)|differential]] and [[integral calculus]] with [[exponential function]]s and [[logarithm]]s.<ref>Kline, M. (1998) ''Calculus: An intuitive and physical approach'', section 12.3 [http://books.google.co.jp/books?id=YdjK_rD7BEkC&pg=PA337 "The Derived Functions of Logarithmic Functions."], pp. 337 ff, Courier Dover Publications, 1998, ISBN 0-486-40453-6</ref> A general exponential function {{math|''y'' {{=}} ''a''<sup>''x''</sup>}} has derivative given as the [[limit of a function|limit]]: +:<math>\frac{d}{dx}a^x=\lim_{h\to 0}\frac{a^{x+h}-a^x}{h}=\lim_{h\to 0}\frac{a^{x}a^{h}-a^x}{h}=a^x\left(\lim_{h\to 0}\frac{a^h-1}{h}\right).</math> +The limit on the far right is independent of the variable {{math|''x''}}: it depends only on the base {{math|''a''}}. When the base is {{mvar|e}}, this limit is equal to one, and so {{mvar|e}} is symbolically defined by the equation: +:<math>\frac{d}{dx}e^x = e^x.</math> + +Consequently, the exponential function with base {{mvar|e}} is particularly suited to doing calculus. Choosing {{mvar|e}}, as opposed to some other number, as the base of the exponential function makes calculations involving the derivative much simpler. + +Another motivation comes from considering the base-{{math|''a''}} [[logarithm]].<ref>This is the approach taken by Kline (1998).</ref> Considering the definition of the derivative of {{math|log<sub>''a''</sub> ''x''}} as the limit: +:<math>\frac{d}{dx}\log_a x = \lim_{h\to 0}\frac{\log_a(x+h)-\log_a(x)}{h}=\frac{1}{x}\left(\lim_{u\to 0}\frac{1}{u}\log_a(1+u)\right),</math> +where the substitution {{math|''u'' {{=}} ''h''/''x''}} was made in the last step. The last limit appearing in this calculation is again an undetermined limit that depends only on the base {{math|a}}, and if that base is {{mvar|e}}, the limit is one. So symbolically, +:<math>\frac{d}{dx}\log_e x=\frac{1}{x}.</math> +The logarithm in this special base is called the [[natural logarithm]] and is represented as {{math|ln}}; it behaves well under differentiation since there is no undetermined limit to carry through the calculations. + +There are thus two ways in which to select a special number {{math|''a'' {{=}} ''e''}}. One way is to set the derivative of the exponential function {{math|''a''<sup>''x''</sup>}} to {{math|''a''<sup>''x''</sup>}}, and solve for {{math|''a''}}. 
The other way is to set the derivative of the base {{math|''a''}} logarithm to {{math|1/''x''}} and solve for {{math|''a''}}. In each case, one arrives at a convenient choice of base for doing calculus. In fact, these two solutions for {{math|''a''}} are actually ''the same'', the number {{mvar|e}}. + +===Alternative characterizations=== +[[Image:hyperbola E.svg|thumb|right|The area between the {{math|''x''}}-axis and the graph {{math|''y'' {{=}} 1/''x''}}, between {{math|''x'' {{=}} 1}} and {{math|''x'' {{=}} ''e''}} is 1.]] +{{See also|Representations of e}} +Other characterizations of {{mvar|e}} are also possible: one is as the [[limit of a sequence]], another is as the sum of an [[infinite series]], and still others rely on [[integral calculus]]. So far, the following two (equivalent) properties have been introduced: + +1. The number {{mvar|e}} is the unique positive [[real number]] such that +:<math>\frac{d}{dt}e^t = e^t.</math> + +2. The number {{mvar|e}} is the unique positive real number such that +:<math>\frac{d}{dt} \log_e t = \frac{1}{t}.</math> + +The following three characterizations can be [[characterizations of the exponential function#Equivalence of the characterizations|proven equivalent]]: + +3. The number {{mvar|e}} is the [[limit of a sequence|limit]] +:<math>e = \lim_{n\to\infty} \left( 1 + \frac{1}{n} \right)^n</math> + +Similarly: +:<math>e = \lim_{x\to 0} \left( 1 + x \right)^{\frac{1}{x}}</math> + +4. The number {{mvar|e}} is the sum of the [[infinite series]] +:<math>e = \sum_{n = 0}^\infty \frac{1}{n!} = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots</math> +where {{math|''n''!}} is the [[factorial]] of {{math|''n''}}. + +5. The number {{mvar|e}} is the unique positive real number such that +:<math>\int_1^e \frac{1}{t} \, dt = 1.</math> + +==Properties== +===Calculus=== +As in the motivation, the [[exponential function]] {{math|''e''<sup>''x''</sup>}} is important in part because it is the unique nontrivial function (up to multiplication by a constant) which is its own [[derivative]] + +:<math>\frac{d}{dx}e^x=e^x</math> + +and therefore its own [[antiderivative]] as well: + +:<math>\int e^x\,dx = e^x + C.</math> + +===Exponential-like functions=== +{{See also|Steiner's problem}} +[[Image:Xth root of x.svg|thumb|right|250px|The [[global maximum]] of <math>\sqrt[x]{x}</math> occurs at {{math|''x'' {{=}} ''e''}}.]] +The [[global maximum]] for the function + +:<math> f(x) = \sqrt[x]{x}</math> + +occurs at {{math|''x'' {{=}} ''e''}}. Similarly, {{math|''x'' {{=}} 1/''e''}} is where the [[global minimum]] occurs for the function + +:<math> f(x) = x^x\, </math> + +defined for positive {{math|''x''}}. More generally, {{math|''x'' {{=}} ''e''<sup>−1/''n''</sup>}} is where the global minimum occurs for the function + +:<math> \!\ f(x) = x^{x^n} </math> + +for any {{math|n > 0}}. The infinite [[tetration]] + +:<math> x^{x^{x^{\cdot^{\cdot^{\cdot}}}}} </math> or <sup>∞</sup><math>x</math> + +converges if and only if {{math|''e''<sup>−''e''</sup> ≤ ''x'' ≤ ''e''<sup>1/''e''</sup>}} (or approximately between 0.0660 and 1.4447), due to a theorem of [[Leonhard Euler]]. + +===Number theory=== +The real number {{mvar|e}} is [[Irrational number|irrational]]. 
[[Leonhard Euler|Euler]] proved this by showing that its [[simple continued fraction]] expansion is infinite.<ref>{{cite web|url=http://www.maa.org/editorial/euler/How%20Euler%20Did%20It%2028%20e%20is%20irrational.pdf|title=How Euler Did It: Who proved {{mvar|e}} is Irrational?|last=Sandifer|first=Ed|date=Feb. 2006|publisher=MAA Online|accessdate=2010-06-18}}</ref> (See also [[Joseph Fourier|Fourier]]'s [[proof that e is irrational|proof that {{mvar|e}} is irrational]].) + +Furthermore, by the [[Lindemann–Weierstrass theorem]], {{mvar|e}} is [[Transcendental number|transcendental]], meaning that it is not a solution of any non-constant polynomial equation with rational coefficients. It was the first number to be proved transcendental without having been specifically constructed for this purpose (compare with [[Liouville number]]); the proof was given by [[Charles Hermite]] in 1873. + +It is conjectured that {{mvar|e}} is [[normal number|normal]], meaning that when {{mvar|e}} is expressed in any [[Radix|base]] the possible digits in that base are uniformly distributed (occur with equal probability in any sequence of given length). + +===Complex numbers=== +The [[exponential function]] {{math|''e''<sup>''x''</sup>}} may be written as a [[Taylor series]] + +:<math> e^{x} = 1 + {x \over 1!} + {x^{2} \over 2!} + {x^{3} \over 3!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!}</math> + +Because this series keeps many important properties for {{math|''e''<sup>''x''</sup>}} even when {{math|''x''}} is [[complex number|complex]], it is commonly used to extend the definition of {{math|''e''<sup>''x''</sup>}} to the complex numbers. This, with the Taylor series for [[trigonometric functions|sin and cos {{math|''x''}}]], allows one to derive [[Euler's formula]]: + +:<math>e^{ix} = \cos x + i\sin x,\,\!</math> + +which holds for all {{math|''x''}}. The special case with {{math|''x'' {{=}} [[Pi|&pi;]]}} is [[Euler's identity]]: + +:<math>e^{i\pi} + 1 = 0\,\!</math> + +from which it follows that, in the [[principal branch]] of the logarithm, + +:<math>\ln (-1) = i\pi.\,\!</math> + +Furthermore, using the laws for exponentiation, + +:<math>(\cos x + i\sin x)^n = \left(e^{ix}\right)^n = e^{inx} = \cos (nx) + i \sin (nx),</math> + +which is [[de Moivre's formula]]. + +The expression + +:<math>\cos x + i \sin x \,</math> + +is sometimes referred to as {{math|cis(''x'')}}. + +===Differential equations=== +The general function + +:<math>y(x) = Ce^x\,</math> + +is the solution to the differential equation: + +:<math>y' = y.\,</math> + +==Representations== +{{Main|List of representations of e}} + +The number {{mvar|e}} can be represented as a [[real number]] in a variety of ways: as an [[infinite series]], an [[infinite product]], a [[continued fraction]], or a [[limit of a sequence]]. The chief among these representations, particularly in introductory [[calculus]] courses is the limit +:<math>\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n,</math> +given above, as well as the series +:<math>e=\sum_{n=0}^\infty \frac{1}{n!}</math> +given by evaluating the above [[power series]] for {{math|''e''<sup>''x''</sup>}} at {{math|''x'' {{=}} 1}}. + +Less common is the [[continued fraction]] {{OEIS|id=A003417}}. +<!--move to history section or say <ref>[[Leonhard Euler|Euler]] was the first showed that {{mvar|e}} can be represented as a continued fraction.</ref>--> + +:<math> +e = [2;1,\mathbf 2,1,1,\mathbf 4,1,1,\mathbf 6,1,1,...,\mathbf {2n},1,1,...] 
= [1;\mathbf 0,1,1,\mathbf 2,1,1,\mathbf 4,1,1,...,\mathbf {2n},1,1,...], +</math><ref>Hofstadter, D. R., "Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought" Basic Books (1995) ISBN 0-7139-9155-0</ref> + +which written out looks like + +:<math>e = 2+ +\cfrac{1} + {1+\cfrac{1} + {\mathbf 2 +\cfrac{1} + {1+\cfrac{1} + {1+\cfrac{1} + {\mathbf 4 +\cfrac{1} + {1+\cfrac{1} + {1+\ddots} + } + } + } + } + } + } += 1+ +\cfrac{1} + {\mathbf 0 + \cfrac{1} + {1 + \cfrac{1} + {1 + \cfrac{1} + {\mathbf 2 + \cfrac{1} + {1 + \cfrac{1} + {1 + \cfrac{1} + {\mathbf 4 + \cfrac{1} + {1 + \cfrac{1} + {1 + \ddots} + } + } + } + } + } + } + } + }. +</math> + +This continued fraction for {{mvar|e}} converges three times as quickly: +:<math> e = [ 1 ; 0.5 , 12 , 5 , 28 , 9 , 44 , 13 , \ldots , 4(4n-1) , (4n+1) , \ldots ],</math> + +which written out looks like + +:<math> e = 1+\cfrac{2}{1+\cfrac{1}{6+\cfrac{1}{10+\cfrac{1}{14+\cfrac{1}{18+\cfrac{1}{22+\cfrac{1}{26+\ddots\,}}}}}}}.</math> + +Many other series, sequence, continued fraction, and infinite product representations of {{mvar|e}} have been developed. + +===Stochastic representations=== +In addition to exact analytical expressions for representation of {{mvar|e}}, there are stochastic techniques for estimating {{mvar|e}}. One such approach begins with an infinite sequence of independent random variables {{math|''X''<sub>1</sub>}}, {{math|''X''<sub>2</sub>}}..., drawn from the [[uniform distribution (continuous)|uniform distribution]] on [0, 1]. Let {{math|''V''}} be the least number {{math|''n''}} such that the sum of the first {{math|''n''}} samples exceeds 1: +:<math>V = \min { \left \{ n \mid X_1+X_2+\cdots+X_n > 1 \right \} }.</math> +Then the [[expected value]] of {{math|''V''}} is {{mvar|e}}: {{math|E(''V'') {{=}} ''e''}}.<ref>Russell, K. G. (1991) ''[http://links.jstor.org/sici?sici=0003-1305%28199102%2945%3A1%3C66%3AETVOEB%3E2.0.CO%3B2-U Estimating the Value of e by Simulation]'' The American Statistician, Vol. 45, No. 1. (Feb., 1991), pp. 66–68.</ref><ref>Dinov, ID (2007) ''[http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_LawOfLargeNumbers#Estimating_e_using_SOCR_simulation Estimating e using SOCR simulation]'', SOCR Hands-on Activities (retrieved December 26, 2007).</ref> + +===Known digits=== +The number of known digits of {{mvar|e}} has increased dramatically during the last decades. This is due both to the increased performance of computers and to algorithmic improvements.<ref>Sebah, P. and Gourdon, X.; [http://numbers.computation.free.fr/Constants/E/e.html The constant e and its computation]</ref><ref>Gourdon, X.; [http://numbers.computation.free.fr/Constants/PiProgram/computations.html Reported large computations with PiFast]</ref> + +{| class="wikitable" style="margin: 1em auto 1em auto" +|+ '''Number of known decimal digits of {{mvar|e}} ''' +! Date || Decimal digits || Computation performed by +|- +| 1748 ||align=right| 23 || [[Leonhard Euler]]<ref>''Introductio in analysin infinitorum'' [http://books.google.de/books?id=jQ1bAAAAQAAJ&pg=PA90 p. 90]</ref> +|- +| 1853 ||align=right| 137 || [[William Shanks]] +|- +| 1871 ||align=right| 205 || [[William Shanks]] +|- +| 1884 ||align=right| 346 || J. 
Marcus Boorman +|- +| 1949 ||align=right| 2,010 || [[John von Neumann]] (on the [[ENIAC]]) +|- +| 1961 ||align=right| 100,265 || [[Daniel Shanks]] and [[John Wrench]]<ref name="We have computed e on a 7090 to 100,265D by the obvious program.">{{cite journal|author=Daniel Shanks and John W Wrench|quote=We have computed e on a 7090 to 100,265D by the obvious program|title=Calculation of Pi to 100,000 Decimals|journal =Mathematics of Computation|volume= 16 |year=1962| issue =77| pages =76–99 (78)|url=http://www.ams.org/journals/mcom/1962-16-077/S0025-5718-1962-0136051-9/S0025-5718-1962-0136051-9.pdf|doi=10.2307/2003813}}</ref> +|- +| 1978 ||align=right| 116,000 || [[Steve Wozniak]] on the [[Apple II]]<ref name="wozniak198106">{{cite news | url=http://archive.org/stream/byte-magazine-1981-06/1981_06_BYTE_06-06_Operating_Systems#page/n393/mode/2up | title=The Impossible Dream: Computing ''e'' to 116 Places with a Personal Computer | work=BYTE | date=June 1981 | accessdate=18 October 2013 | author=Wozniak, Steve | pages=392}}</ref> +|- +| 1994 April 1 ||align=right| 1,000,000 || [[Robert J. Nemiroff]] & Jerry Bonnell <ref>[http://apod.nasa.gov/htmltest/gifcity/e.1mil Email from Robert Nemiroff and Jerry Bonnell – The Number e to 1 Million Digits]. None. Retrieved on 2012-02-24.</ref> +|- +| 1999 November 21 ||align=right| 1,250,000,000 || Xavier Gourdon <ref name="Email from Xavier Gourdon to Simon Plouffe">[http://web.archive.org/web/20021223163426/http://pi.lacim.uqam.ca/piDATA/expof1.txt Email from Xavier Gourdon to Simon Plouffe – I have made a new e computation (with verification): 1,250,000,000 digits]. None. Retrieved on 2012-02-24.</ref> +|- +| 2000 July 16 ||align=right| 3,221,225,472 || Colin Martin & Xavier Gourdon <ref name="PiHacks message 177 - E to 3,221,225,472 D">[http://groups.yahoo.com/group/pi-hacks/message/177 PiHacks message 177 – E to 3,221,225,472 D]. Groups.yahoo.com. Retrieved on 2012-02-24.</ref> +|- +| 2003 September 18 ||align=right| 50,100,000,000 || Shigeru Kondo & Xavier Gourdon <ref name="PiHacks message 1071 - Two new records: 50 billions for E and 25 billions for pi">[http://groups.yahoo.com/group/pi-hacks/message/1071 PiHacks message 1071 – Two new records: 50 billions for E and 25 billions for pi]. Groups.yahoo.com. Retrieved on 2012-02-24.</ref> +|- +| 2007 April 27 ||align=right| 100,000,000,000 || Shigeru Kondo & Steve Pagliarulo <ref name="English Version of PI WORLD">[http://web.archive.org/web/20021221061853/http://ja0hxv.calico.jp/pai/eevalue.html English Version of PI WORLD]. Ja0hxv.calico.jp. Retrieved on 2012-02-24.</ref> +|- +| 2009 May 6 ||align=right| 200,000,000,000 || Rajesh Bohara & Steve Pagliarulo <ref name="English Version of PI WORLD"/> +|- +| 2010 July 5 ||align=right| 1,000,000,000,000 || Shigeru Kondo & Alexander J. Yee <ref>[http://www.numberworld.org/digits/E/ A list of notable large computations of e]. Numberworld.org. Last updated: March 7, 2011. Retrieved on 2012-02-24.</ref> +|} + +==In computer culture== +In contemporary [[internet culture]], individuals and organizations frequently pay homage to the number {{mvar|e}}. + +For instance, in the [[Initial Public Offering|IPO]] filing for [[Google]] in 2004, rather than a typical round-number amount of money, the company announced its intention to raise $2,718,281,828, which is {{mvar|e}} billion [[United States dollar|dollars]] to the nearest dollar. 
Google was also responsible for a billboard<ref>[http://braintags.com/archives/2004/07/first-10digit-prime-found-in-consecutive-digits-of-e/ First 10-digit prime found in consecutive digits of {{math|e}}&#125;]. Brain Tags. Retrieved on 2012-02-24.</ref> that appeared in the heart of [[Silicon Valley]], and later in [[Cambridge, Massachusetts]]; [[Seattle, Washington]]; and [[Austin, Texas]]. It read "{first 10-digit prime found in consecutive digits of {{mvar|e}}}.com". Solving this problem and visiting the advertised web site (now defunct) led to an even more difficult problem to solve, which in turn led to [[Google Labs]] where the visitor was invited to submit a resume.<ref>{{cite news|first=Andrea|last=Shea|url=http://www.npr.org/templates/story/story.php?storyId=3916173|title=Google Entices Job-Searchers with Math Puzzle|work=NPR|accessdate=2007-06-09}}</ref> The first 10-digit prime in {{mvar|e}} is 7427466391, which starts at the 99th digit.<ref>{{cite web |first=Marcus |last=Kazmierczak |url=http://mkaz.com/math/google-billboard |title=Google Billboard |publisher=mkaz.com |date=2004-07-29 |accessdate=2007-06-09}}</ref> + +In another instance, the [[computer scientist]] [[Donald Knuth]] let the version numbers of his program [[Metafont]] approach {{mvar|e}}. The versions are 2, 2.7, 2.71, 2.718, and so forth. Similarly, the version numbers of his [[TeX]] program approach {{pi}}.<ref>{{Cite journal|url=http://www.tex.ac.uk/tex-archive/digests/tex-mag/v5.n1|title=The Future of TeX and Metafont|first=Donald|last=Knuth|authorlink=Donald Knuth|journal=TeX Mag|volume=5|issue=1}}</ref> + +==Notes== +{{Reflist|colwidth=30em}} + +==Further reading== +* Maor, Eli; ''{{mvar|e}}: The Story of a Number'', ISBN 0-691-05854-7 +* [http://www.johnderbyshire.com/Books/Prime/Blog/page.html#endnote10 Commentary on Endnote 10] of the book ''[[Prime Obsession]]'' for another stochastic representation + +==External links== +{{Commons category|E (mathematical constant)}} +*[http://betterexplained.com/articles/an-intuitive-guide-to-exponential-functions-e/ An Intuitive Guide To Exponential Functions &{{mvar|e}}] for the non-mathematician +*[<!-- http://www.gutenberg.org/etext/127 -->http://gutenberg.org/ebooks/127 The number {{mvar|e}} to 1 million places] and [<!-- http://antwrp.gsfc.nasa.gov/htmltest/rjn_dig.html -->http://apod.nasa.gov/htmltest/gifcity/e.2mil 2 and 5 million places (link obsolete)] +*[http://mathworld.wolfram.com/eApproximations.html {{mvar|e}} Approximations]&nbsp;– Wolfram MathWorld +*[http://jeff560.tripod.com/constants.html Earliest Uses of Symbols for Constants] Jan. 13, 2008 +*[http://www.gresham.ac.uk/lectures-and-events/the-story-of-e "The story of {{mvar|e}}"], by Robin Wilson at [[Gresham College]], 28 February 2007 (available for audio and video download) +*[http://www.subidiom.com/e {{mvar|e}} Search Engine] 2 billion searchable digits of {{mvar|e}}, {{pi}} and √2 + +{{Good article}} + +{{DEFAULTSORT:E (Mathematical Constant)}} +[[Category:Transcendental numbers]] +[[Category:Mathematical constants]] +[[Category:E (mathematical constant)|*]] + +{{Link FA|ka}} +{{Link FA|mk}} +{{Link FA|lmo}} + tmutkcczlgeayd6i21al2u6q04mmtr7 + + + + BrownBoost + 0 + 17231 + + 17232 + 2013-03-17T02:04:04Z + + Addbot + 0 + + + [[User:Addbot|Bot:]] Migrating 1 interwiki links, now provided by [[Wikipedia:Wikidata|Wikidata]] on [[d:q4035469]] + wikitext + text/x-wiki + '''BrownBoost''' is a [[Boosting (meta-algorithm)|boosting]] algorithm that may be robust to noisy datasets. 
BrownBoost is an adaptive version of the [[boost by majority]] algorithm. As is true for all [[boosting (machine learning)|boosting]] algorithms, BrownBoost is used in conjunction with other [[machine learning]] methods. BrownBoost was introduced by [[Yoav Freund]] in 2001.<ref name="Freund01">Yoav Freund. An adaptive version of the boost by majority algorithm. Machine Learning, 43(3):293--318, June 2001.</ref> + +==Motivation== + +[[AdaBoost]] performs well on a variety of datasets; however, it can be shown that AdaBoost does not perform well on noisy data sets.<ref name="Dietterich00">Dietterich, T. G., (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40 (2) 139-158.</ref> This is a result of AdaBoost's focus on examples that are repeatedly misclassified. In contrast, BrownBoost effectively "gives up" on examples that are repeatedly misclassified. The core assumption of BrownBoost is that noisy examples will be repeatedly mislabeled by the weak hypotheses and non-noisy examples will be correctly labeled frequently enough to not be "given up on." Thus only noisy examples will be "given up on," whereas non-noisy examples will contribute to the final classifier. In turn, if the final classifier is learned from the non-noisy examples, the [[generalization error]] of the final classifier may be much better than if learned from noisy and non-noisy examples. + +The user of the algorithm can set the amount of error to be tolerated in the training set. Thus, if the training set is noisy (say 10% of all examples are assumed to be mislabeled), the booster can be told to accept a 10% error rate. Since the noisy examples may be ignored, only the true examples will contribute to the learning process. + +==Algorithm Description== + +BrownBoost uses a non-convex potential loss function, and thus it does not fit into the [[AnyBoost]] framework. The non-convex optimization provides a method to avoid overfitting noisy data sets. However, in contrast to boosting algorithms that analytically minimize a convex loss function (e.g. [[AdaBoost]] and [[LogitBoost]]), BrownBoost solves a system of two equations and two unknowns using standard numerical methods. + +The only parameter of BrownBoost (<math>c</math> in the algorithm) is the "time" the algorithm runs. The theory of BrownBoost states that each hypothesis takes a variable amount of time (<math>t</math> in the algorithm) which is directly related to the weight given to the hypothesis <math>\alpha</math>. The time parameter in BrownBoost is analogous to the number of iterations <math>T</math> in AdaBoost. + +A larger value of <math>c</math> means that BrownBoost will treat the data as if it were less noisy and therefore will give up on fewer examples. Conversely, a smaller value of <math>c</math> means that BrownBoost will treat the data as more noisy and give up on more examples. + +During each iteration of the algorithm, a hypothesis is selected with some advantage over random guessing. The weight of this hypothesis <math>\alpha</math> and the "amount of time passed" <math>t</math> during the iteration are simultaneously solved in a system of two non-linear equations (1. uncorrelate the hypothesis w.r.t. the example weights and 2. hold the potential constant) with two unknowns (weight of hypothesis <math>\alpha</math> and time passed <math>t</math>).
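+
+As an illustration only, the two conditions can be written as a pair of nonlinear equations in <math>\alpha</math> and <math>t</math> and handed to a generic numerical root finder. The Python sketch below uses the potential <math>\Phi(z) = 1-\mbox{erf}(z/\sqrt{c})</math> and the two conditions stated in the algorithm definition further down; the helper name <code>solve_alpha_t</code>, the use of SciPy's <code>fsolve</code> and the starting guess are choices made here for illustration and are not part of BrownBoost's original description or of the JBoost implementation.
+
+<syntaxhighlight lang="python">
+import numpy as np
+from scipy.optimize import fsolve
+from scipy.special import erf
+
+
+def solve_alpha_t(r, h, y, s, c):
+    """Illustrative sketch: find (alpha, t) for one BrownBoost iteration.
+
+    r -- current margins r_i(x_j) as a NumPy array
+    h -- weak-hypothesis outputs h_i(x_j) in {-1, +1}
+    y -- labels y_j in {-1, +1}
+    s -- amount of time remaining
+    c -- the "time" parameter of BrownBoost
+    """
+    hy = h * y
+
+    def potential(z):
+        # Phi(z) = 1 - erf(z / sqrt(c)), the potential loss
+        return 1.0 - erf(z / np.sqrt(c))
+
+    def equations(v):
+        alpha, t = v
+        m = r + alpha * hy + s - t
+        # 1. the weak hypothesis is uncorrelated with the updated weights
+        eq1 = np.sum(hy * np.exp(-m ** 2 / c))
+        # 2. the total potential is held constant
+        eq2 = np.sum(potential(m) - potential(r + s))
+        return [eq1, eq2]
+
+    alpha, t = fsolve(equations, x0=[0.5, 0.1])
+    return alpha, t
+</syntaxhighlight>
+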
This can be solved by bisection (as implemented in the [[JBoost]] software package) or [[Newton's method]] (as described in the original paper by Freund). Once these equations are solved, the margins of each example (<math>r_i(x_j)</math> in the algorithm) and the amount of time remaining <math>s</math> are updated appropriately. This process is repeated until there is no time remaining. + +The initial potential is defined to be <math>\frac{1}{m}\sum_{j=1}^m 1-\mbox{erf}(\sqrt{c}) = 1-\mbox{erf}(\sqrt{c})</math>. Since a constraint of each iteration is that the potential be held constant, the final potential is <math>\frac{1}{m}\sum_{j=1}^m 1-\mbox{erf}(r_i(x_j)/\sqrt{c}) = 1-\mbox{erf}(\sqrt{c})</math>. Thus the final error is ''likely'' to be near <math>1-\mbox{erf}(\sqrt{c})</math>. However, the final potential function is not the 0-1 loss error function. For the final error to be exactly <math>1-\mbox{erf}(\sqrt{c})</math>, the variance of the loss function must decrease linearly w.r.t. time to form the 0-1 loss function at the end of boosting iterations. This is not yet discussed in the literature and is not in the definition of the algorithm below. + +The final classifier is a linear combination of weak hypotheses and is evaluated in the same manner as most other boosting algorithms. + +==BrownBoost Learning Algorithm Definition== + +Input: +* <math>m</math> training examples <math>(x_{1},y_{1}),\ldots,(x_{m},y_{m})</math> where <math>x_{j} \in X,\, y_{j} \in Y = \{-1, +1\}</math> +* The parameter <math>c</math> + +Initialise: +* <math>s=c</math>. The value of <math>s</math> is the amount of time remaining in the game. +* <math>r_i(x_j) = 0</math> &nbsp; <math>\forall j</math>. The value of <math>r_i(x_j)</math> is the margin at iteration <math>i</math> for example <math>x_j</math>. + +While <math>s > 0</math>: +* Set the weights of each example: <math>W_{i}(x_j) = e^{- \frac{(r_i(x_j)+s)^2}{c}}</math>, where <math>r_i(x_j)</math> is the margin of example <math>x_j</math> +* Find a classifier <math>h_i : X \to \{-1,+1\}</math> such that <math>\sum_j W_i(x_j) h_i(x_j) y_j > 0</math> +* Find values <math>\alpha, t</math> that satisfy the equation: <br /> <math>\sum_j h_i(x_j) y_j e^{-\frac{(r_i(x_j)+\alpha h_i(x_j) y_j + s - t)^2}{c}} = 0</math>. <br />(Note this is similar to the condition <math>E_{W_{i+1}}[h_i(x_j) y_j]=0</math> set forth by Schapire and Singer.<ref name="Schapire99">Robert Schapire and Yoram Singer. Improved Boosting Using Confidence-rated Predictions. Journal of Machine Learning, Vol 37(3), pages 297-336. 1999</ref> In this setting, we are numerically finding the <math>W_{i+1} = \exp(\frac{\ldots}{\ldots})</math> such that <math>E_{W_{i+1}}[h_i(x_j) y_j]=0</math>.) <br /> This update is subject to the constraint <br /> <math> \sum \left(\Phi\left(r_i(x_j) + \alpha h(x_j) y_j + s - t\right) - \Phi\left( r_i(x_j) + s \right) \right) = 0 </math>, <br /> where <math> \Phi(z) = 1-\mbox{erf}(z/\sqrt{c}) </math> is the potential loss for a point with margin <math>r_i(x_j)</math> +* Update the margins for each example: <math>r_{i+1}(x_j) = r_i(x_j) + \alpha h(x_j) y_j</math> +* Update the time remaining: <math>s = s - t</math> + +Output: <math>H(x) = \textrm{sign}\left( \sum_i \alpha_{i} h_{i}(x) \right)</math> + +==Empirical Results== + +In preliminary experimental results with noisy datasets, BrownBoost outperformed [[AdaBoost]] in generalization error; however, [[LogitBoost]] performed as well as BrownBoost.<ref name="McDonald03">Ross A. McDonald, David J. Hand, Idris A.
Eckley. An Empirical Comparison of Three Boosting Algorithms on Real Data Sets with Artificial Class Noise. Multiple Classifier Systems, In Series Lecture Notes in Computer Science, pages 35-44, 2003.</ref> An implementation of BrownBoost can be found in the open source software [[JBoost]]. + +==References== +{{Reflist}} + +==See also== +* [[Boosting (machine learning)|Boosting]] +* [[AdaBoost]] +* [[Alternating decision tree]]s +* [[JBoost]] + +{{DEFAULTSORT:Brownboost}} +[[Category:Classification algorithms]] +[[Category:Ensemble learning]] + b2i1z6spritc6r4qmjllf2qcqn2j32t + + + + List of RNA structure prediction software + 0 + 21544 + + 21545 + 2014-01-25T07:45:32Z + + Monkbot + 0 + + + Fix [[Help:CS1_errors#deprecated_params|CS1 deprecated date parameter errors]] + wikitext + text/x-wiki + This '''list of RNA structure prediction software''' is a compilation of software tools and web portals used for [[RNA structure]] prediction. + +==Single sequence secondary structure prediction== +{| class="wikitable sortable" +! Name +! Description +! Knots<br/><ref group=Note>'''Knots:''' [[Pseudoknot]] prediction, <yes|no>.</ref> +! Links || References +|- +! [[CentroidFold]] +|Secondary structure prediction based on generalized centroid estimator || no || [http://www.ncrna.org/centroidfold/ sourcecode] [http://www.ncrna.org/centroidfold/ webserver]||<ref name="pmid19095700">{{cite journal | author = Michiaki Hamada, Hisanori Kiryu, Kengo Sato, Toutai Mituyama, Kiyoshi Asai | title = Predictions of RNA secondary structure using generalized centroid estimators | journal = Bioinformatics | volume = 25 | issue = 4 | pages = 465–473 | year = 2009 | pmid = 19095700 | doi = 10.1093/bioinformatics/btn601 }}</ref> +|- +! [[CentroidHomfold]] +|Secondary structure prediction by using homologous sequence information || no || [http://www.ncrna.org/centroidfold/ sourcecode] [http://www.ncrna.org/centroidhomfold/ webserver] ||<ref name="pmid19478007">{{cite journal | author = Michiaki Hamada, Hisanori Kiryu, Kengo Sato, Toutai Mituyama, Kiyoshi Asai | title = Predictions of RNA secondary structure by combining homologous sequence information | journal = Bioinformatics | volume = 25 | issue = 12 | pages = i330 - i3388 | year = 2009 | pmid = 19478007 | doi = 10.1093/bioinformatics/btp228 }}</ref> +|- +! [[Context Fold]] +|An RNA secondary structure prediction software based on feature-rich trained scoring models. || no || [http://www.cs.bgu.ac.il/~negevcb/contextfold/ContextFold_1_00.zip sourcecode] [http://www.cs.bgu.ac.il/~negevcb/contextfold/ webserver] ||<ref name="pmid22035327">{{cite journal | author = Shay Zakov, Yoav Goldberg, Michael Elhadad, Michal Ziv-Ukelson | title = Rich parameterization improves RNA structure prediction | journal = Journal of Computational Biology | volume = 18 | issue = 11 | pages = 1525–1542 | year = 2011 | pmid = 22035327 | doi = 10.1089/cmb.2011.0184 }}</ref> +|- +! [[CONTRAfold]] +|Secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon [[SCFG]]s by using discriminative training and [[feature-rich]] scoring. 
|| no || [http://contra.stanford.edu/contrafold/ sourcecode] [http://contra.stanford.edu/contrafold/server.html webserver]||<ref name="pmid16873527">{{cite journal | author = Do CB, Woods DA, Batzoglou S | title = CONTRAfold: RNA secondary structure prediction without physics-based models | journal = Bioinformatics | volume = 22 | issue = 14 | pages = e90–8 | year = 2006 | pmid = 16873527 | doi = 10.1093/bioinformatics/btl246 }}</ref> +|- +! [[CyloFold]] +|Secondary structure prediction method based on placement of helices allowing complex pseudoknots. || yes || [http://cylofold.abcc.ncifcrf.gov/ webserver] ||<ref name="pmid20501603">{{cite journal | author = Bindewald E, Kluth T, Shapiro BA | title = CyloFold: secondary structure prediction including pseudoknots | journal = Nucleic Acids Research | volume = Suppl | issue = W | pages = 368–72 | year = 2010 | pmid = 20501603 | pmc = 2896150 | doi = 10.1093/nar/gkq432 }}</ref> +|- +! [[IPknot]] +|Fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. || yes || [https://code.google.com/p/ipknot/ sourcecode] [http://rna.naist.jp/ipknot/ webserver] ||<ref name="pmid21685106">{{cite journal | author = Sato K, Kato Y, Hamada M, Akutsu T, Asai K | title = IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming | journal = Bioinformatics | volume = 27 | issue = 13 | pages = i85-93 | year = 2011 | pmid = 21685106 | pmc = 3117384 | doi = 10.1093/bioinformatics/btr215 }}</ref> +|- +! [[KineFold]] +|Folding kinetics of RNA sequences including pseudoknots by including an implementation of the partition function for knots.||yes||[http://kinefold.curie.fr/ linuxbinary,] [http://kinefold.curie.fr/cgi-bin/form.pl webserver]||<ref name="pmid15980546">{{cite journal | author = Xayaphoummine A, Bucher T, Isambert H | title = Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots | journal = Nucleic Acids Res. | volume = 33 | issue = Web Server issue | pages = W605–10 | year = 2005 | pmid = 15980546 | doi = 10.1093/nar/gki447 | pmc = 1160208 }}</ref><ref name="pmid14676318">{{cite journal | author = Xayaphoummine A, Bucher T, Thalmann F, Isambert H | title = Prediction and statistics of pseudoknots in RNA structures using exactly clustered stochastic simulations | journal = Proc. Natl. Acad. Sci. U.S.A. | volume = 100 | issue = 26 | pages = 15310–5 | year = 2003 | pmid = 14676318 | doi = 10.1073/pnas.2536430100 | pmc = 307563 |arxiv = physics/0309117 |bibcode = 2003PNAS..10015310X }}</ref> +|- +! [[Mfold]] +|[[Gibbs free energy|MFE]] (Minimum Free Energy) RNA structure prediction algorithm. || no || [http://www.bioinfo.rpi.edu/applications/mfold/ sourcecode,] [http://mfold.rit.albany.edu/?q=mfold webserver] || <ref name="pmid6163133">{{cite journal | author = Zuker M, Stiegler P | title = Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information | journal = Nucleic Acids Res. | volume = 9 | issue = 1 | pages = 133–48 | year = 1981 | pmid = 6163133 | doi = 10.1093/nar/9.1.133 | pmc = 326673 }}</ref> +|- +! [[Pknots]] +|A dynamic programming algorithm for optimal RNA pseudoknot prediction using the nearest neighbour energy model. ||yes||[http://selab.janelia.org/software.html sourcecode]||<ref name="pmid9925784">{{cite journal | author = Rivas E, Eddy SR | title = A dynamic programming algorithm for RNA structure prediction including pseudoknots | journal = J. Mol. Biol. 
| volume = 285 | issue = 5 | pages = 2053–68 | year = 1999 | pmid = 9925784 | doi = 10.1006/jmbi.1998.2436 }}</ref> +|- +! [[PknotsRG]] +|A dynamic programming algorithm for the prediction of a restricted class of RNA pseudoknots.||yes||[http://bibiserv.techfak.uni-bielefeld.de/pknotsrg/welcome.html sourcecode,] [http://bibiserv.techfak.uni-bielefeld.de/pknotsrg/submission.html webserver]||<ref name="pmid17478505">{{cite journal | author = Reeder J, Steffen P, Giegerich R | title = pknotsRG: RNA pseudoknot folding including near-optimal structures and sliding windows | journal = Nucleic Acids Res. | volume = 35 | issue = Web Server issue | pages = W320–4 | year = 2007 | pmid = 17478505 | doi = 10.1093/nar/gkm258 | pmc = 1933184 }}</ref> +|- +! [[RNA123]] +|Secondary structure prediction via thermodynamic-based folding algorithms and novel structure-based sequence alignment specific for RNA.|| yes || [http://www.rna123.com/ webserver] || +|- +! [[RNAfold]] +|MFE RNA structure prediction algorithm. Includes an implementation of the partition function for computing basepair probabilities and circular RNA folding.|| no || [http://www.tbi.univie.ac.at/~ivo/RNA/ sourcecode,] [http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi webserver] || +<ref name="pmid6163133"/><ref name="RNAInverse">{{cite journal | author = I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster | title = Fast Folding and Comparison of RNA Secondary Structures. | journal = Monatshefte f. Chemie | volume = 125 | issue = 2 | pages = 167–188 | year = 1994 | doi = 10.1007/BF00818163}}</ref><ref name="pmid1695107">{{cite journal | author = McCaskill JS | title = The equilibrium partition function and base pair binding probabilities for RNA secondary structure | journal = Biopolymers | volume = 29 | issue = 6-7 | pages = 1105–19 | year = 1990 | pmid = 1695107 | doi = 10.1002/bip.360290621 }}</ref><ref name="pmid16452114">{{cite journal | author = Hofacker IL, Stadler PF | title = Memory efficient folding algorithms for [[circular RNA]] secondary structures | journal = Bioinformatics | volume = 22 | issue = 10 | pages = 1172–6 | year = 2006 | pmid = 16452114 | doi = 10.1093/bioinformatics/btl023 }}</ref><ref name="pmid17611759">{{cite journal | author = Bompfünewerer AF, Backofen R, Bernhart SH, ''et al.'' | title = Variations on RNA folding and alignment: lessons from Benasque | journal = J Math Biol | volume = 56 | issue = 1-2 | pages = 129–144 | year = 2008 | pmid = 17611759 | doi = 10.1007/s00285-007-0107-5 }}</ref> +|- +! [[RNAshapes]] +|MFE RNA structure prediction based on abstract shapes. Shape abstraction retains adjacency and nesting of structural features, but disregards helix lengths, thus reduces the number of suboptimal solutions without losing significant information. Furthermore, shapes represent classes of structures for which probabilities based on Boltzmann-weighted energies can be computed.|| no || [http://bibiserv.techfak.uni-bielefeld.de/download/tools/rnashapes.html source & binaries,] [http://bibiserv.techfak.uni-bielefeld.de/rnashapes/ webserver] ||<ref name="abstract shapes">{{cite journal | author = R. Giegerich, B.Voß, M. Rehmsmeier | title = Abstract shapes of RNA. | journal = Nucleic Acids Res. | volume = 32 | issue = 16 | pages = 4843–4851 | year = 2004 | doi = 10.1093/nar/gkh779 | pmid = 15371549 | pmc = 519098}}</ref><ref name="shape probabilities">{{cite journal | author = B. Voß, R. Giegerich, M. Rehmsmeier | title = Complete probabilistic analysis of RNA shapes. 
| journal = BMC Biology | volume = 4 | year = 2006 | doi = 10.1186/1741-7007-4-5 | pmid = 16480488 | pages = 5 | pmc = 1479382}}</ref> +|- +! [[RNAstructure]] +|A program to predict lowest free energy structures and base pair probabilities for RNA or DNA sequences. Programs are also available to predict Maximum Expected Accuracy structures and these can include pseudoknots. Structure prediction can be constrained using experimental data, including SHAPE, enzymatic cleavage, and chemical modification accessibility. Graphical user interfaces are available for Windows and for Mac OS-X/Linux. Programs are also available for use with Unix-style text interfaces. Additionally, a C++ class library is available.|| yes || [http://rna.urmc.rochester.edu/RNAstructure.html source & binaries] || +<ref name="RNAstructure">{{cite journal | doi = 10.1073/pnas.0401799101 | author = D.H. Mathews, M.D. Disney, J. L. Childs, S.J. Schroeder, M. Zuker, D.H. Turner | title = Incorporating chemical modification constraints into a dynamic programming algorothm for prediction of RNA secondary structure. | journal = Proceedings of the National Academy of Sciences of the United States of America| volume = 101 | issue = 19 | pages = 7287–7292 | year = 2004 | pmid = 15123812 | pmc = 409911 | bibcode=2004PNAS..101.7287M}}</ref><ref name="RNAstructure-partition">{{cite journal | author = D.H. Mathews | title = Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. | journal = RNA | volume = 10 | pages = 1178–1190 | year = 2004 | pmid = 15272118 | issue = 8 | doi = 10.1261/rna.7650904 | pmc = 1370608 }}</ref> +|- +! [[Sfold]] +|Statistical sampling of all possible structures. The sampling is weighted by partition function probabilities. || no ||[http://sfold.wadsworth.org webserver]||<ref name="pmid14654704">{{cite journal | author = Ding Y, Lawrence CE | title = A statistical sampling algorithm for RNA secondary structure prediction | journal = Nucleic Acids Res. | volume = 31 | issue = 24 | pages = 7280–301 | year = 2003 | pmid = 14654704 | doi = 10.1093/nar/gkg938 | pmc = 297010 }}</ref><ref name="pmid15215366">{{cite journal | author = Ding Y, Chan CY, Lawrence CE | title = Sfold web server for statistical folding and rational design of nucleic acids | journal = Nucleic Acids Res. | volume = 32 | issue = Web Server issue | pages = W135–41 | year = 2004 | pmid = 15215366 | doi = 10.1093/nar/gkh449 | pmc = 441587 }}</ref><ref name="pmid16043502">{{cite journal | author = Ding Y, Chan CY, Lawrence CE | title = RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble | journal = RNA | volume = 11 | issue = 8 | pages = 1157–66 | year = 2005 | pmid = 16043502 | doi = 10.1261/rna.2500605 | pmc = 1370799 }}</ref><ref name="pmid16109749">{{cite journal | author = Chan CY, Lawrence CE, Ding Y | title = Structure clustering features on the Sfold Web server | journal = Bioinformatics | volume = 21 | issue = 20 | pages = 3926–8 | year = 2005 | pmid = 16109749 | doi = 10.1093/bioinformatics/bti632 }}</ref> +|- +! [[UNAFold]] +|The UNAFold software package is an integrated collection of programs that simulate folding, hybridization, and melting pathways for one or two single-stranded nucleic acid sequences. || no || [http://www.bioinfo.rpi.edu/applications/hybrid/download.php sourcecode] || <ref name="pmid18712296">{{cite journal |author=Markham NR, Zuker M |title=UNAFold: software for nucleic acid folding and hybridization. 
|journal=Methods Mol Biol |volume=453 |issue= |pages=3–31 |year=2008 |pmid=18712296 |doi=10.1007/978-1-60327-429-6_1}}</ref> +|- +! [[Crumple]] +|Crumple is simple, cleanly written software for producing the full set of possible secondary structures for a single sequence, given optional constraints. || no || [http://adenosine.chem.ou.edu#crumple sourcecode] || <ref name="pmid21723827">{{cite journal |author=Schroeder S, Bleckley S, Stone JW |title=Ensemble of secondary structures for encapsidated satellite tobacco mosaic virus RNA consistent with chemical probing and crystallography constraints. |journal=Biophysical Journal |volume=101 |issue=1 |pages=167–175 |year=2011 |pmid=21723827 |doi=10.1016/j.bpj.2011.05.053|bibcode = 2011BpJ...101..167S }}</ref> +|- +! [[Sliding Windows & Assembly]] +|Sliding windows and assembly is a tool chain for folding long series of similar hairpins. || no || [http://adenosine.chem.ou.edu#sliding sourcecode] || <ref name="pmid21723827"/> +|- +| colspan=5| +;Notes:{{reflist|group=Note}} +|} + +==Single sequence tertiary structure prediction== +{| class="wikitable sortable" +! Name +! Description +! Knots<br/><ref group=Note>'''Knots:''' [[Pseudoknot]] prediction, <yes|no>.</ref> +! Links || References +|- +! [[BARNACLE]] +|A Python library for the probabilistic sampling of RNA structures that are compatible with a given nucleotide sequence and that are RNA-like on a local length scale. || yes || [http://sourceforge.net/projects/barnacle-rna/ sourcecode] || <ref name="pmid19543381">{{cite journal |author=Frellsen J, Moltke I, Thiim M, Mardia KV, Ferkinghoff-Borg J, Hamelryck T |editor1-last=Gardner |editor1-first=Paul |title=A probabilistic model of RNA conformational space. |journal=PLoS Comput Biol |volume=5 |issue=6 |pages=e1000406 |year=2009 |pmid=19543381 |doi=10.1371/journal.pcbi.1000406 |pmc=2691987}}</ref> +|- +! [[FARNA]] +|Automated de novo prediction of native-like RNA tertiary structures . || yes || [http://faculty.washington.edu/rhiju/FARNA/ sourcecode] || <ref name="pmid17726102">{{cite journal |author=Das R, Baker D |title=Automated de novo prediction of native-like RNA tertiary structures |journal=Proc. Natl. Acad. Sci. U.S.A. |volume=104 |issue=37 |pages=14664–9 |date=September 2007 |pmid=17726102 |pmc=1955458 |doi=10.1073/pnas.0703836104 |url=http://www.pnas.org/cgi/pmidlookup?view=long&pmid=17726102|bibcode = 2007PNAS..10414664D }}</ref> +|- +! [[iFoldRNA]] +|three-dimensional RNA structure prediction and folding || yes || [http://iFoldRNA.dokhlab.org webserver] || <ref name="pmid18579566">{{cite journal |author=Sharma S, Ding F, Dokholyan NV |title=iFoldRNA: three-dimensional RNA structure prediction and folding |journal=Bioinformatics |volume=24 |issue=17 |pages=1951–2 |date=September 2008 |pmid=18579566 |pmc=2559968 |doi=10.1093/bioinformatics/btn328 |url=http://bioinformatics.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=18579566}}</ref> +|- +! [[MC-Fold MC-Sym Pipeline]] +| Thermodynamics and Nucleotide cyclic motifs for RNA structure prediction algorithm. 2D and 3D structures. || yes || [http://www.major.iric.ca/MajorLabEn/MC-Tools.html sourcecode,] [http://www.major.iric.ca/MC-Pipeline/ webserver] || <ref name="pmid18322526">{{cite journal | author = Parisien M, Major F | title = The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data | journal = Nature | volume = 452 | issue = 1 | pages = 51–55 | year = 2008 | pmid = 18322526 | doi = 10.1038/nature06684 |bibcode = 2008Natur.452...51P }}</ref> +|- +! 
[[NAST (software)|NAST]] +|Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters || ? || [https://simtk.org/home/nast executables] || <ref name="pmid20651028">{{cite journal |author=SC Flores, RB Altman |title=Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters |journal=RNA |volume=15 |issue=9 |pages=1769–1778 |date=September 2010 |pmid=20651028 |pmc=2648710 |doi=10.1261/rna.1270809 |url=http://rnajournal.cshlp.org/cgi/pmidlookup?view=long&pmid=19144906}}</ref> +|- +! [[MMB (software)|MMB]] +|Turning limited experimental information into 3D models of RNA || ? || [https://simtk.org/home/rnatoolbox sourcecode] || <ref name="pmid19144906">{{cite journal |author=Jonikas MA, Radmer RJ, Laederach A, ''et al.'' |title=Turning limited experimental information into 3D models of RNA |journal=RNA |volume=16 |issue=2 |pages=189–99 |date=February 2009 |pmid=19144906 |pmc= 2924536 |doi=10.1261/rna.2112110 |url=http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2924536/}}</ref> +|- +! [[RNA123]] +|An integrated platform for de novo and homology modeling of RNA 3D structures, where coordinate file input, sequence editing, sequence alignment, structure prediction and analysis features are all accessed from a single intuitive graphical user interface. || yes || [http://www.rna123.com/ webserver] || +|- +! [[RNAComposer]] +|Fully automated prediction of large RNA 3D structures. || yes || [http://rnacomposer.cs.put.poznan.pl/ webserver] [http://rnacomposer.ibch.poznan.pl/ webserver] || <ref name="pmid22539264">{{cite journal |author= Popenda M, Szachniuk M, Antczak M, Purzycka KJ, Lukasiak P, Bartol N, Blazewicz J, Adamiak RW | title = Automated 3D structure composition for large RNAs | journal = Nucleic Acids Res. | volume = 40 | issue = 14 | pages = 1–12 | year = 2012 | pmid = 22539264 |pmc= 3413140 | doi = 10.1093/nar/gks339 }}</ref> +|- +| colspan=5| +;Notes:{{reflist|group=Note}} +|} + +==Comparative methods== +The single sequence methods mentioned above have a difficult job detecting a small sample of reasonable secondary structures from a large space of possible structures. A good way to reduce the size of the space is to use evolutionary approaches. Structures that have been conserved by evolution are far more likely to be the functional form. The methods below use this approach. + +{| class="wikitable sortable" +! Name +! Description +! Number of sequences<br/><ref group=Note>'''Number of sequences:''' <any|num>.</ref> +! Alignment<br/><ref group=Note>'''Alignment:''' predicts an [[sequence alignment|alignment]], <input|yes|no>.</ref> +! Structure<br/><ref group=Note>'''Structure:''' predicts [[RNA structure|structure]], <input|yes|no>.</ref> +! Knots<br/><ref group=Note>'''Knots:''' [[Pseudoknot]] prediction, <yes|no>.</ref> +! Link || References +|- +! [[Carnac (software)|Carnac]] +|Comparative analysis combined with MFE folding.||any||no||yes||no||[http://bioinfo.lifl.fr/RNA/carnac/index.php sourcecode,] [http://bioinfo.lifl.fr/RNA/carnac/carnac.php webserver]||<ref name="pmid12499300">{{cite journal | author = Perriquet O, Touzet H, Dauchet M. | title = Finding the common structure shared by two homologous RNAs. | journal = Bioinformatics. | year = 2003 | volume = 19| issue = 1 | pages = 108–16 | pmid = 12499300 | doi = 10.1093/bioinformatics/19.1.108 }}</ref><ref name="pmid15215367">{{cite journal | author = Touzet H, Perriquet O. | title = CARNAC: folding families of related RNAs. 
| series = 32| journal = Nucleic Acids Res. | date = Jul 1, 2004| volume = (Web Server issue)| pages = W142–5.| pmid = 15215367 | issue = Web Server issue | doi = 10.1093/nar/gkh415 | pmc = 441553}}</ref> +|- +! [[CentroidAlifold]] +|Common secondary structure prediction based on generalized centroid estimator ||any||no||yes||no|| [http://www.ncrna.org/centroidfold/ sourcecode] [http://www.ncrna.org/centroidfold/ webserver]||<ref name="pmid20843778">{{cite journal | author = Michiaki Hamada, Kengo Sato, Kiyoshi Asai | title = Improving the accuracy of predicting secondary structure for aligned RNA sequences | journal = Nucleic Acids Res. | volume = 39 | issue = 2 | pages = 393–402 | year = 2011 | pmid = 20843778 | doi = 10.1093/nar/gkq792 }}</ref> +|- +! [[CentroidAlign]] +|Fast and accurate multiple aligner for RNA sequences ||any||yes||no||no|| [http://www.ncrna.org/software/centroidalign sourcecode] ||<ref name="pmid19808876">{{cite journal | author = Michiaki Hamada, Kengo Sato, Hisanori Kiryu, Toutai Mituyama, Kiyoshi Asai | title = CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score | journal = Bioinformatics | volume = 25 | issue = 24 | pages = 3236–43 | year = 2009 | pmid = 19808876 | doi = 10.1093/bioinformatics/btp580 }}</ref> +|- +! [[CMfinder]] +|an expectation maximization algorithm using covariance models for motif description. Uses heuristics for effective motif search, and a Bayesian framework for structure prediction combining folding energy and sequence covariation.||<math>3\le seqs \le60</math>||yes||yes||no||[http://bio.cs.washington.edu/yzizhen/CMfinder/ sourcecode,] [http://wingless.cs.washington.edu/htbin-post/unrestricted/CMfinderWeb/CMfinderInput.pl webserver,] [http://bio.cs.washington.edu/yzizhen/CMfinder/ website]|| <ref name="pmid16357030">{{cite journal | author = Yao Z, Weinberg Z, Ruzzo WL | title = CMfinder--a covariance model based RNA motif finding algorithm | journal = Bioinformatics | volume = 22 | issue = 4 | pages = 445–52 | year = 2006 | pmid = 16357030 | doi = 10.1093/bioinformatics/btk008 }}</ref> +|- +! [[CONSAN]] +|implements a pinned Sankoff algorithm for simultaneous pairwise RNA alignment and consensus structure prediction.|| 2 || yes || yes || no || [http://selab.janelia.org/software.html sourcecode] || <ref name="pmid16952317">{{cite journal | author = Dowell RD, Eddy SR | title = Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints | journal = BMC Bioinformatics | volume = 7| pages = 400 | year = 2006 | pmid = 16952317 | doi = 10.1186/1471-2105-7-400 | pmc = 1579236 }}</ref> +|- +! [[DAFS (software)|DAFS]] +|Simultaneous aligning and folding of RNA sequences via dual decomposition.|| any || yes || yes || yes || [https://code.google.com/p/dafs-rna/ sourcecode] || <ref name="pmid23060618">{{cite journal | author = Sato K, Kato Y, Akutsu T, Asai K, Sakakibara Y | title = DAFS: simultaneous aligning and folding of RNA sequences via dual decomposition | journal = Bioinformatics | volume = 28| pages = 3218–24 | year = 2012 | pmid = 23060618 | doi = 10.1093/bioinformatics/bts612 }}</ref> +|- +! [[Dynalign]] +|an algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity. 
|| 2 || yes || yes || no || [http://rna.urmc.rochester.edu/dynalign.html sourcecode] || <ref name="pmid11902836">{{cite journal | author = Mathews DH, Turner DH | title = Dynalign: an algorithm for finding the secondary structure common to two RNA sequences | journal = J. Mol. Biol. | volume = 317 | issue = 2 | pages = 191–203 | year = 2002 | pmid = 11902836 | doi = 10.1006/jmbi.2001.5351 }}</ref><ref name="pmid15731207">{{cite journal | author = Mathews DH | title = Predicting a set of minimal free energy RNA secondary structures common to two sequences | journal = Bioinformatics | volume = 21 | issue = 10 | pages = 2246–53 | year = 2005 | pmid = 15731207 | doi = 10.1093/bioinformatics/bti349 }}</ref><ref name="pmid17445273">{{cite journal | author = Harmanci AO, Sharma G, Mathews DH | title = Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign | journal = BMC Bioinformatics | volume = 8| pages = 130 | year = 2007 | pmid = 17445273 | doi = 10.1186/1471-2105-8-130 | pmc = 1868766 }}</ref> +|- +! [[FoldalignM]] +|A multiple RNA structural RNA alignment method, to a large extend based on the PMcomp program.||any||yes||yes||no|| [http://foldalign.ku.dk/software/index.html sourcecode]|| <ref name="pmid17324941">{{cite journal | author = Torarinsson E, Havgaard JH, Gorodkin J | title = Multiple structural alignment and clustering of RNA sequences | journal = Bioinformatics | volume = 23 | issue = 8 | pages = 926–32 | year = 2007 | pmid = 17324941 | doi = 10.1093/bioinformatics/btm049 }}</ref> +|- +! [[FRUUT]] +|A pairwise RNA structural alignment tool based on the comparison of RNA trees. Considers alignments in which the compared trees can be rooted differently (with respect to the standard “external loop” corresponding roots), and/or permuted with respect to branching order.||any||yes||input||no||[http://www.cs.bgu.ac.il/~negevcb/FRUUT/code/FRUUT-2.40.jar sourcecode,] [http://www.cs.bgu.ac.il/~negevcb/FRUUT/ webserver]|| <ref name="978-3-642-33122-0_11">{{cite journal | author = Milo Nimrod, Zakov Shay, Katzenelson Erez, Bachmat Eitan, Dinitz Yefim, Ziv-Ukelson Michal | title = RNA Tree Comparisons via Unrooted Unordered Alignments | journal = Algorithms in Bioinformatics | volume = 7534 | pages = 135–148 | year = 2012 | pmid = | doi = 10.1007/978-3-642-33122-0_11 | issn = }}</ref><ref name="pmid23590940">{{cite journal | author = Milo Nimrod, Zakov Shay, Katzenelson Erez, Bachmat Eitan, Dinitz Yefim, Ziv-Ukelson Michal | title = Unrooted unordered homeomorphic subtree alignment of RNA trees | journal = Algorithms for Molecular Biology | volume = 8 | pages = 13 | year = 2013 | pmid = 23590940 | doi = 10.1186/1748-7188-8-13 | issn = 1748-7188}}</ref> +|- +! [[GraphClust]] +|Fast RNA structural clustering method of local RNA secondary structures. Predicted clusters are refined using LocARNA and CMsearch. Due to the linear time complexity for clustering it is possible to analyse large RNA datasets. ||any||yes||yes||no|| [http://www.bioinf.uni-freiburg.de/Software/GraphClust/ sourcecode]|| <ref name="pmid22689765">{{cite journal | author = Heyne S, Costa F, Rose D, Backofen R | title = GraphClust: alignment-free structural clustering of local RNA secondary structures | journal = Bioinformatics | volume = 28 | issue = 12 | pages = i224-i232 | year = 2012 | pmid = 22689765 | doi = 10.1093/bioinformatics/bts224 }}</ref> +|- +! 
[[KNetFold]] +|Computes a consensus RNA secondary structure from an RNA sequence alignment based on machine learning.||any||input||yes||yes||[http://www-lmmb.ncifcrf.gov/~bshapiro/downloader_v1/register.php linuxbinary,] [http://knetfold.abcc.ncifcrf.gov/ webserver]|| <ref name="pmid16495232">{{cite journal | author = Bindewald E, Shapiro BA | title = RNA secondary structure prediction from sequence alignments using a network of k-nearest neighbor classifiers | journal = RNA | volume = 12 | issue = 3 | pages = 342–52 | year = 2006 | pmid = 16495232 | doi = 10.1261/rna.2164906 | pmc = 1383574 }}</ref> +|- +! [[LARA (software)|LARA]] +|Produce a global fold and alignment of ncRNA families using integer linear programming and Lagrangian relaxation.||any||yes||yes||no||[https://www.mi.fu-berlin.de/w/LiSA/ sourcecode] || <ref name="pmid17662141">{{cite journal | author = Bauer M, Klau GW, Reinert K. | title = Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization. | journal = BMC Bioinformatics. | volume = 8 | year = 2007 | pmid = 17662141 | doi = 10.1186/1471-2105-8-271 | pages = 271 | pmc = 1955456 }}</ref> +|- +! [[LocaRNA]] +|LocaRNA is the successor of PMcomp with an improved time complexity. It is a variant of Sankoff's algorithm for simultaneous folding and alignment, which takes as input pre-computed base pair probability matrices from McCaskill's algorithm as produced by RNAfold -p. Thus the method can also be viewed as way to compare base pair probability matrices. || any || yes || yes || no || [http://www.bioinf.uni-freiburg.de/Software/LocARNA/ sourcecode,] [http://rna.informatik.uni-freiburg.de:8080/LocARNA/ webserver] || <ref name="pmid17432929">{{cite journal | author = Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R | title = Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. | journal = PLoS Comput Biol. | volume = 3 | issue = 4 | pages = e65 | year = 2007 | pmid = 17432929 | doi = 10.1371/journal.pcbi.0030065 | pmc = 1851984 |bibcode = 2007PLSCB...3...65W }}</ref> +|- +! [[MASTR]] +|A sampling approach using Markov chain Monte Carlo in a [[simulated annealing]] framework, where both structure and alignment is optimized by making small local changes. The score combines the log-likelihood of the alignment, a covariation term and the basepair probabilities.||any||yes||yes||no|| [http://mastr.binf.ku.dk/ sourcecode]|| <ref name="pmid17038338">{{cite journal | author = Lindgreen S, Gardner PP, Krogh A | title = Measuring covariation in RNA alignments: physical realism improves information measures | journal = Bioinformatics | volume = 22 | issue = 24 | pages = 2988–95 | year = 2006 | pmid = 17038338 | doi = 10.1093/bioinformatics/btl514 }}</ref><ref name="pmid18006551">{{cite journal | author = Lindgreen S, Gardner PP, Krogh A | title = MASTR: multiple alignment and structure prediction of non-coding RNAs using simulated annealing | journal = Bioinformatics | volume = 23 | issue = 24 | pages = 3304–11 | year = 2007 | pmid = 18006551 | doi = 10.1093/bioinformatics/btm525 }}</ref> +|- +! [[Multilign]] +|This method uses multiple Dynalign calculations to find a low free energy structure common to any number of sequences. It does not require any sequence identity. 
|| any || yes || yes || no || [http://rna.urmc.rochester.edu/RNAstructure.html sourcecode] || <ref name="pmid21193521">{{cite journal | author = Xu Z, Mathews DH | title = Multilign: an algorithm to predict secondary structures conserved in multiple RNA sequences | journal = Bioinformatics | volume = 27 | issue = 5 | pages = 626–632 | year = 2011 | pmid = 21193521 | doi = 10.1093/bioinformatics/btq726 }}</ref> +|- +! [[Murlet]] +|a multiple alignment tool for RNA sequences using iterative alignment based on Sankoff's algorithm with sharply reduced computational time and memory. || any || yes || yes || no || [http://murlet.ncrna.org/murlet/murlet.html webserver] || <ref name="pmid17459961">{{cite journal | author = Kiryu H, Tabei Y, Kin T, Asai K | title = Murlet: a practical multiple alignment tool for structural RNA sequences | journal = Bioinformatics | volume = 23 | issue = 13 | pages = 1588–98 | year = 2007 | pmid = 17459961 | doi = 10.1093/bioinformatics/btm146 }}</ref> +|- +! [[MXSCARNA]] +|a multiple alignment tool for RNA sequences using progressive alignment based on pairwise structural alignment algorithm of SCARNA. || any || yes || yes || no || [http://mxscarna.ncrna.org/mxscarna/mxscarna.html webserver] [http://www.ncrna.org/software/mxscarna/download/ sourcecode] || <ref name="MXSCARNA">{{cite journal | author = Tabei Y, Kiryu H, Kin T, Asai K | title = A fast structural multiple alignment method for long RNA sequences | journal = BMC Bioinformatics | year = 2008 | volume = 33 | url=http://www.biomedcentral.com/1471-2105/9/33}}</ref> +|- +! [[PARTS]] +|A method for joint prediction of alignment and common secondary structures of two RNA sequences using a probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities. || 2 || yes || yes || no || [http://rna.urmc.rochester.edu sourcecode] || <ref name="pmid18304945">{{cite journal |author=Harmanci AO, Sharma G, Mathews DH |title=PARTS: probabilistic alignment for RNA joinT secondary structure prediction. |journal=Nucleic Acids Res |volume=36 |issue=7 |pages=2406–17 |year=2008 |pmid=18304945 |doi=10.1093/nar/gkn043 |pmc=2367733}}</ref> +|- +! [[Pfold]] +|Folds alignments using a SCFG trained on rRNA alignments. ||<math>\le40</math>||input||yes||no||[http://www.daimi.au.dk/~compbio/rnafold/ webserver]||<ref name="pmid10383470">{{cite journal | author = Knudsen B, Hein J | title = RNA secondary structure prediction using stochastic context-free grammars and evolutionary history | journal = Bioinformatics | volume = 15 | issue = 6 | pages = 446–54 | year = 1999 | pmid = 10383470 | doi = 10.1093/bioinformatics/15.6.446 }}</ref><ref name="pmid12824339">{{cite journal | author = Knudsen B, Hein J | title = Pfold: RNA secondary structure prediction using stochastic context-free grammars | journal = Nucleic Acids Res. | volume = 31 | issue = 13 | pages = 3423–8 | year = 2003 | pmid = 12824339 | doi = 10.1093/nar/gkg614 | pmc = 169020 }}</ref> +|- +! [[PETfold]] +|Formally integrates both the energy-based and evolution-based approaches in one model to predict the folding of multiple aligned RNA sequences by a maximum expected accuracy scoring. The structural probabilities are calculated by RNAfold and Pfold. 
|| any || input || yes || no || [http://genome.ku.dk/resources/petfold/ sourcecode] || <ref name="pmid18836192">{{cite journal | author = Seemann S E, Gorodkin J, Backofen R | title = Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments | journal = Nucleic Acids Res. | volume = 36 | issue = 20 | pages = 6355–62 | year = 2008 | pmid = 18836192 | doi = 10.1093/nar/gkn544 | pmc = 2582601 }}</ref> +|- +! [[PMcomp/PMmulti]] +|PMcomp is a variant of Sankoff's algorithm for simultaneous folding and alignment, which takes as input pre-computed base pair probability matrices from McCaskill's algorithm as produced by RNAfold -p. Thus the method can also be viewed as way to compare base pair probability matrices. PMmulti is a wrapper program that does progressive multiple alignments by repeatedly calling pmcomp || <math>2\le seqs \le6</math> || yes || yes || no || [http://www.tbi.univie.ac.at/~ivo/RNA/PMcomp/ sourcecode,] [http://rna.tbi.univie.ac.at/cgi-bin/pmcgi.pl webserver] || <ref name="pmid15073017">{{cite journal | author = Hofacker IL, Bernhart SH, Stadler PF | title = Alignment of RNA base pairing probability matrices | journal = Bioinformatics | volume = 20 | issue = 14 | pages = 2222–7 | year = 2004 | pmid = 15073017 | doi = 10.1093/bioinformatics/bth229 }}</ref> +|- +! [[RNAG]] +|A Gibbs sampling method to determine a conserved structure and the structural alignment. || any || yes || yes || no || [http://ccmbweb.ccv.brown.edu/rnag.html sourcecode] || <ref name="pmid21788211">{{cite journal | author = Wei D, Alpert LV, Lawrence CE | title = RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequence | journal = Bioinformatics | volume = 27 | issue = 18 | pages = 2486–2493 | year = 2011 | pmid = 21788211 | doi = 10.1093/bioinformatics/btr421 }}</ref> +|- +! [[T-Coffee|R-COFFEE]] +|uses RNAlpfold to compute the secondary structure of the provided sequences. A modified version of [[T-Coffee]] is then used to compute the multiple sequence alignment having the best agreement with the sequences and the structures. R-Coffee can be combined with any existing sequence alignment method. || any || yes || yes || no || [http://www.tcoffee.org/Projects_home_page/r_coffee_home_page.html sourcecode,] [http://www.tcoffee.org/ webserver] || <ref name="pmid18420654">{{cite journal |author=Wilm A, Higgins DG, Notredame C |title=R-Coffee: a method for multiple alignment of non-coding RNA |journal=Nucleic Acids Res. |volume=36 |issue=9 |pages=e52 |date=May 2008 |pmid=18420654 |pmc=2396437 |doi=10.1093/nar/gkn174 |url=http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=18420654}}</ref><ref name="pmid18483080">{{cite journal |author=Moretti S, Wilm A, Higgins DG, Xenarios I, Notredame C |title=R-Coffee: a web server for accurately aligning noncoding RNA sequences |journal=Nucleic Acids Res. |volume=36 |issue=Web Server issue |pages=W10–3 |date=July 2008 |pmid=18483080 |pmc=2447777 |doi=10.1093/nar/gkn278 |url=http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=18483080}}</ref> +|- +! [[TurboFold]] +|This algorithm predicts conserved structures in any number of sequences. 
It uses probabilistic alignment and partition functions to map conserved pairs between sequences, and then iterates the partition functions to improve structure prediction accuracy || any || no || yes || yes || [http://rna.urmc.rochester.edu/RNAstructure.html sourcecode] || <ref name="pmid21507242">{{cite journal | author = Harmanci AO, Sharma G, Mathews DH | title = TurboFold: iterative probabilistic estimation of secondary structures for multiple RNA sequence | journal = BMC Bioinformatics | volume = 12 | pages = 108 | year = 2011 | pmid = 21507242 | doi = 10.1186/1471-2105-12-108 | pmc=3120699}}</ref><ref name="pmid22285566">{{cite journal | author = Seetin MG, Mathews DH | title = TurboKnot: rapid prediction of conserved RNA secondary structures including pseudoknots | journal = Bioinformatics | volume = 28 | issue = 6 | pages = 792–798 | year = 2012 | pmid = 22285566 | doi = 10.1093/bioinformatics/bts044 }}</ref> +|- +! [[RNA123]] +|The structure based sequence alignment (SBSA) algorithm within RNA123 utilizes a novel suboptimal version of the Needleman-Wunsch global sequence alignment method that fully accounts for secondary structure in the template and query. It also utilizes two separate substitution matrices that are optimized for RNA helices and single stranded regions. The SBSA algorithm provides >90% accurate sequence alignments even for structures as large as bacterial 23S rRNA (~2800 nts). || any ||yes || yes || yes || [http://www.rna123.com/ webserver] || +|- +! [[RNAalifold]] +|Folds precomputed alignments using a combination of free-energy and a covariation measures. Ships with the Vienna package. || any ||input || yes || no || [http://www.tbi.univie.ac.at/~ivo/RNA/ homepage] || <ref name="RNAInverse"/><ref name="pmid12079347">{{cite journal | author = Hofacker IL, Fekete M, Stadler PF | title = Secondary structure prediction for aligned RNA sequences | journal = J. Mol. Biol. | volume = 319 | issue = 5 | pages = 1059–66 | year = 2002 | pmid = 12079347 | doi = 10.1016/S0022-2836(02)00308-X }}</ref> +|- +! [[RNAcast]] +|enumerates the near-optimal abstract shape space, and predicts as the consensus an abstract shape common to all sequences, and for each sequence, the thermodynamically best structure which has this abstract shape. ||any||no||yes||no ||[http://bibiserv.techfak.uni-bielefeld.de/rnacast/ sourcecode,] [http://bibiserv.techfak.uni-bielefeld.de/rnashapes/submission.html webserver]|| <ref name="pmid16020472">{{cite journal | author = Reeder J, Giegerich R | title = Consensus shapes: an alternative to the Sankoff algorithm for RNA consensus structure prediction | journal = Bioinformatics | volume = 21 | issue = 17 | pages = 3516–23 | year = 2005 | pmid = 16020472 | doi = 10.1093/bioinformatics/bti577 }}</ref> +|- +! 
[[RNAforester]] +|Compare and align RNA secondary structures via a "forest alignment" approach.||any||yes||input||no||[http://bibiserv.techfak.uni-bielefeld.de/rnaforester/ sourcecode,] [http://bibiserv.techfak.uni-bielefeld.de/rnaforester/submission.html webserver]|| <ref name="pmid16452790">{{cite journal | author = Höchsmann M, Töller T, Giegerich R, Kurtz S | title = Local similarity in RNA secondary structures | journal = Proc IEEE Comput Soc Bioinform Conf | volume = 2 | issue = | pages = 159–68 | year = 2003 | pmid = 16452790 | doi = | issn = }}</ref><ref name="pmid17048408">{{cite journal | author = Höchsmann M, Voss B, Giegerich R | title = Pure multiple RNA secondary structure alignments: a progressive profile approach | journal = IEEE/ACM Trans Comput Biol Bioinform | volume = 1 | issue = 1 | pages = 53–62 | year = 2004 | pmid = 17048408 | doi = 10.1109/TCBB.2004.11 }}</ref> +|- +! [[RNAmine]] +|Frequent stem pattern miner from unaligned RNA sequences is a software tool to extract the structural motifs from a set of RNA sequences. || any || no || yes || no || [http://rnamine.ncrna.org/RNAMINE/ webserver] || <ref name="pmid16908501">{{cite journal | author = Hamada M, Tsuda K, Kudo T, Kin T, Asai K | title = Mining frequent stem patterns from unaligned RNA sequences | journal = Bioinformatics | volume = 22 | issue = 20 | pages = 2480–7 | year = 2006 | pmid = 16908501 | doi = 10.1093/bioinformatics/btl431 }}</ref> +|- +! [[RNASampler]] +|A probabilistic sampling approach that combines intrasequence base pairing probabilities with intersequence base alignment probabilities. This is used to sample possible stems for each sequence and compare these stems between all pairs of sequences to predict a consensus structure for two sequences. The method is extended to predict the common structure conserved among multiple sequences by using a consistency-based score that incorporates information from all the pairwise structural alignments. || any || yes || yes || yes || [http://ural.wustl.edu/~xingxu/RNASampler/index.html sourcecode] || <ref name="pmid17537756">{{cite journal | author = Xu X, Ji Y, Stormo GD | title = RNA Sampler: a new sampling based algorithm for common RNA secondary structure prediction and structural alignment | journal = Bioinformatics | volume = 23 | issue = 15 | pages = 1883–91 | year = 2007 | pmid = 17537756 | doi = 10.1093/bioinformatics/btm272 }}</ref> +|- +! [[SCARNA]] +|Stem Candidate Aligner for RNA (Scarna) is a fast, convenient tool for structural alignment of a pair of RNA sequences. It aligns two RNA sequences and calculates the similarities of them, based on the estimated common secondary structures. It works even for pseudoknotted secondary structures.||2||yes||yes||no|| [http://www.scarna.org/scarna/ webserver] || <ref name="pmid16690634">{{cite journal | author = Tabei Y, Tsuda K, Kin T, Asai K | title = SCARNA: fast and accurate structural alignment of RNA sequences by matching fixed-length stem fragments | journal = Bioinformatics | volume = 22 | issue = 14 | pages = 1723–9 | year = 2006 | pmid = 16690634 | doi = 10.1093/bioinformatics/btl177 }}</ref> +|- +! [[SimulFold]] +|simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework. 
|| any || yes || yes || yes || [http://www.cs.ubc.ca/~irmtraud/simulfold/ sourcecode] || <ref name="pmid17696604">{{cite journal | author = Meyer IM, Miklós I | title = SimulFold: simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework | journal = PLoS Comput. Biol. | volume = 3 | issue = 8 | pages = e149 | year = 2007 | pmid = 17696604 | doi = 10.1371/journal.pcbi.0030149 | pmc = 1941756 |bibcode = 2007PLSCB...3..149M }}</ref> +|- +! [[Stemloc]] +|a program for pairwise RNA structural alignment based on probabilistic models of RNA structure known as Pair [[SCFG|stochastic context-free grammars]].||any||yes||yes||no||[http://biowiki.org/StemLoc sourcecode]||<ref name="pmid15790387">{{cite journal | author = Holmes I | title = Accelerated probabilistic inference of RNA structure evolution | journal = BMC Bioinformatics | volume = 6| pages = 73 | year = 2005 | pmid = 15790387 | doi = 10.1186/1471-2105-6-73 | pmc = 1090553 }}</ref> +|- +! [[StrAl]] +|an alignment tool designed to provide multiple alignments of non-coding RNAs following a fast progressive strategy. It combines the thermodynamic base pairing information derived from RNAfold calculations in the form of base pairing probability vectors with the information of the primary sequence.||<math>\le50</math>||yes||no||no||[http://www.biophys.uni-duesseldorf.de/stral/about.php sourcecode,] [http://www.biophys.uni-duesseldorf.de/stral/advancedForm.php webserver]|| <ref name="pmid16613908">{{cite journal | author = Dalli D, Wilm A, Mainz I, Steger G | title = STRAL: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time | journal = Bioinformatics | volume = 22 | issue = 13 | pages = 1593–9 | year = 2006 | pmid = 16613908 | doi = 10.1093/bioinformatics/btl142 }}</ref> +|- +! [[TFold]] +|A tool for predicting non-coding RNA secondary structures including pseudoknots. It takes in input an alignment of RNA sequences and returns the predicted secondary structure(s).It combines criteria of stability, conservation and covariation in order to search for stems and pseudoknots. Users can change different parameters values, set (or not) some known stems (if there are) which are taken into account by the system, choose to get several possible structures or only one, search for pseudoknots or not, etc. +||any||yes||yes||yes||[http://tfold.ibisc.univ-evry.fr:8080/TFold/ webserver]||<ref name="PMID 20047957">{{cite journal | author = Engelen S, Tahi F | title = Tfold: efficient in silico prediction of non-coding RNA secondary structures | journal = Nucleic Acids Res. | volume = 7 | issue = 38| pages = 2453–66 | year = 2010 | pmid = 20047957 | pmc = 2853104 | doi = 10.1093/nar/gkp1067 }}</ref> +|- +! [[WAR (software)|WAR]] +|a webserver that makes it possible to simultaneously use a number of state of the art methods for performing multiple alignment and secondary structure prediction for noncoding RNA sequences. ||<math>2\le seqs \le50</math>||yes||yes||no||[http://genome.ku.dk/resources/war/ webserver]||<ref name="pmid18492721">{{cite journal |author=Torarinsson E, Lindgreen S |title=WAR: Webserver for aligning structural RNAs. |journal=Nucleic Acids Res |volume=36 |issue=Web Server issue |pages=W79–84 |year=2008 |pmid=18492721 |doi=10.1093/nar/gkn275 |pmc=2447782}}</ref> +|- +! 
[[Xrate]] +|a program for analysis of multiple sequence alignments using phylogenetic [[SCFG|grammars]], that may be viewed as a flexible generalization of the "Pfold" program.||any||yes||yes||no||[http://biowiki.org/XrateSoftware sourcecode]||<ref name="pmid17018148">{{cite journal | author = Klosterman P | title = XRate: a fast prototyping, training and annotation tool for phylo-grammars | journal = BMC Bioinformatics | volume = 7 | pages = 428 | year = 2006 | pmid = 17018148 | doi = 10.1186/1471-2105-7-428 | last2 = Uzilov | first2 = AV | last3 = Bendaña | first3 = YR | last4 = Bradley | first4 = RK | last5 = Chao | first5 = S | last6 = Kosiol | first6 = C | last7 = Goldman | first7 = N | last8 = Holmes | first8 = I | pmc = 1622757 }}</ref> +|- +| colspan=8| +;Notes:{{reflist|group=Note}} +|} + +==Inter molecular interactions: RNA-RNA== +Many [[ncRNA]]s function by binding to other [[RNA]]s. For example, [[miRNA]]s regulate protein coding gene expression by binding to [[Three prime untranslated region|3' UTRs]], [[snoRNA|small nucleolar RNAs]] guide post-transcriptional modifications by binding to [[rRNA]], [[U4 spliceosomal RNA]] and [[U6 spliceosomal RNA]] bind to each other forming part of the [[spliceosome]] and many small bacterial RNAs regulate gene expression by antisense interactions E.g. [[GcvB RNA|GcvB]], [[OxyS RNA|OxyS]] and [[RyhB RNA|RyhB]]. + +{| class="wikitable sortable" +! Name +! Description || Intra-molecular structure || Comparative || Link || References +|- +! GUUGle +|A utility for fast determination of RNA-RNA matches with perfect hybridization via A-U, C-G, and G-U base pairing. || no || no || [http://bibiserv.techfak.uni-bielefeld.de/guugle/ webserver] || <ref name="pmid16403789">{{cite journal |doi=10.1093/bioinformatics/btk041 |author=Gerlach W, Giegerich R |title=GUUGle: a utility for fast exact matching under RNA complementary rules including G-U base pairing. |journal=Bioinformatics |volume=22 |issue=6 |pages=762–764 |year=2006 |pmid=16403789 }}</ref> +|- +! IntaRNA +|Efficient target prediction incorporating the accessibility of target sites || yes || no || [http://www.bioinf.uni-freiburg.de/Software/#IntaRNA-download sourcecode,] [http://rna.informatik.uni-freiburg.de:8080/v1/IntaRNA.jsp webserver] || <ref name="pmid18940824">{{cite journal |doi=10.1093/bioinformatics/btn544 |author=Busch A, Richter AS, Backofen R |title=IntaRNA: efficient prediction of bacterial sRNA targets incorporating target site accessibility and seed regions. |journal=Bioinformatics |volume=24 |issue=24 |pages=2849–56 |year=2008 |pmid=18940824 |pmc=2639303}}</ref><ref name="pmid19850757">{{cite journal |doi=10.1093/bioinformatics/btp609 |author=Richter AS, Schleberger C, Backofen R, Steglich C |title=Seed-based INTARNA prediction combined with GFP-reporter system identifies mRNA targets of the small RNA Yfr1. |journal=Bioinformatics |volume=26 |issue=1 |pages=1–5 |year=2010 |pmid=19850757 |pmc=2796815}}</ref><ref name="pmid20444875">{{cite journal |author=Smith C, Heyne S, Richter AS, Will S, Backofen R |title=Freiburg RNA Tools: a web server integrating INTARNA, EXPARNA and LOCARNA. |series=38 |journal=Nucleic Acids Res |volume= Suppl|issue= Web Server|pages= W373–7|year=2010 |pmid=20444875 |pmc=2896085 |doi=10.1093/nar/gkq316}}</ref> +|- +! [[NUPACK]] +| Computes the full unpseudoknotted partition function of interacting strands in dilute solution. Calculates the concentrations, mfes, and base-pairing probabilities of the ordered complexes below a certain complexity. 
Also computes the partition function and basepairing of single strands including a class of pseudoknotted structures. Also enables design of ordered complexes. || yes || no || [http://nupack.org/ NUPACK] || <ref name="NUPACK">{{cite journal | doi=10.1137/060651100 | author = R.M. Dirks, J.S. Bois, J.M. Schaeffer, E. Winfree, N.A. Pierce | title = Thermodynamic Analysis of Interacting Nucleic Acid Strands | journal = SIAM Review | volume = 49 | issue=1 | pages = 65–88 | year = 2007|bibcode = 2007SIAMR..49...65D }}</ref> +|- +! [[OligoWalk/RNAstructure]] +|Predicts bimolecular secondary structures with and without intramolecular structure. Also predicts the hybridization affinity of a short nucleic acid to an RNA target. || yes || no || [http://rna.urmc.rochester.edu] || <ref name="oligowalk">{{cite journal | doi = 10.1017/S1355838299991148 | author = D.H. Mathews, M.E. Burkard, S.M. Freier, D.H. Turner | title = Predicting Oligonucleotide Affinity to RNA Targets. | journal = RNA | volume = 5 | pages = 1458–1469 | year = 1999 | pmid = 10580474 | issue = 11 | pmc = 1369867}}</ref> +|- +! [[piRNA (software)|piRNA]] +|calculates the partition function and thermodynamics of RNA-RNA interactions. It considers all possible joint secondary structure of two interacting nucleic acids that do not contain pseudoknots, interaction pseudoknots, or zigzags. || yes || no || [http://compbio.cs.sfu.ca/taverna/pirna/ linuxbinary] || <ref name="piRNA">{{cite journal | author = H. Chitsaz, R. Salari, S.C. Sahinalp, R. Backofen | title = A Partition Function Algorithm for Interacting Nucleic Acid Strands. | doi=10.1093/bioinformatics/btp212 | pmc=2687966 | journal = Bioinformatics | volume = 25 | pages = | year = 2009 | issue = 12 | pmid=19478011}}</ref> +|- +! [[RNAripalign (software)|RNAripalign]] +|calculates the partition function and thermodynamics of RNA-RNA interactions based on structural alignments. Also supports RNA-RNA interaction prediction for single sequences. It outputs suboptimal structures based on Boltzmann distribution. It considers all possible joint secondary structure of two interacting nucleic acids that do not contain pseudoknots, interaction pseudoknots, or zigzags. || yes || no || [http://www.bioinf.uni-leipzig.de/~qin/resources/ripalign.tar.gz] || <ref name="RNAripalign">{{cite journal | author =Andrew Xiang Li, Jing Qin, Manja Marz, Christian M. Reidys | title = RNA–RNA interaction prediction based on multiple sequence alignments. | doi=10.1093/bioinformatics/btq659 | journal = Bioinformatics | volume = 27 | pages = 456–463 | year = 2011 | issue = 4 }}</ref> +|- +! [[RactIP]] +|Fast and accurate prediction of RNA-RNA interaction using integer programming. || yes || no || [https://code.google.com/p/ractip/ sourcecode] [http://rna.naist.jp/ractip/ webserver]|| <ref name="pmid20823308">{{cite journal | author = Kato Y, Sato K, Hamada M, Watanabe Y, Asai K, Akutsu T | title = RactIP: fast and accurate prediction of RNA-RNA interaction using integer programming | doi=10.1093/bioinformatics/btq372 | pmc=2935440 | journal = Bioinformatics | volume = 26 | pages = i460-6 | year = 2010 | issue = 18 | pmid=20823308}}</ref> +|- +! [[RNAaliduplex]] +|Based upon RNAduplex with bonuses for covarying sites || no || yes || [http://www.tbi.univie.ac.at/~ivo/RNA/ sourcecode] || <ref name="RNAInverse"/> +|- +! [[RNAcofold]] +|works much like RNAfold, but allows to specify two RNA sequences which are then allowed to form a dimer structure. 
|| yes || no || [http://www.tbi.univie.ac.at/~ivo/RNA/ sourcecode] || <ref name="RNAInverse"/><ref name="pmid16722605">{{cite journal | author = Bernhart SH, Tafer H, Mückstein U, Flamm C, Stadler PF, Hofacker IL | title = Partition function and base pairing probabilities of RNA heterodimers | journal = Algorithms Mol Biol | volume = 1 | issue = 1 | pages = 3 | year = 2006 | pmid = 16722605 | doi = 10.1186/1748-7188-1-3 | pmc = 1459172 }}</ref> +|- +! [[RNAduplex]] +|computes optimal and suboptimal secondary structures for hybridization. The calculation is simplified by allowing only inter-molecular base pairs. || no || no || [http://www.tbi.univie.ac.at/~ivo/RNA/ sourcecode] || <ref name="RNAInverse"/> +|- +! [[RNAhybrid]] +|a tool for finding the minimum free energy hybridisation of a long and a short RNA. || no || no || [http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/ sourcecode,] [http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/submission.html webserver] || <ref name="pmid15383676">{{cite journal | author = Rehmsmeier M, Steffen P, Hochsmann M, Giegerich R | title = Fast and effective prediction of microRNA/target duplexes | journal = RNA | volume = 10 | issue = 10 | pages = 1507–17 | year = 2004 | pmid = 15383676 | doi = 10.1261/rna.5248604 | pmc = 1370637 }}</ref><ref name="pmid16845047">{{cite journal | author = Krüger J, Rehmsmeier M | title = RNAhybrid: microRNA target prediction easy, fast and flexible | journal = Nucleic Acids Res. | volume = 34 | issue = Web Server issue | pages = W451–4 | year = 2006 | pmid = 16845047 | doi = 10.1093/nar/gkl243 | pmc = 1538877 }}</ref> +|- +! [[RNAup]] +|calculates the thermodynamics of RNA-RNA interactions. RNA-RNA binding is decomposed into two stages. (1) First the probability that a sequence interval (e.g. a binding site) remains unpaired is computed. (2) Then the binding energy given that the binding site is unpaired is calculated as the optimum over all possible types of bindings. || yes || no || [http://www.tbi.univie.ac.at/~ivo/RNA/ sourcecode] || <ref name="RNAInverse"/><ref name="pmid16446276">{{cite journal | author = Mückstein U, Tafer H, Hackermüller J, Bernhart SH, Stadler PF, Hofacker IL | title = Thermodynamics of RNA-RNA binding | journal = Bioinformatics | volume = 22 | issue = 10 | pages = 1177–82 | year = 2006 | pmid = 16446276 | doi = 10.1093/bioinformatics/btl024 }}</ref> +|} + +==Inter molecular interactions: MicroRNA:UTR== +[[miRNA|MicroRNAs]] regulate protein coding gene expression by binding to [[Three prime untranslated region|3' UTRs]], there are tools specifically designed for predicting these interactions. For an evaluation of target prediction methods on high-throughput experimental data see (Baek ''et al.'', Nature 2008) <ref name="pmid18668037">{{cite journal |author=Baek D, Villén J, Shin C, Camargo FD, Gygi SP, Bartel DP |title=The impact of microRNAs on protein output. |journal=Nature |volume=455 |issue=7209 |pages=64–71 |year=2008 |pmid=18668037 |doi=10.1038/nature07242 }}</ref> and (Alexiou ''et al.'', Bioinformatics 2009)<ref name="pmid19789267">{{cite journal |author=Alexiou P, Maragkakis M, Papadopoulos GL, Reczko M, Hatzigeorgiou AG |title=Lost in translation: an assessment and perspective for computational microRNA target identification. |journal=Bioinformatics |volume=25 |issue=23 |pages=3049–55 |year=2009 |pmid=19789267 |doi=10.1093/bioinformatics/btp565}}</ref> + +{| class="wikitable sortable" +! Name +! 
Description ||Species Specific || Intra-molecular structure || Comparative || Link || References +|- +! [[Diana-microT]] +|DIANA-microT 3.0 is an algorithm based on several parameters calculated individually for each microRNA and it combines conserved and non-conserved microRNA recognition elements into a final prediction score.|| human, mouse || no || yes || [http://diana.cslab.ece.ntua.gr/microT/ webserver] || <ref name="pmid19765283">{{cite journal |author= Maragkakis M, Alexiou P, Papadopoulos GL, Reczko M, Dalamagas T, Giannopoulos G, Goumas G, Koukis E, Kourtis K, Simossis VA, Sethupathy P, Vergoulis T, Koziris N, Sellis T, Tsanakas P, Hatzigeorgiou AG |title=Accurate microRNA target prediction correlates with protein repression levels. |journal=BMC Bioinformatics |volume=10|pages=295 |year=2009 |pmid=19765283 |doi=10.1186/1471-2105-10-295|pmc=2752464}}</ref> +|- +! [[MicroTar]] +|An animal miRNA target prediction tool based on miRNA-target complementarity and thermodynamic data. || no || no || no || [http://tiger.dbs.nus.edu.sg/microtar/ sourcecode] || <ref name="pmid17254305">{{cite journal |author=Thadani R, Tammi MT |title=MicroTar: predicting microRNA targets from RNA duplexes. |series=7 |journal=BMC Bioinformatics |volume=Suppl 5|pages=S20 |year=2006 |pmid=17254305 |doi=10.1186/1471-2105-7-S5-S20 |pmc=1764477}}</ref> +|- +! [[miTarget]] +|microRNA target gene prediction using a support vector machine. || no || no || no || [http://cbit.snu.ac.kr/~miTarget/ webserver] || <ref name="pmid16978421">{{cite journal |author=Kim SK, Nam JW, Rhee JK, Lee WJ, Zhang BT |title=miTarget: microRNA target gene prediction using a support vector machine. |journal=BMC Bioinformatics |volume=7|pages=411 |year=2006 |pmid=16978421 |doi=10.1186/1471-2105-7-411 |pmc=1594580}}</ref> +|- +! [[miRror]] +| Based on the notion of a combinatorial regulation by an ensemble of miRNAs or genes. miRror integrates predictions from a dozen of miRNA resources that are based on complementary algorithms into a unified statistical framework || no || no || no || [http://www.proto.cs.huji.ac.il/mirror/index.php webserver] || <ref>{{cite doi|10.1093/bioinformatics/btq298}}</ref><ref>{{cite doi|10.1093/nar/gks759}}</ref> +|- +! [[PicTar]] +|Combinatorial microRNA target predictions. || 8 vertebrates || no || yes || [http://pictar.bio.nyu.edu predictions] || <ref name="pmid15806104">{{cite journal |author=Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N |title=Combinatorial microRNA target predictions. |journal=Nat Genet |volume=37 |issue=5 |pages=495–500 |year=2005 |pmid=15806104 |doi=10.1038/ng1536}}</ref> +|- +! [[PITA (software)|PITA]] +|Incorporates the role of target-site accessibility, as determined by base-pairing interactions within the mRNA, in microRNA target recognition.|| no || yes || no || [http://genie.weizmann.ac.il/pubs/mir07/mir07_exe.html executable,] [http://genie.weizmann.ac.il/pubs/mir07/mir07_prediction.html webserver,] [http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html predictions] || <ref name="pmid17893677">{{cite journal |author=Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E |title=The role of site accessibility in microRNA target recognition. |journal=Nat Genet |volume=39 |issue=10 |pages=1278–84 |year=2007 |pmid=17893677 |doi=10.1038/ng2135}}</ref> +|- +! [[RNA22]] +|The first link (predictions) provides RNA22 predictions for all protein coding transcripts in human, mouse, roundworm, and fruit fly. 
It allows you to visualize the predictions within a cDNA map and also find transcripts where multiple miR's of interest target. The second web-site link (custom) first finds putative microRNA binding sites in the sequence of interest, then identifies the targeted microRNA. || no || no || no || [http://cm.jefferson.edu/rna22v1.0/ predictions] [http://cbcsrv.watson.ibm.com/rna22.html custom] || <ref name="pmid16990141">{{cite journal |author=Miranda KC, Huynh T, Tay Y, Ang YS, Tam WL, Thomson AM, Lim B, Rigoutsos I |title=A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. |journal=Cell |volume=126 |issue=6 |pages=1203–17 |year=2006 |pmid=16990141 |doi=10.1016/j.cell.2006.07.031}}</ref> +|- +! [[RNAhybrid]] +|a tool for finding the minimum free energy hybridisation of a long and a short RNA. || no || no || no || [http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/ sourcecode,] [http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/submission.html webserver] || <ref name="pmid15383676"/><ref name="pmid16845047"/> +|- +! [[Sylamer]] +|Sylamer is a method for finding significantly over or under-represented words in sequences according to a sorted gene list. Typically it is used to find significant enrichment or depletion of microRNA or siRNA seed sequences from microarray expression data. || no || no || no || [http://www.ebi.ac.uk/enright/sylamer/ sourcecode] [http://www.ebi.ac.uk/enright/sylarray/ webserver] || <ref name="pmid18978784">{{cite journal |author=van Dongen S, Abreu-Goodger C, Enright AJ |title=Detecting microRNA binding and siRNA off-target effects from expression data. |journal=Nat Methods |volume=5 |issue=12 |pages=1023–5 |year=2008 |pmid=18978784 |doi=10.1038/nmeth.1267 |pmc=2635553}}</ref><ref name="pmid20871108">{{cite journal |author=Bartonicek N, Enright AJ |title=SylArray: A web-server for automated detection of miRNA effects from expression data. |journal=Bioinformatics |year=2010 |pmid=20871108 |doi=10.1093/bioinformatics/btq545 |volume=26 |issue=22 |pages=2900–1}}</ref> +|- +! [[TAREF]] +|TAREF stands for TARget REFiner. It predicts microRNA targets on the basis of multiple feature information derived from the flanking regions of the predicted target sites where traditional structure prediction approach may not be successful to assess the openness. It also provides an option to use encoded pattern to refine filtering. || Yes || no || no || [http://scbb.ihbt.res.in/TAREF/programchoice.html server/sourcecode] || <ref name="pmid=20413915">{{cite journal |doi=10.1007/s12038-010-0013-7 |author=R. Heikham and R. Shankar |title=Flanking region sequence information to refine microRNA target predictions. |journal=Journal of Biosciences |volume=35 |issue=1 |pages=105–18|year=2010 |pmid=20413915}}</ref> +|- +! [[p-TAREF]] +|p-TAREF stands for plant TARget REFiner. It identifies plant microRNA targets on the basis of multiple feature information derived from the flanking regions of the predicted target sites where traditional structure prediction approach may not be successful to assess the openness. It also provides an option to use encoded pattern to refine filtering. It first time employed power of machine learning approach with scoring scheme through Support Vector Regression(SVR) while considering structural and alignment aspects of targeting in plants with plant specific models. 
p-TAREF is implemented in a concurrent architecture, in both server and standalone form, making it one of the few target identification tools able to perform large transcriptome-level analyses accurately and quickly on ordinary desktops. It also provides an option to validate the predicted targets on the spot against expression data integrated into its back-end, adding experimental confidence to the prediction alongside the SVR score. p-TAREF has been benchmarked extensively against other plant miRNA target identification tools and was found to perform well.|| Yes || no || no || [http://scbb.ihbt.res.in/SCBB_dept/Software.php server/standalone] || +|- +! [[TargetScan]] +|Predicts biological targets of miRNAs by searching for the presence of sites that match the seed region of each miRNA. In flies and nematodes, predictions are ranked based on the probability of their evolutionary conservation. In zebrafish, predictions are ranked based on site number, site type, and site context, which includes factors that influence target-site accessibility. In mammals, the user can choose whether the predictions should be ranked based on the probability of their conservation or on site number, type, and context. In mammals and nematodes, the user can choose to extend the predictions beyond conserved sites and consider all sites. || vertebrates, flies, nematodes || evaluated indirectly || yes || [http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_60 sourcecode], [http://www.targetscan.org/ webserver] || <ref name="pmid14697198">{{cite journal |author=Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB |title=Prediction of mammalian microRNA targets. |journal=Cell |volume=115 |issue=7 |pages=787–98 |year=2003 |pmid=14697198 |doi=10.1016/S0092-8674(03)01018-3}}</ref><ref name="pmid15652477">{{cite journal |author=Lewis BP, Burge CB, Bartel DP |title=Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. |journal=Cell |volume=120 |issue=1 |pages=15–20 |year=2005 |pmid=15652477 |doi=10.1016/j.cell.2004.12.035}}</ref><ref name="pmid17612493">{{cite journal |author=Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP |title=MicroRNA targeting specificity in mammals: determinants beyond seed pairing. |journal=Mol Cell |volume=27 |issue=1 |pages=91–105 |year=2007 |pmid=17612493 |doi=10.1016/j.molcel.2007.06.017}}</ref><ref name="pmid21909094">{{cite journal |author=Garcia DM, Baek D, Shin C, Bell GW, Grimson A, Bartel DP |title=Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. |journal=Nature Structural & Molecular Biology |volume=18 |issue=10 |pages=1139–1146 |year=2011 |pmid=21909094 |doi=10.1038/nsmb.2115 |pmc=3190056}}</ref> +|} + +==ncRNA gene prediction software== +{| class="wikitable sortable" +! Name +! Description +! Number of sequences<br/><ref group=Note>'''Number of sequences:''' <any|num>.</ref> +! Alignment<br/><ref group=Note>'''Alignment:''' predicts an [[sequence alignment|alignment]], <input|yes|no>.</ref> +! Structure<br/><ref group=Note>'''Structure:''' predicts [[RNA structure|structure]], <input|yes|no>.</ref> +! Link || References +|- +! [[Alifoldz]] +|Assessing a multiple sequence alignment for the existence of an unusually stable and conserved RNA secondary structure.
|| any || input || yes || [http://www.tbi.univie.ac.at/papers/SUPPLEMENTS/Alifoldz/ sourcecode] || <ref name="pmid15313604">{{cite journal | author = Washietl S, Hofacker IL | title = Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics | journal = J. Mol. Biol. | volume = 342 | issue = 1 | pages = 19–30 | year = 2004 | pmid = 15313604 | doi = 10.1016/j.jmb.2004.07.018 }}</ref> +|- +! [[EvoFold]] +|a comparative method for identifying functional RNA structures in multiple-sequence alignments. It is based on a probabilistic model-construction called a phylo-SCFG and exploits the characteristic differences of the substitution process in stem-pairing and unpaired regions to make its predictions. || any || input || yes || [http://www.cbse.ucsc.edu/~jsp/EvoFold/ linuxbinary] || <ref name="pmid16628248">{{cite journal | author = Pedersen JS, Bejerano G, Siepel A, ''et al.'' | title = Identification and classification of conserved RNA secondary structures in the human genome | journal = PLoS Comput. Biol. | volume = 2 | issue = 4 | pages = e33 | year = 2006 | pmid = 16628248 | doi = 10.1371/journal.pcbi.0020033 | pmc = 1440920 |bibcode = 2006PLSCB...2...33P }}</ref> +|- +! [[GraphClust]] +|Fast RNA structural clustering method to identify common (local) RNA secondary structures. Predicted structural clusters are presented as alignment. Due to the linear time complexity for clustering it is possible to analyse large RNA datasets. || any || yes || yes || [http://www.bioinf.uni-freiburg.de/Software/GraphClust/ sourcecode]|| <ref name="pmid22689765">{{cite journal | author = Heyne S, Costa F, Rose D, Backofen R | title = GraphClust: alignment-free structural clustering of local RNA secondary structures | journal = Bioinformatics | volume = 28 | issue = 12 | pages = i224-i232 | year = 2012 | pmid = 22689765 | doi = 10.1093/bioinformatics/bts224 }}</ref> +|- +! [[MSARi]] +|heuristic search for statistically significant conservation of RNA secondary structure in deep multiple sequence alignments. || any || input || yes || [http://theory.csail.mit.edu/MSARi sourcecode] || <ref>{{cite journal | author = Coventry A, Kleitman DJ, Berger BA | title = MSARI: Multiple sequence alignments for statistical detection of RNA secondary structure | journal = PNAS | volume = 101 | issue = 33 | pages = 12102–12107 | pmid = 15304649 | doi = 10.1073/pnas.0404193101| year = 2004 | pmc = 514400 |bibcode = 2004PNAS..10112102C }}</ref> +|- +! [[QRNA]] +|This is the code from Elena Rivas that accompanies a submitted manuscript "Noncoding RNA gene detection using comparative sequence analysis". QRNA uses comparative genome sequence analysis to detect conserved RNA secondary structures, including both ncRNA genes and cis-regulatory RNA structures. || 2 || input || yes || [http://selab.janelia.org/software.html sourcecode] || <ref name="pmid11801179">{{cite journal | author = Rivas E, Eddy SR | title = Noncoding RNA gene detection using comparative sequence analysis | journal = BMC Bioinformatics | volume = 2| pages = 8 | year = 2001 | pmid = 11801179 | doi = 10.1186/1471-2105-2-8 | pmc = 64605 }}</ref><ref name="pmid11553332">{{cite journal | author = Rivas E, Klein RJ, Jones TA, Eddy SR | title = Computational identification of noncoding RNAs in E. coli by comparative genomics | journal = Curr. Biol. | volume = 11 | issue = 17 | pages = 1369–73 | year = 2001 | pmid = 11553332 | doi = 10.1016/S0960-9822(01)00401-8 }}</ref> +|- +!
[[RNAz]] +|program for predicting structurally conserved and thermodynamically stable RNA secondary structures in multiple sequence alignments. It can be used in genome-wide screens to detect functional RNA structures, as found in noncoding RNAs and cis-acting regulatory elements of mRNAs. || any || input || yes || [http://www.tbi.univie.ac.at/~wash/RNAz/ sourcecode,] [http://rna.tbi.univie.ac.at/cgi-bin/RNAz.cgi webserver] [http://psb.stanford.edu/psb-online/proceedings/psb10/abstracts/2010_p69.html RNAz 2] || <ref name="pmid15665081">{{cite journal | author = Washietl S, Hofacker IL, Stadler PF | title = Fast and reliable prediction of noncoding RNAs | journal = Proc. Natl. Acad. Sci. U.S.A. | volume = 102 | issue = 7 | pages = 2454–9 | year = 2005 | pmid = 15665081 | doi = 10.1073/pnas.0409169102 | pmc = 548974 |bibcode = 2005PNAS..102.2454W }}</ref><ref name="pmid17452347">{{cite journal | author = Gruber AR, Neuböck R, Hofacker IL, Washietl S | title = The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures | journal = Nucleic Acids Res. | volume = 35 | issue = Web Server issue | pages = W335–8 | year = 2007 | pmid = 17452347 | doi = 10.1093/nar/gkm222 | pmc = 1933143 }}</ref><ref name="pmid17993695">{{cite journal | author = Washietl S | title = Prediction of Structural Noncoding RNAs With RNAz | journal = Methods Mol. Biol. | volume = 395 | issue = | pages = 503–26 | year = 2007 | pmid = 17993695 | doi = 10.1007/978-1-59745-514-5_32| issn = }}</ref> +|- +! [[Xrate]] +|a program for analysis of multiple sequence alignments using phylogenetic [[SCFG|grammars]], that may be viewed as a flexible generalization of the "Evofold" program.||any||yes||yes||[http://biowiki.org/XrateSoftware sourcecode]||<ref name="pmid17018148"/> +|- +| colspan=7| +;Notes:{{reflist|group=Note}} +|} + +==Family specific gene prediction software== +{| class="wikitable sortable" +! Name +! Description || Family || Link || References +|- +! ARAGORN +|ARAGORN detects tRNA and tmRNA in nucleotide sequences. || [[tRNA]] [[tmRNA]] || [http://130.235.46.10/ARAGORN/ webserver] [http://130.235.46.10/ARAGORN/aragorn1.2.28.c source] || <ref name="pmid14704338">{{cite journal |author=Laslett D, Canback B |title=ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. |journal=Nucl. Acids Res. |volume=32 |issue= 1|pages=39 |year=2004 |pmid=14704338 |doi=10.1093/nar/gkh152|pmc=373265}}</ref> +|- +! miReader +|miReader is the first tool of its type to detect mature miRNAs without any dependence on genomic or reference sequences. Previously, miRNA discovery was possible only for species with available genomic or reference sequences, because most miRNA discovery tools rely on identifying pre-miRNA candidates; as a result, miRNA biology was largely limited to model organisms. With miReader, one can identify mature miRNAs directly from small RNA sequencing data, without any need for genomic/reference sequences. It has been developed for a large number of phyla and species, ranging from vertebrate models to plant and fish models, and its accuracy has been consistently >90% across a large number of validation tests. || [[mature miRNA]] || [http://scbb.ihbt.res.in/2810-12/miReader.php webserver/source] [http://sourceforge.net/projects/mireader/ webserver/source] || <ref name="pmid23805282">{{cite journal |author=Jha A, Shankar R |title=miReader: Discovering novel miRNAs in species without sequenced genome. |journal=PLOS ONE.
|volume=8 |issue= 6|pages= e66857|year=2013 |pmid=23805282 |doi= 10.1371/journal.pone.0066857|pmc=3689854 +}}</ref> +|- + +! [[miRNAminer]] +|Given a search query, candidate homologs are identified using BLAST search and then tested for their known miRNA properties, such as secondary structure, energy, alignment and conservation, in order to assess their fidelity. || [[MicroRNA]] || [http://groups.csail.mit.edu/pag/mirnaminer/ webserver] || <ref name="pmid18215311">{{cite journal |author=Artzi S, Kiezun A, Shomron N |title=miRNAminer: a tool for homologous microRNA gene search. |journal=BMC Bioinformatics |volume=9|pages=39 |year=2008 |pmid=18215311 |doi=10.1186/1471-2105-9-39 |pmc=2258288}}</ref> +|- +! RISCbinder +|Prediction of guide strand of microRNAs. || [[MicroRNA|Mature miRNA]] || [http://crdd.osdd.net:8081/RISCbinder/ webserver] || <ref name=riscbinder>{{cite journal |author=Ahmed F, Ansari HR and Raghava GPS |title=Prediction of guide strand of microRNAs from its sequence and secondary structure |journal=BMC Bioinformatics |year=2009 |url=http://www.biomedcentral.com/1471-2105/10/105}}</ref> +|- +! [[RNAmicro]] +|A SVM-based approach that, in conjunction with a non-stringent filter for consensus secondary structures, is capable of recognizing microRNA precursors in multiple sequence alignments. || [[MicroRNA]] || [http://www.bioinf.uni-leipzig.de/~jana/software/RNAmicro.html homepage] || <ref name="pmid16873472">{{cite journal |author=Hertel J, Stadler PF |title=Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data. |journal=Bioinformatics |volume=22 |issue=14 |pages=e197–202 |year=2006 |pmid=16873472 |doi=10.1093/bioinformatics/btl257}}</ref> +|- +! RNAmmer +|RNAmmer uses [[HMMER]] to annotate [[rRNA]] genes in genome sequences. Profiles were built using alignments from the European ribosomal RNA database<ref name="pmid14681368">{{cite journal |author=Wuyts J, Perrière G, Van De Peer Y |title=The European ribosomal RNA database. |journal=Nucleic Acids Res |volume=32 |issue=Database issue |pages=D101–3 |year=2004 |pmid=14681368 |doi=10.1093/nar/gkh065 |pmc=308799}}</ref> and the 5S Ribosomal RNA Database.<ref name="pmid11752286">{{cite journal |doi=10.1093/nar/30.1.176 |author=Szymanski M, Barciszewska MZ, Erdmann VA, Barciszewski J |title=5S Ribosomal RNA Database. |journal=Nucleic Acids Res |volume=30 |issue=1 |pages=176–8 |year=2002 |pmid=11752286 |pmc=99124}}</ref>|| [[rRNA]] || [http://www.cbs.dtu.dk/services/RNAmmer/ webserver] [http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer source] || <ref name="pmid17452365">{{cite journal |author=Lagesen K, Hallin P, Rødland EA, Staerfeldt HH, Rognes T, Ussery DW |title=RNAmmer: consistent and rapid annotation of ribosomal RNA genes. |journal=Nucleic Acids Res |volume=35 |issue=9 |pages=3100–8 |year=2007 |pmid=17452365 |doi=10.1093/nar/gkm160 |pmc=1888812}}</ref> +|- +! [[SnoReport]] +|Uses a combination of RNA secondary structure prediction and machine learning that is designed to recognize the two major classes of snoRNAs, box C/D and box H/ACA snoRNAs, among ncRNA candidate sequences. || [[snoRNA]] || [http://www.bioinf.uni-leipzig.de/~jana/software/SnoReport.html sourcecode] || <ref name="pmid17895272">{{cite journal |author=Hertel J, Hofacker IL, Stadler PF |title=SnoReport: computational identification of snoRNAs with unknown targets. |journal=Bioinformatics |volume=24 |issue=2 |pages=158–64 |year=2008 |pmid=17895272 |doi=10.1093/bioinformatics/btm464}}</ref> +|- +! 
[[SnoScan]] +|Search for C/D box methylation guide snoRNA genes in a genomic sequence. || [[snoRNA|C/D box snoRNA]] || [http://lowelab.ucsc.edu/snoscan/ sourcecode,] [http://lowelab.ucsc.edu/snoscan/ webserver] || <ref name="pmid10024243">{{cite doi|10.1126/science.283.5405.1168}}</ref><ref name="pmid15980563">{{cite journal |author=Schattner P, Brooks AN, Lowe TM |title=The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. |journal=Nucleic Acids Res |volume=33 |issue=Web Server issue |pages=W686–9 |year=2005 |pmid=15980563 |doi=10.1093/nar/gki366 |pmc=1160127}}</ref> +|- +! [[snoSeeker]] +|snoSeeker includes two snoRNA-searching programs, CDseeker and ACAseeker, specific to the detection of C/D [[snoRNA]]s and H/ACA snoRNAs, respectively. snoSeeker has been used to scan four human–mammal whole-genome alignment (WGA) sequences and identified 54 novel candidates including 26 orphan candidates as well as 266 known snoRNA genes. || [[snoRNA]] || [http://genelab.sysu.edu.cn/snoSeeker/index.php webserver,stand-alone] || <ref name="pmid16990247">{{cite journal |author=Yang JH, Zhang XC, Huang ZP, Zhou H, Huang MB, Zhang S, Chen YQ, Qu LH. |title=snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome. |journal=Nucleic Acids Res. |volume=34 |issue=18 |pages=5112–5123 |year=2006 |pmid=16990247 |pmc=1636440 |doi=10.1093/nar/gkl672}}</ref> +|- +! [[tRNAscan-SE]] +|a program for the detection of transfer RNA genes in genomic sequence. || [[tRNA]] || [http://lowelab.ucsc.edu/tRNAscan-SE/ sourcecode,] [http://lowelab.ucsc.edu/tRNAscan-SE/ webserver] || <ref name="pmid15980563"/><ref name="pmid9023104">{{cite journal |author=Lowe TM, Eddy SR |title=tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. |journal=Nucleic Acids Res |volume=25 |issue=5 |pages=955–64 |year=1997 |pmid=9023104 |doi=10.1093/nar/25.5.955 |pmc=146525}}</ref> +|- +! [[miRNAFold]] +|A fast ab initio software for searching for microRNA precursors in genomes. || [[microRNA]] || [http://EvryRNA.ibisc.univ-evry.fr/ webserver] || <ref name="pmid22362754">{{cite journal |author=Tempel S, Tahi F |title=A fast ab-initio method for predicting miRNA precursors in genomes. |journal=Nucleic Acids Res. |volume=40 |issue=11 |pages=955–64 |year=2012 |pmid=22362754 |doi= 10.1093/nar/gks146 |pmc=3367186}}</ref> +|- +|} + +==RNA homology search software== +{| class="wikitable sortable" +! Name +! Description || Link || References +|- +! [[ERPIN]] +|"Easy RNA Profile IdentificatioN" is an RNA motif search program that reads a sequence alignment and secondary structure, and automatically infers a statistical "secondary structure profile" (SSP). An original Dynamic Programming algorithm then matches this SSP onto any target database, finding solutions and their associated scores. || [http://rna.igmors.u-psud.fr/erpin/ sourcecode] [http://tagc.univ-mrs.fr/erpin/ webserver] || <ref name="pmid11700055">{{cite journal |author=Gautheret D, Lambert A |title=Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. |journal=J Mol Biol |volume=313 |issue=5 |pages=1003–11 |year=2001 |pmid=11700055 |doi=10.1006/jmbi.2001.5102}}</ref><ref name="pmid15215371">{{cite journal |author=Lambert A, Fontaine JF, Legendre M, Leclerc F, Permal E, Major F, Putzer H, Delfour O, Michot B, Gautheret D |title=The ERPIN server: an interface to profile-based RNA motif identification.
|journal=Nucleic Acids Res |volume=32 |issue=Web Server issue |pages=W160–5 |year=2004 |pmid=15215371 |doi=10.1093/nar/gkh418 |pmc=441556}}</ref><ref name="pmid15892887">{{cite journal |author=Lambert A, Legendre M, Fontaine JF, Gautheret D |title=Computing expectation values for RNA motifs using discrete convolutions. |journal=BMC Bioinformatics |volume=6|pages=118 |year=2005 |pmid=15892887 |doi=10.1186/1471-2105-6-118 |pmc=1168889}}</ref> +|- +! [[Infernal (software)|Infernal]] +|"INFERence of RNA ALignment" is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). || [http://infernal.janelia.org/ sourcecode] || <ref name="pmid17397253">{{cite journal |author=Nawrocki EP, Eddy SR |title=Query-dependent banding (QDB) for faster RNA similarity searches. |journal=PLoS Comput Biol |volume=3 |issue=3 |pages=e56 |year=2007 |pmid=17397253 |doi=10.1371/journal.pcbi.0030056 |pmc=1847999|bibcode = 2007PLSCB...3...56N }}</ref><ref name="pmid12095421">{{cite journal |author=Eddy SR |title=A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. |journal=BMC Bioinformatics |volume=3|pages=18 |year=2002 |pmid=12095421 |doi=10.1186/1471-2105-3-18 |pmc=119854}}</ref><ref name="pmid8029015">{{cite journal |author=Eddy SR, Durbin R |title=RNA sequence analysis using covariance models. |journal=Nucleic Acids Res |volume=22 |issue=11 |pages=2079–88 |year=1994 |pmid=8029015 |doi=10.1093/nar/22.11.2079 |pmc=308124}}</ref> +|- +! [[GraphClust]] +|Fast RNA structural clustering method to identify common (local) RNA secondary structures. Predicted structural clusters are presented as alignment. Due to the linear time complexity for clustering it is possible to analyse large RNA datasets. || [http://www.bioinf.uni-freiburg.de/Software/GraphClust/ sourcecode]|| <ref name="pmid22689765">{{cite journal | author = Heyne S, Costa F, Rose D, Backofen R | title = GraphClust: alignment-free structural clustering of local RNA secondary structures | journal = Bioinformatics | volume = 28 | issue = 12 | pages = i224-i232 | year = 2012 | pmid = 22689765 | doi = 10.1093/bioinformatics/bts224 }}</ref> +|- +! [[PHMMTS]] +|"pair hidden Markov models on tree structures" is an extension of pair hidden Markov models defined on alignments of trees. || [http://phmmts.dna.bio.keio.ac.jp/ sourcecode,] [http://phmmts.dna.bio.keio.ac.jp/ webserver] || <ref name="pmid16204111">{{cite journal |author=Sato K, Sakakibara Y |title=RNA secondary structural alignment with conditional random fields. |series=21 |journal=Bioinformatics |volume=Suppl 2 |issue= suppl_2|pages=ii237–42 |year=2005 |pmid=16204111 |doi=10.1093/bioinformatics/bti1139}}</ref> +|- +! [[RaveNnA]] +|A slow and rigorous or fast and heuristic sequence-based filter for covariance models. || [http://bliss.biology.yale.edu/~zasha/ravenna/ sourcecode] || <ref name="pmid15262817">{{cite journal |author=Weinberg Z, Ruzzo WL |title=Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy. |series=20 |journal=Bioinformatics |volume=Suppl 1 |issue= suppl_1|pages=i334–41 |year=2004 |pmid=15262817 |doi=10.1093/bioinformatics/bth925}}</ref><ref name="pmid16267089">{{cite journal |author=Weinberg Z, Ruzzo WL |title=Sequence-based heuristics for faster annotation of non-coding RNA families. 
|journal=Bioinformatics |volume=22 |issue=1 |pages=35–9 |year=2006 |pmid=16267089 |doi=10.1093/bioinformatics/bti743}}</ref> +|- +! [[RSEARCH]] +|Takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs. || [ftp://selab.janelia.org/pub/software/rsearch/ sourcecode] || <ref name="pmid14499004">{{cite journal |author=Klein RJ, Eddy SR |title=RSEARCH: finding homologs of single structured RNA sequences. |journal=BMC Bioinformatics |volume=4|pages=44 |year=2003 |pmid=14499004 |doi=10.1186/1471-2105-4-44 |pmc=239859}}</ref> +|- +! [[Structator]] +|Ultra fast software for searching for RNA structural motifs employing an innovative index-based bidirectional matching algorithm combined with a new fast fragment chaining strategy. || [http://www.zbh.uni-hamburg.de/Structator/ sourcecode] || <ref name="pmid21619640">{{cite journal |author=Meyer F, Kurtz S, Backofen R, Will S, Beckstette M |title=Structator: fast index-based search for RNA sequence-structure patterns |journal=BMC Bioinformatics |volume=12|pages=214 |year=2011 |pmid=21619640 |doi=10.1186/1471-2105-12-214 |pmc=3154205}}</ref> +|} + +==Benchmarks== +{| class="wikitable sortable" +! Name +! Description +! Structure<ref group=Note>'''Structure:''' benchmarks [[RNA structure|structure]] prediction tools <yes|no>.</ref> +! Alignment<ref group=Note>'''Alignment:''' benchmarks [[sequence alignment|alignment]] tools <yes|no>.</ref> +! Phylogeny || Links || References +|- +! [[BRalibase]] I +|A comprehensive comparison of comparative RNA structure prediction approaches || yes || no || no || [http://projects.binf.ku.dk/pgardner/bralibase/bralibase1.html data] || <ref name="pmid15458580">{{cite journal | author = Gardner PP, Giegerich R | title = A comprehensive comparison of comparative RNA structure prediction approaches | journal = BMC Bioinformatics | volume = 5| pages = 140 | year = 2004 | pmid = 15458580 | doi = 10.1186/1471-2105-5-140 | pmc = 526219 }}</ref> +|- +! BRalibase II +|A benchmark of multiple sequence alignment programs upon structural RNAs || no || yes || no || [http://projects.binf.ku.dk/pgardner/bralibase/bralibase2.html data] || <ref name="pmid15860779">{{cite journal | author = Gardner PP, Wilm A, Washietl S | title = A benchmark of multiple sequence alignment programs upon structural RNAs | journal = Nucleic Acids Res. | volume = 33 | issue = 8 | pages = 2433–9 | year = 2005 | pmid = 15860779 | doi = 10.1093/nar/gki541 | pmc = 1087786 }}</ref> +|- +! BRalibase 2.1 +|A benchmark of multiple sequence alignment programs upon structural RNAs || no || yes || no || [http://www.biophys.uni-duesseldorf.de/bralibase/ data] || <ref name="pmid17062125">{{cite journal |author=Wilm A, Mainz I, Steger G |title=An enhanced RNA alignment benchmark for sequence alignment programs. |journal=Algorithms Mol Biol |volume=1 |issue= 1|pages=19 |year=2006 |pmid=17062125 |pmc=1635699 |doi=10.1186/1748-7188-1-19}}</ref> +|- +! BRalibase III +|A critical assessment of the performance of homology search methods on noncoding RNA || no || yes || no || [http://projects.binf.ku.dk/pgardner/bralibase/bralibase3 data] || <ref name="pmid17151342">{{cite journal | author = Freyhult EK, Bollback JP, Gardner PP | title = Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA | journal = Genome Res. 
| volume = 17 | issue = 1 | pages = 117–25 | year = 2007 | pmid = 17151342 | doi = 10.1101/gr.5890907 | pmc = 1716261 }}</ref> +|- +! CompaRNA +|An independent comparison of single-sequence and comparative methods for RNA secondary structure prediction || yes || no || no || [http://comparna.amu.edu.pl AMU mirror] or [http://iimcb.genesilico.pl/comparna/ IIMCB mirror] || <ref name="pmid23435231">{{cite journal | author = Puton T, Kozlowski LP, Rother KM, Bujnicki JM | title = CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction | journal = Nucleic Acids Research | volume = 41 | issue = 7 | pages = 4307–23 | year = 2013 | pmid = 23435231 | doi = 10.1093/nar/gkt101 }}</ref> +|- +| colspan=7| +;Notes:{{reflist|group=Note}} +|} + +==Alignment viewers/editors== +{| class="wikitable sortable" +! Name +! Description +! Alignment<ref group=Note>'''Alignment:''' view and edit an [[sequence alignment|alignment]], <yes|no>.</ref> +! Structure<ref group=Note>'''Structure:''' view and edit [[RNA structure|structure]], <yes|no>.</ref> +! Link || References +|- +! [[4sale]] +|A tool for Synchronous RNA Sequence and Secondary Structure Alignment and Editing||yes||yes || [http://4sale.bioapps.biozentrum.uni-wuerzburg.de/index.html sourcecode] || <ref name="pmid17101042">{{cite journal | author = Seibel PN, Müller T, Dandekar T, Schultz J, Wolf M | title = 4SALE--a tool for synchronous RNA sequence and secondary structure alignment and editing | journal = BMC Bioinformatics | volume = 7| pages = 498 | year = 2006 | pmid = 17101042 | doi = 10.1186/1471-2105-7-498 | pmc = 1637121 }}</ref> +|- +! [[Colorstock]], [[SScolor]], [[Raton (software)|Raton]] +|Colorstock, a command-line script using ANSI terminal color; SScolor, a Perl script that generates static HTML pages; and Raton, an AJAX web application generating dynamic HTML. Each tool can be used to color RNA alignments by secondary structure and to visually highlight compensatory mutations in stems. || yes || yes || [http://biowiki.org/RNAAlignmentViewers sourcecode] || <ref name="colorstock">{{cite journal | author = Bendana YR, Holmes IH | title = Colorstock, SScolor, Ratón: RNA Alignment Visualization Tools |doi=10.1093/bioinformatics/btm635| pmid=18218657 |journal = Bioinformatics | year = 2008 | volume=24 | issue=4 | pages=579–80}}</ref> +|- +! [[Integrated Genome Browser]] (IGB) +|a multiple alignment viewer written in Java. +||yes||no|| [http://genoviz.sourceforge.net/ sourcecode] || <ref name='binf_igb_paper'>{{cite journal |author=Nicol JW, Helt GA, Blanchard SG Jr, Raja A, Loraine AE |title=The Integrated Genome Browser: Free software for distribution and exploration of genome-scale data sets. |journal=Bioinformatics |volume=25 |issue=20 |pages=2730–2731|year=2009 |pmid=19654113 |pmc=2759552 |doi=10.1093/bioinformatics/btp472}}</ref> +|- +! [[Jalview]] +|a multiple alignment editor written in Java. +||yes||no|| [http://www.jalview.org/ sourcecode] || <ref name="pmid19151095">{{cite journal |author=Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ |title=Jalview Version 2--a multiple sequence alignment editor and analysis workbench. |journal=Bioinformatics |volume=25 |issue=9 |pages=1189–91 |year=2009 |pmid=19151095 |doi=10.1093/bioinformatics/btp033 |pmc=2672624}}</ref><ref name="pmid14960472">{{cite journal |author=Clamp M, Cuff J, Searle SM, Barton GJ |title=The Jalview Java alignment editor.
|journal=Bioinformatics |volume=20 |issue=3 |pages=426–7 |year=2004 |pmid=14960472 |doi=10.1093/bioinformatics/btg430}}</ref> +|- +! [[RALEE]] +|a major mode for the [[Emacs]] text editor. It provides functionality to aid the viewing and editing of multiple sequence alignments of structured RNAs.||yes||yes || [http://personalpages.manchester.ac.uk/staff/sam.griffiths-jones/software/ralee/ sourcecode] || <ref name="pmid15377506">{{cite journal | author = Griffiths-Jones S | title = RALEE--RNA ALignment editor in Emacs | journal = Bioinformatics | volume = 21 | issue = 2 | pages = 257–9 | year = 2005 | pmid = 15377506 | doi = 10.1093/bioinformatics/bth489 }}</ref> +|- +! [[SARSE]] +|A graphical sequence editor for working with structural alignments of RNA.||yes||yes|| [http://sarse.kvl.dk/ sourcecode]|| <ref name="pmid17804647">{{cite journal | author = Andersen ES, Lind-Thomsen A, Knudsen B, ''et al.'' | title = Semiautomated improvement of RNA alignments | journal = RNA | volume = 13 | issue = 11 | pages = 1850–9 | year = 2007 | pmid = 17804647 | doi = 10.1261/rna.215407 | pmc = 2040093 }}</ref> +|- +| colspan=6| +;Notes:{{reflist|group=Note}} +|} + +==Inverse Folding/RNA design== +{| class="wikitable sortable" +! Name +! Description || Link || References +|- +! [[ETeRNA]] +|An RNA folding game that challenges players to come up with sequences that fold into a target RNA structure. The best sequences for a given puzzle are synthesized and their structures are probed through chemical mapping. The sequences are then scored by the data's agreement to the target structure and feedback is provided to the players. || [http://eterna.cmu.edu/content/EteRNA home page] || -- +|- +! [[NUPACK]] +| Although NUPACK can be used to get useful statistics and properties of an RNA's structure as mentioned above, its main goal is design of new sequences that fold into a desired structure.|| [http://nupack.org/ home page] || <ref name="NUPACK">{{cite journal | doi=10.1137/060651100 | author = R.M. Dirks, J.S. Bois, J.M. Schaeffer, E. Winfree, N.A. Pierce | title = Thermodynamic Analysis of Interacting Nucleic Acid Strands | journal = SIAM Review | volume = 49 | issue=1 | pages = 65–88 | year = 2007|bibcode = 2007SIAMR..49...65D }}</ref> +|- +! [[RNAInverse]] +| The ViennaRNA package provides RNAInverse, an algorithm for designing sequences with desired structure.|| [http://www.tbi.univie.ac.at/~ivo/RNA/man/RNAinverse.html help page] || <ref name="RNAInverse">{{cite journal | author = I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster | title = Fast Folding and Comparison of RNA Secondary Structures. | journal = Monatshefte f. Chemie | volume = 125 | issue = 2 | pages = 167–188 | year = 1994 | doi = 10.1007/BF00818163}}</ref> +|- +! [[RNAiFold]] +| A complete RNA inverse folding approach based on [[constraint programming]] which allows for the specification of a wide range of design constraints.|| [http://bioinformatics.bc.edu/clotelab/RNAiFold/ home page] || <ref name="RNAiFold">{{cite journal | author = JA. Garcia-Martin, P. Clote, I. Dotu | title = RNAiFold: a constraint programming algorithm for RNA inverse folding and molecular design. | journal = Journal of Bioinformatics and Computational Biology | volume = 11 | issue = 02 | pages = 1350001 | year = 2013 | doi = 10.1142/S0219720013500017 | pmid = 23600819}}</ref> +|} + +==Secondary structure viewers/editors== +{| class="wikitable sortable" +! Name +! Description || Link || References +|- +!
PseudoViewer +|Automatically visualizing RNA pseudoknot structures as planar graphs. || [http://wilab.inha.ac.kr/pseudoviewer/ webapp/binary] || <ref name="pmid19369500">{{cite journal |author=Byun Y, Han K |title=PseudoViewer3: generating planar drawings of large-scale RNA structures with pseudoknots. |journal=Bioinformatics |volume=25 |issue=11 |pages=1435–7 |year=2009 |pmid=19369500 |doi=10.1093/bioinformatics/btp252}}</ref><ref name="pmid16845039">{{cite journal |author=Byun Y, Han K |title=PseudoViewer: web application and web service for visualizing RNA pseudoknots and secondary structures. |journal=Nucleic Acids Res |volume=34 |issue=Web Server issue |pages=W416–22 |year=2006 |pmid=16845039 |doi=10.1093/nar/gkl210 |pmc=1538805}}</ref><ref name="pmid12824341">{{cite journal |doi=10.1093/nar/gkg539 |author=Han K, Byun Y |title=PSEUDOVIEWER2: Visualization of RNA pseudoknots of any type. |journal=Nucleic Acids Res |volume=31 |issue=13 |pages=3432–40 |year=2003 |pmid=12824341 |pmc=168946}}</ref><ref name="pmid12169562">{{cite journal |author=Han K, Lee Y, Kim W |title=PseudoViewer: automatic visualization of RNA pseudoknots. |series=18 |journal=Bioinformatics |volume=Suppl 1 |issue= |pages=S321–8 |year=2002 |pmid=12169562 | doi = 10.1093/bioinformatics/18.suppl_1.S321 }}</ref> +|- +! RNA Movies +|browse sequential paths through RNA secondary structure landscapes || [http://bibiserv.techfak.uni-bielefeld.de/rnamovies sourcecode] || <ref name="pmid17567618">{{cite journal |author=Kaiser A, Krüger J, Evers DJ |title=RNA Movies 2: sequential animation of RNA secondary structures. |journal=Nucleic Acids Res |volume=35 |issue=Web Server issue |pages=W330–4 |year=2007 |pmid=17567618 |doi=10.1093/nar/gkm309 |pmc=1933240}}</ref><ref name="pmid10068690">{{cite journal |doi=10.1093/bioinformatics/15.1.32 |author=Evers D, Giegerich R |title=RNA movies: visualizing RNA secondary structure spaces. |journal=Bioinformatics |volume=15 |issue=1 |pages=32–7 |year=1999 |pmid=10068690}}</ref> +|- +! RNA2D3D +|a program for generating, viewing, and comparing 3-dimensional models of RNA || [http://www-lmmb.ncifcrf.gov/~bshapiro/software.html binary] || <ref name="pmid18399701">{{cite journal |author=Martinez HM, Maizel JV, Shapiro BA |title=RNA2D3D: a program for generating, viewing, and comparing 3-dimensional models of RNA. |journal=J Biomol Struct Dyn |volume=25 |issue=6 |pages=669–83 |year=2008 |pmid=18399701}}</ref> +|- +! RNAstructure +|RNAstructure has a viewer for structures in ct files. It can also compare predicted structures using the circleplot program. Structures can be output as postscript files. || [http://rna.urmc.rochester.edu/RNAstructure.html sourcecode] || <ref name="pmid20230624 ">{{cite journal |author=Reuter JS, Mathews DH |title=RNAstructure: software for RNA secondary structure prediction and analysis. |journal=BMC Bioinformatics |volume=11 |pages=129 |year=2010 |pmid=20230624|doi = 10.1186/1471-2105-11-129 |pmc=2984261}}</ref> +|- +! RNAView/RnamlView +|Use RNAView to automatically identify and classify the types of base pairs that are formed in nucleic acid structures. Use RnamlView to arrange RNA structures. || [http://ndbserver.rutgers.edu/services/ sourcecode] || <ref name="pmid12824344">{{cite journal |doi=10.1093/nar/gkg529 |author=Yang H, Jossinet F, Leontis N, Chen L, Westbrook J, Berman H, Westhof E |title=Tools for the automatic identification and classification of RNA base pairs. 
|journal=Nucleic Acids Res |volume=31 |issue=13 |pages=3450–60 |year=2003 |pmid=12824344 |pmc=168936}}</ref> +|- +! RILogo +|Visualizes the intra-/intermolecular base pairing of two interacting RNAs with sequence logos in a planar graph. || [http://rth.dk/resources/rilogo web server / sourcecode] || <ref name="pmid22826541">{{cite journal |doi=10.1093/bioinformatics/bts461 |author=Menzel P, Seemann SE, Gorodkin J |title=RILogo: visualizing RNA-RNA interactions. |journal=Bioinformatics |volume=28 |issue=19 |pages=2523–6 |year=2012 |pmid=22826541}}</ref> +|- +! VARNA +|A tool for the automated drawing, visualization and annotation of the secondary structure of RNA, initially designed as a companion software for web servers and databases || [http://varna.lri.fr webapp/sourcecode] || <ref name="pmid19398448">{{cite journal |author=Darty K, Denise A, Ponty Y |title=VARNA: Interactive drawing and editing of the RNA secondary structure. |journal=Bioinformatics |volume=25 |issue=15 |pages=1974–5 |year=2009 |pmid=19398448 |doi=10.1093/bioinformatics/btp250 |pmc=2712331}}</ref> +|} + +==See also== +* [[RNA]] +* [[Non-coding RNA]] +* [[RNA structure]] +* [[List of nucleic acid simulation software]] + +==References== +{{reflist|2}} + +{{DEFAULTSORT:List Of Rna Structure Prediction Software}} +[[Category:Bioinformatics software]] +[[Category:Lists of software|RNA structure prediction software]] +[[Category:RNA]] + 2u39pk9v7cq07hrtqcxlwoxf001wt4s + + + + Projections onto convex sets + 0 + 28183 + + 28184 + 2013-09-21T18:02:43Z + + EvergreenFir + 0 + + + Reverted 1 edit by [[Special:Contributions/Willywheel|Willywheel]] ([[User talk:Willywheel|talk]]) to last revision by Rcsprinter123. ([[WP:TW|TW]]) + wikitext + text/x-wiki + In mathematics, '''projections onto convex sets (POCS)''', sometimes known as the '''alternating projection''' method, is a method to find a point in the intersection of two [[closed set|closed]] [[convex set|convex]] sets. It is a very simple algorithm and has been rediscovered many times.<ref name="SIAMreview"/> The simplest case, when the sets are [[affine spaces]], was analyzed by [[John von Neumann]].<ref>J. von Neumann, On rings of operators. Reduction theory, Ann. of Math. 50 (1949) 401–485 (a reprint of lecture notes first distributed in 1933).</ref> +<ref>J. von Neumann. Functional Operators, volume II. Princeton University Press, Princeton, NJ, 1950. Reprint of mimeographed lecture notes first distributed in 1933.</ref> The case when the sets are affine spaces is special, since the iterates not only converge to a point in the intersection (assuming the intersection is non-empty) but in fact to the orthogonal projection of the initial iterate onto the intersection. For general closed convex sets, the limit point need not be the projection. Classical work on the case of two closed convex sets shows that the [[rate of convergence]] of the iterates is linear. +<ref>L.G. Gubin, B.T. Polyak, and E.V. Raik. The method of projections for finding the common point of convex sets. U.S.S.R. Computational Mathematics and Mathematical Physics, 7:1–24, 1967.</ref> +<ref>H.H. Bauschke and J.M. Borwein. On the convergence of von Neumann's alternating projection algorithm for two sets. Set-Valued Analysis, 1:185–212, 1993.</ref> +There are now extensions that consider cases when there is more than one set, or when the sets are not [[convex set|convex]],<ref>{{cite DOI| 10.1287/moor.1070.0291}}</ref> or that give faster convergence rates.
Analysis of POCS and related methods attempts to show that the algorithm converges (and if so, find the [[rate of convergence]]), and whether it converges to the [[Projection_(linear_algebra)#Orthogonal_projections|projection]] of the original point. These questions are largely settled for simple cases, but are a topic of active research for the extensions. There are also variants of the algorithm, such as [[Dykstra's projection algorithm]]. See the references in the [[#Further_reading|further reading]] section for an overview of the variants, extensions and applications of the POCS method; a good historical background can be found in section III of.<ref name="PLC">P. L. Combettes, "The foundations of set theoretic estimation," Proceedings of the IEEE, vol. 81, no. 2, pp. 182–208, February 1993. [http://www.ann.jussieu.fr/~plc/proc.pdf PDF]</ref> + +== Algorithm == +[[File:Projections onto convex sets circles.svg|350px|thumb|right|Example on two circles.]] + +The POCS algorithm solves the following problem: + +: <math> \text{find} \; x \in \mathcal{R}^n \quad\text{such that}\; x \in C \cap D </math> + +where ''C'' and ''D'' are [[closed set|closed]] [[convex set]]s. + +To use the POCS algorithm, one must know how to project onto the sets ''C'' and ''D'' separately. +The algorithm starts with an arbitrary value for <math>x_0</math> and then generates the sequence + +: <math>x_{k+1} = \mathcal{P}_C \left( \mathcal{P}_D ( x_k ) \right). </math> + +The simplicity of the algorithm explains some of its popularity. If the [[Intersection (set theory)|intersection]] of ''C'' and ''D'' is non-empty, then the [[sequence]] generated by the algorithm will [[Convergent series|converge]] to some point in this intersection. + +Unlike [[Dykstra's projection algorithm]], the solution need not be a projection onto the intersection of ''C'' and ''D''. + +== Related algorithms == +[[File:Projections onto convex avg sets circles.svg|350px|thumb|right|Example of '''averaged projections''' variant.]] + +The method of '''averaged projections''' is quite similar. For the case of two closed convex sets ''C'' and ''D'', it proceeds by + +: <math> x_{k+1} = \frac{1}{2}( \mathcal{P}_C(x_k) + \mathcal{P}_D(x_k) ) </math> + +It has long been known to converge globally.<ref>A. Auslender. Methodes Numeriques pour la Resolution des Problems +d’Optimisation avec Constraintes. PhD thesis, Faculte des Sciences, Grenoble, 1969</ref> Furthermore, the method is easy to generalize to more than two sets; some convergence results for this case are in.<ref>Local convergence for alternating and averaged nonconvex projections. A Lewis, R Luke, J Malick, 2007. [http://arxiv.org/abs/0709.0109 arXiv]</ref> + +The ''averaged'' projections method can be reformulated as an ''alternating'' projections method using a standard trick. Consider the set + +: <math> E = \{ (x,y) : x \in C, \; y \in D \}</math> + +which is defined in the [[Tensor product|product space]] <math> \mathcal{R}^n \times \mathcal{R}^n </math>. +Then define another set, also in the product space: + +: <math> F = \{ (x,y) : x \in \mathcal{R}^n,\, y \in \mathcal{R}^n,\; x=y \}.</math> + +Thus finding <math> C \cap D </math> is equivalent to finding <math> E \cap F</math>. + +To find a point in <math> E \cap F</math>, use the alternating projection method. The projection of a vector <math>(x,y)</math> onto the set ''F'' is given by <math>(x+y,x+y)/2</math>.
Hence + +: <math>(x_{k+1},y_{k+1}) = \mathcal{P}_{F}( \mathcal{P}_{E}( (x_{k},y_{k}) ) ) = \mathcal{P}_{F}( (\mathcal{P}_{C}x_{k},\mathcal{P}_{D}y_{k}) ) = \frac{1}{2}( \mathcal{P}_C(x_k) + \mathcal{P}_D(y_k) , (\mathcal{P}_C(x_k) + \mathcal{P}_D(y_k) ). </math> + +Since <math> x_{k+1} = y_{k+1} </math> and assuming <math> x_0 = y_0 </math>, then <math>x_j=y_j</math> for all <math> j \ge 0</math>, and hence we can simplify the iteration to <math> x_{k+1} = \frac{1}{2}( \mathcal{P}_C(x_k) + \mathcal{P}_D(x_k) ) </math>. + +== Further reading == +* Book from 2011: [http://www.ec-securehost.com/SIAM/FA08.html Alternating Projection Methods] by René Escalante and Marcos Raydan (2011), published by SIAM. +* The review article from 1996:<ref name="SIAMreview"/> + +== References == +<references> +<ref name="SIAMreview">H.H. Bauschke and J.M. Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 38(3):367–426, 1996.</ref> + +<references/> + + + +[[Category:Convex geometry]] + fq61pa5cbfwsf1gx240temygcn7pnmr + + + + Almost surely + 0 + 4211 + + 4212 + 2014-01-27T17:23:03Z + + Jochen Burghardt + 0 + + added justification for dab link + wikitext + text/x-wiki + {{for|Rudolf Carnap's "probability<sub>1</sub>"|Probability1}} +<!---the above disambiguation link does make sense when this article is reached via the REDIRECT from "probability 1"---> +In [[probability theory]], one says that an [[event (probability theory)|event]] happens '''almost surely''' (sometimes abbreviated as '''a.s.''') if it happens with probability one.<ref>p 186, Stroock, D. W. (2011). Probability theory: an analytic view. Cambridge university press.</ref> The concept is analogous to the concept of "[[almost everywhere]]" in [[measure theory]]. While there is no difference between ''almost surely'' and ''surely'' (that is, entirely certain to happen) in many basic probability experiments, the distinction is important in more complex cases relating to some sort of [[infinity]]. For instance, the term is often encountered in questions that involve infinite time, regularity properties or infinite-[[dimension]]al spaces such as [[function space]]s. Basic examples of use include the [[law of large numbers]] (strong form) or continuity of [[Brownian motion|Brownian paths]]. + +'''Almost never''' describes the opposite of ''almost surely''; an event which happens with probability zero happens ''almost never''.<ref name="Gradel">{{cite book|last=Grädel|first=Erich|coauthors=Kolaitis, Libkin, Marx, Spencer, Vardi, Venema, Weinstein|year=2007|title=Finite model theory and its applications|publisher=Springer|pages=232|isbn=978-3-540-00428-8}}</ref> + +== Formal definition == +Let <math>(\Omega,\mathcal{F},P)</math> be a [[probability space]]. An [[event (probability theory)|event]] <math>E \in \mathcal{F}</math> happens ''almost surely'' if <math>P[E]=1</math>. Equivalently, <math>E</math> happens almost surely if the probability of <math>E</math> not occurring is [[0 (number)|zero]]: <math>P[E^C] = 0</math>. More generally, any event <math>E</math> (not necessarily in <math>\mathcal{F}</math>) happens almost surely if <math>E^C</math> is contained in a [[null set]]: a subset of some <math>N\in\mathcal F</math> such that <math>P[N]=0</math>.<ref name="Jacod">{{cite book|last=Jacod|first=Jean|coauthors=Protter, |year=2004|title=Probability Essentials|publisher=Springer|page=37|isbn=978-3-540-438717}}</ref> The notion of almost sureness depends on the probability measure <math>P</math>. 
If it is necessary to emphasize this dependence, it is customary to say that the event <math>E</math> occurs <math>P</math>-almost surely or almost surely <math>[P]</math>. + +== "Almost sure" versus "sure" == +The difference between an event being ''almost sure'' and ''sure'' is the same as the subtle difference between something happening ''with probability 1'' and happening ''always''. + +If an event is ''sure'', then it will always happen, and no outcome not in this event can possibly occur. If an event is ''almost sure'', then outcomes not in this event are theoretically possible; however, the probability of such an outcome occurring is smaller than any fixed positive probability, and therefore must be&nbsp;0. Thus, one cannot definitively say that these outcomes will never occur, but can for most purposes assume this to be true. + +=== Throwing a dart === +For example, imagine throwing a dart at a unit square wherein the dart will impact exactly one point, and imagine that this square is the only thing in the universe besides the dart and the thrower. There is physically nowhere else for the dart to land. Then, the event that "the dart hits the square" is a '''sure''' event. No other alternative is imaginable. + +Next, consider the event that "the dart hits the diagonal of the unit square exactly". The probability that the dart lands on any subregion of the square is proportional to the area of that subregion. But, since the area of the diagonal of the square is zero, the probability that the dart lands exactly on the diagonal is zero. So, the dart will '''almost never''' land on the diagonal (i.e. it will '''almost surely''' ''not'' land on the diagonal). Nonetheless the set of points on the diagonal is not empty and a point on the diagonal is no less possible than any other point, therefore theoretically it is possible that the dart actually hits the diagonal. + +The same may be said of any point on the square. Any such point ''P'' will contain zero area and so will have zero probability of being hit by the dart. However, the dart clearly must hit the square somewhere. Therefore, in this case, it is not only possible or imaginable that an event with zero probability will occur; one must occur. Thus, we would not want to say we were certain that a given event would not occur, but rather ''almost certain''. + +=== Tossing a coin === + +Consider the case where a coin is tossed. A coin has two sides, heads and tails, and therefore the event that "heads or tails is flipped" is a '''sure''' event. There can be no other result from such a coin. + +Now consider the single "coin toss" probability space <math>(\{H,T\}, 2^{\{H, T\}}, \mathbb{P})</math>, where the event <math>\{\omega = H\}</math> occurs if heads is flipped, and <math>\{\omega=T\}</math> if tails. For this particular coin, assume the probability of flipping heads is <math>\mathbb{P}[\omega = H] = p\in (0, 1)</math> from which it follows that the complement event, flipping tails, has <math>\mathbb{P}[\omega = T] = 1 - p</math>. + +Suppose we were to conduct an experiment where the coin is tossed repeatedly, and it is assumed each flip's outcome is independent of all the others. That is, they are [[Independent_and_identically_distributed_random_variables| ''i.i.d.'']]. Define the sequence of random variables on the coin toss space, <math>\{X_i(\omega)\}_{i\in\mathbb{N}}</math> where <math>X_i(\omega)=\omega_i</math>. ''i.e.'' each <math>X_i</math> records the outcome of the <math>i</math>'th flip. 
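+A short simulation sketch (in Python) makes this setup concrete; the bias value <math>p = 0.6</math> and all variable names here are illustrative assumptions. It estimates, for a few values of <math>n</math>, the probability that the first <math>n</math> flips are all heads, which shrinks rapidly as <math>n</math> grows (the next paragraph computes this probability exactly).
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+rng = np.random.default_rng(0)
+p = 0.6  # illustrative probability of heads, strictly between 0 and 1
+
+for n in (5, 10, 20):
+    flips = rng.random((200_000, n)) < p   # each row is n i.i.d. flips; True means heads
+    all_heads = flips.all(axis=1).mean()   # empirical frequency of "first n flips all heads"
+    print(n, all_heads, p**n)              # empirical estimate versus the exact value p**n
+</syntaxhighlight>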
+ +The event that every flip results in heads, yielding the sequence <math>\{H, H, H, \dots\}</math>, ''[[ad infinitum]]'', is possible in some sense (it does not violate any physical or mathematical laws to suppose that tails never appears), but it is very, very improbable. In fact, the probability of tails never being flipped in an infinite series is zero. To see why, note that the [[Independent_and_identically_distributed_random_variables| ''i.i.d.'']] assumption implies that the probability of flipping all heads over <math>n</math> flips is simply <math>\mathbb{P}[X_i = H, \ i=1,2,\dots,n]=\left(\mathbb{P}[X_1 = H]\right)^n = p^n</math>. Letting <math>n\rightarrow\infty</math> yields zero, since <math>p\in (0,1)</math> by assumption. Note that the result is the same no matter how much we bias the coin towards heads, so long as we constrain <math>p</math> to be greater than 0, and less than 1. + +Thus, though we cannot definitely say tails will be flipped at least once, we can say there will '''almost surely''' be at least one tails in an infinite sequence of flips. (Note that given the statements made in this paragraph, any predefined infinitely long ordering, such as the digits of [[pi]] in base two with heads representing 1 and tails representing 0, would have zero probability in an infinite series. This makes sense because the probability that the first <math>n</math> flips match any fixed sequence is at most <math>\max(p, 1-p)^n</math>, which tends to zero as <math>n\to\infty</math>.) + +However, if instead of an infinite number of flips we stop flipping after some finite time, say a million flips, then the all-heads sequence has non-zero probability. The all-heads sequence has probability <math>p^{1,000,000}\neq 0</math>, while the probability of getting at least one tails is <math>1 - p^{1,000,000}</math> and the event is no longer '''almost sure'''. + +== Asymptotically almost surely == +In [[asymptotic analysis]], one says that a property holds '''asymptotically almost surely''' ('''a.a.s.''') if, over a sequence of sets, the probability converges to 1. For instance, a large number is asymptotically almost surely [[composite number|composite]], by the [[prime number theorem]]; and in [[random graph|random graph theory]], the statement "''G''(''n'',''p''<sub>''n''</sub>) is [[Connectivity (graph theory)|connected]]" (where [[Erdős–Rényi model|''G''(''n'',''p'')]] denotes the graphs on ''n'' vertices with edge probability ''p'') is true a.a.s. when ''p''<sub>n</sub> > <math>\tfrac{(1+\epsilon) \ln n}{n}</math> for any ε > 0.<ref name="RandGraph">{{cite journal|last=Friedgut|first=Ehud|coauthors=Rödl, Vojtech; Rucinski, Andrzej; Tetali, Prasad|date=January 2006|title=A Sharp Threshold for Random Graphs with a Monochromatic Triangle in Every Edge Coloring|journal=Memoirs of the American Mathematical Society|publisher=AMS Bookstore|volume=179|issue=845|pages=pp. 3–4|issn=0065-9266|accessdate=2008-09-21}}</ref> + +In [[number theory]] this is referred to as "[[almost all]]", as in "almost all numbers are composite". Similarly, in graph theory, this is sometimes referred to as "almost surely".<ref name="Springer">{{cite book|last=Spencer|first=Joel H.|title=The Strange Logic of Random Graphs|publisher=Springer|date=2001|series=Algorithms and Combinatorics|pages=4|chapter=0.
Two Starting Examples|accessdate=2008-09-21}}</ref> + +== See also == +{{Portal|Mathematics}} +* [[Convergence of random variables]], for "almost sure convergence" +* [[Degenerate distribution]], for "almost surely constant" +* [[Almost everywhere]], the corresponding concept in measure theory +* [[Infinite monkey theorem]], a theorem using the aforementioned terms. + +== Notes == +{{Reflist}} + +==References== +* {{cite book|last=Rogers|first=L. C. G.|coauthors=Williams, David|title=Diffusions, Markov Processes, and Martingales|publisher=Cambridge University Press|date=2000|volume=1}} +* {{cite book|last=Williams|first=David|title=Probability with Martingales|publisher=Cambridge University Press|date=1991}} + +[[Category:Probability theory]] +[[Category:Mathematical terminology]] + lybyzsh1lmbr41ee1gvxwf2w1kabv8w + + + + Lagrangian mechanics + 0 + 24327 + + 24328 + 2014-01-15T23:01:43Z + + DVdm + 0 + + /* Lagrangian and action */ (null edit) re my prev. edit summary ("... erhaps a wikilinks ..."): did I write that? Time to go to sleep :-) + wikitext + text/x-wiki + {{Classical mechanics|cTopic=Formulations}} +'''Lagrangian mechanics''' is a re-formulation of [[classical mechanics]] using the [[principle of stationary action]] (also called the principle of least action).<ref>{{Cite book|last=Goldstein|first= H. |title=Classical Mechanics|edition=3rd| page=35 |publisher=Addison-Wesley|year= 2001}}</ref> Lagrangian mechanics applies to systems whether or not they conserve energy or momentum, and it provides conditions under which energy and/or momentum are conserved.<ref>{{Cite book|last=Goldstein|first= H. |title=Classical Mechanics|edition=3rd| page=54 |publisher=Addison-Wesley|year= 2001}}</ref> It was introduced by the Italian-French mathematician [[Joseph-Louis Lagrange]] in 1788. + +In Lagrangian mechanics, the trajectory of a system of particles is derived by solving the Lagrange equations in one of two forms, either the '''Lagrange equations of the first kind''',<ref name=Dvorak> + +{{cite book |title=Chaos and stability in planetary systems |author=R. 
Dvorak, Florian Freistetter |chapter=§ 3.2 Lagrange equations of the first kind |url=http://books.google.com/books?id=shYNuW0B0fsC&pg=PA24 |page=24 |isbn=3-540-28208-4 |year=2005 |publisher=Birkhäuser}} + +</ref> which treat constraints explicitly as extra equations, often using [[Lagrange multipliers]];<ref name=Haken> + +{{cite book |title=Information and self-organization |author=H Haken |url=http://books.google.com/books?id=tAfj4-xzyGwC&pg=PA61 |page=61 |isbn=3-540-33021-6 |year=2006 |edition=3rd |publisher=Springer}} + +</ref><ref name=Lanczos> + +{{cite book |title=The variational principles of mechanics |author=Cornelius Lanczos |page= 43 |chapter=II §5 Auxiliary conditions: the Lagrangian λ-method |isbn=0-486-65067-7 |publisher=Courier Dover |year=1986 |edition=Reprint of University of Toronto 1970 4th |url=http://books.google.com/books?id=ZWoYYr8wk2IC&pg=PA43 }} + +</ref> or the '''Lagrange equations of the second kind''', which incorporate the constraints directly by judicious choice of [[generalized coordinates]].<ref name=Dvorak/><ref name=Menzel> + +{{cite book |title=Fundamental formulas of physics |editor= DH Menzel |author=Henry Zatzkis |chapter=§1.4 Lagrange equations of the second kind |url=http://books.google.com/books?id=QgswE2BicW4C&pg=PA160 |page=160 |isbn=0-486-60595-7 |publisher=Courier Dover |year=1960 |volume=1 |edition=2nd}} + +</ref> The [[fundamental lemma of the calculus of variations]] shows that solving the Lagrange equations is equivalent to finding the path for which the [[Action (physics)|action functional]] is stationary, a quantity that is the [[integral]] of the [[Lagrangian]] over time. + +The use of generalized coordinates may considerably simplify a system's [[analysis]]. For example, consider a small frictionless bead traveling in a groove. If one is tracking the bead as a particle, calculation of the motion of the bead using [[Newtonian mechanics]] would require solving for the time-varying constraint force required to keep the bead in the groove. For the same problem using Lagrangian mechanics, one looks at the path of the groove and chooses a set of ''independent'' generalized coordinates that completely characterize the possible motion of the bead. This choice eliminates the need for the constraint force to enter into the resultant system of equations. There are fewer equations since one is not directly calculating the influence of the groove on the bead at a given moment. + +==Conceptual framework== + +===Generalized coordinates=== + +[[File:Generalized coordinates 1df.svg|right|350px|"350px"|thumb|Illustration of a [[generalized coordinate]] ''q'' for one degree of freedom, of a particle moving in a complicated path. Four possibilities of ''q'' for the particle's path are shown. For more particles each with their own degrees of freedom, there are more coordinates.]] + +====Concepts and terminology==== + +For one particle acted on by external forces, [[Newton's laws of motion|Newton's second law]] forms a set of 3 second-order [[ordinary differential equation]]s, one for each dimension. Therefore, the motion of the particle can be completely described by 6 independent variables: 3 initial position coordinates and 3 initial velocity coordinates. Given these, the general solutions to Newton's second law become particular solutions that determine the time evolution of the particle's behaviour after its initial state (''t'' = 0). 
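+For a numerical illustration of this point, the following sketch (in Python, using SciPy; the constant-gravity force and all numbers are illustrative assumptions) integrates Newton's second law as six first-order equations, so that the six initial values completely determine the subsequent motion.
+
+<syntaxhighlight lang="python">
+import numpy as np
+from scipy.integrate import solve_ivp
+
+g = 9.81  # illustrative constant force per unit mass, acting along -z
+
+def newton(t, state):
+    # state = (x, y, z, vx, vy, vz): Newton's second law written as 6 first-order ODEs
+    x, y, z, vx, vy, vz = state
+    return [vx, vy, vz, 0.0, 0.0, -g]
+
+# 3 initial position coordinates and 3 initial velocity coordinates
+initial_state = [0.0, 0.0, 0.0, 1.0, 0.0, 5.0]
+sol = solve_ivp(newton, (0.0, 2.0), initial_state)
+print(sol.y[:, -1])  # position and velocity at t = 2
+</syntaxhighlight>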
+ +The most familiar set of variables for position '''r''' = (''r<sub>1</sub>, r<sub>2</sub>, r<sub>3</sub>'') and velocity <math>\mathbf{\dot{r}}_j = (\dot{r}_1, \dot{r}_2, \dot{r}_3)</math> are [[Cartesian coordinates]] and their time derivatives (i.e. position (''x, y, z'') and velocity (''v<sub>x</sub>, v<sub>y</sub>, v<sub>z</sub>'') components). Determining forces in terms of standard coordinates can be complicated, and usually requires much labour. + +An alternative and more efficient approach is to use only as many coordinates as are needed to define the position of the particle, at the same time incorporating the constraints on the system, and writing down kinetic and potential energies. In other words, to determine the number of ''[[Degrees of freedom (mechanics)|degrees of freedom]]'' the particle has, i.e. the number of possible ways the system ''can'' move subject to the constraints (forces that prevent it moving in certain paths). Energies are much easier to write down and calculate than forces, since energy is a scalar while forces are vectors. + +These coordinates are ''[[generalized coordinates]]'', denoted <math>q_j</math>, and there is one for each degree of freedom. Their corresponding time derivatives are the [[Generalized coordinates#Generalized velocities and kinetic energy|generalized velocities]], <math>\dot{q_j}</math>. The number of degrees of freedom is usually not equal to the number of spatial dimensions: multi-body systems in 3 dimensional space (such as [[Barton's Pendulums]], [[planets]] in the [[solar system]], or [[atoms]] in [[molecules]]) can have many more degrees of freedom incorporating rotations as well as translations. This contrasts the number of spatial coordinates used with Newton's laws above. + +====Mathematical formulation==== + +The position vector '''r''' in a standard coordinate system (like Cartesian, spherical etc.), is related to the generalized coordinates by some ''transformation equation'': + +:<math>\bold{r} = \bold{r}(q_i, t). \, </math> + +where there are as many ''q<sub>i</sub>'' as needed (number of degrees of freedom in the system). Likewise for velocity and generalized velocities. + +For example, for a [[simple pendulum]] of length ''ℓ'', there is the constraint of the pendulum bob's suspension (rod/wire/string etc.). The position '''r''' depends on the ''x'' and ''y'' coordinates at time ''t'', that is, '''r'''(''t'')=(''x''(''t''),''y''(''t'')), however ''x'' and ''y'' are coupled to each other in a constraint equation (if ''x'' changes ''y'' must change, and vice versa). A logical choice for a generalized coordinate is the angle of the pendulum from vertical, θ, so we have '''r''' = (''x''(θ), ''y''(θ)) = '''r'''(θ), in which θ = θ(''t''). Then the transformation equation would be + +:<math> \bold{r}(\theta(t)) =(\ell\sin\theta, -\ell\cos\theta)</math> + +and so + +:<math>\bold{\dot{r}}(\theta(t),\dot{\theta}(t))=( \ell\, \dot{\theta}\cos\theta, \ell\,\dot{\theta}\sin \theta)</math> + +which corresponds to the one degree of freedom the pendulum has. The term "generalized coordinates" is really a holdover from the period when [[Cartesian coordinate system|Cartesian coordinates]] were the default coordinate system. 
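+The pendulum transformation above can be checked with a short symbolic computation (a Python/SymPy sketch; purely illustrative and not part of the original discussion), which reproduces the generalized velocity <math>\bold{\dot{r}}(\theta,\dot{\theta})</math> by differentiating the transformation equation with respect to time.
+
+<syntaxhighlight lang="python">
+import sympy as sp
+
+t = sp.symbols('t')
+l = sp.symbols('ell', positive=True)   # pendulum length
+theta = sp.Function('theta')(t)        # generalized coordinate theta(t)
+
+# transformation equation r(theta) = (ell*sin(theta), -ell*cos(theta))
+x = l * sp.sin(theta)
+y = -l * sp.cos(theta)
+
+# generalized velocity: time derivative of the transformation
+print(sp.diff(x, t))   # ell*cos(theta(t))*Derivative(theta(t), t)
+print(sp.diff(y, t))   # ell*sin(theta(t))*Derivative(theta(t), t)
+</syntaxhighlight>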
+ +In general, from ''m'' independent [[generalized coordinates]] ''q<sub>j</sub>'', the following transformation equations hold for a system composed of ''n'' particles:<ref name="Torby1984"/>{{rp|260}} + +:<math> +\begin{array}{r c l} +\mathbf{r}_1 &=& \mathbf{r}_1(q_1, q_2, \cdots, q_m, t) \\ +\mathbf{r}_2 &=& \mathbf{r}_2(q_1, q_2, \cdots, q_m, t) \\ + & \vdots & \\ +\mathbf{r}_n &=& \mathbf{r}_n(q_1, q_2, \cdots, q_m, t) +\end{array}</math> + +where ''m'' indicates the total number of generalized coordinates. An expression for the (infinitesimal) [[virtual displacement]] ''δ'''''r'''<sub>''i''</sub> of the system, for ''time-independent constraints'' or "velocity-dependent constraints", has the same form as a [[total differential]]<ref name="Torby1984"/>{{rp|264}} +:<math>\delta \mathbf{r}_i = \sum_{j=1}^m \frac {\partial \mathbf {r}_i} {\partial q_j} \delta q_j,</math> +where ''j'' is an integer label corresponding to a generalized coordinate. + +The generalized coordinates form a discrete set of variables that define the configuration of a system. The continuum analogue for defining a ''[[Classical field theory|field]]'' is a field variable, say ''ϕ''(''r'', ''t''), which represents a density function varying with position and time. + +===D'Alembert's principle and generalized forces=== + +[[D'Alembert's principle]] introduces the concept of [[virtual work]] due to applied forces '''F'''<sub>''i''</sub> and [[inertia]]l forces, acting on a three dimensional accelerating system of ''n'' particles whose motion is consistent with its constraints,<ref name="Torby1984">{{cite book |last=Torby |first=Bruce |title=Advanced Dynamics for Engineers |series=HRW Series in Mechanical Engineering |year=1984 |publisher=CBS College Publishing |location=United States of America |isbn=0-03-063366-4 |chapter=Energy Methods}}</ref>{{rp|269}} + +Mathematically the virtual work done ''δW'' on a particle of mass ''m<sub>i</sub>'' through a virtual displacement ''δ'''''r'''<sub>''i''</sub> (consistent with the constraints) is: + +{{Equation box 1 +|indent =: +|title='''D'Alembert's principle''' +|equation = <math>\delta W = \sum_{i=1}^n ( \mathbf {F}_{i} - m_i \mathbf{a}_i )\cdot \delta \mathbf r_i = 0.</math> +|cellpadding +|border +|border colour = #50C878 +|background colour = #ECFCF4}} + +where '''a'''<sub>''i''</sub> are the accelerations of the particles in the system and ''i'' = 1, 2,...,''n'' simply labels the particles. In terms of generalized coordinates + +:<math>\delta W = \sum_{j=1}^m \sum_{i=1}^n ( \mathbf {F}_{i} - m_i \mathbf{a}_i )\cdot \frac {\partial \mathbf {r}_i} {\partial q_j} \delta q_j= 0.</math> + +This expression suggests that the applied forces may be expressed as [[generalized forces]], ''Q<sub>j</sub>''. Dividing by ''δq<sub>j</sub>'' gives the definition of a generalized force:<ref name="Torby1984"/>{{rp|265}} + +:<math>Q_j = \frac{\delta W}{\delta q_j}= \sum_{i=1}^n \mathbf {F}_i \cdot \frac {\partial \mathbf{r}_i} {\partial q_j}.</math> + +If the forces '''F'''<sub>''i''</sub> are [[Conservative force|conservative]], there is a [[scalar potential]] field ''V'' whose negative [[gradient]] is the force:<ref name="Torby1984"/>{{rp|266 & 270}} +:<math>\mathbf F_i = - \nabla V \Rightarrow Q_j = - \sum_{i=1}^n \nabla V \cdot \frac {\partial \mathbf {r}_i} {\partial q_j} = - \frac {\partial V}{\partial q_j}.</math> + +i.e. generalized forces can be reduced to a potential gradient in terms of generalized coordinates.
The previous result may be easier to see by recognizing that ''V'' is a function of the '''r'''<sub>''i''</sub>, which are in turn functions of ''q<sub>j</sub>'', and then applying the [[chain rule]] to the derivative of <math>V</math> with respect to ''q<sub>j</sub>''. + +===Kinetic energy relations=== + +The [[kinetic energy]], ''T'', for the system of particles is defined by<ref name="Torby1984"/>{{rp|269}} + +:<math>T = \frac {1}{2} \sum_{i=1}^n m_i \mathbf {\dot{r}}_i \cdot \mathbf {\dot{r}}_i.</math> + +The partial derivatives of ''T'' with respect to the generalized coordinates ''q<sub>j</sub>'' and generalized velocities <math>\dot{q}_j</math> are <ref name="Torby1984"/>{{rp|269}}: + +:<math>\frac{\partial T}{\partial q_j} = \sum_{i=1}^n m_i \mathbf{\dot{r}}_i \cdot \frac{\partial \mathbf{\dot{r}}_i}{\partial q_j}</math> + +:<math>\quad \frac{\partial T}{\partial \dot{q}_j} = \sum_{i=1}^n m_i \mathbf{\dot{r}}_i \cdot \frac{\partial \mathbf{\dot{r}}_i}{\partial \dot{q}_j}.</math> + +Because <math>\dot{q_j}</math> and <math>q_j</math> are independent variables: +:<math>\frac {\partial \mathbf{\dot{r}}_i}{\partial \dot{q_j}} = \frac {\partial \mathbf{r}_i}{\partial q_j} .</math> + +Then: + +:<math>\quad \frac{\partial T}{\partial \dot{q}_j} = \sum_{i=1}^n m_i \mathbf{\dot{r}}_i \cdot \frac{\partial \mathbf{r}_i}{\partial q_j} \ .</math> + +The total time derivative of this equation is + +:<math>\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial T}{\partial \dot{q}_j} = \sum_{i=1}^n m_i \mathbf{\ddot{r}}_i \cdot \frac {\partial \mathbf{r}_i}{\partial q_j} + \sum_{i=1}^n m_i \mathbf{\dot{r}}_i \cdot \frac {\partial \mathbf{\dot{r}}_i}{\partial q_j} = Q_j + \frac{\partial T}{\partial q_j} \ .</math> + +resulting in: + +{{Equation box 1 +|indent =: +|title='''Generalized equations of motion''' +|equation = <math>Q_j = \frac{\mathrm{d}}{\mathrm{d}t} \left ( \frac {\partial T}{\partial \dot{q}_j} \right ) - \frac {\partial T}{\partial q_j}</math> +|cellpadding= 6 +|border +|border colour = #0073CF +|background colour=#F5FFFA}} + +Newton's laws are contained in it, yet there is no need to find the constraint forces because virtual work and generalized coordinates (which account for constraints) are used. This equation in itself is not actually used in practice, but is a step towards deriving Lagrange's equations (see below).<ref name="Analytical Mechanics 2008">Analytical Mechanics, L.N. Hand, J.D. Finch, Cambridge University Press, 2008, ISBN 978-0-521-57572-0</ref> + +===Lagrangian and action=== + +The core element of Lagrangian mechanics is the [[Lagrangian]] function, which summarizes the dynamics of the entire system in a very simple expression. The physics of analyzing a system is reduced to choosing the most convenient set of generalized coordinates, determining the kinetic and potential energies of the constituents of the system, then writing down the equation for the Lagrangian to use in Lagrange's equations. It is defined by <ref>Torby1984, p.270</ref> + +:<math>L = T - V \,</math> + +where ''T'' is the total kinetic energy and ''V'' is the total potential energy of the system. + +The next fundamental element is the [[action (physics)|action]] <math>\mathcal{S}</math>, defined as the time integral of the Lagrangian:<ref name="Analytical Mechanics 2008"/> + +:<math>\mathcal{S} = \int_{t_1}^{t_2} L\,\mathrm{d}t.</math> + +This also contains the dynamics of the system, and has deep theoretical implications (discussed below). 
Technically, the action is a ''[[Functional (mathematics)|functional]]'', rather than a ''[[Function (mathematics)|function]]'': its value depends on the full Lagrangian function for all times between ''t''<sub>1</sub> and ''t''<sub>2</sub>. Its [[dimensional analysis|dimensions]] are the same as [[angular momentum]]. + +In classical [[field theory (physics)|field theory]], the physical system is not a set of discrete particles, but rather a continuous field defined over a region of 3d space. Associated with the field is a [[Lagrangian density]] <math>\mathcal{L}(\mathbf{r},t)</math> defined in terms of the field and its derivatives at a location <math>\mathbf{r}</math>. The total Lagrangian is then the integral of the Lagrangian density over 3d space (see [[volume integral]]): + +:<math> L(t) = \int \mathcal{L}(\mathbf{r},t) \mathrm{d}^3 \mathbf{r} \,</math> + +where d<sup>3</sup>'''r''' is a 3d [[Total_differential#Differentials_in_several_variables|differential]] [[volume element]]. The action becomes an integral over space and time: + +:<math>\mathcal{S} = \int_{t_1}^{t_2}\int \mathcal{L}(\mathbf{r},t) \mathrm{d}^3\mathbf{r} \mathrm{d}t.</math> + +===Hamilton's principle of stationary action=== + +Let ''q''<sub>0</sub> and ''q''<sub>1</sub> be the coordinates at respective initial and final times ''t''<sub>0</sub> and ''t''<sub>1</sub>. Using the [[calculus of variations]], it can be shown that Lagrange's equations are equivalent to ''[[Hamilton's principle]]'': + +:''The trajectory of the system between t<sub>0</sub> and t<sub>1</sub> has a '''stationary action''' S.'' + +By ''stationary'', we mean that the action does not vary to first-order from infinitesimal deformations of the trajectory, with the end-points (''q''<sub>0</sub>, ''t''<sub>0</sub>) and (''q''<sub>1</sub>,''t''<sub>1</sub>) fixed. Hamilton's principle can be written as: + +:<math>\delta \mathcal{S} = 0. \,\!</math> + +Thus, instead of thinking about particles accelerating in response to applied forces, one might think of them picking out the path with a stationary action. + +Hamilton's principle is sometimes referred to as the ''[[principle of least action]]''; however, the action functional need only be ''stationary'', not necessarily a maximum or a minimum value: to first order, an infinitesimal variation of the trajectory leaves the action unchanged. + +We can use this principle instead of [[Newton's Laws]] as the fundamental principle of mechanics; this allows us to use an integral principle (Newton's Laws are based on differential equations, so they are a differential principle) as the basis for mechanics. However, it is not widely stated that Hamilton's principle is a variational principle only with [[holonomic constraints]]; if we are dealing with nonholonomic systems, then the variational principle should be replaced with one involving [[d'Alembert|d'Alembert's]] principle of [[virtual work]]. Working only with holonomic constraints is the price we have to pay for using an elegant variational formulation of mechanics. + +==Lagrange equations of the first kind== + +Lagrange introduced an analytical method for finding stationary points using the method of [[Lagrange multiplier]]s, and also applied it to mechanics.
+ +For a system subject to the constraint equation on the generalized coordinates: + +:<math>F(r_1,r_2,r_3) = A </math> + +where ''A'' is a constant, then '''Lagrange's equations of the first kind''' are: + +:<math>\left[\frac{\partial L}{\partial r_j} - \frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial L}{\partial \dot{r}_j}\right)\right] + \lambda\frac{\partial F}{\partial r_j}=0 </math> + +where ''λ'' is the Lagrange multiplier. By analogy with the mathematical procedure, we can write: + +:<math>\frac{\delta L}{\delta r_j} + \lambda\frac{\partial F}{\partial r_j}=0 </math> + +where + +:<math>\frac{\delta L}{\delta r_j} = \frac{\partial L}{\partial r_j} - \frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial L}{\partial \dot{r}_j}\right) </math> + +denotes the [[variational derivative]]. + +For ''e'' constraint equations ''F''<sub>1</sub>, ''F''<sub>2</sub>,..., ''F<sub>e</sub>'', there is a Lagrange multiplier for each constraint equation, and Lagrange's equations of the first kind generalize to: + +{{Equation box 1 +|indent =: +|title='''Lagrange's equations''' ''(1st kind)'' +|equation = <math>\frac{\delta L}{\delta r_j} + \sum_{i=1}^e \lambda_i\frac{\partial F_i}{\partial r_j}=0 </math> +|cellpadding= 6 +|border +|border colour = #0073CF +|background colour=#F5FFFA}} + +This procedure does increase the number of equations, but there are enough to solve for all of the multipliers. The number of equations generated is the number of constraint equations plus the number of coordinates, i.e. ''e'' + ''m''. The advantage of the method is that (potentially complicated) substitution and elimination of variables linked by constraint equations can be bypassed. + +There is a connection between the constraint equations ''F<sub>j</sub>'' and the constraint forces ''N<sub>j</sub>'' acting in the [[conservative force|conservative]] system (forces are conservative): + +:<math>N_j = \sum_{i=1}^e \lambda_i \frac{\partial F_i}{\partial r_j} </math> + +which is derived below. 
+ +:{| class="toccolours collapsible collapsed" width="80%" style="text-align:left" +!Derivation of connection between constraint equations and forces +|- +|The generalized constraint forces are given by (using the definition of generalized force above): + +:<math>N_j = \sum_{i=1}^n \mathbf{N}_i\cdot\frac{\partial \mathbf{r}_i}{\partial q_j}</math> + +and using the kinetic energy equation of motion (blue box above): + +:<math>Q_j = \frac{\mathrm{d}}{\mathrm{d}t} \left ( \frac {\partial T}{\partial \dot{q}_j} \right ) - \frac {\partial T}{\partial q_j} = -\frac{\delta T}{\delta q_j}=\sum_{i=1}^n \mathbf{F}_i\cdot\frac{\partial \mathbf{r}_i}{\partial q_j},</math> + +For conservative systems (see below) + +:<math> \mathbf{F}_i = -\nabla V_i + \mathbf{N}_i,</math> + +so + +:<math>\frac{\delta T}{\delta q_j} = \sum_{i=1}^n \mathbf{F}_i\cdot\frac{\partial \mathbf{r}_i}{\partial q_j} +=\sum_{i=1}^n (-\nabla V_i + \mathbf{N}_i)\cdot\frac{\partial \mathbf{r}_i}{\partial q_j} +=-\sum_{i=1}^n\nabla V_i\cdot\frac{\partial \mathbf{r}_i}{\partial q_j}+\sum_{i=1}^n \mathbf{N}_i\cdot\frac{\partial \mathbf{r}_i}{\partial q_j} +=-\frac{\partial V}{\partial q_j} + N_j +</math> + +and + +:<math>\frac{\delta T}{\delta q_j}=\frac{\mathrm{d}}{\mathrm{d}t} \left ( \frac{\partial (L+V)}{\partial \dot{q}_j} \right ) - \frac {\partial (L+V)}{\partial q_j} +=-\frac{\delta L}{\delta \dot{q}_j} - \frac {\partial V}{\partial q_j} +</math> + +equating leads to + +:<math>N_j = -\frac{\delta L}{\delta \dot{q}_j} </math> + +and finally equating to Lagrange's equations of the first kind implies: + +:<math>N_j = \sum_{i=1}^e \lambda_i \frac{\partial F_i}{\partial r_j} </math> + +So each constraint equation corresponds to a constraint force (in a conservative system). +|} + +==Lagrange equations of the second kind== + +===Euler-Lagrange equations=== + +For any system with ''m'' degrees of freedom, the Lagrange equations include ''m'' generalized coordinates and ''m'' generalized velocities. Below, we sketch out the derivation of the Lagrange equations of the second kind. In this context, ''V'' is used rather than ''U'' for [[potential energy]] and ''T'' replaces ''K'' for [[kinetic energy]]. See the references for more detailed and more general derivations. + +The equations of motion in Lagrangian mechanics are the '''Lagrange equations of the second kind''', also known as the '''[[Euler–Lagrange equation]]s''':<ref name="Analytical Mechanics 2008"/><ref>The Road to Reality, Roger Penrose, Vintage books, 2007, ISBN 0-679-77631-1</ref> + +{{Equation box 1 +|indent =: +|title='''Lagrange's equations''' ''(2nd kind)'' +|equation = <math>\frac{\mathrm{d}}{\mathrm{d}t} \left ( \frac {\partial L}{\partial \dot{q}_j} \right ) = \frac {\partial L}{\partial q_j} </math> +|cellpadding +|border +|border colour = #50C878 +|background colour = #ECFCF4}} + +where ''j'' = 1, 2,...''m'' represents the ''j''th degree of freedom, ''q<sub>j</sub>'' are the [[generalized coordinates]], and <math>\dot{q}_j</math> are the [[generalized velocities]]. + +Although the mathematics required for Lagrange's equations appears significantly more complicated than Newton's laws, this does point to deeper insights into classical mechanics than Newton's laws alone: in particular, symmetry and conservation. 
In practice, it is often easier to solve a problem using the Lagrange equations than Newton's laws, because a minimal set of generalized coordinates ''q''<sub>''i''</sub> can be chosen for convenience to exploit symmetries in the system, and constraint forces are incorporated into the geometry of the problem. There is one Lagrange equation for each generalized coordinate ''q<sub>i</sub>''. + +For a system of many particles, each particle can have different numbers of degrees of freedom from the others. In each of the Lagrange equations, ''T'' is the ''total'' kinetic energy of the system, and ''V'' the ''total'' potential energy. + +===Derivation of Lagrange's equations=== + +====Hamilton's principle==== + +The Euler-Lagrange equations follow directly from Hamilton's principle, and are mathematically equivalent. From the [[calculus of variations]], any functional of the form: + +:<math>J=\int_{x_1}^{x_2}F(x,y,y')\mathrm{d}x</math> + +leads to the general [[Euler-Lagrange equation]] for a stationary value of ''J'' (see the main article for the derivation): + +:<math>\frac{\mathrm{d}}{\mathrm{d}x}\frac{\partial F}{\partial y'}=\frac{\partial F}{\partial y}</math> + +Then making the replacements: + +:<math>x\rightarrow t,\quad y\rightarrow q,\quad y'\rightarrow \dot{q},\quad F\rightarrow L,\quad J\rightarrow\mathcal{S}</math> + +yields the Lagrange equations for mechanics. Since mathematically [[Hamilton's equations]] can be derived from Lagrange's equations (by a [[Legendre transformation]]) and Lagrange's equations can be derived from Newton's laws, all of which are equivalent and summarize classical mechanics, this means classical mechanics is fundamentally ruled by a variation principle (Hamilton’s principle above). + +====Generalized forces==== + +For a conservative system, since the potential field is only a function of position, not velocity, Lagrange's equations also follow directly from the equation of motion above: + +:<math>Q_j = \frac{\mathrm{d}}{\mathrm{d}t} \left ( \frac {\partial (L+V)}{\partial \dot{q}_j} \right ) - \frac {\partial (L+V)}{\partial q_j} = \left[\frac{\mathrm{d}}{\mathrm{d}t}\left ( \frac {\partial L}{\partial \dot{q}_j} \right ) +0\right] - \left[ \frac {\partial L}{\partial q_j}+\frac {\partial V}{\partial q_j}\right] = \frac{\mathrm{d}}{\mathrm{d}t}\left ( \frac {\partial L}{\partial \dot{q}_j} \right ) - \frac {\partial L}{\partial q_j} + Q_j. </math> + +simplifying to + +:<math>\frac{\mathrm{d}}{\mathrm{d}t}\left ( \frac {\partial L}{\partial \dot{q}_j} \right ) = \frac {\partial L}{\partial q_j} </math> + +This is consistent with the results derived above and may be seen by differentiating the right side of the Lagrangian with respect to <math>\dot{q}_j</math> and time, and solely with respect to ''q<sub>j</sub>'', adding the results and associating terms with the equations for '''F'''<sub>''i''</sub> and ''Q<sub>j</sub>''. + +====Newton's laws==== + +As the following derivation shows, ''no new physics'' is introduced, so the Lagrange equations can describe the dynamics of a classical system equivalently to Newton's laws.
+ +:{| class="toccolours collapsible collapsed" width="80%" style="text-align:left" +!Derivation of Lagrange's equations from Newton's 2nd law and D'Alembert's principle +|- +| +;Force and work done (on the particle) + +Consider a single particle with [[mass]] ''m'' and [[position vector]] '''r''', moving under an applied [[conservative force]] '''F''', which can be expressed as the [[gradient]] of a [[scalar potential|scalar]] [[potential energy]] function ''V''('''r''', t): + +:<math>\bold{F} = - \bold{\nabla} V. \, </math> + +Such a force is independent of third- or higher-order derivatives of '''r'''. + +Consider an arbitrary displacement ''δ'''''r''' of the particle. The [[Mechanical work|work]] done by the applied force '''F''' is + +:<math>\delta W = \bold{F} \cdot \delta \bold{r}</math>. + +Using Newton's second law: + +:<math> \bold{F} \cdot \delta \bold{r} = m\ddot{\bold{r}} \cdot \delta \bold{r}. </math> + +Since work is a physical scalar quantity, we should be able to rewrite this equation in terms of the generalized coordinates and velocities. On the left hand side, + +:<math>\bold{F} \cdot \bold{\delta} \bold{r} += - \bold{\nabla} V \cdot \displaystyle\sum_i {\partial \bold{r} \over \partial q_i} \delta q_i += - \displaystyle\sum_{i,j} {\partial V \over \partial r_j} {\partial r_j \over \partial q_i} \delta q_i += - \displaystyle\sum_i {\partial V \over \partial q_i} \delta q_i. </math> + +On the right hand side, carrying out a change of coordinates to generalized coordinates, we obtain: + +:<math>m \ddot{\bold{r}} \cdot \delta \bold{r} = m \sum_j \left[ \sum_i \ddot{r_i} {\partial r_i \over \partial q_j} \right] \delta q_j </math> + +Now [[Integration by parts|integrating by parts]] the summand with respect to ''t'', then differentiating with respect to ''t'': + +:<math> \frac{\mathrm{d}}{\mathrm{d}t}\int\ddot{r_i} {\partial r_i \over \partial q_j} \mathrm{d}t = \frac{\mathrm{d}}{\mathrm{d}t}\left({\partial r_i \over \partial q_j}\dot{r}_i\right)-\frac{\mathrm{d}}{\mathrm{d}t}\int\frac{\mathrm{d}}{\mathrm{d}t}\left({\partial r_i \over \partial q_j}\right)\dot{r}_i\mathrm{d}t= \frac{\mathrm{d}}{\mathrm{d}t}\left(\dot{r}_i{\partial r_i \over \partial q_j}\right)-\dot{r}\frac{\mathrm{d}}{\mathrm{d}t}\left({\partial r_i \over \partial q_j}\right)</math> + +allows the sum to be written as: + +:<math>m \ddot{\bold{r}} \cdot \delta \bold{r} = m \sum_j \left[ \sum_i \left[ {\mathrm{d} \over \mathrm{d}t} \left( \dot{r_i} {\partial r_i \over \partial q_j} \right) - \dot{r_i} {\mathrm{d} \over \mathrm{d}t}\left( {\partial r_i \over \partial q_j} \right) \right] \right] \delta q_j </math> + +Recognizing that + +:<math>{\mathrm{d} \over \mathrm{d}t}{\partial r_j \over \partial q_i} = {\partial \dot{r_j} \over \partial q_i}, \quad {\partial r_j \over \partial q_i} = {\partial \dot{r_j} \over \partial \dot{q_i}},</math> + +we obtain: + +:<math>m \ddot{\bold{r}} \cdot \delta \bold{r} = m \sum_j \left[ \sum_i \left[ {\mathrm{d} \over \mathrm{d}t} \left( \dot{r_i} {\partial \dot{r_i} \over \partial \dot{q_j}} \right) - \dot{r_i} {\partial \dot{r_i} \over \partial q_j} \right] \right] \delta q_j </math> + +;Kinetic and potential energy + +Now, by changing the order of differentiation, we obtain: + +:<math>m \ddot{\bold{r}} \cdot \delta \bold{r} = m \sum_j \left[ \sum_i \left[ {\mathrm{d} \over \mathrm{d}t} {\partial \over \partial \dot{q_j}} \left( \frac{1}{2} \dot{r_i}^2 \right) - {\partial \over \partial q_j} \left( \frac{1}{2} \dot{r_i}^2 \right) \right] \right] \delta q_j </math> + 
+Finally, we change the order of summation: + +:<math>m \ddot{\bold{r}} \cdot \delta \bold{r} = \sum_j \left[ {\mathrm{d} \over \mathrm{d}t} {\partial \over \partial \dot{q_j}} \left( \sum_i \frac{1}{2} m \dot{r_i}^2 \right) - {\partial \over \partial q_j} \left( \sum_i \frac{1}{2} m \dot{r_i}^2 \right) \right] \delta q_j </math> + +Which is equivalent to: + +:<math> + m \ddot{\bold{r}} \cdot \delta \bold{r} += \sum_i \left[{\mathrm{d} \over \mathrm{d}t}{\partial T \over \partial \dot{q_i}}-{\partial T \over \partial q_i}\right]\delta q_i +</math> + +where ''T'' is total kinetic energy of the system. + +;Applying D'Alembert's principle + +The equation for the work done becomes + +:<math> +m\mathbf{\ddot{r}}\cdot\delta \mathbf{r}-\mathbf{F}\cdot\delta \mathbf{r}=\sum_i \left[{\mathrm{d} \over \mathrm{d}t}{\partial{T}\over \partial{\dot{q_i}}}-{\partial{(T-V)}\over \partial q_i}\right]\delta q_i = 0. +</math> + +However, this must be true for ''any'' set of generalized displacements ''δq<sub>i</sub>'', so we must have + +:<math> +\left[ {\mathrm{d} \over \mathrm{d}t}{\partial{T}\over \partial{\dot{q_i}}}-{\partial{(T-V)}\over \partial q_i}\right] = 0 +</math> + +for ''each'' generalized coordinate ''δq<sub>i</sub>''. We can further simplify this by noting that ''V'' is a function solely of '''r''' and ''t'', and '''r''' is a function of the generalized coordinates and ''t''. Therefore, ''V'' is independent of the generalized velocities: + +:<math>{\mathrm{d} \over \mathrm{d}t}{\partial{V}\over \partial{\dot{q_i}}} = 0.</math> + +Inserting this into the preceding equation and substituting ''L'' =&nbsp;''T''&nbsp;&minus;&nbsp;''V'', called the Lagrangian, we obtain Lagrange's equations: + +:<math> +{\partial{L}\over \partial q_i} = {\mathrm{d} \over \mathrm{d}t}{\partial{L}\over \partial{\dot{q_i}}}. +</math> +|} + +When ''q''<sub>''i''</sub> = ''r''<sub>''i''</sub> (i.e. the generalized coordinates are simply the Cartesian coordinates), it is straightforward to check that Lagrange's equations reduce to Newton's second law. + +===Dissipation function=== +{{main|Rayleigh dissipation function}} +In a more general formulation, the forces could be both potential and [[viscosity|viscous]]. If an appropriate transformation can be found from the '''F'''<sub>i</sub>, [[John Strutt, 3rd Baron Rayleigh|Rayleigh]] suggests using a dissipation function, ''D'', of the following form:<ref name="Torby1984"/>{{rp|271}} +:<math>D = \frac {1}{2} \sum_{j=1}^m \sum_{k=1}^m C_{j k} \dot{q}_j \dot{q}_k.</math> +where ''C<sub>jk</sub>'' are constants that are related to the damping coefficients in the physical system, though not necessarily equal to them + +If ''D'' is defined this way, then<ref name="Torby1984"/>{{rp|271}} +:<math>Q_j = - \frac {\partial V}{\partial q_j} - \frac {\partial D}{\partial \dot{q}_j}</math> +and +:<math>0 = \frac{\mathrm{d}}{\mathrm{d}t} \left ( \frac {\partial L}{\partial \dot{q}_j} \right ) - \frac {\partial L}{\partial q_j} + \frac {\partial D}{\partial \dot{q}_j}.</math> + +===Examples=== +In this section two examples are provided in which the above concepts are applied. The first example establishes that in a simple case, the Newtonian approach and the Lagrangian formalism agree. The second case illustrates the power of the above formalism, in a case that is hard to solve with Newton's laws. + +====Falling mass==== +Consider a point mass ''m'' falling freely from rest. By gravity a force ''F'' = ''mg'' is exerted on the mass (assuming ''g'' constant during the motion). 
Filling in the force in Newton's law, we find <math>\ddot x = g</math> from which the solution +:<math>x(t) = \frac{1}{2} g t^2</math> +follows (by taking the antiderivative of the antiderivative, and choosing the origin as the starting point). This result can also be derived through the Lagrangian formalism. Take ''x'' to be the coordinate, which is ''0'' at the starting point. The kinetic energy is ''T'' = {{frac|1|2}}''mv''<sup>2</sup> and the potential energy is ''V'' = −''mgx''; hence, +:<math>L = T - V = \frac{1}{2} m \dot{x}^2 + m g x.</math>. +Then +:<math>0 = \frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial \dot x} = m g - m \frac{\mathrm{d} \dot x}{\mathrm{d} t} </math> +which can be rewritten as <math>\ddot x = g</math>, yielding the same result as earlier. + +====Pendulum on a movable support==== +Consider a pendulum of mass ''m'' and length ''ℓ'', which is attached to a support with mass ''M'', which can move along a line in the ''x''-direction. Let ''x'' be the coordinate along the line of the support, and let us denote the position of the pendulum by the angle ''θ'' from the vertical. + +[[File:pendulumWithMovableSupport.svg|thumb|right|300px|Sketch of the situation with definition of the coordinates (click to enlarge)]] + +The kinetic energy can then be shown to be + +:<math> +\begin{align} +T &= \frac{1}{2} M \dot{x}^2 + \frac{1}{2} m \left( \dot{x}_\mathrm{pend}^2 + \dot{y}_\mathrm{pend}^2 \right) \\ +&= \frac{1}{2} M \dot{x}^2 + \frac{1}{2} m \left[ \left( \dot x + \ell \dot\theta \cos \theta \right)^2 + \left( \ell \dot\theta \sin \theta \right)^2 \right], +\end{align}</math> + +and the potential energy of the system is + +:<math> V = m g y_\mathrm{pend} = - m g \ell \cos \theta . </math> + +The Lagrangian is therefore + +<math> +\begin{align} +L &= T - V \\ +&= \frac{1}{2} M \dot{x}^2 + \frac{1}{2} m \left[ \left( \dot x + \ell \dot\theta \cos \theta \right)^2 + \left( \ell \dot\theta \sin \theta \right)^2 \right] + m g \ell \cos \theta \\ +&= \frac{1}{2} \left( M + m \right) \dot x^2 + m \dot x \ell \dot \theta \cos \theta + \frac{1}{2} m \ell^2 \dot \theta ^2 + m g \ell \cos \theta +\end{align} +</math> + +Now carrying out the differentiations gives for the support coordinate ''x'' + +:<math>\frac{\mathrm{d}}{\mathrm{d}t} \left[ (M + m) \dot x + m \ell \dot\theta \cos\theta \right] = 0, </math> + +therefore: + +:<math> (M + m) \ddot x + m \ell \ddot\theta\cos\theta-m \ell \dot\theta ^2 \sin\theta = 0 </math> + +indicating the presence of a constant of motion. Performing the same procedure for the variable <math>\theta</math> yields: + +:<math>\frac{\mathrm{d}}{\mathrm{d}t}\left[ m( \dot x \ell \cos\theta + \ell^2 \dot\theta ) \right] + m \ell (\dot x \dot \theta + g) \sin\theta = 0;</math> + +therefore + +:<math>\ddot\theta + \frac{\ddot x}{\ell} \cos\theta + \frac{g}{\ell} \sin\theta = 0.\, </math> + +These equations may look quite complicated, but finding them with Newton's laws would have required carefully identifying all forces, which would have been much more laborious and prone to errors. By considering limit cases, the correctness of this system can be verified: For example, <math>\ddot x \to 0</math> should give the equations of motion for a pendulum that is at rest in some [[inertial frame]], while <math>\ddot\theta \to 0</math> should give the equations for a pendulum in a constantly accelerating system, etc. 
Furthermore, it is trivial to obtain the results numerically, given suitable starting conditions and a chosen time step, by [[Numerical ordinary differential equations|stepping through the results iteratively]]. + +====Two-body central force problem==== +The basic problem is that of two bodies in orbit about each other attracted by a central force. The [[Jacobi coordinates]] are introduced; namely, the location of the center of mass '''R''' and the separation of the bodies '''r''' (the relative position). The Lagrangian is then<ref name=Taylor>{{cite book |page=297 |title=Classical mechanics |url=http://books.google.com/books?id=P1kCtNr-pJsC&pg=PA297 |author=John Robert Taylor |isbn=1-891389-22-X |publisher=University Science Books |year=2005}}</ref><ref name=Padmanabhan>The Lagrangian also can be written explicitly for a rotating frame. See {{cite book |title=Theoretical Astrophysics: Astrophysical processes |chapter=§2.3.2 Motion in a rotating frame |page=48 |url=http://books.google.com/books?id=ZzJicsTIrAAC&pg=PA48 |author=Thanu Padmanabhan |isbn=0-521-56632-0 |edition=3rd |publisher=Cambridge University Press |year=2000}}</ref> + +:<math> +\begin{align} +L &= T-U = \frac {1}{2} M \dot{\mathbf{R}}^2 + \left( \frac {1}{2} \mu \dot{\mathbf{r}}^2 - U(r) \right) \\ + &= L_{\mathrm{cm}} + L_{\mathrm{rel}} +\end{align}</math> + +where ''M'' is the total mass, ''μ'' is the [[reduced mass]], and ''U'' the potential of the radial force. The Lagrangian is divided into a ''center-of-mass'' term and a ''relative motion'' term. The '''R''' equation from the Euler-Lagrange system is simply: + +:<math>M\ddot{\mathbf{R}} = 0, \, </math> + +resulting in simple motion of the center of mass in a straight line at constant velocity. The relative motion is expressed in polar coordinates (''r'', ''θ''): + +:<math>L=\frac{1}{2} \mu \left(\dot r ^2 +r^2 \dot \theta ^2 \right) - U(r), </math> + +which does not depend upon ''θ'', therefore an ''ignorable'' coordinate. The Lagrange equation for ''θ'' is then: + +:<math>\frac {\partial L}{\partial \dot \theta} = \mu r^2 \dot \theta = \mathrm{constant} = \ell, \, </math> + +where ''ℓ'' is the conserved angular momentum. The Lagrange equation for ''r'' is: + +:<math>\frac{\partial L}{\partial r} = \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial \dot r}, \, </math> + +or: + +:<math> \mu r \dot \theta ^2 -\frac {dU}{dr} = \mu \ddot r. \, </math> + +This equation is identical to the radial equation obtained using Newton's laws in a ''co-rotating'' reference frame, that is, a frame rotating with the reduced mass so it appears stationary. If the angular velocity is replaced by its value in terms of the angular momentum, + +:<math>\dot \theta = \frac {\ell}{\mu r^2}, \, </math> + +the radial equation becomes:<ref name=Finch> + +{{cite book |title=Analytical mechanics |author=Louis N. Hand, Janet D. Finch |url=http://books.google.com/books?id=1J2hzvX2Xh8C&pg=PA141 |pages=140–141 |isbn=0-521-57572-9 |year=1998 |publisher=Cambridge University Press}} + +</ref> + +:<math>\mu \ddot r = -\frac{dU}{dr} + \frac{\ell^2}{\mu r^3}. \, </math> + +which is the equation of motion for a one-dimensional problem in which a particle of mass ''μ'' is subjected to the inward central force −d''U''/d''r'' and a second outward force, called in this context the [[centrifugal force]]: + +:<math>F_{\mathrm{cf}} = \mu r \dot \theta ^2 = \frac {\ell^2}{\mu r^3}. 
\, </math> + +Of course, if one remains entirely within the one-dimensional formulation, ''ℓ'' enters only as some imposed parameter of the external outward force, and its interpretation as angular momentum depends upon the more general two-dimensional problem from which the one-dimensional problem originated. + +If one arrives at this equation using Newtonian mechanics in a co-rotating frame, the interpretation is evident as the centrifugal force in that frame due to the rotation of the frame itself. If one arrives at this equation directly by using the generalized coordinates (''r'', ''θ'') and simply following the Lagrangian formulation without thinking about frames at all, the interpretation is that the centrifugal force is an outgrowth of ''using polar coordinates''. As Hildebrand says:<ref name=Hildebrand> + +{{cite book |title=Methods of applied mathematics |author=Francis Begnaud Hildebrand |url=http://books.google.com/books?id=17EZkWPz_eQC&pg=PA156 |page=156 |isbn=0-486-67002-3 |edition=Reprint of Prentice-Hall 1965 2nd |year=1992 |publisher=Courier Dover }} + +</ref> "Since such quantities are not true physical forces, they are often called ''inertia forces''. Their presence or absence depends, not upon the particular problem at hand, but ''upon the coordinate system chosen''." In particular, if Cartesian coordinates are chosen, the centrifugal force disappears, and the formulation involves only the central force itself, which provides the [[centripetal force]] for a curved motion. + +This viewpoint, that fictitious forces originate in the choice of coordinates, often is expressed by users of the Lagrangian method. This view arises naturally in the Lagrangian approach, because the frame of reference is (possibly unconsciously) selected by the choice of coordinates.<ref name= example> + +For example, see {{cite book |title=From instability to intelligence |page=202 |author=Michail Zak, Joseph P. Zbilut, Ronald E. Meyers |url=http://books.google.com/books?id=tHdDL0GCA70C&pg=PA202 |isbn=3-540-63055-4 |year=1997 |publisher=Springer}} for a comparison of Lagrangians in an inertial and in a noninertial frame of reference. See also the discussion of "total" and "updated" Lagrangian formulations in {{cite book |title=Computational continuum mechanics |author=Ahmed A. Shabana |url=http://books.google.com/books?id=RJbPQPrS6VsC&pg=PA118 |pages=118–119 |isbn=0-521-88569-8 |year=2008 |publisher=Cambridge University Press}} + +</ref> Unfortunately, this usage of "inertial force" conflicts with the Newtonian idea of an inertial force. In the Newtonian view, an inertial force originates in the acceleration of the frame of observation (the fact that it is not an [[inertial frame of reference]]), not in the choice of coordinate system. To keep matters clear, it is safest to refer to the Lagrangian inertial forces as ''generalized'' inertial forces, to distinguish them from the Newtonian vector inertial forces. That is, one should avoid following Hildebrand when he says (p.&nbsp;155) "we deal ''always'' with ''generalized'' forces, velocities accelerations, and momenta. For brevity, the adjective "generalized" will be omitted frequently." + +It is known that the Lagrangian of a system is not unique. 
Within the Lagrangian formalism the Newtonian fictitious forces can be identified by the existence of alternative Lagrangians in which the fictitious forces disappear, sometimes found by exploiting the symmetry of the system.<ref name=Gannon> + +{{cite book |title=Moonshine beyond the monster: the bridge connecting algebra, modular forms and physics |author= Terry Gannon |url=http://books.google.com/books?id=ehrUt21SnsoC&pg=RA3-PA267 |page=267 |isbn=0-521-83531-3 |year=2006 |publisher=Cambridge University Press }}</ref> + +==Extensions of Lagrangian mechanics== +The [[Hamiltonian mechanics|Hamiltonian]], denoted by ''H'', is obtained by performing a [[Legendre transformation]] on the Lagrangian, which introduces new variables, canonically conjugate to the original variables. This doubles the number of variables, but makes differential equations first order. The Hamiltonian is the basis for an alternative formulation of classical mechanics known as [[Hamiltonian mechanics]]. It is a particularly ubiquitous quantity in [[quantum mechanics]] (see [[Hamiltonian (quantum mechanics)]]). + +In 1948, [[Richard Feynman|Feynman]] discovered the [[path integral formulation]] extending the [[principle of least action]] to [[quantum mechanics]] for [[electrons]] and [[photons]]. In this formulation, particles travel every possible path between the initial and final states; the probability of a specific final state is obtained by summing over all possible trajectories leading to it. In the classical regime, the path integral formulation cleanly reproduces Hamilton's principle, and [[Fermat's principle]] in [[optics]]. + +==See also== +* [[Canonical coordinates]] +* [[Functional derivative]] +* [[Generalized coordinates]] +* [[Hamiltonian mechanics]] +* [[Hamiltonian optics]] +* [[Lagrangian analysis]] (applications of Lagrangian mechanics) +* [[Lagrangian point]] +* [[Non-autonomous mechanics]] +* [[Restricted three-body problem]] + +==References== +{{reflist}} + +==Further reading== +* [[Lev Landau|Landau, L.D.]] and [[Evgeny Lifshitz|Lifshitz, E.M.]] ''Mechanics'', Pergamon Press. +* Gupta, Kiran Chandra, ''Classical mechanics of particles and rigid bodies'' (Wiley, 1988). +* [[Classical Mechanics (book)|Goldstein, Herbert, ''Classical Mechanics'', Addison Wesley]]. +* Cassel, Kevin W.: Variational Methods with Applications in Science and Engineering, Cambridge University Press, 2013. + +==External links== +* Tong, David, [http://www.damtp.cam.ac.uk/user/tong/dynamics.html Classical Dynamics] Cambridge lecture notes +* [http://www.eftaylor.com/software/ActionApplets/LeastAction.html Principle of least action interactive] Excellent interactive explanation/webpage +* [http://portail.mathdoc.fr/cgi-bin/oetoc?id=OE_LAGRANGE__1 Joseph Louis de Lagrange - Œuvres complètes] (Gallica-Math) + +{{Physics-footer}} + +[[Category:Lagrangian mechanics|*]] + cd344oylvqxrx5d9ehy2e6le9lx531i + + + + Near sets + 0 + 24262 + + 24263 + 2013-06-24T20:09:35Z + + 38.111.50.78 + + /* Definitions */ removed extraneous space + wikitext + text/x-wiki + '''Near sets''' are [[disjoint sets]] that resemble each other. Resemblance between disjoint sets occurs whenever there are observable similarities between the objects in the sets. Similarity is determined by comparing lists of object feature values. Each list of feature values defines an object's description. Comparison of object descriptions provides a basis for determining the extent that disjoint sets resemble each other. 
Objects that are perceived as similar based on their descriptions are grouped together. These groups of similar objects can provide information and reveal patterns about objects of interest in the disjoint sets. For example, collections of digital images viewed as disjoint sets of points provide a rich hunting ground for near sets. + +Near set theory provides methods that can be used to extract resemblance information from objects contained in disjoint sets, ''i.e.'', it provides a formal basis for the observation, comparison, and classification of objects. The discovery of near sets begins with choosing the appropriate method to describe observed objects. This is accomplished by the selection of probe functions representing observable object features. A probe function is a mapping from an object to a real number representing a feature value. For example, when comparing fruit such as apples, the redness of an apple (observed object) can be described by a probe function representing colour, and the output of the probe function is a number representing the degree of redness (or whatever colour apple you prefer to eat). Probe functions provide a basis for describing and discerning affinities between objects as well as between groups of similar objects. Objects that have, in some degree, affinities are considered near each other. Similarly, groups of objects (''i.e.'' sets) that have, in some degree, affinities are also considered near each other. + +Near sets offer a framework for solving problems based on human [[perception]] that arise in areas such as [[image processing]], [[computer vision]] as well as engineering and science problems. In near set theory, perception is a combination of the view of perception in [[psychophysics]] with a view of perception found in [[Maurice Merleau-Ponty|Merleau-Ponty's]] work. In the context of psychophysics, perception of an object (''i.e.'', in effect, knowledge about an object) depends on signal values gathered by our senses. In this view of perception, our senses are likened to probe functions by considering them as mappings of stimuli to sensations that are a source of values assimilated by the mind. A human sense modelled as a probe measures observable physical characteristics of objects in our environment. The sensed physical characteristics of an object are identified with object features. In Merleau-Ponty's view, an object is perceived to the extent that it can be described. In other words, object description goes hand-in-hand with object perception. It is the mind that identifies relationships between object descriptions to form perceptions of sensed objects. It is also the case that near set theory has been proven to be quite successful in finding solutions to perceptual problems such as measuring image correspondence and segmentation evaluation. + +[[File:Set partition.svg|right|thumb|A [[partition of a set|partition]] of a set]] + +{{TOClimit|limit=3}} + +== History == +[[File:Rough set.svg|right|thumb|Example of a [[rough set]]]] +It has been observed that mathematical topics emerge and evolve through interactions among many researchers. This was the case with the discovery of near sets. Work on a perceptual basis for near sets began in 2002, motivated by digital image analysis. It was inspired by a study of the perception of nearness of familiar objects carried out by [[Zdzislaw Pawlak|Z. Pawlak]] and J.F. Peters.<ref>Z. Pawlak, Z. Peters, J.F. 
Jak blisko (How near), Systemy Wspomagania Decyzji I (2002, 2007) 57, 109, ISBN 83-920730-4-5 [http://witch.ii.pw.edu.pl/rseisp07/presentations/RSEISP07/Day_1/_Commemorative_session/RSEISP07-ZP-wspomnienia.pdf (available here)]. The intuition that led to the discovery of near sets is given in ''How near''.</ref> In this context, ''nearness'' is interpreted to mean ''closely corresponding to or resembling an original''. This collaboration was important in paving the way toward a description-based approach to exploring the nearness of sets. + +Excitement grew after 2002, when it became apparent that it was possible to introduce measures of nearness based on similarities between classes contained in coverings of disjoint sets (''e.g.'', this is possible if we define coverings of sets representing digital images and then look for similarities between the images such as shades of green in one landscape that resemble one or more shades of green in another landscape). In this context the term ''similarity'' means resemblance between two or more individual objects or sets of objects and almost equal patterns in compared items. Collaboration between J.F. Peters, A. Skowron, and J. Stepaniuk led to a formal basis for the nearness of objects considered in the context of [[proximity space]]s.<ref>Peters J., Skowron, A. Stepaniuk, J. Nearness of objects: Extension of approximation space model. Fundamenta Informaticae 79, 3-4, 2007, 497-512 [http://portal.acm.org/citation.cfm?id=1366089 (available here)]. Where a nearness relation is used to define a particular form of proximity space.</ref> Near sets and an approach to defining resemblance between sets was introduced by J.F. Peters in.<ref>Peters, J.F. Near sets. General theory about nearness of objects, Appl. Math. Sci. 1 (53) (2007) 2029–2609 [http://www.m-hikari.com/ams/ams-password-2007/ams-password53-56-2007/petersAMS53-56-2007.pdf (available here)]. Reminiscent of M. Pavel's approach, descriptions of objects are defined relative to vectors of values of real-valued functions called probes (Sect. 3, n. 2). See Pavel, M. Fundamentals of Pattern Recognition, in the Further reading section below, for the introduction of probe functions considered in the context of [[image registration]]. In the near set approach, a probe is viewed as a model for a sensor typically used in science and engineering. See, also, Peters, J.F., Wasilewski, P., Foundations of near sets, also listed in the Further reading section.</ref><ref>Peters, J.F. Near sets. Special theory about nearness of objects, Fundam. Inform. 75 (1–4) (2007) 407–433 [http://iospress.metapress.com/content/6l6xvxtabn9nh136 (available here)]. The basic distinction between near sets and rough sets is given (Remark 2.1). For a more detailed presentation of this topic, see Peters, J.F., Wasilewski, P., Foundations of near sets, listed in the Further reading section.</ref> + +[[File:Near sets.svg|right|thumb|Example of near sets]] + +Near set theory and its applications grew out of a generalization of the approach to the classification of objects proposed by [[Zdzislaw Pawlak|Z. Pawlak]] during his work on [[rough set]]s in the early 1980s, and E. Orłowska's work on approximation spaces. Briefly, a rough set can be described as follows. Consider a non-empty finite set of objects labelled <math>O</math>. 
The set <math>O</math> can be [[partition of a set|partitioned]] into cells (referred to as classes in near set theory) by grouping together objects that have similar descriptions (using one or more probe functions). A set <math>X\subset O</math> is considered rough when it cannot be formed completely by the union of classes from the partition of <math>O</math>. The set <math>X</math> is considered ''rough'' inasmuch as <math>X</math> cannot be fully described by probe functions selected to describe the individual objects of <math>O</math>. + +Near sets are considered a generalization of rough sets, since it has been shown that every rough set is a near set but not every near set is a rough set. Near sets grew out of the idea that two or more rough sets can share objects with matching descriptions if they both contain objects belonging to the same class from the partition of <math>O</math>. When this occurs, the sets are considered near each other with respect to the classes contained in the partition. + +== Definitions == +'''Definition 1: Object''' + +An ''object'' is anything that has its origin in the physical world. + +An identifying characteristic of an object is that it must have some quantifiable features. The term ''feature'' is used in [[Satoshi Watanabe (physicist)|S. Watanabe's]] sense of the word, ''i.e.'', a feature corresponds to an observable property of physical objects. Each feature has a 1-to-many relationship to real-valued functions called probe functions representing the feature. For each feature (such as colour) one or more probe functions can be introduced to represent the feature (such as [[grayscale]], or [[RGB color model|RGB]] values). Objects and sets of probe functions form the basis of near set theory and are sometimes referred to as perceptual objects due to the focus on assigning values to perceived object features. A non-empty, finite set of objects is denoted by <math>O</math>. + +'''Definition 2: Probe Function''' + +A ''probe function'' is a real-valued function, <math>f:O\to\mathbb R</math>, representing a feature of an object. + +Examples of probe functions are the colour, size, texture, edge-orientation, or weight of an object. Probe functions are used to describe an object to determine the characteristics and perceptual similarity of objects. Perceptual information is always presented with respect to probe functions just as our senses define our perception of the world. For example, our ability to view light in the ''visible spectrum'' rather than infra red or microwaves spectra defines our perception of the world just as the selection of probe functions constrains the amount of perceptual information available for feature extraction from a set of objects. The set of all probe functions is denoted by <math>\mathbb{F}</math>, and a set of specific probe functions for a given application is denoted by <math>\mathcal{B}\subseteq\mathbb{F}</math> + +'''Definition 3: Perceptual System''' + +A ''perceptual system'' <math> \langle O, \mathbb{F} \rangle </math> consists of a non-empty set <math>O</math> together with a set <math>\mathbb{F}</math> of real-valued functions. + +The notion of a perceptual system admits a wide variety of different interpretations that result from the selection of sample objects contained in a particular sample space <math>O</math>. A recent example of a perceptual system is given by D. Hall.<ref>Hall, D. Automatic parameter regulation of perceptual systems. 
Image and Vision Computing 24, 8, 2006, 870-881 [http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V09-4K9C57K-6&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1037015643&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=3b4e645e375adb5b3b8f2c933a28cf5e (available here)].</ref> Two other examples of perceptual systems are: a set of microscope images together with a set of image processing probe functions, or a set of results from a web query together with some measures (probe functions) indicating, ''e.g.'', relevancy of the results. + +'''Definition 4: Object Description''' + +Consider a perceptual system <math> \langle O, \mathbb{F} \rangle </math>. The description of an object <math>x\in O, \phi_i\in \mathcal{B}\subseteq \mathbb{F}</math> is given by the vector + +:<math>\boldsymbol{\phi}_{\mathcal{B}}(x) = (\phi_1(x),\phi_2(x),\ldots,\phi_i(x),\ldots,\phi_\ell(x)),</math> + +where <math>l</math> is the length of the vector <math>\boldsymbol{\phi}</math>, and each <math>\phi_i</math> is a probe function belonging to the set <math>\mathcal{B}</math>. + +'''Definition 5: Perceptual Indiscernibility Relation''' + +Let <math>\langle O, \mathbb{F} \rangle</math> be a perceptual system. For every <math>\mathcal{B}\subseteq\mathbb{F}</math> the perceptual indiscernibility relation <math>\sim_{\mathcal{B}}</math> is defined as follows: + +:<math>\sim_{\mathcal{B}} = \{(x,y)\in O \times O : \,\parallel \boldsymbol{\phi}_{\mathcal{B}}(x) - \boldsymbol{\phi}_{\mathcal{B}}(y)\parallel_{_2} = 0\},</math> + +where <math>\parallel\cdot\parallel</math> represents the [[Norm (mathematics)|<math>L^2</math> norm]]. This is a refinement of the original indiscernibility relation given by [[Zdzislaw Pawlak|Pawlak]] in 1981. Using the perceptual indiscernibility relation, objects with matching descriptions can be grouped to form classes called elementary sets (also called an [[equivalence class]]) defined by + +:<math>\mathbb{C}_{/\sim_{\mathcal{B}}} = \{o\in O\mid o\sim_{\mathcal{B}}c\,\forall\,c\in\mathbb{C}_{/\sim_{\mathcal{B}}}\}.</math> + +Similarly, a [[equivalence class|quotient set]] is the set of all elementary sets defined as + +:<math>O_{/\sim_{\mathcal{B}}} = \bigcup\{\mathbb{C}_{/\sim_{\mathcal{B}}}\}.</math> + +'''Definition 6: Perceptual Tolerance Relation''' + +When dealing with perceptual objects (especially, components in images), it is sometimes necessary to relax the equivalence condition of Defn. 5 to facilitate observation of associations in a perceptual system. This variation is called a perceptual tolerance relation. Let <math>\langle O, \mathbb{F} \rangle</math> be a perceptual system and let <math>\varepsilon\in\mathbb{R}</math>. For every<math>\mathcal{B}\subseteq\mathbb{F}</math> the tolerance relation <math>\cong_{\mathcal{B}}</math> is defined as follows: + +:<math>\cong_{\mathcal{B},\epsilon} = \{(x,y)\in O \times O : \parallel\boldsymbol{\phi}_{\mathcal{B}}(x) - \boldsymbol{\phi}_{\mathcal{B}}(y)\parallel_{_2} \leq \varepsilon\}.</math> + +For notational convenience, this relation can be written <math>\cong_{\mathcal{B}}</math> instead of<math>\cong_{\mathcal{B},\varepsilon}</math> with the understanding that <math>\epsilon</math> is inherent to the definition of the tolerance.<ref>Peters, J.F., Tolerance near sets and image correspondence. 
International Journal of Bio-Inspired Computation, 1, 4, 2009 [http://inderscience.metapress.com/app/home/contribution.asp?referrer=parent&backto=issue,2,6;journal,1,3;linkingpublicationresults,1:121374,1 (available here)].</ref> + +Tolerance classes are defined in terms of preclasses. Let <math>A_{\mathcal{B},\varepsilon}</math> denote that <math>A\subset O</math> is a perception-based preclass. Given<math>A_{\mathcal{B},\varepsilon}</math>, then [[existential quantification|for all]] <math>x,y\in A, x\cong_{\mathcal{B},\epsilon} y</math>, ''i.e.'', + +:<math> A_{\mathcal{B},\varepsilon}\ \mbox{is a preclass} \iff \forall x,y\in A, \parallel \boldsymbol{\phi}_{\mathcal{B}}(x) - \boldsymbol{\phi}_{\mathcal{B}}(y)\parallel_{_2} \leq \varepsilon.</math> + +Let <math>\mathbb{C}_{\mathcal{B},\varepsilon}</math> denote a tolerance class, which, by definition, is a maximal preclass. For <math>x\in O</math>, we also use the notation<math>x_{/_{\cong_{\mathcal{B},\epsilon}}}</math> to denote a tolerance class containing <math>x</math>. Note, <math>\cong_{\mathcal{B},\epsilon}</math> [[Cover (topology)|covers]] <math>O</math> instead of [[Partition of a set|partitioning]] <math>O</math> because an object can belong to more than one class. In addition, each pair of objects <math>x, y</math> in <math>\mathbb{C}_{\mathcal{B},\epsilon}</math> must satisfy the condition <math>\parallel\boldsymbol{\phi}_{\mathcal{B}}(x) -\boldsymbol{\phi}_{\mathcal{B}}(y)\parallel_{_2}\leq\varepsilon</math>. Next, a [[Cover (topology)|covering]] of <math>O</math> defined by<math>\cong_{\mathcal{B},\epsilon}</math> is the union of all tolerance classes in the covering. + +Notice that the tolerance relation <math>\cong_{\mathcal{B},\epsilon}</math> is a generalization of the indiscernibility relation given in Defn. 5 (obtained by setting <math>\varepsilon = 0</math>). + +'''Definition 7: Weak Nearness Relation''' + +Let <math>\langle O, \mathbb{F}\rangle</math> be a perceptual system and let <math>X,Y\subseteq O</math>. A set <math>X</math> ''is weakly near to'' a set <math>Y</math> (denoted <math>X \underline{\bowtie}_{\mathbb{F}} Y</math>) ''within'' the perceptual system <math>\langle O, \mathbb{F}\rangle</math> [[If and only if|iff]] there are <math>x \in X</math> and <math>y \in Y</math> and there is<math>\mathcal{B}\subseteq \mathbb{F}</math> such that <math>x \cong_{\mathcal{B}} y</math>. Notice that the image given in the [[Wikipedia:Lead section|lead section]] is actually an example of sets that are weakly near each other (with <math>\varepsilon = 0</math>). + +'''Definition 8: Nearness Relation''' + +Let <math>\langle O, \mathbb{F}\rangle</math> be perceptual system and let <math>X,Y \subseteq O</math>. A set <math>X</math> ''is near to'' a set <math>Y</math> (denoted <math>X\ \bowtie_{\mathbb{F}}\ Y</math>)''within'' the perceptual system <math>\langle O, \mathbb{F}\rangle</math> [[If and only if|iff]] there are <math>\mathbb{F}_1, \mathbb{F}_2 \subseteq \mathbb{F}</math> and <math>f\in \mathbb{F}</math> and there are <math>A \in O_{/\sim_{\mathbb{F}_1}}, B\in O_{/\sim_{\mathbb{F}_2}}, C\in O_{/\sim_{f}}</math> such that <math>A \subseteq X</math>, <math>B \subseteq Y</math> and <math>A,B \subseteq C</math>. + +[[File:CoveringSVG.svg|thumb|500px|Examples of Defn.'s 7 & 8: (a) Example of Defn. 
7, (b) example of <math>O_{/\sim_{\mathbb{F}_1}}</math>, (c) example of <math>O_{/\sim_{\mathbb{F}_2}}</math>, and (d) example of<math>O_{/\sim_f}</math> showing (together with (b) and (c)) that sets <math>X</math> and <math>Y</math> are near to each other according to Defn. 8.]] + +== Examples == +'''Simple Example''' + +The following simple example highlights the need for a tolerance relation as well as demonstrates the construction of tolerance classes from real data. Consider the 20 objects in the table below with<math>|\phi(x_i)| = 1</math>. + +:{| class="wikitable" style="text-align:center; width:30%" border="1" +|+ Sample Perceptual System +!<math>x_i</math> !! <math>\phi(x)</math> !! <math>x_i</math> !! <math>\phi(x)</math> !! <math>x_i</math> !! <math>\phi(x)</math> !!<math>x_i</math> !! <math>\phi(x)</math> +|- +|<math>x_1</math> || .4518 || <math>x_6</math> || .6943 || <math>x_{11}</math> || .4002 || <math>x_{16}</math> || .6079 +|- +|<math>x_2</math> || .9166 || <math>x_7</math> || .9246 || <math>x_{12}</math> || .1910 || <math>x_{17}</math> || .1869 +|- +|<math>x_3</math> || .1398 || <math>x_8</math> || .3537 || <math>x_{13}</math> || .7476 || <math>x_{18}</math> || .8489 +|- +|<math>x_4</math> || .7972 || <math>x_9</math> || .4722 || <math>x_{14}</math> || .4990 || <math>x_{19}</math> || .9170 +|- +|<math>x_5</math> || .6281 || <math>x_{10}</math> || .4523 || <math>x_{15}</math> || .6289 || <math>x_{20}</math> || .7143 +|} + +Letting <math>\varepsilon = 0.1</math> gives the following tolerance classes: + +:<math> +\begin{align} +O = & \{ \{x_1, x_8, x_{10}, x_{11}\},\{x_1, x_9, x_{10}, x_{11}, x_{14}\},\\ +& \{x_2, x_7, x_{18}, x_{19}\},\\ +& \{x_3, x_{12}, x_{17}\},\\ +& \{x_4, x_{13}, x_{20}\},\{x_4, x_{18}\},\\ +& \{x_5, x_6, x_{15}, x_{16}\},\{x_5, x_6, x_{15}, x_{20}\},\\ +& \{x_6, x_{13}, x_{20}\}\}. +\end{align} +</math> + +Observe that each object in a tolerance class satisfies the condition <math>\parallel\boldsymbol{\phi}_{\mathcal{B}}(x) -\boldsymbol{\phi}_{\mathcal{B}}(y)\parallel_2\leq\varepsilon</math>, and that almost all of the objects appear in more than one class. Moreover, there would be twenty classes if the indiscernibility relation was used since there are no two objects with matching descriptions. Finally, using these objects, the sets + +:<math> X = \{x_1, x_9\}</math> and <math> Y = \{x_{11}, x_{14}\},</math> + +are weakly near each other. + +'''Image Processing Example''' + +[[File:NearImages.svg|thumb|500px| Example of images that are near each other. (a) and (b) Images from the freely available LeavesDataset (see, ''e.g.'', www.vision.caltech.edu/archive.html).]] + +The following example provides a more useful application of near set theory. Let a subimage be defined as a small subset of [[pixel]]s belonging to a digital image such that the pixels contained in the subimage form a square. Then, let the sets <math> X</math> and <math>Y</math> respectively represent the subimages obtained from two different images, and let <math>O = \{X \cup Y\}</math>. Finally, let the description of an object be given by the Green component in the [[RGB color model]]. The next step is to find all the tolerance classes using the tolerance relation. Using this information, tolerance classes can be formed containing objects that have similar (within some small <math>\varepsilon</math>) values for the Green component in the RGB colour model. 
Furthermore, images that are near (similar) to each other should have tolerance classes divided among both images (instead of tolerance classes contained solely in one of the images). For example, the figure accompanying this example shows a subset of the tolerance classes obtained from two leaf images. In this figure, each tolerance class is assigned a separate colour. As can be seen, the two leaves share similar tolerance classes. This example is a first step toward the application of near sets to the image correspondence problem. However, it also highlights a need to measure the degree of nearness of two sets.
+
+== Nearness measure ==
+
+For some applications it is not sufficient to simply state that two sets are near each other. The practical application of near set theory sometimes requires a method for quantifying the nearness of sets. As a result, an <math>L_2</math> norm-based nearness measure was developed. Specifically, it was based on the idea that sets can be considered near each other when they have "things" in common. In the context of near sets, the "things" can be quantified by granules of a perceptual system, ''i.e.'', the tolerance classes. The simplest example of nearness between sets sharing "things" in common is the case when two sets have similar elements. Defn. 7 can be used to define a Nearness Measure (NM) between two sets <math>X</math> and <math>Y</math>. Let <math>Z = X\cup Y</math> and let the notation
+
+:<math>[z_{/\cong_{\mathcal{B}}}]_X = \{z\in z_{/\cong_{\mathcal{B}}}\mid z\in X\},</math>
+
+denote the portion of the tolerance class <math>z_{/\cong_{\mathcal{B}}}</math> that belongs to <math>X</math>, and similarly, use the notation
+
+:<math>[z_{/\cong_{\mathcal{B}}}]_Y = \{z\in z_{/\cong_{\mathcal{B}}}\mid z\in Y\},</math>
+
+to denote the portion that belongs to <math>Y</math>. Further, let the sets <math>X</math> and <math>Y</math> be weakly near each other using Defn. 6. Also, let <math>Z_{/\cong_{\mathcal{B}}}</math> denote a covering of <math>Z</math> defined by <math>\cong_{\mathcal{B}}</math>. Then, the <math>NM_{\cong_{\mathcal{B}}}(X,Y)</math> between <math>X</math> and <math>Y</math> is given by
+
+:<math> NM_{\cong_{\mathcal{B}}}(X,Y) = \Biggl ( \sum_{z_{/\cong_{\mathcal{B}}}\in Z_{/\cong_{\mathcal{B}}}} |z_{/\cong_{\mathcal{B}}}| \Biggr)^{-1} \sum_{z_{/\cong_{\mathcal{B}}}\in Z_{/\cong_{\mathcal{B}}}}|z_{/\cong_{\mathcal{B}}}| \frac{ \min (|[z_{/\cong_{\mathcal{B}}}]_X|,|[z_{/\cong_{\mathcal{B}}}]_Y|)}{\max (|[z_{/\cong_{\mathcal{B}}}]_X|,|[z_{/\cong_{\mathcal{B}}}]_Y|)}. </math>
+
+The idea behind the NM is that sets that are similar should have a similar number of objects in each tolerance class. Thus, for each tolerance class obtained from the covering of <math>Z=X\cup Y</math>, the NM counts the number of objects that belong to <math>X</math> and <math>Y</math> and takes the ratio (as a proper fraction) of their cardinalities. Furthermore, each ratio is weighted by the total size of the tolerance class (thus giving importance to the larger classes) and the final result is normalized by dividing by the sum of all the cardinalities. The range of the NM is in the interval [0,1], where a value of 1 is obtained if the sets are equivalent and a value of 0 is obtained if they have no elements in common.
+
+As an example of the degree of nearness between two sets, consider the figure below in which each image consists of two sets of objects, <math>X</math> and <math>Y</math>. 
Each colour in the figures corresponds to an elementary set where all the objects in the class share the same description. The idea behind the NM is that the nearness of sets in a perceptual system is based on the cardinality of tolerance classes that they share. Thus, the sets in left side of the figure are closer (more near) to each other in terms of their descriptions than the sets in right side of the figure. + +[[File:Visulization of nearness measure.jpg|thumb|500px|Examples of degree of nearness between two sets: (a) High degree of nearness, and (b) Low degree of nearness.]] + +== Near set evaluation and recognition (NEAR) system == + +The Near set Evaluation and Recognition (NEAR) system, is a system developed to demonstrate practical applications of near set theory to the problems of image segmentation evaluation and image correspondence. It was motivated by a need for a freely available software tool that can provide results for research and to generate interest in near set theory. The system implements a Multiple Document Interface (MDI) where each separate processing task is performed in its own child frame. The objects (in the near set sense) in this system are subimages of the images being processed and the probe functions (features) are image processing functions defined on the subimages. The system was written in C++ and was designed to facilitate the addition of new processing tasks and probe functions. Currently, the system performs five major tasks, namely, displaying equivalence and tolerance classes for an image, performing segmentation evaluation, measuring the nearness of two images, and displaying the output of processing an image using an individual probe functions. + +[[File:Near gui.jpg|thumb|500px|NEAR system GUI.]] + +== See also == +<div style="-moz-column-count:4; column-count:4;"> + +* [[Alternative set theory]] +* [[Cover (topology)]] +* [[Features (pattern recognition)]] +* [[Image analysis]] +* [[Maurice Merleau-Ponty]] +* [[Perception]] +* [[Proximity space]] +* [[Psychophysics]] +* [[Rough set| Rough set theory]] +* [[Satoshi Watanabe (physicist)|Satoshi Watanabe]] +* [[Soft computing]] +</div> + +== Notes == + +<references/> + +==Further reading== +* Henry, C., Peters, J.F. [http://wren.ece.umanitoba.ca/index.php?option=com_content&view=article&id=119&Itemid=93 Near set evaluation and recognition (NEAR) system]. Tech. rep., Computational Intelligence Laboratory, University of Manitoba, UM CI Laboratory Technical Report No. TR-2009-015, 2009. +*Peters, J.F., Wasilewski, P. [http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V0C-4W8TW7K-1&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1034064435&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=5dddb73027f11d718f7ceb7a1118c796 Foundations of near sets]. Information Sciences 179 (18), 2009, 3091-3109. +* Pavel, M. [http://portal.acm.org/citation.cfm?id=155290 Fundamentals of Pattern Recognition], 2nd ed. Pure and Applied Mathematics, vol. 174, Marcel Dekker, Inc., 1993. + +==External links== +* Pal, S.K., Peters, J.F., Eds. [http://www.routledge.com/books/Rough-Fuzzy-Image-Analysis-ISBN 978-1-4398-0329-5 Rough Fuzzy Image Analysis]. Foundations and Methodologies. Routledge, 2010, ISBN 978-1-4398-0329-5. To appear in September 2010. Many chapters present the theory and application of near sets in image analysis, including several system implementations. +*Hassanien, E., Abraham, A., Peters, J.F., Schaefer, G., Henry, C. 
[http://ieeexplore.ieee.org/xpl/tocpreprint.jsp?isnumber=4358869&punumber=4233 Rough sets and near sets in medical imaging: A review]. IEEE Trans. on Information Technology in Biomedicine 13 (5), 2009, {{doi|10.1109/TITB.2009.2017017}}. + +[[Category:Set theory]] + 1ijzm6lpite1vw9a8e7iygn3bjxmxdo + + + + Polynomial interpolation + 0 + 2925 + + 2926 + 2014-02-03T23:18:47Z + + Glrx + 0 + + rvt last 4; divided diffs are not defined as coefficients; method to calc coefficients + wikitext + text/x-wiki + In [[numerical analysis]], '''polynomial interpolation''' is the [[interpolation]] of a given [[data set]] by a [[polynomial]]: given some [[Point (geometry)#Points in Euclidean geometry|points]], find a polynomial which goes exactly through these points. + +== Applications == + +Polynomials can be used to approximate more complicated curves, for example, the shapes of letters in [[typography]], given a few points. A relevant application is the evaluation of the [[natural logarithm]] and [[trigonometric function]]s: pick a few known data points, create a [[lookup table]], and interpolate between those data points. This results in significantly faster computations. Polynomial interpolation also forms the basis for algorithms in [[numerical quadrature]] and [[numerical ordinary differential equations]]. + +Polynomial interpolation is also essential to perform sub-quadratic multiplication and squaring such as [[Karatsuba multiplication]] and [[Toom–Cook multiplication]], where an interpolation through points on a polynomial which defines the product yields the product itself. For example, given ''a'' = ''f''(''x'') = ''a''<sub>0</sub>''x''<sup>0</sup> + ''a''<sub>1</sub>''x''<sup>1</sup> + ... and ''b'' = ''g''(''x'') = ''b''<sub>0</sub>''x''<sup>0</sup> + ''b''<sub>1</sub>''x''<sup>1</sup> + ... then the product ''ab'' is equivalent to ''W''(''x'') = ''f''(''x'')''g''(''x''). Finding points along ''W''(''x'') by substituting ''x'' for small values in ''f''(''x'') and ''g''(''x'') yields points on the curve. Interpolation based on those points will yield the terms of ''W''(''x'') and subsequently the product ''ab''. In the case of Karatsuba multiplication this technique is substantially faster than quadratic multiplication, even for modest-sized inputs. This is especially true when implemented in parallel hardware. + +==Definition== + +Given a set of ''n''&nbsp;+&nbsp;1 data points (''x''<sub>''i''</sub>,''y''<sub>''i''</sub>) where no two ''x''<sub>''i''</sub> are the same, one is looking for a polynomial ''p'' of degree at most ''n'' with the property +:<math>p(x_i) = y_i,\; i=0,\ldots,n.</math> + +The [[Unisolvent functions|unisolvence]] theorem {{anchor|unisolvence theorem}} states that such a polynomial ''p'' exists and is unique, and can be proved by the [[Vandermonde matrix]], as described below. + +The theorem states that for ''n''+1 interpolation nodes (''x''<sub>''i''</sub>), polynomial interpolation defines a linear [[bijection]] + +:<math>L_n:\mathbb{K}^{n+1} \to \Pi_n</math> + +where <math>\Pi_n</math> is the [[vector space]] of polynomials (defined on any interval containing the nodes) of degree at most&nbsp;''n''. 
+ +==Constructing the interpolation polynomial== +[[Image:Interpolation example polynomial.svg|thumb|right|The red dots denote the data points (''x''<sub>''k''</sub>,''y''<sub>''k''</sub>), while the blue curve shows the interpolation polynomial.]] +Suppose that the interpolation polynomial is in the form +:<math>p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_2 x^2 + a_1 x + a_0. \qquad (1) </math> +The statement that ''p'' interpolates the data points means that +:<math>p(x_i) = y_i \qquad\mbox{for all } i \in \left\{ 0, 1, \dots, n\right\}.</math> +If we substitute equation (1) in here, we get a [[system of linear equations]] in the coefficients <math>a_k</math>. The system in matrix-vector form reads +:<math>\begin{bmatrix} +x_0^n & x_0^{n-1} & x_0^{n-2} & \ldots & x_0 & 1 \\ +x_1^n & x_1^{n-1} & x_1^{n-2} & \ldots & x_1 & 1 \\ +\vdots & \vdots & \vdots & & \vdots & \vdots \\ +x_n^n & x_n^{n-1} & x_n^{n-2} & \ldots & x_n & 1 +\end{bmatrix} +\begin{bmatrix} +a_n \\ +a_{n-1} \\ +\vdots \\ +a_0 +\end{bmatrix} += +\begin{bmatrix} +y_0 \\ +y_1 \\ +\vdots \\ +y_n +\end{bmatrix}. +</math> +We have to solve this system for <math>a_k</math> to construct the interpolant <math>p(x).</math> The matrix on the left is commonly referred to as a [[Vandermonde matrix]]. + +The [[condition number]] of the Vandermonde matrix may be large,<ref>{{cite journal|last=Gautschi|first=Walter|title=Norm Estimates for Inverses of Vandermonde Matrices|journal=Numerische Mathematik|volume=23|issue=4|pages=337–347|year=1975|doi=10.1007/BF01438260}}</ref> causing large errors when computing the coefficients <math>a_i</math> if the system of equations is solved using [[Gaussian elimination]]. + +Several authors have therefore proposed algorithms which exploit the structure of the Vandermonde matrix to compute numerically stable solutions in <math>\mathcal O(n^2)</math> operations instead of the <math>\mathcal O(n^3)</math> required by Gaussian elimination.<ref>{{cite journal|last=Higham|first=N. J.|title=Fast Solution of Vandermonde-Like Systems Involving Orthogonal Polynomials|journal=IMA Journal of Numerical Analysis|volume=8|issue=4|pages=473–486|year=1988|doi=10.1093/imanum/8.4.473}}</ref><ref>{{cite journal|last=Björck|first=Å|coauthors=V. Pereyra|title=Solution of Vandermonde Systems of Equations|journal=Mathematics of Computation|volume=24|pages=893–903|year=1970|doi=10.2307/2004623|issue=112|publisher=American Mathematical Society|jstor=2004623}}</ref><ref>{{cite journal|author=Calvetti, D and Reichel, L|title=Fast Inversion of Vanderomnde-Like Matrices Involving Orthogonal Polynomials|journal=BIT|pages=473–484|year=1993|doi=10.1007/BF01990529|volume=33|issue=33}}</ref> These methods rely on constructing first a [[Newton polynomial|Newton interpolation]] of the polynomial and then converting it to the monomial form above. + +Alternatively, we may write down the polynomial immediately in terms of [[Lagrange polynomial]]s: +:<math>p(x)=\frac{(x-x_1)(x-x_2)\cdots(x-x_n)}{(x_0-x_1)(x_0-x_2)\cdots(x_0-x_n)}\cdot y_0+\frac{(x-x_0)(x-x_2)\cdots(x-x_n)}{(x_1-x_0)(x_1-x_2)\cdots(x_1-x_n)}\cdot y_1</math> +::<math>+\ldots+\frac{(x-x_0)(x-x_1)\cdots(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)\cdots(x_n-x_{n-1})}\cdot y_n.</math> +That is, +:<math>p(x)=\sum_{i=0}^{n}y_i\cdot\prod_{0\leq j\leq n,j\neq i}\frac{x-x_j}{x_i-x_j}.</math> + + +For matrix arguments, this formula is called [[Sylvester's formula]] and the matrix-valued Lagrange polynomials are the [[Frobenius covariant]]s. 
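+
+As an illustrative sketch (not part of the sources cited above), the two constructions can be compared numerically; the snippet below assumes [[NumPy]] and uses made-up sample points:
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+# Hypothetical sample points, chosen only for illustration.
+x = np.array([0.0, 1.0, 2.0, 4.0])
+y = np.array([1.0, 3.0, 2.0, 5.0])
+n = len(x) - 1
+
+# Monomial basis: solve the Vandermonde system V a = y for the
+# coefficients a_n, ..., a_0 (same column ordering as the matrix above).
+V = np.vander(x, n + 1)
+a = np.linalg.solve(V, y)
+
+# Lagrange form: evaluate p(t) directly, without computing coefficients.
+def lagrange_eval(nodes, values, t):
+    total = 0.0
+    for i, (xi, yi) in enumerate(zip(nodes, values)):
+        basis = 1.0
+        for j, xj in enumerate(nodes):
+            if j != i:
+                basis *= (t - xj) / (xi - xj)
+        total += yi * basis
+    return total
+
+t = 3.0
+print(np.polyval(a, t))        # value via the monomial coefficients
+print(lagrange_eval(x, y, t))  # same value via the Lagrange form
+</syntaxhighlight>
+
+Both evaluations agree, which is an instance of the uniqueness result proved in the next section.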
+
+==Uniqueness of the interpolating polynomial==
+
+===Proof 1===
+Suppose we interpolate through ''n''&nbsp;+&nbsp;1 data points with a polynomial ''p''(''x'') of degree at most ''n'' (we need at least ''n''&nbsp;+&nbsp;1 data points, or else the polynomial cannot be determined uniquely). Suppose that another polynomial of degree at most ''n'' also interpolates the ''n''&nbsp;+&nbsp;1 points; call it ''q''(''x'').
+
+Consider <math>r(x) = p(x) - q(x)</math>. We know:
+# ''r''(''x'') is a polynomial
+# ''r''(''x'') has degree at most ''n'', since <math>p(x)</math> and <math>q(x)</math> are no higher than this and we are just subtracting them.
+# At the ''n''&nbsp;+&nbsp;1 data points, <math>r(x_i) = p(x_i) - q(x_i) = y_i - y_i = 0</math>. Therefore ''r''(''x'') has ''n''&nbsp;+&nbsp;1 roots.
+
+But ''r''(''x'') is a polynomial of degree at most ''n'': it has one root too many.
+Formally, if <math>r(x)</math> were a non-zero polynomial, it would have to be divisible by the product <math>(x-x_0)(x-x_1)\cdots(x-x_n)</math>, whose leading term is <math>x^{n+1}</math>, i.e. of degree one higher than the maximum we set.
+So the only way ''r''(''x'') can exist is if ''r''(''x'')&nbsp;=&nbsp;0.
+: <math>r(x) = 0 = p(x) - q(x) \implies p(x) = q(x)</math>
+
+So <math>q(x)</math> (which could be any polynomial, so long as it interpolates the points) is identical with <math>p(x)</math>, and <math>p(x)</math> is unique.
+
+===Proof 2===
+
+Given the Vandermonde matrix used above to construct the interpolant, we can set up the system
+
+: <math>V a = y. \, </math>
+
+To prove that ''V'' is [[Invertible matrix|nonsingular]] we use the Vandermonde determinant formula:
+
+: <math>\det(V) = \prod_{i,j=0, i<j}^n (x_i - x_j). </math>
+
+Since the ''n''&nbsp;+&nbsp;1 points are distinct, the [[determinant]] cannot be zero, as <math>x_i - x_j</math> is never zero; therefore ''V'' is nonsingular and the system has a unique solution.
+
+Either way this means that no matter what method we use to do our interpolation (direct, [[Spline (mathematics)|spline]], [[Lagrange polynomial|Lagrange]], etc.), assuming we can do all our calculations perfectly, we will always get the same polynomial.
+
+==Non-Vandermonde solutions==
+
+We are trying to construct our unique interpolation polynomial in the vector space <math>\Pi_n</math> of polynomials of degree at most ''n''. When using a [[monomial basis]] for <math>\Pi_n</math> we have to solve the Vandermonde system to construct the coefficients <math>a_k</math> for the interpolation polynomial. This can be a very costly operation (as counted in clock cycles of a computer trying to do the job). By choosing another basis for <math>\Pi_n</math> we can simplify the calculation of the coefficients but then we have to do additional calculations when we want to express the interpolation polynomial in terms of a [[monomial basis]].
+
+One method is to write the interpolation polynomial in the [[Newton form]] and use the method of [[divided differences]] to construct the coefficients, e.g. [[Neville's algorithm]]. The cost is [[Big O notation|O]]<math>(n^2)</math> operations, while Gaussian elimination costs O<math>(n^3)</math> operations. Furthermore, you only need to do O<math>(n)</math> extra work if an extra point is added to the data set, while for the other methods, you have to redo the whole computation.
+
+Another method is to use the [[Lagrange form]] of the interpolation polynomial. 
The resulting formula immediately shows that the interpolation polynomial exists under the conditions stated in the above theorem. The Lagrange formula is to be preferred to the Vandermonde formula when we are not interested in computing the coefficients of the polynomial, but only in computing the value of <math>p(x)</math> at a given ''x'' not in the original data set. In this case, we can reduce complexity to O<math>(n^2)</math>.<ref>R. Bevilaqua, D. Bini, M. Capovani and O. Menchi (2003). ''Appunti di Calcolo Numerico''. Chapter 5, p. 89. Servizio Editoriale Universitario Pisa - Azienda Regionale Diritto allo Studio Universitario.</ref>
+
+The [[Bernstein form]] was used in a constructive proof of the [[Weierstrass approximation theorem]] by [[Sergei Natanovich Bernstein|Bernstein]] and has nowadays gained great importance in computer graphics in the form of [[Bézier curve]]s.
+
+==Interpolation error==
+
+{{clarify section|date=June 2011}}
+
+When interpolating a given function ''f'' by a polynomial of degree ''n'' at the nodes ''x''<sub>0</sub>,...,''x''<sub>''n''</sub> we get the error
+
+:<math>f(x) - p_n(x) = f[x_0,\ldots,x_n,x] \prod_{i=0}^n (x-x_i) </math>
+
+where
+:<math>f[x_0,\ldots,x_n,x]</math>
+
+is the notation for [[divided differences]].
+
+If ''f'' is ''n''&nbsp;+&nbsp;1 times continuously differentiable on a closed interval ''I'' and <math>p_n(x)</math> is a polynomial of degree at most ''n'' that interpolates ''f'' at ''n''&nbsp;+&nbsp;1 distinct points {''x''<sub>''i''</sub>} (''i''=0,1,...,''n'') in that interval, then for each ''x'' in the interval there exists <math>\xi</math> in that interval such that
+
+:<math> f(x) - p_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^n (x-x_i). </math>
+
+=== Proof ===
+Set the error term as
+
+<math> R_n(x) = f(x) - p_n(x) </math>
+
+and set up the auxiliary function
+
+<math> Y(t) = R_n(t) - \frac{R_n(x)}{W(x)} \ W(t), </math>
+
+where
+
+<math> W(t) = \prod_{i=0}^n (t-x_i) </math>
+
+and
+
+<math> W(x) = \prod_{i=0}^n (x-x_i). </math>
+
+Here ''x'' is regarded as a fixed point distinct from all of the nodes (otherwise the statement is trivial), so that <math>W(x)\neq 0</math>. By construction we have
+
+<math> Y(x) = R_n(x) - \frac{R_n(x)}{W(x)} \ W(x) = 0 </math>
+
+and, since each <math> x_i </math> is a root of both <math>R_n(t)</math> (because <math>p_n</math> interpolates ''f'' there) and <math>W(t)</math>,
+
+<math> Y(x_i) = R_n(x_i) - \frac{R_n(x)}{W(x)} \ W(x_i) = 0 </math>
+
+Then <math>Y(t)</math> has at least ''n''+2 roots. By repeated application of [[Rolle's theorem]], <math>Y^\prime(t)</math> has at least ''n''+1 roots, and hence <math>Y^{(n+1)}(t)</math> has at least one root <math>\xi</math>, where <math>\xi</math> is in the interval ''I''.
+
+Differentiating ''n''+1 times gives
+
+<math> Y^{(n+1)}(t) = R_n^{(n+1)}(t) - \frac{R_n(x)}{W(x)} \ (n+1)! </math>
+
+Since <math>p_n(x)</math> is a polynomial of degree at most ''n'',
+
+<math> R_n^{(n+1)}(t) = f^{(n+1)}(t) </math>
+
+Thus
+
+<math> Y^{(n+1)}(t) = f^{(n+1)}(t) - \frac{R_n(x)}{W(x)} \ (n+1)! </math>
+
+Since <math>\xi</math> is a root of <math>Y^{(n+1)}(t)</math>,
+
+<math> Y^{(n+1)}(\xi) = f^{(n+1)}(\xi) - \frac{R_n(x)}{W(x)} \ (n+1)! = 0 </math>
+
+Therefore
+
+<math> R_n(x) = f(x) - p_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^n (x-x_i) </math>.
+
+Thus the remainder term in the Lagrange form of the [[Taylor's theorem|Taylor theorem]] is a special case of interpolation error when all interpolation nodes&nbsp;''x''<sub>''i''</sub> are identical.<ref>{{cite web|url=http://www.math.okstate.edu/~binegar/4513-F98/4513-l16.pdf|title=Errors in Polynomial Interpolation}}</ref>
+
+In the case of equally spaced interpolation nodes <math>x_i = x_0 + ih</math>, it follows that the interpolation error is O<math>(h^{n+1})</math>.
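+
+This rate can be checked numerically with a short sketch (not taken from the cited sources; it assumes [[NumPy]], and the test function, degree and step sizes are arbitrary choices):
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+f = np.exp   # smooth test function (arbitrary choice)
+n = 3        # fixed polynomial degree
+
+def max_interpolation_error(h):
+    # equally spaced nodes x_i = x_0 + i*h with x_0 = 0
+    nodes = h * np.arange(n + 1)
+    coeffs = np.polyfit(nodes, f(nodes), n)   # degree-n interpolant
+    t = np.linspace(0.0, n * h, 1001)
+    return np.max(np.abs(f(t) - np.polyval(coeffs, t)))
+
+e1 = max_interpolation_error(0.1)
+e2 = max_interpolation_error(0.05)
+print(e1 / e2)   # close to 2**(n + 1) = 16, consistent with O(h^{n+1})
+</syntaxhighlight>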
+However, this does not yield any information on what happens when <math>n \to \infty</math>. That question is treated in the [[#Convergence properties|section ''Convergence properties'']].
+
+The above error bound suggests choosing the interpolation points ''x''<sub>''i''</sub> such that the product | Π (''x'' &minus; ''x''<sub>''i''</sub>) | is as small as possible. The [[Chebyshev nodes]] achieve this.
+
+==Lebesgue constants==
+:''See the main article: [[Lebesgue constant (interpolation)|Lebesgue constant]].''
+
+We fix the interpolation nodes ''x''<sub>0</sub>, ..., ''x''<sub>''n''</sub> and an interval [''a'', ''b''] containing all the interpolation nodes. The process of interpolation maps the function ''f'' to a polynomial ''p''. This defines a mapping ''X'' from the space ''C''([''a'', ''b'']) of all continuous functions on [''a'', ''b''] to itself. The map ''X'' is linear and it is a [[projection (linear algebra)|projection]] on the subspace Π<sub>''n''</sub> of polynomials of degree ''n'' or less.
+
+The Lebesgue constant ''L'' is defined as the [[operator norm]] of ''X''. One has (a special case of [[Lebesgue's lemma]]):
+
+:<math> \|f-X(f)\| \le (L+1) \|f-p^*\|. </math>
+
+In other words, the interpolation polynomial is at most a factor (''L''&nbsp;+&nbsp;1) worse than the best possible approximation. This suggests that we look for a set of interpolation nodes for which ''L'' is small. In particular, we have for [[Chebyshev nodes]]:
+
+:<math> L \le \frac2\pi \log(n+1) + 1.\quad </math>
+
+We conclude again that Chebyshev nodes are a very good choice for polynomial interpolation, as the growth of ''L'' in ''n'' is only logarithmic, whereas for equidistant nodes it is exponential. However, those nodes are not optimal.
+
+==Convergence properties==
+
+It is natural to ask for which classes of functions and for which interpolation nodes the sequence of interpolating polynomials converges to the interpolated function as the degree ''n'' goes to infinity. Convergence may be understood in different ways, e.g. pointwise, uniform or in some integral norm.
+
+The situation is rather bad for equidistant nodes, in that uniform convergence is not even guaranteed for infinitely differentiable functions. One [[Runge's phenomenon|classical example, due to Carl Runge]], is the function ''f''(''x'') = 1 / (1 + ''x''<sup>2</sup>) on the interval [&minus;5, 5]. The interpolation error ||''f'' &minus; ''p''<sub>''n''</sub>||<sub><math>\infty</math></sub> grows without bound as <math>n \rightarrow \infty</math>. Another example is the function ''f''(''x'') = |''x''| on the interval [&minus;1, 1], for which the interpolating polynomials do not even converge pointwise except at the three points ''x'' = &minus;1, 0, and 1.<ref>{{Harvtxt|Watson|1980|p=21}} attributes the last example to {{Harvtxt|Bernstein|1912}}.</ref>
+
+One might think that better convergence properties may be obtained by choosing different interpolation nodes. The following '''theorem''' seems to be a rather encouraging answer:
+
+:For any function ''f''(''x'') continuous on an interval [''a'',''b''] there exists a table of nodes for which the sequence of interpolating polynomials <math>p_n(x)</math> converges to ''f''(''x'') uniformly on [''a'',''b''].
+
+'''Proof'''. It is clear that the sequence of polynomials of best approximation <math>p^*_n(x)</math> converges to ''f''(''x'') uniformly (due to the [[Weierstrass approximation theorem]]). Now we have only to show that each <math>p^*_n(x)</math> may be obtained by means of interpolation on certain nodes. 
But this is true due to a special property of polynomials of best approximation known from the [[Chebyshev alternation theorem]]. Specifically, we know that such polynomials must intersect ''f''(''x'') at least ''n''+1 times. Choosing the points of intersection as interpolation nodes, we obtain the interpolating polynomial coinciding with the best approximation polynomial.
+
+The defect of this method, however, is that the interpolation nodes have to be calculated anew for each new function ''f''(''x''), and the algorithm is hard to implement numerically. Does there exist a single table of nodes for which the sequence of interpolating polynomials converges to any continuous function ''f''(''x'')? The answer is unfortunately negative, as stated by the following '''theorem''':
+
+:For any table of nodes there is a continuous function ''f''(''x'') on an interval [''a'',''b''] for which the sequence of interpolating polynomials diverges on [''a'',''b''].<ref>{{Harvtxt|Watson|1980|p=21}} attributes this theorem to {{Harvtxt|Faber|1914}}.</ref>
+
+The proof essentially uses the lower bound estimation of the Lebesgue constant, which we defined above to be the operator norm of ''X''<sub>''n''</sub> (where ''X''<sub>''n''</sub> is the projection operator on Π<sub>''n''</sub>). Now we seek a table of nodes for which
+
+:<math>\lim_{n \to \infty} X_n f = f,\text{ for every }f \in C([a,b]). \, </math>
+
+Due to the [[Banach–Steinhaus theorem]], this is only possible when the norms of ''X''<sub>''n''</sub> are uniformly bounded, which cannot be true since we know that <math>\|X_n\|\geq \frac{2}{\pi} \log(n+1)+C.</math>
+
+For example, if equidistant points are chosen as interpolation nodes, the function from [[Runge's phenomenon]] demonstrates divergence of such interpolation. Note that this function is not only continuous but even infinitely differentiable on [&minus;1, 1]. For [[Chebyshev nodes]], however, such an example is much harder to find because of the following '''theorem''':
+
+:For every [[absolute continuity|absolutely continuous]] function on [&minus;1,&nbsp;1] the sequence of interpolating polynomials constructed on Chebyshev nodes converges to&nbsp;''f''(''x'') uniformly.
+
+==Related concepts==
+
+[[Runge's phenomenon]] shows that for high values of ''n'', the interpolation polynomial may oscillate wildly between the data points. This problem is commonly resolved by the use of [[spline interpolation]]. Here, the interpolant is not a polynomial but a [[spline (mathematics)|spline]]: a chain of several polynomials of a lower degree.
+
+Interpolation of [[periodic function]]s by [[harmonic analysis|harmonic]] functions is accomplished by [[Fourier transform]]. This can be seen as a form of polynomial interpolation with harmonic base functions; see [[trigonometric interpolation]] and [[trigonometric polynomial]].
+
+[[Hermite interpolation]] problems are those where not only the values of the polynomial ''p'' at the nodes are given, but also all derivatives up to a given order. This turns out to be equivalent to a system of simultaneous polynomial congruences, and may be solved by means of the [[Chinese remainder theorem]] for polynomials. [[Birkhoff interpolation]] is a further generalization where only derivatives of some orders are prescribed, not necessarily all orders from 0 to ''k''.
+
+[[Collocation method]]s for the solution of differential and integral equations are based on polynomial interpolation. 
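+
+The behaviour described above for equidistant versus Chebyshev nodes can also be observed in a small experiment (an illustrative sketch, not from the cited sources; it assumes [[NumPy]] and the degree is an arbitrary choice):
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+runge = lambda x: 1.0 / (1.0 + x**2)   # Runge's example on [-5, 5]
+n = 10                                  # polynomial degree (arbitrary choice)
+t = np.linspace(-5.0, 5.0, 2001)
+
+def max_error(nodes):
+    p = np.polyfit(nodes, runge(nodes), len(nodes) - 1)
+    return np.max(np.abs(runge(t) - np.polyval(p, t)))
+
+equidistant = np.linspace(-5.0, 5.0, n + 1)
+chebyshev = 5.0 * np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))
+
+print(max_error(equidistant))   # large, and it grows as n increases
+print(max_error(chebyshev))     # much smaller, and it shrinks as n increases
+</syntaxhighlight>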
+ +The technique of [[rational function modeling]] is a generalization that considers ratios of polynomial functions. + +At last, [[multivariate interpolation]] for higher dimensions. + +==See also== +* [[Newton series]] + +==Notes== +{{reflist|30em}} + +==References== +* {{Citation |first=Kendell A. |last=Atkinson |year=1988 |title=An Introduction to Numerical Analysis |edition=2nd |chapter=Chapter 3. |publisher= John Wiley and Sons |isbn=0-471-50023-2 |doi= }} +* {{Citation |first=Sergei N. |last=Bernstein |authorlink=Sergei Natanovich Bernstein |year=1912 |title=Sur l'ordre de la meilleure approximation des fonctions continues par les polynômes de degré donné |language=French |trans_chapter=On the order of the best approximation of continuous functions by polynomials of a given degree |journal=Mem. Acad. Roy. Belg. |issn= |volume=4 |issue= |pages=1&ndash;104 |doi=}} +* {{Citation |first=L. |last=Brutman |year=1997 |title=Lebesgue functions for polynomial interpolation — a survey |journal=Ann. Numer. Math. |issn= |volume=4 |issue= |pages=111&ndash;127 |doi= }} +* {{Citation |first=Georg |last=Faber |authorlink=Georg Faber |year=1914 |title=Über die interpolatorische Darstellung stetiger Funktionen |language=German |trans_chapter=On the Interpolation of Continuous Functions |journal=Deutsche Math. Jahr. |volume=23 |issue= |pages=192&ndash;210 |doi=}} +* {{Citation |first=M. J. D. |last=Powell |authorlink=Michael J. D. Powell |year=1981 |title=Approximation Theory and Methods |chapter=Chapter 4 |publisher=Cambridge University Press |isbn=0-521-29514-9 |doi= }} +* {{Citation |first=Michelle |last=Schatzman |year=2002 |title=Numerical Analysis: A Mathematical Introduction |chapter=Chapter 4 |publisher=Clarendon Press |location=Oxford |isbn=0-19-850279-6 |doi=}} +* {{Citation |first=Endre |last=Süli |authorlink=Endre Süli |first2=David |last2=Mayers |year=2003 |title=An Introduction to Numerical Analysis |chapter=Chapter 6 |publisher=Cambridge University Press |isbn=0-521-00794-1 |doi=}} +* {{Citation |first=G. Alistair |last=Watson |year=1980 |title=Approximation Theory and Numerical Methods |publisher=John Wiley |isbn=0-471-27706-1 |doi=}} + +== External links == +* {{springer|title=Interpolation process|id=p/i051970}} +* [http://www.alglib.net/interpolation/polynomial.php ALGLIB] has an implementations in C++ / C# / VBA / Pascal. +* [http://www.gnu.org/software/gsl/ GSL] has a polynomial interpolation code in C +* [http://demonstrations.wolfram.com/InterpolatingPolynomial/ Interpolating Polynomial] by [[Stephen Wolfram]], the [[Wolfram Demonstrations Project]]. + +[[Category:Interpolation]] +[[Category:Polynomials]] +[[Category:Articles containing proofs]] + iseksblk3tvbddzofq3zphhnb8aj1pk + + + + Mutual fund separation theorem + 0 + 26176 + + 26177 + 2013-09-02T05:16:00Z + + Michael Hardy + 0 + + wikitext + text/x-wiki + In [[Modern portfolio theory|portfolio theory]], a '''mutual fund separation theorem''', '''mutual fund theorem''', or '''separation theorem''' is a [[theorem]] stating that, under certain conditions, any investor's optimal portfolio can be constructed by holding each of certain [[mutual fund]]s in appropriate ratios, where the number of mutual funds is smaller than the number of individual assets in the portfolio. Here a mutual fund refers to any specified benchmark portfolio of the available assets. There are two advantages of having a mutual fund theorem. 
First, if the relevant conditions are met, it may be easier (or lower in transactions costs) for an investor to purchase a smaller number of mutual funds than to purchase a larger number of assets individually. Second, from a theoretical and empirical standpoint, if it can be assumed that the relevant conditions are indeed satisfied, then [[Capital asset pricing model|implications]] for the functioning of asset markets can be derived and tested. + +==Portfolio separation in mean-variance analysis== + +Portfolios can be analyzed in a [[mean-variance analysis|mean-variance]] framework, with every investor holding the portfolio with the lowest possible return [[variance]] consistent with that investor's chosen level of [[expected return]] (called a '''minimum-variance portfolio'''), if the returns on the assets are jointly [[elliptical distribution|elliptically distributed]], including the special case in which they are [[joint normality|jointly normally distributed]].<ref>Chamberlain, G. 1983."A characterization of the distributions that imply mean-variance utility functions", ''[[Journal of Economic Theory]]'' 29, 185–201.</ref><ref>Owen, J., and Rabinovitch, R. 1983. "On the class of elliptical distributions and their applications to the theory of portfolio choice", ''[[Journal of Finance]]'' 38, 745–752.</ref> Under mean-variance analysis, it can be shown<ref>Merton, Robert. September 1972. "An analytic derivation of the efficient portfolio frontier," ''[[Journal of Financial and Quantitative Analysis]]'' 7, 1851–1872.</ref> that every minimum-variance portfolio given a particular expected return (that is, every efficient portfolio) can be formed as a combination of any two efficient portfolios. If the investor's optimal portfolio has an expected return that is between the expected returns on two efficient benchmark portfolios, then that investor's portfolio can be characterized as consisting of positive quantities of the two benchmark portfolios. + +===No risk-free asset=== + +To see two-fund separation in a context in which no risk-free asset is available, using [[matrix algebra]], let <math>\sigma^2</math> be the variance of the portfolio return, let <math>\mu</math> be the level of expected return on the portfolio that portfolio return variance is to be minimized contingent upon, let <math>r</math> be the [[Euclidean vector|vector]] of expected returns on the available assets, let <math>X</math> be the vector of amounts to be placed in the available assets, let <math>W</math> be the amount of wealth that is to be allocated in the portfolio, and let <math>1</math> be a vector of ones. Then the problem of minimizing the portfolio return variance subject to a given level of expected portfolio return can be stated as + +:Minimize <math>\sigma^2</math> + +:subject to + +:<math>X^Tr = \mu</math> + +:and + +:<math>X^T1 = W</math> + +where the superscript <math>^T</math> denotes the [[transpose]] of a matrix. The portfolio return variance in the objective function can be written as <math>\sigma^2 = X^TVX,</math> where <math>V</math> is the positive definite [[covariance matrix]] of the individual assets' returns. 
The [[Lagrange multipliers|Lagrangian]] for this constrained optimization problem (whose second-order conditions can be shown to be satisfied) is + +:<math>L = X^TVX + 2\lambda(\mu - X^Tr) + 2\eta (W-X^T1),</math> + +with Lagrange multipliers <math>\lambda</math> and <math>\eta</math>.This can be solved for the optimal vector <math>X</math> of asset quantities by equating to zero the [[Matrix calculus|derivatives]] with respect to <math>X</math>, <math>\lambda</math>, and <math>\eta</math>, provisionally solving the [[first-order condition]] for <math>X</math> in terms of <math>\lambda</math> and <math>\eta</math>, substituting into the other first-order conditions, solving for <math>\lambda</math> and <math>\eta</math> in terms of the model parameters, and substituting back into the provisional solution for <math>X</math>. The result is + +:<math>X^\mathrm{opt} = \frac{W}{\Delta}[(r^TV^{-1}r)V^{-1}1 - (1^TV^{-1}r)V^{-1}r] + \frac{\mu}{\Delta}[(1^TV^{-1}1)V^{-1}r - (r^TV^{-1}1)V^{-1}1]</math> + +where + +::<math>\Delta = (r^TV^{-1}r)(1^TV^{-1}1) - (r^TV^{-1}1)^2 > 0.</math> + +For simplicity this can be written more compactly as + +:<math>X^\mathrm{opt} = \alpha W + \beta \mu</math> + +where <math>\alpha</math> and <math>\beta</math> are parameter vectors based on the underlying model parameters. Now consider two benchmark efficient portfolios constructed at benchmark expected returns <math>\mu_1</math> and <math>\mu_2</math> and thus given by + +:<math>X_{1}^\mathrm{opt} = \alpha W + \beta \mu_1</math> + +and + +:<math>X_{2}^\mathrm{opt} = \alpha W + \beta \mu_2.</math> + +The optimal portfolio at arbitrary <math>\mu_3</math> can then be written as a weighted average of <math>X_{1}^\mathrm{opt}</math> and <math>X_{2}^\mathrm{opt}</math> as follows: + +:<math>X_{3}^\mathrm{opt} = \alpha W + \beta \mu_3 = \frac{\mu_3 - \mu_2}{\mu_1 - \mu_2}X_{1}^\mathrm{opt} + \frac{\mu_1 - \mu_3}{\mu_1 - \mu_2}X_{2}^\mathrm{opt}.</math> + +This equation proves the two-fund separation theorem for mean-variance analysis. For a geometric interpretation, see [[Modern portfolio theory#The efficient frontier with no risk-free asset|the Markowitz bullet]]. + +===One risk-free asset=== + +If a [[Risk-free interest rate|risk-free asset]] is available, then again a two-fund separation theorem applies; but in this case one of the "funds" can be chosen to be a very simple fund containing only the risk-free asset, and the other fund can be chosen to be one which contains zero holdings of the risk-free asset. (With the risk-free asset referred to as "money", this form of the theorem is referred to as the '''monetary separation theorem'''.) Thus mean-variance efficient portfolios can be formed simply as a combination of holdings of the risk-free asset and holdings of a particular efficient fund that contains only risky assets. The derivation above does not apply, however, since with a risk-free asset the above covariance matrix of all asset returns, <math>V</math>, would have one row and one column of zeroes and thus would not be invertible. Instead, the problem can be set up as + +:Minimize <math>\sigma^2</math> + +:subject to + +:<math>(W-X^T1)r_f + X^Tr = \mu,</math> + +where <math>r_f</math> is the known return on the risk-free asset, X is now the vector of quantities to be held in the ''risky'' assets, and <math>r</math> is the vector of expected returns on the risky assets. 
The left side of the last equation is the expected return on the portfolio, since <math>(W-X^T1)</math> is the quantity held in the risk-free asset, thus incorporating the asset adding-up constraint that in the earlier problem required the inclusion of a separate Lagrangian constraint. The objective function can be written as <math>\sigma^2 = X^TVX</math>, where now <math>V</math> is the covariance matrix of the risky assets only. This optimization problem can be shown to yield the optimal vector of risky asset holdings + +:<math>X^\mathrm{opt} = \frac{(\mu - Wr_f)}{(r-1r_f)^TV^{-1}(r-1r_f)}V^{-1}(r-1r_f).</math> + +Of course this equals a zero vector if <math>\mu = Wr_f</math>, the risk-free portfolio's return, in which case all wealth is held in the risk-free asset. It can be shown that the portfolio with exactly zero holdings of the risk-free asset occurs at <math>\mu = \tfrac{Wr^TV^{-1}(r-1r_f)}{1^TV^{-1}(r-1r_f)}</math> and is given by + +:<math>X^* = \frac{W}{1^TV^{-1}(r-1r_f)}V^{-1}(r-1r_f).</math> + +It can also be shown (analogously to the demonstration in the above two-mutual-fund case) that every portfolio's risky asset vector (that is, <math>X^\mathrm{opt}</math> for every value of <math>\mu</math>) can be formed as a weighted combination of the latter vector and the zero vector. For a geometric interpretation, see [[Modern portfolio theory#The efficient frontier with no risk-free asset|the efficient frontier with no risk-free asset]]. + +==Portfolio separation without mean-variance analysis== + +If investors have [[hyperbolic absolute risk aversion]] (HARA) (including the [[power utility function]], [[logarithmic function]] and the [[Exponential utility|exponential utility function]]), separation theorems can be obtained without the use of mean-variance analysis. For example, [[David Cass]] and [[Joseph Stiglitz]]<ref>Cass, David, and Joseph Stiglitz, "The structure of investor preferences and asset returns, and separability in portfolio allocation", ''[[Journal of Economic Theory]]'' 2, 1970, 122–160.</ref> showed in 1970 that two-fund monetary separation applies if all investors have HARA utility with the same exponent as each other.<ref>Huang, Chi-fu, and Robert H. Litzenberger, ''Foundations for Financial Economics'', North-Holland, 1988.</ref>{{rp|ch.4}} + +More recently, in the dynamic portfolio optimization model of Çanakoğlu and Özekici,<ref>Çanakoğlu, Ethem, and Süleyman Özekici (March 2010), "Portfolio selection in stochastic markets with HARA utility functions", ''European Journal of Operational Research'' 201(2), 520–536. http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VCT-4VXDTWH-5&_user=10&_coverDate=03%2F01%2F2010&_rdoc=1&_fmt=high&_orig=search&_origin=search&_sort=d&_docanchor=&view=c&_searchStrId=1572358725&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=c24c04131ff627766be9dc38e04726d2&searchtype=a</ref> the investor's level of initial wealth (the distinguishing feature of investors) does not affect the optimal composition of the risky part of the portfolio. A similar result is given by Schmedders.<ref>Schmedders, Karl H. (June 15, 2006) "Two-fund separation in dynamic general equilibrium," SSRN Working Paper Series. 
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=908587</ref> + +==References== +{{reflist}} + +[[Category:Finance]] +[[Category:Financial economics]] +[[Category:Portfolio theories]] + omv8cq22taj4213p2vkjl9enllj0vje + + + + Coarea formula + 0 + 22381 + + 22382 + 2013-03-08T19:55:47Z + + Addbot + 0 + + + [[User:Addbot|Bot:]] Migrating 3 interwiki links, now provided by [[Wikipedia:Wikidata|Wikidata]] on [[d:q2591257]] + wikitext + text/x-wiki + In the [[mathematics|mathematical]] field of [[geometric measure theory]], the '''coarea formula''' expresses the [[integral]] of a function over an [[open set]] in [[Euclidean space]] in terms of the integral of the [[level set]]s of another function. A special case is [[Fubini's theorem]], which says under suitable hypotheses that the integral of a function over the region enclosed by a rectangular box can be written as the [[iterated integral]] over the level sets of the coordinate functions. Another special case is integration in [[spherical coordinates]], in which the integral of a function on '''R'''<sup>''n''</sup> is related to the integral of the function over spherical shells: level sets of the radial function. The formula plays a decisive role in the modern study of [[isoperimetric problem]]s. + +For [[smooth function]]s the formula is a result in [[multivariate calculus]] which follows from a simple [[change of variables]]. More general forms of the formula for [[Lipschitz function]]s were first established by [[Herbert Federer]] {{harv|Federer|1959}}, and for [[Bounded variation|''{{math|BV}}'' functions]] by {{harvtxt|Fleming|Rishel|1960}}. + +A precise statement of the formula is as follows. Suppose that Ω is an open set in '''R'''<sup>''n''</sup>, and ''u'' is a real-valued [[Lipschitz function]] on Ω. Then, for an [[Lp space|L<sup>1</sup>]] function ''g'', + +:<math>\int_\Omega g(x) |\nabla u(x)|\, dx = \int_{-\infty}^\infty \left(\int_{u^{-1}(t)}g(x)\,dH_{n-1}(x)\right)\,dt</math> + +where ''H''<sub>''n''&nbsp;&minus;&nbsp;1</sub> is the (''n''&nbsp;&minus;&nbsp;1)-dimensional [[Hausdorff measure]]. In particular, by taking ''g'' to be one, this implies + +:<math>\int_\Omega |\nabla u| = \int_{-\infty}^\infty H_{n-1}(u^{-1}(t))\,dt,</math> + +and conversely the latter equality implies the former by standard techniques in [[Lebesgue integral|Lebesgue integration]]. + +More generally, the coarea formula can be applied to Lipschitz functions ''u'' defined in Ω&nbsp;⊂&nbsp;'''R'''<sup>''n''</sup>, taking on values in '''R'''<sup>''k''</sup> where ''k''&nbsp;<&nbsp;''n''. In this case, the following identity holds + +:<math>\int_\Omega g(x) |J_k u(x)|\, dx = \int_{\mathbb{R}^k} \left(\int_{u^{-1}(t)}g(x)\,dH_{n-k}(x)\right)\,dt</math> + +where ''J''<sub>''k''</sub>''u'' is the ''k''-dimensional [[Jacobian]] of ''u''. + +==Applications== +* Taking ''u''(''x'') = |''x''&nbsp;&minus;&nbsp;''x''<sub>0</sub>| gives the formula for integration in spherical coordinates of an integrable function ƒ: +::<math>\int_{\mathbb{R}^n}f\,dx = \int_0^\infty\left\{\int_{\partial B(x_0;r)} f\,dS\right\}\,dr.</math> +* Combining the coarea formula with the [[isoperimetric inequality]] gives a proof of the [[Sobolev inequality]] for ''W''<sup>1,1</sup> with best constant: +::<math>\left(\int_{\mathbb{R}^n} |u|^{n/(n-1)}\right)^{\frac{n-1}{n}}\le n^{-1}\omega_n^{-1/n}\int_{\mathbb{R}^n}|\nabla u|</math> +:where &omega;<sub>n</sub> is the volume of the [[unit ball]] in '''R'''<sup>''n''</sup>. 
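+* As a simple consistency check (a special case of the spherical-coordinates example above, stated here without proof), take Ω&nbsp;=&nbsp;''B''(0;&nbsp;1) in '''R'''<sup>''n''</sup>, ''u''(''x'')&nbsp;=&nbsp;|''x''| and ''g''&nbsp;=&nbsp;1. Then |∇''u''|&nbsp;=&nbsp;1 almost everywhere and the level set ''u''<sup>&minus;1</sup>(''t'') inside Ω is the sphere of radius ''t'', so both sides of the formula reduce to the volume of the unit ball:
+::<math>\int_{B(0;1)} |\nabla u|\,dx = \omega_n, \qquad \int_0^1 H_{n-1}(u^{-1}(t))\,dt = \int_0^1 n\omega_n t^{n-1}\,dt = \omega_n.</math>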
+
+==See also==
+* [[Sard's theorem]]
+* [[Smooth coarea formula]]
+
+==References==
+* {{citation
+| last = Federer
+| first = Herbert
+| authorlink = Herbert Federer
+| title = Geometric measure theory
+| publisher = Springer-Verlag New York Inc.
+| location = New York
+| year = 1969
+| pages = xiv+676
+| isbn = 978-3-540-60656-7
+| mr= 0257325
+| series = Die Grundlehren der mathematischen Wissenschaften, Band 153 }}.
+* {{citation|last=Federer|first=H|authorlink=Herbert Federer|title=Curvature measures|journal=Transactions of the American Mathematical Society|volume=93|year=1959|pages=418–491|jstor=1993504|doi=10.2307/1993504|issue= 3|publisher=Transactions of the American Mathematical Society, Vol. 93, No. 3}}.
+* {{citation|last1=Fleming|first1=WH|last2=Rishel|first2=R|title=An integral formula for the total gradient variation|journal=Archiv der Mathematik|volume = 11|year=1960|doi=10.1007/BF01236935|pages=218–222|url=http://www.springerlink.com/index/WV67N13464926501.pdf|format=PDF|issue = 1}}
+* {{citation|last1=Malý|first1=J|last2=Swanson|first2=D|last3=Ziemer|first3=W|title=The co-area formula for Sobolev mappings|journal=Transactions of the American Mathematical Society|year=2002|volume=355|pages=477–492|url=http://www.ams.org/tran/2003-355-02/S0002-9947-02-03091-X/S0002-9947-02-03091-X.pdf|format=PDF|doi=10.1090/S0002-9947-02-03091-X|issue=2}}.
+
+[[Category:Measure theory]]
+ sb4sbzq6mfctnzpl4qx9xhwvp644g45
+ 
+ 
+ 
+ Dirac bracket
+ 0
+ 18055
+ 
+ 18056
+ 2013-11-15T02:59:30Z
+ 
+ Sam Sailor
+ 0
+ 
+ 
+ Reverted 1 edit by [[Special:Contributions/151.230.63.194|151.230.63.194]] identified as test/vandalism using [[WP:STiki|STiki]]
+ wikitext
+ text/x-wiki
+ The '''Dirac bracket''' is a generalization of the [[Poisson bracket]] developed by [[Paul Dirac]]<ref>{{cite doi|10.4153/CJM-1950-012-1|noedit}}</ref> to treat classical systems with [[second class constraints]] in [[Hamiltonian mechanics]], and thus to allow them to undergo [[canonical quantization]]. It is an important part of Dirac's development of [[Hamiltonian mechanics]] to handle more general [[Lagrangian]]s elegantly; specifically, those with constraints, where more apparent variables than dynamical ones are at hand.<ref>{{Cite book | last1=Dirac | first1=Paul A. M. | title=Lectures on quantum mechanics | url=http://books.google.com/books?id=GVwzb1rZW9kC | publisher=Belfer Graduate School of Science, New York | series=Belfer Graduate School of Science Monographs Series | year=1964 | volume=2 | mr=2220894 | isbn=9780486417134 }}; Dover, ISBN 0486417131.</ref> More abstractly, the two-form implied by the Dirac bracket is '''the restriction of the [[Symplectic manifold|symplectic form]] to the constraint surface in [[phase space]]'''.<ref>See pages 48-58 of Ch. 2 in Henneaux, Marc and Teitelboim, Claudio, ''Quantization of Gauge Systems''. Princeton University Press, 1992. ISBN 0-691-08775-X</ref>
+
+This article assumes familiarity with the standard [[Lagrangian]] and [[Hamiltonian Mechanics|Hamiltonian]] formalisms, and their connection to [[canonical quantization]]. Details of Dirac's modified Hamiltonian formalism are also summarized to put the Dirac bracket in context.
+
+== Inadequacy of the standard Hamiltonian procedure ==
+
+The standard development of Hamiltonian mechanics is inadequate in several specific situations:
+# When the Lagrangian is at most linear in the velocity of at least one coordinate, in which case the definition of the [[Canonical coordinate|canonical momentum]] leads to a ''constraint''.
This is the most frequent reason to resort to Dirac brackets. For instance, the Lagrangian (density) for any [[fermion]] is of this form. +# When there are [[Gauge fixing|gauge]] (or other unphysical) degrees of freedom which need to be fixed. +# When there are any other constraints that one wishes to impose in phase space. + +=== Example of a Lagrangian linear in velocity === + +An example in [[classical mechanics]] is a particle with charge ''q'' and mass ''m'' confined to the ''x'' - ''y'' plane with a strong constant, homogeneous perpendicular magnetic field, so then pointing in the ''z''-direction with strength ''B''.<ref>{{cite doi|10.1103/PhysRevD.43.1332|noedit }}</ref> + +The Lagrangian for this system with an appropriate choice of parameters is +:<math> L = \tfrac{1}{2}m\vec{v}^2 + \frac{q}{c}\vec{A}\cdot\vec{v} - V(\vec{r}),</math> +where <math>\vec{A}</math> is the [[vector potential]] for the magnetic field, <math>\vec{B}</math>; ''c'' is the speed of light in vacuum; and <math>V(\vec{r})</math> is an arbitrary external scalar potential; one could easily take it to be quadratic in ''x'' and ''y'', without loss of generality. We use +:<math> \vec{A} = \frac{B}{2}(x\hat{y} - y\hat{x})</math> +as our vector potential. Here, the hats indicate unit vectors. Later in the article, however, they are used to distinguish quantum mechanical operators from their classical analogs. The usage should be clear from the context. + +Explicitly, the [[Lagrangian]] amounts to just +:<math> +L = \frac{m}{2}(\dot{x}^2 + \dot{y}^2) + \frac{qB}{2c}(x\dot{y} - y\dot{x}) - V(x, y) ~, +</math> +which leads to the equations of motion +:<math> +m\ddot{x} = - \frac{\partial V}{\partial x} + \frac{q B}{c}\dot{y} +</math> +:<math> +m\ddot{y} = - \frac{\partial V}{\partial y} - \frac{q B}{c}\dot{x}. +</math> +For a harmonic potential, the gradient of ''V'' amounts to just the coordinates, −(''x'',''y''). + +Now, in the limit of a very large magnetic field, ''qB''/''mc'' ≫ 1. One may then drop the kinetic term to produce a simple approximate Lagrangian, +:<math> +L = \frac{qB}{2c}(x\dot{y} - y\dot{x}) - V(x, y)~, +</math> +with first-order equations of motion +:<math> +\dot{y} = \frac{c}{q B}\frac{\partial V}{\partial x} +</math> +:<math> +\dot{x} = -\frac{c}{q B}\frac{\partial V}{\partial y}~. +</math> +Note that this approximate Lagrangian is ''linear in the velocities'', which is one of the conditions under which the standard Hamiltonian procedure breaks down. While this example has been motivated as an approximation, the Lagrangian under consideration is legitimate and leads to consistent equations of motion in the Lagrangian formalism. + +Following the Hamiltonian procedure, however, the canonical momenta associated with the coordinates are now +:<math> +p_x = \frac{\partial L}{\partial \dot{x}} = -\frac{q B}{2c}y +</math> +:<math> +p_y = \frac{\partial L}{\partial \dot{y}} = \frac{q B}{2c}x ~, +</math> +which are unusual in that they are not invertible to the velocities; instead, they are constrained to be functions of the coordinates: the four phase-space variables are linearly dependent, so the variable basis is [[overcompleteness|overcomplete]]. + +A [[Legendre transformation]] then produces the Hamiltonian, +:<math> +H(x,y, p_x, p_y) = \dot{x}p_x + \dot{y} p_y - L = V(x, y). +</math> +Note that this "naive" Hamiltonian has ''no dependence on the momenta'', which means that equations of motion (Hamilton's equations) are inconsistent. + +The Hamiltonian procedure has broken down. 
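+
+The failure can be reproduced symbolically. The following minimal sketch (assuming SymPy is available; it merely restates the computation above) shows that the momenta conjugate to ''x'' and ''y'' contain no velocities, and that the naive Hamiltonian collapses to the potential:
+
+<syntaxhighlight lang="python">
+import sympy as sp
+
+x, y, xdot, ydot, q, B, c = sp.symbols('x y xdot ydot q B c', real=True)
+V = sp.Function('V')(x, y)
+
+# Approximate Lagrangian, linear in the velocities (the large-B limit from above).
+L = q*B/(2*c)*(x*ydot - y*xdot) - V
+
+p_x = sp.diff(L, xdot)                    # -> -q*B*y/(2*c): a constraint, no velocity
+p_y = sp.diff(L, ydot)                    # ->  q*B*x/(2*c): likewise
+H = sp.simplify(xdot*p_x + ydot*p_y - L)  # Legendre transform
+print(p_x, p_y, H)                        # H reduces to V(x, y); the momenta drop out
+</syntaxhighlight>
+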
One might try to fix the problem by eliminating two of the components of the 4d phase space, say ''y'' and ''p''<sub>''y''</sub>, down to a reduced phase space of 2d, that is, sometimes expressing the coordinates as momenta and sometimes as coordinates. However, this is neither a general nor a rigorous solution. This gets to the heart of the matter: that the definition of the canonical momenta implies a ''constraint on phase space'' (between momenta and coordinates) that was never taken into account.
+
+== Generalized Hamiltonian procedure ==
+
+In Lagrangian mechanics, if the system has [[holonomic constraint|non-holonomic constraints]], then one generally adds [[Lagrange multipliers]] to the Lagrangian to account for them. The extra terms vanish when the constraints are satisfied, thereby forcing the path of stationary action to be on the constraint surface. In this case, going to the Hamiltonian formalism introduces a constraint on ''phase space'' in Hamiltonian mechanics, but the solution is similar.
+
+Before proceeding, it is useful to understand the notions of '''weak equality''' and '''strong equality'''. Two functions on phase space, ''f'' and ''g'', are weakly equal if they are equal ''when the equations of motion are satisfied'' or [[On shell and off shell|on shell]], denoted <math>f\approx g</math>. If ''f'' and ''g'' are equal on and off shell, then they are called strongly equal, written ''f''=''g''. It is important to note that, in order to get the right answer, ''no weak equations may be used before evaluating derivatives or Poisson brackets''.
+
+The new procedure works as follows. Start with a Lagrangian and define the canonical momenta in the usual way. Some of those definitions may not be invertible and instead give a constraint in phase space (as above). Constraints derived in this way or imposed from the beginning of the problem are called '''primary constraints'''. The constraints, labeled <math>\phi_j</math>, must weakly vanish, <math>\phi_j(q, p)\approx 0</math>.
+
+Next, one finds the '''naive Hamiltonian''', ''H'', in the usual way via a Legendre transformation, exactly as in the above example. Note that the Hamiltonian can always be written as a function of ''q''s and ''p''s only, even if the velocities cannot be inverted into functions of the momenta.
+
+=== Generalizing the Hamiltonian ===
+
+Dirac argues that we should generalize the Hamiltonian (somewhat analogously to the method of Lagrange multipliers) to
+:<math>
+H^* = H + \sum_j c_j\phi_j \approx H,
+</math>
+where the ''c''<sub>j</sub> are not constants but functions of the coordinates and momenta. Since this new Hamiltonian is the most general function of coordinates and momenta weakly equal to the naive Hamiltonian, ''H''<sup>*</sup> is the broadest generalization of the Hamiltonian possible.
+
+To further illuminate the ''c''<sub>j</sub>, consider how one gets the equations of motion from the naive Hamiltonian in the standard procedure. One expands the variation of the Hamiltonian out in two ways and sets them equal (using a somewhat abbreviated notation with suppressed indices and sums):
+:<math>
+\delta H = \frac{\partial H}{\partial q}\delta q + \frac{\partial H}{\partial p}\delta p
+ \approx \dot{q}\delta p - \dot{p}\delta q ~,
+</math>
+where the second equality holds after simplifying with the Euler-Lagrange equations of motion and the definition of canonical momentum.
From this equality, one deduces the equations of motion in the Hamiltonian formalism from +:<math> +\left(\frac{\partial H}{\partial q} + \dot{p}\right)\delta q + \left(\frac{\partial H}{\partial p} - \dot{q}\right)\delta p = 0 ~, +</math> +where the weak equality symbol is no longer displayed explicitly, since by definition the equations of motion only hold weakly. In the present context, one cannot simply set the coefficients of ''δq'' and ''δp'' separately to zero, since the variations are somewhat restricted by the constraints. In particular, the variations must be tangent to the constraint surface. + +One can demonstrate the solution to +:<math> +\sum_n A_n\delta q_n + \sum_n B_n\delta p_n = 0, +</math> +for the variations ''δq''<sub>n</sub> and ''δp''<sub>n</sub> restricted by the constraints <math>\phi_j\approx 0</math> (assuming the constraints satisfy some [[regular function|regularity conditions]]) is generally<ref name = Henneaux>See page 8 in Henneaux and Teitelboim in the references.</ref> +:<math> +A_n = \sum_m u_m \frac{\partial \phi_m}{\partial q_n} +</math> +:<math> +B_n = \sum_m u_m \frac{\partial \phi_m}{\partial p_n}, +</math> +where the ''u''<sub>m</sub> are arbitrary functions. + +Using this result, the equations of motion become +:<math> +\dot{p}_j = -\frac{\partial H}{\partial q_j} - \sum_k u_k \frac{\partial \phi_k}{\partial q_j} +</math> +:<math> +\dot{q}_j = \frac{\partial H}{\partial p_j} + \sum_k u_k \frac{\partial \phi_k}{\partial p_j} +</math> +:<math> +\phi_j(q, p) = 0, +</math> +where the ''u''<sub>k</sub> are functions of coordinates and velocities that can be determined, in principle, from the second equation of motion above. + +The Legendre transform between the Lagrangian formalism and the Hamiltonian formalism has been saved at the cost of adding new variables. + +=== Consistency conditions === + +The equations of motion become more compact when using the Poisson bracket, since if <math>f</math> is some function of the coordinates and momenta then +:<math> +\dot{f} \approx \{f, H^*\}_{PB} \approx \{f, H\}_{PB} + \sum_k u_k\{f, \phi_k\}_{PB}, +</math> +if one assumes that the Poisson bracket with the ''u''<sub>k</sub> (functions of the velocity) exist; this causes no problems since the contribution weakly vanishes. Now, there are some consistency conditions which must be satisfied in order for this formalism to make sense. If the constraints are going to be satisfied, then their equations of motion must weakly vanish, that is, we require +:<math> +\dot{\phi_j} \approx \{\phi_j, H\}_{PB} + \sum_k u_k\{\phi_j,\phi_k\}_{PB} \approx 0. +</math> +There are four different types of conditions that can result from the above: +# An equation that is inherently false, such as 1=0 . +# An equation that is identically true, possibly after using one of our primary constraints. +# An equation that places new constraints on our coordinates and momenta, but is independent of the <math>u_k</math>. +# An equation that serves to specify the ''u''<sub>k</sub>. + +The first case indicates that the starting Lagrangian gives inconsistent equations of motion, such as ''L=q''. The second case does not contribute anything new. + +The third case gives new constraints in phase space. A constraint derived in this manner is called a '''[[secondary constraint]]'''. Upon finding the secondary constraint one should add it to the extended Hamiltonian and check the new consistency conditions, which may result in still more constraints. 
Iterate this process until there are no more constraints. The distinction between primary and secondary constraints is largely an artificial one (i.e. a constraint for the same system can be primary or secondary depending on the Lagrangian), so this article does not distinguish between them from here on. Assuming the consistency condition has been iterated until all of the constraints have been found, then <math>\phi_j</math> will index all of them. Note this article uses secondary constraint to mean any constraint that was not initially in the problem or derived from the definition of canonical momenta; some authors distinguish between secondary constraints, tertiary constraints, et cetera. + +Finally, the last case helps fix the ''u''<sub>k</sub>. If, at the end of this process, the ''u''<sub>k</sub> are not completely determined, then that means there are unphysical (gauge) degrees of freedom in the system. Once all of the constraints (primary and secondary) are added to the naive Hamiltonian and the solutions to the consistency conditions for the ''u''<sub>k</sub> are plugged in, the result is called ''the total Hamiltonian''. + +=== Fixing the ''u''<sub>k</sub> === +The ''u''<sub>k</sub> must solve a set of inhomogeneous linear equations of the form +:<math> +\{\phi_j, H\}_{PB} + \sum_k u_k\{\phi_j,\phi_k\}_{PB} \approx 0. +</math> +The above equation must possess at least one solution, since otherwise the initial Lagrangian is inconsistent; however, in systems with gauge degrees of freedom, the solution will not be unique. The most general solution is of the form +:<math> +u_k = U_k + V_k, +</math> +where <math>U_k</math> is a particular solution and <math>V_k</math> is the most general solution to the homogeneous equation +:<math> +\sum_k V_k\{\phi_j,\phi_k\}_{PB}\approx 0. +</math> +The most general solution will be a linear combination of linearly independent solutions to the above homogeneous equation. The number of linearly independent solutions equals the number of ''u''<sub>k</sub> (which is the same as the number of constraints) minus the number of consistency conditions of the fourth type (in previous subsection). This is the number of unphysical degrees of freedom in the system. Labeling the linear independent solutions <math>V^a_k</math> where the index ''a'' runs from 1 to the number of unphysical degrees of freedom, the general solution to the consistency conditions is of the form +:<math> +u_k \approx U_k + \sum_a v_a V^a_k, +</math> +where the ''v''<sub>a</sub> are completely arbitrary functions of time. A different choice of the ''v''<sub>a</sub> corresponds to a gauge transformation, and should leave the physical state of the system unchanged.<ref>Weinberg, Steven, ''The Quantum Theory of Fields'', Volume 1. Cambridge University Press, 1995. ISBN 0-521-55001-7</ref> + +=== The total Hamiltonian === +At this point, it is natural to introduce the '''total Hamiltonian''' +:<math> +H_T = H + \sum_k U_k\phi_k + \sum_{a, k} v_a V^a_k \phi_k +</math> +and what is denoted +:<math> +H' = H + \sum_k U_k \phi_k. +</math> +The time evolution of a function on the phase space, <math>f</math> is governed by +:<math> +\dot{f} \approx \{f, H_T\}_{PB}. +</math> +Later, the extended Hamiltonian is introduced. For gauge-invariant (physically measurable quantities) quantities, all of the Hamiltonians should give the same time evolution, since they are all weakly equivalent. It is only for nongauge-invariant quantities that the distinction becomes important. 
+ +== The Dirac bracket == + +Above is everything needed to find the equations of motion in Dirac's modified Hamiltonian procedure. Having the equations of motion, however, is not the endpoint for theoretical considerations. If one wants to canonically quantize a general system, then one needs the Dirac brackets. Before defining Dirac brackets, '''first-class''' and '''second-class''' constraints need to be introduced. + +We call a function {{math|''f(q, p)''}} of coordinates and momenta first class if its Poisson bracket with all of the constraints weakly vanishes, that is, +:<math> +\{f, \phi_j\}_{PB} \approx 0, +</math> +for all {{mvar|j}}. Note that the only quantities that weakly vanish are the constraints <math>\phi_j</math>, and therefore anything that weakly vanishes must be strongly equal to a linear combination of the constraints. One can demonstrate that the Poisson bracket of two first class quantities must also be first class. The first class constraints are intimately connected with the unphysical degrees of freedom mentioned earlier. Namely, the number of independent first class constraints is equal to the number of unphysical degrees of freedom, and furthermore the primary first class constraints generate gauge transformations. Dirac further postulated that all secondary first class constraints are generators of gauge transformations, which turns out to be false; however, typically one operates under the assumption that all first class constraints generate gauge transformations when using this treatment.<ref>See Henneaux and Teitelboim, pages 18-19.</ref> + +When the first-class secondary constraints are added into the Hamiltonian with arbitrary <math>v_a</math> as the first class primary constraints are added to arrive at the total Hamiltonian, then one obtains the '''extended Hamiltonian'''. The extended Hamiltonian gives the most general possible time evolution for any gauge-dependent quantities, and may actually generalize the equations of motion from those of the Lagrangian formalism. + +For the purposes of introducing the Dirac bracket, of more immediate interest are the [[First_class_constraint#Second_class_constraints|second class constraints]]. Second class constraints are constraints that have nonvanishing Poisson bracket with at least one other constraint. + +For instance, consider constraints {{mvar|φ}}<sub>1</sub> and {{mvar|φ}}<sub>2</sub> whose Poisson bracket is simply a constant, {{mvar|c}}, +:<math> +\{\phi_1,\phi_2\}_{PB} = c ~. +</math> +Now, suppose one wishes to employ canonical quantization, then the phase-space coordinates become operators whose commutators become {{math|''iħ''}} times their classical Poisson bracket. Assuming there are no ordering issues that give rise to new quantum corrections, this implies that +:<math> +[\hat{\phi}_1, \hat{\phi}_2] = i\hbar ~c, +</math> +where the hats emphasize the fact that the constraints are on operators. + +On the one hand, canonical quantization gives the above commutation relation, but on the other hand {{mvar|φ}}<sub>1</sub> and {{mvar|φ}}<sub>2</sub> are constraints that must vanish on physical states, whereas the right-hand side cannot vanish. This example illustrates the need for some generalization of the Poisson bracket which respects the system's constraints, and which leads to a consistent quantization procedure. 
This new bracket should be bilinear, antisymmetric, satisfy the Jacobi identity as does the Poisson bracket, reduce to the Poisson bracket for unconstrained systems, and, additionally, ''the bracket of any constraint with any other quantity must vanish''. + +At this point, the second class constraints will be labeled <math>\tilde{\phi}_a</math>. Define a matrix with entries +:<math> +M_{ab} = \{\tilde{\phi}_a,\tilde{\phi}_b\}_{PB}. +</math> +In this case, the Dirac bracket of two functions on phase space, ''f'' and ''g'', is defined as +{{Equation box 1 +|indent =: +|equation = <math> +\{f, g\}_{DB} = \{f, g\}_{PB} - \sum_{a, b}\{f,\tilde{\phi}_a\}_{PB} M^{-1}_{ab}\{\tilde{\phi}_b,g\}_{PB} ~, +</math> +|cellpadding= 6 +|border +|border colour = #0073CF +|background colour=#F9FFF7}} +where <math>M^{-1}_{ab}</math> denotes the {{math|''ab''}} entry of {{mvar|M}} 's inverse matrix. Dirac proved that {{mvar|M}} will always be invertible. + +It is straightforward to check that the above definition of the Dirac bracket satisfies all of the desired properties, and especially the last one, of vanishing for an argument which is a constraint. When using [[canonical quantization]] with a constrained Hamiltonian system, the commutator of the operators is supplanted by {{math|''iħ''}} times their classical ''Dirac bracket''. Since the Dirac bracket respects the constraints, one need not be careful about evaluating all brackets before using any weak equations, as is the case with the Poisson bracket. + +Note that while the Poisson bracket of bosonic (Grassmann even) variables with itself must vanish, the Poisson bracket of fermions represented as a [[Grassmann number|Grassmann variables]] with itself need not vanish. This means that in the fermionic case it ''is'' possible for there to be an odd number of second class constraints. + +== Illustration on the example provided== + +Returning to the above example, the naive Hamiltonian and the two primary constraints are +:<math> +H = V(x, y) +</math> +:<math> +\phi_1 = p_x + \tfrac{q B}{2c} y,\qquad \phi_2 = p_y - \tfrac{q B}{2 c} x. +</math> +Therefore the extended Hamiltonian can be written +:<math> +H^* = V(x, y) + u_1 \left(p_x + \tfrac{q B}{2c}y\right) + u_2 \left(p_y - \tfrac{q B}{2c}x\right). +</math> +The next step is to apply the consistency conditions <math>\{\phi_j, H^*\}_{PB} \approx 0</math>, which in this case become +:<math> +\{\phi_1, H\}_{PB}+\sum_j u_j\{\phi_1, \phi_j\}_{PB} = -\frac{\partial V}{\partial x} + u_2 \frac{q B}{c} \approx 0 +</math> +:<math> +\{\phi_2, H\}_{PB}+\sum_j u_j\{\phi_2, \phi_j\}_{PB} = -\frac{\partial V}{\partial y} - u_1 \frac{q B}{c} \approx 0. +</math> +These are ''not'' secondary constraints, but conditions that fix <math>u_1</math> and <math>u_2</math>. Therefore, there are no secondary constraints and the arbitrary coefficients are completely determined, indicating that there are no unphysical degrees of freedom. + +If one plugs in with the values of <math>u_1</math> and <math>u_2</math>, then one can see that the equations of motion are +:<math> +\dot{x} = \{x, H\}_{PB} + u_1\{x, \phi_1\}_{PB} + u_2 \{x, \phi_2\} = -\frac{c}{q B} \frac{\partial V}{\partial y} +</math> +:<math> +\dot{y} = \frac{c}{q B} \frac{\partial V}{\partial x} +</math> +:<math> +\dot{p}_x = -\frac{1}{2}\frac{\partial V}{\partial x} +</math> +:<math> +\dot{p}_y = -\frac{1}{2}\frac{\partial V}{\partial y}, +</math> +which are self-consistent and coincide with the Lagrangian equations of motion. 
+ +A simple calculation confirms that <math>\phi_1</math> and <math>\phi_2</math> are second class constraints since +:<math> +\{\phi_1, \phi_2\}_{PB} = - \{\phi_2, \phi_1\}_{PB} = \frac{q B}{c}, +</math> +hence the matrix looks like +:<math> +M = \frac{q B}{c} +\left(\begin{matrix} + 0 & 1\\ +-1 & 0 +\end{matrix}\right), +</math> +which is easily inverted to +:<math> +M^{-1} = \frac{c}{q B} +\left(\begin{matrix} + 0 & -1\\ + 1 & 0 +\end{matrix}\right) \quad\Rightarrow\quad M^{-1}_{ab} = -\frac{c}{q B_0} \epsilon_{ab}, +</math> +where <math>\epsilon_{ab}</math> is the [[Levi-Civita symbol]]. Thus, the Dirac brackets are defined to be +:<math> +\{f, g\}_{DB} = \{f, g\}_{PB} + \frac{c\epsilon_{ab}}{q B} \{f, \phi_a\}_{PB}\{\phi_b, g\}_{PB}. +</math> +If one always uses the Dirac bracket instead of the Poisson bracket then there is no issue about the order of applying constraints and evaluating expressions, since the Dirac bracket of anything weakly zero is strongly equal to zero. This means that one can just use the naive Hamiltonian with Dirac brackets, and get the correct equations of motion, which one can easily confirm. + +To quantize the system, the Dirac brackets between all of the phase space variables are needed. The nonvanishing Dirac brackets for this system are +:<math> +\{x, y\}_{DB} = -\tfrac{c}{q B} +</math> +:<math> +\{x, p_x\}_{DB} = \{y, p_y\}_{DB} = \frac{1}{2} +</math> +while the cross-terms vanish, and +:<math> +\{p_x, p_y\}_{DB} = - \tfrac{q B}{4c}. +</math> + +Therefore, the correct implementation of [[canonical quantization]] dictates the commutation relations, +:<math> +[\hat{x}, \hat{y}] = -i\tfrac{\hbar c}{q B} +</math> +:<math> +[\hat{x}, \hat{p}_x] = [\hat{y}, \hat{p}_y] = i\frac{\hbar}{2} +</math> +with the cross terms vanishing, and +:<math> +[\hat{p}_x, \hat{p}_y] = -i\tfrac{\hbar q B}{4c}~. +</math> + +Interestingly, this example has a nonvanishing commutator between <math>\hat{x}</math> and <math>\hat{y}</math>, which means this structure specifies a [[noncommutative geometry]]. (Since the two coordinates do not commute, there will be an [[uncertainty principle]] for the ''x'' and ''y'' positions.) + +Similarly, for free motion on a hypersphere ''S''<sup>n</sup>, the ''n''+1 coordinates are constrained <math> x_i x^i=1</math>. From a plain kinetic Lagrangian, it is evident that their momenta are perpendicular to them, <math>x_i p^i=0</math>. Thus the corresponding Dirac Brackets are likewise simple to work out,<ref>{{cite doi|10.1016/0370-2693(79)90465-9|noedit}}</ref> +:<math> +\{x_i, x_j\}_{DB} = 0, +</math> +:<math> +\{x_i, p_j\}_{DB} = \delta_{ij} -x_i x_j ,</math> +:<math> +\{p_i, p_j\}_{DB} = x_j p_i - x_i p_j ~. +</math> +The 2(''n''+1) constrained phase-space variables (''x''<sub>i</sub>, ''p''<sub>i</sub>) ''obey much simpler Dirac brackets'' than the 2''n'' unconstrained variables, had one eliminated one of the ''x''s and one of the ''p''s through the two constraints ab initio, would obey plain Poisson brackets. The Dirac brackets add simplicity and elegance, at the cost of excessive (constrained) phase-space variables. 
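+
+The bracket table above can be re-derived mechanically. The following sketch (assuming SymPy; the ''poisson'' and ''dirac'' helpers are ad hoc, not library functions) builds the matrix ''M'' from the two constraints and evaluates the Dirac brackets:
+
+<syntaxhighlight lang="python">
+import sympy as sp
+
+x, y, px, py, q, B, c = sp.symbols('x y p_x p_y q B c', real=True)
+coords, momenta = [x, y], [px, py]
+
+def poisson(f, g):
+    """Canonical Poisson bracket in the variables (x, y, p_x, p_y)."""
+    return sum(sp.diff(f, qi)*sp.diff(g, pi) - sp.diff(f, pi)*sp.diff(g, qi)
+               for qi, pi in zip(coords, momenta))
+
+phi = [px + q*B/(2*c)*y, py - q*B/(2*c)*x]        # the two second class constraints
+M = sp.Matrix(2, 2, lambda a, b: poisson(phi[a], phi[b]))
+Minv = M.inv()
+
+def dirac(f, g):
+    corr = sum(poisson(f, phi[a])*Minv[a, b]*poisson(phi[b], g)
+               for a in range(2) for b in range(2))
+    return sp.simplify(poisson(f, g) - corr)
+
+print(dirac(x, y), dirac(x, px), dirac(px, py))   # -> -c/(B*q), 1/2, -B*q/(4*c)
+</syntaxhighlight>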
+
+== References ==
+{{reflist}}
+
+== See also ==
+* [[Canonical quantization]]
+* [[Hamiltonian mechanics]]
+* [[Poisson bracket]]
+* [[First class constraint]]
+* [[Second class constraints]]
+* [[Lagrangian]]
+* [[Symplectic structure]]
+* [[Overcompleteness]]
+
+{{DEFAULTSORT:Dirac Bracket}}
+[[Category:Mathematical quantization]]
+[[Category:Symplectic geometry]]
+[[Category:Hamiltonian mechanics]]
+[[Category:Theoretical physics]]
+ bhjq79j6jvop52884rd1jxbl6zzkza8
+ 
+ 
+ 
+ Pickands–Balkema–de Haan theorem
+ 0
+ 26814
+ 
+ 26815
+ 2013-09-17T12:16:56Z
+ 
+ Yobot
+ 0
+ 
+ 
+ [[WP:CHECKWIKI]] error fixes / special characters in pagetitle using [[Project:AWB|AWB]] (9485)
+ wikitext
+ text/x-wiki
+ The '''Pickands–Balkema–de Haan theorem''' is often called the second theorem in [[extreme value theory]]. It gives the asymptotic [[tail distribution]] of a [[random variable]]&nbsp;''X'', when the true distribution ''F'' of ''X'' is unknown. Unlike the first theorem (the [[Fisher–Tippett–Gnedenko theorem]]) in extreme value theory, the interest here is in the values above a threshold.
+
+==Conditional excess distribution function==
+If we consider an unknown distribution function <math>F</math> of a random variable <math>X</math>, we are interested in estimating the conditional distribution function <math>F_u</math> of the variable <math>X</math> above a certain threshold <math>u</math>. This is the so-called conditional excess distribution function, defined as
+
+: <math>F_u(y) = P(X-u \leq y | X>u) = \frac{F(u+y)-F(u)}{1-F(u)} \, </math>
+
+for <math>0 \leq y \leq x_F-u</math>, where <math>x_F</math> is either the finite or infinite right endpoint of the underlying distribution <math>F</math>. The function <math>F_u</math> describes the distribution of the excess value over a threshold <math>u</math>, given that the threshold is exceeded.
+
+==Statement==
+Let <math>(X_1,X_2,\ldots)</math> be a sequence of [[independent and identically-distributed random variables]], and let <math>F_u</math> be their conditional excess distribution function. Pickands (1975) and Balkema and de Haan (1974) showed that for a large class of underlying distribution functions <math>F</math>, and large <math>u</math>, <math>F_u</math> is well approximated by the [[generalized Pareto distribution]]. That is:
+
+: <math>F_u(y) \rightarrow G_{k, \sigma} (y),\text{ as }u \rightarrow \infty</math>
+
+where
+*<math>G_{k, \sigma} (y)= 1-(1+ky/\sigma)^{-1/k} </math>, if <math>k \neq 0</math>
+*<math>G_{k, \sigma} (y)= 1-e^{-y/\sigma} </math>, if <math>k = 0.</math>
+
+Here ''&sigma;''&nbsp;>&nbsp;0, and ''y''&nbsp;≥&nbsp;0 when ''k''&nbsp;≥&nbsp;0 and 0&nbsp;≤&nbsp;''y''&nbsp;≤&nbsp;&minus;''&sigma;''/''k'' when ''k''&nbsp;<&nbsp;0.
+
+==Special cases of generalized Pareto distribution==
+* [[Exponential distribution]] with [[expected value|mean]] <math>\sigma</math>, if ''k''&nbsp;=&nbsp;0.
+* [[Uniform distribution (continuous)|Uniform distribution]] on <math>[0,\sigma]</math>, if ''k''&nbsp;=&nbsp;&minus;1.
+* [[Pareto distribution]], if ''k''&nbsp;>&nbsp;0.
+
+{{primary sources|date=July 2012}}
+
+==References==
+* Balkema, A., and [[Laurens de Haan|de Haan, L.]] (1974). "Residual life time at great age", ''Annals of Probability'', '''2''', 792–804.
+* Pickands, J. (1975). "Statistical inference using extreme order statistics", ''Annals of Statistics'', '''3''', 119–131.
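+
+A quick numerical illustration of the theorem (a sketch only; the Pareto sample, the threshold and the use of NumPy/SciPy are arbitrary choices): excesses of a heavy-tailed sample over a high threshold should be well fitted by a generalized Pareto distribution with the matching shape parameter.
+
+<syntaxhighlight lang="python">
+import numpy as np
+from scipy import stats
+
+rng = np.random.default_rng(0)
+sample = stats.pareto(b=3).rvs(size=200_000, random_state=rng)  # tail index 3, so k ~ 1/3
+u = np.quantile(sample, 0.99)                                   # high threshold
+excesses = sample[sample > u] - u
+
+k_hat, _, sigma_hat = stats.genpareto.fit(excesses, floc=0.0)   # fit G_{k, sigma}
+print(k_hat, sigma_hat)                                         # k_hat is close to 1/3
+</syntaxhighlight>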
+ +{{DEFAULTSORT:Pickands-Balkema-de Haan theorem}} +[[Category:Probability theorems]] +[[Category:Extreme value data]] +[[Category:Tails of probability distributions]] + bfcf6w66wg8mgel38vx0er0u4ydijqg + + + + Quantum channel + 0 + 7259 + + 7260 + 2014-01-30T00:21:10Z + + David Eppstein + 0 + + + /* Separable channel */ fix wikilink + wikitext + text/x-wiki + In [[quantum information theory]], a '''quantum channel''' is a communication channel which can transmit [[quantum information]], as well as classical information. An example of quantum information is the state of a [[qubit]]. An example of classical information is a text document transmitted over the [[internet]]. + +More formally, quantum channels are [[completely positive]], trace preserving maps between spaces of operators. In other words, a quantum channel is just a [[quantum operation]] viewed not merely as the [[reduced dynamics]] of a system but as a pipeline intended to carry quantum information. + +== Memoryless quantum channel == + +We will assume for the moment that all state spaces of the systems considered, classical or quantum, are finite dimensional. + +The '''memory-less''' in the section title carries the same meaning as in classical [[information theory]]: the output of a channel at a given time depends only upon the corresponding input and not any previous ones. + +=== Schrödinger picture === + +Consider quantum channels that transmit only quantum information. This is precisely a [[quantum operation]], whose properties we now summarize. + +Let <math>H_A</math> and <math>H_B</math> be the state spaces (finite-dimensional [[Hilbert space]]s) of the sending and receiving ends, respectively, of a channel. <math>L(H_A)</math> will denote the family of operators on <math>H_A</math>. In the [[Schrödinger picture]], a purely quantum channel is a map Φ between [[density matrix|density matrices]] acting on <math>H_A</math> and <math>H_B</math> with the following properties: + +#As required by postulates of quantum mechanics, Φ needs to be linear. +#Since density matrices are positive, Φ must preserve the [[cone (linear algebra)|cone]] of positive elements. In other words, Φ is a [[Choi's theorem on completely positive maps|positive map]]. +#If an [[ancilla (quantum computing)|ancilla]] of arbitrary finite dimension ''n'' is coupled to the system, then the induced map <math>I_n \otimes \Phi</math>, where ''I<sub>n</sub>'' is the identity map on the ancilla, must also be positive. Therefore it is required that <math>I_n \otimes \Phi</math> is positive for all ''n''. Such maps are called [[completely positive]]. +#Density matrices are specified to have trace 1, so Φ has to preserve the trace. + +The adjectives '''completely positive and trace preserving''' used to describe a map are sometimes abbreviated '''CPTP'''. In the literature, sometimes the fourth property is weakened so that Φ is only required to be not trace-increasing. In this article, it will be assumed that all channels are CPTP. + +=== Heisenberg picture === + +Density matrices acting on ''H<sub>A</sub>'' only constitute a proper subset of the operators on ''H<sub>A</sub>'' and same can be said for system ''B''. However, once a linear map Φ between the density matrices is specified, a standard linearity argument, together with the finite dimensional assumption, allow us to extend Φ uniquely to the full space of operators. 
This leads to the adjoint map Φ<sup>*</sup>, which describes the action of Φ in the [[Heisenberg picture]]:
+
+The spaces of operators ''L(H<sub>A</sub>)'' and ''L(H<sub>B</sub>)'' are Hilbert spaces with the [[Hilbert-Schmidt operator|Hilbert-Schmidt]] inner product. Therefore, viewing <math>\Phi : L(H_A) \rightarrow L(H_B)</math> as a map between Hilbert spaces, we obtain its adjoint Φ<sup>*</sup> given by
+
+:<math>\langle A , \Phi(\rho) \rangle = \langle \Phi^*(A) , \rho \rangle .</math>
+
+While Φ takes states on ''A'' to those on ''B'', Φ<sup>*</sup> maps observables on system ''B'' to observables on ''A''. This relationship is the same as that between the Schrödinger and Heisenberg descriptions of dynamics. The measurement statistics remain unchanged whether the observables are considered fixed while the states undergo the operation, or vice versa.
+
+It can be directly checked that if Φ is assumed to be trace preserving, Φ<sup>*</sup> is [[unital map|unital]], that is, Φ<sup>*</sup>''(I) = I''. Physically speaking, this means that, in the Heisenberg picture, the trivial observable remains trivial after applying the channel.
+
+=== Classical information ===
+
+So far we have only defined a quantum channel that transmits quantum information. As stated in the introduction, the input and output of a channel can include classical information as well. To describe this, the formulation given so far needs to be generalized somewhat. A purely quantum channel, in the Heisenberg picture, is a linear map Ψ between spaces of operators:
+
+:<math>\Psi : L(H_B) \rightarrow L(H_A)</math>
+
+that is unital and completely positive ('''CP'''). The operator spaces can be viewed as finite dimensional
+[[C*-algebra]]s. Therefore we can say a channel is a unital CP map between C*-algebras:
+
+:<math>\Psi : \mathcal{B} \rightarrow \mathcal{A}.</math>
+
+Classical information can then be included in this formulation. The observables of a classical system can be
+assumed to be a commutative C*-algebra, i.e. the space of continuous functions ''C(X)'' on some set ''X''. We assume ''X'' is finite so ''C(X)'' can be identified with the n-dimensional Euclidean space <math>\mathbb{R}^n</math> with entry-wise multiplication.
+
+Therefore, in the Heisenberg picture, if the classical information is part of, say, the input, we would define <math>\mathcal{B}</math> to include the relevant classical observables. An example of this would be a channel
+
+:<math>\Psi : L(H_B) \otimes C(X) \rightarrow L(H_A).</math>
+
+Notice <math>L(H_B) \otimes C(X)</math> is still a C*-algebra. An element ''a'' of a C*-algebra <math>\mathcal{A}</math> is called positive if ''a'' = ''x*x'' for some ''x''. Positivity of a map is defined accordingly. This characterization is not universally accepted; the [[quantum instrument]] is sometimes given as the generalized mathematical framework for conveying both quantum and classical information. In axiomatizations of quantum mechanics, the classical information is carried in a [[Frobenius algebra]] or [[Frobenius category]].
+
+== Examples ==
+
+=== States ===
+
+A state, viewed as a mapping from observables to their expectation values, is an immediate example of a channel.
+
+=== Time evolution ===
+
+For a purely quantum system, the time evolution at a certain time ''t'' is given by
+
+:<math>\rho \rightarrow U \rho \;U^*,</math>
+
+where <math>U = e^{-iH(t)/\hbar}</math> and ''H(t)'' is the [[Hamiltonian (quantum mechanics)|Hamiltonian]] at time ''t''.
Clearly this gives a CPTP map in the Schrödinger picture and is therefore a channel. The dual map in the Heisenberg picture is + +:<math>A \rightarrow U^* A U.</math> + +=== Restriction === + +Consider a composite quantum system with state space <math>H_A \otimes H_B.</math> For a state + +:<math>\rho \in H_A \otimes H_B,</math> + +the reduced state of ''ρ'' on system ''A'', ''ρ<sup>A</sup>'', is obtained by taking the [[partial trace]] of ''ρ'' with respect to the ''B'' system: + +:<math> \rho ^A = \operatorname{Tr}_B \; \rho.</math> + +The partial trace operation is a CPTP map, therefore a quantum channel in the Schrödinger picture. In the Heisenberg picture, the dual map of this channel is + +:<math> A \rightarrow A \otimes I_B,</math> + +where ''A'' is an observable of system ''A''. + +=== Observable === + +An observable associates a numerical value <math>f_i \in \mathbb{C}</math> to a quantum mechanical ''effect'' <math>F_i</math>. <math>F_i</math>'s are assumed to be positive operators acting on appropriate state space and <math>\sum F_i = I</math>. (Such a collection is called a [[POVM]].) In the Heisenberg picture, the corresponding ''observable map'' Ψ maps a classical observable + +:<math>f = \begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix} \in C(X)</math> + +to the quantum mechanical one + +:<math>\; \Psi (f) = \sum_i f_i F_i.</math> + +In other words, one [[Naimark's dilation theorem|integrate ''f'' against the POVM]] to obtain the quantum mechanical observable. It can be easily checked that Ψ is CP and unital. + +The corresponding Schrödinger map Ψ<sup>*</sup> takes density matrices to classical states: + +:<math> +\Psi (\rho) = \begin{bmatrix} \langle F_1, \rho \rangle \\ \vdots \\ \langle F_n, \rho \rangle \end{bmatrix} +</math> + +,where the inner product is the Hilbert-Schmidt inner product. Furthermore, viewing states as normalized [[density matrix#C*-algebraic formulation of states|functionals]], and invoking the [[Riesz representation theorem]], we can put + +:<math> +\Psi (\rho) = \begin{bmatrix} \rho (F_1) \\ \vdots \\ \rho (F_n) \end{bmatrix}. +</math> + +=== Instrument === + +The observable map, in the Schrödinger picture, has a purely classical output algebra and therefore only describe measurement statistics. To take the state change into account as well, we define what is called an [[quantum instrument]]. Let <math>\{ F_1, \cdots, F_n \}</math> be the effects (POVM) associated to an observable. In the Schrödinger picture, an instrument is a map Φ with pure quantum input <math>\rho \in L(H)</math> and with output space <math>C(X) \otimes L(H)</math>: + +:<math> +\Phi (\rho) = \begin{bmatrix} \rho(F_1) \cdot F_1 \\ \vdots \\ \rho(F_n) \cdot F_n \end{bmatrix}. +</math> + +Let + +:<math> +f = \begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix} \in C(X). +</math> + +The dual map in the Heisenberg picture is + +:<math> +\Psi (f \otimes A) = \begin{bmatrix} f_1 \Psi_1(A) \\ \vdots \\ f_n \Psi_n(A)\end{bmatrix} +</math> + +where <math>\Psi_i</math> is defined in the following way: Factor <math>F_i = M_i ^2</math> (this can always be done since elements of a POVM are positive) then <math>\; \Psi_i (A) = M_i A M_i</math>. +We see that Ψ is CP and unital. + +Notice that <math>\Psi (f \otimes I)</math> gives precisely the observable map. The map + +:<math>{\tilde \Psi}(A)= \sum_i \Psi_i (A) = \sum _i M_i A M_i</math> + +describes the overall state change. 
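+
+A concrete finite-dimensional sketch of the observable and instrument maps just described (NumPy assumed; the two-outcome qubit POVM below is an arbitrary example, and ''psd_sqrt'' is an ad hoc helper):
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def psd_sqrt(A):
+    """Positive square root M of a positive semidefinite matrix A = M^2."""
+    w, U = np.linalg.eigh(A)
+    return (U * np.sqrt(np.clip(w, 0, None))) @ U.conj().T
+
+F = [np.diag([0.7, 0.3]), np.diag([0.3, 0.7])]   # effects F_i >= 0 with sum F_i = I
+M = [psd_sqrt(Fi) for Fi in F]                   # factor F_i = M_i^2
+rho = np.array([[0.5, 0.5], [0.5, 0.5]])         # the qubit state |+><+|
+
+probs = [np.trace(Fi @ rho).real for Fi in F]    # observable map: rho -> (rho(F_1), rho(F_2))
+post = [Mi @ rho @ Mi for Mi in M]               # instrument: unnormalized post-measurement states
+print(probs)                                     # [0.5, 0.5]
+print(sum(np.trace(s).real for s in post))       # 1.0: the overall state change preserves the trace
+</syntaxhighlight>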
+ +=== Separable channel === + +A separable channel is an example of [[LOCC|local operation and classical communication (LOCC)]]. Suppose two parties ''A'' and ''B'' wish to communicate in the following manner: ''A'' performs measurement on an observable and communicates the measurement outcome to ''B'' classically. According to the message he receives, ''B'' prepares his (quantum) system in a state that is previously agreed upon by both parties. In the Schrödinger picture, the first part of the channel Φ<sub>1</sub> simply consists of ''A'' making a measurement, i.e. it is the observable map: + +:<math>\; \Phi_1 (\rho) = \begin{bmatrix} \rho(F_1) \\ \vdots \\ \rho(F_n)\end{bmatrix}.</math> + +If, in the event of the ''i''-th measurement outcome, ''B'' prepares his system in state ''R<sub>i</sub>'', the second part of the channel Φ<sub>2</sub> takes the above classical state to the density matrix + +:<math> +\Phi_2 (\begin{bmatrix} \rho(F_1) \\ \vdots \\ \rho(F_n)\end{bmatrix}) = \sum _i \rho (F_i) R_i. +</math> + +The total operation is the composition + +:<math>\Phi (\rho)= \Phi_2 \circ \Phi_1 (\rho) = \sum _i \rho (F_i) R_i.</math> + +Channels of this form are called ''separable'' or in [[Alexander Holevo|Holevo]] form. + +In the Heisenberg picture, the dual map <math>\Phi^* = \Phi_1^* \circ \Phi_2 ^*</math> is defined by + +:<math>\; \Phi^* (A) = \sum_i R_i(A) F_i.</math> + +A separable channel can not be the identity map. This is precisely the statement of the [[no teleportation theorem]], which says classical teleportation (not to be confused with [[quantum teleportation|entanglement-assisted teleportation]]) is impossible. In other words, a quantum state can not be measured reliably. + +In the [[channel-state duality]], a channel is separable if and only if the corresponding state is [[separable state|separable]]. Several other characterizations of separable channels are known, notably that a channel is separable if and only if it is entanglement-breaking. + +=== Pure channel === + +Consider the case of a purely quantum channel Ψ in the Heisenberg picture. With the assumption that everything is finite dimensional, Ψ is a unital CP map between spaces of matrices + +:<math>\Psi : \mathbb{C}^{n \times n} \rightarrow \mathbb{C}^{m \times m}.</math> + +By [[Choi's theorem on completely positive maps]], Ψ must take the form + +:<math>\Psi (A) = \sum_{i = 1}^N K_i A K_i^*</math> + +where ''N'' ≤ ''nm''. The matrices ''K''<sub>i</sub> are called '''[[Kraus operator]]s''' of Ψ (after the German physicist [[Karl Kraus (physicist)|Karl Kraus]], who introduced them). The minimum number of Kraus operators is call the Kraus rank of Ψ. A channel with Kraus rank 1 is called '''pure'''. The time evolution is one example of a pure channel. This terminology again comes from the channel-state duality. A channel is pure if and only if its dual state is a pure state. Since this duality preserves the extremal points, the extremal points in the convex set of channels are precisely the pure channels. + +=== Teleportation === + +In [[quantum teleportation]], a sender wishes to transmit an arbitrary quantum state of a particle to a possibly distant receiver. Consequently, the teleportation process is a quantum channel. The apparatus for the process itself requires a quantum channel for the transmission of one particle of an entangled-state to the receiver. Teleportation occurs by a joint measurement of the send particle and the remaining entangled particle. 
This measurement results in classical information which must be sent to the receiver to complete the teleportation. Importantly, the classical information can be sent after the quantum channel has ceased to exist. + +== In the experimental setting == + +Experimentally, a simple implementation of a quantum channel is [[fiber optic]] (or free-space for that matter) transmission of single [[photon]]s. Single photons can be transmitted up to 100&nbsp;km in standard fiber optics before losses dominate. The photon's time-of-arrival (''time-bin entanglement'') or [[Polarization (waves)|polarization]] are used as a basis to encode quantum information for purposes such as [[quantum cryptography]]. The channel is capable of transmitting not only basis states (e.g. |0>, |1>) but also superpositions of them (e.g. |0>+|1>). The [[quantum coherence|coherence]] of the state is maintained during transmission through the channel. Contrast this with the transmission of electrical pulses through wires (a classical channel), where only classical information (e.g. 0s and 1s) can be sent. + +== Channel capacity == + +=== The cb-norm of a channel === + +Before giving the definition of channel capacity, the preliminary notion of the '''norm of complete boundedness''', or '''cb-norm''' of a channel needs to be discussed. When considering the capacity of a channel Φ, we need to compare it with an "ideal channel" Λ. For instance, when the input and output algebras are identical, we can choose Λ to be the identity map. Such a comparison requires a [[metric (mathematics)|metric]] between channels. +Since a channel can be viewed as a linear operator, it is tempting to use the natural [[operator norm]]. In other words, the closenss of Φ to the ideal channel Λ can be defined by + +:<math>\| \Phi - \Lambda \| = \sup \{ \| (\Phi - \Lambda)(A)\| \;|\; \|A\| \leq 1 \}.</math> + +However, the operator norm may increase when we tensor Φ with the identity map on some ancilla. + +To make the operator norm even a more undesirable candidate, the quantity + +:<math>\| \Phi \otimes I_n \|</math> + +may increase without bound as <math>n \rightarrow \infty.</math> The solution is to introduce, for any linear map Φ between C*-algebras, the cb-norm + +:<math>\| \Phi \|_{cb} = \sup _n \| \Phi \otimes I_n \|.</math> + +=== Definition of Channel Capacity === + +We remind the reader that the mathematical model of a channel used here is same as the [[channel capacity|classical one]]. + +Let <math>\Psi :\mathcal{B}_1 \rightarrow \mathcal{A}_1</math> be a channel in the Heisenberg picture and <math>\Psi_{id} : \mathcal{B}_2 \rightarrow \mathcal{A}_2</math> be a chosen ideal channel. To make the comparison possible, one needs to encode and decode Φ via appropriate devices, i.e. we consider the composition + +:<math>{\hat \Psi} = D \circ \Phi \circ E : \mathcal{B}_2 \rightarrow \mathcal{A}_2 </math> + +where ''E'' is an [[encoder]] and ''D'' is a [[decoder]]. In this context, ''E'' and ''D'' are unital CP maps with appropriate domains. The quantity of interest is the ''best case scenario'': + +:<math>\Delta ({\hat \Psi}, \Psi_{id}) = \inf_{E,D} \| {\hat \Psi} - \Psi_{id} \|_{cb}</math> + +with the infimum being taken over all possible encoders and decoders. 
+
+To transmit words of length ''n'', the ideal channel is to be applied ''n'' times, so we consider the tensor power
+
+:<math>\Psi_{id}^{\otimes n} = \Psi_{id} \otimes \cdots \otimes \Psi_{id}.</math>
+
+The <math>\otimes</math> operation describes ''n'' inputs undergoing the operation <math>\Psi_{id}</math> independently and is the quantum mechanical counterpart of [[concatenation]]. Similarly, ''m'' invocations of the channel correspond to <math>{\hat \Psi} ^{\otimes m}</math>.
+
+The quantity
+
+:<math>\Delta ( {\hat \Psi}^{\otimes m}, \Psi_{id}^{\otimes n} )</math>
+
+is therefore a measure of the ability of the channel to transmit words of length ''n'' faithfully by being invoked ''m'' times.
+
+This leads to the following definition:
+
+:A non-negative real number ''r'' is an '''achievable rate of <math>\Psi</math> with respect to <math>\Psi_{id}</math>''' if
+
+:for all sequences <math>\{ n_{\alpha} \}, \{ m_{\alpha} \} \subset \mathbb{N}</math> where <math>m_{\alpha}\rightarrow \infty</math> and <math>\lim \sup _{\alpha} (n_{\alpha}/m_{\alpha}) < r</math>, we have
+
+:<math>\lim_{\alpha} \Delta ( {\hat \Psi}^{\otimes m_{\alpha}}, \Psi_{id}^{\otimes n_{\alpha}} ) = 0.</math>
+
+A sequence <math>\{ n_{\alpha} \}</math> can be viewed as representing a message consisting of a possibly infinite number of words. The limit supremum condition in the definition says that, in the limit, faithful transmission can be achieved by invoking the channel no more than ''r'' times the length of a word. One can also say that ''r'' is the number of letters per invocation of the channel that can be sent without error.
+
+The '''channel capacity of <math>\Psi</math> with respect to <math>\Psi_{id}</math>''', denoted by <math>\;C(\Psi, \Psi_{id})</math>, is the supremum of all achievable rates.
+
+From the definition, it is vacuously true that 0 is an achievable rate for any channel.
+
+=== Important examples ===
+
+As stated before, for a system with observable algebra <math>\mathcal{B}</math>, the ideal channel <math>\Psi_{id}</math> is by definition the identity map <math>I_{\mathcal{B}}</math>. Thus, for a purely ''n''-dimensional quantum system, the ideal channel is the identity map on the space of ''n'' × ''n'' matrices <math>\mathbb{C}^{n \times n}</math>. As a slight abuse of notation, this ideal quantum channel will also be denoted by <math>\mathbb{C}^{n \times n}</math>. Similarly, a classical system with output algebra <math>\mathbb{C}^m</math> will have an ideal channel denoted by the same symbol. We can now state some fundamental channel capacities.
+
+The channel capacity of the classical ideal channel <math>\mathbb{C}^m</math> with respect to a quantum ideal channel <math>\mathbb{C}^{n \times n}</math> is
+
+:<math>C(\mathbb{C}^m, \mathbb{C}^{n \times n}) = 0.</math>
+
+This is equivalent to the no-teleportation theorem: it is impossible to transmit quantum information via a classical channel.
+
+Moreover, the following equalities hold:
+
+:<math>
+C(\mathbb{C}^m, \mathbb{C}^n) = C(\mathbb{C}^{m \times m}, \mathbb{C}^{n \times n})
+= C( \mathbb{C}^{m \times m}, \mathbb{C}^{n} ) = \frac{\log n}{\log m}.
+</math>
+
+The above says, for instance, that an ideal quantum channel is no more efficient at transmitting classical information than an ideal classical channel. When ''n'' = ''m'', the best one can achieve is ''one bit per qubit''.
+
+'''Remark''' Both of the above bounds on capacities can be broken with the aid of [[quantum entanglement|entanglement]].
The [[quantum teleportation|entanglement-assisted teleportation scheme]] allows one to transmit quantum information using a classical channel. [[Superdense coding]]. achieves ''two bit per qubit''. These results indicate the significant role played by entanglement in quantum communication. + +=== Classical and quantum channel capacities === + +Using the same notation as the previous subsection, the '''classical capacity''' of a channel Ψ is + +:<math>C(\Psi, \mathbb{C}^2)</math> + +, that is, it is the capacity of Ψ with respect to the ideal channel on the classical one-bit system <math>\mathbb{C}^2</math>. + +Similarly the '''quantum capacity''' of Ψ is + +:<math>C(\Psi, \mathbb{C}^{2 \times 2})</math> + +, where the reference system is now the one qubit system <math>\mathbb{C}^{2 \times 2}</math>. + +== Channel fidelity == + +Another measure of how well a quantum channel preserves information is called '''channel fidelity''', and it arises from [[fidelity of quantum states]]. + +{{Expand section|date=June 2008}} + +== Quantum channel with memory == +{{Empty section|date=June 2008}} + +==See also== +* [[No-communication theorem]] +* [[Amplitude damping channel]] + +== References == + +* M. Keyl and R.F. Werner, ''How to Correct Small Quantum Errors'', Lecture Notes in Physics Volume 611, Springer, 2002. +* Mark M. Wilde, [http://arxiv.org/abs/1106.1445 "From Classical to Quantum Shannon Theory", arXiv:1106.1445]. + +{{Quantum computing}} + +{{DEFAULTSORT:Quantum Channel}} +[[Category:Quantum information theory]] + jrtuwera8d4lsm5rtkv1ij2wsgkktzi + + + + Ehrenfest model + 0 + 22754 + + 22755 + 2013-08-29T19:48:05Z + + 190.139.221.106 + + added link to Tatyana Afanasyeva article + wikitext + text/x-wiki + The '''Ehrenfest model''' (or '''dog-flea model'''<ref>{{cite doi|10.1119/1.1632488}}</ref>) of [[diffusion]] was proposed by [[Tatyana Afanasyeva|Tatiana]] and [[Paul Ehrenfest]] to explain the [[second law of thermodynamics]]. The model considers ''N'' particles in two containers. Particles independently change container at a rate&nbsp;''λ''. If ''X''(t)&nbsp;=&nbsp;''i'' is defined to be the number of particles in one container at time t, then it is a [[birth-death process]] with [[Continuous-time Markov process#Mathematical definitions|transition rates]] + +* <math>q_{i, i-1} = i\, \lambda</math> for ''i'' = 1, 2, ..., ''N'' +* <math>q_{i, i+1} = (N-i\,) \lambda</math> for ''i'' = 0, 1, ..., ''N'' – 1 + +and equilibrium distribution <math>\pi_i = 2^{-N} \tbinom Ni</math>. + +[[Mark Kac]] proved in 1947 that if the initial system state is not equilibrium, then the [[Entropy (information theory)|entropy]], given by + +:<math>H(t) = -\sum_{i} P(X(t)=i) \log \left( \frac{P(X(t)=i)}{\pi_i}\right) , </math> + +is monotonically increasing ([[H-theorem]]). This is a consequence of the convergence to the equilibrium distribution. + +==References== +{{Reflist}} +* [[F.P. Kelly]] Reversibility and Stochastic Networks (Wiley, Chichester, 1979) ISBN 0-471-27601-4 [http://www.statslab.cam.ac.uk/~frank/BOOKS/kelly_book.html] pp. 17–20 +* "Ehrenfest model of diffusion." [[Encyclopædia Britannica]] (2008) +* Paul und Tatjana Ehrenfest. Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem. Physikalishce Zeitschrift, vol. 8 (1907), pp. 311-314. 
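+
+The dynamics described above are simple to evaluate numerically. The following sketch is an illustration only, with ''N'' = 20 particles and λ = 1 chosen arbitrarily: it builds the generator of the birth-death process, evolves an initial state with all particles in one container, and prints ''H''(''t''), which increases towards 0 as the distribution approaches the binomial equilibrium.
+
+<syntaxhighlight lang="python">
+import numpy as np
+from scipy.linalg import expm
+from scipy.special import comb
+
+N, lam = 20, 1.0
+
+# Generator Q of the birth-death process: q_{i,i-1} = i*lam, q_{i,i+1} = (N-i)*lam.
+Q = np.zeros((N + 1, N + 1))
+for i in range(N + 1):
+    if i > 0:
+        Q[i, i - 1] = i * lam
+    if i < N:
+        Q[i, i + 1] = (N - i) * lam
+    Q[i, i] = -Q[i].sum()
+
+pi = np.array([comb(N, i) for i in range(N + 1)]) / 2.0 ** N   # equilibrium distribution
+p0 = np.zeros(N + 1)
+p0[0] = 1.0                                                    # all particles start in one container
+
+def H(p):
+    mask = p > 0
+    return -np.sum(p[mask] * np.log(p[mask] / pi[mask]))
+
+for t in np.linspace(0.0, 2.0, 9):
+    pt = p0 @ expm(Q * t)          # distribution of X(t)
+    print(f"t={t:.2f}  H(t)={H(pt):.4f}")
+</syntaxhighlight>
+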
+ +[[Category:Queueing theory]] +[[Category:Diffusion]] +[[Category:Stochastic processes]] + fgxdnc1qnx8yy6q478kzxoygbzn7thq + + + + Fidelity of quantum states + 0 + 14284 + + 14285 + 2013-11-01T05:00:15Z + + Bgwhite + 0 + + [[WP:CHECKWIKI]] error fix #26. Convert HTML to wikicode. Do [[Wikipedia:GENFIXES|general fixes]] and cleanup if needed. - using [[Project:AWB|AWB]] (9572) + wikitext + text/x-wiki + In [[quantum information theory]], '''fidelity''' is a measure of the "closeness" of two quantum states. It is not a [[metric (mathematics)|metric]] on the space of [[Mixed state (physics)|density matrices]], but it can be used to define the [[Bures metric]] on this space. + +== Motivation == + +In probability theory, given two random variables ''p'' = (''p''<sub>1</sub>...''p<sub>n</sub>'') and ''q'' = (''q''<sub>1</sub>...''q<sub>n</sub>'') on the probability space ''X'' = {1,2...n}. The fidelity of ''p'' and ''q'' is defined to be the quantity + +:<math>F(p,q) = \sum _i \sqrt{p_i q_i}</math>. + +In other words, the fidelity ''F(p,q)'' is the inner product of <math>(\sqrt{p_1}, \cdots ,\sqrt{p_n})</math> and <math>(\sqrt{q_1}, \cdots ,\sqrt{q_n})</math> viewed as vectors in Euclidean space. Notice that ''F(p,q)'' = 1 if and only if ''p'' = ''q''. In general, <math>0 \leq F(p,q) \leq 1</math>. This measure is known as the [[Bhattacharyya coefficient]]. + +Given a classical measure of the distinguishability of two probability distributions, one can motivate a measure of distinguishability of two quantum states as follows. If an experimenter is attempting to determine whether a quantum state is either of two possibilities <math>\rho</math> or <math>\sigma</math>, the most general possible measurement he can make on the state is a [[POVM]], which is described by a set of [[Hermitian operator|Hermitian]] [[Positive-definite function|positive semidefinite]] [[Operator (mathematics)|operators]] <math>\{F_i\} </math>. If the state given to the experimenter is <math>\rho</math>, he will witness outcome <math>i</math> with probability <math>p_i = \mathrm{Tr}[ \rho F_i ]</math>, and likewise with probability <math>q_i = \mathrm{Tr}[ \sigma F_i ]</math> for <math>\sigma</math>. His ability to distinguish between the quantum states <math>\rho</math> and <math>\sigma</math> is then equivalent to his ability to distinguish between the classical probability distributions <math>p</math> and <math>q</math>. Naturally, the experimenter will choose the best POVM he can find, so this motivates defining the quantum fidelity as the [[Bhattacharyya coefficient]] when extremized over all possible POVMs <math>\{F_i\} </math>: + +:<math>F(\rho,\sigma) = \min_{\{F_i\}} F(p,q)</math>. +::::<math>= \min_{\{F_i\}} \sum _i \sqrt{\mathrm{Tr}[ \rho F_i ], \mathrm{Tr}[ \sigma F_i ]}</math>. + +It was shown by Fuchs and Caves that this manifestly symmetric definition is equivalent to the simple asymmetric formula given in the next section.<ref>C. A. Fuchs, C. M. Caves: [http://prl.aps.org/abstract/PRL/v73/i23/p3047_1 Ensemble-Dependent Bounds for Accessible Information in Quantum Mechanics], [[Physical Review Letters]] 73, 3047(1994)</ref> + +== Definition == + +Given two density matrices ''ρ'' and ''σ'', the '''fidelity''' is defined by + +:<math>F(\rho, \sigma) = \operatorname{Tr} \left[\sqrt{\sqrt{\rho} \sigma \sqrt{\rho}}\right].</math> + +By ''M''<sup>&frac12;</sup> of a positive semidefinite matrix ''M'', we mean its unique positive square root given by the [[spectral theorem]]. 
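+
+For illustration, the definition is straightforward to evaluate numerically; a minimal NumPy/SciPy sketch (the two example single-qubit states are arbitrary choices) is:
+
+<syntaxhighlight lang="python">
+import numpy as np
+from scipy.linalg import sqrtm
+
+def fidelity(rho, sigma):
+    """F(rho, sigma) = Tr sqrt( sqrt(rho) sigma sqrt(rho) )."""
+    sr = sqrtm(rho)
+    return np.trace(sqrtm(sr @ sigma @ sr)).real
+
+# Two example states: the pure state |+><+| and a slightly mixed state.
+plus = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)
+mixed = np.array([[0.6, 0.2], [0.2, 0.4]], dtype=complex)
+
+print(fidelity(plus, plus))    # 1.0 up to numerical error
+print(fidelity(plus, mixed))
+</syntaxhighlight>
+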
The Euclidean inner product from the classical definition is replaced by the [[Hilbert-Schmidt operator|Hilbert-Schmidt]] [[inner product]]. When the states are classical, i.e. when ''ρ'' and ''σ'' commute, the definition coincides with that for probability distributions. + +An equivalent definition is given by +:<math>F(\rho, \sigma) = \lVert \sqrt{\rho} \sqrt{\sigma} \rVert_\mathrm{tr},</math> +where the norm is the trace norm (sum of the singular values). This definition has the advantage that it clearly shows that the fidelity is symmetric in its two arguments. + +Notice by definition ''F'' is non-negative, and ''F(ρ,ρ)'' = 1. In the following section it will be shown that it can be no larger than 1. + +In the original 1994 paper of Jozsa the name 'fidelity' was used for the quantity +<math>F\;'=F^2</math> and this convention is often used in the literature. +According to this convention 'fidelity' has a meaning of probability. + +== Simple examples == + +=== Pure states === + +Suppose that one of the states is pure: <math>\rho = | \phi \rangle \langle \phi |</math>. Then <math>\sqrt{\rho} = \rho = | \phi \rangle \langle \phi |</math> and the fidelity is + +:<math> +F(\rho, \sigma) = \operatorname{Tr} \left[\sqrt{ | \phi \rangle \langle \phi | \sigma | \phi \rangle \langle \phi |} \right] += \sqrt{\langle \phi | \sigma | \phi \rangle} \operatorname{Tr} \left[\sqrt{ | \phi \rangle \langle \phi |} \right] += \sqrt{\langle \phi | \sigma | \phi \rangle}. +</math> + +If the other state is also pure, <math>\sigma = | \psi \rangle \langle \psi |</math>, then the fidelity is + +:<math> +F(\rho, \sigma) = \sqrt{\langle \phi | \psi \rangle \langle \psi | \phi \rangle} += | \langle \phi | \psi \rangle |. +</math> + +This is sometimes called the ''overlap'' between two states. If, say, <math>|\phi\rangle</math> is an eigenstate of an observable, and the system is prepared in <math>| \psi \rangle</math>, then ''F(ρ, σ)''<sup>2</sup> is the probability of the system being in state <math>|\phi\rangle</math> after the measurement. + +=== Commuting states === + +Let ρ and σ be two density matrices that commute. Therefore they can be simultaneously diagonalized by unitary matrices, and we can write + +:<math> \rho = \sum_i p_i | i \rangle \langle i |</math> and <math> \sigma = \sum_i q_i | i \rangle \langle i |</math> + +for some orthonormal basis <math>\{ | i \rangle \}</math>. Direct calculation shows the fidelity is + +:<math>F(\rho, \sigma) = \sum_i \sqrt{p_i q_i}.</math> + +This shows that, heuristically, fidelity of quantum states is a genuine extension of the notion from probability theory. + +== Some properties == + +=== Unitary invariance === + +Direct calculation shows that the fidelity is preserved by unitary evolution, i.e. + +:<math>\; F(\rho, \sigma) = F(U \rho \; U^*, U \sigma U^*) </math> + +for any unitary operator ''U''. + +=== Uhlmann's theorem === + +We saw that for two pure states, their fidelity coincides with the overlap. Uhlmann's theorem generalizes this statement to mixed states, in terms of their purifications: + +'''Theorem''' Let ρ and σ be density matrices acting on '''C'''<sup>n</sup>. 
Let ρ<sup>½</sup> be the unique positive square root of ρ and + +:<math> +| \psi _{\rho} \rangle = \sum_{i=1}^n (\rho^{\frac{1}{2}} | e_i \rangle) \otimes | e_i \rangle \in \mathbb{C}^n \otimes \mathbb{C}^n +</math> + +be a [[purification of quantum state|purification]] of ρ (therefore <math>\textstyle \{|e_i\rangle\}</math> is an orthonormal basis), then the following equality holds: + +:<math>F(\rho, \sigma) = \max_{|\psi_{\sigma} \rangle} | \langle \psi _{\rho}| \psi _{\sigma} \rangle |</math> + +where <math>| \psi _{\sigma} \rangle</math> is a purification of σ. Therefore, in general, the fidelity is the maximum overlap between purifications. + +'''Proof:''' +A simple proof can be sketched as follows. Let <math>\textstyle |\Omega\rangle</math> denote the vector + +:<math>| \Omega \rangle= \sum_{i=1}^n | e_i \rangle \otimes | e_i \rangle </math> + +and σ<sup>½</sup> be the unique positive square root of σ. We see that, due to the unitary freedom in square root factorizations and choosing orthonormal bases, an arbitrary purification of σ is of the form + +:<math>| \psi_{\sigma} \rangle = ( \sigma^{\frac{1}{2}} V_1 \otimes V_2 ) | \Omega \rangle </math> + +where ''V''<sub>i</sub>'s are unitary operators. Now we directly calculate + +:<math> +| \langle \psi _{\rho}| \psi _{\sigma} \rangle | += | \langle \Omega | ( \rho^{\frac{1}{2}} \otimes I) ( \sigma^{\frac{1}{2}} V_1 \otimes V_2 ) | \Omega \rangle | += | \operatorname{Tr} ( \rho^{\frac{1}{2}} \sigma^{\frac{1}{2}} V_1 V_2^T )|. +</math> + +But in general, for any square matrix ''A'' and unitary ''U'', it is true that |Tr(''AU'')| &le; Tr (''A''<sup>*</sup>''A'')<sup>&frac12;</sup>. Furthermore, equality is achieved if ''U''<sup>*</sup> is the unitary operator in the [[polar decomposition]] of ''A''. From this follows directly Uhlmann's theorem. + +==== Consequences ==== +Some immediate consequences of Uhlmann's theorem are +* Fidelity is symmetric in its arguments, i.e. ''F'' (ρ,σ) = ''F'' (σ,ρ). Notice this is not obvious from the definition. +* ''F'' (ρ,σ) lies in [0,1], by the [[Cauchy-Schwarz inequality]]. +* ''F'' (ρ,σ) = 1 if and only if ρ = σ, since Ψ<sub>ρ</sub> = Ψ<sub>σ</sub> implies ρ = σ. +So we can see that fidelity behaves almost like a metric. This can be formalised and made useful by defining +:<math> \cos \theta_{\rho\sigma} = F(\rho,\sigma) \,</math> +As the angle between the states <math>\rho</math> and <math>\sigma</math>. It follows from the above properties that <math>\theta_{\rho\sigma}</math> is non-negative, symmetric in its inputs, and is equal to zero if and only if <math>\rho = \sigma</math>. Furthermore, it can be proved that it obeys the triangle inequality,<ref>M. Nielsen, I. Chuang, ''Quantum Computation and Quantum Information'', Cambridge University Press, 2000, 409-416</ref> so this angle is a metric on the state space: the [[Fubini-Study metric]].<ref>K. Życzkowski, I. Bengtsson, ''Geometry of Quantum States'', Cambridge University Press, 2008, 131</ref> + +=== Relationship to Trace Distance === +We can define the [[trace distance]] between two matrices A and B in terms of the [[matrix norm|trace norm]] by + +:<math> +D(A,B) = \frac{1}{2}\| A-B\|_{\rm tr} \, . +</math> + +When A and B are both density operators, this is a quantum generalization of the [[statistical distance]]. This is relevant because the trace distance provides upper and lower bounds on the fidelity as quantified by the ''Fuchs-van de Graaf inequalities'',<ref>C. A. Fuchs and J. 
van de Graaf, "Cryptographic Distinguishability Measures for Quantum Mechanical States," IEEE Trans. Inf. Theory 45, 1216 (1999). arXiv:quant-ph/9712042</ref> + +:<math> +1-F(\rho,\sigma) \le D(\rho,\sigma) \le\sqrt{1-F(\rho,\sigma)^2} \, . +</math> + +Often the trace distance is easier to calculate or bound than the fidelity, so these relationships are quite useful. In the case that at least one of the states is a pure state Ψ, the lower bound can be tightened. + +:<math> +1-F(\psi,\rho)^2 \le D(\psi,\rho) \, . +</math> + +== Fidelity of quantum measurements == + +The '''fidelity of a measurement with a projective measurement''' is defined<ref>Taoufik Amri, Quantum behavior of measurement apparatus, [http://arxiv1.library.cornell.edu/abs/1001.3032 arXiv:1001.3032] (2010).</ref> as the overlap between their [[Quantum tomography|pre-measurement states]]: + +:<math> +\mathcal{F}_{n}\left(\psi_{tar}\right)=\langle\psi_{tar}\vert\hat{\rho}_{retr}^{[n]}\vert\psi_{tar}\rangle, +</math> +where <math>\hat{\rho}_{retr}^{[n]}</math> and <math>\vert\psi_{tar}\rangle</math> are respectively the pre-measurement state corresponding to the result "n" and the target state in which we would like measuring the system before its interaction with the measurement apparatus. + +The [[Quantum tomography|pre-measurement state]] is the main tool of the [[Quantum Retrodiction|retrodictive approach]] of quantum physics in which we make predictions about state preparations leading to a certain measurement result. +In such an approach, this fidelity has an interesting meaning: this is nothing but the retrodictive probability of preparing the system in the target state <math>\vert\psi_{tar}\rangle</math> when we read the result "n". Thus, when a measurement is sufficiently ''faithful'' <math>\mathcal{F}_{n}\left(\psi_{tar}\right)\simeq 1</math>, the most probable state in which the system was prepared before the measurement giving the result "n" is this target state <math>\vert\psi_{tar}\rangle</math>. + +== References == +<references/> + +* A. Uhlmann ''The "Transition Probability" in the State Space of a *-Algebra''. Rep. Math. Phys. 9 (1976) 273 - 279. [http://www.physik.uni-leipzig.de/~uhlmann/PDF/Uh76a.pdf PDF] +* R. Jozsa, ''Fidelity for mixed quantum states'', Journal of Modern Optics, 1994, vol. 41, 2315-2323. +* J. A. Miszczak, Z. Puchała, P. Horodecki, A. Uhlmann, K. Życzkowski, ''Sub-- and super--fidelity as bounds for quantum fidelity'', Quantum Information & Computation, Vol.9 No.1&2 (2009). [http://arxiv.org/abs/0805.2037 arXiv:0805.2037]. + +{{DEFAULTSORT:Fidelity Of Quantum States}} +[[Category:Quantum information science]] + bgg5oerz8qnfzgbm6skj8p8abd1djbl + + + + Yield surface + 0 + 15595 + + 15596 + 2014-01-27T11:22:29Z + + Gaius Cornelius + 0 + + + Fix typo. + wikitext + text/x-wiki + <!--{{continuum mechanics|cTopic=[[Solid mechanics]]}}--> +[[File:YieldSurface.svg|right|300px|thumb|Surfaces on which the invariants <math>I_1</math>, <math>J_2</math>, <math>J_3</math> are constant. Plotted in principal stress space.]] +A '''yield surface''' is a five-dimensional surface in the six-dimensional space of [[Stress (mechanics)|stresses]]. The yield surface is usually [[convex polytope|convex]] and the state of stress of ''inside'' the yield surface is elastic. When the stress state lies on the surface the material is said to have reached its [[Yield (engineering)|yield point]] and the material is said to have become [[Plasticity (physics)|plastic]]. 
Further deformation of the material causes the stress state to remain on the yield surface, even though the shape and size the surface may change as the plastic deformation evolves. This is because stress states that lie outside the yield surface are non-permissible in [[plasticity (physics)|rate-independent plasticity]], though not in some models of [[viscoplasticity]].<ref name=Simo>Simo, J. C. and Hughes, T,. J. R., (1998), '''Computational Inelasticity''', Spinger.</ref> + +The yield surface is usually expressed in terms of (and visualized in) a three-dimensional [[Stress (physics)#Principal_stresses_in_3-D|principal stress]] space (<math> \sigma_1, \sigma_2 , \sigma_3</math>), a two- or three-dimensional space spanned by [[Stress (physics)#Principal_stresses_in_3-D|stress invariants]] (<math> I_1, J_2, J_3</math>) or a version of the three-dimensional [[stress space|Haigh–Westergaard stress space]]. Thus we may write the equation of the yield surface (that is, the yield function) in the forms: + +*<math> f(\sigma_1,\sigma_2,\sigma_3) = 0 \,</math> where <math>\sigma_i</math> are the principal stresses. +*<math> f(I_1, J_2, J_3) = 0 \,</math> where <math>I_1</math> is the first principal invariant of the Cauchy stress and <math>J_2, J_3</math> are the second and third principal invariants of the deviatoric part of the Cauchy stress. +*<math> f(p, q, r) = 0 \,</math> where <math>p, q</math> are scaled versions of <math>I_1</math> and <math>J_2</math> and <math>r</math> is a function of <math>J_2, J_3</math>. +*<math>f(\xi,\rho,\theta) = 0 \,</math> where <math>\xi,\rho</math> are scaled versions of <math>I_1</math> and <math>J_2</math>, and <math>\theta</math> is the '''Lode angle'''. + +== Invariants used to describe yield surfaces == +[[File:YieldSurfacerhoxitheta.svg|right|300px|thumb|Surfaces on which the invariants <math>\xi</math>, <math>\rho</math>, <math>\theta</math> are constant. Plotted in principal stress space.]] +The first principal invariant (<math>I_1</math>) of the [[stress (mechanics)|Cauchy stress]] (<math>\boldsymbol{\sigma}</math>), and the second and third principal invariants (<math>J_2, J_3</math>) of the ''deviatoric'' part (<math>\boldsymbol{s}</math>) of the Cauchy stress are defined as: +:<math> + \begin{align} + I_1 & = \text{Tr}(\boldsymbol{\sigma}) = \sigma_1 + \sigma_2 + \sigma_3 \\ + J_2 & = \tfrac{1}{2} \boldsymbol{s}:\boldsymbol{s} = + \tfrac{1}{6}\left[(\sigma_1-\sigma_2)^2+(\sigma_2-\sigma_3)^2+(\sigma_3-\sigma_1)^2\right] \\ + J_3 & = \det(\boldsymbol{s}) = \tfrac{1}{3} (\boldsymbol{s}\cdot\boldsymbol{s}):\boldsymbol{s} + = s_1 s_2 s_3 + \end{align} + </math> +where (<math> \sigma_1, \sigma_2 , \sigma_3</math>) are the principal values of <math>\boldsymbol{\sigma}</math>, (<math>s_1, s_2, s_3</math>) are the principal values of <math>\boldsymbol{s}</math>, and +:<math> + \boldsymbol{s} = \boldsymbol{\sigma}-\tfrac{I_1}{3}\,\boldsymbol{I} +</math> +where <math>\boldsymbol{I}</math> is the identity matrix. + +A related set of quantities, (<math>p, q, r\,</math>), are usually used to describe yield surfaces for [[cohesive frictional material]]s such as rocks, soils, and ceramics. These are defined as +:<math> + p = \tfrac{1}{3}~I_1 ~:~~ + q = \sqrt{3~J_2} = \sigma_\mathrm{eq} ~;~~ + r = 3\left(\tfrac{1}{2}\,J_3\right)^{1/3} + </math> +where <math>\sigma_\mathrm{eq}</math> is the '''equivalent stress'''. 
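+
+These quantities translate directly into code. The sketch below is illustrative (the numerical stress state is invented) and computes <math>I_1</math>, <math>J_2</math>, <math>J_3</math> and the derived quantities <math>p</math>, <math>q</math>, <math>r</math> for a given Cauchy stress tensor.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def invariants(sigma):
+    """I1, J2, J3 of a 3x3 Cauchy stress tensor and the derived quantities p, q, r."""
+    I1 = np.trace(sigma)
+    s = sigma - (I1 / 3.0) * np.eye(3)     # deviatoric part of the stress
+    J2 = 0.5 * np.tensordot(s, s)          # (1/2) s : s
+    J3 = np.linalg.det(s)
+    p = I1 / 3.0
+    q = np.sqrt(3.0 * J2)                  # equivalent stress
+    r = np.cbrt(13.5 * J3)                 # 3*(J3/2)^(1/3); cbrt returns the real cube root
+    return I1, J2, J3, p, q, r
+
+# Example: a principal stress state (values chosen only for illustration).
+sigma = np.diag([100.0, 40.0, -10.0])
+print(invariants(sigma))
+</syntaxhighlight>
+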
However, the possibility of negative values of <math>J_3</math> and the resulting imaginary <math>r</math> makes the use of these quantities problematic in practice. + +Another related set of widely used invariants is (<math>\xi, \rho, \theta\,</math>) which describe a [[cylindrical coordinate system]] (the '''Haigh–Westergaard''' coordinates). These are defined as: +:<math> + \xi = \tfrac{1}{\sqrt{3}}~I_1 = \sqrt{3}~p ~;~~ + \rho = \sqrt{2 J_2} = \sqrt{\tfrac{2}{3}}~q ~;~~ + \cos(3\theta) = \left(\tfrac{r}{q}\right)^3 = \tfrac{3\sqrt{3}}{2}~\cfrac{J_3}{J_2^{3/2}} + </math> +The <math>\xi-\rho\,</math> plane is also called the '''Rendulic plane'''. The angle <math>\theta</math> is called the '''Lode angle'''<ref>Lode, W. (1926). '' Versuche über den Einfuss der mittleren Hauptspannung auf das Fliessen der Metalle Eisen Kupfer und Nickel''. Zeitung Phys., vol. 36, pp. 913–939.</ref> and the relation between <math>\theta</math> and <math>J_2,J_3</math> was first given by Nayak and Zienkiewicz in 1972 <ref>Nayak, G. C. and Zienkiewicz, O.C. (1972). ''Convenient forms of stress invariants for plasticity''. Proceedings of the ASCE Journal of the Structural Division, vol. 98, no. ST4, pp. 949–954.</ref> + +The principal stresses and the Haigh–Westergaard coordinates are related by +:<math> + \begin{bmatrix} \sigma_1 \\ \sigma_2 \\ \sigma_3 \end{bmatrix} = + \tfrac{1}{\sqrt{3}} \begin{bmatrix} \xi \\ \xi \\ \xi \end{bmatrix} + + \sqrt{\tfrac{2}{3}}~\rho~\begin{bmatrix} \cos\theta \\ \cos\left(\theta-\tfrac{2\pi}{3}\right) \\ \cos\left(\theta+\tfrac{2\pi}{3}\right) \end{bmatrix} + = \tfrac{1}{\sqrt{3}} \begin{bmatrix} \xi \\ \xi \\ \xi \end{bmatrix} + + \sqrt{\tfrac{2}{3}}~\rho~\begin{bmatrix} \cos\theta \\ -\sin\left(\tfrac{\pi}{6}-\theta\right) \\ -\sin\left(\tfrac{\pi}{6}+\theta\right) \end{bmatrix} \,. + </math> +A different definition of the Lode angle can also be found in the literature:<ref name=chak>Chakrabarty, J., 2006, ''Theory of Plasticity: Third edition'', Elsevier, Amsterdam.</ref> +:<math> + \sin(3\theta) = -~\tfrac{3\sqrt{3}}{2}~\cfrac{J_3}{J_2^{3/2}} + </math> +in which case +:<math> + \begin{bmatrix} \sigma_1 \\ \sigma_2 \\ \sigma_3 \end{bmatrix} = + \tfrac{1}{\sqrt{3}} \begin{bmatrix} \xi \\ \xi \\ \xi \end{bmatrix} + + \sqrt{\tfrac{2}{3}}~\rho~\begin{bmatrix} \sin\left(\theta-\tfrac{2\pi}{3}\right) \\ \sin\theta \\ \sin\left(\theta+\tfrac{2\pi}{3}\right) \end{bmatrix} += \tfrac{1}{\sqrt{3}} \begin{bmatrix} \xi \\ \xi \\ \xi \end{bmatrix} + + \sqrt{\tfrac{2}{3}}~\rho~\begin{bmatrix} -\cos\left(\tfrac{\pi}{6}-\theta\right) \\ \sin\theta \\ \cos\left(\tfrac{\pi}{6}+\theta\right) \end{bmatrix} + \,. + </math> +Whatever definition is chosen, the angle <math>\theta</math> varies between 0 degrees to +60 degrees. + +== Examples of yield surfaces == + +There are several different yield surfaces known in engineering, and those most popular are listed below. + +=== Tresca yield surface === +The Tresca yield criterion is taken to be the work of [[Henri Tresca]].<ref>Tresca, H. (1864). ''Mémoire sur l'écoulement des corps solides soumis à de fortes pressions.'' C.R. Acad. Sci. Paris, vol. 59, p. 754.</ref> It is also known as the ''maximum shear stress theory'' (MSST) and the Tresca–Guest (TG) criterion. 
In terms of the principal stresses the Tresca criterion is expressed as +:<math>\tfrac{1}{2}{\max(|\sigma_1 - \sigma_2| , |\sigma_2 - \sigma_3| , |\sigma_3 - \sigma_1| ) = S_{sy} = \tfrac{1}{2}S_y}\!</math> +Where <math>S_{sy}</math> is the yield strength in shear, and <math>S_y</math> is the tensile yield strength. + +Figure 1 shows the Tresca–Guest yield surface in the three-dimensional space of principal stresses. It is a [[Prism (geometry)|prism]] of six sides and having infinite length. This means that the material remains elastic when all three principal stresses are roughly equivalent (a [[hydrostatic pressure]]), no matter how much it is compressed or stretched. However, when one of the principal stresses becomes smaller (or larger) than the others the material is subject to shearing. In such situations, if the shear stress reaches the yield limit then the material enters the plastic domain. Figure 2 shows the Tresca–Guest yield surface in two-dimensional stress space, it is a cross section of the prism along the <math> \sigma_1, \sigma_2</math> plane. + +:{| +[[Image:Tresca Guest Yield Surface 3D.png|left|400px|thumb|Figure 1: View of Tresca–Guest yield surface in 3D space of principal stresses]] +[[Image:Tresca Guest Yield Surface 2D.png|none|200px|thumb|Figure 2: Tresca–Guest yield surface in 2D space (<math> \sigma_1, \sigma_2</math>)]] +|} + +=== von Mises yield surface === +{{main|von Mises yield criterion}} +The von Mises yield criterion is expressed in the principal stresses as +:<math> {(\sigma_1 - \sigma_2)^2 + (\sigma_2 - \sigma_3)^2 + (\sigma_3 - \sigma_1)^2 = 2 {S_y}^2 }\!</math> +where <math>S_y</math> is the yield strength in uniaxial tension. + +Figure 3 shows the von Mises yield surface in the three-dimensional space of principal stresses. It is a circular [[Cylinder (geometry)|cylinder]] of infinite length with its axis inclined at equal angles to the three principal stresses. Figure 4 shows the von Mises yield surface in two-dimensional space compared with Tresca–Guest criterion. A cross section of the von Mises cylinder on the plane of <math> \sigma_1, \sigma_2</math> produces the [[ellipse|elliptical]] shape of the yield surface. + +:{| +[[Image:Mises Yield Surface 3D.png|left|400px|thumb|Figure 3: View of Huber–Mises–Hencky yield surface in 3D space of principal stresses]] +[[Image:Tresca stress 2D.png|none|200px|thumb|Figure 4: Comparison of Tresca–Guest and Huber–Mises–Hencky criteria in 2D space (<math> \sigma_1, \sigma_2</math>)]] +|} + +===Mohr–Coulomb yield surface=== +{{Main|Mohr–Coulomb theory}} +The [[Mohr–Coulomb theory|Mohr–Coulomb yield (failure) criterion]] is similar to the Tresca criterion, with additional provisions for materials with different tensile and compressive yield strengths. This model is often used to model [[concrete]], [[soil]] or [[granular material]]s. The Mohr–Coulomb yield criterion may be expressed as: +:<math> +\frac{m+1}{2}\max \Big(|\sigma_1 - \sigma_2|+K(\sigma_1 + \sigma_2) ~,~~ + |\sigma_1 - \sigma_3|+K(\sigma_1 + \sigma_3) ~,~~ + |\sigma_2 - \sigma_3|+K(\sigma_2 + \sigma_3) \Big) = S_{yc} +</math> +where +:<math> m = \frac {S_{yc}}{S_{yt}}; K = \frac {m-1}{m+1}</math> + +and the parameters <math>S_{yc}</math> and <math>S_{yt}</math> are the yield (failure) stresses of the material in uniaxial compression and tension, respectively. The formula reduces to the Tresca criterion if <math>S_{yc}=S_{yt}</math>. + +Figure 5 shows Mohr–Coulomb yield surface in the three-dimensional space of principal stresses. 
It is a conical prism and <math>K</math> determines the inclination angle of conical surface. Figure 6 shows Mohr–Coulomb yield surface in two-dimensional stress space. It is a cross section of this conical prism on the plane of <math> \sigma_1, \sigma_2</math>. +:{| +[[Image:MH Yield Surface 3D.png|400px|left|thumb|Figure 5: View of Mohr–Coulomb yield surface in 3D space of principal stresses]] +[[Image:MH Surface 2D.png|250px|none|thumb|Figure 6: Mohr–Coulomb yield surface in 2D space (<math> \sigma_1, \sigma_2</math>)]] +|} + +=== Drucker–Prager yield surface=== +{{Main|Drucker Prager yield criterion}} +The [[Drucker Prager|Drucker–Prager yield criterion]] is similar to the von Mises yield criterion, with provisions for handling materials with differing tensile and compressive yield strengths. This criterion is most often used for [[concrete]] where both normal and shear stresses can determine failure. The Drucker–Prager yield criterion may be expressed as +:<math> \bigg(\frac {m-1}{2}\bigg) ( \sigma_1 + \sigma_2 + \sigma_3 ) + \bigg(\frac{m+1}{2}\bigg)\sqrt{\frac{(\sigma_1 - \sigma_2)^2 + (\sigma_2 - \sigma_3)^2 + (\sigma_3 - \sigma_1)^2}{2}} = S_{yc} </math> +where +:<math> m = \frac{S_{yc}}{S_{yt}} </math> +and <math>S_{yc}</math>, <math>S_{yt}</math> are the uniaxial yield stresses in compression and tension respectively. The formula reduces to the von Mises equation if <math>S_{yc}=S_{yt}</math>. + +Figure 7 shows Drucker–Prager yield surface in the three-dimensional space of principal stresses. It is a regular [[cone (geometry)|cone]]. Figure 8 shows Drucker–Prager yield surface in two-dimensional space. The elliptical elastic domain is a cross section of the cone on the plane of <math> \sigma_1, \sigma_2</math>; it can be chosen to intersect the Mohr–Coulomb yield surface in different number of vertices. One choice is to intersect the Mohr–Coulomb yield surface at three vertices on either side of the <math> \sigma_1 = -\sigma_2 </math> line, but usually selected by convention to be those in the compression regime.<ref>Khan and Huang. (1995), Continuum Theory of Plasticity. J.Wiley.</ref> Another choice is to intersect the Mohr–Coulomb yield surface at four vertices on both axes (uniaxial fit) or at two vertices on the diagonal <math> \sigma_1 = \sigma_2 </math> (biaxial fit). <ref>Neto, Periç, Owen. (2008), The mathematical Theory of Plasticity. J.Wiley.</ref> The Drucker-Prager yield criterion is also commonly expressed in terms of the [[Drucker_Prager_yield_criterion#Expressions_in_terms_of_cohesion_and_friction_angle|material cohesion and friction angle]]. + +{| +|- +| [[Image:Drucker Prager Yield Surface 3D.png|400px|left|thumb|Figure 7: View of Drucker–Prager yield surface in 3D space of principal stresses]] || [[Image:Drucker Prager WIKI.png|740px|none|thumb|Figure 8: View of Drucker–Prager yield surface in 2D space of principal stresses]] +|} + +===Bresler–Pister yield surface=== +{{Main|Bresler Pister yield criterion}} +The Bresler–Pister yield criterion is an extension of the [[Drucker Prager yield criterion]] that uses three parameters, and has additional terms for materials that yield under hydrostatic compression. +In terms of the principal stresses, this yield criterion may be expressed as +:<math> + S_{yc} = \tfrac{1}{\sqrt{2}}\left[(\sigma_1-\sigma_2)^2+(\sigma_2-\sigma_3)^2+(\sigma_3-\sigma_1)^2\right]^{1/2} - c_0 - c_1~(\sigma_1+\sigma_2+\sigma_3) - c_2~(\sigma_1+\sigma_2+\sigma_3)^2 + </math> +where <math>c_0, c_1, c_2 </math> are material constants. 
The additional parameter <math>c_2</math> gives the yield surface an [[ellipse|ellipsoidal]] cross section when viewed from a direction perpendicular to its axis. If <math>\sigma_c</math> is the yield stress in uniaxial compression, <math>\sigma_t</math> is the yield stress in uniaxial tension, and <math>\sigma_b</math> is the yield stress in biaxial compression, the parameters can be expressed as +:<math> + \begin{align} + c_1 = & \left(\cfrac{\sigma_t-\sigma_c}{(\sigma_t+\sigma_c)}\right) + \left(\cfrac{4\sigma_b^2 - \sigma_b(\sigma_c+\sigma_t) + \sigma_c\sigma_t}{4\sigma_b^2 + 2\sigma_b(\sigma_t-\sigma_c) - \sigma_c\sigma_t} \right) \\ + c_2 = & \left(\cfrac{1}{(\sigma_t+\sigma_c)}\right) + \left(\cfrac{\sigma_b(3\sigma_t-\sigma_c) -2\sigma_c\sigma_t}{4\sigma_b^2 + 2\sigma_b(\sigma_t-\sigma_c) - \sigma_c\sigma_t} \right) \\ + c_0 = & \sigma_c +\sqrt{3}(c_1\sigma_c -c_2\sigma_c^2) + \end{align} + </math> + +<!--{{verify section}}--> +:{| +[[Image:Bresler Pister Yield Surface 3D.png|400px|left|thumb|Figure 9: View of Bresler–Pister yield surface in 3D space of principal stresses]] +[[Image:Bresler Pister Surface 2D.png|200px|none|thumb|Figure 10: Bresler–Pister yield surface in 2D space (<math> \sigma_1, \sigma_2</math>)]] +|} + +===Willam–Warnke yield surface=== +{{Main|Willam Warnke yield criterion}} +The [[Willam Warnke yield criterion|Willam–Warnke yield criterion]] is a three-parameter smoothed version of the [[Mohr–Coulomb theory|Mohr–Coulomb yield criterion]] that has similarities in form to the [[Drucker Prager|Drucker–Prager]] and [[Bresler Pister yield criterion|Bresler–Pister]] yield criteria. + +The yield criterion has the functional form +:<math> + f(I_1, J_2, J_3) = 0 ~. + </math> +However, it is more commonly expressed in Haigh–Westergaard coordinates as +:<math> + f(\xi, \rho, \theta) = 0 ~. + </math> +The cross-section of the surface when viewed along its axis is a smoothed triangle (unlike Mohr–Coulumb). The Willam–Warnke yield surface is convex and has unique and well defined first and second derivatives on every point of its surface. Therefore the Willam–Warnke model is computationally robust and has been used for a variety of cohesive-frictional materials. +:{| +[[Image:Willam Warnke Yield Surface 3Da.png|300px|left|thumb|Figure 11: View of Willam–Warnke yield surface in 3D space of principal stresses]] [[Image:Willam Warnke Yield Surface 3Db.png|300px|none|thumb|Figure 12: Willam–Warnke yield surface in the <math>\pi</math>-plane]] +|} + +===Bigoni–Piccolroaz yield surface=== +The [[Bigoni Piccolroaz yield criterion|Bigoni–Piccolroaz yield criterion]] <ref>Bigoni, D. Nonlinear Solid Mechanics: Bifurcation Theory and Material Instability. Cambridge University Press, 2012 . ISBN 9781107025417.</ref><ref name=BP>Bigoni, D. and Piccolroaz, A., (2004), Yield criteria for quasibrittle and frictional materials, ''International Journal of Solids and Structures'' '''41''', 2855-2878.</ref> is a seven-parameter surface defined by + +:<math> + f(p,q,\theta) = F(p) + \frac{q}{g(\theta)} = 0, + </math> + +where <math>F(p)</math> is the “meridian” function + +:<math> +F(p) = +\left\{ +\begin{array}{ll} +-M p_c \sqrt{(\phi - \phi^m)[2(1 - \alpha)\phi + \alpha]}, & \phi \in [0,1], \\ ++\infty, & \phi \notin [0,1], +\end{array} +\right. 
+</math> + +:<math> +\phi = \frac{p + c}{p_c + c}, +</math> + +describing the pressure-sensitivity and <math>g(\theta)</math> is the “deviatoric” function + +:<math> +g(\theta) = \frac{1}{\cos[\beta \frac{\pi}{6} - \frac{1}{3} \cos^{-1}(\gamma \cos 3\theta)]}, +</math> + +describing the Lode-dependence of yielding. The seven, non-negative material parameters: + +:<math> +\underbrace{M > 0,~ p_c > 0,~ c \geq 0,~ 0 < \alpha < 2,~ m > 1}_{\mbox{defining}~\displaystyle{F(p)}},~~~ +\underbrace{0\leq \beta \leq 2,~ 0 \leq \gamma < 1}_{\mbox{defining}~\displaystyle{g(\theta)}}, +</math> + +define the shape of the meridian and deviatoric sections. + +This criterion represents a smooth and convex surface, which is closed both in hydrostatic tension and compression and has a +drop-like shape, particularly suited to describe frictional and granular materials. This criterion has also been generalized to the case of surfaces with corners.<ref name=BP2>Piccolroaz, A. and Bigoni, D. (2009), Yield criteria for quasibrittle and frictional materials: a generalization to surfaces with corners, ''International Journal of Solids and Structures'' '''46''', 3587-3596.</ref> + +{{multiple image + | align = none + | footer = Bigoni-Piccolroaz yield surface + | image1 = Supbp1.png + | width1 = 350 + | alt1 = 3D + | caption1 = In 3D space of principal stresses + | image2 = Supbp2.png + | width2 = 280 + | alt2 = <math>\pi</math>-plane + | caption2 = In the <math>\pi</math>-plane + }} + +==See also== +* [[Yield (engineering)]] +* [[Plasticity (physics)]] +* [[Stress (physics)|Stress]] +* [[Henri Tresca]] +* [[von Mises stress]] +* [[Mohr–Coulomb theory]] +* [[Strain (materials science)|Strain]] +* [[Strain tensor]] +* [[Stress-energy tensor]] +* [[Stress concentration]] +* [[3-D elasticity]] + +== References == +<references/> + +[[Category:Plasticity]] +[[Category:Solid mechanics]] +[[Category:Continuum mechanics]] +[[Category:Materials science]] + ch5wcbzrmgg6llvsb4vnz506ozumxah + + + + Non-linear least squares + 0 + 21657 + + 21658 + 2013-09-23T06:01:16Z + + 87.188.169.69 + + clarify + wikitext + text/x-wiki + {{Regression bar}} +'''Non-linear least squares''' is the form of [[least squares]] analysis used to fit a set of ''m'' observations with a model that is non-linear in ''n'' unknown parameters (''m''&nbsp;>&nbsp;''n''). It is used in some forms of [[non-linear regression]]. The basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations. There are many similarities to [[linear least squares (mathematics)|linear least squares]], but also some [[least squares#Differences between linear and non-linear least squares|significant differences]]. 
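+
+As a concrete illustration of such a fit, the following minimal sketch uses SciPy; the exponential model and the synthetic data are invented for the example.
+
+<syntaxhighlight lang="python">
+import numpy as np
+from scipy.optimize import curve_fit
+
+# Model that is non-linear in its parameters alpha and beta.
+def model(x, alpha, beta):
+    return alpha * np.exp(beta * x)
+
+rng = np.random.default_rng(0)
+x = np.linspace(0.0, 4.0, 25)
+y = model(x, 2.5, -0.7) + 0.05 * rng.normal(size=x.size)   # synthetic observations
+
+# Initial values are required because the refinement is iterative.
+(alpha_hat, beta_hat), cov = curve_fit(model, x, y, p0=(1.0, -1.0))
+print(alpha_hat, beta_hat)
+</syntaxhighlight>
+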
+ +== Theory == +Consider a set of <math>m</math> data points, <math>(x_1, y_1), (x_2, y_2),\dots,(x_m, y_m),</math> and a curve (model function) <math>y=f(x, \boldsymbol \beta),</math> that in addition to the variable <math>x</math> also depends on <math>n</math> parameters, <math>\boldsymbol \beta = (\beta_1, \beta_2, \dots, \beta_n),</math> with <math>m\ge n.</math> It is desired to find the vector <math>\boldsymbol \beta</math> of parameters such that the curve fits best the given data in the least squares sense, that is, the sum of squares +:<math>S=\sum_{i=1}^{m}r_i^2</math> +is minimized, where the [[errors and residuals in statistics|residuals]] (errors) ''r<sub>i</sub>'' are given by +:<math>r_i= y_i - f(x_i, \boldsymbol \beta) </math> + +for <math>i=1, 2,\dots, m.</math> + +The [[Maxima and minima|minimum]] value of ''S'' occurs when the [[gradient]] is zero. Since the model contains ''n'' parameters there are ''n'' gradient equations: + +:<math>\frac{\partial S}{\partial \beta_j}=2\sum_i r_i\frac{\partial r_i}{\partial \beta_j}=0 \quad (j=1,\ldots,n).</math> + +In a non-linear system, the derivatives <math>\frac{\partial r_i}{\partial \beta_j}</math> are functions of both the independent variable and the parameters, so these gradient equations do not have a closed solution. Instead, initial values must be chosen for the parameters. Then, the parameters are refined iteratively, that is, the values are obtained by successive approximation, + +:<math>\beta_j \approx \beta_j^{k+1} =\beta^k_j+\Delta \beta_j. \, </math> + +Here, ''k'' is an iteration number and the vector of increments, <math>\Delta \boldsymbol \beta\,</math> is known as the shift vector. At each iteration the model is linearized by approximation to a first-order [[Taylor series]] expansion about <math> \boldsymbol \beta^k\!</math> +:<math>f(x_i,\boldsymbol \beta)\approx f(x_i,\boldsymbol \beta^k) +\sum_j \frac{\partial f(x_i,\boldsymbol \beta^k)}{\partial \beta_j} \left(\beta_j -\beta^{k}_j \right) \approx f(x_i,\boldsymbol \beta^k) +\sum_j J_{ij} \,\Delta\beta_j. </math> +The [[Jacobian]], '''J''', is a function of constants, the independent variable ''and'' the parameters, so it changes from one iteration to the next. Thus, in terms of the linearized model, <math>\frac{\partial r_i}{\partial \beta_j}=-J_{ij}</math> and the residuals are given by + +:<math>r_i=\Delta y_i- \sum_{s=1}^{n} J_{is}\ \Delta\beta_s; \ \Delta y_i=y_i- f(x_i,\boldsymbol \beta^k).</math> + +Substituting these expressions into the gradient equations, they become + +:<math>-2\sum_{i=1}^{m}J_{ij} \left( \Delta y_i-\sum_{s=1}^{n} J_{is}\ \Delta \beta_s \right)=0</math> + +which, on rearrangement, become ''n'' simultaneous linear equations, the '''normal equations''' + +:<math>\sum_{i=1}^{m}\sum_{s=1}^{n} J_{ij}J_{is}\ \Delta \beta_s=\sum_{i=1}^{m} J_{ij}\ \Delta y_i \qquad (j=1,\dots,n).\,</math> + +The normal equations are written in matrix notation as + +:<math>\mathbf{\left(J^TJ\right)\Delta \boldsymbol \beta=J^T\ \Delta y}.</math> + +When the observations are not equally reliable, a weighted sum of squares may be minimized, + +:<math>S=\sum_{i=1}^m W_{ii}r_i^2.</math> + +Each element of the [[diagonal matrix|diagonal]] weight matrix '''W''' should, ideally, be equal to the reciprocal of the error [[variance]] of the measurement.<ref>This implies that the observations are uncorrelated. If the observations are [[correlated]], the expression + +:<math>S=\sum_k \sum_j r_k W_{kj} r_j\,</math> + +applies. 
In this case the weight matrix should ideally be equal to the inverse of the error [[variance-covariance matrix]] of the observations.</ref> +The normal equations are then + +:<math>\mathbf{\left(J^TWJ\right)\Delta \boldsymbol \beta=J^TW\ \Delta y}.</math> + +These equations form the basis for the [[Gauss–Newton algorithm]] for a non-linear least squares problem. + +<!-- === Differences between linear and non-linear least squares === +*NLLSQ (Non-linear least squares) requires initial estimates of the parameters, LLSQ (linear least squares) does not. +*NLLSQ requires that the Jacobian be calculated. Analytical expressions for the partial derivatives can be complicated. If analytical expressions are impossible to obtain the partial derivatives must be calculated by numerical approximation. +*In NLLSQ divergence is a common phenomenon whereas in LLSQ it is quite rare. Divergence occurs when the sum of squares increases from one iteration to the next. It is caused by the inadequacy of the approximation that the Taylor series can be truncated at the first term. +*NLLSQ is an iterative process, LLSQ is not. The iterative process has to be terminated when a convergence criterion is satisfied. +*In LLSQ the solution is unique, but in NLLSQ there may be multiple minima in the sum of squares. +*In NLLSQ estimates of the parameter errors are [[biased]], but in LLSQ they are not. +These differences must be considered whenever the solution to a non-linear least squares problem is being sought. --> +=== Geometrical interpretation === +In linear least squares the [[Optimization (mathematics)|objective function]], ''S'', is a [[quadratic function#Bivariate quadratic function|quadratic function]] of the parameters. +:<math>S=\sum_i W_{ii} \left(y_i-\sum_jX_{ij}\beta_j \right)^2</math> +When there is only one parameter the graph of ''S'' with respect to that parameter will be a [[parabola]]. With two or more parameters the contours of ''S'' with respect to any pair of parameters will be concentric [[ellipse]]s (assuming that the normal equations matrix <math>\mathbf{X^TWX}</math> is [[positive-definite matrix|positive definite]]). The minimum parameter values are to be found at the centre of the ellipses. The geometry of the general objective function can be described as paraboloid elliptical. +In NLLSQ the objective function is quadratic with respect to the parameters only in a region close to its minimum value, where the truncated Taylor series is a good approximation to the model. +:<math>S \approx\sum_i W_{ii} \left(y_i-\sum_j J_{ij}\beta_j \right)^2</math> +The more the parameter values differ from their optimal values, the more the contours deviate from elliptical shape. A consequence of this is that initial parameter estimates should be as close as practicable to their (unknown!) optimal values. It also explains how divergence can come about as the Gauss–Newton algorithm is convergent only when the objective function is approximately quadratic in the parameters. + +== Computation == + +=== Initial parameter estimates === +Problems of ill-conditioning and divergence can be ameliorated by finding initial parameter estimates that are near to the optimal values. A good way to do this is by [[computer simulation]]. Both the observed and calculated data are displayed on a screen. The parameters of the model are adjusted by hand until the agreement between observed and calculated data is reasonably good. 
Although this will be a subjective judgment, it is sufficient to find a good starting point for the non-linear refinement. + +=== Solution === +Any method among the ones described [[#Algorithms|below]] can be applied for find a solution. + +=== Convergence criteria === +The common sense criterion for convergence is that the sum of squares does not decrease from one iteration to the next. However this criterion is often difficult to implement in practice, for various reasons. A useful convergence criterion is +:<math>\left|\frac{S^k-S^{k+1}}{S^k}\right|<0.0001.</math> +The value 0.0001 is somewhat arbitrary and may need to be changed. In particular it may need to be increased when experimental errors are large. An alternative criterion is + +:<math>\left|\frac{\Delta \beta_j}{\beta_j}\right|<0.001, \qquad j=1,\dots,n.</math> + +Again, the numerical value is somewhat arbitrary; 0.001 is equivalent to specifying that each parameter should be refined to 0.1% precision. This is reasonable when it is less than the largest relative standard deviation on the parameters. + +===Calculation of the Jacobian by numerical approximation=== +{{main|Numerical differentiation}} +There are models for which it is either very difficult or even impossible to derive analytical expressions for the elements of the Jacobian. Then, the numerical approximation +:<math>\frac{\partial f(x_i, \boldsymbol \beta)}{\partial \beta_j} \approx \frac{\delta f(x_i, \boldsymbol \beta)}{\delta \beta_j}</math> +is obtained by calculation of <math>f(x_i, \boldsymbol \beta)\,</math> for <math>\beta_j\,</math> and <math>\beta_j+\delta \beta_j\,</math>. The increment,<math>\delta \beta_j\,</math>, size should be chosen so the numerical derivative is not subject to approximation error by being too large, or [[round-off]] error by being too small. + +=== Parameter errors, confidence limits, residuals etc. === +Some information is given in [[linear least squares (mathematics)#Parameter errors, correlation and confidence limits|the section]] on the [[linear least squares (mathematics)|linear least squares]] page. + +=== Multiple minima === +Multiple minima can occur in a variety of circumstances some of which are: +*A parameter is raised to a power of two or more. For example, when fitting data to a [[Lorentzian]] curve +:: <math>f(x_i, \boldsymbol \beta)=\frac{\alpha}{1+\left(\frac{\gamma-x_i}{\beta} \right)^2}</math> +where <math>\alpha</math> is the height, <math>\gamma</math> is the position and <math>\beta</math> is the half-width at half height, there are two solutions for the half-width, <math>\hat \beta</math> and <math>-\hat \beta</math> which give the same optimal value for the objective function. +*Two parameters can be interchanged without changing the value of the model. A simple example is when the model contains the product of two parameters, since <math>\alpha \beta</math> will give the same value as <math>\beta \alpha</math>. +*A parameter is in a trigonometric function, such as <math>\sin \beta\,</math>, which has identical values at <math>\hat \beta +2n \pi</math>. See [[Levenberg–Marquardt algorithm#Example|Levenberg&ndash;Marquardt algorithm]] for an example. +Not all multiple minima have equal values of the objective function. False minima, also known as local minima, occur when the objective function value is greater than its value at the so-called global minimum. To be certain that the minimum found is the global minimum, the refinement should be started with widely differing initial values of the parameters. 
When the same minimum is found regardless of starting point, it is likely to be the global minimum. + +When multiple minima exist there is an important consequence: the objective function will have a maximum value somewhere between two minima. The normal equations matrix is not positive definite at a maximum in the objective function, as the gradient is zero and no unique direction of descent exists. Refinement from a point (a set of parameter values) close to a maximum will be ill-conditioned and should be avoided as a starting point. For example, when fitting a Lorentzian the normal equations matrix is not positive definite when the half-width of the band is zero.<ref>In the absence of [[round-off error]] and of experimental error in the independent variable the normal equations matrix would be singular</ref> + +=== Transformation to a linear model === +{{main|Nonlinear regression#Transformation}} +{{move section portions|Nonlinear regression#Transformation|date=August 2013}} +A non-linear model can sometimes be transformed into a linear one. For example, when the model is a simple exponential function, +:<math>f(x_i,\boldsymbol \beta)= \alpha e^{\beta x_i}</math> +it can be transformed into a linear model by taking logarithms. +:<math>\log f(x_i,\boldsymbol \beta)=\log \alpha + \beta x_i</math> +Graphically this corresponds to working on a [[semi-log plot]]. The sum of squares becomes +:<math>S=\sum_i (\log y_i-\log \alpha - \beta x_i)^2.\!</math> +This procedure should be avoided unless the errors are multiplicative and [[log normal distribution|log-normally distributed]] because it can give misleading results. This comes from the fact that whatever the experimental errors on '''y''' might be, the errors on '''log y''' are different. Therefore, when the transformed sum of squares is minimized different results will be obtained both for the parameter values and their calculated standard deviations. However, with multiplicative errors that are log-normally distributed, this procedure gives unbiased and consistent parameter estimates. + +Another example is furnished by [[Michaelis&ndash;Menten kinetics#Equation optimization|Michaelis&ndash;Menten kinetics]], used to determine two parameters <math>V_{\max}</math> and <math>K_m</math>: +:<math> v = \frac{V_{\max}[S]}{K_{m} + [S]}</math>. +The [[Lineweaver–Burk plot]] +:<math> \frac{1}{v} = \frac{1}{V_\max} + \frac{K_m}{V_{\max}[S]}</math> +of <math>\frac{1}{v}</math> against <math>\frac{1}{[S]}</math> is linear in the parameters <math>\frac{1}{V_\max}</math> and <math>\frac{K_m}{V_\max}</math>, but very sensitive to data error and strongly biased toward fitting the data in a particular range of the independent variable <math>[S]</math>. + +== Solution == +{{split section|Non-linear least squares algorithms|date=August 2013}} + +=== Gauss–Newton method === +{{main|Gauss–Newton algorithm}} +The normal equations +:<math>\mathbf{\left( J^TWJ \right)\Delta \boldsymbol\beta=\left( J^TW \right) \Delta y}</math> +may be solved for <math>\Delta \boldsymbol\beta</math> by [[Cholesky decomposition]], as described in [[linear least squares (mathematics)#Computation|linear least squares]]. The parameters are updated iteratively +:<math>\boldsymbol\beta^{k+1}=\boldsymbol\beta^k+\Delta \boldsymbol\beta</math> +where ''k'' is an iteration number. While this method may be adequate for simple models, it will fail if divergence occurs. Therefore protection against divergence is essential. 
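+
+The iteration just described can be written out in a few lines. The following bare-bones sketch is illustrative only: it is unweighted, approximates the Jacobian numerically and, deliberately, contains no protection against divergence. It applies Gauss–Newton to the exponential model <math>f(x,\boldsymbol \beta)= \alpha e^{\beta x}</math> discussed above, with invented synthetic data.
+
+<syntaxhighlight lang="python">
+import numpy as np
+
+def gauss_newton(f, x, y, beta, n_iter=10, h=1e-7):
+    """Bare-bones Gauss-Newton: no weights, no shift-cutting, no convergence test."""
+    beta = np.asarray(beta, dtype=float)
+    for _ in range(n_iter):
+        r = y - f(x, *beta)                       # residuals (Delta y)
+        # Jacobian J_ij = d f(x_i)/d beta_j by forward differences.
+        J = np.empty((x.size, beta.size))
+        for j in range(beta.size):
+            db = np.zeros_like(beta)
+            db[j] = h
+            J[:, j] = (f(x, *(beta + db)) - f(x, *beta)) / h
+        # Normal equations (J^T J) delta = J^T r, then beta <- beta + delta.
+        delta = np.linalg.solve(J.T @ J, J.T @ r)
+        beta = beta + delta
+    return beta
+
+f = lambda x, a, b: a * np.exp(b * x)
+x = np.linspace(0.0, 4.0, 25)
+y = f(x, 2.5, -0.7) + 0.05 * np.random.default_rng(1).normal(size=x.size)
+print(gauss_newton(f, x, y, beta=[1.0, -1.0]))
+</syntaxhighlight>
+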
+ +==== Shift-cutting ==== +If divergence occurs, a simple expedient is to reduce the length of the shift vector, <math>\mathbf{\Delta \beta}</math>, by a fraction, ''f'' +:<math>\boldsymbol\beta^{k+1}=\boldsymbol\beta^k+f\ \Delta \boldsymbol\beta.</math> +For example the length of the shift vector may be successively halved until the new value of the objective function is less than its value at the last iteration. The fraction, ''f'' could be optimized by a [[line search]].<ref name=BDS>M.J. Box, D. Davies and W.H. Swann, Non-Linear optimisation Techniques, Oliver & Boyd, 1969</ref> As each trial value of ''f'' requires the objective function to be re-calculated it is not worth optimizing its value too stringently. + +When using shift-cutting, the direction of the shift vector remains unchanged. This limits the applicability of the method to situations where the direction of the shift vector is not very different from what it would be if the objective function were approximately quadratic in the parameters, <math>\boldsymbol\beta^k.</math> + +==== Marquardt parameter ==== +{{main|Levenberg–Marquardt algorithm}} +If divergence occurs and the direction of the shift vector is so far from its "ideal" direction that shift-cutting is not very effective, that is, the fraction, ''f'' required to avoid divergence is very small, the direction must be changed. This can achieved by using the [[Levenberg–Marquardt algorithm|Marquardt]] parameter.<ref>This technique was proposed independently by Levenberg (1944), Girard (1958), Wynne (1959), Morrison (1960) and Marquardt (1963). Marquardt's name alone is used for it in much of the scientific literature.</ref> In this method the normal equations are modified +:<math>\mathbf{\left( J^TWJ +\lambda I \right)\Delta \boldsymbol \beta=\left( J^TW \right) \Delta y}</math> +where <math>\lambda</math> is the Marquardt parameter and '''I''' is an identity matrix. Increasing the value of <math>\lambda</math> has the effect of changing both the direction and the length of the shift vector. The shift vector is rotated towards the direction of [[steepest descent]] +:when <math>\lambda \mathbf{I\gg{}J^TWJ}, \ \mathbf{\Delta \boldsymbol \beta} \approx 1/\lambda \mathbf{J^TW\ \Delta y}.</math> +<math>\mathbf{J^TW\ \Delta y}</math> is the steepest descent vector. So, when <math>\lambda</math> becomes very large, the shift vector becomes a small fraction of the steepest descent vector. + +Various strategies have been proposed for the determination of the Marquardt parameter. As with shift-cutting, it is wasteful to optimize this parameter too stringently. Rather, once a value has been found that brings about a reduction in the value of the objective function, that value of the parameter is carried to the next iteration, reduced if possible, or increased if need be. When reducing the value of the Marquardt parameter, there is a cut-off value below which it is safe to set it to zero, that is, to continue with the unmodified Gauss–Newton method. The cut-off value may be set equal to the smallest singular value of the Jacobian.<ref name=LH/> A bound for this value is given by <math>1/\mbox{trace} \mathbf{\left(J^TWJ \right)^{-1}}</math>.<ref>R. Fletcher, UKAEA Report AERE-R 6799, H.M. Stationery Office, 1971</ref> + +=== QR decomposition === +The minimum in the sum of squares can be found by a method that does not involve forming the normal equations. 
+=== QR decomposition ===
+The minimum in the sum of squares can be found by a method that does not involve forming the normal equations. The residuals with the linearized model can be written as
+:<math>\mathbf{r=\Delta y-J\ \Delta\boldsymbol\beta}.</math>
+The Jacobian is subjected to an orthogonal decomposition; the [[QR decomposition]] will serve to illustrate the process.
+
+:<math>\mathbf{J=QR}</math>
+
+where '''Q''' is an [[Orthogonal matrix|orthogonal]] <math>m \times m</math> matrix and '''R''' is an <math>m \times n</math> matrix which is [[block matrix|partitioned]] into an <math>n \times n</math> block, <math>\mathbf{R}_n</math>, and an <math>(m-n) \times n</math> zero block. <math>\mathbf{R}_n</math> is upper triangular.
+
+:<math>\mathbf{R}= \begin{bmatrix}
+\mathbf{R}_n \\
+\mathbf{0}\end{bmatrix}</math>
+
+The residual vector is left-multiplied by <math>\mathbf Q^T</math>.
+
+:<math>\mathbf{Q^Tr=Q^T\ \Delta y -R\ \Delta\boldsymbol\beta}= \begin{bmatrix}
+\mathbf{\left(Q^T\ \Delta y -R\ \Delta\boldsymbol\beta \right)}_n \\
+\mathbf{\left(Q^T\ \Delta y \right)}_{m-n}\end{bmatrix}</math>
+
+This has no effect on the sum of squares since <math>S=\mathbf{r^T Q Q^Tr = r^Tr}</math> because '''Q''' is [[orthogonal]].
+The minimum value of ''S'' is attained when the upper block is zero. Therefore, the shift vector is found by solving
+
+:<math>\mathbf{R_n\ \Delta\boldsymbol\beta =\left(Q^T\ \Delta y \right)_n}. \, </math>
+
+These equations are easily solved as '''R''' is upper triangular.
+
+=== Singular value decomposition ===
+A variant of the method of orthogonal decomposition involves [[singular value decomposition]], in which '''R''' is diagonalized by further orthogonal transformations.
+
+:<math>\mathbf{J=U \boldsymbol\Sigma V^T} \, </math>
+
+where <math>\mathbf U</math> is orthogonal, <math>\boldsymbol\Sigma </math> is a diagonal matrix of singular values and <math>\mathbf V</math> is the orthogonal matrix of the eigenvectors of <math>\mathbf {J^TJ}</math> or, equivalently, the right singular vectors of <math>\mathbf{J}</math>. In this case the shift vector is given by
+
+:<math>\mathbf{\boldsymbol\Delta\beta=V \boldsymbol\Sigma^{-1}\left( U^T\ \boldsymbol\Delta y \right)}_n. \, </math>
+
+The relative simplicity of this expression is very useful in theoretical analysis of non-linear least squares. The application of singular value decomposition is discussed in detail in Lawson and Hanson.<ref name=LH>C.L. Lawson and R.J. Hanson, Solving Least Squares Problems, Prentice–Hall, 1974</ref>
+
+=== Gradient methods ===
+There are many examples in the scientific literature where different methods have been used for non-linear data-fitting problems.
+
+*Inclusion of second derivatives in the Taylor series expansion of the model function. This is [[Newton's method in optimization]].
+:: <math>f(x_i, \boldsymbol \beta)=f^k(x_i, \boldsymbol \beta) +\sum_j J_{ij} \, \Delta \beta_j + \frac{1}{2}\sum_j\sum_k \Delta\beta_j \, \Delta\beta_k \,H_{jk_{(i)}},\ H_{jk_{(i)}}=\frac{\partial^2 f(x_i, \boldsymbol \beta)}{\partial \beta_j \, \partial \beta_k }. </math>
+: The matrix '''H''' is known as the [[Hessian matrix]]. Although this model has better convergence properties near to the minimum, it is much worse when the parameters are far from their optimal values. Calculation of the Hessian adds to the complexity of the algorithm. This method is not in general use.
+*[[Davidon–Fletcher–Powell formula|Davidon–Fletcher–Powell method]]. This method, a form of pseudo-Newton method, is similar to the one above but calculates the Hessian by successive approximation, to avoid having to use analytical expressions for the second derivatives.
+*[[Steepest descent]]. Although a reduction in the sum of squares is guaranteed when the shift vector points in the direction of steepest descent, this method often performs poorly. When the parameter values are far from optimal, the direction of the steepest descent vector, which is normal (perpendicular) to the contours of the objective function, is very different from the direction of the Gauss–Newton vector. This makes divergence much more likely, especially as the minimum along the direction of steepest descent may correspond to a small fraction of the length of the steepest descent vector. When the contours of the objective function are very eccentric, due to there being high correlation between parameters, the steepest descent iterations, with shift-cutting, follow a slow, zig-zag trajectory towards the minimum.
+*[[Conjugate gradient method|Conjugate gradient search]]. This is an improved steepest-descent-based method with good theoretical convergence properties, although it can fail on finite-precision digital computers even when used on quadratic problems.<ref>M. J. D. Powell, Computer Journal, (1964), '''7''', 155.</ref>
+
+=== Direct search methods ===
+Direct search methods depend on evaluations of the objective function at a variety of parameter values and do not use derivatives at all. They offer alternatives to the use of numerical derivatives in the Gauss–Newton method and gradient methods.
+* Alternating variable search.<ref name=BDS/> Each parameter is varied in turn by adding a fixed or variable increment to it and retaining the value that brings about a reduction in the sum of squares. The method is simple and effective when the parameters are not highly correlated. It has very poor convergence properties, but may be useful for finding initial parameter estimates.
+
+*[[Nelder–Mead method|Nelder–Mead (simplex) search]]. A [[simplex]] in this context is a [[polytope]] of ''n''&nbsp;+&nbsp;1 vertices in ''n'' dimensions; a triangle on a plane, a tetrahedron in three-dimensional space and so forth. Each vertex corresponds to a value of the objective function for a particular set of parameters. The shape and size of the simplex are adjusted by varying the parameters in such a way that the value of the objective function at the highest vertex always decreases. Although the sum of squares may initially decrease rapidly, it can converge to a non-stationary point on quasiconvex problems, as shown by an example of M. J. D. Powell.
+
+More detailed descriptions of these, and other, methods are available in ''[[Numerical Recipes]]'', together with computer code in various languages.
+
+== See also ==
+* [[Least squares support vector machine]]
+* [[Curve fitting]]
+* [[Nonlinear programming]]
+* [[Optimization (mathematics)]]
+* [[Levenberg&ndash;Marquardt algorithm]]
+
+== Notes ==
+<references/>
+
+== References ==
+*C. T. Kelley, ''Iterative Methods for Optimization'', SIAM Frontiers in Applied Mathematics, no 18, 1999, ISBN 0-89871-433-8. [http://www.siam.org/books/textbooks/fr18_book.pdf Online copy]
+* T. Strutz: ''Data Fitting and Uncertainty (A practical introduction to weighted least squares and beyond)''. Vieweg+Teubner, ISBN 978-3-8348-1022-9.
+ +{{Least Squares and Regression Analysis}} + +[[Category:Numerical analysis]] +[[Category:Mathematical optimization]] +[[Category:Regression analysis]] +[[Category:Least squares]] + aqofx7ibsececcmqbpk1a7irxgbfn6r + + + + Markov decision process + 0 + 7628 + + 7629 + 2014-01-06T21:40:41Z + + 86.129.8.36 + + link + wikitext + text/x-wiki + '''Markov decision processes (MDPs)''', named after [[Andrey Markov]], provide a mathematical framework for modeling [[decision making]] in situations where outcomes are partly [[Randomness#In mathematics|random]] and partly under the control of a decision maker. MDPs are useful for studying a wide range of [[optimization problem]]s solved via [[dynamic programming]] and [[reinforcement learning]]. MDPs were known at least as early as the 1950s (cf. Bellman 1957). A core body of research on Markov decision processes resulted from [[Ronald A. Howard]]'s book published in 1960, ''Dynamic Programming and Markov Processes''. They are used in a wide area of disciplines, including [[robotics]], [[Automatic control|automated control]], [[economics]], and [[manufacturing]]. + +More precisely, a Markov Decision Process is a [[discrete time]] [[stochastic]] [[Optimal control theory|control]] process. At each time step, the process is in some state <math>s</math>, and the decision maker may choose any action <math>a</math> that is available in state <math>s</math>. The process responds at the next time step by randomly moving into a new state <math>s'</math>, and giving the decision maker a corresponding reward <math>R_a(s,s')</math>. + +The probability that the process moves into its new state <math>s'</math> is influenced by the chosen action. Specifically, it is given by the state transition function <math>P_a(s,s')</math>. Thus, the next state <math>s'</math> depends on the current state <math>s</math> and the decision maker's action <math>a</math>. But given <math>s</math> and <math>a</math>, it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP possess the ''[[Markov property]]''. + +Markov decision processes are an extension of [[Markov chain]]s; the difference is the addition of actions (allowing choice) and rewards (giving motivation). Conversely, if only one action exists for each state and all rewards are the same (e.g., zero), a Markov decision process reduces to a [[Markov chain]]. + +==Definition== +[[Image:Markov Decision Process example.png|400px|right|thumb|Example of a simple MDP with three states and two actions.]] +A Markov decision process is a 4-[[tuple]] <math>(S,A,P_\cdot(\cdot,\cdot),R_\cdot(\cdot,\cdot))</math>, where + +* <math>S</math> is a finite set of states, +* <math>A</math> is a finite set of actions (alternatively, <math>A_s</math> is the finite set of actions available from state <math>s</math>), +* <math>P_a(s,s') = \Pr(s_{t+1}=s' \mid s_t = s, a_t=a)</math> is the probability that action <math>a</math> in state <math>s</math> at time <math>t</math> will lead to state <math>s'</math> at time <math>t+1</math>, +*<math>R_a(s,s')</math> is the immediate reward (or [[expected]] immediate reward) received after transition to state <math>s'</math> from state <math>s</math>. + +(Note: The theory of Markov decision processes does not state that <math>S</math> or <math>A</math> are finite, but the basic algorithms below assume that they are finite.) 
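+
+As a concrete illustration, a small finite MDP can be written down directly from this 4-tuple. The following Python sketch is just one possible encoding; the two states, two actions and all of the numerical values are made up for illustration only.
+
+ # P[a][s][s2]: probability of moving to s2 when action a is taken in state s
+ # R[a][s][s2]: immediate reward received after that transition
+ S = ["s0", "s1"]
+ A = ["a0", "a1"]
+ P = {
+     "a0": {"s0": {"s0": 0.5, "s1": 0.5}, "s1": {"s0": 0.0, "s1": 1.0}},
+     "a1": {"s0": {"s0": 0.9, "s1": 0.1}, "s1": {"s0": 0.4, "s1": 0.6}},
+ }
+ R = {
+     "a0": {"s0": {"s0": 5.0, "s1": 10.0}, "s1": {"s0": 0.0, "s1": -1.0}},
+     "a1": {"s0": {"s0": 0.0, "s1": 2.0}, "s1": {"s0": 3.0, "s1": 1.0}},
+ }
+ # Sanity check: every transition distribution sums to one
+ assert all(abs(sum(P[a][s].values()) - 1.0) < 1e-12 for a in A for s in S)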
+ +==Problem== + +The core problem of MDPs is to find a "policy" for the decision maker: a function <math>\pi</math> that specifies the action <math>\pi(s)</math> that the decision maker will choose when in state <math>s</math>. Note that once a Markov decision process is combined with a policy in this way, this fixes the action for each state and the resulting combination behaves like a [[Markov chain]]. + +The goal is to choose a policy <math>\pi</math> that will maximize some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon: + +:<math>\sum^{\infty}_{t=0} {\gamma^t R_{a_t} (s_t, s_{t+1})} </math> &nbsp;&nbsp;&nbsp;(where we choose <math>a_t = \pi(s_t)</math>) + +where <math>\ \gamma \ </math> is the discount factor and satisfies <math>0 \le\ \gamma\ < 1</math>. (For example, <math> \gamma = 1/(1+r) </math> when the discount rate is r.) <math> \gamma </math> is typically close to 1. + +Because of the Markov property, the optimal policy for this particular problem can indeed be written as a function of <math>s</math> only, as assumed above. + +==Algorithms== +MDPs can be solved by [[linear programming]] or [[dynamic programming]]. In what follows we present the latter approach. + +Suppose we ''know'' the state transition function <math>P</math> and the reward function <math>R</math>, and we wish to calculate the policy that maximizes the expected discounted reward. + +The standard family of algorithms to calculate this optimal policy requires storage for two arrays indexed by state: ''value'' <math>V</math>, which contains real values, and ''policy'' <math>\pi</math> which contains actions. At the end of the algorithm, <math>\pi</math> will contain the solution and <math>V(s)</math> will contain the discounted sum of the rewards to be earned (on average) by following that solution from state <math>s</math>. + +The algorithm has the following two kinds of steps, which are repeated in some order for all the states until no further changes take place. +They are defined recursively as follows: + +:<math> \pi(s) := \arg \max_a \left\{ \sum_{s'} P_a(s,s') \left( R_a(s,s') + \gamma V(s') \right) \right\} </math> + +:<math> V(s) := \sum_{s'} P_{\pi(s)} (s,s') \left( R_{\pi(s)} (s,s') + \gamma V(s') \right) </math> + +Their order depends on the variant of the algorithm; one can also do them for all states at once or state by state, and more often to some states than others. As long as no state is permanently excluded from either of the steps, the algorithm will eventually arrive at the correct solution. + +===Notable variants=== + +====Value iteration==== +In value iteration (Bellman 1957), which is also called [[backward induction]], +the <math>\pi</math> function is not used; instead, the value of <math>\pi(s)</math> is calculated within <math>V(s)</math> whenever it is needed. Shapley's 1953 paper on [[stochastic games]] included as a special case the value iteration method for MDPs, but this was recognized only later on.<ref>Lodewijk Kallenberg, ''Finite state and action MDPs'', in Eugene A. Feinberg, Adam Shwartz (eds.) ''Handbook of Markov decision processes: methods and applications'', Springer, 2002, ISBN 0-7923-7459-2</ref> + +Substituting the calculation of <math>\pi(s)</math> into the calculation of <math>V(s)</math> gives the combined step: +:<math> V(s) := \max_a \left\{ \sum_{s'} P_a(s,s') \left( R_a(s,s') + \gamma V(s') \right) \right\}. 
</math> + +This update rule is iterated for all states <math>s</math> until it converges with the left-hand side equal to the right-hand side (which is the "[[Bellman equation]]" for this problem). + +====Policy iteration==== +In policy iteration (Howard 1960), step one is performed once, and then step two is repeated until it converges. Then step one is again performed once and so on. + +Instead of repeating step two to convergence, it may be formulated and solved as a set of linear equations. + +This variant has the advantage that there is a definite stopping condition: when the array <math>\pi</math> does not change in the course of applying step 1 to all states, the algorithm is completed. + +====Modified policy iteration==== +In modified policy iteration (van Nunen, 1976; Puterman and Shin 1978), step one is performed once, and then step two is repeated several times. Then step one is again performed once and so on. + +====Prioritized sweeping==== +In this variant, the steps are preferentially applied to states which are in some way important - whether based on the algorithm (there were large changes in <math>V</math> or <math>\pi</math> around those states recently) or based on use (those states are near the starting state, or otherwise of interest to the person or program using the algorithm). + +==Extensions and generalizations== +A Markov decision process is a [[stochastic game]] with only one player. + +===Partial observability=== +{{main|partially observable Markov decision process}} +The solution above assumes that the state <math>s</math> is known when action is to be taken; otherwise <math>\pi(s)</math> cannot be calculated. When this assumption is not true, the problem is called a partially observable Markov decision process or POMDP. + +A major advance in this area was provided by Burnetas and Katehakis in "Optimal adaptive policies for Markov decision processes".<ref>Burnetas AN and Katehakis MN (1997). +"Optimal adaptive policies for Markov decision processes", +''Math. Oper. Res.'', 22(1), 262&ndash;268</ref> In this work a class of adaptive policies that possess uniformly maximum convergence rate properties for the total expected finite horizon reward, were constructed under the assumptions of finite state-action spaces and irreducibility of the transition law. These policies prescribe that the choice of actions, at each state and time period, should be based on indices that are inflations of the right-hand side of the estimated average reward optimality equations. + +===Reinforcement learning=== +If the probabilities or rewards are unknown, the problem is one of [[reinforcement learning]] (Sutton and Barto, 1998). + +For this purpose it is useful to define a further function, which corresponds to taking the action <math>a</math> and then continuing optimally (or according to whatever policy one currently has): +:<math>\ Q(s,a) = \sum_{s'} P_a(s,s') (R_a(s,s') + \gamma V(s')).\ </math> + +While this function is also unknown, experience during learning is based on <math>(s, a)</math> pairs (together with the outcome <math>s'</math>); that is, "I was in state <math>s</math> and I tried doing <math>a</math> and <math>s'</math> happened"). Thus, one has an array <math>Q</math> and uses experience to update it directly. This is known as [[Q-learning|Q&#8209;learning]]. 
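+
+For concreteness, the value-iteration variant described in the Algorithms section above can be written out directly in code. The following Python sketch reuses the dictionary encoding of <math>S</math>, <math>A</math>, <math>P</math> and <math>R</math> from the earlier example; the discount factor and tolerance are illustrative assumptions.
+
+ def value_iteration(S, A, P, R, gamma=0.9, tol=1e-8):
+     """Iterate V(s) := max_a sum_{s2} P_a(s,s2) * (R_a(s,s2) + gamma*V(s2)) to a fixed point."""
+     V = {s: 0.0 for s in S}
+     while True:
+         V_new = {s: max(sum(P[a][s][s2] * (R[a][s][s2] + gamma * V[s2]) for s2 in S)
+                         for a in A)
+                  for s in S}
+         delta = max(abs(V_new[s] - V[s]) for s in S)
+         V = V_new
+         if delta < tol:
+             break
+     # Recover a greedy policy pi from the converged values
+     pi = {s: max(A, key=lambda a: sum(P[a][s][s2] * (R[a][s][s2] + gamma * V[s2]) for s2 in S))
+           for s in S}
+     return V, pi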
+
+Reinforcement learning can solve Markov decision processes without explicit specification of the transition probabilities; in contrast, the values of the transition probabilities are needed in value and policy iteration. In reinforcement learning, instead of explicit specification of the transition probabilities, the transition probabilities are accessed through a simulator that is typically restarted many times from a uniformly random initial state. Reinforcement learning can also be combined with function approximation to address problems with a very large number of states.
+
+==Continuous-time Markov Decision Process==
+In discrete-time Markov Decision Processes, decisions are made at discrete time intervals. However, for '''Continuous-time Markov Decision Processes''', decisions can be made at any time the decision maker chooses. In comparison to discrete-time Markov Decision Processes, Continuous-time Markov Decision Processes can better model the decision making process for a system that has [[Continuous time|continuous dynamics]], i.e.,&nbsp;the system dynamics is defined by [[partial differential equation]]s (PDEs).
+
+===Definition===
+In order to discuss the continuous-time Markov Decision Process, we introduce two sets of notation:
+
+If the state space and action space are finite,
+* <math>\mathcal{S}</math>: State space;
+* <math>\mathcal{A}</math>: Action space;
+* <math>q(i|j,a)</math>: <math>\mathcal{S}\times \mathcal{A} \rightarrow \triangle \mathcal{S}</math>, transition rate function;
+* <math>R(i,a)</math>: <math>\mathcal{S}\times \mathcal{A} \rightarrow \mathbb{R}</math>, a reward function.
+
+If the state space and action space are continuous,
+* <math>\mathcal{X}</math>: State space;
+* <math>\mathcal{U}</math>: Space of possible controls;
+* <math>f(x,u)</math>: <math>\mathcal{X}\times \mathcal{U} \rightarrow \triangle \mathcal{X}</math>, a transition rate function;
+* <math>r(x,u)</math>: <math>\mathcal{X}\times \mathcal{U} \rightarrow \mathbb{R}</math>, a reward rate function such that <math>r(x(t),u(t))dt=dR(x(t),u(t))</math>, where <math>R(x,u)</math> is the reward function we discussed in the previous case.
+
+===Problem===
+As in discrete-time Markov Decision Processes, in Continuous-time Markov Decision Processes we want to find the optimal ''policy'' or ''control'' which could give us the optimal expected integrated reward:
+:<math>\max \quad \mathbb{E}_u\left[\int_0^{\infty}\gamma^t r(x(t),u(t))\,dt \,\Big|\, x_0\right]</math>
+where <math>0\leq\gamma< 1</math>.
+
+===Linear programming formulation===
+If the state space and action space are finite, we could use linear programming to find the optimal policy, which was one of the earliest approaches applied. Here we only consider the ergodic model, which means our continuous-time MDP becomes an [[Ergodicity|ergodic]] continuous-time Markov Chain under a stationary [[policy]]. Under this assumption, although the decision maker can make a decision at any time at the current state, he could not benefit more by taking more than one action. It is better for him to take an action only at the time when the system is transitioning from the current state to another state.
Under some conditions (for details, check Corollary 3.14 of [http://www.springer.com/mathematics/applications/book/978-3-642-02546-4 ''Continuous-Time Markov Decision Processes'']), if our optimal value function <math>V^*</math> is independent of the state <math>i</math>, we will have the following inequality:
+:<math>g\geq R(i,a)+\sum_{j\in S}q(j|i,a)h(j) \quad \forall i \in S \text{ and } a\in A(i)</math>
+If there exists a function <math>h</math>, then <math>\bar V^*</math> will be the smallest <math>g</math> satisfying the above equation. In order to find <math>\bar V^*</math>, we could use the following linear programming model:
+*Primal linear program (P-LP)
+:<math>
+\begin{align}
+\text{Minimize}\quad &g\\
+\text{s.t.} \quad & g-\sum_{j \in S}q(j|i,a)h(j)\geq R(i,a)\,\,
+\forall i\in S,\,a\in A(i)
+\end{align}
+</math>
+*Dual linear program (D-LP)
+:<math>
+\begin{align}
+\text{Maximize} &\sum_{i\in S}\sum_{a\in A(i)}R(i,a)y(i,a)\\
+\text{s.t.} &\sum_{i\in S}\sum_{a\in A(i)} q(j|i,a)y(i,a)=0 \quad
+\forall j\in S,\\
+& \sum_{i\in S}\sum_{a\in A(i)}y(i,a)=1,\\
+& y(i,a)\geq 0 \qquad \forall a\in A(i)\,\,\text{and}\,\, \forall i\in S
+\end{align}
+</math>
+<math>y(i,a)</math> is a feasible solution to the D-LP if <math>y(i,a)</math> is nonnegative and satisfies the constraints in the D-LP problem. A feasible solution <math>y^*(i,a)</math> to the D-LP is said to be an optimal solution if
+:<math>
+\begin{align}
+\sum_{i\in S}\sum_{a\in A(i)}R(i,a)y^*(i,a) \geq \sum_{i\in S}\sum_{a\in A(i)}R(i,a)y(i,a)
+\end{align}
+</math>
+for all feasible solutions <math>y(i,a)</math> to the D-LP.
+Once we have found the optimal solution <math>y^*(i,a)</math>, we can use it to establish the optimal policies.
+
+===Hamilton-Jacobi-Bellman equation===
+In continuous-time MDP, if the state space and action space are continuous, the optimal criterion could be found by solving the Hamilton–Jacobi–Bellman (HJB) partial differential equation. In order to discuss the HJB equation, we need to reformulate our problem:
+:<math>\begin{align} V(x(0),0)=&\max_u\int_0^T r(x(t),u(t))dt+D[x(T)]\\
+\text{s.t.}\quad & \frac{dx(t)}{dt}=f[t,x(t),u(t)]
+\end{align}
+</math>
+
+<math>D(\cdot)</math> is the terminal reward function, <math> x(t)</math> is the system state vector, <math> u(t)</math> is the system control vector we try to find, and <math>f(\cdot)</math> shows how the state vector changes over time. The Hamilton–Jacobi–Bellman equation is as follows:
+:<math>0=\max_u ( r(t,x,u) +\frac{\partial V(t,x)}{\partial x}f(t,x,u)) </math>
+We could solve the equation to find the optimal control <math>u(t)</math>, which could give us the optimal value <math>V^*</math>.
+
+===Application===
+Queueing systems, epidemic processes, and [[population process]]es.
+
+==Alternative notations==
+The terminology and notation for MDPs are not entirely settled. There are two main streams. One focuses on maximization problems from contexts like economics, using the terms action, reward, value, and calling the discount factor <math>\beta</math> or <math>\gamma</math>, while the other focuses on minimization problems from engineering and navigation, using the terms control, cost, cost-to-go, and calling the discount factor <math>\alpha</math>. In addition, the notation for the transition probability varies.
+
+{| border
+! in this article !! alternative !!
comment +|- +| action <math>a</math> || control <math>u</math> || +|- +| reward <math>R</math> || cost <math>g</math> +| <math>g</math> is the negative of <math>R</math> +|- +| value <math>V</math> || cost-to-go <math>J</math> +| <math>J</math> is the negative of <math>V</math> +|- +| policy <math>\pi</math> || policy <math>\mu</math> || +|- +| discounting factor <math>\ \gamma \ </math> || discounting factor <math>\alpha</math> || +|- +| transition probability <math>P_a(s,s')</math> || transition probability <math>p_{ss'}(a)</math> || +|} + +In addition, transition probability is sometimes written <math>Pr(s,a,s')</math>, <math>Pr(s'|s,a)</math> or, rarely, <math>p_{s's}(a).</math> + +==See also== +* [[Partially observable Markov decision process]] +* [[Dynamic programming]] +* [[Bellman equation]] for applications to economics. +* [[Hamilton–Jacobi–Bellman equation]] +* [[Optimal control]] +* [[Recursive Economics]] +* [[Mabinogion sheep problem]] +* [[Stochastic games]] +* [[Q-learning]] + +==Notes== +{{reflist}} + +==References== +{{refbegin}} +* R. Bellman. [http://www.iumj.indiana.edu/IUMJ/FULLTEXT/1957/6/56038 ''A Markovian Decision Process'']. Journal of Mathematics and Mechanics 6, 1957. +* R. E. Bellman. ''Dynamic Programming''. Princeton University Press, Princeton, NJ, 1957. Dover paperback edition (2003), ISBN 0-486-42809-5. +* Ronald A. Howard ''Dynamic Programming and Markov Processes'', The M.I.T. Press, 1960. +* D. Bertsekas. Dynamic Programming and Optimal Control. Volume 2, Athena, MA, 1995. +* Burnetas, A.N. and M. N. Katehakis. "Optimal Adaptive Policies for Markov Decision Processes'', Mathematics of Operations Research, 22,(1), 1995. +*E.A. Feinberg and A. Shwartz (eds.) Handbook of Markov Decision Processes, Kluwer, Boston, MA, 2002. +* M. L. Puterman. ''Markov Decision Processes''. Wiley, 1994. +* H.C. Tijms. ''A First Course in Stochastic Models''. Wiley, 2003. +* Sutton, R. S. and Barto A. G. ''Reinforcement Learning: An Introduction''. The MIT Press, Cambridge, MA, 1998. +* J.A. E. E van Nunen. A set of successive approximation methods for discounted Markovian decision problems. Z. Operations Research, 20:203-208, 1976. +* S. P. Meyn, 2007. [https://netfiles.uiuc.edu/meyn/www/spm_files/CTCN/CTCN.html Control Techniques for Complex Networks], Cambridge University Press, 2007. ISBN 978-0-521-88441-9. Appendix contains abridged [https://netfiles.uiuc.edu/meyn/www/spm_files/book.html Meyn & Tweedie]. +* S. M. Ross. 1983. Introduction to stochastic dynamic programming. Academic press +* X. Guo and O. Hernández-Lerma. [http://www.springer.com/mathematics/applications/book/978-3-642-02546-4 ''Continuous-Time Markov Decision Processes''], Springer, 2009. +* M. L. Puterman and Shin M. C. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, ''Management Science'' 24, 1978. +{{refend}} + +==External links== +* [http://www.ai.mit.edu/~murphyk/Software/MDP/mdp.html MDP Toolbox for Matlab] - An excellent tutorial and Matlab toolbox for working with MDPs. +* [http://www.cs.ualberta.ca/~sutton/book/ebook Reinforcement Learning] An Introduction by Richard S. Sutton and Andrew G. Barto +* [http://www.cs.uwaterloo.ca/~jhoey/research/spudd/index.php SPUDD] A structured MDP solver for download by Jesse Hoey +* [http://www.eecs.umich.edu/~baveja/Papers/Thesis.ps.gz Learning to Solve Markovian Decision Processes] by [http://www.eecs.umich.edu/~baveja/ Satinder P. 
Singh] +* [http://www.jstor.org/stable/3690147 Optimal Adaptive Policies for Markov Decision Processes] by Burnetas and Katehakis (1997). + +[[Category:Optimal decisions]] +[[Category:Dynamic programming]] +[[Category:Markov processes]] +[[Category:Stochastic control]] + 1ztq3b3q9cpoib9uyybg84degajg57z + + + + Hilbert series and Hilbert polynomial + 0 + 14811 + + 14812 + 2013-11-15T19:10:19Z + + 144.118.113.215 + + /* Degree of a projective variety and Bézout's theorem */ + wikitext + text/x-wiki + Given a [[graded algebra|graded commutative algebra]] finitely generated over a [[field (mathematics)|field]], the '''[[David Hilbert|Hilbert]] function''', '''Hilbert polynomial''', and '''Hilbert series''' are three strongly related notions which measure the growth of the dimension of its homogeneous components. + +These notions have been extended to [[filtered algebra]]s and graded filtered [[modules]] over these algebras. + +The typical situations where these notions are used are the following: +* The quotient by a homogeneous [[Ideal (ring theory)|ideal]] of a [[multivariate polynomial]] [[polynomial ring|ring]], graded by the total degree. +* The quotient by an [[Ideal (ring theory)|ideal]] of a [[multivariate polynomial]] [[polynomial ring|ring]], filtered by the total degree. +* The filtration of a [[local ring]] by the powers of its [[maximal ideal]]. In this case the Hilbert polynomial is called the [[Hilbert–Samuel function|Hilbert–Samuel polynomial]]. + +The Hilbert series of an algebra or a module is a special case of the [[Hilbert–Poincaré series]] of a [[graded vector space]]. + +Hilbert polynomial and Hilbert series are important in computational [[algebraic geometry]], as they are the easiest known way for computing the dimension and the degree of an algebraic variety defined by explicit polynomial equations. + +== Definitions and main properties== + +Let us consider a finitely generated [[graded algebra|graded commutative algebra]] ''S'' over a [[field (mathematics)|field]] ''K'', which is finitely generated by elements of positive degree. This means that +:<math>S = \bigoplus_{i \ge 0} S_i\ </math> +and that <math>S_0=K</math>. + +The '''Hilbert function''' +:<math>HF_S\;:\;n\mapsto \dim_K\,S_n</math> +maps the integer ''n'' onto the dimension of the ''K''-vector space ''S''<sub>''n''</sub>. The '''Hilbert series''', which is called [[Hilbert–Poincaré series]] in the more general setting of graded vector spaces, is the [[formal series]] +:<math>HS_S(t)=\sum_{n=0}^{\infty} HF_S(n)\,t^n.</math> + +If ''S'' is generated by ''h'' homogeneous elements of positive degrees <math>d_1, \ldots, d_h</math>, then the sum of the Hilbert series is a rational fraction + +:<math>HS_S(t)=\frac{Q(t)}{\prod_{i=1}^h (1-t^{d_i})}\,,</math> + +where ''Q'' is a polynomial with integer coefficients. + +If ''S'' is generated by elements of degrees 1 then the sum of the Hilbert series may be rewritten as + +:<math>HS_S(t)=\frac{P(t)}{(1-t)^\delta}\,,</math> + +where ''P'' is a polynomial with positive integer coefficients. + +In this case the series expansion of this rational fraction is + +:<math>HS_S(t)=P(t)\,\left(1+\delta\,t+\cdots +\binom{n+\delta-1}{\delta-1}\,t^n+\cdots\right)</math> + +where the [[binomial coefficient]] <math>\binom{n+\delta-1}{\delta-1}</math> is <math>\;\frac{(n+\delta-1)(n+\delta-2)\cdots n}{(\delta-1)!}\;</math> for <math>n>-\delta</math> and 0 otherwise. 
+
+This shows that there exists a unique polynomial <math>HP_S(n)</math> with rational coefficients which is equal to <math>HF_S(n)</math> for <math> n\ge \deg P-\delta+1</math>. This polynomial is the '''Hilbert polynomial'''. The least ''n''<sub>0</sub> such that <math>HP_S(n)=HF_S(n)</math> for ''n''&nbsp;≥&nbsp;''n''<sub>0</sub> is called the '''Hilbert regularity'''. It may be lower than <math>\deg P-\delta+1</math>.
+
+The Hilbert polynomial is a [[numerical polynomial]], since the dimensions are integers, but the polynomial almost never has integer coefficients {{harv|Schenck|2003|pp=41}}.
+
+All these definitions may be extended to finitely generated [[graded module]]s over ''S'', with the only difference that a factor ''t''<sup>''m''</sup> appears in the Hilbert series, where ''m'' is the minimal degree of the generators of the module, which may be negative.
+
+The '''Hilbert function''', the '''Hilbert series''' and the '''Hilbert polynomial''' of a [[filtered algebra]] are those of the associated graded algebra.
+
+The Hilbert polynomial of a [[projective variety]] ''V'' in '''P'''<sup>''n''</sup> is defined as the Hilbert polynomial of the [[homogeneous coordinate ring]] of ''V''.
+
+== Graded algebra and polynomial rings ==
+
+Polynomial rings and their quotients by homogeneous ideals are typical graded algebras. Conversely, if ''S'' is a graded algebra generated over the field ''K'' by ''n'' homogeneous elements ''g''<sub>1</sub>, ..., ''g''<sub>''n''</sub> of degree 1, then the map which sends ''X''<sub>''i''</sub> onto ''g''<sub>''i''</sub> defines a homomorphism of graded rings from <math>R_n=K[X_1,\ldots, X_n]</math> onto ''S''. Its [[Kernel (algebra)|kernel]] is a homogeneous ideal ''I'' and this defines an isomorphism of graded algebras between <math>R_n/I</math> and ''S''.
+
+Thus, the graded algebras generated by elements of degree 1 are exactly, up to an isomorphism, the quotients of polynomial rings by homogeneous ideals. Therefore, the remainder of this article will be restricted to the quotients of polynomial rings by ideals.
+
+== Properties of Hilbert series ==
+
+=== Additivity ===
+Hilbert series and Hilbert polynomials are additive with respect to [[exact sequence]]s. More precisely, if
+:<math>0 \;\rightarrow\; A\;\rightarrow\; B\;\rightarrow\; C \;\rightarrow\; 0</math>
+is an exact sequence of graded or filtered modules, then we have
+:<math>HS_B=HS_A+HS_C</math>
+and
+:<math>HP_B=HP_A+HP_C.</math>
+This follows immediately from the same property for the dimension of vector spaces.
+
+=== Quotient by a non-zero divisor ===
+
+Let ''A'' be a graded algebra and ''f'' a homogeneous element of degree ''d'' in ''A'' which is not a [[zero divisor]]. Then we have
+:<math>HS_{A/f}(t)=(1-t^d)\,HS_A(t)\,.</math>
+This follows immediately from applying additivity to the exact sequence
+:<math>0 \;\rightarrow\; A^{[d]}\; \xrightarrow{f}\; A \;\rightarrow\; A/f\rightarrow\; 0\,,</math>
+where the arrow labeled ''f'' is the multiplication by ''f'' and <math>A^{[d]}</math> is the graded algebra, which is obtained from ''A'' by shifting the degrees by ''d'', in order that the multiplication by ''f'' has degree 0.
This implies that <math>HS_{A^{[d]}}(t)=t^d\,HS_A(t)\,.</math>
+
+=== Hilbert series and Hilbert polynomial of a polynomial ring ===
+
+The Hilbert series of the polynomial ring <math>R_n=K[x_0, \ldots, x_n]</math> is
+:<math>HS_{R_n}(t) = \frac{1}{(1-t)^{n+1}}\,.</math>
+It follows that the Hilbert polynomial is
+: <math> HP_{R_n}(k) = {{k+n}\choose{n}} = \frac{(k+1)\cdots(k+n)}{n!}\,.</math>
+
+The proof that the Hilbert series has this simple form is obtained by recursively applying the previous formula for the quotient by a non-zero divisor (here <math>x_n</math>) and remarking that <math>HS_K(t)=1\,.</math>
+
+=== Shape of the Hilbert series and dimension ===
+
+A graded algebra ''A'' generated by homogeneous elements of degree 1 has [[Krull dimension]] zero if the maximal homogeneous ideal, that is, the ideal generated by the homogeneous elements of degree 1, is [[nilpotent ideal|nilpotent]]. This implies that the dimension of ''A'' as a ''K''-vector space is finite and the Hilbert series of ''A'' is a polynomial ''P''(''t'') such that ''P''(1) is equal to the dimension of ''A'' as a ''K''-vector space.
+
+If the Krull dimension of ''A'' is positive, there is a homogeneous element ''f'' of degree one which is not a zero divisor (in fact almost all elements of degree one have this property). The Krull dimension of ''A''/''f'' is the Krull dimension of ''A'' minus one.
+
+The additivity of Hilbert series shows that <math>HS_{A/f}(t)=(1-t)\,HS_A(t)</math>. Iterating this a number of times equal to the Krull dimension of ''A'', we eventually get an algebra of dimension 0 whose Hilbert series is a polynomial ''P''(''t''). This shows that the Hilbert series of ''A'' is
+:<math>HS_A(t)=\frac{P(t)}{(1-t)^d}</math>
+where the polynomial ''P''(''t'') is such that ''P''(1)&nbsp;≠&nbsp;0 and ''d'' is the Krull dimension of ''A''.
+
+This formula for the Hilbert series implies that the degree of the Hilbert polynomial is ''d'' and that its leading coefficient is ''P''(1)/''d''!.
+
+== Degree of a projective variety and Bézout's theorem ==
+
+The Hilbert series allows us to compute the [[degree of an algebraic variety]] as the value at 1 of the numerator of the Hilbert series. This also provides a simple proof of [[Bézout's theorem]]. For this purpose, let us consider a [[projective algebraic set]] {{mvar|''V''}} defined as the set of the zeros of a [[homogeneous ideal]] <math>I\subset k[x_0, x_1, \ldots, x_n]</math>, where {{mvar|''k''}} is a field, and let <math> R=k[x_1, \ldots, x_n]/I</math> be the ring of the [[regular function]]s on the algebraic set (in this section, we do not require that the algebraic set be irreducible, nor that the ideal be prime).
+
+If the dimension of {{mvar|''V''}}, equal to the dimension of {{mvar|''R''}}, is {{mvar|d}}, the degree of {{mvar|''V''}} is the number of points of intersection, counted with multiplicity, of {{mvar|''V''}} with the intersection of <math>d</math> hyperplanes in [[general position]].
This implies that the equations of these hyperplanes, say <math>h_1, \ldots, h_{d},</math> are a [[regular sequence]], and that we have the exact sequences
+:<math>0 \;\rightarrow\; \left(R/\langle h_1,\ldots, h_{k-1}\rangle \right)^{[1]}\; \xrightarrow{h_k}\; R/\langle h_1,\ldots, h_{k-1}\rangle \;\rightarrow\; R/\langle h_1,\ldots, h_{k}\rangle\;\rightarrow\; 0,</math>
+for <math>k=1, \ldots, d.</math> This implies that
+:<math>HS_{R/\langle h_1,\ldots, h_{d}\rangle}(t) = (1-t)^d\,HS_R(t)</math>
+is a polynomial, which is equal to the numerator <math> P(t)</math> of the Hilbert series of {{mvar|R}}. After dehomogenizing by putting <math>x_0=1</math>, the [[Jordan-Hölder theorem]] for [[Artinian ring]]s allows one to prove that <math>P(1)</math> is the degree of the algebraic set {{mvar|''V''}}.
+
+Similarly, if {{mvar|''f''}} is a homogeneous polynomial of degree <math>\delta</math>, which is not a zero divisor in {{mvar|''R''}}, the exact sequence
+:<math>0 \;\rightarrow\; R^{[\delta]}\; \xrightarrow{f}\; R \;\rightarrow\; R/\langle f\rangle\;\rightarrow\; 0,</math>
+shows that
+:<math>HS_{R/\langle f \rangle}(t)=(1-t^\delta)HS_R(t).</math>
+Looking at the numerators, this proves the following generalization of Bézout's theorem:
+
+''If'' {{mvar|''f''}} ''is a homogeneous polynomial of degree'' <math>\delta</math>, ''which is not a zero divisor in'' {{mvar|''R''}}, ''then the degree of the intersection of'' {{mvar|''V''}} ''with the hypersurface defined by'' {{mvar|''f''}} ''is the product of the degree of'' {{mvar|''V''}} ''by'' <math>\delta</math>.
+
+The usual Bézout's theorem is easily deduced by starting from a hypersurface and intersecting it, one after the other, with <math>n-1</math> other hypersurfaces.
+
+== Computation of Hilbert series and Hilbert polynomial ==
+
+The Hilbert polynomial is easily deducible from the Hilbert series. This section describes how the Hilbert series may be computed in the case of a quotient of a polynomial ring, filtered or graded by the total degree.
+
+Thus let ''K'' be a field, <math>R=K[x_1,\ldots,x_n]</math> be a polynomial ring and ''I'' be an ideal in ''R''. Let ''H'' be the homogeneous ideal generated by the homogeneous parts of highest degree of the elements of ''I''. If ''I'' is homogeneous, then ''H''=''I''. Finally let ''B'' be a [[Gröbner basis]] of ''I'' for a [[monomial ordering]] refining the [[total degree]] partial ordering and ''G'' the (homogeneous) ideal generated by the leading monomials of the elements of ''B''.
+
+The computation of the Hilbert series is based on the fact that ''the filtered algebra R/I and the graded algebras R/H and R/G have the same Hilbert series''.
+
+Thus the computation of the Hilbert series is reduced, through the computation of a Gröbner basis, to the same problem for an ideal generated by monomials, which is usually much easier than the computation of the Gröbner basis. The [[computational complexity]] of the whole computation depends mainly on the regularity, which is the degree of the numerator of the Hilbert series. In fact the Gröbner basis may be computed by linear algebra over the polynomials of degree bounded by the regularity.
+
+The computation of Hilbert series and Hilbert polynomials is available in most [[computer algebra system]]s. For example, in both [[Maple (software)|Maple]] and [[Magma (software)|Magma]] these functions are named ''HilbertSeries'' and ''HilbertPolynomial''.
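+
+As a small worked illustration of the formulas above, the following Python sketch uses SymPy (an arbitrary choice of tool) to expand the Hilbert series of a hypersurface quotient; the values ''n'' = 2 and <math>\delta</math> = 3, describing a plane cubic curve, are chosen only for the example.
+
+ from sympy import symbols, series
+ t = symbols('t')
+ n, delta = 2, 3                        # K[x_0, x_1, x_2] and a form f of degree 3
+ HS_R = 1 / (1 - t)**(n + 1)            # Hilbert series of the polynomial ring
+ HS_Q = (1 - t**delta) * HS_R           # Hilbert series of R/<f> (f not a zero divisor)
+ poly = series(HS_Q, t, 0, 8).removeO()
+ print([poly.coeff(t, k) for k in range(8)])   # [1, 3, 6, 9, 12, 15, 18, 21]
+
+For large ''k'' these values agree with the Hilbert polynomial 3''k'', and writing the series over (1&nbsp;&minus;&nbsp;''t'')<sup>2</sup> gives the numerator 1&nbsp;+&nbsp;''t''&nbsp;+&nbsp;''t''<sup>2</sup>, whose value at ''t''&nbsp;=&nbsp;1 is 3, the degree of the curve, in agreement with the Bézout discussion above.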
+ +== References == +* {{Citation| last=Eisenbud | first=David | author-link=David Eisenbud | year=1995 | title=Commutative algebra. With a view toward algebraic geometry | volume=150 | series=Graduate Texts in Mathematics | place=New York | publisher=Springer-Verlag | id={{MathSciNet|id=1322960}} | isbn=0-387-94268-8}}. +* {{Citation| last=Schenck | first=Hal | title=Computational Algebraic Geometry | publisher=[[Cambridge University Press]] | location=[[Cambridge]] | isbn=978-0-521-53650-9 | id={{MathSciNet|id=011360}} | year=2003}} +* {{Citation| last=Stanley | first=Richard | author-link=Richard P. Stanley | year=1978 | title=Hilbert functions of graded algebras | periodical=Advances in Math. | volume=28 | issue=1 | pages=57–83 | id={{MathSciNet|id=0485835}}| doi=10.1016/0001-8708(78)90045-2 }}. + +[[Category:Commutative algebra]] +[[Category:Algebraic geometry]] + 8vnmv73qikhw4fokjr17rfc3y8sdmby + + + + Overlap–add method + 0 + 17318 + + 17319 + 2013-12-11T21:31:44Z + + Enkiwang + 0 + + /* The algorithm */ + wikitext + text/x-wiki + In [[signal processing]], the '''overlap–add method (OA, OLA)''' is an efficient way to evaluate the discrete [[convolution]] of a very long signal <math>x[n]</math> with a [[finite impulse response]] (FIR) filter <math>h[n]</math>''':''' + +:<math> +\begin{align} +y[n] = x[n] * h[n] \ \stackrel{\mathrm{def}}{=} \ \sum_{m=-\infty}^{\infty} h[m] \cdot x[n-m] += \sum_{m=1}^{M} h[m] \cdot x[n-m], +\end{align}</math> + +where ''h''[''m''] = 0 for ''m'' outside the region [1, ''M'']. + +The concept is to divide the problem into multiple convolutions of ''h''[''n''] with short segments of <math>x[n]</math>''':''' + +:<math>x_k[n] \ \stackrel{\mathrm{def}}{=} +\begin{cases} +x[n+kL] & n=1,2,\ldots,L\\ +0 & \textrm{otherwise}, +\end{cases} +</math> + +where ''L'' is an arbitrary segment length. Then''':''' + +:<math>x[n] = \sum_{k} x_k[n-kL],\,</math> + +and ''y''[''n''] can be written as a sum of short convolutions''':''' + +:<math> +\begin{align} +y[n] = \left(\sum_{k} x_k[n-kL]\right) * h[n] &= \sum_{k} \left(x_k[n-kL]* h[n]\right)\\ +&= \sum_{k} y_k[n-kL], +\end{align} +</math> + +where &nbsp;<math>y_k[n] \ \stackrel{\mathrm{def}}{=} \ x_k[n]*h[n]\,</math>&nbsp; is zero outside the region [1,&nbsp;''L''&nbsp;+&nbsp;''M''&nbsp;&minus;&nbsp;1]. &nbsp;And for any parameter &nbsp;<math>N\ge L+M-1,\,</math>&nbsp; it is equivalent to the <math>N\,</math>-point [[circular convolution]] of <math>x_k[n]\,</math> with <math>h[n]\,</math>&nbsp; in the region&nbsp;[1,&nbsp;''N'']. + +The advantage is that the [[circular convolution]] can be computed very efficiently as follows, according to the [[Discrete_Fourier_transform#Circular_convolution_theorem_and_cross-correlation_theorem|circular convolution theorem]]''':''' + +{{NumBlk|:|<math>y_k[n] = \textrm{IFFT}\left(\textrm{FFT}\left(x_k[n]\right)\cdot\textrm{FFT}\left(h[n]\right)\right)</math>|{{EquationRef|Eq.1}}}} + +where FFT and IFFT refer to the [[fast Fourier transform]] and inverse +fast Fourier transform, respectively, evaluated over <math>N</math> discrete +points. + +== The algorithm == + +[[Image:Depiction of overlap-add algorithm.png|frame|none|Figure 1: the overlap–add method]] + +Fig. 1 sketches the idea of the overlap–add method. The +signal <math>x[n]</math> is first partitioned into non-overlapping sequences, +then the [[discrete Fourier transform]]s of the sequences <math>y_k[n]</math> +are evaluated by multiplying the FFT of <math>x_k[n]</math> with the FFT of +<math>h[n]</math>. 
After recovering <math>y_k[n]</math> by inverse FFT, the resulting output signal is reconstructed by overlapping and adding the <math>y_k[n]</math> as shown in the figure. The overlap arises from the fact that a linear convolution is always longer than the original sequences. In the early days of development of the fast Fourier transform, <math>L</math> was often chosen to be a power of 2 for efficiency, but further development has revealed efficient transforms for larger prime factorizations of L, reducing computational sensitivity to this parameter. A [[pseudocode]] version of the algorithm is the following:
+
+ '''Algorithm 1''' (''OA for linear convolution'')
+ Evaluate the best value of N and L
+ H = FFT(h,N) <span style="color:green;">(''zero-padded FFT'')</span>
+ i = 1
+ '''while''' i <= Nx <span style="color:green;">(''Nx: the last index of x[n]'')</span>
+ il = min(i+L-1,Nx)
+ yt = IFFT( FFT(x(i:il),N) * H, N)
+ k = min(i+N-1,Nx)
+ y(i:k) = y(i:k) + yt(1:k-i+1) <span style="color:green;">(''add the overlapped output blocks'')</span>
+ i = i+L
+ '''end'''
+
+== Circular convolution with the overlap–add method ==
+
+When sequence ''x''[''n''] is periodic, and ''N''<sub>''x''</sub> is the period, then ''y''[''n''] is also periodic, with the same period. &nbsp;To compute one period of y[n], Algorithm 1 can first be used to convolve ''h''[''n''] with just one period of ''x''[''n'']. &nbsp;In the region ''M'' ≤ ''n'' ≤ ''N''<sub>''x''</sub>, &nbsp;the resultant ''y''[''n''] sequence is correct. &nbsp;And if the next ''M''&nbsp;&minus;&nbsp;1 values are added to the first ''M''&nbsp;&minus;&nbsp;1 values, then the region 1 ≤ ''n'' ≤ ''N''<sub>''x''</sub> will represent the desired convolution. The modified pseudocode is''':'''
+
+ '''Algorithm 2''' (''OA for circular convolution'')
+ Evaluate Algorithm 1
+ y(1:M-1) = y(1:M-1) + y(Nx+1:Nx+M-1)
+ y = y(1:Nx)
+ '''end'''
+
+== Cost of the overlap-add method ==
+
+The cost of the convolution can be associated with the number of complex multiplications involved in the operation. The major computational effort is due to the FFT operation, which for a radix-2 algorithm applied to a signal of length <math>N</math> roughly calls for <math>C=\frac{N}{2}\log_2 N</math> complex multiplications. It turns out that the number of complex multiplications of the overlap-add method is:
+
+:<math>C_{OA}=\left\lceil \frac{N_x}{N-M+1}\right\rceil N\left(\log_2 N+1\right)\,</math>
+
+<math>C_{OA}</math> accounts for the FFT+filter multiplication+IFFT operation.
+
+The additional cost of the <math>M_L</math> sections involved in the circular version of the overlap–add method is usually very small and can be neglected for the sake of simplicity. The best value of <math>N</math> can be found by numerical search of the minimum of <math>C_{OA}\left(N\right)=C_{OA}\left(2^m \right)</math> by spanning the integer <math>m</math> in the range <math>\log_2\left(M\right)\le m\le\log_2 \left(N_x\right)</math>. Since <math>N</math> is a power of two, the FFTs of the overlap–add method are computed efficiently. Once the value of <math>N</math> has been evaluated, it turns out that the optimal partitioning of <math>x[n]</math> has <math>L=N-M+1</math>.
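+
+As a working counterpart to the pseudocode of Algorithm 1 above, the following Python/NumPy sketch performs the same overlap–add linear convolution for real-valued signals; here the FFT size is simply taken as the next power of two not smaller than L&nbsp;+&nbsp;M&nbsp;&minus;&nbsp;1 for a given block length L, an illustrative simplification rather than the cost-optimal choice of <math>N</math> just described.
+
+ import numpy as np
+ def overlap_add(x, h, L=1024):
+     """Linear convolution of x with the FIR filter h by the overlap-add method."""
+     M = len(h)
+     N = 1 << int(np.ceil(np.log2(L + M - 1)))   # FFT size: power of two >= L + M - 1
+     H = np.fft.rfft(h, N)                       # zero-padded FFT of the filter
+     y = np.zeros(len(x) + M - 1)
+     for i in range(0, len(x), L):
+         yt = np.fft.irfft(np.fft.rfft(x[i:i + L], N) * H, N)
+         k = min(i + N, len(y))
+         y[i:k] += yt[:k - i]                    # add the overlapped output blocks
+     return y
+ # The result agrees with direct linear convolution:
+ x, h = np.random.randn(10000), np.random.randn(101)
+ assert np.allclose(overlap_add(x, h), np.convolve(x, h))
+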
+For comparison, the cost of the standard circular convolution of <math>x[n]</math> and <math>h[n]</math> is:
+
+:<math>C_S=N_x\left(\log_2 N_x+1\right)\,</math>
+
+Hence the cost of the overlap–add method scales almost as <math>O\left(N_x\log_2 N\right)</math> while the cost of the standard circular convolution method is almost <math>O\left(N_x\log_2 N_x \right)</math>. However, such functions account only for the cost of the complex multiplications, regardless of the other operations involved in the algorithm. A direct measure of the computational time required by the algorithms is of much interest. Fig. 2 shows the ratio of the time measured to evaluate a standard circular convolution using &nbsp;{{EquationNote|Eq.1}} to the time taken by the same convolution using the overlap–add method in the form of Algorithm 2, vs. the sequence and the filter length. Both algorithms have been implemented under [[Matlab]]. The bold line represents the boundary of the region where the overlap–add method is faster (ratio>1) than the standard circular convolution. Note that, in the tested cases, the overlap–add method can be three times faster than the standard method.
+
+[[Image:gain oa method.png|frame|none|Figure 2: Ratio between the time required by &nbsp;{{EquationNote|Eq.1}} and the time required by the overlap–add Alg. 2 to evaluate a complex circular convolution, vs the sequence length <math>N_x</math> and the filter length <math>M</math>.]]
+
+== See also ==
+
+*[[Overlap–save method]]
+
+== References ==
+
+*{{Cite book
+ | author=Rabiner, Lawrence R.; Gold, Bernard
+ | authorlink=
+ | coauthors=
+ | title=Theory and application of digital signal processing
+ | year=1975
+ | publisher=Prentice-Hall
+ | location=Englewood Cliffs, N.J.
+ | isbn=0-13-914101-4
+ | pages=63–67
+}}
+*{{Cite book
+ | author=Oppenheim, Alan V.; Schafer, Ronald W.
+ | authorlink=
+ | coauthors=
+ | title=Digital signal processing
+ | year=1975
+ | publisher=Prentice-Hall
+ | location=Englewood Cliffs, N.J.
+ | isbn=0-13-214635-5
+ | pages=
+}}
+*{{Cite book
+ | author=Hayes, M.
Horace + | authorlink= + | coauthors= + | title = Digital Signal Processing + | series = Schaum's Outline Series + | year=1999 + | publisher=McGraw Hill + | location=New York + | isbn=0-07-027389-8 + | pages= +}} + +== External links == + +{{DEFAULTSORT:Overlap-Add Method}} +[[Category:Signal processing]] +[[Category:Transforms]] +[[Category:Fourier analysis]] +[[Category:Numerical analysis]] + 1z9alglnah1axhcmon51jy5yda5rjo2 + + + diff --git a/mathosphere-core/src/test/resources/com/formulasearchengine/mathosphere/mlp/gold/gold.json b/mathosphere-core/src/test/resources/com/formulasearchengine/mathosphere/mlp/gold/gold.json new file mode 100644 index 00000000..9d12be1e --- /dev/null +++ b/mathosphere-core/src/test/resources/com/formulasearchengine/mathosphere/mlp/gold/gold.json @@ -0,0 +1,2515 @@ +[ + { + "definitions": { + "W": [ + { + "Q7913892": "Van der Waerden number" + } + ], + "k": [ + { + "Q12503": "integer (number that can be written without a fractional or decimal component)" + } + ], + "\\varepsilon": [ + { + "Q3176558": "positive number (real number strictly greater than zero)" + } + ] + }, + "formula": { + "qID": "1", + "oldId": "2459", + "fid": "3", + "math_inputtex": "W(2, k) > 2^k/k^\\varepsilon", + "title": "Van_der_Waerden's_theorem" + } + }, + { + "definitions": { + "X": [ + { + "Q36161": "set (fundamental mathematical concept related to the notions of belonging or inclusion)" + } + ], + "\\Sigma": [ + { + "Q739925": "Family of sets (a collection of some of the subsets of a set)" + } + ] + }, + "formula": { + "qID": "2", + "oldId": "4050", + "fid": "189", + "math_inputtex": "(X,\\Sigma)", + "title": "Bounded_variation" + } + }, + { + "definitions": { + "p": [ + { + "Q49008": " prime number (natural number greater than 1 that has no positive divisors other than 1 and itsel)" + } + ], + "n": [ + { + "Q12503": "integer (number that can be written without a fractional or decimal component)" + } + ] + }, + "formula": { + "qID": "3", + "oldId": "4189", + "fid": "50", + "math_inputtex": "(p-1)!^n", + "title": "Lindemann–Weierstrass_theorem" + } + }, + { + "definitions": { + "f_{c}": [ + { + "Q5156597": "Complex quadratic polynomial" + } + ], + "z": [ + { + "Q11567": "complex number (number that can be put in the form a + bi, where a and b are real numbers and i is called the imaginary unit )" + } + ], + "c": [ + { + "Q1413083": "parameter" + }, + { + "Q50700": "coefficient (number just before a variable)" + } + ] + }, + "formula": { + "qID": "4", + "oldId": "15332", + "fid": "68", + "math_inputtex": "f_c(z) = z^2 + c", + "title": "Orbit_portrait" + } + }, + { + "definitions": { + "x": [ + { + "Q50701": "variable (a value that can change, usually with a context of an equation or operation)" + }, + { + "Q935944": "Free variables and bound variables" + } + ], + "y": [ + { + "Q50701": "variable (a value that can change, usually with a context of an equation or operation)" + }, + { + "Q935944": "Free variables and bound variables" + } + ], + "P": [ + { + "Q1144319": "Predicate" + } + ] + }, + "formula": { + "qID": "5", + "oldId": "391", + "fid": "114", + "math_inputtex": "\\forall x \\, \\forall y \\, P(x,y) \\Leftrightarrow \\forall y \\, \\forall x \\, P(x,y)", + "title": "First-order_logic" + } + }, + { + "definitions": { + "\\alpha": [ + { + "Q50700": "coefficient (number just before a variable)" + } + ], + "x": [ + { + "Q3150667": "Indeterminate " + } + ] + }, + "formula": { + "qID": "6", + "oldId": "6937", + "fid": "5", + "math_inputtex": "\\alpha(x)", + "title": 
"Clenshaw_algorithm" + } + }, + { + "definitions": { + "\\alpha": [ + { + "Q2256802": "Schwellenwert", + "Q729113": "Weighted mean" + } + ], + "x": [ + { + "Q21199": "natural number (numbers used for counting and ordering)" + } + ] + }, + "formula": { + "qID": "7", + "oldId": "25610", + "fid": "24", + "math_inputtex": "\\alpha(x)", + "title": "Isolation_lemma" + } + }, + { + "definitions": { + "\\alpha": [ + "exponent (of the Hölder condition)", + { + "Q33456": "exponentiation (mathematical operation)" + } + ], + "x": [ + { + "Q44946": "point (fundamental object of Euclidean geometry)" + } + ] + }, + "formula": { + "qID": "8", + "oldId": "27419", + "fid": "4", + "math_inputtex": "\\alpha(x)", + "title": "Singularity_spectrum" + } + }, + { + "definitions": { + "\\Psi": [ + { + "Q230883": "quantum state (state of a quantum system )" + } + ], + "i_{1}": [ + { + "Q21199": "natural number (numbers used for counting and ordering)" + } + ], + "i_{2}": [ + { + "Q21199": "natural number (numbers used for counting and ordering)" + } + ], + "\\alpha_{1}": [ + { + "Q50700": "coefficient (number just before a variable)" + } + ], + "\\alpha_{2}": [ + { + "Q50700": "coefficient (number just before a variable)" + } + ], + "\\Gamma": [ + { + "Q50700": "coefficient (number just before a variable)" + } + ], + "\\lambda": [ + { + "Q50700": "coefficient (number just before a variable)" + } + ], + "\\Phi": [ + "Schmidt vectors" + ], + "N": [ + { + "Q378201": "Qubit (unit of information)", + "Q302462": "Count" + } + ] + }, + "formula": { + "qID": "9", + "oldId": "14488", + "fid": "36", + "math_inputtex": "|{\\Psi}\\rangle=\\sum_{i_1,i_2,\\alpha_1,\\alpha_2}\\Gamma^{[1]i_1}_{\\alpha_1}\\lambda^{[1]}_{\\alpha_1}\\Gamma^{[2]i_2}_{\\alpha_1\\alpha_2}\\lambda^{[2]}_{{\\alpha}_2}|{i_1i_2}\\rangle|{\\Phi^{[3..N]}_{\\alpha_2}}\\rangle", + "title": "Time-evolving_block_decimation" + } + }, + { + "definitions": { + "z": [ + { + "Q3913": "binary number (a system that represents numeric values using two symbols; 0 and 1)" + } + ], + "x": [ + { + "Q3913": "binary number (a system that represents numeric values using two symbols; 0 and 1)" + } + ], + "y": [ + { + "Q3913": "binary number (a system that represents numeric values using two symbols; 0 and 1)" + } + ] + }, + "formula": { + "qID": "10", + "oldId": "16784", + "fid": "38", + "math_inputtex": "z*x\\le y", + "title": "Monoidal_t-norm_logic" + } + }, + { + "definitions": { + "x": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "c": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ] + }, + "formula": { + "qID": "11", + "oldId": "16716", + "fid": "20", + "math_inputtex": " \\frac{d}{dx}\\left( \\log_c x\\right) = {1 \\over x \\ln c} , \\qquad c > 0, c \\ne 1", + "title": "Differentiation_rules" + } + }, + { + "definitions": { + "\\theta": [ + { + "Q11352": "angle ( figure formed by two rays)" + } + ], + "n": [ + { + "Q2303886": "index (the numbering of various objects within the mathematical notation)" + } + ] + }, + "formula": { + "qID": "12", + "oldId": "2875", + "fid": "2", + "math_inputtex": "\\theta = n \\times 137.508^\\circ,", + "title": "Fermat's_spiral" + } + }, + { + "definitions": { + "s_{V}": [ + "Kemeny-Young score" + ], + "\\mathcal{R}": [ + { + "Q526719": "ranking (relationship between items in a set)" + } + ] + }, + "formula": { + "qID": "13", + "oldId": "9319", + "fid": "2", + "math_inputtex": "s_V(\\mathcal{R})", + "title": "Consistency_criterion" + } + }, + { + "definitions": { + "\\ell": [ + { + 
"Q11348": "function (binary relation, which is left-total and right-unique)" + } + ], + "m": [ + { + "Q1027788": "Argument of a function (independent variable of math function)" + } + ] + }, + "formula": { + "qID": "14", + "oldId": "21982", + "fid": "43", + "math_inputtex": "\\ell(m)", + "title": "Basis_(universal_algebra)" + } + }, + { + "definitions": { + "b": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "x": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ] + }, + "formula": { + "qID": "15", + "oldId": "26321", + "fid": "4", + "math_inputtex": "bx-x^2", + "title": "Adequality" + } + }, + { + "definitions": { + "\\omega_{k}": [ + { + "Q946764": "Natural frequency" + } + ] + }, + "formula": { + "qID": "16", + "oldId": "13492", + "fid": "85", + "math_inputtex": "\\omega_{k}", + "title": "Mason–Weaver_equation" + } + }, + { + "definitions": { + "\\mathbf{m}_{1}": [ + "class mean", + { + "Q19033": "arithmetic mean (sum of a collection of numbers divided by the number of numbers in the collection)" + } + ] + }, + "formula": { + "qID": "17", + "oldId": "27275", + "fid": "2", + "math_inputtex": "\\mathbf{m}_1", + "title": "Kernel_Fisher_discriminant_analysis" + } + }, + { + "definitions": { + "r_{ij}": [ + { + "Q126017": "distance (straight line that connects two points)" + } + ] + }, + "formula": { + "qID": "18", + "oldId": "15283", + "fid": "17", + "math_inputtex": "r_{ij}", + "title": "Implicit_solvation" + } + }, + { + "definitions": { + "Z": [ + "canonical partition function", + { + "Q230963": "Partition function (statistical mechanics) " + } + ], + "j": [ + { + "Q2303886": "index (the numbering of various objects within the mathematical notation )" + } + ], + "g_{j}": [ + "degeneracy factor" + ], + "\\mathrm{e}": [ + { + "Q168698": "exponential function (unique function which is its own derivative and equals one at zero)" + } + ], + "\\beta": [ + "inverse temperature", + { + "Q917476": "Thermodynamic beta" + } + ], + "E_{j}": [ + "energy level" + ] + }, + "formula": { + "qID": "19", + "oldId": "3897", + "fid": "2", + "math_inputtex": " Z = \\sum_{j} g_j \\cdot \\mathrm{e}^{- \\beta E_j}", + "title": "Partition_function_(statistical_mechanics)" + } + }, + { + "definitions": { + "S'": [ + "tristimulus values" + ] + }, + "formula": { + "qID": "20", + "oldId": "8808", + "fid": "39", + "math_inputtex": "S'", + "title": "Color_balance" + } + }, + { + "definitions": { + "S'": [ + { + "Q17285": "plane (flat, two-dimensional surface)" + }, + "plane", + "geometric surface", + "area", + "2-dimensional manifold" + ] + }, + "formula": { + "qID": "21", + "oldId": "17336", + "fid": "30", + "math_inputtex": "S'", + "title": "Hilbert's_theorem_(differential_geometry)" + } + }, + { + "definitions": { + "k": [ + { + "Q1663694": "inclusion map" + } + ], + "l": [ + { + "Q1663694": "inclusion map" + } + ], + "i": [ + { + "Q1663694": "inclusion map" + } + ], + "j": [ + { + "Q1663694": "inclusion map" + } + ] + }, + "formula": { + "qID": "22", + "oldId": "4416", + "fid": "4", + "math_inputtex": "\\text{Ker} (k_* - l_*) \\cong \\text{Im} (i_*, j_*).", + "title": "Mayer–Vietoris_sequence" + } + }, + { + "definitions": { + "D": [ + "relative graphlet frequency distance" + ], + "G": [ + "graphlets", + "graphs", + "graph", + { + "Q5597315": "Graphlets" + } + ], + "H": [ + "graphlets", + "graphs", + "graph", + { + "Q5597315": "Graphlets" + } + ], + "i": [ + "number", + { + "Q2303886": "index (the numbering of various objects within the mathematical notation)" + 
} + ] + }, + "formula": { + "qID": "23", + "oldId": "25343", + "fid": "3", + "math_inputtex": "D(G,H) = \\sum_{i=1}^{29} | F_i(G) - F_i(H) |", + "comments": "F_i is a substitution", + "title": "Graphlets" + } + }, + { + "definitions": { + "E_{\\mathrm{k}}": [ + "total kinetic energy", + { + "Q46276": "kinetic energy" + } + ], + "E_{\\mathrm{r}}": [ + "rotational energy", + "angular kinetic energy", + { + "Q2140940": "Rotational energy" + } + ], + "E_{\\mathrm{t}}": [ + "translational kinetic energy" + ] + }, + "formula": { + "qID": "24", + "oldId": "566", + "fid": "28", + "math_inputtex": " E_\\text{k} = E_t + E_\\text{r} \\, ", + "title": "Kinetic_energy" + } + }, + { + "definitions": { + "\\lambda": [ + "length", + { + "Q36253": "length (measured dimension of an object)" + } + ], + "L": [ + { + "Q1096885": "lattice (subgroup of a real vector space or a Lie group)" + }, + "lattice" + ], + "B": [ + { + "Q189569": "basis" + } + ], + "d": [ + "number", + { + "Q3176558": "positive number (real number strictly greater than zero)" + } + ] + }, + "formula": { + "qID": "25", + "oldId": "22618", + "fid": "22", + "math_inputtex": "\\lambda(L(B)) \\leq d", + "title": "Lattice_problem" + } + }, + { + "definitions": { + "L": [ + "weighted path length" + ], + "C": [ + { + "Q188889": "code (system of rules to convert information (codification))" + } + ], + "T": [ + { + "Q188889": "code (system of rules to convert information (codification))" + } + ] + }, + "formula": { + "qID": "26", + "oldId": "485", + "fid": "10", + "math_inputtex": "L\\left(C\\right) \\leq L\\left(T\\right)", + "title": "Huffman_coding" + } + }, + { + "definitions": { + "v": [ + { + "Q13824": "phase velocity (rate at which the phase of the wave propagates in space)" + } + ], + "c": [ + { + "Q2111": "speed of light (speed at which all massless particles and associated fields travel in vacuum)" + } + ], + "n": [ + { + "Q174102": "Refractive index (optical characteristic of a material)" + } + ] + }, + "formula": { + "qID": "27", + "oldId": "2623", + "fid": "0", + "math_inputtex": "v = \\frac{c}{n}", + "title": "Dispersion_(optics)" + } + }, + { + "definitions": { + "\\sigma_{y}": [ + "Allan deviation", + { + "Q1440227": "Allan variance" + } + ], + "\\tau": [ + "observation time" + ], + "\\pi": [ + { + "Q167": "pi (ratio of the circumference of a circle to its diameter)" + } + ], + "h_{-2}": [ + "Power coefficient" + ] + }, + "formula": { + "qID": "28", + "oldId": "1227", + "fid": "93", + "math_inputtex": "\\sigma_y^2(\\tau) = \\frac{2\\pi^2\\tau}{3}h_{-2}", + "title": "Allan_variance" + } + }, + { + "definitions": { + "R_{\\text{s normal}}": [ + "surface resistance" + ], + "\\omega": [ + "resonant frequency" + ], + "\\mu_{0}": [ + { + "Q1515261": "vacuum permeability (physical constant)" + } + ], + "\\sigma": [ + "electrical conductivity" + ] + }, + "formula": { + "qID": "29", + "oldId": "21692", + "fid": "5", + "math_inputtex": " R_{s\\ normal} = \\sqrt{ \\frac{\\omega \\mu_0} {2 \\sigma} }", + "title": "Superconducting_radio_frequency" + } + }, + { + "definitions": { + "\\phi_{1}": [ + "phase-range" + ] + }, + "formula": { + "qID": "30", + "oldId": "24277", + "fid": "5", + "math_inputtex": " \\phi_1 = -30^\\circ...+30^\\circ", + "title": "Vienna_rectifier" + } + }, + { + "definitions": { + "T_{c}": [ + { + "Q1128317": "tax rate (ratio (usually expressed as a percentage) at which a business or person is taxed)" + } + ] + }, + "formula": { + "qID": "31", + "oldId": "3625", + "fid": "19", + "math_inputtex": "T_c", + "title": 
"Modigliani–Miller_theorem" + } + }, + { + "definitions": { + "T_{c}": [ + "critical Temperature", + { + "Q111059": "critical point (critical point where phase boundaries disappear)" + } + ] + }, + "formula": { + "qID": "32", + "oldId": "17517", + "fid": "2", + "math_inputtex": "T_c", + "title": "Proximity_effect_(superconductivity)" + } + }, + { + "definitions": { + "T_{c}": [ + "critical surface", + { + "Q111059": "critical point (critical point where phase boundaries disappear)" + } + ] + }, + "formula": { + "qID": "33", + "oldId": "22538", + "fid": "9", + "math_inputtex": "T_c", + "title": "Multicritical_point" + } + }, + { + "definitions": { + "P_{1}": [ + { + "Q43260": "polynomial (mathematical expression consisting of variables and coefficients)" + } + ], + "X": [ + { + "Q3150667": "Indeterminate " + } + ], + "P": [ + { + "Q43260": "polynomial (mathematical expression consisting of variables and coefficients)" + } + ], + "\\alpha_{1}": [ + { + "Q11567": "complex number (number that can be put in the form a + bi, where a and b are real numbers and i is called the imaginary unit )" + } + ] + }, + "formula": { + "qID": "34", + "oldId": "17497", + "fid": "55", + "math_inputtex": "P_1(X)=P(X)/(X-\\alpha_1)", + "title": "Jenkins–Traub_algorithm" + } + }, + { + "definitions": { + "k": [ + { + "Q2095069": "normalizing constant" + } + ], + "n": [ + { + "Q21199": "natural number (numbers used for counting and ordering)" + } + ] + }, + "formula": { + "qID": "35", + "oldId": "5626", + "fid": "4", + "math_inputtex": "= \\frac{k}{n}.", + "title": "Doomsday_argument" + } + }, + { + "definitions": { + "n": [ + { + "Q21199": "natural number (numbers used for counting and ordering)" + } + ], + "i": [ + { + "Q2303886": "index (the numbering of various objects within the mathematical notation)" + } + ], + "r": [ + { + "Q2303886": "index (the numbering of various objects within the mathematical notation)" + } + ], + "p_{i}": [ + { + "Q1137759": " prime factor ( prime number dividing an integer )" + } + ], + "a_{i}": [ + "maximum power" + ] + }, + "formula": { + "qID": "36", + "oldId": "4630", + "fid": "12", + "math_inputtex": "n = \\prod_{i=1}^r p_i^{a_i}", + "title": "Divisor_function" + } + }, + { + "definitions": { + "H": [ + "system function", + "system response", + "transfer function" + ], + "j": [ + { + "Q193796": "imaginary unit (square root of negative one, used to define complex numbers)" + } + ], + "\\omega": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "\\mathcal{F}": [ + { + "Q6520159": "Fourier transform (mathematical transform that expresses a mathematical function of time as a function of frequency)" + } + ], + "h": [ + "impulse response" + ], + "t": [ + { + "Q11471": "time (dimension in which events can be ordered from the past through the present into the future)" + } + ] + }, + "formula": { + "qID": "37", + "oldId": "8338", + "fid": "87", + "math_inputtex": "H(j \\omega) = \\mathcal{F}\\{h(t)\\}", + "title": "LTI_system_theory" + } + }, + { + "definitions": { + "\\pi": [ + { + "Q167": "pi (ratio of the circumference of a circle to its diameter)" + } + ] + }, + "formula": { + "qID": "38", + "oldId": "1320", + "fid": "31", + "math_inputtex": "\\pi/4", + "title": "Phase-shift_keying" + } + }, + { + "definitions": { + "x": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "y": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "n": [ + { + "Q21199": "natural number (numbers used for counting and 
ordering)" + } + ], + "k": [ + { + "Q21199": "natural number (numbers used for counting and ordering)" + } + ] + }, + "formula": { + "qID": "39", + "oldId": "133", + "fid": "6", + "math_inputtex": "(x+y)^n = \\sum_{k=0}^n {n \\choose k}x^{n-k}y^k = \\sum_{k=0}^n {n \\choose k}x^{k}y^{n-k}.\n", + "title": "Binomial_theorem" + } + }, + { + "definitions": { + "A": [ + { + "Q3686031": "concentration (type of physical property)" + } + ], + "t": [ + { + "Q11471": "time (dimension in which events can be ordered from the past through the present into the future)" + } + ], + "k": [ + "reaction rate coefficient" + ] + }, + "formula": { + "qID": "40", + "oldId": "9082", + "fid": "33", + "math_inputtex": "\\ [A]_t = -kt + [A]_0", + "title": "Rate_equation" + } + }, + { + "definitions": { + "q": [ + { + "Q9492": "probability (measure of the expectation that an event will occur or a statement is true)" + } + ] + }, + "formula": { + "qID": "41", + "oldId": "3545", + "fid": "1", + "math_inputtex": "q^{42}", + "title": "Martingale_(betting_system)" + } + }, + { + "definitions": { + "\\alpha": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "d": [ + { + "Q4440864": "dimension (minimum number of coordinates within a space needed to specify any point)" + } + ], + "\\varepsilon": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ] + }, + "formula": { + "qID": "42", + "oldId": "15433", + "fid": "5", + "math_inputtex": "\\alpha(d) \\le \\left(\\sqrt{3/2} + \\varepsilon\\right)^d", + "title": "Borsuk's_conjecture" + } + }, + { + "definitions": { + "f^{\\mu}": [ + "4-acceleration" + ], + "G": [ + { + "Q18373": " gravitational constant (empirical physical constant)" + } + ], + "c": [ + { + "Q2111": "speed of light (speed at which all massless particles and associated fields travel in vacuum)" + } + ], + "A": [ + { + "Q1289248": "scalar (real numbers in the context auf linear algebra)" + } + ], + "T_{\\alpha\\beta}": [ + { + "Q876346": "Stress–energy tensor" + } + ], + "B": [ + { + "Q1289248": "scalar (real numbers in the context auf linear algebra)" + } + ], + "T": [ + "trace of the stress energy tensor" + ], + "\\eta_{\\alpha\\beta}": [ + "Minkowski metric", + { + "Q464794": "Minkowski spacetime (mathematical space setting which eases explanation of special relativity)" + } + ], + "\\delta _{\\nu}^{\\mu}": [ + { + "Q193794": "identity matrix (n × n square matrix with ones on the main diagonal and zeros elsewhere)" + }, + { + "Q192826": "Kronecker delta (function)" + } + ], + "u^{\\mu}": [ + "4-velocity" + ], + "u_{\\nu}": [ + "4-velocity" + ], + "u^{\\alpha}": [ + "4-velocity" + ], + "x^{\\nu}": [ + "4-position" + ], + "u^{\\beta}": [ + "4-velocity" + ] + }, + "formula": { + "qID": "43", + "oldId": "12471", + "fid": "96", + "math_inputtex": " f^{\\mu} = - 8\\pi { G \\over { 3 c^4 } } \\left ( {A \\over 2} T_{\\alpha \\beta} + {B \\over 2} T \\eta_{\\alpha \\beta} \\right ) \\left ( \\delta^{\\mu}_{\\nu} + u^{\\mu} u_{\\nu} \\right ) u^{\\alpha} x^{\\nu} u^{\\beta} ", + "comment": "einstein notation", + "title": "Theoretical_motivation_for_general_relativity" + } + }, + { + "definitions": { + "u_{g}": [ + "velocity field" + ], + "t": [ + { + "Q11471": "time (dimension in which events can be ordered from the past through the present into the future)" + } + ], + "f_{0}": [ + "Coriolis parameter" + ], + "v_{a}": [ + "velocity" + ], + "\\beta": [ + "latitudinal change" + ], + "y": [ + "y component" + ], + "v_{g}": [ + "geostrophic velocity", + { + "Q929043": 
"Geostrophic wind" + } + ] + }, + "formula": { + "qID": "44", + "oldId": "27611", + "fid": "0", + "math_inputtex": " \\frac{D_g u_g}{Dt} - f_{0}v_a - \\beta y v_g = 0 ", + "title": "Q-Vectors" + } + }, + { + "definitions": { + "I_{c}": [ + "neighbors indices" + ] + }, + "formula": { + "qID": "45", + "oldId": "22412", + "fid": "10", + "math_inputtex": "I_c", + "title": "Stencil_code" + } + }, + { + "definitions": { + "A": [ + "general multivector" + ], + "M": [ + "reflection versor", + { + "Q2518235": "Versor" + } + ], + "\\alpha": [ + { + "Q782566": "automorphism (isomorphism from a mathematical object to itself)" + } + ] + }, + "formula": { + "qID": "46", + "oldId": "460", + "fid": "81", + "math_inputtex": "\\, A \\mapsto M\\alpha(A)M^{-1} ,", + "title": "Geometric_algebra" + } + }, + { + "definitions": { + "\\Gamma_{\\infty}": [ + "undiscounted game", + { + "Q1074380": "stochastic game" + } + ] + }, + "formula": { + "qID": "47", + "oldId": "16803", + "fid": "53", + "math_inputtex": "\\Gamma_{\\infty}", + "title": "Stochastic_game" + } + }, + { + "definitions": { + "Y": [ + "hypercharge" + ], + "\\beta": [ + { + "Q1413083": "parameter" + } + ], + "I": [ + { + "Q1413083": "parameter" + } + ], + "T_{8}": [ + { + "Q1008943": "Gell-Mann matrices " + } + ], + "X": [ + "charge" + ] + }, + "formula": { + "qID": "48", + "oldId": "15858", + "fid": "3", + "math_inputtex": "Y = \\beta T_8 + I X", + "title": "331_model" + } + }, + { + "definitions": { + "\\mu": [ + "function" + ], + "A": [ + "sets of real numbers" + ] + }, + "formula": { + "qID": "49", + "oldId": "7473", + "fid": "12", + "math_inputtex": " \\mu (A)= \\begin{cases} 1 & \\mbox{ if } 0 \\in A \\\\ \n 0 & \\mbox{ if } 0 \\notin A.\n\\end{cases}", + "title": "Sigma_additivity" + } + }, + { + "definitions": { + "\\lambda_{in}": [ + "inner sphere reorganisation energy" + ] + }, + "formula": { + "qID": "50", + "oldId": "12130", + "fid": "56", + "math_inputtex": "\\lambda_{in}", + "title": "Marcus_theory" + } + }, + { + "definitions": { + "\\mathrm{rpm}_{\\text{motor}}": [ + { + "Q1256787": "rotational speed (physical quantity)" + } + ] + }, + "formula": { + "qID": "51", + "oldId": "16788", + "fid": "2", + "math_inputtex": "rpm_{motor}", + "title": "Centrifugal_fan" + } + }, + { + "definitions": { + "u_{1}": [ + "control" + ], + "\\mathbf{x}": [ + { + "Q13471665": "vector" + } + ], + "z_{1}": [ + { + "Q1289248": "scalar (real numbers in the context auf linear algebra)" + } + ], + "v_{1}": [ + "control" + ], + "\\dot{u}_{x}": [ + "control law" + ], + "V_{x}": [ + { + "Q2337858": "Lyapunov function" + } + ], + "g_{x}": [ + { + "Q11348": "function (binary relation, which is left-total and right-unique)" + } + ], + "k_{1}": [ + "gain parameter" + ], + "u_{x}": [ + { + "Q11348": "function (binary relation, which is left-total and right-unique)" + } + ], + "e_{1}": [ + "control" + ], + "f_{x}": [ + { + "Q11348": "function (binary relation, which is left-total and right-unique)" + } + ], + "\\dot{\\mathbf{x}}": [ + "subsystem" + ], + "t": [ + { + "Q11471": "time (dimension in which events can be ordered from the past through the present into the future)" + } + ] + }, + "formula": { + "qID": "52", + "oldId": "21960", + "fid": "157", + "math_inputtex": "\\underbrace{u_1(\\mathbf{x},z_1)=v_1+\\dot{u}_x}_{\\text{By definition of }v_1}=\\overbrace{-\\frac{\\partial V_x}{\\partial \\mathbf{x}}g_x(\\mathbf{x})-k_1(\\underbrace{z_1-u_x(\\mathbf{x})}_{e_1})}^{v_1} \\, + \\, \\overbrace{\\frac{\\partial u_x}{\\partial 
\\mathbf{x}}(\\underbrace{f_x(\\mathbf{x})+g_x(\\mathbf{x})z_1}_{\\dot{\\mathbf{x}} \\text{ (i.e., } \\frac{\\operatorname{d}\\mathbf{x}}{\\operatorname{d}t} \\text{)}})}^{\\dot{u}_x \\text{ (i.e., } \\frac{ \\operatorname{d}u_x }{\\operatorname{d}t} \\text{)}}", + "title": "Backstepping" + } + }, + { + "definitions": { + "E": [ + { + "Q2918589": "Expectation value" + } + ], + "\\hat{\\sigma}": [ + { + "Q1045555": "maximum likelihood" + } + ], + "n": [ + { + "Q21199": "natural number (numbers used for counting and ordering)" + } + ], + "\\sigma": [ + "constant volatility" + ] + }, + "formula": { + "qID": "53", + "oldId": "14318", + "fid": "10", + "math_inputtex": "E \\left[ \\hat{\\sigma}^2\\right]= \\frac{n-1}{n} \\sigma^2", + "title": "Stochastic_volatility" + } + }, + { + "definitions": { + "\\mathsf{fv}": [ + "free variable set" + ] + }, + "formula": { + "qID": "54", + "oldId": "13072", + "fid": "63", + "math_inputtex": "\\mathsf{fv}", + "title": "Separation_logic" + } + }, + { + "definitions": { + "x": [ + "x coordinate" + ], + "y": [ + "y coordinate" + ], + "I": [ + "pixel intensities" + ] + }, + "formula": { + "qID": "55", + "oldId": "10050", + "fid": "2", + "math_inputtex": "\\sum_x \\sum_y I(x,y) \\,\\!", + "title": "Image_moment" + } + }, + { + "definitions": { + "\\boldsymbol{F}_{r}": [ + { + "Q82580": "friction" + } + ] + }, + "formula": { + "qID": "56", + "oldId": "5363", + "fid": "5", + "math_inputtex": "\\boldsymbol{F}_r", + "title": "Geostrophic_wind" + } + }, + { + "definitions": { + "B": [ + "R-module", + { + "Q18848": "module" + } + ], + "A": [ + "R-module", + { + "Q18848": "module" + } + ] + }, + "formula": { + "qID": "57", + "oldId": "6693", + "fid": "9", + "math_inputtex": "0\\rightarrow B\\rightarrow A\\oplus B\\rightarrow A\\rightarrow0.", + "title": "Ext_functor" + } + }, + { + "definitions": { + "Y": [ + "direction" + ], + "T": [ + { + "Q1952404": "Multilinear map" + } + ], + "\\alpha_{1}": [ + { + "Q11703678": "section (right inverse of a fiber bundle map)" + } + ], + "\\alpha_{2}": [ + { + "Q11703678": "section (right inverse of a fiber bundle map)" + } + ], + "X_{1}": [ + "section" + ], + "X_{2}": [ + "section" + ] + }, + "formula": { + "qID": "58", + "oldId": "4787", + "fid": "50", + "math_inputtex": "(\\nabla_Y T)(\\alpha_1, \\alpha_2, \\ldots, X_1, X_2, \\ldots) =Y(T(\\alpha_1,\\alpha_2,\\ldots,X_1,X_2,\\ldots))", + "title": "Covariant_derivative" + } + }, + { + "definitions": { + "n": [ + { + "Q2303886": "index (the numbering of various objects within the mathematical notation)" + } + ], + "\\mathbb{Z}": [ + { + "Q1096885": "lattice (subgroup of a real vector space or a Lie group )" + } + ], + "d": [ + { + "Q4440864": "dimension (minimum number of coordinates within a space needed to specify any point)" + } + ], + "\\psi": [ + "probability distribution", + { + "Q2362761": "wave function" + } + ], + "t": [ + { + "Q11471": "time (dimension in which events can be ordered from the past through the present into the future)" + } + ], + "C": [ + { + "Q3176558": "positive number (real number strictly greater than zero)" + } + ] + }, + "formula": { + "qID": "59", + "oldId": "10209", + "fid": "4", + "math_inputtex": " \\sum_{n \\in \\mathbb{Z}^d} |\\psi(t,n)|^2 |n| \\leq C ", + "title": "Anderson_localization" + } + }, + { + "definitions": { + "x": [ + "distance", + "x coordinate" + ], + "g": [ + { + "Q30006": "gravitational acceleration (acceleration on an object caused by gravity)" + } + ], + "v": [ + "velocity" + ], + "y": [ + "height", + "y coordinate", + 
"altitude" + ] + }, + "formula": { + "qID": "60", + "oldId": "9833", + "fid": "26", + "math_inputtex": " p = {\\frac{-x\\pm\\sqrt{x^2-4(\\frac{-gx^2}{2v^2})(\\frac{-gx^2}{2v^2}-y)}}{2(\\frac{-gx^2}{2v^2}) }}", + "comment": "p is a substitution", + "title": "Trajectory_of_a_projectile" + } + }, + { + "definitions": { + "z": [ + { + "Q11567": "complex number (number that can be put in the form a + bi, where a and b are real numbers and i is called the imaginary unit )" + } + ], + "H": [ + { + "Q3258885": "Upper half-plane" + } + ] + }, + "formula": { + "qID": "61", + "oldId": "8877", + "fid": "7", + "math_inputtex": "\\left\\{ z \\in H: \\left| z \\right| > 1,\\, \\left| \\,\\mbox{Re}(z) \\,\\right| < \\frac{1}{2} \\right\\}", + "title": "Cusp_neighborhood" + } + }, + { + "definitions": { + "T": [ + { + "Q131030": "operator (mapping from one vector space or module to another in mathematics)" + } + ], + "\\lambda": [ + { + "Q21406831": "eigenvalue" + } + ], + "I": [ + "identity operator", + { + "Q131030": "operator (mapping from one vector space or module to another in mathematics)" + } + ] + }, + "formula": { + "qID": "62", + "oldId": "2951", + "fid": "17", + "math_inputtex": "T-\\lambda I", + "title": "Spectrum_(functional_analysis)" + } + }, + { + "definitions": { + "y": [ + "y dimension" + ], + "x": [ + "x dimension" + ], + "\\rho": [ + { + "Q186290": "correlation (concept)" + } + ], + "\\sigma_{y}": [ + "standard deviation in y direction" + ], + "\\sigma_{x}": [ + "standard deviation in x direction" + ], + "\\mu_{x}": [ + "x component of mean" + ], + "\\mu_{y}": [ + "y combonent of mean" + ] + }, + "formula": { + "qID": "63", + "oldId": "1588", + "fid": "20", + "math_inputtex": "\n y\\left( x \\right) = {\\mathop{\\rm sgn}} \\left( {{\\rho }} \\right)\\frac{{{\\sigma _y}}}{{{\\sigma _x}}}\\left( {x - {\\mu _x}} \\right) + {\\mu _y}.\n ", + "comments": "sgn is an operator", + "title": "Multivariate_normal_distribution" + } + }, + { + "definitions": { + "x": [ + "variable" + ], + "b": [ + "element of a group" + ] + }, + "formula": { + "qID": "64", + "oldId": "15120", + "fid": "56", + "math_inputtex": "x=b \\ ", + "title": "Algebraically_closed_group" + } + }, + { + "definitions": { + "H": [ + { + "Q1591095": "Hausdorff measure" + } + ], + "K": [ + "compact" + ] + }, + "formula": { + "qID": "65", + "oldId": "6392", + "fid": "10", + "math_inputtex": "H^1(K)=\\sqrt{2}", + "title": "Analytic_capacity" + } + }, + { + "definitions": { + "P_{i}": [ + "Plain text" + ], + "E_{K}": [ + "Encryption" + ], + "S_{i-1}": [ + "state of the shift register" + ], + "x": [ + "number of bits" + ], + "C_{i}": [ + "Chip text" + ] + }, + "formula": { + "qID": "66", + "oldId": "2482", + "fid": "8", + "math_inputtex": "P_i = \\mbox{head}(E_K (S_{i-1}), x) \\oplus C_i", + "title": "Block_cipher_mode_of_operation" + } + }, + { + "definitions": { + "f": [ + { + "Q11348": "function (binary relation, which is left-total and right-unique)" + } + ], + "x": [ + { + "Q3150667": "Indeterminate" + } + ] + }, + "formula": { + "qID": "67", + "oldId": "1666", + "fid": "30", + "math_inputtex": "\\frac{ \\partial f}{ \\partial x} = f_x = \\partial_x f.", + "comments": "in that context subscript is not a part of a variable but a notational element", + "title": "Partial_derivative" + } + }, + { + "definitions": { + "P_{x}": [ + "poset" + ], + "P": [ + { + "Q474715": "partially ordered set (a set ordered by a transitive, antisymmetric, and reflexive binary relation)" + } + ], + "a": [ + { + "Q379825": "element (any one of the 
distinct objects that make up a set in set theory)" + } + ], + "x": [ + "point", + { + "Q379825": "element (any one of the distinct objects that make up a set in set theory)" + } + ] + }, + "formula": { + "qID": "68", + "oldId": "27027", + "fid": "0", + "math_inputtex": " P_x = P - \\{ a\\mid a \\geq x\\} ", + "comment": "P_{x} is a substitution", + "title": "Poset_game" + } + }, + { + "definitions": { + "\\eta": [ + { + "Q192704": "energy efficiency" + } + ], + "Q_{1}": [ + { + "Q44432": "heat (energy)" + } + ], + "Q_{2}": [ + { + "Q44432": "heat (energy)" + } + ] + }, + "formula": { + "qID": "69", + "oldId": "15167", + "fid": "0", + "math_inputtex": "\\eta = \\frac{ work\\ done } {heat\\ absorbed} = \\frac{ Q1-Q2 }{ Q1}", + "title": "Engine_efficiency" + } + }, + { + "definitions": { + "f": [ + { + "Q319913": "convex function" + } + ], + "x": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "y": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "p": [ + { + "Q2627460": "Conjugate variables" + } + ], + "v": [ + { + "Q2627460": "Conjugate variables" + } + ] + }, + "formula": { + "qID": "70", + "oldId": "4480", + "fid": "29", + "math_inputtex": "df = {\\partial f \\over \\partial x}dx + {\\partial f \\over \\partial y}dy = pdx + vdy", + "title": "Legendre_transformation" + } + }, + { + "definitions": { + "h_{r,s}": [ + "channel from the source to the relay node" + ] + }, + "formula": { + "qID": "71", + "oldId": "17690", + "fid": "4", + "math_inputtex": "h_{r,s}", + "title": "Cooperative_diversity" + } + }, + { + "definitions": { + "k": [ + { + "Q190109": "field (algebraic structure)" + } + ], + "K": [ + { + "Q83478": "group (set with an invertible, associative internal operation admitting a neutral element)" + } + ], + "T": [ + "Tensor field" + ], + "M": [ + { + "Q18848": "module (algebraic structure)" + } + ], + "a": [ + { + "Q379825": "element (any one of the distinct objects that make up a set in set theory)" + } + ] + }, + "formula": { + "qID": "72", + "oldId": "5763", + "fid": "20", + "math_inputtex": " K^M_*(k) := T^*(k^\\times)/(a\\otimes (1-a)) ", + "title": "Algebraic_K-theory" + } + }, + { + "definitions": { + "C": [ + "1-cycles" + ], + "K_{X}": [ + { + "Q844128": "Canonical bundle" + }, + "canonical divisor" + ] + }, + "formula": { + "qID": "73", + "oldId": "21482", + "fid": "45", + "math_inputtex": "\\{C : K_X \\cdot C = 0\\}", + "title": "Cone_of_curves" + } + }, + { + "definitions": { + "\\Theta": [ + { + "Q2608202": "One-form" + } + ], + "n": [ + "number of thermodynamic degrees of freedom" + ] + }, + "formula": { + "qID": "74", + "oldId": "28602", + "fid": "8", + "math_inputtex": "\\Theta \\wedge\n(d\\Theta)^n \\neq 0", + "title": "Geometrothermodynamics" + } + }, + { + "definitions": { + "\\rho": [ + { + "Q29539": "density (mass per unit volume)" + } + ], + "u_{i}": [ + { + "Q11465": "velocity (rate of change of the position of an object as a function of time, and the direction of that change)" + } + ], + "t": [ + "time" + ] + }, + "formula": { + "qID": "75", + "oldId": "6724", + "fid": "5", + "math_inputtex": "D\\left(\\rho u_i\\right)/Dt\\approx0", + "title": "Darcy's_law" + } + }, + { + "definitions": { + "z_{t}": [ + { + "Q176737": "stochastic process (collection of random variables)" + } + ], + "\\lambda_{1}": [ + "slope coefficient" + ], + "z_{t-1}": [ + { + "Q176737": "stochastic process (collection of random variables)" + } + ], + "\\varepsilon_{t}": [ + { + "Q176737": "stochastic process (collection of random 
variables)" + } + ] + }, + "formula": { + "qID": "76", + "oldId": "10488", + "fid": "25", + "math_inputtex": " z_{t} = \\lambda_{1}z_{t-1} + \\varepsilon_{t} ", + "title": "Unit_root" + } + }, + { + "definitions": { + "b_{3}": [ + { + "Q1413083": "parameter" + } + ] + }, + "formula": { + "qID": "77", + "oldId": "22206", + "fid": "83", + "math_inputtex": "b_3", + "title": "Drucker–Prager_yield_criterion" + } + }, + { + "definitions": { + "b_{3}": [ + { + "Q1759756": "Body force" + } + ] + }, + "formula": { + "qID": "78", + "oldId": "23061", + "fid": "9", + "math_inputtex": "b_3", + "title": "Antiplane_shear" + } + }, + { + "definitions": { + "W": [ + { + "Q900231": "work (term in thermodynamics)" + } + ], + "V_{1}": [ + { + "Q39297": "volume (quantity of three-dimensional space)" + } + ], + "V_{2}": [ + { + "Q39297": "volume (quantity of three-dimensional space)" + } + ], + "p": [ + { + "Q39552": "pressure" + } + ], + "V": [ + { + "Q39297": "volume (quantity of three-dimensional space)" + } + ] + }, + "formula": { + "qID": "79", + "oldId": "15872", + "fid": "32", + "math_inputtex": " \\Delta W = \\int_{V_1}^{V_2} p \\mathrm{d}V \\,\\!", + "title": "Table_of_thermodynamic_equations" + } + }, + { + "definitions": { + "f": [ + { + "Q1948412": "morphism (mathematics)" + } + ], + "Z": [ + { + "Q3554818": "projective variety" + } + ], + "n": [ + { + "Q21199": "natural number (numbers used for counting and ordering)" + } + ] + }, + "formula": { + "qID": "80", + "oldId": "13966", + "fid": "2", + "math_inputtex": "\\dim f(Z) > n", + "title": "Fulton–Hansen_connectedness_theorem" + } + }, + { + "definitions": { + "t": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "e": [ + { + "Q168698": "exponential function (unique function which is its own derivative and equals one at zero)" + } + ] + }, + "formula": { + "qID": "81", + "oldId": "335", + "fid": "16", + "math_inputtex": "\\frac{d}{dt} \\log_e t = \\frac{1}{t}.", + "title": "E_(mathematical_constant)" + } + }, + { + "definitions": { + "h_{i}": [ + "classifier" + ], + "X": [ + { + "Q50701": "variable (a value that can change, usually with a context of an equation or operation)" + } + ] + }, + "formula": { + "qID": "82", + "oldId": "17232", + "fid": "32", + "math_inputtex": "h_i : X \\to \\{-1,+1\\}", + "title": "BrownBoost" + } + }, + { + "definitions": { + "\\mathrm{seqs}": [ + "Number of sequences", + { + "Q11053": "RNA (family of large biological molecules)" + } + ] + }, + "formula": { + "qID": "83", + "oldId": "21545", + "fid": "2", + "math_inputtex": "2\\le seqs \\le6", + "title": "List_of_RNA_structure_prediction_software" + } + }, + { + "definitions": { + "F": [ + "set" + ], + "x": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "y": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "\\mathcal{R}": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "b": [ + { + "Q12503": "integer (number that can be written without a fractional or decimal component)" + } + ], + "n": [ + { + "Q12503": "integer (number that can be written without a fractional or decimal component)" + } + ] + }, + "formula": { + "qID": "84", + "oldId": "28184", + "fid": "6", + "math_inputtex": " F = \\{ (x,y) : x \\in \\mathcal{R}^b,\\, y \\in \\mathcal{R}^n,\\; x=y \\}.", + "title": "Projections_onto_convex_sets" + } + }, + { + "definitions": { + "X_{i}": [ + { + "Q176623": "random variable (variable whose value is subject to variations due to chance)" + } + ], + 
"\\omega": [ + { + "Q10290214": "event (in statistics, a set of outcomes to which a probability is assigned)" + } + ], + "\\omega_{i}": [ + { + "Q10290214": "event (in statistics, a set of outcomes to which a probability is assigned)" + } + ] + }, + "formula": { + "qID": "85", + "oldId": "4212", + "fid": "21", + "math_inputtex": "X_i(\\omega)=\\omega_i", + "title": "Almost_surely" + } + }, + { + "definitions": { + "L": [ + { + "Q505735": "Lagrangian" + }, + "Lagrange function" + ], + "q_{i}": [ + { + "Q1057607": "Generalized coordinates" + } + ], + "t": [ + { + "Q11471": "time (dimension in which events can be ordered from the past through the present into the future)" + } + ], + "\\dot{q_{i}}": [ + "generalized velocities", + { + "Q1057607": "Generalized coordinates" + } + ] + }, + "formula": { + "qID": "86", + "oldId": "24328", + "fid": "67", + "math_inputtex": "\n{\\partial{L}\\over \\partial q_i} = {\\mathrm{d} \\over \\mathrm{d}t}{\\partial{L}\\over \\partial{\\dot{q_i}}}.\n", + "title": "Lagrangian_mechanics" + } + }, + { + "definitions": { + "x_{7}": [ + { + "Q5227327": "Data point" + } + ] + }, + "formula": { + "qID": "87", + "oldId": "24263", + "fid": "100", + "math_inputtex": "x_7", + "title": "Near_sets" + } + }, + { + "definitions": { + "\\Pi_{n}": [ + { + "Q6901742": "Monomial basis" + } + ] + }, + "formula": { + "qID": "88", + "oldId": "2926", + "fid": "30", + "math_inputtex": "\\Pi_n", + "title": "Polynomial_interpolation" + } + }, + { + "definitions": { + "\\sigma": [ + { + "Q159375": "standard deviation (dispersion of the values ​​of a random variable around its expected value)" + } + ], + "X": [ + "available assets" + ], + "T": [ + { + "Q2858846": "Transpose of a linear map" + } + ], + "V": [ + { + "Q1134404": "Covariance matrix" + } + ] + }, + "formula": { + "qID": "89", + "oldId": "26177", + "fid": "10", + "math_inputtex": "\\sigma^2 = X^TVX,", + "title": "Mutual_fund_separation_theorem" + } + }, + { + "definitions": { + "\\mathbb{R}": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "n": [ + { + "Q4440864": "dimension (minimum number of coordinates within a space needed to specify any point)" + } + ], + "f": [ + "integrable function" + ], + "x": [ + "variable" + ], + "B": [ + { + "Q838611": "ball (in mathematics, space inside a sphere)" + } + ], + "x_{0}": [ + { + "Q44946": "point (fundamental object of Euclidean geometry)" + } + ], + "r": [ + { + "Q173817": "radius (segment in a circle and its length)" + } + ], + "S": [ + "surface" + ] + }, + "formula": { + "qID": "90", + "oldId": "22382", + "fid": "3", + "math_inputtex": "\\int_{\\mathbb{R}^n}f\\,dx = \\int_0^\\infty\\left\\{\\int_{\\partial B(x_0;r)} f\\,dS\\right\\}\\,dr.", + "title": "Coarea_formula" + } + }, + { + "definitions": { + "x": [ + "x coordinate" + ], + "p_{x}": [ + "x momentum" + ], + "y": [ + "y coordinate" + ], + "p_{y}": [ + "y momentum" + ] + }, + "formula": { + "qID": "91", + "oldId": "18056", + "fid": "74", + "math_inputtex": "\n\\{x, p_x\\}_{DB} = \\{y, p_y\\}_{DB} = \\frac{1}{2}\n", + "title": "Dirac_bracket" + } + }, + { + "definitions": { + "G_{k,\\sigma}": [ + "substitution" + ], + "y": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ], + "k": [ + { + "Q21199": "natural number (numbers used for counting and ordering)" + } + ], + "\\sigma": [ + { + "Q12916": "real number (quantity along a continuous line)" + } + ] + }, + "formula": { + "qID": "92", + "oldId": "26815", + "fid": "17", + "math_inputtex": "G_{k, \\sigma} (y)= 
1-(1+ky/\\sigma)^{-1/k} ", + "title": "Pickands–Balkema–de_Haan_theorem" + } + }, + { + "definitions": { + "L": [ + "spaces of operators" + ], + "H_{B}": [ + { + "Q190056": "Hilbert space" + }, + "state spaces" + ], + "C": [ + "space of continuous functions" + ], + "X": [ + "some set" + ] + }, + "formula": { + "qID": "93", + "oldId": "7260", + "fid": "15", + "math_inputtex": "L(H_B) \\otimes C(X)", + "title": "Quantum_channel" + } + }, + { + "definitions": { + "\\pi_{i}": [ + "equilibrium distribution" + ], + "N": [ + "Number of particles" + ], + "i": [ + "index" + ] + }, + "formula": { + "qID": "94", + "oldId": "22755", + "fid": "2", + "math_inputtex": "\\pi_i = 2^{-N} \\tbinom Ni", + "title": "Ehrenfest_model" + } + }, + { + "definitions": { + "p_{1}": [ + { + "Q10290214": "event (in statistics, a set of outcomes to which a probability is assigned)" + } + ], + "p_{n}": [ + { + "Q10290214": "event (in statistics, a set of outcomes to which a probability is assigned)" + } + ] + }, + "formula": { + "qID": "95", + "oldId": "14285", + "fid": "1", + "math_inputtex": "(\\sqrt{p_1}, \\cdots ,\\sqrt{p_n})", + "title": "Fidelity_of_quantum_states" + } + }, + { + "definitions": { + "\\boldsymbol{s}": [ + "deviatoric part" + ] + }, + "formula": { + "qID": "96", + "oldId": "15596", + "fid": "27", + "math_inputtex": "\\boldsymbol{s}", + "title": "Yield_surface" + } + }, + { + "definitions": { + "J": [ + { + "Q506041": "Jacobian matrix and determinant" + } + ], + "T": [ + { + "Q2858846": "Transpose of a linear map" + } + ], + "W": [ + "diagonal weight matrix" + ], + "y": [ + "residuals" + ] + }, + "formula": { + "qID": "97", + "oldId": "21658", + "fid": "67", + "math_inputtex": "\\mathbf{J^TW\\ \\Delta y}", + "title": "Non-linear_least_squares" + } + }, + { + "definitions": { + "\\bar{V}": [ + "smallest g" + ] + }, + "formula": { + "qID": "98", + "oldId": "7629", + "fid": "87", + "math_inputtex": "\\bar V^*", + "title": "Markov_decision_process" + } + }, + { + "definitions": { + "n": [ + "integer" + ], + "\\delta": [ + "integer" + ] + }, + "formula": { + "qID": "99", + "oldId": "14812", + "fid": "9", + "math_inputtex": "\\;\\frac{(n+\\delta-1)(n+\\delta-2)\\cdots n}{(\\delta-1)!}\\;", + "title": "Hilbert_series_and_Hilbert_polynomial" + } + }, + { + "definitions": { + "y_{k}": [ + "convolution" + ], + "n": [ + "integer" + ] + }, + "formula": { + "qID": "100", + "oldId": "17319", + "fid": "15", + "math_inputtex": "y_k[n]", + "title": "Overlap–add_method" + } + } +] \ No newline at end of file
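Note on the gold format above: each array element pairs a "definitions" object, which maps a TeX identifier to a list of free-text definitions and/or one-entry objects keyed by a Wikidata QID, with a "formula" object carrying "qID", "oldId", "fid", "math_inputtex", "title", and an optional "comment"/"comments" field. The following is a minimal, hypothetical Java sketch (not part of mathosphere) showing how such a file could be walked with Gson; the class name GoldReaderSketch, the path gold.json, and the Gson dependency are illustrative assumptions only.

    // Hypothetical sketch: iterate a gold file in the format above and print each
    // formula's TeX together with its identifier definitions. Assumes Gson is on the classpath.
    import com.google.gson.JsonArray;
    import com.google.gson.JsonElement;
    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.Reader;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Map;

    public class GoldReaderSketch {
      public static void main(String[] args) throws Exception {
        try (Reader in = Files.newBufferedReader(Paths.get("gold.json"))) {
          JsonArray entries = JsonParser.parseReader(in).getAsJsonArray();
          for (JsonElement e : entries) {
            JsonObject entry = e.getAsJsonObject();
            JsonObject formula = entry.getAsJsonObject("formula");
            System.out.println(formula.get("qID").getAsString() + ": "
                + formula.get("math_inputtex").getAsString());
            JsonObject definitions = entry.getAsJsonObject("definitions");
            for (Map.Entry<String, JsonElement> def : definitions.entrySet()) {
              // Each identifier maps to a list whose items are either a plain-text
              // definition or a one-entry object {Wikidata QID -> label}.
              for (JsonElement item : def.getValue().getAsJsonArray()) {
                String text = item.isJsonObject()
                    ? item.getAsJsonObject().entrySet().iterator().next().toString()
                    : item.getAsString();
                System.out.println("  " + def.getKey() + " -> " + text);
              }
            }
          }
        }
      }
    }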