Special relativity done wrong

This is a self-study guide to special relativity, from the ground up. The title is inspired by Linear Algebra Done Wrong.


Hasn't special relativity been covered enough already? Here are some "features" of this guide that I believe make it unique:

Other things to mention

Let's get started!
Table of contents
    The basics
        Reference frames
        Linear transformations
        The interval
        Proper time
            The velocity four-vector
            The energy-momentum four-vector

The basics

Relativity, at its core, deals with events that happen in reference frames, and how their components transform between inertial frames. Let's define all these terms briefly.


An event is something that happens at a time and place. It is described by a time and a position in 3D space: the four numbers $t$, $x$, $y$, and $z$. An event is essentially a vector in the 4-dimensional vector space of space plus time, and $t$, $x$, $y$, and $z$ are its components. Of course, vector components require a basis for the vector space, which in this case is called a reference frame.

Reference frames

A reference frame consists of three coordinate axes as well as identical clocks that are stationary with respect to these axes, shown in Fig. 1. These allow us to measure position and time. Since an observer can only measure time locally (i.e. at their position), there are identical "clocks" at every point in space. The clocks aren't actual physical clocks, just as the coordinate axes aren't actual arrows in space; they're just a way to visualize the fourth axis in the "time" direction. The faster the clocks run, the shorter the basis vector in the time direction is, since the same event (vector) would be measured by a larger time component.

FIG. 1. A reference frame.
When we refer to an object's "own frame" or "rest frame", we mean a frame centered on that object (treated as a point particle).
An inertial reference frame is simply a reference frame moving at constant velocity, i.e. not accelerating[3].

Transformations between inertial frames

Let's consider all possible transformations from one inertial frame to another inertial frame. There are, in fact, 10 basic transformations, which can be grouped into 3 types and composed (performed one after the other) to yield any transformation:

FIG. 2. Rotation about the x-axis.

FIG. 3. Translation along the y-axis.

FIG. 4. Frame $F$ undergoes a boost along the $x$ axis to become frame $F'$.
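How do vector components transform under the three types of transformations (rotations, translations, and boosts)? The three equations below show, in order, a rotation by an angle $\theta$ about the $z$-axis, a translation by one unit along the $x$-axis, and a boost in classical physics.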
$$ \begin{pmatrix} t' \\ x' \\ y' \\ z' \end{pmatrix} \: = \: \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos{\theta} & \sin{\theta} & 0 \\ 0 & -\sin{\theta} & \cos{\theta} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \: \begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix} $$
The $t$ component is left unchanged, of course.
$$ \begin{aligned} t' &= t \\ x' &= x + 1 \\ y' &= y \\ z' &= z \\ \end{aligned} $$
$$ \begin{equation} \begin{pmatrix} t' \\ x' \\ y' \\ z' \end{pmatrix} \: = \: \begin{pmatrix} 1 & 0 & 0 & 0 \\ -v & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \: \begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix} \label{eq:class_boost} \end{equation} $$
for a boost along the x-axis. But in special relativity, the Lorentz transformation is much weirder, and is a direct result of a couple of physical postulates that Einstein came up with, which seem pretty reasonable by themselves. We'll derive it starting from those postulates. But before we get to the good stuff, here's a small exercise.
Exercise: standard configuration
Frames $F$ and $F'$ are in standard configuration. From frame $F$'s perspective, give the position $(x, y, z)$ of the origin of frame $F'$ as a function of $t$. Then from frame $F'$'s perspective, give the position $(x', y', z')$ of the origin of frame $F$ as a function of time $t'$.
Solution
In frame $F$, the origin of $F'$ is moving in the $+x$ direction with velocity $v$: $$ \begin{aligned} x &= vt \\ y &= 0 \\ z &= 0. \end{aligned} $$
In frame $F'$, the origin of $F$ is moving in the $-x$ direction with the same velocity $v$: $$ \begin{aligned} x' &= -v t' \\ y' &= 0 \\ z' &= 0. \end{aligned} $$

Postulates of special relativity

Special relativity arose from Einstein's study of electromagnetism; in fact, you could say that special relativity is the "arena" in which the laws of electromagnetism play out. Specifically, by applying Maxwell's equations in different reference frames, he came to the conclusion that they must apply in all inertial frames. Consider the moving wire loop in Fig. 5.

FIG. 5. A wire loop moving in a region of constant magnetic field.
By the Lorentz force law, any positive charges on the left side of the loop experience an upwards force, and the negative charges experience a downwards force, so an electromotive force (emf) is generated around the loop, causing a current to flow. If we look at the situation in the frame of the loop, the B-field is moving left at speed $v$, so Faraday's law says that an emf is generated by the changing magnetic flux through the loop[5], which turns out to be equal to the emf predicted in the other frame. If the laws of electromagnetism only held in a special "absolutely stationary" frame, it's quite a coincidence that they give the same result when applied in another "moving" frame. Thus, Einstein rejected the idea of an absolute frame, and said that all laws of physics hold equally in all inertial frames.
Maxwell's equations also imply that the speed of light is constant:
$$ c = \frac{1}{\sqrt{\epsilon_0 \mu_0}}, $$
where $\epsilon_0$ and $\mu_0$ are universal constants. Since Maxwell's equations hold in all inertial frames, the speed of light is constant in all inertial frames as well.
These two postulates form the basis of special relativity:
  1. The laws of physics are identical in all inertial frames.
  2. The speed of light in vacuum is a constant ($c \approx 3 \times 10^8 \text{ m/s}$) in all inertial frames.
Of course, the second postulate is a consequence of the first, if you accept Maxwell's equations as true.

The Lorentz transformation

Now we'll use these postulates and some clever math and physics to derive the Lorentz transformation. But one more thing first: in view of the constant speed of light, let's redefine the vector components as $(ct, x, y, z)$ so that all the components have the same units.

Linear transformations

First off, let's show that the Lorentz transformation is linear. In fact, let's investigate what a linear transformation really means in physics. For a general coordinate transformation, the primed coordinates can be any function of the unprimed ones:
$$ \textbf{x'} = \textbf{T}(\textbf{x}), $$
where $\textbf{x'} = (ct', x', y', z')$ and $\textbf{x} = (ct, x, y, z)$. Recall that a transformation is linear if these conditions hold:
$$ \begin{aligned} 1) & \textbf{T}(\textbf{x}_1 + \textbf{x}_2) = \textbf{T}(\textbf{x}_1) + \textbf{T}(\textbf{x}_2) \\ 2) & \textbf{T}(a \textbf{x}_1) = a \textbf{T}(\textbf{x}_1) \end{aligned} $$
for all vectors $\textbf{x}_1$ and $\textbf{x}_2$ and scalars $a$.
Let's start with condition 1. Since there's no physical significance to the origin of our reference system, we should be able to translate our reference system first, then apply the transformation, and get the same result as first applying the transformation, then translating. In other words, the transformation commutes with translation. Fig. 6 shows this using rotation as an example.

FIG. 6. A linear transformation (e.g. rotation) commutes with translation.
This idea is expressed mathematically as:
$$ \textbf{T}(\textbf{x} + \Delta \textbf{x}) = \textbf{T}(\textbf{x}) + \textbf{T}(\Delta \textbf{x}), $$
where $\Delta \textbf{x}$ is the translation vector. Since $\textbf{x}$ and $\Delta \textbf{x}$ are arbitrary, this is exactly condition 1[6].
Condition 2 simply states that $\textbf{x}$ and $\textbf{x'} = \textbf{T}(\textbf{x})$ have the same units. Let's say $\textbf{x}$ and $\textbf{x'}$ are measured in kilometers. If we change units to meters, the transformation becomes:
$$ 1000 \textbf{x'} = \textbf{T}(1000 \textbf{x}) = 1000 \textbf{T}(\textbf{x}). $$
Clearly 1000 can be replaced by any number, so condition 2 holds, and the Lorentz transformation is linear.

Eigenvectors and eigenvalues

Consider two frames in standard configuration (Fig. 4). Since the Lorentz transformation is linear, our job is to find the 16 elements of its matrix $\pmb{\Lambda}$, which will depend on $v$, the relative velocity of the frames $F$ and $F'$. We can immediately simplify the problem. Since the frames are only moving in the $x$-direction, the problem is symmetric in the $y$ and $z$ directions, so they can't "mix" with the $t$ and $x$ axes[7]. Thus, $\pmb{\Lambda}$ has the form:
$$ \begin{equation} \pmb{\Lambda} = \begin{pmatrix} A & B & 0 & 0 \\ C & D & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{pmatrix}. \label{eq:ABCD_mat} \end{equation} $$
We'll just work with the sub-matrix $\textbf{L}$ to make things simpler:
$$ \textbf{L} \equiv \begin{pmatrix} A & B \\ C & D \\ \end{pmatrix}. $$
Now consider a light bulb at the origin that is turned on at time $t = 0$, then immediately turned off again, emitting only a short pulse[8]. This light pulse radiates in all directions equally (Fig. 7). The edge of this "light sphere" can be observed in both frames. In frame $F$, it is described by the equation
$$ (ct)^2 - x^2 - y^2 - z^2 = 0 $$
($x^2 + y^2 + z^2$ is the radius squared of the sphere). In frame $F'$, we also see the light turn on and off at $t' = 0$, and it radiates outward in exactly the same way, due to the constant speed of light:
$$ (ct')^2 - x'^2 - y'^2 - z'^2 = 0. $$

FIG. 7. Traveling light pulse as seen from frame $F$ (a) and frame $F'$ (b). Note that the light bulb stays at the origin of frame $F$, so it's moving left in (b). It is off after emitting the pulse. Point $P$, the point on the light sphere and the $+x$ axis, moves at the same speed ($c$) in both frames.
Now consider the point on the sphere on the $+x$ axis (Fig. 7). For this point,
$$ \begin{aligned} x &= ct \text{, and} \\ x' &= ct', \end{aligned} $$
$$ \begin{pmatrix} ct' \\ x' \\ \end{pmatrix} = ct' \begin{pmatrix} 1 \\ 1 \\ \end{pmatrix} = \textbf{L} ct \begin{pmatrix} 1 \\ 1 \\ \end{pmatrix}, $$
$$ \frac{t'}{t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \textbf{L} \begin{pmatrix} 1 \\ 1 \\ \end{pmatrix}. $$
Thus, $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is an eigenvector of the matrix, with some unknown eigenvalue $\lambda_1$ (since we don't know how $t'$ and $t$ are related). For the point on the $-x$ axis, we get the other eigenvector, $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$, with eigenvalue $\lambda_2$[9]. These are very important, since the eigenvectors and eigenvalues of any linear transformation are the key to understanding what it does.
We don't know the eigenvalues. But think about the situation from the perspective of frame $F'$, i.e. consider the inverse transformation, $\textbf{L}^{-1}$. The eigenvectors stay the same, but now the light on the $+x$ axis is moving away from the moving frame (which is now $F$), and the light on the $-x$ axis is moving along with it, just the opposite of before. So, the eigenvalues switch: the new $\lambda_1'$ equals $\lambda_2$, and $\lambda_2' = \lambda_1$. But from linear algebra, we know that inverting a transformation simply inverts the eigenvalues. So $\lambda_1' = \lambda_2 = 1 / \lambda_1$, and we get the nice result that:
$$ \lambda_1 \lambda_2 = 1. $$

"Imaginary" time; finishing the derivation

We now know enough to derive the Lorentz transformation directly[10]. But there's a lot of insight to be gained from taking a more roundabout way:
Any vector can be expressed in the eigenvector basis, $\left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}$, calling the components $x_1$ and $x_2$ (Fig. 8).

FIG. 8. A vector $\textbf{v}$ expressed in the eigenvector basis $\textbf{e}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \textbf{e}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, and its new components after the transformation.
Then under $\textbf{L}$:
$$ \begin{aligned} x_1 &\rightarrow \lambda_1 x_1 \text{, and} \\ x_2 &\rightarrow \lambda_2 x_2 = x_2 / \lambda_1, \end{aligned} $$
so the "area" $x_1 x_2$ is invariant under a Lorentz transformation. Now, $x_1 = (ct + x) / 2$, and $x_2 = (ct - x) / 2$, so the quantity:
$$ \begin{equation} 2x_1 2x_2 = (ct + x)(ct - x) = (ct)^2 - x^2 \label{eq:x_interval} \end{equation} $$
is invariant.
Let's try to understand this quantity a little better. Note that it looks an awful lot like the length squared of a vector, $x^2 + y^2$, which we know is invariant under rotations, but there's that pesky minus sign. However, if we make the substitution into "imaginary time":
$$ T = it, $$
then $\eqref{eq:x_interval}$ becomes $(cT / i)^2 - x^2 = -(cT)^2 - x^2$. This has the desired form of (negative) length squared.
So the Lorentz transformation can be viewed as a rotation between imaginary time and real space[11], and thus has the form of a rotation matrix:
$$ \begin{equation} \begin{pmatrix} ict' \\ x' \end{pmatrix} \: = \: \begin{pmatrix} \cos{\theta} & \sin{\theta} \\ -\sin{\theta} & \cos{\theta} \\ \end{pmatrix} \: \begin{pmatrix} ict \\ x \end{pmatrix}. \label{eq:imag_time_matrix} \end{equation} $$
For the path of the $F'$ origin, $x = vt$ and $x' = 0$, and we have:
$$ \begin{pmatrix} ict' \\ 0 \end{pmatrix} \: = \: \begin{pmatrix} \cos{\theta} & \sin{\theta} \\ -\sin{\theta} & \cos{\theta} \\ \end{pmatrix} \: \begin{pmatrix} ict \\ vt \end{pmatrix}. $$
In particular, from the bottom row,
$$ \begin{aligned} ict\sin{\theta} &= vt\cos{\theta} \\ \tan{\theta} &= -iv/c \end{aligned} $$
We can find $\sin{\theta}$ and $\cos{\theta}$ using a trig identity, or by drawing a picture (Fig. 9).

FIG. 9. Triangle showing how to find $\sin{\theta}$ and $\cos{\theta}$ from $\tan{\theta}$.
Note that they are only determined up to a minus sign, since the minus sign could belong to either $iv$ or $c$. We can figure out which one is positive by observing that $\{t = 0, x > 0\}$ implies $x' > 0$. Plugging this into $\eqref{eq:imag_time_matrix}$, we see from the bottom row that $\cos{\theta}$ must be positive, and so:
$$ \begin{aligned} \cos{\theta} &= \frac{c}{\sqrt{c^2 - v^2}} \\ \sin{\theta} &= \frac{-iv}{\sqrt{c^2 - v^2}} \end{aligned} $$
Substituting these into $\eqref{eq:imag_time_matrix}$ and going back to real time, we get:
$$ \begin{pmatrix} ct' \\ x' \end{pmatrix} \: = \: \frac{1}{\sqrt{c^2 - v^2}} \begin{pmatrix} c & -v \\ -v & c \\ \end{pmatrix} \: \begin{pmatrix} ct \\ x \end{pmatrix}. $$
This is typically rewritten in terms of two dimensionless quantities $\beta \equiv \frac{v}{c}$ and $\gamma \equiv \frac{1}{\sqrt{1 - \beta ^ 2}}$[12]. If we also include the other two coordinates $y$ and $z$, we get:
$$ \begin{equation} \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} \: = \: \begin{pmatrix} \gamma & -\gamma\beta & 0 & 0 \\ -\gamma\beta & \gamma & 0 & 0\\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{pmatrix} \: \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}. \label{eq:lorentz} \end{equation} $$
And we're done (whew!).
One thing to note is that we need $v < c$: at $v = c$, $\gamma$ diverges, and for $v > c$ it becomes imaginary, which is unphysical. So no object can move faster than the speed of light, since time and distance would make no sense in its reference frame.
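As a quick numerical sanity check of the derivation, here is a minimal Python sketch. The helper names (`boost_matrix`, `mat_vec`, `interval`) and the values $\beta = 0.6$ and $(ct, x) = (5, 3)$ are illustrative choices, not part of the guide. It checks that $(ct)^2 - x^2$ is unchanged by the boost, and that the light-ray directions $(1, 1)$ and $(1, -1)$ are eigenvectors whose eigenvalues multiply to 1.

```python
import math

def boost_matrix(beta):
    """2x2 Lorentz boost acting on (ct, x), frames in standard configuration."""
    if not -1.0 < beta < 1.0:
        raise ValueError("beta must satisfy |beta| < 1")
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -gamma * beta],
            [-gamma * beta, gamma]]

def mat_vec(matrix, vec):
    """Multiply a 2x2 matrix by a 2-component vector."""
    return [matrix[0][0] * vec[0] + matrix[0][1] * vec[1],
            matrix[1][0] * vec[0] + matrix[1][1] * vec[1]]

def interval(vec):
    """The quantity (ct)^2 - x^2 for a vector (ct, x)."""
    return vec[0]**2 - vec[1]**2

beta = 0.6                        # illustrative value, v = 0.6 c
L = boost_matrix(beta)

event = [5.0, 3.0]                # (ct, x) in arbitrary length units
event_primed = mat_vec(L, event)
print("interval before and after:", interval(event), interval(event_primed))

# Light-ray directions: L maps each one to a multiple of itself.
plus = mat_vec(L, [1.0, 1.0])     # should be lam1 * (1, 1)
minus = mat_vec(L, [1.0, -1.0])   # should be lam2 * (1, -1)
lam1, lam2 = plus[0], minus[0]
print("eigenvalues:", lam1, lam2, "product:", lam1 * lam2)
```

The two eigenvalues come out as $\gamma (1 - \beta)$ and $\gamma (1 + \beta)$, whose product is $\gamma^2 (1 - \beta^2) = 1$.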
Here are some exercises to get more familiar with the Lorentz transformation and explore some important consequences.
Exercise: classical limit
Show that $\eqref{eq:lorentz}$ reduces to the classical equation $\eqref{eq:class_boost}$ when $v \ll c$, by taking the limit as $c \rightarrow \infty$[13].
Solution
$$ \begin{aligned} ct' &= \gamma (ct - vx / c) \\ t' &= \gamma (t - vx / c^2) \approx t \text{ as }c \rightarrow \infty. \\ \\ x' &= \gamma (-vt + x) \approx -vt + x \text{ as }c \rightarrow \infty. \end{aligned} $$
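The same limit can be seen numerically. The sketch below compares the Lorentz and classical boosts for one event at a few speeds. The function names and the sample event are made up for illustration; only the formulas come from the text.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def lorentz(t, x, v):
    """Relativistic boost along x: returns (t', x')."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C)**2)
    return gamma * (t - v * x / C**2), gamma * (x - v * t)

def galilean(t, x, v):
    """Classical boost along x: returns (t', x')."""
    return t, x - v * t

t, x = 10.0, 1000.0   # sample event: 10 s, 1 km (arbitrary)
for v in (30.0, 3.0e5, 0.5 * C):
    t_rel, x_rel = lorentz(t, x, v)
    t_cls, x_cls = galilean(t, x, v)
    print(f"v = {v:.3e} m/s: t' differs by {t_rel - t_cls:.3e} s, "
          f"x' differs by {x_rel - x_cls:.3e} m")
```

At everyday speeds the two transformations agree to far better than any practical measurement; the difference only becomes large when $v$ is a sizeable fraction of $c$.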
Exercise: inverse Lorentz transformation
What is the inverse Lorentz transformation, to go from the primed coordinates ($ct'$, etc) to the unprimed ones? You can explicitly find the inverse of the matrix in $\eqref{eq:lorentz}$, but think about the situation physically first.
Solution
It's the same equation but with $v$ replaced with $-v$, since the target frame is going in the opposite direction now.
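To check this explicitly, the following sketch multiplies the boost for $-v$ by the boost for $+v$ and confirms that the product is the identity. It works with the $2 \times 2$ block acting on $(ct, x)$; the helper names and $\beta = 0.8$ are illustrative.

```python
import math

def boost_matrix(beta):
    """2x2 Lorentz boost acting on (ct, x)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -gamma * beta],
            [-gamma * beta, gamma]]

def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

beta = 0.8
product = mat_mul(boost_matrix(-beta), boost_matrix(beta))
for row in product:
    print(row)   # rows of the identity matrix, up to rounding error
```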
Exercise: time dilation
You are standing at the origin and quite unexpectedly see a flying clock moving in the $+x$ direction with speed $v$. As it passes by you, you synchronize your watch to it, and they both read $t = t' = 0$ at the origin. What is the relationship between your time $t$ and the time on the clock $t'$? (Hint: remember that clocks always show the time in their own reference frame.)

FIG. 10. Clock moving in the $+x$ direction with speed $v$.
Solution
The clock's path is $x = vt$ and your frames are in standard configuration since their origins coincide. From $\eqref{eq:lorentz}$:
$$ ct' = \gamma (ct - (v/c) (vt)); $$
$$ \begin{equation} \begin{aligned} t' &= \gamma (t - \beta ^2 t) \\ &= t \frac{(1 - \beta ^2)}{\sqrt{1 - \beta ^2}} \\ &= t / \gamma. \end{aligned} \label{eq:t_trans} \end{equation} $$
Thus we see that moving clocks run slower, by a factor of $\gamma$. When your watch reads $1$ second, the clock only reads $1 / \gamma$ seconds. We call this effect time dilation.
Immediately, there seems to be a problem - from the clock's perspective, you're moving in the $-x$ direction with speed $v$, and so your watch is also running slower relative to the clock. How can this be? Here is an explanation which probably won't entirely satisfy you, but may make it more plausible. Remember how we mentioned earlier that observers can only measure time locally? This means the situation is inherently asymmetric - in order to measure $t'$ continuously, you must in fact have many observers in the clock's path, using their own watches, to measure $t'$ on the moving clock as it flies past them[14]. So, you are using one clock in the clock's frame (the clock itself), but many clocks in your own frame.
And of course, the situation is reversed in the clock's frame, so you two never end up comparing clocks in the same position after the initial pass-by at the origin.
But what if the moving clock turned around and started heading back towards you? It would continue to be dilated by a factor of $\gamma$, and when it passes by you again it would read a lower time. From the clock's perspective, you started heading back towards it, and so when you pass by, your clock would read a lower time! Since now you are comparing clocks in the same position in both perspectives, this seems like a much more serious problem. It even has a name, the Twin paradox. I'll explain the solution when we talk more about particles, but as a hint, think carefully about what "turning around" means - is the situation really as symmetric as it seems?
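To put numbers on the time dilation result, here is a small sketch that applies the top row of the Lorentz transformation to events on the clock's path $x = vt$. The function names and the sample speeds are illustrative assumptions.

```python
import math

def gamma_factor(beta):
    """Lorentz factor for speed v = beta * c (requires |beta| < 1)."""
    return 1.0 / math.sqrt(1.0 - beta**2)

def moving_clock_reading(ct, beta):
    """Return ct' for the event on the clock's path when your watch reads t."""
    x = beta * ct                        # clock's position: x = v t = beta * (ct)
    gamma = gamma_factor(beta)
    return gamma * (ct - beta * x)       # top row of the Lorentz transformation

ct = 1.0                                 # your watch reading times c, arbitrary units
for beta in (0.1, 0.6, 0.99):
    print(f"beta = {beta}: gamma = {gamma_factor(beta):.4f}, "
          f"ct' = {moving_clock_reading(ct, beta):.4f}")
```

In each case the printed $ct'$ equals $ct / \gamma$, so the effect is tiny at $\beta = 0.1$ and dramatic at $\beta = 0.99$.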
Exercise: length contraction
You're standing at the origin again when a ladder flies past you! You decide to measure its length for some reason. In terms of events, "measuring length" means getting the positions of the two ends of the object at the same time, and finding their difference. Let's say you measure it at time $t = 0$ when one end is at the origin and the other is at $x = d$. So the two relevant events $(ct, x)$ are: $(0, 0)$ and $(0, d)$, and the length is $d$. Question: what is the length of the ladder in the ladder's frame?

FIG. 11. Ladder moving in the $+x$ direction with speed $v$.
Solution
We need to apply the Lorentz transformation to $(0, 0)$ and $(0, d)$ to find the length in the moving frame. We also don't care about the time $t'$ since the ladder is stationary in its own frame, so it doesn't matter at what time the two ends are measured.
Using $\eqref{eq:lorentz}$, $(0, 0)$ becomes $(0, 0)$, of course, and $x = d$ becomes $x' = \gamma d$, so the length of the stationary ladder is $\gamma d$.
So the ladder is actually longer by a factor of $\gamma$ in its rest frame. Or in other words: moving objects are shortened by a factor of $\gamma$ along their direction of motion, when their length is measured in a frame in which they are moving.
Of course, this means that in the ladder's frame, you have also shrunk in the $x$ direction by a factor of $\gamma$. And you guessed it, this leads to another apparent paradox, the ladder paradox. Wikipedia describes it pretty well.
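The length contraction solution can be reproduced with a few lines of Python. This is only a sketch: `boost_event` is a hypothetical helper, and $\beta = 0.6$ and $d = 4$ are arbitrary sample values.

```python
import math

def boost_event(ct, x, beta):
    """Lorentz-transform an event (ct, x) into the frame moving at beta * c."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (ct - beta * x), gamma * (x - beta * ct)

beta = 0.6   # ladder speed as a fraction of c (illustrative)
d = 4.0      # length you measure, arbitrary units

rear = boost_event(0.0, 0.0, beta)    # one end at the origin at t = 0
front = boost_event(0.0, d, beta)     # other end at x = d at t = 0

rest_length = front[1] - rear[1]      # gamma * d
time_offset = front[0] - rear[0]      # c * (t'_front - t'_rear)
print("length in the ladder's frame:", rest_length)
print("c * (difference in t'):", time_offset)
```

The second printed number is nonzero: the two measurements, simultaneous in your frame, are not simultaneous in the ladder's frame. That is harmless here because the ladder is at rest in its own frame, but it is the key to the ladder paradox.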


In the next part of this guide, we will mostly focus on moving particles and their properties, such as energy and momentum. Specifically, we're dealing with idealized point particles, which have no volume and may or may not have mass. The path of such a particle is called its worldline, and can be thought of as an infinite series of events, separated by infinitesimal displacements $\textbf{dx}_1$, $\textbf{dx}_2$, etc.[15]

FIG. 12. The path of a moving particle is an infinite series of events separated by infinitesimal displacements.
Each of these displacement vectors is just a difference of two events, so its components transform in the usual way under a Lorentz transformation:
$$ \textbf{dx} = \textbf{x}_2 - \textbf{x}_1 \text{ becomes } \textbf{dx'} = \Lambda \textbf{x}_2 - \Lambda \textbf{x}_1 = \Lambda (\textbf{x}_2 - \textbf{x}_1) = \Lambda \textbf{dx}. $$

The interval

In classical physics, the arc length describes the distance between two infinitesimally separated positions in space:
$$ \begin{aligned} \text{ds} \; &=\sqrt{\text{dx}^2 + \text{dy}^2 + \text{dz}^2} \\ &= \text{dt} \sqrt{\left(\frac{\text{dx}}{\text{dt}}\right)^2 + \left(\frac{\text{dy}}{\text{dt}}\right)^2 + \left(\frac{\text{dz}}{\text{dt}}\right)^2} \; \; &\text{(multiplying top and bottom by dt)} \\ &= v \text{dt}, \end{aligned} $$
where $v$ is the particle's instantaneous speed and $\text{dt}$ is the time for the particle to travel between the points. If we integrate this over the whole path of the particle, we get the total distance traveled:
$$ \begin{equation} s = \int_{t_1}^{t_2} v \text{dt}, \label{eq:tot_dist} \end{equation} $$
in this case between time $t_1$ and $t_2$[17].
Note that the arc length is invariant under rotations; if you rotate your coordinate system, the particle still travels the same distance. What is the corresponding quantity in relativity? Ideally, it would be invariant under boosts as well as rotations. Well, we have already constructed a quantity $(ct)^2 - x^2$ (equation $\eqref{eq:x_interval}$) that played the role of the (negative) "length squared" of a vector, and was invariant under boosts in the $x$ direction. We only need to extend it to all three spatial dimensions, and make it infinitesimal:
$$ \text{ds}^2 = (c\text{dt})^2 - \text{dx}^2 - \text{dy}^2 - \text{dz}^2, $$
and we call $\text{ds}^2$ the interval. It is invariant under boosts in any direction, since any boost can be decomposed into $x$, $y$, and $z$ boosts, and it's clearly invariant under all of those.
The interval $\text{ds}^2$ can be negative or positive - the quantity $\text{ds}$ only has meaning if it is positive. If the interval is negative, $\text{dx}^2 + \text{dy}^2 + \text{dz}^2 > (c\text{dt})^2$, which would mean that the particle traveled faster than light, since it moved a distance greater than $c \text{dt}$ in time $\text{dt}$. No physical particle can do this, but in general, events can be separated in space but not time, such as when we measured length above. We call an interval between two events spacelike if $\Delta x^2 + \Delta y^2 + \Delta z^2 > (c\Delta t)^2$, timelike if $\Delta x^2 + \Delta y^2 + \Delta z^2 < (c\Delta t)^2$, and lightlike if $\Delta x^2 + \Delta y^2 + \Delta z^2 = (c\Delta t)^2$.[18]
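The classification above translates directly into code. A minimal sketch, with made-up function names, taking the separation components $(c\Delta t, \Delta x, \Delta y, \Delta z)$ in the same length units:

```python
def interval_squared(c_dt, dx, dy, dz):
    """The interval (c dt)^2 - dx^2 - dy^2 - dz^2 between two events."""
    return c_dt**2 - dx**2 - dy**2 - dz**2

def classify(c_dt, dx, dy, dz):
    """Return 'timelike', 'spacelike', or 'lightlike' for the separation."""
    s2 = interval_squared(c_dt, dx, dy, dz)
    if s2 > 0:
        return "timelike"
    if s2 < 0:
        return "spacelike"
    return "lightlike"

print(classify(2.0, 1.0, 0.0, 0.0))   # more time than space between the events
print(classify(0.0, 1.0, 0.0, 0.0))   # simultaneous events, as in a length measurement
print(classify(5.0, 3.0, 4.0, 0.0))   # events connected by a light ray
```

With real measured data, the exact comparison with zero would need a tolerance, since a lightlike separation is a borderline case.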

Proper time

The events of a real particle's path are separated by timelike intervals ($\text{ds}^2 > 0$), and we can define the proper time as:
$$ \begin{equation} \begin{aligned} \text{d}\tau \; &= \sqrt{\text{ds}^2 / c^2} \\ &= \sqrt{\text{dt}^2 - \frac{\text{dx}^2 + \text{dy}^2 + \text{dz}^2}{c^2}} \\ &= \text{dt} \sqrt{1 - \frac{\left(\frac{\text{dx}}{\text{dt}}\right)^2 + \left(\frac{\text{dy}}{\text{dt}}\right)^2 + \left(\frac{\text{dz}}{\text{dt}}\right)^2}{c^2}} \\ &= \text{dt} \sqrt{1 - v^2 / c^2} \\ &= \text{dt} / \gamma, \end{aligned} \label{eq:prop_time} \end{equation} $$
where $v$ is the particle's instantaneous speed. This is probably the closest analogy to the arc length in relativity. Since it is invariant under boosts and rotations, the total (integrated) proper time is also invariant:
$$ \tau = \int_{t_1}^{t_2} \text{dt} / \gamma. $$
Remember, the $v$ hidden in $\gamma$ is the instantaneous speed, so $\gamma$ depends on time and we can't just factor it out, same as in $\eqref{eq:tot_dist}$. Why is it called the proper time? When viewing a path in the particle's own reference frame, $\text{dx} = \text{dy} = \text{dz} = 0$ throughout the motion, so $\text{d}\tau = \text{dt}$ and the total $\tau$ is just the time elapsed. "Propre" is the French word for "own", thus "own time"[19].
Exercise: proper time of a sinusoidal path
A particle moves in a sinusoidal path in one dimension: $x(t) = x_0 \sin(\omega t)$.
a) Find the period of the motion in the particle's own reference frame. Just write out the integral; it can't be solved in closed form.
b) If $x_0 \omega \approx c$, the integral can be solved. Solve it and find the ratio of periods in the two frames, $T_\text{particle frame} / T_\text{your frame}$.
Solution
a) The velocity is $v(t) = x_0 \omega \cos(\omega t)$. Plugging into $\eqref{eq:prop_time}$, we find:
$$ T_\text{particle frame} = \int_0^{2 \pi / \omega} \sqrt{1 - \frac{\omega^2 x_0^2}{c^2} \cos^2(\omega t)} \; \text{dt}. $$
b) When $x_0 \omega \approx c$, $\frac{\omega^2 x_0^2}{c^2} \approx 1$, so the integral becomes:
$$ T_\text{particle frame} = \int_0^{2 \pi / \omega} \left|\sin(\omega t)\right| \; \text{dt}, $$
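which yields $\frac{4}{\omega}$, since $\left|\sin(\omega t)\right|$ integrates to $\frac{2}{\omega}$ over each half-period. So, $T_\text{particle frame} / T_\text{your frame} = \frac{4 / \omega}{2 \pi / \omega} = \frac{2}{\pi}$.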
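The integral from part (a) has no closed form in general, but it is easy to evaluate numerically. The sketch below uses a simple midpoint rule after substituting $u = \omega t$, so that only the dimensionless amplitude $x_0 \omega / c$ matters. The function name, the step count, and the sample amplitudes are illustrative choices.

```python
import math

def proper_period_ratio(amplitude, steps=100_000):
    """Ratio T_particle / T_yours for x(t) = x0 sin(omega t).

    amplitude is the dimensionless peak speed x0 * omega / c (0 <= amplitude <= 1).
    With u = omega * t, the ratio is (1 / 2 pi) * integral over one period of
    sqrt(1 - amplitude^2 cos^2 u) du, evaluated here with the midpoint rule.
    """
    du = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * du
        total += math.sqrt(1.0 - (amplitude * math.cos(u))**2) * du
    return total / (2.0 * math.pi)

for amplitude in (0.0, 0.5, 0.9, 1.0):
    print(f"x0 * omega / c = {amplitude}: ratio = {proper_period_ratio(amplitude):.6f}")
```

The ratio starts at 1 for a particle at rest and decreases as the peak speed grows, approaching the closed-form limit of part (b) as the amplitude tends to 1.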


We have seen that events are vectors in a four dimensional space with components $(ct, x, y, z)$. Their components transform under a boost in the $x$ direction by $\eqref{eq:lorentz}$. We have also seen that the difference between two infinitesimally separated events is a vector that transforms in the same way. Let's now define a four-vector as any set of four quantities that transforms as $\eqref{eq:lorentz}$ under a boost. The only four-vectors we have seen so far are the two just mentioned above: the position four-vector, $(ct, x, y, z)$, and the infinitesimal version $(c\text{dt}, \text{dx}, \text{dy}, \text{dz})$. We will soon derive other ones.
Finding four-vectors is important, since they describe how quantities of interest such as velocity or momentum transform under a boost. This will be more clear after some examples.
But first, we will introduce some new notation. A four-vector $A$ will have component indices on top now, written as $A^i$, where the index $i = 0, 1, 2, 3$. The matrix $\pmb{\Lambda}$ will be written as $\Lambda^j_i$, where $i, j = 0, 1, 2, 3$. The index $i$ denotes the column, and $j$ denotes the row. We will also use the Einstein summation convention: any expression with a repeated index indicates a summation over that index. So,
$$ \textbf{x'} = \pmb{\Lambda} \textbf{x} \text{ means } x'^j = \sum_{i = 0}^3 \Lambda^j_i x^i \text{ which becomes } x'^j = \Lambda^j_i x^i. $$
(If the second equation is not clear, write out the matrix multiplication fully and stare at it for a while.) Since the index $i$ is repeated on the right, the summation is implied.
The Einstein notation is just a way to save space, since sometimes equations have multiple summations and it can get messy. The real benefit is actually in using these indices instead of matrix-vector notation. We will see this later when we introduce covectors, and tensors in general; I just wanted to introduce it early since it can take a while to get used to.
In this new notation, a four vector is a set of four quantities $A^i$ that transform as:
$$ A'^j = \Lambda^j_i A^i $$
under a boost in the $x$ direction.
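If the index notation feels abstract, it may help to see it as nested loops: the free index $j$ labels which equation you are writing, and the repeated index $i$ is the one being summed. The sketch below is one way to write this in Python; `boost_x`, `transform`, and the sample components are illustrative.

```python
import math

def boost_x(beta):
    """4x4 matrix Lambda[j][i] for a boost along x (j = row, i = column)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [gamma, -gamma * beta, 0.0, 0.0],
        [-gamma * beta, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(Lambda, A):
    """Compute A'^j = Lambda^j_i A^i with the sum over i written out."""
    A_primed = []
    for j in range(4):            # free index: one component of A' per value of j
        total = 0.0
        for i in range(4):        # repeated index: summed over
            total += Lambda[j][i] * A[i]
        A_primed.append(total)
    return A_primed

A = [5.0, 3.0, 1.0, 2.0]          # sample components (ct, x, y, z)
print(transform(boost_x(0.6), A))
```

The $y$ and $z$ components pass through unchanged, while the $ct$ and $x$ components mix.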

The velocity four-vector

Dividing the differential position $\text{dx}^i = (c\text{dt}, \text{dx}, \text{dy}, \text{dz})$ by $\text{dt}$ gives:
$$ (c, \frac{\text{dx}}{\text{dt}}, \frac{\text{dy}}{\text{dt}}, \frac{\text{dz}}{\text{dt}}) = (c, v_x, v_y, v_z). $$
This is a set of four numbers which includes the velocity as components 1-3 (remember, the first component is index 0). Is it a four-vector? Obviously not, since $c$ doesn't change under a boost. The problem is that $\text{dt}$ itself changes, so we can't divide the existing four-vector $\text{dx}^i$ by it to get another one. We need a scalar quantity that doesn't change under a boost. Fortunately, we just found one, the proper time $\text{d}\tau = \text{dt} / \gamma$ $\eqref{eq:prop_time}$. Now we can define the velocity four-vector as[20]:
$$ V^i = \frac{\text{dx}^i}{\text{d}\tau} = \gamma (c, \frac{\text{dx}}{\text{dt}}, \frac{\text{dy}}{\text{dt}}, \frac{\text{dz}}{\text{dt}}) = \gamma (c, v_x, v_y, v_z). $$
$V^i$ can also be written schematically as $\gamma (c, \textbf{v})$, with the regular 3-vector $\textbf{v} = (v_x, v_y, v_z)$.
Exercise: velocity invariant
Find the invariant $(V^0)^2 - (V^1)^2 - (V^2)^2 - (V^3)^2$ associated with the velocity four-vector $V^i$. (Confusing notation: the first superscript is the vector component and the second is the power it's raised to.)
Solution
$$ \gamma^2 (c^2 - v_x^2 - v_y^2 - v_z^2) = \gamma^2 (c^2 - v^2) = \frac{(c^2 - v^2) c^2}{c^2 - v^2} = c^2. $$
$c^2$ is obviously invariant. Just as $(ct)^2 - x^2 - y^2 - z^2$ is like the (negative) "length squared" of a position four-vector, we can interpret $c^2$ as the magnitude squared of the particle's "relativistic velocity". Thus, although the "length" of the position vector can change, all particles move through spacetime with speed $c$, so to speak.
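A quick numerical check of this invariant, for a few arbitrary velocities. This is a sketch with made-up helper names, and the sample velocities are just illustrative fractions of $c$.

```python
import math

C = 299_792_458.0   # speed of light in m/s

def four_velocity(vx, vy, vz):
    """Return V^i = gamma * (c, vx, vy, vz) for an ordinary velocity below c."""
    v_squared = vx**2 + vy**2 + vz**2
    gamma = 1.0 / math.sqrt(1.0 - v_squared / C**2)
    return [gamma * C, gamma * vx, gamma * vy, gamma * vz]

def invariant(V):
    """(V^0)^2 - (V^1)^2 - (V^2)^2 - (V^3)^2."""
    return V[0]**2 - V[1]**2 - V[2]**2 - V[3]**2

for fractions in ((0.0, 0.0, 0.0), (0.6, 0.0, 0.0), (0.3, 0.4, 0.5)):
    V = four_velocity(*(f * C for f in fractions))
    print(fractions, invariant(V) / C**2)   # ratio to c^2
```

Each printed ratio should be 1 up to rounding error, whatever the ordinary velocity is.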
Exercise: acceleration four-vector
(Note: this problem is more difficult than the others. But it's worth it, since it shows that acceleration is handled perfectly fine in special relativity. Also, it's great practice for really understanding four-vectors. You will need to know how to solve simple differential equations.)
We can go further and find a four-vector involving the acceleration, by differentiating the velocity four-vector with respect to $\tau$ again, and using the chain rule:
$$ A^i = \frac{\text{d}V^i}{\text{d}\tau} = \frac{\text{d}V^i}{\text{dt}} \frac{\text{dt}}{\text{d}\tau}. $$
a) Calculate the components of the acceleration four-vector, in terms of the regular 3-vector acceleration $\textbf{a}$ and velocity $\textbf{v}$.
b) Consider a particle that always has constant acceleration $\textbf{a} = (a_x, 0, 0)$ in its own reference frame. You observe it in an inertial frame where it is at rest at the origin at $t = 0$. What is its path $x(t)$ in your frame?
Solution
a) First of all, $\frac{\text{dt}}{\text{d}\tau}$ is simply $\gamma$, from $\eqref{eq:prop_time}$. To find $\frac{\text{d}V^i}{\text{dt}}$, we need to calculate $\frac{\text{d}\gamma}{\text{dt}}$, which comes out as:
$$ \frac{\text{d}\gamma}{\text{dt}} = \frac{\gamma^3 \textbf{a} \cdot \textbf{v}}{c^2}. $$
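For completeness, here is one way to get this derivative: a worked step using only $\gamma = (1 - \textbf{v} \cdot \textbf{v} / c^2)^{-1/2}$, the chain rule, and $\textbf{a} = \frac{\text{d}\textbf{v}}{\text{dt}}$:
$$ \frac{\text{d}\gamma}{\text{dt}} = -\frac{1}{2} \left(1 - \frac{\textbf{v} \cdot \textbf{v}}{c^2}\right)^{-3/2} \left(-\frac{2 \, \textbf{v} \cdot \textbf{a}}{c^2}\right) = \frac{\gamma^3 \textbf{a} \cdot \textbf{v}}{c^2}. $$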
So, putting it all together, we get:
$$ \begin{equation} A^i = \gamma^2 (\frac{\gamma^2 \textbf{a} \cdot \textbf{v}}{c}, \textbf{a} + \frac{\gamma^2 \textbf{a} \cdot \textbf{v}}{c^2} \textbf{v}), \label{eq:accel} \end{equation} $$
where we used the product rule on $\frac{\text{d}}{\text{dt}} (\gamma \textbf{v})$.
b) In the particle's frame, $\textbf{v} = 0$ always, so $A'^i = (0, a_x, 0, 0)$. Apply the inverse Lorentz transformation $\eqref{eq:lorentz}$ to boost back to your frame:
$$ A^i = \gamma (\beta a_x, a_x, 0, 0). $$
We can equate the first component $\gamma \beta a_x$ with $\gamma^4 av / c$ from $\eqref{eq:accel}$, where $a$ is the acceleration in your frame, which equals $\frac{\text{dv}}{\text{dt}}$. Since $\beta = v / c$, this gives $a_x = \gamma^3 \frac{\text{dv}}{\text{dt}}$. Because $v = \frac{\text{dx}}{\text{dt}}$, this is a second-order ordinary differential equation for $x(t)$, with initial conditions $v(0) = 0, x(0) = 0$. Integrating twice, we get the final answer:
$$ x(t) = \frac{c^2}{a_x} (\sqrt{1 + \frac{a_x^2 t^2}{c^2}} - 1). $$
Note that at large $t$ (when $a_x t \gg c$), it becomes $x(t) \approx ct$: the particle's speed approaches $c$ but never exceeds it.
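To see that this closed form really solves the equation of motion, here is a rough numerical sketch. Cancelling the common factors in $\gamma \beta a_x = \gamma^4 a v / c$ leaves $\frac{\text{dv}}{\text{dt}} = a_x / \gamma^3$, which the code integrates with a standard fourth-order Runge-Kutta scheme and compares with $x(t)$ above. The function names, the units ($c = 1$, $a_x = 1$), and the step count are assumptions made for illustration.

```python
import math

def hyperbolic_path(t, a_x, c=1.0):
    """Closed-form x(t) from the solution above."""
    return c**2 / a_x * (math.sqrt(1.0 + (a_x * t / c)**2) - 1.0)

def integrate_path(t_end, a_x, c=1.0, steps=10_000):
    """RK4 integration of dx/dt = v, dv/dt = a_x / gamma^3, starting at rest at the origin."""
    def deriv(x, v):
        # 1 / gamma^3 = (1 - v^2/c^2)^(3/2); clamp at zero to guard against round-off.
        return v, a_x * max(0.0, 1.0 - v * v / c**2) ** 1.5

    x, v = 0.0, 0.0
    h = t_end / steps
    for _ in range(steps):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = deriv(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = deriv(x + h * k3x, v + h * k3v)
        x += h / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x, v

x_num, v_num = integrate_path(3.0, a_x=1.0)
print("numerical x(3):  ", x_num)
print("closed-form x(3):", hyperbolic_path(3.0, a_x=1.0))
print("speed at t = 3:  ", v_num)  # stays below c = 1
```

As a further sanity check, for small $t$ the closed form reduces to the Newtonian $x \approx \frac{1}{2} a_x t^2$, which you can confirm by calling `hyperbolic_path` with a small `t`.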
This problem shows that we can think of a four-vector as a "package" for dynamical quantities that can be carried between frames using the Lorentz transformation, then "unwrapped" to get normal quantities in the frame of interest.
Incidentally, the solution to the twin paradox is that one twin has to accelerate to turn around, while the other doesn't. This clearly makes the situation asymmetric - the accelerating twin is in a non-inertial reference frame.

The energy-momentum four-vector

Since we have a velocity four-vector, we can easily form a four-vector involving momentum by simply multiplying by the mass of the particle, $m$, since the mass doesn't depend on reference frame:
$$ P^i = \gamma (mc, mv_x, mv_y, mv_z) = (\gamma mc, \textbf{p}), $$
where we define $\textbf{p} = \gamma m \textbf{v}$ as the relativistic momentum. Note that $\textbf{v}$ is still the ordinary 3-velocity, while $\textbf{p}$ is the relativistic momentum rather than the classical $m \textbf{v}$. A bit confusing, but it appears to be standard notation.
The quantity $\gamma mc$ is actually something surprising, which you can guess from the title of this section. Let's expand it out to first order in $\beta^2$[21]:
$$ \gamma mc = mc (1 - \frac{v^2}{c^2})^{-1/2} \approx mc (1 + \frac{v^2}{2c^2}) = \frac{1}{c} (mc^2 + \frac{1}{2} m v^2) $$
We see the classical kinetic energy $\frac{1}{2} m v^2$ pop out, along with another quantity with units of energy, $mc^2$. This suggests we define the total energy $E$ as $\gamma mc^2$. It consists of kinetic energy plus $mc^2$, which we call the rest energy, since it's the energy when $v = 0$[22].
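To get a feel for how good this approximation is, here is a small Python sketch comparing the exact $E = \gamma mc^2$ with $mc^2 + \frac{1}{2} m v^2$ at a few speeds. The function names, the 1 kg test mass, and the sample values of $\beta$ are assumptions chosen for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def total_energy(m, v):
    """Exact total energy E = gamma * m * c^2."""
    return m * C**2 / math.sqrt(1.0 - (v / C)**2)

def approx_energy(m, v):
    """Rest energy plus classical kinetic energy."""
    return m * C**2 + 0.5 * m * v**2

def relative_error(m, v):
    """Fraction of the exact energy missed by the approximation."""
    exact = total_energy(m, v)
    return (exact - approx_energy(m, v)) / exact

for beta in (0.001, 0.01, 0.1, 0.5):
    print(f"beta = {beta}: relative error = {relative_error(1.0, beta * C):.3e}")
```

The error grows rapidly with $\beta$, since the first term we dropped is proportional to $\beta^4$.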
So, we can rewrite the four-momentum as:
$$ P^i = (E / c, \textbf{p}) $$
The associated invariant is, of course, $m^2c^2$, since we just multiplied the four-velocity (whose invariant is $c^2$) by $m$. Thus, writing $p^2 = \textbf{p} \cdot \textbf{p}$,
$$ E^2 / c^2 - p^2 = m^2 c^2, $$
$$ \begin{equation} E^2 = m^2 c^4 + p^2 c^2. \label{eq:eprel} \end{equation} $$
This is the relationship between energy and momentum in special relativity.
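Here is a short Python sketch that checks this numerically: it builds $P^i$ for a moving particle, boosts it along $x$, and confirms that $E^2 / c^2 - p^2$ comes out as $m^2 c^2$ in both frames even though the individual components change. The function names, the units with $c = 1$, and the sample mass, velocity, and boost speed are assumptions for illustration. The boost is written in the standard form $P'^0 = \gamma (P^0 - \beta P^1)$, $P'^1 = \gamma (P^1 - \beta P^0)$.

```python
import math

def four_momentum(m, v, c=1.0):
    """P^i = (E/c, p) for mass m and 3-velocity v = (vx, vy, vz)."""
    v2 = sum(vi * vi for vi in v)
    g = 1.0 / math.sqrt(1.0 - v2 / c**2)
    return (g * m * c,) + tuple(g * m * vi for vi in v)

def boost_x(P, beta):
    """Standard Lorentz boost along x to a frame moving with speed beta * c."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    p0, px, py, pz = P
    return (g * (p0 - beta * px), g * (px - beta * p0), py, pz)

def invariant(P):
    """(E/c)^2 - p^2, expected to equal m^2 c^2 in every inertial frame."""
    p0, px, py, pz = P
    return p0**2 - px**2 - py**2 - pz**2

# Sample particle: m = 2 in units where c = 1.
P = four_momentum(2.0, (0.3, 0.4, 0.1))
P_boosted = boost_x(P, 0.7)
print("components:", P, "->", P_boosted)
print("invariants:", invariant(P), invariant(P_boosted))
```

Both invariants should print as $m^2 c^2 = 4$ up to floating-point round-off.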
The laws of conservation of energy and momentum in classical mechanics are now combined into a single law, the law of conservation of four-momentum, which we won't prove here. It states that the total four-momentum of all particles in a closed system is conserved[23].
To be continued...

1 If you haven't had a course on linear algebra, I recommend watching Gilbert Strang's lectures here.
2 Click anywhere to hide this footnote.
3 You may ask, "even if I'm accelerating, from my point of view I'm in a stationary frame, so isn't any frame 'inertial'?" No, you can always tell if you're in an accelerating frame, because you experience a fictitious force in the opposite direction of your acceleration. This is why acceleration is fundamentally different from velocity, which cannot be measured absolutely (more on this later).
4 Remember, the origin of our 4D vector space includes time - they clearly don't preserve the spatial origin over time since frame $F'$ is moving!
5 Note that Faraday's law does not apply in the first case, since the flux integral $\int_S \textbf{B} \cdot \textbf{dS}$ must be done using a fixed surface at a given instant of time, and the flux through any fixed surface is not changing, even though the wire is moving.
6 You might think that it would look like:
$$ \textbf{T}(\textbf{x} + \Delta \textbf{x}) = \textbf{T}(\textbf{x}) + \Delta \textbf{x} $$
but remember, when you apply $\textbf{T}$ to the components, it applies to all vectors, including $\Delta \textbf{x}$.
7 Specifically, consider if one of the zero elements in $\eqref{eq:ABCD_mat}$ were non-zero. That would mean that time depends on $y$, for example. But there's no reason it would depend on positive $y$ any more than negative $y$. So it must be zero.
8 It is immediately turned off to eliminate any potential asymmetry in the problem, since the light bulb has to move with one of the frames and not the other.
9 Note that $\lambda_1 \neq \lambda_2$, since we're comparing different points in space; $t'$ could (and does) depend on $x$.
10 Briefly: you can express $\textbf{L}$ in terms of $\lambda_1$ and $\lambda_2$ and apply it to the known case of the path of frame $F'$'s spatial origin ($x = vt \rightarrow x' = 0$).
11 Or, equivalently, between real time and imaginary space.
12 These quantities will be used over and over again, so it's worth it to memorize them. $\gamma$ shows up especially often; many quantities differ from the classical value by a factor of $\gamma$.
13 This is basically equivalent to ignoring terms of order $v / c$ and higher in the Taylor expansion. I'll elaborate more on Taylor series approximations later.
14 These don't have to be actual physical observers. You can, for example, read the clock using the light emitted from it, and since you know the clock's velocity, calculate the time in your frame when it showed that reading. This would be an indirect measurement, which is different from your direct measurement as it passed by you.
15 Mathematicians tend to complain a lot about infinitesimals since they're not rigorously defined, but most physicists don't seem to mind. I think they are quite intuitive and simplify some derivations, so we'll use them here.
16 The distinction between infinitesimal and finite vectors is not too important here, since both transform the same way under boosts. However, it turns out to be crucial in general relativity, and in the mathematical machinery behind it, differential geometry. Basically, there is no concept of a finite vector anymore, but each point in space has an associated vector space where infinitesimal vectors live. I might elaborate more on that later if there's a good opportunity.
17 It is useful to think of integrals in this way - as adding up a bunch of infinitesimals. A lot of times you can first construct an infinitesimal quantity you'd like to add up, then plop an integral sign in front to find the sum.
18 The $\Delta$ instead of $d$ means that this applies to events separated by a finite (non-infinitesimal) displacement as well.
19 We could have gotten $\eqref{eq:prop_time}$ directly without going through this whole process, by simply remembering the effect of time dilation - $\eqref{eq:t_trans}$ gives the particle's own time from any other reference frame. The instantaneous speed $v$ of a particle can be thought of as the speed needed to boost into the particle's own reference frame.
20 Instead of dividing two differentials, we can also think of it as using the chain rule:
$$ \frac{\text{dx}^i}{\text{d}\tau} = \frac{\text{dx}^i}{\text{dt}} \frac{\text{dt}}{\text{d}\tau} $$
21 To $n$th order means: assume some quantity in an expression is much less than 1, and write out the Taylor series around zero for it, dropping terms with higher power than $n$. Scientists and engineers often toss these calculations around like it's second nature, since they're extremely useful for approximations and sanity checks. The most common one by far is:
$$ (1 + \epsilon)^n \approx 1 + n\epsilon \; \; \; \; \; (|\epsilon| \ll 1), $$
which we use here.
22 But wait, what about $E = mc^2$? Well, some physicists define $m$ as the "relativistic mass", $\gamma m_0$, where $m_0$ is the "rest mass", which is what we're calling $m$. Many people (including myself) argue that the notion of relativistic mass is not very useful, and just define the mass, $m$, which is the rest mass. I'll quote Einstein here:
"It is not good to introduce the concept of the mass $M = m/\sqrt{1 - v^2/c^2}$ of a moving body for which no clear definition can be given. It is better to introduce no other mass concept than the ’rest mass’ m. Instead of introducing M it is better to mention the expression for the momentum and energy of a body in motion."
23 It is important to distinguish conserved quantities like four-momentum from invariant ones like $m^2c^2$. Conserved quantities don't change over time in a closed system, but their values may differ from one inertial frame to another. Invariant quantities have the same value in every inertial frame, but may change over time in a closed system.