This is a self-study guide to special relativity, from the ground up. The title is inspired by Linear Algebra Done Wrong.

Hasn't special relativity been covered enough already? Here are some "features" of this guide that I believe make it unique:

- Fairly rigorous and detailed derivations of the Lorentz transformation and Lorentz invariance. Most sources handwave certain aspects of these derivations. I've tried to make every detail as mathematically, or at least physically, clear as possible.
- Heavy use of linear algebra as a framework for understanding spacetime. In my opinion, special relativity is not memorable, and often seems like a bunch of disparate concepts, unless you start with an overarching mathematical framework.
- Treatment of acceleration in relativity, which is often omitted in undergraduate courses.
- Use of Einstein notation and other things to allow for an easy transition into general relativity.
- Conversational tone, which helps immensely when I'm self-studying. "Textbook humor" is always better than no humor.

- The prerequisites are undergraduate linear algebra, vector calculus, and basic classical mechanics^{[1]}. Basic understanding of electromagnetism will also help.
- This is a footnote, click on it to show: ^{[2]}.
- This guide will probably always be a work in progress, as I continually add or revise things. Please give it a read anyway - it was a labor of love. ^^
- I'm not the best at coming up with exercises, so there aren't a lot of them. This means it's important to attempt all of them, or at least read and understand the solutions, as they are sometimes mentioned in the main text. This also means you should probably supplement them with problems from another source, for more practice.
- Most of the later sections (starting from "Particles") are based off the treatment in The Classical Theory of Fields.
- Finally, please let me know if you find any mistakes, or if there's something important I should include. My e-mail is on my homepage.

Let's get started!


Relativity, at its core, deals with *events* that happen in *reference frames*, and how their components transform between *inertial frames*. Let's define all these terms briefly.

An event is something that happens at a time and place. It is described by a time and a position in 3D space: the four numbers $t$, $x$, $y$, and $z$. An event is essentially a vector in the 4-dimensional vector space of space plus time, and $t$, $x$, $y$, and $z$ are its components. Of course, vector components require a basis for the vector space, which in this case is called a reference frame.

A reference frame consists of three coordinate axes as well as identical clocks that are stationary with respect to these axes, shown in Fig. 1. These allow us to measure position and time. Since an observer can only measure time locally (i.e. at their position), there are identical "clocks" at every point in space. The clocks aren't actual physical clocks, just as the coordinate axes aren't actual arrows in space; they're just a way to visualize the fourth axis in the "time" direction. The faster the clocks run, the shorter the basis vector in the time direction is, since the same event (vector) would be measured by a larger time component.

FIG. 1. A reference frame.

When we refer to an object's "own frame" or "rest frame", we mean a frame centered on that object (treated as a point particle).

An *inertial reference frame* is simply a reference frame moving at constant velocity, i.e. not accelerating^{[3]}.

Let's consider all possible transformations from one inertial frame to another inertial frame. There are, in fact, 10 basic transformations, which can be grouped into 3 types and composed (performed one after the other) to yield any transformation:

- 3 **rotations** about the $x$, $y$, or $z$ axis - these can be composed to yield any rotation in space (Fig. 2). These are linear and can be considered a *change of basis* of the space.
- 4 **translations** along the $x$, $y$, $z$, or $t$ axis (Fig. 3). These shift the origin around so are not really a change of basis in the usual sense. A translation along the $t$ axis is just adjusting your clocks forwards or backwards.
- 3 **Lorentz transformations**, also called **boosts**, which start the frame moving at time $t = 0$ with a velocity $v$, along the $x$, $y$, or $z$ axis (Fig. 4). These are also linear (we will prove this in detail later) and are also a change of basis. Note that they preserve the origin since the axes are lined up at $t = t' = 0$^{[4]}. If the boost happens along the $x$ axis, the two frames $F$ and $F'$ are said to be in *standard configuration*.

FIG. 2. Rotation about the x-axis.

FIG. 3. Translation along the y-axis.

FIG. 4. Frame $F$ undergoes a boost along the $x$ axis to become frame $F'$.

How do vector components transform under the three types of transformations?

- For rotations, they are multiplied by a rotation matrix. For example, for a rotation about the z-axis:

$$
\begin{pmatrix} t' \\ x' \\ y' \\ z' \end{pmatrix}
\:
=
\:
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \cos{\theta} & \sin{\theta} & 0 \\
0 & -\sin{\theta} & \cos{\theta} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\:
\begin{pmatrix}
t \\ x \\ y \\ z
\end{pmatrix}
$$

The $t$ component is left unchanged, of course.

- For translations, a constant just gets added to one of the components, for example:

$$
\begin{aligned}
t' &= t \\
x' &= x + 1 \\
y' &= y \\
z' &= z \\
\end{aligned}
$$

- For boosts, figuring this out is much trickier. In classical physics, it's simply:

$$
\begin{equation}
\begin{pmatrix} t' \\ x' \\ y' \\ z' \end{pmatrix}
\:
=
\:
\begin{pmatrix}
1 & 0 & 0 & 0 \\
-v & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\:
\begin{pmatrix}
t \\ x \\ y \\ z
\end{pmatrix}
\label{eq:class_boost}
\end{equation}
$$

for a boost along the x-axis. But in special relativity, the Lorentz transformation is much weirder, and is a direct result of a couple of physical postulates that Einstein came up with, which seem pretty reasonable by themselves. We'll derive it starting from those postulates. But before we get to the good stuff, here's a small exercise.

Frames $F$ and $F'$ are in standard configuration. From frame $F$'s perspective, give the position $(x, y, z)$ of the origin of frame $F'$ as a function of $t$. Then from frame $F'$'s perspective, give the position $(x', y', z')$ of the origin of frame $F$ as a function of time $t'$.

Solution: In frame $F$, the origin of $F'$ is moving in the $+x$ direction with velocity $v$:
$$
\begin{aligned}
x &= vt \\
y &= 0 \\
z &= 0.
\end{aligned}
$$

In frame $F'$, the origin of $F$ is moving in the $-x$ direction with the same speed $v$:
$$
\begin{aligned}
x' &= -v t' \\
y' &= 0 \\
z' &= 0.
\end{aligned}
$$
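As a quick sanity check of this exercise, here is a minimal sketch that applies the *classical* boost from $\eqref{eq:class_boost}$ to both origins. The function name `galilean_boost` and the value of `v` are made up for illustration; only the classical formula $x' = x - vt$, $t' = t$ comes from the text above.

```python
def galilean_boost(event, v):
    """Classical boost along x: maps (t, x, y, z) in F to (t', x', y', z') in F'."""
    t, x, y, z = event
    return (t, x - v * t, y, z)

v = 3.0  # relative speed of F' along +x, arbitrary units (illustrative value)

for t in (0.0, 1.0, 2.5):
    # The origin of F' follows x = v t in F, and should sit at x' = 0 in F'.
    origin_of_f_prime = galilean_boost((t, v * t, 0.0, 0.0), v)
    # The origin of F sits at x = 0 in F, and should follow x' = -v t' in F'.
    origin_of_f = galilean_boost((t, 0.0, 0.0, 0.0), v)
    print(t, origin_of_f_prime[1], origin_of_f[1])
```

The same two statements ($x = vt$ and $x' = -vt'$) will still hold once we replace the classical boost with the Lorentz transformation; only the relationship between $t$ and $t'$ changes.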

Special relativity arose from Einstein's study of electromagnetism; in fact, you could say that special relativity is the "arena" in which the laws of electromagnetism play out. Specifically, by applying Maxwell's equations in different reference frames, he came to the conclusion that they must apply in all inertial frames. Consider the moving wire loop in Fig. 5.

FIG. 5. A wire loop moving in a region of constant magnetic field.

By the Lorentz force law, any positive charges on the left side of the loop experience an upwards force, and the negative charges experience a downwards force, so an electromotive force (emf) is generated around the loop, causing a current to flow. If we look at the situation in the frame of the loop, the B-field is moving left at speed $v$, so Faraday's law says that an emf is generated by the changing magnetic flux through the loop^{[5]}, which turns out to be *equal* to the emf predicted in the other frame. If the laws of electromagnetism only held in a special "absolutely stationary" frame, it's quite a coincidence that they give the same result when applied in another "moving" frame. Thus, Einstein rejected the idea of an absolute frame, and said that *all laws of physics* hold equally in all inertial frames.

Maxwell's equations also imply that the speed of light is constant:

$$
c = \frac{1}{\sqrt{\epsilon_0 \mu_0}},
$$

where $\epsilon_0$ and $\mu_0$ are universal constants. Since Maxwell's equations hold in all inertial frames, the speed of light is constant in all inertial frames as well.

These two postulates form the basis of special relativity:

- The laws of physics are identical in all inertial frames.
- The speed of light in vacuum is a constant ($c \approx 3 \times 10^8 \text{ m/s}$) in all inertial frames.

Of course, the second postulate is a consequence of the first, *if* you accept Maxwell's equations as true.

Now we'll use these postulates and some clever math and physics to derive the Lorentz transformation. But one more thing first: in view of the constant speed of light, let's **redefine the vector components** as $(ct, x, y, z)$ so that all the components have the same units.

First off, let's show that the Lorentz transformation is linear. In fact, let's investigate what a linear transformation really *means* in physics. For a general coordinate transformation, the primed coordinates can be any function of the unprimed ones:

$$
\textbf{x'} = \textbf{T}(\textbf{x}),
$$

where $\textbf{x'} = (ct', x', y', z')$ and $\textbf{x} = (ct, x, y, z)$. Recall that a transformation is *linear* if these conditions hold:

$$
\begin{aligned}
1) & \textbf{T}(\textbf{x}_1 + \textbf{x}_2) = \textbf{T}(\textbf{x}_1) + \textbf{T}(\textbf{x}_2) \\
2) & \textbf{T}(a \textbf{x}_1) = a \textbf{T}(\textbf{x}_1)
\end{aligned}
$$

for all vectors $\textbf{x}_1$ and $\textbf{x}_2$ and scalars $a$.

Let's start with condition 1. Since there's no physical significance to the origin of our reference system, we should be able to translate our reference system first, then apply the transformation, and get the same result as first applying the transformation, then translating. In other words, the transformation commutes with translation. Fig. 6 shows this using rotation as an example.

FIG. 6. A linear transformation (e.g. rotation) commutes with translation.

This idea is expressed mathematically as:

$$
\textbf{T}(\textbf{x} + \Delta \textbf{x}) = \textbf{T}(\textbf{x}) + \textbf{T}(\Delta \textbf{x}),
$$

where $\Delta \textbf{x}$ is the translation vector. Since $\textbf{x}$ and $\Delta \textbf{x}$ are arbitrary, this is exactly condition 1^{[6]}.

Condition 2 simply states that $\textbf{x}$ and $\textbf{x'} = \textbf{T}(\textbf{x})$ have the same units. Let's say $\textbf{x}$ and $\textbf{x'}$ are measured in kilometers. If we change units to meters, the transformation becomes:

$$
1000 \textbf{x'} = \textbf{T}(1000 \textbf{x}) = 1000 \textbf{T}(\textbf{x}).
$$

Clearly 1000 can be replaced by any number, so condition 2 holds, and the Lorentz transformation is linear.

Consider two frames in standard configuration (Fig. 4). Since the Lorentz transformation is linear, our job is to find the 16 elements of its matrix $\pmb{\Lambda}$, which will depend on $v$, the relative velocity of the frames $F$ and $F'$. We can immediately simplify the problem. Since the frames are only moving in the $x$-direction, the problem is symmetric in the $y$ and $z$ directions, so they can't "mix" with the $t$ and $x$ axes^{[7]}. Thus, $\pmb{\Lambda}$ has the form:

$$
\begin{equation}
\pmb{\Lambda} = \begin{pmatrix}
A & B & 0 & 0 \\
C & D & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{pmatrix}.
\label{eq:ABCD_mat}
\end{equation}
$$

We'll just work with the sub-matrix $\textbf{L}$ to make things simpler:

$$
\textbf{L} \equiv \begin{pmatrix}
A & B \\
C & D \\
\end{pmatrix}.
$$

Now consider a light bulb at the origin that is turned on at time $t = 0$, then immediately turned off again, emitting only a short pulse^{[8]}. This light pulse radiates in all directions equally (Fig. 7). The edge of this "light sphere" can be observed in both frames. In frame $F$, it is described by the equation

$$
(ct)^2 - x^2 - y^2 - z^2 = 0
$$

($x^2 + y^2 + z^2$ is the radius squared of the sphere). In frame $F'$, we also see the light turn on and off at $t' = 0$, and it radiates outward in exactly the same way, due to the constant speed of light:

$$
(ct')^2 - x'^2 - y'^2 - z'^2 = 0.
$$

FIG. 7. Traveling light pulse as seen from frame $F$ (a) and frame $F'$ (b). Note that the light bulb stays at the origin of frame $F$, so it's moving left in (b). It is off after emitting the pulse. Point $P$, the point on the light sphere and the $+x$ axis, moves at the same speed ($c$) in both frames.

Now consider the point on the sphere on the $+x$ axis (Fig. 7). For this point,

$$
\begin{aligned}
x &= ct \text{, and} \\
x' &= ct',
\end{aligned}
$$

so

$$
\begin{pmatrix}
ct' \\
x' \\
\end{pmatrix}
=
ct'
\begin{pmatrix}
1 \\
1 \\
\end{pmatrix}
= \textbf{L}
ct
\begin{pmatrix}
1 \\
1 \\
\end{pmatrix},
$$

and

$$
\frac{t'}{t} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \textbf{L}
\begin{pmatrix}
1 \\
1 \\
\end{pmatrix}.
$$

Thus, $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is an eigenvector of the matrix, with some unknown eigenvalue $\lambda_1$ (since we don't know how $t'$ and $t$ are related). For the point on the $-x$ axis, we get the other eigenvector, $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$, with eigenvalue $\lambda_2$^{[9]}. These are very important, since the eigenvectors and eigenvalues of any linear transformation are the key to understanding what it does.

We don't know the eigenvalues. But think about the situation from the perspective of frame $F'$, i.e. consider the inverse transformation, $\textbf{L}^{-1}$. The eigenvectors stay the same, but now the light on the $+x$ axis is moving *away* from the moving frame (which is now $F$), and the light on the $-x$ axis is moving *along* with it, just the opposite of before. So, the eigenvalues switch: the new $\lambda_1'$ equals $\lambda_2$, and $\lambda_2' = \lambda_1$. But from linear algebra, we know that inverting a transformation simply inverts the eigenvalues. So $\lambda_1' = \lambda_2 = 1 / \lambda_1$, and we get the nice result that:

$$
\lambda_1 \lambda_2 = 1.
$$

We now know enough to derive the Lorentz transformation directly^{[10]}. But there's a lot of insight to be gained from taking a more roundabout way:

Any vector can be expressed in the eigenvector basis, $\left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}$, calling the components $x_1$ and $x_2$ (Fig. 8).

FIG. 8. A vector $\textbf{v}$ expressed in the eigenvector basis $\textbf{e}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \textbf{e}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, and its new components after the transformation.

Then under $\textbf{L}$:

$$
\begin{aligned}
x_1 &\rightarrow \lambda_1 x_1 \text{, and} \\
x_2 &\rightarrow \lambda_2 x_2 = x_2 / \lambda_1,
\end{aligned}
$$

so the "area" $x_1 x_2$ is invariant under a Lorentz transformation. Now, $x_1 = (ct + x) / 2$, and $x_2 = (ct - x) / 2$, so the quantity:

$$
\begin{equation}
2x_1 2x_2 = (ct + x)(ct - x) = (ct)^2 - x^2
\label{eq:x_interval}
\end{equation}
$$

is invariant.

Let's try to understand this quantity a little better. Note that it looks an awful lot like the *length squared* of a vector, $x^2 + y^2$, which we know is invariant under *rotations*, but there's that pesky minus sign. However, if we make the substitution into "imaginary time":

$$
T = it,
$$

then $\eqref{eq:x_interval}$ becomes $(cT / i)^2 - x^2 = -(cT)^2 - x^2$. This has the desired form of (negative) length squared.

So the Lorentz transformation can be viewed as a rotation between imaginary time and real space^{[11]}, and thus has the form of a rotation matrix:

$$
\begin{equation}
\begin{pmatrix} ict' \\ x' \end{pmatrix}
\:
=
\:
\begin{pmatrix}
\cos{\theta} & \sin{\theta} \\
-\sin{\theta} & \cos{\theta} \\
\end{pmatrix}
\:
\begin{pmatrix}
ict \\ x
\end{pmatrix}.
\label{eq:imag_time_matrix}
\end{equation}
$$

For the path of the $F'$ origin, $x = vt$ and $x' = 0$, and we have:

$$
\begin{pmatrix} ict' \\ 0 \end{pmatrix}
\:
=
\:
\begin{pmatrix}
\cos{\theta} & \sin{\theta} \\
-\sin{\theta} & \cos{\theta} \\
\end{pmatrix}
\:
\begin{pmatrix}
ict \\ vt
\end{pmatrix}.
$$

In particular, from the bottom row,

$$
\begin{aligned}
ict\sin{\theta} &= vt\cos{\theta} \\
\tan{\theta} &= -iv/c
\end{aligned}
$$

We can find $\sin{\theta}$ and $\cos{\theta}$ using a trig identity, or by drawing a picture (Fig. 9).

FIG. 9. Triangle showing how to find $\sin{\theta}$ and $\cos{\theta}$ from $\tan{\theta}$.

Note that they are only determined up to a minus sign, since the minus sign could belong to either $iv$ or $c$. We can figure out which one is positive by observing that $\{t = 0, x > 0\}$ implies $x' > 0$. Plugging this into $\eqref{eq:imag_time_matrix}$, we see from the bottom row that $\cos{\theta}$ must be positive, and so:

$$
\begin{aligned}
\cos{\theta} &= \frac{c}{\sqrt{c^2 - v^2}} \\
\sin{\theta} &= \frac{-iv}{\sqrt{c^2 - v^2}}
\end{aligned}
$$

Substituting these into $\eqref{eq:imag_time_matrix}$ and going back to real time, we get:

$$
\begin{pmatrix} ct' \\ x' \end{pmatrix}
\:
=
\:
\frac{1}{\sqrt{c^2 - v^2}}
\begin{pmatrix}
c & -v \\
-v & c \\
\end{pmatrix}
\:
\begin{pmatrix}
ct \\ x
\end{pmatrix}.
$$

This is typically rewritten in terms of two dimensionless quantities $\beta \equiv \frac{v}{c}$ and $\gamma \equiv \frac{1}{\sqrt{1 - \beta ^ 2}}$^{[12]}. If we also include the other two coordinates $y$ and $z$, we get:

$$
\begin{equation}
\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix}
\:
=
\:
\begin{pmatrix}
\gamma & -\gamma\beta & 0 & 0 \\
-\gamma\beta & \gamma & 0 & 0\\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{pmatrix}
\:
\begin{pmatrix}
ct \\ x \\ y \\ z
\end{pmatrix}.
\label{eq:lorentz}
\end{equation}
$$

And we're done (whew!).
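If the "rotation by an imaginary angle" step felt like a trick, it can be checked numerically. The sketch below uses Python's `cmath` module to build the complex angle from $\tan{\theta} = -iv/c$, rotates $(ict, x)$, and compares against the $\gamma$, $\beta$ form of $\eqref{eq:lorentz}$. The function names are my own, and it assumes $|\beta| < 1$.

```python
import cmath
import math

def boost_via_imaginary_rotation(ct, x, beta):
    """Boost (ct, x) by rotating (i*ct, x) through the complex angle theta."""
    theta = cmath.atan(-1j * beta)  # tan(theta) = -i v / c
    cos_t, sin_t = cmath.cos(theta), cmath.sin(theta)
    ict_new = cos_t * (1j * ct) + sin_t * x
    x_new = -sin_t * (1j * ct) + cos_t * x
    # Divide by i to return to real time. The leftover imaginary parts are
    # rounding noise, so only the real parts are returned.
    return (ict_new / 1j).real, x_new.real

def boost_direct(ct, x, beta):
    """The same boost written with gamma and beta."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (ct - beta * x), gamma * (x - beta * ct)

print(boost_via_imaginary_rotation(2.0, 1.0, 0.6))
print(boost_direct(2.0, 1.0, 0.6))  # the two pairs should agree up to rounding
```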

One thing to note is that $v < c$, otherwise we get $\gamma$ imaginary, which is unphysical. So no object can move faster than the speed of light, since time and distance would make no sense in its reference frame.
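Here is a small sketch that checks two claims from the derivation against the final matrix: the quantity $(ct)^2 - x^2 - y^2 - z^2$ is unchanged by the boost, and the light-like directions $(1, 1)$ and $(1, -1)$ are eigenvectors whose eigenvalues multiply to 1. The helper names and the sample event are illustrative, and everything is in units where positions are already divided into $ct$ (so only $\beta$ appears).

```python
import math

def gamma_factor(beta):
    return 1.0 / math.sqrt(1.0 - beta * beta)

def lorentz_boost_x(event, beta):
    """Apply the x-boost to event = (ct, x, y, z), with beta = v / c."""
    ct, x, y, z = event
    g = gamma_factor(beta)
    return (g * (ct - beta * x), g * (x - beta * ct), y, z)

def interval(event):
    ct, x, y, z = event
    return ct**2 - x**2 - y**2 - z**2

beta = 0.6
event = (5.0, 3.0, 1.0, -2.0)
boosted = lorentz_boost_x(event, beta)
print(interval(event), interval(boosted))  # the two values should agree

# Light-like directions in the (ct, x) plane are eigenvectors of the boost.
# For an input with ct = 1, the eigenvalue is just the new ct component.
lam1 = lorentz_boost_x((1.0, 1.0, 0.0, 0.0), beta)[0]
lam2 = lorentz_boost_x((1.0, -1.0, 0.0, 0.0), beta)[0]
print(lam1, lam2, lam1 * lam2)  # the product should be close to 1
```

Try a value of `beta` at or above 1 and `math.sqrt` raises a `ValueError` (or you divide by zero), which is the code's way of saying the same thing as the paragraph above.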

Here are some exercises to get more familiar with the Lorentz transformation and explore some important consequences.

Show that $\eqref{eq:lorentz}$ reduces to the classical equation $\eqref{eq:class_boost}$ when $v \ll c$, by taking the limit as $c \rightarrow \infty$^{[13]}.

Solution:

$$
\begin{aligned}
ct' &= \gamma (ct - vx / c) \\
t' &= \gamma (t - vx / c^2) \approx t \text{ as }c \rightarrow \infty. \\
\\
x' &= \gamma (-vt + x) \approx -vt + x \text{ as }c \rightarrow \infty.
\end{aligned}
$$

What is the inverse Lorentz transformation, to go from the primed coordinates ($ct'$, etc) to the unprimed ones? You can explicitly find the inverse of the matrix in $\eqref{eq:lorentz}$, but think about the situation physically first.

Solution: It's the same equation but with $v$ replaced with $-v$, since the target frame is going in the opposite direction now.
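The physical argument is easy to confirm numerically: boost an event with $+\beta$, then boost the result with $-\beta$, and you should land back where you started. This is only a sketch with made-up numbers, restricted to the $(ct, x)$ components.

```python
import math

def boost_x(ct, x, beta):
    """x-boost of the (ct, x) components, with beta = v / c."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return g * (ct - beta * x), g * (x - beta * ct)

beta = 0.8
ct, x = 2.0, 0.5

ct_p, x_p = boost_x(ct, x, beta)             # F -> F'
ct_back, x_back = boost_x(ct_p, x_p, -beta)  # F' -> F, same formula with -v

print((ct, x), (ct_back, x_back))  # the pairs should agree up to rounding
```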

You are standing at the origin and quite unexpectedly see a flying clock moving in the $+x$ direction with speed $v$. As it passes by you, you synchronize your watch to it, and they both read $t = t' = 0$ at the origin. What is the relationship between your time $t$ and the time on the clock $t'$? (*Hint*: remember that clocks always show the time in their *own* reference frame.)

FIG. 10. Clock moving in the $+x$ direction with speed $v$.

The clock's path is $x = vt$ and your frames are in standard configuration since their origins coincide. From $\eqref{eq:lorentz}$:

$$
ct' = \gamma (ct - (v/c) (vt));
$$

$$
\begin{equation}
\begin{aligned}
t' &= \gamma (t - \beta ^2 t) \\
&= t \frac{(1 - \beta ^2)}{\sqrt{1 - \beta ^2}} \\
&= t / \gamma.
\end{aligned}
\label{eq:t_trans}
\end{equation}
$$

Thus we see that *moving clocks run slower*, by a factor of $\gamma$. When your watch reads $1$ second, the clock only reads $1 / \gamma$ seconds. We call this effect *time dilation*.
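To get a feel for the size of the effect, here is a sketch that evaluates the first row of $\eqref{eq:lorentz}$ along the clock's path $x = vt$, in units where $c = 1$. The function name and the chosen speeds are just for illustration.

```python
import math

def moving_clock_reading(t, beta):
    """Time shown on a clock moving at beta = v / c when your watch reads t (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    x = beta * t                   # the clock's position in your frame, x = v t
    return gamma * (t - beta * x)  # first row of the Lorentz transformation

for beta in (0.1, 0.6, 0.99):
    print(beta, moving_clock_reading(1.0, beta))
```

At everyday speeds the reading is indistinguishable from $t$; it only drops noticeably once $\beta$ is a decent fraction of 1.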

Immediately, there seems to be a problem - from the clock's perspective, you're moving in the $-x$ direction with speed $v$, and so your watch is *also* running slower relative to the clock. How can this be? Here is an explanation which probably won't entirely satisfy you, but may make it more plausible. Remember how we mentioned earlier that observers can only measure time locally? This means the situation is inherently asymmetric - in order to measure $t'$ continuously, you must in fact have many observers in the clock's path, using their own watches, to measure $t'$ on the moving clock as it flies past them^{[14]}. So, you are using *one* clock in the clock's frame (the clock itself), but *many* clocks in your own frame.

And of course, the situation is reversed in the clock's frame, so you two never end up comparing clocks in the same position after the initial pass-by at the origin.

But what if the moving clock turned around and started heading back towards you? It would continue to be dilated by a factor of $\gamma$, and when it passes by you again it would read a lower time. From the clock's perspective, you started heading back towards it, and so when you pass by, your clock would read a lower time! Since now you are comparing clocks in the same position in both perspectives, this seems like a much more serious problem. It even has a name, the Twin paradox. I'll explain the solution when we talk more about particles, but as a hint, think carefully about what "turning around" means - is the situation really as symmetric as it seems?

You're standing at the origin again when a ladder flies past you! You decide to measure its length for some reason. In terms of events, "measuring length" means getting the positions of the two ends of the object at the same time, and finding their difference. Let's say you measure it at time $t = 0$ when one end is at the origin and the other is at $x = d$. So the two relevant events $(ct, x)$ are: $(0, 0)$ and $(0, d)$, and the length is $d$. Question: what is the length of the ladder in the ladder's frame?

FIG. 11. Ladder moving in the $+x$ direction with speed $v$.

We need to apply the Lorentz transformation to $(0, 0)$ and $(0, d)$ to find the length in the moving frame. We also don't care about the time $t'$ since the ladder is stationary in its own frame, so it doesn't matter at what time the two ends are measured.

Using $\eqref{eq:lorentz}$, $(0, 0)$ becomes $(0, 0)$, of course, and $x = d$ becomes $x' = \gamma d$, so the length of the stationary ladder is $\gamma d$.

So the ladder is actually *longer* by a factor of $\gamma$ in its rest frame. Or in other words: *moving objects are shortened by a factor of $\gamma$*, when their length is measured in a frame in which they are moving.
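The ladder measurement can be replayed in a few lines. The sketch below transforms the two measurement events $(0, 0)$ and $(0, d)$ into the ladder's frame; the values of `beta` and `d` are arbitrary, and units are chosen so that $c = 1$.

```python
import math

def boost_x(ct, x, beta):
    """x-boost of the (ct, x) components, with beta = v / c."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return g * (ct - beta * x), g * (x - beta * ct)

beta = 0.6
d = 4.0  # length you measure while the ladder flies past

end_a = boost_x(0.0, 0.0, beta)  # event at the trailing end
end_b = boost_x(0.0, d, beta)    # event at the leading end

rest_length = end_b[1] - end_a[1]
print(rest_length / d)        # ratio should equal gamma
print(end_b[0] - end_a[0])    # nonzero: the events are not simultaneous in the ladder's frame
```

The second number is worth a look: your "simultaneous" measurement of the two ends happens at two different times $t'$ in the ladder's frame, which is fine only because the ladder isn't moving there.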

Of course, this means that in the ladder's frame, *you* have also shrunk in the $x$ direction by a factor of $\gamma$. And you guessed it, this leads to another apparent paradox, the *ladder paradox*. Wikipedia describes it pretty well.

In the next part of this guide, we will mostly focus on moving particles and their properties, such as energy and momentum. Specifically, we're dealing with idealized *point* particles, which have no volume and may or may not have mass. The path of such a particle is called its *worldline*, and can be thought of as an infinite series of events, separated by infinitesimal displacements $\textbf{dx}_1$, $\textbf{dx}_2$, etc ^{[15]}.

FIG. 12. The path of a moving particle is an infinite series of events separated by infinitesimal displacements.

Each of these displacement vectors is just a difference of two events, so its components transform in the usual way under a Lorentz transformation:

$$
\textbf{dx} = \textbf{x}_2 - \textbf{x}_1 \text{ becomes } \textbf{dx'} = \pmb{\Lambda} \textbf{x}_2 - \pmb{\Lambda} \textbf{x}_1 = \pmb{\Lambda} (\textbf{x}_2 - \textbf{x}_1) = \pmb{\Lambda} \textbf{dx}.
$$

In classical physics, the arc length describes the distance between two infinitesimally separated positions in space:

$$
\begin{aligned}
\text{ds} \; &=\sqrt{\text{dx}^2 + \text{dy}^2 + \text{dz}^2} \\
&= \text{dt} \sqrt{\left(\frac{\text{dx}}{\text{dt}}\right)^2 + \left(\frac{\text{dy}}{\text{dt}}\right)^2 + \left(\frac{\text{dz}}{\text{dt}}\right)^2} \; \; &\text{(multiplying top and bottom by dt)} \\
&= v \text{dt},
\end{aligned}
$$

where $v$ is the particle's instantaneous speed and $\text{dt}$ is the time for the particle to travel between the points. If we integrate this over the whole path of the particle, we get the total distance traveled:

$$
\begin{equation}
s = \int_{t_1}^{t_2} v \text{dt},
\label{eq:tot_dist}
\end{equation}
$$

in this case between time $t_1$ and $t_2$^{[17]}.

Note that the arc length is invariant under rotations; if you rotate your coordinate system, the particle still travels the same distance. What is the corresponding quantity in relativity? Ideally, it would be invariant under boosts as well as rotations. Well, we have already constructed a quantity $(ct)^2 - x^2$ (equation $\eqref{eq:x_interval}$) that played the role of the (negative) "length squared" of a vector, and was invariant under boosts in the $x$ direction. We only need to extend it to all three spatial dimensions, and make it infinitesimal:

$$
\text{ds}^2 = (c\text{dt})^2 - \text{dx}^2 - \text{dy}^2 - \text{dz}^2,
$$

and we call $\text{ds}^2$ the *interval*. It is invariant under boosts in *any* direction, since a boost in an arbitrary direction can be built by rotating that direction onto the $x$ axis, boosting along $x$, and rotating back, and the interval is clearly invariant under each of those steps.

The interval $\text{ds}^2$ can be negative or positive - the quantity $\text{ds}$ only has meaning if it is positive. If the interval is negative, $\text{dx}^2 + \text{dy}^2 + \text{dz}^2 > (c\text{dt})^2$, which would mean that the particle traveled faster than light, since it moved a distance greater than $c \text{dt}$ in time $\text{dt}$. No physical particle can do this, but in general, events can be separated in space but not time, such as when we measured length above. We call an interval between two events *spacelike* if $\Delta x^2 + \Delta y^2 + \Delta z^2 > (c\Delta t)^2$, *timelike* if $\Delta x^2 + \Delta y^2 + \Delta z^2 < (c\Delta t)^2$, and *lightlike* if $\Delta x^2 + \Delta y^2 + \Delta z^2 = (c\Delta t)^2$.^{[18]}
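The three cases can be written down as a tiny classifier. This is only a sketch: the function name is made up, and the exact equality test for the lightlike case is only reliable for inputs that are exactly representable (with real measured data you would compare against a tolerance instead).

```python
def classify_interval(c_dt, dx, dy, dz):
    """Classify the separation between two events from c*dt and the spatial differences."""
    s2 = c_dt**2 - dx**2 - dy**2 - dz**2
    if s2 > 0:
        return "timelike"
    if s2 < 0:
        return "spacelike"
    return "lightlike"

print(classify_interval(0.0, 4.0, 0.0, 0.0))  # the two ends of the ladder, measured at the same time
print(classify_interval(5.0, 3.0, 0.0, 0.0))  # two events on a slower-than-light path
print(classify_interval(5.0, 3.0, 4.0, 0.0))  # two events on a light ray
```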

The events of a real particle's path are separated by timelike intervals ($\text{ds}^2 > 0$), and we can define the *proper time* as:

$$
\begin{equation}
\begin{aligned}
\text{d}\tau \; &= \sqrt{\text{ds}^2 / c^2} \\
&= \sqrt{\text{dt}^2 - \frac{\text{dx}^2 + \text{dy}^2 + \text{dz}^2}{c^2}} \\
&= \text{dt} \sqrt{1 - \frac{\left(\frac{\text{dx}}{\text{dt}}\right)^2 + \left(\frac{\text{dy}}{\text{dt}}\right)^2 + \left(\frac{\text{dz}}{\text{dt}}\right)^2}{c^2}} \\
&= \text{dt} \sqrt{1 - v^2 / c^2} \\
&= \text{dt} / \gamma,
\end{aligned}
\label{eq:prop_time}
\end{equation}
$$

where $v$ is the particle's instantaneous speed. This is probably the closest analogy to the arc length in relativity. Since it is invariant under boosts and rotations, the total (integrated) proper time is also invariant:

$$
\tau = \int_{t_1}^{t_2} \text{dt} / \gamma.
$$

Remember, the $v$ hidden in $\gamma$ is the instantaneous speed, so $\gamma$ depends on time and we can't just factor it out, same as in $\eqref{eq:tot_dist}$. Why is it called the proper time? When viewing a path in the particle's own reference frame, $\text{dx} = \text{dy} = \text{dz} = 0$ throughout the motion, so $\text{d}\tau = \text{dt}$ and the total $\tau$ is just the time elapsed. *Propre* is the French word for "own", thus "own time"^{[19]}.
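Because $\gamma$ sits inside the integral, the proper time usually has to be computed numerically. Below is a minimal sketch using the midpoint rule; the function name `proper_time`, the step count, and the two example speed profiles are my own choices, and speeds are given as $v/c$.

```python
import math

def proper_time(speed, t1, t2, steps=100_000):
    """Midpoint-rule estimate of tau = integral of sqrt(1 - v(t)^2 / c^2) dt.

    speed(t) must return v / c at coordinate time t, with |v / c| < 1.
    """
    dt = (t2 - t1) / steps
    total = 0.0
    for k in range(steps):
        beta = speed(t1 + (k + 0.5) * dt)
        total += math.sqrt(1.0 - beta * beta) * dt
    return total

# Constant speed: gamma can be factored out, so tau is simply (t2 - t1) / gamma.
print(proper_time(lambda t: 0.6, 0.0, 10.0))

# Speed ramping linearly from 0 to 0.9c: gamma changes along the path.
print(proper_time(lambda t: 0.09 * t, 0.0, 10.0))
```

The first call is a consistency check against the time dilation result; the second is the kind of case where you really do need the integral.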

A particle moves in a sinusoidal path in one dimension: $x(t) = x_0 \sin(\omega t)$.

a) Find the period of the motion in the particle's own reference frame. Just write out the integral; it can't be solved in closed form.

b) If $x_0 \omega \approx c$, the integral can be solved. Solve it and find the ratio of periods in the two frames, $T_\text{particle frame} / T_\text{your frame}$.

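Solution:

a) The velocity is $\frac{\text{dx}}{\text{dt}} = x_0 \omega \cos(\omega t)$, and one period in your frame lasts $2 \pi / \omega$, so from $\eqref{eq:prop_time}$:

$$
T_\text{particle frame} = \int_0^{2 \pi / \omega} \sqrt{1 - \frac{\omega^2 x_0^2}{c^2} \cos^2(\omega t)} \; \text{dt}.
$$

b) If $x_0 \omega \approx c$, the integrand becomes $\sqrt{1 - \cos^2(\omega t)} = \left|\sin(\omega t)\right|$:

$$
T_\text{particle frame} = \int_0^{2 \pi / \omega} \left|\sin(\omega t)\right| \; \text{dt},
$$

which yields $\frac{4}{\omega}$, since each of the four quarter-periods contributes $\frac{1}{\omega}$. Since $T_\text{your frame} = \frac{2 \pi}{\omega}$, we get $T_\text{particle frame} / T_\text{your frame} = \frac{2}{\pi}$.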

We have seen that events are vectors in a four dimensional space with components $(ct, x, y, z)$. Their components transform under a boost in the $x$ direction by $\eqref{eq:lorentz}$. We have also seen that the difference between two infinitesimally separated events is a vector that transforms in the same way. Let's now define a *four-vector* as any set of four quantities that transforms as $\eqref{eq:lorentz}$ under a boost. The only four-vectors we have seen so far are the two just mentioned above: the *position four-vector*, $(ct, x, y, z)$, and the infinitesimal version $(c\text{dt}, \text{dx}, \text{dy}, \text{dz})$. We will soon derive other ones.

Finding four-vectors is important, since they describe how quantities of interest such as velocity or momentum transform under a boost. This will be more clear after some examples.

But first, we will introduce some new notation. A four-vector $A$ will have component indices on top now, written as $A^i$, where the index $i = 0, 1, 2, 3$. The matrix $\pmb{\Lambda}$ will be written as $\Lambda^j_i$, where $i, j = 0, 1, 2, 3$. The index $i$ denotes the column, and $j$ denotes the row. We will also use the *Einstein summation convention*: any expression with a repeated index indicates a summation over that index. So,

$$
\textbf{x'} = \pmb{\Lambda} \textbf{x} \text{ means } x'^j = \sum_{i = 0}^3 \Lambda^j_i x^i \text{ which becomes } x'^j = \Lambda^j_i x^i.
$$

(If the second equation is not clear, write out the matrix multiplication fully and stare at it for a while.) Since the index $i$ is repeated on the right, the summation is implied.

The Einstein notation is just a way to save space, since sometimes equations have multiple summations and it can get messy. The real benefit is actually in using these indices instead of matrix-vector notation. We will see this later when we introduce covectors, and tensors in general; I just wanted to introduce it early since it can take a while to get used to.

In this new notation, a four vector is a set of four quantities $A^i$ that transform as:

$$
A'^j = \Lambda^j_i A^i
$$

under a boost in the $x$ direction.
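If the index notation still looks opaque, it may help to see it as a loop. In the sketch below, `Lam[j][i]` plays the role of $\Lambda^j_i$ (row $j$, column $i$), and the repeated index $i$ turns into an explicit `sum`. The function names and the sample four-vector are illustrative.

```python
import math

def boost_matrix(beta):
    """The 4x4 matrix Lambda for a boost along x, stored as Lam[j][i] (row j, column i)."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [g, -g * beta, 0.0, 0.0],
        [-g * beta, g, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(Lam, A):
    """A'^j = Lam^j_i A^i: for each free index j, sum over the repeated index i."""
    return [sum(Lam[j][i] * A[i] for i in range(4)) for j in range(4)]

A = [5.0, 3.0, 1.0, -2.0]  # components (ct, x, y, z) of some four-vector
print(transform(boost_matrix(0.6), A))
```

The free index $j$ is the one that survives on both sides of the equation; the summed index $i$ never appears in the result.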

Dividing the differential position $\text{dx}^i = (c\text{dt}, \text{dx}, \text{dy}, \text{dz})$ by $\text{dt}$ gives:

$$
(c, \frac{\text{dx}}{\text{dt}}, \frac{\text{dy}}{\text{dt}}, \frac{\text{dz}}{\text{dt}}) = (c, v_x, v_y, v_z).
$$

This is a set of four numbers which includes the velocity as components 1-3 (remember, the first component is index 0). Is it a four-vector? Obviously not, since $c$ doesn't change under a boost. The problem is that $\text{dt}$ itself changes, so we can't divide the existing four-vector $\text{dx}^i$ by it to get another one. We need a scalar quantity that doesn't change under a boost. Fortunately, we just found one, the proper time $\text{d}\tau = \text{dt} / \gamma$ $\eqref{eq:prop_time}$. Now we can define the velocity four-vector as^{[20]}:

$$
V^i = \frac{\text{dx}^i}{\text{d}\tau} = \gamma (c, \frac{\text{dx}}{\text{dt}}, \frac{\text{dy}}{\text{dt}}, \frac{\text{dz}}{\text{dt}}) = \gamma (c, v_x, v_y, v_z).
$$
$V^i$ can also be written schematically as $\gamma (c, \textbf{v})$, with the regular 3-vector $\textbf{v} = (v_x, v_y, v_z)$.

Find the invariant $(V^0)^2 - (V^1)^2 - (V^2)^2 - (V^3)^2$ associated with the velocity four-vector $V^i$. (Confusing notation: the first superscript is the vector component and the second is the power it's raised to.)

Solution:

$$
\gamma^2 (c^2 - v_x^2 - v_y^2 - v_z^2) = \gamma^2 (c^2 - v^2) = \frac{(c^2 - v^2) c^2}{c^2 - v^2} = c^2.
$$
$c^2$ is obviously invariant. Just as $(ct)^2 - x^2 - y^2 - z^2$ is like the (negative) "length squared" of a position four-vector, we can interpret $c^2$ as the magnitude squared of the particle's "relativistic velocity". Thus, although the "length" of the position vector can change, all particles move through spacetime with speed $c$, so to speak.
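A quick numerical check of this invariant, as a sketch: build $V^i = \gamma (c, \textbf{v})$ for an arbitrary sub-light velocity and evaluate $(V^0)^2 - (V^1)^2 - (V^2)^2 - (V^3)^2$. The function names are hypothetical and the default $c = 1$ is just a convenient choice of units.

```python
import math

def four_velocity(vx, vy, vz, c=1.0):
    """Return V^i = gamma * (c, vx, vy, vz) for a particle with 3-velocity (vx, vy, vz)."""
    v2 = vx * vx + vy * vy + vz * vz
    gamma = 1.0 / math.sqrt(1.0 - v2 / c**2)
    return (gamma * c, gamma * vx, gamma * vy, gamma * vz)

def minkowski_norm_sq(A):
    """(A^0)^2 - (A^1)^2 - (A^2)^2 - (A^3)^2."""
    return A[0] ** 2 - A[1] ** 2 - A[2] ** 2 - A[3] ** 2

for velocity in [(0.0, 0.0, 0.0), (0.3, 0.4, 0.5), (0.0, 0.99, 0.0)]:
    V = four_velocity(*velocity)
    print(velocity, minkowski_norm_sq(V))  # should come out as c^2 each time
```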

(Note: this problem is more difficult than the others. But it's worth it, since it shows that acceleration is handled perfectly fine in special relativity. Also, it's great practice for really understanding four-vectors. You will need to know how to solve simple differential equations.)

We can go further and find a four-vector involving the acceleration, by differentiating the velocity four-vector with respect to $\tau$ again, and using the chain rule:

$$
A^i = \frac{\text{d}V^i}{\text{d}\tau} = \frac{\text{d}V^i}{\text{dt}} \frac{\text{dt}}{\text{d}\tau}.
$$

a) Calculate the components of the acceleration four-vector, in terms of the regular 3-vector acceleration $\textbf{a}$ and velocity $\textbf{v}$.

b) Consider a particle that always has constant acceleration $\textbf{a} = (a_x, 0, 0)$ *in its own reference frame*. You observe it in an inertial frame where it is at rest at the origin at $t = 0$. What is its path $x(t)$ in your frame?

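Solution:

a) Differentiating $\gamma = (1 - \textbf{v} \cdot \textbf{v} / c^2)^{-1/2}$ with respect to $t$, and using $\textbf{a} = \frac{\text{d}\textbf{v}}{\text{dt}}$, gives:

$$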
\frac{\text{d}\gamma}{\text{dt}} = \frac{\gamma^3 \textbf{a} \cdot \textbf{v}}{c^2}.
$$

So, putting it all together, we get:

$$
\begin{equation}
A^i = \gamma^2 (\frac{\gamma^2 \textbf{a} \cdot \textbf{v}}{c}, \textbf{a} + \frac{\gamma^2 \textbf{a} \cdot \textbf{v}}{c^2} \textbf{v}),
\label{eq:accel}
\end{equation}
$$

where we used the product rule on $\frac{\text{d}}{\text{dt}} (\gamma \textbf{v})$.
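A formula with this many factors of $\gamma$ is easy to get wrong, so here is a rough numerical sanity check. It compares the closed-form expression against $\gamma \, \frac{\text{d}V^i}{\text{dt}}$, with the time derivative estimated by a central difference. It works in units where $c = 1$; the sample velocity, acceleration and step size are arbitrary, and all function names are mine.

```python
import math

def gamma(v):
    """Lorentz factor for a 3-velocity v, in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - sum(x * x for x in v))

def four_velocity(v):
    g = gamma(v)
    return [g] + [g * x for x in v]

def four_accel_formula(v, a):
    """Closed-form four-acceleration (the boxed result above, with c = 1)."""
    g = gamma(v)
    a_dot_v = sum(x * y for x, y in zip(a, v))
    time_part = g ** 4 * a_dot_v
    space_part = [g ** 2 * (ai + g ** 2 * a_dot_v * vi) for ai, vi in zip(a, v)]
    return [time_part] + space_part

def four_accel_numeric(v, a, h=1e-6):
    """gamma * dV/dt, with dV/dt from a central difference over a time step h."""
    v_plus = [vi + ai * h for vi, ai in zip(v, a)]
    v_minus = [vi - ai * h for vi, ai in zip(v, a)]
    dV_dt = [(p - m) / (2 * h)
             for p, m in zip(four_velocity(v_plus), four_velocity(v_minus))]
    return [gamma(v) * d for d in dV_dt]

v = (0.3, 0.4, 0.1)    # sample velocity, as a fraction of c
a = (0.2, -0.1, 0.05)  # sample 3-acceleration
exact = four_accel_formula(v, a)
approx = four_accel_numeric(v, a)
max_diff = max(abs(e - n) for e, n in zip(exact, approx))
print(max_diff)  # should be tiny compared to the components themselves
```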

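b) In the particle's own frame, $\textbf{v} = 0$ and $\gamma = 1$, so $\eqref{eq:accel}$ reduces to $(0, a_x, 0, 0)$. Transforming this back into your frame, in which the particle moves with speed $v = \beta c$ along $x$, gives:

$$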
A^i = \gamma (\beta a_x, a_x, 0, 0).
$$

We can equate the first component $\gamma \beta a_x$ with $\gamma^4 av / c$ from $\eqref{eq:accel}$, where $a$ is the acceleration in your frame, which equals $\frac{\text{dv}}{\text{dt}}$. Using $\beta = v / c$, this gives $\frac{\text{dv}}{\text{dt}} = a_x / \gamma^3$. Now we have a second-order ordinary differential equation for $x(t)$, with initial conditions $v(0) = 0, x(0) = 0$. Integrating twice, we get the final answer:

$$
x(t) = \frac{c^2}{a_x} (\sqrt{1 + \frac{a_x^2 t^2}{c^2}} - 1).
$$

Note that at large $t$ (when $a_x t \gg c$), it becomes $x(t) \approx ct$: the particle's speed approaches $c$ but never reaches it.
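In case the two integrations aren't obvious, here is one way to do them (a sketch, not the only route). Equating the components gives $\frac{\text{dv}}{\text{dt}} = a_x / \gamma^3$. For motion along $x$, the product rule and the expression for $\frac{\text{d}\gamma}{\text{dt}}$ from part a) give $\frac{\text{d}}{\text{dt}} (\gamma v) = \gamma^3 \frac{\text{dv}}{\text{dt}}$, so the equation becomes trivial to integrate once:

$$
\begin{aligned}
&\frac{\text{d}}{\text{dt}} (\gamma v) = a_x
\;\; \Rightarrow \;\; \gamma v = a_x t
\;\; \Rightarrow \;\; v(t) = \frac{a_x t}{\sqrt{1 + a_x^2 t^2 / c^2}}, \\
&x(t) = \int_0^t v(t') \, \text{dt}' = \frac{c^2}{a_x} (\sqrt{1 + \frac{a_x^2 t^2}{c^2}} - 1).
\end{aligned}
$$

The constants of integration are fixed by $v(0) = 0$ and $x(0) = 0$.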

This problem shows that we can think of a four-vector as a "package" for dynamical quantities, that can be carried between frames using the Lorentz transformation, then "unwrapped" to get normal quantities in the frame of interest.
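The "package, carry, unwrap" recipe can be followed step by step in code. The sketch below does this for the constant-acceleration problem, then integrates the resulting equation of motion numerically and compares it with the closed-form $x(t)$. It uses units where $c = 1$, an arbitrary value for $a_x$, and a standard fourth-order Runge-Kutta integrator. One deliberate difference from the text: to unwrap, it uses the $x$ component of the four-acceleration (which equals $\gamma^4 \frac{\text{dv}}{\text{dt}}$ for motion along $x$) rather than the time component, since the latter vanishes at $v = 0$.

```python
import math

C = 1.0    # work in units where c = 1
A_X = 0.5  # acceleration in the particle's own frame (illustrative value)

def lab_acceleration(v):
    """Ordinary acceleration dv/dt in your frame, given the speed v there."""
    beta = v / C
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    # "Package": four-acceleration in the particle's own frame (t and x parts).
    rest = (0.0, A_X)
    # "Carry": inverse boost along x into your frame.
    lab_x = g * (rest[1] + beta * rest[0])
    # "Unwrap": for motion along x, this component equals gamma^4 * dv/dt.
    return lab_x / g ** 4

def rk4_step(x, v, dt):
    """One Runge-Kutta step for dx/dt = v, dv/dt = lab_acceleration(v)."""
    k1x, k1v = v, lab_acceleration(v)
    v2 = v + 0.5 * dt * k1v
    k2x, k2v = v2, lab_acceleration(v2)
    v3 = v + 0.5 * dt * k2v
    k3x, k3v = v3, lab_acceleration(v3)
    v4 = v + dt * k3v
    k4x, k4v = v4, lab_acceleration(v4)
    x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
    v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return x, v

T = 4.0       # total time in your frame
STEPS = 4000
x_numeric, v_numeric = 0.0, 0.0  # at rest at the origin at t = 0
for _ in range(STEPS):
    x_numeric, v_numeric = rk4_step(x_numeric, v_numeric, T / STEPS)

x_exact = C ** 2 / A_X * (math.sqrt(1.0 + (A_X * T / C) ** 2) - 1.0)
print(x_numeric, x_exact)  # the two should agree closely
```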

Incidentally, the solution to the twin paradox is that one twin has to accelerate to turn around, while the other doesn't. This clearly makes the situation asymmetric - the accelerating twin is in a non-inertial reference frame.

Since we have a velocity four-vector, we can easily form a four-vector involving momentum by simply multiplying by the mass of the particle, $m$, since the mass doesn't depend on reference frame:

$$
P^i = \gamma (mc, mv_x, mv_y, mv_z) = (\gamma mc, \textbf{p}),
$$

where we define $\textbf{p} = \gamma m \textbf{v}$ as the relativistic momentum. Note that $\textbf{v}$ is the ordinary velocity while $\textbf{p}$ is the relativistic momentum. A bit confusing, but it appears to be standard notation.

The quantity $\gamma mc$ is actually something surprising, which you can guess from the title of this section. Let's expand it out to first order in $\beta^2$^{[21]}:

$$
\gamma mc = mc (1 - \frac{v^2}{c^2})^{-1/2} \approx mc (1 + \frac{v^2}{2c^2}) = \frac{1}{c} (mc^2 + \frac{1}{2} m v^2)
$$

We see the classical kinetic energy $\frac{1}{2} m v^2$ pop out, along with another quantity with units of energy, $mc^2$. This suggests we define the total energy $E$ as $\gamma mc^2$. It consists of kinetic energy plus $mc^2$, which we call the *rest energy*, since it's the energy when $v = 0$^{[22]}.
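To get a feel for how good the approximation is, here is a small check that compares the kinetic energy $\gamma mc^2 - mc^2$ with the classical $\frac{1}{2} m v^2$ at a modest speed. The mass and speed are arbitrary choices. The leftover relative difference should be of order $\beta^2$, since that is the size of the first term we dropped.

```python
import math

C = 299_792_458.0  # speed of light in m/s
m = 1.0            # mass in kg (arbitrary)
beta = 0.01        # v / c, slow enough for the expansion to work well
v = beta * C

gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
kinetic_exact = (gamma - 1.0) * m * C ** 2  # total energy minus rest energy
kinetic_classical = 0.5 * m * v ** 2

rel_err = (kinetic_exact - kinetic_classical) / kinetic_classical
print(rel_err)  # small and positive, of order beta**2
```

Try increasing `beta` towards 1 to watch the classical expression fall further and further behind.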

So, we can rewrite the four-momentum as:

$$
P^i = (E / c, \textbf{p})
$$

The associated invariant is, of course, $m^2c^2$, since we just multiplied the four-velocity by $m$. Thus,

$$
E^2 / c^2 - p^2 = m^2 c^2,
$$

or

$$
\begin{equation}
E^2 = m^2 c^4 + p^2 c^2.
\label{eq:eprel}
\end{equation}
$$

This is the relationship between energy and momentum in special relativity.
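A quick numerical check of $\eqref{eq:eprel}$, using $E = \gamma mc^2$ and $\textbf{p} = \gamma m \textbf{v}$ as defined above. The mass and speed are arbitrary sample values. The last line runs the relation backwards: given only $E$ and $p$, you can recover the mass.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def energy_and_momentum(m, v):
    """Total energy E = gamma*m*c^2 and momentum p = gamma*m*v for speed v."""
    g = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return g * m * C ** 2, g * m * v

m = 2.0  # kg (arbitrary)
E, p = energy_and_momentum(m, 0.8 * C)

lhs = E ** 2
rhs = m ** 2 * C ** 4 + p ** 2 * C ** 2
rel_diff = abs(lhs - rhs) / lhs

# Running the relation backwards: the mass from E and p alone.
m_recovered = math.sqrt(E ** 2 - (p * C) ** 2) / C ** 2
print(rel_diff, m_recovered)
```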

The laws of conservation of energy and momentum in classical mechanics are now combined into a single law, the *law of conservation of four-momentum*, which we won't prove here. It states that the total four-momentum of all particles in a closed system is conserved^{[23]}.
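As a hypothetical illustration of how the law gets used (this example is mine, not part of the derivation above): two particles of equal mass collide head-on with equal and opposite velocities and stick together. Adding their four-momenta gives the four-momentum of the final lump, and its invariant gives the lump's mass. The sketch uses units where $c = 1$.

```python
import math

def four_momentum(m, vx):
    """P^i = gamma * (m, m*vx, 0, 0) for motion along x, in units where c = 1."""
    g = 1.0 / math.sqrt(1.0 - vx ** 2)
    return (g * m, g * m * vx, 0.0, 0.0)

m = 1.0
P1 = four_momentum(m, 0.6)   # moving in +x
P2 = four_momentum(m, -0.6)  # moving in -x

# Conservation of four-momentum: the lump carries the component-wise total.
P_total = tuple(a + b for a, b in zip(P1, P2))

# The lump's mass from the invariant E^2 - p^2 = M^2 (with c = 1).
E = P_total[0]
p_squared = sum(comp ** 2 for comp in P_total[1:])
M = math.sqrt(E ** 2 - p_squared)
print(P_total, M)
```

The final mass comes out larger than $2m$: the kinetic energy of the incoming particles has ended up as rest energy of the lump, which only makes sense once energy and momentum are treated as one conserved four-vector.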

$$
\textbf{T}(\textbf{x} + \Delta \textbf{x}) = \textbf{T}(\textbf{x}) + \Delta \textbf{x}
$$

but remember, when you apply $\textbf{T}$ to the components, it applies to all vectors, including $\Delta \textbf{x}$.

$$
\frac{\text{dx}^i}{\text{d}\tau} = \frac{\text{dx}^i}{\text{dt}} \frac{\text{dt}}{\text{d}\tau}
$$

$$
(1 + \epsilon)^n \approx 1 + n\epsilon \; \; \; \; \; (|\epsilon| \ll 1),
$$

which we use here.

"It is not good to introduce the concept of the mass $M = m/\sqrt{1 - v^2/c^2}$ of a moving body for which no clear definition can be given. It is better to introduce no other mass concept than the 'rest mass' $m$. Instead of introducing $M$ it is better to mention the expression for the momentum and energy of a body in motion."