Cartesian Coordinates - Introduction to Computer Assisted Surgeries

Linear algebra, particularly Cartesian coordinate systems and vector geometry, is fundamental to computer assisted surgery (CAS). To explain why it is essential to all aspects of CAS, let’s begin with something familiar, a (medical) image:

Figure 1: An ultrasound image of a liver containing metastatic cancer growths, image courtesy here, accessed on January 6, 2026.

From Pixels to Positions: Why We Need Coordinates

Consider an ultrasound (US) image of a patient’s liver. Displayed on a computer monitor as a 2D grey-scale image, different shades of grey represent different tissue types (often highlighted by tissue boundaries). In Figure 1, for example, a tumour appears as a darker region within the brighter liver parenchyma. But how does one infer the size and location of a tumour from an image?

In a digital computer, a grey-scale image is represented as a 2D matrix of numerical values. Each element of this matrix represents a pixel, and its numerical value encodes the tissue’s appearance as a pixel intensity. For example, Figure 1 is an image of $245 \times 245$ pixels stored as a matrix:

I = \begin{bmatrix} I_{0,0} & I_{0,1} & \cdots & I_{0,244} \\ I_{1,0} & I_{1,1} & \cdots & I_{1,244} \\ \vdots & \vdots & \ddots & \vdots \\ I_{244,0} & I_{244,1} & \cdots & I_{244,244} \end{bmatrix}

(1)

where each $I_{i,j}$ represents the intensity value at row $i$ and column $j$. Each pixel in an US image is typically stored using 8 bits (i.e. 1 byte), with the pixel intensity value ranging from 0 to 255, where $I_{i,j}=0$ is pure black and $I_{i,j}=255$ is pure white.
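As a minimal sketch of this representation, the Python snippet below stores a tiny, made-up $3 \times 3$ grey-scale image as a nested list and checks that every value fits in 8 bits. The pixel values and the helper name `is_valid_8bit_image` are illustrative assumptions, not data taken from Figure 1.

```python
# A tiny, made-up 3 x 3 grey-scale image (hypothetical values, not Figure 1).
# image[i][j] is the intensity at row i and column j.
image = [
    [0, 64, 128],
    [32, 96, 160],
    [64, 128, 255],
]


def is_valid_8bit_image(img):
    """Return True if img is rectangular and every intensity lies in 0..255."""
    if not img:
        return False
    width = len(img[0])
    return all(
        len(row) == width and all(0 <= value <= 255 for value in row)
        for row in img
    )


rows, cols = len(image), len(image[0])
print(rows, cols)                  # image dimensions (rows, columns)
print(image[0][0], image[2][2])    # top-left and bottom-right intensities
print(is_valid_8bit_image(image))
```

In practice an array library would normally be used for this; nested lists are chosen here only to keep the sketch free of dependencies.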

Figure 2: A coordinate system assigned to an US image, where the top-left corner is *arbitrarily* designated as the origin. A pixel at location $(x,y)$ with an intensity value $v$ is thus referred to as $I(x,y)=v$. Image modified from the image courtesy here, accessed on January 6, 2026.

That is, each pixel in a 2D image is referred to by its $(x,y)$ position, which also serves as the index into the 2D matrix used to store and represent the image in a computer system. The $(x,y)$ location also represents a vector from the origin of the coordinate system, $(0,0)$.
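The relationship between the $(x,y)$ position and the matrix index can be sketched as follows. This assumes, as is common but not stated in the text, that $x$ runs along the columns and $y$ along the rows, so that $I(x,y)$ is stored at row $y$ and column $x$. The helper name `intensity_at` and the pixel values are hypothetical.

```python
# Hypothetical 2 x 3 image: 2 rows (y = 0..1) and 3 columns (x = 0..2).
image = [
    [10, 20, 30],
    [40, 50, 60],
]


def intensity_at(img, x, y):
    """Return I(x, y), assuming x indexes columns and y indexes rows."""
    height, width = len(img), len(img[0])
    if not (0 <= x < width and 0 <= y < height):
        raise IndexError(f"pixel ({x}, {y}) is outside the {width} x {height} image")
    return img[y][x]  # row first, then column


v = intensity_at(image, x=2, y=1)
print(v)
```

Swapping the two indices is an easy mistake to make, which is why the sketch hides the `img[y][x]` ordering behind a single accessor.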

The Challenge: From Image Indices to Physical Space

Suppose a radiologist examines this US image and identifies the centre of one of the tumours at pixel location $(66,110)$ in the image. The logical questions that follow are:

Where is this location in the patient’s body? The matrix index $(66,110)$ says nothing about its physical location with respect to the patient.
What is the physical size of the tumour? The 2D matrix, by itself, carries no information about the physical scale of the image. That is, how do we convert a size in pixels to a physical unit such as millimetres?
How does one guide a surgical instrument to this location? Suppose we want to insert a radiofrequency ablation (RFA) applicator into this tumour. We then need to relate this abstract pixel coordinate system to a physical location that a navigation system can use.

Answering these questions requires an understanding of image representation, coordinate systems, and transformations between coordinate systems. In this regard, a review of linear algebra and vector geometry is needed.

Multi-Channel Images and 3D Volumetric Data

While a 2D grey-scale image such as an US or X-ray image can be represented by a 2D matrix in a digital system, colour images and volumetric (i.e. 3D) data require different representations.

For example, a colour image is typically represented as three 2D images, each corresponding to one colour channel of the red, green, and blue (RGB) primary colours. Conceptually, these three 2D images can be stacked on top of each other, forming a single 2D image in which each pixel is a 1D vector of 3 values (i.e. multi-channel).

Figure 3: A conceptual representation of a 5x5 RGB colour image, image courtesy here, accessed on January 6, 2026.
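The stacking of three channels into one multi-channel image can be sketched in a few lines of Python. The $2 \times 2$ channel values and the function name `stack_channels` are made up for illustration.

```python
# A hypothetical 2 x 2 RGB image stored as three separate 2D channels.
red = [[255, 0], [0, 128]]
green = [[0, 255], [0, 128]]
blue = [[0, 0], [255, 128]]


def stack_channels(r, g, b):
    """Combine three equally sized 2D channels into one image of (R, G, B) pixels."""
    return [
        [(r[i][j], g[i][j], b[i][j]) for j in range(len(r[0]))]
        for i in range(len(r))
    ]


rgb = stack_channels(red, green, blue)
print(rgb[0][0])   # pixel at row 0, column 0
print(rgb[1][1])   # pixel at row 1, column 1
```

Each entry of `rgb` is now a vector of 3 values, which is the multi-channel view described above.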

3D volumetric image data such as CT or MRI are typically displayed, on a 2D monitor, as a series of 2D images:

Figure 4: Sequential CT slices displayed as 2D images, image courtesy here, accessed on January 6, 2026.

This was done mainly for historical reasons, including the lack of computational power to render volumetric data in 3D using techniques such as volume rendering.

Volumetric data are represented digitally as a 3D array where, conceptually, each 2D slice is assigned an $x$-$y$ coordinate system and sequential images are stacked along the $z$-axis.

Figure 5: A coordinate system assigned to 3D volumetric data, where sequential 2D images are assigned a z-coordinate value.

In this regard, each voxel (i.e. a volumetric pixel) is indexed using the $(x,y,z)$ notation.
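A minimal sketch of voxel indexing is given below. It assumes a storage order of `volume[z][y][x]` (slice, then row, then column), which is one common choice rather than something fixed by the text. The function names and the volume size are hypothetical.

```python
def make_volume(size_x, size_y, size_z, fill=0):
    """Create a volume as size_z slices, each with size_y rows and size_x columns."""
    return [[[fill] * size_x for _ in range(size_y)] for _ in range(size_z)]


def get_voxel(volume, x, y, z):
    """Return the voxel at (x, y, z), assuming storage order volume[z][y][x]."""
    return volume[z][y][x]


def set_voxel(volume, x, y, z, value):
    """Set the voxel at (x, y, z), assuming storage order volume[z][y][x]."""
    volume[z][y][x] = value


volume = make_volume(size_x=4, size_y=3, size_z=2)
set_voxel(volume, x=3, y=1, z=1, value=200)

print(get_voxel(volume, 3, 1, 1))
print(len(volume), len(volume[0]), len(volume[0][0]))  # slices, rows, columns
```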

Image Fusion and Augmented Reality Visualization

Image fusion and augmented reality visualization require the alignment of multiple coordinate systems. This is accomplished via a transformation that maps one coordinate system onto another.

Categorically, there are three types of transformations:

Rigid transformation
- Represented as a $4 \times 4$ transformation matrix that
  - Rotates, and
  - Translates one coordinate system onto another.
Similarity transformation
- Similar to the rigid transformation, but with scaling
- Also represented as a $4 \times 4$ transformation matrix
- Scaling can be:
  - Isotropic, same in all 3 directions, or
  - Anisotropic, different in each direction
Deformable transformation
- In some literature, referred to as elastic transformation
- Typically represented as an array (2D or 3D) of deformation fields, or by some parametric function.

For the purpose of this course, we focus only on rigid and similarity transformations.
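To make the $4 \times 4$ representation concrete, the sketch below builds a rigid transformation (a rotation about the z-axis followed by a translation) and a similarity transformation (the same rigid transformation applied after an isotropic scaling), then applies both to a 3D point written in homogeneous coordinates $(x, y, z, 1)$. The function names, the choice of rotation axis, and all numerical values are assumptions made for illustration only.

```python
import math


def rigid_transform_z(angle_deg, tx, ty, tz):
    """4 x 4 homogeneous matrix: rotation about the z-axis, then translation."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def scaling(sx, sy, sz):
    """4 x 4 scaling matrix; isotropic when sx == sy == sz."""
    return [
        [sx, 0.0, 0.0, 0.0],
        [0.0, sy, 0.0, 0.0],
        [0.0, 0.0, sz, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Multiply two 4 x 4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply_transform(matrix, point):
    """Apply a 4 x 4 transformation to a 3D point (x, y, z)."""
    x, y, z = point
    h = [x, y, z, 1.0]  # homogeneous coordinates
    out = [sum(matrix[i][k] * h[k] for k in range(4)) for i in range(4)]
    return tuple(out[:3])


rigid = rigid_transform_z(90.0, 10.0, 0.0, 5.0)
similarity = matmul(rigid, scaling(2.0, 2.0, 2.0))  # scale first, then rigid

print(apply_transform(rigid, (1.0, 0.0, 0.0)))       # approximately (10, 1, 5)
print(apply_transform(similarity, (1.0, 0.0, 0.0)))  # approximately (10, 2, 5)
```

A rigid transformation preserves distances between points, whereas this similarity transformation multiplies every distance by the scale factor (here 2).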

Cartesian Coordinate Systems

The key is to establish a Cartesian coordinate system, which specifies each point uniquely by a pair (in 2D) or a triplet (in 3D) of real numbers called coordinates. This in turn allows us to use vector geometry to address these problems.

A Cartesian coordinate system has

An origin, denoted as $O=(0,0)$ in 2D or $O=(0,0,0)$ in 3D,
Axes, a notion of directions, that intersect at the origin. In most of the scenarios we will encounter in this course, these axes are perpendicular to each other, and
A scale. Each increment along an axis, e.g. from $(0,0)$ to $(1,0)$, may correspond to a physical unit. For example, a pixel in an US image may occupy a physical space of $1.0mm \times 1.0mm$ , but a voxel in a CT volume may occupy a physical space of $0.3mm \times 0.3mm \times 0.4mm$ . Often, an anisotropic scaling is needed to match one coordinate system with another.
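The role of the scale can be sketched as a conversion from voxel indices to millimetres. The snippet below uses the CT voxel size quoted above and assumes that the image axes are aligned with the physical axes (no rotation) and that the physical origin coincides with voxel $(0,0,0)$. The function name, the voxel index, and the structure size are hypothetical.

```python
def index_to_physical(index, spacing, origin=(0.0, 0.0, 0.0)):
    """Map a voxel index (x, y, z) to millimetres using per-axis spacing.

    Assumes the image axes are aligned with the physical axes (no rotation).
    """
    return tuple(o + i * s for i, s, o in zip(index, spacing, origin))


# Voxel size from the CT example in the text: 0.3 mm x 0.3 mm x 0.4 mm.
ct_spacing = (0.3, 0.3, 0.4)

position_mm = index_to_physical((100, 50, 20), ct_spacing)
print(position_mm)

# Physical extent of a hypothetical structure spanning 30 x 30 x 10 voxels.
extent_mm = tuple(n * s for n, s in zip((30, 30, 10), ct_spacing))
print(extent_mm)
```

Because the spacing differs between axes, the same number of voxels corresponds to different physical lengths in-plane and across slices, which is the anisotropic case mentioned above.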

Vector Geometry: Describing Positions and Transformations

In medical imaging and most engineering fields, we use the right-hand rule (RHR) to define the orientation of the axes. That is, with the thumb, index finger, and middle finger of the right hand extended, if the thumb points along the first (positive x-) axis and the index finger points along the second (positive y-) axis, then the middle finger points along the third (positive z-) axis.

This is the opposite of the left-hand rule, under which the third (positive z-) axis points in the reverse direction. Left-handed frames are used in some computer graphics systems.
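Handedness can also be checked numerically: in a right-handed frame, the cross product of the first two axes points along the third, so the scalar triple product $(i \times j) \cdot k$ is positive. The sketch below illustrates this test; the function names are assumptions.

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(u * v for u, v in zip(a, b))


def is_right_handed(i, j, k):
    """True if the axes (i, j, k) form a right-handed frame: (i x j) . k > 0."""
    return dot(cross(i, j), k) > 0


x_axis, y_axis, z_axis = (1, 0, 0), (0, 1, 0), (0, 0, 1)

print(is_right_handed(x_axis, y_axis, z_axis))       # right-handed frame
print(is_right_handed(x_axis, y_axis, (0, 0, -1)))   # z flipped: left-handed frame
```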

Notation for this course

Figure 8: Right-handed **Cartesian** coordinate system (a.k.a. frame), with orthonormal basis $(i,j,k)$.

We use the following notation:

Points are written in capital letters
Vectors and location vectors are usually in lower case, often in italic and/or bold
$\lvert p \rvert$ denotes the length of a vector (or the “absolute value”)
Sometimes, we omit bold and/or italics
Sometimes, we use location vectors and points interchangeably
Sometimes, we use the same letter and font for x,y,z coordinates and labelling the x,y,z axes

Thus, a location vector $p$ can be expressed as a linear combination of the base vectors

p = P = x i + y j + z k

(2)

and

\text{length}(p) = \lvert p \rvert = \sqrt{x^2 + y^2 + z^2}

(3)
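Equations (2) and (3) can be sketched directly in code: the location vector is assembled as a linear combination of the base vectors, and its length follows from the coordinates. The function names and the example coordinates are illustrative assumptions.

```python
import math

# Orthonormal base vectors of a right-handed Cartesian frame.
I_HAT = (1.0, 0.0, 0.0)
J_HAT = (0.0, 1.0, 0.0)
K_HAT = (0.0, 0.0, 1.0)


def location_vector(x, y, z, basis=(I_HAT, J_HAT, K_HAT)):
    """Return p = x i + y j + z k as a tuple of three components."""
    i, j, k = basis
    return tuple(x * i[n] + y * j[n] + z * k[n] for n in range(3))


def length(p):
    """Length of p in an orthonormal frame: sqrt(x^2 + y^2 + z^2)."""
    return math.sqrt(sum(c * c for c in p))


p = location_vector(3.0, 4.0, 12.0)
print(p)
print(length(p))
```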

Skewed Cartesian Frame

Some CT scanners, such as the Toshiba Asteion 4, can perform tilted scans. One intended use is to reduce the X-ray radiation dose to the patient’s eye lenses.

Figure 9: A tilted Computed Tomography (CT) machine, aimed to reduce X-ray dose to eye lenses, image courtesy of Prof. Gabor Fichtinger at Queen’s University, Canada

In this scenario, the base vectors are no longer pairwise orthogonal. However, a point $P$ or location vector $p$ can still be represented as a linear combination of the base vectors

p = P = x i + y j + z k

(4)

But the length of the vector is, in general, no longer the square root of the sum of the squared coordinates:

\lvert p \rvert \ne \sqrt{x^2 + y^2 + z^2}

(5)
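The length in a skewed frame can still be computed from the dot products of the base vectors. Expanding $p \cdot p$ with unit-length base vectors gives $\lvert p \rvert^2 = x^2 + y^2 + z^2 + 2xy\,(i \cdot j) + 2xz\,(i \cdot k) + 2yz\,(j \cdot k)$, which reduces to the orthonormal formula only when all the cross terms vanish. The sketch below evaluates this for a hypothetical frame in which the third base vector is tilted by 20 degrees toward the second; the tilt angle, the coordinates, and the function names are assumptions and do not model any specific scanner.

```python
import math


def skewed_basis(tilt_deg):
    """Unit base vectors (i, j, k) with k tilted toward j by tilt_deg degrees."""
    t = math.radians(tilt_deg)
    i = (1.0, 0.0, 0.0)
    j = (0.0, 1.0, 0.0)
    k = (0.0, math.sin(t), math.cos(t))  # unit length, but not orthogonal to j
    return i, j, k


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(u * v for u, v in zip(a, b))


def length_in_frame(coords, basis):
    """|p| for p = x i + y j + z k, summing c_a * c_b * (b_a . b_b) over all pairs."""
    total = 0.0
    for ca, ba in zip(coords, basis):
        for cb, bb in zip(coords, basis):
            total += ca * cb * dot(ba, bb)
    return math.sqrt(total)


coords = (0.0, 3.0, 4.0)
naive = math.sqrt(sum(c * c for c in coords))          # orthonormal formula
skewed = length_in_frame(coords, skewed_basis(20.0))   # true length in skewed frame
orthonormal = length_in_frame(coords, skewed_basis(0.0))

print(naive, skewed, orthonormal)
```

With zero tilt the general formula agrees with the orthonormal one, while with a non-zero tilt the two differ, as stated in (5).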

Figure 10: A tilted Computed Tomography (CT) image volume, image courtesy of Prof. Gabor Fichtinger at Queen’s University, Canada

Polar Frames

For mathematical convenience, polar frames are often used to represent coordinate systems.

Spherical Coordinate System

A vector $\overrightarrow{r}$ with length $r$ is described by its length and two angles: $\theta$, measured from the positive $z$-axis, and $\phi$, measured from the positive $x$-axis within the $x$-$y$ plane:

\begin{align} x &= r \sin{\theta} \cos{\phi}\\ y &= r \sin{\theta} \sin{\phi}\\ z &= r \cos{\theta} \end{align}

(6)
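Equation (6) and its inverse can be sketched as a pair of conversion functions. Consistent with (6), $\theta$ is taken as the angle from the positive z-axis and $\phi$ as the angle from the positive x-axis in the x-y plane, both in radians. The function names, and the choice to return zero angles at the origin (where they are undefined), are assumptions.

```python
import math


def spherical_to_cartesian(r, theta, phi):
    """Convert (r, theta, phi) to (x, y, z) following equation (6)."""
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z


def cartesian_to_spherical(x, y, z):
    """Convert (x, y, z) to (r, theta, phi); angles are in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0  # angles are undefined at the origin; zeros by convention
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # clamp against rounding error
    phi = math.atan2(y, x)
    return r, theta, phi


r, theta, phi = cartesian_to_spherical(1.0, 1.0, 1.0)
print(r, math.degrees(theta), math.degrees(phi))
print(spherical_to_cartesian(r, theta, phi))  # recovers approximately (1, 1, 1)
```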

One use case for the spherical coordinate system is the Leksell frame, named after Lars Leksell:

Figure 12: A version of the Leksell stereotactic frame with angular markings, image courtesy here, accessed January 8, 2026.

The frame is commonly used in neurosurgery to insert a needle (e.g. a biopsy needle or an electrode):

Figure 13: A Leksell frame used in the OR, image courtesy of Prof. Gabor Fichtinger at Queen’s University, Canada

Because a needle is intuitively represented as a vector (i.e. a point with a direction), the spherical coordinate system is mathematically more convenient here.

Cylindrical Coordinate System

The cylindrical coordinate system is used to describe a point on the surface of a cylinder, where a point is represented by a length $\rho$ (the radius of the cylinder), an angle $\psi$, and a height $z$:

\begin{align} x &= \rho \cos{\psi} \\ y &= \rho \sin{\psi} \\ z &= z \end{align}

(7)
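Equation (7) and its inverse can be sketched in the same way as the spherical case. The angle $\psi$ is taken in radians and measured from the positive x-axis, as implied by (7). The function names and the example values (a point on a cylinder of radius 10 mm) are hypothetical.

```python
import math


def cylindrical_to_cartesian(rho, psi, z):
    """Convert (rho, psi, z) to (x, y, z) following equation (7)."""
    return rho * math.cos(psi), rho * math.sin(psi), z


def cartesian_to_cylindrical(x, y, z):
    """Convert (x, y, z) to (rho, psi, z); psi is in radians."""
    rho = math.hypot(x, y)
    psi = math.atan2(y, x)  # returns 0.0 on the cylinder axis, where psi is undefined
    return rho, psi, z


# A hypothetical point on a cylinder of radius 10 mm, at 90 degrees and height 25 mm.
x, y, z = cylindrical_to_cartesian(10.0, math.pi / 2, 25.0)
print(x, y, z)
print(cartesian_to_cylindrical(x, y, z))
```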

One use of the cylindrical coordinate system is in intracavity robots where, as an example, a transrectal ultrasound (TRUS) probe is used in surgery.

Figure 15: A transrectal ultrasound used in an MRI suite, image courtesy of Prof. Gabor Fichtinger at Queen’s University, Canada

Because the TRUS probe is a cylindrical object, it is more convenient to define points on its surface (and thereby a direction) using the cylindrical coordinate system.

Figure 16: A transrectal ultrasound transducer, image courtesy of Prof. Gabor Fichtinger at Queen’s University, Canada