Factor Models Primer
Factor models seek to distill often high-dimensional data into a small set of underlying, or latent, factors that capture much of the original variation in the data.
We say the data, \(X\), follows a factor structure if:
\[ \begin{align} \underset{(T \times n)}{X} = \underset{(T \times r)}{F} \underset{(r \times n)}{\Lambda^{*'}} + \underset{(T \times n)}{e} \end{align} \]
where there are
\(T\) rows (observations),
\(n\) columns (variables), and
\(r\) factors. We refer to
\(\lambda_{ik}^*\) as the entry of \(\Lambda^{*}\) in the \(i\)th row and \(k\)th column.
This \(\lambda_{ik}^*\) shows how factor \(k\) is related to, or “loads onto,” variable \(i\). Note that the \(k\)th column of \(\Lambda^*\) will be referred to as \(\lambda_k^*\).
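To make the notation concrete, here is a minimal R sketch (not taken from the paper or package) that simulates data with an exact factor structure; the dimensions and noise level are arbitrary and purely illustrative.

```r
# Minimal sketch: simulate data satisfying X = F Lambda*' + e
# (all dimensions and the noise scale are illustrative assumptions)
set.seed(1)
T_obs <- 200  # number of observations (rows)
n     <- 10   # number of variables (columns)
r     <- 2    # number of factors
F_mat       <- matrix(rnorm(T_obs * r), T_obs, r)            # T x r factors
Lambda_star <- matrix(rnorm(n * r), n, r)                    # n x r true loadings
e           <- matrix(rnorm(T_obs * n, sd = 0.1), T_obs, n)  # idiosyncratic noise
X <- F_mat %*% t(Lambda_star) + e                            # T x n data matrix
```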
We assume that the number of underlying factors can be learned from the data (e.g. following the procedure in Bai and Ng (2002) or Ahn and Horenstein (2013)).1 Next, we’re interested in estimating the loading matrix \(\Lambda^*\), which shows how the factors are related to the data matrix \(X\).
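As a rough illustration of how the number of factors might be estimated, the sketch below implements an eigenvalue-ratio rule in the spirit of Ahn and Horenstein (2013): choose the \(r\) at which consecutive eigenvalues of the demeaned data drop most sharply. This is a simplified sketch, not the exact procedure from either cited paper.

```r
# Sketch of an eigenvalue-ratio estimator for the number of factors
# (simplified; assumes r_max is well below the number of variables)
estimate_num_factors <- function(X, r_max = 8) {
  X_c <- scale(X, scale = FALSE)                # demean each column
  ev  <- eigen(crossprod(X_c), symmetric = TRUE)$values
  ratios <- ev[1:r_max] / ev[2:(r_max + 1)]     # consecutive eigenvalue ratios
  which.max(ratios)                             # r with the sharpest drop
}
```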
Problem: Rotational Indeterminacy
However, no unique pair of estimates of \(F\) and \(\Lambda^*\) satisfies the equation above.
To illustrate, let \(H\) be any nonsingular \(r \times r\) matrix. We can define an alternative loading matrix \(\Lambda^0 = \Lambda^* (H')^{-1}\) and alternative factors \(F^0 = FH\), and it follows that
\[ \begin{align} X & = F^0 \Lambda^{0'} \\ & = FH \left( \Lambda^* (H')^{-1} \right)' \\ & = FH H^{-1} \Lambda^{*'} \\ & = F \Lambda^{*'} \end{align} \]
Hence \((F^0,\Lambda^{0})\) and \((F,\Lambda^*)\) are observationally equivalent. This is referred to as rotational indeterminacy, and it makes the interpretation of any estimate of the loading matrix delicate. Each arbitrary rotation \(H\) results in an alternative interpretation of how the factors and variables are related. But we are interested in the true relationship, \(\Lambda^*\). So how can we recover the true interpretation?
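A quick numerical check of this equivalence, continuing the simulated objects from the sketch above (a random Gaussian \(H\) is nonsingular with probability one):

```r
# Sketch: any nonsingular H yields an observationally equivalent pair (F0, Lambda0)
H       <- matrix(rnorm(r * r), r, r)   # arbitrary (almost surely nonsingular) r x r matrix
F0      <- F_mat %*% H                  # rotated factors
Lambda0 <- Lambda_star %*% t(solve(H))  # Lambda* (H')^{-1}
max(abs(F0 %*% t(Lambda0) - F_mat %*% t(Lambda_star)))  # ~ 0 (up to rounding error)
```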
Solution: Use Sparsity
The key idea in Freyaldenhoven (2025) is that assuming a sparsity pattern in the true loadings matrix \(\Lambda^*\) solves the issue of rotational indeterminacy since the sparsity pattern is not invariant to rotations. Intuitively, any rotation (or linear combination) of a sparse loading vector will be less sparse.
To see this, note that PCA gives us estimates that are asymptotically a linear combination of the true loadings vectors. That is, we can obtain principal component vectors in the columns of \(\Lambda^0\) such that
\[ \begin{align} \Lambda^0 = \Lambda^*(H^{-1})' + \epsilon \end{align} \] where \(H\) is a nonsingular \(r \times r\) matrix and \(\epsilon\) is “small”.
Thus, \(\Lambda^0\) is (close to) a linear combination of \(\Lambda^*\). However, even if \(\Lambda^*\) is sparse, linear combinations of the sparse loadings in \(\Lambda^*\), such as \(\Lambda^0\), are generally not sparse.
Conversely, \(\Lambda^*\) is also (close to) a linear combination of \(\Lambda^0\). Thus, there must be a linear combination of \(\Lambda^0\) that is sparse. This sparse linear combination of \(\Lambda^0\) will be our estimator for \(\Lambda^*\).
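The following small sketch, with a made-up sparse loadings matrix, illustrates both points: mixing the sparse columns destroys the zeros, and undoing the mixing recovers them.

```r
# Sketch: sparsity is lost under an arbitrary rotation and recovered by undoing it
Lambda_sparse <- cbind(c(1, 0, 0), c(0, 1.3, 0.3))  # hypothetical sparse loadings (n = 3, r = 2)
H <- matrix(c(0.8, 0.6, -0.6, 0.8), 2, 2)           # a rotation by about 37 degrees
Lambda_mixed <- Lambda_sparse %*% t(solve(H))       # analogue of Lambda0: no zeros remain
Lambda_mixed %*% t(H)                               # undoing the rotation restores the zeros
```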
Recovering Sparsity
Now, how do we find this sparse linear combination of loading vectors?
A simple approach would be:
- Take the principal components estimator as a starting point to obtain \(\Lambda^0\)
- Find the rotation of \(\Lambda^0\) that minimizes the number of non-zero elements (the \(l_0\) norm) in the loading vectors, and use it as our estimate of \(\Lambda^*\)
Unfortunately, the \(l_0\) norm is infeasible to optimize over in practice. To see why, let’s examine the figure below. The distance from the origin to each red point depicts the \(l_0\) norm of points along the unit circle. The blue points depict the \(l_1\) norm and the gray points depict the \(l_2\) norm. In the \(l_2\) case, these distances remain constant as we traverse the circle.
The \(l_0\) norm directly counts the number of nonzero elements, shown by the red points. At (1, 0), (0, 1), (-1, 0), and (0, -1), these are a distance of 1 away from the origin. The remaining red points are a distance of 2 away from the origin, corresponding to the 2 nonzero elements of every other point along the unit circle. Since the blue and red dots coincide at these four points, the \(l_1\) norm is minimized at the same points as the \(l_0\) norm. But the \(l_1\) norm descends smoothly to these minima, whereas the \(l_0\) norm is discontinuous around them. This makes the \(l_1\) norm easier to minimize.
Thus, to maximize sparsity we minimize the \(l_1\) norm instead of the \(l_0\) norm.
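The comparison in the figure can be reproduced in a few lines of R (a sketch, using a small tolerance to decide which coordinates count as zero):

```r
# Sketch: l0, l1, and l2 norms of points on the unit circle
theta <- seq(0, 2 * pi, length.out = 361)
pts   <- cbind(cos(theta), sin(theta))
l0 <- rowSums(abs(pts) > 1e-12)   # number of nonzero coordinates (1 on the axes, else 2)
l1 <- rowSums(abs(pts))           # varies smoothly, minimized at the axes
l2 <- sqrt(rowSums(pts^2))        # constant: always 1 on the unit circle
```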
Objective Function
Formally, using the \(l_1\) norm, our objective function becomes
\[ \begin{align} \min_R ||\Lambda^0R||_1 \end{align} \] such that \(R\) is nonsingular and \(||r_k||_2 = 1\), where \(r_k\) is a column of \(R\). This keeps the length of each loading vector fixed, ensuring we look only at rotations of each initial loading vector. Since the length remains constant, we can transform this constrained optimization problem into an unconstrained one by converting to polar coordinates. For example, in the case of two factors, in the constrained version we want to find weights \(w_1, w_2\) that minimize
\[ || w_1 \lambda_1^0 + w_2 \lambda_2^0 ||_1 \quad s.t. \quad w_1^2 + w_2^2 = 1. \] In polar coordinates, this amounts to finding an angle \(\theta\) that minimizes
\[ ||\lambda_1^0 \cos \theta + \lambda_2^0 \sin \theta ||_1. \]
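As an illustration of the polar-coordinate version, the sketch below minimizes this objective over \(\theta\) with a simple grid search; the two initial loading vectors are hypothetical values used only for illustration, and stats::optimize() could refine the answer around a grid minimum.

```r
# Sketch: two-factor l1 objective in polar coordinates
l1_objective <- function(theta, lambda1, lambda2) {
  sum(abs(lambda1 * cos(theta) + lambda2 * sin(theta)))
}
# hypothetical initial (PC-style) loading vectors
lambda1 <- c(0.9, 0.7, -0.6)
lambda2 <- c(0.2, -0.8, -1.1)
grid <- seq(0, 2 * pi, length.out = 1000)
vals <- sapply(grid, l1_objective, lambda1 = lambda1, lambda2 = lambda2)
grid[which.min(vals)]   # angle giving the sparsest rotated loading vector
```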
A simple example
Let’s consider an example we can visualize fully: we’ll use the following simulated data that has three columns (\(n = 3\)) and suppose we know that there are two factors, \(r = 2\). Only the first 6 of 224 total rows are printed below.
          V1         V2         V3
1  0.4814018  0.7468954 -1.4401043
2 -0.8309668 -0.6558851  1.5983829
3  0.2360548  0.8199547  1.8496925
4  0.3685305  0.9678467 -1.1387934
5 -0.5266293  0.3336697 -0.0650335
6 -0.2464255 -0.5563491  0.8534258
Let the true loadings matrix, \(\Lambda^*\), have 2 columns and 3 rows. As we can see, there is a sparsity pattern in the matrix, with the first factor affecting only the first column of \(X\) and the second factor affecting only the second and third columns.
loadings_1 loadings_2
1 1.024946 0.0000000
2 0.000000 1.3517553
3 0.000000 0.2765392
Now, let’s look at the PCA estimator of the loadings for the first two principal components of \(X\). The zeroes from above are no longer present, but the principal component estimate still provides a great starting point, as its columns are asymptotically linear combinations of the true loading vectors as \(n \to \infty\) (e.g., Bai (2003)).
loadings_1 loadings_2
1 -1.2668929 -0.2153919
2 -0.8709413 -0.9799329
3 0.7977743 -1.4118561
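For reference, loadings like these could be computed along the following lines (a sketch; sign and scale conventions for principal-component loadings differ across implementations, so the exact numbers above need not match).

```r
# Sketch: principal-component loadings from the data matrix X
pc_loadings <- function(X, r) {
  X_c <- scale(X, scale = FALSE)                       # demean each column
  ev  <- eigen(crossprod(X_c) / nrow(X_c), symmetric = TRUE)
  ev$vectors[, 1:r] %*% diag(sqrt(ev$values[1:r]), r)  # n x r loading estimate
}
# Lambda0 <- pc_loadings(X, r = 2)   # X being the simulated data printed above
```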
Let’s try to get a clearer picture of what we have so far by visualizing these components. Below is an interactive visualization of
- the data points, \(X\)
- the plane spanned by the principal component vectors (blue plane)
- the principal component vectors \(\Lambda^0\) (blue)
- the true loading vectors \(\Lambda^*\) (orange)
Interactive Visualization
As we can see, the blue principal component vectors span the blue plane. By definition, these two vectors provide the directions of maximal variance of the data. We can also imagine this data cloud projected onto the plane to give us a sense of how the data could be compressed from 3 to 2 dimensions. Notice that the blue vectors are nowhere near the true loading vectors shown in orange. However, simply rotating these blue vectors would give us solutions that are closer to the true vectors.
Using the sliders below, which control the angle of rotation for each vector separately, try rotating the blue vectors along the plane to find a rotation that lines up as closely as possible with the orange vectors.
Since \(n = 3\) is small, there is some noise in the PC estimate, which results in the true loading vectors lying slightly off of this plane. Still, we can get a better estimate of the true vectors by rotating.
As we try out different angles of rotation, we can see their impact on the objective function we’re optimizing. The rotation that best aligns with the true loading vectors should also occur at a local minimum of the objective function.
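A sketch of what the sliders are doing under the hood: each angle rotates one loading vector within the plane, and the \(l_1\) objective is evaluated for the resulting pair. Here Lambda0 stands for the 3 x 2 principal-component estimate printed above; note that the rotation matrix must also remain nonsingular, so the two angles should not coincide.

```r
# Sketch: evaluate the l1 objective over a grid of rotation angles
objective <- function(theta1, theta2, Lambda0) {
  R <- cbind(c(cos(theta1), sin(theta1)),   # unit-norm weights for rotated vector 1
             c(cos(theta2), sin(theta2)))   # unit-norm weights for rotated vector 2
  sum(abs(Lambda0 %*% R))
}
# grid <- expand.grid(theta1 = seq(0, 2 * pi, length.out = 180),
#                     theta2 = seq(0, 2 * pi, length.out = 180))
# grid$value <- mapply(objective, grid$theta1, grid$theta2,
#                      MoreArgs = list(Lambda0 = Lambda0))
```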
Note that several minima exist due to sign indeterminacy, i.e., multiplying a sparse loading vector by -1 does not change its sparsity or interpretation.
The above visualizations show the relationship between the angles of rotation, the objective function optimizing for sparsity, and the position of the vectors in space relative to the true loadings. This is the crux of the intuition behind this research and accompanying package.