GEOMETRIC DEEP REGRESSION: INFORMATION DISCOVERY THROUGH ROTOR-FREED REPRESENTATION LEARNING
Sudjianto, A. & Setiawan, S.
We present a geometric approach to deep regression that architecturally decouples representation learning from rotational alignment. Standard neural networks with low-dimensional bottlenecks must simultaneously learn meaningful features and their optimal orientation for prediction, a mixed objective that forces suboptimal compromises. We propose inserting a rotor layer, parametrized by bivectors in the Geometric Algebra Cl(3, 0), between a 3D bottleneck and the prediction head to explicitly handle rotational alignment via learnable transformations in SO(3). Our central finding is that this architectural factorization enables a dramatic information gain: encoders freed from alignment constraints capture substantially more mutual information about targets while maintaining identical predictive accuracy. This information gain manifests as the discovery of interpretable latent factors, such as commodity/cyclical clusters in financial markets and composite condition patterns in transportation demand, that are invisible to standard joint optimization. The mechanism is fundamental: when forced to simultaneously optimize for feature informativeness and task-specific orientation, encoders learn “just enough” representations that suffice for immediate prediction but miss broader structure. Architecturally separating these objectives allows encoders to discover the natural organization of the data beyond task-specific shortcuts. The rotor guarantees optimal post-hoc alignment via geodesic optimization on SO(3), removing the encoder’s need for representational compromise. Our approach is motivated by the observation that least squares regression inherently involves rotational transformations through the eigendecomposition of the covariance matrix. We do not impose external geometric structure but rather expose the rotational operations already implicit in least squares optimization. By using bivectors, the algebraic generators of rotations, we achieve structural consistency between architecture and optimization geometry. Theoretical analysis establishes that rotor updates follow cross-product gradient flow in the tangent space of SO(3), naturally converging to optimal supervised directions. Experiments across both linear and nonlinear encoders, on financial time series and transportation demand data, demonstrate that geometric factorization consistently improves representation quality: better cluster separation, faster convergence, superior transfer learning, and, most significantly, dramatically richer information content that enables interpretable factor discovery. The work establishes a broader principle: when optimization objectives have intrinsic geometric structure, explicit architectural factorization mirroring that structure can yield richer representations by avoiding the compromises inherent in entangled multi-objective learning.
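To make the rotor-layer idea concrete, the sketch below (a minimal illustration under stated assumptions, not the authors' released code) shows one way such a layer could be implemented in PyTorch: a learnable bivector b in Cl(3, 0), identified via duality with an axis-angle vector, generates the rotor R = exp(-θB/2) with θ = |b|, and the sandwich product R z R̃ reduces, for 3D vectors, to the Rodrigues rotation applied here. All class names, variable names, and layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class RotorLayer(nn.Module):
    """Learnable SO(3) rotation of a 3D latent, parametrized by a bivector in Cl(3,0)."""
    def __init__(self):
        super().__init__()
        # Bivector coefficients; by Hodge duality these act as an axis-angle vector in R^3.
        # Small random init avoids the ill-defined gradient of the norm at exactly zero.
        self.bivector = nn.Parameter(0.01 * torch.randn(3))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, 3) bottleneck features.
        theta = torch.linalg.norm(self.bivector)          # rotation angle |b|
        axis = self.bivector / (theta + 1e-12)            # unit rotation axis
        cos_t, sin_t = torch.cos(theta), torch.sin(theta)
        # Rodrigues' formula: z' = z cos(theta) + (axis x z) sin(theta) + axis (axis . z)(1 - cos(theta)),
        # equivalent to the rotor sandwich product R z R~ acting on 3D vectors.
        cross = torch.cross(axis.expand_as(z), z, dim=-1)
        dot = (z * axis).sum(dim=-1, keepdim=True)
        return z * cos_t + cross * sin_t + axis * dot * (1.0 - cos_t)

# Hypothetical usage: encoder -> 3D bottleneck -> rotor -> prediction head.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
rotor, head = RotorLayer(), nn.Linear(3, 1)
y_hat = head(rotor(encoder(torch.randn(8, 16))))  # gradients flow to the bivector parameters

Training the bivector jointly with the encoder and head realizes gradient updates along SO(3), so the encoder is no longer responsible for orienting its bottleneck toward the prediction head; this is only a sketch of the mechanism described in the abstract, not a reproduction of the paper's architecture.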