RXTX: A Machine Learning-Guided Algorithm for Efficient Structured Matrix Multiplication

Discovering faster matrix multiplication algorithms remains a central pursuit in computer science and numerical linear algebra. Since the pioneering contributions of Strassen and Winograd in the late 1960s, which showed that general matrix products can be computed with fewer multiplications than previously thought, a wide range of strategies has emerged: gradient-based methods, heuristics, group-theoretic frameworks, graph-based random walks, and deep reinforcement learning. A distinct line of work focuses on matrix products with inherent structure, such as when the second operand is the transpose of the first or identical to it, or when a matrix is sparse or symmetric. This focus is well motivated: expressions such as AA^T appear throughout statistics, deep learning, and communications, representing key constructs such as Gram and covariance matrices, and XX^T also recurs in large-scale workloads such as the training of large language models.
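As a small illustration of the structure in question (not the paper's algorithm), XX^T is always symmetric, so only about half of its entries are independent, which is exactly the redundancy a structured routine can exploit:

```python
import numpy as np

# Illustrative sketch: XX^T is a symmetric (Gram) matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))   # X in R^{4 x 6}

G = X @ X.T                        # 4x4 Gram matrix

# Symmetry: G[i, j] == G[j, i], so only n(n+1)/2 of the n^2 entries
# are independent -- the structure that specialized XX^T routines use.
assert np.allclose(G, G.T)
```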
Previous studies have explored structured matrix multiplication through both theoretical and machine learning-based approaches. Representation theory and the Cohn–Umans framework have been used to design efficient multiplication schemes for structured matrices. Reinforcement learning has also shown promise: models have learned to discover or rediscover algorithms such as Strassen's. Recent work has optimized the computation of XX^T over finite and complex fields. Among prior approaches, the most effective known method for computing XX^T over the reals applies Strassen's algorithm recursively to a 2×2 block partition, effectively converting the structured problem back into the domain of general matrix multiplication.
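The 2×2 blockwise identity that such recursive baselines build on can be sketched as follows (this is only the block decomposition; the actual baseline would replace the general block products below with Strassen calls and recurse on the symmetric diagonal blocks):

```python
import numpy as np

def block_gram(X):
    """Blockwise XX^T for an even-sized X, illustrating the 2x2 block
    identity recursive-Strassen baselines recurse on."""
    n, m = X.shape
    assert n % 2 == 0 and m % 2 == 0
    A, B = X[: n // 2, : m // 2], X[: n // 2, m // 2:]
    C, D = X[n // 2:, : m // 2], X[n // 2:, m // 2:]
    top_left  = A @ A.T + B @ B.T      # symmetric: recurse as XX^T
    top_right = A @ C.T + B @ D.T      # general matrix product
    bot_right = C @ C.T + D @ D.T      # symmetric: recurse as XX^T
    # The bottom-left block is the transpose of the top-right block,
    # so it comes for free from symmetry.
    return np.block([[top_left, top_right],
                     [top_right.T, bot_right]])
```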
Researchers from the Chinese University of Hong Kong, Shenzhen, and the Shenzhen Research Institute of Big Data have developed RXTX, an algorithm for efficiently computing XX^T, where X ∈ R^{n×m}. Compared with the current leading methods, RXTX reduces the number of required operations (multiplications and additions) by about 5%. Unlike many algorithms that only pay off for large matrices, RXTX delivers improvements even at small sizes (e.g., n = 4). The algorithm was discovered through machine learning-based search combined with combinatorial optimization, exploiting the specific structure of XX^T for constant-factor speedups.
Compared with previous methods, RXTX improves on recursive Strassen and Strassen–Winograd schemes by lowering the number of recursive operations. It uses 26 general matrix multiplications together with an optimized addition scheme, reducing the total operation count. Theoretical analysis shows that RXTX performs fewer multiplications and fewer combined operations, with the advantage growing for larger matrices. In practical tests on 6144×6144 matrices with a single-threaded CPU, RXTX ran about 9% faster than the standard BLAS routine, with speedups observed in 99% of runs. These results highlight RXTX's efficiency for large-scale symmetric matrix products and its advantages over both traditional and state-of-the-art recursive algorithms.
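For context, the standard BLAS primitive for XX^T is the symmetric rank-k update (`syrk`), which fills only one triangle of the result; a minimal sketch of that baseline via SciPy's BLAS bindings (this is only an illustrative comparison point, not the paper's benchmark setup):

```python
import numpy as np
from scipy.linalg import blas

# Sketch: dsyrk computes the upper triangle of alpha * X @ X.T,
# the classic BLAS baseline for symmetric products like XX^T.
rng = np.random.default_rng(1)
X = rng.standard_normal((64, 80))

C = blas.dsyrk(1.0, X)               # upper triangle of X @ X.T
G = np.triu(C) + np.triu(C, 1).T     # mirror into a full symmetric matrix

assert np.allclose(G, X @ X.T)
```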
The proposed methodology integrates RL with a two-tier Mixed-Integer Linear Programming (MILP) pipeline to discover efficient matrix multiplication algorithms, in particular for computing XX^T. An RL-guided large neighborhood search proposes a large pool of candidate rank-1 bilinear products. MILP-A explores linear combinations of these products that express the target outputs, while MILP-B identifies the smallest subset that can represent all targets. This setup mirrors the AlphaTensor approach but simplifies it substantially by shrinking the action space, focusing on lower-rank tensor products, and relying on an off-the-shelf solver (Gurobi) for fast computation.
For example, to compute xx^T for a 2×2 matrix X, the goal is to derive expressions such as x_1^2 + x_2^2 or x_1x_3 + x_2x_4. The RL policy randomly samples thousands of bilinear products with coefficients in {−1, 0, +1}. MILP-A finds combinations of these products that match the desired expressions, and MILP-B selects the fewest products needed to cover all targets. This framework enabled the discovery of RXTX, an algorithm that performs 5% fewer multiplications and fewer overall operations than previous methods. RXTX is effective for both large and small matrices and demonstrates a successful fusion of ML-based search and combinatorial optimization.
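A toy reconstruction of this idea for the 2×2 case can make it concrete. The exhaustive enumeration and least-squares check below are illustrative stand-ins for the paper's RL sampler and MILP solvers, not the actual pipeline:

```python
import itertools
import numpy as np

def sym_matrix(u, v):
    """Symmetric coefficient matrix of the bilinear product (u.x)(v.x)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return (np.outer(u, v) + np.outer(v, u)) / 2.0

# Candidate rank-1 bilinear products with coefficients in {-1, 0, +1}
# (the role played by the RL-guided sampler in the paper).
coeffs = list(itertools.product([-1, 0, 1], repeat=4))
candidates = [sym_matrix(u, v) for u in coeffs for v in coeffs
              if any(u) and any(v)]

# A target expression: with x = (x1, x2, x3, x4) and X = [[x1, x2],
# [x3, x4]], the entry (XX^T)_{11} equals x1^2 + x2^2.
target = sym_matrix([1, 0, 0, 0], [1, 0, 0, 0]) + \
         sym_matrix([0, 1, 0, 0], [0, 1, 0, 0])

# MILP-A's question in miniature: does the target lie in the linear
# span of the sampled products?  Answered here by least squares.
A = np.stack([c.ravel() for c in candidates], axis=1)
sol, *_ = np.linalg.lstsq(A, target.ravel(), rcond=None)
residual = np.linalg.norm(A @ sol - target.ravel())
print(f"target expressible: {residual < 1e-8}")   # prints True here
```

MILP-B's role would then be to pick the smallest subset of candidates whose span still covers every target entry, which is what drives the multiplication count down.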
Check out the paper. All credit for this research goes to the researchers of this project.

Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
