Weekly AI Research Roundup - February 02, 2026

Published on 2026-02-02

Discover the latest breakthroughs in artificial intelligence with our curated selection of this week's top research: 15 papers, 5 categories, 72 researchers.

Explainable & Ethical AI

Transparency, fairness, and responsible AI development

1

JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual Résumés and JDs

By Casimiro Pio Carrino, Paula Estrella, Rabih Zbib et al. (5 authors)

Explainable & Ethical AI 2026-01-30
Universitat Politècnica de Catalunya

Problem

The main problem addressed in this research paper is the lack of benchmarks for evaluating the performance, fairness, and bias of Large Language Models (LLMs) in Human Resource (HR) tasks, specifically in the analysis of résumés for matching with job descriptions. This task involves asking questions about the skills, experience, and background of a candidate in relation to a job description, and is a critical use case of LLMs in HR.

Analogy

Imagine you are a recruiter trying to match a candidate with a job opening. You need to read the candidate's résumé and the job description to determine if they have the required skills and experience. This process can be time-consuming and prone to bias. LLMs can help automate this process by analyzing the résumé and job description, but they need to be evaluated and tested to ensure they are accurate and fair. JobResQA is like a test dataset that allows developers to evaluate the performance of LLMs on this task, identify areas for improvement, and develop more accurate and transparent HR systems.

Key Innovation

The key innovation of this paper is the introduction of JobResQA, a multilingual Question Answering benchmark for evaluating Machine Reading Comprehension (MRC) capabilities of LLMs on HR-specific tasks involving résumés and job descriptions. JobResQA is a curated, synthetic, multilingual QA dataset of over 105 résumé-JD pairs (581 QA items) that supports both short and long answers across three complexity levels: basic (extractive), intermediate (multi-passage), and complex (cross-document reasoning).
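
QA benchmarks like this are typically scored by token-level F1 overlap between a model's answer and the gold answer. As an illustration only (the paper's exact metrics and dataset field names are not given here), a minimal scorer might look like:

```python
# Illustrative sketch: token-level F1 scoring of a model answer against a
# gold answer, a standard metric for extractive QA benchmarks. This is an
# assumed metric, not necessarily the one JobResQA uses.
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer."""
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

# Comparing a model's short answer to a hypothetical gold annotation:
print(token_f1("five years of Python experience", "five years of Python"))
```

Long, multi-passage answers would need a softer metric (e.g. ROUGE or an LLM judge), which is why benchmarks distinguish answer complexity levels.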

Practical Impact

The practical impact of this research is significant, as it provides a reproducible benchmark for advancing fair and reliable LLM-based HR systems. The JobResQA dataset can be used to evaluate the performance of LLMs on HR tasks, identify biases and fairness issues, and develop more accurate and transparent HR systems. This can lead to better candidate matching, reduced bias in hiring decisions, and improved overall HR processes.

2

Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models

By Krishnakumar Balasubramanian, Aleksandr Podkopaev, Shiva Prasad Kasiviswanathan

Explainable & Ethical AI 2026-01-29

Problem

When evaluating the performance of AI systems, researchers often rely on aggregating the judgments of multiple annotators, including large language models (LLMs) used as judges. However, these LLMs are not independent of each other, as they share data, architectures, and other factors that can lead to correlated judgments. This correlation can result in systematically miscalibrated predictions and even confident incorrect predictions.

Analogy

Imagine you have multiple people trying to guess the outcome of a coin toss. Each person has their own opinion, but they may also be influenced by the opinions of others. If you simply average their opinions, you may get a misleading result. However, if you take into account the fact that they are influencing each other, you can get a more accurate estimate of the true outcome. This is similar to what the authors are doing in this paper, except instead of people, they are dealing with large language models and their judgments.

Key Innovation

The authors propose a new model hierarchy based on Ising graphical models and latent factors to address the problem of dependence between annotators. This model hierarchy includes three types of models: a conditional independence model, a class-independent Ising model, and a class-dependent Ising model. The class-dependent Ising model allows for class-specific interactions between annotators, which enables it to capture the dependence between annotators more accurately.
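
To see numerically why dependence matters, here is a toy illustration (not the paper's Ising estimator): two near-clone judges should not count as two independent votes. Down-weighting each vote by its redundancy is a crude stand-in for the interaction terms an Ising model would learn:

```python
# Toy illustration of dependence-aware aggregation. The correlation matrix is
# an assumed stand-in for learned Ising interaction terms.
import numpy as np

def naive_confidence(votes):
    """Fraction of positive votes, treating judges as independent."""
    return float(np.mean(votes))

def dependence_aware_confidence(votes, corr):
    """Down-weight each vote by how redundant it is with the others."""
    redundancy = corr.sum(axis=1)   # row sums include the self-correlation of 1
    weights = 1.0 / redundancy      # highly correlated judges count for less
    return float(np.dot(weights, votes) / weights.sum())

votes = np.array([1.0, 1.0, 0.0])   # judges 1 and 2 agree, judge 3 dissents
corr = np.array([[1.0, 0.9, 0.0],
                 [0.9, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])  # judges 1 and 2 are near-clones
print(naive_confidence(votes))                    # 2/3: counts the clones twice
print(dependence_aware_confidence(votes, corr))   # pulled back toward 1/2
```

The naive average is overconfident exactly because the two correlated judges share one underlying signal, which is the failure mode the paper's class-dependent Ising model is built to correct.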

Practical Impact

This research has significant practical implications for AI evaluation and development. By accounting for the dependence between annotators, this work can improve the accuracy and reliability of AI performance evaluations. This, in turn, can lead to better AI systems that are more trustworthy and effective in real-world applications.

3

Quantum-Inspired Reinforcement Learning for Secure and Sustainable AIoT-Driven Supply Chain Systems

By Muhammad Bilal Akram Dastagir, Omer Tariq, Shahid Mumtaz et al. (5 authors)

Explainable & Ethical AI 2026-01-29

Problem

Modern supply chains face a triple challenge: high-speed logistics, environmental impact, and security constraints. As a result, there is a growing need for AI-enabled Internet of Things (AIoT) solutions that can balance these competing demands.

Analogy

Imagine a complex supply chain as a network of interconnected nodes, each representing a different part of the logistics process. The proposed framework uses a "quantum spin-chain" analogy to model this network, where each node is connected by a quantum "spin" that can be controlled to optimize the flow of goods and information. This allows the framework to balance competing demands, such as reducing carbon emissions and preventing cyber threats, in a single decision model.

Key Innovation

This research proposes a quantum-inspired reinforcement learning framework that unifies inventory management, carbon footprint reduction, and security objectives within a single decision model. The framework uses a controllable spin-chain analogy coupled to real-time AIoT signals to operationalize a multi-objective reward and learn robust policies via value-based and ensemble policy updates.
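
A multi-objective reward of this kind is usually operationalized as a weighted scalarization of the competing terms. The sketch below is our illustration of that idea; the term names and weights are invented, not the paper's formulation:

```python
# Minimal sketch of a multi-objective supply-chain reward: service level is
# rewarded, emissions and intrusion risk are penalized. Weights are assumed.
def supply_chain_reward(service_level, carbon_kg, intrusion_risk,
                        w_service=1.0, w_carbon=0.3, w_security=2.0):
    """Scalar reward folding inventory, sustainability, and security goals."""
    return (w_service * service_level
            - w_carbon * carbon_kg
            - w_security * intrusion_risk)

# A state with good service, moderate emissions, and low intrusion risk:
print(round(supply_chain_reward(0.95, 1.2, 0.05), 3))  # 0.49
```

The RL agent then learns policies that trade these terms off, rather than optimizing logistics speed alone.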

Practical Impact

The proposed framework has the potential to drive secure, eco-conscious supply chain operations at scale, laying the groundwork for globally connected infrastructures that responsibly meet both consumer and environmental needs. By addressing the challenges of sustainability and security simultaneously, this research can help reduce the environmental impact of supply chains while preventing malicious intrusions.

4

Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use

By Julien Delavande, Regis Pierrard, Sasha Luccioni

Explainable & Ethical AI 2026-01-29

Problem

Large Language Models (LLMs) are increasingly deployed in production, which has shifted the focus from training to inference, where energy consumption is a growing concern. While prior work has examined the energy cost of inference per prompt or per token, this paper examines how system-level design choices, such as numerical precision, batching strategy, and request scheduling, shape that energy consumption.

Analogy

Imagine a large restaurant where multiple orders are being prepared simultaneously. Each order is like a prompt being processed by the LLM. If the restaurant is not organized efficiently, with orders being processed one by one, it will take a long time to complete all the orders, and the kitchen will be inefficient. Similarly, if the LLM is not optimized for batching and serving, it will consume more energy and be less efficient. By optimizing the batching and serving strategies, the restaurant (or the LLM) can process orders more efficiently, reducing energy consumption and increasing productivity.

Key Innovation

The paper presents a detailed empirical study of LLM inference energy and latency on NVIDIA H100 GPUs, analyzing the impact of quantization, batch size, and serving configuration on energy consumption. The study reveals that lower-precision formats only yield energy gains in compute-bound regimes, and that batching improves energy efficiency, especially in memory-bound phases like decoding. Additionally, the paper shows that structured request timing (arrival shaping) can reduce per-request energy by up to 100×.

Practical Impact

The findings of this paper have significant practical implications for the deployment of LLMs. By understanding the impact of system-level design choices on energy consumption, developers can optimize their models and serving configurations to reduce energy consumption and carbon footprint. This is particularly important as LLMs are increasingly being used in user-facing applications, such as chatbots and assistants, which can have a significant impact on energy consumption. The paper's findings can also inform the development of more energy-efficient AI services and help mitigate the environmental impact of AI.

Agentic AI

Autonomous agents, multi-agent systems, and intelligent decision-making

1

Tackling air quality with SAPIENS

By Marcella Bona, Nathan Heatley, Jia-Chen Hua et al. (10 authors)

Agentic AI 2026-01-30

Problem

Air pollution is a major problem in large cities worldwide, causing disease and premature death. Vehicular traffic is a significant contributor to poor air quality, and current air quality forecasts are often coarse-grained and not very accurate. This makes it difficult for people to make informed decisions about their daily activities and commute.

Analogy

Imagine a city as a large, complex organism with many different parts that interact with each other. Traffic is like the blood flow, carrying pollutants through the city. The SAPIENS model is like a sophisticated medical imaging technique that can visualize the flow of traffic and pollutants, allowing researchers to understand the relationships between them. By analyzing this complex system, the model can provide valuable insights and predictions that can help improve air quality and reduce pollution.

Key Innovation

Researchers have developed a new method to represent traffic intensities using concentric ring-based descriptions, which are derived from Google Maps traffic data. This allows for a more detailed understanding of traffic conditions and their impact on air quality. The team used Partial Least Squares Regression to predict pollution levels based on these new traffic intensity measures.
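
The concentric-ring descriptor can be pictured as averaging traffic intensity in rings of increasing radius around a monitoring location. This is our reading of the idea, with an invented grid and radii; the actual descriptors come from Google Maps data:

```python
# Sketch of a concentric-ring traffic descriptor: mean intensity of grid cells
# whose distance from the sensor falls in each (inner, outer] ring.
import numpy as np

def ring_features(grid, center, radii):
    """One mean-intensity feature per ring around `center`."""
    ys, xs = np.indices(grid.shape)
    dist = np.hypot(ys - center[0], xs - center[1])
    feats = []
    inner = 0.0
    for outer in radii:
        mask = (dist > inner) & (dist <= outer)
        feats.append(float(grid[mask].mean()))
        inner = outer
    return feats

traffic = np.ones((21, 21))   # uniform traffic map
traffic[10, 11] = 5.0         # a jam one cell from the sensor
print(ring_features(traffic, (10, 10), [3, 6, 9]))  # only the inner ring rises
```

Feature vectors like these, one set per sensor and time step, are what a Partial Least Squares Regression would then map to measured pollution levels.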

Practical Impact

The SAPIENS project aims to provide hyper-local, dynamic air quality forecasts that can help individuals make informed decisions about their daily activities and commute. By taking into account traffic intensity and other factors, the model can provide more accurate predictions of air pollution levels. This can help reduce exposure to air pollutants, particularly for vulnerable populations such as children, the elderly, and people with chronic health conditions.

2

MonoScale: Scaling Multi-Agent System with Monotonic Improvement

By Shuai Shao, Yixiang Liu, Bingwei Lu et al. (4 authors)

Agentic AI 2026-01-30

Problem

Multi-agent systems (MAS) built on large language models (LLMs) are prone to performance collapse when they are expanded by continuously integrating new agents or tools. This can lead to cold-start misrouting, where the router struggles to make effective decisions about which agents to use, resulting in a degradation of overall performance.

Analogy

Imagine a large team of experts working together to solve a complex problem. As new experts join the team, the team leader (the router) needs to learn how to use their skills effectively. MonoScale is like a specialized training program that helps the team leader learn each new expert's strengths and weaknesses, so that the leader can make informed decisions about whom to assign to each task. This ensures that the team works efficiently and effectively, even as it grows and changes over time.

Key Innovation

MonoScale is a novel expansion-aware update framework that proactively generates a small set of agent-conditioned familiarization tasks to collect controlled feedback from new agents. It then distills this feedback into auditable natural-language memory to guide future routing decisions. This approach ensures that the router learns when to use or not use the new agent, preventing performance collapse.
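
In skeleton form, the loop is: probe the new agent on a few familiarization tasks, distill the outcomes into a readable note, and consult those notes at routing time. Everything below (class names, note format, matching rule) is a hypothetical illustration of that flow, not MonoScale's implementation:

```python
# Hypothetical sketch of expansion-aware routing memory: probe a new agent,
# record an auditable natural-language note, consult notes when routing.
class Router:
    def __init__(self):
        self.memory = {}  # agent name -> auditable note

    def familiarize(self, agent_name, probe_results):
        """Distill controlled probe feedback into a routing note."""
        good = [task for task, ok in probe_results.items() if ok]
        bad = [task for task, ok in probe_results.items() if not ok]
        self.memory[agent_name] = f"use for: {good}; avoid for: {bad}"

    def route(self, task_type):
        """Prefer an agent whose note recommends it for this task type."""
        for agent, note in self.memory.items():
            if task_type in note.split("avoid")[0]:
                return agent
        return "fallback-agent"

r = Router()
r.familiarize("web-searcher", {"search": True, "math": False})
print(r.route("search"))  # routed to the probed agent
print(r.route("math"))    # falls back: the note warns against it
```

The point of the natural-language memory is auditability: a human can read exactly why the router does or does not send work to a newly onboarded agent.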

Practical Impact

MonoScale can significantly improve the robustness of agentic applications in areas such as research assistance, enterprise automation, and software engineering. By reducing cold-start misrouting and preventing cascading failures, MonoScale can help make large-scale, open agent onboarding in the future Agentic Web more reliable and efficient.

3

Med-Scout: Curing MLLMs' Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training

By Anglin Liu, Ruichao Chen, Yi Lu et al. (5 authors)

Agentic AI 2026-01-30

Problem

Multimodal Large Language Models (MLLMs) are widely used in medical imaging, but they suffer from a critical limitation: geometric blindness. This means that even though they can generate semantically rich descriptions, they often fail to ground their outputs in the strict geometric facts of the image. This can lead to plausible yet factually incorrect hallucinations, such as misplacing organs or hallucinating lesions.

Analogy

Imagine describing a finished puzzle to someone from memory. You might use rich, confident language, but without checking the actual pieces, you could place some of them in the wrong spot. This is similar to what happens when MLLMs describe medical images without grounding in the underlying geometry. Med-Scout is like a "puzzle solver" that helps MLLMs understand the geometry of the image and describe it more accurately.

Key Innovation

The Med-Scout framework proposes a novel solution to this problem by using Reinforcement Learning (RL) to leverage the intrinsic geometric logic latent within unlabeled medical images. Instead of relying on costly expert annotations, Med-Scout derives verifiable supervision signals through three strategic proxy tasks: Hierarchical Scale Localization, Topological Jigsaw Reconstruction, and Anomaly Consistency Detection. This approach allows Med-Scout to significantly mitigate geometric blindness and improve performance on radiological and comprehensive medical VQA tasks.
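
The appeal of jigsaw-style proxy tasks is that the supervision signal is free: shuffling patches of an unlabeled image creates a task whose ground truth (the permutation) is known by construction. A generic sketch of that idea, not Med-Scout's specific task design:

```python
# Self-supervised jigsaw proxy task: split an image into tiles, shuffle them,
# and use the permutation itself as the supervision signal.
import numpy as np

def make_jigsaw_task(image, grid=2, seed=0):
    """Return shuffled tiles and the permutation that restores them."""
    h, w = image.shape[0] // grid, image.shape[1] // grid
    tiles = [image[i*h:(i+1)*h, j*w:(j+1)*w]
             for i in range(grid) for j in range(grid)]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(tiles))
    return [tiles[p] for p in perm], perm

image = np.arange(16).reshape(4, 4)
shuffled, perm = make_jigsaw_task(image)

# A model is trained to predict `perm` from `shuffled`; no expert labels needed.
restored = [None] * len(shuffled)
for idx, p in enumerate(perm):
    restored[p] = shuffled[idx]
print(np.array_equal(np.block([[restored[0], restored[1]],
                               [restored[2], restored[3]]]), image))
```

Solving such tasks forces the model to reason about where patches belong, which is precisely the geometric grounding the paper targets.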

Practical Impact

The practical impact of Med-Scout is significant, as it can be applied to various medical imaging tasks, such as radiological diagnosis and comprehensive medical VQA. By improving the geometric perception of MLLMs, Med-Scout can help clinicians make more accurate diagnoses and provide better patient care. Additionally, Med-Scout can be used to develop more robust and reliable medical AI systems that can handle complex medical imaging tasks.

Generative AI & LLMs

Breakthroughs in language models, text generation, and creative AI systems

1

Optimal Fair Aggregation of Crowdsourced Noisy Labels using Demographic Parity Constraints

By Gabriel Singer, Samuel Gruffaz, Olivier Vo Van et al. (5 authors)

Generative AI & LLMs 2026-01-30

Problem

The main challenge addressed in this research paper is the amplification of individual biases in crowdsourced aggregation, particularly regarding sensitive features, which raises fairness concerns. This is a significant problem because crowdsourcing and aggregation of noisy human annotations is a common practice in domains where ground-truth is inherently subjective or prohibitively expensive to obtain.

Analogy

Imagine a group of people trying to guess the price of a house. Each person has their own opinion, but some people might be more biased towards overestimating or underestimating the price. In crowdsourced aggregation, these biases can be amplified, leading to an unfair result. FairCrowd is like a filter that removes these biases and ensures that the final result is fair and representative of the ground-truth.

Key Innovation

The key innovation of this research paper is the development of a post-processing algorithm called FairCrowd, which regularizes any label aggregation rule to enforce strict ε-fairness constraints. This is a novel solution to the problem of fairness in crowdsourced aggregation, as existing approaches only provide limited post-processing methods for enforcing ε-fairness under demographic parity.
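
Demographic parity has a concrete measurable form: the positive-label rate should differ across sensitive groups by at most ε. The check below is our simple illustration of the constraint FairCrowd enforces, not its post-processing algorithm:

```python
# The quantity an eps-fairness constraint under demographic parity bounds:
# the largest gap in positive-label rate between sensitive groups.
def demographic_parity_gap(labels, groups):
    """Max difference in positive rate between any two groups."""
    by_group = {}
    for y, g in zip(labels, groups):
        by_group.setdefault(g, []).append(y)
    rates = [sum(v) / len(v) for v in by_group.values()]
    return max(rates) - min(rates)

labels = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_gap(labels, groups))  # 3/4 vs 1/4 positive -> 0.5
```

FairCrowd's role is to adjust an aggregation rule's outputs so this gap stays below a chosen ε while changing as few labels as possible.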

Practical Impact

The practical impact of this research is significant, as it provides a solution to the problem of fairness in crowdsourced aggregation. This has implications for various domains where crowdsourcing and aggregation of noisy human annotations is used, such as medical diagnosis, content moderation, and sentiment analysis. By enforcing strict demographic parity constraints, FairCrowd can improve fairness in crowdsourced aggregation and reduce the amplification of individual biases.

2

Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs

By Ali Asadi, Krishnendu Chatterjee, Ehsan Goharshady et al. (6 authors)

Generative AI & LLMs 2026-01-30

Problem

Markov decision processes (MDPs) are a fundamental model in decision making, but they assume that the transition function is known. In reality, this assumption is not always justified, as MDPs are constructed from data and the transition functions are estimated with uncertainty. This issue has led to the study of Robust Markov decision processes (RMDPs), which weaken the assumption by only assuming knowledge of some uncertainty set containing the true transition function. The goal of solving RMDPs is to minimize the worst-case expected payoff with respect to all possible choices of transition functions belonging to the uncertainty set.

Analogy

Imagine you are driving a car, and you're not sure which road to take because the traffic lights are uncertain. A Markov decision process would try to find the best route based on the expected traffic patterns. However, in reality, the traffic patterns are uncertain, and we need to account for the worst-case scenario. A Robust Markov decision process would try to find the best route by minimizing the worst-case expected payoff with respect to all possible traffic patterns. The paper's algorithm provides a way to efficiently solve this problem in polynomial time, which is essential for decision-making in uncertain environments.

Key Innovation

The paper presents a novel potential function that tracks the effects of changing the optimal policy within the uncertainty set. This function, called fρ(s, s′, s′′), estimates how much the value of a policy can be improved by donating some probability mass from one state to another. The paper also shows bounds to relate the policy values and the defined potential function. Moreover, it proves a novel combinatorial result (Lemma 8) over the number of most significant bits in the binary representation of unitary signed subset sums of a finite set of real numbers. This result is a key component of the paper's algorithm.
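
The "donating probability mass" intuition can be made concrete with a worst-case Bellman backup: within an L∞ ball of radius δ around the nominal transition probabilities, the adversary moves mass toward low-value successor states. This is a generic illustration of the robust backup, not the paper's strongly polynomial policy-iteration algorithm:

```python
# Worst-case backup under an L-infinity uncertainty set: each transition
# probability may shift by at most ±delta while remaining a distribution.
import numpy as np

def robust_backup(p_hat, values, reward, gamma=0.9, delta=0.1):
    """Worst-case expected discounted value over the L-infinity ball."""
    order = np.argsort(values)                 # fill low-value states first
    p = np.maximum(np.asarray(p_hat, float) - delta, 0.0)
    budget = 1.0 - p.sum()                     # mass the adversary reallocates
    for s in order:
        room = min(p_hat[s] + delta, 1.0) - p[s]
        add = min(room, budget)
        p[s] += add
        budget -= add
    return reward + gamma * float(p @ np.asarray(values, float))

# Nominal transitions split evenly between a bad (value 0) and a good
# (value 10) successor; the adversary shifts 0.1 of mass toward the bad one.
p_hat = np.array([0.5, 0.5])
values = np.array([0.0, 10.0])
print(robust_backup(p_hat, values, reward=1.0))  # 4.6, vs 5.5 nominally
```

Policy iteration for RMDPs alternates such worst-case evaluations with greedy policy improvement; the paper's contribution is bounding how many such iterations are needed.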

Practical Impact

The paper's main contribution is to resolve a fundamental algorithmic open problem for discounted (s, a)-rectangular RMDPs with L∞ uncertainty sets. The paper shows that a robust policy iteration algorithm terminates in strongly polynomial time when the discount factor is fixed. This result has important implications for decision-making in uncertain environments. By providing a polynomial-time algorithm for solving RMDPs, the paper opens up new possibilities for applying robust decision-making techniques in real-world applications.


3

Solving Inverse Problems with Flow-based Models via Model Predictive Control

By George Webber, Alexander Denker, Riccardo Barbano et al. (4 authors)

Generative AI & LLMs 2026-01-30

Problem

The main problem this research paper addresses is the challenge of using flow-based generative models to solve inverse problems, such as image restoration, without retraining the model. Inverse problems involve recovering a signal from noisy measurements, and current methods often lack theoretical guarantees and can be computationally unstable.

Analogy

Think of MPC-Flow as a GPS system for flow-based generative models. Just as a GPS system provides turn-by-turn directions to reach a destination, MPC-Flow breaks down the complex trajectory of the flow model into a sequence of short-horizon control problems, guiding the model towards the desired solution. This approach allows for more efficient and robust control, much like how a GPS system helps navigate through complex terrain.

Key Innovation

The key innovation of this paper is the introduction of a model predictive control (MPC) framework called MPC-Flow, which formulates inverse problem solving with flow-based generative models as a sequence of control sub-problems. This allows for practical optimal control-based guidance at inference time, reducing memory requirements and improving robustness.
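
The receding-horizon pattern behind MPC is easy to show on a toy system: repeatedly optimize a short horizon, apply only the first control, then re-plan from the new state. The 1-D integrator below is our generic illustration, not the paper's flow-model formulation:

```python
# Generic receding-horizon (MPC) loop on a 1-D toy system: brute-force a
# short horizon of candidate controls, apply the first, re-plan.
def mpc_toy(x0, target, horizon=3, steps=10,
            candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    x = x0
    for _ in range(steps):
        best_u, best_cost = None, float("inf")
        for u in candidates:                  # short-horizon subproblem
            xh, cost = x, 0.0
            for _ in range(horizon):
                xh = xh + u                   # simple integrator dynamics
                cost += (xh - target) ** 2
            if cost < best_cost:
                best_u, best_cost = u, cost
        x = x + best_u                        # apply only the first control
    return x

print(mpc_toy(0.0, 5.0))  # walks toward the target of 5.0
```

In MPC-Flow the "dynamics" are the flow model's sampling trajectory and the cost measures data consistency, but the short-horizon re-planning structure is the same.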

Practical Impact

MPC-Flow has the potential to significantly impact the field of image restoration and other inverse problems. By providing a scalable and efficient approach to inference-time control, MPC-Flow can be applied to large-scale architectures and complex tasks, such as image super-resolution and deblurring. The results of the paper demonstrate strong performance and scalability on benchmark image restoration tasks.

4

Particle-Guided Diffusion Models for Partial Differential Equations

By Andrew Millard, Fredrik Lindsten, Zheng Zhao

Generative AI & LLMs 2026-01-30

Problem

Partial differential equations (PDEs) are a fundamental tool in science and engineering, but solving them can be computationally expensive and challenging, especially when dealing with large parameter spaces or complex dynamics. Current numerical solvers are often slow and impractical for real-time or uncertainty-aware applications.

Analogy

Imagine trying to solve a puzzle with millions of pieces, where each piece represents a tiny part of a complex system. Traditional numerical solvers are like trying to solve the puzzle by looking at each piece individually, which can be slow and error-prone. The Particle-Guided Diffusion Models approach is like using a powerful AI assistant that can look at the entire puzzle at once, making educated guesses about the missing pieces and filling them in with high accuracy.

Key Innovation

The paper presents a new approach to solving PDEs using a guided stochastic sampling method, which combines the strengths of diffusion models and physics-based guidance. This method, called Particle-Guided Diffusion Models, uses a new Sequential Monte Carlo (SMC) framework to generate solution fields that are physically admissible and accurate.
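
The SMC ingredient works by weighting candidate samples by physical admissibility and resampling so that admissible candidates survive. A minimal numeric illustration, with a trivial stand-in "PDE" constraint u = 0 rather than the paper's models:

```python
# Minimal SMC weighting-and-resampling step: weight candidates by a physics
# residual, then resample in proportion to the weights. Toy constraint: u = 0.
import numpy as np

rng = np.random.default_rng(0)
particles = rng.normal(size=100)        # candidate solution values
residual = particles ** 2               # violation of the toy "PDE" u = 0
weights = np.exp(-residual)             # admissible candidates -> high weight
weights /= weights.sum()

idx = rng.choice(len(particles), size=len(particles), p=weights)
resampled = particles[idx]              # guided population, biased toward u = 0
print(np.abs(resampled).mean() < np.abs(particles).mean())
```

In the paper's setting the particles are full solution fields produced by a diffusion model and the residual comes from the PDE itself, but the weight-and-resample loop is the same mechanism.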

Practical Impact

The proposed method has the potential to revolutionize the way PDEs are solved, enabling faster and more accurate simulations in various fields, such as fluid dynamics, heat transport, and electromagnetics. This could lead to breakthroughs in fields like climate modeling, materials science, and medical imaging.

Computer Vision & MultiModal AI

Advances in image recognition, video analysis, and multimodal learning

1

End-to-end Optimization of Belief and Policy Learning in Shared Autonomy Paradigms

By MH Farhadi, Ali Rabiee, Sima Ghafoori et al. (6 authors)

Computer Vision & MultiModal AI 2026-01-30

Problem

Shared autonomy is a control paradigm where a human and an automated system work together to operate a device, with the system adapting its assistance to the situation. However, the core challenge lies in the tension between two critical processes: inferring the user's goal (a probabilistic inference problem) and determining the appropriate level of assistance (an optimization problem). Prior approaches typically addressed these challenges separately or sequentially, leading to suboptimal results.

Analogy

Imagine you're trying to grasp a small pill bottle with a robotic arm. The robotic arm needs to infer whether you intend to grasp the pill bottle or a large water glass. If the arm is uncertain about your goal, it should provide minimal assistance during the reaching phase to preserve your agency. However, once it's confident that you're aiming for the pill bottle, it should increase support to ensure a precise grasp. BRACE's end-to-end integration of goal inference and assistance arbitration allows the robotic arm to adapt its assistance levels in real-time, based on the user's goal uncertainty and environmental constraints.

Key Innovation

The research introduces a novel framework called BRACE (Bayesian Reinforcement Assistance with Context Encoding), which integrates goal inference and assistance arbitration end-to-end. BRACE processes the full Bayesian goal distribution, conditioning collaborative control policies on environmental context and complete goal probability distributions. This approach allows for a more nuanced reaction to user uncertainty and environmental constraints.
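
The goal-inference half of shared autonomy is a Bayesian update: goals consistent with the user's motion gain posterior probability, and assistance can scale with that confidence. The sketch below is a toy stand-in for BRACE's learned, context-conditioned policy (the likelihood model and gain rule are assumptions):

```python
# Toy Bayesian goal inference for shared autonomy: goals aligned with the
# user's input direction gain probability; assistance scales with confidence.
import numpy as np

def update_posterior(prior, user_dir, goal_dirs, kappa=4.0):
    """Reweight goal probabilities by alignment with the user's motion."""
    likelihood = np.exp(kappa * goal_dirs @ user_dir)
    post = prior * likelihood
    return post / post.sum()

goals = np.array([[1.0, 0.0],   # direction of the pill bottle
                  [0.0, 1.0]])  # direction of the water glass
prior = np.array([0.5, 0.5])
post = update_posterior(prior, np.array([0.95, 0.1]), goals)
assistance_gain = float(post.max())   # assist more once the goal is clear
print(post.round(3), round(assistance_gain, 3))
```

BRACE's contribution is to feed this full posterior, rather than just the top goal, into the assistance policy, trained end to end with the inference step.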

Practical Impact

The BRACE framework has the potential to improve assistive robotics, particularly for users with motor impairments. By adapting assistance levels to the user's goal uncertainty and environmental constraints, BRACE can provide more effective support, preserving user agency while ensuring precision-critical tasks are completed successfully. This research could also be applied to other shared autonomy scenarios, such as collaborative robots or autonomous vehicles.

AI in healthcare

Cutting-edge research in artificial intelligence

1

Privacy-Preserving Sensor-Based Human Activity Recognition for Low-Resource Healthcare Using Classical Machine Learning

By Ramakant Kumar, Pravin Kumar

AI in healthcare 2026-01-29

Problem

The main problem this research addresses is the limited access to medical infrastructure for elderly and vulnerable patients, leading to neglect and poor adherence to therapeutic exercises. This gap in healthcare can be particularly challenging for those living in low-resource and rural areas.

Analogy

Imagine wearing a fitness tracker that not only counts your steps but also recognizes your activities, such as walking upstairs or practicing yoga. This technology can help healthcare professionals monitor patients remotely, ensure adherence to therapy, and provide timely interventions. The Support Tensor Machine is like a super-smart algorithm that can learn from sensor data and make accurate predictions about your activities, enabling more effective and personalized care.

Key Innovation

The research proposes a low-cost and automated human activity recognition (HAR) framework based on wearable inertial sensors and machine learning. The key innovation is the introduction of the Support Tensor Machine (STM), a novel classifier that leverages tensor representations to capture the multi-dimensional nature of sensor signals, resulting in improved accuracy and generalization capability.
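
HAR pipelines of this kind typically segment the inertial signal into fixed windows and compute statistics per window as classifier input. The window size and features below are illustrative, not the paper's configuration:

```python
# Standard HAR preprocessing: fixed-size windows over an inertial signal,
# with simple per-window statistics as classifier features.
import numpy as np

def window_features(signal, window=50):
    """Per-window (mean, std) of a 1-D accelerometer axis."""
    n = len(signal) // window
    feats = []
    for i in range(n):
        seg = signal[i * window:(i + 1) * window]
        feats.append((float(seg.mean()), float(seg.std())))
    return feats

t = np.linspace(0, 4 * np.pi, 100)
walking = np.sin(5 * t)              # oscillatory reading, as in gait
print(window_features(walking, window=50))
```

The Support Tensor Machine's departure from this baseline is to keep the multi-axis, multi-window structure as a tensor instead of flattening it into one feature vector.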

Practical Impact

This research has the potential to revolutionize remote healthcare, elderly assistance, and smart home wellness by providing a scalable solution for low-resource and rural healthcare settings. The proposed framework can be used to track daily human activities, such as walking, sitting, and standing, and offer personalized recommendations for rehabilitation, fitness tracking, and ambient-assisted living.

2

From Retrieving Information to Reasoning with AI: Exploring Different Interaction Modalities to Support Human-AI Coordination in Clinical Decision-Making

By Behnam Rahdari, Sameer Shaikh, Jonathan H Chen et al. (5 authors)

AI in healthcare 2026-01-29

Problem

Clinical decision-making is a complex task that involves diagnosing and treating patients. While large language models (LLMs) have shown promise in improving clinician performance, their impact on clinical decision-making is still unclear. Clinicians are not using these models as intended, and it's unclear how they compare to traditional clinical decision-support systems (CDSS). This lack of understanding restricts the design of new mechanisms that can overcome existing tool limitations and enhance performance and experience.

Analogy

Imagine you're working with a colleague who has expertise in a particular area. You ask them a question, and they respond with a brief answer. However, if you ask them to explain their thought process and reasoning behind their answer, you may get a more detailed and insightful response. This is similar to how clinicians interact with LLMs in this study. They tend to use the models as a tool for targeted retrieval and confirmation, but when the interaction setup allows for deeper engagement and the model is positioned as a specialist, clinicians are more likely to engage with the model and benefit from its expertise.

Key Innovation

This study explores how clinicians interact with LLMs in different ways, including text-based conversation, interactive user interfaces, and voice-based systems. The researchers used think-aloud case walkthroughs, interviews, and UI design probes to understand how clinicians interpret, verify, and incorporate AI output under realistic constraints. The study found that clinicians tend to use LLMs as a tool for targeted retrieval and confirmation, but that deeper engagement occurs when the interaction setup positions the model in a familiar consult role and when reasoning is externalized into stable visual artifacts.

Practical Impact

The findings of this study have practical implications for the design of clinical decision-support systems. By understanding how clinicians interact with LLMs, developers can design systems that support clinician-AI coordination and enhance performance and experience. The study suggests that different interaction modalities (text, visual, voice) are better suited for different tasks, and that a one-size-fits-all approach may not be effective. This knowledge can inform the development of more effective and user-friendly clinical decision-support systems.

3

EMBC Special Issue: Calibrated Uncertainty for Trustworthy Clinical Gait Analysis Using Probabilistic Multiview Markerless Motion Capture

By Seth Donahue, Irina Djuraskovic, Kunal Shah et al. (6 authors)

AI in healthcare 2026-01-29
University of Göttingen

Problem

The main problem this research paper addresses is the need for accurate and trustworthy clinical gait analysis using computer-vision-based methods. Current multiview markerless motion capture (MMMC) models are accurate but lack quantifiable uncertainty outputs, making it difficult for clinicians to know when the data is trustworthy.

Analogy

Imagine having a camera that can take pictures of a person walking and automatically analyze their gait. While the camera can provide accurate information about the person's movement, it's essential to know how accurate the information is. The probabilistic multiview markerless motion capture method is like adding a "confidence meter" to the camera, providing a measure of how reliable the analysis is. This allows clinicians to trust the data and make informed decisions about a patient's treatment.

Key Innovation

The key innovation of this work is the development of a probabilistic multiview markerless motion capture method that provides calibrated uncertainty estimates. This method uses variational inference to estimate joint angle posterior distributions and provides statistically sound confidence intervals for kinematic estimates. The innovation lies in the external validation of this method against clinical systems, filling a gap in previous research.
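
"Calibrated" here has a testable meaning: a 95% confidence interval should contain the reference value about 95% of the time. The synthetic check below illustrates that criterion with made-up numbers standing in for joint-angle estimates:

```python
# Empirical coverage check for calibration: a 95% interval is calibrated if
# it contains the reference value ~95% of the time. Synthetic data only.
import numpy as np

rng = np.random.default_rng(1)
truth = rng.normal(size=1000)                        # reference joint angles
estimate = truth + rng.normal(scale=1.0, size=1000)  # noisy model estimates
sigma = 1.0                                          # model's reported uncertainty

lo, hi = estimate - 1.96 * sigma, estimate + 1.96 * sigma
coverage = float(np.mean((truth >= lo) & (truth <= hi)))
print(round(coverage, 3))   # should land near 0.95 if sigma is honest
```

Validating this property against marker-based clinical systems, rather than assuming it, is the external-validation gap the paper fills.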

Practical Impact

This research has significant practical impact for clinical gait analysis. By providing calibrated confidence intervals, clinicians can identify unreliable outputs and exclude instances of low-quality biomechanical reconstruction. This improves the reliability and trust of kinematic data used for clinical decision-making. The method can be applied in various clinical settings, including physical therapy and rehabilitation, to provide accurate and trustworthy gait analysis.