Priority Research Agenda

The field's ten highest-priority open research directions.

Cross-Domain Generalization of AI Research Assistance Tools

Current AI-assisted research tools are overwhelmingly validated on computer science and closely related domains, leaving the vast majority of scientific disciplines underserved. This domain narrowness fundamentally limits the field's ability to claim generalizable progress and restricts adoption across the broader scientific community. Addressing this gap is essential for establishing AI-assisted research as a universal scientific capability rather than a niche CS tool.

Hallucination Detection and Mitigation in AI-Generated Scientific Content

Hallucinated citations, fabricated findings, and factually incorrect statements represent the most critical reliability barrier for deploying LLMs in scientific workflows. Although hallucination forms the largest cluster of identified gaps, the field lacks systematic frameworks for measuring, categorizing, and mitigating hallucinations specifically in scientific contexts. Until this problem is solved, AI-assisted research tools cannot be trusted for consequential scientific tasks.

Standardized Evaluation Frameworks for AI-Assisted Scientific Review Quality

The field currently lacks consensus on how to measure whether AI-assisted literature reviews, peer reviews, or research summaries are better than, worse than, or biased relative to human-produced equivalents. Without standardized quality metrics, results across studies are incomparable and the field cannot accumulate reliable knowledge about system performance. This gap affects every researcher building or evaluating AI review tools.

Full-Cycle Automation of Scientific Literature Review Processes

While significant progress has been made automating individual steps such as screening or extraction, fewer than 2% of studies have explored end-to-end automation of the complete literature review cycle. Integrating these components into coherent, reliable pipelines represents a qualitatively different and higher-order challenge that the field must address to deliver transformative research efficiency gains.

Multi-Turn and Agentic AI Workflows for Scientific Investigation

Current benchmarks and systems predominantly evaluate single-turn interactions, but real scientific inquiry requires sustained, multi-step reasoning, iterative hypothesis refinement, and tool-using agents operating over extended horizons. The field's inability to evaluate and build multi-turn scientific AI agents represents a fundamental gap between current capabilities and the complex workflows researchers actually need.

Multimodal Understanding of Scientific Figures, Tables, and Visual Data

Scientific papers communicate critical quantitative information through figures, tables, charts, and diagrams that current text-focused AI systems cannot reliably process. This limitation fundamentally constrains AI-assisted research to text-only information, missing a large portion of scientific knowledge. Given that nearly all empirical papers contain visual data, multimodal capability is a prerequisite for comprehensive scientific understanding.

Bias Detection and Governance in AI-Assisted Research Workflows

AI systems integrated into scientific workflows introduce systematic biases through training data, model architecture, and deployment choices that can distort research outputs, perpetuate existing inequities in citation practices, and undermine scientific integrity. The field lacks robust methods for detecting, measuring, and mitigating these biases across the full range of research assistance tasks. This affects all researchers using or evaluating AI research tools.

Prompt Engineering and Fine-Tuning Optimization for Scientific Tasks

Despite widespread use of prompt engineering and fine-tuning in deploying LLMs for scientific tasks, the field lacks principled, evidence-based guidance on which strategies work best for specific scientific applications. Researchers and practitioners rely largely on trial and error, creating massive duplication of effort and preventing systematic improvement. Systematic study of prompt and fine-tuning optimization is foundational for the entire field.

Novelty Assessment and Research Idea Generation Using AI Systems

Evaluating the novelty of research ideas and generating genuinely new hypotheses are among the highest-value potential applications of AI in science, yet they remain among the least understood and least benchmarked capabilities. The absence of robust novelty metrics and evaluation datasets prevents rigorous progress in this area, which sits at the core of scientific discovery. This gap matters to any researcher interested in AI's role in accelerating discovery.

Disciplinary and Institutional Governance of AI in Academic Research

The rapid adoption of AI tools in research is outpacing the development of disciplinary norms, institutional policies, and governance frameworks. Researchers across all fields are navigating inconsistent and often absent guidelines for AI use in writing, reviewing, and publishing, creating integrity risks and inequitable access to AI benefits. Establishing evidence-based governance frameworks is a field-wide priority with implications for all researchers.
