Context Definition and the AGENTS.md Concept
Last updated on 2026-01-02
Estimated time: 35 minutes
Overview
Questions
- Why is context important for AI coding assistants?
- What is the AGENTS.md concept?
- How can I define coding standards for AI agents in my R project?
- What are other ways to provide context to AI assistants?
Objectives
- Understand the importance of context definitions for AI agents
- Learn about the AGENTS.md specification
- Create an AGENTS.md file with R project-specific guidelines
- Explore alternative methods for providing context to AI assistants
Introduction
As AI coding assistants become more integrated into our development workflows, it’s crucial to provide them with appropriate context about project-specific conventions, coding standards, and preferences. Without proper context, AI assistants may generate code that, while functionally correct, doesn’t align with your project’s style or best practices.
This chapter explores how to define context for AI agents, focusing
on the AGENTS.md concept and other strategies for
communicating your project’s coding standards to AI assistants.
Context is Key
Just as human developers need onboarding documentation to understand a project’s conventions, AI assistants benefit from explicit context definitions. Clear guidelines help AI tools generate code that fits seamlessly into your existing codebase.
The Need for General Context Definitions
Why Context Matters
When working with AI coding assistants, providing clear context helps ensure:
- Consistency: Code generated by AI matches your project’s existing style and patterns
- Best Practices: AI follows domain-specific conventions (e.g., using tidyverse packages in R)
- Efficiency: Less time spent revising AI-generated code to match your standards
- Maintainability: Generated code is easier for team members to understand and maintain
- Learning: AI can help reinforce good coding practices by consistently applying them
Types of Context
Context for AI assistants can include:
- Language-specific conventions: Which libraries or frameworks to prefer
- Style guidelines: Naming conventions, formatting, documentation standards
- Architectural patterns: How to structure code, which design patterns to use
- Project-specific rules: Custom conventions unique to your codebase
- Domain knowledge: Specialized terminology or domain-specific best practices
The AGENTS.md Concept
What is AGENTS.md?
AGENTS.md is an open, Markdown-based convention for providing context and
guidelines for AI coding assistants directly within your repository. By
placing an AGENTS.md file in your repository root, you
give AI tools a predictable place to find project instructions when
generating or reviewing code.
The concept is documented at https://agents.md/, which provides:
- Specifications for the AGENTS.md format
- Examples from various programming languages and domains
- Best practices for writing effective agent guidelines
- Community-contributed templates and patterns
How AGENTS.md Works
When an AI assistant that supports the format (such as GitHub Copilot, Cursor, or other AI tools)
works in a repository with an AGENTS.md file, it can:
- Read the guidelines at the start of a session
- Apply the rules when generating code suggestions
- Reference the standards when reviewing or revising code
- Adapt behavior to match project-specific preferences
Multiple AGENTS.md Files
You can place AGENTS.md files at different levels of
your project hierarchy. AI assistants typically use the
closest AGENTS.md file relative to the
current working file:
- Repository root (/AGENTS.md): Defines project-wide standards that apply to all code
- Subdirectory (/src/AGENTS.md, /tests/AGENTS.md): Provides context-specific guidelines that override or extend root-level rules
- Module-level (/src/data-processing/AGENTS.md): Defines specialized rules for specific components
Example hierarchy:
my-r-project/
├── AGENTS.md              # General tidyverse standards
├── src/
│   └── analysis/
│       └── AGENTS.md      # Additional statistical analysis guidelines
└── tests/
    └── AGENTS.md          # Testing-specific conventions
When working on a file like /src/analysis/models.R, the AI assistant will prioritize:
1. /src/analysis/AGENTS.md (most specific)
2. /src/AGENTS.md (if it exists)
3. /AGENTS.md (project-wide defaults)
This hierarchical approach allows you to maintain general standards while accommodating specialized needs in different parts of your codebase.
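The lookup order described above can be sketched in a few lines of base R. The helper name `find_agents_files()` and the temporary demo project are assumptions made for illustration; real AI tools implement their own discovery logic, which may differ from this sketch.

```r
# Hypothetical helper (not part of any AI tool): list the AGENTS.md files
# that apply to `file`, ordered from the closest directory up to `root`.
find_agents_files <- function(file, root) {
  root <- normalizePath(root, winslash = "/", mustWork = TRUE)
  dir <- normalizePath(dirname(file), winslash = "/", mustWork = TRUE)
  found <- character(0)
  repeat {
    candidate <- file.path(dir, "AGENTS.md")
    if (file.exists(candidate)) found <- c(found, candidate)
    parent <- dirname(dir)
    # Stop at the project root (or at the filesystem root as a safeguard)
    if (dir == root || parent == dir) break
    dir <- parent
  }
  found
}

# Demo: recreate the example hierarchy in a temporary directory
demo_root <- tempfile("my-r-project-")
dir.create(file.path(demo_root, "src", "analysis"), recursive = TRUE)
dir.create(file.path(demo_root, "tests"))
invisible(file.create(
  file.path(demo_root, "AGENTS.md"),
  file.path(demo_root, "src", "analysis", "AGENTS.md"),
  file.path(demo_root, "tests", "AGENTS.md")
))

agents_files <- find_agents_files(
  file.path(demo_root, "src", "analysis", "models.R"),
  demo_root
)
```

In this demo, `agents_files` lists the file in `src/analysis/` first and the root-level file second. The file in `tests/` is not included because `tests/` is not a parent directory of `models.R`.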
Version Control Benefits
Because AGENTS.md is a plain text file, it can be version-controlled
alongside your code (e.g., using git). As a result, it:
- Evolves with your project
- Can be reviewed and improved through pull requests
- Maintains consistency across different development stages
- Provides historical context for coding decisions
Example: AGENTS.md for an R Project
Here’s a comprehensive, and therefore fairly long, example of an
AGENTS.md file for an R project that emphasizes tidyverse
principles and functional programming patterns:
MARKDOWN
# AGENTS.md - R Project Coding Guidelines
## Overview
This R project follows tidyverse conventions and functional programming principles.
AI assistants should generate code that adheres to these guidelines.
## Language and Framework
- **Primary Language**: R (version 4.1 or higher)
- **Core Framework**: tidyverse
- **Required Packages**: dplyr, tidyr, ggplot2, purrr, readr, lubridate
## Code Style Principles
### 1. Mandatory Tidyverse Usage
All data manipulation and analysis code MUST use tidyverse packages and functions.
**Prefer:**
```r
mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, cyl, hp)
```
**Avoid:**
```r
mtcars[mtcars$mpg > 20, c("mpg", "cyl", "hp")]
```
### 2. Piping Over Local Variables
ALWAYS favor piping operations over creating intermediate local variables.
**Prefer:**
```r
mtcars %>%
  filter(cyl == 6) %>%
  mutate(efficiency = mpg / hp) %>%
  arrange(desc(efficiency)) %>%
  head(10)
```
**Avoid:**
```r
cars_filtered <- filter(mtcars, cyl == 6)
cars_mutated <- mutate(cars_filtered, efficiency = mpg / hp)
cars_sorted <- arrange(cars_mutated, desc(efficiency))
result <- head(cars_sorted, 10)
```
### 3. Prohibition of Variable Overwriting
NEVER overwrite existing variables. Use piping to transform data in a single flow.
**Forbidden:**
```r
data <- read_csv("input.csv")
data <- filter(data, value > 0)
data <- mutate(data, log_value = log(value))
data <- arrange(data, date)
```
**Required:**
```r
data <- read_csv("input.csv") %>%
  filter(value > 0) %>%
  mutate(log_value = log(value)) %>%
  arrange(date)
```
### 4. Sparse Use of Local Variables
Minimize the creation of intermediate variables. Only create local variables when:
- The result will be used multiple times in different contexts
- The variable name significantly improves code readability
- The computation is expensive and should not be repeated
**Acceptable local variable usage:**
```r
# Used in multiple independent operations
base_data <- read_csv("data.csv") %>%
  filter(status == "active")

summary_stats <- base_data %>%
  summarize(mean_value = mean(value), sd_value = sd(value))

detailed_analysis <- base_data %>%
  group_by(category) %>%
  summarize(across(where(is.numeric), list(mean = mean, sd = sd)))
```
### 5. Multiline Pipes with Documentation
Pipelines MUST be formatted across multiple lines with inline documentation.
**Required format:**
```r
analysis_results <- raw_data %>%
  # Remove incomplete cases and outliers
  filter(complete.cases(.), between(value, 0, 100)) %>%
  # Normalize values by group
  group_by(category) %>%
  mutate(normalized = (value - mean(value)) / sd(value)) %>%
  ungroup() %>%
  # Calculate derived metrics
  mutate(
    log_value = log1p(value),
    squared_value = value^2,
    interaction = value * normalized
  ) %>%
  # Sort by importance
  arrange(desc(abs(normalized)))
```
Each step in a pipeline should:
- Be on its own line
- Have a preceding comment explaining its purpose
- Use meaningful intermediate calculations when needed
### 6. Function Definitions
Functions should also follow piping principles when applicable:
```r
process_dataset <- function(data, threshold = 0.05) {
  data %>%
    # Filter based on significance threshold
    filter(p_value < threshold) %>%
    # Calculate effect sizes
    mutate(
      effect_size = (mean_treatment - mean_control) / pooled_sd,
      ci_lower = effect_size - 1.96 * se,
      ci_upper = effect_size + 1.96 * se
    ) %>%
    # Add interpretation (Cohen's conventional cutoffs: 0.2, 0.5, 0.8)
    mutate(
      significant = p_value < threshold,
      effect_magnitude = case_when(
        abs(effect_size) < 0.2 ~ "negligible",
        abs(effect_size) < 0.5 ~ "small",
        abs(effect_size) < 0.8 ~ "medium",
        TRUE ~ "large"
      )
    )
}
```
## Forbidden Patterns
1. **Loop-based operations** when vectorized or tidyverse alternatives exist
2. **Direct variable assignment in loops** - use `purrr::map()` family instead
3. **Base R subsetting syntax** - use `dplyr::filter()` and `dplyr::select()`
4. **Nested function calls** without pipes when multiple operations are chained
5. **`attach()` function** - always use explicit data references
## Documentation Standards
- Use roxygen2-style comments for all functions
- Include inline comments for complex pipeline steps
- Document assumptions and data requirements
- Explain any deviations from these guidelines (rare cases only)
## Examples of Complete Workflows
### Data Import and Cleaning
```r
cleaned_data <- read_csv("raw_data.csv") %>%
  # Handle missing values (key_columns: character vector of column names)
  drop_na(all_of(key_columns)) %>%
  # Standardize column names
  rename_with(tolower) %>%
  # Type conversion
  mutate(across(ends_with("_date"), ymd)) %>%
  # Remove duplicates
  distinct()
```
### Analysis Pipeline
```r
analysis <- cleaned_data %>%
  # Subset to relevant period
  filter(between(date, start_date, end_date)) %>%
  # Group-level transformations
  group_by(category, region) %>%
  summarize(
    n_obs = n(),
    mean_value = mean(value, na.rm = TRUE),
    median_value = median(value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Calculate derived metrics
  mutate(
    relative_value = mean_value / median_value,
    weight = n_obs / sum(n_obs)
  ) %>%
  # Final filtering
  filter(n_obs >= min_sample_size)
```
## Additional Guidelines
- Prefer `tibble` over `data.frame`
- Use `readr::read_*()` over base R `read.*()` functions
- Always specify `.groups` argument when using `summarize()` with `group_by()`
- Use `across()` for operations on multiple columns
- Leverage `case_when()` for complex conditional logic
Challenge 1: Create Your Own AGENTS.md
Create an AGENTS.md file for one of your own R projects.
Consider:
- What coding style do you prefer?
- Which packages should be favored?
- What patterns should be avoided?
- How should functions be documented?
Your AGENTS.md should include:
- Clear statement of purpose
- Specific package preferences with examples
- Do’s and don’ts with code comparisons
- Documentation requirements
- Examples of good practices
Remember: Start simple and expand based on your project’s needs.
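One way to start simple is to scaffold a minimal file from R and refine it by hand afterwards. The sketch below is only an illustration: the function name `write_starter_agents()` and the starter rules are hypothetical placeholders, not a recommended template.

```r
# Hypothetical starter content; replace these rules with your own.
starter_lines <- c(
  "# AGENTS.md - R Project Coding Guidelines",
  "",
  "## Overview",
  "This R project follows tidyverse conventions.",
  "",
  "## Code Style Principles",
  "- Use dplyr and tidyr for data manipulation",
  "- Prefer pipes over intermediate variables",
  "",
  "## Documentation Standards",
  "- Use roxygen2-style comments for all functions"
)

# Hypothetical helper: write the starter file into a project directory.
write_starter_agents <- function(project_dir, lines = starter_lines) {
  path <- file.path(project_dir, "AGENTS.md")
  # Refuse to overwrite guidelines that already exist
  if (file.exists(path)) stop("AGENTS.md already exists: ", path)
  writeLines(lines, path)
  invisible(path)
}

# Demo in a temporary directory so no real project is touched
demo_dir <- tempfile("starter-demo-")
dir.create(demo_dir)
agents_path <- write_starter_agents(demo_dir)
```

Refusing to overwrite an existing file is a deliberate choice in this sketch: once a team has edited its guidelines, a scaffolding helper should not silently replace them.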
Other Options to Specify Context
While AGENTS.md is a powerful tool, there are several complementary or alternative approaches to providing context to AI assistants:
1. Inline Comments and Documentation
AI assistants can learn from well-documented code:
R
# This project uses tidyverse conventions exclusively
library(tidyverse)

#' Process customer data following tidyverse patterns
#'
#' @param data A tibble with customer information
#' @return A processed tibble with standardized columns
#' @examples
#' process_customers(raw_customers)
process_customers <- function(data) {
  # Always use piping for multi-step transformations
  data %>%
    filter(!is.na(customer_id)) %>%
    mutate(name = str_to_title(name))
}
2. Chat Instructions
When using AI chat interfaces, provide context explicitly:
I'm working on an R project that follows strict tidyverse conventions.
Please generate all code using:
- dplyr for data manipulation
- Piping (%>%) for all multi-step operations
- No variable overwriting
- Comments before each pipe step
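3. Style Guides with Tools
Use automated style-checking tools such as lintr to enforce standards. A .lintr configuration file in the project root might look like this (continuation lines must be indented). The last two linters make the tidyverse assignment and pipe style explicit:
R
linters: linters_with_defaults(
    line_length_linter(120),
    object_usage_linter = NULL,
    assignment_linter(),
    pipe_continuation_linter()
  )
4. Pre-commit Hooks
Pre-commit hooks run checks such as linting or formatting before each commit, so code that violates your standards is flagged before it enters the repository.
5. Copilot Instructions in IDE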
Some IDEs allow workspace-specific instructions for AI assistants. While specific settings vary by tool and may evolve, the concept involves configuring your IDE to provide additional context files or instructions to the AI assistant.
Note on IDE-Specific Settings
The availability and configuration of AI assistant settings varies across IDEs and tools. Check your specific IDE’s documentation for current options to provide context to AI coding assistants.
Combining Approaches
The most effective strategy often combines multiple approaches:
- AGENTS.md for comprehensive, machine-readable guidelines
- Inline comments for implementation-specific context
- Chat instructions for interactive sessions
- Automated tools for enforcement
Best Practices for Context Definitions
Keep It Focused
- Start with the most important rules
- Don’t try to specify everything at once
- Expand based on actual needs and pain points
Use Examples Liberally
- Show preferred patterns with code examples
- Demonstrate anti-patterns to avoid
- Include both simple and complex scenarios
Make It Discoverable
- Place context files in repository root
- Reference them in README and contributing guides
- Keep them up-to-date with project evolution
Test Your Guidelines
- Verify that AI actually follows your guidelines
- Iterate based on the quality of generated code
- Collect feedback from team members
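Part of this verification can be automated. As a minimal sketch, the base R function below scans a piece of generated code for two of the forbidden patterns from the example AGENTS.md (for loops and `attach()` calls). The function name `find_forbidden_patterns()` and the sample snippet are assumptions for illustration; a real project would more likely encode such rules as lintr linters.

```r
# Hypothetical checker: flag for loops and attach() calls in R code.
# It inspects parser tokens rather than raw text, so the word "attach"
# inside a comment or a string is not reported.
find_forbidden_patterns <- function(code) {
  tokens <- utils::getParseData(parse(text = code, keep.source = TRUE))
  is_for_loop <- tokens$token == "FOR"
  is_attach <- tokens$token == "SYMBOL_FUNCTION_CALL" & tokens$text == "attach"
  hits <- tokens[is_for_loop | is_attach, c("line1", "token", "text")]
  rownames(hits) <- NULL
  hits
}

# Sample "generated" code that violates both rules
generated_code <- "attach(mtcars)\nfor (i in 1:3) print(i)\nmean(mpg)"
violations <- find_forbidden_patterns(generated_code)

# Code that follows the guidelines yields an empty result
clean_code <- "x <- c(1, 2, 3)\n# do not attach() anything\nmean(x)"
no_violations <- find_forbidden_patterns(clean_code)
```

Here `violations` contains one row for the `attach()` call on line 1 and one for the `for` loop on line 2, while `no_violations` has zero rows.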
Version Control Context
- Track changes to context definitions
- Review updates through pull requests
- Document why rules were added or changed
Challenge 2: Context Specification Strategy
For a team R project, design a context specification strategy that includes:
- What would go in AGENTS.md?
- What would be better as inline comments?
- How would you communicate standards to new team members?
- AGENTS.md: Core style principles, mandatory patterns, forbidden practices
- Inline comments: Function-specific logic, data flow explanations, edge cases
- README.md: Quick-start guide, links to detailed standards, setup instructions
- Onboarding docs: Human-readable explanation of why standards exist, examples
- Code reviews: Consistent feedback referring to documented standards
The key is redundancy across human and AI channels.
Summary
Providing proper context to AI coding assistants is essential for
generating high-quality, consistent code. The AGENTS.md
concept offers a standardized, version-controlled approach to defining
project-specific guidelines. Combined with other context specification
methods, it creates a comprehensive environment where AI assistants can
truly enhance your development workflow.
- Context definitions help AI assistants generate code that consistently matches your project standards
- AGENTS.md provides a standardized, version-controlled, machine-readable way to specify coding guidelines
- Multiple AGENTS.md files can exist at different hierarchy levels; AI assistants use the closest file
- Effective AGENTS.md files include clear examples of preferred and forbidden patterns
- Start simple and iterate: Begin with core principles and expand based on needs
- Combining AGENTS.md with inline comments, chat instructions, and tooling creates robust context
- R projects benefit from explicit tidyverse usage and piping conventions in context definitions