Context Definition and the AGENTS.md Concept

Last updated on 2026-01-02

Overview

Questions

Why is context important for AI coding assistants?
What is the AGENTS.md concept?
How can I define coding standards for AI agents in my R project?
What are other ways to provide context to AI assistants?

Objectives

Understand the importance of context definitions for AI agents
Learn about the AGENTS.md specification
Create an AGENTS.md file with R project-specific guidelines
Explore alternative methods for providing context to AI assistants

Introduction

As AI coding assistants become more integrated into our development workflows, it’s crucial to provide them with appropriate context about project-specific conventions, coding standards, and preferences. Without proper context, AI assistants may generate code that, while functionally correct, doesn’t align with your project’s style or best practices.

This chapter explores how to define context for AI agents, focusing on the AGENTS.md concept and other strategies for communicating your project’s coding standards to AI assistants.

Callout

Context is Key

Just as human developers need onboarding documentation to understand a project’s conventions, AI assistants benefit from explicit context definitions. Clear guidelines help AI tools generate code that fits seamlessly into your existing codebase.

The Need for General Context Definitions

Why Context Matters

When working with AI coding assistants, providing clear context helps ensure:

Consistency: Code generated by AI matches your project’s existing style and patterns
Best Practices: AI follows domain-specific conventions (e.g., using tidyverse packages in R)
Efficiency: Less time spent revising AI-generated code to match your standards
Maintainability: Generated code is easier for team members to understand and maintain
Learning: AI can help reinforce good coding practices by consistently applying them

Types of Context

Context for AI assistants can include:

Language-specific conventions: Which libraries or frameworks to prefer
Style guidelines: Naming conventions, formatting, documentation standards
Architectural patterns: How to structure code, which design patterns to use
Project-specific rules: Custom conventions unique to your codebase
Domain knowledge: Specialized terminology or domain-specific best practices

The AGENTS.md Concept

What is AGENTS.md?

AGENTS.md is an open convention for providing context and guidelines to AI coding assistants directly within your repository. The file is ordinary Markdown with no required schema, so it stays readable for humans. By placing an AGENTS.md file in your repository root, you give AI tools a predictable place to look when generating or reviewing code.

The concept is documented at https://agents.md/, which provides:

Specifications for the AGENTS.md format
Examples from various programming languages and domains
Best practices for writing effective agent guidelines
Community-contributed templates and patterns

How AGENTS.md Works

When an AI assistant that supports the convention (such as GitHub Copilot or Cursor) works in a repository with an AGENTS.md file, it can:

Read the guidelines at the start of a session
Apply the rules when generating code suggestions
Reference the standards when reviewing or revising code
Adapt behavior to match project-specific preferences

Multiple AGENTS.md Files

You can place AGENTS.md files at different levels of your project hierarchy. AI assistants typically use the closest AGENTS.md file relative to the current working file:

Repository root (/AGENTS.md): Defines project-wide standards that apply to all code
Subdirectory (/src/AGENTS.md, /tests/AGENTS.md): Provides context-specific guidelines that override or extend root-level rules
Module-level (/src/data-processing/AGENTS.md): Defines specialized rules for specific components

Example hierarchy:

my-r-project/
├── AGENTS.md               # General tidyverse standards
├── src/
│   └── analysis/
│       └── AGENTS.md       # Additional statistical analysis guidelines
└── tests/
    └── AGENTS.md           # Testing-specific conventions

When working on a file like /src/analysis/models.R, the AI assistant will prioritize:

1. /src/analysis/AGENTS.md (most specific)
2. /src/AGENTS.md (if it exists)
3. /AGENTS.md (project-wide defaults)

This hierarchical approach allows you to maintain general standards while accommodating specialized needs in different parts of your codebase.
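The lookup order described above can be sketched in a few lines of base R. This is only an illustration of the idea, not how any particular assistant implements it. The helper name `find_agents_files()` and the temporary demo project are assumptions made for this example.

```r
# Hypothetical helper: list the AGENTS.md files that apply to `file`,
# ordered from most specific to project-wide.
find_agents_files <- function(file, root) {
  root <- normalizePath(root, winslash = "/", mustWork = TRUE)
  dir <- normalizePath(dirname(file), winslash = "/", mustWork = TRUE)
  found <- character(0)
  repeat {
    candidate <- file.path(dir, "AGENTS.md")
    if (file.exists(candidate)) found <- c(found, candidate)
    parent <- dirname(dir)
    # Stop at the project root (or at the filesystem root as a safeguard)
    if (dir == root || parent == dir) break
    dir <- parent
  }
  found
}

# Demo on a throwaway copy of part of the example hierarchy
project <- file.path(tempdir(), "my-r-project")
dir.create(file.path(project, "src", "analysis"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project, "AGENTS.md"))
file.create(file.path(project, "src", "analysis", "AGENTS.md"))
file.create(file.path(project, "src", "analysis", "models.R"))

applicable <- find_agents_files(
  file.path(project, "src", "analysis", "models.R"),
  root = project
)
print(applicable)
```

In this demo there is no /src/AGENTS.md, so the result contains only the analysis-level file followed by the root-level file.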

Callout

Version Control Benefits

Because AGENTS.md is a plain text file, it can be version-controlled alongside your code (for example with Git). As a result, it:

Evolves with your project
Can be reviewed and improved through pull requests
Maintains consistency across different development stages
Provides historical context for coding decisions

Example: AGENTS.md for an R Project

Here is a comprehensive, and therefore fairly long, example of an AGENTS.md file for an R project that emphasizes tidyverse principles and functional programming patterns:

MARKDOWN

# AGENTS.md - R Project Coding Guidelines

## Overview

This R project follows tidyverse conventions and functional programming principles. 
AI assistants should generate code that adheres to these guidelines.

## Language and Framework

- **Primary Language**: R (version 4.1 or higher)
- **Core Framework**: tidyverse
- **Required Packages**: dplyr, tidyr, ggplot2, purrr, readr, lubridate

## Code Style Principles

### 1. Mandatory Tidyverse Usage

All data manipulation and analysis code MUST use tidyverse packages and functions.

**Prefer:**
```r
mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, cyl, hp)
```

**Avoid:**
```r
mtcars[mtcars$mpg > 20, c("mpg", "cyl", "hp")]
```

### 2. Piping Over Local Variables

ALWAYS favor piping operations over creating intermediate local variables.

**Prefer:**
```r
mtcars %>%
  filter(cyl == 6) %>%
  mutate(efficiency = mpg / hp) %>%
  arrange(desc(efficiency)) %>%
  head(10)
```

**Avoid:**
```r
cars_filtered <- filter(mtcars, cyl == 6)
cars_mutated <- mutate(cars_filtered, efficiency = mpg / hp)
cars_sorted <- arrange(cars_mutated, desc(efficiency))
result <- head(cars_sorted, 10)
```

### 3. Prohibition of Variable Overwriting

NEVER overwrite existing variables. Use piping to transform data in a single flow.

**Forbidden:**
```r
data <- read_csv("input.csv")
data <- filter(data, value > 0)
data <- mutate(data, log_value = log(value))
data <- arrange(data, date)
```

**Required:**
```r
data <- read_csv("input.csv") %>%
  filter(value > 0) %>%
  mutate(log_value = log(value)) %>%
  arrange(date)
```

### 4. Sparse Use of Local Variables

Minimize the creation of intermediate variables. Only create local variables when:
- The result will be used multiple times in different contexts
- The variable name significantly improves code readability
- The computation is expensive and should not be repeated

**Acceptable local variable usage:**
```r
# Used in multiple independent operations
base_data <- read_csv("data.csv") %>%
  filter(status == "active")

summary_stats <- base_data %>%
  summarize(mean_value = mean(value), sd_value = sd(value))

detailed_analysis <- base_data %>%
  group_by(category) %>%
  summarize(
    across(where(is.numeric), list(mean = mean, sd = sd)),
    .groups = "drop"
  )
```

### 5. Multiline Pipes with Documentation

Pipelines MUST be formatted across multiple lines with inline documentation.

**Required format:**
```r
analysis_results <- raw_data %>%
  # Remove incomplete cases and outliers
  filter(complete.cases(.), between(value, 0, 100)) %>%
  # Normalize values by group
  group_by(category) %>%
  mutate(normalized = (value - mean(value)) / sd(value)) %>%
  ungroup() %>%
  # Calculate derived metrics
  mutate(
    log_value = log1p(value),
    squared_value = value^2,
    interaction = value * normalized
  ) %>%
  # Sort by importance
  arrange(desc(abs(normalized)))
```

Each step in a pipeline should:
- Be on its own line
- Have a preceding comment explaining its purpose
- Use meaningful intermediate calculations when needed

### 6. Function Definitions

Functions should also follow piping principles when applicable:

```r
process_dataset <- function(data, threshold = 0.05) {
  data %>%
    # Keep only results below the significance threshold
    filter(p_value < threshold) %>%
    # Calculate effect sizes with approximate 95% confidence intervals
    mutate(
      effect_size = (mean_treatment - mean_control) / pooled_sd,
      ci_lower = effect_size - 1.96 * se,
      ci_upper = effect_size + 1.96 * se
    ) %>%
    # Label magnitude using Cohen's conventional cut-offs (0.2, 0.5, 0.8)
    mutate(
      effect_magnitude = case_when(
        abs(effect_size) < 0.2 ~ "negligible",
        abs(effect_size) < 0.5 ~ "small",
        abs(effect_size) < 0.8 ~ "medium",
        TRUE ~ "large"
      )
    )
}
```

## Forbidden Patterns

1. **Loop-based operations** when vectorized or tidyverse alternatives exist
2. **Direct variable assignment in loops** - use `purrr::map()` family instead
3. **Base R subsetting syntax** - use `dplyr::filter()` and `dplyr::select()`
4. **Nested function calls** without pipes when multiple operations are chained
5. **`attach()` function** - always use explicit data references

## Documentation Standards

- Use roxygen2-style comments for all functions
- Include inline comments for complex pipeline steps
- Document assumptions and data requirements
- Explain any deviations from these guidelines (rare cases only)

## Examples of Complete Workflows

### Data Import and Cleaning
```r
cleaned_data <- read_csv("raw_data.csv") %>%
  # Handle missing values (key_columns: character vector of required column names)
  drop_na(all_of(key_columns)) %>%
  # Standardize column names
  rename_with(tolower) %>%
  # Type conversion (ymd() comes from lubridate)
  mutate(across(ends_with("_date"), lubridate::ymd)) %>%
  # Remove duplicates
  distinct()
```

### Analysis Pipeline
```r
analysis <- cleaned_data %>%
  # Subset to relevant period
  filter(between(date, start_date, end_date)) %>%
  # Group-level transformations
  group_by(category, region) %>%
  summarize(
    n_obs = n(),
    mean_value = mean(value, na.rm = TRUE),
    median_value = median(value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Calculate derived metrics
  mutate(
    relative_value = mean_value / median_value,
    weight = n_obs / sum(n_obs)
  ) %>%
  # Final filtering
  filter(n_obs >= min_sample_size)
```

## Additional Guidelines

- Prefer `tibble` over `data.frame`
- Use `readr::read_*()` over base R `read.*()` functions
- Always specify `.groups` argument when using `summarize()` with `group_by()`
- Use `across()` for operations on multiple columns
- Leverage `case_when()` for complex conditional logic

Challenge

Challenge 1: Create Your Own AGENTS.md

Create an AGENTS.md file for one of your own R projects. Consider:

What coding style do you prefer?
Which packages should be favored?
What patterns should be avoided?
How should functions be documented?

Example Solution

Your AGENTS.md should include:

Clear statement of purpose
Specific package preferences with examples
Do’s and don’ts with code comparisons
Documentation requirements
Examples of good practices

Remember: Start simple and expand based on your project’s needs.
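As one possible starting point, the checklist above can be turned into a small skeleton file from within R. The sketch below is only a suggestion: the section names, the example rules, and the temporary directory standing in for your project root are all assumptions, and AGENTS.md does not require any particular headings.

```r
# A minimal starter AGENTS.md; the section names mirror the checklist above
# and are suggestions, not a required schema.
starter <- c(
  "# AGENTS.md",
  "",
  "## Purpose",
  "This R project follows tidyverse conventions.",
  "",
  "## Package preferences",
  "- Use dplyr and tidyr for data manipulation.",
  "",
  "## Do and don't",
  "- Do: chain multi-step transformations with pipes.",
  "- Don't: call attach().",
  "",
  "## Documentation",
  "- Document every function with roxygen2 comments."
)

# Hypothetical location: a temporary directory standing in for your project root
project_root <- file.path(tempdir(), "starter-project")
dir.create(project_root, showWarnings = FALSE)
agents_path <- file.path(project_root, "AGENTS.md")

# Refuse to clobber an existing file
if (!file.exists(agents_path)) {
  writeLines(starter, agents_path)
}

# Show what was written
cat(readLines(agents_path), sep = "\n")
```

In a real project you would point `project_root` at the repository root and then grow the file as concrete problems with generated code show up.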

Other Options to Specify Context

While AGENTS.md is a powerful tool, there are several complementary or alternative approaches to providing context to AI assistants:

1. Inline Comments and Documentation

AI assistants also pick up conventions from the comments and documentation in the files they read:

R

# This project uses tidyverse conventions exclusively
library(tidyverse)

#' Process customer data following tidyverse patterns
#' 
#' @param data A tibble with customer information
#' @return A processed tibble with standardized columns
#' @examples
#' process_customers(raw_customers)
process_customers <- function(data) {
  # Always use piping for multi-step transformations
  data %>%
    filter(!is.na(customer_id)) %>%
    mutate(name = str_to_title(name))
}

2. Chat Instructions

When using AI chat interfaces, provide context explicitly:

I'm working on an R project that follows strict tidyverse conventions.
Please generate all code using:

- dplyr for data manipulation
- Piping (%>%) for all multi-step operations
- No variable overwriting
- Comments before each pipe step
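If the same rules already live in a file such as AGENTS.md, you can avoid retyping them by assembling the chat message from that file. The following is a minimal sketch in base R; `build_chat_prompt()`, the wording of the preamble, and the temporary demo file are hypothetical and not part of any tool's API.

```r
# Hypothetical helper: combine stored guidelines with a concrete request
build_chat_prompt <- function(guidelines_path, request) {
  guidelines <- readLines(guidelines_path, warn = FALSE)
  paste(
    "Follow these project guidelines when writing code:",
    paste(guidelines, collapse = "\n"),
    "Task:",
    request,
    sep = "\n\n"
  )
}

# Demo with a temporary file standing in for your guidelines
demo_path <- tempfile(fileext = ".md")
writeLines(
  c("- dplyr for data manipulation", "- No variable overwriting"),
  demo_path
)

prompt <- build_chat_prompt(demo_path, "Summarize mean mpg by cyl in mtcars.")
cat(prompt)
```

The resulting text can be pasted into any chat interface, which keeps interactive sessions aligned with the guidelines stored in the repository.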

3. Project README Files

Include coding standards in your README.md:

MARKDOWN

## Coding Standards

This project follows tidyverse conventions. All contributions must:

- Use tidyverse packages
- Implement piping for data transformations
- Avoid variable overwriting
- Include inline documentation

4. Code Templates and Snippets

Create RStudio code snippets that make your preferred patterns the easiest ones to type:

R

# In RStudio: Tools > Global Options > Code > Snippets
snippet tidypipe
	${1:data} %>%
		# ${2:description}
		${3:operation}() %>%
		# ${4:description}
		${5:operation}()

5. Style Guides with Tools

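Use automated style checking tools like lintr to enforce standards. The .lintr file uses Debian Control File (DCF) syntax, so every continuation line, including the closing parenthesis, must be indented:

R

# .lintr configuration
linters: linters_with_defaults(
    line_length_linter(120),
    object_usage_linter = NULL,
    # Enforce tidyverse style
    assignment_linter(),
    pipe_continuation_linter()
  )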

6. Pre-commit Hooks

Enforce standards before code is committed by using pre-commit hooks:

YAML

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/lorenzwalthert/precommit
    rev: v0.3.2
    hooks:
      - id: style-files
      - id: lintr

7. Copilot Instructions in IDE

Some IDEs allow workspace-specific instructions for AI assistants. While specific settings vary by tool and may evolve, the concept involves configuring your IDE to provide additional context files or instructions to the AI assistant.

Callout

Note on IDE-Specific Settings

The availability and configuration of AI assistant settings vary across IDEs and tools. Check your specific IDE’s documentation for current options to provide context to AI coding assistants.

Callout

Combining Approaches

The most effective strategy often combines multiple approaches:

AGENTS.md for comprehensive, machine-readable guidelines
Inline comments for implementation-specific context
Chat instructions for interactive sessions
Automated tools for enforcement

Best Practices for Context Definitions

Keep It Focused

Start with the most important rules
Don’t try to specify everything at once
Expand based on actual needs and pain points

Use Examples Liberally

Show preferred patterns with code examples
Demonstrate anti-patterns to avoid
Include both simple and complex scenarios

Make It Discoverable

Place context files in repository root
Reference them in README and contributing guides
Keep them up-to-date with project evolution

Test Your Guidelines

Verify that AI actually follows your guidelines
Iterate based on the quality of generated code
Collect feedback from team members
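Part of this verification can be automated. The sketch below uses base R's parser to flag three of the patterns discussed in this chapter (calls to `attach()`, `for` loops, and top-level variables that are assigned more than once) in a piece of generated code. The function name `check_generated_code()` and the choice of rules are assumptions for illustration; the check is rough and does not replace code review or a linter.

```r
# Hypothetical checker: flag a few forbidden patterns in generated R code
check_generated_code <- function(code) {
  exprs <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(exprs)

  # Rule: no attach() calls
  uses_attach <- any(
    tokens$token == "SYMBOL_FUNCTION_CALL" & tokens$text == "attach"
  )
  # Rule: no for loops
  uses_for <- any(tokens$token == "FOR")

  # Rule: no top-level variable is assigned more than once
  assigned <- vapply(exprs, function(e) {
    is_assign <- is.call(e) &&
      identical(e[[1]], as.name("<-")) &&
      is.name(e[[2]])
    if (is_assign) as.character(e[[2]]) else NA_character_
  }, character(1))
  overwritten <- unique(assigned[duplicated(assigned) & !is.na(assigned)])

  list(uses_attach = uses_attach, uses_for = uses_for, overwritten = overwritten)
}

# Two candidate answers from an assistant; the code is parsed, never run
forbidden <- '
data <- read_csv("input.csv")
data <- filter(data, value > 0)
'
required <- '
data <- read_csv("input.csv") %>%
  filter(value > 0)
'

res_forbidden <- check_generated_code(forbidden)
res_required <- check_generated_code(required)
print(res_forbidden$overwritten)
print(res_required$overwritten)
```

Because the code is only parsed, the check works even when the packages used by the generated code are not installed.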

Version Control Context

Track changes to context definitions
Review updates through pull requests
Document why rules were added or changed

Challenge

Challenge 2: Context Specification Strategy

For a team R project, design a context specification strategy that includes:

What would go in AGENTS.md?
What would be better as inline comments?
How would you communicate standards to new team members?

Example Strategy

AGENTS.md: Core style principles, mandatory patterns, forbidden practices
Inline comments: Function-specific logic, data flow explanations, edge cases
README.md: Quick-start guide, links to detailed standards, setup instructions
Onboarding docs: Human-readable explanation of why standards exist, examples
Code reviews: Consistent feedback referring to documented standards

The key is deliberate redundancy: the same standards should reach both human contributors and AI assistants.

Summary

Providing proper context to AI coding assistants is essential for generating high-quality, consistent code. The AGENTS.md concept offers a standardized, version-controlled approach to defining project-specific guidelines. Combined with other context specification methods, it creates a comprehensive environment where AI assistants can truly enhance your development workflow.

Key Points

Context definitions improve consistency: AI-generated code matches your project standards
AGENTS.md provides a standardized, version-controlled, machine-readable way to specify coding guidelines
Multiple AGENTS.md files can exist at different hierarchy levels; AI assistants use the closest file
Effective AGENTS.md files include clear examples of preferred and forbidden patterns
Start simple and iterate: Begin with core principles and expand based on needs
Combining AGENTS.md with inline comments, chat instructions, and tooling creates robust context
R projects benefit from explicit tidyverse usage and piping conventions in context definitions