Context Definition and the AGENTS.md Concept

Last updated on 2026-01-02

Overview

Questions

  • Why is context important for AI coding assistants?
  • What is the AGENTS.md concept?
  • How can I define coding standards for AI agents in my R project?
  • What are other ways to provide context to AI assistants?

Objectives

  • Understand the importance of context definitions for AI agents
  • Learn about the AGENTS.md specification
  • Create an AGENTS.md file with R project-specific guidelines
  • Explore alternative methods for providing context to AI assistants

Introduction


As AI coding assistants become more integrated into our development workflows, it’s crucial to provide them with appropriate context about project-specific conventions, coding standards, and preferences. Without proper context, AI assistants may generate code that, while functionally correct, doesn’t align with your project’s style or best practices.

This chapter explores how to define context for AI agents, focusing on the AGENTS.md concept and other strategies for communicating your project’s coding standards to AI assistants.

Callout

Context is Key

Just as human developers need onboarding documentation to understand a project’s conventions, AI assistants benefit from explicit context definitions. Clear guidelines help AI tools generate code that fits seamlessly into your existing codebase.

The Need for General Context Definitions


Why Context Matters

When working with AI coding assistants, providing clear context helps ensure:

  • Consistency: Code generated by AI matches your project’s existing style and patterns
  • Best Practices: AI follows domain-specific conventions (e.g., using tidyverse packages in R)
  • Efficiency: Less time spent revising AI-generated code to match your standards
  • Maintainability: Generated code is easier for team members to understand and maintain
  • Learning: AI can help reinforce good coding practices by consistently applying them

Types of Context

Context for AI assistants can include:

  • Language-specific conventions: Which libraries or frameworks to prefer
  • Style guidelines: Naming conventions, formatting, documentation standards
  • Architectural patterns: How to structure code, which design patterns to use
  • Project-specific rules: Custom conventions unique to your codebase
  • Domain knowledge: Specialized terminology or domain-specific best practices

The AGENTS.md Concept


What is AGENTS.md?

AGENTS.md is an open convention for providing context and guidelines to AI coding assistants directly within your repository. By placing an AGENTS.md file in your repository root, you give AI tools a predictable, plain-Markdown set of instructions that they can reference when generating or reviewing code.

The concept is documented at https://agents.md/, which provides:

  • Specifications for the AGENTS.md format
  • Examples from various programming languages and domains
  • Best practices for writing effective agent guidelines
  • Community-contributed templates and patterns

How AGENTS.md Works

When an AI assistant (like GitHub Copilot, Cursor, or other AI tools) works in a repository with an AGENTS.md file, it can:

  1. Read the guidelines at the start of a session
  2. Apply the rules when generating code suggestions
  3. Reference the standards when reviewing or revising code
  4. Adapt behavior to match project-specific preferences

Multiple AGENTS.md Files

You can place AGENTS.md files at different levels of your project hierarchy. AI assistants typically use the closest AGENTS.md file relative to the current working file:

  • Repository root (/AGENTS.md): Defines project-wide standards that apply to all code
  • Subdirectory (/src/AGENTS.md, /tests/AGENTS.md): Provides context-specific guidelines that override or extend root-level rules
  • Module-level (/src/data-processing/AGENTS.md): Defines specialized rules for specific components

Example hierarchy:

my-r-project/
├── AGENTS.md               # General tidyverse standards
├── src/
│   └── analysis/
│       └── AGENTS.md       # Additional statistical analysis guidelines
└── tests/
    └── AGENTS.md           # Testing-specific conventions

When working on a file like /src/analysis/models.R, the AI assistant will prioritize:

  1. /src/analysis/AGENTS.md (most specific)
  2. /src/AGENTS.md (if it exists)
  3. /AGENTS.md (project-wide defaults)

This hierarchical approach allows you to maintain general standards while accommodating specialized needs in different parts of your codebase.
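The lookup order described above can be sketched in base R. This is only an illustration of the "closest file first" idea, not how any particular assistant implements it: `find_agents_files()` is a hypothetical helper, and the demo recreates the example hierarchy in a temporary directory.

```r
# Hypothetical helper (not part of any tool): list the AGENTS.md files that
# apply to `file`, ordered from most specific to project-wide.
find_agents_files <- function(file, root) {
  root <- normalizePath(root, winslash = "/", mustWork = TRUE)
  dir <- normalizePath(dirname(file), winslash = "/", mustWork = TRUE)
  found <- character(0)
  repeat {
    candidate <- file.path(dir, "AGENTS.md")
    if (file.exists(candidate)) found <- c(found, candidate)
    parent <- dirname(dir)
    # Stop at the project root (or at the filesystem root as a safety net)
    if (dir == root || parent == dir) break
    dir <- parent
  }
  found
}

# Recreate the example hierarchy in a temporary directory
project <- file.path(tempdir(), "my-r-project")
dir.create(file.path(project, "src", "analysis"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "tests"), showWarnings = FALSE)
invisible(file.create(
  file.path(project, "AGENTS.md"),
  file.path(project, "src", "analysis", "AGENTS.md"),
  file.path(project, "tests", "AGENTS.md"),
  file.path(project, "src", "analysis", "models.R")
))

applicable <- find_agents_files(
  file.path(project, "src", "analysis", "models.R"),
  root = project
)

# Show the paths relative to the project root
relative <- substring(applicable, nchar(normalizePath(project, winslash = "/")) + 2)
print(relative)
```

Because the demo hierarchy has no /src/AGENTS.md, the result lists the analysis-level file first and the root file second. The file under /tests is never considered for models.R.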

Callout

Version Control Benefits

Because AGENTS.md is a plain text file, it can be version-controlled alongside your code (for example, with Git). As a result, it:

  • Evolves with your project
  • Can be reviewed and improved through pull requests
  • Maintains consistency across different development stages
  • Provides historical context for coding decisions

Example: AGENTS.md for an R Project


Here’s a comprehensive, and therefore fairly long, example of an AGENTS.md file for an R project that emphasizes tidyverse principles and functional programming patterns:

MARKDOWN

# AGENTS.md - R Project Coding Guidelines

## Overview

This R project follows tidyverse conventions and functional programming principles. 
AI assistants should generate code that adheres to these guidelines.

## Language and Framework

- **Primary Language**: R (version 4.1 or higher)
- **Core Framework**: tidyverse
- **Required Packages**: dplyr, tidyr, ggplot2, purrr, readr

## Code Style Principles

### 1. Mandatory Tidyverse Usage

All data manipulation and analysis code MUST use tidyverse packages and functions.

**Prefer:**
```r
mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, cyl, hp)
```

**Avoid:**
```r
mtcars[mtcars$mpg > 20, c("mpg", "cyl", "hp")]
```

### 2. Piping Over Local Variables

ALWAYS favor piping operations over creating intermediate local variables.

**Prefer:**
```r
mtcars %>%
  filter(cyl == 6) %>%
  mutate(efficiency = mpg / hp) %>%
  arrange(desc(efficiency)) %>%
  head(10)
```

**Avoid:**
```r
cars_filtered <- filter(mtcars, cyl == 6)
cars_mutated <- mutate(cars_filtered, efficiency = mpg / hp)
cars_sorted <- arrange(cars_mutated, desc(efficiency))
result <- head(cars_sorted, 10)
```

### 3. Prohibition of Variable Overwriting

NEVER overwrite existing variables. Use piping to transform data in a single flow.

**Forbidden:**
```r
data <- read_csv("input.csv")
data <- filter(data, value > 0)
data <- mutate(data, log_value = log(value))
data <- arrange(data, date)
```

**Required:**
```r
data <- read_csv("input.csv") %>%
  filter(value > 0) %>%
  mutate(log_value = log(value)) %>%
  arrange(date)
```

### 4. Sparse Use of Local Variables

Minimize the creation of intermediate variables. Only create local variables when:
- The result will be used multiple times in different contexts
- The variable name significantly improves code readability
- The computation is expensive and should not be repeated

**Acceptable local variable usage:**
```r
# Used in multiple independent operations
base_data <- read_csv("data.csv") %>%
  filter(status == "active")

summary_stats <- base_data %>%
  summarize(mean_value = mean(value), sd_value = sd(value))

detailed_analysis <- base_data %>%
  group_by(category) %>%
  summarize(across(where(is.numeric), list(mean = mean, sd = sd)), .groups = "drop")
```

### 5. Multiline Pipes with Documentation

Pipelines MUST be formatted across multiple lines with inline documentation.

**Required format:**
```r
analysis_results <- raw_data %>%
  # Remove incomplete cases and outliers
  filter(complete.cases(.), between(value, 0, 100)) %>%
  # Normalize values by group
  group_by(category) %>%
  mutate(normalized = (value - mean(value)) / sd(value)) %>%
  ungroup() %>%
  # Calculate derived metrics
  mutate(
    log_value = log1p(value),
    squared_value = value^2,
    interaction = value * normalized
  ) %>%
  # Sort by importance
  arrange(desc(abs(normalized)))
```

Each step in a pipeline should:
- Be on its own line
- Have a preceding comment explaining its purpose
- Use meaningful intermediate calculations when needed

### 6. Function Definitions

Functions should also follow piping principles when applicable:

```r
process_dataset <- function(data, threshold = 0.05) {
  data %>%
    # Filter based on significance threshold
    filter(p_value < threshold) %>%
    # Calculate effect sizes
    mutate(
      effect_size = (mean_treatment - mean_control) / pooled_sd,
      ci_lower = effect_size - 1.96 * se,
      ci_upper = effect_size + 1.96 * se
    ) %>%
    # Add interpretation
    mutate(
      significant = p_value < threshold,
      effect_magnitude = case_when(
        abs(effect_size) < 0.2 ~ "small",
        abs(effect_size) < 0.8 ~ "medium",
        TRUE ~ "large"
      )
    )
}
```

## Forbidden Patterns

1. **Loop-based operations** when vectorized or tidyverse alternatives exist
2. **Direct variable assignment in loops** - use `purrr::map()` family instead
3. **Base R subsetting syntax** - use `dplyr::filter()` and `dplyr::select()`
4. **Nested function calls** without pipes when multiple operations are chained
5. **`attach()` function** - always use explicit data references

## Documentation Standards

- Use roxygen2-style comments for all functions
- Include inline comments for complex pipeline steps
- Document assumptions and data requirements
- Explain any deviations from these guidelines (rare cases only)

## Examples of Complete Workflows

### Data Import and Cleaning
```r
cleaned_data <- read_csv("raw_data.csv") %>%
  # Handle missing values (key_columns: character vector of required columns)
  drop_na(all_of(key_columns)) %>%
  # Standardize column names
  rename_with(tolower) %>%
  # Type conversion (ymd() comes from lubridate)
  mutate(across(ends_with("_date"), lubridate::ymd)) %>%
  # Remove duplicates
  distinct()
```

### Analysis Pipeline
```r
analysis <- cleaned_data %>%
  # Subset to relevant period
  filter(between(date, start_date, end_date)) %>%
  # Group-level transformations
  group_by(category, region) %>%
  summarize(
    n_obs = n(),
    mean_value = mean(value, na.rm = TRUE),
    median_value = median(value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Calculate derived metrics
  mutate(
    relative_value = mean_value / median_value,
    weight = n_obs / sum(n_obs)
  ) %>%
  # Final filtering
  filter(n_obs >= min_sample_size)
```

## Additional Guidelines

- Prefer `tibble` over `data.frame`
- Use `readr::read_*()` over base R `read.*()` functions
- Always specify `.groups` argument when using `summarize()` with `group_by()`
- Use `across()` for operations on multiple columns
- Leverage `case_when()` for complex conditional logic

Challenge

Challenge 1: Create Your Own AGENTS.md

Create an AGENTS.md file for one of your own R projects. Consider:

  1. What coding style do you prefer?
  2. Which packages should be favored?
  3. What patterns should be avoided?
  4. How should functions be documented?

Your AGENTS.md should include:

  • Clear statement of purpose
  • Specific package preferences with examples
  • Do’s and don’ts with code comparisons
  • Documentation requirements
  • Examples of good practices

Remember: Start simple and expand based on your project’s needs.

Other Options to Specify Context


While AGENTS.md is a powerful tool, there are several complementary or alternative approaches to providing context to AI assistants:

1. Inline Comments and Documentation

AI assistants can learn from well-documented code:

R

# This project uses tidyverse conventions exclusively
library(tidyverse)

#' Process customer data following tidyverse patterns
#' 
#' @param data A tibble with customer information
#' @return A processed tibble with standardized columns
#' @examples
#' process_customers(raw_customers)
process_customers <- function(data) {
  # Always use piping for multi-step transformations
  data %>%
    filter(!is.na(customer_id)) %>%
    mutate(name = str_to_title(name))
}

2. Chat Instructions

When using AI chat interfaces, provide context explicitly:

I'm working on an R project that follows strict tidyverse conventions.
Please generate all code using:

- dplyr for data manipulation
- Piping (%>%) for all multi-step operations
- No variable overwriting
- Comments before each pipe step
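If the same standards already live in a file, the chat message can be assembled from that file instead of being retyped. The following is a minimal sketch under that assumption. `build_chat_prompt()` is a hypothetical helper, and the demo writes a throwaway guidelines file rather than reading a real AGENTS.md.

```r
# Hypothetical helper: prepend the project's guidelines to a chat request.
build_chat_prompt <- function(request, agents_path = "AGENTS.md") {
  guidelines <- paste(readLines(agents_path, warn = FALSE), collapse = "\n")
  paste(
    "Follow these project guidelines when writing code:",
    guidelines,
    "Task:",
    request,
    sep = "\n\n"
  )
}

# Demo with a throwaway guidelines file
demo_path <- tempfile(fileext = ".md")
writeLines(
  c("- Use dplyr for data manipulation", "- No variable overwriting"),
  demo_path
)
prompt <- build_chat_prompt("Summarise mean mpg by cyl for mtcars.", demo_path)
cat(prompt)
```

Keeping the guidelines in one file and generating the prompt from it avoids the chat instructions drifting away from what the repository documents.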

3. Project README Files

Include coding standards in your README.md:

MARKDOWN

## Coding Standards

This project follows tidyverse conventions. All contributions must:

- Use tidyverse packages
- Implement piping for data transformations
- Avoid variable overwriting
- Include inline documentation

4. Code Templates and Snippets

Create RStudio code snippets that enforce your patterns:

R

# In RStudio: Tools > Global Options > Code > Edit Snippets...
snippet tidypipe
	${1:data} %>%
		# ${2:description}
		${3:operation}() %>%
		# ${4:description}
		${5:operation}()

5. Style Guides with Tools

Use automated style checking tools like lintr to enforce standards:

R

# .lintr configuration
linters: linters_with_defaults(
    line_length_linter(120),
    object_usage_linter = NULL,
    # Enforce tidyverse style
    assignment_linter(),
    pipe_continuation_linter()
  )

6. Pre-commit Hooks

Enforce standards before code is committed by using pre-commit hooks:

YAML

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/lorenzwalthert/precommit
    rev: v0.3.2
    hooks:
      - id: style-files
      - id: lintr

7. Copilot Instructions in IDE

Some IDEs allow workspace-specific instructions for AI assistants. While specific settings vary by tool and may evolve, the concept involves configuring your IDE to provide additional context files or instructions to the AI assistant.

Callout

Note on IDE-Specific Settings

The availability and configuration of AI assistant settings varies across IDEs and tools. Check your specific IDE’s documentation for current options to provide context to AI coding assistants.

Callout

Combining Approaches

The most effective strategy often combines multiple approaches:

  1. AGENTS.md for comprehensive, machine-readable guidelines
  2. Inline comments for implementation-specific context
  3. Chat instructions for interactive sessions
  4. Automated tools for enforcement

Best Practices for Context Definitions


Keep It Focused

  • Start with the most important rules
  • Don’t try to specify everything at once
  • Expand based on actual needs and pain points

Use Examples Liberally

  • Show preferred patterns with code examples
  • Demonstrate anti-patterns to avoid
  • Include both simple and complex scenarios

Make It Discoverable

  • Place context files in repository root
  • Reference them in README and contributing guides
  • Keep them up-to-date with project evolution

Test Your Guidelines

  • Verify that AI actually follows your guidelines
  • Iterate based on the quality of generated code
  • Collect feedback from team members
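One way to make such verification repeatable is a small script that inspects generated code for rule violations. The sketch below is an assumption-laden example, not a complete checker: `check_generated_code()` is a hypothetical helper that only looks at top-level assignments and at any mention of the name `attach`, using base R's parser without running the code.

```r
# Hypothetical checker: flag two rules from the example AGENTS.md
# (no variable overwriting, no attach()) in a string of R code.
check_generated_code <- function(code) {
  exprs <- parse(text = code, keep.source = FALSE)

  # TRUE for top-level calls of the form `name <- value` or `name = value`
  is_assignment <- function(e) {
    is.call(e) && is.name(e[[1]]) &&
      as.character(e[[1]]) %in% c("<-", "=") && is.name(e[[2]])
  }

  targets <- as.character(unlist(lapply(exprs, function(e) {
    if (is_assignment(e)) as.character(e[[2]])
  })))

  list(
    overwritten = unique(targets[duplicated(targets)]),
    # Matches any use of the name `attach`, so false positives are possible
    uses_attach = "attach" %in% all.names(exprs)
  )
}

forbidden <- '
data <- read_csv("input.csv")
data <- filter(data, value > 0)
data <- mutate(data, log_value = log(value))
'

required <- '
data <- read_csv("input.csv") %>%
  filter(value > 0) %>%
  mutate(log_value = log(value))
'

print(check_generated_code(forbidden)$overwritten)
print(check_generated_code(required)$overwritten)
```

The code is parsed but never evaluated, so the checked snippet does not need its packages or input files to be available. A linter such as lintr covers style rules far more thoroughly. A script like this is only meant for project-specific rules that no existing tool checks.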

Version Control Context

  • Track changes to context definitions
  • Review updates through pull requests
  • Document why rules were added or changed

Challenge

Challenge 2: Context Specification Strategy

For a team R project, design a context specification strategy. Consider:

  1. What would go in AGENTS.md?
  2. What would be better as inline comments?
  3. How would you communicate standards to new team members?

A possible solution:

  1. AGENTS.md: Core style principles, mandatory patterns, forbidden practices
  2. Inline comments: Function-specific logic, data flow explanations, edge cases
  3. README.md: Quick-start guide, links to detailed standards, setup instructions
  4. Onboarding docs: Human-readable explanation of why standards exist, examples
  5. Code reviews: Consistent feedback referring to documented standards

The key is to state the same standards consistently across both human-facing and AI-facing channels.

Summary


Providing proper context to AI coding assistants is essential for generating high-quality, consistent code. The AGENTS.md concept offers a standardized, version-controlled approach to defining project-specific guidelines. Combined with other context specification methods, it creates a comprehensive environment where AI assistants can truly enhance your development workflow.

Key Points

  • Context definitions help AI assistants generate consistent code that matches your project standards
  • AGENTS.md provides a standardized, version-controlled, machine-readable way to specify coding guidelines
  • Multiple AGENTS.md files can exist at different hierarchy levels; AI assistants typically use the closest file
  • Effective AGENTS.md files include clear examples of preferred and forbidden patterns
  • Start simple and iterate: Begin with core principles and expand based on needs
  • Combining AGENTS.md with inline comments, chat instructions, and tooling creates robust context
  • R projects benefit from explicit tidyverse usage and piping conventions in context definitions