Content from Getting Started with GitHub Copilot


Last updated on 2025-12-15

Estimated time: 30 minutes

Overview

Questions

  • How do I register for a GitHub account?
  • How can I get free GitHub Copilot access as a student?
  • How do I set up GitHub Copilot in RStudio?

Objectives

  • Create a GitHub account
  • Apply for GitHub Student Developer Pack
  • Install and configure GitHub Copilot in RStudio
  • Verify GitHub Copilot is working as an autocompletion tool

Introduction


GitHub Copilot is an AI-powered code completion tool that can significantly enhance your coding experience in RStudio. This chapter will guide you through the process of setting up GitHub Copilot, from creating a GitHub account to configuring it in RStudio.

Registering with GitHub


If you don’t already have a GitHub account, follow these steps:

  1. Navigate to github.com
  2. Click the “Sign up” button in the top right corner
  3. Enter your email address (preferably your university/institutional email)
  4. Create a strong password
  5. Choose a username
  6. Verify your account through the email confirmation

Requesting GitHub Student Developer Pack


As of November 2025, GitHub offers free access to GitHub Copilot and other premium features to students through the GitHub Student Developer Pack.

Steps to Apply

  1. Go to education.github.com/pack
  2. Click on “Sign up for Student Developer Pack”
  3. Sign in with your GitHub account if not already logged in
  4. Fill out the application form:
    • Select your school from the dropdown (or enter it manually)
    • Provide your school-issued email address
    • Describe how you plan to use GitHub (e.g., “For coursework and research projects”)
  5. Upload proof of enrollment:
    • Student ID card
    • Official enrollment letter
    • Transcript or other academic document
  6. Submit your application
Callout

Application Processing Time

GitHub typically processes student applications within a few days, but it can take up to 2 weeks. You’ll receive an email notification once your application is approved.

Installing GitHub Copilot in RStudio


Once your GitHub Student Developer Pack is approved, you can set up GitHub Copilot in RStudio.

Prerequisites

  • RStudio version 2022.02 or later
  • GitHub account with Copilot access

Installation Steps

  1. Install the GitHub Copilot extension in RStudio:
    • Open RStudio
    • Go to Tools → Global Options
    • Select Copilot from the left sidebar
    • Click “Enable GitHub Copilot”
    • If the Copilot option is not available, make sure you’re running RStudio 2022.02 or later
  2. Sign in to GitHub:
    • Click “Sign in to GitHub” in the Copilot settings
    • A browser window will open asking you to authorize RStudio
    • Click “Authorize” to grant RStudio access to your GitHub account
    • You may be asked to enter a device code - copy the code shown in RStudio and paste it into the browser
  3. Verify the connection:
    • Return to RStudio
    • You should see a confirmation that GitHub Copilot is enabled
    • The status should show “GitHub Copilot: Active”
Challenge

Challenge 1: Test GitHub Copilot

Open a new R script in RStudio and try typing a comment describing a function, such as:

R

# Function to calculate the mean of a numeric vector

Does GitHub Copilot suggest a function implementation?

GitHub Copilot should suggest a function implementation below your comment. The suggestion might look something like:

R

# Function to calculate the mean of a numeric vector
calculate_mean <- function(x) {
  sum(x) / length(x)
}

You can accept the suggestion by pressing Tab or continue typing to see alternative suggestions.

Callout

Tips for Using Copilot

  • Write clear, descriptive comments to get better suggestions
  • Review all suggestions before accepting them
  • Use Tab to accept a suggestion, or Esc to dismiss it
  • Copilot learns from context, so well-structured code gets better suggestions

Configuring Copilot Settings


You can customize how GitHub Copilot works in RStudio:

  1. Go to Tools → Global Options → Copilot
  2. Adjust settings such as:
    • Enable/Disable Copilot: Toggle Copilot on or off
    • Suggestion mode: Configure how suggestions appear
    • Keybindings: Customize keyboard shortcuts for accepting suggestions
Callout

Recommendation

For this course, we recommend disabling “Index project files with GitHub Copilot”:

  • to speed up performance
  • to reduce network traffic
  • to keep private code private
  • to reduce unwanted suggestions from your existing codebase

Troubleshooting Common Issues


GitHub Education Program registration fails

Check the security settings of your browser!

  • GitHub (and thus the browser) wants to check your location to verify your student status
  • This check is sometimes blocked by strict privacy settings
  • Try a different browser, or temporarily adjust your privacy settings

Copilot is not showing suggestions

  • Verify that Copilot is enabled in settings
  • Check that you’re signed in to GitHub
  • Check your internet connection
  • Ensure your GitHub Copilot subscription is active
  • Try restarting RStudio

Authorization fails

  • Clear your browser cache and try again
  • Make sure you’re using the correct GitHub account
  • Check that your GitHub Copilot access is active

Nothing is working at all!

  • Give us a ring! We are happy to help you to get started.
Key Points
  • A GitHub account is required to use GitHub Copilot
  • Students can get free GitHub Copilot access through the GitHub Student Developer Pack
  • GitHub Copilot integrates directly into RStudio as an autocompletion tool
  • Test Copilot by writing descriptive comments and observing the suggestions
  • Copilot can be customized through RStudio’s settings

Content from Where Do We Need AI Support?


Last updated on 2026-01-09

Estimated time: 25 minutes

Overview

Questions

  • What types of coding tasks can benefit from AI assistance?
  • When should we use AI coding assistants and when shouldn’t we?
  • What are the limitations of AI-powered coding tools?

Objectives

  • Identify scenarios where AI coding assistants are most helpful
  • Understand the limitations and potential pitfalls of AI-generated code
  • Develop a balanced approach to using AI in coding workflows
  • Recognize when human expertise is essential

Introduction


AI-powered coding assistants like GitHub Copilot have revolutionized how we write code, but understanding when and how to use them effectively is crucial. This chapter explores the scenarios where AI support is most beneficial and where caution is needed.

Scenarios Where AI Assistants Excel


1. Boilerplate Code Generation

AI assistants are particularly effective at generating repetitive, standard code patterns:

  • Function templates and class structures
  • Data validation checks
  • Common data transformations
  • Standard file I/O operations
Challenge

Challenge 1: Identify Boilerplate Code

Think about your recent coding projects. List 2-3 examples of repetitive code patterns you frequently write that could benefit from AI assistance.

Examples might include:

  • Reading CSV files and performing basic data cleaning
  • Creating standard plotting functions with ggplot2
  • Writing function documentation in roxygen2 format
  • Setting up standard data frame transformations with dplyr
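The first example above, reading a CSV file and performing basic data cleaning, is exactly the kind of boilerplate an assistant can draft from a single descriptive comment. A minimal base-R sketch (the function name and the particular cleaning steps are illustrative choices, not prescribed by this lesson):

```r
# Read a CSV file and apply routine cleaning steps
load_clean_csv <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  df <- df[!duplicated(df), ]        # drop duplicate rows
  df <- df[complete.cases(df), ]     # drop rows with missing values
  names(df) <- tolower(names(df))    # normalize column names
  df
}
```

Code of this shape is repetitive and low-risk, which is why it is a good fit for AI assistance, as long as you still check that the cleaning steps match your data.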

2. Code Completion and Syntax

AI can help with:

  • Completing function arguments
  • Suggesting appropriate function names from loaded packages
  • Correcting syntax errors
  • Providing context-aware variable names

3. Documentation and Comments

Writing clear documentation is time-consuming but essential. AI can assist with:

  • Generating function documentation
  • Creating informative code comments
  • Writing README files
  • Producing example usage code

4. Learning New Libraries and Functions

When working with unfamiliar packages or functions:

  • AI can suggest appropriate functions for specific tasks
  • Explain function parameters, usage, or existing code
  • Provide example implementations
  • Offer alternative approaches

Scenarios Requiring Caution


1. Complex Logic and Algorithms

AI assistants may struggle with:

  • Domain-specific algorithms
  • Complex statistical methods
  • Unique business logic
  • Performance-critical code
Callout

Critical Review Required

Always carefully review AI-generated code for:

  • Logical correctness
  • Edge case handling
  • Performance implications
  • Security vulnerabilities

2. Project-Specific Requirements

AI may not understand:

  • Specific coding standards in your project
  • Custom architectural patterns
  • Project-specific constraints
  • Team conventions and best practices

3. Data Privacy and Security

Be cautious when:

  • Working with sensitive or proprietary data
  • Implementing security-critical features
  • Handling authentication and authorization
  • Processing personal information
Callout

Data Privacy Considerations

Remember that code sent to AI services may be used for training.

Never include:

  • API keys or passwords
  • Proprietary algorithms/code
  • Sensitive data
  • Personal information

Limitations of AI Coding Assistants


Understanding Context

  • AI may miss broader project context
  • Limited understanding of project history and evolution
  • Cannot always infer implicit requirements

Code Quality

  • Generated code may not follow best practices
  • Potential for introducing subtle bugs
  • May suggest outdated or deprecated approaches

Dependency on Training Data

  • Biased toward common patterns seen in training data
  • May not know about very recent updates or libraries
  • Could suggest obsolete methods
Challenge

Challenge 2: Evaluate AI Suggestions

Consider this scenario: You ask an AI assistant to generate code for a statistical analysis. What questions should you ask yourself before accepting the suggestion?

Before accepting AI-generated code, ask:

  1. Does this code produce valid results?
  2. Are the assumptions appropriate for my data?
  3. Does it handle missing data correctly?
  4. Are there edge cases that aren’t covered?
  5. Is this the most efficient approach?
  6. Does it align with best practices in my field?

Keep asking/investigating until you can confidently answer “yes” to all questions!

Developing a Balanced Approach


Best Practices for Using AI Assistants

  1. Start with a clear goal: Know what you want to achieve before asking for AI help
    • It is best to decompose complex tasks into smaller, manageable parts that can be easily checked
  2. Review and understand: Never accept suggestions blindly
    • Request explanations for AI-generated code until you understand it fully
  3. Test thoroughly: Validate all AI-generated code
    • Think about “What can go wrong?” (edge cases) and design tests accordingly
  4. Iterate and refine: Use AI suggestions as a starting point, not the final solution
    • Often AI helps with code fragments. Do/request a final revision to merge them into a coherent whole
  5. Maintain ownership: You are responsible for the code in your project
    • Ensure it meets your project’s standards and requirements

When to Rely on Human Expertise

Prioritize human judgment for:

  • Critical decision-making about architecture and design
  • Code review and quality assessment
  • Understanding domain-specific requirements
  • Ethical considerations in code implementation
  • Debugging complex issues

How AI Assistance Can Help You


Typically, one distinguishes between three ways of AI usage:

Assistively
  AI suggests code snippets, completes lines, or generates boilerplate code automatically. This typically happens without the need to explicitly ask for it.

Directively
  AI generates larger code blocks or entire functions based on explicit prompts. Here, you explicitly ask the AI to perform a specific task.

Autonomously
  AI independently creates code with minimal human input based on high-level requirements. You provide high-level requirements or a description of a larger task, and the AI generates the code accordingly.

Within this course, we will investigate all three ways of AI usage.

What is Supported by AI?


Besides how AI is used, one can also distinguish which process is supported by AI:

  • Coding Support
    • AI assists with writing and revising your code
    • i.e. AI is used as a tool within your coding workflow
    • this can be done as a code assistant or as a code generator
    • subtasks include, for example:
      • syntax support: AI helps with code syntax, such as completing function calls, correcting errors, or suggesting variable names.
      • logic support: AI assists with the logical structure of the code, such as suggesting algorithms, data structures, or control flow.
      • documentation support: AI helps with writing comments, documentation, or explanations for the code.
      • debugging support: AI assists in identifying and fixing bugs, suggesting test cases, or improving code quality.
      • optimization support: AI helps improve code performance, suggesting optimizations, refactoring, or resource management techniques.
  • Data Processing Support: AI assists in cleaning, transforming, and preparing data for analysis.
    • i.e. AI is used as a tool within data processing pipelines.
    • e.g. we will use the ellmer R package later in this course for AI-supported data generation.
  • Project Management Support
    • AI aids in organizing tasks, tracking progress, and managing timelines.
    • i.e. AI is used as a tool within project management software
    • e.g. GitHub Copilot Codespaces, GitHub Issues AI, etc.
  • Collaboration Support
    • AI facilitates teamwork by suggesting code reviews, merging changes, or managing version control.
    • e.g. GitHub Copilot Chat, GitHub Pull Requests AI, etc.
  • Learning Support
    • AI provides educational assistance, such as explaining concepts, suggesting resources, or guiding through coding challenges.
    • typically using a chatbot interface
    • e.g. ChatGPT, GitHub Copilot Chat, etc.
  • … something missing?! Let us know!
Discussion

Preparation of Group Discussion

In small groups, we want to discuss:

  • Have you used AI coding assistants before? What was your experience?
  • Can you share an example where AI helped you solve a problem?
  • Can you share an example where AI suggestions were incorrect or unhelpful?
  • How do you decide when to use AI assistance versus figuring things out yourself?

Please think about these questions before our upcoming group discussion.

The Future of AI in Coding


AI assistants are rapidly evolving. As they improve, we should:

  • Stay informed about new capabilities and limitations
  • Continuously refine our approach to using these tools
  • Share knowledge about effective practices
  • Contribute to discussions about responsible AI use
Key Points
  • AI assistants excel at generating boilerplate code, completing syntax, and helping with documentation
  • Use caution with complex logic, project-specific requirements, and security-critical code
  • Always review and test AI-generated code thoroughly
  • AI assistants are tools to augment, not replace, human expertise
  • Develop a balanced approach that leverages AI strengths while maintaining code quality and security
  • Stay open to learning and adapting as AI technologies evolve

Content from Risks, Drawbacks and Responsibilities with AI Usage


Last updated on 2026-01-19

Estimated time: 45 minutes

Overview

Questions

  • What can go wrong when using AI coding assistants?
  • What are my responsibilities as a developer when using AI?
  • When is AI usage inappropriate or a no-go?
  • What are the side effects of heavy AI reliance?
  • Why do LLMs produce incorrect code and how can I avoid it?

Objectives

  • Understand potential risks and pitfalls of AI-assisted coding
  • Recognize your responsibilities as a developer using AI tools
  • Identify scenarios where AI usage is inappropriate
  • Learn strategies to mitigate risks and verify AI-generated code
  • Develop awareness of long-term effects of AI dependence
  • Apply best practices for responsible AI usage in coding

Introduction


While AI coding assistants like GitHub Copilot can significantly enhance productivity, they come with important risks, drawbacks, and responsibilities. Understanding these challenges is crucial for using AI tools effectively and ethically. This lesson explores what can go wrong, your responsibilities as a user, and how to navigate the complex landscape of AI-assisted development.

Callout

The Double-Edged Sword

AI coding assistants are powerful tools that can accelerate development, but they require careful use. Just as you wouldn’t use a power tool without understanding safety precautions, you shouldn’t use AI coding assistants without understanding their limitations and risks.

What Can Go Wrong?


1. Incorrect or Buggy Code

AI models can generate code that appears correct but contains subtle bugs:

Example:

R

# AI might suggest
remove_last <- function(data) {
  data[1:length(data) - 1]  # Bug: precedence makes the index (1:length(data)) - 1, i.e. 0:(length(data) - 1)
}

# Correct version
remove_last <- function(data) {
  data[1:(length(data) - 1)]  # Proper parentheses to ensure correct indexing
}
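The root cause is R's operator precedence: `:` binds more tightly than `-`, so the parentheses matter. You can verify this directly in the console:

```r
n <- 5
1:n - 1    # evaluates as (1:n) - 1, giving 0 1 2 3 4
1:(n - 1)  # the intended index range: 1 2 3 4
```

Because the unparenthesized version includes 0 in the index vector, the code silently relies on R dropping index 0 rather than on the logic being correct, which is exactly the kind of subtle issue AI suggestions can introduce.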

Why this happens:

  • AI learns from patterns in training data, including buggy code
  • AI doesn’t execute or test the code it generates
  • Subtle logic errors are harder for pattern-matching to detect

2. Security Vulnerabilities

AI may suggest code with security flaws:

R

# INSECURE: AI might suggest
execute_query <- function(user_input) {
  query <- paste0("SELECT * FROM users WHERE name = '", user_input, "'")
  dbGetQuery(conn, query)  # SQL injection vulnerability!
}

# SECURE: Use parameterized queries
execute_query <- function(user_input) {
  query <- "SELECT * FROM users WHERE name = ?"
  dbGetQuery(conn, query, params = list(user_input))
}

Security risks include:

  • SQL injection vulnerabilities (malicious manipulation of database queries)
  • Cross-site scripting (XSS) in web applications
  • Hardcoded credentials or API keys
  • Unsafe file operations
  • Improper input validation
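Regarding hardcoded credentials, a safer habit is to read secrets from environment variables rather than writing them into source code. A small sketch (the variable name `MY_API_KEY` is a hypothetical example):

```r
# SAFER: fetch a secret from the environment at run time instead of
# hardcoding it in the script (which AI may suggest, echoing training data)
get_api_key <- function(var = "MY_API_KEY") {
  key <- Sys.getenv(var)           # returns "" if the variable is unset
  if (identical(key, "")) {
    stop("Please set the ", var, " environment variable")
  }
  key
}
```

This keeps the secret out of version control and out of any prompt you might send to an AI service.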

3. Inefficient or Non-Performant Code

AI may suggest code that works but performs poorly:

R

# INEFFICIENT: AI might suggest
calculate_means <- function(data) {
  results <- list()
  for(i in 1:ncol(data)) {
    results[[i]] <- mean(data[, i])
  }
  return(unlist(results))
}

# EFFICIENT: Vectorized approach
calculate_means <- function(data) {
  colMeans(data)
}
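A quick way to convince yourself that the two versions agree, and to see the performance gap, is a check along these lines (using the built-in `mtcars` data set and a random matrix as stand-ins for real data):

```r
# The loop version from above, for comparison
loop_means <- function(data) {
  results <- list()
  for (i in 1:ncol(data)) {
    results[[i]] <- mean(data[, i])
  }
  unlist(results)
}

# Both agree on a built-in data set (colMeans adds names, so drop them)
stopifnot(all.equal(loop_means(mtcars), unname(colMeans(mtcars))))

# Timing on a larger input shows the gap
big <- as.data.frame(matrix(rnorm(1e6), ncol = 100))
system.time(loop_means(big))  # one mean() call per column
system.time(colMeans(big))    # a single vectorized call
```

Checking equivalence first, then performance, is a useful routine whenever you replace an AI-suggested loop with a vectorized alternative.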

4. Licensing and Copyright Issues

AI models are trained on public code repositories, which may include code under restrictive licenses such as the GPL.

Risks:

  • Inadvertently incorporating GPL code into proprietary projects
    • Note: GPL requires derivative works to also be GPL
  • Copyright infringement claims
  • License compliance violations

Your Responsibilities as an AI User


When using AI coding assistants, it is your responsibility to ensure code quality, security, and legality. Remember that AI is a tool to assist you, not a substitute for your expertise and judgment. Treat AI suggestions with the same scrutiny as code from any other source.

With AI usage, your role changes from sole author to supervisor and validator of AI-generated code!

1. Code Ownership and Accountability

You are ultimately responsible for all code in your project, regardless of whether it was written by you or suggested by AI.

This means:

  • You must understand every line of code you commit
  • You are accountable for bugs, security issues, and performance problems
  • You cannot blame the AI if something goes wrong
  • You must be able to explain and defend your code choices

2. Verification and Testing

Especially when it comes to the generation of complex functions or algorithms, never blindly accept AI suggestions. Always verify through testing. This should be standard practice anyway, but is especially critical with AI-generated code.

R

# Example verification workflow (requires the testthat package;
# the expected_* values are placeholders to fill in)
library(testthat)
ai_suggested_function <- function(data) {
  # [AI-generated code here]
}

# REQUIRED: Create comprehensive tests
test_that("ai_suggested_function works correctly", {
  # Test normal cases
  expect_equal(ai_suggested_function(c(1, 2, 3)), expected_result)
  
  # Test edge cases
  expect_error(ai_suggested_function(NULL))
  expect_equal(ai_suggested_function(c()), expected_empty_result)
  expect_equal(ai_suggested_function(c(NA, 1, 2)), expected_with_na)
  
  # Test performance for large inputs
  large_data <- rep(1:1000, 1000)
  expect_lt(system.time(ai_suggested_function(large_data))[3], 1.0)
})

Note that creating test code is another area where AI can assist you very efficiently, but you must still verify the tests themselves and check whether they cover all relevant cases.

3. Security Awareness

You must:

  • Review all AI-generated code for security vulnerabilities
  • Never include sensitive data in prompts to AI tools
  • Understand that code sent to cloud-based AI services may be logged or used for subsequent training
  • Follow security best practices even when AI suggests otherwise
Callout

Data Privacy Alert

When using cloud-based AI assistants:

  • Your code snippets are in most cases sent to external servers
  • Avoid including passwords, API keys, or sensitive data
  • Check your organization’s policies on AI tool usage
  • Consider using local AI models for sensitive projects

4. Continuous Learning

Don’t let AI replace your learning:

  • Use AI as a learning tool, not a crutch
  • Understand why suggested solutions work
  • Research functions and techniques you don’t recognize
  • Build your own expertise alongside AI assistance

5. Attribution and Transparency

Be transparent about AI usage:

  • Document when AI significantly contributed to code
  • Follow your organization’s policies on AI disclosure
  • Consider adding comments noting AI-assisted sections
  • Be honest in academic and professional contexts

When AI Usage Is a No-Go


The ease of using AI coding assistants can tempt developers to use them inappropriately. Especially if you are new to programming or to a specific domain, it can be hard to judge when AI usage is acceptable and when it is not.

Thus, beginners should use AI tools as a personal learning assistant to get hints and explanations, but avoid using them to generate complete solutions for critical tasks you cannot yet judge properly yourself. That way, you can build up your own expertise boosted by AI help without becoming overly dependent on it.

Once you have mastered the basics and can read and understand code well, you can start using AI more freely to boost your productivity and understanding. Still, always be cautious when using AI for critical tasks beyond your expertise.

[Image: beginner vs. advanced user. The image depicts the different tasks when solving problems yourself or with the help of an AI assistant.]

1. Specialized or Novel Algorithms

Be cautious with:

  • Cutting-edge research implementations
  • Domain-specific algorithms not well-represented online
  • Novel statistical methods
  • Proprietary business logic

Why: AI training data may not include correct implementations of specialized techniques.

2. Learning Fundamentals

Don’t use AI when:

  • You are just getting familiar with a new programming language
  • You are unwilling to spend time understanding basic concepts and seeking further explanations
  • You are completing academic assignments (unless explicitly allowed)
  • You are building foundational skills

Why: You need to develop core competencies yourself.

Side Effects of Heavy AI Reliance


1. Skill Atrophy

Long-term risks:

  • Reduced ability to write code from scratch
  • Weakened problem-solving skills
  • Decreased understanding of fundamentals
  • Dependency on AI availability

Example:

A developer who always uses AI for basic tasks may struggle when:

  • Working offline
  • Debugging complex issues
  • Interviewing for new positions
  • Mentoring junior developers

2. Reduced Code Understanding

Consequences:

  • Difficulty maintaining code you didn’t write or understand
  • Inability to debug when issues arise
  • Challenges explaining code to colleagues
  • Problems adapting code to changing requirements

3. Homogenization of Code

Problems:

  • Repetition of common mistakes or old patterns/approaches
  • Loss of creative problem-solving approaches
  • “Cargo cult” programming (copying without understanding)

4. False Confidence

Dangers:

  • Overestimating code quality because it “looks professional”
  • Underestimating testing needs
  • Reduced code review rigor
  • Faster development at the cost of quality
Callout

Maintaining Skills While Using AI

Balance is key:

  • Set aside time for coding without AI assistance
  • Practice fundamental skills regularly
  • Review and understand all AI-generated code
  • Explain solutions to others to test your understanding

Why LLMs Produce Wrong Code


1. Pattern Matching, Not Understanding

LLMs don’t “understand” code:

  • They predict likely token sequences based on training data
  • They don’t execute code mentally to verify correctness
  • They don’t reason about edge cases
  • They replicate patterns even when inappropriate

Example:

R

# AI might suggest this pattern because it's common
calculate_average <- function(x) {
  sum(x) / length(x)  # Fails with NAs!
}

# But context might require
calculate_average <- function(x) {
  mean(x, na.rm = TRUE)  # Handles NAs correctly
}
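A quick console check makes the difference concrete:

```r
x <- c(1, 2, NA)
sum(x) / length(x)     # NA: the missing value propagates, and length() still counts it
mean(x, na.rm = TRUE)  # 1.5: the NA is dropped before averaging
```

Because both versions look plausible, only your knowledge of the data (does it contain NAs, and should they be dropped?) tells you which one is correct in context.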

2. Training Data Limitations

Issues with training data:

  • Contains buggy code from public repositories
  • May be outdated (not current best practices)
  • Overrepresents certain languages and frameworks
  • May include deprecated or insecure patterns

3. Context Window Limitations

LLMs have limited context:

  • Can’t see entire project structure
  • Miss important constraints from other files
  • Don’t know your specific requirements
  • Lack domain-specific knowledge

4. No Execution or Testing

Critical limitation:

  • AI generates code but doesn’t run it
  • No feedback loop from actual execution
  • Can’t verify correctness through testing
  • Doesn’t catch runtime errors
Challenge

Challenge: Prompt Engineering for Better Code

Try improving AI-generated code quality by refining your prompts. Compare results:

Poor prompt:

Write a function to process data

Better prompt:

Write an R function that filters a data frame to include only rows where the ‘value’ column is positive and non-NA. Include input validation, error handling, and roxygen2 documentation.

Try both prompts with your AI assistant. How do the results differ?

Poor prompt typically produces:

  • Generic, vague code
  • No error handling
  • No documentation
  • Assumes undocumented column names
  • Doesn’t handle edge cases

Better prompt typically produces:

  • More specific, targeted code
  • Input validation
  • Documentation
  • Explicit handling of requirements
  • Better edge case coverage

Key lessons:

  1. Specific prompts → better results
  2. State requirements explicitly
  3. Request error handling and documentation
  4. Mention edge cases you care about
  5. Specify coding standards or style

But remember: Even with great prompts, always verify the output!
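For illustration, the better prompt might produce something like the following sketch (the function name and the specific checks are hypothetical; any real output would still need review and testing):

```r
#' Filter rows with positive, non-missing values
#'
#' @param df A data frame containing a numeric `value` column.
#' @return The rows of `df` where `value` is positive and not NA.
filter_positive_values <- function(df) {
  # Input validation, as requested in the prompt
  if (!is.data.frame(df)) stop("`df` must be a data frame")
  if (!"value" %in% names(df)) stop("`df` must contain a `value` column")
  if (!is.numeric(df$value)) stop("The `value` column must be numeric")

  # Keep only rows where `value` is positive and not NA
  df[!is.na(df$value) & df$value > 0, , drop = FALSE]
}
```

Note how the explicit requirements in the prompt (validation, NA handling, documentation) show up directly in the generated structure.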

How to Avoid AI-Generated Errors


1. Adopt a Verification Mindset

Question everything:

R

# When AI suggests code, ask yourself:
# - Do I understand what this does?
# - What are the assumptions?
# - What could go wrong?
# - Are there edge cases?
# - Is this efficient?
# - Is this secure?
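One habit that answers several of these questions at once: run the suggestion on inputs where you already know the answer, then probe an edge case. For example, with a hypothetical AI-suggested standardization helper:

```r
# Hypothetical AI-suggested helper: scale to mean 0 and sd 1
standardize <- function(x) (x - mean(x)) / sd(x)

# Check against known answers
z <- standardize(c(2, 4, 6))
stopifnot(all.equal(mean(z), 0))  # standardized values have mean 0
stopifnot(all.equal(sd(z), 1))    # and standard deviation 1

# Probe an edge case: a constant vector makes sd() zero
standardize(c(3, 3, 3))  # NaN values, which the function should arguably handle
```

The known-answer checks build confidence; the edge-case probe reveals the assumption (non-constant input) that the suggestion never stated.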

2. Prompt Engineering

Improve AI suggestions by providing richer initial prompts:

  • Describe the general requirements (what packages to use, coding style, version constraints, etc.)
  • Give details about the data structures involved
  • Describe the coding task in detail
  • Define critical edge cases (or request handling of edge cases)
  • Request documentation and comments for clarity and subsequent maintenance

In some AI systems, general requirements can be provided as system prompts or initial context to guide all subsequent suggestions.

3. Code Review Process

Always review AI-generated code for:

  • Correctness of logic
  • Error handling
  • Input validation
  • Performance implications
  • Security vulnerabilities
  • Code style and readability
  • Documentation quality

4. Iterative Refinement

Don’t accept the first suggestion:

R

# Round 1: AI suggests basic solution
# Round 2: Request error handling
# Round 3: Request performance optimization
# Round 4: Request documentation
# Final: Human review and testing

Best Practices for Responsible AI Usage


1. The 80/20 Rule

Use AI for the 80% of straightforward, repetitive tasks. Reserve human expertise for the critical 20% involving:

  • Complex algorithms
  • Security-sensitive code
  • Performance optimization
  • Architectural decisions
  • Domain-specific logic

2. Treat AI as a Junior Developer

Think of AI as a smart but inexperienced assistant:

  • It needs clear instructions
  • It requires supervision
  • Its work must be reviewed
  • It shouldn’t handle critical tasks alone
  • It can help with routine work

3. Document AI Usage

R

# Consider adding comments for transparency
#' Calculate weighted average
#'
#' This function was initially drafted with AI assistance
#' and subsequently reviewed, tested, and validated.
#'
#' @param values Numeric vector of values
#' @param weights Numeric vector of weights
#' @return Weighted average
calculate_weighted_avg <- function(values, weights) {
  # Implementation here
}

4. Continuous Skill Development

Maintain your skills:

  • Practice coding without AI regularly
  • Study algorithms and data structures
  • Read high-quality code from experts
  • Contribute to code reviews
  • Teach others what you learn

5. Stay Informed

AI capabilities and limitations change rapidly:

  • Follow updates to tools you use
  • Learn about new AI capabilities
  • Understand evolving best practices
  • Participate in communities discussing AI use
Discussion

Group Discussion

Within the session we want to discuss some of the following questions:

  1. What are your “tricks” to check for and deal with wrong answers of the AI?

  2. How do you know/decide whether AI usage is allowed/ok and when not?

  3. Are policies (of organizations, employers, …) a sufficient guide regarding AI use?

  4. Honestly, do you think AI usage helps to increase your skills or is a tempting way to reduce your learning/understanding efforts? Or do you have examples for both? If so, what makes the difference?

So think about them and make some notes of your thoughts and ideas!


Legal and Copyright Considerations

Key concerns:

  • AI-generated code may resemble copyrighted code
  • Unclear legal status of AI-generated content
  • License compatibility issues
  • Potential copyright infringement

Best practices:

  • Understand your organization’s AI usage policies
  • Check licenses of suggested dependencies
  • Document AI usage for legal compliance
  • Consult legal counsel for commercial projects

Privacy and Data Protection

Considerations:

  • Code sent to AI services may be stored
  • Proprietary algorithms might be leaked
  • Sensitive data in code could be exposed
  • Compliance with GDPR, HIPAA, etc.

Mitigation:

  • Use local AI models for sensitive code
  • Anonymize data in examples sent to AI
  • Review AI service terms of service
  • Follow organizational data policies

Professional Ethics

Ethical obligations:

  • Honesty about AI use (academic, professional)
  • Not claiming AI-generated work as entirely your own
  • Ensuring code quality and safety
  • Considering societal impact of your code

Conclusion


AI coding assistants are powerful tools that can enhance productivity, but they come with significant responsibilities. Success requires:

  1. Awareness of what can go wrong
  2. Responsibility for all code you commit
  3. Judgment about when AI is appropriate
  4. Verification of all AI suggestions
  5. Balance between AI assistance and skill development
  6. Ethics in how you use and disclose AI usage
Callout

Remember

AI is a tool, not a replacement for human expertise, judgment, and responsibility. Use it wisely, verify everything, and never stop learning.

Education is what remains when you have no tool at hand…
Key Points
  • AI can generate incorrect, insecure, or inefficient code - always verify
  • You are fully responsible for all code you commit, regardless of AI involvement
  • AI usage is inappropriate for critical systems, novel algorithms, and learning fundamentals
  • Heavy AI reliance can lead to skill atrophy and reduced code understanding
  • LLMs produce wrong code because they pattern-match rather than understand
  • Comprehensive testing, code review, and iterative refinement are essential
  • Balance AI productivity gains with maintaining your programming skills
  • Consider legal, ethical, and privacy implications of AI tool usage
  • Treat AI as a junior assistant requiring supervision, not an expert to trust blindly
  • Responsible AI usage requires continuous learning and critical thinking

Content from RStudio Autocompletion with Copilot


Last updated on 2026-01-09 | Edit this page

Estimated time: 40 minutes

Overview

Questions

  • How does GitHub Copilot work as an autocompletion tool in RStudio?
  • What are the best practices for getting useful suggestions from Copilot?
  • How can I accept, reject, or modify Copilot suggestions?

Objectives

  • Understand how GitHub Copilot generates code suggestions
  • Learn techniques for writing effective prompts
  • Practice accepting, rejecting, and modifying suggestions
  • Develop efficient workflows using Copilot in RStudio

Using an AI assistively


GitHub Copilot functions as an advanced autocompletion tool in RStudio, going beyond simple syntax completion to suggest entire lines or blocks of code. This chapter covers how to use Copilot effectively as an autocompletion assistant.

How Copilot Works


GitHub Copilot uses machine learning models trained on billions of lines of public code to:

  • Analyze the context of your current code
  • Understand comments and function names
  • Predict what you’re trying to accomplish
  • Generate relevant code suggestions
Callout

Context is Key

Copilot examines:

  • Your current file’s code
    • Comments you’ve written
    • Variable and function names
    • The structure of your code
  • Files from your RStudio project (depending on your settings)

The more context you provide, the better the suggestions!

Writing Effective Comments for Better Suggestions


Comment-Driven Development

One of the most effective ways to use Copilot is to write descriptive comments first:

Good Example:

R

# Load data from CSV file 'dat.csv', remove rows with missing values, and convert date column to Date type

Less Effective:

R

# Load data from 'dat.csv'
Challenge

Challenge 1: Practice Writing Effective Comments

Write a detailed comment describing what you want the code to do for the following scenario:

You need to create a function that takes a data frame of student grades, calculates the average grade for each student, and returns only students with an average above 70.

R

# Function to calculate average grades per student and filter for high performers
# Input: data frame with columns 'student_name' and 'grade'
# Output: data frame with columns 'student_name' and 'avg_grade' for students with avg > 70
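Given a comment block like this, Copilot might complete it with a dplyr-based function along the following lines (illustrative only; the function name is an assumption and actual suggestions vary):

```r
library(dplyr)

# Function to calculate average grades per student and filter for high performers
filter_high_performers <- function(grades) {
  grades %>%
    group_by(student_name) %>%
    summarize(avg_grade = mean(grade), .groups = "drop") %>%
    filter(avg_grade > 70)
}
```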

Accepting and Managing Suggestions


Keyboard Shortcuts

  • Tab: Accept the entire suggestion
  • Esc: Dismiss the current suggestion
  • Ctrl + Shift + L or Cmd + Shift + L: Request a new suggestion

Partial Acceptance

You can accept suggestions word-by-word:

  • Ctrl + → or Cmd + →: Accept next word
  • This allows you to use parts of a suggestion while continuing to type
Callout

Ghost Text

Copilot suggestions appear as gray “ghost text” in your editor. This makes it easy to see suggestions without disrupting your coding flow.

Techniques for Getting Better Suggestions


1. Provide Clear Function Signatures

Start with a clear function definition:

R

# Copilot works better when you define function structure first
calculate_summary_stats <- function(data, group_var) {
  # Calculate mean, median, and sd for each group
  
}
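For this signature, Copilot might fill in a body such as the one below. This is a sketch, not a guaranteed suggestion; the use of `across()` with tidy evaluation via `{{ }}` is one plausible completion:

```r
library(dplyr)

calculate_summary_stats <- function(data, group_var) {
  # Calculate mean, median, and sd for each group
  data %>%
    group_by({{ group_var }}) %>%
    summarize(
      across(where(is.numeric), list(mean = mean, median = median, sd = sd)),
      .groups = "drop"
    )
}
```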

2. Use Meaningful Variable Names

R

# Better - descriptive names help Copilot understand context
student_grades_df <- read.csv("grades.csv")

# Less helpful
df <- read.csv("grades.csv")

3. Break Down Complex Tasks

Instead of asking for everything at once:

R

# Step 1: Load and clean data
# Load CSV file with student information

# Step 2: Calculate metrics
# Calculate average grade per student

# Step 3: Filter results
# Keep only students with average above threshold
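Filled in step by step, the resulting script might look like this (a sketch; the file name students.csv, the column names, and the threshold of 70 are assumptions):

```r
library(dplyr)

# Step 1: Load and clean data
# Load CSV file with student information
students <- read.csv("students.csv") %>%
  filter(!is.na(grade))

# Step 2: Calculate metrics
# Calculate average grade per student
avg_grades <- students %>%
  group_by(student_name) %>%
  summarize(avg_grade = mean(grade), .groups = "drop")

# Step 3: Filter results
# Keep only students with average above threshold
high_performers <- avg_grades %>%
  filter(avg_grade > 70)
```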
Challenge

Challenge 2: Experiment with Context

Try generating code for the same task with different levels of context:

  1. Just type: read.csv(
  2. Add a comment first: # Load student data from grades.csv then read.csv(
  3. Add more context:

R

# Data has columns: student_id, name, grade, date
# Load student data from grades.csv with explicit column types

then read.csv(

You should notice that:

  • With minimal context, Copilot might just complete the parentheses
  • With a comment, it might suggest the filename
  • With detailed context, it might suggest the filename AND additional parameters like stringsAsFactors = FALSE or header = TRUE

Working with Different Types of Suggestions


Single-Line Completions

Best for:

  • Completing function calls
  • Finishing variable assignments
  • Adding package imports

Example:

R

library(tidyverse)  # after typing "library(", Copilot suggested: tidyverse)

Multi-Line Suggestions

Best for:

  • Function implementations
  • Code blocks (if/else, loops)
  • Multiple related operations

Example:

R

# Function to plot distribution with ggplot2
plot_distribution <- function(data, column) {
  # Copilot may suggest entire function body
}

Questions

It is also possible to ask Copilot questions in comments:

  • write a comment starting with # q: followed by your question

For instance

R

# q: What is meant with facetting in ggplot2?

produced

R

# A: Facetting in ggplot2 refers to the process of creating multiple subplots (facets) within a single plot, based on the values of one or more categorical variables. 
# This allows for easy comparison of distributions or relationships across different subsets of the data.
Callout

Why is the suggestion incomplete?

Communication between RStudio and the AI service happens in chunks, and the number of ‘tokens’ (words/pieces of words) that can be sent at once is limited. The same holds for the response from the AI service. If your code or comment is very long, or if the AI service generates a long suggestion, it may get cut off.

In that case, it is typically sufficient to accept what has been suggested so far and then request a new suggestion (just wait) to continue from there.

Best Practices for Efficient Workflow


1. Review Before Accepting

Always read the suggestion before pressing the Tab key:

  • Check for logical errors
  • Verify it matches your intent
  • Look for security issues

2. Iterate and Refine

  • Accept a suggestion as a starting point
  • Modify it to fit your specific needs
  • Add error handling and edge cases

3. Combine with Traditional Coding

  • Use Copilot for repetitive tasks
  • Code critical logic yourself
  • Let Copilot help with boilerplate
Callout

Maintain Your Coding Skills

While Copilot is helpful, continue to:

  • Understand the code you’re using
  • Practice writing code without assistance
  • Learn from the suggestions Copilot provides

Common Patterns and Use Cases


In the following, we highlight some common coding patterns where Copilot can be particularly useful. Therein, [...] indicates your cursor position where Copilot will provide suggestions.

Data Manipulation with dplyr

R

# Copilot excels at suggesting dplyr pipelines
# Filter data for specific conditions and group by category
data %>%
[...]

Copilot suggests the rest of the pipeline.

Note: comment-triggered completions can also be used at the end of, or within, a pipeline to add the next step or an intermediate transformation.

Creating Plots with ggplot2

R

ggplot(data, aes(x = height, y = weight)) +
# Create a scatter plot with regression line
[...]

Copilot will suggest appropriate geoms and themes.
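For the pipeline above, an illustrative completion might look like the following (the example data standing in for your own is hypothetical; actual suggestions vary):

```r
library(ggplot2)

# Hypothetical example data with height and weight columns
data <- data.frame(height = c(150, 160, 170, 180, 190),
                   weight = c(55, 62, 70, 78, 85))

ggplot(data, aes(x = height, y = weight)) +
  # Create a scatter plot with regression line
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```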

Writing Functions

R

# Function to validate email addresses
validate_email <- function(email) {
[...]
}

Copilot will suggest the function body, e.g., using regex patterns and validation logic.
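One plausible body is sketched below. This is a deliberately simple regex check, not a full RFC-compliant validator, and Copilot's actual suggestion may differ:

```r
# Function to validate email addresses
validate_email <- function(email) {
  # Simplistic pattern: local part, "@", domain with at least one dot
  grepl("^[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}$", email)
}

validate_email(c("alice@example.com", "not-an-email"))  # TRUE FALSE
```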

Challenge

Challenge 3: Build a Function with Copilot

Use Copilot to help you create a function that:

  1. Takes a numeric vector as input
  2. Removes outliers (values > 3 standard deviations from mean)
  3. Returns the cleaned vector

Start with a descriptive comment and function signature.

R

# Function to remove outliers from a numeric vector
# Outliers are defined as values more than 3 SD from the mean
# Input: numeric vector
# Output: numeric vector with outliers removed
remove_outliers <- function(x, sd_threshold = 3) {
  # Let Copilot suggest the implementation
  # It might suggest something like:
  mean_x <- mean(x, na.rm = TRUE)
  sd_x <- sd(x, na.rm = TRUE)
  x[abs(x - mean_x) <= sd_threshold * sd_x]
}

Remember to test the function with sample data!
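A quick sanity check could look like this (the function is restated so the example is self-contained):

```r
remove_outliers <- function(x, sd_threshold = 3) {
  mean_x <- mean(x, na.rm = TRUE)
  sd_x <- sd(x, na.rm = TRUE)
  x[abs(x - mean_x) <= sd_threshold * sd_x]
}

# A single extreme value lies more than 3 SD from the mean and is dropped
x <- c(rep(10, 20), 1000)
cleaned <- remove_outliers(x)
length(cleaned)  # 20: only the value 1000 was removed
```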

Troubleshooting Suggestions


Copilot Suggests Incorrect Code

  • Provide more specific comments
  • Add type hints or example data structures
  • Break down the task into smaller steps

No Suggestions Appear

  • Check that Copilot is enabled
  • Verify your internet connection
  • Provide more context with comments
  • Wait a moment - suggestions can take a second to generate

Suggestions Don’t Match Your Intent

  • Rewrite your comment more specifically
  • Add examples of input/output
  • Specify the packages you want to use

Advanced Tips


Specifying Packages

R

# Using dplyr and tidyr to reshape data

Requesting Specific Approaches

R

# Using base R (not tidyverse) to calculate mean by group
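With that comment, Copilot might complete the line using base R aggregation, for instance (illustrative; actual suggestions vary):

```r
# Using base R (not tidyverse) to calculate mean by group
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
group_means
```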

Setting Constraints

R

# Function must handle NA values and return informative error messages
Key Points
  • Copilot generates suggestions based on context from your code and comments
  • Write clear, descriptive comments to get better suggestions
  • Use Tab to accept, and Esc to dismiss suggestions
  • Break complex tasks into smaller steps for more accurate suggestions
  • Always review and test AI-generated code before using it
  • Combine Copilot assistance with your own coding expertise for best results

Content from Context Definition and the AGENTS.md Concept


Last updated on 2026-01-02 | Edit this page

Estimated time: 35 minutes

Overview

Questions

  • Why is context important for AI coding assistants?
  • What is the AGENTS.md concept?
  • How can I define coding standards for AI agents in my R project?
  • What are other ways to provide context to AI assistants?

Objectives

  • Understand the importance of context definitions for AI agents
  • Learn about the AGENTS.md specification
  • Create an AGENTS.md file with R project-specific guidelines
  • Explore alternative methods for providing context to AI assistants

Introduction


As AI coding assistants become more integrated into our development workflows, it’s crucial to provide them with appropriate context about project-specific conventions, coding standards, and preferences. Without proper context, AI assistants may generate code that, while functionally correct, doesn’t align with your project’s style or best practices.

This chapter explores how to define context for AI agents, focusing on the AGENTS.md concept and other strategies for communicating your project’s coding standards to AI assistants.

Callout

Context is Key

Just as human developers need onboarding documentation to understand a project’s conventions, AI assistants benefit from explicit context definitions. Clear guidelines help AI tools generate code that fits seamlessly into your existing codebase.

The Need for General Context Definitions


Why Context Matters

When working with AI coding assistants, providing clear context helps ensure:

  • Consistency: Code generated by AI matches your project’s existing style and patterns
  • Best Practices: AI follows domain-specific conventions (e.g., using tidyverse packages in R)
  • Efficiency: Less time spent revising AI-generated code to match your standards
  • Maintainability: Generated code is easier for team members to understand and maintain
  • Learning: AI can help reinforce good coding practices by consistently applying them

Types of Context

Context for AI assistants can include:

  • Language-specific conventions: Which libraries or frameworks to prefer
  • Style guidelines: Naming conventions, formatting, documentation standards
  • Architectural patterns: How to structure code, which design patterns to use
  • Project-specific rules: Custom conventions unique to your codebase
  • Domain knowledge: Specialized terminology or domain-specific best practices

The AGENTS.md Concept


What is AGENTS.md?

AGENTS.md is a standardized approach to providing context and guidelines for AI coding assistants directly within your repository. By placing an AGENTS.md file in your repository root, you create a machine-readable specification that AI tools can reference when generating or reviewing code.

The concept is documented at https://agents.md/, which provides:

  • Specifications for the AGENTS.md format
  • Examples from various programming languages and domains
  • Best practices for writing effective agent guidelines
  • Community-contributed templates and patterns

How AGENTS.md Works

When an AI assistant (like GitHub Copilot, Cursor, or other AI tools) works in a repository with an AGENTS.md file, it can:

  1. Read the guidelines at the start of a session
  2. Apply the rules when generating code suggestions
  3. Reference the standards when reviewing or revising code
  4. Adapt behavior to match project-specific preferences

Multiple AGENTS.md Files

You can place AGENTS.md files at different levels of your project hierarchy. AI assistants typically use the closest AGENTS.md file relative to the current working file:

  • Repository root (/AGENTS.md): Defines project-wide standards that apply to all code
  • Subdirectory (/src/AGENTS.md, /tests/AGENTS.md): Provides context-specific guidelines that override or extend root-level rules
  • Module-level (/src/data-processing/AGENTS.md): Defines specialized rules for specific components

Example hierarchy:

my-r-project/
├── AGENTS.md               # General tidyverse standards
├── src/
│   └── analysis/
│       └── AGENTS.md       # Additional statistical analysis guidelines
└── tests/
    └── AGENTS.md           # Testing-specific conventions

When working on a file like /src/analysis/models.R, the AI assistant will prioritize: 1. /src/analysis/AGENTS.md (most specific) 2. /src/AGENTS.md (if it exists) 3. /AGENTS.md (project-wide defaults)

This hierarchical approach allows you to maintain general standards while accommodating specialized needs in different parts of your codebase.

Callout

Version Control Benefits

Since AGENTS.md is a simple text file and can be easily version-controlled alongside your code, e.g. using git, it:

  • Evolves with your project
  • Can be reviewed and improved through pull requests
  • Maintains consistency across different development stages
  • Provides historical context for coding decisions

Example: AGENTS.md for an R Project


Here’s a comprehensive, if rather extensive, example of an AGENTS.md file for an R project that emphasizes tidyverse principles and functional programming patterns:

MARKDOWN

# AGENTS.md - R Project Coding Guidelines

## Overview

This R project follows tidyverse conventions and functional programming principles. 
AI assistants should generate code that adheres to these guidelines.

## Language and Framework

- **Primary Language**: R (version 4.1 or higher)
- **Core Framework**: tidyverse
- **Required Packages**: dplyr, tidyr, ggplot2, purrr, readr

## Code Style Principles

### 1. Mandatory Tidyverse Usage

All data manipulation and analysis code MUST use tidyverse packages and functions.

**Prefer:**
```r
mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, cyl, hp)
```

**Avoid:**
```r
mtcars[mtcars$mpg > 20, c("mpg", "cyl", "hp")]
```

### 2. Piping Over Local Variables

ALWAYS favor piping operations over creating intermediate local variables.

**Prefer:**
```r
mtcars %>%
  filter(cyl == 6) %>%
  mutate(efficiency = mpg / hp) %>%
  arrange(desc(efficiency)) %>%
  head(10)
```

**Avoid:**
```r
cars_filtered <- filter(mtcars, cyl == 6)
cars_mutated <- mutate(cars_filtered, efficiency = mpg / hp)
cars_sorted <- arrange(cars_mutated, desc(efficiency))
result <- head(cars_sorted, 10)
```

### 3. Prohibition of Variable Overwriting

NEVER overwrite existing variables. Use piping to transform data in a single flow.

**Forbidden:**
```r
data <- read_csv("input.csv")
data <- filter(data, value > 0)
data <- mutate(data, log_value = log(value))
data <- arrange(data, date)
```

**Required:**
```r
data <- read_csv("input.csv") %>%
  filter(value > 0) %>%
  mutate(log_value = log(value)) %>%
  arrange(date)
```

### 4. Sparse Use of Local Variables

Minimize the creation of intermediate variables. Only create local variables when:
- The result will be used multiple times in different contexts
- The variable name significantly improves code readability
- The computation is expensive and should not be repeated

**Acceptable local variable usage:**
```r
# Used in multiple independent operations
base_data <- read_csv("data.csv") %>%
  filter(status == "active")

summary_stats <- base_data %>%
  summarize(mean_value = mean(value), sd_value = sd(value))

detailed_analysis <- base_data %>%
  group_by(category) %>%
  summarize(across(where(is.numeric), list(mean = mean, sd = sd)))
```

### 5. Multiline Pipes with Documentation

Pipelines MUST be formatted across multiple lines with inline documentation.

**Required format:**
```r
analysis_results <- raw_data %>%
  # Remove incomplete cases and outliers
  filter(complete.cases(.), between(value, 0, 100)) %>%
  # Normalize values by group
  group_by(category) %>%
  mutate(normalized = (value - mean(value)) / sd(value)) %>%
  ungroup() %>%
  # Calculate derived metrics
  mutate(
    log_value = log1p(value),
    squared_value = value^2,
    interaction = value * normalized
  ) %>%
  # Sort by importance
  arrange(desc(abs(normalized)))
```

Each step in a pipeline should:
- Be on its own line
- Have a preceding comment explaining its purpose
- Use meaningful intermediate calculations when needed

### 6. Function Definitions

Functions should also follow piping principles when applicable:

```r
process_dataset <- function(data, threshold = 0.05) {
  data %>%
    # Filter based on significance threshold
    filter(p_value < threshold) %>%
    # Calculate effect sizes
    mutate(
      effect_size = (mean_treatment - mean_control) / pooled_sd,
      ci_lower = effect_size - 1.96 * se,
      ci_upper = effect_size + 1.96 * se
    ) %>%
    # Add interpretation
    mutate(
      significant = p_value < threshold,
      effect_magnitude = case_when(
        abs(effect_size) < 0.2 ~ "small",
        abs(effect_size) < 0.8 ~ "medium",
        TRUE ~ "large"
      )
    )
}
```

## Forbidden Patterns

1. **Loop-based operations** when vectorized or tidyverse alternatives exist
2. **Direct variable assignment in loops** - use `purrr::map()` family instead
3. **Base R subsetting syntax** - use `dplyr::filter()` and `dplyr::select()`
4. **Nested function calls** without pipes when multiple operations are chained
5. **`attach()` function** - always use explicit data references

## Documentation Standards

- Use roxygen2-style comments for all functions
- Include inline comments for complex pipeline steps
- Document assumptions and data requirements
- Explain any deviations from these guidelines (rare cases only)

## Examples of Complete Workflows

### Data Import and Cleaning
```r
cleaned_data <- read_csv("raw_data.csv") %>%
  # Handle missing values
  drop_na(key_columns) %>%
  # Standardize column names
  rename_with(tolower) %>%
  # Type conversion
  mutate(across(ends_with("_date"), ymd)) %>%
  # Remove duplicates
  distinct()
```

### Analysis Pipeline
```r
analysis <- cleaned_data %>%
  # Subset to relevant period
  filter(between(date, start_date, end_date)) %>%
  # Group-level transformations
  group_by(category, region) %>%
  summarize(
    n_obs = n(),
    mean_value = mean(value, na.rm = TRUE),
    median_value = median(value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Calculate derived metrics
  mutate(
    relative_value = mean_value / median_value,
    weight = n_obs / sum(n_obs)
  ) %>%
  # Final filtering
  filter(n_obs >= min_sample_size)
```

## Additional Guidelines

- Prefer `tibble` over `data.frame`
- Use `readr::read_*()` over base R `read.*()` functions
- Always specify `.groups` argument when using `summarize()` with `group_by()`
- Use `across()` for operations on multiple columns
- Leverage `case_when()` for complex conditional logic
Challenge

Challenge 1: Create Your Own AGENTS.md

Create an AGENTS.md file for one of your own R projects. Consider:

  1. What coding style do you prefer?
  2. Which packages should be favored?
  3. What patterns should be avoided?
  4. How should functions be documented?

Your AGENTS.md should include:

  • Clear statement of purpose
  • Specific package preferences with examples
  • Do’s and don’ts with code comparisons
  • Documentation requirements
  • Examples of good practices

Remember: Start simple and expand based on your project’s needs.

Other Options to Specify Context


While AGENTS.md is a powerful tool, there are several complementary or alternative approaches to providing context to AI assistants:

1. Inline Comments and Documentation

AI assistants can learn from well-documented code:

R

# This project uses tidyverse conventions exclusively
library(tidyverse)

#' Process customer data following tidyverse patterns
#' 
#' @param data A tibble with customer information
#' @return A processed tibble with standardized columns
#' @examples
#' process_customers(raw_customers)
process_customers <- function(data) {
  # Always use piping for multi-step transformations
  data %>%
    filter(!is.na(customer_id)) %>%
    mutate(name = str_to_title(name))
}

2. Chat Instructions

When using AI chat interfaces, provide context explicitly:

I'm working on an R project that follows strict tidyverse conventions.
Please generate all code using:

- dplyr for data manipulation
- Piping (%>%) for all multi-step operations
- No variable overwriting
- Comments before each pipe step

3. Project README Files

Include coding standards in your README.md:

MARKDOWN

## Coding Standards

This project follows tidyverse conventions. All contributions must:

- Use tidyverse packages
- Implement piping for data transformations
- Avoid variable overwriting
- Include inline documentation

4. Code Templates and Snippets

Create RStudio code snippets that enforce your patterns:

R

# In RStudio: Tools > Global Options > Code > Snippets
snippet tidypipe
	${1:data} %>%
		# ${2:description}
		${3:operation}() %>%
		# ${4:description}
		${5:operation}()

5. Style Guides with Tools

Use automated style checking tools like lintr to enforce standards:

R

# .lintr configuration
linters: linters_with_defaults(
  line_length_linter(120),
  object_usage_linter = NULL,
  # Enforce tidyverse style
  assignment_linter(),
  pipe_continuation_linter()
)

6. Pre-commit Hooks

Enforce standards before code is committed via pre-commit hooks:

YAML

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/lorenzwalthert/precommit
    rev: v0.3.2
    hooks:
      - id: style-files
      - id: lintr

7. Copilot Instructions in IDE

Some IDEs allow workspace-specific instructions for AI assistants. While specific settings vary by tool and may evolve, the concept involves configuring your IDE to provide additional context files or instructions to the AI assistant.

Callout

Note on IDE-Specific Settings

The availability and configuration of AI assistant settings varies across IDEs and tools. Check your specific IDE’s documentation for current options to provide context to AI coding assistants.

Callout

Combining Approaches

The most effective strategy often combines multiple approaches:

  1. AGENTS.md for comprehensive, machine-readable guidelines
  2. Inline comments for implementation-specific context
  3. Chat instructions for interactive sessions
  4. Automated tools for enforcement

Best Practices for Context Definitions


Keep It Focused

  • Start with the most important rules
  • Don’t try to specify everything at once
  • Expand based on actual needs and pain points

Use Examples Liberally

  • Show preferred patterns with code examples
  • Demonstrate anti-patterns to avoid
  • Include both simple and complex scenarios

Make It Discoverable

  • Place context files in repository root
  • Reference them in README and contributing guides
  • Keep them up-to-date with project evolution

Test Your Guidelines

  • Verify that AI actually follows your guidelines
  • Iterate based on the quality of generated code
  • Collect feedback from team members

Version Control Context

  • Track changes to context definitions
  • Review updates through pull requests
  • Document why rules were added or changed
Challenge

Challenge 2: Context Specification Strategy

For a team R project, design a context specification strategy that includes:

  1. What would go in AGENTS.md?
  2. What would be better as inline comments?
  3. How would you communicate standards to new team members?

A possible strategy:

  1. AGENTS.md: Core style principles, mandatory patterns, forbidden practices
  2. Inline comments: Function-specific logic, data flow explanations, edge cases
  3. README.md: Quick-start guide, links to detailed standards, setup instructions
  4. Onboarding docs: Human-readable explanation of why standards exist, examples
  5. Code reviews: Consistent feedback referring to documented standards

The key is redundancy across human and AI channels.

Summary


Providing proper context to AI coding assistants is essential for generating high-quality, consistent code. The AGENTS.md concept offers a standardized, version-controlled approach to defining project-specific guidelines. Combined with other context specification methods, it creates a comprehensive environment where AI assistants can truly enhance your development workflow.

Key Points
  • Context definitions help AI assistants generate code that matches your project standards
  • Context improves consistency: AI-generated code matches your project standards
  • AGENTS.md provides a standardized, version-controlled, machine-readable way to specify coding guidelines
  • Multiple AGENTS.md files can exist at different hierarchy levels; AI assistants use the closest file
  • Effective AGENTS.md files include clear examples of preferred and forbidden patterns
  • Start simple and iterate: Begin with core principles and expand based on needs
  • Combining AGENTS.md with inline comments, chat instructions, and tooling creates robust context
  • R projects benefit from explicit tidyverse usage and piping conventions in context definitions

Content from Using AI within Pipelines via ellmer


Last updated on 2026-02-09 | Edit this page

Estimated time: 50 minutes

Overview

Questions

  • What is ellmer and how does it integrate with AI models?
  • How can I set up ellmer with GitHub Copilot in RStudio?
  • How can I use AI prompts to process data within my analysis pipelines?
  • What are the best practices for integrating AI into data processing workflows?

Objectives

  • Understand the ellmer package and its capabilities
  • Install and configure ellmer for use with AI models
  • Integrate AI prompts into data processing pipelines
  • Apply AI-powered transformations to datasets
  • Develop reproducible AI-enhanced workflows

Introduction


The ellmer package provides a powerful interface for integrating large language models (LLMs) into R workflows. This enables you to leverage AI capabilities directly within your data processing pipelines, combining traditional statistical computing with modern AI assistance.

What is ellmer?


ellmer is an R package that provides a unified interface to various LLM providers, including:

  • GitHub Copilot (via GitHub Models)
  • OpenAI models
  • Anthropic Claude
  • Google Gemini
  • Local models via Ollama

It allows you to:

  • Send prompts to AI models from R code
  • Process text data with AI assistance
  • Generate structured outputs
  • Stream responses for interactive applications
Callout

Why Use ellmer?

  • Unified API: Work with different AI providers using consistent syntax
  • Pipeline Integration: Seamlessly incorporate AI into tidyverse workflows
  • Reproducible: Track prompts and model versions for reproducible research
  • Flexible: Switch between models without rewriting code

Installing ellmer


First, install the ellmer package from CRAN or GitHub:

R

# Install from CRAN
install.packages("ellmer")

# Or install development version from GitHub
# install.packages("remotes")
remotes::install_github("tidyverse/ellmer")

Load the package:

R

library(ellmer)
library(tidyverse)  # For data manipulation

Setting Up ellmer with GitHub Copilot


To use ellmer with GitHub Copilot (via GitHub Models), you need to set up authentication.

Step 1 (Option A): Use Existing GitHub Copilot Access

If you have already signed in to your GitHub account in RStudio to use the GitHub Copilot features, you can use GitHub Models via ellmer without any further action.

Step 1 (Option B): Get a GitHub Token and Store It Securely

The following steps are only needed if you have not already set up GitHub authentication in RStudio.

  1. Go to github.com/settings/tokens
  2. Click “Generate new token” → “Generate new token (classic)”
  3. Give it a descriptive name (e.g., “ellmer-access”)
  4. Select the required scopes (typically repo and user)
  5. Click “Generate token” and copy it immediately

Store your token securely in your R environment:

R

# Option 1: Set for the current session
Sys.setenv(GITHUB_TOKEN = "your-token-here")

# Option 2: Store permanently in .Renviron
usethis::edit_r_environ()
# Add this line: GITHUB_TOKEN=your-token-here
# Save and restart R
Callout

Security Best Practice

Never commit tokens or API keys to your code! Always use environment variables and add .Renviron to your .gitignore file.

Step 2: Initialize a Chat Object

R

# Create a chat session with GitHub Copilot
chat <- chat_github()  # the default model used will be printed

# Test the connection
chat$chat("Hi, please give me a joke!")
Challenge

Challenge 1: Setup and Test ellmer

  1. Install the ellmer package
  2. Create a chat object and send a test message
  3. Verify that you receive a response, and enjoy your joke!
  4. What do you see, when you print your chat object?

Printing the chat object shows

  • the model being used
  • the number of tokens sent and received
  • the total cost incurred (if applicable)
  • the history of messages exchanged

Using AI for Data Processing


Basic Text Processing

Process individual text strings with AI:

R

# Classify sentiment
texts <- c(
  "I love this product!",
  "This is terrible.",
  "It's okay, not great."
)

# Use AI to classify sentiment
chat <- chat_github()
# call the chat interface iteratively, e.g. using sapply() or purrr::map()
results <- purrr::map_chr(texts, 
  function(text) {
    chat$chat(paste("Classify the sentiment (positive/negative/neutral):", text))
  })

Processing Data in Pipelines

Integrate AI into tidyverse pipelines:

R

# Example: Customer feedback analysis
feedback_data <- tibble(
  id = 1:5,
  comment = c(
    "Great service, very helpful!",
    "Long wait times, not happy.",
    "Average experience.",
    "Excellent quality and fast delivery!",
    "Product arrived damaged."
  )
)

# Add AI-powered sentiment analysis
feedback_processed <- feedback_data %>%
  rowwise() %>% # ensures each row/information is processed individually
  mutate(
    sentiment = chat$chat(
      paste("Classify as positive/negative/neutral:", comment)
    ),
    key_themes = chat$chat(
      paste("Extract main themes (max 3 words):", comment)
    )
  )

Aggregated calls to reduce I/O and the number of prompts

So far, one chat() call (prompt) was made per row.

An alternative is to aggregate multiple inputs into a single prompt, reducing the number of API calls:

R

feedback_data |> 
  mutate(
    mood = 
      comment |> 
      str_c(collapse="#") |> 
      chat$chat( "Assign to each product feedback answer (provided as #-separated list) a respective category from (happy,unhappy) in a #-separated aggregated text output",
                      echo = "none") |> 
      str_split_1("#")
  )

That way, only one API call is made for the entire dataset.

But when working on larger datasets, be aware of token limits per prompt (both input and output tokens).

Callout

Rate Limits and Costs

Be aware of:

  • API rate limits for your chosen provider (limited number of requests per minute/hour)
  • Limited token quotas and prompt sizes
  • Potential costs for API calls
  • Processing time for large datasets
  • Consider batching requests when possible (fewer tokens spent)

Advanced: Structured Output Generation

Request structured data from AI models:

R

# Extract structured information
extract_info <- function(text) {
  prompt <- paste0(
    "Extract the following from this text and return as JSON:\n",
    "- sentiment (positive/negative/neutral)\n",
    "- urgency (high/medium/low)\n",
    "- category (product/service/delivery/other)\n\n",
    "Text: ", text
  )
  
  chat$chat(prompt)
}

# Apply to dataset
feedback_structured <- feedback_data %>%
  rowwise() %>%
  mutate(analysis = extract_info(comment))
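Instead of asking for JSON in free text, recent ellmer versions (0.2+) also provide a dedicated structured-output interface via chat_structured() and type helpers; in older versions the method is called extract_data(). A sketch, assuming this API is available in your version:

R

# Define the expected output structure (ellmer >= 0.2)
type_feedback <- type_object(
  sentiment = type_string("positive, negative, or neutral"),
  urgency   = type_string("high, medium, or low"),
  category  = type_string("product, service, delivery, or other")
)

# The response is parsed into an R list matching the type definition
chat$chat_structured("Great service, very helpful!", type = type_feedback)

This avoids manually parsing JSON strings out of the model's reply.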
Challenge

Challenge 2: Build a Data Processing Pipeline

Create a tidyverse pipeline that:

  1. Creates a tibble dataset (columns review_id and text) with the following product reviews:
  • The software is intuitive but lacks some features. Rating: 4/5
  • Terrible experience, crashed multiple times. Very disappointed.
  • Perfect for my needs! Easy to use and fast. Highly recommend.
  2. Uses AI to classify the main topic of each review in up to two words
  3. Extracts a numerical satisfaction score (1-5) from the text

R

library(ellmer)
library(tidyverse)

# Sample data
reviews <- tibble(
  review_id = 1:3,
  text = c(
    "The software is intuitive but lacks some features. Rating: 4/5",
    "Terrible experience, crashed multiple times. Very disappointed.",
    "Perfect for my needs! Easy to use and fast. Highly recommend."
  )
)

# Initialize chat
chat <- chat_github()

# Process with AI
reviews %>%
  rowwise() %>%
  mutate(
    topic = chat$chat(paste("Main topic (1-2 words):", text)),
    score = chat$chat(paste("Satisfaction score 1-5:", text))
  ) %>%
  ungroup()

Local LLMs


So far, we have focused on cloud-based AI models like GitHub Copilot, so our setup required internet access and API tokens. Since API usage may incur costs and is subject to rate limits, local models can be a viable alternative for some use cases, especially high-volume applications or sensitive data. Running models locally also reduces dependency on external services and their associated costs.

To use local models with ellmer, you can leverage Ollama, which allows you to run LLMs on your local machine.

The general steps are:

  1. Install Ollama and set up a local model that fits both your computational resources and needs
  2. Install the ellmer package
  3. Initialize a chat object for the local model

A detailed guide for setting up Ollama with ellmer can be found in the ellmer documentation.
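A minimal sketch, assuming Ollama is installed and running and the model has already been pulled (e.g., via `ollama pull llama3.2` on the command line):

R

library(ellmer)

# Talks to the local Ollama server instead of a cloud API
chat_local <- chat_ollama(model = "llama3.2")
chat_local$chat("Hi, please give me a joke!")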

Best Practices for AI in Pipelines


1. Design Clear Prompts

R

# Good: Specific and constrained
prompt <- "Classify sentiment as: positive, negative, or neutral. 
          Return only one word."

# Less effective: Vague
prompt <- "What do you think about this?"

2. Handle Errors Gracefully

R

safe_chat <- function(chat, prompt) {
  tryCatch(
    chat$chat(prompt),
    error = function(e) {
      warning("API call failed: ", e$message)
      return(NA)
    }
  )
}

# Use in pipeline
data %>%
  mutate(ai_result = safe_chat(chat, prompt))

This is especially important when working with large datasets or unreliable network connections.

3. Cache Results

Write your AI-generated data to file to avoid redundant calls:

R

# Save processed results to avoid reprocessing
processed_data <- data %>%
  mutate(ai_field = process_with_ai(text))

# Save to disk
write_rds(processed_data, "cache/processed_data.rds")

# Load cached version later
processed_data <- read_rds("cache/processed_data.rds")

This reduces redundant API calls and speeds up data processing.

You might even want to write your script so that it first checks whether a cached version exists and only processes the data if not. That way, you can re-run your whole analysis without incurring additional costs or delays.
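Such a check can be sketched as follows (file path and processing function assumed from the snippet above):

R

cache_file <- "cache/processed_data.rds"

if (file.exists(cache_file)) {
  # Reuse earlier results, no API calls needed
  processed_data <- read_rds(cache_file)
} else {
  processed_data <- data %>%
    mutate(ai_field = process_with_ai(text))
  write_rds(processed_data, cache_file)
}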

4. Use Batch Processing for Large Datasets

R

# Process in chunks to manage rate limits
process_in_batches <- function(data, batch_size = 10) {
  data %>%
    mutate(batch = ceiling(row_number() / batch_size)) %>%
    group_by(batch) %>%
    mutate(ai_result = process_with_ai(text)) %>%
    ungroup() %>%
    select(-batch)
}

This approach is especially useful for large datasets to avoid hitting API rate limits.

Furthermore, it can be nicely combined with the “aggregated calls” approach shown earlier.
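A sketch of that combination, splitting the comments into chunks and sending one aggregated prompt per chunk (chunk size and prompt wording are assumptions):

R

batch_size <- 20
chunks <- split(
  feedback_data$comment,
  ceiling(seq_along(feedback_data$comment) / batch_size)
)

# One API call per chunk instead of one per row
moods <- purrr::map(chunks, function(texts) {
  texts |>
    str_c(collapse = "#") |>
    chat$chat("Assign each #-separated feedback a category from (happy,unhappy) as #-separated output:",
              echo = "none") |>
    str_split_1("#")
})

feedback_data$mood <- unlist(moods)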

Reproducibility Considerations


Document your AI pipeline for reproducibility:

R

# Initialize chat
chat <- chat_github()

# Record model and version
metadata <- list(
  model = chat$get_model(),
  provider = "github",
  date = Sys.Date(),
  ellmer_version = packageVersion("ellmer"),
  prompt_template = "Classify sentiment: {text}"
)

# Save with results
list(
  data = processed_data,
  metadata = metadata
) %>%
  write_rds("results_with_metadata.rds")
Discussion

Group Discussion

Consider the following questions for discussion within the class:

  • What types of data processing tasks in your work could benefit from AI integration?
  • How would you balance reproducibility with using AI models that may change over time?
  • What ethical considerations arise when using AI to process data?
  • How do you validate the quality of AI-generated classifications or summaries?

Real-World Use Cases


Text Classification

R

# Categorize research abstracts
abstracts %>%
  mutate(field = chat$chat(
    paste("Research field (one word):", abstract)
  ))

Data Cleaning

R

# Standardize inconsistent entries
messy_data %>%
  mutate(cleaned = chat$chat(
    paste("Standardize company name:", company_name_raw)
  ))

Content Generation

R

# Generate descriptions
products %>%
  mutate(description = chat$chat(
    paste("Write 20-word product description for:", product_name)
  ))

Translation and Localization

R

# Translate content
content %>%
  mutate(translated = chat$chat(
    paste("Translate to Spanish:", english_text)
  ))
Challenge

Challenge 3: Implement a Complete Workflow

Choose a scenario and implement a complete AI-enhanced pipeline:

  • Option A: Analyze a dataset of tweets/social media posts
  • Option B: Process customer support tickets
  • Option C: Categorize research papers by topic

Include error handling, caching, and metadata tracking.

R

library(ellmer)
library(tidyverse)

# Initialize
chat <- chat_github()

# Process tickets
process_ticket <- function(ticket_text) {
  list(
    category = safe_chat(chat, 
      paste("Category (billing/technical/account):", ticket_text)),
    priority = safe_chat(chat,
      paste("Priority (high/medium/low):", ticket_text)),
    suggested_response = safe_chat(chat,
      paste("Suggest 2-sentence response:", ticket_text))
  )
}

# Apply to dataset
tickets_processed <- tickets %>%
  rowwise() %>%
  mutate(analysis = list(process_ticket(description))) %>%
  unnest_wider(analysis)

# Save with metadata
output <- list(
  data = tickets_processed,
  metadata = list(
    processed = Sys.time(),
    model = chat$get_model(),
    n_tickets = nrow(tickets)
  )
)

write_rds(output, "processed_tickets.rds")

Troubleshooting Common Issues


Authentication Errors

  • Verify your token is set correctly:
    • Sys.getenv("GITHUB_PAT_GITHUB_COM") for RStudio setup
    • Sys.getenv("GITHUB_TOKEN") for explicit token setup from above
  • Ensure token has required permissions
  • Check token hasn’t expired

Rate Limiting

  • Implement delays between requests: Sys.sleep(1)
  • Use batch processing
  • Consider caching results
  • Monitor API usage
  • Consider installing local models via Ollama for heavy workloads

Inconsistent Results

  • Make prompts more specific
  • Add constraints to expected outputs
  • Use temperature parameter (if supported)
  • Validate and clean AI outputs

Future Developments


Stay updated with ellmer developments:

  • New model integrations
  • Enhanced streaming capabilities
  • Better error handling
  • Performance improvements

Check the ellmer documentation regularly for updates.

Key Points
  • ellmer provides a unified interface for integrating LLMs into R workflows
  • Set up authentication using environment variables for security
  • Integrate AI prompts seamlessly into tidyverse pipelines
  • Design clear, constrained prompts for consistent results
  • Implement error handling and caching for robust pipelines
  • Document models and prompts for reproducibility
  • Consider rate limits, costs, and ethical implications when using AI in data processing
  • Local model options are available for high-volume use cases

Content from Understanding and Setting LLM Parameters


Last updated on 2026-01-19 | Edit this page

Estimated time: 40 minutes

Overview

Questions

  • How do large language models generate responses?
  • What parameters control LLM behavior?
  • How can I adjust LLM parameters to improve result quality?
  • How do I set parameters when using ellmer?

Objectives

  • Understand how LLMs generate text through token probability
  • Learn the difference between deterministic and random outputs
  • Configure temperature, top_k, top_p, and seed parameters
  • Apply parameter settings in ellmer for better results
  • Follow best practices for AI-driven data processing

Introduction


When working with large language models (LLMs) like GitHub Copilot, understanding how these models generate responses and how to control their behavior is essential for getting consistent, high-quality results. This episode explores the inner workings of LLMs and the parameters you can adjust to guide their output.

How LLMs Generate Responses


Large language models generate text through a process called autoregressive token generation. Let’s break down what this means:

Token-by-Token Generation

  1. Input Processing: The model receives your prompt and converts it into tokens (words or word pieces)
  2. Probability Calculation: For each position, the model calculates probability scores for all possible next tokens
  3. Token Selection: A token is selected based on these probabilities
  4. Iteration: The selected token is added to the sequence, and the process repeats

This means that each word (or token) in the response is chosen based on:

  • The original prompt
  • All previously generated tokens
  • A probability distribution over the vocabulary

Example

Given the prompt: “The capital of France is”

The model might calculate probabilities like:

  • “Paris” → 85%
  • “Lyon” → 3%
  • “located” → 2%
  • … (thousands of other tokens with lower probabilities)

Deterministic vs. Random Generation


The way a model selects tokens has a significant impact on the quality and consistency of results.

Deterministic Generation

  • Always selects the highest probability token
  • Produces consistent, reproducible results
  • Useful for tasks requiring reliability (e.g., code generation, data extraction)
  • Can be repetitive or lack creativity

Random (Stochastic) Generation

  • Samples from the probability distribution
  • Produces varied, creative results
  • Useful for content generation, brainstorming
  • Results may be inconsistent across runs
  • Can occasionally produce unexpected or incorrect outputs

The parameters we’ll discuss control this balance between deterministic and random behavior.

Key LLM Parameters


Temperature

Temperature controls the randomness of token selection by adjusting the probability distribution.

  • Low temperature (e.g., 0.0-0.5):
    • Sharpens the distribution (high-probability tokens become even more likely)
    • More deterministic and focused outputs
    • Better for factual tasks, code generation, data processing
  • High temperature (e.g., 0.8-1.5):
    • Flattens the distribution (gives lower-probability tokens more chance)
    • More creative and diverse outputs
    • Better for creative writing, brainstorming
Figure: the temperature parameter. Low temperature sharpens the distribution into a peak on the highest-probability token, temperature 1.0 roughly reflects the underlying probabilities, and high temperature spreads probability more evenly across tokens. (Source: Soso Sukhitashvili, GenAI_parameters_temperature_topK_topP)
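This effect can be reproduced with a toy softmax in base R (the logits below are made up for illustration):

R

# Temperature rescales the logits before converting them to probabilities
softmax_t <- function(logits, temperature) {
  exp(logits / temperature) / sum(exp(logits / temperature))
}

logits <- c(Paris = 5, Lyon = 2, located = 1.5)

round(softmax_t(logits, temperature = 0.5), 3)  # sharp: almost all mass on "Paris"
round(softmax_t(logits, temperature = 2.0), 3)  # flatter distribution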

Top-K Sampling

Top-K limits the selection to the K most probable tokens.

  • Only the top K tokens with highest probability are considered
  • All other tokens are given zero probability
  • Reduces the chance of selecting very unlikely tokens
  • Helps prevent nonsensical outputs

For example, with top_k = 50:

  • Only the 50 most likely tokens can be selected
  • The other thousands of tokens are excluded
  • The model samples randomly among these 50 tokens (influenced by temperature)
Figure: the top-K parameter. Top-K sampling limits the selection pool by keeping only the K most probable tokens. (Source: Soso Sukhitashvili, GenAI_parameters_temperature_topK_topP)

Top-P Sampling (Nucleus Sampling)

Top-P (also called nucleus sampling) selects from the smallest set of tokens whose cumulative probability exceeds P.

  • Dynamically adjusts the number of tokens considered
  • More adaptive than top-K
  • With top_p = 0.9, tokens are selected until their cumulative probability reaches 90%

The advantage of top-P over top-K:

  • When the model is confident (one token has high probability), fewer tokens are considered
  • When the model is uncertain (probabilities are spread out), more tokens are considered
  • This adapts better to different contexts
Figure: the top-P parameter. Top-P (nucleus) sampling dynamically adjusts the number of candidate tokens based on a cumulative probability threshold. (Source: Soso Sukhitashvili, GenAI_parameters_temperature_topK_topP)
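The selection rule can be sketched in base R on an assumed toy distribution:

R

probs <- sort(c(Paris = 0.85, other = 0.10, Lyon = 0.03, located = 0.02),
              decreasing = TRUE)

top_p <- 0.9
# keep the smallest set of tokens whose cumulative probability reaches top_p
n_keep  <- which(cumsum(probs) >= top_p)[1]
nucleus <- probs[1:n_keep] / sum(probs[1:n_keep])
nucleus  # only "Paris" and "other" remain, renormalized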

Seed

Seed controls the random number generator for reproducible results.

  • Setting the same seed with the same prompt produces identical outputs
  • Useful for debugging and testing
  • Enables reproducible research
  • Note: Different model versions may still produce different results even with the same seed. Furthermore, model-internal nondeterminism (e.g., from parallelization) can also cause minor differences.

Setting Parameters in ellmer


The ellmer package provides the params() function to configure LLM parameters. Here’s how to use it:

Parameter Settings for Different Tasks

For Code Generation and Data Processing

R

# Deterministic, reliable outputs
chat_code <- chat_github(
  params = params(
    temperature = 0.2,  # Very focused
    top_p = 0.8,
    seed = 123          # Reproducible
  )
)

# Use the chat as normal
response <- chat_code$chat("Explain the concept of loops in R")

For Creative Content

R

# More varied, creative outputs
chat_creative <- chat_github(
  params = params(
    temperature = 1.5,  # More randomness
    top_p = 0.95  # More tokens
  )
)

Viewing Current Parameters

You can check the parameters being used by a chat object:

R

# Create chat with parameters
chat <- chat_github(params = params(temperature = 0.3))

# The chat object will show the configured parameters when printed
print(chat)
Callout

Parameter Availability

Not all LLM providers support all parameters. GitHub Copilot models typically support:

  • temperature
  • top_p
  • seed (may vary by model)

Parameter support may vary depending on your ellmer version and the specific model provider. Check the ellmer documentation for the most up-to-date information on parameter support for your version.

Challenge

Challenge 1: Experiment with Temperature

Create two chat sessions with different temperature settings and compare their outputs for repeated calls:

  1. For the following prompt: “Name with one word an animal that fits the following description ‘It has sharp teeth and a long tail’”
    • send the prompt separately with temperature 0.1 and with temperature 2.0
    • repeat the model creation and prompting 3 times for each setting
  2. Compare the creativity and variety of responses

Which temperature setting would you use for your research and in which context?

R

library(ellmer)

for (i in 1:3) {
  # Low temperature - deterministic
  llm <- chat_github(params = params(temperature = 0.1))
  llm$chat("Name with one word an animal that fits the following description 'It has sharp teeth and a long tail'")
  # High temperature - creative
  llm <- chat_github(params = params(temperature = 2.0))
  llm$chat("Name with one word an animal that fits the following description 'It has sharp teeth and a long tail'")
}

Discussion:

  • The low temperature setting will likely produce more conventional, safe answers, and repetition across runs (i.e., reproducibility) is higher
  • The high temperature setting may produce more creative but potentially less practical names
  • For research, you might want to stick with deterministic model setups in order to keep your results reproducible and sound

Best Practices for AI-Driven Data Processing with ellmer


When using LLMs for data processing tasks, follow these guidelines to ensure reliable and reproducible results:

1. Use Low Temperature for Consistency

R

# For data classification, extraction, or transformation
chat <- chat_github(
  params = params(
    temperature = 0.2,  # Consistent outputs
    seed = 123          # Reproducibility
  )
)

# Process data
data %>%
  rowwise() %>%
  mutate(category = chat$chat(paste("Classify:", text)))

2. Set Seeds for Reproducible Research

R

# Document your parameters in your analysis script
ANALYSIS_PARAMS <- params(
  temperature = 0.3,
  top_p = 0.85,
  seed = 42  # Document this in your README
)

chat <- chat_github(params = ANALYSIS_PARAMS)

3. Test Parameter Settings on Sample Data

R

# Test on a small sample first
sample_data <- data %>% slice_sample(n = 10)

# Try different parameter settings
params_conservative <- params(temperature = 0.1, seed = 1)
params_balanced <- params(temperature = 0.5, seed = 1)

# Compare results before processing full dataset

4. Document Parameter Choices

R

# Save metadata with your processed data
processing_metadata <- list(
  date = Sys.Date(),
  model = chat$get_model(),
  parameters = list(
    temperature = 0.3,
    top_p = 0.9,
    seed = 123
  ),
  ellmer_version = as.character(packageVersion("ellmer"))
)

# Save with your results
output <- list(
  data = processed_data,
  metadata = processing_metadata
)
write_rds(output, "processed_data_with_metadata.rds")
Discussion

Reflection: Parameter Trade-offs

Consider these questions:

  • When might you want to prioritize creativity over consistency?
  • How do parameter settings affect the interpretability of AI-processed data?
  • What are the implications of using low vs. high temperature for scientific reproducibility?
  • How should parameter choices be documented in published research?

Parameter Selection Guidelines


Here’s a quick reference for choosing parameter values:

Task Type               Temperature  Top-P     Seed      Reasoning
Code generation         0.1-0.3      0.8-0.9   Yes       Need deterministic, correct syntax
Data classification     0.1-0.2      0.8       Yes       Consistency is critical
Text extraction         0.2-0.4      0.85      Yes       Balance accuracy and flexibility
Content summarization   0.3-0.5      0.9       Optional  Some creativity helps
Creative writing        0.7-1.0      0.95      No        Maximize diversity
Brainstorming           0.8-1.2      0.95      No        Want unexpected ideas
Key Points
  • LLMs generate text through repeated token probability calculations and random selection
  • Temperature controls the randomness of outputs: low values are more deterministic, high values are more creative
  • Top-K and top-P sampling limit token selection to reduce nonsensical outputs
  • Seeds enable reproducible results when using the same prompt and parameters
  • Use params() in ellmer to configure temperature, top_p, seed, and other parameters
  • For data processing, use low temperature (0.1-0.3) and set seed for reproducibility
  • Document parameter choices and model versions for transparent, reproducible research
  • Test parameter settings on sample data before processing full datasets
  • Combine appropriate parameters with clear prompts for best results

Content from Revise Your Code with AI


Last updated on 2026-02-13 | Edit this page

Estimated time: 40 minutes

Overview

Questions

  • How can I use AI to check and improve my code?
  • What are effective strategies for code review with AI assistants?
  • How do I use chat interfaces for iterative code refinement?
  • When should I trust AI suggestions for code revisions?

Objectives

  • Use AI chat interfaces to review and analyze code
  • Identify code issues with AI assistance
  • Iteratively refine code based on AI feedback
  • Validate AI suggestions critically
  • Develop a workflow for AI-assisted code revision

Introduction


AI-powered tools can serve as an additional pair of eyes when reviewing and improving your code. This chapter explores how to effectively use AI chat interfaces to check, revise, and validate your R code, ensuring better quality and maintainability.

Why Use AI for Code Review?


AI assistants can help identify:

  • Logic errors: Potential bugs or incorrect implementations
  • Performance issues: Inefficient code patterns
  • Style problems: Code that doesn’t follow best practices
  • Documentation gaps: Missing or unclear comments
  • Security vulnerabilities: Potential security risks
Callout

AI as a Complement, Not Replacement

AI code review should complement, not replace:

  • Your own understanding of the code
  • Human peer reviews
  • Automated testing and linting tools
  • Domain expertise

Setting Up for AI Code Review


Using GitHub Copilot Chat in your browser

Currently (as of end 2025), RStudio has no built-in AI chat interface.

However, you can use GitHub Copilot Chat within your browser.

If you have any other AI chat interface you prefer (e.g., ChatGPT, Claude), you can use that as well.

When using a browser, you have to manually copy and paste code snippets between RStudio and the chat interface to get reviews and suggestions. While this is less seamless than an integrated solution, it still allows you to leverage AI for code review.

Alternative: Use the chattr app within RStudio

The chattr package provides an interface to chat with AI models directly within RStudio.

The chattr_app() function will open a Shiny app where you can interact with various AI models. Alternatively, you can run the app via the RStudio “Addins” menu by selecting “Open chat”.

Unfortunately, running the chat interface will “block” your R session until you close the app.

A workaround is to run the app in “job” mode using

R

chattr::chattr_app(as_job = TRUE)

That way, you can continue working in your main R session while chatting with the AI in a separate window. The downside is that the chattr app won’t be able to directly interact with your current document in RStudio.

Via the same “Addins” menu, you can also select code in your R script and choose “Send prompt” to get AI feedback on that specific code snippet, which will be copied back directly below your selection.

As of now, I personally find using GitHub Copilot Chat in the browser more flexible for code review tasks, but the chattr package is a promising option for integrated RStudio workflows.

If you are interested in a more sophisticated AI integration into your IDE, you might consider trying another IDE such as Positron or Visual Studio Code.

Basic Code Review Workflow


The following steps are a general workflow for using AI to review and improve your R code.

Step 1: Request an Initial Review

Select your code and ask AI to review it:

R

# Example code to review
calculate_mean <- function(numbers) {
  sum(numbers) / length(numbers)
}

Prompt for AI: > Review this function for potential issues and suggest improvements.

Challenge

Challenge 1: Basic Code Review

Ask AI to review the following function. What issues does it identify?

R

process_data <- function(data) {
  result <- data[data$value > 0]
  mean_val <- sum(result$value) / nrow(result)
  return(mean_val)
}
AI will likely identify issues such as:

  1. Subsetting bug: data[data$value > 0] is missing a comma, so it subsets columns rather than rows
  2. No NA handling: The function will fail if there are NA values
  3. Division by zero: If no rows match the condition, nrow(result) is 0
  4. Column existence: Assumes the value column exists without checking
  5. No input validation: Doesn’t verify data is a data frame

Improved version:

R

process_data <- function(data) {
  # Input validation
  if (!is.data.frame(data)) {
    stop("Input must be a data frame")
  }
  if (!"value" %in% names(data)) {
    stop("Data frame must contain 'value' column")
  }
  
  # Filter and calculate
  result <- data[data$value > 0 & !is.na(data$value), ]
  
  if (nrow(result) == 0) {
    warning("No positive values found")
    return(NA)
  }
  
  mean(result$value, na.rm = TRUE)
}

Step 2: Ask Specific Questions

Be specific about what you want to check:

Good prompts:

  • “Does this function handle edge cases correctly?”
  • “Are there any performance bottlenecks in this loop?”
  • “Is this code following tidyverse style guidelines?”
  • “Could this code be more readable?”

Less effective prompts:

  • “Is this good?”
  • “Check this code”

Step 3: Request Improvements

R

# Original code
for(i in 1:length(data)) {
  result[i] <- data[i] * 2
}

Prompt: > How can I make this code more efficient and R-idiomatic?

AI might suggest:

R

# Vectorized approach (much faster)
result <- data * 2

Iterative Refinement Process


Round 1: Initial Review

R

# Your initial code
analyze_sales <- function(sales_data) {
  total <- 0
  for(i in 1:nrow(sales_data)) {
    total <- total + sales_data[i, "amount"]
  }
  return(total / nrow(sales_data))
}

AI Feedback: “This uses a slow loop. Consider vectorization.”

Round 2: Apply Suggestions

R

analyze_sales <- function(sales_data) {
  mean(sales_data$amount)
}

AI Feedback: “Good! Consider adding NA handling and input validation.”

Round 3: Further Refinement

R

analyze_sales <- function(sales_data) {
  if (!is.data.frame(sales_data)) {
    stop("Input must be a data frame")
  }
  if (!"amount" %in% names(sales_data)) {
    stop("Data must contain 'amount' column")
  }
  
  mean(sales_data$amount, na.rm = TRUE)
}
Callout

Iterative Improvement

Don’t expect perfect code in one iteration. Use AI as a collaborative partner:

  1. Get initial feedback
  2. Make changes
  3. Ask for review again
  4. Repeat until satisfied

Common Code Issues AI Can Help Identify


1. Logic Errors

R

# Problematic code
if (x > 0 & y > 0) {  # What if x or y is NA?
  process(x, y)
}

AI can suggest:

R

if (!is.na(x) && !is.na(y) && x > 0 && y > 0) {
  process(x, y)
}

Did you notice the change from & to &&? If you don’t know why, ask AI!
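A minimal console comparison illustrates the difference:

R

c(TRUE, FALSE) & c(TRUE, TRUE)   # element-wise: TRUE FALSE
TRUE && FALSE                    # scalar, short-circuiting: FALSE

# Since R 4.3, `&&` on vectors of length > 1 is an error rather than
# silently using the first element, which surfaces bugs early

Because `&&` short-circuits, the check `!is.na(x) && x > 0` never evaluates `x > 0` when x is NA, avoiding the NA condition inside if().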

2. Performance Problems

R

# Slow: Growing vector in loop
result <- c()
for(i in 1:10000) {
  result <- c(result, calculate(i))
}

AI can suggest:

R

# Fast: Pre-allocate vector
result <- vector("numeric", 10000)
for(i in 1:10000) {
  result[i] <- calculate(i)
}

# Even better: Vectorize if possible using sapply() or purrr
result <- purrr::map_dbl(1:10000, calculate)

3. Code Readability

R

# Hard to read
f <- function(x, y, z) { x + y * z / (x - y) }

AI can suggest:

R

# More readable
calculate_metric <- 
  function(base_value, multiplier, divisor) {
    adjustment <- multiplier * divisor
    denominator <- base_value - multiplier
    
    base_value + (adjustment / denominator)
  }
Challenge

Challenge 2: Iterative Code Improvement

Start with this code and iteratively improve it with AI assistance:

R

my_function <- function(x) {
  y <- c()
  for(i in 1:length(x)) {
    if(x[i] > 0) {
      y <- c(y, x[i] * 2)
    }
  }
  return(y)
}

Ask AI to help you:

  1. Improve performance
  2. Add error handling
  3. Improve readability
  4. Add documentation

R

#' Double positive values in a numeric vector
#'
#' Takes a numeric vector and returns a new vector containing
#' only the positive values, each doubled.
#'
#' @param x A numeric vector
#' @return A numeric vector of doubled positive values
#' @examples
#' double_positives(c(-1, 2, -3, 4))  # Returns c(4, 8)
double_positives <- function(x) {
  # Input validation
  if (!is.numeric(x)) {
    stop("Input must be a numeric vector")
  }
  
  # Vectorized filtering and transformation
  positive_values <- x[x > 0 & !is.na(x)]
  positive_values * 2
}

Key improvements:

  • Descriptive function name
  • Roxygen documentation
  • Input validation
  • Vectorized operations (much faster)
  • NA handling

Using Chat for Code Explanation


AI can help you understand unfamiliar code:

R

# Complex code you found
result <- df %>%
  group_by(category) %>%
  summarise(
    mean_val = mean(value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(mean_val > quantile(mean_val, 0.75))

Prompt:

Explain what this code does step by step.

AI response would explain:

  1. Groups data by category
  2. Calculates mean value per category
  3. Drops grouping structure
  4. Filters for top 25% of means

Often it is a good idea to ask the AI to incorporate explanations as comments in the code itself. That way, you have both the code and the explanation together for future reference and rereading.

Double-Checking AI Suggestions


Always Validate AI Recommendations

R

# AI suggests: "Use this for better performance"
result <- parallel::mclapply(data, complex_function)

Questions to ask yourself:

  1. Does this actually work with my data?
  2. Is parallel processing appropriate here?
  3. Will this work on all platforms (Windows issues with mclapply)?
  4. Is the complexity worth the performance gain?
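For instance, mclapply() relies on forking, which is unavailable on Windows; a socket-cluster sketch that works across platforms (object names assumed from the snippet above):

R

# Socket clusters work on all platforms, including Windows
cl <- parallel::makeCluster(2)
result <- parallel::parLapply(cl, data, complex_function)
parallel::stopCluster(cl)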
Callout

Test AI Suggestions

Never blindly accept AI code suggestions:

  • Run the code with test data
  • Verify results match expectations
  • Check edge cases
  • Benchmark performance claims
  • Ensure compatibility with your environment

Creating Test Cases

Use AI to help generate tests:

Prompt:

Generate test cases for this function, including edge cases.

R

# Your function
safe_divide <- function(a, b) {
  if (b == 0) return(NA)
  a / b
}

AI-generated tests:

R

# Test cases
test_that("safe_divide works correctly", {
  expect_equal(safe_divide(10, 2), 5)
  expect_equal(safe_divide(0, 5), 0)
  expect_true(is.na(safe_divide(5, 0)))
  expect_true(is.na(safe_divide(Inf, 0)))
  expect_equal(safe_divide(-10, 2), -5)
})

While this provides working test code, make sure to review the expected values and adapt the tests to your specific needs.

Advanced Code Review Techniques


1. Security Review

Prompt:

Review this code for security vulnerabilities.

R

# Potentially unsafe
query <- paste0("SELECT * FROM users WHERE id = ", user_input)

AI might warn: SQL injection risk!
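One standard fix the AI might propose is a parameterized query, where the driver binds the value instead of it being pasted into the SQL string. A minimal sketch using DBI with an in-memory SQLite database (requires the RSQLite package; the `users` table here is made up for illustration):

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "users",
             data.frame(id = 1:3, name = c("ann", "bob", "cem")))

user_input <- "2"  # imagine this arrives from an untrusted source

# Safe: the value is bound as a parameter, never concatenated into the SQL
result <- dbGetQuery(con,
                     "SELECT * FROM users WHERE id = ?",
                     params = list(user_input))

dbDisconnect(con)
```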

2. Style Consistency

Prompt:

Does this code follow tidyverse style guidelines?

R

# Inconsistent style
myFunction<-function(x,y){return(x+y)}

AI suggests:

R

# Consistent style
my_function <- function(x, y) {
  x + y
}

3. Documentation Review

Prompt:

Is this function well-documented? Suggest improvements.

R

calc <- function(x, y) {
  x * y + mean(x)
}

AI suggests adding:

R

#' Calculate weighted metric
#'
#' Multiplies two vectors element-wise and adds the mean of the first vector
#'
#' @param x Numeric vector for values
#' @param y Numeric vector for weights
#' @return Numeric vector of weighted values plus mean adjustment
calc <- function(x, y) {
  x * y + mean(x)
}
Challenge

Challenge 3: Comprehensive Code Review

Perform a complete AI-assisted review of this analysis function:

R

analyze <- function(d) {
  d2 <- d[d$val > 0]
  m <- sum(d2$val) / length(d2$val)
  s <- sqrt(sum((d2$val - m)^2) / length(d2$val))
  list(m, s)
}

Review for:

  1. Correctness
  2. Efficiency
  3. Readability
  4. Documentation
  5. Error handling

R

#' Calculate mean and standard deviation for positive values
#'
#' Filters a data frame to positive values in the specified column
#' and calculates descriptive statistics
#'
#' @param data A data frame containing the data to analyze
#' @param value_col Name of the column containing values (default: "val")
#' @return A named list with 'mean' and 'sd' of positive values
#' @examples
#' analyze_positive_values(data.frame(val = c(-1, 2, 3, 4)))
analyze_positive_values <- function(data, value_col = "val") {
  # Input validation
  if (!is.data.frame(data)) {
    stop("Input must be a data frame")
  }
  if (!value_col %in% names(data)) {
    stop(paste("Column", value_col, "not found in data"))
  }
  if (!is.numeric(data[[value_col]])) {
    stop(paste("Column", value_col, "must be numeric"))
  }
  
  # Filter positive values and remove NAs
  positive_values <- data[[value_col]][
    data[[value_col]] > 0 & !is.na(data[[value_col]])
  ]
  
  # Check if any values remain
  if (length(positive_values) == 0) {
    warning("No positive values found")
    return(list(mean = NA, sd = NA))
  }
  
  # Calculate statistics using built-in functions
  # (more accurate than manual calculation)
  list(
    mean = mean(positive_values),
    sd = sd(positive_values)
  )
}

Improvements:

  • Clear, descriptive name
  • Full documentation
  • Input validation
  • Column name parameter
  • NA handling
  • Edge case handling
  • Uses built-in statistical functions
  • Named list output

What you might have noticed at this point: well-documented, robust code often spends more lines on validation and documentation than on the actual logic.

That’s fine!

Writing maintainable code is more important than keeping it short.

From this code, you can also automatically generate documentation pages using roxygen2, which is a great bonus for future users (including yourself). Or you can use the documentation to generate vignettes or tutorials.
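As a sketch, generating those help pages takes a single call once the documented functions are part of an R package (assuming the roxygen2 package is installed and the working directory is the package root):

```r
# Parses the #' comment blocks and writes .Rd help files into man/
# (devtools::document() is a popular wrapper that does the same)
roxygen2::roxygenise()
```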

Best Practices for AI-Assisted Code Review


1. Start with Specific Questions

Instead of: “Review this code”

Try: “Does this function handle missing data correctly?”

2. Review in Small Chunks

  • Don’t send entire scripts for review
  • Focus on one function or logical block at a time
  • Easier to understand and act on feedback

3. Ask “Why” Questions

  • “Why is this approach better?”
  • “Why might this fail?”
  • “Why is this more efficient?”

Understanding helps you learn, not just copy.

4. Combine Multiple Perspectives

Pair AI feedback with automated tools (such as lintr or styler) and human peer review; each perspective catches issues the others miss.

5. Document Changes

Keep track of improvements.

R

# Version 1 (original): Simple but no error handling
# Version 2 (after AI review): Added input validation
# Version 3 (after testing): Improved edge case handling

Rather than tracking versions in comments, use a proper version control system such as Git. This is especially important when collaborating in teams, and Git is neatly integrated into RStudio.

In combination with GitHub, you can even use AI tools to help you write better commit messages and set up automated post-push actions for linting, testing, and documentation generation. GitHub Copilot can also help you via the GitHub web interface to review pull requests and suggest solutions for issues and improvements, as we will discuss in more detail in later chapters.

Discussion

Group Discussion

Discuss with your peers:

  • What types of code issues have you discovered using AI review?
  • Have you encountered situations where AI gave incorrect advice?
  • How do you balance AI suggestions with your own judgment?
  • What are the limitations of AI code review compared to human review?

Limitations and Cautions


AI Cannot Replace

  • Domain expertise: Understanding of your specific field
  • Context awareness: Knowledge of project history and constraints
  • Testing: Actual execution and validation
  • Team standards: Specific conventions in your organization

Common AI Mistakes

  • Suggesting overly complex solutions
  • Not understanding project-specific constraints
  • Proposing outdated or deprecated approaches
  • Missing subtle domain-specific issues

When to Seek Human Review

  • Security-critical code
  • Complex algorithms
  • Code with business logic
  • Performance-critical sections
  • Code you don’t fully understand
Key Points
  • AI chat interfaces provide valuable code review assistance
  • Use iterative refinement: review, improve, and re-review
  • Ask specific questions about correctness, performance, and style
  • Always validate and test AI suggestions before accepting them
  • Combine AI review with human peer review and automated tools
  • Be aware of AI limitations and seek human expertise for critical code
  • Document the review process and improvements made

Content from GitHub Issue-Driven Coding with Copilot


Last updated on 2026-01-16 | Edit this page

Estimated time: 40 minutes

Overview

Questions

  • What is GitHub issue-driven coding with Copilot?
  • How can I use GitHub issues to guide AI-assisted development?
  • What are the best practices for working with Copilot on GitHub issues?
  • How do I effectively communicate tasks to Copilot through issues?

Objectives

  • Understand the concept of issue-driven development with AI assistance
  • Learn how to structure GitHub issues for effective AI collaboration
  • Master techniques for guiding Copilot through issue-based workflows
  • Apply best practices for iterative development with issue tracking
  • Recognize when to use issue-driven AI assistance vs. direct coding

Using AI autonomously


GitHub issue-driven coding combines the structured approach of issue tracking with the power of AI-assisted development. By linking your coding work to specific GitHub issues, you create a clear workflow that helps both human collaborators and AI assistants understand the context and goals of your code changes.

This approach is particularly valuable when working with GitHub Copilot, as it provides the AI with rich context about what you’re trying to accomplish, leading to more relevant and accurate suggestions.

Furthermore, you can “employ” the AI as an additional team member, assigning it specific issues to work on in parallel with your own tasks. That way, you can leverage AI to accelerate development while focusing your efforts on reviewing the AI’s solutions and on the more critical or complex parts of the project.

What is Issue-Driven Coding?


Issue-driven coding is a development practice where:

  1. Tasks are defined as issues: Each feature, bug fix, or improvement starts as a GitHub issue
  2. Issues provide context: The issue description explains the problem, requirements, and acceptance criteria
  3. Code references issues: Commits and pull requests link back to the issues they address
  4. Progress is tracked: Issue status reflects the current state of development
Callout

Why Combine Issues with AI Assistance?

When you work on code while referencing a GitHub issue, AI assistants like Copilot can:

  • Better understand the task context
  • Generate code that aligns with the stated requirements
  • Suggest implementations that match the issue’s acceptance criteria
  • Help maintain consistency across related changes
  • Enable parallelizing progress on multiple subtasks when issues are well-defined

Use Cases for Issue-Driven AI Coding


1. Feature Implementation

Scenario: You need to add a new feature to analyze data trends.

GitHub Issue Example:

MARKDOWN

Title: Add trend analysis function to data processing module

Description:
Create a function that calculates linear trends in time series data.

Requirements:
- Accept a data frame with date and value columns
- Calculate slope and intercept using linear regression
- Return results as a named list
- Handle missing values appropriately
- Include error handling for invalid inputs

Acceptance Criteria:
- Function works with sample dataset
- Returns correct statistical values
- Handles edge cases (empty data, all NAs, single point)

Using Copilot: When you reference this issue while coding, Copilot understands you need a robust statistical function and will suggest implementations that include error handling and edge case management.

2. Bug Fixes

Scenario: Users report incorrect calculations in your analysis code.

GitHub Issue Example:

MARKDOWN

Title: Fix mean calculation ignoring NA values

Description:
The calculate_statistics() function returns NA when any value 
in the input is NA, instead of computing the mean of available values.

Steps to Reproduce:
1. Call calculate_statistics(c(1, 2, NA, 4))
2. Expected: mean should be 2.33 (mean of 1, 2, 4)
3. Actual: function returns NA

Expected Behavior:
Function should use na.rm = TRUE by default or provide an option 
to handle NA values.

Using Copilot: With this context, Copilot will suggest fixes that specifically address NA handling, such as adding na.rm = TRUE parameters.
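A hedged sketch of such a fix — `calculate_statistics()` here is a minimal stand-in for the function named in the issue, reduced to the mean calculation for illustration:

```r
# Minimal stand-in for the function from the issue, with the fix applied:
# NA handling is now an explicit parameter that defaults to TRUE
calculate_statistics <- function(x, na.rm = TRUE) {
  mean(x, na.rm = na.rm)
}

calculate_statistics(c(1, 2, NA, 4))  # mean of 1, 2, 4
```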

3. Code Refactoring

Scenario: You need to improve code performance or readability.

GitHub Issue Example:

MARKDOWN

Title: Refactor data loading functions for better performance

Description:
Current data loading is slow with large files. Refactor to use 
more efficient approaches.

Current Problems:
- Uses loops to process rows individually
- Reads entire file into memory at once
- No progress indication for large files

Proposed Improvements:
- Use vectorized operations where possible
- Implement chunked reading for large files
- Add optional progress bar
- Maintain backward compatibility

Using Copilot: The AI will suggest vectorized alternatives and modern R idioms that address the performance concerns.

4. Adding Tests

Scenario: You need to add test coverage for existing functions.

GitHub Issue Example:

MARKDOWN

Title: Add unit tests for data validation functions

Description:
Add comprehensive tests for validate_input() and sanitize_data() 
functions.

Test Coverage Needed:
- Valid input handling
- Invalid input rejection
- Edge cases (empty data, single value, extreme values)
- Type checking
- NA and NULL handling

Framework: Use testthat
Target Coverage: 90%+

Using Copilot: The AI will generate appropriate test cases using the testthat framework that cover the specified scenarios.

Challenge

Challenge 1: Create an Issue for AI-Assisted Development

Write a GitHub issue for adding a data visualization function to your R project. Include:

  • Clear title
  • Detailed description
  • Specific requirements
  • Acceptance criteria

Think about what information would help an AI assistant generate appropriate code.

MARKDOWN

Title: Create scatter plot function with trend line for correlation analysis

Description:
Implement a function to create scatter plots with optional trend lines 
for exploring relationships between variables.

Requirements:
- Function name: plot_correlation()
- Parameters:
  - data: data frame
  - x_var: name of x-axis variable
  - y_var: name of y-axis variable
  - add_trend: logical, whether to add trend line (default: TRUE)
  - title: optional plot title
- Use ggplot2 for plotting
- Include correlation coefficient in plot subtitle when trend is shown
- Return a ggplot object for further customization

Acceptance Criteria:
- Creates clear, readable scatter plot
- Trend line uses appropriate statistical method (lm)
- Handles data with missing values
- Includes proper axis labels and title
- Works with sample mtcars dataset

Example Usage:
plot_correlation(mtcars, "wt", "mpg", title = "Weight vs MPG")

This issue provides clear context that helps Copilot suggest appropriate ggplot2 code with statistical elements.
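For illustration, one way the requested function might be sketched (ggplot2 assumed installed; details such as the subtitle format are illustrative choices, not prescribed by the issue):

```r
library(ggplot2)

plot_correlation <- function(data, x_var, y_var,
                             add_trend = TRUE, title = NULL) {
  p <- ggplot(data, aes(x = .data[[x_var]], y = .data[[y_var]])) +
    geom_point() +
    labs(x = x_var, y = y_var, title = title)

  if (add_trend) {
    # Correlation coefficient in the subtitle, as the issue requires
    r <- cor(data[[x_var]], data[[y_var]], use = "complete.obs")
    p <- p +
      geom_smooth(method = "lm", se = FALSE) +
      labs(subtitle = sprintf("r = %.2f", r))
  }

  p  # return the ggplot object for further customization
}

p <- plot_correlation(mtcars, "wt", "mpg", title = "Weight vs MPG")
```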

How to Use GitHub Issues (with and without AI)


Create a Detailed Issue

Before starting to code, create a GitHub issue that clearly describes:

  1. The problem or feature: What needs to be done?
  2. Context: Why is this needed?
  3. Requirements: What are the specific technical requirements?
  4. Acceptance criteria: How will you know it’s done correctly?
  5. Examples: Input/output examples if applicable

Closing an Issue via Reference in Your Work

When you reference an issue number (prefixed with #) in a commit message or pull request using the keywords “closes”, “fixes”, or “resolves”, GitHub will automatically link the commit to the issue and close it.

In the following, some examples of referencing issues are provided:

In commit messages:

BASH

git commit -m "Add trend analysis function #42"
git commit -m "Fix NA handling in statistics - fixes #38"

In code comments:

R

# Implementation for GitHub issue #42: Add trend analysis
# See: https://github.com/username/repo/issues/42
analyze_trend <- function(data, date_col, value_col) {
  # Function implementation
}

In pull request descriptions:

MARKDOWN

Closes #42

This PR implements the trend analysis function as specified in the issue.

Changes:
- Added analyze_trend() function
- Included error handling for edge cases
- Added unit tests for validation

Using Issues as AI Context for Autocompletion While Coding

When writing code that addresses an issue:

  1. Keep the issue open in your browser
  2. Reference requirements in code comments (optionally also the GitHub repository’s URL)
  3. Write descriptive comments that mirror the issue’s language
  4. Test against acceptance criteria listed in the issue

Example workflow:

R

# GitHub Issue #42: Add trend analysis function
# Requirement: Accept data frame with date and value columns
# Requirement: Calculate slope and intercept using linear regression
# Requirement: Handle missing values appropriately

analyze_trend <- function(data, date_col = "date", value_col = "value") {
  # Input validation (Requirement: error handling for invalid inputs)
  if (!is.data.frame(data)) {
    stop("Input must be a data frame")
  }
  
  # Copilot will suggest implementations based on the comments
  # and your typing that align with the requirements
}
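For illustration, a completed implementation that Copilot might converge on could look like the following sketch (one plausible answer under the issue's requirements, not the only one):

```r
analyze_trend <- function(data, date_col = "date", value_col = "value") {
  # Input validation (Requirement: error handling for invalid inputs)
  if (!is.data.frame(data)) {
    stop("Input must be a data frame")
  }
  if (!all(c(date_col, value_col) %in% names(data))) {
    stop("Data must contain the specified date and value columns")
  }

  # Handle missing values (Requirement: handle missing values appropriately)
  ok <- !is.na(data[[date_col]]) & !is.na(data[[value_col]])
  x <- as.numeric(data[[date_col]][ok])
  y <- data[[value_col]][ok]

  if (length(y) < 2) {
    stop("Need at least two non-missing observations to fit a trend")
  }

  # Linear regression (Requirement: slope and intercept, named list output)
  fit <- lm(y ~ x)
  list(intercept = unname(coef(fit)[1]), slope = unname(coef(fit)[2]))
}
```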

Triggering Autonomous AI Coding via Issues

When you create a detailed GitHub issue, you can “assign” it to GitHub Copilot (if supported in your environment) or simply reference it while coding.

This will trigger the following workflow:

  1. GitHub Copilot reads the issue description
  2. It creates a new git branch for the issue
  3. It generates code that addresses the issue requirements within the branch
  4. It creates a pull request for review and lists all changes made
  5. It will send you a notification to review the PR
  6. You will review the code, request changes if necessary, and merge it when satisfied

That way, the AI can work semi-autonomously on well-defined tasks, freeing you to focus on higher-level design and review.

This approach works very well for:

  • Specific feature implementations
  • Bug fixes
  • Error identification and handling
  • Code refactoring
  • Documentation generation or improvement/update
  • Test case creation
  • And more…

GitHub Repositories vs. RStudio Projects


While GitHub repositories and RStudio projects are related, they serve different purposes in the development workflow.

  • GitHub Repository: A GitHub repository is a remote storage space on GitHub where your project’s files, including code, documentation, and version history, are hosted. It allows for collaboration, issue tracking, and version control using Git.
  • RStudio Project: An RStudio project is a local environment within RStudio that organizes your work. It includes settings, file paths, and configurations specific to your R development. An RStudio project can be linked to a GitHub repository for version control and collaboration.

When working with issue-driven coding, you typically create and manage issues on the GitHub repository, while you do the actual coding and testing within your local RStudio project.

Thus, you have to ensure that your RStudio project is connected to the corresponding GitHub repository and that you synchronize changes between them using Git commands.

GitHub Authentication in RStudio

In order to enable the communication between your local RStudio and the online GitHub server, you need to authenticate RStudio with your GitHub account. This can be done by generating a Personal Access Token (PAT) on GitHub and configuring RStudio to use it for authentication.

For detailed instructions on how to set this up, refer to the respective section in Happy Git and GitHub for the useR.

A general description about the integration of versioning systems like git in RStudio can be found in the RStudio User Guide.

Starting an RStudio project from a GitHub repository

Once your authentication is set up, you can clone a GitHub repository directly into RStudio to create a new project.

To this end

  1. Open RStudio
  2. Go to File -> New Project...
  3. Select Version Control
  4. Choose Git
  5. Enter the repository URL from GitHub
    • Note: ensure you use the “https://…” URL format if using PAT authentication!
  6. Select the local directory where you want to store the project
  7. Click Create Project

Afterwards, RStudio will create a new project linked to the GitHub repository, allowing you to work on the code locally while managing issues and version control through GitHub.

Within the project you can use the Git tab in RStudio to commit changes, push updates to GitHub, and pull the latest changes from the repository.

Callout

Best Practices for git-based Projects

  • Always pull the latest changes from GitHub before starting new work to avoid merge conflicts.
    • Thus, do a “pull” whenever you open RStudio to work on the project.
  • Commit changes frequently with clear messages that reference relevant issue numbers.
  • Don’t forget to “push” your commits to GitHub to keep the remote repository updated.

When working with GitHub issues, ensure that your commit messages and pull requests clearly reference the issues they address.

When using GitHub Copilot for issue-driven coding, always review the generated code carefully to ensure it meets the issue requirements before merging it into the main branch.

  • Ensure your remote repository is up-to-date before starting work on an issue.
    • Check that you have PUSHed all your local changes to GitHub.
  • Assign the issue to GitHub Copilot (if supported).
    • Review and merge the respective pull request if satisfied.
  • PULL these changes into your local RStudio project to keep it synchronized!
Key Points
  • Issue-driven development provides structure and context for AI-assisted coding
  • Detailed issues with clear requirements help AI assistants generate better code
  • Reference issue numbers in commits, comments, and code to maintain traceability
  • Use issues to guide Copilot for autonomous code generation and review
  • Effective use of GitHub repositories and RStudio projects enhances collaboration and version control
  • If you are not familiar with Git, read “Happy Git and GitHub for the useR” for a gentle introduction

Content from Wrap-Up and Next Steps


Last updated on 2026-01-02 | Edit this page

Estimated time: 20 minutes

Overview

Questions

  • What are the key best practices for AI-supported coding?
  • What additional AI applications exist beyond what we covered?
  • How can I stay current with rapidly evolving AI tools?

Objectives

  • Summarize key takeaways and best practices from the workshop
  • Identify additional AI applications for programming tasks
  • Understand the importance of staying current with AI developments
  • Recognize the ongoing evolution of AI capabilities and tools

Congratulations!


You’ve completed this journey through AI-supported coding with GitHub Copilot in RStudio!

Throughout this workshop, you’ve gained hands-on experience with various AI tools and techniques that can significantly enhance your programming productivity and capabilities.

Let’s consolidate the most important lessons into actionable best practices and look ahead to the exciting future of AI-assisted development.

Key Takeaways and Best Practices


Throughout this workshop, you’ve explored essential practices for AI-supported coding. Here are the most important lessons to remember:

Start with Proper Setup

Best Practice: Invest time in proper configuration from the beginning.

  • Ensure GitHub Copilot is correctly installed and authenticated in RStudio
  • Disable indexing of project files to maintain privacy and improve performance
  • Test your setup with simple examples before tackling complex projects

Use AI Where It Excels, Avoid Where It Doesn’t

Best Practice: Be strategic about when to use AI assistance.

  • Use AI for: Boilerplate code, documentation, learning new libraries, repetitive patterns
  • Avoid AI for: Critical algorithms, novel implementations, security-sensitive code, learning fundamentals
  • Always apply human judgment: AI is a tool, not a replacement for expertise

Always Verify and Take Responsibility

Best Practice: Never blindly accept AI suggestions.

  • You are fully responsible for all code you commit, regardless of its source
  • Test AI-generated code thoroughly, especially for edge cases and error handling
  • Review for security vulnerabilities, performance issues, and correctness
  • Maintain your programming skills; don’t become overly dependent on AI!

Write Clear Prompts for Better Results

Best Practice: Guide AI with descriptive comments and context.

  • Use clear, specific comments to describe what you need
  • Provide context about data structures, requirements, and constraints
  • Request error handling, documentation, and edge case coverage explicitly
  • Iterate on prompts if initial suggestions aren’t quite right

Provide Context for Accurate Suggestions

Best Practice: Help AI understand your project conventions.

  • Use AGENTS.md files to define project-specific guidelines and patterns
  • Structure your code clearly so AI can better understand your codebase
  • Keep related code visible in your editor to provide context
  • Define naming conventions and coding standards
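As an illustration, a minimal AGENTS.md for an R analysis project might look like this (the specific rules are examples, not requirements):

```markdown
# AGENTS.md

## Project conventions
- Language: R (tidyverse style, snake_case names)
- All exported functions carry roxygen2 documentation
- Tests live in tests/testthat/ and use the testthat framework

## Guidelines for AI assistants
- Prefer vectorized operations over loops
- Always handle NA values explicitly (na.rm or input validation)
- Reference the related GitHub issue number in commit messages
```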

Integrate AI into Your Workflow

Best Practice: Use AI systematically, not sporadically.

  • Incorporate AI into data pipelines using tools like ellmer for consistent results
    • BUT: document the model used (name, version, settings) to stay reproducible as far as possible!
  • Use AI to refactor and modernize legacy code systematically
  • Leverage AI for comprehensive documentation generation
  • Apply AI assistance to test generation and validation

Structure Work with Issues

Best Practice: Use issue-driven development to guide AI effectively.

  • Create detailed GitHub issues with clear requirements and acceptance criteria
  • Reference issues in your code to provide context for AI suggestions
  • Use AI as a collaborative team member on well-defined tasks by assigning it in the issue
  • Track progress and maintain traceability through issue links

Beyond This Workshop: Further AI Applications


The AI tools and techniques we’ve covered are just the beginning. The field is expanding rapidly, and new applications emerge constantly:

Additional AI-Powered Development Tools

Code Review and Quality Assurance:

  • AI-powered code review tools that identify bugs, security issues, and style violations
  • Automated test generation based on code analysis
  • Intelligent code coverage tools that suggest missing test cases
  • Static analysis enhanced with machine learning

Documentation and Communication:

  • Automatic generation of API documentation from code
  • AI-assisted technical writing and blog posts about your code
  • Automated creation of user guides and tutorials
  • Translation of documentation across multiple languages

Project Management and Planning:

  • AI tools for estimating development time and complexity
  • Automated task breakdown from high-level requirements
  • Intelligent prioritization of issues and features
  • Predictive analytics for project timelines and risks

AI for Data Science and Analysis

Advanced Analytics:

  • AI-assisted feature engineering for machine learning models
  • Automated exploratory data analysis and visualization suggestions
  • Intelligent data cleaning and preprocessing recommendations
  • Natural language querying of datasets

Statistical Modeling:

  • AI suggestions for appropriate statistical tests and models
  • Automated model selection and hyperparameter tuning
  • Interpretation assistance for complex statistical results
  • Validation and diagnostics guidance

Emerging AI Capabilities

Multimodal AI Tools:

  • Tools that understand both code and diagrams
  • AI that can work with screenshots and visual interfaces
  • Voice-activated coding assistants
  • AI that integrates code with database schemas and architecture diagrams

Specialized Domain AI:

  • Domain-specific AI trained on bioinformatics, finance, or other specialized fields
  • AI assistants with deep knowledge of specific R packages and ecosystems
  • Industry-specific coding assistants (e.g., for healthcare, finance, research)

Collaborative AI:

  • AI tools that learn from your team’s coding patterns
  • Shared AI knowledge bases across development teams
  • AI facilitators for code reviews and pair programming
  • Integration with team communication platforms

The Rapidly Evolving AI Landscape


AI capabilities are evolving at an unprecedented pace.

The tools and techniques you’ve learned here are just the beginning! New models, features, and applications emerge constantly.

What this means for you:

  • Tools will become more capable: stay informed about updates to Copilot and RStudio’s AI features
  • Best practices will evolve: engage with developer communities to learn from shared experiences
  • New paradigms will emerge: remain flexible and open to new approaches
  • Continuous learning is essential: the field moves fast, and staying current requires active effort

Looking to the Future


The integration of AI into programming is not a temporary trend—it represents a fundamental shift in how we develop software and data processing pipelines.

Embrace the Change

Rather than fearing AI or becoming overly dependent on it, strive for a balanced approach:

  • Use AI as a powerful tool that amplifies your capabilities
  • Maintain your core skills so you remain effective with or without AI
  • Stay curious and adaptable as new capabilities emerge
  • Think critically about AI suggestions rather than accepting them blindly
  • Share knowledge to help build a community of responsible AI users

The Human Element Remains Essential

While AI capabilities continue to grow, remember that:

  • Creativity and innovation come from human insight
  • Domain expertise cannot be fully replaced by AI
  • Ethical judgment requires human values and context
  • Quality standards must be set and enforced by people
  • Learning and growth remain uniquely human pursuits

AI is a tool that makes you more effective—but you remain the developer, the decision-maker, and the expert.

Final Thoughts


Thank you for joining us on this exploration of AI-supported coding in RStudio!

The best practices you’ve learned provide a solid foundation for working effectively with AI tools. As you continue your journey, remember to:

  • Be strategic in choosing when to use AI assistance
  • Be critical in evaluating AI suggestions and outputs
  • Be responsible for all code you commit
  • Be adaptable as AI capabilities continue to evolve
  • Be engaged with the community to share and learn

The future of programming is being written right now, and you’re part of it. Use these best practices to make AI a powerful ally in your development work while maintaining your expertise and judgment.

Callout

Keep Learning and Practicing!

Remember: AI is here to assist you, not replace you. Your judgment, expertise, and creativity are irreplaceable. Apply the best practices you’ve learned to use AI tools effectively while continuing to develop your skills and knowledge.

The best is yet to come!

Key Points
  • Always verify AI suggestions—you are responsible for all code you commit
  • Use AI strategically: excel at boilerplate and documentation, avoid critical algorithms
  • Write clear prompts and provide context for better AI suggestions
  • Structure work with GitHub issues to guide AI effectively
  • AI applications extend beyond code completion to testing, documentation, and project management
  • The AI landscape evolves rapidly—stay current through communities and experimentation
  • Human expertise, judgment, and creativity remain essential
  • Balance AI assistance with skill maintenance to avoid over-dependence
  • Treat AI as a powerful tool that amplifies your capabilities, not a replacement