Basics of R and R Studio

# Setting Up Your Workspace

Start by installing R and RStudio. R is the programming language, and RStudio is a user-friendly interface that makes working with R much easier. Just Google “RStudio download” and follow the installation instructions. It’s completely free!

RStudio interface

When you open RStudio, you’ll see four key panes:

    • Script/Source Window (Top Left): This is where you’ll write and edit your R code.

    • Console Window (Bottom Left): Execute commands directly and view output.

    • Environment Pane (Top Right): This pane displays the objects (variables, data frames) you create in your R session.

    • Files/Plots/Help Pane (Bottom Right): Browse and manage your R scripts, data files, and project files; view the graphs generated by your code; and read documentation for functions and data sets, with access to online resources.

# Basic R Syntax

  • R as a Calculator: R can perform basic arithmetic operations:

    7 + 7
  • Variable Assignment: Assign values to variables using the arrow symbol (<-). For example:

    my_variable <- "Hello, world!"
  • Printing Values: Use the print() function to display the value of a variable or any expression:

    print(my_variable)
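As a small sketch building on the calculator and assignment examples above, here are a few more arithmetic operators combined with assignment. The variable names `radius` and `area` are made up for illustration.

```r
# Arithmetic beyond addition
2^3       # exponentiation: 8
7 %% 3    # remainder (modulo): 1
7 %/% 3   # integer division: 2

# Assign a result to a variable, then reuse it
radius <- 2
area <- pi * radius^2
print(area)

# Typing a name on its own also prints its value
area
```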

Working with Data Structures

R offers various data types and structures for storing and manipulating information.

Data Types:

  • Numeric: Whole numbers (integers) and decimal numbers (doubles)

  • Character: Text strings

  • Logical: TRUE/FALSE values

  • Complex: Numbers with real and imaginary parts

  • Raw: Binary data
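To see which type R assigns to a value, you can ask it directly. The following is a minimal sketch using `class()` and `typeof()`; the example values are arbitrary.

```r
# class() reports the type R assigns to each value
class(3.14)            # "numeric"
class(5L)              # "integer" (the L suffix requests an integer)
class("gene")          # "character"
class(TRUE)            # "logical"
class(2 + 3i)          # "complex"
class(charToRaw("A"))  # "raw"

# A whole number written without the L suffix is stored as a double
typeof(5)              # "double"
is.numeric(5L)         # TRUE: integers also count as numeric
```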

Data Structures:

  • Vectors: A sequence of elements of the same data type. Create vectors using the c() function:

    my_vector <- c(1, 2, 3, 4, 5) # Numeric vector
    my_vector_char <- c("apple", "banana", "cherry") # Character vector
  • Lists: Collections of elements of potentially different data types:

    my_list <- list(1, "hello", TRUE) # Contains a number, string, and boolean
  • Matrices: Two-dimensional arrays of elements of the same data type:

    my_matrix <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2) # Create a 2x2 matrix
  • Arrays: Multi-dimensional data structures:

    my_array <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3)) # Create a 2x3 array
  • Factors: Represent categorical variables, treating data as groups rather than individual values:

    my_factor <- factor(c("red", "green", "blue", "red")) # Create a factor
  • Data Frames: Organize data in a tabular format, with columns representing different data types and rows representing individual observations. Create data frames using the data.frame() function:

    my_df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))

**Coercion** (Changing data type)

Manual Coercion: Use functions like as.integer(), as.numeric(), as.data.frame() to convert data types explicitly.
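The sketch below shows a few explicit conversions. It also shows the automatic coercion R applies when values of different types are combined, a behaviour the text above does not cover but which often surprises newcomers.

```r
# Explicit (manual) coercion
as.numeric("3.14")   # 3.14
as.integer(3.9)      # 3 (truncates toward zero, it does not round)
as.character(42)     # "42"
as.numeric("abc")    # NA, with a warning: the text is not a number

# Automatic coercion: c() converts everything to the most general type
mixed <- c(1, "a", TRUE)
mixed                # "1" "a" "TRUE"
class(mixed)         # "character"

# Logical values behave as 1 and 0 in arithmetic
sum(c(TRUE, FALSE, TRUE))   # 2
```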

Entering Data Manually

Sometimes, you might need to enter small amounts of data directly into R. Methods for Entering Data:

  • Colon Operator: x <- 0:10 (Creates a sequence from 0 to 10)

  • seq() Function: x <- seq(1, 10, by=2) (Creates a sequence starting at 1, ending at 10, with a step of 2)

  • c() Function: x <- c(1, 4, 7, 2) (Combines individual values; note the lowercase c, since R is case-sensitive)

  • scan() Function: Allows interactive data entry (enter values followed by Enter, end with two Enter presses)

  • rep() Function: x <- rep(TRUE, 5) (Repeats a value 5 times)
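The methods above can be tried together as a short script. This is only a sketch: the variable names are arbitrary, and `scan()` is given its input through the `text` argument so the example runs without interactive typing.

```r
x1 <- 0:10                          # 0 1 2 ... 10
x2 <- seq(1, 10, by = 2)            # 1 3 5 7 9
x3 <- c(1, 4, 7, 2)                 # the values exactly as typed
x4 <- rep(TRUE, 5)                  # TRUE TRUE TRUE TRUE TRUE
x5 <- rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
x6 <- seq(0, 1, length.out = 5)     # 0.00 0.25 0.50 0.75 1.00

# scan() normally reads from the keyboard; text = supplies the input directly
x7 <- scan(text = "3 5 8", quiet = TRUE)

length(x1)   # 11
x7           # 3 5 8
```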

Help on Functions and Datasets

Use a question mark followed by the function or dataset name to access help documentation:

?mean
?mpg # mpg is a dataset from ggplot2, so run library(ggplot2) first

# Power Up R with Packages

R packages are collections of functions and tools that expand R’s capabilities.

Install Packages

  • Option 1: By using the install.packages() function:

    • install.packages("dplyr")
  • Option 2: By using pacman:

    • Install the pacman package: install.packages("pacman")

    • Load pacman: library(pacman)

    • Install (if missing) and load a set of packages: p_load(dplyr, tidyr, stringr, lubridate, ggplot2, readr)

  • To check if a package has been installed:

    • You can use the installed.packages() function. Here’s a simple way to check:

      is_installed <- "package_name" %in% rownames(installed.packages())

      Replace “package_name” with the name of the package you want to check. This will return TRUE if the package is installed, and FALSE if it’s not.
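Another common way to run this check is `requireNamespace()`, which returns TRUE or FALSE without attaching the package. The sketch below wraps it in a helper; `install_if_missing` is a hypothetical name chosen for this example, not a function from any package.

```r
# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats

# Hypothetical helper: install a package only when it is missing
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package is available afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
install_if_missing("stats")
```

A check like this at the top of a script avoids reinstalling packages every time the script runs.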

Load Packages

Once installed, you can load packages using library():

    • library(dplyr)

**To check if a package has been loaded:** You can use the isNamespaceLoaded() function or check the search() list. Here are two methods:

# Method 1
is_loaded <- isNamespaceLoaded("package_name")
# Method 2
is_loaded <- "package:package_name" %in% search()

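Again, replace “package_name” with the name of the package you want to check. The two methods answer slightly different questions. Method 1 returns TRUE when the package’s namespace is loaded, which can happen without calling library() (for example, when another package imports it). Method 2 returns TRUE only when the package is attached to the search path, which is what library() does.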
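To see the two checks side by side, the sketch below uses the `tools` package, chosen only because it ships with every R installation and is normally not attached at startup.

```r
# loadNamespace() loads a package without attaching it to the search path
loadNamespace("tools")

isNamespaceLoaded("tools")        # TRUE: the namespace is loaded
"package:tools" %in% search()     # FALSE, unless tools was attached earlier

# library() attaches the package, so it now appears in search()
library(tools)
"package:tools" %in% search()     # TRUE
```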

Check the version of an installed package

To check the version of an installed package in R, you can use the packageVersion() function. Here’s how to do it:

packageVersion("package_name")

Replace “package_name” with the name of the package you want to check. This function will return the version number of the package.

For example, to check the version of the “dplyr” package:

packageVersion("dplyr")

Or, 

print(packageVersion("dplyr"), quote = FALSE)
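The value returned by `packageVersion()` can also be compared against a minimum version, which is handy at the top of a script. This sketch uses the `stats` package because it is always installed; the threshold "3.0.0" is an arbitrary example.

```r
v <- packageVersion("stats")

# Version objects can be compared directly with version strings
v >= "3.0.0"

# Convert to plain text when you need it in a message
message("stats version: ", as.character(v))

# Stop early if the installed version is too old
if (v < "3.0.0") {
  stop("Please update R before running this script")
}
```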

Discover Useful Packages

  • CRAN (Comprehensive R Archive Network): The official repository for R packages, organized by task views (e.g., Bayesian inference, chemometrics, etc.).

  • CRANtastic: A site listing recently updated and popular R packages.

  • GitHub: A platform where developers share and collaborate on R packages.

# Data Manipulation using Tidyverse and Dplyr

  • tidyverse: A Collection of Packages: The tidyverse is a set of packages designed to work together for consistent data analysis using the tidy data format.

  • dplyr: The Data Manipulation Master: dplyr is a key tidyverse package that lets you filter, transform, and summarize data with a small set of functions.

The Pipe Operator (%>%)

The pipe operator (%>%) chains operations, passing the output of one function as input to the next. For example:

  • my_df %>%
    filter(age > 28) %>% # Filter for rows where age is greater than 28
    mutate(new_column = age * 2) # Create a new column doubling the age
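One way to understand the pipe is to compare it with the equivalent nested call. The sketch below assumes dplyr is installed and uses a small made-up data frame named `people`.

```r
library(dplyr)

people <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))

# Nested calls: read from the inside out
nested <- mutate(filter(people, age > 26), age_doubled = age * 2)

# Piped calls: read from top to bottom
piped <- people %>%
  filter(age > 26) %>%
  mutate(age_doubled = age * 2)

# Both forms produce the same result
identical(nested, piped)   # TRUE
```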

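Let’s filter by city mileage and then mutate a column in one step:

library(dplyr)
library(ggplot2) # provides the mpg dataset
mpg_filtered_and_mutated <- mpg %>%
  filter(cty >= 20) %>%
  mutate(cty_metric = cty * 0.425144) # convert miles per gallon to kilometres per litre
# Output: A new dataset with both filtering and mutation performed
View(mpg_filtered_and_mutated) # View() opens the result in RStudio's data viewer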

Subsetting Data

The filter() command allows you to select rows based on specific conditions:

  • Filtering by Condition: Get cars with city mileage of at least 20 miles per gallon:
mpg_efficient <- mpg %>% filter(cty >= 20)
# Output: A new dataset called "mpg_efficient" with only cars that meet the condition
View(mpg_efficient)
  • Filtering by Variable Value: Get cars manufactured by Ford:
mpg_ford <- mpg %>% filter(manufacturer == "ford")
# Output: A new dataset called "mpg_ford" with only Ford cars
View(mpg_ford)

Grouped Summaries

  • Grouping and Summarizing: Calculate the average city mileage for each vehicle class:
mpg_grouped_summary <- mpg %>%
  group_by(class) %>%
  summarize(avg_cty = mean(cty))
# Output: A dataset showing the average city mileage for each vehicle class
View(mpg_grouped_summary)
  • Multiple Summaries: Calculate both average and median city mileage:
mpg_grouped_summary <- mpg %>%
  group_by(class) %>%
  summarize(avg_cty = mean(cty), median_cty = median(cty))
# Output: A dataset showing both the average and median city mileage for each vehicle class
View(mpg_grouped_summary)
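The same grouping pattern can be checked by hand on a tiny dataset. The sketch below assumes dplyr is installed; `toy_cars` and its values are invented for illustration, and `n()` and `arrange()` are added to count and sort the groups.

```r
library(dplyr)

# Small made-up dataset, so ggplot2 is not needed here
toy_cars <- data.frame(
  class = c("compact", "compact", "suv", "suv", "suv"),
  cty   = c(21, 25, 14, 15, 16)
)

summary_tbl <- toy_cars %>%
  group_by(class) %>%
  summarize(n_cars = n(), avg_cty = mean(cty)) %>%  # one row per class
  arrange(desc(avg_cty))                            # most efficient class first

summary_tbl
# compact: 2 cars, average 23
# suv:     3 cars, average 15
```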

# Importing Data from Files

The most common way to get data into R is by importing it from files.

Common Data File Formats:

  • CSV (Comma Separated Values): Plain text version of a spreadsheet.

  • TXT (Text File): Simple text files.

  • XLSX (Excel Spreadsheet): Excel files.

  • JSON (JavaScript Object Notation): Data format often used for web data.

Step 1: Load the readr Package

  • library(readr)

Step 2: Import Data

  • CSV: data <- read_csv("your_file.csv")

  • TXT (tab-delimited): data <- read_delim("your_file.txt", delim = "\t")

  • XLSX: Import using the readxl package: install.packages("readxl"); library(readxl); data <- read_excel("your_file.xlsx")
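If you do not have a data file at hand, you can practise the import step with a temporary file. This sketch uses base R's `write.csv()` and `read.csv()` so that it runs without extra packages; `readr::read_csv(path)` is the counterpart of the readr call shown above. The column names and values are invented.

```r
# Write a tiny CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90.5, 82, 77.25)),
          path, row.names = FALSE)

# Confirm the file exists before trying to read it
file.exists(path)

# Import the file and inspect the result
imported <- read.csv(path)
str(imported)     # column names and types
nrow(imported)    # 3
```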

Practice


This tutorial will guide you through the fundamental data structures in R, specifically focusing on those essential for working with Bioconductor.

Vectors: Building Blocks of Data

Vectors are the foundation of many data structures in R. They hold elements of the same data type, making them efficient for storing and manipulating data.

Example:

# Create a vector of numbers
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# Print the vector
numbers
# Get the class of the vector
class(numbers)

Subsetting Vectors: You can access specific elements within a vector using indexing and names.

Example:

# Give names to the elements
names(numbers) <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
# Access elements by index
numbers[1] # Get the first element
numbers[3:5] # Get elements from index 3 to 5
# Access elements by name
numbers["a"] # Get the element named "a"
numbers[c("b", "d", "f")] # Get elements named "b", "d", and "f"

Important Note: When using names for subsetting, be aware that non-unique names can lead to confusion. Only the first match will be returned.

# Example with non-unique names
names(numbers) <- c("a", "a", "b") # Only three names are supplied; the remaining elements get NA as their name
numbers["a"] # Returns only the first element named "a"
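When names repeat and you need every match rather than the first one, a logical comparison on `names()` is one option. This is a sketch with a made-up vector called `scores`.

```r
scores <- c(10, 20, 30)
names(scores) <- c("a", "a", "b")

# Name-based indexing stops at the first match
scores["a"]                      # 10

# Comparing the names returns every matching element
scores[names(scores) == "a"]     # 10 20

# Check for duplicated names before relying on name-based subsetting
anyDuplicated(names(scores)) > 0   # TRUE
```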

Matrices: Organizing Data in Rows and Columns

Matrices are two-dimensional data structures with rows and columns. They are useful for representing tables or arrays of data.

Example:

# Create a matrix
matrix <- matrix(1:9, nrow = 3, ncol = 3)
matrix
# Get the dimensions of the matrix
dim(matrix)
# Access elements by indices
matrix[1, 2] # Access element in the first row, second column
matrix[1:2, 3] # Access elements in the first and second row, third column
# Add row names
rownames(matrix) <- c("R1", "R2", "R3")
# Add column names
colnames(matrix) <- c("C1", "C2", "C3")
matrix

Subsetting Matrices: You can use both numeric indices and names for subsetting matrices, considering their two-dimensional nature.
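A short sketch of name-based matrix subsetting follows. The matrix `m` is built with `dimnames` so that it is self-contained, and `drop = FALSE` is shown as a way to keep the matrix shape when a single column is selected.

```r
m <- matrix(1:9, nrow = 3, ncol = 3,
            dimnames = list(c("R1", "R2", "R3"), c("C1", "C2", "C3")))

m["R1", "C2"]             # 4, the same element as m[1, 2]
m[c("R1", "R3"), ]        # rows R1 and R3, all columns
m[, "C3"]                 # a named vector: 7 8 9
m[, "C3", drop = FALSE]   # still a 3 x 1 matrix
```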

Lists: Holding Diverse Data

Lists are flexible data structures capable of holding various types of objects, even of different classes.

Example:

# Create a list with different elements
# ("function" is a reserved word in R and cannot be used as a name here, so the element is called "fun")
my_list <- list(numbers = numbers, letters = letters[1:5], fun = mean)
# Print the list
my_list
# Access elements by name
my_list$numbers
my_list$letters
my_list$fun

Subsetting Lists: Similar to vectors, you can subset lists using indexing and names.

Example:

my_list[1:2] # Access the first two elements
my_list[1] # Access the first element (returns a list with one element)
my_list[[1]] # Access the first element (returns the element itself)

Important Note: Double brackets ( [[ ]] ) are crucial for accessing the elements directly within a list, as single brackets ([ ]) return a list with one element.
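The difference becomes visible as soon as you try to compute with the result. The sketch below uses a made-up list called `items`.

```r
items <- list(numbers = 1:3, label = "x")

class(items[1])      # "list": still a list, holding one element
class(items[[1]])    # "integer": the element itself

# Names work with double brackets too, and $ is a shortcut for the same thing
identical(items[["numbers"]], items$numbers)   # TRUE

# Functions that expect a vector need the element, not the one-element list
sum(items[[1]])      # 6
# sum(items[1]) would stop with an error, because a list cannot be summed
```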

Data Frames: Organizing Data for Analysis

Data frames are essential for data analysis, storing observations of different types in columns. Each column represents a variable, and each row represents an observation.

Example:

# Create a data frame with two variables
my_df <- data.frame(sex = c("M", "F", "M"), age = c(25, 30, 28))
# Print the data frame
my_df
# Access columns
my_df$sex
my_df$age
# Access rows using subsetting
my_df[1:2, ] # Access the first two rows

Data Frame Characteristics:

  • Column Orientation: Data frames are column-oriented, allowing easy access to individual variables.

  • Unique Row Names: Row names in data frames are required to be unique, ensuring clear identification of observations.
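Both characteristics can be seen in a few lines. This sketch uses a made-up data frame called `patients`; the `tryCatch()` call is only there so the script keeps running when R rejects the duplicated row names.

```r
patients <- data.frame(sex = c("M", "F", "M"), age = c(25, 30, 28))

# Column orientation: each variable is directly accessible
mean(patients$age)               # about 27.67

# Rows can be selected with a condition on a column
patients[patients$age > 26, ]

# Row names must be unique
rownames(patients) <- c("p1", "p2", "p3")
patients["p2", ]

# Assigning duplicated row names is refused with an error
result <- tryCatch(
  suppressWarnings({
    rownames(patients) <- c("p1", "p1", "p2")
    "accepted"
  }),
  error = function(e) "rejected"
)
result                           # "rejected"
```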

Converting Objects: Changing Data Types

R provides functions for converting between different data types.

Example:

# Convert data frame to matrix
as.matrix(my_df)
# Convert matrix to list
as.list(matrix)
# Convert a vector to a list
as.list(numbers)

General Conversion Function: The as function in the methods package offers a general way to convert objects of various types.

Example:

# Use the as function to convert an object
as(my_df, "matrix")
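Conversions can change the type of the stored values, so it is worth checking the result. The sketch below uses a made-up data frame called `small_df` to show what happens when a data frame with mixed column types becomes a matrix.

```r
small_df <- data.frame(sex = c("M", "F"), age = c(25, 30))

# A matrix holds one type only, so mixed columns are all turned into text
m_all <- as.matrix(small_df)
typeof(m_all)                    # "character"

# Converting only the numeric column keeps the numbers as numbers
m_num <- as.matrix(small_df["age"])
typeof(m_num)                    # "double"

# as.list() on a data frame gives one list element per column
length(as.list(small_df))        # 2

# as.list() on a matrix gives one list element per cell
length(as.list(matrix(1:4, nrow = 2)))   # 4
```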

This tutorial has provided you with a solid foundation in essential R objects for Bioconductor. You’ve learned about vectors, matrices, lists, and data frames, understanding how to create, manipulate, and convert them. This knowledge will serve you well as you explore the exciting world of Bioconductor.