Basics of R and RStudio

# Setting Up Your Workspace

Start by installing R and RStudio. R is the programming language, and RStudio is a user-friendly interface that makes working with R much easier. Just Google “RStudio download” and follow the installation instructions. It’s completely free!

Installation

Download and Install R: Head to https://cran.r-project.org/ and download the R distribution for your operating system. Follow the installation instructions.
Download and Install RStudio: Visit https://www.rstudio.com/products/rstudio/download/ and download the free RStudio Desktop version. Install it after you’ve installed R.

RStudio interface

When you open RStudio, you’ll see an interface with four key panes:

- Script/Source Pane (Top Left): This is where you’ll write and edit your R code.
- Console Pane (Bottom Left): Execute commands directly and view output.
- Environment Pane (Top Right): Displays the objects (variables, data frames) you create in your R session.
- Files/Plots/Help Pane (Bottom Right): The Files tab manages your R scripts, data files, and project files. The Plots tab displays graphs generated by your code. The Help tab provides documentation for functions and data sets, as well as access to online resources.

# Basic R Syntax

R as a Calculator: R can perform basic arithmetic operations:
```
7 + 7
```
Variable Assignment: Assign values to variables using the arrow symbol (<-). For example:
```
my_variable <- "Hello, world!"
```
Printing Values: Use the print() function to display the value of a variable or any expression:
```
print(my_variable)
```
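The three ideas above (arithmetic, assignment, printing) combine naturally. Here is a minimal sketch; the variable names such as `radius` and `area` are illustrative and do not come from the original tutorial:

```r
# Arithmetic operators follow the usual precedence rules
result    <- 7 + 7 * 2   # multiplication happens first: 7 + 14
quotient  <- 17 %/% 5    # integer division
remainder <- 17 %% 5     # modulo (remainder after division)
power     <- 2^10        # exponentiation

# Assign a value, then reuse it in another expression
radius <- 3
area   <- pi * radius^2  # pi is a built-in constant

print(result)
print(area)
```

Typing a variable name on its own line in the console prints it too, so `print()` is mostly needed inside scripts, loops, and functions.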

Working with Data Structures

R offers various data types and structures for storing and manipulating information.

Data Types:

Numeric: Decimal numbers (doubles) and whole numbers (integers). A whole number is stored as a double unless it is written with an L suffix, e.g. 5L
Character: Text strings
Logical: TRUE/FALSE values
Complex: Numbers with real and imaginary parts
Raw: Binary data
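To see which type R has actually chosen for a value, you can ask it directly. The sketch below uses `typeof()`, which reports the storage type; the example values are illustrative only:

```r
# One example value for each of the basic types
values <- list(3.14, 5, 5L, "gene", TRUE, 2 + 3i, as.raw(255))

# typeof() reports how each value is stored internally
storage <- sapply(values, typeof)
print(storage)

# class() gives the higher-level label; doubles report as "numeric"
class(3.14)
class(5L)
```

Note that a bare `5` is stored as a double; only `5L` is a true integer.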

Data Structures:

Vectors: A sequence of elements of the same data type. Create vectors using the c() function:

my_vector <- c(1, 2, 3, 4, 5) # Numeric vector
my_vector_char <- c("apple", "banana", "cherry") # Character vector

Lists: Collections of elements of potentially different data types:

my_list <- list(1, "hello", TRUE) # Contains a number, string, and boolean

Matrices: Two-dimensional arrays of elements of the same data type:

my_matrix <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2) # Create a 2x2 matrix

Arrays: Multi-dimensional data structures:

my_array <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3)) # Create a 2x3 array

Factors: Represent categorical variables, treating data as groups rather than individual values:
```
my_factor <- factor(c("red", "green", "blue", "red")) # Create a factor
```
Data Frames: Organize data in a tabular format, with columns representing different data types and rows representing individual observations. Create data frames using the data.frame() function:
```
my_df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))
```
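Once an object exists, a few inspection functions tell you what you are holding. The following sketch is one way to explore the structures above; `my_array3d` is a hypothetical name, and the three-dimensional array is added here only to show that arrays can go beyond two dimensions:

```r
# A three-dimensional array: 2 rows, 3 columns, 2 layers
my_array3d <- array(1:12, dim = c(2, 3, 2))
dim(my_array3d)
my_array3d[1, 2, 2]      # row 1, column 2, layer 2

# Factors store each distinct category once, as a level
my_factor <- factor(c("red", "green", "blue", "red"))
levels(my_factor)        # levels are sorted alphabetically by default
table(my_factor)         # number of observations per level

# str() gives a compact summary of the structure of any object
my_df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 28))
str(my_df)
```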

Coercion (Changing data type)

Manual Coercion: Use functions like as.integer(), as.numeric(), as.data.frame() to convert data types explicitly.
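A short sketch of what these conversions do in practice. The last part shows automatic coercion, which the text above does not cover but which explains why a vector sometimes changes type unexpectedly:

```r
# Explicit coercion with as.*() functions
int_value <- as.integer(3.9)          # fractional part is dropped, not rounded
num_value <- as.numeric("3.14")
chr_value <- as.character(42)
lgl_value <- as.logical(c(0, 1, 2))   # zero is FALSE, anything else is TRUE

# Values that cannot be converted become NA (R also issues a warning)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)

# Automatic coercion: c() converts everything to the most general type
mixed <- c(1, "a", TRUE)
class(mixed)
```

Because a vector can hold only one type, mixing numbers and text silently turns the numbers into text.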

Entering Data Manually

Sometimes, you might need to enter small amounts of data directly into R. Methods for Entering Data:

Colon Operator: x <- 0:10 (Creates a sequence from 0 to 10)
seq() Function: x <- seq(1, 10, by=2) (Creates a sequence starting at 1, ending at 10, with a step of 2)
c() Function: x <- c(1, 4, 7, 2) (Combines individual values; note the lowercase c, since R is case-sensitive)
scan() Function: Allows interactive data entry (enter values followed by Enter, end with two Enter presses)
rep() Function: x <- rep(TRUE, 5) (Repeats a value 5 times)
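The methods above can be tried side by side. This is a small sketch with illustrative variable names; `length.out`, `each`, and `times` are standard arguments of `seq()` and `rep()` that the list above does not mention:

```r
x_colon <- 0:10                          # 0, 1, 2, ..., 10
x_seq   <- seq(1, 10, by = 2)            # 1 3 5 7 9
x_len   <- seq(0, 1, length.out = 5)     # five evenly spaced values from 0 to 1
x_c     <- c(1, 4, 7, 2)
x_rep   <- rep(TRUE, 5)
x_each  <- rep(c("a", "b"), each = 2)    # repeat each element in turn
x_times <- rep(c("a", "b"), times = 2)   # repeat the whole vector

length(x_colon)
x_seq
```

`scan()` is left out because it waits for keyboard input and only makes sense in an interactive session.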

Help on Functions and Datasets

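Use a question mark followed by the function or dataset name to access help documentation. The package that provides it must be loaded first; the mpg dataset, for example, comes with ggplot2: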

?mean
?mpg
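`?mean` is shorthand for `help("mean")`. When you do not need the full help page, or do not remember the exact name of a function, a couple of base R helpers may be enough. A minimal sketch:

```r
# args() shows a function's arguments without opening the help page
args(sd)

# apropos() lists objects whose names match a pattern
mean_like <- apropos("^mean")
print(mean_like)

# Interactive alternatives (commented out because they open a help viewer):
# help("mean")                        # same as ?mean
# help.search("standard deviation")   # same as ??"standard deviation"
```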

# Power Up R with Packages

R packages are collections of functions and tools that expand R’s capabilities.

Install Packages

Option 1: By using the install.packages() function:
- install.packages("dplyr")
Option 2: By using pacman:
- Install the pacman package: install.packages("pacman")
- Load pacman: library(pacman)
- Install (if missing) and load a set of packages: p_load(dplyr, tidyr, stringr, lubridate, ggplot2, readr)
To check if a package has been installed:
- You can use the installed.packages() function. Here’s a simple way to check:
```
is_installed <- "package_name" %in% rownames(installed.packages())
```
  Replace “package_name” with the name of the package you want to check. This will return TRUE if the package is installed, and FALSE if it’s not.
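`installed.packages()` scans every installed package, which can be slow on large libraries. One common alternative is `requireNamespace()`. The sketch below wraps it in a helper; `has_package` is a hypothetical name chosen for this example, not a base R function:

```r
# has_package() is a hypothetical helper, not part of base R
has_package <- function(pkg) {
  # requireNamespace() returns TRUE or FALSE; as a side effect it loads
  # (but does not attach) the package when it is available
  requireNamespace(pkg, quietly = TRUE)
}

has_package("stats")                      # stats ships with every R installation
has_package("surely.not.a.package.xyz")   # a made-up name that should not exist

# Install only when missing (commented out because it needs network access)
# if (!has_package("dplyr")) install.packages("dplyr")
```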

Load Packages

Once installed, you can load packages using library():

- library(dplyr)

To check if a package has been loaded: You can use the isNamespaceLoaded() function or check the search() list. Here are two methods:

# Method 1
is_loaded <- isNamespaceLoaded("package_name")

# Method 2
is_loaded <- "package:package_name" %in% search()

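Again, replace “package_name” with the name of the package you want to check. Method 1 returns TRUE as soon as the package’s namespace is loaded, even if it was only pulled in by another package. Method 2 returns TRUE only if the package has been attached, for example with library().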
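The two checks can give different answers for the same package. The sketch below uses `tools`, a package that ships with R but is normally not attached at startup, to show a namespace that is loaded without being on the search path:

```r
# Load the namespace of 'tools' without attaching it
invisible(loadNamespace("tools"))

ns_loaded   <- isNamespaceLoaded("tools")       # Method 1
is_attached <- "package:tools" %in% search()    # Method 2; usually FALSE here

print(c(loaded = ns_loaded, attached = is_attached))

# Functions from a loaded but unattached package need the :: prefix
ext <- tools::file_ext("counts.csv")
print(ext)
```

In everyday work, Method 2 is usually the one you want: it tells you whether you can call the package's functions without the `::` prefix.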

Check the version of an installed package

To check the version of an installed package in R, you can use the packageVersion() function. Here’s how to do it:

packageVersion("package_name")

Replace “package_name” with the name of the package you want to check. This function will return the version number of the package.

For example, to check the version of the “dplyr” package:

packageVersion("dplyr")

Or,

print(packageVersion("dplyr"), quote = FALSE)
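A version object can also be compared against a minimum requirement, which is handy at the top of a script. A small sketch, using the `stats` package only because it is always installed; the threshold "3.0.0" is an arbitrary example:

```r
# packageVersion() returns a version object, not a plain string
v <- packageVersion("stats")
print(v)

# Version objects can be compared directly with a version string
meets_minimum <- v >= "3.0.0"
print(meets_minimum)

# Convert to text when you need to paste the version into a message
version_text <- as.character(v)
message("stats version: ", version_text)
```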

Discover Useful Packages

CRAN (Comprehensive R Archive Network): The official repository for R packages, organized by task views (e.g., Bayesian inference, chemometrics, etc.).
CRANtastic: A site listing recently updated and popular R packages.
GitHub: A platform where developers share and collaborate on R packages.

# Data Manipulation using Tidyverse and Dplyr

Tidyverse: A Collection of Packages: The tidyverse is a set of packages designed to work together for consistent data analysis on tidy data, where each column is a variable and each row is an observation.
Dplyr: The Data Manipulation Master: dplyr is a core tidyverse package for filtering, transforming, and summarizing data frames.

The Pipe Operator (%>%)

The pipe operator (%>%) chains operations, passing the output of one function as the first argument of the next. It becomes available once dplyr is loaded. For example:

my_df %>%
    filter(age > 28) %>% # Filter for rows where age is greater than 28
    mutate(new_column = age * 2) # Create a new column doubling the age

Let’s filter the mpg dataset (included with ggplot2) by city mileage and then add a converted column in one step:

mpg_filtered_and_mutated <- mpg %>%
  filter(cty >= 20) %>%
  mutate(cty_metric = cty * 0.425144) # Convert miles per gallon to km per litre
# Output: A new dataset with both filtering and mutation performed

View(mpg_filtered_and_mutated)

Subsetting Data

The filter() command allows you to select rows based on specific conditions:

Filtering by Condition: Get cars with City mileage at least 20 miles per gallon:

mpg_efficient <- mpg %>% filter(cty >= 20)
# Output: A new dataset called "mpg_efficient" with only cars that meet the condition

View(mpg_efficient)

Filtering by Variable Value: Get cars manufactured by Ford:

mpg_ford <- mpg %>% filter(manufacturer == "ford")
# Output: A new dataset called "mpg_ford" with only Ford cars

View(mpg_ford)

Grouped Summaries

Grouping and Summarizing: Calculate the average City mileage for each vehicle class:

mpg_grouped_summary <- mpg %>%
  group_by(class) %>%
  summarize(avg_cty = mean(cty))
# Output: A dataset showing the average City mileage for each vehicle class

View(mpg_grouped_summary)

Multiple Summaries: Calculate both average and median City mileage:

mpg_grouped_summary <- mpg %>%
  group_by(class) %>%
  summarize(avg_cty = mean(cty), median_cty = median(cty))
# Output: A dataset showing both the average and median City mileage for each vehicle class

View(mpg_grouped_summary)
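If you want to check what a grouped summary computes without installing anything, base R can do the same job. The sketch below is a base R stand-in, not the dplyr approach shown above; it uses the built-in `mtcars` dataset instead of `mpg`, grouping fuel economy (`mpg` column) by number of cylinders (`cyl`):

```r
# One summary per group: formula interface of aggregate()
avg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(avg_by_cyl)

# Several summaries at once: one tapply() call per statistic
summary_by_cyl <- data.frame(
  cyl        = sort(unique(mtcars$cyl)),
  avg_mpg    = as.vector(tapply(mtcars$mpg, mtcars$cyl, mean)),
  median_mpg = as.vector(tapply(mtcars$mpg, mtcars$cyl, median))
)
print(summary_by_cyl)
```

The result has one row per group, which is the same shape that `group_by()` followed by `summarize()` produces.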

# Importing Data from Files

The most common way to get data into R is by importing it from files.

Common Data File Formats:

CSV (Comma Separated Values): Plain text version of a spreadsheet.
TXT (Text File): Simple text files.
XLSX (Excel Spreadsheet): Excel files.
JSON (JavaScript Object Notation): Data format often used for web data.

Step 1: Load the readr Package

library(readr)

Step 2: Import Data

CSV: data <- read_csv("your_file.csv")
TXT: data <- read_delim("your_file.txt", delim = "\t")
XLSX: Import using the readxl package: install.packages("readxl"); library(readxl); data <- read_excel("your_file.xlsx")
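To try an import without having a data file at hand, you can write a small file first and read it back. The sketch below uses base R's `read.csv()` and `read.delim()` rather than readr, so it runs without extra packages; the gene names and counts are made-up example values:

```r
# Write a small data frame to a temporary CSV file, then read it back
example_df <- data.frame(gene = c("BRCA1", "TP53", "EGFR"),
                         count = c(120, 85, 240))
csv_path <- tempfile(fileext = ".csv")
write.csv(example_df, csv_path, row.names = FALSE)

imported <- read.csv(csv_path)
str(imported)

# Tab-delimited text: read.delim() uses sep = "\t" by default
tsv_path <- tempfile(fileext = ".txt")
write.table(example_df, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
imported_tsv <- read.delim(tsv_path)
str(imported_tsv)
```

With readr loaded, `read_csv(csv_path)` and `read_delim(tsv_path, delim = "\t")` would read the same files and return tibbles instead of plain data frames.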

Practice

This tutorial will guide you through the fundamental data structures in R, specifically focusing on those essential for working with Bioconductor.

Vectors: Building Blocks of Data

Vectors are the foundation of many data structures in R. They hold elements of the same data type, making them efficient for storing and manipulating data.

Example:

# Create a vector of numbers
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# Print the vector
numbers
# Get the class of the vector
class(numbers)

Subsetting Vectors: You can access specific elements within a vector using indexing and names.

Example:

# Give names to the elements
names(numbers) <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
# Access elements by index
numbers[1] # Get the first element
numbers[3:5] # Get elements from index 3 to 5
# Access elements by name
numbers["a"] # Get the element named "a"
numbers[c("b", "d", "f")] # Get elements named "b", "d", and "f"

Important Note: When using names for subsetting, be aware that non-unique names can lead to confusion. Only the first match will be returned.

# Example with non-unique names
# (only three names are supplied, so the remaining elements get NA as their name)
names(numbers) <- c("a", "a", "b")
numbers["a"] # Returns the first element with name "a"
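If you do need every element that shares a name, compare the names explicitly instead of indexing by name. A minimal sketch with an illustrative vector called `scores`:

```r
# An illustrative vector in which the name "a" appears twice
scores <- c(a = 10, a = 20, b = 30)

first_a <- scores["a"]                   # only the first element named "a"
all_a   <- scores[names(scores) == "a"]  # every element named "a"

print(first_a)
print(all_a)

# anyDuplicated() returns 0 when all names are unique,
# otherwise the position of the first repeated name
dup_position <- anyDuplicated(names(scores))
print(dup_position)
```

Checking `anyDuplicated(names(x)) == 0` before subsetting by name is a cheap way to catch this problem early.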

Matrices: Organizing Data in Rows and Columns

Matrices are two-dimensional data structures with rows and columns. They are useful for representing tables or arrays of data.

Example:

# Create a matrix
matrix <- matrix(1:9, nrow = 3, ncol = 3)
matrix
# Get the dimensions of the matrix
dim(matrix)
# Access elements by indices
matrix[1, 2] # Access element in the first row, second column
matrix[1:2, 3] # Access elements in the first and second row, third column
# Add row names
rownames(matrix) <- c("R1", "R2", "R3")
# Add column names
colnames(matrix) <- c("C1", "C2", "C3")
matrix

Subsetting Matrices: You can use both numeric indices and names for subsetting matrices, considering their two-dimensional nature.
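Here is a short sketch of name-based subsetting on a matrix. It builds its own matrix `m` (a hypothetical name) with the same values and dimension names as the example above, so it can be run on its own:

```r
m <- matrix(1:9, nrow = 3, ncol = 3,
            dimnames = list(c("R1", "R2", "R3"), c("C1", "C2", "C3")))

single   <- m["R1", "C2"]            # one element, selected by row and column name
two_rows <- m[c("R1", "R2"), "C3"]   # rows R1 and R2 of column C3
one_row  <- m["R2", ]                # a whole row, returned as a named vector

# drop = FALSE keeps the result as a matrix instead of simplifying to a vector
row_as_matrix <- m["R2", , drop = FALSE]
dim(row_as_matrix)
```

The `drop = FALSE` argument matters when later code expects a matrix, for example when a selection happens to return only one row.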

Lists: Holding Diverse Data

Lists are flexible data structures capable of holding various types of objects, even of different classes.

Example:

# Create a list with different elements
# ("function" is a reserved word in R, so the element is named "fun" instead)
my_list <- list(numbers = numbers, letters = letters[1:5], fun = mean)
# Print the list
my_list
# Access elements by name
my_list$numbers
my_list$letters
my_list$fun

Subsetting Lists: Similar to vectors, you can subset lists using indexing and names.

Example:

my_list[1:2] # Access the first two elements
my_list[1] # Access the first element (returns a list with one element)
my_list[[1]] # Access the first element (returns the element itself)

Important Note: Double brackets ( [[ ]] ) are crucial for accessing the elements directly within a list, as single brackets ([ ]) return a list with one element.
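The difference is easiest to see by checking the class of each result. A minimal sketch with an illustrative list called `sample_info`:

```r
# An illustrative list with two elements of different types
sample_info <- list(ids = 1:3, tissue = c("liver", "brain"))

single_bracket <- sample_info[1]     # still a list, of length one
double_bracket <- sample_info[[1]]   # the integer vector itself

class(single_bracket)
class(double_bracket)

# $name is shorthand for [["name"]]
same_element <- identical(sample_info$tissue, sample_info[["tissue"]])

# Arithmetic works on the extracted element, not on the one-element list
total <- sum(sample_info[[1]])
print(total)
```

Calling `sum(sample_info[1])` instead would fail, because `sum()` cannot add up a list.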

Data Frames: Organizing Data for Analysis

Data frames are essential for data analysis, storing observations of different types in columns. Each column represents a variable, and each row represents an observation.

Example:

# Create a data frame with two variables
my_df <- data.frame(sex = c("M", "F", "M"), age = c(25, 30, 28))
# Print the data frame
my_df
# Access columns
my_df$sex
my_df$age
# Access rows using subsetting
my_df[1:2, ] # Access the first two rows

Data Frame Characteristics:

Column Orientation: Data frames are column-oriented, allowing easy access to individual variables.
Unique Row Names: Row names in data frames are required to be unique, ensuring clear identification of observations.
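Both characteristics can be checked directly. The sketch below uses its own small data frame `df` (values are illustrative) and wraps the row-name assignment in `tryCatch()` so the error is captured rather than stopping the script:

```r
df <- data.frame(sex = c("M", "F", "M"), age = c(25, 30, 28))

# Column orientation: each column is a vector with its own type
col_types <- sapply(df, class)
print(col_types)

# Rows are usually selected with a logical condition on a column
older <- df[df$age > 26, ]
print(older)

# Assigning duplicated row names is an error
dup_result <- tryCatch(
  {
    rownames(df) <- c("s1", "s1", "s2")
    "ok"
  },
  error = function(e) "error"
)
print(dup_result)
```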

Converting Objects: Changing Data Types

R provides functions for converting between different data types.

Example:

# Convert data frame to matrix
as.matrix(my_df)
# Convert matrix to list
as.list(matrix)
# Convert a vector to a list
as.list(numbers)
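One thing to watch for when converting a data frame to a matrix: a matrix can hold only one data type, so mixed columns are all turned into text. A small sketch with illustrative data frames:

```r
mixed_df   <- data.frame(sex = c("M", "F"), age = c(25, 30))
numeric_df <- data.frame(height = c(1.70, 1.82), age = c(25, 30))

# Mixed column types: every value becomes a character string
mixed_mat <- as.matrix(mixed_df)
typeof(mixed_mat)

# All-numeric columns: the matrix stays numeric
num_mat <- as.matrix(numeric_df)
typeof(num_mat)

# as.list() on a data frame returns one list element per column
column_names <- names(as.list(numeric_df))
print(column_names)
```

If you need a numeric matrix for downstream calculations, select the numeric columns before calling `as.matrix()`.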

General Conversion Function: The as function in the methods package offers a general way to convert objects of various types.

Example:

# Use the as function to convert an object
as(my_df, "matrix")

This tutorial has provided you with a solid foundation in essential R objects for Bioconductor. You’ve learned about vectors, matrices, lists, and data frames, understanding how to create, manipulate, and convert them. This knowledge will serve you well as you explore the exciting world of Bioconductor.
