R Programming Formulary

Contents
Formulary
  objects()
  rm()
  c( value1 , value2)
  <- ; ->
  assign("x", values)
  range(..., na.rm = FALSE)
  length(x)
  sum(..., na.rm = FALSE)
  prod(..., na.rm = FALSE)
  mean(x, trim = 0, na.rm = FALSE, ...)
  var(x, y = NULL, na.rm = FALSE, use)
  sort(x, decreasing = FALSE, na.last = NA, ...)
  order(..., na.last = TRUE, decreasing = FALSE, method = c("auto", "shell", "radix"))
  number:number
  seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)), length.out = NULL, along.with = NULL, ...)
  rep(x, ...)
  is.na(x)
  paste(..., sep = " ", collapse = NULL, recycle0 = FALSE)
  vector[index_vector]
  as.something()
  which(x, arr.ind = FALSE, useNames = TRUE)
Files
  Download files
    download.file()
  Reading local files
    read.table()
  Read Excel files
    read.xlsx()
  Reading XML
    XML
    Tags, elements and attributes
    XPath
    xmlTreeParse()
    [[]]
    xmlSApply()
  Reading JSON
    JSON
    fromJSON()
  Reading from MySQL
    MySQL
    Download a MySQL server.
    Install RMySQL package.
    Connecting and listing databases.
    Connecting to databases and listing tables.
    Get dimensions of a specific table.
    Read from the table
  Reading from HDF5
Formulary
{base}
objects()
objects() (alternatively, ls()) displays the names of (most of) the objects which are
currently stored within R. The collection of objects currently stored is called the workspace.
rm()
Removes the named objects from the workspace.
Example:
> rm(x, y, z, ink, junk, temp, foo, bar)
c( value1 , value2)
Combines its arguments into a vector.
Examples:
> c ( 1 , 2 , 3 )
[1] 1 2 3
> c ( "h", "k" , "t" )
[1] "h" "k" "t"
<- ; ->
It is an operator to assign values to a variable.
Examples:
> x <- 2
> x
[1] 2
> 2 -> x
> x
[1] 2
> c(1,2,3,4) -> x
> x
[1] 1 2 3 4
assign("x", values)
Assigns a value to the variable whose name is given as a character string.
Examples:
> assign("x", 2)
> x
[1] 2
> assign("x", c(1,2,3,4))
> x
[1] 1 2 3 4
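assign() is mostly useful when the name of the variable is only known at run time. The following is a minimal sketch; the names sample_1 to sample_3 are hypothetical and chosen only for illustration.

```r
# Build three variable names at run time and assign a value to each.
# The names "sample_1", "sample_2", "sample_3" are hypothetical.
for (i in 1:3) {
  assign(paste0("sample_", i), i * 10)
}

sample_2          # 20
get("sample_3")   # 30; get() looks a variable up by its name
```

For plain assignments with a fixed name, the <- operator is usually the simpler choice.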
range(..., na.rm = FALSE)
Returns a vector of the form c(min(x), max(x)) containing the minimum and maximum of all the given arguments.
Examples:
> range(c(1,2,3,4,5))
[1] 1 5
> range(c(NA,3,4,5,6), na.rm = TRUE)
[1] 3 6
length(x)
Returns the number of elements in x.
Example:
> x <- c(1,3,5,7)
> length (x)
[1] 4
sum(..., na.rm = FALSE)
Returns the sum of all the elements of its arguments.
Examples:
> x <- c(2,3,4,5)
> sum(x)
[1] 14
> x <- c(2,NA,3,NA)
> sum(x,na.rm = TRUE)
[1] 5
prod(..., na.rm = FALSE)
Returns the product of all the elements of its arguments.
Examples:
> x <- c(1,2,3,4)
> prod(x)
[1] 24
> x <- c(NA,2,3,NA)
> prod(x,na.rm = TRUE)
[1] 6
mean(x, trim = 0, na.rm = FALSE, ...)
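Returns the arithmetic mean of the elements of x. trim is the fraction (0 to 0.5) of observations removed from each end of the sorted values before the mean is computed; trim = 0.5 gives the median.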
Examples:
> x <- c(2,3,4,6)
> mean(x)
[1] 3.75
> x <- c(2,3,4,6)
> mean(x,trim = 0.5)
[1] 3.5
> x <- c(2,3,NA,4,6,NA)
> mean(x,trim = 0.5,na.rm = TRUE)
[1] 3.5
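The effect of trim is easiest to see on data with an outlier. A small sketch, using a made-up vector that does not come from the original examples:

```r
# Hypothetical data: seven small values and one outlier.
x <- c(1, 2, 3, 4, 5, 6, 7, 100)

mean(x)                # 16: the outlier pulls the mean up
mean(x, trim = 0.25)   # 4.5: drops the 2 smallest and 2 largest values
mean(x, trim = 0.5)    # 4.5: same as median(x)
```

With 8 values and trim = 0.25, the values 1, 2, 7 and 100 are discarded and the mean of 3, 4, 5, 6 is returned.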
var(x, y = NULL, na.rm = FALSE, use)
Returns the sample variance of x. When y is supplied, returns the sample covariance of x and y.
Examples:
> x <- c(1,2,3,4)
> var(x)
[1] 1.666667
> x <- c(1,2,3,4)
> y <- c(5,6,7,8)
> var(x,y)
[1] 1.666667
> x <- c(1,2,3,4)
> y <- c(5,6,7,NA)
> var(x,y,na.rm = TRUE)
[1] 1
> x <- c(1,2,3,4)
> y <- c(5,6,7,NA)
> var(x,y,use = "complete.obs")
[1] 1
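To make the "sample" part explicit, the same numbers can be reproduced by hand with the n - 1 denominator. This is only a sketch to check the definition; the variable names manual_var and manual_cov are illustrative.

```r
x <- c(1, 2, 3, 4)
y <- c(5, 6, 7, 8)
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1.
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Sample covariance: products of paired deviations, divided by n - 1.
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

all.equal(manual_var, var(x))      # TRUE
all.equal(manual_cov, var(x, y))   # TRUE
```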
sort(x, decreasing = FALSE, na.last = NA, ...)
Returns a vector with the elements of x arranged in increasing order (or decreasing order if decreasing = TRUE). With the default na.last = NA, missing values are removed; na.last = TRUE puts them last and na.last = FALSE puts them first.
Examples:
> x <- c(7,3,NA,5,0,NA,1)
> sort(x)
[1] 0 1 3 5 7
> x <- c(7,3,NA,5,0,NA,1)
> sort(x,decreasing = TRUE)
[1] 7 5 3 1 0
> x <- c(7,3,NA,5,0,NA,1)
> sort(x,decreasing = TRUE,na.last = TRUE)
[1] 7 5 3 1 0 NA NA
> x <- c(7,3,NA,5,0,NA,1)
> sort(x,decreasing = TRUE,na.last = FALSE)
[1] NA NA 7 5 3 1 0
order(..., na.last = TRUE, decreasing = FALSE, method = c("auto", "shell", "radix"))
Returns a permutation which rearranges its first argument into ascending or descending order.
Examples:
> x <- c(7,3,NA,5,0,NA,1)
> order(x)
[1] 5 7 2 4 1 3 6
> x <- c(7,3,NA,5,0,NA,1)
> order(x, na.last = FALSE)
[1] 3 6 5 7 2 4 1
> x <- c(7,3,NA,5,0,NA,1)
> order(x, na.last = FALSE,decreasing = TRUE)
[1] 3 6 1 4 2 7 5
> x <- c(7,3,NA,5,0,NA,1)
> order(x, na.last = TRUE,decreasing = TRUE)
[1] 1 4 2 7 5 3 6
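Because order() returns positions rather than values, its result is normally used as an index. The sketch below shows the link with sort() and a common use, reordering the rows of a data frame; the data frame df and its columns are invented for illustration.

```r
x <- c(7, 3, NA, 5, 0, NA, 1)

# Indexing with the permutation gives the sorted vector (NAs last).
x[order(x)]                                       # 0 1 3 5 7 NA NA
identical(x[order(x)], sort(x, na.last = TRUE))   # TRUE

# Hypothetical data frame, reordered by one of its columns.
df <- data.frame(id = c("a", "b", "c"),
                 score = c(7, 3, 5),
                 stringsAsFactors = FALSE)
df_sorted <- df[order(df$score), ]
df_sorted$id                                      # "b" "c" "a"
```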
number:number
Creates a sequence of numbers from the first value towards the second in steps of 1 (or -1 if the first value is larger).
Example:
> 5:10
[1] 5 6 7 8 9 10
> pi:10
[1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593
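seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)), length.out = NULL, along.with = NULL, ...)
Creates a sequence of numbers between from and to, either with a fixed increment (by) or with a fixed number of elements (length.out, or the length of along.with).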
Example:
> seq(1,3,by = 0.5)
[1] 1.0 1.5 2.0 2.5 3.0
> seq(1,3,length.out = 6)
[1] 1.0 1.4 1.8 2.2 2.6 3.0
> seq(1,3,along.with = 4:10)
[1] 1.000000 1.333333 1.666667 2.000000 2.333333 2.666667 3.000000
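The default for by in the signature, (to - from)/(length.out - 1), explains why by and length.out are two ways of describing the same sequence. A short check, with variable names (n, step) chosen only for this sketch:

```r
from <- 1
to <- 3
n <- 6

# Increment implied by asking for n equally spaced values.
step <- (to - from) / (n - 1)   # 0.4

by_step   <- seq(from, to, by = step)
by_length <- seq(from, to, length.out = n)

all.equal(by_step, by_length)   # TRUE
```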
rep(x, ...)
Replicates the values in x: times repeats the whole vector, each repeats every element, and length.out fixes the length of the result.
Example:
> rep(5,times = 5)
[1] 5 5 5 5 5
> rep(5:7,times = 5)
[1] 5 6 7 5 6 7 5 6 7 5 6 7 5 6 7
> rep(5:7,each = 5)
[1] 5 5 5 5 5 6 6 6 6 6 7 7 7 7 7
> rep(5:7,length.out = 5)
[1] 5 6 7 5 6
is.na(x)
Indicates which elements are missing.
Examples:
> x <- c(7,3,NA,5,0,NA,1)
> is.na(x)
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
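Since TRUE counts as 1 and FALSE as 0 in arithmetic, the logical vector returned by is.na() can be summarised directly. A minimal sketch:

```r
x <- c(7, 3, NA, 5, 0, NA, 1)

sum(is.na(x))    # 2: number of missing values
mean(is.na(x))   # proportion of missing values (2 out of 7)
anyNA(x)         # TRUE: at least one value is missing
```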
paste(..., sep = " ", collapse = NULL, recycle0 = FALSE)
Concatenate vectors after converting to character.
Examples:
> paste("a","b","u","e","l","a",sep = "")
[1] "abuela"
> x <- c("a","b","u","e","l","a")
> paste(x,collapse = "")
[1] "abuela"
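The difference between sep and collapse shows up when the arguments are vectors: sep joins the arguments element by element, while collapse then joins the resulting elements into a single string. A small sketch with made-up labels:

```r
# sep works across the arguments, element by element.
labels <- paste("x", 1:3, sep = "_")
labels                                        # "x_1" "x_2" "x_3"

# collapse then joins those elements into one string.
formula_text <- paste("x", 1:3, sep = "_", collapse = "+")
formula_text                                  # "x_1+x_2+x_3"
```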
vector[index_vector]
Selects elements of a vector. The index vector can be logical, positive (positions to keep) or negative (positions to drop).
Example:
> x <- c(NA,1,NA,5,NA,8,NA,10)
> x[!is.na(x)]
[1] 1 5 8 10
> x <-seq(1,30,0.5)
> x[12:18]
[1] 6.5 7.0 7.5 8.0 8.5 9.0 9.5
> x <- 1:10
> x[-(1:5)]
[1]  6  7  8  9 10
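An index vector can also appear on the left-hand side of an assignment, which changes only the selected elements. A minimal sketch; replacing missing values with 0 is just an illustrative choice, not a recommendation from the original:

```r
x <- c(NA, 1, NA, 5, NA, 8, NA, 10)

# Replace only the missing elements (0 is an arbitrary fill value).
x[is.na(x)] <- 0
x            # 0 1 0 5 0 8 0 10

# Logical conditions select the elements that satisfy them.
x[x > 4]     # 5 8 10
```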
as.something()
Converts an object to another class; "something" stands for the target type, as in as.character() or as.list().
Examples:
> x <- 1:3
> as.character(x)
[1] "1" "2" "3"
> x <- c("a", "b", "c")
> as.list(x)
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c"
which(x, arr.ind = FALSE, useNames = TRUE)
Returns the indices of the TRUE elements of a logical object, optionally as array indices (arr.ind = TRUE). Missing values (NA) are treated as FALSE, so their positions are not returned.
Example:
> x <- c(NA, 2, 3, 4, 5, NA, 7, 8, NA, 10)
> which(x > 5)
[1]  7  8 10
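The treatment of missing values is the practical difference between indexing with which() and indexing with the logical vector itself. A short sketch using the same x:

```r
x <- c(NA, 2, 3, 4, 5, NA, 7, 8, NA, 10)

# A logical index keeps an NA for every comparison that is NA.
x[x > 5]          # NA NA 7 8 NA 10

# which() returns only the positions that are TRUE, so the NAs disappear.
idx <- which(x > 5)
x[idx]            # 7 8 10
```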
{plyr}
arrange(df, ...)
This function completes the subsetting, transforming and ordering triad with a function that
works in a similar way to subset and transform but for reordering a data frame by its columns.
Files
Download files
download.file()
• Downloads a file from the internet.
• Even if you could do this by hand, downloading from a script helps with reproducibility.
• Important parameters are url, destfile, method.
• Useful for downloading tab-delimited, csv, and other files.
Example:
> fileUrl <- "url"
> download.file(fileUrl, destfile = "./data/file.csv", method = "curl")
> fileUrl <- "url"
> download.file(fileUrl, destfile = "./data/file.xlsx", method = "curl")
Some notes about download.file():
• If the url starts with http you can use download.file().
• If the url starts with https on Windows you may be OK.
• If the url starts with https on Mac you need to set method = "curl".
• If the file is big, this might take a while.
• Be sure to record when you downloaded.
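One possible way to follow the last note is to prepare the destination folder and store the download date next to the data. This is only a sketch: the folder under tempdir() and the names data_dir, dest and dateDownloaded are assumptions, and the download.file() call is left as a comment because "url" is a placeholder.

```r
# Hypothetical destination folder; a project would typically use "./data".
data_dir <- file.path(tempdir(), "data")
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}
dest <- file.path(data_dir, "file.csv")

# With a real address in fileUrl, the download itself would be:
# download.file(fileUrl, destfile = dest, method = "curl")

# Record when the download happened, as a character string.
dateDownloaded <- date()
dateDownloaded
```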
Reading local files
read.table()
• This is the main function for reading files in R.
• Flexible and robust but requires more parameters.
• Reads the data into RAM; big data can cause problems.
• Important parameters: file, header, sep, row.names, nrows.
• Related: read.csv(), read.csv2()
Examples:
> datafile <- read.table("./data/file.csv",sep = ",", header = TRUE)
> datafile <- read.csv("./data/file.csv")
Some more important parameters:
• quote: tells R which characters delimit quoted values; quote = "" means no quotes.
• na.strings: sets the character string(s) that represent missing values.
• nrows: how many rows of the file to read.
• skip: number of lines to skip before starting to read.
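The sketch below combines these parameters on a small file written to a temporary location, so it does not depend on any existing data. The file contents, the column names and the "missing" marker are all invented for illustration.

```r
# Write a small hypothetical file: one note line, a header, four records.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "exported by a hypothetical instrument",
  "id,name,score",
  "1,Ann,9.5",
  "2,Bob,missing",
  "3,Cy,7.0",
  "4,Dee,8.0"
), path)

dat <- read.table(path,
                  sep = ",",
                  header = TRUE,
                  skip = 1,                 # skip the note line
                  nrows = 3,                # read only the first 3 records
                  quote = "",               # no quoted values in this file
                  na.strings = "missing")   # "missing" becomes NA

dat$score   # 9.5 NA 7.0
```

Because "missing" is declared in na.strings, the score column is read as numeric rather than as text.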
Read Excel files
read.xlsx()
• The package "xlsx" must be loaded.
• Important parameters: file, sheetIndex, header, colIndex, rowIndex.
Examples:
> library(xlsx)
> datafile <- read.xlsx("./data/file.xlsx", sheetIndex = 1, header = TRUE)
> colIndex <- 2:3
> rowIndex <- 1:4
> library(xlsx)
> datafile <- read.xlsx("./data/file.xlsx", sheetIndex = 1, colIndex = colIndex, rowIndex = rowIndex)
Further Notes:
• The write.xlsx function will write out an Excel file with similar arguments.
• read.xlsx2 is much faster than read.xlsx, but reading subsets of rows may be slightly unstable.
• The XLConnect package has more options for writing and manipulating Excel files.
• The XLConnect vignette is a good place to start for that package.
• In general it is advised to store your data in either a database or in comma separated files (.csv) or tab separated files (.tab/.txt), as they are easier to distribute.
Reading XML
XML
• Extensible markup language.
• Frequently used to store structured data.
• Particularly widely used in internet applications.
• Extracting XML is the basis for most web scraping.
• Components:
  ▪ Markup: labels that give the text structure.
  ▪ Content: the actual text of the document.
Tags, elements and attributes
• Tags correspond to general labels.
  ▪ Start tags <section>
  ▪ End tags </section>
  ▪ Empty tags <line-break />
• Elements are specific examples of tags.
  ▪ <Greeting> Hello world </Greeting>
• Attributes are components of the label.
  ▪ <img src="jeff.jpg" alt="instructor"/>
  ▪ <step number="3"> Connect A to B </step>
XPath
• /node : top level node.
• //node : node at any level.
• node[@attr-name] : node with an attribute name.
• node[@attr-name='value'] : node with attr-name='value'.
xmlTreeParse()
• The package "XML" must be loaded.
• Important parameters: file (a file name or URL) and useInternalNodes (abbreviated to useInternal in the example).
• xmlRoot() gives access to the root (top-level) element of the XML document.
Example:
> library(XML)
> fileurl <- "url"
> doc <- xmlTreeParse(fileurl,useInternal=TRUE)
> rootNode <- xmlRoot(doc)
[[]]
Directly accesses parts of the XML document, in the same way as elements of a list.
Examples:
> rootNode[[1]]
> rootNode[[1]][[1]]
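xmlSApply()
Applies a function to each child of a node to extract parts of the file. To extract the nodes that match an XPath expression, use xpathSApply().
Example:
> xmlSApply(rootNode,xmlValue)
> xpathSApply(rootNode, "//name", xmlValue)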
Reading JSON
JSON
• JavaScript Object Notation.
• Lightweight data storage.
• Common format for data from application programming interfaces (APIs).
• Similar structure to XML but different syntax/format.
• Data stored as:
  ▪ Numbers (double).
  ▪ Strings (double quoted).
  ▪ Boolean (true or false).
  ▪ Array (ordered, comma separated, enclosed in square brackets []).
  ▪ Object (unordered, comma separated collection of key:value pairs in curly brackets {}).
fromJSON()
• The package "jsonlite" must be loaded.
• Important parameter: txt (a JSON string, URL or file).
• You can convert a data frame to JSON with the toJSON() function.
Examples:
> library(jsonlite)
> dataJSON <- fromJSON("url")
> myjson <- toJSON(iris, pretty = TRUE)
Reading from MySQL
MySQL
• Free and widely used open source database software.
• Widely used in internet based applications.
• Data are structured in:
  ▪ Databases.
  ▪ Tables within databases.
  ▪ Fields within tables.
• Each row is called a record.
Download a MySQL server.
Installer: https://dev.mysql.com/downloads/installer/
Install RMySQL package.
Instructions: http://biostat.mc.vanderbilt.edu/wiki/Main/RMySQL
Connecting and listing databases.
• The package "RMySQL" must be loaded.
Example:
> library(RMySQL)
> ucscDb <- dbConnect(MySQL(), user = "genome", host = "genome-mysql.cse.ucsc.edu")
> result <- dbGetQuery(ucscDb, "show databases;"); dbDisconnect(ucscDb);
Connecting to databases and listing tables.
Example:
> hg19 <- dbConnect(MySQL(), user = "genome", db = "hg19", host = "genome-mysql.cse.ucsc.edu")
> alltables <- dbListTables(hg19)
Get dimensions of a specific table.
Example:
> dbListFields(hg19, "affyU133Plus2")
> dbGetQuery(hg19, "select count(*) from affyU133Plus2")
Read from the table
> affyData <- dbReadTable(hg19, "affyU133Plus2")
Reading from HDF5
• Used for storing large data sets.
• Supports storing a range of data types.
• Hierarchical data format.
• Groups: contain zero or more data sets and metadata.
  ▪ Have a group header with group name and list of attributes.
  ▪ Have a group symbol table with a list of objects in group.
• Datasets: multidimensional arrays of data elements with metadata.
  ▪ Have a header with name, datatype, dataspace, and storage layout.
  ▪ Have a data array with the data.