2016/12/24

New R Package magicfor - Magic Functions to Obtain Results from for Loops in R

What is inconvenient about for loops in R? The results you compute are lost once the loop finishes. So we have created a package that stores the results automatically. All you need to do is cast a one-line spell, magic_for(). In this post, we explain how to use the magic.

1. Overview

for() is one of the most popular functions in R. As you know, it is used to create loops.

For example, let’s calculate the squares of 1 to 3.

for (i in 1:3) {
  squared <- i ^ 2
  print(squared)
}
#> [1] 1
#> [1] 4
#> [1] 9

It is very easy.

However, changing such code to store the printed results is a hassle. You must prepare a container of the correct length for the results and replace print() with an assignment statement.

result <- vector("numeric", 3) # prepare a container
for (i in 1:3) {
  squared <- i ^ 2
  result[i] <- squared         # change to assignment
}
result
#> [1] 1 4 9

Moreover, you may want to store results as a data.frame with iteration numbers.

result <- data.frame(matrix(nrow = 3, ncol = 2))
colnames(result) <- c("i", "squared")
for (i in 1:3) {
  squared <- i ^ 2
  result[i, 1] <- i
  result[i, 2] <- squared
}
result
#>   i squared
#> 1 1       1
#> 2 2       4
#> 3 3       9

What a bother!

In more troublesome situations, such as when you have to store many variables, the code grows even more complex.

The magicfor package resolves this problem while keeping the code readable.

You just add two lines before the for loop. First, load the library. Second, call magic_for(). Notice that the main for loop is kept intact.

library(magicfor)  # Load library
magic_for(print)   # Call magic_for()

for (i in 1:3) {
  squared <- i ^ 2
  print(squared)
}
#> The loop is magicalized with print().
#> [1] 1
#> [1] 4
#> [1] 9

magic_for() takes a function name and reconstructs for() so that it remembers the values passed to the specified function inside for loops. We call this magicalization. Once you have called magic_for(), you just execute for() as usual and the results are stored in memory automatically.

Here, let’s use magic_result_as_vector() to access the stored values.

magic_result_as_vector()  # Get the result
#> [1] 1 4 9

This is one of the functions for obtaining results from magicalized for loops; it returns the results as a vector.

Even if the number of observed variables increases, you can do it the same way.

magic_for(silent = TRUE)

for (i in 1:3) {
  squared <- i ^ 2
  cubed <- i ^ 3
  put(squared, cubed)
}

magic_result_as_dataframe()
#>   i squared cubed
#> 1 1       1     1
#> 2 2       4     8
#> 3 3       9    27

put() is the default function for storing values in magicalized for loops. It accepts any number of variables and displays them.

2. Installation

You can install the magicfor package from CRAN.

install.packages("magicfor")

The source code for magicfor package is available on GitHub at

3. Details

The magicfor package provides the following functions:

  • magic_for(): Magicalize for.
  • magic_free(): Cancel magicalization.
  • Get results:
    • magic_result(): as a list.
    • magic_result_as_vector(): as a vector.
    • magic_result_as_dataframe(): as a data.frame.
  • put(): Display values.

In the following, we assume that the library is loaded to use the functions.

library(magicfor)

3.1 Basics

The main function magic_for() magicalizes for loops. To magicalize means to change the behavior of for() so that it stores the values output via target functions.

magic_for()

for (i in 1:3) {
  squared <- i ^ 2
  put(squared)
}
#> The loop is magicalized with put().
#> squared: 1
#> squared: 4
#> squared: 9

The default target function is put(). It displays input values, for example:

x <- 1
put(x)
#> x: 1

You can retrieve the stored values using magic_result_**() after the for loop has finished.

magic_result_as_vector()
#> [1] 1 4 9
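
To make the idea of storing values passed to a target function concrete, here is a minimal base R sketch that imitates it with a closure. It is not how magicfor is implemented; make_recorder() and its call and result elements are hypothetical names used only for illustration.

```r
# Hypothetical helper (not part of magicfor): wraps a target function so that
# every value passed to it is also remembered.
make_recorder <- function(target) {
  stored <- list()
  list(
    call = function(value) {
      stored[[length(stored) + 1]] <<- value  # remember the value
      target(value)                           # then run the target function
    },
    result = function() unlist(stored)        # stored values as a vector
  )
}

recorder <- make_recorder(print)
for (i in 1:3) {
  squared <- i ^ 2
  recorder$call(squared)
}
#> [1] 1
#> [1] 4
#> [1] 9

recorder$result()
#> [1] 1 4 9
```

Unlike magic_for(), this sketch requires changing the loop body to call the recorder, which is exactly the kind of edit the package is meant to spare you.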

3.2 magic_for()

magic_for() has several options.

By specifying the first argument, func, you can change the target function.

magic_for(cat)

for (i in 1:3) {
  squared <- i ^ 2
  cat(squared, " ")
}
#> The loop is magicalized with cat().
#> 1  4  9

If progress = TRUE, a progress bar is shown.

magic_for(progress = TRUE)

for (i in 1:3) {
  squared <- i ^ 2
  put(squared)
}
#> |=================================================================| 100%

If you set test to a number, the loop is limited to that number of iterations.

magic_for(test = 2)

for (i in 1:100) {
  squared <- i ^ 2
  put(squared)
}
#> The loop is magicalized with put().
#> squared: 1
#> squared: 4

If silent = TRUE, the target function is not executed; only the values are stored.

If temp = TRUE, the magicalization is canceled after the for loop has been executed once.

magic_for(temp = TRUE)
is_magicalized()
#> [1] TRUE

for (i in 1:3) {
  squared <- i ^ 2
  put(squared)
}
#> The loop is temporary magicalized with put().
#> squared: 1
#> squared: 4
#> squared: 9

is_magicalized()
#> [1] FALSE

3.3 magic_free()

You can use magic_free() to cancel magicalization.

magic_for()
is_magicalized()
#> [1] TRUE

magic_free()
is_magicalized()
#> [1] FALSE

The function also clears the stored values.

magic_for(silent = TRUE)

for (i in 1:3) {
  squared <- i ^ 2
  put(squared)
}

magic_result_as_vector()
#> [1] 1 4 9

magic_free()
magic_result_as_vector()
#> NULL

3.4 magic_result_**()

You can use magic_result_**() to obtain results from magicalized for loops.

magic_for(silent = TRUE)

for (i in 1:3) {
  squared <- i ^ 2
  put(squared)
}

magic_result() returns results as a list.

magic_result()
#> $squared
#> $squared[[1]]
#> [1] 1
#> 
#> $squared[[2]]
#> [1] 4
#> 
#> $squared[[3]]
#> [1] 9

magic_result_as_vector() returns results as a vector.

magic_result_as_vector()
#> [1] 1 4 9

magic_result_as_dataframe() returns results as a data.frame.

magic_result_as_dataframe()
#>   i squared
#> 1 1       1
#> 2 2       4
#> 3 3       9

3.5 put()

put() displays input values with high flexibility.

x <- 2
y <- 3
put(x)
#> x: 2
put(x, y)
#> x: 2, y: 3
put(x, x ^ 2, x ^ 3)
#> x: 2, x^2: 4, x^3: 8
put(x, squared = x ^ 2, cubed = x ^ 3)
#> x: 2, squared: 4, cubed: 8

It is very useful for magicfor.

magic_for()

for (i in 1:3) {
  put(x = i, squared = i ^ 2, cubed = i ^ 3)
}
#> The loop is magicalized with put().
#> x: 1, squared: 1, cubed: 1
#> x: 2, squared: 4, cubed: 8
#> x: 3, squared: 9, cubed: 27

magic_result_as_dataframe(FALSE)
#>   x squared cubed
#> 1 1       1     1
#> 2 2       4     8
#> 3 3       9    27

4. Miscellaneous

Whenever you write a bare variable on its own line in a magicalized for loop, its value is stored regardless of the target function.

magic_for()

for (i in 1:3) {
  squared <- i ^ 2
  squared
}
#> The loop is magicalized with put().

magic_result_as_vector()
#> [1] 1 4 9

When you write a target function inside an if statement without else, NA is inserted for the iterations where the function is not called.

magic_for()

for (i in 1:3) {
  squared <- i ^ 2
  if(i == 3) put(squared)
}
#> The loop is magicalized with put().
#> squared: 9

magic_result_as_vector()
#> [1] NA NA  9
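
For comparison, the same NA-padded result can be produced by hand in base R by pre-filling the container with NA. This is only a sketch of the equivalent manual code, not of what magicfor does internally.

```r
result <- rep(NA_real_, 3)          # container pre-filled with NA
for (i in 1:3) {
  squared <- i ^ 2
  if (i == 3) result[i] <- squared  # iterations that skip the branch stay NA
}
result
#> [1] NA NA  9
```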

Target functions work only on top-level lines or inside if statements in magicalized for loops. For example, they do not work inside nested for loops.

magic_for()

for (i in 1:2) {
  for (j in 1:2) {
    put(i, j, i * j)
  }
}
#> The loop is magicalized with put().
#> i: 1, j: 1, i*j: 1
#> i: 1, j: 2, i*j: 2
#> i: 2, j: 1, i*j: 2
#> i: 2, j: 2, i*j: 4

magic_result_as_vector()
#> list()
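
One way around this limitation, shown here in plain base R rather than with magicfor, is to enumerate the (i, j) pairs up front so that no nested loop is needed. The data frame name grid is an arbitrary choice for this sketch.

```r
# Enumerate all combinations; the first column (j) varies fastest,
# which matches the order of the nested loops above.
grid <- expand.grid(j = 1:2, i = 1:2)
grid$product <- grid$i * grid$j
grid[, c("i", "j", "product")]
#>   i j product
#> 1 1 1       1
#> 2 1 2       2
#> 3 2 1       2
#> 4 2 2       4
```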

5. Bug Reports

  • https://github.com/hoxo-m/magicfor/issues

2016/08/18

githubinstall 0.1.0: New Feature for A Helpful Way to Install R Packages Hosted on GitHub

We have updated our githubinstall package. It is now on CRAN.

Basics

Using the package, you can install R packages hosted on GitHub without usernames.

library(githubinstall)
githubinstall("AnomalyDetection")
# It is the same as devtools::install_github("twitter/AnomalyDetection")

We introduced the package in the previous entry.

You can install or update the package as follows.

install.packages("githubinstall")

New Feature

We have added a new feature to the new version of the package.

Now you can install packages by specifying Git references (branch, tag, commit, or pull request).

Developers follow different policies for managing R packages on GitHub. If a package is developed on a "develop" branch, you may want to install the package from that branch.

gh_install_packages() has a ref argument for specifying Git references. For instance, you can install awaptools from the "develop" branch as follows:

githubinstall("awaptools", ref = "develop")

You may sometimes fail to install a package because its repository HEAD is broken. In such a case, you can pass a tag or commit to ref. In almost all cases, tags are placed on unbroken commits. For instance, you can install densratio from the “v0.0.3” tag as follows:

githubinstall("densratio", ref = "v0.0.3")

Even if you cannot find such tags, you can install packages from any commit that is not broken. For instance, you can install densratio from the “e8233e6” commit as follows:

githubinstall("densratio", ref = "e8233e6")

Finally, you may find a bug fix in the form of a pull request. In such a case, you can pass the pull request to ref using github_pull(). For instance, you can install dplyr from pull request #2058 as follows:

githubinstall("dplyr", ref = github_pull("2058"))

Bug Fixes

We have fixed some bugs reported on Issues. The details are in NEWS.

If you find bugs or need new features, we would appreciate your reporting them.

2016/06/15

githubinstall - New R Package for Easy to Install R Packages on GitHub

1. Overview

A growing number of R packages are being created by people all over the world. One reason is the devtools package, which makes it easy to develop R packages [1]. The devtools package not only facilitates the development of R packages but also provides another way to distribute them.

When developers publish R packages, CRAN [2] is commonly used. You can install packages that are available on CRAN using install.packages(). For example, you can install the dplyr package as follows:

install.packages("dplyr")

The devtools package provides install_github() that enables installing packages from GitHub.

library(devtools)
install_github("hadley/dplyr")

Therefore, developers can distribute R packages that are being developed on GitHub. Moreover, some developers have no intention of submitting their packages to CRAN. For instance, Twitter, Inc. provides the AnomalyDetection package on GitHub, but it will not be available on CRAN [3]. You can install such packages easily using devtools.

library(devtools)
install_github("twitter/AnomalyDetection")

There is a difference between install.packages() and install_github() in the required argument. install.packages() takes package names, while install_github() needs repository names. It means that when you want to install a package on GitHub you must remember its repository name correctly.

The trouble is that GitHub usernames are often hard to remember. Developers choose package names carefully so that users can understand the functionality intuitively. However, they often choose their usernames with less care. For instance, ggfortify is a great package on GitHub, but who created it? What is the username? The answer is sinhrks [4]. That is difficult to remember.

The githubinstall package provides a way to install packages on GitHub using only the package name, just like install.packages().

library(githubinstall)
githubinstall("AnomalyDetection")
Suggestion:
 - twitter/AnomalyDetection
Do you install the package? 

1: Yes (Install)
2: No (Cancel)

githubinstall() suggests the GitHub repository from package names, and asks whether you want to execute the installation.

Furthermore, you may succeed in installing a package from a vague memory of its name, because our package automatically corrects the spelling using fuzzy string search.

githubinstall("AnomaryDetection")
githubinstall("AnomalyDetect")
githubinstall("anomaly-detection")
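
The idea behind such fuzzy matching can be sketched in base R with utils::adist(), which computes edit distances between strings. This is an illustration only: the text above does not say which algorithm githubinstall uses, and known_packages and closest_package() are hypothetical names with a made-up candidate list.

```r
known_packages <- c("AnomalyDetection", "dplyr", "ggfortify")  # made-up list

# Hypothetical helper: return the candidate with the smallest edit distance
# to the query, ignoring case.
closest_package <- function(query, candidates) {
  distances <- utils::adist(query, candidates, ignore.case = TRUE)
  candidates[which.min(distances)]
}

queries <- c("AnomaryDetection", "AnomalyDetect", "anomaly-detection")
matches <- vapply(queries, closest_package, character(1),
                  candidates = known_packages, USE.NAMES = FALSE)
matches
#> [1] "AnomalyDetection" "AnomalyDetection" "AnomalyDetection"
```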

2. Installation

You can install the githubinstall package from CRAN.

install.packages("githubinstall")

The source code for githubinstall package is available on GitHub at

3. Details

The githubinstall package provides several useful functions.

  • githubinstall() or gh_install_packages()
  • gh_suggest()
  • gh_suggest_username()
  • gh_list_packages()
  • gh_search_packages()
  • gh_show_source()
  • gh_update_package_list()

These functions share the common prefix gh. githubinstall() is an alias of gh_install_packages().

To use these functions, first, you should load the package as follows.

library(githubinstall)

3.1. Install Packages from GitHub

githubinstall() lets you install packages on GitHub using only the package name.

githubinstall("AnomalyDetection")
Suggestion:
 - twitter/AnomalyDetection
Do you install the package? 

1: Yes (Install)
2: No (Cancel)

Selection: 

The function suggests GitHub repositories. If you type ‘1’ and press Enter, installation of the package begins. The suggestion is made by looking up the list of R packages on GitHub. The list is provided by Gepuro Task Views.

If multiple candidates are found, you can select one of them.

githubinstall("cats")
Select one repository or, hit 0 to cancel. 

1: amurali2/cats      cats
2: danielwilhelm/cats No description or website provided.
3: hilaryparker/cats  An R package for cat-related functions #rcatladies
4: lolibear/cats      No description or website provided.
5: rafalszota/cats    No description or website provided.
6: tahir275/cats      ff

Selection: 

githubinstall() is an alias of gh_install_packages().

gh_install_packages("AnomalyDetection")

3.2. Suggest Repositories

githubinstall() prompts you to install the suggested packages. But you may just want to know what the suggestions will be.

gh_suggest() returns the suggested repository names as a vector.

gh_suggest("AnomalyDetection")
## [1] "twitter/AnomalyDetection"
gh_suggest("cats")
## [1] "amurali2/cats"       "danielwilhelm/cats"  "davidluizrusso/cats"
## [4] "hilaryparker/cats"   "lolibear/cats"       "rafalszota/cats"    
## [7] "tahir275/cats"

In addition, gh_suggest_username() is useful if you only vaguely remember a username.

gh_suggest_username("hadly")
## [1] "hadley"
gh_suggest_username("yuhui")
## [1] "yihui"

3.3. List the Packages

gh_list_packages() returns the list of R package repositories on GitHub as a data.frame.

For example, if you want to get the repositories that have been created by hadley, run the following.

hadleyverse <- gh_list_packages(username = "hadley")
head(hadleyverse)
##   username package_name                                              title
## 1   hadley   assertthat                     User friendly assertions for R
## 2   hadley    babynames An R package contain all baby names data from the 
## 3   hadley    bigrquery          An interface to Google's bigquery from R.
## 4   hadley     bookdown                                              Watch
## 5   hadley   clusterfly An R package for visualising high-dimensional clus
## 6   hadley      decumar                           An alternative to sweave

By using the result, you can install all packages created by hadley.

repos <- with(hadleyverse, paste(username, package_name, sep="/"))
githubinstall(repos) # I have not tried it

3.4. Search Packages by a Keyword

gh_search_packages() returns the list of R package repositories on GitHub whose titles contain a given keyword.

For example, if you want to search packages that are relevant to lasso, run the following.

gh_search_packages("lasso")
##           username     package_name                                  title
## 1  ChingChuan-Chen             milr  multiple-instance logistic regressi..
## 2       YaohuiZeng         biglasso  Big Lasso: Extending Lasso Model Fi..
## 3      huayingfang          CCLasso  CCLasso: Correlation Inference for ..
## 4         mlampros FeatureSelection  Feature Selection in R using glmnet..
## 5             pnnl        glmnetLRC  Lasso and Elastic-Net Logistic Regr..
## 6       statsmaths         genlasso  Path algorithm for generalized lass..
## 7       vincent-dk         logitsgl  Fit Logistic Regression with Multi-..
## 8       vincent-dk             lsgl  Linear Multiple Output Using Sparse..
## 9       vincent-dk             msgl  High Dimensional Multiclass Classif..
## 10      vstanislas             GGEE  R Package for the Group Lasso Gene-..
## 11          zdk123       BatchStARS  R package for Stability Approach to..
## 12          zdk123           pulsar  R package for Stability Approach to..
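
Conceptually, this is a case-insensitive substring filter on the title column. Here is a minimal base R sketch that uses a tiny hand-made data frame in place of the real package list. The rows and the helper search_titles() are illustrative assumptions, not githubinstall internals.

```r
# Made-up package list with the same columns as the output above.
packages <- data.frame(
  username     = c("alice", "bob", "carol"),
  package_name = c("pkgA", "pkgB", "pkgC"),
  title        = c("Fast Lasso fitting", "Plotting helpers", "Group lasso paths"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose title contains the keyword,
# treating the keyword literally and ignoring case.
search_titles <- function(keyword, package_list) {
  hit <- grepl(tolower(keyword), tolower(package_list$title), fixed = TRUE)
  package_list[hit, ]
}

search_titles("lasso", packages)$package_name
#> [1] "pkgA" "pkgC"
```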

3.5. Show the Source Code of Functions on GitHub

gh_show_source() looks for the source code of a given function on GitHub and tries to open that location in a web browser.

gh_show_source("mutate", "dplyr")

If you have loaded the package that the function belongs to, you can input the function directly.

library(dplyr)
gh_show_source(mutate)

This function may not work well with Safari.

3.6. Update the List of R Packages

The githubinstall package uses Gepuro Task Views to get the list of R packages on GitHub. Gepuro Task Views crawls GitHub and updates its information every day. The package downloads the list of R packages from Gepuro Task Views each time it is loaded. Thus, a new R session always has the newest list of packages.

However, you may keep an R session open for a long time. In such a case, gh_update_package_list() is useful.

gh_update_package_list() updates the downloaded list of the R packages explicitly.

gh_update_package_list()

2016/04/01

densratio: New R Package for Density Ratio Estimation

1. Overview

Density ratio estimation is described as follows: given two data samples $x$ and $y$ drawn from unknown distributions $p(x)$ and $q(y)$ respectively, estimate $$ w(x) = \frac{p(x)}{q(x)} $$ where $x$ and $y$ are $d$-dimensional real vectors.

The estimated density ratio function $w(x)$ can be used in many applications such as inlier-based outlier detection [1] and covariate shift adaptation [2]. Other useful applications of density ratio estimation are summarized by Sugiyama et al. (2012) [3].

The densratio package provides a function densratio(). The object it returns contains compute_density_ratio(), a function that computes the estimated density ratio.

For example,

set.seed(3)
x <- rnorm(200, mean = 1, sd = 1/8)
y <- rnorm(200, mean = 1, sd = 1/2)

library(densratio)
result <- densratio(x, y)
result
## 
## Call:
## densratio(x = x, y = y, method = "uLSIF")
## 
## Kernel Information:
##   Kernel type:  Gaussian RBF 
##   Number of kernels:  100 
##   Bandwidth(sigma):  0.1 
##   Centers:  num [1:100, 1] 1.007 0.752 0.917 0.824 0.7 ...
## 
## Kernel Weights(alpha):
##   num [1:100] 0.4044 0.0479 0.1736 0.125 0.0597 ...
## 
## The Function to Estimate Density Ratio:
##   compute_density_ratio()

In this case, the true density ratio $w(x)$ is known, so we can compare $w(x)$ with the estimated density ratio $\hat{w}(x)$.

true_density_ratio <- function(x) dnorm(x, 1, 1/8) / dnorm(x, 1, 1/2)
estimated_density_ratio <- result$compute_density_ratio

plot(true_density_ratio, xlim=c(-1, 3), lwd=2, col="red", xlab = "x", ylab = "Density Ratio")
plot(estimated_density_ratio, xlim=c(-1, 3), lwd=2, col="green", add=TRUE)
legend("topright", legend=c(expression(w(x)), expression(hat(w)(x))), col=2:3, lty=1, lwd=2, pch=NA)
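
As a quick sanity check on the true density ratio, note that at the common mean x = 1 the exponential terms of both normal densities equal 1, so the ratio reduces to the ratio of the standard deviations, (1/2) / (1/8) = 4. The snippet below confirms this numerically in base R.

```r
# Same definition as above: density of N(1, sd = 1/8) over N(1, sd = 1/2).
true_density_ratio <- function(x) dnorm(x, 1, 1/8) / dnorm(x, 1, 1/2)

true_density_ratio(1)
#> [1] 4
```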

2. How to Install

You can install the densratio package from CRAN.

install.packages("densratio")

You can also install the package from GitHub.

install.packages("devtools") # if you have not installed "devtools" package
devtools::install_github("hoxo-m/densratio")

The source code for densratio package is available on GitHub at

3. Details

3.1. Basics

The package provides densratio(), whose result contains the function to estimate the density ratio.

For data samples x and y,

library(densratio)

x <- rnorm(200, mean = 1, sd = 1/8)
y <- rnorm(200, mean = 1, sd = 1/2)

result <- densratio(x, y)

In this case, result$compute_density_ratio() computes the estimated density ratio.

w_hat <- result$compute_density_ratio(y)
plot(y, w_hat)

3.2. Methods

densratio() has a method parameter to which you can pass "uLSIF" or "KLIEP".

  • uLSIF (unconstrained Least-Squares Importance Fitting) is the default method. This algorithm estimates density ratio by minimizing the squared loss. You can find more information in Hido et al. (2011) [1].

  • KLIEP (Kullback-Leibler Importance Estimation Procedure) is the other method. This algorithm estimates the density ratio by minimizing the Kullback-Leibler divergence. You can find more information in Sugiyama et al. (2007) [2].

Both methods assume that the density ratio is represented by a linear model: $$ w(x) = \alpha_1 K(x, c_1) + \alpha_2 K(x, c_2) + ... + \alpha_b K(x, c_b) $$ where $$ K(x, c) = \exp\left(\frac{-\|x - c\|^2}{2 \sigma ^ 2}\right) $$ is the Gaussian RBF.
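
The linear model above can be written out directly in base R. The following is a minimal sketch, not the package's implementation: gaussian_kernel() and density_ratio_model() are hypothetical helper names, and the centers, weights, and bandwidth are arbitrary values chosen for illustration, whereas densratio() determines them from the data.

```r
# Gaussian RBF: K(x, c) = exp(-||x - c||^2 / (2 * sigma^2))
gaussian_kernel <- function(x, center, sigma) {
  exp(-sum((x - center) ^ 2) / (2 * sigma ^ 2))
}

# Linear model: w(x) = sum over l of alpha_l * K(x, c_l).
# `centers` is a matrix with one kernel center per row.
density_ratio_model <- function(x, centers, alpha, sigma) {
  k <- apply(centers, 1, function(center) gaussian_kernel(x, center, sigma))
  sum(alpha * k)
}

centers <- matrix(c(0.9, 1.0, 1.1), ncol = 1)  # arbitrary 1-dimensional centers
alpha   <- c(0.5, 2.0, 0.5)                    # arbitrary kernel weights
sigma   <- 0.1                                 # arbitrary bandwidth

w_at_1 <- density_ratio_model(1.0, centers, alpha, sigma)
w_at_1
```

With these values, the two outer kernels each contribute 0.5 * exp(-0.5) and the middle one contributes 2, so the result equals 2 + exp(-0.5).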

densratio() performs two main jobs:

  • First, deciding kernel parameter $\sigma$ by cross validation,
  • Second, optimizing kernel weights $\alpha$.

As a result, you obtain compute_density_ratio().

3.3. Result and Parameter Settings

densratio() prints its result as follows:

## 
## Call:
## densratio(x = x, y = y, method = "uLSIF")
## 
## Kernel Information:
##   Kernel type:  Gaussian RBF 
##   Number of kernels:  100 
##   Bandwidth(sigma):  0.1 
##   Centers:  num [1:100, 1] 1.007 0.752 0.917 0.824 0.7 ...
## 
## Kernel Weights(alpha):
##   num [1:100] 0.4044 0.0479 0.1736 0.125 0.0597 ...
## 
## Regularization Parameter(lambda):  
## 
## The Function to Estimate Density Ratio:
##   compute_density_ratio()

  • The kernel type is fixed to Gaussian RBF.
  • The number of kernels is the number of kernels in the linear model. You can change it by setting the kernel_num parameter. By default, kernel_num = 100.
  • Bandwidth(sigma) is the Gaussian kernel bandwidth. By default, sigma = "auto", and the algorithms automatically select the optimal value by cross validation. If you set sigma to a number, that value is used. If you set it to a numeric vector, the algorithms select the optimal value among its elements by cross validation.
  • Centers are the centers of the Gaussian kernels in the linear model. They are selected at random from the data sample x, which is drawn from the numerator distribution p(x). You can find all the values in result$kernel_info$centers.
  • Kernel weights are the alpha parameters in the linear model. They are optimized by the algorithms. You can find all the values in result$alpha.
  • The function to estimate the density ratio is named compute_density_ratio().

4. Multi Dimensional Data Samples

In the above, the input data samples x and y were one-dimensional. densratio() also accepts multidimensional data samples as matrices.

For example,

library(densratio)
library(mvtnorm)

set.seed(71)
x <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/8, 2))
y <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/2, 2))

result <- densratio(x, y)
result
## 
## Call:
## densratio(x = x, y = y, method = "uLSIF")
## 
## Kernel Information:
##   Kernel type:  Gaussian RBF 
##   Number of kernels:  100 
##   Bandwidth(sigma):  0.316 
##   Centers:  num [1:100, 1:2] 1.178 0.863 1.453 0.961 0.831 ...
## 
## Kernel Weights(alpha):
##   num [1:100] 0.145 0.128 0.138 0.187 0.303 ...
## 
## Regularization Parameter(lambda):  0.1 
## 
## The Function to Estimate Density Ratio:
##   compute_density_ratio()

Also in this case, we can compare the true density ratio with the estimated density ratio.

true_density_ratio <- function(x) {
  dmvnorm(x, mean = c(1, 1), sigma = diag(1/8, 2)) /
    dmvnorm(x, mean = c(1, 1), sigma = diag(1/2, 2))
}
estimated_density_ratio <- result$compute_density_ratio

N <- 20
range <- seq(0, 2, length.out = N)
input <- expand.grid(range, range)
z_true <- matrix(true_density_ratio(input), nrow = N)
z_hat <- matrix(estimated_density_ratio(input), nrow = N)

par(mfrow = c(1, 2))
contour(range, range, z_true, main = "True Density Ratio")
contour(range, range, z_hat, main = "Estimated Density Ratio")

The dimensions of x and y must be the same.

5. References

[1] Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., & Kanamori, T. Statistical outlier detection using direct density ratio estimation. Knowledge and Information Systems 2011.

[2] Sugiyama, M., Nakajima, S., Kashima, H., von Bünau, P. & Kawanabe, M. Direct importance estimation with model selection and its application to covariate shift adaptation. NIPS 2007.

[3] Sugiyama, M., Suzuki, T. & Kanamori, T. Density Ratio Estimation in Machine Learning. Cambridge University Press 2012.