
R List To/From Dataframe

This is a practice memo about data that I spent hours converting to numeric values before I could plot it.

Problem converting the data

I wanted to convert price data loaded from a CSV file with read.table() (the code below was run in a Jupyter Notebook):

data <- read.table("../resource/asnlib/publicdata/Auto Insurance data.csv", header =T)
str(data)
typeof(data)
head(data)

[output]

'data.frame':	63 obs. of  1 variable:
 $ Number_of_Claims.Total_Payment_in_thousands: chr  "108,392.5" "19,46.2" "13,15.7" "124,422.2" ...
'list'

A data.frame: 6 × 1
Number_of_Claims.Total_Payment_in_thousands
<chr>
1	108,392.5
2	19,46.2
3	13,15.7
4	124,422.2
5	40,119.4
6	57,170.9

At first I mistook the “,” in each value for a thousands separator in a weird position inside Total_Payment_in_thousands. In fact it is just the column delimiter of each row: the value before it is Number_of_Claims and the value after it is Total_Payment_in_thousands. The real cause of every problem was loading the CSV file with read.table(), which splits on whitespace by default. If I had used read.csv() (or readr's read_csv()) directly, I would not have wasted so much time 😵
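The difference between the two loaders can be sketched on a small inline sample. The three rows below are copied from the output above; the header line and the `csv_text` name are assumptions about the file layout, not the real file.

```r
# Inline sample imitating the first rows shown above (assumed file layout).
csv_text <- "Number_of_Claims,Total_Payment_in_thousands
108,392.5
19,46.2
13,15.7"

# read.table() splits on whitespace by default, so each row stays one string
# and the "," in the header is turned into "." in the column name.
wrong <- read.table(text = csv_text, header = TRUE)
str(wrong)

# read.csv() splits on "," and yields two numeric columns.
right <- read.csv(text = csv_text, header = TRUE)
str(right)
mean(right$Total_Payment_in_thousands)
```

With the real file, the same call would take the file path instead of `text =`.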

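I struggled to remove this “,” from the list and eventually found two ways to do it. Note that removing the delimiter glues the two columns into one number (108 and 392.5 become 108392.5), so the results below are an exercise in list handling rather than real payment figures. This was only my second day with R, so if anything, the experience was a good chance to study the language.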

Solution 1 : list and for loop

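Converting the entire list at once with the gsub function looks doable. However, it did not work, because gsub coerces a list to a character vector, which turns each list element into a single string. I also could not get gsub to work when calling it through the apply functions.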
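The coercion can be reproduced on a tiny example. This is only a sketch: `df` is a hypothetical stand-in for the loaded data, and the `lapply` call at the end is one approach that appears to let gsub run through the apply family after all, by passing `pattern` and `replacement` by name.

```r
# Hypothetical one-column data frame standing in for the loaded data.
df <- data.frame(p = c("108,392.5", "19,46.2"), stringsAsFactors = FALSE)

# gsub() coerces the data frame with as.character(), so the whole column
# collapses into one deparsed string instead of two cleaned values.
collapsed <- gsub(",", "", df)
length(collapsed)

# Naming pattern and replacement lets lapply() supply each column as x,
# so gsub() runs on the column vector and keeps one string per row.
cleaned <- lapply(df, gsub, pattern = ",", replacement = "")
cleaned$p
```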

Instead, I converted the elements one by one in a for loop.

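payment_list <- list()
# iterating over a data frame yields its columns, so i is the whole column vector
for (i in data){
    i <- gsub(",","",i)    # remove the "," from every string in the column
    payment_list <- append(payment_list, i)    # each string becomes one list element
}

# convert every list element to numeric
payment_list.n <- lapply(payment_list, as.numeric)
# payment_list.n

# flatten the list back into a vector and wrap it in a data frame
df <- data.frame(p=unlist(payment_list.n))
str(df)
head(df)
# df["p"]
mean(df$p)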

[output]

'data.frame':	63 obs. of  1 variable:
 $ p: num  108392 1946 1316 124422 40119 ...
A data.frame: 6 × 1
p
<dbl>
1	108392.5
2	1946.2
3	1315.7
4	124422.2
5	40119.4
6	57170.9
16438.6634920635

Solution 2 : vector and sapply function

The apply family of functions (lapply, sapply, and so on) applies the function given as an argument to each element of a list, vector, data frame, etc.
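As a minimal illustration of that behaviour, the sketch below runs as.numeric over a small character vector with both lapply and sapply. The vector `x` and the result names are made up for the example.

```r
# Made-up input: numbers stored as strings.
x <- c("1.5", "2.5", "4")

as_list <- lapply(x, as.numeric)   # always returns a list, one element per input
as_vec  <- sapply(x, as.numeric)   # simplifies to a numeric vector named after the inputs

str(as_list)
str(as_vec)
```

Because sapply names its result after a character input, the result is a "Named num" vector rather than a plain one.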

To avoid the gsub coercion issue, first convert the list into a vector with unlist:

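# unlist turns the one-column data frame into a named character vector
data.vec <- gsub(",","",unlist(data))
# as.numeric is applied to each element; the names are carried over
data.vec2 <- sapply(data.vec, as.numeric)
str(data.vec2)

# the names of the vector become the row names of the data frame
df2 <- data.frame(p=data.vec2)
str(df2)
head(df2)
mean(df2$p)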

[output]

 Named num [1:63] 108392 1946 1316 124422 40119 ...
 - attr(*, "names")= chr [1:63] "Number_of_Claims.Total_Payment_in_thousands1" "Number_of_Claims.Total_Payment_in_thousands2" "Number_of_Claims.Total_Payment_in_thousands3" "Number_of_Claims.Total_Payment_in_thousands4" ...
'data.frame':	63 obs. of  1 variable:
 $ p: num  108392 1946 1316 124422 40119 ...
A data.frame: 6 × 1
p
<dbl>
Number_of_Claims.Total_Payment_in_thousands1	108392.5
Number_of_Claims.Total_Payment_in_thousands2	1946.2
Number_of_Claims.Total_Payment_in_thousands3	1315.7
Number_of_Claims.Total_Payment_in_thousands4	124422.2
Number_of_Claims.Total_Payment_in_thousands5	40119.4
Number_of_Claims.Total_Payment_in_thousands6	57170.9
16438.6634920635
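Since gsub and as.numeric are both vectorized, the same conversion can probably be written without a loop or sapply. The sketch below assumes a one-column data frame like the one loaded earlier; the three rows and the name `df3` are stand-ins for illustration.

```r
# Stand-in for the single character column shown in the first output.
data <- data.frame(
  Number_of_Claims.Total_Payment_in_thousands = c("108,392.5", "19,46.2", "13,15.7"),
  stringsAsFactors = FALSE
)

# [[1]] extracts the column as a plain, unnamed character vector;
# gsub() and as.numeric() then work on the whole vector at once.
p <- as.numeric(gsub(",", "", data[[1]]))

df3 <- data.frame(p = p)
str(df3)
mean(df3$p)
```

Extracting the column with `[[1]]` instead of unlist also avoids the long row names seen in the Solution 2 output.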
