This is a practice memo about a data-conversion problem that cost me hours before I could plot the data as numeric values.
Problem converting the data
I wanted to convert price data loaded from a CSV file with read.table()
(the code below was run in Jupyter Notebook).
# Load the file as originally run; read.table() splits fields on whitespace by default
data <- read.table("../resource/asnlib/publicdata/Auto Insurance data.csv", header = TRUE)
str(data)
typeof(data)
head(data)
[output]
'data.frame': 63 obs. of 1 variable:
$ Number_of_Claims.Total_Payment_in_thousands: chr "108,392.5" "19,46.2" "13,15.7" "124,422.2" ...
'list'
A data.frame: 6 × 1
Number_of_Claims.Total_Payment_in_thousands
<chr>
1 108,392.5
2 19,46.2
3 13,15.7
4 124,422.2
5 40,119.4
6 57,170.9
I mistakenly assumed that the Total_Payment_in_thousands amounts contained a “,” in a strange position. In fact, the comma is simply the column delimiter in each row: "108,392.5" means 108 claims and a total payment of 392.5. The real cause of all the trouble was loading the CSV file with read.table(), which splits fields on whitespace by default. If I had used read.csv()
directly, I would not have wasted so much time 😵
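For comparison, here is a minimal sketch of loading the same kind of data with read.csv(). It is self-contained, so it parses a few rows copied from the output above through the text argument instead of reading the real file. The header line with a comma is an assumption inferred from the mangled column name, and csv_text and claims are names made up for this illustration.

```r
# Sample rows copied from the output above. The comma-separated header line
# is an assumption inferred from the column name read.table() produced.
csv_text <- "Number_of_Claims,Total_Payment_in_thousands
108,392.5
19,46.2
13,15.7
124,422.2
40,119.4
57,170.9"

# read.csv() uses "," as the field separator, so two columns come out
claims <- read.csv(text = csv_text)
str(claims)

# Both columns are already numeric, so no string cleanup is needed
mean(claims$Total_Payment_in_thousands)
```

Passing sep = "," to read.table() should give the same two-column result.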
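Before realizing this, I struggled to remove the “,” from the list and eventually found two ways to do it. Note that stripping the comma fuses the two columns into one number (108 and 392.5 become 108392.5), so the results below are only an exercise in string conversion. This is only my second day with R, so if nothing else, the experience was a good chance to study the language.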
Solution 1: list and for loop
In principle, gsub
can convert the entire list at once. However, this did not work, because gsub
coerces each column of the list into one long character string. I also could not get gsub to work from the apply
function.
Instead, I converted the values inside a for loop.
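The coercion problem can be reproduced on a tiny data frame. The sketch below uses made-up names (toy, whole, cleaned) and two values copied from the output above. It also shows one way that gsub can be driven through lapply: passing pattern and replacement by name, so that each column lands in the x argument. This is an alternative to the loop, not what I originally ran.

```r
# Toy one-column data frame with two values from the output above
toy <- data.frame(v = c("108,392.5", "19,46.2"), stringsAsFactors = FALSE)

# gsub() coerces the data frame with as.character(), which collapses the
# whole column into a single deparsed string
whole <- gsub(",", "", toy)
length(whole)   # one string per column, not one per value

# Naming pattern and replacement lets each column fill gsub()'s x argument
cleaned <- lapply(toy, gsub, pattern = ",", replacement = "")
as.numeric(cleaned$v)
```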
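# Iterating over a data frame visits its columns, so this loop runs once
# and `i` is the whole character column rather than a single value.
payment_list <- list()
for (i in data) {
  i <- gsub(",", "", i)                    # strip the commas (vectorized)
  payment_list <- append(payment_list, i)  # one list element per value
}
# Convert each element to numeric, then flatten back into one column
payment_list.n <- lapply(payment_list, as.numeric)
df <- data.frame(p = unlist(payment_list.n))
str(df)
head(df)
mean(df$p)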
[output]
'data.frame': 63 obs. of 1 variable:
$ p: num 108392 1946 1316 124422 40119 ...
A data.frame: 6 × 1
p
<dbl>
1 108392.5
2 1946.2
3 1315.7
4 124422.2
5 40119.4
6 57170.9
16438.6634920635
Solution 2: vector and sapply function
The *apply family of functions (lapply, sapply, and so on)
applies the function given as an argument to each element of a list, vector, data frame, etc.
To avoid the gsub
coercion issue, first convert the list into a vector with unlist.
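As a small illustration of how two members of the family differ, the sketch below applies as.numeric to a short character vector. The names prices, as_list and as_vec are made up for this example, and the values are taken from the outputs in this memo.

```r
# Already-cleaned strings, taken from the outputs in this memo
prices <- c("108392.5", "1946.2", "1315.7")

# lapply() always returns a list, one element per input
as_list <- lapply(prices, as.numeric)

# sapply() simplifies the result to a vector; for an unnamed character
# input it uses the input strings themselves as names
as_vec <- sapply(prices, as.numeric)

str(as_list)
str(as_vec)
```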
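# unlist() turns the one-column data frame into a plain character vector,
# so gsub() can strip the commas element by element.
data.vec <- gsub(",", "", unlist(data))
# sapply() keeps the names created by unlist(), which become row names below
data.vec2 <- sapply(data.vec, as.numeric)
str(data.vec2)
df2 <- data.frame(p = data.vec2)
str(df2)
head(df2)
mean(df2$p)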
[output]
Named num [1:63] 108392 1946 1316 124422 40119 ...
- attr(*, "names")= chr [1:63] "Number_of_Claims.Total_Payment_in_thousands1" "Number_of_Claims.Total_Payment_in_thousands2" "Number_of_Claims.Total_Payment_in_thousands3" "Number_of_Claims.Total_Payment_in_thousands4" ...
'data.frame': 63 obs. of 1 variable:
$ p: num 108392 1946 1316 124422 40119 ...
A data.frame: 6 × 1
p
<dbl>
Number_of_Claims.Total_Payment_in_thousands1 108392.5
Number_of_Claims.Total_Payment_in_thousands2 1946.2
Number_of_Claims.Total_Payment_in_thousands3 1315.7
Number_of_Claims.Total_Payment_in_thousands4 124422.2
Number_of_Claims.Total_Payment_in_thousands5 40119.4
Number_of_Claims.Total_Payment_in_thousands6 57170.9
16438.6634920635
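The long row names in this output come from the names that unlist attaches to each element. Since as.numeric is itself vectorized, a shorter variant seems possible without sapply. The sketch below is an assumption-labeled alternative, not what I ran: it rebuilds a three-row stand-in for data, and the names values and df3 are made up.

```r
# Three-row stand-in for the loaded data frame (values from the output above)
data <- data.frame(
  Number_of_Claims.Total_Payment_in_thousands = c("108,392.5", "19,46.2", "13,15.7"),
  stringsAsFactors = FALSE
)

# use.names = FALSE stops unlist() from attaching element names;
# as.numeric() converts the whole vector at once
values <- as.numeric(gsub(",", "", unlist(data, use.names = FALSE)))

df3 <- data.frame(p = values)
str(df3)
head(df3)   # row names are plain 1, 2, 3
```

Selecting the column directly with data[[1]] instead of unlist should work just as well for a one-column data frame.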