6.1 RMySQL数据库编程指南
问题
R语言如何访问MySQL?
引言
MySQL是一款最常用的开源数据库软件,安装简单(关于MySQL的安装和配置,请参考附录B),运行稳定,非常适用于中小型的数据存储。R作为数据分析的工具,当然要支持数据库驱动接口。让R和MySQL配合在一起,所能爆发出的能量是巨大的。
RMySQL是一个R语言程序包,提供了访问MySQL数据库的R语言接口程序,RMySQL的依赖于DBI包。RMySQL不仅提供了基本的数据库访问,SQL查询,还封装了一些方法,比如读整表、分页、data.frame快速插入等的功能。掌握好RMySQL,数据库编辑将得心应手!
由于操作系统的原因,Windows和Linux有不一样的字符集,不一样的运行时环境。所以,本节将分别介绍如何在Linux和Windows上面安装和使用RMySQL。
6.1.1 RMySQL在Linux下安装
Linux系统环境:
- Linux: Ubuntu 12.04.2 LTS 64bit server
- Linux字符集: en_US.UTF-8
- R: 3.0.1, x86_64-pc-linux-gnu (64-bit)
- MySQL: Ver 14.14 Distrib 5.5.29 64bit server
- MySQL字符集: utf8
1. 查看Linux下的系统环境
~ uname -a # Linux内核
Linux conan 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
~ cat /etc/issue # Linux版本
Ubuntu 12.04.2 LTS \n \l
~ locale # Linux系统字符集设置LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
~ mysql --version # 查看MySQL数据库服务器版本
mysql Ver 14.14 Distrib 5.5.29, for debian-linux-gnu (x86_64) using readline 6.2
mysql> show variables like '%char%'; # 查看MySQL字符集
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
2. 在R环境中安装RMySQL
~ R # 启动R程序
> install.packages('RMySQL') # 安装RMySQL
# 省略部分输出
Configuration error:
could not find the MySQL installation include and/or library
directories. Manually specify the location of the MySQL
libraries and the header files and re-run R CMD INSTALL.
INSTRUCTIONS:
1. Define and export the 2 shell variables PKG_CPPFLAGS and
PKG_LIBS to include the directory for header files (*.h)
and libraries, for example (using Bourne shell syntax):
export PKG_CPPFLAGS="-I"
export PKG_LIBS="-L -lmysqlclient"
Re-run the R INSTALL command:
R CMD INSTALL RMySQL_.tar.gz
2. Alternatively, you may pass the configure arguments
--with-mysql-dir= (distribution directory)
or
--with-mysql-inc= (where MySQL header files reside)
--with-mysql-lib= (where MySQL libraries reside)
in the call to R INSTALL --configure-args='...'
R CMD INSTALL --configure-args='--with-mysql-dir=DIR' RMySQL_.tar.gz
ERROR: configuration failed for package ‘RMySQL’
* removing ‘/home/conan/R/x86_64-pc-linux-gnu-library/3.0/RMySQL’
The downloaded source packages are in
‘/tmp/Rtmpu0Gn88/downloaded_packages’
Warning message:
In install.packages("RMySQL") :
installation of package ‘RMySQL’ had non-zero exit status
安装出错了,提示我们需要增加MySQL安装目录的配置参数 --with-mysql-dir。下面我们解决安装出错的问题。
~ sudo apt-get install libdbd-mysql libmysqlclient-dev # 安装mysql类库
~ whereis mysql # 找到mysql的安装目录
mysql: /usr/bin/mysql /etc/mysql /usr/lib/mysql /usr/bin/X11/mysql /usr/share/mysql /usr/share/man/man1/mysql.1.gz
~ ls /tmp/Rtmpu0Gn88/downloaded_packages # 找到刚刚下载的文件,RMySQL_.tar.gz
DBI_0.2-7.tar.gz RMySQL_0.9-3.tar.gz
~ R CMD INSTALL --configure-args='--with-mysql-dir=/usr/lib/mysql' /tmp/Rtmpu0Gn88/downloaded_packages/RMySQL_0.9-3.tar.gz # 通过命令重新安装RMySQL
# 省略部分输出
installing to /home/conan/R/x86_64-pc-linux-gnu-library/3.0/RMySQL/libs
** R
** inst
** preparing package for lazy loading
Creating a generic function for ‘format’ from package ‘base’ in package ‘RMySQL’
Creating a generic function for ‘print’ from package ‘base’ in package ‘RMySQL’
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (RMySQL)
这次RMySQL安装成功。
3. 在MySQL中建库建表
~ mysql -uroot –p # 通过mysql命令行客户端工具,登陆本地MySQL
mysql> create database rmysql; # 创建一个数据库叫rmysql
Query OK, 1 row affected (0.00 sec)
mysql> grant all on rmysql.* to rmysql@localhost identified by 'rmysql'; # 授权rmysql用户,可以本地操作rmysql库
Query OK, 0 rows affected (0.00 sec)
mysql> use rmysql # 切换到rmysql库
Database changed
mysql> CREATE TABLE t_user( # 创建一个表 t_user
-> id INT PRIMARY KEY AUTO_INCREMENT,
-> user varchar(12) NOT NULL UNIQUE
-> )ENGINE=INNODB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.07 sec)
mysql> INSERT INTO t_user(user) values('A1'),('AB'),('fens.me'); # 插入3条数据
Query OK, 3 rows affected (0.04 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM t_user; # 查询数据
+----+---------+
| id | user |
+----+---------+
| 1 | A1 |
| 2 | AB |
| 3 | fens.me |
+----+---------+
3 rows in set (0.00 sec)
4. 通过R程序,读MySQL数据库数据
~ R # 启动R程序
> library(RMySQL) # 加载RMySQL
Loading required package: DBI
> conn <- dbConnect(MySQL(), dbname = "rmysql", username="rmysql", password="rmysql") # 建立连接
> users = dbGetQuery(conn, "SELECT * FROM t_user") # 执行SQL
> users # 查看数据
id user
1 1 A1
2 2 AB
3 3 fens.me
> dbDisconnect(conn) # 断开连接
[1] TRUE
好了,我们已经实现了在Linux Ubuntu下面R和MySQL的连接。
6.1.3 RMySQL在Win7下安装
Win系统环境:
- Win7: 64位旗舰版
- Win字符集: gbk,utf8
- R: 3.0.1, x86_64-w64-mingw32/x64 (64-bit)
- MySQL: mysql Ver 14.14 Distrib 5.6.11, for Win64 (x86_64)
1. 查看Win7下的系统环境
~ R --version # 查看R版本
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
~ mysql --version # MySQL版本
mysql Ver 14.14 Distrib 5.6.11, for Win64 (x86_64)
mysql> show variables like '%char%'; # MySQL字符集
+--------------------------+------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | D:\toolkit\mysql56\share\charsets\ |
+--------------------------+---
8 rows in set (0.07 sec)
2. 在R环境中安装RMySQL
~ R # 启动R程序
> install.packages('RMySQl') # 安装RMySQL
package 'RMySQl' is not available (for R version 3.0.1)
又遇到错误,提示没有对应的RMySQL安装版本。
由于RMySQL没有提供Windows版本的发行包,所以我们必须在Windows中手动编译RMySQL的包后,再进行安装。
> install.packages("RMySQL", type="source") # 下载RMySQL源代码包
URLhttp://cran.dataguru.cn/src/contrib/RMySQL_0.9-3.tar.gz'
Content type 'application/x-gzip' length 165363 bytes (161 Kb)
URL
downloaded 161 Kb
找到源代码包:RMySQL_0.9-3.tar.gz
~ dir C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages
2013-09-24 13:16 165,363 RMySQL_0.9-3.tar.gz
通过源代码包安装
~ D:\workspace\R\mysql>R CMD INSTALL C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages\RMySQL_0.9-3.tar.gz
* installing to library 'C:/Program Files/R/R-3.0.1/library'
* installing *source* package 'RMySQL' ...
** 'RMySQL'MD5
checking for $MYSQL_HOME... not found... searching registry...
# 省略部分输出
readRegistry("SOFTWARE\\MySQL AB", hive = "HLM", maxdepth = 2) :
Registry key 'SOFTWARE\MySQL AB' not found
ERROR: configuration failed for package 'RMySQL'
* removing 'C:/Program Files/R/R-3.0.1/library/RMySQL'
又出错了,我们还需要设置MYSQL_HOME的环境变量。
~ set MYSQL_HOME=D:\toolkit\mysql56
注: MYSQL_HOME建议设置在系统环境变量中。
再一次安装RMySQL
D:\workspace\R\mysql>R CMD INSTALL C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages\RMySQL_0.9-3.tar.gz
# 省略部分输出
gcc.exe: error: D:\toolkit\mysql56/bin/libmySQL.dll: No such file or directory
ERROR: compilation failed for package 'RMySQL'
* removing 'C:/Program Files/R/R-3.0.1/library/RMySQL'
错误不断,这次提示为没有找到动态链接库文件 D:\toolkit\mysql56/bin/libmySQL.dll 。
# 复制动态链接库libmySQL.dll
cp D:\toolkit\mysql56\lib\libmysql.dll D:\toolkit\mysql56\bin\
mv D:\toolkit\mysql56\bin\libmysql.dll D:\toolkit\mysql56\bin\libmySQL.dll
再一次安装RMySQL
~ D:\workspace\R\mysql>R CMD INSTALL C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages\RMySQL_0.9-3.tar.gz
# 省略部分输出
installing to C:/Program Files/R/R-3.0.1/library/RMySQL/libs/x64
** R
** inst
** preparing package for lazy loading
Creating a generic function for 'format' from package 'base' in package 'RMySQL'
Creating a generic function for 'print' from package 'base' in package 'RMySQL'
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
MYSQL_HOME defined as D:\toolkit\mysql56
* DONE (RMySQL)
终于安装成功了,值得庆祝!
3. 在MySQL中建库建表
~ mysql -uroot –p # 通过mysql命令行客户端工具,登陆本地MySQL
mysql> create database rmysql; # 创建一个数据库叫rmysql
Query OK, 1 row affected (0.00 sec)
~ mysql -uroot –p # 通过mysql命令行客户端工具,登陆本地MySQL
mysql> create database rmysql; # 创建一个数据库叫rmysql
Query OK, 1 row affected (0.00 sec)
mysql> grant all on rmysql.* to rmysql@localhost identified by 'rmysql'; # 授权rmysql用户,可以本地操作rmysql库
Query OK, 0 rows affected (0.00 sec)
mysql> use rmysql # 切换到rmysql库
Database changed
mysql> CREATE TABLE t_user( # 创建一个表 t_user
-> id INT PRIMARY KEY AUTO_INCREMENT,
-> user varchar(12) NOT NULL UNIQUE
-> )ENGINE=INNODB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.07 sec)
mysql> INSERT INTO t_user(user) values('A1'),('AB'),('fens.me'); # 插入3条数据
Query OK, 3 rows affected (0.04 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM t_user; # 查询数据
+----+---------+
| id | user |
+----+---------+
| 1 | A1 |
| 2 | AB |
| 3 | fens.me |
+----+---------+
3 rows in set (0.00 sec)
4. 通过R程序,读MySQL数据库数据
~ R # 启动R程序
> library(RMySQL) # 加载RMySQL
Loading required package: DBI
> conn <- dbConnect(MySQL(), dbname = "rmysql", username="rmysql", password="rmysql") # 建立连接
> users = dbGetQuery(conn, "SELECT * FROM t_user") # 执行SQL
> users # 查看数据
id user
1 1 A1
2 2 AB
3 3 fens.me
> dbDisconnect(conn) # 断开连接
[1] TRUE
好了,我们已经实现了在Linux Ubuntu下面R和MySQL的连接。
6.1.3 RMySQL在Win7下安装
Win系统环境:
- Win7: 64位旗舰版
- Win字符集: gbk,utf8
- R: 3.0.1, x86_64-w64-mingw32/x64 (64-bit)
- MySQL: mysql Ver 14.14 Distrib 5.6.11, for Win64 (x86_64)
1. 查看Win7下的系统环境
~ R --version # 查看R版本
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
~ mysql --version # MySQL版本
mysql Ver 14.14 Distrib 5.6.11, for Win64 (x86_64)
mysql> show variables like '%char%'; # MySQL字符集
+--------------------------+------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | D:\toolkit\mysql56\share\charsets\ |
+--------------------------+------------------------------------+
8 rows in set (0.07 sec)
2. 在R环境中安装RMySQL
~ R # 启动R程序
> install.packages('RMySQl') # 安装RMySQL
package 'RMySQl' is not available (for R version 3.0.1)
又遇到错误,提示没有对应的RMySQL安装版本。
由于RMySQL没有提供Windows版本的发行包,所以我们必须在Windows中手动编译RMySQL的包后,再进行安装。
> install.packages("RMySQL", type="source") # 下载RMySQL源代码包
URLhttp://cran.dataguru.cn/src/contrib/RMySQL_0.9-3.tar.gz'
Content type 'application/x-gzip' length 165363 bytes (161 Kb)
URL
downloaded 161 Kb
找到源代码包:RMySQL_0.9-3.tar.gz
~ dir C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages
2013-09-24 13:16 165,363 RMySQL_0.9-3.tar.gz
通过源代码包安装
~ D:\workspace\R\mysql>R CMD INSTALL C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages\RMySQL_0.9-3.tar.gz
* installing to library 'C:/Program Files/R/R-3.0.1/library'
* installing *source* package 'RMySQL' ...
** 'RMySQL'MD5
checking for $MYSQL_HOME... not found... searching registry...
# 省略部分输出
readRegistry("SOFTWARE\\MySQL AB", hive = "HLM", maxdepth = 2) :
Registry key 'SOFTWARE\MySQL AB' not found
ERROR: configuration failed for package 'RMySQL'
* removing 'C:/Program Files/R/R-3.0.1/library/RMySQL'
又出错了,我们还需要设置MYSQL_HOME的环境变量。
~ set MYSQL_HOME=D:\toolkit\mysql56
注: MYSQL_HOME建议设置在系统环境变量中。
再一次安装RMySQL
D:\workspace\R\mysql>R CMD INSTALL C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages\RMySQL_0.9-3.tar.gz
# 省略部分输出
gcc.exe: error: D:\toolkit\mysql56/bin/libmySQL.dll: No such file or directory
ERROR: compilation failed for package 'RMySQL'
* removing 'C:/Program Files/R/R-3.0.1/library/RMySQL'
错误不断,这次提示为没有找到动态链接库文件 D:\toolkit\mysql56/bin/libmySQL.dll 。
# 复制动态链接库libmySQL.dll
cp D:\toolkit\mysql56\lib\libmysql.dll D:\toolkit\mysql56\bin\
mv D:\toolkit\mysql56\bin\libmysql.dll D:\toolkit\mysql56\bin\libmySQL.dll
再一次安装RMySQL
~ D:\workspace\R\mysql>R CMD INSTALL C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages\RMySQL_0.9-3.tar.gz
# 省略部分输出
installing to C:/Program Files/R/R-3.0.1/library/RMySQL/libs/x64
** R
** inst
** preparing package for lazy loading
Creating a generic function for 'format' from package 'base' in package 'RMySQL'
Creating a generic function for 'print' from package 'base' in package 'RMySQL'
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
MYSQL_HOME defined as D:\toolkit\mysql56
* DONE (RMySQL)
终于安装成功了,值得庆祝!
3. 在MySQL中建库建表
操作过程与在Linux中操作过程一致。
~ mysql -uroot -p
mysql> create database rmysql;
Query OK, 1 row affected (0.04 sec)
mysql> grant all on rmysql.* to rmysql@'%' identified by 'rmysql';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on rmysql.* to rmysql@localhost identified by 'rmysql';
Query OK, 0 rows affected (0.00 sec)
mysql> use rmysql
Database changed
mysql> CREATE TABLE t_user(
-> id INT PRIMARY KEY AUTO_INCREMENT,
-> user varchar(12) NOT NULL UNIQUE
-> )ENGINE=INNODB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (1.01 sec)
mysql>
mysql> INSERT INTO t_user(user) values('A1'),('AB'),('fens.me');
Query OK, 3 rows affected (0.05 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM t_user;
+----+---------+
| id | user |
+----+---------+
| 1 | A1 |
| 2 | AB |
| 3 | fens.me |
+----+---------+
3 rows in set (0.03 sec)
4. 通过R程序,读MySQL数据库数据。
如果刚才没有把MYSQL_HOME的变量写到环境变更中,每次在启动R之前都要设置这个变量。
~ set MYSQL_HOME=D:\toolkit\mysql56 # 在当前环境中,设置环境变量
~ R # 启动R程序
> library(RMySQL) # 加载RMySQL
DBI
MYSQL_HOME defined as D:\toolkit\mysql56
> conn <- dbConnect(MySQL(), dbname = "rmysql", username="rmysql", password="rmysql") # 建立连接
> users = dbGetQuery(conn, "SELECT * FROM t_user") # 执行SQL
> users
id user
1 1 A1
2 2 AB
3 3 fens.me
> dbDisconnect(conn) # 断开连接
[1] TRUE
好了,我们又实现了在Win7下的R和MySQL的连接
6.1.4 RMySQL函数使用
环境都安装好了,接下来我们就具体使用RMySQL包了。本节主要介绍RMySQL辅助操作、数据库操作以及win的字符集设置。
1. RMySQL辅助函数
> conn <- dbConnect(MySQL(), dbname = "rmysql", username="rmysql", # 建立本地连接
password="rmysql",client.flag=CLIENT_MULTI_STATEMENTS)
> conn <- dbConnect(MySQL(), dbname = "rmysql", username="rmysql", # 建立远程连接
password="rmysql",host="192.168.1.201",port=3306)
> dbDisconnect(conn) # 关闭连接
> dbListTables(conn) # 查看数据库中所有的表
[1] "t_user"
> dbListFields(conn, "t_user") # 查看表的字段
[1] "id" "user"
> summary(MySQL(), verbose = TRUE) # 查询MySQL信息
<MySQLDriver:(23864)>
Driver name: MySQL
Max connections: 16
Conn. processed: 3
Default records per fetch: 500
DBI API version:
> summary(conn, verbose = TRUE) # MySQL连接实例信息
<MySQLConnection:(23864,2)>
User: root
Host: localhost
Dbname: rmysql
Connection type: localhost via TCP/IP
MySQL server version: 5.6.11
MySQL client version: 5.6.11
MySQL protocol version: 10
MySQL server thread id: 35
No resultSet available
> dbListConnections(MySQL()) # MySQL连接信息
[[1]]
<MySQLConnection:(23864,2)>
2. RMySQL数据库操作
> t_demo<-data.frame( # 建表并插入数据
a=seq(1:10),
b=letters[1:10],
c=rnorm(10)
)
> dbWriteTable(conn, "t_demo", t_demo)
> dbReadTable(conn, "t_demo") # 获得整个表数据
a b c
1 1 a 0.98868164
2 2 b -0.66935770
3 3 c 0.27703638
4 4 d 1.36137156
5 5 e -0.70291017
6 6 f 1.61235088
7 7 g 0.17616068
8 8 h 0.29700017
9 9 i 0.19032719
10 10 j -0.06222173
> dbWriteTable(conn, "t_demo", t_demo, append=TRUE) # 插入新数据
> dbReadTable(conn, "t_demo")
row_names a b c
1 1 1 a 0.98868164
2 2 2 b -0.66935770
3 3 3 c 0.27703638
4 4 4 d 1.36137156
5 5 5 e -0.70291017
6 6 6 f 1.61235088
7 7 7 g 0.17616068
8 8 8 h 0.29700017
9 9 9 i 0.19032719
10 10 10 j -0.06222173
11 1 1 a 0.98868164
12 2 2 b -0.66935770
13 3 3 c 0.27703638
14 4 4 d 1.36137156
15 5 5 e -0.70291017
16 6 6 f 1.61235088
17 7 7 g 0.17616068
18 8 8 h 0.29700017
19 9 9 i 0.19032719
20 10 10 j -0.06222173
dbWriteTable(conn, "t_demo", t_demo, overwrite=TRUE) # 覆盖原表数据
> d0 <- dbGetQuery(conn, "SELECT * FROM t_demo where c>0") # 查询数据
> class(d0)
[1] "data.frame"
> d0
row_names a b c
1 1 1 a 0.9886816
2 3 3 c 0.2770364
3 4 4 d 1.3613716
4 6 6 f 1.6123509
5 7 7 g 0.1761607
6 8 8 h 0.2970002
7 9 9 i 0.1903272
> rs <- dbSendQuery(conn, "SELECT * FROM t_demo where c>0") # 执行SQL脚本查询,并分页
> class(rs)
[1] "MySQLResult"
attr(,"package")
[1] "RMySQL"
> mysqlCloseResult(rs)
[1] TRUE
> d1 <- fetch(rs, n = 3)
> d1
row_names a b c
1 1 1 a 0.9886816
2 3 3 c 0.2770364
3 4 4 d 1.3613716
> summary(rs, verbose = TRUE) # 查看集统计信息
row_names a b c
Length:7 Min. :1.000 Length:7 Min. :0.1762
Class :character 1st Qu.:3.500 Class :character 1st Qu.:0.2337
Mode :character Median :6.000 Mode :character Median :0.2970
Mean :5.429 Mean :0.7004
3rd Qu.:7.500 3rd Qu.:1.1750
Max. :9.000 Max. :1.6124
> dbWriteTable(conn, "t_demo", t_demo,row.names=FALSE,overwrite=TRUE) # 不插入row.names字段
> dbGetQuery(conn, "SELECT * FROM t_demo where c>0")
a b c
1 1 a 0.9886816
2 3 c 0.2770364
3 4 d 1.3613716
4 6 f 1.6123509
5 7 g 0.1761607
6 8 h 0.2970002
7 9 i 0.1903272
> if(dbExistsTable(conn,'t_demo')){ # 删除表
+ dbRemoveTable(conn, "t_demo")
+ }
[1] TRUE
执行SQL语句,dbSendQuery
> query<-dbSendQuery(conn, "show tables")
> data <- fetch(query, n = -1)
> data
Tables_in_rmysql
1 t_demo
2 t_user
> mysqlCloseResult(query)
[1] TRUE
特别提示,尽量避免用dbWriteTable()函数,因为它会删除你原来定义的表结构,创建新的表结构再插入数据!
3. Windows的字符集设置
在Win7中,向MySQL数据库中插入中文时,由于字符编码是gbk,所以会出现中文的乱码。下面通过MySQL命令行客户端插入中文。
mysql> INSERT INTO t_user(user) values('小朋友'),('你好'),('正确了'); # 插入中文数据
Query OK, 3 rows affected (0.07 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select * from t_user; # 查询数据,中文显示正确
+----+---------+
| id | user |
+----+---------+
| 1 | A1 |
| 2 | AB |
| 3 | fens.me |
| 5 | 你好 |
| 4 | 小朋友 |
| 6 | 正确了 |
+----+---------+
6 rows in set (0.07 sec)
通过RMySQL查询
> dbGetQuery(conn, "SELECT * FROM t_user") # 查询数据,中文显示为乱码
id user
1 1 A1
2 2 AB
3 3 fens.me
4 5 ??
5 4 ???
6 6 ???
在R的API中,设置GKB字符集。
> dbDisconnect(conn)
> conn <- dbConnect(MySQL(), dbname = "rmysql", username="root", password="",client.flag=CLIENT_MULTI_STATEMENTS)
> dbSendQuery(conn,'SET NAMES gbk') # 设置GKB字符集
> query<-dbSendQuery(conn, "SELECT * FROM t_user") # 查询数据
> data <- fetch(query, n = -1)
> mysqlCloseResult(query)
[1] TRUE
> data # 中文显示正确
id user
1 1 A1
2 2 AB
3 3 fens.me
4 5 你好
5 4 小朋友
6 6 正确了
我们在Win7下面修正字符编号的问题。
6.1.5 RMySQL案例实践
接下来,我们再做一个远程数据库连接的案例。在Win7的环境使用RMySQL客户端连接远程Linux上的MySQL数据库服务器。
1. 在远程数据库创建表
通过SQL语句,先在MySQL数据库建一张新表t_blog,主键索引,唯一键索引。下面是建表语句。
# 建表
mysql> CREATE TABLE t_blog(
id INT PRIMARY KEY AUTO_INCREMENT,
title varchar(12) NOT NULL UNIQUE,
author varchar(12) NOT NULL,
length int NOT NULL,
create_date timestamp NOT NULL DEFAULT now()
)ENGINE=INNODB DEFAULT CHARSET=UTF8;
mysql> desc t_blog; # 查看表结构
+-------------+-------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(12) | NO | UNI | NULL | |
| author | varchar(12) | NO | | NULL | |
| length | int(11) | NO | | NULL | |
| create_date | timestamp | NO | | CURRENT_TIMESTAMP | |
+-------------+-------------+------+-----+-------------------+----------------+
5 rows in set (0.00 sec)
mysql> show indexes from t_blog; # 查看表的索引
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_blog | 0 | PRIMARY | 1 | id | A | 3 | NULL | NULL | | BTREE | | |
| t_blog | 0 | title | 1 | title | A | 3 | NULL | NULL | | BTREE | | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
# 插入数据
mysql> INSERT INTO t_blog(title,author,length) values('你好,第一篇','Conan',20),('RMySQL数据库编程','Conan',99),('R的极客理想系列文章','Conan',15);
mysql> select * from t_blog; # 查询表
+----+------------------------------+--------+--------+---------------------+
| id | title | author | length | create_date |
+----+------------------------------+--------+--------+---------------------+
| 1 | 你好,第一篇 | Conan | 20 | 2013-08-15 00:13:13 |
| 2 | RMySQL数据库编程 | Conan | 99 | 2013-08-15 00:13:13 |
| 3 | R的极客理想系列文章 | Conan | 15 | 2013-08-15 00:13:13 |
+----+------------------------------+--------+--------+---------------------+
3 rows in set (0.00 sec)
2. RMySQL访问远程MySQL
用RMySQL远程访问数据库,并插入数据,包括中文字段,再取出数据。
> library(RMySQL)
# 建立远程连接
> conn <- dbConnect(MySQL(), dbname = "rmysql", username="rmysql", password="rmysql",host="192.168.1.201",port=3306)
> dbSendQuery(conn,'SET NAMES gbk') # 设置gbk字符编码
# 执行SQL,插入中文数据
> dbSendQuery(conn,"INSERT INTO t_blog(title,author,length) values('R插入的新文章','Conan',50)");
> query<-dbSendQuery(conn, "SELECT * FROM t_blog") # 查询数据
Warning message:
In mysqlExecStatement(conn, statement, ...) :
RS-DBI driver warning: (unrecognized MySQL field type 7 in column 4 imported as character)
> data <- fetch(query, n = -1) # 查询数据
> mysqlCloseResult(query)
[1] TRUE
> print(data) # 中文显示正常
id title author length create_date
1 1 你好,第一篇 Conan 20 2013-08-15 00:13:13
2 2 RMySQL数据库编程 Conan 99 2013-08-15 00:13:13
3 3 R的极客理想系列文章 Conan 15 2013-08-15 00:13:13
4 4 R插入的新文章 Conan 50 2013-08-15 00:29:45
>
> dbDisconnect(conn)
[1] TRUE
掌握RMySQL的各种使用技巧并且理解原理,就能少犯错误,提高工作效率!