🌴 Getting plant scientific names from YList with {rvest}
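Day 3 of my one-person R Advent Calendar. I don't know how many days I can keep this up, but I'd like to follow @dichika's example and keep going.
Today's topic is work-related. I want to write down some tips for plant ecology, in particular for handling community data.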
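When you survey a plant community, it is rare for only one species to turn up, so the many species growing in the community all have to be handled as data. Species names recorded as Japanese names (和名) are usually converted to scientific names, and for simplicity they are often reduced further to short codes built from the genus name and the specific epithet.
Converting Japanese names to scientific names by leafing through field guides takes time and invites typos (scientific names are long). It gets painful at around 30 species. Previously I handled this by matching Japanese names against scientific names in the CSV file provided by "BG Plants 和名−学名インデックス" (YList for short) (http://ylist.info).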
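"The '植物和名ー学名インデックス YList' (YList for short) was created in 2003, led by Koji Yonekura (Tohoku University) and Tadashi Kajita (University of Tokyo; now Chiba University), to maintain detailed information on the plant names used in the 'database of research plants preserved in facilities' (BG Plants), in particular the Japanese names and scientific names of plants native to Japan."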
However, the server was apparently migrated at some point in 2015, and the file can no longer be used.
"Hmm, now I'm stuck," I thought, and then it hit me: "Wait, I have R!" So I used {rvest} to fetch the standard scientific name that corresponds to a Japanese name. Below I explain how, and how to make the resulting names easier to handle in later analysis.
🌴 Getting scientific-name information from YList
Load the packages to be used. {vegan} is an extremely well-known package among plant ecologists. Here, {flora} is used to clean up the scientific-name strings.
library(rvest)
library(vegan)
library(flora)
library(tidyr)
library(dplyr)
library(magrittr) # provides %<>% and %$%, used below
library(knitr)    # provides kable()
So what do we actually do? We borrow YList's search page (http://ylist.info/ylist_simple_search.html). It finds the search term in any of the family name, species name, alias, or notes fields, so all we need to do is send it the Japanese name we want to look up.
First, establish a session with YList. The return value, in particular Status: 200, shows that the page was reached successfully.
(session <- html_session("http://ylist.info/ylist_simple_search.html"))
## <session> http://ylist.info/ylist_simple_search.html
## Status: 200
## Type: text/html
## Size: 3225
Next, prepare the Japanese name to send to the search form, and use an XPath expression to select the part of the HTML you want from the results page. The output of the following code is shown below (first three results only).
# Look up the scientific name for アカガシ
form <- html_form(session)[[1]] %>%
  set_values("any_field" = "アカガシ")
submit_form(session, form) %>%
  html_nodes(xpath = "//*[@id='content']/span/span/a") %>%
  html_text() %>%
  {
    df_res <<- .   # keep the full result for later steps
    head(., 3)
  }
## [1] "Quercus acuta Thunb.　 アカガシ　標準"
## [2] "Quercus acuta Thunb. var. yanagitae Makino　 アカガシ　synonym"
## [3] "Quercus acuta Thunb. var. megaphylla (Hayashi)　 アカガシ　synonym"
The names come back fine. They are awkward to use as is, though, so I process the result as follows.
df_res %<>% data_frame(Species = .) %>%
  # keep accepted names (標準) and drop synonyms
  dplyr::filter(grepl("標準", Species)) %>%
  dplyr::mutate(Species = gsub("[[:space:]]標準", "", Species)) %>%
  # split into scientific name and Japanese name at the last whitespace
  tidyr::extract(col = Species, into = c("Species", "Jp.Species"),
                 regex = "([[:print:]]+)[[:space:]]([[:print:]]+)")
df_res %>% kable(format = "markdown")
Species | Jp.Species |
---|---|
Quercus acuta Thunb. | アカガシ |
Quercus acuta Thunb. f. acutiformis (Nakai) H.Ohashi | ヒロハアカガシ |
Quercus acuta Thunb. f. lanceolata Hatus. | ヤナギアカガシ |
Quercus morii Hayata | タイワンアカガシ |
Quercus x yokohamensis (Makino) Makino ex H.Ohba | イズアカガシ |
Step by step: the earlier result is stored in a data frame, only the names labeled 標準 (accepted) rather than synonym are kept, and the text is split into a scientific-name column and a Japanese-name column.
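To make those three steps concrete without touching the network, here is a minimal base-R sketch run on strings shaped like the output above. The helper name `parse_ylist_lines()` is hypothetical, and the sketch assumes the raw lines use an ideographic space (U+3000) as a separator. Japanese text is written with `\u` escapes so the snippet does not depend on the file encoding. Unlike the pipeline above, it also trims the trailing space from the scientific name.

```r
# Hypothetical helper (not from any package): parse raw YList result lines.
parse_ylist_lines <- function(x) {
  std <- "\u6a19\u6e96"   # the label "標準" (accepted name)
  sep <- "[ \u3000]"      # ASCII space or ideographic space (assumed separator)
  x <- x[grepl(paste0(std, "$"), x)]            # 1. keep accepted names only
  x <- sub(paste0(sep, "+", std, "$"), "", x)   # 2. strip the trailing label
  # 3. split at the last run of spaces: scientific name | Japanese name
  data.frame(
    Species    = sub(paste0("^(.*[^ \u3000])", sep, "+([^ \u3000]+)$"), "\\1", x),
    Jp.Species = sub(paste0("^.*", sep, "([^ \u3000]+)$"), "\\1", x),
    stringsAsFactors = FALSE
  )
}

raw <- c(
  "Quercus acuta Thunb.\u3000 \u30a2\u30ab\u30ac\u30b7\u3000\u6a19\u6e96",
  "Quercus acuta Thunb. var. yanagitae Makino\u3000 \u30a2\u30ab\u30ac\u30b7\u3000synonym"
)
parse_ylist_lines(raw)
```

Only the first line survives the filter, because the second is labeled synonym.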
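With this alone, however, other species whose names contain "アカガシ" also match, and for a query such as ブナ every species in ブナ科 (Fagaceae) is included as well, so I filter on the exact Japanese name as follows.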
df_res %>% dplyr::filter(Jp.Species == "アカガシ")
## Source: local data frame [1 x 2]
##
## Species Jp.Species
## (chr) (chr)
## 1 Quercus acuta Thunb.　 アカガシ
To make this easier to use, I wrapped it in a function, ylist_names(), which takes a query argument.
ylist_names(query = "ブナ")
## Source: local data frame [1 x 2]
##
## Species Jp.Species
## (chr) (chr)
## 1 Fagus crenata Blume　 ブナ
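The body of ylist_names() is not shown in this post. As a rough idea of what such a wrapper might look like, the sketch below simply chains the steps already described. It assumes the rvest 0.3.x functions used above (html_session(), set_values(), submit_form()). The fallback for queries with no exact match is a guess at how an alias could still return a result, not the author's actual logic.

```r
# A sketch only: the original function body is not shown in the post.
# Assumes the rvest 0.3.x API and the page structure described above.
ylist_names <- function(query,
                        url = "http://ylist.info/ylist_simple_search.html") {
  session <- rvest::html_session(url)
  form <- rvest::set_values(rvest::html_form(session)[[1]], any_field = query)
  res <- rvest::html_text(
    rvest::html_nodes(rvest::submit_form(session, form),
                      xpath = "//*[@id='content']/span/span/a")
  )

  df <- data.frame(Species = res, stringsAsFactors = FALSE)
  df <- dplyr::filter(df, grepl("標準", Species))            # accepted names only
  df <- dplyr::mutate(df, Species = gsub("[[:space:]]標準", "", Species))
  df <- tidyr::extract(df, col = Species, into = c("Species", "Jp.Species"),
                       regex = "([[:print:]]+)[[:space:]]([[:print:]]+)")

  # Assumption: prefer an exact match on the Japanese name; if there is none
  # (for example when the query is an alias), keep every accepted name found.
  exact <- dplyr::filter(df, Jp.Species == query)
  if (nrow(exact) > 0) exact else df
}
```

Calling it performs a live request, so the result depends on the current state of the YList site.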
One nice thing about YList is that even if you enter a Japanese name that is not the standard one, it returns results mapped to the standard Japanese name. So I made the function return オオカメノキ (レンプクソウ科, Adoxaceae) even when it is given ムシカリ, an alias of オオカメノキ. Combined with lapply(), you can also pass several species as a vector and look them all up in one go.
species <- c("アカガシ", "ブナ", "イヌガシ", "ムシカリ")
lapply(species, ylist_names) %>%
  bind_rows() %>%
  {
    df_res <<- .   # keep the combined result for later steps
    kable(., format = "markdown")
  }
Species | Jp.Species |
---|---|
Quercus acuta Thunb. | アカガシ |
Fagus crenata Blume | ブナ |
Neolitsea aciculata (Blume) Koidz. | イヌガシ |
Viburnum furcatum Blume ex Maxim. | オオカメノキ |
How to use lapply() seems to be explained in detail in Hadley's "Advanced R". A Japanese translation has apparently been published too, so I recommend giving it a read if you are curious.
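Because every lookup sends a request to the YList server, a batch run might query each distinct name only once and pause between requests. The wrapper below is a hypothetical sketch of that idea: `lookup_politely()`, its `delay` argument, and the one-second default are my own choices, not documented YList limits. An offline stand-in replaces ylist_names() so the call pattern can be shown without network access.

```r
# Hypothetical wrapper: query each distinct name once, pausing between requests.
# `lookup` is any function that takes one name and returns a data frame.
lookup_politely <- function(species, lookup, delay = 1) {
  species <- unique(species)                 # avoid repeated requests
  out <- vector("list", length(species))
  for (i in seq_along(species)) {
    if (i > 1) Sys.sleep(delay)              # wait before every request but the first
    out[[i]] <- lookup(species[i])
  }
  do.call(rbind, out)
}

# Offline stand-in for ylist_names(), used only to show the call pattern.
fake_lookup <- function(query) {
  data.frame(Species = paste("Genus", query), Jp.Species = query,
             stringsAsFactors = FALSE)
}
lookup_politely(c("a", "b", "a"), fake_lookup, delay = 0)
```

With a real lookup function you would keep the default delay or lengthen it.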
Making the scientific-name data easier to handle
As it stands, the scientific names include author citations and similar information, which is clutter for most purposes. This is where {flora} comes in. It has a handy function, remove.authors(), that strips the author information from a scientific name. Use it like this.
df_res %<>% rowwise() %>%
  # drop the trailing whitespace left over from the split
  dplyr::mutate(Species = gsub("[[:space:]]$", "", Species)) %>%
  dplyr::mutate(Species = flora::remove.authors(Species)) %>%
  ungroup()
df_res$Species
## [1] "Quercus acuta" "Fagus crenata" "Neolitsea aciculata" "Viburnum furcatum"
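For plain binomials like the four above, the effect can be roughly imitated in base R by trimming whitespace and keeping the first two words. This is a simplified stand-in under that assumption, not a reimplementation of flora::remove.authors(). The name `binomial_only()` is hypothetical, and the sketch would mishandle infraspecific ranks and hybrid names such as "Quercus x yokohamensis".

```r
# Simplified stand-in, NOT flora::remove.authors(): keep genus + epithet only.
binomial_only <- function(x) {
  # trim ASCII and ideographic (U+3000) whitespace from both ends
  x <- gsub("^[[:space:]\u3000]+|[[:space:]\u3000]+$", "", x)
  # keep the first two space-separated words
  sub("^([^ ]+ [^ ]+).*$", "\\1", x)
}

out <- binomial_only(c("Quercus acuta Thunb.\u3000", "Fagus crenata Blume"))
out
```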
Next, let's give these four species abbreviations. With multi-species data the full scientific names are long, so people often use the genus name alone, or the first letters of the genus name and the specific epithet. Doing this by hand invites mistakes, and codes can collide when species are added later, so I leave it to R. Here I use make.cepnames() from {vegan} to generate abbreviated species names. The base function abbreviate() works too.
df_res %$% make.cepnames(Species)
## [1] "Queracut" "Fagucren" "Neolacic" "Vibufurc"
df_res %$% abbreviate(Species, 2, strict = TRUE)
## Quercus acuta Fagus crenata Neolitsea aciculata
## "Qa" "Fc" "Na"
## Viburnum furcatum
## "Vf"
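Since collisions are the main risk with short codes, it can help to check for duplicates whenever the species list changes. The sketch below uses a simplified imitation of the 4 + 4 letter scheme (`cep_like()` is a hypothetical name; vegan::make.cepnames() applies additional rules that are ignored here) together with base R's anyDuplicated() and make.unique().

```r
# Simplified imitation of the 4 + 4 CEP scheme: first four letters of the
# genus plus first four letters of the epithet. Not vegan::make.cepnames().
cep_like <- function(x) {
  parts <- strsplit(x, " ", fixed = TRUE)
  vapply(parts,
         function(p) paste0(substr(p[1], 1, 4), substr(p[2], 1, 4)),
         character(1))
}

sp <- c("Quercus acuta", "Fagus crenata",
        "Neolitsea aciculata", "Viburnum furcatum")
codes <- cep_like(sp)
codes

anyDuplicated(codes) == 0            # TRUE when every code is unique
make.unique(c(codes, "Queracut"))    # a repeated code gets a ".1" suffix
```

For these four species the codes match the make.cepnames() output shown above.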
There is probably room for improvement, but this will do for now. I'll cover how I handle multi-species data when I get the chance.
💻 Execution environment
devtools::session_info() %>%
  {
    print(.$platform)
    # list only the attached packages
    .$packages %>%
      dplyr::filter(`*` == "*") %>%
      kable(format = "markdown")
  }
## setting value
## version R version 3.2.2 (2015-08-14)
## system x86_64, darwin13.4.0
## ui X11
## language En
## collate en_US.UTF-8
## tz Asia/Tokyo
## date 2015-12-03
package | * | version | date | source |
---|---|---|---|---|
dplyr | * | 0.4.3.9000 | 2015-10-28 | Github (<hadley/dplyr@52d10f6>) |
flora | * | 0.2.4 | 2015-03-20 | CRAN (R 3.1.3) |
ggplot2 | * | 1.0.1.9003 | 2015-10-17 | Github (<hadley/ggplot2@864d64f>) |
knitr | * | 1.11.8 | 2015-10-19 | Github (<yihui/knitr@a1b235d>) |
lattice | * | 0.20-33 | 2015-07-14 | CRAN (R 3.2.2) |
magrittr | * | 1.5 | 2015-07-28 | Github (<smbache/magrittr@effbadc>) |
permute | * | 0.8-4 | 2015-05-19 | CRAN (R 3.1.3) |
pipeR | * | 0.6.0.6 | 2015-07-08 | CRAN (R 3.2.0) |
remoji | * | 0.1.0 | 2015-11-19 | Github (<richfitz/remoji@dc00779>) |
rvest | * | 0.3.1 | 2015-11-11 | CRAN (R 3.2.2) |
tidyr | * | 0.3.1 | 2015-09-10 | CRAN (R 3.2.0) |
vegan | * | 2.3-2 | 2015-11-19 | CRAN (R 3.2.2) |
xml2 | * | 0.1.2 | 2015-09-01 | CRAN (R 3.2.0) |
Source
The scientific-name data on this page were obtained from "BG Plants 和名−学名インデックス".
Yonekura, K. and Kajita, T. (2003-) "BG Plants 和名−学名インデックス" (YList), http://ylist.info (accessed 3 December 2015).
If you use the code above, please keep your usage within limits that do not burden YList, for example by avoiding excessive requests.