I have recently been using R for web scraping, with Amazon as the target. While fetching the search result pages, I get the following error:
Error: 1: Double hyphen within comment: <!--[if IE 6]>
<style type="text/css"><!
2: Double hyphen within comment: <!--
<div id="main" skeleton-key="results
3: Double hyphen within comment: <!--
<div id="main" skeleton-key="results--searchTempl
4: Double hyphen within comment: <!--
<div id="main" skeleton-key="results--searchTempl
My R code for scraping Amazon is as follows:
library(rvest)
library(stringr)
library(XML)
library(RCurl)
restrictedSeachPage<-read_html("https://www.amazon.com/s/ref=sr_st_date-desc-rank?keywords=Apple&fst=as%3Aoff&rh=n%3A2335752011%2Cn%3A7072561011%2Cn%3A2407749011%2Ck%3AApple%2Cp_89%3AApple&qid=1483090072&sort=date-desc-rank")
#================================================================#
# First five pages
SearchPages<-list()
SearchPages[[1]]<-restrictedSeachPage
xpath<-'//a[@class="pagnNext"]/@href'
for (i in 2:5) {
  nextPageLink<-xpathApply(xmlTreeParse(SearchPages[[i-1]]),xpath)
  nextPageLink<-unlist(nextPageLink)
  nextPageLink<-str_c("http://www.amazon.com/",nextPageLink)
  SearchPages[[i]]<-read_html(nextPageLink)
}
How can I solve this problem?
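One possible direction, offered as a sketch rather than a confirmed fix: the messages look like they come from a strict XML parser, which would fit `xmlTreeParse()` (from the XML package) being handed a page that `read_html()` (from rvest/xml2) had already parsed as HTML. Under that assumption, the loop could stay entirely within rvest/xml2 and query the parsed page directly. The helper names `next_page_url` and `collect_pages`, the `base` argument, and the inline demo page are hypothetical; the `pagnNext` class is taken from the code above and may not match Amazon's current markup.

```r
library(rvest)  # read_html(), html_node(), html_attr()

# Hypothetical helper: return the absolute URL of the "next page" link,
# or NA if the page has no such link.
next_page_url <- function(page, base = "https://www.amazon.com") {
  link <- html_node(page, xpath = '//a[@class="pagnNext"]')
  href <- html_attr(link, "href")      # NA when the node is missing
  if (is.na(href)) return(NA_character_)
  xml2::url_absolute(href, base)       # resolve a relative href against base
}

# Hypothetical helper: follow the "next" link until n_pages are collected
# or no further link is found. Needs network access when actually called.
collect_pages <- function(first_page, n_pages = 5) {
  pages <- list(first_page)
  for (i in seq_len(n_pages - 1)) {
    url <- next_page_url(pages[[i]])
    if (is.na(url)) break
    pages[[i + 1]] <- read_html(url)
  }
  pages
}

# Offline demo on a small inline page whose comment contains a double hyphen.
demo_html <- paste0(
  "<html><body><!-- results--searchTemplate -->",
  "<a class=\"pagnNext\" href=\"/s?page=2\">Next</a></body></html>"
)
demo_page <- read_html(demo_html)
demo_next <- next_page_url(demo_page)
print(demo_next)
```

With the original objects, the call would be along the lines of `SearchPages <- collect_pages(restrictedSeachPage, 5)`. Resolving the link with `xml2::url_absolute()` also avoids the doubled slash that `str_c("http://www.amazon.com/", nextPageLink)` would produce when the `href` already starts with `/`.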