Before working with Colly, ensure that you have the latest version. See the installation guide for more details.
Let's get started with some simple examples.
First, you need to import Colly into your codebase:
import "github.com/gocolly/colly/v2"
Colly's main entity is a Collector object. A Collector manages the network communication and is responsible for executing the attached callbacks while a collector job is running. To work with Colly, you have to initialize a Collector:
c := colly.NewCollector()
You can attach different types of callback functions to a Collector to control a collecting job or retrieve information. Check out the related section in the package documentation.
c.OnRequest(func(r *colly.Request) {
	fmt.Println("Visiting", r.URL)
})
c.OnError(func(_ *colly.Response, err error) {
	log.Println("Something went wrong:", err)
})
c.OnResponseHeaders(func(r *colly.Response) {
	fmt.Println("Visited", r.Request.URL)
})
c.OnResponse(func(r *colly.Response) {
	fmt.Println("Visited", r.Request.URL)
})
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
	e.Request.Visit(e.Attr("href"))
})
c.OnHTML("tr td:nth-of-type(1)", func(e *colly.HTMLElement) {
	fmt.Println("First column of a table row:", e.Text)
})
c.OnXML("//h1", func(e *colly.XMLElement) {
	fmt.Println(e.Text)
})
c.OnScraped(func(r *colly.Response) {
	fmt.Println("Finished", r.Request.URL)
})
1. OnRequest: Called before a request
2. OnError: Called if an error occurred during the request
3. OnResponseHeaders: Called after the response headers are received
4. OnResponse: Called after the response is received
5. OnHTML: Called right after OnResponse if the received content is HTML
6. OnXML: Called right after OnHTML if the received content is HTML or XML
7. OnScraped: Called after the OnXML callbacks