Configuration
| Parameter | Description |
| --- | --- |
| appId | The ID of the application you want to store the crawler extractions in. |
| apiKey | API key for your target application. |
| indexPrefix | Prefix added to the names of all indices defined in the crawler’s configuration. |
| rateLimit | Number of concurrent tasks per second that can run for this configuration. |
| schedule | Use |
| startUrls | The crawler uses these URLs as entry points to start crawling. |
| sitemaps | URLs found in |
| ignoreRobotsTxtRules | When set to |
| ignoreNoIndex | Whether the Crawler should extract records from a page whose |
| ignoreNoFollowTo | Whether the Crawler should follow links with the |
| ignoreCanonicalTo | Whether the Crawler should extract records from a page that has a canonical URL specified. |
| extraUrls | URLs found in |
| maxDepth | Limits the processing of URLs to the specified depth, inclusively. |
| maxUrls | Limits the number of URLs your crawler can process. |
| saveBackup | Whether to save a backup of your production index before it is overwritten by the index generated during a crawl. |
| renderJavaScript | When |
| initialIndexSettings | Crawler index settings. |
| exclusionPatterns | Tells the crawler which URLs to ignore or exclude. |
| ignoreQueryParams | Filters out specified query parameters from crawled URLs. This can help you avoid indexing duplicate URLs. You can use wildcards to pattern-match parameter names. |
| requestOptions | Modifies the behavior of all requests the crawler makes. |
| linkExtractor | Overrides the default logic used to extract URLs from pages. |
| externalData | Defines the list of external data sources you want to use for this configuration and make available to your extractor function. |
| login | Defines how the crawler acquires a session to access protected content. |
| safetyChecks | Checks that ensure the crawl was successful. |
| actions | Determines which web pages are translated into Algolia records and in what way. |
| discoveryPatterns | Indicates additional web pages that the Crawler should visit. |
| hostnameAliases | Defines mappings that replace the given hostnames. |
| pathAliases | Defines mappings that replace a path on a given hostname. |
| cache | Turns the crawler’s cache on or off. |
| ignorePaginationAttributes | Whether the Crawler should follow pagination |
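To show how several of the parameters in the table fit together, here is a minimal sketch of a configuration object with a small sanity check.

- The `CrawlerConfigSketch` interface, the `checkConfig` helper and all values are hypothetical placeholders written for illustration. They are not Algolia's published types.
- The field types, the glob syntax in `exclusionPatterns` and the validation rules are assumptions.
- One of those assumptions is that `startUrls` and `sitemaps` are alternative entry points.

```typescript
// Hypothetical shape covering a subset of the documented parameters.
// Field names follow the parameter table; the types are assumptions.
interface CrawlerConfigSketch {
  appId: string;
  apiKey: string;
  indexPrefix?: string;
  rateLimit: number;
  startUrls?: string[];
  sitemaps?: string[];
  maxDepth?: number;
  maxUrls?: number;
  saveBackup?: boolean;
  renderJavaScript?: boolean;
  exclusionPatterns?: string[];
  ignoreQueryParams?: string[];
}

// Placeholder values only; replace them with your own application's data.
const exampleConfig: CrawlerConfigSketch = {
  appId: "YOUR_APP_ID",
  apiKey: "YOUR_API_KEY",
  indexPrefix: "crawler_",
  rateLimit: 8,
  startUrls: ["https://www.example.com/"],
  maxUrls: 5000,
  saveBackup: true,
  renderJavaScript: false,
  // Assumed glob syntax: skip everything under /private/.
  exclusionPatterns: ["https://www.example.com/private/**"],
  // Drop tracking parameters such as utm_source or utm_medium.
  ignoreQueryParams: ["utm_*"],
};

// Hypothetical helper: returns a list of problems (empty when none are found).
// These rules are illustrative assumptions, not the Crawler's own validation.
function checkConfig(config: CrawlerConfigSketch): string[] {
  const problems: string[] = [];
  if (config.appId.trim() === "") problems.push("appId is empty");
  if (config.apiKey.trim() === "") problems.push("apiKey is empty");
  if (!(config.rateLimit > 0)) problems.push("rateLimit must be a positive number");
  // Assume the crawler needs at least one entry point from startUrls or sitemaps.
  const entryPoints = (config.startUrls ?? []).length + (config.sitemaps ?? []).length;
  if (entryPoints === 0) problems.push("no entry points: set startUrls or sitemaps");
  // Limits, when set, are assumed to be positive integers.
  for (const key of ["maxDepth", "maxUrls"] as const) {
    const value = config[key];
    if (value !== undefined && (!Number.isInteger(value) || value < 1)) {
      problems.push(`${key} must be a positive integer`);
    }
  }
  return problems;
}
```

Running `checkConfig(exampleConfig)` should report no problems. Setting `rateLimit` to `0` and emptying `startUrls` should report two.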
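The `ignoreQueryParams` row says wildcards can pattern-match parameter names. The following is a hypothetical illustration of that idea, not Algolia's implementation. It rests on these assumptions:

- `*` matches any run of characters.
- The whole parameter name must match the pattern.
- The URL fragment is preserved.

The function names `wildcardToRegExp` and `stripIgnoredParams` are invented for this example.

```typescript
// Convert a simple wildcard pattern such as "utm_*" into an anchored RegExp.
// Assumption: "*" is the only special character; everything else is literal.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Remove every query parameter whose name matches one of the patterns.
function stripIgnoredParams(url: string, ignore: string[]): string {
  const matchers = ignore.map(wildcardToRegExp);
  // Set the fragment aside so it is reattached unchanged.
  const hashIndex = url.indexOf("#");
  const fragment = hashIndex >= 0 ? url.slice(hashIndex) : "";
  const withoutFragment = hashIndex >= 0 ? url.slice(0, hashIndex) : url;
  const queryIndex = withoutFragment.indexOf("?");
  if (queryIndex < 0) return url; // No query string: nothing to filter.
  const base = withoutFragment.slice(0, queryIndex);
  const kept = withoutFragment
    .slice(queryIndex + 1)
    .split("&")
    .filter((pair) => pair.length > 0)
    .filter((pair) => {
      const eq = pair.indexOf("=");
      const name = decodeURIComponent(eq >= 0 ? pair.slice(0, eq) : pair);
      return !matchers.some((re) => re.test(name));
    });
  return (kept.length > 0 ? `${base}?${kept.join("&")}` : base) + fragment;
}
```

Because the match is anchored, a pattern such as `ref` removes `ref=a` but leaves `refer=1` alone. That is one way the duplicate-URL problem mentioned in the table can be reduced without dropping unrelated parameters.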
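`hostnameAliases` and `pathAliases` both rewrite URLs before the crawler uses them. The sketch below illustrates that kind of mapping under several assumptions, none of which come from this reference:

- **Data shapes:** a hostname-to-hostname map, and path maps keyed by hostname.
- **Matching rule:** a path alias matches a path prefix only on a segment boundary.
- **Order:** path aliases are looked up by the original hostname before the hostname itself is replaced.

`applyAliases` and `startsWithPath` are hypothetical names.

```typescript
// Assumed shapes for the two alias parameters.
type HostnameAliases = Record<string, string>;
type PathAliases = Record<string, Record<string, string>>;

// True when `rest` begins with `prefix` on a path-segment boundary,
// so "/old" matches "/old/page" but not "/older".
function startsWithPath(rest: string, prefix: string): boolean {
  if (!rest.startsWith(prefix)) return false;
  const next = rest.charAt(prefix.length);
  return next === "" || next === "/" || next === "?" || next === "#" || prefix.endsWith("/");
}

function applyAliases(url: string, hostAliases: HostnameAliases, pathAliases: PathAliases): string {
  const match = /^(https?:\/\/)([^/?#]+)(.*)$/.exec(url);
  if (match === null) return url; // Not an http(s) URL: leave it untouched.
  const scheme = match[1] ?? "";
  const host = match[2] ?? "";
  let rest = match[3] ?? "";
  // Rewrite the first matching path prefix registered for this hostname.
  const pathsForHost = pathAliases[host] ?? {};
  for (const [from, to] of Object.entries(pathsForHost)) {
    if (startsWithPath(rest, from)) {
      rest = to + rest.slice(from.length);
      break;
    }
  }
  // Then replace the hostname itself if an alias exists.
  const newHost = hostAliases[host] ?? host;
  return scheme + newHost + rest;
}
```

For example, mapping `preprod.example.com` to `www.example.com` lets records extracted from a staging host carry production URLs. A path alias from `/old` to `/new` rewrites `/old/page` while leaving `/older` unchanged.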