Web Connector
Index internal or external websites by crawling HTML pages and content.
The Web connector enables the indexing of HTML web pages.
Enable Access
The Web connector indexes unauthenticated web page content that is reachable from your Atolio deployment.
First, determine where your website is hosted. The Web connector can crawl HTML content by either:
- Navigating URLs from a configured baseURL.
- Browsing an Amazon S3 bucket from a configured url.
To make sure the content is accessible from your Atolio deployment, confirm that your Deployment Engineer is aware of the necessary network or AWS permissions as part of the infrastructure deployment.
Note: You may also need to ensure that your HTML content is server-side rendered (SSR) so that the complete website content can be indexed.
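One rough way to judge whether a page depends on server-side rendering is to check whether a phrase you expect to be indexed appears in the raw HTML, without running any JavaScript. The sketch below is only an illustration in Python and is not part of the Web connector. The function names are hypothetical, and it assumes a crawler that reads HTML as served sees roughly the same text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html: str) -> str:
    """Return the text found in the HTML as served (no JavaScript executed)."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


def appears_in_raw_html(raw_html: str, expected_phrase: str) -> bool:
    """Hypothetical helper: True if the phrase is present without client-side rendering."""
    return expected_phrase.lower() in visible_text(raw_html).lower()
```

To try it on a real page, fetch the HTML with any HTTP client (for example `urllib.request.urlopen`) and pass the decoded body in. A result of `False` for text you can see in the browser suggests the page is rendered client-side.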
(Optional) Provide Filters
Additional filters can be configured by your Deployment Engineer in order to:
- Include or exclude pages, e.g. particular URLs that match a given regex.
- Remove extraneous content, e.g. table of contents.
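Before asking for include or exclude filters, it can help to test candidate regexes against a sample of your site's URLs. The following is a minimal Python sketch of that idea. It does not reflect the connector's actual configuration format or matching rules; the function name, the example URLs, and the "include first, then exclude" order are all assumptions made for illustration.

```python
import re
from typing import Iterable, List


def filter_urls(
    urls: Iterable[str],
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> List[str]:
    """Keep URLs matching any include pattern (if given) and no exclude pattern.

    Hypothetical helper for trying out patterns; not the connector's logic.
    """
    include_patterns = [re.compile(p) for p in include]
    exclude_patterns = [re.compile(p) for p in exclude]

    kept = []
    for url in urls:
        # With no include patterns, every URL is a candidate.
        if include_patterns and not any(p.search(url) for p in include_patterns):
            continue
        if any(p.search(url) for p in exclude_patterns):
            continue
        kept.append(url)
    return kept


if __name__ == "__main__":
    sample = [
        "https://docs.example.com/guide/intro",
        "https://docs.example.com/guide/archive/old",
        "https://docs.example.com/blog/post",
    ]
    print(filter_urls(
        sample,
        include=[r"^https://docs\.example\.com/guide/"],
        exclude=[r"/archive/"],
    ))
```

Running patterns against real URLs this way makes it easier to hand your Deployment Engineer regexes that match what you intend.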