How to Detect New Pages Added to Any Site
Fluxguard will crawl any website, extract all available pages, and discover new links as they’re added. Configuration options are abundant to customize how we crawl and find new URLs. Let’s dig in!
Add Any URL and Fluxguard Finds All Referenced Links
Fluxguard will crawl and extract all internal links from every crawled page. If the goal is to monitor an entire site, this means you don’t have to enter every URL by hand. Instead, Fluxguard will crawl and locate all additional links.
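To make the idea of extracting internal links concrete, here is a minimal Python sketch. It is not Fluxguard's implementation. It assumes that "internal" means "same host as the crawled page", and the names `LinkCollector` and `extract_internal_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_internal_links(page_url, html):
    """Return the sorted, de-duplicated same-host links found in html."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    internal = set()
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Assumption: "internal" means http(s) links on the same host.
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            internal.add(absolute)
    return sorted(internal)


sample_html = """
<a href="/pricing">Pricing</a>
<a href="blog/post-1#comments">Post</a>
<a href="https://other.example.org/">External</a>
<a href="mailto:hi@example.com">Mail</a>
"""
links = extract_internal_links("https://example.com/", sample_html)
print(links)
```

In this sample, only the two same-host links are kept; the external link and the mailto link are ignored.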
As more URLs are monitored, Fluxguard keeps finding new links. Since some sites have a practically unlimited number of URLs, we cap newly found links at 75 in total (150 for subscribers). Delete newly found URLs or activate monitoring for them, and we’ll find more.
As the screenshot above shows, Fluxguard’s emailed reports tell you about new URLs as they are found.
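The cap on newly found links can be pictured with a short Python sketch. It is an illustration only: the function `propose_new_links` and the `resolved` parameter (URLs you have already deleted or activated) are hypothetical, and only the limits of 75 and 150 come from the text above.

```python
FREE_LIMIT = 75         # limit stated above for non-subscribers
SUBSCRIBER_LIMIT = 150  # limit stated above for subscribers


def propose_new_links(discovered, pending, resolved=(), subscriber=False):
    """Top up the list of pending (proposed) links without exceeding the cap.

    pending:  links already proposed and awaiting a decision
    resolved: links already deleted or activated (assumed not re-proposed)
    """
    limit = SUBSCRIBER_LIMIT if subscriber else FREE_LIMIT
    proposals = list(pending)
    known = set(proposals) | set(resolved)
    for url in discovered:
        if len(proposals) >= limit:
            break  # cap reached; remaining links wait for free slots
        if url not in known:
            proposals.append(url)
            known.add(url)
    return proposals


discovered = [f"https://example.com/page-{i}" for i in range(200)]
pending = propose_new_links(discovered, [])

# Deleting or activating ten proposals frees ten slots for further links.
resolved = pending[:10]
refilled = propose_new_links(discovered, pending[10:], resolved=resolved)
print(len(pending), len(refilled))
```

Resolving proposals is what frees slots in this sketch, which mirrors the advice above to delete or activate newly found URLs so that more can be found.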
Automate and Optimize New URL Discovery
In Session Settings for every monitoring scenario, Fluxguard provides several ways to optimize new URL discovery.
As the screenshot above demonstrates, Fluxguard allows you to automate the monitoring of new URLs. By default, we find and propose new links. However, this setting can be changed to automatically add and monitor new URLs.
In general, automating URL addition is not recommended because, depending on the site, it can add a large number of URLs. Moreover, it is prudent to limit the overall monitoring surface area to reduce false positives.
You can also optimize URL discovery to find links that have (or don’t have) certain keywords. This reduces the number of irrelevant URLs discovered: for example, you can instruct Fluxguard to only find URLs with the phrase “press-release” in them to limit discovery to just that document type.
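The keyword rules described above amount to a simple include/exclude filter on each discovered URL. The Python sketch below shows one plausible reading. The function name `matches_keyword_rules` and the case-insensitive substring matching are assumptions, not documented Fluxguard behavior.

```python
def matches_keyword_rules(url, include=(), exclude=()):
    """Return True if url passes the include/exclude keyword rules.

    Assumption: matching is a case-insensitive substring test on the URL.
    """
    lowered = url.lower()
    # Any excluded keyword rejects the URL outright.
    if any(word.lower() in lowered for word in exclude):
        return False
    # If include keywords are given, at least one must appear.
    if include:
        return any(word.lower() in lowered for word in include)
    return True


candidates = [
    "https://example.com/press-release/2024-q1",
    "https://example.com/careers",
    "https://example.com/press-release/archive",
]
kept = [
    url
    for url in candidates
    if matches_keyword_rules(url, include=["press-release"], exclude=["archive"])
]
print(kept)
```

With no rules at all, every URL passes, which corresponds to unrestricted discovery.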
Advanced Harvesting Strategies
Fluxguard offers advanced link crawling and harvesting strategies to accomplish sophisticated monitoring goals.
AI-Guided Link Harvesting
Fluxguard offers AI-guided harvesting, ideal for targeting specific content or page types.
When you enable AI-guided link harvesting, you’ll be asked to specify your business interests, such as surfacing only a company’s legal materials. Fluxguard uses this information to focus on links relevant to those interests, optimizing the crawl by following only URLs likely to contain relevant information. For instance, if you want to learn about a company’s background, you could specify your interests as information about the company. Fluxguard will then focus on finding pages relevant to that topic.
AI-guided link harvesting is available in site-level Settings.
Harvest and Crawl Internal Links Once
When you select this option, Fluxguard will find new links from monitored pages, crawl them once, harvest and process internal links, and pause each page that was crawled once. In other words, if you are monitoring a site’s home page, Fluxguard will continuously extract new links, monitor them once, and pause them.
For many customers, new links are a key item of interest. This approach alerts customers to new pages. However, since these new links may not have changing content, there’s little value in continuing to monitor them, and so they are paused.
This strategy is particularly effective when combined with Fluxguard’s keyword monitoring. Keyword monitoring will alert when an initial scan of a page contains certain keywords or phrases. This way, Fluxguard will scan newly found links, alert you to the presence of specific keywords (via email and dashboard), and pause those pages.
Harvest and Crawl Internal Links Once, Without Additional Harvesting
This approach is identical to the above-described strategy. Namely, newly found links will be scanned once and paused. The only difference with this approach is that links will NOT be harvested from scanned-once pages.
This allows customers to monitor root or index pages, scan new links, and extract no links beyond the home page. This can be valuable for very large sites where you may not want to continually find links from sub-links, for example.
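The difference between the two crawl-once strategies can be sketched in Python against a tiny in-memory site. Everything here is hypothetical (the `SITE` map, the `Monitor` class, and the assumption that harvesting from new pages happens within the same cycle); it only illustrates the behavior described above.

```python
from dataclasses import dataclass, field

# Hypothetical site: each URL maps to the internal links found on that page.
SITE = {
    "/": ["/news/a", "/news/b"],
    "/news/a": ["/news/a/details"],
    "/news/b": [],
    "/news/a/details": [],
}


@dataclass
class Monitor:
    # True:  "Harvest and Crawl Internal Links Once"
    # False: "... Once, Without Additional Harvesting"
    harvest_from_new_pages: bool
    active: set = field(default_factory=lambda: {"/"})
    paused: set = field(default_factory=set)

    def run_cycle(self):
        """Crawl active pages, then scan each newly found link once and pause it."""
        queue = []
        for url in sorted(self.active):
            queue.extend(SITE.get(url, []))
        newly_scanned = []
        while queue:
            url = queue.pop(0)
            if url in self.active or url in self.paused:
                continue  # already known, nothing new to report
            newly_scanned.append(url)
            self.paused.add(url)  # scanned once, then paused
            if self.harvest_from_new_pages:
                # Only the first strategy keeps harvesting from new pages.
                queue.extend(SITE.get(url, []))
        return newly_scanned


with_harvest = Monitor(harvest_from_new_pages=True)
without_harvest = Monitor(harvest_from_new_pages=False)
first = with_harvest.run_cycle()
second = without_harvest.run_cycle()
print(first)
print(second)
```

In this sketch the first strategy also reaches `/news/a/details`, a link found only on a scanned-once page, while the second stops at the links listed on the monitored home page.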
Harvest From Filtered HTML
By default, Fluxguard finds links from the full, pre-filtered HTML of monitored pages: in practice, this means that we will extract links from headers and navigation areas, even if you have utilized Fluxguard filters to remove those areas. You may optionally choose to restrict URL discovery to filtered areas to ignore new links that might appear only in filtered sections. This can be useful to focus link discovery on core content areas that are important to you, versus finding many irrelevant links from general purpose navigation and so on.
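As a rough illustration of the two harvesting modes, the Python sketch below collects links either from the full HTML or only from outside a set of filtered regions. It simplifies Fluxguard filters to a list of tag names to ignore, which is an assumption made for the example; `FilteredLinkCollector` and `harvest` are hypothetical names.

```python
from html.parser import HTMLParser


class FilteredLinkCollector(HTMLParser):
    """Collects <a href> values, skipping links nested inside ignored tags."""

    def __init__(self, ignored_tags=()):
        super().__init__()
        self.ignored_tags = set(ignored_tags)
        self.depth_in_ignored = 0  # how many ignored tags currently enclose us
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in self.ignored_tags:
            self.depth_in_ignored += 1
        elif tag == "a" and self.depth_in_ignored == 0:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag in self.ignored_tags and self.depth_in_ignored > 0:
            self.depth_in_ignored -= 1


def harvest(html, ignored_tags=()):
    """Return links in document order, excluding those in ignored regions."""
    collector = FilteredLinkCollector(ignored_tags)
    collector.feed(html)
    return collector.hrefs


page = """
<header><nav><a href="/about">About</a></nav></header>
<main><a href="/press-release/launch">Launch</a></main>
<footer><a href="/terms">Terms</a></footer>
"""
full = harvest(page)  # default: harvest from the full HTML
filtered = harvest(page, ignored_tags=("header", "nav", "footer"))
print(full)
print(filtered)
```

Restricting harvesting to the filtered view leaves only the link from the main content area in this sample.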
Evaluate and Summarize New Pages with Generative AI
You can instruct Fluxguard to assess newly found pages against your business rules. You will be alerted only to those pages that pass muster with our AI. Fluxguard will also use generative AI to summarize these changes.