Struct DiscoveryConfig 

pub struct DiscoveryConfig {
    pub mode: DiscoveryMode,
    pub discover_sitemaps: bool,
    pub max_sitemap_depth: usize,
    pub extract_page_metadata: bool,
    pub link_extract_options: LinkExtractOptions,
    pub rules: Vec<DiscoveryRule>,
}

Discovery-specific runtime configuration, part of the core runtime types and traits used to define and run a crawl.

Fields

mode: DiscoveryMode

How the runtime should discover follow-up work from responses.

discover_sitemaps: bool

Whether sitemap XML should be parsed into follow-up requests.

max_sitemap_depth: usize

Maximum recursion depth for nested sitemap indexes.

extract_page_metadata: bool

Whether page metadata should be extracted and attached to response metadata.

link_extract_options: LinkExtractOptions

Base link extraction options used for HTML discovery.

rules: Vec<DiscoveryRule>

Optional rule-like link discovery behavior matched against source responses.

Implementations

impl DiscoveryConfig

pub fn new() -> DiscoveryConfig

Creates a new discovery config with default values.

pub fn with_mode(self, mode: DiscoveryMode) -> DiscoveryConfig

Sets the discovery mode.

pub fn with_sitemaps(self, enabled: bool) -> DiscoveryConfig

Enables or disables sitemap parsing.

pub fn with_max_sitemap_depth(self, depth: usize) -> DiscoveryConfig

Sets the maximum nested sitemap depth.
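
One way to read the depth limit is as a bound on how many levels of sitemap indexes get expanded. The sketch below is not this crate's implementation: `Sitemap`, `collect_page_urls`, the in-memory map standing in for fetched XML, and the convention that the root sitemap sits at depth 0 are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, already-parsed sitemap document.
pub enum Sitemap {
    /// A sitemap index: URLs of nested sitemaps.
    Index(Vec<String>),
    /// A plain sitemap: page URLs.
    UrlSet(Vec<String>),
}

/// Walks nested sitemaps starting at `root` and returns the page URLs found.
/// An index at depth `d` is expanded only while `d < max_depth` (assumed
/// convention), which also bounds the walk if indexes reference each other.
pub fn collect_page_urls(
    sitemaps: &HashMap<String, Sitemap>,
    root: &str,
    max_depth: usize,
) -> Vec<String> {
    let mut pages = Vec::new();
    let mut stack = vec![(root.to_string(), 0usize)];
    while let Some((url, depth)) = stack.pop() {
        match sitemaps.get(&url) {
            Some(Sitemap::UrlSet(urls)) => pages.extend(urls.iter().cloned()),
            Some(Sitemap::Index(children)) if depth < max_depth => {
                for child in children {
                    stack.push((child.clone(), depth + 1));
                }
            }
            // Unknown sitemap, or an index beyond the depth limit: skip it.
            _ => {}
        }
    }
    pages.sort();
    pages
}
```

Under this reading, a depth of 0 still parses a plain root sitemap but never follows an index, and each increment allows one more level of nesting.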

pub fn with_page_metadata(self, enabled: bool) -> DiscoveryConfig

Enables or disables page metadata extraction.
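
The setters above take `self` by value and return the config, so they can be chained. The following is a minimal sketch of that pattern using stand-in types: only the method signatures mirror this page, while the `DiscoveryMode` variants, the reduced field set, the derived defaults, and the helper `sitemap_crawl_config` are assumptions for illustration.

```rust
// Stand-in types. The enum variants and default values are hypothetical.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub enum DiscoveryMode {
    #[default]
    HtmlLinks, // hypothetical variant
    Disabled,  // hypothetical variant
}

#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct DiscoveryConfig {
    pub mode: DiscoveryMode,
    pub discover_sitemaps: bool,
    pub max_sitemap_depth: usize,
    pub extract_page_metadata: bool,
}

impl DiscoveryConfig {
    pub fn new() -> DiscoveryConfig {
        DiscoveryConfig::default()
    }

    // Each setter consumes the config, updates one field, and returns it.
    pub fn with_mode(mut self, mode: DiscoveryMode) -> DiscoveryConfig {
        self.mode = mode;
        self
    }

    pub fn with_sitemaps(mut self, enabled: bool) -> DiscoveryConfig {
        self.discover_sitemaps = enabled;
        self
    }

    pub fn with_max_sitemap_depth(mut self, depth: usize) -> DiscoveryConfig {
        self.max_sitemap_depth = depth;
        self
    }

    pub fn with_page_metadata(mut self, enabled: bool) -> DiscoveryConfig {
        self.extract_page_metadata = enabled;
        self
    }
}

/// Hypothetical helper: builds a config by chaining the consuming setters.
pub fn sitemap_crawl_config() -> DiscoveryConfig {
    DiscoveryConfig::new()
        .with_mode(DiscoveryMode::HtmlLinks)
        .with_sitemaps(true)
        .with_max_sitemap_depth(2)
        .with_page_metadata(false)
}
```

Because every setter moves the value, a config that is needed again after a chain has to be cloned first; the struct implements `Clone` for that purpose.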

Replaces the base link extraction options.

pub fn with_rules(self, rules: impl IntoIterator<Item = DiscoveryRule>) -> DiscoveryConfig

Replaces the configured discovery rules.

pub fn with_rule(self, rule: DiscoveryRule) -> DiscoveryConfig

Adds a single discovery rule.
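
The difference between the two rule setters is that `with_rules` replaces the whole list while `with_rule` appends to it. A minimal sketch of that behavior follows, assuming a stand-in `DiscoveryRule` with a hypothetical `name` field and the hypothetical helpers `rule` and `rule_names`; the real rule type is not described on this page.

```rust
// Stand-in rule type; the `name` field is hypothetical.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DiscoveryRule {
    pub name: String,
}

// Stand-in config that keeps only the `rules` field.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct DiscoveryConfig {
    pub rules: Vec<DiscoveryRule>,
}

impl DiscoveryConfig {
    /// Replaces any previously configured rules.
    pub fn with_rules(
        mut self,
        rules: impl IntoIterator<Item = DiscoveryRule>,
    ) -> DiscoveryConfig {
        self.rules = rules.into_iter().collect();
        self
    }

    /// Appends one rule, keeping the existing ones.
    pub fn with_rule(mut self, rule: DiscoveryRule) -> DiscoveryConfig {
        self.rules.push(rule);
        self
    }
}

/// Hypothetical helper for building a rule from a name.
pub fn rule(name: &str) -> DiscoveryRule {
    DiscoveryRule { name: name.to_string() }
}

/// Hypothetical helper that lists rule names in order.
pub fn rule_names(config: &DiscoveryConfig) -> Vec<&str> {
    config.rules.iter().map(|r| r.name.as_str()).collect()
}
```

In this sketch, calling `with_rule(rule("a"))`, then `with_rules(vec![rule("b"), rule("c")])`, then `with_rule(rule("d"))` leaves the rules `b`, `c`, `d`: the replacement drops `a`, and the final call appends.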

pub fn with_same_site_only(self, enabled: bool) -> DiscoveryConfig

Sets whether only same-site links should be discovered.

Sets whether text content should be scanned for plain-text URLs.

pub fn with_allow_patterns(self, patterns: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Restricts discovery to URLs that match at least one glob-style pattern.

pub fn with_deny_patterns(self, patterns: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Excludes URLs that match any glob-style pattern.
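
How allow and deny patterns might combine can be sketched as follows. The glob syntax this crate accepts is not documented here, so the matcher below supports only `*` (any run of characters), and the rules that an empty allow list permits everything and that deny takes precedence over allow are assumptions. `glob_match` and `is_url_allowed` are hypothetical names, not part of this API.

```rust
/// Matches `text` against `pattern`, where `*` matches any run of bytes
/// (including none). Every other byte must match literally.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let (p, t) = (pattern.as_bytes(), text.as_bytes());
    let (mut pi, mut ti) = (0, 0);
    // Position after the most recent `*` and the text index it currently covers up to.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == b'*' {
            star = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more byte.
            pi = sp;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any pattern left over must consist only of `*`.
    p[pi..].iter().all(|&b| b == b'*')
}

/// A URL passes if it matches at least one allow pattern (or the allow list
/// is empty) and matches no deny pattern. Deny winning is an assumption.
pub fn is_url_allowed(url: &str, allow: &[&str], deny: &[&str]) -> bool {
    let allowed = allow.is_empty() || allow.iter().any(|p| glob_match(p, url));
    let denied = deny.iter().any(|p| glob_match(p, url));
    allowed && !denied
}
```

With an allow list of `https://example.com/blog/*` and a deny list of `*.pdf`, this sketch accepts `https://example.com/blog/post-1` and rejects both `https://example.com/blog/file.pdf` and `https://example.com/shop/item`.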

pub fn with_allow_domains(self, domains: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Restricts discovery to the given domains or subdomains.

pub fn with_deny_domains(self, domains: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Excludes discovery for the given domains or subdomains.
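
"Domains or subdomains" suggests that a configured domain also covers hosts beneath it. A minimal sketch of such a check follows, assuming case-insensitive comparison and a match on whole DNS labels; `host_matches_domain` and `is_host_allowed` are hypothetical helpers, and the crate's actual matching rules may differ.

```rust
/// Returns true when `host` equals `domain` or is a subdomain of it.
/// Matching is on whole labels, so `badexample.com` does not match `example.com`.
pub fn host_matches_domain(host: &str, domain: &str) -> bool {
    let host = host.to_ascii_lowercase();
    let domain = domain.trim_start_matches('.').to_ascii_lowercase();
    if domain.is_empty() {
        return false;
    }
    host == domain || host.ends_with(&format!(".{domain}"))
}

/// A host passes if it matches some allowed domain (or the allow list is
/// empty) and matches no denied domain. Deny winning is an assumption.
pub fn is_host_allowed(host: &str, allow: &[&str], deny: &[&str]) -> bool {
    let allowed = allow.is_empty() || allow.iter().any(|d| host_matches_domain(host, d));
    let denied = deny.iter().any(|d| host_matches_domain(host, d));
    allowed && !denied
}
```

Comparing against `.{domain}` rather than using a bare suffix test is what keeps unrelated hosts that merely end in the same characters from slipping through.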

pub fn with_allow_path_prefixes(self, prefixes: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Restricts discovery to URL paths with one of the provided prefixes.

pub fn with_deny_path_prefixes(self, prefixes: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Excludes URL paths with one of the provided prefixes.

pub fn with_allowed_tags(self, tags: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Restricts attribute extraction to specific HTML tags.

pub fn with_allowed_attributes(self, attributes: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Restricts attribute extraction to specific attributes.

Restricts discovery to the provided link types.

Excludes the provided link types from discovery.

Returns the effective link extraction options for the configured mode.

Returns the effective link extraction options for a specific rule or override.

pub fn should_extract_metadata(&self) -> bool

Returns true when metadata extraction should run.

Trait Implementations

impl Clone for DiscoveryConfig

fn clone(&self) -> DiscoveryConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for DiscoveryConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

impl Default for DiscoveryConfig

fn default() -> DiscoveryConfig

Returns the “default value” for a type.

impl PartialEq for DiscoveryConfig

fn eq(&self, other: &DiscoveryConfig) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for DiscoveryConfig

impl StructuralPartialEq for DiscoveryConfig

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key.

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.