Module prelude


Convenient re-exports for spider-lib applications.

Most example code in this workspace starts here:

use spider_lib::prelude::*;

The prelude intentionally groups together the “first spider” surface area: runtime types, the spider trait, common errors, parsing helpers, middleware, and the most common pipelines.
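To illustrate what a prelude module does in general, here is a minimal sketch of the pattern using only the standard language features. Every name in it (`runtime`, `errors`, `Engine`, `EngineError`, `build`) is hypothetical and chosen for illustration; none of it is taken from spider-lib.

```rust
// Hypothetical crate layout; none of these names come from spider-lib.
mod runtime {
    pub struct Engine {
        pub name: &'static str,
    }

    impl Engine {
        pub fn new(name: &'static str) -> Self {
            Engine { name }
        }
    }
}

mod errors {
    #[derive(Debug, PartialEq)]
    pub enum EngineError {
        EmptyName,
    }
}

// The prelude defines nothing itself; it only re-exports commonly used items.
pub mod prelude {
    pub use super::errors::EngineError;
    pub use super::runtime::Engine;
}

// One glob import brings every re-exported name into scope.
use prelude::*;

fn build(name: &'static str) -> Result<Engine, EngineError> {
    if name.is_empty() {
        return Err(EngineError::EmptyName);
    }
    Ok(Engine::new(name))
}

fn main() {
    let engine = build("demo").expect("non-empty name");
    println!("built engine: {}", engine.name);
}
```

The design trade-off is the usual one for glob imports: less typing in application code, at the cost of names appearing in scope without an explicit import line.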

Re-exports

pub use spider_core::tokio;

Structs

AutoThrottleMiddleware
Adaptive throttling middleware that adjusts request pacing dynamically based on observed response behavior.
Checkpoint
Checkpoint types for save/resume workflows. A complete checkpoint of the crawler’s state.
ConcurrentMap
Core runtime types and traits used to define and run a crawl. A thread-safe key-value map using DashMap.
ConcurrentVec
Core runtime types and traits used to define and run a crawl. A thread-safe vector using RwLock.
ConsolePipeline
Built-in pipelines that do not require extra feature flags. Pipeline that logs each scraped item with log::info!.
CookieMiddleware
Shared cookie jar middleware that keeps one cookie store across requests.
Counter
Core runtime types and traits used to define and run a crawl. A thread-safe counter using atomic operations.
Counter64
Core runtime types and traits used to define and run a crawl. A 64-bit thread-safe counter for large counts.
Crawler
Core runtime types and traits used to define and run a crawl. The running crawler instance.
CrawlerBuilder
Core runtime types and traits used to define and run a crawl. A fluent builder for constructing Crawler instances.
CrawlerConfig
Core runtime types and traits used to define and run a crawl. Core runtime configuration for the crawler.
CrawlerState
Core runtime types and traits used to define and run a crawl. Internal shared state used by the runtime.
CsvPipeline
CSV file output pipeline. A pipeline that exports scraped items to a CSV file. Headers are determined from the keys of the first item processed.
DeduplicationPipeline
Built-in pipelines that do not require extra feature flags. Pipeline that filters duplicate items based on a configurable field set.
DiscoveryConfig
Core runtime types and traits used to define and run a crawl. Discovery-specific runtime configuration.
DiscoveryRule
Core runtime types and traits used to define and run a crawl. Rule-like configuration for runtime-managed discovery.
Flag
Core runtime types and traits used to define and run a crawl. A thread-safe boolean flag.
HttpCacheMiddleware
File-backed HTTP response cache middleware. Middleware that caches successful HTTP responses on disk.
ItemFieldSchema
Static schema metadata for a single item field.
ItemSchema
Static schema metadata for a scraped item type.
JsonPipeline
JSON array output pipeline. A pipeline that writes all scraped items to a single JSON file as a JSON array. Items are collected in a blocking task and written to disk when the pipeline is closed.
JsonlPipeline
JSON Lines output pipeline. A pipeline that writes each scraped item to a JSON Lines (.jsonl) file. Each item is written as a JSON object on a new line.
Link
Shared runtime data types and convenience helpers. A link discovered while extracting URLs from a response.
LinkExtractOptions
Shared runtime data types and convenience helpers. Options that control link extraction from a Response.
LinkSource
Shared runtime data types and convenience helpers. One selector/attribute pair used during link extraction.
PageMetadata
Shared runtime data types and convenience helpers. Structured page metadata extracted from an HTML response.
ParseContext
Core runtime types and traits used to define and run a crawl. Parse-time context passed into Spider::parse.
ParseOutput
Parse-time output sink and item contracts used by Spider::parse. Async output sink passed into a spider’s parse method.
ProxyMiddleware
Proxy routing middleware. Middleware that assigns proxies to outgoing requests and rotates them based on strategy.
RateLimitMiddleware
Built-in middleware that is available without extra feature flags. A middleware for rate limiting requests.
RefererMiddleware
Built-in middleware that is available without extra feature flags. Middleware that derives Referer values from request metadata and history.
Request
Shared runtime data types and convenience helpers. Outgoing HTTP request used by the crawler runtime.
ReqwestClientDownloader
Core runtime types and traits used to define and run a crawl. Downloader implementation backed by reqwest::Client.
Response
Shared runtime data types and convenience helpers. Represents an HTTP response received from a server.
RetryMiddleware
Built-in middleware that is available without extra feature flags. Middleware that retries failed requests.
RobotsTxtMiddleware
Middleware that enforces robots.txt rules before download.
Scheduler
Core runtime types and traits used to define and run a crawl. Manages the crawl frontier and tracks visited request fingerprints.
SchedulerCheckpoint
Checkpoint types for save/resume workflows. A snapshot of the scheduler’s state.
SchemaExportConfig
Built-in pipelines that do not require extra feature flags. Export configuration derived from typed item schema metadata.
SchemaTransformPipeline
Built-in pipelines that do not require extra feature flags. Typed transform pipeline for item-to-item transforms before export.
SchemaValidationPipeline
Built-in pipelines that do not require extra feature flags. Schema-aware validation for typed items.
SchemaViolation
Built-in pipelines that do not require extra feature flags. Validation failure details for schema-aware pipelines.
SelectorList
Shared runtime data types and convenience helpers. A Scrapy-like selection result list.
SelectorNode
Shared runtime data types and convenience helpers. A node selected from an HTML document using the builtin CSS selector API.
SqlitePipeline
SQLite output pipeline. A pipeline that writes scraped items to a SQLite database. All database operations are offloaded to a dedicated blocking thread.
StatCollector
Core runtime types and traits used to define and run a crawl. Collects and stores various statistics about the crawler’s operation.
StateAccessMetrics
Core runtime types and traits used to define and run a crawl. Metrics for tracking state access patterns.
StreamJsonPipeline
Streaming JSON output pipeline. A pipeline that streams items directly to a JSON file without accumulating them in memory.
TransformPipeline
Built-in pipelines that do not require extra feature flags. Pipeline that transforms items and forwards successful results downstream.
UserAgentMiddleware
Configurable user-agent selection and rotation middleware. Middleware that sets and rotates User-Agent headers for outgoing requests.
ValidationPipeline
Built-in pipelines that do not require extra feature flags. Pipeline that validates items and drops invalid ones.
VisitedUrls
Core runtime types and traits used to define and run a crawl. A thread-safe URL tracker using DashMap.
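The shared-state entries above (Counter, Counter64, Flag, ConcurrentVec) are described only by the primitive behind them: atomic operations, a boolean flag, and an RwLock-guarded vector. The following standard-library sketch shows those three techniques in isolation. The types `SketchCounter`, `SketchFlag`, and `SketchVec` and the function `run_workers` are hypothetical stand-ins, not the spider-lib types, and their method names are assumptions.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical counter backed by a single atomic integer.
struct SketchCounter(AtomicU64);

impl SketchCounter {
    fn new() -> Self {
        SketchCounter(AtomicU64::new(0))
    }
    /// Increments the counter and returns the new value.
    fn inc(&self) -> u64 {
        // fetch_add returns the previous value, so add 1 for the new one.
        self.0.fetch_add(1, Ordering::Relaxed) + 1
    }
    fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical boolean flag backed by an atomic bool.
struct SketchFlag(AtomicBool);

impl SketchFlag {
    fn new() -> Self {
        SketchFlag(AtomicBool::new(false))
    }
    fn set(&self, value: bool) {
        self.0.store(value, Ordering::SeqCst);
    }
    fn is_set(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical vector guarded by a read-write lock.
struct SketchVec<T>(RwLock<Vec<T>>);

impl<T> SketchVec<T> {
    fn new() -> Self {
        SketchVec(RwLock::new(Vec::new()))
    }
    fn push(&self, value: T) {
        // Writers take the exclusive lock.
        self.0.write().expect("lock poisoned").push(value);
    }
    fn len(&self) -> usize {
        // Readers share the lock with other readers.
        self.0.read().expect("lock poisoned").len()
    }
}

/// Spawns `workers` threads that share all three structures.
fn run_workers(workers: u64, per_worker: u64) -> (u64, usize, bool) {
    let counter = Arc::new(SketchCounter::new());
    let seen = Arc::new(SketchVec::<u64>::new());
    let done = SketchFlag::new();

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let counter = Arc::clone(&counter);
            let seen = Arc::clone(&seen);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    counter.inc();
                }
                seen.push(id);
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker panicked");
    }
    done.set(true);
    (counter.get(), seen.len(), done.is_set())
}

fn main() {
    let (count, entries, finished) = run_workers(4, 1000);
    println!("count={count} entries={entries} finished={finished}");
}
```

Atomics suit single-value counters and flags because they need no lock; the RwLock suits a collection that is read far more often than it is written.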

Enums

CrawlShapePreset
Core runtime types and traits used to define and run a crawl. Guided runtime presets for common crawl shapes.
DiscoveryMode
Core runtime types and traits used to define and run a crawl. Runtime discovery mode applied to each downloaded response.
FieldValueType
Stable field kinds used by typed item schema metadata.
JsonType
Built-in pipelines that do not require extra feature flags. JSON value type used by ValidationRule::Type.
LevelFilter
Logging level enum used by CrawlerBuilder::log_level. An enum representing the available verbosity level filters of the logger.
LinkType
Shared runtime data types and convenience helpers. Classification for links discovered in a response.
Method
Shared runtime data types and convenience helpers. Transport-neutral HTTP method used by Request.
MiddlewareAction
Middleware trait and control-flow type for request/response hooks. Control-flow result returned by middleware hooks.
PipelineError
Shared runtime data types and convenience helpers. Error type used by item pipelines.
SpiderError
Shared runtime data types and convenience helpers. Main runtime error type used across the crawler stack.
StartRequests
Core runtime types and traits used to define and run a crawl. Initial request source returned by Spider::start_requests.
TransformOperation
Built-in pipelines that do not require extra feature flags. Built-in operations applied to top-level object fields.
ValidationRule
Built-in pipelines that do not require extra feature flags. Declarative rules for validating top-level item fields.

Traits

Downloader
Core runtime types and traits used to define and run a crawl. Trait implemented by HTTP downloaders used by the crawler runtime.
Middleware
Middleware trait and control-flow type for request/response hooks. Trait implemented by request/response middleware.
Pipeline
Pipeline trait for item-processing stages. Contract implemented by item-processing pipelines.
ScrapedItem
Parse-time output sink and item contracts used by Spider::parse. Trait implemented by item types emitted from spiders.
Spider
Core runtime types and traits used to define and run a crawl. Defines the contract for a spider.
TypedItemSchema
Trait for typed item definitions that can expose static schema metadata.

Functions

create_dir
Shared runtime data types and convenience helpers. Creates a directory and all of its parent components if they are missing.
is_same_site
Shared runtime data types and convenience helpers. Checks if two URLs belong to the same site.
normalize_origin
Shared runtime data types and convenience helpers. Normalizes the origin of a request’s URL.
validate_output_dir
Shared runtime data types and convenience helpers. Validates that the parent directory of a given file path exists, creating it if necessary.
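The two filesystem helpers above are described as creating missing parent directories. A minimal standard-library sketch of that behavior follows. The function `ensure_parent_dir` is hypothetical and its signature is an assumption; it is not the signature of `create_dir` or `validate_output_dir`, which this page does not show.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: make sure the parent directory of `file_path` exists,
/// creating it and any missing ancestors.
fn ensure_parent_dir(file_path: &Path) -> io::Result<()> {
    match file_path.parent() {
        // A bare file name has an empty parent, so there is nothing to create.
        Some(parent) if !parent.as_os_str().is_empty() => fs::create_dir_all(parent),
        _ => Ok(()),
    }
}

fn main() -> io::Result<()> {
    // Example output path under the system temp directory.
    let out = std::env::temp_dir()
        .join("prelude_sketch")
        .join("nested")
        .join("items.jsonl");
    ensure_parent_dir(&out)?;
    println!("parent exists: {}", out.parent().map_or(false, Path::is_dir));
    Ok(())
}
```

`fs::create_dir_all` succeeds when the directory already exists, so calling the helper repeatedly is safe.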

Type Aliases

StartRequestIter
Core runtime types and traits used to define and run a crawl. A boxed iterator of start requests.
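A boxed iterator alias lets differently built sources share one return type. The sketch below shows the general technique with plain `String` URLs. The alias `UrlIter` and both functions are hypothetical; the actual definition and item type of StartRequestIter are not shown on this page and may differ.

```rust
/// Hypothetical alias: any sendable iterator of URL strings, type-erased behind a Box.
type UrlIter<'a> = Box<dyn Iterator<Item = String> + Send + 'a>;

/// Builds the iterator from a fixed list of URLs.
fn from_static_list(urls: &'static [&'static str]) -> UrlIter<'static> {
    Box::new(urls.iter().map(|u| u.to_string()))
}

/// Builds the iterator lazily from a page range; nothing is allocated up front.
fn from_page_range(base: &str, pages: std::ops::RangeInclusive<u32>) -> UrlIter<'_> {
    Box::new(pages.map(move |n| format!("{base}?page={n}")))
}

fn main() {
    // Both sources have the same type, so they can be chained or swapped freely.
    let all = from_static_list(&["https://example.com/"])
        .chain(from_page_range("https://example.com/list", 1..=3));
    for url in all {
        println!("{url}");
    }
}
```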

Attribute Macros

async_trait
Core runtime types and traits used to define and run a crawl.
scraped_item
Helper macro used to define item structs that satisfy ScrapedItem. Attribute macro for defining a scraped item type.