Convenient re-exports for spider-lib applications.
Most example code in this workspace starts here:

use spider_lib::prelude::*;

The prelude intentionally groups together the “first spider” surface area: runtime types, the spider trait, common errors, parsing helpers, middleware, and the most common pipelines.
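As a hedged sketch of that “first spider” surface: the type and trait names below (`Spider`, `StartRequests`, `ParseContext`, `ParseOutput`, `SpiderError`, `async_trait`) all come from this prelude, but the method signatures, the `StartRequests::from_urls` constructor, and the `QuotesSpider` type are illustrative assumptions, not the crate’s documented API — check the `Spider` trait docs for the exact contract.

```rust
use spider_lib::prelude::*;

// Hypothetical minimal spider; names are from the prelude,
// signatures are assumed for illustration only.
struct QuotesSpider;

#[async_trait]
impl Spider for QuotesSpider {
    fn start_requests(&self) -> StartRequests {
        // Seed the crawl frontier with one start URL (assumed constructor).
        StartRequests::from_urls(["https://example.com"])
    }

    async fn parse(&self, ctx: ParseContext, out: ParseOutput) -> Result<(), SpiderError> {
        // Inspect ctx.response, select nodes with the built-in CSS selector
        // API, then emit items and follow-up requests through `out`.
        Ok(())
    }
}
```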
Re-exports
pub use spider_core::tokio;
Structs
- `AutoThrottleMiddleware` - Middleware that adapts pacing dynamically based on observed response feedback.
- `Checkpoint` - A complete checkpoint of the crawler’s state.
- `ConcurrentMap` - A thread-safe key-value map using `DashMap`.
- `ConcurrentVec` - A thread-safe vector using `RwLock`.
- `ConsolePipeline` - Pipeline that logs each scraped item with `log::info!`.
- `CookieMiddleware` - Middleware that keeps a shared cookie store across requests.
- `Counter` - A thread-safe counter using atomic operations.
- `Counter64` - A 64-bit thread-safe counter for large counts.
- `Crawler` - The running crawler instance.
- `CrawlerBuilder` - A fluent builder for constructing `Crawler` instances.
- `CrawlerConfig` - Core runtime configuration for the crawler.
- `CrawlerState` - Internal shared state used by the runtime.
- `CsvPipeline` - A pipeline that exports scraped items to a CSV file. Headers are determined from the keys of the first item processed.
- `DeduplicationPipeline` - Pipeline that filters duplicate items based on a configurable field set.
- `DiscoveryConfig` - Discovery-specific runtime configuration.
- `DiscoveryRule` - Rule-like configuration for runtime-managed discovery.
- `Flag` - A thread-safe boolean flag.
- `HttpCacheMiddleware` - Middleware that caches successful HTTP responses on disk.
- `ItemFieldSchema` - Static schema metadata for a single item field.
- `ItemSchema` - Static schema metadata for a scraped item type.
- `JsonPipeline` - A pipeline that writes all scraped items to a single JSON file as a JSON array. Items are collected in a blocking task and written to disk when the pipeline is closed.
- `JsonlPipeline` - A pipeline that writes each scraped item to a JSON Lines (`.jsonl`) file, one JSON object per line.
- `Link` - A link discovered while extracting URLs from a response.
- `LinkExtractOptions` - Options that control link extraction from a `Response`.
- `LinkSource` - One selector/attribute pair used during link extraction.
- `PageMetadata` - Structured page metadata extracted from an HTML response.
- `ParseContext` - Parse-time context passed into `Spider::parse`.
- `ParseOutput` - Async output sink passed into a spider’s `parse` method.
- `ProxyMiddleware` - Middleware that assigns proxies to outgoing requests and rotates them based on strategy.
- `RateLimitMiddleware` - Middleware for rate limiting requests.
- `RefererMiddleware` - Middleware that derives `Referer` values from request metadata and history.
- `Request` - Outgoing HTTP request used by the crawler runtime.
- `ReqwestClientDownloader` - Downloader implementation backed by `reqwest::Client`.
- `Response` - Represents an HTTP response received from a server.
- `RetryMiddleware` - Middleware that retries failed requests.
- `RobotsTxtMiddleware` - Middleware that enforces `robots.txt` rules before download.
- `Scheduler` - Manages the crawl frontier and tracks visited request fingerprints.
- `SchedulerCheckpoint` - A snapshot of the scheduler’s state.
- `SchemaExportConfig` - Export configuration derived from typed item schema metadata.
- `SchemaTransformPipeline` - Typed transform pipeline for item-to-item transforms before export.
- `SchemaValidationPipeline` - Schema-aware validation for typed items.
- `SchemaViolation` - Validation failure details for schema-aware pipelines.
- `SelectorList` - A Scrapy-like selection result list.
- `SelectorNode` - A node selected from an HTML document using the built-in CSS selector API.
- `SqlitePipeline` - A pipeline that writes scraped items to a SQLite database. All database operations are offloaded to a dedicated blocking thread.
- `StatCollector` - Collects and stores statistics about the crawler’s operation.
- `StateAccessMetrics` - Metrics for tracking state access patterns.
- `StreamJsonPipeline` - A pipeline that streams items directly to a JSON file without accumulating them in memory.
- `TransformPipeline` - Pipeline that transforms items and forwards successful results downstream.
- `UserAgentMiddleware` - Middleware that sets and rotates `User-Agent` headers for outgoing requests.
- `ValidationPipeline` - Pipeline that validates items and drops invalid ones.
- `VisitedUrls` - A thread-safe URL tracker using `DashMap`.
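Several of the structs above are meant to be composed through `CrawlerBuilder`. As a hedged sketch only: the builder method names below (`new`, `middleware`, `pipeline`, `build`, `run`) and the constructor arguments are assumptions for illustration, not the crate’s documented API, and `QuotesSpider` is a hypothetical spider type.

```rust
use spider_lib::prelude::*;

// Illustrative wiring of middleware and pipelines; method names
// and constructors are assumed, not taken from the crate docs.
let crawler = CrawlerBuilder::new(QuotesSpider)
    .middleware(RetryMiddleware::default())       // retry failed requests
    .middleware(UserAgentMiddleware::default())   // rotate User-Agent headers
    .pipeline(ValidationPipeline::default())      // drop invalid items
    .pipeline(JsonlPipeline::new("items.jsonl"))  // one JSON object per line
    .build()?;
crawler.run().await?;
```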
Enums
- `CrawlShapePreset` - Guided runtime presets for common crawl shapes.
- `DiscoveryMode` - Runtime discovery mode applied to each downloaded response.
- `FieldValueType` - Stable field kinds used by typed item schema metadata.
- `JsonType` - JSON value type used by `ValidationRule::Type`.
- `LevelFilter` - The available verbosity level filters of the logger, used by `CrawlerBuilder::log_level`.
- `LinkType` - Classification for links discovered in a response.
- `Method` - Transport-neutral HTTP method used by `Request`.
- `MiddlewareAction` - Control-flow result returned by middleware hooks.
- `PipelineError` - Error type used by item pipelines.
- `SpiderError` - Main runtime error type used across the crawler stack.
- `StartRequests` - Initial request source returned by `Spider::start_requests`.
- `TransformOperation` - Built-in operations applied to top-level object fields.
- `ValidationRule` - Declarative rules for validating top-level item fields.
Traits
- `Downloader` - Trait implemented by HTTP downloaders used by the crawler runtime.
- `Middleware` - Trait implemented by request/response middleware.
- `Pipeline` - Contract implemented by item-processing pipelines.
- `ScrapedItem` - Trait implemented by item types emitted from spiders.
- `Spider` - Defines the contract for a spider.
- `TypedItemSchema` - Trait for typed item definitions that can expose static schema metadata.
Functions
- `create_dir` - Creates a directory and all of its parent components if they are missing.
- `is_same_site` - Checks whether two URLs belong to the same site.
- `normalize_origin` - Normalizes the origin of a request’s URL.
- `validate_output_dir` - Validates that the parent directory of a given file path exists, creating it if necessary.
Type Aliases
- `StartRequestIter` - A boxed iterator of start requests.
Attribute Macros
- `async_trait` - Re-export of the `async_trait` attribute macro used when writing async trait implementations for the core runtime.
- `scraped_item` - Attribute macro for defining a scraped item type; a helper for item structs that satisfy `ScrapedItem`.