Convenient re-exports for spider-lib applications.
Most example code in this workspace starts here:

use spider_lib::prelude::*;

The prelude intentionally groups together the “first spider” surface area: runtime types, the spider trait, common errors, parsing helpers, middleware, and the most common pipelines.
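As a hedged sketch of that “first spider” surface: the type and trait names below (`Spider`, `StartRequests`, `ParseContext`, `ParseOutput`, `SpiderError`, `async_trait`) all come from this prelude, but the method signatures, the `StartRequests::from_urls` constructor, and the `QuotesSpider` type are illustrative assumptions, not the crate’s documented API — check the `Spider` trait docs for the exact contract.

```rust
use spider_lib::prelude::*;

// Hypothetical minimal spider; names are from the prelude,
// signatures are assumed for illustration only.
struct QuotesSpider;

#[async_trait]
impl Spider for QuotesSpider {
    fn start_requests(&self) -> StartRequests {
        // Seed the crawl frontier with one start URL (assumed constructor).
        StartRequests::from_urls(["https://example.com"])
    }

    async fn parse(&self, ctx: ParseContext, out: ParseOutput) -> Result<(), SpiderError> {
        // Inspect ctx.response, select nodes with the built-in CSS selector
        // API, then emit items and follow-up requests through `out`.
        Ok(())
    }
}
```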
Re-exports
pub use spider_core::tokio;
Structs
- `AutoThrottleMiddleware` - Middleware that adapts pacing dynamically based on observed response feedback.
- `Checkpoint` - A complete checkpoint of the crawler’s state.
- `ConcurrentMap` - A thread-safe key-value map using `DashMap`.
- `ConcurrentVec` - A thread-safe vector using `RwLock`.
- `ConsolePipeline` - Pipeline that logs each scraped item with `log::info!`.
- `CookieMiddleware` - Middleware that keeps a shared cookie store across requests.
- `Counter` - A thread-safe counter using atomic operations.
- `Counter64` - A 64-bit thread-safe counter for large counts.
- `Crawler` - The running crawler instance.
- `CrawlerBuilder` - A fluent builder for constructing `Crawler` instances.
- `CrawlerConfig` - Core runtime configuration for the crawler.
- `CrawlerState` - Internal shared state used by the runtime.
- `CsvPipeline` - A pipeline that exports scraped items to a CSV file. Headers are determined from the keys of the first item processed.
- `DeduplicationPipeline` - Pipeline that filters duplicate items based on a configurable field set.
- `DiscoveryConfig` - Discovery-specific runtime configuration.
- `DiscoveryRule` - Rule-like configuration for runtime-managed discovery.
- `Flag` - A thread-safe boolean flag.
- `HttpCacheMiddleware` - Middleware that caches successful HTTP responses on disk.
- `ItemFieldSchema` - Static schema metadata for a single item field.
- `ItemSchema` - Static schema metadata for a scraped item type.
- `JsonPipeline` - A pipeline that writes all scraped items to a single JSON file as a JSON array. Items are collected in a blocking task and written to disk when the pipeline is closed.
- `JsonlPipeline` - A pipeline that writes each scraped item to a JSON Lines (`.jsonl`) file, one JSON object per line.
- `Link` - A link discovered while extracting URLs from a response.
- `LinkExtractOptions` - Options that control link extraction from a `Response`.
- `LinkSource` - One selector/attribute pair used during link extraction.
- `PageMetadata` - Structured page metadata extracted from an HTML response.
- `ParseContext` - Parse-time context passed into `Spider::parse`.
- `ParseOutput` - Async output sink passed into a spider’s `parse` method.
- `ProxyMiddleware` - Middleware that assigns proxies to outgoing requests and rotates them based on strategy.
- `RateLimitMiddleware` - Middleware for rate limiting requests.
- `RefererMiddleware` - Middleware that derives `Referer` values from request metadata and history.
- `Request` - Outgoing HTTP request used by the crawler runtime.
- `ReqwestClientDownloader` - Downloader implementation backed by `reqwest::Client`.
- `Response` - Represents an HTTP response received from a server.
- `RetryMiddleware` - Middleware that retries failed requests.
- `RobotsTxtMiddleware` - Middleware that enforces `robots.txt` rules before download.
- `Scheduler` - Manages the crawl frontier and tracks visited request fingerprints.
- `SchedulerCheckpoint` - A snapshot of the scheduler’s state.
- `SchemaExportConfig` - Export configuration derived from typed item schema metadata.
- `SchemaTransformPipeline` - Typed transform pipeline for item-to-item transforms before export.
- `SchemaValidationPipeline` - Schema-aware validation for typed items.
- `SchemaViolation` - Validation failure details for schema-aware pipelines.
- `SelectorList` - A Scrapy-like selection result list.
- `SelectorNode` - A node selected from an HTML document using the built-in CSS selector API.
- `SqlitePipeline` - A pipeline that writes scraped items to a SQLite database. All database operations are offloaded to a dedicated blocking thread.
- `StatCollector` - Collects and stores statistics about the crawler’s operation.
- `StateAccessMetrics` - Metrics for tracking state access patterns.
- `StreamJsonPipeline` - A pipeline that streams items directly to a JSON file without accumulating them in memory.
- `TransformPipeline` - Pipeline that transforms items and forwards successful results downstream.
- `UserAgentMiddleware` - Middleware that sets and rotates `User-Agent` headers for outgoing requests.
- `ValidationPipeline` - Pipeline that validates items and drops invalid ones.
- `VisitedUrls` - A thread-safe URL tracker using `DashMap`.
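Several of the structs above are meant to be composed through `CrawlerBuilder`. As a hedged sketch only: the builder method names below (`new`, `middleware`, `pipeline`, `build`, `run`) and the constructor arguments are assumptions for illustration, not the crate’s documented API, and `QuotesSpider` is a hypothetical spider type.

```rust
use spider_lib::prelude::*;

// Illustrative wiring of middleware and pipelines; method names
// and constructors are assumed, not taken from the crate docs.
let crawler = CrawlerBuilder::new(QuotesSpider)
    .middleware(RetryMiddleware::default())       // retry failed requests
    .middleware(UserAgentMiddleware::default())   // rotate User-Agent headers
    .pipeline(ValidationPipeline::default())      // drop invalid items
    .pipeline(JsonlPipeline::new("items.jsonl"))  // one JSON object per line
    .build()?;
crawler.run().await?;
```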
Enums
- `CrawlShapePreset` - Guided runtime presets for common crawl shapes.
- `DiscoveryMode` - Runtime discovery mode applied to each downloaded response.
- `FieldValueType` - Stable field kinds used by typed item schema metadata.
- `JsonType` - JSON value type used by `ValidationRule::Type`.
- `LevelFilter` - The available verbosity level filters of the logger, used by `CrawlerBuilder::log_level`.
- `LinkType` - Classification for links discovered in a response.
- `Method` - Transport-neutral HTTP method used by `Request`.
- `MiddlewareAction` - Control-flow result returned by middleware hooks.
- `PipelineError` - Error type used by item pipelines.
- `SpiderError` - Main runtime error type used across the crawler stack.
- `StartRequests` - Initial request source returned by `Spider::start_requests`.
- `TransformOperation` - Built-in operations applied to top-level object fields.
- `ValidationRule` - Declarative rules for validating top-level item fields.
Traits
- `Downloader` - Trait implemented by HTTP downloaders used by the crawler runtime.
- `Middleware` - Trait implemented by request/response middleware.
- `Pipeline` - Contract implemented by item-processing pipelines.
- `ScrapedItem` - Trait implemented by item types emitted from spiders.
- `Spider` - Defines the contract for a spider.
- `TypedItemSchema` - Trait for typed item definitions that can expose static schema metadata.
Functions
- `create_dir` - Creates a directory and all of its parent components if they are missing.
- `is_same_site` - Checks whether two URLs belong to the same site.
- `normalize_origin` - Normalizes the origin of a request’s URL.
- `validate_output_dir` - Validates that the parent directory of a given file path exists, creating it if necessary.
Type Aliases
- `StartRequestIter` - A boxed iterator of start requests.
Attribute Macros
- `async_trait` - Re-export of the `async_trait` attribute macro used when writing async trait implementations for the core runtime.
- `scraped_item` - Attribute macro for defining a scraped item type; a helper for item structs that satisfy `ScrapedItem`.