pub struct Request {
pub url: Url,
pub priority: i32,
pub method: Method,
pub headers: HeaderMap,
pub body: Option<Body>,
/* private fields */
}
Shared runtime data types and convenience helpers. Outgoing HTTP request used by the crawler runtime.
Request is the handoff type between spiders, middleware, the scheduler,
and the downloader. It is transport-neutral enough to be shared across the
workspace, but expressive enough for custom methods, headers, bodies, and
request-scoped metadata.
§Example
use spider_util::request::{Method, Request};
use url::Url;
// Create a basic GET request
let request = Request::new(Url::parse("https://example.com")?);
// Or parse a string into a request directly
let request = Request::try_new("https://example.com")?;
// Build a request with headers and method
let post_request = Request::new(Url::parse("https://api.example.com")?)
.with_method(Method::Post)
.with_header("Accept", "application/json")?;
Fields§
url: Url
The target URL for this request.
priority: i32
Request scheduling priority.
Higher values are dequeued before lower values.
method: Method
The HTTP method (GET, POST, etc.).
headers: HeaderMap
HTTP headers for the request.
body: Option<Body>
Optional request body.
Implementations§
impl Request
pub fn new(url: Url) -> Request
Creates a new Request with the given URL.
This is the most common constructor used by spiders when enqueueing
follow-up pages. It does not allocate metadata storage unless
with_meta is called.
§Example
use spider_util::request::{Method, Request};
use url::Url;
let request = Request::new(Url::parse("https://example.com")?);
pub fn try_new<U>(url: U) -> Result<Request, SpiderError>
Creates a new Request from any value that can be converted into a Url.
This is a fallible companion to Request::new for callers that want to
pass URL strings directly.
§Example
use spider_util::request::Request;
let request = Request::try_new("https://example.com")?;
pub fn with_method(self, method: Method) -> Request
pub fn with_priority(self, priority: i32) -> Request
Sets the scheduling priority for the request.
Higher values are scheduled before lower values. Requests with the same priority retain FIFO ordering.
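The ordering rule above (higher priority first, FIFO among equal priorities) can be sketched with a standard-library heap. This is only an illustration under assumptions: `PriorityFifo` and `drain_order` are hypothetical names, and the crate's real scheduler may be implemented quite differently.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical queue (not the crate's scheduler): orders entries by
/// priority, then by insertion order.
struct PriorityFifo {
    // Max-heap on (priority, Reverse(sequence)): the highest priority wins,
    // and among equal priorities the smallest sequence number wins.
    heap: BinaryHeap<(i32, Reverse<u64>, String)>,
    next_seq: u64,
}

impl PriorityFifo {
    fn new() -> Self {
        PriorityFifo { heap: BinaryHeap::new(), next_seq: 0 }
    }

    fn push(&mut self, priority: i32, url: &str) {
        self.heap.push((priority, Reverse(self.next_seq), url.to_string()));
        self.next_seq += 1;
    }

    fn pop(&mut self) -> Option<String> {
        self.heap.pop().map(|(_, _, url)| url)
    }
}

/// Pushes every (priority, url) pair, then drains the queue in scheduling order.
fn drain_order(items: &[(i32, &str)]) -> Vec<String> {
    let mut queue = PriorityFifo::new();
    for (priority, url) in items {
        queue.push(*priority, url);
    }
    let mut out = Vec::new();
    while let Some(url) = queue.pop() {
        out.push(url);
    }
    out
}

fn main() {
    let order = drain_order(&[(0, "a"), (10, "b"), (0, "c"), (10, "d")]);
    println!("{:?}", order);
}
```

The monotonically increasing sequence number is what preserves FIFO order among ties; a plain heap keyed on priority alone would not guarantee it.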
pub fn with_header<N, V>(self, name: N, value: V) -> Result<Request, SpiderError>
Adds a header to the request.
Accepts any types that can be converted into reqwest::header::HeaderName
and reqwest::header::HeaderValue, including &str, String, and
standard header constants such as http::header::CONTENT_TYPE.
§Errors
Returns a SpiderError::HeaderValueError if the header name or value is invalid.
§Example
use spider_util::request::Request;
use url::Url;
let request = Request::new(Url::parse("https://example.com")?)
.with_header(http::header::ACCEPT, "application/json")?
.with_header("X-Trace-Id".to_string(), "abc-123".to_string())?;
pub fn with_json(self, json: Value) -> Request
Sets the body of the request to a JSON value and defaults the method to POST.
This helper stores the payload body only. Add content-type headers explicitly when the target service expects them.
§Example
use spider_util::request::Request;
use url::Url;
use serde_json::json;
let request = Request::new(Url::parse("https://api.example.com")?)
.with_json(json!({"name": "test"}));
pub fn with_form(self, form: DashMap<String, String>) -> Request
Sets the body of the request to form data and defaults the method to POST.
§Example
use spider_util::request::Request;
use url::Url;
use dashmap::DashMap;
let form = DashMap::new();
form.insert("key".to_string(), "value".to_string());
let request = Request::new(Url::parse("https://api.example.com")?)
.with_form(form);
pub fn with_bytes(self, bytes: Bytes) -> Request
pub fn with_meta(self, key: impl Into<String>, value: Value) -> Request
Adds a value to the request’s metadata.
Lazily allocates the metadata map on first use. Metadata is commonly used to carry crawl context such as pagination state, source URLs, or retry bookkeeping across middleware and parsing stages.
§Example
use spider_util::request::Request;
use url::Url;
let request = Request::new(Url::parse("https://example.com")?)
.with_priority(10)
.with_meta("source", serde_json::json!("manual"));
pub fn with_meta_value<T>(self, key: impl Into<String>, value: T) -> Result<Request, SpiderError>
where T: Serialize
Serializes and stores a metadata value under the provided key.
This is a convenient typed companion to Request::with_meta that
avoids manual serde_json::json!(...) calls for structured metadata.
pub fn with_optional_meta_value<T>(self, key: impl Into<String>, value: Option<T>) -> Result<Request, SpiderError>
where T: Serialize
Serializes and stores a metadata value only when it is present.
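The "store only when present" behavior can be pictured with a small builder. This is a sketch under assumptions, not the crate's code: `SketchRequest` and `with_optional_meta` are hypothetical, values are plain strings rather than serialized JSON, and skipping allocation for an absent value is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a request; metadata values are plain strings
/// here instead of serialized JSON.
#[derive(Debug, Default)]
struct SketchRequest {
    meta: Option<HashMap<String, String>>,
}

impl SketchRequest {
    /// Stores the value only when it is `Some`; otherwise the request is
    /// returned unchanged and no map is allocated.
    fn with_optional_meta(mut self, key: &str, value: Option<&str>) -> Self {
        if let Some(v) = value {
            self.meta
                .get_or_insert_with(HashMap::new)
                .insert(key.to_string(), v.to_string());
        }
        self
    }

    /// Number of stored metadata entries (0 when no map exists yet).
    fn meta_len(&self) -> usize {
        self.meta.as_ref().map_or(0, |m| m.len())
    }
}

fn main() {
    let with_cursor = SketchRequest::default().with_optional_meta("cursor", Some("abc"));
    let without_cursor = SketchRequest::default().with_optional_meta("cursor", None);
    println!("{} {}", with_cursor.meta_len(), without_cursor.meta_len());
}
```

A builder of this shape lets call sites chain optional context, such as a pagination cursor, without an `if let` around every call.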
pub fn get_meta(&self, key: &str) -> Option<Value>
Gets a metadata value by key, if it exists.
Returns a cloned JSON value because metadata is stored in a shared
concurrent map. Returns None if the key doesn’t exist or if metadata
hasn’t been set.
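Why a getter over a shared concurrent map returns an owned clone can be illustrated with standard-library types. The sketch below is an assumption-laden stand-in: it uses `Arc<RwLock<HashMap<..>>>` with string values instead of the crate's `DashMap<String, Value>`, and `SharedMeta` and this free function `get_meta` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the shared metadata map.
type SharedMeta = Arc<RwLock<HashMap<String, String>>>;

/// Returns an owned copy of the value, so no lock guard outlives the call.
/// Yields `None` when the map was never allocated or the key is missing.
fn get_meta(meta: &Option<SharedMeta>, key: &str) -> Option<String> {
    let map = meta.as_ref()?;
    let guard = map.read().ok()?;
    guard.get(key).cloned()
}

fn main() {
    let meta: Option<SharedMeta> = Some(Arc::new(RwLock::new(HashMap::new())));
    if let Some(map) = &meta {
        map.write().unwrap().insert("page".to_string(), "2".to_string());
    }

    // The copy is independent of later writes to the shared map.
    let copy = get_meta(&meta, "page");
    if let Some(map) = &meta {
        map.write().unwrap().insert("page".to_string(), "3".to_string());
    }
    println!("{:?} {:?}", copy, get_meta(&meta, "page"));
}
```

Handing back a reference instead would tie the caller to a live guard on the map, which is roughly the trade-off that get_meta_ref exposes for internal use.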
pub fn meta_value<T>(&self, key: &str) -> Result<Option<T>, SpiderError>
where T: DeserializeOwned
Deserializes a metadata value into the requested type.
pub fn meta_map(&self) -> Option<&Arc<DashMap<String, Value>>>
Returns a reference to the internal metadata map, if it exists.
pub fn insert_meta(&mut self, key: impl Into<String>, value: Value)
Inserts a value into metadata, creating the map if needed.
This is intended for internal framework use.
pub fn insert_meta_value<T>(&mut self, key: impl Into<String>, value: T) -> Result<(), SpiderError>
where T: Serialize
Serializes and inserts a metadata value for internal or incremental use.
pub fn remove_meta(&mut self, key: &str) -> Option<Value>
Removes a metadata entry by key, returning the stored JSON value if any.
pub fn get_meta_ref(&self, key: &str) -> Option<Ref<'_, String, Value>>
Gets a value from metadata using DashMap’s API.
This is intended for internal framework use where direct access is needed.
pub fn set_meta_from_option(&mut self, meta: Option<Arc<DashMap<String, Value>>>)
Sets the metadata map directly.
Used for internal framework operations.
pub fn clone_meta(&self) -> Option<Arc<DashMap<String, Value>>>
Clones the metadata map.
Used for internal framework operations where metadata needs to be copied.
pub fn take_meta(&mut self) -> Option<Arc<DashMap<String, Value>>>
Takes the metadata map, leaving None in its place.
Used for internal framework operations.
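The "take, leaving None" semantics match `Option::take` from the standard library. The sketch below assumes that is how such a method could be written; `SketchRequest` and `Meta` are hypothetical stand-ins, with a plain `HashMap` in place of the crate's `DashMap`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the shared metadata handle.
type Meta = Arc<HashMap<String, String>>;

struct SketchRequest {
    meta: Option<Meta>,
}

impl SketchRequest {
    /// Moves the metadata handle out of the request, leaving `None` behind.
    /// The `Arc` is moved, not cloned, so its reference count does not change.
    fn take_meta(&mut self) -> Option<Meta> {
        self.meta.take()
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert("source".to_string(), "manual".to_string());
    let mut request = SketchRequest { meta: Some(Arc::new(map)) };

    let taken = request.take_meta();
    println!("taken: {:?}, left: {:?}", taken, request.meta);
}
```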
pub fn meta_inner(&self) -> &Option<Arc<DashMap<String, Value>>>
Returns a reference to the metadata Arc for internal framework use.
pub fn get_retry_attempts(&self) -> u32
Gets the number of times the request has been retried.
Returns 0 if no retry attempts have been recorded.
pub fn increment_retry_attempts(&mut self)
Increments the retry count for the request.
Lazily allocates the metadata map if not already present.
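The retry bookkeeping described above (a counter that reads as 0 until first incremented, backed by a lazily allocated map) might look roughly like this. It is a sketch under assumptions: `SketchRequest` is a hypothetical type, the key name `retry_attempts` is invented for illustration, and a plain `HashMap<String, u32>` stands in for the crate's metadata map.

```rust
use std::collections::HashMap;

/// Hypothetical metadata key; the crate's actual key name is not documented here.
const RETRY_KEY: &str = "retry_attempts";

#[derive(Default)]
struct SketchRequest {
    // `None` until something is first written, so untouched requests
    // pay no allocation cost.
    meta: Option<HashMap<String, u32>>,
}

impl SketchRequest {
    /// Reads the counter, treating a missing map or missing key as 0.
    fn get_retry_attempts(&self) -> u32 {
        self.meta
            .as_ref()
            .and_then(|m| m.get(RETRY_KEY).copied())
            .unwrap_or(0)
    }

    /// Allocates the map on first use, then bumps the counter.
    fn increment_retry_attempts(&mut self) {
        let map = self.meta.get_or_insert_with(HashMap::new);
        *map.entry(RETRY_KEY.to_string()).or_insert(0) += 1;
    }
}

fn main() {
    let mut request = SketchRequest::default();
    println!("before: {}", request.get_retry_attempts());
    request.increment_retry_attempts();
    request.increment_retry_attempts();
    println!("after: {}", request.get_retry_attempts());
}
```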
pub fn fingerprint(&self) -> String
Generates a fingerprint for the request based on its URL, method, and body.
This is the stable identity used by runtime deduplication, caching, and related components that need to recognize equivalent requests. It combines:
- The request URL
- The HTTP method
- The request body (if present)
§Example
use spider_util::request::Request;
use url::Url;
let request = Request::new(Url::parse("https://example.com")?);
let fingerprint = request.fingerprint();
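To make the idea of combining URL, method, and body concrete, here is a minimal standalone sketch. It is not the crate's algorithm: the free function `fingerprint` below is hypothetical, and it uses the standard library's `DefaultHasher`, whose output is not guaranteed to stay the same across Rust releases, so a real implementation would likely choose a hash with a fixed specification.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical fingerprint: hashes URL, method, and optional body into a
/// fixed-width hex string. Equal inputs give equal fingerprints within one build.
fn fingerprint(url: &str, method: &str, body: Option<&[u8]>) -> String {
    let mut hasher = DefaultHasher::new();
    url.hash(&mut hasher);
    method.hash(&mut hasher);
    // Hashing the Option also distinguishes "no body" from "empty body".
    body.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

fn main() {
    let get = fingerprint("https://example.com", "GET", None);
    let post = fingerprint("https://example.com", "POST", Some(b"a=1".as_slice()));
    println!("GET:  {get}");
    println!("POST: {post}");
}
```

A deduplication filter could then keep a set of seen fingerprints and skip any request whose fingerprint is already present.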