Struct ParseContext
pub struct ParseContext<'a, S>{ /* private fields */ }
Core runtime types and traits used to define and run a crawl.
Parse-time context passed into Spider::parse.
This bundles the current Response, shared spider state, and the async
output sink into a single value so user-facing parse signatures stay small.
The context dereferences to Response, which means selector-heavy code
can keep the natural cx.css(...) style without manually reaching through a
nested response field.
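The dereferencing behaviour described above can be illustrated with a minimal sketch. The `Response` and `Context` types here are hypothetical stand-ins (the real types have private fields), and `body_len` is an invented helper standing in for selector methods such as `css`; only the `Deref` pattern itself is the point.

```rust
use std::ops::Deref;

// Hypothetical stand-in for the crate's Response type.
struct Response {
    url: String,
    body: String,
}

impl Response {
    // Invented helper standing in for response methods such as `css`.
    fn body_len(&self) -> usize {
        self.body.len()
    }
}

// Hypothetical stand-in for ParseContext: a response plus borrowed state.
struct Context<'a> {
    response: Response,
    state: &'a u32,
}

impl<'a> Deref for Context<'a> {
    type Target = Response;

    fn deref(&self) -> &Response {
        &self.response
    }
}

fn demo() -> (usize, String, u32) {
    let state = 7;
    let cx = Context {
        response: Response {
            url: "https://example.com/".to_string(),
            body: "<h1>Hi</h1>".to_string(),
        },
        state: &state,
    };
    // Method calls and field access on `cx` auto-deref to the inner Response,
    // so there is no need to write `cx.response.body_len()`.
    (cx.body_len(), cx.url.clone(), *cx.state)
}
```

Because the context only dereferences immutably in this sketch, mutation would still go through an explicit accessor, which matches the presence of a separate response_mut method on the real type.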
Implementations§
impl<'a, S> ParseContext<'a, S>
pub fn response_mut(&mut self) -> &mut Response
Returns the current response as a mutable reference.
pub fn output(&self) -> &ParseOutput<<S as Spider>::Item>
Returns the underlying async parse output sink.
pub async fn add_item(
&self,
item: <S as Spider>::Item,
) -> Result<(), SpiderError>
Emits a scraped item into the runtime.
pub async fn add_items(
&self,
items: impl IntoIterator<Item = <S as Spider>::Item>,
) -> Result<(), SpiderError>
Emits multiple scraped items into the runtime.
pub async fn add_request(&self, request: Request) -> Result<(), SpiderError>
Emits a follow-up request into the runtime.
pub async fn add_requests(
&self,
requests: impl IntoIterator<Item = Request>,
) -> Result<(), SpiderError>
Emits multiple follow-up requests into the runtime.
pub fn into_parts(
self,
) -> (Response, &'a <S as Spider>::State, ParseOutput<<S as Spider>::Item>)
Consumes the context and returns the inner response, state reference, and output sink.
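As a rough illustration of what consuming the context into parts means for ownership, here is a sketch using hypothetical stand-in types (`Response`, `Sink`, and `Context` are not the crate's real definitions). It assumes only what the signature above states: the response and sink are returned by value, while the state comes back as a reference with the original lifetime.

```rust
// Hypothetical stand-ins for the crate's Response and ParseOutput types.
struct Response {
    url: String,
}

struct Sink {
    label: &'static str,
}

// Hypothetical stand-in for ParseContext with Vec<String> as the state type.
struct Context<'a> {
    response: Response,
    state: &'a Vec<String>,
    output: Sink,
}

impl<'a> Context<'a> {
    // Consumes the context. The state reference keeps its lifetime `'a`,
    // so it remains usable after the context itself is gone.
    fn into_parts(self) -> (Response, &'a Vec<String>, Sink) {
        (self.response, self.state, self.output)
    }
}

fn demo() -> (String, usize, &'static str) {
    let state = vec!["seen".to_string()];
    let cx = Context {
        response: Response {
            url: "https://example.com/page".to_string(),
        },
        state: &state,
        output: Sink { label: "items" },
    };
    // After this call `cx` has been moved and can no longer be used.
    let (response, state_ref, output) = cx.into_parts();
    (response.url, state_ref.len(), output.label)
}
```

Splitting the context this way is mainly useful when the response needs to be moved somewhere else while the sink is still held for emitting output.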
pub fn into_response(self) -> Response
Consumes the context and returns the inner response.
Methods from Deref<Target = Response>§
pub fn request_from_response(&self) -> Request
Reconstructs the original Request that led to this response.
This method creates a new Request with the same URL and metadata
as the request that produced this response. Useful for retry scenarios
or when you need to re-request the same resource.
§Example
let original_request = response.request_from_response();
pub fn meta_value<T>(&self, key: &str) -> Result<Option<T>, Error>
where
T: DeserializeOwned,
Deserializes a metadata value into the requested type.
pub fn discovery_rule_name(&self) -> Option<String>
Returns the runtime discovery rule name attached to this response, if any.
pub fn matches_discovery_rule(&self, rule_name: &str) -> bool
Returns true when the response was reached through the named discovery rule.
pub fn insert_meta(&mut self, key: impl Into<String>, value: Value)
Inserts a metadata value, lazily allocating the map if needed.
pub fn clone_meta(&self) -> Option<Arc<DashMap<String, Value>>>
Returns a clone of the internal metadata map, if present.
pub fn json<T>(&self) -> Result<T, Error>
where
T: DeserializeOwned,
Deserializes the response body as JSON.
§Type Parameters
T: The target type to deserialize into (must implement DeserializeOwned)
§Errors
Returns a serde_json::Error if the body cannot be parsed as JSON
or if it cannot be deserialized into type T.
§Example
let data: Data = response.json()?;
pub fn css(&self, query: &str) -> Result<SelectorList, SpiderError>
Applies a built-in CSS selector to the response body using a Scrapy-like API.
Supports standard CSS selectors plus two terminal extraction suffixes:
- ::text
- ::attr(name)
§Example
let heading = response.css("h1::text")?.get().unwrap_or_default();
let next_href = response.css("a::attr(href)")?.get();
§Errors
Returns SpiderError::Utf8Error when the body is not valid UTF-8 and
SpiderError::HtmlParseError when the selector is invalid.
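To make the terminal suffix convention concrete, the following sketch splits a query string into its CSS part and an extraction mode. This is not the crate's parser: `Extract` and `split_query` are hypothetical names, and the real implementation may handle whitespace, escaping, and invalid input differently.

```rust
// Hypothetical description of what a query extracts.
#[derive(Debug, PartialEq)]
enum Extract {
    // No suffix: the matched nodes themselves.
    Nodes,
    // `::text` suffix: the text content of the matches.
    Text,
    // `::attr(name)` suffix: the value of the named attribute.
    Attr(String),
}

// Splits a query into the plain CSS selector and its terminal suffix, if any.
fn split_query(query: &str) -> (&str, Extract) {
    let query = query.trim();
    if let Some(css) = query.strip_suffix("::text") {
        return (css, Extract::Text);
    }
    const ATTR_PREFIX: &str = "::attr(";
    if let Some(idx) = query.rfind(ATTR_PREFIX) {
        let rest = &query[idx + ATTR_PREFIX.len()..];
        // Only treat it as a suffix when it closes at the very end of the query.
        if let Some(name) = rest.strip_suffix(')') {
            if !name.is_empty() && !name.contains(')') {
                return (&query[..idx], Extract::Attr(name.to_string()));
            }
        }
    }
    (query, Extract::Nodes)
}
```

With this sketch, "h1::text" splits into the selector "h1" with text extraction, and "a::attr(href)" splits into "a" with extraction of the href attribute, mirroring the two queries in the example above.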
pub fn page_metadata(&self) -> Result<PageMetadata, Utf8Error>
Extracts structured page metadata from HTML responses.
pub fn links_iter(
&self,
options: LinkExtractOptions,
) -> impl Iterator<Item = Link>
Returns a customizable iterator of links discovered in the response body.
Unlike Response::links, this method does not deduplicate results.
Callers that need uniqueness can collect into a set or use Response::links.
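One way to get uniqueness on the caller side is sketched below. The `Link` struct is a hypothetical stand-in carrying only a `url` field (the real type also exposes a link type), and `unique_links` is an invented helper. It deduplicates by URL while keeping first-seen order, which a plain collect into a set would not preserve.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the crate's Link type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Link {
    url: String,
}

// Keeps the first occurrence of each URL and drops later repeats.
fn unique_links(links: impl IntoIterator<Item = Link>) -> Vec<Link> {
    let mut seen = HashSet::new();
    links
        .into_iter()
        // `insert` returns false when the URL was already present.
        .filter(|link| seen.insert(link.url.clone()))
        .collect()
}
```

Any iterator of links can be passed in, so under these assumptions the output of links_iter could be fed to such a helper directly.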
§Example
let links: Vec<_> = response
.links_iter(LinkExtractOptions::default())
.collect();
assert!(!links.is_empty());
pub fn links(&self) -> DashSet<Link>
Extracts all unique, same-site links from the response body.
This method discovers links from:
- HTML elements with href or src attributes (<a>, <link>, <script>, <img>, etc.)
- URLs found in text content (using link detection)
Only links pointing to the same site (same registered domain) are included.
§Returns
A DashSet of Link objects containing the URL and link type.
§Example
let links = response.links();
for link in links.iter() {
println!("Found {:?} link: {}", link.link_type, link.url);
}