pub trait Pipeline<I>:
Send
+ Sync
+ 'staticwhere
I: ScrapedItem,{
// Required methods
fn name(&self) -> &str;
fn process_item<'life0, 'async_trait>(
&'life0 self,
item: I,
) -> Pin<Box<dyn Future<Output = Result<Option<I>, PipelineError>> + Send + 'async_trait>>
where 'life0: 'async_trait,
Self: 'async_trait;
// Provided methods
fn close<'life0, 'async_trait>(
&'life0 self,
) -> Pin<Box<dyn Future<Output = Result<(), PipelineError>> + Send + 'async_trait>>
where 'life0: 'async_trait,
Self: 'async_trait { ... }
fn get_state<'life0, 'async_trait>(
&'life0 self,
) -> Pin<Box<dyn Future<Output = Result<Option<Value>, PipelineError>> + Send + 'async_trait>>
where 'life0: 'async_trait,
Self: 'async_trait { ... }
fn restore_state<'life0, 'async_trait>(
&'life0 self,
_state: Value,
) -> Pin<Box<dyn Future<Output = Result<(), PipelineError>> + Send + 'async_trait>>
where 'life0: 'async_trait,
Self: 'async_trait { ... }
}Expand description
Pipeline trait for item-processing stages. Contract implemented by item-processing pipelines.
Pipelines receive items after parsing and before final output is considered
complete. A pipeline may transform an item and pass it on, drop it by
returning Ok(None), persist it externally, or keep internal state for
checkpoint/resume.
Required Methods§
Sourcefn process_item<'life0, 'async_trait>(
&'life0 self,
item: I,
) -> Pin<Box<dyn Future<Output = Result<Option<I>, PipelineError>> + Send + 'async_trait>>where
'life0: 'async_trait,
Self: 'async_trait,
fn process_item<'life0, 'async_trait>(
&'life0 self,
item: I,
) -> Pin<Box<dyn Future<Output = Result<Option<I>, PipelineError>> + Send + 'async_trait>>where
'life0: 'async_trait,
Self: 'async_trait,
Processes a single scraped item.
This method can perform any processing on the item, such as storing it, validating it,
or passing it to another pipeline. It can also choose to drop the item by returning Ok(None).
Pipelines are ordered. Returning Ok(Some(item)) forwards the item to
the next pipeline, while Ok(None) stops the item from moving further
down the chain without treating it as an error.
§Errors
Returns an error when item processing fails.
Provided Methods§
Sourcefn close<'life0, 'async_trait>(
&'life0 self,
) -> Pin<Box<dyn Future<Output = Result<(), PipelineError>> + Send + 'async_trait>>where
'life0: 'async_trait,
Self: 'async_trait,
fn close<'life0, 'async_trait>(
&'life0 self,
) -> Pin<Box<dyn Future<Output = Result<(), PipelineError>> + Send + 'async_trait>>where
'life0: 'async_trait,
Self: 'async_trait,
Called when the spider is closing.
This method can be used to perform any cleanup tasks, such as closing file handles or database connections.
§Errors
Returns an error if cleanup fails.
Sourcefn get_state<'life0, 'async_trait>(
&'life0 self,
) -> Pin<Box<dyn Future<Output = Result<Option<Value>, PipelineError>> + Send + 'async_trait>>where
'life0: 'async_trait,
Self: 'async_trait,
fn get_state<'life0, 'async_trait>(
&'life0 self,
) -> Pin<Box<dyn Future<Output = Result<Option<Value>, PipelineError>> + Send + 'async_trait>>where
'life0: 'async_trait,
Self: 'async_trait,
Returns the current state of the pipeline as a JSON value.
This method is called during checkpointing to save the pipeline’s state.
The returned state should be sufficient to restore the pipeline to its current
state using restore_state.
Pipelines that do not need checkpoint support can keep the default
implementation and return Ok(None).
§Errors
Returns an error when state capture or serialization fails.
Sourcefn restore_state<'life0, 'async_trait>(
&'life0 self,
_state: Value,
) -> Pin<Box<dyn Future<Output = Result<(), PipelineError>> + Send + 'async_trait>>where
'life0: 'async_trait,
Self: 'async_trait,
fn restore_state<'life0, 'async_trait>(
&'life0 self,
_state: Value,
) -> Pin<Box<dyn Future<Output = Result<(), PipelineError>> + Send + 'async_trait>>where
'life0: 'async_trait,
Self: 'async_trait,
Restores the pipeline’s state from a JSON value.
This method is called when resuming from a checkpoint. The provided state should be used to restore the pipeline to the state it was in when the checkpoint was created.
§Errors
Returns an error when deserializing or applying state fails.