Pipeline

Trait Pipeline 

Source
pub trait Pipeline<I>:
    Send
    + Sync
    + 'static
where I: ScrapedItem,
{ // Required methods fn name(&self) -> &str; fn process_item<'life0, 'async_trait>( &'life0 self, item: I, ) -> Pin<Box<dyn Future<Output = Result<Option<I>, PipelineError>> + Send + 'async_trait>> where 'life0: 'async_trait, Self: 'async_trait; // Provided methods fn close<'life0, 'async_trait>( &'life0 self, ) -> Pin<Box<dyn Future<Output = Result<(), PipelineError>> + Send + 'async_trait>> where 'life0: 'async_trait, Self: 'async_trait { ... } fn get_state<'life0, 'async_trait>( &'life0 self, ) -> Pin<Box<dyn Future<Output = Result<Option<Value>, PipelineError>> + Send + 'async_trait>> where 'life0: 'async_trait, Self: 'async_trait { ... } fn restore_state<'life0, 'async_trait>( &'life0 self, _state: Value, ) -> Pin<Box<dyn Future<Output = Result<(), PipelineError>> + Send + 'async_trait>> where 'life0: 'async_trait, Self: 'async_trait { ... } }
Expand description

Pipeline trait for item-processing stages. Contract implemented by item-processing pipelines.

Pipelines receive items after parsing and before final output is considered complete. A pipeline may transform an item and pass it on, drop it by returning Ok(None), persist it externally, or keep internal state for checkpoint/resume.

Required Methods§

Source

fn name(&self) -> &str

Returns the name of the pipeline.

Source

fn process_item<'life0, 'async_trait>( &'life0 self, item: I, ) -> Pin<Box<dyn Future<Output = Result<Option<I>, PipelineError>> + Send + 'async_trait>>
where 'life0: 'async_trait, Self: 'async_trait,

Processes a single scraped item.

This method can perform any processing on the item, such as storing it, validating it, or passing it to another pipeline. It can also choose to drop the item by returning Ok(None).

Pipelines are ordered. Returning Ok(Some(item)) forwards the item to the next pipeline, while Ok(None) stops the item from moving further down the chain without treating it as an error.

§Errors

Returns an error when item processing fails.

Provided Methods§

Source

fn close<'life0, 'async_trait>( &'life0 self, ) -> Pin<Box<dyn Future<Output = Result<(), PipelineError>> + Send + 'async_trait>>
where 'life0: 'async_trait, Self: 'async_trait,

Called when the spider is closing.

This method can be used to perform any cleanup tasks, such as closing file handles or database connections.

§Errors

Returns an error if cleanup fails.

Source

fn get_state<'life0, 'async_trait>( &'life0 self, ) -> Pin<Box<dyn Future<Output = Result<Option<Value>, PipelineError>> + Send + 'async_trait>>
where 'life0: 'async_trait, Self: 'async_trait,

Returns the current state of the pipeline as a JSON value.

This method is called during checkpointing to save the pipeline’s state. The returned state should be sufficient to restore the pipeline to its current state using restore_state.

Pipelines that do not need checkpoint support can keep the default implementation and return Ok(None).

§Errors

Returns an error when state capture or serialization fails.

Source

fn restore_state<'life0, 'async_trait>( &'life0 self, _state: Value, ) -> Pin<Box<dyn Future<Output = Result<(), PipelineError>> + Send + 'async_trait>>
where 'life0: 'async_trait, Self: 'async_trait,

Restores the pipeline’s state from a JSON value.

This method is called when resuming from a checkpoint. The provided state should be used to restore the pipeline to the state it was in when the checkpoint was created.

§Errors

Returns an error when deserializing or applying state fails.

Implementors§