Workspace-wide public constants.
This module re-exports the constant groups used by the crawler, downloader, middleware, pipelines, and scheduler.
Constants
- BLOOM_BUFFER_FLUSH_SIZE - Buffer size before flushing to Bloom filter.
- BLOOM_FILTER_CAPACITY - Capacity of the Bloom filter for duplicate detection.
- BLOOM_FILTER_HASH_FUNCTIONS - Number of hash functions used by the Bloom filter.
- BLOOM_FLUSH_INTERVAL_MS - Interval in milliseconds for periodic Bloom filter flush.
- CONNECTION_POOL_IDLE_TIMEOUT_SECS - Connection pool idle timeout in seconds.
- CONNECT_TIMEOUT_SECS - Connect timeout in seconds.
- CRAWLER_DEFAULT_CHANNEL_CAPACITY - Default channel capacity for crawler communication.
- CRAWLER_IDLE_CHECK_INTERVAL_MS - Idle check interval in milliseconds.
- CRAWLER_SHUTDOWN_GRACE_PERIOD_SECS - Default grace period for crawler shutdown in seconds.
- CSV_BUFFER_SIZE - Buffer size for CSV export pipeline.
- DEFAULT_POOL_MAX_IDLE_PER_HOST - Maximum idle connections per host for default client.
- DEFAULT_REQUEST_TIMEOUT_SECS - Default request timeout in seconds.
- DEFAULT_VISITED_CACHE_SIZE - Default capacity for the visited URL cache when not using checkpoint.
- HOST_SPECIFIC_POOL_MAX_IDLE_PER_HOST - Maximum idle connections per host for host-specific clients.
- MAX_PENDING_REQUESTS - Maximum number of pending requests before applying backpressure.
- MIDDLEWARE_CACHE_CAPACITY - Default cache capacity for middleware.
- MIDDLEWARE_CACHE_TTL_SECS - Default cache TTL for middleware (1 hour).
- RATE_LIMIT_ERROR_PENALTY_MULTIPLIER - Error penalty multiplier for adaptive rate limiting.
- RATE_LIMIT_FORBIDDEN_PENALTY_MULTIPLIER - Forbidden penalty multiplier for adaptive rate limiting.
- RATE_LIMIT_INITIAL_DELAY_MS - Initial delay for adaptive rate limiting (500ms).
- RATE_LIMIT_MAX_DELAY_MS - Maximum delay for rate limiting (60 seconds).
- RATE_LIMIT_MAX_JITTER_MS - Maximum jitter for rate limiting (500ms).
- RATE_LIMIT_MIN_DELAY_MS - Minimum delay for rate limiting (50ms).
- RATE_LIMIT_SUCCESS_DECAY_MULTIPLIER - Success decay multiplier for adaptive rate limiting.
- RETRY_DEFAULT_BACKOFF_FACTOR - Default backoff factor for retry middleware.
- RETRY_DEFAULT_HTTP_CODES - Default HTTP status codes to retry.
- RETRY_DEFAULT_MAX_DELAY_MS - Default maximum delay for retry middleware (3 minutes).
- RETRY_DEFAULT_MAX_RETRIES - Default retry attempts for retry middleware.
- SQLITE_CHANNEL_CAPACITY - Channel capacity for SQLite pipeline.
- STREAM_JSON_DEFAULT_BATCH_SIZE - Default batch size for stream JSON pipeline.
- TCP_KEEPALIVE_SECS - TCP keepalive in seconds.
- VISITED_URL_CACHE_CAPACITY - Capacity of the visited URL cache in the scheduler.
- VISITED_URL_CACHE_TTL_SECS - Time-to-idle for visited URL cache entries (1 hour).
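
The rate-limiting and retry constants above describe bounds and multipliers but not how they combine. The following is a minimal sketch of one plausible reading, not the crate's actual implementation. The millisecond values for the initial, minimum, and maximum delays and the retry maximum delay come from the descriptions above; the constant types, all four multiplier values, the `Outcome` enum, and the functions `next_delay_ms` and `retry_delay` are assumptions made for illustration.

```rust
use std::time::Duration;

// Values stated in the constant descriptions above (types are assumed).
pub const RATE_LIMIT_INITIAL_DELAY_MS: u64 = 500;
pub const RATE_LIMIT_MIN_DELAY_MS: u64 = 50;
pub const RATE_LIMIT_MAX_DELAY_MS: u64 = 60_000;
pub const RETRY_DEFAULT_MAX_DELAY_MS: u64 = 180_000;

// ASSUMED values: the listing does not state these multipliers.
pub const RATE_LIMIT_ERROR_PENALTY_MULTIPLIER: f64 = 1.5;
pub const RATE_LIMIT_FORBIDDEN_PENALTY_MULTIPLIER: f64 = 2.0;
pub const RATE_LIMIT_SUCCESS_DECAY_MULTIPLIER: f64 = 0.95;
pub const RETRY_DEFAULT_BACKOFF_FACTOR: f64 = 2.0;

/// Hypothetical classification of a response for rate-limit purposes.
#[derive(Debug, Clone, Copy)]
pub enum Outcome {
    Success,
    Error,
    Forbidden,
}

/// Scales the current delay by the multiplier for `outcome`, then clamps
/// the result to the [min, max] delay window.
pub fn next_delay_ms(current_ms: u64, outcome: Outcome) -> u64 {
    let multiplier = match outcome {
        Outcome::Success => RATE_LIMIT_SUCCESS_DECAY_MULTIPLIER,
        Outcome::Error => RATE_LIMIT_ERROR_PENALTY_MULTIPLIER,
        Outcome::Forbidden => RATE_LIMIT_FORBIDDEN_PENALTY_MULTIPLIER,
    };
    // Float-to-integer `as` casts saturate in Rust, so this cannot overflow.
    let scaled = (current_ms as f64 * multiplier).round() as u64;
    scaled.clamp(RATE_LIMIT_MIN_DELAY_MS, RATE_LIMIT_MAX_DELAY_MS)
}

/// Exponential backoff: `base_ms * factor^attempt`, capped at the retry
/// maximum delay. `attempt` is zero-based; `base_ms` is a hypothetical
/// caller-supplied base delay.
pub fn retry_delay(attempt: u32, base_ms: u64) -> Duration {
    // Cap the exponent so the cast to i32 is always lossless.
    let exponent = attempt.min(64) as i32;
    let scaled = base_ms as f64 * RETRY_DEFAULT_BACKOFF_FACTOR.powi(exponent);
    Duration::from_millis((scaled as u64).min(RETRY_DEFAULT_MAX_DELAY_MS))
}
```

Under these assumed multipliers, a forbidden response doubles the delay and a success shrinks it by 5%, with the clamp keeping every result between 50 ms and 60 seconds. Jitter (bounded by RATE_LIMIT_MAX_JITTER_MS) is left out of the sketch to keep the functions deterministic.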