LocalLLM

A specialized LangChain ChatOllama model designed for local execution, incorporating a default system prompt and a Pydantic output parser for structured data extraction.

This class extends ChatOllama to provide predefined system instructions and handle structured output parsing, streamlining interactions with the local LLM for specific extraction tasks.

Example Usage:

This example demonstrates how to initialize LocalLLM and use its extract_information method to extract monetary information from text.

import json

from llm_etl_pipeline.extraction import LocalLLM

llm_extractor = LocalLLM(
    model="llama3",  # Replace with the name of your Ollama model
    temperature=0.3, # Keep temperature low for more deterministic extraction
    default_system_prompt="You are a helpful assistant designed to extract information."
)

# Example text elements for extraction
text_elements = [
    "The total cost was $150.75, with an additional fee of 20 USD.",
    "He paid 5 euros for coffee.",
    "The price increased by £10.",
    "She received 100 JPY from the exchange."
]

print("\nAttempting to extract monetary information...")
extracted_data = llm_extractor.extract_information(
    list_elem=text_elements,
    extraction_type='money',
    reference_depth='sentences'
)

print("\n--- Extracted Monetary Information ---")
print(json.dumps(extracted_data, indent=2))
print("--------------------------------------")

API Reference

class llm_etl_pipeline.LocalLLM(*args, **kwargs)

Bases: ChatOllama

A specialized LangChain ChatOllama model designed for local execution, incorporating a default system prompt and a Pydantic output parser for structured data extraction.

This class extends ChatOllama to provide predefined system instructions and handle structured output parsing, streamlining interactions with the local LLM for specific extraction tasks.

Parameters:
  • args (Any)

  • name (str | None)

  • cache (BaseCache | bool | None)

  • verbose (bool)

  • callbacks (list[BaseCallbackHandler] | BaseCallbackManager | None)

  • tags (list[str] | None)

  • metadata (dict[str, Any] | None)

  • custom_get_token_ids (Callable[[str], list[int]] | None)

  • callback_manager (BaseCallbackManager | None)

  • rate_limiter (BaseRateLimiter | None)

  • disable_streaming (bool | Literal['tool_calling'])

  • model (str)

  • extract_reasoning (bool | tuple[str, str] | None)

  • mirostat (int | None)

  • mirostat_eta (float | None)

  • mirostat_tau (float | None)

  • num_ctx (int | None)

  • num_gpu (int | None)

  • num_thread (int | None)

  • num_predict (int | None)

  • repeat_last_n (int | None)

  • repeat_penalty (float | None)

  • temperature (float | None)

  • seed (int | None)

  • stop (list[str] | None)

  • tfs_z (float | None)

  • top_k (int | None)

  • top_p (float | None)

  • format (Literal['', 'json'] | dict[str, Any] | None)

  • keep_alive (int | str | None)

  • base_url (str | None)

  • client_kwargs (dict | None)

  • async_client_kwargs (dict | None)

  • sync_client_kwargs (dict | None)

  • default_system_prompt (Annotated[str, Strict(strict=True), StringConstraints(strip_whitespace=True, min_length=1)] | None)

default_system_prompt

A system-level prompt that sets the context or instructions for the LLM. If not provided during initialization, it will be loaded from a template. This attribute cannot be changed once populated.

Type:

Optional[NonEmptyStr]

default_system_prompt: Optional[NonEmptyStr]
extract_information(list_elem, extraction_type='money', reference_depth='sentences', max_items_to_analyze_per_call=4)

Main entry point to perform LLM-based extraction on a list of text elements.

This method orchestrates the entire extraction process: it generates the appropriate human prompt, creates the LLM extraction pipeline, and then processes the input list of elements in batches.

Parameters:
  • list_elem (NonEmptyListStr) – A non-empty list of text strings (e.g., sentences or paragraphs) to be analyzed.

  • extraction_type (ExtractionType) – The type of information to extract. Must be ‘money’ or ‘entity’. Defaults to ‘money’.

  • reference_depth (ReferenceDepth) – The granular level of text being analyzed. Must be ‘sentences’ or ‘paragraphs’. Defaults to ‘sentences’.

  • max_items_to_analyze_per_call (int) – The maximum number of text items to include in a single LLM call (batch size). Defaults to 4. Must be greater than 0.

Returns:

A dictionary containing the final aggregated extraction results.

Return type:

dict[str, list[dict[str, Any]]]
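
The batching behavior controlled by max_items_to_analyze_per_call can be sketched with a small helper (batch_items is hypothetical and not part of the library's API): the input list is split into consecutive chunks of at most the configured size, one chunk per LLM call, and the per-call results are aggregated afterward.

```python
def batch_items(items: list[str], max_per_call: int = 4) -> list[list[str]]:
    # Hypothetical helper: split the input into consecutive chunks of at
    # most max_per_call elements, one chunk per LLM call.
    if max_per_call <= 0:
        raise ValueError("max_per_call must be greater than 0")
    return [items[i:i + max_per_call] for i in range(0, len(items), max_per_call)]

# Six sentences with the default batch size of 4 yield two calls:
# one with 4 items and one with the remaining 2.
sentences = [f"sentence {n}" for n in range(6)]
print(batch_items(sentences))
```

Chunking like this keeps each prompt within a predictable size, which helps a local model stay within its context window and produce consistent structured output per call.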