ETL Project

Solution explanation

  • Solution Overview
  • Project Structure and Overview

Design Choices and Approach

  • Data Extraction Flow and Temporary Storage
  • Data Transformation
  • Data Load

Guide

  • Installation
  • Usage
  • Output Files

API

  • PdfConverter
  • Document
  • LocalLLM
  • Pipeline

Index

C | D | E | F | G | L | P | R | S

C

  • convert_to_text() (llm_etl_pipeline.PdfConverter method)

D

  • default_system_prompt (llm_etl_pipeline.LocalLLM attribute)
  • do_cell_matching (llm_etl_pipeline.PdfConverter attribute)
  • do_ocr (llm_etl_pipeline.PdfConverter attribute)
  • do_table_structure (llm_etl_pipeline.PdfConverter attribute)
  • Document (class in llm_etl_pipeline)

E

  • extract_information() (llm_etl_pipeline.LocalLLM method)

F

  • functions (llm_etl_pipeline.Pipeline attribute)

G

  • get_paras_or_sents_raw_text() (llm_etl_pipeline.Document method)

L

  • LocalLLM (class in llm_etl_pipeline)

P

  • paragraph_segmentation_mode (llm_etl_pipeline.Document attribute)
  • paragraphs (llm_etl_pipeline.Document attribute)
  • PdfConverter (class in llm_etl_pipeline)
  • Pipeline (class in llm_etl_pipeline)

R

  • raw_text (llm_etl_pipeline.Document attribute)
  • run() (llm_etl_pipeline.Pipeline method)

S

  • sat_model_id (llm_etl_pipeline.Document attribute)
  • sentences (llm_etl_pipeline.Document property)

© Copyright Alberto Bellumat.
