Output Files
When extraction and processing complete successfully, two CSV files are written to the root directory:
- etl_money_result.csv: This file contains the extracted and processed monetary information. It includes the following columns:
  - document_id: The EU grant project identifier (e.g., AMIF-2024-TF2-AG-THB).
  - value: The extracted monetary amount.
  - currency: The currency associated with the value.
  - context: The motivation or context for the extracted amount.
  - original_sentence: The original sentence from the input text where the amount was found.
- etl_entity_result.csv: This file contains the extracted and validated entity data. It includes the following columns:
  - document_id: The EU grant project identifier.
  - organization_type: A list of organization types found in the consortium table of the proposal PDF.
  - min_entities: The minimum number of entities, as indicated in the entities row of the consortium table.
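
To sanity-check the two files after a run, a small script along these lines may help. It is only a sketch: the helper name `load_result` is hypothetical and not part of the project. It assumes the files are comma-delimited, UTF-8 encoded, and start with a header row that uses the column names listed above.

```python
import csv
from pathlib import Path

# Column names as listed in this section.
MONEY_COLUMNS = ["document_id", "value", "currency", "context", "original_sentence"]
ENTITY_COLUMNS = ["document_id", "organization_type", "min_entities"]


def load_result(path, expected_columns):
    """Read a result CSV and check that every documented column is present.

    Hypothetical helper: assumes a comma-delimited, UTF-8 file with a header row.
    Returns the rows as a list of dicts; all values are left as strings.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in expected_columns if name not in header]
        if missing:
            raise ValueError(f"{path} is missing columns: {missing}")
        return list(reader)
```

Run from the root directory, `load_result("etl_money_result.csv", MONEY_COLUMNS)` and `load_result("etl_entity_result.csv", ENTITY_COLUMNS)` would return the rows of each file, or raise `ValueError` if a documented column is absent. Values are returned as strings, so converting `value` or `min_entities` to numbers is left to the caller.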