DoclingTools enable an Agent to convert documents from multiple input formats (PDF, DOCX, PPTX, XLSX, HTML, images, audio, video, etc.) into output formats like Markdown, JSON, YAML, HTML, DocTags, and more using the Docling library.

Prerequisites

The following example requires the docling library.
uv pip install -U docling

# Required for the OCR example
uv pip install -U easyocr

# Required for audio/video processing
uv pip install -U openai-whisper
ffmpeg is also required for audio/video processing:
  • macOS: brew install ffmpeg
  • Ubuntu: sudo apt-get install ffmpeg
  • Windows: Download from ffmpeg.org
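
Before running the examples, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch using only the Python standard library; the helper name `check_prerequisites` is hypothetical and is not part of agno or Docling. It assumes that the `openai-whisper` package is imported under the module name `whisper`.

```python
import importlib.util
import shutil

# Import name -> pip package name, for the packages listed in the prerequisites.
# Note: the openai-whisper package is imported as `whisper`.
OPTIONAL_MODULES = {
    "docling": "docling",
    "easyocr": "easyocr",
    "whisper": "openai-whisper",
}


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisites are available (hypothetical helper)."""
    status = {
        package: importlib.util.find_spec(module) is not None
        for module, package in OPTIONAL_MODULES.items()
    }
    # ffmpeg is a system binary, so look for it on PATH instead of importing it.
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, available in check_prerequisites().items():
        print(f"{name}: {'found' if available else 'missing'}")
```

Only docling is needed for basic conversion; easyocr matters for the OCR example, and openai-whisper plus ffmpeg matter for audio/video input.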

Example

The following agent converts a PDF to Markdown:
from agno.agent import Agent
from agno.tools.docling import DoclingTools

agent = Agent(
    tools=[DoclingTools(all=True)],
    description="You are an agent that converts documents from all Docling parsers and exports to all supported output formats.",
)

agent.print_response(
    "Convert to Markdown: cookbook/07_knowledge/testing_resources/cv_1.pdf",
    markdown=True,
)

OCR Configuration

Configure OCR settings for scanned PDFs or documents with embedded images.
cookbook/91_tools/docling_tools/ocr_example.py
from agno.agent import Agent
from agno.tools.docling import DoclingTools

ocr_tools = DoclingTools(
    pdf_enable_ocr=True,
    pdf_ocr_engine="easyocr",
    pdf_ocr_lang=["pt", "en"],
    pdf_force_full_page_ocr=True,
    pdf_enable_table_structure=True,
    pdf_enable_picture_description=False,
    pdf_document_timeout=120.0,
)

ocr_agent = Agent(
    tools=[ocr_tools],
    description="You are an agent that converts PDFs using advanced OCR.",
)

ocr_agent.print_response(
    "Convert to Markdown: cookbook/07_knowledge/testing_resources/cv_1.pdf",
    markdown=True,
)

Toolkit Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| converter | DocumentConverter | None | Pre-configured Docling DocumentConverter instance |
| max_chars | int | None | Maximum characters in output |
| allowed_input_formats | List[str] | None | Restrict accepted input formats (e.g. ["pdf", "docx"]) |
| format_options | Dict[Any, Any] | None | Custom format options passed to the converter |
| pdf_pipeline_options | PdfPipelineOptions | None | Full PDF pipeline configuration object |
| pdf_enable_ocr | bool | None | Enable OCR processing for PDFs |
| pdf_ocr_engine | str | None | OCR engine: auto, easyocr, tesseract, tesseract_cli, ocrmac, rapidocr |
| pdf_ocr_lang | List[str] | None | OCR language codes (e.g. ["en", "pt"]) |
| pdf_force_full_page_ocr | bool | None | Force OCR on every page regardless of text layer |
| pdf_enable_table_structure | bool | None | Enable table structure recognition in PDFs |
| pdf_enable_picture_description | bool | None | Enable picture description extraction |
| pdf_enable_picture_classification | bool | None | Enable picture classification |
| pdf_document_timeout | float | None | Timeout in seconds for PDF processing |
| pdf_enable_remote_services | bool | None | Enable remote services for PDF processing |
| enable_convert_to_markdown | bool | True | Register the convert_to_markdown function |
| enable_convert_to_text | bool | True | Register the convert_to_text function |
| enable_convert_to_html | bool | True | Register the convert_to_html function |
| enable_convert_to_html_split_page | bool | True | Register the convert_to_html_split_page function |
| enable_convert_to_json | bool | True | Register the convert_to_json function |
| enable_convert_to_yaml | bool | True | Register the convert_to_yaml function |
| enable_convert_to_doctags | bool | True | Register the convert_to_doctags function |
| enable_convert_to_vtt | bool | True | Register the convert_to_vtt function |
| enable_convert_string_content | bool | True | Register the convert_string_content function |
| enable_list_supported_parsers | bool | True | Register the list_supported_parsers function |
| all | bool | False | Enable all conversion functions when set to True |
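
The `enable_*` flags and `allowed_input_formats` can be combined to give an agent a narrower toolkit. The following is a minimal sketch, not an official recipe: `build_toolkit_kwargs` is a hypothetical helper that only assembles keyword arguments from the parameter names in the table above, and the `DoclingTools` import is guarded so the snippet still runs when agno or docling is not installed.

```python
# Function names taken from the enable_* flags in the table above.
TOOLKIT_FUNCTIONS = [
    "convert_to_markdown",
    "convert_to_text",
    "convert_to_html",
    "convert_to_html_split_page",
    "convert_to_json",
    "convert_to_yaml",
    "convert_to_doctags",
    "convert_to_vtt",
    "convert_string_content",
    "list_supported_parsers",
]


def build_toolkit_kwargs(enabled, allowed_input_formats=None, max_chars=None) -> dict:
    """Build DoclingTools keyword arguments (hypothetical helper)."""
    unknown = set(enabled) - set(TOOLKIT_FUNCTIONS)
    if unknown:
        raise ValueError(f"Unknown toolkit functions: {sorted(unknown)}")
    # Set every enable_* flag explicitly: True only for the requested functions.
    kwargs = {f"enable_{name}": name in enabled for name in TOOLKIT_FUNCTIONS}
    if allowed_input_formats is not None:
        kwargs["allowed_input_formats"] = list(allowed_input_formats)
    if max_chars is not None:
        kwargs["max_chars"] = max_chars
    return kwargs


kwargs = build_toolkit_kwargs(
    enabled={"convert_to_markdown", "convert_to_json"},
    allowed_input_formats=["pdf", "docx"],
    max_chars=20000,
)

try:
    from agno.tools.docling import DoclingTools

    tools = DoclingTools(**kwargs)
except ImportError:
    tools = None  # agno or docling is not installed in this environment
```

Setting each flag explicitly, rather than relying on defaults, keeps the registered function set predictable if the defaults change between releases.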

Toolkit Functions

| Function | Description |
| --- | --- |
| convert_to_markdown | Converts a document (file path or URL) to Markdown. Accepts source, optional headers for URL requests, raises_on_error, max_num_pages, and max_file_size. |
| convert_to_text | Converts a document to plain text. Same parameters as convert_to_markdown. |
| convert_to_html | Converts a document to HTML. Same parameters as convert_to_markdown. |
| convert_to_html_split_page | Converts a document to HTML with page-level splitting. Same parameters as convert_to_markdown. |
| convert_to_json | Converts a document to JSON. Same parameters as convert_to_markdown. |
| convert_to_yaml | Converts a document to YAML. Same parameters as convert_to_markdown. |
| convert_to_doctags | Converts a document to DocTags format. Same parameters as convert_to_markdown. |
| convert_to_vtt | Converts a document (including audio/video) to VTT subtitle format. Same parameters as convert_to_markdown. |
| convert_string_content | Converts raw string content (Markdown or HTML) to another format. Accepts content, source_format (default "markdown"), output_format (default "markdown"), and optional name. |
| list_supported_parsers | Lists all supported Docling input parsers and any active format restrictions. |
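
These functions are normally invoked by the agent, but a direct call can be useful when testing a configuration. The sketch below assumes the toolkit functions are also callable as methods on a `DoclingTools` instance and that `"html"` is an accepted `output_format` value; neither is confirmed by this page. The wrapper `markdown_to_html` is a hypothetical name, and it returns None when agno or docling is not installed.

```python
def markdown_to_html(content: str):
    """Convert a Markdown string to HTML via convert_string_content (sketch)."""
    try:
        from agno.tools.docling import DoclingTools

        tools = DoclingTools()
    except ImportError:
        # agno or docling is missing, so there is nothing to call.
        return None
    # Assumption: toolkit functions can be called directly as instance methods.
    return tools.convert_string_content(
        content=content,
        source_format="markdown",
        output_format="html",
    )


result = markdown_to_html("# Quarterly report\n\nRevenue grew in Q3.")
print(result if result is not None else "DoclingTools is not available")
```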
