Metadata-Version: 2.1
Name: docstruct
Version: 1.0.14
Summary: A package for representing a document as a tree of pages, paragraphs, lines, words, and characters
Home-page: https://github.com/smrt-co/docstruct
Author: Moran Nechushtan
Author-email: moran.n@trullion.com
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown

<h1>Docstruct</h1>

<h2>Overview</h2>

<p>Docstruct is a package that parses the output of optical character recognition (OCR) engines, such as Tesseract (via its hOCR output) or AWS Textract, into a tree structure. Each node in the tree represents a document, page, paragraph, line, word, or character, together with its bounding box, so the tree captures the visual layout of the document. The package also supports paragraph detection and text splitting that preserves logical units.</p>
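
<p>To make the tree idea concrete, here is a minimal, self-contained sketch of such a structure. It is an illustration only and does not use the Docstruct API: the names <code>BoundingBox</code>, <code>Node</code>, <code>bounding_box</code>, and <code>get_text</code> are hypothetical, and the actual classes in the package may look different. The sketch assumes that only leaf nodes carry their own box and that a parent's box is the union of its children's boxes.</p>

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned rectangle (hypothetical, not the Docstruct class)."""
    left: float
    top: float
    right: float
    bottom: float

    def union(self, other: "BoundingBox") -> "BoundingBox":
        # Smallest rectangle that contains both boxes.
        return BoundingBox(
            min(self.left, other.left),
            min(self.top, other.top),
            max(self.right, other.right),
            max(self.bottom, other.bottom),
        )


@dataclass
class Node:
    """One tree node (hypothetical, not the Docstruct class)."""
    kind: str  # "document", "page", "paragraph", "line", "word", or "character"
    text: str = ""  # only meaningful on leaf nodes
    box: Optional[BoundingBox] = None  # only set on leaf nodes
    children: List["Node"] = field(default_factory=list)

    def bounding_box(self) -> BoundingBox:
        # A leaf returns its own box; a parent returns the union of its children.
        if not self.children:
            if self.box is None:
                raise ValueError(f"leaf {self.kind!r} node has no bounding box")
            return self.box
        return reduce(BoundingBox.union, (c.bounding_box() for c in self.children))

    def get_text(self) -> str:
        # Join words with spaces, characters with nothing, everything else with newlines.
        if not self.children:
            return self.text
        separator = {"line": " ", "word": ""}.get(self.kind, "\n")
        return separator.join(c.get_text() for c in self.children)


# Build a tiny document bottom-up; words are the leaves in this example.
hello = Node("word", "Hello", BoundingBox(10, 10, 50, 20))
world = Node("word", "world", BoundingBox(55, 10, 95, 22))
line = Node("line", children=[hello, world])
paragraph = Node("paragraph", children=[line])
page = Node("page", children=[paragraph])
document = Node("document", children=[page])
```

<p>In this sketch, <code>document.get_text()</code> walks the tree and joins the leaf text, while <code>document.bounding_box()</code> derives each parent's box from its children, so no box has to be stored twice.</p>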

<h2>Documentation</h2>

<p>For more information, read the docs at <a href="https://smrt-co.github.io/docstruct/">Docstruct</a>.</p>

<h2>Installation</h2>

<pre><code>pip install docstruct
</code></pre>

<h2>Contributions</h2>

<p>Contributions to the Docstruct package are always welcome. If you have a bug fix or a new feature, feel free to create a pull request on the GitHub repository.</p>

<h2>License</h2>

<p>The Docstruct package is licensed under the <a href="https://opensource.org/licenses/MIT">MIT License</a>.</p>
