# [html2text](http://www.aaronsw.com/2002/html2text/)

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: `html2text.py [(filename|url) [encoding]]`

    Options:
      --version             show program's version number and exit
      -h, --help            show this help message and exit
      --ignore-links        don't include any formatting for links
      --ignore-images       don't include any formatting for images
      -g, --google-doc      convert an html-exported Google Document
      -d, --dash-unordered-list
                            use a dash rather than a star for unordered list items
      -b BODY_WIDTH, --body-width=BODY_WIDTH
                            number of characters per output line, 0 for no wrap
      -i LIST_INDENT, --google-list-indent=LIST_INDENT
                            number of pixels Google indents nested lists
      -s, --hide-strikethrough
                            hide strike-through text. only relevant when -g is
                            specified as well

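When the script is driven from another program, the options above can be assembled into an argument list. The sketch below is only an illustration: `build_command` and `convert_file` are hypothetical helpers, not part of html2text, and the default `script` path assumes `html2text.py` sits in the current directory. Only flags from the option list are used.

```python
import subprocess
import sys


def build_command(source, encoding=None, ignore_links=False, ignore_images=False,
                  body_width=None, dash_lists=False, script="html2text.py"):
    """Return the argument list for one html2text.py run.

    `source` is a filename or URL. `script` is an assumed path to
    html2text.py; adjust it to wherever the script actually lives.
    """
    command = [sys.executable, script]
    if ignore_links:
        command.append("--ignore-links")
    if ignore_images:
        command.append("--ignore-images")
    if dash_lists:
        command.append("--dash-unordered-list")
    if body_width is not None:
        # 0 disables wrapping, per the option list.
        command.append("--body-width=%d" % body_width)
    # Positional arguments follow the usage line: source first, then encoding.
    command.append(source)
    if encoding is not None:
        command.append(encoding)
    return command


def convert_file(source, **options):
    """Run the script and return whatever it writes to standard output."""
    return subprocess.check_output(build_command(source, **options))
```

For example, `build_command("page.html", encoding="utf-8", ignore_links=True, body_width=0)` yields the interpreter path followed by `html2text.py --ignore-links --body-width=0 page.html utf-8`.
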
Or you can use it from within Python:

    import html2text
    # Convert an HTML string to Markdown text with the default settings.
    print(html2text.html2text("<p>Hello, world.</p>"))

Or with some configuration options:

    import html2text
    h = html2text.HTML2Text()
    h.ignore_links = True  # omit link formatting from the output
    print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))

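Because options are plain attributes, a misspelled name such as `h.ignore_link = True` would be accepted silently and change nothing. One way to guard against that, sketched here under assumptions, is a small helper that only sets attributes the converter already has. `apply_options` and `demo` are hypothetical names, not part of html2text, and `body_width` is assumed to be the attribute counterpart of `--body-width`.

```python
def apply_options(converter, **options):
    """Set each keyword on `converter`, refusing names it does not define."""
    for name, value in options.items():
        if not hasattr(converter, name):
            # A typo would otherwise create a new, unused attribute.
            raise AttributeError("unknown html2text option: %r" % name)
        setattr(converter, name, value)
    return converter


def demo():
    """Convert the snippet from the example above with two options set."""
    # Third-party import kept local so apply_options has no dependencies.
    import html2text

    h = apply_options(html2text.HTML2Text(), ignore_links=True, body_width=0)
    return h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!")
```

The helper works on any object, so it can be exercised without html2text installed; `demo()` needs the real library.
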
Originally written by Aaron Swartz. This code is distributed under the GPLv3.

## How to do a release

1. Update the version in `html2text.py`
2. Update the version in `setup.py`
3. Run `python setup.py sdist upload`

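Steps 1 and 2 have to agree, so a quick consistency check before step 3 can catch a missed edit. The sketch below assumes, without confirmation from the source, that both files declare the version as a quoted string assigned to `__version__` or `version` at the start of a line. `find_version`, `versions_match`, and `check_release` are hypothetical helpers, not part of the repository.

```python
import re

# Assumed declaration style, e.g.  __version__ = "1.2.3"  or  version='1.2.3',
VERSION_PATTERN = re.compile(
    r"""^\s*(?:__version__|version)\s*=\s*["']([^"']+)["']""",
    re.MULTILINE,
)


def find_version(text):
    """Return the first declared version string in `text`, or None."""
    match = VERSION_PATTERN.search(text)
    return match.group(1) if match else None


def versions_match(module_source, setup_source):
    """True only if both sources declare a version and the two are equal."""
    module_version = find_version(module_source)
    setup_version = find_version(setup_source)
    return module_version is not None and module_version == setup_version


def check_release(module_path="html2text.py", setup_path="setup.py"):
    """Read both files and report whether their versions agree."""
    with open(module_path) as module_file:
        module_source = module_file.read()
    with open(setup_path) as setup_file:
        setup_source = setup_file.read()
    return versions_match(module_source, setup_source)
```

Calling `check_release()` from the repository root returns `False` when either file lacks a recognizable version or the two differ, which is a cue to revisit steps 1 and 2 before uploading.
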
## How to run unit tests

    cd test/
    python run_tests.py

[Build Status](http://travis-ci.org/aaronsw/html2text)