NAME
    HTML::ParagraphSplit - Change text containing HTML into a formatted HTML
    fragment

SYNOPSIS
      use HTML::ParagraphSplit qw( split_paragraphs_to_text split_paragraphs );

      # Read in from a file handle, output text
      print split_paragraphs_to_text(\*ARGV);

      # Convert text to nicely split text
      print split_paragraphs_to_text(<<END_OF_MARKUP);
      This is one paragraph.

      This is another paragraph.
      END_OF_MARKUP

      # Convert to an HTML::Element object instead
      my $tree = split_paragraphs($html_input);
      print $tree->as_HTML;

      # Create your own HTML::Element object and split it
      $tree = HTML::TreeBuilder->new;
      $tree->no_space_compacting(1);   # keep the double line breaks
      $tree->parse($text);
      $tree->eof;

      split_paragraphs($tree);

      my $html_fragment = $tree->guts->as_HTML;
      $tree->delete;

DESCRIPTION
    The purpose of this library is to provide methods for converting double
    line-breaks in text to HTML paragraphs (i.e., wrap in "<P></P>" tags).
    It can also convert single line breaks into "<BR>" tags. In addition,
    markup can be mixed in as well and this library will
    DoTheRightThing(tm). There are a number of additional options that can
    modify how the paragraph splits are performed.

    For example, given this input (the initial text was generated by
    DadaDodo <http://www.jwz.org/dadadodo/dadadodo.cgi>, btw):

      I see over the <strong>noise</strong> but I don't understand sometimes.

      <ol><li>One</li><li>Two</li><li>Three</li></ol>

      Fortunately, we've traded the club you can't skimp on the do because This
      week! Presented by code Lounge: except, for controlling Knox video cameras
      Linux well that the reason, the runlevel to run some reason number of coming 
      back next server; sees you Control <a href="blah.html">display</a> a steep 
      and I tagged with specifications of six feet, moving to Code, flyer main room
      motel balcony, <p>and airflow in which define the ability to run a common. We
      need to current in a manner <pre>than six months and that already gotten a
      webcast</pre> is roughly long and bulk: and up the src page: and updates on a:
      user will probably does this.

    This would be converted into the following:

      <p>I see over the <strong>noise</strong> but I don't understand sometimes.</p>
      <ol><li>One</li><li>Two</li><li>Three</li></ol>
      <p>Fortunately, we've traded the club you can't skimp on the do because This
      week! Presented by code Lounge: except, for controlling Knox video cameras
      Linux well that the reason, the runlevel to run some reason number of coming 
      back next server; sees you Control <a href="blah.html">display</a> a steep 
      and I tagged with specifications of six feet, moving to Code, flyer main room
      motel balcony,</p>
      <p>and airflow in which define the ability to run a common. We need to
      current in a manner</p>
      <pre>than six months and that already gotten a
      webcast</pre>
      <p>is roughly long and bulk: and up the src page: and updates on a: user will 
      probably does this.</p>

    This lets authors who want to use some HTML markup, but don't really
    want to cope with getting their paragraph tags right, use this filter
    to format their work the right way.

    This library depends upon HTML::TreeBuilder and HTML::Tagset. You may
    wish to see the documentation for those libraries for additional
    details.

METHODS
    The primary method of this library is "split_paragraphs()". An
    additional method, "split_paragraphs_to_text()", is provided to
    simplify the task of generating output without having to fuss with
    HTML::TreeBuilder.

    $element = split_paragraphs($handle, \%options)
    $element = split_paragraphs($text, \%options)
    $element = split_paragraphs($element, \%options)
        This method has three forms, which vary only in the input they
        receive. If the first argument is a file handle, $handle, then that
        handle will be read, parsed, and split. If the first argument is a
        scalar, $text, then that text will be parsed and split. If the
        first argument is an HTML::Element object (or an object of a
        subclass), $element, then the tree represented by that node will be
        traversed and split.

        If you use the third form, your tree will be modified in place and
        the same tree will be returned. You will want to clone the tree
        ahead of time if you need to preserve the old tree.

        All forms take an optional second parameter, "\%options", which is a
        reference to a hash of options which modify the default behavior.
        See below for details.

        The first two forms perform an extra step, but are handled
        essentially the same after the input is parsed into an HTML::Element
        using HTML::TreeBuilder. This is done using the defaults, except
        that "no_space_compacting()" is set to a true value (otherwise, we
        lose any double returns that were in the original text). If you
        parse your own trees, you'll probably want to do the same.
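
        As a hedged sketch of the third form, the example below parses a
        string with HTML::TreeBuilder, turns on "no_space_compacting()" as
        suggested above, and splits a clone so that the original tree stays
        untouched. The variable names and sample text are illustrative
        only; "clone()" and "delete()" are standard HTML::Element methods.

          ```perl
          use strict;
          use warnings;
          use HTML::TreeBuilder;
          use HTML::ParagraphSplit qw( split_paragraphs );

          my $text = "First paragraph.\n\nSecond paragraph.";

          my $tree = HTML::TreeBuilder->new;
          $tree->no_space_compacting(1);   # keep the double line breaks intact
          $tree->parse($text);
          $tree->eof;

          # split_paragraphs() modifies its argument in place, so work on a copy.
          my $before = $tree->as_HTML;
          my $copy   = $tree->clone;
          split_paragraphs($copy);
          my $after  = $tree->as_HTML;

          print "original tree untouched\n" if $before eq $after;
          print $copy->as_HTML, "\n";

          # HTML::Element trees should be deleted explicitly when done.
          $_->delete for $copy, $tree;
          ```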

        This method will search down the element tree and find the first
        node with non-implicit child nodes and use that as the root of
        operations.

        The "split_paragraphs()" method then walks the tree and wraps any
        undecorated text node in a paragraph. Any double line break
        discovered will result in multiple paragraphs. Any paragraph content
        elements (as defined by %is_Possible_Strict_P_Content of
        HTML::Tagset) will be inserted into the paragraph elements as if
        they were text. Any block level tags (i.e., not in
        %is_Possible_Strict_P_Content) cause a paragraph break immediately
        before and after such elements.

        Any text found within a block-level node may also be paragraphified.
        Those blocks of text will not be wrapped in paragraphs unless they
        contain a double-line break (that way we're not inserting "P"-tags
        without an explicit need for them).

        Note also that this will insert "P"-tags conservatively. If more
        than two line-breaks are present, even if they are mixed with other
        white space, all of that whitespace will be treated as the same
        paragraph break. No empty "P"-tags or "P"-tags containing only
        whitespace will be inserted (mostly). The only exception is when the
        white space is created by white space entities, such as "&nbsp;".
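
        The whitespace rule above can be illustrated without the library.
        The following is a minimal sketch in plain Perl, not the module's
        actual implementation: the helper name "naive_paragraphs()" is
        hypothetical, it ignores markup entirely, and it only shows how a
        run of line breaks mixed with other white space can collapse into a
        single paragraph break without producing empty paragraphs.

          ```perl
          use strict;
          use warnings;

          # Hypothetical helper, for illustration only: splits plain text on
          # paragraph breaks and wraps each non-empty chunk in <p> tags.
          sub naive_paragraphs {
              my ($text) = @_;
              my @paragraphs;

              # A newline, any amount of white space, then another newline
              # counts as one break, however many line breaks the run holds.
              for my $chunk (split /\n\s*\n/, $text) {
                  next unless $chunk =~ /\S/;    # never emit an empty <p>
                  $chunk =~ s/^\s+|\s+$//g;      # trim the edges
                  push @paragraphs, "<p>$chunk</p>";
              }
              return @paragraphs;
          }

          # Prints <p>One.</p> and <p>Two.</p>, one per line.
          print "$_\n" for naive_paragraphs("One.\n\n \n\t\n\nTwo.\n");
          ```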

        All of that is the default behavior. It may be modified through the
        options passed in the second parameter.

        Here's the list of options and what they do:

        p_on_breaks_only => 1
            If this option is used, then paragraphs will not be added to
            your text unless there is at least one double-line break. This
            option is used internally to make sure nested elements do not
            have extra "P"-tags unnecessarily.

        single_line_breaks_to_br => 1
            If this option is given, then single line breaks will also be
            converted to "BR"-tags.

        br_only_if_can_tighten => 1
            This option modifies the "single_line_breaks_to_br" option by
            specifying that "BR"-tags are not added within blocks that
            cannot be tightened (i.e., aren't set in %canTighten of
            HTML::Tagset). This can be useful for preventing double-line
            breaks from appearing inside "PRE"-tags or "TEXTAREA"-tags
            because of added "BR"-tags.

        use_br_instead_of_p => 1
            As an alternative to using "P"-tags at all, this can also just
            place "BR"-tags everywhere instead. Instead of inserting
            "P"-tags whenever a double line-break is encountered, two
            "BR"-tags will be inserted instead.

            This option is independent of "single_line_breaks_to_br" as
            single line-breaks are not dealt with unless that option is
            turned on. Also note that, like "P"-tag insertion, it inserts
            "BR"-tags conservatively. Multiple consecutive line-breaks (even
            mixed with whitespace) will be treated just as if they were only
            two. Thus, given the default stylesheet of your typical browser,
            the rendered output will appear pretty much the same in most
            circumstances.

        add_attrs_to_p => \%attrs
            This can be used to insert a static set of attributes to each
            inserted "P"-element. For example:

              # Give each newly added paragraph the "generated" class.
              split_paragraphs($tree, {
                  add_attrs_to_p => { class => 'generated' },
              });

        add_attrs_to_br => \%attrs
            Same as above, but for the inserted "BR"-tags.

        filter_added_nodes => \&sub
            This can be used to run a small subroutine on each added
            paragraph or line-break tag as it is added. For example:

              # Give each newly added paragraph a unique ID
              split_paragraphs($tree, {
                  filter_added_nodes => sub {
                      my ($element) = @_;
                      $element->idf();
                  },
              });

            Many, if not all, of the other options can be simulated using
            this method, by the way.

        use_instead_of_p => $tag
            Rather than using "P"-tags to break everything, use a different
            tag. This example uses "DIV"-tags instead of "P"-tags:

              split_paragraphs($tree, {
                  use_instead_of_p => 'div',
              });
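
        Several options can be combined in a single hash reference. The
        following sketch assumes the options behave as documented above;
        the sample text and the class name 'soft-break' are made up for
        illustration.

          ```perl
          use strict;
          use warnings;
          use HTML::ParagraphSplit qw( split_paragraphs_to_text );

          my $text = "Roses are red,\nviolets are blue.\n\nA second stanza.";

          my $html = split_paragraphs_to_text($text, {
              # Turn each single line break into a BR-tag as well.
              single_line_breaks_to_br => 1,
              # Skip BR-tags inside blocks that cannot be tightened (e.g., PRE).
              br_only_if_can_tighten   => 1,
              # Mark the generated BR-tags so a stylesheet can find them.
              add_attrs_to_br          => { class => 'soft-break' },
          });

          print $html, "\n";
          ```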

    $html_text = split_paragraphs_to_text($handle, \%options)
    $html_text = split_paragraphs_to_text($text, \%options)
    $html_text = split_paragraphs_to_text($element, \%options)
        This method performs the exact same operation as the
        "split_paragraphs()" method, but returns the text as a scalar value.
        This is helpful if you just want a quick method that takes in text
        and outputs text and you don't really need the HTML formatted in any
        particular way and don't need to modify the tree at all.

        I created this method primarily as a way of outputting the tree to
        make testing easier. If the output isn't what you want, use
        "split_paragraphs()" instead and use the output methods available in
        HTML::Element directly to get the desired output.
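
        For instance, to get indented output with every end tag written
        out, one might call "as_HTML()" directly. This is only a sketch:
        the arguments follow HTML::Element's documented
        "as_HTML($entities, $indent_char, \%optional_end_tags)" signature,
        and the sample text is invented.

          ```perl
          use strict;
          use warnings;
          use HTML::ParagraphSplit qw( split_paragraphs );

          my $tree = split_paragraphs("One paragraph.\n\nAnother paragraph.");

          # Escape only <, >, and &; indent with two spaces; and pass an empty
          # hash so that no end tags are treated as optional.
          my $html = $tree->as_HTML('<>&', '  ', {});
          print $html;

          # Free the tree once the output has been generated.
          $tree->delete;
          ```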

SEE ALSO
    HTML::TreeBuilder, HTML::Tagset

BUGS AND TODO
    I don't really have any explicit plans for this module, but if you find
    a bug or would like an additional feature or have another contribution,
    send me email at <hanenkamp@cpan.org>.

NOTES
    I tried to name this library HTML::Paragraphify first. After typing that
    a dozen times and looking at it for a few hours, my eyes felt like they
    were starting to bleed so I changed it to HTML::ParagraphSplit.

    I've left a few token references to that name in the documentation for
    kicks.

AUTHOR
    Andrew Sterling Hanenkamp, <hanenkamp@cpan.org>

LICENSE AND COPYRIGHT
    Copyright 2006 Andrew Sterling Hanenkamp <hanenkamp@cpan.org>. All
    Rights Reserved.

    This module is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself. See perlartistic.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.