Export Document
The Document export tab allows you to export a segmented version of the current document with custom text added before and after various elements of the document.
This allows you to do things such as export the document with spaces added before/after each word, or generate an HTML copy of the document with markup tags added before or after different elements.
A document in Chinese Text Analyser contains the following elements:
- Document - the entire document.
- Paragraph - text separated by a newline character.
- Word - individual words, as determined by Chinese Text Analyser’s segmenting engine.
- Character - individual characters.
You can specify a Pre and Post tag for each element: Pre tags are added before the element, and Post tags are added after it.
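The way Pre and Post tags wrap each element can be pictured with a short Python sketch. This is only an illustrative model, not Chinese Text Analyser's actual implementation: the function name `export_document`, the `tags` dictionary, and the nesting order (characters inside words, words inside paragraphs, paragraphs inside the document) are assumptions, and the input is taken to be already segmented into words. The sketch does not model how line breaks between paragraphs are written.

```python
def export_document(paragraphs, tags):
    """Wrap every element in its Pre/Post text (hypothetical model).

    paragraphs: list of paragraphs, each a list of already-segmented words.
    tags: dict mapping (element, "pre" | "post") to the text to add;
          missing entries are treated as empty strings.
    """
    def wrap(element, body):
        pre = tags.get((element, "pre"), "")
        post = tags.get((element, "post"), "")
        return pre + body + post

    rendered_paragraphs = []
    for words in paragraphs:
        rendered_words = [
            # Characters are wrapped first, then the word around them.
            wrap("word", "".join(wrap("character", ch) for ch in word))
            for word in words
        ]
        rendered_paragraphs.append(wrap("paragraph", "".join(rendered_words)))
    return wrap("document", "".join(rendered_paragraphs))


# One paragraph of four words, with a single space as Word Post.
sample = [["我", "喜欢", "学习", "中文"]]
result = export_document(sample, {("word", "post"): " "})
print(result)
```

With only Word Post set, every other element passes through unchanged, so the output is the original text with one space after each word, including the last.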
You can also add whitespace characters (such as newlines and tabs) to the Pre and Post fields with the following escape codes:
- \n - a newline character
- \r - a carriage return character
- \t - a tab character
- \\ - a backslash
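To make the escape codes concrete, the following Python sketch decodes a field value the way the list above describes. The function name `decode_escapes` is hypothetical, and leaving an unrecognised backslash sequence untouched is an assumption; the documentation does not say how Chinese Text Analyser treats such input.

```python
# The four escape codes listed above, keyed by the character after the backslash.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def decode_escapes(field):
    """Replace \\n, \\r, \\t and \\\\ in a Pre/Post field with real characters."""
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] in ESCAPES:
            out.append(ESCAPES[field[i + 1]])
            i += 2  # consume the backslash and the code character
        else:
            out.append(ch)  # ordinary character, or an unrecognised escape
            i += 1
    return "".join(out)


# Raw strings stand in for what would be typed into the dialog box.
two_newlines = decode_escapes(r"\n\n")
tabbed = decode_escapes(r"a\tb")
```

Here `two_newlines` holds two real newline characters and `tabbed` holds `a`, a tab, then `b`.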
Example 1 - Spaces after each word
If you wanted to generate a segmented copy of a document with a single space added after each word, you would set Word Post to a single space ' ' and Chinese Text Analyser would add a single space after each word in the exported document.
Note: By default, the Word Post field contains a single space ' '. This will not be obvious just by looking at the dialog box, so remember to delete the space if you do not wish to add spaces after every word. This can be done by clicking on the field, selecting the text, and then pressing delete.
Example 2 - Generating segmented HTML
If you wanted to generate an HTML document with spans around each word and character, you could use the following settings:
Document Pre: <html><head><title>Chinese Text Analyser is the best!</title><meta charset="UTF-8"><style>.char:hover { color: red; } .word:hover { font-size: 150% }</style></head><body>
Document Post: </body></html>
Paragraph Pre: <p>
Paragraph Post: </p>
Word Pre: <span class="word">
Word Post: </span>
Character Pre: <span class="char">
Character Post: </span>
With the above settings, when the exported content is opened in a web browser, the word under the mouse pointer is shown with an increased font size, and the character under the pointer is shown in red.
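As a rough illustration of the markup these settings produce for one paragraph, the Python sketch below applies the Paragraph, Word and Character tags from this example to a short, already-segmented sentence. The helper names `wrap` and `render_paragraph`, the sample words, and the nesting order are assumptions made for illustration; the real export is performed by Chinese Text Analyser itself.

```python
# Pre/Post pairs copied from the settings in this example.
TAGS = {
    "paragraph": ("<p>", "</p>"),
    "word": ('<span class="word">', "</span>"),
    "character": ('<span class="char">', "</span>"),
}


def wrap(element, body):
    """Surround body with the Pre and Post text for the given element."""
    pre, post = TAGS[element]
    return pre + body + post


def render_paragraph(words):
    """Wrap each character, then each word, then the whole paragraph."""
    rendered_words = (
        wrap("word", "".join(wrap("character", ch) for ch in word))
        for word in words
    )
    return wrap("paragraph", "".join(rendered_words))


# Hypothetical segmentation of a two-word paragraph.
html = render_paragraph(["你好", "吗"])
print(html)
```

Each character ends up in its own `char` span nested inside the `word` span for its word, which is what lets the two hover rules in the style block apply independently.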
Example 3 - Adding newlines after each paragraph
If you wanted to add a couple of extra lines after each paragraph, you would set Paragraph Post to '\n\n'.
The exported document would then contain two extra lines after each paragraph.