Copyright | (c) 2009, 2010, 2011 Bryan O'Sullivan, (c) 2009 Duncan Coutts, (c) 2008, 2009 Tom Harper |
---|---|
License | BSD-style |
Maintainer | bos@serpentine.com |
Portability | portable |
Safe Haskell | Trustworthy |
Language | Haskell98 |
Functions for converting Text values to and from ByteString, using several standard encodings.
To gain access to a much larger family of encodings, use the text-icu package.
Synopsis
- decodeASCII :: ByteString -> Text
- decodeLatin1 :: ByteString -> Text
- decodeUtf8 :: ByteString -> Text
- decodeUtf16LE :: ByteString -> Text
- decodeUtf16BE :: ByteString -> Text
- decodeUtf32LE :: ByteString -> Text
- decodeUtf32BE :: ByteString -> Text
- decodeUtf8' :: ByteString -> Either UnicodeException Text
- decodeUtf8With :: OnDecodeError -> ByteString -> Text
- decodeUtf16LEWith :: OnDecodeError -> ByteString -> Text
- decodeUtf16BEWith :: OnDecodeError -> ByteString -> Text
- decodeUtf32LEWith :: OnDecodeError -> ByteString -> Text
- decodeUtf32BEWith :: OnDecodeError -> ByteString -> Text
- streamDecodeUtf8 :: ByteString -> Decoding
- streamDecodeUtf8With :: OnDecodeError -> ByteString -> Decoding
- data Decoding = Some Text ByteString (ByteString -> Decoding)
- encodeUtf8 :: Text -> ByteString
- encodeUtf16LE :: Text -> ByteString
- encodeUtf16BE :: Text -> ByteString
- encodeUtf32LE :: Text -> ByteString
- encodeUtf32BE :: Text -> ByteString
- encodeUtf8Builder :: Text -> Builder
- encodeUtf8BuilderEscaped :: BoundedPrim Word8 -> Text -> Builder
Decoding ByteStrings to Text
All of the single-parameter functions for decoding bytestrings encoded in one of the Unicode Transformation Formats (UTF) operate in a strict mode: each will throw an exception if given invalid input.
Each function has a variant, whose name is suffixed with -With, that gives greater control over the handling of decoding errors. For instance, decodeUtf8 will throw an exception, but decodeUtf8With allows the programmer to determine what to do on a decoding error.
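As a minimal sketch of the difference, the snippet below decodes a deliberately malformed input with decodeUtf8With and the lenientDecode handler from Data.Text.Encoding.Error. The names badInput and decodeLenient are hypothetical and exist only for this illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical sample: "hi", then the byte 0xFF (never valid in UTF-8), then "!".
badInput :: B.ByteString
badInput = B.pack [0x68, 0x69, 0xFF, 0x21]

-- The strict decodeUtf8 would throw on badInput. With lenientDecode,
-- each undecodable byte is replaced by U+FFFD instead.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = decodeUtf8With lenientDecode
```

Loading this into GHCi and evaluating decodeLenient badInput should yield the text with a replacement character where the bad byte was.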
decodeASCII :: ByteString -> Text
Deprecated: Use decodeUtf8 instead.
Decode a ByteString containing 7-bit ASCII encoded text.
decodeLatin1 :: ByteString -> Text
Decode a ByteString containing Latin-1 (aka ISO-8859-1) encoded text.
decodeLatin1 is semantically equivalent to Data.Text.pack . Data.ByteString.Char8.unpack
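The stated equivalence can be checked on a small input. This is only an illustrative sketch; latin1Bytes, viaDecode and viaPack are hypothetical names.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1)

-- "café" in Latin-1: 'é' is the single byte 0xE9.
latin1Bytes :: B.ByteString
latin1Bytes = B.pack [0x63, 0x61, 0x66, 0xE9]

-- Direct decoding.
viaDecode :: T.Text
viaDecode = decodeLatin1 latin1Bytes

-- The composition named in the documentation above.
viaPack :: T.Text
viaPack = T.pack (BC.unpack latin1Bytes)
```

Both values should compare equal, since every Latin-1 byte maps to the Unicode code point with the same number.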
decodeUtf8 :: ByteString -> Text
Decode a ByteString containing UTF-8 encoded text that is known to be valid.
If the input contains any invalid UTF-8 data, an exception will be thrown that cannot be caught in pure code. For more control over the handling of invalid data, use decodeUtf8' or decodeUtf8With.
decodeUtf16LE :: ByteString -> Text
Decode text from little endian UTF-16 encoding.
If the input contains any invalid little endian UTF-16 data, an exception will be thrown. For more control over the handling of invalid data, use decodeUtf16LEWith.
decodeUtf16BE :: ByteString -> Text
Decode text from big endian UTF-16 encoding.
If the input contains any invalid big endian UTF-16 data, an exception will be thrown. For more control over the handling of invalid data, use decodeUtf16BEWith.
decodeUtf32LE :: ByteString -> Text
Decode text from little endian UTF-32 encoding.
If the input contains any invalid little endian UTF-32 data, an exception will be thrown. For more control over the handling of invalid data, use decodeUtf32LEWith.
decodeUtf32BE :: ByteString -> Text
Decode text from big endian UTF-32 encoding.
If the input contains any invalid big endian UTF-32 data, an exception will be thrown. For more control over the handling of invalid data, use decodeUtf32BEWith.
Catchable failure
decodeUtf8' :: ByteString -> Either UnicodeException Text
Decode a ByteString containing UTF-8 encoded text.
If the input contains any invalid UTF-8 data, the relevant exception is returned as a Left; otherwise the decoded text is returned as a Right.
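One way to use the Either result is to fall back to a default value when decoding fails. The sketch below assumes a hypothetical helper decodeOr and a hypothetical placeholder text; neither is part of the library.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- Hypothetical placeholder shown when the bytes are not valid UTF-8.
placeholder :: T.Text
placeholder = T.pack "<invalid UTF-8>"

-- Hypothetical helper: return the decoded text, or the given default
-- if decodeUtf8' reports a UnicodeException.
decodeOr :: T.Text -> B.ByteString -> T.Text
decodeOr def bytes = either (const def) id (decodeUtf8' bytes)
```

Because the failure is an ordinary value, no exception handling in IO is needed here.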
Controllable error handling
decodeUtf8With :: OnDecodeError -> ByteString -> Text
Decode a ByteString containing UTF-8 encoded text.
NOTE: The replacement character returned by OnDecodeError MUST be within the BMP (Basic Multilingual Plane). Surrogate code points are automatically remapped to the replacement character U+FFFD (since 0.11.3.0), whereas code points beyond the BMP throw an error (since 1.2.3.1). In earlier versions of text, using those unsupported code points resulted in undefined behavior.
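A custom handler is an ordinary function. The sketch below assumes the OnDecodeError type from Data.Text.Encoding.Error, which takes a description and the offending byte and returns either a replacement character or Nothing to drop the byte. The handlers questionMark and dropInvalid are hypothetical names; both stay within the BMP as the note above requires.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (OnDecodeError)

-- Hypothetical handler: substitute '?' (a BMP character) for each bad byte.
questionMark :: OnDecodeError
questionMark _description _badByte = Just '?'

-- Hypothetical handler: silently skip each bad byte.
dropInvalid :: OnDecodeError
dropInvalid _description _badByte = Nothing

-- "a", the invalid byte 0xFF, then "b".
mixedInput :: B.ByteString
mixedInput = B.pack [0x61, 0xFF, 0x62]

replaced, dropped :: T.Text
replaced = decodeUtf8With questionMark mixedInput
dropped  = decodeUtf8With dropInvalid mixedInput
```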
decodeUtf16LEWith :: OnDecodeError -> ByteString -> Text
Decode text from little endian UTF-16 encoding.
decodeUtf16BEWith :: OnDecodeError -> ByteString -> Text
Decode text from big endian UTF-16 encoding.
decodeUtf32LEWith :: OnDecodeError -> ByteString -> Text
Decode text from little endian UTF-32 encoding.
decodeUtf32BEWith :: OnDecodeError -> ByteString -> Text
Decode text from big endian UTF-32 encoding.
Stream oriented decoding
The streamDecodeUtf8 and streamDecodeUtf8With functions accept a ByteString that represents a possibly incomplete input (e.g. a packet from a network stream) that may not end on a UTF-8 boundary. Each returns a Decoding value, built with the Some constructor, that has three fields:
- The maximal prefix of Text that could be decoded from the given input.
- The suffix of the ByteString that could not be decoded due to insufficient input.
- A function that accepts another ByteString. That string will be assumed to directly follow the string that was passed as input to the original function, and it will in turn be decoded.
To help understand the use of these functions, consider the Unicode string "hi ☃". If encoded as UTF-8, this becomes "hi \xe2\x98\x83"; the final '☃' is encoded as 3 bytes.
Now suppose that we receive this encoded string as 3 packets that are split up on untidy boundaries: ["hi \xe2", "\x98", "\x83"]. We cannot decode the entire Unicode string until we have received all three packets, but we would like to make progress as we receive each one.
ghci> let s0@(Some _ _ f0) = streamDecodeUtf8 "hi \xe2"
ghci> s0
Some "hi " "\xe2" _
We use the continuation f0 to decode our second packet.
ghci> let s1@(Some _ _ f1) = f0 "\x98"
ghci> s1
Some "" "\xe2\x98" _
We could not give f0 enough input to decode anything, so it returned an empty string. Once we feed our second continuation f1 the last byte of input, it will make progress.
ghci> let s2@(Some _ _ f2) = f1 "\x83"
ghci> s2
Some "\x2603" "" _
If given invalid input, an exception will be thrown by the function or continuation where it is encountered.
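The packet-by-packet session above can also be written as a small loop that threads the continuation through a list of chunks. This is a sketch under stated assumptions: decodeChunks, decodeAll and snowmanPackets are hypothetical names, and the input is assumed to be valid UTF-8 (otherwise streamDecodeUtf8 throws, as described above).

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (Decoding (..), streamDecodeUtf8)

-- Hypothetical helper: feed each chunk to the current continuation and
-- collect the text decoded after every chunk, plus the undecoded bytes
-- left over after the last chunk.
decodeChunks :: [B.ByteString] -> ([T.Text], B.ByteString)
decodeChunks = go streamDecodeUtf8
  where
    go _ [] = ([], B.empty)
    go step (chunk : rest) =
      case step chunk of
        Some piece leftover next
          | null rest -> ([piece], leftover)
          | otherwise ->
              let (pieces, final) = go next rest
              in (piece : pieces, final)

-- Hypothetical helper: concatenate everything that could be decoded.
decodeAll :: [B.ByteString] -> T.Text
decodeAll = T.concat . fst . decodeChunks

-- The three untidy packets from the example above.
snowmanPackets :: [B.ByteString]
snowmanPackets =
  [ B.pack [0x68, 0x69, 0x20, 0xE2]
  , B.pack [0x98]
  , B.pack [0x83]
  ]
```

A non-empty leftover after the final chunk signals that the stream ended in the middle of a code point; how to treat that case is left to the caller.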
streamDecodeUtf8 :: ByteString -> Decoding
Decode, in a stream oriented way, a ByteString containing UTF-8 encoded text that is known to be valid.
If the input contains any invalid UTF-8 data, an exception will be thrown (either by this function or a continuation) that cannot be caught in pure code. For more control over the handling of invalid data, use streamDecodeUtf8With.
Since: text-1.0.0.0
streamDecodeUtf8With :: OnDecodeError -> ByteString -> Decoding
Decode, in a stream oriented way, a ByteString containing UTF-8 encoded text.
Since: text-1.0.0.0
data Decoding
A stream oriented decoding result.
Since: text-1.0.0.0
Some Text ByteString (ByteString -> Decoding)
Encoding Text to ByteStrings
encodeUtf8 :: Text -> ByteString
Encode text using UTF-8 encoding.
encodeUtf16LE :: Text -> ByteString
Encode text using little endian UTF-16 encoding.
encodeUtf16BE :: Text -> ByteString
Encode text using big endian UTF-16 encoding.
encodeUtf32LE :: Text -> ByteString
Encode text using little endian UTF-32 encoding.
encodeUtf32BE :: Text -> ByteString
Encode text using big endian UTF-32 encoding.
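To make the differences between the encodings concrete, the sketch below encodes the same four-character text three ways. The names sample, utf8Bytes, utf16Bytes, utf32Bytes and roundTrips are hypothetical and used only for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding
  (decodeUtf8, encodeUtf16LE, encodeUtf32BE, encodeUtf8)

-- "hi ☃": three ASCII characters followed by U+2603.
sample :: T.Text
sample = T.pack "hi \x2603"

-- UTF-8: one byte per ASCII character, three for the snowman (6 bytes).
utf8Bytes :: B.ByteString
utf8Bytes = encodeUtf8 sample

-- UTF-16LE: four 16-bit code units, low byte first (8 bytes).
utf16Bytes :: B.ByteString
utf16Bytes = encodeUtf16LE sample

-- UTF-32BE: four 32-bit code units, high byte first (16 bytes).
utf32Bytes :: B.ByteString
utf32Bytes = encodeUtf32BE sample

-- Encoding followed by the matching decoder gives back the original text.
roundTrips :: Bool
roundTrips = decodeUtf8 utf8Bytes == sample
```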
Encoding Text using ByteString Builders
encodeUtf8Builder :: Text -> Builder
Encode text to a ByteString Builder using UTF-8 encoding.
Since: text-1.1.0.0
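A Builder is useful when many pieces of text are concatenated into one output. The following sketch assumes the Builder in question is the one from Data.ByteString.Builder in the bytestring package; renderLines is a hypothetical helper.

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8Builder)

-- Hypothetical helper: encode each line as UTF-8, terminate it with a
-- newline, and run the combined Builder once at the end.
renderLines :: [T.Text] -> BL.ByteString
renderLines = BB.toLazyByteString . foldMap line
  where
    line t = encodeUtf8Builder t <> BB.char7 '\n'
```

Combining Builders and running them once avoids creating a separate strict ByteString for every line.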
encodeUtf8BuilderEscaped :: BoundedPrim Word8 -> Text -> Builder
Encode text using UTF-8 encoding and escape the ASCII characters using a BoundedPrim.
Use this function to implement efficient encoders for text-based formats like JSON or HTML.
Since: text-1.1.0.0
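As a rough sketch of how such an encoder might look, the code below builds a BoundedPrim with the combinators from Data.ByteString.Builder.Prim. The escaping rule (backslash before double quotes and backslashes only) is a hypothetical simplification, not a complete JSON escaper, and escapeAscii and quoted are illustrative names.

```haskell
import qualified Data.ByteString.Builder as BB
import Data.ByteString.Builder.Prim ((>$<), (>*<))
import qualified Data.ByteString.Builder.Prim as P
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8BuilderEscaped)
import Data.Word (Word8)

-- Hypothetical escaping rule: prefix '"' (0x22) and '\' (0x5C) with a
-- backslash; pass every other ASCII byte through unchanged.
escapeAscii :: P.BoundedPrim Word8
escapeAscii =
  P.condB (== 0x22) (twoBytes (0x5C, 0x22)) $
  P.condB (== 0x5C) (twoBytes (0x5C, 0x5C)) $
  P.liftFixedToBounded P.word8
  where
    -- Ignore the input byte and always emit the given pair.
    twoBytes :: (Word8, Word8) -> P.BoundedPrim Word8
    twoBytes pair =
      P.liftFixedToBounded (const pair >$< P.word8 >*< P.word8)

-- Hypothetical helper: wrap the escaped text in double quotes.
quoted :: T.Text -> BL.ByteString
quoted t =
  BB.toLazyByteString
    (BB.char7 '"' <> encodeUtf8BuilderEscaped escapeAscii t <> BB.char7 '"')
```

Only ASCII bytes pass through the BoundedPrim; non-ASCII characters are written as plain UTF-8.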