# Data.Text
# Text Literals
The OverloadedStrings (opens new window) language extension allows the use of normal string literals to stand for Text values.
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
myText :: T.Text
myText = "overloaded"
# Checking if a Text is a substring of another Text
ghci> :set -XOverloadedStrings
ghci> import Data.Text as T
isInfixOf :: Text -> Text -> Bool (opens new window) checks whether a Text is contained anywhere within another Text.
ghci> "rum" `T.isInfixOf` "crumble"
True
isPrefixOf :: Text -> Text -> Bool (opens new window) checks whether a Text appears at the beginning of another Text.
ghci> "crumb" `T.isPrefixOf` "crumble"
True
isSuffixOf :: Text -> Text -> Bool (opens new window) checks whether a Text appears at the end of another Text.
ghci> "rumble" `T.isSuffixOf` "crumble"
True
# Stripping whitespace
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
myText :: T.Text
myText = "\n\r\t leading and trailing whitespace \t\r\n"
strip removes whitespace from the start and end of a Text value.
ghci> T.strip myText
"leading and trailing whitespace"
stripStart removes whitespace only from the start.
ghci> T.stripStart myText
"leading and trailing whitespace \t\r\n"
stripEnd removes whitespace only from the end.
ghci> T.stripEnd myText
"\n\r\t leading and trailing whitespace"
filter can be used to remove whitespace, or other characters, from the middle.
ghci> T.filter /=' ' "spaces in the middle of a text string"
"spacesinthemiddleofatextstring"
# Splitting Text Values
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
myText :: T.Text
myText = "mississippi"
splitOn breaks a Text up into a list of Texts on occurrences of a substring.
ghci> T.splitOn "ss" myText
["mi","i","ippi"]
splitOn is the inverse of intercalate.
ghci> intercalate "ss" (splitOn "ss" "mississippi")
"mississippi"
split breaks a Text value into chunks on characters that satisfy a Boolean predicate.
ghci> T.split (== 'i') myText
["m","ss","ss","pp",""]
# Encoding and Decoding Text
Encoding and decoding functions for a variety of Unicode encodings can be found in the Data.Text.Encoding module.
ghci> import Data.Text.Encoding
ghci> decodeUtf8 (encodeUtf8 "my text")
"my text"
Note that decodeUtf8 will throw an exception on invalid input. If you want to handle invalid UTF-8 yourself, use decodeUtf8With.
ghci> decodeUtf8With (\errorDescription input -> Nothing) messyOutsideData
# Indexing Text
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
myText :: T.Text
myText = "mississippi"
Characters at specific indices can be returned by the index function.
ghci> T.index myText 2
's'
The findIndex function takes a function of type (Char -> Bool) and Text and returns the index of the first occurrence of a given string or Nothing if it doesn't occur.
ghci> T.findIndex ('s'==) myText
Just 2
ghci> T.findIndex ('c'==) myText
Nothing
The count function returns the number of times a query Text occurs within another Text.
ghci> count ("miss"::T.Text) myText
1
# Remarks
Text is a more efficient alternative to Haskell's standard String type. String is defined as a linked list of characters in the standard Prelude, per the Haskell Report (opens new window):
type String = [Char]
Text is represented as a packed array of Unicode characters. This is similar to how most other high-level languages represent strings, and gives much better time and space efficiency than the list version.
Text should be preferred over String for all production usage. A notable exception is depending on a library which has a String API, but even in that case there may be a benefit of using Text internally and converting to a String just before interfacing with the library.
All of the examples in this topic use the OverloadedStrings language extension (opens new window).