
Haskell - Input and Output - ByteString


We have used lists extensively on many occasions, but they are not always the best tool for processing files. Because Haskell is lazy, a list is essentially just a promise of a list: a promise of the first element plus a promise of the rest, and so on. So you can think of lists as promises that the next element will be delivered once it really has to be and, along with it, the promise of the element after it. It doesn't take a big mental leap to conclude that processing a simple list of numbers as a series of promises might not be the most efficient thing in the world.
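
To make the "promise" idea concrete, here is a minimal sketch. The names `firstSquares` and `safePrefix` are made up for illustration and do not come from the original post. Both definitions only work because list elements are produced on demand.

```haskell
-- Only the first n squares of an infinite list are ever computed.
firstSquares :: Int -> [Integer]
firstSquares n = take n (map (^ 2) [1 ..])

-- The tail after the second element is a promise that is never redeemed,
-- so the call to error is never evaluated.
safePrefix :: [Int]
safePrefix = take 2 (1 : 2 : error "this tail is never forced")
```

Loading this into GHCi and evaluating `firstSquares 5` or `safePrefix` returns a finite list in both cases, even though the first list is infinite and the second has a tail that would crash if it were forced.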

 

That overhead doesn't bother us so much most of the time, but it turns out to be a liability when reading big files and manipulating them. That's why Haskell has bytestrings. Bytestrings are sort of like lists, only each element is one byte (or 8 bits) in size. The way they handle laziness is also different.

 

Bytestrings come in two flavors: strict and lazy. Strict bytestrings reside in Data.ByteString, and they do away with the laziness completely. Lazy bytestrings reside in Data.ByteString.Lazy. They're lazy, but not quite as lazy as lists.

 

With strict bytestrings, there are no promises involved; a strict bytestring represents a series of bytes in an array. You can't have things like infinite strict bytestrings. If you evaluate the first byte of a strict bytestring, you have to evaluate it whole. The upside is that there's less overhead because there are no thunks (the technical term for a promise) involved. The downside is that they're likely to fill your memory up faster because they're read into memory all at once.
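
As a small sketch of the strict variant, the following uses only `pack`, `length` and `head` from Data.ByteString. The names `strictBytes`, `byteCount` and `firstByte` are illustrative and not from the original post.

```haskell
import qualified Data.ByteString as S
import Data.Word (Word8)

-- A strict bytestring holding the three bytes of "can" in one array.
strictBytes :: S.ByteString
strictBytes = S.pack [99, 97, 110]

-- The whole array already exists in memory, so its length is known.
byteCount :: Int
byteCount = S.length strictBytes

-- S.head is partial: it fails on an empty bytestring.
firstByte :: Word8
firstByte = S.head strictBytes
```

Because the value is a single fully evaluated array, there is no way to build an infinite `S.ByteString`; anything unbounded has to go through the lazy module instead.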

 

A list has as many thunks as it has elements, which is what makes lists kind of slow for some purposes. Lazy bytestrings take a different approach: they are stored in chunks (not to be confused with thunks!), and each chunk has a size of 64K (the figure given here; the exact default depends on the version of the bytestring package). So if you evaluate a byte in a lazy bytestring (by printing it or something), the first 64K will be evaluated. After that, it's just a promise for the rest of the chunks. Lazy bytestrings are kind of like lists of strict bytestrings with a size of 64K. When you process a file with lazy bytestrings, it will be read chunk by chunk. This is cool because it won't cause the memory usage to skyrocket, and the 64K probably fits neatly into your CPU's L2 cache.
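
One way to see that the remaining chunks really are just a promise is to build an endless lazy bytestring and take a prefix of it. This is a sketch under the assumption that `B.cycle` and `B.take` from Data.ByteString.Lazy are available in your version of the library; `endless` and `firstBytes` are illustrative names.

```haskell
import qualified Data.ByteString.Lazy as B
import Data.Word (Word8)

-- An infinite lazy bytestring: the chunk [1,2,3] repeated forever.
endless :: B.ByteString
endless = B.cycle (B.pack [1, 2, 3])

-- Only the chunks needed to supply the first n bytes are evaluated.
firstBytes :: Int -> [Word8]
firstBytes n = B.unpack (B.take (fromIntegral n) endless)
```

Evaluating `firstBytes 5` terminates, whereas an equivalent value could not be built as a strict bytestring at all.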

 

In this post, we are going to examine the following:

 

  • ByteStrings
  • the pack and unpack functions (e.g. pack :: [Word8] -> ByteString takes a list of bytes of type Word8 and returns a bytestring; unpack does the reverse)
  • fromChunks and toChunks (fromChunks takes a list of strict bytestrings and converts it to a lazy bytestring; toChunks takes a lazy bytestring and converts it to a list of strict ones)
  • cons (the lazy version) and cons' (the strict version)
  • the bytestring modules provide many functions analogous to those in Data.List
  • finally, an example of copying a file with the bytestring modules

 

-- file 
--  bytestring_io.hs
-- description:
--  a bytestring is sort of like a list, only each element is one byte (or 8 bits) in size


import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString as S


-- pack has the signature pack :: [Word8] -> ByteString
--  it takes a list of bytes (Word8 values, 0..255) and packs them into a bytestring
-- ghci> B.pack [99,97,110]  
-- Chunk "can" Empty  
-- ghci> B.pack [98..120]  
-- Chunk "bcdefghijklmnopqrstuvwx" Empty  


-- unpack is the inverse of pack: it takes a bytestring and turns it into a list of bytes


-- fromChunks
--  takes a list of strict bytestrings and converts it to a lazy bytestring.
-- ghci> B.fromChunks [S.pack [40,41,42], S.pack [43,44,45], S.pack [46,47,48]]
-- Chunk "()*" (Chunk "+,-" (Chunk "./0" Empty))

-- toChunks
--  takes a lazy bytestring and converts it to a list of strict ones.


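-- the bytestring version of ':' is called cons
--  it is lazy, so it makes a new chunk even if the first chunk isn't full
-- cons' is the strict version of cons
--  prefer it when inserting a lot of bytes at the beginning of a bytestring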

-- ghci> B.cons 85 $ B.pack [80,81,82,84]  
-- Chunk "U" (Chunk "PQRT" Empty)  
-- ghci> B.cons' 85 $ B.pack [80,81,82,84]  
-- Chunk "UPQRT" Empty  
-- ghci> foldr B.cons B.empty [50..60]  
-- Chunk "2" (Chunk "3" (Chunk "4" (Chunk "5" (Chunk "6" (Chunk "7" (Chunk "8" (Chunk "9" (Chunk ":" (Chunk ";" (Chunk "<"
-- Empty))))))))))
-- ghci> foldr B.cons' B.empty [50..60]
-- Chunk "23456789:;<" Empty



-- As you can see, B.empty makes an empty bytestring;
--  it is the starting value for the two folds over [50..60] above.
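
The GHCi session above can also be written as a small compilable module. This is only a sketch: the helper names (`roundTrip`, `threeChunks`, `chunkSizes`, `consLazy`, `consStrict`) are invented here, and how many chunks `cons'` ends up producing is an implementation detail of the bytestring library rather than something the types guarantee.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as B
import Data.Word (Word8)

-- pack followed by unpack gives back the original list of bytes.
roundTrip :: [Word8] -> [Word8]
roundTrip = B.unpack . B.pack

-- A lazy bytestring assembled from three strict chunks.
threeChunks :: B.ByteString
threeChunks = B.fromChunks [S.pack [40, 41, 42], S.pack [43, 44, 45], S.pack [46, 47, 48]]

-- The size of each strict chunk inside a lazy bytestring.
chunkSizes :: B.ByteString -> [Int]
chunkSizes = map S.length . B.toChunks

-- Same bytes, different chunk layout: cons adds a new chunk per byte,
-- while cons' tries to grow the first chunk instead.
consLazy, consStrict :: B.ByteString
consLazy   = foldr B.cons  B.empty [50 .. 60]
consStrict = foldr B.cons' B.empty [50 .. 60]
```

In GHCi, comparing `chunkSizes consLazy` with `chunkSizes consStrict` shows the difference in layout, while `consLazy == consStrict` still holds because equality only looks at the bytes.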

 

Next comes the file-copying example.

 

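-- file
--  copyfile_io.hs
-- description:
--  implementing our own version of copyFile
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as B

-- Expects two command-line arguments: the source path and the destination path.
main :: IO ()
main = do
    (fileName1:fileName2:_) <- getArgs
    copyFile fileName1 fileName2

-- Read the source lazily and write it to the destination, chunk by chunk.
copyFile :: FilePath -> FilePath -> IO ()
copyFile source dest = do
    contents <- B.readFile source
    B.writeFile dest contents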
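
Since the lazy read hands over the file chunk by chunk, the same pattern extends to transforming the data on its way through. The sketch below is not part of the original program: `copyFileWith` and `upperAscii` are hypothetical helpers, and it assumes the source and destination are different files, because the source is still being read while the destination is written.

```haskell
import qualified Data.ByteString.Lazy as B
import Data.Word (Word8)

-- Hypothetical helper: copy a file, passing its contents through a
-- transformation on the way.
copyFileWith :: (B.ByteString -> B.ByteString) -> FilePath -> FilePath -> IO ()
copyFileWith transform source dest = do
    contents <- B.readFile source          -- read lazily
    B.writeFile dest (transform contents)  -- write the transformed bytes

-- Example transformation: upper-case ASCII letters, leave other bytes alone.
upperAscii :: Word8 -> Word8
upperAscii w
    | w >= 97 && w <= 122 = w - 32
    | otherwise           = w
```

For instance, `copyFileWith (B.map upperAscii) "in.txt" "out.txt"` would write an upper-cased copy, and `copyFileWith id` behaves like the plain copyFile above. The file names here are placeholders.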

 

 

 

 
