Saturday, October 22, 2016

RAII Programming in Haskell

I'm using Haskell in production for processing data files, and Data.Text.Lazy is a nice data type, because you can have both simple code managing text, and efficient run-time text processing, because text is loaded chunk by chunk from data streams. It is like a BufferedReader in Java.
But in GHC 8.0.1 some file reading functions do not behave correctly.
import qualified Data.Text.Lazy.IO as LazyText
import qualified Data.Text.Lazy as LazyText

getFileContent1 :: FilePath -> IO String
getFileContent1 fileName = do
  fileContent <- LazyText.readFile fileName
  return $ LazyText.unpack fileContent

-- NOTE: print the file content, reading it chunk by chunk by `fileName`
-- and writing it on `stdout` chunk by chunk.
-- So this simple code, has a nice run-time behaviour.
printFileContent1 fileName = do
  c <- getFileContent1 fileName
  putStrLn c

-- NOTE: this seems a correct function, 
-- but when executed it returns always an empty file content
getFileContent2 :: FilePath -> IO String
getFileContent2 fileName = do
  LazyText.withFile fileName ReadMode $ \handle -> do
    fileContent <- hGetContents handle
    return $ LazyText.unpack fileContent
  
-- NOTE: this code print nothing, due to error on `getFileContent2`
printFileContent2 fileName = do
  c <- getFileContent2 fileName
  putStrLn c
Data.Text.Lazy.IO.readFile is implemented in this way:
readFile :: FilePath -> IO Text
readFile name = openFile name ReadMode >>= hGetContents
Data.Text.Lazy.IO.hGetContents is a function returning the content of the handle chunk by chunk, and closing the handle when all the content is read.
System.IO.withFile is implemented in this way:
 withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
 withFile name mode = bracket (openFile name mode) hClose
so the getFileContent2 code can be expanded to
 getFileContent3 fileName = do
    bracket (openFile fileName ReadMode) hClose $ \handle -> do
      fileContent <- hGetContents handle
      return $ LazyText.unpack fileContent
bracket is one of a series of resource managements functions and monads used for acquiring resources, and releasing them at the end of an action, in a predictable way, and not when the garbage collector arbitrarily decide it. bracket makes management of scarce resources like file handles, database connections, and so on more robust and predictable.
This code will run correctly
 printFileContent3 fileName = do
    bracket (openFile fileName ReadMode) hClose $ \handle -> do
      fileContent <- hGetContents handle
      putStrLn $ LazyText.unpack fileContent
because it will:
  • open the file
  • read it chunk by chunk, using hGetContents
  • print it chunk by chunk, using putStrLn
  • close the handle, thanks to bracket resource finalization action
The code printFileContent2 is not running correctly because:
  • bracket open the file
  • a lazy evaluation thunk LazyText.unpack <$> hGetContents handle is returned from the getFileContent2 function
  • bracket close the file handle before the thunk is evaluated
So from this point:

  • printFileContent2.putStrLn executed the thunk 
  • hGetContents thunk tries to access a closed handle 
  • hGetContents returns an empty content, instead of signaling with a run-time exception that the handle is closed

Then printFileContent2 assumes wrongly that the file is an empty file, without any compile time and run time error.
A test case for the bug is on https://github.com/massimo-zaniboni/ghc_lazy_file_content_error , and the bug was signaled to Ghc team.

RAII Programming in Haskell

Resource Acquisition is Initialization (RAII) is a tecnique for having predictable resource usages.
In Haskell, bracket should be RIIA compliant. This implies that bracket must always return the result in a strict way. In this way when the bracket action is called:
  • the resources are allocated,
  • the action is executed with maximum priority, and predictability,
  • the resources are deallocated,
  • the result is returned to the caller, completely evaluated, and no further processing involving the resources is required,
So the priority is always given to the called bracket action, and not to the caller action. The caller can be arbitrary complex, and long-running, so the called action had to complete its work with greater priority respect the caller, in order to release its allocated resources in a predictable way. In Haskel terms this signifies that when the caller decides it needs the result of the called code, it evaluates the called thunk completely, without intermediate unevaluated thunks. So called thunk evaluation is an atomic action, and it is not intervaled with the partial execution of the caller code.
This mechanism must be used also in case of nested bracket actions: the called actions must be executed in a strict way.

5 Whys

Why we have the hGetContents error? Because hGetContents is buggy. Why? Because bracket used inside withFile does not behave correctly with unevaluated thunks. Why? Because unevaluated thunks do not play nice with RIIA semantic. Because bracket should force a strict evaluation of the returned action, so the used resources are used completely and in a predictable way.
If bracket forces a strict evaluation of its result, then there will be no bug. This code run correctly

getFileContent4 :: FilePath -> IO String
getFileContent4 fileName = do
  fileContent <- LazyText.readFile fileName
  return $! LazyText.unpack fileContent
  -- NOTE: execute in a strict way, thanks to `$!`

-- NOTE: print the file content, reading it chunk by chunk by `fileName`
-- and writing it on `stdout` chunk by chunk.
-- So this simple code, has a nice run-time behaviour.
printFileContent4 fileName = do
  c <- getFileContent4 fileName
  putStrLn c

No comments:

Post a Comment