Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

foldlM/foldrM doesn't work with Maybe and other pure monads #357

Closed
matil019 opened this issue Apr 22, 2018 · 14 comments
Closed

foldlM/foldrM doesn't work with Maybe and other pure monads #357

matil019 opened this issue Apr 22, 2018 · 14 comments
Assignees

Comments

@matil019
Copy link
Member

matil019 commented Apr 22, 2018

The following code results in StackOverflowError:

import frege.data.Foldable (foldlM)

main = do
   let m :: Maybe Int
       m = foldlM (\r _ -> pure $! succ r) 0 [1..5000]
   println m
Exception in thread "main" java.lang.StackOverflowError
        at Main$$Lambda$17/930990596.<init>(Unknown Source)
        at Main$$Lambda$17/930990596.get$Lambda(Unknown Source)
        at Main.lambda$null$1(Main.java:85)
        at frege.prelude.PreludeBase.lambda$$$excl$18(PreludeBase.java:16226)
        at frege.run8.Thunk.call(Thunk.java:231)
        at Main.lambda$null$2(Main.java:95)
        at frege.run8.Thunk.call(Thunk.java:231)
        at frege.prelude.Maybe$IMonad_Maybe.ƒ$gt$gt$eq(Maybe.java:496)
        at frege.data.Foldable$1Let$10255.f$tick$8372(Foldable.java:1583)
        at frege.data.Foldable.lambda$foldlM$47(Foldable.java:1593)
        at frege.run8.Thunk.call(Thunk.java:231)
        at frege.prelude.Maybe$IMonad_Maybe.ƒ$gt$gt$eq(Maybe.java:497)
        at frege.data.Foldable$1Let$10255.f$tick$8372(Foldable.java:1583)
        at frege.data.Foldable.lambda$foldlM$47(Foldable.java:1593)
        at frege.run8.Thunk.call(Thunk.java:231)
        at frege.prelude.Maybe$IMonad_Maybe.ƒ$gt$gt$eq(Maybe.java:497)
        ...

foldrM has a similar problem. Either, [] cause StackOverflowError, too.

Meanwhile, IO doesn't suffer from this:

import frege.data.Foldable (foldlM)

main = do
    let m :: IO Int
        m = foldlM (\r a -> pure $! succ r) 0 [1..5000]
    println =<< m
5000

Workaround

UPDATE Having an undefined case changes the return type of Java method to Lazy, so the following seem to work for both finite and infinite lists (only for Maybe):

foldlM' :: (b -> a -> Maybe b) -> b -> [a] -> Maybe b
foldlM' f z (x:xs) = f z x >>= \acc -> foldlM' f acc xs
foldlM' _ z []     = pure z
foldlM' _ _ _      = undefined

main = do
    println $ foldlM' (\r _ -> Just $! succ r) 0 [1..100000]
    println $ foldlM' (\_ _ -> Nothing) 0 [1..]
Just 100000
Nothing
@Ingo60
Copy link
Member

Ingo60 commented Apr 22, 2018

Interestingly, the following does work:

frege> mInt = foldM (\a b -> pure $ (const id) a b) 0 [1..50000]
value mInt :: (Applicative a,Bind a) => a Int

frege> (mInt :: Maybe Int)
Just 50000

Could be worth a look what's the difference between foldM and foldlM

foldr and any derivate of it are, unfortunately, notorious for stackoverflows. We would need full grown continuations to avoid filling up the stack, or a JVM with expandable stacks. The basic problem is seen here:

length [] = 0
length (_:xs) = 1 + length xs

@matil019
Copy link
Member Author

And it seems it's not foldlM's fault; a hand-made folding function doesn't work as well;

module MyFold where

myFoldM :: (b -> a -> Maybe b) -> b -> [a] -> Maybe b
myFoldM f z (x:xs) = f z x >>= \acc -> myFoldM f acc xs
myFoldM _ z []     = pure z

main = print $ myFoldM (\r _ -> Just $! succ r) 0 [1..5000]
Exception in thread "main" java.lang.StackOverflowError
...

Tracing myFoldM doesn't look like leaking the stack space:

  myFoldM (\r _ -> Just $! succ r) 0 [1..5000]
= Just $! succ 0 >>= \acc -> myFoldM (\r _ -> Just $! succ r) acc [2..5000]
= Just 1 >>= \acc -> myFoldM (\r _ -> Just $! succ r) acc [2..5000]
= myFoldM (\r _ -> Just $! succ r) 1 [2..5000]
= Just $! succ 1 >>= \acc -> myFoldM (\r _ -> Just $! succ r) acc [3..5000]
= Just 2 >>= \acc -> myFoldM (\r _ -> Just $! succ r) acc [3..5000]
= myFoldM (\r _ -> Just $! succ r) 2 [3..5000]
...

Looking at the generated Java code, it looks like myFoldM tries to force its return value before returning it:

final public static <a, b> PreludeBase.TMaybe<b> myFoldM(
  final Func.U<b, Func.U<a, PreludeBase.TMaybe<b>>> arg$1, final Lazy<b> arg$2, final PreludeBase.TList<a> arg$3
) {
  final PreludeBase.TList.DCons<a> $7643 = arg$3.asCons();
  if ($7643 != null) {
    final PreludeBase.TList<a> µ$$7591 = $7643.mem2.call();
    return Maybe.IMonad_Maybe.<b, b>$gt$gt$eq(
              arg$1.apply(arg$2).call().apply($7643.mem1).call(),
              (Func.U<b, PreludeBase.TMaybe<b>>)((final Lazy<b> arg$7645) -> {
                    final b acc$7585 = arg$7645.call();
                    return Thunk.<PreludeBase.TMaybe<b>>shared(
                              (Lazy<PreludeBase.TMaybe<b>>)(() -> MyFold.<a, b>myFoldM(
                                        arg$1, Thunk.<b>lazy(acc$7585), µ$$7591
                                      ))
                            );
                  })
            ).call();
  }
  final PreludeBase.TList.DList<a> $7647 = arg$3.asList();
  assert $7647 != null;
  return Maybe.IApplicative_Maybe.<b>pure(arg$2);
}

Note that the return type of myFoldM is not Lazy<PreludeBase.TMaybe<b>> but PreludeBase.TMaybe<b>, and it .call()s Maybe.IMonad_Maybe.$gt$gt$eq at its return expression.

If we assume that myFoldM forces its result before returning it, indeed we can show that it uses up the stack space:

myFoldM f z (x:xs) = id $! f z x >>= \acc -> myFoldM f acc xs
myFoldM _ z []     = id $! pure z

  myFoldM (\r _ -> Just $! succ r) 0 [1..5000]
= id $! (Just $! succ 0) >>= \acc -> myFoldM (\r _ -> Just $! succ r) acc [2..5000]
= id $! Just 1 >>= \acc -> myFoldM (\r _ -> Just $! succ r) acc [2..5000]
= id $! myFoldM (\r _ -> Just $! succ r) 1 [2..5000]
= id $! id $! (Just $! succ 1) >>= \acc -> myFoldM (\r _ -> Just $! succ r) acc [3..5000]
= id $! id $! Just 2 >>= \acc -> myFoldM (\r _ -> Just $! succ r) acc [3..5000]
= id $! id $! myFoldM (\r _ -> Just $! succ r) 2 [3..5000]
= id $! id $! id $! ...

So if we can make the generated java method returns Lazy, myFoldM may not overflow the stack.
I couldn't find how to do that, except this example:

myFoldM f z (x:xs) = f z x >>= \acc -> myFoldM f acc xs
myFoldM _ z []     = undefined

main = print $ myFoldM (\r _ -> Just $! succ r) 0 [1..5000]
final public static <a, b> Lazy<PreludeBase.TMaybe<b>> myFoldM(
  final Func.U<b, Func.U<a, PreludeBase.TMaybe<b>>> arg$1, final Lazy<b> arg$2, final PreludeBase.TList<a> arg$3
) {
  ...
Exception in thread "main" frege.runtime.Undefined: undefined
...

It does reach the end condition and triggers undefined, instead of StackOverflow.

@matil019
Copy link
Member Author

@Ingo60 foldM is defined in term of Prelude.fold, so it doesn't stackoverflow but doesn't work with infinite list (that's why I tried to use foldlM/foldrM for the first place).

@matil019
Copy link
Member Author

And, in Haskell,

foldlM (\r _ -> Just $! succ r) 0 [1..]

runs (infinitely) with a constant memory usage. So I'd like to consider this as a compatibility issue.

(OTOH, foldrM keeps eating memory.)

@Ingo60
Copy link
Member

Ingo60 commented Apr 22, 2018

There is also the problem that currently, lambdas will have to return Lazy<X>, hence the $! is not seen until the expression is actually evaluated. But then, you already have thousands of thunks nested.

@Ingo60
Copy link
Member

Ingo60 commented Apr 22, 2018

There must be two reasons for stack overflow, actually:

frege> foldlM (\r _ -> Just (succ r)) 0 [1..10000]
java.lang.StackOverflowError

We can go a bit farther with:

frege> foldlM (\r _ -> case succ r of !x -> Just x) 0 [1..10000]
Just 10000

But still:

frege> foldlM (\r _ -> case succ r of !x -> Just x) 0 [1..20000]
java.lang.StackOverflowError

@matil019
Copy link
Member Author

@Ingo60 can't reproduce (stackoverflows with [1..10000] for me), maybe it's just different stack size allocated for JVM.

@Ingo60
Copy link
Member

Ingo60 commented Apr 22, 2018

Yeah, right, I've given 2m for the repl:

ingo@freguntu:~/Frege/frege$ alias repl
alias repl='java8 -Xss2m -jar ~/Frege/frege/dist/frege3.24-latest.jar -repl'

@matil019
Copy link
Member Author

It still crashes at 5000 iterations or so with -Xss2m on my environment, but I don't think we have to care much for the digits.

foldlM (\r _ -> case succ r of x | traceLn (show x) -> undefined; !x -> Just x) 0 [1..10000]
foldlM (\r _ -> Just $! succ (if traceLn (show r) then undefined else r)) 0 [1..10000]

Both prints digits before crashing, so $! does force its argument, or doesn't it?

Also the following crashes too.

foldlM (\r _ -> Just 0) 0 [1..10000]

@matil019
Copy link
Member Author

I suspect that the cause is that Maybe.>>= (or its direct caller) tries to force both of its arguments and its result value. I modified by hand the generated Java code not to call .call() (see my previous comment) and it ran fine.

@Ingo60
Copy link
Member

Ingo60 commented Apr 22, 2018

You're on the right track, as always, @matil019

Maybe.>>= is ok, though. But the compiler's estimation whether myFoldl should return Lazy is wrong. For some reason it decides not to. That's why it forces the result of Maybe.>>=. Your manual modification should be done by the compiler. The reasoning being that when the tail call is a lazy returning function (i.e. Maybe.>>=) then it should be safe to pass the result upwards, instead forcing it right away.

@Ingo60
Copy link
Member

Ingo60 commented May 2, 2018

This thing keeps disturbing me.

It would be not enough to enforce the policy that if the tail call returns Lazy, so does the caller. This would only help in the case of myFoldl because there the compiler can see that Maybe.>>= returns lazy. (It must do so because there is a policy that the tail call to an unknown function needs to be lazy.)

But in the case of foldlM the compiler sees only the class method >>=. The class method is not really a function, but just a type signature. Hence, I had to decide at some point whether class methods should return Lazy or not. I decided against it since this would result in heavy boxing/unboxing in polymorphic code that deals with primitive types: arithmetic, comparisions, and so forth.

Maybe I should revise this decision, as most of this code is, in fact, not polymorphic anyway, and thus the performance degradation will not be so big.

Another possibility is to think about ways to say: this function (or class method) must return lazy. Maybe some sort of pragma. We will need such a mechanism anyway for different reasons: Currently there are 3 hacks that do something that would be appropriate for pragmas:

  1. the ugly construct in the module header that tells which functions are inlineable
  2. if the documentation comment of a function starts with "warning:" , the compiler will emit the warning when that function is used.
  3. if the documentation comment starts with "nowarn:" there will be no warnings for this corresponding symbol.

Items 2 and 3 are undocumented and exclude each other. So this would be a reason to introduce pragma syntax (like in Haskell), and then we could have

{-# lazyreturn #-}
(>>=) :: m a -> (a -> m b) -> m b

@matil019
Copy link
Member Author

matil019 commented May 3, 2018

Have you measured the difference of run-time between strict vs lazy returns? Performance-intensitive code should not be using polymorphic constructs anyway (unless indirections are resolved in compile time like templates in C++), so if the difference is not too big, it may be acceptable. Also, currently some of generated Java code contains something like Thunk.shared(() => expr.call()). If it is replaced by just expr, the performance may actually improve.

As for syntax, here is what I had tried on instinct (and failed):

foo :: a -> a
?foo ?a = a

inline bar :: a -> b -> a
inline bar a _ = a

Since we annotate each function for public/protected/private instead of module headers, I think it's reasonable to put inline on each function as well.

@Ingo60
Copy link
Member

Ingo60 commented May 3, 2018

Do you have an example ready where the compiler generates something of the form

Thunk.shared(() => expr.call())

This doesn't look right, indeed.

@Ingo60 Ingo60 closed this as completed in 1926a5e May 10, 2018
catull pushed a commit to catull/frege that referenced this issue Jun 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants