This repository has been archived by the owner on Apr 29, 2021. It is now read-only.

Can random method be lazily evaluated instead of eager evaluation #5

Closed
krishna-meduri opened this issue Mar 1, 2018 · 7 comments
Labels
enhancement New feature or request scalamatsuri
Milestone

Comments

@krishna-meduri

I have been using the previous version of random-data-generator, which is based on shapeless, for quite some time, and it is awesome. Recently, I tried using it to generate a billion records. Since the random method evaluates eagerly, this causes an OOM error. Could you please guide me here? Generating billions of records will be an ongoing task for me.
Thank you.
Note: Strictly speaking, this is not an issue, just a call for help :)

@DanielaSfregola DanielaSfregola added the question Further information is requested label Mar 11, 2018
@DanielaSfregola
Owner

Apologies for the late reply! For some reason, I completely missed this issue :)

Unfortunately, I do not think eager evaluation is possible... Any chance you could show me some sample code of your use case?

Cheers,
D.

@krishna-meduri
Author

As far as I can tell, eager evaluation happens when we call random with a custom Arbitrary value for a billion records.
For example, I have created an Arbitrary instance for a structure, say Arbitrary[Event], where Event has about 50 columns, and I am using a for..yield comprehension to create Arbitrary[Event].
// for the sake of space, I have removed the majority of the code
implicit val arbitraryPerson: Arbitrary[Event] = Arbitrary {
  for {
    .....................
    claimed_by_party_cd <- Gen.const("NA")
    sc_auth_code <- Gen.const("123456")
    decline_Flag <- Gen.oneOf("Y", "N", "NA")
    sc_card_acceptor_cd <- charGenerator(15)
    merchant_charge_amount <- Gen.const("0.0")
    merchant_charge_curr_cd <- Gen.const("NA")
    fee_collection_amount <- Gen.const("0.0")
    fee_collection_curr_cd <- Gen.const("NA")
    writeoff_amount <- Gen.const("0.0")
    writeoff_curr_cd <- Gen.const("NA")
    sc_function_code <- charGenerator(3)
    ..............
    eventDetails = new EventDetails(event_id, ......., source_table)
  } yield Event(eventDetails)
}
And I am calling "randomEvent.foreach{// writing generated records to a file}". I am also restricting every file to 1 million records. When executing this code, I always ended up with an OOM error.
So I created another random method, shown below, which solved the issue:
def random[T](seed: Seed)(implicit arb: Arbitrary[T]): Stream[T] = {
  val gen = Gen.infiniteStream(arb.arbitrary)
  val optSeqT = gen.apply(Gen.Parameters.default, seed)
  optSeqT.get
}

Please let me know whether I am heading in the right direction.
Finally, my sincere apologies for the very long comment with improper formatting :)

@krishna-meduri
Author

The code got messed up by the formatting. The code for creating 1 billion records is "random[Event](1000*1000000).foreach{// writing generated records to a file}"
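
The "one file per million records" pattern described above can be sketched with a plain Iterator, which hands out each record once and does not keep consumed records around. This is only a sketch under stated assumptions: writeInChunks, the part-N.txt file naming, and the render function are hypothetical and are not part of random-data-generator.

```scala
import java.io.PrintWriter
import java.nio.file.{Files, Path}

object ChunkedWriter {
  /** Drains `records` into files of at most `perFile` lines each.
    * Returns the number of files written. All names here are hypothetical.
    */
  def writeInChunks[T](records: Iterator[T], dir: Path, perFile: Int)(render: T => String): Int = {
    require(perFile > 0, "perFile must be positive")
    var fileCount = 0
    while (records.hasNext) {
      // Start a new file for every chunk of up to `perFile` records.
      val out = new PrintWriter(Files.newBufferedWriter(dir.resolve(s"part-$fileCount.txt")))
      try {
        var written = 0
        while (written < perFile && records.hasNext) {
          out.println(render(records.next()))
          written += 1
        }
      } finally out.close()
      fileCount += 1
    }
    fileCount
  }
}
```

With the Stream-returning workaround above, the records could be passed in as something like random[Event](seed).iterator.take(1000 * 1000000), assuming the stream itself is not kept in a val.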

@DanielaSfregola DanielaSfregola added the enhancement New feature or request label Mar 16, 2018
@DanielaSfregola
Owner

Hi @krishna-meduri,
I think you did indeed find the issue! The problem is that the code uses a List, which is eagerly evaluated.

We have two options:
A) break the API and return a stream instead of a list
OR
B) add a new method that returns a stream rather than a list (called something like lazyRandom[T]...?)

I would probably go with option B. What do you think?

Cheers,
D.
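
To illustrate the List-versus-lazy distinction, here is a minimal standard-library sketch (no ScalaCheck involved). It counts how many elements get generated when a collection is built. EagerVsLazy and its methods are hypothetical names used only for this illustration.

```scala
object EagerVsLazy {
  /** Builds a List of n elements and reports how many were generated
    * before anything consumed them.
    */
  def eagerCount(n: Int): Int = {
    var generated = 0
    val records = List.fill(n) { generated += 1; generated } // every element is built right here
    generated
  }

  /** Builds a lazy Iterator of n elements, consumes only `consumed` of them,
    * and reports how many were generated.
    */
  def lazyCount(n: Int, consumed: Int): Int = {
    var generated = 0
    val records = Iterator.continually { generated += 1; generated }.take(n) // nothing generated yet
    var i = 0
    while (i < consumed && records.hasNext) { records.next(); i += 1 }
    generated
  }
}
```

The List version pays for all n elements up front, which is what hurts at a billion records; the lazy version only pays for what is consumed.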

@DanielaSfregola DanielaSfregola added scalamatsuri and removed question Further information is requested labels Mar 16, 2018
@krishna-meduri
Author

Hi @DanielaSfregola
As far as I know, option B seems to be a natural fit and would help use cases like mine.
As a workaround, I have overridden the random method in my code as follows:
def random[T](seed: Seed)(implicit arb: Arbitrary[T]): Stream[T] = {
  val gen = Gen.infiniteStream(arb.arbitrary)
  val optSeqT = gen.apply(Gen.Parameters.default, seed)
  optSeqT.get
}
Since the above code uses the Stream data structure, which is lazy by nature, it does not perform an eager evaluation. Could you please check and let me know whether it works as expected?

Regards,
Krishna

@DanielaSfregola
Owner

Hi @krishna-meduri,
your solution does work. All we need to do now is define a lazyRandom[T](n) function returning a Stream[T], and redefine random[T](n) in terms of lazyRandom[T](n).

I'll try to find some time to work on it in the next few weeks; in the meantime, PRs are always welcome ;)
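
A minimal sketch of that plan, assuming the same ScalaCheck calls as the workaround above (Gen.infiniteStream and Gen.apply with Gen.Parameters and a Seed) on a Scala 2.12-era setup where Stream is not deprecated. The signatures, the default seed, and the error handling are assumptions made for illustration; this is not the library's actual implementation.

```scala
import org.scalacheck.{Arbitrary, Gen}
import org.scalacheck.rng.Seed

object LazyRandomSketch {
  /** Hypothetical lazy variant: elements are produced only as the stream is consumed. */
  def lazyRandom[T](seed: Seed = Seed.random())(implicit arb: Arbitrary[T]): Stream[T] =
    Gen.infiniteStream(arb.arbitrary)
      .apply(Gen.Parameters.default, seed)
      .getOrElse(sys.error("could not generate a random stream"))

  /** Eager variant expressed in terms of the lazy one: take n elements and force them into a List. */
  def random[T](n: Int, seed: Seed = Seed.random())(implicit arb: Arbitrary[T]): Seq[T] =
    lazyRandom[T](seed).take(n).toList
}
```

One caveat: a Stream memoizes the elements it has already produced, so keeping a reference to the head of the stream while traversing it keeps all of them reachable. Consuming the result directly, or through .iterator, avoids holding on to it.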

@krishna-meduri
Author

Hi @DanielaSfregola
Sure, I will be more than happy to work on this. I will try to get to it next week :)
Regards,
Krishna

DanielaSfregola added a commit that referenced this issue Apr 16, 2018
@DanielaSfregola DanielaSfregola added this to the 2.5 milestone Apr 16, 2018
DanielaSfregola added a commit that referenced this issue Apr 16, 2018