Background initialization option for JPA EntityManagerFactory / Hibernate SessionFactory [SPR-13732] #18305

Closed
spring-projects-issues opened this issue Nov 29, 2015 · 11 comments
Assignees
Labels
in: data Issues in data modules (jdbc, orm, oxm, tx) type: enhancement A general enhancement
Milestone

Comments

@spring-projects-issues
Collaborator

spring-projects-issues commented Nov 29, 2015

Juergen Hoeller opened SPR-13732 and commented

Traditionally, the bottleneck in the startup performance of Spring applications is the initialization of expensive resources such as a JPA EntityManagerFactory or a Hibernate SessionFactory.

As an alternative to generalized parallel initialization of beans, we might be able to achieve some significant gain through specific background initialization options in LocalContainerEntityManagerFactoryBean / LocalSessionFactoryBean / LocalSessionFactoryBuilder, internally passing the actual build call on to a background thread after configuration validation and in particular dependency resolution happened. The exposed EntityManagerFactory / SessionFactory proxy (which we have for other reasons already) could then simply delegate method invocations to a Future of the target resource until it is known to be resolved, which is likely not to come before the very end of the startup phase.
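
A minimal sketch of the "proxy delegating to a Future" idea described above, using only JDK classes. It is not the actual Spring implementation: `ExpensiveResource`, `lazyProxy` and the class name are hypothetical names chosen for illustration, and the 200 ms sleep merely simulates a slow bootstrap.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundBootstrapSketch {

    /** Hypothetical stand-in for an expensive resource such as an EntityManagerFactory. */
    public interface ExpensiveResource {
        String query(String input);
    }

    /** Returns a proxy that blocks on the Future only when a method is actually invoked. */
    public static <T> T lazyProxy(Class<T> type, Future<? extends T> future) {
        InvocationHandler handler = (proxy, method, args) -> {
            T target;
            try {
                target = future.get(); // blocks until the background build has finished
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for initialization", ex);
            } catch (ExecutionException ex) {
                throw new IllegalStateException("Background initialization failed", ex.getCause());
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException ex) {
                throw ex.getCause(); // rethrow the target's own exception
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Hand the expensive build call to a background thread.
            Future<ExpensiveResource> future = executor.submit(() -> {
                Thread.sleep(200); // simulated slow bootstrap
                ExpensiveResource real = input -> "result for " + input;
                return real;
            });
            // The proxy can be handed out immediately; merely storing it does not block.
            ExpensiveResource resource = lazyProxy(ExpensiveResource.class, future);
            // ... other startup work would proceed here ...
            System.out.println(resource.query("orders")); // first real call waits for the build
        } finally {
            executor.shutdown();
        }
    }
}
```

The point of the sketch is that only the first actual method call pays the waiting cost; beans that just hold the reference are unaffected.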


Issue Links:

Referenced from: commits db1171d, 09cea6e

1 votes, 9 watchers

Juergen Hoeller commented

Phil Webb, Stéphane Nicoll, Greg Turnquist, Oliver Drotbohm, this occurred to me after reading that recent Twitter conversation about concurrent bean bootstrapping in Spring.next. Generalized concurrent bootstrapping of beans is an absolutely non-trivial problem which seems to bring more pain than gain; addressing our known bottlenecks through specific background initialization options might be a fine compromise. This doesn't seem too hard, actually, so I'm scheduling it for 4.3 already, where we have a focus on further performance optimizations anyway.

Configuration-wise, that would mean a backgroundInitialization(true/false) property on those FactoryBeans / builder classes, for explicit opt-in for each such bean. We won't do that by default anywhere in Spring Framework itself, not least because we are not allowed to use custom background threads for startup in a Java EE environment, but also to keep existing application startup in exactly the same order that it may rely on now. However, for self-contained deployment with Boot 1.4, it could even be considered by default... or at least driven by a general Boot configuration property that's going to be applied to those beans when set up by Boot auto-configuration.

Juergen

Juergen Hoeller commented

I've got a working arrangement based on a setBootstrapExecutor(AsyncTaskExecutor) method on LocalContainerEntityManagerFactoryBean and LocalSessionFactoryBean, as well as a buildSessionFactory(AsyncTaskExecutor) method on LocalSessionFactoryBuilder.

This preserves the EntityManagerFactoryInfo interception that we had in our existing proxy, so calls to that metadata interface do not block waiting for initialization to complete but rather delegate to the FactoryBean's local configuration state. JpaTransactionManager may therefore still perform its early introspection of the passed-in EntityManagerFactory without interfering with the background bootstrap attempt. Analogously, SessionFactoryUtils.getDataSource now checks the SessionFactory's properties first before trying to obtain the runtime connection provider, with our SessionFactory bootstrap proxy exposing those properties from its own local configuration state. This allows HibernateTransactionManager to autodetect the DataSource without interfering with the background bootstrap attempt as well.
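
To illustrate the interception described above, here is a JDK-only sketch, not Spring's actual proxy code. `ResourceInfo` plays the role of a metadata interface such as EntityManagerFactoryInfo and `Resource` the role of the real factory; both interfaces, `bootstrapProxy` and the unit name are hypothetical. Metadata calls are answered from local state, everything else waits for the Future.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

public class MetadataInterceptionSketch {

    /** Hypothetical metadata interface, answered from local configuration state. */
    public interface ResourceInfo {
        String getPersistenceUnitName();
    }

    /** Hypothetical runtime interface, backed by the resource built in the background. */
    public interface Resource {
        String createSession();
    }

    public static Object bootstrapProxy(ResourceInfo localInfo, Future<? extends Resource> future) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == ResourceInfo.class) {
                // Metadata call: no need to wait for the background bootstrap.
                return method.invoke(localInfo, args);
            }
            // Any other call (including Object methods) blocks until the target is available.
            Resource target = future.get();
            return method.invoke(target, args);
        };
        return Proxy.newProxyInstance(
                MetadataInterceptionSketch.class.getClassLoader(),
                new Class<?>[] {Resource.class, ResourceInfo.class}, handler);
    }

    public static void main(String[] args) {
        CompletableFuture<Resource> pending = new CompletableFuture<>();
        ResourceInfo info = () -> "demoUnit";
        Object proxy = bootstrapProxy(info, pending);

        // Answered immediately, although the bootstrap has not completed yet.
        System.out.println(((ResourceInfo) proxy).getPersistenceUnitName());

        // Simulate the background build finishing, then make a runtime call.
        Resource real = () -> "session";
        pending.complete(real);
        System.out.println(((Resource) proxy).createSession());
    }
}
```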

Juergen

Oliver Drotbohm commented

That sounds great. I'd love to do some A/B testing on startup times for some of the JPA samples we have flying around as soon as I can get my hands on a snapshot build.

Juergen Hoeller commented

Initial cut committed. Triggered by "bootstrapExecutor" property on the FactoryBean.
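
For readers wondering how to set this property, a configuration along the following lines should work with Java config. This is a sketch under assumptions rather than an official recipe: the class name, the `com.example.domain` package, the thread name prefix, the choice of HibernateJpaVendorAdapter and the presence of a DataSource bean defined elsewhere are all placeholders. Only setBootstrapExecutor(AsyncTaskExecutor) comes from this issue.

```java
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.AsyncTaskExecutor;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter;

@Configuration
public class JpaBackgroundBootstrapConfig {

    // Executor that runs the JPA provider bootstrap on a background thread.
    @Bean
    public AsyncTaskExecutor bootstrapExecutor() {
        return new SimpleAsyncTaskExecutor("jpa-bootstrap-");
    }

    @Bean
    public LocalContainerEntityManagerFactoryBean entityManagerFactory(
            DataSource dataSource, AsyncTaskExecutor bootstrapExecutor) {
        LocalContainerEntityManagerFactoryBean emf = new LocalContainerEntityManagerFactoryBean();
        emf.setDataSource(dataSource);                 // assumed to be defined elsewhere
        emf.setPackagesToScan("com.example.domain");   // placeholder package
        emf.setJpaVendorAdapter(new HibernateJpaVendorAdapter());
        emf.setBootstrapExecutor(bootstrapExecutor);   // opts in to background initialization
        return emf;
    }
}
```

Without the setBootstrapExecutor call, the factory bean bootstraps synchronously as before.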

Oliver Drotbohm commented

I've played with this a little, and while it generally works, I didn't see very significant improvements in startup times. I guess the following reasons play into this:

  1. My sample applications are too small. By small I mean that they lack a significant number of (even transitively) non-JPA-dependent beans that could be instantiated in parallel and that also take a significant amount of time to create. In that sense, the term bottleneck used in the description above is a bit misleading: the JPA setup might take a significant amount of time in the overall bootstrap process, but it doesn't really slow down the creation of other beans, only of the first one that needs a dependency on the JPA subsystem.
  2. The applications I have usually declare a lot of beans that in some way depend on something indirectly related to JPA persistence: repositories, services, controllers. As soon as the initialization of one of those beans is triggered, the blocking call to the EMF is triggered, too. I guess what plays into this is that the JPA subsystem initialization is triggered by the repositories, which try to obtain an EntityManager almost immediately to verify declared queries, obtain the JPA metamodel etc. Hence, I guess we're running into the eventually blocking call quite early.

So unless you have a lot of beans that are expensive to create and don't rely on JPA in one way or another, or you use lazy initialization extensively in the first place, the performance benefits of the change as it stands are probably not that significant.

Is there a way we can identify LCEMFBs and make sure they get instantiated as early as possible? Or can we find other means of a more coarse-grained lazification of the beans that actually trigger the use of an EntityManager? Spring Data repositories already support a lazy initialization mode, but that basically just prevents the repository beans from being created eagerly. The lazy mode delays creation until a repository bean is actually needed for injection and thus prevents the creation of repositories that are not injected at all.

I was thinking about a lazy initialization proxy for repositories that could be added to the factory creating the repository proxies, so that the bean creation still gets triggered (and thus the downstream EMF initialization) but basically acts like a transparent @Lazy on the injection point, so that downstream components can still be initialized while the EMF and repository setup proceeds. That would basically defer the eventual blocking until the very first interaction with the repository, which is less likely to happen during initialization than the repository's interaction with the EMF, which is pretty immediate by design.

I guess we'd also have to make sure the initialization process is completed before the container publishes the ContextRefreshedEvent, as the repository initialization looks up named queries, which (in case of a nonexistent one) might throw an exception, which by JPA definition has to roll back the current transaction (which is stupid but unfortunately defined like this). You almost by definition wouldn't want that to happen on the very first request made to your application.

Juergen Hoeller commented

As for 1., that's expected: Initializing the JPA provider in the background will only pay off if it did not eat most of a particular application's startup time to begin with, and/or if actual JPA access happens very late. This new option here is by no means a simple opt-in. Parallel bootstrapping unfortunately needs to be reflected in the application's architecture.

As for 2., in a typical Spring JPA application, many beans indirectly depend on the JPA persistence layer, of course. However, that shouldn't be a problem for the initialization order, since the bean factory algorithm will very quickly end up at the LCEMFB when resolving any such bean's dependency graph anyway, without much other work having happened yet. A key difference is whether a receiving bean just stores the EMF proxy reference or immediately uses it...

From my perspective, identifying the LCEMFB bean through some special mechanism won't provide much benefit. I'd rather recommend our existing means of ordering: defining the LCEMFB bean (or its containing configuration class) early, or ordering the corresponding configuration class through @Order. Even then, I don't think that'll buy you much: The LCEMFB will be reached within milliseconds in almost any scenario, and it doesn't really matter much if its multi-second bootstrap process kicks off those few ms later.

It is much more important that the dependent beans call the EntityManagerFactory with blocking calls at the latest possible point, that is, in the ContextRefreshedEvent phase or on actual persistence operations only. In regular Spring JPA applications (without Spring Data JPA), that seems to be pretty much the case. With Spring Data JPA, there's the early EntityManager access for query validation purposes; that should definitely happen as late as possible here within Spring Data JPA's initialization architecture.

Finally, with respect to enforcing JPA initialization completion at ContextRefreshedEvent time: Any bean can simply perform a non-trivial EntityManagerFactory operation in that phase and enforce initialization that way. However, I'd rather not do this by default since this may cut off quite a bit of startup-time benefit for deployed applications. Doesn't Spring Data's JPA query validation kick in during the init phase of each repository bean anyway, outside of any transaction that an application request may initiate? I suppose with generic lazy bean initialization, that init phase may actually run within a wider application-initiated transaction in such a scenario (if not marked as REQUIRES_NEW)? In that case, Spring Data JPA could manually enforce initialization of all its lazy repositories at ContextRefreshedEvent time...

Mirek commented

Nice. Our app has 4000 beans and 2000 entities, so this feature is important for us: startup time is about 70 seconds.
I've created a simple Spring Boot app (https://github.com/MirekSz/spring-boot-slow-startup) which demonstrates the startup problem.

Spring 4.3 was released yesterday. How do I enable this feature? Is there some property?

Stéphane Nicoll commented

If you have a Spring Boot app, see this issue. If you are configuring the entity manager yourself, there is a comment above that explains how to configure LocalContainerEntityManagerFactoryBean and friends.

Andrei Ivanov commented

Btw, I think it would be nice to mention this in the reference documentation :)
I've searched it for bootstrapExecutor and I didn't find anything.

Daniel Pinyol Laserna commented

I agree with Andrei. I'm not sure how this can be enabled.
If it can be configured through XML configuration, how can I do it?

thanks

Oliver Drotbohm commented

I've just merged the fix for DATAJPA-1397 that allows configuring @EnableJpaRepositories(bootstrapMode = BootstrapMode.DEFERRED). In combination with configuring a TaskExecutor on LocalContainerEntityManagerFactoryBean, this allows deferring repository initialization until the ApplicationContext has started up completely, making maximum use of the background initialization.
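
A minimal sketch of the repository side of that setup, assuming a Spring Data JPA version that includes DATAJPA-1397. The class name and the `com.example.repositories` package are placeholders, and the EntityManagerFactory bean is assumed to be configured separately with a bootstrap executor.

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
import org.springframework.data.repository.config.BootstrapMode;

// Repository initialization is deferred, so injecting a repository does not
// immediately trigger the blocking call to the EntityManagerFactory.
@Configuration
@EnableJpaRepositories(
        basePackages = "com.example.repositories", // placeholder package
        bootstrapMode = BootstrapMode.DEFERRED)
public class DeferredRepositoriesConfig {
}
```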

I've also created an example to show the usage of the different modes. Spring Boot support to ease enabling this is on its way, too.
