Add MLE estimation for Markov chains #658

Merged
merged 14 commits into master from markov_est on Dec 14, 2022

Conversation

@jstac (Contributor) commented Nov 27, 2022

Adds a straightforward implementation of MLE estimation for Markov transition matrices.

Estimation is from a single time series. States that are never visited in the data are removed from the state space before the MarkovChain object is constructed (since otherwise the matrix is not stochastic).

This PR solves one part of the problem in #640, which is more naturally separated out.
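For context, the estimator just counts observed transitions and normalizes each row of the count matrix, with never-visited states dropped. A minimal sketch of that idea (the name mle_transition_matrix is illustrative only, not the PR's actual API):

import numpy as np

def mle_transition_matrix(index_series):
    # Relabel observed states consecutively; np.unique drops any state
    # that never appears in the data.
    X = np.asarray(index_series)
    state_values, indices = np.unique(X, return_inverse=True)

    # Count observed transitions i -> j.
    n = len(state_values)
    counts = np.zeros((n, n))
    for t in range(len(indices) - 1):
        counts[indices[t], indices[t + 1]] += 1

    # MLE: P[i, j] = counts[i, j] / (number of departures from i).
    P = counts / counts.sum(axis=1, keepdims=True)
    return P, state_values

# Example: mle_transition_matrix([0, 0, 2, 0, 2, 2, 0]) gives
# P = [[1/3, 2/3], [2/3, 1/3]] over the visited states (0, 2).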

@jstac (Contributor, Author) commented Nov 27, 2022

All help getting this merged is appreciated! CC @mmcky @Smit-create

@oyamad (Member) left a comment

The same functionality in QuantEcon.jl is called estimate_mc_discrete. Use the same name?

@oyamad (Member) commented Nov 27, 2022

Another approach is to accept a series of state values (rather than indices) and let np.unique derive the corresponding series of indices as well as the values of the states that are visited.

import numpy as np

X = ['a', 'b', 'b', 'd', 'a']
state_values, indices = np.unique(X, return_inverse=True)

# state_values
# array(['a', 'b', 'd'], dtype='<U1')

# indices
# array([0, 1, 1, 2, 0])

Series of vector state values:

X = [[0, 0], [1, 2], [1, 2], [3, 3]]
state_values, indices = np.unique(X, return_inverse=True, axis=0)

# state_values
# array([[0, 0],
#        [1, 2],
#        [3, 3]])

# indices
# array([0, 1, 1, 2])

So the code would look like:

def estimate_P(X):
    X = np.asarray(X)
    axis = 0 if X.ndim > 1 else None
    state_values, indices = np.unique(X, return_inverse=True, axis=axis)
    
    n = len(state_values)
    P = np.zeros((n, n))  # dtype=float to modify in place
    _count_transition_frequencies(indices, P)
    P /= P.sum(1)[:, np.newaxis]
    
    mc = MarkovChain(P, state_values=state_values)
    return mc
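For completeness, _count_transition_frequencies is left undefined in the sketch above; a plain-Python stand-in (ignoring any jit compilation a real helper would likely use) could be:

def _count_transition_frequencies(index_series, trans_counter):
    # Tally each observed transition index_series[t] -> index_series[t+1]
    # into the pre-allocated count matrix, in place.
    for t in range(len(index_series) - 1):
        trans_counter[index_series[t], index_series[t + 1]] += 1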

@oyamad (Member) commented Nov 27, 2022

Just to recall (I had completely forgotten it), there is a thread of interesting discussions here in QuantEcon/QuantEcon.jl#161 from five years ago.

@coveralls commented Nov 27, 2022

Coverage Status

Coverage increased (+0.06%) to 95.313% when pulling 074d98e on markov_est into fcde8c9 on master.

@jstac (Contributor, Author) commented Nov 27, 2022

Thanks @oyamad for the thoughts. I've changed the function name as you suggested.

The discussion you mention was interesting. We have the same issue in #640 --- the need to estimate a discrete MC from continuous data.

I suggest that we merge this as is and then open issues for adding functionality, including the option you mention of passing in state values rather than indices.

@mmcky requested review from mmcky and oyamad on November 28, 2022 01:46
@mmcky (Contributor) left a comment

thanks @jstac. I will merge this in 24 hours pending any further public review.

from .core import MarkovChain

def estimate_mc_discrete(index_series, state_values=None):
    """
Comment from a Member:

Suggested change
"""
r"""

Comment on lines 61 to 67
new_n = len(state_values)
P = np.empty((new_n, new_n))

# Normalize
row_sum = np.sum(trans_counter, axis=1)
for i in range(new_n):
    P[i, :] = trans_counter[i, :] / row_sum[i]
Comment from a Member:

The for loop should be avoided when possible:

Suggested change
new_n = len(state_values)
P = np.empty((new_n, new_n))
# Normalize
row_sum = np.sum(trans_counter, axis=1)
for i in range(new_n):
    P[i, :] = trans_counter[i, :] / row_sum[i]
P = trans_counter / trans_counter.sum(1)[:, np.newaxis]

@oyamad (Member) left a comment

  • Register estimate_mc_discrete in markov/__init__.py (a one-line sketch follows below).
  • Run python qe_apidoc.py in the docs directory.
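For instance, the first item might amount to a one-line re-export (the module name estimate below is an assumption; the PR may place the function elsewhere):

# in quantecon/markov/__init__.py (module path is an assumption)
from .estimate import estimate_mc_discrete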

@oyamad (Member) commented Nov 28, 2022

We might ignore this case, but if index_series is e.g. [0, 0, 0, 1] (the last number appears only once), then this function (or the algorithm itself) does not work properly. (The same problem applies to the QuantEcon.jl version as well.)
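To see the failure concretely: state 1 occurs only as the final observation, so its row of transition counts is all zeros and the row normalization divides by zero. A minimal sketch of the problem (standalone counting code, not the PR's implementation):

import numpy as np

X = np.array([0, 0, 0, 1])        # state 1 appears only at the very end
n = X.max() + 1
counts = np.zeros((n, n))
for t in range(len(X) - 1):
    counts[X[t], X[t + 1]] += 1
# counts = [[2., 1.],
#           [0., 0.]]             <- no departures from state 1

with np.errstate(divide='ignore', invalid='ignore'):
    P = counts / counts.sum(1)[:, np.newaxis]
# P[1] = [nan, nan], so P is not a stochastic matrix.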

@jstac (Contributor, Author) commented Nov 29, 2022

Thanks @oyamad, much appreciated.

@mmcky, would you mind making those suggested changes and merging (or delegating and then merging)?

Let's ignore the edge case that @oyamad pointed out for now (although it's a good point).

@Smit-create (Member) commented

@jstac @mmcky I have rebased this branch on top of the latest master branch (to fix a doc error that existed in this branch but is fixed in master). I have also made the suggested changes pointed out by @oyamad. Thanks!

@Smit-create requested a review from oyamad on December 1, 2022 11:40
@jstac (Contributor, Author) commented Dec 1, 2022

Many thanks @Smit-create !

All looks good to me. It will be great to get this merged so we can finish up #640.

@jstac (Contributor, Author) commented Dec 3, 2022

@oyamad or @mmcky please merge when ready

@oyamad (Member) commented Dec 3, 2022

  1. This is a proposal for refactoring the current implementation (avoiding unnecessary array creations, fixing a bug in _fill_counters, and editing the docstrings to shorten the line lengths).

  2. This is a proposal for extending the function to accept a path of state values, as described in #658 (comment).

    >>> X = ([0, 0], [1, 0], [1, 1], [0, 1], [0, 0])
    >>> mc = qe.markov.estimate_mc_discrete(X)
    >>> mc.P
    array([[0., 0., 1., 0.],
           [1., 0., 0., 0.],
           [0., 0., 0., 1.],
           [0., 1., 0., 0.]])
    >>> mc.state_values
    array([[0, 0],
           [0, 1],
           [1, 0],
           [1, 1]])

    If you happen to have a path of state indices:

    >>> X_indices = (0, 2, 3, 1, 0)
    >>> state_values = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    >>> mc = qe.markov.estimate_mc_discrete(X_indices)
    >>> mc.state_values
    array([0, 1, 2, 3])
    >>> mc.state_values = state_values[mc.state_values]
    >>> mc.state_values
    array([[0, 0],
           [0, 1],
           [1, 0],
           [1, 1]])

But the second implementation is about 1.5x to 2x slower (seemingly due to sorting inside np.unique):

import numpy as np
import quantecon as qe

rng = np.random.default_rng(281334440973545911265024757027747152088)
n = 1_000
T = 1_000_000
X = rng.integers(n, size=T)

%timeit qe.markov.estimate_mc_discrete(X)

Version 1:

55.4 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Version 2:

85.3 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@jstac (Contributor, Author) commented Dec 3, 2022

@oyamad Many thanks for these proposals. Number 1 is a clear improvement and should be adopted. Regarding 2, there is a speed trade-off but I am still in favor because, at least for macro models, these discretizations tend to happen once at the start, rather than repeatedly.

One small point on proposal 2: In most instances, when people generate a vector time series, they will generate it as a matrix where columns are observations, as in https://quanteconpy.readthedocs.io/en/latest/tools/lss.html#quantecon.lss.simulate_linear_model. Should we handle that case?

@oyamad (Member) commented Dec 4, 2022

@jstac I did not realize that the output of simulate_linear_model stores observations in columns. In that particular case, one can pass x.T.

I would like to keep the simple assumption "X is any array_like such that X[t] contains the t-th observation". (I added a note in 8296b62.)
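For instance (a sketch assuming the Version 2 behavior described above, and reusing the series from that example):

import numpy as np
import quantecon as qe

# Columns as observations, the layout simulate_linear_model produces
# (discrete values here purely for illustration).
x = np.array([[0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0]])   # shape (k, T) = (2, 5)

# Transposing restores the assumed convention X[t] = t-th observation;
# x.T is [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]], the same series as in
# the Version 2 example, so mc.P and mc.state_values match it.
mc = qe.markov.estimate_mc_discrete(x.T)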

@jstac (Contributor, Author) commented Dec 4, 2022

@oyamad

I would like to keep the simple assumption "X is any array_like such that X[t] contains the t-th observation".

I think you are right. I'm in favor of all these proposed changes.

@jstac (Contributor, Author) commented Dec 4, 2022

I propose that we change estimate_mc_discrete to estimate_mc because, at least in economics, a Markov chain always has a finite state space. (I realize the same is not true in all of mathematics.)

If not, then we should change it to estimate_finite_mc, which is accurate, sounds more natural and hence is easier to remember.

@oyamad (Member) commented Dec 5, 2022

That makes sense.
Maybe we want to be consistent with the function called fit_discrete_mc in #640 (comment)?

  • estimate_mc / fit_mc
  • estimate_finite_mc / fit_finite_mc

(I have no strong preference over these. The shorter the better?)

@jstac (Contributor, Author) commented Dec 5, 2022

In my mind, "fit" sounds like we are fitting continuous data to a discrete model. Hence it's appropriate for #640 but not here.

So, following your principle that shorter is better, let's go for estimate_mc.

@oyamad (Member) commented Dec 5, 2022

Changed the function name to estimate_mc in 8547cba.

("fit_discrete_mc" is fine with me, but just to argue for logical consistency: I am with you for the convention that a Markov chain is intrinsically finite, but if we follow that convention, isn't "discrete_mc" redundant?)

@jstac (Contributor, Author) commented Dec 5, 2022

"fit_discrete_mc" is fine with me, but just to argue for logical consistency: I am with you for the convention that a Markov chain is intrinsically finite, but if we follow that convention, isn't "discrete_mc" redundant?

Yes, good point. You are right @oyamad .

Perhaps redundancy is sometimes useful to provide emphasis.

(Information theory says that redundancy is the key to communication in a noisy channel...)

@oyamad (Member) commented Dec 14, 2022

If we go with Version 2 in #658 (comment), we can merge markov_est_unique branch into this PR and then merge this PR itself into master.

@jstac (Contributor, Author) commented Dec 14, 2022

If we go with Version 2 in #658 (comment), we can merge markov_est_unique branch into this PR and then merge this PR itself into master.

Agreed, and thanks. Please go ahead @oyamad or @Smit-create

@oyamad (Member) left a comment

Ready to merge.

@Smit-create (Member) commented

Merging, thanks @oyamad, @jstac and @mmcky
