HMA1: LinAlgError: SVD did not converge #240

Closed
csala opened this issue Nov 5, 2020 · 2 comments · Fixed by #324

@csala
Contributor

csala commented Nov 5, 2020

  • SDV version: 0.4.6.devX

Description

In some cases, rebuilding the sampled covariance matrix in the GaussianCopula._rebuild_covariance_matrix method fails with a LinAlgError: SVD did not converge exception.

This should be reviewed to understand what provokes it and find a way to prevent it or work around it.

Full traceback

  ---------------------------------------------------------------------------
  LinAlgError                               Traceback (most recent call last)
  <ipython-input-1-62b1ae9d6493> in <module>
  ----> 1 new_data = model.sample()
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/relational/base.py in sample(self, table_name, num_rows, sample_children, reset_primary_keys)
      184             self._reset_primary_keys_generators()
      185 
  --> 186         return self._sample(table_name, num_rows, sample_children)
      187 
      188     def save(self, path):
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/relational/hma.py in _sample(self, table_name, num_rows, sample_children)
      394         for table in self.metadata.get_tables():
      395             if not self.metadata.get_parents(table):
  --> 396                 sampled = self._sample_table(table, num_rows)
      397                 sampled_data.update(sampled)
      398 
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/relational/hma.py in _sample_table(self, table_name, num_rows, sample_children)
      356             }
      357 
  --> 358             self._sample_children(table_name, sampled_data)
      359             return self._finalize(sampled_data)
      360 
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/relational/hma.py in _sample_children(self, table_name, sampled_data, table_rows)
      268         for child_name in self.metadata.get_children(table_name):
      269             for _, row in table_rows.iterrows():
  --> 270                 self._sample_child_rows(child_name, table_name, row, sampled_data)
      271 
      272     def _sample_child_rows(self, table_name, parent_name, parent_row, sampled_data):
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/relational/hma.py in _sample_child_rows(self, table_name, parent_name, parent_row, sampled_data)
      290                     [previous, table_rows]).reset_index(drop=True)
      291 
  --> 292             self._sample_children(table_name, sampled_data, table_rows)
      293 
      294     @staticmethod
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/relational/hma.py in _sample_children(self, table_name, sampled_data, table_rows)
      268         for child_name in self.metadata.get_children(table_name):
      269             for _, row in table_rows.iterrows():
  --> 270                 self._sample_child_rows(child_name, table_name, row, sampled_data)
      271 
      272     def _sample_child_rows(self, table_name, parent_name, parent_row, sampled_data):
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/relational/hma.py in _sample_child_rows(self, table_name, parent_name, parent_row, sampled_data)
      275         table_meta = self._models[table_name].get_metadata()
      276         model = self._model(table_metadata=table_meta)
  --> 277         model.set_parameters(parameters)
      278 
      279         table_rows = self._sample_rows(model, table_name)
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/tabular/copulas.py in set_parameters(self, parameters)
      400         """
      401         parameters = unflatten_dict(parameters)
  --> 402         parameters = self._rebuild_gaussian_copula(parameters)
      403 
      404         num_rows = parameters.pop('num_rows')
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/tabular/copulas.py in _rebuild_gaussian_copula(self, model_parameters)
      388 
      389         covariance = model_parameters.get('covariance')
  --> 390         model_parameters['covariance'] = self._rebuild_covariance_matrix(covariance)
      391 
      392         return model_parameters
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/tabular/copulas.py in _rebuild_covariance_matrix(self, covariance)
      359 
      360         if not check_matrix_symmetric_positive_definite(covariance):
  --> 361             covariance = make_positive_definite(covariance)
      362 
      363         return covariance.tolist()
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/sdv/tabular/utils.py in make_positive_definite(matrix)
      204     """
      205     symetric_matrix = (matrix + matrix.T) / 2
  --> 206     _, s, V = np.linalg.svd(symetric_matrix)
      207     symmetric_polar = np.dot(V.T, np.dot(np.diag(s), V))
      208     A2 = (symetric_matrix + symmetric_polar) / 2
  
  <__array_function__ internals> in svd(*args, **kwargs)
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/numpy/linalg/linalg.py in svd(a, full_matrices, compute_uv, hermitian)
     1659 
     1660         signature = 'D->DdD' if isComplexType(t) else 'd->ddd'
  -> 1661         u, s, vh = gufunc(a, signature=signature, extobj=extobj)
     1662         u = u.astype(result_t, copy=False)
     1663         s = s.astype(_realType(result_t), copy=False)
  
  ~/work/SDV/SDV/.tox/py36/lib/python3.6/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_svd_nonconvergence(err, flag)
       95 
       96 def _raise_linalgerror_svd_nonconvergence(err, flag):
  ---> 97     raise LinAlgError("SVD did not converge")
       98 
       99 def _raise_linalgerror_lstsq(err, flag):
  
  LinAlgError: SVD did not converge
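
The last SDV frame in the traceback above is `make_positive_definite`, which symmetrizes the matrix and then calls `np.linalg.svd`. The following is a minimal sketch of a more defensive repair step, not SDV's actual implementation and not necessarily the fix that was merged. It assumes that non-finite entries in the sampled covariance are one possible trigger (the issue does not confirm the cause), and it swaps the SVD-based approach for eigenvalue clipping with `np.linalg.eigh`. The function name `nearest_positive_definite` and the `min_eigenvalue` parameter are hypothetical.

```python
import numpy as np


def nearest_positive_definite(matrix, min_eigenvalue=1e-8):
    """Hypothetical helper: repair a covariance matrix by eigenvalue clipping.

    This is an illustrative sketch, not the SDV implementation.
    """
    matrix = np.asarray(matrix, dtype=float)

    # Assumption: NaN or inf entries are a plausible reason for the
    # decomposition to fail, so reject them with a clearer error first.
    if not np.isfinite(matrix).all():
        raise ValueError('covariance contains NaN or infinite values')

    # Same symmetrization step as in the traceback above.
    symmetric = (matrix + matrix.T) / 2

    # eigh is meant for symmetric input, so a general SVD is not needed.
    eigenvalues, eigenvectors = np.linalg.eigh(symmetric)

    # Raise every eigenvalue below the floor up to the floor.
    clipped = np.clip(eigenvalues, min_eigenvalue, None)

    # Rebuild V @ diag(clipped) @ V.T and symmetrize away rounding noise.
    rebuilt = (eigenvectors * clipped) @ eigenvectors.T
    return (rebuilt + rebuilt.T) / 2


# Example: a symmetric matrix with one negative eigenvalue.
fixed = nearest_positive_definite([[1.0, 2.0], [2.0, 1.0]])
np.linalg.cholesky(fixed)  # succeeds only for positive definite input
```

Clipping changes the matrix, so the repaired covariance is only an approximation of the sampled one; whether that is acceptable depends on how far the input was from positive definite.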
@csala csala added the bug Something isn't working label Nov 5, 2020
@csala csala self-assigned this Nov 23, 2020
csala added a commit to csala/SDV that referenced this issue Nov 27, 2020
csala added a commit to csala/SDV that referenced this issue Nov 27, 2020
@csala csala added this to the 0.8.0 milestone Feb 24, 2021
@gskaushik

gskaushik commented Jun 8, 2021

Hi,

I am still facing this issue in SDV 0.10.0.

Error:

Users/gskaushik/opt/anaconda3/lib/python3.8/site-packages/scipy/stats/_continuous_distns.py:5003: RuntimeWarning: divide by zero encountered in true_divide
  return c**2 / (c**2 - n**2)
/Users/gskaushik/opt/anaconda3/lib/python3.8/site-packages/scipy/stats/_distn_infrastructure.py:2494: RuntimeWarning: invalid value encountered in double_scalars
  Lhat = muhat - Shat*mu
/Users/gskaushik/opt/anaconda3/lib/python3.8/site-packages/scipy/stats/_continuous_distns.py:621: RuntimeWarning: invalid value encountered in sqrt
  sk = 2*(b-a)*np.sqrt(a + b + 1) / (a + b + 2) / np.sqrt(a*b)
/Users/gskaushik/opt/anaconda3/lib/python3.8/site-packages/scipy/stats/_continuous_distns.py:4994: RuntimeWarning: divide by zero encountered in power
  return cd2*x**(c-1)
/Users/gskaushik/opt/anaconda3/lib/python3.8/site-packages/scipy/optimize/minpack.py:175: RuntimeWarning: The iteration is not making good progress, as measured by the 
  improvement from the last ten iterations.
  warnings.warn(msg, RuntimeWarning)
/Users/gskaushik/opt/anaconda3/lib/python3.8/site-packages/copulas/univariate/truncated_gaussian.py:43: RuntimeWarning: invalid value encountered in double_scalars
  a = (self.min - loc) / scale
/Users/gskaushik/opt/anaconda3/lib/python3.8/site-packages/copulas/univariate/truncated_gaussian.py:44: RuntimeWarning: divide by zero encountered in double_scalars
  b = (self.max - loc) / scale
/Users/gskaushik/opt/anaconda3/lib/python3.8/site-packages/scipy/optimize/minpack.py:175: RuntimeWarning: The iteration is not making good progress, as measured by the 
  improvement from the last five Jacobian evaluations.
  warnings.warn(msg, RuntimeWarning)
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-39-7169cf46f59d> in <module>
      3 model = GaussianCopula()
      4 model.fit(result)
----> 5 new_data = model.sample(200)
      6 print_current_time()

~/opt/anaconda3/lib/python3.8/site-packages/sdv/tabular/base.py in sample(self, num_rows, max_retries, max_rows_multiplier, conditions, float_rtol, graceful_reject_sampling)
    376         if conditions is None:
    377             num_rows = num_rows or self._num_rows
--> 378             return self._sample_batch(num_rows, max_retries, max_rows_multiplier)
    379 
    380         # convert conditions to dataframe

~/opt/anaconda3/lib/python3.8/site-packages/sdv/tabular/base.py in _sample_batch(self, num_rows, max_retries, max_rows_multiplier, conditions, transformed_conditions, float_rtol)
    275                 Sampled data.
    276         """
--> 277         sampled, num_valid = self._sample_rows(
    278             num_rows, conditions, transformed_conditions, float_rtol)
    279 

~/opt/anaconda3/lib/python3.8/site-packages/sdv/tabular/base.py in _sample_rows(self, num_rows, conditions, transformed_conditions, float_rtol, previous_rows)
    204         if self._metadata.get_dtypes(ids=False):
    205             if conditions is None:
--> 206                 sampled = self._sample(num_rows)
    207             else:
    208                 try:

~/opt/anaconda3/lib/python3.8/site-packages/sdv/tabular/copulas.py in _sample(self, num_rows, conditions)
    298                 Sampled data.
    299         """
--> 300         return self._model.sample(num_rows, conditions=conditions)
    301 
    302     def get_likelihood(self, table_data):

~/opt/anaconda3/lib/python3.8/site-packages/copulas/__init__.py in wrapper(self, *args, **kwargs)
     34     def wrapper(self, *args, **kwargs):
     35         if self.random_seed is None:
---> 36             return function(self, *args, **kwargs)
     37 
     38         else:

~/opt/anaconda3/lib/python3.8/site-packages/copulas/multivariate/gaussian.py in sample(self, num_rows, conditions)
    243         self.check_fit()
    244 
--> 245         samples = self._get_normal_samples(num_rows, conditions)
    246 
    247         output = {}

~/opt/anaconda3/lib/python3.8/site-packages/copulas/multivariate/gaussian.py in _get_normal_samples(self, num_rows, conditions)
    217             means, covariance, columns = self._get_conditional_distribution(normal_conditions)
    218 
--> 219         samples = np.random.multivariate_normal(means, covariance, size=num_rows)
    220         return pd.DataFrame(samples, columns=columns)
    221 

mtrand.pyx in numpy.random.mtrand.RandomState.multivariate_normal()

<__array_function__ internals> in svd(*args, **kwargs)

~/opt/anaconda3/lib/python3.8/site-packages/numpy/linalg/linalg.py in svd(a, full_matrices, compute_uv, hermitian)
   1658 
   1659         signature = 'D->DdD' if isComplexType(t) else 'd->ddd'
-> 1660         u, s, vh = gufunc(a, signature=signature, extobj=extobj)
   1661         u = u.astype(result_t, copy=False)
   1662         s = s.astype(_realType(result_t), copy=False)

~/opt/anaconda3/lib/python3.8/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_svd_nonconvergence(err, flag)
     95 
     96 def _raise_linalgerror_svd_nonconvergence(err, flag):
---> 97     raise LinAlgError("SVD did not converge")
     98 
     99 def _raise_linalgerror_lstsq(err, flag):

LinAlgError: SVD did not converge
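
The runtime warnings at the top of this output include divisions by zero in `(self.max - loc) / scale`, which would be consistent with a column whose fitted scale is zero, for example a constant column. That is only a guess from the warnings and is not confirmed anywhere in this thread. A minimal sketch for checking the input before fitting, using only NumPy, might look like this; the helper name `find_suspect_columns` is hypothetical and not part of SDV or Copulas.

```python
import numpy as np


def find_suspect_columns(values, names):
    """Hypothetical helper: list columns that are constant or non-finite."""
    values = np.asarray(values, dtype=float)
    suspects = []
    for index, name in enumerate(names):
        column = values[:, index]
        finite = column[np.isfinite(column)]

        # NaN or inf anywhere in the column.
        has_non_finite = finite.size != column.size

        # No usable values at all, or every usable value is identical.
        is_constant = finite.size == 0 or finite.min() == finite.max()

        if has_non_finite or is_constant:
            suspects.append(name)
    return suspects


data = np.array([
    [1.0, 5.0, 0.1],
    [2.0, 5.0, np.nan],
    [3.0, 5.0, 0.3],
])
print(find_suspect_columns(data, ['a', 'b', 'c']))  # prints ['b', 'c']
```

For an all-numeric pandas DataFrame such as `result` above, the same check could be run as `find_suspect_columns(result.to_numpy(), list(result.columns))`.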

@csala
Contributor Author

csala commented Jun 8, 2021

@gskaushik Even though the final error line is the same, the traceback shows that this error has a different origin.
Would you mind reporting your error, including the code that you executed and the output you obtained, in a new issue?
