Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix for heartbeat memory leak #236

Merged
merged 2 commits into from
Aug 1, 2019
Merged

Bugfix for heartbeat memory leak #236

merged 2 commits into from
Aug 1, 2019

Conversation

tsightler
Copy link
Contributor

This small patch effectively bypasses the resolver code for heartbeat packets and appears to correct the significant memory leak seen, especially when there are a lot of devices.

First contribution to this project so apologize if it's wrong.

Resolver code was causing memory leak during ping-pong
@coveralls
Copy link

Pull Request Test Coverage Report for Build 445

  • 0 of 1 (0.0%) changed or added relevant line in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 91.647%

Changes Missing Coverage Covered Lines Changed/Added Lines %
index.js 0 1 0.0%
Files with Coverage Reduction New Missed Lines %
index.js 1 88.63%
Totals Coverage Status
Change from base Build 424: 0.2%
Covered Lines: 285
Relevant Lines: 312

💛 - Coveralls

@kueblc
Copy link
Collaborator

kueblc commented Jul 31, 2019

Thanks again for tracking this down @tsightler. I see now the precise source of the leak,

tuyapi/index.js

Line 396 in c72cda1

delete this._resolvers[packet.sequenceN];

We're never calling our resolver before it is deleted, leaving a hanging Promise which would correspond to your report of four byte increments. I think the solution is to fix that oversight, by calling the resolver after checking that it exists and before deleting it, as in L428

@tsightler
Copy link
Contributor Author

Makes sense, I probably should have seen that. I'll make that change in my local tree and let it run and report back.

@tsightler
Copy link
Contributor Author

tsightler commented Jul 31, 2019

Took a quick stab at this, but honestly, the resolver code seems totally broken to me. It immediately failed with errors that _resolver wasn't a function. I added the following console.log outputs to the _packetHandler() to see what was going on:

 if (packet.commandByte === CommandType.HEART_BEAT) {
      debug(`Pong from ${this.device.ip}`);

      // Remove resolver
      console.log(packet.sequenceN);
      console.log(this._resolvers[packet.sequenceN]);
      console.log(this._resolvers);
      delete this._resolvers[packet.sequenceN];
      return;
    }

I then fired up the script and started my init script, which adds a total of 4 Tuya devices, at 2 second intervals. This was what was logged on the console:

0
undefined
{ '2': [Function], '4': [Function], '6': [Function] }
0
undefined
{ '2': [Function], '4': [Function], '6': [Function] }
0
undefined
{ '2': [Function],
'4': [Function],
'6': [Function],
'7': [Function] }
0
undefined
{ '2': [Function], '4': [Function] }
0
undefined
{ '2': [Function],
'4': [Function],
'6': [Function],
'7': [Function] }
0
undefined
{ '2': [Function], '4': [Function] }
0
undefined
{ '2': [Function],
'4': [Function],
'6': [Function],
'7': [Function],
'8': [Function] }
0
undefined
{ '2': [Function], '4': [Function], '5': [Function] }
^C[root@util02 tuya-mqtt]# node ./tuya-mqtt.js
0
undefined
{ '2': [Function], '4': [Function], '6': [Function] }
0
undefined
{ '2': [Function], '4': [Function], '6': [Function] }
0
undefined
{ '2': [Function], '4': [Function], '6': [Function] }
0
undefined
{ '2': [Function],
'4': [Function],
'6': [Function],
'7': [Function] }
0
undefined
{ '2': [Function], '4': [Function], '6': [Function] }
0
undefined
{ '2': [Function],
'4': [Function],
'6': [Function],
'7': [Function] }
0
undefined
{ '2': [Function],
'4': [Function],
'6': [Function],
'7': [Function] }
0
undefined
{ '2': [Function],
'4': [Function],
'6': [Function],
'7': [Function],
'8': [Function] }
0
undefined
{ '2': [Function],
'4': [Function],
'6': [Function],
'7': [Function] }
0
undefined
{ '2': [Function],
'4': [Function],
'6': [Function],
'7': [Function],
'8': [Function] }

And the number of entries in resolvers just keeps increasing. Sometimes I see one and then it is deleted, but mostly they just keep incrementing. This seems completely broken and adding a delete doesn't seem likely to work.

@tsightler
Copy link
Contributor Author

tsightler commented Jul 31, 2019

Just a little more digging, it seems obvious to me now that _sendPing increments and adds a sequence number to the packet and _send adds a resolver with the current sequence number. However, in the received pong, the sequence number is always zero, and I can't see any way to associate that response with an existing sequence number to know which resolver to resolve and delete.

To me, bypassing the resolver for heartbeat, as this patch does, is simplest solution that doesn't seem to have any chance to break anything. It doesn't fix the resolver code, which still appears to be totally broken (issuing commands constantly increases the number of entries in _resolver), so there will still be some leak, but commands are generally issued far less frequently that every 10 seconds per device and the leak is so small it would likely take months for most users to see this.

I see two options for a complete fix:

  • Remove the resolver code completely. It doesn't seem to serve it's intended purpose and I see no obvious method to fix it
  • I considered that I could add some type of timeout to a resolver, where it resolves and deletes itself after a period of time. Seems hackish and I don't really see the point.

In the interim, I've worked around this issue in my own code by disconnecting and reconnecting all devices every hour. Crude but effective.

@kueblc
Copy link
Collaborator

kueblc commented Jul 31, 2019

OK @tsightler, good points. In light of this I agree, we should do the quick fix, and then we'll work on removing or rewriting resolvers.

@codetheweb
Copy link
Owner

Just to clear a few points up:

  • The resolver code originally was a bit of a hack job to be honest, I didn't really think it all the way through - intending for it to be replaced in v6.
  • The point of storing resolvers is so a call to device.set() can return the new state, instead of just a confirmation that the state was probably changed. The resolvers were needed because devices respond to a set command with two packets: first a confirmation, then the updated state. The second packet is what we return.

The above is a feature, not a necessity. I'll merge this in for now (CI is failing but it looks like an issue with Travis, not the code), though if we continue to have similar problems I'm perfectly happy to completely strip out the resolver code and revert device.set() to just returning the confirmation.

Moving forward, this should be completely fixed in v6 if we're storing the device state internally as @kueblc has suggested.

@codetheweb codetheweb merged commit a942de3 into codetheweb:master Aug 1, 2019
@codetheweb
Copy link
Owner

Also, please open PRs against the development branch in the future @tsightler. I think there's a note about that in CONTRIBUTING.md.

@codetheweb
Copy link
Owner

Published in v5.1.2.

@Apollon77
Copy link
Collaborator

Great job guys! I updated my adapter too and will receive user feedback soon

@codetheweb codetheweb mentioned this pull request Aug 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants