Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update gosigar package #28909

Merged
merged 5 commits into from
Nov 16, 2021
Merged

Update gosigar package #28909

merged 5 commits into from
Nov 16, 2021

Conversation

narph
Copy link
Contributor

@narph narph commented Nov 10, 2021

What does this PR do?

Updates gosigar lib which has fix for filesystem windows error
Update docs with limitation on external disks
Stats are also unavailable in these situations.

Why is it important?

Updates gosigar lib which has fix for filesystem windows error

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Ex:

{
  "@timestamp": "2021-11-10T15:21:32.613Z",
  "@metadata": {
    "beat": "metricbeat",
    "type": "_doc",
    "version": "8.1.0"
  },
  "ecs": {
    "version": "8.0.0"
  },
  "host": {
    ...
  },
  "agent": {
   ...
  },
  "event": {
..
  },
  "metricset": {
    "name": "filesystem",
    "period": 10000
  },
  "service": {
    "type": "system"
  },
  "system": {
    "filesystem": {
      "used": {
        "bytes": 513627078656,
        "pct": 0.5115
      },
      "type": "ntfs",
      "device_name": "C:\\",
      "mount_point": "C:\\",
      "total": 1004205502464,
      "available": 490578423808,
      "free": 490578423808
    }
  }
}


{
  "@timestamp": "2021-11-10T15:21:32.613Z",
  "@metadata": {
    "beat": "metricbeat",
    "type": "_doc",
    "version": "8.1.0"
  },
  "host": {
   ....
  },
  "agent": {
  ...
  },
  "system": {
    "filesystem": {
      "available": 0,
      "free": 0,
      "used": {
        "pct": 0,
        "bytes": 0
      },
      "type": "unavailable",
      "device_name": "D:\\",
      "mount_point": "D:\\",
      "total": 0
    }
  },
  "event": {
    "module": "system",
    "duration": 7789600,
    "dataset": "system.filesystem"
  },
  "metricset": {
    "name": "filesystem",
    "period": 10000
  },
  "service": {
    "type": "system"
  },
  "ecs": {
    "version": "8.0.0"
  }
}

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 10, 2021
@mergify
Copy link
Contributor

mergify bot commented Nov 10, 2021

This pull request does not have a backport label. Could you fix it @narph? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Nov 10, 2021
@narph narph self-assigned this Nov 10, 2021
@mergify
Copy link
Contributor

mergify bot commented Nov 10, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b add-incorrect upstream/add-incorrect
git merge upstream/master
git push upstream add-incorrect

@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 10, 2021

💔 Build Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-11-15T15:39:56.227+0000

  • Duration: 128 min 11 sec

  • Commit: 398b1c6

Test stats 🧪

Test Results
Failed 0
Passed 53696
Skipped 5233
Total 58929

Steps errors 2

Expand to view the steps failures

x-pack/metricbeat-goIntegTest - mage goIntegTest
  • Took 16 min 6 sec . View more details here
  • Description: mage goIntegTest
Recursively delete the current directory from the workspace
  • Took 0 min 2 sec . View more details here
  • Description: Unable to delete '/var/lib/jenkins/workspace/PR-28909-5-3e778349-9aa5-417c-b912-7fb73b993a1b'. Tried

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@narph narph requested a review from jsoriano November 10, 2021 15:29
@narph narph added the Team:Integrations Label for the Integrations team label Nov 10, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 10, 2021
@narph narph added the backport-v7.15.0 Automated backport with mergify label Nov 10, 2021
@mergify mergify bot removed the backport-skip Skip notification from the automated backport with mergify label Nov 10, 2021
@narph narph added backport-v7.16.0 Automated backport with mergify and removed backport-v7.15.0 Automated backport with mergify labels Nov 10, 2021
@@ -112,8 +115,10 @@ func filterFileSystemList(fsList []sigar.FileSystem) []sigar.FileSystem {
// GetFileSystemStat retreves stats for a single filesystem
func GetFileSystemStat(fs sigar.FileSystem) (*FSStat, error) {
stat := sigar.FileSystemUsage{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still wonder if we should log this somewhere. Maybe in the main filesystem.go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my worry is flooding the logs with this warning/error message, knowing folks will collect these metrics in short intervals.
I would avoid it for now and keep it in the documentation. If we get any user feedback about any confusion there, then we might need to reconsider.

@narph narph added the backport-v8.0.0 Automated backport with mergify label Nov 11, 2021
@@ -112,8 +115,10 @@ func filterFileSystemList(fsList []sigar.FileSystem) []sigar.FileSystem {
// GetFileSystemStat retreves stats for a single filesystem
func GetFileSystemStat(fs sigar.FileSystem) (*FSStat, error) {
stat := sigar.FileSystemUsage{}
if err := stat.Get(fs.DirName); err != nil {
return nil, err
if fs.SysTypeName != UnavailableDisk {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a sample event when this happens? Are stats populated with zero values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no stats json object, the example is right in this ticket description

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, I see. There are all populated with zeroes:

    "filesystem": {
      "available": 0,
      "free": 0,
      "used": {
        "pct": 0,
        "bytes": 0
      },
      "type": "unavailable",
      "device_name": "D:\\",
      "mount_point": "D:\\",
      "total": 0
    }

This may be confusing, for example when preparing alerts, you may need to use other fields to check if the available space is 0 because the filesystem is actually full or because the metric was not available. I think that in these cases these metrics shouldn't be reported:

    "filesystem": {
      "type": "unavailable",
      "device_name": "D:\\",
      "mount_point": "D:\\",
    }

The event may still be useful for inventory reasons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somehow I thought the stats would not be there.
I had a look at the function retrieving the stats and it is reused by the fsstat metricset. The fsstat metricset logs the exception as a debug message and continues to the next disk. I tried to do the same for the filesystem for consistency purposes.

So the debug logs will show:

Line 46: {"log.level":"debug","@timestamp":"2021-11-15T16:26:13.465+0100","log.logger":"system.fsstat","log.origin":{"file.name":"fsstat/fsstat.go","file.line":85},"message":"error fetching filesystem stats for 'D:\\': GetDiskFreeSpaceEx failed: The device is not ready.","service.name":"metricbeat","ecs.version":"1.6.0"}
Line 47: {"log.level":"debug","@timestamp":"2021-11-15T16:26:13.465+0100","log.logger":"system.filesystem","log.origin":{"file.name":"filesystem/filesystem.go","file.line":85},"message":"error fetching filesystem stats for 'D:\\': GetDiskFreeSpaceEx failed: The device is not ready.","service.name":"metricbeat","ecs.version":"1.6.0"}

nothing changed for fsstat
but for filesystem events will look like this:

 {
        "_index" : "metricbeat-8.1.0-2021.11.15-000001",
        "_type" : "_doc",
        "_id" : "mis1JH0BLuBcqFZ9zBxN",
        "_score" : null,
        "_source" : {
          "@timestamp" : "2021-11-15T15:27:24.244Z",
          "system" : {
            "filesystem" : {
              "type" : "ntfs",
              "device_name" : """C:\""",
              "mount_point" : """C:\""",
              "total" : 1004205502464,
              "available" : 477790535680,
              "free" : 477790535680,
              "used" : {
                "bytes" : 526414966784,
                "pct" : 0.5242
              }
            }
          },
          "ecs" : {
            "version" : "8.0.0"
          },
          "host" : {
            "name" : "DESKTOP-K76UDQL"
          },
          "agent" : {
            ...
          },
          "metricset" : {
            "name" : "filesystem",
            "period" : 10000
          },
          "event" : {
            "dataset" : "system.filesystem",
            "module" : "system",
            "duration" : 12886200
          },
          "service" : {
            "type" : "system"
          }
        },
        "sort" : [
          1636990044244
        ]
      },
      {
        "_index" : "metricbeat-8.1.0-2021.11.15-000001",
        "_type" : "_doc",
        "_id" : "nCs1JH0BLuBcqFZ9zBxN",
        "_score" : null,
        "_source" : {
          "@timestamp" : "2021-11-15T15:27:24.244Z",
          "agent" : {
           ...
          },
          "event" : {
            "dataset" : "system.filesystem",
            "module" : "system",
            "duration" : 21079100
          },
          "metricset" : {
            "name" : "filesystem",
            "period" : 10000
          },
          "service" : {
            "type" : "system"
          },
          "system" : {
            "filesystem" : {
              "type" : "unavailable",
              "device_name" : """D:\""",
              "mount_point" : """D:\"""
            }
          },
          "ecs" : {
            "version" : "8.0.0"
          },
          "host" : {
            "name" : "DESKTOP-K76UDQL"
          }
        },
        "sort" : [
          1636990044244
        ]
      }

@fearful-symmetry , @jsoriano wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreeing with @jsoriano , we shouldn't report "null zero" metrics. The fields should be omitted where possible, and I think having "bare" events with no/few metrics in this case isn't terrible.

Comment on lines 118 to 122
if fs.SysTypeName != UnavailableDisk {
if err := stat.Get(fs.DirName); err != nil {
return nil, err
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just ignore this error if we want to always report some event even if these stats are not available.

Suggested change
if fs.SysTypeName != UnavailableDisk {
if err := stat.Get(fs.DirName); err != nil {
return nil, err
}
}
if fs.SysTypeName != UnavailableDisk {
if err := stat.Get(fs.DirName); err != nil {
log.Debugf("....: %v", err)
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, behavior would more consistent

@narph
Copy link
Contributor Author

narph commented Nov 15, 2021

/package

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please wait for @fearful-symmetry opinion.

"free": fsStat.Free,
"used": common.MapStr{
}
if addStats == true {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Redundant boolean comparison.

Suggested change
if addStats == true {
if addStats {

@narph
Copy link
Contributor Author

narph commented Nov 15, 2021

/package

Copy link
Contributor

@fearful-symmetry fearful-symmetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think that bool check so we don't report a bunch of null metrics is a reasonable solution.

@narph
Copy link
Contributor Author

narph commented Nov 16, 2021

/package

@narph narph merged commit 08642d0 into elastic:master Nov 16, 2021
@narph narph deleted the add-incorrect branch November 16, 2021 15:57
mergify bot pushed a commit that referenced this pull request Nov 16, 2021
* update gosigar

* text changes

* check for unavailable stats

* new changes

(cherry picked from commit 08642d0)

# Conflicts:
#	metricbeat/module/system/fields.go
mergify bot pushed a commit that referenced this pull request Nov 16, 2021
* update gosigar

* text changes

* check for unavailable stats

* new changes

(cherry picked from commit 08642d0)
narph added a commit that referenced this pull request Nov 17, 2021
* update gosigar

* text changes

* check for unavailable stats

* new changes

(cherry picked from commit 08642d0)

Co-authored-by: Mariana Dima <mariana@elastic.co>
narph added a commit that referenced this pull request Nov 17, 2021
* update gosigar

* text changes

* check for unavailable stats

* new changes
narph added a commit that referenced this pull request Nov 17, 2021
* Update gosigar package (#28909)

* update gosigar

* text changes

* check for unavailable stats

* new changes

(cherry picked from commit 08642d0)

# Conflicts:
#	metricbeat/module/system/fields.go

* Update gosigar package (#28909)

* update gosigar

* text changes

* check for unavailable stats

* new changes

Co-authored-by: Mariana Dima <mariana@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.16.0 Automated backport with mergify backport-v8.0.0 Automated backport with mergify Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants