Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] Data loss before increasing fingerprint to config size in specified state #20745

Closed
maokitty opened this issue Apr 7, 2023 · 5 comments · Fixed by #20749
Closed
Assignees
Labels

Comments

@maokitty
Copy link

maokitty commented Apr 7, 2023

Component(s)

pkg/stanza, receiver/filelog

What happened?

Description

If two files have the same fingerprint and the fingerprint size is smaller than config, even if they have different bytes in the future, data of one file will be lost.

Steps to Reproduce

func TestFingerprintIncreaseUntilLargerThanConfig(t *testing.T) {
	t.Parallel()
	tempDir := t.TempDir()
	cfg := NewConfig().includeDir(tempDir)
	cfg.FingerprintSize = 18
	cfg.StartAt = "beginning"
	operator, emitCalls := buildTestManager(t, cfg)
	operator.persister = testutil.NewMockPersister("test")

	// Both of they will be include
	file1 := openTempWithPattern(t,tempDir,"*.log1")
	file2 := openTempWithPattern(t,tempDir,"*.log2")

	// Two same fingerprint file , and smaller than  config size
	content := "aaaaaaaaaaa"
	writeString(t,file1, content+"\n")
	writeString(t,file2, content+"\n")
	operator.poll(context.Background())
	// one file will be exclude, ingest only one content
	waitForToken(t, emitCalls,[]byte(content))
	expectNoTokens(t,emitCalls)
	operator.wg.Wait()

	// keep append data to file1 and file2
	newContent := "bbbbbbbbbbbb"
	newContent1 := "ddd"
	writeString(t,file1,newContent1+"\n")
	writeString(t,file2,newContent+"\n")
	operator.poll(context.Background())
	// todo data lost happen , it should be fixed , lost content from file2
	waitForTokens(t,emitCalls,[][]byte{[]byte(content),[]byte(newContent1),[]byte(newContent)})
	operator.wg.Wait()

}

Expected Result

no error

Actual Result

Error:      	elements differ
        	            	
        	            	extra elements in list A:
        	            	([]interface {}) (len=1) {
        	            	 ([]uint8) (len=11) {
        	            	  00000000  61 61 61 61 61 61 61 61  61 61 61                 |aaaaaaaaaaa|
        	            	 }
        	            	}
        	            	
        	            	
        	            	listA:
        	            	([][]uint8) (len=3) {
        	            	 ([]uint8) (len=11) {
        	            	  00000000  61 61 61 61 61 61 61 61  61 61 61                 |aaaaaaaaaaa|
        	            	 },
        	            	 ([]uint8) (len=3) {
        	            	  00000000  64 64 64                                          |ddd|
        	            	 },
        	            	 ([]uint8) (len=12) {
        	            	  00000000  62 62 62 62 62 62 62 62  62 62 62 62              |bbbbbbbbbbbb|
        	            	 }
        	            	}
        	            	
        	            	
        	            	listB:
        	            	([][]uint8) (len=2) {
        	            	 ([]uint8) (len=3) {
        	            	  00000000  64 64 64                                          |ddd|
        	            	 },
        	            	 ([]uint8) (len=12) {
        	            	  00000000  62 62 62 62 62 62 62 62  62 62 62 62              |bbbbbbbbbbbb|
        	            	 }
        	            	}
        	Test:       	TestFingerprintIncreaseUntilLargerThanConfig

Collector version

v.0.73.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

Some idea:
If the fingerprint size is less than the config size, and the old and new fingerprints are from different files, the new file should be read from the beginning.

@maokitty maokitty added bug Something isn't working needs triage New item requiring triage labels Apr 7, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Apr 7, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@maokitty maokitty changed the title Data loss before fingerprint increase to config size in specified condition Data loss before increasing fingerprint to config size in specified state Apr 7, 2023
@djaglowski djaglowski self-assigned this Apr 7, 2023
@djaglowski djaglowski changed the title Data loss before increasing fingerprint to config size in specified state [pkg/stanza] Data loss before increasing fingerprint to config size in specified state Apr 7, 2023
@djaglowski djaglowski removed the needs triage New item requiring triage label Apr 7, 2023
@djaglowski
Copy link
Member

Thanks for writing this up @maokitty. #20749 should fix the problem. Please see the PR description for the explanation.

@maokitty
Copy link
Author

Hi @djaglowski , I'm afraid we still need to do something

  1. the source of the content is not from the expected file
func TestStalePartialFingerprintDiscarded(t *testing.T) {
	t.Parallel()
	tempDir := t.TempDir()
	cfg := NewConfig().includeDir(tempDir)
	cfg.FingerprintSize = 18
	cfg.StartAt = "beginning"
	operator, emitCalls := buildTestManager(t, cfg)
	operator.persister = testutil.NewMockPersister("test")

	// Both of they will be include
	file1 := openTempWithPattern(t, tempDir, "*.log1")
	file2 := openTempWithPattern(t, tempDir, "*.log2")

	// Two same fingerprint file , and smaller than  config size
	content := "aaaaaaaaaaa"
	writeString(t, file1, content+"\n")
	writeString(t, file2, content+"\n")
	operator.poll(context.Background())
	// one file will be exclude, ingest only one content
	waitForToken(t, emitCalls, []byte(content))
	expectNoTokens(t, emitCalls)
	operator.wg.Wait()
	require.Equal(t,1,len(operator.knownFiles))
	// if one file is read ,it will be save to knownFiles
	file1Exclude := operator.knownFiles[0].file.Name() != file1.Name()

	// keep append data to file1 and file2
	newContent := "bbbbbbbbbbbb"
	newContent1 := "ddd"
	writeString(t, file1, newContent1+"\n")
	writeString(t, file2, newContent+"\n")
	operator.poll(context.Background())

	var actualFilePath string
	LOOP:
	for {
		select {
		case call := <-emitCalls:
			if bytes.Equal(call.token,[]byte(content)) {
				actualFilePath = call.attrs.Path
			}
		case <-time.After(3 * time.Second):
			break LOOP
		}
	}
	// content should be read from it's source file
	if file1Exclude {
		require.Equal(t,file1.Name(),actualFilePath)
	} else{
		require.Equal(t,file2.Name(),actualFilePath)
	}
}
  1. if one file stop increase size , it will be lost forever, even with different fingerprint size
func TestDataLoss(t *testing.T) {
	t.Parallel()
	tempDir := t.TempDir()
	cfg := NewConfig().includeDir(tempDir)
	cfg.FingerprintSize = 18
	cfg.StartAt = "beginning"
	operator, emitCalls := buildTestManager(t, cfg)
	operator.persister = testutil.NewMockPersister("test")

	// Both of they will be include
	file1 := openTempWithPattern(t, tempDir, "*.log1")
	file2 := openTempWithPattern(t, tempDir, "*.log2")

	// Two same fingerprint file , and smaller than  config size
	content := "aaaaaaaaaaa"
	writeString(t, file1, content+"\n")
	writeString(t, file2, content+"\n")
	operator.poll(context.Background())
	// one file will be exclude, ingest only one content
	waitForToken(t, emitCalls, []byte(content))
	expectNoTokens(t, emitCalls)
	operator.wg.Wait()

	// keep append data to file1
	newContent1 := "dddddddd"
	writeString(t, file1, newContent1+"\n")
	operator.poll(context.Background())

	// file1 and file2  has different fingerprint
	waitForTokens(t,emitCalls,[][]byte{[]byte(newContent1),[]byte(content)})

}

@djaglowski
Copy link
Member

@maokitty, thanks for pointing these issues out. Would you mind opening a new issue for each of them?

@maokitty
Copy link
Author

Here they are:
#20851
#20850

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants