Support output in CSV format #397

Merged: 5 commits merged on Sep 14, 2021


@wux1an (Contributor) commented Sep 9, 2021

Support output in CSV format. Closes #270.

Usage:

httpx -l hosts.txt -o result.csv -csv

hosts.txt

https://www.github.com
https://projectdiscovery.io
https://www.microsoft.com

Output:
result.csv

timestamp,request,response-header,scheme,port,path,body-sha256,header-sha256,a,cnames,url,input,location,title,error,webserver,response-body,content-type,method,host,content-length,chain-status-codes,status-code,tls-grab,csp,vhost,websocket,pipeline,http2,cdn,response-time,technologies,chain,final-url,failed
2021-09-09 19:13:11.0684416 +0800 +08 m=+0.631425501,,,https,443,/,e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855,f324f47733f273790e542fd6f61e8a1f86cc26c5122e4f30a41ff891ea2030c9,[23.62.177.155],[www.microsoft.com-c-3.edgekey.net www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net e13678.dscb.akamaiedge.net],https://www.microsoft.com:443,https://www.microsoft.com,https://www.microsoft.com/zh-tw/,,,,,,GET,23.62.177.155,0,[],302,<nil>,<nil>,false,false,false,false,false,567.9054ms,[],[],,false
2021-09-09 19:13:11.0960502 +0800 +08 m=+0.659034201,,,https,443,/,e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855,58b6e651129d88980695116b2c6f1d48fad37a955abd5fcf306c0fd573e66642,[13.229.188.59],[github.com],https://www.github.com:443,https://www.github.com,https://github.com/,,,,,,GET,13.229.188.59,0,[],301,<nil>,<nil>,false,false,false,false,false,577.2405ms,[],[],,false
2021-09-09 19:13:11.5741673 +0800 +08 m=+1.137153601,,,https,443,/,c256fff7325d4435d5aca803221199c6c08a567eb09649fe4cf1abb1ebcdf1e2,160b0b675dc3dcc479f0fb81959f815b10c12a577b65285b40d0deb9c36eafb0,[172.67.74.214 104.26.6.152 104.26.7.152],[],https://projectdiscovery.io:443,https://projectdiscovery.io,,Projectdiscovery.io,,cloudflare,,text/html,GET,172.67.74.214,0,[],200,<nil>,<nil>,false,false,false,false,false,1.054815s,[],[],,false

Console output:

$ httpx.exe -l hosts.txt -o res.csv -csv

    __    __  __       _  __
   / /_  / /_/ /_____ | |/ /
  / __ \/ __/ __/ __ \|   /
 / / / / /_/ /_/ /_/ /   |
/_/ /_/\__/\__/ .___/_/|_|
             /_/              v1.1.2

                projectdiscovery.io

Use with caution. You are responsible for your actions
Developers assume no liability and are not responsible for any misuse or damage.
timestamp,request,response-header,scheme,port,path,body-sha256,header-sha256,a,cnames,url,input,location,title,error,webserver,response-body,content-type,method,host,content-length,chain-status-codes,status-code,tls-grab,csp,vhost,websocket,pipeline,http2,cdn,response-time,technologies,chain,final-url,failed
2021-09-09 19:13:11.0684416 +0800 +08 m=+0.631425501,,,https,443,/,e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855,f324f47733f273790e542fd6f61e8a1f86cc26c5122e4f30a41ff891ea2030c9,[23.62.177.155],[www.microsoft.com-c-3.edgekey.net www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net e13678.dscb.akamaiedge.net],https://www.microsoft.com:443,https://www.microsoft.com,https://www.microsoft.com/zh-tw/,,,,,,GET,23.62.177.155,0,[],302,<nil>,<nil>,false,false,false,false,false,567.9054ms,[],[],,false
2021-09-09 19:13:11.0960502 +0800 +08 m=+0.659034201,,,https,443,/,e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855,58b6e651129d88980695116b2c6f1d48fad37a955abd5fcf306c0fd573e66642,[13.229.188.59],[github.com],https://www.github.com:443,https://www.github.com,https://github.com/,,,,,,GET,13.229.188.59,0,[],301,<nil>,<nil>,false,false,false,false,false,577.2405ms,[],[],,false
2021-09-09 19:13:11.5741673 +0800 +08 m=+1.137153601,,,https,443,/,c256fff7325d4435d5aca803221199c6c08a567eb09649fe4cf1abb1ebcdf1e2,160b0b675dc3dcc479f0fb81959f815b10c12a577b65285b40d0deb9c36eafb0,[172.67.74.214 104.26.6.152 104.26.7.152],[],https://projectdiscovery.io:443,https://projectdiscovery.io,,Projectdiscovery.io,,cloudflare,,text/html,GET,172.67.74.214,0,[],200,<nil>,<nil>,false,false,false,false,false,1.054815s,[],[],,false

@sullo (Contributor) commented Sep 9, 2021

Hi and thanks for the CSV reporter code!

To make this safer (and easier) to parse and to open in desktop software (especially MS Excel), I think a few changes should be made:

  • Special characters at the start of a field should be escaped (see this article on CSV Injection)
  • Quotes inside values should be escaped like \" so that encoding/csv automatically quotes the entire field when writing

Thanks again!

@wux1an (Contributor, Author) commented Sep 10, 2021

Header escape

Because the headers come from the csv tags on the Result struct, and those tag values contain only lowercase letters and -, I did not escape , (the field delimiter) when generating the headers. It is still worth doing, though, so I'll add it.

Row data escape

I used the standard library package encoding/csv to generate each CSV row. It already handles escaping, and its default delimiter is ,.

test.go

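package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

func main() {
	buffer := bytes.Buffer{}
	writer := csv.NewWriter(&buffer)
	// The first field contains a backslash, newlines, a tab, and a double quote.
	writer.Write([]string{
		"\\\n\tb \"  \n",
		"normal",
	})
	// Flush pushes the buffered record into the bytes.Buffer.
	writer.Flush()
	// The ->| |<- markers make leading and trailing whitespace visible.
	fmt.Printf("->|%s|<-\n", buffer.String())
}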

output

->|"\
	b ""  
",normal
|<-

The " was automatically escaped as "", and the field containing special characters was wrapped in double quotes.

Software support test

  • Office Excel
  • CSVViewer

@sullo (Contributor) commented Sep 10, 2021

OK, I did some testing, and yes: the " is escaped, and a , inside a field causes the field to be quoted. Perfect.

A CSV injection attack is still possible, however:

$ cat test.csv
timestamp,request,response-header,scheme,port,path,body-sha256,header-sha256,a,cnames,url,input,location,title,error,webserver,response-body,content-type,method,host,content-length,chain-status-codes,status-code,tls-grab,csp,vhost,websocket,pipeline,http2,cdn,response-time,technologies,chain,final-url,failed
2021-09-10 15:45:54.434745 -0400 EDT m=+0.245758585,,,http,80,/,919d8316693d433612794d9dba12e1164e3983f1cb8dc9b19f7ca639d1af884c,5affca0292f4b0f8b5cd5ad08fc37430cf0fedb632ebe6f66ab27819c97852bd,[127.0.0.1 ::1],[],http://localhost:80,http://localhost,,=sum(1+1),,Apache/2.4.46 (Unix),,text/html,GET,127.0.0.1,27,[],200,<nil>,<nil>,false,false,false,false,false,1.875375ms,[],[],,false

Notice that =sum(1+1) makes it into the CSV file, and it will be evaluated as a formula when the file is opened in a spreadsheet editor. When I opened the file in Pages, the cell ran =sum(1+1) and displayed 2.

For this test, the base HTML file contained:

# cat index.html
<title>
=sum(1+1)
</title>

This also applies to any field that takes data from the remote system (title, header banner, body, headers, etc.). If a field starts with any of =, +, @, or -, it should have a single quote ' prepended. The right place seems to be around line 1216 in your code.
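A minimal sketch of the suggested fix, assuming it is applied to each cell before the row reaches csv.Writer. The helper name sanitizeCSVField is hypothetical; this illustrates the idea and is not the code that was merged.

```go
package main

import (
	"encoding/csv"
	"os"
)

// sanitizeCSVField prefixes a single quote when a value begins with one of
// the characters suggested above (=, +, -, @), so spreadsheet software shows
// the cell as text instead of evaluating it as a formula.
func sanitizeCSVField(s string) string {
	if s == "" {
		return s
	}
	switch s[0] {
	case '=', '+', '-', '@':
		return "'" + s
	}
	return s
}

func main() {
	// URL, title, and webserver values from the test.csv row above.
	row := []string{"http://localhost:80", "=sum(1+1)", "Apache/2.4.46 (Unix)"}
	for i := range row {
		row[i] = sanitizeCSVField(row[i])
	}
	w := csv.NewWriter(os.Stdout)
	if err := w.Write(row); err != nil {
		panic(err)
	}
	w.Flush()
	if err := w.Error(); err != nil {
		panic(err)
	}
}
```

This prints http://localhost:80,'=sum(1+1),Apache/2.4.46 (Unix). Some CSV-injection guidance also lists a leading tab or carriage return as triggers, so the character set may be worth extending.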

Thanks!

@wux1an (Contributor, Author) commented Sep 13, 2021

My fault: I assumed encoding/csv would handle this, so I only tested escaping, not CSV injection.

@sullo (Contributor) commented Sep 14, 2021

Can confirm that's resolved, thank you!

... http://localhost:80,http://localhost/,,'=sum(1+1),,Apache ....

@ehsandeep requested a review from Mzack9999 on September 14, 2021 13:10
@wux1an (Contributor, Author) commented Sep 14, 2021

It's my pleasure.

@ehsandeep (Member) left a comment

Thank you, @wux1an, for adding this functionality. Building on your work, we may need to make some modifications before the next release so that the default output is consistent between the JSON and CSV formats. In the meantime, please let us know if you have any suggestions.

Here is the default output of httpx against example.com in both formats.

JSON output
{
  "timestamp": "2021-09-14T20:58:43.970355+05:30",
  "scheme": "https",
  "port": "443",
  "path": "/",
  "body-sha256": "ea8fac7c65fb589b0d53560f5251f74f9e9b243478dcb6b3ea79b5e36449c8d9",
  "header-sha256": "c1a52c70d5c841f29e7d44d67c16eb233223050686fa7e17b95c5206d9810500",
  "a": [
    "93.184.216.34"
  ],
  "url": "https://example.com:443",
  "input": "example.com",
  "title": "Example Domain",
  "webserver": "ECS (sab/5750)",
  "content-type": "text/html",
  "method": "GET",
  "host": "93.184.216.34",
  "status-code": 200,
  "response-time": "3.180631833s",
  "failed": false
}
CSV output
timestamp,2021-09-14 20:59:17.092237 +0530 IST m=+3.420325876
request,
response-header,
scheme,https
port,443
path,/
body-sha256,ea8fac7c65fb589b0d53560f5251f74f9e9b243478dcb6b3ea79b5e36449c8d9
header-sha256,e8a027859399d3cac387b24096c29f2f2299dc121487379957d7b03a9a30f7fd
a,[93.184.216.34]
cnames,[]
url,https://example.com:443
input,example.com
location,
title,Example Domain
error,
webserver,ECS (sab/56BA)
response-body,
content-type,text/html
method,GET
host,93.184.216.34
content-length,0
chain-status-codes,[]
status-code,200
tls-grab,<nil>
csp,<nil>
vhost,false
websocket,false
pipeline,false
http2,false
cdn,false
response-time,3.235824458s
technologies,[]
chain,[]
final-url,
failed,false

@ehsandeep changed the base branch from master to dev on September 14, 2021
@ehsandeep merged commit 22dc2f5 into projectdiscovery:dev on Sep 14, 2021
@wux1an (Contributor, Author) commented Sep 15, 2021

My way

I did have an idea: I tried to implement it, found it troublesome, and abandoned it in favor of a lazier approach.

The key code is here (https://github.com/wux1an/httpx/blob/8e5f500b06f21c4000b919fa7ed2be8d7d18671e/runner/runner.go#L1222):

str := fmt.Sprintf("%v", value.Interface())

I used Go's default formatting (the %v verb) to get the string value of each field. The details are in Go/src/fmt/doc.go, lines 15-18:

General:
	%v	the value in a default format
		when printing structs, the plus flag (%+v) adds field names
	%#v	a Go-syntax representation of the value

In your example, strings, bools, and int slices display normally, and I expect string slices do too. Only the time.Time format differs, and that is easy to deal with: check whether the value's type is time.Time, and then format it however you like.

Improve

We can use the JSON encoder as a reference; see Go/src/encoding/json/encode.go, lines 415-463:

Go/src/encoding/json/encode.go:415-463
// newTypeEncoder constructs an encoderFunc for a type.
// The returned encoder only checks CanAddr when allowAddr is true.
func newTypeEncoder(t reflect.Type, allowAddr bool) encoderFunc {
	// If we have a non-pointer value whose type implements
	// Marshaler with a value receiver, then we're better off taking
	// the address of the value - otherwise we end up with an
	// allocation as we cast the value to an interface.
	if t.Kind() != reflect.Ptr && allowAddr && reflect.PtrTo(t).Implements(marshalerType) {
		return newCondAddrEncoder(addrMarshalerEncoder, newTypeEncoder(t, false))
	}
	if t.Implements(marshalerType) {
		return marshalerEncoder
	}
	if t.Kind() != reflect.Ptr && allowAddr && reflect.PtrTo(t).Implements(textMarshalerType) {
		return newCondAddrEncoder(addrTextMarshalerEncoder, newTypeEncoder(t, false))
	}
	if t.Implements(textMarshalerType) {
		return textMarshalerEncoder
	}

	switch t.Kind() {
	case reflect.Bool:
		return boolEncoder
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return intEncoder
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
		return uintEncoder
	case reflect.Float32:
		return float32Encoder
	case reflect.Float64:
		return float64Encoder
	case reflect.String:
		return stringEncoder
	case reflect.Interface:
		return interfaceEncoder
	case reflect.Struct:
		return newStructEncoder(t)
	case reflect.Map:
		return newMapEncoder(t)
	case reflect.Slice:
		return newSliceEncoder(t)
	case reflect.Array:
		return newArrayEncoder(t)
	case reflect.Ptr:
		return newPtrEncoder(t)
	default:
		return unsupportedTypeEncoder
	}
}

Among the Result struct's field types, excluding those that already display normally (string, []string, []int, bool, time.Time), only the following need to be handled:

Reference code: https://github.com/wux1an/httpx/blob/8e5f500b06f21c4000b919fa7ed2be8d7d18671e/runner/runner.go#L1128

  • *cryptoutil.TLSData
  • *httpx.CSPData
  • []httpx.ChainItem

Checking the type of a variable is easy; generating an appropriate string for each type is hard. These three types are difficult because they are structs (or a slice of structs), and a struct is hard to represent as a single string.

Therefore

I think these three types should be handled separately. As far as I know, there is no existing project that converts a slice of structs to CSV, and doing it ourselves would take a lot of code.
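One possible alternative, not proposed in this thread, is to encode each struct-typed field as compact JSON inside a single CSV cell; encoding/csv then quotes the cell and doubles its embedded quotes. The chainItem type and jsonCell helper below are hypothetical; the real fields of httpx.ChainItem are not shown here.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
)

// chainItem is a hypothetical stand-in for httpx.ChainItem.
type chainItem struct {
	URL        string `json:"url"`
	StatusCode int    `json:"status_code"`
}

// jsonCell encodes a struct, struct pointer, or slice of structs as compact
// JSON for one CSV cell. A nil value marshals to "null", mapped to "".
func jsonCell(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	if string(b) == "null" {
		return "", nil
	}
	return string(b), nil
}

func main() {
	chain := []chainItem{
		{URL: "https://www.github.com", StatusCode: 301},
		{URL: "https://github.com/", StatusCode: 200},
	}
	cell, err := jsonCell(chain)
	if err != nil {
		panic(err)
	}

	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	// The JSON cell is quoted and its quotes doubled, so encoding/csv's
	// reader recovers the original JSON text.
	if err := w.Write([]string{"https://www.github.com:443", cell}); err != nil {
		panic(err)
	}
	w.Flush()
	if err := w.Error(); err != nil {
		panic(err)
	}
	fmt.Print(buf.String())
}
```

The trade-off is that spreadsheet users see raw JSON in those columns, but no per-type string format has to be designed.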

Successfully merging this pull request may close these issues.

[Feature] Support for more output format CSV/HTML
4 participants