join 2 frames of same type #153

Open
teto opened this issue Mar 21, 2021 · 2 comments


teto commented Mar 21, 2021

I have 2 frames that contain network packet fields (source IP, destination IP and so on): one frame captured at the source, one at the destination. In my Python software (which I am now rewriting in Haskell), I joined the 2 dataframes on a hash of the packet.
This gives:

-- generate a column with a hash of the other columns
addHash :: FrameFiltered Packet -> Frame (Record '[PacketHash])
addHash aframe =
  fmap addHash' frame
  where
    frame = fmap toHashablePacket (ffFrame aframe)
    addHash' row = Col (hashWithSalt 0 row) :& RNil

-- here aframe1 and aframe2 have the same type
mergeTcpConnectionsFromKnownStreams aframe1 aframe2 =
  mergedFrame
  where
    mergedFrame = innerJoin @'[PacketHash] hframe1 hframe2
    -- prepend each frame's own hash column (the second one must hash aframe2, not aframe1)
    hframe1 = zipFrames (addHash aframe1) (ffFrame aframe1)
    hframe2 = zipFrames (addHash aframe2) (ffFrame aframe2)

It compiles and seems to run, but after the innerJoin there should be several columns with the same name. Doesn't that break the API somewhat? How can I select, for instance, one source IP among the 2 sourceIP columns present in the merged dataframe?


teto commented Apr 4, 2021

I finally managed to get the types right, and indeed columns with the same names are concatenated (the following compiles):

mergeTcpConnectionsFromKnownStreams :: 
  FrameFiltered Packet -> FrameFiltered Packet
  -> [ Rec (Maybe :. ElField) ('[PacketHash] ++ ManColumnsTshark ++ ManColumnsTshark) ]

I then serialized the result via writeCsv. Because of #155 the results are messed up, so I can't interpret them yet, but I do see columns with the same names.

I also noticed a bug on my side: I was doing a join on the PacketHash column but all my hashes were equal to 0 (now fixed).
All packets got paired 1 to 1 with the same hash. The behavior is strange/wrong, so maybe we could add a function that performs some check, or specify the behavior when several rows are candidates for a merge?
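
The kind of check suggested above could look roughly like the sketch below. It is plain Haskell over lists and uses only `containers`; none of these functions (`keyCounts`, `duplicateKeys`, `checkUniqueKeys`) exist in Frames, they are hypothetical names for illustration. The idea is to count how often each join key occurs on each side and to refuse the join when a key is ambiguous.

```haskell
import qualified Data.Map.Strict as Map

-- | Count how many rows carry each join key.
keyCounts :: Ord k => (row -> k) -> [row] -> Map.Map k Int
keyCounts key rows = Map.fromListWith (+) [(key r, 1) | r <- rows]

-- | Keys that occur more than once, with their multiplicity.
-- An empty result means every row has a unique key.
duplicateKeys :: Ord k => (row -> k) -> [row] -> [(k, Int)]
duplicateKeys key = Map.toList . Map.filter (> 1) . keyCounts key

-- | Hypothetical guard: report an error when either side of a
-- join has keys shared by several rows.
checkUniqueKeys :: Ord k => (row -> k) -> [row] -> [row] -> Either String ()
checkUniqueKeys key left right =
  case (duplicateKeys key left, duplicateKeys key right) of
    ([], []) -> Right ()
    (dl, dr) ->
      Left ("ambiguous join keys: " ++ show (length dl)
            ++ " on the left, " ++ show (length dr) ++ " on the right")
```

With every hash equal to 0, as in the bug described above, `duplicateKeys` would report the single key 0 with a multiplicity equal to the number of rows, which makes the problem visible before the join runs. The rows of a frame would first have to be turned into a list, for example through its `Foldable` instance, assuming that fits the frame type in use.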

@idontgetoutmuch

This seems related: #170 (comment). I just rename columns and then do the inner join.
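
To illustrate the rename-then-join idea, here is a minimal sketch that models rows as untyped maps from column name to value. This is not the Frames API: in Frames the rename would have to happen on the typed record, and `Row`, `suffixColumns`, `innerJoinOn` and the `_src`/`_dst` suffixes are hypothetical names chosen for this example.

```haskell
import qualified Data.Map.Strict as Map

-- | A row modelled as column name -> value (a simplification of a typed record).
type Row = Map.Map String String

-- | Append a suffix to every column name except the join key.
suffixColumns :: String -> String -> Row -> Row
suffixColumns keyCol suffix = Map.mapKeys rename
  where
    rename c
      | c == keyCol = c
      | otherwise   = c ++ suffix

-- | Inner join on one key column: each left row is combined with every
-- right row that has the same key value. Rows lacking the key are dropped.
innerJoinOn :: String -> [Row] -> [Row] -> [Row]
innerJoinOn keyCol ls rs =
  [ Map.union l r
  | l <- ls
  , Just k <- [Map.lookup keyCol l]
  , r <- rs
  , Map.lookup keyCol r == Just k
  ]
```

Renaming both sides first, for instance `innerJoinOn "hash" (map (suffixColumns "hash" "_src") src) (map (suffixColumns "hash" "_dst") dst)`, yields rows whose columns are `hash`, `sourceIP_src` and `sourceIP_dst`, so each source IP can be selected unambiguously.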
