This repository was archived by the owner on Apr 4, 2026. It is now read-only.

Dedupe option in feed processing #103

@pierre427

If we look at one feed:

http://botscout.com/last_caught_cache.htm

You can see that the one field we care about, 'Bot IP', contains a lot of duplicate entries. It would be nice to have an option to do something like this:

parse_eachline(:separator => "\n") do |event_generator, record|
  m = feed_re.match(record.data)
  next if m.nil?
  next if dedupe(m[:ip])

  event_generator.call() do |event|
    event.type = :scanning
    event.add_ipv4(m[:ip]) do |ipv4_event|
    end
  end
end

And just have the dedupe function keep a hash table for the current feed of any field values it receives, and return true if there's a previous match.
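
For illustration, here is a minimal sketch of what such a helper might look like. The `FeedDeduper` class and its `reset` method are hypothetical names, not part of the existing code, and a `Set` stands in for the hash table. It assumes one instance is created per feed run so that state does not leak between feeds.

```ruby
require 'set'

# Hypothetical helper (not part of the project): remembers every value
# seen during a single feed run.
class FeedDeduper
  def initialize
    @seen = Set.new
  end

  # Returns true if the value was seen before in this feed run.
  # Otherwise records the value and returns false.
  def dedupe(value)
    # Set#add? returns nil when the value is already present.
    @seen.add?(value).nil?
  end

  # Forget everything, e.g. before processing the next feed.
  def reset
    @seen.clear
  end
end

# Usage sketch with made-up records standing in for parsed feed lines.
deduper = FeedDeduper.new
ips = ['192.0.2.1', '192.0.2.2', '192.0.2.1']
unique_ips = ips.reject { |ip| deduper.dedupe(ip) }
```

Inside the parse block, `next if dedupe(m[:ip])` could then delegate to an instance like this one. Memory grows with the number of distinct values, which is probably fine for a feed of this size.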
