@rootranjan

Fixes #4633

Description:

Reduce false positives in Metabase detector by filtering out URL slugs and descriptive strings that match the detector pattern but are not actual session tokens.

Changes:

  • Add filter to exclude URL slugs (strings starting with hyphens in URL paths)
  • Add filter to exclude descriptive strings (readable words like "journal", "deduplication", "voucher")
  • Add filter to exclude slug patterns (strings with many hyphens and only lowercase letters)
  • Fix lint error by properly handling res.Body.Close() errors

This reduces false positives from URL slugs and descriptive identifiers while still detecting real Metabase session tokens that are random alphanumeric strings.

Problem:
The Metabase detector flagged any 36-character string of letters, digits, and hyphens that appeared near the keyword "metabase" and a URL as a potential session token. This included:

  • URL slugs in Metabase query URLs (e.g., -journal-deduplication-id-to-voucher from question/12345-journal-deduplication-id-to-voucher)
  • Path identifiers in URLs
  • Descriptive strings that happen to be 36 characters
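
To make the collision concrete, here is a minimal sketch showing that the slug from the example above has exactly the shape the detector looks for. The pattern below is a simplified stand-in based on the description ("36-character alphanumeric string including hyphens"), not the detector's actual regex, and matchesTokenShape is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenShape is an assumed, simplified version of the detector pattern:
// exactly 36 characters drawn from letters, digits, and hyphens.
var tokenShape = regexp.MustCompile(`^[a-zA-Z0-9-]{36}$`)

// matchesTokenShape reports whether s has the shape of a candidate token.
// (Hypothetical helper, for illustration only.)
func matchesTokenShape(s string) bool {
	return tokenShape.MatchString(s)
}

func main() {
	slug := "-journal-deduplication-id-to-voucher"
	// The slug is exactly 36 characters long and uses only allowed
	// characters, so a shape-only check cannot tell it from a token.
	fmt.Println(len(slug), matchesTokenShape(slug))
}
```

Because length and character class alone cannot separate the two cases, the fix has to look at what the string contains and where it appears.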

Solution:
Added isLikelyFalsePositive() helper function with multiple filters:

  1. URL slug filter - Detects strings starting with hyphens that are part of URL paths (checks for http://, https://, /question/, path separators)
  2. Descriptive string filter - Detects readable words (like "journal", "deduplication", "voucher") that appear in URL slugs but not in random tokens
  3. Slug pattern filter - Detects strings with 3+ hyphens and only lowercase letters (descriptive slug pattern vs random token pattern)
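
The three filters could be sketched roughly as follows. This is an illustration written from the description above, not the code in this PR: only isLikelyFalsePositive is named in the description, so its signature here is a guess, and the other helper names, the word list, and the example host are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// descriptiveWords is illustrative; the description names only these three.
var descriptiveWords = []string{"journal", "deduplication", "voucher"}

// inURLPath: candidate starts with a hyphen and the surrounding context
// contains URL markers (filter 1).
func inURLPath(candidate, context string) bool {
	if !strings.HasPrefix(candidate, "-") {
		return false
	}
	for _, marker := range []string{"http://", "https://", "/question/"} {
		if strings.Contains(context, marker) {
			return true
		}
	}
	return false
}

// hasDescriptiveWord: candidate contains a readable word (filter 2).
func hasDescriptiveWord(candidate string) bool {
	lower := strings.ToLower(candidate)
	for _, w := range descriptiveWords {
		if strings.Contains(lower, w) {
			return true
		}
	}
	return false
}

// looksLikeSlug: 3+ hyphens and otherwise only lowercase letters (filter 3).
func looksLikeSlug(candidate string) bool {
	hyphens := 0
	for _, r := range candidate {
		switch {
		case r == '-':
			hyphens++
		case r >= 'a' && r <= 'z':
			// lowercase letter: allowed
		default:
			return false // digit, uppercase, or anything else
		}
	}
	return hyphens >= 3
}

// isLikelyFalsePositive combines the three filters.
func isLikelyFalsePositive(candidate, context string) bool {
	return inURLPath(candidate, context) ||
		hasDescriptiveWord(candidate) ||
		looksLikeSlug(candidate)
}

func main() {
	url := "https://metabase.example.com/question/12345-journal-deduplication-id-to-voucher"
	fmt.Println(isLikelyFalsePositive("-journal-deduplication-id-to-voucher", url))
	fmt.Println(isLikelyFalsePositive("a1b2c3d4-e5f6-7890-abcd-ef1234567890", "metabase session"))
}
```

In this sketch a UUID-style token passes through because it contains digits and does not start with a hyphen. A fixed word list only catches the slugs it anticipates, so the slug pattern filter is the more general of the three.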

Implementation Details:

  • Modified FromData() to use FindAllStringSubmatchIndex() to get match positions for context extraction
  • Added context-aware filtering that checks surrounding text (±300 chars) to detect URL patterns
  • Filters are applied before processing matches to avoid unnecessary verification calls
  • Each filter function extracts context around the match and checks for specific patterns (e.g., http://, /question/, descriptive words)
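
For reviewers unfamiliar with FindAllStringSubmatchIndex(), the context extraction might look something like the sketch below. The pattern is a simplified assumption rather than the detector's real regex, and candidate, extractCandidates, contextRadius, and the example host are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// candidatePat is a simplified stand-in for the detector pattern (assumption).
var candidatePat = regexp.MustCompile(`\b([a-zA-Z0-9-]{36})\b`)

// contextRadius mirrors the ±300 characters mentioned above.
const contextRadius = 300

// candidate pairs a matched string with its surrounding text.
type candidate struct {
	Token   string
	Context string
}

// extractCandidates returns every match of the first capture group
// together with up to contextRadius bytes on either side of it.
func extractCandidates(data string) []candidate {
	var out []candidate
	for _, idx := range candidatePat.FindAllStringSubmatchIndex(data, -1) {
		// idx[0:2] bound the whole match; idx[2:4] bound capture group 1.
		start, end := idx[2], idx[3]
		lo := start - contextRadius
		if lo < 0 {
			lo = 0
		}
		hi := end + contextRadius
		if hi > len(data) {
			hi = len(data)
		}
		// Byte offsets may split a multi-byte rune at the window edge;
		// that is acceptable for substring checks such as "https://".
		out = append(out, candidate{Token: data[start:end], Context: data[lo:hi]})
	}
	return out
}

func main() {
	data := "metabase https://metabase.example.com/question/12345-journal-deduplication-id-to-voucher"
	for _, c := range extractCandidates(data) {
		fmt.Printf("token=%q context=%q\n", c.Token, c.Context)
	}
}
```

Having the match offsets is what makes the URL slug filter possible: the matched text alone does not reveal that it sits inside a URL path.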

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

Add filters to exclude URL slugs and descriptive strings:
- Filter URL slugs (strings starting with hyphens in URL paths)
- Filter descriptive strings (readable words like 'journal', 'deduplication')
- Filter strings with many hyphens and only lowercase (slug pattern)

This reduces false positives from URL slugs like '-journal-deduplication-id-to-voucher'
while still detecting real Metabase session tokens that are random alphanumeric strings.

Fixes false positive where URL slugs in Metabase query URLs were incorrectly
flagged as session tokens.

Use defer with io.Copy and explicit error handling to satisfy errcheck linter,
matching the pattern used in other detectors.
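
The commit does not show the code, so the following is only a sketch of the common Go idiom this message appears to describe: drain the body, then close it, and discard both return values explicitly so errcheck does not flag them. fetchStatus and statusFromTestServer are hypothetical helpers, and the local test server stands in for a Metabase endpoint.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchStatus performs a GET and returns the status code, making sure the
// response body is drained and closed before returning.
func fetchStatus(client *http.Client, url string) (int, error) {
	res, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer func() {
		// Draining lets the transport reuse the connection; assigning to
		// the blank identifier makes the ignored errors explicit.
		_, _ = io.Copy(io.Discard, res.Body)
		_ = res.Body.Close()
	}()
	return res.StatusCode, nil
}

// statusFromTestServer starts a local server that always answers with
// code and returns what fetchStatus observes. (Illustration only.)
func statusFromTestServer(code int) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(code)
	}))
	defer srv.Close()
	return fetchStatus(srv.Client(), srv.URL)
}

func main() {
	status, err := statusFromTestServer(http.StatusUnauthorized)
	fmt.Println(status, err)
}
```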
@rootranjan rootranjan requested a review from a team December 31, 2025 15:56
@rootranjan rootranjan requested a review from a team as a code owner December 31, 2025 15:56
Development

Successfully merging this pull request may close these issues.

Metabase detector produces false positives for URL slugs and descriptive strings

1 participant