Methodology: how we scan and how we remove

This is the long version of how Vault.in actually works. If a sentence here is unclear, write to us at [email protected] and we will fix it.

Identity expansion.

On signup we build an identity tuple for you. A tuple is the set of strings we will use to search for you across the internet: the primary name, then variants (with and without middle name, common misspellings, Hindi-to-Latin transliterations both ways), the phone number in three common formats (with country code, with hyphens, with spaces), email plus common Gmail variants (dots, plus-suffixes), past phone numbers and addresses if you have any, and the slugs of any social handles you have used. For face-match (Concierge), we generate a 512-dimensional face embedding from your uploaded photo. The original image is encrypted at rest and deleted after the embedding is generated unless you opt to retain it.
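As an illustration only, the phone and Gmail parts of the expansion might look like the sketch below. It is not the production code: the function names, the 5+5 digit grouping, and the sample plus-suffix are assumptions made for this example.

```python
import re


def phone_variants(national_number: str, country_code: str = "91") -> list[str]:
    """Return three common renderings of a 10-digit Indian mobile number."""
    digits = re.sub(r"\D", "", national_number)
    if len(digits) != 10:
        raise ValueError("expected a 10-digit national number")
    return [
        f"+{country_code}{digits}",    # with country code
        f"{digits[:5]}-{digits[5:]}",  # with a hyphen (5+5 grouping is an assumption)
        f"{digits[:5]} {digits[5:]}",  # with a space
    ]


def gmail_variants(address: str, suffixes: tuple[str, ...] = ("shopping",)) -> list[str]:
    """Return dot and plus-suffix variants that Gmail delivers to the same inbox."""
    local, _, domain = address.lower().partition("@")
    if domain not in ("gmail.com", "googlemail.com"):
        return [address.lower()]
    # Strip any existing plus-suffix and dots to get the canonical local part.
    base = local.split("+", 1)[0].replace(".", "")
    variants = {f"{base}@{domain}"}
    # One dot at each interior position of the canonical local part.
    for i in range(1, len(base)):
        variants.add(f"{base[:i]}.{base[i:]}@{domain}")
    # Plus-suffixes; the suffix list here is purely illustrative.
    for suffix in suffixes:
        variants.add(f"{base}+{suffix}@{domain}")
    return sorted(variants)
```

Calling phone_variants("98765 43210") with this made-up number yields the three renderings listed above, and gmail_variants("A.Sharma@gmail.com") includes both asharma@gmail.com and a.sharma@gmail.com.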

SERP scanning.

For each tuple, we query the Google Custom Search API for top results, then the Bing Web Search API as a cross-check, then SerpAPI as a tertiary fallback for sites that block direct API queries. Each result is classified against the site_catalog. For matches that look like exposing URLs, we render the page via a Playwright-driven headless browser running behind a residential proxy in India, capture a full-page PNG, capture the rendered HTML, compute a SHA-256 of each, and store both in encrypted blob storage with a chain-of-custody log entry.
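The classification step can be pictured as a hostname lookup against the catalog. The sketch below is a simplified assumption of how that lookup might work: the catalog entries, the category labels, and the function name classify are hypothetical, and the real site_catalog schema is not shown here.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical catalog entries; the real site_catalog carries more fields.
SITE_CATALOG = {
    "justdial.com": "broker",
    "naukri.com": "broker",
}


def classify(url: str, catalog: dict[str, str] = SITE_CATALOG) -> Optional[str]:
    """Return the catalog category for a result URL, or None if the site is unknown."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full hostname first, then each parent domain (www.x.com, then x.com).
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in catalog:
            return catalog[candidate]
    return None
```

Matching on whole domain labels, rather than on substrings, keeps a lookalike host such as notjustdial.com from being classified as the catalogued site.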

Indian broker walker.

In parallel, we run a per-site walker against the top forty Indian data brokers that allow direct profile lookup: Truecaller (via the official unlisting API once authenticated), Justdial, IndiaMART, Sulekha, Naukri, NoBroker, 99acres, MagicBricks, Shaadi, Jeevansathi, BharatMatrimony, Tofler, Zauba Corp, KnowYourGST, InstaFinancials, OLX, Quikr, PolicyBazaar, CarWale, CarDekho, BookMyShow, IndianKanoon (read-only), MCA Online (read-only), and similar. Each walker runs at a per-site rate limit, never more than one query per minute per site per user. We respect robots.txt and we do not bypass paywalls.
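The per-site, per-user limit can be sketched as a small in-memory gate keyed on the (site, user) pair. This is a minimal illustration under assumptions: the class name, the injectable clock, and the in-memory dictionary are ours for the example, and a production limiter would need shared state across workers.

```python
import time


class PerSiteUserLimiter:
    """Allow at most one query per interval for each (site, user) pair."""

    def __init__(self, min_interval_s: float = 60.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._last: dict[tuple[str, str], float] = {}

    def try_acquire(self, site: str, user_id: str) -> bool:
        """Return True and record the query if the pair is allowed to run now."""
        now = self._clock()
        key = (site, user_id)
        last = self._last.get(key)
        if last is not None and now - last < self.min_interval_s:
            return False
        self._last[key] = now
        return True
```

A walker would call try_acquire before each lookup and skip or requeue the query when it returns False.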

Breach poller.

Daily pollers run against Have I Been Pwned (paid tier), DeHashed (subscription), IntelX (commercial), LeakCheck (commercial), and our threat-intelligence partner for dark-web markets. Where a provider's API supports hashed lookups, identifiers (email, phone, username) are hashed before the query; otherwise they are sent to the provider over TLS. Hits are stored as breach_records and linked to the user's identity via the breach_user_links table.
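Hashing an identifier before a query only works if both sides normalise it the same way. The sketch below shows the idea with SHA-256; the choice of hash, the normalisation rules, and the function names are assumptions for this example, since each provider defines its own accepted format.

```python
import hashlib


def normalise(kind: str, value: str) -> str:
    """Canonicalise an identifier so the same input always hashes the same way."""
    value = value.strip()
    if kind in ("email", "username"):
        return value.lower()
    if kind == "phone":
        # Keep digits only; formatting characters and '+' are dropped.
        return "".join(ch for ch in value if ch.isdigit())
    raise ValueError(f"unknown identifier kind: {kind}")


def hashed_identifier(kind: str, value: str) -> str:
    """Return the hex SHA-256 of the normalised identifier (hash choice is an assumption)."""
    return hashlib.sha256(normalise(kind, value).encode("utf-8")).hexdigest()
```

With this normalisation, " A@Example.com " and "a@example.com" produce the same digest, so a formatting difference does not cause a missed hit.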

Telegram monitor.

A daily MTProto worker, running on a separate VM with its own Telegram session, subscribes to one hundred and fifty curated public channels known to traffic in Indian-sourced PII. The worker is read-only. We do not interact with channel administrators. We do not join private channels even when invitations are posted publicly. The worker extracts identifiers from message content and cross-references against user tuples. Matches generate telegram_alerts records and, for severity 8 and above, dispatch WhatsApp + email + push notifications within five minutes.
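The extraction and cross-referencing step might look like the sketch below. The regular expressions are deliberately simple and are assumptions for this example (Indian mobile numbers starting with 6 to 9, a basic email pattern); the function names and the shape of the tuples mapping are hypothetical.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional +91 / 91 prefix, then a 10-digit mobile number, optionally split 5+5.
PHONE_RE = re.compile(r"(?<!\d)(?:\+?91[\s-]?)?([6-9]\d{4})[\s-]?(\d{5})(?!\d)")


def extract_identifiers(text: str) -> set[str]:
    """Pull lower-cased emails and bare 10-digit phone numbers out of a message."""
    emails = {m.group(0).lower() for m in EMAIL_RE.finditer(text)}
    phones = {m.group(1) + m.group(2) for m in PHONE_RE.finditer(text)}
    return emails | phones


def match_users(text: str, tuples: dict[str, set[str]]) -> list[str]:
    """Return the IDs of users whose tuple shares an identifier with the message."""
    found = extract_identifiers(text)
    return sorted(uid for uid, ids in tuples.items() if found & ids)
```

For a message containing "+91 98765-43210" (a made-up number), a user whose tuple holds "9876543210" would be returned as a match.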

Face-match (Concierge).

Weekly, for each profile_photo, we run Google Vision API web-detection and Yandex image search. Hits are filtered through a similarity threshold (cosine similarity of the face embedding against the candidate match) before alerting. Adult, scam, and impersonation hits are flagged at severity 10.
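The similarity filter reduces to a cosine comparison between two embeddings. A minimal sketch follows; the threshold value of 0.6 and the function names are assumptions for illustration, not the tuned production setting.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_face_match(reference: Sequence[float], candidate: Sequence[float],
                  threshold: float = 0.6) -> bool:
    """Keep a hit only if the candidate embedding is close enough to the reference."""
    return cosine_similarity(reference, candidate) >= threshold
```

The same functions work unchanged on 512-dimensional embeddings; the short vectors in a test are only for readability.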

Evidence capture.

Every exposure record carries an evidence pack. A pack contains: the rendered screenshot, the raw HTML, a HAR file of the network requests, a SHA-256 of each artifact, and a chain-of-custody log entry showing the capture timestamp (in IST), the worker version, the proxy IP geo-region, and the user agent. Packs are encrypted at rest with a per-user key. Access to a pack requires a signed URL with a one-hour TTL. We retain packs for the lifetime of the account plus thirty days post-cancellation.
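The hashing and chain-of-custody parts of a pack can be sketched as a manifest that records one SHA-256 per artifact alongside the capture metadata. The field names and function names below are assumptions for this example; encryption and signed URLs are out of scope for the sketch.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Optional

IST = timezone(timedelta(hours=5, minutes=30), "IST")


def build_manifest(artifacts: dict[str, bytes], worker_version: str,
                   proxy_region: str, user_agent: str,
                   captured_at: Optional[datetime] = None) -> dict:
    """Hash every artifact and record the capture metadata in IST."""
    captured_at = captured_at or datetime.now(IST)
    return {
        "captured_at": captured_at.astimezone(IST).isoformat(),
        "worker_version": worker_version,
        "proxy_region": proxy_region,
        "user_agent": user_agent,
        "artifacts": {name: hashlib.sha256(data).hexdigest()
                      for name, data in sorted(artifacts.items())},
    }


def verify(manifest: dict, artifacts: dict[str, bytes]) -> bool:
    """Return True only if every artifact in the manifest is present and unmodified."""
    return all(
        name in artifacts and hashlib.sha256(artifacts[name]).hexdigest() == digest
        for name, digest in manifest["artifacts"].items()
    )
```

Anyone holding the manifest can later re-hash the stored files and call verify to confirm that nothing in the pack has changed since capture.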

Section 12 drafting.

When a new exposure record is created, the removal-drafter worker pulls the site's removal_method from the catalog and routes accordingly. For email and DPDP Section 12 methods, the worker invokes Claude Sonnet 4.5 with a strict system prompt: act as an Indian privacy lawyer drafting a Section 12 notice, cite the Rule 14 ninety-day deadline, identify the Data Principal minimally, attach the evidence reference, do not use em dashes, do not make threats. The output is then validated against a template to ensure the six required elements (identification, specific identification of data, invocation of Section 12, withdrawal of consent under Section 6(4), reference to Rule 14, notice of escalation under Section 27) are present. Drafts are saved for your review before sending.
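Template validation can be approximated as a presence check for each of the six elements. The sketch below uses simple regular expressions; the patterns, the element keys, and the function names are assumptions for illustration, and a real validator would be stricter than a keyword match.

```python
import re

# One illustrative pattern per required element (patterns are assumptions).
REQUIRED_ELEMENTS = {
    "identification": r"\bData Principal\b",
    "data_identification": r"https?://\S+",
    "section_12": r"\bSection 12\b",
    "consent_withdrawal": r"\bSection 6\(4\)",
    "rule_14": r"\bRule 14\b",
    "escalation": r"\bSection 27\b",
}


def missing_elements(draft: str) -> list[str]:
    """Return the keys of required elements that do not appear in the draft."""
    return [name for name, pattern in REQUIRED_ELEMENTS.items()
            if not re.search(pattern, draft)]


def validate(draft: str) -> list[str]:
    """Return all problems found; an empty list means the draft passes."""
    problems = missing_elements(draft)
    if "\u2014" in draft:  # the system prompt forbids em dashes
        problems.append("em_dash")
    return problems
```

A draft that returns a non-empty list would be sent back for regeneration rather than saved for review.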

Section 12 sending.

Approved drafts are dispatched via the channel appropriate for the site. For automated forms, our Playwright executor fills the form on your behalf, capturing screenshots at each step. For email, we send from a Vault.in proxy address with full legal disclosure of representation, or (on your option) from your verified inbox via OAuth. For postal legal notices, we generate the PDF, mail it via India Post Speed Post with acknowledgement card, and store the dispatch receipt as part of the evidence pack.

Rule 14 tracking.

A per-removal timer starts on the day of dispatch. Cron-driven workers check status on day thirty (polite follow-up if no acknowledgement), day sixty (formal escalation referring to the Grievance Officer), day ninety (final reminder), and day ninety-one (move to DPB escalation queue). The verifier worker also re-scans the original exposure URL on a thirty-day cycle to detect re-appearance.
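The timer schedule can be expressed as a small table of day offsets. This sketch assumes the dispatch day counts as day zero, which the text above does not state explicitly; the action labels and function names are also ours for the example.

```python
from datetime import date, timedelta
from typing import Optional

# Day offset from dispatch mapped to an action label (labels are illustrative).
MILESTONES = {
    30: "polite_follow_up",
    60: "formal_escalation",
    90: "final_reminder",
    91: "dpb_escalation_queue",
}


def schedule(dispatch_day: date) -> dict[date, str]:
    """Return the calendar date of each milestone for one removal."""
    return {dispatch_day + timedelta(days=n): action
            for n, action in MILESTONES.items()}


def due_action(dispatch_day: date, today: date,
               acknowledged: bool = False) -> Optional[str]:
    """Return the action due today, if any; the day-30 follow-up is skipped once acknowledged."""
    action = MILESTONES.get((today - dispatch_day).days)
    if action == "polite_follow_up" and acknowledged:
        return None
    return action
```

For a notice dispatched on 1 January 2025, this places the follow-up on 31 January and the move to the DPB escalation queue on 2 April.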

DPB filing packet.

For removals that hit day ninety-one without compliance, the system generates a DPB filing packet automatically: a single PDF combining the Section 12 notice, all follow-ups, the company's responses if any, the evidence pack with hashes, a statement of the violation citing Rule 14, and the relief sought. You receive a notification, you review, you approve. We file via the Board's e-filing portal or by Speed Post depending on the case category.

Privacy score.

A daily worker recomputes your 0 to 100 privacy score using seven weighted components: active exposures (weight 35), breach exposure (20), dark-web presence (15), Telegram leak presence (10), image misuse (10), DPDP rights exercised (5), and identity hygiene (5). The score trend is exposed on your dashboard sparkline and included in your monthly report.
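The seven weights sum to 100, so the score can be read as a weighted sum of per-component values. The sketch below assumes each component is first normalised to a value between 0.0 (worst) and 1.0 (best); that normalisation, the direction of the scale, and the key names are assumptions, since only the weights are given above.

```python
WEIGHTS = {
    "active_exposures": 35,
    "breach_exposure": 20,
    "dark_web_presence": 15,
    "telegram_leak_presence": 10,
    "image_misuse": 10,
    "dpdp_rights_exercised": 5,
    "identity_hygiene": 5,
}
assert sum(WEIGHTS.values()) == 100  # the weights must cover the full 0 to 100 range


def privacy_score(components: dict[str, float]) -> int:
    """Combine per-component values in [0.0, 1.0] into a 0 to 100 score."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
        total += weight * value
    return round(total)
```

Under these assumptions, a user who is clean on every component except active exposures, where they score 0.0, would land at 65.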

Alert dispatcher.

Severity 8 to 10 incidents trigger WhatsApp, email, SMS, and push notifications within five minutes: WhatsApp via Gupshup's WhatsApp Business API, email via Resend, SMS via MSG91, and push via OneSignal. Severity 5 to 7 incidents are batched and sent hourly. Severity 1 to 4 incidents roll up into a daily digest at 0830 IST.
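The severity bands map onto three delivery lanes. A minimal sketch of that routing follows; the lane labels, the channel identifiers, and the function name route are assumptions made for this example.

```python
# Channels used for the immediate lane (identifiers are illustrative).
IMMEDIATE_CHANNELS = ("whatsapp", "email", "sms", "push")


def route(severity: int) -> str:
    """Map a 1 to 10 severity to its delivery lane."""
    if not 1 <= severity <= 10:
        raise ValueError("severity must be between 1 and 10")
    if severity >= 8:
        return "immediate"     # all IMMEDIATE_CHANNELS, within five minutes
    if severity >= 5:
        return "hourly_batch"  # batched and sent hourly
    return "daily_digest"      # rolled into the 0830 IST digest
```

Keeping the band boundaries in one function means the dispatcher and any reporting code agree on which lane a given incident belongs to.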

AI usage.

We use Anthropic's Claude Sonnet 4.5 for Section 12 drafting and DPB filing drafting. We do not use AI to read your inbox, your messages, or any data not strictly required for the drafting task. The prompts and responses are not retained by Anthropic per our enterprise agreement. The model version and the prompt template used appear in the audit log of each generated document.

What we will not do.

We will not buy data from any source. We will not pay for access to dark-web markets, paste-site premium tiers, or stolen-data forums. We will not respond to "look up this person for me" requests, even from spouses or employers; the only person Vault.in works for is the Data Principal whose tuple matches the identifier. We will not honour government requests without lawful process; the count of requests received and our response rate are published in the quarterly transparency report.

What we will do, on demand.

Generate an export of all your data within twenty-four hours (Section 11 right of access). Delete all your data within thirty days (Section 12 right of erasure). Answer a Grievance Officer letter within thirty days (Rule 13). Issue a breach notification within seventy-two hours of detection if our own systems are compromised.

If a sentence here is wrong, write to [email protected]. We will fix it.
