docs/apps/linkdin/README.md
A three‑stage demo that goes from LinkedIn scraping ➜ LLM reasoning ➜ graph visualisation.
Try it in Google Colab: the demo also runs in a cloud notebook with zero local setup.
```
prospect‑wizard/
├─ c4ai_discover.py          # Stage 1 – scrape companies + people
├─ c4ai_insights.py          # Stage 2 – embeddings, org‑charts, scores
├─ graph_view_template.html  # Stage 3 – graph viewer (static HTML)
└─ data/                     # output lands here (*.jsonl / *.json)
```
```bash
pip install crawl4ai litellm sentence-transformers pandas rich
```
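Create a browser profile with crawl4ai's interactive profile manager and log in to LinkedIn in it:

```bash
crwl profiles
```

The examples below use a profile named `profile_linkedin_uc`. Remember the profile name: all future runs take `--profile-name <your_name>`.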
Run Stage 1 (discovery):

```bash
# --geo 102713980      Malaysia geoUrn
# --title-filters ""   no title filter; or e.g. "Product,Engineering"
# --max-companies / --max-people are set small for workshops
python c4ai_discover.py full \
  --query "health insurance management" \
  --geo 102713980 \
  --title-filters "" \
  --max-companies 10 \
  --max-people 20 \
  --profile-name profile_linkedin_uc \
  --outdir ./data \
  --concurrency 2 \
  --log-level debug
```
Outputs in `./data/`:

- `companies.jsonl` – one JSON object per company
- `people.jsonl` – one JSON object per employee

🛠️ Dry‑run: `C4AI_DEMO_DEBUG=1 python c4ai_discover.py full --query coffee` uses bundled HTML snippets instead of the network.
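Both Stage‑1 files are JSON Lines: one JSON object per line. A minimal sketch for loading and counting them with the standard library, assuming only that line-per-object layout (no field names inside the records are assumed). The helper names `iter_jsonl` and `summarize_stage1` are hypothetical and not part of the demo.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the file and line so a truncated write is easy to find.
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc})") from exc


def summarize_stage1(data_dir: Path) -> Dict[str, int]:
    """Count the records in the two Stage-1 output files."""
    return {
        name: sum(1 for _ in iter_jsonl(data_dir / name))
        for name in ("companies.jsonl", "people.jsonl")
    }
```

For example, `summarize_stage1(Path("./data"))` returns a dict mapping each file name to its record count. If you prefer a DataFrame, `pandas.read_json(path, lines=True)` reads the same format.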
Common `geoUrn` values for `--geo`:

| Location | geoUrn |
|---|---|
| Singapore | 103644278 |
| Malaysia | 102713980 |
| United States | 103644922 |
| United Kingdom | 102221843 |
| Australia | 101452733 |

For other locations, filter a LinkedIn company search by location and look at the URL, e.g. https://www.linkedin.com/search/results/companies/?geoUrn=XXX: the number after `geoUrn=` is what you need.
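Copying the number by hand works fine; if you collect several locations, a small helper can pull it out of the URL. A sketch using only `urllib.parse`, assuming the URL carries either a bare number (`?geoUrn=103644278`) or, as LinkedIn URLs may do, a percent‑encoded list such as `?geoUrn=%5B%22103644278%22%5D`. `extract_geo_urn` is a hypothetical helper, not part of the demo.

```python
from urllib.parse import parse_qs, urlparse


def extract_geo_urn(search_url: str) -> str:
    """Return the numeric geoUrn from a LinkedIn search URL.

    Assumes the value is a bare number or a list like ["103644278"];
    for a list, only the first location is returned (--geo takes one).
    """
    values = parse_qs(urlparse(search_url).query).get("geoUrn")
    if not values:
        raise ValueError("URL has no geoUrn= parameter")
    # parse_qs already percent-decodes, so %5B%22...%22%5D arrives as ["..."].
    first = values[0].strip('[]"').split(",")[0].strip().strip('"')
    if not first.isdigit():
        raise ValueError(f"unexpected geoUrn value: {values[0]!r}")
    return first
```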
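Run Stage 2 (insights) on the Stage‑1 output:

```bash
# set --llm-api-key to the API key for your --llm-provider
python c4ai_insights.py \
  --in ./data \
  --out ./data \
  --embed-model all-MiniLM-L6-v2 \
  --llm-provider gemini/gemini-2.0-flash \
  --llm-api-key "" \
  --top-k 10 \
  --max-llm-tokens 8024 \
  --llm-temperature 1.0 \
  --workers 4
```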
Emits next to the Stage‑1 files:

- `company_graph.json` – inter‑company similarity graph
- `org_chart_<handle>.json` – one per company
- `decision_makers.csv` – hand‑picked ‘who to pitch’ list

Flags reference (straight from `build_arg_parser()`):

| Flag | Default | Purpose |
|---|---|---|
| `--in` | `.` | Stage‑1 output dir |
| `--out` | `.` | Destination dir |
| `--embed_model` | `all-MiniLM-L6-v2` | Sentence‑Transformer model |
| `--top_k` | `10` | Neighbours per company in graph |
| `--openai_model` | `gpt-4.1` | LLM for scoring decision makers |
| `--max_llm_tokens` | `8024` | Token budget per LLM call |
| `--llm_temperature` | `1.0` | Creativity knob |
| `--stub` | off | Skip OpenAI and fabricate tiny charts |
| `--workers` | `4` | Parallel LLM workers |
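`--top_k` sets how many neighbours each company gets in the similarity graph. The sketch below illustrates that idea with plain cosine similarity over pre-computed vectors. It assumes you already have one embedding per company (in the demo those come from the model named by `--embed_model`); `cosine`, `top_k_graph`, and the edge dict layout are hypothetical, not the code or the `company_graph.json` schema of `c4ai_insights.py`.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_graph(embeddings: Dict[str, Sequence[float]], top_k: int = 10) -> List[dict]:
    """Link each company to its top_k most similar peers (directed edges).

    Brute force O(n^2), which is fine for the small --max-companies values
    used in workshops. The edge layout here is illustrative only.
    """
    edges = []
    for src, vec in embeddings.items():
        scored = [
            (cosine(vec, other), dst)
            for dst, other in embeddings.items()
            if dst != src
        ]
        # Highest similarity first; ties broken by name for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for score, dst in scored[:top_k]:
            edges.append({"source": src, "target": dst, "weight": score})
    return edges
```

With sentence-transformers installed, the vectors could come from `SentenceTransformer("all-MiniLM-L6-v2").encode(texts).tolist()`, zipped with company names into the `embeddings` dict. Edges are directed, so company A may list B as a neighbour without B listing A.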
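After Stage 2 completes, open the HTML viewer from the project root:

```bash
# Option A (macOS): open the file directly, or use Live Server
open graph_view_template.html
# Option B: serve the folder over HTTP, then browse to
# http://localhost:8000/graph_view_template.html
python -m http.server 8000
```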
The page fetches data/company_graph.json and the org_chart_*.json files automatically; keep the data/ folder beside the HTML file.
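If opening the file directly shows an empty page, the browser is likely refusing to `fetch()` the JSON from a `file://` URL. A minimal standard-library sketch that serves the project root (the folder containing `graph_view_template.html` and `data/`) over HTTP, similar in spirit to `python -m http.server`; `make_server` is a hypothetical helper name.

```python
import functools
import http.server
from pathlib import Path


def make_server(root: Path, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Build (but do not start) an HTTP server rooted at `root`.

    Binds to 127.0.0.1 only, so the graph is not exposed to the network.
    Pass port=0 to let the OS pick a free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(root)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Usage: `make_server(Path(".")).serve_forever()`, then browse to http://127.0.0.1:8000/graph_view_template.html and stop it with Ctrl+C.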

Troubleshooting:

| Symptom | Fix |
|---|---|
| Infinite CAPTCHA | Use a residential proxy: `--proxy http://user:pass@ip:port` |
| 429 Too Many Requests | Lower --concurrency, rotate profile, add delay |
| Blank graph | Check JSON paths, clear localStorage in browser |
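For the “Blank graph” case, a quick pre-flight check of the files the viewer loads can save a round of browser debugging. A sketch assuming the layout described above (`data/company_graph.json` plus `data/org_chart_*.json` beside the HTML file); it only checks that the files exist and parse as JSON, not their contents. `check_viewer_data` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import List


def _json_problem(path: Path) -> List[str]:
    """Return [] if `path` reads and parses as JSON, else a one-item problem list."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:  # ValueError covers decode/JSON errors
        return [f"{path}: {exc}"]
    return []


def check_viewer_data(html_dir: Path) -> List[str]:
    """List problems that would leave graph_view_template.html blank."""
    data_dir = html_dir / "data"
    if not data_dir.is_dir():
        return [f"missing folder {data_dir}"]
    problems: List[str] = []
    graph = data_dir / "company_graph.json"
    if graph.is_file():
        problems += _json_problem(graph)
    else:
        problems.append(f"missing {graph}")
    charts = sorted(data_dir.glob("org_chart_*.json"))
    if not charts:
        problems.append(f"no org_chart_*.json files in {data_dir}")
    for chart in charts:
        problems += _json_problem(chart)
    return problems
```

Run `print(check_viewer_data(Path(".")))` from the project root. An empty list means the files are in place, so a still-blank graph points at the browser side (e.g. stale localStorage).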
crwl profiles → c4ai_discover.py → c4ai_insights.py → open graph_view_template.html.
Live long and import crawl4ai.