Getting Started
System Requirements
| Requirement | Details |
|---|---|
| Operating System | Windows 10/11 (64-bit) |
| RAM | 4 GB minimum, 8 GB recommended (16 GB for local AI) |
| Disk Space | 500 MB for the app, plus space for AI models if running locally |
| GPU (optional) | NVIDIA GPU with CUDA support for faster local AI inference |
| Network | Internet required for cloud AI providers. Not required for local AI. |
Installation
AumaTron is available as a Windows installer or a portable package.
Windows Installer (Recommended)
- Download the installer from aumatron.com/download
- Run the installer. It bundles Node.js, so you don't need to install Node.js separately
- AumaTron installs to your AppData folder and creates a desktop shortcut
- Launch AumaTron from the desktop shortcut or Start menu
Portable Package
- Download the portable ZIP from aumatron.com/download
- Extract to any folder on your machine
- Ensure Node.js 18+ is installed on your system
- Open a terminal in the extracted folder and run `npm install`
- Start with `npm start` or `node server/index.js`
First Run Setup
On first launch, AumaTron automatically:
- Generates a `.env` file with secure random keys (ENCRYPTION_KEY, SESSION_SECRET, JWT_SECRET)
- Creates the SQLite database and runs all migrations
- Starts the server on port 3000
Open your browser and navigate to http://localhost:3000. You'll be prompted to create your first account using an invite code.
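As a rough illustration of the key-generation step, secrets of this kind can be produced with Node's built-in crypto module. The 32-byte length, the hex encoding, and the `renderEnv` helper are assumptions made for this sketch, not AumaTron's actual implementation.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: returns .env text with one random secret per key name.
// 32 random bytes (64 hex characters) per secret is an assumed size.
function renderEnv(keyNames, byteLength = 32) {
  const lines = keyNames.map(
    (name) => `${name}=${crypto.randomBytes(byteLength).toString('hex')}`
  );
  return lines.join('\n') + '\n';
}

const envText = renderEnv(['ENCRYPTION_KEY', 'SESSION_SECRET', 'JWT_SECRET']);
console.log(envText);
```

Each run prints three different values, which is why a lost `.env` file cannot simply be regenerated with the same keys.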
Setting Your Workspace
The workspace is the root folder where AumaTron reads and writes files. All file operations (AI tools, file browser, uploads) are restricted to this folder for security.
- Go to Settings → General
- Under Workspace Path, enter the folder path (e.g. `C:\Users\YourName\AumaTron-Files`)
- Click Save
Quick Start
Once installed, here's the fastest way to see AumaTron in action:
- Set an AI provider — Go to Settings → AI Providers. For the quickest start, paste your OpenAI API key. For a free, private setup, use a local model instead (see Local AI).
- Send a message — Type something in the chat like: "Go to wikipedia.org and tell me today's featured article"
- Watch it work — AumaTron will launch a browser, navigate to the site, extract the content, and respond in chat.
That's it. You're automating with AI.
AI Providers
Local AI (Llama)
Run AI completely offline on your own hardware. No API costs, no data leaving your machine.
Setup
- Download a GGUF model file (e.g. Llama 3, Mistral, Phi-3) from Hugging Face or similar
- Go to Settings → AI Providers → Local AI
- Set the path to your GGUF model file
- Configure GPU layers (higher = faster, but uses more VRAM)
- Click Save — AumaTron will start the llama.cpp server automatically
Recommended Models
| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| Phi-3 Mini (Q4) | ~2 GB | 4 GB | Light tasks, low-resource machines |
| Llama 3 8B (Q4) | ~4.5 GB | 8 GB | General use, good balance |
| Mistral 7B (Q5) | ~5 GB | 8 GB | Reliable all-rounder |
| Llama 3 70B (Q4) | ~40 GB | 48 GB+ | Maximum quality (needs powerful hardware) |
OpenAI, Claude, Deepseek
Cloud providers offer more powerful models without needing local GPU resources. Available on all plans — just add your API key. Custom providers (Groq, Mistral, etc.) require a Pro plan or higher.
OpenAI
- Get an API key from platform.openai.com
- Go to Settings → AI Providers → OpenAI
- Paste your API key and select your preferred model (GPT-4o, GPT-4o-mini, etc.)
Anthropic Claude
- Get an API key from console.anthropic.com
- Go to Settings → AI Providers → Claude
- Paste your API key and select a model (Claude Sonnet, Haiku, etc.)
Deepseek
- Get an API key from platform.deepseek.com
- Go to Settings → AI Providers → Deepseek
- Paste your API key
Custom Providers
Add any OpenAI-compatible API endpoint as a custom provider (Pro+). This works with Groq, Mistral, Together AI, Ollama, LM Studio, and many others.
- Go to Settings → AI Providers → Custom Providers
- Click Add Provider
- Enter a name, the API endpoint URL, your API key, and the model name
- Test the connection, then save
Any provider that follows the OpenAI chat completions API format (/v1/chat/completions) will work.
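To check an endpoint before adding it, you can send a chat completions request yourself. The sketch below shows the general request shape using the `fetch` API built into Node.js 18+. The helper names, the base URL, and the model name are placeholders. Whether the endpoint URL you enter in AumaTron should already include the `/v1` path is not specified here, so treat the URL handling as an assumption.

```javascript
// Builds the URL and fetch options for an OpenAI-compatible chat completion.
// baseUrl, apiKey, and model are placeholders supplied by the caller.
function buildChatRequest(baseUrl, apiKey, model, userMessage) {
  const url = `${baseUrl.replace(/\/+$/, '')}/v1/chat/completions`;
  const options = {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: userMessage }],
    }),
  };
  return { url, options };
}

// Sends the request and returns the text of the first reply.
async function chat(baseUrl, apiKey, model, userMessage) {
  const { url, options } = buildChatRequest(baseUrl, apiKey, model, userMessage);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

A call such as `chat('https://api.example.com', 'YOUR_KEY', 'your-model', 'Hello')` resolves to the reply text if the provider follows the format.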
Choosing the Right Provider
| Scenario | Recommended | Why |
|---|---|---|
| Maximum privacy | Local AI (Llama) | Nothing leaves your machine |
| Best quality | Claude or GPT-4o | Most capable at complex tasks |
| Lowest cost | Deepseek or Local | Deepseek is very affordable; Local is free |
| Fastest response | Groq (custom) | Extremely fast inference |
| Simple tasks | GPT-4o-mini or Phi-3 | Cheap/free, good enough for straightforward work |
Browser Automation
How It Works
AumaTron controls real browsers using Playwright and Puppeteer. When you ask the AI to interact with a website, it:
- Launches a browser window (or uses an existing persistent session)
- Navigates to the URL you specify
- Reads the page content and determines the best action
- Clicks buttons, fills forms, scrolls, extracts data — whatever your instruction requires
- Reports back the results in chat
No scripting or coding required. Just describe what you want in plain language, for example:
- "Go to my Shopify admin and check how many orders came in today"
- "Log in to WordPress and publish the draft post called 'Summer Sale'"
- "Search Google for 'best project management tools' and give me the top 5 results"
Browser Types
| Browser | Plan | Best For |
|---|---|---|
| Firefox (Playwright) | Free+ | General browsing, good compatibility |
| Chrome Incognito (Puppeteer) | Pro+ | Sites that require Chrome, stealth mode |
| Tor | Pro+ | Anonymous browsing, accessing .onion sites |
| Tor Standalone | Pro+ | Tor without needing a system Tor install |
| Headless Mode | Ultra | No visible window, faster, uses less resources |
Select your default browser in Settings → General → Browser Settings. Pro+ plans can switch between browsers at any time.
Saved Sites & Credentials
Save frequently visited websites with their login credentials so the AI can log in automatically.
- Go to Settings → Saved Sites
- Click Add Site
- Enter the site URL, username, and password
- Optionally add custom instructions (e.g. "Always click 'Accept Cookies' first")
- Save
Now when you mention the site in chat or Telegram, AumaTron automatically uses the saved credentials to log in. All passwords are encrypted with AES-256-GCM.
WordPress Mode
For WordPress sites, enable WordPress mode on the saved site. This provides dedicated support for WP admin operations like publishing posts, managing plugins, and checking updates.
Stealth & Anti-Detection
Chrome Incognito mode includes anti-detection stealth features:
- Fingerprint spoofing (WebGL, Canvas, AudioContext)
- Realistic browser headers and navigator properties
- Human-like mouse movement and typing delays
- Persistent session cookies across runs
This helps avoid bot detection on sites that actively check for automated browsers.
Tips for Reliable Automation
- Be specific: "Click the blue 'Submit Order' button" works better than "submit the form"
- Use saved sites: Pre-saved credentials eliminate login failures
- Add pre-actions: Configure sites to dismiss popups or accept cookies before the main task
- Use persistent sessions: Staying logged in avoids 2FA prompts and CAPTCHAs on repeat visits (Pro+)
- Check screenshots: If a task fails, the AI often takes a screenshot showing what went wrong
Task Scheduler & Workflows
Creating Tasks
Scheduled tasks let AumaTron run automations on a recurring basis without your involvement.
- Go to Settings → Scheduler
- Click New Task
- Enter a name and the prompt (natural language instruction for the AI)
- Set the schedule (daily, weekly, or custom cron expression)
- Optionally select a specific AI provider for this task
- Save and enable the task
Example prompts:
- "Log in to my Shopify store, go to Orders, and write today's order count to orders-log.txt"
- "Go to the receipts folder, OCR all new files, extract the totals, and append them to expense-summary.txt"
- "Check my WordPress site for pending plugin updates and send a summary"
Cron Scheduling
Tasks use cron-style scheduling. Common examples:
| Schedule | Cron Expression | Meaning |
|---|---|---|
| Every day at 9 AM | 0 9 * * * | Runs once daily at 9:00 |
| Every Monday at 8 AM | 0 8 * * 1 | Weekly on Monday morning |
| Every 6 hours | 0 */6 * * * | Runs at midnight, 6 AM, noon, 6 PM |
| Every 30 minutes | */30 * * * * | Runs twice per hour (Pro+) |
| Weekdays at 5 PM | 0 17 * * 1-5 | Mon–Fri at 5:00 PM |
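To see how the five fields (minute, hour, day of month, month, day of week) are read, here is a minimal matcher that checks whether a given time satisfies an expression. It is a simplified sketch, not the scheduler AumaTron uses. It supports `*`, numbers, ranges, lists, and `/` steps, uses local time, treats Sunday as 0 only, and requires all five fields to match.

```javascript
// Returns true if `value` satisfies one cron field such as "*", "*/6", "1-5", or "0,30".
function fieldMatches(field, value, min) {
  return field.split(',').some((part) => {
    const [range, stepText] = part.split('/');
    const step = stepText ? Number(stepText) : 1;
    let low;
    let high;
    if (range === '*') {
      low = min;
      high = Infinity;
    } else if (range.includes('-')) {
      [low, high] = range.split('-').map(Number);
    } else {
      low = Number(range);
      high = stepText ? Infinity : low;
    }
    return value >= low && value <= high && (value - low) % step === 0;
  });
}

// Checks a five-field cron expression against a Date (local time).
function cronMatches(expression, date) {
  const [minute, hour, dayOfMonth, month, dayOfWeek] = expression.trim().split(/\s+/);
  return (
    fieldMatches(minute, date.getMinutes(), 0) &&
    fieldMatches(hour, date.getHours(), 0) &&
    fieldMatches(dayOfMonth, date.getDate(), 1) &&
    fieldMatches(month, date.getMonth() + 1, 1) &&
    fieldMatches(dayOfWeek, date.getDay(), 0)
  );
}

// Monday 6 January 2025, 09:00 local time
console.log(cronMatches('0 9 * * *', new Date(2025, 0, 6, 9, 0)));
```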
Multi-Step Workflows
Workflows chain multiple tasks into a sequence that runs automatically. Each step executes in order, and the workflow loops until its time limit is reached.
- Create a new task and enable Workflow mode
- Add steps — each step is a prompt that the AI executes
- Set a time limit for the overall workflow
- Save and schedule it like any other task
Workflows are useful for multi-part jobs like: scan a folder for receipts → OCR each one → write a summary → archive processed files.
Task Chaining
Trigger a follow-up task automatically when another task completes (Pro+).
- Open an existing task
- Under On Completion, select another task to trigger
- The chained task starts as soon as the first one finishes
This lets you build pipelines: Task A collects data → Task B processes it → Task C sends a report.
Per-Task AI Models
Assign different AI providers and models to individual tasks (Pro+).
This is useful for cost optimization:
- Use a cheap/fast model (GPT-4o-mini, local Llama) for simple repetitive tasks
- Use a powerful model (GPT-4o, Claude) for complex tasks that need reasoning
- Use Deepseek for tasks where cost is the primary concern
Telegram Bot
Setup Guide
- Open Telegram and search for @BotFather
- Send `/newbot` and follow the prompts to create a bot
- Copy the bot token that BotFather gives you
- In AumaTron, go to Settings → General → Telegram
- Paste the bot token and click Save
- Send a message to your new bot in Telegram — AumaTron will respond
Commands
You can send natural language messages to your bot just like chatting in the AumaTron web interface. The AI has full access to all tools: browser automation, file management, and OCR. For example:
- "Check my Shopify orders and send me a screenshot"
- "Read the file expense-summary.txt and tell me this month's total"
- "Go to my WordPress site and check if there are any pending comments"
Remote Control
Telegram gives you full remote access to AumaTron from your phone. Some things you can do:
- Browse websites and receive screenshots
- Manage files in your workspace
- Trigger automations on demand
- Receive task completion notifications
Smart credential injection means that when you mention a saved site URL in Telegram, AumaTron automatically uses the stored login details.
File Management
Workspace Path
All file operations are scoped to your workspace folder. This is a security measure — the AI cannot read or write files outside this folder.
Set your workspace in Settings → General → Workspace Path.
Note: Files downloaded by the browser go to your operating system's default download folder (e.g. C:\Users\YourName\Downloads). AumaTron does not override the browser's download directory. To work with downloaded files, set your workspace to a parent folder that includes your Downloads folder.
File Browser Panel
Click Files in the header to open the file browser panel. It provides:
- Directory navigation with breadcrumb trail
- File preview (text files with syntax highlighting, images with preview)
- File metadata (size, modification date, type)
The file browser is available on all plans for browsing and previewing.
File Operations
Full file management from the browser panel (Pro+).
- Create File — Create a new empty text file in the current folder
- Create Folder — Create a new subfolder
- Upload — Upload files from anywhere on your PC into the workspace (50 MB max per file)
- Rename — Rename any file or folder
- Edit — Open text files in an editor, make changes, and save
- Delete — Delete files or folders (folders are deleted with all contents)
AI File Tools
The AI can manage files through chat commands. These tools are available when talking to the AI in chat, Telegram, or scheduled tasks:
- `read_file` — Read the contents of a text file
- `write_file` — Create a new file (auto-converts to append if file has content)
- `append_to_file` — Add content to the end of a file
- `create_folder` — Create a new directory
- `copy_file` — Copy a file to a new location
- `move_file` / `rename_file` — Move or rename files
- `delete_file` — Delete a file
- `list_folder` — List the contents of a directory
All file tools respect the workspace boundary and include safety guards to prevent accidental data loss.
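A workspace boundary of this kind is commonly enforced by resolving every requested path against the workspace root and rejecting anything that ends up outside it. The function below is a sketch of that general technique with a hypothetical name, not AumaTron's own code. It does not follow symbolic links, which a production check would also need to handle.

```javascript
const path = require('node:path');

// Resolves requestedPath against the workspace root and rejects anything outside it.
function resolveInWorkspace(workspaceRoot, requestedPath) {
  const root = path.resolve(workspaceRoot);
  const target = path.resolve(root, requestedPath);
  const relative = path.relative(root, target);
  const escapes =
    relative === '..' ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`Path is outside the workspace: ${requestedPath}`);
  }
  return target;
}
```

With a workspace of `C:\Users\YourName\AumaTron-Files`, a request for `reports\today.txt` resolves inside the workspace, while `..\Documents\secret.txt` throws.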
OCR & Receipt Processing
Overview
AumaTron includes built-in OCR (Optical Character Recognition) powered by Tesseract.js. It can read text from images of receipts, invoices, documents, and more.
OCR is commonly used in scheduled tasks to automatically process receipt images, extract totals and dates, and write summaries to a log file.
Language Support
Built-in support for:
- English — Full support
- Thai — Full support including Thai numerals and Buddhist Era dates
OCR runs with both Thai and English language packs simultaneously for accurate results on bilingual documents.
Smart Extraction
After reading the text, AumaTron applies intelligent extraction:
Amount Detection (3-tier)
- Keyword match — Looks for amounts near keywords like "Total", "Grand Total", "Net" (in English and Thai)
- Repeated amounts — If the same amount appears multiple times, it's likely the total
- Largest amount — Falls back to the largest reasonable number found
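The three tiers can be sketched as a fallback chain. The version below is illustrative only: the keyword list is English-only, the regular expression and function names are assumptions, and it filters numbers using the validation rules described in this section (no zero values, no more than 9 digits). It does not distinguish amounts from other numbers such as dates.

```javascript
const TOTAL_KEYWORDS = /\b(grand total|total|net)\b/i; // English only in this sketch
const AMOUNT_PATTERN = /\d+(?:,\d{3})*(?:\.\d{1,2})?/g;

// Extracts candidate amounts, dropping zero values and numbers longer than 9 digits.
function findAmounts(text) {
  return (text.match(AMOUNT_PATTERN) || [])
    .filter((raw) => raw.split('.')[0].replace(/,/g, '').length <= 9)
    .map((raw) => Number(raw.replace(/,/g, '')))
    .filter((value) => value > 0);
}

function detectTotal(ocrText) {
  // Tier 1: an amount on the first line that mentions a total keyword
  for (const line of ocrText.split(/\r?\n/)) {
    if (TOTAL_KEYWORDS.test(line)) {
      const amounts = findAmounts(line);
      if (amounts.length > 0) {
        return { amount: amounts[amounts.length - 1], tier: 'keyword' };
      }
    }
  }
  const all = findAmounts(ocrText);
  // Tier 2: the amount that is repeated most often (at least twice)
  const counts = new Map();
  for (const amount of all) counts.set(amount, (counts.get(amount) || 0) + 1);
  let repeated = null;
  for (const [amount, count] of counts) {
    if (count >= 2 && (repeated === null || count > repeated.count)) {
      repeated = { amount, count };
    }
  }
  if (repeated) return { amount: repeated.amount, tier: 'repeated' };
  // Tier 3: fall back to the largest remaining amount
  if (all.length > 0) return { amount: Math.max(...all), tier: 'largest' };
  return null;
}
```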
Date Detection
Recognizes multiple date formats:
- DD/MM/YYYY, DD-MM-YY, DD Mon YYYY
- Buddhist Era years (e.g. 2568 = 2025 CE) — auto-converted
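Buddhist Era years are 543 ahead of the Common Era, so the conversion is a subtraction: 2568 - 543 = 2025. The sketch below handles only the numeric DD/MM/YYYY and DD-MM-YYYY forms. The function name and the cut-off used to decide that a year is Buddhist Era are assumptions for illustration.

```javascript
const BUDDHIST_ERA_OFFSET = 543; // BE year minus 543 gives the CE year
const BE_THRESHOLD = 2400; // years at or above this are assumed to be Buddhist Era

// Finds the first DD/MM/YYYY or DD-MM-YYYY date in the text and returns it in CE.
function parseReceiptDate(text) {
  const match = text.match(/\b(\d{1,2})[\/-](\d{1,2})[\/-](\d{4})\b/);
  if (!match) return null;
  const day = Number(match[1]);
  const month = Number(match[2]);
  let year = Number(match[3]);
  if (year >= BE_THRESHOLD) year -= BUDDHIST_ERA_OFFSET;
  if (month < 1 || month > 12 || day < 1 || day > 31) return null;
  return { year, month, day };
}

console.log(parseReceiptDate('Date: 15/03/2568')); // { year: 2025, month: 3, day: 15 }
```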
Validation
- Confidence threshold: OCR results below 40% confidence are auto-skipped
- Amount validation: Rejects tracking numbers (>9 digits) and zero/unreasonable values
- Results are cached per task run to prevent redundant processing
Best Practices
- Good lighting: Well-lit, high-contrast receipt photos give the best results
- Flat and straight: Avoid crumpled or angled receipts
- Reasonable resolution: 300+ DPI or a clear phone photo works well
- Use the OCR sidecar app: Install the Receipt OCR app from Settings → Apps for enhanced preprocessing (sharpening, contrast adjustment) that improves accuracy
- Check the confidence: If OCR frequently produces garbage results, the image quality may be too low
Brain System
Personalities
Personalities shape how the AI behaves, its tone, focus areas, and response style.
Using Personalities
- Create a `.md` file in the `brain/personalities/` folder (or the `brain/` root folder)
- Write instructions for the AI's behavior (e.g. "You are a helpful assistant focused on e-commerce. Always be concise.")
- Select the personality when starting a new conversation or assigning it to a scheduled task
Example personality file:
You are a concise, efficient assistant.
- Keep responses under 3 sentences unless asked for detail
- Use bullet points for lists
- Skip pleasantries and get straight to the answer
- When browsing websites, extract only the specific data requested
Each conversation and each scheduled task can use a different personality.
Memory Files
Memory files give the AI persistent knowledge that it can recall based on context.
- Create `.md` files in the `brain/` folder
- Write facts, preferences, or reference data the AI should know
- The AI automatically retrieves relevant memories by matching keywords from your messages against memory file names and content
Example memory file:
My Shopify stores:
- Main store: myshop.myshopify.com (Fashion, ~200 orders/day)
- Second store: myshop2.myshopify.com (Electronics, ~50 orders/day)
- Reports should be saved to the "shopify-reports" folder
- Always include order count and revenue in daily summaries
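Keyword-based retrieval like this can be pictured as scoring each memory file by how many words from your message appear in its name or content. The sketch below is an assumption about the general approach, not the actual matching logic. It uses the limits described in this section (keywords of at least 4 characters, at most 5 memories), and the ranking by match count is a guess.

```javascript
const MIN_KEYWORD_LENGTH = 4;
const MAX_MEMORIES = 5;

// Splits a message into unique lowercase words of at least MIN_KEYWORD_LENGTH characters.
function extractKeywords(message) {
  const words = message.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
  return [...new Set(words.filter((word) => word.length >= MIN_KEYWORD_LENGTH))];
}

// memoryFiles: array of { name, content } objects (a hypothetical shape).
function selectMemories(message, memoryFiles) {
  const keywords = extractKeywords(message);
  return memoryFiles
    .map((file) => {
      const haystack = `${file.name} ${file.content}`.toLowerCase();
      const score = keywords.filter((keyword) => haystack.includes(keyword)).length;
      return { file, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, MAX_MEMORIES)
    .map((entry) => entry.file);
}
```

With the memory file above saved as `shopify-stores.md`, a message such as "Check my Shopify orders" would match on the keyword "shopify", while short words like "my" are ignored.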
Token Budgets
The brain system automatically manages token usage to stay within provider limits:
| Setting | Value |
|---|---|
| Local AI | 800 token budget |
| Cloud providers | 4,000 token budget |
| Max memories per request | 5 |
| Min keyword length | 4 characters |
If the personality + memories exceed the budget, the system automatically truncates: first reducing personality to 50% of budget, then reducing to 2 memories, then dropping memories entirely.
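The truncation order can be sketched as three checks applied in sequence. This is an illustration under stated assumptions: tokens are estimated at roughly 4 characters each instead of using a real tokenizer, and the function and variable names are hypothetical.

```javascript
// Rough token estimate: about 4 characters per token (an assumption, not a real tokenizer).
const CHARS_PER_TOKEN = 4;
const estimateTokens = (text) => Math.ceil(text.length / CHARS_PER_TOKEN);
const truncateToTokens = (text, maxTokens) => text.slice(0, maxTokens * CHARS_PER_TOKEN);

function fitToBudget(personality, memories, budget) {
  let persona = personality;
  let kept = memories.slice(0, 5); // at most 5 memories per request
  const total = () =>
    estimateTokens(persona) + kept.reduce((sum, memory) => sum + estimateTokens(memory), 0);

  // Step 1: cap the personality at 50% of the budget
  if (total() > budget) persona = truncateToTokens(persona, Math.floor(budget / 2));
  // Step 2: keep only the first 2 memories
  if (total() > budget) kept = kept.slice(0, 2);
  // Step 3: drop memories entirely
  if (total() > budget) kept = [];
  return { personality: persona, memories: kept };
}
```

For example, with the 800-token local budget, a personality of about 1,000 tokens and five memories of about 200 tokens each ends up as a 400-token personality plus two memories.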
Voice (STT & TTS)
Speech-to-Text
| Provider | Plan | Description |
|---|---|---|
| Web Speech API | Free+ | Built into your browser. No setup needed. Quality varies by browser. |
| Whisper Local | Pro+ | OpenAI's Whisper model running locally. Private, no API costs. |
| Whisper Cloud | Pro+ | Whisper via OpenAI API. Higher accuracy, requires OpenAI key. |
| Deepgram | Ultra | Real-time streaming STT with high accuracy and low latency. |
| Custom Endpoint | Ultra | Connect any STT API via custom endpoint URL. |
Configure your STT provider in Settings → STT.
Text-to-Speech
| Provider | Plan | Description |
|---|---|---|
| Browser TTS | Free+ | Built-in browser speech synthesis. Works everywhere, basic quality. |
| ElevenLabs | Pro+ | Premium AI voices with natural intonation. Requires ElevenLabs API key. |
| OpenAI TTS | Pro+ | High-quality voices via OpenAI API. Multiple voice options. |
| Fish Audio | Pro+ | Alternative TTS provider with unique voice options. |
| Custom Endpoint | Ultra | Connect any TTS API via custom endpoint URL. |
Configure your TTS provider in Settings → TTS.
Apps & Custom API Tools
Installing Apps
Apps are sidecar services that extend AumaTron's capabilities.
- Go to Settings → Apps
- Enter the app's URL (e.g. `http://localhost:3001`)
- AumaTron fetches the app's manifest to verify its capabilities
- Click Install
Once installed, the app's endpoints are automatically registered as AI tools. For example, the Receipt OCR app becomes available as a tool the AI can call during tasks.
First-Party App: Receipt OCR Extractor
The Receipt OCR app provides enhanced receipt scanning with Sharp image preprocessing (sharpening, contrast, binarization) plus Tesseract OCR. It produces better results than the built-in OCR for difficult receipts.
Run it alongside AumaTron on port 3001. If the app goes down, AumaTron gracefully falls back to built-in OCR.
Custom API Tools
Create tools that let the AI call any external API (Pro+).
- Go to Settings → API Tools
- Click New Tool
- Configure:
  - Name — Tool identifier (becomes `api_yourname`)
  - API URL — The endpoint to call
  - HTTP Method — GET, POST, PUT, DELETE
  - Auth — None, Bearer token, API key header, or Basic auth
  - Parameters — Define what the AI can pass to the API
- Click Test Connection to verify it works
- Save
The AI will automatically use the tool when relevant to your request. All credentials are encrypted with AES-256-GCM.
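The four auth options correspond to standard HTTP request headers. The sketch below shows how each one typically translates. The shape of the `auth` object and its field names are hypothetical and do not reflect AumaTron's internal configuration format.

```javascript
// Maps an auth configuration to HTTP request headers.
function buildAuthHeaders(auth) {
  switch (auth.type) {
    case 'none':
      return {};
    case 'bearer':
      return { Authorization: `Bearer ${auth.token}` };
    case 'apiKey':
      // The header name varies by API, for example "X-API-Key"
      return { [auth.headerName]: auth.key };
    case 'basic': {
      // Basic auth is the base64 encoding of "username:password"
      const encoded = Buffer.from(`${auth.username}:${auth.password}`).toString('base64');
      return { Authorization: `Basic ${encoded}` };
    }
    default:
      throw new Error(`Unsupported auth type: ${auth.type}`);
  }
}

console.log(buildAuthHeaders({ type: 'basic', username: 'user', password: 'pass' }));
// { Authorization: 'Basic dXNlcjpwYXNz' }
```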
Security & Privacy
Encryption
AumaTron encrypts all sensitive data at rest:
- Saved site passwords — AES-256-GCM encrypted in the database
- API keys — AES-256-GCM encrypted
- Custom API tool credentials — AES-256-GCM encrypted
- Session tokens — JWT with signed secrets
The encryption key is generated automatically on first run and stored in your .env file. If you need to rotate keys, a migration script is included.
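For readers unfamiliar with AES-256-GCM, the sketch below shows a typical encrypt and decrypt round trip with Node's built-in crypto module. The 12-byte IV, the dot-separated base64 storage format, and the function names are assumptions for illustration. The actual database format is not documented here.

```javascript
const crypto = require('node:crypto');

// Encrypts a string with a 32-byte key. Returns "iv.tag.ciphertext" in base64 (assumed format).
function encrypt(plaintext, key) {
  const iv = crypto.randomBytes(12); // a fresh IV for every encryption
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((buffer) => buffer.toString('base64')).join('.');
}

// Decrypts the payload. Throws if the key is wrong or the data was modified.
function decrypt(payload, key) {
  const [iv, tag, ciphertext] = payload.split('.').map((part) => Buffer.from(part, 'base64'));
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const key = crypto.randomBytes(32); // stands in for ENCRYPTION_KEY
const stored = encrypt('my-site-password', key);
console.log(decrypt(stored, key)); // my-site-password
```

Because GCM authenticates the data, decrypting with a different key fails outright, which is why values encrypted under a lost key cannot be recovered.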
Website Filtering
Control which websites the AI can access (Pro+).
- Whitelist mode — Only allow specific domains
- Blacklist mode — Block specific domains, allow everything else
Configure in Settings → General → Security.
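The difference between the two modes can be shown with a small check. This sketch assumes that listing a domain also covers its subdomains, which is a common convention but not confirmed for AumaTron. The function name is hypothetical.

```javascript
// Returns true if the URL may be visited under the given mode and domain list.
function isUrlAllowed(url, mode, domains) {
  const host = new URL(url).hostname.toLowerCase();
  const listed = domains.some((entry) => {
    const domain = entry.toLowerCase();
    // Match the domain itself or any subdomain of it
    return host === domain || host.endsWith(`.${domain}`);
  });
  return mode === 'whitelist' ? listed : !listed;
}

console.log(isUrlAllowed('https://en.wikipedia.org/wiki/Main_Page', 'whitelist', ['wikipedia.org'])); // true
console.log(isUrlAllowed('https://shop.example.com/', 'blacklist', ['example.com'])); // false
```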
Running Fully Offline
AumaTron can run with zero internet connectivity:
- Use Local AI (Llama) as your AI provider
- Use Web Speech API for voice (works offline in some browsers)
- Use Browser TTS for text-to-speech
- All data stays on your machine — no cloud calls needed
The only feature that requires internet is browser automation for external websites (obviously). Everything else — file management, OCR, scheduling, chat — works fully offline.
Plans & Billing
Plan Comparison
| Feature | Free | Pro ($12/mo) | Ultra ($49/mo) |
|---|---|---|---|
| Scheduled tasks | 5 | Unlimited | Unlimited |
| Browser runs/month | 100 | Unlimited | Unlimited |
| AI providers | Local only | All + custom | All + custom |
| Browsers | Firefox | All (Firefox, Chrome, Tor) | All + headless |
| Concurrent tasks | 2 | 5 | Unlimited |
| File management | Browse only | Full (create, edit, upload, delete) | Full + batch ops |
| Task chaining | No | Yes | Yes |
| Voice (STT) | Web Speech | + Whisper | + Deepgram, custom |
| Voice (TTS) | Browser | + ElevenLabs, OpenAI, Fish | + Custom endpoints |
| Team members | 1 | 1 | Unlimited |
| Telegram | 1 bot | 1 bot | Multi-bot |
| VPN | No | 1 profile | Multiple profiles |
| API access | No | No | REST API + webhooks |
| Support | Community | Priority + SLA |
Annual billing saves 17%: Pro is $120/year ($10/mo), Ultra is $490/year (~$40.83/mo).
Upgrading
Upgrade from Settings → General → Plan, or from the pricing page. Your data and configuration are preserved — no migration needed.
- 14-day money-back guarantee on Pro and Ultra
- Cancel anytime — access continues until the end of the billing period
- Downgrade to Free at any time — you keep your data, but gated features become unavailable
Troubleshooting
FAQ
Where do downloaded files go?
Files downloaded by the browser go to your operating system's default download folder (e.g. C:\Users\YourName\Downloads). AumaTron does not override the browser's download directory. To have the AI work with downloaded files, set your workspace path to a folder that includes your Downloads directory.
Can I use AumaTron without an internet connection?
Yes. Use Local AI (Llama) as your provider and all features except external website browsing work fully offline. OCR, file management, scheduling, and chat all work without internet.
Is my data private?
Completely. AumaTron runs on your machine. If you use Local AI, nothing ever leaves your device. If you use cloud providers (OpenAI, Claude, Deepseek), only your chat messages are sent to the provider's API. Your files, credentials, and database stay local.
How much faster are persistent sessions?
They need about two-thirds fewer steps. A first-time login to a website takes approximately 9 steps. With a persistent session, subsequent visits take only 3 steps because the browser is already logged in.
Can I start free and upgrade later?
Yes. Start on the Free plan and upgrade whenever you need more AI providers, advanced scheduling, file management, or team features. No data migration needed — everything carries over.
What AI model should I use?
For simple repetitive tasks (checking order counts, reading files), a small local model or GPT-4o-mini works great and costs little or nothing. For complex tasks requiring reasoning (writing reports, multi-step workflows), use GPT-4o, Claude, or a larger local model.
Common Issues
Browser won't launch
- Make sure no other instance of AumaTron's browser is running
- Check that the browser engine is installed (Firefox for Playwright, Chrome for Puppeteer)
- Try restarting AumaTron
- On first run, Playwright may need to download Firefox — ensure internet connectivity
OCR returns garbage text
- Check image quality — blurry or dark images produce poor results
- Ensure the image is a common format (JPG, PNG)
- If confidence is below 40%, AumaTron automatically skips the result
- Try the OCR sidecar app for better preprocessing
Scheduled task runs but does nothing
- Check the task prompt — be specific about what the AI should do
- Verify that saved site credentials are correct
- Check if the expected files/folders exist in the workspace
- Review the task execution log for error messages
Port 3000 is already in use
- Another application is using port 3000
- Check if another AumaTron instance is running
- Change the port in your `.env` file: `PORT=3001`
Local AI is slow
- Increase GPU layers in Settings → AI Providers → Local AI (requires NVIDIA GPU)
- Use a smaller model (Phi-3 Mini instead of Llama 70B)
- Ensure no other GPU-intensive applications are running
- Quantized models (Q4, Q5) are much faster than full-precision
"ENCRYPTION_KEY is required" error on startup
- Your `.env` file is missing or doesn't contain an ENCRYPTION_KEY
- On first run, this should be generated automatically
- If your `.env` was deleted, you'll need to create a new one (see `.env.example`)
- Warning: if you lose the old key, previously encrypted passwords can't be decrypted